Berkeley MIDS
Graduate coursework projects spanning capstone engineering, interpretability research, and RAG system evaluation
My Berkeley MIDS work combines applied ML engineering with model evaluation, spanning projects in time-series modeling, sparse autoencoder-based interpretability, and retrieval-augmented generation systems.
Technical Contributions
Capstone (OvertakeAI)
For my capstone course (DATASCI 210, Summer 2025), I helped build OvertakeAI, a Formula 1 safety car prediction and telemetry analysis system.
DATASCI 266 (NLP): SAE Paper
Co-authored Unveiling the Black Box: Causal Inference and Feature Analysis in Fine-Tuned Language Models Using Sparse Autoencoders, a controlled study comparing a baseline GPT-2 against a medically fine-tuned variant using SAE feature extraction, a semantic coherence metric, and vector-steering interventions. Key result: despite surfacing fewer medical features, the baseline model was more interpretable and steerable, with higher mean coherence (0.5438 vs. 0.3511) and a higher intervention pass rate (99% vs. 83.7%).
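The core of a vector-steering intervention can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the function name, the toy dimensions, and the example direction are all hypothetical, but the operation (shifting a hidden state along an SAE feature's decoder direction, scaled by a coefficient) is the standard technique.

```python
# Hedged sketch of vector steering with an SAE feature direction.
# All names and values here are illustrative, not taken from the paper.

def steer(hidden_state, feature_direction, alpha):
    """Shift a hidden state along a feature direction, scaled by alpha."""
    return [h + alpha * d for h, d in zip(hidden_state, feature_direction)]

# Toy example: a 4-dimensional hidden state steered along one direction.
h = [0.5, -0.2, 1.0, 0.0]
d = [1.0, 0.0, 0.0, 1.0]   # hypothetical SAE decoder row for one feature
steered = steer(h, d, alpha=2.0)
print(steered)  # -> [2.5, -0.2, 1.0, 2.0]
```

In practice the direction comes from a trained SAE's decoder weights and the steered state is fed back through the remaining transformer layers; the "pass rate" above measures how often such interventions produce the intended behavioral change.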
DATASCI 267 (GenAI): RAG Pipeline with LangGraph
Built and systematically evaluated a retrieval-augmented generation pipeline orchestrated with LangGraph for domain-specific Q&A. Evaluation showed that prompt-template and retriever design had a greater impact on performance than embedding model choice; the strongest configuration used multi-qa-mpnet-base-cos-v1 embeddings with chunk size 1024, overlap 100, and similarity retrieval at top-k=8.
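The retrieval parameters above (chunk size 1024, overlap 100, top-k=8) can be illustrated with a minimal sketch. This is not the project's actual LangGraph code: the character-based splitter, function names, and toy vectors are simplifications standing in for the real text splitter and embedding model.

```python
# Hedged sketch of the two retrieval knobs tuned in the pipeline:
# overlapping chunking and top-k cosine-similarity retrieval.
import math

def chunk(text, size=1024, overlap=100):
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=8):
    """Indices of the k chunk embeddings most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy usage with small sizes for readability.
pieces = chunk("abcdefghij", size=4, overlap=1)
print(pieces)  # -> ['abcd', 'defg', 'ghij']
print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], k=2))  # -> [0, 2]
```

The retrieved chunk texts would then be interpolated into the prompt template before generation, which is where the template-design effect noted above enters the pipeline.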