Berkeley MIDS
Graduate coursework projects spanning capstone engineering, interpretability research, and RAG system evaluation
My Berkeley MIDS work combines applied ML engineering with model evaluation, spanning projects in time-series modeling, sparse autoencoder-based interpretability, and retrieval-augmented generation systems.
Technical Contributions
Capstone (OvertakeAI)
For my capstone course (DATASCI 210, Summer 2025), I helped build OvertakeAI, a Formula 1 safety car prediction and telemetry analysis system.
DATASCI 266 (NLP): SAE Paper
Co-authored Unveiling the Black Box: Causal Inference and Feature Analysis in Fine-Tuned Language Models Using Sparse Autoencoders, a controlled study comparing a baseline GPT-2 against a medically fine-tuned variant using SAE feature extraction, a semantic coherence metric, and vector-steering interventions. Key result: despite surfacing fewer medical features, the baseline model was more interpretable and steerable, with higher mean coherence (0.5438 vs. 0.3511) and a higher intervention pass rate (99% vs. 83.7%).
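The core of a vector-steering intervention can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the function name, the toy dimensions, and the example direction are all hypothetical, but the operation (shifting a hidden state along an SAE feature's decoder direction, scaled by a coefficient) is the standard technique.

```python
# Hedged sketch of vector steering with an SAE feature direction.
# All names and values here are illustrative, not taken from the paper.

def steer(hidden_state, feature_direction, alpha):
    """Shift a hidden state along a feature direction, scaled by alpha."""
    return [h + alpha * d for h, d in zip(hidden_state, feature_direction)]

# Toy example: a 4-dimensional hidden state steered along one direction.
h = [0.5, -0.2, 1.0, 0.0]
d = [1.0, 0.0, 0.0, 1.0]   # hypothetical SAE decoder row for one feature
steered = steer(h, d, alpha=2.0)
print(steered)  # -> [2.5, -0.2, 1.0, 2.0]
```

In practice the direction comes from a trained SAE's decoder weights and the steered state is fed back through the remaining transformer layers; the "pass rate" above measures how often such interventions produce the intended behavioral change.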
DATASCI 267 (GenAI): RAG Pipeline with LangGraph
Built and systematically evaluated a retrieval-augmented generation pipeline orchestrated with LangGraph for domain-specific Q&A. Evaluation showed that prompt-template and retriever design had a greater impact on performance than embedding model choice; the strongest configuration used multi-qa-mpnet-base-cos-v1 embeddings with chunk size 1024, overlap 100, and similarity retrieval at top-k=8.
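The retrieval parameters above (chunk size 1024, overlap 100, top-k=8) can be illustrated with a minimal sketch. This is not the project's actual LangGraph code: the character-based splitter, function names, and toy vectors are simplifications standing in for the real text splitter and embedding model.

```python
# Hedged sketch of the two retrieval knobs tuned in the pipeline:
# overlapping chunking and top-k cosine-similarity retrieval.
import math

def chunk(text, size=1024, overlap=100):
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=8):
    """Indices of the k chunk embeddings most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy usage with small sizes for readability.
pieces = chunk("abcdefghij", size=4, overlap=1)
print(pieces)  # -> ['abcd', 'defg', 'ghij']
print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], k=2))  # -> [0, 2]
```

The retrieved chunk texts would then be interpolated into the prompt template before generation, which is where the template-design effect noted above enters the pipeline.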