Sean M. Sica

SWE @ MITRE Labs · UC Berkeley MIDS alumnus


At MITRE, I help lead software development for MITRE ATT&CK and contribute to research on Judy, the supercomputing platform powering the Federal AI Sandbox.

My recent research has focused on understanding how fine-tuning reshapes internal model representations. During my master’s work at Berkeley (2024), I used mechanistic interpretability methods to empirically characterize how fine-tuning corpora affect a model’s parametric knowledge. In 2025, I worked on a small research team training sparse autoencoders (SAEs) and experimenting with linear cross-layer probes to study feature drift and representation shift under fine-tuning.
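For readers unfamiliar with the setup, here is a minimal sketch of the kind of sparse autoencoder this work involves. The dimensions, sparsity coefficient, and names are illustrative assumptions, not the actual research code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: reconstructs model activations through an overcomplete
    hidden layer, with an L1 penalty encouraging sparse feature activations.
    Dimensions here are illustrative, not the values used in the research."""

    def __init__(self, d_model: int = 768, d_hidden: int = 8 * 768):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(features)           # reconstruction of the input
        return recon, features

def sae_loss(x, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus L1 sparsity; l1_coeff is an assumed value.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()
```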

That work produced:

  • A general-purpose Python library for mechanistic interpretability workflows, published internally at MITRE to reduce barriers to experimentation.
  • Exploratory experiments using linear probes as a lightweight alternative to crosscoders for tracking how features change during fine-tuning (see the sketch after this list). While preliminary, this work deepened my interest in questions such as whether features relocate across layers, collapse, or transform into new vector representations.
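As a rough illustration of that probe idea (a sketch under assumed names, not the team's actual code), one can fit a least-squares linear map from the base model's activations at one layer to the fine-tuned model's activations at another, and read the fit quality as weak evidence about where a feature went:

```python
import numpy as np

def fit_linear_probe(acts_src: np.ndarray, acts_tgt: np.ndarray):
    """Least-squares linear map from source activations (n, d_src) to target
    activations (n, d_tgt); returns the map and its R^2 on the fit data
    (held-out activations would be used to evaluate in practice)."""
    W, *_ = np.linalg.lstsq(acts_src, acts_tgt, rcond=None)
    pred = acts_src @ W
    ss_res = ((acts_tgt - pred) ** 2).sum()
    ss_tot = ((acts_tgt - acts_tgt.mean(axis=0)) ** 2).sum()
    return W, 1.0 - ss_res / ss_tot

# Hypothetical usage: base_acts[l] holds layer-l activations of the base
# model, tuned_acts[m] the same prompts run through the fine-tuned model.
# A high R^2 at some layer m != l hints that a feature relocated; uniformly
# low R^2 suggests it collapsed or transformed into a new representation.
```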

In parallel, I collaborated with Neuronpedia’s founder to bring the platform into a zero-trust government environment. I contributed features enabling secure internal deployment and supported the efforts that led to its open-sourcing.

I bring 5+ years of experience shipping production software across distributed infrastructure, APIs, open-source tooling, and Kubernetes-based systems.

I’m currently enrolled in the BlueDot Technical AI Safety Course and am seeking research engineering roles or fellowships at frontier AI labs or safety-focused research organizations.

I’m especially interested in research agendas that use mechanistic interpretability to inform or guide training [1, 2], for example by leveraging internal representations to shape objectives, reduce hallucinations, or monitor alignment-relevant behavior. I’m also closely following emerging work on evaluation awareness and model introspection [3, 4, 5], particularly where it intersects with representation-level monitoring.