About

I’m a Research Engineer at Meta Superintelligence Labs (MSL), where I work on safety and alignment evaluations for our frontier model effort. My primary research areas are the faithfulness of chain-of-thought reasoning and scalable oversight. Before MSL, I was a Research Scientist on the SEAL team at Scale AI, a member of the NYU Alignment Research Group under Prof. Sam Bowman, and an early employee at Cohere. I earned a Bachelor’s in Machine Learning and Mathematics from Duke University.

Publications & Preprints

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
Miles Turpin, Andy Arditi, Marvin Li, Joe Benton, Julian Michael
ICML 2025, Workshop on Reliable and Responsible Foundation Models
[arXiv] [Twitter thread]

Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix J Binder, James Chua, Tomek Korbak, Henry Sleight, John Hughes, Robert Long, Ethan Perez, Miles Turpin, Owain Evans
ICLR 2025
[arXiv]

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, …, Yejin Choi, David Krueger
arXiv 2024
[arXiv] [Website] [Summary Twitter thread] [Twitter thread for interpretability section]

Co-authored section 3.4 (Tools for Interpreting or Explaining LLM Behavior Are Absent or Lack Faithfulness) with Peter Hase.

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin
arXiv 2024
[arXiv] [Twitter thread] [Code]

Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman
NeurIPS 2023
[OpenReview] [Twitter thread] [Code]

A machine learning toolkit for genetic engineering attribution to facilitate biosecurity
Ethan C. Alley, Miles Turpin, Andrew Bo Liu, Taylor Kulp-McDowall, Jacob Swett, Rey Edison, Stephen E. Von Stetina, George M. Church & Kevin M. Esvelt
Nature Communications, 2020
[Paper] [Twitter thread] [Code]

Machine Learning Prediction of Surgical Intervention for Small Bowel Obstruction
Miles Turpin, Joshua Watson, Matthew Engelhard, Ricardo Henao, David Thompson, Lawrence Carin, Allan Kirk
medRxiv, 2021
[Preprint]

Blog Posts

Do models say what they learn?
Andy Arditi, Marvin Li, Joe Benton, Miles Turpin
LessWrong, 2025
[Link]

Reward hacking behavior can generalize across tasks
Kei Nishimura-Gasparian, Isaac Dunn, Henry Sleight, Miles Turpin, Evan Hubinger, Carson Denison, Ethan Perez
Alignment Forum, 2024
[Alignment Forum]

Precursor to “Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.”

Past Projects

Scalable Hierarchical Bayesian Neural Networks via Factorization. During an internship at IBM Research in 2019, I worked with Dr. Soumya Ghosh on scaling hierarchical Bayesian modeling to Bayesian neural networks with very large numbers of groups.
Probabilistic Wave Function Collapse with Markov Random Fields. I used Markov Random Fields to create a generalized version of the Wave Function Collapse algorithm for texture synthesis. This generalization enables the algorithm to handle continuous pixel values and model longer-range dependencies.

Get in touch

milesaturpin at gmail.com