Sung Min (Sam) Park

About

Hi! I am a postdoc at Stanford CS working with Profs. Tatsu Hashimoto, Percy Liang, and James Zou. I received my PhD from MIT, where I was advised by Prof. Aleksander Mądry.

I’m currently interested in understanding and improving machine learning (ML) methodology through the lens of data. Some questions I think about include:

How do we attribute model predictions back to training data?
How do we select the right data for a given task?
Can we derive insights about ML phenomena (e.g., scaling laws, emergence, in-context learning) through this lens?

I’m also more broadly interested in the science of machine learning/deep learning.

[News] (Sep, ‘24) I graduated! Thesis [link]

[News] (July, ‘24) I co-presented a tutorial at ICML ‘24 on Data Attribution at Scale: [video] [notes]

Bio

Previously at MIT, I worked on understanding statistical-computational tradeoffs in high-dimensional statistics with Prof. Guy Bresler for my SM thesis. Earlier during my PhD, I was supported by the MIT Akamai Presidential Fellowship and the Samsung Scholarship.

From 2016-18, I served in the Republic of Korea Army in the top signals intelligence unit as a researcher.

Prior to grad school, I received a BS in Computer Science from Cornell University (2011-14), where I was fortunate to work with Prof. Ramin Zabih and Prof. Bobby Kleinberg.

I have interned at Waymo, Dropbox, and Google.

Research

Selected Publications All Publications

**Attribute-to-Delete: Machine Unlearning via Datamodel Matching**\
Kristian Georgiev\*, Roy Rinberg\*, Sung Min Park\*, Shivam Garg\*, Andrew Ilyas, Aleksander Mądry, Seth Neel\
ICLR 2025\
[[arxiv](https://arxiv.org/abs/2410.23232)] [[blog](https://t.co/QVgG2FlNmB)]

**The Journey, Not the Destination: How Data Guides Diffusion Models**\
Kristian Georgiev\*, Josh Vendrow\*, Hadi Salman, Sung Min Park, Aleksander Mądry\
[[arxiv](https://arxiv.org/abs/2312.06205)]

**TRAK: Attributing Model Behavior at Scale**\
Sung Min Park\*, Kristian Georgiev\*, Andrew Ilyas\*, Guillaume Leclerc, Aleksander Mądry\
ICML 2023 (**Oral presentation**)\
[[arxiv](https://arxiv.org/abs/2303.14186)] [[blog](https://gradientscience.org/trak/)] [[code](https://github.com/MadryLab/trak)] [[website](https://trak.csail.mit.edu/)] [[talk](https://icml.cc/virtual/2023/oral/25526)]

**ModelDiff: A Framework for Comparing Learning Algorithms**\
Harshay Shah\*, Sung Min Park\*, Andrew Ilyas\*, Aleksander Mądry\
ICML 2023\
[[arxiv](https://arxiv.org/abs/2211.12491)] [[blog](https://gradientscience.org/modeldiff/)] [[code](https://github.com/MadryLab/modeldiff)]

**FFCV: Accelerating Training by Removing Data Bottlenecks**\
Guillaume Leclerc, Andrew Ilyas, Logan Engstrom, Sung Min Park, Hadi Salman, Aleksander Mądry\
CVPR 2023\
[[code](https://github.com/libffcv/ffcv)]

**A Data-Based Perspective on Transfer Learning**\
Saachi Jain\*, Hadi Salman\*, Alaa Khaddaj\*, Eric Wong, Sung Min Park, Aleksander Mądry\
CVPR 2023\
[[arxiv](https://arxiv.org/abs/2207.05739)] [[blog](https://gradientscience.org/data-transfer/)]

**Datamodels: Predicting Predictions from Training Data**\
Andrew Ilyas\*, Sung Min Park\*, Logan Engstrom\*, Guillaume Leclerc, Aleksander Mądry\
ICML 2022\
[[arxiv](https://arxiv.org/abs/2202.00622)] [blog [part 1](https://gradientscience.org/datamodels-1/) [part 2](https://gradientscience.org/datamodels-2/)] [[code](https://github.com/MadryLab/datamodels)] [[data](https://github.com/MadryLab/datamodels-data)]

**On Distinctive Properties of Universal Perturbations**\
Sung Min Park, Kuo-An Wei, Kai Xiao, Jerry Li, Aleksander Mądry\
2021\
[[arxiv](https://arxiv.org/abs/2112.15329)]

**Sparse PCA from Sparse Linear Regression**\
(α-β order) Guy Bresler, Sung Min Park, Madalina Persu\
NeurIPS 2018\
[[arxiv](https://arxiv.org/abs/1811.10106)] [[poster](/assets/files/neurips_2018_poster.pdf)] [[code](https://github.com/sung-max/SPCAvSLR)]

**On the Equivalence of Sparse Statistical Problems**\
Sung Min Park\
SM thesis 2016\
[[pdf](/assets/files/sm_thesis.pdf)]

**Structured learning of sum-of-submodular higher order energy functions**\
Alexander Fix, Thorsten Joachims, Sung Min Park, Ramin Zabih\
ICCV 2013\
[[pdf](/assets/files/submodular.pdf)]

Talks

Mar 2024 Stanford ML lunch
Jul 2023 ICML Oral
May 2023 LIDS & Stats Tea
May 2023 MIT MLTea
Apr 2023 ML Collective Reading group
Feb 2023 MIT LIDS Student Conference
Aug 2022 UMN ML Seminar
Feb 2022 LIDS & Stats Tea
Jan 2022 MIT LIDS Student Conference

Misc

Region Detection and Geometry Prediction
Patent from work during Summer 2020 internship at Waymo
[pdf]

Fourier Theoretic Probabilistic Inference over Permutations
Cornell, Spring 2014
[pdf]

Analysis of pipage method for k-max coverage
Cornell, Fall 2012
[pdf]

Personal

I grew up between the Bay Area, Seoul, and Singapore, where I attended SAS.

In my free time, I enjoy lifting, playing basketball, rowing, watching the NBA (Nuggets!), watching movies, and learning physics and math.