The International Conference on Learning Representations (ICLR) 2025 is being hosted in Singapore from April 24th to April 28th. We’re excited to share all of the work from SAIL that’s being presented, and you’ll find links to papers, videos, and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Authors: Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang
Contact: yhxu@stanford.edu
Links: Paper | Website
Keywords: 3d scene editing; gaussian splatting


Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling

Authors: Yuejiang Liu, Jubayer Ibn Hamid, Annie Xie, Yoonho Lee, Max Du, Chelsea Finn
Contact: yuejiang.liu@cs.stanford.edu
Links: Paper | Website
Keywords: robot learning, action chunking, action decoding, test-time compute


CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
Contact: yhxu@stanford.edu
Links: Paper | Website
Keywords: video generative models; 3d control for video generation


Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Authors: Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Contact: clsi@stanford.edu
Links: Paper
Keywords: large language models, automating research


Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data

Authors: Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, Nigam Shah
Contact: mwornow@stanford.edu
Links: Paper
Keywords: healthcare, foundation models, long context


Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Authors: Andy K Zhang, Neil Perry, Riya Dulepet, Joey Ji, Celeste Menders, Justin W Lin, Eliot Jones, Gashon Hussein, Samantha Liu, Donovan Julian Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Haoxiang Yang, Aolin Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Kenny O Oseleononmen, Dan Boneh, Daniel E. Ho, Percy Liang
Contact: andyzh@stanford.edu
Award nominations: Oral
Links: Paper | Website
Keywords: language model agents, benchmark, cybersecurity, risk


Restructuring Vector Quantization with the Rotation Trick

Authors: Christopher Fifty, Ronald Guenther Junkins, Dennis Duan, Aniketh Iyengar, Jerry Weihong Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré
Contact: fifty@cs.stanford.edu
Award nominations: Oral
Links: Paper | Website
Keywords: generative modeling, computer vision


Energy-Based Diffusion Language Models for Text Generation

Authors: Minkai Xu, Tomas Geffner, Karsten Kreis, Weili Nie, Yilun Xu, Jure Leskovec, Stefano Ermon, Arash Vahdat
Contact: minkai@cs.stanford.edu
Links: Paper | Website
Keywords: language models, discrete diffusion models, energy-based models


Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

Authors: Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
Contact: rschaef@cs.stanford.edu
Links: Paper | Website
Keywords: adversarial robustness, jailbreaking, language model, vision language model


Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models

Authors: Jeffrey Gu, Serena Yeung-Levy
Contact: jeffgu@stanford.edu
Links: Paper | Website
Keywords: hypernetworks, neural fields, implicit neural representations, generalizable neural fields, foundation models


Generative Representational Instruction Tuning

Authors: Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
Contact: niklasm@stanford.edu
Links: Paper | Website
Keywords: large language models, instruction tuning, text embedding


Aligning Language Models with Demonstrated Feedback

Authors: Omar Shaikh, Michelle S. Lam, Joey Hejna, Yijia Shao, Hyundong Justin Cho, Michael S. Bernstein, Diyi Yang
Contact: oshaikh@stanford.edu
Keywords: personalization, few-shot learning, human computer interaction, alignment


Learning Efficient Positional Encodings with Graph Neural Networks

Authors: Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka, Jure Leskovec, Alejandro Ribeiro
Contact: charilaos@cs.stanford.edu
Links: Paper
Keywords: graph transformers, positional encodings, graph neural networks


LoLCATs: On Low-Rank Linearizing of Large Language Models

Authors: Michael Zhang, Simran Arora, Rahul Chalamala, Benjamin Frederick Spector, Alan Wu, Krithik Ramesh, Aaryan Singhal, Christopher Ré
Contact: mzhang@cs.stanford.edu
Links: Paper | Blog Post
Keywords: llms, efficient architectures, attention


Model Equality Testing: Which Model is this API Serving?

Authors: Irena Gao, Percy Liang, Carlos Guestrin
Contact: irena@cs.stanford.edu
Links: Paper | Website
Keywords: api monitoring, model shift, two-sample testing


MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Authors: Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás
Contact: kallini@stanford.edu
Links: Paper | Website
Keywords: nlp, byt5, t5, tokenization, byte-level language models, character-level language models


OLMoE: Open Mixture-of-Experts Language Models

Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Evan Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
Contact: niklasm@stanford.edu
Award nominations: Oral
Links: Paper | Website
Keywords: large language models, mixture-of-experts, open-source


Predicate Hierarchies Improve Few-Shot State Classification

Authors: Emily Jin*, Joy Hsu*, Jiajun Wu
Contact: emilyjin@stanford.edu
Links: Paper | Website
Keywords: few-shot state classification, predicate hierarchies


Real2Code: Reconstruct Articulated Objects via Code Generation

Authors: Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song
Contact: mandi@stanford.edu
Links: Paper | Blog Post | Website
Keywords: code llms; articulated objects; digital twins; foundation models


Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Authors: Sheng Liu, Haotian Ye, James Zou
Contact: shengl@stanford.edu
Award nominations: Spotlight
Links: Paper | Website
Keywords: hallucination, multimodal language model, large language model


SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

Authors: John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press
Contact: johnby@stanford.edu
Links: Paper | Website
Keywords: language models, natural language processing, software engineering


Synthetic Continued Pretraining

Authors: Zitong Yang*, Neil Band*, Shuangping Li, Emmanuel Candès, Tatsunori Hashimoto
Contact: zitong@stanford.edu
Links: Paper | Website
Keywords: synthetic data, continued pretraining


TEOChat: Large Language and Vision Assistant for Temporal Earth Observation Data

Authors: Jeremy Andrew Irvin, Emily Ruoyu Liu, Joyce Chuyi Chen, Ines Dormoy, Jinyoung Kim, Samar Khanna, Zhuo Zheng, Stefano Ermon
Contact: jirvin16@cs.stanford.edu
Links: Paper | Website
Keywords: vision-language model, large multimodal model, satellite imagery, earth observation, change detection


TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

Authors: Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec
Contact: minkai@cs.stanford.edu
Links: Paper | Website
Keywords: tabular representation learning, generative models, diffusion models


The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Authors: Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo
Contact: youssef.allouah@epfl.ch
Links: Paper
Keywords: machine unlearning, differential privacy, optimization, theory, right to be forgotten


TopoLM: brain-like spatio-functional organization in a topographic language model

Authors: Neil Rathi, Johannes Mehrer, Badr AlKhamissi, Taha Osama A Binhuraib, Nicholas Blauch, Martin Schrimpf
Contact: rathi@stanford.edu
Award nominations: Oral
Links: Paper | Website
Keywords: language modeling, topography, fmri, neuroscience


Video Action Differencing

Authors: James Burgess, Xiaohan Wang, Yuhui Zhang, Anita Rau, Alejandro Lozano, Lisa Dunlap, Trevor Darrell, Serena Yeung-Levy
Contact: jmhb@stanford.edu
Links: Paper | Blog Post | Website
Keywords: video, action, comparison, lvm, lmm, benchmark


What Makes a Maze Look Like a Maze?

Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu
Contact: joycj@stanford.edu
Links: Paper | Website
Keywords: visual reasoning, abstract concepts, schemas


What’s the Move? Hybrid Imitation Learning via Salient Points

Authors: Priya Sundaresan*, Hengyuan Hu*, Quan Vuong, Jeannette Bohg, Dorsa Sadigh
Contact: priyasun@stanford.edu
Links: Paper | Website
Keywords: imitation learning, robot learning, robot manipulation, robotics


We look forward to seeing you at ICLR 2025!