Ph.D. Student @ Seoul National University
Machine Perception and Reasoning Lab (MPRLAB) · Advised by Prof. Jonghyun Choi
I build embodied AI systems that perceive, reason, and act like humans. My work spans video-language understanding, multimodal alignment, multi-agent reasoning, and robotic manipulation.
📬 Open to research internship opportunities — feel free to reach out!
- 🤖 Embodied AI — robotic manipulation, vision-language-action models
- 🎬 Video-Language Understanding — temporal reasoning, video grounding
- 🧠 Multi-Agent Reasoning — hierarchical planning, strategic decision-making
- 🎯 Multimodal Alignment — RLHF/RLAIF for large multimodal models
The full list is available on my website and Google Scholar.
| Year | Paper | Venue |
|---|---|---|
| 2026 | SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for VLA Models | ICML (Spotlight) |
| 2026 | BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands | ICRA |
| 2026 | LWE: Becoming Experienced Judges — Selective Test-Time Learning for Evaluators | EACL (Oral) |
| 2026 | VECTOR: What Happens When — Learning Temporal Orders of Events in Videos | WACV |
| 2025 | HIMA: Society of Mind Meets Real-Time Strategy | COLM |
| 2025 | ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO | AAAI |
| 2024 | VLM-RLAIF: Tuning Large Multimodal Models for Videos using RLAIF | ACL (Oral) |
| 2023 | CMOTA: Story Visualization by Online Text Augmentation with Context Memory | ICCV |
| 2021 | PSVL: Zero-shot Natural Language Video Localization | ICCV (Oral) |
- 🏅 Outstanding Reviewer, CVPR 2025 (Top 5.6%)
- 🥇 Best Paper Award, 1st Yonsei AI Workshop, 2022
- 📖 Reviewer for CVPR, ICCV, ECCV, AAAI, WACV, ICRA, IJCV, TPAMI
I enjoy photography as a hobby — check out my portfolio at dafoto.info 📷
If you're interested in collaboration or research internships, feel free to reach out!


