dcahn12/README.md

Hi there, I'm Daechul Ahn 👋

Ph.D. Student @ Seoul National University
Machine Perception and Reasoning Lab (MPRLAB) · Advised by Prof. Jonghyun Choi

I build embodied AI systems that perceive, reason, and act like humans — from video-language understanding and multimodal alignment to multi-agent reasoning and robotic manipulation.

📬 Open to research internship opportunities — feel free to reach out!

Website · Google Scholar · LinkedIn · Email


🔬 Research Interests

  • 🤖 Embodied AI — robotic manipulation, vision-language-action models
  • 🎬 Video-Language Understanding — temporal reasoning, video grounding
  • 🧠 Multi-Agent Reasoning — hierarchical planning, strategic decision-making
  • 🎯 Multimodal Alignment — RLHF/RLAIF for large multimodal models

📝 Selected Publications

Full list on my website and Google Scholar

| Year | Paper | Venue |
|------|-------|-------|
| 2026 | SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for VLA Models | ICML (Spotlight) |
| 2026 | BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands | ICRA |
| 2026 | LWE: Becoming Experienced Judges — Selective Test-Time Learning for Evaluators | EACL (Oral) |
| 2026 | VECTOR: What Happens When — Learning Temporal Orders of Events in Videos | WACV |
| 2025 | HIMA: Society of Mind Meets Real-Time Strategy | COLM |
| 2025 | ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO | AAAI |
| 2024 | VLM-RLAIF: Tuning Large Multimodal Models for Videos using RLAIF | ACL (Oral) |
| 2023 | CMOTA: Story Visualization by Online Text Augmentation with Context Memory | ICCV |
| 2021 | PSVL: Zero-shot Natural Language Video Localization | ICCV (Oral) |

🏆 Highlights

  • 🏅 Outstanding Reviewer, CVPR 2025 (Top 5.6%)
  • 🥇 Best Paper Award, 1st Yonsei AI Workshop, 2022
  • 📖 Reviewer for CVPR, ICCV, ECCV, AAAI, WACV, ICRA, IJCV, TPAMI

🛠️ Tech Stack

Python · PyTorch · HuggingFace · CUDA · Linux · Docker · Git · ROS


📸 Beyond Research

I enjoy photography as a hobby — check out my portfolio at dafoto.info 📷


If you're interested in collaboration or research internships, feel free to reach out!

Pinned

  1. yonseivnl/vlm-rlaif (Public)

    ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

    Python · 77 stars · 4 forks

  2. gistvision/PSVL (Public)

    Code for the paper "Zero-shot Natural Language Video Localization" (ICCV 2021, Oral).

    Python · 48 stars · 7 forks

  3. snumprlab/isr-dpo (Public)

    Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)

    Python · 23 stars · 1 fork

  4. yonseivnl/cmota (Public)

    Python · 10 stars · 1 fork