
Junhong Shen
Ph.D. Student
Carnegie Mellon University
Advisor: Ameet Talwalkar
Email: junhongs@andrew.cmu.edu
Office: Gates Hillman Centers 8226
Bio
I’m a 4th-year Ph.D. student in the Machine Learning Department at CMU, advised by Ameet Talwalkar. My work centers on improving how LLMs interact with real-world applications, in particular building multi-modal models and agent systems that operate in real-world environments such as browsers, command lines, and IDEs. I’m also interested in enhancing LLMs’ abilities to model diverse data types and applying them to long-tail, low-resource domains such as science and business.
I obtained my B.S. in Mathematics of Computation at UCLA, where I was fortunate to work with Lin Yang on sample-efficient reinforcement learning. I have also worked on multi-agent RL and Theory of Mind, advised by Song-Chun Zhu and Ying Nian Wu. My Ph.D. is supported by the J.P. Morgan AI PhD Fellowship.
News
- Feb 2025: I'll intern at Google DeepMind (Mountain View) starting this May. Reach out if you'd like to catch up over coffee!
- Jan 2025: Check out our new work on Multi-Modal Mixture of Mamba!
- Dec 2024: My internship work at Meta FAIR, Content-Adaptive Image Tokenizer, has been released!
- Nov 2024: Check out our newest work on web agents built on top of open-source LLMs! See the ScribeAgent paper, code, and blog post.
- Sep 2024: Grateful to be awarded the J.P. Morgan AI PhD Fellowship (accepted) and the Bloomberg PhD Fellowship (declined)!
- May 2024: My internship work at Microsoft Research, Tag-LLM, was accepted to ICML 2024!
- May 2024: We are organizing the CMU Agent Workshop. Check it out and participate!
Selected Publications

CAT: Content-Adaptive Image Tokenization

ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

Cross-Modal Fine-Tuning: Align then Refine

Efficient Architecture Search for Diverse Tasks

NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks

Iterative Teacher-Aware Learning

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Emergence of Pragmatics from Referential Game between Theory of Mind Agents
Cite CAT: Content-Adaptive Image Tokenization
@misc{shen2024adaptivetokenizer,
title={CAT: Content-Adaptive Image Tokenization},
author={Junhong Shen and Kushal Tirumala and Michihiro Yasunaga and Ishan Misra and Luke Zettlemoyer and Lili Yu and Chunting Zhou},
year={2025},
eprint={2501.03120},
archivePrefix={arXiv},
primaryClass={cs.CV},
}
Cite ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
@misc{shen2024scribeagent,
title={ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data},
author={Junhong Shen and Atishay Jain and Zedian Xiao and Ishan Amlekar and Mouad Hadji and Aaron Podolny and Ameet Talwalkar},
year={2024},
eprint={2411.15004},
archivePrefix={arXiv},
primaryClass={cs.CL},
}
Cite UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
@misc{shen2024ups,
title={UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation},
author={Junhong Shen and Tanya Marwah and Ameet Talwalkar},
year={2024},
eprint={2403.07187},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Cite Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
@misc{shen2024tagllm,
title={Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains},
author={Junhong Shen and Neil Tenenholtz and James Brian Hall and David Alvarez-Melis and Nicolo Fusi},
year={2024},
eprint={2402.05140},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Cite Cross-Modal Fine-Tuning: Align then Refine
@inproceedings{shen2023orca,
author = {Shen, Junhong and Li, Liam and Dery, Lucio M. and Staten, Corey and Khodak, Mikhail and Neubig, Graham and Talwalkar, Ameet},
title = {Cross-Modal Fine-Tuning: Align then Refine},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2023},
url = {https://arxiv.org/abs/2302.05738}
}
Cite Efficient Architecture Search for Diverse Tasks
@inproceedings{shen2022efficient,
title={Efficient Architecture Search for Diverse Tasks},
author={Shen, Junhong and Khodak, Mikhail and Talwalkar, Ameet},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2022}
}
Cite NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
@inproceedings{nasbench360,
title={NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks},
author={Renbo Tu and Nicholas Roberts and Mikhail Khodak and Junhong Shen and Frederic Sala and Ameet Talwalkar},
booktitle={Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track},
year={2022}
}
Cite Iterative Teacher-Aware Learning
@inproceedings{yuan2021iterative,
title={Iterative Teacher-Aware Learning},
author={Luyao Yuan and Dongruo Zhou and Junhong Shen and Jingdong Gao and Jeffrey L. Chen and Quanquan Gu and Ying Nian Wu and Song-Chun Zhu},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2021}
}
Cite Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
@inproceedings{Shen2021TheoreticallyPD,
title={Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation},
author={Junhong Shen and Lin F. Yang},
booktitle={AAAI},
year={2021}
}
Cite Emergence of Pragmatics from Referential Game between Theory of Mind Agents
@article{Yuan2020EmergenceOP,
title={Emergence of Pragmatics from Referential Game between Theory of Mind Agents},
author={Luyao Yuan and Zipeng Fu and Jingyue Shen and Lu Xu and Junhong Shen and Song-Chun Zhu},
journal={NeurIPS 2019 Workshop on Emergent Communication},
year={2019}
}