Junhong Shen
Ph.D. Student
Carnegie Mellon University
Advisor: Ameet Talwalkar
Email: junhongs@andrew.cmu.edu
Office: Gates Hillman Centers 8226
Google Scholar · GitHub · Twitter
Bio
I’m a 4th-year Ph.D. student in the Machine Learning Department at CMU, advised by Ameet Talwalkar. My work centers on enhancing LLMs’ interaction with real-world applications, in particular by building multi-modal models and agent systems that operate in environments such as browsers, command lines, and IDEs. I’m also interested in enhancing LLMs’ ability to model diverse data types and in applying them to long-tail, low-resource domains such as science and business.
I obtained my B.S. in Mathematics of Computation from UCLA, where I was fortunate to work with Lin Yang on sample-efficient reinforcement learning. I have also worked on multi-agent RL and Theory of Mind, advised by Song-Chun Zhu and Ying Nian Wu. My Ph.D. is supported by the J.P. Morgan AI Ph.D. Fellowship.
I'm looking for a summer 2025 research internship. Please email me if there's a fit!
News
- Dec 2024: My internship work at Meta FAIR has been released! Check out the Content-Adaptive Tokenizer (CAT) for adaptive image tokenization!
- Nov 2024: Check out our latest work on web agents built on top of open-source LLMs! See the ScribeAgent paper, code, and blog post.
- Sep 2024: Grateful to be awarded the J.P. Morgan AI PhD Fellowship (accepted) and the Bloomberg PhD Fellowship (declined)!
- May 2024: My internship work at Microsoft Research, Tag-LLM, has been accepted to ICML 2024!
- Apr 2023: Our work on cross-modal fine-tuning has been accepted to ICML 2023 as an oral presentation!
- Oct 2022: We are organizing the 2022 AutoML Decathlon. Check it out and participate!
- Sep 2022: Our work on NAS for diverse tasks has been accepted to NeurIPS 2022, and we've released the code.
Selected Publications
CAT: Content-Adaptive Image Tokenization
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Cross-Modal Fine-Tuning: Align then Refine
Efficient Architecture Search for Diverse Tasks
NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
Iterative Teacher-Aware Learning
Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
Emergence of Pragmatics from Referential Game between Theory of Mind Agents
Cite CAT: Content-Adaptive Image Tokenization
@misc{shen2025adaptivetokenizer,
title={CAT: Content-Adaptive Image Tokenization},
author={Junhong Shen and Kushal Tirumala and Michihiro Yasunaga and Ishan Misra and Luke Zettlemoyer and Lili Yu and Chunting Zhou},
year={2025},
eprint={2501.03120},
archivePrefix={arXiv},
primaryClass={cs.CV},
}
Cite ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
@misc{shen2024scribeagent,
title={ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data},
author={Junhong Shen and Atishay Jain and Zedian Xiao and Ishan Amlekar and Mouad Hadji and Aaron Podolny and Ameet Talwalkar},
year={2024},
eprint={2411.15004},
archivePrefix={arXiv},
primaryClass={cs.CL},
}
Cite UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
@misc{shen2024ups,
title={UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation},
author={Junhong Shen and Tanya Marwah and Ameet Talwalkar},
year={2024},
eprint={2403.07187},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Cite Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
@misc{shen2024tagllm,
title={Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains},
author={Junhong Shen and Neil Tenenholtz and James Brian Hall and David Alvarez-Melis and Nicolo Fusi},
year={2024},
eprint={2402.05140},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Cite Cross-Modal Fine-Tuning: Align then Refine
@inproceedings{shen2023orca,
title={Cross-Modal Fine-Tuning: Align then Refine},
author={Shen, Junhong and Li, Liam and Dery, Lucio M. and Staten, Corey and Khodak, Mikhail and Neubig, Graham and Talwalkar, Ameet},
booktitle={International Conference on Machine Learning (ICML)},
year={2023},
url={https://arxiv.org/abs/2302.05738}
}
Cite Efficient Architecture Search for Diverse Tasks
@inproceedings{shen2022efficient,
title={Efficient Architecture Search for Diverse Tasks},
author={Shen, Junhong and Khodak, Mikhail and Talwalkar, Ameet},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2022}
}
Cite NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
@inproceedings{nasbench360,
title={NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks},
author={Renbo Tu and Nicholas Roberts and Mikhail Khodak and Junhong Shen and Frederic Sala and Ameet Talwalkar},
booktitle={Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track},
year={2022}
}
Cite Iterative Teacher-Aware Learning
@inproceedings{yuan2021iterative,
title={Iterative Teacher-Aware Learning},
author={Luyao Yuan and Dongruo Zhou and Junhong Shen and Jingdong Gao and Jeffrey L. Chen and Quanquan Gu and Ying Nian Wu and Song-Chun Zhu},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2021}
}
Cite Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
@inproceedings{shen2021theoretically,
title={Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation},
author={Junhong Shen and Lin F. Yang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
year={2021}
}
Cite Emergence of Pragmatics from Referential Game between Theory of Mind Agents
@inproceedings{yuan2019emergence,
title={Emergence of Pragmatics from Referential Game between Theory of Mind Agents},
author={Luyao Yuan and Zipeng Fu and Jingyue Shen and Lu Xu and Junhong Shen and Song-Chun Zhu},
booktitle={NeurIPS 2019 Workshop on Emergent Communication},
year={2019}
}