Junhong Shen
Ph.D. Student
Machine Learning Department
Carnegie Mellon University
Advisor: Ameet Talwalkar
Email: junhongs@andrew.cmu.edu
Office: Gates Hillman Centers 8226
Google Scholar GitHub Twitter
Bio
I'm a 4th-year Ph.D. student in the Machine Learning Department at CMU, advised by Ameet Talwalkar. My work centers on developing efficient, multi-modal models that bridge the gap between research and real-world applications. I'm particularly interested in enhancing LLMs' abilities to model diverse data types and applying them to long-tail, low-resource domains such as science and business.
I obtained my B.S. in Mathematics of Computation at UCLA, where I was fortunate to work with Lin Yang on sample-efficient reinforcement learning. I have also worked on multi-agent RL and Theory of Mind, advised by Song-Chun Zhu and Ying Nian Wu. My Ph.D. is supported by the J.P. Morgan AI PhD Fellowship.
News
- Sep 2024: Grateful to be awarded the J.P. Morgan AI PhD Fellowship (accepted) and the Bloomberg PhD Fellowship (declined)!
- Jun 2024: I joined FAIR at Meta as a research intern! Stay tuned for our work on efficient image/video tokenization!
- May 2024: My internship work at Microsoft Research, Tag-LLM, is accepted by ICML 2024!
- Feb 2024: I joined Scribe as a senior research engineer. Stay tuned for our work on LLM web agents!
- Apr 2023: Our work on cross-modal fine-tuning is accepted by ICML 2023 as an oral presentation!
- Oct 2022: We are organizing the 2022 AutoML Decathlon. Check it out and participate!
- Sep 2022: Our work on NAS for diverse tasks is accepted by NeurIPS 2022 and we’ve released the code.
Selected Publications
For a full list of publications, see Research.
Junhong Shen,
Tanya Marwah,
Ameet Talwalkar
UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
In
TMLR 2024 & ICML AI4Science Workshop,
2024 (Spotlight).
UPS is developed for solving diverse spatiotemporal PDEs defined over various domains, dimensions, and resolutions. It unifies different PDEs into a consistent representation space and processes diverse collections of PDE data using a unified network architecture that combines LLMs with domain-specific neural operators.
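To make "a consistent representation space" concrete, here is a minimal sketch, assuming a deliberately simple scheme: every PDE snapshot, whether a 1D signal or a 2D grid with any number of physical channels, is resampled onto a fixed res x res grid and zero-padded to a fixed channel count, with a mask recording which channels are real. The helper names (`to_unified`, `_as_2d`, `_resample`) and the nearest-neighbour resampling are hypothetical choices for illustration; UPS's actual representation, and how it encodes coordinates and PDE metadata, is described in the paper.

```python
from typing import List, Tuple, Union

Grid = List[List[float]]
Channel = Union[List[float], Grid]  # a 1D signal or a 2D grid


def _as_2d(channel: Channel) -> Grid:
    """Treat a 1D signal as a single-row 2D grid; pass 2D grids through unchanged."""
    if channel and isinstance(channel[0], list):
        return channel  # type: ignore[return-value]
    return [list(channel)]  # type: ignore[arg-type]


def _resample(grid: Grid, res: int) -> Grid:
    """Nearest-neighbour resampling of an h x w grid onto a res x res grid."""
    h, w = len(grid), len(grid[0])
    return [[grid[i * h // res][j * w // res] for j in range(res)] for i in range(res)]


def to_unified(channels: List[Channel], max_channels: int, res: int) -> Tuple[List[Grid], List[int]]:
    """Map one PDE snapshot to a fixed (max_channels, res, res) layout plus a channel mask."""
    if len(channels) > max_channels:
        raise ValueError("too many physical channels for the unified space")
    grids = [_resample(_as_2d(c), res) for c in channels]
    mask = [1] * len(grids) + [0] * (max_channels - len(grids))
    # Zero-fill unused channels so every PDE ends up with the same shape.
    grids += [[[0.0] * res for _ in range(res)] for _ in range(max_channels - len(grids))]
    return grids, mask
```

With this layout, a single-channel 1D field on 8 points and a two-channel 8x8 field both become three 4x4 channels, so one network can consume either; the mask lets downstream layers ignore padded channels.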
Junhong Shen,
Neil Tenenholtz, James Brian Hall, David Alvarez-Melis,
Nicolo Fusi
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
In
ICML,
2024.
LLMs have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into specialized task solvers through a novel, model-agnostic framework for learning custom input tags to condition the LLM.
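The core idea of conditioning a frozen model with a learned input tag can be illustrated with a toy example. The sketch below is a minimal illustration under loudly stated assumptions: the "LLM" is a hypothetical stand-in (mean pooling followed by a fixed linear readout), the embedding table `FROZEN_EMBED` and readout `FROZEN_READOUT` are made-up values, and a single tag vector is prepended and trained by gradient descent while everything else stays frozen. Tag-LLM's actual tag types, placement, and training protocol are described in the paper.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical frozen "pretrained" pieces: a token embedding table and a readout.
FROZEN_EMBED: Dict[str, Vector] = {"C": [0.5, 0.1], "O": [0.2, 0.4]}
FROZEN_READOUT: Vector = [1.0, -1.0]


def frozen_model(seq: List[Vector]) -> float:
    """Toy stand-in for a frozen LLM: mean-pool the sequence, then a fixed linear readout."""
    n, dim = len(seq), len(seq[0])
    pooled = [sum(v[i] for v in seq) / n for i in range(dim)]
    return sum(w * p for w, p in zip(FROZEN_READOUT, pooled))


def with_tag(tag: Vector, tokens: List[str]) -> List[Vector]:
    """Prepend a learnable tag embedding to the frozen token embeddings."""
    return [tag] + [FROZEN_EMBED[t] for t in tokens]


def train_tag(tokens: List[str], target: float, steps: int = 200, lr: float = 1.0) -> Vector:
    """Fit only the tag vector by gradient descent on squared error; frozen parts are untouched."""
    tag = [0.0] * len(FROZEN_READOUT)
    for _ in range(steps):
        seq = with_tag(tag, tokens)
        err = frozen_model(seq) - target
        n = len(seq)
        # d(err^2)/d(tag) = 2 * err * FROZEN_READOUT / n for mean pooling + linear readout.
        tag = [t - lr * 2.0 * err * w / n for t, w in zip(tag, FROZEN_READOUT)]
    return tag
```

For example, `train_tag(["C", "O"], 2.0)` returns a tag that steers the frozen toy model's output on that input to 2.0, while `FROZEN_EMBED` and `FROZEN_READOUT` are never modified. Only the tag's few parameters are trained, which is what makes tag-style conditioning cheap relative to full fine-tuning.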
Junhong Shen,
Liam Li, Lucio Dery, Corey Staten, Mikhail Khodak, Graham Neubig,
Ameet Talwalkar
Cross-Modal Fine-Tuning: Align then Refine
In
ICML,
2023 (Oral).
ORCA is a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. It adapts to a target task via an align-then-refine workflow. Given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities.
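As a rough illustration of the "align" stage, the sketch below fits an embedder so that embedded target features match the pretraining modality's feature distribution. It assumes 1D features and a one-parameter-pair affine embedder (`fit_affine_alignment` and `embed` are hypothetical names), and it swaps in simple mean-and-standard-deviation matching as the alignment criterion in place of ORCA's own distribution distance. The "refine" stage (fine-tuning the pretrained model on the embedded data) is not shown.

```python
import statistics
from typing import List, Tuple


def fit_affine_alignment(target: List[float], source: List[float]) -> Tuple[float, float]:
    """Fit z = a * x + b so embedded target features match the source's mean and std."""
    sd_t = statistics.pstdev(target)
    if sd_t == 0:
        raise ValueError("target features are constant; nothing to rescale")
    a = statistics.pstdev(source) / sd_t
    b = statistics.fmean(source) - a * statistics.fmean(target)
    return a, b


def embed(xs: List[float], a: float, b: float) -> List[float]:
    """Apply the fitted affine embedder to target-modality features."""
    return [a * x + b for x in xs]
```

After fitting, `embed(target, a, b)` has the same first two moments as the source features, so the pretrained model sees inputs whose statistics resemble its pretraining data before any of its weights are updated.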
Junhong Shen*,
Mikhail Khodak*,
Ameet Talwalkar
Efficient Architecture Search for Diverse Tasks
In
NeurIPS,
2022.
DASH is designed to efficiently solve diverse ML problems outside well-researched domains such as vision and natural language processing. To stay fast, simple, and broadly applicable, DASH fixes a standard CNN topology and searches only for the kernel sizes and dilation rates its operations should use. This expands the network's capacity to extract features at multiple resolutions for different types of data while requiring a search over the operation space alone.
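One property that makes searching over kernel sizes and dilation rates cheap is that convolution is linear in the kernel: a weighted mixture of convolutions with different (size, dilation) choices equals a single convolution with one combined, zero-padded kernel. The 1D sketch below checks this numerically. It illustrates the general idea only and is not DASH's implementation; the search space {3, 5} x {1, 2}, the softmax-weighted mixture, and all helper names are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Op = Tuple[int, int]  # (kernel size, dilation rate)


def dilate(kernel: List[float], d: int) -> List[float]:
    """Insert d - 1 zeros between taps; a size-k kernel then spans (k - 1) * d + 1 inputs."""
    out = [0.0] * ((len(kernel) - 1) * d + 1)
    for i, w in enumerate(kernel):
        out[i * d] = w
    return out


def center_pad(kernel: List[float], size: int) -> List[float]:
    """Zero-pad an odd-length kernel symmetrically to an odd target size."""
    pad = (size - len(kernel)) // 2
    return [0.0] * pad + kernel + [0.0] * pad


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """'Same'-length 1D cross-correlation with zero padding and an odd-length kernel."""
    r = len(kernel) // 2
    return [
        sum(w * x[i + j - r] for j, w in enumerate(kernel) if 0 <= i + j - r < len(x))
        for i in range(len(x))
    ]


def softmax(scores: Dict[Op, float]) -> Dict[Op, float]:
    """Turn architecture scores into mixture weights that sum to 1."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def mixed_kernel(ops: Dict[Op, List[float]], alphas: Dict[Op, float]) -> List[float]:
    """Collapse a weighted mixture of (size, dilation) ops into one large kernel."""
    dilated = {op: dilate(w, op[1]) for op, w in ops.items()}
    size = max(len(v) for v in dilated.values())
    mixed = [0.0] * size
    for op, v in dilated.items():
        for i, w in enumerate(center_pad(v, size)):
            mixed[i] += alphas[op] * w
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    ops = {(k, d): [rng.uniform(-1, 1) for _ in range(k)] for k in (3, 5) for d in (1, 2)}
    alphas = softmax({op: rng.uniform(-1, 1) for op in ops})
    x = [rng.uniform(-1, 1) for _ in range(16)]
    one_pass = conv1d_same(x, mixed_kernel(ops, alphas))
    print(len(mixed_kernel(ops, alphas)), one_pass[:3])
```

The design point is that evaluating the whole mixture costs one convolution with the largest effective kernel (here 9 taps) instead of one convolution per candidate operation.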
Renbo Tu*,
Nicholas Roberts*,
Mikhail Khodak,
Junhong Shen,
Frederic Sala,
Ameet Talwalkar
NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
In
NeurIPS Datasets and Benchmarks Track,
2022.
Neural architecture search (NAS) benchmarks and methods prioritize performance on well-studied tasks, e.g., image classification on CIFAR and ImageNet. To mitigate this bias, NAS-Bench-360 is a benchmark suite for evaluating state-of-the-art NAS methods on a diverse set of tasks. The selection spans different application domains, dataset sizes, problem dimensionalities, and learning objectives.
Luyao Yuan,
Dongruo Zhou,
Junhong Shen,
Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu,
Song-Chun Zhu
Iterative Teacher-Aware Learning
In
NeurIPS,
2021.
In this paper, we propose a gradient-optimization-based, teacher-aware learner that incorporates the teacher's cooperative intention into the likelihood function and provably learns faster than the naive learning algorithms used in previous machine teaching work.
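Why modelling the teacher's intention speeds up learning can be seen in a tiny discrete setting. The sketch below swaps in a classic one-step cooperative-inference example rather than the paper's iterative, gradient-based learner: hypotheses are hypothetical sets of examples, a naive learner uses strong sampling (P(x | h) = 1/|h|), and a teacher-aware learner instead asks which example a cooperative teacher would have chosen for each hypothesis. All names and the toy hypotheses are assumptions for illustration.

```python
import math
from typing import Dict, Set

Hypotheses = Dict[str, Set[str]]


def naive_posterior(x: str, hyps: Hypotheses) -> Dict[str, float]:
    """Teacher-agnostic learner: P(x | h) = 1/|h| if x is in h (strong sampling), uniform prior."""
    scores = {h: (1.0 / len(s) if x in s else 0.0) for h, s in hyps.items()}
    z = sum(scores.values())
    return {h: v / z for h, v in scores.items()}


def teacher_prob(x: str, h: str, hyps: Hypotheses) -> float:
    """Cooperative teacher for h: picks the consistent examples that best convince a naive learner."""
    candidates = sorted(hyps[h])
    post = {e: naive_posterior(e, hyps)[h] for e in candidates}
    best = max(post.values())
    winners = [e for e in candidates if math.isclose(post[e], best)]
    return 1.0 / len(winners) if x in winners else 0.0


def teacher_aware_posterior(x: str, hyps: Hypotheses) -> Dict[str, float]:
    """Teacher-aware learner: the likelihood models why the teacher chose x.

    Raises ZeroDivisionError if no cooperative teacher would ever show x.
    """
    scores = {h: teacher_prob(x, h, hyps) for h in hyps}
    z = sum(scores.values())
    return {h: v / z for h, v in scores.items()}
```

With `hyps = {"h1": {"a"}, "h2": {"a", "b"}}`, seeing example "a" leaves the naive learner at 2/3 on `h1`, whereas the teacher-aware learner reasons that a teacher for `h2` would have shown the more informative "b", and so concludes `h1` with certainty from the same single example.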
Junhong Shen,
Lin F. Yang
Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
In
AAAI,
2021.
We propose a theoretically principled nearest neighbor (NN) function approximator that can replace the value networks in deep RL methods. Inspired by human similarity judgments, the NN approximator estimates the action values using rollouts on past observations and can provably obtain a small regret bound that depends only on the intrinsic complexity of the environment.
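The basic mechanics of replacing a value network with a nearest-neighbour estimate can be sketched as follows. This is a generic k-nearest-neighbour illustration, not the paper's estimator: it assumes a hypothetical memory of (state, action, observed return) entries from past rollouts and estimates Q(s, a) by averaging the returns of the k closest stored states that took action a. The paper's approximator and its regret analysis are more involved.

```python
import math
from typing import List, Sequence, Tuple

# One memory entry: (state, action, observed Monte Carlo return from that state-action).
Entry = Tuple[Tuple[float, ...], int, float]


def knn_q(memory: List[Entry], state: Sequence[float], action: int, k: int = 2) -> float:
    """Estimate Q(state, action) as the mean return of the k nearest stored states that took `action`."""
    neighbours = sorted((math.dist(s, state), g) for s, a, g in memory if a == action)
    if not neighbours:
        raise ValueError("no stored experience for this action")
    top = neighbours[:k]
    return sum(g for _, g in top) / len(top)


def greedy_action(memory: List[Entry], state: Sequence[float], actions: Sequence[int], k: int = 2) -> int:
    """Act greedily with respect to the nearest-neighbour value estimates."""
    return max(actions, key=lambda a: knn_q(memory, state, a, k))
```

Because the estimate is built directly from stored rollouts, it needs no gradient training; its quality depends on how densely past observations cover the states the agent actually visits.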
Luyao Yuan,
Zipeng Fu, Jingyue Shen, Lu Xu,
Junhong Shen,
Song-Chun Zhu
Emergence of Pragmatics from Referential Game between Theory of Mind Agents
In
Emergent Communication Workshop, NeurIPS,
2019.
We integrate theory of mind (ToM) into a cooperative multi-agent pedagogical setting and propose an adaptive reinforcement learning (RL) algorithm for developing a communication protocol.
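For readers unfamiliar with how pragmatics can arise from agents reasoning about each other in a referential game, the sketch below swaps in the Rational Speech Acts (RSA) model, a standard recursive theory-of-mind account of pragmatics, rather than the paper's learned RL agents. The objects, messages, and helper names are hypothetical: a literal listener interprets messages by truth alone, a speaker picks messages the literal listener would resolve correctly, and a pragmatic listener inverts that speaker.

```python
from typing import Dict

OBJECTS = ["blue square", "blue circle", "green square"]
MESSAGES = ["blue", "green", "square", "circle"]


def is_true(message: str, obj: str) -> bool:
    """A message is literally true of an object if it names one of its features."""
    return message in obj.split()


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


def literal_listener(message: str) -> Dict[str, float]:
    """L0: uniform over the objects the message is literally true of."""
    return normalize({o: float(is_true(message, o)) for o in OBJECTS})


def speaker(obj: str) -> Dict[str, float]:
    """S1: prefers messages that make L0 pick obj (rationality 1, no message cost)."""
    return normalize({m: literal_listener(m)[obj] for m in MESSAGES})


def pragmatic_listener(message: str) -> Dict[str, float]:
    """L1: reasons about which object S1 was most likely describing (uniform prior)."""
    return normalize({o: speaker(o)[message] for o in OBJECTS})
```

Hearing "square", the literal listener is split evenly between the two squares, but the pragmatic listener puts 0.6 on the blue square: a speaker describing the green square would more likely have said the unambiguous "green".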
Cite UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
@misc{shen2024ups,
title={UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation},
author={Junhong Shen and Tanya Marwah and Ameet Talwalkar},
year={2024},
eprint={2403.07187},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Cite Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
@misc{shen2024tagllm,
title={Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains},
author={Junhong Shen and Neil Tenenholtz and James Brian Hall and David Alvarez-Melis and Nicolo Fusi},
year={2024},
eprint={2402.05140},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Cite Cross-Modal Fine-Tuning: Align then Refine
@inproceedings{shen2023orca,
author = {Shen, Junhong and Li, Liam and Dery, Lucio M. and Staten, Corey and Khodak, Mikhail and Neubig, Graham and Talwalkar, Ameet},
title = {Cross-Modal Fine-Tuning: Align then Refine},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2023},
url = {https://arxiv.org/abs/2302.05738}
}
Cite Efficient Architecture Search for Diverse Tasks
@inproceedings{shen2022efficient,
title={Efficient Architecture Search for Diverse Tasks},
author={Shen, Junhong and Khodak, Mikhail and Talwalkar, Ameet},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2022}
}
Cite NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
@inproceedings{nasbench360,
title={NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks},
author={Renbo Tu and Nicholas Roberts and Mikhail Khodak and Junhong Shen and Frederic Sala and Ameet Talwalkar},
booktitle={Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track},
year={2022}
}
Cite Iterative Teacher-Aware Learning
@inproceedings{yuan2021iterative,
title={Iterative Teacher-Aware Learning},
author={Luyao Yuan and Dongruo Zhou and Junhong Shen and Jingdong Gao and Jeffrey L. Chen and Quanquan Gu and Ying Nian Wu and Song-Chun Zhu},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2021}
}
Cite Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
@inproceedings{Shen2021TheoreticallyPD,
title={Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation},
author={Junhong Shen and Lin F. Yang},
booktitle={AAAI},
year={2021}
}
Cite Emergence of Pragmatics from Referential Game between Theory of Mind Agents
@article{Yuan2020EmergenceOP,
title={Emergence of Pragmatics from Referential Game between Theory of Mind Agents},
author={Luyao Yuan and Zipeng Fu and Jingyue Shen and Lu Xu and Junhong Shen and Song-Chun Zhu},
journal={NeurIPS 2019 Workshop on Emergent Communication},
year={2019}
}