Research

I’m interested in machine learning and deep learning in general. My current research focuses mainly on the practical side of ML, i.e., developing effective ML tools and pipelines for diverse real-world applications. I’m particularly interested in enhancing LLMs’ interaction with real-world applications by developing efficient, unified multi-modal models and building LLM agents capable of interacting with environments (e.g., browsers, command lines, IDEs) and users. I also study:

Automated machine learning (AutoML): how do we use neural architecture search (NAS) to generate effective, task-specific neural network architectures for different downstream problems?
Transfer learning to scientific domains: how can we effectively leverage existing large-scale pretrained models to solve problems outside the model’s pretraining domain and modality?

Talks

Cross-Modal Fine-Tuning, AI4Science Talks, March 20, 2023.
DASH: How to Search Over Convolutions, The AutoML Podcast, December 19, 2022.
Tackling Diverse Tasks with Neural Architecture Search, Deep Learning Machine Learning Journal Club, Mayo Clinic, October 17, 2022.

Publications

Junhong Shen*, Hao Bai*, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Preprint, 2025.

The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout.

paper | code | website | blog post

Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar
CodePDE: Benchmarking LLMs' Abilities to Solve PDEs through Code Generation

Preprint, 2025.

We frame PDE solving as a code generation task and introduce CodePDE, the first inference framework for generating PDE solvers using large language models (LLMs). Leveraging advanced inference-time algorithms and scaling strategies, CodePDE unlocks critical capabilities of LLMs for PDE solving (reasoning, debugging, self-refinement, and test-time scaling), all without task-specific tuning.

paper | code

Weixin Liang*, Junhong Shen*, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity

In ICLR Scalable Optimization for Efficient and Adaptive Foundation Models Workshop, 2025 (Oral, top 8/96).

We propose Mixture-of-Mamba, a novel SSM architecture that introduces modality-aware sparsity through modality-specific parameterization of the Mamba block. Building on Mixture-of-Transformers, we extend the benefits of modality-aware sparsity to SSMs while preserving their computational efficiency.

paper | code

Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou
CAT: Content-Adaptive Image Tokenization

Preprint, 2025.

Most existing image tokenizers encode images into a fixed number of tokens or patches. We introduce Content-Adaptive Tokenizer (CAT), which dynamically adjusts representation capacity based on the image content and encodes simpler images into fewer tokens. We design a caption-based evaluation system that leverages LLMs to predict content complexity and determine the optimal compression ratio for a given image.

paper

Junhong Shen, Atishay Jain, Zedian Xiao, Ishan Amlekar, Mouad Hadji, Aaron Podolny, Ameet Talwalkar
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

In ICLR Foundation Models in the Wild Workshop, 2025.

Most LLM-based web agents rely on prompting general-purpose, proprietary models like GPT-4. We explore an alternative approach that fine-tunes open-source LLMs using production-scale workflow data. This simple yet effective approach achieves SOTA direct generation performance on Mind2Web and improves the task success rate by 7.3% over the previous best text-only web agents on WebArena.

paper | code

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng, Alexander Shen, Junhong Shen, Ameet Talwalkar, Mikhail Khodak
Specialized Foundation Models Struggle to Beat Supervised Baselines

In NeurIPS FM4Science Workshop, 2024.

We look at three modalities (genomics, satellite imaging, and time series) with multiple recent FMs and compare them to a standard supervised learning workflow (model development, hyperparameter tuning, and training, all using only data from the target task). We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.

paper | code

Junhong Shen, Neil Tenenholtz, James Brian Hall, David Alvarez-Melis, Nicolo Fusi
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

In ICML, 2024.

LLMs demonstrate proficiency in understanding natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into specialized task solvers through learning custom input tags to condition the LLM.

paper | code

Junhong Shen*, Mikhail Khodak*, Ameet Talwalkar
Efficient Architecture Search for Diverse Tasks

In NeurIPS, 2022.

DASH is developed for efficiently solving diverse ML problems outside of the well-researched domains such as vision and natural language processing. Being fast, simple, and broadly applicable, DASH fixes a standard CNN topology and searches for the right kernel sizes and dilation rates that its operations should take on. It expands the network capacity to extract features at multiple resolutions for different types of data while only requiring searching over the operation space.

paper | code | blog post

Renbo Tu*, Nicholas Roberts*, Mikhail Khodak, Junhong Shen, Frederic Sala, Ameet Talwalkar
NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks

In NeurIPS Datasets and Benchmarks Track, 2022.

Neural architecture search (NAS) benchmarks and methods prioritize performance on well-studied tasks, e.g., image classification on CIFAR and ImageNet. To mitigate this bias, NAS-Bench-360 is a benchmark suite for evaluating state-of-the-art NAS methods on a diverse set of tasks. The selection spans different application domains, dataset sizes, problem dimensionalities, and learning objectives.

paper | code | website | blog post

Luyao Yuan, Dongruo Zhou, Junhong Shen, Jingdong Gao, Jeffrey L. Chen, Quanquan Gu, Ying Nian Wu, Song-Chun Zhu
Iterative Teacher-Aware Learning

In NeurIPS, 2021.

In this paper, we propose a gradient-optimization-based teacher-aware learner that incorporates the teacher’s cooperative intention into the likelihood function and learns provably faster than the naive learning algorithms used in previous machine teaching work.

paper | code

Junhong Shen, Lin F. Yang
Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

In AAAI, 2021.

We propose a theoretically principled nearest neighbor (NN) function approximator that can replace the value networks in deep RL methods. Inspired by human similarity judgments, the NN approximator estimates the action values using rollouts on past observations and can provably obtain a small regret bound that depends only on the intrinsic complexity of the environment.

paper | code | slides

Junhong Shen, Abdul Hannan Faruqi, Yifan Jiang, Nima Maftoon
Mathematical Reconstruction of Patient-Specific Vascular Networks Based on Clinical Images and Global Optimization

In IEEE Access, 2021.

We developed a computational framework that takes 3D medical images as input and reconstructs complete, patient-specific vascular network models using a mathematical optimization procedure. Our framework extracts major vessels from the images and uses the organ geometry to select vessel termination points. Then, it generates the remainder of the network based on physiological optimality principles.

paper | code | slides

Luyao Yuan, Zipeng Fu, Jingyue Shen, Lu Xu, Junhong Shen, Song-Chun Zhu
Emergence of Pragmatics from Referential Game between Theory of Mind Agents

In Emergent Communication Workshop, NeurIPS, 2019.

We integrate theory of mind (ToM) into a cooperative multi-agent pedagogical setting and propose an adaptive reinforcement learning (RL) algorithm to develop a communication protocol.

paper | website

Cite Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

@misc{shenbai2025tti,
 title={Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction}, 
 author={Junhong Shen and Hao Bai and Lunjun Zhang and Yifei Zhou and Amrith Setlur and Shengbang Tong and Diego Caples and Nan Jiang and Tong Zhang and Ameet Talwalkar and Aviral Kumar},
 year={2025},
 eprint={2506.07976},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2506.07976},
 }

Cite CodePDE: Benchmarking LLMs' Abilities to Solve PDEs through Code Generation

@misc{li2025codepde,
 title={CodePDE: An Inference Framework for LLM-driven PDE Solver Generation},
 author={Shanda Li and Tanya Marwah and Junhong Shen and Weiwei Sun and Andrej Risteski and Yiming Yang and Ameet Talwalkar},
 year={2025},
 eprint={2505.08783},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2505.08783},
 }

Cite Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity

@misc{liangshen2025mixtureofmamba,
 title={Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity},
 author={Weixin Liang and Junhong Shen and Genghan Zhang and Ning Dong and Luke Zettlemoyer and Lili Yu},
 year={2025},
 eprint={2501.16295},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2501.16295},
 }

Cite CAT: Content-Adaptive Image Tokenization

@misc{shen2024adaptivetokenizer,
 title={CAT: Content-Adaptive Image Tokenization},
 author={Junhong Shen and Kushal Tirumala and Michihiro Yasunaga and Ishan Misra and Luke Zettlemoyer and Lili Yu and Chunting Zhou},
 year={2025},
 eprint={2501.03120},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 }

Cite ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

@misc{shen2024scribeagent,
 title={ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data},
 author={Junhong Shen and Atishay Jain and Zedian Xiao and Ishan Amlekar and Mouad Hadji and Aaron Podolny and Ameet Talwalkar},
 year={2024},
 eprint={2411.15004},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 }

Cite Specialized Foundation Models Struggle to Beat Supervised Baselines

@misc{xu2024specializedfm,
 title={Specialized Foundation Models Struggle to Beat Supervised Baselines},
 author={Zongzhe Xu and Ritvik Gupta and Wenduo Cheng and Alexander Shen and Junhong Shen and Ameet Talwalkar and Mikhail Khodak},
 year={2024},
 eprint={2411.02796},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 }

Cite UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation

@misc{shen2024ups,
 title={UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation},
 author={Junhong Shen and Tanya Marwah and Ameet Talwalkar},
 year={2024},
 eprint={2403.07187},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
 }

Cite Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

@misc{shen2024tagllm,
 title={Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains}, 
 author={Junhong Shen and Neil Tenenholtz and James Brian Hall and David Alvarez-Melis and Nicolo Fusi},
 year={2024},
 eprint={2402.05140},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
 }

Cite Cross-Modal Fine-Tuning: Align then Refine

@inproceedings{shen2023orca,
 author = {Shen, Junhong and Li, Liam and Dery, Lucio M. and Staten, Corey and Khodak, Mikhail and Neubig, Graham and Talwalkar, Ameet},
 title = {Cross-Modal Fine-Tuning: Align then Refine},
 booktitle = {International Conference on Machine Learning (ICML)},
 year = {2023},
 url = {https://arxiv.org/abs/2302.05738}
 }

Cite Efficient Architecture Search for Diverse Tasks

@inproceedings{shen2022efficient,
 title={Efficient Architecture Search for Diverse Tasks},
 author={Shen, Junhong and Khodak, Mikhail and Talwalkar, Ameet},
 booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
 year={2022}
 }

Cite NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks

@inproceedings{nasbench360,
 title={NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks},
 author={Renbo Tu and Nicholas Roberts and Mikhail Khodak and Junhong Shen and Frederic Sala and Ameet Talwalkar},
 booktitle={Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track},
 year={2022}
 }

Cite Iterative Teacher-Aware Learning

@inproceedings{yuan2021iterative,
 title={Iterative Teacher-Aware Learning},
 author={Luyao Yuan and Dongruo Zhou and Junhong Shen and Jingdong Gao and Jeffrey L. Chen and Quanquan Gu and Ying Nian Wu and Song-Chun Zhu},
 booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
 year={2021}
 }

Cite Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

@inproceedings{Shen2021TheoreticallyPD,
 title={Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation},
 author={Junhong Shen and Lin F. Yang},
 booktitle={AAAI},
 year={2021}
 }

Cite Mathematical Reconstruction of Patient-Specific Vascular Networks Based on Clinical Images and Global Optimization

@article{shen2021reconstruction,
 author={Shen, Junhong and Faruqi, Abdul Hannan and Jiang, Yifan and Maftoon, Nima},
 journal={IEEE Access}, 
 title={Mathematical Reconstruction of Patient-Specific Vascular Networks Based on Clinical Images and Global Optimization}, 
 year={2021},
 volume={9},
 pages={20648-20661}
 }

Cite Emergence of Pragmatics from Referential Game between Theory of Mind Agents

@article{Yuan2020EmergenceOP,
 title={Emergence of Pragmatics from Referential Game between Theory of Mind Agents},
 author={Luyao Yuan and Zipeng Fu and Jingyue Shen and Lu Xu and Junhong Shen and Song-Chun Zhu},
 journal={NeurIPS 2019 Workshop on Emergent Communication},
 year={2019}
 }