Hongcheng Gao*, Yue Liu*, Yufei He, Longxu Dou, Chao Du, Zhijie Deng,
Bryan Hooi, Min Lin, Tianyu Pang†
*Equal Contribution † Corresponding Author
This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic reasoning ability to generate multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback, guided by a multi-purpose reward that covers performance, complexity, and efficiency. In this manner, FlowReasoner can generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks.
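For intuition, the multi-purpose reward can be viewed as trading off a performance signal from external execution feedback against the complexity and efficiency of the generated system. Below is a minimal sketch assuming an illustrative linear combination; the weights and penalty terms are our own placeholders, not the exact formulation from the paper:

```python
# Illustrative sketch of a multi-purpose reward combining performance,
# complexity, and efficiency signals. The weights and penalty definitions
# are assumptions for exposition, not the paper's exact formulation.
def multi_purpose_reward(pass_rate: float, num_agents: int, latency_s: float,
                         w_perf: float = 1.0, w_comp: float = 0.1,
                         w_eff: float = 0.01) -> float:
    performance = pass_rate          # fraction of val cases solved by the generated system
    complexity_penalty = num_agents  # fewer agents -> simpler system
    efficiency_penalty = latency_s   # faster execution -> more efficient system
    return w_perf * performance - w_comp * complexity_penalty - w_eff * efficiency_penalty
```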
We follow MetaGPT to install the required dependencies. Please run the following commands:
```bash
git clone https://github.com/sail-sg/FlowReasoner
cd code
pip install --upgrade -e .
```
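To verify the editable install, a quick import check (assuming the package keeps upstream MetaGPT's import name) is:

```python
# Quick import check; the package name follows upstream MetaGPT (an assumption).
import metagpt
print(metagpt.__file__)  # should point into the editable install under code/
```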
All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory.
Configure LLM parameters in config/config2.yaml (see examples/FlowReasoner/config2.example.yaml for reference):
```yaml
models:
  "<model_name>": # model: "gpt-4-turbo" # or gpt-3.5-turbo
    api_type: "openai" # or azure / ollama / groq etc.
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
  "<model_name>":
    api_type: "openai"
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
CALC_USAGE: True
```
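Before launching experiments, it can help to sanity-check the config file. The snippet below is a minimal sketch, assuming PyYAML is installed and the file sits at config/config2.yaml:

```python
# Minimal sanity check for config/config2.yaml (assumes PyYAML is installed).
import yaml

with open("config/config2.yaml") as f:
    cfg = yaml.safe_load(f)

for name, params in cfg.get("models", {}).items():
    # Each model entry should at least define an API type, base URL, and key.
    missing = [k for k in ("api_type", "base_url", "api_key") if k not in params]
    if missing:
        raise ValueError(f"model {name!r} is missing keys: {missing}")
print("config OK:", list(cfg.get("models", {})))
```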
```bash
python -m examples.FlowReasoner.optimize --dataset MATH
python -m examples.FlowReasoner.optimize --dataset MATH --sample n --optimized_path xxx ...
```
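To sweep the optimizer over several datasets, a small wrapper such as the one below can help; the dataset names other than MATH are assumptions, so check the values --dataset actually accepts:

```python
# Hypothetical wrapper that runs the optimizer over several datasets in turn.
# Dataset names other than MATH are assumptions; check the supported --dataset values.
import subprocess

for dataset in ["MATH", "HumanEval", "MBPP"]:
    subprocess.run(
        ["python", "-m", "examples.FlowReasoner.optimize", "--dataset", dataset],
        check=True,  # stop the sweep if any run fails
    )
```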
Note that the test cases of each dataset should be split into two parts, under the keys "val" and "test", respectively. The "val" test cases are used as external execution feedback to optimize the workflow.
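As a concrete illustration, the following sketch splits a dataset's test cases into the two keys; the file names and the 50/50 ratio are assumptions, not requirements of the repo:

```python
# Illustrative split of a dataset's test cases into "val" and "test" keys.
# File names and the 50/50 ratio are assumptions, not fixed by the repo.
import json
import random

with open("test_cases.json") as f:
    cases = json.load(f)  # a list of test-case dicts

random.seed(0)
random.shuffle(cases)
mid = len(cases) // 2
split = {"val": cases[:mid], "test": cases[mid:]}

with open("test_cases_split.json", "w") as f:
    json.dump(split, f, indent=2)
```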
The SFT dataset is generated during the inference stage. SFT follows the standard training process of LLaMA-Factory, while RL is based on EasyRL.
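As a hedged sketch of the data-preparation step, the snippet below converts inference-stage outputs into alpaca-style records that LLaMA-Factory can consume; the input field names (query, workflow_reasoning) and file names are hypothetical:

```python
# Hypothetical conversion of inference-stage outputs into alpaca-style
# SFT records for LLaMA-Factory. Input field and file names are assumptions.
import json

def to_sft_record(sample: dict) -> dict:
    return {
        "instruction": sample["query"],          # the user query
        "input": "",
        "output": sample["workflow_reasoning"],  # reasoning + generated system
    }

with open("inference_outputs.json") as f:
    samples = json.load(f)

with open("sft_dataset.json", "w") as f:
    json.dump([to_sft_record(s) for s in samples], f, indent=2)
```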
This repository builds on the codebases of MetaGPT, LLaMA-Factory, and EasyRL. Thanks for their impressive work!
If you find our work helpful, please cite it as:
```bibtex
@misc{gao2025flowreasonerreinforcingquerylevelmetaagents,
  title={FlowReasoner: Reinforcing Query-Level Meta-Agents},
  author={Hongcheng Gao and Yue Liu and Yufei He and Longxu Dou and Chao Du and Zhijie Deng and Bryan Hooi and Min Lin and Tianyu Pang},
  year={2025},
  eprint={2504.15257},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2504.15257},
}
```