sail-sg/FlowReasoner


This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic reasoning ability for generating multi-agent systems. Then, we further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training from the aspects of performance, complexity, and efficiency. In this manner, FlowReasoner is able to generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% accuracy across three benchmarks.

Installation

We follow MetaGPT to install the required dependencies. Please run the following commands:

git clone https://github.com/sail-sg/FlowReasoner 
cd code
pip install --upgrade -e .

All experiments are conducted on NVIDIA A100 GPUs with 80GB of memory.

Configure optimization parameters

Configure LLM parameters in config/config2.yaml (see examples/FlowReasoner/config2.example.yaml for reference):

models:
  "<model_name>":  # e.g. "gpt-4-turbo" or "gpt-3.5-turbo"
    api_type: "openai"  # or azure / ollama / groq, etc.
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
  "<model_name>":
    api_type: "openai"
    base_url: "<your base url>"
    api_key: "<your api key>"
    temperature: 0
CALC_USAGE: True

Run the inference

Using default parameters

python -m examples.FlowReasoner.optimize --dataset MATH

Or with custom parameters

python -m examples.FlowReasoner.optimize --dataset MATH --sample n --optimized_path xxx ...

Note that the test cases of each dataset should be split into two parts, stored under the keys val and test respectively. The val test cases are used to provide external execution feedback when optimizing the workflow.
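A minimal Python sketch of producing such a val/test split. The 1:1 ratio and the per-case field names are illustrative assumptions, not fixed by this repository; only the top-level val and test keys matter:

```python
import json
import random

def split_cases(cases, val_ratio=0.5, seed=0):
    """Split a list of test cases into the 'val'/'test' keys the optimizer reads."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * val_ratio)
    return {"val": shuffled[:cut], "test": shuffled[cut:]}

# Hypothetical MATH-style cases; the "problem"/"answer" fields are placeholders.
cases = [{"problem": f"q{i}", "answer": str(i)} for i in range(10)]
split = split_cases(cases)
payload = json.dumps(split)  # serialize for the dataset file
```

Adjust val_ratio to control how many cases feed the execution-feedback loop versus final evaluation.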

Training Stage

The SFT dataset is generated by the inference stage. SFT follows the standard training process of LLaMA-Factory, while RL is based on EasyRL.
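For orientation, a sketch of what one SFT record might look like in LLaMA-Factory's alpaca-style format (instruction/input/output fields). The concrete contents below are placeholders; the real records are produced by the inference stage above:

```python
import json

# Hypothetical SFT record; field names follow LLaMA-Factory's alpaca format,
# but the instruction/input/output contents here are illustrative only.
record = {
    "instruction": "Generate a multi-agent workflow for the user query.",
    "input": "Write a function that reverses a linked list.",
    "output": "<FlowReasoner reasoning trace and generated workflow>",
}

# LLaMA-Factory consumes such records as a JSON list.
sft_dataset = json.dumps([record], indent=2)
```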

Acknowledgments

This repository is based on the codebases of MetaGPT, LLaMA-Factory, and EasyRL. Thanks for their impressive work!

Citation

If you find our work helpful, please cite it as:

@misc{gao2025flowreasonerreinforcingquerylevelmetaagents,
      title={FlowReasoner: Reinforcing Query-Level Meta-Agents}, 
      author={Hongcheng Gao and Yue Liu and Yufei He and Longxu Dou and Chao Du and Zhijie Deng and Bryan Hooi and Min Lin and Tianyu Pang},
      year={2025},
      eprint={2504.15257},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.15257}, 
}
