A step-by-step guide to understanding the open-source multi-agent AI system
OpenManus is an open-source project aimed at replicating the capabilities of Manus AI, a groundbreaking general-purpose AI system. It uses a modular, containerized framework built with Docker, Python, and JavaScript to create a multi-agent AI system capable of autonomously executing complex tasks.
The system can handle diverse tasks, from personalized travel planning to stock analysis, by having a team of specialized AI agents collaborate on each problem.
This guide builds an understanding of OpenManus step by step, from its core architecture through its agents, workflows, and tools to how they work together on a real task.
At its core, OpenManus is built on a multi-agent architecture where specialized AI agents collaborate to solve complex tasks. This modular design enables high code reusability, strong extensibility, and clear separation of responsibilities.
Agent: The brain of OpenManus, consisting of specialized AI agents that handle different aspects of task execution. Agents are organized in a hierarchy, from basic agents to specialized ones.
LLM: Handles interactions with large language models, serving as the intelligence engine behind decision-making, content generation, and understanding.
Memory: Stores and manages conversation history and context, keeping interactions coherent and contextually relevant across multiple exchanges.
Tool: Provides interfaces for agents to interact with external systems and perform actions such as web browsing, code execution, and data retrieval.
Flow: Manages workflows and execution patterns, coordinating how multiple agents collaborate on complex tasks.
Prompt: Defines the behavior patterns and guidelines for agents, shaping how they respond to tasks and make decisions.
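As an illustration of the Memory component, here is a minimal sketch of a bounded conversation store. The `Message` and `Memory` classes, the `max_messages` cap, and the method names are assumptions made for this example, not OpenManus's exact API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    role: str      # e.g. "system", "user", "assistant", "tool"
    content: str


@dataclass
class Memory:
    """Hypothetical bounded store for conversation history."""
    messages: List[Message] = field(default_factory=list)
    max_messages: int = 100  # assumed cap; the oldest messages are dropped first

    def add_message(self, message: Message) -> None:
        self.messages.append(message)
        # Keep only the most recent max_messages entries
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def get_recent(self, n: int) -> List[Message]:
        # Last n messages (all of them if fewer than n exist)
        return self.messages[-n:] if n > 0 else []

    def to_dicts(self) -> List[dict]:
        # Shape expected by chat-completion style LLM APIs
        return [{"role": m.role, "content": m.content} for m in self.messages]


memory = Memory(max_messages=3)
for i in range(5):
    memory.add_message(Message(role="user", content=f"message {i}"))
print([m.content for m in memory.messages])  # ['message 2', 'message 3', 'message 4']
```

Capping the history keeps the prompt sent to the LLM within its context window; a real system might summarize old messages instead of discarding them.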
The project is organized as follows:

```text
OpenManus/
├── docker/                      # Docker configurations
│   ├── frontend/                # Next.js frontend container
│   │   └── Dockerfile           # Frontend container configuration
│   └── unified/                 # Backend container configuration
│       ├── Dockerfile           # Backend container configuration
│       └── start.sh             # Container startup script
├── src/                         # Source code
│   ├── agents/                  # Multi-agent logic (Python)
│   │   └── nodes/               # Agent node implementations
│   │       ├── browser_agent.py
│   │       ├── coder_agent.py
│   │       ├── coordinator.py
│   │       ├── reporter_agent.py
│   │       └── research_agent.py
│   ├── components/              # React components
│   ├── config/                  # Configuration files
│   ├── graph/                   # Graph-based workflow
│   ├── llms/                    # LLM integrations
│   ├── pages/                   # Next.js pages
│   ├── prompts/                 # Agent prompts
│   ├── service/                 # Backend services
│   ├── tools/                   # Tool implementations
│   ├── utils/                   # Utility functions
│   ├── workflow/                # Workflow management
│   ├── client.py                # CLI client for testing
│   └── server.py                # FastAPI server
├── docs/                        # Documentation and API specs
├── package.json                 # Next.js frontend dependencies
├── next.config.js               # Next.js configuration
├── docker-compose.yml           # Docker Compose configuration
└── README.md                    # Main documentation file
```
OpenManus implements a hierarchical agent structure, with each agent type building upon the capabilities of the previous one. This modular approach allows for specialized agents that excel at specific tasks while sharing common functionality.
BaseAgent is the foundation of the entire agent framework, defining the core attributes and methods that all agents share. It handles basic state management, memory operations, and the execution lifecycle.
```python
from abc import ABC
from typing import Optional

from pydantic import BaseModel, Field

# LLM, Memory, and AgentState are OpenManus's own classes (imports omitted)


class BaseAgent(BaseModel, ABC):
    """Abstract base class for managing agent state and execution."""

    # Core attributes
    name: str = Field(..., description="Unique name of the agent")
    description: Optional[str] = Field(None, description="Optional agent description")

    # Prompts
    system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
    next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")

    # Dependent components
    llm: LLM = Field(default_factory=LLM, description="Language model instance")
    memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
    state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")

    # Execution control
    max_steps: int = Field(default=10, description="Maximum steps before termination")
    current_step: int = Field(default=0, description="Current step in execution")
```
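The `state`, `max_steps`, and `current_step` fields imply an execution lifecycle: the agent moves from IDLE to RUNNING, repeats steps until it finishes or exhausts its step budget, then returns to IDLE. The sketch below illustrates that loop with a hypothetical `SketchAgent` and a simplified `AgentState` enum; it is a stand-in for the idea, not OpenManus's actual `run()` implementation.

```python
import asyncio
from enum import Enum
from typing import List


class AgentState(str, Enum):
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


class SketchAgent:
    """Hypothetical agent illustrating a step-limited run loop."""

    def __init__(self, max_steps: int = 10) -> None:
        self.max_steps = max_steps
        self.current_step = 0
        self.state = AgentState.IDLE

    async def step(self) -> str:
        # Placeholder for real work: declare the task finished on step 3
        if self.current_step >= 3:
            self.state = AgentState.FINISHED
        return f"did step {self.current_step}"

    async def run(self) -> List[str]:
        if self.state != AgentState.IDLE:
            raise RuntimeError(f"Cannot run from state {self.state}")
        self.state = AgentState.RUNNING
        results = []
        try:
            # Stop when the agent signals completion or the step budget runs out
            while self.state == AgentState.RUNNING and self.current_step < self.max_steps:
                self.current_step += 1
                results.append(await self.step())
        finally:
            # Always reset so the agent can be reused
            self.state = AgentState.IDLE
            self.current_step = 0
        return results


print(asyncio.run(SketchAgent(max_steps=10).run()))  # ['did step 1', 'did step 2', 'did step 3']
```

The `max_steps` guard matters because an LLM-driven agent may never decide on its own that it is done.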
ReActAgent extends BaseAgent by implementing the "Think-Act" pattern, which divides the agent's execution into two distinct phases: a thinking phase for decision making and an action phase for execution.
```python
from abc import ABC, abstractmethod


class ReActAgent(BaseAgent, ABC):
    @abstractmethod
    async def think(self) -> bool:
        """Process the current state and decide the next action."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the decided actions."""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()
```
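To see the think/act split in action, this self-contained sketch re-declares a minimal `ReActAgent` (without the `BaseAgent` machinery) and adds a hypothetical `EchoAgent` whose `think()` decides whether anything remains to act on. The task list is an assumption for illustration only.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Optional


class ReActAgent(ABC):
    @abstractmethod
    async def think(self) -> bool:
        """Decide whether an action is needed."""

    @abstractmethod
    async def act(self) -> str:
        """Carry out the decided action."""

    async def step(self) -> str:
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()


class EchoAgent(ReActAgent):
    """Hypothetical agent that acts only while it has pending tasks."""

    def __init__(self, tasks: List[str]) -> None:
        self.tasks = list(tasks)
        self.next_task: Optional[str] = None

    async def think(self) -> bool:
        # Reasoning phase: choose the next task, if there is one
        self.next_task = self.tasks.pop(0) if self.tasks else None
        return self.next_task is not None

    async def act(self) -> str:
        # Action phase: "execute" the chosen task
        return f"handled {self.next_task}"


async def main() -> List[str]:
    agent = EchoAgent(["search", "summarize"])
    return [await agent.step() for _ in range(3)]


print(asyncio.run(main()))
# ['handled search', 'handled summarize', 'Thinking complete - no action needed']
```

Keeping decision and execution in separate methods lets subclasses change how they reason without touching how actions are carried out, which is exactly what `ToolCallAgent` does next.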
ToolCallAgent extends ReActAgent by adding the ability to interact with external tools and APIs. This enables the agent to perform actions like web browsing, code execution, and data retrieval.
```python
from typing import List, Literal

from pydantic import Field


class ToolCallAgent(ReActAgent):
    """Base agent class for handling tool/function calls with enhanced abstraction"""

    available_tools: ToolCollection = ToolCollection(
        CreateChatCompletion(), Terminate()
    )
    tool_choices: Literal["none", "auto", "required"] = "auto"
    # Tool calls requested by the LLM in the latest think() (ToolCall: the project's schema type)
    tool_calls: List[ToolCall] = Field(default_factory=list)

    async def think(self) -> bool:
        # Get the LLM response and tool selection
        response = await self.llm.ask_tool(
            messages=self.messages,
            system_msgs=[Message.system_message(self.system_prompt)]
            if self.system_prompt
            else None,
            tools=self.available_tools.to_params(),
            tool_choice=self.tool_choices,
        )
        # Fall back to an empty list so act() can always iterate
        self.tool_calls = response.tool_calls or []
        # Process the response and tool calls
        # ... (returns a bool telling step() whether act() should run)

    async def act(self) -> str:
        # Execute tool calls in the order the LLM requested them
        results = []
        for command in self.tool_calls:
            result = await self.execute_tool(command)
            # Add tool response to memory
            # ...
            results.append(result)
        return "\n\n".join(results)
```
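`act()` relies on `execute_tool()` to turn each model-issued tool call into a real function invocation. The sketch below shows one way that dispatch could work, assuming OpenAI-style tool calls whose `arguments` arrive as a JSON string. The `ToolCall` dataclass, the `SimpleToolCollection` registry, and the error strings are hypothetical.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List


@dataclass
class ToolCall:
    name: str
    arguments: str  # JSON-encoded, as chat-completion APIs return it


class SimpleToolCollection:
    """Hypothetical registry mapping tool names to async callables."""

    def __init__(self, **tools: Callable[..., Awaitable[Any]]) -> None:
        self.tools = tools

    async def execute(self, call: ToolCall) -> str:
        tool = self.tools.get(call.name)
        if tool is None:
            return f"Error: unknown tool '{call.name}'"
        try:
            kwargs = json.loads(call.arguments or "{}")
        except json.JSONDecodeError:
            return f"Error: invalid JSON arguments for '{call.name}'"
        try:
            result = await tool(**kwargs)
        except Exception as exc:  # report failures back to the model instead of crashing
            return f"Error: {call.name} failed: {exc}"
        return f"{call.name} -> {result}"


async def add(a: int, b: int) -> int:
    return a + b


async def act(tools: SimpleToolCollection, calls: List[ToolCall]) -> str:
    # Mirrors ToolCallAgent.act(): run each call in order and join the results
    results = [await tools.execute(c) for c in calls]
    return "\n\n".join(results)


tools = SimpleToolCollection(add=add)
calls = [ToolCall("add", '{"a": 2, "b": 3}'), ToolCall("search", "{}")]
print(asyncio.run(act(tools, calls)))
```

Returning errors as text rather than raising lets the LLM see what went wrong and retry with corrected arguments on its next `think()`.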
PlanningAgent extends ToolCallAgent by adding planning capabilities, allowing it to break down complex tasks into manageable steps and track progress through the execution of a plan.
```python
from typing import Dict, Optional

from pydantic import Field


class PlanningAgent(ToolCallAgent):
    """
    An agent that creates and manages plans to solve tasks.

    This agent uses a planning tool to create and manage structured plans,
    and tracks progress through individual steps until task completion.
    """

    name: str = "planning"
    description: str = "An agent that creates and manages plans to solve tasks"

    system_prompt: str = PLANNING_SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT

    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
    )

    # ID of the plan being executed (read in think())
    active_plan_id: Optional[str] = None

    # Step execution tracker
    step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
    current_step_index: Optional[int] = None

    async def think(self) -> bool:
        """Decide the next action based on plan status."""
        prompt = (
            f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
            if self.active_plan_id
            else self.next_step_prompt
        )
        self.messages.append(Message.user_message(prompt))

        # Get the current step index
        self.current_step_index = await self._get_current_step_index()

        result = await super().think()

        # Associate tool calls with the current step
        if result and self.tool_calls:
            ...  # association logic elided

        return result
```
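`think()` depends on two helpers that are not shown, `get_plan()` and `_get_current_step_index()`. One plausible reading, sketched below with a hypothetical `Plan` class and assumed status values (`not_started`, `in_progress`, `completed`), is that the current step is the first one not yet completed and that the plan is rendered as text for the prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    """Hypothetical plan: step texts plus a parallel list of statuses."""
    title: str
    steps: List[str]
    statuses: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.statuses:
            self.statuses = ["not_started"] * len(self.steps)

    def current_step_index(self) -> Optional[int]:
        # First step that is not completed; None once the plan is done
        for i, status in enumerate(self.statuses):
            if status != "completed":
                return i
        return None

    def mark(self, index: int, status: str) -> None:
        self.statuses[index] = status

    def render(self) -> str:
        # Text form suitable for a "CURRENT PLAN STATUS" prompt section
        done = self.statuses.count("completed")
        marks = {"completed": "[x]", "in_progress": "[>]"}
        lines = [f"{self.title} ({done}/{len(self.steps)} completed)"]
        for i, (step, status) in enumerate(zip(self.steps, self.statuses)):
            lines.append(f"{i}. {marks.get(status, '[ ]')} {step}")
        return "\n".join(lines)


plan = Plan("Example plan", ["Collect data", "Build chart", "Write report"])
plan.mark(0, "completed")
plan.mark(1, "in_progress")
print(plan.current_step_index())  # 1
print(plan.render())
```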
Manus is the flagship agent of OpenManus. It builds on the tool-calling agent types above and adds a broad set of specialized tools (Python execution, web search, browser control, and file saving), making it a versatile, general-purpose AI assistant.
```python
# Note: the docstring mentions PlanningAgent, but this class subclasses ToolCallAgent directly
class Manus(ToolCallAgent):
    """
    A versatile general-purpose agent that uses planning to solve various tasks.

    This agent extends PlanningAgent with a comprehensive set of tools and capabilities,
    including Python execution, web browsing, file operations, and information retrieval
    to handle a wide range of user requests.
    """

    name: str = "manus"
    description: str = "A versatile general-purpose agent"

    system_prompt: str = SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT

    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(
            PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
        )
    )
```
| Agent Type | Basic State Management | Think-Act Pattern | Tool Usage | Planning | Specialized Capabilities |
|---|---|---|---|---|---|
| BaseAgent | ✅ | ❌ | ❌ | ❌ | ❌ |
| ReActAgent | ✅ | ✅ | ❌ | ❌ | ❌ |
| ToolCallAgent | ✅ | ✅ | ✅ | ❌ | ❌ |
| PlanningAgent | ✅ | ✅ | ✅ | ✅ | ❌ |
| Manus | ✅ | ✅ | ✅ | ✅ | ✅ |
OpenManus's workflow system orchestrates how agents collaborate to solve complex tasks. The Flow component manages these workflows, determining which agents handle which parts of a task and how their results are integrated.
```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

from pydantic import BaseModel


class BaseFlow(BaseModel, ABC):
    """Base class for execution flows supporting multiple agents"""

    agents: Dict[str, BaseAgent]
    tools: Optional[List] = None
    primary_agent_key: Optional[str] = None

    @property
    def primary_agent(self) -> Optional[BaseAgent]:
        """Get the primary agent for the flow"""
        return self.agents.get(self.primary_agent_key)

    @abstractmethod
    async def execute(self, input_text: str) -> str:
        """Execute the flow with the given input"""
```
```python
import time
from typing import List, Optional

from pydantic import Field


class PlanningFlow(BaseFlow):
    """A flow that manages planning and execution of tasks using agents."""

    llm: LLM = Field(default_factory=lambda: LLM())
    planning_tool: PlanningTool = Field(default_factory=PlanningTool)
    executor_keys: List[str] = Field(default_factory=list)
    active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
    current_step_index: Optional[int] = None

    async def execute(self, input_text: str) -> str:
        """Execute the planning flow with agents."""
        try:
            # Create the initial plan
            if input_text:
                await self._create_initial_plan(input_text)

            # Execute plan steps until none remain
            while await self._has_next_step():
                # Get the current step
                step_info = await self._get_current_step()
                # Select the appropriate executor for this step type
                executor = self.get_executor(step_info.get("type"))
                # Execute the step
                await self._execute_step(executor, step_info)
                # Update the step status
                await self._update_step_status(step_info["index"], "completed")

            # Complete the plan
            return await self._finalize_plan()
        except Exception as e:
            # Report the failure as the flow's result instead of raising
            return f"Error executing flow: {e}"
```
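The `execute()` loop can be exercised end to end with stub agents. This self-contained sketch makes assumptions that the code above does not state: each step is a dict with `type` and `text` keys, `get_executor()` falls back to a primary agent when no executor matches the step type, and executors are plain async functions. `MiniPlanningFlow` and the agent stubs are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Executor = Callable[[str], Awaitable[str]]


class MiniPlanningFlow:
    """Hypothetical, stripped-down analogue of PlanningFlow."""

    def __init__(self, agents: Dict[str, Executor], primary_key: str) -> None:
        self.agents = agents
        self.primary_key = primary_key
        self.steps: List[dict] = []

    def get_executor(self, step_type: Optional[str]) -> Executor:
        # Prefer the agent registered for this step type, else the primary agent
        return self.agents.get(step_type or "", self.agents[self.primary_key])

    async def execute(self, plan: List[dict]) -> str:
        self.steps = [dict(step, status="not_started") for step in plan]
        outputs = []
        for step in self.steps:
            executor = self.get_executor(step.get("type"))
            outputs.append(await executor(step["text"]))
            step["status"] = "completed"
        return "\n".join(outputs)


async def coder(text: str) -> str:
    return f"[coder] {text}"


async def generalist(text: str) -> str:
    return f"[manus] {text}"


flow = MiniPlanningFlow({"manus": generalist, "code": coder}, primary_key="manus")
steps = [
    {"type": "research", "text": "Find data sources"},  # no "research" agent: falls back
    {"type": "code", "text": "Write dashboard"},
]
print(asyncio.run(flow.execute(steps)))
```

The fallback keeps the flow running when the planner emits a step type no specialist is registered for.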
OpenManus implements a graph-based workflow system that allows for flexible orchestration of agent activities. Nodes in the graph represent agents or actions, while edges represent the flow of data and control.
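A minimal way to picture such a graph: each node is an async function that reads and updates a shared state, then returns the name of the next node, so edges can be chosen at run time. The node names below echo the agent files listed in the project structure (coordinator, research, coder, reporter), but the routing rules and the `log`/`needs_code` state keys are invented for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

State = Dict[str, Any]
Node = Callable[[State], Awaitable[Optional[str]]]  # returns next node name, or None to stop


async def coordinator(state: State) -> Optional[str]:
    state["log"].append("coordinator")
    return "researcher"


async def researcher(state: State) -> Optional[str]:
    state["log"].append("researcher")
    # Conditional edge: only route to the coder when the task needs code
    return "coder" if state.get("needs_code") else "reporter"


async def coder(state: State) -> Optional[str]:
    state["log"].append("coder")
    return "reporter"


async def reporter(state: State) -> Optional[str]:
    state["log"].append("reporter")
    return None  # terminal node


NODES: Dict[str, Node] = {
    "coordinator": coordinator,
    "researcher": researcher,
    "coder": coder,
    "reporter": reporter,
}


async def run_graph(nodes: Dict[str, Node], start: str, state: State, max_hops: int = 20) -> State:
    current: Optional[str] = start
    hops = 0
    while current is not None:
        if hops == max_hops:
            raise RuntimeError("workflow exceeded max_hops without terminating")
        current = await nodes[current](state)  # follow the edge the node chose
        hops += 1
    return state


final = asyncio.run(run_graph(NODES, "coordinator", {"log": [], "needs_code": True}))
print(final["log"])  # ['coordinator', 'researcher', 'coder', 'reporter']
```

Passing one shared state object between nodes is a simple way for agents to build on each other's work; the hop limit guards against routing cycles.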
When a user submits a task, the workflow system breaks it down into smaller, manageable sub-tasks. Each sub-task is assigned to the most suitable agent based on its capabilities.
The workflow system handles communication and coordination between agents, ensuring they can share information and build upon each other's work. This coordination is managed by specialized flow components.
As agents complete their assigned sub-tasks, their results are collected and integrated into a coherent final output. This integration considers dependencies between sub-tasks and ensures logical flow.
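Consider, for example, a user asking OpenManus to "Plan a 3-day trip to Tokyo with a budget of $1000". The workflow system would break this request into sub-tasks, assign each one to a suitable agent, and integrate their results into a single plan.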
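A hypothetical decomposition of that request can be modeled as a dependency graph of sub-tasks, executed in topological order so that each sub-task sees the results it depends on. The sub-task names, the capability-to-agent mapping, and the stub agent outputs below are all invented for illustration; only `graphlib.TopologicalSorter` (Python 3.9+ standard library) is a real API.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical decomposition of the Tokyo request:
# sub-task -> (required capability, sub-tasks it depends on)
SUBTASKS: Dict[str, Tuple[str, Set[str]]] = {
    "find_flights": ("research", set()),
    "find_hotels": ("research", set()),
    "draft_itinerary": ("planning", {"find_flights", "find_hotels"}),
    "check_budget": ("calculation", {"draft_itinerary"}),
}

# Hypothetical capability -> agent assignment
AGENTS: Dict[str, str] = {
    "research": "research_agent",
    "planning": "planning_agent",
    "calculation": "coder_agent",
}


def run_subtask(name: str, agent: str, upstream: List[str]) -> str:
    # Stand-in for a real agent call; a real agent would use the upstream results
    return f"{name} done by {agent} after {len(upstream)} upstream result(s)"


def execute(subtasks: Dict[str, Tuple[str, Set[str]]]) -> Tuple[List[str], Dict[str, str]]:
    graph = {name: deps for name, (_, deps) in subtasks.items()}
    order: List[str] = []
    results: Dict[str, str] = {}
    # static_order() yields each sub-task only after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        capability, deps = subtasks[name]
        upstream = [results[d] for d in sorted(deps)]
        results[name] = run_subtask(name, AGENTS[capability], upstream)
        order.append(name)
    return order, results


order, results = execute(SUBTASKS)
# Integration: combine sub-task results in dependency order
final_report = "\n".join(results[name] for name in order)
print(final_report)
```

`TopologicalSorter` raises `graphlib.CycleError` if the decomposition contains circular dependencies, which is a useful sanity check on an LLM-generated plan.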
Tools are the interfaces through which OpenManus agents interact with the external world. The flexible tool system allows agents to perform a wide range of actions, from web browsing to code execution.
```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field


class BaseTool(ABC, BaseModel):
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""

    def to_param(self) -> Dict:
        """Convert tool to function call format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class ToolResult(BaseModel):
    """Represents the result of a tool execution."""

    output: Any = Field(default=None)
    error: Optional[str] = Field(default=None)
    system: Optional[str] = Field(default=None)
```
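To show how `to_param()` exposes a tool to the LLM, here is a self-contained sketch of a tool defined the same way. It swaps pydantic for plain classes and a dataclass to stay dependency-free, and the `word_count` tool itself is hypothetical.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToolResult:
    output: Any = None
    error: Optional[str] = None


class BaseTool(ABC):
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any: ...

    def to_param(self) -> Dict:
        # OpenAI-style function-calling schema
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class WordCount(BaseTool):
    """Hypothetical tool: counts whitespace-separated words."""
    name = "word_count"
    description = "Count the words in a piece of text."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count."}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> ToolResult:
        if not text.strip():
            return ToolResult(error="text is empty")
        return ToolResult(output=len(text.split()))


tool = WordCount()
print(json.dumps(tool.to_param(), indent=2))
print(asyncio.run(tool(text="plan a trip to Tokyo")))  # ToolResult(output=5, error=None)
```

The JSON Schema in `parameters` is what the model sees when deciding how to call the tool, so its descriptions deserve as much care as the implementation.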
Let's look at how an agent might use the PythonExecute tool to perform data analysis:
```python
import json
import textwrap


# Tool definition
class PythonExecute(BaseTool):
    name: str = "python_execute"
    description: str = "Execute Python code and return the result."
    parameters: dict = {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "The Python code to execute.",
            }
        },
        "required": ["code"],
    }

    async def execute(self, code: str) -> ToolResult:
        try:
            # Fresh namespace for the snippet. Note: exec() with full builtins
            # is NOT a sandbox; the code can do anything the host process can.
            local_vars = {}
            exec(code, {"__builtins__": __builtins__}, local_vars)
            # By convention, the snippet stores its return value in `result`
            return ToolResult(output=local_vars.get("result", "Code executed successfully"))
        except Exception as e:
            return ToolResult(error=str(e))


# Agent using the tool (assumes execute_tool returns a ToolResult-like object)
async def analyze_data(agent, dataset_url):
    # First, fetch the dataset (assumes this step leaves it at downloaded_data.csv)
    await agent.execute_tool({
        "name": "browser_use",
        "arguments": {"url": dataset_url},
    })

    # Now, analyze the data with Python
    analysis_code = textwrap.dedent("""
        import pandas as pd

        # Load the data
        data = pd.read_csv('downloaded_data.csv')

        # Perform analysis (numeric_only skips text columns)
        summary = data.describe()
        correlations = data.corr(numeric_only=True)

        # Create and save a scatter plot of two example columns
        ax = data.plot(kind='scatter', x='feature1', y='feature2', figsize=(10, 6))
        ax.figure.savefig('analysis_plot.png')

        # Store result for return
        result = {
            'summary': summary.to_dict(),
            'correlations': correlations.to_dict(),
        }
    """)
    python_result = await agent.execute_tool({
        "name": "python_execute",
        "arguments": {"code": analysis_code},
    })

    # Save the analysis results
    save_result = await agent.execute_tool({
        "name": "file_saver",
        "arguments": {
            "file_path": "analysis_results.json",
            "content": json.dumps(python_result.output),
        },
    })

    return {
        "analysis": python_result.output,
        "visualization": "analysis_plot.png",
        "saved_results": save_result.output,
    }
```
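Calling `exec()` inside the agent's own process, as `PythonExecute` does above, gives generated code full access to the host. A common mitigation, sketched below as an alternative rather than as what OpenManus does, is to run the code in a separate interpreter with a timeout and capture its output. `run_python` and `ExecResult` are hypothetical names; this is still not a sandbox, so container isolation remains the stronger boundary for untrusted code.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecResult:
    output: str = ""
    error: Optional[str] = None


def run_python(code: str, timeout: float = 10.0) -> ExecResult:
    """Run code in a fresh interpreter process, capturing stdout and stderr."""
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return ExecResult(error=f"timed out after {timeout} seconds")
    if proc.returncode != 0:
        # Non-zero exit: surface the traceback to the agent as an error
        return ExecResult(output=proc.stdout, error=proc.stderr.strip())
    return ExecResult(output=proc.stdout)


print(run_python("print(sum(range(10)))").output.strip())  # 45
```

Because the child process has its own memory, a crash or infinite loop in generated code cannot take down the agent itself.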
Now that we've explored the individual components of OpenManus, let's see how they all work together to create a powerful multi-agent AI system capable of handling complex tasks.
The user submits a task request via the CLI, API, or web interface.

```bash
python client.py --task "Build a dashboard to visualize Tesla stock trends for the past year"
```
The Coordinator Agent analyzes the task and creates a plan using the Planning Agent.
```text
Plan created:
1. Research Tesla stock data sources
2. Collect historical stock data
3. Select visualization framework
4. Create dashboard code
5. Test and refine dashboard
```
Multiple agents work on different aspects of the task in parallel.
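Running independent sub-tasks concurrently maps naturally onto `asyncio.gather`. In this sketch the two agent coroutines are hypothetical stand-ins that sleep instead of calling an LLM or a tool, so the total wall time is roughly that of the slowest one rather than the sum.

```python
import asyncio
import time


async def browser_agent(query: str) -> str:
    await asyncio.sleep(0.2)  # stand-in for web browsing latency
    return f"browser: found pages about {query}"


async def coder_agent(task: str) -> str:
    await asyncio.sleep(0.2)  # stand-in for code generation and execution
    return f"coder: wrote code for {task}"


async def run_in_parallel() -> list:
    # gather() runs both coroutines concurrently and keeps results in argument order
    return await asyncio.gather(
        browser_agent("TSLA stock data sources"),
        coder_agent("a dashboard skeleton"),
    )


start = time.perf_counter()
results = asyncio.run(run_in_parallel())
elapsed = time.perf_counter() - start
print(results)
print(f"took {elapsed:.1f}s")  # about 0.2s, not 0.4s
```

Only sub-tasks without dependencies on each other can be run this way; dependent steps still have to wait for their inputs.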
Agents use various tools to perform actions and gather information.
```python
# Runs inside an async context, e.g. within an agent's act() method

# Browser Agent uses BrowserUseTool
await agent.execute_tool({
    "name": "browser_use",
    "arguments": {"url": "https://finance.yahoo.com/quote/TSLA"},
})

# Coder Agent uses PythonExecute
await agent.execute_tool({
    "name": "python_execute",
    "arguments": {
        "code": "import yfinance as yf\ndata = yf.download('TSLA', period='1y')\nresult = data.head()"
    },
})
```
The Flow Manager collects results from all agents and constructs the final solution.
OpenManus's architecture enables a wide range of complex applications:
Research: Conducts comprehensive research on a topic, synthesizing information from multiple sources, verifying facts, and generating coherent reports.
Software development: Plans, codes, and tests applications based on user requirements, handling both frontend and backend components with appropriate frameworks.
Data analysis: Collects, cleans, analyzes, and visualizes data from various sources, applying appropriate statistical methods and creating insightful visualizations.
Content creation: Researches topics, plans content structure, generates written material, creates visual assets, and optimizes for specific platforms and audiences.
OpenManus represents a significant step forward in open-source AI agent architecture, offering a modular, extensible framework for building multi-agent systems. Understanding that architecture is the foundation for setting it up, extending it with new agents and tools, and contributing back to the project.
Ready to dive deeper into OpenManus? Here are some ways to continue your journey:
Set up the OpenManus environment and experiment with its capabilities (see the GitHub repository).
Create new specialized agents or tools to enhance OpenManus's capabilities (see the contribution guidelines).
Connect with other developers and researchers working on AI agent systems (see Discussions & Issues).