
Commit 411d2fb

Authored by strickvl, with co-authors Claude and Copilot
Add MCP (Model Context Protocol) integration for enhanced research (#230)
* Add support for query parameter in config files

  Allow users to specify the research query in the configuration file as a fallback when it is not provided via the CLI. This makes it easier to reuse predefined queries for specific research scenarios.

* Add MCP (Model Context Protocol) integration for enhanced research

  - Add a new MCP step that uses Anthropic's Claude with MCP tools to perform additional targeted searches via Exa
  - Implement an MCPResult model and a custom materializer for visualization
  - Update the final report step to properly handle MCPResult objects
  - Add preprocessing for Pydantic objects in MCP prompts
  - Update the README with MCP integration details and requirements
  - Add support for MCP-powered searches covering research papers, companies, LinkedIn, Wikipedia, and GitHub

  The MCP step runs after reflection/approval and before final report generation, providing an additional layer of research depth using advanced search capabilities.

* Fix README images

* Add Deep Research to main README

* Fix typo in comment within compute_metrics function in misc.py

  Corrected "mertic" to "metric" so the comment accurately reflects its intended meaning.

* Fix LiteLLM model naming for Google Gemini models via OpenRouter

  When replacing SambaNova models with Google ones, the model names used the incorrect format "google/gemini-*" instead of the correct LiteLLM format "openrouter/google/gemini-*" for OpenRouter routing.

  Changes:
  - Update all model defaults from "google/gemini-*" to "openrouter/google/gemini-*"
  - Fix provider validation in llm_utils.py to handle OpenRouter's nested format
  - Update comments to clarify the correct naming conventions
  - Ensure all Google Gemini models use the proper OpenRouter prefix

  This fixes the "LLM Provider NOT provided" error when using Google models.

* Update deep_research/utils/pydantic_models.py

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
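The model-naming fix comes down to how LiteLLM resolves a provider from the model string: OpenRouter models use a nested format, so the first path segment names the router, not the vendor. A minimal sketch of that resolution logic (the helper is illustrative, not the actual code in `llm_utils.py`):

```python
from typing import Optional


def routing_provider(model: str) -> Optional[str]:
    """Best-effort guess at the provider LiteLLM routes to.

    Illustrative helper only. OpenRouter uses a nested format,
    "openrouter/<provider>/<model>", so the first path segment
    names the router even though a vendor name follows it.
    """
    parts = model.split("/")
    if len(parts) < 2:
        return None  # bare model name, no provider prefix
    return parts[0]


# Without the prefix, LiteLLM resolves "google" and may fail with
# "LLM Provider NOT provided"; with it, routing goes through OpenRouter.
broken = routing_provider("google/gemini-2.0-flash-lite-001")             # "google"
fixed = routing_provider("openrouter/google/gemini-2.0-flash-lite-001")   # "openrouter"
```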
1 parent b0b4e11 commit 411d2fb

21 files changed: +657 additions, -46 deletions

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -58,6 +58,7 @@ etc.
 | [Gamesense](gamesense) | 🤖 LLMOps | 🧠 LoRA, ⚡ Efficient Training | pytorch, peft, phi-2 |
 | [Nightwatch AI](nightwatch-ai) | 🤖 LLMOps | 📝 Summarization, 📊 Reporting | openai, supabase, slack |
 | [ResearchRadar](research-radar) | 🤖 LLMOps | 📝 Classification, 📊 Comparison | anthropic, huggingface, transformers |
+| [Deep Research](deep_research) | 🤖 LLMOps | 📝 Research, 📊 Reporting, 🔍 Web Search | anthropic, mcp, agents, openai |
 | [End-to-end Computer Vision](end-to-end-computer-vision) | 👁 CV | 🔎 Object Detection, 🏷️ Labeling | pytorch, label_studio, yolov8 |
 | [Magic Photobooth](magic-photobooth) | 👁 CV | 📷 Image Gen, 🎞️ Video Gen | stable-diffusion, huggingface |
 | [OmniReader](omni-reader) | 👁 CV | 📑 OCR, 📊 Evaluation, ⚙️ Batch Processing | polars, litellm, openai, ollama |
```

deep_research/README.md

Lines changed: 40 additions & 8 deletions
```diff
@@ -13,6 +13,7 @@ The ZenML Deep Research Agent is a scalable, modular pipeline that automates in-
 
 - Creates a structured outline based on your research query
 - Researches each section through targeted web searches and LLM analysis
+- **NEW**: Performs additional MCP-powered searches using Anthropic's Model Context Protocol with Exa integration
 - Iteratively refines content through reflection cycles
 - Produces a comprehensive, well-formatted research report
 - Visualizes the research process and report structure in the ZenML dashboard
```
```diff
@@ -24,7 +25,7 @@ This project transforms exploratory notebook-based research into a production-gr
 The Deep Research Agent produces comprehensive, well-structured reports on any topic. Here's an example of research conducted on quantum computing:
 
 <div align="center">
-  <img alt="Sample Research Report" src="assets/sample_report.png" width="70%">
+  <img alt="Sample Research Report" src="assets/sample_report.gif" width="70%">
   <p><em>Sample report generated by the Deep Research Agent</em></p>
 </div>
 
```
```diff
@@ -40,8 +41,9 @@ The pipeline uses a parallel processing architecture for efficiency and breaks d
 6. **Reflection Generation**: Generate recommendations for improving research quality
 7. **Human Approval** (optional): Get human approval for additional searches
 8. **Execute Approved Searches**: Perform approved additional searches to fill gaps
-9. **Final Report Generation**: Compile all synthesized information into a coherent HTML report
-10. **Collect Tracing Metadata**: Gather comprehensive metrics about token usage, costs, and performance
+9. **MCP-Powered Search**: Use Anthropic's Model Context Protocol to perform additional targeted searches via Exa
+10. **Final Report Generation**: Compile all synthesized information into a coherent HTML report
+11. **Collect Tracing Metadata**: Gather comprehensive metrics about token usage, costs, and performance
 
 This architecture enables:
 - Better reproducibility and caching of intermediate results
```
```diff
@@ -55,6 +57,7 @@ This architecture enables:
 
 - **LLM Integration**: Uses litellm for flexible access to various LLM providers
 - **Web Research**: Utilizes Tavily API for targeted internet searches
+- **MCP Integration**: Leverages Anthropic's Model Context Protocol with Exa for enhanced research capabilities
 - **ZenML Orchestration**: Manages pipeline flow, artifacts, and caching
 - **Reproducibility**: Track every step, parameter, and output via ZenML
 - **Visualizations**: Interactive visualizations of the research structure and progress
```
```diff
@@ -70,6 +73,8 @@ This architecture enables:
 - ZenML installed and configured
 - API key for your preferred LLM provider (configured with litellm)
 - Tavily API key
+- Anthropic API key (for MCP integration)
+- Exa API key (for MCP-powered searches)
 - Langfuse account for LLM tracking (optional but recommended)
 
 ### Installation
```
```diff
@@ -85,7 +90,8 @@ pip install -r requirements.txt
 # Set up API keys
 export OPENAI_API_KEY=your_openai_key       # Or another LLM provider key
 export TAVILY_API_KEY=your_tavily_key       # For Tavily search (default)
-export EXA_API_KEY=your_exa_key             # For Exa search (optional)
+export EXA_API_KEY=your_exa_key             # For Exa search and MCP integration (required for MCP)
+export ANTHROPIC_API_KEY=your_anthropic_key # For MCP integration (required)
 
 # Set up Langfuse for LLM tracking (optional)
 export LANGFUSE_PUBLIC_KEY=your_public_key
```
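Since the MCP step fails at runtime without both of the newly required keys, a small preflight check can save a wasted pipeline run. A sketch, assuming nothing beyond the two variable names in the diff above (the helper itself is illustrative, not part of the repo):

```python
import os
from typing import List, Mapping


def missing_mcp_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of MCP-required keys that are unset or empty.

    Illustrative helper; the repo does not ship this function. The two
    key names come from the project's README.
    """
    required = ("ANTHROPIC_API_KEY", "EXA_API_KEY")
    return [name for name in required if not env.get(name)]
```

Running this before `run.py` (or at the top of the MCP step) turns a mid-pipeline authentication failure into an immediate, readable error.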
````diff
@@ -227,6 +233,31 @@ python run.py --num-results 5             # Get 5 results per sea
 python run.py --num-results 10 --search-provider exa  # 10 results with Exa
 ```
 
+### MCP (Model Context Protocol) Integration
+
+The pipeline includes a powerful MCP integration step that uses Anthropic's Model Context Protocol to perform additional targeted searches. This step runs after the reflection phase and before final report generation, providing an extra layer of research depth.
+
+#### How MCP Works
+
+The MCP step:
+1. Receives the synthesized research data and analysis from previous steps
+2. Uses Claude (via the Anthropic API) with MCP tools to identify gaps or areas needing more research
+3. Performs targeted searches using Exa's advanced search capabilities, including:
+   - `research_paper_search`: Academic paper and research content
+   - `company_research`: Company website crawling for business information
+   - `competitor_finder`: Find company competitors
+   - `linkedin_search`: Search LinkedIn for companies and people
+   - `wikipedia_search_exa`: Wikipedia article retrieval
+   - `github_search`: GitHub repositories and issues
+
+#### MCP Requirements
+
+To use the MCP integration, you need:
+- `ANTHROPIC_API_KEY`: For accessing Claude with MCP capabilities
+- `EXA_API_KEY`: For the Exa search tools used by MCP
+
+The MCP step uses Claude Sonnet 4.0 (claude-sonnet-4-20250514), which supports the MCP protocol.
+
 ### Search Providers
 
 The pipeline supports multiple search providers for flexibility and comparison:
````
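The README text above describes what the MCP step does but not the shape of the call it makes. For orientation, here is a hedged sketch of the kind of request payload an MCP-enabled Anthropic Messages call might use: the field names follow Anthropic's MCP connector beta, while the Exa server URL and query-parameter name are assumptions for illustration, not taken from this commit.

```python
def build_mcp_request(prompt: str, exa_api_key: str) -> dict:
    """Sketch of an MCP-enabled Messages API payload (hypothetical).

    Field names mirror Anthropic's MCP connector beta. The Exa MCP
    endpoint URL below is an assumption for illustration only; the
    model id is the one named in the README.
    """
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 4096,
        "mcp_servers": [
            {
                "type": "url",
                # Hypothetical endpoint; check Exa's docs for the real one.
                "url": f"https://mcp.exa.ai/mcp?exaApiKey={exa_api_key}",
                "name": "exa",
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }
```

In the real step, a payload like this would be sent with the MCP beta header set, and Claude would decide which of the Exa tools (`research_paper_search`, `github_search`, etc.) to invoke.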
```diff
@@ -364,6 +395,7 @@ zenml_deep_research/
 │   ├── execute_approved_searches_step.py # Execute approved searches
 │   ├── generate_reflection_step.py       # Generate reflection without execution
 │   ├── iterative_reflection_step.py      # Legacy combined reflection step
+│   ├── mcp_step.py                       # MCP integration for additional searches
 │   ├── merge_results_step.py
 │   ├── process_sub_question_step.py
 │   ├── pydantic_final_report_step.py
```
```diff
@@ -421,16 +453,16 @@ query: "Climate change policy debates"
 steps:
   initial_query_decomposition_step:
     parameters:
-      llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+      llm_model: "google/gemini-2.0-flash-lite-001"
 
   cross_viewpoint_analysis_step:
     parameters:
-      llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+      llm_model: "google/gemini-2.0-flash-lite-001"
       viewpoint_categories: ["scientific", "political", "economic", "social", "ethical", "historical"]
 
   iterative_reflection_step:
     parameters:
-      llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+      llm_model: "google/gemini-2.0-flash-lite-001"
       max_additional_searches: 2
       num_results_per_search: 3
 
```
```diff
@@ -442,7 +474,7 @@ steps:
 
   pydantic_final_report_step:
     parameters:
-      llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+      llm_model: "google/gemini-2.0-flash-lite-001"
 
 # Environment settings
 settings:
```
[Binary file not rendered: 3.56 MB]

deep_research/configs/enhanced_research.yaml

Lines changed: 5 additions & 5 deletions
```diff
@@ -27,11 +27,11 @@ langfuse_project_name: "deep-research"
 steps:
   initial_query_decomposition_step:
     parameters:
-      llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+      llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"
 
   cross_viewpoint_analysis_step:
     parameters:
-      llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+      llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"
       viewpoint_categories:
         [
           "scientific",
@@ -44,7 +44,7 @@ steps:
 
   generate_reflection_step:
     parameters:
-      llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+      llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"
 
   get_research_approval_step:
     parameters:
@@ -53,11 +53,11 @@ steps:
 
   execute_approved_searches_step:
     parameters:
-      llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+      llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"
 
   pydantic_final_report_step:
     parameters:
-      llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+      llm_model: "openrouter/google/gemini-2.5-flash-preview-05-20"
 
 # Environment settings
 settings:
```

deep_research/materializers/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -8,6 +8,7 @@
 from .analysis_data_materializer import AnalysisDataMaterializer
 from .approval_decision_materializer import ApprovalDecisionMaterializer
 from .final_report_materializer import FinalReportMaterializer
+from .mcp_result_materializer import MCPResultMaterializer
 from .prompt_materializer import PromptMaterializer
 from .query_context_materializer import QueryContextMaterializer
 from .search_data_materializer import SearchDataMaterializer
@@ -23,4 +24,5 @@
     "SynthesisDataMaterializer",
     "AnalysisDataMaterializer",
     "FinalReportMaterializer",
+    "MCPResultMaterializer",
 ]
```
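The diff registers the new materializer but never shows the MCPResult model itself; the commit message only says it is a Pydantic model carrying the MCP step's output to the final report step. Purely as a guessed illustration (every field name below is an assumption), its shape might resemble this stdlib dataclass:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MCPResult:
    """Guessed shape of the commit's MCPResult model; actual fields unknown.

    The real model is a Pydantic class with a custom materializer that
    renders it in the ZenML dashboard. This sketch only illustrates the
    kind of data the MCP step hands to the final report step.
    """

    raw_response: str = ""                # assumed: Claude's MCP output text
    tools_used: List[str] = field(default_factory=list)  # assumed: e.g. "github_search"
```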
