arterm-sedov committed on
Commit
5a69551
·
0 Parent(s):

first commit

.cursor/rules/cmw-platform-agent.mdc ADDED
@@ -0,0 +1,16 @@
1
+ ---
2
+ alwaysApply: true
3
+ ---
4
+
5
+ - Be super smart, create super lean and dry code.
6
+ - Use abstractions.
7
+ - Group and isolate code based on its function in different files to avoid clutter.
8
+ - Do not duplicate code, encapsulate any reused code in methods/functions.
9
+ - Never break existing code.
10
+ - Do not delete logging, but update it.
11
+ - Do not delete comments, rather update them.
12
+ - Produce flawless code.
13
+ - Reanalyze your changes twice for any issues you might have introduced.
14
+ - Place imports always on top.
15
+ - Use environment variables for secrets.
16
+ - Ensure testability and extensibility.
.env.example ADDED
@@ -0,0 +1,7 @@
1
+ HF_TOKEN=XXX
2
+ HUGGINGFACE_API_KEY=XXX
3
+ SUPABASE_URL=XXX
4
+ SUPABASE_KEY=XXX
5
+ GEMINI_KEY=XXX
6
+ GROQ_API_KEY=XXX
7
+ TAVILY_API_KEY=XXX
.gitattributes ADDED
@@ -0,0 +1,40 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.csv filter=lfs diff=lfs merge=lfs -text
37
+ *.png filter=lfs diff=lfs merge=lfs -text
38
+ *.mp3 filter=lfs diff=lfs merge=lfs -text
39
+ *.xlsx filter=lfs diff=lfs merge=lfs -text
40
+ *.ttf filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,4 @@
1
+ .env
2
+ venv/
3
+ __pycache__/
4
+ !logs/*.log
.vscode/settings.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "terminal.integrated.defaultProfile.windows": "Ubuntu"
3
+ }
README.md ADDED
@@ -0,0 +1,387 @@
1
+ ---
2
+ emoji: 🕵🏻‍♂️
3
+ colorFrom: indigo
4
+ colorTo: indigo
5
+ sdk: gradio
6
+ sdk_version: 5.35.0
7
+ app_file: app.py
8
+ pinned: false
9
+ hf_oauth: true
10
+ hf_oauth_expiration_minutes: 480
11
+ ---
12
+
13
+ # CMW Platform Agent
14
+
15
+ ---
16
+
17
+ **Authors:** Arte(r)m Sedov & Marat Mutalimov
18
+
19
+ **Github:** <https://github.com/arterm-sedov/>
20
+
21
+ **This repo:** <https://github.com/arterm-sedov/cmw-platform-agent>
22
+
23
+ ## 🚀 The CMW Platform Agent
24
+
25
+ Behold the CMW Platform Agent: a robust and extensible system designed for real-world reliability and performance.
26
+
27
+ ## 🕵🏻‍♂️ What is this project?
28
+
29
+ This is an **experimental multi-LLM agent** that demonstrates interaction between an AI agent and the CMW Platform:
30
+
31
+ - **Input**: The user asks the CMW Platform Agent to create entities in the CMW Platform instance.
32
+ - **Task**: The agent has a set of tools to translate natural-language user requests into CMW Platform API calls.
33
+
34
+ ## 🎯 Project Goals
35
+
36
+ To create an agent that will allow batch entity creation within the CMW Platform.
37
+
38
+ ## ❓ Why This Project?
39
+
40
+ This experimental system is based on current AI agent technology and demonstrates:
41
+
42
+ - **Advanced Tool Usage**: Seamless integration of 20+ specialized tools including AI-powered tools and third-party AI engines
43
+ - **Multi-Provider Resilience**: Automatic testing and switching between different LLM providers
44
+ - **Comprehensive Tracing**: Complete visibility into the agent's decision-making process
45
+ - **Structured Initialization Summary**: After startup, a clear table shows which models/providers are available, with/without
46
+ tools, and any errors, so you always know your agent's capabilities.
47
+
48
+ ## 📊 What You'll Find Here
49
+
50
+ - **Documentation**: Detailed technical specifications and usage guides
51
+
52
+ ## 🏗️ Technical Architecture
53
+
54
+ ### LLM Configuration
55
+
56
+ The agent uses a sophisticated multi-LLM approach with the following providers in sequence:
57
+
58
+ 1. **OpenRouter** (Primary)
59
+ - Models: `deepseek/deepseek-chat-v3-0324:free`, `mistralai/mistral-small-3.2-24b-instruct:free`, `openrouter/cypher-alpha:free`
60
+ - Token Limits: 100K-1M tokens
61
+ - Tool Support: ✅ Full tool-calling capabilities
62
+
63
+ 2. **Google Gemini** (Fallback)
64
+ - Model: `gemini-2.5-pro`
65
+ - Token Limit: 2M tokens (virtually unlimited)
66
+ - Tool Support: ✅ Full tool-calling capabilities
67
+
68
+ 3. **Groq** (Second Fallback)
69
+ - Model: `qwen-qwq-32b`
70
+ - Token Limit: 3K tokens
71
+ - Tool Support: ✅ Full tool-calling capabilities
72
+
73
+ 4. **HuggingFace** (Final Fallback)
74
+ - Models: `Qwen/Qwen2.5-Coder-32B-Instruct`, `microsoft/DialoGPT-medium`, `gpt2`
75
+ - Token Limits: 1K tokens
76
+ - Tool Support: ❌ No tool-calling (text-only responses)
77
+
78
+ ### Tool Suite
79
+
80
+ The agent includes 20+ specialized tools:
81
+
82
+ - **Attribute creation**: creates an attribute in a specified template.
83
+
84
+ ### Performance Expectations
85
+
86
+ - **Success Rate**: 50-65% entities created
87
+ - **Response Time**: 30-100 seconds per question (depending on complexity and LLM)
88
+ - **Tool Usage**: 2-8 tool calls per request on average
89
+ - **Fallback Rate**: 20-40% of questions require human clarification
90
+
91
+ ## Dataset Structure
92
+
93
+ The output trace facilitates:
94
+
95
+ - **Debugging**: Complete visibility into execution flow
96
+ - **Performance Analysis**: Detailed timing and token usage metrics
97
+ - **Error Analysis**: Comprehensive error information with context
98
+ - **Tool Usage Analysis**: Complete tool execution history
99
+ - **LLM Comparison**: Detailed comparison of different LLM behaviors
100
+ - **Cost Optimization**: Token usage analysis for cost management
101
+
102
+ Each request trace is uploaded to a HuggingFace dataset.
103
+
104
+ The dataset contains comprehensive execution traces with the following structure:
105
+
106
+ ### Root Level Fields
107
+
108
+ ```python
109
+ {
110
+ "question": str, # Original question text
111
+ "file_name": str, # Name of attached file (if any)
112
+ "file_size": int, # Length of base64 file data (if any)
113
+ "start_time": str, # ISO format timestamp when processing started
114
+ "end_time": str, # ISO format timestamp when processing ended
115
+ "total_execution_time": float, # Total execution time in seconds
116
+ "tokens_total": int, # Total tokens used across all LLM calls
117
+ "debug_output": str, # Comprehensive debug output as text
118
+ }
119
+ ```
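The root-level fields above can be assembled with a small helper. The following is a minimal sketch rather than the project's actual code: the function name `build_root_trace` and the sample values are hypothetical, and it assumes the timestamps are ISO strings without timezone suffixes.

```python
from datetime import datetime


def build_root_trace(question, start_time, end_time, tokens_total=0,
                     file_name="", file_size=0, debug_output=""):
    """Assemble the root-level trace fields (hypothetical helper)."""
    started = datetime.fromisoformat(start_time)
    ended = datetime.fromisoformat(end_time)
    return {
        "question": question,
        "file_name": file_name,
        "file_size": file_size,
        "start_time": start_time,
        "end_time": end_time,
        # Derived from the two timestamps so the three fields stay consistent
        "total_execution_time": (ended - started).total_seconds(),
        "tokens_total": tokens_total,
        "debug_output": debug_output,
    }


record = build_root_trace(
    "Create a text attribute",        # sample question, not from the dataset
    "2025-01-01T10:00:00",
    "2025-01-01T10:00:42",
    tokens_total=1234,
)
```

Deriving `total_execution_time` from the timestamps is one possible design; the agent may instead measure it directly with a timer.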
120
+
121
+ ### LLM Traces
122
+
123
+ ```python
124
+ "llm_traces": {
125
+ "llm_type": [ # e.g., "openrouter", "gemini", "groq", "huggingface"
126
+ {
127
+ "call_id": str, # e.g., "openrouter_call_1"
128
+ "llm_name": str, # e.g., "deepseek-chat-v3-0324" or "Google Gemini"
129
+ "timestamp": str, # ISO format timestamp
130
+
131
+ # === LLM CALL INPUT ===
132
+ "input": {
133
+ "messages": List, # Input messages (trimmed for base64)
134
+ "use_tools": bool, # Whether tools were used
135
+ "llm_type": str # LLM type
136
+ },
137
+
138
+ # === LLM CALL OUTPUT ===
139
+ "output": {
140
+ "content": str, # Response content
141
+ "tool_calls": List, # Tool calls from response
142
+ "response_metadata": dict, # Response metadata
143
+ "raw_response": dict # Full response object (trimmed for base64)
144
+ },
145
+
146
+ # === TOOL EXECUTIONS ===
147
+ "tool_executions": [
148
+ {
149
+ "tool_name": str, # Name of the tool
150
+ "args": dict, # Tool arguments (trimmed for base64)
151
+ "result": str, # Tool result (trimmed for base64)
152
+ "execution_time": float, # Time taken for tool execution
153
+ "timestamp": str, # ISO format timestamp
154
+ "logs": List # Optional: logs during tool execution
155
+ }
156
+ ],
157
+
158
+ # === TOOL LOOP DATA ===
159
+ "tool_loop_data": [
160
+ {
161
+ "step": int, # Current step number
162
+ "tool_calls_detected": int, # Number of tool calls detected
163
+ "consecutive_no_progress": int, # Steps without progress
164
+ "timestamp": str, # ISO format timestamp
165
+ "logs": List # Optional: logs during this step
166
+ }
167
+ ],
168
+
169
+ # === EXECUTION METRICS ===
170
+ "execution_time": float, # Time taken for this LLM call
171
+ "total_tokens": int, # Estimated token count (fallback)
172
+
173
+ # === TOKEN USAGE TRACKING ===
174
+ "token_usage": { # Detailed token usage data
175
+ "prompt_tokens": int, # Total prompt tokens across all calls
176
+ "completion_tokens": int, # Total completion tokens across all calls
177
+ "total_tokens": int, # Total tokens across all calls
178
+ "call_count": int, # Number of calls made
179
+ "calls": [ # Individual call details
180
+ {
181
+ "call_id": str, # Unique call identifier
182
+ "timestamp": str, # ISO format timestamp
183
+ "prompt_tokens": int, # This call's prompt tokens
184
+ "completion_tokens": int, # This call's completion tokens
185
+ "total_tokens": int, # This call's total tokens
186
+ "finish_reason": str, # How the call finished (optional)
187
+ "system_fingerprint": str, # System fingerprint (optional)
188
+ "input_token_details": dict, # Detailed input breakdown (optional)
189
+ "output_token_details": dict # Detailed output breakdown (optional)
190
+ }
191
+ ]
192
+ },
193
+
194
+ # === ERROR INFORMATION ===
195
+ "error": { # Only present if error occurred
196
+ "type": str, # Exception type name
197
+ "message": str, # Error message
198
+ "timestamp": str # ISO format timestamp
199
+ },
200
+
201
+ # === LLM-SPECIFIC LOGS ===
202
+ "logs": List, # Logs specific to this LLM call
203
+
204
+ # === FINAL ANSWER ENFORCEMENT ===
205
+ "final_answer_enforcement": [ # Optional: logs from _force_final_answer for this LLM call
206
+ {
207
+ "timestamp": str, # ISO format timestamp
208
+ "message": str, # Log message
209
+ "function": str # Function that generated the log (always "_force_final_answer")
210
+ }
211
+ ]
212
+ }
213
+ ]
214
+ }
215
+ ```
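To illustrate how the `token_usage` block relates to its `calls` list, here is a hedged sketch that aggregates per-call counts into the totals. The helper name `summarize_token_usage` is an assumption for illustration; the agent's own aggregation may differ, for example in how it handles providers that do not report usage.

```python
def summarize_token_usage(calls):
    """Aggregate per-call token counts into the token_usage shape (sketch)."""
    prompt = sum(call.get("prompt_tokens", 0) for call in calls)
    completion = sum(call.get("completion_tokens", 0) for call in calls)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "call_count": len(calls),
        "calls": list(calls),
    }


# Sample data shaped like the "calls" entries; the values are made up.
sample_calls = [
    {"call_id": "openrouter_call_1", "prompt_tokens": 100, "completion_tokens": 20},
    {"call_id": "openrouter_call_2", "prompt_tokens": 40, "completion_tokens": 5},
]
usage = summarize_token_usage(sample_calls)
```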
216
+
217
+ ### Per-LLM Stdout Capture
218
+
219
+ ```python
220
+ "per_llm_stdout": [
221
+ {
222
+ "llm_type": str, # LLM type
223
+ "llm_name": str, # LLM name (model ID or provider name)
224
+ "call_id": str, # Call ID
225
+ "timestamp": str, # ISO format timestamp
226
+ "stdout": str # Captured stdout content
227
+ }
228
+ ]
229
+ ```
230
+
231
+ ### Question-Level Logs
232
+
233
+ ```python
234
+ "logs": [
235
+ {
236
+ "timestamp": str, # ISO format timestamp
237
+ "message": str, # Log message
238
+ "function": str # Function that generated the log
239
+ }
240
+ ]
241
+ ```
242
+
243
+ ### Final Results
244
+
245
+ ```python
246
+ "final_result": {
247
+ "submitted_answer": str, # Final answer (consistent with code)
248
+ "similarity_score": float, # Similarity score (0.0-1.0)
249
+ "llm_used": str, # LLM that provided the answer
250
+ "reference": str, # Reference answer used
251
+ "question": str, # Original question
252
+ "file_name": str, # File name (if any)
253
+ "error": str # Error message (if any)
254
+ }
255
+ ```
256
+
257
+ ## Key Features
258
+
259
+ ### Intelligent Fallback System
260
+
261
+ The agent automatically tries multiple LLM providers in sequence:
262
+
263
+ - **OpenRouter** (Primary): Fast, reliable, good tool support, has tight daily limits on free tiers
264
+ - **Google Gemini** (Fallback): High token limits, excellent reasoning
265
+ - **Groq** (Second Fallback): Fast inference, good for simple tasks, has tight token limits per request
266
+ - **HuggingFace** (Final Fallback): Local models, no API costs, does not support tools typically
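The sequential fallback described above can be sketched as a loop over provider callables. This is an illustration under stated assumptions, not the implementation in `agent.py`: `ask_with_fallback` and the stub providers are hypothetical, and real providers would be LLM client calls.

```python
from typing import Callable, Dict, List, Optional, Tuple


def ask_with_fallback(
    question: str,
    providers: List[Tuple[str, Callable[[str], str]]],
) -> Dict[str, object]:
    """Try each provider in order and return the first non-empty answer."""
    errors: Dict[str, str] = {}
    for name, call in providers:
        try:
            answer: Optional[str] = call(question)
        except Exception as exc:  # any provider failure triggers the fallback
            errors[name] = f"{type(exc).__name__}: {exc}"
            continue
        if answer and answer.strip():
            return {"answer": answer, "llm_used": name, "errors": errors}
        errors[name] = "empty response"
    return {"answer": None, "llm_used": None, "errors": errors}


# Stub providers standing in for real LLM clients (assumptions).
def rate_limited(question: str) -> str:
    raise RuntimeError("daily limit reached")


def echo(question: str) -> str:
    return f"stub answer to: {question}"


result = ask_with_fallback("ping", [("openrouter", rate_limited), ("gemini", echo)])
```

Recording the error per provider keeps the fallback decision visible in the trace instead of silently swallowing it.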
267
+
268
+ ### Advanced Tool Management
269
+
270
+ - **Automatic Tool Selection**: LLM chooses appropriate tools based on question
271
+ - **Tool Deduplication**: Prevents duplicate tool calls using vector similarity
272
+ - **Usage Limits**: Prevents excessive tool usage (e.g., max 3 web searches per question)
273
+ - **Error Handling**: Graceful degradation when tools fail
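Deduplication and usage limits can be combined in one guard object. The sketch below is an assumption-laden illustration: it uses bag-of-words cosine similarity so that it runs without dependencies, whereas the agent presumably compares embedding vectors; `ToolCallGuard`, the 0.9 threshold and the limit of 3 are hypothetical.

```python
import math
from collections import Counter


def _vectorize(tool_name, args):
    """Turn a tool call into a bag-of-words vector (stand-in for an embedding)."""
    text = f"{tool_name} " + " ".join(f"{k} {v}" for k, v in sorted(args.items()))
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToolCallGuard:
    """Reject near-duplicate tool calls and calls beyond a per-tool limit."""

    def __init__(self, similarity_threshold=0.9, max_calls_per_tool=3):
        self.similarity_threshold = similarity_threshold
        self.max_calls_per_tool = max_calls_per_tool
        self._seen = []
        self._counts = Counter()

    def allow(self, tool_name, args):
        if self._counts[tool_name] >= self.max_calls_per_tool:
            return False  # usage limit reached for this tool
        vector = _vectorize(tool_name, args)
        if any(_cosine(vector, prev) >= self.similarity_threshold for prev in self._seen):
            return False  # too similar to an earlier call
        self._seen.append(vector)
        self._counts[tool_name] += 1
        return True


guard = ToolCallGuard()
first = guard.allow("web_search", {"query": "CMW Platform attribute types"})
repeat = guard.allow("web_search", {"query": "CMW Platform attribute types"})
other = guard.allow("web_search", {"query": "template list"})
```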
274
+
275
+ ### Sophisticated implementations
276
+
277
+ - **Recursive Truncation**: Separate methods for base64 and max-length truncation
278
+ - **Recursive JSON Serialization**: Ensures complex objects are serializable for upload as a HuggingFace JSON dataset
279
+ - **Decorator-Based Print Capture**: Captures all print statements into trace data
280
+ - **Multilevel Contextual Logging**: Logs tied to specific execution contexts
281
+ - **Per-LLM Stdout Traces**: Stdout captured separately for each LLM attempt in a human-readable form
282
+ - **Consistent LLM Schema**: Data structures for consistent model identification, configuring and calling
283
+ - **Complete Trace Model**: Hierarchical structure with comprehensive coverage
284
+ - **Structured dataset uploads** to HuggingFace datasets
285
+ - **Schema validation** against `dataset_config.json`
286
+ - **Three data splits**: `init` (initialization), `runs` (legacy aggregated results), and `runs_new` (granular per-question results)
287
+ - **Robust error handling** with fallback mechanisms
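Recursive truncation and JSON-safe conversion can be sketched together as a single tree walk. This is a simplified illustration: the README mentions separate methods for base64 and max-length truncation, while this hypothetical `truncate_for_trace` merges them, and the base64 check is a rough heuristic (any 200+ character string made only of base64 characters).

```python
import json
import re

# Heuristic: a long string containing only base64 characters (assumption).
_BASE64_RE = re.compile(r"^[A-Za-z0-9+/=\s]{200,}$")


def truncate_for_trace(obj, max_len=100):
    """Recursively shorten strings and coerce values into JSON-safe types."""
    if isinstance(obj, dict):
        return {str(k): truncate_for_trace(v, max_len) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [truncate_for_trace(v, max_len) for v in obj]
    if isinstance(obj, str):
        if _BASE64_RE.match(obj):
            return f"[base64 data, {len(obj)} chars]"
        if len(obj) > max_len:
            return obj[:max_len] + f"... [truncated, {len(obj)} chars total]"
        return obj
    if obj is None or isinstance(obj, (bool, int, float)):
        return obj
    return str(obj)  # fallback for objects json.dumps cannot handle


payload = {"file": "QUJD" * 100, "note": "x" * 150, "tags": ("a", "b"), "n": 3}
trimmed = truncate_for_trace(payload)
serialized = json.dumps(trimmed)
```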
288
+
289
+ ### Comprehensive Tracing
290
+
291
+ Every question generates a complete execution trace including:
292
+
293
+ - **LLM Interactions**: All input/output for each LLM attempt
294
+ - **Tool Executions**: Detailed logs of every tool call
295
+ - **Performance Metrics**: Token usage, execution times, success rates
296
+ - **Error Information**: Complete error context and fallback decisions
297
+ - **Stdout Capture**: All debug output from each LLM attempt
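Stdout capture of this kind is commonly built on `contextlib.redirect_stdout`. The decorator below is a minimal sketch of that idea, not the agent's actual decorator: the name `capture_stdout`, the `trace` dictionary layout and the stub `call_llm` are assumptions.

```python
import contextlib
import functools
import io


def capture_stdout(trace, key="stdout"):
    """Decorator factory: append everything a function prints to trace[key]."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            buffer = io.StringIO()
            try:
                with contextlib.redirect_stdout(buffer):
                    return func(*args, **kwargs)
            finally:
                # Store the output even if the wrapped call raised
                trace.setdefault(key, []).append(buffer.getvalue())
        return wrapper
    return decorator


trace = {}


@capture_stdout(trace)
def call_llm(question):
    print(f"calling stub LLM with: {question}")
    return "stub answer"


answer = call_llm("ping")
```

Note that `redirect_stdout` swaps `sys.stdout` globally, so concurrent calls in threads would need a different mechanism.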
298
+
299
+ ### Rate Limiting & Reliability
300
+
301
+ - **Smart Rate Limiting**: Different intervals for different providers
302
+ - **Token Management**: Automatic truncation and summarization
303
+ - **Error Recovery**: Automatic retry with different LLMs
304
+ - **Graceful Degradation**: Continues processing even if some components fail
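Per-provider rate limiting can be sketched as a minimum interval between calls to the same provider. The class name `ProviderRateLimiter` and the interval values are hypothetical; the real intervals are not stated in this README. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class ProviderRateLimiter:
    """Enforce a minimum interval between calls to the same provider."""

    def __init__(self, min_intervals, default_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_intervals = dict(min_intervals)
        self.default_interval = default_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = {}

    def wait(self, provider):
        """Sleep if needed before calling `provider`; return the delay used."""
        interval = self.min_intervals.get(provider, self.default_interval)
        now = self._clock()
        last = self._last_call.get(provider)
        delay = 0.0 if last is None else max(0.0, interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_call[provider] = now + delay
        return delay


# Demonstration with a fake clock; the 2-second interval is an assumption.
slept = []
fake_now = [100.0]
limiter = ProviderRateLimiter({"groq": 2.0}, clock=lambda: fake_now[0], sleep=slept.append)
first_delay = limiter.wait("groq")
fake_now[0] = 100.5
second_delay = limiter.wait("groq")
```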
305
+
306
+ ## Usage
307
+
308
+ ### Live Demo
309
+
310
+ Visit the Gradio interface to test the agent interactively:
311
+
312
+ <https://localhost/cmw-platform-agent>
313
+
314
+ ### Programmatic Usage
315
+
316
+ ```python
317
+ from agent import GaiaAgent
318
+
319
+ # Initialize the agent
320
+ agent = GaiaAgent()
321
+
322
+ # Process a question
323
+ result = agent("What is the capital of France?")
324
+
325
+ # Access the results
326
+ print(f"Answer: {result['submitted_answer']}")
327
+ print(f"Similarity: {result['similarity_score']}")
328
+ print(f"LLM Used: {result['llm_used']}")
329
+ ```
330
+
331
+ ### Dataset Access
332
+
333
+ ```python
334
+ from datasets import load_dataset
335
+
336
+ # Load the dataset
337
+ dataset = load_dataset("arterm-sedov/agent-course-final-assignment")
338
+
339
+ # Access initialization data
340
+ init_data = dataset["init"]["train"]
341
+
342
+ # Access evaluation results
343
+ runs_data = dataset["runs_new"]["train"]
344
+ ```
345
+
346
+ ## File Structure
347
+
348
+ The main agent runtime files are:
349
+
350
+ ```
351
+ gaia-agent/
352
+ ├── agent.py # Main agent implementation
353
+ ├── app.py # Gradio web interface
354
+ ├── tools.py # Tool definitions and implementations
355
+ ├── utils.py # Core upload functions with validation
356
+ ├── system_prompt.json # System prompt configuration
357
+ └── logs/ # Execution logs and results
358
+ ```
359
+
360
+ Other files in the root directory are not used at runtime; they are for setting up the Supabase vector store.
361
+
362
+ ## Performance Statistics
363
+
364
+ The agent has been evaluated on complex benchmark questions with the following results:
365
+
366
+ - **Overall Success Rate**: 50-65%, up to 80% with all four LLMs available
367
+ - **Tool Usage**: Average 2-8 tools per question
368
+ - **LLM Fallback Rate**: 20-40% of questions require multiple LLMs
369
+ - **Response Time**: 30-120 seconds per question
370
+ - **Token Usage**: 1K-100K tokens per question (depending on complexity)
371
+
372
+ ## Contributing
373
+
374
+ This is an experimental research project. Contributions are welcome in the form of:
375
+
376
+ - **Bug Reports**: Issues with the agent's reasoning or tool usage
377
+ - **Feature Requests**: New tools or capabilities
378
+ - **Performance Improvements**: Optimizations for speed or accuracy
379
+ - **Documentation**: Improvements to this README or code comments
380
+
381
+ ## License
382
+
383
+ This project is part of the Hugging Face Agents Course final assignment. See the course materials for licensing information.
384
+
385
+ ---
386
+
387
+ **Built with ❤️ by Arte(r)m Sedov using Cursor IDE**
SETUP_INSTRUCTIONS.md ADDED
@@ -0,0 +1,222 @@
1
+ # arterm-sedov Setup Instructions
2
+
3
+ ## Overview
4
+
5
+ Welcome to the arterm-sedov CMW Platform Agent project! This guide ensures a smooth setup for both Windows and Linux/macOS, leveraging robust multi-LLM orchestration, model-level tool support, and transparent initialization diagnostics.
6
+
7
+ ## Prerequisites
8
+
9
+ - **Python 3.8 or higher**
10
+ - **Git** (for cloning)
11
+ - **Internet connection**
12
+
13
+ ## Quick Start
14
+
15
+ ### Option 1: Automated Setup (Recommended)
16
+
17
+ ```bash
18
+ # Clone the repository (if not already done)
19
+ git clone <repository-url>
20
+ cd arterm-sedov
21
+
22
+ # Run the automated setup script
23
+ python setup_venv.py
24
+ ```
25
+
26
+ This script will:
27
+ - Detect your platform and Python version
28
+ - Create a virtual environment
29
+ - Use the correct requirements file for your OS
30
+ - Install all dependencies in order
31
+ - Verify installation and print next steps
32
+ - Print a summary of LLM/model initialization and tool support
33
+
34
+ ### Option 2: Manual Setup
35
+
36
+ #### Step 1: Create Virtual Environment
37
+
38
+ **Windows:**
39
+ ```cmd
40
+ python -m venv venv
41
+ venv\Scripts\activate
42
+ ```
43
+
44
+ **Linux/macOS:**
45
+ ```bash
46
+ python3 -m venv venv
47
+ source venv/bin/activate
48
+ ```
49
+
50
+ #### Step 2: Install Dependencies
51
+
52
+ **For Windows:**
53
+ ```bash
54
+ python -m pip install --upgrade pip
55
+ pip install wheel setuptools
56
+ pip install -r requirements.win.txt
57
+ ```
58
+
59
+ **For Linux/macOS:**
60
+ ```bash
61
+ python -m pip install --upgrade pip
62
+ pip install -r requirements.txt
63
+ ```
64
+
65
+ ## Requirements Files
66
+
67
+ - `requirements.txt`: For Linux/macOS/Hugging Face Spaces
68
+ - `requirements.win.txt`: For Windows (handles platform quirks)
69
+
70
+ The setup script auto-selects the right file for you.
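The file selection can be pictured as a one-line platform check. This is a sketch of what `setup_venv.py` might do, not a copy of it; the function name `select_requirements_file` is hypothetical.

```python
import platform


def select_requirements_file(system=None):
    """Pick the requirements file for the given (or current) operating system."""
    system = system or platform.system()
    return "requirements.win.txt" if system == "Windows" else "requirements.txt"


chosen = select_requirements_file()
```

The returned name can then be passed to `pip install -r`.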
71
+
72
+ ## Environment Variables Setup
73
+
74
+ Create a `.env` file in the project root:
75
+
76
+ ```env
77
+ # Required for Google Gemini integration
78
+ GEMINI_KEY=your_gemini_api_key_here
79
+ # Required for Supabase vector store
80
+ SUPABASE_URL=your_supabase_url_here
81
+ SUPABASE_KEY=your_supabase_key_here
82
+ # Optional: For HuggingFace, OpenRouter, Groq
83
+ HUGGINGFACEHUB_API_TOKEN=your_hf_token
84
+ OPENROUTER_API_KEY=your_openrouter_key
85
+ GROQ_API_KEY=your_groq_key
86
+ ```
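Before starting the agent, it can help to check which of these variables are actually set. The sketch below only inspects the process environment; it assumes the `.env` file has already been loaded (for example with `load_dotenv()` from python-dotenv, if that package is installed). The helper name `check_env` is hypothetical.

```python
import os

# Variable names taken from the .env template above
REQUIRED_VARS = ("GEMINI_KEY", "SUPABASE_URL", "SUPABASE_KEY")
OPTIONAL_VARS = ("HUGGINGFACEHUB_API_TOKEN", "OPENROUTER_API_KEY", "GROQ_API_KEY")


def check_env(environ=None):
    """Report which required and optional variables are missing or empty."""
    environ = os.environ if environ is None else environ
    return {
        "missing_required": [name for name in REQUIRED_VARS if not environ.get(name)],
        "missing_optional": [name for name in OPTIONAL_VARS if not environ.get(name)],
    }


# Example with an explicit mapping instead of the real environment
report = check_env({"GEMINI_KEY": "x", "SUPABASE_URL": "https://example.invalid"})
```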
87
+
88
+ ### Getting API Keys
89
+
90
+ - **Google Gemini:** [Google AI Studio](https://makersuite.google.com/app/apikey)
91
+ - **Supabase:** [supabase.com](https://supabase.com) > Settings > API
92
+ - **HuggingFace:** [HuggingFace Tokens](https://huggingface.co/settings/tokens)
93
+
94
+ ## Vector Store Setup
95
+
96
+ ```bash
97
+ python setup_vector_store.py
98
+ ```
99
+ This loads reference Q&A into Supabase for similarity search.
100
+
101
+ ## Running the Agent
102
+
103
+ ```bash
104
+ python app.py
105
+ ```
106
+ This launches the Gradio web interface for interactive testing and evaluation.
107
+
108
+ ## LLM Initialization & Tool Support
109
+
110
+ - On startup, each LLM/model is tested for plain and tool-calling support.
111
+ - **Google Gemini** is always bound with tools if enabled, even if the tool test returns empty (tool-calling works in practice; a warning is logged for transparency).
112
+ - **OpenRouter, Groq, and HuggingFace** are supported with model-level tool-calling detection and fallback.
113
+ - After initialization, a summary table is printed showing provider, model, plain/tools status, and any errors, so you always know what's available.
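A summary table of this kind can be rendered with plain string formatting. The following sketch is illustrative only: the function `format_init_summary`, the result keys (`provider`, `model`, `plain_ok`, `tools_ok`, `error`) and the sample rows are assumptions, not the agent's actual data model or output.

```python
def format_init_summary(results):
    """Render initialization results as a fixed-width text table."""
    headers = ("Provider", "Model", "Plain", "Tools", "Error")
    rows = [
        (
            r["provider"],
            r["model"],
            "OK" if r["plain_ok"] else "FAIL",
            "OK" if r["tools_ok"] else "FAIL",
            r.get("error") or "",
        )
        for r in results
    ]
    # Column width = widest cell in that column, header included
    widths = [max(len(str(cell)) for cell in col) for col in zip(headers, *rows)]

    def fmt(row):
        return " | ".join(str(c).ljust(w) for c, w in zip(row, widths)).rstrip()

    lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Made-up sample results for illustration
table = format_init_summary([
    {"provider": "gemini", "model": "gemini-2.5-pro", "plain_ok": True, "tools_ok": True},
    {"provider": "huggingface", "model": "gpt2", "plain_ok": True, "tools_ok": False,
     "error": "no tool support"},
])
print(table)
```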
114
+
115
+ ## Troubleshooting
116
+
117
+ ### Common Issues
118
+
119
+ 1. **Wrong requirements file used:**
120
+ - The setup script auto-detects your platform. To force a file:
121
+ ```bash
122
+ pip install -r requirements.win.txt # Windows
123
+ pip install -r requirements.txt # Linux/macOS
124
+ ```
125
+ 2. **Virtual environment creation fails:**
126
+ - Remove and recreate:
127
+ ```bash
128
+ rm -rf venv # Linux/macOS
129
+ rmdir /s /q venv # Windows
130
+ python setup_venv.py
131
+ ```
132
+ 3. **Permission errors:**
133
+ - Use `--user` flag:
134
+ ```bash
135
+ pip install --user -r requirements.txt
136
+ ```
137
+ 4. **Import errors after install:**
138
+ - Reinstall dependencies:
139
+ ```bash
140
+ pip install --force-reinstall -r requirements.txt
141
+ ```
142
+ 5. **API key issues:**
143
+ - Check your `.env` file for correct format and valid keys.
144
+
145
+ ### Platform-Specific Issues
146
+
147
+ **Windows:**
148
+ - PowerShell execution policy: `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser`
149
+ - Visual Studio Build Tools may be required for TensorFlow. Or use conda:
150
+ ```cmd
151
+ conda install pandas numpy
152
+ pip install -r requirements.win.txt
153
+ ```
154
+
155
+ **Linux/macOS:**
156
+ - Install system packages if needed:
157
+ ```bash
158
+ sudo apt-get install python3-dev build-essential # Ubuntu/Debian
159
+ xcode-select --install # macOS
160
+ ```
161
+
162
+ ## Verification
163
+
164
+ After setup, verify everything works:
165
+
166
+ ```python
167
+ import numpy as np
168
+ import pandas as pd
169
+ import langchain
170
+ import supabase
171
+ import gradio
172
+ print("✅ All core packages imported successfully!")
173
+ print(f"Pandas version: {pd.__version__}")
174
+ ```
175
+
176
+ ## Project Structure
177
+
178
+ ```
179
+ arterm-sedov/
180
+ ├── agent.py # Main agent implementation
181
+ ├── app.py # Gradio web interface
182
+ ├── tools.py # Tool functions for the agent
183
+ ├── setup_venv.py # Cross-platform setup script
184
+ ├── setup_vector_store.py # Vector store initialization
185
+ ├── requirements.txt # Dependencies (Linux/macOS/HF Space)
186
+ ├── requirements.win.txt # Dependencies (Windows)
187
+ ├── system_prompt.txt # Agent system prompt
188
+ ├── metadata.jsonl # Reference Q&A data
189
+ ├── supabase_docs.csv # Vector store backup
190
+ └── .env # Environment variables (create this)
191
+ ```
192
+
193
+ ## Advanced Configuration
194
+
195
+ ### Custom Model Providers
196
+
197
+ The agent supports multiple LLM providers with robust fallback and model-level tool support:
198
+ - **Google Gemini**: Always bound with tools if enabled (tool-calling works even if test is empty)
199
+ - **Groq, OpenRouter, HuggingFace**: Model-level tool-calling detection and fallback
200
+
201
+ ### Vector Store Configuration
202
+ - **Table:** `agent_course_reference`
203
+ - **Embedding Model:** `sentence-transformers/all-mpnet-base-v2`
204
+ - **Similarity Search:** Cosine similarity
205
+
206
+ ### Tool Configuration
207
+ - Math, web, file, image, chess, code, and more: modular and extensible
208
+
209
+ ## Support
210
+
211
+ - See the summary table after startup for LLM/model/tool status
212
+ - Review error logs for diagnostics
213
+ - For advanced help, see the troubleshooting section above
214
+
215
+ ## Next Steps
216
+
217
+ 1. **Test the agent** with sample questions
218
+ 2. **Run the evaluation** for performance metrics
219
+ 3. **Submit to CMW Platform Agent benchmark** for scoring
220
+ 4. **Customize the agent** for your needs
221
+
222
+ The agent is now ready for the CMW Platform benchmark: battle-tested, transparent, and extensible. 🚀
agent.py ADDED
The diff for this file is too large to render. See raw diff
 
app.py ADDED
@@ -0,0 +1,735 @@
1
+ import os
2
+ import gradio as gr
3
+ import requests
4
+ import inspect
5
+ import pandas as pd
6
+ import random
7
+ import datetime
8
+ import subprocess
9
+ import json
10
+ import re
11
+ import base64
12
+ from typing import Any
13
+ from agent import GaiaAgent
14
+ from utils import TRACES_DIR, upload_run_data, ensure_valid_answer
15
+
16
+ # (Keep Constants as is)
17
+ # --- Constants ---
18
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
19
+
20
+ # --- Main Agent Definition ---
21
+ # Instantiate the agent once (choose provider as needed)
22
+ AGENT_PROVIDER = os.environ.get("AGENT_PROVIDER", "google")
23
+ try:
24
+ agent = GaiaAgent(provider=AGENT_PROVIDER)
25
+ except Exception as e:
26
+ agent = None
27
+ print(f"Error initializing GaiaAgent: {e}")
28
+
29
+
30
+
31
+ # Helper to save DataFrame as CSV and upload via API
32
+ def save_df_to_csv(df, path):
33
+ try:
34
+ # Convert DataFrame to CSV string
35
+ csv_content = df.to_csv(index=False, encoding="utf-8")
36
+
37
+ # Upload via API
38
+ success = save_and_commit_file(
39
+ file_path=path,
40
+ content=csv_content,
41
+ commit_message=f"Add results CSV {path}"
42
+ )
43
+ if success:
44
+ print(f"✅ Results CSV uploaded successfully: {path}")
45
+ else:
46
+ print(f"⚠️ Results CSV upload failed, saved locally only: {path}")
47
+ # Fallback to local save
48
+ df.to_csv(path, index=False, encoding="utf-8")
49
+ except Exception as e:
50
+ print(f"⚠️ Results CSV upload error: {e}, saving locally only")
51
+ # Fallback to local save
52
+ df.to_csv(path, index=False, encoding="utf-8")
53
+
54
+ return path
55
+
56
+ # --- Provide init log for download on app load ---
57
+ def get_init_log():
58
+ init_log_path = getattr(agent, "init_log_path", None)
59
+ if init_log_path and os.path.exists(init_log_path):
60
+ return init_log_path
61
+ return None
62
+
63
+ def generate_run_id(timestamp: str, idx: int) -> str:
64
+ """Generate a unique run ID for a question."""
65
+ return f"{timestamp}_q{idx+1:02d}"
66
+
67
+ def upload_questions_with_results(results_log: list, timestamp: str, username: str, total_score: str, success_type: str = "final"):
68
+ """
69
+ Upload all questions with their results to the runs_new dataset.
70
+
71
+ Args:
72
+ results_log: List of question results
73
+ timestamp: Timestamp for run IDs
74
+ username: Username for the run
75
+ total_score: Final score from evaluator
76
+ success_type: Type of upload ("final evaluated results" or "unevaluated results")
77
+ """
78
+ successful_uploads = 0
79
+ for idx, result in enumerate(results_log):
80
+ try:
81
+ run_id = generate_run_id(timestamp, idx)
82
+
83
+ # Get LLM stats JSON for this run
84
+ llm_stats_json = agent._get_llm_stats_json()
85
+
86
+ # Create updated run data for this question
87
+ run_data = create_run_data_for_runs_new(
88
+ run_id,
89
+ idx,
90
+ len(results_log),
91
+ result,
92
+ llm_stats_json,
93
+ username,
94
+ total_score
95
+ )
96
+
97
+ success = upload_run_data(run_data, split="runs_new")
98
+ if success:
99
+ print(f"✅ Uploaded question {idx+1} with {success_type}. Run ID: {run_id}")
100
+ successful_uploads += 1
101
+ else:
102
+ print(f"⚠️ Failed to upload question {idx+1} with {success_type}")
103
+
104
+ except Exception as e:
105
+ print(f"⚠️ Failed to upload question {idx+1}. Error: {e}")
106
+
107
+ return successful_uploads
108
+
109
+ def create_run_data_for_runs_new(
110
+ run_id: str,
111
+ idx: int,
112
+ total_questions: int,
113
+ result: dict,
114
+ llm_stats_json: dict,
115
+ username: str = "N/A",
116
+ total_score: str = "N/A"
117
+ ) -> dict:
118
+ """
119
+ Create run data for the runs_new split.
120
+
121
+ Args:
122
+ run_id: Unique identifier for the run
123
+ idx: Index of the question in the batch (0-based)
124
+ total_questions: Total number of questions in the batch
125
+ result: Individual result dictionary
126
+ llm_stats_json: LLM statistics JSON
127
+ username: Username of the person running the agent
128
+ total_score: Overall score for the complete evaluation run
129
+
130
+ Returns:
131
+ dict: Run data for upload to runs_new split
132
+ """
133
+ # Extract trace data from result
134
+ trace = result.get("trace", {})
135
+
136
+ # Extract final_result from trace
137
+ final_result = trace.get("final_result", {})
138
+
139
+ file_name = trace.get("file_name", "")
140
+
141
+ question = trace.get("question", "")
142
+
143
+ return {
144
+ "run_id": run_id,
145
+ "questions_count": f"{idx+1}/{total_questions}",
146
+ "input_data": json.dumps([{
147
+ "task_id": result.get("task_id", f"task_{idx+1:03d}"),
148
+ "question": question or "N/A",
149
+ "file_name": file_name or "N/A"
150
+ }]),
151
+ "reference_answer": final_result.get("reference", "N/A"),
152
+ "final_answer": final_result.get("submitted_answer", "N/A"),
153
+ "reference_similarity": float(final_result.get("similarity_score", 0.0)),
154
+ "question": question or "N/A",
155
+ "file_name": file_name or "N/A",
156
+ "file_size": trace.get("file_size", 0),
157
+ "llm_used": final_result.get("llm_used", "N/A"), # LLM used
158
+ "llm_stats_json": json.dumps(llm_stats_json), # LLM statistics JSON
159
+ "total_score": total_score or "N/A", # Overall score for the complete evaluation run
160
+ "start_time": trace.get("start_time") or "N/A", # Start time with fallback
161
+ "end_time": trace.get("end_time") or "N/A", # End time with fallback
162
+ "total_execution_time": float(trace.get("total_execution_time", 0.0)), # Total execution time with fallback, ensure float
163
+ "tokens_total": int(trace.get("tokens_total", 0)), # Tokens total with fallback, ensure int
164
+ "llm_traces_json": json.dumps(trace.get("llm_traces", {})),
165
+ "logs_json": json.dumps(trace.get("logs", [])),
166
+ "per_llm_stdout_json": json.dumps(trace.get("per_llm_stdout", [])),
167
+ "full_debug": trace.get("debug_output", "N/A"),
168
+ "error": final_result.get("error", "N/A"), # Error information
169
+ "username": username.strip() if username else "N/A"
170
+ }
171
+
172
+ def run_and_submit_all(profile: gr.OAuthProfile | None):
173
+ """
174
+ Fetches all questions, runs the GaiaAgent on them, submits all answers,
175
+ and displays the results.
176
+ """
177
+ space_id = os.getenv("SPACE_ID")
178
+ if profile:
179
+ username = f"{profile.username}"
180
+ print(f"User logged in: {username}")
181
+ else:
182
+ print("User not logged in.")
183
+ return "Please Login to Hugging Face with the button.", None
184
+
185
+ api_url = DEFAULT_API_URL
186
+ questions_url = f"{api_url}/questions"
187
+ submit_url = f"{api_url}/submit"
188
+
189
+ # 1. Instantiate Agent (already done globally)
190
+ if agent is None:
191
+ return "Error initializing agent. Check logs for details.", None
192
+ agent_code = f"https://huggingface.co/spaces/{username}/agent-course-final-assignment/tree/main"
193
+ print(agent_code)
194
+
195
+ # 2. Fetch Questions
196
+ print(f"Fetching questions from: {questions_url}")
197
+ try:
198
+ response = requests.get(questions_url, timeout=15)
199
+ response.raise_for_status()
200
+ questions_data = response.json()
201
+ if not questions_data:
202
+ print("Fetched questions list is empty.")
203
+ return "Fetched questions list is empty or invalid format.", None
204
+ print(f"Fetched {len(questions_data)} questions.")
205
+ except requests.exceptions.JSONDecodeError as e:  # must precede RequestException, its base class, to be reachable
206
+ print(f"Error decoding JSON response from questions endpoint: {e}")
207
+ print(f"Response text: {response.text[:500]}")
208
+ return f"Error decoding server response for questions: {e}", None
209
+ except requests.exceptions.RequestException as e:
210
+ print(f"Error fetching questions: {e}")
211
+ return f"Error fetching questions: {e}", None
212
+ except Exception as e:
213
+ print(f"An unexpected error occurred fetching questions: {e}")
214
+ return f"An unexpected error occurred fetching questions: {e}", None
215
+
216
+ # 3. Run the Agent
217
+ results_log = []
218
+ results_log_df = []
219
+ answers_payload = []
220
+ print(f"Running GaiaAgent on {len(questions_data)} questions...")
221
+ # Shuffle the questions: all of them are processed, in random order
222
+ questions_data = random.sample(questions_data, len(questions_data))
223
+ # DEBUG: Select one random task instead of all
224
+ # questions_data = random.sample(questions_data, 1)
225
+ #questions_data = [questions_data[0]]
226
+
227
+ for item in questions_data:
228
+ task_id = item.get("task_id")
229
+ question_text = item.get("question")
230
+ file_name = item.get("file_name", "") # Extract file_name from question data
231
+
232
+ if not task_id or question_text is None:
233
+ print(f"Skipping item with missing task_id or question: {item}")
234
+ continue
235
+
236
+ # Download file if one is referenced
237
+ file_data = None
238
+ if file_name and file_name.strip():
239
+ try:
240
+ print(f"\U0001F4C1 Downloading file: {file_name} for task {task_id}")
241
+ file_url = f"{api_url}/files/{task_id}"
242
+ file_response = requests.get(file_url, timeout=30)
243
+ file_response.raise_for_status()
244
+
245
+ # Convert file to base64
246
+ file_data = base64.b64encode(file_response.content).decode('utf-8')
247
+ print(f"✅ Downloaded and encoded file: {file_name} ({len(file_data)} chars)")
248
+ except Exception as e:
249
+ print(f"⚠️ Failed to download file {file_name} for task {task_id}: {e}")
250
+ file_data = None
251
+
252
+ try:
253
+ # Pass both question text and file data to agent
254
+ if file_data:
255
+ # Create enhanced question with file context
256
+ enhanced_question = f"{question_text}\n\n[File attached: {file_name} - base64 encoded data available]"
257
+ agent_result = agent(enhanced_question, file_data=file_data, file_name=file_name)
258
+ else:
259
+ agent_result = agent(question_text)
260
+
261
+ # Extract answer and additional info from agent result
262
+ # Extract data from the trace structure
263
+ trace = agent_result # The entire trace is now the result
264
+ final_result = trace.get("final_result", {})
265
+ submitted_answer = final_result.get("submitted_answer", "N/A")
266
+
267
+ # Use helper function to ensure valid answer
268
+ submitted_answer = ensure_valid_answer(submitted_answer)
269
+
270
+ reference_similarity = final_result.get("similarity_score", 0.0)
271
+ llm_used = final_result.get("llm_used", "unknown")
272
+ reference_answer = final_result.get("reference", "N/A")
273
+ question_text = trace.get("question", "")
274
+ file_name = trace.get("file_name", "")
275
+
276
+
277
+ answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
278
+ results_log.append({
279
+ "task_id": task_id,
280
+ "trace": trace,
281
+ "full_debug": ""
282
+ })
283
+ # Shorter results for dataframe for gradio table
284
+ results_log_df.append({
285
+ "task_id": task_id,
286
+ "question": question_text,
287
+ "file_name": file_name,
288
+ "submitted_answer": submitted_answer,
289
+ "reference_answer": reference_answer,
290
+ "reference_similarity": reference_similarity,
291
+ "llm_used": llm_used
292
+ })
293
+ except Exception as e:
294
+ print(f"Error running agent on task {task_id}: {e}")
295
+ results_log.append({
296
+ "task_id": task_id,
297
+ "question": question_text,
298
+ "file_name": file_name,
299
+ "submitted_answer": f"AGENT ERROR: {e}",
300
+ "reference_answer": "N/A",  # not available when the agent call raised
301
+ "reference_similarity": 0.0,
302
+ "llm_used": "none",
303
+ "trace": {},  # no trace for this task; avoids an unbound or stale `trace`
304
+ "full_debug": "",
305
+ "error": str(e)
306
+ })
307
+ results_log_df.append({
308
+ "task_id": task_id,
309
+ "question": question_text,
310
+ "file_name": file_name,
311
+ "submitted_answer": f"AGENT ERROR: {e}",
312
+ "reference_answer": "N/A",
313
+ "reference_similarity": 0.0,
314
+ "llm_used": "none"
315
+ })
316
+
317
+ # --- Convert results to dataframe ---
318
+ results_df = pd.DataFrame(results_log_df)
319
+
320
+ if not answers_payload:
321
+ print("Agent did not produce any answers to submit.")
322
+ return "Agent did not produce any answers to submit.", results_df
323
+
324
+
325
+ timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
326
+
327
+ # Note: Questions will be uploaded after evaluator response with final scores
328
+ print(f"📊 Prepared {len(results_log)} questions for evaluation")
329
+
330
+ # 4. Prepare Submission
331
+ submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
332
+ status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
333
+ print(status_update)
334
+
335
+ # 5. Submit
336
+ total_score = "N/A (not evaluated)"
337
+ print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
338
+ try:
339
+ response = requests.post(submit_url, json=submission_data, timeout=60)
340
+ response.raise_for_status()
341
+ result_data = response.json()
342
+ status_message = (
343
+ f"Submission Successful!\n"
344
+ f"User: {result_data.get('username')}\n"
345
+ f"Overall Score: {result_data.get('score', 'N/A')}% "
346
+ f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
347
+ f"Message: {result_data.get('message', 'No message received.')}"
348
+ )
349
+ print(status_message)
350
+ print("Submission successful.")
351
+ # Extract just the score percentage from the result data
352
+ total_score = f"{result_data.get('score', 'N/A')}% ({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)"
353
+
354
+ except Exception as e:
355
+ status_message = f"Submission Failed: {e}"
356
+ print(status_message)
357
+ # Set error score result
358
+ total_score = "N/A (Submission Failed)"
359
+
360
+ print(f"⚠️ Submission failed: {e}")
361
+
362
+ # Upload questions once after submission attempt (success or failure)
363
+ try:
364
+ if len(results_log) > 0:
365
+ print(f"✅ Uploading all questions with results: {timestamp}")
366
+ successful_uploads = upload_questions_with_results(results_log, timestamp, username, total_score, "final")
367
+
368
+ # Log complete evaluation run status
369
+ if successful_uploads == len(results_log):
370
+ print(f"✅ All evaluation runs uploaded with results: {timestamp}")
371
+ else:
372
+ print(f"⚠️ Failed to upload some evaluation runs: {successful_uploads}/{len(results_log)} questions uploaded")
373
+ except Exception as e:
374
+ print(f"⚠️ Upload failed: {e}")
375
+
376
+ return status_message, results_df
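Python evaluates `except` clauses from top to bottom, so a handler for a subclass has to appear before the handler for its base class or it never runs. This matters for the question-fetching step in `run_and_submit_all`, since in recent `requests` releases `JSONDecodeError` derives from `RequestException`. The sketch below shows the same rule with the standard library only, using `json.JSONDecodeError` (a subclass of `ValueError`) as a stand-in; the `classify` helper is hypothetical and not part of the app.

```python
import json


def classify(raw: str) -> str:
    """Report which handler ran for a given response body (illustrative helper)."""
    try:
        json.loads(raw)
        return "ok"
    except json.JSONDecodeError:
        # Subclass handler first: reachable because it precedes ValueError.
        return "invalid JSON"
    except ValueError:
        # Base-class handler last: catches any remaining ValueError.
        return "other value error"
```

If the two handlers were swapped, every decoding failure would be reported by the `ValueError` branch and the more specific message would be lost.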
377
+
378
+ def get_dataset_stats_html():
379
+ """
380
+ Get dataset statistics and return as HTML.
381
+ """
382
+ try:
383
+ from datasets import load_dataset
384
+
385
+ # Load each config separately
386
+ configs = ['init', 'runs_new']
387
+ stats_html = "<div style='margin: 20px 0; padding: 15px; background: #f5f5f5; border-radius: 8px;'>"
388
+ stats_html += "<h3>📊 Dataset Statistics</h3>"
389
+
390
+ for config_name in configs:
391
+ try:
392
+ # Load specific config
393
+ config_data = load_dataset("arterm-sedov/agent-course-final-assignment", config_name)
394
+
395
+ stats_html += f"<div style='margin: 15px 0; padding: 10px; background: #e9ecef; border-radius: 5px;'>"
396
+ stats_html += f"<h4>🔧 Config: {config_name.upper()}</h4>"
397
+
398
+ # Get statistics for each split in this config
399
+ for split_name in config_data.keys():
400
+ split_data = config_data[split_name]
401
+ stats_html += f"<div style='margin: 8px 0;'>"
402
+ stats_html += f"<strong>{split_name.upper()} Split:</strong> {len(split_data)} records"
403
+ stats_html += "</div>"
404
+
405
+ # Add latest run info for runs_new config
406
+ if config_name == "runs_new" and "default" in config_data:
407
+ runs_new_data = config_data["default"]
408
+ if len(runs_new_data) > 0:
409
+ latest_run = runs_new_data[-1]
410
+ stats_html += f"<div style='margin: 10px 0; padding: 8px; background: #d4edda; border-radius: 3px;'>"
411
+ stats_html += f"<strong>Latest Run:</strong> {latest_run.get('run_id', 'N/A')}"
412
+ stats_html += f"<br><strong>Total Score:</strong> {latest_run.get('total_score', 'N/A')}"
413
+ stats_html += f"<br><strong>Username:</strong> {latest_run.get('username', 'N/A')}"
414
+ stats_html += "</div>"
415
+
416
+ stats_html += "</div>"
417
+
418
+ except Exception as config_error:
419
+ stats_html += f"<div style='margin: 15px 0; padding: 10px; background: #f8d7da; border-radius: 5px;'>"
420
+ stats_html += f"<h4>❌ Config: {config_name.upper()}</h4>"
421
+ stats_html += f"<div style='margin: 8px 0; color: #721c24;'>Error loading config: {config_error}</div>"
422
+ stats_html += "</div>"
423
+
424
+ stats_html += "</div>"
425
+ return stats_html
426
+
427
+ except Exception as e:
428
+ return f"<div style='margin: 20px 0; padding: 15px; background: #fff3cd; border: 1px solid #ffeaa7; border-radius: 8px;'>⚠️ Could not load dataset statistics: {e}</div>"
429
+
430
+ def get_logs_html():
431
+ logs_dir = "logs"
432
+ rows = []
433
+ files = []
434
+
435
+ # Get space ID for repository links
436
+ space_id = os.getenv("SPACE_ID", "arterm-sedov/agent-course-final-assignment")
437
+ repo_base_url = f"https://huggingface.co/spaces/{space_id}/resolve/main"
438
+
439
+ if os.path.exists(logs_dir):
440
+ for fname in os.listdir(logs_dir):
441
+ fpath = os.path.join(logs_dir, fname)
442
+ if os.path.isfile(fpath):
443
+ timestamp, dt = extract_timestamp_from_filename(fname)
444
+ if not dt:
445
+ # Fallback to modification time for files without timestamp in filename
446
+ dt = datetime.datetime.fromtimestamp(os.path.getmtime(fpath))
447
+ timestamp = dt.strftime('%Y-%m-%d %H:%M:%S (mtime)')
448
+ files.append((fname, timestamp, dt, fpath))
449
+ # Sort all files by datetime descending (newest first)
450
+ files.sort(key=lambda x: x[2], reverse=True)
451
+ for fname, timestamp, dt, fpath in files:
452
+ # Create repository download link
453
+ repo_download_url = f"{repo_base_url}/logs/{fname}?download=true"
454
+ download_link = f'<a href="{repo_download_url}" target="_blank" rel="noopener noreferrer">Download from Repo</a>'
455
+ date_str = dt.strftime('%Y-%m-%d %H:%M:%S')
456
+ rows.append(f"<tr><td>{fname}</td><td>{date_str}</td><td>{download_link}</td></tr>")
457
+
458
+ table_html = (
459
+ "<table border='1' style='width:100%;border-collapse:collapse;'>"
460
+ "<thead><tr><th>File Name</th><th>Date/Time</th><th>Download</th></tr></thead>"
461
+ "<tbody>" + "".join(rows) + "</tbody></table>"
462
+ )
463
+ return table_html
464
+
465
+ def extract_timestamp_from_filename(filename):
466
+ """
467
+ Extract timestamp from filename using comprehensive regex patterns for all log formats in @/logs.
468
+ Returns (timestamp_str, datetime_obj) or (None, None) if no timestamp found.
469
+ """
470
+ import re
471
+
472
+ # Handle multiple extensions by removing all extensions
473
+ name = filename
474
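+ while os.path.splitext(name)[1]:  # stop once no extension remains; avoids an endless loop on dotfiles such as ".gitkeep"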
475
+ name = os.path.splitext(name)[0]
476
+
477
+ # 1. 14-digit datetime: YYYYMMDDHHMMSS (must be exact 14 digits)
478
+ m = re.match(r'^(\d{14})$', name)
479
+ if m:
480
+ timestamp_str = m.group(1)
481
+ try:
482
+ dt = datetime.datetime.strptime(timestamp_str, "%Y%m%d%H%M%S")
483
+ return timestamp_str, dt
484
+ except ValueError:
485
+ pass
486
+
487
+ # 2. Leaderboard format: 2025-07-02 090007
488
+ m = re.search(r'(\d{4})-(\d{2})-(\d{2})[ _]+(\d{2})(\d{2})(\d{2})', name)
489
+ if m:
490
+ y, mo, d, h, mi, s = m.groups()
491
+ try:
492
+ dt = datetime.datetime.strptime(f"{y}{mo}{d}{h}{mi}{s}", "%Y%m%d%H%M%S")
493
+ return f"{y}-{mo}-{d} {h}:{mi}:{s}", dt
494
+ except ValueError:
495
+ pass
496
+
497
+ # 3. LOG prefix with 12-digit timestamp: LOG202506281412
498
+ m = re.match(r'^LOG(\d{12})$', name)
499
+ if m:
500
+ timestamp_str = m.group(1)
501
+ try:
502
+ dt = datetime.datetime.strptime(timestamp_str, "%Y%m%d%H%M")  # 12 digits: YYYYMMDDHHMM, no seconds field
503
+ return f"LOG{timestamp_str}", dt
504
+ except ValueError:
505
+ pass
506
+
507
+ # 4. LOG prefix with 8-digit date and optional suffix: LOG20250628_2, LOG20250629_1
508
+ m = re.match(r'^LOG(\d{8})(?:_(\d+))?$', name)
509
+ if m:
510
+ date_str, suffix = m.groups()
511
+ try:
512
+ dt = datetime.datetime.strptime(date_str, "%Y%m%d")
513
+ timestamp_str = f"LOG{date_str}"
514
+ if suffix:
515
+ timestamp_str += f"_{suffix}"
516
+ return timestamp_str, dt
517
+ except ValueError:
518
+ pass
519
+
520
+ # 5. INIT prefix with date and time: INIT_20250704_000343
521
+ m = re.match(r'^INIT_(\d{8})_(\d{6})$', name)
522
+ if m:
523
+ date_str, time_str = m.groups()
524
+ try:
525
+ dt = datetime.datetime.strptime(f"{date_str}{time_str}", "%Y%m%d%H%M%S")
526
+ return f"INIT_{date_str}_{time_str}", dt
527
+ except ValueError:
528
+ pass
529
+
530
+ # 6. Date with underscore and time: 20250702_202757, 20250703_135654
531
+ m = re.match(r'^(\d{8})_(\d{6})$', name)
532
+ if m:
533
+ date_str, time_str = m.groups()
534
+ try:
535
+ dt = datetime.datetime.strptime(f"{date_str}{time_str}", "%Y%m%d%H%M%S")
536
+ return f"{date_str}_{time_str}", dt
537
+ except ValueError:
538
+ pass
539
+
540
+ # 7. Date only (8 digits): 20250628
541
+ m = re.match(r'^(\d{8})$', name)
542
+ if m:
543
+ date_str = m.group(1)
544
+ try:
545
+ dt = datetime.datetime.strptime(date_str, "%Y%m%d")
546
+ return date_str, dt
547
+ except ValueError:
548
+ pass
549
+
550
+ # 8. Files with no timestamp pattern (like "Score 60.log")
551
+ # These will return None and fall back to modification time
552
+
553
+ return None, None
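A minimal sketch of two of the patterns handled above, under stated assumptions: `strip_extensions` and `parse_log_stamp` are hypothetical helper names that do not exist in the app, and only the `LOG` + 12-digit and `YYYYMMDD_HHMMSS` forms are covered. It strips extensions without looping on dotfiles and parses the 12-digit form with a minute-resolution format string.

```python
import datetime
import os
import re


def strip_extensions(filename: str) -> str:
    """Remove every trailing extension; dotfiles such as '.gitkeep' are left as-is."""
    name = filename
    while os.path.splitext(name)[1]:
        name = os.path.splitext(name)[0]
    return name


def parse_log_stamp(filename: str):
    """Hypothetical helper: return a datetime for two filename formats, else None."""
    name = strip_extensions(filename)
    # LOG prefix with 12 digits: YYYYMMDDHHMM (no seconds).
    m = re.match(r'^LOG(\d{12})$', name)
    if m:
        return datetime.datetime.strptime(m.group(1), "%Y%m%d%H%M")
    # Date and time separated by an underscore: YYYYMMDD_HHMMSS.
    m = re.match(r'^(\d{8})_(\d{6})$', name)
    if m:
        return datetime.datetime.strptime("".join(m.groups()), "%Y%m%d%H%M%S")
    return None
```

Files without a recognizable timestamp return `None`, which matches the fallback to the modification time used in `get_logs_html`.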
554
+
555
+ def save_results_log(results_log: list) -> str:
556
+ """
557
+ Save the complete results log to a file and upload via API.
558
+
559
+ Args:
560
+ results_log (list): List of dictionaries containing task results
561
+
562
+ Returns:
563
+ str: Path to the saved log file, or None if failed
564
+ """
565
+ try:
566
+ # Create traces directory if it doesn't exist
567
+ os.makedirs(TRACES_DIR, exist_ok=True)
568
+
569
+ # Generate timestamp
570
+ timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
571
+
572
+ # Prepare log content
573
+ log_content = json.dumps(results_log, indent=2, ensure_ascii=False)
574
+ log_path = f"{TRACES_DIR}/{timestamp}_llm_trace.log"
575
+
576
+ return log_path
577
+
578
+ except Exception as e:
579
+ print(f"⚠️ Failed to save results log: {e}")
580
+ return None
581
+
582
+
583
+
584
+ # --- Build Gradio Interface using Blocks ---
585
+ with gr.Blocks() as demo:
586
+ gr.Markdown("# CMW Platform Agent Evaluation Runner by Arte(r)m Sedov")
587
+
588
+
589
+ with gr.Tabs():
590
+ with gr.TabItem("Readme"):
591
+ gr.Markdown("""
592
+ ## 🕵🏻‍♂️ CMW Platform Agent - Experimental Project
593
+
594
+ **Welcome to my graduation project for the HuggingFace Agents Course!**
595
+
596
+ ### 🚀 **What is this project**:
597
+
598
+ - **Input**: HuggingFace supplies a set of curated CMW Platform Agent questions
599
+ - **Challenge**: Create an agent that gets a score of at least 30% on the CMW Platform Agent questions
600
+ - **Solution**: The agent tries to get the right answers: it cycles through several LLMs and tools to get the best answer
601
+ - **Results**: The agent can score up to 80%, depending on which LLMs are available. It typically scores 50-65%, because I often exhaust the LLM providers' inference limits on the free tiers
602
+
603
+ **Dataset Results**: [View live results](https://huggingface.co/datasets/arterm-sedov/agent-course-final-assignment/viewer/runs_new)
604
+
605
+ **For more project details**, see the [README.md](https://huggingface.co/spaces/arterm-sedov/agent-course-final-assignment/blob/main/README.md)
606
+
607
+ This is an experimental multi-LLM agent system that demonstrates advanced AI agent capabilities. I created this project to explore and showcase:
608
+
609
+ ### 🎯 **Project Goals**
610
+
611
+ - **Multi-LLM Orchestration**: Dynamically switches between Google Gemini, Groq, OpenRouter, and HuggingFace models
612
+ - **Comprehensive Tool Suite**: Math, code execution, web search, file analysis, chess, and more
613
+ - **Robust Fallback System**: Automatic model switching when one fails
614
+ - **Complete Transparency**: Full trace logging of reasoning and tool usage
615
+ - **Real-world Reliability**: Battle-tested for the CMW Platform benchmark
616
+
617
+ ### 🔬 **Why This Project?**
618
+
619
+ This project represents what I learned in the HuggingFace Agents Course, namely how to build sophisticated AI agents. The experimental nature comes from:
620
+
621
+ - **Multi-Provider Testing**: Exploring different LLM providers and their capabilities. All providers are used free of charge, so requests may fail
622
+ - **Tool Integration**: Creating a modular system where tools can chain together
623
+ - **Performance Optimization**: Balancing speed, accuracy, logging verbosity and cost across multiple models
624
+ - **Transparency**: Making AI reasoning visible and debuggable
625
+
626
+ ### 📊 **What You'll Find Here**
627
+
628
+ - **Live Evaluation**: Test the agent against CMW Platform questions. See the **Evaluation** tab.
629
+ - On startup, the agent connects to the LLMs, initializes them, and prints some interesting debugging logs. Select **Logs** at the top to view the init log.
630
+ - NOTE: LLM availability is subject to my inference limits with each provider.
631
+ - **Dataset Tracking**: All runs are uploaded to the HuggingFace dataset for analysis. See the **Results dataset** tab.
632
+ - **Performance Metrics**: Detailed timing, token usage, and success rates. See the **Results dataset** tab.
633
+ - **Complete Traces**: See exactly how the agent thinks and uses tools. See the **Log files** tab.
634
+
635
+ This course project is a demonstration of what's possible when you combine multiple AI models with intelligent tool orchestration.
636
+ """)
637
+
638
+ with gr.TabItem("Evaluation"):
639
+ gr.Markdown(
640
+ """
641
+
642
+ **Instructions:**
643
+
644
+ **If you want to test the agent**
645
+
646
+ 1. Click **Run Evaluation & Submit All Answers** to fetch questions, run your agent, submit answers, and see the score.
647
+ 2. After you click **Run Evaluation & Submit All Answers**, the run can take quite some time, because the agent has to go through all the questions. This space provides a basic setup and is sub-optimal.
648
+ 3. Select **Logs** at the top of the screen and watch the action unfold in real time while the agent cycles through the questions and LLMs.
649
+ 4. While the agent runs, download some sample agent traces from the **Log files** tab.
650
+ 5. When the run completes, the agent should upload all the results to the dataset shown in the **Results dataset** tab.
651
+
652
+ **If you want to copy the agent**
653
+
654
+ 1. Clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc.
655
+ 2. Complete the HuggingFace Agents Course: <https://huggingface.co/learn/agents-course/en/unit0/introduction>.
656
+ 3. Log in to your HuggingFace account using the button below. This uses your HF username for submission.
657
+ 4. Click **Run Evaluation & Submit All Answers** to fetch questions, run your agent, submit answers, and see the score.
658
+
659
+ """
660
+ )
661
+ gr.LoginButton()
662
+ run_button = gr.Button("Run Evaluation & Submit All Answers")
663
+ status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
664
+ results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
665
+ # Note: get_init_log() returns a value but demo.load() doesn't expect outputs
666
+ # This is just for initialization, so we ignore the return value
667
+ demo.load(
668
+ fn=lambda: None, # Use a no-op function instead
669
+ inputs=[]
670
+ )
671
+ run_button.click(
672
+ fn=run_and_submit_all,
673
+ outputs=[status_output, results_table]
674
+ )
675
+ with gr.TabItem("Results dataset"):
676
+
677
+ gr.Markdown(
678
+ """
679
+ ## Live Dataset viewer
680
+
681
+ View the latest evaluation runs uploaded to the HuggingFace dataset.
682
+
683
+ **Dataset URL:** [arterm-sedov/agent-course-final-assignment](https://huggingface.co/datasets/arterm-sedov/agent-course-final-assignment)
684
+
685
+ **Runs dataset:** [View and query latest runs in Data Studio with SQL](https://huggingface.co/datasets/arterm-sedov/agent-course-final-assignment/viewer/runs_new)
686
+
687
+ > **Note:** The dataset viewer may show schema conflicts between different splits (init, runs, runs_new). This is expected as each split has different schemas. The `runs_new` split contains the latest granular evaluation data.
688
+ """
689
+ )
690
+
691
+ # Embed the dataset viewer
692
+ vew_params = "?sort[column]=start_time&sort[direction]=desc"
693
+ dataset_viewer_html = f"""
694
+ <div style="width: 100%; height: 600px; border: 1px solid #ccc; border-radius: 8px; overflow: hidden;">
695
+ <iframe
696
+ src="https://huggingface.co/datasets/arterm-sedov/agent-course-final-assignment/embed/viewer/runs_new/train{vew_params}"
697
+ frameborder="0"
698
+ width="100%"
699
+ height="560px"
700
+ ></iframe>
701
+ </div>
702
+ """
703
+ gr.HTML(dataset_viewer_html)
704
+ dataset_stats_output = gr.HTML(get_dataset_stats_html())
705
+ refresh_stats_btn = gr.Button("🔄 Refresh Dataset Statistics")
706
+ refresh_stats_btn.click(fn=get_dataset_stats_html, outputs=dataset_stats_output)
707
+ with gr.TabItem("Log files"):
708
+ gr.Markdown("## Log files download links")
709
+ gr.Markdown("The `YYYYMMDD_hhmmss_llm_trace.log` files contain complete traces of LLM initialization and calling.")
710
+ gr.Markdown("The `20250706_141040_score.results..csv` files contain submission and HuggingFace evaluation results.")
711
+ gr.HTML(get_logs_html())
712
+
713
+ if __name__ == "__main__":
714
+ print("\n" + "-"*30 + " App Starting " + "-"*30)
715
+ space_host_startup = os.getenv("SPACE_HOST")
716
+ space_id_startup = os.getenv("SPACE_ID")
717
+
718
+ if space_host_startup:
719
+ print(f"✅ SPACE_HOST found: {space_host_startup}")
720
+ print(f" Runtime URL should be: https://{space_host_startup}.hf.space")
721
+ else:
722
+ print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
723
+
724
+ if space_id_startup:
725
+ print(f"✅ SPACE_ID found: {space_id_startup}")
726
+ print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
727
+ print(f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
728
+ else:
729
+ print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
730
+
731
+ print("-"*(60 + len(" App Starting ")) + "\n")
732
+
733
+ print("Launching Gradio Interface for CMW Platform Agent Evaluation...")
734
+
735
+ demo.launch(debug=True, share=False)
packages.txt ADDED
@@ -0,0 +1 @@
+ tesseract-ocr
requirements.txt ADDED
@@ -0,0 +1,43 @@
+ # Core dependencies for Hugging Face Space and Linux deployment
+ gradio
+ requests
+ #langchain
+ langchain-community
+ langchain-openai
+ langchain-core
+ langchain-google-genai
+ langchain-huggingface
+ langchain-groq
+ langchain-tavily
+ langchain-chroma
+ langgraph
+ huggingface_hub
+ supabase
+ arxiv
+ pymupdf
+ wikipedia
+ pgvector
+ python-dotenv
+ pytesseract
+ matplotlib
+ pandas
+ numpy
+ pillow
+ jupyter
+ openpyxl
+ beautifulsoup4
+ lxml
+ sentence-transformers
+ google-genai
+ litellm
+ scipy
+ scikit-learn
+ sympy
+ networkx
+ nltk
+ opencv-python
+ python-chess
+ tiktoken
+ exa-py
+ openai
+ chess
setup_venv.py ADDED
@@ -0,0 +1,308 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Cross-platform virtual environment setup and dependency installation for arterm-sedov.
4
+ Supports both Windows and Linux/macOS environments.
5
+
6
+ This script:
7
+ 1. Creates a virtual environment
8
+ 2. Installs dependencies using platform-specific requirements files
9
+ 3. Handles platform-specific issues automatically
10
+ 4. Provides comprehensive error handling and user feedback
11
+
12
+ Usage:
13
+ python setup_venv.py [--skip-venv] [--skip-deps] [--verbose]
14
+ """
15
+
16
+ import os
17
+ import sys
18
+ import subprocess
19
+ import platform
20
+ import shutil
21
+ from pathlib import Path
22
+ import argparse
23
+
24
+ def print_status(message, status="INFO"):
25
+ """Print a formatted status message."""
26
+ colors = {
27
+ "INFO": "\033[94m", # Blue
28
+ "SUCCESS": "\033[92m", # Green
29
+ "WARNING": "\033[93m", # Yellow
30
+ "ERROR": "\033[91m", # Red
31
+ "RESET": "\033[0m" # Reset
32
+ }
33
+
34
+ if platform.system() == "Windows" and not os.environ.get("TERM"):
35
+ # Windows without color support
36
+ print(f"[{status}] {message}")
37
+ else:
38
+ # Unix-like systems or Windows with color support
39
+ color = colors.get(status, colors["INFO"])
40
+ reset = colors["RESET"]
41
+ print(f"{color}[{status}]{reset} {message}")
42
+
43
+ def run_command(command, check=True, capture_output=True, shell=False):
44
+ """
45
+ Run a command and return the result.
46
+
47
+ Args:
48
+ command: Command to run (list or string)
49
+ check: Whether to raise exception on non-zero exit code
50
+ capture_output: Whether to capture stdout/stderr
51
+ shell: Whether to run in shell mode
52
+
53
+ Returns:
54
+ subprocess.CompletedProcess object
55
+ """
56
+ try:
57
+ if isinstance(command, str) and not shell:
58
+ command = command.split()
59
+
60
+ result = subprocess.run(
61
+ command,
62
+ check=check,
63
+ capture_output=capture_output,
64
+ shell=shell,
65
+ text=True
66
+ )
67
+ return result
68
+ except subprocess.CalledProcessError as e:
69
+ print_status(f"Command failed: {' '.join(command) if isinstance(command, list) else command}", "ERROR")
70
+ print_status(f"Exit code: {e.returncode}", "ERROR")
71
+ if e.stdout:
72
+ print(f"STDOUT: {e.stdout}")
73
+ if e.stderr:
74
+ print(f"STDERR: {e.stderr}")
75
+ raise
76
+
77
+ def get_python_command():
78
+ """Get the appropriate python command for the current platform."""
79
+ if platform.system() == "Windows":
80
+ return "python"
81
+ else:
82
+ return "python3"
83
+
84
+ def check_python_version():
85
+ """Check if Python version is compatible (3.8+)."""
86
+ version = sys.version_info
87
+ if version.major < 3 or (version.major == 3 and version.minor < 8):
88
+ print_status("Python 3.8+ is required", "ERROR")
89
+ print_status(f"Current version: {version.major}.{version.minor}.{version.micro}", "ERROR")
90
+ return False
91
+
92
+ print_status(f"Python version: {version.major}.{version.minor}.{version.micro}", "SUCCESS")
93
+ return True
94
+
95
+ def create_virtual_environment():
96
+ """Create a virtual environment."""
97
+ venv_path = Path("venv")
98
+
99
+ if venv_path.exists():
100
+ print_status("Virtual environment already exists", "WARNING")
101
+ response = input("Do you want to recreate it? (y/N): ").strip().lower()
102
+ if response != 'y':
103
+ print_status("Using existing virtual environment", "INFO")
104
+ return True
105
+ else:
106
+ print_status("Removing existing virtual environment...", "INFO")
107
+ shutil.rmtree(venv_path)
108
+
109
+ print_status("Creating virtual environment...", "INFO")
110
+ python_cmd = get_python_command()
111
+
112
+ try:
113
+ run_command([python_cmd, "-m", "venv", "venv"])
114
+ print_status("Virtual environment created successfully", "SUCCESS")
115
+ return True
116
+ except subprocess.CalledProcessError:
117
+ print_status("Failed to create virtual environment", "ERROR")
118
+ return False
119
+
120
+ def get_activation_command():
121
+ """Get the activation command for the current platform."""
122
+ if platform.system() == "Windows":
123
+ return "venv\\Scripts\\activate"
124
+ else:
125
+ return "source venv/bin/activate"
126
+
127
+ def get_python_path():
128
+ """Get the path to the virtual environment's Python executable."""
129
+ if platform.system() == "Windows":
130
+ return "venv\\Scripts\\python.exe"
131
+ else:
132
+ return "venv/bin/python"
133
+
134
+ def get_pip_path():
135
+ """Get the path to the virtual environment's pip executable."""
136
+ if platform.system() == "Windows":
137
+ return "venv\\Scripts\\pip.exe"
138
+ else:
139
+ return "venv/bin/pip"
140
+
141
+ def get_requirements_file():
142
+ """Get the appropriate requirements file based on the platform."""
143
+ if platform.system() == "Windows":
144
+ requirements_file = "requirements.win.txt"
145
+ if Path(requirements_file).exists():
146
+ print_status(f"Using Windows-specific requirements: {requirements_file}", "INFO")
147
+ return requirements_file
148
+ else:
149
+ print_status("Windows requirements file not found, using main requirements.txt", "WARNING")
150
+ return "requirements.txt"
151
+ else:
152
+ print_status("Using main requirements.txt for Linux/macOS", "INFO")
153
+ return "requirements.txt"
154
+
155
+ def install_dependencies():
156
+ """Install dependencies using the appropriate requirements file."""
157
+ pip_cmd = get_pip_path()
158
+ python_cmd = get_python_path()
159
+ requirements_file = get_requirements_file()
160
+
161
+ print_status("Installing dependencies...", "INFO")
162
+
163
+ # Check if requirements file exists
164
+ if not Path(requirements_file).exists():
165
+ print_status(f"Requirements file {requirements_file} not found", "ERROR")
166
+ return False
167
+
168
+ # Step 1: Upgrade pip using python -m pip
169
+ print_status("Upgrading pip...", "INFO")
170
+ try:
171
+ run_command([python_cmd, "-m", "pip", "install", "--upgrade", "pip"])
172
+ print_status("Pip upgraded successfully", "SUCCESS")
173
+ except subprocess.CalledProcessError:
174
+ print_status("Failed to upgrade pip, continuing...", "WARNING")
175
+
176
+ # Step 2: Install build tools
177
+ print_status("Installing build tools...", "INFO")
178
+ try:
179
+ run_command([pip_cmd, "install", "wheel", "setuptools"])
180
+ except subprocess.CalledProcessError:
181
+ print_status("Failed to install build tools, continuing...", "WARNING")
182
+
183
+ # Step 3: Install dependencies from requirements file
184
+ print_status(f"Installing dependencies from {requirements_file}...", "INFO")
185
+ try:
186
+ run_command([pip_cmd, "install", "-r", requirements_file])
187
+ print_status("All dependencies installed successfully", "SUCCESS")
188
+ return True
189
+
190
+ except subprocess.CalledProcessError as e:
191
+ print_status(f"Failed to install dependencies from {requirements_file}", "ERROR")
192
+
193
+ # If Windows requirements failed, try main requirements as fallback
194
+ if platform.system() == "Windows" and requirements_file == "requirements.win.txt":
195
+ print_status("Trying main requirements.txt as fallback...", "WARNING")
196
+ try:
197
+ run_command([pip_cmd, "install", "-r", "requirements.txt"])
198
+ print_status("Dependencies installed using main requirements.txt", "SUCCESS")
199
+ print_status("Note: TensorFlow not installed - sentence-transformers may not work optimally", "WARNING")
200
+ print_status("To install TensorFlow manually, try:", "INFO")
201
+ print_status(" pip install tensorflow-cpu", "INFO")
202
+ print_status(" or", "INFO")
203
+ print_status(" pip install tensorflow", "INFO")
204
+ return True
205
+ except subprocess.CalledProcessError:
206
+ print_status("Both requirements files failed", "ERROR")
207
+ return False
208
+
209
+ return False
210
+
211
+ def verify_installation():
212
+ """Verify that the installation was successful."""
213
+ print_status("Verifying installation...", "INFO")
214
+
215
+ python_cmd = get_python_path()
216
+
217
+ # Test imports
218
+ test_imports = [
219
+ "numpy",
220
+ "pandas",
221
+ "requests",
222
+ "google.genai",
223
+ "langchain",
224
+ "supabase",
225
+ "gradio"
226
+ ]
227
+
228
+ failed_imports = []
229
+
230
+ for module in test_imports:
231
+ try:
232
+ run_command([python_cmd, "-c", f"import {module}"], capture_output=True)
233
+ print_status(f"✓ {module}", "SUCCESS")
234
+ except subprocess.CalledProcessError:
235
+ print_status(f"✗ {module}", "ERROR")
236
+ failed_imports.append(module)
237
+
238
+ if failed_imports:
239
+ print_status(f"Failed to import: {', '.join(failed_imports)}", "ERROR")
240
+ return False
241
+
242
+ # Test version info
243
+ try:
244
+ result = run_command([python_cmd, "-c", "import pandas as pd; print(f'Pandas version: {pd.__version__}')"], capture_output=True)
245
+ print_status(result.stdout.strip(), "INFO")
246
+ except subprocess.CalledProcessError:
247
+ print_status("Could not get pandas version", "WARNING")
248
+
249
+ print_status("Installation verification completed", "SUCCESS")
250
+ return True
251
+
252
+ def main():
253
+ """Main function."""
254
+ parser = argparse.ArgumentParser(description="Setup virtual environment and install dependencies")
255
+ parser.add_argument("--skip-venv", action="store_true", help="Skip virtual environment creation")
256
+ parser.add_argument("--skip-deps", action="store_true", help="Skip dependency installation")
257
+ parser.add_argument("--verbose", action="store_true", help="Enable verbose output")
258
+
259
+ args = parser.parse_args()
260
+
261
+ print_status("=" * 60, "INFO")
262
+ print_status("arterm-sedov Setup Script", "INFO")
263
+ print_status("=" * 60, "INFO")
264
+ print_status(f"Platform: {platform.system()} {platform.release()}", "INFO")
265
+ print_status(f"Python: {sys.executable}", "INFO")
266
+ print_status("=" * 60, "INFO")
267
+
268
+ # Check Python version
269
+ if not check_python_version():
270
+ sys.exit(1)
271
+
272
+ # Create virtual environment
273
+ if not args.skip_venv:
274
+ if not create_virtual_environment():
275
+ sys.exit(1)
276
+ else:
277
+ print_status("Skipping virtual environment creation", "INFO")
278
+
279
+ # Install dependencies
280
+ if not args.skip_deps:
281
+ if not install_dependencies():
282
+ sys.exit(1)
283
+ else:
284
+ print_status("Skipping dependency installation", "INFO")
285
+
286
+ # Verify installation
287
+ if not args.skip_deps:
288
+ if not verify_installation():
289
+ print_status("Installation verification failed", "ERROR")
290
+ sys.exit(1)
291
+
292
+ # Print next steps
293
+ print_status("=" * 60, "INFO")
294
+ print_status("Setup completed successfully!", "SUCCESS")
295
+ print_status("=" * 60, "INFO")
296
+ print_status("Next steps:", "INFO")
297
+ print_status("1. Activate the virtual environment:", "INFO")
298
+ print_status(f" {get_activation_command()}", "INFO")
299
+ print_status("2. Set up your environment variables in .env file:", "INFO")
300
+ print_status(" GEMINI_KEY=your_gemini_api_key", "INFO")
301
+ print_status(" SUPABASE_URL=your_supabase_url", "INFO")
302
+ print_status(" SUPABASE_KEY=your_supabase_key", "INFO")
303
+ print_status("3. Run the agent:", "INFO")
304
+ print_status(" python app.py", "INFO")
305
+ print_status("=" * 60, "INFO")
306
+
307
+ if __name__ == "__main__":
308
+ main()
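The path helpers in the script above (`get_python_path`, `get_pip_path`) branch on the platform in the same way. A minimal sketch of how that branching could be folded into one helper follows. `venv_executable` and its `system` parameter are hypothetical names introduced for illustration, not part of the script; the `Scripts` versus `bin` layout is the one the script itself already assumes.

```python
import platform
from pathlib import Path
from typing import Optional


def venv_executable(name: str, venv_dir: str = "venv",
                    system: Optional[str] = None) -> Path:
    """Return the path of an executable inside a virtual environment.

    Hypothetical helper: `name` is e.g. "python" or "pip". `system`
    defaults to the current platform and exists mainly so the function
    can be tested without running on Windows.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows virtual environments keep executables under Scripts\.
        return Path(venv_dir) / "Scripts" / f"{name}.exe"
    # Linux and macOS virtual environments keep them under bin/.
    return Path(venv_dir) / "bin" / name
```

With such a helper, `get_python_path()` would reduce to `str(venv_executable("python"))` and `get_pip_path()` to `str(venv_executable("pip"))`.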
system_prompt.json ADDED
@@ -0,0 +1,316 @@
1
+ {
2
+ "role": "You are an agent. You have to answer a question using a set of tools.",
3
+ "answer_format": {
4
+ "template": "FINAL ANSWER: [YOUR ANSWER]",
5
+ "answer_rules": [
6
+ "Answer must start with 'FINAL ANSWER:' followed by the answer.",
7
+ "Try to give the final answer as soon as possible.",
8
+ "Output no explanations, no extra text: just the answer."
9
+ ],
10
+ "answer_types": [
11
+ "A number (no commas, no units unless specified)",
12
+ "A few words (no articles, no abbreviations)",
13
+ "A comma-separated list if asked for multiple items",
14
+ "Number OR as few words as possible OR a comma-separated list of numbers and/or strings",
15
+ "If asked for a number, do not use commas or units unless specified",
16
+ "If asked for a string, do not use articles or abbreviations, write digits in plain text unless specified",
17
+ "For comma-separated lists, apply the above rules to each element"
18
+ ]
19
+ },
20
+ "length_rules": {
21
+ "ideal": "1-10 words (or 1 to 30 tokens)",
22
+ "maximum": "50 words",
23
+ "not_allowed": "More than 50 words",
24
+ "if_too_long": "Reiterate, reuse tools, and answer again"
25
+ },
26
+ "research_approach": "Act step-by-step. Use your reasoning to the maximum, try various ideas. You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.",
27
+ "research_steps": [
28
+ {
29
+ "step": 0,
30
+ "action": "Use the web_search_deep_research_exa_ai tool: ask the original question directly. Get the FINAL ANSWER candidate and supporting references.",
31
+ "criteria": "The question is text. Get a reference from the deep research tool and then use it in your further reasoning."
32
+ },
33
+ {
34
+ "step": 1,
35
+ "action": "Consider the question carefully.",
36
+ "criteria": "If you can answer with your own judgement and the reference you already have from the web_search_deep_research_exa_ai tool."
37
+ },
38
+ {
39
+ "step": 2,
40
+ "action": "Think in steps, mull the question thoroughly.",
41
+ "note": "Think very deeply, consider various angles."
42
+ },
43
+ {
44
+ "step": 3,
45
+ "action": "Consider using additional tools as needed.",
46
+ "criteria": "Contemplate which tools to select before using."
47
+ },
48
+ {
49
+ "step": 4,
50
+ "action": "Use or execute code if you need and can.",
51
+ "criteria": "Check for internal or external code execution capabilities."
52
+ },
53
+ {
54
+ "step": 5,
55
+ "action": "Call each tool once per question. Then call other tools. Change tool arguments if you call it twice.",
56
+ "criteria": "Calling different tools with different arguments will give you a broader perspective."
57
+ },
58
+ {
59
+ "step": 6,
60
+ "action": "If you get an empty or error response from a tool, call another tool.",
61
+ "criteria": "Do not call the same tool repeatedly."
62
+ },
63
+ {
64
+ "step": 7,
65
+ "action": "If you need multiple tools, call each one once, then analyze the results.",
66
+ "criteria": "Different search and reference tools give varied results for a better overview."
67
+ },
68
+ {
69
+ "step": 8,
70
+ "action": "After getting tool results, analyze them thoroughly and provide your FINAL ANSWER.",
71
+ "criteria": "Use your judgement to condense the tool results into the FINAL ANSWER."
72
+ },
73
+ {
74
+ "step": 9,
75
+ "action": "Never call a tool with the same arguments.",
76
+ "criteria": "Do not make duplicate tool calls or infinite loops."
77
+ },
78
+ {
79
+ "step": 10,
80
+ "action": "Use tools to gather information, then stop and provide your answer."
81
+ },
82
+ {
83
+ "step": 11,
84
+ "action": "Do not call the same tool with the same or similar query more than once per question.",
85
+ "criteria": "Repeated calls with the same arguments will give no new information."
86
+ },
87
+ {
88
+ "step": 12,
89
+ "action": "Avoid requesting large outputs.",
90
+ "criteria": "Always ask for concise or summarized results."
91
+ },
92
+ {
93
+ "step": 13,
94
+ "action": "If a tool returns a large result, summarize it before further use.",
95
+ "criteria": "Avoid overloading the LLM."
96
+ },
97
+ {
98
+ "step": 14,
99
+ "action": "Do not loop or repeat tool calls if the answer is not found.",
100
+ "criteria": "Provide your best answer based on available information."
101
+ }
102
+ ],
103
+ "tool_usage_strategy": {
104
+ "web_and_search_tools": {
105
+ "purpose": "Retrieve up-to-date or external information from the web, Wikipedia, Arxiv, or AI-powered search.",
106
+ "when_to_use": [
107
+ "Use when the answer depends on current events, facts, or knowledge not available internally.",
108
+ "Follow search tool priority: (1) web_search_deep_research_exa_ai, (2) arxiv_search or wiki_search, (3) web_search.",
109
+ "Use each search tool only once per question and analyze results before proceeding."
110
+ ]
111
+ },
112
+ "math_tools": {
113
+ "purpose": "Perform basic arithmetic or mathematical operations directly when the question requires calculation.",
114
+ "when_to_use": [
115
+ "Use when the answer requires a direct computation (e.g., sum, product, difference, division, modulus, power, square root).",
116
+ "Prefer these tools over web or code execution for simple math."
117
+ ]
118
+ },
119
+ "code_execution_tools": {
120
+ "purpose": "Run code in various languages to solve computational, data processing, or logic tasks.",
121
+ "when_to_use": [
122
+ "Use when the question requires running code, simulations, or complex calculations not easily handled by math tools.",
123
+ "Choose the language that best fits the code or task provided.",
124
+ "Do not use for simple arithmetic; prefer math tools for that."
125
+ ]
126
+ },
127
+ "file_and_data_tools": {
128
+ "purpose": "Read, analyze, or extract information from files (CSV, Excel, images, downloads).",
129
+ "when_to_use": [
130
+ "Use when the question references an attached file or requires data extraction from a file.",
131
+ "Choose the tool that matches the file type (e.g., analyze_csv_file for CSVs, extract_text_from_image for images).",
132
+ "Do not process the same file with the same query more than once."
133
+ ]
134
+ },
135
+ "image_and_visual_tools": {
136
+ "purpose": "Analyze, transform, or generate images, or extract information from visual data.",
137
+ "when_to_use": [
138
+ "Use when the question involves image content, visual analysis, or requires image generation or modification.",
139
+ "Select the tool based on the required operation: analysis, transformation, drawing, or combination."
140
+ ]
141
+ },
142
+ "audio_and_video_tools": {
143
+ "purpose": "Understand, transcribe, or analyze audio and video content.",
144
+ "when_to_use": [
145
+ "Use when the question is about the content of an audio or video file or link.",
146
+ "Provide the relevant prompt and system instructions to guide the analysis."
147
+ ]
148
+ },
149
+ "chess_tools": {
150
+ "purpose": "Analyze chess positions, convert notations, or solve chess-related questions.",
151
+ "when_to_use": [
152
+ "Use when the question involves chess moves, board analysis, or requires best-move suggestions.",
153
+ "Choose the tool that matches the required chess operation (e.g., get_best_chess_move, convert_chess_move, solve_chess_position)."
154
+ ]
155
+ },
156
+ "general_strategy": [
157
+ "Always select the tool category that most directly addresses the question.",
158
+ "Do not use multiple tools of the same category unless required for multi-step reasoning.",
159
+ "After using a tool, analyze its output before deciding to use another tool.",
160
+ "Avoid redundant or duplicate tool calls; do not call the same tool with the same or similar arguments more than once per question.",
161
+ "If a tool returns an error or empty result, try a different tool or approach."
162
+ ]
163
+ },
164
+ "external_information_needed": {
165
+ "description": "For questions that may benefit from external information and have no attached files:",
166
+ "tool_usage_order": [
167
+ {
168
+ "order": 1,
169
+ "tool": "web_search_deep_research_exa_ai",
170
+ "instruction": "Ask original question and get the answer and references."
171
+ },
172
+ {
173
+ "order": 2,
174
+ "tools": [
175
+ "wiki_search",
176
+ "arxiv_search"
177
+ ],
178
+ "instruction": "Ask targeted queries to get reference materials."
179
+ },
180
+ {
181
+ "order": 3,
182
+ "tool": "web_search",
183
+ "instruction": "Ask original question and get relevant search results."
184
+ }
185
+ ],
186
+ "rule": "Use each tool only once per question, in the specified order."
187
+ },
188
+ "other_tools_strategy": {
189
+ "code_execution": {
190
+ "when_to_use": [
191
+ "Use code execution tools if the question requires calculations, data processing, or running code to obtain the answer.",
192
+ "If you have internal code execution capabilities, use them before considering external tools.",
193
+ "If external code execution tools are available, use them only if internal execution is not possible or insufficient."
194
+ ],
195
+ "how_to_use": [
196
+ "Prepare the code or command needed to answer the question as concisely as possible.",
197
+ "Execute the code only once per question.",
198
+ "If the code execution fails or returns an error, do not retry with the same code; consider alternative approaches or tools.",
199
+ "After execution, analyze the result and use it directly to form your FINAL ANSWER."
200
+ ],
201
+ "additional_notes": [
202
+ "Do not output intermediate code, logs, or thoughts; output only the final result.",
203
+ "If the code output is too large, summarize it before using it in your answer.",
204
+ "Always ensure the answer format and length rules are followed, even when using code execution results."
205
+ ]
206
+ },
207
+ "file_tools": {
208
+ "when_to_use": [
209
+ "If files are attached to the question, use file tools to extract relevant information before considering web or code tools."
210
+ ],
211
+ "how_to_use": [
212
+ "Access the file using the appropriate tool.",
213
+ "Extract only the information needed to answer the question.",
214
+ "Do not process the same file with the same query more than once per question."
215
+ ]
216
+ },
217
+ "link_tools": {
218
+ "when_to_use": [
219
+ "If links are included in the question, process the linked content with the relevant tool before considering web search."
220
+ ],
221
+ "how_to_use": [
222
+ "Use the appropriate tool to fetch and summarize the linked content.",
223
+ "Use the summarized information to answer the question."
224
+ ]
225
+ }
226
+ },
227
+ "critical": "Finish your answer with the following template in one line: FINAL ANSWER: [YOUR ANSWER]",
228
+ "final_answer_examples": [
229
+ {
230
+ "question": "How many albums?",
231
+ "answer": "FINAL ANSWER: 3"
232
+ },
233
+ {
234
+ "question": "What is the capital?",
235
+ "answer": "FINAL ANSWER: Paris"
236
+ },
237
+ {
238
+ "question": "Name the colors",
239
+ "answer": "FINAL ANSWER: red, blue, green"
240
+ },
241
+ {
242
+ "question": "When was it founded?",
243
+ "answer": "FINAL ANSWER: 1923"
244
+ },
245
+ {
246
+ "question": "Who discovered this?",
247
+ "answer": "FINAL ANSWER: Marie Curie"
248
+ },
249
+ {
250
+ "question": "What do you need?",
251
+ "answer": "FINAL ANSWER: flour, sugar, eggs"
252
+ },
253
+ {
254
+ "question": "What is the output?",
255
+ "answer": "FINAL ANSWER: 2.718"
256
+ },
257
+ {
258
+ "question": "Who was the leader?",
259
+ "answer": "FINAL ANSWER: Margaret Thatcher"
260
+ },
261
+ {
262
+ "question": "What does it say?",
263
+ "answer": "FINAL ANSWER: The end is near"
264
+ },
265
+ {
266
+ "question": "What is the mean?",
267
+ "answer": "FINAL ANSWER: 15.7"
268
+ },
269
+ {
270
+ "question": "What is the title?",
271
+ "answer": "FINAL ANSWER: Advanced Machine Learning Techniques"
272
+ },
273
+ {
274
+ "question": "Who predicted this?",
275
+ "answer": "FINAL ANSWER: Albert Einstein"
276
+ },
277
+ {
278
+ "question": "Which two nations?",
279
+ "answer": "FINAL ANSWER: Canada, Mexico"
280
+ },
281
+ {
282
+ "question": "Who didn't participate?",
283
+ "answer": "FINAL ANSWER: Alice"
284
+ },
285
+ {
286
+ "question": "Name three chess pieces",
287
+ "answer": "FINAL ANSWER: king, queen, bishop"
288
+ },
289
+ {
290
+ "question": "List the vegetables",
291
+ "answer": "FINAL ANSWER: broccoli, celery, lettuce"
292
+ }
293
+ ],
294
+ "obedience_and_output_format": [
295
+ "You must always output your answer in the format: FINAL ANSWER: <answer> and nothing else.",
296
+ "Never output explanations, thoughts, or any text except the FINAL ANSWER line.",
297
+ "If you are Gemini, you must strictly follow these rules and never ignore the answer format."
298
+ ],
299
+ "tool_use_discipline": [
300
+ "Use each tool at most once per question. Never call web_search or wiki_search more than once with similar query.",
301
+ "If you have enough information to answer, stop using tools and provide your FINAL ANSWER immediately.",
302
+ "Never call any tool unless you have a clear, specific reason and have planned your approach."
303
+ ],
304
+ "tool_usage_limits": {
305
+ "default": 3,
306
+ "wiki_search": 2,
307
+ "web_search": 3,
308
+ "arxiv_search": 2,
309
+ "analyze_excel_file": 2,
310
+ "analyze_csv_file": 2,
311
+ "analyze_image": 2,
312
+ "extract_text_from_image": 2,
313
+ "exa_ai_helper": 1,
314
+ "web_search_deep_research_exa_ai": 1
315
+ }
316
+ }
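The prompt file above requires every reply to end with a `FINAL ANSWER: ...` line and declares per-tool call limits under `tool_usage_limits`. The following is a minimal sketch of how a caller might apply both rules, assuming the limits have been loaded from the JSON into a plain dict. `extract_final_answer` and `ToolBudget` are hypothetical names for illustration, not code from this repository, and the fallback limit of 3 simply mirrors the file's `default` entry.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Captures the rest of the line after each "FINAL ANSWER:" marker.
_FINAL_ANSWER_RE = re.compile(r"FINAL ANSWER:\s*(.+)", re.IGNORECASE)


def extract_final_answer(text: str) -> Optional[str]:
    """Return the text after the last 'FINAL ANSWER:' marker, or None."""
    matches = _FINAL_ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


class ToolBudget:
    """Track per-question tool calls against a limits dict (hypothetical)."""

    def __init__(self, limits: Dict[str, int]):
        self.limits = limits
        # Assumption: tools without their own entry use the "default" limit.
        self.default = limits.get("default", 3)
        self.used: Counter = Counter()

    def try_call(self, tool_name: str) -> bool:
        """Record a call and return True, or return False if over budget."""
        limit = self.limits.get(tool_name, self.default)
        if self.used[tool_name] >= limit:
            return False
        self.used[tool_name] += 1
        return True
```

A caller would create one `ToolBudget` per question and skip any tool call for which `try_call` returns `False`.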
tools.py ADDED
@@ -0,0 +1,2405 @@
1
+ # tools.py - Consolidated tools
2
+ # Dependencies are included
3
+
4
+ import os
5
+ import io
6
+ import json
7
+ import uuid
8
+ import base64
9
+ import shutil
10
+ import requests
11
+ import tempfile
12
+ import urllib.parse
13
+ import numpy as np
14
+ import pandas as pd
15
+ import subprocess
16
+ import sys
17
+ import sqlite3
18
+ import cmath
19
+ import time
20
+ import re
21
+ from PIL import Image, ImageDraw, ImageFont, ImageEnhance, ImageFilter
22
+ from typing import Any, Dict, List, Optional, Union
23
+ # chess is imported below as an optional dependency (see CHESS_AVAILABLE)
24
+
25
+ # Try to import matplotlib, but make it optional
26
+ try:
27
+ import matplotlib.pyplot as plt
28
+ MATPLOTLIB_AVAILABLE = True
29
+ except ImportError:
30
+ MATPLOTLIB_AVAILABLE = False
31
+ plt = None
32
+
33
+ # Try to import pytesseract for OCR
34
+ try:
35
+ import pytesseract
36
+ PYTESSERACT_AVAILABLE = True
37
+ except ImportError:
38
+ PYTESSERACT_AVAILABLE = False
39
+ pytesseract = None
40
+
41
+ # Try to import chess for chess analysis
42
+ try:
43
+ import chess
44
+ import chess.engine
45
+ CHESS_AVAILABLE = True
46
+ except ImportError:
47
+ CHESS_AVAILABLE = False
48
+ chess = None
49
+
50
+ # Always import the tool decorator - it's essential
51
+ from langchain_core.tools import tool
52
+
53
+ # Global configuration for search tools
54
+ SEARCH_LIMIT = 5 # Maximum number of results for all search tools (Tavily, Wikipedia, Arxiv)
55
+
56
+ # LangChain imports for search tools
57
+ try:
58
+ from langchain_tavily import TavilySearch
59
+ TAVILY_AVAILABLE = True
60
+ except ImportError:
61
+ TAVILY_AVAILABLE = False
62
+ print("Warning: TavilySearch not available. Install with: pip install langchain-tavily")
63
+
64
+ # Try to import the wikipedia package (optional dependency)
65
+ try:
66
+ import wikipedia
67
+ WIKIPEDIA_AVAILABLE = True
68
+ except ImportError as e:
69
+ WIKIPEDIA_AVAILABLE = False
70
+ print(f"Wikipedia search requires additional dependencies. Install with: pip install wikipedia. Error: {str(e)}")
71
+
72
+ try:
73
+ from langchain_community.document_loaders import WikipediaLoader
74
+ WIKILOADER_AVAILABLE = True
75
+ except ImportError:
76
+ WIKILOADER_AVAILABLE = False
77
+ print("Warning: WikipediaLoader not available. Install with: pip install langchain-community")
78
+
79
+ # Try to import arxiv as it's a common dependency
80
+ try:
81
+ import arxiv
82
+ ARXIV_AVAILABLE = True
83
+ except ImportError as e:
84
+ ARXIV_AVAILABLE = False
85
+ print(f"Arxiv search requires additional dependencies. Install with: pip install arxiv. Error: {str(e)}")
86
+
87
+ try:
88
+ from langchain_community.document_loaders import ArxivLoader
89
+ ARXIVLOADER_AVAILABLE = True
90
+ except ImportError:
91
+ ARXIVLOADER_AVAILABLE = False
92
+ print("Warning: ArxivLoader not available. Install with: pip install langchain-community")
93
+
94
+ # Try to import Exa for AI-powered answers
95
+ try:
96
+ from exa_py import Exa
97
+ EXA_AVAILABLE = True
98
+ except ImportError:
99
+ EXA_AVAILABLE = False
100
+ print("Warning: Exa not available. Install with: pip install exa-py")
101
+
102
+ # Google Gemini imports for video/audio/chess understanding
103
+ try:
104
+ from google import genai
105
+ from google.genai import types
106
+ GEMINI_AVAILABLE = True
107
+ except ImportError:
108
+ GEMINI_AVAILABLE = False
109
+ print("Warning: Google Gemini not available. Install with: pip install google-genai")
110
+
111
+
112
+ # ========== GEMINI HELPER FUNCTIONS ==========
113
+ def _get_gemini_client():
114
+ """
115
+ Initialize and return a Gemini client with proper error handling.
116
+ Args:
117
+ None. The API key is read from the GEMINI_KEY environment variable.
118
+ Returns:
119
+ client or None: The Gemini client if initialization succeeds, None otherwise.
120
+ """
121
+ if not GEMINI_AVAILABLE:
122
+ print("Warning: Google Gemini not available. Install with: pip install google-genai")
123
+ return None
124
+ try:
125
+ gemini_key = os.environ.get("GEMINI_KEY")
126
+ if not gemini_key:
127
+ print("Warning: GEMINI_KEY not found in environment variables.")
128
+ return None
129
+ client = genai.Client(api_key=gemini_key)
130
+ return client
131
+ except Exception as e:
132
+ print(f"Error initializing Gemini client: {str(e)}")
133
+ return None
134
+
135
+ def _get_gemini_response(prompt, error_prefix="Gemini", model_name="gemini-2.5-flash"):
136
+ """
137
+ Get a response from Gemini with proper error handling.
138
+ Args:
139
+ prompt: The prompt to send to Gemini
140
+ error_prefix (str): Prefix for error messages to identify the calling context
141
+ model_name (str, optional): The Gemini model to use.
142
+ Returns:
143
+ str: The Gemini response text, or an error message if the request fails.
144
+ """
145
+ client = _get_gemini_client()
146
+ if not client:
147
+ return f"{error_prefix} client not available. Check installation and API key configuration."
148
+ try:
149
+ response = client.models.generate_content(
150
+ model=model_name,
151
+ contents=prompt
152
+ )
153
+ return response.text
154
+ except Exception as e:
155
+ return f"Error in {error_prefix.lower()} request: {str(e)}"
156
+
157
+ # ========== IMAGE PROCESSING HELPERS ==========
158
+ def encode_image(image_path: str) -> str:
159
+ """
160
+ Convert an image file to a base64-encoded string.
161
+
162
+ Args:
163
+ image_path (str): The path to the image file to encode.
164
+
165
+ Returns:
166
+ str: The base64-encoded string representation of the image file.
167
+ """
168
+ with open(image_path, "rb") as image_file:
169
+ return base64.b64encode(image_file.read()).decode("utf-8")
170
+
171
+ def decode_image(base64_string: str) -> Any:
172
+ """
173
+ Convert a base64-encoded string to a PIL Image object.
174
+
175
+ Args:
176
+ base64_string (str): The base64-encoded string representing the image.
177
+
178
+ Returns:
179
+ Any: The decoded PIL Image object.
180
+ """
181
+ image_data = base64.b64decode(base64_string)
182
+ return Image.open(io.BytesIO(image_data))
183
+
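At the byte level, `encode_image` and `base64.b64decode` (the first step of `decode_image`) are inverses. A minimal sketch of that round trip follows, using only the standard library. The names `encode_file` and `roundtrip_demo` and the sample bytes are illustrative assumptions, not part of the commit, and no Pillow decoding is attempted.

```python
import base64
import os
import tempfile


def encode_file(path: str) -> str:
    """Return the file's bytes as a base64 string (same logic as encode_image)."""
    with open(path, "rb") as fh:
        return base64.b64encode(fh.read()).decode("utf-8")


def roundtrip_demo() -> bool:
    """Write sample bytes to a temp file, encode them, decode them, and compare."""
    payload = b"\x89PNG\r\n\x1a\n-illustrative-bytes-only-"  # not a real image
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
        encoded = encode_file(path)
        return base64.b64decode(encoded) == payload
    finally:
        os.remove(path)  # clean up the temporary file
```

Calling `roundtrip_demo()` should return `True`, since base64 encoding is lossless.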
184
+ def save_image(image: Any, directory: str = "image_outputs") -> str:
185
+ """
186
+ Save a PIL Image object to disk in the specified directory and return the file path.
187
+
188
+ Args:
189
+ image (Any): The PIL Image object to save.
190
+ directory (str, optional): The directory to save the image in. Defaults to "image_outputs".
191
+
192
+ Returns:
193
+ str: The file path where the image was saved.
194
+ """
195
+ os.makedirs(directory, exist_ok=True)
196
+ image_id = str(uuid.uuid4())
197
+ image_path = os.path.join(directory, f"{image_id}.png")
198
+ image.save(image_path)
199
+ return image_path
200
+
201
+ # ========== CODE INTERPRETER ==========
202
+ class CodeInterpreter:
203
+ """
204
+ A code interpreter for executing code in various languages (Python, Bash, SQL, C, Java) with safety and resource controls.
205
+
206
+ Args:
207
+ allowed_modules (list, optional): List of allowed module names for Python execution.
208
+ max_execution_time (int, optional): Maximum execution time in seconds for code blocks.
209
+ working_directory (str, optional): Directory for temporary files and execution context.
210
+
211
+ Attributes:
212
+ globals (dict): Global variables for code execution.
213
+ temp_sqlite_db (str): Path to a temporary SQLite database for SQL code.
214
+ """
215
+ def __init__(self, allowed_modules=None, max_execution_time=30, working_directory=None):
216
+ self.allowed_modules = allowed_modules or [
217
+ "numpy", "pandas", "matplotlib", "scipy", "sklearn",
218
+ "math", "random", "statistics", "datetime", "collections",
219
+ "itertools", "functools", "operator", "re", "json",
220
+ "sympy", "networkx", "nltk", "PIL", "pytesseract",
221
+ "cmath", "uuid", "tempfile", "requests", "urllib"
222
+ ]
223
+ self.max_execution_time = max_execution_time
224
+ self.working_directory = working_directory or os.getcwd()
225
+ if not os.path.exists(self.working_directory):
226
+ os.makedirs(self.working_directory)
227
+
228
+ # Use global imports that are already available
229
+ self.globals = {
230
+ "__builtins__": __builtins__,
231
+ "np": np,
232
+ "pd": pd,
233
+ "Image": Image,
234
+ }
235
+
236
+ # Only add plt to globals if it's available
237
+ if MATPLOTLIB_AVAILABLE:
238
+ self.globals["plt"] = plt
239
+
240
+ self.temp_sqlite_db = os.path.join(tempfile.gettempdir(), "code_exec.db")
241
+
242
+ def execute_code(self, code: str, language: str = "python") -> Dict[str, Any]:
243
+ """
244
+ Execute code in the specified language with safety controls.
245
+
246
+ Args:
247
+ code (str): The source code to execute
248
+ language (str): The programming language
249
+
250
+ Returns:
251
+ Dict containing execution results, status, and outputs
252
+ """
253
+ try:
254
+ if language.lower() == "python":
255
+ return self._execute_python(code)
256
+ elif language.lower() == "bash":
257
+ return self._execute_bash(code)
258
+ elif language.lower() == "sql":
259
+ return self._execute_sql(code)
260
+ elif language.lower() == "c":
261
+ return self._execute_c(code)
262
+ elif language.lower() == "java":
263
+ return self._execute_java(code)
264
+ else:
265
+ return {"status": "error", "stderr": f"Unsupported language: {language}"}
266
+ except Exception as e:
267
+ return {"status": "error", "stderr": str(e)}
268
+
269
+ def _execute_python(self, code: str) -> Dict[str, Any]:
270
+ """Execute Python code with exec(), capturing stdout/stderr, DataFrames, and matplotlib figures. No sandboxing or timeout is applied."""
271
+ try:
272
+ # Capture stdout and stderr
273
+ # Create string buffers to capture output
274
+ stdout_buffer = io.StringIO()
275
+ stderr_buffer = io.StringIO()
276
+
277
+ # Store original stdout/stderr
278
+ old_stdout = sys.stdout
279
+ old_stderr = sys.stderr
280
+
281
+ # Redirect stdout/stderr to our buffers
282
+ sys.stdout = stdout_buffer
283
+ sys.stderr = stderr_buffer
284
+
285
+ try:
286
+ # Create a copy of globals for this execution
287
+ local_globals = self.globals.copy()
288
+ local_globals['__name__'] = '__main__'
289
+
290
+ # Execute the code
291
+ exec(code, local_globals)
292
+
293
+ # Get captured output
294
+ stdout_content = stdout_buffer.getvalue()
295
+ stderr_content = stderr_buffer.getvalue()
296
+
297
+ # Capture any variables that might be dataframes or plots
298
+ result = {"status": "success", "stdout": stdout_content, "stderr": stderr_content, "result": None}
299
+
300
+ # Check for dataframes
301
+ dataframes = []
302
+ for name, value in local_globals.items():
303
+ if isinstance(value, pd.DataFrame):
304
+ dataframes.append({
305
+ "name": name,
306
+ "shape": value.shape,
307
+ "head": value.head().to_dict('records')
308
+ })
309
+ if dataframes:
310
+ result["dataframes"] = dataframes
311
+
312
+ # Check for plots (only if matplotlib is available)
313
+ plots = []
314
+ if MATPLOTLIB_AVAILABLE and plt is not None:
315
+ try:
316
+ # Save any current plots
317
+ if plt.get_fignums():
318
+ for fig_num in plt.get_fignums():
319
+ fig = plt.figure(fig_num)
320
+ plot_path = os.path.join(self.working_directory, f"plot_{fig_num}.png")
321
+ fig.savefig(plot_path)
322
+ plots.append(plot_path)
323
+ plt.close(fig)
324
+ except Exception as plot_error:
325
+ # If plot handling fails, just continue without plots
326
+ print(f"Warning: Plot handling failed: {plot_error}")
327
+ if plots:
328
+ result["plots"] = plots
329
+
330
+ return result
331
+
332
+ finally:
333
+ # Restore original stdout/stderr
334
+ sys.stdout = old_stdout
335
+ sys.stderr = old_stderr
336
+ stdout_buffer.close()
337
+ stderr_buffer.close()
338
+
339
+ except Exception as e:
340
+ return {"status": "error", "stderr": str(e)}
341
+
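`_execute_python` swaps `sys.stdout` and `sys.stderr` by hand and restores them in a `finally` block. The same capture can be sketched with the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr`, which restore the streams automatically. This is an alternative sketch, not the commit's implementation. `run_and_capture` is a hypothetical name, and like the original it performs no sandboxing.

```python
import contextlib
import io


def run_and_capture(code: str) -> dict:
    """Run `code` with exec() and return its captured stdout/stderr.

    Assumption: the caller trusts `code`; nothing here restricts what it can do.
    """
    stdout_buffer, stderr_buffer = io.StringIO(), io.StringIO()
    namespace = {"__name__": "__main__"}  # fresh globals for each run
    try:
        with contextlib.redirect_stdout(stdout_buffer), \
                contextlib.redirect_stderr(stderr_buffer):
            exec(code, namespace)
    except Exception as exc:
        # Report the exception text in the same shape the interpreter uses
        return {"status": "error", "stdout": stdout_buffer.getvalue(), "stderr": str(exc)}
    return {
        "status": "success",
        "stdout": stdout_buffer.getvalue(),
        "stderr": stderr_buffer.getvalue(),
    }
```

One difference worth noting: this variant keeps whatever was printed before an exception was raised, whereas the original returns only the error text.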
342
+ def _execute_bash(self, code: str) -> Dict[str, Any]:
343
+ """Execute Bash code."""
344
+ try:
345
+ result = subprocess.run(
346
+ code,
347
+ shell=True,
348
+ capture_output=True,
349
+ text=True,
350
+ timeout=self.max_execution_time
351
+ )
352
+ return {
353
+ "status": "success" if result.returncode == 0 else "error",
354
+ "stdout": result.stdout,
355
+ "stderr": result.stderr,
356
+ "returncode": result.returncode
357
+ }
358
+ except subprocess.TimeoutExpired:
359
+ return {"status": "error", "stderr": "Execution timed out"}
360
+ except Exception as e:
361
+ return {"status": "error", "stderr": str(e)}
362
+
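The timeout handling in `_execute_bash` relies on `subprocess.run(..., timeout=...)` raising `subprocess.TimeoutExpired`. A small sketch of that pattern is shown below. It passes an argument list instead of `shell=True` and launches the current Python interpreter so that it runs on any platform. `run_with_timeout` is a hypothetical helper, not part of the commit.

```python
import subprocess
import sys


def run_with_timeout(args: list, timeout: float) -> dict:
    """Run a command and map its outcome to the interpreter's result shape."""
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired
        return {"status": "error", "stderr": "Execution timed out"}
    return {
        "status": "success" if proc.returncode == 0 else "error",
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
    }


def demo() -> tuple:
    """Run one fast command and one that exceeds its time budget."""
    fast = run_with_timeout([sys.executable, "-c", "print('hi')"], timeout=30)
    slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
    return fast, slow
```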
363
+ def _execute_sql(self, code: str) -> Dict[str, Any]:
364
+ """Execute SQL code using SQLite."""
365
+ try:
366
+ conn = sqlite3.connect(self.temp_sqlite_db)
367
+ cursor = conn.cursor()
368
+
369
+ # Execute SQL
370
+ cursor.execute(code)
371
+
372
+ # Fetch results if it's a SELECT
373
+ if code.strip().upper().startswith('SELECT'):
374
+ results = cursor.fetchall()
375
+ columns = [description[0] for description in cursor.description]
376
+ result = {"status": "success", "results": results, "columns": columns}
377
+ else:
378
+ conn.commit()
379
+ result = {"status": "success", "message": f"Executed: {code}"}
380
+
381
+ conn.close()
382
+ return result
383
+
384
+ except Exception as e:
385
+ return {"status": "error", "stderr": str(e)}
386
+
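`_execute_sql` decides whether to fetch rows by checking that the statement starts with `SELECT`, and it closes the connection only on the success path. The sketch below shows one possible variation under two assumptions: `cursor.description` is used to detect a result set (so `WITH` or `PRAGMA` queries also return rows), and `contextlib.closing` closes the connection even when the statement fails. `run_sql` is a hypothetical name.

```python
import sqlite3
from contextlib import closing


def run_sql(db_path: str, statement: str) -> dict:
    """Execute a single SQL statement against a SQLite database."""
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            cursor = conn.execute(statement)
            if cursor.description is not None:  # the statement produced rows
                columns = [col[0] for col in cursor.description]
                return {"status": "success", "results": cursor.fetchall(), "columns": columns}
            conn.commit()  # persist writes for non-query statements
            return {"status": "success", "message": f"Executed: {statement}"}
    except sqlite3.Error as exc:
        return {"status": "error", "stderr": str(exc)}
```

For example, `run_sql(":memory:", "SELECT 1 AS one")` returns the single row `(1,)` under the column name `one`.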
387
+ def _execute_c(self, code: str) -> Dict[str, Any]:
388
+ """Execute C code by compiling and running."""
389
+ try:
390
+ # Create temporary C file
391
+ c_file = os.path.join(self.working_directory, "temp_code.c")
392
+ with open(c_file, 'w') as f:
393
+ f.write(code)
394
+
395
+ # Compile
396
+ compile_result = subprocess.run(
397
+ ["gcc", "-o", os.path.join(self.working_directory, "temp_program"), c_file],
398
+ capture_output=True,
399
+ text=True
400
+ )
401
+
402
+ if compile_result.returncode != 0:
403
+ return {"status": "error", "stderr": f"Compilation failed: {compile_result.stderr}"}
404
+
405
+ # Run
406
+ run_result = subprocess.run(
407
+ [os.path.join(self.working_directory, "temp_program")],
408
+ capture_output=True,
409
+ text=True,
410
+ timeout=self.max_execution_time
411
+ )
412
+
413
+ return {
414
+ "status": "success" if run_result.returncode == 0 else "error",
415
+ "stdout": run_result.stdout,
416
+ "stderr": run_result.stderr,
417
+ "returncode": run_result.returncode
418
+ }
419
+
420
+ except subprocess.TimeoutExpired:
421
+ return {"status": "error", "stderr": "Execution timed out"}
422
+ except Exception as e:
423
+ return {"status": "error", "stderr": str(e)}
424
+
425
+ def _execute_java(self, code: str) -> Dict[str, Any]:
426
+ """Execute Java code by compiling and running."""
427
+ try:
428
+ # Create temporary Java file
429
+ java_file = os.path.join(self.working_directory, "TempCode.java")
430
+ with open(java_file, 'w') as f:
431
+ f.write(code)
432
+
433
+ # Compile
434
+ compile_result = subprocess.run(
435
+ ["javac", java_file],
436
+ capture_output=True,
437
+ text=True
438
+ )
439
+
440
+ if compile_result.returncode != 0:
441
+ return {"status": "error", "stderr": f"Compilation failed: {compile_result.stderr}"}
442
+
443
+ # Run
444
+ run_result = subprocess.run(
445
+ ["java", "-cp", self.working_directory, "TempCode"],
446
+ capture_output=True,
447
+ text=True,
448
+ timeout=self.max_execution_time
449
+ )
450
+
451
+ return {
452
+ "status": "success" if run_result.returncode == 0 else "error",
453
+ "stdout": run_result.stdout,
454
+ "stderr": run_result.stderr,
455
+ "returncode": run_result.returncode
456
+ }
457
+
458
+ except subprocess.TimeoutExpired:
459
+ return {"status": "error", "stderr": "Execution timed out"}
460
+ except Exception as e:
461
+ return {"status": "error", "stderr": str(e)}
462
+
463
+ # Create a global instance for use by tools
464
+ interpreter_instance = CodeInterpreter()
465
+
466
+ @tool
467
+ def execute_code_multilang(code: str, language: str = "python") -> str:
468
+ """Execute code in multiple languages (Python, Bash, SQL, C, Java) and return results.
469
+
470
+ Args:
471
+ code (str): The source code to execute.
472
+ language (str): The language of the code. Supported: "python", "bash", "sql", "c", "java".
473
+
474
+ Returns:
475
+ A string summarizing the execution results (stdout, stderr, errors, plots, dataframes if any).
476
+ """
477
+ supported_languages = ["python", "bash", "sql", "c", "java"]
478
+ language = language.lower()
479
+
480
+ if language not in supported_languages:
481
+ return json.dumps({
482
+ "type": "tool_response",
483
+ "tool_name": "execute_code_multilang",
484
+ "error": f"❌ Unsupported language: {language}. Supported languages are: {', '.join(supported_languages)}"
485
+ })
486
+
487
+ result = interpreter_instance.execute_code(code, language=language)
488
+
489
+ response = []
490
+
491
+ if result["status"] == "success":
492
+ response.append(f"✅ Code executed successfully in **{language.upper()}**")
493
+
494
+ if result.get("stdout"):
495
+ response.append(
496
+ "\n**Standard Output:**\n```\n" + result["stdout"].strip() + "\n```"
497
+ )
498
+
499
+ if result.get("stderr"):
500
+ response.append(
501
+ "\n**Standard Error (if any):**\n```\n"
502
+ + result["stderr"].strip()
503
+ + "\n```"
504
+ )
505
+
506
+ if result.get("result") is not None:
507
+ response.append(
508
+ "\n**Execution Result:**\n```\n"
509
+ + str(result["result"]).strip()
510
+ + "\n```"
511
+ )
512
+
513
+ if result.get("dataframes"):
514
+ for df_info in result["dataframes"]:
515
+ response.append(
516
+ f"\n**DataFrame `{df_info['name']}` (Shape: {df_info['shape']})**"
517
+ )
518
+ df_preview = pd.DataFrame(df_info["head"])
519
+ response.append("First 5 rows:\n```\n" + str(df_preview) + "\n```")
520
+
521
+ if result.get("plots"):
522
+ response.append(
523
+ f"\n**Generated {len(result['plots'])} plot(s)** (Image data returned separately)"
524
+ )
525
+
526
+ else:
527
+ response.append(f"❌ Code execution failed in **{language.upper()}**")
528
+ if result.get("stderr"):
529
+ response.append(
530
+ "\n**Error Log:**\n```\n" + result["stderr"].strip() + "\n```"
531
+ )
532
+
533
+ return json.dumps({
534
+ "type": "tool_response",
535
+ "tool_name": "execute_code_multilang",
536
+ "result": "\n".join(response)
537
+ })
538
+
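`execute_code_multilang` returns a JSON envelope with `type` and `tool_name` keys plus either a `result` or an `error` field. A small helper could build that envelope in one place. The sketch below assumes a hypothetical function named `tool_response`, which does not appear in the commit.

```python
import json


def tool_response(tool_name: str, **payload) -> str:
    """Serialize a tool result in the {"type", "tool_name", ...} envelope."""
    envelope = {"type": "tool_response", "tool_name": tool_name}
    envelope.update(payload)  # e.g. result=..., error=...
    return json.dumps(envelope)


def parse_tool_response(raw: str) -> tuple:
    """Return (ok, value): the result on success, the error text otherwise."""
    data = json.loads(raw)
    if "error" in data:
        return False, data["error"]
    return True, data.get("result")
```

A caller could then write `tool_response("execute_code_multilang", error="Unsupported language: ruby")` instead of repeating the dictionary literal in every branch.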
539
+ # ========== MATH TOOLS ==========
540
+ @tool
541
+ def multiply(a: float, b: float) -> float:
542
+ """
543
+ Multiply two numbers and return the result.
544
+
545
+ Args:
546
+ a (float): The first number.
547
+ b (float): The second number.
548
+
549
+ Returns:
550
+ float: The product of a and b.
551
+ """
552
+ return a * b
553
+
554
+ @tool
555
+ def add(a: float, b: float) -> float:
556
+ """
557
+ Add two numbers and return the result.
558
+
559
+ Args:
560
+ a (float): The first number.
561
+ b (float): The second number.
562
+
563
+ Returns:
564
+ float: The sum of a and b.
565
+ """
566
+ return a + b
567
+
568
+ @tool
569
+ def subtract(a: float, b: float) -> float:
570
+ """
571
+ Subtract the second number from the first and return the result.
572
+
573
+ Args:
574
+ a (float): The number to subtract from.
575
+ b (float): The number to subtract.
576
+
577
+ Returns:
578
+ float: The result of a - b.
579
+ """
580
+ return a - b
581
+
582
+ @tool
583
+ def divide(a: float, b: float) -> float:
584
+ """
585
+ Divide the first number by the second and return the result.
586
+
587
+ Args:
588
+ a (float): The numerator.
589
+ b (float): The denominator. Must not be zero.
590
+
591
+ Returns:
592
+ float: The quotient of a and b.
593
+ """
594
+ if b == 0:
595
+ raise ValueError("Cannot divide by zero")
596
+ return a / b
597
+
598
+ @tool
599
+ def modulus(a: int, b: int) -> int:
600
+ """
601
+ Compute the modulus (remainder) of two integers.
602
+
603
+ Args:
604
+ a (int): The dividend.
605
+ b (int): The divisor.
606
+
607
+ Returns:
608
+ int: The remainder when a is divided by b.
609
+ """
610
+ if b == 0:
611
+ raise ValueError("Cannot divide by zero")
612
+ return a % b
613
+
614
+ @tool
615
+ def power(a: float, b: float) -> float:
616
+ """
617
+ Raise the first number to the power of the second and return the result.
618
+
619
+ Args:
620
+ a (float): The base number.
621
+ b (float): The exponent.
622
+
623
+ Returns:
624
+ float: a raised to the power of b.
625
+ """
626
+ return a ** b
627
+
628
+ @tool
629
+ def square_root(a: float) -> float:
630
+ """
631
+ Compute the square root of a number. Returns a complex number if input is negative.
632
+
633
+ Args:
634
+ a (float): The number to compute the square root of.
635
+
636
+ Returns:
637
+ float or complex: The square root of a. If a < 0, returns a complex number.
638
+ """
639
+ if a >= 0:
640
+ return a ** 0.5
641
+ return cmath.sqrt(a)
642
+
643
+ # ========== WEB/SEARCH TOOLS ==========
644
+ @tool
645
+ def wiki_search(input: str) -> str:
646
+ """
647
+ Search Wikipedia for a query and return up to 3 results as formatted text.
648
+
649
+ Args:
650
+ input (str): The search query string for Wikipedia.
651
+
652
+ Returns:
653
+ str: Formatted search results from Wikipedia with source information and content.
654
+ """
655
+ try:
656
+ if not WIKILOADER_AVAILABLE:
657
+ return json.dumps({
658
+ "type": "tool_response",
659
+ "tool_name": "wiki_search",
660
+ "error": "Wikipedia search not available. Install with: pip install langchain-community"
661
+ })
662
+ search_docs = WikipediaLoader(query=input, load_max_docs=SEARCH_LIMIT).load()
663
+ formatted_results = "\n\n---\n\n".join(
664
+ [
665
+ f'<Document source="{doc.metadata["source"]}" page="{doc.metadata.get("page", "")}"/>\n{doc.page_content}'
666
+ for doc in search_docs
667
+ ]
668
+ )
669
+ return json.dumps({
670
+ "type": "tool_response",
671
+ "tool_name": "wiki_search",
672
+ "wiki_results": formatted_results
673
+ })
674
+ except Exception as e:
675
+ return json.dumps({
676
+ "type": "tool_response",
677
+ "tool_name": "wiki_search",
678
+ "error": f"Error in Wikipedia search: {str(e)}"
679
+ })
680
+
681
+ @tool
682
+ def web_search(input: str) -> str:
683
+ """
684
+ Search the web using Tavily for a query and return up to 3 results as formatted text.
685
+
686
+ Tavily is a search API that provides real-time web search results. This tool is useful for:
687
+ - Finding current information about recent events
688
+ - Searching for specific facts, statistics, or data
689
+ - Getting up-to-date information from various websites
690
+ - Researching topics that may not be covered in Wikipedia or academic papers
691
+
692
+ Args:
693
+ input (str): The search query string to search for on the web.
694
+
695
+ Returns:
696
+ str: Formatted search results from Tavily with source URLs and content snippets.
697
+ Returns an error message if Tavily is not available or if the search fails.
698
+
699
+ """
700
+ if not TAVILY_AVAILABLE:
701
+ return json.dumps({
702
+ "type": "tool_response",
703
+ "tool_name": "web_search",
704
+ "error": "Tavily search not available. Install with: pip install langchain-tavily"
705
+ })
706
+ try:
707
+ if not os.environ.get("TAVILY_API_KEY"):
708
+ return json.dumps({
709
+ "type": "tool_response",
710
+ "tool_name": "web_search",
711
+ "error": "TAVILY_API_KEY not found in environment variables. Please set it in your .env file."
712
+ })
713
+ search_result = TavilySearch(max_results=SEARCH_LIMIT).invoke(input)
714
+
715
+ # Handle different response types
716
+ if isinstance(search_result, str):
717
+ # If Tavily returned a string (error message or direct answer)
718
+ return json.dumps({
719
+ "type": "tool_response",
720
+ "tool_name": "web_search",
721
+ "web_results": search_result
722
+ })
723
+ elif isinstance(search_result, list):
724
+ # If Tavily returned a list of Document objects
725
+ formatted_results = "\n\n---\n\n".join(
726
+ [
727
+ f'<Document source="{doc.metadata["source"]}" page="{doc.metadata.get("page", "")}"/>\n{doc.page_content}'
728
+ for doc in search_result
729
+ ]
730
+ )
731
+ return json.dumps({
732
+ "type": "tool_response",
733
+ "tool_name": "web_search",
734
+ "web_results": formatted_results
735
+ })
736
+ else:
737
+ return json.dumps({
738
+ "type": "tool_response",
739
+ "tool_name": "web_search",
740
+ "web_results": str(search_result)
741
+ })
742
+ except Exception as e:
743
+ return json.dumps({
744
+ "type": "tool_response",
745
+ "tool_name": "web_search",
746
+ "error": f"Error in web search: {str(e)}"
747
+ })
748
+
749
+ @tool
750
+ def arxiv_search(input: str) -> str:
751
+ """
752
+ Search Arxiv for academic papers and return up to 3 results as formatted text.
753
+
754
+ Args:
755
+ input (str): The search query string for academic papers.
756
+
757
+ Returns:
758
+ str: Formatted search results from Arxiv with paper metadata and abstracts.
759
+ """
760
+ try:
761
+ if not ARXIVLOADER_AVAILABLE:
762
+ return json.dumps({
763
+ "type": "tool_response",
764
+ "tool_name": "arxiv_search",
765
+ "error": "Arxiv search not available. Install with: pip install langchain-community"
766
+ })
767
+ search_docs = ArxivLoader(query=input, load_max_docs=SEARCH_LIMIT).load()
768
+ formatted_results = "\n\n---\n\n".join(
769
+ [
770
+ f'<Document source="{doc.metadata["source"]}" page="{doc.metadata.get("page", "")}"/>\n{doc.page_content}'
771
+ for doc in search_docs
772
+ ]
773
+ )
774
+ return json.dumps({
775
+ "type": "tool_response",
776
+ "tool_name": "arxiv_search",
777
+ "arxiv_results": formatted_results
778
+ })
779
+ except Exception as e:
780
+ return json.dumps({
781
+ "type": "tool_response",
782
+ "tool_name": "arxiv_search",
783
+ "error": f"Error in Arxiv search: {str(e)}"
784
+ })
785
+
786
+ # @tool
787
+ # def exa_ai_helper(question: str) -> str:
788
+ # """
789
+ # Prefer web_search_deep_research_exa_ai. It is smarter, and gives more researched results.
790
+ # Smart AI web-search engine. Gives web references.
791
+ # Get direct answers + web references.
792
+ # Do not ask me about attached files or video/audio analysis.
793
+
794
+ # This tool is particularly useful when:
795
+ # - You need authoritative, up-to-date information on a topic
796
+ # - You want to double-check your own knowledge or reasoning
797
+ # - You're dealing with complex questions that require multiple sources
798
+ # - You need citations and sources to back up your answer
799
+ # - You're unsure about the accuracy of your response
800
+
801
+ # The tool performs an Exa search and uses an LLM to generate either:
802
+ # - A direct answer for specific queries (e.g., "What is the capital of France?" returns "Paris")
803
+ # - A detailed summary with citations for open-ended queries (e.g., "What is the state of AI in healthcare?")
804
+
805
+ # WARNING: Always judge yourself and use additional tools for research.
806
+
807
+ # Args:
808
+ # question (str): The question to get an answer for and search results. Can be specific or open-ended.
809
+
810
+ # Returns:
811
+ # str: A well-researched answer with citations and sources, or an error message.
812
+
813
+ # """
814
+ # if not EXA_AVAILABLE:
815
+ # return json.dumps({
816
+ # "type": "tool_response",
817
+ # "tool_name": "exa_ai_helper",
818
+ # "error": "Exa AI Helper not available. Install with: pip install exa-py"
819
+ # })
820
+ # try:
821
+ # exa_api_key = os.environ.get("EXA_API_KEY")
822
+ # if not exa_api_key:
823
+ # return json.dumps({
824
+ # "type": "tool_response",
825
+ # "tool_name": "exa_ai_helper",
826
+ # "error": "EXA_API_KEY not found in environment variables. Please set it in your .env file."
827
+ # })
828
+ # exa = Exa(exa_api_key)
829
+ # result = exa.stream_answer(
830
+ # question,
831
+ # text=True,
832
+ # )
833
+ # answer_parts = []
834
+ # for chunk in result:
835
+ # # If chunk is a StreamChunk, extract its text/content
836
+ # if hasattr(chunk, 'text'):
837
+ # answer_parts.append(chunk.text)
838
+ # elif isinstance(chunk, str):
839
+ # answer_parts.append(chunk)
840
+ # else:
841
+ # answer_parts.append(str(chunk))
842
+ # full_answer = ''.join(answer_parts)
843
+ # return json.dumps({
844
+ # "type": "tool_response",
845
+ # "tool_name": "exa_ai_helper",
846
+ # "answer": full_answer
847
+ # })
848
+ # except Exception as e:
849
+ # return json.dumps({
850
+ # "type": "tool_response",
851
+ # "tool_name": "exa_ai_helper",
852
+ # "error": f"Error getting AI Helper answer: {str(e)}"
853
+ # })
854
+
855
+ # ========== FILE/DATA TOOLS ==========
856
+ @tool
857
+ def save_and_read_file(content: str, filename: Optional[str] = None) -> str:
858
+ """
859
+ Save the provided content to a file and return the file path.
860
+
861
+ Args:
862
+ content (str): The content to write to the file.
863
+ filename (str, optional): The name of the file. If not provided, a random file name is generated.
864
+
865
+ Returns:
866
+ str: The file path where the content was saved.
867
+ """
868
+ temp_dir = tempfile.gettempdir()
869
+ if filename is None:
870
+ temp_file = tempfile.NamedTemporaryFile(delete=False, dir=temp_dir)
871
+ filepath = temp_file.name
872
+ else:
873
+ filepath = os.path.join(temp_dir, filename)
874
+ with open(filepath, "w") as f:
875
+ f.write(content)
876
+ return json.dumps({
877
+ "type": "tool_response",
878
+ "tool_name": "save_and_read_file",
879
+ "result": f"File saved to {filepath}. You can read this file to process its contents."
880
+ })
881
+
882
+ @tool
883
+ def download_file_from_url(url: str, filename: Optional[str] = None) -> str:
884
+ """
885
+ Download a file from a URL and save it to a temporary location. Returns the file path.
886
+
887
+ Args:
888
+ url (str): The URL of the file to download.
889
+ filename (str, optional): The name of the file. If not provided, a name is inferred or generated.
890
+
891
+ Returns:
892
+ str: The file path where the file was downloaded.
893
+ """
894
+ try:
895
+ if not filename:
896
+ from urllib.parse import urlparse
897
+ path = urlparse(url).path
898
+ filename = os.path.basename(path)
899
+ if not filename:
900
+ filename = f"downloaded_{uuid.uuid4().hex[:8]}"
901
+ temp_dir = tempfile.gettempdir()
902
+ filepath = os.path.join(temp_dir, filename)
903
+ response = requests.get(url, stream=True, timeout=15)
904
+ response.raise_for_status()
905
+ with open(filepath, "wb") as f:
906
+ for chunk in response.iter_content(chunk_size=8192):
907
+ f.write(chunk)
908
+ return json.dumps({
909
+ "type": "tool_response",
910
+ "tool_name": "download_file_from_url",
911
+ "result": f"File downloaded to {filepath}. You can read this file to process its contents."
912
+ })
913
+ except Exception as e:
914
+ return json.dumps({
915
+ "type": "tool_response",
916
+ "tool_name": "download_file_from_url",
917
+ "error": f"Error downloading file: {str(e)}"
918
+ })
919
+
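When no `filename` is given, `download_file_from_url` takes the last path segment of the URL and falls back to a generated name if that segment is empty. That inference step can be sketched on its own, without any network access. `infer_filename` is a hypothetical name and the example URLs are placeholders.

```python
import os
import uuid
from urllib.parse import urlparse


def infer_filename(url: str) -> str:
    """Derive a file name from a URL path, or generate one if the path has none."""
    name = os.path.basename(urlparse(url).path)  # query string is not part of .path
    return name or f"downloaded_{uuid.uuid4().hex[:8]}"
```

With this sketch, `infer_filename("https://example.com/data/report.csv?x=1")` yields `report.csv`, while a URL ending in `/` yields a name that starts with `downloaded_`.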
920
+ @tool
921
+ def get_task_file(task_id: str, file_name: str) -> str:
922
+ """
923
+ Download a file associated with a given task_id from the evaluation API, with a local fallback.
924
+
925
+ This tool is used to download files that are part of CMW Platform Agent benchmark tasks.
926
+ It first tries to download from the evaluation API, and if that fails
927
+ (e.g., due to network issues or rate limits),
928
+ it falls back to local files in the 'files' directory.
929
+ The file is always saved to a 'downloads' directory.
930
+
931
+ Args:
932
+ task_id (str): The task ID for the file to download.
933
+ file_name (str): The name of the file to download.
934
+
935
+ Returns:
936
+ str: The absolute file path where the file was downloaded, or an error message if not found.
937
+ """
938
+ directory_name = "downloads"
939
+ os.makedirs(directory_name, exist_ok=True)
940
+ try:
941
+ # Try to download from evaluation API
942
+ evaluation_api_base_url = os.environ.get("EVALUATION_API_BASE_URL", "https://api.gaia-benchmark.com")
943
+ response = requests.get(f"{evaluation_api_base_url}/files/{task_id}", timeout=15)
944
+ response.raise_for_status()
945
+ filepath = os.path.join(directory_name, file_name)
946
+ with open(filepath, 'wb') as file:
947
+ file.write(response.content)
948
+ return json.dumps({
949
+ "type": "tool_response",
950
+ "tool_name": "get_task_file",
951
+ "result": os.path.abspath(filepath)
952
+ })
953
+ except Exception as e:
954
+ # Fallback to local files
955
+ try:
956
+ local_filepath = os.path.join("files", file_name)
957
+ if os.path.exists(local_filepath):
958
+ filepath = os.path.join(directory_name, file_name)
959
+ shutil.copy2(local_filepath, filepath)
960
+ return json.dumps({
961
+ "type": "tool_response",
962
+ "tool_name": "get_task_file",
963
+ "result": os.path.abspath(filepath)
964
+ })
965
+ else:
966
+ return json.dumps({
967
+ "type": "tool_response",
968
+ "tool_name": "get_task_file",
969
+ "error": f"Error: File {file_name} not found locally or via API"
970
+ })
971
+ except Exception as local_error:
972
+ return json.dumps({
973
+ "type": "tool_response",
974
+ "tool_name": "get_task_file",
975
+ "error": f"Error downloading file: {str(e)}. Local fallback also failed: {str(local_error)}"
976
+ })
977
+
978
+ @tool
979
+ def extract_text_from_image(image_path: str) -> str:
980
+ """
981
+ Extract text from an image file using OCR (pytesseract) and return the extracted text.
982
+
983
+ Args:
984
+ image_path (str): The path to the image file to process.
985
+
986
+ Returns:
987
+ str: The extracted text, or an error message if extraction fails.
988
+ """
989
+ try:
990
+ image = Image.open(image_path)
991
+ if PYTESSERACT_AVAILABLE:
992
+ text = pytesseract.image_to_string(image)
993
+ else:
994
+ return json.dumps({
995
+ "type": "tool_response",
996
+ "tool_name": "extract_text_from_image",
997
+ "error": "OCR not available. Install with: pip install pytesseract"
998
+ })
999
+ return json.dumps({
1000
+ "type": "tool_response",
1001
+ "tool_name": "extract_text_from_image",
1002
+ "result": f"Extracted text from image:\n\n{text}"
1003
+ })
1004
+ except Exception as e:
1005
+ return json.dumps({
1006
+ "type": "tool_response",
1007
+ "tool_name": "extract_text_from_image",
1008
+ "error": f"Error extracting text from image: {str(e)}"
1009
+ })
1010
+
1011
+ @tool
1012
+ def analyze_csv_file(file_path: str, query: str) -> str:
1013
+ """
1014
+ Analyze a CSV file using pandas and return summary statistics and column info.
1015
+
1016
+ Args:
1017
+ file_path (str): The path to the CSV file.
1018
+ query (str): A question or description of the analysis to perform (currently unused).
1019
+
1020
+ Returns:
1021
+ str: Summary statistics and column information, or an error message if analysis fails.
1022
+ """
1023
+ try:
1024
+ df = pd.read_csv(file_path)
1025
+ result = f"CSV file loaded with {len(df)} rows and {len(df.columns)} columns.\n"
1026
+ result += f"Columns: {', '.join(map(str, df.columns))}\n\n"
1027
+ result += "Summary statistics:\n"
1028
+ result += str(df.describe())
1029
+ return json.dumps({
1030
+ "type": "tool_response",
1031
+ "tool_name": "analyze_csv_file",
1032
+ "result": result
1033
+ })
1034
+ except Exception as e:
1035
+ return json.dumps({
1036
+ "type": "tool_response",
1037
+ "tool_name": "analyze_csv_file",
1038
+ "error": f"Error analyzing CSV file: {str(e)}"
1039
+ })
1040
+
1041
+ @tool
1042
+ def analyze_excel_file(file_path: str, query: str) -> str:
1043
+ """
1044
+ Analyze an Excel file using pandas and return summary statistics and column info.
1045
+
1046
+ Args:
1047
+ file_path (str): The path to the Excel file.
1048
+ query (str): A question or description of the analysis to perform (currently unused).
1049
+
1050
+ Returns:
1051
+ str: Summary statistics and column information, or an error message if analysis fails.
1052
+ """
1053
+ try:
1054
+ df = pd.read_excel(file_path)
1055
+ result = f"Excel file loaded with {len(df)} rows and {len(df.columns)} columns.\n"
1056
+ result += f"Columns: {', '.join(map(str, df.columns))}\n\n"
1057
+ result += "Summary statistics:\n"
1058
+ result += str(df.describe())
1059
+ return json.dumps({
1060
+ "type": "tool_response",
1061
+ "tool_name": "analyze_excel_file",
1062
+ "result": result
1063
+ })
1064
+ except Exception as e:
1065
+ # Enhanced error reporting: print columns and head if possible
1066
+ try:
1067
+ df = pd.read_excel(file_path)
1068
+ columns = list(df.columns)
1069
+ head = df.head().to_dict('records')
1070
+ error_details = f"Error analyzing Excel file: {str(e)}\nColumns: {columns}\nHead: {head}"
1071
+ except Exception as inner_e:
1072
+ error_details = f"Error analyzing Excel file: {str(e)}\nAdditionally, failed to read columns/head: {str(inner_e)}"
1073
+ return json.dumps({
1074
+ "type": "tool_response",
1075
+ "tool_name": "analyze_excel_file",
1076
+ "error": error_details
1077
+ })
1078
+
1079
+ # ========== IMAGE ANALYSIS/GENERATION TOOLS ==========
1080
+ @tool
1081
+ def analyze_image(image_base64: str) -> str:
1082
+ """
1083
+ Analyze basic properties of an image (size, mode, color analysis, thumbnail preview) from a base64-encoded image string.
1084
+
1085
+ Args:
1086
+ image_base64 (str): The base64-encoded string of the image to analyze.
1087
+
1088
+ Returns:
1089
+ str: JSON string with analysis results including dimensions, mode, color_analysis, and thumbnail.
1090
+ """
1091
+ try:
1092
+ img = decode_image(image_base64)
1093
+ width, height = img.size
1094
+ mode = img.mode
1095
+ if mode in ("RGB", "RGBA"):
1096
+ arr = np.array(img)
1097
+ avg_colors = arr.mean(axis=(0, 1))
1098
+ dominant = ["Red", "Green", "Blue"][np.argmax(avg_colors[:3])]
1099
+ brightness = float(avg_colors[:3].mean())
1100
+ color_analysis = {
1101
+ "average_rgb": avg_colors.tolist(),
1102
+ "brightness": brightness,
1103
+ "dominant_color": dominant,
1104
+ }
1105
+ else:
1106
+ color_analysis = {"note": f"No color analysis for mode {mode}"}
1107
+ thumbnail = img.copy()
1108
+ thumbnail.thumbnail((100, 100))
1109
+ thumb_path = save_image(thumbnail, "thumbnails")
1110
+ thumbnail_base64 = encode_image(thumb_path)
1111
+ result = {
1112
+ "dimensions": (width, height),
1113
+ "mode": mode,
1114
+ "color_analysis": color_analysis,
1115
+ "thumbnail": thumbnail_base64,
1116
+ }
1117
+ return json.dumps({
1118
+ "type": "tool_response",
1119
+ "tool_name": "analyze_image",
1120
+ "result": result
1121
+ }, indent=2)
1122
+ except Exception as e:
1123
+ return json.dumps({
1124
+ "type": "tool_response",
1125
+ "tool_name": "analyze_image",
1126
+ "error": str(e)
1127
+ }, indent=2)
1128
+
+ @tool
+ def transform_image(image_base64: str, operation: str, params: Optional[Dict[str, Any]] = None) -> str:
+     """
+     Transform an image with one of the supported operations.
+
+     Args:
+         image_base64 (str): The base64-encoded string of the image to transform.
+         operation (str): One of "resize", "rotate", "flip", "blur", "sharpen", "brightness", or "contrast".
+         params (Dict[str, Any], optional): Parameters for the operation, e.g. width/height (resize),
+             angle (rotate), direction (flip), radius (blur), factor (brightness, contrast).
+
+     Returns:
+         str: JSON string with the transformed image as base64 or error message.
+     """
+     try:
+         img = decode_image(image_base64)
+         params = params or {}
+         if operation == "resize":
+             width = params.get("width", img.width)
+             height = params.get("height", img.height)
+             img = img.resize((width, height), Image.Resampling.LANCZOS)
+         elif operation == "rotate":
+             angle = params.get("angle", 0)
+             img = img.rotate(angle, expand=True)
+         elif operation == "flip":
+             direction = params.get("direction", "horizontal")
+             if direction == "horizontal":
+                 img = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
+             else:
+                 img = img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)
+         elif operation == "blur":
+             radius = params.get("radius", 2)
+             img = img.filter(ImageFilter.GaussianBlur(radius=radius))
+         elif operation == "sharpen":
+             img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
+         elif operation == "brightness":
+             factor = params.get("factor", 1.0)
+             enhancer = ImageEnhance.Brightness(img)
+             img = enhancer.enhance(factor)
+         elif operation == "contrast":
+             factor = params.get("factor", 1.0)
+             enhancer = ImageEnhance.Contrast(img)
+             img = enhancer.enhance(factor)
+         else:
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "transform_image",
+                 "error": f"Unsupported operation: {operation}"
+             }, indent=2)
+         result_path = save_image(img)
+         result_base64 = encode_image(result_path)
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "transform_image",
+             "transformed_image": result_base64
+         }, indent=2)
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "transform_image",
+             "error": str(e)
+         }, indent=2)
+
+ @tool
+ def draw_on_image(image_base64: str, drawing_type: str, params: Dict[str, Any]) -> str:
+     """
+     Draw text or simple shapes on an image.
+
+     Args:
+         image_base64 (str): The base64-encoded string of the image to draw on.
+         drawing_type (str): One of "text", "rectangle", "circle", or "line".
+         params (Dict[str, Any]): Parameters for the drawing operation.
+
+     Returns:
+         str: JSON string with the modified image as base64 or error message.
+     """
+     try:
+         img = decode_image(image_base64)
+         draw = ImageDraw.Draw(img)
+         if drawing_type == "text":
+             text = params.get("text", "")
+             position = params.get("position", (10, 10))
+             color = params.get("color", "black")
+             size = params.get("size", 20)
+             try:
+                 font = ImageFont.truetype("arial.ttf", size)
+             except OSError:
+                 # Font file not found or unreadable: fall back to the built-in font
+                 font = ImageFont.load_default()
+             draw.text(position, text, fill=color, font=font)
+         elif drawing_type == "rectangle":
+             coords = params.get("coords", [10, 10, 100, 100])
+             color = params.get("color", "red")
+             width = params.get("width", 2)
+             draw.rectangle(coords, outline=color, width=width)
+         elif drawing_type == "circle":
+             center = params.get("center", (50, 50))
+             radius = params.get("radius", 30)
+             color = params.get("color", "blue")
+             width = params.get("width", 2)
+             bbox = [center[0] - radius, center[1] - radius,
+                     center[0] + radius, center[1] + radius]
+             draw.ellipse(bbox, outline=color, width=width)
+         elif drawing_type == "line":
+             start = params.get("start", (10, 10))
+             end = params.get("end", (100, 100))
+             color = params.get("color", "green")
+             width = params.get("width", 2)
+             draw.line([start, end], fill=color, width=width)
+         else:
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "draw_on_image",
+                 "error": f"Unsupported drawing type: {drawing_type}"
+             }, indent=2)
+         result_path = save_image(img)
+         result_base64 = encode_image(result_path)
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "draw_on_image",
+             "modified_image": result_base64
+         }, indent=2)
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "draw_on_image",
+             "error": str(e)
+         }, indent=2)
+
+ @tool
+ def generate_simple_image(image_type: str, width: int = 500, height: int = 500,
+                           params: Optional[Dict[str, Any]] = None) -> str:
+     """
+     Generate simple images like gradients, solid colors, checkerboard, or noise patterns.
+
+     Args:
+         image_type (str): One of "solid", "gradient", "noise", or "checkerboard".
+         width (int): The width of the generated image.
+         height (int): The height of the generated image.
+         params (Dict[str, Any], optional): Additional parameters for image generation.
+
+     Returns:
+         str: JSON string with the generated image as base64 or error message.
+     """
+     try:
+         params = params or {}
+         if image_type == "solid":
+             color = params.get("color", (255, 255, 255))
+             img = Image.new("RGB", (width, height), color)
+         elif image_type == "gradient":
+             start_color = params.get("start_color", (255, 0, 0))
+             end_color = params.get("end_color", (0, 0, 255))
+             direction = params.get("direction", "horizontal")
+             img = Image.new("RGB", (width, height))
+             draw = ImageDraw.Draw(img)
+             if direction == "horizontal":
+                 for x in range(width):
+                     r = int(start_color[0] + (end_color[0] - start_color[0]) * x / width)
+                     g = int(start_color[1] + (end_color[1] - start_color[1]) * x / width)
+                     b = int(start_color[2] + (end_color[2] - start_color[2]) * x / width)
+                     draw.line([(x, 0), (x, height)], fill=(r, g, b))
+             else:
+                 for y in range(height):
+                     r = int(start_color[0] + (end_color[0] - start_color[0]) * y / height)
+                     g = int(start_color[1] + (end_color[1] - start_color[1]) * y / height)
+                     b = int(start_color[2] + (end_color[2] - start_color[2]) * y / height)
+                     draw.line([(0, y), (width, y)], fill=(r, g, b))
+         elif image_type == "noise":
+             noise_array = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
+             img = Image.fromarray(noise_array, "RGB")
+         elif image_type == "checkerboard":
+             square_size = params.get("square_size", 50)
+             color1 = params.get("color1", "white")
+             color2 = params.get("color2", "black")
+             img = Image.new("RGB", (width, height))
+             draw = ImageDraw.Draw(img)
+             for y in range(0, height, square_size):
+                 for x in range(0, width, square_size):
+                     color = color1 if ((x // square_size) + (y // square_size)) % 2 == 0 else color2
+                     # Fill each square in one call; ImageDraw resolves color names
+                     # such as "white" and clips squares that overrun the image edge
+                     draw.rectangle([x, y, x + square_size - 1, y + square_size - 1], fill=color)
+         else:
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "generate_simple_image",
+                 "error": f"Unsupported image_type {image_type}"
+             }, indent=2)
+         result_path = save_image(img)
+         result_base64 = encode_image(result_path)
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "generate_simple_image",
+             "generated_image": result_base64
+         }, indent=2)
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "generate_simple_image",
+             "error": str(e)
+         }, indent=2)
+
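The gradient branch of `generate_simple_image` interpolates each color channel linearly across the columns (or rows). A minimal sketch of that arithmetic in isolation follows; the helper name `lerp_color` is an assumption made for illustration and is not part of the commit.

```python
from typing import Tuple

Color = Tuple[int, int, int]


def lerp_color(start: Color, end: Color, step: int, total: int) -> Color:
    """Blend start toward end; step runs from 0 to total - 1, as in the gradient loops."""
    # Same per-channel formula as the tool: start + (end - start) * step / total
    return tuple(int(s + (e - s) * step / total) for s, e in zip(start, end))


print(lerp_color((255, 0, 0), (0, 0, 255), 0, 500))    # first column
print(lerp_color((255, 0, 0), (0, 0, 255), 250, 500))  # middle column
print(lerp_color((255, 0, 0), (0, 0, 255), 499, 500))  # last column
```

Because the step is divided by `total` rather than `total - 1`, the last column stops just short of the end color. Dividing by `total - 1` would reach it exactly, but would need a guard for a width or height of 1.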
+ @tool
+ def combine_images(images_base64: List[str], operation: str,
+                    params: Optional[Dict[str, Any]] = None) -> str:
+     """
+     Combine multiple images side by side, stacked vertically, or overlaid.
+
+     Args:
+         images_base64 (List[str]): List of base64-encoded image strings (at least 2).
+         operation (str): One of "horizontal", "vertical", "overlay", or "stack".
+         params (Dict[str, Any], optional): Parameters for the combination; "stack" reads
+             "direction" ("horizontal" by default, any other value stacks vertically).
+
+     Returns:
+         str: JSON string with the combined image as base64 or error message.
+     """
+     try:
+         if len(images_base64) < 2:
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "combine_images",
+                 "error": "At least 2 images required for combination"
+             }, indent=2)
+         images = [decode_image(b64) for b64 in images_base64]
+         params = params or {}
+         if operation in ("horizontal", "vertical", "stack"):
+             # "stack" takes its direction from params; the other two name it directly
+             direction = params.get("direction", "horizontal") if operation == "stack" else operation
+             if direction == "horizontal":
+                 # Combine images side by side
+                 total_width = sum(img.width for img in images)
+                 max_height = max(img.height for img in images)
+                 result = Image.new("RGB", (total_width, max_height))
+                 x_offset = 0
+                 for img in images:
+                     result.paste(img, (x_offset, 0))
+                     x_offset += img.width
+             else:
+                 # Stack images vertically
+                 max_width = max(img.width for img in images)
+                 total_height = sum(img.height for img in images)
+                 result = Image.new("RGB", (max_width, total_height))
+                 y_offset = 0
+                 for img in images:
+                     result.paste(img, (0, y_offset))
+                     y_offset += img.height
+         elif operation == "overlay":
+             # Overlay images on top of each other
+             base_img = images[0]
+             for overlay_img in images[1:]:
+                 if overlay_img.size != base_img.size:
+                     overlay_img = overlay_img.resize(base_img.size, Image.Resampling.LANCZOS)
+                 base_img = Image.alpha_composite(base_img.convert("RGBA"), overlay_img.convert("RGBA"))
+             result = base_img.convert("RGB")
+         else:
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "combine_images",
+                 "error": f"Unsupported combination operation: {operation}"
+             }, indent=2)
+         result_path = save_image(result)
+         result_base64 = encode_image(result_path)
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "combine_images",
+             "combined_image": result_base64
+         }, indent=2)
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "combine_images",
+             "error": str(e)
+         }, indent=2)
+
+ # ========== VIDEO/AUDIO UNDERSTANDING TOOLS ==========
+ @tool
+ def understand_video(youtube_url: str, prompt: str, system_prompt: Optional[str] = None) -> str:
+     """
+     Analyze a YouTube video using Google Gemini's video understanding capabilities.
+
+     This tool can understand video content, extract information, and answer questions
+     about what happens in the video.
+     It uses the Gemini API and requires the GEMINI_KEY environment variable to be set.
+
+     Args:
+         youtube_url (str): The URL of the YouTube video to analyze.
+         prompt (str): A question or request regarding the video content.
+         system_prompt (str, optional): System prompt for formatting guidance.
+
+     Returns:
+         str: Analysis of the video content based on the prompt, or error message.
+     """
+     try:
+         client = _get_gemini_client()
+
+         # Create enhanced prompt with system prompt if provided
+         if system_prompt:
+             enhanced_prompt = f"{system_prompt}\n\nAnalyze the video at {youtube_url} and answer the following question:\n{prompt}\n\nProvide your answer in the required FINAL ANSWER format."
+         else:
+             enhanced_prompt = prompt
+
+         video_description = client.models.generate_content(
+             model="gemini-2.5-pro",
+             contents=types.Content(
+                 parts=[
+                     types.Part(file_data=types.FileData(file_uri=youtube_url)),
+                     types.Part(text=enhanced_prompt)
+                 ]
+             )
+         )
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "understand_video",
+             "result": video_description.text
+         })
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "understand_video",
+             "error": f"Error understanding video: {str(e)}"
+         })
+
+ @tool
+ def understand_audio(file_path: str, prompt: str, system_prompt: Optional[str] = None) -> str:
+     """
+     Analyze an audio file using Google Gemini's audio understanding capabilities.
+
+     This tool can transcribe audio, understand spoken content, and answer questions
+     about the audio content.
+     It uses the Gemini API and requires the GEMINI_KEY environment variable to be set.
+     The audio file is uploaded to Gemini and then analyzed with the provided prompt.
+
+     Args:
+         file_path (str): The path to the local audio file to analyze, or base64 encoded audio data.
+         prompt (str): A question or request regarding the audio content.
+         system_prompt (str, optional): System prompt for formatting guidance.
+
+     Returns:
+         str: Analysis of the audio content based on the prompt, or error message.
+     """
+     try:
+         client = _get_gemini_client()
+
+         # Check if file_path is base64 data or actual file path.
+         # A leading "/" is not a reliable test: base64 text can start with "/" too.
+         if os.path.isfile(file_path):
+             # It's a file path
+             mp3_file = client.files.upload(file=file_path)
+         else:
+             # Assume it's base64 data
+             try:
+                 # Decode base64 (ignoring line breaks) and create temporary file
+                 audio_data = base64.b64decode("".join(file_path.split()), validate=True)
+                 with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as temp_file:
+                     temp_file.write(audio_data)
+                     temp_file_path = temp_file.name
+
+                 try:
+                     mp3_file = client.files.upload(file=temp_file_path)
+                 finally:
+                     # Clean up temporary file
+                     os.unlink(temp_file_path)
+             except Exception as decode_error:
+                 return json.dumps({
+                     "type": "tool_response",
+                     "tool_name": "understand_audio",
+                     "error": f"Error processing audio data: {str(decode_error)}. Expected base64 encoded audio data or valid file path."
+                 })
+
+         # Create enhanced prompt with system prompt if provided
+         if system_prompt:
+             enhanced_prompt = f"{system_prompt}\n\nAnalyze the audio file and answer the following question:\n{prompt}\n\nProvide your answer in the required FINAL ANSWER format."
+         else:
+             enhanced_prompt = prompt
+
+         contents = [enhanced_prompt, mp3_file]
+         try:
+             response = client.models.generate_content(
+                 model="gemini-2.5-pro",
+                 contents=contents
+             )
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "understand_audio",
+                 "result": response.text
+             })
+         except Exception as e:
+             return json.dumps({
+                 "type": "tool_response",
+                 "tool_name": "understand_audio",
+                 "error": f"Error in audio understanding request: {str(e)}"
+             })
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "understand_audio",
+             "error": f"Error understanding audio: {str(e)}"
+         })
+
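`understand_audio` accepts either a local file path or base64-encoded audio in the same argument, so the two have to be told apart. A minimal sketch of one way to classify the input follows, assuming a hypothetical helper `classify_audio_input` that is not part of the commit. It checks the file system first and only then attempts a strict base64 decode, because base64 text can itself begin with `/`.

```python
import base64
import binascii
import os


def classify_audio_input(value: str) -> str:
    """Hypothetical helper: return "path", "base64", or "invalid" for the argument."""
    if os.path.isfile(value):
        return "path"
    try:
        # Drop whitespace so line-wrapped base64 validates; reject any other stray characters
        base64.b64decode("".join(value.split()), validate=True)
        return "base64"
    except (binascii.Error, ValueError):
        return "invalid"


# Base64 of bytes that begin like an MP3 frame header; note the leading slashes
encoded = base64.b64encode(b"\xff\xfb\x90\x00").decode()
print(encoded, classify_audio_input(encoded))
print(classify_audio_input("/no/such/file.mp3"))
```

Strict validation means that a mistyped path is reported as invalid rather than being silently decoded into garbage bytes.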
+ # ========== CHESS TOOLS ==========
+ def _convert_chess_move_internal(piece_placement: str, move: str) -> str:
+     """
+     Internal function to convert chess moves from coordinate notation to algebraic notation.
+     Uses Google Gemini to convert chess moves between different notations.
+     Coordinate notation uses square names (e.g., "e2e4"), while algebraic notation
+     uses piece symbols and square names (e.g., "e4", "Nf3", "O-O").
+     The function constructs a prompt for Gemini and expects
+     only the algebraic notation as output, with no extra commentary.
+     """
+     prompt = f"""
+     Convert this chess move from coordinate notation to algebraic notation.
+
+     Piece placement: {piece_placement}
+     Move in coordinate notation: {move}
+
+     Return only the algebraic notation (e.g., "e4", "Nf3", "O-O", "Qxd5", etc.)
+     """
+     return json.dumps({
+         "type": "tool_response",
+         "tool_name": "convert_chess_move",
+         "result": _get_gemini_response(prompt, "Chess move conversion", "gemini-2.5-pro")
+     })
+
+ @tool
+ def convert_chess_move(piece_placement: str, move: str) -> str:
+     """
+     Convert a chess move from coordinate notation to algebraic notation using Google Gemini.
+
+     Coordinate notation uses square names (e.g., "e2e4"), while algebraic notation
+     uses piece symbols and square names (e.g., "e4", "Nf3", "O-O").
+     The function constructs a prompt for Gemini and expects
+     only the algebraic notation as output, with no extra commentary.
+
+     Args:
+         piece_placement (str): The chess piece placement in plain text or FEN format.
+         move (str): The move in coordinate notation (e.g., "e2e4").
+
+     Returns:
+         str: JSON string with the move in algebraic notation, or error message.
+     """
+     move_message = (
+         "Convert this chess move from coordinate notation to algebraic "
+         f"notation: {move}. Use the following piece placement: {piece_placement}. "
+         "Do not provide any additional thinking or commentary in the response, "
+         "just the algebraic notation only."
+     )
+     return json.dumps({
+         "type": "tool_response",
+         "tool_name": "convert_chess_move",
+         "result": _get_gemini_response(move_message, "Chess move conversion", "gemini-2.5-pro")
+     })
+
+ # --- Lichess Cloud Evaluation API Helper ---
+ def _get_lichess_cloud_eval_candidates(fen: str, depth: int = 15) -> list:
+     """
+     Query the Lichess Cloud Evaluation API for candidate moves.
+     Returns a list of dicts, each with source, move, full_line, cp, mate, depth, multipv, and explanation.
+     On failure, the list holds a single dict with move set to None and the reason in explanation.
+     """
+     candidates = []
+     chess_eval_url = os.environ.get("CHESS_EVAL_URL", "https://lichess.org/api/cloud-eval")
+     url = f"{chess_eval_url}?fen={urllib.parse.quote(fen)}&depth={depth}"
+     headers = {}
+     lichess_key = os.environ.get("LICHESS_KEY")
+     if lichess_key:
+         headers["Authorization"] = f"Bearer {lichess_key}"
+     try:
+         response = requests.get(url, timeout=15, headers=headers)
+         if response.status_code == 200:
+             data = response.json()
+             if 'pvs' in data and len(data['pvs']) > 0:
+                 for index, pv in enumerate(data['pvs'], start=1):
+                     moves_string = pv.get('moves', '')
+                     if moves_string:
+                         first_move = moves_string.split()[0]
+                         candidates.append({
+                             "source": "lichess_api",
+                             "move": first_move,
+                             "full_line": moves_string,
+                             "cp": pv.get("cp"),
+                             "mate": pv.get("mate"),
+                             # Fall back to the response-level depth and the PV's position
+                             # in the list when the PV itself does not carry these fields
+                             "depth": pv.get("depth", data.get("depth")),
+                             "multipv": pv.get("multipv", index),
+                             "explanation": "Move suggested by Lichess Cloud Evaluation API (principal variation)."
+                         })
+                     else:
+                         candidates.append({
+                             "source": "lichess_api",
+                             "move": None,
+                             "explanation": "Lichess API returned a PV with no moves."
+                         })
+             else:
+                 candidates.append({
+                     "source": "lichess_api",
+                     "move": None,
+                     "explanation": "Lichess API returned no pvs data in response."
+                 })
+         else:
+             candidates.append({
+                 "source": "lichess_api",
+                 "move": None,
+                 "explanation": f"Lichess API error: HTTP {response.status_code}"
+             })
+     except Exception as e:
+         candidates.append({
+             "source": "lichess_api",
+             "move": None,
+             "explanation": f"Lichess API exception: {str(e)}"
+         })
+     return candidates
+
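The Lichess helper reduces each principal variation (PV) in the response to its first move plus the evaluation fields. A minimal sketch of that extraction step on its own follows, with no network access. The function name `first_moves` and the sample payload are assumptions made up for illustration; the payload is not a recorded API response.

```python
from typing import Any, Dict, List


def first_moves(cloud_eval: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract the first move of each principal variation from a cloud-eval-style payload."""
    candidates = []
    for pv in cloud_eval.get("pvs", []):
        moves = pv.get("moves", "").split()
        if moves:
            # A PV carries either a centipawn score ("cp") or a mate distance ("mate")
            candidates.append({"move": moves[0], "cp": pv.get("cp"), "mate": pv.get("mate")})
    return candidates


# Illustrative payload only
sample = {"pvs": [{"moves": "e2e4 e7e5 g1f3", "cp": 30}, {"moves": "d2d4 d7d5", "cp": 25}]}
for candidate in first_moves(sample):
    print(candidate)
```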
+ # --- Stockfish Online API Helper ---
+ def _get_stockfish_online_candidate(fen: str, depth: int = 15, _retry: int = 0) -> dict:
+     """
+     Query the Stockfish Online API for the best move for a given FEN.
+     Returns a dict with source, move, full_line, evaluation (cp), mate, and explanation.
+     Retries once, after a 30-second wait, when the exception message mentions a timeout
+     or port 443; otherwise fails gracefully.
+     """
+     api_url = "https://stockfish.online/api/s/v2.php"
+     params = {'fen': fen, 'depth': depth}
+     try:
+         response = requests.get(api_url, params=params, timeout=15)
+         if response.status_code == 200:
+             data = response.json()
+             if data.get('success'):
+                 bestmove = data.get('bestmove', '')
+                 move = None
+                 if bestmove:
+                     # Format: "bestmove b7b6 ponder f3e5" -> extract "b7b6"
+                     move_parts = bestmove.split()
+                     if len(move_parts) >= 2 and move_parts[0] == 'bestmove':
+                         move = move_parts[1]
+                 # Extract useful fields
+                 return {
+                     "source": "stockfish_online_api",
+                     "move": move,
+                     "full_line": data.get("continuation"),
+                     "cp": data.get("evaluation"),
+                     "mate": data.get("mate"),
+                     "explanation": "Move suggested by Stockfish Online API v2." if move else f"Stockfish Online API error: {data}"
+                 }
+             else:
+                 return {
+                     "source": "stockfish_online_api",
+                     "move": None,
+                     "explanation": f"Stockfish API failed: {data.get('data', 'Unknown error')}"
+                 }
+         else:
+             return {
+                 "source": "stockfish_online_api",
+                 "move": None,
+                 "explanation": f"Stockfish API HTTP error: {response.status_code}"
+             }
+     except Exception as e:
+         # Simple retry on timeout/443 error, then fail gracefully
+         if _retry < 1 and ("443" in str(e) or "timed out" in str(e).lower() or "timeout" in str(e).lower()):
+             time.sleep(30)
+             return _get_stockfish_online_candidate(fen, depth, _retry=_retry+1)
+         return {
+             "source": "stockfish_online_api",
+             "move": None,
+             "explanation": f"Stockfish API exception: {str(e)}"
+         }
+
+ def _get_python_chess_stockfish_candidate(fen: str, depth: int = 15) -> dict:
+     """
+     Try to get a move using local python-chess Stockfish engine. If not available, fallback to Stockfish Online API.
+     The local engine searches for a fixed 2 seconds; depth applies only to the online fallback.
+     Returns a dict with source, move, and explanation.
+     """
+     try:
+         if 'CHESS_AVAILABLE' in globals() and CHESS_AVAILABLE:
+             import chess
+             import chess.engine
+             board = chess.Board(fen)
+             try:
+                 # The context manager shuts the engine down even if play() raises
+                 with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
+                     result = engine.play(board, chess.engine.Limit(time=2.0))
+                 if result.move:
+                     # uci() keeps the promotion piece (e.g. "e7e8q")
+                     move = result.move.uci()
+                     return {
+                         "source": "python_chess_stockfish",
+                         "move": move,
+                         "explanation": "Move suggested by local Stockfish engine via python-chess."
+                     }
+                 else:
+                     return {
+                         "source": "python_chess_stockfish",
+                         "move": None,
+                         "explanation": "python-chess Stockfish engine returned no move."
+                     }
+             except FileNotFoundError:
+                 # Fallback to Stockfish Online API if local binary is missing
+                 online = _get_stockfish_online_candidate(fen, depth)
+                 online["source"] = "python_chess_stockfish (online fallback)"
+                 online["explanation"] = "Local Stockfish not found, used Stockfish Online API as fallback. " + online.get("explanation", "")
+                 return online
+             except Exception as e:
+                 return {
+                     "source": "python_chess_stockfish",
+                     "move": None,
+                     "explanation": f"python-chess Stockfish engine exception: {str(e)}"
+                 }
+         else:
+             return {
+                 "source": "python_chess_stockfish",
+                 "move": None,
+                 "explanation": "python-chess or Stockfish engine not available."
+             }
+     except Exception as e:
+         return {
+             "source": "python_chess_stockfish",
+             "move": None,
+             "explanation": f"python-chess Stockfish engine import/availability exception: {str(e)}"
+         }
+
+ # --- Main Internal Move Candidate Function ---
+ def _get_best_chess_move_internal(fen: str) -> dict:
+     """
+     Internal function to get the best chess move for a given FEN position.
+     Tries multiple sources (Lichess, Stockfish Online, python-chess, heuristics) and returns all candidates with explanations for LLM selection.
+     Returns a Python dict, not a JSON string.
+     """
+     move_candidates = []
+     # 1. Lichess API (all PVs)
+     move_candidates.extend(_get_lichess_cloud_eval_candidates(fen))
+     # 2. Stockfish Online API (single best move)
+     move_candidates.append(_get_stockfish_online_candidate(fen))
+     # 3. python-chess local engine, with online fallback
+     move_candidates.append(_get_python_chess_stockfish_candidate(fen))
+     # 4. _get_best_move_simple_heuristic
+     try:
+         heuristic_move = _get_best_move_simple_heuristic(fen)
+         move = None
+         # A coordinate move is 4 characters, or 5 with a promotion piece
+         if isinstance(heuristic_move, str) and len(heuristic_move) in [4, 5]:
+             move = heuristic_move
+         move_candidates.append({
+             "source": "simple_heuristic",
+             "move": move,
+             "explanation": "Move suggested by simple FEN-based heuristic." if move else f"Heuristic error: {heuristic_move}"
+         })
+     except Exception as e:
+         move_candidates.append({
+             "source": "simple_heuristic",
+             "move": None,
+             "explanation": f"Simple heuristic exception: {str(e)}"
+         })
+     # 5. _evaluate_moves_simple
+     try:
+         if 'CHESS_AVAILABLE' in globals() and CHESS_AVAILABLE:
+             import chess
+             board = chess.Board(fen)
+             legal_moves = list(board.legal_moves)
+             best_move = _evaluate_moves_simple(board, legal_moves)
+             move = None
+             if best_move:
+                 # uci() keeps the promotion piece (e.g. "e7e8q")
+                 move = best_move.uci()
+             move_candidates.append({
+                 "source": "evaluate_moves_simple",
+                 "move": move,
+                 "explanation": "Move suggested by simple move evaluation (captures, checks, center, development)." if move else "No move found by simple evaluation."
+             })
+     except Exception as e:
+         move_candidates.append({
+             "source": "evaluate_moves_simple",
+             "move": None,
+             "explanation": f"Simple evaluation exception: {str(e)}"
+         })
+     return {
+         "fen": fen,
+         "candidates": move_candidates
+     }
+
+ def _get_best_move_fallback(fen: str) -> str:
+     """
+     Fallback function to get best move when Lichess API returns 404.
+     Uses alternative APIs, local chess engine, and intelligent heuristics.
+     """
+     try:
+         # Try alternative chess API (Stockfish Online API v2)
+         try:
+             stockfish_result = _try_stockfish_online_api_v2(fen)
+             # Failures come back as a JSON object string, so accept only plain move strings
+             if stockfish_result and not stockfish_result.startswith(("Error", "{")):
+                 return stockfish_result
+         except Exception:
+             pass
+
+         # Try using Stockfish via python-chess if available
+         try:
+             if CHESS_AVAILABLE:
+                 board = chess.Board(fen)
+
+                 # Use Stockfish if available
+                 try:
+                     with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
+                         result = engine.play(board, chess.engine.Limit(time=2.0))
+                     if result.move:
+                         return result.move.uci()
+                 except Exception:
+                     pass
+
+                 # Fallback: use legal moves and simple evaluation
+                 legal_moves = list(board.legal_moves)
+                 if legal_moves:
+                     # Try to find a good move using simple evaluation
+                     best_move = _evaluate_moves_simple(board, legal_moves)
+                     if best_move:
+                         return best_move.uci()
+                     else:
+                         # Return first legal move as fallback
+                         return legal_moves[0].uci()
+                 else:
+                     return json.dumps({
+                         "type": "tool_response",
+                         "tool_name": "get_best_chess_move",
+                         "error": "Error: No legal moves available"
+                     })
+             else:
+                 # python-chess not available, use simple heuristic
+                 return _get_best_move_simple_heuristic(fen)
+
+         except ImportError:
+             # python-chess not available, use simple heuristic
+             return _get_best_move_simple_heuristic(fen)
+
+     except Exception as e:
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "get_best_chess_move",
+             "error": f"Error in fallback chess evaluation: {str(e)}"
+         })
+
1872
+ def _try_stockfish_online_api_v2(fen: str, depth: int = 15) -> str:
1873
+ """
1874
+ Try to get best move using Stockfish Online API v2 (https://stockfish.online/api/s/v2.php).
1875
+ Based on the official documentation. Adds debug output for troubleshooting.
1876
+ """
1877
+ try:
1878
+ # Use Stockfish Online API v2
1879
+ api_url = "https://stockfish.online/api/s/v2.php"
1880
+ params = {
1881
+ 'fen': fen,
1882
+ 'depth': depth
1883
+ }
1884
+ print(f"[DEBUG] Requesting Stockfish API: {api_url}")
1885
+ print(f"[DEBUG] Params: {params}")
1886
+ response = requests.get(api_url, params=params, timeout=15)
1887
+ print(f"[DEBUG] Status code: {response.status_code}")
1888
+ print(f"[DEBUG] Response text: {response.text}")
1889
+ if response.status_code == 200:
1890
+ data = response.json()
1891
+ # Check if request was successful
1892
+ if data.get('success') == True:
1893
+ bestmove = data.get('bestmove', '')
1894
+ if bestmove:
1895
+ # Extract the actual move from the bestmove string
1896
+ # Format: "bestmove b7b6 ponder f3e5" -> extract "b7b6"
1897
+ move_parts = bestmove.split()
1898
+ if len(move_parts) >= 2 and move_parts[0] == 'bestmove':
1899
+ return move_parts[1] # Return the actual move
1900
+ else:
1901
+ return bestmove # Return full string if parsing fails
1902
+ else:
1903
+ return json.dumps({
1904
+ "type": "tool_response",
1905
+ "tool_name": "get_best_chess_move",
1906
+ "error": "Error: No bestmove in Stockfish API response",
1907
+ "api_response": data
1908
+ })
1909
+ else:
1910
+ error_msg = data.get('data', 'Unknown error')
1911
+ return json.dumps({
1912
+ "type": "tool_response",
1913
+ "tool_name": "get_best_chess_move",
1914
+ "error": f"Error: Stockfish API failed - {error_msg}",
1915
+ "api_response": data
1916
+ })
1917
+ return json.dumps({
1918
+ "type": "tool_response",
1919
+ "tool_name": "get_best_chess_move",
1920
+ "error": f"Error: Stockfish API returned status {response.status_code}",
1921
+ "response_text": response.text
1922
+ })
1923
+ except Exception as e:
1924
+ return json.dumps({
1925
+ "type": "tool_response",
1926
+ "tool_name": "get_best_chess_move",
1927
+ "error": f"Error accessing Stockfish Online API v2: {str(e)}"
1928
+ })
1929
+
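The `bestmove` parsing step above can be isolated into a small helper. This is a minimal sketch rather than the project's implementation: `parse_bestmove` and `UCI_MOVE` are hypothetical names, and the sample string is the one quoted in the code comment. As an extra assumption, it rejects anything that does not look like a coordinate move instead of returning the raw string.

```python
import re
from typing import Optional

# Coordinate (UCI) move: from-square, to-square, optional promotion piece
UCI_MOVE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")


def parse_bestmove(bestmove: str) -> Optional[str]:
    """Extract the move from a string such as 'bestmove b7b6 ponder f3e5'."""
    parts = bestmove.split()
    if len(parts) >= 2 and parts[0] == "bestmove" and UCI_MOVE.match(parts[1]):
        return parts[1]
    # Anything else is treated as unparseable in this sketch
    return None


print(parse_bestmove("bestmove b7b6 ponder f3e5"))
```

Returning `None` on failure lets the caller decide whether to fall back to another move source.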
+ def _evaluate_moves_simple(board, legal_moves):
+     """
+     Simple move evaluation for when no chess engine is available.
+     Scores captures, checks, central pawn moves and moves from the back ranks.
+     Returns the highest-scoring move, or None if there are no moves or evaluation fails.
+     """
+     # Piece values: Q=9, R=5, B=3, N=3, P=1
+     piece_values = {'Q': 9, 'R': 5, 'B': 3, 'N': 3, 'P': 1}
+     center_files = ['d', 'e']
+     try:
+         best_move = None
+         best_score = float('-inf')
+
+         for move in legal_moves:
+             score = 0
+
+             # Check if move captures a piece
+             if board.is_capture(move):
+                 captured_piece = board.piece_at(move.to_square)
+                 if captured_piece:
+                     score += piece_values.get(captured_piece.symbol().upper(), 1)
+                 else:
+                     # En passant: the captured pawn is not on the target square
+                     score += piece_values['P']
+
+             # Check if move gives check
+             board.push(move)
+             if board.is_check():
+                 score += 2
+             board.pop()
+
+             # Prefer center moves for pawns
+             moving_piece = board.piece_at(move.from_square)
+             if moving_piece and moving_piece.symbol().upper() == 'P':
+                 if chr(ord('a') + move.to_square % 8) in center_files:
+                     score += 1
+
+             # Prefer developing moves (moving pieces from back rank)
+             if move.from_square // 8 in [0, 7]:  # Back ranks
+                 score += 0.5
+
+             if score > best_score:
+                 best_score = score
+                 best_move = move
+
+         return best_move
+
+     except Exception:
+         return None
1973
+
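The scoring rules used by `_evaluate_moves_simple` can be sketched without python-chess, which makes them easy to unit-test. This is an illustrative sketch only: `score_move` and its parameters are hypothetical, and the caller is assumed to have already worked out what the move captures and whether it gives check.

```python
from typing import Optional

PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def score_move(captured: Optional[str], gives_check: bool,
               moving: str, to_file: str, from_rank: int) -> float:
    """Score one move; from_rank is a 0-based rank index (0 or 7 are back ranks)."""
    score = 0.0
    if captured is not None:
        # Unknown symbols fall back to 1, as in the function above
        score += PIECE_VALUES.get(captured.upper(), 1)
    if gives_check:
        score += 2
    if moving.upper() == "P" and to_file in ("d", "e"):
        score += 1  # central pawn move
    if from_rank in (0, 7):
        score += 0.5  # piece leaves a back rank
    return score


print(score_move("q", True, "N", "d", 0))
```

Keeping the weights in one pure function means they can be tuned and tested without constructing a board.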
1974
+ def _get_best_move_simple_heuristic(fen: str) -> str:
1975
+ """
1976
+ Simple heuristic-based move selection when no chess engine is available.
1977
+ The chosen move is a placeholder for the highest-value piece and is not checked for legality.
1978
+ """
1979
+ try:
1980
+ # Parse FEN to understand the position
1981
+ parts = fen.split()
1982
+ if len(parts) < 1:
1983
+ return json.dumps({
1984
+ "type": "tool_response",
1985
+ "tool_name": "get_best_chess_move",
1986
+ "error": "Error: Invalid FEN format"
1987
+ })
1988
+
1989
+ board_part = parts[0]
1990
+ side_to_move = parts[1] if len(parts) > 1 else 'w'
1991
+ ranks = board_part.split('/')
1992
+
1993
+ # Convert FEN to a more analyzable format
1994
+ board = []
1995
+ for rank in ranks:
1996
+ row = []
1997
+ for char in rank:
1998
+ if char.isdigit():
1999
+ row.extend([''] * int(char))
2000
+ else:
2001
+ row.append(char)
2002
+ board.append(row)
2003
+
2004
+ # Find all pieces for the side to move
2005
+ pieces = []
2006
+ for rank_idx, rank in enumerate(board):
2007
+ for file_idx, piece in enumerate(rank):
2008
+ if piece:
2009
+ # Determine if piece belongs to side to move
2010
+ is_white_piece = piece.isupper()
2011
+ is_black_piece = piece.islower()
2012
+
2013
+ if (side_to_move == 'w' and is_white_piece) or (side_to_move == 'b' and is_black_piece):
2014
+ pieces.append({
2015
+ 'piece': piece.lower(),
2016
+ 'rank': rank_idx,
2017
+ 'file': file_idx,
2018
+ 'square': chr(ord('a') + file_idx) + str(8 - rank_idx)
2019
+ })
2020
+
2021
+ # Simple move selection based on piece values and position
2022
+ # Priority: Queen > Rook > Bishop > Knight > Pawn
2023
+ piece_values = {'q': 9, 'r': 5, 'b': 3, 'n': 3, 'p': 1}
2024
+
2025
+ # Sort pieces by value (highest first)
2026
+ pieces.sort(key=lambda p: piece_values.get(p['piece'], 0), reverse=True)
2027
+
2028
+ # For now, return a move from the highest value piece
2029
+ # This is a simplified approach - in reality you'd want to analyze legal moves
2030
+ if pieces:
2031
+ piece = pieces[0]
2032
+ # Create a simple move (this is just a placeholder)
2033
+ # In a real implementation, you'd generate legal moves for this piece
2034
+ from_square = piece['square']
2035
+
2036
+ # Simple heuristic: try to move towards center or capture
2037
+ if piece['piece'] == 'p': # Pawn
2038
+ # Move pawn forward
2039
+ if side_to_move == 'w':
2040
+ to_rank = piece['rank'] - 1
2041
+ else:
2042
+ to_rank = piece['rank'] + 1
2043
+
2044
+ if 0 <= to_rank < 8:
2045
+ to_square = chr(ord('a') + piece['file']) + str(8 - to_rank)
2046
+ return from_square + to_square
2047
+
2048
+ elif piece['piece'] == 'q': # Queen
2049
+ # Try to move queen to center or capture
2050
+ center_squares = ['d4', 'e4', 'd5', 'e5']
2051
+ for center in center_squares:
2052
+ if center != from_square:
2053
+ return from_square + center
2054
+
2055
+ elif piece['piece'] == 'r': # Rook
2056
+ # Try to move rook to open file or rank
2057
+ return from_square + 'd' + str(8 - piece['rank'])
2058
+
2059
+ elif piece['piece'] == 'b': # Bishop
2060
+ # Try to move bishop to long diagonal
2061
+ return from_square + 'd4'
2062
+
2063
+ elif piece['piece'] == 'n': # Knight
2064
+ # Try to move knight towards center
2065
+ return from_square + 'd4'
2066
+
2067
+ elif piece['piece'] == 'k': # King
2068
+ # Try to castle or move king to safety
2069
+ return from_square + 'g1' if side_to_move == 'w' else from_square + 'g8'
2070
+
2071
+ # Fallback: return a basic move
2072
+ return json.dumps({
2073
+ "type": "tool_response",
2074
+ "tool_name": "get_best_chess_move",
2075
+ "result": "e2e4" if side_to_move == 'w' else "e7e5"
2076
+ })
2077
+
2078
+ except Exception as e:
2079
+ return json.dumps({
2080
+ "type": "tool_response",
2081
+ "tool_name": "get_best_chess_move",
2082
+ "error": f"Error in simple heuristic: {str(e)}"
2083
+ })
2084
+
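The FEN expansion and piece collection steps in `_get_best_move_simple_heuristic` can be shown in isolation. The sketch below assumes a non-empty FEN string; `expand_placement` and `pieces_for_side` are hypothetical helper names, not functions from this module.

```python
from typing import Dict, List


def expand_placement(fen: str) -> List[List[str]]:
    """Turn the placement field into 8 rows, using '' for empty squares."""
    board = []
    for rank in fen.split()[0].split("/"):
        row: List[str] = []
        for char in rank:
            if char.isdigit():
                row.extend([""] * int(char))  # a digit means that many empty squares
            else:
                row.append(char)
        board.append(row)
    return board


def pieces_for_side(fen: str) -> List[Dict[str, str]]:
    """List the pieces of the side to move with their square names."""
    parts = fen.split()
    side = parts[1] if len(parts) > 1 else "w"
    wanted_upper = side == "w"  # any other value is treated as Black in this sketch
    found = []
    for rank_idx, row in enumerate(expand_placement(fen)):
        for file_idx, piece in enumerate(row):
            if piece and piece.isupper() == wanted_upper:
                square = chr(ord("a") + file_idx) + str(8 - rank_idx)
                found.append({"piece": piece.lower(), "square": square})
    return found


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(len(pieces_for_side(start)))
```

Row 0 of the expanded board is rank 8, which is why the square name uses `8 - rank_idx`.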
2085
+ # ========== FEN HELPER FUNCTIONS ==========
2086
+
+ @tool
+ def get_best_chess_move(fen: str, original_input: str = None) -> str:
+     """
+     Get the best chess move candidates in coordinate notation based on a FEN representation using multiple chess evaluation sources.
+     The result is a structured object containing:
+     - The FEN string used for evaluation
+     - The original input (if provided)
+     - A list of candidate moves, each with its source and explanation
+     The LLM should analyze the candidates and explanations to decide which move is best for the context.
+     The FEN (Forsyth-Edwards Notation) describes the current chess position.
+     E.g. rn1q1rk1/pp2b1pp/2p2n2/3p1pB1/3P4/1QP2N2/PP1N1PPP/R4RK1 b - - 1 11
+     This tool tries several candidate sources (Lichess cloud eval, Stockfish Online API, local python-chess Stockfish, simple heuristics).
+
+     Args:
+         fen (str): The chess position in FEN (Forsyth-Edwards Notation) format.
+         original_input (str, optional): The original chess problem or input details.
+
+     Returns:
+         str: JSON string with all move candidates and their explanations, for LLM reasoning.
+     """
+     result = _get_best_chess_move_internal(fen)
+     if not isinstance(result, dict):
+         # The internal helper did not return a dict; report it instead of raising
+         return json.dumps({
+             "type": "tool_response",
+             "tool_name": "get_best_chess_move",
+             "error": f"Error: Unexpected result from move evaluation: {result}"
+         })
+     # Attach original_input if provided
+     result["original_input"] = original_input
+     return json.dumps({
+         "type": "tool_response",
+         "tool_name": "get_best_chess_move",
+         "fen": result.get("fen"),
+         "original_input": result.get("original_input"),
+         "candidates": result.get("candidates", [])
+     })
2118
+
+ @tool
+ def solve_chess_position(image_path: str, player_turn: str, question: str = "") -> str:
+     """
+     Solve a chess position by analyzing the board image and finding the best move.
+     This tool returns a structured object containing:
+     - The extracted FEN (with explanation)
+     - The original input details (image path, player turn, question)
+     - A list of candidate moves (with explanations)
+     The LLM should analyze the candidates and explanations to decide which move is best for the context.
+
+     Args:
+         image_path (str): The path to the chess board image file or base64-encoded image data.
+         player_turn (str): The player with the next turn ("black" or "white").
+         question (str): Optional question about the position (e.g., "guarantees a win").
+
+     Returns:
+         str: JSON string with all details and move candidates for LLM reasoning.
+     """
+     # Step 1: Get FEN from image
+     fen_explanation = ""
+     fen = None
+     try:
+         fen_result = _get_chess_board_fen_internal(image_path)
+         # The helper reports failures as a JSON object with an "error" key
+         error = None
+         try:
+             parsed = json.loads(fen_result)
+             if isinstance(parsed, dict):
+                 error = parsed.get("error")
+         except (TypeError, ValueError):
+             pass  # Not JSON, so treat it as a FEN string
+         if error or (isinstance(fen_result, str) and fen_result.startswith("Error")):
+             fen_explanation = error or fen_result
+         else:
+             # The helper defaults the side to move to White, so apply player_turn explicitly
+             fen_parts = fen_result.split()
+             if len(fen_parts) > 1:
+                 fen_parts[1] = 'b' if player_turn.lower().startswith('b') else 'w'
+             fen = " ".join(fen_parts)
+             fen_explanation = "FEN extracted successfully from image."
+     except Exception as e:
+         fen_explanation = f"Error extracting FEN: {str(e)}"
+         fen = None
+     # Step 2: Get best move candidates (if FEN available)
+     candidates = []
+     if fen:
+         best_move_result = _get_best_chess_move_internal(fen)
+         if isinstance(best_move_result, dict):
+             candidates = best_move_result.get('candidates', [])
+     return json.dumps({
+         'type': 'tool_response',
+         'tool_name': 'solve_chess_position',
+         'fen': fen,
+         'fen_explanation': fen_explanation,
+         'original_input': {
+             'image_path': image_path,
+             'player_turn': player_turn,
+             'question': question
+         },
+         'candidates': candidates
+     })
2171
+
2172
+ # ========== FEN PROCESSING HELPERS ==========
2173
+ def _add_fen_game_state(board_placement,
2174
+ side_to_move,
2175
+ castling="-",
2176
+ en_passant="-",
2177
+ halfmove_clock=0,
2178
+ fullmove_number=1):
2179
+ """
2180
+ Appends standard game state information to a FEN board placement string.
2181
+
2182
+ Args:
2183
+ board_placement (str): The board layout part of the FEN string
2184
+ (e.g., "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR").
2185
+ side_to_move (str): The active color ('w' for White, 'b' for Black).
2186
+ Case-insensitive, will be converted to lowercase.
2187
+ castling (str, optional): Castling availability string (e.g., "KQkq", "-").
2188
+ Defaults to "-".
2189
+ en_passant (str, optional): En passant target square string (e.g., "e3", "-").
2190
+ Defaults to "-".
2191
+ halfmove_clock (int, optional): The number of halfmoves since the last
2192
+ capture or pawn advance. Defaults to 0.
2193
+ fullmove_number (int, optional): The number of the full move. Starts at 1
2194
+ and increments after Black's move. Defaults to 1.
2195
+
2196
+ Returns:
2197
+ str: The complete FEN string including the game state,
2198
+ or an error message string if inputs are invalid.
2199
+ """
2200
+ # Validate side_to_move
2201
+ side_to_move_lower = str(side_to_move).lower()
2202
+ if side_to_move_lower not in ['w', 'b']:
2203
+ return json.dumps({
2204
+ "type": "tool_response",
2205
+ "tool_name": "add_fen_game_state",
2206
+ "error": f"Error: side_to_move must be 'w' or 'b', received '{side_to_move}'"
2207
+ })
2208
+
2209
+ # Validate clock values (should be non-negative integers, fullmove >= 1)
2210
+ try:
2211
+ halfmove_clock = int(halfmove_clock)
2212
+ fullmove_number = int(fullmove_number)
2213
+ if halfmove_clock < 0:
2214
+ raise ValueError("halfmove_clock cannot be negative.")
2215
+ if fullmove_number < 1:
2216
+ raise ValueError("fullmove_number must be 1 or greater.")
2217
+ except (ValueError, TypeError):
2218
+ return json.dumps({
2219
+ "type": "tool_response",
2220
+ "tool_name": "add_fen_game_state",
2221
+ "error": f"Error: halfmove_clock ('{halfmove_clock}') and "
2222
+ f"fullmove_number ('{fullmove_number}') must be valid integers "
2223
+ f"(non-negative and positive respectively)."
2224
+ })
2225
+
2226
+ # Assemble the full FEN string using the validated/defaulted values
2227
+ # Note: castling and en_passant strings are used directly as passed or defaulted.
2228
+ # More complex validation could be added for them if needed.
2229
+ full_fen = (f"{board_placement} {side_to_move_lower} {castling} "
2230
+ f"{en_passant} {halfmove_clock} {fullmove_number}")
2231
+
2232
+ return json.dumps({
2233
+ "type": "tool_response",
2234
+ "tool_name": "add_fen_game_state",
2235
+ "result": full_fen
2236
+ })
2237
+
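The six-field assembly done by `_add_fen_game_state` can be sketched as a plain function. The version below is an illustration under one stated assumption: it raises `ValueError` on bad input instead of returning a JSON error payload, and `build_fen` is a hypothetical name.

```python
def build_fen(board_placement: str, side_to_move: str, castling: str = "-",
              en_passant: str = "-", halfmove_clock: int = 0,
              fullmove_number: int = 1) -> str:
    """Join a board placement with the five game-state fields of a FEN string."""
    side = str(side_to_move).lower()
    if side not in ("w", "b"):
        raise ValueError(f"side_to_move must be 'w' or 'b', received '{side_to_move}'")
    halfmove_clock = int(halfmove_clock)
    fullmove_number = int(fullmove_number)
    if halfmove_clock < 0:
        raise ValueError("halfmove_clock cannot be negative.")
    if fullmove_number < 1:
        raise ValueError("fullmove_number must be 1 or greater.")
    # Castling and en passant are passed through unvalidated, as in the function above
    return (f"{board_placement} {side} {castling} "
            f"{en_passant} {halfmove_clock} {fullmove_number}")


print(build_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR", "W", "KQkq"))
```

Raising an exception keeps the return type a plain FEN string, so callers do not have to tell a FEN apart from an error payload.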
2238
+ def _fen_normalize(fen: str, default_side='w'):
2239
+ """
2240
+ Normalize and validate a FEN string. Always return a best-effort valid FEN.
2241
+ - If only the board part is present, append default fields.
2242
+ - If FEN is valid, return as is.
2243
+ - If not valid, return an empty-board FEN as a fallback.
2244
+ """
2245
+ fen = fen.strip()
2246
+ parts = fen.split()
2247
+ # If only board part, append defaults
2248
+ if len(parts) == 1 and parts[0].count('/') == 7:
2249
+ fen = f"{fen} {default_side} - - 0 1"
2250
+ # Validate using python-chess
2251
+ try:
2252
+ board = chess.Board(fen)
2253
+ return board.fen()
+ except Exception:
+ return "8/8/8/8/8/8/8/8 w - - 0 1" # Return an empty board as a fallback
2256
+
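`_fen_normalize` relies on counting seven slashes to recognise a board-only FEN and then lets python-chess validate the rest. A stricter, stdlib-only pre-check of the placement field might look like the sketch below; `is_plausible_placement` is a hypothetical helper and only checks rank widths and piece letters, not chess legality.

```python
def is_plausible_placement(placement: str) -> bool:
    """Return True if the placement has 8 ranks that each describe 8 squares."""
    ranks = placement.split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        width = 0
        for char in rank:
            if char in "12345678":
                width += int(char)  # run of empty squares
            elif char in "pnbrqkPNBRQK":
                width += 1  # one piece
            else:
                return False  # unknown character
        if width != 8:
            return False
    return True


print(is_plausible_placement("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))
```

A check like this could give a more specific error than silently falling back to an empty board.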
2257
+ def _get_chess_board_fen_internal(image_input: str) -> str:
2258
+ """
2259
+ Internal function to get the FEN representation from an image of a chess board.
2260
+ Uses the DerekLiu35-ImageToFen Hugging Face Space API.
2261
+ Args:
2262
+ image_input (str): Path to the chessboard image file or base64-encoded image data.
2263
+ Returns:
2264
+ str: The FEN string predicted by the recognizer, or an error message.
2265
+ """
2266
+ api_url = "https://DerekLiu35-ImageToFen.hf.space/api/predict"
2267
+ try:
2268
+ # Detect if input is a file path or base64 data
2269
+ if os.path.exists(image_input):
2270
+ with open(image_input, "rb") as f:
2271
+ img_b64 = base64.b64encode(f.read()).decode("utf-8")
2272
+ else:
2273
+ img_b64 = image_input
2274
+ payload = {"data": [img_b64]}
2275
+ response = requests.post(api_url, json=payload, timeout=60)
2276
+ if response.ok:
2277
+ result = response.json()
2278
+ data = result.get("data", [])
2279
+ if data:
2280
+ # FEN is usually the last string in the list
2281
+ fen_candidate = data[-1]
2282
+ if isinstance(fen_candidate, str) and fen_candidate.count('/') == 7:
2283
+ return _fen_normalize(fen_candidate)
2284
+ # Fallback: search for a line with 7 slashes
2285
+ for item in data:
2286
+ if isinstance(item, str) and item.count('/') == 7:
2287
+ return _fen_normalize(item)
2288
+ return json.dumps({
2289
+ "type": "tool_response",
2290
+ "tool_name": "get_chess_board_fen",
2291
+ "error": f"Error: FEN not found in API response: {result}"
2292
+ })
2293
+ else:
2294
+ return json.dumps({
2295
+ "type": "tool_response",
2296
+ "tool_name": "get_chess_board_fen",
2297
+ "error": f"Error: API call failed: {response.text}"
2298
+ })
2299
+ except Exception as e:
2300
+ return json.dumps({
2301
+ "type": "tool_response",
2302
+ "tool_name": "get_chess_board_fen",
2303
+ "error": f"Error running image-to-FEN API: {str(e)}"
2304
+ })
+ @tool
+ def get_chess_board_fen(image_path: str, player_turn: str) -> str:
+     """
+     Get the FEN representation from an image of a chess board.
+     This tool uses computer vision to analyze a chess board image and convert it
+     to FEN (Forsyth-Edwards Notation) format.
+     Args:
+         image_path (str): The path to the chess board image file.
+         player_turn (str): The player with the next turn ("black" or "white").
+     Returns:
+         str: JSON string with the FEN representation of the chess position, or an error message.
+     """
+     fen = _get_chess_board_fen_internal(image_path)
+     # If the result is a JSON error, pass it through
+     try:
+         data = json.loads(fen)
+         if isinstance(data, dict) and 'error' in data:
+             return fen
+     except (TypeError, ValueError):
+         pass  # Not JSON, so it is a FEN string
+     # The internal helper already filled in default fields (side 'w'),
+     # so set the side to move explicitly from player_turn
+     fen_parts = _fen_normalize(fen).split()
+     fen_parts[1] = 'b' if player_turn.lower().startswith('b') else 'w'
+     # Return the normalized FEN in the required structure
+     return json.dumps({
+         "type": "tool_response",
+         "tool_name": "get_chess_board_fen",
+         "result": " ".join(fen_parts)
+     })
2332
+
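Several helpers in this file return either a plain FEN string or a JSON object with an `"error"` key, so callers need a reliable way to tell the two apart. The following sketch shows one way to do it; `extract_tool_error` is a hypothetical name, and the payload shape is taken from the `json.dumps` calls above.

```python
import json
from typing import Optional


def extract_tool_error(payload: str) -> Optional[str]:
    """Return the error message if payload is a JSON error object, else None."""
    try:
        data = json.loads(payload)
    except (TypeError, ValueError):
        # Not JSON (for example a FEN string) or not a string at all
        return None
    if isinstance(data, dict) and "error" in data:
        return str(data["error"])
    return None


error_payload = json.dumps({"type": "tool_response", "error": "Error: API call failed"})
print(extract_tool_error(error_payload))
print(extract_tool_error("8/8/8/8/8/8/8/8 w - - 0 1"))
```

Checking for the key after parsing avoids relying on the payload starting with the word "Error", which a JSON object never does.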
2333
+ @tool
2334
+ def web_search_deep_research_exa_ai(instructions: str) -> str:
2335
+ """
2336
+ Search the web and site content using a deep research tool.
+ Ask a query and get a well-researched answer with references.
+ Can provide a FINAL ANSWER candidate.
+ Ideal for research tasks on any topic that require fact searching.
+ Can find answers and references about science, scholars, sports, events, books, films, movies, memes, citations, etc.
2341
+
2342
+ The tool researches a topic, verifies facts and outputs a structured answer.
2343
+ It deeply crawls websites to find the right answer, results and links.
2344
+
2345
+ RESPONSE STRUCTURE:
2346
+ The tool returns a structured response with the following format:
2347
+ 1. Task ID and Status
2348
+ 2. Original Instructions
2349
+ 3. Inferred Schema (JSON schema describing the response data structure)
2350
+ 4. Data (JSON object containing the answer according to the schema)
2351
+ 5. Citations (source references)
2352
+
2353
+ SCHEMA INFERENCE:
2354
+ The tool automatically infers the appropriate schema based on your question.
2355
+ For example, a schema might include:
2356
+ - Person data: {"firstName", "lastName", "nationality", "year", etc.}
2357
+ - Event data: {"event", "date", "location", "participants", etc.}
2358
+ - Fact data: {"fact", "source", "context", etc.}
2359
+
2360
+ DATA EXTRACTION:
2361
+ To extract the answer from the response:
2362
+ 1. Look for the "Data" section in the response
2363
+ 2. Parse the JSON object in the "Data" field according to the schema
2364
+ 3. Extract the relevant fields based on your question
2365
+
2366
+ Args:
2367
+ instructions (str): Direct question or research instructions.
2368
+
2369
+ Returns:
2370
+ str: The research result as a structured JSON string with schema, data, and citations, or an error message.
2371
+ """
2372
+ if not EXA_AVAILABLE:
2373
+ return json.dumps({
2374
+ "type": "tool_response",
2375
+ "tool_name": "web_search_deep_research_exa_ai",
2376
+ "error": "Exa not available. Install with: pip install exa-py"
2377
+ })
2378
+ try:
2379
+ exa_api_key = os.environ.get("EXA_API_KEY")
2380
+ if not exa_api_key:
2381
+ return json.dumps({
2382
+ "type": "tool_response",
2383
+ "tool_name": "web_search_deep_research_exa_ai",
2384
+ "error": "EXA_API_KEY not found in environment variables. Please set it in your .env file."
2385
+ })
2386
+ exa = Exa(exa_api_key)
2387
+ task_stub = exa.research.create_task(
2388
+ instructions=instructions,
2389
+ model="exa-research-pro",
2390
+ output_infer_schema=True
2391
+ )
2392
+ task = exa.research.poll_task(task_stub.id)
2393
+ return json.dumps({
2394
+ "type": "tool_response",
2395
+ "tool_name": "web_search_deep_research_exa_ai",
2396
+ "result": str(task)
2397
+ })
2398
+ except Exception as e:
2399
+ return json.dumps({
2400
+ "type": "tool_response",
2401
+ "tool_name": "web_search_deep_research_exa_ai",
2402
+ "error": f"Error in Exa research: {str(e)}"
2403
+ })
2404
+
2405
+ # ========== END OF TOOLS.PY ==========
utils.py ADDED
@@ -0,0 +1,347 @@
1
+ import os
2
+ import datetime
3
+ import json
4
+ from typing import Optional, Union, Dict, Any, List
5
+ from pathlib import Path
6
+
7
+ # Global constants
8
+ TRACES_DIR = "traces" # Directory for uploading trace files (won't trigger Space restarts)
9
+
10
+ # Dataset constants
11
+ DATASET_ID = "arterm-sedov/agent-course-final-assignment"
12
+ DATASET_CONFIG_PATH = "dataset_config.json" # Local copy of dataset config
13
+
14
+ # Import huggingface_hub components for API-based file operations
15
+ try:
16
+ from huggingface_hub import HfApi, CommitOperationAdd
17
+ HF_HUB_AVAILABLE = True
18
+ except ImportError:
19
+ HF_HUB_AVAILABLE = False
20
+ print("Warning: huggingface_hub not available. Install with: pip install huggingface_hub")
21
+
22
+ def load_dataset_schema() -> Optional[Dict]:
23
+ """
24
+ Load dataset schema from local dataset_config.json file.
25
+ Tries multiple possible locations for robustness.
26
+ """
27
+ possible_paths = [
28
+ Path("dataset_config.json"), # Current working directory (root)
29
+ Path("./dataset_config.json"),
30
+ Path("../dataset_config.json"), # Parent directory (if run from misc_files)
31
+ Path(__file__).parent / "dataset_config.json",
32
+ Path(__file__).parent.parent / "dataset_config.json"
33
+ ]
34
+ for path in possible_paths:
35
+ if path.exists():
36
+ with open(path, "r", encoding="utf-8") as f:
37
+ return json.load(f)
38
+ print("Warning: Dataset config file not found: dataset_config.json")
39
+ return None
40
+
41
+ def get_dataset_features(split: str) -> Optional[Dict]:
42
+ """
43
+ Get features schema for a specific dataset split.
44
+
45
+ Args:
46
+ split (str): Dataset split name (init or runs)
47
+
48
+ Returns:
49
+ Dict: Features schema for the split or None if not found
50
+ """
51
+ schema = load_dataset_schema()
52
+ if schema and "features" in schema and split in schema["features"]:
53
+ features = schema["features"][split]
54
+ print(f"🔍 Loaded schema for {split}: {list(features.keys())}")
55
+ return features
56
+ print(f"❌ No schema found for {split}")
57
+ return None
58
+
59
+ def validate_data_structure(data: Dict, split: str) -> bool:
60
+ """
61
+ Validate that data matches the expected schema for the split.
62
+
63
+ Args:
64
+ data (Dict): Data to validate
65
+ split (str): Dataset split name
66
+
67
+ Returns:
68
+ bool: True if data structure is valid
69
+ """
70
+ features = get_dataset_features(split)
71
+ if not features:
72
+ print(f"Warning: No schema found for split '{split}', skipping validation")
73
+ return True
74
+
75
+ # Debug: Print what we're checking
76
+ print(f"🔍 Validating {split} split:")
77
+ print(f" Expected fields: {list(features.keys())}")
78
+ print(f" Actual fields: {list(data.keys())}")
79
+
80
+ # Check that all required fields are present
81
+ required_fields = set(features.keys())
82
+ data_fields = set(data.keys())
83
+
84
+ missing_fields = required_fields - data_fields
85
+ if missing_fields:
86
+ print(f"Warning: Missing required fields for {split} split: {missing_fields}")
87
+ return False
88
+
89
+ # Enhanced validation: Check nullable fields and data types
90
+ for field_name, field_spec in features.items():
91
+ if field_name in data:
92
+ value = data[field_name]
93
+
94
+ # Check nullable fields
95
+ is_nullable = field_spec.get("nullable", False)
96
+ if value is None and not is_nullable:
97
+ print(f"Warning: Field '{field_name}' is not nullable but contains None")
98
+ return False
99
+
100
+ # Check data types for non-null values
101
+ if value is not None:
102
+ expected_dtype = field_spec.get("dtype", "string")
103
+ if expected_dtype == "float64" and not isinstance(value, (int, float)):
104
+ print(f"Warning: Field '{field_name}' should be float64 but got {type(value)}")
105
+ return False
106
+ elif expected_dtype == "int64" and not isinstance(value, int):
107
+ print(f"Warning: Field '{field_name}' should be int64 but got {type(value)}")
108
+ return False
109
+ elif expected_dtype == "string" and not isinstance(value, str):
110
+ print(f"Warning: Field '{field_name}' should be string but got {type(value)}")
111
+ return False
112
+
113
+ return True
114
+
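The checks in `validate_data_structure` (required fields, nullability, dtype) can be expressed as a table-driven loop. The sketch below is an assumption-laden illustration: `find_schema_problems` and `DTYPE_CHECKS` are hypothetical names, the feature spec shape (`dtype`, `nullable`) is inferred from the code above, and it returns a list of problems rather than a boolean.

```python
from typing import Any, Dict, List

# Python types accepted for each dtype name used in the schema
DTYPE_CHECKS = {"string": (str,), "int64": (int,), "float64": (int, float)}


def find_schema_problems(data: Dict[str, Any],
                         features: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, spec in features.items():
        if name not in data:
            problems.append(f"missing field '{name}'")
            continue
        value = data[name]
        if value is None:
            if not spec.get("nullable", False):
                problems.append(f"field '{name}' is not nullable but contains None")
            continue
        expected = spec.get("dtype", "string")
        allowed = DTYPE_CHECKS.get(expected)  # unknown dtypes are not checked
        if allowed and not isinstance(value, allowed):
            problems.append(f"field '{name}' should be {expected} but got {type(value).__name__}")
    return problems


features = {"name": {"dtype": "string"}, "score": {"dtype": "float64", "nullable": True}}
print(find_schema_problems({"name": "run-1", "score": None}, features))
```

Collecting every problem in one pass gives a fuller report than stopping at the first failure.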
115
+ def get_hf_api_client(token: Optional[str] = None):
116
+ """
117
+ Create and configure an HfApi client for repository operations.
118
+
119
+ Args:
120
+ token (str, optional): HuggingFace token. If None, uses environment variable.
121
+
122
+ Returns:
123
+ HfApi: Configured API client or None if not available
124
+ """
125
+ if not HF_HUB_AVAILABLE:
126
+ return None
127
+
128
+ try:
129
+ # Get token from parameter or environment
130
+ hf_token = token or os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
131
+ if not hf_token:
132
+ print("Warning: No HuggingFace token found. API operations will fail.")
133
+ return None
134
+
135
+ # Create API client
136
+ api = HfApi(token=hf_token)
137
+ return api
138
+ except Exception as e:
139
+ print(f"Error creating HfApi client: {e}")
140
+ return None
141
+
142
+
143
+
144
+ def upload_to_dataset(
145
+ dataset_id: str,
146
+ data: Union[Dict, List[Dict]],
147
+ split: str = "train",
148
+ token: Optional[str] = None
149
+ ) -> bool:
150
+ """
151
+ Upload structured data to HuggingFace dataset.
152
+
153
+ Args:
154
+ dataset_id (str): Dataset repository ID (e.g., "username/dataset-name")
155
+ data (Union[Dict, List[Dict]]): Data to upload (single dict or list of dicts)
156
+ split (str): Dataset split name (default: "train")
157
+ token (str, optional): HuggingFace token
158
+
159
+ Returns:
160
+ bool: True if successful, False otherwise
161
+ """
162
+ if not HF_HUB_AVAILABLE:
163
+ print("Error: huggingface_hub not available for dataset operations")
164
+ return False
165
+
166
+ try:
167
+ # Get API client
168
+ api = get_hf_api_client(token)
169
+ if not api:
170
+ return False
171
+
172
+ # Prepare data as list
173
+ if isinstance(data, dict):
174
+ data_list = [data]
175
+ else:
176
+ data_list = data
177
+
178
+ # Validate data structure against local schema only
179
+ # Note: HuggingFace may show warnings about remote schema mismatch, but uploads still work
180
+ for i, item in enumerate(data_list):
181
+ if not validate_data_structure(item, split):
182
+ print(f"Warning: Data item {i} does not match local schema for split '{split}'")
183
+ # Continue anyway, but log the warning
184
+
185
+ # Convert to JSONL format with proper serialization
186
+ jsonl_content = ""
187
+ for item in data_list:
188
+ # Ensure all complex objects are serialized as strings
189
+ serialized_item = {}
190
+ for key, value in item.items():
191
+ if isinstance(value, (dict, list)):
192
+ serialized_item[key] = json.dumps(value, ensure_ascii=False)
193
+ else:
194
+ serialized_item[key] = value
195
+ jsonl_content += json.dumps(serialized_item, ensure_ascii=False) + "\n"
196
+
197
+ # Create file path for dataset
198
+ timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
199
+ file_path = f"{split}-{timestamp}.jsonl"
200
+
201
+ # Upload to dataset
202
+ operation = CommitOperationAdd(
203
+ path_in_repo=file_path,
204
+ path_or_fileobj=jsonl_content.encode('utf-8')
205
+ )
206
+
207
+ commit_message = f"Add {split} data at {timestamp}"
208
+
209
+ # Commit to dataset repository
210
+ commit_info = api.create_commit(
211
+ repo_id=dataset_id,
212
+ repo_type="dataset",
213
+ operations=[operation],
214
+ commit_message=commit_message
215
+ )
216
+
217
+ print(f"✅ Data uploaded to dataset: {dataset_id}")
218
+ print(f" File: {file_path}")
219
+ print(f" Records: {len(data_list)}")
220
+ return True
221
+
222
+ except Exception as e:
223
+ print(f"❌ Error uploading to dataset: {e}")
224
+ return False
225
+
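The JSONL serialization step inside `upload_to_dataset` can be separated from the upload itself. The sketch below covers only that step, with nested dicts and lists stored as JSON strings as in the code above; `to_jsonl` is a hypothetical name and no Hugging Face call is made.

```python
import json
from typing import Any, Dict, List


def to_jsonl(records: List[Dict[str, Any]]) -> str:
    """Serialize records to JSONL, storing nested dicts and lists as JSON strings."""
    lines = []
    for record in records:
        flat = {
            key: json.dumps(value, ensure_ascii=False) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        }
        lines.append(json.dumps(flat, ensure_ascii=False))
    # One JSON object per line, each terminated by a newline
    return "".join(line + "\n" for line in lines)


print(to_jsonl([{"id": 1, "meta": {"model": "demo"}}]), end="")
```

A reader of the resulting file has to call `json.loads` twice on nested fields: once for the line and once for the stringified value.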
226
+ def upload_init_summary(
227
+ init_data: Dict,
228
+ token: Optional[str] = None
229
+ ) -> bool:
230
+ """
231
+ Upload agent initialization summary to init split.
232
+
233
+ Args:
234
+ init_data (Dict): Initialization data including LLM config, model status, etc.
235
+ token (str, optional): HuggingFace token
236
+
237
+ Returns:
238
+ bool: True if successful, False otherwise
239
+ """
240
+ return upload_to_dataset(DATASET_ID, init_data, "init", token)
241
+
242
+ def upload_run_data(
243
+ run_data: Dict,
244
+ split: str = "runs_new",
245
+ token: Optional[str] = None
246
+ ) -> bool:
247
+ """
248
+ Upload evaluation run data to specified split.
249
+
250
+ Args:
251
+ run_data (Dict): Evaluation run data including results, stats, etc.
252
+ split (str): Dataset split name (default: "runs_new" for current schema)
253
+ token (str, optional): HuggingFace token
254
+
255
+ Returns:
256
+ bool: True if successful, False otherwise
257
+ """
258
+ return upload_to_dataset(DATASET_ID, run_data, split, token)
259
+
260
+ def get_dataset_info() -> Optional[Dict]:
261
+ """
262
+ Get dataset information from the local config file.
263
+
264
+ Returns:
265
+ Dict: Dataset info including splits and features, or None if not found
266
+ """
267
+ schema = load_dataset_schema()
268
+ if schema and "dataset_info" in schema:
269
+ return schema["dataset_info"]
270
+ return None
271
+
272
+ def print_dataset_schema():
273
+ """
274
+ Print the dataset schema for debugging purposes.
275
+ """
276
+ schema = load_dataset_schema()
277
+ if schema:
278
+ print("📊 Dataset Schema:")
279
+ print(f" Dataset: {schema.get('dataset_info', {}).get('dataset_name', 'Unknown')}")
280
+ print(f" Splits: {list(schema.get('features', {}).keys())}")
281
+ for split_name, features in schema.get('features', {}).items():
282
+ print(f" {split_name} split fields: {list(features.keys())}")
283
+ else:
284
+ print("❌ No dataset schema found")
285
+
286
+ def ensure_valid_answer(answer: Any) -> str:
287
+ """
288
+ Ensure the answer is a valid string, never None or empty.
289
+
290
+ Args:
291
+ answer (Any): The answer to validate
292
+
293
+ Returns:
294
+ str: A valid string answer, defaulting to "No answer provided" if invalid
295
+ """
296
+ if answer is None:
297
+ return "No answer provided"
298
+ elif not isinstance(answer, str):
299
+ return str(answer)
300
+ elif answer.strip() == "":
301
+ return "No answer provided"
302
+ else:
303
+ return answer
304
+
305
+ def get_nullable_field_value(value: Any, field_name: str, default: Any = None) -> Any:
306
+ """
307
+ Get a value for a nullable field, handling None values appropriately.
308
+
309
+ Args:
310
+ value (Any): The value to process
311
+ field_name (str): Name of the field for logging
312
+ default (Any): Default value if None
313
+
314
+ Returns:
315
+ Any: The processed value or default
316
+ """
317
+ if value is None:
318
+ print(f"📝 Field '{field_name}' is None, using default: {default}")
319
+ return default
320
+ return value
321
+
322
+ def validate_nullable_field(value: Any, field_name: str, expected_type: str) -> bool:
323
+ """
324
+ Validate a nullable field against expected type.
325
+
326
+ Args:
327
+ value (Any): The value to validate
328
+ field_name (str): Name of the field
329
+ expected_type (str): Expected data type (string, float64, int64)
330
+
331
+ Returns:
332
+ bool: True if valid
333
+ """
334
+ if value is None:
335
+ return True # Null is always valid for nullable fields
336
+
337
+ if expected_type == "float64" and not isinstance(value, (int, float)):
338
+ print(f"❌ Field '{field_name}' should be float64 but got {type(value)}")
339
+ return False
340
+ elif expected_type == "int64" and not isinstance(value, int):
341
+ print(f"❌ Field '{field_name}' should be int64 but got {type(value)}")
342
+ return False
343
+ elif expected_type == "string" and not isinstance(value, str):
344
+ print(f"❌ Field '{field_name}' should be string but got {type(value)}")
345
+ return False
346
+
347
+ return True