Update README.md
README.md (CHANGED)

@@ -306,102 +306,9 @@ Command for running OpenCUA-7B in OSWorld:
--coordinate_type qwen25
```
# AgentNet Dataset - Large-Scale Computer-Use Dataset

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/dw5k183ucDSB2SZuS5f2V.png" width="400" alt="AgentNet Dataset Domain Distribution">
</div>

AgentNet is the first large-scale desktop computer-use agent trajectory dataset, containing 22.6K human-annotated computer-use tasks across Windows, macOS, and Ubuntu systems.

👉 **[AgentNet Huggingface Dataset](https://huggingface.co/datasets/xlangai/AgentNet)**

Download the dataset here:

```
pip install -U huggingface_hub
huggingface-cli download xlangai/AgentNet --repo-type dataset --local-dir ./AgentNet
```
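If you prefer to download from Python rather than the CLI, the same repository can be fetched with `huggingface_hub`'s `snapshot_download`; the local directory below simply mirrors the CLI example above:

```python
from huggingface_hub import snapshot_download

# Download the full AgentNet dataset repository into ./AgentNet
snapshot_download(
    repo_id="xlangai/AgentNet",
    repo_type="dataset",
    local_dir="./AgentNet",
)
```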
Collecting computer-use agent training data requires three steps:

- Demonstrate a human computer-use task with [AgentNetTool](https://agentnet-tool.xlang.ai/);
- Preprocess the demonstration with [Action Reduction & State-Action Matching](./data/data-processor);
- For each step, [synthesize a reflective long CoT](./data/cot-generator).

## 1 AgentNetTool – Annotation & Verification Tool
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/ETjCOoIRR7f1YZCJ2kfiW.png" width="700" alt="AgentNet Tool">
</div>

Our **AgentNetTool** is a cross-platform GUI recorder that runs unobtrusively on annotators' machines. It captures synchronized **screen video**, **mouse/keyboard events**, and **accessibility trees**, then provides an in-browser UI for reviewing, trimming, and submitting demonstrations. AgentNetTool is available on Windows, macOS, and Ubuntu.

👉 **[AgentNetTool Document](https://agentnet-tool.xlang.ai/)**
## 2 DataProcessor – Action Reduction & State–Action Matching

Raw demonstrations can contain thousands of low-level events that are too dense for model training. The **DataProcessor** module (`./data/data-process/`) performs two key steps (a toy sketch of both follows the list):

1. **Action Reduction** — merges granular signals into concise, semantically meaningful PyAutoGUI actions (e.g., collapsing mouse moves → click, coalescing scrolls, grouping key-press sequences into text or hotkeys).
2. **State–Action Matching** — aligns every reduced action with the *last visually distinct frame* **before** the action begins, avoiding future-information leakage and yielding compact state–action pairs.
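The sketch below illustrates the flavor of both steps. It is a minimal, self-contained example; the event format, field names, and matching rule are assumptions made for illustration, not the actual implementation in `./data/data-process/`.

```python
# Illustrative sketch only; the real DataProcessor covers many more event types.

def reduce_actions(events):
    """Step 1: collapse raw input events into concise PyAutoGUI-style actions."""
    reduced, i = [], 0
    while i < len(events):
        ev = events[i]
        if ev["type"] == "mouse_move":
            # Skip the run of moves and keep only the final pointer position.
            while i + 1 < len(events) and events[i + 1]["type"] == "mouse_move":
                i += 1
            end = events[i]
            # A move run followed by press + release collapses into one click.
            if (i + 2 < len(events)
                    and events[i + 1]["type"] == "mouse_press"
                    and events[i + 2]["type"] == "mouse_release"):
                reduced.append({"action": f"pyautogui.click(x={end['x']}, y={end['y']})",
                                "time": events[i + 1]["time"]})
                i += 3
                continue
            reduced.append({"action": f"pyautogui.moveTo(x={end['x']}, y={end['y']})",
                            "time": end["time"]})
        elif ev["type"] == "key_press":
            reduced.append({"action": f"pyautogui.press({ev['key']!r})", "time": ev["time"]})
        i += 1
    return reduced


def match_state(frames, action):
    """Step 2: pair an action with the last frame captured before it begins."""
    earlier = [f for f in frames if f["time"] < action["time"]]
    return earlier[-1] if earlier else None
```

Matching each action to a frame captured *before* the action starts is what keeps post-action information from leaking into the training observation.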
These processed trajectories underlie all downstream training and evaluation.

---
## 3 CoTGenerator – Synthesizing Reflective Long Chain-of-Thought Inner Monologue

To boost robustness and interpretability, we augment each trajectory with **reflective long Chain-of-Thought (CoT) reasoning**. The **CoTGenerator** pipeline (`./data/cot-generator/`) synthesizes step-level reflections that:

* reflect on the previous action,
* explain *why* an action is chosen given the current observation and history,
* note potential alternative actions, and
* forecast the expected next state.
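One possible shape for such a per-step record, purely for illustration (the field names and values are hypothetical, not the actual CoTGenerator schema):

```python
# Hypothetical per-step CoT record; field names and values are illustrative only.
cot_record = {
    "step_index": 12,
    "reflection_on_previous_action": "The Settings window opened as expected.",
    "reasoning": (
        "The task asks to enable dark mode, and the Appearance tab is visible "
        "in the sidebar, so clicking it is the most direct next step."
    ),
    "alternative_actions": ["Search for 'dark mode' in the settings search box."],
    "expected_next_state": "The Appearance tab is selected and theme options are shown.",
    "action": "pyautogui.click(x=212, y=348)",
}
```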
Empirically, models trained with these rich CoTs scale better with data and generalize better to unseen applications.

# Evaluation
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/emy1QCJwQj9KqHkVmtNH2.png" width="800" alt="AgentNetBench">
</div>

**AgentNetBench** (`./AgentNetBench/`) provides a realistic offline evaluator for OS agent trajectories. It compares model-predicted low-level actions (click, moveTo, write, press, scroll, terminate, etc.) against ground-truth human actions and reports detailed metrics.
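The snippet below sketches the flavor of such an offline comparison; the field names, matching rule, and 30-pixel tolerance are illustrative assumptions rather than AgentNetBench's actual scoring logic (see the linked README for the metrics it really reports):

```python
import math

# Illustrative only: a predicted click "matches" if it lands near the
# ground-truth click; other action types must match exactly.
def action_matches(pred, gold, click_radius=30):
    if pred["type"] != gold["type"]:
        return False
    if pred["type"] in ("click", "moveTo"):
        return math.dist((pred["x"], pred["y"]), (gold["x"], gold["y"])) <= click_radius
    return pred.get("args") == gold.get("args")

def step_accuracy(predictions, ground_truth):
    hits = sum(action_matches(p, g) for p, g in zip(predictions, ground_truth))
    return hits / max(len(ground_truth), 1)
```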
👉 See **[AgentNetBench/README.md](./evaluation/agentnetbench/README.md)** for usage instructions.
# Acknowledgments

<p>
We thank Yu Su, Caiming Xiong, and the anonymous reviewers for their insightful discussions and valuable feedback.
We are grateful to Moonshot AI for providing training infrastructure and annotated data.
We also sincerely appreciate Hao Yang, Zhengtao Wang, and Yanxu Chen from the Kimi Team for their strong infrastructure support and helpful guidance.
We thank Chong Peng, Taofeng Xue, and Qiumian Huang from the <a href="https://github.com/meituan/EvoCUA" target="_blank">Meituan EvoCUA Team</a> for their contributions to vLLM integration.
The development of our tool builds on the open-source projects <a href="https://github.com/TheDuckAI/DuckTrack" target="_blank">DuckTrack</a> and <a href="https://github.com/OpenAdaptAI/OpenAdapt" target="_blank">OpenAdapt</a>, and we are very grateful for the maintainers' commitment to the open-source community.
Finally, we extend our deepest thanks to all annotators for their tremendous effort and contributions to this project.
</p>
# License

This project is licensed under the MIT License - see the LICENSE file in the root folder for details.

## Research Use and Disclaimer

OpenCUA models are intended for **research and educational purposes only**.

### Prohibited Uses
- The model may **not** be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction.
- Use for illegal, unethical, or harmful activities is strictly prohibited.

- The authors, contributors, and copyright holders are **not responsible** for any illegal, unethical, or harmful use of the Software, nor for any direct or indirect damages resulting from such use.
- Use of the "OpenCUA" name, logo, or trademarks does **not** imply any endorsement or affiliation unless separate written permission is obtained.
- Users are solely responsible for ensuring their use complies with applicable laws and regulations.
## Research and Commercial Use
OpenCUA (including the model, dataset, tools, and code) may be used for **research, educational, and commercial purposes** under the **MIT License** (see `LICENSE`).
## Important Notes on Coordinate Systems
<div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">