xywang626 committed
Commit 31743a9 · verified · 1 Parent(s): 6d6f65b

Update README.md

Files changed (1):
  1. README.md +2 -95

README.md CHANGED
@@ -306,102 +306,9 @@ Command for running OpenCUA-7B in OSWorld:
  --coordinate_type qwen25
  ```
 
- ---
-
- # AgentNet Dataset - Large-Scale Computer-Use Dataset
-
- <div align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/dw5k183ucDSB2SZuS5f2V.png" width="400" alt="AgentNet Dataset Domain Distribution">
- </div>
-
- AgentNet is the first large-scale desktop computer-use agent trajectory dataset, containing 22.6K human-annotated computer-use tasks across Windows, macOS, and Ubuntu systems.
-
- 👉 **[AgentNet Huggingface Dataset](https://huggingface.co/datasets/xlangai/AgentNet)**
-
- Download the dataset here:
- ```
- pip install -U huggingface_hub
- huggingface-cli download xlangai/AgentNet --repo-type dataset --local-dir ./AgentNet
- ```
-
- Collecting computer-use agent training data requires three steps:
- - Record a human demonstration of a computer-use task via [AgentNetTool](https://agentnet-tool.xlang.ai/);
- - Preprocess the demonstration using [Action Reduction & State-Action Matching](./data/data-processor);
- - For each step, [synthesize a reflective long CoT](./data/cot-generator).
-
-
- ## 1 AgentNetTool – Annotation & Verification Tool
- <div align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/ETjCOoIRR7f1YZCJ2kfiW.png" width="700" alt="AgentNet Tool">
- </div>
-
-
- Our **AgentNetTool** is a cross-platform GUI recorder that runs unobtrusively on annotators' machines. It captures synchronized **screen video**, **mouse/keyboard events**, and **accessibility trees**, then provides an in-browser UI for reviewing, trimming, and submitting demonstrations. AgentNetTool is available on Windows, macOS, and Ubuntu.
-
- 👉 **[AgentNetTool Documentation](https://agentnet-tool.xlang.ai/)**
-
-
-
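- Conceptually, the capture side reduces to logging timestamped input events alongside the screen recording. A minimal sketch of that idea, using the pynput library as a stand-in (illustrative only; the actual tool builds on DuckTrack and OpenAdapt):
- ```
- import time
- from pynput import keyboard, mouse
-
- events = []  # timestamped event log shared by both listeners
-
- def on_click(x, y, button, pressed):
-     events.append({"t": time.time(), "kind": "down" if pressed else "up", "x": x, "y": y})
-
- def on_press(key):
-     events.append({"t": time.time(), "kind": "key", "key": str(key)})
-
- # Both listeners stamp events with the same clock, so the two streams can
- # later be aligned with the screen recording's timestamps.
- mouse.Listener(on_click=on_click).start()
- keyboard.Listener(on_press=on_press).start()
- time.sleep(10)  # record for ten seconds
- print(f"captured {len(events)} events")
- ```
-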
- ## 2 DataProcessor – Action Reduction & State–Action Matching
- Raw demonstrations can contain thousands of low-level events that are too dense for model training.
- The **DataProcessor** module (`./data/data-process/`) performs two key steps:
-
- 1. **Action Reduction** — merges granular signals into concise, semantically meaningful PyAutoGUI actions (e.g., collapsing mouse moves → click, coalescing scrolls, grouping key-press sequences into text or hotkeys).
- 2. **State–Action Matching** — aligns every reduced action with the *last visually distinct frame* **before** the action begins, avoiding future-information leakage and yielding compact state–action pairs.
-
- These processed trajectories underlie all downstream training and evaluation.
-
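- A minimal sketch of both steps on hypothetical event and frame records (the real module in `./data/data-process/` handles many more cases, such as scroll coalescing and hotkey grouping):
- ```
- from dataclasses import dataclass
-
- @dataclass
- class Event:
-     t: float   # timestamp in seconds
-     kind: str  # "move", "down", "up", ...
-     x: int = 0
-     y: int = 0
-
- def reduce_actions(events: list[Event]) -> list[dict]:
-     """Action Reduction: collapse a move ... down/up run into one click."""
-     actions, i = [], 0
-     while i < len(events):
-         e = events[i]
-         if e.kind == "down" and i + 1 < len(events) and events[i + 1].kind == "up":
-             actions.append({"t": e.t, "action": f"pyautogui.click(x={e.x}, y={e.y})"})
-             i += 2  # the preceding mouse moves are implied by the click target
-         else:
-             i += 1  # bare moves carry no task semantics; drop them
-     return actions
-
- def match_state(action: dict, frames: list[tuple[float, str]]) -> str:
-     """State–Action Matching: pick the last frame strictly before the action."""
-     before = [path for t, path in frames if t < action["t"]]
-     return before[-1]  # frames are time-sorted; one is assumed to precede each action
- ```
-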
- ---
-
- ## 3 CoTGenerator – Synthesizing Reflective Long Chain-of-Thought Inner Monologue
- To boost robustness and interpretability, we augment each trajectory with **reflective long Chain-of-Thought (CoT) reasoning**.
- The **CoTGenerator** pipeline (`./data/cot-generator/`) synthesizes step-level reflections that:
-
- * reflect on the previous action,
- * explain *why* an action is chosen given the current observation and history,
- * note potential alternative actions, and
- * forecast the expected next state.
-
- Empirically, models trained with these rich CoTs scale better with data and generalize to unseen applications.
-
-
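- For concreteness, one augmented step could be organized as follows (a sketch only; the field names are illustrative, not the dataset's actual schema):
- ```
- step = {
-     "observation": "screenshot_0042.png",
-     "cot": {
-         "reflection": "The previous click selected the target paragraph as expected.",
-         "rationale": "The task requires bold text; the ctrl+b hotkey applies it to the current selection.",
-         "alternatives": "Clicking Format > Bold in the menu would achieve the same result.",
-         "expectation": "The selected text should appear bold in the next frame.",
-     },
-     "action": "pyautogui.hotkey('ctrl', 'b')",
- }
- ```
-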
- # Evaluation
-
- <div align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/67b327cdd4665a0448eef7d5/emy1QCJwQj9KqHkVmtNH2.png" width="800" alt="AgentNetBench">
- </div>
-
-
- **AgentNetBench** (`./AgentNetBench/`) provides a realistic offline evaluator for OS agent trajectories. It compares model-predicted low-level actions (click, moveTo, write, press, scroll, terminate, etc.) against ground-truth human actions and reports detailed metrics.
-
- 👉 See **[AgentNetBench/README.md](./evaluation/agentnetbench/README.md)** for usage instructions.
-
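- At its core, the evaluator matches each predicted action against its ground-truth counterpart. A minimal sketch, assuming a simple pixel-distance threshold for clicks (AgentNetBench's actual matching rules and thresholds may differ):
- ```
- import math
-
- def actions_match(pred: dict, gold: dict, radius: float = 15.0) -> bool:
-     """Return True when a predicted action counts as correct."""
-     if pred["type"] != gold["type"]:
-         return False
-     if gold["type"] in ("click", "moveTo"):
-         # coordinate actions: correct if within a small radius of the human target
-         return math.dist((pred["x"], pred["y"]), (gold["x"], gold["y"])) <= radius
-     if gold["type"] in ("write", "press"):
-         return pred["text"].strip() == gold["text"].strip()
-     return True  # scroll, terminate, etc.: a type match suffices in this sketch
- ```
-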
- # Acknowledgements
- <p>
- We thank Yu Su, Caiming Xiong, and the anonymous reviewers for their insightful discussions and valuable feedback.
- We are grateful to Moonshot AI for providing training infrastructure and annotated data.
- We also sincerely appreciate Hao Yang, Zhengtao Wang, and Yanxu Chen from the Kimi Team for their strong infrastructure support and helpful guidance.
- We thank Chong Peng, Taofeng Xue, and Qiumian Huang from the <a href="https://github.com/meituan/EvoCUA" target="_blank">Meituan EvoCUA Team</a> for their contributions to vLLM integration.
- The development of our tool is based on the open-source projects <a href="https://github.com/TheDuckAI/DuckTrack" target="_blank">DuckTrack</a> and <a href="https://github.com/OpenAdaptAI/OpenAdapt" target="_blank">OpenAdapt</a>.
- We are very grateful for their commitment to the open-source community. Finally, we extend our deepest thanks to all annotators for their tremendous effort and contributions to this project.
- </p>
-
- # License
-
- This project is licensed under the MIT License - see the LICENSE file in the root folder for details.
-
- ## Research Use and Disclaimer
-
- OpenCUA models are intended for **research and educational purposes only**.
-
- ### Prohibited Uses
- - The model may **not** be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction
- - Use for illegal, unethical, or harmful activities is strictly prohibited
+ ## Research and Commercial Use
 
- ### Disclaimer
- - The authors, contributors, and copyright holders are **not responsible** for any illegal, unethical, or harmful use of the Software, nor for any direct or indirect damages resulting from such use
- - Use of the "OpenCUA" name, logo, or trademarks does **not** imply any endorsement or affiliation unless separate written permission is obtained
- - Users are solely responsible for ensuring their use complies with applicable laws and regulations
+ OpenCUA (including the model, dataset, tools, and code) may be used for **research, educational, and commercial purposes** under the **MIT License** (see `LICENSE`).
 
  ## Important Notes on Coordinate Systems
  <div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">