Commit ab0f24b

docs: update README with new agent entries and research section
Add 12 new AI agent entries across Platform/Cloud/Desktop categories. Introduce new "Research Projects (Computer Use)" section with 8 academic projects. Update .gitignore to exclude .research directory.
1 parent 3eb8f20 commit ab0f24b

2 files changed

Lines changed: 32 additions & 1 deletion

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -72,4 +72,5 @@ build/
 # package-lock.json
 # yarn.lock
 # pnpm-lock.yaml
-docs/UPDATE_RULES.md
+docs/UPDATE_RULES.md
+.research

docs/readme.md

Lines changed: 30 additions & 0 deletions
@@ -583,6 +583,12 @@ Platforms and runtimes for running or connecting AI agents.
 | **Pico Claw** | Ultra-lightweight agent | Free (OSS) || Embedded/IoT deployments, single-binary under 10 MB ||
 | **Clam** | Compliance-focused agent | Free (OSS) || Regulated industries, detailed audit logs of agent decisions ||
 | **Taskllet** | No-code agent builder | Free (OSS) || Drag-and-drop workflow builder for non-developers ||
+| **AutoGPT** | Autonomous agent | Free (OSS) || Self-prompting GPT agent with memory, pioneer project | [🔗](https://github.com/Significant-Gravitas/AutoGPT) |
+| **BabyAGI** | Task-driven agent | Free (OSS) || Autonomous task creation and prioritization | [🔗](https://github.com/yoheinakajima/babyagi) |
+| **Suna** | Generalist agent | Free (OSS) || Versatile open-source agent for complex tasks (Kortix) | [🔗](https://github.com/kortix-ai/suna) |
+| **OWL** | Multi-agent framework | Free (OSS) || Distributed task automation (Camel-AI) | [🔗](https://github.com/camel-ai/owl) |
+| **CogAgent** | Vision GUI model | Free (Research) || High-performance vision-based GUI understanding (Tsinghua/Zhipu) | [🔗](https://github.com/THUDM/CogVLM2) |
+| **HyperAgent** | Code agent | Free (OSS) || GitHub issue resolution, repository-level code generation | [🔗](https://github.com/FSoft-AI4Code/HyperAgent) |
 
 #### Cloud Agent Services
 
@@ -613,6 +619,12 @@ Agents that run directly on your machine and interact with the OS, screen, keybo
 | **UFO** ||||| Windows-specific app automation | [🔗](https://github.com/microsoft/UFO) |
 | **Bytebot** ||||| Self-hosted (Docker), headless ||
 | **Microsoft Fara-7B** ||||| Open-weight vision grounding model | [🔗](https://github.com/microsoft/Fara) |
+| **UI-TARS** ||||| Autonomous GUI execution, vision-language-action model (ByteDance) | [🔗](https://github.com/bytedance/UI-TARS-desktop) |
+| **c/ua** ||||| Isolated VM environments, open-source CU infrastructure | [🔗](https://github.com/trycua/cua) |
+| **Windows-Use** ||||| Windows OS-specific agent automation | [🔗](https://github.com/CursorTouch/Windows-Use) |
+| **OpenCUA** ||||| Open foundations for computer-use agents | [🔗](https://github.com/xlang-ai/OpenCUA) |
+| **Devin** ||||| Full-stack software engineering agent (Cognition Labs) ||
+| **Ace** ||||| 20x human speed on UI tasks (General Agents) ||
 
 ##### Cloud / API Computer Use Agents
 
@@ -624,6 +636,9 @@ Agents accessed via API or cloud service — OS-independent, but require interne
 | **OpenAI Operator** | API || Guided browser and desktop computer use ||
 | **Amazon Nova Act** | API || AWS browser automation SDK ||
 | **Manus AI** | Cloud || General-purpose cloud agent ||
+| **Adept AI (ACT-1)** | API || Pioneer in digital actions, self-correcting behavior ||
+| **AskUI Vision Agent** | API || Cross-platform vision automation without VMs ||
+| **Highlight AI** | Desktop + Cloud || Privacy-first desktop Q&A and automation ||
 
 #### RPA & Visual Frameworks
 
@@ -640,6 +655,21 @@ Agents accessed via API or cloud service — OS-independent, but require interne
 | **Nut.js** | Cross-platform | Visual search, image matching ||
 | **OpenAdapt** | Windows, macOS | Learning from demonstration | [🔗](https://github.com/OpenAdaptAI/OpenAdapt) |
 
+#### Research Projects (Computer Use)
+
+Notable academic and industry research advancing the field of computer-use agents.
+
+| Project | Developer | Focus | Year | Paper |
+|---------|-----------|-------|------|-------|
+| **Gato** | Google DeepMind | Multi-modal, multi-task, multi-embodiment agent | 2022 | [DeepMind](https://deepmind.google/research/publications/60307/) |
+| **PaLM-E** | Google DeepMind | Embodied multimodal language model | 2023 | [arXiv](https://arxiv.org/abs/2303.03378) |
+| **RT-2** | Google DeepMind | Vision-language-action model for robotics | 2023 | [arXiv](https://arxiv.org/abs/2307.15818) |
+| **HuggingGPT (Jarvis)** | Microsoft | Orchestrates specialists for multi-modal tasks | 2023 | [arXiv](https://arxiv.org/abs/2303.17580) |
+| **SIMA** | Google DeepMind | Generalist AI agent for 3D virtual environments | 2024 | [DeepMind](https://deepmind.google/discover/blog/sima/) |
+| **Magma** | Microsoft Research | Vision-language-action foundation model | 2025 | [arXiv](https://arxiv.org/abs/2502.12256) |
+| **WebAgent** | Google DeepMind | Autonomous web browsing and form-filling | 2024 | [arXiv](https://arxiv.org/abs/2310.03685) |
+| **WebVoyager** | Hongliang He et al. | Autonomous web browsing (59.1% on 15-website benchmark) | 2024 | [arXiv](https://arxiv.org/abs/2401.13919) |
+
 ---
 
 ## Guides 📚
## Guides 📚
