@@ -583,6 +583,12 @@ Platforms and runtimes for running or connecting AI agents.
 | **Pico Claw** | Ultra-lightweight agent | Free (OSS) | ✅ | Embedded/IoT deployments, single-binary under 10 MB | ❌ |
 | **Clam** | Compliance-focused agent | Free (OSS) | ✅ | Regulated industries, detailed audit logs of agent decisions | ❌ |
 | **Taskllet** | No-code agent builder | Free (OSS) | ✅ | Drag-and-drop workflow builder for non-developers | ❌ |
+| **AutoGPT** | Autonomous agent | Free (OSS) | ✅ | Self-prompting GPT agent with memory, pioneer project | [🔗](https://github.com/Significant-Gravitas/AutoGPT) |
+| **BabyAGI** | Task-driven agent | Free (OSS) | ✅ | Autonomous task creation and prioritization | [🔗](https://github.com/yoheinakajima/babyagi) |
+| **Suna** | Generalist agent | Free (OSS) | ✅ | Versatile open-source agent for complex tasks (Kortix) | [🔗](https://github.com/kortix-ai/suna) |
+| **OWL** | Multi-agent framework | Free (OSS) | ✅ | Distributed task automation (Camel-AI) | [🔗](https://github.com/camel-ai/owl) |
+| **CogAgent** | Vision GUI model | Free (Research) | ✅ | High-performance vision-based GUI understanding (Tsinghua/Zhipu) | [🔗](https://github.com/THUDM/CogVLM2) |
+| **HyperAgent** | Code agent | Free (OSS) | ✅ | GitHub issue resolution, repository-level code generation | [🔗](https://github.com/FSoft-AI4Code/HyperAgent) |
 
 #### Cloud Agent Services
 
@@ -613,6 +619,12 @@ Agents that run directly on your machine and interact with the OS, screen, keybo
 | **UFO** | ✅ | ❌ | ❌ | ✅ | Windows-specific app automation | [🔗](https://github.com/microsoft/UFO) |
 | **Bytebot** | ❌ | ❌ | ✅ | ✅ | Self-hosted (Docker), headless | ❌ |
 | **Microsoft Fara-7B** | ✅ | ✅ | ✅ | ✅ | Open-weight vision grounding model | [🔗](https://github.com/microsoft/Fara) |
+| **UI-TARS** | ✅ | ✅ | ✅ | ✅ | Autonomous GUI execution, vision-language-action model (ByteDance) | [🔗](https://github.com/bytedance/UI-TARS-desktop) |
+| **c/ua** | ✅ | ✅ | ✅ | ✅ | Isolated VM environments, open-source computer-use (CU) infrastructure | [🔗](https://github.com/trycua/cua) |
+| **Windows-Use** | ✅ | ❌ | ❌ | ✅ | Windows OS-specific agent automation | [🔗](https://github.com/CursorTouch/Windows-Use) |
+| **OpenCUA** | ✅ | ✅ | ✅ | ✅ | Open foundations for computer-use agents | [🔗](https://github.com/xlang-ai/OpenCUA) |
+| **Devin** | ✅ | ✅ | ✅ | ✅ | Full-stack software engineering agent (Cognition Labs) | ❌ |
+| **Ace** | ✅ | ✅ | ✅ | ✅ | Claimed 20x human speed on UI tasks (General Agents) | ❌ |
 
 ##### Cloud / API Computer Use Agents
 
@@ -624,6 +636,9 @@ Agents accessed via API or cloud service — OS-independent, but require interne
 | **OpenAI Operator** | API | ✅ | Guided browser and desktop computer use | ❌ |
 | **Amazon Nova Act** | API | ✅ | AWS browser automation SDK | ❌ |
 | **Manus AI** | Cloud | ✅ | General-purpose cloud agent | ❌ |
+| **Adept AI (ACT-1)** | API | ✅ | Pioneer in digital actions, self-correcting behavior | ❌ |
+| **AskUI Vision Agent** | API | ✅ | Cross-platform vision automation without VMs | ❌ |
+| **Highlight AI** | Desktop + Cloud | ✅ | Privacy-first desktop Q&A and automation | ❌ |
 
 #### RPA & Visual Frameworks
 
@@ -640,6 +655,21 @@ Agents accessed via API or cloud service — OS-independent, but require interne
 | **Nut.js** | Cross-platform | Visual search, image matching | ❌ |
 | **OpenAdapt** | Windows, macOS | Learning from demonstration | [🔗](https://github.com/OpenAdaptAI/OpenAdapt) |
 
+#### Research Projects (Computer Use)
+
+Notable academic and industry research advancing the field of computer-use agents.
+
+| Project | Developer | Focus | Year | Paper |
+|---------|-----------|-------|------|-------|
+| **Gato** | Google DeepMind | Multi-modal, multi-task, multi-embodiment agent | 2022 | [DeepMind](https://deepmind.google/research/publications/60307/) |
+| **PaLM-E** | Google DeepMind | Embodied multimodal language model | 2023 | [arXiv](https://arxiv.org/abs/2303.03378) |
+| **RT-2** | Google DeepMind | Vision-language-action model for robotics | 2023 | [arXiv](https://arxiv.org/abs/2307.15818) |
+| **HuggingGPT (Jarvis)** | Zhejiang University, Microsoft Research Asia | LLM controller that orchestrates specialist Hugging Face models for multi-modal tasks | 2023 | [arXiv](https://arxiv.org/abs/2303.17580) |
+| **SIMA** | Google DeepMind | Generalist AI agent for 3D virtual environments | 2024 | [DeepMind](https://deepmind.google/discover/blog/sima/) |
+| **Magma** | Microsoft Research | Vision-language-action foundation model | 2025 | [arXiv](https://arxiv.org/abs/2502.12256) |
+| **WebAgent** | Google DeepMind | Autonomous web browsing and form-filling | 2024 | [arXiv](https://arxiv.org/abs/2310.03685) |
+| **WebVoyager** | Hongliang He et al. | Autonomous web browsing (59.1% task success rate on a 15-website benchmark) | 2024 | [arXiv](https://arxiv.org/abs/2401.13919) |
+
 ---
 
 ## Guides 📚