Data Coverage
This section describes the measurable breadth of tasks, tools, environments, and workflows represented in the dataset.
Coverage is not a function of volume.
It is a function of how much of the real interaction space is captured in a form that is learnable, reproducible, and aligned with actual human execution.
Scale Metrics
The dataset is built to support model and agent training at large scale.
Task Creation
Feature-level completeness alone is a weak proxy for learning. Many features are rarely used, weakly tied to outcomes, or do not require meaningful reasoning. Optimizing for feature coverage alone leads to shallow interaction surfaces.
Coverage is built by breaking tools into foundational blocks and features:
- Function Mapping defines how users interact with the UI (navigation, layout, interaction patterns)
- Feature Mapping defines what the tool enables (its core capabilities or use cases)
Both mappings are covered through tasks at increasing levels of complexity.
Tasks are divided into five types:
- Elementary (1 step): single UI interactions covering basic actions
- Atomic (2–3 steps): short sequences covering simple feature usage
- Multi-Step (4–15 steps): combines multiple features to complete a specific goal
- Workflow (15+ steps): long-horizon execution with retries, backtracking, and context switching
- Games: interactive tasks with dynamic state changes and variable-length execution
Reasoning traces are aligned with these tasks to capture intent-based decision-making across steps.
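The step-count bands above can be read as a simple classification rule. The sketch below is illustrative only: `TaskType`, `classify_task`, and the `is_game` flag are hypothetical names, not part of the dataset's tooling. It also assumes that a task with exactly 15 steps counts as Multi-Step, since the listed bands (4–15 and 15+) overlap at that value.

```python
from enum import Enum


class TaskType(Enum):
    ELEMENTARY = "Elementary"
    ATOMIC = "Atomic"
    MULTI_STEP = "Multi-Step"
    WORKFLOW = "Workflow"
    GAMES = "Games"


def classify_task(step_count: int, is_game: bool = False) -> TaskType:
    """Map a task to a type using the step-count bands listed above.

    Assumption: exactly 15 steps is treated as Multi-Step, because the
    listed bands (4-15 and 15+) overlap at that boundary.
    """
    if step_count < 1:
        raise ValueError("a task needs at least one step")
    if is_game:
        # Games have variable-length execution, so step count is ignored.
        return TaskType.GAMES
    if step_count == 1:
        return TaskType.ELEMENTARY
    if step_count <= 3:
        return TaskType.ATOMIC
    if step_count <= 15:
        return TaskType.MULTI_STEP
    return TaskType.WORKFLOW
```

For example, `classify_task(7)` falls in the Multi-Step band, while `classify_task(40, is_game=True)` is classified as Games regardless of length.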
Outcome
- UI interaction space is fully covered
- Tool capabilities are exercised across increasing complexity
- Tasks capture both execution and reasoning behavior
Coverage
Each trajectory corresponds to one goal completion, in which a user moves from an intent to a verifiable outcome. Tasks are distributed across the core functions of user intent:
- Creation: generating documents, designs, structured content
- Modification: editing, formatting, transforming existing files
- Retrieval: search, navigation, information extraction
- Configuration: settings, permissions, system adjustments
- Transactional: booking, shortlisting, and submissions
- Communication: messaging, scheduling, collaboration
- Analytical: structured data, dashboards, reporting
Diversity
Task diversity emerges naturally throughout the data collection process from variation across multiple axes:
| Source of Variability | How it Varies |
|---|---|
| Execution | Multiple valid approaches exist for the same task, with operator-specific strategies, action ordering, and variation in decision paths and intermediate steps. |
| Cross-Tool Trajectories | Tasks span multiple tools and interfaces, requiring context, data, and documents to be carried across environments, often forming workflows that cannot be completed within a single tool. |
| Environment | Tasks are executed across different screen resolutions, device configurations, operating systems, UI layouts, themes (light/dark), and dynamic UI states such as popups, async loading, and conditional rendering. |
| Context | Execution begins from non-canonical starting states, including pre-existing files, sessions, and dashboards, where prior context and system conditions directly influence task behavior. |
| Interaction Diversity | Tasks involve varied input modalities including mouse, keyboard, and gestures, with differences in timing, latency, execution speed, and natural inclusion of retries, backtracking, and corrections. |
Domain Coverage
Domain coverage defines where tasks occur and how interaction patterns are distributed across real software ecosystems. Percentages are normalized against total task volume (~201K tasks).
| Domain | Tasks (% of Total) | Tool Count | Tools |
|---|---|---|---|
| Office Productivity | 102,600 (52.65%) | 46 | Apple Calendar, Apple Keynote, Apple Mail, Apple Numbers, Apple Pages, Asana, Confluence, Dropbox, Eval TaskSet, Evernote, Gmail, Google Calendar, Google Classroom, Google Docs, Google Drive, Google Forms, Google Keep, Google Meet, Google Sheets, Google Slides, iCloud, Monday.com, MS Excel, MS PowerPoint, MS Word, Multi-Tool, Notes, Notion, Obsidian, OneDrive, OneNote, Outlook, Outlook Calendar, Power BI, Seafile, Skype, SurveyMonkey, Trello, VS Code Theme Studio, Web Browsing, WeTransfer, WinRAR, WinZip, Zoho Notebook, Zoho Sprints, Zoom |
| Design & Creativity | 17,893 (9.18%) | 7 | Adobe Photoshop, Affinity Designer, Blender, Canva, CapCut, DaVinci Resolve, Figma |
| Coding & Development | 17,770 (9.12%) | 16 | Android Studio, Bitbucket, Docker, GitHub, Google Colab, Jenkins, Jira, Jupyter Notebook, Looker Studio, MDN Docs, Mode Analytics, MySQL, PyCharm, PythonAnywhere, Sublime Text, VS Code |
| Other Tools | 8,264 (4.24%) | 18 | 7-Zip, Adidas Running, Amazon Music, Audible, Audiomack, Calculator, Calendly, ChatGPT, Deezer, Google Contacts, Mailchimp, Outlook Contacts, Preview, Spotify, The Guardian, VLC Media Player, Wikipedia, YouTube Music |
| Travel & Booking | 8,165 (4.19%) | 6 | Airbnb, Booking.com, Expedia, Google Flights, Kayak, Skyscanner |
| Communication | 7,955 (4.08%) | 5 | Discord, Microsoft Teams, Slack, Teamwork, WhatsApp |
| Shopping / E-commerce | 7,348 (3.77%) | 7 | Amazon, Ikea, Instacart, Nike, Sephora, Walmart, Zara |
| CRM & Operations | 6,285 (3.23%) | 4 | HubSpot, Klaviyo, Salesforce, Zoho CRM |
| Browser & Web | 5,826 (2.99%) | 7 | Brave, Google, Google Chrome, Microsoft Edge, Mozilla Firefox, Opera, Safari |
| Web Design / CMS | 5,417 (2.78%) | 6 | Carrd, Squarespace, Tailwind UI, Webflow, Wix, WordPress |
| Social Media | 3,064 (1.57%) | 6 | Facebook, Imgur, Instagram, LinkedIn, X.com, YouTube |
| OS Control | 1,920 (0.99%) | 3 | macOS System Controls, Windows 10, Windows System Controls |
| Games | 1,481 (0.76%) | 26 | 8 Ball Pool, Angry Birds, Bridge, Checkers, Chess.com, Go, Go (Game), Happy Glass, Ludo, Mario, Plants vs. Zombies, Purble Place, Richup.io, Scribble, Smash Karts, Snake and Ladder, Spider Solitaire, Stumble Guys, Subway Surfers, Sudoku, Temple Run, Tetris, Tic Tac Toe, UNO, Water Sort, Wordle |
| Maps | 421 (0.22%) | 2 | Apple Maps, Google Maps |
| Benchmark | 338 (0.17%) | 4 | Mind2Web, OSWorld, SheetBench, WebVoyager |
| Cloud / Infra | 112 (0.06%) | 3 | AWS, Azure, GCP |
| Engineering Design | 3,698 (1.83%) | 2 | SolidWorks, Fusion 360 |
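The per-domain shares can be re-derived from the task counts in the table. The following is a minimal sketch, assuming the denominator is the sum of the listed rows (198,557) rather than the rounded ~201K total quoted above, so the recomputed values may differ slightly from the printed percentages. `TASKS_PER_DOMAIN` and `domain_shares` are illustrative names only.

```python
# Task counts copied from the domain table above.
TASKS_PER_DOMAIN = {
    "Office Productivity": 102_600,
    "Design & Creativity": 17_893,
    "Coding & Development": 17_770,
    "Other Tools": 8_264,
    "Travel & Booking": 8_165,
    "Communication": 7_955,
    "Shopping / E-commerce": 7_348,
    "CRM & Operations": 6_285,
    "Browser & Web": 5_826,
    "Web Design / CMS": 5_417,
    "Social Media": 3_064,
    "OS Control": 1_920,
    "Games": 1_481,
    "Maps": 421,
    "Benchmark": 338,
    "Cloud / Infra": 112,
    "Engineering Design": 3_698,
}


def domain_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each domain's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {domain: round(100 * n / total, 2) for domain, n in counts.items()}


if __name__ == "__main__":
    shares = domain_shares(TASKS_PER_DOMAIN)
    # Print domains from largest to smallest share.
    for domain, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{domain:<24}{TASKS_PER_DOMAIN[domain]:>9,}{pct:>8.2f}%")
```

Normalizing against the sum of the rows keeps the shares adding up to 100%, which is useful when the table is extended with new domains.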
Dataset Signals
- 52.65% of total tasks are concentrated in office productivity workflows, reflecting the highest-density real-world interaction environments.
- 100+ tools across 17 domains are covered, providing broad interface diversity across software ecosystems.
- 20%+ of tasks come from structured domains (coding, CRM, design), where execution involves dependencies, multi-step reasoning, and planning.
- Nearly 10% of tasks come from dynamic domains (shopping, travel, social), introducing non-deterministic UI behavior, layout shifts, and content variability.
- About 4% of tasks are explicitly categorized under system-level and browser interactions, capturing navigation, control flow, and environment transitions. These interaction patterns are also implicitly present across the broader dataset through web-based and desktop workflows.
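The grouped signals can be checked by summing the percentages printed in the domain table. The sketch below is illustrative, and the mapping from each signal to table rows is an assumption: "design" is taken to mean Design & Creativity, and "system-level and browser" to mean OS Control plus Browser & Web. `DOMAIN_SHARE_PCT`, `SIGNAL_GROUPS`, and `group_share` are hypothetical names.

```python
# Percentages as printed in the domain table (only the rows used below).
DOMAIN_SHARE_PCT = {
    "Coding & Development": 9.12,
    "CRM & Operations": 3.23,
    "Design & Creativity": 9.18,
    "Shopping / E-commerce": 3.77,
    "Travel & Booking": 4.19,
    "Social Media": 1.57,
    "OS Control": 0.99,
    "Browser & Web": 2.99,
}

# Assumed mapping from each signal to table rows (not stated in the source).
SIGNAL_GROUPS = {
    "structured": ["Coding & Development", "CRM & Operations", "Design & Creativity"],
    "dynamic": ["Shopping / E-commerce", "Travel & Booking", "Social Media"],
    "system_and_browser": ["OS Control", "Browser & Web"],
}


def group_share(group: str) -> float:
    """Sum the printed percentages of the domains in one signal group."""
    return round(sum(DOMAIN_SHARE_PCT[d] for d in SIGNAL_GROUPS[group]), 2)


if __name__ == "__main__":
    for name in SIGNAL_GROUPS:
        print(f"{name}: {group_share(name):.2f}%")
```

Under these assumptions the three groups come to 21.53%, 9.53%, and 3.98% of tasks respectively.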