Data Coverage

Explains the measurable breadth of tasks, tools, environments, and workflows represented in the dataset.

Coverage is not a function of volume.

It is a function of how much of the real interaction space is captured in a form that is learnable, reproducible, and aligned with actual human execution.

Scale Metrics

The dataset is sized for large-scale model and agent training.

  • Trajectories: 201K+
  • State Action Pairs: ~8.9M
  • Recorded Hours: 3044
  • Tools: 150+
  • Average Step Count: ~43.89
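As a rough cross-check, the headline figures can be combined into per-trajectory and per-step averages. The sketch below is illustrative only: it treats "201K+" as 201,000 and "~8.9M" as 8,900,000, which are assumptions made for the arithmetic, so every derived value is approximate.

```python
# Headline figures as published above. "201K+" and "~8.9M" are rounded,
# so the exact integers used here are assumptions for illustration.
TRAJECTORIES = 201_000
STATE_ACTION_PAIRS = 8_900_000
RECORDED_HOURS = 3_044


def derived_metrics(trajectories: int, pairs: int, hours: float) -> dict:
    """Return simple per-trajectory and per-step averages."""
    seconds = hours * 3600
    return {
        "pairs_per_trajectory": pairs / trajectories,
        "seconds_per_trajectory": seconds / trajectories,
        "seconds_per_pair": seconds / pairs,
    }


metrics = derived_metrics(TRAJECTORIES, STATE_ACTION_PAIRS, RECORDED_HOURS)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these rounded inputs, the ratio of state-action pairs to trajectories comes out near 44.3, which is close to the reported average step count of ~43.89. The small gap is what one would expect from rounding in the headline numbers.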

Task Creation

Feature-level completeness alone is a weak proxy for learning. Many features are rarely used, weakly tied to outcomes, or do not require meaningful reasoning. Optimizing for feature coverage alone leads to shallow interaction surfaces.

Coverage is built by breaking tools into foundational blocks and features:

  • Function Mapping defines how users interact with the UI (navigation, layout, interaction patterns)
  • Feature Mapping defines what the tool enables (its core capabilities or use cases)

Both mappings are covered through tasks at increasing levels of complexity.

Tasks are divided into five types:

  • Elementary (1 step): single UI interactions covering basic actions
  • Atomic (2–3 steps): short sequences covering simple feature usage
  • Multi-Step (4–15 steps): combines multiple features to complete a specific goal
  • Workflow (15+ steps): long-horizon execution with retries, backtracking, and context switching
  • Games: interactive tasks with dynamic state changes and variable-length execution

Reasoning traces are aligned with these tasks to capture intent-based decision-making across steps.
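The step ranges above can be turned into a simple labeling rule. The following is a minimal sketch, not the dataset's actual tooling: the function name `classify_task` and the `is_game` flag are hypothetical. Because the published ranges overlap at 15 steps (Multi-Step is 4–15, Workflow is 15+), the sketch assumes that exactly 15 steps counts as Multi-Step.

```python
def classify_task(step_count: int, is_game: bool = False) -> str:
    """Map a task to one of the five task types by its step count.

    Assumptions (not stated in the source):
      * a task with exactly 15 steps is labeled Multi-Step;
      * Games are identified by a flag, since their length is variable.
    """
    if step_count < 1:
        raise ValueError("step_count must be at least 1")
    if is_game:
        return "Games"
    if step_count == 1:
        return "Elementary"
    if step_count <= 3:
        return "Atomic"
    if step_count <= 15:
        return "Multi-Step"
    return "Workflow"


for steps in (1, 3, 4, 15, 44):
    print(steps, classify_task(steps))
```

Under this rule, a trajectory at the reported average length of roughly 44 steps falls in the Workflow type.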

Outcome

  • UI interaction space is fully covered
  • Tool capabilities are exercised across increasing complexity
  • Tasks capture both execution and reasoning behavior

Coverage

Each trajectory corresponds to one goal completion, in which a user moves from an intent to a verifiable outcome. Tasks are distributed across the core categories of user intent:

  • Creation: Generating documents, designs, structured content
  • Modification: Editing, formatting, transforming existing files
  • Retrieval: Search, navigation, information extraction
  • Configuration: Settings, permissions, system adjustments
  • Transactional: Booking, shortlisting, and submissions
  • Communication: Messaging, scheduling, collaboration
  • Analytical: Structured data, dashboards, reporting

Diversity

Task diversity emerges naturally throughout the data collection process from variation across multiple axes:

| Source of Variability | How it Varies |
| --- | --- |
| Execution | Multiple valid approaches exist for the same task, with operator-specific strategies, action ordering, and variation in decision paths and intermediate steps. |
| Cross-Tool Trajectories | Tasks span multiple tools and interfaces, requiring transition of context, data, and documents across environments, often forming workflows that cannot be completed within a single tool. |
| Environment | Tasks are executed across different screen resolutions, device configurations, operating systems, UI layouts, themes (light/dark), and dynamic UI states such as popups, async loading, and conditional rendering. |
| Context | Execution begins from non-canonical starting states, including pre-existing files, sessions, and dashboards, where prior context and system conditions directly influence task behavior. |
| Interaction Diversity | Tasks involve varied input modalities including mouse, keyboard, and gestures, with differences in timing, latency, execution speed, and natural inclusion of retries, backtracking, and corrections. |

Domain Coverage

Domain coverage defines where tasks occur and how interaction patterns are distributed across real software ecosystems. Percentages are normalized against total task volume (~201K tasks).

| Domain | Tasks (% of Total) | Tool Count | Tools |
| --- | --- | --- | --- |
| Office Productivity | 102,600 (52.65%) | 46 | Apple Calendar, Apple Keynote, Apple Mail, Apple Numbers, Apple Pages, Asana, Confluence, Dropbox, Eval TaskSet, Evernote, Gmail, Google Calendar, Google Classroom, Google Docs, Google Drive, Google Forms, Google Keep, Google Meet, Google Sheets, Google Slides, iCloud, Monday.com, MS Excel, MS PowerPoint, MS Word, Multi-Tool, Notes, Notion, Obsidian, OneDrive, OneNote, Outlook, Outlook Calendar, Power BI, Seafile, Skype, SurveyMonkey, Trello, VS Code Theme Studio, Web Browsing, WeTransfer, WinRAR, WinZip, Zoho Notebook, Zoho Sprints, Zoom |
| Design & Creativity | 17,893 (9.18%) | 7 | Adobe Photoshop, Affinity Designer, Blender, Canva, CapCut, DaVinci Resolve, Figma |
| Coding & Development | 17,770 (9.12%) | 16 | Android Studio, Bitbucket, Docker, GitHub, Google Colab, Jenkins, Jira, Jupyter Notebook, Looker Studio, MDN Docs, Mode Analytics, MySQL, PyCharm, PythonAnywhere, Sublime Text, VS Code |
| Other Tools | 8,264 (4.24%) | 18 | 7-Zip, Adidas Running, Amazon Music, Audible, Audiomack, Calculator, Calendly, ChatGPT, Deezer, Google Contacts, Mailchimp, Outlook Contacts, Preview, Spotify, The Guardian, VLC Media Player, Wikipedia, YouTube Music |
| Travel & Booking | 8,165 (4.19%) | 6 | Airbnb, Booking.com, Expedia, Google Flights, Kayak, Skyscanner |
| Communication | 7,955 (4.08%) | 5 | Discord, Microsoft Teams, Slack, Teamwork, WhatsApp |
| Shopping / E-commerce | 7,348 (3.77%) | 7 | Amazon, Ikea, Instacart, Nike, Sephora, Walmart, Zara |
| CRM & Operations | 6,285 (3.23%) | 4 | HubSpot, Klaviyo, Salesforce, Zoho CRM |
| Browser & Web | 5,826 (2.99%) | 7 | Brave, Google, Google Chrome, Microsoft Edge, Mozilla Firefox, Opera, Safari |
| Web Design / CMS | 5,417 (2.78%) | 6 | Carrd, Squarespace, Tailwind UI, Webflow, Wix, WordPress |
| Social Media | 3,064 (1.57%) | 6 | Facebook, Imgur, Instagram, LinkedIn, X.com, YouTube |
| OS Control | 1,920 (0.99%) | 3 | macOS System Controls, Windows 10, Windows System Controls |
| Games | 1,481 (0.76%) | 26 | 8 Ball Pool, Angry Birds, Bridge, Checkers, Chess.com, Go, Go (Game), Happy Glass, Ludo, Mario, Plants vs. Zombies, Purble Place, Richup.io, Scribble, Smash Karts, Snake and Ladder, Spider Solitaire, Stumble Guys, Subway Surfers, Sudoku, Temple Run, Tetris, Tic Tac Toe, UNO, Water Sort, Wordle |
| Maps | 421 (0.22%) | 2 | Apple Maps, Google Maps |
| Benchmark | 338 (0.17%) | 4 | Mind2Web, OSWorld, SheetBench, WebVoyager |
| Cloud / Infra | 112 (0.06%) | 3 | AWS, Azure, GCP |
| Engineering Design | 3,698 (1.83%) | 2 | SolidWorks, Fusion 360 |
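The share column can be recomputed from the task counts alone. The sketch below is a worked arithmetic example, not the method used to produce the published percentages. It normalizes against the sum of the 17 listed rows (198,557), which differs slightly from the ~201K total quoted above, so the recomputed shares will not match the published column exactly. The variable names are illustrative.

```python
# Task counts per domain, copied from the table above.
TASK_COUNTS = {
    "Office Productivity": 102_600,
    "Design & Creativity": 17_893,
    "Coding & Development": 17_770,
    "Other Tools": 8_264,
    "Travel & Booking": 8_165,
    "Communication": 7_955,
    "Shopping / E-commerce": 7_348,
    "CRM & Operations": 6_285,
    "Browser & Web": 5_826,
    "Web Design / CMS": 5_417,
    "Social Media": 3_064,
    "OS Control": 1_920,
    "Games": 1_481,
    "Maps": 421,
    "Benchmark": 338,
    "Cloud / Infra": 112,
    "Engineering Design": 3_698,
}

# Assumption: normalize against the sum of listed rows, not the ~201K headline.
total = sum(TASK_COUNTS.values())
shares = {domain: 100 * count / total for domain, count in TASK_COUNTS.items()}

print(f"Total listed tasks: {total:,}")
for domain, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{domain:<24} {TASK_COUNTS[domain]:>8,}  {share:6.2f}%")
```

Normalizing against the row sum guarantees that the shares add up to 100%, which makes it a convenient consistency check whenever rows are added to the table.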

Dataset Signals

  • About 53% of total tasks (52.65% in the table above) are concentrated in office productivity workflows, reflecting the highest-density real-world interaction environments.
  • 150+ tools across the 17 domains listed above are covered, providing broad interface diversity across software ecosystems.
  • 20%+ of tasks come from structured domains (coding, CRM, design), where execution involves dependencies, multi-step reasoning, and planning.
  • Roughly 9.5% of tasks come from dynamic domains (shopping, travel, social), introducing non-deterministic UI behavior, layout shifts, and content variability.
  • About 4% of tasks are explicitly categorized under system-level and browser interactions, capturing navigation, control flow, and environment transitions. These interaction patterns are also implicitly present across the broader dataset through web-based and desktop workflows.
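The grouped signals can be checked by summing the published per-domain percentages. This is a small sketch under stated assumptions: the mapping from each group to table rows is inferred from the parenthetical labels in the bullets (for example, "design" is taken to mean Design & Creativity only), and the group names are hypothetical.

```python
# Published shares (% of total), copied from the Domain Coverage table.
PUBLISHED_SHARE = {
    "Coding & Development": 9.12,
    "CRM & Operations": 3.23,
    "Design & Creativity": 9.18,
    "Shopping / E-commerce": 3.77,
    "Travel & Booking": 4.19,
    "Social Media": 1.57,
    "OS Control": 0.99,
    "Browser & Web": 2.99,
}

# Assumed group membership; the source names the groups only loosely.
GROUPS = {
    "structured": ["Coding & Development", "CRM & Operations", "Design & Creativity"],
    "dynamic": ["Shopping / E-commerce", "Travel & Booking", "Social Media"],
    "system_and_browser": ["OS Control", "Browser & Web"],
}

group_share = {
    group: round(sum(PUBLISHED_SHARE[d] for d in domains), 2)
    for group, domains in GROUPS.items()
}

for group, share in group_share.items():
    print(f"{group}: {share:.2f}%")
```

Changing the assumed membership, for instance by adding Web Design / CMS or Engineering Design to the structured group, shifts the totals accordingly.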