Data Coverage

The General Data Platform is designed to reflect real-world user interactions on computers and mobile devices across diverse tools and environments. It currently includes 160.9K+ tasks, representing ~6 million actions across 1,500+ hours of recordings and 120+ tools. This breadth lets AI researchers train and evaluate Computer-Use Agents (CUA), Browser-Use Agents (BUA), and Mobile-Use Agents (MUA) at scale.

Environment Diversity

The dataset captures environment variability so that agents trained on it are more robust and less likely to overfit to a single setup.

OS variations

  • Tasks are recorded across Windows, macOS, and Android environments
  • iOS coverage is in progress as part of the ongoing mobile dataset expansion

Multiple screen resolutions

  • Recordings are collected across a variety of screen sizes and resolutions to capture layout and responsiveness differences

UI themes

  • Data includes recordings in Light mode, Dark mode, and Auto/system-controlled themes, reflecting real user settings

Video capture fidelity

  • All recordings are captured at 60 FPS, preserving fine cursor movements, UI transitions, and timing signals that are important for agent training
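At 60 FPS, consecutive frames are about 16.7 ms apart, which bounds how precisely an input event can be aligned with what was on screen. The sketch below shows one way to map an event timestamp to a frame index. It assumes timestamps are integer milliseconds measured from the start of the recording; the function names are hypothetical and do not describe the platform's actual data format.

```python
FPS = 60  # capture rate stated above


def frame_interval_ms(fps: int = FPS) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frame_index(event_time_ms: int, fps: int = FPS) -> int:
    """Index of the frame on screen at event_time_ms.

    Assumes event_time_ms is an integer number of milliseconds since
    the first frame (index 0). Integer arithmetic avoids floating-point
    rounding at frame boundaries.
    """
    if event_time_ms < 0:
        raise ValueError("event_time_ms must be non-negative")
    return event_time_ms * fps // 1000


if __name__ == "__main__":
    print(round(frame_interval_ms(), 2))
    for t in (0, 16, 17, 1000):
        print(t, frame_index(t))
```

A click logged at 17 ms falls in the second frame (index 1), while one logged at 16 ms still belongs to the first.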

Input Representation

  • The platform captures keyboard input at the raw keycode level
  • Raw keycodes are converted into meaningful, standardized keypress representations
  • This conversion preserves user intent rather than hardware-specific signals
  • The standardized keypress representations keep recordings consistent across different keyboards and layouts
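The conversion described above can be sketched as a per-platform lookup that maps raw keycodes to a shared key name. This is a minimal illustration, not the platform's implementation: the `Keypress` type, the `normalize` function, and the output key names are hypothetical, and the tables cover only three keys, using the standard Windows virtual-key codes, macOS virtual keycodes, and Android `KeyEvent` keycodes for them.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Illustrative subsets of each platform's raw keycode table.
KEYMAPS = {
    "windows": {0x0D: "Enter", 0x1B: "Escape", 0x41: "A"},  # virtual-key codes
    "macos": {0x24: "Enter", 0x35: "Escape", 0x00: "A"},    # virtual keycodes
    "android": {66: "Enter", 111: "Escape", 29: "A"},       # KeyEvent keycodes
}


@dataclass(frozen=True)
class Keypress:
    """Hypothetical standardized keypress: a key name plus sorted modifiers."""
    key: str
    modifiers: Tuple[str, ...] = ()


def normalize(platform: str, keycode: int, modifiers: Iterable[str] = ()) -> Keypress:
    """Map a raw, platform-specific keycode to a standardized Keypress."""
    table = KEYMAPS.get(platform)
    if table is None:
        raise ValueError(f"unsupported platform: {platform}")
    # Keep unmapped keycodes visible instead of silently dropping them.
    key = table.get(keycode, f"Unknown({keycode})")
    return Keypress(key, tuple(sorted(modifiers)))


if __name__ == "__main__":
    print(normalize("windows", 0x0D))
    print(normalize("macos", 0x24))
    print(normalize("android", 29, ["shift", "ctrl"]))
```

Under these assumptions, pressing Enter yields the same `Keypress` on all three platforms even though the raw keycodes differ, which is the hardware-independence the bullets above describe.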

Domains and Use Cases

The dataset covers both general-purpose tasks and specialized workflows across common software categories. It also includes Chain-of-Thought (CoT) tasks, in which each action is linked to the human reasoning behind it.

Diverse use cases covered include:

Office productivity

  • Document editing, spreadsheets, presentations, email workflows, and collaboration tasks

Enterprise SaaS

  • CRM and business workflows, internal tools, operational platforms, and administrative interfaces

Design and creative tools

  • Prototyping, design, image editing, and creative production workflows

Consumer applications

  • Social media, travel planning, shopping, and other consumer-facing tools

Coverage depth

  • Includes generic workflows that most users perform regardless of role or expertise
  • Includes niche specializations that require tool-specific expertise
  • Focuses on globally recognized tools that are widely adopted across regions and industries

Tool Coverage

The platform captures interactions across a broad and growing ecosystem of software tools, currently spanning 120+ tools.

Desktop applications

  • Native productivity, creative, coding, communication, and administrative tools
  • Executed primarily in Windows and macOS environments

Browser-based tools

  • SaaS platforms, dashboards, CRMs, web builders, and consumer web services
  • Provides strong coverage for training and evaluating Browser-Use Agent (BUA) systems

Mobile applications

  • Mobile workflows captured to support training Mobile-Use Agent (MUA) systems
  • The platform currently includes Android environments, with an iOS dataset under active curation

Coverage Growth

The General Data Platform is built to grow fast and stay relevant. Initial coverage began with trusted analytics tools such as Tableau, Looker, Metabase, Superset, and Google Data Studio, and expanded into broader real-world workflows as demand evolved.

Today, coverage continues to scale across coding, e-commerce, CMS, and other high-impact categories, alongside ongoing investment in new capture innovations to keep pace with evolving software ecosystems.

Continuous dataset expansion

  • On average, the platform produces 100,000 to 150,000 tasks per month, expanding coverage across tools, workflows, and domains
  • Growth is driven by a globally distributed recorder workforce that scales with demand, enabling rapid onboarding of new tools and fast expansion into emerging use cases

Recorder workforce expansion

  • The recorder workforce grows steadily, with 200 to 300 new recorders added per month
  • Recorders come from diverse professional backgrounds, including engineers, software developers, CAD professionals, and other domain specialists
  • As the workforce grows, more specialized workflows are captured and new tools are added

New tools added regularly

  • Workforce scaling enables faster introduction of new tools, deeper domain coverage, and higher-fidelity recordings
  • Ongoing expansion supports adoption of emerging tools, updated software versions, and new categories, keeping coverage current and industry-relevant

Custom data on request

  • Custom data collection is supported for specific tools, domains, task types, OS constraints, and environment requirements
  • Enables targeted workflows for partners and researchers, supporting both broad generalization and deep coverage for niche, high-value workflows

Human and Geographic Diversity

Recorder distribution

  • Supported by a global crowd network of approximately 1,500 to 2,000 remote workers
  • Contributors span regions including the United States, Canada, Australia, France, Germany, Greece, and Turkey, with more regions added as the network expands
  • Additionally, 70 to 80 in-office recorders are based at headquarters in India

Localization differences

Recordings naturally reflect differences in:

  • Local time and work schedules
  • Regional usage patterns
  • Interaction styles and pacing

This strengthens coverage for agents designed to operate reliably across diverse user populations.