Temporal Semantics

Temporal Semantics ensures precise alignment between actions, visual states, and time in General Data demonstrations. All interactions are first captured in a raw action log that records every low-level mouse, keyboard, and touch event in exact execution order using millisecond-precision timestamps.

These raw events are then consolidated into grouped actions. Each grouped action combines related low-level events into a single interaction unit representing one user intent. For example, a click groups a mouse-down and a mouse-up event, and a drag groups a drag-start and a drag-end event.
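A minimal Python sketch of this grouping step, assuming a hypothetical `RawEvent` record with snake_case event names (`mouse_down`, `mouse_up`, `drag_start`, `drag_move`, `drag_end`). The event names, the `PAIRS` table, and `group_events` are illustrative and not part of the documented log format; the sketch only handles the click and drag pairings described above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RawEvent:
    kind: str   # hypothetical event name, e.g. "mouse_down"
    t_ms: int   # millisecond-precision timestamp from the raw action log

@dataclass
class GroupedAction:
    action: str      # "click" or "drag"
    start_ms: int    # timestamp of the start event
    end_ms: int      # timestamp of the end event
    sub_events: List[RawEvent] = field(default_factory=list)

# Assumed mapping: start event kind -> (matching end event kind, grouped action name).
PAIRS = {
    "mouse_down": ("mouse_up", "click"),
    "drag_start": ("drag_end", "drag"),
}

def group_events(events: List[RawEvent]) -> List[GroupedAction]:
    """Combine raw events into grouped actions, keeping every sub-event in order."""
    grouped: List[GroupedAction] = []
    open_group = None  # (expected end kind, action name, collected sub-events)
    for ev in events:
        if open_group is None:
            if ev.kind in PAIRS:
                end_kind, name = PAIRS[ev.kind]
                open_group = (end_kind, name, [ev])
            # Events outside any start/end pair are ignored in this sketch.
        else:
            end_kind, name, subs = open_group
            subs.append(ev)  # intermediate events (e.g. drag_move) are kept too
            if ev.kind == end_kind:
                grouped.append(GroupedAction(name, subs[0].t_ms, ev.t_ms, subs))
                open_group = None
    return grouped
```

Given a mouse-down at 1000 ms, a mouse-up at 1085 ms, and a drag from 2000 ms to 2310 ms with one `drag_move` in between, `group_events` returns a click spanning 1000 to 1085 ms and a drag spanning 2000 to 2310 ms whose three sub-events remain available for inspection.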

Pre-action and post-action frames are generated only for grouped actions, not for every raw event. This event-driven design ensures consistent action–frame–video alignment while preserving true timing and ordering for inspection and model training.

Capture Strategy

  • Frames are captured on every grouped interaction event, not at fixed intervals
  • Frame capture is event-driven, triggered by mouse, keyboard, drag, scroll, or touch actions
  • Video is recorded at 60 FPS, but supervision frames are selected based on action timing
  • No fixed-FPS frame sampling is used for action–state alignment
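Because frames are chosen by action timing rather than sampled at a fixed rate, a pre- or post-action frame can be located in the 60 FPS recording by converting an action's millisecond timestamp into a frame index. This sketch assumes that the video and the action log share one clock, that the hypothetical `video_start_ms` is the log timestamp of frame 0, and that frame i starts at i × 1000 / 60 ms. The selection rule (last frame strictly before the start, first frame at or after the end) is an illustrative choice, not a documented one.

```python
FPS = 60  # video frame rate stated in the capture strategy

def pre_action_frame(start_ms: int, video_start_ms: int) -> int:
    """Index of the last frame whose start time is strictly before start_ms."""
    rel = start_ms - video_start_ms
    if rel <= 0:
        raise ValueError("no video frame precedes this action")
    # Frame i starts at i * 1000 / FPS ms, so i * 1000 < rel * FPS
    # holds exactly when i <= (rel * FPS - 1) // 1000.
    return (rel * FPS - 1) // 1000

def post_action_frame(end_ms: int, video_start_ms: int) -> int:
    """Index of the first frame whose start time is at or after end_ms."""
    rel = end_ms - video_start_ms
    if rel < 0:
        raise ValueError("action ends before the video starts")
    # Smallest i with i * 1000 >= rel * FPS (integer ceiling division).
    return (rel * FPS + 999) // 1000
```

For a recording that starts at 0 ms, a click from 1000 ms to 1085 ms maps to pre-action frame 59 and post-action frame 66. Integer floor and ceiling division avoid floating-point rounding errors at frame boundaries.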

State Representation

  • The pre-action frame is captured immediately before an interaction begins
  • The post-action frame is captured immediately after the interaction completes
  • Each frame is tied to an explicit start or end event of its grouped action
  • Each frame is linked to the exact action timestamp in milliseconds
  • Visual state, action event, and timestamp share the same Task ID (TID)
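One way to represent this linkage is a record that bundles a grouped action with its two frames and rejects any mismatch in Task ID or timestamp. The `Frame` and `ActionRecord` types and their field names are hypothetical; the sketch only encodes the rules listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    tid: str          # Task ID (TID) shared with the action
    video_index: int  # hypothetical: index of the frame in the recorded video
    t_ms: int         # action timestamp (ms) this frame is linked to

@dataclass(frozen=True)
class ActionRecord:
    tid: str
    action: str
    start_ms: int
    end_ms: int
    pre_frame: Frame
    post_frame: Frame

    def __post_init__(self) -> None:
        # Visual state, action event, and timestamp must share one Task ID.
        if not (self.pre_frame.tid == self.post_frame.tid == self.tid):
            raise ValueError("frames must share the action's Task ID")
        # Pre-action frame is tied to the start event, post-action frame to the end event.
        if self.pre_frame.t_ms != self.start_ms or self.post_frame.t_ms != self.end_ms:
            raise ValueError("frames must be linked to the action's start and end timestamps")
```

Making the records immutable (`frozen=True`) means a validated record cannot later be edited into an inconsistent state.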

Action Timing

  • All actions are logged with millisecond-level timestamps
  • Time gaps between actions preserve natural human latency
  • No smoothing, interpolation, or time normalization is applied
  • Consecutive actions are never synthetically manipulated or simulated
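Because no smoothing or normalization is applied, the latency between two actions is simply the difference of their logged timestamps. A minimal sketch, assuming each grouped action is given as a hypothetical `(start_ms, end_ms)` tuple in logged order:

```python
from typing import List, Tuple

def inter_action_gaps(actions: List[Tuple[int, int]]) -> List[int]:
    """Idle time in ms between the end of each action and the start of the next.

    Gaps come straight from the logged timestamps: nothing is smoothed,
    interpolated, or rescaled, so natural human latency is preserved.
    """
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(actions, actions[1:])
    ]
```

For actions at (1000, 1085), (2000, 2310), and (2950, 3020), the gaps are 915 ms and 640 ms, exactly as recorded.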

Known Edge Cases

In some cases the UI does not update instantly or in sync with the user's action:

  • UI animations: visual changes, such as button animations or transitions, may continue after the action completes.

  • Loading delays: the UI may update only after a delay while data is processed or loaded.

  • Asynchronous UI updates: the UI may change in the background without any direct user action.

  • Network-dependent state changes: update delays vary with connectivity and server response time.

Enforcement Rules

  • Every action must have:

    • A start timestamp
    • An end timestamp
    • A pre-action frame of the grouped action
    • A post-action frame of the grouped action
  • Timestamps must be strictly monotonic within a task

  • Grouped actions must expose all internal sub-events

  • Direction, duration, coordinates, and key identity must be explicitly logged

  • Missing frames, logs, or timestamps result in automatic rejection of the task

  • Frame–action misalignment beyond the defined thresholds invalidates the task

  • No inferred, derived, or interpolated timestamps are permitted
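These enforcement rules lend themselves to an automated check. The sketch below covers a subset of them (missing values, inferred timestamps, exposed sub-events, strict monotonicity, and frame–action alignment); it does not check direction, duration, coordinates, or key identity. `LoggedAction`, the `inferred` flag, and `MAX_MISALIGNMENT_MS = 20` are hypothetical, since the actual thresholds are not stated in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical threshold: the real "defined thresholds" are not given here.
MAX_MISALIGNMENT_MS = 20

@dataclass
class LoggedAction:
    start_ms: Optional[int]
    end_ms: Optional[int]
    pre_frame_ms: Optional[int]   # capture time of the pre-action frame
    post_frame_ms: Optional[int]  # capture time of the post-action frame
    sub_events: List[Tuple[str, int]] = field(default_factory=list)  # (kind, t_ms)
    inferred: bool = False        # hypothetical flag: timestamp was derived, not logged

def validate_task(actions: List[LoggedAction]) -> List[str]:
    """Return rule violations for one task; an empty list means it passes."""
    errors: List[str] = []
    prev_ms: Optional[int] = None
    for i, a in enumerate(actions):
        values = (a.start_ms, a.end_ms, a.pre_frame_ms, a.post_frame_ms)
        if any(v is None for v in values):
            errors.append(f"action {i}: missing timestamp or frame")
            continue  # timing checks need all four values
        if a.inferred:
            errors.append(f"action {i}: inferred timestamps are not permitted")
        if not a.sub_events:
            errors.append(f"action {i}: internal sub-events not exposed")
        # Each start and end must be strictly later than the previous timestamp.
        for t in (a.start_ms, a.end_ms):
            if prev_ms is not None and t <= prev_ms:
                errors.append(f"action {i}: timestamp {t} is not strictly monotonic")
            prev_ms = t
        if abs(a.pre_frame_ms - a.start_ms) > MAX_MISALIGNMENT_MS:
            errors.append(f"action {i}: pre-action frame misaligned")
        if abs(a.post_frame_ms - a.end_ms) > MAX_MISALIGNMENT_MS:
            errors.append(f"action {i}: post-action frame misaligned")
    return errors
```

A task passes only when `validate_task` returns an empty list; each returned message corresponds to one of the rejection conditions above.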