
Zero QA Architecture

What Users Need to Know

From a user's point of view, Zero QA is simple:

  • you provide suites in business language
  • you define suite dependencies in zero-qa.yaml
  • you can provide an ordered initialization script list in zero-qa.yaml
  • Zero QA runs executors through an Agent runtime such as Codex or Claude Code
  • zero-qa ui can expose the shared workspace through a local read-only dashboard
  • mobile and browser connections are already available out of the box

How Zero QA Works

mermaid
flowchart TD
    subgraph O[Only Once]
        A[Business Project<br/>zero-qa.yaml] --> B

        subgraph B["Scenario + Target Suites"]
            T1["Suite A (Usage doc)<br/>how to use Feature A<br/>e.g. log in"]
            T2["Suite B (Usage doc)<br/>how to use Feature B<br/>e.g. browse products"]
            T3["Suite Z (Usage doc)<br/>how to use Feature Z<br/>e.g. place an order"]
            T1 -->|"prerequisite of"| T2
            T2 -->|"prerequisite of"| T3
        end

        C[DAG Planning<br/>validate + topological order]
        D[Executor Preflight<br/>executor.yaml + host tools]
        E[Project Init Scripts<br/>pre_test_scripts]
        R[Runtime Inputs Resolution<br/>scenario vars + resolve commands]

        B --> C
        C --> D
        D --> E
        E --> R
    end

    subgraph L[Loop Suites]
        F[One Planned Suite]
        G[Per-Suite Workspace]
        H[Per-Suite Execution<br/>Agent runtime + executor skill]
        I[Android or iPhone / Browser]
        J[Suite Result<br/>PASS or FAILED]
        W[Shared Run Workspace<br/>run_meta.json + suites/* + results/summary.json]

        F --> G
        G --> H
        H --> I
        I --> H
        H --> J
        G --> W
        J --> W
    end

    subgraph N[Run Finalization]
        M[Project Finalization Scripts<br/>post_test_scripts]
        K[Run Summary<br/>selected suites + durations]
        P[Post-run Hooks<br/>best-effort cleanup]
        M --> K
        K --> P
        K --> W
    end

    subgraph U[Read-only UI]
        Q[Workspace Reader<br/>filesystem scanner + status inference]
        S[JSON API + HTML Pages<br/>FastAPI + Jinja2]
        T[SSE Refresh Stream<br/>watchfiles]
        V[Browser Dashboard<br/>projects, runs, DAG, suite detail]

        Q --> S
        Q --> T
        S --> V
        T --> V
    end

    R --> F
    J --> M
    W --> Q

The diagram should be read as follows:

  • start from Business Project / zero-qa.yaml
  • Target Suites means the suite set that will actually run: user-selected suites plus automatically expanded upstream dependencies
  • Scenario + Target Suites, DAG Planning, Executor Preflight, and Project Init Scripts are run setup stages and happen once before the suite loop
  • Runtime Inputs Resolution runs once before the first suite workspace is created
  • Project Finalization Scripts, Run Summary, and Post-run Hooks are run finalization stages and happen once after the suite loop finishes
  • Per-Suite Workspace, Per-Suite Execution, and Suite Result happen once for each suite in the planned order
  • Shared Run Workspace is the contract boundary between zero-qa run and zero-qa ui
  • Zero QA parses the scenario, expands upstream dependencies when users select specific suites, and derives a topological execution order
  • a suite focuses on business steps, such as how to use Feature A through Feature Z
  • in practice, a suite is often the usage document for that feature
  • a suite does not include phone commands or browser commands
  • Executor Preflight loads executor metadata from executor.yaml, checks host tools before any suite execution starts, installs missing tools only when allowed, and re-checks afterward
  • Project Init Scripts runs once before suite execution starts and fails fast on the first script error
  • Runtime Inputs Resolution collects executor-scoped scenario variables, runs executor-declared resolve commands, and builds the concrete values rendered into copied skill files
  • Project Finalization Scripts runs post_test_scripts best-effort before the run summary is written, even when suite execution, pre-test scripts, host-tool checks, or runtime input resolution already failed
  • Post-run Hooks runs executor-declared best-effort cleanup commands after the run summary is written, preserving the main run result even when cleanup fails
  • each suite then runs in an isolated workspace with only the current suite and copied executor skill directories
  • Per-Suite Execution runs the Agent runtime and selected executor skill inside that suite workspace
  • the shared workspace persists run_meta.json, per-suite result.json, evidence files, and the final results/summary.json
  • run_meta.json carries dag_nodes and dag_edges, so the UI can render the execution graph without importing planner code
  • Workspace Reader scans the shared workspace, infers in-progress suite states from directory and file presence, and fails fast when workspace data is inconsistent
  • JSON API + HTML Pages expose project history, run detail, and suite detail through one local FastAPI app
  • SSE Refresh Stream watches one run directory and tells the browser to refresh while a run is still in progress
  • suite results stay minimal, while run-level summary data tracks selected suites and stage durations
  • executors already know how to interact with Android or iPhone and Browser
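
The selection and planning steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Zero QA's planner: the NEEDS mapping and the helper names expand_upstream and plan are hypothetical, and only the standard-library graphlib module is used.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical in-memory form of the `suites` list in zero-qa.yaml:
# suite name -> names listed under `needs`.
NEEDS = {
    "login": [],
    "browse-products": ["login"],
    "place-order": ["browse-products"],
}


def expand_upstream(selected, needs):
    """Return the selected suites plus every transitive upstream dependency."""
    targets, stack = set(), list(selected)
    while stack:
        name = stack.pop()
        if name not in needs:
            raise ValueError(f"unknown suite: {name}")
        if name not in targets:
            targets.add(name)
            stack.extend(needs[name])
    return targets


def plan(selected, needs):
    """Validate the target sub-graph and return one topological order."""
    targets = expand_upstream(selected, needs)
    # TopologicalSorter expects node -> set of predecessors.
    graph = {name: set(needs[name]) for name in targets}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"cyclic suite dependencies: {exc.args[1]}") from exc


print(plan(["place-order"], NEEDS))
# prints ['login', 'browse-products', 'place-order']
```

Selecting only place-order still schedules login and browse-products first, which is the "automatically expanded upstream dependencies" behavior described above.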

In many cases, a suite is effectively the usage document for one feature:

  • for Mobile App flows, it describes how to use the app
  • for Web flows, it describes how to use the website
  • once that usage document exists, Zero QA can use it as the testing input
  • an Agent can also generate the usage document or suites automatically by reading code

Minimal Demo

You can inspect the minimal demo directly in examples/minimal_ecommerce_demo/zero-qa.yaml.

It shows the three things users usually need to prepare:

  • the minimal zero-qa.yaml
  • the minimal suite documents
  • the dependency relationship between suites

Run-level executor selection:

  • use zero-qa run --executor-type web-executor for Browser runs
  • use zero-qa run --executor-type mobile-executor for Android or iPhone runs
  • if --executor-type is omitted, Zero QA first checks scenario defaults.executor_type
  • if the scenario does not define it, Zero QA uses global defaults.executor_type
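
The precedence described above can be sketched as follows. The function name is hypothetical, and the sketch assumes the scenario and the global settings are already loaded as plain dictionaries; the source does not say where the global defaults are stored.

```python
def resolve_executor_type(cli_value, scenario, global_defaults):
    """Pick the effective executor type: CLI flag, then scenario, then global.

    `scenario` stands for the parsed zero-qa.yaml; `global_defaults` stands for
    the global `defaults` mapping (its storage location is an assumption here).
    """
    candidates = (
        cli_value,
        (scenario.get("defaults") or {}).get("executor_type"),
        global_defaults.get("executor_type"),
    )
    for value in candidates:
        if value:
            return value
    raise ValueError("no executor_type configured at any level")


scenario = {"defaults": {"executor_type": "web-executor"}}
print(resolve_executor_type("mobile-executor", scenario, {}))  # prints mobile-executor
print(resolve_executor_type(None, scenario, {}))               # prints web-executor
```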

Minimal zero-qa.yaml example:

yaml
scenario_name: minimal-ecommerce-demo
defaults:
  executor_type: web-executor
pre_test_scripts:
  - path: scripts/start-backend.sh
  - path: scripts/start-frontend.sh
    executor_type: web-executor
  - path: scripts/build-install-apk.sh
    executor_type: mobile-executor
post_test_scripts:
  - path: scripts/stop-backend.sh
  - path: scripts/stop-frontend.sh
    executor_type: web-executor
suites:
  - name: login
    path: suites/login.md
  - name: browse-products
    path: suites/browse-products.md
    needs:
      - login
  - name: place-order
    path: suites/place-order.md
    needs:
      - browse-products

Roles

  • Business Project / zero-qa.yaml
    • Defines the test entry for the project.
    • Declares suites and their dependencies.
    • Can set shared run defaults under defaults, including executor_type.
    • Can define an ordered initialization script list for the project.
  • suite
    • Is the usage document for one feature or one flow: it describes how to use that feature.
    • Describes what counts as success or failure for that feature.
    • Stays in business language.
    • May be written manually or generated by an Agent.
  • Scenario + Target Suites
    • Parses zero-qa.yaml for the current run.
    • Starts from the suites chosen by the user for the run.
    • Expands upstream dependencies automatically before execution.
  • DAG Planning
    • Validates suite dependency declarations.
    • Detects invalid or cyclic graphs.
    • Produces the final execution order.
  • Executor Preflight
    • Loads the selected executor definition from executor.yaml.
    • Checks whether required host tools already exist on the host.
    • Installs missing tools only when auto-install is enabled.
    • Re-checks the tools after installation and fails fast on mismatch.
    • Runs once before any suite execution starts.
  • Project Init Scripts
    • Runs the ordered pre_test_scripts list from zero-qa.yaml.
    • Filters scripts by the effective run executor type while preserving order.
  • Project Finalization Scripts
    • Runs the ordered post_test_scripts list from zero-qa.yaml.
    • Reuses the same relative-path, working-directory, and executor-type filtering rules as pre_test_scripts.
    • Treats script failures as warnings so cleanup does not replace the main run result.
  • Runtime Inputs Resolution
    • Reads executor-scoped scenario variables from the <executor-name>: top-level mapping in zero-qa.yaml.
    • Executes executor-declared resolve_command entries from executor.yaml.
    • Merges scenario variables and resolved values into one runtime input map before any suite workspace is created.
  • Run Summary
    • Aggregates the final result after post_test_scripts finish.
    • Records the selected suite set and run-level timing information.
    • Keeps total_duration scoped to the main run path, excluding post_test_scripts and executor post_run_hooks.
  • Shared Run Workspace
    • Stores the filesystem contract shared by run and ui.
    • Persists run_meta.json, per-suite directories under suites/, debug evidence, and results/summary.json.
    • Keeps dag_nodes and dag_edges in run_meta.json so the read path can render the run graph directly.
  • Post-run Hooks
    • Executes executor-declared post_run_hooks from executor.yaml after the run summary is written.
    • Receives the final runtime input map again under the original declared names, so cleanup scripts can reuse resolved values directly.
    • Treats cleanup as best-effort: hook failures are logged, but they do not replace the main run result.
  • Per-Suite Workspace
    • Creates an isolated workspace for one suite at a time.
    • Copies the current suite file and executor skill directories into the workspace.
    • Renders runtime inputs into copied markdown skill files before the Agent starts.
    • Workspace layout (all directories pre-created by the framework — the Agent must not create or check them):
      <suite>/
      ├── suite.md      — suite instructions (read-only); copied and renamed from the user-declared suite path in zero-qa.yaml
      ├── evidence/     — artifacts written by the Agent: screenshots, logs, and captured outputs; the framework pre-creates this directory, executor agents write flat files by default and must not create subdirectories, and the UI can still read nested paths recursively for compatibility
      └── result.json   — structured result written by the Agent at the end
  • Per-Suite Execution
    • Runs once for each suite in the planned order.
    • Uses the Agent runtime to execute the selected executor skill.
    • Turns suite steps into actions on the phone or browser surface.
  • Workspace Reader
    • Lives under zero_qa/ui/ and reads the shared workspace without importing scheduler or runner code.
    • Lists projects and runs, loads DAG metadata, exposes suite evidence, and infers pending / running / passed / failed states.
    • Treats malformed workspace data as an error instead of silently skipping it.
  • UI Server
    • Runs behind zero-qa ui.
    • Serves read-only HTML pages, JSON API endpoints, and one SSE refresh endpoint per run.
    • Uses the shared workspace as its only data source.
  • Agent
    • Is the runtime that actually executes the executor.
    • Zero QA currently supports Codex and Claude Code.
    • The same suite and executor model works with either Agent runtime.
  • executor
    • Is the specialist role, run by the Agent runtime, that executes the suite.
    • Turns suite steps into actions on the target.
    • Observes the target and acts on the target during the test.
    • Is a concrete domain specialist, not a generic assistant.
    • Stays reusable across many suites.
    • Carries its expertise through agent-facing documentation, not just through a tool declaration.
  • Codex
    • Is one supported Agent runtime for Zero QA.
  • Claude Code
    • Is another supported Agent runtime for Zero QA.
  • mobile-executor
    • Is a mobile testing expert for Android or iPhone flows.
    • Reads suite steps and knows how to operate the phone to complete them.
    • Focuses on mobile observation, interaction, screenshots, and debugging.
    • Should include concrete guidance for agent-device and the boundary for adb fallback.
  • web-executor
    • Is a web testing expert for browser flows.
    • Reads suite steps and knows how to use Playwright to complete them.
    • Focuses on web observation, interaction, screenshots, and debugging.
    • Should include concrete guidance for Playwright observation, action, waiting, and evidence capture.
  • Zero QA
    • Owns parsing, DAG planning, host-tool checks, workspace assembly, dispatch, and result collection.
    • Already connects executors to mobile and browser targets.
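
To make the Per-Suite Workspace role concrete, the sketch below assembles the layout listed above for one suite. It is an illustration only: the function name, the suites/<suite> location inside the run directory, and the placement of copied skill directories next to suite.md are assumptions, and the rendering of runtime inputs into skill files is left out.

```python
import shutil
from pathlib import Path


def build_suite_workspace(run_dir, suite_name, suite_path, skill_dirs):
    """Create an isolated workspace for one suite and return its path.

    Hypothetical helper: directory placement is assumed, not taken from Zero QA.
    """
    workspace = Path(run_dir) / "suites" / suite_name
    # Pre-create evidence/ so the Agent never has to create or check it.
    (workspace / "evidence").mkdir(parents=True)
    # Copy and rename the user-declared suite file to suite.md.
    shutil.copyfile(suite_path, workspace / "suite.md")
    # Copy each executor skill directory under its own name (assumed layout).
    for skill_dir in skill_dirs:
        shutil.copytree(skill_dir, workspace / Path(skill_dir).name)
    # result.json is intentionally not created; the Agent writes it at the end.
    return workspace
```

Only the current suite file and the skill directories enter the workspace, so the Agent cannot read other suites or the rest of the project by accident.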

Kernel-Executor Decoupling

The kernel (zero_qa/) must never contain code written for a specific executor. It defines generic contracts; executors fulfill those contracts through declarative configuration in executor.yaml.

  • The kernel does not reference any executor by name, does not reference executor-specific tools, and does not branch on executor identity.
  • Executor-specific behavior is declared in executor.yaml (host tools, runtime input resolve commands, required scenario variables) and executed generically by the kernel.
  • Adding a new executor only requires adding a directory under executors/. No kernel changes are needed.

This is enforced by scripts/lints/check_kernel_executor_decoupling.py, which dynamically discovers executor names and tool names from executors/ and rejects any occurrence in kernel code.
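
The idea behind such a lint can be sketched as below. This is not the content of check_kernel_executor_decoupling.py: the function names are hypothetical, only executor directory names are discovered (the real script is described as also discovering tool names), and matching is a plain substring search over Python files.

```python
import re
from pathlib import Path


def discover_executor_names(executors_dir):
    """Treat every sub-directory of executors/ as one executor name."""
    return sorted(p.name for p in Path(executors_dir).iterdir() if p.is_dir())


def find_violations(kernel_dir, forbidden_names):
    """Return (file, line number, name) for each forbidden name in kernel code."""
    if not forbidden_names:
        return []
    pattern = re.compile("|".join(re.escape(name) for name in forbidden_names))
    violations = []
    for source in sorted(Path(kernel_dir).rglob("*.py")):
        lines = source.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            match = pattern.search(line)
            if match:
                violations.append((str(source), lineno, match.group(0)))
    return violations
```

Because the forbidden names are discovered at lint time, adding a new directory under executors/ extends the check without editing the lint itself.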

For the executor-side view of this contract, see executors/README.md.

Run and UI Decoupling

zero-qa run and zero-qa ui are intentionally separate processes:

  • zero-qa run writes the shared workspace and does not import zero_qa.ui.
  • zero-qa ui reads the shared workspace and does not import planner, runner, or workspace builder write-path modules.
  • zero-qa ui acquires one machine-wide lock at startup, so only one local dashboard instance can run at a time.
  • The two processes synchronize only through stable files and directories, not through in-process callbacks or RPC.
  • UI dependencies remain optional, so the core run path can stay installable without FastAPI, uvicorn, watchfiles, or Jinja2.

This keeps the write path minimal and lets one long-running UI aggregate many historical or in-progress runs at once.
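
Because the two processes share only files, the read side can infer suite state from what exists on disk. The sketch below illustrates that idea under explicit assumptions: the function name, the rule that a missing suite directory means pending, and a "status" field in result.json holding PASS or FAILED are all hypothetical and not confirmed by the source.

```python
import json
from pathlib import Path


def infer_suite_state(run_dir, suite_name):
    """Infer pending / running / passed / failed from the shared workspace."""
    suite_dir = Path(run_dir) / "suites" / suite_name
    if not suite_dir.is_dir():
        return "pending"   # assumed: workspace not created yet
    result_file = suite_dir / "result.json"
    if not result_file.is_file():
        return "running"   # assumed: workspace exists, no result written yet
    # Malformed JSON raises json.JSONDecodeError, a ValueError subclass,
    # so inconsistent data fails fast instead of being skipped.
    result = json.loads(result_file.read_text(encoding="utf-8"))
    status = result.get("status") if isinstance(result, dict) else None
    if status == "PASS":
        return "passed"
    if status == "FAILED":
        return "failed"
    raise ValueError(f"inconsistent result.json for suite {suite_name!r}")
```

The function imports nothing from the write path, which mirrors the rule that zero-qa ui reads the workspace without importing planner or runner modules.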

What Users Own

  • your suites or usage documents
  • your suite dependencies in zero-qa.yaml
  • your initialization scripts in zero-qa.yaml, such as service startup or building and deploying an APK to an emulator or a phone
  • your scenario-level defaults.executor_type, if needed
  • optionally, the agent_type choice when you want to choose between Codex and Claude Code

The minimal demo in examples/minimal_ecommerce_demo/ is the reference shape for these inputs.

What Zero QA Already Owns

  • selected-suite expansion and DAG planning
  • executor metadata loading and host-tool checks
  • ordered pre-test script execution
  • per-suite isolated workspace assembly
  • shared workspace metadata writing, including DAG nodes and edges
  • executor dispatch
  • read-only workspace scanning for UI consumers
  • local API, HTML, and SSE serving for the dashboard
  • out-of-the-box mobile connection through mobile-executor
  • out-of-the-box browser connection through web-executor
  • final result collection

This means users do not need to design how to connect to phones or browsers. They only need to describe:

  • what each suite verifies
  • how suites depend on each other

If the product already has clear usage documentation for Mobile App or Web flows, that documentation is often enough to serve as the suites for Zero QA.

For project setup, users can also provide initialization scripts in zero-qa.yaml, for example:

  • start backend services
  • start frontend services
  • build an APK and deploy it to an emulator or install it on a phone

Which scripts run may differ between web-executor and mobile-executor, but all scripts stay in one ordered list. Each script may optionally declare executor_type:

  • omit executor_type when the script should always run
  • use executor_type: web-executor when the script should run only for web
  • use executor_type: mobile-executor when the script should run only for mobile
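
The filtering rule above can be sketched as a single order-preserving pass. The list mirrors pre_test_scripts from the minimal zero-qa.yaml example; the function name select_scripts is hypothetical and this is not Zero QA's implementation.

```python
# Mirrors pre_test_scripts from the minimal zero-qa.yaml example.
PRE_TEST_SCRIPTS = [
    {"path": "scripts/start-backend.sh"},
    {"path": "scripts/start-frontend.sh", "executor_type": "web-executor"},
    {"path": "scripts/build-install-apk.sh", "executor_type": "mobile-executor"},
]


def select_scripts(scripts, effective_executor_type):
    """Keep scripts with no executor_type or a matching one, in declared order."""
    return [
        script["path"]
        for script in scripts
        if script.get("executor_type") in (None, effective_executor_type)
    ]


print(select_scripts(PRE_TEST_SCRIPTS, "web-executor"))
# prints ['scripts/start-backend.sh', 'scripts/start-frontend.sh']
print(select_scripts(PRE_TEST_SCRIPTS, "mobile-executor"))
# prints ['scripts/start-backend.sh', 'scripts/build-install-apk.sh']
```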

Agent Runtime

Zero QA separates two concerns:

  • executor: what kind of target to operate
  • agent: which runtime executes that executor

For executor knowledge, Zero QA also separates two layers:

  • executor.yaml: machine-readable runtime dependencies and host-tool setup
  • skill/: agent-facing execution knowledge, including the main workflow and optional focused references

Today, Zero QA supports:

  • Codex
  • Claude Code

In most cases:

  • choose --executor-type mobile-executor when the target is Android or iPhone
  • choose --executor-type web-executor when the target is Browser
  • choose Codex or Claude Code through agent_type when you want to pick the Agent runtime

Important:

  • Codex can run all supported executor types, including mobile-executor and web-executor
  • Claude Code can also run all supported executor types, including mobile-executor and web-executor
  • agent_type only selects the Agent runtime
  • --executor-type selects the target type for the run
  • scenario defaults.executor_type sets the executor directory name used when the CLI does not pass --executor-type
  • zero-qa ui is separate from executor selection and only needs a readable run workspace

Result

  • Suite success returns PASS
  • Suite failure returns FAILED with:
    • a short reason
    • the business step or expectation that failed

At run level, Zero QA also writes a summary with the selected suites and stage durations.

The same workspace also powers the UI dashboard:

  • project and historical run listing
  • per-run DAG visualization
  • suite detail and evidence listing
  • live refresh while result.json or summary.json files are still changing

Debug evidence such as logs and screenshots may be kept by the system, but it is not the normal business-facing result.