Testboxes are in early beta. The interface and behavior may evolve as we iterate.

Overview

Testboxes sync your local changes to a Blacksmith microVM and run commands inside a real GitHub Actions job, with access to secrets, OIDC tokens, and service containers. Output streams back to the agent. The CLI is agent-first: its intended consumer is a coding agent, not a human. For that reason, commands and flags are verbose and explicit, so agents can construct invocations without ambiguity.

When to use Testboxes

  • Parallel agents in worktrees. Each agent warms up its own testbox and runs tests independently, without interfering with the others.
  • Cross-platform development. When you develop on macOS but target Linux, Testboxes run your commands on Linux microVMs that provide the real target environment.
  • Debug flaky tests. Spin up dozens of testboxes in parallel to reproduce a non-deterministic flake, then let the agent debug once it triggers.
  • Fast iteration loops. After initial hydration, subsequent runs take 1-3 seconds. Only changed files sync.
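The flaky-test use case above can be sketched as a small Python harness. This is a hypothetical illustration, not part of the Blacksmith CLI: `hunt_flake`, `warmup`, and `run_test` are names invented here, and the two callables are assumed to wrap `blacksmith testbox warmup` and `blacksmith testbox run` however your agent invokes them.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Tuple


def hunt_flake(
    warmup: Callable[[], str],        # hypothetical: warms a testbox, returns its ID
    run_test: Callable[[str], int],   # hypothetical: runs the test, returns exit code
    boxes: int = 12,
    runs_per_box: int = 20,
) -> Optional[Tuple[str, int]]:
    """Return (testbox_id, attempt) for the first failure seen, or None."""

    def worker(_: int) -> Optional[Tuple[str, int]]:
        testbox_id = warmup()
        for attempt in range(1, runs_per_box + 1):
            if run_test(testbox_id) != 0:
                # Leave this testbox warm so the agent can debug on it.
                return (testbox_id, attempt)
        return None

    with ThreadPoolExecutor(max_workers=boxes) as pool:
        futures = [pool.submit(worker, i) for i in range(boxes)]
        for future in as_completed(futures):
            hit = future.result()
            if hit is not None:
                # Note: leaving the `with` block still waits for the
                # remaining workers to finish their own loops.
                return hit
    return None
```

Each worker owns one testbox, so repeated runs on a box reuse its warm state. The sketch does not stop the other testboxes; in practice you would stop them explicitly or rely on the idle timeout.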

How it works

Testbox follows a two-step workflow: warmup, then run.
1. Warmup

The agent calls blacksmith testbox warmup <workflow>. This dispatches a GitHub Actions workflow and returns a testbox ID immediately. The runner boots and hydrates in the background: installing dependencies, starting service containers, and running any setup steps defined in the workflow.
2. Run

The agent calls blacksmith testbox run --id <id> "<command>". This syncs local changes to the testbox, executes the command, and streams stdout/stderr back. The agent checks the output and exit code.
Once the testbox is ready, run commands complete in 1-3 seconds since only changed files need to sync. You can run as many commands as you want against the same warm testbox.
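As a rough illustration of the two steps above, an agent harness might wrap the two commands like this. The subcommands and the `--id` flag come from this page; everything else is an assumption, in particular how the testbox ID appears in the warmup output (the sketch takes the last non-empty line of stdout, which may not match the real CLI). The `runner` parameter is a hypothetical hook that lets the wrapper be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def real_runner(argv: List[str]) -> subprocess.CompletedProcess:
    # Invokes the CLI and captures its output; needs `blacksmith` on PATH.
    return subprocess.run(argv, capture_output=True, text=True)


def warmup(workflow: str, runner: Runner = real_runner) -> str:
    """Dispatch the workflow and return the testbox ID."""
    result = runner(["blacksmith", "testbox", "warmup", workflow])
    if result.returncode != 0:
        raise RuntimeError(f"warmup failed: {result.stderr}")
    # ASSUMPTION: the ID is the last non-empty line of stdout.
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    if not lines:
        raise RuntimeError("warmup produced no output")
    return lines[-1]


def run(testbox_id: str, command: str,
        runner: Runner = real_runner) -> subprocess.CompletedProcess:
    """Sync local changes and execute `command` on the warm testbox."""
    # The command is passed as one argument, matching the quoted form above.
    return runner(["blacksmith", "testbox", "run", "--id", testbox_id, command])
```

The agent would call `warmup` once, keep the returned ID, and then inspect `returncode`, `stdout`, and `stderr` on each result from `run`.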

Get started

1. Install the CLI

curl -fsSL https://get.blacksmith.sh | sh
The CLI auto-updates in the background on every invocation.
2. Set up a testbox workflow for your repository

blacksmith testbox init
This command scans your repo’s workflows, selects the right job, and generates a testbox-compatible workflow file and an agent skill. If you’re not logged in yet, the CLI opens a browser to authenticate first. You only need to run it once per repo.

Agent integration

The init command generates a skill file that teaches agents to:
  1. Warm up immediately when receiving a coding task.
  2. Never run tests locally. Always run through Testbox.
  3. Reuse the testbox ID for all subsequent run commands within the same task.
  4. Fix and re-run when tests fail. Re-runs take 1-3 seconds.
  5. Stop the testbox when done, or let the idle timeout handle cleanup.
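The five rules above amount to a simple control loop. The following is a minimal sketch of that loop, not the contents of the generated skill file. `run_task`, `cli`, and `apply_fix` are hypothetical names; the exact syntax of the stop command (`testbox stop --id <id>`) and the assumption that warmup prints only the ID are guesses made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical adapter: takes CLI arguments, returns (exit_code, output).
Cli = Callable[[List[str]], Tuple[int, str]]


def run_task(cli: Cli, workflow: str, test_command: str,
             apply_fix: Callable[[str], None], max_attempts: int = 3) -> bool:
    """Warm up once, run tests remotely, fix and re-run, then stop."""
    # Rule 1: warm up as soon as the task arrives.
    code, out = cli(["testbox", "warmup", workflow])
    if code != 0:
        raise RuntimeError("warmup failed")
    testbox_id = out.strip()  # ASSUMPTION: output is just the ID
    try:
        for attempt in range(1, max_attempts + 1):
            # Rules 2 and 3: tests run on the testbox, always with the same ID.
            code, out = cli(["testbox", "run", "--id", testbox_id, test_command])
            if code == 0:
                return True
            # Rule 4: let the agent edit files, then loop to re-run.
            if attempt < max_attempts:
                apply_fix(out)
        return False
    finally:
        # Rule 5: stop explicitly (assumed syntax) instead of waiting
        # for the idle timeout.
        cli(["testbox", "stop", "--id", testbox_id])
```

Putting the stop call in `finally` means the testbox is released even when the agent gives up or an exception interrupts the loop.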

Pricing

Testbox execution is billed per minute at the same Blacksmith runner rates as CI. The idle timeout (default 30 minutes) ensures testboxes don’t run indefinitely.
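To get a feel for what the idle timeout means for cost, here is a back-of-the-envelope sketch. It rests on assumptions this page does not confirm: that partial minutes round up, and that a testbox left running is billed for its idle time until the timeout fires. The per-minute rate is a parameter you supply from your own runner pricing; no real rate is implied.

```python
import math


def billed_minutes(active_seconds: float, stopped_explicitly: bool,
                   idle_timeout_minutes: int = 30) -> int:
    """Estimate billed minutes for one testbox session."""
    # ASSUMPTION: partial minutes are rounded up.
    minutes = math.ceil(active_seconds / 60)
    if not stopped_explicitly:
        # ASSUMPTION: idle time until the timeout is billed too.
        minutes += idle_timeout_minutes
    return minutes


def estimated_cost(minutes: int, rate_per_minute: float) -> float:
    """Multiply billed minutes by a caller-supplied per-minute rate."""
    return minutes * rate_per_minute
```

Under these assumptions, a 95-second session costs 2 billed minutes when the agent stops the testbox, and 32 when it is left to idle out at the default timeout. That gap is the argument for stopping explicitly.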

FAQ

The run command automatically waits for the testbox to reach ready status before syncing files and executing the command. The agent does not need to poll status separately.
The CLI uses rsync --delete --checksum over SSH to mirror the local working tree to the testbox. Only changed files are transferred, and files deleted locally are removed from the testbox. After the initial full sync, incremental syncs typically complete in under a second.
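For readers who want to see what such a sync looks like, the sketch below builds a comparable rsync invocation. Only `--delete` and `--checksum` are taken from this page. The `--archive` flag, the `-e "ssh -i ..."` transport option, and all four parameters are assumptions for illustration, and the CLI's real command line may differ.

```python
import shlex
from typing import List


def build_sync_command(local_dir: str, host: str, remote_dir: str,
                       key_path: str) -> List[str]:
    """Build an rsync argv that mirrors local_dir onto the remote host."""
    # A trailing slash tells rsync to copy the directory's contents
    # rather than the directory itself.
    source = local_dir.rstrip("/") + "/"
    return [
        "rsync",
        "--archive",    # ASSUMPTION: recurse and preserve metadata
        "--delete",     # remove remote files that no longer exist locally
        "--checksum",   # compare by content checksum, not size and mtime
        "-e", f"ssh -i {shlex.quote(key_path)}",  # ASSUMPTION: key-based SSH
        source,
        f"{host}:{remote_dir}",
    ]
```

With `--checksum`, a file whose timestamp changed but whose content did not is not re-sent. This is useful when tools such as formatters or git checkouts rewrite files in place.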
You can run multiple commands against the same testbox. It stays warm until the idle timeout is reached or the agent calls stop, so the agent can issue as many run commands as needed.
The default idle timeout is 30 minutes. If no run command is issued within that window, the testbox is automatically stopped. The timeout can be configured via --idle-timeout during warmup.
You don’t need to manage SSH keys yourself. When --ssh-public-key is not specified during warmup, the CLI generates a keypair and caches it at ~/.blacksmith/testboxes/{id}/.
Authentication tokens are stored at ~/.blacksmith/credentials. SSH keypairs are cached at ~/.blacksmith/testboxes/{id}/.