Overview
Blacksmith Testboxes let coding agents validate local changes against a CI-like environment, instantly. Local changes are synced to a Blacksmith microVM, and commands execute inside a live GitHub Actions job with full access to secrets, OIDC tokens, and service containers. Results stream back for the agent to interpret and continue iterating.
Testbox is not designed for humans to invoke directly. It is a machine interface for AI coding agents (Claude Code, Cursor, Codex, Devin, and others) to call programmatically as part of their edit-test-fix loop. Humans configure Testbox once during onboarding; after that, agents own the workflow.
Since agents are the primary consumer, the CLI is intentionally verbose and explicit. Every flag is spelled out, every ID is passed explicitly, and there are no implicit defaults that require human intuition.
When to use Testboxes
- Parallel agents in worktrees. Running multiple coding agents across parallel git worktrees, using tools like Codex or Conductor, where each agent needs to test its own changes in isolation before creating a PR. Each agent warms up its own testbox and runs tests independently without interfering with the others.
- Cross-platform development. Developing on macOS but targeting Linux. The code depends on Linux-specific behavior, system libraries, or services that don’t exist on macOS. Testboxes run on Linux microVMs, so agents can validate builds and tests against the real target environment.
- CI environment without the wait. Any non-trivial change usually needs the agent to run tests, which require access to secrets, OIDC tokens, and service containers (Postgres, Redis, etc.). Testboxes provide a full GitHub Actions environment without pushing a commit and polling CI.
- Fast iteration loops. After initial hydration, subsequent runs take 1-3 seconds. Agents can fix a failing test and re-run only what changed, instead of pushing commits and waiting for a full CI pipeline to finish. The same warm testbox is reused and only changed files sync.
Installation
Authentication
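Run blacksmith auth login to authenticate. This opens a browser for the OAuth flow and stores credentials at ~/.blacksmith/credentials. Required once per machine.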
How it works
Testbox follows a two-step workflow: warmup, then run.
Warmup
The agent calls blacksmith testbox warmup <workflow>. This dispatches a GitHub Actions workflow and returns a testbox ID immediately. The runner boots and hydrates in the background: installing dependencies, starting service containers, and running any setup steps defined in the workflow.
Run
The agent calls blacksmith testbox run with the testbox ID to sync local changes and execute a command. If the testbox is still hydrating (for example, still running npm install), the command blocks until the testbox is ready. Once the testbox is ready, run commands complete in 1-3 seconds since only changed files need to sync. The agent can issue as many run commands as needed against the same warm testbox.
From the client’s perspective: Warm up → Runner hydrates (installs deps, starts services) → Runner ready → Agent runs commands (each run syncs state, executes, and streams output).
Onboarding
The blacksmith testbox init command runs an interactive onboarding TUI that:
- Scans the repo’s .github/workflows/ for workflow files
- Prompts the user to select a workflow and job that has the dependencies and services they need
- Generates a testbox-compatible workflow from the selected job (keeps setup, strips execution, adds testbox actions)
- Writes .github/workflows/blacksmith-testbox.yml and .claude/skills/blacksmith-testbox/SKILL.md
- Optionally creates a PR
The generated workflow is triggered by workflow_dispatch and contains only setup steps (no test execution). The generated skill file teaches agents how to use Testbox in this repository.
CLI reference
blacksmith testbox warmup
Dispatches a testbox and returns an ID immediately. Required before any run command.
| Flag | Description |
|---|---|
| --idle-timeout | Minutes of inactivity before the testbox is automatically stopped. Defaults to 10. |
| --job | Specific job within the workflow to run. Useful when the workflow defines multiple jobs. |
| --ref | Git ref (branch, tag, SHA) to dispatch the workflow against. Defaults to the repo’s default branch. |
| --ssh-public-key | Path to an SSH public key to install on the testbox. When omitted, a keypair is auto-generated and cached at ~/.blacksmith/testboxes/{id}/. |
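An agent harness that shells out to the CLI might assemble the warmup invocation from the flags in the table above. The following is a minimal sketch, not the official client: the helper name build_warmup_args is hypothetical, the workflow value is a placeholder, and it assumes each flag takes its value as the next argument with the workflow passed positionally.

```python
from typing import List, Optional


def build_warmup_args(
    workflow: str,
    idle_timeout_minutes: Optional[int] = None,
    job: Optional[str] = None,
    ref: Optional[str] = None,
    ssh_public_key: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `blacksmith testbox warmup`.

    Assumption: flags take their value as the following argument and the
    workflow is a positional argument. Flags left as None are omitted so
    the CLI applies its documented defaults.
    """
    args = ["blacksmith", "testbox", "warmup", workflow]
    if idle_timeout_minutes is not None:
        args += ["--idle-timeout", str(idle_timeout_minutes)]
    if job is not None:
        args += ["--job", job]
    if ref is not None:
        args += ["--ref", ref]
    if ssh_public_key is not None:
        args += ["--ssh-public-key", ssh_public_key]
    return args


if __name__ == "__main__":
    # "<workflow>" is a placeholder; substitute the value your repo uses.
    print(build_warmup_args("<workflow>", idle_timeout_minutes=20, ref="my-branch"))
```

Building an argument list instead of a shell string avoids quoting problems when a ref or path contains spaces.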
blacksmith testbox run
Syncs local changes and executes a command on the testbox. If the testbox is still hydrating, the command blocks until the testbox is ready.
| Flag | Description |
|---|---|
| --id | Testbox ID returned by warmup. Required. |
| --debug | Prints detailed rsync timing and transfer statistics. |
| --ssh-private-key | Path to the SSH private key. Only needed when warmup was called with --ssh-public-key. |
The CLI uses rsync --delete --checksum over SSH to mirror the local working tree to the testbox. Files deleted locally are removed from the testbox, so the testbox is always an exact replica of local state.
The command exits with the remote command’s exit code. Agents can check the exit code to determine pass/fail.
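Because the exit code is propagated, a pass/fail check can be a thin wrapper around the process return code. The sketch below is illustrative only: run_on_testbox is a hypothetical helper, it assumes the remote command is passed as a single trailing argument (the exact argument syntax is not specified here), and it treats exit code 0 as a pass by the usual convention.

```python
import subprocess
from typing import Callable, List


def _default_execute(argv: List[str]) -> int:
    # Output is not captured, so it streams straight to the caller's terminal.
    return subprocess.run(argv).returncode


def run_on_testbox(
    testbox_id: str,
    command: str,
    execute: Callable[[List[str]], int] = _default_execute,
) -> bool:
    """Run `command` on the testbox and report whether it passed.

    Assumption: the remote command is one trailing argument after --id.
    `execute` is injectable so the wrapper can be exercised without the CLI.
    """
    argv = ["blacksmith", "testbox", "run", "--id", testbox_id, command]
    return execute(argv) == 0
```

An agent could call run_on_testbox(testbox_id, "npm test") and branch on the boolean result.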
blacksmith testbox status
Shows the current status of a testbox. Supports --wait to block until the testbox is ready.
| Flag | Description |
|---|---|
| --id | Testbox ID returned by warmup. Required. |
| --wait | Block until the testbox reaches ready status. |
| --wait-timeout | Maximum duration to wait before timing out. Defaults to 5m. |
A testbox moves through these statuses in order: queued → hydrating → ready → completed.
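The status progression can be modeled as an ordered enum, which makes "has the testbox reached ready yet?" a simple comparison. This is a sketch under assumptions: the class and helper names are hypothetical, the format of the status command's output is not specified here (the parser takes a bare status name), and it assumes completed means the testbox no longer accepts commands.

```python
from enum import IntEnum


class TestboxStatus(IntEnum):
    """Statuses in lifecycle order, so comparisons follow the progression."""
    QUEUED = 0
    HYDRATING = 1
    READY = 2
    COMPLETED = 3


def parse_status(text: str) -> TestboxStatus:
    # Assumption: `text` is a bare status name such as "hydrating".
    return TestboxStatus[text.strip().upper()]


def should_wait(status: TestboxStatus) -> bool:
    # Queued and hydrating testboxes have not reached ready yet.
    return status < TestboxStatus.READY


def accepts_commands(status: TestboxStatus) -> bool:
    # Assumption: only a ready testbox executes commands immediately.
    return status == TestboxStatus.READY
```

Since run already blocks until the testbox is ready, a model like this is mainly useful for agents that inspect status for logging or scheduling decisions.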
blacksmith testbox stop
Stops a running testbox and cancels the underlying GitHub Actions run.
blacksmith testbox init
Interactive onboarding TUI. Sets up a testbox workflow and agent skill for the current repository.
blacksmith auth login
Authenticates with Blacksmith. Opens a browser for the OAuth flow.
Agent integration
The init command generates a Claude Code skill at .claude/skills/blacksmith-testbox/SKILL.md that teaches agents to:
- Warm up immediately when receiving a coding task. Start the testbox while the agent begins writing code.
- Never run tests locally. Always run through Testbox.
- Reuse the testbox ID for all subsequent run commands within the same task
- Fix and re-run when tests fail. The same warm testbox is reused and only changed files sync, so re-runs take 1-3 seconds.
- Stop the testbox when done, or let the idle timeout handle cleanup
Example agent workflow
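The skill's guidance (warm up, run, fix and re-run, stop) can be sketched as a small driver loop. This is one possible shape, not the generated skill itself. The names agent_loop, cli_invoke, and apply_fix are hypothetical, and the sketch assumes that warmup prints only the testbox ID on stdout, that run takes the command as a trailing argument, and that stop accepts --id.

```python
import subprocess
from typing import Callable, List, Tuple

# An invoker takes an argv list and returns (exit_code, stdout).
Invoke = Callable[[List[str]], Tuple[int, str]]


def cli_invoke(argv: List[str]) -> Tuple[int, str]:
    """Invoke the real CLI and capture its stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def agent_loop(
    invoke: Invoke,
    workflow: str,
    test_command: str,
    apply_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Warm up once, then run/fix/re-run against the same testbox ID."""
    code, out = invoke(["blacksmith", "testbox", "warmup", workflow])
    if code != 0:
        raise RuntimeError("warmup failed")
    testbox_id = out.strip()  # assumption: stdout is just the ID
    try:
        for attempt in range(1, max_attempts + 1):
            code, output = invoke(
                ["blacksmith", "testbox", "run", "--id", testbox_id, test_command]
            )
            if code == 0:
                return True
            if attempt < max_attempts:
                apply_fix(output)  # the agent edits files based on the failure
        return False
    finally:
        # Assumption: stop takes --id. The idle timeout is the fallback cleanup.
        invoke(["blacksmith", "testbox", "stop", "--id", testbox_id])
```

In practice the agent would start writing code right after warmup returns, since hydration proceeds in the background and the first run blocks until the testbox is ready.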
Pricing
Testbox execution is billed per-minute on the same Blacksmith runner pricing as CI. The idle timeout (default 10 minutes) ensures testboxes don’t run indefinitely.
FAQ
What happens if the testbox is still hydrating when the agent calls run?
The run command automatically waits for the testbox to reach ready status before syncing files and executing the command. The agent does not need to poll status separately.
How does file sync work?
The CLI uses rsync --delete --checksum over SSH to mirror the local working tree to the testbox. Only changed files are transferred. Files deleted locally are removed from the testbox. After the initial full sync, incremental syncs typically complete in under a second.
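The mirroring semantics described above (transfer only files whose content differs, delete remote files that no longer exist locally) can be modeled in a few lines. This is a conceptual model only, not rsync's actual algorithm: plan_sync is a hypothetical helper, file trees are represented as in-memory dictionaries, and SHA-256 stands in for whatever checksum rsync uses.

```python
import hashlib
from typing import Dict, List, Tuple


def checksum(content: bytes) -> str:
    # Stand-in digest for illustration; rsync chooses its own checksum.
    return hashlib.sha256(content).hexdigest()


def plan_sync(
    local: Dict[str, bytes], remote: Dict[str, bytes]
) -> Tuple[List[str], List[str]]:
    """Return (paths to transfer, paths to delete) so remote mirrors local."""
    # Transfer files that are new or whose checksum differs (like --checksum).
    transfer = sorted(
        path
        for path, content in local.items()
        if path not in remote or checksum(remote[path]) != checksum(content)
    )
    # Delete remote files that are gone locally (like --delete).
    delete = sorted(path for path in remote if path not in local)
    return transfer, delete


if __name__ == "__main__":
    local = {"a.py": b"same", "b.py": b"new"}
    remote = {"a.py": b"same", "b.py": b"old", "c.py": b"stale"}
    print(plan_sync(local, remote))
```

In this model an unchanged file contributes nothing to the transfer list, which is why incremental syncs after the first one are small.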
Can the agent run multiple commands on the same testbox?
Yes. The testbox stays warm until the idle timeout is reached or the agent calls stop. The agent can issue as many run commands as needed.
What is the idle timeout?
The default idle timeout is 10 minutes. If no run command is issued within that window, the testbox is automatically stopped. The timeout can be configured via --idle-timeout during warmup.
Do I need to set up SSH keys?
No. When --ssh-public-key is not specified during warmup, the CLI auto-generates a keypair and caches it at ~/.blacksmith/testboxes/{id}/. SSH key management is fully automatic.
Where are credentials stored?
Authentication tokens are stored at ~/.blacksmith/credentials. SSH keypairs are cached at ~/.blacksmith/testboxes/{id}/.