Testboxes are in early beta. The interface and behavior may evolve as we iterate.
Overview
Testboxes sync your local changes to a Blacksmith microVM and run commands inside a real GitHub Actions job, with access to secrets, OIDC tokens, and service containers. Output streams back to the agent. The CLI is agent-first. The intended consumer is a coding agent, not a human. This informs the design: commands and flags are verbose and explicit so agents can construct invocations without ambiguity.
When to use Testboxes
- Parallel agents in worktrees. Each agent warms up its own testbox and runs tests independently, without interfering with the others.
- Cross-platform development. When you develop on macOS but target Linux, testboxes run your commands on Linux microVMs in the real target environment.
- Debug flaky tests. Spin up dozens of testboxes in parallel to reproduce a non-deterministic flake, then let the agent debug once it triggers.
- Fast iteration loops. After initial hydration, subsequent runs take 1-3 seconds. Only changed files sync.
How it works
Testbox follows a two-step workflow: warmup, then run.
Warmup
The agent calls blacksmith testbox warmup <workflow>. This dispatches a GitHub Actions workflow and returns a testbox ID immediately. The runner boots and hydrates in the background: installing dependencies, starting service containers, and running any setup steps defined in the workflow.
Get started
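As a rough illustration of the warmup step, the sketch below assembles the warmup invocation as an argument list, the way an agent harness might before passing it to a subprocess call. Only blacksmith testbox warmup <workflow> and the --idle-timeout flag come from this page. The helper name build_warmup_argv and the workflow name ci.yml are hypothetical, and the value format accepted by --idle-timeout is not specified here, so the sketch passes it through unchanged.

```python
import shlex
from typing import List, Optional


def build_warmup_argv(workflow: str, idle_timeout: Optional[str] = None) -> List[str]:
    """Assemble the warmup command as an argv list. Nothing is executed here."""
    if not workflow:
        raise ValueError("workflow must be a non-empty string")
    argv = ["blacksmith", "testbox", "warmup", workflow]
    if idle_timeout is not None:
        # The value format is not documented on this page; pass it through verbatim.
        argv += ["--idle-timeout", idle_timeout]
    return argv


# "ci.yml" is a hypothetical workflow name used only for illustration.
argv = build_warmup_argv("ci.yml")
print(shlex.join(argv))  # prints: blacksmith testbox warmup ci.yml
```

Building an explicit argument list rather than a shell string mirrors the agent-first design: each flag and value is a separate, unambiguous token.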
Agent integration
The init command generates a skill file that teaches agents to:
- Warm up immediately when receiving a coding task.
- Never run tests locally. Always run through Testbox.
- Reuse the testbox ID for all subsequent run commands within the same task.
- Fix and re-run when tests fail. Re-runs take 1-2 seconds.
- Stop the testbox when done, or let the idle timeout handle cleanup.
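The behaviors in the list above can be sketched as a small control loop. This is a minimal model, not the contents of the generated skill file: the callables warmup, run, fix, and stop are hypothetical stand-ins for whatever the agent uses to invoke the CLI and edit code, and max_attempts, the ID tb-example, and the pytest command are assumptions for illustration.

```python
from typing import Callable, List


def run_task(
    warmup: Callable[[], str],
    run: Callable[[str, str], bool],
    fix: Callable[[], None],
    stop: Callable[[str], None],
    test_command: str,
    max_attempts: int = 5,
) -> bool:
    """Warm up once, reuse one testbox ID for every run, and stop at the end."""
    testbox_id = warmup()  # warm up immediately when the task arrives
    try:
        for attempt in range(max_attempts):
            if run(testbox_id, test_command):  # tests run remotely, same ID each time
                return True
            if attempt < max_attempts - 1:
                fix()  # change the code, then loop to re-run
        return False
    finally:
        stop(testbox_id)  # alternatively, let the idle timeout clean up


# Demo with in-memory stand-ins: the first run fails, the second passes.
log: List[str] = []
outcomes = iter([False, True])


def fake_run(testbox_id: str, command: str) -> bool:
    log.append(f"run {testbox_id}: {command}")
    return next(outcomes)


passed = run_task(
    warmup=lambda: "tb-example",
    run=fake_run,
    fix=lambda: log.append("fix"),
    stop=lambda testbox_id: log.append(f"stop {testbox_id}"),
    test_command="pytest",
)
print(passed, log)
```

The try/finally ensures the testbox is stopped even if a run raises, which keeps per-minute billing bounded without relying on the idle timeout.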
Pricing
Testbox execution is billed per-minute on the same Blacksmith runner pricing as CI. The idle timeout (default 30 minutes) ensures testboxes don’t run indefinitely.
FAQ
What happens if the testbox is still hydrating when the agent calls run?
The run command automatically waits for the testbox to reach ready status before syncing files and executing the command. The agent does not need to poll status separately.
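To make the waiting behavior concrete, here is a minimal model of a wait-until-ready loop. It is not the CLI's actual implementation: the function wait_until_ready, the get_status callable, the polling interval, and the overall timeout are all assumptions. Only the ready status and the hydrating state are taken from this page.

```python
import time
from typing import Callable, List


def wait_until_ready(
    get_status: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until get_status() returns "ready", or raise after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"testbox still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Demo with a fake status source that becomes ready on the third poll.
statuses = iter(["hydrating", "hydrating", "ready"])
polls: List[str] = []


def fake_status() -> str:
    status = next(statuses)
    polls.append(status)
    return status


wait_until_ready(fake_status, sleep=lambda seconds: None)
print(polls)  # prints: ['hydrating', 'hydrating', 'ready']
```

Because run handles this wait itself, an agent can issue run right after warmup returns the testbox ID.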
How does file sync work?
The CLI uses rsync --delete --checksum over SSH to mirror the local working tree to the testbox. Only changed files are transferred. Files deleted locally are removed from the testbox. After the initial full sync, incremental syncs typically complete in under a second.
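The mirroring semantics can be modeled in a few lines. The sketch below is not rsync and does not reproduce its algorithm. It only illustrates the outcome of combining --checksum (compare by content, not by timestamp or size) with --delete (remove remote files that no longer exist locally). The function plan_sync, the in-memory file maps, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    """Content checksum used to decide whether a file changed."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(
    local: Dict[str, bytes], remote: Dict[str, bytes]
) -> Tuple[List[str], List[str]]:
    """Return (paths to transfer, paths to delete) so that remote mirrors local."""
    transfer = sorted(
        path
        for path, content in local.items()
        if path not in remote or digest(remote[path]) != digest(content)
    )
    delete = sorted(path for path in remote if path not in local)
    return transfer, delete


# Hypothetical working trees: one edited file, one new file, one deleted file.
local_tree = {"README.md": b"docs", "app.py": b"v2", "new.py": b"x"}
remote_tree = {"README.md": b"docs", "app.py": b"v1", "old.py": b"y"}

transfer, delete = plan_sync(local_tree, remote_tree)
print(transfer)  # prints: ['app.py', 'new.py']
print(delete)    # prints: ['old.py']
```

The unchanged README.md appears in neither list, which is why incremental syncs after the first full sync move so little data.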
Can the agent run multiple commands on the same testbox?
Yes. The testbox stays warm until the idle timeout is reached or the agent calls stop. The agent can issue as many run commands as needed.
What is the idle timeout?
The default idle timeout is 30 minutes. If no run command is issued within that window, the testbox is automatically stopped. The timeout can be configured via --idle-timeout during warmup.
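The timeout rule can be written as a small predicate. This is a sketch under stated assumptions: the window is measured from the most recent run command, and a testbox counts as expired exactly at the boundary. The page does not specify either detail, and the function name is_idle_expired is hypothetical.

```python
from datetime import datetime, timedelta

# Default from this page: 30 minutes.
DEFAULT_IDLE_TIMEOUT = timedelta(minutes=30)


def is_idle_expired(
    last_run_at: datetime,
    now: datetime,
    idle_timeout: timedelta = DEFAULT_IDLE_TIMEOUT,
) -> bool:
    """True if no run command has been issued within the idle window."""
    # Assumption: the boundary itself counts as expired.
    return now - last_run_at >= idle_timeout


last_run = datetime(2025, 1, 1, 12, 0)
print(is_idle_expired(last_run, datetime(2025, 1, 1, 12, 29)))  # prints: False
print(is_idle_expired(last_run, datetime(2025, 1, 1, 12, 30)))  # prints: True
```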
Do I need to set up SSH keys?
No. When --ssh-public-key is not specified during warmup, the CLI generates a keypair and caches it at ~/.blacksmith/testboxes/{id}/. You don’t need to manage SSH keys yourself.
Where are credentials stored?
Authentication tokens are stored at ~/.blacksmith/credentials. SSH keypairs are cached at ~/.blacksmith/testboxes/{id}/.