Overview

Blacksmith Testboxes let coding agents validate local changes against a CI-like environment, instantly. Local changes are synced to a Blacksmith microVM, and commands execute inside a live GitHub Actions job with full access to secrets, OIDC tokens, and service containers. Results stream back for the agent to interpret and continue iterating.
Testbox is not designed for humans to invoke directly. It is a machine interface for AI coding agents (Claude Code, Cursor, Codex, Devin, and others) to call programmatically as part of their edit-test-fix loop. Humans configure Testbox once during onboarding; after that, agents own the workflow.
Since agents are the primary consumer, the CLI is intentionally verbose and explicit. Every flag is spelled out, every ID is passed explicitly, and there are no implicit defaults that require human intuition.

When to use Testboxes

  • Parallel agents in worktrees. Running multiple coding agents across parallel git worktrees, using tools like Codex or Conductor, where each agent needs to test its own changes in isolation before creating a PR. Each agent warms up its own testbox and runs tests independently without interfering with the others.
  • Cross-platform development. Developing on macOS but targeting Linux. The code depends on Linux-specific behavior, system libraries, or services that don’t exist on macOS. Testboxes run on Linux microVMs, so agents can validate builds and tests against the real target environment.
  • CI environment without the wait. Any non-trivial change usually needs the agent to run tests, which require access to secrets, OIDC tokens, and service containers (Postgres, Redis, etc.). Testboxes provide a full GitHub Actions environment without pushing a commit and polling CI.
  • Fast iteration loops. After initial hydration, subsequent runs take 1-3 seconds. Agents can fix a failing test and re-run only what changed, instead of pushing commits and waiting for a full CI pipeline to finish. The same warm testbox is reused and only changed files sync.

Installation

curl -fsSL https://get.blacksmith.sh | sh
The CLI auto-updates in the background. On every invocation it checks for new versions and downloads updates transparently.

Authentication

blacksmith auth login
Opens a browser to authenticate with GitHub via the Blacksmith dashboard. Saves a token to ~/.blacksmith/credentials. Required once per machine.

How it works

Testbox follows a two-step workflow: warmup, then run.
  1. Warmup. The agent calls blacksmith testbox warmup <workflow>. This dispatches a GitHub Actions workflow and returns a testbox ID immediately. The runner boots and hydrates in the background: installing dependencies, starting service containers, and running any setup steps defined in the workflow.
  2. Run. The agent calls blacksmith testbox run --id <id> "<command>". This syncs local changes to the testbox via rsync over SSH, executes the command, and streams stdout/stderr back. The agent inspects the output and exit code to determine pass/fail.
The warmup phase includes checking out the Git repo, setting up any Docker service containers, and installing dependencies (e.g., running npm install). Once the testbox is ready, run commands complete in 1-3 seconds since only changed files need to sync. The agent can issue as many run commands as needed against the same warm testbox.
From the client’s perspective: Warm up → Runner hydrates (installs deps, starts services) → Runner ready → Agent runs commands (each run syncs state, executes, and streams output).
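The mirroring behavior of the sync step can be tried locally. The sketch below syncs one temporary directory into another using the rsync flags that the run reference lists (--delete --checksum). The -a flag, the temporary directories, and the mirror_demo function name are assumptions added for illustration; this is not the CLI's actual invocation, which runs over SSH against the testbox.

```shell
# Local demonstration of rsync --delete --checksum mirroring semantics.
# Two temporary directories stand in for the local tree and the testbox.
mirror_demo() {
  src=$(mktemp -d)
  dst=$(mktemp -d)
  printf 'one\n' > "$src/keep.txt"
  printf 'two\n' > "$src/remove.txt"

  # First sync copies both files (-a is an assumption, not from the docs).
  rsync -a --delete --checksum "$src/" "$dst/"

  # Delete one file locally and change the other, then sync again.
  rm "$src/remove.txt"
  printf 'changed\n' > "$src/keep.txt"
  rsync -a --delete --checksum "$src/" "$dst/"

  # Show what the mirror contains now, then the synced file content.
  ls "$dst"
  cat "$dst/keep.txt"
  rm -rf "$src" "$dst"
}
```

Calling mirror_demo lists only keep.txt in the destination and prints its updated content: the file deleted at the source is gone from the mirror, and the edited file has been re-transferred.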

Onboarding

blacksmith testbox init
Interactive TUI that sets up a testbox workflow for the current repository. This only needs to be run once per repo. The init command:
  1. Scans the repo’s .github/workflows/ for workflow files
  2. Prompts the user to select a workflow and job that has the dependencies and services they need
  3. Generates a testbox-compatible workflow from the selected job (keeps setup, strips execution, adds testbox actions)
  4. Writes .github/workflows/blacksmith-testbox.yml and .claude/skills/blacksmith-testbox/SKILL.md
  5. Optionally creates a PR
The generated workflow is triggered via workflow_dispatch and contains only setup steps (no test execution). The generated skill file teaches agents how to use Testbox in this repository.

CLI reference

blacksmith testbox warmup

Dispatches a testbox and returns an ID immediately. Required before any run command.
Usage:
  blacksmith testbox warmup <workflow> [flags]

Flags:
      --idle-timeout int        Idle timeout in minutes (default 10)
      --job string              Job name within the workflow
      --ref string              Git ref to dispatch against (default: repo's default branch)
      --ssh-public-key string   SSH public key to install on the testbox
| Flag | Description |
| --- | --- |
| --idle-timeout | Minutes of inactivity before the testbox is automatically stopped. Defaults to 10. |
| --job | Specific job within the workflow to run. Useful when the workflow defines multiple jobs. |
| --ref | Git ref (branch, tag, SHA) to dispatch the workflow against. Defaults to the repo’s default branch. |
| --ssh-public-key | Path to an SSH public key to install on the testbox. When omitted, a keypair is auto-generated and cached at ~/.blacksmith/testboxes/{id}/. |
The returned testbox ID is the handle for all subsequent commands.
blacksmith testbox warmup blacksmith-testbox.yml
# → tbx_01jkz5b3t9n8qr4xvwy0g6m2h1

blacksmith testbox warmup blacksmith-testbox.yml --ref feature/auth --job test-backend
# → tbx_01jkz6a2m4p7rs5ywx0h8n3c4d
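Since every later command needs the testbox ID, a script will usually want to capture it. The helper below is a minimal sketch under two assumptions that the documentation does not spell out: that warmup writes the ID to stdout, and that IDs consist of tbx_ followed by lowercase letters and digits, as in the examples above. The function name warmup_id is hypothetical.

```shell
# Start a testbox and print only its ID.
# Assumptions: the ID appears on stdout and looks like "tbx_" followed by
# lowercase letters and digits, as in the documented examples.
warmup_id() {
  workflow="$1"
  shift
  # Remaining arguments (e.g. --ref, --job) are passed through unchanged.
  blacksmith testbox warmup "$workflow" "$@" | grep -o 'tbx_[0-9a-z]*' | head -n 1
}
```

A call such as TESTBOX_ID=$(warmup_id blacksmith-testbox.yml --ref feature/auth) would then hold the handle for later run, status, and stop commands. An empty result means no ID was found in the output and should be treated as a failure.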

blacksmith testbox run

Syncs local changes and executes a command on the testbox. If the testbox is still hydrating, the command blocks until the testbox is ready.
Usage:
  blacksmith testbox run --id <id> "<command>" [flags]

Flags:
      --debug                    Show detailed sync timing information
      --id string                Testbox ID from warmup
      --ssh-private-key string   Path to SSH private key (use when warmup was called with --ssh-public-key)
| Flag | Description |
| --- | --- |
| --id | Testbox ID returned by warmup. Required. |
| --debug | Prints detailed rsync timing and transfer statistics. |
| --ssh-private-key | Path to the SSH private key. Only needed when warmup was called with --ssh-public-key. |
File sync uses rsync --delete --checksum to mirror the local working tree to the testbox. Files deleted locally are also removed from the testbox, so after each sync the testbox matches local state. The command exits with the remote command’s exit code, which agents can check to determine pass/fail.
blacksmith testbox run --id tbx_01jkz5b3t9... "npm test"
blacksmith testbox run --id tbx_01jkz5b3t9... "go test ./pkg/api/... -run TestHandler -v"
blacksmith testbox run --id tbx_01jkz5b3t9... "cd backend && php artisan test --filter=HealthCheckTest"
blacksmith testbox run --id tbx_01jkz5b3t9... "python -m pytest tests/test_api.py -k test_auth"
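Because run exits with the remote command's exit code, pass/fail handling can be an ordinary shell conditional. The wrapper below is a sketch; the function name run_check and its output format are illustrative and not part of the CLI.

```shell
# Run a command on a warm testbox and summarize the result.
# Relies on `blacksmith testbox run` exiting with the remote command's code.
run_check() {
  id="$1"
  cmd="$2"
  if blacksmith testbox run --id "$id" "$cmd"; then
    echo "PASS: $cmd"
  else
    # In the else branch, $? still holds the exit status of the run command.
    code=$?
    echo "FAIL (exit $code): $cmd"
    return "$code"
  fi
}
```

The wrapper returns the same exit code it received, so a caller can still branch on it, for example run_check "$TESTBOX_ID" "npm test" || echo "needs another fix".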

blacksmith testbox status

Shows the current status of a testbox. Supports --wait to block until the testbox is ready.
Usage:
  blacksmith testbox status [flags]

Flags:
      --id string             Testbox ID to look up
      --wait                  Block until the testbox is ready
      --wait-timeout string   Maximum time to wait (e.g., 5m, 10m, 1h) (default "5m")
| Flag | Description |
| --- | --- |
| --id | Testbox ID returned by warmup. Required. |
| --wait | Block until the testbox reaches ready status. |
| --wait-timeout | Maximum duration to wait before timing out. Defaults to 5m. |
Testbox statuses progress through: queued → hydrating → ready → completed.
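An agent that wants to confirm readiness before doing other work can wrap status --wait in a small helper. This sketch assumes that status exits non-zero when the wait times out or the testbox never becomes ready; the documentation does not state the exit codes, and the function name wait_until_ready is hypothetical.

```shell
# Block until a testbox is ready, or report that it did not get there.
# Assumption: `status --wait` exits non-zero on timeout or failure.
wait_until_ready() {
  id="$1"
  timeout="${2:-5m}"   # same default as --wait-timeout
  if blacksmith testbox status --id "$id" --wait --wait-timeout "$timeout"; then
    echo "ready: $id"
  else
    echo "not ready within $timeout: $id" >&2
    return 1
  fi
}
```

This is optional in most flows, since run already blocks until the testbox is ready.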

blacksmith testbox stop

Stops a running testbox and cancels the underlying GitHub Actions run.
Usage:
  blacksmith testbox stop --id <id> [flags]

Flags:
      --id string   Testbox ID to stop

blacksmith testbox init

Interactive onboarding TUI. Sets up a testbox workflow and agent skill for the current repository.

blacksmith auth login

Authenticates with Blacksmith. Opens a browser for the OAuth flow.

Agent integration

The init command generates a Claude Code skill at .claude/skills/blacksmith-testbox/SKILL.md that teaches agents to:
  1. Warm up immediately when receiving a coding task. Start the testbox while the agent begins writing code.
  2. Never run tests locally. Always run through Testbox.
  3. Reuse the testbox ID for all subsequent run commands within the same task.
  4. Fix and re-run when tests fail. Because the same warm testbox is reused and only changed files sync, re-runs take 1-3 seconds.
  5. Stop the testbox when done, or let the idle timeout handle cleanup.

Example agent workflow

# Agent receives a task, immediately warms up
blacksmith testbox warmup blacksmith-testbox.yml
# → tbx_01abc...

# Agent writes code while testbox hydrates...

# Agent runs tests
blacksmith testbox run --id tbx_01abc... "npm test"

# Tests fail, agent fixes code, re-runs (~1-3s)
blacksmith testbox run --id tbx_01abc... "npm test"

# Green. Agent commits and pushes.
git add -A && git commit -m "fix: handler timeout" && git push
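The same flow can be wrapped so that the testbox is always stopped explicitly instead of waiting for the idle timeout. The function below is a sketch, not the generated skill's logic: with_testbox is a hypothetical name, the 5-minute idle timeout is an arbitrary choice, and the ID parsing assumes warmup prints an ID in the tbx_... form to stdout.

```shell
# Warm up a testbox, run one command, and always stop the testbox afterwards.
# Assumption: warmup prints an ID of the form tbx_<lowercase letters/digits>.
with_testbox() {
  workflow="$1"
  cmd="$2"
  id=$(blacksmith testbox warmup "$workflow" --idle-timeout 5 \
    | grep -o 'tbx_[0-9a-z]*' | head -n 1)
  if [ -z "$id" ]; then
    echo "could not parse a testbox ID from warmup output" >&2
    return 1
  fi

  # run blocks until the testbox is ready, so no status polling is needed.
  code=0
  blacksmith testbox run --id "$id" "$cmd" || code=$?

  # Stop explicitly, whether the command passed or failed.
  blacksmith testbox stop --id "$id"
  return "$code"
}
```

For an edit-test-fix loop the stop call would move to the end of the task, so that repeated runs reuse the warm testbox.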

Pricing

Testbox execution is billed per-minute on the same Blacksmith runner pricing as CI. The idle timeout (default 10 minutes) ensures testboxes don’t run indefinitely.

FAQ

What happens if run is called before the testbox is ready?
The run command automatically waits for the testbox to reach ready status before syncing files and executing the command. The agent does not need to poll status separately.

How does file sync work?
The CLI uses rsync --delete --checksum over SSH to mirror the local working tree to the testbox. Only changed files are transferred, and files deleted locally are removed from the testbox. After the initial full sync, incremental syncs typically complete in under a second.

Can multiple commands run on the same testbox?
Yes. The testbox stays warm until the idle timeout is reached or the agent calls stop. The agent can issue as many run commands as needed.

How long does a testbox stay alive?
The default idle timeout is 10 minutes. If no run command is issued within that window, the testbox is automatically stopped. The timeout can be configured via --idle-timeout during warmup.

Do SSH keys need to be managed manually?
No. When --ssh-public-key is not specified during warmup, the CLI auto-generates a keypair and caches it at ~/.blacksmith/testboxes/{id}/. SSH key management is fully automatic.

Where are credentials stored?
Authentication tokens are stored at ~/.blacksmith/credentials. SSH keypairs are cached at ~/.blacksmith/testboxes/{id}/.
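For setups that prefer a caller-managed key over the auto-generated one, the two SSH flags are used as a pair: the public key at warmup, the matching private key on each run. The helpers below are a sketch; the function names and the key paths in the usage note are hypothetical, and --ssh-public-key is treated as a file path, following the flag table.

```shell
# Warm up with a caller-managed public key (path to the .pub file).
warmup_with_key() {
  workflow="$1"
  public_key="$2"
  blacksmith testbox warmup "$workflow" --ssh-public-key "$public_key"
}

# Run a command using the private key that matches the warmup key.
run_with_key() {
  id="$1"
  private_key="$2"
  cmd="$3"
  blacksmith testbox run --id "$id" --ssh-private-key "$private_key" "$cmd"
}
```

With a keypair at a hypothetical path such as ~/.ssh/testbox and ~/.ssh/testbox.pub, warmup_with_key receives the .pub path and every later run_with_key call receives the private key path.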