> ## Documentation Index
> Fetch the complete documentation index at: https://docs.blacksmith.sh/llms.txt
> Use this file to discover all available pages before exploring further.

# Monitors

> The fastest way to debug your CI jobs

## Overview

Monitors watch your GitHub Actions workflows and notify your team in Slack when a job fails, a step is skipped, or a run exceeds a duration threshold. Each monitor encodes a rule you would otherwise track manually, such as alerting after three consecutive failures on `main`, alerting whenever a specific step is skipped, or alerting when a deploy runs longer than 20 minutes.

You can also enable [VM retention](#vm-retention), which keeps the runner VM alive when an alert fires so you can SSH in and inspect the failure directly.

To get started, [connect your Slack workspace](https://app.blacksmith.sh/settings?tab=integrations).

<img src="https://mintcdn.com/blacksmith/WaKckQrngeBOTwws/images/duration-monitors-screen.png?fit=max&auto=format&n=WaKckQrngeBOTwws&q=85&s=4f6492dcb436ec38f96069a857ac4ed9" alt="Monitor list showing active monitors and their status" width="3343" height="2254" data-path="images/duration-monitors-screen.png" />

## Basics

### Creating a monitor

Open **Monitors** in the sidebar and click **New Monitor**. The wizard walks you through the repository, workflow, job, optional step, alert condition, Slack channels, and review step.

<video autoPlay muted loop playsInline controls width="100%">
  <source src="https://mintcdn.com/blacksmith/WaKckQrngeBOTwws/images/duration-monitors.mp4?fit=max&auto=format&n=WaKckQrngeBOTwws&q=85&s=8f226d9c9ef607712662cb3969ae4621" type="video/mp4" data-path="images/duration-monitors.mp4" />
</video>

A monitor can watch an entire job or a single step. You can also narrow it to specific workflows, branches, or naming patterns when you only care about part of a CI pipeline.

### Alert condition

Monitors support two main trigger types:

* **Result** - alert after a job or step finishes with a matching result, such as a failure, cancellation, skipped step, or success.
* **Duration** - alert while a job or step is still running, once it exceeds the threshold you set. Duration monitors are useful for hangs, unexpectedly slow tests, and deploys that should finish within a known window. They can fire even if the run eventually succeeds.

### Log pattern filters

For result-based monitors, you can require the logs to match a pattern before an alert fires. This is useful when you only care about a specific error, such as `OutOfMemory` or `SIGKILL`, instead of every failure.

## Notifications

When a monitor fires, Blacksmith posts a Slack alert with the workflow context and a direct link back to the job or step in GitHub Actions.

<img src="https://mintcdn.com/blacksmith/WaKckQrngeBOTwws/images/duration-monitors-slack-message.png?fit=max&auto=format&n=WaKckQrngeBOTwws&q=85&s=cc9e87fbe43ce57839185dd90106568c" alt="Slack alert message with severity badge and action buttons" width="1444" height="824" data-path="images/duration-monitors-slack-message.png" />

## Managing monitors

You can mute a monitor to pause alerts, resolve it to clear the alert state and reset cooldown, or edit its filters, condition, notification channels, and retention settings from the dashboard.

## VM retention

When an alert fires, a monitor can keep the VM alive so you can SSH in and investigate the failure. This is useful when logs are not enough and you want to inspect the exact environment where the job failed or got stuck.

For more on connecting to retained VMs, see [SSH Access](/blacksmith-observability/ssh-access).

## FAQ

<AccordionGroup>
  <Accordion title="Can I monitor specific steps within a job?">
    Yes. Add a step name filter when you create the monitor. Both exact matches and glob patterns such as `Run tests*` are supported.
  </Accordion>

  <Accordion title="Can I alert on jobs or steps that take too long?">
    Yes. Select **Duration** as the alert condition and set your threshold in seconds. Because duration monitors run while the job or step is still in progress, they catch a slow run even if it eventually succeeds.
  </Accordion>

  <Accordion title="How do I avoid alert fatigue?">
    Several settings help reduce noise:

    * Use consecutive events to alert only after N failures in a row
    * Increase the cooldown to limit how often a monitor can fire
    * Mute monitors during planned maintenance windows
    * Scope monitors to the workflows, jobs, and branches that matter
    * Add log pattern filters so only relevant errors trigger an alert
  </Accordion>

  <Accordion title="How can I get help with monitors?">
    Open a [support ticket](https://app.blacksmith.sh/?support=open) and our team will assist you.
  </Accordion>
</AccordionGroup>
