Error handling in Bosun tasks
Bosun's runtime surfaces step failures immediately, but you control whether those failures should stop the task. Every step honours the continue_on_error flag so you can keep the workflow moving while still recording detailed diagnostics.
When using tasks as graphs, you can also configure error handling at the edge level. This enables branching based on success or failure states. See Task Graphs and Branching for details.
Default behaviour
If a step errors and continue_on_error is omitted (or set to false), execution stops at that point. Typical failure modes include:
- agent: the agent calls the fail tool, instruction rendering fails, or the model session errors.
- run: the shell command exits with a non-zero status or the executor encounters an error.
- prompt / structured_prompt: templating fails, the model call errors, or (for structured prompts) the response cannot be validated against the schema.
- for_each: any iteration fails while continue_on_error is disabled; in-flight work is cancelled and the error bubbles out.
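As a minimal sketch of the default, the fragment below leaves continue_on_error off the first step. The step ids and the npm scripts are hypothetical and chosen only for illustration.

```yaml
steps:
  - id: build
    # continue_on_error is omitted, so a non-zero exit stops the task here.
    run: npm run build
  - id: announce
    # Only reached when the build step succeeds.
    prompt: |
      The build succeeded. Write a one-line release note.
```

With this layout, a failing build means the announce step never runs.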
Continuing after failures
Set continue_on_error: true on any step to log the failure and continue with the rest of the task.
steps:
  - id: format_sources
    run: npm run fmt
    continue_on_error: true
  - id: notify
    prompt: |
      Formatting left {{ errors.format_sources | length }} issues.
      Details:
      {{ errors.format_sources | json_encode(pretty=true) }}
What gets recorded
When a step continues after an error:
- The step's output is whatever the failure produced. If an agent fails, the payload from the task_failed tool (when present) becomes the step output. If a run step fails, the captured stdout and stderr fields are preserved.
- The same object is appended to the errors collection in the template context. Access it either by step index (errors.0) or by id (errors.format_sources). Each entry is an array because a for_each step can emit multiple failures.
- for_each adds extra metadata for each failing iteration, including the index and the rendered input.
You can use this structured data to branch on specific failure types, produce concise summaries, or feed the details into a follow-up agent.
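One way to branch on a recorded failure is sketched below. It assumes the template engine accepts Tera-style `{% if %}` blocks (suggested by the json_encode(pretty=true) filter used earlier, but not confirmed here) and that an undefined errors entry is treated as false inside a condition. The triage step id is hypothetical.

```yaml
steps:
  - id: format_sources
    run: npm run fmt
    continue_on_error: true
  - id: triage
    prompt: |
      {% if errors.format_sources %}
      Formatting failed. First recorded failure:
      {{ errors.format_sources | first | json_encode(pretty=true) }}
      {% else %}
      Formatting succeeded; there is nothing to triage.
      {% endif %}
```

Guarding the lookup keeps the prompt renderable when format_sources succeeds and no errors entry exists for it.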
Designing resilient workflows
- Enable continue_on_error on steps where a failure should not block the workflow, then add follow-up steps that inspect errors.<step> to decide on remediation.
- Combine templating helpers like json_encode, length, or first to present concise summaries to humans or agents.
- For for_each, collect the failures and spin up a targeted task (or rerun the loop) with the inputs that still need work.
- For convenience, outputs from failed steps are still available under outputs.<step>, so you can mix-and-match success and failure data as needed.
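The first, second, and last points above might combine as in the following sketch, which hands a failed step's output to a follow-up agent. The step ids and the npm test command are hypothetical, and the `{% if %}` guard assumes Tera-style conditionals are available in agent instructions.

```yaml
steps:
  - id: run_tests
    run: npm test
    continue_on_error: true
  - id: remediate
    agent:
      extends: Coding
      instructions: |
        {% if errors.run_tests %}
        The test run recorded {{ errors.run_tests | length }} failure(s).
        Output captured from the failed step:
        {{ outputs.run_tests | json_encode(pretty=true) }}
        Propose a fix for the failing tests.
        {% else %}
        All tests passed. Reply with a brief confirmation.
        {% endif %}
```

Reading the failure count from errors.run_tests and the raw output from outputs.run_tests keeps the summary short while still giving the agent the full details.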
Structured stop/fail payloads
Agents expose two schema-aware tools:
- stop returns data that matches the schema you attach with stop_schema. Without overrides the tool expects a single output string, but you can request richer data (arrays, booleans, nested objects) when needed.
- task_failed follows the schema in fail_schema. The runtime records that JSON in the step output and in errors.<step>[i].reason, so you can branch on specific keys without parsing free-form prose.
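A stop_schema that requests richer data could look like the sketch below. It assumes stop_schema is declared in the same place as fail_schema and accepts the same JSON Schema keywords. The step id, instructions, and property names are hypothetical.

```yaml
steps:
  - id: audit_dependencies
    agent:
      extends: Coding
      instructions: "List the outdated dependencies in this repository."
      # Assumption: stop_schema sits alongside fail_schema on the agent.
      stop_schema:
        type: object
        required: [outdated, needs_review]
        properties:
          outdated:
            type: array
            items:
              type: string
          needs_review:
            type: boolean
```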
Example: capture retry signals whenever the agent cannot complete its work.
steps:
  - id: stabilize_tests
    agent:
      extends: Coding
      instructions: "Fix the flaky tests listed in the issue."
      fail_schema:
        type: object
        required: [summary, retryable]
        properties:
          summary:
            type: string
          retryable:
            type: boolean
If the agent emits:
{
  "summary": "CI cannot reach the staging database",
  "retryable": false
}
then outputs.stabilize_tests.summary mirrors the JSON, and errors.stabilize_tests[0].reason.retryable is false. A follow-up prompt step can check that field and notify an operator only when manual help is required.
Bosun also preserves any successful output it had already stored before the failure, so enabling continue_on_error no longer deletes useful data.
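The follow-up check on retryable described above might be sketched as follows. It assumes continue_on_error is set on the agent step so the task reaches the next step, and that Tera-style `{% if %}` conditions with and / not are supported. The escalate step id is hypothetical.

```yaml
steps:
  - id: stabilize_tests
    continue_on_error: true
    agent:
      extends: Coding
      instructions: "Fix the flaky tests listed in the issue."
      fail_schema:
        type: object
        required: [summary, retryable]
        properties:
          summary:
            type: string
          retryable:
            type: boolean
  - id: escalate
    prompt: |
      {% if errors.stabilize_tests and not errors.stabilize_tests[0].reason.retryable %}
      Draft a short message asking an operator for help.
      Failure summary: {{ errors.stabilize_tests[0].reason.summary }}
      {% else %}
      No operator action is needed.
      {% endif %}
```

Because the condition reads a boolean from the structured payload, the escalation logic does not depend on how the agent phrases its summary.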