Upstream Source

This page is part of Truthound Orchestration 3.x.

Source repository: seadonggyun4/truthound-orchestration

Upstream docs path: docs/kestra/task-runner-retries.md

Kestra Task Runners and Retries

Truthound does not override Kestra retry semantics. Instead, generated flows and script helpers expose the places where retries belong so teams can keep execution policy aligned with the rest of their Kestra estate.

Who This Is For

  • Kestra operators defining retry policy for validation flows
  • platform teams using generated flow templates
  • engineers separating transient infrastructure failures from data failures

When To Use It

Use this page when:

  • generated flows need retry configuration
  • a script task should be rerun on transient issues
  • the team is deciding what the task runner, not the validation engine, should own

Prerequisites

  • a Kestra deployment with task runners
  • familiarity with generated flow config and RetryConfig
  • a clear policy for transient failure vs deterministic quality failure

Minimal Quickstart

Generated flows accept retry configuration:

from truthound_kestra import RetryConfig, generate_check_flow

# Generate the flow definition with a retry policy capped at three attempts.
yaml_content = generate_check_flow(
    flow_id="users_quality",
    namespace="production",
    retry=RetryConfig(max_attempts=3),
)

Production Pattern

Recommended policy:

| Failure Type | Retry? | Owner |
|---|---|---|
| transient runner/storage/network issue | yes, cautiously | Kestra retry policy |
| deterministic bad data | no | quality result and operator response |
| unsupported source or config error | no | fix configuration first |
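
The policy table can be mirrored in a small classification step inside a script task. The following is a minimal sketch, not part of Truthound or Kestra: `FailureKind`, `Policy`, `DataQualityError`, `ConfigurationError`, `classify`, and `should_retry` are hypothetical names, and the mapping from Python exception types to failure kinds is an assumption that each team would adjust.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    TRANSIENT = "transient runner/storage/network issue"
    BAD_DATA = "deterministic bad data"
    CONFIG = "unsupported source or config error"


@dataclass(frozen=True)
class Policy:
    retry: bool
    owner: str


# Mirrors the three rows of the recommended policy.
POLICY = {
    FailureKind.TRANSIENT: Policy(True, "Kestra retry policy"),
    FailureKind.BAD_DATA: Policy(False, "quality result and operator response"),
    FailureKind.CONFIG: Policy(False, "fix configuration first"),
}


class DataQualityError(Exception):
    """Hypothetical: validation ran and the data failed its checks."""


class ConfigurationError(Exception):
    """Hypothetical: unsupported source or invalid configuration."""


def classify(exc: BaseException) -> Optional[FailureKind]:
    """Map an exception to a failure kind, or None if it is unrecognized."""
    if isinstance(exc, DataQualityError):
        return FailureKind.BAD_DATA
    if isinstance(exc, ConfigurationError):
        return FailureKind.CONFIG
    # Assumption: built-in connection and timeout errors are transient.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return FailureKind.TRANSIENT
    return None


def should_retry(exc: BaseException) -> bool:
    """Return True only for failure kinds the policy marks as retryable."""
    kind = classify(exc)
    # Assumption: unrecognized errors are not retried.
    return kind is not None and POLICY[kind].retry
```

Defaulting unrecognized errors to "no retry" keeps the sketch conservative: a new failure type has to be classified explicitly before it can consume retry attempts.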

Checklist:

  • retry infrastructure failures, not broken data
  • keep retry counts low for expensive quality tasks
  • pair retries with clear logging so the final failure is diagnosable
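
To make the "keep retry counts low" item concrete, the worst-case time a task can occupy a runner is simple arithmetic. This is a rough sketch that assumes a fixed wait between attempts and that every attempt runs for the full task duration; `worst_case_seconds` and the figures used are hypothetical.

```python
def worst_case_seconds(
    task_seconds: float, max_attempts: int, interval_seconds: float
) -> float:
    """Upper bound on elapsed time when every attempt fails at full duration."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # There is one wait between each pair of consecutive attempts.
    waits = max_attempts - 1
    return task_seconds * max_attempts + interval_seconds * waits


# Compare a 10-minute validation task under different attempt limits.
for attempts in (1, 3, 10):
    print(attempts, worst_case_seconds(600, attempts, 30))
```

Under these assumptions, a 10-minute check with three attempts and 30-second waits can hold a runner for up to 1,860 seconds (31 minutes), and ten attempts raise that bound to 6,270 seconds.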

Failure Modes and Troubleshooting

| Symptom | Likely Cause | What To Do |
|---|---|---|
| retries never help | the failure is deterministic data quality | remove retries and surface the result directly |
| validation saturates the runner | retry counts are too high for expensive tasks | lower attempts and improve gating |
| root cause is hard to inspect | retries overwrite context without clear logs | emit metrics and structured outputs on each attempt |
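
For the last row, one way to keep each attempt diagnosable is to print a structured record per attempt. The sketch below assumes the script runs in a Kestra script task that parses outputs and metrics written to stdout inside a `::{...}::` envelope; confirm that behavior against the Kestra version in use. The function name, output keys, and metric name are hypothetical, and how the attempt number reaches the script is left to the flow.

```python
import json
from typing import List


def attempt_lines(attempt: int, status: str, detail: str) -> List[str]:
    """Build stdout lines describing one attempt: an outputs line and a metrics line."""
    outputs = {"outputs": {"attempt": attempt, "status": status, "detail": detail}}
    metrics = {
        "metrics": [
            {
                # Hypothetical metric name; one increment per attempt.
                "name": "validation_attempts",
                "type": "counter",
                "value": 1,
                "tags": {"status": status},
            }
        ]
    }
    # Each payload is wrapped in the "::" envelope on its own line.
    return ["::" + json.dumps(payload) + "::" for payload in (outputs, metrics)]


for line in attempt_lines(2, "transient_error", "storage timeout while reading input"):
    print(line)
```

Because every attempt writes its own record, the final failure can be read alongside the earlier attempts instead of replacing them.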