
Lineage Commands

Data lineage tracking and visualization commands.

Overview

| Command | Description | Primary Use Case |
|---------|-------------|------------------|
| show | Display lineage information | Explore dependencies |
| impact | Analyze change impact | Change management |
| visualize | Generate lineage visualization | Documentation |

What is Data Lineage?

Data lineage tracks the flow of data through your systems:

  • Upstream: Where data comes from (sources)
  • Downstream: Where data goes (consumers)
  • Transformations: How data is modified
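To make the upstream/downstream distinction concrete, here is a minimal Python sketch that walks a list of edges in either direction. It is an illustration only, not Truthound's implementation: the `reachable` helper and the `EDGES` list are hypothetical, with the edges taken from the example file in the Lineage File Format section.

```python
from collections import defaultdict, deque

# Edges taken from the example lineage file in this document: (source, target).
EDGES = [
    ("raw_data", "cleaned_data"),
    ("cleaned_data", "analytics_table"),
]


def reachable(edges, start, direction="downstream"):
    """Return every node reachable from `start` (hypothetical helper).

    "downstream" follows edges source -> target (consumers);
    "upstream" follows them target -> source (sources).
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        if direction == "downstream":
            adjacency[source].add(target)
        else:
            adjacency[target].add(source)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour != start and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


upstream = reachable(EDGES, "analytics_table", direction="upstream")
downstream = reachable(EDGES, "raw_data", direction="downstream")
```

Here `upstream` holds the two nodes that feed `analytics_table`, and `downstream` holds the two nodes that consume `raw_data`, directly or indirectly.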

Lineage File Format

Lineage information is stored in JSON format:

{
  "version": "1.0",
  "nodes": [
    {
      "id": "raw_data",
      "type": "source",
      "name": "Raw Data",
      "metadata": {
        "path": "s3://bucket/raw/",
        "format": "parquet"
      }
    },
    {
      "id": "cleaned_data",
      "type": "transformation",
      "name": "Cleaned Data",
      "metadata": {
        "path": "s3://bucket/cleaned/",
        "transformation": "data_cleaning_job"
      }
    },
    {
      "id": "analytics_table",
      "type": "table",
      "name": "Analytics Table",
      "metadata": {
        "database": "analytics",
        "table": "user_metrics"
      }
    }
  ],
  "edges": [
    {
      "source": "raw_data",
      "target": "cleaned_data"
    },
    {
      "source": "cleaned_data",
      "target": "analytics_table"
    }
  ]
}
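Before passing a hand-written file to the CLI, it can help to check that every edge points at a declared node. The sketch below uses only the standard library and the field names shown above. The `check_lineage` function is hypothetical, and the uniqueness rule for node ids is an assumption rather than a documented Truthound requirement.

```python
import json

# A trimmed copy of the example above (metadata omitted for brevity).
LINEAGE_JSON = """
{
  "version": "1.0",
  "nodes": [
    {"id": "raw_data", "type": "source", "name": "Raw Data"},
    {"id": "cleaned_data", "type": "transformation", "name": "Cleaned Data"},
    {"id": "analytics_table", "type": "table", "name": "Analytics Table"}
  ],
  "edges": [
    {"source": "raw_data", "target": "cleaned_data"},
    {"source": "cleaned_data", "target": "analytics_table"}
  ]
}
"""


def check_lineage(doc):
    """Return a list of problems found in a lineage document (hypothetical helper)."""
    problems = []
    ids = [node["id"] for node in doc.get("nodes", [])]
    known = set(ids)
    # Assumption: node ids are expected to be unique within one file.
    if len(ids) != len(known):
        problems.append("duplicate node ids")
    # Every edge endpoint should name a declared node.
    for edge in doc.get("edges", []):
        for end in ("source", "target"):
            if edge.get(end) not in known:
                problems.append(f"edge {end} {edge.get(end)!r} is not a declared node")
    return problems


doc = json.loads(LINEAGE_JSON)
problems = check_lineage(doc)
```

For the example file, `problems` is an empty list. An edge that targets an undeclared id would add one message per dangling endpoint.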

Workflow

graph LR
    A[Lineage File] --> B{Task?}
    B -->|Explore| C[lineage show]
    B -->|Impact| D[lineage impact]
    B -->|Document| E[lineage visualize]
    C --> F[Console/JSON/DOT]
    D --> G[Impact Report]
    E --> H[HTML/SVG/Mermaid]

Quick Examples

Show Lineage

# Show all lineage
truthound lineage show lineage.json

# Show upstream of a specific node
truthound lineage show lineage.json --node analytics_table --direction upstream

# Export as DOT format
truthound lineage show lineage.json --format dot > lineage.dot

Analyze Impact

# Analyze impact of changing raw_data
truthound lineage impact lineage.json raw_data

# Limit analysis depth
truthound lineage impact lineage.json raw_data --max-depth 2
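Conceptually, impact analysis with a depth limit can be thought of as a breadth-first walk downstream that stops after a fixed number of hops. The Python sketch below illustrates that idea on the example edges. The `impacted` function is hypothetical, and the way it counts depth (direct consumers are at depth 1) is an assumption that may differ from how `--max-depth` is counted by the CLI.

```python
from collections import deque

# Edges taken from the example lineage file in this document: (source, target).
EDGES = [
    ("raw_data", "cleaned_data"),
    ("cleaned_data", "analytics_table"),
]


def impacted(edges, changed, max_depth=None):
    """Map each downstream node to its hop distance from `changed` (hypothetical helper)."""
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    depth = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the depth limit.
        if max_depth is not None and depth[node] >= max_depth:
            continue
        for child in children.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    del depth[changed]  # the changed node itself is not part of its own impact
    return depth


full_impact = impacted(EDGES, "raw_data")
shallow_impact = impacted(EDGES, "raw_data", max_depth=1)
```

With no limit, both downstream nodes are reported along with their distances. With `max_depth=1`, only the direct consumer `cleaned_data` remains.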

Visualize

# Generate interactive D3 visualization
truthound lineage visualize lineage.json -o graph.html

# Generate Graphviz diagram
truthound lineage visualize lineage.json -o graph.svg --renderer graphviz

# Generate Mermaid diagram
truthound lineage visualize lineage.json -o graph.md --renderer mermaid

Visualization Renderers

| Renderer | Output | Features |
|----------|--------|----------|
| d3 | HTML | Interactive, pan/zoom, tooltips |
| cytoscape | HTML | Interactive, layouts, search |
| graphviz | SVG/PNG | Static, publication quality |
| mermaid | Markdown | Embeddable in docs |
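To show roughly what a Mermaid rendering of a lineage file involves, the following Python sketch turns the `nodes` and `edges` arrays into Mermaid flowchart text. It is a simplified illustration under the file format documented above, not the output of the `mermaid` renderer. The `to_mermaid` function is hypothetical.

```python
def to_mermaid(doc, direction="LR"):
    """Render a lineage document as Mermaid flowchart text (hypothetical helper)."""
    lines = [f"graph {direction}"]
    # One labelled box per node, falling back to the id when no name is set.
    for node in doc["nodes"]:
        label = node.get("name", node["id"])
        lines.append(f'    {node["id"]}["{label}"]')
    # One arrow per edge, pointing from source to target.
    for edge in doc["edges"]:
        lines.append(f'    {edge["source"]} --> {edge["target"]}')
    return "\n".join(lines)


doc = {
    "nodes": [
        {"id": "raw_data", "name": "Raw Data"},
        {"id": "cleaned_data", "name": "Cleaned Data"},
    ],
    "edges": [{"source": "raw_data", "target": "cleaned_data"}],
}
diagram = to_mermaid(doc)
print(diagram)
```

The resulting text can be pasted into any Markdown page that renders Mermaid blocks.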

Node Types

| Type | Description | Icon |
|------|-------------|------|
| source | Raw data source | 📥 |
| table | Database table | 🗃️ |
| file | File-based data | 📄 |
| stream | Streaming source | 🌊 |
| transformation | Data transformation | ⚙️ |
| validation | Validation checkpoint | |
| model | ML model | 🤖 |
| report | Output report | 📊 |
| external | External system | 🔗 |
| virtual | Virtual/computed dataset | 💭 |
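A small pre-flight check can catch misspelled node types in a hand-edited lineage file. The sketch below compares each node's `type` against the ten types listed above. The `nodes_with_unknown_type` function is hypothetical, and whether Truthound itself rejects unlisted types is not stated here, so treat this as a lint rather than a rule.

```python
# The node types listed in the table above.
NODE_TYPES = frozenset({
    "source", "table", "file", "stream", "transformation",
    "validation", "model", "report", "external", "virtual",
})


def nodes_with_unknown_type(nodes):
    """Return ids of nodes whose type is missing or not in NODE_TYPES (hypothetical helper)."""
    return [node["id"] for node in nodes if node.get("type") not in NODE_TYPES]


nodes = [
    {"id": "raw_data", "type": "source"},
    {"id": "cleaned_data", "type": "transformation"},
    {"id": "daily_export", "type": "tabel"},  # deliberate typo for illustration
]
suspect = nodes_with_unknown_type(nodes)
```

In this example only `daily_export` is flagged, because `tabel` is not one of the listed types.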

Use Cases

1. Data Discovery

# Explore what feeds into a table
truthound lineage show lineage.json --node my_table --direction upstream

2. Change Impact Analysis

# Before changing a source, check what's affected
truthound lineage impact lineage.json source_table -o impact_report.json

3. Documentation

# Generate documentation for data catalog
truthound lineage visualize lineage.json -o docs/lineage.html --renderer d3 --theme light

4. Debugging

# Trace data flow to find issues
truthound lineage show lineage.json --node failed_table --direction both

Integration with OpenLineage

Truthound supports the OpenLineage standard for interoperability via the Python API:

from truthound.lineage.integrations.openlineage import OpenLineageEmitter

# Create emitter
emitter = OpenLineageEmitter()

# Start a run
run = emitter.start_run("my-job")

# Emit completion
emitter.emit_complete(run, outputs=[...])

OpenLineage API

OpenLineage integration is available through the Python API. CLI commands for import/export are planned for future releases.

Next Steps

  • show - Display lineage information
  • impact - Analyze change impact
  • visualize - Generate visualization

See Also