System Architecture¶

This document presents a systematic exposition of the Truthound Dashboard architecture, encompassing the system design, inter-component interactions, and the underlying architectural design rationale that informed key engineering decisions.

Overview¶

The Truthound Dashboard has been designed and implemented as a single-process application that consolidates web serving, API handling, task scheduling, and database operations into a unified runtime environment. This architectural decision was deliberately adopted to eliminate external dependencies such as Redis, Celery, or PostgreSQL, thereby achieving a zero-configuration deployment model that minimizes operational overhead and reduces deployment complexity.

┌─────────────────────────────────────────────────────────────────┐
│                    Truthound Dashboard                          │
│                    (Single Process Architecture)                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  $ truthound serve                                              │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                     Uvicorn                              │   │
│  │                  (ASGI Server)                           │   │
│  └──────────────────────┬──────────────────────────────────┘   │
│                         │                                       │
│         ┌───────────────┼───────────────┐                      │
│         │               │               │                      │
│         ▼               ▼               ▼                      │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐                │
│  │  FastAPI  │   │   React   │   │APScheduler│                │
│  │   (API)   │   │  (Static) │   │  (Cron)   │                │
│  └─────┬─────┘   └───────────┘   └─────┬─────┘                │
│        │                               │                       │
│        └───────────────┬───────────────┘                       │
│                        │                                       │
│                        ▼                                       │
│         ┌─────────────────────────────┐                       │
│         │          SQLite             │                       │
│         │   (~/.truthound/dashboard.db)                       │
│         └─────────────────────────────┘                       │
│                        │                                       │
│                        ▼                                       │
│         ┌─────────────────────────────┐                       │
│         │        truthound            │                       │
│         │      (Core Library)         │                       │
│         │  th.check, th.profile,      │                       │
│         │  th.learn, th.compare,      │                       │
│         │  th.scan, th.mask           │                       │
│         └─────────────────────────────┘                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Component Specifications¶

Backend Components¶

The backend subsystem is composed of the following constituent components, each of which fulfills a well-defined responsibility within the overall system architecture.

Component	Technology	Responsibility	Location
CLI	Typer	Command-line interface for `truthound serve` and `truthound translate`	`cli.py`
Web Server	Uvicorn	ASGI server providing single-process execution	`main.py`
API Layer	FastAPI	REST API endpoints and static file serving	`api/`
Database	SQLite + aiosqlite	Persistent storage for metadata and validation results	`db/`
ORM	SQLAlchemy 2.0	Asynchronous database operations	`db/models.py`
Scheduler	APScheduler	Cron-based task scheduling	`core/scheduler.py`
Core Engine	truthound	Data quality validation operations	`core/truthound_adapter.py`
Notifications	httpx, aiosmtplib	Multi-channel alert delivery (Slack, Email, Webhook)	`core/notifications/`
Cache	In-memory	API response caching for performance optimization	`core/cache.py`
Security	Fernet	Symmetric encryption for connection credentials	`core/encryption.py`
Schemas	Pydantic 2.x	Request and response validation	`schemas/`
Translation	AI Providers	Multi-language translation CLI	`translate/`
Enterprise Sampling	truthound 1.2.10+	Large-scale sampling strategies (Block, Multi-Stage, Column-Aware, Progressive)	`core/enterprise_sampling.py`

Frontend Components¶

The frontend subsystem has been constructed upon a modern single-page application architecture, leveraging the following technologies and frameworks.

Component	Technology	Responsibility
Framework	React 18	Single-page application architecture
Build System	Vite	Development server and production bundling
Styling	TailwindCSS	Utility-first CSS framework
UI Components	shadcn/ui	Radix-based accessible component library
State Management	Zustand	Lightweight reactive state management
Data Visualization	Recharts	Trend analysis and charting
Routing	React Router 6	Client-side navigation
Internationalization	Intlayer	Type-safe multi-language support
Theming	Zustand + Tailwind	Dark and light mode implementation
Lineage Visualization	ReactFlow, Cytoscape, Mermaid	Data lineage graph rendering

Directory Structure¶

The following directory hierarchy illustrates the organizational taxonomy of the codebase, reflecting the separation of concerns between backend services, frontend presentation, and supporting documentation.

truthound-dashboard/
├── src/truthound_dashboard/     # Backend (FastAPI)
│   ├── __init__.py
│   ├── __main__.py              # Entry point for python -m
│   ├── cli.py                   # CLI commands
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Configuration management
│   │
│   ├── api/                     # REST API endpoints
│   │   ├── sources.py           # Data source CRUD operations
│   │   ├── schemas.py           # Schema management (th.learn)
│   │   ├── validations.py       # Validation execution
│   │   ├── schedules.py         # Schedule management
│   │   ├── notifications.py     # Notification channels and rules
│   │   ├── notifications_advanced.py  # Routing, deduplication, throttling
│   │   ├── glossary.py          # Business glossary API
│   │   ├── catalog.py           # Data catalog API
│   │   ├── collaboration.py     # Comments and activities API
│   │   ├── anomaly.py           # ML-based anomaly detection
│   │   ├── lineage.py           # Data lineage tracking
│   │   ├── reports.py           # Multi-format report generation
│   │   ├── versioning.py        # Result versioning
│   │   ├── rule_suggestions.py  # AI-powered rule generation
│   │   ├── model_monitoring.py  # ML model performance monitoring
│   │   ├── maintenance.py       # Data retention and cleanup
│   │   ├── plugins.py           # Plugin marketplace
│   │   ├── triggers.py          # Event trigger system
│   │   ├── quality_reporter.py  # Quality scoring and reporting
│   │   ├── enterprise_sampling.py  # Enterprise-scale sampling API
│   │   └── health.py            # Health check endpoint
│   │
│   ├── core/                    # Business logic layer
│   │   ├── truthound_adapter.py # truthound library wrapper
│   │   ├── services.py          # Service layer implementation
│   │   ├── scheduler.py         # APScheduler configuration
│   │   ├── cache.py             # In-memory caching
│   │   ├── encryption.py        # Credential encryption
│   │   ├── maintenance.py       # Database cleanup operations
│   │   ├── versioning.py        # Result versioning logic
│   │   ├── notifications/       # Notification subsystem
│   │   │   ├── routing/         # Rule-based message routing
│   │   │   ├── deduplication/   # Duplicate notification prevention
│   │   │   ├── throttling/      # Rate limiting implementation
│   │   │   └── escalation/      # Multi-level alert escalation
│   │   ├── reporters/           # Report generation engines
│   │   │   ├── csv_reporter.py
│   │   │   ├── json_reporter.py
│   │   │   ├── markdown_reporter.py
│   │   │   ├── pdf_reporter.py
│   │   │   └── junit_reporter.py
│   │   ├── quality_reporter.py  # Quality scoring service
│   │   ├── enterprise_sampling.py  # Enterprise-scale sampling strategies
│   │   └── phase5/              # Glossary and catalog services
│   │
│   ├── db/                      # Database layer
│   │   ├── database.py          # SQLite connection management
│   │   ├── models.py            # SQLAlchemy model definitions
│   │   └── repository.py        # Data access layer
│   │
│   ├── schemas/                 # Pydantic model definitions
│   │   ├── source.py
│   │   ├── validation.py
│   │   ├── drift.py
│   │   ├── glossary.py
│   │   ├── catalog.py
│   │   ├── enterprise_sampling.py  # Enterprise sampling request/response models
│   │   └── validators/          # 150+ validator definitions
│   │
│   ├── translate/               # AI translation subsystem
│   │   ├── translator.py        # Translation orchestration
│   │   ├── config_updater.py    # Intlayer configuration updater
│   │   └── providers/           # AI provider implementations
│   │       ├── openai.py
│   │       ├── anthropic.py
│   │       ├── ollama.py
│   │       └── mistral.py
│   │
│   └── static/                  # React build output
│
├── frontend/                    # React source code
│   ├── src/
│   │   ├── pages/               # Page components
│   │   ├── components/          # UI components
│   │   ├── api/                 # API client
│   │   ├── hooks/               # Custom React hooks
│   │   ├── stores/              # Zustand state stores
│   │   ├── content/             # Intlayer translation files
│   │   ├── lib/                 # Utility functions
│   │   ├── providers/           # React context providers
│   │   └── types/               # TypeScript type definitions
│   └── intlayer.config.ts       # Intlayer configuration
│
└── docs/                        # Documentation

API Design¶

Base URL¶

All API endpoints are served under the following base URI, which establishes the versioned namespace for the RESTful interface.

http://localhost:8765/api/v1

Endpoint Categories¶

The API surface area is organized into the following functional categories, each of which encapsulates a logically cohesive set of operations.

Health Monitoring¶

GET  /health                    Server status verification

Data Source Management¶

GET  /sources                   List all data sources
POST /sources                   Create a new data source
GET  /sources/{id}              Retrieve source details
PUT  /sources/{id}              Update source configuration
DEL  /sources/{id}              Delete a data source
POST /sources/{id}/test         Test connection validity

Validation Operations¶

POST /sources/{id}/validate     Execute validation (th.check)
                                  — supports result_format (PHASE 1),
                                    catch_exceptions/max_retries (PHASE 5)
GET  /validations/{id}          Retrieve validation results
                                  — includes statistics (PHASE 2),
                                    validator_execution_summary (PHASE 4),
                                    exception_summary (PHASE 5)
GET  /sources/{id}/validations  Retrieve validation history

Data Profiling¶

POST /sources/{id}/profile      Execute data profiling (th.profile)
POST /sources/{id}/learn        Generate schema automatically (th.learn)

Drift Detection¶

POST /drift/compare             Compare two data sources (th.compare)

Validator Registry¶

GET  /validators                List 150+ available validators

Privacy Operations¶

POST /scans/sources/{id}/scan   Execute PII scanning (th.scan)
POST /masks/sources/{id}/mask   Execute data masking (th.mask)

Schedule Management¶

GET/POST /schedules             Manage scheduled validations

Notification System¶

GET/POST /notifications/channels    Manage notification channels
GET/POST /notifications/rules       Manage notification rules

Advanced Features¶

GET/POST /anomaly               ML-based anomaly detection
GET/POST /lineage               Data lineage management
GET/POST /glossary/terms        Business glossary
GET/POST /catalog/assets        Data catalog
GET/POST /reports               Report generation
GET/POST /model-monitoring      ML model monitoring
GET/POST /plugins               Plugin management
GET/POST /quality/*             Quality scoring and reporting
GET/POST /sampling/*            Enterprise sampling operations

Architectural Design Rationale¶

The selection of constituent technologies was governed by the overarching principle of operational simplicity, wherein each component was evaluated against the criterion of whether it could be embedded within a single-process runtime without introducing external service dependencies.

Selected Technology Stack¶

The following table enumerates the technologies that were selected for inclusion in the system architecture, together with the rationale underpinning each selection decision.

Component	Technology	Rationale
Web Server	Uvicorn	Single-process ASGI server with excellent performance
API Framework	FastAPI	High-performance framework with automatic OpenAPI generation
Database	SQLite	Zero-configuration embedded database
Scheduler	APScheduler	Cron-compatible scheduling within a single process
Frontend	React Static	Pre-built static files for simplified deployment

Architectural Exclusion Rationale¶

Conversely, several widely adopted technologies were deliberately excluded from the architecture. The following table documents each exclusion together with its justification, demonstrating that these omissions represent conscious design decisions rather than oversights.

Component	Rationale
Redis	SQLite provides sufficient functionality; eliminates external dependency
Celery	APScheduler meets scheduling requirements; reduces complexity
PostgreSQL	SQLite satisfies data persistence needs; maintains zero-configuration
WebSocket	HTTP polling provides adequate real-time functionality; reduces complexity
Prometheus/Grafana	Unnecessary for local deployment scenarios

Data Flow Topology¶

The following diagram illustrates the data flow topology through which user requests are propagated across the system's architectural layers. Requests are first subjected to schema validation at the API boundary, subsequently dispatched to the appropriate business logic handler, and ultimately routed to either the persistent storage layer or the truthound validation engine, depending on the nature of the operation.

User Request
     │
     ▼
┌─────────────┐
│   FastAPI   │ ← Request validation (Pydantic)
└─────┬───────┘
      │
      ▼
┌─────────────┐
│    API      │ ← Business logic execution
│  Handlers   │
└─────┬───────┘
      │
      ├──────────────────┐
      │                  │
      ▼                  ▼
┌───────────┐     ┌────────────┐
│  SQLite   │     │ truthound  │ ← Data validation engine
│    DB     │     │  adapter   │
└───────────┘     └────────────┘
                        │
                        ▼
                  ┌───────────┐
                  │   Data    │ ← CSV, Parquet, databases
                  │  Sources  │
                  └───────────┘

Truthound Core Engine Integration Architecture¶

The dashboard maintains a bidirectional integration with the Truthound core validation engine (v1.3.0), which has undergone a systematic five-phase enhancement programme. Each enhancement phase in the core library necessitated corresponding adaptations across the dashboard's backend adapter, result converter, Pydantic schema definitions, and frontend TypeScript type declarations. This section provides a formal specification of the integration architecture and its constituent components.

Integration Layer Topology¶

The integration between the dashboard and the Truthound core engine is mediated by a layered adapter architecture, wherein each layer fulfils a well-defined translation responsibility:

Frontend (React/TypeScript)
  └─ ValidationRunOptions           ← TypeScript interface (PHASE 1/5 params)
        │
        ▼
  └─ POST /sources/{id}/validate    ← REST API boundary
        │
        ▼
Backend (FastAPI)
  └─ ValidationRunRequest           ← Pydantic request schema
        │
        ▼
  └─ ValidationService              ← Parameter propagation (services.py)
        │
        ▼
  └─ TruthoundAdapter.check()       ← Core engine invocation
        │
        ▼
  └─ th.check(**kwargs)             ← truthound Python API
        │
        ▼
  └─ TruthoundResultConverter       ← Domain object → dict translation
        │
        ▼
  └─ CheckResult                    ← Dashboard-internal dataclass
        │
        ▼
  └─ result_json (SQLite)           ← Persistent storage (JSON column)
        │
        ▼
  └─ ValidationResponse             ← Pydantic response schema
        │
        ▼
  └─ Validation (TypeScript)        ← Frontend consumption

Phase-by-Phase Integration Specification¶

The following table enumerates the integration scope for each core engine enhancement phase, together with the specific dashboard files that were modified or extended:

Phase	Core Enhancement	Dashboard Integration Scope	Modified Files
PHASE 1	Result Format System (4-level progressive disclosure)	`result_format`, `include_unexpected_rows`, `max_unexpected_rows` parameter propagation through all layers	`schemas/validation.py`, `truthound_adapter.py`, `services.py`, `api/validations.py`, `validations.ts`, `SourceDetail.tsx`
PHASE 2	Structured Results (`ValidationDetail`, `ReportStatistics`)	`ValidationDetailResult`, `ReportStatistics`, `ValidationIssue` schema extensions; converter rewrite	`converters/truthound.py`, `truthound_adapter.py`, `schemas/validation.py`, `validations.ts`, `Validations.tsx`
PHASE 3	Metric Deduplication (`SharedMetricStore`)	No changes required — internal optimisation transparent to API consumers	—
PHASE 4	DAG Execution (dependency-based conditional validator scheduling)	`ValidatorExecutionSummary`, `SkippedValidatorInfo`; history comparison considerations	`truthound_adapter.py`, `schemas/validation.py`, `validations.ts`, `services.py`
PHASE 5	Exception Isolation (3-tier fallback, auto-retry, circuit breaker)	`catch_exceptions`, `max_retries` parameter propagation; `ExceptionInfo`, `ExceptionSummary` schemas	`converters/truthound.py`, `truthound_adapter.py`, `schemas/validation.py`, `validations.ts`, `services.py`, `api/validations.py`, `SourceDetail.tsx`

Backward Compatibility Strategy¶

The integration adheres to a principled backward compatibility protocol that ensures uninterrupted service during incremental upgrades:

Optional Field Declarations: All fields introduced through the enhancement phases are declared as Optional with None default values in both Pydantic schemas and TypeScript interfaces.
Pydantic Extra Ignore: model_config = ConfigDict(extra="ignore") is applied to all schema classes that receive data from the core engine, ensuring forward compatibility with future Truthound versions.
Defensive Attribute Access: The TruthoundResultConverter employs getattr(obj, "field", default) patterns throughout, gracefully handling absent fields in older engine versions.
Database Schema Stability: No SQLAlchemy model changes were required; all new data is accommodated within the existing result_json JSON column, preserving schema continuity.

Frontend Visualisation Components (PHASE 1–5)¶

The frontend implements dedicated panel components for each enhancement phase's data:

Component	Phase	Functionality
`IssueDetailPanel`	PHASE 2	Renders `ValidationDetail` metrics (element count, unexpected percent, sample values)
`StatisticsPanel`	PHASE 2	Visualises `ReportStatistics` with severity/column/validator breakdowns
`ExecutionSummaryPanel`	PHASE 4	Displays executed/skipped/failed validator counts with skip reason details
`ExceptionSummaryPanel`	PHASE 5	Presents exception statistics, retry rates, and circuit breaker status
`IssueCard`	PHASE ⅖	Enhanced issue card with `ValidationDetail` expansion and `ExceptionInfo` badge
Advanced Options	PHASE ⅕	Collapsible configuration section for `result_format`, `catch_exceptions`, `max_retries`

Security Architecture¶

Credential Management¶

Connection credentials are subjected to symmetric encryption using the Fernet cryptographic scheme prior to their persistence in the SQLite database. The encryption key is automatically generated during initial system initialization and is stored at ~/.truthound/.key with restricted file system permissions, thereby ensuring that sensitive credential material is not exposed in plaintext at rest.

Data Isolation¶

Each Truthound Dashboard instance maintains a strictly isolated data directory (~/.truthound by default), thereby guaranteeing complete separation of state and configuration between multiple concurrent installations. This isolation boundary ensures that no cross-instance information leakage can occur.

Scalability Analysis and Extension Points¶

The single-process architecture has been expressly optimized for individual workstation and small team deployment scenarios, where operational simplicity and minimal configuration overhead are prioritized. For enterprise-scale deployments necessitating horizontal scaling capabilities, the architecture has been designed with the following extension points to facilitate a graduated scaling trajectory:

Load Balancing: Multiple Truthound Dashboard instances may be deployed behind a reverse proxy to distribute request load across replicas.
Shared Storage: Migration to an external relational database can be undertaken to enable multi-instance state coordination and consistency.
Message Queue Integration: Optional integration with Celery or equivalent distributed task processing frameworks can be introduced to support asynchronous workload distribution.