Skip to content

FileSystem & Memory Stores

Local storage backends for development, testing, and single-node deployments.

FileSystem Store

The FileSystemStore persists validation results as JSON files on the local filesystem. It requires no external dependencies and is the default backend.

Basic Usage

from truthound.stores.backends.filesystem import FileSystemStore

# Default configuration
store = FileSystemStore()

# Custom path
store = FileSystemStore(
    base_path=".truthound/store",
    namespace="production",
    prefix="validations",
)

# Save a result
result = ValidationResult.from_report(report, "customers.csv")
run_id = store.save(result)

# Retrieve
result = store.get(run_id)

Using the Factory

from truthound.stores import get_store

store = get_store("filesystem", base_path=".truthound/store")

Configuration

from truthound.stores.backends.filesystem import FileSystemConfig

config = FileSystemConfig(
    base_path=".truthound/store",    # Base directory
    namespace="default",              # Namespace for organization
    prefix="validations",             # Path prefix
    file_extension=".json",           # File extension
    create_dirs=True,                 # Auto-create directories
    pretty_print=True,                # Indent JSON output
    use_compression=False,            # Enable gzip compression
)

Configuration Options

Option Type Default Description
base_path str .truthound/store Base directory for files
namespace str "default" Logical grouping namespace
prefix str "validations" Additional path prefix
file_extension str .json File extension
create_dirs bool True Create directories if missing
pretty_print bool True Format JSON with indentation
use_compression bool False Compress files with gzip

Compression

Enable gzip compression to reduce storage space:

store = FileSystemStore(
    base_path=".truthound/store",
    compression=True,  # Files saved as .json.gz
)

When compression is enabled: - Files are saved with .json.gz extension - Content is compressed using gzip.compress() - Decompression is automatic on read

Directory Structure

The store organizes files as:

{base_path}/
├── {namespace}/
│   └── {prefix}/
│       ├── _index.json          # Metadata index
│       ├── {run_id}.json        # Result files
│       └── {run_id}.json.gz     # (if compression enabled)

Index Management

The store maintains an _index.json file for fast queries:

{
  "run-123": {
    "data_asset": "customers.csv",
    "run_time": "2024-01-15T10:30:00",
    "status": "failure",
    "file": "run-123.json",
    "tags": {"env": "prod"}
  }
}

Rebuild the index if it becomes corrupted:

store = FileSystemStore(base_path=".truthound/store")
store.initialize()
count = store.rebuild_index()
print(f"Indexed {count} items")

Expectation Store

Store expectation suites separately:

from truthound.stores.backends.filesystem import FileSystemExpectationStore

store = FileSystemExpectationStore(
    base_path=".truthound/expectations",
    namespace="default",
    prefix="suites",
)

# Save suite
suite = ExpectationSuite.create("my_suite", "customers.csv")
store.save(suite)

# Retrieve
suite = store.get("my_suite")

Memory Store

The MemoryStore keeps data in memory. Data is not persisted between sessions.

Use Cases

  • Unit testing
  • Development and prototyping
  • Temporary validation workflows
  • Performance benchmarking

Basic Usage

from truthound.stores.backends.memory import MemoryStore

store = MemoryStore()

# Save and retrieve
run_id = store.save(result)
result = store.get(run_id)

# Check existence
assert store.exists(run_id)

# Clear all data
count = store.clear_all()

Configuration

from truthound.stores.backends.memory import MemoryConfig

config = MemoryConfig(
    max_items=0,       # 0 = unlimited
    deep_copy=True,    # Deep copy on save/retrieve
)

Configuration Options

Option Type Default Description
max_items int 0 Max items to store (0 = unlimited)
deep_copy bool True Deep copy items on save/retrieve

Memory Management

Limit memory usage with max_items:

store = MemoryStore(max_items=1000)

# When limit is reached, oldest items are removed
for i in range(1500):
    store.save(result)

# Only the newest 1000 items remain
assert len(store.list_ids()) == 1000

Deep Copy Behavior

By default, items are deep copied to prevent mutation:

# With deep_copy=True (default)
store = MemoryStore(deep_copy=True)
run_id = store.save(result)
retrieved = store.get(run_id)
retrieved.tags["modified"] = True
original = store.get(run_id)
assert "modified" not in original.tags  # Original unchanged

# With deep_copy=False (faster but mutable)
store = MemoryStore(deep_copy=False)
# Warning: Changes to retrieved items affect stored data

Expectation Store

from truthound.stores.backends.memory import MemoryExpectationStore

store = MemoryExpectationStore()

suite = ExpectationSuite.create("test_suite", "data.csv")
store.save(suite)
suite = store.get("test_suite")

Common Operations

Both stores implement the ValidationStore protocol:

# Initialize (lazy, called automatically)
store.initialize()

# Save
run_id = store.save(result)

# Retrieve
result = store.get(run_id)

# Check existence
exists = store.exists(run_id)

# Delete
deleted = store.delete(run_id)

# List IDs
ids = store.list_ids()

# Query with filters
from truthound.stores.base import StoreQuery

query = StoreQuery(
    data_asset="customers.csv",
    status="failure",
    limit=10,
)
results = store.query(query)

Error Handling

from truthound.stores.base import (
    StoreNotFoundError,
    StoreReadError,
    StoreWriteError,
)

try:
    result = store.get("nonexistent-id")
except StoreNotFoundError as e:
    print(f"Not found: {e.identifier}")

try:
    store.save(result)
except StoreWriteError as e:
    print(f"Write failed: {e}")

Choosing Between Stores

Scenario Recommended Store
Production (single node) FileSystemStore
Development FileSystemStore
Unit tests MemoryStore
Integration tests MemoryStore or FileSystemStore
CI/CD pipelines FileSystemStore
Temporary workflows MemoryStore

Next Steps