Model Monitoring¶
The Model Monitoring module provides lifecycle monitoring for machine learning models deployed in production. It integrates with the truthound.ml.monitoring framework to deliver performance tracking, drift detection, quality metrics computation, and intelligent alerting.
Overview¶
Machine learning models require continuous monitoring to ensure that predictive performance holds over time. Unlike traditional software systems, ML models can degrade silently as a result of data drift, concept drift, or changes in the underlying data distribution. This module addresses these challenges by providing:
- Performance Metrics Collection: Systematic tracking of latency, throughput, and error rate measurements
- Drift Detection: Application of statistical methods for the identification of distribution changes
- Quality Metrics: Quantitative assessment of classification and regression model performance
- Intelligent Alerting: Implementation of threshold-based, statistical, and trend-based alert rule evaluation
Theoretical Foundation¶
Statistical Characterization of Data Drift¶
Data drift occurs when the statistical properties of input data change over time. This module employs several statistical tests from the truthound framework:
| Method | Mathematical Basis | Interpretation |
|---|---|---|
| PSI (Population Stability Index) | \(PSI = \sum_{i} (A_i - E_i) \times \ln(A_i / E_i)\) | <0.1 stable, 0.1-0.25 slight drift, >0.25 significant |
| KS (Kolmogorov-Smirnov) | \(D_n = \sup_x \lvert F_n(x) - F(x) \rvert\) | Larger statistic indicates greater distributional difference; significance assessed via p-value |
| JS (Jensen-Shannon) | \(JS(P \| Q) = \frac{1}{2}KL(P \| M) + \frac{1}{2}KL(Q \| M)\) | Bounded [0,1], symmetric divergence |
| Wasserstein | \(W_p(P, Q) = \left(\inf_{\gamma \in \Gamma(P,Q)} \int \|x-y\|^p d\gamma(x,y)\right)^{1/p}\) | Earth Mover's Distance |
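For reference, the PSI formula above can be computed directly with NumPy. The sketch below is illustrative and independent of truthound's internal implementation; the bin count and epsilon guard are assumptions.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compute PSI between a reference (expected) sample and a current (actual) sample."""
    # Bin edges are derived from the reference distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to proportions; eps avoids log(0) and division by zero.
    e_frac = e_counts / max(e_counts.sum(), 1) + eps
    a_frac = a_counts / max(a_counts.sum(), 1) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
current = rng.normal(0.5, 1.0, 10_000)  # mean shift in production data
# Compare the result against the 0.1 / 0.25 interpretation bands above.
print(population_stability_index(reference, current))
```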
Concept Drift Detection Methodologies¶
Concept drift is characterized by a temporal change in the relationship between input features and the target variable. The module incorporates the following detection methods:
- DDM (Drift Detection Method): Error rate is monitored against warning and drift thresholds
- ADWIN (Adaptive Windowing): Window size is automatically adjusted based on change detection outcomes
- Page-Hinkley: A cumulative sum test is applied for the detection of gradual distributional changes
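As a concrete illustration of the Page-Hinkley variant above, the following simplified sketch monitors a stream of error-rate observations for an upward shift in the mean; the delta and threshold parameters are illustrative assumptions, not truthound defaults.

```python
class PageHinkley:
    """Simplified Page-Hinkley test for detecting an upward shift in a stream's mean."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerance for small fluctuations
        self.threshold = threshold  # lambda: alarm when cumulative deviation exceeds this
        self.mean = 0.0
        self.n = 0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value: float) -> bool:
        """Feed one observation (e.g. a per-batch error rate); return True on drift."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return (self.cumulative - self.minimum) > self.threshold
```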
Quality Metrics Definitions¶
For classification models, the following metrics are computed:

- Accuracy: \(\frac{TP + TN}{TP + TN + FP + FN}\)
- Precision: \(\frac{TP}{TP + FP}\)
- Recall: \(\frac{TP}{TP + FN}\)
- F1 Score: \(2 \times \frac{Precision \times Recall}{Precision + Recall}\)

For regression models, the following metrics are computed:

- MAE: \(\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|\)
- MSE: \(\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\)
- RMSE: \(\sqrt{MSE}\)
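These definitions can be reproduced with scikit-learn and NumPy as a quick cross-check; the helper names below are illustrative and not part of the module's API.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def classification_quality(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }

def regression_quality(y_true, y_pred):
    """MAE, MSE, and RMSE for continuous targets."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = float(np.mean(np.abs(y_true - y_pred)))
    mse = float(np.mean((y_true - y_pred) ** 2))
    return {"mae": mae, "mse": mse, "rmse": float(np.sqrt(mse))}
```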
Model Monitoring Interface¶
Aggregate Statistics Dashboard¶
The interface presents aggregate model monitoring metrics as summarized below:
| Metric | Description |
|---|---|
| Total Models | Count of registered models |
| Active Models | Models currently in production |
| Degraded Models | Models exhibiting performance degradation |
| Predictions (24h) | Total predictions across all models |
| Active Alerts | Unresolved model-related alerts |
| Models with Drift | Models where input/output drift detected |
| Average Latency | Mean inference latency across models |
Model Registration and Version Management¶
Registration of a New Model¶
- Click Register Model
- Complete the registration form across three tabs:
Basic Information Tab¶
| Field | Description | Required |
|---|---|---|
| Model Name | Unique identifier for the model | Yes |
| Version | Semantic version (e.g., 1.0.0) | Yes |
| Description | Model purpose and documentation | No |
| Metadata | Custom key-value pairs | No |
Configuration Tab¶
The configuration parameters correspond directly to the truthound MonitorConfig specification:
Feature Toggles
| Setting | Description | Default |
|---|---|---|
| Enable Performance Metrics | Track latency, throughput, error rates | Enabled |
| Enable Drift Detection | Monitor distribution changes using th.compare() | Enabled |
| Enable Quality Metrics | Track accuracy, precision, recall, F1 (requires actual values) | Enabled |
Drift Detection Parameters (when enabled)
| Setting | Description | Default |
|---|---|---|
| Drift Method | Statistical method for drift detection | Auto |
| Drift Threshold | Score threshold for triggering alerts | 10% |
Available drift methods are enumerated below:

- Auto: The optimal method is automatically selected based on column type
- PSI: Population Stability Index (recommended for tabular data)
- KS: Kolmogorov-Smirnov test (distribution comparison)
- JS: Jensen-Shannon divergence (symmetric, bounded)
- Wasserstein: Earth Mover's Distance (geometry-aware)
- Chi-squared: Applicable to categorical features
- KL: Kullback-Leibler divergence
- Hellinger: Bounded distance metric
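For intuition about how two of these methods behave on the same data, the following sketch uses standard SciPy implementations rather than truthound's internals; the bin count for the Jensen-Shannon comparison is an assumption.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, 5_000)
current = rng.normal(0.5, 1.2, 5_000)

# Kolmogorov-Smirnov: D statistic plus p-value for the two-sample comparison.
ks_stat, p_value = ks_2samp(reference, current)

# Jensen-Shannon: compare binned histograms; SciPy returns the JS *distance*
# (square root of the divergence), so square it if the divergence is needed.
edges = np.histogram_bin_edges(np.concatenate([reference, current]), bins=20)
p, _ = np.histogram(reference, bins=edges)
q, _ = np.histogram(current, bins=edges)
js_distance = jensenshannon(p, q, base=2)

print(f"KS D={ks_stat:.3f} (p={p_value:.3g}), JS distance={js_distance:.3f}")
```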
Collection Parameters
| Setting | Description | Default | Range |
|---|---|---|---|
| Batch Size | Metrics collection batch size | 100 | 1-10,000 |
| Retention Hours | Data retention period | 24 | 1-720 |
| Collection Interval | Metric collection frequency (seconds) | 60 | 1-3,600 |
| Alert Evaluation Interval | Rule evaluation frequency (seconds) | 30 | 1-3,600 |
Alerts Tab¶
The tab displays the default alert rules that are instantiated automatically upon model registration:

- High Latency Alert (P95 > 500ms)
- Drift Detection Alert (exceeds configured threshold)
- Error Rate Alert (> 5%)
Supported Model Types¶
| Type | Description | Key Metrics |
|---|---|---|
| Classification | Categorical prediction | Accuracy, precision, recall, F1 |
| Regression | Continuous value prediction | MAE, MSE, RMSE |
| Ranking | Ordered list generation | NDCG, MAP, MRR |
Metrics Inspection Tab¶
Examination of Model Metrics¶
- Select a model from the dropdown
- Choose time range (1h, 6h, 24h, 7d)
- Review metrics display and time-series charts
Performance Metrics¶
| Metric | Description | Applicable To |
|---|---|---|
| Accuracy | Correct predictions / total predictions | Classification |
| Precision | True positives / predicted positives | Classification |
| Recall | True positives / actual positives | Classification |
| F1 Score | Harmonic mean of precision and recall | Classification |
| MAE | Mean absolute error | Regression |
| MSE | Mean squared error | Regression |
| RMSE | Root mean squared error | Regression |
Operational Metrics¶
| Metric | Description |
|---|---|
| Latency (p50) | Median inference time |
| Latency (p95) | 95th percentile inference time |
| Latency (p99) | 99th percentile inference time |
| Predictions Count | Total predictions in period |
| Error Rate | Percentage of failed predictions |
| Throughput | Predictions per second |
Drift Detection¶
Application of truthound th.compare()¶
The drift detection capability is implemented through truthound's th.compare() function, which is employed to identify distribution changes between reference and current datasets.
Operational Workflow:

1. Select the reference data source (baseline distribution)
2. Select the current data source (production distribution)
3. Choose a drift detection method
4. Review the per-column drift scores
Interpretation Guidelines:
| PSI Score | Interpretation | Action |
|---|---|---|
| < 0.10 | No significant drift | Continue monitoring |
| 0.10 - 0.25 | Slight drift | Investigate root cause |
| > 0.25 | Significant drift | Consider model retraining |
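As a hedged sketch, the workflow above can also be driven through the detect-drift endpoint listed in the API reference. The base URL, model identifier, request fields, and response shape below are assumptions; consult the API schema for exact names.

```python
import requests

BASE_URL = "https://dashboard.example.com"  # placeholder deployment URL
model_id = "fraud-model-01"                 # placeholder model identifier

# Field names are illustrative; check the API schema for the exact payload.
payload = {
    "reference_source": "training_baseline_2024q4",
    "current_source": "production_last_24h",
    "method": "psi",
}
resp = requests.post(
    f"{BASE_URL}/model-monitoring/models/{model_id}/detect-drift",
    json=payload,
    timeout=30,
)
resp.raise_for_status()

# The response key below is an assumption; adjust to the actual schema.
for column, score in resp.json().get("column_scores", {}).items():
    print(column, score)
```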
Drift Alert Generation Mechanism¶
When the drift score exceeds the configured threshold, the following sequence is initiated:

1. The system creates an alert with severity determined by the score magnitude
2. The alert identifies the drifted columns and their individual scores
3. The model status may transition to "Degraded" if the score exceeds 0.3
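A simplified sketch of this decision logic is shown below; the 0.3 Degraded cutoff follows the description above, while the intermediate severity bands are illustrative assumptions.

```python
def evaluate_drift(score: float, threshold: float = 0.1):
    """Map a drift score to an alert severity and a model status transition."""
    if score <= threshold:
        return None, "active"  # below the configured threshold: no alert
    # Severity bands above the threshold are assumptions for illustration.
    if score > 0.3:
        return "critical", "degraded"  # score > 0.3 also marks the model Degraded
    if score > 0.25:
        return "high", "active"
    return "medium", "active"
```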
Quality Metrics Assessment¶
Computation of Quality Metrics¶
Quality metrics are derived from predictions for which associated actual (ground truth) values have been recorded.
For Classification Models:

- Binary versus multi-class classification is detected automatically
- Accuracy is computed for all classification types
- Precision, recall, and F1 are computed for binary classification

For Regression Models, the following are computed:

- MAE (Mean Absolute Error)
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
Recording Predictions with Ground Truth Values¶
To enable quality metrics computation, predictions must be recorded with the actual field:
POST /model-monitoring/models/{id}/predictions
{
"features": {"amount": 150.0, "merchant_type": "online"},
"prediction": 0.85,
"actual": 1,
"latency_ms": 5.2
}
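The same request issued from Python; the base URL and model identifier are placeholders.

```python
import requests

BASE_URL = "https://dashboard.example.com"  # placeholder deployment URL
model_id = "fraud-model-01"                 # placeholder model identifier

prediction = {
    "features": {"amount": 150.0, "merchant_type": "online"},
    "prediction": 0.85,
    "actual": 1,          # ground truth enables quality metrics for this prediction
    "latency_ms": 5.2,
}
resp = requests.post(
    f"{BASE_URL}/model-monitoring/models/{model_id}/predictions",
    json=prediction,
    timeout=10,
)
resp.raise_for_status()
```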
Alert Rule Configuration¶
Classification of Alert Rule Types¶
The module supports three distinct rule types, each corresponding to components within truthound's alerting framework:
Threshold-Based Rules¶
Threshold-based alerting is configured through the following parameters:

- Metric Name: Target metric to be monitored
- Threshold: Trigger value
- Comparison: gt, lt, gte, lte, eq
- Duration: Time period over which the condition must persist
Statistical Rules (Anomaly Detection Rules)¶
Anomaly-based alerting is performed using statistical methods:

- Window Size: Sample size utilized for statistical computation
- Std Devs: Number of standard deviations defining the threshold boundary
- An alert is triggered when the metric value exceeds the expected statistical range
Trend-Based Rules¶
Trend-based alerting is designed for the detection of gradual changes:

- Direction: "increasing" or "decreasing"
- Slope Threshold: Minimum rate of change required for activation
- Lookback Minutes: Time window employed for trend calculation
- Linear regression is utilized to detect degradation trends
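A minimal sketch of the trend evaluation, fitting a least-squares line to recent metric samples; the sampling cadence and data layout are assumptions.

```python
import numpy as np

def trend_triggers(values, timestamps_min, direction="decreasing", slope_threshold=0.01):
    """Fit a line to (time, metric) samples and check the configured trend condition.

    values         -- metric samples inside the lookback window
    timestamps_min -- sample times in minutes (same length as values)
    """
    slope, _intercept = np.polyfit(
        np.asarray(timestamps_min, float), np.asarray(values, float), deg=1
    )
    if direction == "decreasing":
        return slope < -slope_threshold
    return slope > slope_threshold

# Accuracy sampled once per minute and trending downward triggers the rule:
print(trend_triggers([0.95, 0.93, 0.90, 0.88], [0, 1, 2, 3]))  # True
```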
Exemplary Rule Configurations¶
| Rule | Type | Metric | Condition | Severity |
|---|---|---|---|---|
| Low Accuracy | threshold | accuracy | < 0.85 | High |
| High Latency | threshold | latency_p95 | > 500ms | Medium |
| Error Spike | statistical | error_rate | > 3 std devs | Critical |
| Drift Detected | threshold | drift_score | > 0.1 | High |
| Degrading Performance | trend | accuracy | decreasing, slope > 0.01 | Warning |
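For illustration, the Low Accuracy rule from the table above could be created through the rules endpoint listed in the API reference. The payload keys below are assumptions; consult the API schema for exact field names.

```python
import requests

BASE_URL = "https://dashboard.example.com"  # placeholder deployment URL

# Payload keys are illustrative; check the API schema for exact field names.
low_accuracy_rule = {
    "model_id": "fraud-model-01",
    "name": "Low Accuracy",
    "rule_type": "threshold",
    "metric_name": "accuracy",
    "comparison": "lt",
    "threshold": 0.85,
    "duration_seconds": 300,
    "severity": "high",
}
resp = requests.post(f"{BASE_URL}/model-monitoring/rules", json=low_accuracy_rule, timeout=10)
resp.raise_for_status()
```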
Alert Handler Configuration Tab¶
Supported Handler Types¶
The module supports handlers that correspond to truthound's alert handler framework:
| Handler | truthound Mapping | Use Case |
|---|---|---|
| Slack | SlackAlertHandler | Team notifications |
| Webhook | WebhookAlertHandler | External integrations |
| Email | - | Stakeholder notifications |
| PagerDuty | PagerDutyAlertHandler | On-call escalation |
Handler Configuration Parameters¶
Slack Handler¶
| Parameter | Description |
|---|---|
| Webhook URL | Slack incoming webhook URL |
| Channel | Target channel (optional override) |
| Mention | Users/groups to mention |
Webhook Handler¶
| Parameter | Description |
|---|---|
| URL | Webhook endpoint |
| Method | HTTP method (POST, PUT) |
| Headers | Custom HTTP headers |
PagerDuty Handler¶
| Parameter | Description |
|---|---|
| Routing Key | PagerDuty integration key |
| Severity Mapping | Map alert severity to PagerDuty severity |
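As a sketch, a Slack handler could be created and then verified through the handlers endpoints listed in the API reference; the payload keys and response field below are assumptions.

```python
import requests

BASE_URL = "https://dashboard.example.com"  # placeholder deployment URL

# Payload keys are illustrative; check the API schema for exact field names.
slack_handler = {
    "handler_type": "slack",
    "webhook_url": "https://hooks.slack.com/services/T000/B000/XXXX",
    "channel": "#ml-alerts",
    "mention": "@oncall-ml",
}
created = requests.post(f"{BASE_URL}/model-monitoring/handlers", json=slack_handler, timeout=10)
created.raise_for_status()
handler_id = created.json()["id"]  # response field name is an assumption

# The test endpoint sends a sample alert through the newly created handler.
requests.post(f"{BASE_URL}/model-monitoring/handlers/{handler_id}/test", timeout=10)
```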
Model Lifecycle Management¶
Status Transition Model¶
| Status | Color | Description | Automatic Transition |
|---|---|---|---|
| Active | Green | Operating within parameters | - |
| Paused | Gray | Monitoring suspended | Manual |
| Degraded | Yellow | Performance below threshold | When drift_score > 0.3 |
| Error | Red | Experiencing errors | On repeated failures |
Health Score Computation¶
The health score (0-100) is computed as a weighted composite of the following factors:

- Drift score contribution (weighted)
- Error rate contribution
- Latency threshold violations
- Active alert count
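A sketch of one possible weighting scheme is shown below; the factors mirror the list above, while the weights and caps are illustrative assumptions rather than the module's actual coefficients.

```python
def health_score(drift_score, error_rate, latency_violations, active_alerts):
    """Weighted composite health score in [0, 100] (illustrative weights)."""
    penalty = (
        40 * min(drift_score / 0.3, 1.0)          # drift contribution, capped at 0.3
        + 30 * min(error_rate / 0.05, 1.0)        # error-rate contribution, capped at 5%
        + 20 * min(latency_violations / 10, 1.0)  # latency threshold violations
        + 10 * min(active_alerts / 5, 1.0)        # open alert count
    )
    return round(100 - penalty, 1)

print(health_score(drift_score=0.12, error_rate=0.01, latency_violations=2, active_alerts=1))  # 72.0
```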
Integration with the truthound Framework¶
Component Mapping¶
| Dashboard Feature | truthound Component |
|---|---|
| Model Config | MonitorConfig |
| Performance Metrics | PerformanceCollector |
| Drift Detection | th.compare(), DriftCollector |
| Quality Metrics | QualityCollector |
| Threshold Rules | ThresholdRule |
| Statistical Rules | AnomalyRule |
| Trend Rules | TrendRule |
| Slack Alerts | SlackAlertHandler |
| Webhook Alerts | WebhookAlertHandler |
| PagerDuty Alerts | PagerDutyAlertHandler |
Drift Detection Methods Reference¶
| Method | Type | Best For | Notes |
|---|---|---|---|
| auto | - | General use | Selects optimal method per column |
| psi | Binned | Tabular data | Industry standard |
| ks | Distribution | Numeric columns | Sensitive to shape |
| js | Divergence | All types | Symmetric, bounded [0,1] |
| wasserstein | Distance | Numeric columns | Geometry-aware |
| chi2 | Statistical | Categorical | Chi-squared test |
| kl | Divergence | All types | Information-theoretic |
| cvm | Statistical | Numeric | Sensitive to tails |
| anderson | Statistical | Numeric | Most sensitive to tails |
| hellinger | Distance | All types | Bounded [0,1] |
| energy | Distance | Numeric | Location/scale sensitive |
| mmd | Kernel | High-dimensional | Maximum Mean Discrepancy |
API Reference¶
Model Management Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /model-monitoring/models | GET | List registered models |
| /model-monitoring/models | POST | Register a new model |
| /model-monitoring/models/{id} | GET | Retrieve model details |
| /model-monitoring/models/{id} | PUT | Update model configuration |
| /model-monitoring/models/{id} | DELETE | Delete a model |
| /model-monitoring/models/{id}/pause | POST | Pause monitoring |
| /model-monitoring/models/{id}/resume | POST | Resume monitoring |
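As an illustration, the registration form fields map naturally onto a JSON payload for the POST endpoint above; the JSON keys are assumptions and may differ from the actual schema.

```python
import requests

BASE_URL = "https://dashboard.example.com"  # placeholder deployment URL

# JSON keys mirror the registration form fields; exact names may differ.
new_model = {
    "name": "fraud-model",
    "version": "1.0.0",
    "description": "Gradient-boosted fraud classifier",
    "metadata": {"team": "risk", "framework": "xgboost"},
    "model_type": "classification",
    "config": {
        "enable_performance_metrics": True,
        "enable_drift_detection": True,
        "drift_method": "psi",
        "drift_threshold": 0.1,
    },
}
resp = requests.post(f"{BASE_URL}/model-monitoring/models", json=new_model, timeout=10)
resp.raise_for_status()
print(resp.json())
```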
Metrics and Analysis Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /model-monitoring/models/{id}/metrics | GET | Retrieve performance metrics |
| /model-monitoring/models/{id}/quality-metrics | GET | Retrieve quality metrics |
| /model-monitoring/models/{id}/detect-drift | POST | Run drift detection |
| /model-monitoring/models/{id}/predictions | POST | Record prediction |
Alert and Rule Management Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /model-monitoring/alerts | GET | List model alerts |
| /model-monitoring/alerts/{id}/acknowledge | POST | Acknowledge alert |
| /model-monitoring/alerts/{id}/resolve | POST | Resolve alert |
| /model-monitoring/rules | GET | List alert rules |
| /model-monitoring/rules | POST | Create alert rule |
| /model-monitoring/rules/{id} | PUT | Update alert rule |
| /model-monitoring/rules/{id} | DELETE | Delete alert rule |
| /model-monitoring/handlers | GET | List alert handlers |
| /model-monitoring/handlers | POST | Create alert handler |
| /model-monitoring/handlers/{id}/test | POST | Test alert handler |
Dashboard Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /model-monitoring/overview | GET | Retrieve monitoring overview |
| /model-monitoring/models/{id}/dashboard | GET | Model-specific dashboard |
Recommended Operational Practices¶
Monitoring Strategy¶
- Establish a baseline: Reference metrics should be established prior to production deployment
- Configure appropriate thresholds: Thresholds should be determined based on business requirements and historical data analysis
- Enable drift detection: Drift detection is considered essential for identifying silent model degradation
- Implement alerting: Alert handlers should be configured to ensure timely notification of operational issues
Drift Detection¶
- Select an appropriate method: PSI is recommended for general use; KS is preferred when distributional sensitivity is required
- Define reasonable thresholds: It is advisable to begin with conservative thresholds (0.1) and adjust based on observed drift patterns
- Monitor at the per-column level: Individual features contributing to drift should be identified
- Correlate with performance metrics: It should be noted that not all drift impacts model performance with equal magnitude
Alert Configuration¶
- Prioritize critical metrics: Alerting should be focused on metrics that directly impact business outcomes
- Mitigate alert fatigue: Thresholds should be calibrated to minimize false positive rates
- Employ trend-based rules: Trend rules are recommended for detecting gradual degradation before it reaches a critical state
- Configure escalation pathways: Critical alerts should be routed to the appropriate operational channels
Diagnostic and Troubleshooting Procedures¶
This section is reserved for the documentation of common diagnostic procedures, known failure modes, and their corresponding resolution strategies. Practitioners are advised to consult the truthound ML Module Documentation for framework-level troubleshooting guidance.
References¶
- truthound ML Module Documentation: .truthound_docs/advanced/ml-anomaly.md
- Statistical Drift Detection Methods: Population Stability Index (PSI), Kolmogorov-Smirnov Test
- Concept Drift Detection: Gama, J., et al. (2014). A survey on concept drift adaptation