Rule Suggestions¶

The Rule Suggestions module generates validation rules automatically by analyzing the statistical profile of a data source. This reduces the manual effort of defining a comprehensive validation configuration and grounds the resulting rules in observed data characteristics.

Overview¶

Rule suggestion works as a profile-driven inference engine. The system examines column-level statistics (null ratios, uniqueness measures, value distributions, and detected patterns) and proposes validation rules, each with a confidence score. The user then reviews, adjusts, and selectively applies the generated rules to the data source.
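
To make the idea of profile-driven inference concrete, here is a toy sketch in Python. It is not the module's actual logic: the `Suggestion` class, the statistic keys (`null_ratio`, `min`, `max`), the category assignments, and every threshold and confidence value are invented for illustration. Only the validator names `null_check` and `range_check` come from this page.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    validator: str      # e.g. "null_check" (name taken from this page)
    column: str
    category: str
    confidence: float   # 0.0-1.0
    reason: str


def suggest_for_column(column: str, stats: dict) -> list[Suggestion]:
    """Toy heuristics; the rules and confidence values are invented."""
    out = []
    # A column with no observed nulls is a candidate for a null check.
    if stats.get("null_ratio") == 0.0:
        out.append(Suggestion("null_check", column, "completeness", 0.95,
                              "no null values observed in the profile"))
    # Observed min/max bounds can seed a range check.
    lo, hi = stats.get("min"), stats.get("max")
    if lo is not None and hi is not None:
        out.append(Suggestion("range_check", column, "distribution", 0.7,
                              f"observed values fall within [{lo}, {hi}]"))
    return out
```

A real engine would presumably weigh sample size and strictness when scoring; this sketch only shows the shape of the mapping from column statistics to scored rules.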

Rule Generation Workflow¶

Initiating Rule Generation¶

1. Navigate to the Profile page for a data source.
2. Ensure that at least one profiling result exists (click Run Profile if necessary).
3. Click the Suggest Rules button in the page header.
4. The Rule Suggestion dialog opens and automatically generates an initial set of rules using default parameters.

Generation Settings¶

The Settings tab within the dialog provides fine-grained control over the generation process:

Parameter	Type	Description
Strictness	`low` / `medium` / `high`	Controls the threshold sensitivity for rule generation. Higher strictness produces more restrictive rules
Preset	`none` / predefined sets	Applies a predefined rule template optimized for specific use cases
Minimum Confidence	`0–100%`	Excludes rules whose confidence score falls below this threshold
Categories	Multi-select	Restricts generation to specific rule categories (schema, completeness, uniqueness, distribution, stats, pattern)

After adjusting the settings, click Generate Rules to regenerate the suggestion set with the updated parameters. Each generation replaces the previous suggestion list in its entirety; rules are not appended incrementally.
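
The replace-on-regenerate behavior can be sketched as follows. This is a hedged Python model, not the dialog's real implementation: `GenerationSettings`, `SuggestionDialogState`, the `generate` callback, and the default values are all hypothetical. Only the allowed strictness levels and category names are taken from the table above.

```python
from dataclasses import dataclass

STRICTNESS_LEVELS = ("low", "medium", "high")
CATEGORIES = ("schema", "completeness", "uniqueness",
              "distribution", "stats", "pattern")


@dataclass
class GenerationSettings:
    # Default values here are assumptions; this page does not state them.
    strictness: str = "medium"
    preset: str = "none"
    min_confidence: float = 0.5
    categories: tuple = CATEGORIES

    def validate(self) -> None:
        if self.strictness not in STRICTNESS_LEVELS:
            raise ValueError(f"unknown strictness: {self.strictness!r}")
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0.0 and 1.0")
        unknown = set(self.categories) - set(CATEGORIES)
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")


class SuggestionDialogState:
    """Holds the current suggestion list for one dialog session."""

    def __init__(self) -> None:
        self.suggestions: list = []

    def regenerate(self, settings: GenerationSettings, generate) -> None:
        settings.validate()
        # Replace the whole list; earlier suggestions are discarded.
        self.suggestions = list(generate(settings))
```

Assigning a fresh list, rather than extending the old one, is what keeps suggestions produced under earlier settings from lingering after a regeneration.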

API Endpoint¶

Method	Endpoint	Description
POST	`/sources/{id}/rules/suggest`	Generate rule suggestions based on the source's profile

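Request body parameters correspond to the dialog settings. Note that `min_confidence` is expressed as a fraction (for example, `0.5`) rather than as a percentage: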

```json
{
  "strictness": "medium",
  "min_confidence": 0.5,
  "preset": "none",
  "include_categories": ["completeness", "uniqueness", "distribution"]
}
```
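
As a rough illustration, the request could be assembled in Python with the standard library alone. The helper name `build_suggest_request`, the base URL, and the `Content-Type` header are assumptions; authentication and the response shape are not described on this page and are left out. The default arguments mirror the example body above.

```python
import json
import urllib.request


def build_suggest_request(
    base_url: str,
    source_id: str,
    *,
    strictness: str = "medium",
    min_confidence: float = 0.5,
    preset: str = "none",
    include_categories=("completeness", "uniqueness", "distribution"),
) -> urllib.request.Request:
    """Build (but do not send) a POST to /sources/{id}/rules/suggest."""
    body = {
        "strictness": strictness,
        "min_confidence": min_confidence,
        "preset": preset,
        "include_categories": list(include_categories),
    }
    url = f"{base_url.rstrip('/')}/sources/{source_id}/rules/suggest"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the base URL and any required credentials for the deployment are known.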

Suggestion Review and Selection¶

Suggestions Tab¶

Generated rules are presented as a filterable list within the Suggestions tab:

Element	Description
Validator Name	The specific validator to be applied (e.g., `null_check`, `range_check`)
Column	Target column for column-level rules
Category	Classification (completeness, uniqueness, distribution, schema, stats, pattern)
Confidence	Numerical score (0.0–1.0) indicating the statistical confidence of the suggestion
Reason	Human-readable explanation of why the rule was suggested

Automatic Pre-Selection¶

Upon generation, rules with a confidence score of 0.85 or higher are automatically selected for application. This pre-selects the rules with strong statistical support and streamlines review. Users may adjust the selection manually before applying.
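
A minimal Python sketch of the pre-selection rule follows. The 0.85 threshold comes from the text above; the dictionary keys `id` and `confidence` are hypothetical stand-ins for whatever fields the real suggestion objects carry.

```python
PRESELECT_THRESHOLD = 0.85  # threshold stated in the documentation


def preselect(suggestions) -> set:
    """Return the ids of suggestions that start out selected."""
    return {
        s["id"]
        for s in suggestions
        if s["confidence"] >= PRESELECT_THRESHOLD  # inclusive: 0.85 qualifies
    }
```

An empty suggestion list yields an empty selection, which matches the cleared state described for the period before generation or during loading.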

When no suggestions are present (e.g., before generation or while loading), the selection state is cleared and the Apply Rules button is disabled.

Filtering and Search¶

The suggestion list supports the following filtering mechanisms:

Text search: Filters by validator name, column name, or reason text
Category filter: Restricts the visible list to a specific rule category
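
The two filters can be combined as in the Python sketch below. Case-insensitive substring matching is an assumption, since the page does not say how the text search compares strings. The keys `validator_name`, `column`, `category`, and `reason` are hypothetical names modeled on the list columns.

```python
def filter_suggestions(suggestions, query: str = "", category=None) -> list:
    """Return the suggestions that pass both the text and category filters."""
    needle = query.strip().lower()
    visible = []
    for s in suggestions:
        if category is not None and s["category"] != category:
            continue
        # Search the three fields named in the docs; column may be absent
        # for rules that are not column-level.
        haystack = (s["validator_name"], s.get("column") or "", s["reason"])
        if needle and not any(needle in text.lower() for text in haystack):
            continue
        visible.append(s)
    return visible
```

With an empty query and no category, the function returns the full list unchanged.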

Rule Application Procedure¶

1. Review and adjust the selection checkboxes in the Suggestions tab.
2. Click Apply Rules (N), where N reflects the current selection count.
3. The selected rules are submitted to the backend and persistently associated with the data source.
4. Applied rules become part of the source's active validation configuration.

The Apply button is disabled when no rules have been selected (`selectedIds.size === 0`).
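
The button state and the set of rules to submit can be modeled as below. This Python sketch is illustrative only: the function names and the `id` key are hypothetical, and the endpoint that receives the applied rules is not documented on this page, so no request is shown.

```python
def apply_button(selected_ids) -> dict:
    """Derive the Apply button's label and disabled flag from the selection."""
    n = len(selected_ids)
    return {"label": f"Apply Rules ({n})", "disabled": n == 0}


def rules_to_apply(suggestions, selected_ids) -> list:
    """Keep only the selected suggestions, preserving list order."""
    return [s for s in suggestions if s["id"] in selected_ids]
```

Deriving both values from the same selection set keeps the count in the label and the submitted rules consistent with each other.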

Export and Serialization Capabilities¶

Selected rules can be exported in the following formats for external use:

Format	Description
YAML	Structured configuration suitable for version control
JSON	Machine-readable format for programmatic consumption

Export and clipboard copy operations are available only when at least one rule is selected.
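
The export schema is not specified on this page, so the following Python sketch only illustrates the idea of serializing a selection to both formats. It assumes each rule is a flat dictionary with simple identifier keys and scalar values, and it wraps the list under a hypothetical top-level `rules` key. The YAML writer is hand-rolled to stay within the standard library; it emits scalars as JSON literals, which are also valid YAML.

```python
import json


def export_rules(rules: list, fmt: str) -> str:
    """Serialize selected rules to 'json' or 'yaml' (assumed schema)."""
    if not rules:
        # Mirrors the UI: export requires at least one selected rule.
        raise ValueError("select at least one rule before exporting")
    if fmt == "json":
        return json.dumps({"rules": rules}, indent=2)
    if fmt == "yaml":
        lines = ["rules:"]
        for rule in rules:
            for i, (key, value) in enumerate(rule.items()):
                prefix = "  - " if i == 0 else "    "
                # JSON scalars (quoted strings, numbers, true/false/null)
                # are valid YAML scalars.
                lines.append(f"{prefix}{key}: {json.dumps(value)}")
        return "\n".join(lines) + "\n"
    raise ValueError(f"unsupported format: {fmt!r}")
```

In practice a YAML library would be the more robust choice for nested values; the manual writer here only covers the flat case.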

Profiling Integration Dependencies¶

Rule suggestion depends on prior profiling data. The quality and coverage of the suggested rules depend on the richness of the underlying profile: advanced profiling options such as pattern detection, distribution analysis, and correlation computation yield more diverse and precise rule suggestions.

For best results, run advanced profiling with the following configuration before generating rules:

Setting	Recommended Value	Impact
`include_patterns`	`true`	Enables pattern-based rule suggestions (email, phone, UUID)
`include_distributions`	`true`	Enables distribution-based range and outlier rules
`include_correlations`	`true`	Enables cross-column relationship rules
`top_n_values`	`20`	Provides richer categorical value analysis
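
The recommended values can be captured as a small Python mapping, together with a helper that reports which settings in an existing configuration differ from the recommendation. The setting names and values come from the table above; the helper `settings_to_change` is hypothetical, and how profiling settings are actually submitted is not covered on this page.

```python
RECOMMENDED_PROFILE_SETTINGS = {
    "include_patterns": True,
    "include_distributions": True,
    "include_correlations": True,
    "top_n_values": 20,
}


def settings_to_change(current: dict) -> dict:
    """Return the recommended values that `current` is missing or overrides."""
    return {
        key: value
        for key, value in RECOMMENDED_PROFILE_SETTINGS.items()
        if current.get(key) != value
    }
```

An empty result means the existing configuration already matches all four recommendations.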