Validator Module
signalflow.validator.base.SignalValidator (dataclass)
SignalValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp')
Base class for signal validators (meta-labelers).
Validates trading signals by predicting their risk/quality. In López de Prado's terminology, this is a meta-labeler.
Note: Filtering to active signals (RISE/FALL only) should be done BEFORE passing data to fit. This keeps the validator simple and gives users full control over data preparation.
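The filtering step described in the note above could look like the following in plain Python. This is only a sketch: the row dictionaries and the `filter_active` helper are hypothetical, and the `"rise"`, `"fall"`, and `"none"` values are taken from the examples further down this page. In practice the same filter would be expressed on a Polars DataFrame.

```python
# Hypothetical rows; in signalflow these would live in a Polars DataFrame.
rows = [
    {"pair": "BTCUSDT", "timestamp": 1, "signal_type": "rise", "label": "rise"},
    {"pair": "BTCUSDT", "timestamp": 2, "signal_type": "none", "label": "none"},
    {"pair": "ETHUSDT", "timestamp": 1, "signal_type": "fall", "label": "none"},
]

# Only RISE/FALL signals are "active"; NONE rows are dropped before fit.
ACTIVE_TYPES = {"rise", "fall"}


def filter_active(rows):
    """Keep only rows whose signal_type is an active signal."""
    return [row for row in rows if row["signal_type"] in ACTIVE_TYPES]


active_rows = filter_active(rows)
```

Doing this outside the validator keeps the choice of which rows count as training data in the caller's hands.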
Attributes:
| Name | Type | Description |
|---|---|---|
| model | Any \| None | The trained model instance |
| model_type | str \| None | String identifier for model type (e.g., "lightgbm", "xgboost") |
| model_params | dict \| None | Parameters for model initialization |
| train_params | dict \| None | Parameters for training (e.g., early stopping) |
| tune_enabled | bool | Whether hyperparameter tuning is enabled |
| tune_params | dict \| None | Parameters for tuning (e.g., n_trials, cv_folds) |
| feature_columns | list[str] \| None | List of feature column names (set after fit) |
fit
fit(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> SignalValidator
Train the validator model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X_train | DataFrame | Training features (Polars DataFrame) | required |
| y_train | DataFrame \| Series | Training labels | required |
| X_val | DataFrame \| None | Validation features (optional) | None |
| y_val | DataFrame \| Series \| None | Validation labels (optional) | None |
Returns:
| Type | Description |
|---|---|
| SignalValidator | Self for method chaining |
Source code in src/signalflow/validator/base.py
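To illustrate what "self for method chaining" and "set after fit" mean in practice, here is a minimal sketch. `ToyValidator` is a hypothetical stand-in, not the signalflow class: it uses plain dicts instead of Polars DataFrames, and it assumes that the key columns (`pair_col`, `ts_col`) are excluded from the feature list, which this page does not state explicitly.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyValidator:
    """Hypothetical stand-in mirroring a few SignalValidator fields."""

    model: Optional[Any] = None
    feature_columns: Optional[list] = None
    pair_col: str = "pair"
    ts_col: str = "timestamp"

    def fit(self, X_train: dict, y_train: list) -> "ToyValidator":
        # Assumption: key columns identify a row and are not model inputs.
        keys = (self.pair_col, self.ts_col)
        self.feature_columns = [c for c in X_train if c not in keys]
        # "Training" here just memorises the most frequent label.
        self.model = max(set(y_train), key=y_train.count)
        return self  # returning self is what enables chaining

    def predict(self, X: dict) -> list:
        # Predict the memorised label for every row.
        return [self.model] * len(X[self.pair_col])


X = {"pair": ["BTCUSDT"] * 3, "timestamp": [1, 2, 3], "rsi": [30.0, 55.0, 71.0]}
y = ["rise", "rise", "none"]

# fit() returns the validator itself, so predict() can be chained directly.
preds = ToyValidator().fit(X, y).predict(X)
```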
load (classmethod)
predict
Predict class labels and return updated Signals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | Signals | Input signals container | required |
| X | DataFrame | Features (Polars DataFrame) with (pair, timestamp) + feature columns | required |
Returns:
| Type | Description |
|---|---|
| Signals | New Signals with prediction column added |
Source code in src/signalflow/validator/base.py
predict_proba
Predict class probabilities and return updated Signals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | Signals | Input signals container | required |
| X | DataFrame | Features (Polars DataFrame) | required |
Returns:
| Type | Description |
|---|---|
| Signals | New Signals with probability columns added |
Source code in src/signalflow/validator/base.py
save
tune
tune(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> dict[str, Any]
Tune hyperparameters.
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Best parameters found |
Source code in src/signalflow/validator/base.py
validate_signals
Add validation predictions to signals.
Convenience method - calls predict_proba internally.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | Signals | Input signals container | required |
| features | DataFrame | Features DataFrame with (pair, timestamp) + feature columns | required |
| prefix | str | Prefix for probability columns | 'probability_' |
Returns:
| Type | Description |
|---|---|
| Signals | Signals with added validation columns |
Source code in src/signalflow/validator/base.py
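The effect of the `prefix` parameter can be sketched as follows. The helper `probability_columns` is hypothetical, and the naming scheme (prefix followed by the class name) is inferred from the default `'probability_'` and the column names `probability_none`, `probability_rise`, and `probability_fall` documented for the sklearn validator; the actual implementation may differ.

```python
def probability_columns(classes, proba_rows, prefix="probability_"):
    """Turn per-row class probabilities into named columns.

    classes:    class names in the same order as each probability row
    proba_rows: one list of probabilities per signal
    """
    return {
        f"{prefix}{cls}": [row[i] for row in proba_rows]
        for i, cls in enumerate(classes)
    }


classes = ["none", "rise", "fall"]
proba_rows = [
    [0.2, 0.7, 0.1],  # first signal: most likely a rise
    [0.6, 0.1, 0.3],  # second signal: most likely noise
]

default_cols = probability_columns(classes, proba_rows)
custom_cols = probability_columns(classes, proba_rows, prefix="meta_")
```

A custom prefix is mainly useful when several validators write their probabilities into the same signals table.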
signalflow.validator.sklearn_validator.SklearnSignalValidator (dataclass)
SklearnSignalValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', auto_select_metric: str = 'roc_auc', auto_select_cv_folds: int = 5)
Bases: SignalValidator
Sklearn-based signal validator.
Supports:
- Multiple sklearn-compatible models (LightGBM, XGBoost, RF, etc.)
- Automatic model selection via cross-validation
- Hyperparameter tuning with Optuna
- Early stopping for boosting models
Note: Filter data to active signals (not NONE) BEFORE calling fit(). This gives you full control over data preparation.
Example
    import polars as pl
    from signalflow.validator.sklearn_validator import SklearnSignalValidator
    # Signals is signalflow's signals container; its import path is not shown in this section.

    # Prepare data - filter to active signals
    df = df.filter(pl.col("signal_type") != "none")

    validator = SklearnSignalValidator(model_type="lightgbm")
    validator.fit(
        train_df.select(["pair", "timestamp"] + feature_cols),
        train_df.select("label"),
    )

    # validate_signals returns a Signals object
    validated = validator.validate_signals(
        Signals(test_df.select(signal_cols)),
        test_df.select(["pair", "timestamp"] + feature_cols),
    )
    validated.value.filter(pl.col("probability_rise") > 0.7)
fit
fit(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> SklearnSignalValidator
Train the validator.
Note: Filter to active signals BEFORE calling this method.
For boosting models with validation data, early stopping is applied.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X_train | DataFrame | Training features (already filtered to active signals) | required |
| y_train | DataFrame \| Series | Training labels | required |
| X_val | DataFrame \| None | Validation features (optional) | None |
| y_val | DataFrame \| Series \| None | Validation labels (optional) | None |
Returns:
| Type | Description |
|---|---|
| SklearnSignalValidator | Self for method chaining |
Source code in src/signalflow/validator/sklearn_validator.py
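The early stopping mentioned for boosting models follows a general "patience" idea: stop adding rounds once the validation loss has not improved for a set number of rounds, and keep the best round. The sketch below shows that generic logic only. It is not LightGBM's or signalflow's implementation, and the function name and the loss values are made up for illustration.

```python
def best_round_with_patience(val_losses, patience):
    """Return the index of the best round under patience-based early stopping.

    Iteration stops once `patience` consecutive rounds pass without a new
    minimum validation loss.
    """
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember this round.
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            # No improvement for `patience` rounds: stop training.
            break
    return best_round


# Hypothetical validation losses, one per boosting round.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.50]

stopped_early = best_round_with_patience(losses, patience=3)
ran_to_end = best_round_with_patience(losses, patience=10)
```

With a patience of 3 the loop stops before reaching the late improvement in the last round, which is the usual trade-off between training time and the chance of missing a later minimum. This is also why validation data (`X_val`, `y_val`) is needed for early stopping to apply.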
load (classmethod)
Load validator from file.
Source code in src/signalflow/validator/sklearn_validator.py
predict
Predict class labels and return updated Signals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | Signals | Input signals container | required |
| X | DataFrame | Features DataFrame with (pair, timestamp) + feature columns | required |
Returns:
| Type | Description |
|---|---|
| Signals | New Signals with 'validation_pred' column added |
Source code in src/signalflow/validator/sklearn_validator.py
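One way to picture how predictions end up next to the right signal is a key-based lookup on (pair, timestamp). The sketch below is an assumption about the alignment, not the library's code: `attach_predictions` and `toy_model` are hypothetical, and rows are plain dicts rather than Polars DataFrames. It also shows the "new Signals" behaviour, since the input rows are left unmodified.

```python
def attach_predictions(signal_rows, feature_rows, predict_one,
                       pair_col="pair", ts_col="timestamp",
                       out_col="validation_pred"):
    """Return new signal rows with a prediction column; inputs are not mutated."""
    # Index predictions by the (pair, timestamp) key of each feature row.
    preds = {(f[pair_col], f[ts_col]): predict_one(f) for f in feature_rows}
    # Build fresh dicts so the caller's signal rows stay untouched.
    return [
        {**s, out_col: preds.get((s[pair_col], s[ts_col]))}
        for s in signal_rows
    ]


signals = [
    {"pair": "BTCUSDT", "timestamp": 1, "signal_type": "rise"},
    {"pair": "ETHUSDT", "timestamp": 1, "signal_type": "fall"},
]
# Feature rows arrive in a different order; the key lookup still aligns them.
features = [
    {"pair": "ETHUSDT", "timestamp": 1, "rsi": 75.0},
    {"pair": "BTCUSDT", "timestamp": 1, "rsi": 28.0},
]


def toy_model(row):
    """Hypothetical rule standing in for a trained classifier."""
    return "rise" if row["rsi"] < 30 else "none"


validated = attach_predictions(signals, features, toy_model)
```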
predict_proba
Predict class probabilities and return updated Signals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | Signals | Input signals container | required |
| X | DataFrame | Features DataFrame with (pair, timestamp) + feature columns | required |
Returns:
| Type | Description |
|---|---|
| Signals | New Signals with probability columns (probability_none, probability_rise, probability_fall) |
Source code in src/signalflow/validator/sklearn_validator.py
save
Save validator to file.
Source code in src/signalflow/validator/sklearn_validator.py
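The signatures and file format of `save` and `load` are not shown on this page, so the following is only a sketch of the general save/load round-trip pattern with an instance method and a classmethod. `ToyValidatorState` is hypothetical and persists just JSON-friendly configuration; a real validator also has to serialize the trained model, which JSON cannot do.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ToyValidatorState:
    """Hypothetical, JSON-serializable subset of a validator's fields."""

    model_type: Optional[str] = None
    model_params: Optional[dict] = None
    feature_columns: Optional[list] = None

    def save(self, path) -> None:
        # Write all dataclass fields as a JSON object.
        Path(path).write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path) -> "ToyValidatorState":
        # Rebuild an instance from the stored fields.
        return cls(**json.loads(Path(path).read_text()))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "validator.json"
    original = ToyValidatorState("lightgbm", {"n_estimators": 200}, ["rsi", "atr"])
    original.save(path)
    restored = ToyValidatorState.load(path)
```

Making `load` a classmethod means a validator can be restored without constructing an empty instance first.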
tune
tune(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> dict[str, Any]
Tune hyperparameters using Optuna.
Note: Filter to active signals BEFORE calling this method.
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Best parameters found |
Source code in src/signalflow/validator/sklearn_validator.py
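To show the shape of the result (a dict of best parameters chosen over a number of trials), here is a sketch that uses a plain random search as a stand-in for Optuna. It is not the library's tuning code: `random_search`, the search space, and `toy_objective` are all hypothetical, and Optuna's samplers are considerably smarter than uniform random choice.

```python
import random
from typing import Any, Callable


def random_search(objective: Callable[[dict], float],
                  space: dict,
                  n_trials: int = 20,
                  seed: int = 0) -> dict[str, Any]:
    """Sample parameter sets at random and return the best-scoring one."""
    rng = random.Random(seed)
    best_params: dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        # Draw one value per parameter from its list of candidates.
        params = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params


# Hypothetical search space for a boosting model.
space = {"num_leaves": [15, 31, 63], "learning_rate": [0.01, 0.05, 0.1]}


def toy_objective(params: dict) -> float:
    """Made-up score that peaks at num_leaves=31, learning_rate=0.05."""
    return -(abs(params["num_leaves"] - 31) + 100 * abs(params["learning_rate"] - 0.05))


best = random_search(toy_objective, space, n_trials=30, seed=0)
```

In a real run the objective would train the model on `X_train`/`y_train` and score it on validation data or cross-validation folds, and the returned dict could then be used as `model_params`.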
validate_signals
Add validation probabilities to signals.
Adds probability columns for each class:
- probability_none: P(signal is noise / not actionable)
- probability_rise: P(signal leads to price rise)
- probability_fall: P(signal leads to price fall)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| signals | Signals | Input Signals container | required |
| features | DataFrame | Features DataFrame with (pair, timestamp) + features | required |
| prefix | str | Prefix for probability columns (default: "probability_") | 'probability_' |
Returns:
| Type | Description |
|---|---|
| Signals | New Signals with probability columns added. |
Example
    validated = validator.validate_signals(signals, features)
    df = validated.value
    confident_rise = df.filter(
        (pl.col("signal_type") == "rise") &
        (pl.col("probability_rise") > 0.7)
    )