
Validator Module

Signal validators (meta-labelers) predict the quality or risk of trading signals. In De Prado's terminology, this implements the meta-labeling approach: a secondary model is trained to predict whether a primary model's signal will be successful.

Available Validators

| Validator | Model | Best For |
| --- | --- | --- |
| LightGBMValidator | LGBMClassifier | Fast training, good defaults |
| XGBoostValidator | XGBClassifier | Robust, regularized |
| RandomForestValidator | RandomForestClassifier | Ensemble, interpretable |
| LogisticRegressionValidator | LogisticRegression | Fast, linear relationships |
| SVMValidator | SVC | Small datasets, non-linear |
| AutoSelectValidator | Auto | Automatic model selection |

Quick Start

import polars as pl
from signalflow.validator import LightGBMValidator
from signalflow.core import Signals

# Prepare data - filter to active signals (not NONE)
train_df = train_df.filter(pl.col("signal_type") != "none")

# Create and train validator
validator = LightGBMValidator(n_estimators=200, learning_rate=0.05)
validator.fit(
    train_df.select(["pair", "timestamp"] + feature_cols),
    train_df.select("label"),
    X_val=val_df.select(["pair", "timestamp"] + feature_cols),  # Early stopping
    y_val=val_df.select("label"),
)

# Validate new signals
validated = validator.validate_signals(
    Signals(test_df.select(signal_cols)),
    test_df.select(["pair", "timestamp"] + feature_cols),
)

# Filter to high-confidence predictions
confident = validated.value.filter(pl.col("probability_rise") > 0.7)
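Once signals are filtered, it is worth checking what the threshold buys you. A minimal, dependency-free sketch (with invented numbers) of measuring precision among the kept signals:

```python
# A hedged sketch of a sanity check on validator output: among signals the
# validator kept (probability above the threshold), what fraction actually
# worked out? The numbers below are made up for illustration.
threshold = 0.7
rows = [
    # (probability_rise, outcome)  1 = signal was profitable, 0 = it was not
    (0.91, 1), (0.85, 1), (0.72, 0), (0.40, 1), (0.15, 0),
]

kept = [outcome for prob, outcome in rows if prob > threshold]
precision = sum(kept) / len(kept)
print(f"kept {len(kept)} of {len(rows)} signals, precision {precision:.2f}")
```

Raising the threshold trades signal count for precision; sweeping it over a validation set is a quick way to pick an operating point.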

Auto Model Selection

Use AutoSelectValidator to automatically select the best model:

from signalflow.validator import AutoSelectValidator

validator = AutoSelectValidator()
validator.fit(X_train, y_train)

# Check which model was selected
print(validator.selected_validator)  # e.g., LightGBMValidator

Hyperparameter Tuning

Each validator supports Optuna-based hyperparameter tuning:

from signalflow.validator import RandomForestValidator

validator = RandomForestValidator()
validator.tune_params = {"n_trials": 50, "cv_folds": 5, "timeout": 600}
best_params = validator.tune(X_train, y_train)

# Fit with best params
validator.fit(X_train, y_train)

Base Class

signalflow.validator.base.SignalValidator dataclass

SignalValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp')

Base class for signal validators (meta-labelers).

Validates trading signals by predicting their risk/quality. In De Prado's terminology, this is a meta-labeler.

Note: Filtering to active signals (RISE/FALL only) should be done BEFORE passing data to fit. This keeps the validator simple and gives users full control over data preparation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| model | Any \| None | The trained model instance |
| model_type | str \| None | String identifier for model type (e.g., "lightgbm", "xgboost") |
| model_params | dict \| None | Parameters for model initialization |
| train_params | dict \| None | Parameters for training (e.g., early stopping) |
| tune_enabled | bool | Whether hyperparameter tuning is enabled |
| tune_params | dict \| None | Parameters for tuning (e.g., n_trials, cv_folds) |
| feature_columns | list[str] \| None | List of feature column names (set after fit) |

component_type class-attribute

component_type: SfComponentType = VALIDATOR

feature_columns class-attribute instance-attribute

feature_columns: list[str] | None = field(default=None, repr=False)

model class-attribute instance-attribute

model: Any | None = None

model_params class-attribute instance-attribute

model_params: dict | None = None

model_type class-attribute instance-attribute

model_type: str | None = None

pair_col class-attribute instance-attribute

pair_col: str = 'pair'

train_params class-attribute instance-attribute

train_params: dict | None = None

ts_col class-attribute instance-attribute

ts_col: str = 'timestamp'

tune_enabled class-attribute instance-attribute

tune_enabled: bool = False

tune_params class-attribute instance-attribute

tune_params: dict | None = None

fit

fit(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> SignalValidator

Train the validator model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X_train | DataFrame | Training features (Polars DataFrame) | required |
| y_train | DataFrame \| Series | Training labels | required |
| X_val | DataFrame \| None | Validation features (optional) | None |
| y_val | DataFrame \| Series \| None | Validation labels (optional) | None |

Returns:

| Type | Description |
| --- | --- |
| SignalValidator | Self for method chaining |

Source code in src/signalflow/validator/base.py
def fit(
    self,
    X_train: pl.DataFrame,
    y_train: pl.DataFrame | pl.Series,
    X_val: pl.DataFrame | None = None,
    y_val: pl.DataFrame | pl.Series | None = None,
) -> "SignalValidator":
    """Train the validator model.

    Args:
        X_train: Training features (Polars DataFrame)
        y_train: Training labels
        X_val: Validation features (optional)
        y_val: Validation labels (optional)

    Returns:
        Self for method chaining
    """
    raise NotImplementedError("Subclasses must implement fit()")

load classmethod

load(path: str | Path) -> SignalValidator

Load model from file.

Source code in src/signalflow/validator/base.py
@classmethod
def load(cls, path: str | Path) -> "SignalValidator":
    """Load model from file."""
    raise NotImplementedError("Subclasses must implement load()")

predict

predict(signals: Signals, X: DataFrame) -> Signals

Predict class labels and return updated Signals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| signals | Signals | Input signals container | required |
| X | DataFrame | Features (Polars DataFrame) with (pair, timestamp) + feature columns | required |

Returns:

| Type | Description |
| --- | --- |
| Signals | New Signals with prediction column added |

Source code in src/signalflow/validator/base.py
def predict(self, signals: Signals, X: pl.DataFrame) -> Signals:
    """Predict class labels and return updated Signals.

    Args:
        signals: Input signals container
        X: Features (Polars DataFrame) with (pair, timestamp) + feature columns

    Returns:
        New Signals with prediction column added
    """
    raise NotImplementedError("Subclasses must implement predict()")

predict_proba

predict_proba(signals: Signals, X: DataFrame) -> Signals

Predict class probabilities and return updated Signals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| signals | Signals | Input signals container | required |
| X | DataFrame | Features (Polars DataFrame) | required |

Returns:

| Type | Description |
| --- | --- |
| Signals | New Signals with probability columns added |

Source code in src/signalflow/validator/base.py
def predict_proba(self, signals: Signals, X: pl.DataFrame) -> Signals:
    """Predict class probabilities and return updated Signals.

    Args:
        signals: Input signals container
        X: Features (Polars DataFrame)

    Returns:
        New Signals with probability columns added
    """
    raise NotImplementedError("Subclasses must implement predict_proba()")

save

save(path: str | Path) -> None

Save model to file.

Source code in src/signalflow/validator/base.py
def save(self, path: str | Path) -> None:
    """Save model to file."""
    raise NotImplementedError("Subclasses must implement save()")

tune

tune(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> dict[str, Any]

Tune hyperparameters.

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | Best parameters found |

Source code in src/signalflow/validator/base.py
def tune(
    self,
    X_train: pl.DataFrame,
    y_train: pl.DataFrame | pl.Series,
    X_val: pl.DataFrame | None = None,
    y_val: pl.DataFrame | pl.Series | None = None,
) -> dict[str, Any]:
    """Tune hyperparameters.

    Returns:
        Best parameters found
    """
    if not self.tune_enabled:
        raise ValueError("Tuning is not enabled for this validator")
    raise NotImplementedError("Subclasses must implement tune()")

validate_signals

validate_signals(signals: Signals, features: DataFrame, prefix: str = 'probability_') -> Signals

Add validation predictions to signals.

Convenience method - calls predict_proba internally.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| signals | Signals | Input signals container | required |
| features | DataFrame | Features DataFrame with (pair, timestamp) + feature columns | required |
| prefix | str | Prefix for probability columns | 'probability_' |

Returns:

| Type | Description |
| --- | --- |
| Signals | Signals with added validation columns |

Source code in src/signalflow/validator/base.py
def validate_signals(
    self,
    signals: Signals,
    features: pl.DataFrame,
    prefix: str = "probability_",
) -> Signals:
    """Add validation predictions to signals.

    Convenience method - calls predict_proba internally.

    Args:
        signals: Input signals container
        features: Features DataFrame with (pair, timestamp) + feature columns
        prefix: Prefix for probability columns

    Returns:
        Signals with added validation columns
    """
    raise NotImplementedError("Subclasses must implement validate_signals()")

Sklearn Base

signalflow.validator.sklearn_validator.SklearnValidatorBase dataclass

SklearnValidatorBase(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50)

Bases: SignalValidator

Base class for sklearn-compatible signal validators.

Provides common functionality for feature extraction, fitting, prediction, and serialization. Subclasses define model-specific configuration via class variables.

Class Variables (override in subclasses):

- _model_class: Import path for the model class (e.g., "lightgbm.LGBMClassifier")
- _default_params: Default model parameters
- _tune_space: Optuna tuning space definition
- _supports_early_stopping: Whether the model supports early stopping

_default_params class-attribute

_default_params: dict[str, Any] = {}

_model_class class-attribute

_model_class: str = ''

_supports_early_stopping class-attribute

_supports_early_stopping: bool = False

_tune_space class-attribute

_tune_space: dict[str, tuple] = {}

early_stopping_rounds class-attribute instance-attribute

early_stopping_rounds: int = 50

tune_cv_folds class-attribute instance-attribute

tune_cv_folds: int = 5

tune_metric class-attribute instance-attribute

tune_metric: str = 'roc_auc'

tune_n_trials class-attribute instance-attribute

tune_n_trials: int = 50

tune_timeout class-attribute instance-attribute

tune_timeout: int = 600

__post_init__

__post_init__() -> None
Source code in src/signalflow/validator/sklearn_validator.py
def __post_init__(self) -> None:
    if self.model_params is None:
        self.model_params = {}
    if self.train_params is None:
        self.train_params = {}
    if self.tune_params is None:
        self.tune_params = {
            "n_trials": self.tune_n_trials,
            "cv_folds": self.tune_cv_folds,
            "timeout": self.tune_timeout,
        }

_create_model

_create_model(params: dict | None = None) -> Any

Create model instance with given parameters.

Source code in src/signalflow/validator/sklearn_validator.py
def _create_model(self, params: dict | None = None) -> Any:
    """Create model instance with given parameters."""
    model_class = import_model_class(self._model_class)

    final_params = {**self._default_params}
    if self.model_params:
        final_params.update(self.model_params)
    if params:
        final_params.update(params)

    return model_class(**final_params)
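The precedence in _create_model can be summarized as: class defaults, overridden by instance model_params, overridden by per-call params (e.g. an Optuna trial's suggestions). A small standalone sketch of that merge order, with invented parameter values:

```python
# Illustrative sketch of the parameter-precedence rule in _create_model:
# class defaults < instance model_params < per-call params.
default_params = {"n_estimators": 100, "learning_rate": 0.1}
model_params = {"learning_rate": 0.05}   # set on the validator instance
trial_params = {"n_estimators": 300}     # passed into _create_model

final_params = {**default_params}
final_params.update(model_params)
final_params.update(trial_params)

print(final_params)  # {'n_estimators': 300, 'learning_rate': 0.05}
```

Later updates win, so per-call tuning parameters always take effect over the instance and class defaults.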

_extract_features

_extract_features(X: DataFrame, fit_mode: bool = False) -> np.ndarray

Extract feature matrix from DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame | Input DataFrame | required |
| fit_mode | bool | If True, infer and store feature columns | False |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Feature matrix as numpy array |

Source code in src/signalflow/validator/sklearn_validator.py
def _extract_features(
    self,
    X: pl.DataFrame,
    fit_mode: bool = False,
) -> np.ndarray:
    """Extract feature matrix from DataFrame.

    Args:
        X: Input DataFrame
        fit_mode: If True, infer and store feature columns

    Returns:
        Feature matrix as numpy array
    """
    exclude_cols = {self.pair_col, self.ts_col}

    if fit_mode:
        self.feature_columns = [c for c in X.columns if c not in exclude_cols]

    if self.feature_columns is None:
        raise ValueError("feature_columns not set. Call fit() first.")

    missing = set(self.feature_columns) - set(X.columns)
    if missing:
        raise ValueError(f"Missing feature columns: {sorted(missing)}")

    return X.select(self.feature_columns).to_numpy()
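The column-selection rule is simple: anything that is not an identifier column counts as a feature. A standalone sketch of that rule and the missing-column check (column names invented):

```python
# Sketch of the feature-selection rule in _extract_features: all columns
# except the identifier columns (pair, timestamp) are treated as features.
columns = ["pair", "timestamp", "rsi_14", "ema_ratio", "volume_z"]
exclude_cols = {"pair", "timestamp"}

feature_columns = [c for c in columns if c not in exclude_cols]
print(feature_columns)  # ['rsi_14', 'ema_ratio', 'volume_z']

# At predict time, a frame missing stored features raises instead of
# silently misaligning the feature matrix; the check looks like this:
predict_columns = {"pair", "timestamp", "rsi_14"}
missing = set(feature_columns) - predict_columns
print(sorted(missing))  # ['ema_ratio', 'volume_z']
```

This is why every non-identifier column in X_train becomes a feature: drop any leakage-prone columns before calling fit.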

_extract_labels

_extract_labels(y: DataFrame | Series) -> np.ndarray

Extract label array.

Source code in src/signalflow/validator/sklearn_validator.py
def _extract_labels(self, y: pl.DataFrame | pl.Series) -> np.ndarray:
    """Extract label array."""
    if isinstance(y, pl.DataFrame):
        if y.width == 1:
            return y.to_numpy().ravel()
        elif "label" in y.columns:
            return y["label"].to_numpy()
        else:
            raise ValueError("y DataFrame must have single column or 'label' column")
    return y.to_numpy()

_get_class_labels

_get_class_labels() -> list[str]

Get class labels for probability columns.

Source code in src/signalflow/validator/sklearn_validator.py
def _get_class_labels(self) -> list[str]:
    """Get class labels for probability columns."""
    if self.model is None:
        raise ValueError("Model not fitted.")

    classes = getattr(self.model, "classes_", None)
    if classes is None:
        return ["none", "rise", "fall"]

    _legacy_map = {0: "none", 1: "rise", 2: "fall"}
    return [_legacy_map.get(c, str(c)) for c in classes]
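The legacy mapping determines the probability column names. A standalone sketch of how fitted classes become column suffixes:

```python
# Sketch of _get_class_labels: integer classes from the fitted model are
# mapped onto the legacy string labels used in probability column names.
_legacy_map = {0: "none", 1: "rise", 2: "fall"}

classes = [0, 1, 2]  # e.g. model.classes_ after fitting on integer labels
labels = [_legacy_map.get(c, str(c)) for c in classes]
print(labels)  # ['none', 'rise', 'fall']

# Classes outside the legacy map fall back to their string form, so
# probability columns stay well-defined for custom encodings:
custom = [_legacy_map.get(c, str(c)) for c in [5, "up"]]
print(custom)  # ['5', 'up']
```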

_get_early_stopping_kwargs

_get_early_stopping_kwargs(X_val: ndarray, y_val: ndarray) -> dict[str, Any]

Get early stopping fit kwargs. Override in subclasses.

Source code in src/signalflow/validator/sklearn_validator.py
def _get_early_stopping_kwargs(
    self,
    X_val: np.ndarray,
    y_val: np.ndarray,
) -> dict[str, Any]:
    """Get early stopping fit kwargs. Override in subclasses."""
    return {}

fit

fit(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> SklearnValidatorBase

Train the validator.

Note: Filter to active signals BEFORE calling this method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X_train | DataFrame | Training features (already filtered to active signals) | required |
| y_train | DataFrame \| Series | Training labels | required |
| X_val | DataFrame \| None | Validation features (optional, for early stopping) | None |
| y_val | DataFrame \| Series \| None | Validation labels (optional) | None |

Returns:

| Type | Description |
| --- | --- |
| SklearnValidatorBase | Self for method chaining |

Source code in src/signalflow/validator/sklearn_validator.py
def fit(
    self,
    X_train: pl.DataFrame,
    y_train: pl.DataFrame | pl.Series,
    X_val: pl.DataFrame | None = None,
    y_val: pl.DataFrame | pl.Series | None = None,
) -> "SklearnValidatorBase":
    """Train the validator.

    Note: Filter to active signals BEFORE calling this method.

    Args:
        X_train: Training features (already filtered to active signals)
        y_train: Training labels
        X_val: Validation features (optional, for early stopping)
        y_val: Validation labels (optional)

    Returns:
        Self for method chaining
    """
    X_np = self._extract_features(X_train, fit_mode=True)
    y_np = self._extract_labels(y_train)

    self.model = self._create_model()

    fit_kwargs: dict[str, Any] = {}

    if X_val is not None and y_val is not None and self._supports_early_stopping:
        X_val_np = self._extract_features(X_val)
        y_val_np = self._extract_labels(y_val)
        fit_kwargs = self._get_early_stopping_kwargs(X_val_np, y_val_np)

    self.model.fit(X_np, y_np, **fit_kwargs)

    return self

load classmethod

load(path: str | Path) -> SklearnValidatorBase

Load validator from file.

Source code in src/signalflow/validator/sklearn_validator.py
@classmethod
def load(cls, path: str | Path) -> "SklearnValidatorBase":
    """Load validator from file."""
    path = Path(path)

    with open(path, "rb") as f:
        state = pickle.load(f)

    validator = cls(
        model=state["model"],
        model_params=state["model_params"],
        train_params=state["train_params"],
        tune_params=state["tune_params"],
        feature_columns=state["feature_columns"],
        pair_col=state.get("pair_col", "pair"),
        ts_col=state.get("ts_col", "timestamp"),
    )

    return validator

predict

predict(signals: Signals, X: DataFrame) -> Signals

Predict class labels and return updated Signals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| signals | Signals | Input signals container | required |
| X | DataFrame | Features DataFrame with (pair, timestamp) + feature columns | required |

Returns:

| Type | Description |
| --- | --- |
| Signals | New Signals with 'validation_pred' column added |

Source code in src/signalflow/validator/sklearn_validator.py
def predict(self, signals: Signals, X: pl.DataFrame) -> Signals:
    """Predict class labels and return updated Signals.

    Args:
        signals: Input signals container
        X: Features DataFrame with (pair, timestamp) + feature columns

    Returns:
        New Signals with 'validation_pred' column added
    """
    if self.model is None:
        raise ValueError("Model not fitted. Call fit() first.")

    signals_df = signals.value

    X_matched = signals_df.select([self.pair_col, self.ts_col]).join(
        X,
        on=[self.pair_col, self.ts_col],
        how="left",
    )

    X_np = self._extract_features(X_matched)
    predictions = self.model.predict(X_np)

    result_df = signals_df.with_columns(pl.Series(name="validation_pred", values=predictions))

    return Signals(result_df)

predict_proba

predict_proba(signals: Signals, X: DataFrame) -> Signals

Predict class probabilities and return updated Signals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| signals | Signals | Input signals container | required |
| X | DataFrame | Features DataFrame with (pair, timestamp) + feature columns | required |

Returns:

| Type | Description |
| --- | --- |
| Signals | New Signals with probability columns added |

Source code in src/signalflow/validator/sklearn_validator.py
def predict_proba(self, signals: Signals, X: pl.DataFrame) -> Signals:
    """Predict class probabilities and return updated Signals.

    Args:
        signals: Input signals container
        X: Features DataFrame with (pair, timestamp) + feature columns

    Returns:
        New Signals with probability columns added
    """
    if self.model is None:
        raise ValueError("Model not fitted. Call fit() first.")

    signals_df = signals.value
    classes = self._get_class_labels()

    X_matched = signals_df.select([self.pair_col, self.ts_col]).join(
        X,
        on=[self.pair_col, self.ts_col],
        how="left",
    )

    X_np = self._extract_features(X_matched)
    probas = self.model.predict_proba(X_np)

    result_df = signals_df
    for i, class_label in enumerate(classes):
        col_name = f"probability_{class_label}"
        result_df = result_df.with_columns(pl.Series(name=col_name, values=probas[:, i]))

    return Signals(result_df)

save

save(path: str | Path) -> None

Save validator to file.

Source code in src/signalflow/validator/sklearn_validator.py
def save(self, path: str | Path) -> None:
    """Save validator to file."""
    path = Path(path)

    state = {
        "validator_class": self.__class__.__name__,
        "model": self.model,
        "model_params": self.model_params,
        "train_params": self.train_params,
        "tune_params": self.tune_params,
        "feature_columns": self.feature_columns,
        "pair_col": self.pair_col,
        "ts_col": self.ts_col,
    }

    with open(path, "wb") as f:
        pickle.dump(state, f)
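Because save() writes a plain state dict, load() can tolerate files written before a field existed by falling back to defaults via dict.get. A standalone sketch of that roundtrip:

```python
import os
import pickle
import tempfile

# A state dict like the one save() writes; "pair_col" is deliberately
# omitted, as in a file produced by an older version of the class.
state = {
    "model": None,
    "feature_columns": ["rsi_14"],
}

path = os.path.join(tempfile.mkdtemp(), "validator.pkl")
with open(path, "wb") as f:
    pickle.dump(state, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# load() falls back to the default identifier columns for old files:
print(restored.get("pair_col", "pair"))  # pair
```

The usual pickle caveat applies: only load files from sources you trust, since unpickling can execute arbitrary code.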

tune

tune(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> dict[str, Any]

Tune hyperparameters using Optuna.

Note: Filter to active signals BEFORE calling this method.

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | Best parameters found |

Source code in src/signalflow/validator/sklearn_validator.py
def tune(
    self,
    X_train: pl.DataFrame,
    y_train: pl.DataFrame | pl.Series,
    X_val: pl.DataFrame | None = None,
    y_val: pl.DataFrame | pl.Series | None = None,
) -> dict[str, Any]:
    """Tune hyperparameters using Optuna.

    Note: Filter to active signals BEFORE calling this method.

    Returns:
        Best parameters found
    """
    import optuna
    from sklearn.model_selection import cross_val_score

    X_np = self._extract_features(X_train, fit_mode=True)
    y_np = self._extract_labels(y_train)

    _tp = self.tune_params or {}
    n_trials = _tp.get("n_trials", self.tune_n_trials)
    cv_folds = _tp.get("cv_folds", self.tune_cv_folds)
    timeout = _tp.get("timeout", self.tune_timeout)

    def objective(trial: optuna.Trial) -> float:
        params = build_optuna_params(trial, self._tune_space)
        model = self._create_model(params)

        scores = cross_val_score(
            model,
            X_np,
            y_np,
            cv=cv_folds,
            scoring=self.tune_metric,
            n_jobs=-1,
        )
        return float(scores.mean())

    study = optuna.create_study(direction="maximize")
    study.optimize(
        objective,
        n_trials=n_trials,
        timeout=timeout,
        show_progress_bar=True,
    )

    best_params = {**self._default_params, **study.best_params}
    self.model_params = best_params

    return best_params
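build_optuna_params is internal to signalflow, so the mapping below from tune-space tuples to Optuna's suggest_* calls is an assumption inferred from the _tune_space entries shown on this page ('int', 'float', 'log_float', 'categorical'). The FakeTrial stub stands in for optuna.Trial so the sketch runs without Optuna installed:

```python
# Hypothetical reimplementation of build_optuna_params: turn tune-space
# tuples like ("int", 50, 500) into Optuna suggest calls.
def build_params(trial, tune_space):
    params = {}
    for name, spec in tune_space.items():
        kind = spec[0]
        if kind == "int":
            params[name] = trial.suggest_int(name, spec[1], spec[2])
        elif kind == "float":
            params[name] = trial.suggest_float(name, spec[1], spec[2])
        elif kind == "log_float":
            params[name] = trial.suggest_float(name, spec[1], spec[2], log=True)
        elif kind == "categorical":
            params[name] = trial.suggest_categorical(name, spec[1])
        else:
            raise ValueError(f"Unknown tune-space kind: {kind}")
    return params

# Minimal stand-in for optuna.Trial that always picks the lower bound /
# first choice, enough to exercise the mapping deterministically.
class FakeTrial:
    def suggest_int(self, name, low, high):
        return low
    def suggest_float(self, name, low, high, log=False):
        return low
    def suggest_categorical(self, name, choices):
        return choices[0]

space = {
    "n_estimators": ("int", 50, 500),
    "learning_rate": ("log_float", 0.01, 0.3),
    "solver": ("categorical", ["saga"]),
}
print(build_params(FakeTrial(), space))
# {'n_estimators': 50, 'learning_rate': 0.01, 'solver': 'saga'}
```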

validate_signals

validate_signals(signals: Signals, features: DataFrame, prefix: str = 'probability_') -> Signals

Add validation probabilities to signals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| signals | Signals | Input Signals container | required |
| features | DataFrame | Features DataFrame with (pair, timestamp) + features | required |
| prefix | str | Prefix for probability columns | 'probability_' |

Returns:

| Type | Description |
| --- | --- |
| Signals | New Signals with probability columns added |

Source code in src/signalflow/validator/sklearn_validator.py
def validate_signals(
    self,
    signals: Signals,
    features: pl.DataFrame,
    prefix: str = "probability_",
) -> Signals:
    """Add validation probabilities to signals.

    Args:
        signals: Input Signals container
        features: Features DataFrame with (pair, timestamp) + features
        prefix: Prefix for probability columns (default: "probability_")

    Returns:
        New Signals with probability columns added.
    """
    return self.predict_proba(signals, features)

LightGBM Validator

signalflow.validator.sklearn_validator.LightGBMValidator dataclass

LightGBMValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50, n_estimators: int = 100, max_depth: int = 6, learning_rate: float = 0.1, num_leaves: int = 31)

Bases: SklearnValidatorBase

LightGBM-based signal validator.

Gradient boosting model optimized for speed and performance. Supports early stopping with validation data.

Example

validator = LightGBMValidator(n_estimators=200)
validator.fit(X_train, y_train, X_val, y_val)
validated = validator.validate_signals(signals, features)

_default_params class-attribute

_default_params: dict[str, Any] = {'n_estimators': 100, 'max_depth': 6, 'learning_rate': 0.1, 'num_leaves': 31, 'min_child_samples': 20, 'subsample': 0.8, 'colsample_bytree': 0.8, 'random_state': 42, 'n_jobs': -1, 'verbosity': -1}

_model_class class-attribute

_model_class: str = 'lightgbm.LGBMClassifier'

_supports_early_stopping class-attribute

_supports_early_stopping: bool = True

_tune_space class-attribute

_tune_space: dict[str, tuple] = {'n_estimators': ('int', 50, 500), 'max_depth': ('int', 3, 12), 'learning_rate': ('log_float', 0.01, 0.3), 'num_leaves': ('int', 15, 127), 'min_child_samples': ('int', 5, 100), 'subsample': ('float', 0.6, 1.0), 'colsample_bytree': ('float', 0.6, 1.0)}

learning_rate class-attribute instance-attribute

learning_rate: float = 0.1

max_depth class-attribute instance-attribute

max_depth: int = 6

n_estimators class-attribute instance-attribute

n_estimators: int = 100

num_leaves class-attribute instance-attribute

num_leaves: int = 31

__post_init__

__post_init__() -> None
Source code in src/signalflow/validator/sklearn_validator.py
def __post_init__(self) -> None:
    super().__post_init__()
    # Merge instance params into model_params
    self.model_params = {
        **(self.model_params or {}),
        "n_estimators": self.n_estimators,
        "max_depth": self.max_depth,
        "learning_rate": self.learning_rate,
        "num_leaves": self.num_leaves,
    }

_get_early_stopping_kwargs

_get_early_stopping_kwargs(X_val: ndarray, y_val: ndarray) -> dict[str, Any]
Source code in src/signalflow/validator/sklearn_validator.py
def _get_early_stopping_kwargs(
    self,
    X_val: np.ndarray,
    y_val: np.ndarray,
) -> dict[str, Any]:
    import lightgbm

    return {
        "eval_set": [(X_val, y_val)],
        "callbacks": [lightgbm.early_stopping(self.early_stopping_rounds, verbose=False)],
    }

XGBoost Validator

signalflow.validator.sklearn_validator.XGBoostValidator dataclass

XGBoostValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50, n_estimators: int = 100, max_depth: int = 6, learning_rate: float = 0.1)

Bases: SklearnValidatorBase

XGBoost-based signal validator.

Robust gradient boosting with regularization. Supports early stopping with validation data.

Example

validator = XGBoostValidator(n_estimators=200)
validator.fit(X_train, y_train, X_val, y_val)
validated = validator.validate_signals(signals, features)

_default_params class-attribute

_default_params: dict[str, Any] = {'n_estimators': 100, 'max_depth': 6, 'learning_rate': 0.1, 'subsample': 0.8, 'colsample_bytree': 0.8, 'random_state': 42, 'n_jobs': -1, 'verbosity': 0, 'use_label_encoder': False, 'eval_metric': 'logloss'}

_model_class class-attribute

_model_class: str = 'xgboost.XGBClassifier'

_supports_early_stopping class-attribute

_supports_early_stopping: bool = True

_tune_space class-attribute

_tune_space: dict[str, tuple] = {'n_estimators': ('int', 50, 500), 'max_depth': ('int', 3, 12), 'learning_rate': ('log_float', 0.01, 0.3), 'subsample': ('float', 0.6, 1.0), 'colsample_bytree': ('float', 0.6, 1.0), 'min_child_weight': ('int', 1, 10), 'gamma': ('float', 0, 0.5)}

learning_rate class-attribute instance-attribute

learning_rate: float = 0.1

max_depth class-attribute instance-attribute

max_depth: int = 6

n_estimators class-attribute instance-attribute

n_estimators: int = 100

__post_init__

__post_init__() -> None
Source code in src/signalflow/validator/sklearn_validator.py
def __post_init__(self) -> None:
    super().__post_init__()
    self.model_params = {
        **(self.model_params or {}),
        "n_estimators": self.n_estimators,
        "max_depth": self.max_depth,
        "learning_rate": self.learning_rate,
    }

_get_early_stopping_kwargs

_get_early_stopping_kwargs(X_val: ndarray, y_val: ndarray) -> dict[str, Any]
Source code in src/signalflow/validator/sklearn_validator.py
def _get_early_stopping_kwargs(
    self,
    X_val: np.ndarray,
    y_val: np.ndarray,
) -> dict[str, Any]:
    return {
        "eval_set": [(X_val, y_val)],
        "early_stopping_rounds": self.early_stopping_rounds,
        "verbose": False,
    }

Random Forest Validator

signalflow.validator.sklearn_validator.RandomForestValidator dataclass

RandomForestValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50, n_estimators: int = 100, max_depth: int = 10)

Bases: SklearnValidatorBase

Random Forest-based signal validator.

Ensemble of decision trees with bagging.

Example

validator = RandomForestValidator(n_estimators=200)
validator.fit(X_train, y_train)
validated = validator.validate_signals(signals, features)

_default_params class-attribute

_default_params: dict[str, Any] = {'n_estimators': 100, 'max_depth': 10, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'random_state': 42, 'n_jobs': -1}

_model_class class-attribute

_model_class: str = 'sklearn.ensemble.RandomForestClassifier'

_supports_early_stopping class-attribute

_supports_early_stopping: bool = False

_tune_space class-attribute

_tune_space: dict[str, tuple] = {'n_estimators': ('int', 50, 300), 'max_depth': ('int', 5, 30), 'min_samples_split': ('int', 2, 20), 'min_samples_leaf': ('int', 1, 10)}

max_depth class-attribute instance-attribute

max_depth: int = 10

n_estimators class-attribute instance-attribute

n_estimators: int = 100

__post_init__

__post_init__() -> None
Source code in src/signalflow/validator/sklearn_validator.py
def __post_init__(self) -> None:
    super().__post_init__()
    self.model_params = {
        **(self.model_params or {}),
        "n_estimators": self.n_estimators,
        "max_depth": self.max_depth,
    }

Logistic Regression Validator

signalflow.validator.sklearn_validator.LogisticRegressionValidator dataclass

LogisticRegressionValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50, C: float = 1.0, max_iter: int = 1000)

Bases: SklearnValidatorBase

Logistic Regression-based signal validator.

Linear classifier with regularization.

Example

validator = LogisticRegressionValidator(C=0.1)
validator.fit(X_train, y_train)
validated = validator.validate_signals(signals, features)

C class-attribute instance-attribute

C: float = 1.0

_default_params class-attribute

_default_params: dict[str, Any] = {'C': 1.0, 'max_iter': 1000, 'random_state': 42, 'n_jobs': -1}

_model_class class-attribute

_model_class: str = 'sklearn.linear_model.LogisticRegression'

_supports_early_stopping class-attribute

_supports_early_stopping: bool = False

_tune_space class-attribute

_tune_space: dict[str, tuple] = {'C': ('log_float', 0.0001, 100), 'penalty': ('categorical', ['l1', 'l2']), 'solver': ('categorical', ['saga'])}

max_iter class-attribute instance-attribute

max_iter: int = 1000

__post_init__

__post_init__() -> None
Source code in src/signalflow/validator/sklearn_validator.py
def __post_init__(self) -> None:
    super().__post_init__()
    self.model_params = {
        **(self.model_params or {}),
        "C": self.C,
        "max_iter": self.max_iter,
    }
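The `_tune_space` above pins the solver to `'saga'`, which supports both the `'l1'` and `'l2'` penalties being searched. A plain-sklearn sketch of evaluating that space (synthetic data; not signalflow's actual tuning loop, which uses Optuna):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

scores = {}
for penalty in ("l1", "l2"):  # the 'penalty' categorical from _tune_space
    # saga converges faster on standardized features
    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(C=1.0, penalty=penalty, solver="saga",
                           max_iter=1000, random_state=42),
    )
    scores[penalty] = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"penalty={penalty}: roc_auc={scores[penalty]:.3f}")
```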

SVM Validator

signalflow.validator.sklearn_validator.SVMValidator dataclass

SVMValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50, C: float = 1.0, kernel: str = 'rbf')

Bases: SklearnValidatorBase

Support Vector Machine-based signal validator.

SVM classifier with RBF kernel by default.

Example

validator = SVMValidator(C=10.0, kernel="rbf")
validator.fit(X_train, y_train)
validated = validator.validate_signals(signals, features)

C class-attribute instance-attribute

C: float = 1.0

_default_params class-attribute

_default_params: dict[str, Any] = {'C': 1.0, 'kernel': 'rbf', 'probability': True, 'random_state': 42}

_model_class class-attribute

_model_class: str = 'sklearn.svm.SVC'

_supports_early_stopping class-attribute

_supports_early_stopping: bool = False

_tune_space class-attribute

_tune_space: dict[str, tuple] = {'C': ('log_float', 0.001, 100), 'kernel': ('categorical', ['rbf', 'linear', 'poly']), 'gamma': ('categorical', ['scale', 'auto'])}

kernel class-attribute instance-attribute

kernel: str = 'rbf'

__post_init__

__post_init__() -> None
Source code in src/signalflow/validator/sklearn_validator.py
def __post_init__(self) -> None:
    super().__post_init__()
    self.model_params = {
        **(self.model_params or {}),
        "C": self.C,
        "kernel": self.kernel,
    }
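Note that `probability=True` in `_default_params` is what makes `predict_proba` available on `SVC`: it fits an internal Platt-scaling calibration, which adds cross-validation cost at fit time. A plain-sklearn sketch (synthetic data, illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# probability=True (as in _default_params) enables predict_proba via
# internal Platt scaling; SVMs are scale-sensitive, so standardize first.
clf = make_pipeline(
    StandardScaler(),
    SVC(C=1.0, kernel="rbf", probability=True, random_state=42),
)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])
print(proba.shape)  # (5, 2): one probability per class
```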

Auto-Select Validator

signalflow.validator.sklearn_validator.AutoSelectValidator dataclass

AutoSelectValidator(model: Any | None = None, model_type: str | None = None, model_params: dict | None = None, train_params: dict | None = None, tune_enabled: bool = False, tune_params: dict | None = None, feature_columns: list[str] | None = None, pair_col: str = 'pair', ts_col: str = 'timestamp', tune_metric: str = 'roc_auc', tune_cv_folds: int = 5, tune_n_trials: int = 50, tune_timeout: int = 600, early_stopping_rounds: int = 50, auto_select_metric: str = 'roc_auc', auto_select_cv_folds: int = 5, selected_validator: SklearnValidatorBase | None = None)

Bases: SklearnValidatorBase

Auto-selecting signal validator.

Automatically selects the best model via cross-validation. Tests LightGBM, XGBoost, Random Forest, and Logistic Regression.

Example

validator = AutoSelectValidator()
validator.fit(X_train, y_train)  # selects the best model via cross-validation
print(validator.selected_validator)  # shows which validator was chosen

_default_params class-attribute

_default_params: dict[str, Any] = {}

_model_class class-attribute

_model_class: str = ''

_supports_early_stopping class-attribute

_supports_early_stopping: bool = False

_tune_space class-attribute

_tune_space: dict[str, tuple] = {}

auto_select_cv_folds class-attribute instance-attribute

auto_select_cv_folds: int = 5

auto_select_metric class-attribute instance-attribute

auto_select_metric: str = 'roc_auc'

selected_validator class-attribute instance-attribute

selected_validator: SklearnValidatorBase | None = field(default=None, repr=False)

fit

fit(X_train: DataFrame, y_train: DataFrame | Series, X_val: DataFrame | None = None, y_val: DataFrame | Series | None = None) -> AutoSelectValidator

Train the validator, auto-selecting the best model.

Source code in src/signalflow/validator/sklearn_validator.py
def fit(
    self,
    X_train: pl.DataFrame,
    y_train: pl.DataFrame | pl.Series,
    X_val: pl.DataFrame | None = None,
    y_val: pl.DataFrame | pl.Series | None = None,
) -> "AutoSelectValidator":
    """Train the validator, auto-selecting the best model."""
    from sklearn.model_selection import cross_val_score

    X_np = self._extract_features(X_train, fit_mode=True)
    y_np = self._extract_labels(y_train)

    best_score = -np.inf
    best_validator_cls = None

    for validator_cls in AUTO_SELECT_VALIDATORS:
        try:
            validator = validator_cls()
            model = validator._create_model()

            scores = cross_val_score(
                model,
                X_np,
                y_np,
                cv=self.auto_select_cv_folds,
                scoring=self.auto_select_metric,
                n_jobs=-1,
            )
            mean_score = scores.mean()

            if mean_score > best_score:
                best_score = mean_score
                best_validator_cls = validator_cls

        except ImportError:
            # Backing library (e.g. lightgbm, xgboost) is not installed
            continue
        except Exception:
            # Skip any candidate that fails during cross-validation
            continue

    if best_validator_cls is None:
        raise RuntimeError("No suitable model found. Install lightgbm, xgboost, or scikit-learn.")

    # Create and fit the selected validator
    self.selected_validator = best_validator_cls(feature_columns=self.feature_columns)
    self.selected_validator.fit(X_train, y_train, X_val, y_val)
    self.model = self.selected_validator.model

    return self
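The selection loop in `fit` above can be reproduced with plain sklearn. A hedged sketch with two illustrative candidates (signalflow's actual candidate list is `AUTO_SELECT_VALIDATORS`):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Same pattern as AutoSelectValidator.fit: score each candidate with
# cross-validation and keep the best. Candidate list here is illustrative.
candidates = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    LogisticRegression(max_iter=1000, random_state=42),
]

best_score, best_model = -np.inf, None
for model in candidates:
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc", n_jobs=-1).mean()
    if score > best_score:
        best_score, best_model = score, model

print(type(best_model).__name__, round(best_score, 3))
```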

predict

predict(signals: Signals, X: DataFrame) -> Signals
Source code in src/signalflow/validator/sklearn_validator.py
def predict(self, signals: Signals, X: pl.DataFrame) -> Signals:
    if self.selected_validator is None:
        raise ValueError("Model not fitted. Call fit() first.")
    return self.selected_validator.predict(signals, X)

predict_proba

predict_proba(signals: Signals, X: DataFrame) -> Signals
Source code in src/signalflow/validator/sklearn_validator.py
def predict_proba(self, signals: Signals, X: pl.DataFrame) -> Signals:
    if self.selected_validator is None:
        raise ValueError("Model not fitted. Call fit() first.")
    return self.selected_validator.predict_proba(signals, X)

tune

tune(*args: Any, **kwargs: Any) -> dict[str, Any]
Source code in src/signalflow/validator/sklearn_validator.py
def tune(self, *args: Any, **kwargs: Any) -> dict[str, Any]:
    raise NotImplementedError("AutoSelectValidator does not support tune(). Use a specific validator.")