Signal Architecture & Meta-Labeling¶

This page describes the core philosophy behind SignalFlow: a signal-driven approach to algorithmic trading based on Marcos Lopez de Prado's meta-labeling methodology.

The Core Idea¶

Traditional algorithmic trading systems make binary decisions: buy or sell based on some rule. SignalFlow takes a different approach, inspired by machine learning research in quantitative finance, and splits the work into three stages:

Signal Detection -- identify potential market changes
Signal Validation -- use ML to estimate the probability that a signal is correct
Strategy Execution -- use validated signals to make trading decisions

This separation is critical: the detector's job is to have high recall (detect as many real opportunities as possible), while the validator's job is to have high precision (filter out false positives).
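The recall/precision split can be made concrete with a small calculation. The sketch below is plain Python with made-up bar indices and probabilities (none of these values or names come from SignalFlow); it only illustrates how a probability threshold trades detector recall for validated precision.

```python
def precision_recall(predicted: set[int], actual: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for predicted event indices against actual ones."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical bar indices where a real opportunity occurred.
actual_events = {3, 7, 12, 20}

# Detector output: a wide net that catches every real event plus false alarms.
detector_signals = {1, 3, 5, 7, 9, 12, 15, 20}

# Hypothetical validator probabilities for each detector signal.
validator_probability = {1: 0.2, 3: 0.9, 5: 0.4, 7: 0.8, 9: 0.3, 12: 0.7, 15: 0.65, 20: 0.55}

# Keep only the signals the validator scores at or above the threshold.
threshold = 0.6
validated_signals = {i for i in detector_signals if validator_probability[i] >= threshold}

detector_precision, detector_recall = precision_recall(detector_signals, actual_events)
validated_precision, validated_recall = precision_recall(validated_signals, actual_events)

print(f"detector:  precision={detector_precision:.2f} recall={detector_recall:.2f}")
print(f"validated: precision={validated_precision:.2f} recall={validated_recall:.2f}")
```

In this toy data the detector alone has full recall but only half of its signals are real; filtering raises precision at the cost of one missed event.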

flowchart LR
    D[Detector] -->|raw signals| V[Validator]
    V -->|filtered signals + probability| S[Strategy]
    L[Labeler] -->|historical labels| T[Train Validator]
    T --> V

    style D fill:#ea580c,stroke:#f97316,color:#fff
    style V fill:#16a34a,stroke:#22c55e,color:#fff
    style S fill:#dc2626,stroke:#ef4444,color:#fff
    style L fill:#7c3aed,stroke:#8b5cf6,color:#fff
    style T fill:#0891b2,stroke:#06b6d4,color:#fff

What is a Signal?¶

A signal is a prediction that a specific type of market change is occurring or is about to occur. Signals are not trading orders -- they are informational events that the strategy layer interprets.

Two-Level Signal Model¶

SignalFlow uses a two-level signal model:

Level	Field	Type	Purpose
Category	`signal_category`	`SignalCategory` enum	Broad classification for routing
Type	`signal_type`	`str`	Specific signal value within category

from signalflow.core.enums import SignalCategory

# A signal has both a category and a type
signal_category = SignalCategory.PRICE_DIRECTION  # broad: about price movement
signal_type = "rise"                               # specific: price is rising

This design allows the system to:

Route signals based on category (e.g., directional signals go to entry rules, volatility signals go to position sizing)
Extend signal types without modifying core code (any string is valid as a signal_type)
Keep multiple signal categories per timestamp (e.g., a bar can simultaneously have a price_direction=rise signal and a volatility=high_volatility signal)
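As a rough illustration of category-based routing, the sketch below uses a stand-in enum and a hypothetical routing table. `Category`, `ROUTES`, and `route` are invented for this example and are not SignalFlow APIs; the real `SignalCategory` enum has more members.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical stand-in for SignalCategory (two members only)."""
    PRICE_DIRECTION = "price_direction"
    VOLATILITY = "volatility"


# Hypothetical routing table: category -> name of the strategy component.
ROUTES = {
    Category.PRICE_DIRECTION: "entry_rules",
    Category.VOLATILITY: "position_sizing",
}


def route(signals):
    """Group (category, signal_type) pairs by the component that handles them."""
    routed = {}
    for category, signal_type in signals:
        routed.setdefault(ROUTES[category], []).append(signal_type)
    return routed


# One bar carrying two signals from different categories.
bar_signals = [
    (Category.PRICE_DIRECTION, "rise"),
    (Category.VOLATILITY, "high_volatility"),
]
routed = route(bar_signals)
print(routed)
```

Because routing keys on the category only, a new `signal_type` string passes through the same route without any change to the table.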

Signal Categories¶

Category	Description	Example Types
`PRICE_DIRECTION`	Price movement direction	`rise`, `fall`, `flat`
`PRICE_STRUCTURE`	Price patterns and extrema	`local_max`, `local_min`, `breakout_up`
`TREND_MOMENTUM`	Trend state and momentum	`trend_start`, `trend_reversal`, `overbought`
`VOLATILITY`	Volatility regime	`high_volatility`, `low_volatility`, `volatility_expansion`
`VOLUME_LIQUIDITY`	Volume and liquidity patterns	`abnormal_volume`, `illiquidity`, `accumulation`
`MARKET_WIDE`	Cross-pair market events	`market_crash`, `regime_shift`, `synchronization`
`ANOMALY`	Anomalous events	`extreme_positive_anomaly`, `extreme_negative_anomaly`

The full registry of known signal types is in signalflow.core.signal_registry.KNOWN_SIGNALS. This registry is advisory -- new signal types can be used without modifying it.

Null vs FLAT: Uncertainty vs Market State¶

A critical distinction in SignalFlow:

Value	Meaning	Example
`"flat"`	The market is moving sideways. This is a valid signal -- a real market state.	After a strong trend, price consolidates in a range. The labeler identifies this as `"flat"`.
`null`	The labeler/detector doesn't know. Epistemic uncertainty.	An unexpected tweet from a public figure causes unpredictable volatility. The labeler cannot determine direction and outputs `null`.

This matters for ML training: "flat" is a valid class label that the model should learn to predict. null means "skip this sample" -- the labeler has no information about this timestamp, and the model should not be trained on it.

import polars as pl

# FLAT = real market state (valid signal)
pl.lit("flat").alias("signal_type")     # "I know the market is sideways"

# null = uncertainty (no information)
pl.lit(None, dtype=pl.Utf8).alias("signal_type")  # "I don't know what's happening"
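The effect on a training set can be sketched in plain Python, kept dependency-free on purpose. The rows below are made up; in Polars the same filter would typically be written with `pl.col("signal_type").is_not_null()`.

```python
# Hypothetical labeled rows: None plays the role of null.
rows = [
    {"close": 100.0, "signal_type": "rise"},
    {"close": 101.0, "signal_type": "flat"},
    {"close": 101.2, "signal_type": None},   # labeler had no information
    {"close": 100.9, "signal_type": "fall"},
]

# Drop samples the labeler could not label; keep "flat" as a real class.
training_rows = [row for row in rows if row["signal_type"] is not None]
classes = sorted({row["signal_type"] for row in training_rows})

print(len(training_rows), classes)
```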

The Dual Pipeline: Labelers vs Detectors¶

Every signal category in SignalFlow can have two complementary implementations:

Labeler (Historical, Forward-Looking)¶

Lives in signalflow.target module
Knows the future -- uses data after timestamp t to label t
Purpose: generate training labels for ML models
Extends Labeler base class, implements compute_group()
Length-preserving: output has the same number of rows as input

from signalflow.target import AnomalyLabeler

labeler = AnomalyLabeler(
    threshold_return_std=4.0,
    horizon=60,
    mask_to_signals=False,
)
labeled_df = labeler.compute(ohlcv_df)
# Each row gets a label: "extreme_positive_anomaly", "extreme_negative_anomaly", or null

Detector (Real-Time, Backward-Looking)¶

Lives in signalflow.detector module
Only uses past and current data -- safe for live trading
Purpose: generate trading signals in real-time
Extends SignalDetector base class, implements detect()
Outputs a Signals DataFrame (only rows with detected signals)

from signalflow.detector import AnomalyDetector

detector = AnomalyDetector(
    threshold_return_std=4.0,
    vol_window=1440,
)
signals = detector.run(raw_data_view)
# Only bars with anomalies are returned

Why Both?¶

The labeler knows the future, so its labels are more accurate but can only be used for training. The detector works in real-time but is less precise.
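The difference between the two views can be sketched with a toy price series. Both functions below are simplified illustrations written for this page, not the SignalFlow implementations; labeling a zero return as unknown is an assumption of the sketch.

```python
def forward_labels(prices, horizon):
    """Label bar t from the return over the NEXT `horizon` bars (uses future data).

    Length-preserving: returns one label per input bar, None where unknown.
    """
    labels = []
    for t in range(len(prices)):
        if t + horizon >= len(prices):
            labels.append(None)  # no future data available yet
        elif prices[t + horizon] > prices[t]:
            labels.append("rise")
        elif prices[t + horizon] < prices[t]:
            labels.append("fall")
        else:
            labels.append(None)  # assumption: zero return is left unlabeled
    return labels


def backward_signals(prices, lookback):
    """Emit (index, type) only for bars whose PAST `lookback`-bar return is non-zero."""
    signals = []
    for t in range(lookback, len(prices)):
        if prices[t] > prices[t - lookback]:
            signals.append((t, "rise"))
        elif prices[t] < prices[t - lookback]:
            signals.append((t, "fall"))
    return signals


prices = [100, 101, 103, 102, 99, 100, 104]
labels = forward_labels(prices, horizon=2)
signals = backward_signals(prices, lookback=2)

print(labels)
print(signals)
```

With these inputs the backward-looking signal at bar t matches the forward label at bar t - 2: the detector sees the same move, but only after it has happened.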

The training pipeline:

Run the labeler on historical data to create hindsight-based labels (more accurate than real-time detection, but still imperfect)
Train a validator (ML model) to predict labeler output from features
In production, the detector generates candidate signals
The validator filters and scores them

flowchart TB
    subgraph Training ["Training (offline)"]
        H[Historical Data] --> L[Labeler]
        L -->|labels| ML[Train ML Model]
        H --> F1[Features]
        F1 --> ML
    end

    subgraph Production ["Production (real-time)"]
        M[Market Data] --> D[Detector]
        M --> F2[Features]
        D -->|candidate signals| V[Validator / ML Model]
        F2 --> V
        V -->|validated signals| S[Strategy]
    end

    ML -.->|trained model| V

    style L fill:#7c3aed,stroke:#8b5cf6,color:#fff
    style D fill:#ea580c,stroke:#f97316,color:#fff
    style V fill:#16a34a,stroke:#22c55e,color:#fff
    style ML fill:#0891b2,stroke:#06b6d4,color:#fff

Available Labelers & Detectors¶

Price Direction¶

The most basic signal category. Predicts whether price will rise or fall.

Component	Algorithm	Signal Types
`FixedHorizonLabeler`	Forward return sign over N bars	`rise`, `fall`, `null`
`TripleBarrierLabeler`	Triple barrier method (De Prado)	`rise`, `fall`, `null`
`TakeProfitLabeler`	Symmetric TP/SL barrier	`rise`, `fall`, `null`
`TrendScanningLabeler`	OLS t-statistic across windows (De Prado)	`rise`, `fall`, `null`
`ExampleSmaCrossDetector`	SMA crossover (real-time)	`rise`, `fall`

Anomaly¶

Detects extreme, unexpected market events (anomalous returns).

Component	Algorithm	Signal Types
`AnomalyLabeler`	Forward return magnitude vs rolling vol	`extreme_positive_anomaly`, `extreme_negative_anomaly`, `null`
`AnomalyDetector`	Current return magnitude vs rolling vol	`extreme_positive_anomaly`, `extreme_negative_anomaly`
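A backward-looking version of the "return magnitude vs rolling vol" idea might look like the sketch below. It is a simplified illustration, not the `AnomalyDetector` source: the parameter names mirror the constructor arguments shown earlier, while the choice to exclude the current bar from the volatility window and the sample returns are assumptions.

```python
import statistics


def detect_anomalies(returns, vol_window, threshold_return_std):
    """Flag bars whose return exceeds `threshold_return_std` rolling standard deviations."""
    signals = []
    for t in range(vol_window, len(returns)):
        window = returns[t - vol_window:t]  # strictly past returns only
        vol = statistics.stdev(window)
        if vol == 0:
            continue  # flat window: no meaningful scale to compare against
        scaled = returns[t] / vol
        if scaled > threshold_return_std:
            signals.append((t, "extreme_positive_anomaly"))
        elif scaled < -threshold_return_std:
            signals.append((t, "extreme_negative_anomaly"))
    return signals


# Hypothetical per-bar returns with one sharp spike and a later drop.
returns = [0.001, -0.002, 0.0015, -0.001, 0.002, -0.0015, 0.05, 0.001, -0.04]
signals = detect_anomalies(returns, vol_window=5, threshold_return_std=4.0)
print(signals)
```

Only the spike at index 6 is flagged here. The later drop is not, because the spike has already inflated the rolling volatility, which is one reason the window length matters.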

Volatility Regime¶

Classifies current volatility state.

Component	Algorithm	Signal Types
`VolatilityRegimeLabeler`	Forward realized vol percentile	`high_volatility`, `low_volatility`, `null`
`VolatilityDetector`	Backward realized vol percentile	`high_volatility`, `low_volatility`

Price Structure¶

Identifies local price extrema.

Component	Algorithm	Signal Types
`StructureLabeler`	Symmetric window extrema (look-ahead)	`local_max`, `local_min`, `null`
`StructureDetector`	Backward zigzag with confirmation delay	`local_max`, `local_min`

Volume Regime¶

Classifies volume patterns.

Component	Algorithm	Signal Types
`VolumeRegimeLabeler`	Forward volume ratio vs SMA	`abnormal_volume`, `illiquidity`, `null`

Imperfect Labels Are Expected¶

A fundamental principle of SignalFlow:

Labels are not ground truth

No labeler produces perfect labels. Different labelers can contradict each other for the same timestamp. Some labelers produce long stretches of null (uncertainty). This is by design.

The philosophy:

Each labeler implements a specific definition of what constitutes a signal
The FixedHorizonLabeler says "rise" means the price went up over N bars
The TrendScanningLabeler says "rise" means there's a statistically significant upward OLS trend
The TripleBarrierLabeler says "rise" means the take-profit barrier was hit first

These definitions do not always agree. A model that predicts the output of any one of these labelers with reasonable accuracy is considered a good result.
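A single price path is enough to show two definitions disagreeing. The functions below are minimal sketches of a fixed-horizon rule and a triple-barrier rule, written for illustration only. They are not the SignalFlow labelers, and returning `None` when the time barrier is reached first is an assumption.

```python
def fixed_horizon_label(prices, t, horizon):
    """Label by the sign of the return from bar t to bar t + horizon."""
    end = prices[t + horizon]
    if end > prices[t]:
        return "rise"
    if end < prices[t]:
        return "fall"
    return None


def triple_barrier_label(prices, t, horizon, take_profit, stop_loss):
    """Label by whichever price barrier is touched first within `horizon` bars."""
    entry = prices[t]
    for price in prices[t + 1:t + horizon + 1]:
        ret = price / entry - 1.0
        if ret >= take_profit:
            return "rise"
        if ret <= -stop_loss:
            return "fall"
    return None  # time barrier reached before either price barrier


# Hypothetical path: dips through the stop-loss, then recovers above the entry.
prices = [100.0, 99.0, 97.5, 99.5, 101.0, 101.5]

fh = fixed_horizon_label(prices, t=0, horizon=5)
tb = triple_barrier_label(prices, t=0, horizon=5, take_profit=0.02, stop_loss=0.02)
print(fh, tb)
```

The fixed-horizon rule sees a higher closing price and says "rise"; the barrier rule sees the 2% stop-loss hit on the way and says "fall". Neither is wrong under its own definition.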

The task of choosing the right labeler for a given model architecture is itself an important research question. Different model architectures may perform better with different labeling strategies:

Linear models may work best with FixedHorizonLabeler
Sequence models (LSTM, Transformer) may benefit from TrendScanningLabeler
Tree models may handle the noise in TripleBarrierLabeler well

Meta-Labeling: The Two-Stage Approach¶

De Prado's meta-labeling methodology works in two stages:

Stage 1: Primary Model (Detector)¶

The primary model detects the direction of potential trades. It should have high recall -- it's better to detect too many signals than to miss real ones.

from signalflow.detector import ExampleSmaCrossDetector

detector = ExampleSmaCrossDetector(fast_period=20, slow_period=50)
signals = detector.run(raw_data_view)
# Many signals, some false positives

Stage 2: Secondary Model (Validator)¶

The secondary model (meta-labeler) predicts the probability of success for each signal from the primary model. It acts as a filter.

from signalflow.validator import LightGBMValidator

validator = LightGBMValidator(n_estimators=100)
validator.fit(X_train=features, y_train=labels)

validated = validator.validate_signals(signals, features)
# Each signal now has a probability estimate
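In meta-labeling, the validator's training target is usually binary: was the primary model's signal right? The sketch below builds such targets from hypothetical detector signals and labeler output. `build_meta_labels` is an illustrative helper, not a SignalFlow function, and skipping signals with a null label follows the null-handling convention described earlier on this page.

```python
def build_meta_labels(detector_signals, labeler_output):
    """Return {bar_index: 1 or 0}: 1 when the labeler agrees with the detector's signal."""
    meta = {}
    for index, signal_type in detector_signals:
        label = labeler_output[index]
        if label is None:
            continue  # labeler is uncertain: exclude this sample from training
        meta[index] = 1 if label == signal_type else 0
    return meta


# Hypothetical Stage 1 signals as (bar index, signal type).
detector_signals = [(2, "rise"), (5, "fall"), (8, "rise"), (11, "rise")]

# Hypothetical forward-looking labels at the same bars.
labeler_output = {2: "rise", 5: "rise", 8: None, 11: "rise"}

meta_labels = build_meta_labels(detector_signals, labeler_output)
print(meta_labels)
```

A classifier trained on these 0/1 targets outputs the probability that a given detector signal is correct, which is the quantity the strategy layer thresholds.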

Combined in Strategy¶

The SignalAggregator supports a dedicated META_LABELING voting mode for combining detector signals with validator confidence:

from signalflow.strategy.component.entry.aggregation import (
    SignalAggregator, VotingMode,
)

aggregator = SignalAggregator(
    voting_mode=VotingMode.META_LABELING,
    probability_threshold=0.6,
)
# detector_signals and validator_signals are the Stage 1 and Stage 2 outputs
combined = aggregator.aggregate([detector_signals, validator_signals])
# Direction from detector, confidence from validator
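Conceptually, this mode keeps the detector's direction and gates it on the validator's probability. The plain-Python sketch below shows that idea only. The `combine` function and its data layout are hypothetical, and treating the threshold as inclusive is an assumption rather than documented `SignalAggregator` behavior.

```python
def combine(detector_signals, probabilities, probability_threshold):
    """Keep detector signals whose validator probability meets the threshold."""
    combined = []
    for index, signal_type in detector_signals:
        probability = probabilities.get(index)
        if probability is not None and probability >= probability_threshold:
            combined.append(
                {"index": index, "signal_type": signal_type, "probability": probability}
            )
    return combined


# Hypothetical inputs: direction from the detector, confidence from the validator.
detector_signals = [(2, "rise"), (5, "fall"), (11, "rise")]
probabilities = {2: 0.82, 5: 0.41, 11: 0.60}

combined = combine(detector_signals, probabilities, probability_threshold=0.6)
print(combined)
```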

From Signals to Actions¶

Not all signals map directly to buy/sell decisions. A high_volatility signal doesn't mean "buy" or "sell" -- it means "volatility is high", which might affect position sizing or risk management.

The decision of what to do with a signal is the strategy layer's responsibility. This logic can range from simple rules to complex RL policies:

Simple: rise -> BUY, fall -> SELL (built into entry rules)
Configurable: local_min -> BUY, overbought -> SELL (via signal_type_map)
Contextual: high_volatility + local_min -> BUY with larger size
Learned: RL model that optimizes actions over signal combinations

Configurable Signal-to-Action Mapping¶

Entry rules (SignalEntryRule, FixedSizeEntryRule, ModelEntryRule) support a signal_type_map parameter that maps any signal_type to an order side:

from signalflow.strategy.component.entry import SignalEntryRule

# Custom mapping: trade structure signals
entry = SignalEntryRule(
    signal_type_map={
        "local_min": "BUY",
        "local_max": "SELL",
        "oversold": "BUY",
        "overbought": "SELL",
    },
    base_position_size=200.0,
)

When signal_type_map=None (default), legacy behavior is used: only "rise" and "fall" signals are recognized.
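The lookup logic described above can be sketched in a few lines. `resolve_side` and `LEGACY_MAP` are hypothetical names for this illustration, and the sketch assumes a custom map replaces the legacy mapping rather than extending it, which this page does not state either way.

```python
# Assumed legacy behavior: only "rise" and "fall" are recognized.
LEGACY_MAP = {"rise": "BUY", "fall": "SELL"}


def resolve_side(signal_type, signal_type_map=None):
    """Return the order side for a signal type, or None if the type is not traded."""
    mapping = LEGACY_MAP if signal_type_map is None else signal_type_map
    return mapping.get(signal_type)


custom_map = {"local_min": "BUY", "local_max": "SELL"}

print(resolve_side("rise"))                   # legacy mapping
print(resolve_side("local_min"))              # not recognized without a custom map
print(resolve_side("local_min", custom_map))  # recognized with the custom map
```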

DIRECTIONAL_SIGNAL_MAP¶

The DIRECTIONAL_SIGNAL_MAP in signalflow.core.signal_registry provides a global registry of inherently directional signal types. Use the from_directional_map() classmethod to create an entry rule that trades all known directional signals:

from signalflow.core.signal_registry import DIRECTIONAL_SIGNAL_MAP
from signalflow.strategy.component.entry import SignalEntryRule

# Contents of the global registry, shown for reference (do not redefine it):
# DIRECTIONAL_SIGNAL_MAP = {
#     "rise": "BUY", "fall": "SELL",
#     "local_min": "BUY", "local_max": "SELL",
#     "breakout_up": "BUY", "breakout_down": "SELL",
#     "oversold": "BUY", "overbought": "SELL",
# }

# Create entry rule that trades all directional signals
entry = SignalEntryRule.from_directional_map(base_position_size=200.0)

Smooth Labeling (Planned)¶

Standard categorical labels ("rise", "fall") lose information about the magnitude of the signal. A barely-rising market and a strongly-rising market both get the same label.

Smooth labeling preserves this information:

signal_category = "price_direction"
signal_type = "rise"           # categorical (for routing and strategy logic)
signal_value = 0.73            # continuous (magnitude/confidence for ML training)

The signal_type remains categorical for routing purposes, while signal_value (stored in the existing signal column of the Signals DataFrame) carries the continuous magnitude. This improves ML training by providing a richer target variable.

Status

Smooth labeling is planned for a future release. The TrendScanningLabeler already provides a t_stat meta column that can serve as a continuous signal value when include_meta=True.
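To show what a continuous value of this kind looks like, the sketch below computes the t-statistic of an OLS slope fitted to a price window against time. It is a generic textbook calculation in plain Python, not the `TrendScanningLabeler` implementation, and the price windows are made up.

```python
import math


def ols_slope_t_stat(values):
    """t-statistic of the slope when regressing `values` on 0, 1, ..., n - 1."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Sum of squared residuals around the fitted line.
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(values))
    if ssr == 0:
        # Perfect fit: the t-statistic is unbounded (or zero for a flat line).
        return math.copysign(math.inf, slope) if slope != 0 else 0.0
    standard_error = math.sqrt(ssr / (n - 2) / sxx)
    return slope / standard_error


strong_rise = [100.0, 101.1, 101.9, 103.0, 104.1, 104.9]
weak_rise = [100.0, 100.6, 99.8, 100.5, 99.9, 100.4]

t_strong = ols_slope_t_stat(strong_rise)
t_weak = ols_slope_t_stat(weak_rise)
t_down = ols_slope_t_stat(list(reversed(strong_rise)))
print(t_strong, t_weak, t_down)
```

Both rising windows would receive the categorical label "rise", but the t-statistic separates the clean trend from the noisy one, which is the information a smooth label is meant to keep.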

Extending the Signal Taxonomy¶

Adding a new signal type to an existing category requires minimal changes:

1. Create a Labeler¶

from dataclasses import dataclass
import polars as pl
import signalflow as sf
from signalflow.core.enums import SignalCategory
from signalflow.target.base import Labeler

@dataclass
@sf.labeler("my_custom_labeler")
class MyLabeler(Labeler):
    signal_category = SignalCategory.MARKET_WIDE  # or any category

    def compute_group(self, group_df, data_context=None):
        # Your forward-looking labeling algorithm;
        # it must define `condition`, a boolean Polars expression
        ...
        return group_df.with_columns(
            pl.when(condition)
            .then(pl.lit("my_signal_type"))
            .otherwise(pl.lit(None, dtype=pl.Utf8))
            .alias(self.out_col)
        )

2. Create a Detector (optional)¶

from dataclasses import dataclass, field
import signalflow as sf
from signalflow.core.enums import SignalCategory
from signalflow.detector.base import SignalDetector

@dataclass
@sf.detector("my_custom_detector")
class MyDetector(SignalDetector):
    signal_category = SignalCategory.MARKET_WIDE
    allowed_signal_types: set[str] | None = field(
        default_factory=lambda: {"my_signal_type"}
    )

    def detect(self, features, context=None):
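        # Your backward-looking detection algorithm; it must build `signals_df`
        ...
        # `Signals` is the SignalFlow signals container; its import is omitted here
        return Signals(signals_df)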

3. Register in signal_registry (optional)¶

Update KNOWN_SIGNALS and DIRECTIONAL_SIGNAL_MAP if the signal types should be discoverable or have directional mappings.

Architecture Summary¶

flowchart TB
    subgraph Signals ["Signal Categories"]
        direction TB
        PD["Price Direction<br/>rise, fall, flat"]
        PS["Price Structure<br/>local_max, local_min"]
        TM["Trend Momentum<br/>trend_start, trend_reversal"]
        VOL["Volatility<br/>high_volatility, low_volatility"]
        VL["Volume Liquidity<br/>abnormal_volume, illiquidity"]
        MW["Market Wide<br/>market_crash, synchronization"]
        AN["Anomaly<br/>extreme_positive_anomaly"]
    end

    subgraph Pipeline ["Processing Pipeline"]
        direction LR
        DET[Detectors] --> ROUTER[Signal Router]
        ROUTER --> |directional| ENTRY[Entry Rules]
        ROUTER --> |non-directional| SIZING[Position Sizing]
        ROUTER --> |anomaly| RISK[Risk Management]
        ENTRY --> EXEC[Strategy Execution]
        SIZING --> EXEC
        RISK --> EXEC
    end

    subgraph Training ["Training Pipeline"]
        direction LR
        LAB[Labelers] --> TRAIN[Train Validator]
        TRAIN --> VAL[Validator]
        VAL --> DET
    end

    Signals --> DET
    Signals --> LAB

    style PD fill:#3b82f6,color:#fff
    style PS fill:#8b5cf6,color:#fff
    style TM fill:#06b6d4,color:#fff
    style VOL fill:#f97316,color:#fff
    style VL fill:#22c55e,color:#fff
    style MW fill:#ef4444,color:#fff
    style AN fill:#dc2626,color:#fff