Signal Architecture & Meta-Labeling¶
This page describes the core philosophy behind SignalFlow: the signal-driven approach to algorithmic trading based on Marcos Lopez de Prado's meta-labeling methodology.
The Core Idea¶
Traditional algorithmic trading systems make binary decisions: buy or sell based on some rule. SignalFlow instead splits the decision into three stages, an approach inspired by machine learning research in quantitative finance:
- Signal Detection -- identify potential market changes
- Signal Validation -- use ML to estimate the probability that a signal is correct
- Strategy Execution -- use validated signals to make trading decisions
This separation is critical: the detector's job is to have high recall (detect as many real opportunities as possible), while the validator's job is to have high precision (filter out false positives).
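The trade-off can be illustrated with a small, self-contained sketch. The data, the `precision_recall` helper, and the validator probabilities below are hypothetical and are not part of SignalFlow; they only show how filtering a high-recall detector can raise precision at some cost in recall.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Return (precision, recall) for predicted vs. actual event indices."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Bars where a real opportunity occurred (toy data).
real_events = {3, 7, 12, 20}

# The detector fires liberally: it catches every real event plus false alarms.
detector_signals = {1, 3, 5, 7, 9, 12, 15, 20}

# Hypothetical validator probabilities for each detector signal.
validator_proba = {1: 0.2, 3: 0.9, 5: 0.4, 7: 0.8, 9: 0.3, 12: 0.7, 15: 0.65, 20: 0.55}

# Keep only signals the validator scores at or above the threshold.
threshold = 0.6
validated_signals = {bar for bar in detector_signals if validator_proba[bar] >= threshold}

detector_precision, detector_recall = precision_recall(detector_signals, real_events)
validated_precision, validated_recall = precision_recall(validated_signals, real_events)
```

In this toy case the detector alone has recall 1.0 and precision 0.5; after filtering, precision rises to 0.75 while recall drops to 0.75, because one real event (bar 20) scored below the threshold.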
flowchart LR
D[Detector] -->|raw signals| V[Validator]
V -->|filtered signals + probability| S[Strategy]
L[Labeler] -->|historical labels| T[Train Validator]
T --> V
style D fill:#ea580c,stroke:#f97316,color:#fff
style V fill:#16a34a,stroke:#22c55e,color:#fff
style S fill:#dc2626,stroke:#ef4444,color:#fff
style L fill:#7c3aed,stroke:#8b5cf6,color:#fff
style T fill:#0891b2,stroke:#06b6d4,color:#fff
What is a Signal?¶
A signal is a prediction that a specific type of market change is occurring or is about to occur. Signals are not trading orders -- they are informational events that the strategy layer interprets.
Two-Level Signal Model¶
SignalFlow uses a two-level signal model:
| Level | Field | Type | Purpose |
|---|---|---|---|
| Category | signal_category | SignalCategory enum | Broad classification for routing |
| Type | signal_type | str | Specific signal value within category |
from signalflow.core.enums import SignalCategory
# A signal has both a category and a type
signal_category = SignalCategory.PRICE_DIRECTION # broad: about price movement
signal_type = "rise" # specific: price is rising
This design allows the system to:
- Route signals based on category (e.g., directional signals go to entry rules, volatility signals go to position sizing)
- Extend signal types without modifying core code (any string is valid as a signal_type)
- Keep multiple signal categories per timestamp (e.g., a bar can simultaneously have a price_direction=rise signal and a volatility=high_volatility signal)
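A minimal sketch of category-based routing, assuming a simplified data model: the `Category` enum, `Signal` dataclass, and `route` function are hypothetical stand-ins, not SignalFlow's own classes.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Hypothetical stand-in for SignalCategory (only two members shown)."""
    PRICE_DIRECTION = "price_direction"
    VOLATILITY = "volatility"


@dataclass(frozen=True)
class Signal:
    timestamp: int
    category: Category
    signal_type: str  # any string is accepted; no enum change needed


def route(signals: list) -> dict:
    """Send directional signals to entry rules and the rest to position sizing."""
    routed = {"entry": [], "sizing": []}
    for signal in signals:
        target = "entry" if signal.category is Category.PRICE_DIRECTION else "sizing"
        routed[target].append(signal)
    return routed


# Two signals on the same bar, one per category.
bar_signals = [
    Signal(timestamp=100, category=Category.PRICE_DIRECTION, signal_type="rise"),
    Signal(timestamp=100, category=Category.VOLATILITY, signal_type="high_volatility"),
]
routed = route(bar_signals)
```

Because routing depends only on the category, a new `signal_type` string inside an existing category reaches the right consumer without any change to the router.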
Signal Categories¶
| Category | Description | Example Types |
|---|---|---|
| PRICE_DIRECTION | Price movement direction | rise, fall, flat |
| PRICE_STRUCTURE | Price patterns and extrema | local_max, local_min, breakout_up |
| TREND_MOMENTUM | Trend state and momentum | trend_start, trend_reversal, overbought |
| VOLATILITY | Volatility regime | high_volatility, low_volatility, volatility_expansion |
| VOLUME_LIQUIDITY | Volume and liquidity patterns | abnormal_volume, illiquidity, accumulation |
| MARKET_WIDE | Cross-pair market events | market_crash, regime_shift, synchronization |
| ANOMALY | Anomalous events | extreme_positive_anomaly, extreme_negative_anomaly |
The full registry of known signal types is in
signalflow.core.signal_registry.KNOWN_SIGNALS. This registry is advisory --
new signal types can be used without modifying it.
Null vs FLAT: Uncertainty vs Market State¶
A critical distinction in SignalFlow:
| Value | Meaning | Example |
|---|---|---|
| "flat" | The market is moving sideways. This is a valid signal -- a real market state. | After a strong trend, price consolidates in a range. The labeler identifies this as "flat". |
| null | The labeler/detector doesn't know. Epistemic uncertainty. | An unexpected tweet from a public figure causes unpredictable volatility. The labeler cannot determine direction and outputs null. |
This matters for ML training: "flat" is a valid class label that the model
should learn to predict. null means "skip this sample" -- the labeler has no
information about this timestamp, and the model should not be trained on it.
import polars as pl
# FLAT = real market state (valid signal)
pl.lit("flat").alias("signal_type") # "I know the market is sideways"
# null = uncertainty (no information)
pl.lit(None, dtype=pl.Utf8).alias("signal_type") # "I don't know what's happening"
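The effect on training data can be sketched in plain Python. The feature values and variable names here are made up for illustration; the point is only that null rows are dropped while "flat" rows are kept as a class of their own.

```python
# Labels for six consecutive bars; None marks bars the labeler could not classify.
labels = ["rise", "flat", None, "fall", None, "flat"]
features = [[0.4], [0.0], [1.9], [-0.5], [2.3], [0.1]]

# Drop only the null rows. "flat" rows stay because they are a real class.
train_rows = [(x, y) for x, y in zip(features, labels) if y is not None]
X_train = [x for x, _ in train_rows]
y_train = [y for _, y in train_rows]

# The model is trained on three classes, not two.
classes = sorted(set(y_train))
```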
The Dual Pipeline: Labelers vs Detectors¶
Every signal category in SignalFlow can have two complementary implementations:
Labeler (Historical, Forward-Looking)¶
- Lives in the signalflow.target module
- Knows the future -- uses data after timestamp t to label t
- Purpose: generate training labels for ML models
- Extends the Labeler base class, implements compute_group()
- Length-preserving: output has the same number of rows as input
from signalflow.target import AnomalyLabeler
labeler = AnomalyLabeler(
    threshold_return_std=4.0,
    horizon=60,
    mask_to_signals=False,
)
labeled_df = labeler.compute(ohlcv_df)
# Each row gets a label: "extreme_positive_anomaly", "extreme_negative_anomaly", or null
Detector (Real-Time, Backward-Looking)¶
- Lives in the signalflow.detector module
- Only uses past and current data -- safe for live trading
- Purpose: generate trading signals in real-time
- Extends the SignalDetector base class, implements detect()
- Outputs a Signals DataFrame (only rows with detected signals)
from signalflow.detector import AnomalyDetector
detector = AnomalyDetector(
    threshold_return_std=4.0,
    vol_window=1440,
)
signals = detector.run(raw_data_view)
# Only bars with anomalies are returned
Why Both?¶
The labeler knows the future, so its labels are more accurate but can only be used for training. The detector works in real-time but is less precise.
The training pipeline:
- Run labeler on historical data to create hindsight-based labels (more accurate than any real-time signal, though still not ground truth)
- Train a validator (ML model) to predict labeler output from features
- In production, the detector generates candidate signals
- The validator filters and scores them
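The difference between the two viewpoints can be sketched on a plain list of closes. Both functions below are hypothetical simplifications, not the library's implementations; treating a zero return as null is an assumption made here for brevity.

```python
from typing import Optional


def forward_label(closes: list, t: int, horizon: int) -> Optional[str]:
    """Labeler-style: compares close[t + horizon] with close[t] (uses the future)."""
    if t + horizon >= len(closes):
        return None  # not enough future data: unknown, not "flat"
    future_return = closes[t + horizon] / closes[t] - 1.0
    if future_return > 0:
        return "rise"
    if future_return < 0:
        return "fall"
    return None


def backward_signal(closes: list, t: int, lookback: int) -> Optional[str]:
    """Detector-style: compares close[t] with close[t - lookback] (past only)."""
    if t - lookback < 0:
        return None
    past_return = closes[t] / closes[t - lookback] - 1.0
    if past_return > 0:
        return "rise"
    if past_return < 0:
        return "fall"
    return None


closes = [100.0, 101.0, 102.0, 101.0, 99.0, 98.0]

# At t=2 the price has been rising, but it falls over the next two bars.
detector_view = backward_signal(closes, t=2, lookback=2)  # sees 100 -> 102
labeler_view = forward_label(closes, t=2, horizon=2)      # sees 102 -> 99

# Length-preserving output: one entry per bar, None where the label is unknown.
all_labels = [forward_label(closes, t, horizon=2) for t in range(len(closes))]
```

At `t=2` the backward-looking view says "rise" while the forward-looking label is "fall". This is exactly the kind of false positive the validator is trained to filter out.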
flowchart TB
subgraph Training ["Training (offline)"]
H[Historical Data] --> L[Labeler]
L -->|labels| ML[Train ML Model]
H --> F1[Features]
F1 --> ML
end
subgraph Production ["Production (real-time)"]
M[Market Data] --> D[Detector]
M --> F2[Features]
D -->|candidate signals| V[Validator / ML Model]
F2 --> V
V -->|validated signals| S[Strategy]
end
ML -.->|trained model| V
style L fill:#7c3aed,stroke:#8b5cf6,color:#fff
style D fill:#ea580c,stroke:#f97316,color:#fff
style V fill:#16a34a,stroke:#22c55e,color:#fff
style ML fill:#0891b2,stroke:#06b6d4,color:#fff
Available Labelers & Detectors¶
Price Direction¶
The most basic signal category. Predicts whether price will rise or fall.
| Component | Algorithm | Signal Types |
|---|---|---|
| FixedHorizonLabeler | Forward return sign over N bars | rise, fall, null |
| TripleBarrierLabeler | Triple barrier method (De Prado) | rise, fall, null |
| TakeProfitLabeler | Symmetric TP/SL barrier | rise, fall, null |
| TrendScanningLabeler | OLS t-statistic across windows (De Prado) | rise, fall, null |
| ExampleSmaCrossDetector | SMA crossover (real-time) | rise, fall |
Anomaly¶
Detects extreme, unexpected market events (anomalous returns).
| Component | Algorithm | Signal Types |
|---|---|---|
| AnomalyLabeler | Forward return magnitude vs rolling vol | extreme_positive_anomaly, extreme_negative_anomaly, null |
| AnomalyDetector | Current return magnitude vs rolling vol | extreme_positive_anomaly, extreme_negative_anomaly |
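A rough sketch of the "return magnitude vs rolling vol" idea on the detector side. The `detect_anomaly` function and its parameter names are hypothetical. Whether the real AnomalyDetector excludes the current bar from the volatility window, or uses sample standard deviation, is an assumption made here.

```python
import statistics
from typing import Optional


def detect_anomaly(returns: list, t: int, vol_window: int, threshold_std: float) -> Optional[str]:
    """Flag returns[t] if it exceeds threshold_std rolling standard deviations.

    Volatility is estimated from the vol_window returns strictly before t,
    so the bar being tested does not inflate its own threshold.
    """
    if t < vol_window:
        return None  # not enough history
    window = returns[t - vol_window : t]
    vol = statistics.stdev(window)
    if vol == 0.0:
        return None
    z = returns[t] / vol
    if z > threshold_std:
        return "extreme_positive_anomaly"
    if z < -threshold_std:
        return "extreme_negative_anomaly"
    return None


# Quiet market with alternating +/-0.1% returns, then a -3% shock at the last bar.
returns = [0.001, -0.001] * 10 + [-0.03]
signal_on_shock = detect_anomaly(returns, t=20, vol_window=20, threshold_std=4.0)
signal_on_quiet_bar = detect_anomaly(returns, t=19, vol_window=10, threshold_std=4.0)
```

A labeler counterpart would apply the same comparison to the forward return over its horizon rather than to the current bar's return.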
Volatility Regime¶
Classifies current volatility state.
| Component | Algorithm | Signal Types |
|---|---|---|
| VolatilityRegimeLabeler | Forward realized vol percentile | high_volatility, low_volatility, null |
| VolatilityDetector | Backward realized vol percentile | high_volatility, low_volatility |
Price Structure¶
Identifies local price extrema.
| Component | Algorithm | Signal Types |
|---|---|---|
| StructureLabeler | Symmetric window extrema (look-ahead) | local_max, local_min, null |
| StructureDetector | Backward zigzag with confirmation delay | local_max, local_min |
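The look-ahead variant can be sketched as follows. `label_extrema` and `half_window` are hypothetical names, and requiring a strict extreme is an assumption; the real StructureLabeler may handle ties and window edges differently.

```python
from typing import Optional


def label_extrema(closes: list, half_window: int) -> list:
    """Label bar t as local_max/local_min if it is the strict extreme of
    closes[t - half_window : t + half_window + 1]. Uses future bars."""
    labels: list = [None] * len(closes)
    for t in range(half_window, len(closes) - half_window):
        neighbours = closes[t - half_window : t] + closes[t + 1 : t + half_window + 1]
        if closes[t] > max(neighbours):
            labels[t] = "local_max"
        elif closes[t] < min(neighbours):
            labels[t] = "local_min"
    return labels


closes = [10.0, 11.0, 13.0, 12.0, 11.0, 9.0, 10.0, 12.0, 11.0]
labels = label_extrema(closes, half_window=2)
```

The last `half_window` bars always stay null because their right-hand neighbours do not exist yet, which is why a real-time detector has to confirm extrema with a delay instead.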
Volume Regime¶
Classifies volume patterns.
| Component | Algorithm | Signal Types |
|---|---|---|
| VolumeRegimeLabeler | Forward volume ratio vs SMA | abnormal_volume, illiquidity, null |
Imperfect Labels Are Expected¶
A fundamental principle of SignalFlow:
Labels are not ground truth
No labeler produces perfect labels. Different labelers can contradict
each other for the same timestamp. Some labelers have large periods of
null (uncertainty). This is by design.
The philosophy:
- Each labeler implements a specific definition of what constitutes a signal
- The FixedHorizonLabeler says "rise" means the price went up over N bars
- The TrendScanningLabeler says "rise" means there's a statistically significant upward OLS trend
- The TripleBarrierLabeler says "rise" means the take-profit barrier was hit first
These definitions do not always agree. A model that can predict any one of these labelers with reasonable accuracy is considered a good result.
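A toy path on which two of these definitions disagree. Both functions are simplified, hypothetical versions written for this illustration. In particular, returning null when neither barrier is touched is just one possible convention for the vertical barrier.

```python
from typing import Optional


def fixed_horizon_label(closes: list, t: int, horizon: int) -> Optional[str]:
    """'rise'/'fall' from the sign of the price change between t and t + horizon."""
    if t + horizon >= len(closes):
        return None
    change = closes[t + horizon] - closes[t]
    return "rise" if change > 0 else "fall" if change < 0 else None


def triple_barrier_label(
    closes: list, t: int, horizon: int, take_profit: float, stop_loss: float
) -> Optional[str]:
    """'rise' if the upper barrier is touched first, 'fall' if the lower one is.

    Returns None when neither barrier is touched within the horizon
    (the vertical barrier) or when there is not enough future data.
    """
    if t + horizon >= len(closes):
        return None
    upper = closes[t] * (1.0 + take_profit)
    lower = closes[t] * (1.0 - stop_loss)
    for price in closes[t + 1 : t + horizon + 1]:
        if price >= upper:
            return "rise"
        if price <= lower:
            return "fall"
    return None


# Price dips 3% first, then recovers to finish above the entry price.
closes = [100.0, 97.0, 99.0, 101.0, 103.0]

fh = fixed_horizon_label(closes, t=0, horizon=4)
tb = triple_barrier_label(closes, t=0, horizon=4, take_profit=0.02, stop_loss=0.02)
```

The fixed-horizon definition yields "rise" (the path ends higher), while the barrier definition yields "fall" (the stop-loss level is touched on the first bar). Neither label is wrong; they answer different questions.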
The task of choosing the right labeler for a given model architecture is itself an important research question. Different model architectures may perform better with different labeling strategies:
- Linear models may work best with FixedHorizonLabeler
- Sequence models (LSTM, Transformer) may benefit from TrendScanningLabeler
- Tree models may handle the noise in TripleBarrierLabeler well
Meta-Labeling: The Two-Stage Approach¶
De Prado's meta-labeling methodology works in two stages:
Stage 1: Primary Model (Detector)¶
The primary model detects the direction of potential trades. It should have high recall -- it's better to detect too many signals than to miss real ones.
from signalflow.detector import ExampleSmaCrossDetector
detector = ExampleSmaCrossDetector(fast_period=20, slow_period=50)
signals = detector.run(raw_data_view)
# Many signals, some false positives
Stage 2: Secondary Model (Validator)¶
The secondary model (meta-labeler) predicts the probability of success for each signal from the primary model. It acts as a filter.
from signalflow.validator import LightGBMValidator
validator = LightGBMValidator(n_estimators=100)
validator.fit(X_train=features, y_train=labels)
validated = validator.validate_signals(signals, features)
# Each signal now has a probability estimate
Combined in Strategy¶
The SignalAggregator supports a dedicated META_LABELING voting mode for
combining detector signals with validator confidence:
from signalflow.strategy.component.entry.aggregation import (
    SignalAggregator, VotingMode,
)
aggregator = SignalAggregator(
    voting_mode=VotingMode.META_LABELING,
    probability_threshold=0.6,
)
combined = aggregator.aggregate([detector_signals, validator_signals])
# Direction from detector, confidence from validator
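Conceptually, this kind of combination can be sketched as below. This is not SignalAggregator's actual implementation; `DetectorSignal` and `meta_label_combine` are hypothetical, and treating the threshold as inclusive and dropping unscored signals are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorSignal:
    timestamp: int
    signal_type: str  # direction proposed by the primary model


def meta_label_combine(
    detector_signals: list,
    validator_proba: dict,
    probability_threshold: float,
) -> list:
    """Keep detector signals whose validator probability clears the threshold.

    Each result is (timestamp, direction from detector, probability from validator).
    Signals the validator did not score are dropped.
    """
    combined = []
    for signal in detector_signals:
        proba = validator_proba.get(signal.timestamp)
        if proba is not None and proba >= probability_threshold:
            combined.append((signal.timestamp, signal.signal_type, proba))
    return combined


detector_signals = [
    DetectorSignal(10, "rise"),
    DetectorSignal(25, "fall"),
    DetectorSignal(40, "rise"),
]
validator_proba = {10: 0.82, 25: 0.41, 40: 0.60}

combined = meta_label_combine(detector_signals, validator_proba, probability_threshold=0.6)
```

The validator never changes a signal's direction; it only decides whether the signal survives and attaches a confidence that the strategy can use, for example for position sizing.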
From Signals to Actions¶
Not all signals map directly to buy/sell decisions. A high_volatility signal
doesn't mean "buy" or "sell" -- it means "volatility is high", which might
affect position sizing or risk management.
The decision of what to do with a signal is the strategy layer's responsibility. This logic can range from simple rules to complex RL policies:
- Simple: rise -> BUY, fall -> SELL (built into entry rules)
- Configurable: local_min -> BUY, overbought -> SELL (via signal_type_map)
- Contextual: high_volatility + local_min -> BUY with larger size
- Learned: RL model that optimizes actions over signal combinations
Configurable Signal-to-Action Mapping¶
Entry rules (SignalEntryRule, FixedSizeEntryRule, ModelEntryRule) support
a signal_type_map parameter that maps any signal_type to an order side:
from signalflow.strategy.component.entry import SignalEntryRule
# Custom mapping: trade structure signals
entry = SignalEntryRule(
    signal_type_map={
        "local_min": "BUY",
        "local_max": "SELL",
        "oversold": "BUY",
        "overbought": "SELL",
    },
    base_position_size=200.0,
)
When signal_type_map=None (default), legacy behavior is used: only "rise"
and "fall" signals are recognized.
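The lookup logic can be pictured with a short sketch. `resolve_side` and `LEGACY_MAP` are hypothetical names; in particular, it is an assumption here that a custom map replaces the legacy mapping rather than extending it.

```python
from typing import Optional

# Assumed legacy default: only "rise" and "fall" are recognized.
LEGACY_MAP = {"rise": "BUY", "fall": "SELL"}


def resolve_side(signal_type: Optional[str], signal_type_map: Optional[dict] = None) -> Optional[str]:
    """Return "BUY"/"SELL" for a signal type, or None if it is not tradable."""
    if signal_type is None:
        return None  # null signal: no information, no order
    mapping = LEGACY_MAP if signal_type_map is None else signal_type_map
    return mapping.get(signal_type)


structure_map = {"local_min": "BUY", "local_max": "SELL"}

legacy_rise = resolve_side("rise")                          # default mapping
legacy_local_min = resolve_side("local_min")                # unknown to the default
custom_local_min = resolve_side("local_min", structure_map)
custom_volatility = resolve_side("high_volatility", structure_map)
```

Non-directional types such as high_volatility simply have no entry in the map, so they never produce an order on their own.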
DIRECTIONAL_SIGNAL_MAP¶
The DIRECTIONAL_SIGNAL_MAP in signalflow.core.signal_registry provides a
global registry of inherently directional signal types. Use the
from_directional_map() classmethod to create an entry rule that trades all
known directional signals:
from signalflow.core.signal_registry import DIRECTIONAL_SIGNAL_MAP
from signalflow.strategy.component.entry import SignalEntryRule
# Contents of the global registry of directional signal types (shown for reference):
# {
#     "rise": "BUY", "fall": "SELL",
#     "local_min": "BUY", "local_max": "SELL",
#     "breakout_up": "BUY", "breakout_down": "SELL",
#     "oversold": "BUY", "overbought": "SELL",
# }
# Create entry rule that trades all directional signals
entry = SignalEntryRule.from_directional_map(base_position_size=200.0)
Smooth Labeling (Planned)¶
Standard categorical labels ("rise", "fall") lose information about the
magnitude of the signal. A barely-rising market and a strongly-rising market
both get the same label.
Smooth labeling preserves this information:
signal_category = "price_direction"
signal_type = "rise" # categorical (for routing and strategy logic)
signal_value = 0.73 # continuous (magnitude/confidence for ML training)
The signal_type remains categorical for routing purposes, while signal_value
(stored in the existing signal column of the Signals DataFrame) carries
the continuous magnitude. This improves ML training by providing a richer
target variable.
Status
Smooth labeling is planned for a future release. The TrendScanningLabeler
already provides a t_stat meta column that can serve as a continuous signal
value when include_meta=True.
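To make the idea of a continuous value concrete, the sketch below computes the t-statistic of an OLS slope over a single window of prices. It is a simplified, hypothetical illustration and not the TrendScanningLabeler's implementation, which scans multiple windows.

```python
import math


def ols_slope_t_stat(values: list) -> float:
    """t-statistic of the slope when regressing values on 0, 1, ..., n-1."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 points for a slope t-statistic")
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    s_xx = sum((x - x_mean) ** 2 for x in range(n))
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(values))
    if sse == 0.0:
        # Perfect fit: the t-statistic is unbounded (or zero for a flat line).
        return math.copysign(math.inf, slope) if slope != 0 else 0.0
    std_err = math.sqrt(sse / (n - 2) / s_xx)
    return slope / std_err


strong_uptrend = [100.0, 101.1, 101.9, 103.2, 104.0, 105.1]
weak_uptrend = [100.0, 101.5, 99.8, 101.0, 100.2, 100.9]

t_strong = ols_slope_t_stat(strong_uptrend)
t_weak = ols_slope_t_stat(weak_uptrend)
```

Both windows end higher than they start and would receive the same categorical "rise" under a sign-based rule, but the t-statistic separates the clean trend from the noisy one.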
Extending the Signal Taxonomy¶
Adding a new signal category requires minimal changes:
1. Create a Labeler¶
from dataclasses import dataclass
import polars as pl
import signalflow as sf
from signalflow.core.enums import SignalCategory
from signalflow.target.base import Labeler
@dataclass
@sf.labeler("my_custom_labeler")
class MyLabeler(Labeler):
    signal_category = SignalCategory.MARKET_WIDE  # or any category
    def compute_group(self, group_df, data_context=None):
        # Your forward-looking labeling algorithm
        ...
        # `condition` is a placeholder for the boolean polars expression computed above
        return group_df.with_columns(
            pl.when(condition)
            .then(pl.lit("my_signal_type"))
            .otherwise(pl.lit(None, dtype=pl.Utf8))
            .alias(self.out_col)
        )
2. Create a Detector (optional)¶
from dataclasses import dataclass, field
import signalflow as sf
from signalflow.core.enums import SignalCategory
from signalflow.detector.base import SignalDetector
@dataclass
@sf.detector("my_custom_detector")
class MyDetector(SignalDetector):
    signal_category = SignalCategory.MARKET_WIDE
    allowed_signal_types: set[str] | None = field(
        default_factory=lambda: {"my_signal_type"}
    )
    def detect(self, features, context=None):
        # Your backward-looking detection algorithm
        ...
        return Signals(signals_df)
3. Register in signal_registry (optional)¶
Update KNOWN_SIGNALS and DIRECTIONAL_SIGNAL_MAP if the signal types should
be discoverable or have directional mappings.
Architecture Summary¶
flowchart TB
subgraph Signals ["Signal Categories"]
direction TB
PD["Price Direction<br/>rise, fall, flat"]
PS["Price Structure<br/>local_max, local_min"]
TM["Trend Momentum<br/>trend_start, trend_reversal"]
VOL["Volatility<br/>high_volatility, low_volatility"]
VL["Volume Liquidity<br/>abnormal_volume, illiquidity"]
MW["Market Wide<br/>market_crash, synchronization"]
AN["Anomaly<br/>extreme_positive_anomaly"]
end
subgraph Pipeline ["Processing Pipeline"]
direction LR
DET[Detectors] --> ROUTER[Signal Router]
ROUTER --> |directional| ENTRY[Entry Rules]
ROUTER --> |non-directional| SIZING[Position Sizing]
ROUTER --> |anomaly| RISK[Risk Management]
ENTRY --> EXEC[Strategy Execution]
SIZING --> EXEC
RISK --> EXEC
end
subgraph Training ["Training Pipeline"]
direction LR
LAB[Labelers] --> TRAIN[Train Validator]
TRAIN --> VAL[Validator]
VAL --> DET
end
Signals --> DET
Signals --> LAB
style PD fill:#3b82f6,color:#fff
style PS fill:#8b5cf6,color:#fff
style TM fill:#06b6d4,color:#fff
style VOL fill:#f97316,color:#fff
style VL fill:#22c55e,color:#fff
style MW fill:#ef4444,color:#fff
style AN fill:#dc2626,color:#fff