# Statistical Analysis
SignalFlow provides three statistical validation frameworks for assessing strategy robustness. All numerical kernels are Numba JIT-compiled for high performance.
## Quick Start

```python
import signalflow as sf
from signalflow.analytic.stats import monte_carlo, bootstrap, statistical_tests

result = sf.Backtest("test").data(raw=data).detector("sma_cross").run()

# Monte Carlo — trade order shuffling
mc = monte_carlo(result, n_simulations=10_000)
print(f"Risk of Ruin: {mc.risk_of_ruin:.1%}")
print(f"Expected Max Drawdown: {mc.expected_max_drawdown:.2%}")

# Bootstrap — confidence intervals
bs = bootstrap(result, method="bca", confidence_level=0.95)
print(bs.intervals["sharpe_ratio"])

# Statistical tests — PSR & MinTRL
tests = statistical_tests(result, sr_benchmark=0.5)
print(f"PSR: {tests.psr:.2%}")
print(f"Min observations needed: {tests.min_track_record_length}")
```
## Monte Carlo Simulation
Randomizes trade execution order across thousands of simulations to estimate the distribution of outcomes. Answers: "How lucky/unlucky was this specific trade sequence?"
```python
from signalflow.analytic.stats import MonteCarloSimulator

mc = MonteCarloSimulator(
    n_simulations=10_000,
    ruin_threshold=0.20,  # 20% drawdown = ruin
    random_seed=42,
    confidence_levels=(0.05, 0.50, 0.95),
)
result = mc.validate(backtest_result)
```
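For intuition, the shuffling step can be sketched with plain NumPy (the `shuffle_drawdown_stats` helper below is illustrative, not the library kernel). Note that with additive PnLs a pure reshuffle leaves the final equity unchanged; it is the path-dependent statistics, drawdowns and losing streaks, that vary across simulations:

```python
import numpy as np

def shuffle_drawdown_stats(trade_pnls, n_simulations=1_000, initial_equity=1.0, seed=42):
    """Reshuffle trade order and collect the max-drawdown distribution."""
    rng = np.random.default_rng(seed)
    pnls = np.asarray(trade_pnls, dtype=float)
    max_drawdown = np.empty(n_simulations)
    for i in range(n_simulations):
        equity = initial_equity + np.cumsum(rng.permutation(pnls))
        peak = np.maximum.accumulate(equity)   # running high-water mark
        max_drawdown[i] = ((peak - equity) / peak).max()
    return max_drawdown

dd = shuffle_drawdown_stats([0.05, -0.02, 0.03, -0.04, 0.02, -0.01])
risk_of_ruin = (dd > 0.20).mean()              # P(max drawdown > ruin threshold)
p5, p50, p95 = np.percentile(dd, [5, 50, 95])
```

Percentiles of `dd` correspond to `drawdown_percentiles`, and the fraction of simulations above the ruin threshold corresponds to `risk_of_ruin`.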
### MonteCarloResult

| Attribute | Description |
|---|---|
| `final_equity_dist` | Distribution of final equity values |
| `max_drawdown_dist` | Distribution of maximum drawdowns |
| `max_consecutive_losses_dist` | Distribution of losing-streak lengths |
| `equity_percentiles` | Equity at the 5th, 50th, and 95th percentiles |
| `drawdown_percentiles` | Drawdown at the 5th, 50th, and 95th percentiles |
| `risk_of_ruin` | P(max drawdown > `ruin_threshold`) |
| `expected_max_drawdown` | Mean of the max-drawdown distribution |
| `expected_worst_equity` | 5th percentile of final equity |
```python
mc_result = monte_carlo(result, n_simulations=10_000)

# Visualize
mc_result.plot()  # 3 Plotly figures: equity fan, drawdown dist, ruin curve

# Text summary
print(mc_result.summary())
```
## Bootstrap Confidence Intervals
Estimates uncertainty of performance metrics through resampling. Three methods:
| Method | Use Case |
|---|---|
| BCa (bias-corrected and accelerated) | General metrics, adjusts for bias and skewness |
| Percentile | Simple intervals, no correction |
| Block | Time series with autocorrelation |
```python
from signalflow.analytic.stats import BootstrapValidator

bs = BootstrapValidator(
    n_bootstrap=5_000,
    method="bca",  # "bca", "percentile", "block"
    confidence_level=0.95,
    block_size=None,  # Auto for block bootstrap
    metrics=(
        "sharpe_ratio",
        "sortino_ratio",
        "calmar_ratio",
        "profit_factor",
        "win_rate",
    ),
)
result = bs.validate(backtest_result)
```
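As a point of reference, the percentile method amounts to resampling with replacement and taking quantiles of the recomputed statistic. A minimal sketch, assuming per-trade returns as input (`percentile_ci` is a hypothetical helper, not the library API):

```python
import numpy as np

def percentile_ci(returns, stat_fn, n_bootstrap=2_000, confidence_level=0.95, seed=0):
    """Percentile bootstrap: resample with replacement, take quantiles of the stat."""
    rng = np.random.default_rng(seed)
    x = np.asarray(returns, dtype=float)
    stats = np.array([
        stat_fn(rng.choice(x, size=x.size, replace=True))
        for _ in range(n_bootstrap)
    ])
    alpha = 1.0 - confidence_level
    return np.quantile(stats, alpha / 2), np.quantile(stats, 1 - alpha / 2)

rets = np.random.default_rng(1).normal(0.001, 0.01, size=500)
lo, hi = percentile_ci(rets, lambda r: r.mean() / r.std())
```

BCa additionally corrects these quantiles for bias and skewness, while the block method draws contiguous blocks instead of single observations to preserve autocorrelation.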
### Available Metrics

| Metric | Formula |
|---|---|
| `sharpe_ratio` | Mean return / std of returns |
| `sortino_ratio` | Mean return / downside std |
| `calmar_ratio` | Total return / max drawdown |
| `profit_factor` | Gross profit / gross loss |
| `win_rate` | Winning trades / total trades |
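Spelled out for a 1-D array of per-trade returns, the formulas above reduce to a few lines of NumPy (a sketch; edge-case handling in the library's kernels may differ):

```python
import numpy as np

def performance_metrics(returns):
    """Compute the table's five metrics from per-trade returns."""
    r = np.asarray(returns, dtype=float)
    downside = r[r < 0]
    equity = np.cumprod(1 + r)                 # compounded equity curve
    peak = np.maximum.accumulate(equity)
    max_dd = ((peak - equity) / peak).max()
    gross_profit = r[r > 0].sum()
    gross_loss = -downside.sum()
    return {
        "sharpe_ratio": r.mean() / r.std(),
        "sortino_ratio": r.mean() / downside.std() if downside.size else np.inf,
        "calmar_ratio": (equity[-1] - 1) / max_dd if max_dd > 0 else np.inf,
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else np.inf,
        "win_rate": (r > 0).mean(),
    }

m = performance_metrics([0.02, -0.01, 0.03, -0.02, 0.01])
```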
### BootstrapResult

```python
bs_result = bootstrap(result, method="bca")

# Access confidence intervals
ci = bs_result.intervals["sharpe_ratio"]
print(f"Sharpe: {ci.point_estimate:.2f} [{ci.lower:.2f}, {ci.upper:.2f}]")

# Full bootstrap distributions
dist = bs_result.distributions["sharpe_ratio"]  # np.ndarray

# Visualize
bs_result.plot()  # Forest plot with CIs
```
## Statistical Significance Tests
Two tests from Bailey & Lopez de Prado (2012):
### Probabilistic Sharpe Ratio (PSR)
"What is the probability that the true Sharpe ratio exceeds a benchmark?"
It accounts for the skewness and kurtosis of returns, unlike the naive Sharpe ratio, which assumes normally distributed returns.
```python
from signalflow.analytic.stats import StatisticalTestsValidator

tests = StatisticalTestsValidator(
    sr_benchmark=0.0,  # Benchmark to beat
    confidence_level=0.95,
)
result = tests.validate(backtest_result)
```
| Attribute | Description |
|---|---|
| `psr` | Probability that the true SR exceeds the benchmark (0–1) |
| `psr_is_significant` | Whether PSR > `confidence_level` |
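Concretely, PSR plugs the sample skewness and (non-excess) kurtosis into the standard error of the Sharpe ratio. A minimal sketch of the published formula, assuming i.i.d.-ish per-period returns (`probabilistic_sharpe_ratio` here is illustrative, not the library API):

```python
import numpy as np
from math import erf, sqrt

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """PSR = P(true SR > benchmark), per Bailey & Lopez de Prado (2012)."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std()
    z = (r - r.mean()) / r.std()                 # standardized returns
    skew = (z ** 3).mean()
    kurt = (z ** 4).mean()                       # non-excess kurtosis
    se = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    stat = (sr - sr_benchmark) * sqrt(n - 1) / se
    return 0.5 * (1 + erf(stat / sqrt(2)))       # standard normal CDF
```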
### Minimum Track Record Length (MinTRL)
"How many observations do we need for the Sharpe ratio to be statistically significant?"
| Attribute | Description |
|---|---|
| `min_track_record_length` | Minimum number of observations needed for significance |
| `current_track_record` | Current number of observations |
| `track_record_sufficient` | Whether the current sample is long enough |
```python
tests_result = statistical_tests(result, sr_benchmark=0.5)

if tests_result.track_record_sufficient:
    print("Sufficient data for significance")
else:
    needed = tests_result.min_track_record_length - tests_result.current_track_record
    print(f"Need {needed} more observations")
```
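MinTRL inverts the same calculation: given the observed SR, skewness, and kurtosis, it solves for the sample size at which significance is reached at the chosen confidence level. A sketch under the same assumptions (`min_track_record_length` here is illustrative, not the library API; it requires the observed SR to exceed the benchmark):

```python
import numpy as np
from statistics import NormalDist

def min_track_record_length(returns, sr_benchmark=0.0, confidence_level=0.95):
    """MinTRL per Bailey & Lopez de Prado (2012)."""
    r = np.asarray(returns, dtype=float)
    sr = r.mean() / r.std()
    z = (r - r.mean()) / r.std()
    skew = (z ** 3).mean()
    kurt = (z ** 4).mean()                            # non-excess kurtosis
    z_alpha = NormalDist().inv_cdf(confidence_level)  # e.g. ~1.645 at 95%
    variance_term = 1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    return 1 + variance_term * (z_alpha / (sr - sr_benchmark)) ** 2
```

`track_record_sufficient` then reduces to checking whether the current number of observations meets this bound.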
## Combined Validation

```python
from signalflow.analytic.stats import ValidationResult

combined = ValidationResult(
    monte_carlo=mc_result,
    bootstrap=bs_result,
    statistical_tests=tests_result,
)

# Dashboard with all results
combined.plot()  # 2x2 Plotly dashboard
print(combined.summary())
```
## Numba Acceleration

All compute-intensive kernels use `@njit(cache=True)` with parallel support:
| Kernel | Parallelization |
|---|---|
| `simulate_equity_curves()` | `prange` — parallel simulations |
| `bootstrap_sharpe_ratio()` | `prange` — parallel resamples |
| `bootstrap_generic()` | `prange` — parallel resamples |
| `compute_acceleration()` | Single-threaded (jackknife) |
| Metric functions | Single-threaded |
Kernels are JIT-compiled on first run and cached to disk. Subsequent calls use the cached machine code.
Typical runtimes:
- 10,000 Monte Carlo simulations: ~50–200ms
- 5,000 bootstrap resamples: ~100–300ms
- Statistical tests: ~10–50ms
## Imports

```python
# Convenience functions
from signalflow.analytic.stats import monte_carlo, bootstrap, statistical_tests

# Class-based API
from signalflow.analytic.stats import (
    MonteCarloSimulator,
    BootstrapValidator,
    StatisticalTestsValidator,
    ValidationResult,
)

# Result types
from signalflow.analytic.stats import (
    MonteCarloResult,
    BootstrapResult,
    StatisticalTestResult,
    ConfidenceInterval,
)
```