Statistical Analysis

SignalFlow provides three statistical validation frameworks for assessing strategy robustness. All numerical kernels are Numba JIT-compiled for high performance.


Quick Start

import signalflow as sf
from signalflow.analytic.stats import monte_carlo, bootstrap, statistical_tests

result = sf.Backtest("test").data(raw=data).detector("sma_cross").run()

# Monte Carlo — trade order shuffling
mc = monte_carlo(result, n_simulations=10_000)
print(f"Risk of Ruin: {mc.risk_of_ruin:.1%}")
print(f"Expected Max Drawdown: {mc.expected_max_drawdown:.2%}")

# Bootstrap — confidence intervals
bs = bootstrap(result, method="bca", confidence_level=0.95)
print(bs.intervals["sharpe_ratio"])

# Statistical tests — PSR & MinTRL
tests = statistical_tests(result, sr_benchmark=0.5)
print(f"PSR: {tests.psr:.2%}")
print(f"Min trades needed: {tests.min_track_record_length}")

Monte Carlo Simulation

Randomizes trade execution order across thousands of simulations to estimate the distribution of outcomes, answering the question: "How lucky or unlucky was this specific trade sequence?"

from signalflow.analytic.stats import MonteCarloSimulator

mc = MonteCarloSimulator(
    n_simulations=10_000,
    ruin_threshold=0.20,        # 20% drawdown = ruin
    random_seed=42,
    confidence_levels=(0.05, 0.50, 0.95),
)
result = mc.validate(backtest_result)

MonteCarloResult

| Attribute | Description |
| --- | --- |
| final_equity_dist | Distribution of final equity values |
| max_drawdown_dist | Distribution of maximum drawdowns |
| max_consecutive_losses_dist | Distribution of losing-streak lengths |
| equity_percentiles | Equity at the 5th, 50th, and 95th percentiles |
| drawdown_percentiles | Drawdown at the 5th, 50th, and 95th percentiles |
| risk_of_ruin | P(max drawdown > threshold) |
| expected_max_drawdown | Mean of the drawdown distribution |
| expected_worst_equity | 5th percentile of final equity |

mc_result = monte_carlo(result, n_simulations=10_000)

# Visualize
mc_result.plot()    # 3 Plotly figures: equity fan, drawdown dist, ruin curve

# Text summary
print(mc_result.summary())
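The trade-order shuffle behind these numbers can be sketched in plain NumPy. The function below is illustrative only (it is not SignalFlow's kernel); it shuffles per-trade returns, compounds each permutation into an equity curve, and collects the max-drawdown distribution:

```python
import numpy as np

def shuffle_mc(trade_returns, n_simulations=1000, ruin_threshold=0.20, seed=42):
    """Shuffle trade order and collect the distribution of maximum drawdowns."""
    rng = np.random.default_rng(seed)
    max_dds = np.empty(n_simulations)
    for i in range(n_simulations):
        shuffled = rng.permutation(trade_returns)
        equity = np.cumprod(1.0 + shuffled)       # compounded equity curve
        peak = np.maximum.accumulate(equity)      # running high-water mark
        max_dds[i] = np.max(1.0 - equity / peak)  # worst peak-to-trough drop
    risk_of_ruin = float(np.mean(max_dds > ruin_threshold))
    return max_dds, risk_of_ruin
```

Note that because multiplication is commutative, final equity is identical across shuffles; only path-dependent statistics such as drawdown and losing-streak length vary.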

Bootstrap Confidence Intervals

Estimates uncertainty of performance metrics through resampling. Three methods:

| Method | Use Case |
| --- | --- |
| BCa (bias-corrected accelerated) | General metrics; adjusts for bias and skewness |
| Percentile | Simple intervals, no correction |
| Block | Time series with autocorrelation |

from signalflow.analytic.stats import BootstrapValidator

bs = BootstrapValidator(
    n_bootstrap=5_000,
    method="bca",               # "bca", "percentile", "block"
    confidence_level=0.95,
    block_size=None,            # Auto for block bootstrap
    metrics=(
        "sharpe_ratio",
        "sortino_ratio",
        "calmar_ratio",
        "profit_factor",
        "win_rate",
    ),
)
result = bs.validate(backtest_result)
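To see why the block method preserves autocorrelation, here is a minimal moving-block bootstrap sketch (an illustration, not the library's implementation): instead of drawing observations one by one, it draws contiguous blocks, so serial dependence inside each block survives the resampling.

```python
import numpy as np

def block_bootstrap_sharpe(returns, n_bootstrap=1000, block_size=20, seed=0):
    """Moving-block bootstrap of the Sharpe ratio for autocorrelated returns."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    n_blocks = -(-n // block_size)  # ceil division: blocks needed to cover n points
    sharpes = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Draw random block start positions, concatenate blocks, trim to length n
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)
        sample = np.concatenate([returns[s:s + block_size] for s in starts])[:n]
        sharpes[b] = sample.mean() / sample.std()
    return sharpes
```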

Available Metrics

| Metric | Formula |
| --- | --- |
| sharpe_ratio | Mean return / Std of returns |
| sortino_ratio | Mean return / Downside std |
| calmar_ratio | Total return / Max drawdown |
| profit_factor | Gross profit / Gross loss |
| win_rate | Winning trades / Total trades |
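The per-trade metrics in the table can be computed directly from a PnL array, as sketched below (an illustration under assumed conventions, not SignalFlow's code; calmar_ratio is omitted because it needs the full equity curve, and downside deviation here uses the root mean square of negative returns):

```python
import numpy as np

def trade_metrics(pnl):
    """Compute per-trade performance metrics from a return array."""
    gross_profit = pnl[pnl > 0].sum()
    gross_loss = -pnl[pnl < 0].sum()
    downside = np.sqrt(np.mean(np.minimum(pnl, 0.0) ** 2))  # downside deviation
    return {
        "sharpe_ratio": pnl.mean() / pnl.std(),
        "sortino_ratio": pnl.mean() / downside,
        "profit_factor": gross_profit / gross_loss,
        "win_rate": float(np.mean(pnl > 0)),
    }
```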

BootstrapResult

bs_result = bootstrap(result, method="bca")

# Access confidence intervals
ci = bs_result.intervals["sharpe_ratio"]
print(f"Sharpe: {ci.point_estimate:.2f} [{ci.lower:.2f}, {ci.upper:.2f}]")

# Full bootstrap distributions
dist = bs_result.distributions["sharpe_ratio"]  # np.ndarray

# Visualize
bs_result.plot()    # Forest plot with CIs

Statistical Significance Tests

Two tests from Bailey & Lopez de Prado (2012):

Probabilistic Sharpe Ratio (PSR)

"What is the probability that the true Sharpe ratio exceeds a benchmark?"

Accounts for the skewness and kurtosis of the returns distribution, unlike naive Sharpe-ratio significance testing, which assumes normality.

from signalflow.analytic.stats import StatisticalTestsValidator

tests = StatisticalTestsValidator(
    sr_benchmark=0.0,           # Benchmark to beat
    confidence_level=0.95,
)
result = tests.validate(backtest_result)

| Attribute | Description |
| --- | --- |
| psr | Probability that true SR > benchmark (0–1) |
| psr_is_significant | Whether PSR > confidence_level |
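The PSR statistic can be sketched straight from the Bailey & Lopez de Prado formula. This NumPy-only version is an illustration, not the library kernel; it uses the standard normal CDF via math.erf:

```python
import numpy as np
from math import erf, sqrt

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """P(true SR > benchmark), adjusting for skewness and kurtosis."""
    r = np.asarray(returns, dtype=float)
    n = len(r)
    sr = r.mean() / r.std()
    z = (r - r.mean()) / r.std()
    g3 = np.mean(z ** 3)                  # sample skewness
    g4 = np.mean(z ** 4)                  # sample kurtosis (normal = 3)
    denom = sqrt(1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr ** 2)
    stat = (sr - sr_benchmark) * sqrt(n - 1) / denom
    return 0.5 * (1.0 + erf(stat / sqrt(2.0)))  # standard normal CDF
```

For normally distributed returns (g3 = 0, g4 = 3) the correction term vanishes and the statistic reduces to the usual Sharpe z-test.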

Minimum Track Record Length (MinTRL)

"How many observations do we need for the Sharpe ratio to be statistically significant?"

| Attribute | Description |
| --- | --- |
| min_track_record_length | Minimum observations needed for significance |
| current_track_record | Current number of observations |
| track_record_sufficient | Whether the current data is enough |

tests_result = statistical_tests(result, sr_benchmark=0.5)

if tests_result.track_record_sufficient:
    print("Sufficient data for significance")
else:
    needed = tests_result.min_track_record_length - tests_result.current_track_record
    print(f"Need {needed} more observations")
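The MinTRL computation follows the same moments as PSR. A minimal sketch, assuming a one-sided critical value z_alpha (1.645 for ~95% confidence; this helper is illustrative, not the library's implementation):

```python
import numpy as np

def min_track_record_length(returns, sr_benchmark=0.0, z_alpha=1.645):
    """Observations needed for the SR to clear the benchmark at the given z."""
    r = np.asarray(returns, dtype=float)
    sr = r.mean() / r.std()
    z = (r - r.mean()) / r.std()
    g3 = np.mean(z ** 3)                  # sample skewness
    g4 = np.mean(z ** 4)                  # sample kurtosis (normal = 3)
    var_term = 1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr ** 2
    return 1.0 + var_term * (z_alpha / (sr - sr_benchmark)) ** 2
```

The closer the benchmark sits to the observed Sharpe ratio, the more observations are required, diverging as sr_benchmark approaches sr.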

Combined Validation

from signalflow.analytic.stats import ValidationResult

combined = ValidationResult(
    monte_carlo=mc_result,
    bootstrap=bs_result,
    statistical_tests=tests_result,
)

# Dashboard with all results
combined.plot()         # 2x2 Plotly dashboard
print(combined.summary())

Numba Acceleration

All compute-intensive kernels use @njit(cache=True) with parallel support:

| Kernel | Parallelization |
| --- | --- |
| simulate_equity_curves() | prange — parallel simulations |
| bootstrap_sharpe_ratio() | prange — parallel resamples |
| bootstrap_generic() | prange — parallel resamples |
| compute_acceleration() | Single-threaded (jackknife) |
| Metric functions | Single-threaded |

Kernels are JIT-compiled on first run and cached to disk. Subsequent calls use the cached machine code.

Typical runtimes:

  • 10,000 Monte Carlo simulations: ~50–200ms
  • 5,000 bootstrap resamples: ~100–300ms
  • Statistical tests: ~10–50ms

Imports

# Convenience functions
from signalflow.analytic.stats import monte_carlo, bootstrap, statistical_tests

# Class-based API
from signalflow.analytic.stats import (
    MonteCarloSimulator,
    BootstrapValidator,
    StatisticalTestsValidator,
    ValidationResult,
)

# Result types
from signalflow.analytic.stats import (
    MonteCarloResult,
    BootstrapResult,
    StatisticalTestResult,
    ConfidenceInterval,
)