Data Loading & Resampling
This notebook demonstrates how to load market data from DuckDB stores and use SignalFlow's OHLCV resampling utilities to work with multiple timeframes.
What you'll learn:
- Generate synthetic OHLCV data with `VirtualDataProvider`
- Load data using `RawDataFactory` and the `sf.load()` shortcut
- Detect the timeframe of existing data automatically
- Resample between timeframes (e.g. 1m to 1h, 1m to 4h)
- Check exchange-specific timeframe support
SignalFlow version: 0.5.0
1. Generate Synthetic Data
We use VirtualDataProvider to create realistic OHLCV bars via a geometric
random walk. This lets us explore the data loading and resampling APIs without
needing exchange credentials or real market data.
```python
from datetime import datetime
from pathlib import Path

import signalflow as sf
from signalflow.data import RawDataFactory
from signalflow.data.raw_store import DuckDbSpotStore
from signalflow.data.source import VirtualDataProvider

# Create a temporary DuckDB store
db_path = Path("/tmp/data_loading_demo.duckdb")
store = DuckDbSpotStore(db_path=db_path)

# Generate 10,000 one-minute bars for 3 pairs
provider = VirtualDataProvider(store=store, seed=42)
provider.download(
    pairs=["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    n_bars=10_000,
)

print(f"Store created at: {db_path}")
print(f"Store stats:\n{store.get_stats()}")
```
```text
2026-02-15 00:50:32.143 | INFO | signalflow.data.raw_store.duckdb_stores:_ensure_tables:153 - Database initialized: /tmp/data_loading_demo.duckdb (data_type=spot, timeframe=1m)
2026-02-15 00:50:32.201 | DEBUG | signalflow.data.raw_store.duckdb_stores:insert_klines:220 - Inserted 10,000 rows for BTCUSDT
2026-02-15 00:50:32.202 | INFO | signalflow.data.source.virtual:download:255 - VirtualDataProvider: generated 10000 bars for BTCUSDT
2026-02-15 00:50:32.263 | DEBUG | signalflow.data.raw_store.duckdb_stores:insert_klines:220 - Inserted 10,000 rows for ETHUSDT
2026-02-15 00:50:32.264 | INFO | signalflow.data.source.virtual:download:255 - VirtualDataProvider: generated 10000 bars for ETHUSDT
2026-02-15 00:50:32.330 | DEBUG | signalflow.data.raw_store.duckdb_stores:insert_klines:220 - Inserted 10,000 rows for SOLUSDT
2026-02-15 00:50:32.332 | INFO | signalflow.data.source.virtual:download:255 - VirtualDataProvider: generated 10000 bars for SOLUSDT
```

```text
Store created at: /tmp/data_loading_demo.duckdb
Store stats:
shape: (3, 5)
┌─────────┬───────┬─────────────────────┬─────────────────────┬──────────────┐
│ pair    ┆ rows  ┆ first_candle        ┆ last_candle         ┆ total_volume │
│ ---     ┆ ---   ┆ ---                 ┆ ---                 ┆ ---          │
│ str     ┆ i64   ┆ datetime[μs]        ┆ datetime[μs]        ┆ f64          │
╞═════════╪═══════╪═════════════════════╪═════════════════════╪══════════════╡
│ BTCUSDT ┆ 10000 ┆ 2024-01-01 00:00:00 ┆ 2024-01-07 22:39:00 ┆ 1.8066e7     │
│ ETHUSDT ┆ 10000 ┆ 2024-01-01 00:00:00 ┆ 2024-01-07 22:39:00 ┆ 1.7957e7     │
│ SOLUSDT ┆ 10000 ┆ 2024-01-01 00:00:00 ┆ 2024-01-07 22:39:00 ┆ 1.7978e7     │
└─────────┴───────┴─────────────────────┴─────────────────────┴──────────────┘
```
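To make "geometric random walk" concrete, here is a minimal pure-Python sketch. It is not `VirtualDataProvider`'s implementation: the function name `random_walk_ohlcv`, the volatility `sigma`, the high/low construction and the uniform volume model are all assumptions for illustration. Each close is the previous close multiplied by `exp(eps)` with `eps ~ N(0, sigma)`, which keeps prices positive, and each bar opens at the previous close.

```python
import math
import random
from datetime import datetime, timedelta


def random_walk_ohlcv(n_bars, start_price=100.0, sigma=0.001, seed=42,
                      start=datetime(2024, 1, 1)):
    """Generate 1-minute OHLCV bars from a geometric random walk.

    Hypothetical illustration only; the step, high/low and volume models
    are assumptions, not VirtualDataProvider's algorithm.
    """
    rng = random.Random(seed)
    bars = []
    price = start_price
    for i in range(n_bars):
        open_ = price
        # Geometric step: multiply by exp(N(0, sigma)) so prices stay positive.
        close = open_ * math.exp(rng.gauss(0.0, sigma))
        # Extend high/low slightly beyond the open/close range.
        wiggle = abs(rng.gauss(0.0, sigma)) * open_
        bars.append({
            "timestamp": start + timedelta(minutes=i),
            "open": open_,
            "high": max(open_, close) + wiggle,
            "low": min(open_, close) - wiggle,
            "close": close,
            "volume": rng.uniform(1_000, 3_000),
        })
        # The next bar opens where this one closed.
        price = close
    return bars
```

With `n_bars=10_000` and a start of 2024-01-01 00:00, the last bar is stamped 2024-01-07 22:39, the same span reported in the store stats above.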
2. Load Data with RawDataFactory
RawDataFactory.from_duckdb_spot_store() gives you full control over data
loading: pair selection, date range filtering, schema validation, deduplication,
and optional auto-resampling.
```python
raw_data = RawDataFactory.from_duckdb_spot_store(
    spot_store_path=db_path,
    pairs=["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    start=datetime(2020, 1, 1),
    end=datetime(2030, 1, 1),
    data_types=["spot"],
)

spot_df = raw_data.get("spot")
print(f"Shape: {spot_df.shape}")
print(f"Pairs: {spot_df['pair'].unique().sort().to_list()}")
print(f"Time range: {spot_df['timestamp'].min()} .. {spot_df['timestamp'].max()}")
print(f"Columns: {spot_df.columns}")
```
```text
2026-02-15 00:50:32.348 | INFO | signalflow.data.raw_store.duckdb_stores:_ensure_tables:153 - Database initialized: /tmp/data_loading_demo.duckdb (data_type=spot, timeframe=1m)
```

```text
Shape: (30000, 8)
Pairs: ['BTCUSDT', 'ETHUSDT', 'SOLUSDT']
Time range: 2024-01-01 00:00:00 .. 2024-01-07 22:39:00
Columns: ['pair', 'timestamp', 'open', 'high', 'low', 'close', 'volume', 'trades']
```
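Date range filtering and deduplication are listed among the factory's features. The stdlib-only sketch below shows what those two steps mean for rows keyed by `(pair, timestamp)`. The helper `filter_and_dedup` is hypothetical, and both the half-open `[start, end)` range and the "last duplicate wins" rule are assumptions; SignalFlow's own semantics may differ.

```python
from datetime import datetime


def filter_and_dedup(rows, start, end):
    """Keep rows with start <= timestamp < end, one row per (pair, timestamp).

    Hypothetical helper; the half-open range and keep-last rule are assumptions.
    """
    latest = {}
    for row in rows:
        if start <= row["timestamp"] < end:
            # A later duplicate overwrites an earlier one for the same key.
            latest[(row["pair"], row["timestamp"])] = row
    # Return rows ordered by pair, then timestamp.
    return [latest[key] for key in sorted(latest)]
```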
3. Load with sf.load() Shortcut
For quick exploration, sf.load() wraps the factory method in a single call.
It accepts a path to a .duckdb file and returns a RawData container.
```python
raw_quick = sf.load(
    db_path,
    pairs=["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    start="2024-01-01",
    timeframe="1m",
)

print(f"Loaded pairs: {raw_quick.pairs}")
print(f"Spot shape: {raw_quick.get('spot').shape}")
```
```text
2026-02-15 00:50:32.381 | INFO | signalflow.data.raw_store.duckdb_stores:_ensure_tables:153 - Database initialized: /tmp/data_loading_demo.duckdb (data_type=spot, timeframe=1m)
```

```text
Loaded pairs: ['BTCUSDT', 'ETHUSDT', 'SOLUSDT']
Spot shape: (30000, 8)
```
4. Detect Timeframe
detect_timeframe() computes the most common timestamp delta across all pairs
and maps it to the nearest known timeframe string.
```python
from signalflow.data.resample import detect_timeframe

df = raw_data.get("spot")
detected_tf = detect_timeframe(df)
print(f"Detected timeframe: {detected_tf}")
```
```text
Detected timeframe: 1m
```
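The detection idea described above (most common delta, then nearest known timeframe) can be sketched in a few lines of stdlib Python. This is not SignalFlow's code: the name `detect_timeframe_sketch`, computing deltas within each pair rather than across pairs, and measuring "nearest" as the absolute difference in minutes are assumptions. The minutes table is copied from the standard timeframes printed in section 8.

```python
from collections import Counter
from datetime import datetime, timedelta

# Standard timeframes in minutes, as printed from TIMEFRAME_MINUTES in section 8.
STANDARD_MINUTES = {
    "1m": 1, "3m": 3, "5m": 5, "15m": 15, "30m": 30, "1h": 60,
    "2h": 120, "4h": 240, "6h": 360, "8h": 480, "12h": 720, "1d": 1440,
}


def detect_timeframe_sketch(rows):
    """Return the standard timeframe closest to the most common bar spacing.

    rows: iterable of dicts with "pair" and "timestamp" keys (hypothetical format).
    """
    by_pair = {}
    for row in rows:
        by_pair.setdefault(row["pair"], []).append(row["timestamp"])
    deltas = Counter()
    for stamps in by_pair.values():
        stamps.sort()
        # Deltas are taken between consecutive bars of the same pair only.
        deltas.update(b - a for a, b in zip(stamps, stamps[1:]))
    if not deltas:
        raise ValueError("need at least two bars for one pair")
    mode_minutes = deltas.most_common(1)[0][0] / timedelta(minutes=1)
    # Nearest standard timeframe by absolute difference in minutes.
    return min(STANDARD_MINUTES, key=lambda tf: abs(STANDARD_MINUTES[tf] - mode_minutes))
```

Using the mode rather than the mean keeps an occasional gap (a missing candle, an exchange outage) from skewing the result.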
5. OHLCV Resampling
resample_ohlcv() aggregates candles from a smaller timeframe to a larger one
using correct OHLCV rules:
| Column | Aggregation |
|---|---|
| `open` | `first` |
| `high` | `max` |
| `low` | `min` |
| `close` | `last` |
| `volume` | `sum` |
| `trades` | `sum` |
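The aggregation rules in the table can be illustrated with a single-pair, pure-Python sketch. It is not `resample_ohlcv`'s implementation: the name `resample_ohlcv_sketch`, the dict-per-bar format, aligning buckets by flooring timestamps to multiples of the target width since the Unix epoch, and keeping a trailing partial bucket are assumptions (the last one is consistent with the row counts printed below).

```python
from datetime import datetime, timedelta


def resample_ohlcv_sketch(bars, target_minutes):
    """Aggregate time-sorted bars of one pair into target_minutes buckets.

    Hypothetical sketch: epoch-aligned buckets, trailing partial bucket kept.
    """
    epoch = datetime(1970, 1, 1)
    step = timedelta(minutes=target_minutes)
    out = []
    for bar in bars:
        # Floor the timestamp to the start of its bucket.
        bucket = epoch + ((bar["timestamp"] - epoch) // step) * step
        if not out or out[-1]["timestamp"] != bucket:
            # First bar of a new bucket supplies the open (first).
            out.append({"timestamp": bucket, "open": bar["open"],
                        "high": bar["high"], "low": bar["low"],
                        "close": bar["close"], "volume": bar["volume"],
                        "trades": bar["trades"]})
        else:
            agg = out[-1]
            agg["high"] = max(agg["high"], bar["high"])  # max
            agg["low"] = min(agg["low"], bar["low"])     # min
            agg["close"] = bar["close"]                  # last
            agg["volume"] += bar["volume"]               # sum
            agg["trades"] += bar["trades"]               # sum
    return out
```

Fed 10,000 one-minute bars starting at midnight, this yields 167 hourly and 42 four-hour buckets per pair.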
```python
from signalflow.data.resample import resample_ohlcv

df_1m = raw_data.get("spot")
print(f"Original (1m): {df_1m.shape}")

df_1h = resample_ohlcv(df_1m, source_tf="1m", target_tf="1h")
print(f"Resampled (1h): {df_1h.shape}")

df_4h = resample_ohlcv(df_1m, source_tf="1m", target_tf="4h")
print(f"Resampled (4h): {df_4h.shape}")
```
```text
Original (1m): (30000, 8)
Resampled (1h): (501, 8)
Resampled (4h): (126, 8)
```
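The resampled shapes follow from bucket arithmetic. Each of the 3 pairs contributes 10,000 one-minute bars starting at 2024-01-01 00:00, so, counting the trailing partial bucket:

$$
N_{1h} = 3 \cdot \left\lceil \frac{10000}{60} \right\rceil = 3 \cdot 167 = 501,
\qquad
N_{4h} = 3 \cdot \left\lceil \frac{10000}{240} \right\rceil = 3 \cdot 42 = 126
$$

The ceilings correspond to incomplete last buckets (22:00 to 22:39 on 2024-01-07 for 1h, 20:00 to 22:39 for 4h), so the printed shapes imply that `resample_ohlcv` keeps partial trailing buckets rather than dropping them.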
6. Auto-Detect and Resample
align_to_timeframe() combines detection and resampling in one step: it
auto-detects the source timeframe and resamples to the target if possible.
If resampling is not possible (e.g. the target is not a multiple of the
source), the data is returned unchanged with a warning.
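The control flow described above (detect, check, then resample or warn and return unchanged) can be sketched as follows. This is a hypothetical illustration, not SignalFlow's source: `align_sketch` takes the detection, feasibility check and resampling steps as injected callables standing in for `detect_timeframe`, `can_resample` and `resample_ohlcv`, so it can run without the library.

```python
import warnings


def align_sketch(rows, target_tf, detect, can_resample, resample):
    """Detect the source timeframe, then resample if possible.

    detect, can_resample and resample are stand-ins (assumptions) for the
    library functions; on failure the input is returned unchanged with a warning.
    """
    source_tf = detect(rows)
    if not can_resample(source_tf, target_tf):
        warnings.warn(
            f"cannot resample {source_tf} -> {target_tf}; returning data unchanged"
        )
        return rows
    return resample(rows, source_tf=source_tf, target_tf=target_tf)
```

Returning the input unchanged instead of raising means a pipeline keeps running at the native timeframe; the warning is the only signal, so it is worth checking logs when a target looks unsupported.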
```python
from signalflow.data.resample import align_to_timeframe

df_auto_1h = align_to_timeframe(df_1m, target_tf="1h")
print(f"Auto-resampled to 1h: {df_auto_1h.shape}")
```
```text
Auto-resampled to 1h: (501, 8)
```
7. Auto-Resampling During Data Loading
Both RawDataFactory.from_duckdb_spot_store() and RawDataFactory.from_stores()
accept a target_timeframe parameter. When it is set, the data is automatically
resampled after loading, so no separate resampling step is needed.
```python
raw_1h = RawDataFactory.from_duckdb_spot_store(
    spot_store_path=db_path,
    pairs=["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    start=datetime(2020, 1, 1),
    end=datetime(2030, 1, 1),
    data_types=["spot"],
    target_timeframe="1h",
)

print(f"Auto-resampled spot: {raw_1h.get('spot').shape}")
```
```text
2026-02-15 00:50:32.488 | INFO | signalflow.data.raw_store.duckdb_stores:_ensure_tables:153 - Database initialized: /tmp/data_loading_demo.duckdb (data_type=spot, timeframe=1m)
```

```text
Auto-resampled spot: (501, 8)
```
8. Exchange Timeframe Support
Not every exchange supports every timeframe. SignalFlow ships with
EXCHANGE_TIMEFRAMES (a mapping of exchange name to supported timeframes)
and helper functions to navigate this.
```python
from signalflow.data.resample import (
    EXCHANGE_TIMEFRAMES,
    TIMEFRAME_MINUTES,
    can_resample,
    select_best_timeframe,
)

# Show standard timeframes
print("Standard timeframes:")
for tf, minutes in TIMEFRAME_MINUTES.items():
    print(f" {tf:>4s} = {minutes:>5} min")

print("\nExchange support:")
for exchange, tfs in EXCHANGE_TIMEFRAMES.items():
    # Sort by duration for readability
    sorted_tfs = sorted(tfs, key=lambda t: TIMEFRAME_MINUTES[t])
    print(f" {exchange:>15s}: {', '.join(sorted_tfs)}")
```
```text
Standard timeframes:
1m = 1 min
3m = 3 min
5m = 5 min
15m = 15 min
30m = 30 min
1h = 60 min
2h = 120 min
4h = 240 min
6h = 360 min
8h = 480 min
12h = 720 min
1d = 1440 min

Exchange support:
binance: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d
bybit: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 12h, 1d
okx: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 12h, 1d
kraken_spot: 1m, 5m, 15m, 30m, 1h, 4h, 1d
kraken_futures: 1m, 5m, 15m, 30m, 1h, 4h, 12h, 1d
deribit: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 12h, 1d
hyperliquid: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 8h, 12h, 1d
whitebit: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d
```
```python
# Find best timeframe for an exchange
# Bybit does not support 8h natively, so select_best_timeframe picks
# the largest supported TF that evenly divides 8h.
best = select_best_timeframe("bybit", target_tf="8h")
print(f"Best Bybit TF for 8h target: {best}")

# Binance supports 8h directly
best_binance = select_best_timeframe("binance", target_tf="8h")
print(f"Best Binance TF for 8h target: {best_binance}")

# Check if resampling is possible
print(f"\nCan resample 1m -> 1h? {can_resample('1m', '1h')}")
print(f"Can resample 1h -> 1m? {can_resample('1h', '1m')}")
print(f"Can resample 1h -> 4h? {can_resample('1h', '4h')}")
print(f"Can resample 1h -> 3h? {can_resample('1h', '3h')}")
```
```text
Best Bybit TF for 8h target: 4h
Best Binance TF for 8h target: 8h

Can resample 1m -> 1h? True
Can resample 1h -> 1m? False
Can resample 1h -> 4h? True
Can resample 1h -> 3h? False
```
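The outputs above are consistent with a simple rule, sketched here as a hypothetical re-implementation (not SignalFlow's source). Note that 180 minutes is a multiple of 60, so the `False` for 1h to 3h presumably comes from 3h not being a standard timeframe; the sketch encodes that reading as an assumption by requiring both timeframes to be in the table. The tables are copied from the printed output, trimmed to two exchanges; the names `can_resample_sketch` and `select_best_timeframe_sketch` are illustrative.

```python
# Minutes per standard timeframe, copied from the printed TIMEFRAME_MINUTES.
MINUTES = {"1m": 1, "3m": 3, "5m": 5, "15m": 15, "30m": 30, "1h": 60,
           "2h": 120, "4h": 240, "6h": 360, "8h": 480, "12h": 720, "1d": 1440}
# Two entries copied from the printed EXCHANGE_TIMEFRAMES.
SUPPORTED = {
    "binance": ["1m", "3m", "5m", "15m", "30m", "1h", "2h", "4h", "6h", "8h", "12h", "1d"],
    "bybit": ["1m", "3m", "5m", "15m", "30m", "1h", "2h", "4h", "6h", "12h", "1d"],
}


def can_resample_sketch(source_tf, target_tf):
    """Assumed rule: both standard, target >= source, target a whole multiple."""
    if source_tf not in MINUTES or target_tf not in MINUTES:
        return False
    src, tgt = MINUTES[source_tf], MINUTES[target_tf]
    return tgt >= src and tgt % src == 0


def select_best_timeframe_sketch(exchange, target_tf):
    """Target itself if supported, else the largest supported TF that divides it."""
    tfs = SUPPORTED[exchange]
    if target_tf in tfs:
        return target_tf
    divisors = [tf for tf in tfs if can_resample_sketch(tf, target_tf)]
    if not divisors:
        raise ValueError(f"{exchange} has no timeframe that divides {target_tf}")
    return max(divisors, key=MINUTES.__getitem__)
```

For Bybit and an 8h target, 6h is skipped because 480 is not a multiple of 360, which leaves 4h as the largest divisor, matching the output above.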
9. Summary

| Function | Purpose |
|---|---|
| `sf.load()` | Quick data loading from DuckDB |
| `RawDataFactory.from_duckdb_spot_store()` | Full-control data loading with validation |
| `detect_timeframe()` | Auto-detect timeframe from data |
| `resample_ohlcv()` | Resample OHLCV between timeframes |
| `align_to_timeframe()` | Auto-detect source TF + resample |
| `select_best_timeframe()` | Find best exchange TF for a target |
| `can_resample()` | Check if resampling is possible |
Cleanup
```python
store.close()
db_path.unlink(missing_ok=True)
print("Temporary DuckDB file removed. Done!")
```
```text
Temporary DuckDB file removed. Done!
```
Next Steps
- 01 - Quick Start: Run your first backtest in 5 minutes
- 02 - Custom Detector: Create your own signal detector
- 04 - Pipeline Visualization: Visualize your strategy pipeline
- 05 - Advanced Strategies: Multi-detector ensembles