Author: Thijs van den Berg (Page 3 of 3)

Validating Trading Backtests with Surrogate Time-Series

Backtesting trading strategies is a dangerous business: there is a high risk that you will keep tweaking your trading strategy model to make the backtest results better, only to find that after tweaking you have actually worsened the 'live' performance later on. The reason is that you have been overfitting your trading model to your backtest data through selection bias.

In this post we will use two techniques that help quantify and monitor the statistical significance of backtesting and tweaking:

  1. First, we analyze the performance of backtest results by comparing them against random trading strategies that have similar trading characteristics (time period, number of trades, long/short ratio). This quantifies specifically how "special" the timing of the trading strategy is while keeping everything else equal (the trends, volatility, return distribution, and patterns of the traded asset).
  2. Second, we analyze the impact and cost of tweaking strategies by comparing it against doing the same thing with random strategies. This shows whether improvements are statistically significant, or simply what one would expect when picking the best strategy from a set of multiple variants.
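The first technique can be sketched as follows. Everything here is a hypothetical illustration, not code from the original post: the helper name `surrogate_pnl`, the toy return series, and the example numbers are all assumptions. The idea is to generate many random strategies with the same number of long and short positions over the same period, and see where the real backtest result falls in that distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_pnl(returns, n_trades, n_long, n_surrogates=1000):
    """Total returns of random strategies that trade the same asset
    with the same number of long/short positions (hypothetical helper)."""
    T = len(returns)
    pnls = np.empty(n_surrogates)
    for k in range(n_surrogates):
        # pick n_trades random days; the first n_long are longs, the rest shorts
        days = rng.choice(T, size=n_trades, replace=False)
        pos = np.zeros(T)
        pos[days[:n_long]] = 1.0
        pos[days[n_long:]] = -1.0
        pnls[k] = pos @ returns
    return pnls

# toy daily returns and a toy backtest result, for illustration only
returns = rng.normal(0.0005, 0.01, 500)
strategy_pnl = 0.15

pnls = surrogate_pnl(returns, n_trades=50, n_long=30)

# p-value: fraction of random strategies that match or beat the backtest
p_value = np.mean(pnls >= strategy_pnl)
```

A small p-value suggests the strategy's timing adds value beyond what random trade placement on the same asset would achieve; a large one suggests the backtest result is unremarkable.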

Parallel Processing of Tasks with Python’s Multiprocessing lib

The Python code snippet below uses the multiprocessing library to process a list of tasks in parallel using a pool of 5 worker processes.

Note: Python also has a multithreading library called "threading", but it is well documented that Python multithreading doesn't help for CPU-bound tasks because of Python's Global Interpreter Lock (GIL); for more info, google "python multithreading gil".

from multiprocessing import Pool

import itertools
import time


def train(opt, delay=2.0):
    time.sleep(delay)
    return f'Done training {opt}'


# Grid search
grid = {
    'batch_size': [32, 64, 128],
    'learning_rate': [1E-4, 1E-3, 1E-2]
}


def main():
    settings_list = []
    for values in itertools.product(*grid.values()):
        settings_list.append(dict(zip(grid.keys(), values)))

    with Pool(5) as p:
        print(p.map(train, settings_list))


if __name__ == "__main__":
    main()

Output

[
"Done training {'batch_size': 32, 'learning_rate': 0.0001}", 
"Done training {'batch_size': 32, 'learning_rate': 0.001}", 
"Done training {'batch_size': 32, 'learning_rate': 0.01}", 
"Done training {'batch_size': 64, 'learning_rate': 0.0001}", 
"Done training {'batch_size': 64, 'learning_rate': 0.001}", 
"Done training {'batch_size': 64, 'learning_rate': 0.01}", 
"Done training {'batch_size': 128, 'learning_rate': 0.0001}", 
"Done training {'batch_size': 128, 'learning_rate': 0.001}", 
"Done training {'batch_size': 128, 'learning_rate': 0.01}"
]

Parameter Grid-searching with Python’s itertools

Python's itertools offers a great solution when you want to do a grid search for optimal hyperparameter values, or more generally generate sets of experiments.

In the code fragment below we generate experiment settings (key-value pairs stored in dictionaries) for all combinations of batch sizes and learning rates.

import itertools

# General settings
base_settings = {'epochs': 10}

# Grid search
grid = {
    'batch_size': [32, 64, 128],
    'learning_rate': [1E-4, 1E-3, 1E-2]
}

# Loop over all grid search combinations
for values in itertools.product(*grid.values()):
    point = dict(zip(grid.keys(), values))

    # merge the general settings
    settings = {**base_settings, **point}

    print(settings)

output:

{'epochs': 10, 'batch_size': 32, 'learning_rate': 0.0001}
{'epochs': 10, 'batch_size': 32, 'learning_rate': 0.001}
{'epochs': 10, 'batch_size': 32, 'learning_rate': 0.01}
{'epochs': 10, 'batch_size': 64, 'learning_rate': 0.0001}
{'epochs': 10, 'batch_size': 64, 'learning_rate': 0.001}
{'epochs': 10, 'batch_size': 64, 'learning_rate': 0.01}
{'epochs': 10, 'batch_size': 128, 'learning_rate': 0.0001}
{'epochs': 10, 'batch_size': 128, 'learning_rate': 0.001}
{'epochs': 10, 'batch_size': 128, 'learning_rate': 0.01}

Gaussian Mixture Approximation for the Laplace Distribution

The Laplace distribution is an interesting alternative building block to the Gaussian distribution because it has much fatter tails. A drawback is that some of the nice analytical properties the Gaussian distribution gives you don't easily translate to the Laplace distribution. In those cases, it can be handy to approximate the Laplace distribution with a mixture of Gaussians. The following approximation can then be used:

    \[L(x) = \frac{1}{2}e^{-|x|} \approx \frac{1}{n} \sum_{i=1}^n N\left(x \,\middle|\, \mu=0,\ \sigma_i^2=-2\ln \frac{2i-1}{2n}\right)\]

import numpy as np
from scipy.stats import norm


def laplacian_gmm(n=4):
    # all components have the same weight
    weights = np.repeat(1.0/n, n)

    # centers of the n bins in the interval [0,1]
    uniform = np.arange(0.5/n, 1.0, 1.0/n)

    # Uniform- to Exponential-distribution transform (inverse CDF of Exp(1/2))
    sigmas = np.sqrt(-2.0*np.log(uniform))
    return weights, sigmas

def laplacian_gmm_pdf(x, n=4):
    weights, sigmas = laplacian_gmm(n)
    p = np.zeros_like(x)
    for i in range(n):
        p += weights[i] * norm(loc=0, scale=sigmas[i]).pdf(x)
    return p
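As a quick sanity check (a sketch of my own, not part of the original post), we can compare the mixture pdf against the exact Laplace density from scipy.stats and confirm that the maximum approximation error shrinks as the number of components n grows:

```python
import numpy as np
from scipy.stats import norm, laplace

def laplacian_gmm(n=4):
    # equal weights; sigmas via the uniform-to-exponential transform
    weights = np.repeat(1.0/n, n)
    uniform = np.arange(0.5/n, 1.0, 1.0/n)
    sigmas = np.sqrt(-2.0*np.log(uniform))
    return weights, sigmas

def laplacian_gmm_pdf(x, n=4):
    weights, sigmas = laplacian_gmm(n)
    return sum(w * norm(loc=0, scale=s).pdf(x)
               for w, s in zip(weights, sigmas))

x = np.linspace(-5, 5, 1001)
# scipy.stats.laplace with default loc=0, scale=1 is exactly (1/2)exp(-|x|)
err4  = np.max(np.abs(laplacian_gmm_pdf(x, n=4)  - laplace.pdf(x)))
err16 = np.max(np.abs(laplacian_gmm_pdf(x, n=16) - laplace.pdf(x)))
```

The error is largest at the peak x = 0, where the finite mixture cannot fully reproduce the Laplace density's kink, and it decreases steadily with n.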
SITMO Machine Learning | Quantitative Finance