Category: Statistics

Building Correlation Matrices with Controlled Eigenvalues: A Simple Algorithm

In some cases, we need to construct a correlation matrix with a predefined set of eigenvalues, which is not trivial since arbitrary symmetric matrices with a given set of eigenvalues may not satisfy correlation constraints (e.g., unit diagonal elements).

A practical method to generate such matrices is based on the Method of Alternating Projections (MAP), as introduced by Waller (2018). The approach alternates between two sets of matrices: those with the target eigenvalues and those satisfying the correlation constraints, projecting from one set onto the other until the iterates converge. It goes like this:

  1. Normalize the eigenvalues:
    • Ensure they are sorted in descending order.
    • Scale them such that their sum equals N, the matrix size.
  2. Generate a random orthonormal matrix:
    • Construct a random matrix.
    • Perform QR decomposition to obtain the orthonormal matrix.
  3. Iterate until convergence:
    • Construct a symmetric positive definite (SPD) matrix using the target eigenvalues and the random orthonormal matrix:
      • Clamp values between -1 and +1 to maintain correlation constraints.
      • Force all elements on the diagonal to be 1.
    • Perform eigen decomposition to extract new eigenvalues and eigenvectors.
    • Replace the computed eigenvalues with the target values.
    • Compute a distance measure between the recomputed and target eigenvalues (e.g., the maximum absolute difference).
    • Stop if the distance measure is below a given tolerance.

The algorithm seems to converge exponentially fast.

The Python code is below:

import numpy as np

def eig_to_cor(eigen_values, eigen_vectors):
    """
    Construct a correlation matrix from given eigenvalues and eigenvectors.
    """
    cor = eigen_vectors @ np.diag(eigen_values) @ eigen_vectors.T
    cor = np.clip(cor, -1, 1)  # Ensure valid correlation values
    np.fill_diagonal(cor, 1.0)  # Force unit diagonal elements
    return cor


def cor_to_eig(cor):
    """
    Extract eigenvalues and eigenvectors from a correlation matrix,
    ensuring they are sorted in descending order.
    """
    eigen_values, eigen_vectors = np.linalg.eigh(cor)
    idx = eigen_values.argsort()[::-1]
    return eigen_values[idx].real, eigen_vectors[:, idx].real


def random_orthonormal_matrix(n):
    """
    Generate a random nxn orthonormal matrix using QR decomposition.
    """
    Q, _ = np.linalg.qr(np.random.randn(n, n))
    return Q


def normalize_eigen_values(eigen_values):
    """
    Normalize eigenvalues to sum to the matrix dimension (n) and ensure that they are sorted in descending order.
    """
    eigen_values = np.sort(eigen_values)[::-1]  # Ensure descending order
    return len(eigen_values) * eigen_values / np.sum(eigen_values)


def gen_cor_with_eigenvalues(eigen_values, tol=1E-7):
    """
    Generate a correlation matrix with the specified eigenvalues
    using the Method of Alternating Projections.
    """
    target_eigen_values = normalize_eigen_values(eigen_values)
    eigen_vectors = random_orthonormal_matrix(len(target_eigen_values))
    
    while True:
        cor = eig_to_cor(target_eigen_values, eigen_vectors)
        eigen_values, eigen_vectors = cor_to_eig(cor)
        
        if np.max(np.abs(target_eigen_values - eigen_values)) < tol:
            break
    
    return cor
# Example usage:

cor_matrix = gen_cor_with_eigenvalues([3, 1, 1/3, 1/3, 1/3])
print(cor_matrix)

>
array([[ 1.        ,  0.52919521, -0.32145437, -0.45812423, -0.65954354],
       [ 0.52919521,  1.        , -0.61037909, -0.65821403, -0.46442884],
       [-0.32145437, -0.61037909,  1.        ,  0.64519865,  0.23287104],
       [-0.45812423, -0.65821403,  0.64519865,  1.        ,  0.38261988],
       [-0.65954354, -0.46442884,  0.23287104,  0.38261988,  1.        ]])

Waller, N. G. (2018). Generating Correlation Matrices With Specified Eigenvalues Using the Method of Alternating Projections. The American Statistician, 74(1), 21–28. https://doi.org/10.1080/00031305.2017.1401960

Faster Option Pricing: How Quasi-Monte Carlo Outperforms Standard Monte Carlo

In this post, we discuss the usefulness of low-discrepancy sequences (LDS) in finance, particularly for option pricing. Unlike purely random sampling, LDS methods generate points that are more evenly distributed over the sample space. This uniformity reduces the gaps and clustering seen in standard Monte Carlo (MC) sampling and improves convergence in numerical integration problems.

A key measure of sampling quality is discrepancy, which quantifies how evenly a set of points covers the space. Low-discrepancy sequences minimize this discrepancy, leading to faster convergence in high-dimensional simulations.
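To make the idea concrete, here is a small sketch (not taken from the full post) that prices a European call with standard MC and with a scrambled Sobol low-discrepancy sequence from scipy.stats.qmc; the Black-Scholes parameters are made up for the example:

import numpy as np
from scipy.stats import norm, qmc

# Illustrative Black-Scholes parameters (made up for this sketch)
S0, K, T, r, sigma = 100.0, 100.0, 1.0, 0.01, 0.2

def bs_call_price():
    # Closed-form Black-Scholes call price, used as the reference value
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def call_price_from_uniforms(u):
    # Map uniforms to standard normals, simulate terminal prices, average the discounted payoff
    z = norm.ppf(u)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - K, 0).mean()

m = 12  # 2**12 = 4096 samples for both estimators
mc_price = call_price_from_uniforms(np.random.rand(2**m))   # standard Monte Carlo
sobol = qmc.Sobol(d=1, scramble=True)                        # scrambled Sobol LDS
qmc_price = call_price_from_uniforms(sobol.random_base2(m).ravel())

print(f"Black-Scholes: {bs_call_price():.4f}")
print(f"Standard MC  : {mc_price:.4f}")
print(f"Sobol QMC    : {qmc_price:.4f}")

Running this repeatedly, the Sobol estimate typically scatters less around the closed-form price than the plain MC estimate for the same number of samples.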

Continue reading

Understanding the Uncertainty of Correlation Estimates

Correlation is everywhere in finance. It’s the backbone of portfolio optimization, risk management, and models like the CAPM. The idea is simple: mix assets that don’t move in sync, and you can reduce risk without sacrificing too much return. But there’s a problem: correlation is usually taken at face value, even though it’s typically an estimate based on historical data, and that estimate comes with uncertainty!

This matters because small errors in correlation can throw off portfolio models. If you overestimate diversification, your portfolio might be riskier than expected. If you underestimate it, you could miss out on returns. In models like the CAPM, where correlation helps determine expected returns, bad estimates can lead to bad decisions.

Despite this, some asset managers don’t give much thought to how unstable correlation estimates can be. In this post, we’ll dig into the uncertainty behind empirical correlation, and how to quantify it.
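As a small, illustrative example of quantifying that uncertainty (the full post may use a different approach), the snippet below computes a 95% confidence interval for an empirical correlation using the Fisher z-transform, on simulated returns with an assumed true correlation of 0.5:

import numpy as np

# Simulated example: two return series with an assumed true correlation of 0.5
rng = np.random.default_rng(42)
n, rho = 250, 0.5  # roughly one year of daily returns
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

r = np.corrcoef(x, y)[0, 1]  # empirical correlation estimate

# Fisher z-transform: arctanh(r) is approximately normal with standard error 1/sqrt(n-3)
z, se = np.arctanh(r), 1.0 / np.sqrt(n - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
print(f"correlation estimate: {r:.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")

Even with 250 observations, the interval is noticeably wide, which is exactly the point of the post.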

Continue reading

Validating Trading Backtests with Surrogate Time-Series

Back-testing trading strategies is a dangerous business: there is a high risk that you keep tweaking your trading strategy model to make the back-test results better, only to find out later that the tweaks have actually worsened the ‘live’ performance. The reason is that you’ve been overfitting your trading model to your back-test data through selection bias.

In this post we will use two techniques that help quantify and monitor the statistical significance of backtesting and tweaking:

  1. First, we analyze the performance of backtest results by comparing them against random trading strategies that have similar trading characteristics (time period, number of trades, long/short ratio), as sketched in the code after this list. This quantifies how “special” the timing of the trading strategy is while keeping all other things equal (like the trends, volatility, return distribution, and patterns in the traded asset).
  2. Second, we analyze the impact and cost of tweaking strategies by comparing it against doing the same thing with random strategies. This allows us to see if improvements are significant, or simply what one would expect when picking the best strategy from a set of multiple variants.
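Here is a minimal sketch of the first idea, under the simplifying assumption that a “random strategy with similar characteristics” is a random permutation in time of the actual position series; the surrogate construction in the full post may be more elaborate:

import numpy as np

def timing_percentile(asset_returns, positions, n_random=10_000, seed=0):
    """
    Compare a strategy's total return against random strategies that hold the
    same positions (same number of long/flat/short days) but at randomly
    permuted times. Returns the fraction of random strategies it outperforms.
    """
    rng = np.random.default_rng(seed)
    actual = np.sum(positions * asset_returns)
    random_pnls = np.array([
        np.sum(rng.permutation(positions) * asset_returns)
        for _ in range(n_random)
    ])
    return np.mean(actual > random_pnls)

# Illustrative usage with made-up data (not from the post)
rng = np.random.default_rng(1)
returns = rng.normal(0.0002, 0.01, size=500)       # daily asset returns
positions = (rng.random(500) < 0.4).astype(float)  # long on 40% of the days
print(f"strategy beats {timing_percentile(returns, positions):.1%} of random timings")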
Continue reading

Gaussian Mixture Approximation for the Laplace Distribution

The Laplace distribution is an interesting alternative building-block compared to the Gaussian distribution because it has much fatter tails. A drawback might be that some nice analytical properties that the Gaussian distribution gives you don’t easily translate to the Laplace distribution. In those cases, it can be handy to approximate the Laplace distribution with a mixture of Gaussians. The following approximation can then be used:

    \[L(x) = \frac{1}{2}e^{-|x|} \approx \frac{1}{n} \sum_{i=1}^n N\left(x \,\Big|\, \mu=0,\ \sigma_i^2=-2\ln \frac{2i-1}{2n}\right)\]

import numpy as np
from scipy.stats import norm

def laplacian_gmm(n=4):
    # all components have the same weight
    weights = np.repeat(1.0/n, n)
    
    # centers of the n bins in the interval [0,1]
    uniform = np.arange(0.5/n, 1.0, 1.0/n)
    
    # Uniform- to Exponential-distribution transform
    sigmas = np.sqrt(-2.0 * np.log(uniform))
    return weights, sigmas

def laplacian_gmm_pdf(x, n=4):
    weights, sigmas = laplacian_gmm(n)
    p = np.zeros_like(x)
    for i in range(n):
        p += weights[i] * norm(loc=0, scale=sigmas[i]).pdf(x)
    return p
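
As a quick sanity check (not part of the original post), one can compare the mixture density from laplacian_gmm_pdf with the exact Laplace density:

# Compare the n=4 mixture approximation against the exact Laplace pdf
from scipy.stats import laplace

x = np.linspace(-5, 5, 201)
max_err = np.max(np.abs(laplace.pdf(x) - laplacian_gmm_pdf(x, n=4)))
print(f"maximum absolute pdf error: {max_err:.4f}")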