Chapter 27 — Machine Learning in Quantitative Finance
"Neural networks are option pricers with too many hyperparameters and not enough Greeks."
After this chapter you will be able to:
- Identify the key ways financial ML problems differ from standard ML (non-stationarity, label scarcity, low signal-to-noise)
- Implement ridge and LASSO regression for return prediction and explain the bias-variance tradeoff
- Build a neural network for option pricing and understand its advantages (speed at inference) and limitations
- Construct ML features from price series and fundamental data while avoiding look-ahead bias
- Detect and avoid feature leakage, data snooping, and other backtesting pitfalls specific to ML
Machine learning entered quantitative finance in the 2010s not as a revolution but as a carefully qualified evolution. Statistical models had always underpinned quantitative research; ridge regression, LASSO, and their variants were already standard tools. The question was whether the more expressive function classes of neural networks and gradient boosting would provide better out-of-sample predictions in financial applications than the simpler linear and factor models they were meant to replace.
The answer is empirically mixed — and the reason illuminates something deep about both machine learning and financial markets. The canonical machine learning success stories (image recognition, language translation, protein folding) involve stationary problems where training data and deployment data come from the same distribution. Financial markets are non-stationary: the statistical relationships between features and future returns shift over time as the market's composition changes, as strategies become crowded, and as macroeconomic regimes change. A neural network trained on the 2010–2018 bull market may have essentially no predictive value in a bear market regime it has never seen. This regime non-stationarity is the central challenge of ML in finance.
Despite these caveats, ML tools have found genuine applications: in options pricing (neural networks as fast approximators of slow numerical pricers), in natural language processing (extracting signals from earnings calls and news), in cross-sectional equity prediction with careful cross-validation, and in reinforcement learning for execution optimisation. This chapter covers the ML toolkit most relevant to quantitative finance practitioners and, crucially, the failure modes to watch for.
27.1 ML Landscape for Quant Finance
Machine learning in finance divides broadly into:
| Application | Typical Method | Primary Use Case |
|---|---|---|
| Return prediction | Ridge/LASSO, gradient boosting | Alpha generation |
| Options pricing | Neural networks, kernel methods | Calibration |
| Risk forecasting | LSTM, GRU | VaR, volatility |
| Credit scoring | Logistic regression, XGBoost | PD estimation |
| NLP/sentiment | Transformers | Alternative data |
27.2 Feature Engineering
Feature engineering is the process of transforming raw data (prices, volumes, financial statement items) into inputs suitable for an ML model. In finance, the quality of features matters far more than the sophistication of the model: a well-engineered set of predictive signals fed to linear regression almost always outperforms a neural network applied to poorly constructed raw inputs.
Common feature categories for equity return prediction:
- Price momentum: trailing returns at 1M, 3M, 6M, 12M (excluding the most recent month to avoid the short-term reversal effect; see the sketch after this list)
- Mean reversion: 1-day return, 5-day return (these negatively predict next-day return due to market microstructure mean-reversion)
- Value signals: book-to-market, earnings yield, cash flow yield (from financial statements)
- Quality signals: profitability (ROE, gross margin), balance sheet strength (debt/equity), accruals ratio
- Volatility: realised volatility of returns, volume-scaled price impact
- Analyst signals: earnings revision momentum, analyst consensus changes
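As a concrete sketch of the momentum bullet above, the classic 12-1 momentum signal can be computed from a daily price series as follows. The function name and the 21/252 trading-day window lengths are illustrative conventions, not prescriptions:

(* 12-1 momentum: trailing 12-month return, skipping the most recent
   month (~21 trading days) to sidestep short-term reversal. Returns
   NaN until a full year of history is available. *)
let momentum_12_1 prices =
  let n = Array.length prices in
  Array.init n (fun i ->
    if i < 252 then Float.nan
    else prices.(i - 21) /. prices.(i - 252) -. 1.0)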
Feature engineering pitfalls:
Look-ahead bias: the most dangerous error. If a signal at time $t$ uses data that was not available until after $t$ (e.g., a financial ratio computed from annual report data released 3 months after the period end), it appears predictive in backtest but is useless live. Always use point-in-time data and build pipelines that carefully track data availability dates.
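A minimal point-in-time lookup makes this concrete; the record type and field names below are illustrative, not taken from any particular data vendor:

(* Each observation carries the date it became public; a query at date t
   sees only values released on or before t. Dates are ints for simplicity. *)
type pit_record = { value : float; released : int }

let as_of ~date records =
  List.filter_map (fun r -> if r.released <= date then Some r.value else None) records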
Feature leakage: a subtler form of look-ahead bias where the normalisation or transformation of a feature uses information from future time periods. Example: if you z-score a momentum signal using the full-sample mean and standard deviation, the scaling uses data from after the signal date. Use rolling or expanding windows for all normalisation.
Overfitting in feature selection: if you test 1,000 candidate features and report the 20 that have the best in-sample predictive power, you have selected for noise. Apply out-of-sample validation strictly and penalise for the number of features tested.
module Features = struct
(** Technical indicators as features for ML *)
let sma ~prices ~window =
let n = Array.length prices in
Array.init n (fun i ->
if i < window - 1 then Float.nan
else
let s = ref 0.0 in
for k = i - window + 1 to i do s := !s +. prices.(k) done;
!s /. float_of_int window
)
let ema ~prices ~alpha =
let n = Array.length prices in
let em = Array.make n prices.(0) in
for i = 1 to n - 1 do
em.(i) <- alpha *. prices.(i) +. (1.0 -. alpha) *. em.(i - 1)
done;
em
let rsi ~returns ~period =
let n = Array.length returns in
let rsi = Array.make n 50.0 in
for i = period to n - 1 do
let gains = ref 0.0 and losses = ref 0.0 in
for k = i - period + 1 to i do
let r = returns.(k) in
if r > 0.0 then gains := !gains +. r
else losses := !losses -. r
done;
let avg_gain = !gains /. float_of_int period in
let avg_loss = !losses /. float_of_int period in
rsi.(i) <- if avg_loss < 1e-8 then 100.0
else 100.0 -. 100.0 /. (1.0 +. avg_gain /. avg_loss)
done;
rsi
  (** Normalise features to zero mean/unit variance over the FULL sample.
      In-sample analysis only: applied to a live signal this is exactly the
      feature-leakage trap described above. Prefer [rolling_zscore] below
      when the output feeds a backtest. *)
let normalise features =
let n = Array.length features in
let mean = Array.fold_left (+.) 0.0 features /. float_of_int n in
let std = sqrt (Array.fold_left (fun a x -> a +. (x -. mean) *. (x -. mean))
0.0 features /. float_of_int n) in
Array.map (fun x -> (x -. mean) /. (std +. 1e-8)) features
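  (** Rolling z-score: normalise each point using only the trailing
      [window] observations, avoiding the leakage that full-sample
      [normalise] introduces. (Added sketch; the window length is the
      caller's choice.) *)
  let rolling_zscore ~features ~window =
    let n = Array.length features in
    Array.init n (fun i ->
      if i < window - 1 then Float.nan
      else begin
        let lo = i - window + 1 in
        let sum = ref 0.0 in
        for k = lo to i do sum := !sum +. features.(k) done;
        let mean = !sum /. float_of_int window in
        let var = ref 0.0 in
        for k = lo to i do
          var := !var +. (features.(k) -. mean) *. (features.(k) -. mean)
        done;
        let std = sqrt (!var /. float_of_int window) in
        (features.(i) -. mean) /. (std +. 1e-8)
      end)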
end
27.3 Ridge and LASSO Regression
module Regularised_regression = struct
  (* Local helper: Array.fold_left2 is not available in all stdlib
     versions, so define it here. *)
  let fold_left2 f init xs ys =
    let acc = ref init in
    Array.iteri (fun i x -> acc := f !acc x ys.(i)) xs;
    !acc
(** Ridge regression: min ‖y - Xβ‖² + λ‖β‖²
Closed form: β = (X'X + λI)^{-1} X'y *)
  let ridge ~x_matrix ~y_vec ~lambda =
    let n_feat = Array.length x_matrix.(0) in
(* Build X'X + λI *)
let xTx = Array.init n_feat (fun i ->
Array.init n_feat (fun j ->
Array.fold_left (fun acc row -> acc +. row.(i) *. row.(j)) 0.0 x_matrix
+. (if i = j then lambda else 0.0)
)
) in
(* Build X'y *)
    let xTy = Array.init n_feat (fun j ->
      fold_left2 (fun acc row y -> acc +. row.(j) *. y) 0.0 x_matrix y_vec
    ) in
    (* Solve (X'X + λI) β = X'y via Owl's linear solver *)
    let xTx_mat = Owl.Mat.of_arrays xTx in
    let xTy_vec = Owl.Mat.of_array xTy n_feat 1 in
    Owl.Mat.to_array (Owl.Linalg.D.linsolve xTx_mat xTy_vec)
(** LASSO: coordinate descent *)
  let lasso ~x_matrix ~y_vec ~lambda ?(max_iter = 1000) ?(tol = 1e-6) () =
let n_feat = Array.length x_matrix.(0) in
let beta = Array.make n_feat 0.0 in
let converged = ref false in
let iter = ref 0 in
while not !converged && !iter < max_iter do
incr iter;
let prev = Array.copy beta in
for j = 0 to n_feat - 1 do
        (* Partial residual: y minus the fit from every feature except j *)
        let rj = Array.mapi (fun i yi ->
            let pred = ref 0.0 in
            for k = 0 to n_feat - 1 do
              if k <> j then pred := !pred +. x_matrix.(i).(k) *. beta.(k)
            done;
            yi -. !pred
          ) y_vec in
        let xj_rj = fold_left2 (fun a row r -> a +. row.(j) *. r) 0.0 x_matrix rj in
let xj2 = Array.fold_left (fun a row -> a +. row.(j) *. row.(j)) 0.0 x_matrix in
        (* Soft thresholding: β_j = sign(raw) · max(0, |raw| − λ / x_j'x_j) *)
        let raw = xj_rj /. xj2 in
        let s = if raw >= 0.0 then 1.0 else -1.0 in
        beta.(j) <- s *. Float.max 0.0 (Float.abs raw -. lambda /. xj2)
done;
      let diff = fold_left2 (fun a p c -> a +. (p -. c) *. (p -. c)) 0.0 prev beta in
if sqrt diff < tol then converged := true
done;
beta
end
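A quick smoke test shows how the two estimators are called. The data-generating process here is invented for the example; with $\lambda$ large enough, the LASSO is expected to zero out the noise coefficient while ridge merely shrinks it:

(* Synthetic data: y = 0.5 * x1 + noise, with x2 irrelevant *)
let () =
  Random.self_init ();
  let n = 500 in
  let x_matrix =
    Array.init n (fun _ -> [| Random.float 2.0 -. 1.0; Random.float 2.0 -. 1.0 |]) in
  let y_vec =
    Array.map (fun row -> 0.5 *. row.(0) +. 0.05 *. (Random.float 2.0 -. 1.0)) x_matrix in
  let b_ridge = Regularised_regression.ridge ~x_matrix ~y_vec ~lambda:1.0 in
  let b_lasso = Regularised_regression.lasso ~x_matrix ~y_vec ~lambda:5.0 () in
  Printf.printf "ridge: %+.3f %+.3f   lasso: %+.3f %+.3f\n"
    b_ridge.(0) b_ridge.(1) b_lasso.(0) b_lasso.(1)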
27.4 Neural Networks for Option Pricing
Neural networks can interpolate the implied volatility surface or approximate the option price function directly.
module Neural_pricer = struct
(** Simple feedforward network: input = [log(S/K), T, σ, r], output = call price *)
type layer = {
weights : float array array;
bias : float array;
activation : [`Relu | `Sigmoid | `Linear];
}
type network = layer list
let relu x = Float.max 0.0 x
let sigmoid x = 1.0 /. (1.0 +. exp (-. x))
let forward_layer layer input =
let n_out = Array.length layer.bias in
let n_in = Array.length input in
Array.init n_out (fun j ->
let z = ref layer.bias.(j) in
for i = 0 to n_in - 1 do
z := !z +. layer.weights.(j).(i) *. input.(i)
done;
match layer.activation with
| `Relu -> relu !z
| `Sigmoid -> sigmoid !z
| `Linear -> !z
)
let predict net input =
List.fold_left (fun x layer -> forward_layer layer x) input net
(** Training via backpropagation is left as an exercise;
in practice use ONNX models loaded via C bindings *)
(** Use pre-trained network to price options *)
let price net ~spot ~strike ~rate ~vol ~tau =
let input = [| log (spot /. strike); tau; vol; rate |] in
let output = predict net input in
let output = predict net input in
output.(0) *. strike (* network is assumed to output price/strike; rescale to a price *)
end
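To show how the module is driven, here is a shape-only example. The 4-16-1 architecture and the random placeholder weights are illustrative; a real pricer's weights would come from training or an external model file:

let demo_net : Neural_pricer.network =
  let layer n_out n_in activation =
    { Neural_pricer.weights =
        Array.init n_out (fun _ -> Array.init n_in (fun _ -> Random.float 0.2 -. 0.1));
      bias = Array.make n_out 0.0;
      activation }
  in
  [ layer 16 4 `Relu; layer 1 16 `Linear ]

(* The output is meaningless until the network is trained; this only
   demonstrates the calling convention. *)
let _price_untrained =
  Neural_pricer.price demo_net ~spot:100.0 ~strike:105.0 ~rate:0.03 ~vol:0.2 ~tau:0.5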
27.5 Chapter Summary
Machine learning in quantitative finance is a powerful toolkit that requires more discipline, not less, than traditional statistical methods. The signal-to-noise ratio in financial data is extremely low — annual Sharpe ratios of 0.5 to 1.0 for genuine market inefficiencies translate to tiny $R^2$ values in return prediction regressions. In this environment, the expressive power of neural networks is as much a liability as an asset, because the same flexibility that lets the network capture real patterns also lets it fit noise.
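A back-of-the-envelope calculation (a standard approximation, not a result specific to this chapter) makes the point concrete. If $r_{t+1} = \beta x_t + \varepsilon_{t+1}$ and one trades in proportion to the forecast, the per-period Sharpe ratio is roughly $\sqrt{R^2/(1-R^2)} \approx \sqrt{R^2}$ for small $R^2$. An annualised Sharpe of 1.0 from a daily signal thus corresponds to a per-day Sharpe of $1/\sqrt{252} \approx 0.063$, i.e. $R^2 \approx 0.063^2 \approx 0.4\%$: a fit that would register as failure in most ML benchmarks.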
Feature engineering for financial ML follows the same principles as traditional factor investing: momentum (trailing returns), value (price/earnings, book/price), quality (profitability, accruals), and volatility (realised and implied) have the most consistent empirical support. These features should be cross-sectionally normalised (z-scored) to remove scale effects and winsorised to reduce the influence of outliers. Time-series features (RSI, moving average crossovers, Bollinger bands) have weaker evidence but are widely used.
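A winsorisation step might look like the following sketch; the 1%/99% cutoffs are a common but illustrative choice, and the quantile interpolation is deliberately crude:

(* Clamp a cross-section of feature values to its empirical [lo, hi]
   quantiles before z-scoring, reducing outlier influence. Assumes no
   NaNs in the input. *)
let winsorise ?(lo = 0.01) ?(hi = 0.99) values =
  let sorted = Array.copy values in
  Array.sort compare sorted;
  let n = Array.length sorted in
  let q p = sorted.(int_of_float (p *. float_of_int (n - 1))) in
  let lo_v = q lo and hi_v = q hi in
  Array.map (fun x -> Float.min hi_v (Float.max lo_v x)) values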
Regularisation is not optional in financial ML — it is essential. Ridge regression shrinks all coefficients toward zero, reducing variance at the cost of some bias. LASSO achieves sparsity, automatically setting irrelevant features to exactly zero. Both are preferable to OLS in the typical overparameterised setting of cross-sectional equity prediction (many features, noisy labels, non-stationarity). Neural networks require dropout, weight decay, and early stopping to achieve generalisation.
The most important safeguard is rigorous out-of-sample testing. A backtest evaluated only in-sample is meaningless for financial ML because it will almost always appear profitable due to overfitting. Walk-forward validation — training on a rolling window and testing on the next period, never looking forward — is the minimum standard. Cross-validation must be time-series-aware, using purged and embargoed folds to avoid information leakage between training and test sets.
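A minimal walk-forward split generator, as a sketch (the window lengths and embargo size are the caller's choice, not prescribed values):

(* Generate (train_start, train_end, test_start, test_end) index ranges
   for walk-forward validation. The [embargo] gap between train and test
   limits leakage from overlapping labels. *)
let walk_forward_splits ~n_obs ~train_len ~test_len ~embargo =
  let rec go start acc =
    let train_end = start + train_len - 1 in
    let test_start = train_end + 1 + embargo in
    let test_end = test_start + test_len - 1 in
    if test_end >= n_obs then List.rev acc
    else go (start + test_len) ((start, train_end, test_start, test_end) :: acc)
  in
  go 0 []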
Exercises
27.1 Train a ridge regression model to predict 1-month stock returns using momentum, value, and volatility features. Report in-sample and out-of-sample $R^2$.
27.2 Implement LASSO on a 50-feature dataset with 30 irrelevant features. Study how the solution path varies with $\lambda$.
27.3 Build a neural network pricer trained on 10,000 Black–Scholes call prices with inputs $(S/K, T, \sigma, r)$. Measure pricing error on 1,000 out-of-sample points.
27.4 Apply the neural pricer to fit a market implied vol surface (treat market prices as ground truth). Compare to spline interpolation.