Chapter 31 — OxCaml for High-Performance Quantitative Finance
"The difference between a good quant library and a great one is not the algorithm — it's the nanoseconds."
After this chapter you will be able to:
- Install OxCaml and understand how it relates to upstream OCaml and Jane Street's production toolchain
- Use stack allocation (local_) to eliminate GC pauses in hot pricing loops
- Apply modes and uniqueness annotations to write provably data-race-free concurrent risk calculations
- Represent dense float arrays using unboxed layouts for cache-friendly vectorisable code
- Use SIMD intrinsics to batch-price options with AVX2 instructions
- Combine these features into a GC-pause-free Monte Carlo engine suitable for production use
OxCaml is Jane Street's extended OCaml compiler — the same compiler that powers their production trading systems, which execute billions of dollars of financial transactions daily. It is also open source and designed so that every valid OCaml program is also a valid OxCaml program. This means you can adopt OxCaml incrementally, adding performance annotations only where they matter.
For quantitative finance, OxCaml addresses the fundamental tension that makes OCaml attractive but sometimes frustrating in latency-sensitive contexts: the garbage collector. Standard OCaml's GC is fast and sophisticated by functional language standards, but in a system pricing millions of options per second or running real-time risk calculations, a GC pause of even a few milliseconds can cause order timeouts, missed hedges, or regulatory reporting delays. OxCaml provides tools to eliminate allocation on hot paths entirely — not by abandoning safety, but by using the type system to track and enforce allocation discipline.
The other major force behind OxCaml's relevance to quant finance is parallelism. OCaml 5 introduced domains (true parallelism), but writing correct concurrent code remains hard: data races cause silent, intermittent corruption. OxCaml's mode system adds compile-time data-race freedom. For a risk engine computing Greeks across thousands of positions in parallel, this means moving from "we hope our locking is correct" to "the compiler certifies it".
31.1 Installing OxCaml
OxCaml is distributed via opam on a dedicated opam repository. The switch name 5.2.0+ox indicates it is based on OCaml 5.2 with OxCaml extensions.
# Update opam metadata
opam update --all
# Create a new OxCaml switch (takes 10–20 min to compile)
opam switch create 5.2.0+ox \
--repos ox=git+https://github.com/oxcaml/opam-repository.git,default
eval $(opam env --switch 5.2.0+ox)
# Install developer tooling
opam install -y ocamlformat merlin ocaml-lsp-server utop core core_unix
Once installed, the compiler runs as ocamlopt as normal — you simply gain access to OxCaml syntax extensions. All standard OCaml libraries work unchanged. Jane Street libraries (Core, Async, etc.) are released in both OxCaml-extended and standard forms.
Platform support: x86_64 Linux and macOS, ARM64 macOS. Windows users should use WSL 2. The SIMD extension (§31.5) requires x86_64.
Dune project setup: OxCaml integrates with dune without changes for basic use. To enable beta extensions (comprehensions, SIMD), add a flags field to your dune library stanza:
(library
(name quant_lib)
(flags (:standard -extension-universe beta)))
31.2 Stack Allocation: Eliminating GC Pressure
In standard OCaml, every let x = { ... } that creates a heap record causes an allocation that the GC must eventually collect. In a Monte Carlo engine running $10^7$ simulated paths, each with intermediate payoff structs and Greeks records, the allocator is under enormous pressure. Stack allocation in OxCaml lets you place short-lived values on the call stack — deallocated at function return, with zero GC involvement.
The local_ Keyword
The key annotation is local_: it declares that a value lives on the stack and will not escape the current stack frame.
(* Without OxCaml: every payoff record is heap-allocated *)
let black_scholes_call ~s ~k ~r ~t ~sigma =
let d1 = (log (s /. k) +. (r +. 0.5 *. sigma *. sigma) *. t)
/. (sigma *. sqrt t) in
let d2 = d1 -. sigma *. sqrt t in
s *. norm_cdf d1 -. k *. exp (-. r *. t) *. norm_cdf d2
(* With OxCaml: intermediate values are annotated local_ *)
let black_scholes_call_local ~s ~k ~r ~t ~sigma =
let local_ sqrt_t = sqrt t in
let local_ d1 = (log (s /. k) +. (r +. 0.5 *. sigma *. sigma) *. t)
/. (sigma *. sqrt_t) in
let local_ d2 = d1 -. sigma *. sqrt_t in
s *. norm_cdf d1 -. k *. exp (-. r *. t) *. norm_cdf d2
For scalar floats like these, local_ usually makes no observable difference: the native compiler already keeps intermediate floats unboxed in registers. The benefit is larger for records and tuples that would otherwise be heap-allocated:
type greeks = {
delta : float;
gamma : float;
vega : float;
theta : float;
rho : float;
}
(* Heap-allocated Greeks record — one allocation per call *)
let bs_greeks ~s ~k ~r ~t ~sigma : greeks =
let d1 = (log (s /. k) +. (r +. 0.5 *. sigma *. sigma) *. t)
/. (sigma *. sqrt t) in
let d2 = d1 -. sigma *. sqrt t in
let nd1 = norm_cdf d1 in
let nd2 = norm_cdf d2 in
let n_d1_pdf = norm_pdf d1 in
{ delta = nd1;
gamma = n_d1_pdf /. (s *. sigma *. sqrt t);
vega = s *. n_d1_pdf *. sqrt t /. 100.0;
theta = (-. s *. n_d1_pdf *. sigma /. (2.0 *. sqrt t)
-. r *. k *. exp (-. r *. t) *. nd2) /. 365.0;
rho = k *. t *. exp (-. r *. t) *. nd2 /. 100.0 }
(* Stack-allocated Greeks record — zero GC pressure *)
let bs_greeks_local ~s ~k ~r ~t ~sigma : local_ greeks =
let d1 = (log (s /. k) +. (r +. 0.5 *. sigma *. sigma) *. t)
/. (sigma *. sqrt t) in
let d2 = d1 -. sigma *. sqrt t in
let nd1 = norm_cdf d1 in
let nd2 = norm_cdf d2 in
let n_d1_pdf = norm_pdf d1 in
local_
{ delta = nd1;
gamma = n_d1_pdf /. (s *. sigma *. sqrt t);
vega = s *. n_d1_pdf *. sqrt t /. 100.0;
theta = (-. s *. n_d1_pdf *. sigma /. (2.0 *. sqrt t)
-. r *. k *. exp (-. r *. t) *. nd2) /. 365.0;
rho = k *. t *. exp (-. r *. t) *. nd2 /. 100.0 }
The compiler enforces at the type level that local_ values do not escape the region they were allocated in: if you try to store a stack-allocated value in a global reference, or return it from a function that is not explicitly declared to return a local value (as bs_greeks_local is, through its local_ return annotation), the code will not compile. This is the essential safety guarantee: you get stack performance without the risk of use-after-free.
When Stack Allocation Matters
The gains from local_ depend on what fraction of time the function spends on allocation and GC. Profiling is essential. In practice, the largest wins in quant code come from:
| Use case | Allocation eliminated |
|---|---|
| Per-path Monte Carlo state records | One allocation per simulated path × 10⁷ = 10M allocs |
| Intermediate Greeks records during hedging sweeps | One alloc per instrument × portfolio size |
| Payoff decomposition structs in exotic pricing | Multiple allocs per node in tree/PDE |
| Risk factor vectors in scenario analysis | One alloc per bump per position |
A Monte Carlo engine pricing a 10,000-instrument portfolio across 100,000 scenarios, with 5 intermediate allocs per path, creates 5 × 10⁹ short-lived objects without local_. With stack allocation on the hot paths, this drops to near zero.
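Since the gain depends on how much a function really allocates, it helps to measure that before annotating anything. The following is a minimal sketch in standard OCaml (no OxCaml extensions). The two-field record greeks2 and the helper words_per_call are hypothetical names introduced only for this illustration; the sketch uses Gc.minor_words to estimate the minor-heap words allocated per call.

```ocaml
(* Hypothetical cut-down Greeks record, used only for this measurement. *)
type greeks2 = { delta : float; gamma : float }

(* [@inline never] keeps the optimiser from removing the allocation. *)
let[@inline never] make_greeks x = { delta = x; gamma = 2.0 *. x }

(* Average number of minor-heap words allocated per call of [f]. *)
let words_per_call ~n f =
  let before = Gc.minor_words () in
  for i = 1 to n do
    ignore (Sys.opaque_identity (f (float_of_int i)))
  done;
  let after = Gc.minor_words () in
  (after -. before) /. float_of_int n

let () =
  let g = make_greeks 1.0 in
  let w = words_per_call ~n:1_000_000 make_greeks in
  Printf.printf "delta=%.1f gamma=%.1f, minor words per call: %.2f\n"
    g.delta g.gamma w
```

A result of a few words per call, multiplied by the number of calls in a scenario run, gives the allocation volume that a stack-allocated variant could avoid. The exact figure depends on the compiler version and flags.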
31.3 Modes and Uniqueness: Data-Race-Free Concurrency
OCaml 5's domains enable true parallelism. OxCaml's mode system makes concurrent programs provably race-free at compile time, without requiring locks on the critical path.
The Problem with Shared Mutable Data
In a parallel Monte Carlo engine, each domain might update a shared accumulator for the option price estimate. Without synchronisation, two domains reading and writing the same memory word can interleave so that updates are lost. OCaml 5's memory model keeps such a race memory-safe, but the resulting estimate is still silently wrong. The standard fix, a lock, serialises the critical section and can become a bottleneck.
OxCaml's mode system introduces three key modes:
- global: the default. The value may be shared across domains.
- local: the value is owned by one domain and cannot be shared.
- unique: there is exactly one reference to the value, so it can be mutated safely without locking, because no other thread can see it.
The compiler tracks modes through the type system and rejects programs where a local or unique value escapes to another domain.
(* A per-domain accumulator: unique ownership, no locks needed *)
type accumulator = {
mutable sum : float;
mutable sum_sq : float;
mutable count : int;
}
let make_accumulator () : unique_ accumulator =
unique_ { sum = 0.0; sum_sq = 0.0; count = 0 }
(* Safe to mutate: compiler proves no alias *)
let add_sample (acc : unique_ accumulator) x =
acc.sum <- acc.sum +. x;
acc.sum_sq <- acc.sum_sq +. x *. x;
acc.count <- acc.count + 1
(* Merge two accumulators — only valid when both are unique *)
let merge (a : unique_ accumulator) (b : unique_ accumulator) : unique_ accumulator =
unique_
{ sum = a.sum +. b.sum;
sum_sq = a.sum_sq +. b.sum_sq;
count = a.count + b.count }
The unique_ annotation tells the compiler that this value has exactly one owner. The key invariant is that passing a unique_ value consumes it: you cannot use a after passing it to merge, because ownership has transferred. This is closely analogous to Rust's ownership system, but integrated into OCaml's type inference rather than requiring explicit lifetime annotations throughout.
Parallel Monte Carlo with Mode Safety
let parallel_mc_price ~n_domains ~paths_per_domain ~pricing_fn =
(* Each domain gets its own unique accumulator — no sharing *)
let accumulators = Array.init n_domains (fun _ -> make_accumulator ()) in
let domains = Array.init n_domains (fun i ->
Domain.spawn (fun () ->
let acc = accumulators.(i) in (* each domain owns its accumulator *)
for _ = 1 to paths_per_domain do
let payoff = pricing_fn () in
add_sample acc payoff
done
)
) in
Array.iter Domain.join domains;
(* Merge all accumulators with a sequential left fold *)
let final = Array.fold_left merge (make_accumulator ()) accumulators in
let mean = final.sum /. float_of_int final.count in
let var = final.sum_sq /. float_of_int final.count -. mean *. mean in
(mean, sqrt (var /. float_of_int final.count)) (* price, standard error *)
The compiler verifies that accumulators.(i) is not accessible from any other domain after the spawn — no locks required, no data race possible.
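For comparison, the same ownership discipline can be approximated in standard OCaml 5 by creating each accumulator inside its own domain and handing it back through Domain.join. The sketch below assumes nothing beyond the standard Domain module. The names run_domain, parallel_mean and empty are hypothetical, and exclusivity holds here only by construction: no compiler check enforces it, which is the gap that unique_ is meant to close.

```ocaml
type accumulator = {
  mutable sum : float;
  mutable sum_sq : float;
  mutable count : int;
}

let empty () = { sum = 0.0; sum_sq = 0.0; count = 0 }

(* Runs inside one domain: the accumulator is created here, so no other
   domain holds a reference to it while it is being mutated. *)
let run_domain ~paths ~sample =
  let acc = empty () in
  for i = 1 to paths do
    let x = sample i in
    acc.sum <- acc.sum +. x;
    acc.sum_sq <- acc.sum_sq +. (x *. x);
    acc.count <- acc.count + 1
  done;
  acc

let merge a b =
  { sum = a.sum +. b.sum;
    sum_sq = a.sum_sq +. b.sum_sq;
    count = a.count + b.count }

(* Returns (mean, standard error of the mean). *)
let parallel_mean ~n_domains ~paths ~sample =
  let workers =
    List.init n_domains (fun _ ->
        Domain.spawn (fun () -> run_domain ~paths ~sample))
  in
  let total =
    List.fold_left (fun acc d -> merge acc (Domain.join d)) (empty ()) workers
  in
  let n = float_of_int total.count in
  let mean = total.sum /. n in
  let var = (total.sum_sq /. n) -. (mean *. mean) in
  (mean, sqrt (var /. n))

let () =
  (* Deterministic stand-in for a payoff: alternates between 0 and 1. *)
  let sample i = float_of_int (i mod 2) in
  let mean, se = parallel_mean ~n_domains:4 ~paths:1000 ~sample in
  Printf.printf "mean = %.4f, std error = %.6f\n" mean se
```

Returning the accumulator from the spawned closure transfers it to the parent only after the worker has finished, so there is never a moment when two domains mutate the same record.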
31.4 Unboxed Layouts: Cache-Friendly Float Arrays
Standard OCaml represents float array with a special optimisation (arrays of floats are stored unboxed), but for records containing floats, each record is a boxed heap object. An array of 10,000 {price: float; delta: float; gamma: float} records involves 10,000 separate heap allocations, scattered across memory, destroying cache locality.
OxCaml's layouts extension allows you to declare structs with unboxed float fields — stored as contiguous flat memory, like a C struct array or NumPy array. This has dramatic implications for cache performance in risk calculations.
(* Standard OCaml: each option_state is a separate boxed allocation *)
type option_state = {
price : float;
delta : float;
gamma : float;
vega : float;
}
(* Array of 10000 of these: 10000 heap objects, poor cache locality *)
(* OxCaml unboxed record: stored flat like a C struct *)
type option_state_unboxed : unboxed_product = {
price : float#;
delta : float#;
gamma : float#;
vega : float#;
}
(* Array of 10000: one contiguous block of 4 × 10000 × 8 bytes = 320KB *)
(* Allocate a portfolio of N positions as one flat array *)
let make_portfolio n : option_state_unboxed array =
Array.init n (fun _ ->
#{ price = #0.0; delta = #0.0; gamma = #0.0; vega = #0.0 }
)
The float# syntax denotes an unboxed float, stored as a raw 64-bit double rather than as a pointer to a boxed heap value. An option_state_unboxed array is a single contiguous block of memory, laid out exactly as a C struct array. Iterating over 10,000 positions touches a 320KB contiguous buffer, which fits in L2 cache on most processors, instead of chasing 10,000 pointers across the heap as in the standard representation.
Performance implications for quant code:
| Operation | Boxed records | Unboxed records |
|---|---|---|
| Portfolio sweep (Greeks update) | Cache miss per position | Sequential cache lines |
| Scenario analysis (1000 scenarios × 10k positions) | ~10 GC cycles triggered | 0 GC allocations |
| Risk aggregation (sum across portfolio) | Load + deref per element | SIMD-vectorisable |
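The flat layout described above can be imitated in standard OCaml with a single float array and a stride of four, which makes the memory arithmetic concrete. This is a rough sketch under that assumption, not OxCaml's actual representation. The offset names and the helpers set, get and total are hypothetical.

```ocaml
(* Field offsets inside each 4-float slot (hypothetical layout). *)
let stride = 4
let price_off = 0
let delta_off = 1
let gamma_off = 2
let vega_off = 3

(* One contiguous block holding n positions. *)
let make_portfolio n : float array = Array.make (stride * n) 0.0

let set (p : float array) i off v = p.((stride * i) + off) <- v
let get (p : float array) i off = p.((stride * i) + off)

(* Sum one field across the portfolio with a sequential strided walk. *)
let total (p : float array) off =
  let n = Array.length p / stride in
  let acc = ref 0.0 in
  for i = 0 to n - 1 do
    acc := !acc +. get p i off
  done;
  !acc

let () =
  let p = make_portfolio 10_000 in
  for i = 0 to 9_999 do
    set p i price_off 10.0;
    set p i delta_off 0.5;
    set p i gamma_off 0.01;
    set p i vega_off 0.2
  done;
  Printf.printf "total delta = %.1f, payload = %d bytes\n"
    (total p delta_off) (Array.length p * 8)
```

The drawback of this manual encoding is that field access is untyped index arithmetic. The unboxed record keeps the same layout while retaining named, type-checked fields.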
31.5 SIMD: Vectorised Option Pricing
Modern x86-64 CPUs can operate on four double-precision floats at once using 256-bit AVX2 SIMD instructions. OxCaml exposes SIMD through a low-level module Stdlib_upstream_compatible.Float64x4, allowing you to price 4 options at once with one set of CPU instructions.
(* SIMD-aware Black-Scholes: price 4 options in one vectorised sweep *)
(* Requires x86_64 and -extension-universe beta in dune flags *)
module V = Float64x4 (* 4-wide SIMD vector of float64 *)
(** Vectorised norm_cdf using rational approximation — 4 values at once *)
let norm_cdf_v x =
(* Abramowitz & Stegun rational approximation, vectorised *)
let p = V.splat 0.2316419 in
let b1 = V.splat 0.319381530 in
let b2 = V.splat (-0.356563782) in
let b3 = V.splat 1.781477937 in
let b4 = V.splat (-1.821255978) in
let b5 = V.splat 1.330274429 in
let abs_x = V.abs x in
let t = V.div (V.splat 1.0) (V.add (V.splat 1.0) (V.mul p abs_x)) in
let poly = V.add (V.mul (V.add (V.mul (V.add (V.mul (V.add (V.mul b5 t) b4) t) b3) t) b2) t) b1 in
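(* Standard normal density: exp(-x^2/2) / sqrt(2*pi) *)
let pdf = V.mul (V.splat 0.3989422804014327) (V.exp (V.neg (V.mul (V.mul x x) (V.splat 0.5)))) in
let tail = V.mul poly pdf in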
(* Flip for x < 0 using blend *)
let cdf_pos = V.sub (V.splat 1.0) (V.mul tail t) in
let cdf_neg = V.mul tail t in
V.blend (V.cmp_lt x (V.splat 0.0)) cdf_neg cdf_pos
(** Price 4 European calls simultaneously *)
let bs_call_v ~(s : V.t) ~(k : V.t) ~(r : V.t) ~(t : V.t) ~(sigma : V.t) : V.t =
let sqrt_t = V.sqrt t in
let log_sk = V.log (V.div s k) in
let half_v2 = V.mul (V.mul sigma sigma) (V.splat 0.5) in
let d1 = V.div (V.add log_sk (V.mul (V.add r half_v2) t))
(V.mul sigma sqrt_t) in
let d2 = V.sub d1 (V.mul sigma sqrt_t) in
let nd1 = norm_cdf_v d1 in
let nd2 = norm_cdf_v d2 in
let disc = V.exp (V.neg (V.mul r t)) in
V.sub (V.mul s nd1) (V.mul (V.mul k disc) nd2)
(** Batch-price a portfolio of N options (N must be divisible by 4) *)
let price_portfolio spots strikes rates maturities vols =
let n = Array.length spots in
assert (n mod 4 = 0);
let prices = Array.make n 0.0 in
let i = ref 0 in
while !i < n do
let s = V.of_array spots !i in
let k = V.of_array strikes !i in
let r = V.of_array rates !i in
let t = V.of_array maturities !i in
let v = V.of_array vols !i in
let p = bs_call_v ~s ~k ~r ~t ~sigma:v in
V.store_array prices !i p;
i := !i + 4
done;
prices
Operations such as V.add, V.mul, V.div, V.sqrt and V.blend each correspond to a single AVX2 instruction operating on all four lanes simultaneously, and V.splat to a broadcast. V.exp and V.log have no hardware instruction, so they must be provided as vectorised polynomial approximations and cost many instructions per call. For a portfolio of 10,000 options, the SIMD version runs 10,000/4 = 2,500 loop iterations instead of 10,000, giving a theoretical 4× speedup for the pricing kernel (before memory bandwidth limits).
Measured speedups for Black-Scholes pricing with AVX2 on a modern x86-64 core:
- Scalar OCaml: ~100M prices/second
- SIMD AVX2 (4-wide): ~350M prices/second (3.5× — slightly less than 4× due to norm_cdf overhead)
- SIMD AVX-512 (8-wide, server CPUs): ~600M prices/second
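A vector kernel is easiest to trust when it can be checked lane by lane against a scalar reference. The sketch below is such a reference in standard OCaml, using the same Abramowitz and Stegun coefficients as norm_cdf_v and including the 1/sqrt(2π) density factor. The function names norm_cdf, bs_call and price_portfolio_scalar are chosen for this illustration. The scalar loop also accepts any portfolio size, so it could serve as the remainder path when N is not divisible by 4.

```ocaml
(* Scalar Abramowitz & Stegun 26.2.17 approximation of the normal CDF. *)
let norm_cdf x =
  let t = 1.0 /. (1.0 +. (0.2316419 *. Float.abs x)) in
  let poly =
    t *. (0.319381530
    +. t *. (-0.356563782
    +. t *. (1.781477937
    +. t *. (-1.821255978
    +. t *. 1.330274429))))
  in
  (* 0.3989... = 1 / sqrt(2 * pi) *)
  let tail = 0.3989422804014327 *. exp (-0.5 *. x *. x) *. poly in
  if x < 0.0 then tail else 1.0 -. tail

(* Scalar Black-Scholes European call price. *)
let bs_call ~s ~k ~r ~t ~sigma =
  let sqrt_t = sqrt t in
  let d1 =
    (log (s /. k) +. ((r +. (0.5 *. sigma *. sigma)) *. t))
    /. (sigma *. sqrt_t)
  in
  let d2 = d1 -. (sigma *. sqrt_t) in
  (s *. norm_cdf d1) -. (k *. exp (-.r *. t) *. norm_cdf d2)

(* Price any number of options, one at a time. *)
let price_portfolio_scalar spots strikes rates maturities vols =
  Array.init (Array.length spots) (fun i ->
      bs_call ~s:spots.(i) ~k:strikes.(i) ~r:rates.(i)
        ~t:maturities.(i) ~sigma:vols.(i))

let () =
  let p = bs_call ~s:100.0 ~k:100.0 ~r:0.05 ~t:1.0 ~sigma:0.2 in
  Printf.printf "ATM call: %.4f, N(0) = %.7f\n" p (norm_cdf 0.0)
```

The approximation has an absolute error below roughly 1e-7 in the CDF, so a tolerance of that order is appropriate when comparing vector output against this reference.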
31.6 Labeled Tuples: Cleaner Derivative Abstractions
OxCaml introduces labeled tuples, a quality-of-life feature that gives names to tuple fields without defining a full record type. This is particularly convenient for ad-hoc financial parameters:
(* Without labeled tuples: which float is which? *)
let pv = price_swap (5.0, 0.04, 10.0, 2)
(* With labeled tuples: each component is named at the call site *)
let pv = price_swap (~notional:5.0, ~fixed_rate:0.04, ~maturity:10.0, ~pay_freq:2)
(** A bond represented as a labeled tuple — no separate type needed *)
type bond_params = (notional:float * coupon:float * maturity:float * freq:int)
let bond_price ~discount (params : bond_params) =
let (~notional, ~coupon, ~maturity, ~freq) = params in
let n = freq * int_of_float maturity in
let tau = 1.0 /. float_of_int freq in
let coupon_pv = ref 0.0 in
for i = 1 to n do
let t = float_of_int i *. tau in
coupon_pv := !coupon_pv +. coupon *. tau *. notional *. discount t
done;
!coupon_pv +. notional *. discount maturity
(* Call site: the labels make each component's meaning explicit *)
let () =
let p = bond_price ~discount:(fun t -> exp (-0.04 *. t))
(~notional:1_000_000.0, ~coupon:0.05, ~maturity:10.0, ~freq:2) in
Printf.printf "Bond price: %.2f\n" p
Labeled tuples were upstreamed in OCaml 5.4, so code using them also compiles with standard OCaml from that release onward.
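Read as a formula, bond_price computes the present value of a fixed-coupon bond. Writing $D$ for the ~discount function, $N$ for the notional, $c$ for the annual coupon rate, $f$ for the payment frequency and $T$ for the maturity in years, the loop and the final line amount to:

$$
P = N\left( c\,\tau \sum_{i=1}^{n} D(t_i) + D(T) \right),
\qquad \tau = \frac{1}{f}, \quad t_i = i\,\tau, \quad n = f\,\lfloor T \rfloor .
$$

Because $n$ is derived from int_of_float maturity, a fractional final year contributes no coupon payments in this version; only the principal is discounted at the exact $T$.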
31.7 Immutable Arrays: Safer Market Data
OxCaml provides iarray — immutable arrays — usable across domains without any synchronisation, because they can never be mutated after creation. This is ideal for market data (yield curves, vol surfaces, correlation matrices) that is computed at the start of a risk run and shared read-only across parallel pricing domains.
(* Immutable yield curve: safe to share across all pricing domains *)
let build_yield_curve maturities rates : float iarray =
(* Compute discount factors from continuously compounded zero rates *)
let n = Array.length maturities in
Iarray.init n (fun i ->
exp (-. rates.(i) *. maturities.(i))
)
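(* All domains can read this curve simultaneously without locks. *)
(* One domain per bond keeps the example short; production code would split the bonds across a fixed pool of domains. *)
let parallel_price_bonds curve bonds =
Array.map (fun bond ->
Domain.spawn (fun () ->
bond_price bond ~discount:(fun t ->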
(* iarray access is safe from any domain *)
interpolate curve t)
)
) bonds
|> Array.map Domain.join
In contrast, a mutable float array shared across domains is only safe while no domain writes to it, and nothing in its type enforces that: either every access is guarded by a lock or atomic operation, or you accept the risk of data races. With iarray, the compiler prevents any mutation attempt, eliminating the problem at the source.
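Spawning one domain per instrument does not scale to large portfolios, so a common arrangement is to split the work into a few contiguous chunks. The sketch below shows that pattern in standard OCaml 5 with a curve that is read-only by convention (plain float arrays stand in for iarray). The curve record, flat_curve, interpolate, price_zero and price_chunked are all hypothetical names for this illustration, and the instrument is simplified to a zero-coupon bond.

```ocaml
(* Read-only by convention; an iarray would make this a compile-time fact. *)
type curve = { maturities : float array; discounts : float array }

let flat_curve ~rate maturities =
  { maturities; discounts = Array.map (fun t -> exp (-.rate *. t)) maturities }

(* Piecewise-linear interpolation with flat extrapolation at both ends. *)
let interpolate c t =
  let n = Array.length c.maturities in
  if t <= c.maturities.(0) then c.discounts.(0)
  else if t >= c.maturities.(n - 1) then c.discounts.(n - 1)
  else begin
    let i = ref 0 in
    while c.maturities.(!i + 1) < t do incr i done;
    let t0 = c.maturities.(!i) and t1 = c.maturities.(!i + 1) in
    let w = (t -. t0) /. (t1 -. t0) in
    c.discounts.(!i) +. (w *. (c.discounts.(!i + 1) -. c.discounts.(!i)))
  end

(* Hypothetical instrument: a zero-coupon bond as (notional, maturity). *)
let price_zero c (notional, maturity) = notional *. interpolate c maturity

(* Split the bonds into n_domains contiguous chunks, one domain per chunk. *)
let price_chunked ~n_domains c bonds =
  let n = Array.length bonds in
  let chunk = (n + n_domains - 1) / n_domains in
  let workers =
    List.init n_domains (fun d ->
        Domain.spawn (fun () ->
            let lo = min n (d * chunk) in
            let hi = min n (lo + chunk) in
            Array.init (hi - lo) (fun j -> price_zero c bonds.(lo + j))))
  in
  Array.concat (List.map Domain.join workers)

let () =
  let c = flat_curve ~rate:0.04 [| 1.0; 2.0; 5.0; 10.0 |] in
  let bonds = Array.init 10 (fun i -> (100.0, float_of_int (i + 1))) in
  let par = price_chunked ~n_domains:4 c bonds in
  let seq = Array.map (price_zero c) bonds in
  Printf.printf "parallel matches sequential: %b\n" (par = seq)
```

Each worker only reads the curve and writes to an array it created itself, so the parallel result should equal the sequential one element for element.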
31.8 Putting It Together: A GC-Pause-Free Monte Carlo Engine
This section combines stack allocation, unique ownership, unboxed layouts, and parallel domains into a complete GC-pause-free Monte Carlo engine for pricing a basket option.
(** GC-pause-free parallel Monte Carlo for basket option pricing *)
(* Unboxed path state: stored flat in memory, no GC involvement *)
type path_state : unboxed_product = {
log_s1 : float#;
log_s2 : float#;
log_s3 : float#;
}
(* Per-domain accumulator: an ordinary boxed record with mutable fields, uniquely owned, no locks *)
type domain_acc = {
mutable sum : float;
mutable count : int;
}
let make_acc () : unique_ domain_acc =
unique_ { sum = 0.0; count = 0 }
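(* Model parameters: drift and volatility per asset (field names as used below) *)
type gbm_params = {
mu1 : float; v1 : float;
mu2 : float; v2 : float;
mu3 : float; v3 : float;
}
(** Simulate one GBM step, stack-allocated.
std_normal : unit -> float is assumed to be a standard normal sampler defined elsewhere.
For readability the float# fields are combined with the boxed operators (+., *.); a real build needs unboxed-float arithmetic or explicit conversions. *)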
let gbm_step ~(state : local_ path_state) ~dt ~(params : local_ gbm_params)
: local_ path_state =
local_
#{ log_s1 = state.log_s1 +. (params.mu1 -. 0.5 *. params.v1 *. params.v1) *. dt
+. params.v1 *. sqrt dt *. std_normal ();
log_s2 = state.log_s2 +. (params.mu2 -. 0.5 *. params.v2 *. params.v2) *. dt
+. params.v2 *. sqrt dt *. std_normal ();
log_s3 = state.log_s3 +. (params.mu3 -. 0.5 *. params.v3 *. params.v3) *. dt
+. params.v3 *. sqrt dt *. std_normal () }
(** Basket payoff: max(w1*S1 + w2*S2 + w3*S3 - K, 0) *)
let basket_payoff ~(state : local_ path_state) ~w1 ~w2 ~w3 ~strike ~s0 =
let s1 = s0 *. exp state.log_s1 in
let s2 = s0 *. exp state.log_s2 in
let s3 = s0 *. exp state.log_s3 in
Float.max 0.0 (w1 *. s1 +. w2 *. s2 +. w3 *. s3 -. strike)
(** Run N paths on one domain — zero heap allocation per path *)
let run_domain_paths ~n_paths ~n_steps ~dt ~params ~payoff_fn
~(acc : unique_ domain_acc) =
for _ = 1 to n_paths do
(* Initial state: stack-allocated, never touches heap *)
let local_ state = #{ log_s1 = #0.0; log_s2 = #0.0; log_s3 = #0.0 } in
(* Evolve path: each step is stack-allocated, previous step discarded *)
let local_ final_state =
let local_ s = ref state in
for _ = 1 to n_steps do
s := gbm_step ~state:!s ~dt ~params
done;
!s
in
let payoff = payoff_fn ~state:final_state in
acc.sum <- acc.sum +. payoff;
acc.count <- acc.count + 1
done
(** Main entry: parallel Monte Carlo with N domains *)
let price_basket_mc ~n_domains ~paths_per_domain ~n_steps ~maturity ~params
~w1 ~w2 ~w3 ~strike ~s0 ~r =
let dt = maturity /. float_of_int n_steps in
let payoff_fn ~state = basket_payoff ~state ~w1 ~w2 ~w3 ~strike ~s0 in
let accs = Array.init n_domains (fun _ -> make_acc ()) in
let domains = Array.init n_domains (fun i ->
Domain.spawn (fun () ->
run_domain_paths ~n_paths:paths_per_domain ~n_steps ~dt
~params ~payoff_fn ~acc:accs.(i)
)
) in
Array.iter Domain.join domains;
(* Merge: sum all domain accumulators *)
let total_sum = Array.fold_left (fun s a -> s +. a.sum) 0.0 accs in
let total_count = Array.fold_left (fun c a -> c + a.count) 0 accs in
let mean = total_sum /. float_of_int total_count in
mean *. exp (-. r *. maturity) (* discount to present value *)
This engine processes each Monte Carlo path entirely on the stack. The critical properties are:
- Zero heap allocation per path: local_ state records live on the call stack and are freed on return
- No GC pauses during pricing: the GC cannot pause a path mid-way because there is nothing for it to collect on the hot path
- No data races: unique_ accumulators are owned by exactly one domain; the compiler certifies this
- Cache-friendly: unboxed path state is stored in registers or on the stack, not in scattered heap objects
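The engine above calls std_normal without defining it. One possible definition, shown here as a sketch in standard OCaml, is the Box-Muller transform over a per-domain Random.State.t. Unlike the unit-argument call in the engine, this hypothetical version takes the generator state explicitly, and the helper sample_moments exists only to sanity-check the output.

```ocaml
(* Hypothetical sampler: Box-Muller transform on a caller-supplied PRNG state.
   Each domain should own its own Random.State.t. *)
let std_normal (st : Random.State.t) : float =
  (* Clamp away from 0 so that log stays finite. *)
  let u1 = Float.max epsilon_float (Random.State.float st 1.0) in
  let u2 = Random.State.float st 1.0 in
  sqrt (-2.0 *. log u1) *. cos (2.0 *. Float.pi *. u2)

(* Sample mean and variance of n draws, for a quick sanity check. *)
let sample_moments ~seed ~n =
  let st = Random.State.make [| seed |] in
  let sum = ref 0.0 and sum_sq = ref 0.0 in
  for _i = 1 to n do
    let z = std_normal st in
    sum := !sum +. z;
    sum_sq := !sum_sq +. (z *. z)
  done;
  let mean = !sum /. float_of_int n in
  (mean, (!sum_sq /. float_of_int n) -. (mean *. mean))

let () =
  let mean, var = sample_moments ~seed:42 ~n:100_000 in
  Printf.printf "sample mean = %.4f, sample variance = %.4f\n" mean var
```

This version returns a boxed float and discards the second Box-Muller variate, so it is a correctness baseline rather than an allocation-free sampler. A production engine would likely draw normals in batches into a preallocated buffer.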
31.9 OxCaml vs Standard OCaml: When to Use Each
OxCaml's extensions are pay-as-you-go: you can use as much or as little as you need, and all standard OCaml code continues to work unchanged. The decision of when to adopt each feature follows a straightforward principle: instrument first, optimise second.
| Situation | Recommendation |
|---|---|
| Library and analysis code | Standard OCaml; OxCaml is fully compatible |
| Medium-frequency trading (seconds to minutes) | Standard OCaml; GC pauses irrelevant |
| High-frequency execution (microseconds) | local_ on hot paths to eliminate GC |
| Risk aggregation across large portfolios | Unboxed layouts for cache efficiency |
| Parallel Monte Carlo / scenario analysis | Unique accumulators + Domain parallelism |
| Real-time portfolio pricing (10M+ options/sec) | SIMD (AVX2) for batch pricing |
| Shared market data across parallel pricers | iarray for lock-free read sharing |
A practical migration path: start with standard OCaml 5 (chapters 19–30 of this book), profile your production system under realistic load, identify the top 3–5 hotspots, and apply OxCaml annotations to those functions only. A typical outcome is that 5–10% of the codebase requires OxCaml annotations to achieve 80–90% of the possible performance improvement.
31.10 The Road to Upstream OCaml
OxCaml explicitly targets eventual upstreaming of all its extensions. Some have already arrived or are scheduled:
| Extension | Status |
|---|---|
| Immutable arrays (iarray) | OCaml 5.4 |
| Labeled tuples | OCaml 5.4 |
| Include-functor | OCaml 5.5 |
| Polymorphic parameters | OCaml 5.5 |
| Module strengthening | OCaml 5.5 |
| Stack allocation (local_) | In progress; target OCaml 5.6+ |
| Modes and uniqueness | Research phase; timeline uncertain |
| Unboxed layouts | Active design; timeline uncertain |
| SIMD | Awaiting standardisation |
For production quant systems committing to OxCaml today, the relevant practical question is stability. OxCaml makes no promises of backwards compatibility for its extensions — a feature's syntax or semantics may change between releases. Jane Street's own production code tolerates this via internal tooling that migrates syntax automatically. For external users, the safest approach is to pin to a specific OxCaml version and update deliberately.
31.11 Chapter Summary
OxCaml extends OCaml with four categories of tools that are directly relevant to quantitative finance: stack allocation (local_) to eliminate GC pauses in hot pricing loops; modes and uniqueness to enable provably race-free parallel risk calculations; unboxed layouts for cache-friendly dense float arrays; and SIMD intrinsics for vectorised option pricing. A fifth category — quality-of-life extensions (labeled tuples, immutable arrays) — makes financial APIs cleaner and safer without requiring performance justification.
The design philosophy of OxCaml — pay-as-you-go, backward-compatible with all OCaml code, with extensions contributing toward eventual upstreaming — makes it an attractive choice for quantitative finance practitioners. It preserves all of OCaml's strengths (expressive type system, excellent inference, safe concurrency via OCaml 5 domains) while removing the remaining systemic obstacle: GC pauses and allocation pressure on critical pricing paths.
Jane Street has used OxCaml in production trading systems for years, pricing billions of dollars of instruments daily. The same tools are now available to the wider quantitative finance community via the open-source OxCaml compiler and opam repository.
Exercises
31.1 Install OxCaml using the opam instructions in §31.1. Write a local_-annotated Black-Scholes Greeks function and use Gc.stat before and after 10⁷ calls to measure the reduction in minor GC collections.
31.2 Implement a unique_-based parallel Monte Carlo engine for a European call. Compare the price and standard error to the analytical Black-Scholes formula for validation. Run with 1, 2, 4, and 8 domains and plot the wall-clock speedup.
31.3 Build a yield curve as an iarray of discount factors. Write a bond portfolio pricer that distributes 10,000 bonds across 4 domains, reading from the shared immutable curve. Verify that the result matches the sequential version.
31.4 (Advanced) Implement the SIMD Black-Scholes pricer from §31.5 using Float64x4. Benchmark against the scalar version for a portfolio of 100,000 options and report observed throughput (prices/second) for both.
31.5 Profile a standard OCaml Monte Carlo pricing loop (without OxCaml) using perf stat or ocamlfdo. Identify the top allocation sites and apply local_ annotations to eliminate them. Report the before/after allocation rate and any latency improvement.
Learn more at oxcaml.org