Chapter 25 — High-Performance Trading Infrastructure
"Microseconds are money. Nanoseconds are more money. Latency is the hidden spread."
On 6 May 2010, at 2:32 PM, the Dow Jones Industrial Average dropped nearly 1,000 points in minutes and recovered almost as quickly. The Flash Crash was triggered by a large algorithmic sell order that interacted with the automated responses of high-frequency trading systems: systems that process millions of market events per second and execute trades in microseconds. The fastest hardware-accelerated systems can react to a market event with a new order or a cancel in under 100 nanoseconds. To put that in perspective: a single tick of a 3 GHz CPU clock takes 333 picoseconds, so the entire decision path completes in roughly 300 CPU clock cycles. At this boundary between software engineering and electrical engineering, every design decision has measurable financial consequences.
High-performance trading infrastructure is not just about speed for its own sake. It is about building systems that are predictably fast: not systems that are fast on average, but systems whose tail latency (the p99.9 case) is bounded and measurable. A single slow response — caused by garbage collection, OS scheduling jitter, cache misses, or memory allocation — can cause a hedging algorithm to be late to market and create unwanted risk. The engineering discipline of low-latency systems is fundamentally about eliminating non-determinism.
OCaml is an unusually strong choice for this domain. Its native code compiler generates efficient machine code comparable to C++. Its incremental garbage collector has bounded pause times that can be tuned. Its type system eliminates entire classes of runtime errors. And with OCaml 5's domain-based parallelism and atomic operations, it can now build genuinely lock-free multi-threaded systems. This chapter shows how to exploit these properties for real-time trading applications.
25.1 OCaml for Low-Latency Systems
OCaml has several properties that make it suitable for low-latency trading:
- Predictable GC: minor collections typically pause for microseconds, and the major collector runs incrementally in small slices
- Unboxed values: OCaml 5 / OxCaml greatly reduce allocation in hot paths
- Zero-cost abstractions: functors and modules compile to efficient code
- Native compilation: ocamlopt generates competitive native code
Critical techniques:
- Avoid allocation in hot paths (use mutable buffers, ring buffers)
- Pre-allocate all data structures at startup
- Use Bytes.t and bigarrays for binary protocol parsing
- Pin threads to cores with Domain + affinity
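One way to check the "avoid allocation in hot paths" rule is to measure it. The sketch below compares minor-heap allocation for an allocating update loop and a pre-allocated one using Gc.minor_words from the standard library. The names bids, asks, and minor_words_during are hypothetical, chosen only for this illustration, and the exact word counts depend on the compiler and backend.

```ocaml
(* Words allocated on the minor heap while running [f]. *)
let minor_words_during f =
  let before = Gc.minor_words () in
  f ();
  Gc.minor_words () -. before

let n = 1_000

(* Allocating version: every update creates a fresh tuple and list cell. *)
let allocating () =
  let acc = ref [] in
  for i = 1 to n do
    acc := (i, i * 2) :: !acc
  done;
  ignore (Sys.opaque_identity !acc)

(* Pre-allocated version: updates overwrite slots in arrays created at startup.
   [bids] and [asks] are hypothetical names for this sketch. *)
let bids = Array.make n 0
let asks = Array.make n 0

let preallocated () =
  for i = 0 to n - 1 do
    bids.(i) <- i;
    asks.(i) <- i * 2
  done

let alloc_words = minor_words_during allocating
let prealloc_words = minor_words_during preallocated

let () =
  Printf.printf "allocating: %.0f words, pre-allocated: %.0f words\n"
    alloc_words prealloc_words
```

The pre-allocated loop should report close to zero words (the measurement itself may box a float or two), which makes this kind of check usable as a regression test for hot-path code.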
25.2 Ring Buffer for Market Data
module Ring_buffer = struct
  type 'a t = {
    data : 'a array;
    capacity : int;
    mutable head : int;  (* index of the oldest element *)
    mutable tail : int;  (* index of the next free slot *)
    mutable size : int;
  }

  (* [default] fills every slot up front, so push never allocates. *)
  let create ?(capacity = 1024) default =
    { data = Array.make capacity default;
      capacity; head = 0; tail = 0; size = 0 }

  let push buf x =
    if buf.size < buf.capacity then begin
      buf.data.(buf.tail) <- x;
      buf.tail <- (buf.tail + 1) mod buf.capacity;
      buf.size <- buf.size + 1;
      true
    end else false (* full *)

  let pop buf =
    if buf.size = 0 then None
    else begin
      let x = buf.data.(buf.head) in
      buf.head <- (buf.head + 1) mod buf.capacity;
      buf.size <- buf.size - 1;
      Some x
    end

  let peek buf =
    if buf.size = 0 then None
    else Some buf.data.(buf.head)

  let is_empty buf = buf.size = 0
  let is_full buf = buf.size = buf.capacity
end
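A common variation on the ring buffer above, sketched here under the assumption that the capacity is a power of two, replaces the mod operation with a bit mask and tracks head and tail as ever-increasing counters. The type ring and its int payload are illustrative choices, not part of the chapter's module.

```ocaml
(* Compact ring buffer of ints. Capacity must be a power of two so that
   index wrap-around is a bit mask instead of a division. *)
type ring = {
  data : int array;
  mask : int;          (* capacity - 1 *)
  mutable head : int;  (* total number of items popped *)
  mutable tail : int;  (* total number of items pushed *)
}

let create capacity =
  if capacity <= 0 || capacity land (capacity - 1) <> 0 then
    invalid_arg "capacity must be a power of two";
  { data = Array.make capacity 0; mask = capacity - 1; head = 0; tail = 0 }

let size r = r.tail - r.head

let push r x =
  if size r > r.mask then false  (* full: size = capacity *)
  else begin
    r.data.(r.tail land r.mask) <- x;
    r.tail <- r.tail + 1;
    true
  end

let pop r =
  if size r = 0 then None
  else begin
    let x = r.data.(r.head land r.mask) in
    r.head <- r.head + 1;
    Some x
  end

(* Usage: fill the buffer, overflow it once, then drain across the wrap point. *)
let r = create 4
let pushed = List.map (push r) [10; 20; 30; 40; 50]  (* fifth push is rejected *)
let first = pop r
let reused = push r 60   (* lands in the slot freed by the pop *)
let rest = List.init 4 (fun _ -> pop r)
```

Note that pop still allocates a Some cell on each call; a stricter hot path might return a sentinel value or write into a caller-supplied slot instead.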
25.3 FIX Protocol Parsing
The Financial Information eXchange (FIX) protocol is the standard for electronic trading. FIX messages are tag=value pairs, each terminated by the SOH character (\001, rendered as | in the example below):
8=FIX.4.2|9=65|35=D|49=BUYER|56=EXCHANGE|34=1|11=ORD001|55=AAPL|54=1|38=100|40=2|44=150.50|10=123|
module Fix = struct
  type tag = int
  type value = string

  type message = {
    msg_type : string;
    fields   : (tag * value) list;
  }

  (* New Order Single (35=D). Fields stay optional so that a missing or
     malformed tag is visible to the caller instead of silently defaulted. *)
  type new_order = {
    cl_ord_id : string option;
    symbol    : string option;
    side      : [ `Buy | `Sell ] option;
    qty       : float option;
    ord_type  : string option;
    price     : float option;
  }

  let soh = '\001'

  (* Split "tag=value" at the first '=' only, so values containing '=' survive. *)
  let parse_pair pair =
    match String.index_opt pair '=' with
    | None -> None
    | Some i ->
      let value = String.sub pair (i + 1) (String.length pair - i - 1) in
      Option.map (fun tag -> (tag, value))
        (int_of_string_opt (String.sub pair 0 i))

  let parse_message raw =
    let fields =
      String.split_on_char soh raw
      |> List.filter (fun s -> String.length s > 0)
      |> List.filter_map parse_pair
    in
    let msg_type = List.assoc_opt 35 fields |> Option.value ~default:"" in
    { msg_type; fields }

  let get_field msg tag = List.assoc_opt tag msg.fields

  let parse_new_order msg =
    let field t = get_field msg t in
    {
      cl_ord_id = field 11;
      symbol    = field 55;
      side      = (match field 54 with
                   | Some "1" -> Some `Buy
                   | Some "2" -> Some `Sell
                   | _ -> None);
      qty       = Option.bind (field 38) float_of_string_opt;
      ord_type  = field 40;
      price     = Option.bind (field 44) float_of_string_opt;
    }

  (** Build the body of a FIX Execution Report (tag 35=8).
      Fields are joined with SOH; the header (tags 8, 9) and
      trailer (tag 10) are not added here. *)
  let build_exec_report ~cl_ord_id ~exec_id ~ord_status ~fill_qty ~fill_price =
    let fields = [
      (35, "8");
      (11, cl_ord_id);
      (17, exec_id);
      (39, ord_status);
      (32, string_of_float fill_qty);
      (31, string_of_float fill_price);
    ] in
    String.concat (String.make 1 soh)
      (List.map (fun (t, v) -> string_of_int t ^ "=" ^ v) fields)
end
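The sample message in this section carries a BodyLength (tag 9) and a CheckSum (tag 10). As a minimal sketch of how those two fields are derived: BodyLength counts the bytes after the tag 9 field up to and including the SOH before tag 10, and CheckSum is the byte sum of everything before tag 10, modulo 256, written as three digits. The function names frame and verify are hypothetical helpers for this illustration.

```ocaml
let soh = "\001"

(* Encode (tag, value) pairs as "tag=value<SOH>", one terminator per field. *)
let encode_fields fields =
  String.concat ""
    (List.map (fun (tag, v) -> Printf.sprintf "%d=%s%s" tag v soh) fields)

(* Sum of all bytes modulo 256, as used for tag 10. *)
let checksum s =
  let sum = ref 0 in
  String.iter (fun c -> sum := !sum + Char.code c) s;
  !sum mod 256

(* Wrap body fields (starting with 35=MsgType) in header and trailer. *)
let frame ~begin_string body_fields =
  let body = encode_fields body_fields in
  let head =
    Printf.sprintf "8=%s%s9=%d%s" begin_string soh (String.length body) soh in
  let msg = head ^ body in
  Printf.sprintf "%s10=%03d%s" msg (checksum msg) soh

(* Check the trailer of a received message. The trailer is always
   "10=ddd<SOH>", which is exactly 7 bytes. *)
let verify raw =
  let n = String.length raw in
  n > 7
  && String.sub raw (n - 7) 3 = "10="
  && raw.[n - 1] = '\001'
  && int_of_string_opt (String.sub raw (n - 4) 3)
     = Some (checksum (String.sub raw 0 (n - 7)))

(* Usage: frame a heartbeat (35=0) and confirm that it verifies. *)
let heartbeat = frame ~begin_string:"FIX.4.2" [ (35, "0") ]
let ok = verify heartbeat
```

A real session layer would also validate BodyLength and the sequence number (tag 34); this sketch covers only the checksum.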
25.4 Lock-Free Data Structures for OCaml 5
OCaml 5 provides Atomic operations for building lock-free structures:
module Lock_free_queue = struct
  (**
    Michael-Scott lock-free queue using Atomic references.
    Safe for multiple producers and multiple consumers, which covers
    single-producer / multi-consumer market data distribution.
  *)
  type 'a node = {
    value : 'a option;                (* None only in the sentinel node *)
    next  : 'a node option Atomic.t;  (* None marks the end of the list *)
  }

  type 'a t = {
    head : 'a node Atomic.t;  (* always points at the current sentinel *)
    tail : 'a node Atomic.t;  (* the last node, or one node behind it *)
  }

  let create () =
    let sentinel = { value = None; next = Atomic.make None } in
    { head = Atomic.make sentinel; tail = Atomic.make sentinel }

  let enqueue q v =
    let new_node = { value = Some v; next = Atomic.make None } in
    let rec try_enqueue () =
      let tail = Atomic.get q.tail in
      match Atomic.get tail.next with
      | None ->
        (* tail really is the last node: try to link the new node after it *)
        if Atomic.compare_and_set tail.next None (Some new_node) then
          (* best-effort swing of tail; another domain may already have done it *)
          ignore (Atomic.compare_and_set q.tail tail new_node)
        else try_enqueue ()
      | Some next ->
        (* tail is lagging: help advance it, then retry *)
        ignore (Atomic.compare_and_set q.tail tail next);
        try_enqueue ()
    in
    try_enqueue ()

  (* Returns None only when the queue is empty; retries if it loses a race.
     The dequeued node becomes the new sentinel. *)
  let rec dequeue q =
    let head = Atomic.get q.head in
    match Atomic.get head.next with
    | None -> None
    | Some next ->
      if Atomic.compare_and_set q.head head next then next.value
      else dequeue q
end
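The queue is built from one primitive: read an atomic cell, compute a new value, and publish it with compare-and-set, retrying if another domain got there first. The sketch below isolates that retry loop on a single integer cell shared by four domains. It assumes OCaml 5 (for Domain.spawn); update_max, high_water, and events are hypothetical names for this illustration.

```ocaml
(* Raise [cell] to [x] if [x] is larger. If another domain changes the cell
   between our read and our compare-and-set, re-read and try again. *)
let rec update_max cell x =
  let current = Atomic.get cell in
  if x > current && not (Atomic.compare_and_set cell current x) then
    update_max cell x

let high_water = Atomic.make 0   (* largest value seen by any domain *)
let events = Atomic.make 0       (* total number of updates processed *)

(* Each worker publishes 10_000 distinct values derived from its id. *)
let worker id () =
  for i = 1 to 10_000 do
    update_max high_water ((i * 4) + id);
    Atomic.incr events
  done

let () =
  let domains = List.init 4 (fun id -> Domain.spawn (worker id)) in
  List.iter Domain.join domains;
  Printf.printf "events=%d max=%d\n" (Atomic.get events) (Atomic.get high_water)
```

No update is lost regardless of how the domains interleave: the counter ends at 40,000 and the high-water mark at 40,003, the largest value any worker publishes.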
25.5 Latency Profiling
module Latency = struct
  (** Wall-clock timestamp in nanoseconds.
      Unix.gettimeofday has microsecond resolution at best and is not
      monotonic, so a production profiler would read a monotonic clock instead. *)
  let now_ns () =
    let ts = Unix.gettimeofday () in
    Int64.of_float (ts *. 1e9)

  type measurement = {
    label    : string;
    start_ns : int64;
    end_ns   : int64;
  }

  let elapsed m = Int64.sub m.end_ns m.start_ns

  let measure label f =
    let t0 = now_ns () in
    let r = f () in
    let t1 = now_ns () in
    ({ label; start_ns = t0; end_ns = t1 }, r)

  type histogram = {
    buckets : int array;  (* sample count per nanosecond bucket *)
    min_ns  : int64;
    max_ns  : int64;
    count   : int;
    total   : int64;
  }

  (** Index of the bucket that contains the [p] quantile (0.0 <= p <= 1.0),
      using the nearest-rank rule. *)
  let percentile hist p =
    let target = max 1 (int_of_float (ceil (float_of_int hist.count *. p))) in
    let last = Array.length hist.buckets - 1 in
    let rec go i cumul =
      if i >= last then last
      else
        let cumul = cumul + hist.buckets.(i) in
        if cumul >= target then i else go (i + 1) cumul
    in
    go 0 0
end
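To show how a histogram like this is filled and queried, here is a self-contained sketch with fixed-width buckets. The bucket width, the bucket count, and the synthetic samples are assumptions made for the example; the point is that the median hides the tail that p99 and p99.9 expose.

```ocaml
(* Fixed-width latency histogram: bucket i counts samples in
   [i * width_ns, (i + 1) * width_ns). The last bucket absorbs overflow. *)
type hist = {
  width_ns : int;
  buckets : int array;
  mutable count : int;
}

let create ~width_ns ~n_buckets =
  { width_ns; buckets = Array.make n_buckets 0; count = 0 }

(* Recording only touches pre-allocated ints, so it does not allocate. *)
let record h latency_ns =
  let i = max 0 (min (latency_ns / h.width_ns) (Array.length h.buckets - 1)) in
  h.buckets.(i) <- h.buckets.(i) + 1;
  h.count <- h.count + 1

(* Nearest-rank percentile, reported as the upper edge of its bucket in ns. *)
let percentile_ns h p =
  let target = max 1 (int_of_float (ceil (p *. float_of_int h.count))) in
  let last = Array.length h.buckets - 1 in
  let rec go i cumul =
    let cumul = cumul + h.buckets.(i) in
    if cumul >= target || i = last then (i + 1) * h.width_ns
    else go (i + 1) cumul
  in
  go 0 0

(* Synthetic workload: mostly fast, with a few slow outliers. *)
let h = create ~width_ns:1_000 ~n_buckets:100

let () =
  for _i = 1 to 985 do record h 500 done;     (* typical case: 0.5 us *)
  for _i = 1 to 13 do record h 5_000 done;    (* cache miss or similar: 5 us *)
  for _i = 1 to 2 do record h 80_000 done     (* scheduling or GC stall: 80 us *)

let () =
  Printf.printf "p50=%dns p99=%dns p99.9=%dns\n"
    (percentile_ns h 0.5) (percentile_ns h 0.99) (percentile_ns h 0.999)
```

With these samples the median falls in the first 1 microsecond bucket while p99.9 falls in the 80 to 81 microsecond bucket, a gap that an average would hide.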
25.7 PPX for Type-Safe Protocol Parsing
High-frequency trading systems must parse two critical protocols at microsecond latency: FIX (Financial Information eXchange) for order management, and ITCH/SBE (Simple Binary Encoding) for market data. Hand-writing parsers for these protocols is tedious, error-prone, and produces code that drifts from the protocol specification over time. OCaml's PPX system allows parsers to be derived directly from type definitions annotated with protocol metadata — eliminating the entire class of hand-written-parser bugs.
25.7.1 Type-Safe FIX Parser via PPX
The FIX protocol represents each field as a tag=value\001 pair. A hand-written parser for ExecutionReport (35=8) must map each integer tag to its field, convert the string value to the correct OCaml type, and validate required fields. PPX generates this from an annotated record:
(** FIX 4.2 ExecutionReport: PPX derives a statically-typed parser.
    Each field is annotated with its FIX tag number. *)
type execution_report = {
  cl_ord_id     : string;          [@fix.tag 11]  [@fix.required]
  order_id      : string;          [@fix.tag 37]  [@fix.required]
  exec_id       : string;          [@fix.tag 17]  [@fix.required]
  exec_type     : exec_type_code;  [@fix.tag 150] [@fix.required]
  ord_status    : ord_status_code; [@fix.tag 39]  [@fix.required]
  symbol        : string;          [@fix.tag 55]  [@fix.required]
  side          : [`Buy | `Sell];  [@fix.tag 54]
  last_qty      : float;           [@fix.tag 32]
  last_px       : float;           [@fix.tag 31]
  cum_qty       : float;           [@fix.tag 14]
  leaves_qty    : float;           [@fix.tag 151]
  transact_time : string;          [@fix.tag 60]
} [@@deriving fix_parser]

(* An attribute payload must itself be valid OCaml syntax, so each wire value
   is attached to its constructor rather than listed in a record-like payload. *)
and exec_type_code =
  | New       [@fix.value "0"]
  | Partial   [@fix.value "1"]
  | Filled    [@fix.value "2"]
  | Cancelled [@fix.value "4"]
  | Rejected  [@fix.value "8"]
[@@deriving fix_enum]

and ord_status_code =
  | Open             [@fix.value "0"]
  | Partially_filled [@fix.value "1"]
  | Filled_status    [@fix.value "2"]
  | Cancelled_status [@fix.value "4"]
[@@deriving fix_enum]

(** Generated:
    val parse_execution_report  : string -> (execution_report, string) result
    val encode_execution_report : execution_report -> string
    val execution_report_tags   : int list  (* for validation *)
*)
(** Runtime usage: zero hand-written parsing code *)
let handle_fix_message raw_msg =
match parse_execution_report raw_msg with
| Error msg ->
Printf.printf "Parse error: %s\n" msg
| Ok report ->
(* report.last_px is already a float — no manual atof *)
(* report.exec_type = Filled is a type-safe comparison — no string comparison *)
if report.exec_type = Filled then
Printf.printf "Fill: %.0f @ %.4f for order %s\n"
report.last_qty report.last_px report.cl_ord_id
The [@@deriving fix_parser] attribute instructs the PPX to generate:
- A parser that splits the FIX message on \001, maps each tag=value pair to its record field by integer tag lookup, converts string values to their OCaml types using the field's declared type, and validates that [@fix.required] fields are present
- An encoder that serialises the record back to a FIX string
- The tag list constant for external validation tools
The critical property is that structural mistakes are caught at code generation time (when the PPX runs), not at runtime when a malformed message arrives in production. If a developer adds a new required field to execution_report without the corresponding annotation, the PPX rejects the type definition. A wrong tag number is not something the PPX can detect on its own, but because the tag sits next to the field it decodes, the error is easy to spot in review and shows up as a missing field in tests rather than silently in production.
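To make the generated code concrete, here is a hand-written sketch of roughly what such a deriver might emit for a small record. The record type fill, the helpers required and optional, and the error strings are all hypothetical; an actual PPX would be free to generate something quite different.

```ocaml
(* Hypothetical three-field record standing in for an annotated type. *)
type fill = {
  cl_ord_id : string;        (* tag 11, required *)
  last_qty  : float;         (* tag 32, required *)
  last_px   : float option;  (* tag 31, optional *)
}

(* Split a raw FIX string into (tag, value) pairs, skipping malformed fields. *)
let fields_of_string raw =
  String.split_on_char '\001' raw
  |> List.filter_map (fun pair ->
       match String.index_opt pair '=' with
       | None -> None
       | Some i ->
         let v = String.sub pair (i + 1) (String.length pair - i - 1) in
         Option.map (fun t -> (t, v)) (int_of_string_opt (String.sub pair 0 i)))

let ( let* ) = Result.bind

(* Look up a required tag and convert it, reporting which step failed. *)
let required fields tag conv =
  match List.assoc_opt tag fields with
  | None -> Error (Printf.sprintf "missing required tag %d" tag)
  | Some v ->
    (match conv v with
     | Some x -> Ok x
     | None -> Error (Printf.sprintf "bad value %S for tag %d" v tag))

(* An absent optional tag is fine; a present but malformed one is an error. *)
let optional fields tag conv =
  match List.assoc_opt tag fields with
  | None -> Ok None
  | Some v ->
    (match conv v with
     | Some x -> Ok (Some x)
     | None -> Error (Printf.sprintf "bad value %S for tag %d" v tag))

(* One line per record field: tag number, conversion, required or optional. *)
let parse_fill raw =
  let fields = fields_of_string raw in
  let* cl_ord_id = required fields 11 Option.some in
  let* last_qty = required fields 32 float_of_string_opt in
  let* last_px = optional fields 31 float_of_string_opt in
  Ok { cl_ord_id; last_qty; last_px }
```

Every line of parse_fill is mechanical: a tag, a converter, and a required-or-optional choice. That regularity is what makes the parser a good candidate for derivation from annotations.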
25.7.2 ITCH Binary Parser via PPX
For ITCH market data (the Nasdaq binary market data protocol), PPX generates byte-offset readers from field-layout annotations:
(** ITCH 5.0 Add Order message: PPX derives a zero-copy binary parser *)
type itch_add_order = {
message_type : char; [@itch.offset 0] [@itch.size 1] [@itch.type `char]
stock_locate : int; [@itch.offset 1] [@itch.size 2] [@itch.type `uint16_be]
tracking_number : int; [@itch.offset 3] [@itch.size 2] [@itch.type `uint16_be]
timestamp_ns : int64; [@itch.offset 5] [@itch.size 6] [@itch.type `uint48_be]
order_reference : int64; [@itch.offset 11] [@itch.size 8] [@itch.type `uint64_be]
buy_sell : [`Buy | `Sell]; [@itch.offset 19] [@itch.size 1] [@itch.type `side]
shares : int; [@itch.offset 20] [@itch.size 4] [@itch.type `uint32_be]
stock : string; [@itch.offset 24] [@itch.size 8] [@itch.type `alpha_padded]
price : float; [@itch.offset 32] [@itch.size 4] [@itch.type `price4]
} [@@deriving itch_parser]
(** Generated:
val parse_itch_add_order : Bytes.t -> int -> itch_add_order
(* offset parameter for zero-copy parsing from a ring buffer *)
*)
(** High-frequency handler: statically-typed, no string intermediary *)
let on_add_order buf offset =
let msg = parse_itch_add_order buf offset in
(* msg.price is already a float (divided by 10000); msg.buy_sell is [`Buy | `Sell] *)
Order_book.add
~symbol:msg.stock
~side:msg.buy_sell
~price:msg.price
~qty:msg.shares
~ref_id:msg.order_reference
The [@itch.type `price4] annotation tells the PPX to read a 4-byte big-endian integer and divide by 10,000 to convert the fixed-point wire value into a float. The [@itch.type `alpha_padded] annotation reads 8 bytes and strips trailing spaces. All of this is generated from the type definition; the developer never writes byte-offset arithmetic manually.
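For comparison, the byte-offset reads that such a deriver would have to produce can be written by hand with the standard library's Bytes accessors. This is a sketch under stated assumptions: a 64-bit OCaml (so 48-bit and unsigned 32-bit values fit in an int), a simplified record that stores the timestamp as int, and hypothetical helper names get_uint48_be and get_uint32_be.

```ocaml
type side = Buy | Sell

type add_order = {
  stock_locate    : int;
  timestamp_ns    : int;    (* 48-bit value; fits in a 63-bit OCaml int *)
  order_reference : int64;
  side            : side;
  shares          : int;
  stock           : string;
  price           : float;
}

(* Unsigned 32-bit big-endian read (assumes 64-bit OCaml ints). *)
let get_uint32_be buf off =
  Int32.to_int (Bytes.get_int32_be buf off) land 0xFFFF_FFFF

(* Unsigned 48-bit big-endian read: high 16 bits, then low 32 bits. *)
let get_uint48_be buf off =
  (Bytes.get_uint16_be buf off lsl 32) lor get_uint32_be buf (off + 2)

(* Decode one Add Order message starting at [off]; offsets follow the
   field layout given in the annotated type above. *)
let parse_add_order buf off =
  if Bytes.get buf off <> 'A' then invalid_arg "not an Add Order message";
  {
    stock_locate    = Bytes.get_uint16_be buf (off + 1);
    timestamp_ns    = get_uint48_be buf (off + 5);
    order_reference = Bytes.get_int64_be buf (off + 11);
    side            = (if Bytes.get buf (off + 19) = 'B' then Buy else Sell);
    shares          = get_uint32_be buf (off + 20);
    stock           = String.trim (Bytes.sub_string buf (off + 24) 8);
    price           = float_of_int (get_uint32_be buf (off + 32)) /. 10_000.;
  }

(* Build a 36-byte sample message to exercise the reader. *)
let sample =
  let b = Bytes.make 36 ' ' in
  Bytes.set b 0 'A';
  Bytes.set_uint16_be b 1 7;            (* stock locate *)
  Bytes.set_uint16_be b 3 0;            (* tracking number *)
  Bytes.set_uint16_be b 5 1;            (* timestamp, high 16 bits *)
  Bytes.set_int32_be b 7 5l;            (* timestamp, low 32 bits *)
  Bytes.set_int64_be b 11 123456789L;   (* order reference *)
  Bytes.set b 19 'B';
  Bytes.set_int32_be b 20 100l;         (* shares *)
  Bytes.blit_string "AAPL" 0 b 24 4;    (* remaining 4 bytes stay space-padded *)
  Bytes.set_int32_be b 32 1_505_000l;   (* 150.5000 as a price4 integer *)
  b

let msg = parse_add_order sample 0
```

Each field costs one bounds-checked read at a constant offset, with no tokenising and no string-to-number conversion, which is where the speed advantage of binary protocols comes from.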
25.7.3 Comparison: PPX vs. Hand-Written Parsers
| Property | Hand-written parser | PPX-derived parser |
|---|---|---|
| Field-tag mismatch | Runtime error | Code-generation error (missing tag) or test failure (wrong tag) |
| Type mismatches | Runtime cast/exception | Rejected by the type checker |
| New field maintenance | Manual update | Re-run code generation |
| Validation of required fields | Runtime, if remembered | Runtime check, always generated |
| Performance | Optimised manually | Comparable (generated code is plain OCaml) |
| Testability | Test parser + business logic | Business logic only |
PPX-derived parsers are not a convenience feature — they are a correctness feature. For a protocol with 50+ message types (FIX has over 60 message types; ITCH has 26), the amount of hand-written boilerplate that can be eliminated is substantial, and each eliminated line of boilerplate is a line that cannot contain a bug.
25.8 Chapter Summary
High-performance trading infrastructure is an engineering discipline where every abstraction has a cost and every cost must be measured. The tools in this chapter — memory layout, allocation avoidance, lock-free data structures, binary protocols, latency profiling — are not academic optimisations but operational necessities for any system that must respond to market events in microseconds.
The bounded pause times of OCaml's garbage collector are critical: the minor heap can be sized so that minor collection pauses stay under 10 microseconds, and major collection work can be triggered at controlled points. Pre-allocating all data structures at startup and reusing them with ring buffers or object pools eliminates allocation during the hot path entirely. This is the same technique used in C++ with custom allocators, but OCaml's type system makes it safer.
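The GC controls mentioned here are exposed through the standard Gc module. The sketch below sets the minor heap size, forces a full collection at a chosen point, and then confirms that a pre-allocated hot loop triggers no minor collections. The specific sizes are illustrative assumptions, not recommendations, and on_tick and collect_at_safe_point are hypothetical names.

```ocaml
(* Illustrative settings only: sizes are in words
   (1_048_576 words = 8 MiB on a 64-bit machine). *)
let () =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = 1_048_576; space_overhead = 200 }

let minor_collections () = (Gc.quick_stat ()).Gc.minor_collections

(* Startup: allocate everything the hot path will need. *)
let prices = Array.make 4096 0

(* Hot path: overwrites pre-allocated slots, so it allocates nothing. *)
let on_tick i px = prices.(i land 4095) <- px

(* Run GC work at a moment of our choosing, for example before the open. *)
let collect_at_safe_point () = Gc.full_major ()

let collections_in_hot_loop =
  collect_at_safe_point ();
  let before = minor_collections () in
  for i = 0 to 999_999 do
    on_tick i (i * 3)
  done;
  minor_collections () - before

let () =
  Printf.printf "minor collections during hot loop: %d\n" collections_in_hot_loop
```

Suitable values for minor_heap_size and space_overhead depend on the workload and should be chosen by measuring pause times, not copied from an example.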
Binary protocols (ITCH for market data, SBE for derivatives) can be 5-10x faster to parse than FIX because they avoid string parsing entirely. Integers sit at fixed byte offsets, and message fields are read by simple array indexing. The FIX protocol's tag=value format was designed for human readability, which makes it expensive to parse at scale; it persists in the industry largely because of legacy compatibility. PPX-derived parsers (§25.7) generate type-safe parsers directly from annotated type definitions, removing most hand-written-parser bugs at no runtime cost compared to manually written field extraction.
OCaml 5 domains enable genuinely parallel market data processing. With lock-free queues using atomic compare-and-swap operations, a market data aggregation domain can push updates to multiple strategy domains without locking. Latency profiling at microsecond resolution, tracking not just mean latency but p95, p99, and p99.9, identifies the tail events that matter most for system reliability.
Exercises
25.1 Implement and benchmark a pre-allocated ring buffer for 10,000 quote updates. Measure throughput vs a naive Queue.t.
25.2 Write a complete FIX 4.2 parser for New Order Single (35=D) and Execution Report (35=8). Test with sample market messages.
25.3 Build a lock-free single-producer/single-consumer queue using Atomic operations and benchmark it against a Mutex-protected queue.
25.4 Profile the Black-Scholes pricer: measure time for 1 million option pricings, identify the bottleneck (norm_cdf approximation), and optimise.
25.5 Design a PPX attribute schema for a simplified FIX New Order Single (35=D) message with fields: cl_ord_id [tag 11], symbol [tag 55], side [tag 54], order_type [tag 40], order_qty [tag 38], price [tag 44, optional]. Write the annotated type definition and describe what code the PPX should generate. Implement the parser by hand and measure the difference in line count vs the annotated approach.