FIX engines perform differently. This is a product of how these components are written. When selecting a FIX engine it is important to understand your desired performance characteristics, and test the FIX engines to ensure you use a product that meets your requirements. Alternatively you can use benchmarks supplied by trusted third parties. Rapid Addition benchmarks its engines in the Intel Low Latency labs and publishes the results.
What do you measure?
There are three measurements typically used to characterise the performance of a FIX engine:
- Throughput is an aggregate measure that shows how many messages can be processed in a period of time. For FIX messaging it is normally measured in messages per second. Sometimes people try to infer the latency from the throughput, by assuming that the reciprocal of the throughput equals the latency, but this is erroneous because parts of the system can overlap, with multiple messages being processed in parallel. It is possible, for example, to have a system that processes messages using a pipeline of parallel sub-processes, with each message taking one second to traverse the entire pipeline (tick to trade). With 50 messages following each other through the pipeline, all being processed in parallel, the measured throughput would be 50 messages per second – yet the latency would be one second, not 0.02 seconds (1/50th).
- Latency is an individual measure that shows how long it takes a system to process any individual message. Often latency measurements are reported as a statistical analysis of a series of individual measurements, and so are presented as averages and spreads (standard deviation, percentiles), to allow users to understand the distribution of the results.
- Jitter is the distribution of latency across a series of measurements. Often jitter is expressed using the standard deviation, but this should not be used alone as jitter is not normally distributed – latency is clearly bounded on the lower side, but not on the upper side where a ‘long tail’ is often evident. For this reason, calculating the median and percentiles gives a better understanding of the distribution – which typically requires every latency measurement to be captured, unlike the mean and standard deviation which can be calculated using running totals. Often the best way to understand the jitter in a system is to see this distribution graphically plotted out.
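The point about percentiles requiring every sample can be sketched as follows. The latency series below is synthetic (a tight cluster with occasional 100x outliers standing in for GC pauses or OS scheduling), purely to illustrate why the median and percentiles describe a long-tailed distribution better than the mean and standard deviation do:

```python
import random
import statistics

# Synthetic latency series, standing in for real captured measurements:
# most messages take ~10 microseconds, but 1% hit a ~1000 us "long tail".
random.seed(42)
latencies_us = ([random.gauss(10, 1) for _ in range(9900)]
                + [random.gauss(1000, 100) for _ in range(100)])

# Mean and standard deviation can be kept as running totals...
mean = statistics.mean(latencies_us)
stdev = statistics.stdev(latencies_us)

# ...but the median and percentiles need every sample retained.
median = statistics.median(latencies_us)
p99 = statistics.quantiles(latencies_us, n=100)[98]  # 99th percentile

# The median sits near the typical 10 us, while the mean is dragged
# upward by the tail and the standard deviation balloons.
```

With this distribution the mean lands near 20 us and the standard deviation near 100 us, while the median stays close to 10 us – exactly the distortion the paragraph above warns about.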
What are the sources of latency?
In measuring latency most people are trying to determine the tick to trade latency. Tick to trade latency is a measure of the time taken to react to a market data tick by placing a trade. There are four main sources of latency in this journey – the market data tick on the wire has to be read by the network hardware (normally a NIC), then passed through parts of the operating system (mainly the TCP/IP stack), and then it must be parsed by the FIX engine before finally being presented to user code. There is also the latency of the user code to consider, as although this may be doing something very simple (such as checking a price), it is often something more complex. Once the user code has assembled the trade message in response to the tick, it needs to traverse the FIX engine, the OS and the network hardware in the opposite direction. When comparing the latency of FIX engines it is easiest to replace the user code with a very simple piece of logic that simply trades every nth tick, where n is chosen to represent a typical tick to trade ratio. This allows the comparison of engines without much influence from the user code, and essentially isolates the plumbing latency cost.
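The "trade every nth tick" stand-in for user code can be sketched like this. The names (`on_tick`, the `send_order` callback) and the ratio are illustrative assumptions, not a real FIX engine API:

```python
class NthTickStrategy:
    """Stands in for user code in a benchmark: trades on every nth tick,
    so the measured path is dominated by the engine/OS/NIC plumbing."""

    def __init__(self, n, send_order):
        self.n = n                    # chosen to match a typical tick-to-trade ratio
        self.count = 0
        self.send_order = send_order  # callback into the FIX engine (illustrative)

    def on_tick(self, tick):
        self.count += 1
        if self.count % self.n == 0:
            self.send_order(tick)     # minimal user logic: no strategy computation
```

Because the decision logic is a single modulo test, any latency differences observed between engines under this harness come from the messaging stack rather than the strategy.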
Rapid Addition works primarily with managed code, and in managed code there is also the effect of the garbage collector. The GC can affect the latency of individual messages, increasing the jitter considerably. Most garbage collectors stop all code execution whilst they clean up unused memory objects, and this pause in code execution can cause large delays in processing a message. Rapid Addition has developed a programming framework, called Generation Zero, that prevents our applications creating any objects that become eligible for garbage collection during the operational use of our programs. Objects are created at start up, assigned to object pools, and these pooled objects are reused by the program throughout its life. Objects are only garbage collected when the engine is shut down. This both avoids the overhead of memory allocation during the critical tick to trade window and reduces the risk of a garbage collection pause during that window.
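The pooling pattern can be sketched as below. This is an illustration of the general technique in Python, not Rapid Addition's Generation Zero framework (which targets managed runtimes); the class names and fields are hypothetical:

```python
class OrderMessage:
    """A reusable message object: fields are reset and reused, never reallocated."""
    __slots__ = ("symbol", "price", "qty")

    def __init__(self):
        self.reset()

    def reset(self):
        self.symbol = None
        self.price = 0.0
        self.qty = 0


class ObjectPool:
    """All objects are built up front at start-up; the hot path only reuses them."""

    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]  # allocation happens once, here

    def acquire(self):
        return self._free.pop()       # no allocation during the tick-to-trade window

    def release(self, obj):
        obj.reset()                   # scrub state before returning to the pool
        self._free.append(obj)
```

Because every object released to the pool is the same object later acquired, nothing becomes garbage while the engine is running, and the collector has nothing to pause for until shutdown.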
Where do you time?
There are two ways people try to time latency and throughput of software, one is from the software itself, and the other is to use a network capture tool to capture the network traffic that contains the FIX messages. The two techniques have different pros and cons.
The first approach is timing with measurement points in the software itself. Such tests need to be carefully designed, otherwise they can omit sources of latency. For example, you could test just the message parsing time, but omit the network and operating system elements. These measurements can be useful for tuning a specific part of the system but are less useful for measuring tick to trade latency. You can design software timestamp tests that eliminate most of these issues and give surprisingly accurate results – by using two machines and measuring the time from sending a tick from one machine to receiving the resulting trade back on the same box. This total round trip represents twice the tick to trade latency, and will show a higher level of jitter, but it is likely the best estimate you can get using pure software timing.
The second approach, timing using network capture devices, allows you to directly measure tick to trade latency, including all of its elements. It gives a true measure of jitter, as there is no averaging down of the latency numbers – all data can be captured in a PCAP file and analysed later to give the latency distribution. The main downside is the relative cost of this approach compared to a software timing method, which can be set up with the addition of a few lines of code.
Rapid Addition maintains a testing environment instrumented with network capture devices and can work with clients to perform short tests of our, and competing, products.
How does Rapid Addition measure the performance of our engines?
The test involves listening to a stream of FIX Market Data Incremental Refresh (MsgType = X) messages on a FIX session and, whenever the bid on a market data message ends in “.000”, sending a FIX New Order – Single (MsgType = D) message to buy the symbol to an execution venue simulator on a second FIX session. The simulator fills the FIX New Order – Single (MsgType = D) message with two FIX Execution Report (MsgType = 8) messages, one with an Order Status of ‘New’, and the second ‘Filled’. On receipt of the second execution report the test application sends a second FIX New Order – Single (MsgType = D) message, this time to sell the same symbol.
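The trigger condition is deliberately trivial and can be sketched in a few lines. Formatting the bid to three decimal places is an assumption about how the price is rendered, made for illustration:

```python
def should_buy(bid_price: float) -> bool:
    """Fire a buy whenever the bid, rendered to three decimal places,
    ends in ".000" -- the deliberately trivial trigger from the test."""
    return f"{bid_price:.3f}".endswith(".000")
```

Any bid landing on a whole number (101.000, 0.000, ...) triggers a buy; everything else is ignored, so roughly a fixed fraction of ticks produce orders, independent of any strategy logic.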
The test that a bid ends in “.000” is obviously not a real trading strategy, but it has been chosen to isolate the performance of the messaging framework and allow that to be illustrated in isolation. Real trading strategies may add more latency and jitter to this process.
Using a network tap and a high precision clock, four times are recorded: the arrival at the network card of the FIX Market Data Incremental Refresh (MsgType = X) message, the departures from the network card of the two FIX New Order – Single (MsgType = D) messages, and the arrival at the network card of the second FIX Execution Report (MsgType = 8).
This gives us two categories of event: firstly an event that is independent of the trading channel (MDToBuy), and secondly an event within the trading channel (ERToSell).
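Deriving the two figures from the four captured timestamps is simple arithmetic; the function and parameter names below are illustrative:

```python
def tick_to_trade_metrics(md_arrival_ns, buy_departure_ns,
                          er_arrival_ns, sell_departure_ns):
    """Compute the two latency figures from the four timestamps
    recorded at the network tap (all in nanoseconds).

    MDToBuy: market data tick arriving -> buy order leaving.
    ERToSell: second execution report arriving -> sell order leaving.
    """
    md_to_buy = buy_departure_ns - md_arrival_ns
    er_to_sell = sell_departure_ns - er_arrival_ns
    return md_to_buy, er_to_sell
```

Repeating this over every captured cycle yields two latency series, one per category, which can then be summarised with the percentile analysis described earlier.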
By measuring externally, this test provides a fair and objective measure of end-to-end latency, allowing the performance characteristics of different FIX engines to be compared. The scenario outlined above is similar enough to a real trading flow to be indicative of the performance of a real world FIX application. Rapid Addition always tests our FIX engines using this methodology, ensuring our products always provide the highest levels of performance.