How did we do it? Why was it the most challenging development project we’ve ever taken on? And how could it give you the edge you’ve been looking for? Find out from our interview with industry expert and Rapid Addition’s Executive Chairman, Kevin Houstoun.
RA Fastlane is part of RA’s Flagship Hub solution. What was the driver behind deciding to develop an FPGA accelerator for RA-Hub?
The RA Hub provides the technology platform that underpins our customers’ critical electronic trading processes, meeting their need for performance coupled with the real-time oversight that delivers complete transparency and control of end-to-end electronic trading messaging.
Certain business models amongst our client base require a higher level of performance, so we constantly review how we can accelerate processing times and identify areas where we can reduce latency.
To achieve a quantum leap in product improvement and genuinely set the industry benchmark, the answer was to move the critical processing elements into hardware, in the same way that Intel has been offloading processing onto co-processors for years.
On a conventional server, the trip across the PCIe interface is measured in microseconds, so the only way to achieve sub-microsecond latency is to eliminate that trip.
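As a rough illustration (the figures below are representative, not RA's measured values), a simple latency budget shows why the PCIe round trip alone can consume the entire sub-microsecond target:

```python
# Illustrative latency budget for a software-based message path, in
# nanoseconds. These are rough, representative figures only, not
# measured RA values.
budget_ns = {
    "NIC ingress + DMA to host": 800,   # PCIe trip into host memory
    "software processing": 300,
    "DMA back to NIC + egress": 800,    # PCIe trip back out
}
total_ns = sum(budget_ns.values())
print(f"total: {total_ns} ns ({total_ns / 1000:.2f} us)")
# Keeping the whole path on the FPGA removes both PCIe trips,
# leaving only on-card processing in the budget.
```

Even with the software step shrunk to nothing, the two PCIe trips by themselves exceed one microsecond in this sketch, which is why the processing has to stay on the card.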
What was the main challenge that you faced when developing the product?
Our development team has extensive experience in trading tech, so we understand the importance of high quality and responsive support.
However, during the design process our research identified that existing FPGA library vendors did not meet the support levels that we would expect to provide to our customers.
We therefore decided that the only way to achieve the exacting standards we wanted was to develop the FPGA components in-house, and this would require access to the source code for the TCP core.
Our best option was to acquire the intellectual property and engineering team from a company with whom we’d been collaborating, which gave us the electronic circuit designs for the entire network stack.
From this point on we had all the components for the FPGA development completely in our hands and therefore had the ability to provide the same high level of support for clients that we deliver on our software solutions.
What smaller challenges did you meet along the way?
Finding developers with the right skills was not easy.
There aren’t as many experienced FPGA developers around as there are software developers. A software developer can come from almost any academic background whereas an FPGA developer is likely to have a background in computer science or electrical engineering.
Development tools are also an issue: there are fewer options for the tools and libraries required to develop for hardware, and they are generally much more prone to bugs because fewer people are using and testing them.
We also found that dealing with physical boards can be challenging. Software can easily be delivered via a download, but FPGA work requires sourcing and delivery of a physical board, along with the transceivers and cables that connect it to the rest of the system.
Crucial components may not be in stock when required, and high-value physical items are subject to shipping and customs delays, all of which can make it difficult to get them where they’re needed on time.
There’s also then the concern that your hardware could be damaged if someone drops it on the way to the customer!
The development cycle for this product is also much longer: a typical FPGA build can take over three hours, so there can be a long wait before changes can be tested.
Accuracy in development therefore becomes crucial, as iterative programming is massively inefficient. On top of this, a build can sometimes produce an image that fails because of the timing of the electrical signal as it propagates from transistor to transistor, requiring further changes and a rerun of the build to achieve a usable image.
Another challenge is that an FPGA is a black box: its internal state can’t be inspected without specifically adding functionality to expose particular elements, which again adds substantial development time, so tracing where a problem lies can be painfully slow.
We have now created a logging system that captures internal states in a debug mode, giving us visibility of the internal state of the machine.
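Conceptually (this is a software analogy for illustration, not the actual hardware design), such a debug capture behaves like a fixed-depth ring buffer that continuously records internal-state samples, so the most recent history can be read out when something goes wrong:

```python
from collections import deque

# Software analogy of an on-chip debug capture: a fixed-depth ring
# buffer that keeps only the most recent internal-state samples.
class StateCapture:
    def __init__(self, depth=4):
        self.samples = deque(maxlen=depth)

    def record(self, cycle, state):
        self.samples.append((cycle, state))

cap = StateCapture(depth=4)
for cycle in range(10):
    # Hypothetical state machine values, purely for illustration.
    cap.record(cycle, {"fsm": "RUN" if cycle % 3 else "IDLE"})

# Only the last `depth` samples survive; reading them out in debug
# mode reveals the machine's recent internal state.
print(list(cap.samples))
```

The hardware equivalent trades chip resources for visibility, which is why it is enabled as a debug mode rather than being always on.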
Finally, changes can also have consequences for unrelated functionality – all the functionality required on the FPGA must share a variety of limited resources.
Changes to one module can “steal” resources from neighbouring modules, which may be unrelated, and impact where they are placed on the chip and how they connect to the other resources they need.
Small changes to one module can end up having unknown and unforeseen consequences throughout the design as a whole, as a single extra transistor can result in logic being shuffled around and timings changing.
However, despite the many challenges of working with FPGA technology, the performance enhancements are significant, and owning our own end-to-end tech stack gives us the ability to manage these challenges and optimize speed.
How did the development team work together given the range of different skills required to develop across both the hardware and software environments?
We originally started with two separate development teams based in Berlin and Prague – one as a result of the acquisition and the other being our existing core software team.
Ultimately, we amalgamated the two groups in our Prague development center to create a highly integrated team, with both software and hardware engineers working side by side.
This gave the software team a deeper understanding of what can be accelerated through the hardware process, and the FPGA team better insight into the business challenges that the hardware solution needs to address.
One of the most interesting things about the FPGA deployment is that the system ensures only the critical path processing is developed within the FPGA card – how does that work in practice?
We perform extensive profiling both from a throughput and a latency perspective on our software solutions, which identifies candidates for hardware acceleration.
In practice, we implement this by looking at the journey from wire to wire to determine which steps take the longest or use the most CPU resources, and considering each as a candidate for offloading onto the FPGA.
Once they are identified, we conduct a proof of concept to confirm whether our ideas work and deliver the anticipated performance gains. For example, a simple TCP pass-through proxy hosting a basic risk check consumes several microseconds, and moving it into the FPGA allows us to reduce this to less than one microsecond.
What did you learn from the development of Fastlane and given what you now know would you still have developed using FPGA?
This was by no means an easy development programme, given the various challenges already discussed, but we are beyond proud of the significant performance enhancements we have achieved for our clients.
Having invested the time and money into the FPGA development, RA is now well placed to deliver those performance benefits to our growing customer base.
Offloading work into electronics can reduce latency or remove significant load from the CPU, creating faster systems for our clients or allowing them to do more work with fewer servers, reducing their overall spend and helping improve their cost-income ratios and return on equity.
Want to know more? Check out RA Hub, the message routing platform and on-boarding solution that sits at the heart of your electronic trading environment.