Incorporating FPGAs into Heterogeneous Environments

December 17, 2015 5:30 am

Pat McGarry, Vice President of Engineering, Ryft

The typical approach to big data analysis involves massive data centers packed with hundreds of servers and dozens of open-source tools, or costly and rigid commercial frameworks. Despite this investment in IT, organizations still can’t get the answers they need quickly enough. As data pipelines scale to accommodate more decisions about more data, they inevitably hit a tipping point, becoming overburdened and overly complex – the point where more hardware doesn’t effectively add processing power and more complex software doesn’t yield measurable improvements. It’s a problem literally hardwired into our IT architecture, and we will run headlong into a performance wall unless we look to new solutions.

Enterprise growth, innovation and, occasionally, even effective competition require intelligence that is trapped inside systems built on 70-year-old von Neumann compute architectures that were never intended to take on some of the processing tasks required for big data. While clustering has made it technically possible to process data at far greater scale, the clusters are slow, costly, and challenging to deploy, program, and maintain. These clusters and their supporting infrastructure have succeeded in buying the industry time, but now it’s necessary to come up with other answers. It’s time for new tools to be added to the toolbox.

Those working at the forefront of big data computing already know what’s next: shifting the burden from large server farms, based on common sequential programming architectures, to heterogeneous architectures containing a mix of sequential (CPU, GPU) technologies alongside massively parallel field-programmable gate array (FPGA) enabled systems.

The roots of FPGA acceleration technology date back to the 1980s, but only now are FPGA-based systems beginning to see wider adoption, thanks to the demands of processing the variety, volume, and velocity of big data driven by our more connected world and the Internet of Things. In the technology’s early stages, deploying FPGAs required very specific programming expertise that was rare and, therefore, expensive. This made changes difficult and presented obstacles to co-existing with more common x86 hardware. FPGA processing always has been a better tool for many computing tasks, especially at high scale and high speed, but its historical complexity made it unattractive relative to the less elegant but easier strategy of sequential architecture clustering. So only the most latency-sensitive organizations adopted FPGA technology. Over time, out of necessity and ubiquity, commodity sequential clusters and related software took on the task of data analysis, but the architecture inherently compromised the strategy for addressing quickly growing data.

Today, with clustering and software doing all they can to lessen the burden on CPU hardware, the majority of enterprises have reached the tipping point. Fortunately, a new breed of FPGA-based systems that are easier to use and integrate is ready to meet this challenge – systems that remove the points of friction common with early FPGA deployments and deliver the full benefit of their hardware-level parallelism.

FPGA-enabled systems today are capable of delivering real-time data analysis performance to drive nimble and confident decision making. New systems have abstracted the complexity and provide an open platform that easily interoperates with other IT systems and sequential-based computing, whether x86, GPU, or others. With the size, simplicity, and scalability advantages we are now seeing in FPGA and heterogeneous systems, edge computing – or bringing data analysis closer to the source of data – is now possible, unlocking trillions in value.

FPGA 101

FPGAs are purpose-built hardware that can provide lower latency, more horsepower, and the ability to instantly analyze disparate types of data at less than half of the cost of a cluster of commodity servers. FPGAs enable massively parallel execution of bitwise operations. Specialized analytics servers based on FPGA technology can replace hundreds of CPU-based servers while delivering instantaneous insight into both streaming and batch data – even correlating the two.

Most importantly, FPGAs break through the bottlenecks associated with today’s x86-based architectures. That isn’t to say that FPGAs are better at everything. They are simply a different kind of tool, well suited to algorithms for which CPUs are a poor fit. In a similar way, cars are very useful for a trip to the local store, but less efficient for cross-country travel. It’s much more efficient to cross the country by plane than to drive across it by car. The same idea applies to analytics processing: it’s well past time to stop misapplying sequential processors to all processing needs.

FPGAs are purpose-built for specific algorithms, such as those relating to fast searching or machine learning. In other words, if a specific problem maps well to the hardware parallelism afforded by FPGA fabric, then an FPGA is the best bet. The future of data analysis lies in a balanced heterogeneous computing architecture, using the right tool for the right job at the right time. Even traditional software companies such as Microsoft now recognize that FPGA technology has a place alongside CPU (and even GPU) architectures in core technology implementations, including Microsoft’s search engine, Bing.

FPGA vs. CPU and GPU

The most significant difference between a CPU- or GPU-based design and an FPGA-based design is that CPUs and GPUs are constrained by a fixed hardware structure reliant on software programs operating with sequential instruction sets with associated register and bus width limitations. Using many cores in CPU and GPU architectures can certainly help in some scenarios, but that only affords a type of software parallelism, which remains limited by the processor’s sequential instruction sets and fixed bus widths.

An FPGA, by contrast, is hardware that can be reprogrammed to suit the user’s application. Once programmed with suitable algorithms, the FPGA can operate on data with very wide bus widths and very large pipelines with no extra latency and no software overhead. In that manner, FPGAs present a completely different solution paradigm when compared with contemporary CPU and GPU architectures.
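To make the pipeline idea concrete, here is a minimal software sketch of how a hardware pipeline behaves from clock tick to clock tick. It is only a model written in Python for illustration, not FPGA code, and the names `sequential_cycles`, `pipelined_cycles`, and `simulate_pipeline` are hypothetical. It assumes every stage takes exactly one clock tick, that there is at least one stage, and that a new item can enter the pipeline on every tick.

```python
_EMPTY = object()  # marker for a pipeline register that holds no data yet


def sequential_cycles(num_items: int, num_stages: int) -> int:
    """Ticks needed when each item finishes every stage before the next item starts."""
    return num_items * num_stages


def pipelined_cycles(num_items: int, num_stages: int) -> int:
    """Ticks needed when a new item enters the pipeline on every tick."""
    if num_items == 0:
        return 0
    return num_stages + num_items - 1


def simulate_pipeline(items, stages):
    """Tick-by-tick model: on every tick, all stages work at once on different items.

    `stages` is a non-empty list of one-argument functions (an assumption of
    this sketch). Returns (results, ticks).
    """
    registers = [_EMPTY] * len(stages)  # registers[i] holds the output of stage i
    feed = iter(items)
    total = len(items)
    results = []
    ticks = 0
    while len(results) < total:
        ticks += 1
        # Walk from the last stage back to the first so that each stage reads
        # the value its predecessor produced on the previous tick.
        for i in range(len(stages) - 1, 0, -1):
            prev = registers[i - 1]
            registers[i] = stages[i](prev) if prev is not _EMPTY else _EMPTY
        nxt = next(feed, _EMPTY)
        registers[0] = stages[0](nxt) if nxt is not _EMPTY else _EMPTY
        if registers[-1] is not _EMPTY:
            results.append(registers[-1])
    return results, ticks


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    results, ticks = simulate_pipeline([1, 2, 3, 4], stages)
    print("results:", results, "ticks:", ticks)
    print("one item at a time:", sequential_cycles(4, len(stages)), "ticks")
    print("pipelined:", pipelined_cycles(4, len(stages)), "ticks")
```

In this model the pipeline takes a few ticks to fill and then emits one result per tick, so the tick count grows with the number of items plus the number of stages rather than with their product. Real devices add clocking, routing, and I/O constraints that this sketch ignores.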

General Purpose vs. Purpose Built

CPUs are exceedingly flexible and able to handle a wide range of computing tasks without changes, but there is a performance penalty for that flexibility. CPUs were highly valuable when choosing one platform for all tasks was possible and programming FPGAs was difficult. Now that adding transistors is no longer effective for improving processor speeds, and adding more hardware complexity is counterproductive, we need to apply all available tools – each to their appropriate role.

Early FPGA systems may not have been practical for broad use cases. Today, however, repeatedly reprogramming a modern, well-architected FPGA-based system is as easy as working with a Linux server, allowing it to do new and different things as needed. FPGAs can be configured for one problem in one instant and reconfigured slightly or entirely to suit a completely different problem in the next. This allows smaller, faster, and more efficient processing platforms to be created and distributed throughout an enterprise to bring much-needed data center-scale processing power to remote sources of data.

That’s millions of transistors dedicated to specific operations, when and where you need them. It’s like being able to create your own instruction set and your own buses and bus widths, without actually needing an instruction set. A CPU- or GPU-based design, on the other hand, is ultimately constrained by its well-defined instruction set and its equally constrained bus widths. Buses 64 bits wide are quite typical, for example, which pale in comparison to the pathways thousands of bits wide that are available to an FPGA.
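The bus-width comparison comes down to simple arithmetic, sketched below in Python. The helper name `transfers_needed`, the 4,096-bit record size, and the specific widths in the loop are illustrative assumptions rather than figures for any particular device; only the 64-bit and "thousands of bits" widths come from the discussion above.

```python
def transfers_needed(payload_bits: int, path_width_bits: int) -> int:
    """Number of transfers needed to move a payload across a data path of a given width."""
    if payload_bits < 0:
        raise ValueError("payload_bits must be non-negative")
    if path_width_bits <= 0:
        raise ValueError("path_width_bits must be positive")
    # Integer ceiling division: a partially filled final transfer still counts.
    return -(-payload_bits // path_width_bits)


if __name__ == "__main__":
    record_bits = 4096  # hypothetical 512-byte record
    for width in (64, 1024, 4096):  # illustrative widths only
        count = transfers_needed(record_bits, width)
        print(f"{width:>5}-bit path: {count} transfer(s)")
```

Under these assumptions, the 64-bit path needs 64 transfers to move the record, while a 4,096-bit path moves it in one. The sketch counts transfers only and says nothing about clock rates, which also differ between devices.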

The bus-width problems notwithstanding, programs in a CPU or GPU environment execute sequential instructions, often via high-level languages such as Java and Python that add significant amounts of software overhead. Although modern CPU and GPU architectures allow for some amount of parallelism across multiple cores, this parallelism is software parallelism, and the instructions, in the end, are always sequential in nature, even in multi-threaded environments. FPGA differentiates itself here because properly architected FPGA fabric is anything but sequential – everything executes in full parallel fashion, using hardware parallelism concepts.

It is also interesting to note that the GPU architectures increasingly used in the data analysis space are, at the lowest level, still sequential. The GPU’s secret sauce is that it simply has more hardware cores, allowing it to run more threads at the same time than a typical CPU, and it does so in a more power- and cost-efficient manner than a CPU for certain operations. GPUs’ reliance on instruction sets, static registers, static bus widths, RAM requirements, interfacing, and so on means that, in the end, they too offer just a different type of software parallelism. True reconfigurable hardware parallelism is the unique purview of properly architected FPGA systems.

Simplified Architecture

As I noted, the cost of this hardware parallelism is in the need to program FPGAs. When not properly architected, FPGAs can be complex – and for decades, they were. Those who knew about FPGA architectures often considered them more trouble in programming than they were worth in processing. As a result, CPU and GPU architectures prevailed for as long as IT organizations could effectively rely on just software-parallel systems. Now, as demand for hardware parallelism becomes a pressing issue, simplified FPGA architectures and pre-defined analysis algorithms to go with them have emerged for uses such as fuzzy search, term frequency, image search, and others.
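For readers unfamiliar with these algorithms, the sketch below shows in ordinary Python what term frequency and fuzzy search compute. It is a software reference model only, not how any FPGA product implements them. The function names are hypothetical, and Hamming distance (a count of mismatched characters) is an assumed similarity measure chosen for simplicity; the packaged algorithms mentioned above may use different measures.

```python
import re
from collections import Counter


def term_frequency(text: str) -> Counter:
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def fuzzy_search(text: str, pattern: str, max_mismatches: int):
    """Return (offset, matched_text, distance) for every window close to the pattern."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    width = len(pattern)
    hits = []
    # Each window is compared independently of every other window. A CPU
    # walks through them one after another; this independence is what makes
    # the problem a natural candidate for hardware parallelism.
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        distance = hamming_distance(window, pattern)
        if distance <= max_mismatches:
            hits.append((start, window, distance))
    return hits


if __name__ == "__main__":
    print(term_frequency("big data, Big insight").most_common(2))
    print(fuzzy_search("the quick brwon fox", "brown", max_mismatches=2))
```

The point to notice is the loop in `fuzzy_search`: no window depends on the result of any other, so the comparisons could in principle all run at the same time if the hardware provided a comparator for each one.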

These small, FPGA-accelerated systems offer several benefits that appeal to data scientists, business analysts, data systems specialists, and organizational leaders from all industry sectors:

  • Performance. For targeted, data-centric analytics applications such as search and fuzzy search, fast and efficient FPGA-based Linux servers have been shown to analyze big data 100 times faster than high-performance conventional servers, with no data preparation required, based on benchmark testing of Spark running on Amazon Web Services nodes. That power is reflected in the financial upside of smarter decisions that help organizations stay ahead of their competition.

  • Ease of use. Packaged FPGAs with open APIs are available that incorporate pre-built algorithms for specific needs such as fuzzy, image, and term-frequency search. Users don’t even have to know that FPGAs are working behind the scenes. Data analysis becomes a near-instantaneous, push-button exercise.

  • High efficiency. Return on investment for big data analysis must be proven quickly. Otherwise, executives lose faith in big data initiatives and often stop funding them. Fortunately, the TCO savings for most FPGA-enabled systems are immediately apparent. Because they rely on hardware parallelism, FPGA-enabled servers can handle the workload of 100 or more traditional clustered nodes for specific applications. That means massive cost savings in rack space, power, cabling, networking, IT personnel, and software maintenance requirements.

Every day, businesses and entire markets are adding new and growing data sources to their competitive tool set. The increasing volume and detail of information captured by enterprises, along with the rise of multimedia, social media, and the Internet of Things, will fuel exponential growth in data for the foreseeable future. Better analysis of this data at scale is a key point of competition, underpinning new waves of productivity, innovation, and growth. Organizations of all kinds need insights in real time, not in weeks or months.

No organization can afford to fall into the trap of simply throwing more of the same hardware at its data challenges. Decision makers can no longer be forced to act on outdated, inaccurate, or incomplete data. The industry’s job is to provide them with the best possible tools for success in the big data era, and x86-based systems alone can’t meet the need. Thankfully, we are now at a point where heterogeneous environments employing the right tool at the right time – including FPGA-based technology – do meet the need.

Pat McGarry brings extensive technology and leadership experience in hardware and software engineering to his role as Vice President of Engineering at Ryft. He joined Ryft from Ixia Communications, where he was responsible for the company’s Federal security systems engineering programs. During his tenure at Ixia and BreakingPoint Systems, Pat spent several years working in the cyber security industry within the DoD and the Intelligence communities conducting experimentation and analysis of cyber-related performance and security concerns on arbitrary network infrastructures. 

Prior to BreakingPoint, Pat held key roles in product and engineering management at both Spirent and Hekimian Laboratories. His work there included hardware and software design for embedded systems, network systems analysis and design, and cyber security. He earned bachelor’s degrees in computer science and electrical engineering from Virginia Tech.
