MySQL is known for its low cost, relative ease of use and charter membership in the ubiquitous LAMP software stack, alongside the Linux operating system, the Apache HTTP server and the PHP, Perl or Python programming languages. As MySQL has spread, a healthy labor pool has grown up around it: Guru.com lists more than 19,000 MySQL freelancers, and MySQL’s Meetup.com page shows close to 43,000 members in 18 countries.
There is little question about MySQL’s popularity. But some organizations are concerned about their ability to feed growing data sets into MySQL. Time-intensive operations, such as loading a large database, adding rows to an existing database, adding or removing a column or adding an index, can all cause serious slowdowns or take important data offline. Performance lags hurt MySQL-based systems that are straining to keep up with business queries that, for example, track hundreds of millions of online advertisements or analyze how customers move through a website in order to understand their behavior and identify new business opportunities.
Tokutek seeks to address this performance challenge with its TokuDB storage engine, aimed at MySQL installations that have grown past 50 gigabytes and are heading toward 500 gigabytes. By improving the performance of the storage engine, the component that manages access, updates, insertions, deletions and other changes to the database, Tokutek argues that it can help organizations avoid costly and difficult decisions such as buying more hardware or ripping out and replacing their beloved MySQL (and hiring and training people to manage the new systems).
Tokutek’s TokuDB uses a technology called Fractal Tree indexes that make MySQL queries sing, running much faster than on the InnoDB storage engine that many installations employ. This performance boost makes it possible to run online transaction processing (OLTP) and analytics applications on larger data sets, at faster rates, with staff using their existing MySQL skills, the company says. In April, Tokutek announced version 6.0 of TokuDB. The storage engine works with MariaDB as well as MySQL.
About the technology: Fractal Tree indexing is based on the computer science theory of cache-oblivious algorithms. This differs from the cache-aware techniques that B-tree-based storage engines have used for many years. These algorithms manage data in bigger chunks and insert data much more quickly than traditional storage engine techniques, says Lawrence Schwartz, vice president of marketing at Tokutek.
The Fractal Tree approach “is specifically designed to overcome the disk I/O limitations that impact B-tree indexes, the indexing method used by the majority of existing database products,” Matt Aslett of The 451 Group wrote in a 2009 assessment of Tokutek, when the company was introducing its technology. Cache-oblivious algorithms, Aslett noted, “are designed to remove the dependency on block sizes in system memory. This has enabled the Fractal Tree indexing algorithm itself, as well as improving data-compression capabilities, which also improves query performance.”
Zeroing in on the I/O issue in storage engines, Philip Howard, an analyst at Bloor Research, writes that when a database using B-trees “writes data to disk, it mixes in old data with new, which means it needs a lot of writes to get all the new data written. Having more indexes makes it worse because every time you update the data on disk you also have to update all the relevant indexes so that you get a lot of writing to disk, which increases load times. As a result, you have to limit the number of indexes you can support which in turn slows down your query response.”
Fractal Trees get around this problem, Howard adds: “When a Fractal Tree index is used to write data, it’s all new, which means you do a lot less writing. Further, Fractal Tree indexes write data in much larger blocks (measured in megabytes as opposed to 16 kilobytes, which is typical for MySQL) which yields more effective compression. … So you get both better OLTP performance and better query performance.”
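To make Howard’s point concrete, here is a back-of-the-envelope model in Python. It is a simplified sketch, not Tokutek’s implementation. It assumes the worst case for an in-place B-tree (every new row forces a rewrite of one uncached 16-kilobyte page, mostly old data, in every index) and an idealized buffered writer that writes only new rows, packed into large blocks of a hypothetical 4 megabytes. It ignores the work a real Fractal Tree does later as buffered data is pushed down the tree. Both simplifications favor the buffered writer, so the ratio is illustrative, not a benchmark. The function names and workload figures are made up for the example.

```python
import math

PAGE_BYTES = 16 * 1024             # the 16 KB page size Howard cites as typical for MySQL
BIG_BLOCK_BYTES = 4 * 1024 * 1024  # hypothetical "megabytes"-scale block size


def in_place_cost(rows, indexes, page_bytes=PAGE_BYTES):
    """Worst-case in-place B-tree: each new row rewrites one full,
    uncached leaf page (mostly old data) in every index."""
    writes = rows * indexes
    return writes, writes * page_bytes


def buffered_cost(rows, indexes, row_bytes, block_bytes=BIG_BLOCK_BYTES):
    """Idealized buffered writer: only new rows are written, packed into
    large blocks per index. Ignores later flush/merge costs."""
    rows_per_block = block_bytes // row_bytes
    writes = math.ceil(rows / rows_per_block) * indexes
    return writes, writes * block_bytes


if __name__ == "__main__":
    rows, indexes, row_bytes = 1_000_000, 3, 100  # hypothetical workload
    bt_writes, bt_bytes = in_place_cost(rows, indexes)
    ft_writes, ft_bytes = buffered_cost(rows, indexes, row_bytes)
    print(f"in-place: {bt_writes:,} writes, {bt_bytes:,} bytes")
    print(f"buffered: {ft_writes:,} writes, {ft_bytes:,} bytes")
    print(f"bytes ratio: {bt_bytes / ft_bytes:.0f}x")
```

With these made-up inputs the model reports 3,000,000 small page writes for the in-place B-tree versus 72 large block writes for the buffered writer. Raising `indexes` scales both costs, but each additional index adds far more work in the in-place case, which is Howard’s point about index counts.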
The comparison with InnoDB is impressive, and Tokutek’s technology represents an evolutionary advance in database management as well as an opportunity to extend the life of MySQL systems for users suffering poor performance, says Joseph Martins, managing director of Data Mobility Group. Martins adds that he is eager to see more information about how the technology works, both in the form of a Tokutek white paper on Fractal Trees and as input from more customers implementing TokuDB. “This approach has a lot of headroom,” he says.
TokuDB is a software-only MySQL plug-in. It supports MVCC (multi-version concurrency control) and is ACID-compliant (ACID stands for atomicity, consistency, isolation and durability, the properties that guarantee database transactions are processed reliably). And TokuDB is fully compatible with existing MySQL applications, Tokutek notes.
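For readers unfamiliar with MVCC, the general idea can be illustrated with a toy key-value store in Python. This is a generic sketch of the technique, not TokuDB’s internals. Writers append a new version tagged with a commit counter instead of overwriting, and a reader holding a snapshot sees the newest version committed at or before that snapshot, so reads do not have to wait on writes. The class and method names are hypothetical, and each write is treated as its own committed transaction.

```python
class MVCCStore:
    """Toy multi-version store (hypothetical, for illustration only)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # monotonically increasing commit counter

    def snapshot(self):
        # A reader's snapshot is simply the latest commit timestamp so far.
        return self._clock

    def write(self, key, value):
        # Append a new version rather than overwriting the old one.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None
```

A reader that took its snapshot before an update keeps seeing the older value, while a new snapshot sees the update:

```python
store = MVCCStore()
store.write("ad:1", 10)
snap = store.snapshot()
store.write("ad:1", 11)
store.read("ad:1", snap)              # still 10 for the older snapshot
store.read("ad:1", store.snapshot())  # 11 for a fresh snapshot
```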
Use cases: Evidenzia, a company based in Karlsruhe, Germany, tracks copyright infringements (such as illegal file sharing) for the software, movie and music industries, ingesting Internet log information and tracking IP addresses for court cases. The company said it uses TokuDB to help manage a growing MySQL installation, performing database inserts and selects in parallel, and that TokuDB’s data compression has saved it storage resources.
Kayak, the online travel site, was an early adopter of TokuDB. The company said in a statement that it implemented the storage engine to speed its responses to customer requests for flights and other travel arrangements, and to support analytics queries about ways to improve its online experience.
Pricing: TokuDB for MySQL is free for commercial deployments up to 50 gigabytes, and it is licensed at $2,500 per 100 gigabytes per year.
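As a rough illustration of how this pricing scales, the Python sketch below estimates an annual license cost. It rests on assumptions the pricing statement does not spell out: that deployments of 50 gigabytes or less pay nothing, and that larger deployments pay $2,500 for each started 100-gigabyte increment of their total size. The function name is hypothetical, and actual terms may differ.

```python
import math

FREE_TIER_GB = 50          # free for deployments up to 50 GB
PRICE_PER_BLOCK_USD = 2_500
BLOCK_GB = 100


def annual_license_cost(data_gb: float) -> int:
    """Estimated yearly license cost in USD (hypothetical helper)."""
    if data_gb <= FREE_TIER_GB:
        return 0
    # Assumption: every started 100 GB block is billed, counting from zero.
    return math.ceil(data_gb / BLOCK_GB) * PRICE_PER_BLOCK_USD


for size_gb in (50, 120, 500):
    print(size_gb, "GB ->", annual_license_cost(size_gb))
```

Under these assumptions, a 500-gigabyte installation, the upper end of the range Tokutek targets, would come to $12,500 a year.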
About the company: A trio of computer scientists with industry experience founded privately held Tokutek in 2006 to commercialize technology they developed in their research. Michael A. Bender of Stony Brook University developed a job scheduler for massive supercomputers that was licensed to Cray. Martin Farach-Colton of Rutgers University is a former Google web crawling specialist. And Bradley C. Kuszmaul of MIT architected Akamai’s distributed data collection system.
Michael Goldberg is editor of Data Informed. Follow him on Twitter at