The In-Memory DB Hypewagon – Quick Thoughts

Recent years have seen many stories touting the impact of in-memory database management systems (IMDBMSes), which load up all data into memory, as the new direction for ultra-fast, real-time processing. With each week a new VC-funded entrant seems to arrive with claims of being the fastest database ever and accelerating processing by 100X, 1000X, 10000X! Gartner late last year put out a Market Guide for In-Memory DBMS which advises IT decision-makers to investigate this disruptive technology. Poor old relational database management systems are slagged as relics, dinosaurs optimized for on-disk processing.

There’s no doubt the performance of all DBMSes benefits from providing enough memory to fit all data in memory. The reason IMDBMSes are presented by their proponents as gazillions of times faster than traditional databases (even when they are given enough memory to load up all the data into the cache) is because they are designed from the ground up to optimize for in-memory operations vs. disk-based processing.

It seems to me however, that the optimizations that really account for the biggest performance gains can and in many cases have been successfully built into our aforementioned poor “traditional” RDBMSes. In the case of analytics, I would say it is the columnar data structure that is really the game-changer, facilitating efficient aggregation and calculation on individual columns. DB2 10.5 added BLU Acceleration – with this capability you are able to create columnar storage tables in the database. You can provision enough memory for all of them to be loaded up but BLU is also optimized for working on large data sets which do not fit into memory, minimizing I/O from disk. When SAP HANA, one of the supposed “pure” in-memory database players, first came out SAP said that all data should/will be in memory. They have since backtracked with SAP IQ for Near-Line Storage for HANA and more recently HANA Dynamic Tiering for Extended Storage (for “non-active” data). The customer performance results I have seen with BLU are no less impressive than those of HANA, and are typically accomplished with a ton less hardware. Microsoft SQL Server and Oracle have also come out with their own columnar store offerings, the Columnstore Index and Database In-Memory, respectively.

On the OLTP side of the house, I have not seen any real evidence that in-memory optimizations have resulted in tremendous performance acceleration. The major difference maker, which is nothing new, is turning off logging so you avoid touching disk altogether but this is not applicable to all application uses. While grandiose claims of XXXX times speedup for OLTP are common by IMDBMS vendors, substantiated results are harder to find. I would refer you to a very funny old article on how these claims or “benchmarks” are often put together – MySQL is bazillion times faster than MemSQL, a response to a MemSQL “result”. SQL Server In-Memory OLTP, an offering by a traditional vendor, has received some plaudits. But from what I have seen, acceleration is not broadly applicable to an entire workload, is quite limited without the use of C stored procedures (not exactly an in-memory optimization or even a new one), and perhaps has something to do with addressing SQL Server locking issues. I absolutely understand that there are optimizations that can and have been done for more efficient in-memory processing for OLTP, I just don’t know that they have delivered a game-changing performance boost.