My Thoughts on the Microsoft-Sponsored Report on the Analytics Platform System

Having had clients reach out to me for my perspective on Value Prism Consulting’s Microsoft-sponsored report Microsoft Analytics Platform System Delivers Best TCO – to – Performance, I thought I would take some time to share some of my thoughts on this piece of work. I will provide some comparisons to the IBM PureData System for Analytics, given my background with this offering.

1. First, the report makes the case that what I would call an as yet unproven high-end data warehousing offering will deliver the best performance per dollar invested because it has the most cores, storage, I/O bandwidth, and memory. I could take any free open-source DBMS with an MPP scale-out architecture, deploy it on loads of hardware, and make the same argument. But performance for complex analytics on large data sets, without extensive tuning, is hard!

The first iteration of the Microsoft Analytics Platform System became available in 2010. Last I checked, there were only 3 case studies available for customers running the latest SQL Server 2012 Parallel Data Warehouse Edition on a Microsoft appliance. I have also seen a figure of “20+” given in one Microsoft presentation as the total number of customers running the Analytics Platform System with Microsoft’s Clustered Columnstore Indexes (only available in the latest 2012 software). By contrast, the IBM PureData System for Analytics customer count is in the many hundreds, including data sizes of over a petabyte. $ per TB, per core, or per GB of RAM is not equal to $ per unit of performance, especially when you don’t have that many proof points.

2. I see some issues with how Full-Time Equivalent (FTE) per rack, which represents the labor cost of administering a system, is calculated for both the Analytics Platform System and the competition. As I said already, performance for complex analytics over large data sets is hard. For some offerings this kind of performance is achievable, but only with extensive tuning work. As the report itself says, many vendors brand their offering an appliance and claim leading Total Cost of Ownership.

In the case of the Analytics Platform System, Microsoft and Value Prism claim simplicity of use, and hence the lowest labor costs, by suggesting that the product is just SQL Server, that SQL Server skills are easily transferable, and so labor costs will be no higher than for administering SQL Server. This strikes me as nonsense. Vanilla SQL Server is SMP, not scale-out, and average deployments still tend toward the smaller side. Administering SQL Server on a few cores is not the same as administering an MPP/scale-out system running deep analytic queries against tens or hundreds of TB of data. Scale-out introduces additional tasks like monitoring for data skew. Also, my understanding is that SQL Server Parallel Data Warehouse Edition is a hybrid of Microsoft’s DATAllegro acquisition and SQL Server, and it still lacks complete feature set and syntax compatibility with regular SQL Server (Microsoft recently delivered some improvements here in its AU3 update).
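To make the data skew point concrete, here is a minimal Python sketch of the kind of check an MPP administrator might run. The per-node row counts and the function name `skew_ratio` are hypothetical, and the metric (largest node divided by the mean) is just one common convention, not something taken from either vendor's tooling.

```python
from statistics import mean

def skew_ratio(rows_per_node):
    """Return max/mean of per-node row counts; 1.0 means perfectly even."""
    if not rows_per_node:
        raise ValueError("need at least one node")
    avg = mean(rows_per_node)
    if avg == 0:
        return 1.0  # an empty table is trivially balanced
    return max(rows_per_node) / avg

# Hypothetical row counts for one table distributed across four nodes.
balanced = [250_000, 250_000, 250_000, 250_000]
skewed = [700_000, 100_000, 100_000, 100_000]

print("balanced:", skew_ratio(balanced))
print("skewed:  ", skew_ratio(skewed))
```

A ratio well above 1.0 means one node holds far more than its fair share, and since a scan finishes only when the slowest node finishes, that node tends to set the pace for the whole query. This is a chore that simply does not exist on a single SMP server.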

By comparison, the PureData System for Analytics has long been proven for ease of management, with a great many customer testimonials attesting to performance out of the box with no tuning, as well as simplicity of administration. Unlike the APS systems, which run on hardware from Dell, HP, or Quanta, it was built from the ground up as integrated hardware and software with unique hardware acceleration. Somehow the Value Prism report claims 4 FTEs are required per rack of PureData System for Analytics, while referencing an ITG report which says something completely different.

“Among 21 PureData System for Analytics users, 18 reported that they employed less than one FTE administrator. The exceptions were an organization that declined to state the number of systems employed, but described the installation as over one petabyte (one FTE was employed); and others reporting more than 20 and more than 30 systems respectively (two FTEs were employed).”

For the 4 customers profiled in detail in the ITG report, the FTEs per rack work out to roughly a third of an FTE (2.4 FTEs / 7 racks). Where on earth did the 4 FTEs per rack number come from?
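The size of the gap is easy to check from the figures quoted above. A small Python sketch follows; the variable names are mine, and the 2.4 FTE and 7 rack totals are simply the ones cited in this post from the ITG report.

```python
# Totals for the four profiled customers, as quoted in this post.
itg_total_ftes = 2.4
itg_total_racks = 7

# Staffing level claimed in the Value Prism report for the same product.
value_prism_ftes_per_rack = 4.0

itg_ftes_per_rack = itg_total_ftes / itg_total_racks
overstatement = value_prism_ftes_per_rack / itg_ftes_per_rack

print(f"ITG-derived FTEs per rack: {itg_ftes_per_rack:.2f}")
print(f"Value Prism figure is about {overstatement:.0f}x higher")
```

On these inputs, the Value Prism staffing assumption is more than ten times what the profiled customers actually reported.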


3. I don’t think I need to say much more, but the last thing I’ll add is that even the numbers for hardware resources per rack appear to be inaccurate. The number of cores in a rack of PureData System for Analytics is given as 112, whereas the number is in fact double that, with another 112 cores filtering data as it comes off the storage tier. This would put the total number of cores significantly above that of the Analytics Platform System, but these cores are conveniently excluded.