Replacing Traditional Storage, Databases With In-Memory Analytics
storagedude writes "Traditional databases and storage networks, even those sporting high-speed solid state drives, don't offer enough performance for the real-time analytics craze sweeping corporations, giving rise to in-memory analytics, or data mining performed in memory without the limitations of the traditional data path. The end result could be that storage and databases get pushed to the periphery of data centers and in-memory analytics becomes the new critical IT infrastructure. From the article: 'With big vendors like Microsoft and SAP buying into in-memory analytics to solve Big Data challenges, the big question for IT is what this trend will mean for the traditional data center infrastructure. Will storage, even flash drives, be needed in the future, given the requirement for real-time data analysis and current trends in design for real-time data analytics? Or will storage move from the heart of data centers and become merely a means of backup and recovery for critical real-time apps?'"
Goodbye Orwell (Score:2, Interesting)
The marginalization of long-term data storage can only be a good thing: the big advertising and other firms still get the analytical data that actually matters to their bottom line, and to the extent that the average Joe's privacy is being invaded, at the very least the fruits of that invasion will become increasingly inaccessible, analyzed in memory and discarded rather than archived.
Re:The cutting edge is in high frequency trading (Score:4, Interesting)
Yep, the article is 10-20 years out of date.
HFT has been using statistical synchronization of dbs for years.
Big financial shops switched to in-memory DBs decades ago, with co-lo on the compute farms.
I don't know why he's even talking about 32G boxes as servers. That's a desktop; real DB hosts are an order of magnitude bigger.
His "push the disks to the edge of the network"? Um, that's already happened: it's called tier 2. Tier 1 is the terabytes of solid-state storage we keep just in case.
This is a blast from the 1990s.
But putting data in system RAM = harder reboots (Score:2, Interesting)
But putting data in system RAM means harder reboots, since you have to dump it all to disk first. And what about UPSes? You need one with enough capacity to keep the box up for as long as that dump takes.
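As a toy illustration of why shutdowns get slower (a hypothetical class, not any real product's API), here's an in-memory store that has to snapshot itself to disk before the box can go down. The dump is written to a temp file and renamed so a crash mid-dump never corrupts the last good snapshot:

```python
import json
import os
import tempfile
import time

class InMemoryStore:
    """Toy in-memory store that must be dumped to disk before shutdown."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.data = {}
        if os.path.exists(snapshot_path):
            # Reboot path: reload the last snapshot before serving.
            with open(snapshot_path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value

    def shutdown(self):
        # Write to a temp file, then atomically rename over the old
        # snapshot, so a crash mid-dump leaves the previous copy intact.
        start = time.monotonic()
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.snapshot_path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.snapshot_path)
        # Dump time grows with data size -- this is what the UPS
        # has to ride through.
        return time.monotonic() - start
```

The returned dump duration is the number your UPS sizing has to cover: the more RAM you fill, the longer the battery has to last.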
Re:Terabyte RAM? (Score:2, Interesting)
I think, perhaps, that you're missing the point, at least of the article. It has nothing to do with whether to store information in memory or in the database and everything to do with the current trend of using dedicated analytics products (i.e. OLAP) to do data analysis. Whereas we used to use the same relational databases to store, retrieve and analyze all data with SQL as the Swiss Army knife that enabled it all, we're moving towards a model where the relational database is responsible for storage and retrieval of information only and dedicated analytics products have their own cache of the information for reporting and analysis purposes.
The point is that relational databases are being marginalized, and one of their major selling points (i.e. the ability to analyze data based on the relationships between different types of data) is increasingly less relevant. Once you're limiting your RDBMS usage to simple CRUD operations, the rationale for choosing an RDBMS (especially an expensive one like Oracle and its ilk) over NoSQL options or open source databases with limited support for power-user features starts to disappear. MySQL may lack a lot of the features that experienced DBAs consider mandatory, but it can do INSERTs, UPDATEs and DELETEs as well as anything, and it has no problems with SELECTs on keyed columns. Similarly, Cassandra, Voldemort and the like can easily support that limited subset of functionality.
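To make the "limited subset" concrete, here's a minimal sketch (a hypothetical wrapper, standing in for Cassandra, Voldemort, or any key-value back-end): once the application only needs INSERT/UPDATE/DELETE plus SELECTs on a key, this is the entire interface the storage layer has to provide:

```python
class KeyValueStore:
    """The CRUD subset: no joins, no query planner, no stored procedures --
    just keyed reads and writes, which any key-value store can serve."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        if key in self._rows:
            raise KeyError(f"duplicate key: {key}")
        self._rows[key] = dict(row)

    def update(self, key, **changes):
        # UPDATE ... WHERE pk = key
        self._rows[key].update(changes)

    def delete(self, key):
        # DELETE ... WHERE pk = key
        del self._rows[key]

    def select(self, key):
        # SELECT * WHERE pk = key: an O(1) lookup, no SQL needed.
        return self._rows.get(key)
```

Everything analytical -- aggregation, slicing, relationships between tables -- has moved out of this layer and into the dedicated analytics product.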
That is why RDBMSs are becoming marginalized. Applications are increasingly designed either to avoid an RDBMS back-end entirely or to use it as simple "dumb" storage, relying on a separate analytics product for all the complicated logic that previously would have been done with complex SQL and stored procedures. Beyond that, OLAP concepts reduce the development effort the data-mining interface requires. It's simple to write an interface around one or more OLAP cubes and let the user choose dimensions and measures, then pivot, drill down and so on. In fact, most analytics products do this out of the box without any development necessary. With a SQL database, you need an interface that translates the user's instructions into SQL, which can become very complex, and it takes significant effort to ensure the resulting SQL performs well.
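The pivot/drill-down idea above can be sketched in a few lines (a toy in-memory aggregation, not any vendor's cube engine -- the field names are made up for illustration): grouping rows by user-chosen dimensions and summing a measure is the whole trick, and drilling down is just pivoting on more dimensions:

```python
from collections import defaultdict

def pivot(rows, dimensions, measure):
    """Minimal OLAP-style aggregation: group rows by the chosen
    dimensions and sum the chosen measure, the way an analytics UI
    lets a user pivot without generating any SQL."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

sales = [
    {"region": "EU", "year": 2010, "product": "A", "revenue": 100.0},
    {"region": "EU", "year": 2011, "product": "A", "revenue": 150.0},
    {"region": "US", "year": 2010, "product": "B", "revenue": 200.0},
]

# Pivot on one dimension:
by_region = pivot(sales, ["region"], "revenue")
# -> {('EU',): 250.0, ('US',): 200.0}

# "Drilling down" just adds a dimension to the same call:
by_region_year = pivot(sales, ["region", "year"], "revenue")
```

Compare that with generating, validating and tuning the equivalent GROUP BY SQL for every combination of dimensions a user might click on.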
This isn't about RDBMSs becoming unnecessary; it's about them being best suited to a much more limited role than they've previously occupied in the application architecture.