Replacing Traditional Storage, Databases With In-Memory Analytics

storagedude writes "Traditional databases and storage networks, even those sporting high-speed solid state drives, don't offer enough performance for the real-time analytics craze sweeping corporations, giving rise to in-memory analytics, or data mining performed in memory without the limitations of the traditional data path. The end result could be that storage and databases get pushed to the periphery of data centers and in-memory analytics becomes the new critical IT infrastructure. From the article: 'With big vendors like Microsoft and SAP buying into in-memory analytics to solve Big Data challenges, the big question for IT is what this trend will mean for the traditional data center infrastructure. Will storage, even flash drives, be needed in the future, given the requirement for real-time data analysis and current trends in design for real-time data analytics? Or will storage move from the heart of data centers and become merely a means of backup and recovery for critical real-time apps?'"
  • by Animats ( 122034 ) on Saturday January 01, 2011 @02:32PM (#34731266) Homepage

    For the cutting edge in this area, see what the "high frequency traders" are doing. Computers aren't fast enough for that any more. The trend is toward writing trading algorithms in VHDL and compiling them into FPGAs [stoneridgetechnology.com], so the actual trading decisions are made in special-purpose hardware. Transaction latency (from trade data in on the wire to action out) is dropping below 10 microseconds. In the high-frequency trading world, if you're doing less than 1000 trades per second, you're not considered serious.

    More generally, we have a fundamental problem in the I/O area: UNIX. UNIX I/O has a very simple model, which is now used by Linux, DOS, and Windows. Everything is a byte stream, and byte streams are accessed by making read and write calls to the operating system. That was OK when I/O was slower. But it's a terrible way to do inter-machine communication in clusters today. The OS overhead swamps the data transfer. Then there's the interaction with CPU dispatching. Each I/O operation usually ends by unblocking some thread, so there's a pass through the scheduler at the receive end. This works on "vanilla hardware" (most existing computers), which is why it dominates.
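
    For illustration, here is a minimal C sketch of the model being criticized (the pipe stands in for any byte-stream channel, and the sizes are arbitrary): every chunk moved costs at least one read() call, i.e. one user/kernel crossing, which is exactly the per-operation overhead the parent is pointing at.

```c
/* Minimal sketch of the UNIX byte-stream model: each chunk of data
 * costs at least one read() system call -- one user/kernel crossing.
 * The pipe and buffer sizes here are illustrative only. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                      /* child: writer */
        close(fds[0]);
        char msg[4096];
        memset(msg, 'x', sizeof msg);
        for (int i = 0; i < 1024; i++)   /* 4 MB total */
            write(fds[1], msg, sizeof msg);
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);                       /* parent: reader */
    char buf[4096];
    ssize_t n;
    long syscalls = 0, bytes = 0;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        syscalls++;                      /* one kernel crossing per chunk */
        bytes += n;
    }
    wait(NULL);
    printf("%ld bytes moved in %ld read() calls\n", bytes, syscalls);
    return 0;
}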

    Bypassing the read/write model is sometimes done by giving one machine remote direct memory access ("RDMA") into another. This is usually too brutal, and tends to be done in ways that bypass the MMU and process security. So it's not very general. Still, that's how most Ethernet packets are delivered, and how graphics units talk to CPUs.

    The supercomputer interconnect people have been struggling with this for years, but nothing general has emerged. RDMA via InfiniBand is about where that group has ended up. That's not something a typical large hosting cluster could use safely.

    Most inter-machine operations are of two types - a subroutine call to another machine, or a queue operation. Those give you the basic synchronous and asynchronous operations. A reasonable design goal is to design hardware which can perform those two operations with little or no operating system intervention once the connection has been set up, with MMU-level safety at both ends. When CPU designers have put in elaborate hardware of comparable complexity, though, nobody uses it. 386 and later machines have hardware for rings of protection, call gates, segmented memory, hardware context switching, and other stuff nobody uses because it doesn't map to vanilla C programming. That has discouraged innovation in this area. A few hardware innovations, like MMX, caught on, but still are used only in a few inner loops.
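
    As a software-only approximation of the queue half of that design, here is a hedged C11 sketch: a lock-free single-producer/single-consumer ring in which enqueue and dequeue are plain memory operations, with no operating-system involvement after setup. The two-thread layout and sizes are illustrative, not anything from the post.

```c
/* Sketch of a "queue operation" done without OS intervention once set
 * up: a C11 lock-free single-producer/single-consumer ring. Enqueue
 * and dequeue are ordinary memory operations -- no syscall per item. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define QSIZE 1024                      /* power of two for cheap masking */

static uint64_t slots[QSIZE];
static _Atomic size_t widx, ridx;       /* monotonically increasing */

static int enqueue(uint64_t v) {
    size_t w = atomic_load_explicit(&widx, memory_order_relaxed);
    size_t r = atomic_load_explicit(&ridx, memory_order_acquire);
    if (w - r == QSIZE) return 0;       /* full */
    slots[w & (QSIZE - 1)] = v;
    atomic_store_explicit(&widx, w + 1, memory_order_release);
    return 1;
}

static int dequeue(uint64_t *v) {
    size_t r = atomic_load_explicit(&ridx, memory_order_relaxed);
    size_t w = atomic_load_explicit(&widx, memory_order_acquire);
    if (r == w) return 0;               /* empty */
    *v = slots[r & (QSIZE - 1)];
    atomic_store_explicit(&ridx, r + 1, memory_order_release);
    return 1;
}

static void *producer(void *arg) {
    (void)arg;
    for (uint64_t i = 0; i < 1000000; )
        if (enqueue(i)) i++;            /* spin when full */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    uint64_t v, count = 0, sum = 0;
    while (count < 1000000)
        if (dequeue(&v)) { count++; sum += v; }
    pthread_join(t, NULL);
    printf("moved %llu items, sum %llu\n",
           (unsigned long long)count, (unsigned long long)sum);
    return 0;
}
```

    The point of the sketch: after the one-time setup, moving an item is a couple of loads and stores. Doing the same thing safely across machine boundaries, with MMU-level protection at both ends, is the hardware problem being described.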

    It's not that this can't be done. It's that unless it's supported by both Intel and Microsoft, it will only be a niche technology.

  • Re:Goodbye Orwell (Score:5, Informative)

    by quanticle ( 843097 ) on Saturday January 01, 2011 @03:18PM (#34731578) Homepage

    You're misinterpreting the post. No one said anything about long term data storage being marginalized or eliminated. Instead, the author is talking about the difference between persistent and non-persistent storage. He's saying that existing database technologies that rely on persistent storage are being marginalized as the speed difference between spinning disks and RAM widens, and the low cost of RAM makes it practical to hold large data sets entirely in memory. According to the author, data processing and analysis will increasingly move towards in-memory systems, while traditional databases will be relegated to a "backup and restore" role for these in-memory systems.

  • Re:Totally inane (Score:4, Informative)

    by quanticle ( 843097 ) on Saturday January 01, 2011 @03:22PM (#34731616) Homepage

    I didn't really see the author mention anything about discarding data. Rather, it seems like he's saying that existing databases (which attempt to commit data to persistent storage as soon as possible) will be marginalized as the speed gap between persistent storage and RAM widens. Instead, business applications are going to hold data in RAM, and rely on redundancy to prevent data loss when a system fails before its data has been backed up to the database.
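
    A hedged sketch of that write-behind pattern in C, with hypothetical names throughout (the table layout and "snapshot.db" are illustrative, not from any real product): the application mutates an in-memory table and returns immediately, while a background thread periodically persists a snapshot, so the disk serves backup and recovery rather than sitting on the hot path.

```c
/* Hypothetical write-behind sketch: updates land in RAM and return at
 * once; a background thread later persists a snapshot to disk. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NKEYS 16

static long table[NKEYS];               /* the "database" lives in RAM  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic int running = 1;

static void put(int key, long value) {  /* fast path: memory only */
    pthread_mutex_lock(&lock);
    table[key % NKEYS] = value;
    pthread_mutex_unlock(&lock);
}

static void *flusher(void *arg) {       /* slow path: periodic snapshot */
    (void)arg;
    while (atomic_load(&running)) {
        sleep(1);                       /* batch updates between flushes */
        long copy[NKEYS];
        pthread_mutex_lock(&lock);
        memcpy(copy, table, sizeof copy);
        pthread_mutex_unlock(&lock);
        FILE *f = fopen("snapshot.db", "w");
        if (!f) continue;
        for (int i = 0; i < NKEYS; i++)
            fprintf(f, "%d %ld\n", i, copy[i]);
        fclose(f);                      /* disk is backup, not hot path */
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, flusher, NULL);
    for (int i = 0; i < 100000; i++)
        put(i, (long)i * 2);            /* writes never wait on the disk */
    sleep(2);                           /* let at least one flush happen */
    atomic_store(&running, 0);
    pthread_join(t, NULL);
    return 0;
}
```

    In a real deployment the redundancy would come from replicating the in-memory state to other nodes, not from a single flusher thread; the sketch only shows how persistence drops out of the write path.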

  • by BitZtream ( 692029 ) on Saturday January 01, 2011 @03:38PM (#34731750)

    So I'm guessing you've never actually done any development?

    The 'byte stream' model is not from UNIX; it's just the way the hardware is laid out physically.

    IPC happens in an entirely different way unless you're using something simplistic like pipes.

    RDMA is pretty much a staple of high-speed cluster computing. More to the point, it's plain DMA that allows pretty much everything in your PC to work without slowing the processor down. Even your disk controller uses DMA to get data somewhere useful.

    As for what you're calling RDMA via InfiniBand, I've seen massive clusters (some of the largest in the world) using it ... safely.

    If you think nothing uses the protections provided by the x86 family, I'd like to know what shitty OS you're using. Not only does everyone actually use them on x86, they do it in ... get this ... C! Take a look at a few open-source OSes and you'll notice that, while there is some assembly in specific places for speed and in the required lowest-level libraries, you'll be surprised to find that all of that memory management stuff is written in ... C, and utilized by ... C programs.

    I guess you're also ignoring the fact that Intel and AMD added more protection hardware to the x86 architecture JUST FOR VIRTUALIZATION ... I suppose you think the fact that modern hypervisors won't work without these features present is just a silly little annoyance that the software vendors throw in to make us buy new hardware and pad their bank accounts?

    I'm not sure what development you do, but my standard C library uses MMX for many functions, requiring nothing from me to take advantage of the speedup.
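
    To illustrate the point, a trivial program: the plain memcpy() call below is dispatched by the C library to its SIMD-optimized copy routine at run time, with no effort from the caller.

```c
/* A plain memcpy(), written in ordinary C, is routed by the C library
 * to an MMX/SSE-optimized copy routine -- the caller does nothing. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t len = 64 * 1024 * 1024;      /* 64 MB, big enough to matter */
    char *src = malloc(len), *dst = malloc(len);
    if (!src || !dst) return 1;
    memset(src, 'a', len);
    memcpy(dst, src, len);              /* libc picks the SIMD variant */
    printf("copied %zu bytes, last byte '%c'\n", len, dst[len - 1]);
    free(src);
    free(dst);
    return 0;
}
```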

    You really have no clue, do you?
