IBM Speeds Storage With Flash: 10B Files In 43 Min
CWmike writes "With an eye toward helping tomorrow's data-deluged organizations, IBM researchers have created a super-fast storage system capable of scanning 10 billion files in 43 minutes. This system handily bested their previous system, demonstrated at Supercomputing 2007, which scanned 1 billion files in three hours. Key to the increased performance was the use of speedy flash memory to store the metadata that the storage system uses to locate requested information. Traditionally, metadata repositories reside on disk, and disk access slows these operations. (See IBM's whitepaper.)"
Re:File Sizes? (Score:5, Informative)
As far as I can see, the files themselves were not read, only the metadata (who has access, modification time, position on the spinning platter, etc.).
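In other words, it's the sort of information stat returns without ever touching a file's contents. A trivial illustration (any path works):

stat /etc/hostname    # inode metadata only: size, owner, mode, timestamps
cat /etc/hostname     # by contrast, this actually reads the file's data blocks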
Re: (Score:2)
Re: (Score:2)
Did anyone else read that as "10-byte files"? That seemed mighty slow, lol.
Nope, I read 267
43 min for 10 bytes? (Score:5, Insightful)
That's very slow.
Also, please, get writers with better technical expertise for the articles.
Re:43 min for 10 bytes? (Score:5, Funny)
Come on! Adobe Flash has always been slow, that's a massive improvement!
Re: (Score:2)
Make 10 billion files on your ext3 filesystem and see how long an ls takes you.
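A scaled-down sketch of that experiment (a million files rather than ten billion; the path is arbitrary):

mkdir /tmp/manyfiles && cd /tmp/manyfiles
time seq 1 1000000 | xargs touch    # create a million empty files
time ls -f | wc -l                  # -f skips sorting, so this is close to a raw metadata scan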
Ext3 can store 10 billion files in 10 bytes? Must be the new Whoosh feature, which avoids reading metadata like the comment title.
Re: (Score:2)
Re: (Score:1)
Oh, BURN!
Well, for burning I'd prefer ISO9660 with RockRidge extension to ext3. :-)
Re: (Score:2)
IBM are selling ClearCase with a straight face.
Re: (Score:2)
43 min for 10 bytes.
I see they've copied the poorly cobbled-together config from my SAMBA server.
Re: (Score:2)
Maybe they are scanning them to see if they contain a 1 or a 0. That way they can claim insane numbers like 10B. Whatever a 10B is.
2?
Re: (Score:3)
I wonder how Google would go about indexing the contents of 10 billion files.
Huh.... (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Are you confusing a system that stores something in memory, and a system that caches a copy of a small part in memory for fast access?
Re: (Score:2)
Re: (Score:2)
It doesn't sound like you do. Sync is used to flush the cache of metadata back out to the disk. The metadata is actually stored on disk.
Re: (Score:2)
Re: (Score:2)
So now you are shifting your claims. Yes, when metadata is used it is in memory; the same is true of any data. But it is held (to use your term) on disk, where it is loaded into memory on use, changed, and saved back to disk. The primary store of metadata, the one that persists between boots, is held on the disk. A small local cache is changed, as with any data. So, going back to your original (erroneous) claim: traditional filesystems *do* hold their metadata on disk, even if they cache a portion of it in memory.
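You can watch both halves of this on Linux (root needed to drop the caches):

sync                                         # flush dirty metadata back to disk
echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page, dentry and inode caches
time stat /usr/bin/ls > /dev/null            # cold: the inode is read from disk
time stat /usr/bin/ls > /dev/null            # warm: served from the in-memory cache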
Re: (Score:2)
Re: (Score:1)
You be trollin.
Demand (Score:1)
cost/performance (Score:1)
They noted that while solid-state storage can cost 10 times as much as traditional disks, it can offer a 100 percent performance boost.
So you get 2 times the performance for 10 times the price? I'd say that's still 5 times the cost per unit of performance. What would the performance boost be with a RAID of 5 disks?
Re: (Score:2)
I think you misunderstood the point of the statement in that article.
It's referring to using solid state as a cache: even though solid-state memory costs 10x as much, when used for caching duty it can increase the performance of the disk array by 100%. This would be in line with the numbers a lot of sites are getting from Intel's new SSD disk-caching tech.
You can DIY it in Linux. (Score:2)
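One route, sketched with bcache (device names are placeholders, and make-bcache reformats both devices, so don't run this anywhere you care about):

# /dev/sdb = slow backing disk, /dev/sdc = SSD to use as cache
sudo make-bcache -B /dev/sdb -C /dev/sdc   # create and attach in one step
sudo mkfs.ext4 /dev/bcache0                # the combined cached device
sudo mount /dev/bcache0 /mnt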
Numbers (Score:2)
Now, some of my maths might be (a little) off, but ...
I've just spent half the day processing financial files: 133KB average file size, processed at 4,000 per second (by 'process', I mean every byte is 'looked' at in C++ code). I did this on a single compressed tar.gz file that, when expanded, is 7,857 files and just over 1GB in size. The compressed file is temporarily stored in /dev/shm. The parallelisation is one thread processing the RAM-drive file while the other thread copies the next file (1GB or so) into /dev/shm.
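Roughly, the overlap looks like this in shell (process_files stands in for the actual C++ step, and the paths are made up):

prev=""
for f in /data/batch/*.tar.gz; do
    cp "$f" /dev/shm/ &                                  # stage the next archive in the background
    copy_pid=$!
    [ -n "$prev" ] && ./process_files "/dev/shm/$prev"   # process the one already staged
    wait "$copy_pid"
    prev=$(basename "$f")
done
[ -n "$prev" ] && ./process_files "/dev/shm/$prev"       # don't forget the last one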
Re: (Score:2)
Your lack of understanding is quite simply astounding. You have completely missed the point of their research, which is to reduce the latency of randomly accessing information in a large dataset. They are not measuring throughput (or bandwidth), although the article does state that they hit 4.9GB/s. If you made your files much, much smaller and then repeated your test, you would find that your performance drops drastically as your program becomes limited by a different I/O bound: instead of being bounded by the bandwidth of the disk, you are bounded by the latency of each individual access.
Re: (Score:2)
Doing something for 7857 files and doing it for 10 billion are very different situations. 7857 files, including metadata, can easily be sucked into memory in one big chunk and unpacked/examined from there. That simply doesn't work for datasets larger than memory. At the higher scale, modern filesystems do tend to fall apart, badly, so different approaches are needed. Comparing your paper airplane to an F-22 doesn't make it look like you know anything about writing software properly. Quite the opposite.
Try it for yourself (Score:2)
time sudo ls -lAR / | grep -E '^[ld\-]+' | wc -l
It should give you the number of files on your filesystem and the time it took to "scan" them all.
Re: (Score:2)
Well, you probably need to make sure you don't have any of the files or metadata in the buffer cache before starting. Also limit the search to the actual filesystem you want to test:
# echo 3 >/proc/sys/vm/drop_caches
# time find / -xdev -printf "%p %y %s %n %i %m %G %U %c %b %a\\n" |wc -l
621847
real 0m36.738s
user 0m6.031s
sys 0m12.737s
This is on a simple 40GB Intel SSD with an ext4 fs.
Re: (Score:2)
Alternative summary (Score:3)
SUN were doing this in 1990 (Score:2)
something strange in the title? (Score:1)
I was wondering what 10B files means... OK, the article talks about 10 billion files. But is a billion 10^9 or 10^12? So if you have to use a symbol, use a sensible one... What about 10G files? :D