
Build Your Own 135TB RAID6 Storage Pod For $7,384

Posted by CmdrTaco
from the for-all-your...-movies-yeah-thats-it dept.
An anonymous reader writes "Backblaze, the cloud-based backup provider, has revealed how it continues to undercut its competitors: by building its own 135TB Storage Pods which cost just $7,384 in parts. Backblaze has provided almost all of the information that you need to make your own Storage Pod, including 45 3TB hard drives, three PCIe SATA II cards, and nine backplane multipliers, but without Backblaze's proprietary management software you'll probably have to use FreeNAS, or cobble together your own software solution... A couple of years ago they showed how to make their first-generation 67TB Storage Pods."


  • It's full of stars!!
    • by ByOhTek (1181381)

      It's full of slashvertisements!!

      • by x6060 (672364)
        It's not a slashvertisement if they tell you how to build it yourself.....
        • Except for the bit about how it would be even better if you paid for their proprietary management software...
          • Re:My God... (Score:5, Insightful)

            by x6060 (672364) on Thursday July 21, 2011 @11:50AM (#36835308)
            Did you notice how they even gave you the alternatives to their software? Essentially they are saying "We developed this for our own internal use and if you would LIKE to pay for it, it's cool. If you don't, then there are these other free alternatives." But then again, just because some company is mentioned in the article it MUST be a slashvertisement.
            • 1) Build 135TB box 2) Install Openstack.org's Object Storage system (free! Amazon S3 API Compliant!) 3) Profit? Fuck profit! STORE ALL THE THINGS!

          • Who said it would be better? Not the article. The article said you'd have to do it yourself, and if you're THAT good, you might make something better. If you don't want to do it yourself (FreeNAS) you can opt for their software to manage it, which GASP HORROR, they charge for.

            Challenge laid down, Open Source Community: make your own management software and create a new FS that doesn't have the limitations EXT4 has, without using LVM to get around those limitations, that is better than what these people

          • They don't sell the management software; it's internal. They only sell a backup service to end users, via a client.

      • Re:My God... (Score:5, Insightful)

        by Dillon2112 (197474) on Thursday July 21, 2011 @11:24AM (#36835096) Homepage

        My problem with Backblaze is their marketing is very misleading...they pit these storage pods up against cloud storage and assert that they are "cheaper", as though a storage pod is anything like cloud storage. It isn't. Sure, there's the management software issue that's already been mentioned, but they do no analysis on redundancy, power usage, security, bandwidth usage, cooling, drive replacement due to failure, administrative costs, etc. It's insulting to anyone who can tell the difference, but there are suits out there who read their marketing pitch and decide that current cloud storage providers like Google and Amazon are a rip off because "Backblaze can do the same thing for a twentieth the price!" It's nuts.

        You can see this yourself in their pricing chart at the bottom of their blog post. They assert that Backblaze can store a petabyte for three years for either $56k or $94k (if you include "space and power"). And then they compare that to S3 costing roughly $2.5 million. In their old graphs, they left out the "space and power" part, and I'm sure people complained about the inaccuracies. But they're making the same mistake again this time: they're implicitly assuming the cost of replicating, say, S3, is dominated by the cost of the initial hardware. It isn't. They still haven't included the cost of geographically distributing the data across data centers, the cost of drive replacement to account for drive failure over 3 years, the cost of the bandwidth to access that data, and it is totally unclear if their cost for "power" includes cooling. And what about maintaining the data center's security? Is that included in "space"?

        On a side note, I'd be interested to see their analysis on mean time between data loss using their system as it is priced in their post.

        You could say that Backblaze is serving a different need, so it doesn't need to incur all those additional costs, and you might be right, but then why are they comparing it to S3 in the first place? It's just marketing fluff, and it is in an article people are lauding for its technical accuracy. Meh.

        • by Revotron (1115029)
          Suits will be suits. Backblaze proudly boasts that they're a great offsite backup solution, but they will quickly tell you that they are not a "cloud storage" provider. Their only business is offsite replication. They don't hide the fact that if you upload 500GB of data and then delete it off your computer, it will be removed from their systems as well.

          You didn't read the article. You know how I know that? They explicitly state that they don't have any costs for replacement hard drives over 3 years,
        • Re:My God... (Score:4, Insightful)

          by Archangel Michael (180766) on Thursday July 21, 2011 @12:27PM (#36835632) Journal

          First: ALL Marketing is misleading. That is what marketing does. Accentuate the positive, eliminate the negative. So complaining about that is just idiotic.

          Second: You could have a couple dozen Backblaze units, pay for a tech to monitor them 24/7/365 and replace all the drives twice over for what Amazon charges for the same thing. Sure, that doesn't include the cost of premises and high-speed Internet to multiple locations. But still, that is aggregated with all the other clients.

          Third: what are you paying for in the "cloud", I mean besides ethereal concepts? Does Amazon tell you how they do things? You probably know less about Amazon's (and the others') setup, so you're comparing something you know something about (not everything) versus something you know almost nothing about, and then complaining that they aren't doing it in a comparable way. You don't know.

          Fourth: Your basic assumption is that Backblaze has no contingency for drive replacement, which is false. Since these are "new" drives there might be insufficient data about failure rates and therefore about the actual cost of replacement (never mind warranties) or of keeping drives in both hot and cold spare setups. I'm sure that Backblaze, in their $5/mo service, figures in what it costs to store data, keep spares, and keep the datacenter running and profitable. Even if they doubled the price to $10, it would still put the others to shame.

          Have you compared the data loss rates for the last three years between Amazon and Backblaze? Can you even compare, or is that data held secret (see point 1b)? My point here is that you're pulling shit out of your ass and thinking it doesn't stink. Even if it isn't directly comparable, it is at least in the realm of consideration, EVEN if everything you said is true. And at one-tenth the cost, that can buy a lot of redundancy. It is just a matter of perspective.

          • Amazon's S3 is based off of MogileFS (the concept, not the code): http://danga.com/mogilefs/ [danga.com]

            And if you want to run an S3-compliant system internally, you'll use openstack.org's object storage system:

            http://www.openstack.org/projects/storage/ [openstack.org]

            Ability to provide object storage services at multi-petabyte scale
            Free open source software, no licensing fees, ‘open-core,’ or ‘freemium’ model
            Written in python; easy to differentiate your offering with extensions and modifications
            Compatibility a

    • by Kjella (173770)

      Pr0n stars? Because we all know what it's really full of...

  • Wow, are we already approaching Petabyte clusters? I'm still getting used to Terabyte!
    • by Narnie (1349029)

      If I ever have to admin a Petabyte cluster, I'd name it Petabear.

    • by corbettw (214229)

      I know, it's crazy! Storage numbers are increasing faster than the national debt!

  • by bryan1945 (301828) on Thursday July 21, 2011 @10:12AM (#36834306) Journal

    For a true porn collector yet.

  • by mugurel (1424497) on Thursday July 21, 2011 @10:17AM (#36834338)
    for both internet security and privacy: each of us can now store his own local copy of the internet and surf offline!
    • That would actually be nice. If every site I ever went to was cached locally. Like having a browser cache with unlimited size. It would be miles better than archive.org, if you remember a site from years ago and wish you could go back. Even better if it prefetched links you never clicked on.
    • by demonbug (309515)

      for both internet security and privacy: each of us can now store his own local copy of the internet and surf offline!

      Of course, with my 150GB/month bandwidth cap it is going to take ~70 years to fill it up...

    • by Pharmboy (216950)
      wget -m -p http://*

      Just run that in your cron.daily scripts and you are good to go!
  • Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

    • With those gigantic fans, and the track record they have, it's probably ok.
    • It's probably fine [imgur.com].

    • by gweihir (88907)

      First, it depends on airflow. That is pretty close to optimal in the design. Second, you can monitor disk temperature and even have an emergency slowdown or shut-off if they overheat. Monitoring and shut-down is easy to script, maybe half a day if you know what you are doing.
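A minimal sketch of the monitoring logic described above, assuming temperatures are read from `smartctl -A` output via SMART attribute 194 (`Temperature_Celsius`), whose raw value sits in the tenth column (some drives report attribute 190 instead). The 45 °C warn and 55 °C shutdown thresholds are illustrative assumptions, not vendor specs; the actual throttle or shutdown action is left to the caller.

```python
import re
from typing import Dict, Optional

# Illustrative thresholds in degrees Celsius (assumptions; check your drive's datasheet).
WARN_C = 45
CRIT_C = 55

def parse_temperature(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 194 (Temperature_Celsius), or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE ...
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None

def decide_action(temps: Dict[str, int]) -> str:
    """Classify the hottest drive: 'ok', 'warn' (e.g. throttle), or 'shutdown'."""
    hottest = max(temps.values(), default=0)
    if hottest >= CRIT_C:
        return "shutdown"
    if hottest >= WARN_C:
        return "warn"
    return "ok"

sample = "194 Temperature_Celsius 0x0002 171 171 000 Old_age Always - 35 (Min/Max 20/46)"
print(parse_temperature(sample))              # 35
print(decide_action({"sda": 35, "sdb": 47}))  # warn
```

In a real script the text would come from something like `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` for each drive, run from cron; the device names here are placeholders.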

    • by demonbug (309515) on Thursday July 21, 2011 @11:47AM (#36835272) Journal

      Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

      According to their blog post about it, they see a variation of ~5 degrees within a unit (middle drives to outside drives) and about 2 degrees from the lowest unit in a rack to the highest. They also indicate that the drives stay within the spec operating temperature range with only two of the six fans in each chassis running.

      Keep in mind these are 5400 RPM drives, not the 10K+ drives you would expect in an application where performance is critical. These are designed for one thing - lots of storage, cheap. No real worries about access times, IOPS, or a lot of the other performance measures that a more flexible storage solution would need to be concerned with. These are for backup only - nice large chunks of data written and (hopefully) never looked at again.

    • Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

      It doesn't take much airflow at all to keep drives down around 35-40C. Even a light breeze can be enough to drop drive temperatures 5-10C. They're only 5-10W devices (for 3.5" drives) which means they're easy to cool in comparison to the 100-200W video cards or the 95-150W CPUs.
  • by gman003 (1693318) on Thursday July 21, 2011 @10:21AM (#36834390)

    The article says it uses RAID 6: 45 hard drives are in the pod, grouped into arrays of 15 that use RAID 6 (the groups being combined by logical volumes), which gives you an actual data capacity of 39TB per group (3TB * (15 - 2) = 39TB), which then becomes 117TB of usable space (39TB * 3 = 117TB). The 135TB figure is what it would be if you used RAID 1, or just used them as normal drives (45 * 3TB = 135TB).

    And these are all "manufacturer's terabytes", which is probably 1,024,000,000,000 bytes per terabyte instead of 1,099,511,627,776 (2^40) bytes per terabyte like it should be. So it's a mere 108 terabytes, assuming you use the standard power-of-two terabyte ("tebibyte", if you prefer that stupid-sounding term).
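The capacity arithmetic from the comment above as a small sketch, assuming the pod layout it describes (45 x 3TB drives in three 15-drive RAID6 groups). Each RAID6 group gives up two drives' worth of space to parity; the raw figure counts every drive with no redundancy.

```python
def raid6_usable_tb(drive_tb: int, drives_per_group: int, groups: int) -> int:
    """Usable capacity in decimal TB: each RAID6 group gives up two drives' worth to P and Q parity."""
    if drives_per_group < 4:
        raise ValueError("RAID6 needs at least 4 drives per group")
    return drive_tb * (drives_per_group - 2) * groups

def raw_tb(drive_tb: int, drives: int) -> int:
    """Raw capacity with no redundancy at all (the headline 135TB figure)."""
    return drive_tb * drives

print(raid6_usable_tb(3, 15, 1))  # 39  (one group)
print(raid6_usable_tb(3, 15, 3))  # 117 (whole pod)
print(raw_tb(3, 45))              # 135
```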

    • Re: (Score:3, Informative)

      by GameboyRMH (1153867)

      A manufacturer's terabyte would be 1,000,000,000,000 bytes.
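With the 10^12-byte definition, converting to power-of-two units is a single division. This sketch assumes the 117TB usable and 135TB raw figures from gman003's comment.

```python
TB = 10**12   # decimal terabyte, as drive manufacturers count it
TIB = 2**40   # binary tebibyte

def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB

print(round(tb_to_tib(117), 1))  # 106.4 (usable RAID6 space)
print(round(tb_to_tib(135), 1))  # 122.8 (raw space)
print(round(tb_to_tib(3), 2))    # 2.73  (one "3TB" drive)
```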

    • by Inda (580031)

      Tell me about it!!!

      £4,561.68 still sounds like a steal. In fact, I might just steal one and save even more!

      I actually spend more than that on food for the family per year. I wonder...

    • Data is also duplicated across different pods so you can lose one due to power supply issues and not care for a while. RAID across local groups of disks does seem a bit pointless when you already have a layer of redundancy across the whole rack.
    • by ari_j (90255)
      Just a small quibble: RAID level 1 would give you a capacity of 3TB with an absurd amount of redundancy. Level 0 is the one that would give you 135TB striped across all 45 disks.
    • There is "cloud storage" management software that would be awesome on these boxes (although they might benefit from a bit more CPU and RAM, and some more gigabit NICs). When I read this blog article yesterday, I immediately went back to OpenStack. The examples for OpenStack Storage don't even bother with RAID, since the objects on the drive will be replicated to multiple other servers automatically. This could be very, very interesting.

      http://www.openstack.org/projects/storage/ [openstack.org]
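To illustrate "objects replicated to multiple other servers automatically", here is a minimal sketch that picks N distinct servers per object with rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration only; OpenStack Swift's actual placement uses its own partitioned ring. The pod names and the replica count of 3 are hypothetical.

```python
import hashlib
from typing import List

def _score(server: str, obj: str) -> int:
    """Deterministic pseudo-random weight for this (server, object) pair."""
    digest = hashlib.sha256(f"{server}/{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place_replicas(obj: str, servers: List[str], replicas: int = 3) -> List[str]:
    """Rendezvous hashing: choose the `replicas` servers with the highest scores for obj."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replica count")
    return sorted(servers, key=lambda s: _score(s, obj), reverse=True)[:replicas]

pods = [f"pod{i:02d}" for i in range(10)]  # hypothetical pod names
print(place_replicas("backups/user42/photo.jpg", pods))
```

Removing a pod only moves the objects that had a replica on it, since every other object's top-scoring servers are unchanged. That is what makes this style of placement attractive when whole pods fail.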

  • by hjf (703092)

    RAID-6, really?
    After 5+ years working with ZFS, personally, I wouldn't touch md/extX/xfs/btrfs/whatever with a 10-foot pole. Solaris pretty much sucks (OpenSolaris is dead and the open source spinoffs are a joke), but for a storage backend it's years ahead of Linux/BSD.

    Sure, you can run ZFS on Linux (I did) and FreeBSD (I do), but for huge amounts of serious data? No thanks.

    • Sure, you can run ZFS on Linux (I did) and FreeBSD (I do), but for huge amounts of serious data? No thanks.

      What do you count as a serious amount of data? And what makes the FreeBSD version inferior in your opinion (aside from being a slightly older version - I think -STABLE now has the latest OpenSolaris release)?

      Genuinely curious: I'm thinking of building a FreeBSD/ZFS NAS and I'd like to know if there's anything in particular that I need to look out for. Performance isn't really important, because most of the time I'll be accessing it over WiFi anyway, which is likely to be far more of a bottleneck than any

  • by QuietLagoon (813062) on Thursday July 21, 2011 @10:33AM (#36834544)
    ... if you really care about the data. ZFS has far more, and more extensive, built-in data integrity checks [oracle.com] than vanilla RAID6 arrays.

    Both FreeBSD [freebsd.org] and FreeNAS [freenas.org], in addition to OpenSolaris [opensolaris.org], support ZFS.

    • by brianwski (2401184) on Thursday July 21, 2011 @01:43PM (#36836454) Homepage

      ... if you really care about the data.

      (Disclaimer: I work at Backblaze) - If you really care about data, you *MUST* have end-to-end application level data integrity checks (it isn't just the hard drives that lose data!).

      Let's make this perfectly clear: Backblaze checksums EVERYTHING on an end-to-end basis (mostly we use SHA-1). This is so important I cannot stress this highly enough, each and every file and portion of file we store has our own checksum on the end, and we use this all over the place. For example, we pass over the data every week or so reading it, recalculating the checksums, and if a single bit has been thrown we heal it up either from our own copies of the data or ask the client to re-transmit that file or part of that file.

      At the large amount of data we store, our checksums catch errors at EVERY level - RAM, hard drive, network transmission, everywhere. My guess is that consumers just do not notice when a single bit in one of their JPEG photos has been flipped -> one pixel gets ever so slightly more red or something. Only one photo changes out of their collection of thousands. But at our crazy numbers of files stored we see it (and fix it) daily.
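A minimal sketch of the checksum-and-scrub idea described above, using SHA-1 from Python's `hashlib`. The fixed chunk size and the in-memory byte strings are assumptions for illustration, not Backblaze's actual layout; the "heal" step (re-fetching from another copy or asking the client to re-send) is only reported as a list of bad chunk indices, not implemented.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4  # tiny for demonstration; a real system would use much larger chunks (assumption)

def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """SHA-1 hex digest of each fixed-size chunk, computed when the data is first stored."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def scrub(stored: bytes, expected: List[str], chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Re-read stored data, recompute checksums, and return indices of chunks that no longer match."""
    actual = chunk_checksums(stored, chunk_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # A length change (truncation or extra data) also counts as corruption of the tail chunks.
    if len(actual) != len(expected):
        bad.extend(range(min(len(actual), len(expected)), max(len(actual), len(expected))))
    return bad

original = b"hello, backblaze!"
sums = chunk_checksums(original)
corrupted = bytearray(original)
corrupted[5] ^= 0x01                  # flip a single bit in chunk 1
print(scrub(bytes(corrupted), sums))  # [1]
```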

    • by rubycodez (864176)
      Except OpenSolaris is dead, better to keep data on a filesystem that runs on living OS. FreeNAS is FreeBSD based, so we're down to one open source OS that supports ZFS.
  • by roman_mir (125474) on Thursday July 21, 2011 @10:34AM (#36834562) Homepage Journal

    When you choose which file system to use, you should consider what the storage is for. If it's to run a database, you may want to rethink the decision to go with a journaling file system, because databases often do their own journaling (like PostgreSQL's WAL), which means performance will suffer if you put a journaling file system underneath. [postgresql.org] Just my 0.0003 grams of gold.

    • They don't run databases on this storage. The ONLY way they access all this storage is via an HTTPS connection to the Tomcat server running on the machine. They have some very, very interesting blog entries about how things scale when you go beyond a handful of servers.

      • by roman_mir (125474)

        That's not why I wrote the comment. I saw that the access is over HTTP; I wrote it because this story is an ad for this company, but it's also talking about building a system like this for your own use, and if you do it for your own use, why would you do HTTP only?

  • Why not use a SAS card?
    Why have three PCIe cards that are only x1, when an x4 or better card with more ports has more PCIe bandwidth, and some even have their own RAID CPU on them?

    Why use a low-end i3 CPU in a $7K system? At least go to an i5, even more so with software RAID.

    • Hardware RAID controllers are stupid in this context. The only place they make sense is in a workstation, where you want your CPU for doing work, and if the controller dies you restore from backups or just reinstall. Using software RAID means never having to try to get a rebuilder software to convert the RAID from one format to another because the old controller isn't available any more, or because you can't get one when you really need one to get that project data out so you can ship and bill.

      • by pz (113803) on Thursday July 21, 2011 @12:02PM (#36835412) Journal

        No. Hardware controllers are the right solution in this context. These pods are not designed for individual users, but for corporations that can afford stockpiles of spare parts, so replacing a board can be done easily. Using hardware controllers allows many more drives per box, and thus per CPU. A populated 6-CPU motherboard is going to be less reliable, dissipate more heat, and require more memory than the special-purpose hardware approach that allows for a single CPU.

        Software RAID makes sense when you have a balance of storage bandwidth requirements to CPU capacity that is heavy on the CPU side. This box is designed for the opposite scenario, as the highly informative blog describes:

        http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/ [backblaze.com]

        (Yes, I know, expecting someone to read the blog would mean that they would have to read the linked article and then click through to the original post, a veritable impossibility. Still, it is recommended reading, especially the part about their experience with failure rates and how they have *one* guy replacing failed drives *one* day per week.)

    • by gman003 (1693318) on Thursday July 21, 2011 @10:47AM (#36834756)

      Because, for this project, raw storage capacity is much more important than performance. Besides, they claim their main bottleneck is the gigabit Ethernet interface - even software RAID, the PCIe x1, and the raw drive performance is less of a limiting factor.

      Yeah, in a situation where you need high I/O performance, this design would be less than ideal. But they don't - they're providing backup storage. They don't need heavy write performance, they don't need heavy read performance. They just need to put a lot of data on a disk and not break anything.

      PS: SAS doesn't really provide much better performance than SATA, and it's a lot more expensive. Same for hardware RAID - using those would easily octuple the cost of the entire system.

      • by nabsltd (1313397)

        Besides, they claim their main bottleneck is the gigabit Ethernet interface - even software RAID, the PCIe x1, and the raw drive performance is less of a limiting factor.

        This is absolutely true. Even with a pair of bonded 1Gb Ethernet connections, it's not nearly enough to keep up with a PCIe x1 in the real world. I'm moving to a single 10Gb connection from each server to an iSCSI SAN because of this.
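Rough arithmetic behind the "gigabit Ethernet is the bottleneck" claim. The link rates are nominal figures (PCIe 1.x: 2.5 GT/s per lane, PCIe 2.0: 5 GT/s, both with 8b/10b encoding); which PCIe generation the pod's SATA cards actually negotiate is an assumption here, and real-world throughput is lower than every one of these ceilings.

```python
def link_mb_per_s(rate_g: float, encoding_efficiency: float = 1.0) -> float:
    """Convert a line rate in Gbit/s (or GT/s) to payload MB/s (10**6 bytes per second)."""
    return rate_g * 1e9 * encoding_efficiency / 8 / 1e6

links = {
    "1 GbE": link_mb_per_s(1),
    "2 x 1 GbE bonded": 2 * link_mb_per_s(1),
    "10 GbE": link_mb_per_s(10),
    "PCIe 1.x x1 (8b/10b)": link_mb_per_s(2.5, 0.8),
    "PCIe 2.0 x1 (8b/10b)": link_mb_per_s(5, 0.8),
}
for name, mb in links.items():
    print(f"{name:22s} {mb:7.0f} MB/s")
```

Under these nominal figures a single PCIe x1 lane matches or exceeds two bonded gigabit links, in line with the comment above.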

    • by gweihir (88907)

      Very simple: best bang for the buck. Your approach just increases cost without any real benefit in the target usage scenario. For example, the i5 is just a waste of money and energy. Hardware RAID drives up cost, and the only "advantage" it has is that it is easier for clueless people to use.

    • An Intel i3 540 is more powerful than the CPU on most hardware RAID controllers. This thing will be doing very little other than handling the RAID sets.
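For a sense of what "handling the RAID sets" asks of the CPU, here is a byte-at-a-time sketch of RAID6's two parity syndromes in the style Linux md uses: P is plain XOR, and Q is a Reed-Solomon syndrome over GF(2^8) with generator 2 and polynomial 0x11D. This is illustrative Python, not md's optimized code; rebuilding two lost blocks needs Q plus GF division, which is omitted here.

```python
from typing import List, Tuple

def gf_mul2(x: int) -> int:
    """Multiply a byte by the generator g = 2 in GF(2^8) with polynomial x^8+x^4+x^3+x^2+1 (0x11D)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF

def raid6_syndromes(data: List[bytes]) -> Tuple[bytes, bytes]:
    """Compute P (XOR) and Q (sum of g^i * D_i) across equal-length data blocks D_0..D_{n-1}."""
    length = len(data[0])
    p = bytearray(length)
    q = bytearray(length)
    for offset in range(length):
        p_byte = q_byte = 0
        # Horner's rule from the highest-index block down yields Q = sum(g^i * D_i).
        for block in reversed(data):
            p_byte ^= block[offset]
            q_byte = gf_mul2(q_byte) ^ block[offset]
        p[offset] = p_byte
        q[offset] = q_byte
    return bytes(p), bytes(q)

def recover_one_with_p(data: List[bytes], missing: int, p: bytes) -> bytes:
    """Rebuild a single lost data block by XORing P with all surviving blocks."""
    out = bytearray(p)
    for i, block in enumerate(data):
        if i != missing:
            for j, b in enumerate(block):
                out[j] ^= b
    return bytes(out)

blocks = [b"abc", b"def", b"ghi"]
p, q = raid6_syndromes(blocks)
print(recover_one_with_p([b"abc", b"\x00\x00\x00", b"ghi"], 1, p))  # b'def'
```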

  • by Jim Ethanol (613572) on Thursday July 21, 2011 @10:41AM (#36834664) Homepage

    If you're in the SF Bay Area check out http://geeksessions.com/ [geeksessions.com] where Gleb Budman from Backblaze will be speaking about the Storage Pod and their approach to Network & Infrastructure scalability along with engineers from Zynga, Yahoo!, and Boundary. This event will also have a live stream on geeksessions.com.

    Full Disclosure: This is my event.

    50% discount to the event (about $8 and free beer) for the Slashdot crowd here: http://gs22.eventbrite.com/?discount=slashdot [eventbrite.com]

    • by gpuk (712102)

      Hi Jim

      I'm quite a few timezones East of you, meaning the live stream will start at 0300 local on Wednesday for me. I'm willing to tough it out and stay up to watch it if necessary but it would be much more civilised if I could watch a playback. Will it be available for download later or is it live only?

      It sucks I've only just learnt about geeksessions :( Some of your earlier events look awesome

  • Original blog post (Score:5, Informative)

    by Baloroth (2370816) on Thursday July 21, 2011 @10:44AM (#36834708)

    Here [backblaze.com] is a link to Backblaze's actual blog entry for the new 135TB pods, and here [backblaze.com] is the one for the original 67TB pods. The blog article is actually quite fascinating. Apparently they are employee owned, use entirely off-the-shelf parts (except for the case, it looks like), and recommend Hitachi drives (Deskstar 5K3000 HDS5C3030ALA630) as having the lowest failure rate of any manufacturer (less than 1%, they say).

    I found it kinda amusing that ext4's 16TB volume limit was an "issue" for them. Not because it's surprising, but because... well, it's 16TB. The whole blog post is actually recommended reading for anyone looking to build their own data pods like this. It really does a good job of showing their personal experience in the field and which problems they did and didn't run into. For instance: apparently heat isn't an issue, as 2 fans are able to keep an entire pod within the recommended temperature (although they actually use 6). It'll be interesting to see what happens as some of their pods get older, as I suspect that their failure rate will get pretty high fairly soon (their oldest drives are currently 4 years old; I expect failures to become much more common when they hit 5-6 years). All in all, pretty cool. Oh, and it shows how much Amazon and Dell price-gouge, but that shouldn't really shock anyone. Except the amount. A petabyte for three years is $94,000 with Backblaze, and $2,466,000 with Amazon.

    P.S. I suspect they use ext4 over ZFS because ZFS, despite the built in data checks, isn't mature enough for them yet. They mention they used to use JFS before switching to ext4, so I suspect they have done some pretty extensive checking on this.
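Where the 16TB figure comes from, as a quick sketch: assuming ext4 with 4 KiB blocks and 32-bit block numbers (that is, without the `64bit` feature), the maximum filesystem size is 2^32 blocks x 4 KiB = 16 TiB. Applying that to one 15-drive RAID6 group of 3TB drives (39 decimal TB usable) shows why such a group cannot be a single ext4 volume under that limit; how Backblaze actually split their volumes is not stated here.

```python
import math

BLOCK_SIZE = 4096   # bytes; a common ext4 block size (assumption)
MAX_BLOCKS = 2**32  # 32-bit block numbers, i.e. ext4 without the 64bit feature (assumption)
TIB = 2**40

def max_ext4_bytes(block_size: int = BLOCK_SIZE, max_blocks: int = MAX_BLOCKS) -> int:
    """Largest filesystem addressable with the given block size and block-number width."""
    return block_size * max_blocks

def filesystems_needed(volume_bytes: int) -> int:
    """How many separate filesystems a volume must be split into to stay under the limit."""
    return math.ceil(volume_bytes / max_ext4_bytes())

print(max_ext4_bytes() / TIB)           # 16.0
print(filesystems_needed(39 * 10**12))  # 3 (one 15-drive RAID6 group of 3TB drives)
```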

  • by savanik (1090193) on Thursday July 21, 2011 @10:51AM (#36834810)

    With the latest bandwidth caps I'm seeing on my provider (AT&T U-verse), I can download data at a rate of 250 GB per month. So it'll take me 45 YEARS to fill up that 135 TB array. Something tells me they'll have better storage solutions by then.

    In the meantime, I'm just waiting for Google to roll out the high-speed internet in my locale next year - maybe then I'll have a chance at filling up my current file server.
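The fill-time arithmetic above as a one-liner, assuming decimal units (1 TB = 1,000 GB) and that the entire monthly cap goes to filling the pod.

```python
def years_to_fill(capacity_tb: float, cap_gb_per_month: float) -> float:
    """Years needed to fill capacity_tb (decimal TB) when downloads are capped at cap_gb_per_month."""
    months = capacity_tb * 1000 / cap_gb_per_month
    return months / 12

print(years_to_fill(135, 250))  # 45.0 (raw capacity)
print(years_to_fill(117, 250))  # 39.0 (usable RAID6 capacity)
```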

    • Crazy enough, you can actually *buy* content instead of downloading it from Pirate Bay.

    • by pz (113803)

      These pods are not intended for the individual user. Your ability to saturate a home pipe without filling up 135 TB isn't relevant.

  • I did something a bit similar on a smaller scale about 9 years ago (Linux software RAID, 12 disks in a cheap server). The trick is to make sure that you pay something like 70% of the total hardware cost for the disks. It is possible, it can be done reliably, but you have to know what you are doing. If you are not a competent and enterprising engineer, forget it (or become one). But the largest cost driver in storage is that people want to buy storage pre-configured and in a box that they do not need to unde

    • But the largest cost driver in storage is that people want to buy storage pre-configured and in a box that they do not need to understand. This is not only very expensive, (when I researched this 9 years ago, disk part of total price was sometimes as low as 15%!), but gives you lower performance and lower reliability. And also less flexibility.

      You ain't kidding. I have installed systems for people that cost hundreds of thousands of dollars and they can't even give me basic information in order to complete the install. How many disks to each head? No idea. How big do you want your RAID groups? No idea. Excuse me sir, this IP and gateway are in different subnets, can I have another? That last one has actually happened more than once.

  • hmm.

    What the hell else is Sean doing with his time? That's what the articles are really missing...

