Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!


Forgot your password?
Data Storage Hardware

Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?) 219

An anonymous reader writes: My workplace has recently had two internal groups step forward with a request for almost a half-petabyte of disk to store data. The first is a research project that will computationally analyze a quarter petabyte of data in 100-200MB blobs. The second is looking to archive an ever increasing amount of mixed media. Buying a SAN large enough for these tasks is easy, but how do you present it back to the clients? And how do you back it up? Both projects have expressed a preference for a single human-navigable directory tree. The solution should involve clustered servers providing the connectivity between storage and client so that there is no system downtime. Many SAN solutions have a maximum volume limit of only 16TB, which means some sort of volume concatenation or spanning would be required, but is that recommended? Is anyone out there managing gigantic storage needs like this? How did you do it? What worked, what failed, and what would you do differently?
This discussion has been archived. No new comments can be posted.

Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)

Comments Filter:
  • It's all going to get backed up.

  • ceph (Score:3, Informative)

    by Drew Matthews ( 4197401 ) on Saturday July 25, 2015 @03:23PM (#50181937)
    we use Ceph, its fast, redundant, and crazy scalable, oh did i mention free (paid support)? ceph.com
    • we use Ceph, its fast, redundant, and crazy scalable, oh did i mention free (paid support)? ceph.com

      Personally I've been using Ceph for the last few years myself. It has to be one of the best DFS's I've ever used. It includes security, speed, easy to expand by adding additional nodes. The free part was great. I found it looking through the repos one day. You can even tie it into other projects such as Hadoop (at least I recall reading it had a plug in a couple years ago).

      Great product!

  • by Old VMS Junkie ( 739626 ) on Saturday July 25, 2015 @03:25PM (#50181943)
    Honestly, you should talk to the pros. I would call a couple of storage vendors, give them the basic outline of what you want to do, and let them tell you how they would do it. You can even get more formal and issue a Request for Information (RFI) or even a Request for Quote (RFQ). If you're a biggish company, your purchasing people probably have an SOP and standard forms for how to issue an RFI/RFQ. For the big boy storage vendors, half a petabyte is commonplace. The bigger question may very well be what this is going to look like at a software level. Managing the data might be a bigger challenge than storing it. Is this going to be organized in some sort of big data solution like Hadoop? Is it just a whole bunch of files and a people are going to write R or SAS jobs to query against it? Sometimes the tool set that you want to use will drive your choices in how to build the infrastructure under it.
    • Re:Talk to Vendors (Score:5, Informative)

      by Anonymous Coward on Saturday July 25, 2015 @03:41PM (#50182007)

      Honestly, that's the WORST thing to do. When you talk to the pros, they will try and sell you some outrageous overpriced Fiber Channel system that's total overkill for what you are doing. I've worked with 'big data' storage companys like EMC and Netapp. We needed 300TB of 'nearline' storage, and EMC came up with a $3,000,000.00 TOTAL overkill Fiber Channel solution, and Netapp wasn't much better, coming in at close to $2,000,000.00. Total ripoff. The ONLY reason you would ever choose Fiber Channel over ISCSI is if you are doing HUGE transactional database, with millions of access per minute. If you just need STORAGE, I went with Synology, and got 300TB of RAID-10 storage for about 100K. I DUPLICATED it (200K total), and still only paid 10% of what the 'vendors' tried to sell me, I was VERY clear that I did not need Fiber Channel, I refused to spend tons of money for something that would have zero bearing on the performance, and found it's much better to research and provide your own solution at 10% of the cost of the big vendors. Why do you think EMC has almost 3Billion of revenue, because they convince pointy haired bosses that their solution is the best. Trust me, going with a 2nd tier vendor for 'near line storage' is a much better idea than talking to the 'big 5' to ask for a solution

      • LOL at FC only for transactional DBs.

        Also, VM environments. Large media servers, etc.

        My solution:
        Infortrend. It has iSCSI for you and your slow environment, and FC for me and my fast environment. And cheap enough for both.

        Also, 300TB of RAID 10 at $100k is most likely 7k rpm. I much prefer 15k as it's performant for VMs even when full of running VMs. 7k drives never will be. Well, maybe if you put a nice fat SSD cache in front of it.

        • I put my data inside XML files, split the fields with CSV and store all of it on 4200RPM laptop drives that automatically go to sleep after a few minutes of inactivity.

          Oh, and I backup all of that data on punched tape once per year.

          • by AK Marc ( 707885 )
            I know you were making a joke, but 4200 RPM laptop drives are great. You'll have trouble finding a lower power usage spinner, and the read speed will be roughly interface speed for most practical implementations of multi-drive arrays.
            • by ihtoit ( 3393327 )

              Seconded. I use laptop EIDE drives for my network scratch - it's great, the array runs at saturation for my Gigabit network. And at 2TB, that volume isn't too shoddy on usable space either.

              For archival storage (for some measure of permanent to not include removable tape) I use huge drives in quick-release caddies and set to JBOD and simply diff the data daily. Once the drive's full, out it comes and in goes the next empty. Full drive goes offsite. Working volume is around 14TB right now, that's a RAID6. All

            • by Bengie ( 1121981 )
              I saw a ZFS benchmark comparing random read, write, read+write, and sequential read, write, and read+write of a 15k RPM RAID and 5400 RPM with 10x as much storage but just as many spindles for a fraction the price, and the 5400 RPM setup was faster once the 64GB of SSDs got warmed up.
        • by mlts ( 1038732 )

          Oracle has a SAN (well, SAN/NAS) offering which does similar with a rack of ports/HBAs that were configurable, assuming the right SFP was present. Want FC? Got it. iSCSI? Yep. FCoE? Yep. Want to just share a NFS backing store on a LAG for a VMWare backing store. Easy doing.

          The price wasn't that shocking either. It wasn't dirt cheap like a Backblaze storage pod, but it was reasonable, especially with SSD available and autotiering.

      • by jbolden ( 176878 )

        Netapp provides performance storage. If you don't want performance and only want part of their solution they can virtualize the software and run on anyone's hardware. You can be down around $12k / mo for 300TB duplicated 1x with their software. Nowhere near $3m.

      • Talking to the pros is only the worst thing to do if you know as much, if not more, than they do. The fact that the OP is asking slashdot indicates he does not know a lot about setting up storage in the PB range. Are the major vendors overpriced? In terms of the hardware you get, probably. In terms of the knowledge they bring to the table, probably NOT in the case of the OP. If you have someone who can select COTS components and effectively couple them with some good OS/SW, great. Otherwise, get someo
      • by drsmithy ( 35869 )

        RAID10 for nearline storage ?

        More research required, methinks.

      • We have actually purchased a NetApp cluster, replicated in two sites, and while I can't divulge what we paid (Plus I'm just the guy who set it up), there's a good chance the parent is off by almost an order of magnitude. Now – I'm not saying you couldn't build your own storage cheaper, or that I have my own issues with NetApp, or that some sort of Cloud solution might not be an even better answer- such as Amazon S3 or Glacier, I will say that a SAN is not at all a bad idea and depending on how imp
    • A well written RFI sent to some vendors should give you an overview of what is available and at what cost.

      As you need file level access you should talk to NAS vendors, like Netapp or ENC Isilon. They will certainly have storage boxes for you. You'll have to fit a backup solution to your storage box too, this is work and adds cost.

      If you think this may grow, then look at scalability. Not all solutions scale. Also you may end up with millions of files, this may be problemantic to some backup solutions.

      I ha

  • by snowgirl ( 978879 ) on Saturday July 25, 2015 @03:28PM (#50181953) Journal

    At Facebook, it's memcached, with an HDD backup, eventually put onto tape...

    At Google, it's a ramdisk, backed up to SSD/HDD, eventually put onto tape...

    For anyone who can't afford half a petabyte of RAM with the commensurate number of computers? I have no good ideas... except maybe RAM cache of SSD, cache of HDD, backed up on tape...

    Using something like HDFS to store your data in a Hadoop cluster of file requests, is likely the best F/OSS solution you're going to get for that...

    • by tsetem ( 59788 )

      Thumbs up on HDFS. The next question to ask your groups how they will be analyzing it. HDFS (and Hadoop/Spark/Whatever) will hopefully fit in nicely there. Not only will your data be redundantly copied across multiple systems, but as your data needs (and cluster) grows, so does your computational power.

      Getting data in & out can be done via Java API, Rest API, FUSE or NFS Mounts. The only issue is that HDFS doesn't play well with small files, but hopefully your groups will be using large files instea

  • by NFN_NLN ( 633283 ) on Saturday July 25, 2015 @03:29PM (#50181955)

    This project must have an unrealistically low budget, otherwise there are quite a few Enterprise solutions that will do all OR a combination of these tasks.

    > how do you present it back to the clients?
    Look at a NAS, not a SAN. ie NetApp or 3Par C series.

    > And how do you back it up?
    Disaster Recovery replication to another system or hosted services. NetApp, EMC, 3Par, etc, etc

    > Many SAN solutions have a maximum volume limit of only 16TB
    NetApp Infinite volumes limit is 20PB

    You can contact a sales person from any of those companies to answer any of these questions.

    • Yeah, the 16TB limit says OP is looking at VERY low end solutions. As in not feasible for petabyte range projects.

      • Not necessarily, HP 3Par 20850 scales to 4 PB of SSD (raw, 15+ PB with dedupe) and 3.2 million sub 1ms IOPS, and 75GB/s of throughout but one LUN is still limited to 16TB because not enough customers need more than that it one logical disk to change underlying code.

    • Replication is not backup. I cannot stress this enough.

      I know of major companies that depended on replication and ignored backup, and then the original copy gets corrupted and the corruption gets replicated to the recovery sites.

      Now if you're doing SAN snapshots, and replicating those, then you might be covered, but mounting one of those snaps, and recovering some portion of your data, can be a real pain in the behind.

  • Seriously. Call ixsysyems. They specialize in this stuff and they use ZFS.
  • The research projects I've seen using that amount of storage has usually used a tape solution with dCache in front of it. You use a number of tape robots filled with tape, put them in different locations and have them back up everything between them.
    • Just realized I was a few digits off, saw that you said 0.5 PB. Somehow got it to 500 PB. Not that dCache isn't going to handle it, it will. But for as little data as just 0.5 PB a couple of disk arrays connected to a single server will usually be fine. Tape is still good for backup though.
  • If you want to keep your data on-site, unless your already have a lot of the infrastructure that you can leverage the path of least resistance is to use something like a NetApp Filer.

    For backups it can create snapshots on a schedule (hourly/daily/weekly), then either replicate them to a second physical storage unit (hopefully at a different site) or present them to your backup solution.

    Using the file services on the NetApp will also provide a solution to your "how do I present it to the storage consumers" q

  • by Anonymous Coward

    Something like storage pods? https://www.backblaze.com/blog/storage-pod/

  • You could look into Lustre, although it would change your hardware configuration a bit (its not a SAN) Depending on your configuration and desired redundancy, this will affect costs a bit (i.e.. more luster nodes).

    You could by a traditional SAN and tie it all together with fibre, though you'd need a clustered file system like Stornext, or another commercial CFS, or even GFS if you prefer open source. This would help solve your traversal of the system as a regular directory structure issue.

    Best bet for bac

  • SanDisk's Infiniflash [sandisk.com] is 512TB in a 3U chassis that is SAS-connected. You can front this with something like DataCore's SANsymphony [datacore.com] to turn it into a NAS/SAN appliance.

    The pricing looks to be around $1/GB, which is a ton cheaper than building a SAN of that capacity, plus it's much smaller in power/space/cooling.

    • $1/GB, which is a ton cheaper than building a SAN of that capacity,

      The marginal price of HDD storage is about $0.05/GB. Maybe double that for higher density, maybe double it again for redundancy. That's a maximum of $0.2/GB for the disks. There's some fixed overhead for a large disk farm plus some more per-byte overhead for the controllers and interconnects. Hard to believe that really adds up to much more than $1/GB. We're talking half a million dollars for 500TB.

      Daydream on. Big cluster of mid-tower PCs. Six 4TB drives per tower, for a total of 20TB with 1:6 redundancy.

      • by hjf ( 703092 )

        now factor in the cost of maintaining spinning disks, powering them, cooling them, and datacenter space....

  • Let's start growing brains in jars.

  • by MetricT ( 128876 ) on Saturday July 25, 2015 @03:44PM (#50182029)

    What clients will you be exporting it to? Linux, OS X, Windows? All three?

    What kind of throughput do you need? Is 10 MB/sec enough? 100 MB/sec? 10 GB/sec?

    What kind of IO are you doing? Random or sequential? Are you doing mostly reads, mostly writes, or an even mix?

    Is it mission critical? If something goes wrong, do you fix it the next day, or do you need access to a tier 3 help desk at 3 am?

    We have a couple of petabytes of CMS-HI data stored on a homegrown object filesystem we developed and exported to the compute nodes via FUSE. Reed-Solomon 6+3 for redundancy. No SAN, no fancy hardware, just a bunch of Linux boxes with lots of hard drives.

    There is no "one shoe fits all" filesystem, which is part of the reason we use our own. If you have the ability to run it, I'd suggest looking at Ceph. It only supports Linux, but has Reed-Solomon for redundancy (considered it a higher tier of RAID) and good performance if you need it. If you have to add Windows or OS X clients into the mix, you may need to consider NFS, Samba, WebDAV, or (ugh) OpenAFS.

    • by rev0lt ( 1950662 )
      It is funny, I've read many comments since the top of the page, and finally someone is actually asking for requirements. At this point, its buried at the middle of the scrollbar. And yet, someone blames slashdot moderation. I blame the users.
  • by tlambert ( 566799 ) on Saturday July 25, 2015 @03:44PM (#50182031)

    You're asking like you will be implementing it... don't.

    Gather all their requirements, gather your requirements on top of it (I'm pretty confident that some of those requirements were your additions for "you'd be an idiot to have that, but not also have this...", possibly including the backup).

    Then put out an Preliminary RFP to the major storage vendors, including asking them what they'd say you'd missed in the preliminary.

    Then take the recommendations they make on top of the preliminary with a grain of salt, since most of them will be intended to insure vendor lock-in to their solution set, revise the preliminary, and put out a final RFP.

    Then accept the bid that you like which management is willing to approve.

    Problem solved.

    P.S.: You don't have to grow everything yourself from seed you genetically modify yourself, you know...

  • by Anonymous Coward

    Unless you REALLY want to pay for it.

    As someone who works in a Hospital system, Imaging Informatics specifically, we have roughly that much data spread across 2 locations. Backups aren't what you think they are. We backup the infrastructure config. Databases, VM cluster config and VM's, which compressed, probably equates to 5-10 Terabytes. That's it. That's the stuff which, if worst possible event happened, we wouldn't be exctly back to 0 when we rebuilt.

    As for the 400-500 Terabytes of data, they're in what

  • by Anonymous Coward

    Backblaze blog has a rundown of their storage pod https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/

    This with something like gluster, luster, cephe or even just nfs.

  • Backblaze is an online backup provider. They have open sourced some of their software and hardware designs.

    They are currently storing over 150 Petabytes of user data. https://www.backblaze.com/blog/150-petabytes-of-cloud-storage/
    They are working on scalability into the Zettabyte range https://www.backblaze.com/blog/vault-cloud-storage-architecture/
    They have open sourced their hardware design for anyone to use. https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/

    They also looked into usin

  • Easy (Score:5, Funny)

    by ArcadeMan ( 2766669 ) on Saturday July 25, 2015 @03:53PM (#50182069)

    How Do You Store a Half-Petabyte of Data? (And Back It Up?)

    That's the easiest question I've ever seen.

    1. Wait about a decade or so.
    2. Buy two half-petabyte flash drives.
    3. Alternate your copies on the two flash drives, the previous one becomes your backup.


  • Step 1: buy a metric shitton of storage space (virtual or physical)
    Step 2: put your data on it
    Step 3: ???
    Step 4: profit

  • If you have a small budget and moderate reliability requirements, I'd suggest looking into building a couple Backblaze-style storage pods for block store (5x 180TB storage systems, apx $9000 each), each exporting 145TB RAID5 volumes via iSCSI to a pair of front-end NAS boxes. NAS boxes could be FreeBSD or Solaris systems offering ZFS filestores (putting multiples of 5 volumes, one from each blockstore, together in RAIDZ sets), which then export these volumes via CIFS or NFS to the clients. Total cost for storage, front-ends, 10GbE NICs and a pair of 10GbE switches: $60K, plus a few weeks to build, provision, and test.

    If you have a bigger budget, switch to FibreChannel SANs. I'd suggest a couple HP StorServ 7450s, connected via 8 or 16Gb FC across two fabrics, to your front ends, which aggregate the block storage into ZFS-based NAS systems as above, implementing raidz for redundancy. This would limit storage volumes to 16TB each, but if they're all exposed to the front ends as a giant pool of volumes, then ZFS can centrally manage how they're used. A 7450 filled with 96 4TB drives will provide 260TB of usable volume space (thin or thick provisioned), and cost around $200K-$250K each. Going this route would cost $500-$550K (SANs, plus 8 or 16Gb FC switches, plus fibre interconnects, plus HBAs) but give you extremely reliable and fast block storage.

    A couple advantages of using ZFS for the file storage is its ability to migrate data between backing stores when maintenance on underlying storage is required, and its ability to compress its data. For mostly-textual datasets, you can see a 2x to 3x space reduction, with slight cost in speed, depending on your front-ends' CPUs and memory speed. ZFS is also relatively easy to manage on the commandline by someone with intermediate knowledge of SAN/NAS storage management.

    Whatever you decide to use for block storage, you're going to want to ensure the front-end filers (managing filestores and exporting as network shares) are set up in an identical active/standby pair. There's lots of free software on linux and freebsd that accomplish this. These front-ends would otherwise be your single-point-of-failure, and can render your data completely unusable and possibly permanently lost if you don't have redundancy in this department.

  • They'll be happy to talk to you for free, for the prospect of getting their hands on that kind of cash. You're easily looking at $.5M-$1M between storage, processing, and redundancy.
  • Sounds like you need the storage onsite at least for the research project.

    The mixed media thing sounds like something to throw at the cloud unless there's a reason not to do that.

    As to spanning volumes etc... I don't really understand the file structure of this research project. Having a petabyte of data in a single directory is typically the opposite of good ideas.

    I'd like more information.

    As to back ups... it depends on how frequently the information changes. Backup tapes are probably the cheapest way to

  • You will not get a good answer here, because even if there would be one it will be hard to find between all the nonsense.

    BTW your scenario is incomplete and therefore it is unlikely to give a good answer. It looks a little bit like you want /. to make your homework.

  • by d3vi1 ( 710592 ) on Saturday July 25, 2015 @04:27PM (#50182201)

    You're not asking the right questions:

    The first correct question is why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It never is more than 10% on such a large dataset. Everything else would be better placed on tape. Tiered storage is the answer to the first question. You have RAM, solid/flash storage (PCI based), fast disks, slow high capacity disks and tape. Choose your tiering wisely.

    The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata in parallel with that data. For Petabytes of data you cannot in most cases just use an intelligent tree structure. You need a web-site or an app to search that data and get the required "blob". For such an app you need a large database since you have 5M objects with searchable metadata (at 200MB/blob).

    The third question is why do you have SAN as a premise? Do you want to put a clustered filesystem with 5-10 nodes? Probably Isilon or Oracle ZS3-2/ZS4-4 are your answer.

    Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)

    Fifth question: There is no such thing as 100% availability. The term disaster in Disaster Recovery is correctly placed. Set reasonable SLA expectations. If you go for five-nine availability it will triple the cost of the project. Keep in mind that synchronous replication is distance limited. Typically, for a small performance cost, the radius is 150 miles and everything above impacts a lot.

    Even if you solve the problems above, if you want to share it via NFS/CIFS or something else you're going to run into troubles. Since CIFS was not realistically designed for clustered operation regardless of the distributed FS underneath the CIFS server, you get locking issues. Windows Explorer is a good example since it creates thumbs.db files, leaves them open and when you want to delete the folder you cannot unless you magically ask the same node that was serving you when it created the Thumbs.DB file. Apparently, the POSIX lock is transferred to the other server and stops you from deleting, but when Windows Explorer asks the other node who has the lock on the file you get screwed since the other server doesn't know. Posix locks are different from Windows locks. It affects all Likewise based products from EMC (VNX filler, Isilon, etc.) and it also affects the CIFS product from NetApp. I'm not sure about Samba CTDB though.
    I would design a storage based on ZFS for the main tiers, exported via NFSv4 to the front-end nodes and have QFS on top of the whole thing in order to push rarely accessed data to Tape. The fronted nodes would be accessed via WebDAV by a portal in which you can also query the metadata with a serious DB behind it.

    I've installed Isilon storage for 6000 xendesktop clients that all log-on at 9AM, i've worked on an SL8500, Exadata, various NetApp and Sun storages and I can tell you that you need to do a study. Have simulations with commodity hardware on smaller datasets to figure out the performance requirements and optimal access method (NAS, Web, etc.). Extrapolate the numbers, double them and ask for POC and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that in the future, when you'll need 2PB you can expand in an affordable manner. Take care since vendors like IBM tend to use the least upgradable solution. They will do a demo with something that can hold 0,6PB in their max configuration and if you'll need to go larger you'll need a brand new solution from another vendor.

    It's not worth doing it yourself since it will be time-consuming (at least 500 man-hours until production) and with at least 1 full-time employees for the storage. But if you must, look at Nexenta and the hardware that they recommend.

    And remember to test DR failover scenarios.

    Good luck!

    • by radish ( 98371 )

      The first correct question is why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It never is more than 10% on such a large dataset.

      Never say never. We have data sets several times larger than that which are 100% always online due to client access patterns. Not only online, but extremely latency critical. And I personally could name a dozen other companies with similar requ

  • Library storage sounds like that may be your best choice. Several high end vendors sell such systems and may need to have RFS and RFQ's submitted, not to mention seeing the systems in action. This is not going to be cheap, but it's best on the long term investment. Ensure that it is scalable and can handle any future expansions without investing in whole new kit or that will simply put your department back to square one.

  • On a SAN the 16tb limit comes generally from 32 bit SANs the 64 bit SANs wouldn't have it. Plenty of SAN solutions can handle 500tb or 10x that much. So just upgrade. If you only want backup there are plenty of hardware backup devices that handle this. For example exagrid scales to I believe 300tb / hr much less 500tb total. This isn't gigantic in today's world. You just need to have a conversation with your vendor, or an agent. You aren't asking for anything abnormal or challenging.

  • For high throughput/IOPS requirements build a Lustre/Ceph/etc. cluster and mount the cluster filesystems directly on as many clients as possible. You'll have to set up gateway machines for CIFS/NFS clients that can't directly talk to the cluster, so figure out how much throughput those clients will need and build appropriate gateway boxes and hook them to the cluster. Sizing for performance depends on the type of workload, so start getting disk activity profiles and stats from any existing storage NOW to figure out what typical workloads look like. Data analysis before purchasing is your best friend.

    If the IOPS and throughput requirements are especially low (guaranteed < 50 random IOPS [for RAID/background process/degraded-or-rebuilding-array overhead] per spindle and what a couple 10gbps ethernet ports can handle, over the entire lifetime of the system) then you can probably get away with just some SAS cards attached to SAS hotplug drive shelves and building one big FreeBSD ZFS box. Use two mirrored vdevs per pool (RAID10-alike) for the higher-IOPS processing group and RAIDZ2 or RAIDZ3 with ~15 disk vdevs for the archiving group to save on disk costs.

    Plan for 100% more growth in the first year than anyone says they need (shiny new storage always attracts new usage). Buy server hardware capable of 3 to 5 years of growth; be sure your SAS cards and arrays will scale that high if you go with one big storage box.

  • Buy a Storage Pod (Score:4, Informative)

    by Areyoukiddingme ( 1289470 ) on Saturday July 25, 2015 @05:00PM (#50182341)

    Buy Storage Pods, designed by BackBlaze. You can get 270TB of raw storage in 4U of rackspace for $0.051 per gigabyte [backblaze.com]. Total cost for half a petabyte of raw storage: $27,686. To back it all up cheaply but relatively effectively, buy a second set to use as a mirror. $55,372. For use with off-the-shelf software (FreeNAS running ZFS or Linux running mdm RAID) to present a unified filesystem that won't self-destruct when a single drive fails, you'll need to over-provision enough to store parity data. Go big or go home. Just buy another pod for each of the primary and the backup sets. Total of 6 pods with 1620TB of raw storage: $83,058. Some assembly required. And 24U of rackspace required, with power and cooling and 10Gbe ethernet and UPSs (another 4-8U of rackspace).

    Expect a ballpark price of something a little under $100,000 that will meet your storage requirements with sufficient availability and redundancy to keep people happy. It will require 2 racks of space, and regular care and feeding. Do the care and feeding in house. A support contract where you pay some asshole tens of thousands of dollars a year to show up and swap drives for you is a waste of money. Bearing that in mind, as other posters have said, talk to storage vendors selling turnkey solutions. Come armed with these numbers. When they bid $1 million, laugh in their faces. But there's an outside chance you'll find a vendor with a price that is something less than hyperinflated. Stranger things have happened.

    If you don't generate data very quickly, you can ease into it. For around $35,000, you can start with just 2 pods and the surrounding infrastructure, and add pods in pairs as necessary to accommodate data growth. Add $27,000 in 2 chassis next year to double your space. Add $26,000 of space again in 2017 and increase your raw capacity another 50%. (Total storage cost using BackBlaze-inspired pods is dominated by hard drive prices, which trend downwards.) When you find out your users underestimated growth, another $25,000 of space in 2018 takes you to somewhere in the neighborhood of 2 petabytes of raw storage, that you're using with double parity and 100% mirrored backup for a total effective useable space of approximately 918TB. You'll be replacing 2-3 drives per year, starting out, and 0-1 after infant mortality has run its course. Keep extras in a drawer and do it yourself in half an hour each on a Friday night. If you configured ZFS with reasonably sized vdevs, (3-5 devices) the array rebuild should be done by Monday morning. By 2020, you'll be back up to replacing 2-3 drives per year again as you climb the far side of the bathtub curve. While you're at it, you can seriously consider replacing whole vdevs with larger capacity drives, so your total useable space can start to creep up over time, without buying new chassis. By 2025, you will have 8 chassis in two racks hosting 2.88PB of raw storage space that's young and vital and low maintenance, having spent roughly $200,000.

    A bargain, really.

  • Super-Micro has 36 and 72 drive racks that aren't horrible human effort wise (you can get 90 drive racks, but I wouldn't recommend it). You COULD get 8TB drives for like 9.5 cent / GB (including the $10k 4U chassi overhead). 4TB drives will be more practical for rebuilds (and performance), but will push you to near 11c / GB. You can go with 1TB or even 1/2TB drives for performance (and faster rebuilds), but now you're up to 35c / GB.

    That's roughly 288TB of RAW for say $30k 4U. If you need 1/2 PB, I'd s
  • Lucky (?) for you, I just went through purchasing a storage refresh for a cluster, as we're planning to move to a new building and no one trusts the current 5 year old solution to survive the move (besides which, we can only get 2nd hand replacements now). The current system is 8 shelves of Panasas ActiveStor 12, mostly 4 TB blades, but the original 2-3 shelves are 2 TB blades, giving about 270 TB raw storage, or about 235ish TB in real use. The current largest volume is about 100 TB in size, the next-largest is about 65 TB, with the remainder spread among 5-6 additional volumes including a cluster-wide scratch space. Most of the data is genomic sequences and references, either downloaded from public sources or generated in labs and sent to us for analysis.

    As for the replacement...

    I tried to get a quote from EMC. Aside from being contacted by someone *not* in the sector we're in, they also managed to misread their own online form and assumed that we wanted something at the opposite end of the spectrum from what I requested info on. After a bit of back and forth, and a promise to receive a call that never materialized, I never did get a quote. My assumption is they knew from our budget that we'd never be able to afford the capacities we were looking for. At a prior job, a multi-million dollar new data center and quasi-DR site went with EMC Isilon and some VPX stuff for VM storage/migration/replication between old/new DCs, and while I wasn't directly involved with it there, I had no complaints. If you can afford it, it's probably worth it.

    The same prior job had briefly, before my time there, used some NetApp appliances. The reactions of the storage admins wasn't all that great, and throughout the 6 years I was there, we never could get NetApp to come in to talk to us whenever we were looking for expansion of our storage. I've had colleagues swear by NetApp though, so YMMV.

    I briefly looked at the offerings from Overland Storage (where we got our current tape libraries), on the recommendation of the VAR we use for tapes & library upgrades. It looked promising, but in the end, we'd made a decision before we got most of those materials...

    What we ended up going with was Panasas, again. Part of it was familiarity. Part of it was their incredible tech support even when the AS12 didn't have a support contract (we have a 1 shelf AS14 at our other location for a highly specialized cluster, so we had *some* support, and my boss has a golden tongue, talking them into a 1-time support case for the 8 shelf AS12). We also have a good relationship with the sales rep for our sector, the prior one actually hooked us up with another customer to acquire shelves 6-8 (and 3 spares), as this customer was upgrading to a newer model. Based on that, we felt comfortable going with the same vendor. We knew our budget, and got quotes for three configurations of their current models, ActiveStor 14 & 16. We ended up with the AS16, with 8 shelves of 6 TB disk (x2) and 240 GB SSD per blade (10 per, plus a "Director Blade" per). Approximate raw storage is just a bit under 1 PB (roughly 970-980 TB raw for the system).

    In terms of physical specs, each shelf is 4U, have dual 10 GbE connections, and adding additional shelves is as easy as racking them and joining them to the existing array (I literally had no idea what I was doing when we added shelves on the current AS12, it just worked as they powered on). Depending on your environment, they'll support NFS, CIFS, and their own PanFS (basically pNFS) through a driver (or Linux kernel module, in our case). We're snowflakes, so we can't take advantage of their "phone home" system to report issues proactively and download updates (pretty much all vendors have this feature now). Updating manually is a little more time-consuming, but still possible.

    As for backups, I honestly have no idea what I'm going to do. Most data, once written, is static in our environment, so I can probably get away with infrequent longer retention period backups for every

  • One of these will do you well
    https://en.wikipedia.org/wiki/... [wikipedia.org]

    For storage that's trickier. You probably need to characterize your usage before you talk to a vendor otherwise they will oversell you into oblivion.

  • Where I work, we are running EMC's Isilon platform. We have ~4PB of data replicated between two data centers.

    The platform supports the traditional CIFS/SMB and NFS for client connectivity.

    It also has Hadoop support (HDFS). The great thing about the HDFS support is that you do not have to spin a separate file system for it. The same files that your clients access via CIFS or NFS can be accessed via HDFS. Isilon was built with Hadoop in mind and the Isilon nodes act as Hadoop "compute nodes".

    The OneFS fil

  • That's it? (Score:5, Informative)

    by guruevi ( 827432 ) <evi AT evcircuits DOT com> on Saturday July 25, 2015 @06:12PM (#50182633) Homepage

    500TB is nothing these days. You can easily buy any system and it will support it. Look at FreeBSD/FreeNAS with ZFS (or their commercial counterpart by iXSystems). If you want to have an extremely comfortable, commercial setup, go Nexenta or with a bit of elbow grease, use the open/free counterpart OpenIndiana (Solaris based).

    You can build 2 systems (I personally have 3, 1 with SAS in Striped-Mirrors, 1 with Enterprise-SATA in RAIDZ2 and 1 with Desktop-SATA in RAIDZ2) and have ZFS snapshots every minute/hour/day replicated across the network for backups, both Nexenta and FreeNAS have that right in the GUI. The primary system also has a mirrored head node which can take over in less than 10s. As far as sharing out the data: AFP/SMB/NFS/iSCSI/WebDAV etc. whatever you need to build up on it.

    My system is continuously snapshotted to it's primary backup so that in case of extreme failure (which has not happened in the 7 years since I've built this system) I can run from the primary backup until the primary has been restored with perhaps a few seconds of data loss (don't know if that's acceptable to you but in my case it's not a problem in case we do have a full meltdown)

    Where are those systems limited to 16TB? I wouldn't touch them with a 10-foot pole because they're running behind (within a few years a single hard drive will surpass that limit).

  • by im_thatoneguy ( 819432 ) on Saturday July 25, 2015 @06:13PM (#50182639)

    What are your performance requirements. If you just need a giant dump of semi-offline storage then look into building a backblaze Storage Pod.
    https://www.backblaze.com/blog... [backblaze.com]

    For about $30,000 you could build four storage pods. Speed would not be terrific. Backups are handled through RAID. If you want faster, more redundant or fully serviced your next step up in price is probably a $300,000 NAS solution. Which might serve you better anyway.

  • Where I work we deal with data sets of a similar order. However, different data sets are stored differently depending on need. For online relational data where performance is critical, it's in master/slave/backup DB clusters running with 4.8TB PCIe SSDs. The backups are taken from a slave node and stored locally, plus they're pushed offsite. No tape, if we need a restore we can't really wait that long.

    For data we can afford to access more slowly we use large HDFS clusters with regular SATA discs. There's a

  • While I agree with most commenters that you need to supply many more details before even beginning to narrow the options, if you do look at the storage vendors, DDN (Data Direct Networks) is really hard to beat.

    I see the EMC Isilon guys posting here and need to counter. :) They are overpriced and underpowered for almost every application. Their strength is typical enterprise environments - lots of small files accessed via NFS and "enterprise" SLAs. That's almost always the wrong solution for big data applic

  • I keep it all in a separate drive, and only mount it when I want to look at the data. Also, I mount it under .porn, so it isn't visible in a casual listing.
  • To store files close to a petabyte, you need a petafile, obviously.

  • Storing the data is the easy part, Glusterfs should do it just fine. The point I am curious about is backups: how do you backup such a volume?
  • Disclaimer: I work for a storage vendor. Also a long time Slashdot reader though, so this isn't mean as a sales pitch.

    Half of a petabyte is not really a lot of data in today's world. I talk to people every day that are trying to find ways to manages many PBs (into the hundreds) and are having challenges doing this with traditional storage. The trend that was started by the big Internet companies is to get rid of the fibre-channel SANs and instead solve the problem of storage using standard x86 servers. T
  • It may sound "funny," but I once priced Mega (KimDotCom) for offsite backup & storage. They turned out to be less expensive than Amazon Glacier by a bit AND instantly available. We didn't go with them. Instead, we replicated across data centers with multi-terabyte storage nodes.
  • Store it in the cloud. 1/2 petabyte isn't even the "highest tier" requirement.

    On Azure it will cost $168k/year to store this much data instantly accessible. Whatever other solution you come up with, if it takes more than 1 full time person to support, then it's already more expensive (and that's not even including the up-front capital costs, installation and setup costs, training costs, deprecation, maintainance, ...)

  • Sounds like a fairly simple case for a Hadoop cluster - a smallish one at that. We're currently deploying to clusters at 1PB/rack density, which means you could deploy a rack or two easily enough. You'd get compute, you get a single flat filesystem, you get redundancy, all built in. Our biggest cluster is now up to 16PB, all one big compute/storage beast, chugging away all day.

    I'd suggest starting with the Hortonworks Sandbox VM - grab it, fire it up, play with it. Add some files, poke around, see if it

  • Not only are you out of your league, but you're barking up the wrong tree.

    1) You should hire someone to figure it out for you- as either on-site consultancy or use something like amazon.
    2) You should use a different site that has more than 5 legitimate comments on a thread.

  • 'nuff said

  • We store and backup about this much data (a little more), although spread across a variety of machines. All in all, though, the data is primary virtual hard drives (we run a private cloud environment).

    Storing it on disk is easy enough - and cheap enough, that it's little concern. Amazon, Azure, etc. are *insanely* expensive for this task, month by month, compared to self owned disks.

    As our hypervisors are all Microsoft (Hyper-V - and yes, I know this is Slashdot and I just said I use a Microsoft product

Air is water with holes in it.