Data Storage

Ask Slashdot: Simple Way To Backup 24TB of Data Onto USB HDDs?

An anonymous reader writes "Hi there! I'm looking for a simple solution to back up a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (USB, FireWire, whatever). I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred. Did I miss something, or is there no such thing already done and am I doomed to code it myself?"
  • by gagol ( 583737 ) on Friday August 10, 2012 @05:30AM (#40943529)
    If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!
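
    (Back-of-the-envelope in shell, if you want to plug in your own numbers; this assumes binary terabytes and ignores any overhead:)

    echo $(( 24 * 1024 * 1024 / 50 / 3600 ))   # 24 TiB at 50 MiB/s -> prints 139 (hours)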
  • by bernywork ( 57298 ) <.bstapleton. .at. .gmail.com.> on Friday August 10, 2012 @05:32AM (#40943543) Journal

    http://www.bacula.org/en/ [bacula.org]

    There's even a howto here:

    http://wiki.bacula.org/doku.php?id=removable_disk [bacula.org]

  • by Anonymous Coward on Friday August 10, 2012 @05:34AM (#40943549)

    I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?

    Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:

    How to Create a Multi Part Tar File with Linux [simplehelp.net]
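
    For reference, the usual recipe for that is just tar piped into split; the chunk size here is made up:

    tar -czf - /path/to/data | split -b 200G - backup.tar.gz.part-
    # restore later with: cat backup.tar.gz.part-* | tar -xzf -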

  • RAID (Score:5, Informative)

    by Anonymous Coward on Friday August 10, 2012 @05:34AM (#40943553)

    For that much data you want a RAID, since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are kept spinning.

    Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.

    For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.
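
    A rough sketch with Linux software RAID, assuming the eight drives show up as /dev/sdb through /dev/sdi (adjust to your hardware):

    mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[bcdefghi]
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt/backup
    rsync -a /path/to/data/ /mnt/backup/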

  • git-annex (Score:4, Informative)

    by Anonymous Coward on Friday August 10, 2012 @05:40AM (#40943585)

    You might want to look into git-annex:
    http://git-annex.branchable.com/ [branchable.com]

    I've not tried it, but it sounds like an ideal solution for your request, especially if your data is already compressed.
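
    Going by the docs, a first pass would look something like this (untested; repository names and mount points are placeholders):

    cd /data && git init && git annex init "primary"
    git annex add .                  # checksums content and moves it into the annex
    git commit -m "track dataset"

    # per USB drive (repeat for usb2, usb3, ...):
    git clone /data /mnt/usb1/backup
    (cd /mnt/usb1/backup && git annex init "usb1")
    git remote add usb1 /mnt/usb1/backup
    git annex copy set1 --to usb1    # put different subsets on different drives
    git annex sync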

  • by Anonymous Coward on Friday August 10, 2012 @05:50AM (#40943643)

    Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between backup media. As long as your MySQL catalog is intact, restoration is a cinch...

    Did I mention it also supports archiving, if you want duplicate copies on tapes to be shipped off-site...

  • by cyocum ( 793488 ) on Friday August 10, 2012 @05:56AM (#40943669) Homepage
    Have a look at tar and its "multi-volume" [gnu.org] option.
  • by Anonymous Coward on Friday August 10, 2012 @05:56AM (#40943671)

    Here's a Linuxquestions thread [linuxquestions.org] outlining multi-disk backup strategies.

    The gist of the discussion is to use DAR [linux.free.fr].

  • Bash.... (Score:5, Informative)

    by djsmiley ( 752149 ) <djsmiley2k@gmail.com> on Friday August 10, 2012 @06:08AM (#40943733) Homepage Journal

    First, a bash script to grab the size of the "current" storage;

    compress the files up until that size;

    Move compressed file onto storage;

    request new storage, start again.

    ----------

    Or, if you've got all the storage already connected: for x in $(seq 0 $n); do cp "$archive.$x" "/mount/$x/"; done :D
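
    Fleshed out a bit, the "request new storage" approach could be an untested sketch using GNU tar and coreutils split, assuming each drive gets mounted at /mnt/usb when you press Enter and a ~3.5TB chunk size:

    tar -cf - /data | split -b 3500G -d - backup.tar. \
        --filter='printf "Mount the next drive at /mnt/usb and press Enter\n" >/dev/tty; read dummy </dev/tty; cat > /mnt/usb/$FILE'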

  • by leuk_he ( 194174 ) on Friday August 10, 2012 @06:09AM (#40943739) Homepage Journal

    Multi-volume tar [gnu.org]. Just mount a new USB disk whenever it is full.

    However, to have a reasonable retrieval rate (going through 24 TB of data will take days over USB2), you'd better split the dataset into multiple smaller sets. That also has the advantage that if one disk crashes (and consumer-grade USB disks WILL crash!) you don't lose your entire dataset.

    For that reason (disk failure), do not use some Linux disk-spanning feature. A filesystem is lost when one of the disks it writes to is lost, unless you use a feature that can handle lost disks (RAID / RAID-Z).

    And last but not least: test your backup. I have seen cheap USB interfaces fail to write data to disk without any useful error message. Everything looks OK until you retrieve the data and some files are corrupted.
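
    One cheap way to do that last step (illustrative paths only):

    # before copying: record checksums relative to the set's root
    (cd /data/set1 && find . -type f -print0 | xargs -0 sha256sum > /tmp/set1.sha256)

    # after copying to the USB drive: verify what actually landed on it
    (cd /mnt/usb1/set1 && sha256sum -c /tmp/set1.sha256)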

  • Use DAR or KDAR (Score:3, Informative)

    by pegasustonans ( 589396 ) on Friday August 10, 2012 @06:18AM (#40943771)

    If you don't want to invest in new hardware, you could use DAR [ubuntugeek.com] or KDAR [sourceforge.net] (KDE front-end for DAR).

    With KDAR, what you want is the slicing settings [sourceforge.net].

    There's an option to pause between slices, which gives you time to mount a new disk.
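
    On the command line that would be something like the following (untested; the slice size and paths are only examples, check dar's man page):

    # write ~3.5TB slices, pausing after each one so the drive mounted at /mnt/usb can be swapped
    dar -c /mnt/usb/backup -R /data -s 3500G -p -z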

  • Re:solution (Score:5, Informative)

    by aglider ( 2435074 ) on Friday August 10, 2012 @06:23AM (#40943797) Homepage

    3.samba

    Uh? Why?
    cp -a is all you need once you put the HDD inside the target machine.
    And if you put it into another machine on the same network, then rsync is the answer.
    Forget about the buggy and slow SAMBA.
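
    In other words, something along these lines (paths are illustrative):

    # drive in the same box:
    cp -a /data/. /mnt/backupdrive/

    # drive hanging off another machine on the LAN:
    rsync -aH --progress /data/ otherbox:/mnt/backupdrive/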

  • Re:No. (Score:2, Informative)

    by ledow ( 319597 ) on Friday August 10, 2012 @06:34AM (#40943851) Homepage

    USB 2.0 provides 480Mbps of (theoretical) bandwidth. So unless you go Gigabit all over your network (not unreasonable), you won't beat it with a NAS. Even then, it's only 1-and-a-bit times as fast as USB working flat-out (and the difference being if you have multiple USB busses, you can get multiple drives working at once). And USB 3.0 would beat it again. And 10Gb between the client and a server is an expensive network to deploy still.

    Granted, eSATA would probably be faster but there's nothing wrong with USB for such tasks if you *don't* want to provide Gigabit connections everywhere and (presumably) greater-than-gigabit backbones.

  • Re:Tape? (Score:5, Informative)

    by Anonymous Coward on Friday August 10, 2012 @06:57AM (#40943955)

    No kidding. For $2400, you get 24 one-terabyte HDDs and a bookkeeping nightmare if you ever actually resort to the "backup." For $3k, you get a network-ready tape autoloader with 50-100TB capacity and easy access through any number of highly refined backup and recovery systems.

    Now, if the USB requirement is because that's the only way to access the files you want to steal from an employer or government agency, then the time required to transfer across the USB will almost guarantee you get caught. Even over the weekend. You should come up with a different method for extracting the data.

  • PAR (Score:4, Informative)

    by fa2k ( 881632 ) <pmbjornstad@noSPAm.gmail.com> on Friday August 10, 2012 @06:59AM (#40943967)

    I have just seen "PAR" mentioned a couple of times here on Slashdot; I haven't used it, but it seems great for this: http://en.wikipedia.org/wiki/Parchive [wikipedia.org]. You need enough redundancy to allow one USB drive to fail. And I would rather get a SATA bay and use "internal" drives than have to deal with external USB drives. Get "green" drives; they are slow but cheap.
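
    With the par2 command-line tool that would look roughly like this (the 10% redundancy level is just an example):

    par2 create -r10 backup.par2 backup.tar.part-*   # create recovery blocks alongside the archive pieces
    par2 verify backup.par2                          # later: check the set
    par2 repair backup.par2                          # rebuild damaged or missing pieces from the recovery data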

  • by arth1 ( 260657 ) on Friday August 10, 2012 @07:17AM (#40944049) Homepage Journal

    Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.

    Except for good old tar, which is present on all systems.

    Most people are probably not aware that tar has the ability to create split (multi-volume) tar archives. Add the following options to tar:
    -L <max-size-in-k-per-volume> -F myscript.sh ... where -F (which implies -M, multi-volume mode) runs myscript.sh at the end of each volume, and the script hands back the name to use for the next tar file in the series. It can be as easy as a loop checking whether a tar file already exists and returning the next hooked-up volume where it doesn't.
    Or it could even unmount the current volume and automount the next one for you. Or display a dialogue telling you to replace the drive.

    One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.

    One caveat: GNU tar's multi-volume mode can't be combined with its built-in compression options (-z/-j), so compress the files beforehand if you need compression.
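
    A minimal sketch of such a volume script (untested; it assumes the drives are already mounted at /mnt/usb0, /mnt/usb1, ... and uses GNU tar's $TAR_FD mechanism to hand back the next name):

    #!/bin/bash
    # for use as: tar -c -L <size-in-k> -F ./next-volume.sh -f /mnt/usb0/backup.tar /data
    n=0
    while [ -e "/mnt/usb$n/backup.tar" ]; do
        n=$((n + 1))
    done
    # hand tar the name of the next volume via the descriptor it provides
    echo "/mnt/usb$n/backup.tar" >&$TAR_FD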

  • Re:solution (Score:1, Informative)

    by myowntrueself ( 607117 ) on Friday August 10, 2012 @07:35AM (#40944119)

    3.samba

    Uh? Why?
    cp -a is all you need once you put the HDD inside the target machine.
    And if you put it into another machine on the same network, then rsync is the answer.
    Forget about the buggy and slow SAMBA.

    cp copies file by file.

    A more efficient way is something like

    tar -cf - . | (cd /somewhere/ && tar xf -)

    tar treats the directory contents as a data stream. It's much faster for large numbers of files and large amounts of data.

  • by Anonymous Coward on Friday August 10, 2012 @08:09AM (#40944319)

    It's "nudge-nudge", not "notch-notch".

    Also, you left out "wink-wink".

    Yes, I know, I should get a life..

  • by v1 ( 525388 ) on Friday August 10, 2012 @08:31AM (#40944455) Homepage Journal

    I have a setup here where the server's video media is about 8TB in size. That backs up via rsync to the backup server, which is in another room. It contains a large number of internal and external drives, none of them over 2TB in capacity. The main drive has data separated into subfolders, and the rsync jobs back up specific folders to specific drives (rough sketch at the end of this comment).

    A few times I've had to do some rearranging of data on the main and backup drives when a volume filled up. So it helps to plan ahead to save time down the road. But it works well for me here.

    The only thing with rsync you need to worry about is users moving large trees or renaming root folders in large trees. This tends to cause rsync to want to delete a few TB of data and then turn around and copy it all over again on the backup drive. It doesn't follow files and folders by inode, it just goes by exact location and name.

    I help mitigate this by hiding the root folders from the users. The share points are a couple levels deeper so they can't cause TOO big of a problem if someone decides to "tidy up". If they REALLY need something at a lower level moved or renamed, I do it myself, on both the source and the backup drives at the same time.

    Another alternative is to get something like a Drobo where you can have a fairly inexpensive large pool of backup storage space that can match your primary storage. This prevents the problem of smaller backup volumes filling up and requiring data shuffling, but does nothing for the issue of users mucking with the lower levels of the tree.
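
    For illustration, the per-folder rsync jobs described above boil down to something like this, one job per backup drive (paths are made up):

    rsync -aH --delete /srv/media/movies/ /mnt/backup1/movies/
    rsync -aH --delete /srv/media/tv/     /mnt/backup2/tv/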

  • Re:RAID (Score:4, Informative)

    by Sarten-X ( 1102295 ) on Friday August 10, 2012 @08:58AM (#40944647) Homepage

    As mentioned already, RAID is not a backup solution. While it will likely work fine for a while, the risk [datamation.com] of a catastrophic failure rises as drive capacity increases. From the linked article:

    With a twelve-terabyte array the chances of complete data loss during a resilver operation begin to approach one hundred percent - meaning that RAID 5 has no functionality whatsoever in that case. There is always a chance of survival, but it is very low.

    Granted, this is talking about RAID 5, so let's naively assume that doubling the parity disks for RAID 6 will halve the risk... but then since we're trying to duplicate 24 terabytes instead of twelve, we can also assume the risk doubles again, and we're back to being practically guaranteed a failure.

    Bottom line is that 24 terabytes is still a huge amount of data. There is no reliable solution I can think of for backing it all up that will be cheap. At that point, you're looking at file-level redundancy managed by a backup manager like Backup Exec (or whatever you prefer) with the data split across a dozen drives. As also mentioned already, the problem becomes much easier if you're able to reduce that volume of data somewhat.

  • Re:solution (Score:5, Informative)

    by fnj ( 64210 ) on Friday August 10, 2012 @08:58AM (#40944651)

    No. It's slower. Informative, my ass.

  • by milgr ( 726027 ) on Friday August 10, 2012 @09:17AM (#40944841)

    The LHC generates a petabyte per second [slashdot.org].

  • Re:DaisyChain (Score:5, Informative)

    by Painted ( 1343347 ) on Friday August 10, 2012 @10:35AM (#40945779) Homepage
    DON'T DO THIS.

    We did this exact thing using WD Green drives for our 18TB backup problem. Got two of 'em, planning on using their built-in rsync for on-site/off-site copies of the data. Unfortunately, the units never broke 1MB/s transfer, and no amount of work with Drobo yielded reliably faster performance. Both of our units are now sitting unused ($2500 each!), and we put the drives into an 8-bay RAID-50 USB3 enclosure. The new unit runs about 150x faster and ended up costing $400 (prices are for enclosures only; drives were additional).

    Most disappointing was Drobo's support: they just seemed to shrug a lot, and were hyper-aggressive about closing trouble tickets.
  • Re:RAID (Score:4, Informative)

    by louic ( 1841824 ) on Friday August 10, 2012 @10:54AM (#40946019)

    As mentioned already, RAID is not a backup solution.

    Nevertheless, there is nothing wrong with using disks that happen to be in a RAID configuration as backup disks. In fact, it is probably a pretty good idea for large files and large amounts of data.

"Protozoa are small, and bacteria are small, but viruses are smaller than the both put together."

Working...