Hardware

The Amazing $5k Terabyte Array 448

An anonymous reader writes: "Running out of space on your local disk? How about a Terabyte array for only a few thousand dollars? This article at KCGeek.com shows how to put together 1000 Gigs of hard drive space for the cost of a few desktop computers." I could rip my entire anime collection for instant access! Rip all my CDs and still have .9 Terabytes left! Maybe Mirror Usenet! I guess the simple truth is that now that 100 gig drives are a couple hundred bucks, we now have the ability to store anything we reasonably could need (unless you define "Reasonable" as "I need to store DNA Sequences").
This discussion has been archived. No new comments can be posted.

The Amazing $5k Terabyte Array

Comments Filter:
  • by DNAGuy ( 131264 ) <brent@nOSpam.brentrockwood.org> on Wednesday January 30, 2002 @08:48AM (#2924567) Homepage

    It's only a matter of time 'til video becomes as commonplace as MP3s on our drives. 100 Gigs is what...20 movies??? I don't see my appetite for disk space slowing down any time soon.

    Hmmm...video; logfiles that don't roll over - ever; online network backup... I'm sure to figure out a way to fill that terabyte. :)

    • It's essential that we move to this level of secondary storage, or there is a real danger that tertiary storage systems (such as tape and DVD) may actually be able to keep up!

      Seriously, the big problem here is not having the data online, but figuring out how to recover it if you lose it.

      Not that RAID is a bad thing, but I have seen RAID systems go down - I lost a day's work (not archived by myself) when my web hosting company's raid system failed completely. (They were most apologetic and offered some compensation, but the data was very gone for all their customers - I believe they bought new RAID systems from another vendor immediately thereafter).

      My 2c worth.

      Michael
      • Redundancy is a better solution than disposable media backup. Often more expensive, but infinitely more reliable.

        Use code versioning/document management on changing files to maintain history.

        Your web hosting provider had 1 RAID system; that's only 1 level of redundancy (I know, multiple disks - but on 1 system). If you want to truly protect data you need redundant systems, such as networked backup to 2 additional machines that also utilize RAID.

        If the data is critical you need to examine points of failure. That's what clustering and load balancing offer - total redundancy.
      • Not that RAID is a bad thing, but I have seen RAID systems go down

        I once saw a shadowing controller fail in such a way that it managed to corrupt both of the RAID 5 arrays it was driving. Had to bring the system back up from the first level backup.

        Soon after that we switched to using EMC gear.

  • by Anonymous Coward
    Yeah, with 160 gig ATA drives out now, you can do it with 6 drives vs. 10 drives, and a lot of motherboards come with onboard RAID; if you use software RAID via Win2k or a volume-manager-type app for Linux it would rock.

    Cheap too, at $260 per drive per Pricewatch.

    Peace out...
    • Please tell me how you get 6 IDE drives on a PC that gives you any performance in a RAID function... U160 SCSI drives will give you at least a 70% speed increase and an 80% increase in reliability....

      If I had to store a terabyte of information I'd be an idiot to use consumer level storage (IDE).

      Ever wonder why real servers use SCSI?
      • I believe that Promise makes the SuperTRAK Pro [promise.com] series of ATA RAID cards that support up to 6 drives and RAID 5. I haven't used them personally but they do exist.

        I agree that on a server or a professional workstation SCSI is the way to go for speed and reliability. But for the home consumer who wants to work with digital video, the cost of a SCSI RAID setup is extremely prohibitive.
        • on a server or a professional workstation SCSI is the way to go

          I do wish to avoid yet another SCSI/IDE flamefest, but I would point out that this configuration is like most of its ilk--it is basically network attached storage. That means that no one will be reading or writing from the server system itself, but will be accessing the raid array through a network link via NFS and/or SMB. In my experience, performance of Linux Software RAID5 on Promise IDE controllers with 80-GB Maxtor 5400-RPM hard drives can exceed 50 MB/s write and 70 MB/s read. SMB/NFS even over Gbit ethernet will be hard pressed to saturate that.

          Having built many of these low-budget raid5 arrays, I cannot concur that SCSI and/or hardware RAID is necessary to see acceptable performance. <Horror stories about Hardware IDE RAID5 controllers deleted.>

          I do admonish would-be builders to include an extra hard drive in the RAID array as a hot spare. For four-drive arrays (3 data + 1 parity), it may be unnecessary. For larger systems (7 data + 1 parity), I think a hot spare is a worthwhile investment. Also, avoid 7200-RPM drives if possible and actively cool all of the drives in the array. One or two fans blowing on the array can make a big difference.
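
          To put rough numbers on that hot-spare tradeoff, here is a minimal sketch (it assumes plain single-parity RAID5, and the 80GB drive size is only an example):

          # Rough usable-capacity math for RAID5 with an optional hot spare.
          # Assumes single-parity RAID5; drive counts and sizes are examples only.
          def raid5_usable_gb(total_drives, drive_gb, hot_spares=0):
              data_drives = total_drives - hot_spares - 1   # one drive's worth of parity
              if data_drives < 2:
                  raise ValueError("RAID5 needs at least 3 active drives")
              return data_drives * drive_gb

          print(raid5_usable_gb(4, 80))                # 3 data + 1 parity -> 240 GB usable
          print(raid5_usable_gb(5, 80, hot_spares=1))  # still 240 GB usable; the spare is 20% of the chassis
          print(raid5_usable_gb(9, 80, hot_spares=1))  # 7 data + 1 parity + 1 spare -> 560 GB; the spare is only ~11% overhead
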
          • Also, avoid 7200-RPM drives if possible and actively cool all of the drives in the array. One or two fans blowing on the array can make a big difference.

            This is important in any drive array. If you don't have them spaced apart with cooling fans on them, no matter what they are, you are asking for failures and short life spans. I was amazed at the difference on my SCSI drives: separating them another 1/4 inch and adding a fan blowing at each bundle of 3 drives caused a temperature drop from 110 deg F to 76 deg F, only 4 degrees above the server room's ambient temperature. (Hey, I'd rather have it set for 60, but the traffic and billing ladies complain... they keep their work area at 80!)

            The surprising part is that the ML530s have a spot to place a fan in the drive cages, yet no fans are installed.
      • Well, reliability is a point taken: one device fails on an IDE chain and the whole chain may collapse, depending on the problem.

        The performance is questionable. IDE is behind SCSI, but not nearly as badly anymore. And how are you typically accessing this storage space? Through a network whose speed is likely not to exceed 100 Mbit, so the network is essentially the bottleneck for throughput. Unless, of course, you have some data processing application or something like Oracle, in which case your point may be valid - but SCSI is still overpriced for what it gives... If only it had prevailed more, then it would be cheaper and this debate would be moot...
        • This is nice when taken as an oversimplified example, but that is not the case. I can access all 15 of my SCSI devices and tell them to do things separately and they will perform the job. 2 devices on the IDE chain? If one is doing a job the other has to sit and do nothing. The communication system built into SCSI gives the largest performance gains... This is why a SCSI-II hard drive and controller still feels snappier than an Ultra 33 IDE drive... Yes, the IDE drive is theoretically faster than the SCSI drive, but as soon as I access the CD and drive at the same time the SCSI devices continue to fly while the IDE devices start falling down waiting for each other.

          They really, really need to design an IDE-II specification that gives the SCSI performance traits to IDE.
          • > They really really need to design a IDE-II
            > specification that gives the SCSI performance
            > traits to IDE.

            They already have it -- tag command queueing has been in the ATA spec for years, since ATA-5 I think. Most vendors either have command queueing IDE drives, or are coming out with them soon.

            http://www.t13.org for more info on the various ATA specifications

            --eric
      • While I agree that SCSI drives are simply better drives, cheap is a very powerful motivator. IDE is about one tenth the cost of SCSI. So the IDE array will last a year -- how long do a lot of companies last these days? Over time, the SCSI system may, ultimately, be cheaper -- the cost of replacing failed drives, the downtime for rebuilding and restoring the array, lost productivity of a missing database, drugs for the admin headaches...

        I've built a 1.04TB array. It's an impressive hack of a system. Out of the 16 drives for the array, four (4) were defective right out of the box! And two of those replacements were suspect. After a month of handling a full news feed (120G+ per day) we've worked most of the kinks out of it (I don't recommend w2k for a drive array.)

        BTW: I used a pair of 3ware Escalade (6800) controllers. They take a lot of the suckiness out of IDE (tho' it's a cabling mess.)
      • please tell me how you get 6 IDE drives on a pc that gives you any performance in a rad function...

        I don't know how he does it, but I have personal experience in doing it two different ways:

        1) 3ware IDE RAID controller: it has 1 IDE controller per drive on the card (i.e. 8 IDE controllers), which the firmware maps to a RAID device. Depending on the RAID configuration the drives appear as one large SCSI drive to the system.
        Performance is on par with SCSI.

        2) External IDE-SCSI Raid chassis. Again, 1 IDE controller per hot-swap drive, appearing to the system as one or more big SCSI drives, controlled by a standard SCSI controller. Speed and reliability have surpassed that of a $60,000 SCSI solution sold by Sun I happen to have lying around.

        U160 SCSI drives will give you at least a 70% speed increase and a 80% increase in reliability....

        If I had to store a terabyte of information I'd be an idiot to use consumer level storage (IDE).


        Nonsense, see above. This is simply SCSI bigotry (I know, I was once a SCSI bigot too). What you say is only true if you are using low-end cards, with more than one device on each IDE bus, which is untrue for mid- and high-level IDE-SCSI solutions such as 3ware and various external chassis systems. We run our entire enterprise on one, and have done so for well over a year, with much better reliability and performance than an older, very expensive SCSI solution provided.

        But yes, if people are plugging drives into el cheapo IDE "raid" cards like Promise and the like, or worse, into their onboard IDE controllers (most of which are inexpensive knockoffs anyway) then performance will be very suboptimal, and reliability problems (one device taking down the entire IDE bus, etc.) abound.
  • Actually (Score:5, Interesting)

    by IAmATuringMachine! ( 62994 ) on Wednesday January 30, 2002 @08:51AM (#2924576)
    Actually a DNA sequence is only about 3GB for a human - you're anime DVDs might take more space, at least until you compress them. Then again, DNA should be fairly trivial to compress highly. Let Z = CA, Y = TG, .....
    • Why stop there? You could store four base pairs per byte with the most basic of compression schemes. You could probably compress it down much, much further.
      • Re:Actually (Score:4, Funny)

        by Quixote ( 154172 ) on Wednesday January 30, 2002 @09:08AM (#2924641) Homepage Journal
        Why stop there? You could store four base pairs per byte with the most basic of compression schemes. You could probably compress it down much, much further.

        But be careful with that compression thing! If you compress the DNA too much, you could end up like Minime [geocities.com]
      • Re:Actually (Score:2, Interesting)

        Good insight - I suppose I was just considering the cheap thrill at showing that it can be trivially halved, but no doubt if one is looking at base pairs alone they could probably compress it by a factor of eight. But the other poster was correct in observing that there is a plethora of other meta-information that goes along with it, such as what the various base pairs code for. Then again, if we wanted to be all GATTACA, they would probably do the simple compressed file (seemingly of a third of a gig) and the hardware would decode it and calculate the meta-information for my insurance company.
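
        For the curious, the "four base pairs per byte" figure is just 2 bits per position (four possible letters). A minimal packing sketch, assuming an input of only A/C/G/T and no annotation:

        # Pack a DNA sequence at 2 bits per base, i.e. 4 bases per byte.
        # Assumes the sequence contains only A, C, G, T; a real encoder would
        # also store the length so the final partial byte can be decoded.
        CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

        def pack(seq):
            out = bytearray()
            for i in range(0, len(seq), 4):
                b = 0
                for base in seq[i:i + 4]:
                    b = (b << 2) | CODE[base]
                out.append(b)
            return bytes(out)

        print(len(pack("GATTACA")))   # 2 bytes for 7 bases; ~3e9 bases -> ~750 MB
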
    • Re:Actually (Score:5, Insightful)

      by dNil ( 308743 ) on Wednesday January 30, 2002 @09:20AM (#2924683)

      You are correct that the human genome is "only about" 3 giga basepairs of sequence, but to only store that would be rather egocentric. There were, as of Dec 3 2001, some 14396883064 bp in GenBank, and the amount of sequence information [nih.gov] still grows in a roughly exponential manner.


      Now, this will not hit the TB line anytime soon. The trouble starts if you are involved in genome sequencing. Then you need to store the raw data for all that sequence. Each stretch of some 450 bp of sequence is reconstructed from about 5-10 different fairly high resolution gel images (in the ballpark of 150 kB per image). Also, recall that even short stretches of the sequence can be accompanied by a lot of annotating information, such as names and functions of genes, regulatory elements, or pointers to articles explaining the experimental evidence for such. This multiplies the storage requirement by quite a factor - nothing a neat little Linux box with a huge RAID array cannot handle, though. That's how we handle the sequencing data from Trypanosoma cruzi, by the way.

    • I made a grammar error (you're instead of your) - it must be that my compressed DNA didn't unzip properly.

      CATcoyboynealGTTA....
  • by Anonymous Coward
    What is he talking about with the DNA sequences?

    human = 3 billion base pairs
    = 6 billion bits of data
    = 7.5e8 bytes
    = 7.3e5 kilobytes
    = 715 megabytes
    < 1 gigabyte

    Sure, lots of other life forms have been sequenced too, but most of these have much smaller genomes than humans.

    So how would you need a terabyte to store DNA sequences?

    • OK, I'm not an expert in this area, but I think when people do research into DNA sequences they get DNA sequences from a large sample of people so they can look for statistical links between certain gene sequences and various properties. Therefore they will need reasonablesamplesize*sizeofsequence, so if you have a sample of just 1000 people you could easily be getting into terabyte land. (Then they need a highly optimised version of diff to spot the differences in the sequences!)
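
      Back-of-envelope math for that point, assuming roughly 750 MB per genome at 2 bits per base and a purely illustrative sample size (real studies also carry raw reads and annotation, which dwarf this):

      # Hypothetical numbers: ~3e9 bases per genome at 2 bits per base.
      bytes_per_genome = 3e9 / 4            # ~750 MB
      samples = 1000                        # illustrative study size
      total_tb = samples * bytes_per_genome / 1e12
      print(round(total_tb, 2))             # ~0.75 TB of bare sequence alone
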
  • by morie ( 227571 ) on Wednesday January 30, 2002 @08:52AM (#2924581) Homepage
    I guess the simple truth is that now that 100 gig drives are a couple hundred bucks, we now have the ability to store anything we reasonably could need (unless you define "Reasonable" as "I need to store DNA Sequences"). -- Slashdot

    Nobody should ever have need for more than 640 kB of RAM. -- Bill Gates

    Similarities, anyone?

    • Personally, I won't be satisfied until I have enough storage to catalog the quantum state of every particle in the universe.

    • by Jenming ( 37265 ) on Wednesday January 30, 2002 @10:02AM (#2924837)
      I store my DNA sequence. I actually have lots of copies in case I lose some or it gets corrupted.
    • Video is the bulkiest thing people would save. How much would people want to save for re-viewing? First you have the time-shifting stuff like TiVo/Replay - perhaps a few tens of hours at most. Then there are your favorite movies and TV series. As video-phone improves you might be saving some hours of video conversations with friends and relatives. With infinite storage, the constraint becomes need and time to view all that stuff. And you'll probably be wanting to spend your time looking at new stuff. So I'd guess most people's real needs would be hundreds to a thousand hours. At 1-2 GB per hour, you're talking about a terabyte or two.

      I don't include the argument that you'd have trouble finding old stuff. Computer software is more clever at organizing things - far better than material storage. A good recent example of this is Apple's iPhoto, which is much more convenient for organizing thousands of photos than physical albums.
    • From a Huntsville Times (Alabama) interview with Bill Gates:

      QUESTION: "I read in a newspaper that in 1981 you said '640K of memory should be enough for anybody.' What did you mean when you said this?"

      ANSWER: "I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time."

  • Cost Per MB (Score:3, Informative)

    by JohnHegarty ( 453016 ) on Wednesday January 30, 2002 @08:52AM (#2924582) Homepage
    1 Terabyte = 1024 GB = 1048576 MB

    $5,000 / 1048576 MB is a price of about $0.0048 per MB.
    Or, put another way, $4.88 per GB.

    Now who remembers when hard disks were more than $10 a MB?
    • Not quite - most (if not all) hard drive manufacturers define a megabyte as 1000*1000 bytes, a gigabyte as 1000 megabytes, etc.
      (See, for example, the note on http://maxtor.com/products/diamondmax/diamondmaxplus/QuickSpecs/42093.htm stating that "1GB = 1 billion bytes")

      Therefore, in *this* case, 1TB = 1000GB = 1000000MB, which puts the price up a little (although not much, I'll admit :-) )

      Cheers,

      Tim
    • Re:Cost Per MB (Score:2, Insightful)

      Yup, except that to a hard drive manufacturer:

      1 Terabyte = 1000GB = 1000000 MB

      Their marketdroids have a bad habit of rounding the values down and evening them off. This allows them to post bigger numbers for the actual size of the drive, since dividing by 1000000 instead of 1048576 yields a larger end result.
    • Shouldn't that be tebibyte?
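
      Whichever definition you prefer, the arithmetic is quick; a sketch comparing the two conventions for the $5k figure (the drive count and price are just the article's round numbers):

      # Cost per GB under the two "gigabyte" definitions being argued about.
      price = 5000.0
      tb_decimal = 1000      # GB as drive makers count them (10^9 bytes each)
      tb_binary = 1024       # GiB as most OS tools count them (2^30 bytes each)
      print(round(price / tb_decimal, 2))   # 5.0  dollars per "marketing" GB
      print(round(price / tb_binary, 2))    # 4.88 dollars per binary GB
      # Same array, same money; only the unit changes.
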
  • Nothing special. (Score:2, Interesting)

    by Night0wl ( 251522 )
    A terabyte isn't anything special. But it's cool to see someone doing it. I was bored once one night. For a mere 36K you could, assuming you already own a Thunder K7 with the on-board SCSI plus needed components, put together yourself some really big storage, using those 181GB Seagate SCSI drives.

    U160 and all of it churning at 10,000 RPM. For a grand total of a few GB short of 5.5 terabytes.

    But assuming you can afford thirty $1200 drives, you should be able to spring for a nice U160 SCSI RAID card with an external connector ;p

    I couldn't even find a case with enough room for 30 HDs.... and I don't want to even think about cooling.

    But I won't have to worry about that. I can't even afford a 9GB SCSI drive at this point.
    • Re:Nothing special. (Score:2, Interesting)

      by millwood ( 542462 )
      HP virtual disk arrays [hp.com]

      Heard a rumor that they may be considering support for IDE in something like this.

  • by rdl ( 4744 ) <ryan@@@venona...com> on Wednesday January 30, 2002 @08:54AM (#2924589) Homepage
    I've been using these for a long time (6200 dual-port in hardware-mirror, up to the 8-port cards for large disk configs), and they're very fast and reliable. Cheap, too.

    $500 for an 8-port 64-bit RAID controller, looking to the host like a single SCSI device per logical volume, seems like the best deal available. Along with a motherboard with sufficient slots for gig-E and these cards (it's easy to get 4 64-bit slots... maybe you can get more with 3-4 buses) and a 4U rackmount case with 16 drive bays, you can have 4U of rackmount storage for $5k, too.

    I've been using setups like this for clients, as well as for private file storage (divx, mp3, backups, etc.), and know of people using them for USENET news servers (one of the most demanding unix apps for reasonably priced hardware).

    It goes without saying you want a journaled file system or softupdates when you have disks this size, and ideally keep them mounted read-only, and divided into smaller partitions, whenever possible. e2fsck on a 300GB partition with hundreds of open files is painful.
    • Actually, I assembled a 600 gig storage device using the aforementioned 3ware controller.

      First, there were hardware bugs and they recalled the controller.

      Second, 3ware dropped the product line, but vendors were still telling me it was available.

      Third, they brought it back, and I had to get a drop ship.

      I lost about 3 months in the design phase due to this little tidbit.

      Now don't get me wrong, it's working now and seems reliable... but... there's always this nagging suspicion that something is going to go wrong and I'll lose all that data.
  • by danimal ( 1712 ) on Wednesday January 30, 2002 @08:55AM (#2924593) Homepage
    I would rather spend the money on good disk storage with an integrated or integral back-up solution. Why? Well, as cool as all that storage is, what happens when it goes *poof* and you can't get it back? You're screwed.

    Yes, this is a groovy/geeky/cool solution for under your desk, but at least spend the extra dollars for a SCSI card and tape backup unit. You could fit the whole thing on a few DLTs. You can also keep incremental backups to keep the tape swapping to a minimum.

    • That reminds me: I don't know who the hell the tape manufacturers think they're marketing to, but with 80 GB hard drives common now, it's rare to find a tape backup solution that is affordable for a consumer and can handle that much. By affordable I mean drives around $250 and tapes under $10 apiece for at least 50GB of storage. I've seen some of the proprietary drives, but the tapes cost almost as much as the drive! 5 or 6 years ago the backup drives available to consumers could handle backing up the entire average hard drive of the time onto a $15 tape (Travan), but now people are probably just doing without backups, which is a disaster waiting to happen.
    • How about another terabyte array and rdiff? While Joe Average User probably isn't going to be able to afford to do that, he's probably not going to want to build the first one either. If you're a small to medium size company, it'd probably be worth considering. I think by the time you start talking this price tag, you'd be considering some of the mainframe storage companies for DASD and backup though. IBM's 2105 "Shark" machine will go larger than 11TB now, IIRC, and I'm sure the other "big iron" shops have similar solutions.
    • If you're going to spend $4K on a DLT drive, spend $8K on a DLT tape library that holds 10 DLTs plus 1 cleaning tape and forget about it. Sure, it's only 700 gig of backup, but you can always compress. Otherwise upgrade to a 20-DLT tape library box and call it done.
    • by Paul Johnson ( 33553 ) on Wednesday January 30, 2002 @10:00AM (#2924826) Homepage
      Absolutely. And to those who say "Just build another one" / "RAID doesn't need backup", I have only one thing to say:

      FIRE!

      Any serious data store needs to include a backup system which allows for copies off-site. Fire is the obvious risk of course, but floods, vandalism and lightning strikes are all possibilities.

      AFAIK the only generally available tape backup for something this big is DLT, which IIRC can now do around 40GB per tape before compression. With the 2:1 compression usually quoted that's 80GB per tape, or around 13-14 tapes for a full backup. So you really need about 30 tapes for a double cycle, and maybe more if lots of the data is non-compressible (like movies). But this stuff ain't cheap. DLT drives start at around £1000 and the tapes cost £55 each. So that's around £2500 = $4200 to back this beastie up.

      Having said that, the possibility of using hot-swappable IDE drives as backup devices is intriguing. Just point your backup program at /dev/hdx3 or whatever. One big advantage is that if your tape drive gets cooked in the server-room fire you don't have the risk of tapes that can only be read on the drive that wrote them. A Seagate 5400RPM 60GB drive costs £110, which is only a third more per megabyte than a bare DLT tape. Two cycles-worth of backup (34 drives) would be £3,700. And you can probably do better by shopping around. For servers with only a few hundred GB on line this might well be more cost-effective than buying a DLT drive.

      We use Amanda [amanda.org] to do backups here. It's a useful program, but it can't back up a partition bigger than a tape. So you need to think carefully about your partition strategy. (Side note: you can use tar rather than dump to break up over-large partitions, but it's still a pain).

      Suddenly that terabyte starts looking a bit more expensive.

      Paul.
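
      A sketch of that tape math, using the figures quoted above (40 GB native per DLT tape, 2:1 assumed compression, roughly £1000 for a drive and £55 per tape - the parent's estimates, not vendor specs):

      # Tape count and cost estimate for backing up 1 TB to DLT.
      import math
      array_gb = 1000
      tape_native_gb = 40
      compression = 2.0                 # the usual 2:1 marketing assumption
      tapes_per_cycle = math.ceil(array_gb / (tape_native_gb * compression))
      cycles = 2                        # keep two full generations
      total_gbp = 1000 + cycles * tapes_per_cycle * 55
      print(tapes_per_cycle, total_gbp)   # 13 tapes per full backup, ~2430 GBP overall
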

      • Oops! (Score:4, Interesting)

        by Paul Johnson ( 33553 ) on Wednesday January 30, 2002 @10:04AM (#2924845) Homepage
        Sorry, I just noticed a thinko in the discussion of IDE drive costs. The DLT costing assumed 2:1 compression. The disk cost didn't. Assuming compression we can squash 120GB onto a 60GB drive, requiring only 9 drives for a full backup, and 20 drives overall (a couple of spares is always a good idea). That's £2200 for IDE backup, which is actually cheaper than the DLT solution.

        Does anyone out there actually use IDE drives like this? It seems a pretty obvious thing to do.

        Paul.

        • ... for the simple reason that the mechanism (e.g. the DLT drive) and the data carrier (e.g. the tape) are separate. IDE disks have both in one sealed package, which makes it terribly difficult to get to your data if your stepper motor borks.
          With tapes, you just get a new drive.

        • Re:Oops! (Score:3, Informative)

          by cuyler ( 444961 )
          If you're serious about backing up that much data you could also use a 9840 drive, which holds 20GB uncompressed and (they say) 80GB compressed; however, in my experience you can get 140GB onto a tape. Also, it'll write faster (when backing up a terabyte, having the backup take 32 hours is not a good idea). The 9840B drives write at up to 50GB/hour but usually run closer to 30-35GB/hour, while DLT drives usually write at about 5GB/hour.

          I haven't tested it out, but StorageTek has a drive called the 9940 which has tapes that hold 60GB uncompressed (likely 200+GB compressed), and it writes faster (10MB/sec). Also, the drive itself will set you back $33.5k, with the tapes being a couple hundred apiece.

          In this case, it'd probably be better just to have a second 1TB RAID - then again, tapes are much more stable.

          -Cuyler
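
          Rough backup-window arithmetic for those drive speeds (the GB/hour figures are the estimates quoted above, not datasheet values):

          # Hours to stream 1 TB to tape at the quoted throughputs.
          array_gb = 1000
          for name, gb_per_hour in [("DLT", 5), ("9840B typical", 35), ("9840B peak", 50)]:
              print(name, round(array_gb / gb_per_hour, 1), "hours")
          # DLT: ~200 hours. At ~30 GB/hour you get the 32-hour window mentioned
          # above; at the 50 GB/hour peak it drops to about 20 hours.
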
      • True. But one thing I haven't seen mentioned yet is the fact that most backups aren't full backups. You do a full backup maybe once a month or once a year. Every other backup is a diff only. So while the initial backup may take several tapes, the nightly backups shouldn't. At least on the type of system where the data is basically the same from day to day, which was the point of the article.

        Plus, as described in the article, where the point was to have a single hard-drive-based store for DVDs and CDs, if there was a drive failure, you could just take the original media and do the rip again. Annoying, yes, but doable. You haven't lost data unless the fire burned down your house and melted the CDs at the same time it took out your storage. That's why companies buy fire safes and use off-site storage.
  • by Thagg ( 9904 ) <thadbeier@gmail.com> on Wednesday January 30, 2002 @08:59AM (#2924605) Journal
    Check out this article [sdsc.edu] referenced by slashdot on July 20 2001 [slashdot.org].

    The nice thing about this article is that the people building it at SDSC really took extreme care in getting quality components that would work together to build a reliable, solid system, and still didn't spend more than $5K for a terabyte file server. In particular, the tradeoff of disk speed vs. power consumption was extremely insightful.

    I built one of these to their spec for my company, and I couldn't be happier. It's worked flawlessly since then. It's not clear if the Escalade boards are still available -- 3ware had said that they were discontinuing them, but they still appear to be for sale.

    thad
    • Escalade originally planned to discontinue their IDE controllers, but due to public demand they decided to continue production...
      • It can actually be a bit cheaper:

        Promise FastTrak 100TX2 * 4 $500
        Maxtor DiamondMax 160GB Drive * 8 $3000
        Maxtor DiamondMax 20GB Drive $80

        You can get an Escalade 7850 for $550 or less, which is a single 64-bit card instead of the 4x Promise controllers. I don't know why there's a 20GB drive in there, maybe a boot drive? At $3k for 8 160GB drives, that's $375 each. Looking quickly at Pricewatch, you can get the same Maxtor 160GB drives (5400RPM -- yuck!) for around $260 each. 8*160*(1000/1024) = 1250GB (actual GB) = 1.22 TB, for a total of 550+8*260 = 2630 instead of 3580. Plus you have 3 PCI slots more than you had before.

  • I hate to rain on everyone's parade (I really do). But this is just a typical IDE RAID 5 setup with bigger disks. Not exactly Slashdot-worthy IMHO. If you're thinking about doing something like this, RAID level 5 is not a bad choice if you don't need redundancy. For more RAID info check out:
    http://www.acnc.com/04_01_00.html
    • maybe it's your definition of redundancy, but if one drive fails in a RAID5 array nothing breaks. Isn't that some kind of redundancy?
    • What do you mean "if you don't need redundancy", the only RAID level that doesn't offer redundancy is RAID-0. RAID-5 can tolerate single disk failures, and if you do multiple levels of RAID-5 you can tolerate more failures (depending on how you configure it). The common configuration of RAID-5 with available hot-spares is quite sufficient in all but the most critical configurations, especially if it is a system that is closely monitored. Sure, you can build RAID-1 arrays of N drives where you can tolerate up to N-1 drive failures without problems, but for one space is used a lot less efficiently and for another write performance decreases for every extra level of redundancy you add, but that is overkill for most situations, the chances that multiple drives will fail simultaneously (or within a few hours of each other) is significantly remote compared to single drive failure probability.
  • by MosesJones ( 55544 ) on Wednesday January 30, 2002 @09:00AM (#2924617) Homepage

    1) "Compress" at a higher rate than the CD uses (I've seen this)

    2) Use POV Ray to render Lord of the Rings for the cinema

    3) Keep every src and every .o from every build you do

    4) Set the Linux swap space to be "500Gb" because you've upgraded the Kernel to the new VM stuff and it looks cool

    5) Install Windows XP+ in two years time, with Office XP+.

    Imagine that "Minimum Reqs: 1TB of available disk space"

    It will happen
  • Redundancy? (Score:2, Interesting)

    by evilviper ( 135110 )
    I'm sure some poor fool will do something like this, fill it up with data, then have ONE hard drive go bad, making everything practically useless.

    What we need isn't larger hard drive storage (not that it's a bad thing); we need more speed, and a cheap, gigantic & ultrafast tape backup system to back up all the data. Some PC designs that use better cooling methods would be very nice as well.
    • Read the article. It says "RAID5". Do you know what RAID5 [anandtech.com] is?
    • Re:Redundancy? (Score:3, Insightful)

      by GigsVT ( 208848 )
      In case you didn't notice, it's RAID5. One hard disk could go bad with no issues other than slowdown.

      They could also do what we did with our IDE TB. We used three RAID5s in hardware, each with hot swap. In theory, if they failed just right, we could lose up to 6 drives without losing any data.

      The three RAID5s are hardware RAID0ed together. The worst case scenario is a simultaneous failure of two drives on the same array. But we saved so much money using IDE that we just built two complete systems for less than the price of SCSI. So really, we would have to hit the worst case scenario twice at nearly the same time to have a total loss... It gets less and less likely.
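
      To make the failure-tolerance claim concrete, a small sketch of that layered layout (the 7-active-disks-plus-hot-spare figure matches the setup GigsVT describes elsewhere in this thread; the drive size is an assumption):

      # Three hardware RAID5 sets striped together with RAID0 on top ("RAID 50"-style).
      sets = 3
      active_per_set = 7          # 6 data + 1 parity per RAID5 set
      spares_per_set = 1
      drive_gb = 120              # assumed drive size, not stated in the post

      usable_gb = sets * (active_per_set - 1) * drive_gb
      raw_gb = sets * (active_per_set + spares_per_set) * drive_gb
      print(usable_gb, "usable of", raw_gb, "raw")    # 2160 usable of 2880 raw
      # Each RAID5 set survives one failed member (and rebuilds onto its spare);
      # the RAID0 stripe on top dies only if one set loses two members before
      # a rebuild finishes.
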
  • In fact, I remember reading somewhere about a year ago on the linux terminal page about how they put a TB server together for right around $4K. I can't find the link, but if someone does, please post. But grabbing the third-largest drive (100GB) out there will save you a bundle, and you still only need 10.
  • pfft, these days people are demanding a terabyte of RAM.
  • 2TB for $8300 (Score:5, Interesting)

    by GigsVT ( 208848 ) on Wednesday January 30, 2002 @09:17AM (#2924675) Journal
    Inspired by Slashdot's earlier story that was nearly identical, and with the help of Peter Ashford from ACCS [accs.com], we built two servers, both with capacities well over a TB, for around $8000 each. They have the capacity to expand to 3TB if need be.

    Story here [smythco.com]

    As far as performance:
    (from my memory)
    EXT3: About 16MB/Sec block write, 45MB/sec block read
    ReiserFS: About 20MB/sec block write, 130MB/Sec block read (that's no typo).
    XFS: About 30MB/sec block write, 85MB/sec block read.

    It seems that file system plays a large role in performance. The arrays are three RAID5 in hardware using Linux software RAID0 on top of the RAID5 arrays to tie them together.

    IDE RAID controllers are 3ware Escalade 7810. Write performance can be greatly increased by using 7850 cards that have more cache.

    We stuck with XFS; ReiserFS had a big-file bug: files created over 2GB would basically lock up the computer. XFS in general seemed much more mature; ReiserFS seems more like someone's college thesis project that was never cleaned up to be production grade.

    We experimented with different RAID0 stripe sizes. The hardware RAID5 stripe size is fixed at 64k, and there are 7 active disks in each array plus one hot spare. Stripe size tweaking seemed to mostly trade off read speed for write speed, within a certain range of values, with a taper-off in performance at either extreme (down around 8k stripes, or over 1024k stripes).

    We eventually went with 1024k stripes. That is what the benchmarks above reflect. The variance in file system performance could very well be due to interactions with stripe size, but there seemed to be common themes (Reiser always read fastest no matter what the stripe size, and XFS was always better at writes).

    I have been in so many arguments with SCSI zealots on here over this RAID... I wish people would understand what price/performance ratio means. IDE isn't a superior technology, but every now and then, it is the right tool for the job, when price is a goal too.
  • by VWswing ( 74185 ) on Wednesday January 30, 2002 @09:20AM (#2924681) Homepage
    Is this any more special than the last time Slashdot announced an amazing terabyte array? Here [slashdot.org]

    Seriously though... people's numbers are pretty far off. This can be done for about $3000. Pricewatch has 160 gig drives for $259; 10 of these would give you over 1 terabyte in usable space in RAID 1. Or if you just cared about write performance, 6 of them for $1554 would give you a terabyte of usable storage. Another $600 to throw together a cheap PC and cheap IDE RAID cards and you get it for under $2500... big deal.

    Lately I'm realizing how awful IDE really is... I finally got around to throwing two 36 gig Ultra 160 drives on my box with an Adaptec SCSI card, running ext3 on top of a RAID mirror - more space than I need (I just keep all my MP3s on an IDE RAID, since my Dragon motherboard has IDE RAID built in). Since I've gone to SCSI, life has been happy. I can do things while compiling, while vacuuming my DB, etc.

    Funny how the Mac used SCSI before the rest of us, huh?
    • And now the Mac uses IDE... I use IDE too, and it is pretty clear-cut to me that while SCSI drives offer better performance, IDE drives offer better bang for the buck. Recent IDE drives are nearly indistinguishable (to me) from SCSI drives in terms of desktop application performance. Now when you are building a RAID array with, say, a high-load database on top of it, then yes, SCSI will be worth it. On the other hand, if it is a single-user workstation, I don't really see much of a reason to go to SCSI. In fact, I'm currently building a network file server with RAID with no more than 3 or 4 concurrent users over a 100-Mbit connection, and I see no reason to use SCSI RAID over software RAID on IDE, since SCSI increases the cost dramatically and offers little perceivable benefit...

      And as far as this story is concerned, the array I plan to build happens to cost about 1100 dollars and provides 480 gigs of storage (160 more is redundancy, so if redundancy wasn't an issue, it would be 640 gigs, and as we all know, "640 Gigs ought to be enough for anyone"), so a $5k terabyte seems a bit steep when you think about it.
  • YKYHBRSFTLW (Score:2, Funny)

    by Graabein ( 96715 )
    You know you have been reading /. for too long when...

    The first thing that runs through your mind when you see the above headline is: "Wow, imagine a Beowolf cluster..."

    Argh.

  • Why not snap in a Promise SX6000 for like $250?

    This neat piece DOES hardware RAID5, so you don't need a fast CPU & mobo or as much RAM, and since it can manage up to 6 drives you can even have 2 as pseudo hot spares...

    The only drawback is that it "only" stores 800GB, which is still nice at this even cheaper price...
  • by JoeShmoe ( 90109 ) <askjoeshmoe@hotmail.com> on Wednesday January 30, 2002 @09:25AM (#2924701)
    Aren't these types of systems more for archiving massive amounts of data than actively working on it? I mean, how much data can a computer actively process anyway? Wouldn't a 100GB drive meet just about any processing demands (genome tracking, video editing, etc)?

    Why not use slower but MUCH cheaper offline storage? I really like the design goal of

    http://www.dvdchanger.com/

    You can easily get 1TB of storage with such a device for less than $1000. True, only one person can access it at a time, but that is only because PowerFile wants to charge more for the so-called "networked version".

    In theory, if someone could figure out how to build one of these things, you could throw in two or three CD/DVD drives for access and a 20GB hard drive to buffer images. Boom. Now you have the perfect storage backbone for a house-wide media center. I just wish Linksys or someone would throw a Linux thin server on top of the PowerFile hardware and get me something cheap and network-ready.

    - JoeShmoe

    .
  • Why not firewire? (Score:3, Insightful)

    by weave ( 48069 ) on Wednesday January 30, 2002 @09:41AM (#2924762) Journal
    Maxtor now has a 160 gig external FireWire drive. You can chain 62 of these puppies. Screw terabyte, think petabyte.

    I figure this is the easiest way to add as you grow without having to break open the case and try to figure out how to add another damn drive in there. For backup, just have two systems with identical capacities and rsync between the two nightly.

    RAID is nice, but for home use, it's not as nice as a nightly mirror. Why? I've seen RAID controllers fail and take out an entire RAID set. RAID also doesn't deal with the "Holy shit, I just accidentally typed `rm * ~` instead of `rm *~`" problem.

    • Re:Why not firewire? (Score:2, Informative)

      by Ixohoxi ( 170656 )
      160 GB * 62 = 9920 GB = approx 9.9 TB
      9.9 TB = approx 0.01 PetaByte

      Don't hold your breath thinking about petabytes.

      Also, RAID isn't for people who make stupid mistakes. Sorry about your 'rm' debacle.

  • DNA Sequence (Score:2, Informative)

    by zmokhtar ( 539671 )
    FYI, the DNA sequence isn't that big. The National Human Genome Research Institute has their 90% complete draft burned on a single CD.
  • This would be great for a home file server. Many new homes are being built pre-wired with CAT5 (alas not my old house). Just add a big file server in the basement. With proper wiring, it can act as an answering machine / PBX, personal video recorder, music (MP3) repository, mail server, file server, etc. With RAID, you have less worries about a drive crash wiping you out (though you'll need a disaster recovery plan - flooded basements would be real bad). I've always wanted to do this! Main stumbling block is getting CAT5 wiring from the second floor (where my computers reside) to the basement.
    • I had a similar problem when I bought a house last year. I had a converted garage that I wired for ethernet, and even ran ethernet into the basement. However, I didn't want to install ethernet jacks in the house, as it's about 100 years old, and I didn't want the hassle.

      I settled on using 802.11b wireless to communicate between the house and the office. I know all about the security problems (my address is....) but maybe the newer 802.11g or 802.11a might work for you.

      I have some workbenches in the basement that are about 4-5 feet off the floor. I'm going to install a file server and leave it on one of these benches.

      It's cold and damp down there in the winter. I don't know how well the equipment will take to the humidity. I guess I'll find out!
  • by jandrese ( 485 ) <kensama@vt.edu> on Wednesday January 30, 2002 @10:40AM (#2925026) Homepage Journal
    Ironically, I just built something very similar to this a few weeks ago (it runs great BTW), but I spent <$1500US on all the components. The biggest thing you have to watch out for is the hard drives. I went for the ones with the best bang/buck ratio at the time (Maxtor 80GB 5400RPM drives). This let me build a system with well over 1/2 a terabyte of usable space at a fraction of the cost. Additionally, the slower drives require less power and less cooling, making them easier to fit in a standard full tower case with a merely beefy (as opposed to server-class) power supply. I think the processor requirements he stated were a little overboard as well. I've found that disk access tends to be limited by the PCI bus (it doesn't help that I used an older motherboard with 33 MHz 32-bit PCI), especially on writes where you can spread data across the write cache on the drives. Be careful when you build an array like this: ATA *hates* having access to both a master and a slave drive at the same time. Be sure to avoid having two disks on the same plex on the same controller. This was natural for me, fortunately, since I was building two plexes, a "backup" and a "media" plex.

    A final word of warning: Promise ATA100 TX2 controllers may look like a natural choice for a server like this, but they only support UDMA on up to 8 drives at once, and Promise's tech support only supports a maximum of 1 (one!) of their cards in any system.
  • by Matey-O ( 518004 ) <michaeljohnmiller@mSPAMsSPAMnSPAM.com> on Wednesday January 30, 2002 @10:42AM (#2925042) Homepage Journal
    Maybe Mirror Usenet!
    Well, exclude the binaries and I can mirror USENET on my Palm III!
    • Re:Mirror Usenet? (Score:4, Informative)

      by Cramer ( 69040 ) on Wednesday January 30, 2002 @11:25AM (#2925258) Homepage
      A full USENET news feed (everything one can find) will exceed 120GB per day. It'll almost fill a DS3. (And we were receiving a "crappy" test feed from UUNet.) So, minus @alt.binaries.*, one could mirror USENET for a few years. With the binaries, it'll hold you for about a week, 2 at the most.
      • Well, yes. That was me using a little Artistic License. a) A Palm III has _2_ MB of RAM, and b) it subtly implied that 'aside from pr0n and the Simpsons, there isn't much happening on USENET.'
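
        For scale, at the ~120GB/day figure quoted above (a rough calc, nothing more):

        # Days a 1 TB array holds a full feed at the quoted rate.
        array_gb = 1000
        full_feed_gb_per_day = 120     # figure quoted above, not a measurement
        print(round(array_gb / full_feed_gb_per_day, 1))   # ~8.3 days with binaries
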
  • With RAID5 a single drive can fail without causing data loss.

    How do you know WHEN a drive has failed?

    With the low-end IDE RAID cards your notification comes when the 2nd drive fails......

    3ware's website describes an SNMP monitoring utility for Windows, but doesn't specifically mention Linux support. Ditto for Adaptec.

    If the RAID is done in software, is there a Linux program to monitor and notify when a single drive goes down?
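
    For Linux software RAID (the md driver), one low-tech answer is to watch /proc/mdstat yourself. A minimal sketch - run it from cron and bolt on mail or pager notification; the parsing is deliberately simple and only illustrative:

    #!/usr/bin/env python
    # Minimal degraded-array check for Linux software RAID via /proc/mdstat.
    # A failed member is flagged "(F)" on the device line, and the status
    # string looks like [UU_] instead of [UUU].
    def degraded_md_devices(path="/proc/mdstat"):
        degraded = []
        current = None
        for line in open(path):
            if line.startswith("md"):
                current = line.split()[0]               # e.g. "md0"
                if "(F)" in line and current not in degraded:
                    degraded.append(current)
            elif current and "[" in line and "_" in line.rsplit("[", 1)[-1]:
                if current not in degraded:
                    degraded.append(current)
        return degraded

    if __name__ == "__main__":
        bad = degraded_md_devices()
        if bad:
            print("DEGRADED ARRAYS:", ", ".join(bad))   # hook notification here
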
  • You can stuff 8 60 GB disks into an Antec server case. With a pair of 1600 XP processors, the total cost is: 2 Promise cards = $50, 8 drives = $720, 2 XP processors = $220, mobo = $220, memory = $200, case = $150; the total is about $1500 for 0.5 TB and $3000 for the full TB. Further, you have a bit more I/O bandwidth with 6 IDE controllers and 2 PCI buses than with a single box. Also, when one of them craps out, the other is still going in all probability. Going to 80 GB drives gives you about the same cost per GB of drive space and lets you put 0.6 TB into a case. When you are paying for floor space and cooling, the 160 GB drives make sense, but when you are running these in your basement, going for two boxes makes it a cheaper and more robust solution.
  • With 120 gig drives, your total cost for a 1 TB array would be about $2500. With 4 IDE ports and a large enough case, you could get all that into one box, then network the beastie.

    Now I just need to find $2500. I know I won't have a problem filling it.

    -Restil
  • gawd (Score:4, Funny)

    by kin_korn_karn ( 466864 ) on Wednesday January 30, 2002 @12:10PM (#2925459) Homepage
    using a tb array for anime is like having one of your turds bronzed.
  • by jefp ( 90879 ) <jef@mail.acme.com> on Wednesday January 30, 2002 @12:22PM (#2925534) Homepage
    I've wanted a terabyte of storage since the mid-1970s, when I realized that there were approximately a trillion square meters on the Earth's surface. Store one byte of grayscale image for each square meter and that's a terabyte of data right there.

    Of course these days I'd want 3TB so I could store color images.
  • by ellem ( 147712 ) <{moc.liamg} {ta} {25melle}> on Wednesday January 30, 2002 @12:39PM (#2925645) Homepage Journal
    1 Terabyte solution - $2500

    All the pr0n you could ever watch - $1,000,000

    The look on your Mom's face when she clicks on AsianDogAssRape10.mpg - Priceless
  • by gd ( 86983 ) on Wednesday January 30, 2002 @02:19PM (#2926158)
    I built a similar kind of RAID system (half a TB) using the Antec case. Their case is nice, but not for IDE RAID. The problem is that the IDE cables need to be within a certain length in order to get DMA 5. The case is designed for SCSI, which has a longer cable length limit. Hooking up all the IDE drives in that case is really a pain in the butt.

    For IDE RAID, this case is good, except it's a bit expensive:

    http://www.rackmountnet.com/rackmountchassis/rackmountchassis_4ud.htm

    It can hold up to 16 drives with hot swappable trays. There should be no cable length problem.

    On a side note, I used to plug 5 Promise Ultra100TX2 cards into one computer. All cards were recognized, but only 8 drives were recognized correctly (I plugged in 12 drives altogether). I remember seeing somewhere (either in the Linux kernel source or the FreeBSD sys source) that Promise has a limit of 12 drives per system, with 8 of them in DMA mode and the remaining 4 in PIO mode with some tweak (burst?). So for a big RAID like that, hardware IDE RAID cards (either 3ware's or HighPoint's) are recommended. Using a hardware IDE RAID card also has the benefit of being able to hot swap the drives with the case mentioned above.
  • by mangoless ( 540447 ) on Wednesday January 30, 2002 @02:34PM (#2926221) Homepage
    Storage solution: 1TB RAID5 storage array (prices are from Pricewatch)

    Intel Celeron 700 MHz w/ Socket 370 MB, UDMA 100, AGP video 8~64MB shared only, sound, 56K AMR modem, 10/100 network, in mid-tower case w/ power supply: 1 x $135.00 = $135.00
    Power Magic PCI IDE U/ATA100 RAID controller w/ cable: 4 x $22.00 = $88.00
    Maxtor 4G160J8 5400/133: 8 x $259.00 = $2,072.00
    60.0GB EIDE Ultra DMA 5400: 1 x $85.00 = $85.00
    Total: $2,380.00

    - Mangoless
  • Better performance.. (Score:4, Interesting)

    by tcc ( 140386 ) on Wednesday January 30, 2002 @03:03PM (#2926376) Homepage Journal
    Get a 3ware Escalade card; in March they'll support 48-bit LBA in the new firmware, and you'll be able to hook up those 160GB monsters in RAID-0 (or RAID-5) with a tenfold increase in performance, without taking up all the PCI slots.

    The TX2 is a nice little card, but you can only use 2 drives per board to get the "full speed" (if you use master/slave instead, 4 drives will give you the RAID speed of 2 in a stripe), and then you'd have to stripe your RAID-0 drives in software. Instead of wasting PCI slots and using an underperforming card, you pay a couple of bucks more and you get the real thing with full speed and hardware RAID5.

    There are a lot of RAID benchmarks at storagereview.com as well. IDE RAID is so damn cheap.
  • raidweb (Score:3, Informative)

    by NetMasta10bt ( 468001 ) on Wednesday January 30, 2002 @03:34PM (#2926551)
    Ok. This is just inane. Why build this when someone has already done it better for cheaper?

    http://www.raidweb.com

    We purchase their 8-disk IDE RAID arrays. They are hot-swap; support RAID 0, 0+1, 1, 3, 5, and hot spare; have dual failover power supplies; and come with 64MB cache, which can be upgraded. They are configurable via the EZ front LCD display or via serial console. They support ATA-100, with ATA-133 coming shortly. They're software-upgradable, and they run Linux.

    The array (sans disks) runs us $3200. They even have versions that have dual fiber ports out the back.

    WARNING - DO NOT purchase these with IBM GXP75 (75GB) disks like we did... we have about 80 of them that failed.
