Pros & Cons of Different RAID Solutions
sp1n continues: "We are currently considering 3 options:
(1) SCSI - EIDE controller with six 9G/7200 ATA drives (hadn't heard of this one until recently). This supposedly accesses the drives directly through DMA and bypasses all IDE, just using them as physical media. All are accessed in parallel. I'm a bit wary about the reliability of IDE drives under constant use.
(2) SCSI - SCSI controller with six 9G/7200 u2w drives. The controller currently at the top of my list is the Mylex DAC960SXi w/ 32MB cache. However, something that fits in a half-height bay, instead of hogging a full-height would be nice.
(3) SCSI - SCSI controller as above, running with 2 disk channels and 2 separate RAID 5 arrays for each mountpoint (spool/mail storage).
I'm looking for any experience with IDE/DMA RAID setups (1), as well as the pros/cons of making 2 partitions, both of which are very active, on one array of 6 drives (2), versus 2 separate level 5 arrays of 3 drives for each mountpoint (3). In addition, any suggestions for external controllers and rackmount enclosures would be greatly appreciated. I would like the controller to have an i960 or better processor.
--
"The glass is not half full, nor half empty. The glass is just too big."
just a small note about scsi vs. ide (Score:1)
Jeremy
Sun (Score:1)
More spindles, more simultaneous reads (Score:1)
With each additional drive, you can access another unique piece of data simultaneously. While RAID is nice and helps solve reliability and performance problems, it isn't the only solution.
It is a technique that newsgroup server admins used to use, and probably still do.
Re:just a small note about scsi vs. ide (Score:1)
I'm going to laugh at you! SCSI drives have better failure reporting and have some "dead space" set aside for failure recovery in hardware. Also, the difference between the SCSI and IDE models is huge. With IDE the CPU does more; with SCSI the hardware does more.
Oh well...
Check to be 100% sure drives are the problem (Score:5)
In many cases, adding more memory or CPU can make a bigger difference than more/faster hard drives, if the problem is that the cache is too small or there is too much paging activity. Also check your CPU load and make sure it is nowhere near 100% - if it is, time to get a 2nd CPU.
Also, avoid software RAID implementations like the plague. They will slow down your system and provide questionable reliability. You should also try to find cards that have redundant SCSI controllers onboard, and support redundant cabling. This way if the cable, plug, or SCSI bus fails for some reason you will not be SOL.
Finally, be sure that the majority of your disk accesses are reads. RAID will slow down writes, sometimes drastically so. If the majority of your disk accesses are writes, then tuning your kernel to flush dirty buffers less often may make a good difference.
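As a rough sketch of those checks (Linux command flags shown from memory; the Solaris equivalents are vmstat 5, iostat -xn 5 and prstat):

vmstat 5 5      # nonzero si/so columns mean you are paging -- buy RAM, not disks
free -m         # how much RAM is actually left over for the buffer cache?
iostat -x 5 3   # per-disk utilisation, if the sysstat package is installed
top             # watch overall %CPU and the idle/wait split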
Dell Powervault (Score:1)
Re:just a small note about scsi vs. ide (Score:1)
Re:Dell Powervault (Score:1)
That, and they start at around $50k for 100GB, which isn't even local storage - it's network storage. (Choose CIFS, NFS, HTTP, or whatever else they support.)
Not that these aren't great boxes - we have one and are about to get a second one. But they're pricey and not as fast as local storage - which I believe is what this guy is looking for.
Pricey but attractive (Score:2)
The Network Appliance Filers [netapp.com] are really sexy.
The beautiful thing is they use the WAFL filesystem so you can expand your array when you need to without adding big sets of drives.
Granted, I don't have one but I've submitted the proposals and am waiting on financing. The F720 scales to 464GB, is network attached, has journaling (rad), and can benefit your WHOLE network.
Of course, you have to use NFS or SMB though. I've heard they start as low as $17k but usually $30-40k with a bunch of drives but it's difficult to find general prices without hearing the sales pitch.
This paper [netapp.com] discusses testing the Stanford Linear Accelerator Center performed while evaluating the NetApp filers. It's geared toward Usenet news but if it can handle that, it can surely handle your mail situation.
Does anyone here have first hand experience good or bad with NetApp Filers? And some word on the pricing?
RAID 0 + 1 would be faster than RAID 5 (Score:1)
RAID 0+1 is a lot faster than RAID 5. Its disadvantage is that it's more expensive: you have to buy 100% more disk than usable storage, as opposed to 20-33% more for RAID 5.
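To put numbers on that with the six 9G drives from the original question (just shell arithmetic):

echo "RAID 5  : usable $((5 * 9))G, 1 of 6 drives spent on parity (about 20% extra over usable)"
echo "RAID 0+1: usable $((3 * 9))G, half the drives spent on mirrors (100% extra over usable)"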
As far as which controller to use... Sun now rebrands DPT controllers, but they're PCI and you're stuck on SBus, so I don't know.
Good luck
Couple things (Score:5)
In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.
All you mention is the load averages and a fairly non-specific observation that the drives are "cranking away constantly". If the drives were being used at a constant 10% of their available I/O, they'd still tend to "crank constantly" even though they could be hit much harder. (Still, given that losing email is considered bad by customers, a RAID 5 solution seems like a good idea anyway and leaves you room to grow and handle sudden increases in email from the holidays or spammers or gradual expansion of business.)
As to IDE vs. SCSI -- never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose. Sure, it slows down access to that particular block of data, but it's a lot easier than the OS having to deal with failures directly. However, I'm completely unfamiliar with the strange SCSI - EIDE setup that you're describing -- if it treats them as just physical media and provided the SCSI interface itself, it may be able to do that particular SCSI trick, as well. Physically, SCSI drives and EIDE drives are identical -- as in, you can find the *exact* same drive from certain manufacturers, only one has SCSI and the other EIDE. Reliability of the physical media is the same, IOW. In a normal configuration, *apparent* physical reliability is higher for SCSI due to wonderfully useful trickery.
I don't recall the exact model numbers, but I've seen pretty good results with Mylex RAID controllers before. (more along the lines of database stuff than what you're talking about -- somewhat different needs, but not all *that* different, I suppose.)
I can't see putting two partitions on one RAID device as making a lot of sense -- since things are striped you'd end up running into contention issues.
IOW: I'd guess that option #3 would be the fastest -- it's also probably the most expensive.
If I were you, I'd check more carefully to determine how much of the currently available disk I/O is actually being used... If the budget allows it, the dual-channel RAID solution sounds pretty good. You might want to go with two single-channel RAID cards instead -- makes it easier to stock a backup card in case a card decides to die. Try and get something with hot-swappable drives, too. It makes the RAID stuff so much more useful.
Also, I don't know the details of your setup (of course), but seriously consider breaking the mail serving task into separate pieces and run it on separate machines.
You have:
1) incoming email
2) outgoing email
3) email from customers
4) email customers pick up (POP)
It sounds like you have one machine handling all of these. Breaking these tasks onto separate boxes can help a lot. (If you've made the mistake of telling customers the same name for #3 and #4 (i.e., mail.isp.net instead of mail.isp.net and pop.isp.net), it might be impossible to split those two tasks away from each other.)
You can have a setup such as:
outgoing1 through outgoingN all behind the single name of "outgoing" that internal machines are told to send email to that they don't know how to deal with
mail1 through mailN all behind "mail" that customers are told to have as their outgoing mail server. In particular, it should blindly send off email it doesn't know how to deal with to outgoing.
pop (harder to break into separate machines, but possible)
incoming1 through incomingN with MX records pointing at them for your domain.
Now, breaking into that many machines is probably silly. Moving outgoing to one machine and everything else to a second machine (and possibly mailing lists off to a third machine) may make a *lot* of sense though. Don't get tied into the idea of a monolithic machine to accomplish everything related to a particular task -- eventually it's much more expensive than many cheaper boxes to handle the same task.
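As an illustration of the DNS side of such a split, here is a hypothetical isp.net zone fragment (names and addresses are placeholders; 192.0.2.x is just the documentation test range):

isp.net.     IN MX 10 incoming1.isp.net.
isp.net.     IN MX 10 incoming2.isp.net.
mail         IN A     192.0.2.10   ; customers' smarthost, round-robin mail1..mailN
mail         IN A     192.0.2.11
pop          IN A     192.0.2.20   ; POP pickup on its own box
outgoing     IN A     192.0.2.30   ; internal relay(s) for mail the others can't deliver locally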
Fibre Channel RAID (Score:5)
Our solution is going to be a single cabinet RAID (level 5 for accessing smaller files) with a "hot spare" that will rebuild a crashed disk on the fly. This being a standard cabinet we'll have 8 disks, of which the capacity of 6 will be data (one parity (term used loosely as parity is striped on RAID-5), and one spare).
The disks are Seagate's 10,000 RPM Cheetahs, the most commonly recommended units among all the vendors we've talked to, and the controller is a multi-channel u2w with fibre interface to a Q-Logic PCI adapter.
The total system is going to run just over $15,000. This sounds like a lot, but pricing lower end systems isn't too much cheaper and you'll never get 24-hour turnaround on failed parts (if they're even available). This seems like overkill for a single system, but by adding a fibre hub later we can use the single system for many many machines once a file controller (dedicated machine) is put into place.
The beauty of SAN is that it operates much like FTP, with a control and a data connection. The control connection occurs over your existing LAN, and the data is transmitted directly over the fibre channel (max rate of 100 MB/s).
Other NAS (Network Attached Storage) models are somewhat cheaper to implement, but performance can never match the fibre, as both the "control" and "data" connections (NFS or SMB) travel across your network.
I apologize for digressing from the straight RAID topic, but I felt obligated to give the
-Steve
Re:just a small note about scsi vs. ide (Score:1)
However, SCSI drives reserve dead space and move the contents of bad sectors to a reserved sector and remap the bad sector to point at the previously reserved sector.
IOW: SCSI drives hide physical defects on the media from you, where IDE drives require the OS to deal with the problem.
What about the AMI MegaRaid cards? (Score:2)
- A.P.
--
"One World, one Web, one Program" - Microsoft promotional ad
Suggestions (Score:2)
I'd suggest a SCSI or Fibre Channel raid array, with some 10,000RPM drives, and lots of cache on the drives and the controller. If you are currently IO-bound, you want to make sure that you remove that bottleneck for at least a couple years. Some sort of external enclosure might be nice if only due to the fact that 10,000RPM hard drives make a LOT of heat, so it keeps things a little less critical. Oh, and of course I'd recommend using RAID-5 for obvious reasons. RAID-0 is faster, but clinically insane.
Re:RAID (Score:1)
It allows you to cluster your mail server to multiple servers with very little fuss.
Re:Pricey but attractive (Score:1)
Yes, they're network attached. Good for stuff that is going to be used over the network, naturally. Not good if you need -really- fast access to the data from -one- server. They have CIFS, HTTP, NFS, and something else. We use this for all of our UNIX and Windows home dirs - the same data is accessible via either NFS or CIFS, which can be quite convenient at times.
The feature I like the best from their WAFL file system is the snapshot. It's configurable, and can be set to take hourly and nightly snapshots of the entire file system. A user deleted a file? They can go back into their
your raid dilemma (Score:1)
try to do some benchmarking before you buy (Score:4)
I spent a fair amount of time looking at RAID 5 solutions this past summer for a client. Both external and internal, for Linux. Tried several different controller card brands and drive configurations, did a lot of reading, and bugged a lot of vendors.
You really should try to test your options and all of the configuration combinations using something like Bonnie [textuality.com], on a machine with a similar configuration to your target server. Make sure that your Bonnie test file size is at least twice physical RAM, to eliminate the effects of RAM and controller caching on the results.
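For example, on a box with 512MB of RAM a run might look like this (classic Bonnie flags, quoted from memory; adjust the path and size to your setup):

bonnie -d /mnt/raidtest -s 1024 -m testbox
# -d  directory on the array under test
# -s  file size in MB -- at least twice physical RAM so caching can't flatter the numbers
# -m  machine label to print in the results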
I found that using 6 drives in a RAID 5 config was a LOT faster than 5 drives, most of the time. In fact, 3 drives in an array were faster than 5 in some cases. I think it has to do with the way the controller cards were calculating the distributed parity, and perhaps also due to things the driver was doing. 4 drives usually weren't much better than 3, either.
Stripe sizes for the array can also make a big difference: 32k vs. 128k, etc. Larger stripe sizes are usually better for raw I/O speed, but you may find for email that a higher number of random seeks and transactions per second matters more than raw throughput.
I did not get a chance to do any hard testing of multiple channel configurations with these cards. I suspect that splitting the I/O onto multiple channels would be a win.
IMHO, you definitely want an i960-based board or system, with the fastest CPU you can find on them. I noticed a significant difference between boards with the 33MHz part vs. the 66MHz part.
FYI for others: for controllers, the AMI MegaRAID (alias Dell's PERC2/SC) just blows chunks. Older non-LVD, non-raid SCSI systems can run rings around it, at least on write speed.
It has been my experience that the write speed on a RAID 5 system is generally only a fraction of the reading speed, like 1/4th to 1/2. For a quick and stupid test, do something like 'time cat /proc/kcore > /tmp/kcore' and do the math for MB/second.
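A slightly less stupid variant of the same quick test, writing a fixed amount of data so the arithmetic is trivial (the target path is just an example, and the result is still optimistic unless the file is much larger than RAM):

time dd if=/dev/zero of=/raid/ddtest bs=1024k count=512
# 512 MB written; divide 512 by the elapsed seconds for MB/s
rm /raid/ddtest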
Oh, and my current favorite card is the DPT Millennium V controller; I've been using it in several systems in various places for the last 3 or 4 months. Here are some Bonnie results for a system with a DPT and 6x 7200 RPM drives, all on the same (internal) channel, Linux kernel 2.2.10, dual P3 500MHz:
something isn't right (Score:3)
Re:just a small note about scsi vs. ide (Score:1)
Re:just a small note about scsi vs. ide (Score:1)
raid 5 (Score:1)
I would recommend a Sun MultiPack [sun.com] with Solstice DiskSuite [sun.com] for management.
Load Ave 10 need not mean an IO Bottleneck. (Score:2)
I would be surprised if any exim system was having more of a bottleneck to disk than it was to network. Your disks are faster than your network and exim is pretty light on un-required disk access.
The larger the bottleneck to the network (by network I mean end-to-end with your customer, not just your links), the longer processes are going to hang around.
More processes, more paging, less caching. Less caching, more I/O. More paging, more I/O.
Probably teaching granny to suck eggs, but you do have your swap space on a separate device, don't you?
The more exim processes that hang around longer, the more processes for the CPU to switch around. The more switching, the more likely you are to see paging.
If the processes hang around longer, they take up more memory which reduces the cache-size available.
Exim has several files which it accesses frequently, mainly the retry databases and its configuration. These should permanently be in memory.
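A quick way to answer the swap question above, if anyone is unsure (Linux and Solaris commands respectively; flags from memory):

swapon -s    # Linux: which devices hold swap, and how full they are
swap -l      # Solaris equivalent
vmstat 5     # watch the paging columns (si/so on Linux, sr on Solaris) while under load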
Bottom Line:
I do however suggest that you don't consider moving a single server to RAID. If you have a server that you want to move to RAID for efficiency purposes, your design is wrong and you should be building a scalable system.
Red
SCSI RAID (Score:2)
Don't even think of software RAID.
For some background on SCSI itself try http://www.scsifaq.org
There are many types of RAID. Levels 0-5 are the "standard" ones, but there are several newer ones, e.g. level 10, which attempts to address throughput issues. Your actual space requirements don't seem outrageous, so level 5 would be reasonably cost effective.
Another thing you will probably want is hot swapping. Once you've had a box tell you a drive is dead, you've removed it and popped a new one in without taking the box down, you will not want anything else.
On the IDE vs SCSI debate, whilst IDE is fast it seems to me that under continuous load SCSI gives better throughput.
As others have pointed out, a 'designed' server, rather than a "roll your own" box, would make sense. Compaq Proliants make excellent Linux machines. The SMART arrays are very good and support RAID up to level 5. You can fit a lot of disks in the drive cages as well. They are a little pricey but of good quality and reliability. We have rather a lot of them running NetWare. I get to use the older kit to run my funny Open Source stuff
A suggestion might be:
Proliant 1600, 2 x 600MHz processors, SMART 3200 with 64MB cache, 5 drive slots - 81GB available after RAID 5 on 18GB 1" drives (that's Ultra2 SCSI); supports up to 1GB RAM (has 128MB by default). There is also an on-board SCSI interface for CD-ROM etc. This comes in at about GBP 9,000.
What about the software side? (Score:1)
I'm not familiar with Exim, but aren't there more efficient solutions?
Although my experiences have been with much smaller configurations, qmail [qmail.org] reportedly handles loads of this magnitude on lesser hardware.
Re:More spindles, more simultaneous reads (Score:5)
PCI Bus-- The fastest controller/drives won't make a difference if the PCI bus can't get data to the drives fast enough. Look at what else you are running, and consider upgrading memory/processor like another person said.
Stripe Size-- In a hardware raid setup the controller will write to one hard drive for xxx kb before switching to the next hard drive. You want to figure out what size 'chunks' of data the OS will send to the controller. Netware uses a 64k block size, which means large file reads/writes will be sent from the OS to controller in 64k pieces. If your stripe size is set to 8k, and you have 6 hard drives in a raid 5 array, look at the following situation.
drive1 - 8k total=8k
drive2 - 8k total=16k
drive3 - 8k total=24k
drive4 - 8k total=32k
drive5 - 8k total=40k
now time to calculate parity. this requires the controller to read data from drive1,2,3,4,5, calculate the parity using an XOR algorithm then write the parity
drive6 - 8k parity
drive1 - 8k total=48k
drive2 - 8k total=56k
drive3 - 8k total=64k
Now it has to calculate and write parity again.
compare this to a stripe size of 64k
drive1 - 64k total=64k
calculate parity, write parity
drive6 - 64k parity
Having a poorly configured stripe size can cause a huge performance problem. NT and NetWare (current versions) both optimize their disk writes to 64k. YES! I know the block size in NT is 4k, but the OS still optimizes disk requests into 64k chunks for performance reasons. I'm not sure about the various *nixes; can someone else answer that? Some people have the notion that writing smaller amounts of data to multiple hard drives is somehow faster. Hard drive maximum transfer rates are based on controller->HDD cache. A 64k or 8k write isn't going to fill up the cache on the controller, and a single 64k write will take less time on the controller, fewer commands will need to be issued, and performance will be better overall.
An anecdote about this.
Copying a 1.5 gig file from a workstation to a server with the stripe size at 8k took about 40 minutes; with the stripe size at 64k it took 6 minutes.
Another consideration is how much cache the controller has and how it is used. The AMI MegaRAID controller has three types of cache: write, read, and I/O. Write cache allows lazy writes, which can improve performance. Read cache allows the controller to read ahead, hopefully improving performance. I/O cache (and I2O cards) allows the controller to take some of the work off of the processor, improving overall system performance.
Some controllers come with multiple channels. The AMI MegaRAID series 438 controller has 3 different SCSI channels on it. IIRC each channel can transfer up to 80MB/s. This is similar to the idea of putting hard drives on different SCSI controllers, except that I've never seen an implementation that allows a RAID array to span multiple controllers.
The above info IS NOT ACCURATE for RAID 0, RAID 1, or RAID 3, those levels have different rules. You should consult the OS vendor, documentation, and Database vendor for specific settings to optimize the controller.
Re:Sun (Score:1)
RAID Setup (Score:1)
I used to run a large mail server at a fairly big ISP that will remain nameless, and I'd like to suggest you consider a RAID-10 solution. We were experiencing disk bottleneck problems, and this really helped. Basically, RAID-10 splits the disk I/O half and half over multiple drives with the standard mirroring/striping. This is a simplified explanation, but that's the basic idea.
Procedure (Score:2)
This is my favourite tool for disk analysis. Secondly, go to http://www.sun.com/sun-on-net/performance, read what you feel is important, and download the SE toolkit.
Run zoom.se to get a professional analysis of your system. Run virtual_adrian.se to get a virtual professional to tune your box.
I recommend you do this BEFORE spending any money. I have an E3000 with 2GB RAM and 2% processor utilisation because nobody checked the system properly.
If it is your disks, I recommend Sun kit, even though it is expensive, and RAID 5. Don't worry about people telling you about it being slower; compared to a thrashing single spindle it is extremely fast and, as importantly, reliable. Tinker and learn!
Re:Couple things (Score:1)
I suppose you could spam everyone and tell them to change that, and then have your router redirect that port to the appropriate machine for the people who forget.
Re:Dell Powervault (Score:1)
On a side note another good solution (except that it's not external) would be a Dell Poweredge server. I'm currently running a Dell Poweredge server with Linux and RAID 5 and it works quite well.
...and yes, I'm biased, I work at Dell.... in support...
Re:Dell Powervault (Score:1)
Re:What about the AMI MegaRaid cards? (Score:2)
drives (Score:1)
as for controllers, i say mylex, high-end adapter of your choice, i would beef it up to 128 megs of ram in any case...
as for the drives, go 10,000 RPM, the difference in access times will help you out, and i think that is much more important in your case than transfer rate... for an ISP, i would only ever buy IBM or Seagate drives, reputable workhorses that they are...
for great cases and setups, i honestly recommend macgurus.com - they specialize in mac stuff, but a scsi tower is a scsi tower, and they will build it with good components at a reasonable price to whatever specs you need... (no, i dont work for them)...
Re:"Home Raid Solutions" (Score:1)
Just looking for a way to play with raid on a home system. As you put it, if it were to go down, who cares =) I'd rather make mistakes now while I can afford them.
I see you guys like the case on my page =)
-S
Scott Ruttencutter
Re:Sun (Score:1)
I'm sorry about your experience. However, I support Sun's internal hardware and I have not seen abnormal failure rates on the beasts. Sure, disks go bad - they have moving parts. I support loads of A1000s and they work great. As to diagnostics, that is a sore point for me as well. There's nothing really at the OBP level to test the array. They do come with software that is minimally useful however.
It may be overkill, but I much prefer the A5x00s. All around though the hardware from Sun is VERY good.
_damnit_
mail configuration (Score:5)
A couple of years ago we had the same problem, till I discovered that all our mailboxes were in one mail spool directory. This was a huge bottleneck, and after adapting qpopper and configuring sendmail to a split mailspool dir, load came down to 1. (split mailspool is
check above first before you buy hardware
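For the curious, the hashing itself is trivial; here is a sketch in shell with made-up paths (qpopper and the MTA both have to be configured to agree on the same scheme):

# hash each user into a subdirectory by first letter, e.g. /var/spool/mail/j/jsmith
user=jsmith
dir=/var/spool/mail/$(printf '%.1s' "$user")
mkdir -p "$dir"
echo "mailbox for $user lives in $dir/$user"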
Re:Sun (Score:1)
I am currently contracting to a major shop setting up ISPs, and we're using E250s with A1000s in the rear for data. I've been to 4 different sites around the world with this setup and it just hasn't failed so far, as long as you put a terminator on it.
The RAID Manager software is good for setup, but nothing else. I agree there's nothing for diagnostics on it, but I've never had any failure on the device, except when I kicked one and 2 disks popped loose. But the disks were fine after that.
I wouldn't go for an A5x00 on an Ultra 2, just because a different SCSI card is much easier on the system than putting fibre in there and having more possibilities(?) of crap to wade through. It is overkill.
Re:Couple things (Score:2)
We are all in the gutter, but some of us are looking at the stars --Oscar Wilde
Best RAID? (Score:1)
Well, after dealing with many different brands of RAID controllers, I have found that DPT's Millennium series tend to be the best. The card takes care of everything, and they're available in 64-bit flavors with 3 onboard U2 channels, or 2 Fibre channels.
Mylex are good if you're looking for a cheaper solution, or Adaptec for dirt cheap. But, if you're looking for the absolute fastest possible solution, it would be Fibre Channel Quantum Atlas 10k's on a 64-bit DPT Millennium Fibre controller in a RAID 0+1 configuration. With a 10 drive setup (equal to the total capacity of 5 of the drives) you could easily reach 100MB/s. Of course, that's gonna cost you a pretty penny.
I just did this... my experiences (Score:5)
Assuming I/O matters, I am putting my full faith (and job) on Mylex controllers. I love them. I only have one in production, but am about to deploy 5 more, and we'll come in at about 600G managed by them. They just work. The DAC960SXi I have in production (for 7 months now) has been flawless, delivering wire speed doing RAID 5 without any effort after initial config (which is a bit annoying, to be sure).
My production system using it is doing far too many things - mail, staging server, enterprise backup. This is changing - lack of time and historical accident made it that way. The point is that the Mylex handles it with no grief.
If you're building these, be aware that Mylex external controllers need to be mounted in a box with "internal" style connectors. For good RAID cases, check out http://www.storagepath.com/ - they are what I'm using. They look low rent, but the boxes are nice (if a bit expensive).
Down to specifics. For a mail-only machine doing the sort of volume you're talking about, I'd deploy a dual processor box with three SCSI busses (one for spool, two for mbox/system access - system access is pretty cheap in comparison) attached to two hardware RAID setups. Provided volume allows, I'd go RAID 5 for spool (with 18G disks, that's ~65G spool) and hot spares. For mboxes, I'd do 0+1, for as much space as needed: stripe disks on independent controllers, mirrored to each other. Striped mirrors can grow as you need them to (RAID 5 can't, easily). You don't want to lose anyone's mail. Hot spares for each.
Assuming 100G of mboxes, that's a total of 17 18G disks. Add three Mylex DAC960SXis and (initially) 3 rack mount cases, and that's something around ~24K.
Availability beyond disk is a different question that gets platform specific. I do mainly Solaris now, so I can't talk much about Linux for this. Mylex controllers can do dual active/dual host configurations, but things get more complex, and a summary here doesn't make sense.
Other options like A1000s (Sun specific) and Netapps require different approaches - they're very different beasts. We have all of the above, and treat them very differently. We'll buy them all again - they're all decent - but are good at different things.
If you can, buy raw Mylex controllers through a reseller like TechData or similar - you'll save a lot.
Hope this helps some.
-j
Evaluating RAIDs (Score:4)
Units such as the Sun A1000 and Baydel connect via SCSI and you just watch for an orange light; even the part-time cleaner could pull out the correct disk, replace it, and have the system back up and running without the OS noticing. StorageWorks and Clariion (EMC) do the same but over Fibre Channel. SCSI units tend to top out at 40MB/s; Fibre Channel theoretically tops out at 200MB/s (they have two 100MB/s loops), but since I only had a max of 30 x 18GB disks to play with, the disks were the bottleneck. Monster multi-SCSI machines like EMC/IBM's can achieve whatever bandwidth you want by multiplexing SCSI connections.
We've evaluated software RAID, Hardware RAID over SCSI, Hardware RAID over Fiber channel from EMC, IBM, SUN, Compaq(storageworks) and in our opinion a good smart raid controller with two data channels and load balancing software is impossible to beat.
For speed, stripe (0) mirrors together (1), i.e. RAID 0+1. This allows reads at double speed, because each mirrored disk can handle a request separately, and slightly sped-up writes, because you can write to the RAID controller's NV cache and carry on doing your work whilst that takes care of putting the data to media.
This of course has only a 50% data efficiency.
Using RAID 3 or 5 you lose one disk in a rank for parity; RAID 6 (used by Network Appliance) uses two disks for parity but has wider ranks of disks. This often means that sequential reads are fast, because a request for data wakes up all the disks in the rank, but the whole rank can therefore only handle one request at a time. Writes are slower because you have to read a stripe of data, calculate parity, and write the whole stripe back again.
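To make the parity point concrete, here is a toy XOR example in bash, with single bytes standing in for whole blocks; rebuilding the lost block is exactly the read-and-recalculate work that makes RAID 5 writes (and rebuilds) slow:

d1=$((0x5a)); d2=$((0x3c)); d3=$((0xf0))
parity=$(( d1 ^ d2 ^ d3 ))
printf 'parity block: 0x%02x\n' "$parity"
# pretend disk 2 died: rebuild its block from the survivors plus parity
printf 'rebuilt d2:   0x%02x\n' $(( d1 ^ d3 ^ parity ))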
RAID5 is really good for data which doesn't have to be the absolute fastest.
Whilst we were doing performance tests, we measured a linear increase in speed up to 20 disks (in transactions/second), and there is a definite art in making sure that you spread the load over all the disks available so that a single disk doesn't get thrashed to death.
In conclusion? well, that depends on your OS.
For me, for a PC-based system I would choose a hardware RAID system with SCSI connection which let me choose the LUN sizes. 5 disks in a RAID5 configuration will only waste 1 disk in capacity. If you're finding your mail spool is being thrashed then I would build a 10 disk 0+1 raid and stripe the mail area across them, using the rest of the area for home areas or web areas or something else which has large storage requirements but doesn't get hit hard.
Oops, this assumes that this REALLY is your problem, a lot of disk problems go away by adding more memory to the machine... I assume you have measured this by tracking the outstanding I/O queue.
Re:SCSI RAID (Score:1)
I'd like you to back up this claim (if you can).
You see, serious people like Deja [deja.com] do in fact use Linux software RAID and get it to work. Rather well, too. Does zero-point-two-five-percent disk-related downtime sound OK to you? It does to them.
Mailbox format can definitely affect performance (Score:1)
The original posting doesn't say if the server is running pop/imap, and thus if it is used as the final delivery point for those 10,000 users.
If it is, then the hashing of the mailbox path that lucky luck mentioned is worth investigating. Also worth investigating are alternative mailbox formats. If you're using mbox format, then I'm not surprised there's a problem if you have a large number of users (and/or reasonably large mailboxes).
There has been some discussion about these issues on the exim-users mailing list [exim.org]. I read it via egroups. [egroups.com]
Re:Couple things (Score:3)
Correct me if I'm wrong, but isn't the load the average number of processes in the run queue? This would mean that processes that are blocked on the network or disk would be in the sleep (wait) queue, and not counted in the load average.
In this case, a load of 20 means 20 processes are ready to run, which is not so good.
Re:What about the AMI MegaRaid cards? (Score:1)
e-mail me at bm@datapace.com if you are still thinking about getting one.
Re:Load Ave 10 need not mean an IO Bottleneck. (Score:1)
I suggest you examine your system carefully to see what is actually happening. Besides using vmstat, iostat and friends, you can get a software package by Adrian Cockcroft which has a 'virtual adrian' that points out all the bad spots in the system.
It can be found here : SE toolkit [sunworld.com]
Re:Dell Powervault (Score:1)
Clariion fibre-channel RAID box (Score:1)
As for me, I'm considering their lower end SCSI boxes connected to a high-end Intel server running Linux, seeing as I have $52,000 to spend this year! (yippee). The idea is to put all the money where the valuables are (the data) and use commodity hardware and open source software to drive it. The OS would boot from an internal HD and all data and local customizations (i.e., /usr/local) would be on the external RAID box. If a CPU box fails, unplug it from the array, plug in a spare CPU box, reboot. Minimal downtime due to hardware problems. I can then repair or replace the busted CPU box at ease.
For Linux jockeys, there are efforts to bring fibre-channel drivers to Linux. Be sure to look at the work at Worcester Polytech [wpi.edu] for info.
Seconded! Re: netapps (Score:1)
You do, however, need to be aware of how to make your application play well over NFS. Exim is actually reasonable at this. Qmail is good at storing mailboxes on NFS thanks to its Maildir technology, but the mail queue *needs* to be on a local disk... I'm not sure about postfix or sendmail (bletch).
Unfortunately, I can't remember the command to make the individual LEDs on the disks blink, which is one of the best remote diagnostic features ever.
-Dom
Re:just a small note about scsi vs. ide (Score:2)
There is even a usable external ATA RAID subsystem out there, manufactured by Arena. They use the same i960 that is used on high end SCSI RAID controllers and deliver decent performance with cheap drives. (Remember: The I in RAID once meant inexpensive)
Of course, in a server, you want reliable drives. But that has next to nothing to do with the interface. UDMA is very reliable as far as the interface data transfer is concerned, I would rate it even higher than SCSI in this regard (proper CRC vs. ordinary parity). The quality of the disk mechanism is another thing, but with IDE drives being so cheap, you could afford to upgrade the things so quickly that they never get a chance to fail at work. Or you could just buy two big ATA drives for less than one SCSI drive and do RAID1.
For the record: Recent ATA drives really scream. Look at these Bonnie results from my workstation (dual P2, 128M, Red Hat 6.1, 2.2.13, test run on a 2 GB / partition, 50% full):
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU
512 18196 97.7 23648 22.5 10807 19.1 19702 84.2 23128 6.9 129.9 2.0
The drive is a 20 GB Seagate ST320430A which sells for less than 400 DM around here. Remember: These are not artificial results on an empty filesystem. This is my real root partition which is used daily.
Wrong, wrong, wrong (Score:4)
Firstly, vmstat tells you very little about disk I/O. What it is good for is the processes. Look at the output from vmstat 5, for example. The first three columns are r b w: running, blocked and waiting. If there are blocked processes, look at WHY they are blocked. Use top to get the I/O wait information. If there is a lot of I/O wait, then look at the disks. Use iostat -D to get percentage utilisation of the disks. If there is a lot of disk wait, then you may need to either add more disks or spread the load.
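Concretely, that sequence looks something like this on Solaris (intervals are arbitrary; flags quoted from memory):

vmstat 5       # r/b/w: runnable, blocked and waiting processes
iostat -D 5    # reads/s, writes/s and %util for each disk
iostat -xn 5   # extended per-device stats, including service times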
It is interesting to note the relative speeds of devices:
If the CPU takes 3 seconds to do a job, then:
Level 1 cache takes 10 seconds
Level 2 cache takes 1 minute
Memory takes 10 minutes
Disk takes 7.7 months
Network takes 6.5 years
Get stuff off your disks better! Monitor your cache hit rate to get information on efficiency. Use vmstat or sar or stuff from the SE toolkit. Get the SE toolkit from http://www.sun.com/sun-on-net/performance. Run zoom.se to monitor your system. Run virtual_adrian.se to tune your system. Use the right tools and don't just add more memory: identify the bottleneck, fix the bottleneck, re-test, and repeat until the performance is satisfactory.
Re:just a small note about scsi vs. ide (Score:1)
Anyway, there certainly are not two heads on each platter, as suggested by another poster. At one time Seagate made Barracuda drives that were able to read data off two platters in parallel. They dropped it in the later Barracudas when the increase in data density made it possible to make faster drives without this feature.
Another issue is that IDE drives are usually optimized to withstand getting started and stopped again and again by powersaving, whereas SCSI drives are optimized to run continuously for years.
Benny
Probably Not I/O That's the problem (Score:1)
A few things that may help;
1) Our POP mail server (~1000 users) running on an old Sun Solaris machine (LX) was having problems because of the number of NIS lookups that were going on. System CPU was up near 75% constantly, I/O waits near 0, and load was also very high. Solution: make the mail server a NIS slave as opposed to a NIS client. Reduced load by 20% immediately. Same goes for DNS lookups.
2) Make sure you're not writing/reading to/from NFS mounted fs.
3) Install recommended Solaris patches - these can make a big difference. Try installing Virtual Adrian, and see what it recommends.
5) Don't buy EIDE for all the reasons mentioned previously. For lots of simultaneous hits, SCSI outperforms EIDE every time.
6) Consider fibre channel disk arrays from Sun - expensive, but they are nice, especially the new A5200. It gives 22 spindles as opposed to the 14 in the A5100.
7) Ignore the guys talking about s/w RAID solutions being a BIG slowdown. Sure, h/w RAID 5 is much faster than the s/w equivalent, but when it comes to RAID 0+1 there ain't a lot of difference. Not only that, but s/w RAID systems tend to be much easier to configure and maintain, w/o a doubt - check out Veritas Volume Manager (love it!), or even the free DiskSuite (with the Sun Solaris server version), which is better than any h/w RAID configuration I've seen.
8) I would bet my next salary that adding a RAID system to your mail server will increase performance by less than 15%.
Oh, and I've been managing enterprise level Sun systems now for 8 years, so I'm not just a Linux geek who has read too much ;)
Hope this helps.
This may have helped me as well... (Score:1)
Filesystem Size Used Avail Use% Mounted on
and that's AFTER cleaning out... before I had / at 100%,
If you think you know what the hell is really going on you're probably full of shit.
Re:What about the software side? (Score:1)
Probably not
qmail starts up a separate process for every email it delivers, whereas exim starts a separate process for each batch of email it delivers. On a lightly loaded system, the point is probably moot - however on systems like what we are discussing, it's quite probably not!
Re:just a small note about scsi vs. ide (Score:1)
Re:just a small note about scsi vs. ide (Score:1)
physical limits of IDE might become an issue. I don't know if it's reasonable to plug raid5 array disks in as IDE slaves, but I would go for SCSI if you do big raid5 arrays. With 5 Ultra2 fast and wide SCSI disks in a raid5 array (software raid5 in Linux) I have seen reports of 40MB/s read and write throughput. And if you have the dough, buy 2 controllers, put a raid5 array on both, and stripe among them.
--miku
Re:just a small note about scsi vs. ide (Score:1)
seem to notice (MP3s don't stop, I can move my mouse again). So I wouldn't even consider IDE for my workstation anymore, even if SCSI is 500-1000 NOK (US $80-140).
Re:just a small note about scsi vs. ide (Score:1)
So the man is right - no difference in reliability whatsoever.
The question, I think, is that they are actually hitting not a drive bottleneck but a UFS filesystem bottleneck, so they should either abandon Solaris or buy (I forget the name) the file server and reliable filesystem extensions for Solaris.
So even if they upgrade to RAID they are not going to get anywhere.
Also on the topic of RAID: there are very good external boxen using proprietary solutions for IDE hotswap that present a single u2w or better SCSI interface to the box. And they are rackmountable. And they cost about 4000-5000 fully populated with 13-17GB EIDE drives.
Re:just a small note about scsi vs. ide (Score:1)
You fail to mention which chipset and transfer mode you are using. And there _are_ SCSI hosts that do a lot worse than recent ATA interfaces (the cheap ISA Adaptecs that come bundled with scanners and ZIP drives, for example).
I once was a SCSI advocate, too. Then came the Intel PIIX3 and Multiword DMA mode 2; nowadays I am using a PIIX4 and UDMA33. I have _never_ had my system go slow on me with DMA ATA drives, much less had the mouse pointer stop moving.
There is just one special case: Swapping to ATA drives can put more of a load on the system under certain circumstances (because I haven't seen ATA drivers use command queueing yet - it's already specified in the ATA spec, though), but you don't really want to be swapping in the first place. If your system does that constantly, you should have gone for more RAM instead of that pricey SCSI drive.
Re:More spindles, more simultaneous reads (Score:1)
RAID is not the answer to your problem (Score:1)
For example, try using a fallback mailhost for outgoing mail (fallback_mx in Sendmail). That way messages that cannot be delivered within a couple of seconds are relayed to the fallback server, keeping your outqueue clean and tidy.
For incoming mail, use a different server, or if you can, use several. You could just put them all in the MX list of your domain, with the same priority. This does wonders.
It might be smart to look at the mailbox format. Some mailbox formats (MBX) have much better performance than others. And you could put POP3 and IMAP on a third server.
All this is much preferable to simply installing a RAID array, IMO, based on the information you presented.
Yup, I love Clariion stuff too. (Score:1)
I would definitely try to tune the system before throwing hardware at it though. Find out exactly where the bottleneck is.
RAID Solution (Score:1)
More on SE - Orca (Score:1)
raid and sun solaris (Score:1)
definitely install virtual adrian to get a better idea of system tuning you can do and where your real problems might lie. have you tuned all the system parameters possible? ncsize? turned off all non-essential daemons/apps on the machine?
mylex controllers seem reliable but were definitely a pain to configure - we're using them on a dec fileserver solution. one downside that appeared was that they took 6-8 hours to initialize the array, compared to 1.5 hours for a non-mylex controller
we're now switching from DEC+Mylex to Sun+Infortrend who make a very nice scsi-scsi controller. www.infortrend.com - we're using the 3201U2G - 4 Ultra2Wide scsi buses.
don't go to raid unless you know what you're getting yourself into - it's far more complex and expensive in the long term apart from your initial investment in the hardware. you'll have larger spares provisioning, your documentation (you do have some right
my rule of thumb at present is JBOD to 50G, RAID as a NAS for 50G-500G, and SAN (RAID/fibre) above 500G. you really don't need raid below 50G except for specific performance reasons
it's been an interesting thread to read, since i'm right in the middle of working on a raid5 server implementation.
-jason
Don't focus 100% on the hardware (Score:1)
I'd recommend using the Postfix MTA, as it has almost all features of Sendmail, it's secure, and (hold on) it's even faster than Qmail. Eventually you could use it with the Cyrus IMAP/POP services. You definitely want to make sure that you don't have all mailboxes in the same directory. Build a hierarchical structure where you never have more than, say, 30-50 subdirectories/files in one directory.
Ok, if disks are still your problem, consider:
1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
2) An IDE disk is identical to a SCSI one, except of course for the interface and the warranty. The price difference is mainly due to the warranty.
3) UDMA/ATA-{33,66} IDE interfaces are as fast as any SCSI solution if you keep _one_ disk per channel. The main problems with IDE solutions are the short cable length allowed (a problem for 10+ disks) and the number of controllers you must have (one controller for each two disks).
You can spend $50K on a SCSI/HW-RAID solution easily. And you won't know if you'll even get the speed of one single UDMA drive from it (yes people actually get 15MB/s both from their single UDMA drives, and from their expensive DPT RAID solutions). At least consider a software-RAID and eventually IDE solution before rushing out to spend the next 10 years budget on the shiny HW-RAID solution.
Your setup is fairly small, e.g. you would probably do just fine with a four-disk RAID-5/10 for spool and mailboxes. This is where SW RAID is worth considering (see the sketch below). Granted, for 20+ disk systems, HW RAID may well be a better way to go, eventually combined with SW RAID.
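To show how cheap that is to try, here is a minimal sketch using the modern mdadm tool (the raidtools/raidtab pair was the contemporary equivalent); all device names and the mount point are examples only:

# four-disk software RAID 5 with a 64k chunk
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mkfs -t ext2 /dev/md0
mount /dev/md0 /var/spool/mail
cat /proc/mdstat   # watch the initial parity sync finish before benchmarking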
My 0.02 Euro.
Re:just a small note about scsi vs. ide (Score:1)
Re:just a small note about scsi vs. ide (Score:1)
Re:RAID 0 + 1 would be faster than RAID 5 (Score:2)
Some array types (notably HP that I know of) will dynamically rearrange data storage between RAID 0+1 and RAID5 to optimize speed and space.
RAID 5 vs. Striping (Score:1)
The multi-controller solution is probably best; someone mentioned the Sun StorEDGE product with the Cheetah drives. This is a great piece of gear, and coupled with some really good storage management software (might I suggest Veritas Software's File System/Volume Manager) you'll get a very flexible solution providing the most bang for the least buck. With the Veritas product you can manage the data on the fly over several drives, and monitor & tweak the configuration on the fly while in a production capacity; additionally, the Veritas product provides a journalled filesystem which will allow rapid restarts in the event of a crash and if you have the drives, can be configured to fail over to available spares.
Yes I am a Veritas Consultant =^) but that does not change the fact that this is an excellent product that would probably go a long way towards addressing your issues (which seem more performance oriented than reliability related) on your existing drives. Check out this link for more info: http://www.veritas.com/library/su/fsconceptwp.pdf
Good Luck!
-Videoranger
attractive but attractive (Score:1)
One of the best things about it is its simplicity. GUI people use the nice Java applet to control it (it gets better with every release of the OS), and us Unix people have a great command line interface.
If you plan to use the NetApp with lots of clients (about 500 in our case) in a mixed environment, the Network Appliance is probably the most reliable and simple to maintain solution. If you want the fastest RAID array to connect to your mail server, it will simply amaze you.
If your budget allows, go for it!
Re:just a small note about scsi vs. ide (Score:1)
No - he is in fact right. I don't know about all manufacturers, but certainly IBM drives use the same hardware - the only real difference being the content of a firmware chip (and the cable connector, I guess).
This doesn't mean that all the drive's features would be available for both SCSI and EIDE, and it doesn't stop them charging loads more either.
-- Steve
Re:More spindles, more simultaneous reads (Score:1)
NetApp the good the bad and the ugly (Score:2)
All we seem to hear is RAID5 this RAID5 that (Score:1)
reads vs writes (Score:1)
Re:What about the AMI MegaRaid cards? (Score:1)
Software RAID is NOT faster (Score:1)
>1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
Since when? I've been working on servers with and without RAID for ten years now, and this is the first time I've EVER seen this claim. Was that a typo? Hardware RAID is usually much faster, as well as more reliable. Yes, it can be harder to set up, but in the end it is well worth it. Remember, you get what you pay for. Any time you use software to do a job that hardware can handle, you are devoting CPU cycles to it. Properly designed RAID controllers offload a ton of processing that would otherwise be done by the host CPU. They don't put RISC processors on RAID controllers just for show.
As for SCSI controllers, I'll echo what others here have said. Mylex is one of the best. Not the easiest to config, but by far one of the fastest and most reliable controllers out there.
Re:SCSI vs. IDE (Score:1)
Yes, it is. There are still people who recommend SCSI without further investigation.
> For example, let's say your system is trying to read data and do a write at the same time.
No decent OS would do that. It would concentrate on reads and save the writes for later, unless the write cache is full.
> With IDE your OS has to issue one command to the controller which passes it to the device and then waits...
With IDE maybe. With ATA not. ATA does have everything that SCSI has, and more. Read the specs at www.t13.org.
> With SCSI, the OS tells the controller all the operations it wants to do and the controller looks at it and decides if there is an optimal way of doing the commands.
Of course, only if you have a host adapter / driver which support command queueing, and an application that _does_ do multiple accesses at the same time. Most don't. And a decent OS reorders the commands anyway before they are sent to disk, partly eliminating the need for reordering by the drive.
Re:Sun (Score:1)
I've installed 3 A1000s over the last couple of weeks, ranging from the minimally specced ones (50GB RAID5) to a fully loaded one (8x 36.4GB).
Although RAIDmanager is only marginally useful and you have to make sure your
--
Full Time Idiot and Miserable Sod
Re:just a small note about scsi vs. ide (Score:1)
For example, think of a read head able to read the ever-shrinking area of a single bit on the surface of the platter (bear in mind that tricks are used to work out the real state of a bit; you don't need heads able to read a bit on a stationary platter). Attempting to use that head to write to the disk may well destroy it, so you now need another head to write with.
The two heads are on the same arm, and can't operate at the same time (no point, you know what you're writing
Bryn
--
You've bought the hype (Score:1)
The problem is that each vendor implements this differently, and has a different definition of what a SAN should be. None have really addressed the complex issues, instead implementing the kind of hack you describe - NFS with a data channel over FC-AL. You still have the problems of NFS to contend with (no reliable locking, no consistent transactional guarantees in client and server implementations, etc.). Heck, most vendors are selling FC-AL HUBS instead of SWITCHES to accomplish this storage sharing because the switches aren't prepared to do TCP/IP over fiber!
Ideally a SAN would be a well fleshed-out spec that allows massive amounts of storage to be conveniently accessed across a network with all of the guarantees of a local disk. That's how it's being sold. However, right now it's looking like little more than a way to get NFS to run faster.
-Peter
Re:Software RAID is NOT faster (Score:1)
I've seen quite a few people discover, in disbelief, that they surely didn't get what they thought they paid for when buying HW RAID solutions.
Back in the old days I'm sure letting an i960 do parity calculations was a boost. Well, times change.
The _only_ thing I've seen HW raid controllers being better at, is large setups (10+ disks) where a pure SW solution will load the memory and PCI busses of the system heavily. Especially RAID-1 where a SW solution will have to duplicate data to all disks, the HW solution will have an edge moving this duplication off the main memory / PCI bus.
For smaller setups, like the one in question here, software RAID is absolutely both a viable solution, and probably offers by far the best price/performance.
Look at the FS first! (Score:1)
Also, if you do get a RAID, I'd highly recommend a box that is not controlled in software, i.e. via Solstice DiskSuite or Veritas Volume Manager (I love Veritas' VM, but as a RAID controller it lacks intelligence).
A good external box with hot-swappable drives and a sizeable write-back cache (w/ a battery!) is my favorite way to do this stuff.
The trouble with NetApp (Score:2)
Anyway, NetApps are a great solution for multiprotocol storage. One of the drawbacks is that it is network attached and therefore only as fast as your network... which has been a problem for many of our customers. Another HUGE problem is backup. There is only one product that can do it well - a product called BudTool. BudTool is a little guy that some geeks in my company thought up and brought to market; then along came NetApp, who asked us to figure out a way to back up their filers. Out of that venture NDMP was born. BudTool is the only product that makes use of NDMP correctly. That division of my company was recently sold to Legato Systems, who plans to EOL the product. NetApp is now scrambling to find another solution, since they've been recommending BudTool from jump street....
Pricing is also an issue. And you were right in saying that they start at around $17K, but that is WITHOUT storage. A good sized storage solution, let's say 1 TB, is going to run you upwards of $100K. Yikes.
There is also a good resource for people who are thinking of deploying a NetApp solution, which is the toasters users group. You can send an e-mail to toasters@mathworks.com and ask to subscribe to the group. You'll get a lot of good feedback on what works, and what doesn't. You'll also get to see the downside to using it (and BudTool). I think there is info about the group at http://teaparty.mathworks.com but I haven't been able to get there in a few..... Check it out. It's definitely worth the trip.
And if you need any quotes I'd love to help you out!!! Just Joking
Re:All we seem to hear is RAID5 this RAID5 that (Score:1)
That is simply not true. Reads in RAID 5 occur from all volumes where a stripe resides. A file never exists on a single volume in RAID 5 unless it is smaller than the stripe size.
Raid & NFS Systems for Sun Sparc & ISP's (Score:2)
Reliability is the issue when it comes to email and RAID systems. Of course Sun has the edge, so why not stick with Sun software & hardware. The Sun StorEdge A1000 has a caching controller and usually 30-40 gigs per rack; it plugs into your SCSI bus, and you can simply add another dual channel SCSI card to split the load or add redundancy.
Network Appliance makes an excellent solution. NFS toasters are the way to go in a distributed environment. Say you have a customer on a shell account: you can export the mail directory, mount it via NFS, and access it from the shell servers without throwing more email load on them locally. NFS toasters come in a great looking appliance rackmount case, and how much storage you need determines how much rackspace you need.
And of course there is StorageTek, which will run you a pretty penny, but offers Fibre Channel or multiple SCSI channel connections, full redundancy, caching, hotswap, and maintenance features.
I'd never stick an IDE solution on a production box. You need something that you can get support and services on, so I'd suggest that you stick with the Sun StorEdge A1000 drive systems for complete compatibility and put it under the same support contract as your UltraSPARC.
AND
As far as email is concerned, you should set up an MX server to cache and forward incoming email. These work real nice since you can run RBL or pre-process out spam without killing the actual server that holds and processes email for incoming clients. You have to look at a distributed environment, as email is precious to a lot of people, and a single server machine is not gonna cut it when you're upwards of 20,000 customers doing that much email.
PS. Try out Qmail too :) smaller footprint!
Go with a professional solution (Score:2)
I'm as much of a tinkerer as anybody; for my own use I don't mind spending two bucks of labor to save one buck of investment, because I'm really investing in myself. That said, if I had 13K users depending on me for e-mail, I wouldn't mess around; two days of down time could be fatal for your business.
I'd invest $1.50-$2.00/user in a professional grade solution:
Hardware SCSI raid controller.
Drives on hot swap trays.
Same/next day on-site service contract.
External cabinet that can be swapped over to another computer.
It's been over two years since I spec'd a solution like this one (I'm doing software exclusively these days), so I can't make a specific recommendation for today's hardware. I know that some devices used to come in a separate cabinet and looked like a humungous SCSI drive; they even had their own RJ-11 to hook up to a phone line for remote diagnostics from the vendor's tech support.
If the money to swing this is impossible, then I'd recommend mirroring rather than RAID 5. All of these options are compromises between reliability, cost, convenience, and performance. RAID 5 is an excellent overall solution from a performance standpoint, but if you cannot afford it, RAID 1 is a good choice: it offers fast reads and survives the failure of either disk, at the cost of somewhat slower writes. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash. You could even use IDE drives, and if you can afford a second IDE controller, you can do software mirroring across the two controllers for improved throughput.
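For what it's worth, a minimal sketch of that two-controller software mirror on Linux, using the later mdadm tool (the raidtools of this era did the same thing via /etc/raidtab); the device names and mountpoint are assumptions:

# /dev/hda and /dev/hdc are the masters on two separate IDE channels (assumed)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdc1
mke2fs /dev/md0
mount /dev/md0 /var/spool/mail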
One thing I haven't looked into is RAID-2, which stripes data at the bit level and stores Hamming error-correction codes on dedicated check drives. It is seldom used because the drives themselves, SCSI and IDE alike, already do ECC internally, so it probably doesn't buy you much even for IDE RAIDs.
Good luck.
Really what would be great is failover clustering.
Some observations... (Score:2)
1. When was the last time you defragged the drives? Chances are this will reduce thrashing immediately.
2. Add more memory. More cache == less I/O. Double the RAM for a week and see how much better things are...
3. Hardware RAID is the only RAID. In most cases, the overhead of s/w RAID exceeds the I/O performance increase. Plus, the OS (whatever OS) need never know the boot drive is spread across 5 drives in three racks...
4. Hot Swap is a must for a production environment. Nothing beats the warm feeling of yanking a dead drive, slapping in a new one, and watching it get rebuilt on the fly - and the users never know...
5. Any amount of RAID will still fail badly if the PSU dies - always get redundant, hot swap power supplies.
6. The same goes for cabling.
Re:Load Ave 10 need not mean an IO Bottleneck. (Score:2)
Indeed, a high load average does not by itself point to an I/O bottleneck; if anything, an I/O bottleneck tends to show up as a lower load average.
The run queue holds only those processes that the kernel thinks can constructively use CPU cycles. Once a process asks the kernel to access an I/O device, the kernel decides whether the request can be serviced immediately; if not, the process is taken off the run queue and sleeps until the device becomes available again.
Thus, if you have a lot of processes hitting the same device, an I/O bottleneck will actually lower the load average, because fewer processes are runnable at any given moment.
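A quick, generic way to see which situation you are in (not specific to the original poster's box) is to watch the runnable and blocked process counts side by side:

vmstat 5
# 'r' column: processes that are runnable (CPU pressure)
# 'b' column: processes blocked in uninterruptible sleep, usually waiting on disk
# A persistently non-zero 'b' with a modest 'r' points at the disks, not the CPU.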
Re:Dell Powervault (Score:3)
I think network-attached storage is a fine idea and the "right solution" for many things, but I just have to add a rebuttal here anyway.
Network-attached storage is faster than local storage if your network (including the protocol stack) is fast enough and your local-storage subsystem (including its own separate protocol stack) is slow enough. That's a totally useless claim. It's like saying that a train is faster than a car, leaving out the part about the train being an unloaded bullet-train engine on an empty track and the car being a Yugo stuck in New York traffic.
In actual fact, the raw bandwidth of modern storage interconnects (e.g. UW SCSI, FC) is higher than that of most network interconnects (e.g. 100baseT) for which the adapter cost is similar. In addition, the protocols used for storage (e.g. SCSI, the various layers of FC) are more suited toward that task - duh - than are the protocols used for networking (e.g. TCP/IP). There is no reason in hell that it should be faster to use network interconnects and protocols to access your storage than to use storage-specific interconnects and protocols.
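To put rough numbers behind that, here is a back-of-the-envelope Python sketch using the nominal peak rates of the interconnects mentioned (spec-sheet figures, assumed for illustration; real-world throughput is lower on all of them):

# Nominal peak transfer rates in MB/s (spec-sheet figures, for illustration only)
interconnects = {
    "Ultra Wide SCSI bus":     40.0,
    "Fibre Channel (1 Gb/s)": 100.0,
    "100baseT Ethernet":      100 / 8.0,   # 12.5 MB/s, before TCP/IP overhead
}
for name, rate in interconnects.items():
    print(f"{name:24s} {rate:6.1f} MB/s")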
Why might it appear that network-attached storage performs better? I can think of at least three reasons right off the top of my head:
At this point I should disclose my own biases. First, I work for EMC. That's not by choice - the company I was working for got bought out - and I'm often not thrilled about it, but the pay is good. In particular, I don't buy in to all of EMC's arrogant "storage is the center of the universe and the Symmetrix is the ultimate storage device" attitude, and I heartily dislike our own Celerra NAS product even though it blows the doors off NetApp in terms of performance and scalability. Secondly, my professional areas of interest include distributed, cluster, and SAN filesystems, so I of course have some fairly strong opinions on such matters. That said...
I think that once we start seeing true, mature, multi-platform shared-storage filesystems, NAS will start to seem much less appealing. Why pay for NAS when you can just add software to your existing hardware investment and get all the sharing with almost all the performance of local access? Now all we need is a decent implementation of such a filesystem.
IDE still isn't SCSI (Score:2)
I wouldn't go that far.
Yes, IDE has finally caught on to such things as DMA and busmastering, and throughput on IDE devices is in the same arena as SCSI now. But.
IDE is limited to two devices per bus, and generally requires one IRQ per bus. IDE also has very strict and short cable-length limits, and lacks an external connector -- you generally can't have an external IDE device (I know it is possible, but the cable restrictions make it very difficult).
There are more kinds of devices (scanners, printers, etc.) available for SCSI than for IDE. SCSI is generally more capable in terms of what you can do with it.
IDE controllers tend to be very primitive compared to their SCSI counterparts. Things like bus disconnect, command queuing, scatter-gather, even busmastering are often not available or iffy on IDE controllers. This applies especially to the onboard controllers on many motherboards; the number of shortcuts taken there is incredible.
Likewise, the drive electronics and HDA components in IDE drives are often cheaper than those in SCSI drives. These are all design and engineering issues, not issues with the specification itself, but they exist. The problem stems from the fact that IDE is marketed to be cheap, cheap, cheap, and thus gets a higher incidence of cheap components. It isn't limited to IDE, either -- you can also find cheap SCSI hardware; there is just less of it.
IDE often appears faster in benchmarks because benchmarks typically do bulk operations on a single device. IDE has lower command overhead than SCSI, so for such things IDE will be faster. But in the real world, with multiple processes trying to access multiple devices at once, IDE stalls while SCSI keeps on going.
I realize this started off as a discussion about RAID, and that IDE RAID devices are not your typical RAID devices. They usually have one drive per bus, connected to a custom controller that multiplexes them all and presents them to the host as a SCSI interface. But the topic has drifted to more general applications.
Just my 1/4 of a byte.
Re:Evaluating RAIDs (Score:2)
Kinda why you want gobs of battery-backed RAID controller cache memory... (and a UPS, and clean power...)
Your Working Boy,
Re:Go with a professional solution (RAID-1 vs 5) (Score:2)
The reason for this is that RAID-1 uses 1:1 mirroring of a two-drive set, while RAID-5 uses rotating parity, in which parity information is distributed across all drives.
With regard to space, using RAID-1 your usable yield (what shows up in df) is half of the total disk space put into it. With RAID-5, parity info is spread throughout all the drives, costing you one drive's worth of capacity. E.g., I have a RAID-5 using four 4GB drives, which gives me 12GB of usable space. With 0+1 on the same drives, it would be 8GB usable.
As for speed, both RAID-1 and RAID-5 allow you to read from multiple disks at once (which, of course, is a win). For writes, a drive pair in a RAID-1 takes about as long as a write to a single drive. RAID-5 writes are slower, because a small write becomes a read-modify-write: the controller has to read the old data and old parity, compute the new parity, and then write both the new data and the new parity.
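A back-of-the-envelope comparison of the two, using the four-4GB-drive example above (generic RAID arithmetic, sketched in Python):

# Usable capacity and small-write cost for 4 x 4GB drives
drives, size_gb = 4, 4

raid01_usable = (drives // 2) * size_gb   # mirrored pairs: 8 GB
raid5_usable  = (drives - 1) * size_gb    # one drive's worth of parity: 12 GB

mirror_write_ios = 2   # write the block to both sides of the mirror
raid5_write_ios  = 4   # read old data + old parity, write new data + new parity

print(raid01_usable, raid5_usable)        # 8 12
print(mirror_write_ios, raid5_write_ios)  # 2 4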
A decent little overview is at DPT's site (sadly, only in PDF) at http://www.dpt.com/pdf/understand_raid.pdf [dpt.com]
Re:More spindles, more simultanious reads (Score:2)
FOLLOWUP - Current Solution (Score:2)
After much shopping, questions, advice and temporary insanity, we decided to go for a new Linux box to handle the mail. Apparently, the load wasn't only coming from disk I/O wait; the kernel was using 70% CPU. We chose a Dual PIII/500 setup on an Asus P3B-DS, 512M ECC SDRAM (less than before, but prices are so high right now, and we figure processes should end sooner on this box), Intel Pro/100, Seagate Barracuda for system, six Seagate Cheetahs for spool and mail storage, and a Mylex eXtremeRAID 1100 (w/ the 233MHz i960).
It was configured with 5 spindles in RAID 5, with 1 as a hot spare, and then partitioned in half. I'm confident this badarse controller can keep up on the writes, with minimal performance hit. Preliminary results with bonnie are inconclusive, since it's working with one huge file, rather than thousands of small files. If write performance lags once it goes online (this Sunday am), we'll split it into 0+1.
Exim, QPOP, and IMAPD were hax0red to use a double-hashed directory structure, e.g. "spin" would reside in /var/mail/s/.p/spin (the dot is required so that single-character usernames still work). This should eliminate any overhead that ext2fs may have with large directories.
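A minimal sketch of how such a hashing scheme might look (this is my reading of the layout described above, not the actual patch):

import os

def mailbox_path(user, base="/var/mail"):
    # First level: first character of the username.
    # Second level: a dot plus the second character; for a single-character
    # username this degenerates to ".", i.e. the same directory, which is
    # why the leading dot is needed.
    return os.path.join(base, user[0], "." + user[1:2], user)

print(mailbox_path("spin"))  # /var/mail/s/.p/spin
print(mailbox_path("x"))     # /var/mail/x/./x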
Thanks for all your advice, keep it coming. If you're a gamer, check out http://www.xmission.com/quake
-Kevin Blackham, Xmission Internet, Salt Lake City, UT
Actually... (Score:2)
A colleague of mine recommends doing a complete backup/reformat/restore cycle every 2 months or so on partitions that see a great deal of editing and extension of files. On a partition in use since '93, I expect this would give a radical reduction in thrashing . . .
It also gives you a chance to test your backup procedures.