Ask Slashdot: Art, Linux and the Slashdot Effect?
patSPLAT submitted this artful submission: "I'm asking Slashdot: What kind of box does Linux need to handle the Slashdot Effect? I'm an artist, and I'm working on a sculpture which will be self-documenting with a running server/webcam. Since the server will be a part of the piece, I don't want to spend more than I need. I do want it to be able to handle a heavy load if my piece is well received. I'm planning on getting a 10/100 Ethernet card, but I'm wondering about processor and memory. Could I get away with an older Pentium? Would a Celeron running in console mode do the trick? 64MB? 128? What do you think I could get away with? The website on the piece would be no larger than 5 megabytes, and the webcam would obviously require some resources. I'm not sure how much the webcam would take yet, so give me the minimum and I'll go up a step to account for the webcam. "
Re:Slashdot effect killer (Score:1)
And drop lots of packets... (Score:1)
dynamic content (Score:1)
Tuning, tuning, tuning! (Score:1)
Re:RAM fragmentation (Score:1)
Re:I get slashdotted all the time. (Score:1)
ramdisk and cache (Score:1)
I've heard (though not benchmarked) that a ramdisk will swap out, allowing other system processes (or cache) to utilize memory, and that access through a swapfile is more efficient than access through a cache. I'd be interested in hearing of or seeing benchmark results comparing ramdisk performance with cache hits on otherwise identically configured boxes: same memory, same OS, same load. The discussion I heard was general (other Unices, NT, etc.), not specific to Linux, so YMMV.
Any takers?
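Nobody posted numbers, but a rough benchmark along the lines described could start from a sketch like this. It assumes Python 3 and two hypothetical mount points, `/mnt/ram` for a ramdisk and `/var/tmp` for an ordinary disk-backed directory; `make_test_file` and `read_throughput` are illustrative names. Because each file is read once before timing, the disk-backed figure measures page-cache hits, which is the comparison being asked about. It times a single sequential reader, not a web server under concurrent load, so treat it as a starting point only.

```python
import os
import tempfile
import time


def make_test_file(directory, size_mb=5):
    """Create a file of random bytes in `directory` and return its path."""
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_mb * 1024 * 1024))
    return path


def read_throughput(path, repeats=5, chunk_size=64 * 1024):
    """Read `path` start to finish `repeats` times and return MB/s.

    One untimed warm-up pass comes first, so for a file on a normal disk
    the timed passes are served from the page cache.
    """
    with open(path, "rb") as f:  # warm-up pass, not timed
        while f.read(chunk_size):
            pass
    size = os.path.getsize(path)
    start = time.perf_counter()
    for _ in range(repeats):
        with open(path, "rb") as f:
            while f.read(chunk_size):
                pass
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size * repeats / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # HYPOTHETICAL mount points: a ramdisk and an ordinary disk directory.
    for label, directory in (("ramdisk", "/mnt/ram"), ("disk", "/var/tmp")):
        path = make_test_file(directory)
        try:
            print(f"{label}: {read_throughput(path):.0f} MB/s")
        finally:
            os.remove(path)
```

The 5MB default matches the size of the site described in the question; run both cases on the same box, under the same load, for the comparison to mean anything.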
fhttpd... (Score:1)
Re:Bandwidth (Score:1)
Tuning for load (Score:1)
If the load gets even higher, split your server. Get your images from another machine, possibly with a special web server optimized for static data (phttpd [signum.se]). If you are using an SQL backend, put the database on a dedicated machine.
And always remember: In servers, memory is more important than I/O. I/O is more important than CPU.
Bandwidth (Score:1)
Re:Let's look at how to do this systematically :) (Score:1)
Apache does NOT fork for each connection! (Score:1)
This is untrue, unless you are using CGI scripts for dynamic content. Webservers that fork for each connection went out of mainstream use about 5 years ago. Apache uses a pre-forking model, meaning it keeps a "herd" of separate processes, each of which handles one connection at a time. Multi-threaded servers are more efficient, but the pre-forking model is optimal for stability.
While there are other webservers known to be faster for static content (Zeus comes to mind), I don't know where Apache stands for dynamic content. It probably depends on what language you are using: mod_php4 is supposed to be very fast, and of course if you write a custom module in C you can make it as fast as it needs to be.
It should be noted that Zope can be used with several servers; Bruce Perens in fact uses it with Apache. It is not considered especially quick; people choose it for its rich functionality and flexibility.
Philip Greenspun is rapturous in his praise of AOLServer; but then, he thinks we should be using dynamic content for everything, and I've never heard of anyone else who actually uses it (apart from AOL, obviously). Bear in mind that AOLServer is as tied to TCL as Zope is to Python.
No dynamic content. (Score:1)
Many people are telling you that dynamic content matters a lot too. This is less relevant for your application. Slashdot has lots of dynamic content: every user can customize their settings and therefore needs a different page.
If you have a piece of artwork that is photographed by a webcam, you could grab a fresh snapshot for every "client". Don't do this; it will bog the server down tremendously. Just have a program take a picture every 10 to 60 seconds. Then you have almost completely static content, and you can serve LOTS of pages using a fairly conservative setup.
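A minimal sketch of the "one picture every 10 to 60 seconds" approach, assuming Python 3 on a POSIX system. `capture_frame()` is a hypothetical placeholder for whatever program actually talks to your camera; the interesting parts are the fixed-interval loop and the write-then-rename step, which means the httpd never serves a half-written image no matter how many visitors arrive.

```python
import os
import tempfile
import time


def capture_frame():
    """HYPOTHETICAL stand-in for the real webcam grab; swap in your camera's tool."""
    return b"\xff\xd8" + time.strftime("%H:%M:%S").encode() + b"\xff\xd9"


def publish_atomically(data, dest):
    """Write `data` to a temp file beside `dest`, then rename it over `dest`.

    os.replace() is atomic on POSIX when both paths are on the same
    filesystem, so a visitor never downloads a half-written image.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; the httpd user must be able to read it
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)
        raise


def run(dest, interval_s=30.0, iterations=None):
    """Grab one frame every `interval_s` seconds, however many visitors there are."""
    done = 0
    while iterations is None or done < iterations:
        publish_atomically(capture_frame(), dest)
        done += 1
        time.sleep(interval_s)
```

Started as, say, `run("/home/httpd/html/cam.jpg", interval_s=30)` (the path is an assumption), the camera is touched twice a minute whether ten or ten thousand people are looking, and the web server only ever sees one static file.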
Roger.
Re:Confused, but here are my interpretations.. (Score:1)
Re:Let's look at how to do this systematically :) (Score:1)
Nice explanation though.
Re:Your Bandwidth is what counts (Score:1)
Re:What's so artful about this? (Score:1)
Main Entry: artful
Pronunciation: 'ärt-f&l
Function: adjective
Date: 1615
1 : performed with or showing art or skill (an artful performance on the violin)
2 a : using or characterized by art and skill : DEXTEROUS (an artful prose stylist) b : adroit in attaining an end often by insinuating or indirect means : WILY (an artful cross-examiner)
3 : ARTIFICIAL (trim walks and artful bowers -- William Wordsworth)
synonym see SLY
- artfully /-f&-lE/ adverb
- artfulness noun
So, it doesn't have to mean "dishonest".
Re:WHAT'RE YOU WORRYING ABOUT??? (Score:1)
Re:Cannot compute. Not enough data. (Score:1)
Resisting the slashdot effect (Score:1)
answer: don't use your own box.. (Score:1)
and just update your webcam image when it needs updating.
Of course, then you get to pay for all that bandwidth. Not a happy day when THAT bill comes in.
Also, if most of your content is static, use Squid or some other transparent caching proxy.
Re: Tell that to ThinkGeek (Score:1)
Re:What's so artful about this? (Score:1)
;)
Yes, but... (Score:1)
What if everything on your disk takes up 50MB of space, and your OS uses only 50MB of RAM? On a 128MB box, why not set up a 50MB RAM disk and still have 28MB of RAM to spare? That way everything is always in RAM, and you never have to worry about it. Heck, throw in another old, cheap 32MB DIMM, and you'll go up to 60MB of spare RAM!
I think I could fit a minimal Linux-based web server in 50MB... That is, if all you want is a web server and don't care about anything else, which is what it sounds like is required.
Re:WHAT'RE YOU WORRYING ABOUT??? (Score:1)
Re:ramdisk and cache (Score:1)
Is this limit a compile time, boot time or run time variable that I can change?
Re:Cannot compute. Not enough data. (Score:1)
I based my recommendations on how much traffic I thought a piece of art might attract, which is probably quite a bit at first since it's a new idea, but not a lot of repeat visits, I'd guess. I mean, you don't read your horoscope in da Vinci's armpit.
Either way, I'd like the piece in my living room.
You've invented the free version of Akamai (Score:1)
I wonder if someone could write a freebie script that does the same thing using free web space like GeoCities or Xoom. The script could automatically create accounts on those sites and shed the load for serving static content to these free servers. >:-) I bet GeoCities would soon figure out a way to block this behaviour, though, but it still might be fun.
Bandwidth Bandwidth Bandwidth (Score:1)
Just about any Unix with Apache and a sane config can keep up with whatever the bandwidth delivers. A T1 is not sufficient. I'm assuming you will have lots of graphics (it's an art site), and you would pass 3GB in less than 12 hours. Also, if bandwidth is limiting transfers, then each transfer takes longer and you run up to the maximum number of httpds sooner. In fact, a 486 with 32MB of RAM and an IDE drive would probably do fine on an OC3, but on a T1 you can forget it.
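The reasoning about running out of httpds can be made concrete with Little's law (busy servers = arrival rate x time each request is held). The sketch below is an illustration, not a capacity plan: the 50 KB page and 40 kbit/s per-visitor rate in the example are assumptions, and the function names are made up for this example.

```python
def transfer_seconds(page_bytes, client_bps):
    """How long one visitor ties up an httpd while downloading a page."""
    return page_bytes * 8 / client_bps


def busy_httpds(hits_per_s, page_bytes, client_bps):
    """Little's law: average busy servers = arrival rate x time each request is held."""
    return hits_per_s * transfer_seconds(page_bytes, client_bps)


def link_hits_ceiling(link_bps, page_bytes):
    """Most pages per second the uplink can carry when it is 100% busy."""
    return link_bps / (page_bytes * 8)


if __name__ == "__main__":
    page = 50 * 1024   # ASSUMPTION: a 50 KB image-heavy page
    t1 = 1_544_000     # T1 line rate, bits/s
    modem = 40_000     # ASSUMPTION: ~40 kbit/s effective per dial-up visitor
    print(f"T1 ceiling: {link_hits_ceiling(t1, page):.1f} pages/s")
    print(f"busy httpds at 3 hits/s: {busy_httpds(3, page, modem):.0f}")
```

When demand exceeds `link_hits_ceiling`, each visitor gets a smaller share of the pipe, `transfer_seconds` grows, and the number of busy httpds keeps climbing until Apache's `MaxClients` cap is reached, which is the failure mode described in the post.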
Dual Celeron MBs (K-6? Pentium?) (Score:1)
Link to Abit? (Score:1)
Re:just get an account with a websever & upload (Score:1)
-Chris
Re:just get an account with a websever & upload (Score:1)
Re:From our experience.... (Score:1)
In other words, you only use apache for local hits taken by a local proxy. The proxy then transfers the data to the client. This way, the amount of time apache spends working on a hit is lowered, giving more clients/sec to apache.
Anyone else know more about this?
Re:Confused, but here are my interpretations.. (Score:1)
Confused, but here are my interpretations.. (Score:1)
If you are doing some sort of server-side DHTML (someone reply and tell me what mod_perl, CGI, PHP, Thunderstone and ColdFusion all classify as), double the memory and make it a PII 266 (or whatever the slowest models are). If there are a lot of these programs, double the RAM again and increase the disk cache.
Now, for TCP/IP stuff, set your MRU/MTU high. Since you are handling large chunks of data, using less than the maximum won't pay. Think of it: would you rather send 5,000 2-byte packets, each with its own per-packet overhead, or two 5KB packets?
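The per-packet overhead argument is easy to put numbers on. This sketch assumes a 40-byte TCP/IP header (20-byte IPv4 plus 20-byte TCP, no options) and ignores link-layer framing; the helper names are hypothetical. Note that standard Ethernet caps the MTU at 1500 bytes, so the largest TCP payload per packet there is 1460 bytes rather than 5KB.

```python
import math

HEADER_BYTES = 40  # ASSUMPTION: 20-byte IPv4 + 20-byte TCP header, no options


def packets_needed(payload_bytes, segment_payload):
    """Number of packets needed to carry `payload_bytes` in chunks of `segment_payload`."""
    return math.ceil(payload_bytes / segment_payload)


def wire_efficiency(payload_bytes, segment_payload):
    """Fraction of the bytes put on the wire that are actual data."""
    n = packets_needed(payload_bytes, segment_payload)
    return payload_bytes / (payload_bytes + n * HEADER_BYTES)


if __name__ == "__main__":
    for seg in (2, 536, 1460):  # tiny, classic default MSS, Ethernet-sized
        print(f"{seg:>5}-byte segments: {packets_needed(10_000, seg):>5} packets, "
              f"{wire_efficiency(10_000, seg):.1%} efficient")
```

With 2-byte segments, 10,000 bytes of data drag 200,000 bytes of headers along with them; with Ethernet-sized segments the headers are under 3% of the traffic.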
Re:String Cheese and Hot Biscuit (Score:1)
Practice this long enough, you don't need to actually think about it.
Off topic, but it's something people should do...
Re:Simple...Celeron+128Megs (Score:1)
url (Score:1)
Re:Layer 4 switching. (Score:1)
Re:Don't forget bandwidth in this equation (Score:1)
Speaking of Linux servers... (Score:1)
I have an AMD K6-2/300 running through dual-ISDN (the router uses DHCP to assign addresses, set to permanent, of course!) and a Red Hat 6 CD.
What I want to do is set up Your Average Server (i.e. SMTP, POP3, FTP, HTTP).
I read through my copies of the NAG and SAG and read almost all the HOWTOs that pertain to networking.
I gave up on sendmail after a week. Was that a configuration file or line noise? There was another alternative suggested in the HOWTOs, but it would never run. The best I got from the mail subsystem was sending mail out (sometimes); any mail sent in was happily received by whatever program and subsequently disappeared into the ether. ftpd doesn't seem to allow permissions based on user (i.e. allow upload but not delete for user X).
httpd (Apache) appears to work out of the box, but ALL network access to the box is flaky (sometimes it is lightning fast, sometimes it is pathetically slow, down to 300 cps, even from an adjacent machine! I am using a recent PCI Realtek 10baseT card.)
The configuration options for most Linux programs appear to be arbitrary, almost deliberately cryptic, and stored in the most unlikely of places.
Isn't there some way to configure a linux box that doesn't take more than a day? a week? (I gave up after 2).
Re: sleeping with the fans on (Score:1)
Re:just get an account with a websever & upload (Score:1)
Re:the slashdot effect (and starwars effect) (Score:1)
That doesn't really solve the problem. Sure, your box still gives you a command prompt, but web surfers out there still can't get to your site because there are no available httpds.
I believe the issue we're dealing with is how to let a heck of a lot of people come in and use the site, not simply get stopped at the door.
- Scott
------
Scott Stevenson
Don't forget NR_TASKS! (Score:1)
In
#define NR_TASKS 512
For the life of me, I cannot figure out why this is set at 512. Recompiling the kernel can be a really aggravating experience for those who come from a background of not having to recompile kernels, so this is just another thing that makes Linux unnecessarily difficult. Ideally, the installer would prompt you for this number and build a kernel based on your requirements.
- Scott
------
Scott Stevenson
Bandwidth... (Score:1)
This can be useful. If you don't want your machine dying from overload, purposely putting a bandwidth throttle on it protects the machine while denying some people access when things are busy. This is bad for an e-commerce site, but for your purposes, that may not be an issue.
If you are trying to be up all the time, regardless of load, you will have to have a pipe to the Internet that can match what your server can put out. If you've only got an ISDN line, forget about a dual Alpha setup; you'll never get close to slashdotting the box.
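A bandwidth throttle of the kind described is commonly built as a token bucket. The class below is a minimal sketch (the names are hypothetical, not taken from any particular server module): tokens refill at `rate` bytes per second up to `capacity`, and a response is sent only if enough tokens are available; otherwise the caller can return a short "busy" page instead.

```python
import time


class TokenBucket:
    """Allow an average of `rate` bytes/s, with bursts of up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity       # start full so the first burst goes through
        self.clock = clock           # injectable for testing
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, nbytes):
        """Spend tokens for `nbytes` if available; return False to refuse the response."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

For example, `TokenBucket(rate=150_000, capacity=300_000)` would hold the average near 150 KB/s (an arbitrary figure) while tolerating short bursts; the machine stays responsive and the overflow visitors get turned away.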
Re:Ramdisk versus cached disk... My experience. (Score:1)
Re:Apache does NOT fork for each connection! (Score:1)
This is only true with an inappropriate configuration. If you want, you can even tell Apache never to fork another child after the initial start-up (except for CGIs, of course, which were not the subject of the discussion).
More practically, you just raise the limit on how many requests a child is allowed to answer before it dies.
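To see why the per-child request limit matters, here is a toy model in Python. No real processes are forked; `PreforkPool` only counts how many forks a pre-forked pool would perform. `max_requests_per_child` plays the role of Apache 1.3's `MaxRequestsPerChild` directive, with 0 meaning "never recycle"; the class and its names are illustrative assumptions, not Apache's code.

```python
from collections import deque


class PreforkPool:
    """Toy model of a pre-forked server pool; counts forks instead of forking."""

    def __init__(self, children, max_requests_per_child):
        self.max_requests = max_requests_per_child  # 0 = never recycle a child
        self.forks = 0
        self.idle = deque()  # requests answered so far by each waiting child
        for _ in range(children):
            self._spawn()

    def _spawn(self):
        self.forks += 1
        self.idle.append(0)  # a fresh child that has answered nothing yet

    def handle(self, n_requests):
        """Give each request to the next waiting child; return total forks so far."""
        for _ in range(n_requests):
            served = self.idle.popleft() + 1  # that child answers one request
            if self.max_requests and served >= self.max_requests:
                self._spawn()  # the child exits and the parent forks a replacement
            else:
                self.idle.append(served)  # back into the herd
        return self.forks
```

With 10 children and 1000 requests, a limit of 0 costs 10 forks in total, a limit of 30 costs 40, and a limit of 1 degenerates into fork-per-connection at 1010 forks.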
article.. (Score:1)
WebCam drivers? (Score:1)
I have seen references to an open standard set of drivers, but I've long since lost the link and don't know the progress... Anyone?
Re:String Cheese and Hot Biscuit (Score:1)
Simple...Celeron+128Megs (Score:2)
large # apache spare processes (Score:2)
Layer 4 switching. (Score:2)
OVERKILL (Score:2)
He's serving
a) 5 megs of what will be cached, static pages
b) static webcam images, undoubtedly well under 1 meg.
That's *it*.
Period.
If he spends more than $150 on this, he's wasted money.
His total bandwidth is 100 Mbit/s. That means he can push out at most ~12 megabytes per second. Are you trying to tell me that a low-end Pentium can't shove 12 megs *in RAM* out per second? Come on, a low-end Pentium could do half that from a fast SCSI disk, let alone from RAM. Your main concern is saving bandwidth: shut off reverse DNS, for one. As for the system, turn off unnecessary logging, reduce the max number of forks, etc. to minimize disk accesses and limit how much memory Apache will eat up. CPU isn't a problem for static content in this day and age, even for slashdotted pages.
- Rei
Re:How about FreeBSD? (Score:2)
Think Software (Score:2)
Secondly, consider the BSDs. (Moderators: this is not flamebait. I have a valid point). Their TCP/IP implementation is a good deal faster than the Linux equivalent, and while the Linux stack is maxing out on concurrent TCP/IP connections (which is a possibility, especially with lots of images, etc. on your site) BSD will keep on chugging. I'm not sure how much of an issue this will be here, though. I think for the most part, unless you're Yahoo.com, you should be okay with Linux. But hey, you're the judge.
Finally, be sure to think outside the box when it comes to HTTP servers. There are other servers besides Apache, believe it or not. And in your case, there are ones that are a lot more optimized than Apache for serving up static content (I think it's static, save the webcam. You didn't really say). thttpd [acme.com] and Zeus [zeus.co.uk] (it's not free, shoot me) come to mind.
Re:Ramdisk versus cached disk... My experience. (Score:2)
I believe this is an incorrect statement under Linux. If I recall correctly, Linux' ramdisk driver just allocates pages for the ramdisk's filesystem in the disk-cache and just asks that they never be purged. That way, there aren't two copies of the same data in RAM, and you don't have to copy data back and forth between the ramdisk and the cache.
I imagine that the reason you saw better performance with those bdflush parameters is that more of the OS got cached into memory, fewer things got edged to swap, and logs that weren't on the ramdisk before (eg. everything in /var/adm) are benefitting from the huge flush period. I personally would not recommend going 10 minutes between flushes though. What happens, for instance, when a HD starts getting flaky?
--Joe--
Re:Confused, but here are my interpretations.. (Score:2)
As for mod_perl, CGI, PHP and the like, they classify as "server-side programming", not DHTML (that's HTML + client-side scripting).
Roxen Challenger (Score:2)
Re:Your Bandwidth is what counts (Score:2)
I'm sitting here trying to think of a sculpture medium that wouldn't be particularly harsh towards functioning electronics that you wish to stay functioning throughout the process.
You certainly can't use a torch for anything, and an arc welder would fry every component in the computer in about 1/13th of a second. With papier-mâché you'd have to be very careful not to drip any goo inside the vents. Clay would similarly be bad juju.
Epoxy-based putties would work all right, as long as you got it right the first time. Flying debris from grinding or chiseling it down could be very bad for the machine, not to mention the vibration problems.
For the same reasons, if you wanted to use wood you'd have to cut it well away from the computer and finish it before attaching it.
As an artist, I'm intrigued. Conceptual stuff isn't my bag, so I'd never consider putting a live computer inside something I was working on (more likely a dead one, or I'd save the live system as the last piece of the puzzle), but I'm finally getting around to having a studio to do stuff in, and I wonder how well a computer fits into the game.
You know, I mean, stuff tends to get pretty messy. And live computers aren't the sort of thing that mix well with fits of inspiration that involve picking something up and slamming it back down in a different position. Even excepting a spinning harddrive and fans, cables and cards tend to get knocked loose, sparks fly, etc.
If the computer survives the sculpture, that's art in itself. Go for it. Lemme know how it works out.
Cannot compute. Not enough data. (Score:2)
Your question is *way* too generic. "I want a car. What should I buy?"
In your case, it depends on what kind of pages you are about to serve.
You haven't mentioned if your pages are static or dynamic. Dynamic means that they are created "on the fly", e.g. using content from a database. Slashdot itself is a dynamic site. And even then, there are lots of differences - some database engines require more hardware than others, some technologies for dynamic pages require more processing power per page hit than others etc. etc.
Reading your question, it seems that your site is made of static pages only. In that case, you do not need very much processing power and an older CPU will do.
With webcams, that again is something that completely depends on what kind of camera hardware you are about to use. Some of them require a lot of help by your web server, but most of them don't. You don't say what kind of camera you have, so again, no definite answer is possible.
Once you actually know what you will do, feel free to mail me. THEN I could try to help you...
Tell that to ThinkGeek (Score:2)
Of course, the original question is just stated wrong. It's not the hardware config that matters so much, it's the software config, and especially how you serve up dynamic content. And most places that get
Bandwidth (Score:2)
Avoiding the Slashdot Effect (Score:2)
If the server is reaching saturation, it should enable a log keeping track of WHERE the traffic is coming from (as in which site it is being referred from). If it were to do this, it could stop serving as many pages to Slashdot folks while leaving the site up for other people. It could also display an error message such as "This page has been swamped. Please try again later."
What do you think?
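One way to implement the idea is to key on the HTTP `Referer` header, which is how a server usually learns which page sent a visitor. This Python sketch is hypothetical: `should_serve`, the `soft_limit` threshold and the `swamped_hosts` list are made-up names, and how you count `active_connections` depends on your server. Browsers may omit or alter `Referer`, so treat this as a load-shedding heuristic, not access control.

```python
from urllib.parse import urlsplit

BUSY_PAGE = b"<html><body>This page has been swamped. Please try again later.</body></html>"


def referring_host(headers):
    """Host name from the Referer header, or '' if there isn't one."""
    return (urlsplit(headers.get("Referer", "")).hostname or "").lower()


def should_serve(headers, active_connections, soft_limit, swamped_hosts=("slashdot.org",)):
    """Serve everyone below `soft_limit`; above it, turn away visitors
    referred by one of `swamped_hosts` (or any of their subdomains)."""
    if active_connections < soft_limit:
        return True
    host = referring_host(headers)
    return not any(host == h or host.endswith("." + h) for h in swamped_hosts)
```

When `should_serve` returns False, the server would send the short `BUSY_PAGE` instead of the full page, so the regular community keeps its site while the flood is shed.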
How do I implement a "Layer 4 switch"? (Score:2)
Web servers (Score:2)
If you decide to mingle databases and dynamic pages (either in perl or php), I'd pump up the ram to 128 and give it a little more processing juice for good measure. A well tuned apache can be made to not throw up when there's LOTS of requests (ie slashdot), but I'm guessing it'll probably end up puking if it has to produce a page each request.
There's a workaround, though. You can write a set of Perl scripts, running as a daemon, that rebuild a static web site every 15 minutes (or whatever interval). That way, you avoid building pages for every single request. (Correct me if I'm wrong, but I think that's how Slashdot is built... Rob???)
In *all* likelihood, though, I wouldn't worry about it. Let's face it, if you *do* get slashdotted (which is likely), it won't be every day (which is certain). It *will* force you to configure your server well, though. In your place I'd just go with static HTML, or dynamic pages with Perl or PHP3.
Of course, then there's the webcam to take into consideration, but to be honest, I haven't set one up, so I'll leave that to someone who has.
And that's my 2 cents
Tuning your webserver (Score:2)
http://evolt.org/index.cfm?menu=8&cid=193&catid=18 [evolt.org].
.djc.
Re:ramdisk and cache (Score:2)
append="ramdisk=65536"
To your
lilo: linux ramdisk=65536
That will give you 64MB RAM disks. You can change it to whatever size you need. I don't believe it even needs to divide evenly into your total RAM. That simply sets the maximum size of the RAM disks on your box.
The usual disclaimers apply, your mileage may vary, etc., but it's always worked for me.
Watch your back! (Score:2)
I think that your backend (if you use dynamic content) is just as important as what machine/webserver software.
If you are using some kind of database, think twice and test carefully. MySQL is often used, although methinks the SQL implementation in MySQL is _bad_ (subselects, anyone?).
A single badly written CGI script can also bring down an otherwise good server. (Trust me!)
mod_perl can speed things up if you use Perl a lot, but it also puts some special requirements on your Perl scripts.
/. setup is not the best way of doing things. (Score:2)
The simple, fundamental mistake that always seems to be made is assuming there has to be a central resource. The problem with a central resource is that it becomes a single point of failure, whether that is
a) its database
b) its switch
c) its router
d) its link to the Internet
e) its upstream ISP having a BGP problem
etc. etc. etc.
The only way a system as popular as Slashdot can maintain the availability it deserves and requires is with a fully distributed system.
There are better ways.
noidd
Don't forget bandwidth in this equation (Score:2)
Re:Linux servers with dynamic content (Score:2)
It was written in PHP, and I'm more than happy with the performance. In fact (I'm not the author of much of the code itself), I'm more often than not very impressed with the speed at which it operates. I am looking forward to being able to move it over to Zend; this should increase our capacity even more, without any hardware additions.
RAM fragmentation (Score:2)
Re:Simple...Celeron+128Megs (Score:2)
So, bottom line, you shouldn't ever need to do a RAMdisk on linux, except under special circumstances, such as booting or installing.
Rules of Thumb (Score:2)
1 7200 RPM SCSI disk per 75 hits per sec. Make sure to have a big enough SCSI bus (Wide/Ultra, etc.) to handle the number of drives you use.
The Linux 2.2 kernel is much more efficient than the 2.0 kernel.
BSD and Solaris have more efficient memory paging algorithms. This is only an issue if you are serving more data than will comfortably fit in RAM, which you aren't.
As others have said, your connection is probably the weak link. Calculate your average page size and multiply by the number of page hits per second to figure out how much bandwidth you need.
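These rules of thumb reduce to a little arithmetic. The sketch below encodes the "page size times hits per second" bandwidth rule and the "one 7200 RPM SCSI disk per 75 hits per second" rule as stated above; the link speeds are nominal line rates, and the function names are hypothetical. The disk rule only bites if your data does not fit in RAM.

```python
import math

LINK_BPS = {  # nominal line rates in bits per second
    "ISDN (2B)": 128_000,
    "T1": 1_544_000,
    "10baseT": 10_000_000,
    "100baseT": 100_000_000,
}


def required_bps(avg_page_bytes, hits_per_s):
    """Average page size x page hits per second, converted to bits per second."""
    return avg_page_bytes * 8 * hits_per_s


def scsi_disks_needed(hits_per_s, hits_per_disk=75):
    """One 7200 RPM SCSI disk per 75 hits/s, rounded up (at least one)."""
    return max(1, math.ceil(hits_per_s / hits_per_disk))


def links_that_fit(avg_page_bytes, hits_per_s):
    """Names of the links whose nominal rate covers the required bandwidth."""
    need = required_bps(avg_page_bytes, hits_per_s)
    return [name for name, bps in LINK_BPS.items() if bps >= need]
```

For a hypothetical 30 KB average page at 20 hits/s, `required_bps` comes to 4.8 Mbit/s, so `links_that_fit` rules out ISDN and a T1 and leaves only the Ethernet rates, while `scsi_disks_needed` says a single disk is plenty.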
-Loopy
How about FreeBSD? (Score:2)
Anyone here have any reasons why Linux would be better than FreeBSD in this situation? After all, Yahoo (probably the world's most visited site) uses FreeBSD and seems to really like it.
...
*rolls eyes* (Score:2)
An old Pentium can serve enough static pages to saturate your bandwidth. For that matter, an old *Macintosh* can serve enough pages to saturate your bandwidth.
The major thing will be all the side processing that you do to generate the pages and content: in this case, his webcam, and probably dynamic generation of archive pages and the like (although a better idea would be to regenerate all the active pages once, namely your last archive page, the index of the archive pages, and the new cam pic page, when the new cam pic comes up).
Especially for a high-traffic site, doing it once and then serving from the filesystem will be much more important.
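A sketch of the "regenerate once, then serve from the filesystem" approach, assuming Python 3: `on_new_snapshot` is called once per new cam picture and rewrites only the live page, the newest archive page, and the archive index. The file names, `PAGE_SIZE`, and the HTML layout are all hypothetical; every other request is then a plain static file read.

```python
import html
import os
import tempfile

PAGE_SIZE = 20  # HYPOTHETICAL: snapshots per archive page


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over `path`."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # readable by the web server user
    os.replace(tmp, path)


def render(title, body):
    return f"<html><head><title>{html.escape(title)}</title></head><body>{body}</body></html>\n"


def img_tags(names):
    return "".join(f'<img src="{html.escape(n)}" alt="{html.escape(n)}">' for n in names)


def on_new_snapshot(site_dir, image_name, archive):
    """Run once per new cam picture; rebuilds only the three pages that change."""
    archive.append(image_name)
    last = (len(archive) - 1) // PAGE_SIZE
    # 1. The live cam page shows just the newest frame.
    write_atomically(os.path.join(site_dir, "cam.html"), render("Live", img_tags([image_name])))
    # 2. Only the newest archive page changes; older ones are already complete.
    write_atomically(os.path.join(site_dir, f"archive-{last}.html"),
                     render(f"Archive {last}", img_tags(archive[last * PAGE_SIZE:])))
    # 3. The archive index links every archive page.
    links = "".join(f'<a href="archive-{k}.html">Page {k}</a> ' for k in range(last + 1))
    write_atomically(os.path.join(site_dir, "archive.html"), render("Archive", links))
```

Page generation now costs three small file writes per snapshot instead of one render per visitor.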
As for your analysis: you forgot the biggest server speed-up, RAID. Multiple disks on multiple controllers. A single controller and a single disk like you suggested, no matter how fast, will always pale next to this relatively low-cost solution (and, for that matter, his data will be much safer, too).
AolServer or Zope (Score:2)
This isn't a knock against the apache group - they made a great webserver. But their emphasis has been on modularity and extensibility. The great drawback of apache is that it forks for each new connection - this can eat up a lot of RAM very quickly.
I would think that a non-forking webserver, such as AOLServer or Zope, would serve you better. Perhaps AOLServer more than Zope, as Zope has to interpret a lot of python on execution.
As endorsements go, Bruce Perens runs Zope for his site (I'm not sure how much traffic it gets, but it's been mentioned on Slashdot at least half a dozen times and should have taken the slashdotting to end all slashdottings by now). Philip Greenspun, the author of Database-Backed Web Sites and Philip and Alex's Guide to Web Publishing (not to mention the brain behind ArsDigita and hence scads of corporate sites), uses AOLServer.
Re:Tuning, tuning, tuning! (Score:2)
Pretty small footprint, apart from the (shared) SSL libs.
Re:Simple ... run NT and IIS (Score:2)
www.linuxplanet.com is running Apache/1.3.3 (Unix) PHP/3.0.7 AuthMySQL/2.20 on Solaris
www.linuxgeneralstore.com is running Apache/1.3.6 (Unix) mod_frontpage/3.0.4.3 on DIGITAL UNIX
I do realize that these sites are in the minority, but it just goes to show you that linux is not the end all be all of OS's. OS's are like masturbation, everybody has their own way of doing things.
Re:Avoiding the Slashdot Effect (Score:2)
How would you know which people were
Yes you would. That is the definition of the Slashdot Effect, everyone seeing a link to a server on Slashdot and clicking on it at the same time (roughly). You can tell which are
Maybe any browser running Netscape Linux or Mozilla gets denied? That won't win you many friends around here.
No, it's not based on the browser but the IP address of the last page that was served up.
Besides, what's the good of denying
You are missing the point. Many web sites have a community or user group that relies on them as a valuable resource. The 'invasion' by thousands of strangers killing their web site could be mistaken for a denial-of-service attack. The previous author is trying to reduce the Slashdot Effect to a manageable level.
Phillip.
Re:Spread the traffic around... (Score:2)
Each link to an image is actually a link to another HTML file (I don't know how this works when the link says "/something.jpg"). It's very frustrating when you are trying to "Save Target As..." since you end up with an HTML file and inside is a link to a JPG that has already been moved. You have to basically load the HTML page and then right-click on the image itself to save it.
What's the problem with just linking to the HTML files? Well... that's where the frame+banner+ad trickery all happens. So yes... you could link to a bunch of images on free website providers as long as you don't mind the fact that clicking on one of these links would spawn a new window or frame with some ad content.
Now... I don't like GeoCities because it spawns a new window. I find it much more "polite" if they tuck the ad content in another frame. Why? Because I know the code to "break" the frame and just give me a plain, unadulterated page that looks identical to part of my site.
If you'd like to see a GREAT example of how you can build a COMPLETE site out of nothing but free website providers (with hardly an ad anywhere!), check out...
http://mangaheaven.cjb.net/
(Naughty anime alert.) If it weren't for the host name, you'd never know this entire collection of pages was run completely on the good graces of providers like Xoom and Tripod. Notice how the main page on cjb.net instantly hands off traffic to ten other websites, so that if one site goes down, it only takes a day or two for the site owner to mirror the collection to a new host.
- JoeShmoe
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Spread the traffic around... (Score:2)
You might consider putting some of your content on any number of the free webpage providers like GeoCities or Xoom. It's not really classy enough for commercial sites, but they are great for defraying some traffic from your primary site. This may be just what a starving artist needs...?
Give basic information and/or samples on the free site and then if people are interested, they can click-through to your primary site. It's also a great way to tell people about mirrors (if Link A doesn't work, try Link B)
Generally speaking...unless you are reaching abuse levels with MB transferred...most webpage providers could even handle link attention of slashdot proportions.
Some providers like Tripod and Web1000 (porn banner alert) already spread your content over several servers to keep other people from "deep-linking" one particular file...handy if you want people to read your statement and not just download your images.
- JoeShmoe
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Requirements (Score:3)
The web daemon will be reading files from cache repeatedly, not building content on the fly, so a 233 may be overkill.
If you're broadcasting a "live" (1+ second refresh) show, you can improve efficiency a bit more by encoding the JPGs on another system (even a win95 box) and ftp-ing them to the web server automatically. This is how most adult streaming sites work.
j
I get slashdotted all the time. (Score:3)
If you're serving video or audio or images, though, you might need a faster net connection - do the math regarding bandwidth-per-user and how many you can support.
Thanks
Bruce
Re:Tuning, tuning, tuning! (Score:3)
More performance info can be found here. [acme.com]
the slashdot effect (and starwars effect) (Score:3)
For fun, turn any SYN flood program (i.e. portfuck) loose on port 80 and bang away. Make a bazillion servers TRY to start up and see how your box responds. Simple as that.
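A SYN flood never completes the TCP handshake, so it mostly exercises the kernel's queue of half-open connections rather than making Apache start servers. A gentler way to watch httpds pile up is to issue real concurrent requests. The sketch below uses only the Python standard library; `load_test` is a hypothetical helper, and it should only be pointed at a machine you run yourself.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Connect straight to the server under test, ignoring any proxy env vars.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout=10.0):
    """Return (succeeded, seconds) for one GET of `url`."""
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()
            ok = resp.status == 200
    except (OSError, http.client.HTTPException):
        # refused/reset connections, timeouts, HTTP error codes, bad replies
        ok = False
    return ok, time.perf_counter() - start


def load_test(url, total, concurrency):
    """Send `total` GETs to `url` with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url), range(total)))
    times = sorted(t for ok, t in results if ok)
    return {
        "ok": len(times),
        "failed": total - len(times),
        "median_s": times[len(times) // 2] if times else None,  # upper median
        "worst_s": times[-1] if times else None,
    }
```

For example, running `load_test("http://127.0.0.1/", total=500, concurrency=50)` while watching `ps` on the server shows how many httpds start and at what point requests begin to fail.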
oh yeah, my stats: PII/266, 64 MB ram, OC-3 connection in. my max instances of httpd? around 50.
jose nazario
Re:Let's look at how to do this systematically :) (Score:3)
This is all really, really bad advice, imnsho. You ignore bandwidth, recommend 2 boxes instead of actually getting the most out of a single box, and even go so far as to insist on an alpha or athlon?
Too much infoweek!
--
Blue
Let's look at how to do this systematically :) (Score:3)
1. Look at the processor/motherboard that the server has. You will be handling a large amount of requests. Therefore, you will want to get an Athlon or Compaq Alpha. Both of these models handle multitasking OS's better than the Intel chip. I prefer the Alpha. You will want the fastest memory you can buy, with the best configuration. An Alpha motherboard and chip will give you this.
2. Your disk subsystems are also very important. You will need to maximize bandwidth between the motherboard and the disks. An Adaptec U2W SCSI controller will help you here. I also recommend the Seagate Cheetah series of hard drives. A 9GB drive will not cost you that much, and will have plenty of performance benefits.
2.5. Your networking subsystem. Make sure your network card is directly supported by Linux and has a good chipset. 3Com Fast Etherlink XL PCI cards are my favorite choice here because of their Parallel Tasking chipset, and because installation of them under Red Hat 6 is a snap. They even work well with NT, which you do not want to run a slashdotted site on unless you want to run a Compaq Proliant 8500 and spend as much on it as you would a house.
3. Make sure your Linux installation has a large enough swap partition, and don't run any extra services. Strip it down to what you need, and preferably put your DNS on a small extra machine, as well as other system functions that you might want but do not need on that machine.
4. Check with people here about exploits. Every script kiddie that reads this site will want to crack your box and leave messages. The more immature ones will probably quote DMX or other rap artists. There are many cool people here that are really good with Linux security.
I believe the reason a lot of Linux sites get slashdotted like this is because a lot of hardware that Linux is used to run on is not what you'd want to run a commercial website on.
The reason why NT appears somewhat stable in a lot of cases is because the manufacturers of NT servers bend over backwards to make NT work on the BIOS and hardware level.
Linux can get the same effect and maximize performance off a website by tuning the hardware a bit, and knowing what hardware to use. A Celeron ain't gonna cut it. Alpha processors will do your job just fine for you without the Intel issues.
Plus, the system I quoted there can be had for about $4K and can handle heavy loads. Try doing that with 1 processor on an Intel chipset.
Your Bandwidth is what counts (Score:3)
Perhaps you should really focus on making sure the sculpture delivers up a pretty small, streamlined image, otherwise, the demands on your internet connection are going to kill you...
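A back-of-the-envelope sketch of why image size dominates: a link can deliver at most its bit rate divided by the object's size in bits, per second. The T1 rate (1.544 Mbit/s) is standard; the 30 KB and 10 KB image sizes are hypothetical examples, and the calculation ignores TCP/IP and HTTP overhead, so real throughput will be lower.

```python
def max_requests_per_second(link_bits_per_sec: float, object_bytes: int) -> float:
    """Upper bound on how many copies of one object a link can deliver per second.

    Ignores protocol overhead (TCP/IP and HTTP headers), so real numbers are lower.
    """
    return link_bits_per_sec / (object_bytes * 8)


# Hypothetical figures: a T1 serving a 30 KB webcam image versus a 10 KB one.
T1_BPS = 1_544_000
for size in (30_000, 10_000):
    rate = max_requests_per_second(T1_BPS, size)
    print(f"{size // 1000} KB image: {rate:.1f} requests/s")
# prints:
# 30 KB image: 6.4 requests/s
# 10 KB image: 19.3 requests/s
```

Shrinking the image to a third of its size triples the number of visitors the same link can serve, no matter how fast the server is.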
Let us know when you get the project done... hopefully it will be so great, we'll crash your server no matter how beefy it is... Good luck...
Chris Moyer
Linux servers with dynamic content (Score:4)
To give a suggestion of the CPU power required, the company I work for has several heavily loaded servers:
A Celeron 350 with 128 MB RAM maxes out at approximately 7 hits/second (heavily dynamic material).
A Celeron 350 with 128 MB RAM maxes out at approximately 17 hits/second (quite heavily dynamic material).
I just don't know how many hits/sec the
Note that these servers have been specially configured to handle the traffic involved. It is unlikely that you will go to the same level of specialisation, so leave some extra headroom.
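The "leave some extra headroom" advice can be turned into a rough sizing rule. This is a minimal sketch, assuming you have measured a per-server maximum like the figures above and can guess your peak rate; the 30% headroom default and the 20 hits/s peak are hypothetical, and it assumes load spreads evenly across identical boxes.

```python
import math


def servers_needed(peak_hits_per_sec: float, max_hits_per_server: float,
                   headroom: float = 0.3) -> int:
    """Smallest number of identical servers that keeps each one at or below
    (1 - headroom) of its measured maximum rate."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if max_hits_per_server <= 0:
        raise ValueError("max_hits_per_server must be positive")
    usable_per_server = max_hits_per_server * (1 - headroom)
    return math.ceil(peak_hits_per_sec / usable_per_server)


# Measured maxima from the comment above; the 20 hits/s peak is hypothetical.
for max_rate in (7, 17):
    print(max_rate, "hits/s per box ->", servers_needed(20, max_rate), "box(es)")
# prints:
# 7 hits/s per box -> 5 box(es)
# 17 hits/s per box -> 2 box(es)
```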
Ramdisk versus cached disk... My experience. (Score:4)
Unfortunately, with writable dynamic content, the ramdisk will have to be written to disk periodically, adding complexity, overhead, and, quite possibly, more disk IO than using a disk directly!
My server is a Celeron with 320MB RAM running Linux 2.0.36. I configured it with a 128 MB ramdisk and did a great deal of testing. Performance was significantly better, especially during peak loads, than running straight from the disk. Of course, I had a considerably more complicated set of scripts and still stood to lose some transactions if something bad happened.
As my next exercise, I tried to duplicate that performance without the ramdisk. By tuning the values in
The trick that worked for me was to raise the percentage of dirty buffers allowed before forcing a flush to 80%, and to raise the timeout for dirty buffers before flushing them to disk to 10 minutes. That does bring back some of the disadvantages of the ramdisk, but my UPS is good for over 10 minutes, so I don't worry much (the Internet connection drops when power is lost, so my machine, while still up, goes idle). My startup/shutdown/backup scripts are much simpler as a result, though.
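To make the two knobs concrete, here is a toy Python model of the flush policy described above: write back everything when more than 80% of buffers are dirty, or when the oldest dirty buffer has waited 10 minutes. It is an illustration of the policy only, not the kernel's actual write-back code; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DirtyBufferCache:
    """Toy model of a write-back cache flush policy (not real kernel code).

    A flush happens when the fraction of dirty buffers exceeds dirty_ratio,
    or when the oldest dirty buffer is at least max_age seconds old.
    """
    capacity: int                  # total number of buffers
    dirty_ratio: float = 0.80      # the commenter's 80% threshold
    max_age: float = 600.0         # the commenter's 10-minute timeout, in seconds
    dirty: dict = field(default_factory=dict)  # block number -> time first dirtied
    flushes: int = 0

    def write(self, block: int, now: float) -> None:
        # Rewriting an already-dirty buffer keeps its original age, so hot
        # blocks are absorbed in RAM instead of hitting the disk repeatedly.
        self.dirty.setdefault(block, now)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> bool:
        too_many = len(self.dirty) > self.dirty_ratio * self.capacity
        too_old = bool(self.dirty) and now - min(self.dirty.values()) >= self.max_age
        if too_many or too_old:
            self.dirty.clear()     # stand-in for writing every dirty buffer to disk
            self.flushes += 1
            return True
        return False


cache = DirtyBufferCache(capacity=100)
for t in range(0, 540, 10):        # five hot blocks rewritten for 9 minutes
    cache.write(t % 50, now=float(t))
print(cache.flushes)               # 0: under 80% dirty and younger than 10 minutes
cache.maybe_flush(now=600.0)
print(cache.flushes)               # 1: the oldest dirty buffer reached the timeout
```

The trade-off is the one the commenter accepts: the longer dirty data sits in RAM, the more a power loss could cost, which is why the UPS matters.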
Bandwidth and Benchmarking (Score:4)
Where will the site be hosted? Are you planning to host it with an ISP or at the location of the web-cam? If you are hosting it at the location of the web cam, network bandwidth will be by far your biggest concern. At the very least, you are going to need a frac-T1, frame relay, or DSL connection. Chances are, though, that if you are concerned about PC hardware costs, all of these (except perhaps DSL) are out of the question.
More likely, you will have the webcam connected to a PC, which could do nothing but capture images and upload them (via modem, ISDN, or DSL) to a co-located machine with an ISP. The server located at the ISP will then push them out to the teeming millions.
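A minimal Python sketch of the webcam PC's side of that split, assuming the ISP gives you FTP access to your web directory (an assumption; the comment names no upload method) and that your camera software provides some function returning JPEG bytes (hypothetical; called `capture` here). The remote file name `webcam.jpg` is also hypothetical.

```python
import ftplib
import io
import time
from typing import Callable


def ftp_uploader(host: str, user: str, password: str,
                 remote_name: str = "webcam.jpg") -> Callable[[bytes], None]:
    """Build an upload function that pushes one JPEG to the co-located server."""
    def upload(image: bytes) -> None:
        tmp_name = remote_name + ".tmp"
        with ftplib.FTP(host, timeout=30) as ftp:
            ftp.login(user, password)
            # Upload under a temporary name, then rename over the live file, so
            # visitors are less likely to fetch a half-written image. Whether a
            # rename may overwrite an existing file depends on the FTP server.
            ftp.storbinary(f"STOR {tmp_name}", io.BytesIO(image))
            ftp.rename(tmp_name, remote_name)
    return upload


def capture_loop(capture: Callable[[], bytes], upload: Callable[[bytes], None],
                 interval: float, iterations: int,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Capture and upload one frame every `interval` seconds; return the success count."""
    uploaded = 0
    for _ in range(iterations):
        try:
            upload(capture())
            uploaded += 1
        except ftplib.all_errors as exc:
            # A dropped modem/ISDN/DSL link should not kill the loop; retry next round.
            print(f"upload failed: {exc}")
        sleep(interval)
    return uploaded
```

Hypothetical usage: `capture_loop(grab_frame, ftp_uploader("ftp.example.net", "artist", "secret"), interval=30, iterations=10**9)`. The webcam PC only ever sends one image per interval, so its slow link is never exposed to the Slashdot crowd; the ISP's machine absorbs that.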
If you have no need for CGI, or your CGI needs are minimal, you may not even want to use your own machine. You may be best off just getting a web hosting account: the kind of thing you get with many dial-up accounts, but with better service and room for more bandwidth.
Assuming you are doing CGI, and you really do need your own machine, you really ought to answer your own question. By that I mean that you should benchmark your system on whatever hardware you happen to have handy. Depending on the complexity of your site, there are many server-testing tools that can tell you just what type of loads your system is capable of handling, and what type of latency you can expect at those loads.
If those numbers are well above the load you expect to receive, then you know a machine like the one you have is sufficient. You may even discover that a 486 with 32 megs of RAM is more than enough. If you have a lot of inefficient CGI, you may need a dual PII with gobs of memory. If you have more time than money, trial and error will give you by far the most efficient system.
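ApacheBench (`ab`, which ships with Apache) is one ready-made tool for this. To show what such a tool does, here is a minimal Python sketch of a concurrent load generator: it fires a fixed number of GET requests from a fixed number of parallel workers and reports throughput and latency. It is much cruder than a real benchmarking tool (one client machine, Python threads), so the client itself may become the bottleneck; treat the result as a rough lower bound on what the server can handle.

```python
import http.client
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str, timeout: float = 10.0) -> float:
    """Fetch url once, read the whole body, and return the elapsed seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def benchmark(url: str, total: int = 200, concurrency: int = 10) -> dict:
    """Send `total` GETs using `concurrency` workers; report throughput and latency."""
    latencies, errors = [], 0
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fetch, url) for _ in range(total)]
        for f in futures:
            try:
                latencies.append(f.result())
            except (OSError, http.client.HTTPException):
                errors += 1        # refused, timed out, reset, or HTTP error status
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "ok": len(latencies),
        "errors": errors,
        "requests_per_sec": len(latencies) / elapsed,
        "median_ms": statistics.median(latencies) * 1000 if latencies else None,
        # Approximate 95th percentile (nearest rank on the sorted list).
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))] * 1000 if latencies else None,
    }
```

For example, run `benchmark("http://testbox/index.html", total=500, concurrency=20)` (hypothetical host) against a spare machine on your LAN, and raise `concurrency` until `median_ms` climbs steeply or `errors` appear; that knee is roughly the machine's capacity.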
Let me tell you this: building a system to handle a high-bandwidth site is not nearly as much fun when money for hardware is no object. Perhaps the e-mail domain will clue you in there...
-p.
just get an account with a webserver & upload (Score:5)
It's a hell of a lot cheaper than getting your own T1 and servers, since most of these hosting outfits sit directly on the MAEs and such.
My Experience (Score:5)
Details
1. Think about the difference between "static" content (just files on the disk) and "dynamic" content (pages generated live, like here at /.). If you are just serving files, a 486 can handle it (assuming T1 speeds). I personally use a Pentium/90 at 0.3 T1 speeds, and CPU usage never gets high.
1bis. Memory and disk speeds are far more critical than CPU speed (if you are not doing dynamic content). Get a DMA hard disk (SCSI or UltraDMA IDE). 64 MB of RAM should really be enough for your application.
2. The biggest thing that is going to kill you is bandwidth. I run a website that gets about 10,000 hits/day (raw) on a 400-kbps link, but I'm just serving HTML and inline GIFs, so the link never really gets overloaded. However, it sounds like you might be hosting some pretty hefty downloads. One technique is to put your big files on a free-hosting website (like GeoCities). They do monitor their logs and will eventually kill your download, but hopefully only after the Slashdotting is over.
3. Reading other comments, I see a bunch of people suggesting RAMDISKs. That's totally unnecessary; the operating system caches disk access just as well as a RAMDISK does. (In fact, a RAMDISK is just a crude way of tuning your disk cache.)
4. Remember to consider your content. Artistic web designers tend to put way too much layout and graphics in their pages. This can kill your website: a page can easily reach 10 times the bare minimum in size, and, moreover, it can swamp your site with unnecessary TCP connections (if you put 4 GIFs in a web page, each visitor opens 4 additional TCP connections to your site, and the machine's TCP stack can handle only so many concurrent connections before bogging down).
4bis. Please be polite to readers. You will probably develop your content on only one browser, but slashdotters use a wide variety of browsers; you'll likely piss off a lot of people if, for example, your pages render well on Netscape/4.61 but look like crap on older or alternative browsers. This often means reducing layout.
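Point 4 is easy to check before you go live: count how many requests one page view costs. A minimal sketch using Python's standard `html.parser`; which tags count as embedded resources (`img`, `script`, `embed`, stylesheet `link`, and old-style `background=` attributes) is my own assumption, and it does not look inside CSS files. Under HTTP/1.0 without keep-alive, each of these requests is a separate TCP connection.

```python
from html.parser import HTMLParser


class RequestCounter(HTMLParser):
    """Collects the URLs of resources a browser would fetch along with the page."""

    def __init__(self) -> None:
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("img", "script", "embed") and a.get("src"):
            self.resources.append(a["src"])
        if tag == "link" and (a.get("rel") or "").lower() == "stylesheet" and a.get("href"):
            self.resources.append(a["href"])
        if a.get("background"):            # e.g. <body background="bg.gif">
            self.resources.append(a["background"])


def requests_per_view(html: str) -> int:
    """1 for the page itself plus one per distinct embedded resource
    (a repeated image is normally fetched once per page view)."""
    counter = RequestCounter()
    counter.feed(html)
    return 1 + len(set(counter.resources))


page = ('<html><body background="bg.gif"><img src="a.gif"><img src="b.gif">'
        '<img src="a.gif"><script src="x.js"></script></body></html>')
print(requests_per_view(page))             # 5: the page, bg.gif, a.gif, b.gif, x.js
```

Multiply the result by your expected page views per second to see the connection rate your TCP stack will actually face.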
How to guarantee you won't be affected (Score:5)
We've been given some sketchy details on the current setup. It would be interesting if there were a page with all of the specs, software, and tunings, including config files, etc.
Slashdot can take quite a load. If the setup was documented, a lot of us who have projects on the horizon will have something to base them on and can avoid mistakes, etc.
From our experience.... (Score:5)
However, our site is very heavy on the dynamic content (and uses a lot of SSL for the ordering system).
The machine could handle about 20 minutes of the /. effect at a time before CPU usage went sky high. Luckily, we were able to bring the machine down, put in a second processor, double the memory, and update mod_perl, and get the machine back up with the new configuration in a couple of hours.
The machine has been running like an absolute champ for the past few days. It's been able to handle requests numbering in the millions (page hits are in the hundreds of thousands, but our site uses a fair amount of graphics also) and has transferred several GBs of data just this weekend. If you do anything securely (SSL), keep in mind that anything on that secure page will take up about 7 to 8 times the CPU time as a non-secure item. And never, never, never run a site using Perl for dynamic content without installing mod_perl for Apache. The difference between a machine with it and one without it is tremendous (especially in memory usage).
One thing you can do for big gains in speed is disable hostname lookups (this makes a huge difference when being slashdotted). Also, turn the log level down on Apache. Because we have space to spare on this particular machine, we have logging set at a moderate level. Two days after the mention on Slashdot, the logs are a few hundred MB. Not a problem if you have the disk space, but if you don't, it will be a major problem.
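Every request adds a log line, so you can check ahead of time whether logging will become "a major problem" by measuring your average log line and projecting forward. A minimal sketch; the log path and request rate are whatever you supply, and it ignores log rotation, compression, and error-log growth.

```python
import os
import shutil
from typing import Optional


def average_entry_bytes(log_path: str) -> float:
    """Mean size in bytes of one access-log line (newline included)."""
    total = lines = 0
    with open(log_path, "rb") as f:
        for line in f:
            total += len(line)
            lines += 1
    return total / lines if lines else 0.0


def days_until_full(log_path: str, requests_per_day: float,
                    free_bytes: Optional[int] = None) -> float:
    """Days before logging at this rate fills the partition holding log_path."""
    if free_bytes is None:
        directory = os.path.dirname(os.path.abspath(log_path))
        free_bytes = shutil.disk_usage(directory).free
    daily_bytes = average_entry_bytes(log_path) * requests_per_day
    return float("inf") if daily_bytes == 0 else free_bytes / daily_bytes
```

Note that the request count here is hits, not page views: with many inline graphics, as in the setup described above, log volume grows with the number of objects served.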
Anyway, the configuration now is: dual PII-450, 256MB of ECC PC100 SDRAM, 10 Mbit Ethernet, kernel 2.2.12, running Apache 1.3.9 and Perl 5.005 (along with the latest OpenSSL, SSLeay and mod_perl). It's having no problem keeping up with the load at this point, and the traffic is still pretty heavy.
-Jon