
Building a Better Webserver

Posted by michael
from the slashdot-will-beat-a-path-to-your-door dept.
msolnik writes: "The guys over at Ace's Hardware have put up a new article going over the basics, and not-so-basics, of building a new server. This is very informative; I think everyone should devote 5 minutes and a can of Dr Pepper to this article."

  • by msolnik (536110) on Tuesday November 27, 2001 @05:56PM (#2621496) Homepage
    Why would they go with the desktop version when they want a rackmount server? You can get the Netra X1 for $50 less and it comes with the exact same hardware but in a rackmount case. Check it here [sun.com].
  • by trb (8509) on Tuesday November 27, 2001 @06:06PM (#2621558)
    "Real multithreading" is really no panacea. See the notes from John Ousterhout's talk, Why Threads Are A Bad Idea (for most purposes). [softpanorama.org]
  • by azephrahel (193559) on Tuesday November 27, 2001 @06:07PM (#2621567)
    It really feels like they only made a token gesture towards using an x86 box. To be honest my next box will probably be a Sun Blade too (but hey, I'm gonna use it for a desktop ;) Mind you this was a really good article, but I think they should have said that they were just more comfortable with SPARC and that was that. There was another good article on a similar subject not long ago, on Anandtech's new server [anandtech.com]. For that article they benchmarked different configs (mobo, proc, etc.) and then did a price/performance comparison, as far as I recall. And they chose AMD ;)
  • by czardonic (526710) on Tuesday November 27, 2001 @06:10PM (#2621598) Homepage
    You'd get the traffic to your site no matter what server you run. Code Red would just pump up traffic from your site.
  • by dmelomed (148666) on Tuesday November 27, 2001 @06:14PM (#2621626)
    Bah, speak for yourself. Java relies on the virtual machine, so that's your bottleneck (as in beer and performance). With proper software (like the new version of Apache, still in beta) and tuning, or other threaded servers like AOLserver or Xitami, and PHP or mod_perl instead of Java, I bet my money _that_ configuration will scale better.

    Also, don't confuse the CGI protocol with short-lived CGI binaries. Slashdot uses mod_perl, which is NOT a short-lived process, but Apache is still a forking server in the 1.3.x branch.
  • by Lumpish Scholar (17107) on Tuesday November 27, 2001 @06:16PM (#2621637) Homepage Journal
    Consider a user with a typical analog modem that has an average maximum downstream throughput of, say, 5 KB/s. If this user is trying to download the general message board index page, about 200 KB in size (rather small by today's standards), it will require a solid 40 seconds to complete this single download.... To maximize the efficiency of the network itself, we can compress the output stream and thus, compress the site. HTML is often very repetitive, so it's not impossible to reach a very high compression ratio. The 200 KB request mentioned above required 40 seconds of sustained transfer on a 5 KB/s link. If that 200 KB request can be compressed to 15 KB, it will require only 3 seconds of transfer time.

    Except that 56 Kbps modems get 5 KBps throughput by compressing the data! If the client and server compress, the modems won't be able to; the net effect is lots of extra work on the server side, and probably no increased throughput for the modem user.

    The server might or might not see a decrease in latency, and in the number of sockets needed simultaneously; it depends on how much it can "stuff" the intermediate "pipes". The server will see an overall decrease in bandwidth needed to serve all the pages.

    Ironically, broadband customers (who presumably don't have any compression between their clients and Internet servers) will see pages load faster. (And the poor cable modem providers from the previous story will be happy.)
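
The arithmetic in the quoted passage can be checked with a minimal sketch. The 200 KB, 15 KB, and 5 KB/s figures come from the quote; treating 1 KB as 1000 bytes and the sample HTML table are assumptions made only for illustration. The sample page is far more repetitive than a real message board index, so its compression ratio overstates what a real site would see, and the sketch says nothing about how modem-level compression interacts with gzip.

```python
import gzip


def transfer_seconds(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes over a link sustaining rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s


LINK_RATE = 5 * 1000  # 5 KB/s, the modem throughput assumed in the quote

uncompressed_time = transfer_seconds(200 * 1000, LINK_RATE)  # 40.0 seconds
compressed_time = transfer_seconds(15 * 1000, LINK_RATE)     # 3.0 seconds

# Hypothetical, highly repetitive index page (not the real site's HTML).
row = '<tr><td><a href="/read?id=12345">Re: topic</a></td><td>poster</td></tr>\n'
page = ("<html><body><table>\n" + row * 2000 + "</table></body></html>\n").encode("ascii")
compressed = gzip.compress(page)

print(uncompressed_time, compressed_time)
print(len(page), "bytes before gzip,", len(compressed), "bytes after")
```
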
  • by john@iastate.edu (113202) on Tuesday November 27, 2001 @06:20PM (#2621662) Homepage
    Well, lots of big iron gets crushed by the slashdot effect too. This thing is running on a piddly little Sun, after all. And it was very responsive early.

    One thing that does seem to work against the onslaught is a throttling webserver [acme.com]. If you haven't got the bandwidth etc to serve a sudden onslaught of requests, probably the best thing to do is to just start 503'ing -- at least people get a quick message 'come back later' instead of just dead air.

  • by Lumpish Scholar (17107) on Tuesday November 27, 2001 @06:25PM (#2621692) Homepage Journal
    Ousterhout says threads are bad for apparent concurrency but good for taking advantage of multiple processors, and for building scalable servers.

    In other words, with the right hardware architecture, threads could be very useful for sites such as Ace's Hardware (though they happened to go with a uniprocessor) and Slashdot.

    Java threads are also easier to program than C and C++ threads, though not easy. (Manual memory management is hard; thread programming is hard; manual memory management in a threaded program is very hard. I'm not speaking hypothetically on the last point; I've really envied Java programmers the last few weeks.)-:
  • by Betcour (50623) on Tuesday November 27, 2001 @06:35PM (#2621737)
    I am afraid you are wrong: modems get 5 KB/s of raw data, not counting compression. I can download zipped files at over 5 KB/s with a dialup modem...

    mod_gzip is your friend.
  • Confusing the issues (Score:4, Informative)

    by Alex Belits (437) on Tuesday November 27, 2001 @06:36PM (#2621741) Homepage

    In the part about databases and persistent connections they confuse the issues more than a bit. The real problem is not too many processes, which would automatically make threads look better, but the symmetry among processes -- any request should be possible to serve by every process, so all processes end up with database connections. This is a problem particular to Apache and Apache-like servers, not a fundamental issue with processes and threads.

    In my server (fhttpd [fhttpd.org]) I have used a completely different idea -- processes are still processes, however they can be specialized, and requests that don't run database-dependent scripts are directed to processes that don't have database connections, so reasonable performance is achieved if the webmaster defines different applications for different purposes. While I haven't posted any updates to the server's source in the last two years (I was rather busy at the job that I am now leaving), even the published version 0.4.3, despite lacking the clustering and process management mechanism that I am working on now, performed well in situations where "lightweight" and "heavyweight" tasks were separated.
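
The specialization idea described above can be sketched roughly as follows. This is not fhttpd's configuration or API; the `Pool` class, the `route` function, and the URL prefixes are all hypothetical names chosen for illustration. The point is only that a dispatcher sends database-dependent requests to the pool whose processes hold connections, so the other pool never opens any.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical group of worker processes with one shared specialization."""
    name: str
    holds_db_connection: bool
    served: list = field(default_factory=list)


def route(path: str, pools: list, db_prefixes=("/forum/", "/search")) -> Pool:
    """Pick the pool whose specialization matches the request."""
    needs_db = path.startswith(db_prefixes)  # str.startswith accepts a tuple
    for pool in pools:
        if pool.holds_db_connection == needs_db:
            pool.served.append(path)
            return pool
    raise LookupError(f"no pool can serve {path}")


pools = [Pool("lightweight", False), Pool("heavyweight", True)]
for path in ("/index.html", "/forum/read?id=1", "/images/logo.gif"):
    print(path, "->", route(path, pools).name)
```

With symmetric processes, every one of N workers would hold a database connection; here only the workers in the "heavyweight" pool do.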

  • by victim (30647) on Tuesday November 27, 2001 @06:46PM (#2621792)
    Speaking as the maintainer of a site that is periodically slashdotted...

    Yes, a throttling server is a great idea. If you recognize that there will always be a load too high for you to handle (10 requests per minute for my site, yes minute, it is a physical device), then you must either decide to deal with the load or let the load crush your machine.

    Consider a typical web server. When it gets overloaded it slows down, each request takes longer to handle, there are more concurrent threads, overall efficiency drops, each request takes longer to handle.... welcome to the death spiral. (On my site-which-must-not-be-named-lest-it-be-slashdotted, everyone waiting in queue gets a periodic update; at a certain point the load of generating the updates swamps the machine. I have to limit the number of people in queue.)

    The key decision is to determine how many concurrent threads you can handle without sacrificing efficiency and then reject enough traffic to stay under that limit.

    This is where optimism comes in and bites you in the ass. You remember that every shunned connection is going to cost you money/fame/clicks whatever so you set the limit too high and melt down anyway.
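
The "pick a concurrency limit and reject the rest" policy from this comment, combined with the quick 503 suggested earlier in the thread, might look like the sketch below. The `Throttle` class and its `handle` method are hypothetical names, and a real server would apply the same check at the connection-accept level rather than around a Python callable.

```python
import threading


class Throttle:
    """Admit at most `limit` concurrent requests; reject the rest immediately."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, work):
        # Non-blocking acquire: an over-limit request is never queued,
        # so a saturated server does not slide into the death spiral.
        if not self._slots.acquire(blocking=False):
            return 503, "Service Unavailable: come back later"
        try:
            return 200, work()
        finally:
            self._slots.release()


throttle = Throttle(limit=1)
# The inner request arrives while the outer one still holds the only slot.
status, inner = throttle.handle(lambda: throttle.handle(lambda: "page"))
print(status, inner)
```

Choosing `limit` is exactly the judgment call the comment warns about: set it from measured capacity, not from how much traffic you would like to keep.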
  • by victim (30647) on Tuesday November 27, 2001 @06:51PM (#2621819)
    One other factor to consider is that the gzip transfer encoding compresses much better than the algorithm in the modem. Part of this is the algorithm with its larger dictionary size, the other part is the `pure' data stream being fed to it. It is just the html, not the html interspersed with ppp, ip, and tcp headers.
  • it's the BANDWIDTH (Score:5, Informative)

    by green pizza (159161) on Tuesday November 27, 2001 @08:00PM (#2622183) Homepage
    If you haven't noticed by now, Ace's Hardware has a neat little indicator on each page that shows the processing time and queue time it spent getting to you (very bottom left-hand corner of each page). Most are about 74ms - 112ms for me. This, plus the result of some pings and traceroutes, leads me to believe they're heavily BANDWIDTH bound right now, not CPU bound. I do hope Ace puts up a summary of the Slashdot effect as well as some other data for us to pore over. Some MRTG router graphs of the bandwidth usage would be *really* nice, too.
  • Re:Why Sun? (Score:2, Informative)

    by Zog (12506) <israelshirk@g m a i l .com> on Tuesday November 27, 2001 @08:47PM (#2622405) Homepage
    One of the great things about sparcs is their performance under load - they're a *lot* better at running under high loads than your typical pc.

    About pc's having more competition, it's not a hard argument that the competition isn't really what it seems to be - most of the competition is in price and how fast Quake will play. If Intel's processor is a little bit slower than AMD's, the fact that it still goes into most OEM computers will keep Intel alive. If Sun does not stand up to the competition with their processors, motherboards, and other components, people will leave them for something better, and Sun will be down the hole. They *have* to be better to survive - there's not much forcing people to stick with them.

    They're also a lot more solid in their roots (Sun servers have been around forever, so they've had a lot of time to work on tweaking things and getting processors to work well for their applications), and Sun's support generally ranges from fairly good to downright amazing, from what I've heard (not that I've needed it).

    But in the end, it's a lot different from PC hardware, and it can sometimes take a bit of getting used to.
  • by Fjord (99230) on Tuesday November 27, 2001 @10:44PM (#2622872) Homepage Journal
    No, he's right. You are right that the 5KB is raw data: that's what he is saying. But the difference between whether it's compressed by the server or compressed by the modem is abstract: there are cases where one is better than the other. But, for the most part, compressing it on the server is slightly better than letting it compress downstream, plus it can increase your bandwidth (which is what the article was talking about) and the speed of transit before it gets to the modem-to-modem link, so it is worth doing.
  • Re:Why Sun? (Score:2, Informative)

    by Anonymous Coward on Tuesday November 27, 2001 @11:33PM (#2623041)
    I am amazed at how people buy into the myth of cheap PCs...

    You mean people like Google [google.com] who run their highly-regarded search engine/translator [google.com]/image indexer [google.com]/Usenet archive [google.com] on a server farm of 8,000 inexpensive [internetweek.com] PCs [google.com] with off-the-shelf Maxtor 80GB IDE HDs?

  • Multithread Apache (Score:3, Informative)

    by Zeinfeld (263942) on Wednesday November 28, 2001 @12:47AM (#2623226) Homepage
    The article preens itself over the use of multithreaded code over the multiprocess model of Apache. This is potentially a big win, since the multiprocess model involves a lot of process context switching and process-to-process communication, both of which are expensive compared to thread switching.

    When I discussed this issue with Thau (or to be precise, he did most of the talking) he gave the reason for using processes over threads as the awful state of the then pthreads packages. If Apache was to be portable it could not use threads. He even spent some time writing a threads package of his own.

    I am tempted to suggest that rather than abandon Apache for some Java server (yeah, let's compile all our code to an obsolete byte code and then try to JIT compile it for another architecture), it should not be a major task to replace the Apache hunt group of processes with a thread loop.

    The other reason Thau gave for using processes was that the scheduler on UNIX sux and using lots of threads was a good way to get more resources, err quite.

    Now that we have Linux, I don't see why the design of applications like Apache should be compromised to support an obsolete and crippled legacy O/S. If someone wants to run on a BSD Vaxen then they can write their own Web server. One of the liabilities of open source is that once a platform is supported, the application can end up supporting the platform long after the O/S vendor has ceased to. In the 1980s I had an unpleasant experience with a bunch of physicists attempting to use an old MVS machine, despite the fact that the vendor had obviously ceased giving meaningful support for at least a decade. In particular they insisted that all function names in the Fortran programs be limited to 6 characters, since they were still waiting for the new linker (when it came, it turned out that for function names over 8 characters long it took the first four characters and the last four characters to build the linker label... lame, lame, lame).

  • Re:New Webserver? (Score:2, Informative)

    by benspionage (265723) on Wednesday November 28, 2001 @12:59AM (#2623253) Homepage
    An excellent reply [aceshardware.com] to the "they've been slashdotted" comment was given in the forum for this article. I should note that the site is responding fine now.


    Most people are unlikely/too lazy to follow the comment link above so I've repeated the first part of the response below:


    Yes, I read quite a few snide comments on slashdot about this server not being able to handle the load and ridiculing the article because of it. Frankly these people don't have a clue. It would be pointless in the extreme to operate a server 24/7 to handle the kind of loads the "slashdot effect" generates unless those kind of loads are the norm... A well tuned, properly designed website/server should be equipped to handle 2 to 3 times its _expected_ peak traffic rate (which seems about what this server can do as it's tuned now). It is a waste of money and hardware trying to do any more than that imho, as 99.9% of the time you would have a lot of $$$ sitting totally idle in the form of hardware and bandwidth. Being a server admin myself, I think the guys here did an EXCELLENT job explaining what is involved with hosting a fairly high-traffic website efficiently. And I also think the server/programming for this site is well designed and does its job admirably (better than 99% of the websites on the internet at least). They did an excellent job of explaining the pros and cons of different approaches to dynamic sites. Knocking them for getting nailed by slashdot isn't exactly productive; I would like to see ANY site which uses database generated content on a single thousand dollar server handle that kind of load (my guess is > 1000 requests per second at its peak from what I have heard from others who have been slashdotted)... Caching can only do so much :)

    [Rest of comment follows, see link above for full version]

  • by Alex Belits (437) on Wednesday November 28, 2001 @07:35AM (#2623996) Homepage
    FastCGI is better than just a bunch of symmetric processes, however it has some serious flaws -- among them a poor security model for processes that run on other hosts (fhttpd reverses the logins: backends connect to the server, and those connections authenticate on the server), and a need to proxy the response through a server for processes that run locally (fhttpd passes a client's fd to the backend process).

    Other than that, FastCGI is a good idea.
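
Handing a client's file descriptor to a local backend, as described above, is commonly done on Unix with `SCM_RIGHTS` ancillary data over a Unix domain socket. Whether fhttpd uses exactly this mechanism is an assumption; the sketch below only illustrates the general technique using `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). Both "processes" live in one script for brevity, and a socket pair stands in for an accepted client connection.

```python
import socket

# Stand-in for a client connection the front-end server has accepted.
client_end, accepted = socket.socketpair()

# Control channel between front end and backend; must be a Unix domain socket.
server_ctl, backend_ctl = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Front end: pass the request line plus the client's descriptor to the backend.
socket.send_fds(server_ctl, [b"GET /"], [accepted.fileno()])

# Backend: receive the descriptor and answer the client directly,
# so the response is never proxied back through the front end.
request, fds, _flags, _addr = socket.recv_fds(backend_ctl, 1024, 1)
conn = socket.socket(fileno=fds[0])
conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
conn.close()

accepted.close()  # the front end drops its own copy of the descriptor
reply = client_end.recv(1024)
print(request, reply)

for s in (client_end, server_ctl, backend_ctl):
    s.close()
```
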
