Behind the Scenes at Hotmail
mallumax writes "ACM Queue interviews Hotmail engineer Phil Smoot on how they manage more than 10,000 servers spread around the globe. Between them, they process billions of emails per day and are overseen by hundreds of administrators. To do that they have returned to the command line. From the article: 'Our operations group never wants to rely on any sort of user interface. Everything has to be scriptable and run from some sort of command line'. The overriding philosophy seems to be KISS. Also: tape backups are out and spam levels have stabilized."
The SPAM problem (Score:2, Insightful)
BF Can you quantify in some way the extent of the spam problem?
PS It is massive. Years ago we saw as many as 3 billion incoming messages. This has declined, but the estimates are that 75 percent of all e-mail is spam. Over the past couple of years our techniques have gotten better, and our partnerships with other major ISPs have improved. I would say spam is still gross and abusive, but it hasn't been getting worse lately.
We do continue to react to spam on a daily basis as spammers continue to seek out holes in our defenses. What we see now is more sophistication in the spammers--more phishing schemes, people trying to get credit card numbers and that kind of thing.
But didn't he get the memo from headquarters? Bill Gates said there would be no more spam! They better get to work -- they're running out of time!
Command line (Score:3, Insightful)
> To do that they have returned to the command line.
Absolutely.
I'm currently in the process of trying to change our company culture away from legacy GUI tools and toward command-line tools.
Scriptability is a highly under-rated goal. I'm not against GUI tools -- but they need to be built on top of scriptable utilities.
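To make "GUI on top of scriptable utilities" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `disk_report` function, the `--max-used-pct` flag, and the choice of disk usage as the thing being checked are all hypothetical. The point is only the layering: the core returns plain data, the command-line front end prints JSON and returns an exit code, and any GUI would call the same core function.

```python
import argparse
import json
import shutil


def disk_report(path="."):
    """Core logic (hypothetical example): return disk usage as plain data, no UI."""
    usage = shutil.disk_usage(path)
    used_pct = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "used_pct": used_pct,
    }


def main(argv=None):
    """Command-line front end: JSON on stdout, exit status usable from scripts."""
    parser = argparse.ArgumentParser(description="Report disk usage as JSON")
    parser.add_argument("path", nargs="?", default=".")
    # Hypothetical threshold flag: exit status 1 if usage is above this percentage.
    parser.add_argument("--max-used-pct", type=float, default=90.0)
    args = parser.parse_args(argv)
    report = disk_report(args.path)
    print(json.dumps(report))
    return 0 if report["used_pct"] <= args.max_used_pct else 1
```

To use it as a script, end the file with `raise SystemExit(main())`. A GUI would import `disk_report` and draw whatever it likes, while cron jobs and shell scripts keep using the JSON output and the exit status.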
I'm amazed (Score:1, Insightful)
I have worked on projects with that many hosts before and only had maybe 10 colleagues.
Oh, come on yourself. (Score:4, Insightful)
Right... it's always more interesting to read article after article about only unsuccessful operations run by people who aren't proud of what they do, and don't face huge, global challenges.
You're cranky because it's MS. If exactly the same article ran, substituting "gmail" and "google" for all of the other names, you'd say, "cool!"
From the immortal words of Henry Spencer (Score:5, Insightful)
"Those who don't understand UNIX are doomed to reinvent it, poorly."
From the article and elaborating on the
Q: Are there scaling reasons to think about the benefits of a command line for managing over a GUI, or are there other things to think about?
A: Our operations group never wants to rely on any sort of user interface. Everything has to be scriptable and run from some sort of command line. That's the only way you're going to be able to execute scripts and gather the results over thousands of machines.
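For anyone wondering what "execute scripts and gather the results over thousands of machines" looks like in practice, here is a rough sketch in Python. It is not Hotmail's tooling, which the article does not describe. It assumes the hosts are reachable over `ssh` with key-based login, and the host names in the usage note are made up. The remote runner is passed in as a parameter so the gathering logic can be tried without any real machines.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_host(host, command, timeout=30):
    """Run `command` on `host` over ssh (an assumption); return (host, exit_code, output)."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
        return host, proc.returncode, proc.stdout.strip() or proc.stderr.strip()
    except subprocess.TimeoutExpired:
        return host, None, "timed out"
    except OSError as exc:
        # For example, the ssh client is not installed on this machine.
        return host, None, str(exc)


def fan_out(hosts, command, runner=run_on_host, max_workers=50):
    """Run `command` on every host in parallel and gather results keyed by host."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: runner(h, command), hosts)
        return {host: (code, out) for host, code, out in results}


def failed_hosts(results):
    """Hosts whose command did not exit with status 0 (including timeouts)."""
    return sorted(h for h, (code, _) in results.items() if code != 0)
```

Usage would be something like `results = fan_out(["mail-01", "mail-02"], "uptime")` followed by `failed_hosts(results)` to get the list of machines that need a second look. None of this works if the only way to get the answer from a machine is to click through a dialog box, which is the operations group's point.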
Also, we all remember the scaling issues that MS had when they took over Hotmail and initially tried to switch from FreeBSD to Windows.
MS had to port over cron jobs because cron isn't installed and used by default under Windows the way it is under UNIX. They had to rewrite the "inefficient" Perl code that ran fine on FreeBSD in C++. They had to redo the memory allocation to prevent memory leaks in the new C++ code. Read about it from the goat's mouth http://www.microsoft.com/technet/interopmigration
I can't wait until FreeBSD and other inferior OSes get tools to find memory leaks. One day....
(That last line was sarcasm and not a flame).
F**Kin Speak English ! (Score:5, Insightful)
Re:Does anyone know... (Score:3, Insightful)
If the response headers say IIS, it's probably being proxied by some kind of load balancer. In a modern setup, the proxy is a hardware device with a custom OS... probably originating in BSD, but with a heavily modified IP stack. The system for delivery and transport of mail will also be different from that of the web interface.
I don't think an OS really matters anymore when you're getting to that scale. The architecture matters, and that's probably proprietary and protected by IP agreements with employees because it would have value to competitors.
Re:Hundreds of admins for 10K servers is not so ho (Score:1, Insightful)
Paul Graham on the importance of tools (Score:4, Insightful)
He submits, of course, that any program can be written in any reasonable language -- they are all, after all, Turing-equivalent. But the quality of the tools can make the difference between a feature being added next week and not at all.
If Hotmail's admins are back to command line and scripting anyway, maybe they should've stuck with FreeBSD.
Look at how quickly Google is rolling new things out -- their platform allows them to.
Not relying on UIs???? (Score:1, Insightful)
Re:Hundreds of administrators (Score:5, Insightful)
It sounds to me like you don't understand what it is that Akamai does. They're not just running web & streaming servers on their 15k machines. They're distributing content in real time in a way that vastly improves user access all around the world. You may have heard about when Victoria's Secret held their first video-streaming lingerie show. Well, their servers couldn't handle the load because of all the people trying to watch it. They became an Akamai customer, and Akamai was able to redistribute their streams in real time all over the globe. To be able to take video (or just web content) from a single source and distribute it quickly and efficiently to thousands of distributed users in real time is a huge undertaking. Akamai has some very impressive technology to be able to do this.
I'm not saying that running a mail service like Hotmail is a piece of cake, but I do think that what Akamai does is a lot more difficult and impressive when you think about it. If Akamai's distributed environment were to drop off the net then you probably wouldn't be able to access any of the on-line services of most of their customers [akamai.com]. (And that's just a small subset of their customer base.) The ability to keep websites like those of Microsoft, eBay, FedEx, Red Hat, etc. all highly responsive to end users is not a simple feat by any stretch of the imagination.