Solaris Machine Shut Down After 3737 Days of Uptime
An anonymous reader writes "After running uninterrupted for 3737 days, this humble Sun 280R server running Solaris 9 was shut down. At the time the video was made it was idle; the last service it hosted was removed sometime last year. A tribute video was made with some feelings about Sun, Solaris, the walk to the data center, and freeing a machine from internet-slavery."
So what did it do all that time? (Score:5, Funny)
Re:So what did it do all that time? (Score:5, Insightful)
Uptime is irrelevant for an individual server, anyway. If there's fail over (and there should be if uptime is important), take it down and update the kernel for security reasons, who cares?
Re:So what did it do all that time? (Score:5, Funny)
If there's fail over (and there should be if uptime is important)
i agree... if you're responsible for a single server performing a mission critical function with no fail over, you may as well just fire yourself
Re:So what did it do all that time? (Score:5, Insightful)
Just get it in writing.
Been there, done that. When it has to come down for a hardware failure or something like that, you can show you tried to get a backup machine, that you tried to do things right.
Re:So what did it do all that time? (Score:5, Insightful)
Re: (Score:3)
Proof you are right doesn't help. It has to be shared and spread long before there's an issue, or you can still end up in an unwinnable situation.
Re:So what did it do all that time? (Score:5, Interesting)
I'd differ with that. I was fresh on the job, just 2 or 3 months, long enough to get the feeling I would be the scapegoat. The owner came in; a deal the GM had made in a bar 2 weeks back hadn't worked out, and as the 3 of us were walking to the back of the garage to look at what we had, the GM tried to say it was all my idea.
Wrong. I skipped out in front, spun around, and said this stops right here and now; I was just following orders. The owner looked at the GM, looked at me, gave a barely perceptible nod, and started walking again. I didn't get pushed to take the blame again, but I did get pushed in every other way it seemed.
Owners didn't get to be owners without a sense of who's right and who's wrong in boss/employee differences. Tell the truth even if you lose, because if you lose, that job was looking for somebody to do it when you walked in. I'd a hell of a lot prefer to stand my ground if I'm right, and admit it if I'm wrong, and I've done quite a bit of both in my 78 years. Honesty has paid off handsomely several times.
About 2 years later another situation came to a boil, and I was the first one called to the owner's office when he arrived. He wanted to know what it would take to fix it. I said two things: the gear these people are using is just plain worn out, it's been on the road non-stop for at least 5 years, and I can't get parts because the parts bills aren't being paid. I need 10 grand in parts, and I can't get a P.O. for more than $200 a month, COD. Hell of a way to run a train. Besides that, the technology has moved on. It's time to upgrade.
His next question floored me, he wanted to know if he needed a new GM. I had to say it looked like he was, at the end of the day, the biggest roadblock to making things run smoothly. Then he had another dept head paged, 3 all told in the next 30 minutes. Years later he said they all agreed with me, so we had a new GM by the next morning. That and $150,000 in new gear put out the fire. That GM didn't work so well either after a couple years, but that's another story I am not directly involved in. The 3rd one is a pussy cat and we sometimes get into very noisy arguments even now, just to entertain the troops. He's a decent man, a motivated manager, but in a war of wits with me on technical stuff, he is unarmed and knows it very very well.
Bottom line to this story is that I had already proved my worth from the 1st day on the job because they had about half the gear packed up to go back to the factory shop, expected 2 to 3 grand each for repairs with a 2 week turnaround time. I canceled that, unpacked them and handed in parts orders at about 10% of that per machine. All were back in service inside of 10 days, half that waiting on FEDEX or UPS.
So it was a question of who was worth more to the person who owns the place. I stayed there 18+ years, have now been retired for 11 years, and the owner and I are still friends.
Cheers, Gene
Re:So what did it do all that time? (Score:5, Insightful)
It is surprisingly hard to guarantee data integrity when doing a fail over.
If you want to guarantee a system keeps operating and maintains data integrity when a single computer fails, you need at least another three computers that are still running with no failures. There is a mathematical proof for this.
If you want to go lower than four computers, you have to make assumptions about how the failures behave. And if just one computer fails in a way that does not match your assumptions, the system will fail.
If you do decide to go with the four computers required to handle a single failure, the protocols to ensure they agree on the current state of your data are quite complicated. The protocols have to be non-deterministic. That's another proven fact. No matter how many machines you throw at the problem, a deterministic protocol cannot handle even a single failure.
You can get around the non-deterministic requirement if you make assumptions about the timing of communication. But you'd slow down the system unnecessarily because you'd have to wait for the maximum time you assumed packet delivery could take on every operation, and if the network was slower than you assumed, the system would fail.
Knowing how difficult fail over can be, it is no surprise that people sometimes decide not to bother with it and instead hire an operator, who you assume can make everything be OK as long as you have backups plus spare hardware ready to put into production.
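Not from the parent post, just a minimal sketch of the arithmetic behind the "at least another three computers" claim: if you wait for replies from n−f of n replicas, any two such quorums overlap in at least n−2f replicas, and that overlap has to contain a correct replica even when f of them are Byzantine, which forces n ≥ 3f+1.

```python
# Hypothetical illustration (not from the post): why surviving f Byzantine
# failures needs n >= 3f + 1 replicas when you wait for quorums of n - f replies.

def quorum_overlap(n: int, f: int) -> int:
    """Minimum overlap between any two quorums of size n - f."""
    return 2 * (n - f) - n  # = n - 2f

def tolerates(n: int, f: int) -> bool:
    """True if the overlap is guaranteed to contain at least one correct replica."""
    return quorum_overlap(n, f) >= f + 1

if __name__ == "__main__":
    f = 1
    for n in range(2, 7):
        print(f"n={n}, f={f}: overlap={quorum_overlap(n, f)}, safe={tolerates(n, f)}")
    # 'safe' first becomes True at n=4: one failed machine plus three healthy ones.
```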
Re: (Score:3)
Most problems get easier to solve if you have a lot of money to work with, and this one is no exception. But you still need software that can correctly execute a non-trivial protocol. A single software glitch could still take down the system when the same bug triggers on all replicas simultaneously. Redundant systems have blown up [youtube.com] due to replicas suffering from the same software glitch.
That means either you need the software to be bug free. Or you let four di
Re:So what did it do all that time? (Score:4, Informative)
I don't remember which paper the result was in, but I do remember the overall idea of the proof.
The general proof says to handle t failures there must be 3t+1 nodes in total.
It is a proof by contradiction, so initially we assume the nodes can be split into three groups with each node being in exactly one of those three groups. And we assume that any two out of those three groups can reach a consensus without involving the third group. Now we'll prove that under those assumptions, the system breaks down.
So we imagine two completely functional groups out of those three; the network within each group is stable, but the network between them is slow. All the nodes in the third group suffer from a Byzantine failure, which causes them to send corrupted messages. Imagine that the third group of failing nodes is still communicating with each of the functional groups, but sends different information to each of them. Under those circumstances the failing group, along with one group of functional nodes, can reach consensus, because we assumed two groups can reach consensus without the third. But at the same time the failing group can reach consensus on a different result with the other group of functional nodes.
In the above partitioning into three groups, we could have t nodes in each group, in which case it is proven that with t failures among 3t nodes we cannot reach consensus. Additionally, there exist solutions that will reach consensus with t failures among 3t+1 nodes. They are randomized, which means the runtime is theoretically unbounded, but the probability that the protocol runs forever is zero; on average it completes quickly. For example, the Asynchronous Binary Byzantine Agreement protocol operates in rounds and has a 50% probability of finishing in any given round. If it fails to complete, it runs another round and has a 50% chance of finishing there. The idea in that protocol is that if there are two candidate results to agree on, with roughly the same number of nodes supporting each, the nodes flip a coin and try to agree on the result of the coin flip. Trying to agree on the coin flip can only fail if the coin suggested the result that was behind in the number of nodes supporting it. Hence there is at least a 50% chance the coin lands on a side that leads to agreement.
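A minimal sketch (not the real ABBA protocol, only the round-count argument it relies on): if each round independently reaches agreement with probability at least 1/2, the number of rounds is geometrically distributed, so the expected round count is at most 2 even though the worst case is unbounded.

```python
# Toy model of the round-count argument above, NOT an implementation of ABBA.
# Each round the shared coin lands on the "leading" value with probability 1/2,
# which is when the replicas can agree; otherwise they run another round.

import random

def rounds_until_agreement(rng: random.Random, p_finish: float = 0.5) -> int:
    """Simulate rounds until the coin flip lets the replicas agree."""
    rounds = 1
    while rng.random() >= p_finish:  # coin came up on the trailing value: go again
        rounds += 1
    return rounds

if __name__ == "__main__":
    rng = random.Random(42)
    samples = [rounds_until_agreement(rng) for _ in range(100_000)]
    print("average rounds:", sum(samples) / len(samples))  # close to 2.0
    print("worst observed:", max(samples))                  # finite, just unlucky
```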
The Byzantine failure model is a bit extreme, but that means protocols designed to work in that model are resilient to extreme failures. The crash-stop model, on the other hand, is a bit unrealistic, which means protocols designed for it are only proven correct under unrealistic assumptions. They may work in practice most of the time, but the proof of correctness isn't valid in the real world. I don't know if anybody has managed to come up with a sensible model that lies somewhere between those two.
Re: (Score:2)
If you use something like Ksplice, you can install the kernel security patches without rebooting, although I don't think they were doing that here. I'm so disappointed that Oracle bought Ksplice.
Re: (Score:3)
If you run everything on a "cluster" layer (your apps are not dependent or maybe not even aware of the noncluster layer) then you won't have such problems - you can reboot a node with minimal impact. In the old days the ones famous for uptimes were Tandem and VMS.
Re: (Score:3)
Yup. And running a full marathon is pointless and irrelevant - anyone could run 26.2 miles over a couple of months, half a mile at a time.
Re: (Score:3)
[There go the mod points]
Uptime is irrelevant for an individual server, anyway. If there's fail over (and there should be if uptime is important), take it down and update the kernel for security reasons, who cares?
Not all critical services are necessarily internet facing. I know of someone who had an application that ran continually for over 10 years, highly business-critical (master video stream controller for a TV network) and with very fancy attached hardware that was tricky to replicate. The hardware was gradually updated over that decade, as was the code of the application (dlclose() FTW!)
Re:So what did it do all that time? (Score:5, Funny)
Somewhere at my last job, there was a Solaris 8 machine with over 4000 days uptime, that everybody hated to do anything with, but one person loved it and refused to migrate the last service that was still on it to something more modern.
Uptime is irrelevant for an individual server, anyway. If there's fail over (and there should be if uptime is important), take it down and update the kernel for security reasons, who cares?
It's like Cory Doctorow said in When Sysadmins Ruled the Earth [craphound.com]:
“Greedo will rise again,” Felix said. “I’ve got a 486 downstairs with over five years of uptime. It’s going to break my heart to reboot it.”
“What the everlasting shit do you use a 486 for?”
“Nothing. But who shuts down a machine with five years uptime? That’s like euthanizing your grandmother.”
Re:So what did it do all that time? (Score:4, Informative)
No, it was idle "only" from day 3509 onward (it served as a hot backup in case we had to restore the service from the new machines).
Re:*nix does not need to reboot for more updates u (Score:4, Informative)
Kernel updates generally required reboots even in the Unix/Linux world. In Windows, you could also avoid a reboot if you stopped the services being patched and restarted them after the patch was applied.
Oracle sucks. (Score:5, Insightful)
I'd just like to leave this here. Yeah, I know Linux is great and everyfink, but Solaris is excellent and better in some ways. Oracle really ground my gears when they stopped supporting OpenSolaris and OpenIndiana is going nowhere fast.
RIP Sun.
Re: (Score:3)
Oracle never supported OpenIndiana, it's a distribution of illumos (the OpenSolaris fork).
Re:Oracle sucks. (Score:5, Informative)
I don't think his comment suggested anything else. You should probably parse it like this:
(Oracle really ground my gears when they stopped supporting OpenSolaris) && (OpenIndiana is going nowhere fast)
Oracle support only applies to the Left Side of the statement. The point of the statement was to suggest that with support gone, and the only alternative to the supported version going nowhere, the Solaris world is completely Shit Out of Luck.
Re: (Score:2)
Thanks, you're right. Misread it on account of not being awake for long :)
Re: (Score:3)
Re:Oracle sucks. (Score:4, Insightful)
OK, first off, it is not stolen. You cannot steal open source software. Oracle is following the GPL.
Second, Oracle was doing OEL before they acquired Sun.
Solaris is a technically good and high-quality OS, but its hardware support was limited. If you bought the Sun-branded boxes and Sun-branded cards, you were OK. However, if you were white-boxing a server, you had to be careful to select chipsets that were on their compatibility list, and even then support got murky.
I really, really love Solaris, but let's face the facts. Outside of the SPARC platform, there is no reason for Solaris. Linux does everything as well or nearly as well. Linux is weaker in some areas, but not weak enough to justify the cost and lock-in of Solaris.
Solaris exists for Oracle to milk legacy customers on support contracts who aren't ready or willing to migrate to Linux and commodity x86 hardware. There isn't much, if any, new development going on, and Oracle is only pushing Solaris to new customers as part of their big data warehouse solutions (where customers have $$$$$ and want to spend it with one vendor), where the goal is to get people locked in to one vendor.
Re: (Score:3)
I was talking about Solaris on Intel. Not sure where you got this from. In fact, you kind of reinfor
Re: (Score:3)
Actually, last I checked Linux cannot show you an uptime of 3737 days.
No, that's not a dig at Linux being unstable. The real reason is both more boring and more interesting at the same time. A Linux system with that kind of uptime would have to be running a kernel from a time when the uptime counter overflowed after roughly 497 days.
And yes, I've seen that happen. :-)
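For the curious, the arithmetic behind that rollover, under the assumption (mine, not the poster's) of the classic unsigned 32-bit jiffies counter ticking at HZ=100 as in old 2.x kernels:

```python
# Back-of-the-envelope check; assumes a 32-bit tick counter and HZ = 100.

HZ = 100                      # timer interrupts per second
wrap_seconds = 2**32 / HZ     # seconds until the 32-bit tick counter wraps
print(f"uptime counter wraps after ~{wrap_seconds / 86_400:.1f} days")  # ~497.1 days
```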
Re: (Score:2)
Should have switched to Linux instead.
Actually no, that would mean less work for me.
Re: (Score:2)
That sucks.
Well, based on the recruiting calls I get, it might be about time to start looking again.
a terrible disturbance in the /src (Score:5, Funny)
hey, that's three jokes there, take your pick.
Re: (Score:2)
Re:Oracle sucks. (Score:4, Interesting)
I will say that AIX is pretty good as well. In general, unless there is a show-stopper patch, or you install a driver like EMC PowerPath that requires a reboot due to its hooks in the kernel, you can keep AIX up for a long while, only really bothering to update and reboot when the latest technology level is released. If there are no security-specific issues, even that can be ignored, although it is wise to keep up on new firmware just in case.
Re:Oracle sucks. (Score:5, Insightful)
VMS isn't a Unix
So I've heard, but I believe it is a "computer operating system". Hence I thought it was a more appropriate comparison than to a bicycle.
and I don't believe you can get ahold of VMS any more
Then into the memory hole!
The IBM mainframes are too expensive
For whom? To operations like banks, for whom downtime is incredibly expensive, they're still worth it. For me, an UltraSPARC like the 280R breaks the piggy bank. I get my x86 hardware from other people's castoffs.
and not open source
As you pointed out, OSS Solaris is toast.
What's your point exactly?
Umm, that some other OS's are/were at least as reliable as Solaris. Was I being that obtuse?
Re: (Score:3)
Re: (Score:3)
time-honoured tradition of rebooting your Windows boxes as the first step in troubleshooting.
Laws! How I hate this debugging technique. There are some people I have worked with who would observe an issue with a program, completely skip reading any of the logging information, and jump straight to rebooting the machine. Fortunately, I try to write my applications to recover gracefully, so when the machine comes back up, the services start up and before long the application is right back to where it was before, working on the same piece of data and complaining in the log about it.
T'ain't nothin... (Score:2, Funny)
Last place I worked at still used token ring. Packet-Packet-Give baby!
Errr? (Score:2)
I'm not sure how uptime and Token Ring really compare. Though I will say that I haven't worked on *any* Token Ring since '94 -- and that was a Thomas Conrad bastardization that did 100 Mbit over fiber. Haven't touched the copper stuff since '92.
Last message in system log was . . . (Score:5, Funny)
. . . Mar 12 11:57:03 hedvig kernel:WILL I DREAM?
Kudos. (Score:2)
I used to name my boxen "hal", "sal", and so forth.
Re: (Score:3)
I never could make up my mind on the whole "boxen" thing. Some days it was irritating enough to kill over. Other days it would just slip out, like "pop" instead of "coke" from the lips of a southerner forced to live in Chicago for too long. At a minimum it does seem to show one's age though...
Re:Last message in system log was . . . (Score:4, Funny)
I was thinking:
Mar 12 11:57:03 hedvig kernel: So long, and thanks for all the bits.......
taken down early as a precaution (Score:4, Funny)
In another 57 years the uptime command might've had rollover issues.
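The arithmetic behind the joke, under the assumption (mine, not the poster's) that uptime is tracked as a signed 32-bit count of seconds:

```python
# Hypothetical limit: uptime stored as a signed 32-bit number of seconds.

SIGNED_32BIT_MAX = 2**31 - 1            # maximum representable seconds
elapsed = 3737 * 86_400                 # seconds already on the clock
remaining_years = (SIGNED_32BIT_MAX - elapsed) / 86_400 / 365.25
print(f"~{remaining_years:.1f} years until rollover")  # ~57.8 years, matching the joke
```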
This is news? (Score:5, Interesting)
I work at a Very Large Company (who must remain nameless.) We've got Solaris boxes that were last rebooted in the 90's. Yes. Really. Running Solaris 2.6, even.
Re:This is news? (Score:5, Insightful)
I work at a Very Large Company (who must remain nameless.) We've got Solaris boxes that were last rebooted in the 90's. Yes. Really. Running Solaris 2.6, even.
I am not surprised. I've seen Sparc/Solaris boxes run for very long times, and even when not properly cared for they have run times measured in months and years. I've had to shut down boxes to move them that had been running for 5 years. We were scared to death the disk drives would not spin back up after 2 days in the truck, but when we plugged them back in, they powered right back up. Sun built some SOLID hardware and produced a SOLID operating system.
Re:This is news? (Score:5, Interesting)
Amazingly enough, in my experience, two days in a truck is not nearly as bad as a few weeks in an extremely temperature-controlled, vibration free room.
The heads will weld to the platters if there's no vibration or movement after "spinning themselves flat" over many years' time.
Apparently, all the micro-projections on the surface of the heads and disks get worn off over time, making the disk and heads Extremely flat; they stick like glue when the air barrier between them escapes over time.
Thermal changes and ambient vibration are apparently enough to keep things 'fluid', and not as likely to stick.
YMMV.
Re: (Score:3)
Re:This is news? (Score:5, Funny)
I work at a Very Large Company (who must remain nameless.) We've got Solaris boxes that were last rebooted in the 90's. Yes. Really. Running Solaris 2.6, even.
I'm willing to hazard a guess who you work for. Let's see.. you're running servers that have an OS that was released in 1997, and apparently you haven't rebooted them since. Almost like your company is stuck in the mid- to late-90s. You're the only Slashdotter I've seen with an AOL instant messenger screen name in their profile. That can't be a coincidence. You work for AOL. They have you designing the latest Free CD labels.
Re: (Score:3, Funny)
Interesting, I left a Very Large Company in the late 90's after having set up a few Solaris 2.x machines for our R&D projects. I had a Quake server running on one of them. There was a lot of incentive to keep that server up.
Uptime fetish (Score:2, Insightful)
Re:Uptime fetish (Score:4, Insightful)
Funny because you're right - "Impressive UPS" is all I thought.
Re:Uptime fetish (Score:4, Insightful)
Otherwise it's a Solaris box which is missing A SHITLOAD OF PATCHES.
Apply a patch to a service and restart the service, not the whole computer. Or what am I missing?
Re:Uptime fetish (Score:4, Insightful)
Impressive if you can do that on the kernel and still be confident of stability.
Re: (Score:2)
You can't really patch the kernel while it's running
Re: (Score:2)
He's used to microsoft or apple products?
Re:Uptime fetish (Score:5, Insightful)
You have no idea if the system can start from a cold boot. And if it fails to start from a cold boot, you have no idea which of the hundreds of patches you've applied in the last 10 years is the one that is causing the boot process to fail, or if it's hardware that's randomly gone sketchy. The last known-good cold state is 10 years ago.
Power systems fail. Backup power is limited. Buildings get damaged and remodeled. For these reasons it is unwise to assume you will never need to power a system off. Even with the super hotswapping of the VAX you would occasionally need to move the system to a different building with new server rooms. If you never demonstrate that a server can safely power back on to a running state, you have no idea what state the system will be in when you do it.
Consider the system in this article for a moment. The last service was removed last year. Why was it left powered on? It was literally doing nothing but counting the seconds until it was shut down today. That's a disgusting waste of power.
Re:Uptime fetish (Score:5, Informative)
The summary is misleading. It was acting as a backup server for its own replacement.
Re: (Score:3, Funny)
Boy, you must be fun at parties.
Re:Uptime fetish (Score:5, Informative)
You can get patches, even kernel patches, without having to restart the system. That was one of its selling points back in the day; some systems even allowed you to hot-swap or hot-upgrade CPUs and memory.
Re: (Score:2)
And with the right hardware, my OpenSolaris box still does it. It "reboots" the kernel but never has to go through the whole BIOS thing. If you have the wrong drivers, however (like Areca), the system will simply complain that it can't quiesce the driver and reboot anyway.
Re: (Score:2)
Re: (Score:2, Troll)
Why would "missing patches" be of concern for a Unix machine?
That sounds like the sort of thing a WinDOS consumer would need to be fixated on, not an "educated sysadmin".
Missing services patches vs kernel patches (Score:2)
Why would "missing patches" be of concern for a Unix machine?
Missing service patches can leave you vulnerable to being hacked. Fortunately, you don't need a reboot to install those. Security-related kernel patches do happen, and they do require a reboot. However, these are generally of the privilege-escalation variety and require specially written code to exploit. If you don't have untrustworthy people logging in to your machine, it isn't a major problem if you don't have all the kernel patches.
Of more serious concern is the general lack of patches for Solaris 9.
Re: (Score:2)
The unfortunate software house where the dev teams are broken up after a project is complete. Then approvals are denied to patch systems because there are no devs to correct for any problems that occur due to the patch.
Uptime is all I have.
(FreeBSD box with 3,196 day uptime running internal DNS).
[John]
Re: (Score:2)
Why would the box hosting DNS need to stay up?
Mine could stay up that long, but there are a bunch of VMs doing that task so rebooting them is no big deal.
Linux, but no real reason not to do the same with FreeBSD.
Re:Uptime fetish (Score:5, Insightful)
If you don't care, you don't understand history. And sadly, looking at your attitude and phrasing, I get the feeling you're older than I am and should know better.
That you understand it's not worthy of worship is a mark in your favor -- but not as big as you're hoping.
It's not fanboyism. It's from the old cult of service. From taking your limited resources on a system that costs more than your pension, and absolutely positively guaranteeing they were available to your userbase.
We didn't all have round-robin DNS, sharding, and clouds in the early 2000s.
Some of us had Suns, BSDs, VAXen, and other systems that might be missing security fixes, but that by and large were secure as long as you made sure nobody who didn't belong on them had an account.
Kernel and driver patches? It might be a performance boost, it might be a security patch. It might be a driver problem that could cause data loss, but only if you were running a certain service. A great admin can choose which are needed. A good admin knows they should apply them all.
There's something to be said about rebooting machines -- just to make sure they'll still boot. But the best sysadmins didn't need to check -- they knew.
Uptime differentiated us from our little brothers running Windows, who couldn't even change network settings without a reboot. Who had to restart every 28 days or crash horribly. Who could be brought to a grinding halt with a single large ICMP request.
In short, uptime was an additional proxy variable for admin competence (given the presence of an unrooted box).
Yeah, any idiot could leave a system plugged into a UPS in a closet and have it come out OK. But if you didn't get cracked and filled with porn, you were doing something right.
Given elastic clouds, round robin DNS, volume licensing, SAS... it's very nearly cheaper to spin up a new image and run the install scripts than reboot these days.
I'm not convinced this makes modern sysadmin practices better -- just more resilient to single-host failure.
Just the other week we had a million dollar NAS go down for nearly 12 hours (during the week) while applying a kernel update to the cluster.
If you did that in 99 on a Unix system, you'd have probably been shot after the execs showed you out the door.
Somehow, the cult of service availability has been replaced with the cult of 'good enough'
Re:Uptime fetish (Score:4, Interesting)
The old adage holds true: iffen it ain't broke, don't fix it.
If the machine is in an area where security is important, certain security patches might be needed. But that's no certainty. Other patches - well, with an uptime of 10+ years, adding a stability patch which causes downtime seems rather counter-productive.
Then, experienced sysadmins, which you clearly are not, know that just as the most dangerous time for an airplane is during takeoff and landing, the most dangerous time for a server is during shutdown and startup. Stiction on old drives, minor internal power surges during boot that don't affect a running system, and much else can cause problems.
Oh, and there are also services that you may want to provide 24/7 with no downtime at all, so help you cod. You even mention one such in your nickname. But I have strong doubts whether you truly have kept that service up and running 24/7, even with failovers, if you install patches and reboot just to install patches and reboot.
Re:Uptime fetish (Score:4, Interesting)
On the other hand, I worked on a system for the US Navy that controlled Trident-I missiles... we rebooted both of our main computers every six hours to ensure that we could reboot them when needed - and the first one after midnight included an extensive hard drive self test to make sure it was working to spec. The gentleman down thread has it right, the answer to 100% uptime is redundancy and failover or switchover, not relying on nothing ever going wrong.
In addition, you seem to be unclear on the difference between a reboot and power cycling... In the latter case, if you're worried about stiction and power surges, that's an indication that you should have been thinking about replacing the machine for quite a while rather than hoping nothing ever goes wrong. Because eventually, something will - and when that happens, now you've potentially got two problems... the one that brought the machine to its knees, *and* the undiscovered ones, because you've never rebooted or cycled power.
Re: (Score:3)
if you're worried about stiction and power surges, that's an indication that you should have been thinking about replacing the machine for quite a while rather than hoping nothing ever goes wrong
More likely, someone should have thought of that long before the hardware became legacy. When a new sysadmin comes aboard, the best that can be done for legacy systems is often to keep spares and backups, and try not to trigger any faults. The software might not be supported, and the cost of porting can run to millions.
You're lucky if you've never had to support legacy systems. And a company that has them is lucky if they don't get a new sysadmin who first thing causes downtime by well-meaning patching t
Re: (Score:2)
Rarely do you see trollish behavior modded insightful, so I will bite.
First of all, "linux fanbois" pitched the uptime feature 13 to 15 years ago, when Windows stability was a joke. And feature-wise, Linux systems weren't less complex than Windows ones. Microsoft just made quite a number of fundamental mistakes in designing the Windows 95/98/98SE/ME line and also older Windows NT versions, such as having the graphics driver in ring 0. All this made Windows usable only with regular reboots. Yes, there was carefully
Re: (Score:2)
I will never for the life of me understand the "uptime fetish" that uneducated sysadmins have. Who the hell cares? The only people who give a crap about this sort of thing are linux fanbois.
I don't care about uptime per se, but I hate to try to attach to a screen(1) session only to discover it's gone because someone decided it was somehow "good for the machine" to have a power cycle.
I don't ask for years of uptime, just no gratuitous reboots.
Re: (Score:2)
Re:Uptime fetish (Score:4, Interesting)
It's not like Sun has issued very many Solaris 2.6 patches in the last few years...
Besides... many Solaris patches simply didn't require a full reboot. In fact, unless you were changing the kernel, there was no reason to, because it just took longer. Then there is the mission-critical system on an isolated network where you take an "If it ain't broke, don't fix it" approach. Who cares what patches are on it or not? The system just needs to work, day in and day out, sans patches.
Windows users amaze me with all the "got to reboot the box" they put up with. Install software? Reboot! Install new drivers? Reboot! Things start to slow down for unknown reasons? Reboot! I simply don't believe that it should be necessary to reboot a box very often. Reboots should not be required unless you are changing hardware and have to actually power it off or need to change parts of the memory resident portions of the operating system (i.e. the booted kernel image). Windows is getting better about this, but you still need to reboot it way too often for all the "recommended" patches to get installed.
Re: (Score:2)
I don't know about you, but if I am running a stable version of Solaris, with a version of whatever application that we run on it that is no longer having to be constantly updated, that is pretty much the holy grail of production. While a bazillion hours of uptime doesn't guarantee that is the case, it is a necessary condition for it happening.
As for patches, half the time new code causes as many problems as they fix. And besides, if you're running something like 2.6 or 2.8 on it, you're so far out of sup
Re: (Score:2)
What better relational query language? (Score:3)
MySQL can die for all I care. SQL likewise. Horrible language.
What language would you prefer to query a relational database?
Better than I've got. (Score:2)
Router's at "Time: 14:08:44 up 335 days, 13:29, load average: 0.37, 0.11, 0.02". That's the best I've got. Longest-running computer is "1:46pm up 280 days, 21:01, 3 users, load average: 0.00, 0.01, 0.00". Tho it is a roughly 15 year old machine and it's had longer runs than the current one, I doubt it's broken a thousand days straight. But 335 and 280 days is pretty good for equipment that's not plugged into a UPS.
Here's the real question... (Score:5, Interesting)
Did they power it back up again after shutting it off? Just to see?
Surprised no one posted this yet (Score:3, Funny)
Re: (Score:3, Funny)
Netware 3.12 (Score:5, Interesting)
One of my clients had a Netware 3.12 machine on site that operated continuously for about 16 years. It was retired unceremoniously when they moved to a new location, but in all its life that machine did not have a hardware fault or abend.
Not a good thing!!!! (Score:5, Insightful)
At the last place I worked that had server admins bragging about /years/ of uptime, that quickly turned into the discovery that we had thousands of servers that had not been patched in years. Only a few systems can patch the kernel without rebooting, and those are the exception, not the rule. It turned into a six-month project, but in the end we were patching systems that were vulnerable to 5-year-old exploits (a mix of *nix and Windows).
I had to make the case that server uptime meant jack, and that the only thing that mattered was /service/ uptime. Frankly, it is the service that needs to be always available, not the server. This is why you have maintenance windows: for the explicit purpose of allowing a given system to be patched and rebooted at a predictable time without interrupting services.
If your server is really that important, it will have a failover server for redundancy (SQL cluster, whatever). If your server isn't important enough to have a failover server for service redundancy, then it isn't so important that you can't have a maintenance window. Think service, not server!
The only thing that matters is service availability.
Truly Impressed (Score:2)
Re: (Score:2)
Old Unix has a ton of security holes though; I wouldn't want a web server right on the internet with it.
I've worked on Alpha VMS systems with 15 years uptime.
Re: (Score:3)
HP calls it OpenVMS now; their big Itanium boxes can run it, and the Alpha version is still supported until 2016:
http://h71000.www7.hp.com/openvms/openvms_supportchart.html [hp.com]
And they shut it down? (Score:3)
Kevin Flynn was trapped in there!
My /. user number? I'm honored! (Score:5, Funny)
I know, I know, it's just a coincidence...
AT&T 3B20D's (Score:3)
I don't know this for sure, but I suspect there is one out there with 30 years of uptime now, or damn close to that, running Unix-RTR as part of a 5ESS switch.
Re: (Score:2, Insightful)
Re: (Score:2)
Re: (Score:2)
Re:in other news ... (Score:5, Insightful)
a slab of concrete has been found with an uptime of 3737 years
You exaggerate. The oldest concrete structure I know of is the dome of the Pantheon, and that's only been around for 1887 years. Time will tell if it was well built.
Re: (Score:3)
a slab of concrete has been found with an uptime of 3737 years
You exaggerate. The oldest concrete structure I know of is the dome of the Pantheon, and that's only been around for 1887 years. Time will tell if it was well built.
Umm, who cares about what "you know of"? What matters is historical fact. The Colosseum, for example, contains large amounts of concrete and was finished a half-century before the Pantheon. Lots of concrete was used in rebuilding after the great fire in Rome in the mid first century as well. But, of course, Roman concrete was around for centuries before that.
And yet, all of this is irrelevant, since concrete was used in Egypt, Syria, China, and other places thousands of years earlier. There are in fa
Re: (Score:3)
maybe the sysadmins liked them but as a developer i hated solaris boxen. the libraries were always years old, nothing modern would compile, the cli tools were slightly incompatible with linux scripts, ...
They may be a pain to write and deploy programs on but they will run forever once you do...
Fully characterized platforms take a LOT of testing effort, and testing at this level takes lots of time. The Sparc/Solaris platform was behind the state of the art, but it was stable, stable, stable. Solaris on x86 wasn't bad, if your hardware was supported and you didn't really need the GUI to be local, but it wasn't as stable (mainly due to the hardware).
Sun did their stuff right for the most part, but got serio
Re: (Score:3)
Linux on SPARC (Score:2)
Re: (Score:2)
Myself, I hate when developers depend on the newest versions of libraries and stuff.
I run Debian on my servers. If your app can't run on top of the older versions of the libraries, then... I just don't need to run your app, at least not for a few years yet. I'll take "stable" over "modern", please.
Re: (Score:2)
And that's why concrete is such an awesome building material. In the short term, it might be a lot easier/faster to whip up a structure out of mud, sticks, and goat hide --- mixing, forming, pouring, curing concrete is a real pain. However, the mud-stick-goat solution doesn't work out so great if you need a structure that endures for the ages.
Re: (Score:2)
Incompatibility with Linux (I guess you mean GNU) scripts is understandable; incompatibility with basic POSIX requirements is not. If something works on both GNU and BSD systems, there's a good chance it's the fault of Solaris rather than the script.
(I dislike using the name "GNU/Linux", but here the distinction matters: GNU works on kfreebsd too (ie, BSD kernel, GNU userland), and if one's crazy, even on hurd. And I strongly suspect you didn't mean Android, which uses Linux but doesn't pretend to be UNIXy a
Re: (Score:3)
If you had a problem with POSIX compatibility on Solaris, it's because you don't know Solaris. There are specific paths you should specify for the various POSIX standards, /usr/xpg4, /usr/xpg6, etc. You might try "man -s 5 POSIX" for a start.
Re:in other news ... (Score:5, Informative)
/usr/xpg4/something is not /bin/sh, the latter being what POSIX requires.
Re: (Score:2)
If you don't enjoy stories about servers, you might not be on the right website.
Re:3737 days in years (Score:4, Insightful)