
Solaris Machine Shut Down After 3737 Days of Uptime

Posted by timothy
from the those-are-some-crazy-socks dept.
An anonymous reader writes "After running uninterrupted for 3737 days, this humble Sun 280R server running Solaris 9 was shut down. At the time of making the video it was idle; the last service it had was removed sometime last year. A tribute video was made with some feelings about Sun, Solaris, the walk to the data center and freeing a machine from internet-slavery."
  • Oracle sucks. (Score:5, Insightful)

    by RocketRabbit (830691) on Thursday March 14, 2013 @04:37PM (#43175745)

    I'd just like to leave this here. Yeah, I know Linux is great and everyfink, but Solaris is excellent and better in some ways. Oracle really ground my gears when they stopped supporting OpenSolaris and OpenIndiana is going nowhere fast.

    RIP Sun.

  • Uptime fetish (Score:2, Insightful)

    by DNS-and-BIND (461968) on Thursday March 14, 2013 @04:44PM (#43175839) Homepage
    I will never for the life of me understand the "uptime fetish" that uneducated sysadmins have. Who the hell cares? The only people who give a crap about this sort of thing are linux fanbois. The only thing this tells me is that this machine has had an uninterrupted power supply, which is mildly impressive. Otherwise it's a Solaris box which is missing A SHITLOAD OF PATCHES. WTF, sysadmins? What kind of pro sysadmin worships at the altar of individual machine uptime? Much less a Solaris sysadmin?
  • by Anonymous Coward on Thursday March 14, 2013 @04:45PM (#43175847)
    Somewhere at my last job, there was a Solaris 8 machine with over 4000 days uptime, that everybody hated to do anything with, but one person loved it and refused to migrate the last service that was still on it to something more modern.

    Uptime is irrelevant for an individual server, anyway. If there's fail over (and there should be if uptime is important), take it down and update the kernel for security reasons, who cares?
  • Re:Uptime fetish (Score:4, Insightful)

    by FileNotFound (85933) on Thursday March 14, 2013 @04:47PM (#43175875) Homepage Journal

    Funny because you're right - "Impressive UPS" is all I thought.

  • Re:Uptime fetish (Score:4, Insightful)

    by tepples (727027) <[moc.liamg] [ta] [selppet]> on Thursday March 14, 2013 @04:47PM (#43175877) Homepage Journal

    Otherwise it's a Solaris box which is missing A SHITLOAD OF PATCHES.

    Apply a patch to a service and restart the service, not the whole computer. Or what am I missing?

  • Re:Wow! (Score:2, Insightful)

    by black3d (1648913) on Thursday March 14, 2013 @04:53PM (#43175927)
    Really? Comparing it to Windows 95? You know that was almost 20 years ago, right? It's getting kinda old. A more apt comparison (considering Solaris 9's x86 release) would be "Wow! That is 2940 days more than Windows Server 2003 could stay up!" ;)
  • Re:Uptime fetish (Score:4, Insightful)

    by Richard_at_work (517087) <richardprice.gmail@com> on Thursday March 14, 2013 @04:54PM (#43175943)

    Impressive if you can do that on the kernel and still be confident of stability.

  • by h4rr4r (612664) on Thursday March 14, 2013 @05:04PM (#43176077)

    Just get it in writing.
    Been there, done that. When it has to come down for hardware failure or something like that, you can show you tried to get a backup machine and tried to do things right.

  • by ebno-10db (1459097) on Thursday March 14, 2013 @05:06PM (#43176101)

    a slab of concrete has been found with an uptime of 3737 years

    You exaggerate. The oldest concrete structure I know of is the dome of the Pantheon, and that's only been around for 1887 years. Time will tell if it was well built.

  • Re:Uptime fetish (Score:5, Insightful)

    by Anonymous Coward on Thursday March 14, 2013 @05:09PM (#43176127)

    If you don't care, you don't understand history. And sadly, looking at your attitude and phrasing, I got a feeling you're older than I and should know it better.

    That you understand it's not worthy of worship is a mark in your favor -- but not as big as you're hoping.

    It's not fanboyism. It's from the old cult of service. From taking your limited resources on a system that costs more than your pension, and absolutely positively guaranteeing they were available to your userbase.

    We didn't all have roundrobin DNS, sharding, clouds in the early 2000's.

    Some of us had Sun's, BSD's, Vaxen, and other systems that might be missing security fixes, but that by and large were secure as long as you made sure nobody that didn't belong on it had an account.

    Kernel and driver patches? It might be a performance boost, it might be a security patch. It might be a driver problem that could cause data loss, but only if you were running a certain service. A great admin can choose which are needed. A good admin knows they should apply them all.

    There's something to be said about rebooting machines -- just to make sure they'll still boot. But the best sysadmins didn't need to check -- they knew.

    Uptime differentiated us from our little brothers running Windows, who couldn't even change network settings without a reboot. Who had to restart every 28 days or crash horribly. Who could be brought to a grinding halt with a single large ICMP request.

    In short, uptime was an additional proxy variable for admin competence (given the presence of an unrooted box).

    Yeah, any idiot could leave a system plugged into a UPS in a closet and have it come out OK. But if you didn't get cracked and filled with porn, you were doing something right.

    Given elastic clouds, round robin DNS, volume licensing, SAS... it's very nearly cheaper to spin up a new image and run the install scripts than reboot these days.

    I'm not convinced this makes modern sysadmin practices better -- just more resilient to single-host failure.

    Just the other week we had a million dollar NAS go down for nearly 12 hours (during the week) while applying a kernel update to the cluster.

    If you did that in 99 on a Unix system, you'd have probably been shot after the execs showed you out the door.

    Somehow, the cult of service availability has been replaced with the cult of 'good enough'

  • Re:Uptime fetish (Score:5, Insightful)

    by Bacon Bits (926911) on Thursday March 14, 2013 @05:10PM (#43176135)

    You have no idea if the system can start from a cold boot. And if it fails to start from a cold boot, you have no idea which of the hundreds of patches you've applied in the last 10 years is the one that is causing the boot process to fail, or if it's hardware that's randomly gone sketchy. The last known-good cold state is 10 years ago.

    Power systems fail. Backup power is limited. Buildings get damaged and remodeled. For these reasons it is unwise to assume you will never need to power a system off. Even with the super hotswapping of the VAX you would occasionally need to move the system to a different building with new server rooms. If you never demonstrate that a server can safely power back on to a running state, you have no idea what state the system will be in when you do it.

    Consider the system in this article for a moment. The last service was removed last year. Why was it left powered on? It was literally doing nothing but counting the seconds until it was shut down today. That's a disgusting waste of power.

  • by onyxruby (118189) <onyxrubyNO@SPAMcomcast.net> on Thursday March 14, 2013 @05:11PM (#43176159)

    Last place I was at that had server admins that bragged about /years/ of uptime quickly turned into a discovery that we had thousands of servers that had not been patched in years. Only a few systems can patch the kernel without rebooting and those are the exception, not the rule. It turned into a six month project but in the end we were patching systems that were vulnerable to 5 year old exploits (mix of *nix and Windows).

    I had to make the argument that server uptime meant jack, and to make it I put forward the argument that the only thing that mattered was /service/ uptime. Frankly it is the service that needs to be always available, not the server. This is why you have maintenance windows, for the explicit purpose of allowing a given system to be patched and rebooted at a predictable time without interrupting services.

    If your server is really that important it will have a failover server for redundancy (SQL cluster, whatever). If your server isn't important enough to have a failover server for service redundancy, then it isn't so important that you can't have a maintenance window. Think service, not server!

    The only thing that matters is service availability.
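
As a rough illustration of the "think service, not server" point: if each server is up a fraction a of the time, a service that survives as long as at least one of n redundant servers is up has availability 1 - (1 - a)^n. This is only a sketch. The independence of failures, the flawless failover, and the 99% example figure are all assumptions, not numbers from this thread.

```python
def service_availability(server_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Simplifying assumptions: replica failures are independent, and failover
    itself is instant and never fails.
    """
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("server_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - server_availability) ** replicas


if __name__ == "__main__":
    # Hypothetical figure: each server is unavailable 1% of the time
    # (patching, reboots, hardware faults).
    for n in (1, 2, 3):
        print(n, f"{service_availability(0.99, n):.6f}")
```

Under these assumptions a second server takes the service from two nines to four, which is why a reboot of any single box stops mattering once failover actually works.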

  • Re:This is news? (Score:5, Insightful)

    by bobbied (2522392) on Thursday March 14, 2013 @05:21PM (#43176265)

    I work at a Very Large Company (who must remain nameless.) We've got Solaris boxes that were last rebooted in the 90's. Yes. Really. Running Solaris 2.6, even.

    I am not surprised. I've seen Sparc/Solaris boxes run for very long times and even when not properly cared for have run times measured in months and years. I've had to shut down boxes to move them that had been running for 5 years. We were scared to death the disk drives would not spin back up after 2 days in the truck, but when we plugged them back in, they powered right back up. Sun built some SOLID hardware and produced a SOLID operating system.

  • Re:Oracle sucks. (Score:5, Insightful)

    by ebno-10db (1459097) on Thursday March 14, 2013 @06:05PM (#43176759)

    VMS isn't a Unix

    So I've heard, but I believe it is a "computer operating system". Hence I thought it was a more appropriate comparison than to a bicycle.

    and I don't believe you can get ahold of VMS any more

    Then into the memory hole!

    The IBM mainframes are too expensive

    For whom? To operations like banks, for whom downtime is incredibly expensive, they're still worth it. For me, an UltraSPARC like the 280R breaks the piggy bank. I get my x86 hardware from other people's castoffs.

    and not open source

    As you pointed out, OSS Solaris is toast.

    What's your point exactly?

    Umm, that some other OS's are/were at least as reliable as Solaris. Was I being that obtuse?

  • by Anarchduke (1551707) on Thursday March 14, 2013 @06:11PM (#43176823)
    an even more important part of your job than ensuring failover. that is, covering your ass.
  • by kasperd (592156) on Thursday March 14, 2013 @06:27PM (#43176941) Homepage Journal

    mission critical function with no fail over

    It is surprisingly hard to guarantee data integrity when doing a fail over.

    If you want to guarantee a system keeps operating and maintains data integrity when a single computer fails, you need at least another three computers that are still running with no failures. There is a mathematical proof for this.

    If you want to go lower than four computers, you have to make assumptions about how the failures behave. And if just one computer fails in a way that does not match your assumptions, the system will fail.

    If you do decide to go with the four computers required to handle a single failure, the protocols to ensure they agree on the current state of your data are quite complicated. The protocols have to be non-deterministic. That's another proven fact. No matter how many machines you throw at the problem, a deterministic protocol cannot handle even a single failure.

    You can get around the non-deterministic requirement if you make assumptions about the timing of communication. But you'd slow down the system unnecessarily because you'd have to wait for the maximum time you assumed packet delivery could take on every operation, and if the network was slower than you assumed, the system would fail.

    Knowing how difficult fail over can be, it is no surprise that sometimes it is decided to not bother with it and instead hire an operator, who you assume can make everything be ok as long as you have backups plus spare hardware ready to put in production.
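
The "another three computers" figure matches the classic Byzantine fault tolerance bound, where tolerating f arbitrarily misbehaving machines takes at least 3f + 1 machines in total. That reading is an assumption, since the comment does not name the result. If failures are assumed to be clean crashes only (one of the "assumptions about how the failures behave"), a majority scheme needs 2f + 1. A minimal sketch of the arithmetic, with hypothetical function names:

```python
def min_machines_byzantine(faults: int) -> int:
    """Minimum total machines to tolerate `faults` arbitrary (Byzantine) failures: 3f + 1."""
    if faults < 0:
        raise ValueError("faults must be non-negative")
    return 3 * faults + 1


def min_machines_crash_only(faults: int) -> int:
    """Minimum total machines if failed machines only ever stop (crash faults): 2f + 1."""
    if faults < 0:
        raise ValueError("faults must be non-negative")
    return 2 * faults + 1


def byzantine_quorum(faults: int) -> int:
    """Matching replies needed out of 3f + 1 machines: 2f + 1.

    Any two quorums of this size share at least f + 1 machines,
    so at least one machine in the overlap is not faulty.
    """
    if faults < 0:
        raise ValueError("faults must be non-negative")
    return 2 * faults + 1


if __name__ == "__main__":
    for f in (1, 2):
        print(f, min_machines_byzantine(f), min_machines_crash_only(f), byzantine_quorum(f))
```

For a single failure this gives four machines in the general case and three if you trust failures to be crashes, which is the trade-off the comment describes.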

  • Re:Oracle sucks. (Score:4, Insightful)

    by Reschekle (2661565) on Thursday March 14, 2013 @06:45PM (#43177067)

    OK, first off, it is not stolen. You cannot steal open source software. Oracle is following the GPL.

    Second, Oracle was doing OEL before they acquired Sun.

    Solaris is a technically good and high quality OS but its hardware support was limited. If you bought the Sun-branded boxes and Sun-branded cards, you were OK. However if you are white-boxing a server, you had to be careful to select chipsets that were on their compatibility list. Then support got murky at that point even then.

    I really, really love Solaris, but let's face the facts. Outside of the SPARC platform, there is no reason for Solaris. Linux does everything as well or nearly as well. Linux is weaker in some areas, but not weak enough to justify the cost and lock-in of Solaris.

    Solaris exists for Oracle to milk legacy customers on support contracts who aren't ready or willing to migrate to Linux and commodity x86 hardware. There isn't much if any new development going on, and Oracle is only pushing Solaris to new customers as part of their big data warehouse solutions (where customers have $$$$$ and want to spend it with one vendor) where they want to get people locked in to one vendor.

  • by bobbied (2522392) on Thursday March 14, 2013 @06:47PM (#43177077)

    Only a few systems can patch the kernel without rebooting and those are the exception, not the rule.

    You sir, must be a windows admin... Server up-time DOES mean a lot when you have an SLA that specifies 99.999%. Rebooting complex systems of servers just to apply patches simply doesn't fit in the allowed down time. Your mileage apparently varies, and obviously your up-time requirements are lower.

    Windows services have grave difficulty with "five nines", heck X86 hardware has problems meeting that. You have to reboot once a quarter or more just to install the required patches on Windows and that will drive you under five nines. Put a Windows box on the internet and you had better keep pace with the patches or else. Not so with Sparc/Solaris. Solaris/Sparc hardware could easily meet 99.999% with limited amounts of fuss and patching was usually not an issue with these systems. Solaris was SOLID, the hardware was SOLID, these systems just run and if you set them up correctly they were safe enough to run unpatched when necessary.
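
For scale, here is what a 99.999% SLA leaves as a downtime budget. This is a back-of-the-envelope sketch, assuming a 365.25-day year and that every second of a reboot counts against the SLA; the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length


def downtime_budget_seconds(availability_percent: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Seconds of downtime allowed in a period at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_seconds * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    yearly = downtime_budget_seconds(99.999)
    print(f"per year:    {yearly:.1f} s ({yearly / 60:.2f} min)")
    print(f"per quarter: {yearly / 4:.1f} s")
```

Five nines works out to about 5.26 minutes per year, or roughly 79 seconds per quarter, so a single patch reboot that takes longer than that uses up the quarter's share unless another machine carries the service in the meantime.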

  • by hcs_$reboot (1536101) on Thursday March 14, 2013 @09:41PM (#43178627)
    Thanks, not so many people know that there are 3650 days in 10 years, especially the geeks here on /.
  • by crutchy (1949900) on Friday March 15, 2013 @02:48AM (#43180125)

    best way to justify the job you do is to create work for yourself

    in IT you can covertly install a virus, which will have half your users begging to get things back up and running and the other half berating you for not doing your job

    the last thing you want to do is increase your efficiency to the point where management thinks you are no longer required or that your role can be filled by a machine or some kid fresh out of school

    or if you're a department of defense big brass knob, you need to justify spending billions of tax payer money, so you blow up 2 skyscrapers and scare the crap outta the public so they give you more money to go off and fight the world :)