Linux Supercomputer Wins Weather Bid
Greg Lindahl writes "The Forecast Systems Laboratory, a division of NOAA, selected HPTi, a Linux cluster integrator, to provide a $15 million supercomputing system over the next 5 years. The computational core of this system is a cluster of Compaq Alphas running Linux, using Myrinet interconnect.
Check out www.hpti.com for information on the company."
Linux Not Useful For All Superclustering Tasks (Score:2)
We looked at putting together a Beowulf Linux cluster to run our software, which is very memory and processor intensive, but Linux could not do the job because JVMs on Linux are absolutely terrible. We wound up on WinNT (we couldn't afford Suns, but plan to upgrade when we can) because the JVMs were the best.
Because people making large software systems are fed up with reengineering for new hardware, expect other people to start choosing Java for large, intensive applications that were previously written in C, Fortran, C++, etc.
If Linux can't compete with other OSes for running large Java programs, these projects will not be able to consider Linux as their OS of choice (which we all WANTED to do here, we were very upset to go to NT).
Right now the fastest Java environment we've found is Java 2 with HotSpot, running on NT (we're testing Solaris now, as we might be able to afford Suns soon). Can the Linux community do any better, or even as well? So far, no.
Re:Why Alpha's? Screaming FP performance, that's w (Score:1)
Perhaps in the future a cluster of G4's will be used. The gcc compiler should/may be generating more efficient code in the future as improvements are being made. IIRC, Apple is using gcc in the development of the forthcoming Mac OS X.
Nonetheless, it is nice to see the federal government go this route.
The link also shows the k7 as faster (Score:1)
I am also looking at PowerPC G3 speed for a new PowerPC Linux box, and the standard P3 was almost twice as fast. I am a former Mac guy and I used to regard anything from Apple in benchmarking as fact, but either Apple is really lying (they probably are) or this test is biased. I think you should find out what this test was trying to prove. Something is really screwed up.
Re: Solving PDEs (Score:1)
The major controlling factor is the model. For fluid dynamics, approximations are made to make the problem solvable (stuff like ...). Of course, the input parameters/data can play a major role. If the problem is chaotic, one has to run a whole bunch of scenarios to obtain a statistical model.
My only dispute with what you said is that if the model is wrong, the results may be wrong. Running three models with limitations that yield the same result may not give you the right answer. Additionally, chaotic effects can lead to bad results.
And to the idiot who commented about no advances in math, I would like to say that while the math (e.g., 1+1=2) may remain the same, the physical model may be different.
Re:Why Alpha's? Because they are fastest (Score:1)
Scientific, vector processor tuned codes are known to run fastest on the Alpha 21264 + Tsunami memory chipset, so it is the only choice for a no-compromise, fastest computer in the world solution.
Take a look at the benchmark numbers (albeit limited) on http://www.hpti.com/clusterweb/ for some initial results.
Now, on the choice of Myrinet... This is a more interesting question.
Any takers?
No_Target
How about 400 MB/s Sustained? (Score:1)
An enabler for cluster effectiveness is the Fibre Channel Storage Area Network, a technology that allows multiple hosts to read _and_ write to the same file at the same time at very high bandwidth.
In fact, the I/O bandwidth of a cluster in this context is still limited by the speed of the PCI busses on one node if you are serializing the I/O to that one node. If this is the case, the XP1000 will sustain about 250+ MB/s with three to four Fibre Channel Host Bus Adapters on its two independent PCI busses. If your software can distribute the I/O to multiple nodes, as FSL's parallel weather forecasting API (SMS) can, then your I/O bandwidth is essentially limited by your budget for RAID systems, Fibre Channel switches and HBAs.
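To make the idea concrete, here is a generic sketch (plain MPI-IO rather than FSL's actual SMS API, with made-up file names and sizes) of each node writing its own slab of one shared file, so the transfer is spread across every node's PCI busses and HBAs:

/* Minimal MPI-IO sketch: each node writes its own region of one shared
 * file, so aggregate bandwidth scales with the number of nodes and HBAs.
 * Generic illustration only -- not FSL's SMS API. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t chunk = 64UL * 1024 * 1024;        /* 64 MB slab per node */
    double *buf = calloc(chunk / sizeof(double), sizeof(double));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "forecast.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Non-overlapping offsets: rank 0 writes the first chunk of bytes,
     * rank 1 the next chunk, and so on, all in parallel. */
    MPI_File_write_at(fh, (MPI_Offset)rank * chunk, buf,
                      (int)(chunk / sizeof(double)), MPI_DOUBLE,
                      MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

Something along these lines scales aggregate write bandwidth with the number of nodes, which is exactly where the RAID/switch/HBA budget comes in.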
No_Target
Re:Linux Not Useful For All Superclustering Tasks (Score:1)
However, native code is always an improvement over bytecode.
Makes me wish Juice had been more successful... (Juice was/is the platform-independent binary format used for Oberon; its loader translated it quickly to native code). The current equivalent of Juice is ANDF.
-Billy
A few words about WRF (Score:1)
A few other thoughts:
Hi, Greg! Didn't know you were here!
Re:Are they afraid? (Score:1)
benchmarks (Score:1)
gcc is gcc version 2.95.1 19990816 (release). Compile time options: -O9 -mcpu=ev56
ccc is Compaq C T6.2-001 on Linux 2.2.13pre6 alpha. Compile time options: -fast -noifo -arch ev56
The benchmark consisted of running two scripts through the CGI version of PHP4 [php.net]. We compare user times as measured by time(1). The tests were run three times, the shown results are mean values. The scripts are available from the Zend homepage [zend.com]. PHP was configured with --disable-debug.
The test shows that the code ccc produced was about 10% faster than gcc's. Other conclusions are left as an exercise to the reader.
The Real reason they chose linux (Score:1)
Re:Linux Not Useful For All Superclustering Tasks (Score:2)
Can you elaborate without giving away the company secrets?
attn: moderator - follow the link! (Score:2)
I'm crying foul on the moderations I've been given on this story. It's true that the government finds ways to mess things up, e.g. crypto laws, software patents, etc.
M2 has seemed to make moderations a bit more accurate, but I don't see it working out for me here. Unless somebody actually goes to the page [berkeley.edu] and sees what I'm talking about -- "Alpha" in ten hours, and the EV series are cranking out units faster than LensCrafters...
I didn't make up those "CPU's". They are actually listed on the page! Please follow the link [berkeley.edu] and see for yourself.
--
Re:An interesting observation (Score:1)
Check out: http://linuxtoday.com/stories/10157.html
.signature not found
Re:Linux Not Useful For All Superclustering Tasks (Score:2)
Re: Solving PDEs (Score:2)
The very definition of "chaos" is high sensitivity to changes in the initial conditions. If a weather front appears in the same place (within the resolution of the data grid) on all 120-hour forecasts despite a reasonable variation in the initial conditions, you can be pretty sure it isn't in a chaotic realm and your forecasts will be fairly accurate.
On the other hand, if a modest amount of variation in the initial conditions results in wildly different predictions, the system is obviously in a chaotic realm and you can't make decent predictions.
As odd as it sounds, for something as large as a planetary atmosphere it's quite reasonable for parts of the system to be chaotic while other parts are boringly predictable. That's why they were starting to compare the predictions from different models, the same models with slightly different initial conditions, etc. That might give the appropriate officials enough information to decide to evacuate a coastline (at $1M/mile), or to hold off another 6 hours since the computers predict the storm will turn away.
P.S., the models do make mistakes, but fewer than you might expect. It's been years since I've thought about it, but as I recall most models work in "isentropic" coordinates and are mapped to the coordinates that humans care about at the last step. The biggest problem has been the resolution of the grids; when I left I think the RUC model was just dropping to 60km; by now it's probably 40 or 30km. To get good mesoscale forecasts (which cover extended metro areas, and should be able to predict localized flooding) you probably need a grid with 5 or 10 km resolution.
Re:Software? (Score:1)
Dude... I think you just compressed an entire episode of Star Trek into six sentences.
Re:Great. I still wonder about the compilier thoug (Score:1)
Re:How about 400 MB/s Sustained? (Score:1)
Re:Not a beowulf? (Score:1)
Beowulf refers to the tools created at NASA Goddard CESDIS [nasa.gov]
This cluster uses MPI and tools developed by the University of Virginia's Legion Project [virginia.edu]
Beowulf has become, to some, a generic term for a Linux cluster, like Kleenex to tissues.
Mark Vernon HPTi
Re: Solving PDEs (Score:1)
NOAA doesn't care what's in the machine (Score:1)
If SGI or IBM (the two other leading competitors) had won, the press release wouldn't have mentioned Irix or AIX either.
HPTi could deliver 10,000 trained monkeys in a box if it met the performance requirements.
The fact that a Linux solution could exceed the performance of an SGI or IBM supercomputing solution is important to the Linux community, but not directly to NOAA.
Mark Vernon
HPTi
I have Compaq's compiler, and it kicks ass. (Score:1)
I've seen 280% speedups over gcc's best effort, more than justifying the 100% price premium of the hardware over (for instance) dual PIII boxen.
If I was going to put in a number crunching cluster (and I may) AlphaLinux would be the best way for me to go, cutting 40% from my TCO over IntelLinux.
Thanks Compaq!
Re:You got it all mistaken dude. (Score:1)
The purpose of a hypercube or 5-D torus is to have the shortest path to as many places as possible, instead of hopping onto that megapipe and making a stop at every node to see who wants to get off.
Technically you are correct. What I wanted to illustrate, though, is that in big NUMA boxes you have one copy of the kernel running all processors. With a Beowulf system, and a Cray T3E I believe, you have a local copy of the kernel on each node of one or two processors. This negates the SMP problems of Linux on multi-CPU machines.
its not really free, but WHY? (Score:1)
There would be no reason not to just make their own back end to gcc for the Alpha.
I still don't know why Compaq's doing this...
Re:Linux Supercomputer Wins Weather Bid (Score:1)
Granted, it's not a 2 to (1 + 1) performance ratio in the truest sense but the concept is valid if not the accuracy of my description.
On top of that, the previous post said nothing about running on 32bit. Alpha and several other currently available systems are running 64bit today (and for the past several years). True, x86 is not 64bit. IA-64 is not really an x86 processor but the next generation from Intel. IA-64 will bring Intel more in line with what other chip manufacturers have been doing for extreme high end systems for years and will bring it to prominence on the desktop.
D. Keith Higgs
CWRU. Kelvin Smith Library
Re:its not really free, but WHY? (Score:1)
How would this work without an OS (Score:1)
oops (Score:1)
Re:benchmarks (Score:1)
If you want to benchmark, then do a meaningful benchmark!
Why Alpha's??? (Score:2)
I've installed Linux once on an Alpha box and the BIOS is truly impressive, much better than PCs. But what are some of the other reasons? Wider data/cpu buses? Larger memory configurations?
Anyone who actually uses Linux on Alphas is encouraged to reply.
Re:Why Alpha's??? (Score:2)
Probably the best thing is that engineers like alphas, and they like linux.
Pan
Beowulf Cluster! (Score:1)
Good to see this sort of thing (Score:1)
Along with the use of Linux in digital VCRs and other Internet appliances this goes a long way to validating Linux as a viable, and very flexible commercial platform.
-josh
Re:Ok, here's your chance... (Score:2)
Why Alpha's? Screaming FP performance, that's why (Score:3)
I've installed Linux once on an Alpha box and the BIOS is truly impressive, much better than PCs. But what are some of the other reasons? Wider data/cpu buses? Larger memory configurations?
The big thing about the Alpha for people like NOAA (who run big custom number-crunching apps written in FORTRAN) is its stellar FP performance. A 500MHz 21264 Alpha peaks at 1 GFLOPS and can sustain 25-40% of that, because of the memory bandwidth available. A Pentium III Xeon at the same clock rate peaks at 500MFLOPS and can sustain 20-30% of that.
That doesn't fly for everybody, though. Where I work, we have a huge hodgepodge of message-passed, shared-memory, and vector scientific codes, plus needs for some canned applications that aren't available on the Alpha. We picked quad Xeons for our cluster and bought the Portland Group's compiler suite to try to get some extra performance out of the Intel chips.
Re:Why Alpha's??? (Score:3)
(UP2000 21264 667MHz -Alpha Processor Inc)
53.7 SPECfp95
32.1 SPECint95
The P3 is
(SE440BX2 MB/550MHz P3 -intel)
15.1 SPECfp95
22.3 SPECint95
An interesting observation (Score:1)
This is good news, but it only affirms the role of Linux in niche markets. It will be some time before it is accepted widely as a general purpose business or desktop solution.
Go to this link (Score:3)
Compare the SPECfp scores of high-end Intel and Alpha offerings. Take a look at a 600MHz PIII Xeon and a 667MHz Alpha 21264.
The reason to choose Alpha should be obvious.
will it live up to expectations? (Score:1)
Re: Solving PDEs (Score:4)
NOAA needs to solve partial differential equations (PDEs). A *lot* of PDEs. My class spent a lot of time on numerical solution methods, and the material from my entire undergraduate class in the early 80's was covered in the first lecture of my graduate class a few years ago. My Palm Pilot, running multigrid analysis, could beat the pants off a Cray X-MP running the best known algorithm from 15 years ago.
AI programs may not scale well, but the type of work done at NOAA *does*. Furthermore, the hot topic a few years ago was applying some ideas from chaos theory to weather forecasts - take a dozen systems, insert just a little bit of noise into the initial data (essentially, instrument noise in your observations), then let them all run. If all models show the same weather phenomena, you can be pretty sure it will occur. If the models show wildly different results (e.g., Hurricane Floyd slams into Key West in one run, but NYC in the other) you know that you can't make any firm predictions. As an educated layman's guess, I expect that the reason the hurricane forecasts are so much better than just a few years ago is precisely this type of variational analysis.
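The skeleton of that ensemble trick is simple; here's a toy sketch (a made-up one-variable stand-in for the real PDE integrator, just to show the shape of it):

/* Toy ensemble sketch: perturb the initial condition with small noise,
 * integrate each member forward, and look at how far the forecasts spread.
 * The "model" here is a one-line stand-in, not a real weather code. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MEMBERS 12
#define STEPS   120                         /* e.g. hourly steps out to 120 h */

static double step_model(double state)      /* placeholder for the PDE solver */
{
    return state + 0.1 * sin(state);
}

int main(void)
{
    double forecast[MEMBERS];
    double base = 1.0;                      /* observed initial condition */

    for (int m = 0; m < MEMBERS; m++) {
        /* small random perturbation ~ instrument noise */
        double state = base + 0.01 * ((double)rand() / RAND_MAX - 0.5);
        for (int t = 0; t < STEPS; t++)
            state = step_model(state);
        forecast[m] = state;
    }

    /* Small spread across members: trust the forecast.
     * Large spread: the system is in a chaotic regime. */
    double min = forecast[0], max = forecast[0];
    for (int m = 1; m < MEMBERS; m++) {
        if (forecast[m] < min) min = forecast[m];
        if (forecast[m] > max) max = forecast[m];
    }
    printf("ensemble spread: %g\n", max - min);
    return 0;
}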
Re:Why Alpha's? Screaming FP performance, that's w (Score:3)
If the G4 can sustain >1gflops, then why not build a cluster of G4s running LinuxPPC?
I'm not convinced the G4 can sustain 1 GFLOP/s in any kind of real calculation -- it simply doesn't have enough memory bandwidth. The G4 uses the standard PC100 memory bus, AFAIK. That's 64 bits wide running at 100MHz = 800MB/s peak. So without help from the caches, the absolute best you can do on *any* PC100 based system is 200 MFLOP/s using 32-bit FP or 100 MFLOP/s using 64-bit FP. In practice you can only sustain about 300-350 MB/s out of the PC100 memory bus, so things get even worse. The caches will help quite a bit (maybe a factor of 2-4), but I have trouble imagining the G4 being able to sustain over 500 MFLOP/s even on something small like Linpack 100x100 because of the limited bandwidth and latency of the PC100 bus. Other processors that have similar peak FP ratings have much higher memory bandwidths; we've benchmarked an Alpha 21264 (1 GFLOP/s peak, ~400 MFLOP/s sustained) at about 1 GB/s memory bandwidth (that's measured, not peak), and a Cray T90 CPU (1.8 GFLOP/s peak, ~700 MFLOP/s sustained) at 11-13 GB/s (again, measured not peak).
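A toy triad kernel along these lines (not the actual benchmark code we used) is enough to see the effect: once the arrays fall out of cache, measured MFLOP/s is pinned to the memory bus, not the FP unit.

/* Bandwidth-limited triad sketch: a[i] = b[i] + s*c[i] does 2 flops per
 * iteration against roughly 24 bytes of memory traffic, so sustained
 * MFLOP/s is capped by bandwidth once the arrays no longer fit in cache. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (8 * 1024 * 1024)                 /* 64 MB per array: well past cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_t t0 = clock();
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];           /* 2 flops, ~24 bytes of traffic */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("%.1f MFLOP/s, %.1f MB/s\n",
           2.0 * N / secs / 1e6, 24.0 * N / secs / 1e6);
    free(a); free(b); free(c);
    return 0;
}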
There's also the question of compilers. You have to have a compiler that recognizes vectorizable loops and generates the appropriate machine code to use the vector unit. Unless Motorola's feeling *really* magnanimous, I don't see that kind of technology making it into gcc (and g77, more importantly for scientific codes) any time soon. Otherwise, you're at the mercy of a commercial Fortran compiler vendor like Portland Group or Absoft. PGI hasn't shown any interest in PowerPC to this point, and Absoft currently does PPC compilers only for MacOS 8, not OSX or LinuxPPC.
I'd love to be proven wrong on this, but based on my experience I don't see how you could do it.
Re:Linux Not Useful For All Superclustering Tasks (Score:1)
The answer is you *DON'T*. This is basically crap from the JAVA crowd trying to pretend that JAVA is actually something you'll want to use in the real world. The Amiga ARexx crowd used to run around pulling the same kind of stunts too. I wouldn't be too surprised to discover that a large number of the JAVA advocates posting here also ran around advocating the use of ARexx for *everything* on the Amiga, no matter how silly it was.
Re:Linux Supercomputer Wins Weather Bid (Score:1)
Every six hours the National Weather Service sends out a series of models to all of its forecast offices around the country to help in local forecasting. Each model is based on a massive amount of information that comes in to their central office, and that information is used in preparing the next set of forecasts. Now, you would want a) a system that is capable of processing all of this information rapidly and reliably, with b) redundancy built in so that if part of the system goes down, you're still able to digest and transmit those models. Using a cluster of systems gives you that backup redundancy, and using a stable operating system gives you the speed and reliability to churn out models dependably.
The people at NOAA likely couldn't care less about advocacy in this respect. What they want is a system they can use, one that provides the reliability and performance demanded, at a reasonable cost. $15 million for a distributed cluster that gives them a lot more bang for the buck is definitely money well spent. And remember, this IS your tax dollars at work, one of the few times you will ever see it spent on a truly worthwhile cause.
-Tal Greywolf
Great. I still wonder about the compilier though (Score:2)
Re:One more thing... (Score:1)
MS also won the award for best VM at JavaOne.
Re:Why Alpha's??? (Score:1)
... PCI can support, so your NIC becomes a message-passing bottleneck without 64-bit PCI.
There are various types of alphas available. As has already been mentioned, the 21264 (ev6) is the latest and greatest. Price/performance wise, however, you simply can't beat its older cousin, the 21164 (ev56). Volume sales have driven the cost of the 21164 down to right around the same cost as a similarly clocked Intel box.
Someone mentioned the K7, or AMD Athlon, as being faster than an Alpha. Not true. It has exactly the same floating point peak, and has the same bus as the ev6. However, due to its x86 instruction set, software has access to only 8 floating point registers, which means achievable peak is going to be quite a bit lower for the Athlon than for the ev6 (you wind up continually reloading stuff from L1 that you can keep in registers on the ev6).
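A deliberately simple sketch of the register-pressure point: a 4-way unrolled dot product keeps four partial sums plus the loaded operands live at once. That's trivial for a 32-register FPU like the ev6's; push the unrolling or blocking much further and an 8-register x87 stack has to spill back to L1.

/* Register-pressure sketch: four accumulators plus the loaded operands fit
 * comfortably in a 32-register FP file; a much wider unroll or a blocked
 * matrix kernel overflows an 8-register x87 stack and spills to L1. */
#include <stdio.h>

static double dot(const double *x, const double *y, long n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    long i;
    for (i = 0; i + 3 < n; i += 4) {        /* 4-way unrolled main loop */
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)                      /* leftover elements */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}

int main(void)
{
    double x[7] = {1, 2, 3, 4, 5, 6, 7}, y[7] = {1, 1, 1, 1, 1, 1, 1};
    printf("%g\n", dot(x, y, 7));           /* prints 28 */
    return 0;
}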
Re:Toy Story: The Beowolf Cluster (Score:1)
Re:The right tool for the right task ... (Score:1)
Re:Linux Not Useful For All Superclustering Tasks (Score:1)
supercomputing (Score:1)
PA-RISC definitely worth a look. (Score:1)
The peak total memory bandwidth available then was 2.4Gb/sec in the AlphaServer 8400, and it really had an impact on big calcs - can't speak for SPECfp, but for a big matrix algebra calc you need (asymptotically approaching) 4 bytes/sec of memory bandwidth per flop/sec, and these systems just didn't cut it.
I won't even speak about 32-bit Intel boxes - the 100MHz cache bus sucks enormous rocks, and the 4Gb memory limit (3Gb with NT, less with Linux IIRC) cuts it out of the big job league anyway. This is maybe OK if it's a node in a large MPP system, but these days you want to be able to bring 64Gb or more of RAM to bear on a single problem.
The question we used to hear from our engineering staff was along the lines of: "Hey, my desktop PC is n-zillion MHz, and it runs this tiny test calc almost as fast as the big machine, why don't we just get a lot of big twin Xeon PC's with XYZ graphics cards?". Or occasionally, the same thing in favour of SGI workstations - engineers love toys just like the rest of us.
This is the classic misconception caused by benchmarks in the FE industry; a lot of test calcs will fit in the cache on a Xeon PC or an R10k or UltraSparc workstation, and show pretty acceptable performance, but the dropoff when you move to a larger problem size and start hitting RAM is sudden and dramatic.
By comparison, if you look at real supercomputers, like the high end Crays or NEC SX series, memory bandwidths of 2 to 4 Gb/sec *per processor* are the norm.
The machine we ended up buying to replace a low-end vector Cray was - an HP V-Class.
The PA-RISC has excellent scoreboarding and memory bus, and the Convex architecture keeps it well fed. We tested on the Convex S-Class hardware running at 180MHz with SPP-UX, and HP guaranteed that the delivered system running HP-UX would meet the clock over clock speedup ratio, which it did with room to spare. We saw well over 700 MFlops *sustained* per CPU on a 200MHz PA-8200 using rather nondescript FORTRAN, against a theoretical peak of 800.
The picture with the newer PA-8500 machines is not so rosy, as the memory bandwidth does not seem to have been scaled up with the capabilities of the new CPUs, especially with double the number of CPUs per board. Nevertheless, as the previous posters' figures would indicate, I believe the sustained throughput still exceeds that of the latest Alpha based systems for certain types of job, and the price/performance is very good.
Of course, for the rabidly religious, Linux is still not well supported on PA-RISC, and doesn't handle the high end hardware.
Re:Why Alpha's??? (Score:1)
1) They scale very easily
2) They process very quickly
3) They are totally modular, so if something breaks it's easily replaced.
4) Pentium based servers haven't quite got the architecture to allow for multiprocessing and multiuser processes.
It's good to see this happening, especially after Microsoft stopped NT on Alphas. This would have traditionally been their area. If this sort of thing continues, Linux will get a lot of kudos and respectability, which can only be good.
I keep thinking back to the Coca-cola/Pepsi war, and the moment Coke changed their formula. Maybe Microsoft have just done the same thing and lost a lot of the battle.
IA64 is good, but it will be a long time before it gets the stability and respect that Alpha processors currently have.
Re:benchmarks (Score:1)
Re:Why Alpha's??? (Score:1)
Re:Linux Not Useful For All Superclustering Tasks (Score:1)
1) We're not using Java to gain performance, obviously; we're trying to optimize the performance of a system already written in Java.
2) Solaris x86 JVMs also sucked. In fact, when we made the NT decision, JVMs on Solaris SPARC AND Solaris x86 were slower than on NT. Extensive benchmarking was done, using both our software, and simple benchmark tests.
3) Only one person suggested that maybe Linux does need a better JVM. It's ironic that the response is to attack our software (which you know nothing about), Java, and our intelligence, rather than to suggest that writing a good JVM would be useful... R&D folks are taking a liking to Java, and without a good JVM Linux will be unusable by a fair portion of the R&D community.
4) Actually, one of our people is writing a better JVM, though obviously it will be of little use to any of you...
5) Um, we don't need Beowulf "to run Java", we need a cluster or supercomputer to run the very complicated software we've written in Java.
It's funny, rather than being interested in how to expand the horizons of Linux and maybe try to understand why someone would want to use a VM based language like Java, people just get all uppity. Your computing paradigm is challenged, time to get defensive...
Whatever.
We're doing fine without Linux, actually, I just thought maybe some other Linux folks would be interested in writing a decent JVM, but we'll do it ourselves...
Re:Linux Not Useful For All Superclustering Tasks (Score:1)
Java compilers (and supercompilers, which would run prior to a compiler, actually) are being developed, and while the compilers may not speed things up much, supercompilers will.
So, if the JVMs don't totally suck, Java is about as good as C++, and only 2-3 times slower than C.
With JNI we could rewrite very computationally intensive parts of the program in C as well. And as things like TowerJ and HotSpot are ported to Linux and other platforms, speed-ups occur there too.
All in all, if you're working in C++ you can get roughly the same performance from Java... (it will require a lot of tricks to get C level performance... maybe even the Java chip... but so what? most of the system doesn't need it... many systems don't...)
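For anyone curious what the JNI route looks like, here's a bare-bones sketch of the native side (class and method names are made up for illustration, not taken from our code):

/* JNI sketch: the hot inner loop lives in C and is called from Java.
 * Hypothetical Java side:
 *   class Kernel { static native double sumSquares(double[] v); }
 */
#include <jni.h>

JNIEXPORT jdouble JNICALL
Java_Kernel_sumSquares(JNIEnv *env, jclass cls, jdoubleArray v)
{
    jsize n = (*env)->GetArrayLength(env, v);
    jdouble *p = (*env)->GetDoubleArrayElements(env, v, NULL);

    jdouble s = 0.0;
    for (jsize i = 0; i < n; i++)
        s += p[i] * p[i];

    /* JNI_ABORT: we only read the array, so don't copy it back. */
    (*env)->ReleaseDoubleArrayElements(env, v, p, JNI_ABORT);
    return s;
}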
Re:The right tool for the right task ... (Score:1)
1) Global circulation models are actually done by people in the US; downscaling via nested regional models is limited to this part of the world, and if and when the system becomes operationalised it is expected to be distributed. Think cooperating groups around the world sharing the CPU burden.
2) The 100m models are interfaced to streamflow and catchment models, which cover only a comparatively small region set within the wider desert (rather uninteresting). Think sparse multi-resolutional hierarchy.
3) Further submodels are inherently linear in space/time; while the climate fields are calculated once, the bulk of the operational landscape runs the scientists are interested in are multiple ensembles, which require lots of memory, hence some rather painful use of staging and compression. Think conversion to streaming media rather than static files.
If you're interested in more details, send me your email and I'll point you to some of my papers.
Regards,
LL
4 TFlops?! (Score:1)
Toy Story: The Beowolf Cluster (Score:2)
Yes, but what kind of plot? Would it be Woody and Mr. Potatohead lost in a hurricane with a large number of penguins?
Raw power is cool, but art takes a bit more than that.
Linux Supercomputer Wins Weather Bid (Score:2)
Remember those vast performance diffs between the 80386SX-16 and the 80386DX-16? That's what we got here.
<Note-to-Microsoft>Nanny-nanny-nah-nah, our OS runs on IA-64 and yours won't.</Note-to-Microsoft>
D. Keith Higgs
CWRU. Kelvin Smith Library
Infrastructure costs (Re:BEOWOLF!) (Score:2)
For $15,000,000 to buy an Alpha Beowulf, it sounds like they might have 2,500 nodes with a 'decent' Alpha system. But if they go really high end, they'll have about 750 nodes (for the 'killer' $20,000 Alpha machines).
That doesn't include the cost of the Myrinet cards and switches, racks, 3rd party software, support people, power, cooling, etc. Believe me, if you're paying $15M for a machine, part of it better be going for support personnel and infrastructure. The configuration's probably more like 250-500 nodes with a corresponding number of Myrinet cards and switch ports, 30-75 racks (8 nodes/rack if you're lucky), a *buttload* of power and air conditioning, and 2-5 onsite support people working in it full time.
Not a beowulf? (Score:1)
Not every Linux cluster is a Beowulf. The fastest alpha Linux cluster [sandia.gov] in existence is not a Beowulf.
Anyone know what they plan to use?
No one said it was. (Score:1)
A real life demonstration... (Score:1)
--
The right tool for the right task ... (Score:3)
To give you some real-world experience, a group I'm working with is looking at continental-scale simulation at a 5km resolution with the aim of going down to 100m. Now despite what most people think, the bottleneck (in this example) is in fact the I/O, with estimated total requirements of 30 TBytes. Doing the sums shows that to keep up with the CPU (say hypothetically 1 run/24 hours), you would need average throughput of 350 MByte/sec. Hardware that supports both this volume and capacity is NOT cheap. We would joke that we paid x million for the I/O and SGI would throw in the Cray for free.
Now as for how an Alpha cluster could be used, it would fit very nicely into the dedicated batch box category. It has a very high CPU rate and some decent compiler optimisation. As such it would augment whatever environment already exists, reducing the workload on the more expensive machines, which generally have better tools and are better kept for development (just you try debugging a multi-gigabyte core dump). The biggest problem nowadays is not the algorithms but managing the data traffic to the CPUs, and this is where Linux clusters are weak, with relatively slow interconnects, unbalanced memory hierarchies, and cheaper but higher-latency memory. You have to accept the disadvantages and shift jobs which are not suited for this architecture off. A bit of smarts goes a long way in stretching the budget.
LL
Linux NOT mentioned in the official release (Score:1)
Is NOAA afraid to say that they are basing a $15 million investment on free software rather than on something from Microsoft/Sun/IBM/whatever?
Re:Why Alpha's? Screaming FP performance, that's w (Score:1)
And the raw bandwidth of even the unreleased G4s trails that of three-year-old Alpha designs anyway. Now there's the switch-matrix architecture, which gets close to twice the new G4's theoretical bandwidth (EV6 500 -> ~2.6 GB/s, G4 (7400) -> ~0.8 GB/s). And that's the theoretical figure; Alphas still get 1.3 GB/s in sustained throughput, 50% more than the G4's theoretical peak.
Re:will it live up to expectations? (Score:3)
To fix this, you use 2 processor busses and 2 memory busses. You fill these up, and you get 4 processor busses and 4 memory busses. Now you need to connect these bus segments. You have several options. First, connect them within the same machine. This is what NUMA is. The other route is to put each bus in a separate machine, each machine running a copy of the kernel locally, and connect each box together with a fast network. This is what Beowulf is.
To give you an example, think of a highway system. If you have a lot of traffic switching lanes (busses) constantly, then it would be best to build one big 20-lane highway (NUMA). But if all the traffic basically keeps in its own lane, without much need to switch lanes (Inter Process Communication), then it may be more economical to build 10 2-lane highways (Beowulf).
In fact, isn't a Cray T3E more of a Beowulf-type cluster of closely knit machines than a NUMA? I think each node on a T3E runs a local copy of the micro-kernel.
Motorola gcc optimisation (Score:1)
--
Why digital??? (Score:2)
Forget about any of these digital OSes; we even implemented our own ANALinux, which used OS technology originally developed for the quantum computers that are slow to come about. Except that the probability wave algorithm in the kernel was reimplemented with the electron wave method (more discrete).
We can't open source it yet, since the whole kernel runs via negative feedback, so it is constantly being upgraded. We could take a snapshot of the loaded kernel image by detaching all the ferrite doughnuts at the same time, but the source would all be in an analog stream and useless unless you have another valve box.
It easily interfaces with outside systems even though it is 100% analog inside, thanks to the (ported) quantum kernel's interface, which utilizes the duality of the wave and sends discrete signals outside the box. The only problem is the primitiveness of current technology. Since petabit networking has not been implemented, we basically watch the tubes' change in brightness as I/O. Current internet access by outsiders is via our webcam pointed at the tubes.
This OS is totally unhackable since nobody knows how to hack it. Input is via variosistors instead of toggle switches, so all the script gramps who hacked their way into Univacs would not know how to break in.
So all you digiphiles, put your toys down and use the computer that works the way humans do.
Re:Linux NOT mentioned in the official release (Score:1)
Re:Why Alpha's??? Only the Facts Ma'am! (Score:1)
Software? (Score:1)
Re:will it live up to expectations? (Score:3)
SMP (Symmetric Multi- Processing) is fundamentally different to clustering, as all of the processors in an SMP configuration share the same memory bus, whilst in a cluster the machine architectures are distinct, and we use a high-speed network to exploit parallelism.
See the Linux Parallel Processing HOWTO [purdue.edu] for more information.
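A minimal MPI example shows the cluster half of that distinction: each process has its own private memory on its own node, and data only moves between them through explicit messages over the network.

/* Minimal MPI sketch: two processes on (potentially) different nodes
 * exchange data over the interconnect instead of a shared memory bus.
 * Run with at least two ranks, e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double value = 0.0;
    if (rank == 0) {
        value = 3.14;                        /* data lives on node 0 ...    */
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* ... and must be explicitly shipped across the network to node 1. */
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("node 1 received %g\n", value);
    }

    MPI_Finalize();
    return 0;
}

The MPI_Send/MPI_Recv pair is what stands in for the shared memory bus of an SMP box.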
Re:Why Alpha's??? Now this is better (Score:2)
Re:Linux NOT mentioned in the official release (Score:1)
Anyone had any experience porting MPI code from Linux/Solaris clusters to NT, i.e. on the same hardware?
And assuming the same compilers etc. Part of the problem is that until recently there have been (allegedly) really nice compilers for NT that have not been available for Linux. Also, Intel's native BLAS routines were NT-only for ages. I think they have been ported over now.
From my understanding, for most work it makes absolutely no difference, as the overhead of the OS should be negligible. In my experience with single-processor jobs with large memory footprints (say > 500 megs), Solaris tends to run smoother.
I ask this because I am moving to a school soon that got bought out by Microsoft, and they have ported all their code to just this: NT clusters using MPI (from Microsoft grant money that is being dumped on all the schools... p.s. we got it here.. we just umm formatted the drives).