California Researchers Build The World's First 1,000-Processor Chip (ucdavis.edu) 205
An anonymous reader quotes a report from the University of California, Davis about the world's first microchip with 1,000 independent programmable processors: The 1,000 processors can execute 115 billion instructions per second while dissipating only 0.7 Watts, low enough to be powered by a single AA battery...more than 100 times more efficiently than a modern laptop processor... The energy-efficient "KiloCore" chip has a maximum computation rate of 1.78 trillion instructions per second and contains 621 million transistors.
Programs get split across many processors (each running independently as needed with an average maximum clock frequency of 1.78 gigahertz), "and they transfer data directly to each other rather than using a pooled memory area that can become a bottleneck for data." Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.
Link to paper (Score:5, Informative)
The press release does not include it, nor does the slashdot summary. The link to the paper: http://vcl.ece.ucdavis.edu/pub... [ucdavis.edu]
Re: (Score:3)
These are pretty primitive, yet very flexible cores. Worthless for most current workloads, but that may change. However, the comparison to modern CPUs is unfair; a proper comparison would be to modern GPUs.
Mind-bogglingly complicated co-processing (Score:2)
That leaves out the nasty deadly embrace (deadlock). Or, less nasty, waiting on a key resource to complete.
More cores just get you bound up in your shorts faster.
More cores are not a magic bullet.
Re: Mind-bogglingly complicated co-processing (Score:5, Interesting)
I take it you've never done high performance computing, have you? More cores is often a good thing. If I'm doing a simulation across 1,024 cores and each node has 16 cores, that means I need a minimum of 64 nodes. There's a lot of communication that takes place over protocols like Infiniband in order to make MPI work. It also rules out the possibility of shared memory systems like OpenMP when jobs reach that scale and have to be spread across multiple nodes. If more cores are located within a single node, it reduces the amount of communication with other nodes and the resulting latency. It also makes shared memory a viable option for larger parallel jobs. If I can fit 64 or 256 cores on a node, there's a lot less need for relatively slow protocols like Infiniband to pass messages. I don't think the ordinary user has a need for 1,000 cores or would have such a need for a long time. But it really could help with high performance computing.
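To make the point concrete, here is a rough sketch of the usual hybrid pattern (illustrative only; it assumes nothing more than a stock MPI + OpenMP toolchain, and nothing specific to this chip): one MPI rank per node talks over the interconnect, while OpenMP threads use the cores inside the node through shared memory. The more cores fit in one node, the more work stays on the right-hand half of this sketch.

```cpp
// Hypothetical hybrid MPI + OpenMP sketch: one MPI rank per node,
// OpenMP threads across the cores inside that node.
// Build with something like `mpicxx -fopenmp hybrid.cpp` (toolchain assumed).
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, nodes = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nodes);

    // Shared-memory parallelism inside the node: cheap, no interconnect traffic.
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < 1000000; ++i) {
        local_sum += 1.0 / (i + 1.0 + rank);   // stand-in for real work
    }

    // Message passing between nodes: this is the part that rides on the
    // InfiniBand (or whatever interconnect) and that more cores per node shrinks.
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum across %d ranks: %f\n", nodes, global_sum);
    MPI_Finalize();
    return 0;
}
```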
Re: Mind-bogglingly complicated co-processing (Score:4, Informative)
Oi
There are always problems that parallelize well, and this setup will likely work just fine for them. The same way Nvidia CUDA does already, the same way vectorizing/coprocessing add-ons have done going back to the ISA bus.
The fly in the ointment is that most of the world's problems don't, and even when you can parallelize, debugging is nightmarish.
All said, expect to see this doing neural-network work. From the article and the description of the processor communication and lack of shared memory, it sounds custom-tailored to that.
Re: (Score:3)
Re: Mind-bogglingly complicated co-processing (Score:5, Informative)
It also makes shared memory a viable option for larger parallel jobs.
Good luck with that. I mean it. IME, as you go *more* parallel, shared memory becomes a *less* viable option, regardless of how many cores are running on the same machine. The cycles lost to memory locking to make shared memory work climb rapidly with the number of autonomous processes/threads.
The math isn't disputed; see the birthday problem [wikipedia.org] for a start on calculating the clashes in this game of musical chairs. In short, when you have X individuals and Y pigeonholes, you are effectively bounded by Y, not by X. When you have X threads trying to access one variable, the chance that any given thread will get the variable without waiting is effectively 1 for one thread, 1/2 for two threads, 1/3 for three threads, etc.
By the time you get to a mere 64 threads each trying to access a variable, each thread has roughly a 1.6% chance of getting it immediately and a 98.4% chance of being placed into a queue for that variable. Queue times grow with the thread count. For one thread, time spent on an access is ((0 * ATIME) + ATIME), where ATIME is the access time of the variable. For two threads it's ((1-1/2) * ATIME) + ATIME, for three threads ((1-1/3) * ATIME) + ATIME, for four threads ((1-1/4) * ATIME) + ATIME. For ATIME = 100us those work out to 100us, 150us, 166.67us and 175us respectively. That last number is for only four threads contending on a single variable, and assumes that queuing itself takes no clock cycles. The times climb further as the number of variables that must be locked goes up.
For 64 threads your expected wait in the queue is (1-1/64) * ATIME ≈ 98.4us on top of the access itself. You can forget about using shared memory if you want to use 1000 cores.
But wait, "use a sane design pattern and that won't happen, like producer/consumer, etc." I hear you say? Sorry, no design pattern will save you, because if even a single thread writes to a variable, then all threads have to take read-locks to make sure they don't access it during a write (race condition).
If you have 1000 cores, implement local message-passing. Don't try shared memory unless each thread will use a local copy (in which case it isn't "shared", now is it?). Or go ahead and do it, and maybe you'll find a shared-memory design that doesn't fall foul of first-year statistics; if you do beat the numbers, I'll be the first to nominate you for a Fields Medal/Turing Award :-)
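If you want the difference in miniature, here is a toy sketch (my own illustration, not anything from the paper): variant A funnels 64 threads through one lock, variant B gives each thread a local copy and merges once at the end, which is the poor man's message passing.

```cpp
// Illustrative sketch: contention on one shared lock vs. per-thread local
// state that is combined in a single step at the end.
#include <thread>
#include <mutex>
#include <vector>
#include <cstdio>

int main() {
    const int  kThreads        = 64;
    const long kItersPerThread = 100000;

    // Variant A: every thread funnels through one lock (the musical chairs).
    long shared_total = 0;
    std::mutex m;
    {
        std::vector<std::thread> pool;
        for (int t = 0; t < kThreads; ++t)
            pool.emplace_back([&] {
                for (long i = 0; i < kItersPerThread; ++i) {
                    std::lock_guard<std::mutex> lock(m);  // the queue forms here
                    ++shared_total;
                }
            });
        for (auto& th : pool) th.join();
    }

    // Variant B: each thread owns its copy; one combine step at the end.
    std::vector<long> local(kThreads, 0);
    {
        std::vector<std::thread> pool;
        for (int t = 0; t < kThreads; ++t)
            pool.emplace_back([&, t] {
                for (long i = 0; i < kItersPerThread; ++i) ++local[t];  // no lock
            });
        for (auto& th : pool) th.join();
    }
    long merged = 0;
    for (long v : local) merged += v;

    std::printf("shared=%ld merged=%ld\n", shared_total, merged);
    return 0;
}
```

(Variant B still has false sharing between neighbouring slots of the vector, but even so it avoids the lock queue entirely.)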
Re: (Score:2)
That sounds like a problem the immutable object [wikipedia.org] pattern was designed to solve.
Re: (Score:2)
That sounds like a problem the immutable object [wikipedia.org] pattern was designed to solve.
Then you don't need shared memory. If the object never changes, each thread can keep its own local copy, and there's no need for shared memory (which is what I said somewhere above in that jungle of text).
Re: (Score:2)
Re: (Score:3)
I was thinking of atomic operations [wikipedia.org], as they would also avoid the wait.
Atomic operations aren't expressive enough to share arbitrary data; we use them to implement the locks on the actual data we want to share. GP spoke about wanting 1000 cores with shared memory; chances are he's not planning on having all 1000 simply increment/decrement an integer.
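A tiny illustration of that point (nothing official, just a sketch): an atomic covers a lone counter fine, but the moment the shared data is a compound structure you're back to taking a lock.

```cpp
// Sketch: an atomic handles a single machine word; a compound structure
// still needs a lock (or message passing) around every update.
#include <atomic>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <cstdio>

std::atomic<long> hits{0};          // one word: hardware atomic, no lock
std::map<std::string, long> table;  // compound structure: needs the mutex below
std::mutex table_mutex;

void record(const std::string& key) {
    hits.fetch_add(1, std::memory_order_relaxed);   // lock-free increment

    std::lock_guard<std::mutex> lock(table_mutex);  // back to queuing on a lock
    ++table[key];
}

int main() {
    std::thread a([] { for (int i = 0; i < 1000; ++i) record("a"); });
    std::thread b([] { for (int i = 0; i < 1000; ++i) record("b"); });
    a.join(); b.join();
    std::printf("hits=%ld a=%ld b=%ld\n", hits.load(), table["a"], table["b"]);
    return 0;
}
```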
Re: (Score:2)
Re: Mind-bogglingly complicated co-processing (Score:5, Interesting)
Doing any sort of large-scale computational fluid dynamics or finite element simulations may require a great many cores. For example, you might want to conduct a very detailed simulation of the air flow around a vehicle, airplane, structure, etc. to have a basic understanding of its aerodynamics before spending time and money testing an actual prototype in a wind tunnel. You might also want to look at how very complicated, soft-body structures deform due to a variety of external stimuli. Such information would be crucial for certain materials science applications. Chemical reaction and acoustic simulations may also require a great deal of computing power, especially if you want to have a high spatio-temporal resolution.
Essentially, there are plenty of physical and theoretical science applications that can benefit from massive processing capabilities. There is a lot of fundamental science that is also performed in simulation before any actual tests occur.
Re: Mind-bogglingly complicated co-processing (Score:2)
Too bad they won't fit in 784 KB of RAM.
Re: (Score:3)
Tell me about it. My i5 supports a whopping 1 MB.
Oh wait, you thought that was RAM in the traditional sense? Maybe you should read the original paper, which among other things says this is extensible with on-chip memory (think level-3 cache) or off-chip memory (actual RAM).
Re: (Score:2)
You don't even need to get into theoretical CAD for this to be of benefit. There are a lot of computer use cases that are massively parallel. I mean, my 5-year-old graphics card has 448 CUDA cores for the insanely parallel task of rendering something on my display, and that doesn't even take into account professional rendering, which covers everything from marketing departments to the motion picture industry.
Re: (Score:3, Insightful)
Because I look at the real world around me and I see little that would benefit from that.
This is a failure of imagination. The worst kind of failure.
Re: (Score:2)
That's a start; think of image processing. A lot of it is applying the same operation to a very large number of images, with no need to do it in any special order or in a serial manner at all.
Even something as trivial as editing a home movie is going to be an utter pain if the software is single-threaded instead of doing the task more quickly in parallel.
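Something like this toy sketch is all it takes to spread that kind of work over every core you have (Frame and the nearest-neighbour resize are made-up stand-ins here, not any real video editor's code): each frame is independent, so the loop splits in any order across however many workers exist.

```cpp
// Illustrative only: single-channel frames and a crude nearest-neighbour resize.
#include <cstdio>
#include <thread>
#include <vector>

struct Frame { std::vector<unsigned char> pixels; int w, h; };

Frame resize_frame(const Frame& in, int new_w, int new_h) {
    Frame out{std::vector<unsigned char>(size_t(new_w) * new_h), new_w, new_h};
    for (int y = 0; y < new_h; ++y)
        for (int x = 0; x < new_w; ++x)
            out.pixels[size_t(y) * new_w + x] =
                in.pixels[size_t(y * in.h / new_h) * in.w + size_t(x * in.w / new_w)];
    return out;
}

void resize_all(std::vector<Frame>& frames, int new_w, int new_h, int workers) {
    std::vector<std::thread> pool;
    for (int t = 0; t < workers; ++t)
        pool.emplace_back([&, t] {
            // Each worker takes every `workers`-th frame; order doesn't matter.
            for (size_t i = t; i < frames.size(); i += workers)
                frames[i] = resize_frame(frames[i], new_w, new_h);
        });
    for (auto& th : pool) th.join();
}

int main() {
    // Two hours at 25 fps would be 180,000 frames; use a token batch here.
    std::vector<Frame> frames(100, Frame{std::vector<unsigned char>(1920 * 1080), 1920, 1080});
    resize_all(frames, 1280, 720, 8);
    std::printf("resized %zu frames to %dx%d\n", frames.size(), frames[0].w, frames[0].h);
    return 0;
}
```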
Re: (Score:2)
In other news (Score:5, Funny)
A young intern who likes to "work late" in Davis, California has recently come into possession of a rather large stash of bitcoins.
I guess this is great (Score:4, Interesting)
Re:I guess this is great (Score:4, Insightful)
Quantum computing is not magic. It has problems it's insanely good (in theory) at solving, and problems where it's no faster, or even slower (because of the necessary error correction), than your traditional deterministic computer. Not only are we a long way off from personal quantum computing (we still don't even have a general-purpose quantum processor), we still need to research deterministic architectures.
Re: (Score:2)
Re:I guess this is great (Score:4, Interesting)
Re: (Score:2)
Re: (Score:2)
How does it currently? How does your GPU know which pixel to render with which of the similarly high number of CUDA cores a typical video card has these days?
Re: (Score:2)
But I am not sure what system or software can take advantage of it. Personally I want to see progress being made on quantum computing for consumer-level stuff.
If you have an application where you can calculate many possible solutions independent of each other and then choose the best one, this kind of processor might be useful. Quantum computers are very strong for that kind of application, so I see it being a stepping stone to quantum computing.
Re: (Score:2)
Re: (Score:2)
Highly parallelized software (Score:2)
remaining core count (Score:5, Funny)
the world's first microchip with 1,000 independent programmable processors ... Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.
Yeah, but you have to keep in mind how many cores will be left for the user!
1000 cores minus:
* 200 cores for anti-virus software
* 25 cores for the ransomware battling it out with the anti-virus
* 55 cores for Microsoft's Win10 update nagware
* 350 cores for the NSA monitoring
* 122 cores for the FBI monitoring
* 75 cores to handle syncing all your data to the cloud
* 94 cores to run the 3D GUI based desktop
* 62 cores for constant advertising
* 14 cores for Google to keep tabs on what you're doing
* 1 core dedicated to emacs
So, only 2 cores left for the user. No better than an Athlon from 2005, I'm afraid.
Re: (Score:3)
What if they also want to run a decent text editor?
Re: (Score:2)
But he still needs to compile the stuff he edited with emacs/vi, so there goes the last bit.
Obligatory (Score:5, Funny)
Imagine a Beowulf cluster of these!
Re: (Score:2)
What task is a RaspberryPI Beowulf cluster good for?
Re: (Score:2)
What task is a RaspberryPI Beowulf cluster good for?
Generating discussion on Slashdot.
Imagine it as a coprocessor (Score:4, Interesting)
Re: (Score:2)
The main problem would then be memory bandwidth. A GPU can churn through a lot of data because the architecture assumes that nearby threads are very likely to read contiguous data. This architecture, however, lets each core have its own instruction queue, so it's hard to predict which thread is going to access which portion of memory and coalesce that into a single request. I fail to see how you can scale the bus/controller/etc. to match the bandwidth requirement (outside of a few doz
Re: (Score:2)
HBM is the answer to your memory bandwidth issue. [amd.com] Especially since it allows for die stacking.
A GPU? (Score:3)
Re: (Score:3)
We have those already, in the form of modern GPUs that can do a lot of general-purpose processing such as physics simulation and image recognition.
This chip is more like the Cell processor in the PlayStation 3, with a bunch of under-powered cores that are a bugger to program and have very low performance each. I can't see it taking off because, for example, each core only has access to a tiny amount of RAM, so the processing they can do will be limited mostly by memory bandwidth. A GPU gives its thousands of
Imagine a beowulf cluster of these! (Score:2)
The 1990's called, they want their joke back!
Shader units (Score:2)
Aren't the shader units of modern GPUs like the GeForces basically specialized CPUs?
In that case we're already at 2,560 CPUs on a single chip.
Re:Shader units (Score:4, Insightful)
No, they are not. The threads in a modern GPU are not all free to execute different instructions. A GPU is a SIMT architecture: Single Instruction, Multiple Threads. Each warp of threads (a group of roughly 16 to 32 threads) executes the same instruction at the same time on whatever data each one is holding (some threads in the group can also be deactivated for a given instruction). So the physical architecture for each thread in a GPU is much simpler than for the threads of this processor (the instruction queue and related machinery are shared across the warp, synchronization is much simpler, etc.).
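If it helps, here is a rough software emulation of that lockstep idea (purely illustrative, nothing like real GPU internals): one "instruction" at a time is applied to every lane in the warp whose mask bit is set.

```cpp
// Toy emulation of SIMT lockstep execution: a warp-wide "instruction" is the
// same operation applied to all active lanes; disabled lanes just sit idle.
#include <array>
#include <cstdio>

constexpr int kWarp = 32;
using Lanes = std::array<float, kWarp>;
using Mask  = std::array<bool, kWarp>;

template <typename Op>
void warp_exec(Lanes& regs, const Mask& active, Op op) {
    for (int lane = 0; lane < kWarp; ++lane)
        if (active[lane]) regs[lane] = op(regs[lane], lane);
}

int main() {
    Lanes r{};
    Mask all; all.fill(true);

    warp_exec(r, all, [](float v, int lane) { return v + lane; });  // instruction 1
    warp_exec(r, all, [](float v, int)      { return v * 2.0f; });  // instruction 2

    std::printf("lane 0 = %g, lane 31 = %g\n", r[0], r[31]);
    return 0;
}
```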
Re: (Score:2)
That makes em quite bad at dealing with conditional execution, right?
Re: (Score:3)
Well, yes. But I don't think we can say "terrible" performance for conditional execution. Very simply, if you have a condition "if(test){ ... } else { ... }", the warp (group of threads) will go into the true-block if at least one of its threads ticks (test==true). During this portion of the execution, the threads which did not tick are disabled and are indeed waiting. And vice versa for the false-block. If none of the threads tick, or if they all do, then the unnecessary block will be skipped entirely (this is what we
Re: (Score:2)
For loops that use a gradient as a reference must be completely GPU crushing.
Re: (Score:2)
Photoshop's implementation (the oil paint filter) performs particularly poorly. I don't know why it's so slow. Maybe it's a marketing thing (if it's slow, it must be really good?).
For image processing in particular, the fact that branching can in the worst case carry a significant penalty on GPUs is moot, because the worst case doesn't normally happen in practice.
in a world without ever increasing frequency (Score:3)
The way to improve computational technology is parallelism. What are the usage domains?
-anything video related
--games
--image recognition
-anything AI (I think?)
--autonomous cars
--facial recognition
-a lot of physics applications
Thoughts?
Re: (Score:2)
Most stuff in autonomous cars doesn't need that much power.
The stuff I was involved in runs mainly on 4 ARMs, 1 DSP, 512 MB, 500 MHz (not sure, might be less). But that was image processing only, for emergency braking, pedestrian recognition, sign recognition, lane detection, etc.
Additional systems like LIDAR, RADAR, ultrasonic surface tracking, etc. usually run independently on different systems, but with similarly low spec requirements.
Sslloow (Score:2)
It only runs at 1.78 GHz. My Pentium 4 running XP runs at 4 GHz! Just ask any Joe Sixpack who bought one over an AMD.
Boring (Score:4, Informative)
...contains 621 million transistors... Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.
Let's see... 1,000 very small compute cores... sounds an awful lot like your typical GP-GPU these days. The only reason the power consumption is so small is that it has < 1 billion transistors. Compare that to the 17-billion-transistor Nvidia Pascal monster. Even the non-Iris-graphics Skylake desktop CPU has ~1.7 billion, and over half of those are spent on the GPU.
Chances are even paltry Intel HD Graphics running an OpenCL program will have more FLOPS than this thing. Don't be fooled by the flashy headline; the laws of physics still apply.
Re: (Score:2)
While I agree this is more flash than substance, it hardly deviates from the laws of physics. Unlike the Nvidia example you provided, this CPU does not have much in the way of IO bandwidth, so we are talking about minimal movement of data, which in turn results in impressively low power consumption. For certain applications this could be great (a previous post mentions neural networks). For the other 99% it is worthless.
One should not compare this CPU to a GPU because the underlying design goals are v
Windows 23 (Score:3)
It does almost nothing very very fast (Score:5, Informative)
Each CPU supplies an amount of computation less than a single instruction on a regular CPU. Think of it as a grid of instructions, not a grid of computers. A processor has a Harvard architecture with 128 instructions of 40-bit size and a separate data memory with two banks of 128 16-bit values (256 16-bit data words total). It says nothing about register files or stacks or subroutine calls; it's likely that the two data banks are in effect the register set. The paper implies that a CPU can compute a single floating-point operation in software.
Compiling means mapping code fragments to a set of connected CPUs and routing resources, and then feeding the data into the compute array. After some circuitous path through the grid the answer emerges somewhere. There are also 12 independent memory banks, each with 64 KB of SRAM, that are available to all CPUs.
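For perspective, a back-of-the-envelope tally of the memory figures quoted above works out like this (my arithmetic, using only the numbers given in the summary of the paper):

```cpp
// Tally of the on-chip storage figures quoted above: per-core instruction
// memory, per-core data banks, and the shared SRAM banks.
#include <cstdio>

int main() {
    const long cores          = 1000;
    const long inst_bits      = 128 * 40;         // 128 instructions x 40 bits
    const long data_bits      = 256 * 16;         // 2 banks x 128 x 16-bit words
    const long per_core_bytes = (inst_bits + data_bits) / 8;  // = 1,152 bytes
    const long shared_sram    = 12 * 64 * 1024;               // = 786,432 bytes (768 KB)

    std::printf("per core: %ld bytes; all cores: %ld bytes; shared SRAM: %ld bytes\n",
                per_core_bytes, cores * per_core_bytes, shared_sram);
    return 0;
}
```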
History has not been kind to this kind of grid architecture with lots of CPUs and very little memory. Almost none of them ever made it out of the lab. It's symptomatic of hardware engineers who are clueless about software and design unprogrammable computers. They confuse aggregate theoretical throughput with useful compute resources.
Debugging code on this would be a nightmare. It's completely asynchronous, there is no hardware to segregate different sets of CPUs doing different computing tasks, and there are so few resources per CPU that software debugging aids would crowd out the working code. The people listed on the paper should be punished by being forced to make it do useful work for at least a year. They would be scarred for life.
Re: (Score:3)
Why? This CPU sounds like it's perfect for Erlang, which, although a somewhat odd language, is nonetheless one in which a fair amount of useful software (Chef, CouchDB, Riak, RabbitMQ) has been written.
Don't forget ejabberd, one of the most useful XMPP instant messaging servers out there.
But really, I don't think massive parallel processing is going to cause big improvements, because the software design stops at other bottlenecks anyhow, like IO. Having a thousand cores waiting for a commit isn't going to be a heck of a lot faster than having eight cores waiting for commits.
I can imagine. (Score:4, Interesting)
Even ignoring all other limitations of this particular processor, there's still Amdahl's law, limiting the speedup by the serial parts of a task.
As one example of how that works, look at compiling to hardware. In theory this should bring enormous benefits, as one can parallelize not only at the instruction level but at a sub-instruction one, speculating and pipelining e.g. additions. Many types of communication can be eliminated entirely by replicating hardware.
But even with those benefits there is a _lot_ of software that is better run on a standard processor. Why? Because using custom optimized hardware to run it ends up replicating much of a normal processor, including caches, branch prediction, etc., and then a processor optimized by a dedicated team of experienced people ends up being the more attractive option.
Not saying custom hardware can't bring huge benefits, not even saying that this research processor can't do it; _however_, in general there are a lot of tasks that can't really be accelerated much.
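For anyone who wants the Amdahl's-law ceiling spelled out, here is a quick sketch (textbook formula; the parallel fractions are just illustrative numbers I picked): even at 99% parallel, 1,000 cores top out around 91x.

```cpp
// Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), p = parallel fraction.
#include <cstdio>
#include <initializer_list>

double amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    const double cores = 1000.0;
    for (double p : {0.50, 0.90, 0.95, 0.99}) {
        std::printf("parallel fraction %.0f%%: speedup on %.0f cores = %.1fx (limit %.0fx)\n",
                    p * 100.0, cores, amdahl(p, cores), 1.0 / (1.0 - p));
    }
    return 0;
}
```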
Creative licence on power usage (Score:2)
FINALLY! (Score:4, Funny)
Something that will run Flash without bogging down.
Disappointed (Score:2)
What kind of computer scientists are they?
They should have made it 1024. And labelled them 0-1023.
Imagine... (Score:2)
Had to say it. Haven't seen that response in a while.
Re: (Score:2)
Re: (Score:3)
Re: (Score:3)
Re: (Score:2)
Re:Can this chip run GNU/systemd/Linux? (Score:5, Interesting)
That's probably all it can run. Typically, specially designed systems need the ability to configure the OS radically differently than has been done previously, which requires source code. Microsoft provides source code, as does IBM, in some special situations, but mostly it tends to be Linux that is used first. Consider the reasoning behind the OS chosen for the fastest computers [wikipedia.org] in the world.
Systemd? Probably because serious computer engineers don't have any trouble dealing with the irritation that systemd causes. (The rest of us may, but if you have enough smarts to handle building a specialized chip, then systemd isn't really a challenge.)
Systemd on CentOS7 (Score:5, Informative)
Systemd? Probably because serious computer engineers don't have any trouble dealing with the irritation that systemd causes.
Confirming: the latest nodes on our cluster are running CentOS 7, which is systemd-powered.
(And hopefully the final practical product out of this buzzword-compliant press release will still be somewhat useful.
We could have some special workloads to apply it to.)
Re: (Score:2)
Re:Can this chip run GNU/systemd/Linux? (Score:4, Informative)
No.
systemd requires glibc, and glibc is 2 MB large. According to the paper, the processor has a whopping 768 KB of RAM (and no capability to add external RAM).
Means systemd isn't gonna run. Dunno about the kernel; probably it's easier to write a minimal one from scratch than to port it over to that special architecture.
Re: Can this chip run GNU/systemd/Linux? (Score:2)
Re:Can this chip run GNU/systemd/Linux? (Score:5, Informative)
Re: (Score:2)
No doubt Linux runs on a conventional processor that manages the embedded processors, which probably just run on bare metal, like a GPU.
Re:Can this chip run GNU/systemd/Linux? (Score:5, Informative)
This is basically a modern transputer. As with connection machines, GPUs, and all such machines, it will very likely need a traditional host CPU to manage it, and that may well run Linux.
Re:Can this chip run GNU/systemd/Linux? (Score:4, Interesting)
I still wonder how long it will be until the 'traditional host CPU' is scaled down to a small SOC, so that the traditional heavyweight CPU is freed up for tasks that actually require it: most of what runs on the i5 in the machine I am writing this on doesn't need anything remotely as powerful as said i5. Likewise, putting a small SOC-like chip in the graphics card and running most of the GUI there is another thing. As such, once processors hit the single core brick wall (and they're kind of doing that now), performance improvements will come from offloading what can run on a small power-efficient core to such a small power-efficient core. Given what the chip in e.g. a pi zero costs, it ought to make sense: connect your machine to power, and a tiny microcontroller handles the ILO and basic system management functions, and on power-on, a larger microcontroller/SOC does what the BIOS/UEFI does on current machines. Similarly in the screen we have the same arrangement, with a microcontroller starting up the GPU and display (independently of the rest of the machine). A modern PC is already like a small network (the GPU being networked to the main CPU via the pcie bus, multiple intel sockets networked via QPI etc.). Making this more explicit is the sensible thing to do.
Depends... (Score:2)
It depends.
In the case of Xeon Phi (i.e., ex-Larrabee GPUs repurposed as parallel processing units), in addition to the very wide AVX-512 SIMD units, there are also scalar cores able to run Pentium-compatible binaries.
So the Linux kernel managing all the hardware actually runs *on* the card itself (and you can SSH into your Xeon Phi if you want).
On the other hand, the Tilera works exactly as you describe:
a weird many-core structure running the processing kernels,
and a nearby classical RISC core managing the whole thing.
Re:Can this chip run GNU/systemd/Linux? (Score:5, Informative)
Nitpicking (Score:2)
I'm nitpicking to hell with this but...
Yes, all the *SIMD units attached to one execution core* will necessarily process the exact same instruction at the same time, on the same cycle... ...but there is more than one execution core on most higher-range GPUs, and nearly all modern GPUs are
(which from a design point of view makes entire sense: graphics processing is about repeating some processing on thousands or millions of pixels. Better to group them in batches than to process every last damn pixel individually)
Re: Can this chip run GNU/systemd/Linux? (Score:5, Informative)
To get into a bit more detail, I'll use AMD as an example, but Nvidia pretty much does the same thing with slightly different terms for the same concepts. The AMD RX 480 has 2304 streaming processors (cores), grouped into 36 CUs (execution groups). Each streaming processor can handle up to something like 4 wavefronts (threads, like hyper-threading, to hide memory access latency) at a time. All streaming processors in a CU for a given wavefront must be executing the same instruction at the same time, except in the case of a branch. When a branch happens, one fork of the branch will execute, stalling the streaming processors taking the other fork. Once that fork is finished, the first group of streaming processors will stall while the others finish their fork.
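Continuing the software-emulation idea from further up the thread (again purely illustrative, not real GPU code): this is roughly what the serialized branch looks like inside one wavefront. Both sides run with complementary masks, so the divergent case costs about the sum of both paths.

```cpp
// Toy emulation of branch divergence inside one wavefront: both sides of the
// if/else execute in turn, each with the lanes of the other side masked off.
#include <array>
#include <cstdio>

constexpr int kWave = 64;                 // wavefront width used for illustration
using Vals = std::array<int, kWave>;
using Mask = std::array<bool, kWave>;

void branch_example(Vals& v) {
    Mask taken{}, not_taken{};
    for (int i = 0; i < kWave; ++i) {     // evaluate the condition in lockstep
        taken[i]     = (v[i] % 2 == 0);
        not_taken[i] = !taken[i];
    }
    // "True" side: lanes whose mask bit is clear idle through these steps.
    for (int i = 0; i < kWave; ++i) if (taken[i])     v[i] = v[i] / 2;
    // "False" side: now the other lanes idle instead.
    for (int i = 0; i < kWave; ++i) if (not_taken[i]) v[i] = v[i] * 3 + 1;
}

int main() {
    Vals v{};
    for (int i = 0; i < kWave; ++i) v[i] = i;
    branch_example(v);
    std::printf("lane 0 = %d, lane 1 = %d\n", v[0], v[1]);
    return 0;
}
```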
Nit-picking (Score:3)
Nit-picking to hell...
You've forgotten a special use case:
Yes, if AC's code does something stupid like "every even thread branches left, every odd thread branches right", the execution group will need to run the code twice, with alternating masks to run each branch, exactly as you describe.
But if it's entirely different parts of the thread block that diverge (e.g., first half vs. second half), the "execution groups" will each diverge independently, the first 18 taking one branch and the other 18 taking the other branch.
Re: (Score:2)
I've written OpenCL kernels that have variable-length loops and branches, either of which could be run, and executed them in parallel.
The way this typically works is conditional (predicated) execution, just like on ARM or Itanium, except the predicate is a set of per-lane bits. This is all explained in early research papers on GPUs, such as this one from the now-amusingly-named "Lucasfilm Pixar Project" circa 1984 [cmu.edu].
Re: (Score:2)
Re: (Score:2)
New applications?
Re: (Score:2)
Likewise if doing weather processing, or geographical work, or simulations of light rays. They all involve the same calculations, just applied to different data.
Hence SAME INSTRUCTION, Multiple (or different) DATA.
Roughly, those CPUs all operate in lock-step.
MIMD is like having 10
Raspberry Pomegranate (Score:2)
Perfect for the Internet of Things. Now rather than just an egg timer, I can have a battery-powered supercomputer in my salt shaker that does a finite-element simulation of the egg in boiling water, going beep at the perfect moment. The toaster will be able to insult me in the King's English or the Emperor's Mandarin.
And Orange Pi is planning to make a board with one of these that only runs on one of the 1000 cores, with no stable OS.
This thing is, I suspect, suited for programs that parallelize and have little
Re: (Score:3)
Most programmers don't know how to code for parallel processors. At best you may get multi-threaded apps, but those are often made to handle a large load of requests, not to solve a single problem much quicker.
Re: (Score:2)
Re: (Score:2)
Most programmers seem to be coding Javascript these days.
Re: (Score:3)
Even very simple stuff with sound and images is inherently parallel. More complex modelling of physical objects is inherently parallel.
You don't get it? Imagine resizing every frame of a two-hour movie at 25 fps. That's the same operation done many times, and very trivial to do in parallel. It's just a matter of splitting the task across whatever resources you have. With sound (and thus things like seismic data as well), if you want to
Re: (Score:2)
Only in a beowulf cluster.
Re: (Score:3)
And is it really 1000 CPUs, or is it 1024 rounded down to 1000 for the press release?
1000 exactly (Score:5, Informative)
It's a 32 x 31 grid = 992, plus 8 extra stuck on one edge to make up the numbers.
Re: (Score:2)
Why not just go to 32x32 and be done with it?!
Re: (Score:2)
So not only isn't it a KibiCPU as would be expected, but it won't be a true KiloCPU either? Calling my lawyer right now to discuss remediation options.
Re: (Score:2)
Re: (Score:2)
1 kibiCPU.
Re: (Score:2)
Re: (Score:2)
Same thing I thought. And the Connection Machine died because the architecture was not actually that great.
Re:What games does this come with (Score:5, Funny)
Re: (Score:2)
Was hoping for a TIS-100 reference. Left satisfied.
Re: (Score:2)
Like this one from 2011?
https://tech.slashdot.org/story/11/01/03/1722240/researchers-claim-1000-core-chip-created [slashdot.org]