Qualcomm Debuts 10nm Server Chip To Attack Intel Server Stronghold (tomshardware.com) 110

An anonymous reader quotes a report from Tom's Hardware: Qualcomm and its Qualcomm Datacenter Technologies subsidiary announced today that the company has already begun sampling its first 10nm server processor. The Centriq 2400 is the second generation of Qualcomm server SoCs, but it is the first in its new family of 10nm FinFET processors. The Centriq 2400 features up to 48 custom Qualcomm ARMv8-compliant Falkor cores and comes a little over a year after Qualcomm began developing its first-generation Centriq processors. Qualcomm's introduction of a 10nm server chip while Intel is still refining its 14nm process appears to be a clear shot across Intel's bow, due not only to the smaller process but also to its sudden lead in core count. Intel's latest 14nm E7 Broadwell processors top out at 24 cores. Qualcomm isn't releasing more information, such as clock speeds or performance specifications, which would help to quantify the benefit of its increased core count. The server market commands the highest margins, which is certainly attractive for the mobile-centric Qualcomm, a company that found its success in the relatively low-margin smartphone segment. However, Intel has a commanding lead in the data center, with more than a 99% share of the world's server sockets, and penetrating the segment requires considerable time, investment, and ecosystem development. Qualcomm unveiled at least a small portion of its development efforts by demonstrating Apache Spark and Hadoop on Linux and Java running on the Centriq 2400 processor. The company also notes that Falkor is SBSA compliant, which means that it is compatible with any software that runs on an ARMv8-compliant server platform.
Comments Filter:
  • First post (Score:1, Informative)

    by Visarga ( 1071662 )
    It depends a lot on how fast the interconnect is and how fast memory access is.
    • In the 90s, a few companies used to make CPU-neutral motherboards. Can't someone make a server/workstation with those fast interconnects, and then give sub-vendors the choice of using either x64 or ARM? That way, they could configure servers depending on which CPU they want, based on price, ISA, et al.

  • by Anonymous Coward on Wednesday December 07, 2016 @06:34PM (#53443507)

    It takes a LOT of cache and very clever data paths to keep 48 cores fed with data. Intel cores typically have 2.5MB of local level 3 cache for each core and multiple ring buses so cores can access the whole cache and not waste precious off-chip bandwidth trying to read from main memory. If this is a special-purpose chip for executing deep learning algorithms, that's one thing, but for a general-purpose server where tasks are uncorrelated, it ain't easy to prevent stalls while cores wait for data.
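
    That kind of stall is easy to reproduce. Below is a minimal sketch, not anything from the article: it contrasts a sequential scan (which the hardware prefetcher can stream) with a dependent pointer chase through the same data (which pays roughly a full memory latency per element once the working set is far larger than the caches). The sizes and seed are illustrative assumptions.

    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    int main() {
        const std::size_t n = std::size_t{1} << 24;   // ~16M elements, well past any L3

        std::vector<int> data(n, 1);

        // Build one big random cycle over the indices (Sattolo's algorithm),
        // so the walk below visits every element exactly once via dependent loads.
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), std::size_t{0});
        std::mt19937_64 rng{42};
        for (std::size_t i = n - 1; i > 0; --i) {
            std::uniform_int_distribution<std::size_t> pick(0, i - 1);
            std::swap(next[i], next[pick(rng)]);
        }

        auto time_ms = [](auto&& body) {
            auto t0 = std::chrono::steady_clock::now();
            body();
            auto t1 = std::chrono::steady_clock::now();
            return std::chrono::duration<double, std::milli>(t1 - t0).count();
        };

        long long a = 0, b = 0;
        double scan  = time_ms([&] { for (int v : data) a += v; });
        double chase = time_ms([&] {
            std::size_t i = 0;
            for (std::size_t k = 0; k < n; ++k) { b += data[i]; i = next[i]; }
        });
        std::printf("sequential: %.1f ms  pointer chase: %.1f ms  (sums %lld %lld)\n",
                    scan, chase, a, b);
    }

    Multiply that per-core penalty by 48 uncorrelated tasks and the pressure on the cache hierarchy and memory controllers becomes the real design problem.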

    • by Anonymous Coward

      It's aimed at servers, so it's pretty safe to say it will be running 48 Apache threads with the socket code pretty much always in cache.

      Or 48 other *identical* threads servicing multiple users for the same thread type.

      • It's aimed at servers, so it's pretty safe to say it will be running 48 Apache threads with the socket code pretty much always in cache.

        Or 48 other *identical* threads servicing multiple users for the same thread type.

        Eh? Maybe you missed the whole IT thing that's been going on for like 40-ish years, but servers are used for a few things other than just Apache.

        • by raymorris ( 2726007 ) on Wednesday December 07, 2016 @08:45PM (#53443995) Journal

          It's called an "example". There are millions of servers that do almost nothing but run a bunch of Apache threads, many that do nothing but smtp, many that do nothing but nosql lookups, etc. It's very common, especially for companies with thousands of servers, to have servers dedicated to a single task.

    • OR, people will just have to learn to write code for the new memory hierarchy. It's not like this would be the first time; people had to learn to write for caches, too.
      • by Rockoon ( 1252108 ) on Thursday December 08, 2016 @01:46AM (#53444809)
        Nobody learned.

        Look at any standard library or application framework and you will not find any cache-oblivious algorithms.

        Linked lists are just traditionally implemented linked lists. Hash tables are just traditionally implemented hash tables. Trees are just traditionally implemented trees. Even sorting will be a ham-fisted quicksort.

        Pretty much only assembly language programmers give a shit, mainly because they are the only ones that understand the issues. Any exceptions you find are the exceptions that prove the rule.
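
        For contrast, here is what a cache-friendlier alternative to the "traditionally implemented" hash table looks like. This is a sketch made up for illustration, not code from any particular library, and resizing is omitted, so it assumes the table never fills. A chained table allocates a node per entry and follows list pointers on every collision; an open-addressing table probes a contiguous array, so a collision usually stays in the cache line that was already fetched.

        #include <cstddef>
        #include <functional>
        #include <optional>
        #include <vector>

        struct FlatMap {                                       // open addressing, linear probing
            struct Slot { bool used = false; int key = 0; int value = 0; };
            std::vector<Slot> slots = std::vector<Slot>(1024); // power-of-two capacity; no resizing

            void put(int key, int value) {
                std::size_t i = std::hash<int>{}(key) & (slots.size() - 1);
                while (slots[i].used && slots[i].key != key)
                    i = (i + 1) & (slots.size() - 1);          // next slot in the same array
                slots[i] = Slot{true, key, value};
            }

            std::optional<int> get(int key) const {
                std::size_t i = std::hash<int>{}(key) & (slots.size() - 1);
                while (slots[i].used) {
                    if (slots[i].key == key) return slots[i].value;
                    i = (i + 1) & (slots.size() - 1);
                }
                return std::nullopt;
            }
        };

        Hopscotch and robin-hood tables are refinements of this same layout; the point is that the probe sequence touches memory the core has usually already pulled in.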
        • You don't need it for large portions of an application's code. It's not even worth the effort for many things. Those people who need to learn it eventually learn it.
          • You don't need it for large portions of an application's code.

            We aren't talking about application code. We are talking about library code.

            If you write libraries like you write applications, then you are part of the problem.

        • Linked lists are just traditionally implemented linked lists. Hash tables are just traditionally implemented hash tables

          Linked lists suck for caches, but hash tables don't have to. There's a trend for libraries to provide things like hopscotch hash tables as the default hash table implementation and these are very much cache aware. The real problem is the trend towards languages that favour composition by reference rather than by inclusion, which means that you do a lot of pointer chasing, which is very bad for both caches and modern pipelines.
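
          A tiny sketch of that last point, with made-up Shape/Point types: composition by inclusion keeps the data in the owner's cache line, while composition by reference adds a dependent load to a separate heap allocation for every element touched.

          #include <memory>
          #include <vector>

          struct Point { double x = 0, y = 0; };

          struct ShapeInline {                      // composition by inclusion
              int id = 0;
              Point p;
          };

          struct ShapeBoxed {                       // composition by reference
              int id = 0;
              std::unique_ptr<Point> p = std::make_unique<Point>();
          };

          double sum_inline(const std::vector<ShapeInline>& v) {
              double s = 0;
              for (const auto& sh : v) s += sh.p.x;   // streams one contiguous array
              return s;
          }

          double sum_boxed(const std::vector<ShapeBoxed>& v) {
              double s = 0;
              for (const auto& sh : v) s += sh.p->x;  // one pointer chase per element
              return s;
          }

          Languages that only offer the boxed form force the second loop on you, which is exactly the cache and pipeline problem described above.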

        • Not true. Most of the stuff that programs do is totally dependent on the speed of (a) a database, (b) an online web service, or (c) a file system.

          In those cases caches are definitely used, a lot. And you get 95% of your speed gains from there.

          • Someone doesn't know what kind of cache is being talked about...

            ...but still decided to dive in with "Not true..." acting like they know something.
        • Good luck finding assembly language programmers for modern processors. Almost all have gone in the RISC direction, relying on the higher-level compilers to fill in the gaps to make the environment more CISC-like. Example - a RISC CPU doesn't have an ADD instruction, but you can implement the function by negating one of the parameters and using SUB ... a C compiler will do this for you, and will remember to flip the polarity of the carry flag and any conditionals as appropriate. It makes assembly programmi
          • by Bengie ( 1121981 )
            You don't need to write ASM to get large performance increases. I help optimize C# applications all of the time and I can regularly gain 10%-30% performance from simple refactoring that does not affect code readability and without changing the algorithm. Most of what I do is think of how the .Net code will be converted into ASM and how the Garbage Collector will be involved. While GC optimizations gain the most, re-ordering calculations can also give good returns.
  • It would be interesting since AMD cancelled their ARM efforts in the server space.

    • For a damn good reason, too. Every single attempt at an ARM server chip has failed; there have been roughly (IIRC) 4 server ARM chips that made it to sampling, and all bit the dust shortly afterward when the performance was shown to be abysmal. Personally, I believe AMD dropping the effort was a clear indicator that even with all their experience they couldn't build something that would beat x86.

      The problem isn't the instruction set, it's all the stuff bolted onto it to try to keep cores fed. Multi-threading isn't

  • Since node geometry now has more to do with marketing than it does with feature size, it's no longer a meaningful comparison. Intel's 14nm node is generally superior to TSMC's 10nm node (where the Centriq will most likely be fabbed).

    • Intel's 14nm node is generally superior to TSMC's 10nm node (where the Centriq will most likely be fabbed).

      Can you quantify that? I generally assign a 30% "marketing penalty" to TSMC. By that rule of thumb, TSMC's 10nm node is a bit better than Intel's 14nm, other things being equal, which is of course a gross simplification. IMHO, the reality is, Intel's traditional process advantage is no more. It may even be turning into a process handicap as Intel persists in going it alone in a shrinking market while others are pooling resources in growing markets.

      • Intel has known it for some time and has spent a lot of effort refining the cache down to the geometry...

        What Qualcomm does not specify is the cache size or any benchmarks... Personally, I would like nothing more than to see a mix of architectures with a standard board interface layout...

        john

      • The table here: https://www.semiwiki.com/forum... [semiwiki.com]
        gives a breakdown of the different foundries' nodes.
        As you can see, TSMC's 10nm is about 15% denser than Intel's 14nm; however, density isn't the only factor. Performance-wise, I would say Intel's 14nm is going to be better for a server chip, because it's specifically tuned for high-performance computing, while TSMC's nodes are tuned for low-power mobile SoCs.

  • Qualcomm designs chips (in my experience ARM-based, not x86) and outsources the actual manufacturing to other companies (TSMC, Samsung, whomever).

    Not really seeing how this threatens Intel outside of the whole ARM vs x86 thing. My understanding is most server farms are connected to dedicated nuclear power plants anyway, so power consumption isn't an issue. Heat dissipation? Yeah, that might be an issue.
    • It could potentially end up freeing the server space from a monopoly. You know? The thing Slashdot's always railing against.

    • by epine ( 68316 )

      You're entirely right that the memory subsystem is 90% of the battle for most server workloads once you exceed ten cores.

      For integer workloads with unreasonable parallelism and unreasonable cache locality (that Intel's AVX doesn't already handle almost ideally), I'm sure this design will smoke Intel on the thermal management envelope, a nice niche to gain Qualcomm some traction in the server mix, but hardly a shot heard around the world.

      And Qualcomm better be good, because Intel will soon respond with Omni-

    • My understanding is most server farms are connected to dedicated nuclear power plants anyway, so power consumption isn't an issue. Heat dissipation? Yeah, that might be an issue.

      With recent news that Google is shooting for 100% renewable energy for its datacentres (and many others will follow suit), I'm not quite so sure that's true any more.

    • Data center power is expensive. Mostly because it's reliable and redundant. And yes, every watt used is a watt of heat that has to be removed by the cooling system.

      Suppose it were literally true that a data center was powered by a dedicated nuclear power plant. It costs about $12 billion to build one. How many cores would you like to be able to power from your $12 billion investment? If I operated a big DC, I'd rather power a million low-power CPUs from my X gigawatts of power than only be able

    • Qualcomm designs chips, from my experience based on ARM not x86, and outsources the actual making of chips to other companies (TSMC, Samsung, whomever).

      Almost certainly not Samsung, as Samsung isn't a pure-play foundry; it's an IDM. Maybe Qualcomm can use them for some things, but probably not ARM SoCs.

    • My understanding is most server farms are connected to dedicated nuclear power plants anyway, so power consumption isn't an issue. Heat dissipation? Yeah, that might be an issue.

      Heat and power are the same issue. The conservation of energy means that power in is power out, and the power out is heat that needs to be dissipated. A rule of thumb for data centres is that for every dollar you pay in electricity for the computers, you need to pay another dollar in electricity for cooling. If you want high density, then you hit hard limits in the amount of heat that you can physically extract (faster fans hit diminishing returns quickly). This is why AMD's presence in the server room went
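
      To put that dollar-for-dollar rule in rough numbers, here is a back-of-the-envelope sketch; every figure in it is an illustrative assumption, not data from the thread (the rule of thumb corresponds to a PUE of about 2).

      #include <cstdio>

      int main() {
          const double it_load_kw     = 500.0;     // assumed server (IT) load
          const double pue            = 2.0;       // a dollar of cooling per dollar of compute
          const double usd_per_kwh    = 0.10;      // assumed electricity price
          const double hours_per_year = 24.0 * 365.0;

          const double facility_kw = it_load_kw * pue;                 // servers plus cooling/overhead
          const double annual_usd  = facility_kw * usd_per_kwh * hours_per_year;
          std::printf("facility draw: %.0f kW, annual electricity: about $%.0f\n",
                      facility_kw, annual_usd);
          // With these assumptions: 1000 kW * 8760 h * $0.10 is roughly $876,000 a year,
          // half for the servers and half for getting the heat back out.
      }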

  • Intel is only "refining" the 14nm design through the natural course of their "tick-tock" process (which has now added a third "tock", which seems likely to be due to the lack of real competition). The fact remains that Intel opened their 10nm fab in July, we're 6 months into production, and Cannonlake is on track for next year:

    Intel starts up 10nm factory [wccftech.com]
    • Intel is only "refining" the 14nm design through the natural course of their "tick-tock" process (which has now added a third "tock", which seems likely to be due to lack of real competition).

      No, it's because their 10nm test yields aren't even close to economical.

      Intel has blown their advantage. I'm sure someone will reply with "but it's not real 10nm" while being completely ignorant that not only is Intel not doing "real 14nm", they are the ones that invented lying about feature size.

      • Only none of what you said is true [semiwiki.com].

        They haven't blown their advantage, though it's certainly shrunk, and they will continue to hold it through the "10nm" node where Intel's process is actually below the standard 10nm and TSMC et al's most certainly is not.

  • And the A10 is about equal to x86 Sandy Bridge performance, so it's going to take a lot of Qualcomm cores to be competitive with each x86 core.
  • Any chance I could get a data sheet without a prohibitive NDA and the need to fork over one of my children?

    (I suspect the answer is "no.")
  • This is quite narrow. A 10 nm wide track fits fewer than 100 atoms of silicon.
    • None of the features will be as small as 10nm in size, just like none of Intel's 14nm chips have features even close to as small as 14nm.

      The current trend in labeling, since transistors started on their vertical adventures, is to extrapolate an "equivalent" feature size based on overall transistor density. These TSMC-made chips have about a 30% higher density than their "14nm" chips, just as Intel's "10nm" chips will have about a 30% higher density than their "14nm" chips.
      • Therefore a 14-layer 14nm chip would be labeled as 1nm? They will soon announce sizes smaller than a single atom!
        • Therefore a 14 layered 14nm chip would be labeled as 1nm?

          Give or take... but they are probably still 15 years away from actual 14nm feature sizes. If current features were simply reduced by a factor of 15 across the board, the smallest feature size would still be about 3nm.

      • The current trend in labeling since transistors started on their vertical adventures is to extrapolate an "equivalent" feature size based on overall transistor density.

        And when did it start? I mean what was the last non-extrapolated feature size?

  • As more than half the cores have to remain idle most of the time to keep it from overheating.

  • It makes more sense in networking gear at first. If people could rewrite their packet forwarding engine or create something like DPDK or SR-IOV for this chip, they could drop the mic. RISC usually kicks the shit out of x86 for packet forwarding.

"The vast majority of successful major crimes against property are perpetrated by individuals abusing positions of trust." -- Lawrence Dalzell

Working...