[CALUG] Looking for advice

Bryan Smith b.j.smith at ieee.org
Mon Feb 18 13:02:00 EST 2019


Open Source Rocks <opn.src.rocks at gmail.com> wrote:
> Bryan,
> Thanks for explaining, I was unaware of the AMD's advancements.

The 8087 lineage has an interesting history.  At one point most 80x87
units were sourced by Intel from AMD, and AMD made a number of refits
and improvements -- especially around precision.  This is something
that finally reared its ugly head in the Pentium (more on that in a
bit).

Moore and Sanders were colleagues at Fairchild, and AMD was the
fabbing and process leader, while Intel was more innovative.

This was apparent in AMD's 486 and K5 units versus Intel's 486 units.

The later K5 units were still AMD's FPU, just added to the NexGen
RISC86 core they acquired.  The NexGen Nx686 then became the K6, with
the AMD FPU.  The NexGen ALU design was not only 3x faster than the
486's, but the Pentium couldn't compete with it either -- in fact, a
whole slew of mis-designs that were supposed to make the Pentium more
competitive with the NexGen backfired.

Probably the saving grace of the Pentium, since its ALU was
completely defective -- don't get me started; the Pentium Pro was an
ALU re-design by Digital -- was that Intel pipelined the FPU, while
AMD had not.  This allowed Intel to 'hack' around the design flaws of
the 32-bit ALU by using a rippled set of instructions that used the
FPU to load 2 integers at a time.

Yes, they were literally using the FPU to load integers, nothing more.
The Pentium FPU had so many design flaws in other areas that it
finally caused a full recall due to completely out-of-whack
precision.  FPUs always have precision issues, but the Pentium's were
completely unacceptable.  However, games of the era still used 32-bit
integers, and the flawed, poorly performing LOAD through the ALU was
easily bypassed by a LOAD through the FPU ... even for loading
integers.
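
For reference, the divide flaw behind that recall is trivial to
reproduce; here's a minimal C sketch (my own example, using Nicely's
famous operands) -- on a flawed original Pentium the expression comes
out around 256 instead of ~0:

  /* Classic FDIV check: a correct FPU prints ~0, while a flawed
   * original Pentium is off by roughly 256, because the quotient is
   * wrong around the 5th significant digit.
   *
   *   cc fdiv_check.c -o fdiv_check && ./fdiv_check
   */
  #include <stdio.h>

  int main(void)
  {
      volatile double x = 4195835.0, y = 3145727.0;  /* volatile: compute at runtime */
      printf("%g\n", x - (x / y) * y);               /* ~0 on a correct FPU */
      return 0;
  }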

Aka the 'Pentium optimizations' that not only crippled all other
competitors and the 486, but even hurt the Pentium Pro (and
II/III/IV).  That's why no one should use Pentium optimizations unless
they actually have a genuine Pentium.  The Matrix Math eXtensions
(MMX) of the Pentium came out of a refit of the FPU to handle loading
16-bit and 32-bit integers.

Insert the long-standing debate of i486 v. i586 v. i686 optimizations
for pre-x86_64 kernels.  The few distros that went i586 didn't perform
as well on non-Pentiums as i486 builds, and i686 optimizations were
the best around for everyone ... as long as one had a 686-compatible
CPU (and especially for the later, in-order, 2-issue Atom).
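
To make that concrete, here's a tiny sketch (my own example, nothing
from any distro) of what the flag debate boils down to: the same C,
built with different -march targets, gets scheduled for different
pipelines, and -march=i686 can emit CMOV, which pre-686 CPUs simply
don't have.

  /* Build the same function for different targets and diff the asm:
   *
   *   gcc -O2 -m32 -march=i486 -S pick.c -o pick-i486.s
   *   gcc -O2 -m32 -march=i586 -S pick.c -o pick-i586.s   # Pentium pairing rules
   *   gcc -O2 -m32 -march=i686 -S pick.c -o pick-i686.s   # may use cmov
   */
  int pick(int a, int b, int flag)
  {
      /* A branch the compiler can turn into a conditional move on i686+. */
      return flag ? a : b;
  }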

AMD then, of course, over-designed the 'Hammer' to have a 3-issue,
pipelined FPU.  It was so good -- including 3x-issue ADD/MUL, while
the Pentium Pro/II/III could still only do a 2x-issue ADD or a 1x MUL
-- that it could be re-microcoded for just about any SIMD extension
Intel came up with.  That's why Intel added dedicated SIMD units in
the Pentium IV.

All the while, the SIMD units had massive precision issues, while
AMD's fully '87-compatible design could do 2x-issue '87 FPU work, and
with more precision.  It wasn't even close.
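
If you want to see the '87-versus-SIMD precision difference yourself,
here's a rough sketch (my own, and the exact output depends on
compiler and flags): x87 can keep intermediates in 80-bit registers,
while SSE math rounds every step to 64-bit doubles, so the same
expression can give different answers.

  /* Sum values of very different magnitude; the 80-bit x87 format can
   * hold the intermediate exactly, a 64-bit double cannot.
   *
   *   gcc -O2 -m32 -mfpmath=387 prec.c && ./a.out    # often prints 1
   *   gcc -O2 -mfpmath=sse -msse2 prec.c && ./a.out  # prints 0
   *
   * (Whether the 387 build really keeps the intermediate in a register
   *  depends on optimization and excess-precision settings.)
   */
  #include <stdio.h>

  int main(void)
  {
      volatile double a = 1e16, b = 1.0;   /* volatile: no constant folding */
      double x = (a + b) - a;              /* intermediate may stay in st(0) */
      printf("%g\n", x);
      return 0;
  }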

Intel finally 'got serious' with the IA-64 Itanium FPU, but it
couldn't compete even with its native EPIC and predication, x86
compatibility aside (don't get me started).

Luckily for Intel, Project Yonah took the Pentium III-turned-'mobile
32-bit' designs, in addition to the in-order, 2-issue x86 design
(which would become Atom), and serious engineering work was put into
bringing the full SIMD into the fold, even if not the FPU.  That kept
Intel ahead, as they had the market hold to tell software vendors
what extensions to optimize for.

The 2nd gen, the i-Core series, finally brought a full, improved FPU,
in addition to SIMD.  That's where Intel, for the very first time,
started to challenge AMD not just in SIMD instructions, but in actual,
double-precision FPU work.

Since then, Intel has had the lead in fabrication, so no matter how
well AMD optimizes, they've been behind ... especially after they went
fabless.  Intel's ALU/FPU/SIMD design plus its 18-24 month fabrication
lead has consistently been 15-25% faster than AMD with the same code.

Now Intel is increasingly outsourcing fabrication and hasn't
redesigned the 'Core' in a full decade, while AMD not only merged
their two design lines (one an SoC, one a CPU plus chipset) into the
new, unified, always-SoC (system-on-a-chip) 'Zen' microarchitecture,
but designed Zen2 for TSMC's 7nm process.

So ... even the 14nm Zen 'Threadripper' and, even more so, the refit
Zen+, were already killing Intel in threading -- so much so that Intel
had to re-brand their high-core-count Xeon E-series as the i9 to
compete with the 12+ core 1st gen Zen.  All while being only 8% behind
in single-core execution with typical, Intel-optimized code.

FPU-wise, it's bad for Intel (don't get me started) ... and they've
always had precision issues, which is why AMD has often been preferred
there, when precision matters.

The 7nm Zen2 arriving this year, assuming TSMC doesn't have yield
issues or other problems post-tapeout, means AMD is going to start
beating Intel pretty badly.  And Intel is already getting hammered by
ARM, to the point that Intel re-licensed ARM (after selling off its
StrongARM/XScale line to Marvell back in '06, thinking Atom was the
future).  Atom has continually been beaten by ARM, even more so after
ARM went superscalar, and Atom's redesign for superscalar left it
performing worse than an i-Core running in a lower power state
(negating any consideration for Atom).

And because Zen is an SoC, boards are dirt cheap.  Everything is
on-chip.  Heck, one really doesn't even need a socket, and I wish AMD
would just push BGA (soldered chips) on-board, because the boards then
cost almost nothing.  They already do this in their embedded and
single-die server boards.

But consumers seem to be allergic to it -- ergo, where Socket-AM1 came from.

I.e., AMD did push Socket-FT1, then FT3[b] -- the 'Socket' being a
misnomer, as it's a soldered Ball Grid Array (BGA), first 423 and then
769 balls -- with even more on-chip in the latter case.  AMD wanted to
stop producing sockets altogether and completely shift to soldered
BGA, not just in select notebooks or in embedded and industrial, but
in all low-end consumer products ... including not just cheap PC OEM
systems using Pico-ITX and Nano-ITX 'little boxes,' but Mini-ITX and
other, smaller ITX boards sold to self-assemblers.

These BGA-423/769 parts are the units in everything from the low-spec,
dual-core E1-3 to the faster, quad-core A4-10, including the pair of
units in the Xbox One and PlayStation 4.  The consoles use a variant
of FT3, two (2) of them on-board, plus GPUs on each die (for cost
considerations).  The inexpensive SoC BGA-769 itself actually supports
up to four (4) sockets, no different than servers (HyperTransport
broadcast-domain signaling).  The consoles use two (2) sockets.

Cost is king in high volume, which is why AMD would be considered over Intel.
But unlike other platforms, again, consumers didn't accept Socket-FT/BGA units.

So AMD introduced ~$30 Socket-FS1[b] boards, later renamed Socket-AM1,
which were basically a bunch of traces and a 721-pin PGA-ZIF -- a
little short of the BGA-769's count (long story; they were reusing
another design).

SIDE NOTE:  ARM's superscalar designs in 32, 64 and even higher core
counts, along with far more GPU options -- e.g., ARM's own Mali,
PowerVR (yes, they're still around, doing ARM GPUs) and others, along
with AMD and nVidia themselves (yes, they sell ARM chips with GPUs
in-die) -- will likely spell the end for Intel x86 in gaming next
decade.  Linux/aarch64 is the key, as 80% of Windows libraries are
Win32/x86-only (don't get me started).

Now, with Zen, AMD has gone full SoC on everything.  It's not just the
low end, but even the high-core-count Epyc series, which replaces all
embedded, server and other lines (often sold as the "G," short for the
prior "Geode" branding).  And it's still HyperTransport (even though
they call it something else), capable of multiple, soldered sockets.
AMD also adds ARM cores for embedded control and hypervisory functions
(long story) in the Epyc line.

We're really looking at the last era of x86-64 in these next 3 years.
By the mid-2020s, x86-64 will not only be behind, but ARM licensees,
with their greater options in GPU selection, will really become the
commodity.  The entire x86 world is surviving on inertia.

In fact, AMD reaching 7nm in Zen2 within 9 months of ARM is really
going to save x86-64 in 2019, because Intel is still 'stuck' at 10nm,
which is already late.  I mean, we're seeing even 20nm-fabricated,
high-core-count, large-L3 ARM server chips matching 14nm Xeons in
benchmarks, at much, much less power.

> My background is mostly as an OS Engineer - system administrator, systems architect - tailoring the OS to meet company standards, setting standards and policies.

I'm just a computer/UNIX geek'n weenie, but my education is EE
w/Computer Architecture (semiconductor layout), and I've done quite a
bit of embedded work, although I do far more cloud automation now.

> Now a days, I have left that field and go to the dark side - security - so not much techie stuff hands on.

My problem with security is too many vendors charging too much for
vendor lock-in.  I'm fighting that at my current employer, who seems
to buy from whichever vendor the Windows team decides upon, while
being wholly ignorant of SELinux-auditd, SSSD-PAM, Ansible-Vault and
other things.

In fact I have a meeting tomorrow morning where I'm going to ask the
vendor how their product improves over what we already use and do.

> Your explanation made sense and made me look up stuff. :) thank you.

There's a huge history here, and it's largely ugly and based on
Intel's control of the market.  But AMD has cracked it wide open
again, as i-Core is getting old.

Just like AMD rode 'Hammer' too long and Intel bested them by 2010.

> Too bad, i could not make it into the meet yesterday, had to do a chocolate and tiny gift bag stuffer for my son's school valentines day. Hopefully, next month.

I'm in the middle of Alabama again, after being in DC a while.  I'm
just not as young as I once was, and cannot work all the hours.

Now I get to argue for keeping the core of our systems open source and
open standard, which is how many others in our industry, even
regulators, work.  But for some reason, storing sudoer policies along
with OpenSSH keys in LDAP, plus taking away logins and forcing use of
Ansible-Vault, is 'too cheap' for them, and they want to insert a
vendor that charges up to $300/CAL-system and breaks OS updates.

Because Windows security is so silo'd and not built into the OS, they
don't understand.  So I end up hacking up PAM into a mess so the
various security vendors they've selected for Windows can work on
Linux ... even if it removes supportability of the platform.  That,
along with creating RPMs for maintainability, ripping their SPEC files
and scripts apart for being totally non-compliant with Fedora/Red Hat
guidelines.

And I do it, because they don't know how.  Even their vendors hit me
on LinkedIn after I do it, usually with the title 'Recruiter'
after-the-fact.  I'm sure if I wanted to make more money, I could work
for them.  Although I am getting tired of seeing people leave a
company for a security vendor right after that company makes a large
purchase of the vendor's software.

In the federal-DC world, that would be -- literally -- criminal.

- bjs

-- 
Bryan J Smith  -  http://www.linkedin.com/in/bjsmith
E-mail:  b.j.smith at ieee.org  or  me at bjsmith.me


