Intel goes to 32-bit general purpose registers

https://news.novabbs.org/devel/article-flat.php?id=33448&group=comp.arch#33448

From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Intel goes to 32-bit general purpose registers
Date: Tue, 25 Jul 2023 08:29:05 -0000 (UTC)
Organization: news.netcologne.de
Message-ID: <u9o14h$183or$1@newsreader4.netcologne.de>

by: Thomas Koenig - Tue, 25 Jul 2023 08:29 UTC

This is slightly hilarious, I assume they will have used a crowbar
to put it into their encoding (probably a two-byte prefix).

"Intel® APX doubles the number of general-purpose registers (GPRs)
from 16 to 32. This allows the compiler to keep more values in
registers; as a result, APX-compiled code contains 10% fewer loads
and more than 20% fewer stores than the same code compiled for an
Intel® 64 baseline.[2] Register accesses are not only faster, but
they also consume significantly less dynamic power than complex
load and store operations."

https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

Re: Intel goes to 32-bit general purpose registers

https://news.novabbs.org/devel/article-flat.php?id=33455&group=comp.arch#33455

Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Intel goes to 32-bit general purpose registers
Newsgroups: comp.arch
References: <u9o14h$183or$1@newsreader4.netcologne.de>
Message-ID: <d6RvM.21474$cc2c.10950@fx37.iad>
Organization: UsenetServer - www.usenetserver.com
Date: Tue, 25 Jul 2023 14:36:57 GMT

by: Scott Lurndal - Tue, 25 Jul 2023 14:36 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>This is slightly hilarious, I assume they will have used a crowbar
>to put it into their encoding (probably a two-byte prefix).

The article seems to imply a single prefix byte (REX2), which may
be combined with existing prefix bytes to alter the interpretation
of the register field in the instructions. They also claim similar
code density to existing code.

"The performance features introduced so far will have limited
impact in workloads that suffer from a large number of conditional
branch mispredictions. As out-of-order CPUs continue to become
deeper and wider, the cost of mispredictions increasingly dominates
performance of such workloads. Branch predictor improvements can
mitigate this to a limited extent only as data-dependent branches
are fundamentally hard to predict."

So, they're adding additional conditional instructions along
the same lines as the existing CSET/CMOV.

The document finishes with

"Intel® APX demonstrates the advantage of the variable-length
instruction encodings of x86 - new features enhancing
the entire instruction set can be defined with only incremental
changes to the instruction-decode hardware."

Re: Intel goes to 32-bit general purpose registers

https://news.novabbs.org/devel/article-flat.php?id=33456&group=comp.arch#33456

Newsgroups: comp.arch
Date: Tue, 25 Jul 2023 07:39:59 -0700 (PDT)
In-Reply-To: <u9o14h$183or$1@newsreader4.netcologne.de>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
Message-ID: <0cf10981-cb89-4d23-b715-d09dcf84fc34n@googlegroups.com>
Subject: Re: Intel goes to 32-bit general purpose registers
From: jsavard@ecn.ab.ca (Quadibloc)

by: Quadibloc - Tue, 25 Jul 2023 14:39 UTC

On Tuesday, July 25, 2023 at 2:29:08 AM UTC-6, Thomas Koenig wrote:
> This is slightly hilarious, I assume they will have used a crowbar
> to put it into their encoding (probably a two-byte prefix).
>
> "Intel® APX doubles the number of general-purpose registers (GPRs)
> from 16 to 32. This allows the compiler to keep more values in
> registers; as a result, APX-compiled code contains 10% fewer loads
> and more than 20% fewer stores than the same code compiled for an
> Intel® 64 baseline.[2] Register accesses are not only faster, but
> they also consume significantly less dynamic power than complex
> load and store operations."
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

Oh, wow, Intel will be just as good as RISC!

Of course, there is room for further improvement. They already have, on the shelf, technology to
increase to 128 registers. If they find a way to improve performance still further when switching
to Itanium mode, then they will really dominate the field!

John Savard

Re: Intel goes to 32-bit general purpose registers

https://news.novabbs.org/devel/article-flat.php?id=33457&group=comp.arch#33457

Newsgroups: comp.arch
Date: Tue, 25 Jul 2023 08:00:03 -0700 (PDT)
In-Reply-To: <d6RvM.21474$cc2c.10950@fx37.iad>
References: <u9o14h$183or$1@newsreader4.netcologne.de> <d6RvM.21474$cc2c.10950@fx37.iad>
Message-ID: <db82b799-811e-413e-b838-98b85b96a3b1n@googlegroups.com>
Subject: Re: Intel goes to 32-bit general purpose registers
From: MitchAlsup@aol.com (MitchAlsup)

by: MitchAlsup - Tue, 25 Jul 2023 15:00 UTC

On Tuesday, July 25, 2023 at 9:37:01 AM UTC-5, Scott Lurndal wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
> >This is slightly hilarious, I assume they will have used a crowbar
> >to put it into their encoding (probably a two-byte prefix).
> The article seems to imply a single prefix byte (REX2), which may
> be combined with existing prefix bytes to alter the interpretation
> of the register field in the instructions. They also claim similar
> code density to existing code.
<
Right, they save instructions but eat bytes in the instruction stream.
Net density is a wash.
<
When I was at AMD we looked at using the (then) REX prefix twice
to gain access to 32 registers.
<
But, here, Intel added another prefix to go from Ra = Ra op Rb into
Ra = Rb op Rc. {in case you missed it}
>
> "The performance features introduced so far will have limited
> impact in workloads that suffer from a large number of conditional
> branch mispredictions. As out-of-order CPUs continue to become
> deeper and wider, the cost of mispredictions increasingly dominates
> performance of such workloads. Branch predictor improvements can
> mitigate this to a limited extent only as data-dependent branches
> are fundamentally hard to predict."
>
> So, they're adding additional conditional instructions along
> the same lines as the existing CSET/CMOV.
>
> The document finishes with
>
> "Intel® APX demonstrates the advantage of the variable-length
> instruction encodings of x86 - new features enhancing
> the entire instruction set can be defined with only incremental
> changes to the instruction-decode hardware."
<
I have been advocating since about 2015 that RISC should go
with variable length instructions.

Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)

https://news.novabbs.org/devel/article-flat.php?id=33467&group=comp.arch#33467

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)
Date: Tue, 25 Jul 2023 15:55:01 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2023Jul25.175501@mips.complang.tuwien.ac.at>
References: <u9o14h$183or$1@newsreader4.netcologne.de>

by: Anton Ertl - Tue, 25 Jul 2023 15:55 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>This is slightly hilarious, I assume they will have used a crowbar
>to put it into their encoding (probably a two-byte prefix).

Somewhat. They have a 2-byte REX2 prefix for extending the existing
destructive instructions, which looks like what one would expect; the
bit3 bits are in the same place as in the REX prefix, and the bit4
bits are adjacent to them, which makes assembling and disassembling
slightly more complicated than if they had had two-bit fields.
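To make that layout concrete, here is a small C sketch that pulls the register-number bits out of a REX2 prefix. It assumes the payload byte following the D5 prefix byte is laid out, from bit 7 down to bit 0, as M0 R4 X4 B4 W R3 X3 B3 (our reading of Intel's APX documentation, not something stated in the post); the type and function names are hypothetical. Under that assumption the low nibble lines up with a legacy REX byte (0100WRXB), while the bit4 bits sit in the upper nibble, so each 5-bit register number is gathered from two separate bits rather than read as one two-bit field.

```c
#include <stdint.h>

/* Decoded REX2 payload (the byte after the 0xD5 prefix byte).
   ASSUMED layout, bit 7 .. bit 0: M0 R4 X4 B4 W R3 X3 B3. */
typedef struct {
    unsigned m0;          /* 0: legacy map 0, 1: legacy map 1 (0F)        */
    unsigned r4, x4, b4;  /* new bit 4 for reg, index and base/rm         */
    unsigned w;           /* 64-bit operand size, same place as REX.W     */
    unsigned r3, x3, b3;  /* bit 3, same places as REX.R, REX.X, REX.B    */
} Rex2;

Rex2 rex2_decode(uint8_t p) {
    Rex2 r;
    r.m0 = (p >> 7) & 1;
    r.r4 = (p >> 6) & 1;
    r.x4 = (p >> 5) & 1;
    r.b4 = (p >> 4) & 1;
    r.w  = (p >> 3) & 1;
    r.r3 = (p >> 2) & 1;
    r.x3 = (p >> 1) & 1;
    r.b3 = p & 1;
    return r;
}

/* 5-bit register numbers (0..31): two separate extension bits placed
   on top of the 3-bit field from ModRM or SIB. */
unsigned rex2_reg(Rex2 r, uint8_t modrm) {          /* ModRM.reg */
    return (r.r4 << 4) | (r.r3 << 3) | ((modrm >> 3) & 7u);
}

unsigned rex2_rm(Rex2 r, uint8_t modrm) {           /* register-direct ModRM.rm */
    return (r.b4 << 4) | (r.b3 << 3) | (modrm & 7u);
}

unsigned rex2_index(Rex2 r, uint8_t sib) {          /* SIB.index */
    return (r.x4 << 4) | (r.x3 << 3) | ((sib >> 3) & 7u);
}
```

For example, a payload of 0x48 sets R4 and W, so ModRM byte 0xC8 (reg field 001) names register 17 in its reg field.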

They also have a 4-byte NDD prefix (from the EVEX encoding space) that
encodes the additional bits of the existing registers and a whole new
destination register to turn existing destructive instructions into
non-destructive instructions. There the encodings of many, but not
all bits are inverted, and the bits are distributed in an interesting
way (probably due to how EVEX has been previously defined, and how VEX
has been defined).

So an instruction with a REX2 prefix is at least three bytes long (and
five bytes is probably more typical). An instruction with a new
destination is at least two bytes longer. That's often still shorter
than prefixing the instruction with a register-register move (the
current idiom if you don't want a non-destructive use).
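The length arithmetic above can be checked with a few lines of C. This is a simplified count for a register-register operation, assuming a one-byte map-0 opcode (such as ADD or MOV) plus a ModRM byte and no other prefixes, SIB, displacement or immediate; the 2-byte REX2 and 4-byte NDD prefix sizes are the ones given in this post, and the names are ours. Under these assumptions the NDD form is 6 bytes: 2 bytes longer than the 4-byte REX2 form, 2 bytes shorter than a REX2 mov plus a REX2 operation (8 bytes), equal to a REX-prefixed pair (6 bytes), and longer than an unprefixed 32-bit pair (4 bytes), which is why it is "often" rather than always shorter.

```c
/* Byte counts for a register-register instruction, ASSUMING a one-byte
   map-0 opcode plus ModRM and no other prefixes, SIB, displacement or
   immediate. Names are illustrative only. */
enum {
    NO_PREFIX  = 0,
    REX_LEN    = 1,  /* legacy REX: 0100WRXB                     */
    REX2_LEN   = 2,  /* 0xD5 + payload                           */
    NDD_LEN    = 4,  /* EVEX-space prefix with a new destination */
    OPCODE_LEN = 1,
    MODRM_LEN  = 1
};

/* One destructive instruction, rd = rd op rs, with the given prefix. */
int destructive_len(int prefix_len) {
    return prefix_len + OPCODE_LEN + MODRM_LEN;
}

/* Non-destructive rd = ra op rb using the NDD prefix. */
int ndd_len(void) {
    return NDD_LEN + OPCODE_LEN + MODRM_LEN;
}

/* Today's idiom: "mov rd, ra" followed by "op rd, rb". */
int mov_then_op_len(int prefix_len) {
    return 2 * destructive_len(prefix_len);
}
```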

>https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

So in 20 years we may see this become the new architecture,
or maybe not. Some time ago people proposed the x32 ABI, a 32-bit ABI
using the AMD64 instruction set. This was sold on better performance
than IA-32 (because of more registers) and AMD64 (because of fewer
cache misses). It did not take off.

So when in 20 years, APX is universally available, will people switch
to a new ABI for it? Doubtful.

Intel seems to doubt it themselves, so they recommend that the new
registers be caller-saved for compatibility with existing ABIs. If
that recommendation is followed, it also means that the additional
registers will not be as useful as they might otherwise be.

E.g., GCC does not use caller-saved registers for virtual-machine
registers in Gforth, even if they are used hundreds of times in
engine(), because these registers live across dozens of calls. As a
result Gforth could not use more registers for virtual-machine
registers on Alpha than on AMD64 and barely more than on IA-32. So if
the recommendation is followed, the additional registers will be
hardly used in Gforth's engine().
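The situation described here can be sketched as a toy stack-machine interpreter in C; this is a hypothetical example, not Gforth's engine(). The virtual-machine registers ip and sp are used on every dispatch and stay live across the call to the out-of-line helper vm_print(), so a compiler can keep them in registers through that call only if those registers are callee-saved; in caller-saved registers they would have to be saved and reloaded around every such call, which is why, as reported above for GCC, they end up elsewhere. noinline is a GCC/Clang attribute, used here only to keep the call from being inlined away.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef intptr_t Cell;

/* Hypothetical opcodes of a toy stack VM. */
enum { OP_LIT, OP_ADD, OP_PRINT, OP_HALT };

/* Out-of-line helper; noinline (GCC/Clang) keeps this a real call. */
__attribute__((noinline)) void vm_print(Cell x) {
    printf("%ld\n", (long)x);
}

/* Interprets program and returns the top of the data stack at OP_HALT. */
Cell engine(const Cell *program) {
    Cell stack[64];
    const Cell *ip = program;  /* VM instruction pointer             */
    Cell *sp = stack;          /* VM stack pointer, one past the top */

    for (;;) {
        switch (*ip++) {
        case OP_LIT:
            assert(sp < stack + 64);
            *sp++ = *ip++;
            break;
        case OP_ADD:           /* replace the top two cells by their sum */
            sp--;
            sp[-1] += sp[0];
            break;
        case OP_PRINT:         /* ip and sp are live across this call */
            vm_print(sp[-1]);
            break;
        case OP_HALT:
            return sp[-1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

With more callee-saved registers in the ABI, ip and sp (and, in a real engine, several more VM registers) could stay in registers for the whole loop.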

OTOH, given the x32 experience, it's unlikely that we will see an
alternative ABI.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Intel goes to 32-bit general purpose registers

https://news.novabbs.org/devel/article-flat.php?id=33481&group=comp.arch#33481

Newsgroups: comp.arch
Date: Tue, 25 Jul 2023 20:49:03 -0700 (PDT)
In-Reply-To: <db82b799-811e-413e-b838-98b85b96a3b1n@googlegroups.com>
References: <u9o14h$183or$1@newsreader4.netcologne.de> <d6RvM.21474$cc2c.10950@fx37.iad>
<db82b799-811e-413e-b838-98b85b96a3b1n@googlegroups.com>
Message-ID: <989dc353-b18d-4a40-8b61-11f65b4b1bd0n@googlegroups.com>
Subject: Re: Intel goes to 32-bit general purpose registers
From: jsavard@ecn.ab.ca (Quadibloc)

by: Quadibloc - Wed, 26 Jul 2023 03:49 UTC

On Tuesday, July 25, 2023 at 9:00:07 AM UTC-6, MitchAlsup wrote:

> Right, they save instructions but eat bytes in the instruction stream.
> Net density is a wash.

I looked into this new instruction set a bit more.

Intel recognized this issue. So for your prefix byte, instead of *just*
getting to use 32 registers instead of 16, you *also* get to refer to
three registers instead of just two in the instruction!

That may help a bit, but I suspect it won't be enough. However, it
could be that the instructions with 16 registers are stuff that came in
with the 386, and so those instructions had a prefix byte too. In which
case, the new instructions could just have a different prefix byte, and
not be a byte longer.

John Savard

Re: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)

https://news.novabbs.org/devel/article-flat.php?id=33482&group=comp.arch#33482

Newsgroups: comp.arch
Date: Tue, 25 Jul 2023 20:58:10 -0700 (PDT)
In-Reply-To: <2023Jul25.175501@mips.complang.tuwien.ac.at>
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at>
Message-ID: <c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
Subject: Re: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)
From: jsavard@ecn.ab.ca (Quadibloc)

by: Quadibloc - Wed, 26 Jul 2023 03:58 UTC

On Tuesday, July 25, 2023 at 10:44:12 AM UTC-6, Anton Ertl wrote:

> So when in 20 years, APX is universally available, will people switch
> to a new ABI for it? Doubtful.

You're talking about this as if people have a _choice_.

Instead, it will probably go like this:

Intel puts this new feature in all their products next year.

Ten years from that, Microsoft starts using it to make Windows
run faster, and the hardware requirements for the latest upgrade
require a processor not more than ten years old from Intel.

Therefore, Intel will have immense leverage over AMD to get
them to license this new feature, so that they can stay in business
ten years from now.

John Savard

Re: Intel goes to 32-bit general purpose registers

https://news.novabbs.org/devel/article-flat.php?id=33483&group=comp.arch#33483

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32-bit general purpose registers
Date: Wed, 26 Jul 2023 05:27:12 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2023Jul26.072712@mips.complang.tuwien.ac.at>
References: <u9o14h$183or$1@newsreader4.netcologne.de> <d6RvM.21474$cc2c.10950@fx37.iad> <db82b799-811e-413e-b838-98b85b96a3b1n@googlegroups.com> <989dc353-b18d-4a40-8b61-11f65b4b1bd0n@googlegroups.com>

by: Anton Ertl - Wed, 26 Jul 2023 05:27 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>Intel recognized this issue. So for your prefix byte, instead of *just*
>getting to use 32 registers instead of 16, you *also* get to refer to
>three registers instead of just two in the instruction!

No. IA-32 already has three registers in the instruction, e.g.,

add ebx, 123[ecx+8*edx]

So AMD64 extended this with the REX prefix to support

add r8, 123[r9+8*r10]

and provides one extra bit for every register (as well as one bit for
indicating 64-bit operation).
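For comparison, the AMD64 REX scheme can be sketched in C. A REX byte has the form 0100WRXB; REX.R, REX.X and REX.B each supply bit 3 of the register named by ModRM.reg, the SIB index and the SIB base, giving 16 registers. The function names are hypothetical, and the test bytes 4F 03 44 D1 7B are our own hand assembly of add r8, 123[r9+8*r10], worth double-checking against an assembler.

```c
#include <assert.h>
#include <stdint.h>

/* A legacy REX prefix is one byte of the form 0100WRXB (0x40..0x4F). */
int is_rex(uint8_t b) { return (b & 0xF0) == 0x40; }

int rex_w(uint8_t rex) { return (rex >> 3) & 1; }  /* 64-bit operand size */

/* Each function returns a 4-bit register number (0..15). The special
   meanings of SIB index 100 and base 101 are ignored in this sketch. */
unsigned rex_reg(uint8_t rex, uint8_t modrm) {     /* REX.R + ModRM.reg */
    assert(is_rex(rex));
    return (((rex >> 2) & 1u) << 3) | ((modrm >> 3) & 7u);
}

unsigned rex_index(uint8_t rex, uint8_t sib) {     /* REX.X + SIB.index */
    assert(is_rex(rex));
    return (((rex >> 1) & 1u) << 3) | ((sib >> 3) & 7u);
}

unsigned rex_base(uint8_t rex, uint8_t sib) {      /* REX.B + SIB.base */
    assert(is_rex(rex));
    return ((rex & 1u) << 3) | (sib & 7u);
}
```

Here REX byte 0x4F sets W, R, X and B at once, turning eax/ecx/edx-style 3-bit fields into r8, r9 and r10.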

With REX2 (2 bytes) you get two extra bits for each of these
registers, for a total of 5 bits.

Alternatively, you can use the NDD prefix from the EVEX encoding space
(4 bytes) to encode these 6 extra bits, plus five bits for a new
destination register.

The REX2 prefix contains one bit that selects between legacy map 0 and
1 (legacy maps 2 and 3 cannot use the REX2 prefix). Map 1 consists of the
instructions that start with a 0F escape, but not 0F38 or 0F3A (maps
2 and 3). So with the REX2 prefix you replace the 0F prefix of the
map-1 instructions. That's the advantage of having the REX2 prefix
over having two REX prefixes, as AMD considered according to Mitch
Alsup. The disadvantage is that it occupies the D5 byte of AMD64
(which is AAD in IA-32, but apparently has been reserved in AMD64).

The NDD prefix contains three bits for opcode maps and replaces the 0F
prefix byte (so it costs only three extra bytes in this case) as well
as the 0F38 and 0F3A prefixes (costing only two extra bytes in this
case). There is also space for four additional opcode maps when using
the NDD prefix (not used up to now).

I am not versed enough in the encoding of IA-32/AMD64 to know how
frequently the different opcode maps are used.

>However, it
>could be that the instructions with 16 registers are stuff that came in
>with the 386, and so those instructions had a prefix byte too.

No, the extension from 8+8 (GPR+SSE) registers (IA-32) to 16+16
registers came with AMD64, and yes, AMD64 used the REX prefix for
that.

>In which
>case, the new instructions could just have a different prefix byte, and
>not be a byte longer.

Instructions from map 0 become one byte longer with the REX2 prefix
than with the REX prefix. Instructions from map 1 (0F prefix) stay
the same length with the REX2 prefix as with the REX prefix.
Instructions from map 2 and map 3 become one byte longer with the NDD
prefix than with the REX prefix.
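These length comparisons can be expressed as a small C sketch. It follows the counting in this post and deliberately ignores other legacy prefixes, SIB, displacements and immediates, counting only the prefix, escape bytes, a one-byte opcode and ModRM; the enum and function names are hypothetical.

```c
/* Opcode maps as discussed above; names are illustrative. */
typedef enum { MAP0, MAP1, MAP2, MAP3 } OpMap;

/* Escape bytes in front of the opcode in the legacy encoding. */
int escape_len(OpMap m) {
    switch (m) {
    case MAP0: return 0;      /* no escape */
    case MAP1: return 1;      /* 0F        */
    case MAP2:                /* 0F 38     */
    case MAP3: return 2;      /* 0F 3A     */
    }
    return -1;
}

/* Escape + opcode + ModRM, with no prefix at all. */
int len_plain(OpMap m) { return escape_len(m) + 1 + 1; }

/* REX (1 byte) keeps the escape bytes. */
int len_rex(OpMap m) { return 1 + len_plain(m); }

/* REX2 (2 bytes): its M0 bit selects map 0 or 1, replacing the 0F
   escape; maps 2 and 3 cannot use REX2, signalled here by -1. */
int len_rex2(OpMap m) { return (m == MAP0 || m == MAP1) ? 2 + 1 + 1 : -1; }

/* NDD prefix (4 bytes): its map field replaces any escape sequence. */
int len_ndd(OpMap m) { (void)m; return 4 + 1 + 1; }
```

Under this counting, map-0 code pays one extra byte for REX2, map-1 code breaks even, and maps 2 and 3 pay one extra byte for NDD, matching the three cases above; the same functions also reproduce the "three extra bytes" (map 1) and "two extra bytes" (maps 2 and 3) for NDD relative to no prefix.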

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)

https://news.novabbs.org/devel/article-flat.php?id=33484&group=comp.arch#33484

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)
Date: Wed, 26 Jul 2023 06:04:43 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2023Jul26.080443@mips.complang.tuwien.ac.at>
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at> <c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>

by: Anton Ertl - Wed, 26 Jul 2023 06:04 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>On Tuesday, July 25, 2023 at 10:44:12 AM UTC-6, Anton Ertl wrote:
>
>> So when in 20 years, APX is universally available, will people switch
>> to a new ABI for it? Doubtful.
>
>You're talking about this as if people have a _choice_.
>
>Instead, it will probably go like this:
>
>Intel puts this new feature in all their products next year.

I wish. AVX and AVX512 tell a different story. Intel still sells
CPUs (e.g., the Pentium Gold G6400) that willfully do not support AVX
(even though the hardware is capable of it). Maybe Intel marketing
will decide that APX is a premium feature and will disable it in the
less-expensive CPUs for a decade or two.

Meanwhile, AMD will put it into Zen6 or so, and support it across all
their CPUs from that point onwards, like they have done for AVX since
Bulldozer (2011) and Jaguar (2013), for AVX2 since Excavator (2015),
and for AVX-512 since Zen4 (2022).

>Ten years from that, Microsoft starts using it to make Windows
>run faster, and the hardware requirements for the latest upgrade
>require a processor not more than ten years old from Intel.
>
>Therefore, Intel will have immense leverage over AMD to get
>them to license this new feature, so that they can stay in business
>ten years from now.

Unlikely. Microsoft wants hardware at commodity prices in order to
make its software the premium feature, so they won't support APX
unless AMD also has it, because they don't want to give Intel more
monopoly power than it already has. So if Intel does not license APX
to AMD, Microsoft won't use APX, and most other software vendors will
not use it, either (except those paid by Intel; that would be the
premium-feature scenario, but then Intel probably won't support it on
their cheaper CPUs, either).

Apart from that Intel and AMD have patent exchange agreements, and AMD
probably has enough patents that are relevant for Intel that they
don't want to stop extending these agreements, so Intel probably
licenses APX to AMD as part of these agreements.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Intel goes to 32 GPRs

https://news.novabbs.org/devel/article-flat.php?id=33485&group=comp.arch#33485

From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 11:16:36 +0200
Organization: A noiseless patient Spider
Message-ID: <u9qo9l$1f22o$1@dont-email.me>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
In-Reply-To: <c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>

by: Terje Mathisen - Wed, 26 Jul 2023 09:16 UTC

Quadibloc wrote:
> On Tuesday, July 25, 2023 at 10:44:12 AM UTC-6, Anton Ertl wrote:
>
>> So when in 20 years, APX is universally available, will people switch
>> to a new ABI for it? Doubtful.
>
> You're talking about this as if people have a _choice_.
>
> Instead, it will probably go like this:
>
> Intel puts this new feature in all their products next year.
>
> Ten years from that, Microsoft starts using it to make Windows
> run faster, and the hardware requirements for the latest upgrade
> require a processor not more than ten years old from Intel.

Possibly true.
>
> Therefore, Intel will have immense leverage over AMD to get
> them to license this new feature, so that they can stay in business
> ten years from now.

Rather the opposite: Intel have to persuade AMD to stay compatible,
otherwise there is zero chance of Microsoft ever making it a requirement.

BTW, when you guys are discussing 16 vs 32 general registers and 3 vs
2-reg instruction forms, you just have to look at the AVX2++
instruction sets! The SIMD part of the CPUs got both of these features
several years ago; allowing the same for integer regs is probably a very
small update to the instruction decoders. (Unless the encodings are
incompatible, which I would dismiss as very unlikely.)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Intel goes to 32 GPRs

https://news.novabbs.org/devel/article-flat.php?id=33487&group=comp.arch#33487

From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 10:12:08 -0000 (UTC)
Organization: news.netcologne.de
Message-ID: <u9qrho$19u0h$1@newsreader4.netcologne.de>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me>

by: Thomas Koenig - Wed, 26 Jul 2023 10:12 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Quadibloc wrote:
>> On Tuesday, July 25, 2023 at 10:44:12 AM UTC-6, Anton Ertl wrote:
>>
>>> So when in 20 years, APX is universally available, will people switch
>>> to a new ABI for it? Doubtful.
>>
>> You're talking about this as if people have a _choice_.
>>
>> Instead, it will probably go like this:
>>
>> Intel puts this new feature in all their products next year.
>>
>> Ten years from that, Microsoft starts using it to make Windows
>> run faster, and the hardware requirements for the latest upgrade
>> require a processor not more than ten years old from Intel.
>
> Possibly true.
>>
>> Therefore, Intel will have immense leverage over AMD to get
>> them to license this new feature, so that they can stay in business
>> ten years from now.
>
> Rather the opposite: Intel have to persuade AMD to stay compatible,
> otherwise there is zero chance of Microsoft ever making it a requirement.

One problem I see is that all the new registers are caller-saved,
for compatibility with existing ABIs. This is needed due to stack
unwinding and setjmp/longjmp, but restricts their benefit due to
having to spill them across function calls. It might be possible
to set __attribute__((nothrow)) on functions where this cannot
happen, and change some caller-saved to callee-saved registers
in that case, but that could be an interesting discussion.

Re: Intel goes to 32 GPRs

https://news.novabbs.org/devel/article-flat.php?id=33488&group=comp.arch#33488

From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 13:38:21 +0200
Organization: A noiseless patient Spider
Message-ID: <u9r0jd$1ftnb$1@dont-email.me>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de>
In-Reply-To: <u9qrho$19u0h$1@newsreader4.netcologne.de>

by: Terje Mathisen - Wed, 26 Jul 2023 11:38 UTC

Thomas Koenig wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>> Quadibloc wrote:
>>> On Tuesday, July 25, 2023 at 10:44:12 AM UTC-6, Anton Ertl wrote:
>>>
>>>> So when in 20 years, APX is universally available, will people switch
>>>> to a new ABI for it? Doubtful.
>>>
>>> You're talking about this as if people have a _choice_.
>>>
>>> Instead, it will probably go like this:
>>>
>>> Intel puts this new feature in all their products next year.
>>>
>>> Ten years from that, Microsoft starts using it to make Windows
>>> run faster, and the hardware requirements for the latest upgrade
>>> require a processor not more than ten years old from Intel.
>>
>> Possibly true.
>>>
>>> Therefore, Intel will have immense leverage over AMD to get
>>> them to license this new feature, so that they can stay in business
>>> ten years from now.
>>
>> Rather the opposite: Intel have to persuade AMD to stay compatible,
>> otherwise there is zero chance of Microsoft ever making it a requirement.
>
> One problem I see is that all the new registers are caller-saved,
> for compatibility with existing ABIs. This is needed due to stack
> unwinding and setjmp/longjmp, but restricts their benefit due to
> having to spill them across function calls. It might be possible
> to set __attribute__((nothrow)) on functions where this cannot
> happen, and change some caller-saved to callee-saved registers
> in that case, but that could be an interesting discussion.
>

I'm not worried at all about this point: The only places where I really
want lots of registers are in big/complicated leaf functions!

If a function both needs lots of registers _and_ has to call any
non-inlined functions, then it really isn't that time critical.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Intel goes to 32 GPRs

<u9r24c$1a2ag$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33489&group=comp.arch#33489

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-29be-0-b487-e986-3c8b-cde2.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 12:04:28 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u9r24c$1a2ag$1@newsreader4.netcologne.de>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de>
<u9r0jd$1ftnb$1@dont-email.me>
Injection-Date: Wed, 26 Jul 2023 12:04:28 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-29be-0-b487-e986-3c8b-cde2.ipv6dyn.netcologne.de:2001:4dd6:29be:0:b487:e986:3c8b:cde2";
logging-data="1378640"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Wed, 26 Jul 2023 12:04 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Thomas Koenig wrote:

>> One problem I see is that all the new registers are caller-saved,
>> for compatibility with existing ABIs. This is needed due to stack
>> unwinding and setjmp/longjmp, but restricts their benefit due to
>> having to spill them across function calls. It might be possible
>> to set __attribute__((nothrow)) on functions where this cannot
>> happen, and change some caller-saved to callee-saved registers
>> in that case, but that could be an interesting discussion.
>>
>
> I'm not worried at all about this point: The only places where I really
> want lots of registers are in big/complicated leaf functions!
>
> If a function both needs lots of registers _and_ has to call any
> non-inlined functions, then it really isn't that time critical.

Fortran can use lots of registers for its array descriptors,
and also can use lots of library calls for mathematical functions
(because most CPUs don't have Mitch's single instructions for them).
Fortran library functions are typically __attribute__((nothrow)),
so in that field being able to use more registers across calls
would be a good thing, generally.

Re: Intel goes to 32 GPRs

<b8211322-affd-452a-ad9d-46dd46c705d2n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33490&group=comp.arch#33490

X-Received: by 2002:a05:620a:2150:b0:765:ada6:5733 with SMTP id m16-20020a05620a215000b00765ada65733mr6217qkm.10.1690389305509;
Wed, 26 Jul 2023 09:35:05 -0700 (PDT)
X-Received: by 2002:a05:6870:a89d:b0:1bb:7126:4ddc with SMTP id
eb29-20020a056870a89d00b001bb71264ddcmr45998oab.2.1690389305204; Wed, 26 Jul
2023 09:35:05 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 26 Jul 2023 09:35:04 -0700 (PDT)
In-Reply-To: <u9r0jd$1ftnb$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6dba:c66b:d76:4185;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6dba:c66b:d76:4185
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me>
<u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b8211322-affd-452a-ad9d-46dd46c705d2n@googlegroups.com>
Subject: Re: Intel goes to 32 GPRs
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 26 Jul 2023 16:35:05 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4379

by: MitchAlsup - Wed, 26 Jul 2023 16:35 UTC

On Wednesday, July 26, 2023 at 6:38:25 AM UTC-5, Terje Mathisen wrote:
> Thomas Koenig wrote:
> > Terje Mathisen <terje.m...@tmsw.no> schrieb:
> >> Quadibloc wrote:
> >>> On Tuesday, July 25, 2023 at 10:44:12 AM UTC-6, Anton Ertl wrote:
> >>>
> >>>> So when in 20 years, APX is universally available, will people switch
> >>>> to a new ABI for it? Doubtful.
> >>>
> >>> You're talking about this as if people have a _choice_.
> >>>
> >>> Instead, it will probably go like this:
> >>>
> >>> Intel puts this new feature in all their products next year.
> >>>
> >>> Ten years from that, Microsoft starts using it to make Windows
> >>> run faster, and the hardware requirements for the latest upgrade
> >>> require a processor not more than ten years old from Intel.
> >>
> >> Possibly true.
> >>>
> >>> Therefore, Intel will have immense leverage over AMD to get
> >>> them to license this new feature, so that they can stay in business
> >>> ten years from now.
> >>
> >> Rather the opposite: Intel have to persuade AMD to stay compatible,
> >> otherwise there is zero chance of Microsoft ever making it a requirement.
> >
> > One problem I see is that all the new registers are caller-saved,
> > for compatibility with existing ABIs. This is needed due to stack
> > unwinding and setjmp/longjmp, but restricts their benefit due to
> > having to spill them across function calls. It might be possible
> > to set __attribute__((nothrow)) on functions where this cannot
> > happen, and change some caller-saved to callee-saved registers
> > in that case, but that could be an interesting discussion.
> >
> I'm not worried at all about this point: The only places where I really
> want lots of registers are in big/complicated leaf functions!
<
This is also my impression, non-leaf functions (but see caveat)
generally guide leaf functions through the calculations, while
leaf functions tend to grind on the data more than walk the
data structures.
<
Caveat 1: Elementary functions are used a lot in what would have
been a leaf function except for these intrinsics. {SIN(), COS(),
EXP(), Ln(), POW(,), ...} Here the compiler could KNOW that those
registers are not damaged by that callee and pretend they are
callee-save. Making these OpCodes instead of Calls simplifies
this.
>
Caveat 2: Compilers could be given an __attribute__(NoDamage)
and not use the extended registers, thus alleviating the need to
save and/or restore.
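Caveat 1 amounts to interprocedural register allocation: if the compiler knows exactly which registers a callee clobbers, it only needs to save the live values held in those registers. GCC already does something similar with `-fipa-ra` for callees compiled earlier in the same translation unit. The sketch below takes the System V x86-64 caller-saved GPRs plus r16-r31 as the ABI default. The clobber set for the math routine is invented for illustration. A real x86-64 `sin()` takes its argument in an XMM register, so its clobber set would also list vector registers, which this sketch does not track.

```python
from typing import Optional

# Caller-saved ("volatile") GPRs in the System V x86-64 ABI, plus the
# APX registers r16-r31, which are also caller-saved.
ABI_CALLER_SAVED = {"rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"} | {
    f"r{i}" for i in range(16, 32)
}


def regs_to_save(live: set, callee_clobbers: Optional[set] = None) -> set:
    """Registers holding live values that the caller must save around one call.

    Unknown callee: assume it clobbers every caller-saved register (the ABI
    worst case). Known callee: only its actual clobber set matters.
    """
    clobbered = ABI_CALLER_SAVED if callee_clobbers is None else callee_clobbers
    return live & clobbered


# Hypothetical: a hand-written sin() known to touch only these two GPRs.
SIN_CLOBBERS = {"rax", "rcx"}

live_regs = {"rbx", "rax", "r16", "r17", "r18"}
print(sorted(regs_to_save(live_regs)))                # ['r16', 'r17', 'r18', 'rax']
print(sorted(regs_to_save(live_regs, SIN_CLOBBERS)))  # ['rax']
```

Caveat 2 is the same idea seen from the callee's side. A callee compiled to promise a small clobber set lets every caller shrink its save set in the same way.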
<
> If a function both needs lots of registers _and_ has to call any
> non-inlined functions, then it really isn't that time critical.
<
The farther from the leaves you are, the more the code looks like
book-keeping and looping.
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Intel goes to 32 GPRs

<2023Jul26.221142@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=33492&group=comp.arch#33492

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 20:11:42 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 20
Message-ID: <2023Jul26.221142@mips.complang.tuwien.ac.at>
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at> <c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="6d2084336482f7ceb567b40f7314e9ba";
logging-data="1674766"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/80JAdMgyD4MXG1vvvCsO"
Cancel-Lock: sha1:q7Wy96TAefsAS2RaF8dKIhzn9uU=
X-newsreader: xrn 10.11

by: Anton Ertl - Wed, 26 Jul 2023 20:11 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>If a function both needs lots of registers _and_ has to call any
>non-inlined functions, then it really isn't that time critical.

Every interpreter calls non-inlined functions, and they often need a
lot of registers, or can make good use of them.

Now you may consider that to be not time-critical, but if you are
Intel and want to sell an interpreter user an Intel system, and it is
slow compared to the ARM64 or RISC-V systems, you will still lose the
sale.

Conversely, given that you spend so much time in your leaf function,
saving a few registers at the start and restoring them at the end
won't slow you down.
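The point about leaf functions is one of amortization. A leaf function that uses callee-saved registers pays one save and one restore per register per invocation, however long its body runs. The back-of-the-envelope sketch below assumes that each save or restore costs about one cycle. That figure is an assumption, not a measurement.

```python
def save_restore_overhead(saved_regs: int, body_cycles: float,
                          cycles_per_op: float = 1.0) -> float:
    """Fraction of a leaf call's time spent saving and restoring registers."""
    cost = 2 * saved_regs * cycles_per_op  # one store on entry, one load on exit
    return cost / (body_cycles + cost)


# Saving 8 extra registers around a 10,000-cycle kernel:
print(f"{save_restore_overhead(8, 10_000):.4%}")  # roughly 0.16%
```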

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Intel goes to 32 GPRs

<u9s0m2$1jbl0$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33493&group=comp.arch#33493

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 22:45:54 +0200
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <u9s0m2$1jbl0$1@dont-email.me>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de>
<u9r0jd$1ftnb$1@dont-email.me> <u9r24c$1a2ag$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 26 Jul 2023 20:45:54 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0cf6fe72e6c6af6f081f08e53a086dd5";
logging-data="1683104"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/PsCzOuwoebtMrlLpSQtu6ZgvWzJwVTpC7BSC/E6zz3A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:0z5hYrVepa+nUcqfLmta7zqA4BI=
In-Reply-To: <u9r24c$1a2ag$1@newsreader4.netcologne.de>

by: Terje Mathisen - Wed, 26 Jul 2023 20:45 UTC

Thomas Koenig wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>> Thomas Koenig wrote:
>
>>> One problem I see is that all the new registers are caller-saved,
>>> for compatibility with existing ABIs. This is needed due to stack
>>> unwinding and setjmp/longjmp, but restricts their benefit due to
>>> having to spill them across function calls. It might be possible
>>> to set __attribute__((nothrow)) on functions where this cannot
>>> happen, and change some caller-saved to callee-saved registers
>>> in that case, but that could be an interesting discussion.
>>>
>>
>> I'm not worried at all about this point: The only places where I really
>> want lots of registers are in big/complicated leaf functions!
>>
>> If a function both needs lots of registers _and_ has to call any
>> non-inlined functions, then it really isn't that time critical.
>
> Fortran can use lots of registers for its array descriptors,
> and also can use lots of library calls for mathematical functions
> (because most CPUs don't have Mitch's single instructions for them).
> Fortran library functions are typically __attribute__((nothrow)),
> so in that field being able to use more registers across calls
> would be a good thing, generally.
>
If said Fortran code is really performance critical, like in an FFT,
then all the sin/cos function calls will be done up front and cached.

It is only when you have unique arguments that it can be faster to do
them inline, right?
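The FFT is the classic case for caching. The twiddle factors exp(-2*pi*i*k/N) depend only on the transform length, so they can be computed once and reused, which leaves no sin/cos calls in the butterfly loop. Below is a minimal radix-2 sketch. It is recursive for clarity rather than speed, and the function names are illustrative.

```python
import cmath
import math


def twiddle_table(n: int) -> list:
    """exp(-2*pi*i*k/n) for k < n/2: every trig value the transform will need."""
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n // 2)]


def fft(x, table=None):
    """Recursive radix-2 FFT that reads twiddles from a precomputed table."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if table is None:
        table = twiddle_table(n)  # the only place trig functions are evaluated
    if n == 1:
        return [complex(x[0])]
    # The table was built for the top-level length N = 2 * len(table);
    # a sub-transform of length n needs every (N // n)-th entry.
    stride = 2 * len(table) // n
    even = fft(x[0::2], table)
    odd = fft(x[1::2], table)
    out = [0j] * n
    for k in range(n // 2):
        t = table[k * stride] * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

A production library would also precompute the table outside the call. For example, it could build the table when it plans a transform of a given size and then reuse it for every transform of that size.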

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Intel goes to 32 GPRs

<u9s14g$1jed5$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33494&group=comp.arch#33494

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Wed, 26 Jul 2023 22:53:36 +0200
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <u9s14g$1jed5$1@dont-email.me>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de>
<u9r0jd$1ftnb$1@dont-email.me> <2023Jul26.221142@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 26 Jul 2023 20:53:36 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0cf6fe72e6c6af6f081f08e53a086dd5";
logging-data="1685925"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+TM47KaaXcNnkefbNXUx7qL318aIT8wFYBkfMswBKcBA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:7HAJhyumv5k5T3M1wd8VffJef9U=
In-Reply-To: <2023Jul26.221142@mips.complang.tuwien.ac.at>

by: Terje Mathisen - Wed, 26 Jul 2023 20:53 UTC

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> If a function both needs lots of registers _and_ has to call any
>> non-inlined functions, then it really isn't that time critical.
>
> Every interpreter calls non-inlined functions, and they often need a
> lot of registers, or can make good use of them.
>
> Now you may consider that to be not time-critical, but if you are
> Intel and want to sell an interpreter user an Intel system, and it is
> slow compared to the ARM64 or RISC-V systems, you will still lose the
> sale.

I'm willing to be shown otherwise, but until then I'd consider any code
that runs under an actual interpreter to be non-performance-critical.

>
> Conversely, given that you spend so much time in your leaf function,
> saving a few registers at the start and restoring them at the end
> won't slow you down.

This is obviously true, and in the case of having non-saved registers
that you'd like to use across function calls, I would probably consider
writing wrappers for those function calls: The wrapper would do the
save, call the actual function, then restore and return.

This would be a win when you need to call the same function from many
locations since it would reduce the code space while still allowing all
regs to be used.

If, otoh, you aren't worried about the code space expansion, then mark
those wrapper functions as "inline_always" and do the save/restore
around every call site.
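The two wrapper options trade code size in opposite directions, and the sketch below puts rough numbers on that. Some of the sizes are standard and some are assumed. A direct `call rel32` is 5 bytes and `ret` is 1 byte in x86-64. The 4 bytes per register store or load is only a placeholder, since the real size depends on how the new registers are encoded. The model covers code size only. At run time the shared wrapper adds an extra call/return pair per use.

```python
def save_restore_code_bytes(call_sites: int, saved_regs: int, *, inline: bool,
                            call_bytes: int = 5, ret_bytes: int = 1,
                            spill_bytes: int = 4) -> int:
    """Bytes spent on save/restore plumbing for `call_sites` calls to one function."""
    save_restore = 2 * saved_regs * spill_bytes  # a store and a load per register
    if inline:
        # Save and restore duplicated around every call site.
        return call_sites * (call_bytes + save_restore)
    # Each site calls one shared wrapper, which saves, calls the real
    # function, restores, and returns.
    return call_sites * call_bytes + save_restore + call_bytes + ret_bytes


for sites in (1, 10):
    print(sites,
          save_restore_code_bytes(sites, 4, inline=True),
          save_restore_code_bytes(sites, 4, inline=False))
```

With these assumptions and four saved registers, the shared wrapper loses at a single call site (43 bytes against 37). It wins easily at ten call sites (88 bytes against 370), which matches the reasoning above.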

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Intel goes to 32 GPRs

<EVfwM.141257$U3w1.14154@fx09.iad>

https://news.novabbs.org/devel/article-flat.php?id=33495&group=comp.arch#33495

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx09.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Intel goes to 32 GPRs
Newsgroups: comp.arch
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at> <c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me> <2023Jul26.221142@mips.complang.tuwien.ac.at> <u9s14g$1jed5$1@dont-email.me>
Lines: 27
Message-ID: <EVfwM.141257$U3w1.14154@fx09.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Wed, 26 Jul 2023 21:06:44 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Wed, 26 Jul 2023 21:06:44 GMT
X-Received-Bytes: 2248

by: Scott Lurndal - Wed, 26 Jul 2023 21:06 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> If a function both needs lots of registers _and_ has to call any
>>> non-inlined functions, then it really isn't that time critical.
>>
>> Every interpreter calls non-inlined functions, and they often need a
>> lot of registers, or can make good use of them.
>>
>> Now you may consider that to be not time-critical, but if you are
>> Intel and want to sell an interpreter user an Intel system, and it is
>> slow compared to the ARM64 or RISC-V systems, you will still lose the
>> sale.
>
>I'm willing to be shown otherwise, but until then I'd consider any code
>that runs under an actual interpreter to be non-performance-critical.

I would likely argue that an interpreter is inherently performance-critical
in order to make the interpreted code useful.

Take a machine simulator as an example of performance-critical
interpreter (interpreting machine instructions rather than bytecode
or some intermediate tree representation). Booting Linux on the
simulator, for example, is definitely performance critical from the
standpoint of the user waiting for a login prompt[*].

[*] Yet another thing systemd makes worse.
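At its core, a machine simulator of this kind runs a fetch/decode/execute loop. The sketch below uses an invented toy ISA with five instructions (li, add, addi, bnez, halt), not any real instruction set. It counts retired instructions as an example of the per-instruction bookkeeping that makes pure interpretation slow.

```python
def run(program, regs=None):
    """Interpret `program` (a list of tuples) until 'halt'; return (regs, retired)."""
    regs = [0] * 8 if regs is None else list(regs)
    pc = retired = 0
    while True:
        op, *args = program[pc]  # fetch + decode
        retired += 1
        pc += 1
        if op == "li":           # rd <- imm
            rd, imm = args
            regs[rd] = imm
        elif op == "add":        # rd <- rs1 + rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "addi":       # rd <- rs + imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bnez":       # if rs != 0: pc <- target
            rs, target = args
            if regs[rs] != 0:
                pc = target
        elif op == "halt":
            return regs, retired
        else:
            raise ValueError(f"illegal instruction {op!r} at {pc - 1}")


# Sum 1..n with r1 = n as the counter and r2 as the accumulator.
SUM_TO_N = [
    ("li", 2, 0),        # 0: acc = 0
    ("add", 2, 2, 1),    # 1: acc += counter
    ("addi", 1, 1, -1),  # 2: counter -= 1
    ("bnez", 1, 1),      # 3: loop while counter != 0
    ("halt",),           # 4
]

final_regs, retired = run(SUM_TO_N, regs=[0, 100, 0, 0, 0, 0, 0, 0])
print(final_regs[2], retired)  # 5050 302
```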

Re: Intel goes to 32 GPRs

<25d6403b-57b4-4f71-8466-ae0ea2027066n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33496&group=comp.arch#33496

X-Received: by 2002:a05:6214:b87:b0:635:e5ff:b4b7 with SMTP id fe7-20020a0562140b8700b00635e5ffb4b7mr8311qvb.3.1690406550730;
Wed, 26 Jul 2023 14:22:30 -0700 (PDT)
X-Received: by 2002:a05:6808:138d:b0:39c:cd8e:998f with SMTP id
c13-20020a056808138d00b0039ccd8e998fmr1691432oiw.0.1690406550463; Wed, 26 Jul
2023 14:22:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 26 Jul 2023 14:22:30 -0700 (PDT)
In-Reply-To: <u9s0m2$1jbl0$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9051:b8b1:3e11:16;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9051:b8b1:3e11:16
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me>
<u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me>
<u9r24c$1a2ag$1@newsreader4.netcologne.de> <u9s0m2$1jbl0$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <25d6403b-57b4-4f71-8466-ae0ea2027066n@googlegroups.com>
Subject: Re: Intel goes to 32 GPRs
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 26 Jul 2023 21:22:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3369

by: MitchAlsup - Wed, 26 Jul 2023 21:22 UTC

On Wednesday, July 26, 2023 at 3:45:58 PM UTC-5, Terje Mathisen wrote:
> Thomas Koenig wrote:
> > Terje Mathisen <terje.m...@tmsw.no> schrieb:
> >> Thomas Koenig wrote:
> >
> >>> One problem I see is that all the new registers are caller-saved,
> >>> for compatibility with existing ABIs. This is needed due to stack
> >>> unwinding and setjmp/longjmp, but restricts their benefit due to
> >>> having to spill them across function calls. It might be possible
> >>> to set __attribute__((nothrow)) on functions where this cannot
> >>> happen, and change some caller-saved to callee-saved registers
> >>> in that case, but that could be an interesting discussion.
> >>>
> >>
> >> I'm not worried at all about this point: The only places where I really
> >> want lots of registers are in big/complicated leaf functions!
> >>
> >> If a function both needs lots of registers _and_ has to call any
> >> non-inlined functions, then it really isn't that time critical.
> >
> > Fortran can use lots of registers for its array descriptors,
> > and also can use lots of library calls for mathematical functions
> > (because most CPUs don't have Mitch's single instructions for them).
> > Fortran library functions are typically __attribute__((nothrow)),
> > so in that field being able to use more registers across calls
> > would be a good thing, generally.
> >
> If said Fortran code is really performance critical, like in an FFT,
> then all the sin/cos function calls will be done up front and cached.
<
Not just cached, but computed prior to loading the application.
>
> It is only when you have unique arguments that it can be faster to do
> them inline, right?
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Intel goes to 32 GPRs

<5a1c383b-ac9d-4069-955f-16adfd1626c6n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33497&group=comp.arch#33497

X-Received: by 2002:a05:6214:9b3:b0:63d:f3:a7f0 with SMTP id du19-20020a05621409b300b0063d00f3a7f0mr9153qvb.9.1690407018966;
Wed, 26 Jul 2023 14:30:18 -0700 (PDT)
X-Received: by 2002:a05:6808:3099:b0:3a4:13ba:9f6 with SMTP id
bl25-20020a056808309900b003a413ba09f6mr1603477oib.10.1690407018791; Wed, 26
Jul 2023 14:30:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 26 Jul 2023 14:30:18 -0700 (PDT)
In-Reply-To: <EVfwM.141257$U3w1.14154@fx09.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9051:b8b1:3e11:16;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9051:b8b1:3e11:16
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me>
<u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me>
<2023Jul26.221142@mips.complang.tuwien.ac.at> <u9s14g$1jed5$1@dont-email.me> <EVfwM.141257$U3w1.14154@fx09.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5a1c383b-ac9d-4069-955f-16adfd1626c6n@googlegroups.com>
Subject: Re: Intel goes to 32 GPRs
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 26 Jul 2023 21:30:18 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3544

by: MitchAlsup - Wed, 26 Jul 2023 21:30 UTC

On Wednesday, July 26, 2023 at 4:06:48 PM UTC-5, Scott Lurndal wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >Anton Ertl wrote:
> >> Terje Mathisen <terje.m...@tmsw.no> writes:
> >>> If a function both needs lots of registers _and_ has to call any
> >>> non-inlined functions, then it really isn't that time critical.
> >>
> >> Every interpreter calls non-inlined functions, and they often need a
> >> lot of registers, or can make good use of them.
> >>
> >> Now you may consider that to be not time-critical, but if you are
> >> Intel and want to sell an interpreter user an Intel system, and it is
> >> slow compared to the ARM64 or RISC-V systems, you will still lose the
> >> sale.
> >
> >I'm willing to be shown otherwise, but until then I'd consider any code
> >that runs under an actual interpreter to be non-performance-critical.
<
> I would likely argue that an interpreter is inherently performance-critical
> in order to make the interpreted code useful.
>
> Take a machine simulator as an example of performance-critical
> interpreter (interpreting machine instructions rather than bytecode
> or some intermediate tree representation). Booting Linux on the
> simulator, for example, is definitely performance critical from the
> standpoint of the user waiting for a login prompt[*].
<
As someone who has actually done this (1999), what we did was to
take portions (~= basic blocks) of code and compile them into traces
complete with CPU/cache/TLB statistics updates, and then run 95%±
of the interpreter as native code.
<
Is this still simulation? Obviously,
but it is more like JIT than interpretation,
and it eliminated most of the overhead of interpretation.
<
Obviously this is easier if target and computer are of the same architecture
but one could do Mc 88100 ISA on a SPARC V8 without much hassle {or
x86-64 on a SPARC V9,...}.
>
> [*] Yet another thing systemd makes worse.
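The idea is to translate each basic block once and then execute the translation on every later visit. The toy version below is hypothetical and not the 1999 simulator, and it makes several substitutions. Python functions generated with `exec()` stand in for native code. The instruction tuples form an invented mini-ISA (li, add, addi, bnez, halt). The retired-instruction count added once per block stands in for the statistics updates compiled into each trace.

```python
def translate_block(program, start):
    """Translate the straight-line run starting at `start` into a Python function.

    The run ends at the first branch or halt. The generated function updates
    the register list `r` and returns the next PC (None after halt).
    """
    src = ["def block(r):"]
    pc = start
    while True:
        op, *a = program[pc]
        pc += 1
        if op == "li":
            src.append(f"    r[{a[0]}] = {a[1]}")
        elif op == "add":
            src.append(f"    r[{a[0]}] = r[{a[1]}] + r[{a[2]}]")
        elif op == "addi":
            src.append(f"    r[{a[0]}] = r[{a[1]}] + {a[2]}")
        elif op == "bnez":
            src.append(f"    return {a[1]} if r[{a[0]}] != 0 else {pc}")
            break
        elif op == "halt":
            src.append("    return None")
            break
        else:
            raise ValueError(f"illegal instruction {op!r} at {pc - 1}")
    namespace = {}
    exec("\n".join(src), namespace)  # "compile" the block once
    return namespace["block"], pc - start


def run_translated(program, regs):
    """Run `program`, translating each block on first use and reusing it after."""
    cache = {}  # start PC -> (translated block, instruction count)
    r = list(regs)
    pc, retired = 0, 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
        block, length = cache[pc]
        retired += length  # bookkeeping per block, not per instruction
        pc = block(r)
    return r, retired, len(cache)


# Sum 1..n with r1 = n as the counter and r2 as the accumulator.
SUM_TO_N = [
    ("li", 2, 0),        # 0: acc = 0
    ("add", 2, 2, 1),    # 1: acc += counter
    ("addi", 1, 1, -1),  # 2: counter -= 1
    ("bnez", 1, 1),      # 3: loop while counter != 0
    ("halt",),           # 4
]

print(run_translated(SUM_TO_N, [0, 100, 0, 0, 0, 0, 0, 0]))
# ([0, 0, 5050, 0, 0, 0, 0, 0], 302, 3)
```

The program retires 302 instructions, but only three translations are made. All later visits reuse the cached code, and this reuse is where such a scheme recovers most of the interpretation overhead.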

Re: Intel goes to 32 GPRs

<1365a6eb-aae3-4bcd-9acc-c99b1ba65404n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33498&group=comp.arch#33498

X-Received: by 2002:ac8:5992:0:b0:403:daba:38a2 with SMTP id e18-20020ac85992000000b00403daba38a2mr9718qte.11.1690407068750;
Wed, 26 Jul 2023 14:31:08 -0700 (PDT)
X-Received: by 2002:a05:6870:9575:b0:1b0:9643:6f69 with SMTP id
v53-20020a056870957500b001b096436f69mr1090058oal.4.1690407068564; Wed, 26 Jul
2023 14:31:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 26 Jul 2023 14:31:08 -0700 (PDT)
In-Reply-To: <5a1c383b-ac9d-4069-955f-16adfd1626c6n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9051:b8b1:3e11:16;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9051:b8b1:3e11:16
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me>
<u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me>
<2023Jul26.221142@mips.complang.tuwien.ac.at> <u9s14g$1jed5$1@dont-email.me>
<EVfwM.141257$U3w1.14154@fx09.iad> <5a1c383b-ac9d-4069-955f-16adfd1626c6n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1365a6eb-aae3-4bcd-9acc-c99b1ba65404n@googlegroups.com>
Subject: Re: Intel goes to 32 GPRs
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 26 Jul 2023 21:31:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3824

by: MitchAlsup - Wed, 26 Jul 2023 21:31 UTC

On Wednesday, July 26, 2023 at 4:30:20 PM UTC-5, MitchAlsup wrote:
> On Wednesday, July 26, 2023 at 4:06:48 PM UTC-5, Scott Lurndal wrote:
> > Terje Mathisen <terje.m...@tmsw.no> writes:
> > >Anton Ertl wrote:
> > >> Terje Mathisen <terje.m...@tmsw.no> writes:
> > >>> If a function both needs lots of registers _and_ has to call any
> > >>> non-inlined functions, then it really isn't that time critical.
> > >>
> > >> Every interpreter calls non-inlined functions, and they often need a
> > >> lot of registers, or can make good use of them.
> > >>
> > >> Now you may consider that to be not time-critical, but if you are
> > >> Intel and want to sell an interpreter user an Intel system, and it is
> > >> slow compared to the ARM64 or RISC-V systems, you will still lose the
> > >> sale.
> > >
> > >I'm willing to be shown otherwise, but until then I'd consider any code
> > >that runs under an actual interpreter to be non-performance-critical.
> <
> > I would likely argue that an interpreter is inherently performance-critical
> > in order to make the interpreted code useful.
> >
> > Take a machine simulator as an example of performance-critical
> > interpreter (interpreting machine instructions rather than bytecode
> > or some intermediate tree representation). Booting Linux on the
> > simulator, for example, is definitely performance critical from the
> > standpoint of the user waiting for a login prompt[*].
> <
> As someone who has actually done this (1999), what we did was to
> take portions (~= basic blocks) of code and compile them into traces
> complete with CPU/cache/TLB statistics updates, and then run 95%±
> of the interpreter as native code.
> <
> Is this still simulation? Obviously,
> but it is more like JIT than interpretation,
> and it eliminated most of the overhead of interpretation.
> <
> Obviously this is easier if target and computer are of the same architecture
> but one could do Mc 88100 ISA on a SPARC V8 without much hassle {or
> x86-64 on a SPARC V9,...}.
> >
> > [*] Yet another thing systemd makes worse.

Re: Intel goes to 32 GPRs

<yBgwM.206164$TPw2.184992@fx17.iad>

https://news.novabbs.org/devel/article-flat.php?id=33499&group=comp.arch#33499

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Intel goes to 32 GPRs
Newsgroups: comp.arch
References: <u9o14h$183or$1@newsreader4.netcologne.de> <2023Jul25.175501@mips.complang.tuwien.ac.at> <c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com> <u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de> <u9r0jd$1ftnb$1@dont-email.me> <2023Jul26.221142@mips.complang.tuwien.ac.at> <u9s14g$1jed5$1@dont-email.me> <EVfwM.141257$U3w1.14154@fx09.iad> <5a1c383b-ac9d-4069-955f-16adfd1626c6n@googlegroups.com>
Lines: 58
Message-ID: <yBgwM.206164$TPw2.184992@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Wed, 26 Jul 2023 21:53:34 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Wed, 26 Jul 2023 21:53:34 GMT
X-Received-Bytes: 3309

by: Scott Lurndal - Wed, 26 Jul 2023 21:53 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Wednesday, July 26, 2023 at 4:06:48 PM UTC-5, Scott Lurndal wrote:
>> Terje Mathisen <terje.m...@tmsw.no> writes:
>> >Anton Ertl wrote:
>> >> Terje Mathisen <terje.m...@tmsw.no> writes:
>> >>> If a function both needs lots of registers _and_ has to call any
>> >>> non-inlined functions, then it really isn't that time critical.
>> >>
>> >> Every interpreter calls non-inlined functions, and they often need a
>> >> lot of registers, or can make good use of them.
>> >>
>> >> Now you may consider that to be not time-critical, but if you are
>> >> Intel and want to sell an interpreter user an Intel system, and it is
>> >> slow compared to the ARM64 or RISC-V systems, you will still lose the
>> >> sale.
>> >
>> >I'm willing to be shown otherwise, but until then I'd consider any code
>> >that runs under an actual interpreter to be non-performance-critical.
><
>> I would likely argue that an interpreter is inherently performance-critical
>> in order to make the interpreted code useful.
>>
>> Take a machine simulator as an example of performance-critical
>> interpreter (interpreting machine instructions rather than bytecode
>> or some intermediate tree representation). Booting Linux on the
>> simulator, for example, is definitely performance critical from the
>> standpoint of the user waiting for a login prompt[*].
><
>As someone who has actually done this (1999)

We had a simulator for the Burroughs mainframe in the 1970s
that we used to test instruction set changes (it was in Burroughs
Algol and ran on a B7900).

I currently work on an SoC simulator which simulates (functionally)
the entire SoC: ARM64 cores (dozens), peripheral controllers (e.g. SATA,
SPI, eMMC, I2C/I3C, networking), microcontrollers, accelerator blocks, PCI, etc.

Performance is key.

>, what we did was to
>take portions (~= basic blocks) of code and compile them into traces
>complete with CPU/cache/TLB statistics updates, and then run 95%±
>of the interpreter as native code.
><
>Is this still simulation ? obviously
>but it is more like JIT than interpretation.

Most modern simulators (e.g. QEMU) use JIT-like mechanisms,
as did AMD's SimNow! back in the 2000s.
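
The point Mitch and Scott make, that fast simulators translate each guest basic block once and then reuse the translation, can be sketched as follows. This is a hypothetical toy, not how QEMU, SimNow! or Mitch's 1999 simulator work internally: the four-opcode guest ISA and the names `translate_block` and `run` are invented, and a "translated" block here is a chain of Python closures rather than emitted host machine code. What it does show is the caching structure: decoding happens once per block (keyed by guest PC), and every later visit only executes.

```python
# Hypothetical toy guest ISA: each instruction is (opcode, rd, rs, imm).
ADDI, ADD, BNEZ, HALT = range(4)


def _addi(rd, imm):
    def step(regs):
        regs[rd] += imm
    return step


def _add(rd, rs):
    def step(regs):
        regs[rd] += regs[rs]
    return step


def translate_block(code, start):
    """Decode the guest basic block at `start` (up to and including the
    first BNEZ or HALT) into one host callable.  The callable runs the
    block and returns the next guest pc, or None after HALT."""
    steps = []
    pc = start
    while True:
        op, rd, rs, imm = code[pc]
        if op == ADDI:
            steps.append(_addi(rd, imm))
        elif op == ADD:
            steps.append(_add(rd, rs))
        elif op in (BNEZ, HALT):
            body = tuple(steps)
            taken, fallthrough = pc + imm, pc + 1

            def block(regs):
                for step in body:
                    step(regs)
                if op == HALT:
                    return None
                return taken if regs[rd] != 0 else fallthrough
            return block
        else:
            raise ValueError(f"unknown opcode {op} at pc {pc}")
        pc += 1


def run(code, nregs=8):
    """Execute guest code from pc 0, translating each block on first use."""
    regs = [0] * nregs
    cache = {}                 # guest pc -> translated block
    translations = executions = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:      # miss: decode once, keep for later visits
            block = cache[pc] = translate_block(code, pc)
            translations += 1
        pc = block(regs)
        executions += 1
    return regs, translations, executions


prog = [
    (ADDI, 2, 0, 10),   # r2 = 10 (loop counter)
    (ADD,  1, 2, 0),    # loop: r1 += r2
    (ADDI, 2, 0, -1),   # r2 -= 1
    (BNEZ, 2, 0, -2),   # if r2 != 0: branch back to pc - 2 (the ADD)
    (HALT, 0, 0, 0),
]

regs, translations, executions = run(prog)
print(regs[1], translations, executions)   # 55 3 11
```

Here the block starting at the loop head is translated once and executed nine times; a plain interpreter would decode those three instructions on every iteration.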

Re: Intel goes to 32 GPRs

<u9tf9l$1rqbp$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33502&group=comp.arch#33502

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Thu, 27 Jul 2023 12:01:25 +0200
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <u9tf9l$1rqbp$1@dont-email.me>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de>
<u9r0jd$1ftnb$1@dont-email.me> <2023Jul26.221142@mips.complang.tuwien.ac.at>
<u9s14g$1jed5$1@dont-email.me> <EVfwM.141257$U3w1.14154@fx09.iad>
<5a1c383b-ac9d-4069-955f-16adfd1626c6n@googlegroups.com>
<yBgwM.206164$TPw2.184992@fx17.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 27 Jul 2023 10:01:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0696af5b0086bac0fa33289a21625009";
logging-data="1960313"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ciIv68Y31PgV19FwEpMmg5TL1ntc8Z/9zmeQzKp+5Qg=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:BCUru9v/fQaVDVt/6X2L8zRG4GI=
In-Reply-To: <yBgwM.206164$TPw2.184992@fx17.iad>

by: Terje Mathisen - Thu, 27 Jul 2023 10:01 UTC

Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Wednesday, July 26, 2023 at 4:06:48 PM UTC-5, Scott Lurndal wrote:
>>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>>> Anton Ertl wrote:
>>>>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>>>>> If a function both needs lots of registers _and_ has to call any
>>>>>> non-inlined functions, then it really isn't that time critical.
>>>>>
>>>>> Every interpreter calls non-inlined functions, and they often need a
>>>>> lot of registers, or can make good use of them.
>>>>>
>>>>> Now you may consider that to be not time-critical, but if you are
>>>>> Intel and want to sell an interpreter user an Intel system, and it is
>>>>> slow compared to the ARM64 or RISC-V systems, you will still lose the
>>>>> sale.
>>>>
>>>> I'm willing to be shown otherwise, but until then I'd consider any code
>>>> that runs under an actual interpreter to be non-performance-critical.
>> <
>>> I would likely argue that an interpreter is inherently performance-critical
>>> in order to make the interpreted code useful.
>>>
>>> Take a machine simulator as an example of performance-critical
>>> interpreter (interpreting machine instructions rather than bytecode
>>> or some intermediate tree representation). Booting Linux on the
>>> simulator, for example, is definitely performance critical from the
>>> standpoint of the user waiting for a login prompt[*].
>> <
>> As someone who has actually done this (1999)
>
> We had a simulator for the Burroughs mainframe in the 1970s
> that we used to test instruction set changes (it was in Burroughs
> Algol and ran on a B7900).
>
> I currently work on an SoC simulator which simulates (functionally)
> the entire SoC: ARM64 cores (dozens), peripheral controllers (e.g. SATA,
> SPI, eMMC, I2C/I3C, networking), microcontrollers, accelerator blocks, PCI, etc.
>
> Performance is key.
>
>> , what we did was to
>> take portions (~= basic blocks) of code and compile them into traces
>> complete with CPU/cache/TLB statistics updates, and then run 95%±
>> of the interpreter as native code.
>> <
>> Is this still simulation ? obviously
>> but it is more like JIT than interpretation.
>
> Most modern simulators (e.g. QEMU) use JIT-like mechanisms,
> as did AMD's SimNow! back in the 2000s.

Exactly my point: Those JIT-style traces can obviously use all the regs,
as can any other leaf code.

It is only when you want to use the new regs for the core interpreter
code that you run into trouble.

All this said, I strongly suspect that the variable-size save/restore
opcodes will be capable of handling everything, as long as the OS
initializes them properly, i.e. makes sure that the save areas will be
large enough.

It would still probably be better performance-wise to make those
save/restore functions partially lazy, so that code which never touches
the extended regs doesn't need the save/restore. OTOH, we have seen this
tried multiple times over the years, and we typically end up with the
much cleaner "grab everything" approach because it is easier to make
bug-free.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
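
Terje's trade-off between eager ("grab everything") and partially lazy saving of the extended registers can be modelled in a few lines. This is a hypothetical software model, not how any OS or the XSAVE family actually handles APX state: the per-CPU `ext_in_use` flag, the `Thread`/`Cpu` classes and the cost metric (register words moved to or from memory) are all invented for illustration. The `else` branch in `restore` is the kind of detail that makes lazy schemes easy to get wrong: skipping the restore for a thread that never used R16-R31 is only safe if those registers are cleared, otherwise the thread can read whatever the previous thread left behind.

```python
NBASE, NEXT = 16, 16   # legacy GPRs, plus the APX extended GPRs R16-R31


class Thread:
    def __init__(self, name, uses_ext):
        self.name = name
        self.uses_ext = uses_ext   # does this thread ever write R16-R31?
        self.base = [0] * NBASE    # saved legacy GPRs
        self.ext = None            # saved R16-R31; None = never saved


class Cpu:
    def __init__(self, lazy):
        self.lazy = lazy
        self.base = [0] * NBASE
        self.ext = [0] * NEXT
        self.ext_in_use = False    # hypothetical "extended regs are live" bit
        self.traffic = 0           # register words saved to / loaded from memory

    def write_ext(self, i, value):
        self.ext[i] = value
        self.ext_in_use = True     # any write makes the extended state live

    def save(self, t):
        t.base = self.base[:]
        self.traffic += NBASE
        if not self.lazy or self.ext_in_use:
            t.ext = self.ext[:]
            self.traffic += NEXT

    def restore(self, t):
        self.base = t.base[:]
        self.traffic += NBASE
        if t.ext is not None:
            self.ext = t.ext[:]
            self.traffic += NEXT
            self.ext_in_use = True
        else:
            # Nothing saved for t: clear instead of loading, so t cannot
            # see the previous thread's R16-R31 values.
            self.ext = [0] * NEXT
            self.ext_in_use = False


def simulate(lazy, switches=1000):
    """Round-robin over one thread that uses R16-R31 and two that don't;
    return the total register traffic for all context switches."""
    cpu = Cpu(lazy)
    threads = [Thread("number-cruncher", True), Thread("shell", False),
               Thread("daemon", False)]
    current = threads[0]
    cpu.restore(current)
    for n in range(switches):
        if current.uses_ext:
            cpu.write_ext(0, n)    # pretend the thread touched R16
        nxt = threads[(n + 1) % len(threads)]
        cpu.save(current)
        cpu.restore(nxt)
        current = nxt
    return cpu.traffic


print("eager:", simulate(lazy=False), "lazy:", simulate(lazy=True))
# eager: 63984 lazy: 42688
```

In this model the lazy policy moves fewer words because two of the three threads never touch R16-R31; the extra state tracking is where a real implementation would need the most care.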

Re: Intel goes to 32-bit general purpose registers

<d185bbc6-1a82-425a-b932-95aac063b189n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33503&group=comp.arch#33503

X-Received: by 2002:a37:b9c4:0:b0:767:3d3e:4de9 with SMTP id j187-20020a37b9c4000000b007673d3e4de9mr2202qkf.4.1690470801252;
Thu, 27 Jul 2023 08:13:21 -0700 (PDT)
X-Received: by 2002:a05:6808:138d:b0:3a1:eb8a:203d with SMTP id
c13-20020a056808138d00b003a1eb8a203dmr6055028oiw.11.1690470801031; Thu, 27
Jul 2023 08:13:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 27 Jul 2023 08:13:20 -0700 (PDT)
In-Reply-To: <2023Jul26.072712@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=80.62.116.102; posting-account=iwcJjQoAAAAIecwT8pOXxaSOyiUTZMJr
NNTP-Posting-Host: 80.62.116.102
References: <u9o14h$183or$1@newsreader4.netcologne.de> <d6RvM.21474$cc2c.10950@fx37.iad>
<db82b799-811e-413e-b838-98b85b96a3b1n@googlegroups.com> <989dc353-b18d-4a40-8b61-11f65b4b1bd0n@googlegroups.com>
<2023Jul26.072712@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d185bbc6-1a82-425a-b932-95aac063b189n@googlegroups.com>
Subject: Re: Intel goes to 32-bit general purpose registers
From: peterfirefly@gmail.com (Peter Lund)
Injection-Date: Thu, 27 Jul 2023 15:13:21 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2217

by: Peter Lund - Thu, 27 Jul 2023 15:13 UTC

On Wednesday, July 26, 2023 at 8:03:00 AM UTC+2, Anton Ertl wrote:
> So with the REX2 prefix you replace the 0F prefix of the
> map1 instructions. That's the advantage of having the REX2 prefix
> over having two REX prefixes as AMD considered according to Mitch
> Alsup. The disadvantage is that it occupies the D5 byte of AMD64

That is exactly what the 2x REX scheme could have done. REX2 has a payload of 8 bits; 2x REX has a payload of 4+4=8 bits. You need W plus 2x3 extra bits for the register numbers (bits 3 and 4 for each of the R, X and B fields). That leaves one bit over, which REX2 uses for the map selection. 2x REX could have done exactly the same; what would you need two W bits for?

-Peter

PS: It's nice to get tidbits like that, even with the inevitable decade+ delay. I wonder what AMD had considered using the extra W bit for...
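
Peter's bit count can be checked by splitting the prefix payloads into fields. The sketch assumes the REX2 payload layout as we read it in Intel's APX documentation (the byte after 0xD5 holds, from bit 7 down, M0 R4 X4 B4 W R3 X3 B3, so the low nibble mirrors a classic REX prefix); verify against the specification before relying on it. The two-REX variant is purely hypothetical: how AMD's considered design assigned its bits is not known from this thread, so `decode_double_rex` just shows one plausible assignment in which the second W bit is the spare.

```python
def decode_rex2_payload(p):
    """Fields of the byte following the 0xD5 REX2 prefix (assumed layout,
    bit 7 down to bit 0: M0 R4 X4 B4 W R3 X3 B3)."""
    if not 0 <= p <= 0xFF:
        raise ValueError("payload must be one byte")
    return {
        "M0": (p >> 7) & 1,   # 0 = opcode map 0, 1 = map 1 (replaces the 0F escape)
        "R4": (p >> 6) & 1,
        "X4": (p >> 5) & 1,
        "B4": (p >> 4) & 1,
        "W":  (p >> 3) & 1,   # 64-bit operand size, as REX.W
        "R3": (p >> 2) & 1,
        "X3": (p >> 1) & 1,
        "B3": p & 1,
    }


def decode_double_rex(first, second):
    """HYPOTHETICAL two-REX encoding: each byte is 0100WRXB; the first
    supplies W and bit 3 of R/X/B, the second supplies bit 4, and its W
    bit is left over (it could select map 1, like REX2.M0)."""
    for b in (first, second):
        if b >> 4 != 0b0100:
            raise ValueError(f"{b:#04x} is not a REX byte")
    return {
        "spare": (second >> 3) & 1,
        "R4": (second >> 2) & 1,
        "X4": (second >> 1) & 1,
        "B4": second & 1,
        "W":  (first >> 3) & 1,
        "R3": (first >> 2) & 1,
        "X3": (first >> 1) & 1,
        "B3": first & 1,
    }


def full_reg(bit4, bit3, low3):
    """5-bit register number (0-31) from two prefix bits and a 3-bit ModRM/SIB field."""
    return (bit4 << 4) | (bit3 << 3) | low3


# Both schemes carry 8 payload bits: W + 2 extra bits for each of R, X, B = 7,
# plus one spare (M0 in REX2).
rex2 = decode_rex2_payload(0xC4)          # M0=1, R4=1, R3=1, W=0
two = decode_double_rex(0x44, 0x4C)       # same register bits, spare=1
print(full_reg(rex2["R4"], rex2["R3"], 0b011),   # ModRM.reg = 3 -> R27
      full_reg(two["R4"], two["R3"], 0b011))     # prints: 27 27
```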

Re: Intel goes to 32 GPRs

<ua087o$1dg5f$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33504&group=comp.arch#33504

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-e8ee-0-97f8-a0fd-ad47-8b4d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Intel goes to 32 GPRs
Date: Fri, 28 Jul 2023 11:19:20 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <ua087o$1dg5f$1@newsreader4.netcologne.de>
References: <u9o14h$183or$1@newsreader4.netcologne.de>
<2023Jul25.175501@mips.complang.tuwien.ac.at>
<c12bf4b2-8844-46c1-bd5a-e4b060baec24n@googlegroups.com>
<u9qo9l$1f22o$1@dont-email.me> <u9qrho$19u0h$1@newsreader4.netcologne.de>
<u9r0jd$1ftnb$1@dont-email.me> <u9r24c$1a2ag$1@newsreader4.netcologne.de>
<u9s0m2$1jbl0$1@dont-email.me>
Injection-Date: Fri, 28 Jul 2023 11:19:20 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-e8ee-0-97f8-a0fd-ad47-8b4d.ipv6dyn.netcologne.de:2001:4dd7:e8ee:0:97f8:a0fd:ad47:8b4d";
logging-data="1491119"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Fri, 28 Jul 2023 11:19 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Thomas Koenig wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>>> Thomas Koenig wrote:
>>
>>>> One problem I see is that all the new registers are caller-saved,
>>>> for compatibility with existing ABIs. This is needed due to stack
>>>> unwinding and setjmp/longjmp, but restricts their benefit due to
>>>> having to spill them across function calls. It might be possible
>>>> to set __attribute__((nothrow)) on functions where this cannot
>>>> happen, and change some caller-saved to callee-saved registers
>>>> in that case, but that could be an interesting discussion.
>>>>
>>>
>>> I'm not worried at all about this point: The only places where I really
>>> want lots of registers are in big/complicated leaf functions!
>>>
>>> If a function both needs lots of registers _and_ has to call any
>>> non-inlined functions, then it really isn't that time critical.
>>
>> Fortran can use lots of registers for its array descriptors,
>> and also can use lots of library calls for mathematical functions
>> (because most CPUs don't have Mitch's single instructions for them).
>> Fortran library functions are typically __attribute__((nothrow)),
>> so in that field being able to use more registers across calls
>> would be a good thing, generally.
>>
> If said Fortran code is really performance critical, like in an FFT,
> then all the sin/cos function calls will be done up front and cached.

If you're doing lots of chemical reaction calculations, it is
not possible to pre-compute the Arrhenius equation coefficients
(and their Jacobians).

There's more to life than FFT :-)
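
Thomas's counter-example can be made concrete. For the Arrhenius rate constant k = A exp(-Ea/(R T)), the temperature derivative needed for the Jacobian is dk/dT = k Ea/(R T^2), so both depend on a temperature that is only known at the current time step, and every reaction needs a fresh exp() call. The Python below only illustrates that loop structure (the reaction parameters are made up); in compiled Fortran or C the exp() would be a library call inside the hot loop, which is where the caller-saved-register spilling discussed above costs time.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol*K)


def arrhenius(A, Ea, T):
    """Rate constant k = A*exp(-Ea/(R*T)) and its derivative dk/dT = k*Ea/(R*T**2)."""
    k = A * math.exp(-Ea / (R * T))
    return k, k * Ea / (R * T * T)


# Made-up (A, Ea) pairs: pre-exponential factor and activation energy in J/mol.
reactions = [(1.0e13, 1.5e5), (2.0e10, 8.0e4), (5.0e8, 4.0e4)]


def evaluate(T):
    """Rate constants and dk/dT for every reaction at temperature T.
    One exp() per reaction per call: unlike FFT twiddle factors, these
    cannot simply be computed once up front, because T changes between steps."""
    return [arrhenius(A, Ea, T) for A, Ea in reactions]


for T in (900.0, 950.0):           # temperature differs between solver steps
    for k, dk_dT in evaluate(T):
        print(f"T={T:.0f} K  k={k:.4e}  dk/dT={dk_dT:.4e}")
```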

Subject	Author
Intel goes to 32-bit general purpose registers	Thomas Koenig
Re: Intel goes to 32-bit general purpose registers	Scott Lurndal
Re: Intel goes to 32-bit general purpose registers	MitchAlsup
Re: Intel goes to 32-bit general purpose registers	Quadibloc
Re: Intel goes to 32-bit general purpose registers	Anton Ertl
Re: Intel goes to 32-bit general purpose registers	Peter Lund
Re: Intel goes to 32-bit general purpose registers	Anton Ertl
Re: Intel goes to 32-bit general purpose registers	Elijah Stone
Re: Intel goes to 32-bit general purpose registers	MitchAlsup
Re: Intel goes to 32-bit general purpose registers	Thomas Koenig
Re: Intel goes to 32-bit general purpose registers	Quadibloc
Re: Intel goes to 32-bit general purpose registers	Quadibloc
Re: Intel goes to 32-bit general purpose registers	John Dallman
Re: Intel goes to 32-bit general purpose registers	Scott Lurndal
Re: Intel goes to 32-bit general purpose registers	John Dallman
Re: Intel goes to 32-bit general purpose registers	Anton Ertl
Re: Intel goes to 32-bit general purpose registers	John Dallman
Re: Intel goes to 32-bit general purpose registers	BGB
Re: Intel goes to 32-bit general purpose registers	Anton Ertl
Re: Intel goes to 32-bit general purpose registers	JimBrakefield
Re: Intel goes to 32-bit general purpose registers	Michael S
Re: Intel goes to 32-bit general purpose registers	Anton Ertl
Re: Intel goes to 32-bit general purpose registers	John Dallman
Re: Intel goes to 32-bit general purpose registers	Stephen Fuld
Re: Intel goes to 32-bit general purpose registers	MitchAlsup
Re: Intel goes to 32-bit general purpose registers	Anton Ertl
Re: Intel goes to 32-bit general purpose registers	John Dallman
Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)	Anton Ertl
Re: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)	Quadibloc
Re: Intel goes to 32 GPRs (was: Intel goes to 32-bit ...)	Anton Ertl
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Scott Lurndal
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Scott Lurndal
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	Quadibloc
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	JimBrakefield
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	Stephen Fuld
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	Stephen Fuld
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Niklas Holsti
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Niklas Holsti
Re: Intel goes to 32 GPRs	Stephen Fuld
Re: Intel goes to 32 GPRs	Niklas Holsti
Re: Intel goes to 32 GPRs	Ivan Godard
Re: Intel goes to 32 GPRs	Kent Dickey
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Quadibloc
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Kent Dickey
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	Anton Ertl
Re: Intel goes to 32 GPRs	EricP
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	Thomas Koenig
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	MitchAlsup
Re: Intel goes to 32 GPRs	BGB
Re: Intel goes to 32 GPRs	Terje Mathisen
Re: Intel goes to 32 GPRs	Stephen Fuld
Re: Intel goes to 32 GPRs	Kent Dickey
Callee-saved registers (was: Intel goes to 32 GPRs)	Anton Ertl
Re: Intel goes to 32 GPRs	Mike Stump