devel / comp.arch / Re: More of my philosophy about CISC and RISC instructions..

Re: More of my philosophy about CISC and RISC instructions..

<793a9545-066b-4819-9bd1-ba0453f92955n@googlegroups.com>

 by: Timothy McCaffrey - Wed, 16 Aug 2023 20:19 UTC

On Wednesday, August 16, 2023 at 2:18:25 PM UTC-4, pec...@gmail.com wrote:
> Brett wrote:
> > MitchAlsup <Mitch...@aol.com> wrote:
> > > 32K is 25% bigger than 24K but only 1.1% faster, and likely burns more
> > > than 1.1% more power.
> > > <
> > > Comparing 64K 4-way to 48K 6-way:: 64K is only 0.7% faster; with 1M L2 only 0.4% faster.
> > This is the killer argument that would have saved me from caring about 16
> > bit opcodes.
> > Only toy CPU’s can care about 16 bit opcodes.
> Instruction compression still matters in embedded applications.

Given a variable-length instruction set, it seems to me it makes sense to encode the most-used
instructions as short instructions, if possible. I believe I have read that the most-used
instructions are load, compare, add and branch; the rest are each in the single digits, percentage-wise.
(I wish I had a reference, so take the above with a rock-sized grain of salt.) Anyway, if
you could encode those instructions into a 16-bit word, and leave the longer encodings
for the useful-but-rarely-used remainder, wouldn't that basically "compress"
your instruction set? Even if variants of the longer instructions "overlapped" the short instructions,
it would probably still be a win.
- Tim
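
A back-of-the-envelope sketch of the argument above, in Python. The instruction mix and the fraction of each class assumed to fit a 16-bit form are invented numbers for illustration only, not measurements; `MIX` and `average_size_bits` are hypothetical names.

```python
# Hypothetical mix: class -> (share of all instructions,
#                             fraction of that class assumed to fit in 16 bits).
# All values are made up for illustration.
MIX = {
    "load":    (0.25, 0.50),
    "compare": (0.15, 0.60),
    "add":     (0.20, 0.70),
    "branch":  (0.20, 0.80),
    "other":   (0.20, 0.00),   # everything else stays 32-bit
}

def average_size_bits(mix, short_bits=16, long_bits=32):
    """Weighted average instruction size for a two-length encoding."""
    avg = 0.0
    for share, short_frac in mix.values():
        avg += share * (short_frac * short_bits + (1.0 - short_frac) * long_bits)
    return avg

avg = average_size_bits(MIX)
print(f"average size: {avg:.2f} bits, saving vs. fixed 32-bit: {1 - avg / 32:.1%}")
```

The point is only that the saving scales with how often the common classes actually fit the short form, which is what the follow-up posts argue about.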

Re: More of my philosophy about CISC and RISC instructions..

<ubjc4h$3e8i6$1@dont-email.me>

 by: Stephen Fuld - Wed, 16 Aug 2023 20:38 UTC

On 8/16/2023 1:19 PM, Timothy McCaffrey wrote:
> On Wednesday, August 16, 2023 at 2:18:25 PM UTC-4, pec...@gmail.com wrote:
>> Brett wrote:
>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>> 32K is 25% bigger than 24K but only 1.1% faster, and likely burns more
>>>> than 1.1% more power.
>>>> <
>>>> Comparing 64K 4-way to 48K 6-way:: 64K is only 0.7% faster; with 1M L2 only 0.4% faster.
>>> This is the killer argument that would have saved me from caring about 16
>>> bit opcodes.
>>> Only toy CPU’s can care about 16 bit opcodes.
>> Instruction compression still matters in embedded applications.
>
> Given a variable length instruction set, it seems to me it makes sense to encode the most used
> instructions into small instructions, if possible.

Sure.

> I believe I have read that the most used
> instructions are load, compare, add and branch. The rest are in the single digits percentage wise.
> (I wish I had a reference, so take the above with a rock sized grain of salt).

I think that is at least approximately right.

> Anyway, if
> you could encode those instructions into a 16 bit word, and leave the longer instructions
> for all the useful but not used that much remainder, wouldn't that basically "compress"
> your instruction set (even if variants of the longer instructions "overlapped" the short instructions,
> it would probably still be a win).

Yes, but . . . For loads, you would be limited to a very short
displacement, limiting their usefulness. You almost certainly wouldn't
have three register specifiers, which limits adds to A=A+B, which isn't
terrible, but is an annoyance. Having a small immediate field probably
isn't much of a problem, as I think many constant adds are of a small
number. Branches are probably OK with a smaller displacement, as I
suspect a lot of branches are to targets quite close by. With compare, are you
proposing using condition codes? Otherwise you have the three register
specifier problem - eccch.

I think these considerations reduce (but probably don't eliminate) the
percentage of time the 16 bit instructions would be useful.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)
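
The bit-budget concern above can be sketched in a few lines of Python. The 4-bit opcode and the 5-bit versus 3-bit register specifier widths are assumptions chosen for illustration, not taken from any particular ISA; `leftover_bits` is a hypothetical helper.

```python
def leftover_bits(total_bits, opcode_bits, reg_fields, reg_bits):
    """Bits left for a displacement/immediate after opcode and register fields.

    A negative result means the format does not fit at all.
    """
    return total_bits - opcode_bits - reg_fields * reg_bits

# Assumed 4-bit opcode in a 16-bit instruction.
print(leftover_bits(16, 4, 3, 5))  # three 5-bit specifiers: does not fit
print(leftover_bits(16, 4, 2, 5))  # two 5-bit specifiers: tiny displacement
print(leftover_bits(16, 4, 2, 3))  # two 3-bit specifiers (8 registers): more room
```

Under these assumptions a three-register form cannot fit, and a two-register load is left with only a couple of displacement bits unless the short forms are restricted to a subset of the registers.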

Re: More of my philosophy about CISC and RISC instructions..

<0a5e4657-dc23-4974-bb2c-53c2bfdf526fn@googlegroups.com>

 by: MitchAlsup - Wed, 16 Aug 2023 21:30 UTC

On Wednesday, August 16, 2023 at 3:19:52 PM UTC-5, Timothy McCaffrey wrote:
> On Wednesday, August 16, 2023 at 2:18:25 PM UTC-4, pec...@gmail.com wrote:
> > Brett wrote:
> > > MitchAlsup <Mitch...@aol.com> wrote:
> > > > 32K is 25% bigger than 24K but only 1.1% faster, and likely burns more
> > > > than 1.1% more power.
> > > > <
> > > > Comparing 64K 4-way to 48K 6-way:: 64K is only 0.7% faster; with 1M L2 only 0.4% faster.
> > > This is the killer argument that would have saved me from caring about 16
> > > bit opcodes.
> > > Only toy CPU’s can care about 16 bit opcodes.
> > Instruction compression still matters in embedded applications.
<
> Given a variable length instruction set, it seems to me it makes sense to encode the most used
> instructions into small instructions, if possible. I believe I have read that the most used
> instructions are load, compare, add and branch. The rest are in the single digits percentage wise.
<
My 66000 encodes {ADD, CMP, Bcnd} into the LOOP instruction.
<
> (I wish I had a reference, so take the above with a rock sized grain of salt). Anyway, if
<
Hennessy and Patterson (any revision) has data on this.
I have a spreadsheet that cooks all H&P data across several architectures.
<
> you could encode those instructions into a 16 bit word, and leave the longer instructions
> for all the useful but not used that much remainder, wouldn't that basically "compress"
> your instruction set (even if variants of the longer instructions "overlapped" the short instructions,
> it would probably still be a win).
> - Tim

Re: More of my philosophy about CISC and RISC instructions..

<ubjf63$2eqg6$2@newsreader4.netcologne.de>

 by: Thomas Koenig - Wed, 16 Aug 2023 21:30 UTC

pec...@gmail.com <peceed@gmail.com> schrieb:
> Thomas Koenig wrote:
>> pec...@gmail.com <pec...@gmail.com> schrieb:
>> > Instruction compression still matters in embedded applications.
>>
>> > Until the mid-1990s, instruction compression was of great
>> > practical importance even on large machines. The first two
>> > processors that were "conscious RISCs", made by people who knew
>> > what they were doing, had a compact instruction format.
>> The first real RISC was arguably the 801, and it had both 16-bit
>> and 32-bit instructions, where the 32-bit instructions had 16-bit
>> constants.
>> (It was also a 24-bit machine, which seems strange, but probably
>> due to IBM internal politics).
>>
>> They did not have three-register instructions, which later became
>> the hallmark of RISC processors.
>>
>> Did they, according to your definition, know what they were doing?
> Sure.
>>
>> And who was the other machine?
> Berkeley RISC Blue

Register windows have proven not to be a very good idea, finally.

>> >Then came a bunch of imitators who, for purely religious reasons,
>> >insisted on a fixed instruction size.
>> Branch range is one reason why a multiple of four for instruction
>> size can be useful.
> Yes, but with code compression you can regain half of the effective "span".
> The more important advantage is that instructions are aligned, but that is not worth 40-50% code expansion.

RISC-V certainly took that path, and they spent a large part of their
opcode space on 16-bit instructions.

And this led to follow-on problems - lack of opcode space made
the designers choose small offsets for branches, leading to further
problems.

How did RISC-II address this issue?
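
The branch-range point (instruction granule versus reach) can be illustrated with a small Python sketch. The 12-bit offset field is picked purely for illustration, and `branch_reach_bytes` is a hypothetical helper, not anything from a real toolchain.

```python
def branch_reach_bytes(offset_field_bits, granule_bytes):
    """Return (lowest, highest) byte offset reachable by a signed offset
    field that counts in units of granule_bytes."""
    half = 1 << (offset_field_bits - 1)
    return (-half * granule_bytes, (half - 1) * granule_bytes)

# Same 12-bit field, instructions aligned to 4 bytes vs. 2 bytes.
print(branch_reach_bytes(12, 4))
print(branch_reach_bytes(12, 2))
```

With the same field width, halving the instruction granule halves the reach in bytes, which is the trade-off being discussed; denser code wins some of that back because the targets are closer together.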

Re: More of my philosophy about CISC and RISC instructions..

<b42c7084-798d-418f-af89-0f454a296e9bn@googlegroups.com>

 by: MitchAlsup - Wed, 16 Aug 2023 21:51 UTC

On Wednesday, August 16, 2023 at 4:30:46 PM UTC-5, Thomas Koenig wrote:
> pec...@gmail.com <pec...@gmail.com> schrieb:
> > Thomas Koenig wrote:
> >> pec...@gmail.com <pec...@gmail.com> schrieb:
> >> > Instruction compression still matters in embedded applications.
> >>
> >> > Until the mid-1990s, instruction compression was of great
> >> > practical importance even on large machines. The first two
> >> > processors that were "conscious RISCs", made by people who knew
> >> > what they were doing, had a compact instruction format.
> >> The first real RISC was arguably the 801, and it had both 16-bit
> >> and 32-bit instructions, where the 32-bit instructions had 16-bit
> >> constants.
> >> (It was also a 24-bit machine, which seems strange, but probably
> >> due to IBM internal politics).
> >>
> >> They did not have three-register instructions, which later became
> >> the hallmark of RISC processors.
> >>
> >> Did they, according to your definition, know what they were doing?
> > Sure.
> >>
> >> And who was the other machine?
> > Berkeley RISC Blue
> Register windows have proven not to be a very good idea, finally.
<
Only those betting against the power of the optimizing compilers choose
register windows (and some that had to bet with them--Itanic for example)
<
> >> >Then came a bunch of imitators who, for purely religious reasons,
> >> >insisted on a fixed instruction size.
> >> Branch range is one reason why a multiple of four for instruction
> >> size can be useful.
> > Yes, but with code compression you can regain half of the effective "span".
> > The more important advantage is that instructions are aligned, but that is not worth 40-50% code expansion.
> RISC-V certainly took that path, and they spent a large part of their
> opcode space on 16-bit instructions.
>
> And this led to follow-on problems - lack of opcode space made
> the designers choose small offsets for branches, leading to further
> problems.
<
It also caused them to a) have to expand the instructions back to size
and then b) fuse instructions together. Literature indicates 5% by
fusing. In contrast, My 66000 ISA only needs 70% of the instruction
count of RISC-V {average, 69% Geomean, 68% Harmonic Mean} over
the 560 subroutines I have spent the time to examine in fine detail.
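
For readers unfamiliar with the three summary statistics quoted above, here is a minimal Python sketch of how they are computed over per-subroutine instruction-count ratios. The ratios in `sample` are made-up placeholders, not the data behind the 70% / 69% / 68% figures; `summarize` is a hypothetical helper.

```python
from statistics import mean, geometric_mean, harmonic_mean

def summarize(ratios):
    """Arithmetic, geometric and harmonic mean of a list of positive ratios."""
    return {
        "average": mean(ratios),
        "geomean": geometric_mean(ratios),
        "harmonic": harmonic_mean(ratios),
    }

# Placeholder ratios (instruction count of ISA A / instruction count of ISA B).
sample = [0.55, 0.70, 0.85, 0.62, 0.78]
for name, value in summarize(sample).items():
    print(f"{name}: {value:.3f}")
```

For positive inputs the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, matching the ordering of the three figures in the post.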
<
EMBench seems to have several characteristics that expose the
disadvantages of the RISC-V ISA:
A) many stack frames are big enough that the 12-bit displacement
is insufficient, but a 16-bit displacement would have been enough.
B) a plethora of LUI Rt,hi(variable) followed by MEM Rd,lo(variable)[Rt]
C) a plethora of AUIPC Rt,hi(variable).....
>
> How did RISC-II address this issue?
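
Item B above refers to the usual two-instruction idiom for reaching a 32-bit address with a 20-bit upper immediate plus a signed 12-bit displacement. A minimal Python sketch of the hi/lo split follows; the function names are hypothetical, and the rounding by 0x800 is the standard trick for compensating the sign extension of the low part.

```python
def split_hi_lo(addr):
    """Split a 32-bit address into (hi20, lo12) where lo12 is signed,
    so that (hi20 << 12) + lo12 == addr (mod 2**32)."""
    addr &= 0xFFFFFFFF
    hi = ((addr + 0x800) >> 12) & 0xFFFFF   # round up when low part is "negative"
    lo = addr & 0xFFF
    if lo >= 0x800:                         # interpret low 12 bits as signed
        lo -= 0x1000
    return hi, lo

def rebuild(hi, lo):
    """Recombine what the upper-immediate + displacement pair computes."""
    return ((hi << 12) + lo) & 0xFFFFFFFF

for a in (0x12345678, 0x12345FFF, 0xFFFFFFFF):
    hi, lo = split_hi_lo(a)
    print(hex(a), hex(hi), lo, hex(rebuild(hi, lo)))
```

Every such access costs two instructions, which is why a longer displacement or a full-width immediate form shrinks the instruction count.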

Re: More of my philosophy about CISC and RISC instructions..

<ubjibr$3f76s$1@dont-email.me>

 by: BGB - Wed, 16 Aug 2023 22:24 UTC

On 8/16/2023 1:15 PM, MitchAlsup wrote:
> On Wednesday, August 16, 2023 at 1:29:37 AM UTC-5, BGB wrote:
>> On 8/15/2023 4:53 PM, MitchAlsup wrote:
>>> On Tuesday, August 15, 2023 at 4:27:26 PM UTC-5, BGB-Alt wrote:
>>
>>> I have 5-bit immediates::
>>> <
>>> FADD R8,#1,R9 // R8 = 1.0D0 + R9;
>>> <
>>> I have two 6-bit immediates::
>>> <
>>> SLL R8,R9,<17:28> // R8 = ~(~0<<17) & (R9>>28)
>>> <
>>> I have 16-bit immediates:
>>> <
>>> ADD R8,R9,0x1234
>>> LD R8,[R9+0x1234]
>>> <
>>> And all of these fit in 1 word--as anyone from the 1st generation RISC camp would
>>> see (except the SPARC guys...)
> <
>> Granted, I am not really familiar with your instruction formats, since
>> you tend not to describe them in any detail here...
>>
>>
>> My usual descriptions should at least be easier to figure out, since
>> they tend to be effectively "notation-modified hexadecimal".
> <
> Inst<31:26> Major OpCode
> Inst<25:21> Rd or Condition
> Inst<20:16> Rs1
> Inst<15:11> Instruction Modifiers
> Inst<10:05> Minor OpCodes
> Inst<04:00> Rs2
> <
> When Inst<31> == 1
> Inst<15:0> IMM16
> <
> However when Major == 3-Operand
> Inst<12:10> Minor OpCode
> Inst<9:5> Rs3
> <

Half the encoding space is Imm16 ops?...

But, yeah, I guess this works.

I guess, specifying everything with bit-ranges is easier to read than, say:
0zzzzz-ddddd-sssss-xxxxx-yyyyyy-ttttt
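
The bit-range layout quoted above translates fairly directly into a field extractor. This is a minimal Python sketch under the assumption that the ranges are exactly as quoted; the 3-operand variant is left out because the post does not give the major opcode value that selects it. `bits` and `decode` are hypothetical names.

```python
def bits(word, hi, lo):
    """Extract word<hi:lo> (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode(inst):
    """Split a 32-bit instruction word into the fields quoted in the post."""
    fields = {
        "major": bits(inst, 31, 26),
        "rd":    bits(inst, 25, 21),   # Rd or condition
        "rs1":   bits(inst, 20, 16),
    }
    if bits(inst, 31, 31) == 1:        # Inst<31> == 1: 16-bit immediate form
        fields["imm16"] = bits(inst, 15, 0)
    else:
        fields["modifiers"] = bits(inst, 15, 11)
        fields["minor"]     = bits(inst, 10, 5)
        fields["rs2"]       = bits(inst, 4, 0)
    return fields

# Example word with an arbitrary major opcode that has bit 31 set.
word = (0b100001 << 26) | (8 << 21) | (9 << 16) | 0x1234
print(decode(word))
```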

....

>>>>
>>>> However, using the 64-bit encoding are less desirable, since these can't
>>>> be organized into bundles.
>>>>
>>> Easily fixed--get rid of the concept of bundles.
>> Usual downsides of superscalar notwithstanding. Falling back to
>> scalar-only operation being similarly undesirable.
> <
> Everyone and his brother have done superscalar without bundles.
> Conversely all static VLIW forms have failed.
> <
> Now, what would you do if you got an FPGA with the resources to do
> a 4-wide or 5-wide but not a 6-wide machine ??

Dunno...

A lot of DSPs and some microcontrollers and similar have gotten along
OK with VLIW.

Under my existing practice, it would be:
Define a new WEX profile for the new rules;
Modify compiler to allow rules for new profile;
Deal with annoyance of resulting compatibility issues.

But, current thinking was more like:
I will canonize on 3-wide;
4+ wide, by that point, can probably afford superscalar...

There is little that should prevent superscalar, since the bundling
rules still require that the instruction sequence is also "sane" if the
instructions are executed sequentially.

As for 3-wide, at present, there doesn't seem to be enough "free ILP"
floating around to justify going wider. Even 3 is pushing it, but the
main advantage that 3-wide has over 2-wide is that it makes it easier to
justify a 6-port register file (which sidesteps some limitations which
result in my case from a 4-port register file).

Well, and a 2-wide configuration with a 6R register file costs almost as
much as a 3-wide configuration.

>>
>> Though, granted, in theory a superscalar core would not need to worry
>> about things which lack a dedicated bundle encoding.
>>>>
>>>> Could potentially try to address this by changing some of the ISA rules
>>>> (to allow jumbo encodings within bundles), but this would make fetch and
>>>> decode more expensive (or, if I allowed a 2-wide case with 2 jumbo
>>>> prefixes, this would require supporting a 128-bit instruction fetch, ...).
>>> <
>>> I started with the concept of 64-bit computer with an inherently misaligned
>>> memory model. Loading a misaligned 64-bit item requires fetching 128-bits
>>> from DCache. Then once you have 128-bit DCache, another instance and you
>>> have a 128-bit instruction fetch. Presto, done.....
>>> <
>>> See how easy that is !!
>> In my case, the "freely aligned" cases only ended up going up to 64 bits.
>>
>>
>> A freely aligned 128-bit fetch would effectively require the L1 caches
>> to internally work with 256 bits at a time (rather than using a 128-bit
>> block).
> <
> But Ifetch does not access the ICache misaligned--obviating that.

It has to deal with the minimum allowed alignment, which in my case is
16 bit.

So, say, one has a 128-bit block fetched with a 64b alignment (X=16
bits), with 64-bit fetch:
XXXXXXXX
IIII----
-IIII---
--IIII--
---IIII-
Everything fits.

96-bit:
XXXXXXXX
IIIIII--
-IIIIII-
--IIIIII
---IIIIIi //oh-no

Or, 128-bit:
XXXXXXXX
IIIIIIII
-IIIIIIIi //oh-no
--IIIIIIii //oh-no
---IIIIIiii //oh-no

So, errm, block needs to be bigger...

Say, we expand the internal fetch block to 192 bits with a 64-bit alignment,
for a 128-bit fetch:
XXXXXXXXXXXX
IIIIIIII----
-IIIIIIII---
--IIIIIIII--
---IIIIIIII-

This would work at least...
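
The diagrams above can be checked mechanically. A rough sketch, assuming everything is counted in 16-bit units and the fetch block always starts on a 64-bit boundary (the function name is made up for illustration):

```python
def overhanging_starts(block_units, align_units, inst_units):
    """Start offsets (in 16-bit units) at which the instruction runs past the block.

    The block starts on an align_units boundary, so the instruction can
    begin at offsets 0 .. align_units-1 within it.
    """
    return [s for s in range(align_units) if s + inst_units > block_units]


for inst_bits in (64, 96, 128):
    for block_bits in (128, 192):
        bad = overhanging_starts(block_bits // 16, 64 // 16, inst_bits // 16)
        print(f"{inst_bits}-bit fetch, {block_bits}-bit block: overhang at {bad}")
```

An overhang at unit offset 3 is byte offset 6 within the 64-bit-aligned block, which is the (PC&0x6)==0x6 case.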

And, still maps to the "paired cache line" scheme.

EEEEEEEEOOOOOOOOEEEEEEEEOOOOOOOO
XXXXXXXXXXXX----________________
IIIIIIII--------
-IIIIIIII-------
--IIIIIIII------
---IIIIIIII-----
----XXXXXXXXXXXX________________
----IIIIIIII----
-----IIIIIIII---
------IIIIIIII--
-------IIIIIIII-
________XXXXXXXXXXXX----________
________IIIIIIII--------
________-IIIIIIII-------
________--IIIIIIII------
________---IIIIIIII-----
________----XXXXXXXXXXXX________
________----IIIIIIII----
________-----IIIIIIII---
________------IIIIIIII--
________-------IIIIIIII-

I guess, 192 bits is cheaper than 256, and sufficient to deal with free
alignment.

>>
>> Though, instruction alignment that is like:
>> Well, 16/32/64 bit cases have a 16-bit alignment, but 96 bit requires
>> 32-bit alignment, is a little wonky.
> <
> Variable issue with 16-bit quanta has 2× as many multiplexers as with
> 32-bit quanta.

Yeah.

In this case, it is mostly just an issue that there exists a case where
a 96-bit fetch with a 128-bit fetch block would leave the final 16 bits
"hanging off the end".

Defining "96-bit op needs 32-bit alignment" was less wonky than "96-bit
op is not valid if ((PC&0x6)==0x6)..."

Granted, one could argue that the "less bad" option is to require that the
I$ be wide enough to deal with any instruction at any alignment.

>>
>> If I were to handle it the same way as my L1 D$, then a 128-bit
>> instruction fetch would need a 64-bit alignment. This is basically no-go.
>>
> Sigh.........

I guess, technically, there is at least a workaround...

>>
>> So, would need to make this logic wider...
>>
>> Granted, caches which work with even/odd pairs of 128-bit cache lines,
>> are at least wide enough to deal with unaligned 128-bit load/store
>> without needing to redesign the bus-side interface (it would mostly be
>> an issue of added cost).
>>>>
>>>> Though, a possible merit would be, say, if I could allow a
>>>> "FEii-iiii-FEii-iiii-FFw0-0iii-ZZnm-ZeiZ"
>>>> Special case to allow gluing a 64-bit immediate onto pretty much any
>>>> other instruction...
>>> <
>>> instruction<15:13,11> contains the "Routing of operands to Function Units".
>>> This includes sign control, immediates and their position,....
>>>>
>>>> But, this falls more into a "possible but debatable if worth the cost"
>>>> category...
>>> <
>>> You are the one who can never let a thread die.........
>>> <
>> Hmm...
>>>>>>
>>>>>> Granted, one can encode:
>>>>>> ADD 0x1234, R45
>>>>>> And:
>>>>>> ADD 0x12345678, R45
>>>>>> In the Baseline mode.
>>>>>>
>>>>>> But, at the cost of needing to use a jumbo prefix...
>>>>>>>>
>>>>>>>> Or:
>>>>>>>> ADD?T R21, R11, R30
>>>>>>>> And:
>>>>>>>> ADD R21, R43, R60
>>>>>>>> But not:
>>>>>>>> ADD?T R21, R43, R60
>>>>>>> <
>>>>>>> You only have that problem when you improperly encode predication.
>>>>>>> <
>>>>>>> Pcnd Rcnd,TTTEEE
>>>>>>> Then-Inst
>>>>>>> Then-Inst
>>>>>>> Then-Inst
>>>>>>> Else-Inst
>>>>>>> Else-Inst
>>>>>>> Else-Inst
>>>>>>> unpredicated-Inst
>>>>> <
>>>>>> The scheme I am using allows me to execute the Then and Else branches at
>>>>>> the same time:
>>>>>> ADD?T R4, 1, R9 | ADD?F R4, -1, R9
>>>>> <
>>>>> What makes you think I cannot ??
>>> <
>>>> Assuming the decoder moves along at one instruction (or bundle) per
>>>> clock cycle, then encoding them end-to-end would take twice as long.
>>> <
>>> That is not how the decoder moves along. At the PRED instruction, the
>>> fetch-decode pipeline knows how long the then-clause is and how long
>>> the else-clause is. Now remember I am fetching 128-bits per cycle in
>>> my 1-wide machine and there is a limit of 8 instructions (max) in each
>>> clause, so by the time the PRED condition resolves, all of the instructions
>>> in the then-clause have been fetched, and the else-clause instructions
>>> are being fetched. At this point all one has to do is cancel stuff, but you
>>> still do not need to disrupt the fetch-decode pipeline. At most I decode
>>> 1 instruction in the then-clause that will not survive execution. And it
>>> does not get to execute.
>> Hmm...
>>
>> OK.
>> My pipeline advances 1 bundle per clock cycle (excluding stalls or
>> branches).
> <
> SuperScalar pipelines do not have to have all instructions proceeding
> at the same rate. OoO pipelines guarantee that they don't.
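
For reference, the "Pcnd Rcnd,TTTEEE" form quoted further up can be modelled in a few lines. This is only a sketch of the semantics as described in the quoted text (a shadow of then/else slots followed by unpredicated code); the function name and the list-of-strings representation are assumptions, and it says nothing about pipeline timing.

```python
def apply_pred(condition, shadow, instructions):
    """Return the instructions that survive a predicate shadow.

    shadow is a string of 'T'/'E', one character per following instruction;
    instructions past the end of the shadow are unpredicated.
    """
    survivors = []
    for i, inst in enumerate(instructions):
        if i < len(shadow):
            wants_true = shadow[i] == "T"
            if wants_true != condition:
                continue  # fetched and possibly decoded, but cancelled
        survivors.append(inst)
    return survivors
```

Both clauses sit in the instruction stream back to back, so nothing has to redirect fetch; the condition only decides which slots get cancelled.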


Re: More of my philosophy about CISC and RISC instructions..

<dfea4b94-cd0b-4741-93e8-44ca37f09473n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33661&group=comp.arch#33661

Newsgroups: comp.arch
Date: Wed, 16 Aug 2023 16:41:00 -0700 (PDT)
In-Reply-To: <ubjibr$3f76s$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:85aa:2278:f81:8fa8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:85aa:2278:f81:8fa8
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<ba6c5d7b-80f5-49d5-84a5-4fa92ee7f86bn@googlegroups.com> <ubesqp$2m85c$1@dont-email.me>
<23c1700c-e477-4130-b7f4-d8559f85165an@googlegroups.com> <ubgcn8$2tniu$1@dont-email.me>
<06e13dbd-db31-49e6-8f1d-262b0fc277b8n@googlegroups.com> <ubgqjp$2vppn$1@dont-email.me>
<01c75f2e-0471-4f2e-88fb-f45d9020138bn@googlegroups.com> <ubhqcc$375ak$1@dont-email.me>
<23b38342-5d35-4fb5-97fb-3cb20432d343n@googlegroups.com> <ubjibr$3f76s$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <dfea4b94-cd0b-4741-93e8-44ca37f09473n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 16 Aug 2023 23:41:01 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 17312
 by: MitchAlsup - Wed, 16 Aug 2023 23:41 UTC

On Wednesday, August 16, 2023 at 5:25:06 PM UTC-5, BGB wrote:
> On 8/16/2023 1:15 PM, MitchAlsup wrote:
> > On Wednesday, August 16, 2023 at 1:29:37 AM UTC-5, BGB wrote:
> >> On 8/15/2023 4:53 PM, MitchAlsup wrote:
> >>> On Tuesday, August 15, 2023 at 4:27:26 PM UTC-5, BGB-Alt wrote:
> >>
> >>> I have 5-bit immediates::
> >>> <
> >>> FADD R8,#1,R9 // R8 = 1.0D0 + R9;
> >>> <
> >>> I have two 6-bit immediates::
> >>> <
> >>> SLL R8,R9,<17:28> // R8 = ~(~0<<17) & (R9>>28)
> >>> <
> >>> I have 16-bit immediates:
> >>> <
> >>> ADD R8,R9,0x1234
> >>> LD R8,[R9+0x1234]
> >>> <
> >>> And all of these fit in 1 word--as anyone from the 1st generation RISC camp would
> >>> see (except the SPARC guys...)
> > <
> >> Granted, I am not really familiar with your instruction formats, since
> >> you tend not to describe them in any detail here...
> >>
> >>
> >> My usual descriptions should at least be easier to figure out, since
> >> they tend to be effectively "notation-modified hexadecimal".
> > <
> > Inst<31:26> Major OpCode
> > Inst<25:21> Rd or Condition
> > Inst<20:16> Rs1
> > Inst<15:11> Instruction Modifiers
> > Inst<10:05> Minor OpCodes
> > Inst<04:00> Rs2
> > <
> > When Inst<31> == 1
> > Inst<15:0> IMM16
> > <
> > However when Major == 3-Operand
> > Inst<12:10> Minor OpCode
> > Inst<9:5> Rs3
> > <
> Half the encoding space is Imm16 ops?...
<
Just under, yes.
>
> But, yeah, I guess this works.
>
> I guess, specifying everything with bit-ranges is easier to read than, say:
> 0zzzzz-ddddd-sssss-xxxx-yyyyyy-ttttt
<
There are 7 formats:: and in your form::
<
The Major OpCode escapes
>
000110-bcond-sssss-PRED-----imm4----imm4 // predication cast
000111-ddddd-sssss-SHF-s-wwwwww-llllll // shift with immediates
001001-ddddd-sssss-MODIF-MEMORY-sssss // 2-register memory references
001010-ddddd-sssss-MODIF-2OPRND-sssss // 2-register calculations
001100-ddddd-sssss-MOD-3OP-sssss-sssss // 3-register calculations
001101-ddddd-sssss-MODIF-1OPERND-sssss // 1-register calculations
<
Then the Major OpCodes
<
01100x-bonbit-sssss-displacement16
011010-bcond-sssss-displacement16
011011-hhhhh-sssss-displacement16 // Table Transfer (switch)
011110-displacement26 // branch
011111-displacement26 // CALL
100xxx-ddddd-sssss-immediate16 // LDs
101xxx-ddddd-sssss-immediate16 // STs
110xxx-ddddd-sssss-immediate16 // integer
111xxx-ddddd-sssss-immediate16 // logical
>
> ...
> >>>>
> >>>> However, using the 64-bit encodings is less desirable, since these can't
> >>>> be organized into bundles.
> >>>>
> >>> Easily fixed--get rid of the concept of bundles.
> >> Usual downsides of superscalar notwithstanding. Falling back to
> >> scalar-only operation being similarly undesirable.
> > <
> > Everyone and his brother have done superscalar without bundles.
> > Conversely all static VLIW forms have failed.
> > <
> > Now, what would you do if you got an FPGA with the resources to do
> > a 4-wide or 5-wide but not a 6-wide machine ??
> Dunno...
>
> A lot of DSP's and some microcontrollers and similar have gotten along
> OK with VLIW.
>
>
> Under my existing practice, it would be:
> Define a new WEX profile for the new rules;
> Modify compiler to allow rules for new profile;
> Deal with annoyance of resulting compatibility issues.
>
>
> But, current thinking was more like:
> I will canonize on 3-wide;
> 4+ wide, by that point, can probably afford superscalar...
>
>
> There is little that should prevent superscalar. Since, the bundling
> rules still require that the instruction sequence is also "sane" if the
> instructions are executed sequentially.
>
>
> As for 3-wide, at present, there doesn't seem to be enough "free ILP"
> floating around to justify going wider. Even 3 is pushing it, but the
> main advantage that 3-wide has over 2-wide is that it makes it easier to
> justify a 6-port register file (which sidesteps some limitations which
> result in my case from a 4-port register file).
>
> Well, and a 2-wide configuration with a 6R register file costs almost as
> much as a 3-wide configuration.
<
There is a time and place for fully-resourcing a machine--the register file
is not one of them.
<
A 3-wide machine which can provide 2-2-2 and 3-2-1 register operands
to any function units performs far better than a 2-wide machine providing 3-3.
{Counting long constants} as much as 45% of instructions use a constant
and thereby don't need the second register port, while only 10%-ish need
all 3.
<
Also note: Storing of a constant becomes independent of any register dependency.
<
> >>
> >> Though, granted, in theory a superscalar core would not need to worry
> >> about things which lack a dedicated bundle encoding.
> >>>>
> >>>> Could potentially try to address this by changing some of the ISA rules
> >>>> (to allow jumbo encodings within bundles), but this would make fetch and
> >>>> decode more expensive (or, if I allowed a 2-wide case with 2 jumbo
> >>>> prefixes, this would require supporting a 128-bit instruction fetch, ...).
> >>> <
> >>> I started with the concept of 64-bit computer with an inherently misaligned
> >>> memory model. Loading a misaligned 64-bit item requires fetching 128-bits
> >>> from DCache. Then once you have 128-bit DCache, another instance and you
> >>> have a 128-bit instruction fetch. Presto, done.....
> >>> <
> >>> See how easy that is !!
> >> In my case, the "freely aligned" cases only ended up going up to 64 bits.
> >>
> >>
> >> A freely aligned 128-bit fetch would effectively require the L1 caches
> >> to internally work with 256 bits at a time (rather than using a 128-bit
> >> block).
> > <
> > But Ifetch does not access the ICache misaligned--obviating that.
> It has to deal with the minimum allowed alignment, which in my case is
> 16 bit.
<
Simpler than that::
<
128-bits = ICache[ IP & ~15];
<
Instruction = 128-bits[ IP & 15 ];
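
A rough software model of the two lines above, assuming the ICache can be treated as a flat byte array (that, and the helper names, are illustrative assumptions rather than anything from the actual design). The second-block read stands in for the "IP + 16" fetch on the following cycle.

```python
def fetch_block(icache: bytes, ip: int) -> bytes:
    """Read the aligned 16-byte (128-bit) block that contains ip."""
    base = ip & ~15
    return icache[base:base + 16]


def bytes_at(icache: bytes, ip: int, length: int):
    """Return `length` instruction bytes starting at ip and the number of blocks read."""
    block = fetch_block(icache, ip)
    offset = ip & 15
    chunk = block[offset:offset + length]
    blocks = 1
    if len(chunk) < length:
        # The instruction overhangs the block: take the rest from the next one.
        nxt = fetch_block(icache, ip + 16)
        chunk += nxt[:length - len(chunk)]
        blocks = 2
    return chunk, blocks
```

The cache itself is only ever accessed at aligned addresses; the misalignment is handled entirely by the select step after the read.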
>
> So, say, one has a 128-bit block fetched with a 64b alignment (X=16
> bits), with 64-bit fetch:
> XXXXXXXX
> IIII----
> -IIII---
> --IIII--
> ---IIII-
> Everything fits.
>
> 96-bit:
> XXXXXXXX
> IIIIII--
> -IIIIII-
> --IIIIII
> ---IIIIIi //oh-no
>
> Or, 128-bit:
> XXXXXXXX
> IIIIIIII
> -IIIIIIIi //oh-no
> --IIIIIIii //oh-no
> ---IIIIIiii //oh-no
>
> So, errm, block needs to be bigger...
>
No, FETCH needs to be more aggressive::
<
Fetch on cycle[1] :: IP + 00 ::: 0123456789abcdef
Fetch on cycle[2] :: IP + 16 ::: fedcba9876543210
<
inst-128 can be observed in cycle[2] but only after tag==TLB comparisons (late)
So, you have time to PARSE the instructions determine boundaries but not
decode them or access register file ports.
<
So, DECODE is in cycle[3] and multiple decoders are handed 1 instruction-specifier
each {with or without appended constants}. By the end of cycle[3] the overhanging
constants on the subsequent FETCH will have arrived and can be routed directly into
execution (while the arriving instructions themselves cannot). So, yes, there is the
possibility of stutter-stepping due to instruction alignment, but because you are
fetching 4× your execution width, the front end plows far enough ahead that this is
seldom a problem. {Far less of a problem than branch target latency.}
>
> Say, we expand the internal fetch block to 192 bits with a 64-bit alignment,
> for a 128-bit fetch:
> XXXXXXXXXXXX
> IIIIIIII----
> -IIIIIIII---
> --IIIIIIII--
> ---IIIIIIII-
>
> This would work at least...
>
>
> And, still maps to the "paired cache line" scheme.
>
> EEEEEEEEOOOOOOOOEEEEEEEEOOOOOOOO
> XXXXXXXXXXXX----________________
> IIIIIIII--------
> -IIIIIIII-------
> --IIIIIIII------
> ---IIIIIIII-----
> ----XXXXXXXXXXXX________________
> ----IIIIIIII----
> -----IIIIIIII---
> ------IIIIIIII--
> -------IIIIIIII-
> ________XXXXXXXXXXXX----________
> ________IIIIIIII--------
> ________-IIIIIIII-------
> ________--IIIIIIII------
> ________---IIIIIIII-----
> ________----XXXXXXXXXXXX________
> ________----IIIIIIII----
> ________-----IIIIIIII---
> ________------IIIIIIII--
> ________-------IIIIIIII-
>
>
> I guess, 192 bits is cheaper than 256, and sufficient to deal with free
> alignment.
For your ISA and your execution width (only)
> >>
> >> Though, instruction alignment that is like:
> >> Well, 16/32/64 bit cases have a 16-bit alignment, but 96 bit requires
> >> 32-bit alignment, is a little wonky.
> > <
> > Variable issue with 16-bit quanta has 2× as many multiplexers as with
> > 32-bit quanta.
> Yeah.
>
> In this case, it is mostly just an issue that there exists a case where
> a 96-bit fetch with a 128-bit fetch block would leave the final 16 bits
> "hanging off the end".
<
But by the time you have figured that out, all you have to look at is
hit[+1] to know that the rest of the instruction has arrived.
>
> Defining "96-bit op needs 32-bit alignment" was less wonky than "96-bit
> op is not valid if ((PC&0x6)==0x6)..."
>
>
> Granted, one could argue that the "less bad" option is to require that the
> I$ be wide enough to deal with any instruction at any alignment.
<
My requirement was dramatically simpler: I don't want to expend the
engineering resources to build an ICache when I can instantiate a
DCache and let the great Verilog gate-eater get rid of the unneeded
stuff. ICache is a degenerate subset of DCache functionality. Build
once, verify once, instantiate as many times as desired.
> >>
> >> If I were to handle it the same way as my L1 D$, then a 128-bit
> >> instruction fetch would need a 64-bit alignment. This is basically no-go.
> >>
> > Sigh.........
> I guess, technically, there is at least a workaround...
> >>
> >> So, would need to make this logic wider...
> >>
> >> Granted, caches which work with even/odd pairs of 128-bit cache lines,
> >> are at least wide enough to deal with unaligned 128-bit load/store
> >> without needing to redesign the bus-side interface (it would mostly be
> >> an issue of added cost).
> >>>>
> >>>> Though, a possible merit would be, say, if I could allow a
> >>>> "FEii-iiii-FEii-iiii-FFw0-0iii-ZZnm-ZeiZ"
> >>>> Special case to allow gluing a 64-bit immediate onto pretty much any
> >>>> other instruction...
> >>> <
> >>> instruction<15:13,11> contains the "Routing of operands to Function Units".
> >>> This includes sign control, immediates and their position,....
> >>>>
> >>>> But, this falls more into a "possible but debatable if worth the cost"
> >>>> category...
> >>> <
> >>> You are the one who can never let a thread die.........
> >>> <
> >> Hmm...
> >>>>>>
> >>>>>> Granted, one can encode:
> >>>>>> ADD 0x1234, R45
> >>>>>> And:
> >>>>>> ADD 0x12345678, R45
> >>>>>> In the Baseline mode.
> >>>>>>
> >>>>>> But, at the cost of needing to use a jumbo prefix...
> >>>>>>>>
> >>>>>>>> Or:
> >>>>>>>> ADD?T R21, R11, R30
> >>>>>>>> And:
> >>>>>>>> ADD R21, R43, R60
> >>>>>>>> But not:
> >>>>>>>> ADD?T R21, R43, R60
> >>>>>>> <
> >>>>>>> You only have that problem when you improperly encode predication..
> >>>>>>> <
> >>>>>>> Pcnd Rcnd,TTTEEE
> >>>>>>> Then-Inst
> >>>>>>> Then-Inst
> >>>>>>> Then-Inst
> >>>>>>> Else-Inst
> >>>>>>> Else-Inst
> >>>>>>> Else-Inst
> >>>>>>> unpredicated-Inst
> >>>>> <
> >>>>>> The scheme I am using allows me to execute the Then and Else branches at
> >>>>>> the same time:
> >>>>>> ADD?T R4, 1, R9 | ADD?F R4, -1, R9
> >>>>> <
> >>>>> What makes you think I cannot ??
> >>> <
> >>>> Assuming the decoder moves along at one instruction (or bundle) per
> >>>> clock cycle, then encoding them end-to-end would take twice as long.
> >>> <
> >>> That is not how the decoder moves along. At the PRED instruction, the
> >>> fetch-decode pipeline knows how long the then-clause is and how long
> >>> the else-clause is. Now remember I am fetching 128-bits per cycle in
> >>> my 1-wide machine and there is a limit of 8 instructions (max) in each
> >>> clause, so by the time the PRED condition resolves, all of the instructions
> >>> in the then-clause have been fetched, and the else-clause instructions
> >>> are being fetched. At this point all one has to do is cancel stuff, but you
> >>> still do not need to disrupt the fetch-decode pipeline. At most I decode
> >>> 1 instruction in the then-clause that will not survive execution. And it
> >>> does not get to execute.
> >> Hmm...
> >>
> >> OK.
> >> My pipeline advances 1 bundle per clock cycle (excluding stalls or
> >> branches).
> > <
> > SuperScalar pipelines do not have to have all instructions proceeding
> > at the same rate. OoO pipelines guarantee that they don't.
<
> My imagination for a superscalar pipeline would have basically been to
> have logic to detect valid prefix/suffix pairs and a lack of register
> conflict, and then behave as if the WEX flag were set (*).
<
Add function unit conflict detect and you have the sequencer from Mc 88110.
>
<snip>
> >> So, you are arguing it would be better to just "bite the bullet" early
> >> and go over to superscalar?...
> > <
> > If you EVER see your architecture being implemented as a 1-wide
> > or 4,5-wide or 7-8-wide :: yes absolutely. That is you are locking
> > in the concept of the bundle that may not be relevant in other
> > implementations.
> > <
> I have done a 1-wide implementation, but it was pretty limited.
>
> But, within the limitations of a 1-wide context, there isn't really any
> strong advantage over RISC-V.
>
> I had also done a 2-wide implementation, but there is the annoyance of
> binary compatibility issues.
>
See, my ISA goes from the small through the large (mostly) effortlessly.
{Something I paid dearly for in the middle of my career having to engineer
my way out of the holes I had dug myself into earlier.}
<
Mental test cases show I can scale typical ICache-based Fetch-Insert
pipelines from 1 through 16 instructions per clock rather easily.
Exactly how wide a given technology can fit (or even be optimal for)
is a significantly harder question.


Re: More of my philosophy about CISC and RISC instructions..

<19aaa95e-047d-48f4-a6ff-0f60fed9d054n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33662&group=comp.arch#33662

Newsgroups: comp.arch
Date: Wed, 16 Aug 2023 17:25:01 -0700 (PDT)
In-Reply-To: <b42c7084-798d-418f-af89-0f454a296e9bn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=93.159.184.237; posting-account=zjh_fgoAAABo0Nzgf6peaFtS6c-3xdgr
NNTP-Posting-Host: 93.159.184.237
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <ubeclc$2gi1m$1@dont-email.me>
<36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de>
<5caa71f9-d744-461b-96f4-3fd4d2e3a108n@googlegroups.com> <ubjf63$2eqg6$2@newsreader4.netcologne.de>
<b42c7084-798d-418f-af89-0f454a296e9bn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <19aaa95e-047d-48f4-a6ff-0f60fed9d054n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: peceed@gmail.com (pec...@gmail.com)
Injection-Date: Thu, 17 Aug 2023 00:25:02 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: pec...@gmail.com - Thu, 17 Aug 2023 00:25 UTC

MitchAlsup wrote:
> On Wednesday, August 16, 2023 at 4:30:46 PM UTC-5, Thomas Koenig wrote:
> > pec...@gmail.com <pec...@gmail.com> schrieb:
> > > Thomas Koenig wrote:
> > >> pec...@gmail.com <pec...@gmail.com> schrieb:
> > >> > Instruction compression still matters in embedded applications.
> > >>
> > >> > Until the mid-1990s, instruction compression was of great
> > >> > practical importance even on large machines. The first two
> > >> > processors that were "conscious RISCs", made by people who knew
> > >> > what they were doing, had a compact instruction format.
> > >> The first real RISC was arguably the 801, and it had both 16-bit
> > >> and 32-bit instructions, where the 32-bit instructions had 16-bit
> > >> constants.
> > >> (It was also a 24-bit machine, which seems strange, but probably
> > >> due to IBM internal politics).
> > >>
> > >> They did not have three-register instructions, which later became
> > >> the hallmark of RISC processors.
> > >>
> > >> Did they, according to your definition, know what they were doing?
> > > Sure.
> > >>
> > >> And who was the other machine?
> > > Berkeley RISC Blue
> > Register windows have proven not to be a very good idea, finally.
> <
> Only those betting against the power of the optimizing compilers choose
> register windows (and some that had to bet with them--Itanic for example)
> <
> > >> >Then came a bunch of imitators who, for purely religious reasons,
> > >> >insisted on a fixed instruction size.
> > >> Branch range is one reason why a multiple of four for instruction
> > >> size can be useful.
> > > Yes, but with code compression you can regain half of the effective "span".
> > > The more important advantage is that instructions are aligned, but it is not worth 40-50% code expansion.
> > RISC-V certainly took that path, and they spent a large part of their
> > opcode space for 16-bit
> >
> > And this led to follow-on problems - lack of opcode space made
> > the designers choose small offsets for branches, leading to further
> > problems.
> <
> It also caused them to a) have to expand the instructions back to size
> and then b) fuse instructions together. Literature indicates 5% by
> fusing.
It is 7% for RISC-V common idioms. And for this purpose compressed instructions are very useful. In the future
it can be even more.

> In contrast, My 66000 ISA only needs 70% of the instruction
> count of RISC-V {average, 69% Geomean, 68% Harmonic Mean} over
> the 560 subroutines I have spent the time to examine in fine detail.
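
For readers unsure how the three quoted means differ: for any set of positive ratios the arithmetic mean is at least the geometric mean, which is at least the harmonic mean, so 70/69/68 is the expected ordering. A small sketch using Python's statistics module; the sample ratios are invented for illustration and are not the measured data.

```python
from statistics import geometric_mean, harmonic_mean, mean


def summarize(ratios):
    """Return (arithmetic, geometric, harmonic) means of per-routine ratios."""
    return mean(ratios), geometric_mean(ratios), harmonic_mean(ratios)


# Hypothetical instruction-count ratios (other ISA / RISC-V), one per subroutine.
sample = [0.55, 0.62, 0.70, 0.78, 0.85]
arith, geo, harm = summarize(sample)
print(f"arithmetic {arith:.3f}  geometric {geo:.3f}  harmonic {harm:.3f}")
```

The closer the three values are, the less spread there is across the individual routines.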

I think there is no point to compare RISC-V to My66000, it is in a different league.
Let's compare it to ARMv8: I don't think all these shortcomings translate into a performance loss of more than 10% in any significant metric, which means it won't prevent the success of this architecture. For me, the bigger problem is the arrogance of the organization and its bureaucratic inefficiency.

> EMBench seems to have several characteristics through which the RISC-V
> ISA illustrates its own disadvantages.
> A) many stack frames are big enough that the 12-bit displacement
> is insufficient, but that a 16-bit displacement would have been.
> B) a plethora of LUI Rt,hi(variable) followed by MEM Rd,lo(variable)[Rt]
> C) a plethora of AUIPC Rt,hi(variable).....
A lot of idioms to fuse...
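
To make items A and B concrete, here is a small sketch of the hi/lo split behind the LUI + MEM pair, and of which displacements fit in 12 versus 16 signed bits. The helper names are made up; the split follows the usual RISC-V convention that the low 12 bits are sign-extended, so the upper part has to absorb the carry.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_hi_lo(addr):
    """Split a 32-bit address into a 20-bit upper part and a signed 12-bit lower part."""
    lo = sign_extend(addr, 12)
    hi = ((addr - lo) >> 12) & 0xFFFFF
    return hi, lo


def fits(disp, bits):
    """True if disp is representable as a signed `bits`-bit displacement."""
    return -(1 << (bits - 1)) <= disp < (1 << (bits - 1))


hi, lo = split_hi_lo(0x12345FFF)
print(hex(hi), lo)               # upper part for LUI, lower part for the memory op
print(fits(4000, 12), fits(4000, 16))
```

Any frame offset between 2 KiB and 32 KiB falls in the gap where a 12-bit displacement needs an extra instruction and a 16-bit one does not.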

Re: More of my philosophy about CISC and RISC instructions..

<b209864b-39cc-4796-8a8a-57c72a491d23n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33663&group=comp.arch#33663

Newsgroups: comp.arch
Date: Wed, 16 Aug 2023 18:25:09 -0700 (PDT)
In-Reply-To: <19aaa95e-047d-48f4-a6ff-0f60fed9d054n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:85aa:2278:f81:8fa8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:85aa:2278:f81:8fa8
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <ubeclc$2gi1m$1@dont-email.me>
<36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de>
<5caa71f9-d744-461b-96f4-3fd4d2e3a108n@googlegroups.com> <ubjf63$2eqg6$2@newsreader4.netcologne.de>
<b42c7084-798d-418f-af89-0f454a296e9bn@googlegroups.com> <19aaa95e-047d-48f4-a6ff-0f60fed9d054n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b209864b-39cc-4796-8a8a-57c72a491d23n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 17 Aug 2023 01:25:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Thu, 17 Aug 2023 01:25 UTC

On Wednesday, August 16, 2023 at 7:25:06 PM UTC-5, pec...@gmail.com wrote:
> MitchAlsup wrote:
> > On Wednesday, August 16, 2023 at 4:30:46 PM UTC-5, Thomas Koenig wrote:
> > > pec...@gmail.com <pec...@gmail.com> schrieb:
> > > > Thomas Koenig wrote:
> > > >> pec...@gmail.com <pec...@gmail.com> schrieb:
> > > >> > Instruction compression still matters in embedded applications.
> > > >>
> > > >> > Until the mid-1990s, instruction compression was of great
> > > >> > practical importance even on large machines. The first two
> > > >> > processors that were "conscious RISCs", made by people who knew
> > > >> > what they were doing, had a compact instruction format.
> > > >> The first real RISC was arguably the 801, and it had both 16-bit
> > > >> and 32-bit instructions, where the 32-bit instructions had 16-bit
> > > >> constants.
> > > >> (It was also a 24-bit machine, which seems strange, but probably
> > > >> due to IBM internal politics).
> > > >>
> > > >> They did not have three-register instructions, which later became
> > > >> the hallmark of RISC processors.
> > > >>
> > > >> Did they, according to your definition, know what they were doing?
> > > > Sure.
> > > >>
> > > >> And who was the other machine?
> > > > Berkeley RISC Blue
> > > Register windows have proven not to be a very good idea, finally.
> > <
> > Only those betting against the power of the optimizing compilers choose
> > register windows (and some that had to bet with them--Itanic for example)
> > <
> > > >> >Then came a bunch of imitators who, for purely religious reasons,
> > > >> >insisted on a fixed instruction size.
> > > >> Branch range is one reason why a multiple of four for instruction
> > > >> size can be useful.
> > > > Yes, but with code compression you can regain half of the effective "span".
> > > > The more important advantage is that instructions are aligned, but it is not worth 40-50% code expansion.
> > > RISC-V certainly took that path, and they spent a large part of their
> > > opcode space for 16-bit
> > >
> > > And this led to follow-on problems - lack of opcode space made
> > > the designers choose small offsets for branches, leading to further
> > > problems.
> > <
> > It also caused them to a) have to expand the instructions back to size
> > and then b) fuse instructions together. Literature indicates 5% by
> > fusing.
> It is 7% for RISC-V common idioms. And for this purpose compressed instructions are very useful. In the future
> it can be even more.
> > In contrast, My 66000 ISA only needs 70% of the instruction
> > count of RISC-V {average, 69% Geomean, 68% Harmonic Mean} over
> > the 560 subroutines I have spent the time to examine in fine detail.
> I think there is no point to compare RISC-V to My66000, it is in a different league.
> Let's compare it to ARMv8: I don't think all these shortcomings translate into a performance loss of more than 10% in any significant metric, which means it won't prevent the success of this architecture. For me, the bigger problem is the arrogance of the organization and its bureaucratic inefficiency.
<
You cannot overcome incompetence with arrogance.
and
Leading with arrogance often implies a base of incompetence.
<
> > EMBench seems to have several characteristics through which the RISC-V
> > ISA illustrates its own disadvantages.
> > A) many stack frames are big enough that the 12-bit displacement
> > is insufficient, but that a 16-bit displacement would have been.
> > B) a plethora of LUI Rt,hi(variable) followed by MEM Rd,lo(variable)[Rt]
> > C) a plethora of AUIPC Rt,hi(variable).....
> A lot of idioms to fuse...

Re: More of my philosophy about CISC and RISC instructions..

<ubkbqa$3lt5i$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33664&group=comp.arch#33664

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Thu, 17 Aug 2023 00:39:19 -0500
Organization: A noiseless patient Spider
Lines: 572
Message-ID: <ubkbqa$3lt5i$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<ba6c5d7b-80f5-49d5-84a5-4fa92ee7f86bn@googlegroups.com>
<ubesqp$2m85c$1@dont-email.me>
<23c1700c-e477-4130-b7f4-d8559f85165an@googlegroups.com>
<ubgcn8$2tniu$1@dont-email.me>
<06e13dbd-db31-49e6-8f1d-262b0fc277b8n@googlegroups.com>
<ubgqjp$2vppn$1@dont-email.me>
<01c75f2e-0471-4f2e-88fb-f45d9020138bn@googlegroups.com>
<ubhqcc$375ak$1@dont-email.me>
<23b38342-5d35-4fb5-97fb-3cb20432d343n@googlegroups.com>
<ubjibr$3f76s$1@dont-email.me>
<dfea4b94-cd0b-4741-93e8-44ca37f09473n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 17 Aug 2023 05:39:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="708fff5e1d94865da4a2d757f8bda7bf";
logging-data="3863730"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/6tUSXXj2F5bDFG4pfqOSs"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:v/rjHkCaMjJVKO2wvdwF4OWzBA8=
Content-Language: en-US
In-Reply-To: <dfea4b94-cd0b-4741-93e8-44ca37f09473n@googlegroups.com>
 by: BGB - Thu, 17 Aug 2023 05:39 UTC

On 8/16/2023 6:41 PM, MitchAlsup wrote:
> On Wednesday, August 16, 2023 at 5:25:06 PM UTC-5, BGB wrote:
>> On 8/16/2023 1:15 PM, MitchAlsup wrote:
>>> On Wednesday, August 16, 2023 at 1:29:37 AM UTC-5, BGB wrote:
>>>> On 8/15/2023 4:53 PM, MitchAlsup wrote:
>>>>> On Tuesday, August 15, 2023 at 4:27:26 PM UTC-5, BGB-Alt wrote:
>>>>
>>>>> I have 5-bit immediates::
>>>>> <
>>>>> FADD R8,#1,R9 // R8 = 1.0D0 + R9;
>>>>> <
>>>>> I have two 6-bit immediates::
>>>>> <
>>>>> SLL R8,R9,<17:28> // R8 = ~(~0<<17) & (R9>>28)
>>>>> <
>>>>> I have 16-bit immediates:
>>>>> <
>>>>> ADD R8,R9,0x1234
>>>>> LD R8,[R9+0x1234]
>>>>> <
>>>>> And all of these fit in 1 word--as anyone from the 1st generation RISC camp would
>>>>> see (except the SPARC guys...)
>>> <
>>>> Granted, I am not really familiar with your instruction formats, since
>>>> you tend not to describe them in any detail here...
>>>>
>>>>
>>>> My usual descriptions should at least be easier to figure out, since
>>>> they tend to be effectively "notation-modified hexadecimal".
>>> <
>>> Inst<31:26> Major OpCode
>>> Inst<25:21> Rd or Condition
>>> Inst<20:16> Rs1
>>> Inst<15:11> Instruction Modifiers
>>> Inst<10:05> Minor OpCodes
>>> Inst<04:00> Rs2
>>> <
>>> When Inst<31> == 1
>>> Inst<15:0> IMM16
>>> <
>>> However when Major == 3-Operand
>>> Inst<12:10> Minor OpCode
>>> Inst<9:5> Rs3
>>> <
>> Half the encoding space is Imm16 ops?...
> <
> Just under, yes.
>>
>> But, yeah, I guess this works.
>>
>> I guess, specifying everything with bit-ranges is easier to read than, say:
>> 0zzzzz-ddddd-sssss-xxxx-yyyyyy-ttttt
> <
> There are 7 formats:: and in your form::
> <
> The Major OpCode escapes
>>
> 000110-bcond-sssss-PRED-----imm4----imm4 // predication cast
> 000111-ddddd-sssss-SHF-s-wwwwww-llllll // shift with immediates
> 001001-ddddd-sssss-MODIF-MEMORY-sssss // 2-register memory references
> 001010-ddddd-sssss-MODIF-2OPRND-sssss // 2-register calculations
> 001100-ddddd-sssss-MOD-3OP-sssss-sssss // 3-register calculations
> 001101-ddddd-sssss-MODIF-1OPERND-sssss // 1-register calculations
> <
> Then the Major OpCodes
> <
> 01100x-bonbit-sssss-displacement16
> 011010-bcond-sssss-displacement16
> 011011-hhhhh-sssss-displacement16 // Table Transfer (switch)
> 011110-displacement26 // branch
> 011111-displacement26 // CALL
> 100xxx-ddddd-sssss-immediate16 // LDs
> 101xxx-ddddd-sssss-immediate16 // STs
> 110xxx-ddddd-sssss-immediate16 // integer
> 111xxx-ddddd-sssss-immediate16 // logical

OK, doesn't seem like a whole lot of bits left for opcode in some cases
though...

Probably needs more looking at it.
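
As a rough illustration of the bit-range description quoted above, the field split can be sketched in Python. This is only a sketch under assumptions: the 3-bit width of the modifier field in the 3-operand case is inferred from the "MOD-3OP" line, the shift and predication formats are ignored, and the opcode value used for the example word is hypothetical.

```python
def bits(word, hi, lo):
    """Return the inclusive bit range word<hi:lo> as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# Major opcode taken from the "3-register calculations" line quoted above.
MAJOR_3OPERAND = 0b001100


def decode(word):
    """Split a 32-bit instruction word into the named fields."""
    fields = {
        "major": bits(word, 31, 26),
        "rd": bits(word, 25, 21),    # Rd or condition
        "rs1": bits(word, 20, 16),
    }
    if bits(word, 31, 31) == 1:
        # Inst<31> == 1: the low half is a 16-bit immediate.
        fields["imm16"] = bits(word, 15, 0)
    elif fields["major"] == MAJOR_3OPERAND:
        fields["modifiers"] = bits(word, 15, 13)  # assumed width (see above)
        fields["minor"] = bits(word, 12, 10)
        fields["rs3"] = bits(word, 9, 5)
        fields["rs2"] = bits(word, 4, 0)
    else:
        fields["modifiers"] = bits(word, 15, 11)
        fields["minor"] = bits(word, 10, 5)
        fields["rs2"] = bits(word, 4, 0)
    return fields


# Hypothetical word: major 110000 (an "integer" immediate op), Rd=8, Rs1=9.
example = (0b110000 << 26) | (8 << 21) | (9 << 16) | 0x1234
print(decode(example))
```

Every field sits at a fixed position once the major opcode is known, which is what makes the bit-range notation easy to turn into a decoder.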

As noted, I had generally smaller immediate and displacement fields to
leave more room for opcode bits and similar.

Only about 12.5% of the 32-bit instruction space had gone to Imm16 ops,
which were, as noted:
MOV Imm16u, Rn
MOV Imm16n, Rn
ADD Imm16s, Rn
LDSH Imm16u, Rn
FLDCH Imm16u, Rn

With a few more spots left over (the block was basically large enough to
encode 8 instruction spots). With XG2, this block implicitly expands to
32 spots (with the Ws/Wt bits being reserved as opcode).

So, within the F0 block, there were effectively 9 bits for opcode (for
the entire space). Though, part of this was carved off for branch ops,
and there are also 2R and 1R spaces carved off of this space (space
worth roughly 24 3R ops was carved off for the 2R spaces; which
currently allows for around 384 2R instruction spots).

The F1 block Load/Store had enough space for 32 unique spots.
Half this space was used for normal Load/Store ops (and LEA);
1/4 was used for more specialized Load/Store ops;
1/4 was used for Compare+Branch ops (like in RISC-V).

The F2 block was divided roughly in half:
Low part was used for 3RI Imm9 ops (~ 18 spots);
High part was used for "Imm10, Rn" ops (~ 224 spots).

....

Checking, currently there are roughly 391 unique mnemonics (though,
looks like around 30-40% of these are various SIMD ops and similar).

The remaining F3 and F9 blocks are both 24 bits, which if used in the
same way as the F0 block, could potentially each allow an additional
512 3R instruction spots (or 1024 spots in total). Though, my current
plan is to try to leave F3 unused (mostly to leave it for
implementation-defined instructions).
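
The spot counts above can be checked with a little arithmetic. This sketch assumes 24 bits per block (as stated) and 5-bit register fields; the helper name is made up for illustration.

```python
BLOCK_BITS = 24  # bits available inside one block such as F0, F3 or F9
REG_BITS = 5     # assumed register field width


def spots(payload_bits):
    """Number of distinct opcodes left once payload_bits are spent on operands."""
    return 1 << (BLOCK_BITS - payload_bits)


three_r_spots = spots(3 * REG_BITS)   # three register fields leave 9 opcode bits
imm16_spots = spots(16 + REG_BITS)    # one register plus a 16-bit immediate

print(three_r_spots)      # spots in one 3R block
print(2 * three_r_spots)  # F3 and F9 together
print(imm16_spots)        # spots in the Imm16 block
```

Under these assumptions a block holds 512 3R spots (1024 for two blocks), and an Imm16 block holds 8 spots, which agrees with the figures given above.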

There is a bit more space left if one counts 64-bit encodings (but, thus
far, the definitions of instructions specific to 64-bit encodings have
ended up a little ad-hoc).

....

>>
>> ...
>>>>>>
>>>>>> However, the 64-bit encodings are less desirable, since these can't
>>>>>> be organized into bundles.
>>>>>>
>>>>> Easily fixed--get rid of the concept of bundles.
>>>> Usual downsides of superscalar notwithstanding. Falling back to
>>>> scalar-only operation being similarly undesirable.
>>> <
>>> Everyone and his brother have done superscalar without bundles.
>>> Conversely all static VLIW forms have failed.
>>> <
>>> Now, what would you do if you got an FPGA with the resources to do
>>> a 4-wide or 5-wide but not a 6-wide machine ??
>> Dunno...
>>
>> A lot of DSP's and some microcontrollers and similar have gotten along
>> OK with VLIW.
>>
>>
>> Under my existing practice, it would be:
>> Define a new WEX profile for the new rules;
>> Modify compiler to allow rules for new profile;
>> Deal with annoyance of resulting compatibility issues.
>>
>>
>> But, current thinking was more like:
>> I will canonize on 3-wide;
>> 4+ wide, by that point, can probably afford superscalar...
>>
>>
>> There is little that should prevent superscalar. Since, the bundling
>> rules still require that the instruction sequence is also "sane" if the
>> instructions are executed sequentially.
>>
>>
>> As for 3-wide, at present, there doesn't seem to be enough "free ILP"
>> floating around to justify going wider. Even 3 is pushing it, but the
>> main advantage that 3-wide has over 2-wide is that it makes it easier to
>> justify a 6-port register file (which sidesteps some limitations which
>> result in my case from a 4-port register file).
>>
>> Well, and a 2-wide configuration with a 6R register file costs almost as
>> much as a 3-wide configuration.
> <
> There is a time and place for fully-resourcing a machine--the register file
> is not one of them.
> <
> A 3-wide machine which can provide 2-2-2 and 3-2-1 register operands
> to any function units is far more performing than a 2-wide machine 3-3.
> {Counting long constants} as much as 45% of instructions use a constant
> and thereby don't need the second register port, while only 10%-ish need
> all 3.
> <
> Also note: Storing of a constant becomes independent of any register dependency.
> <

My layouts are, as-is:
1-wide: 3R1W and 6R2W (128-bit operand)
2-wide: 2x 3R1W
3-wide: 3x 2R1W

But, yeah, 3-wide with a 6R3W register file looked like the local optimum.

Given:
3R1W, generally needed for 1-wide;
4R2W (2-wide), kinda sucked due to limitations.
Though, slightly more capable than 1-wide;
Can be made to support SIMD and the MOV.X instructions.
6R2W (2-wide), almost as expensive as 3-wide;
6R3W (3-wide), only slightly more cost, but more capable.

Even if infrequent, in cases where they come up, 3-wide execution is
nice to have.

One "choke point" for ILP is only having a single memory port; for a
lot of code, it seems like the only way to get more ILP would be to
support 2 memory accesses per clock cycle.

But, pulling this off effectively in the L1 D$ is a bit more of an issue
(I had looked before at a second read-only memory port, but the gains
weren't quite enough to justify the "fairly steep" cost increase).


register windows (was: More of my philosophy ...)

<2023Aug17.100603@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=33665&group=comp.arch#33665

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: register windows (was: More of my philosophy ...)
Date: Thu, 17 Aug 2023 08:06:03 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 68
Message-ID: <2023Aug17.100603@mips.complang.tuwien.ac.at>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com> <7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <ubeclc$2gi1m$1@dont-email.me> <36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de> <5caa71f9-d744-461b-96f4-3fd4d2e3a108n@googlegroups.com> <ubjf63$2eqg6$2@newsreader4.netcologne.de> <b42c7084-798d-418f-af89-0f454a296e9bn@googlegroups.com>
Injection-Info: dont-email.me; posting-host="375392cab78ae53127f103eb12c2991a";
logging-data="3914438"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18QUCY0JQh3ySWnLyAm23K5"
Cancel-Lock: sha1:xMa7NyiLMLjklsD12qJq4qrSZUM=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 17 Aug 2023 08:06 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>Only those betting against the power of the optimizing compilers choose
>register windows (and some that had to bet with them--Itanic for example)

One could say that for Berkeley RISC vs. Stanford MIPS (and the
commercial architectures that were derived from them, SPARC/29K and
MIPS).

But as you note, the IA-64 architects very much bet on optimizing
compilers, so why did they choose the register stack, and why didn't
the ARM A64 and RISC-V architects?

While in SPEC89/92/95 benchmarketing with statically-linked
link-time-optimized and profile-feedback-optimized binaries the
overhead of saving and restoring the registers at call boundaries is
small, in production settings on an in-order machine it is
significantly larger, because of more object-oriented code than in
SPEC89-95, dynamic linking (static linking costs development time and
space as well as end-user space, especially in the 1990s), no LTO
(costs too much link time), and no profile-feedback (costs too much
developer and compile time). And while the IA-64 architects thought
they could win with their architectural ideas and optimizing compilers
within functions, they obviously thought that it would be a good idea
to provide architectural support for fast calls rather than relying on
the power of benchmarketing compiler settings. They also had numbers
for the amount of loads and stores saved by the register stack.

When A64 and RISC-V were designed, the high-performance cores all used
out-of-order execution. OoO (especially sophisticated OoO with
hardware alias prediction) reduces the cost of saving and restoring
registers compared to in-order CPUs. The saves depend on the stack
pointer and the saved data, the loads just depend on the stack
pointer. The stack pointer is not updated that often on these
architectures, so it is likely available relatively early, so the
loads can be performed relatively early, and the results are therefore
also available early, meaning that the restoring had little influence
on execution time (and the saving had little influence anyway).

If a save has to wait long for its data, and the restore of that data
is relatively shortly after (the only case where the load latency
could be a problem), the advanced store-to-load-forwarding mechanisms
of modern CPUs will rename the saved value directly into the target
register of the load, without incurring the save-load roundtrip
latency. In essence, these combinations of microarchitectural
mechanisms achieve what register windows was designed to achieve, only
coming from the other side.

On A64, the store-pair and load-pair instructions also reduce the
resource usage of the load-store units by the saving and restoring by
up to 50%.

Overall, I think that register windows/stack is a valid design choice
for an architecture designed for in-order execution, but OoO has won,
and there you don't need this feature.

>It also caused them to a) have to expand the instructions back to size
>and then b) fuse instructions together. Literature indicates 5% by
>fusing. In contrast, My 66000 ISA only needs 70% of the instruction
>count of RISC-V {average, 69% Geomean, 68% Harmonic Mean} over
>the 560 subroutines I have spent the time to examine in fine detail.

It's easy to win on a metric the RISC-V architects are not optimizing
for.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: More of my philosophy about CISC and RISC instructions..

<2023Aug17.173147@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=33666&group=comp.arch#33666

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Thu, 17 Aug 2023 15:31:47 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 18
Distribution: world
Message-ID: <2023Aug17.173147@mips.complang.tuwien.ac.at>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com> <7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <ubeclc$2gi1m$1@dont-email.me> <36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de>
Injection-Info: dont-email.me; posting-host="375392cab78ae53127f103eb12c2991a";
logging-data="4019716"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX198HMbXcDe72KzTa1rm13Fh"
Cancel-Lock: sha1:Fd6qLW9YsL3ja0MMjx3JOLdJ6MU=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 17 Aug 2023 15:31 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
[IBM 801]
>(It was also a 24-bit machine, which seems strange, but probably
>due to IBM internal politics).

According to
<http://www.bitsavers.org/pdf/ibm/system801/The_801_Minicomputer_an_Overview_Sep76.pdf>,
Page 9:

|[...] why we did not go to 32 bit registers. Primarily the reason is
|that a technical case is hard to make for the additional cost.
[...]
|The CPU cost will grow from 7,600 gates to about 10,000 gates.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: More of my philosophy about CISC and RISC instructions..

<ublnl8$2npa$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=33668&group=comp.arch#33668

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Thu, 17 Aug 2023 18:07:36 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <ublnl8$2npa$1@gal.iecc.com>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de> <2023Aug17.173147@mips.complang.tuwien.ac.at>
Injection-Date: Thu, 17 Aug 2023 18:07:36 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="89898"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de> <2023Aug17.173147@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Thu, 17 Aug 2023 18:07 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>Thomas Koenig <tkoenig@netcologne.de> writes:
>[IBM 801]
>>(It was also a 24-bit machine, which seems strange, but probably
>>due to IBM internal politics).
>
>According to
><http://www.bitsavers.org/pdf/ibm/system801/The_801_Minicomputer_an_Overview_Sep76.pdf>,
>Page 9:
>
>|[...] why we did not go to 32 bit registers. Primarily the reason is
>|that a technical case is hard to make for the additional cost.
>[...]
>|The CPU cost will grow from 7,600 gates to about 10,000 gates.

The 801 project, probably more than any CPU design before or after,
didn't do anything in hardware if they could do it in the compiler,
and their PL.8 compiler was very good. That's why they didn't have
register windows or even load/store multiple. It was the first
compiler to do graph coloring and managed loads and stores well enough
that fancy register instructions weren't needed. Its descendants
compromised with the reality that not all compilers were as good as
PL.8, so the ROMP was 32 bits and had LM/STM.

The Berkeley RISC people were using the old PCC compiler which did
Sethi-Ullman numbering to get expressions to fit into the registers
available and not much else, and let you use register declarations
to tell it to put variables in the registers. No wonder they invented
register windows.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: More of my philosophy about CISC and RISC instructions..

<ublo7b$3rqu0$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33669&group=comp.arch#33669

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Thu, 17 Aug 2023 11:17:15 -0700
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <ublo7b$3rqu0$2@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<ubeclc$2gi1m$1@dont-email.me>
<36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com>
<ubj6bh$2ek2d$2@newsreader4.netcologne.de>
<2023Aug17.173147@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 17 Aug 2023 18:17:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ee069377dca074244065d837b48cc752";
logging-data="4058048"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+7VlzK7qDlcZi8ogsma8wBIByxQD5VJ2s="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:rke3axCoCHZRTZF0wR+KWYM/wlk=
Content-Language: en-US
In-Reply-To: <2023Aug17.173147@mips.complang.tuwien.ac.at>
 by: Stephen Fuld - Thu, 17 Aug 2023 18:17 UTC

On 8/17/2023 8:31 AM, Anton Ertl wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
> [IBM 801]
>> (It was also a 24-bit machine, which seems strange, but probably
>> due to IBM internal politics).
>
> According to
> <http://www.bitsavers.org/pdf/ibm/system801/The_801_Minicomputer_an_Overview_Sep76.pdf>,
> Page 9:
>
> |[...] why we did not go to 32 bit registers. Primarily the reason is
> |that a technical case is hard to make for the additional cost.
> [...]
> |The CPU cost will grow from 7,600 gates to about 10,000 gates.
>

Also, remember that the original 801 was designed for an embedded
application, specifically a telephone switch, so there was probably no
need for 32 bit registers, etc.

https://en.wikipedia.org/wiki/IBM_801#Original_concept

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: More of my philosophy about CISC and RISC instructions..

<ubn20o$5d2d$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33672&group=comp.arch#33672

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Fri, 18 Aug 2023 01:10:29 -0500
Organization: A noiseless patient Spider
Lines: 139
Message-ID: <ubn20o$5d2d$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 18 Aug 2023 06:10:32 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a344d91824e041232a6dada93d7ed5b3";
logging-data="177229"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX191I7bY29KQ3GZIc2TFS22a"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:DY5UeM2eB20AMIczuG7VjvWO1C0=
Content-Language: en-US
In-Reply-To: <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
 by: BGB - Fri, 18 Aug 2023 06:10 UTC

On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
> BGB wrote:
>>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
>>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
>>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution
>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
>>>
>> I would almost be inclined to agree, but more because the existing RVC
>> encoding scheme is *awful* (like, someone looked at Thumb and was then
>> like, "Hey man, hold my beer!").
> That's why I wrote "vendor specific".
> Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
> the only performance boosting feature in the ISA for this segment.
>
> Unfortunately they (risc nazi) managed to add compressed floating point instructions.
> The real irony is that it is the least important area. Most of the cores have no FPU at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for a middle ground.
> Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
>

Yeah.

Realistically, a few major things make sense as 16-bit ops:
MOV Reg, Reg
ADD Reg, Reg
MOV Imm, Reg
ADD Imm, Reg
A selection of basic Load/Store ops;
A few short branch encodings;
...

It makes sense to give the instructions which appear in the highest
densities the shorter encodings, and one can gloss over everything else.

Also preferably without the encoding scheme being a dog-chewed mess.
Granted, my own ISA is not entirely free of dog-chew, but both it and
RISC-V sort of have this in common.

Mine has some encoding wonk from its origins as an ISA originally with
16-bit instructions (which, ironically, has been gradually migrating
away from its 16-bit origins).

Having recently seen some of Mitch's encoding, I can admit that it is at
least "not dog chewed".

Though, it does seem to lean a little further in the direction of
immediate bits at the expense of opcode bits.

But, OTOH, there are tradeoffs here.

And, admittedly, on the other side, not many people are likely to
agree with my sentiment that 9 bits for most immediate and displacement
fields is "mostly sufficient".

Well, and my instruction listing has also gotten bigger than I would
prefer, ...

Where, as can be noted, if expressed in bits (this for the XG2 variant):
NMOp ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
NMYp ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
NMIp ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9 / 10s)
NMIp ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10 / 11s)
NYYp 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
YYYp 1p1Z iiii iiii iiii iiii iiii iiii //Imm24/Jumbo/PrWEX

Where, Z is the bits effectively used as part of the opcode.
n/m/o: Register, i=immediate, p=predicate.
M/N/O: Register (high inverted bit)
Y: Reserved for Opcode (future, must be 1 for now).

Or, for Baseline:
111p ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
111p ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
111p ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9)
111p ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10)
111p 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)

Where, as noted, the baseline encoding has 5-bit register fields.
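
A layout written in this letter-per-bit notation can be turned into a field extractor fairly mechanically. The following Python sketch does that for the Baseline 3R pattern. One loud assumption: the order in which the scattered bits of a field (for example the lone "n" next to "nnnn") combine into a value is not stated above, so the sketch simply concatenates them left to right, which may not match the real ISA.

```python
BASELINE_3R = "111p ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ"


def extract(pattern, word):
    """Match a 32-bit word against a letter-per-bit pattern.

    Literal '0'/'1' positions must match the word; every other letter
    names a field whose bits are concatenated in left-to-right order
    (an assumption, see the note above). Returns None on a mismatch.
    """
    layout = pattern.replace(" ", "")
    if len(layout) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    fields = {}
    for pos, ch in enumerate(layout):
        bit = (word >> (31 - pos)) & 1
        if ch in "01":
            if bit != int(ch):
                return None
        else:
            fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields


# Hypothetical word: fixed "111" prefix, nnnn=1010 plus the extra n bit set.
word = 0xE0000000 | (0xA << 20) | (1 << 10)
print(extract(BASELINE_3R, word))
```

Counting letters in the pattern also gives the field widths directly: 5 bits each for n, m and o, 2 for p, and 12 Z bits in total, of which 9 lie below the top byte.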

There are limits though to what is possible within a 32 bits layout.

And, I had made what tradeoffs I had made...

>
>> So, 16K or 32K appears to be a local optimum here.
> Advanced prediction definitely lowers the pressure on i-cache even further.
>

Yeah.

Predication can help to reduce the overall "branchiness" of the code:
Average trace-length gets longer;
The number of branch ops goes down;
One can save a lot of cycles with short if-expressions;
...

Some tasks that are painfully slow on more conventional processors can
see a nice speed boost:
Range-clamping expressions;
The PNG Paeth filter;
Things like range coders;
...
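
For reference, the two simplest cases in the list look like this in Python. The Paeth predictor follows the standard PNG definition; the clamp helper is just an illustration. Neither is taken from any particular compiler or from the ISA discussed here.

```python
def paeth_predictor(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left sample."""
    p = a + b - c          # initial estimate
    pa = abs(p - a)        # distance of the estimate to each neighbour
    pb = abs(p - b)
    pc = abs(p - c)
    # Two data-dependent decisions per sample; ties resolve in order a, b, c.
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def clamp(x, lo, hi):
    """Range clamp: two compares whose outcome depends on the data."""
    return min(max(x, lo), hi)


print(paeth_predictor(10, 20, 10))
print(clamp(300, 0, 255))
```

The decisions here depend on pixel data rather than on loop structure, so a branch predictor has little to learn from; expressing them as predicated or conditional-select operations avoids the mispredict cost.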

Granted, a compiler can't always know which is better, since whether or
not a given branch is predictable is not known at compile time.

....

Otherwise, I have been distracted recently.
There was my "new C compiler" sub-effort, and then going and
implementing a rasterizer module.

Also recently, working in the shop (my "day job"), where I have recently
gotten (and am setting up) a newer/fancier CNC milling machine.

A few pictures here:
https://twitter.com/cr88192/status/1692344456337375421

Admittedly, wasn't a particularly cheap machine though.
When fully assembled, it will be one of the ones with an enclosed
cabinet and a flood-cooling system (and will have an automatic tool
changer, but this part is still in the boxes). At present, still
unboxing and assembling stuff.

Lots of random stuff going on at the moment...

Re: More of my philosophy about CISC and RISC instructions..

<2023Aug18.094653@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=33675&group=comp.arch#33675

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Fri, 18 Aug 2023 07:46:53 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 61
Message-ID: <2023Aug18.094653@mips.complang.tuwien.ac.at>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de> <2023Aug17.173147@mips.complang.tuwien.ac.at> <ublnl8$2npa$1@gal.iecc.com>
Injection-Info: dont-email.me; posting-host="a7c9d4ae395aed9912cda3e3d3b97471";
logging-data="210993"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+7gZEEvyhm5A/HlBpodc3M"
Cancel-Lock: sha1:klB9zUqO2y+4TU3m/pfrlMzRfwI=
X-newsreader: xrn 10.11
 by: Anton Ertl - Fri, 18 Aug 2023 07:46 UTC

John Levine <johnl@taugh.com> writes:
>According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>Thomas Koenig <tkoenig@netcologne.de> writes:
>>[IBM 801]
>>>(It was also a 24-bit machine, which seems strange, but probably
>>>due to IBM internal politics).
>>
>>According to
>><http://www.bitsavers.org/pdf/ibm/system801/The_801_Minicomputer_an_Overview_Sep76.pdf>,
>>Page 9:
>>
>>|[...] why we did not go to 32 bit registers. Primarily the reason is
>>|that a technical case is hard to make for the additional cost.
>>[...]
>>|The CPU cost will grow from 7,600 gates to about 10,000 gates.
>
>The 801 project, probably more than any CPU design before or after,
>didn't do anything in hardware if they could do it in the compiler,
>and their PL.8 compiler was very good. That's why they didn't have
>register windows or even load/store multiple. It was the first
>compiler to do graph coloring and managed loads and stores well enough
>that fancy register instructions weren't needed. Its descendants
>compromised with the reality that not all compilers were as good as
>PL.8, so the ROMP was 32 bits and had LM/STM.

I don't think that the 24-bit architecture has anything to do with the
quality of the compiler. The 801 was a research project, and building
it as 24-bit architecture was good enough as a proof-of-concept (and
for a number of applications).

The 32-bitness of the ROMP was probably needed for the step from
research project to product. The load/store-multiple may also have to
do with that: Another poster here explained how load/store-multiple on
ARM increased the block-copy performance by close to a factor of 4,
and this may be relevant for a product like ROMP that does not have an
I-cache (unlike the MIPS R2000 typically had).

Interestingly, Power kept load/store-multiple despite having an
instruction cache already in its first incarnation. Maybe the large
number of callee-saved registers in the Power(PC) calling convention
has to do with the availability of load/store-multiple.

It's interesting that Power and MIPS, both with lots of compiler
expertise, ended up on opposite extremes of the number of callee-saved
registers in their calling conventions.

>The Berkeley RISC people were using the old PCC compiler which did
>Sethi-Ullman numbering to get expressions to fit into the registers
>available and not much else, and let you use register declarations
>to tell it to put variables in the registers. No wonder they invented
>register windows.

They certainly must have used registers also for local variables in
Pascal (and auto variables in C). If they had kept auto variables in
memory like the original PCC, there would be little point to register
windows.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: More of my philosophy about CISC and RISC instructions..

<%EMDM.147258$X02a.70096@fx46.iad>

https://news.novabbs.org/devel/article-flat.php?id=33680&group=comp.arch#33680

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx46.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: More of my philosophy about CISC and RISC instructions..
Newsgroups: comp.arch
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com> <7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com> <2fc528c1-c0d4-4f20-8ce9-5845e9b805e0n@googlegroups.com>
Lines: 19
Message-ID: <%EMDM.147258$X02a.70096@fx46.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Fri, 18 Aug 2023 16:05:15 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 18 Aug 2023 16:05:15 GMT
X-Received-Bytes: 1698
 by: Scott Lurndal - Fri, 18 Aug 2023 16:05 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Monday, August 14, 2023 at 5:45:10 AM UTC-5, pec...@gmail.com wrote:

>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
><
>RISC-V allocates 3/4 of the OpCode encoding to 16-bit stuff and gains all the
>complexity of variable length instructions but gains little of the benefits.

ARM has the Thumb32 instruction set, which I just finished a simulator for,
which reserves three of the 16-bit encodings to indicate 32-bit instructions.

It also includes the rather unusual T16 IT instruction (If-Then) which, as a form
of predication, can cover up to four subsequent T16 instructions.

It's worth noting that the IT instruction was deprecated in the Thumb
support for AArch32 in ARMv8+.
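
To put numbers on the two length-marking schemes mentioned above, here is a minimal sketch in Python. It assumes only the published length rules: RISC-V treats a halfword whose low two bits are not 0b11 as a 16-bit instruction, and Thumb-2 treats a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 as the start of a 32-bit instruction. The function names are made up for illustration, and the longer-than-32-bit RISC-V formats are ignored.

```python
def rvc_length(first_halfword: int) -> int:
    """RISC-V: low two bits 0b11 mark a 32-bit (or longer) instruction."""
    return 4 if (first_halfword & 0b11) == 0b11 else 2


def thumb2_length(first_halfword: int) -> int:
    """Thumb-2: three 5-bit prefixes in bits [15:11] mark a 32-bit instruction."""
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def fraction_16bit(length_fn) -> float:
    """Share of all 65536 first-halfword patterns that decode as 16-bit."""
    short = sum(1 for hw in range(1 << 16) if length_fn(hw) == 2)
    return short / (1 << 16)


print(fraction_16bit(rvc_length))     # 0.75
print(fraction_16bit(thumb2_length))  # 0.90625
```

Seen from the 32-bit side, this leaves RISC-V with 30 usable bits per 32-bit instruction and Thumb-2 with three 27-bit prefixes' worth, so neither scheme is free.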

Re: More of my philosophy about CISC and RISC instructions..

<d1NDM.190689$ens9.96753@fx45.iad>

https://news.novabbs.org/devel/article-flat.php?id=33682&group=comp.arch#33682

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx45.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: More of my philosophy about CISC and RISC instructions..
Newsgroups: comp.arch
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com> <ubj6bh$2ek2d$2@newsreader4.netcologne.de> <2023Aug17.173147@mips.complang.tuwien.ac.at> <ublnl8$2npa$1@gal.iecc.com> <2023Aug18.094653@mips.complang.tuwien.ac.at>
Lines: 67
Message-ID: <d1NDM.190689$ens9.96753@fx45.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Fri, 18 Aug 2023 16:31:05 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Fri, 18 Aug 2023 16:31:05 GMT
X-Received-Bytes: 4226
 by: Scott Lurndal - Fri, 18 Aug 2023 16:31 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>John Levine <johnl@taugh.com> writes:
>>According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>>Thomas Koenig <tkoenig@netcologne.de> writes:
>>>[IBM 801]
>>>>(It was also a 24-bit machine, which seems strange, but probably
>>>>due to IBM internal politics).
>>>
>>>According to
>>><http://www.bitsavers.org/pdf/ibm/system801/The_801_Minicomputer_an_Overview_Sep76.pdf>,
>>>Page 9:
>>>
>>>|[...] why we did not go to 32 bit registers. Primarily the reason is
>>>|that a technical case is hard to make for the additional cost.
>>>[...]
>>>|The CPU cost will grow from 7,600 gates to about 10,000 gates.
>>
>>The 801 project, probably more than any CPU design before or after,
>>didn't do anything in hardware if they could do it in the compiler,
>>and their PL.8 compiler was very good. That's why they didn't have
>>register windows or even load/store multiple. It was the first
>>compiler to do graph coloring and managed loads and stores well enough
>>that fancy register instructions weren't needed. Its descendants
>>compromised with the reality that not all compilers were as good as
>>PL.8, so the ROMP was 32 bits and had LM/STM.
>
>I don't think that the 24-bit architecture has anything to do with the
>quality of the compiler. The 801 was a research project, and building
>it as 24-bit architecture was good enough as a proof-of-concept (and
>for a number of applications).
>
>The 32-bitness of the ROMP was probably needed for the step from
>research project to product. The load/store-multiple may also have to
>do with that: Another poster here explained how load/store-multiple on
>ARM increased the block-copy performance by close to a factor of 4,
>and this may be relevant for a product like ROMP that does not have an
>I-cache (unlike the MIPS R2000 typically had).
>
>Interestingly, Power kept load/store-multiple despite having an
>instruction cache already in its first incarnation. Maybe the large
>number of callee-saved registers in the Power(PC) calling convention
>has to do with the availability of load/store-multiple.
>
>It's interesting that Power and MIPS, both with lots of compiler
>expertise, ended up on opposite extremes of the number of callee-saved
>registers in their calling conventions.
>
>>The Berkeley RISC people were using the old PCC compiler which did
>>Sethi-Ullman numbering to get expressions to fit into the registers
>>available and not much else, and let you use register declarations
>>to tell it to put variables in the registers. No wonder they invented
>>register windows.
>
>They certainly must have used registers also for local variables in
>Pascal (and auto variables in C). If they had kept auto variables in
>memory like the original PCC, there would be little point to register
>windows.

I had to fix a bug in the Moto 88100 version of PCC related to the
temporary register allocation back in 1990 - I don't recall if it
was a defect in the implementation of the Sethi-Ullman algorithm,
or if it just plain wasn't there in the 88100 version. My recollection
is that it was the latter case.

PCC in this case was fed the output from cfront which was prone to
generating long statements with multiple comma operators resulting
in large expression trees and failure allocating temp registers.
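
For readers who have not met it, Sethi-Ullman numbering can be sketched in a few lines. This is a simplified textbook variant written in Python, not PCC's actual code. The `Node` class and the rule that every leaf costs one register are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def su_number(n: Node) -> int:
    """Registers needed to evaluate the tree rooted at n without spilling."""
    if n.left is None and n.right is None:
        return 1                      # leaf: load it into one register
    if n.right is None:
        return su_number(n.left)      # unary op reuses its operand's register
    l, r = su_number(n.left), su_number(n.right)
    # Evaluate the needier side first; equal needs cost one extra register.
    return max(l, r) if l != r else l + 1


def leaf(name: str) -> Node:
    return Node(name)


def balanced(depth: int) -> Node:
    """Fully balanced tree of binary '+' nodes with 2**depth leaves."""
    if depth == 0:
        return leaf("x")
    return Node("+", balanced(depth - 1), balanced(depth - 1))


chain = Node("+", Node("+", Node("+", leaf("a"), leaf("b")), leaf("c")), leaf("d"))
print(su_number(chain))        # 2
print(su_number(balanced(3)))  # 4
```

A long chain stays cheap, but every extra level of a balanced tree costs one more register, which is how big generated expressions can exhaust a small pool of temporaries.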

Re: More of my philosophy about CISC and RISC instructions..

<ubo9i9$be1i$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33684&group=comp.arch#33684

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Fri, 18 Aug 2023 12:25:27 -0500
Organization: A noiseless patient Spider
Lines: 199
Message-ID: <ubo9i9$be1i$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<ubeclc$2gi1m$1@dont-email.me>
<36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com>
<793a9545-066b-4819-9bd1-ba0453f92955n@googlegroups.com>
<ubjc4h$3e8i6$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 18 Aug 2023 17:25:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a344d91824e041232a6dada93d7ed5b3";
logging-data="374834"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ev8YqGn6Ci60AdtKVKMCF"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:inArSgYCQDXOrF1XJgSba2exy+8=
Content-Language: en-US
In-Reply-To: <ubjc4h$3e8i6$1@dont-email.me>
 by: BGB - Fri, 18 Aug 2023 17:25 UTC

On 8/16/2023 3:38 PM, Stephen Fuld wrote:
> On 8/16/2023 1:19 PM, Timothy McCaffrey wrote:
>> On Wednesday, August 16, 2023 at 2:18:25 PM UTC-4, pec...@gmail.com
>> wrote:
>>> Brett wrote:
>>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>>> 32K is 25% bigger than 24K but only 1.1% faster, and likely burns more
>>>>> than 1.1% more power.
>>>>> <
>>>>> Comparing 64K 4-way to 48K 6-way:: 64K is only 0.7% faster; with 1M
>>>>> L2 only 0.4% faster.
>>>> This is the killer argument that would have saved me from caring
>>>> about 16
>>>> bit opcodes.
>>>> Only toy CPU’s can care about 16 bit opcodes.
>>> Instruction compression still matters in embedded applications.
>>
>> Given a variable length instruction set, it seems to me it makes sense
>> to encode the most used
>> instructions into small instructions, if possible.
>
>
> Sure.
>
>
>
>  I believe I have read that the most used
>> instructions are load, compare, add and branch.  The rest are in the
>> single digits percentage wise.
>> (I wish I had a reference, so take the above with a rock sized grain
>> of salt).
>
> I think that is at least approximately right.
>
>
>>  Anyway, if
>> you could encode those instructions into a 16 bit word, and leave the
>> longer instructions
>> for all the useful but not used that much remainder, wouldn't that
>> basically "compress"
>> your instruction set (even if variants of the longer instructions
>> "overlapped" the short instructions,
>> it would probably still be a win).
>
>
> Yes, but . . .  For loads, you would be limited to a very short
> displacement, limiting their usefulness.  You almost certainly wouldn't
> use three register specifiers, which limits adds to A=A+B, which isn't
> terrible, but an annoyance.  Having a small immediate field probably
> isn't much of a problem, as I think many constant adds are of a small
> number.  Branches are probably OK with a smaller displacement, as I
> suspect a lot of branches are to quite close.  With compare, are you
> proposing using condition codes?  Otherwise you have the three register
> specifier problem - eccch.
>

Yeah, you aren't getting much over a 3 or 4 bit displacement.

In my case, I had:
MOV.x (Rm), Rn //Load, no offset
MOV.x Rn, (Rm) //Store, no offset
MOV.x (Rm, R0), Rn //Load, R0 is offset
MOV.x Rn, (Rm, R0) //Store, R0 is offset

MOV.{L/Q} (SP, Disp4), Rn //SP+Disp4*(4|8)
MOV.{L/Q} Rn, (SP, Disp4) //SP+Disp4*(4|8)

MOV.L (Rm, Disp3), Rn //Rm+Disp3*4
MOV.L Rn, (Rm, Disp3) //Rm+Disp3*4
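
As a quick check on how far these scaled displacement forms reach, here is a small Python sketch. It assumes the Disp fields are unsigned, which the listing above does not state. The `reach` helper is made up for the illustration.

```python
def reach(bits: int, scale: int) -> range:
    """Byte offsets reachable by an unsigned `bits`-wide field scaled by `scale`."""
    return range(0, (1 << bits) * scale, scale)


forms = [
    ("MOV.L (SP, Disp4)", 4, 4),
    ("MOV.Q (SP, Disp4)", 4, 8),
    ("MOV.L (Rm, Disp3)", 3, 4),
]

for name, bits, scale in forms:
    r = reach(bits, scale)
    print(f"{name}: {len(r)} slots, offsets {r[0]}..{r[-1]} bytes")
```

Under that assumption the SP-relative forms cover 16 stack slots (up to 60 or 120 bytes) and the Rm-relative form covers the first 8 words of a structure.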

In my case, the ISA uses an "SR.T" flag bit as a general-purpose
true/false flag, which at least has fewer issues than condition codes.
Still, less than ideal for purists.

A few ops were defined for the range of R0..R31:
MOV Rm,Rn
MOV Imm4,Rn
MOV.{L/Q} (SP, Disp4), Rn
...

Nearly everything else in the 16-bit part of the ISA is limited to
R0..R15. This part of the ISA does not have any access to R32..R63.

Unlike some other "compressed" ISAs, I didn't even really bother with
3-register ALU ops.

Partly this was based on noting that limiting them to 8 registers, even
if selecting the most commonly used registers, would still leave them
"borderline useless". Well, at least short of the compiler aggressively
trying to keep variables in these particular registers (likely
increasing spill rate as a result).

> I think these considerations reduce (but probably don't eliminate) the
> percentage of time the 16 bit instructions would be useful.
>

Early on, it was closer to 60% 16-bit, 40% 32-bit...

But, as things "evolved" it has drifted closer to 15% 16-bit, 85% 32-bit
(speed optimized code), and 35% 16-bit, 65% 32-bit, for size-optimized code.

Reasons are "various".

Partly it is that, generating "denser" / "more efficient" code has
(ironically enough) reduced many of the situations where the existing
16-bit ops were useful.

Note, for 16-bit percentage vs purely 32-bit:
60%: 70% original size.
35%: 82% original size.
15%: 93% original size.
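
These figures follow from simple arithmetic if the percentage is read as the share of instructions (by count) that are 16-bit, with everything else being 32-bit. That reading is an assumption, but it reproduces the numbers above to within rounding. A small Python sketch:

```python
def relative_size(frac16: float) -> float:
    """Code size relative to an all-32-bit encoding.

    frac16 is the fraction of instructions (by count) encoded in 16 bits;
    each of those takes half the space of a 32-bit instruction.
    """
    return frac16 * 0.5 + (1.0 - frac16) * 1.0


for pct in (60, 35, 15):
    print(f"{pct}% 16-bit -> {relative_size(pct / 100) * 100:.1f}% of original size")
```

So each percentage point of 16-bit usage buys only half a point of code size, which is why the drift from 60% down to 15% costs about 22 points of density.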

Any performance advantage is mostly negligible with a 32K L1 I$.

Main area it does have an effect is mostly related to how much code I
can fit into a 32K Boot ROM.

Generated ".text" size is generally smaller than x86-64, A64, and SH-4.

Though, it does not quite match i386 or Thumb (there tends to be a
fairly large code-size delta between i386 and x86-64, *).
But, it appears "competitive".

If compiling equivalent code, I seem to be getting smaller code than for
RISC-V.

*: If "hand-compiling" code, x86-64 would be more size competitive, but
it seems like modern compilers (particularly MSVC) tend towards "bulky"
code generation (beyond what could be attributed to the presence or
absence of a REX prefix on each instruction).

Though, one thing that can throw off measurements for "naive comparison"
is that in my case, I also tend to effectively link the whole OS kernel
to the binaries in many cases (which adds a fair bit of code-size
overhead; but in the emulator, allows running them without also needing
to use the shell).

Well, naive file-size comparisons also can't really be used, given
BGBCC tends to produce LZ4 compressed binaries by default.

But, yeah, otherwise I could be like "Hey, check it out, my binaries are
smaller than i386 binaries" (even when comparing a binary with a
static-linked kernel with a shared-object libc...).

....

Though, in terms of practical tradeoffs, who "wins" in terms of
code-density likely doesn't matter that much, and as long as one isn't
"massively worse", it is probably OK.

So, say, ".text" size for Doom:
  BJX2: ~ 500K (with the kernel linked in, ...)
    Probably fine...
  MSVC (x64): ~ 1400K
    What exactly is going on here?...
  GCC (x86-64) + SDL (Linux, shared objects): ~ 800K
  Original DOS version: Also ~ 500K-ish
    (EXE: ~700K, ~200K initialized data + strings)
  Thumb2 (with dynamic C library): ~ 360K
  RISC-V (RV64IMA, basic C library only, static-linked): ~ 700K
  ...

Meanwhile:
  Doom in 7..9MB: "Oh dear, what is this crap?!"
    If it looks anything like this, there is a problem...

Not everything is equivalent between programs.

Have noted that despite similar ".text" sizes, both i386 and Thumb code
does seem to LZ compress better than BJX2 code, so something is going on
here as well. Both i386 and Thumb seem to be readily compressible.

In general though, LZ4 seems to be reasonably well suited to binaries
(my RP2 format does better for general data compression, but compiler
output across several ISAs seems to favor LZ4 for whatever reason).

....

Re: More of my philosophy about CISC and RISC instructions..

<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33685&group=comp.arch#33685

Newsgroups: comp.arch
X-Received: by 2002:ad4:4f4e:0:b0:63c:fd45:7d69 with SMTP id eu14-20020ad44f4e000000b0063cfd457d69mr81244qvb.2.1692381167731;
Fri, 18 Aug 2023 10:52:47 -0700 (PDT)
X-Received: by 2002:a17:902:ea11:b0:1bd:da96:dc74 with SMTP id
s17-20020a170902ea1100b001bdda96dc74mr1322520plg.6.1692381167317; Fri, 18 Aug
2023 10:52:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.1d4.us!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 18 Aug 2023 10:52:46 -0700 (PDT)
In-Reply-To: <ubn20o$5d2d$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5de9:58df:ef8d:b257;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5de9:58df:ef8d:b257
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 18 Aug 2023 17:52:47 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7599
 by: MitchAlsup - Fri, 18 Aug 2023 17:52 UTC

On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
> On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
> > BGB wrote:
> >>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
> >>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
> >>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution
> >>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
> >>>
> >> I would almost be inclined to agree, but more because the existing RVC
> >> encoding scheme is *awful* (like, someone looked at Thumb and was then
> >> like, "Hey man, hold my beer!").
> > That's why I wrote "vendor specific".
> > Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
> > the only performance boosting feature in the ISA for this segment.
> >
> > Unfortunately they (risc nazi) managed to add compressed floating point instructions.
> > The real irony is that it is the least important area. Most of the cores have no fpu at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for middle ground.
> > Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
> >
> Yeah.
>
> Realistically, a few major things make sense as 16-bit ops:
> MOV Reg, Reg
> ADD Reg, Reg
> MOV Imm, Reg
> ADD Imm, Reg
> A selection of basic Load/Store ops;
> A few short branch encodings;
> ...
>
> It makes sense to give the instructions which appear in the highest
> densities the shorter encodings, and one can gloss over everything else.
>
>
> Also preferably without the encoding scheme being a dog-chewed mess.
> Granted, my own ISA is not entirely free of dog-chew, but both it and
> RISC-V sort of have this in common.
>
> Mine has some encoding wonk from its origins as an ISA originally with
> 16-bit instructions (which, ironically, has been gradually migrating
> away from its 16-bit origins).
>
>
>
> Having recently seen some of Mitch's encoding, I can admit that it is at
> least "not dog chewed".
<
This is a consequence of me having done a moderately dog-chewed ISA
in 1983, worked on SPARC for 9 years, then over in x86-64 for 7 years
then having done a GPU ISA, and then retired from working for corporations.
<
What you see is an attempt to combine the best features of RISC with the
best features of CISC (and there are some--much to the chagrin of the
puritans) into a cohesive and mostly orthogonal ISA.
>
> Though, it does seem to lean a little further in the direction of
> immediate bits at the expense of opcode bits.
<
Because it was here that pure RISC ISAs waste so many instructions on
pasting bits together only to use them once as operands. So by inventing
universal constants all of these bit pasting instructions vanish from the
instruction stream.
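
As an illustration of the bit pasting in question, here is a Python sketch of the usual two-instruction split a fixed 32-bit RISC needs for a 32-bit constant, using the RISC-V LUI/ADDI pair as the example (upper 20 bits, then a sign-extended 12-bit add). The helper names are made up for the illustration.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split_hi_lo(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (hi20, lo12) for a LUI + ADDI pair.

    ADDI sign-extends its 12-bit immediate, so when the low part has
    bit 11 set the upper part must be bumped by one to compensate.
    """
    value &= MASK32
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000                 # what ADDI will actually add
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """What the LUI + ADDI pair leaves in the low 32 bits of the register."""
    return ((hi << 12) + lo) & MASK32


hi, lo = split_hi_lo(0x12345FFF)
print(hex(hi), lo)                   # 0x12346 -1
```

Two instructions and a compensation rule for one 32-bit operand, and more for 64 bits; a constant carried in the instruction stream needs neither.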
>
>
> But, OTOH, there are tradeoffs here.
>
>
>
> And, admittedly, on the other side, not as many people are as likely to
> agree to my sentiment that 9-bits for more immediate and displacement
> fields is "mostly sufficient".
<
I agree it is "mostly sufficient", but wouldn't you rather have "almost entirely
sufficient" instead of "mostly sufficient" ?? i.e., 16-bits
>
> Well, and my instruction listing has also gotten bigger than I would
> prefer, ...
>
> Where, as can be noted, if expressed in bits (this for the XG2 variant):
> NMOp ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
> NMYp ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
> NMIp ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9 / 10s)
> NMIp ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10 / 11s)
> NYYp 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
> YYYp 1p1Z iiii iiii iiii iiii iiii iiii //Imm24/Jumbo/PrWEX
>
> Where, Z is the bits effectively used as part of the opcode.
> n/m/o: Register, i=immediate, p=predicate.
> M/N/O: Register (high inverted bit)
> Y: Reserved for Opcode (future, must be 1 for now).
>
> Or, for Baseline:
> 111p ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
> 111p ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
> 111p ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9)
> 111p ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10)
> 111p 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
>
> Where, as noted, the baseline encoding has 5-bit register fields.
>
>
> There are limits though to what is possible within a 32 bits layout.
<
I am on record that the ideal instruction size is 34-36-bits.
>
> And, I had made what tradeoffs I had made...
> >
> >> So, 16K or 32K appears to be a local optimum here.
> > Advanced prediction definitely lowers the pressure on i-cache even further.
> >
> Yeah.
>
>
> Predication can help to reduce the overall "branchiness" of the code:
> Average trace-length gets longer;
> The number of branch ops goes down;
> One can save a lot of cycles with short if-expressions;
> ...
>
> Some tasks that are painfully slow on more conventional processors can
> see a nice speed boost:
> Range-clamping expressions;
> The PNG Paeth filter;
> Things like range coders;
> ...
>
> Granted, a compiler can't always know which is better, since knowledge
> about whether or not a given branch is predictable is not known at
> compile time.
>
It often changes from predictable to unpredictable and back based on the
data being processed by the application.

Re: More of my philosophy about CISC and RISC instructions..

<ubomup$1fq1$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=33689&group=comp.arch#33689

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Fri, 18 Aug 2023 21:14:01 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <ubomup$1fq1$1@gal.iecc.com>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <2023Aug17.173147@mips.complang.tuwien.ac.at> <ublnl8$2npa$1@gal.iecc.com> <2023Aug18.094653@mips.complang.tuwien.ac.at>
Injection-Date: Fri, 18 Aug 2023 21:14:01 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="48961"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <2023Aug17.173147@mips.complang.tuwien.ac.at> <ublnl8$2npa$1@gal.iecc.com> <2023Aug18.094653@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Fri, 18 Aug 2023 21:14 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>>According to
>>><http://www.bitsavers.org/pdf/ibm/system801/The_801_Minicomputer_an_Overview_Sep76.pdf>,
>>>Page 9:
>>>
>>>|[...] why we did not go to 32 bit registers. Primarily the reason is
>>>|that a technical case is hard to make for the additional cost.
>>>[...]
>>>|The CPU cost will grow from 7,600 gates to about 10,000 gates.

>I don't think that the 24-bit architecture has anything to do with the
>quality of the compiler. The 801 was a research project, and building
>it as 24-bit architecture was good enough as a proof-of-concept (and
>for a number of applications).

We don't have to guess. They explain the rationale on page 9 of that
paper. In an era when addresses were 24 bits or shorter, there wasn't
anything that was significantly easier to do in 32 bits than 24, given
the excellent optimization their compiler did.

>The 32-bitness of the ROMP was probably needed for the step from
>research project to product. The load/store-multiple may also have to
>do with that: Another poster here explained how load/store-multiple on
>ARM increased the block-copy performance by close to a factor of 4,

Having written a fair amount of the kernel design and toolchain for
ROMP AIX, I hope I have some insights here.

I think the main reason is that by the late 1970s it was apparent that
24 bits of address wasn't enough, and registers definitely had to be
big enough to contain an address. The 801 depended on the compiler to
enforce code safety and that wasn't going to work when the code wasn't
all written in PL.8. (There were some efforts to put a C front end on
the compiler but they gave up when I explained how thoroughly confused
C's pointers and arrays are.) They added an MMU which I think was
the first reverse mapped one, with TLB misses trapping and handled
by software.

Flushing the entire MMU on a context switch would have been horrible
so they came up with a hack which I think was carried over into POWER.
Virtual addresses were 40 bits, with the high 12 bits considered the
segment number and the low 28 the address. They had a 16 entry fast
RAM of 12 bit segment numbers, so the high 4 bits of each virtual
address were mapped to a 12 bit segment number, and the 40 bit address
looked up in the MMU. Each process saw 16 segments of up to 256M, and
context switches just had to reload the 16 entry RAM. This also made
it easy to share segments. My recollection is that in AIX, segment 0
was the VM kernel, segment 1 was the Unix kernel, segments 2 and 3
were the program's code and data segments and I think we had another
segment for a large static shared C library.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: More of my philosophy about CISC and RISC instructions..

<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33691&group=comp.arch#33691

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:14e:b0:403:a627:8b79 with SMTP id v14-20020a05622a014e00b00403a6278b79mr5597qtw.13.1692406214829;
Fri, 18 Aug 2023 17:50:14 -0700 (PDT)
X-Received: by 2002:a05:6a00:b4e:b0:687:94c2:106 with SMTP id
p14-20020a056a000b4e00b0068794c20106mr450436pfo.5.1692406214323; Fri, 18 Aug
2023 17:50:14 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 18 Aug 2023 17:50:13 -0700 (PDT)
In-Reply-To: <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.14.162; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.14.162
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: jim.brakefield@ieee.org (JimBrakefield)
Injection-Date: Sat, 19 Aug 2023 00:50:14 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 8589
 by: JimBrakefield - Sat, 19 Aug 2023 00:50 UTC

On Friday, August 18, 2023 at 12:52:49 PM UTC-5, MitchAlsup wrote:
> On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
> > On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
> > > BGB wrote:
> > >>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
> > >>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
> > >>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution
> > >>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
> > >>>
> > >> I would almost be inclined to agree, but more because the existing RVC
> > >> encoding scheme is *awful* (like, someone looked at Thumb and was then
> > >> like, "Hey man, hold my beer!").
> > > That's why I wrote "vendor specific".
> > > Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
> > > the only performance boosting feature in the ISA for this segment.
> > >
> > > Unfortunately they (risc nazi) managed to add compressed floating point instructions.
> > > The real irony is that it is the least important area. Most of the cores have no fpu at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for middle ground.
> > > Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
> > >
> > Yeah.
> >
> > Realistically, a few major things make sense as 16-bit ops:
> > MOV Reg, Reg
> > ADD Reg, Reg
> > MOV Imm, Reg
> > ADD Imm, Reg
> > A selection of basic Load/Store ops;
> > A few short branch encodings;
> > ...
> >
> > It makes sense to give the instructions which appear in the highest
> > densities the shorter encodings, and one can gloss over everything else.
> >
> >
> > Also preferably without the encoding scheme being a dog-chewed mess.
> > Granted, my own ISA is not entirely free of dog-chew, but both it and
> > RISC-V sort of have this in common.
> >
> > Mine has some encoding wonk from its origins as an ISA originally with
> > 16-bit instructions (which, ironically, has been gradually migrating
> > away from its 16-bit origins).
> >
> >
> >
> > Having recently seen some of Mitch's encoding, I can admit that it is at
> > least "not dog chewed".
> <
> This is a consequence of me having done a moderately dog-chewed ISA
> in 1983, worked on SPARC for 9 years, then over in x86-64 for 7 years
> then having done a GPU ISA, and then retired from working for corporations.
> <
> What you see is an attempt to combine the best features of RISC with the
> best features of CISC (and there are some--much to the chagrin of the
> puritans) into a cohesive and mostly orthogonal ISA.
> >
> > Though, it does seem to lean a little further in the direction of
> > immediate bits at the expense of opcode bits.
> <
> Because it was here that pure RISC ISAs waste so many instructions on
> pasting bits together only to use them once as operands. So by inventing
> universal constants all of these bit pasting instructions vanish from the
> instruction stream.
> >
> >
> > But, OTOH, there are tradeoffs here.
> >
> >
> >
> > And, admittedly, on the other side, not as many people are as likely to
> > agree to my sentiment that 9-bits for more immediate and displacement
> > fields is "mostly sufficient".
> <
> I agree it is "mostly sufficient", but wouldn't you rather have "almost entirely
> sufficient" instead of "mostly sufficient" ?? i.e., 16-bits
> >
> > Well, and my instruction listing has also gotten bigger than I would
> > prefer, ...
> >
> > Where, as can be noted, if expressed in bits (this for the XG2 variant):
> > NMOp ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
> > NMYp ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
> > NMIp ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9 / 10s)
> > NMIp ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10 / 11s)
> > NYYp 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
> > YYYp 1p1Z iiii iiii iiii iiii iiii iiii //Imm24/Jumbo/PrWEX
> >
> > Where, Z is the bits effectively used as part of the opcode.
> > n/m/o: Register, i=immediate, p=predicate.
> > M/N/O: Register (high inverted bit)
> > Y: Reserved for Opcode (future, must be 1 for now).
> >
> > Or, for Baseline:
> > 111p ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
> > 111p ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
> > 111p ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9)
> > 111p ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10)
> > 111p 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
> >
> > Where, as noted, the baseline encoding has 5-bit register fields.
> >
> >
> > There are limits though to what is possible within a 32 bits layout.
> <
> I am on record that the ideal instruction size is 34-36-bits.
> >
> > And, I had made what tradeoffs I had made...
> > >
> > >> So, 16K or 32K appears to be a local optimum here.
> > > Advanced prediction definitely lowers the pressure on i-cache even further.
> > >
> > Yeah.
> >
> >
> > Predication can help to reduce the overall "branchiness" of the code:
> > Average trace-length gets longer;
> > The number of branch ops goes down;
> > One can save a lot of cycles with short if-expressions;
> > ...
> >
> > Some tasks that are painfully slow on more conventional processors can
> > see a nice speed boost:
> > Range-clamping expressions;
> > The PNG Paeth filter;
> > Things like range coders;
> > ...
> >
> > Granted, a compiler can't always know which is better, since knowledge
> > about whether or not a given branch is predictable is not known at
> > compile time.
> >
> It often changes from predictable and back based on the data being processed
> by the application.
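
As a rough illustration of the short if-expression cases quoted above (range clamping and the PNG Paeth filter), here is a minimal sketch of what a select-style, branch-free formulation looks like. It is written in Python only to show the data flow; `select`, `clamp` and `paeth` are hypothetical helper names, not anything from BGB's or Mitch's ISAs, and a real compiler would emit predicated or conditional-move instructions instead of the mask arithmetic used here.

```python
def select(cond, a, b):
    """Return a if cond is true, else b, using a mask instead of a branch."""
    mask = -int(bool(cond))          # -1 (all ones) when true, 0 when false
    return (a & mask) | (b & ~mask)


def clamp(x, lo, hi):
    """Range-clamp x into [lo, hi] with two selects and no control flow."""
    x = select(x < lo, lo, x)
    x = select(x > hi, hi, x)
    return x


def paeth(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    b_or_c = select(pb <= pc, b, c)
    return select((pa <= pb) & (pa <= pc), a, b_or_c)
```

Both outcomes of each comparison are computed and one is kept, which is the trade predication makes: more work per element in exchange for no branch to mispredict.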

Ugh, possibilities:
|>I am on record that the ideal instruction size is 34-36-bits.

Seven 9-bit "bytes" will fit into 64-bits. So one could do one 27-bit instruction and one 36-bit instruction in 64-bits.
Given some flexibility in their placement, and three configurations, one can have: 32-32, 27-36 and 36-27, less the configuration bit(s).
So there is a way, if one will go into uncharted territory?
And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
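
To make the packing idea concrete, here is a minimal sketch under assumptions that are mine, not settled anywhere in the thread: a 2-bit configuration tag sits in the top two bits of the 64-bit word, and the tag bits are charged to the slots, so the three layouts become 31-31, 26-36 and 36-26. The names `LAYOUTS`, `pack_bundle` and `unpack_bundle` are hypothetical.

```python
# Assumed bundle layout: bits 63..62 = configuration tag,
# bits 61..0 = two instruction slots whose widths depend on the tag.
TAG_BITS = 2
PAYLOAD_BITS = 64 - TAG_BITS  # 62 bits left for the two instructions

# tag -> (width of first slot, width of second slot); each pair sums to 62
LAYOUTS = {0: (31, 31), 1: (26, 36), 2: (36, 26)}


def pack_bundle(tag, first, second):
    """Pack two instruction words into one 64-bit bundle."""
    w1, w2 = LAYOUTS[tag]
    if first >> w1 or second >> w2:
        raise ValueError("instruction does not fit its slot")
    return (tag << PAYLOAD_BITS) | (first << w2) | second


def unpack_bundle(bundle):
    """Split a 64-bit bundle back into (tag, first, second)."""
    tag = bundle >> PAYLOAD_BITS
    w1, w2 = LAYOUTS[tag]
    first = (bundle >> w2) & ((1 << w1) - 1)
    second = bundle & ((1 << w2) - 1)
    return tag, first, second
```

Usage would look like `unpack_bundle(pack_bundle(1, short_insn, long_insn))`; the fourth tag value is left unassigned in this sketch.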

Re: More of my philosophy about CISC and RISC instructions..

<47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33692&group=comp.arch#33692

Newsgroups: comp.arch
Date: Fri, 18 Aug 2023 18:10:36 -0700 (PDT)
In-Reply-To: <8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5de9:58df:ef8d:b257;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5de9:58df:ef8d:b257
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 19 Aug 2023 01:10:37 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Sat, 19 Aug 2023 01:10 UTC

On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
> Ugh, possibilities:
> |>I am on record that the ideal instruction size is 34-36-bits.
> Seven 9-bit "bytes" will fit into 64-bits. So one could do one 27-bit instruction and one 36-bit instruction in 64-bits.
> Given some flexibility in their placement, and three configurations, one can have: 32-32, 27-36 and 36-27, less the configuration bit(s).
<
Insightful--thanks
<
> So there is a way, if one will go into uncharted territory?
<
At this point:: Might as well.
<
> And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
<
My guess (1st order) is that "enough" will fit, compared to the times one needs 36-bits for a big instruction.
{This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
<
But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
Or between the 36-bit instruction and the 27-bit instruction ??
{{And a few other questions along the same lines}}

Re: More of my philosophy about CISC and RISC instructions..

<cfbc986d-75fc-494a-bdb5-ccec52b17d8an@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33693&group=comp.arch#33693

Newsgroups: comp.arch
Date: Fri, 18 Aug 2023 19:01:50 -0700 (PDT)
In-Reply-To: <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=136.50.14.162; posting-account=AoizIQoAAADa7kQDpB0DAj2jwddxXUgl
NNTP-Posting-Host: 136.50.14.162
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cfbc986d-75fc-494a-bdb5-ccec52b17d8an@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: jim.brakefield@ieee.org (JimBrakefield)
Injection-Date: Sat, 19 Aug 2023 02:01:52 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: JimBrakefield - Sat, 19 Aug 2023 02:01 UTC

On Friday, August 18, 2023 at 8:10:38 PM UTC-5, MitchAlsup wrote:
> On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
> > Ugh, possibilities:
> > |>I am on record that the ideal instruction size is 34-36-bits.
> > Seven 9-bit "bytes" will fit into 64-bits. So one could do one 27-bit instruction and one 36-bit instruction in 64-bits.
> > Given some flexibility in their placement, and three configurations, one can have: 32-32, 27-36 and 36-27, less the configuration bit(s).
> <
> Insightful--thanks
> <
> > So there is a way, if one will go into uncharted territory?
> <
> At this point:: Might as well.
> <
> > And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
> <
> My guess (1st order) is that "enough" will fit, compared to the times one needs 36-bits for a big instruction.
> {This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
> <
> But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
> Or between the 36-bit instruction and the 27-bit instruction ??
> {{And a few other questions along the same lines}}


Re: More of my philosophy about CISC and RISC instructions..

<8m4EM.686037$TPw2.506418@fx17.iad>


https://news.novabbs.org/devel/article-flat.php?id=33697&group=comp.arch#33697

X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: More of my philosophy about CISC and RISC instructions..
Newsgroups: comp.arch
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com> <ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com> <ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com> <8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
Lines: 18
Message-ID: <8m4EM.686037$TPw2.506418@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 19 Aug 2023 14:30:28 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 19 Aug 2023 14:30:28 GMT
X-Received-Bytes: 1829
 by: Scott Lurndal - Sat, 19 Aug 2023 14:30 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
>> And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
><
>My guess (1st order) is that "enough" will fit, compared to the times one needs 36-bits for a big instruction.
>{This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
><
>But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
>Or between the 36-bit instruction and the 27-bit instruction ??

Add a bit to the PC to record which part is next? Use something
like the PDP-8 link register? Record it in the processor status
register (e.g. like ARM Thumb IT instruction state)?
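
The first option can be sketched as follows, assuming 64-bit bundles that are 8-byte aligned, so the low three address bits are otherwise zero and bit 0 can carry the slot number. This is only an illustration of the bookkeeping; `make_pc`, `split_pc` and `next_pc` are hypothetical names, and the saved trap PC is simply the same combined value.

```python
BUNDLE_BYTES = 8  # one 64-bit bundle, assumed 8-byte aligned


def make_pc(bundle_addr, slot):
    """Combine a bundle address and a slot number (0 or 1) into one PC value."""
    assert bundle_addr % BUNDLE_BYTES == 0 and slot in (0, 1)
    return bundle_addr | slot


def split_pc(pc):
    """Recover (bundle address, slot) from a combined PC value."""
    return pc & ~(BUNDLE_BYTES - 1), pc & 1


def next_pc(pc):
    """Advance to the next instruction: second slot, then the next bundle."""
    addr, slot = split_pc(pc)
    if slot == 0:
        return make_pc(addr, 1)
    return make_pc(addr + BUNDLE_BYTES, 0)
```

A trap taken between the two halves saves a PC with bit 0 set, and return-from-trap restores that value unchanged, so execution resumes at the second instruction rather than re-running the first.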

