

On 8/19/2023 9:30 AM, Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
>
>>> And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
>> <
>> My guess (1st order) is "enough" will compared to the times one needs 36-bits for a big instruction.
>> {This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
>> <
>> But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
>> Or between the 36-bit instruction and the 27-bit instruction ??
>
> Add a bit to the PC to record which part is next? Use something
> like the PDP-8 link register? Record it in the processor status
> register (e.g. like ARM Thumb IT instruction state)?

Hmm, what about an ISA where instructions are mostly a prime number of
bytes:
2, 3, 5, 7, 11.

xxxx-xxxx xxxx-xxx0
xxxx-xxxx xxxx-xxxx xxxx-xx01
xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-x011
...
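A minimal sketch of the length rule, assuming instructions are stored little-endian so the marker bits written at the right-hand end of each pattern sit in the first byte fetched. Only the three prefixes shown above are decoded; the 7- and 11-byte prefixes are left as "..." in the post, so the hypothetical `insn_length` reports them as unknown (0) rather than guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length decoder for the prime-byte-length ISA sketched above.
 * Assumes little-endian storage, so the length-marker bits live in the
 * first byte fetched. Returns the length in bytes, or 0 for a prefix the
 * post leaves unspecified. */
static size_t insn_length(uint8_t first_byte)
{
    if ((first_byte & 0x1) == 0x0) return 2;  /* ...xxx0 -> 16-bit op */
    if ((first_byte & 0x3) == 0x1) return 3;  /* ...xx01 -> 24-bit op */
    if ((first_byte & 0x7) == 0x3) return 5;  /* ...x011 -> 40-bit op */
    return 0;                                 /* ...x111 -> not specified here */
}

/* Walk a buffer and count whole instructions; stops at the first
 * unspecified prefix or at a truncated instruction. */
static size_t count_insns(const uint8_t *code, size_t len)
{
    assert(code != NULL || len == 0);
    size_t pc = 0, n = 0;
    while (pc < len) {
        size_t sz = insn_length(code[pc]);
        if (sz == 0 || pc + sz > len) break;
        pc += sz;  /* next instruction starts at pc + length, in bytes */
        n++;
    }
    return n;
}
```

For the formats shown, the length falls out of at most three bits of the first byte, so instruction boundaries can be found without inspecting the rest of the instruction.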

Then, say:
16 bit ops have 2 4-bit register fields.
24 bit ops have 3 5-bit register fields.
40 bit ops have 3 6-bit register fields.

zzzz-ssss nnnn-zzz0
tttt-tsss sszn-nnnn zzzz-zz01
zzzz-zztt tttt-zsss ssnn-nnnn zzzz-zzzz ppzz-z011

The 16-bit ops would mostly hold a collection of 2R ops.

The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
iiii-isss ss0n-nnnn zzz0-1001 //ALU Rs, Imm5u, Rn
tttt-tsss ss1n-nnnn zzz0-1001 //ALU Rs, Rt, Rn
tttt-tsss ss0n-nnnn zzz1-1001 //Misc (3R)
zzzz-zsss ss1n-nnnn zzz1-1001 //Misc (2R ops)
...
iiii-isss sszn-nnnn zzzz-0101 //LD (Rs, Disp5)
iiii-iiii iiii-iiii zz11-1101 //Branch (Disp16s)

The 16 and 24 bit ops could be defined as (hopefully straightforward)
unpacking rules into the 40 bit format (they can be considered as
"compressed", but in the sense that one needs to define bit-for-bit
mapping rules to the larger formats).

The 56 and 88 bit formats would mostly add immediate bits or similar
onto the 40 bit format.

In this case, branch displacements would be in terms of bytes.
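As a hedged illustration of the 24-bit layouts and byte-granular branches above, the sketch below decodes the LD/ST group (low nibble 0001, bit 4 selecting Disp5 vs Rt, bit 13 selecting LD vs ST) and the Disp16s branch (low six bits 111101). Bit numbering follows the written patterns (bit 0 rightmost), byte order is assumed little-endian, and the branch is assumed to be relative to its own address; `LdSt24`, `decode_ldst24` and `branch24_target` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form; the post only gives bit layouts. */
typedef struct {
    bool is_store;        /* bit 13: 0 = LD, 1 = ST */
    bool indexed;         /* bit 4: 0 = (Rs, Disp5), 1 = (Rs, Rt) */
    unsigned rn;          /* bits 12..8 */
    unsigned rs;          /* bits 18..14 */
    unsigned rt_or_disp;  /* bits 23..19: Rt, or unsigned Disp5 */
} LdSt24;

static uint32_t fetch24(const uint8_t *p)  /* little-endian assumption */
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

static unsigned bits(uint32_t w, unsigned hi, unsigned lo)
{
    return (w >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Returns true and fills *out if w is in the zzz?-0001 LD/ST group. */
static bool decode_ldst24(uint32_t w, LdSt24 *out)
{
    assert(out != NULL);
    if (bits(w, 3, 0) != 0x1) return false;
    out->indexed = bits(w, 4, 4);
    out->is_store = bits(w, 13, 13);
    out->rn = bits(w, 12, 8);
    out->rs = bits(w, 18, 14);
    out->rt_or_disp = bits(w, 23, 19);
    return true;
}

/* Branch (Disp16s): bits 5..0 == 111101, byte displacement in bits 23..8. */
static bool branch24_target(uint32_t w, uint64_t pc, uint64_t *target)
{
    assert(target != NULL);
    if (bits(w, 5, 0) != 0x3D) return false;
    int32_t disp = (int32_t)bits(w, 23, 8);
    if (disp & 0x8000) disp -= 0x10000;          /* sign-extend 16 bits */
    *target = pc + (uint64_t)(int64_t)disp;      /* modular address add */
    return true;
}
```

For example, the bytes 01 47 19 decode as LD with Rn=7, Rs=5, Disp5=3, and the word 0xFFFC3D at address 0x1000 branches back 4 bytes to 0xFFC.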

....

Probably not terribly sensible as an ISA design, but could be kinda
amusing I think.

Also funny if one could do a superscalar implementation of such an ISA...

Re: More of my philosophy about CISC and RISC instructions..

<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33700&group=comp.arch#33700

Newsgroups: comp.arch

Date: Sat, 19 Aug 2023 09:31:23 -0700 (PDT)
In-Reply-To: <ubqphs$u0gp$1@dont-email.me>
Message-ID: <4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)

by: MitchAlsup - Sat, 19 Aug 2023 16:31 UTC

On Saturday, August 19, 2023 at 11:10:41 AM UTC-5, BGB wrote:
> On 8/19/2023 9:30 AM, Scott Lurndal wrote:
> > MitchAlsup <Mitch...@aol.com> writes:
> >> On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
> >
> >>> And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
> >> <
> >> My guess (1st order) is "enough" will compared to the times one needs 36-bits for a big instruction.
> >> {This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
> >> <
> >> But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
> >> Or between the 36-bit instruction and the 27-bit instruction ??
> >
> > Add a bit to the PC to record which part is next? Use something
> > like the PDP-8 link register? Record it in the processor status
> > register (e.g. like ARM Thumb IT instruction state)?
> Hmm, what about an ISA where instructions are mostly a prime number of
> bytes:
> 2, 3, 5, 7, 11.
>
> xxxx-xxxx xxxx-xxx0
> xxxx-xxxx xxxx-xxxx xxxx-xx01
> xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-x011
> ...
>
> Then, say:
> 16 bit ops have 2 4-bit register fields.
> 24 bit ops have 3 5-bit register fields.
> 40 bit ops have 3 6-bit register fields.
<
I see not giving full access to the whole RF as a poor choice.
Feel free to disagree with me. {There are too many register
allocation problems without having artificial boundaries in
use of registers. You might have set up a situation where you
have to register allocate from one virtual RF space to another
virtual RF space before allocating into the physical RF space.}
>
> zzzz-ssss nnnn-zzz0
> tttt-tsss sszn-nnnn zzzz-zz01
> zzzz-zztt tttt-zsss ssnn-nnnn zzzz-zzzz ppzz-z011
>
>
> The 16-bit ops would mostly hold a collection of 2R ops.
>
> The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
> iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
> iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
> tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
> tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
<
I think you have sacrificed too much entropy to this particular encoding.
Consider: a 32-bit RISC LD/ST instruction can have a 16-bit displacement,
so a 24-bit one should be able to have an 8-bit displacement.
<
> iiii-isss ss0n-nnnn zzz0-1001 //ALU Rs, Imm5u, Rn
> tttt-tsss ss1n-nnnn zzz0-1001 //ALU Rs, Rt, Rn
> tttt-tsss ss0n-nnnn zzz1-1001 //Misc (3R)
> zzzz-zsss ss1n-nnnn zzz1-1001 //Misc (2R ops)
> ...
> iiii-isss sszn-nnnn zzzz-0101 //LD (Rs, Disp5)
> iiii-iiii iiii-iiii zz11-1101 //Branch (Disp16s)
>
> The 16 and 24 bit ops could be defined as (hopefully straightforward)
> unpacking rules into the 40 bit format (they can be considered as
> "compressed", but in the sense that one needs to define bit-for-bit
> mapping rules to the larger formats).
>
> The 56 and 88 bit formats would mostly add immediate bits or similar
> onto the 40 bit format.
>
> In this case, branch displacements would be in terms of bytes.
>
> ...
>
>
>
> Probably not terribly sensible as an ISA design, but could be kinda
> amusing I think.
>
> Also funny if one could do a superscalar implementation of such an ISA...

On 8/18/2023 12:52 PM, MitchAlsup wrote:
> On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
>> On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
>>> BGB wrote:
>>>>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
>>>>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
>>>>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution.
>>>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
>>>>>
>>>> I would almost be inclined to agree, but more because the existing RVC
>>>> encoding scheme is *awful* (like, someone looked at Thumb and was then
>>>> like, "Hey man, hold my beer!").
>>> That's why I wrote "vendor specific".
>>> Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
>>> the only performance boosting feature in the ISA for this segment.
>>>
>>> Unfortunately they (risc nazi) managed to add compressed floating point instructions.
>>> The real irony is that it is the least important area. Most of the cores have no fpu at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for middle ground.
>>> Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
>>>
>> Yeah.
>>
>> Realistically, a few major things make sense as 16-bit ops:
>> MOV Reg, Reg
>> ADD Reg, Reg
>> MOV Imm, Reg
>> ADD Imm, Reg
>> A selection of basic Load/Store ops;
>> A few short branch encodings;
>> ...
>>
>> It makes sense to give the instructions which appear in the highest
>> densities the shorter encodings, and one can gloss over everything else.
>>
>>
>> Also preferably without the encoding scheme being a dog-chewed mess.
>> Granted, my own ISA is not entirely free of dog-chew, but both it and
>> RISC-V sort of have this in common.
>>
>> Mine has some encoding wonk from its origins as an ISA originally with
>> 16-bit instructions (which, ironically, has been gradually migrating
>> away from its 16-bit origins).
>>
>>
>>
>> Having recently seen some of Mitch's encoding, I can admit that it is at
>> least "not dog chewed".
> <
> This is a consequence of me having done a moderately dog-chewed ISA
> in 1983, worked on SPARC for 9 years, then over in x86-64 for 7 years
> then having done a GPU ISA, and then retired from working for corporations.
> <
> What you see is an attempt to combine the best features of RISC with the
> best features of CISC (and there are some--much to the chagrin of the
> puritans) into a cohesive and mostly orthogonal ISA.

Fair enough.

>>
>> Though, it does seem to lean a little further in the direction of
>> immediate bits at the expense of opcode bits.
> <
> Because it was here that pure RISC ISAs waste so many instructions on
> pasting bits together only to use them once as operands. So by inventing
> universal constants all of these bit pasting instructions vanish from the
> instruction stream.

Yeah, this is why I ended up adding jumbo prefixes...

Even within a pure RISC, there are better/worse:
OK: LDSH/SHORI
Worse: LUI+ADD or similar;
BAD: PC-relative Load

Main advantage of LDSH/SHORI being that it expands easily to 64-bit
constants, whereas LUI doesn't.
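To make the LDSH/SHORI vs LUI+ADD point concrete, here is a small model. It assumes SHORI means "shift the register left 16 and OR in a 16-bit immediate" (the SuperH SHmedia meaning of that mnemonic); the thread does not spell this out. The LUI+ADDI split follows RISC-V semantics: LUI supplies bits 31..12 and ADDI adds a sign-extended 12-bit value, so the upper part has to be rounded by 0x800 and the pair only covers 32-bit constants. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SHORI semantics: rn = (rn << 16) | imm16.
 * Four steps build any 64-bit constant, most significant chunk first. */
static uint64_t build_with_shori(uint64_t value, int *steps)
{
    assert(steps != NULL);
    uint64_t rn = 0;
    *steps = 0;
    for (int shift = 48; shift >= 0; shift -= 16) {
        uint16_t imm16 = (uint16_t)(value >> shift);
        rn = (rn << 16) | imm16;   /* one shift-and-OR step */
        (*steps)++;
    }
    return rn;
}

/* RISC-V style LUI + ADDI split for a 32-bit constant.
 * ADDI sign-extends its 12-bit immediate, so the upper 20 bits are
 * rounded by adding 0x800 before shifting. */
static void split_lui_addi(int32_t value, uint32_t *hi20, int32_t *lo12)
{
    assert(hi20 != NULL && lo12 != NULL);
    uint32_t v = (uint32_t)value;
    *hi20 = ((v + 0x800u) >> 12) & 0xFFFFFu;
    int32_t lo = (int32_t)(v & 0xFFFu);
    if (lo >= 0x800) lo -= 0x1000;   /* sign-extend 12 bits */
    *lo12 = lo;
}

/* Recombine as LUI then ADDI would (32-bit wraparound), returning the bit pattern. */
static uint32_t apply_lui_addi(uint32_t hi20, int32_t lo12)
{
    return (hi20 << 12) + (uint32_t)lo12;
}
```

The shift-and-OR chain extends to 64 bits by just repeating the step, whereas the LUI+ADDI pair would need extra shifts and adds (or a second register) to go past 32 bits.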

Ironically, despite being a microcontroller RISC, the IMM
prefix-instruction in MicroBlaze is also functionally similar to a jumbo
prefix.

>>
>>
>> But, OTOH, there are tradeoffs here.
>>
>>
>>
>> And, admittedly, on the other side, not as many people are as likely to
>> agree to my sentiment that 9 bits for most immediate and displacement
>> fields is "mostly sufficient".
> <
> I agree it is "mostly sufficient", but wouldn't you rather have "almost entirely
> sufficient" instead of "mostly sufficient" ?? i.e., 16-bits

It is mostly a difference of a few percent if going by my stats.
9 bits still "wipes the floor" with 5 or 6 bit displacement fields.
12 (scaled) does a little better, but not enough to justify 33% more bits.

The practical difference between 96.9% and 99.5% is "not that huge",
whereas the difference from 60% (scaled) or 20% (unscaled) for a 5u or
6s displacement is quite a bit more significant.

Though, the 9-bit cases effectively expand to 10-bit signed in XG2,
partly because, while 9-bit unsigned won out over 9-bit signed, 10-bit
signed wins out over 10-bit unsigned (but, it was pretty close here).

Ironically, both 9-bit unsigned and 10-bit signed, with a displacement
scale, manage to slightly beat out the 12-bit signed/unscaled
displacement style used by RISC-V.

Say, Disp12s can reach +/-2K. Whereas, scaled Disp9u (for QWORD) can
reach 4K.
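The reach figures can be checked with a few lines of arithmetic. This sketch assumes the scale factor equals the access size (8 for QWORD) and that the displacement counts whole elements; `disp_reach` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets [min, max] reachable by a displacement field.
 * bits: field width; is_signed: two's complement or unsigned;
 * scale: bytes per displacement unit (1 = unscaled, 8 = QWORD-scaled). */
typedef struct { int64_t min, max; } Reach;

static Reach disp_reach(unsigned bits, int is_signed, int64_t scale)
{
    assert(bits > 0 && bits < 32 && scale > 0);
    Reach r;
    if (is_signed) {
        r.min = -((int64_t)1 << (bits - 1)) * scale;
        r.max = (((int64_t)1 << (bits - 1)) - 1) * scale;
    } else {
        r.min = 0;
        r.max = (((int64_t)1 << bits) - 1) * scale;
    }
    return r;
}
```

Disp12s unscaled gives -2048..2047 bytes; Disp9u scaled by 8 gives offsets 0..4088, so a QWORD anywhere in the 4 KiB above the base; the XG2 Disp10s scaled form gives -4096..4088.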

Granted, the RISC-V strategy (unscaled displacements) would be more of a
win if the general case use of packed structs or similar was "actually a
thing".

For ALU immediate values, 9 bits still gets ~ 95%, 12 bits would get ~
97%. Both beat out Imm5 at roughly 54%, ...

Then, with fallback cases:
Load a Imm25s into R0, use R0 instead of an immediate (*1);
Or:
Use a jumbo prefix, now it is Imm33s.

If 5% of the time, one needs to use a jumbo prefix or similar, this
isn't all that terrible.
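A hypothetical assembler-side sketch of that fallback order, using only the widths named here (an inline Imm9u, then a jumbo prefix giving Imm33s). The real BJX2 rules, such as the 10-bit signed XG2 forms or the Imm25s-into-R0 path, are not modeled, and the enum and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IMM_INLINE9U, IMM_JUMBO33S, IMM_TOO_WIDE } ImmForm;

/* True if v fits in a two's-complement field of the given width. */
static int fits_signed(int64_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t lim = (int64_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

static ImmForm choose_imm_form(int64_t v)
{
    if (v >= 0 && v < (1 << 9)) return IMM_INLINE9U;  /* fits the op itself */
    if (fits_signed(v, 33))     return IMM_JUMBO33S;  /* jumbo prefix + op */
    return IMM_TOO_WIDE;  /* needs a longer constant-building sequence */
}
```

If roughly 95% of immediates take the first branch, the prefixed form only costs extra words on the remaining few percent.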

*1: Many of the immediate or displacement values that blew out
Imm9/Disp9 also often blow out Imm16s, having originally needed ~24u/25s
bits as the "covers most cases" fallback case. This was designed before
I later added jumbo prefixes.

There is a little layout wonkiness due to the deprecated/dropped BT/BF
encodings as well, but this was because I had designed branches before I
had added predication.

In retrospect, it might have made more sense to have put the branches
into the remaining space in the F8 block. Say:
F8Ei-iiii BRA Disp20s
F8Fi-iiii BSR Disp20s
E8Ei-iiii BT Disp20s
ECEi-iiii BF Disp20s

But, such a change would break my existing code (and would require new
relocs/... as well).
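For reference, a sketch of how the alternative F8-block branches above would pack a Disp20s, reading each pattern as one 32-bit value whose top 12 bits are the opcode and whose low 20 bits are the displacement. How the two 16-bit halves are ordered in memory and what unit the displacement counts in are not stated here, so neither is modeled; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode prefixes from the hypothetical table above (top 12 bits). */
enum { OP_BRA = 0xF8E, OP_BSR = 0xF8F, OP_BT = 0xE8E, OP_BF = 0xECE };

static bool encode_disp20(unsigned op12, int32_t disp, uint32_t *word)
{
    assert(word != NULL);
    if (disp < -(1 << 19) || disp >= (1 << 19)) return false;  /* outside Disp20s */
    *word = ((uint32_t)op12 << 20) | ((uint32_t)disp & 0xFFFFFu);
    return true;
}

static int32_t decode_disp20(uint32_t word)
{
    int32_t d = (int32_t)(word & 0xFFFFFu);
    if (d & 0x80000) d -= 0x100000;   /* sign-extend from bit 19 */
    return d;
}
```

For example, BRA with a displacement of -2 packs to 0xF8EFFFFE, and decoding the low 20 bits recovers -2.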

As noted, BGBCC (and the ABI) ended up treating R0 and R1 as special
registers that may be stomped without warning. However, since then, R1
ended up being reclaimed more as a scratch/auxiliary link register or
scratch branch-address register.

If writing ASM code, one needs to tread carefully if using these
registers (partly due to possibly wonky cases in the assembler, and
partly as they are sometimes treated as special case encodings in the
instruction decoder for certain ops).

>>
>> Well, and my instruction listing has also gotten bigger than I would
>> prefer, ...
>>
>> Where, as can be noted, if expressed in bits (this for the XG2 variant):
>> NMOp ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
>> NMYp ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
>> NMIp ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9 / 10s)
>> NMIp ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10 / 11s)
>> NYYp 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
>> YYYp 1p1Z iiii iiii iiii iiii iiii iiii //Imm24/Jumbo/PrWEX
>>
>> Where, Z is the bits effectively used as part of the opcode.
>> n/m/o: Register, i=immediate, p=predicate.
>> M/N/O: Register (high inverted bit)
>> Y: Reserved for Opcode (future, must be 1 for now).
>>
>> Or, for Baseline:
>> 111p ZpZZ nnnn mmmm ZZZZ Znmo oooo ZZZZ //3R
>> 111p ZpZZ nnnn mmmm ZZZZ ZnmZ ZZZZ ZZZZ //2R
>> 111p ZpZZ nnnn mmmm ZZZZ Znmi iiii iiii //3RI (Imm9)
>> 111p ZpZZ nnnn ZZZZ ZZZZ Znii iiii iiii //2RI (Imm10)
>> 111p 1p00 ZZZn nnnn iiii iiii iiii iiii //2RI (Imm16)
>>
>> Where, as noted, the baseline encoding has 5-bit register fields.
>>
>>
>> There are limits though to what is possible within a 32 bits layout.
> <
> I am on record that the ideal instruction size is 34-36-bits.

Yes, but memory being built around 8-bit bytes kinda precludes this.

Fixed-length 40 or 48 bit instructions "ain't gonna fly".

>>
>> And, I had made what tradeoffs I had made...
>>>
>>>> So, 16K or 32K appears to be a local optimum here.
>>> Advanced prediction definitely lowers the pressure on i-cache even further.
>>>
>> Yeah.
>>
>>
>> Predication can help to reduce the overall "branchiness" of the code:
>> Average trace-length gets longer;
>> The number of branch ops goes down;
>> One can save a lot of cycles with short if-expressions;
>> ...
>>
>> Some tasks that are painfully slow on more conventional processors can
>> see a nice speed boost:
>> Range-clamping expressions;
>> The PNG Paeth filter;
>> Things like range coders;
>> ...
>>
>> Granted, a compiler can't always know which is better, since knowledge
>> about whether or not a given branch is predictable is not known at
>> compile time.
>>
> It often changes from predictable to unpredictable and back based on the data being processed
> by the application.
>


On 8/19/2023 11:31 AM, MitchAlsup wrote:
> On Saturday, August 19, 2023 at 11:10:41 AM UTC-5, BGB wrote:
>> On 8/19/2023 9:30 AM, Scott Lurndal wrote:
>>> MitchAlsup <Mitch...@aol.com> writes:
>>>> On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
>>>
>>>>> And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
>>>> <
>>>> My guess (1st order) is "enough" will compared to the times one needs 36-bits for a big instruction.
>>>> {This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
>>>> <
>>>> But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
>>>> Or between the 36-bit instruction and the 27-bit instruction ??
>>>
>>> Add a bit to the PC to record which part is next? Use something
>>> like the PDP-8 link register? Record it in the processor status
>>> register (e.g. like ARM Thumb IT instruction state)?
>> Hmm, what about an ISA where instructions are mostly a prime number of
>> bytes:
>> 2, 3, 5, 7, 11.
>>
>> xxxx-xxxx xxxx-xxx0
>> xxxx-xxxx xxxx-xxxx xxxx-xx01
>> xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-x011
>> ...
>>
>> Then, say:
>> 16 bit ops have 2 4-bit register fields.
>> 24 bit ops have 3 5-bit register fields.
>> 40 bit ops have 3 6-bit register fields.
> <
> I see not giving full access to the whole RF as a poor choice.
> Feel free to disagree with me. {There are too many register
> allocation problems without having artificial boundaries in
> use of registers. You might have set up a situation where you
> have to register allocate from one virtual RF space to another
> virtual RF space before allocating into the physical RF space.}

This is less of a problem if every shorter encoding has a corresponding
encoding in a wider format (and the effects of instruction size are not
explicit at the ASM level).

In this case, the wonky register sizes become merely a size
optimization issue, where one can have the compiler prioritize the
registers that can use shorter formats over the ones that need longer
formats.
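A rough sketch of that size-optimization view for the prime-byte ISA, assuming a 3R ALU op exists in all three formats and that the 16-bit form is 2R (destination doubles as first source). The hypothetical `alu_op_bytes` picks the shortest format the register numbers allow, and `seq_bytes` shows why an allocator would prefer low-numbered registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format picker: 16-bit ops are 2R with 4-bit register
 * fields, 24-bit ops have 5-bit fields, 40-bit ops have 6-bit fields. */
static unsigned alu_op_bytes(unsigned rd, unsigned rs, unsigned rt)
{
    assert(rd < 64 && rs < 64 && rt < 64);
    if (rd == rs && rd < 16 && rt < 16) return 2;  /* assumed 2R form */
    if (rd < 32 && rs < 32 && rt < 32)  return 3;
    return 5;                                      /* always encodable */
}

typedef struct { unsigned rd, rs, rt; } AluOp;

/* Code size of a sequence under this cost model. */
static unsigned seq_bytes(const AluOp *ops, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += alu_op_bytes(ops[i].rd, ops[i].rs, ops[i].rt);
    return total;
}
```

With this cost model, the same two ops cost 4 bytes using R1..R4 but 10 bytes using R33..R36; every combination stays encodable, only larger.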

Though, in this case, for this combination, it would likely make sense
to keep a similar register layout to BJX2, which ironically mostly has
the needed layout as a side-effect of having "grown out" of an
earlier 16-register layout (the 64 GPR layout being effectively the 16
GPR layout repeated 4 times...).

....

Still better than having an encoding where a bunch of combinations are
non-encodable in the ISA (and the code-generator needs to have a bunch
of wonky edge cases to work around being unable to encode the offending
cases...).

But, yeah, I had been down the above road both with R16..R31 in BJX1,
and with R32..R63 in BJX2's Baseline encoding (eg: the fun of trying to
work with 64 GPRs in an ISA encoding designed around 5-bit register fields).

At least the XG2 encoding sort of "fixes" the above issue, albeit at the
potential cost of code density due to the loss of 16-bit encodings.

Then again, given Doom with the XG2 encoding is still smaller than
either RV64IMA or x86-64 builds, I don't think it is doing too horribly
(even if XG2 is roughly 11% worse than the Baseline encoding in terms of
code-density).

I guess the main tradeoff here being whether one wants to build programs
for 32 or 64 GPRs (I may consider splitting my A and G/H profiles along
these lines, possibly with the G/H profiles assuming XG2 encoding as the
default, but A assuming Baseline; partly as using XG2 in a 32 GPR
configuration gains nothing; but using Baseline in a 64 GPR
configuration sucks due to non-orthogonality issues...).

>>
>> zzzz-ssss nnnn-zzz0
>> tttt-tsss sszn-nnnn zzzz-zz01
>> zzzz-zztt tttt-zsss ssnn-nnnn zzzz-zzzz ppzz-z011
>>
>>
>> The 16-bit ops would mostly hold a collection of 2R ops.
>>
>> The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
>> iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
>> iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
>> tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
>> tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
> <
> I think you have sacrificed too much entropy to this particular encoding.
> Consider: a 32-bit RISC LD/ST instruction can have a 16-bit displacement,
> so a 24-bit one should be able to have an 8-bit displacement.
> <

Then for this encoding block, you would have *nothing* apart from LD/ST
ops...

One could note that Disp5u still typically hits roughly 50% of the time
in my stats. This is probably enough for the encoding to still be "useful".

Granted, half the time, one would still need to use the 40-bit format...

>> iiii-isss ss0n-nnnn zzz0-1001 //ALU Rs, Imm5u, Rn
>> tttt-tsss ss1n-nnnn zzz0-1001 //ALU Rs, Rt, Rn
>> tttt-tsss ss0n-nnnn zzz1-1001 //Misc (3R)
>> zzzz-zsss ss1n-nnnn zzz1-1001 //Misc (2R ops)
>> ...
>> iiii-isss sszn-nnnn zzzz-0101 //LD (Rs, Disp5)
>> iiii-iiii iiii-iiii zz11-1101 //Branch (Disp16s)
>>
>> The 16 and 24 bit ops could be defined as (hopefully straightforward)
>> unpacking rules into the 40 bit format (they can be considered as
>> "compressed", but in the sense that one needs to define bit-for-bit
>> mapping rules to the larger formats).
>>
>> The 56 and 88 bit formats would mostly add immediate bits or similar
>> onto the 40 bit format.
>>
>> In this case, branch displacements would be in terms of bytes.
>>
>> ...
>>
>>
>>
>> Probably not terribly sensible as an ISA design, but could be kinda
>> amusing I think.
>>
>> Also funny if one could do a superscalar implementation of such an ISA...

On 8/18/2023 11:05 AM, Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Monday, August 14, 2023 at 5:45:10 AM UTC-5, pec...@gmail.com wrote:
>
>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
>> <
>> RISC-V allocates 3/4 of the OpCode encoding to 16-bit stuff and gains all the complexity of variable length instructions but gains little of the benefits.
>
> ARM has the Thumb32 instruction set, which I just finished a simulator for,
> which reserves three of the 16-bit encodings to indicate 32-bit instructions.
>

Having developed along a vaguely similar trajectory, I had ended up with
a similar scheme (to Thumb2) in my case.

> It also includes the rather unusual T16 IT instruction (If-Then) which, as a form
> of predication, can cover up to four subsequent T16 instructions.
>
> It's worth noting that the IT instruction was deprecated in the Thumb
> support for AArch32 in ARMv8+.

I would guess that this mechanism would have required a way to preserve
and restore this state during interrupts, which could be "rather
annoying" to deal with.

Probably combined with limited use by compilers compared with normal
branches.
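For context on the interrupt concern: on Arm, the IT block's progress lives in an 8-bit ITSTATE field of the program status register, so it is saved and restored with the PSR on exceptions. The sketch below models the ITAdvance rule as given in the Arm Architecture Reference Manual pseudocode (bits 7..4 hold the condition for the current instruction; bits 4..0 shift left after each instruction until the mask runs out). It is illustrative and worth checking against the manual; function names are hypothetical. For `ITTE EQ`, firstcond is 0000 (EQ) and the mask is 0110.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8-bit IT state: bits 7..4 = condition for the current instruction,
 * bits 3..0 nonzero while inside an IT block. */
static uint8_t it_start(uint8_t firstcond, uint8_t mask)
{
    assert(firstcond < 16 && mask != 0 && mask < 16);
    return (uint8_t)((firstcond << 4) | mask);
}

static bool in_it_block(uint8_t it)      { return (it & 0x0F) != 0; }
static unsigned current_cond(uint8_t it) { return it >> 4; }

/* ITAdvance: applied after each instruction in the block. */
static uint8_t it_advance(uint8_t it)
{
    if ((it & 0x07) == 0) return 0;                       /* block finished */
    return (uint8_t)((it & 0xE0) | ((it << 1) & 0x1F));   /* shift bits 4..0 left */
}
```

Walking `ITTE EQ` gives conditions EQ, EQ, NE and then leaves the block, which is exactly the few bits of hidden state an interrupt handler has to carry across.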

Conventional wisdom is usually that "branch predictor makes branches not
slow" so "one does not need predication".

Except now the CPU performance may "eat it" when trying to deal with a
PNG Paeth filter or bitwise range coder or similar (which effectively
feed raw entropy from the data stream into the branch hit/miss
handling). Likewise for things like alpha-testing pixels in a software
rasterizer, etc.

But, a lot of people (including compiler writers) seem inclined to
ignore these cases.
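As an illustration of the kind of code meant here, the standard PNG Paeth predictor (as defined in the PNG specification) and a range clamp, written as selects instead of nested branches. Whether a compiler actually lowers these to conditional moves or predicated ops depends on the compiler and target; the sketch only shows the branch-free shape.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* PNG Paeth predictor: a = left, b = above, c = upper-left.
 * Written as selects so a compiler may use cmov / predication. */
static uint8_t paeth(uint8_t a, uint8_t b, uint8_t c)
{
    int p  = (int)a + (int)b - (int)c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    uint8_t bc = (pb <= pc) ? b : c;
    return ((pa <= pb) & (pa <= pc)) ? a : bc;  /* '&' avoids a short-circuit branch */
}

/* Range clamp in the same select-only style. */
static int clamp_int(int v, int lo, int hi)
{
    assert(lo <= hi);
    int t = v < lo ? lo : v;
    return t > hi ? hi : t;
}
```

With image data, which of a, b or c wins is essentially random per pixel, which is why the branchy form defeats a predictor.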

But, then CPU designers are like "well, we will interpret a short
forward branch as predicating the next N instructions rather than doing
a branch", ...

....

Re: More of my philosophy about CISC and RISC instructions..

<4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33705&group=comp.arch#33705

Newsgroups: comp.arch

Date: Sat, 19 Aug 2023 12:12:37 -0700 (PDT)
In-Reply-To: <ubqr9n$uehf$1@dont-email.me>
Message-ID: <4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)

by: MitchAlsup - Sat, 19 Aug 2023 19:12 UTC

On Saturday, August 19, 2023 at 11:40:27 AM UTC-5, BGB wrote:
> On 8/18/2023 12:52 PM, MitchAlsup wrote:
> > On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
> >> On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
> >>> BGB wrote:
> >>>>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
> >>>>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
> >>>>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution.
> >>>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
> >>>>>
> >>>> I would almost be inclined to agree, but more because the existing RVC
> >>>> encoding scheme is *awful* (like, someone looked at Thumb and was then
> >>>> like, "Hey man, hold my beer!").
> >>> That's why I wrote "vendor specific".
> >>> Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
> >>> the only performance boosting feature in the ISA for this segment.
> >>>
> >>> Unfortunately they (risc nazi) managed to add compressed floating point instructions.
> >>> The real irony is that it is the least important area. Most of the cores have no fpu at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for middle ground.
> >>> Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
> >>>
> >> Yeah.
> >>
> >> Realistically, a few major things make sense as 16-bit ops:
> >> MOV Reg, Reg
> >> ADD Reg, Reg
> >> MOV Imm, Reg
> >> ADD Imm, Reg
> >> A selection of basic Load/Store ops;
> >> A few short branch encodings;
> >> ...
> >>
> >> It makes sense to give the instructions which appear in the highest
> >> densities the shorter encodings, and one can gloss over everything else.
> >>
> >>
> >> Also preferably without the encoding scheme being a dog-chewed mess.
> >> Granted, my own ISA is not entirely free of dog-chew, but both it and
> >> RISC-V sort of have this in common.
> >>
> >> Mine has some encoding wonk from its origins as an ISA originally with
> >> 16-bit instructions (which, ironically, has been gradually migrating
> >> away from its 16-bit origins).
> >>
> >>
> >>
> >> Having recently seen some of Mitch's encoding, I can admit that it is at
> >> least "not dog chewed".
> > <
> > This is a consequence of me having done a moderately dog-chewed ISA
> > in 1983, worked on SPARC for 9 years, then over in x86-64 for 7 years
> > then having done a GPU ISA, and then retired from working for corporations.
> > <
> > What you see is an attempt to combine the best features of RISC with the
> > best features of CISC (and there are some--much to the chagrin of the
> > puritans) into a cohesive and mostly orthogonal ISA.
> Fair enough.
> >>
> >> Though, it does seem to lean a little further in the direction of
> >> immediate bits at the expense of opcode bits.
> > <
> > Because it was here that pure RISC ISAs waste so many instructions on
> > pasting bits together only to use them once as operands. So by inventing
> > universal constants all of these bit pasting instructions vanish from the
> > instruction stream.
> Yeah, this is why I ended up adding jumbo prefixes...
>
>
> Even within a pure RISC, there are better/worse:
> OK: LDSH/SHORI
> Worse: LUI+ADD or similar;
> BAD: PC-relative Load
<
Only when "done wrong".
<
LDD R7,[IP,0x1234]
<
Is one <word> instruction using R0 as a proxy for IP when used as a base register.
>
> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
> constants, whereas LUI doesn't.
>
LDSH = Load Signed Half Word ??
SHORI = Store Half OR Immediate ??
>
> Ironically, despite being a microcontroller RISC, the IMM
> prefix-instruction in MicroBlaze is also functionally similar to a jumbo
> prefix.
<
STD 3.141592653589793238,[R3,R7<<3,DISP64]
<
Is 1 instruction, issues in 1 cycle, wastes no temporary registers,.......
That is, you can store an arbitrary constant anywhere in memory
using any addressing mode at any time with a single instruction.
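A hypothetical model of the address arithmetic in the two examples above: R0 in the base field standing in for IP, and base + (index << scale) + displacement for the indexed form. The field and function names are illustrative, and the stored constant itself (the immediate operand of STD) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative memory operand; not an actual encoding. */
typedef struct {
    unsigned base;      /* base register number; 0 means "use IP" */
    int has_index;
    unsigned index;     /* index register number */
    unsigned scale;     /* shift applied to the index */
    int64_t disp;       /* displacement, already sign-extended */
} MemOperand;

static uint64_t effective_address(const uint64_t regs[32], uint64_t ip,
                                  const MemOperand *m)
{
    assert(m->base < 32 && m->index < 32 && m->scale < 64);
    uint64_t ea = (m->base == 0) ? ip : regs[m->base];
    if (m->has_index)
        ea += regs[m->index] << m->scale;
    return ea + (uint64_t)m->disp;   /* modular 64-bit address arithmetic */
}
```

Here `LDD R7,[IP,0x1234]` corresponds to base 0 with displacement 0x1234, and `[R3,R7<<3,DISP64]` to base 3, index 7, scale 3.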
> >>
> >>
> >> But, OTOH, there are tradeoffs here.
> >>
> >>
> >>
> >> And, admittedly, on the other side, not as many people are as likely to
> >> agree to my sentiment that 9 bits for most immediate and displacement
> >> fields is "mostly sufficient".
> > <
> > I agree it is "mostly sufficient", but wouldn't you rather have "almost entirely
> > sufficient" instead of "mostly sufficient" ?? i.e., 16-bits
<
> It is mostly a difference of a few percent if going by my stats.
> 9 bits still "wipes the floor" with 5 or 6 bit displacement fields.
> 12 (scaled) does a little better, but not enough to justify 33% more bits.
<
I don't think you could point to a place where I sacrificed anything to enable
almost all integer and memory references getting 16-bit immediates.
Whereas; EMBench demonstrates that RISC-V's 12-bit displacements
are insufficient for most memory accesses. {Almost as if EMBench
had been designed to illustrate that disparity.}
>
> The practical difference between 96.9% and 99.5% is "not that huge",
> whereas the difference from 60% (scaled) or 20% (unscaled) for a 5u or
> 6s displacement is quite a bit more significant.
>
You are still operating under the assumption that I had to sacrifice
anything.
>
> Though, the 9-bit cases effectively expand to 10-bit signed in XG2,
> partly because, while 9-bit unsigned won out over 9-bit signed, 10-bit
> signed wins out over 10-bit unsigned (but, it was pretty close here).
>
>
> Ironically, both 9-bit unsigned and 10-bit signed, with a displacement
> scale, manage to slightly beat out the 12-bit signed/unscaled
> displacement style used by RISC-V.
>
> Say, Disp12s can reach +/-2K. Whereas, scaled Disp9u (for QWORD) can
> reach 4K.
<
This is the same argument I used in Mc 88100 arguing that displacement
arithmetic need not be signed (ala IBM 360), that the arithmetic was
congruent (could be rephrased in the same number of instructions),
and would allow certain linker tricks.
<
The compiler people wouldn't even discuss it.
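
Here are worked numbers for the reach comparison quoted above. A signed, unscaled 12-bit field covers -2048..+2047 bytes. An unsigned 9-bit field scaled by 8 covers 0..4088 bytes (511 x 8) in 8-byte steps. The small C sketch below computes both ranges; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range of a signed, unscaled displacement of `bits` bits
   (e.g. a RISC-V style Disp12s): [-2^(bits-1), 2^(bits-1) - 1]. */
static void signed_unscaled_reach(unsigned bits, int64_t *lo, int64_t *hi)
{
    assert(bits >= 1 && bits < 63);
    *lo = -((int64_t)1 << (bits - 1));
    *hi = ((int64_t)1 << (bits - 1)) - 1;
}

/* Byte range of an unsigned displacement of `bits` bits that is scaled
   by the access size (e.g. 8 for a QWORD load/store). */
static void unsigned_scaled_reach(unsigned bits, unsigned scale_bytes,
                                  int64_t *lo, int64_t *hi)
{
    assert(bits >= 1 && bits < 56);
    *lo = 0;
    *hi = (((int64_t)1 << bits) - 1) * (int64_t)scale_bytes;
}
```

The scaled form trades byte granularity for forward reach. It can only name offsets that are multiples of the access size, which is why it loses out on packed structs.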
>
>
> Granted, the RISC-V strategy (unscaled displacements) would be more of a
> win if the general case use of packed structs or similar was "actually a
> thing".
>
What you are saying is that "If RISC-V hadn't screwed up so many things it
would have been a significantly better ISA". And no one could possibly disagree
with you.
>
> For ALU immediate values, 9 bits still gets ~ 95%, 12 bits would get ~
> 97%. Both beat out Imm5 at roughly 54%, ...
<
But now you have to route all sorts of different sizes from the instruction
to various operand busses, whereas I only have to route {16,32-64}-bits.
This takes less decode logic and less multiplexing logic in the <time
critical> forwarding "loop".
<
RISC-V then compounds this problem by adding compression.
>
><snip>
>
> As noted, BGBCC (and the ABI) ended up treating R0 and R1 as special
> registers that may be stomped without warning. However, since then, R1
> ended up being reclaimed more as a scratch/auxiliary link register or
> scratch branch-address register.
<
I have no registers that any external force can stomp on
>
> If writing ASM code, one needs to tread carefully if using these
> registers (partly due to possibly wonky cases in the assembler, and
> partly as they are sometimes treated as special case encodings in the
> instruction decoder for certain ops).
<
I don't have these issues.
<snip>
> > I am on record that the ideal instruction size is 34-36-bits.
> Yes, but memory being built around 8-bit bytes kinda precludes this.
Somewhat of a shame, actually.........
>
><snip>
> >> Granted, a compiler can't always know which is better, since knowledge
> >> about whether or not a given branch is predictable is not known at
> >> compile time.
> >>
> > It often changes from predictable and back based on the data being processed
> > by the application.
> >
> Yeah, either way, the compiler isn't going to know.
<
If it weren't for benchmarketeering, the compiler would never have had to know.

Re: More of my philosophy about CISC and RISC instructions..

<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33706&group=comp.arch#33706

Newsgroups: comp.arch
Date: Sat, 19 Aug 2023 12:17:28 -0700 (PDT)
In-Reply-To: <ubqtm2$uqgs$1@dont-email.me>
Message-ID: <4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 19 Aug 2023 19:17:29 +0000

by: MitchAlsup - Sat, 19 Aug 2023 19:17 UTC

On Saturday, August 19, 2023 at 12:21:10 PM UTC-5, BGB wrote:
> On 8/19/2023 11:31 AM, MitchAlsup wrote:

> >> The 16-bit ops would mostly hold a collection of 2R ops.
> >>
> >> The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
> >> iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
> >> iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
> >> tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
> >> tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
> > <
> > I think you have sacrificed too much entropy to this particular encoding.
> > Consider a 32-bit RISC LD/ST instruction can have a 16-bit displacement
> > So a 24-bit one should be able to have an 8-bit displacement.
> > <
> Then for this encoding block, you would have *nothing* apart from LD/ST
> ops...
<
2 flavors
a) MEM Rd,[Rb,DISP16]
b) MEM Rd,[Rb,Ri<<s] // which have optional displacements {32,64}
>
> One could note that Disp5u still typically hits roughly 50% of the time
> in my stats. This is probably enough for the encoding to still be "useful".
<
Whereas my encoding gives that "flavor" 16-bits, which as you stated is good
to the 99% level. 99% > 50% to the point the compiler does not need the
intermediate pattern recognition cases.
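
To make the quoted 24-bit LD/ST layout (`iiii-isss ss?n-nnnn zzz?-0001`) concrete, here is a C sketch that pulls out its fields. The 24-bit word is held in the low bits of a uint32_t. The sketch assumes the layout is written with bit 23 on the left, as the `...0001` length suffix suggests. The field names are illustrative, and the post does not say what the `zzz` bits mean, so they are returned raw.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of one 24-bit op in the quoted LD/ST group. */
typedef struct {
    unsigned imm_or_rt; /* bits 23..19: Disp5, or Rt for the (Rs, Rt) form */
    unsigned rs;        /* bits 18..14: base register */
    unsigned is_store;  /* bit 13: 0 = LD, 1 = ST */
    unsigned rn;        /* bits 12..8: data register */
    unsigned zzz;       /* bits 7..5: sub-opcode bits, meaning unspecified */
    unsigned reg_index; /* bit 4: 0 = (Rs, Disp5), 1 = (Rs, Rt) */
} MemOp24;

static MemOp24 decode_memop24(uint32_t insn)
{
    assert((insn & 0xFu) == 0x1u);  /* low nibble 0001 selects this group */
    MemOp24 op;
    op.imm_or_rt = (insn >> 19) & 0x1Fu;
    op.rs        = (insn >> 14) & 0x1Fu;
    op.is_store  = (insn >> 13) & 0x1u;
    op.rn        = (insn >> 8)  & 0x1Fu;
    op.zzz       = (insn >> 5)  & 0x7u;
    op.reg_index = (insn >> 4)  & 0x1u;
    return op;
}
```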

Re: More of my philosophy about CISC and RISC instructions..

<bb790143-18f5-4865-b162-5a0da094a273n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33707&group=comp.arch#33707

Newsgroups: comp.arch
Date: Sat, 19 Aug 2023 12:24:15 -0700 (PDT)
In-Reply-To: <ubqv57$v2re$1@dont-email.me>
Message-ID: <bb790143-18f5-4865-b162-5a0da094a273n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 19 Aug 2023 19:24:15 +0000

by: MitchAlsup - Sat, 19 Aug 2023 19:24 UTC

On Saturday, August 19, 2023 at 12:46:19 PM UTC-5, BGB wrote:
> On 8/18/2023 11:05 AM, Scott Lurndal wrote:
> > MitchAlsup <Mitch...@aol.com> writes:
> >> On Monday, August 14, 2023 at 5:45:10 AM UTC-5, pec...@gmail.com wrote:
> >
> >>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
> >> <
> >> RISC-V allocates 3/4 of the OpCode encoding to 16-bit stuff and gains all the complexity of variable length instructions but gains little of the benefits.
> >
> > ARM has the Thumb32 instruction set, which I just finished a simulator for,
> > which reserves three of the 16-bit encodings to indicate 32-bit instructions.
> >
> Having developed along a vaguely similar trajectory, I had ended up with
> a similar scheme (to Thumb2) in my case.
> > It also includes the rather unusual T16 IT instruction (If-Then) which, as a form
> > of predication, can cover up to four subsequent T16 instructions.
> >
> > It's worth noting that the IT instruction was deprecated in the thumb
> > support for AArch32 in ARMv8+.
> I would guess that this mechanism would have required a way to preserve
> and restore this state during interrupts, which could be "rather
> annoying" to deal with.
>
> Probably combined with limited use by compilers compared with normal
> branches.
>
> Conventional wisdom is usually that "branch predictor makes branches not
> slow" so "one does not need predication".
>
>
> Except now the CPU performance may "eat it" when trying to deal with a
> PNG Paeth filter or bitwise range coder or similar (which effectively
> feed raw entropy from the data stream into the branch hit/miss
> handling). Likewise for things like alpha-testing pixels in a software
> rasterizer, etc.
<
Extract and Insert Instructions simplify the encoding of these.
>
> But, a lot of people (including compiler writers) seem inclined to
> ignore these cases.
<
Often disguised as a series of shifts (a << const1)>>const2 because
the underlying language does not express variable length bit-fields
efficiently.
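
For reference, the shift-pair idiom extracts an unsigned field of `width` bits at bit `offset`. It shifts the field up to the top of the register and then back down. A dedicated extract instruction would presumably take (offset, width) directly instead. Below is a minimal C sketch comparing the idiom with the mask-and-shift form; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-pair idiom: move the field to the top, then logically shift it
   back down (right shifts of uint64_t zero-fill). */
static uint64_t extract_shifts(uint64_t x, unsigned offset, unsigned width)
{
    assert(width >= 1 && offset + width <= 64);
    return (x << (64 - offset - width)) >> (64 - width);
}

/* The same field via shift-then-mask, for comparison. */
static uint64_t extract_mask(uint64_t x, unsigned offset, unsigned width)
{
    assert(width >= 1 && offset + width <= 64);
    uint64_t mask = (width == 64) ? ~(uint64_t)0
                                  : (((uint64_t)1 << width) - 1);
    return (x >> offset) & mask;
}
```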
>
> But, then CPU designers are like "well, we will interpret a short
> forward branch as predicating the next N instructions rather than doing
> a branch", ...
<
AND WHY NOT ??
>
> ...
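
The PNG Paeth predictor mentioned above is a good example of data-dependent branching. Below is a C sketch of the predictor as the PNG specification defines it, next to a mask-based version that computes the same result without branches. The second version is the kind of code that predication or conditional-select instructions handle cheaply. Whether a given compiler emits branches or selects for either version depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor: a = left, b = above, c = upper-left.
   The branches depend on pixel values, so they are hard to predict. */
static uint8_t paeth_branchy(uint8_t a, uint8_t b, uint8_t c)
{
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Same selection written with masks instead of branches. */
static uint8_t paeth_select(uint8_t a, uint8_t b, uint8_t c)
{
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    int take_a = (pa <= pb) & (pa <= pc);   /* 0 or 1 */
    int take_b = !take_a & (pb <= pc);      /* 0 or 1 */
    int take_c = !take_a & !take_b;         /* 0 or 1 */
    /* -flag is 0 or all-ones, so exactly one term survives */
    return (uint8_t)((a & -take_a) | (b & -take_b) | (c & -take_c));
}
```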

On 8/19/2023 2:12 PM, MitchAlsup wrote:
> On Saturday, August 19, 2023 at 11:40:27 AM UTC-5, BGB wrote:
>> On 8/18/2023 12:52 PM, MitchAlsup wrote:
>>> On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
>>>> On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
>>>>> BGB wrote:
>>>>>>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
>>>>>>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
>>>>>>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution
>>>>>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
>>>>>>>
>>>>>> I would almost be inclined to agree, but more because the existing RVC
>>>>>> encoding scheme is *awful* (like, someone looked at Thumb and was then
>>>>>> like, "Hey man, hold my beer!").
>>>>> That's why I wrote "vendor specific".
>>>>> Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
>>>>> the only performance boosting feature in the ISA for this segment.
>>>>>
>>>>> Unfortunately they (risc nazi) managed to add compressed floating point instructions.
>>>>> The real irony is that it is the least important area. Most of the cores have no FPU at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for middle ground.
>>>>> Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
>>>>>
>>>> Yeah.
>>>>
>>>> Realistically, a few major things make sense as 16-bit ops:
>>>> MOV Reg, Reg
>>>> ADD Reg, Reg
>>>> MOV Imm, Reg
>>>> ADD Imm, Reg
>>>> A selection of basic Load/Store ops;
>>>> A few short branch encodings;
>>>> ...
>>>>
>>>> It makes sense to give the instructions which appear in the highest
>>>> densities the shorter encodings, and one can gloss over everything else.
>>>>
>>>>
>>>> Also preferably without the encoding scheme being a dog-chewed mess.
>>>> Granted, my own ISA is not entirely free of dog-chew, but both it and
>>>> RISC-V sort of have this in common.
>>>>
>>>> Mine has some encoding wonk from its origins as an ISA originally with
>>>> 16-bit instructions (which, ironically, has been gradually migrating
>>>> away from its 16-bit origins).
>>>>
>>>>
>>>>
>>>> Having recently seen some of Mitch's encoding, I can admit that it is at
>>>> least "not dog chewed".
>>> <
>>> This is a consequence of me having done a moderately dog-chewed ISA
>>> in 1983, worked on SPARC for 9 years, then over in x86-64 for 7 years
>>> then having done a GPU ISA, and then retired from working for corporations.
>>> <
>>> What you see is an attempt to combine the best features of RISC with the
>>> best features of CISC (and there are some--much to the chagrin of the
>>> puritans) into a cohesive and mostly orthogonal ISA.
>> Fair enough.
>>>>
>>>> Though, it does seem to lean a little further in the direction of
>>>> immediate bits at the expense of opcode bits.
>>> <
>>> Because it was here that pure RISC ISAs waste so many instructions on
>>> pasting bits together only to use them once as operands. So by inventing
>>> universal constants all of these bit pasting instructions vanish from the
>>> instruction stream.
>> Yeah, this is why I ended up adding jumbo prefixes...
>>
>>
>> Even within a pure RISC, there are better/worse:
>> OK: LDSH/SHORI
>> Worse: LUI+ADD or similar;
>> BAD: PC-relative Load
> <
> Only when "done wrong".
> <
> LDD R7,[IP,0x1234]
> <
> Is one <word> instruction using R0 as a proxy for IP when used as a base register.

>>
>> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
>> constants, whereas LUI doesn't.
>>
> LDSH = Load Signed Half Word ??
> SHORI = Store Half OR Immediate ??

LDSH = Load-via-Shift (the name I originally came up with for BJX1).
SHORI = Shift-with-OR (the name Hitachi came up with for SH5).

Both basically being the same mechanism:
Rn = (Rn<<16)|Imm16u;
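
Here is a minimal C model of that mechanism. It shows how four such steps build an arbitrary 64-bit constant from 16-bit chunks, most significant chunk first. Starting from zero is an assumption of this sketch, and `shori`/`build_const64` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* One LDSH/SHORI-style step: Rn = (Rn << 16) | Imm16u. */
static uint64_t shori(uint64_t rn, uint16_t imm16u)
{
    return (rn << 16) | imm16u;
}

/* Build a 64-bit constant with four steps, most significant chunk first.
   Starting from rn = 0 is an assumption of this sketch. */
static uint64_t build_const64(uint64_t value)
{
    uint64_t rn = 0;
    for (int shift = 48; shift >= 0; shift -= 16)
        rn = shori(rn, (uint16_t)(value >> shift));
    return rn;
}
```

Because OR never carries, each 16-bit chunk lands unchanged. Extending the scheme to 64 bits only takes more steps, with no fixups between them.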

In both ISA branches, Load/Store (from memory) uses a MOV.x name, such as:
MOV.W (R4), R9 //BJX2 notation
MOV.W @R4, R9 //SuperH notation

The original SH-2/4 ISA had instead used dedicated PC-relative load
instructions, IIRC:
MOV.W (PC, Disp8), R0
MOV.L (PC, Disp8), R0

But, these were a pain...

Basically, the assembler would need to find spots to silently dump a
blob of whatever constants were pending, or silently in the middle of
the instruction stream if the distance got large enough (anywhere near a
hard limit of 512 bytes).

Typically, this would also involve emitting a branch over the blob of
constants (along with a NOP since branches in SH had a delay slot), ...
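
Below is a simplified C sketch of the flush decision such an assembler has to make. It assumes loads that can only reach forward, so the pool must follow its users. It also assumes a reach of 510 bytes, i.e. an 8-bit displacement scaled by 2 as for `MOV.W`. It ignores alignment padding and any PC bias in the real encoding, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. oldest_ref: address of the earliest load still
   waiting for its constant; cur: next emission address; next_bytes: size
   of the instruction about to be emitted; escape_bytes: branch plus
   delay-slot NOP jumping over the pool; pool_bytes: pending pool size;
   reach: largest forward displacement the load can encode.
   Returns true if the pool must be emitted before the next instruction. */
static bool must_flush_pool(uint32_t oldest_ref, uint32_t cur,
                            uint32_t next_bytes, uint32_t escape_bytes,
                            uint32_t pool_bytes, uint32_t reach)
{
    assert(cur >= oldest_ref);
    /* Conservatively measure to the end of the pool, as it would be laid
       out after one more instruction plus the escape branch. */
    uint32_t far_end = cur + next_bytes + escape_bytes + pool_bytes;
    return far_end - oldest_ref > reach;
}
```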

>>
>> Ironically, despite being a microcontroller RISC, the IMM
>> prefix-instruction in MicroBlaze is also functionally similar to a jumbo
>> prefix.
> <
> STD 3.141592653589278643,[R3,R7<<3,DISP64]
> <
> Is 1 instruction, issues in 1 cycle, wastes no temporary registers,.......
> That is, you can store an arbitrary constant anywhere in memory
> using any addressing mode at any time with a single instruction.

Possible.

Pulling similar off in my case would likely require 3 instructions
(assuming the RiMOV extension), or 4 (otherwise).

But, this is not a common case...

>>>>
>>>>
>>>> But, OTOH, there are tradeoffs here.
>>>>
>>>>
>>>>
>>>> And, admittedly, on the other side, not as many people are as likely to
>>>> agree to my sentiment that 9 bits for most immediate and displacement
>>>> fields is "mostly sufficient".
>>> <
>>> I agree it is "mostly sufficient", but wouldn't you rather have "almost entirely
>>> sufficient" instead of "mostly sufficient" ?? i.e., 16-bits
> <
>> It is mostly a difference of a few percent if going by my stats.
>> 9 bits still "wipes the floor" with 5 or 6 bit displacement fields.
>> 12 (scaled) does a little better, but not enough to justify 33% more bits.
> <
> I don't think you could point to a place where I sacrificed anything to enable
> almost all integer and memory references getting 16-bit immediates.
> Whereas, EMBench demonstrates that RISC-V's 12-bit displacements
> are insufficient for most memory accesses. {Almost as if EMBench
> had been designed to illustrate that disparity.}

I will not claim that 9 bits gets universal coverage, but in the
programs I have been running this far, it has good coverage (and just
slightly better than the RISC-V strategy on average despite having 3
fewer bits).

Granted... packed structures in my case would require displacements to
be shuffled through R0 (there is a special case for unscaled R0
displacements; like in its SuperH ancestors...).

>>
>> The practical difference between 96.9% and 99.5% is "not that huge",
>> whereas the difference from 60% (scaled) or 20% (unscaled) for a 5u or
>> 6s displacement is quite a bit more significant.
>>
> You are still operating under the assumption that I had to sacrifice
> anything.

There is less space for opcode bits.

Something like x86 SSE or ARM NEON style SIMD would likely be an issue
for encoding space, at least in terms of 32-bit ops... Granted, I am
guessing you probably also have an "escape hatch" for more opcode space?...

>>
>> Though, the 9-bit cases effectively expand to 10-bit signed in XG2,
>> partly because, while 9-bit unsigned won out over 9-bit signed, 10-bit
>> signed wins out over 10-bit unsigned (but, it was pretty close here).
>>
>>
>> Ironically, both 9-bit unsigned and 10-bit signed, with a displacement
>> scale, manage to slightly beat out the 12-bit signed/unscaled
>> displacement style used by RISC-V.
>>
>> Say, Disp12s can reach +/-2K. Whereas, scaled Disp9u (for QWORD) can
>> reach 4K.
> <
> This is the same argument I used in Mc 88100 arguing that displacement
> arithmetic need not be signed (ala IBM 360), that the arithmetic was
> congruent (could be rephrased in the same number of instructions),
> and would allow certain linker tricks.
> <
> The compiler people wouldn't even discuss it.

MitchAlsup <MitchAlsup@aol.com> writes:
>On Saturday, August 19, 2023 at 11:10:41 AM UTC-5, BGB wrote:
>> On 8/19/2023 9:30 AM, Scott Lurndal wrote:
>> > MitchAlsup <Mitch...@aol.com> writes:
>> >> On Friday, August 18, 2023 at 7:50:16 PM UTC-5, JimBrakefield wrote:
>> >
>> >>> And, what is the percentage of 32 or 36 bit compiler generated instructions that will easily fit into 27-bits??
>> >> <
>> >> My guess (1st order) is "enough" will compared to the times one needs 36-bits for a big instruction.
>> >> {This comes with the implication that 36-bit instructions are less than 20% of instruction stream}
>> >> <
>> >> But how do you take a trap and get back between the 27-bit and the 36-bit instruction ??
>> >> Or between the 36-bit instruction and the 27-bit instruction ??
>> >
>> > Add a bit to the PC to record which part is next? Use something
>> > like the PDP-8 link register? Record it in the processor status
>> > register (e.g. like ARM Thumb IT instruction state)?
>> Hmm, what about an ISA where instructions are mostly a prime number of
>> bytes:
>> 2, 3, 5, 7, 11.
>>
>> xxxx-xxxx xxxx-xxx0
>> xxxx-xxxx xxxx-xxxx xxxx-xx01
>> xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-x011
>> ...
>>
>> Then, say:
>> 16 bit ops have 2 4-bit register fields.
>> 24 bit ops have 3 5-bit register fields.
>> 40 bit ops have 3 6-bit register fields.
><
>I see not giving full access to the whole RF as a poor choice,

That's one of the "features" of ARM's Thumb32. The 32-bit
instructions have access to all 16 registers, while the
16-bit instructions only access the first 8.
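
In the prime-length scheme quoted above, the instruction length can be read from a run of trailing one bits in the first byte. Below is a C sketch, assuming little-endian fetch so those bits arrive first. The 7- and 11-byte patterns are an extrapolation of the quoted `0`, `01`, `011` prefixes, since the post elides them. Treating five or more trailing ones as reserved is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes from the low bits of the first instruction byte:
   ...0 -> 2, ...01 -> 3, ...011 -> 5, ...0111 -> 7 (assumed),
   ...01111 -> 11 (assumed). Returns 0 for the reserved pattern. */
static unsigned insn_length(uint8_t first_byte)
{
    static const unsigned len_by_ones[] = { 2, 3, 5, 7, 11 };
    unsigned ones = 0;
    while (ones < 5 && ((first_byte >> ones) & 1))
        ones++;
    return ones < 5 ? len_by_ones[ones] : 0;
}
```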

Re: More of my philosophy about CISC and RISC instructions..

<299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33712&group=comp.arch#33712

Newsgroups: comp.arch
Date: Sat, 19 Aug 2023 14:54:33 -0700 (PDT)
In-Reply-To: <ubra6f$10m81$1@dont-email.me>
Message-ID: <299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 19 Aug 2023 21:54:34 +0000

by: MitchAlsup - Sat, 19 Aug 2023 21:54 UTC

On Saturday, August 19, 2023 at 3:54:44 PM UTC-5, BGB wrote:
> On 8/19/2023 2:12 PM, MitchAlsup wrote:

> >> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
> >> constants, whereas LUI doesn't.
> >>
> > LDSH = Load Signed Half Word ??
> > SHORI = Store Half OR Immediate ??
> LDSH = Load-via-Shift (the name I originally came up with for BJX1).
> SHORI = Shift-with-OR (the name Hitachi came up with for SH5).
>
> Both basically being the same mechanism:
> Rn = (Rn<<16)|Imm16u;
<
lack of addition is problematic.
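
For contrast, RISC-V's LUI+ADDI pair takes the additive approach. ADDI sign-extends its 12-bit immediate, so the upper 20 bits must be rounded up whenever bit 11 of the constant is set. Below is a C sketch of that split with illustrative function names, covering only the 32-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit constant into LUI's upper 20 bits and ADDI's signed
   12-bit immediate. When bit 11 is set, the low part is negative and
   the upper part absorbs a +1 carry. */
static void split_lui_addi(uint32_t value, uint32_t *hi20, int32_t *lo12)
{
    uint32_t lo = value & 0xFFFu;
    *lo12 = (lo >= 0x800u) ? (int32_t)lo - 0x1000 : (int32_t)lo; /* -2048..2047 */
    *hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;                 /* carry-adjusted */
}

/* What LUI followed by ADDI leaves in the low 32 bits. */
static uint32_t recombine(uint32_t hi20, int32_t lo12)
{
    return (hi20 << 12) + (uint32_t)lo12;   /* wraps modulo 2^32 */
}
```

For 0x12345FFF the split is hi20 = 0x12346, lo12 = -1. The OR-based shift scheme never needs this kind of fixup.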
>
> In both ISA branches, Load/Store (from memory) uses a MOV.x name, such as:
> MOV.W (R4), R9 //BJX2 notation
> MOV.W @R4, R9 //SuperH notation
>
An inbound memory reference should be spelled LD
An outbound memory reference should be spelled ST
>

> >> Ironically, despite being a microcontroller RISC, the IMM
> >> prefix-instruction in MicroBlaze is also functionally similar to a jumbo
> >> prefix.
> > <
> > STD 3.141592653589278643,[R3,R7<<3,DISP64]
> > <
> > Is 1 instruction, issues in 1 cycle, wastes no temporary registers,.......
> > That is, you can store an arbitrary constant anywhere in memory
> > using any addressing mode at any time with a single instruction.
> Possible.
>
> Pulling similar off in my case would likely require 3 instructions
> (assuming the RiMOV extension), or 4 (otherwise).
<
RISC-V typically uses 3 or 4 instructions:
AUIPC; LD const; LDHI; ST location
>
> But, this is not a common case...
<
I can show subroutines with 10 of these in a row.
<
-------------
> > You are still operating under the assumption that I had to sacrifice
> > anything.
<
> There is less space for opcode bits.
<
Not compared to MIPS, Mc 88100, RISC-V.
And every OpCode Group has space remaining.
>
> Something like x86 SSE or ARM NEON style SIMD would likely be an issue
> for encoding space, at least in terms of 32-bit ops... Granted, I am
> guessing you probably also have an "escape hatch" for more opcode space?...
<
All of the unused OpCode Groups are reserved for the future. There are 22
(out of 64) Major OpCodes for future expansion. Given that I consumed
21 for 16-bit immediates, I think there is plenty (at least for the rest of
my lifetime.) Also notice I got Vectorization and SIMD into 2 instructions.
> >>
> >> Though, the 9-bit cases effectively expand to 10-bit signed in XG2,
> >> partly because, while 9-bit unsigned won out over 9-bit signed, 10-bit
> >> signed wins out over 10-bit unsigned (but, it was pretty close here).
> >>
> >>
> >> Ironically, both 9-bit unsigned and 10-bit signed, with a displacement
> >> scale, manage to slightly beat out the 12-bit signed/unscaled
> >> displacement style used by RISC-V.
> >>
> >> Say, Disp12s can reach +/-2K. Whereas, scaled Disp9u (for QWORD) can
> >> reach 4K.
> > <
> > This is the same argument I used in Mc 88100 arguing that displacement
> > arithmetic need not be signed (ala IBM 360), that the arithmetic was
> > congruent (could be rephrased in the same number of instructions),
> > and would allow certain linker tricks.
> > <
> > The compiler people wouldn't even discuss it.
> The arithmetic is not unsigned, but the 9-bit displacements are.
>
> Argument is that this last bit can increase hit-rate from around 88%
> to 97%, whereas a sign bit would have gained only a fraction of a percent.
<
My measured data indicates that about 93% of integer constants are positive.
And that positive has a much wider span than negatives:: such that if one
dedicated 7/8th of the constants to positive and 1/8 to negatives, both
sides would be served better than 1/2 and 1/2.
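
Below is one way such a 7/8 : 1/8 split could be decoded, as a purely hypothetical C sketch not taken from any published encoding. Code points whose top three bits are all ones are treated as negative, and everything else as non-negative. In hardware that test is just an AND of the top three bits feeding the sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Decode an n-bit immediate where the lower 7/8 of the code points are
   non-negative and the top 1/8 (top three bits all ones) are negative. */
static int32_t decode_asym_imm(uint32_t field, unsigned n)
{
    assert(n >= 3 && n <= 30 && field < (1u << n));
    uint32_t split = 7u << (n - 3);   /* first negative code point */
    return field < split ? (int32_t)field
                         : (int32_t)field - (int32_t)(1u << n);
}
```

For a 16-bit field this gives 0..57343 and -8192..-1, versus -32768..32767 for plain two's complement.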
>
> However, from 9 to 10 bits, sign won out (in both cases, it was only
> a fraction of a percent).
>
> Reaching 100% would still require a significantly larger displacement.
> >>
> >>
> >> Granted, the RISC-V strategy (unscaled displacements) would be more of a
> >> win if the general case use of packed structs or similar was "actually a
> >> thing".
> >>
> > What you are saying is that "If RISC-V hadn't screwed up so many things it
> > would have been a significantly better ISA". And no one could possibly disagree
> > with you.
> Probably.
Only Probably ?!?
>
> Scaled displacements, Register-indexed Load/Store, Constant-loading that
> "doesn't suck", ...
>
> Yet, it is seemingly the most popular open ISA at this point.
>
An Open ISA simply means somebody else can come in and dump a crapload
of new OpCodes where you wanted to put your next generation feature that
had been so carefully worked out to fit exactly right there without adding
any gates to the decoder.
>
> In most other regards, I would put my bet instead on ARMv8, except:
> Not an open ISA;
> ALU condition codes, bleh...
> >>
> >> For ALU immediate values, 9 bits still gets ~ 95%, 12 bits would get ~
> >> 97%. Both beat out Imm5 at roughly 54%, ...
> > <
> > But now you have to route all sorts of different sizes from the instruction
> > to various operand busses, whereas I only have to route {16,32-64}-bits.
> > This takes less decode logic and less multiplexing logic in the <time
> > critical> forwarding "loop".
> After the decode stage, all the pipeline sees is a 33-bit value...
<
On a 64-bit machine ?!?
>
> Granted, during decode, the decoder needs to deal with all of the
> various possible instruction layouts.
>
> So, say:
> Lookup opcode based on the various bits;
<
if( 6 <= inst.major <= 14 ) then OpCode format is from OpCode
else OpCode format is from Major
// but the important thing is that all register specifiers are always in the same
// bit positions
<
> Finds where it is routed to;
if( 9<= inst.major <= 10) MODIF determines routing
if( inst.major == 12 ) MOD determines routing
> Finds the "FormID" which tells which instruction layout was used/...
> Unpack instruction based on FormID rules;
>
> Outer decoder maps the decoded instruction decoder's outputs to the
> pipeline's lanes and register ports.
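
The range checks in the reply above can be written out in C. C does not support chained comparisons such as `6 <= x <= 14`: that expression compares a 0/1 result against 14, so each range must be spelled out. The position of the 6-bit major opcode in bits 31..26 is an assumption of this sketch, as are all names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FMT_FROM_MAJOR, FMT_FROM_OPCODE } FormatSource;
typedef enum { ROUTE_DEFAULT, ROUTE_BY_MODIF, ROUTE_BY_MOD } RouteSource;

/* Assumed: 6-bit major opcode in bits 31..26 of a 32-bit word. */
static unsigned major_of(uint32_t insn) { return insn >> 26; }

/* Majors 6..14 take their format from the OpCode field. */
static FormatSource format_source(unsigned major)
{
    assert(major < 64);
    return (major >= 6 && major <= 14) ? FMT_FROM_OPCODE : FMT_FROM_MAJOR;
}

/* Majors 9..10 route by MODIF, major 12 routes by MOD. */
static RouteSource route_source(unsigned major)
{
    assert(major < 64);
    if (major >= 9 && major <= 10) return ROUTE_BY_MODIF;
    if (major == 12) return ROUTE_BY_MOD;
    return ROUTE_DEFAULT;
}
```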
> > <
> > RISC-V then compounds this problem by adding compression.
> Yeah.
> It also looks simple, but in some ways is kind of a pain.
>
> RVC is "kinda evil" in some ways. Doesn't map easily to "hey, unpack
> this instruction according to this particular layout", as there are a
> number of one-off deviations, ...
<
Just wait until people add their own OpCodes to this compressed space.

> > If it weren't for benchmarketeering, the compiler would never have had to know.
> Granted.

On 8/19/2023 4:54 PM, MitchAlsup wrote:
> On Saturday, August 19, 2023 at 3:54:44 PM UTC-5, BGB wrote:
>> On 8/19/2023 2:12 PM, MitchAlsup wrote:
>
>>>> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
>>>> constants, whereas LUI doesn't.
>>>>
>>> LDSH = Load Signed Half Word ??
>>> SHORI = Store Half OR Immediate ??
>> LDSH = Load-via-Shift (the name I originally came up with for BJX1).
>> SHORI = Shift-with-OR (the name Hitachi came up with for SH5).
>>
>> Both basically being the same mechanism:
>> Rn = (Rn<<16)|Imm16u;
> <
> lack of addition is problematic.
>>
>> In both ISA branches, Load/Store (from memory) uses a MOV.x name, such as:
>> MOV.W (R4), R9 //BJX2 notation
>> MOV.W @R4, R9 //SuperH notation
>>
> An inbound memory reference should be spelled LD
> An outbound memory reference should be spelled ST

In some naming conventions (in many traditional RISCs).

Less true of where I started from...

Major influences on the design were:
SuperH, TMS320, MSP430, ...

How do these ISAs name their Load/Store ops?
MOV.x
...

Seemingly, M68K and similar also followed a lot of similar conventions.

So, the design didn't originally "evolve" out of something like RISC-V
or similar, rather, it evolved out of SuperH with influence from TMS320,
but then managed to go in a convergent direction towards RISC-V in some
areas...

>>
>
>>>> Ironically, despite being a microcontroller RISC, the IMM
>>>> prefix-instruction in MicroBlaze is also functionally similar to a jumbo
>>>> prefix.
>>> <
>>> STD 3.141592653589278643,[R3,R7<<3,DISP64]
>>> <
>>> Is 1 instruction, issues in 1 cycle, wastes no temporary registers,.......
>>> That is, you can store an arbitrary constant anywhere in memory
>>> using any addressing mode at any time with a single instruction.
>> Possible.
>>
>> Pulling similar off in my case would likely require 3 instructions
>> (assuming the RiMOV extension), or 4 (otherwise).
> <
> RISC-V typically uses 3 or 4 instructions:
> AUIPC; LD const; LDHI; ST location

You would need more than this to represent such an address mode.

I would estimate this case would need more like 8 instructions for RISC-V:
AUIPC; LDD; AUIPC; LDD; SLL; ADD; ADD; STD

My case, it is mostly the addressing mode:
MOV Imm64, R16
MOV Disp64, R17
ADD R3, R17, R17
MOV.Q R16, (R17, R7)

If it were a Disp33:
MOV Imm64, R16
LEA.B (R3, Disp33s), R17
MOV.Q R16, (R17, R7)

>>
>> But, this is not a common case...
> <
> I can show subroutines with 10 of these in a row.
> <
> -------------
>>> You are still operating under the assumption that I had to sacrifice
>>> anything.
> <
>> There is less space for opcode bits.
> <
> Not compared to MIPS, Mc 88100, RISC-V.
> And every OpCode Group has space remaining.

OK.

As noted, I had concerns before for the encoding space left over in
RISC-V once the various extensions were considered.

But, yeah, many of my existing encoding blocks are already mostly full.
Apart from F3 and F9, both of which have ~ 24 bits of unassigned space.

Within the F0 block, remaining space is:
F0-7 (partial)
F0-9/A/B (enough here for ~ 96 3R ops)
F0-E/F (Reclaimed), ~ 64 more 3R ops.

Most of the F0-3-(8..F) 2R space remains free:
Around 256 2R ops;
All of the F0-7-(8..F) 2R space remains free:
Another 256 2R ops;

So, could add around 160 3R ops and 512 2R ops, then F0 would be full.

As is, there are around 264 3R ops, and around 272 2R ops.

The number of mnemonics is a little less as some encodings share mnemonics.

The F2 block still has enough space reserved for around 96 more
"Imm10,Rn" ops. Currently, all assigned "Rm,Imm9,Rn" spots are in use.

The F1 block (used for the Disp9 Load/Store ops) has 1 spot available
(out of 32).

>>
>> Something like x86 SSE or ARM NEON style SIMD would likely be an issue
>> for encoding space, at least in terms of 32-bit ops... Granted, I am
>> guessing you probably also have an "escape hatch" for more opcode space?...
> <
> All of the unused OpCode Groups are reserved for the future. There are 22
> (out of 64) Major OpCodes for future expansion. Given that I consumed
> 21 for 16-bit immediates, I think there is plenty (at least for the rest of
> my lifetime.) Also notice I got Vectorization and SIMD into 2 instructions.

As noted, I would ideally want enough opcode space to fit, say,
several thousand unique instructions.

But wanting some Imm9/Disp9 encodings eats into the opcode space a fair bit.

>>>>
>>>> Though, the 9-bit cases effectively expand to 10-bit signed in XG2,
>>>> partly because, while 9-bit unsigned won out over 9-bit signed, 10-bit
>>>> signed wins out over 10-bit unsigned (but, it was pretty close here).
>>>>
>>>>
>>>> Ironically, both 9-bit unsigned and 10-bit signed, with a displacement
>>>> scale, manage to slightly beat out the 12-bit signed/unscaled
>>>> displacement style used by RISC-V.
>>>>
>>>> Say, Disp12s can reach +/-2K. Whereas, scaled Disp9u (for QWORD) can
>>>> reach 4K.
>>> <
>>> This is the same argument I used in Mc 88100 arguing that displacement
>>> arithmetic need not be signed (ala IBM 360), that the arithmetic was
>>> congruent (could be rephrased in the same number of instructions),
>>> and would allow certain linker tricks.
>>> <
>>> The compiler people wouldn't even discuss it.
>> The arithmetic is not unsigned, but the 9-bit displacements are.
>>
>> The argument is that this last bit can increase the hit rate from around 88%
>> to 97%, whereas a sign bit would have gained only a fraction of a percent.
> <
> My measured data indicates that about 93% of integer constants are positive.
> And that positive has a much wider span than negatives:: such that if one
> dedicated 7/8th of the constants to positive and 1/8 to negatives, both
> sides would be served better than 1/2 and 1/2.

I was running stats separately between displacements and integer
immediate values.

But, yeah, this is how it ended up.

Apart from ADD (and, implicitly, SUB); most of the ALU ops ended up with
positive-only immediate values. These values remained unsigned in XG2,
but were expanded to 10 bits.

Load/Store displacements ended up becoming signed though.
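The reach figures quoted earlier in the thread (unscaled Disp12s covering +/-2K, a QWORD-scaled Disp9u covering 4K) are simple arithmetic. Here is a small C sketch, assuming the scaled forms multiply the field by the 8-byte access size. The Disp10s case stands in for the XG2 widening mentioned above, and the names are hypothetical:

```c
#include <stdint.h>

/* Byte range [lo, hi] reachable by a displacement field. */
typedef struct { int64_t lo, hi; } DispRange;

/* bits: field width; is_signed: two's complement field or not;
 * scale: multiplier applied to the field (1 = unscaled, 8 = QWORD) */
static DispRange disp_range(int bits, int is_signed, int64_t scale)
{
    DispRange r;
    if (is_signed) {
        r.lo = -((int64_t)1 << (bits - 1)) * scale;
        r.hi = (((int64_t)1 << (bits - 1)) - 1) * scale;
    } else {
        r.lo = 0;
        r.hi = (((int64_t)1 << bits) - 1) * scale;
    }
    return r;
}
```

Under these assumptions the ranges come out as -2048..2047 for Disp12s, 0..4088 for Disp9u x8, and -4096..4088 for Disp10s x8. For 8-byte accesses the scaled fields therefore reach about twice as far, at the cost of not being able to express offsets that are not multiples of 8.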

>>
>> However, from 9 to 10 bits, sign won out (in both cases, it was only by
>> a fraction of a percent).
>>
>> Reaching 100% would still require a significantly larger displacement.
>>>>
>>>>
>>>> Granted, the RISC-V strategy (unscaled displacements) would be more of a
>>>> win if the general case use of packed structs or similar was "actually a
>>>> thing".
>>>>
>>> What you are saying is that "If RISC-V hadn't screwed up so many things it
>>> would have been a significantly better ISA". And no one could possibly disagree
>>> with you.
>> Probably.
> Only Probably ?!?
>>
>> Scaled displacements, Register-indexed Load/Store, Constant-loading that
>> "doesn't suck", ...
>>
>> Yet, it is seemingly the most popular open ISA at this point.
>>
> An Open ISA simply means somebody else can come in and dump a crapload
> of new OpCodes where you wanted to put your next generation feature that
> had been so carefully worked out to fit exactly right there without adding
> any gates to the decoder.

Possibly true.

But also the option to be like "that design sucks, I am not going to
adopt it".

And, no one needs to pay royalties.

Like, say, if someone wanted to use BJX2 in their own project, they are
not under any obligation to pay me royalties, this is "just how it is".

Even if, granted, this does mean one needs a "day job" to pay for one's
cost of living and similar.

>>
>> In most other regards, I would put my bet instead on ARMv8, except:
>> Not an open ISA;
>> ALU condition codes, bleh...
>>>>
>>>> For ALU immediate values, 9 bits still gets ~ 95%, 12 bits would get ~
>>>> 97%. Both beat out Imm5 at roughly 54%, ...
>>> <
>>> But now you have to route all sorts of different sizes from the instruction
>>> to various operand busses, whereas I only have to route {16,32-64}-bits.
>>> This takes less decode logic and less multiplexing logic in the <time
>>> critical> forwarding "loop".
>> After the decode stage, all the pipeline sees is a 33-bit value...
> <
> On a 64-bit machine ?!?

BGB <cr88192@gmail.com> writes:
>On 8/19/2023 2:12 PM, MitchAlsup wrote:
>> On Saturday, August 19, 2023 at 11:40:27 AM UTC-5, BGB wrote:
>>> On 8/18/2023 12:52 PM, MitchAlsup wrote:
>>>> On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
>>>>> On 8/16/2023 12:04 PM, pec...@gmail.com wrote:
>>>>>> BGB wrote:
>>>>>>>> I started to think that RVC should be removed from specification, and its opcode space should be essentially free for any use.
>>>>>>>> Code compression could be optional and vendor specific, performed during installation or loading/linking.
>>>>>>>> Compilers are unaware of it anyway and it doesn't affect the size of zipped binaries used for distribution
>>>>>>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
>>>>>>>>
>>>>>>> I would almost be inclined to agree, but more because the existing RVC
>>>>>>> encoding scheme is *awful* (like, someone looked at Thumb and was then
>>>>>>> like, "Hey man, hold my beer!").
>>>>>> That's why I wrote "vendor specific".
>>>>>> Generally compression scheme should be extension-agnostic (=orthogonal), and concentrated on low-end applications, because it is
>>>>>> the only performance boosting feature in the ISA for this segment.
>>>>>>
>>>>>> Unfortunately they (risc nazi) managed to add compressed floating point instructions.
>>>>>> The real irony is that it is the least important area. Most of the cores have no fpu at all. Big cores perform most of the floating point operations in the SIMD units. There is not much room in the market for middle ground.
>>>>>> Moreover, floating point code is quite regular, concentrated in the small loop kernels - performance impact of compression will be negligible.
>>>>>>
>>>>> Yeah.
>>>>>
>>>>> Realistically, a few major things make sense as 16-bit ops:
>>>>> MOV Reg, Reg
>>>>> ADD Reg, Reg
>>>>> MOV Imm, Reg
>>>>> ADD Imm, Reg
>>>>> A selection of basic Load/Store ops;
>>>>> A few short branch encodings;
>>>>> ...
>>>>>
>>>>> It makes sense to give the instructions which appear in the highest
>>>>> densities the shorter encodings, and one can gloss over everything else.
>>>>>
>>>>>
>>>>> Also preferably without the encoding scheme being a dog-chewed mess.
>>>>> Granted, my own ISA is not entirely free of dog-chew, but both it and
>>>>> RISC-V sort of have this in common.
>>>>>
>>>>> Mine has some encoding wonk from its origins as an ISA originally with
>>>>> 16-bit instructions (which, ironically, has been gradually migrating
>>>>> away from its 16-bit origins).
>>>>>
>>>>>
>>>>>
>>>>> Having recently seen some of Mitch's encoding, I can admit that it is at
>>>>> least "not dog chewed".
>>>> <
>>>> This is a consequence of me having done a moderately dog-chewed ISA
>>>> in 1983, worked on SPARC for 9 years, then over in x86-64 for 7 years
>>>> then having done a GPU ISA, and then retired from working for corporations.
>>>> <
>>>> What you see is an attempt to combine the best features of RISC with the
>>>> best features of CISC (and there are some--much to the chagrin of the
>>>> puritans) into a cohesive and mostly orthogonal ISA.
>>> Fair enough.
>>>>>
>>>>> Though, it does seem to lean a little further in the direction of
>>>>> immediate bits at the expense of opcode bits.
>>>> <
>>>> Because it was here that pure RISC ISAs waste so many instructions on
>>>> pasting bits together only to use them once as operands. So by inventing
>>>> universal constants all of these bit pasting instructions vanish from the
>>>> instruction stream.
>>> Yeah, this is why I ended up adding jumbo prefixes...
>>>
>>>
>>> Even within a pure RISC, there are better/worse:
>>> OK: LDSH/SHORI
>>> Worse: LUI+ADD or similar;
>>> BAD: PC-relative Load
>> <
>> Only when "done wrong".
>> <
>> LDD R7,[IP,0x1234]
>> <
>> Is one <word> instruction using R0 as a proxy for IP when used as a base register.
>
>>>
>>> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
>>> constants, whereas LUI doesn't.
>>>
>> LDSH = Load Signed Half Word ??
>> SHORI = Store Half OR Immediate ??
>
>LDSH = Load-via-Shift (the name I originally came up with for BJX1).
>SHORI = Shift-with-OR (the name Hitachi came up with for SH5).
>
>Both basically being the same mechanism:
> Rn = (Rn<<16)|Imm16u;

ARM T32 has:

MOVT = Move a 16-bit immediate value to the top halfword of the destination register.

ARM A64 has:

MOVK = Move a 16-bit immediate value to anywhere in the destination register.

Operation
bits(datasize) result;
result = X[d, datasize];
result<pos+15:pos> = imm16;
X[d, datasize] = result;
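Here is a rough C model of the constant-building styles being compared. It covers the LDSH/SHORI step as given above (Rn = (Rn<<16)|Imm16u), MOVK-style insertion per the A64 pseudocode (with pos a multiple of 16), and, for contrast, a RISC-V-style LUI+ADDI split of a 32-bit constant. It models register values only, not encodings. On RV64 the real LUI also sign-extends to 64 bits, which is ignored here. All function names are hypothetical.

```c
#include <stdint.h>

/* LDSH / SHORI step: Rn = (Rn << 16) | Imm16u */
static uint64_t shori(uint64_t rn, uint16_t imm16)
{
    return (rn << 16) | imm16;
}

/* MOVK step: replace bits [pos+15:pos] of Rd with imm16, keep the rest */
static uint64_t movk(uint64_t rd, uint16_t imm16, unsigned pos)
{
    uint64_t mask = (uint64_t)0xFFFF << pos;
    return (rd & ~mask) | ((uint64_t)imm16 << pos);
}

/* Any 64-bit constant in four SHORI steps, most significant chunk first */
static uint64_t build_with_shori(uint64_t k)
{
    uint64_t r = 0;
    for (int i = 3; i >= 0; i--)
        r = shori(r, (uint16_t)(k >> (16 * i)));
    return r;
}

/* The same constant in four MOVK steps, one per 16-bit slot */
static uint64_t build_with_movk(uint64_t k)
{
    uint64_t r = 0;
    for (unsigned pos = 0; pos < 64; pos += 16)
        r = movk(r, (uint16_t)(k >> pos), pos);
    return r;
}

/* RISC-V style split of a 32-bit constant for LUI + ADDI.
 * ADDI sign-extends its 12-bit immediate, so the LUI part must be
 * rounded up whenever bit 11 of the constant is set. */
static int32_t addi_lo12(uint32_t k)
{
    int32_t lo = (int32_t)(k & 0xFFFu);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

static uint32_t lui_hi20(uint32_t k)
{
    return ((k + 0x800u) >> 12) & 0xFFFFFu;
}

/* What the LUI + ADDI pair computes (32-bit view only) */
static uint32_t lui_addi(uint32_t hi20, int32_t lo12)
{
    return (hi20 << 12) + (uint32_t)lo12;
}
```

SHORI and MOVK both extend to 64 bits by repeating the same step four times. The LUI+ADDI pair tops out at 32 bits and needs the rounding correction because ADDI adds a signed value.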

Re: More of my philosophy about CISC and RISC instructions..

<7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33716&group=comp.arch#33716

Newsgroups: comp.arch
Date: Sun, 20 Aug 2023 08:29:15 -0700 (PDT)
In-Reply-To: <ubs0ng$17b7g$1@dont-email.me>
Message-ID: <7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)

by: MitchAlsup - Sun, 20 Aug 2023 15:29 UTC

On Saturday, August 19, 2023 at 10:19:17 PM UTC-5, BGB wrote:
> On 8/19/2023 4:54 PM, MitchAlsup wrote:
> > On Saturday, August 19, 2023 at 3:54:44 PM UTC-5, BGB wrote:
> >> On 8/19/2023 2:12 PM, MitchAlsup wrote:
> >
> >>>> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
> >>>> constants, whereas LUI doesn't.
> >>>>
> >>> LDSH = Load Signed Half Word ??
> >>> SHORI = Store Half OR Immediate ??
> >> LDSH = Load-via-Shift (the name I originally came up with for BJX1).
> >> SHORI = Shift-with-OR (the name Hitachi came up with for SH5).
> >>
> >> Both basically being the same mechanism:
> >> Rn = (Rn<<16)|Imm16u;
> > <
> > lack of addition is problematic.
> >>
> >> In both ISA branches, Load/Store (from memory) uses a MOV.x name, such as:
> >> MOV.W (R4), R9 //BJX2 notation
> >> MOV.W @R4, R9 //SuperH notation
> >>
> > An inbound memory reference should be spelled LD
> > An outbound memory reference should be spelled ST
> In some naming conventions (in many traditional RISC's).
>
>
> Less true of where I started from...
>
> Major influences on the design were:
> SuperH, TMS320, MSP430, ...
>
> How do these ISA's name their Load/Store ops?
> MOV.x
> ...
>
> Seemingly, M68K and similar also followed a lot of similar conventions.
>
MOV is only appropriate when you can combine both the LD and ST in a
single instruction:: MOV @R4,@r6
>
> So, the design didn't originally "evolve" out of something like RISC-V
> or similar, rather, it evolved out of SuperH with influence from TMS320,
> but then managed to go in a convergent direction towards RISC-V in some
> areas...
> >>
> >
> >>>> Ironically, despite being a microcontroller RISC, the IMM
> >>>> prefix-instruction in MicroBlaze is also functionally similar to a jumbo
> >>>> prefix.
> >>> <
> >>> STD 3.141592653589278643,[R3,R7<<3,DISP64]
> >>> <
> >>> Is 1 instruction, issues in 1 cycle, wastes no temporary registers,........
> >>> That is, you can store an arbitrary constant anywhere in memory
> >>> using any addressing mode at any time with a single instruction.
> >> Possible.
> >>
> >> Pulling similar off in my case would likely require 3 instructions
> >> (assuming the RiMOV extension), or 4 (otherwise).
> > <
> > RISC-V typically uses 3 or 4 instructions:
> > AUIPC; LD const; LDHI; ST location
> You would need more than this to represent such an address mode.
<
Where you is not me but is most other RISCs.
>
> I would estimate this case would need more like 8 instructions for RISC-V:
> AUIPC; LDD; AUIPC; LDD; SLL; ADD; ADD; STD
<
Which is why RISC-V is mediocre at best.
>
>
> My case, it is mostly the addressing mode:
> MOV Imm64, R16
> MOV Disp64, R17
> ADD R3, R17, R17
> MOV.Q R16, (R17, R7)
<
Still 1 instruction in my ISA
<
STD 3.141592653589278643,[R3,R7<<3,DISP64]
>
> If it were a Disp33:
> MOV Imm64, R16
> LEA.B (R3, Disp33s), R17
> MOV.Q R16, (R17, R7)
<
DISP32 form saves 1 word::
<
STD 3.141592653589278643,[R3,R7<<3,DISP32]

> > All of the unused OpCode Groups are reserved for the future. There are 22
> > (out of 64) Major OpCodes for future expansion. Given that I consumed
> > 21 for 16-bit immediates, I think there is plenty (at least for the rest of
> > my lifetime.) Also notice I got Vectorization and SIMD into 2 instructions.
> As noted, I would have assumed having enough opcode space to fit
<
> ideally, say, several thousand unique instructions.
<
Certainly my ISA has room, but remember I get both vectorization and SIMD
out of exactly 2 instructions--instead of 1300..........
>

> >> After the decode stage, all the pipeline sees is a 33-bit value...
> > <
> > On a 64-bit machine ?!?
> Yeah. If you want to pass a 64-bit immediate, it eats multiple lanes...
<
Then it is not really a 64-bit machine in a similar manner that Mc 68000
was a 16-bit machine that could perform 32-bit calculations.
<
> The decoders cooperate to produce a 64-bit value split across two 33-bit
> immediate fields (which may then be glued back together at a later stage).
>
> "A 33 bit immediate should be big enough for anyone..."
>
Even Floating Point ??
>
> Basically, the decoder deals with 64-bit values in a similar way to how
> the ALU ops deal with 128-bit values, namely by having multiple narrower
> units cooperate and give the illusion of a wider unit.
> >>
> >> Granted, during decode, the decoder needs to deal with all of the
> >> various possible instruction layouts.
> >>
> >> So, say:
> >> Lookup opcode based on the various bits;
> > <
> > if( 6 <= inst.major <= 14 ) then OpCode format is from OpCode
> > else OpCode format is from Major
> > // but the important thing is that all register specifiers are always in the same
> > // bit positions
> > <
> >> Finds where it is routed to;
> > if( 9<= inst.major <= 10) MODIF determines routing
> > if( inst.major == 12 ) MOD determines routing
> My instruction format wasn't organized based on where the instruction is
> routed. In some cases, this routing has changed around based on design
> changes within the core (adding or removing units, ...).
<
No, you misunderstand:: it is not where instructions are routed to that I am
talking about, it is where OPERANDS are routed from.
>
> Rather, things were more organized by instruction format, so 3R
> instructions are near other 3R instructions, most 2R instructions are
> consolidated into big blocks, ...
>
Yes, I have this setup, too, but INS and FMAC sit in the same subGroup.
>
> In effect, there is a giant set of nested "casez" blocks for every
> instruction in the ISA.
<
I do this with tabularized subroutines:: three_operand[opcode](arguments);
where the routing <from> is performed as setup to arguments.
>
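The "tabularized subroutines" approach can be sketched in C as an array of function pointers indexed by the opcode field. Operand routing is done once, up front, when the argument list is built. The opcode numbering, the Operands struct, and the handler names below are hypothetical. They only illustrate the shape of the technique, not any real decoder.

```c
#include <stdint.h>

typedef struct {
    uint64_t src1, src2;   /* operand values, already routed from regs or immediates */
} Operands;

typedef uint64_t (*ThreeOpFn)(Operands);

static uint64_t op_add(Operands o) { return o.src1 + o.src2; }
static uint64_t op_sub(Operands o) { return o.src1 - o.src2; }
static uint64_t op_and(Operands o) { return o.src1 & o.src2; }
static uint64_t op_or (Operands o) { return o.src1 | o.src2; }

/* Hypothetical 2-bit opcode field -> handler */
static const ThreeOpFn three_operand[4] = { op_add, op_sub, op_and, op_or };

/* Routing "from": pick a register or the immediate for src2 before the
 * call, so the handlers never need to know where operands came from. */
static uint64_t execute(unsigned opcode, const uint64_t regs[32],
                        unsigned rs1, unsigned rs2, int use_imm, int64_t imm)
{
    Operands o;
    o.src1 = regs[rs1 & 31];
    o.src2 = use_imm ? (uint64_t)imm : regs[rs2 & 31];
    return three_operand[opcode & 3](o);
}
```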

Re: More of my philosophy about CISC and RISC instructions..

<ubtq4l$1gv5c$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33717&group=comp.arch#33717

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Sun, 20 Aug 2023 14:38:59 -0500
Message-ID: <ubtq4l$1gv5c$1@dont-email.me>
In-Reply-To: <7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>

by: BGB - Sun, 20 Aug 2023 19:38 UTC

On 8/20/2023 10:29 AM, MitchAlsup wrote:
> On Saturday, August 19, 2023 at 10:19:17 PM UTC-5, BGB wrote:
>> On 8/19/2023 4:54 PM, MitchAlsup wrote:
>>> On Saturday, August 19, 2023 at 3:54:44 PM UTC-5, BGB wrote:
>>>> On 8/19/2023 2:12 PM, MitchAlsup wrote:
>>>
>>>>>> Main advantage of LDSH/SHORI being that it expands easily to 64-bit
>>>>>> constants, whereas LUI doesn't.
>>>>>>
>>>>> LDSH = Load Signed Half Word ??
>>>>> SHORI = Store Half OR Immediate ??
>>>> LDSH = Load-via-Shift (the name I originally came up with for BJX1).
>>>> SHORI = Shift-with-OR (the name Hitachi came up with for SH5).
>>>>
>>>> Both basically being the same mechanism:
>>>> Rn = (Rn<<16)|Imm16u;
>>> <
>>> lack of addition is problematic.
>>>>
>>>> In both ISA branches, Load/Store (from memory) uses a MOV.x name, such as:
>>>> MOV.W (R4), R9 //BJX2 notation
>>>> MOV.W @R4, R9 //SuperH notation
>>>>
>>> An inbound memory reference should be spelled LD
>>> An outbound memory reference should be spelled ST
>> In some naming conventions (in many traditional RISC's).
>>
>>
>> Less true of where I started from...
>>
>> Major influences on the design were:
>> SuperH, TMS320, MSP430, ...
>>
>> How do these ISA's name their Load/Store ops?
>> MOV.x
>> ...
>>
>> Seemingly, M68K and similar also followed a lot of similar conventions.
>>
> MOV is only appropriate when you can combine both the LD and ST in a
> single instruction:: MOV @R4,@r6

I didn't make this convention.
It was more one of those things that ended up "grandfathered in".

I just sort of made the stylistic change from "@Reg" to "(Reg)", but for
the most part the assembler will still accept "@Reg".

BGBCC will also accept auto-increment notation, but these are
faked with a multi-op sequence:
MOV.L R4, @-R6
Emitted as:
ADD -4, R6
MOV.L R4, (R6)

Could have faked a "MOV.L @R4, @R5" instruction, but didn't, as the
ancestor ISA didn't have this either.

In my newer (still incomplete) TKUCC effort, the handling for most of
these "fake" instructions was dropped, so the ASM will need to be
written more in terms of what instructions actually exist.

Though, one other difference was that TKUCC generally assumed
that jumbo prefixes always exist.

>>
>> So, the design didn't originally "evolve" out of something like RISC-V
>> or similar, rather, it evolved out of SuperH with influence from TMS320,
>> but then managed to go in a convergent direction towards RISC-V in some
>> areas...
>>>>
>>>
>>>>>> Ironically, despite being a microcontroller RISC, the IMM
>>>>>> prefix-instruction in MicroBlaze is also functionally similar to a jumbo
>>>>>> prefix.
>>>>> <
>>>>> STD 3.141592653589278643,[R3,R7<<3,DISP64]
>>>>> <
>>>>> Is 1 instruction, issues in 1 cycle, wastes no temporary registers,.......
>>>>> That is, you can store an arbitrary constant anywhere in memory
>>>>> using any addressing mode at any time with a single instruction.
>>>> Possible.
>>>>
>>>> Pulling similar off in my case would likely require 3 instructions
>>>> (assuming the RiMOV extension), or 4 (otherwise).
>>> <
>>> RISC-V typically uses 3 or 4 instructions:
>>> AUIPC; LD const; LDHI; ST location
>> You would need more than this to represent such an address mode.
> <
> Where you is not me but is most other RISCs.

I meant on RISC-V...

That addressing mode kinda "steps in it".

>>
>> I would estimate this case would need more like 8 instructions for RISC-V:
>> AUIPC; LDD; AUIPC; LDD; SLL; ADD; ADD; STD
> <
> Which is why RISC-V is mediocre at best.
>>
>>
>> My case, it is mostly the addressing mode:
>> MOV Imm64, R16
>> MOV Disp64, R17
>> ADD R3, R17, R17
>> MOV.Q R16, (R17, R7)
> <
> Still 1 instruction in my ISA
> <
> STD 3.141592653589278643,[R3,R7<<3,DISP64]
>>
>> If it were a Disp33:
>> MOV Imm64, R16
>> LEA.B (R3, Disp33s), R17
>> MOV.Q R16, (R17, R7)
> <
> DISP32 form saves 1 word::
> <
> STD 3.141592653589278643,[R3,R7<<3,DISP32]
>

Here it saves a constant load, since the largest allowed displacement
encoding is 33 bits. While it could theoretically be encoded, a larger
fixed displacement would not easily be supported with the current
implementation.

If one were to use a simpler addressing mode, this case could drop to 2
instructions.

There is not currently any way to directly store a constant to memory.
Similarly, there are still some other implementation limits at present,
like there is only support for a single immediate/displacement for a
given instruction (at least short of using multiple lanes and some
additional decoder hackery).

With the RiMOV extension, there is, however:
MOV.Q R2, (R3, R7, Disp11u)

But, unlike the normal displacements, this displacement is unscaled and
can't currently be expanded with a jumbo prefix.

A similar encoding was used for instructions like:
DMACS.L R4, R5, R6, R7 //R7=R4*R5+R6
But, this feature is also an optional extension.

>>> All of the unused OpCode Groups are reserved for the future. There are 22
>>> (out of 64) Major OpCodes for future expansion. Given that I consumed
>>> 21 for 16-bit immediates, I think there is plenty (at least for the rest of
>>> my lifetime.) Also notice I got Vectorization and SIMD into 2 instructions.
>> As noted, I would have assumed having enough opcode space to fit
> <
>> ideally, say, several thousand unique instructions.
> <
> Certainly my ISA has room, but remember I get both vectorization and SIMD
> out of exactly 2 instructions--instead of 1300..........

There end up being a lot of special cases even for integer ops, say:
ADD Rm, Ro, Rn
ADD Rm, Imm9u, Rn //zero extended
ADD Rm, Imm9n, Rn //one extended
ADD Imm16u, Rn
ADD Imm16n, Rn

ADDS.L Rm, Ro, Rn //sign-extend result from 32-bits
ADDS.L Rm, Imm9u, Rn //zero extended
ADDS.L Rm, Imm9n, Rn //one extended

ADDU.L Rm, Ro, Rn //zero-extend result from 32-bits
ADDU.L Rm, Imm9u, Rn //zero extended
ADDU.L Rm, Imm9n, Rn //one extended

...
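Here is a minimal C model of the extension rules in the list above. It assumes 64-bit registers, that Imm9u is zero-extended and Imm9n one-extended (every bit above the 9-bit field set, so -512..-1), and that ADDS.L/ADDU.L form the full sum and then sign- or zero-extend its low 32 bits. Function names are hypothetical:

```c
#include <stdint.h>

/* Imm9u: zero-extend a 9-bit field -> 0..511 */
static uint64_t imm9u(uint32_t field) { return field & 0x1FFu; }

/* Imm9n: one-extend a 9-bit field -> -512..-1 (as a 64-bit pattern) */
static uint64_t imm9n(uint32_t field)
{
    return (uint64_t)(field & 0x1FFu) | ~(uint64_t)0x1FF;
}

/* ADD: plain 64-bit add */
static uint64_t add64(uint64_t a, uint64_t b) { return a + b; }

/* ADDS.L: add, then sign-extend the result from 32 bits */
static uint64_t adds_l(uint64_t a, uint64_t b)
{
    int64_t lo = (int64_t)(uint32_t)(a + b);
    return (uint64_t)((lo ^ 0x80000000LL) - 0x80000000LL);  /* portable sign-extend */
}

/* ADDU.L: add, then zero-extend the result from 32 bits */
static uint64_t addu_l(uint64_t a, uint64_t b)
{
    return (uint64_t)(uint32_t)(a + b);
}
```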

Or, variant semantics:
FADD Rm, Ro, Rn //FPU ADD, Binary64, fixed RNE
FADDG Rm, Ro, Rn //FPU ADD, Binary64, dynamic rounding mode
FADDA Rm, Ro, Rn //FPU ADD, Binary64, fake Binary32 RNE
...

Or:
FADD Rm, Imm5fp, Rn //FPIMM
...

SIMD ops, eg:
PADD.H Rm, Ro, Rn //Packed ADD 4x Binary16
PADD.F Rm, Ro, Rn //Packed ADD 2x Binary32
PADDX.F Xm, Xo, Xn //Packed ADD 4x Binary32

PADD.W Rm, Ro, Rn //Packed ADD 4x Int16
PADD.L Rm, Ro, Rn //Packed ADD 2x Int32
...
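For the integer cases, PADD.W-style lane-wise addition can be modeled in plain C with the usual SWAR trick. First add with the top bit of every lane masked off, so no carry crosses a lane boundary, then patch the top bits back in with XOR. This is a generic sketch of wrap-around (non-saturating) packed addition on a 64-bit value, not the BJX2 implementation.

```c
#include <stdint.h>

/* Lane-wise wrapping add of packed integers held in one 64-bit word.
 * hi_bits has the most significant bit of each lane set. */
static uint64_t swar_add(uint64_t a, uint64_t b, uint64_t hi_bits)
{
    uint64_t low = (a & ~hi_bits) + (b & ~hi_bits); /* cannot carry out of a lane */
    return low ^ ((a ^ b) & hi_bits);               /* top bit = a ^ b ^ carry-in */
}

static uint64_t padd_w(uint64_t a, uint64_t b)      /* 4x Int16 */
{
    return swar_add(a, b, 0x8000800080008000ULL);
}

static uint64_t padd_l(uint64_t a, uint64_t b)      /* 2x Int32 */
{
    return swar_add(a, b, 0x8000000080000000ULL);
}
```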

Didn't bother with signed and unsigned saturate variants, at least for
32-bit encodings (things like "PADDSS.W"/"PADDUS.W"/... would add a lot
of ops).

And, a lot of format converter ops, ...

PLDCH Rm, Rn //2x Binary16 (Low bits) to 2x Binary32
PLDCHH Rm, Rn //2x Binary16 (High bits) to 2x Binary32
PLDCXH Rm, Xn //4x Binary16 to 4x Binary32

PSTCH Rm, Rn //2x Binary32 to 2x Binary16
...

RGB5UPCK64 Rm, Rn //Unpack RGB555 to 64-bit (16b per component)
RGB5PCK64 Rm, Rn //Pack 64-bit to RGB555
...
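RGB5UPCK64 / RGB5PCK64 are described only as "unpack RGB555 to 64-bit (16b per component)" and the reverse. The C sketch below shows one plausible reading. It puts each component in its own 16-bit lane and widens 5 bits to 16 by bit replication, so that 0x1F maps to 0xFFFF. The lane order (B in the low lane), the replication scheme, and ignoring bit 15 of the input are all assumptions, not the documented semantics of those instructions.

```c
#include <stdint.h>

/* Widen a 5-bit component to 16 bits by repeating its bit pattern. */
static uint16_t expand5to16(uint32_t c5)
{
    c5 &= 0x1F;
    return (uint16_t)((c5 << 11) | (c5 << 6) | (c5 << 1) | (c5 >> 4));
}

/* Unpack RGB555 (R in bits 14..10, G in 9..5, B in 4..0) into lanes
 * [63:48]=0, [47:32]=R, [31:16]=G, [15:0]=B (assumed layout). */
static uint64_t rgb5_unpack64(uint16_t px)
{
    uint64_t r = expand5to16(px >> 10);
    uint64_t g = expand5to16(px >> 5);
    uint64_t b = expand5to16(px);
    return (r << 32) | (g << 16) | b;
}

/* Pack back: keep the top 5 bits of each 16-bit lane. */
static uint16_t rgb5_pack64(uint64_t v)
{
    uint32_t r = (uint32_t)(v >> 43) & 0x1F;   /* bits 47..43 */
    uint32_t g = (uint32_t)(v >> 27) & 0x1F;   /* bits 31..27 */
    uint32_t b = (uint32_t)(v >> 11) & 0x1F;   /* bits 15..11 */
    return (uint16_t)((r << 10) | (g << 5) | b);
}
```

Because the top 5 bits of each widened lane are the original component, packing an unpacked pixel gives back the same 15 bits.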

Though, these sorts of converter ops have resulted in a fair number of
mnemonics.

But, yeah, as noted, assuming that no more "heavy eaters" are added, the
remaining F3 and F9 blocks have theoretically enough space for 1024 more
3R ops (or 32768 if one wanted to use it all for 2R ops...).
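Those two figures are consistent with a simple count. The count assumes about 24 free bits in each of F3 and F9 (as stated earlier) and 5-bit register fields, so that a 3R op uses 15 bits of operand fields and a 2R op uses 10. The 5-bit field width is an inference that makes the numbers line up, not something stated here:

```latex
\[
\frac{2 \times 2^{24}}{2^{3 \times 5}} = 2^{10} = 1024 \ \text{(3R ops)},
\qquad
\frac{2 \times 2^{24}}{2^{2 \times 5}} = 2^{15} = 32768 \ \text{(2R ops)}.
\]
```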

Potentially, relocating BRA/BSR to the F8 block could free up 64 more 3R
spots in the F0 block, but would be a pretty major "breaking change".

And, potentially, one could need some more Imm16 ops, and there was
debate over the possibility of, say, adding "BRGT Rn, Disp12s" ops and
similar (say, because the usefulness of the existing Disp8s ops is limited
by the small displacement size; and "Conditional branch that doesn't
stomp SR.T" is potentially useful for combining predication with
modulo-scheduling, ...).

Granted, one could do the latter case by faking:
BRGT R4, .L0
As:
BRLE R4, .L1
BRA .L0
.L1:
In cases where .L0 is outside the 256 byte limit.
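The fallback above is ordinary branch relaxation. Here is a minimal C sketch of the decision an assembler could make. It assumes the short form's displacement is an 8-bit signed count of 16-bit units (giving the roughly 256-byte reach mentioned), measured as a plain byte offset. The real BJX2 base address and scaling may differ, and the helper names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* Does a byte offset fit an 8-bit signed displacement in 2-byte units? */
static int fits_disp8s_x2(int64_t byte_off)
{
    if (byte_off & 1)
        return 0;                       /* must be 2-byte aligned */
    int64_t units = byte_off / 2;
    return units >= -128 && units <= 127;
}

/* Emit "BRGT Rn, label" directly, or as an inverted short branch around
 * an unconditional BRA (assumed to have a much larger reach).
 * Returns the number of instructions emitted. */
static int emit_brgt(FILE *out, const char *rn, const char *label,
                     int64_t byte_off, const char *skip_label)
{
    if (fits_disp8s_x2(byte_off)) {
        fprintf(out, "BRGT %s, %s\n", rn, label);
        return 1;
    }
    fprintf(out, "BRLE %s, %s\n", rn, skip_label);  /* opposite condition */
    fprintf(out, "BRA %s\n", label);
    fprintf(out, "%s:\n", skip_label);
    return 2;
}
```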

Re: More of my philosophy about CISC and RISC instructions..

<13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33718&group=comp.arch#33718

Newsgroups: comp.arch
Date: Sun, 20 Aug 2023 14:23:32 -0700 (PDT)
In-Reply-To: <ubtq4l$1gv5c$1@dont-email.me>
Message-ID: <13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)

by: MitchAlsup - Sun, 20 Aug 2023 21:23 UTC

On Sunday, August 20, 2023 at 2:39:06 PM UTC-5, BGB wrote:
> On 8/20/2023 10:29 AM, MitchAlsup wrote:
>
> > Then it is not really a 64-bit machine in a similar manner that Mc 68000
> > was a 16-bit machine that could perform 32-bit calculations.
> > <
> The registers and ops are still 64-bits...
>
> Just the immediate field from the decoders remain 33 bits.
>
So, in order to use a 64-bit constant you consume 2/3rds of your execution lanes ?!?
>

> >>
> > Even Floating Point ??
> Say:
> MOV Imm64, Rn
>
> Can load full Binary64, but is technically a 2-lane operation (that two
> 32 bit halves are glued together in the pipeline is invisible).
<
Right, so using a 64-bit constant eats 2/3rds of your execution width.
>
> Or:
> FLDCH Imm16, Rn //Load immediate as Binary16 to Binary64
> Routes the immediate through a format converter.
>
I do similar with 5-bit immediates in FP.
<
FDIV R9,#5,R16 // R9 = 5.0D0 / R16
>
> For the FpImm experiment, had ended up needing to make the decoders
> perform a 5-bit to Binary16 conversion, with Binary16 to Binary64
> converters shoved into the register-file module (only valid on certain
> register ports).
>
> These sorts of cases are handled with "fake" internal registers that
> essentially tell the register-file "Hey, there is a Binary16 value in
> the Imm33 field, get the value of this having been converted to Binary64".
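The Binary16-to-Binary64 step that FLDCH and the FpImm path rely on is a pure widening, so it is always exact. Here is a portable C sketch of that conversion, handling zeros, subnormals, infinities, and NaNs. The function name is hypothetical, and this is an illustration of the format math, not the converter's actual logic.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern to a binary64 value (exact). */
static double binary16_to_double(uint16_t h)
{
    uint64_t sign = (uint64_t)(h >> 15) << 63;
    int      exp  = (h >> 10) & 0x1F;
    uint64_t frac = h & 0x3FF;
    uint64_t bits;

    if (exp == 0x1F) {                        /* Inf or NaN: keep payload */
        bits = sign | ((uint64_t)0x7FF << 52) | (frac << 42);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 1023 */
        bits = sign | ((uint64_t)(exp - 15 + 1023) << 52) | (frac << 42);
    } else if (frac == 0) {                   /* signed zero */
        bits = sign;
    } else {                                  /* subnormal: normalize */
        int e = -14;
        while (!(frac & 0x400)) {
            frac <<= 1;
            e--;
        }
        frac &= 0x3FF;                        /* drop the now-implicit 1 */
        bits = sign | ((uint64_t)(e + 1023) << 52) | (frac << 42);
    }

    double d;
    memcpy(&d, &bits, sizeof d);              /* reinterpret the bit pattern */
    return d;
}
```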

On 8/19/2023 2:24 PM, MitchAlsup wrote:
> On Saturday, August 19, 2023 at 12:46:19 PM UTC-5, BGB wrote:
>> On 8/18/2023 11:05 AM, Scott Lurndal wrote:
>>> MitchAlsup <Mitch...@aol.com> writes:
>>>> On Monday, August 14, 2023 at 5:45:10 AM UTC-5, pec...@gmail.com wrote:
>>>
>>>>> Reserved part of 16-bit space alone could double available 32 bit opcode space.
>>>> <
>>>> RISC-V allocates 3/4 of the OpCode encoding to 16-bit stuff and gains all the complexity of variable length instructions but gains little of the benefits.
>>>
>>> ARM has the Thumb32 instruction set, which I just finished a simulator for,
>>> which reserves three of the 16-bit encodings to indicate 32-bit instructions.
>>>
>> Having developed along a vaguely similar trajectory, I had ended up with
>> a similar scheme (to Thumb2) in my case.
>>> It also includes the rather unusual T16 IT instruction (If-Then) which, as a form
>>> of predication, can cover up to four subsequent T16 instructions.
>>>
>>> It's worth noting that the IT instruction was deprecated in the thumb
>>> support for AArch32 in ARMv8+.
>> I would guess that this mechanism would have required a way to preserve
>> and restore this state during interrupts, which could be "rather
>> annoying" to deal with.
>>
>> Probably combined with limited use by compilers compared with normal
>> branches.
>>
>> Conventional wisdom is usually that "branch predictor makes branches not
>> slow" so "one does not need predication".
>>
>>
>> Except now the CPU performance may "eat it" when trying to deal with a
>> PNG Paeth filter or bitwise range coder or similar (which effectively
>> feed raw entropy from the data stream into the branch hit/miss
>> handling). Likewise for things like alpha-testing pixels in a software
>> rasterizer, etc.
> <
> Extract and Insert Instructions simplify the encoding of these.

Paeth filter (from memory) is something like:
P=A+B-C;
dA=abs(P-A);
dB=abs(P-B);
dC=abs(P-C);
if(dA<=dB)
{
if(dA<=dC)
{ D=A; }
else
{ D=C; }
}else
{
if(dB<=dC)
{ D=B; }
else
{ D=C; }
}

But it is evaluated for nearly every component of nearly every pixel in
an image (because it tends to do better than the other filters), and it
is also the slowest (though there are "faster" ways to do it by turning it
into a mess of subtractions, shifts, and bitwise operators).
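As a sketch of the "faster" branch-free approach mentioned above (not necessarily the exact formulation meant here), the comparisons can be turned into all-ones/all-zeros masks and combined with AND/OR. It assumes the standard PNG naming (a = left, b = above, c = upper-left) and PNG tie-breaking (prefer a, then b, then c); the function names are hypothetical, and `paeth_selfcheck` compares the masked form against a plain reference for every 8-bit input triple.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference Paeth predictor with PNG tie-breaking. */
static int paeth_ref(int a, int b, int c) {
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Branch-free form: comparisons yield 0/1, negation turns them into
   0 / all-ones masks, and AND/OR select the winning sample. */
static int paeth_branchless(int a, int b, int c) {
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    int take_a = -((pa <= pb) & (pa <= pc));  /* -1 if a wins, else 0 */
    int take_b = -(pb <= pc);                 /* -1 if b beats c, else 0 */
    int bc = (b & take_b) | (c & ~take_b);
    return (a & take_a) | (bc & ~take_a);
}

/* Exhaustive equivalence check over all 8-bit sample triples. */
static void paeth_selfcheck(void) {
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            for (int c = 0; c < 256; c++)
                assert(paeth_branchless(a, b, c) == paeth_ref(a, b, c));
}
```

Whether the compiler actually emits no branches (for example, using set-on-compare or conditional moves for the comparisons and `abs`) depends on the target and the optimization level.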

Range-coding would be something like (also from memory):
struct RangeCtx_s {
byte wvals[65536]; //probability weights
uint16_t wctx; //context of previous bits
uint32_t rhi; //high value of range
uint32_t rlo; //low value of range
uint32_t rmid; //midpoint (encoded range)
byte *cs; //encoded bitstream
};

int DecodeBit(RangeCtx *ctx)
{
uint32_t m, r;
byte w, b;

w=ctx->wvals[ctx->wctx]; //fetch weight
r=ctx->rhi-ctx->rlo; //size of range
m=ctx->rlo+((r>>8)*w); //calc midpoint based on weight

if(ctx->rmid>=m)
{
b=1; //if >= midpoint, we have a 1 bit
ctx->rlo=m; //cut off low part of range
w=w_inctab[w]; //adjust weight for 1 bit
}else
{
b=0; //if < midpoint, we have a 0 bit
ctx->rhi=m; //cut off high part of range
w=w_dectab[w]; //adjust weight for 0 bit
}
ctx->wvals[ctx->wctx]=w; //update probability weight
ctx->wctx=(ctx->wctx<<1)|b; //update context

//check and renormalize as range converges
if(!((ctx->rhi^ctx->rlo)>>24))
{
ctx->rhi=(ctx->rhi<<8)|0xFF;
ctx->rlo=(ctx->rlo<<8)|0x00;
ctx->rmid=(ctx->rmid<<8)|(*ctx->cs++);
}
return(b);
}

int DecodeByte(RangeCtx *ctx)
{
int b;
b=DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
b=(b<<1)|DecodeBit(ctx);
return(b);
}

This sort of thing gives an entropy coder which compresses reasonably
well, but is painfully slow (a Huffman-style entropy coder, by
comparison, is significantly faster).

One can ask, "why not encode a whole symbol instead of 1 bit at a
time?" This works, but ironically tends to be slower than
encoding/decoding things 1 bit at a time...

There is a lot of variation on this sort of idea.

....

>>
>> But, a lot of people (including compiler writers) seem inclined to
>> ignore these cases.
> <
> Often disguised as a series of shifts (a << const1)>>const2 because
> the underlying language does not express variable length bit-fields
> efficiently.

For bitfields and similar...

Less sure about Paeth or bitwise range coders, which are more hurt by
needing to deal with all the branching and similar.

Granted, there are faster alternatives in both cases.
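To illustrate the shift-pair idiom mentioned above, here is a minimal C sketch of generic bitfield extract/insert helpers. The names `extract_u`, `extract_s` and `insert_field` are hypothetical; they only model the semantics in portable C and are not any particular ISA's Extract/Insert instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 is the LSB; all helpers require len >= 1 and pos + len <= 64. */

/* Unsigned extract, written as the (x << c1) >> c2 shift pair. */
static uint64_t extract_u(uint64_t x, unsigned pos, unsigned len) {
    assert(len >= 1 && pos + len <= 64);
    return (x << (64 - pos - len)) >> (64 - len);
}

/* Signed extract: same shift pair, but the right shift is arithmetic.
   The cast and the signed shift are implementation-defined in C for
   negative values; mainstream compilers sign-extend as intended. */
static int64_t extract_s(uint64_t x, unsigned pos, unsigned len) {
    assert(len >= 1 && pos + len <= 64);
    return (int64_t)(x << (64 - pos - len)) >> (64 - len);
}

/* Insert the low `len` bits of `field` into x at `pos`; other bits are kept. */
static uint64_t insert_field(uint64_t x, uint64_t field, unsigned pos, unsigned len) {
    assert(len >= 1 && pos + len <= 64);
    uint64_t mask = (len == 64) ? ~0ull : (((1ull << len) - 1) << pos);
    return (x & ~mask) | ((field << pos) & mask);
}
```

A compiler that recognizes the `(x << c1) >> c2` pattern can map it onto a single extract instruction on an ISA that has one.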

>>
>> But, then CPU designers are like "well, we will interpret a short
>> forward branch as predicating the next N instructions rather than doing
>> a branch", ...
> <
> AND WHY NOT ??

I guess, probably true enough...

>>
>> ...

Re: More of my philosophy about CISC and RISC instructions..

<ubu3vh$1ies4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33720&group=comp.arch#33720

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Sun, 20 Aug 2023 17:26:55 -0500
Organization: A noiseless patient Spider
Lines: 114
Message-ID: <ubu3vh$1ies4$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me>
<4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me>
<299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me>
<7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me>
<13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
Injection-Date: Sun, 20 Aug 2023 22:26:57 -0000 (UTC)
In-Reply-To: <13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>

by: BGB - Sun, 20 Aug 2023 22:26 UTC

On 8/20/2023 4:23 PM, MitchAlsup wrote:
> On Sunday, August 20, 2023 at 2:39:06 PM UTC-5, BGB wrote:
>> On 8/20/2023 10:29 AM, MitchAlsup wrote:
>>
>>> Then it is not really a 64-bit machine in a similar manner that Mc 68000
>>> was a 16-bit machine that could perform 32-bit calculations.
>>> <
>> The registers and ops are still 64-bits...
>>
>> Just the immediate field from the decoders remain 33 bits.
>>
> So, in order to use a 64-bit constant you consume 2/3rds of your execution lanes ?!?

Actually, encoding an instruction with a 64-bit constant eats *all* of
the lanes...

How much space does it take to encode a 64-bit constant?
96 bits.
How wide is the fetch?
96 bits.
How many more ops *could* I have bundled here?
0.

The 32-bit encodings can be bundled, but none of them is capable of
producing a full 64-bit value in the first place.

I could almost have gotten away with a 25-bit field here...
The largest 32-bit encodings only encode a 25-bit immed;
All larger values could have been multi-lane.

But, for other reasons, 33-bit made more sense here than 25 bit.
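A small sketch of the size classes described here. It assumes the immediates are sign-extended and that the three cases map to a plain 32-bit op (25-bit immediate), a 64-bit jumbo form (33-bit immediate), and the 96-bit form (full 64-bit constant); the function names are hypothetical and the class boundaries are inferred from the discussion, not taken from an ISA spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits a sign-extended immediate field of `bits` bits. */
static bool fits_simm(int64_t v, int bits) {
    assert(bits >= 1 && bits <= 63);
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

/* Assumed size classes: 25-bit immediate in a plain 32-bit op,
   33-bit immediate via a 64-bit jumbo form, otherwise the 96-bit form. */
static int const_encoding_bits(int64_t v) {
    if (fits_simm(v, 25)) return 32;
    if (fits_simm(v, 33)) return 64;
    return 96;
}
```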

Though, FWIW, in early versions the 64-bit constant load did split the
value like: 24 bits in lanes 2 and 3; 16 bits in Lane 1.

If I supported hybrid bundles, say:
FE-Op32 | Op32
Or similar.

One could potentially have jumbo encodings in a bundle, but the largest
cases would have still been 33 bits.

Only real way for this to become a limiting factor would be to support
larger bundles.

>>
>
>>>>
>>> Even Floating Point ??
>> Say:
>> MOV Imm64, Rn
>>
>> Can load full Binary64, but is technically a 2-lane operation (that two
>> 32 bit halves are glued together in the pipeline is invisible).
> <
> Right, so using a 64-bit constant eats 2/3rds of your execution width.

Yeah.

>>
>> Or:
>> FLDCH Imm16, Rn //Load immediate as Binary16 to Binary64
>> Routes the immediate through a format converter.
>>
> I do similar with 5-bit immediates in FP.
> <
> FDIV R9,#5,R16 // R9 = 5.0D0 / R16

I had interpreted the 5-bit values as E3.F2 (3 exponent bits, 2 fraction
bits). I had tried various schemes, but E3.F2 ended up with the best
overall hit rate among the possibilities tested.

Hit rate still isn't particularly high though.

Meanwhile, it turns out Binary16 can exactly represent a majority of the
floating point constants which appear in code, so the operation to
express a Binary16 value directly has a fairly good hit rate.
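A compiler choosing between a Binary16 immediate form (like FLDCH) and a full constant needs an exact-representability test. A minimal sketch, assuming IEEE 754 binary64 `double` and treating Inf/NaN as not eligible; `fits_binary16` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the finite double d is exactly representable as IEEE 754
   binary16 (half precision), else 0. Inf/NaN are rejected here. */
static int fits_binary16(double d) {
    uint64_t bits;
    assert(sizeof d == sizeof bits);             /* assumes 64-bit IEEE doubles */
    memcpy(&bits, &d, sizeof bits);
    int biased = (int)((bits >> 52) & 0x7FF);
    uint64_t frac = bits & 0xFFFFFFFFFFFFFull;   /* 52 stored fraction bits */
    if (biased == 0x7FF) return 0;               /* Inf or NaN */
    if (biased == 0) return frac == 0;           /* +/-0 fits; binary64 subnormals are far too small */
    int e = biased - 1023;                       /* unbiased exponent */
    if (e > 15) return 0;                        /* >= 65536: beyond binary16 range */
    /* Significand bits binary16 keeps at this exponent: 11 for normals
       (e >= -14), fewer for subnormals, none below 2^-24. */
    int keep = (e >= -14) ? 11 : 11 - (-14 - e);
    if (keep <= 0) return 0;
    uint64_t sig = frac | (1ull << 52);          /* 53-bit significand */
    int drop = 53 - keep;
    return (sig & ((1ull << drop) - 1)) == 0;    /* discarded bits must all be zero */
}
```

Constants such as 1.0, 0.5, 5.0, 1000.0 or 65504.0 pass, while 0.1 or 3.14159 do not; the sign bit is ignored because binary16 has one too.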

>>
>> For the FpImm experiment, had ended up needing to make the decoders
>> perform a 5-bit to Binary16 conversion, with Binary16 to Binary64
>> converters shoved into the register-file module (only valid on certain
>> register ports).
>>
>> These sorts of cases are handled with "fake" internal registers that
>> essentially tell the register-file "Hey, there is a Binary16 value in
>> the Imm33 field, get the value of this having been converted to Binary64".
>

Meanwhile, checking some other stats:
Only around 5% of function-local branches are within +/- 256 bytes.
But, the vast majority (96%) are within +/- 4K.

This implies that a 12-bit branch displacement would be a fair bit more
useful than an 8 bit displacement.
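The byte ranges quoted here line up with a signed displacement counted in 2-byte units: 8 bits reach about +/-256 bytes and 12 bits about +/-4 KB. A sketch of the range check under that assumption (the 2-byte unit and the function name are assumptions, not taken from the ISA):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the displacement is a signed `bits`-wide field counted in
   `unit`-byte steps (unit = 2 for 16-bit instruction granularity). */
static bool branch_disp_fits(int64_t byte_offset, int bits, int unit) {
    assert(unit > 0 && bits >= 1 && bits <= 63);
    if (byte_offset % unit != 0) return false;   /* target must be unit-aligned */
    int64_t units = byte_offset / unit;
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return units >= lo && units <= hi;
}
```

With `bits = 8, unit = 2` the reachable range is -256..+254 bytes; with `bits = 12` it is -4096..+4094 bytes.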

Meanwhile, looking at my compiler, it had somehow slipped my mind that I
also already have "BRcc Rn, Disp33s" encodings via jumbo prefixes, which
end up being the main form used if this feature is enabled in my
compiler (but... I had forgotten it seems...).

So, it is more a tradeoff between burning encoding space, vs needing a
64-bit encoding for these.

Re: More of my philosophy about CISC and RISC instructions..

<5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33721&group=comp.arch#33721

Newsgroups: comp.arch
Date: Sun, 20 Aug 2023 15:57:13 -0700 (PDT)
In-Reply-To: <ubu3vh$1ies4$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me> <4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me> <299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me> <7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me> <13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
<ubu3vh$1ies4$1@dont-email.me>
Message-ID: <5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sun, 20 Aug 2023 22:57:14 +0000

by: MitchAlsup - Sun, 20 Aug 2023 22:57 UTC

On Sunday, August 20, 2023 at 5:27:02 PM UTC-5, BGB wrote:
> On 8/20/2023 4:23 PM, MitchAlsup wrote:
> > On Sunday, August 20, 2023 at 2:39:06 PM UTC-5, BGB wrote:
> >> On 8/20/2023 10:29 AM, MitchAlsup wrote:
> >>
> >>> Then it is not really a 64-bit machine in a similar manner that Mc 68000
> >>> was a 16-bit machine that could perform 32-bit calculations.
> >>> <
> >> The registers and ops are still 64-bits...
> >>
> >> Just the immediate field from the decoders remain 33 bits.
> >>
> > So, in order to use a 64-bit constant you consume 2/3rds of your execution lanes ?!?
> Actually, encoding an instruction with a 64-bit constant eats *all* of
> the lanes...
>
> How much space does it take to encode a 64-bit constant?
> 96 bits.
> How wide is the fetch?
> 96 bits.
> How many more ops *could* I have bundled here?
> 0.
>
OK, I see the disconnect. I am fetching 128-bits wide on a 1-wide machine
so that I can use excess I$ bandwidth to do other things (including power
savings), while you are fetching only as wide as you can issue. Secondarily
I am designing a scalable ISA where you are designing an ISA targeting a
particular data path design.
>
> The 32-bit encodings can be bundled, but none of them is capable of
> producing a full 64-bit value in the first place.
<
value = Operand or value = result ?
>
> I could almost have gotten away with a 25-bit field here...
> The largest 32-bit encodings only encode a 25-bit immed;
> All larger values could have been multi-lane.
<
And you are naming lanes of decode not lanes of execution. Gotcha.
>
>

> > I do similar with 5-bit immediates in FP.
> > <
> > FDIV R9,#5,R16 // R9 = 5.0D0 / R16
> I had interpreted the 5-bit values as E3.F2, had tried various schemes,
> but E3.F2 ended up with the best overall hit-rate among the
> possibilities tested.
>
> Hit rate still isn't particularly high though.
>
This would have caused problems in assembly and disassembly. So,
after looking at the data, we chose to make the int->fp expansions
just like (double)int_constant. Sure, it limited use, but there are
a lot of 1, 2, 5, 10s in FP codes, and while we missed things like 0.5, ...
what we did was a pure win, as we still have float->double conversions
in the "routing".
>
> Meanwhile, it turns out Binary16 can exactly represent a majority of the
> floating point constants which appear in code, so the operation to
> express a Binary16 value directly has a fairly good hit rate.
<
I would have done something like this, but I don't have the ability to
spontaneously poof a 16-bit immediate onto a FP instruction.
<
On the other hand, having universal constants means I save crap_loads
of instructions delivering constants as FP Operands.
<
>
> >
> Meanwhile, checking some other stats:
> Only around 5% of function-local branches are within +/- 256 bytes.
> But, the vast majority (96%) are within +/- 4K.
<
An even larger number are within ¼Mb. In fact, I don't think Brian's compiler
has run into a subroutine large enough to need a backup plan in this area.
>
> This implies that a 12-bit branch displacement would be a fair bit more
> useful than an 8 bit displacement.
>
My argument is that 16 bits is even more useful than 12. Also, Thomas'
work in binutils now compresses halfword jump tables (switch) into
byte tables when all the labels are within range, making switch tables
much more compact.
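A hypothetical sketch of the general idea of picking the narrowest switch-table entry that still reaches every case label. It assumes entries are unsigned offsets from a table base, counted in instruction-sized units; this is an illustration only, not the actual binutils implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest entry size (1, 2 or 4 bytes) for a switch
   jump table whose entries are unsigned offsets from `base`, counted in
   units of `scale` bytes. Assumes every target is >= base. */
static int jump_table_entry_bytes(const uint64_t *targets, size_t n,
                                  uint64_t base, unsigned scale) {
    assert(scale > 0);
    uint64_t max_units = 0;
    for (size_t i = 0; i < n; i++) {
        assert(targets[i] >= base);
        uint64_t units = (targets[i] - base) / scale;
        if (units > max_units) max_units = units;
    }
    if (max_units <= UINT8_MAX) return 1;    /* byte table */
    if (max_units <= UINT16_MAX) return 2;   /* halfword table */
    return 4;                                /* word table (assumed wide enough) */
}
```

Counting in instruction-sized units rather than bytes is what lets a byte table reach up to 255 instructions past the base.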
>
> Meanwhile, looking at my compiler, it had somehow slipped my mind that I
> also already have "BRcc Rn, Disp33s" encodings via jumbo prefixes, which
> end up being the main form used if this feature is enabled in my
> compiler (but... I had forgotten it seems...).
<
We all do that now and again.....
>
> So, it is more a tradeoff between burning encoding space, vs needing a
> 64-bit encoding for these.
<
I don't see it as an encoding space issue, I see it as a variable length constant
routing problem from instruction buffer to function unit as part of "forwarding".
So, the majority of instructions (able to be encoded) have a routing OpCode
in addition to a Calculation OpCode. Instructions with 16-bit immediates have
a canned routing OpCode.
<
You can consider the routing OpCode as treating "forwarding" as another
calculation performed prior to execution. {Not dissimilar to how DG NOVA
had shifts with integer arithmetic}

Re: More of my philosophy about CISC and RISC instructions..

<52323735-ab5c-4e0d-8489-144811bb4ef3n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33722&group=comp.arch#33722

Newsgroups: comp.arch
Date: Sun, 20 Aug 2023 16:03:35 -0700 (PDT)
In-Reply-To: <ubu0it$1hvac$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<2fc528c1-c0d4-4f20-8ce9-5845e9b805e0n@googlegroups.com> <%EMDM.147258$X02a.70096@fx46.iad>
<ubqv57$v2re$1@dont-email.me> <bb790143-18f5-4865-b162-5a0da094a273n@googlegroups.com>
<ubu0it$1hvac$1@dont-email.me>
Message-ID: <52323735-ab5c-4e0d-8489-144811bb4ef3n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sun, 20 Aug 2023 23:03:36 +0000

by: MitchAlsup - Sun, 20 Aug 2023 23:03 UTC

On Sunday, August 20, 2023 at 4:29:05 PM UTC-5, BGB wrote:
> On 8/19/2023 2:24 PM, MitchAlsup wrote:
>
> Range-coding would be something like (also from memory):
> struct RangeCtx_s {
> byte wvals[65536]; //probability weights
> uint16_t wctx; //context of previous bits
> uint32_t rhi; //high value of range
> uint32_t rlo; //low value of range
> uint32_t rmid; //midpoint (encoded range)
> byte *cs; //encoded bitstream
> };
>
> int DecodeBit(RangeCtx *ctx)
> {
> uint32_t m, r;
> byte w, b;
// type mismatch between return(b); below and typeof(b)
>
> w=ctx->wvals[ctx->wctx]; //fetch weight
> r=ctx->rhi-ctx->rlo; //size of range
> m=ctx->rlo+((r>>8)*w); //calc midpoint based on weight
>
> if(ctx->rmid>=m)
> {
> b=1; //if >= midpoint, we have a 1 bit
> ctx->rlo=m; //cut off low part of range
> w=w_inctab[w]; //adjust weight for 1 bit
> }else
> {
> b=0; //if < midpoint, we have a 0 bit
> ctx->rhi=m; //cut off high part of range
> w=w_dectab[w]; //adjust weight for 0 bit
> }
> ctx->wvals[ctx->wctx]=w; //update probability weight
> ctx->wctx=(ctx->wctx<<1)|b; //update context
>
> //check and renormalize as range converges
> if(!((ctx->rhi^ctx->rlo)>>24))
> {
> ctx->rhi=(ctx->rhi<<8)|0xFF;
> ctx->rlo=(ctx->rlo<<8)|0x00;
> ctx->rmid=(ctx->rmid<<8)|(*ctx->cs++);
> }
> return(b);
> }
>
> int DecodeByte(RangeCtx *ctx)
> {
> int b;
> b=DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> b=(b<<1)|DecodeBit(ctx);
> return(b);
> }
>
intriguing.

On 8/20/2023 6:03 PM, MitchAlsup wrote:
> On Sunday, August 20, 2023 at 4:29:05 PM UTC-5, BGB wrote:
>> On 8/19/2023 2:24 PM, MitchAlsup wrote:
>>
>> Range-coding would be something like (also from memory):
>> struct RangeCtx_s {
>> byte wvals[65536]; //probability weights
>> uint16_t wctx; //context of previous bits
>> uint32_t rhi; //high value of range
>> uint32_t rlo; //low value of range
>> uint32_t rmid; //midpoint (encoded range)
>> byte *cs; //encoded bitstream
>> };
>>
>> int DecodeBit(RangeCtx *ctx)
>> {
>> uint32_t m, r;
>> byte w, b;
> // type mismatch between return(b); below and typeof(b)
>>
>> w=ctx->wvals[ctx->wctx]; //fetch weight
>> r=ctx->rhi-ctx->rlo; //size of range
>> m=ctx->rlo+((r>>8)*w); //calc midpoint based on weight
>>
>> if(ctx->rmid>=m)
>> {
>> b=1; //if >= midpoint, we have a 1 bit
>> ctx->rlo=m; //cut off low part of range
>> w=w_inctab[w]; //adjust weight for 1 bit
>> }else
>> {
>> b=0; //if < midpoint, we have a 0 bit
>> ctx->rhi=m; //cut off high part of range
>> w=w_dectab[w]; //adjust weight for 0 bit
>> }
>> ctx->wvals[ctx->wctx]=w; //update probability weight
>> ctx->wctx=(ctx->wctx<<1)|b; //update context
>>
>> //check and renormalize as range converges
>> if(!((ctx->rhi^ctx->rlo)>>24))
>> {
>> ctx->rhi=(ctx->rhi<<8)|0xFF;
>> ctx->rlo=(ctx->rlo<<8)|0x00;
>> ctx->rmid=(ctx->rmid<<8)|(*ctx->cs++);
>> }
>> return(b);
>> }
>>
>> int DecodeByte(RangeCtx *ctx)
>> {
>> int b;
>> b=DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> b=(b<<1)|DecodeBit(ctx);
>> return(b);
>> }
>>
> intriguing.

I can't say for certain I remembered it correctly (getting bitwise range
coders to encode and decode correctly is rather fiddly).

But, yeah, a similar sort of algorithm was used in LZMA.

Several video codecs also use a hybrid of range-coding and Huffman
coding, where feeding a Huffman coded stream through a range-coder can
get some compression improvements with less of a performance impact.

General idea is that weights are all initialized at a neutral value
(say, 0x80), and then adjusted based on each bit such that more common
bits reduce the range more slowly, but less common ones more rapidly
shrink the range (and, as the high and low bits converge; the encoder
pushes out the converged bits, and the decoder reads in more bits,
causing the range to expand again).

Both the encoder and decoder operate as mirrors of each other.

I had omitted handling for a case which can pop up sometimes, where
the high and low values fail to converge to the same value and the
range collapses to the point that bits can no longer be unambiguously
encoded. Usually in this case, the entire range needs to be emitted or
read in before the process can continue.

Note that the inctab/dectab would not be +1 or -1, but usually more of
an S-curve shape. Weight will move more quickly near the center, and
more slowly near the extremes. Tables would be set up to keep the
weights in a range of say, 0x10<=w<=0xF0 or similar, ...
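For a runnable point of reference, here is a small round-trip sketch in the same spirit: 8-bit weights starting at 0x80 and clamped to [0x10, 0xF0], indexed by the previous 16 bits, with the encoder and decoder mirroring each other. It is a swapped-in carryless coder in the style of the PAQ family, not a reconstruction of the code above. The weight update is a simple linear step standing in for the S-curve tables, and the collapse case is avoided by always keeping the split point strictly inside the range so that both halves stay non-empty. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Weights give P(bit==1) ~ w/256, indexed by the last 16 coded bits. */
typedef struct { uint8_t w[65536]; uint16_t ctx; } BitModel;

static void model_init(BitModel *m) { memset(m->w, 0x80, sizeof m->w); m->ctx = 0; }

/* Linear step clamped to [0x10, 0xF0]: a stand-in for S-curve inc/dec tables. */
static void model_update(BitModel *m, int bit) {
    int w = m->w[m->ctx];
    if (bit) w += (256 - w) >> 4; else w -= w >> 4;
    if (w < 0x10) w = 0x10;
    if (w > 0xF0) w = 0xF0;
    m->w[m->ctx] = (uint8_t)w;
    m->ctx = (uint16_t)((m->ctx << 1) | bit);
}

typedef struct { uint32_t lo, hi; uint8_t *out; size_t pos, cap; int overflow; } Enc;

static void enc_put(Enc *e, uint8_t b) {
    if (e->pos < e->cap) e->out[e->pos++] = b; else e->overflow = 1;
}

static void enc_bit(Enc *e, BitModel *m, int bit) {
    uint32_t mid = e->lo + ((e->hi - e->lo) >> 8) * m->w[m->ctx];
    assert(e->lo <= mid && mid < e->hi);           /* both sub-ranges non-empty */
    if (bit) e->hi = mid; else e->lo = mid + 1;    /* 1 -> lower part, 0 -> upper */
    model_update(m, bit);
    while (((e->lo ^ e->hi) >> 24) == 0) {         /* top byte settled: shift it out */
        enc_put(e, (uint8_t)(e->hi >> 24));
        e->lo <<= 8;
        e->hi = (e->hi << 8) | 0xFF;
    }
}

/* Returns the compressed size, or 0 if `cap` was too small. Not reentrant. */
static size_t rc_compress(const uint8_t *src, size_t n, uint8_t *out, size_t cap) {
    static BitModel m;
    Enc e = { 0, 0xFFFFFFFFu, out, 0, cap, 0 };
    model_init(&m);
    for (size_t i = 0; i < n; i++)
        for (int k = 7; k >= 0; k--) enc_bit(&e, &m, (src[i] >> k) & 1);
    for (int i = 0; i < 4; i++) { enc_put(&e, (uint8_t)(e.lo >> 24)); e.lo <<= 8; }  /* flush */
    return e.overflow ? 0 : e.pos;
}

typedef struct { uint32_t lo, hi, x; const uint8_t *in; size_t pos, len; } Dec;

static uint8_t dec_get(Dec *d) { return d->pos < d->len ? d->in[d->pos++] : 0; }

static int dec_bit(Dec *d, BitModel *m) {
    uint32_t mid = d->lo + ((d->hi - d->lo) >> 8) * m->w[m->ctx];
    int bit = d->x <= mid;                         /* mirror of the encoder's split */
    if (bit) d->hi = mid; else d->lo = mid + 1;
    model_update(m, bit);
    while (((d->lo ^ d->hi) >> 24) == 0) {
        d->lo <<= 8;
        d->hi = (d->hi << 8) | 0xFF;
        d->x = (d->x << 8) | dec_get(d);
    }
    return bit;
}

/* Decodes n bytes; the original length must be known. Not reentrant. */
static void rc_decompress(const uint8_t *in, size_t len, uint8_t *dst, size_t n) {
    static BitModel m;
    Dec d = { 0, 0xFFFFFFFFu, 0, in, 0, len };
    model_init(&m);
    for (int i = 0; i < 4; i++) d.x = (d.x << 8) | dec_get(&d);
    for (size_t i = 0; i < n; i++) {
        int v = 0;
        for (int k = 0; k < 8; k++) v = (v << 1) | dec_bit(&d, &m);
        dst[i] = (uint8_t)v;
    }
}
```

Because the weights never reach 0 or 256, each bit costs at most about 4 bits of output, so poorly modelled data can expand; the `cap` check reports that instead of overrunning the buffer.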

....

Re: More of my philosophy about CISC and RISC instructions..

<uc01um$1vcn2$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33728&group=comp.arch#33728

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Mon, 21 Aug 2023 11:04:35 -0500
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <uc01um$1vcn2$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
<47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com>
<ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>
Injection-Date: Mon, 21 Aug 2023 16:04:38 -0000 (UTC)
In-Reply-To: <4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>

by: BGB - Mon, 21 Aug 2023 16:04 UTC

On 8/19/2023 2:17 PM, MitchAlsup wrote:
> On Saturday, August 19, 2023 at 12:21:10 PM UTC-5, BGB wrote:
>> On 8/19/2023 11:31 AM, MitchAlsup wrote:
>
>>>> The 16-bit ops would mostly hold a collection of 2R ops.
>>>>
>>>> The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
>>>> iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
>>>> iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
>>>> tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
>>>> tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
>>> <
>>> I think you have sacrificed too much entropy to this particular encoding.
>>> Consider a 32-bit RISC LD/ST instruction can have a 16-bit displacement
>>> So a 24-bit one should be able to have an 8-bit displacement.
>>> <
>> Then for this encoding block, you would have *nothing* apart from LD/ST
>> ops...
> <
> 2 flavors
> a) MEM Rd,[Rb,DISP16]
> b) MEM Rd,[Rb,Ri<<s] // which have optional displacements {32,64}

There are reasons to have other types of ops as well, say, 3R ALU.

>>
>> One could note that Disp5u still typically hits roughly 50% of the time
>> in my stats. This is probably enough for the encoding to still be "useful".
> <
> Whereas, My encoding gives that "flavor" 16-bits which as you stated is good
> to the 99% level. 99% > 50% to the point the compiler does not need the
> intermediate pattern recognition cases.
>

But, Disp16 would not be viable with a 24-bit instruction format.

Disp8 would still leave "only" LD/ST ops in this case.

Where, say, LD/ST also needs 3 bits to encode the type of value to be
loaded/stored.

In my conventions:
000=SB (8-bit)
001=SW (16-bit)
010=SL (32-bit)
011=SQ (64-bit)
100=UB
101=UW
110=UL
111=X (128-bit)

Unsigned is usually N/A for store, so had often interpreted the unsigned
store cases as LEA, though this creates conflict with X.
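The 3-bit convention above has a regular structure: the low two bits give log2 of the access size and the top bit selects zero-extension, which leaves 111 (what would otherwise be an unsigned 64-bit load) free for the 128-bit X case. A sketch of a decoder for that field (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decode of the 3-bit LD/ST type field:
   000..011 = signed 8/16/32/64-bit, 100..110 = unsigned 8/16/32-bit, 111 = 128-bit X. */
typedef struct { int bytes; bool is_signed; } MemType;

static MemType decode_mem_type(unsigned code) {
    assert(code <= 7);
    MemType t;
    if (code == 7) {             /* X: 128-bit, no extension applies */
        t.bytes = 16;
        t.is_signed = false;
        return t;
    }
    t.bytes = 1 << (code & 3);   /* low 2 bits = log2(size in bytes) */
    t.is_signed = (code & 4) == 0;
    return t;
}
```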

Granted, the idea of an ISA with prime-number-of-bytes sized
instructions is likely DOA anyway, so...

Re: More of my philosophy about CISC and RISC instructions..

<a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33732&group=comp.arch#33732

Newsgroups: comp.arch
Date: Mon, 21 Aug 2023 11:09:08 -0700 (PDT)
In-Reply-To: <uc01um$1vcn2$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com> <ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com> <uc01um$1vcn2$1@dont-email.me>
Message-ID: <a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Aug 2023 18:09:09 +0000

by: MitchAlsup - Mon, 21 Aug 2023 18:09 UTC

On Monday, August 21, 2023 at 11:04:42 AM UTC-5, BGB wrote:
> On 8/19/2023 2:17 PM, MitchAlsup wrote:
> > On Saturday, August 19, 2023 at 12:21:10 PM UTC-5, BGB wrote:
> >> On 8/19/2023 11:31 AM, MitchAlsup wrote:
> >
> >>>> The 16-bit ops would mostly hold a collection of 2R ops.
> >>>>
> >>>> The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
> >>>> iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
> >>>> iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
> >>>> tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
> >>>> tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
> >>> <
> >>> I think you have sacrificed too much entropy to this particular encoding.
> >>> Consider a 32-bit RISC LD/ST instruction can have a 16-bit displacement
> >>> So a 24-bit one should be able to have an 8-bit displacement.
> >>> <
> >> Then for this encoding block, you would have *nothing* apart from LD/ST
> >> ops...
> > <
> > 2 flavors
> > a) MEM Rd,[Rb,DISP16]
> > b) MEM Rd,[Rb,Ri<<s] // which have optional displacements {32,64}
<
> There are reasons to have other types of ops as well, say, 3R ALU.
<
In the 1980s I used the nR notation, and after a while I found it confused
readers. So I switched to the nO notation: 3R (Rd=Rs1 OP Rs2) is
now 2O (2 Operands), and since almost every instruction delivers a
result, the destination field can be omitted from the notation. Readers
of my literature have found this notation less confusing. Does 3R
mean 3 register operands and 1 result, Rd = FMAC(Rs1,Rs2,Rs3),
or 3-1 register operands and +1 register result, Rd = OP(Rs1,Rs2) ??
<
What notation would you use if an instruction delivered 2 results ??
<
> >>
> >> One could note that Disp5u still typically hits roughly 50% of the time
> >> in my stats. This is probably enough for the encoding to still be "useful".
> > <
> > Whereas, My encoding gives that "flavor" 16-bits which as you stated is good
> > to the 99% level. 99% > 50% to the point the compiler does not need the
> > intermediate pattern recognition cases.
> >
> But, Disp16 would not be viable with a 24-bit instruction format.
<
One of the reasons 24-bits was never considered.
>
> Disp8 would still leave "only" LD/ST ops in this case.
>
> Where, say, LD/ST also needs 3 bits to encode the type of value to be
> loaded/stored.
<
LD needs 3 bits, ST only needs 2. Actually LD only needs about 2.8 bits
(log2 7), since we don't need both signed and unsigned 64-bit items. Stores
do not need signed and unsigned, just an indication of how much
to store.

On Saturday, August 5, 2023 at 1:48:38 AM UTC+9, Amine Moulay Ramdane wrote:
> Hello,
> More of my philosophy about CISC and RISC instructions..
> So we can generally consider CISC (Complex Instruction Set Computer)
> instructions of x86 architecture to be higher-level programming instructions compared to RISC (Reduced Instruction Set Computer) instructions due to their complexity.
>
> CISC instructions are designed to perform more complex operations in a single instruction. This complexity allows higher-level programming languages and compilers to generate fewer instructions to accomplish certain tasks. CISC architectures often have a broader range of instructions, some of which might even directly correspond to operations in high-level programming languages.
> In contrast, RISC instructions are designed to be simpler and more streamlined, typically performing basic operations that can be executed in a single clock cycle. It might require more instructions to accomplish the same high-level task that a CISC instruction could handle in a single operation.

CISC vs. RISC was disucussion in 80s, you can find discussion between RISC-I designer (now Esperanto Tech’s CEO) and Vax designer through IEEE transactions.

I think this is no longer a useful point of view, because:
1) Hardware design tools have advanced:
- they support the design of complex architectures and logic circuits with reasonable performance,
- they support the analysis of complex designs.

2) Semiconductor process nodes have advanced:
- they provide enough transistors to realize complex designs,
- older processes now have lower fabrication cost (28nm or less has the best cost/performance).

3) Needs have changed:
- application/domain-specific architectures have risen,
- every domain requires different specifications.

4) Logic circuit design philosophy is now separate from architecture design philosophy:
- logic circuits should be simple (mainly for verification cost and integration density); this has been the case since ancient times (and not only in computers),
- but the architecture (or processing mechanisms) can be complex (because of the first two reasons).

-
S. Takano

PS:
Tired of social network services, so I'm back here :)

> Thank you,
> Amine Moulay Ramdane.

Re: More of my philosophy about CISC and RISC instructions..

https://news.novabbs.org/devel/article-flat.php?id=33735&group=comp.arch#33735

Newsgroups: comp.arch

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Mon, 21 Aug 2023 14:26:51 -0500
Organization: A noiseless patient Spider
Lines: 151
Message-ID: <uc0dpu$21brh$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
<47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com>
<ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>
<uc01um$1vcn2$1@dont-email.me>
<a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com>
Injection-Date: Mon, 21 Aug 2023 19:26:54 -0000 (UTC)
In-Reply-To: <a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com>

by: BGB - Mon, 21 Aug 2023 19:26 UTC

On 8/21/2023 1:09 PM, MitchAlsup wrote:
> On Monday, August 21, 2023 at 11:04:42 AM UTC-5, BGB wrote:
>> On 8/19/2023 2:17 PM, MitchAlsup wrote:
>>> On Saturday, August 19, 2023 at 12:21:10 PM UTC-5, BGB wrote:
>>>> On 8/19/2023 11:31 AM, MitchAlsup wrote:
>>>
>>>>>> The 16-bit ops would mostly hold a collection of 2R ops.
>>>>>>
>>>>>> The 24-bit ops hold a selection of Ld/St and 3R ALU ops.
>>>>>> iiii-isss ss0n-nnnn zzz0-0001 //LD (Rs, Disp5)
>>>>>> iiii-isss ss1n-nnnn zzz0-0001 //ST (Rs, Disp5)
>>>>>> tttt-tsss ss0n-nnnn zzz1-0001 //LD (Rs, Rt)
>>>>>> tttt-tsss ss1n-nnnn zzz1-0001 //ST (Rs, Rt)
>>>>> <
>>>>> I think you have sacrificed too much entropy to this particular encoding.
>>>>> Consider a 32-bit RISC LD/ST instruction can have a 16-bit displacement
>>>>> So a 24-bit one should be able to have an 8-bit displacement.
>>>>> <
>>>> Then for this encoding block, you would have *nothing* apart from LD/ST
>>>> ops...
>>> <
>>> 2 flavors
>>> a) MEM Rd,[Rb,DISP16]
>>> b) MEM Rd,[Rb,Ri<<s] // which have optional displacements {32,64}
> <
>> There are reasons to have other types of ops as well, say, 3R ALU.
> <
> In the 1980s I used the nR notation and after a while I found it confused
> the readers. So I switched to the nO notation so 3R (Rd=Rs1 OP Rs2) is
> now 2O (2 Operands) and since almost every instruction delivers a
> result the destination field can be omitted from the notation. Readers
> of my literature have found this notation less confusing. Does 3R
> mean 3 register operands and 1 result Rd = FMAC(Rs1,Rs2,Rs3)
> or 3-1 register operands and +1 register result Rd = OP(Rs1,Rs2) ??
> <
> What notation would you use if an instruction delivered 2 results ??
> <

OK, I used 3R for "Three Register"; always implicitly 2 sources and 1
destination.

A few instructions in BJX2 are 4R (3-source, 1 destination).

In these cases, the internal Rp and Rn ports are separate, but in most
cases Rp and Rn are assumed to be equivalent.

These generally involve an Op64 encoding, with a field that is usually
one of:
A 4th register field;
An extra load/store displacement (RiMOV);
A rounding mode (some FPU and SIMD ops);
More opcode bits (some other ops).
Mostly depends on which "primary opcode" this prefix is used with.

At present, no instructions deliver 2 (independent) results, but it is
theoretically possible that this could be done with multi-lane ops.
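The two naming schemes differ only in what they count. A toy sketch makes the difference concrete. The function names are hypothetical. "nR" counts every register field (sources plus destination), while Mitch's "nO" counts only the source operands.

```python
def nr_name(num_sources, num_dests=1):
    """'nR' style: count every register field, sources plus destinations."""
    return f"{num_sources + num_dests}R"

def no_name(num_sources):
    """'nO' style: count only source operands; the result is left implicit."""
    return f"{num_sources}O"

# (mnemonic, number of source registers); all have one destination.
examples = [("ADD", 2), ("FMAC", 3)]
for mnemonic, srcs in examples:
    print(mnemonic, nr_name(srcs), no_name(srcs))
# ADD 3R 2O
# FMAC 4R 3O

# A hypothetical 2-source, 2-result op collides with FMAC under nR,
# while nO does not describe the second result at all.
print(nr_name(2, num_dests=2), no_name(2))  # 4R 2O
```

This is the ambiguity behind Mitch's question: once an op can have 2 results, neither count alone says how the fields split between sources and destinations.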

>>>>
>>>> One could note that Disp5u still typically hits roughly 50% of the time
>>>> in my stats. This is probably enough for the encoding to still be "useful".
>>> <
>>> Whereas my encoding gives that "flavor" 16 bits, which as you stated is good
>>> to the 99% level. 99% > 50% to the point the compiler does not need the
>>> intermediate pattern recognition cases.
>>>
>> But, Disp16 would not be viable with a 24-bit instruction format.
> <
> One of the reasons 24-bits was never considered.

I did briefly experiment with it in BJX2 (as a possible code-size saving
feature for microcontroller-like profiles), but soon dropped the idea
once it was revealed to be "a dog turd":
Code size savings fell short of expectations;
Byte alignment within the instruction stream added a whole new mess of
issues (in an ISA not otherwise designed to deal with free-form byte
alignment in the instruction stream);
....
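The byte-alignment problem is easier to see with a minimal length decoder. This one follows the tag scheme proposed earlier in the thread:

- low bits ...0 mean 16-bit,
- ..01 mean 24-bit,
- .011 mean 40-bit.

It assumes instructions are stored little-endian, so the tag sits in the low bits of the first byte. The longer forms were left unspecified, so the decoder rejects them rather than guessing. Function names are hypothetical.

```python
def insn_length(first_byte):
    """Instruction length in bytes, from the low-order tag bits.

    xxx0 -> 16-bit (2 bytes), xx01 -> 24-bit (3 bytes),
    x011 -> 40-bit (5 bytes); longer forms were left unspecified.
    """
    if (first_byte & 0b1) == 0:
        return 2
    if (first_byte & 0b11) == 0b01:
        return 3
    if (first_byte & 0b111) == 0b011:
        return 5
    raise ValueError("unspecified (longer) instruction form")

def boundaries(code):
    """Walk a byte stream and list the start offset of each instruction."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        pc += insn_length(code[pc])
    return offsets

# A 16-bit op, then a 24-bit op, then another 16-bit op.
stream = bytes([0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00])
print(boundaries(stream))  # [0, 2, 5]
```

After a single 24-bit op, the following 16-bit op starts at byte 5, an odd address. A fetch/decode path built around 16-bit-aligned instructions does not expect that.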

So, it has basically been entirely dropped from the ISA.

The encoding space was later reused to glue XGPR support onto the
BJX2 Baseline encoding.

This space goes away entirely in XG2 Mode though, which is part of what
eliminated the 40x2 encoding (which was built on top of an "invalid edge
case" of the XGPR encoding). I could maybe revive the idea at some point
under a slightly different encoding.

Then again, the 48-bit ops, which got knocked out of the ISA by an
encoding change, still haven't been revived. And even if the original
form were revived, it would not fit what the ISA has become. I would
effectively need "something new", and what exactly this would be has not
yet taken shape (given the relative rarity of Op64 ops, 48-bit ops
wouldn't save much space; and their original role has effectively been
entirely subsumed by the Jumbo/Op64 encodings... Given their relative
infrequency, the Op64 encodings being 33% bigger (64 vs. 48 bits) likely
doesn't matter all that much to the overall size of the binary).
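A quick back-of-the-envelope sketch shows why a 33% per-instruction overhead can still be a small effect overall. The program size and Op64 count below are made up for illustration, not measured. The helper name is hypothetical.

```python
def size_saving(total_bytes, op64_count, old_bits=64, new_bits=48):
    """Fraction of the binary saved if every Op64 op shrank to new_bits."""
    saved_bytes = op64_count * (old_bits - new_bits) / 8
    return saved_bytes / total_bytes

# A 64-bit op versus a 48-bit one: 64/48 - 1, i.e. about 33% bigger.
print(round(64 / 48 - 1, 3))  # 0.333

# Hypothetical program: 100 KiB of code containing 500 Op64 instructions.
print(round(size_saving(100 * 1024, 500), 4))  # 0.0098
```

The total saving scales with how often Op64 ops occur. If they are rare, the per-op overhead barely moves the binary size.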

Well, and also they would only be encodable in Baseline mode (the
relevant encoding space does not exist in XG2).

....

Partial issue:
Yeah, Disp5u is "not really sufficient".

This is partly why, although the original form of BJX2 used Disp5u
Load/Store encodings, I (fairly early) added Disp9u encodings: Disp5u
was mostly insufficient.

For a while, I had dropped the Disp5 encodings, but ended up reviving
them because a non-zero number of edge cases kept turning up where I
still needed them (even if the "general case" is dominated by Disp9u
and similar).
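A minimal sketch shows why Disp5u runs out quickly. It checks whether a byte offset fits an unsigned, scaled displacement field. The following are all assumptions for illustration:

- the displacement is scaled by the access size (8 bytes here),
- the helper name is hypothetical,
- the sample offsets are made up.

The counts say nothing about real hit rates.

```python
def fits_disp_u(offset, disp_bits, scale):
    """True if a byte offset fits an unsigned displacement scaled by `scale`."""
    if offset < 0 or offset % scale:
        return False  # negative or misaligned offsets are not encodable
    return offset // scale < (1 << disp_bits)

# Hypothetical byte offsets for 8-byte loads (struct fields, stack slots).
offsets = [0, 8, 16, 120, 248, 256, 1024, 4088]
for bits in (5, 9):
    hits = sum(fits_disp_u(o, bits, 8) for o in offsets)
    print(f"Disp{bits}u: {hits}/{len(offsets)}")
# Disp5u: 5/8
# Disp9u: 8/8
```

With 8-byte scaling, Disp5u tops out at 248 bytes, while Disp9u reaches 4088.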

It is like:
"Why does 32-bit 'ADD Rm, Rn' exist if 'ADD Rm, Ro, Rn' also exists?";
(Pause) "Reasons..."

>>
>> Disp8 would still leave "only" LD/ST ops in this case.
>>
>> Where, say, LD/ST also needs 3 bits to encode the type of value to be
>> loaded/stored.
> <
> LD needs 3 bits, ST only needs 2. Actually LD only needs 2.8 bits
> since we don't need both signed and unsigned 64-bit items. Stores
> do not need signed and unsigned, just an indication of how much
> to store.
>

That is why I usually ended up putting LEAs there...

A LEA operation is nice to have, but becomes an issue if one also has
an 'X' (paired) case.

Though, depending on the ISA rules, one could skip a byte LEA and
instead encode this case as an ADD.

STB, STW, STL, STQ, LEAW, LEAL, LEAQ, STX
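The store-side table can be pictured as a 3-bit type field. Four codes are plain stores, three reuse the spare slots as LEAs, and the last is the paired STX. This is a sketch under stated assumptions:

- the code ordering follows the list above,
- B/W/L/Q/X are taken as 1/2/4/8/16 bytes,
- the displacement is assumed to be scaled by the access size,
- the helper names are hypothetical.

```python
# Assumed order of the 3-bit type field on the store side, as listed above,
# with B/W/L/Q/X taken to mean 1/2/4/8/16-byte accesses (X = paired).
STORE_SIDE = [
    ("STB", 1), ("STW", 2), ("STL", 4), ("STQ", 8),
    ("LEAW", 2), ("LEAL", 4), ("LEAQ", 8), ("STX", 16),
]

def effective_address(base, disp, scale):
    """Scaled-displacement address: base + disp * scale."""
    return base + disp * scale

def decode_store_side(code, base, disp):
    """Return (kind, mnemonic, address) for a 3-bit store-side code."""
    if not 0 <= code < len(STORE_SIDE):
        raise ValueError("type field is only 3 bits")
    mnemonic, size = STORE_SIDE[code]
    ea = effective_address(base, disp, size)
    kind = "lea" if mnemonic.startswith("LEA") else "store"
    return kind, mnemonic, ea

print(decode_store_side(5, 0x1000, 3))  # ('lea', 'LEAL', 4108)
print(decode_store_side(7, 0x1000, 1))  # ('store', 'STX', 4112)

# A byte LEA would scale by 1, which is just base + disp: an ADD.
print(effective_address(0x1000, 3, 1) == 0x1000 + 3)  # True
```

Because a byte LEA degenerates to base + disp, it is the natural one to drop in favor of an ADD, which frees the eighth slot for STX.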

....
