Re: More of my philosophy about CISC and RISC instructions..

<bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33739&group=comp.arch#33739

X-Received: by 2002:ac8:7dcc:0:b0:40d:4c6:bce4 with SMTP id c12-20020ac87dcc000000b0040d04c6bce4mr48172qte.11.1692655403329;
Mon, 21 Aug 2023 15:03:23 -0700 (PDT)
X-Received: by 2002:a17:902:e746:b0:1bb:de7f:a4b7 with SMTP id
p6-20020a170902e74600b001bbde7fa4b7mr3528586plf.10.1692655403104; Mon, 21 Aug
2023 15:03:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Aug 2023 15:03:22 -0700 (PDT)
In-Reply-To: <uc0dpu$21brh$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:fb:e091:a8ab:d83e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:fb:e091:a8ab:d83e
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com> <ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com> <uc01um$1vcn2$1@dont-email.me>
<a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com> <uc0dpu$21brh$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Aug 2023 22:03:23 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3561

by: MitchAlsup - Mon, 21 Aug 2023 22:03 UTC

On Monday, August 21, 2023 at 2:26:58 PM UTC-5, BGB wrote:
> On 8/21/2023 1:09 PM, MitchAlsup wrote:
> >
>
> It is like:
> "Why does 32-bit 'ADD Rm, Rn' exist if 'ADD Rm, Ro, Rn' also exists?";
> (Pause) "Reasons..."
<
Some things "fall out for free", such as ADD Rd,Rs1,#0 as a MOV inst
{along with similar arithmetic identities} and disallowing these costs
gates and design time for no gain.
<
> >>
> >> Disp8 would still leave "only" LD/ST ops in this case.
> >>
> >> Where, say, LD/ST also needs 3 bits to encode the type of value to be
> >> loaded/stored.
> > <
> > LD needs 3 bits, ST only needs 2. Actually LD only needs 2.8 bits
> > since we don't need both signed and unsigned 64-bit items. Stores
> > do not need signed and unsigned, just an indication of how-much
> > to store.
> >
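The 2.8-bit figure can be checked by counting cases. A minimal sketch, assuming the load variants are signed and unsigned 8/16/32-bit plus a single 64-bit form, and the store variants are just the four sizes (the variable names are illustrative only):

```python
import math

# Loads: signed and unsigned forms for 8/16/32 bits, one form for 64 bits.
load_variants = 2 * 3 + 1      # 7 distinct cases
# Stores: only the amount to store matters.
store_variants = 4             # 8, 16, 32, 64 bits

load_bits = math.log2(load_variants)    # about 2.81
store_bits = math.log2(store_variants)  # exactly 2.0

print(f"LD: {load_bits:.2f} bits, ST: {store_bits:.0f} bits")
```

Seven cases out of eight leaves one slot of a 3-bit field free, which is presumably the slot being reused for other instructions in the discussion above.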
> That is why I usually ended up putting LEA's there...
>
> A LEA operation is nice to have, but is an issue if one has an 'X'
> (paired) case.
>
> Though, depending on the ISA rules, one could skip a byte LEA and
> instead encode this case as an ADD.
>
> STB, STW, STL, STQ, LEAW, LEAL, LEAQ, STX
<
For the MEM Rd,[Rb,Disp16] case I use the signed LDD as the EXIT
instruction, and the similar place in STD as the ENTER instruction.
There is no need for LEA, here, as it is redundant with ADD.
<
For the MEM Rd,[Rb,Ri<<sc] case I use the signed LDD as LEA,
because ADD with 2 operands will be seen to be less costly in
emulating LEA, so LEA is basically reserved for 3-Operand ADDs.
>
> ...

Re: More of my philosophy about CISC and RISC instructions..

<uc0o08$22t34$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33740&group=comp.arch#33740

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Mon, 21 Aug 2023 17:20:53 -0500
Organization: A noiseless patient Spider
Lines: 247
Message-ID: <uc0o08$22t34$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me>
<4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me>
<299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me>
<7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me>
<13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
<ubu3vh$1ies4$1@dont-email.me>
<5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 21 Aug 2023 22:20:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="60048df51a645c346b795ec2e584cf80";
logging-data="2192484"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Qo8nTDmO5VRp1P06To0y4"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:D5Zhu6dkuL3xKt9DTUP0LqBb7yM=
Content-Language: en-US
In-Reply-To: <5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>

by: BGB - Mon, 21 Aug 2023 22:20 UTC

On 8/20/2023 5:57 PM, MitchAlsup wrote:
> On Sunday, August 20, 2023 at 5:27:02 PM UTC-5, BGB wrote:
>> On 8/20/2023 4:23 PM, MitchAlsup wrote:
>>> On Sunday, August 20, 2023 at 2:39:06 PM UTC-5, BGB wrote:
>>>> On 8/20/2023 10:29 AM, MitchAlsup wrote:
>>>>
>>>>> Then it is not really a 64-bit machine, in much the same way that the MC68000
>>>>> was a 16-bit machine that could perform 32-bit calculations.
>>>>> <
>>>> The registers and ops are still 64-bits...
>>>>
>>>> Just the immediate field from the decoders remains 33 bits.
>>>>
>>> So, in order to use a 64-bit constant you consume 2/3rds of your execution lanes ?!?
>> Actually, encoding an instruction with a 64-bit constant eats *all* of
>> the lanes...
>>
>> How much space does it take to encode a 64-bit constant?
>> 96 bits.
>> How wide is the fetch?
>> 96 bits.
>> How many more ops *could* I have bundled here?
>> 0.
>>
> OK, I see the disconnect. I am fetching 128-bits wide on a 1-wide machine
> so that I can use excess I$ bandwidth to do other things (including power
> savings), while you are fetching only as wide as you can issue. Secondarily
> I am designing a scalable ISA, whereas you are designing an ISA targeting a
> particular data path design.

Yeah, something to this effect.

A hypothetical "future machine" could do things well beyond what my
current implementation could do.

But, in the current implementation, getting 2 or 3 wide is only possible
if using 32-bit ops.

It is basically sort of like:
Fetch 96 bits;
Shove it through three 32-bit decoders;
Pick the outputs corresponding to the current bundle format.

The 16-bit decoder, RISC-V decoders, etc, also see the same input
bundle. But, if the bundle or mode doesn't match, these decoder outputs
are ignored (and any unused lanes get filled with NOPs).
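A rough sketch of that fetch/decode flow. Note that `decode32`, the `NOP` marker, passing the bundle length in directly, and taking the first instruction word from the low bits of the fetch are all stand-ins for illustration; the real bundle-format detection and lane assignment are not modeled here:

```python
NOP = "NOP"

def decode32(word):
    # Placeholder decoder: a real one would crack the opcode fields.
    return f"op({word:08X})"

def decode_bundle(fetch96, ops_in_bundle):
    """Split a 96-bit fetch into three 32-bit words, decode all three,
    keep the ones the bundle uses and pad the remaining slots with NOPs."""
    words = [(fetch96 >> (32 * i)) & 0xFFFFFFFF for i in range(3)]
    decoded = [decode32(w) for w in words]   # all three decoders always run
    return decoded[:ops_in_bundle] + [NOP] * (3 - ops_in_bundle)

print(decode_bundle(0x000000030000000200000001, 2))
```

The point of the sketch is only that every decoder sees the same fetched bits, and the selection happens after decoding.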

One possible bit of wonk in my case is that the lane numbering is in reverse order:
Op1
Op2 | Op1
Op3 | Op2 | Op1

But, I have my reasons (my initial conclusion was that reverse-ordering
the lanes was "less bad" than forward ordering would have been, even if
forward ordering could have been cheaper).

>>
>> The 32-bit encodings can be bundled, but none of them is capable of
>> producing a full 64-bit value in the first place.
> <
> value = Operand or value = result ?
>>
>> I could almost have gotten away with a 25-bit field here...
>> The largest 32-bit encodings only encode a 25-bit immed;
>> All larger values could have been multi-lane.
> <
> And you are naming lanes of decode not lanes of execution. Gotcha.

The lanes for decode and execute are equivalent in this case.

A fancier core could have them separate, but at present, they are
equivalent.

>>
>>
>
>>> I do similar with 5-bit immediates in FP.
>>> <
>>> FDIV R9,#5,R16 // R9 = 5.0D0 / R16
>> I had interpreted the 5-bit values as E3.F2, had tried various schemes,
>> but E3.F2 ended up with the best overall hit-rate among the
>> possibilities tested.
>>
>> Hit rate still isn't particularly high though.
>>
> This would have caused problems in assembly and disassembly. So,
> after looking at the data we chose that the expansions from int->fp
> were just like (double)int_constant. Sure it limited use, but there are
> a lot of 1,2,5,10s in FP codes and while we missed things like 0.5,...
> what we did was a pure win as we still have float->double conversions
> in the "routing".

Floating point constants in ASM are represented as raw binary numbers in
my case...

I guess assembler support for expressing floating point numbers in
decimal notation could have been possible; I didn't think of or consider
it though...

But, say, maybe it could be possible, instead of writing, say:
MOV 0x3FF0000000000000, R4
FLDCH 0x3C00, R5
One could write:
MOV 1.0D, R4 //Binary64
FLDCH 1.0H, R5 //Binary16

Where, say, the ASM parser behaves as-if a hexadecimal version of the
constant had been used.
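The decimal-to-hex mapping such a parser would need can be sketched as below. This is only an illustration of the conversion, not part of any actual assembler; the helper names are made up:

```python
import struct

def binary64_bits(x):
    # Reinterpret a Python float as its IEEE 754 Binary64 bit pattern.
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def binary16_bits(x):
    # Same for Binary16; struct's "e" format is half precision.
    return struct.unpack(">H", struct.pack(">e", x))[0]

print(hex(binary64_bits(1.0)))  # 0x3ff0000000000000
print(hex(binary16_bits(1.0)))  # 0x3c00
```

These match the two hexadecimal constants in the example above, so a `1.0D` or `1.0H` literal could be lowered to the same operand the hex form already produces.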

>>
>> Meanwhile, it turns out Binary16 can exactly represent a majority of the
>> floating point constants which appear in code, so the operation to
>> express a Binary16 value directly has a fairly good hit rate.
> <
> I would have done something like this, but I don't have the ability to
> spontaneously poof a 16-bit immediate onto a FP instruction.
> <
> On the other hand, having universal constants means I save crap_loads
> of instructions delivering constants as FP Operands.
> <

In my case, originally it is a "mostly normal" converter op, just with
the input routed from an immediate rather than a register.
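Whether a given constant qualifies for a Binary16 immediate can be checked with a round trip through half precision. A minimal sketch (the helper name `fits_binary16` is hypothetical, not something from the compiler):

```python
import struct

def fits_binary16(x):
    """True if x survives a round trip through IEEE 754 Binary16 unchanged."""
    try:
        half = struct.pack("<e", x)
    except OverflowError:       # magnitude too large for Binary16
        return False
    return struct.unpack("<e", half)[0] == x

for c in (1.0, 0.5, 10.0, 0.1, 1e10):
    print(c, fits_binary16(c))
```

Constants such as 1.0, 0.5 and 10.0 pass, while 0.1 (not exactly representable) and 1e10 (out of range) would need a wider encoding.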

>>
>>>
>> Meanwhile, checking some other stats:
>> Only around 5% of function-local branches are within +/- 256 bytes.
>> But, the vast majority (96%) are within +/- 4K.
> <
> An even larger number are within ¼Mb. In fact, I don't think Brian's compiler
> has run into a subroutine large enough to need a backup plan in this area.

With 1MB, it reaches 100% of all branches in most of my current test
programs (excluding some combinations of options for ROTT which can
exceed the 1MB limit).

12-bits is 96% of local (intra function) branches, but only 19% of
global branches (a mixture of function calls, and the backwards branches
for prolog/epilog compression).

>>
>> This implies that a 12-bit branch displacement would be a fair bit more
>> useful than an 8 bit displacement.
>>
> My argument is that 16-bits is even more useful than 12. Although Thomas'
> work in binutils is now compressing halfword table jumps (switch) into
> byte jumps when all the labels are within range--making switch tables much
> more compact.

Possibly.

Though, my experience seems to imply that 8-bit displacements are fairly
limited if one does displacement calculations based on a 16-bit
instruction word. Would be limited mostly to fairly small switch blocks
(and moderately small loop bodies and similar).

Could be a little better if the 8-bit displacements are unsigned
(forward only) and assuming a 32-bit word, increasing the reach from 64
to 256 instruction words.
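The reach numbers can be worked out directly. A small sketch, ignoring details such as whether the displacement is relative to the current or the next instruction (the function name and parameters are illustrative):

```python
def reach(disp_bits, unit_bytes, signed=True):
    """Return (backward_bytes, forward_bytes) reachable by a displacement
    field of disp_bits, counted in units of unit_bytes."""
    if signed:
        back, fwd = 1 << (disp_bits - 1), (1 << (disp_bits - 1)) - 1
    else:
        back, fwd = 0, (1 << disp_bits) - 1
    return back * unit_bytes, fwd * unit_bytes

print(reach(8, 2))                 # (256, 254)
print(reach(12, 2))                # (4096, 4094)
print(reach(8, 4, signed=False))   # (0, 1020)
```

A signed Disp8 on 16-bit words covers about +/- 256 bytes (roughly 64 32-bit instruction words each way), a signed Disp12 covers about +/- 4K, and an unsigned Disp8 on 32-bit words covers roughly 256 instruction words forward only.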

Granted, in my case, using 32-bit "BRA Disp20s" instructions as the
jump-table entries is probably not ideal in this sense, but was easiest
to implement in my case (well, and also avoids having any
"non-instruction" data in the ".text" section).

>>
>> Meanwhile, looking at my compiler, it had somehow slipped my mind that I
>> also already have "BRcc Rn, Disp33s" encodings via jumbo prefixes, which
>> end up being the main form used if this feature is enabled in my
>> compiler (but... I had forgotten it seems...).
> <
> We all do that now and again.....

Yeah.

Though, one other tradeoff is that these ops would mostly be useful for
loops like:
while(n--) { ... }
Or:
while(p) { ... }

But, not so much:
for(i=0; i<n; i++)
{ ... }

Where, in this case, the relative usefulness of a dedicated Disp12
compare-with-0 branch would also depend on the relative usage of the
former vs the latter.

>>
>> So, it is more a tradeoff between burning encoding space, vs needing a
>> 64-bit encoding for these.
> <
> I don't see it as an encoding space issue, I see it as a variable length constant
> routing problem from instruction buffer to function unit as part of "forwarding".
> So, the majority of instructions (able to be encoded) have a routing OpCode
> in addition to a Calculation OpCode. Instructions with 16-bit immediates have
> a canned routing OpCode.
> <
> You can consider the routing OpCode as treating "forwarding" as another
> calculation performed prior to execution. {Not dissimilar to how DG NOVA
> had shifts with integer arithmetic}

Hmm... And/or (partially) separating the matter of instruction-layout
from opcode semantics?...

So, the instruction is expressed as a combination of "layout" (explicit
in the encoding) and "opcode" (which instruction should be applied to
these parameters).

This could be possible, just sort of implies that all of the major
function units accept the same general interface internally.

Seems like this would have a higher demand for encoding bits than the
strategy I had used, and would lead to a lot of combinations which "are
possible to encode but do not make sense". Though, an intermediate (more
practical) option being to define the table of opcodes per layout.

Say:
fffff-pp-oooo-nnnnn-oo-sssss-ttttt-oooo //3R
fffff-pp-oooo-nnnnn-oo-sssss-iiiii-iiii //Imm9
fffff-pp-oooo-nnnnn-ii-iiiii-iiiii-iiii //Imm16
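As a sanity check on layouts written this way, the field widths can be pulled straight out of the pattern strings. A minimal sketch, assuming the leftmost letter is the most significant bit (the post does not say) and making no claim about what each letter means:

```python
from itertools import groupby

def parse_layout(layout):
    """Turn 'fffff-pp-oooo-...' into a list of (letter, width) fields.
    Adjacent runs of the same letter merge once the dashes are dropped."""
    bits = layout.replace("-", "")
    return [(ch, len(list(run))) for ch, run in groupby(bits)]

def extract(word, layout):
    """Split a 32-bit word into fields, leftmost field = highest bits."""
    fields = parse_layout(layout)
    if sum(width for _, width in fields) != 32:
        raise ValueError("layout does not describe 32 bits")
    out, shift = [], 32
    for name, width in fields:
        shift -= width
        out.append((name, (word >> shift) & ((1 << width) - 1)))
    return out

LAYOUT_3R = "fffff-pp-oooo-nnnnn-oo-sssss-ttttt-oooo"
LAYOUT_IMM9 = "fffff-pp-oooo-nnnnn-oo-sssss-iiiii-iiii"
LAYOUT_IMM16 = "fffff-pp-oooo-nnnnn-ii-iiiii-iiiii-iiii"

for layout in (LAYOUT_3R, LAYOUT_IMM9, LAYOUT_IMM16):
    print(parse_layout(layout))
```

All three patterns add up to 32 bits, and the immediate runs merge into 9-bit and 16-bit fields as the comments on each line suggest.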

Re: More of my philosophy about CISC and RISC instructions..

<7CREM.499490$qnnb.342286@fx11.iad>

https://news.novabbs.org/devel/article-flat.php?id=33741&group=comp.arch#33741

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!tncsrv06.tnetconsulting.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com> <7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com> <ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com> <ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com> <8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com> <8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me> <4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com> <ubqtm2$uqgs$1@dont-email.me> <4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com> <uc01um$1vcn2$1@dont-email.me> <a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com> <uc0dpu$21brh$1@dont-email.me> <bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>
In-Reply-To: <bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 25
Message-ID: <7CREM.499490$qnnb.342286@fx11.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 21 Aug 2023 22:32:35 UTC
Date: Mon, 21 Aug 2023 18:32:27 -0400
X-Received-Bytes: 2340

by: EricP - Mon, 21 Aug 2023 22:32 UTC

MitchAlsup wrote:
> On Monday, August 21, 2023 at 2:26:58 PM UTC-5, BGB wrote:
>>
>> Though, depending on the ISA rules, one could skip a byte LEA and
>> instead encode this case as an ADD.
>>
>> STB, STW, STL, STQ, LEAW, LEAL, LEAQ, STX
> <
> For the MEM Rd,[Rb,Disp16] case I use the signed LDD as the EXIT
> instruction, and the similar place in STD as the ENTER instruction.
> There is no need for LEA, here, as it is redundant with ADD.

Except if Rb is r0 it means the RIP for a LD/ST instruction,
but the data r0 value for an ADD (which would probably be the
return RIP but you can't assume that).
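The distinction can be sketched as follows. This is only a toy model of the rule as stated here: which IP value (current or next instruction) gets used is not specified in this post, so `ip` is just passed in, and the function names are made up:

```python
def mem_base(regs, rb, ip):
    # In a LD/ST address, base register number 0 stands for the instruction pointer.
    return ip if rb == 0 else regs[rb]

def lea_disp(regs, rb, disp, ip):
    # Address arithmetic as a memory instruction would compute it.
    return mem_base(regs, rb, ip) + disp

def add_imm(regs, rs, imm):
    # An ADD reads r0 as an ordinary data register.
    return regs[rs] + imm

regs = [0x1000] + [0x2000] * 31
print(hex(lea_disp(regs, 0, 8, ip=0x4000)))  # 0x4008
print(hex(add_imm(regs, 0, 8)))              # 0x1008
```

For any base other than r0 the two give the same answer, which is why the redundancy argument holds everywhere except this one case.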

> <
> For the MEM Rd,[Rb,Ri<<sc] case I use the signed LDD as LEA,
> because ADD with 2 operands will be seen to be less costly in
> emulating LEA, so LEA is basically reserved for 3-Operand ADDs.
>> ...

Again, except when Rb is r0.

Re: More of my philosophy about CISC and RISC instructions..

<uc0qfl$237eg$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33742&group=comp.arch#33742

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Mon, 21 Aug 2023 18:03:14 -0500
Organization: A noiseless patient Spider
Lines: 123
Message-ID: <uc0qfl$237eg$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
<47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com>
<ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>
<uc01um$1vcn2$1@dont-email.me>
<a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com>
<uc0dpu$21brh$1@dont-email.me>
<bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 21 Aug 2023 23:03:17 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="60048df51a645c346b795ec2e584cf80";
logging-data="2203088"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/FGJs9NZrC/A7zuv3AA5/m"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:vtwPPMx6DhoiRmJknNywBG/O7i4=
In-Reply-To: <bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>
Content-Language: en-US

by: BGB - Mon, 21 Aug 2023 23:03 UTC

On 8/21/2023 5:03 PM, MitchAlsup wrote:
> On Monday, August 21, 2023 at 2:26:58 PM UTC-5, BGB wrote:
>> On 8/21/2023 1:09 PM, MitchAlsup wrote:
>>>
>>
>> It is like:
>> "Why does 32-bit 'ADD Rm, Rn' exist if 'ADD Rm, Ro, Rn' also exists?";
>> (Pause) "Reasons..."
> <
> Some things "fall out for free", such as ADD Rd,Rs1,#0 as a MOV inst
> {along with similar arithmetic identities} and disallowing these costs
> gates and design time for no gain.
> <

Yeah.
MOV Rm, Rn
ADD Rm, 0, Rn
OR Rm, 0, Rn
...
All exist as semantically equivalent ways to do the same thing.

Similarly, "ADD Rm, Rn" is semantically equivalent to "ADD Rn, Rm, Rn".
But, there may be secondary reasons for such things to exist (such as
interactions between other parts of the ISA, or between the ISA and the
compiler).
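The equivalences can be spelled out with a toy register model. A minimal sketch, assuming 64-bit wrapping registers and destination-last operand order as in the listings above; the Python function names are illustrative, not real mnemonics handling:

```python
MASK64 = (1 << 64) - 1

def mov(regs, rm, rn):
    regs[rn] = regs[rm]                       # MOV Rm, Rn

def add_imm(regs, rm, imm, rn):
    regs[rn] = (regs[rm] + imm) & MASK64      # ADD Rm, imm, Rn

def or_imm(regs, rm, imm, rn):
    regs[rn] = regs[rm] | imm                 # OR Rm, imm, Rn

def add3(regs, rm, ro, rn):
    regs[rn] = (regs[rm] + regs[ro]) & MASK64  # ADD Rm, Ro, Rn

def add2(regs, rm, rn):
    regs[rn] = (regs[rn] + regs[rm]) & MASK64  # ADD Rm, Rn

def run(op, *args):
    regs = {"R4": 0x1234, "R5": 0x10}
    op(regs, *args)
    return regs

# Three ways to copy R4 into R5:
print(run(mov, "R4", "R5") == run(add_imm, "R4", 0, "R5") == run(or_imm, "R4", 0, "R5"))
# Two-operand ADD versus the three-operand form with Rn repeated:
print(run(add2, "R4", "R5") == run(add3, "R5", "R4", "R5"))
```

Both comparisons come out equal, which is the sense in which these encodings are redundant at the semantic level even when they differ in encoding cost.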

In a strict sense, you don't need:
MOV.L (Rm, Disp5u), Rn
If you also have:
MOV.L (Rm, Disp9u), Rn

The former can't express anything that the latter can't.

But, then XGPR came along:
With the former, there was an ability to express negative displacements
(a formerly unserved use-case), but the encoding scheme could not
extend 9u in a similar way.

Also quirks with RiMOV, where for the Jumbo and Op64 prefixes it ended
up making sense to have different semantics when applied to the Disp5u
and Disp9u cases (so the operations differ when prefixed, but are
redundant in the form of a basic 32-bit instruction word).

Technically, the 5u encodings also exist in PrWEX whereas the 9u
encodings do not, but this case would only matter if the ability to
encode memory ops in Lane 2 were a thing (this was experimented with
though).

But, as can be noted, BJX2 isn't really "minimalistic" in the same sense
as something like RISC-V or similar (and some instructions that would
have been unnecessary in RISC-V were necessary in BJX2, due to the lack
of an architectural zero register, ...).

>>>>
>>>> Disp8 would still leave "only" LD/ST ops in this case.
>>>>
>>>> Where, say, LD/ST also needs 3 bits to encode the type of value to be
>>>> loaded/stored.
>>> <
>>> LD needs 3 bits, ST only needs 2. Actually LD only needs 2.8 bits
>>> since we don't need both signed and unsigned 64-bit items. Stores
>>> do not need signed and unsigned, just an indication of how-much
>>> to store.
>>>
>> That is why I usually ended up putting LEA's there...
>>
>> A LEA operation is nice to have, but is an issue if one has an 'X'
>> (paired) case.
>>
>> Though, depending on the ISA rules, one could skip a byte LEA and
>> instead encode this case as an ADD.
>>
>> STB, STW, STL, STQ, LEAW, LEAL, LEAQ, STX
> <
> For the MEM Rd,[Rb,Disp16] case I use the signed LDD as the EXIT
> instruction, and the similar place in STD as the ENTER instruction.
> There is no need for LEA, here, as it is redundant with ADD.
> <
> For the MEM Rd,[Rb,Ri<<sc] case I use the signed LDD as LEA,
> because ADD with 2 operands will be seen to be less costly in
> emulating LEA, so LEA is basically reserved for 3-Operand ADDs.

OK.

I have LEA, but it isn't quite so useful as a 3-operand ADD mostly
because it does a "zero extend from low 48 bits" thing (so would only be
useful here for 32-bit unsigned operations; but then may produce
out-of-range results rather than the proper 32-bit wrapping behavior).

Well, and this is why ADDS.L / ADDU.L / SUBS.L / SUBU.L exist, to
preserve the expected wrapping semantics (and not require explicit sign
or zero extensions following the various operations).

Well, and one could argue:
EXTS.L Rm, Rn
Is unnecessary because:
ADDS.L Rm, 0, Rn
Does basically the same thing...
....
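The wrapping behavior can be illustrated with a small model. This is a sketch of the semantics as described in this post only (32-bit result sign- or zero-extended, LEA zero-extended from the low 48 bits); register values are modeled as plain Python integers and the function names simply echo the mnemonics:

```python
def sext32(x):
    """Sign-extend the low 32 bits of x."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def zext32(x):
    """Zero-extend the low 32 bits of x."""
    return x & 0xFFFFFFFF

def adds_l(a, b):
    return sext32(a + b)      # add, then wrap as signed 32-bit

def addu_l(a, b):
    return zext32(a + b)      # add, then wrap as unsigned 32-bit

def exts_l(a):
    return sext32(a)

def lea48(base, index):
    # LEA as described above: result zero-extended from the low 48 bits.
    return (base + index) & ((1 << 48) - 1)

print(adds_l(0x7FFFFFFF, 1))       # -2147483648
print(addu_l(0xFFFFFFFF, 1))       # 0
print(hex(lea48(0xFFFFFFFF, 1)))   # 0x100000000
```

The last line shows the out-of-range case: the LEA-style add carries into bit 32 instead of wrapping, while `exts_l(x)` and `adds_l(x, 0)` agree for every input.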

Well, and stupid stuff, like there have ended up being two semi-redundant
encodings for the BLKUTX2 instruction (as an earlier attempt to relocate
the encoding got botched in my compiler), but at present, I can't remove
either of them without breaking something.

....

Some things could have been a little better here.

But, alas...

>>
>> ...

Re: More of my philosophy about CISC and RISC instructions..

<uc0tbl$23jaf$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33743&group=comp.arch#33743

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Mon, 21 Aug 2023 18:52:18 -0500
Organization: A noiseless patient Spider
Lines: 55
Message-ID: <uc0tbl$23jaf$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com>
<47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com>
<ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com>
<uc01um$1vcn2$1@dont-email.me>
<a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com>
<uc0dpu$21brh$1@dont-email.me>
<bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com>
<7CREM.499490$qnnb.342286@fx11.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 21 Aug 2023 23:52:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="60048df51a645c346b795ec2e584cf80";
logging-data="2215247"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19xeq47QrV/h0FKhFPk2WiS"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:c1i9bR335tQnuvmLLEi9ZCTFz78=
Content-Language: en-US
In-Reply-To: <7CREM.499490$qnnb.342286@fx11.iad>

by: BGB - Mon, 21 Aug 2023 23:52 UTC

On 8/21/2023 5:32 PM, EricP wrote:
> MitchAlsup wrote:
>> On Monday, August 21, 2023 at 2:26:58 PM UTC-5, BGB wrote:
>>>
>>> Though, depending on the ISA rules, one could skip a byte LEA and
>>> instead encode this case as an ADD.
>>> STB, STW, STL, STQ, LEAW, LEAL, LEAQ, STX
>> <
>> For the MEM Rd,[Rb,Disp16] case I use the signed LDD as the EXIT
>> instruction, and the similar place in STD as the ENTER instruction.
>> There is no need for LEA, here, as it is redundant with ADD.
>
> Except if Rb is r0 it means the RIP for a LD/ST instruction,
> but the data r0 value for an ADD (which would probably be the
> return RIP but you can't assume that).
>

Similar applies in my case as well.

Base register:
R0 -> PC
R1 -> GBR

Index register (Rb != R0|R1):
R0: R0, scaled by element size.
R1: R0, but unscaled.

Or, combined:
(R0, R0) -> (PC, R0)
(R0, R1) -> (R0)
(R1, R0) -> (GBR, R0)
(R1, R1) -> (TBR, R0)

At present, the above is the only way to encode TBR as a base, but this
isn't a huge loss as typically the only reason to use TBR as a
base-register is to access context variables or TLS or similar (serving
a similar role to the FS/GS segments on x86).
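The special cases listed above can be written down as a lookup. This is one reading of the tables, not a description of the actual decoder: the function name is made up, and the scaling is reported as `None` wherever the post does not state it:

```python
# Combined (base, index) encodings and the registers they select.
COMBINED = {
    ("R0", "R0"): ("PC", "R0"),
    ("R0", "R1"): ("R0", None),
    ("R1", "R0"): ("GBR", "R0"),
    ("R1", "R1"): ("TBR", "R0"),
}

def resolve(base, index):
    """Return (base_used, index_used, scaling) for encoded register names.
    scaling is 'scaled', 'unscaled', or None where the post does not say."""
    if (base, index) in COMBINED:
        b, i = COMBINED[(base, index)]
        return (b, i, None)
    if base in ("R0", "R1"):
        return ({"R0": "PC", "R1": "GBR"}[base], index, None)
    if index == "R0":
        return (base, "R0", "scaled")     # R0 as index: scaled by element size
    if index == "R1":
        return (base, "R0", "unscaled")   # R1 as index: R0, but unscaled
    return (base, index, None)

print(resolve("R1", "R1"))   # ('TBR', 'R0', None)
print(resolve("R4", "R1"))   # ('R4', 'R0', 'unscaled')
```

Written this way it is easy to see that TBR only appears in the `(R1, R1)` row.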

>> <
>> For the MEM Rd,[Rb,Ri<<sc] case I use the signed LDD as LEA,
>> because ADD with 2 operands will be seen to be less costly in
>> emulating LEA, so LEA is basically reserved for 3-Operand ADDs.
>>> ...
>
> Again, except when Rb is r0.
>
>

Same...

LEA and ADD will have different behaviors here in my case as well...

Re: More of my philosophy about CISC and RISC instructions..

<6e83740c-b80b-4ba1-8249-9a9cfa28ddd3n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33744&group=comp.arch#33744

X-Received: by 2002:a05:6214:a0c:b0:640:5bf2:f7d1 with SMTP id dw12-20020a0562140a0c00b006405bf2f7d1mr50248qvb.1.1692662186786;
Mon, 21 Aug 2023 16:56:26 -0700 (PDT)
X-Received: by 2002:a17:902:fb03:b0:1bc:2547:b17c with SMTP id
le3-20020a170902fb0300b001bc2547b17cmr3476239plb.1.1692662186344; Mon, 21 Aug
2023 16:56:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 21 Aug 2023 16:56:25 -0700 (PDT)
In-Reply-To: <uc0o08$22t34$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:fb:e091:a8ab:d83e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:fb:e091:a8ab:d83e
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me> <4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me> <299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me> <7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me> <13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
<ubu3vh$1ies4$1@dont-email.me> <5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>
<uc0o08$22t34$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6e83740c-b80b-4ba1-8249-9a9cfa28ddd3n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Aug 2023 23:56:26 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5707

by: MitchAlsup - Mon, 21 Aug 2023 23:56 UTC

On Monday, August 21, 2023 at 5:21:01 PM UTC-5, BGB wrote:
> On 8/20/2023 5:57 PM, MitchAlsup wrote:
>
>
> But, say, maybe it could be possible, instead of writing, say:
> MOV 0x3FF0000000000000, R4
> FLDCH 0x3C00, R5
> One could write:
> MOV 1.0D, R4 //Binary64
> FLDCH 1.0H, R5 //Binary16
<
In my case, the space efficient code is:
<
CVTSD Rd,#1 // ConVerT signed to double 1-word
or
CVTFD Rd,13.7E0 // Convert float to double 2-words
<
In practice, these rarely show up except when passing arguments to
subroutines or results back from functions.
>
>

>
> 12-bits is 96% of local (intra function) branches, but only 19% of
> global branches (a mixture of function calls, and the backwards branches
> for prolog/epilog compression).
<
Is this statically linked or dynamically linked ??

>
> Though, one other tradeoff is that these ops would mostly be useful for
> loops like:
> while(n--) { ... }
> Or:
> while(p) { ... }
>
> But, not so much:
> for(i=0; i<n; i++)
> { ... }
<
My LOOP OpCodes cover all of these.
>
> Where, in this case, the relative usefulness of a dedicated Disp12
> compare-with-0 branch would also depend on the relative usage of the
> former vs the latter.
<
Compare with anything you want, use any integer comparison you like
{#0, #integer, Rc},....
> >>
> >> So, it is more a tradeoff between burning encoding space, vs needing a
> >> 64-bit encoding for these.
> > <
> > I don't see it as an encoding space issue, I see it as a variable length constant
> > routing problem from instruction buffer to function unit as part of "forwarding".
> > So, the majority of instructions (able to be encoded) have a routing OpCode
> > in addition to a Calculation OpCode. Instructions with 16-bit immediates have
> > a canned routing OpCode.
> > <
> > You can consider the routing OpCode as treating "forwarding" as another
> > calculation performed prior to execution. {Not dissimilar to how DG NOVA
> > had shifts with integer arithmetic}
<
> Hmm... And/or (partially) separating the matter of instruction-layout
> from opcode semantics?...
<
To do this efficiently in smaller implementations, the decode of this set of
bits has to be of small gate count.
>
> So, the instruction is expressed as a combination of "layout" (explicit
> in the encoding) and "opcode" (which instruction should be applied to
> these parameters).
<
I just use the word "modifiers" to access constants, change the sign,
specify which operand the constant is routed to,....
>
> This could be possible, just sort of implies that all of the major
> function units accept the same general interface internally.
<
Not at all, I have FUs that accept {1,2,3}-operand, and deliver {0,1,2}-results.
The 2nd result is special and is used to support CARRY without adding
register ports to the design.
>
> Seems like this would have a higher demand for encoding bits than the
> strategy I had used, and would lead to a lot of combinations which "are
> possible to encode but do not make sense". Though, an intermediate (more
> practical) option being to define the table of opcodes per layout.
<
It is the mapping of the bits to the decoded table of "what to do" to "where
to do it" that is important, as you should have gathered from the OpCode layout
I illustrated a couple of days ago.
>

Re: More of my philosophy about CISC and RISC instructions..

<7dc335ee-647a-45b0-a463-6a9132cc905en@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33745&group=comp.arch#33745

Newsgroups: comp.arch
Date: Mon, 21 Aug 2023 16:58:29 -0700 (PDT)
In-Reply-To: <7CREM.499490$qnnb.342286@fx11.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:fb:e091:a8ab:d83e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:fb:e091:a8ab:d83e
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com> <ubqtm2$uqgs$1@dont-email.me>
<4826e253-d7c4-4b5e-98b4-8b51ee9e4a88n@googlegroups.com> <uc01um$1vcn2$1@dont-email.me>
<a875ad5b-56e5-4a57-8b59-406fbe0ab970n@googlegroups.com> <uc0dpu$21brh$1@dont-email.me>
<bd354c97-a238-43f3-bc9d-c316e84cad2cn@googlegroups.com> <7CREM.499490$qnnb.342286@fx11.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7dc335ee-647a-45b0-a463-6a9132cc905en@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 21 Aug 2023 23:58:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 28

by: MitchAlsup - Mon, 21 Aug 2023 23:58 UTC

On Monday, August 21, 2023 at 5:32:39 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Monday, August 21, 2023 at 2:26:58 PM UTC-5, BGB wrote:
> >>
> >> Though, depending on the ISA rules, one could skip a byte LEA and
> >> instead encode this case as an ADD.
> >>
> >> STB, STW, STL, STQ, LEAW, LEAL, LEAQ, STX
> > <
> > For the MEM Rd,[Rb,Disp16] case I use the signed LDD as the EXIT
> > instruction, and the similar place in STD as the ENTER instruction.
> > There is no need for LEA, here, as it is redundant with ADD.
<
> Except if Rb is r0 it means the RIP for a LD/ST instruction,
> but the data r0 value for an ADD (which would probably be the
> return RIP but you can't assume that).
<
An accepted liability.
>
> > <
> > For the MEM Rd,[Rb,Ri<<sc] case I use the signed LDD as LEA,
> > because ADD with 2 operands will be seen to be less costly in
> > emulating LEA, so LEA is basically reserved for 3-Operand ADDs.
> >> ...
> Again for Rb is r0
<
Since R0 arrives at a subroutine carrying the return address, Brian's
compiler seldom finds a need to use R0 as a GPR. So, this seldom
falls from grace.

BGB wrote:
> Paeth filter (from memory) is something like:
> P=A+B-C
> dA=abs(P-A)
> dB=abs(P-B)
> dC=abs(P-C)
> if(dA<dB)
> {
>      if(dA<dC)
>        { D=A; }
>      else if(dB<dC)
>        { D=B; }
>      else
>        { D=C; }
> }else
> {
>      if(dB<dC)
>        { D=B; }
>      else
>        { D=C; }
> }

So effectively (using 0/-1 for false/true)

a_less_b = dA<dB
a_less_c = dA<dC
b_less_c = dB<dC

select_a = a_less_b & a_less_c
select_b = ^a_less_b & b_less_c
select_c = ^a_less_c & ^b_less_c

I.e. you find the smallest of the three dX values and pick the
corresponding X?
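
A minimal C sketch of the mask formulation above, assuming 8-bit samples held in int. The function name paeth_mask is made up for illustration, and the strict '<' comparisons are kept exactly as in the from-memory version, so ties resolve the way that if/else tree does:

```c
#include <assert.h>
#include <stdlib.h>

/* Branch-free selection: each comparison becomes an all-zeros (false)
 * or all-ones (true, i.e. -1) mask, as in the select_a/b/c equations. */
static int paeth_mask(int a, int b, int c)
{
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255 && c >= 0 && c <= 255);

    int p  = a + b - c;
    int da = abs(p - a);
    int db = abs(p - b);
    int dc = abs(p - c);

    int a_less_b = -(da < db);          /* 0 or -1 */
    int a_less_c = -(da < dc);
    int b_less_c = -(db < dc);

    int select_a =  a_less_b &  a_less_c;
    int select_b = ~a_less_b &  b_less_c;
    int select_c = ~a_less_c & ~b_less_c;

    /* Exactly one of the three masks is all-ones for any input. */
    return (a & select_a) | (b & select_b) | (c & select_c);
}
```

For instance, paeth_mask(10, 50, 20) has deltas 30, 10, 20 and returns 50, while paeth_mask(128, 176, 160) has dA == dC == 16 and falls through to C (160).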

If you have a vector MIN/MAX which is twice as wide as the values
involved, then it is tempting to put the dX values in the top half and X
in the bottom, and then just return the bottom half?

This presumes that it would be OK to return the smaller value if two
deltas are equal!

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

On 8/22/2023 2:03 PM, Terje Mathisen wrote:
> BGB wrote:
>> Paeth filter (from memory) is something like:
>>    P=A+B-C
>>    dA=abs(P-A)
>>    dB=abs(P-B)
>>    dC=abs(P-C)
>>    if(dA<dB)
>>    {
>>       if(dA<dC)
>>         { D=A; }
>>       else if(dB<dC)
>>         { D=B; }
>>       else
>>         { D=C; }
>>    }else
>>    {
>>       if(dB<dC)
>>         { D=B; }
>>       else
>>         { D=C; }
>>    }
>
> So effectively (using 0/-1 for false/true)
>
> a_less_b = dA<dB
> a_less_c = dA<dC
> b_less_c = dB<dC
>
> select_a = a_less_b & a_less_c
> select_b = ^a_less_b & b_less_c
> select_c = ^a_less_c & ^b_less_c
>
> I.e. you find the smallest of the three dX values and pick the
> corresponding X?
>

Yeah.

Paeth is basically "pick whichever of the 3 inputs is closest to the
target A+B-C prediction..."

There are ways to do it faster than the use of if/else branches on more
conventional targets, granted.

An ISA with conditional select or predication though can handle this
transform more efficiently without a need to resort to implementing it
via bit-masking or similar.

Would have also been nice if PNG also had a plain A+B-C predictor, but
alas...

Decided to leave out going into a thing about various approaches to
lossy and lossless image compression.

Eg (small summary):
PNG like, optimized for synthetic and lossless (normal PNG)
PNG like, but more optimized for natural images and lossy
No "real world" examples of this category, but can "sorta work" (*1)
JPEG like, optimized for lossless
Typically replacing DCT with WHT or similar.
JPEG like, but optimized for natural images and lossy
Eg: T.81 JPEG, some of the newer "JPEG replacements" (like WebP)
Wavelet-based formats (eg: JPEG-2000)
...

*1: Can basically end up looking sort of like PNG with some parts from
FLAC and ADPCM glued on (namely a small FIR filter and dynamic adaptive
quantization; possibly using a Rice-coder, ...).
Have had OK results with some past experiments in these areas, but no
mainstream image formats seem to work this way.

Granted, one isn't terribly likely to dethrone JPEG in either speed or
Q/bpp with this, but it is possible to pull something like this off with
significantly less code (IME, one is looking usually at roughly 2kLOC or
so for something like a T.81 JPEG codec; vs, say, something one can
implement in around 500 lines or so).

> If you have a vector MIN/MAX which is twice as wide as the values
> involved, then it is tempting to put the dX values in the top half and X
> in the bottom, and then just return the bottom half?
>
> This presumes that it would be OK to return the smaller value if two
> deltas are equal!
>

Yeah. The "what happens if two deltas are equal" case is something one
has to get correct if they want a PNG implementation to be able to
encode/decode images without them turning into an ugly looking mess.
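
A rough C sketch of the packed-key MIN idea, to make the tie behaviour concrete. It assumes 8-bit samples and scalar code standing in for a vector MIN; the function names and the 2-bit rank field in the second variant are illustrative additions, not something proposed in the thread:

```c
#include <assert.h>
#include <stdlib.h>

static unsigned min_u(unsigned x, unsigned y) { return x < y ? x : y; }

/* Plain packed key: delta in the high bits, sample in the low 8 bits.
 * On equal deltas the numerically smaller sample wins. */
static int paeth_packed(int a, int b, int c)
{
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255 && c >= 0 && c <= 255);
    int p = a + b - c;
    unsigned ka = ((unsigned)abs(p - a) << 8) | (unsigned)a;
    unsigned kb = ((unsigned)abs(p - b) << 8) | (unsigned)b;
    unsigned kc = ((unsigned)abs(p - c) << 8) | (unsigned)c;
    return (int)(min_u(min_u(ka, kb), kc) & 0xFFu);
}

/* Variant with a 2-bit rank between delta and sample, so ties resolve
 * by rank rather than by value. Ranks C=0, B=1, A=2 reproduce the tie
 * order of the strict '<' if/else tree quoted above (an assumption of
 * this sketch; other tie orders only need different ranks). */
static int paeth_packed_ranked(int a, int b, int c)
{
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255 && c >= 0 && c <= 255);
    int p = a + b - c;
    unsigned ka = ((unsigned)abs(p - a) << 10) | (2u << 8) | (unsigned)a;
    unsigned kb = ((unsigned)abs(p - b) << 10) | (1u << 8) | (unsigned)b;
    unsigned kc = ((unsigned)abs(p - c) << 10) | (0u << 8) | (unsigned)c;
    return (int)(min_u(min_u(ka, kb), kc) & 0xFFu);
}
```

With A=128, B=176, C=160 the deltas are 16, 32, 16: the plain key returns 128 (smaller value), the ranked key returns 160. The deltas never exceed 510, so both keys fit comfortably in 32 bits.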

> Terje
>
>
>

Re: More of my philosophy about CISC and RISC instructions..

<560d0b40-cbdc-463f-87a0-f1a0368607d2n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33762&group=comp.arch#33762

Newsgroups: comp.arch
Date: Tue, 22 Aug 2023 15:39:29 -0700 (PDT)
In-Reply-To: <uc30qv$2hebs$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b1d0:3ff5:2adf:5c0c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b1d0:3ff5:2adf:5c0c
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<2fc528c1-c0d4-4f20-8ce9-5845e9b805e0n@googlegroups.com> <%EMDM.147258$X02a.70096@fx46.iad>
<ubqv57$v2re$1@dont-email.me> <bb790143-18f5-4865-b162-5a0da094a273n@googlegroups.com>
<ubu0it$1hvac$1@dont-email.me> <uc30qv$2hebs$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <560d0b40-cbdc-463f-87a0-f1a0368607d2n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Tue, 22 Aug 2023 22:39:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3900

by: MitchAlsup - Tue, 22 Aug 2023 22:39 UTC

On Tuesday, August 22, 2023 at 2:04:03 PM UTC-5, Terje Mathisen wrote:
> BGB wrote:
> > Paeth filter (from memory) is something like:
> > P=A+B-C
> > dA=abs(P-A)
> > dB=abs(P-B)
> > dC=abs(P-C)
> > if(dA<dB)
> > {
> > if(dA<dC)
> > { D=A; }
> > else if(dB<dC)
> > { D=B; }
> > else
> > { D=C; }
> > }else
> > {
> > if(dB<dC)
> > { D=B; }
> > else
> > { D=C; }
> > }
> So effectively (using 0/-1 for false/true)
>
> a_less_b = dA<dB
> a_less_c = dA<dC
> b_less_c = dB<dC
>
> select_a = a_less_b & a_less_c
> select_b = ^a_less_b & b_less_c
> select_c = ^a_less_c & ^b_less_c
<
Just for fun::
<
CMP Rab,Ra,Rb
CMP Rac,Ra,Rc
CMP Rbc,Rb,Rc
SLA Ralb,Rab,<1,LT>
SLA Ralc,Rac,<1,LT>
SLA Rblc,Rbc,<1,LT>
AND Rsa,Ralb,Ralc
AND Rsb,~Ralb,Rblc
AND Rsc,~Ralc,~Rblc
// but we have not selected D yet.
<
Presto !!
<
But it occurs to me that this is even better::
<
CMP Rab,Ra,Rb
CMP Rac,Ra,Rc
CMP Rbc,Rb,Rc
SLL Ralb,Rab,<1,LT>
SLL Rblc,Rbc,<1,LT>
CMOV Rd,Ra,Rb,Ralb
CMOV Rd,Rd,Rc,Rclb
// and we have selected D
>
> I.e. you find the smallest of the three dX values and pick the
> corresponding X?
>
> If you have a vector MIN/MAX which is twice as wide as the values
> involved, then it is tempting to put the dX values in the top half and X
> in the bottom, and then just return the bottom half?
<
MIN Rd,Ra,Rb
MIN Rd,Rd,Rc
<
And we have a winner. Moral: express your code correctly.
>
> This presumes that it would be OK to return the smaller value if two
> deltas are equal!
<
Exactly what do you think "equal" means--in almost all circumstances
equal means one can replace the other (except IEEE ±0)
>
> Terje
>
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: More of my philosophy about CISC and RISC instructions..

<41e0e83e-8994-4483-a928-757037f55e2an@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33763&group=comp.arch#33763

Newsgroups: comp.arch
Date: Tue, 22 Aug 2023 15:42:34 -0700 (PDT)
In-Reply-To: <uc3735$2ig89$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b1d0:3ff5:2adf:5c0c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b1d0:3ff5:2adf:5c0c
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<2fc528c1-c0d4-4f20-8ce9-5845e9b805e0n@googlegroups.com> <%EMDM.147258$X02a.70096@fx46.iad>
<ubqv57$v2re$1@dont-email.me> <bb790143-18f5-4865-b162-5a0da094a273n@googlegroups.com>
<ubu0it$1hvac$1@dont-email.me> <uc30qv$2hebs$1@dont-email.me> <uc3735$2ig89$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <41e0e83e-8994-4483-a928-757037f55e2an@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Tue, 22 Aug 2023 22:42:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5474

by: MitchAlsup - Tue, 22 Aug 2023 22:42 UTC

On Tuesday, August 22, 2023 at 3:50:49 PM UTC-5, BGB wrote:
> On 8/22/2023 2:03 PM, Terje Mathisen wrote:
> > BGB wrote:
> >> Paeth filter (from memory) is something like:
> >> P=A+B-C
> >> dA=abs(P-A)
> >> dB=abs(P-B)
> >> dC=abs(P-C)
> >> if(dA<dB)
> >> {
> >> if(dA<dC)
> >> { D=A; }
> >> else if(dB<dC)
> >> { D=B; }
> >> else
> >> { D=C; }
> >> }else
> >> {
> >> if(dB<dC)
> >> { D=B; }
> >> else
> >> { D=C; }
> >> }
> >
> > So effectively (using 0/-1 for false/true)
> >
> > a_less_b = dA<dB
> > a_less_c = dA<dC
> > b_less_c = dB<dC
> >
> > select_a = a_less_b & a_less_c
> > select_b = ^a_less_b & b_less_c
> > select_c = ^a_less_c & ^b_less_c
> >
> > I.e. you find the smallest of the three dX values and pick the
> > corresponding X?
> >
> Yeah.
>
> Paeth is basically "pick whichever of the 3 inputs is closest to the
> target A+B-C prediction..."
>
> There are ways to do it faster than the use of if/else branches on more
> conventional targets, granted.
>
> An ISA with conditional select or predication though can handle this
> transform more efficiently without a need to resort to implementing it
> via bit-masking or similar.
>
> Would have also been nice if PNG also had a plain A+B-C predictor, but
> alas...
>
>
>
> Decided to leave out going into a thing about various approaches to
> lossy and lossless image compression.
>
> Eg (small summary):
> PNG like, optimized for synthetic and lossless (normal PNG)
> PNG like, but more optimized for natural images and lossy
> No "real world" examples of this category, but can "sorta work" (*1)
> JPEG like, optimized for lossless
> Typically replacing DCT with WHT or similar.
> JPEG like, but optimized for natural images and lossy
> Eg: T.81 JPEG, some of the newer "JPEG replacements" (like WebP)
> Wavelet-based formats (eg: JPEG-2000)
> ...
<
It occurs to me that if you are doing enough of these to matter, that in
the same way one would offload Texture, or Cyphers, one would offload
image compression. Then the nuances of ISA don't matter.
>
> *1: Can basically end up looking sort of like PNG with some parts from
> FLAC and ADPCM glued on (namely a small FIR filter and dynamic adaptive
> quantization; possibly using a Rice-coder, ...).
> Have had OK results with some past experiments in these areas, but no
> mainstream image formats seem to work this way.
>
> Granted, one isn't terribly likely to dethrone JPEG in either speed or
> Q/bpp with this, but it is possible to pull something like this off with
> significantly less code (IME, one is looking usually at roughly 2kLOC or
> so for something like a T.81 JPEG codec; vs, say, something one can
> implement in around 500 lines or so).
> > If you have a vector MIN/MAX which is twice as wide as the values
> > involved, then it is tempting to put the dX values in the top half and X
> > in the bottom, and then just return the bottom half?
> >
> > This presumes that it would be OK to return the smaller value if two
> > deltas are equal!
> >
> Yeah. The "what happens if two deltas are equal" case is something one
> has to get correct if they want a PNG implementation to be able to
> encode/decode images without them turning into an ugly looking mess.
>
>
> > Terje
> >
> >
> >

On 8/19/23 12:31 PM, MitchAlsup wrote:
[snip]
> I see not giving full access to the whole RF as a poor choice,
> Feel free to disagree with me. {There are too many register
> allocation problems without having artificial boundaries in
> use of registers. You might have set up a situation where you
> have to register allocate from one virtual RF space to another
> virtual RF space before allocating into the physical RF space.}

I disagree, but I also think the preference depends on the weight
given to various tradeoffs.

The tradeoffs will vary based on architecture targets. A more
specialized architecture (e.g., microcontroller-only) or more
focused architecture (e.g., primarily "server workloads" but with
adequate function for personal computing) could favor different
tradeoffs.

For some targets (many microcontroller uses) total code size
(including constant data) is very important. For some targets
instruction bandwidth and possibly size for cold and/or luke-warm
code is significant. For some targets code size is not
significant.

Since the compiler's work in register allocation can be
"cached"/reused for many executions, I feel spending more work at
compile time (and compiler development time) can be justified.

Limiting register names seems least problematic for uncommonly
used operations. Having a longer form of all operations that
includes all the register names would also seem to moderate the
negative effect of shorter encodings at the cost of more complex
decoding and opcode space (which can then take back some of the
code density advantage).

While you, Mitch, have argued persuasively for a unified register
set, there are some benefits to architectural specialization. Of
course, microarchitectural specialization can be applied if there
is a natural idiom which can be easily detected. An artificial
convention (optimization recommendation) can also provide such an
idiom.

E.g., providing a stack cache (or partial frame cache) would be
easier if the stack pointer was known to the microarchitecture.
(In theory, a stack pointer register could be "predicted" by
looking at the memory access pattern, but that seems pointlessly
complex and would probably make microarchitectural optimizations
based on that information not worthwhile.) In this case, there
seems little (no?) difference between convention and architecture,
but in other cases there would be.

(Even software idioms can be almost as difficult to change as
explicit interfaces. One programming concept that came to mind
which _might_ moderate this issue would be presenting a generic
expression of intent and "overloading" expressions with valid
specific implementations. This seems a little like runtime
dispatch choice for supporting non-universal features, though
such have the choice based on feature absence/presence rather
than a compiler choice based on optimization goals presented
at compile time.)

On 8/18/23 1:52 PM, MitchAlsup wrote:
> On Friday, August 18, 2023 at 1:10:37 AM UTC-5, BGB wrote:
[snip]
>> And, admittedly, on the other side, not as many people are as likely to
>> agree to my sentiment that 9-bits for most immediate and displacement
>> fields is "mostly sufficient".
> <
> I agree it is "mostly sufficient", but wouldn't you rather have "almost entirely
> sufficient" instead of "mostly sufficient" ?? i.e., 16-bits

I think it also depends on the cost of going beyond the base
level. If one needs to use several instructions to "paste
together" a larger immediate, then "mostly sufficient" is likely
to be excessively painful. If it means a 48-bit instruction rather
than a 32-bit instruction, then "mostly sufficient" might be more
reasonable. If it means a 64-bit instruction ...

Since base immediate sizes (and ways of extending immediates)
interact with other aspects of instruction encoding, the
tradeoffs do not seem limited to instruction count and code size
for instructions using immediates. For variable length encoding,
parcel size would influence choices. For packet-oriented encodings
(even with cross-packet borrowing of immediate bits), the choices
would likely be different.

Bits not used for immediates are available to other operand value
encodings and to opcodes. Bit field arrangement will influence
decode/operand routing complexity. (The choices are further
complicated by the possibility — for some targets — of caching a
predecoded version of instructions. Increasing cache miss latency
and cache size from predecoding might be worthwhile, depending not
only on the specific microarchitectural targets and the benefit of
the specific predecode but also considering partial cost overlaps
for other changes. Engineering seems to get complex very easily.)

Re: More of my philosophy about CISC and RISC instructions..

<uc3f0g$2jjeu$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33766&group=comp.arch#33766

From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Tue, 22 Aug 2023 19:05:52 -0400
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <uc3f0g$2jjeu$2@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<ubeclc$2gi1m$1@dont-email.me>
<36e3c863-a199-4824-9668-1d1c30227baan@googlegroups.com>
<ubj6bh$2ek2d$2@newsreader4.netcologne.de>
<5caa71f9-d744-461b-96f4-3fd4d2e3a108n@googlegroups.com>
<ubjf63$2eqg6$2@newsreader4.netcologne.de>
<b42c7084-798d-418f-af89-0f454a296e9bn@googlegroups.com>
<19aaa95e-047d-48f4-a6ff-0f60fed9d054n@googlegroups.com>
<b209864b-39cc-4796-8a8a-57c72a491d23n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 22 Aug 2023 23:05:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9bd4327076c3f8cd0412db6e318757da";
logging-data="2739678"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18i1toGV2G93MWoPyxmaUqVY/Y4VglOwHk="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:5Kf0ca4Dy9cphdYb6wIL1nfC0iI=
X-Mozilla-News-Host: news://news.eternal-september.org
In-Reply-To: <b209864b-39cc-4796-8a8a-57c72a491d23n@googlegroups.com>

by: Paul A. Clayton - Tue, 22 Aug 2023 23:05 UTC

On 8/16/23 9:25 PM, MitchAlsup wrote:
[snip]
> You cannot overcome incompetence with arrogance.

That is true with respect to overcoming the _engineering_ effect
of incompetence. However, organizationally, arrogance — at least
the self-confidence aspect — seems rather effective in the short
term.

> and
> Leading with arrogance often implies a base of incompetence.

Arrogance also seems to promote incompetence by discouraging
disagreement and exploration of different perspectives.

On 8/22/2023 5:39 PM, MitchAlsup wrote:
> On Tuesday, August 22, 2023 at 2:04:03 PM UTC-5, Terje Mathisen wrote:
>> BGB wrote:
>>> Paeth filter (from memory) is something like:
>>> P=A+B-C
>>> dA=abs(P-A)
>>> dB=abs(P-B)
>>> dC=abs(P-C)
>>> if(dA<dB)
>>> {
>>> if(dA<dC)
>>> { D=A; }
>>> else if(dB<dC)
>>> { D=B; }
>>> else
>>> { D=C; }
>>> }else
>>> {
>>> if(dB<dC)
>>> { D=B; }
>>> else
>>> { D=C; }
>>> }
>> So effectively (using 0/-1 for false/true)
>>
>> a_less_b = dA<dB
>> a_less_c = dA<dC
>> b_less_c = dB<dC
>>
>> select_a = a_less_b & a_less_c
>> select_b = ^a_less_b & b_less_c
>> select_c = ^a_less_c & ^b_less_c
> <
> Just for fun::
> <
> CMP Rab,Ra,Rb
> CMP Rac,Ra,Rc
> CMP Rbc,Rb,Rc
> SLA Ralb,Rab,<1,LT>
> SLA Ralc,Rac,<1,LT>
> SLA Rblc,Rbc,<1,LT>
> AND Rsa,Ralb,Ralc
> AND Rsb,~Ralb,Rblc
> AND Rsc,~Ralc,~Rblc
> // but we have not selected D yet.
> <
> Presto !!
> <
> But it occurs to me that this is even better::
> <
> CMP Rab,Ra,Rb
> CMP Rac,Ra,Rc
> CMP Rbc,Rb,Rc
> SLL Ralb,Rab,<1,LT>
> SLL Rblc,Rbc,<1,LT>
> CMOV Rd,Ra,Rb,Ralb
> CMOV Rd,Rd,Rc,Rclb
> // and we have selected D
>>
>> I.e. you find the smallest of the three dX values and pick the
>> corresponding X?
>>
>> If you have a vector MIN/MAX which is twice as wide as the values
>> involved, then it is tempting to put the dX values in the top half and X
>> in the bottom, and then just return the bottom half?
> <
> MIN Rd,Ra,Rb
> MIN Rd,Rd,Rc
> <
> And we have a winner. Moral: express your code correctly.
>>
>> This presumes that it would be OK to return the smaller value if two
>> deltas are equal!
> <
> Exactly what do you think "equal" means--in almost all circumstances
> equal means one can replace the other (except IEEE ±0)

If they are on the same side...

One can have cases where two deltas are equal, but on opposite sides of
the predictor, in which case the relative order in which the selections
are chosen will matter.

Say:
A=128, B=176, C=160,
P=144

Both dA and dC would be 16, but results would differ if one selects 128
or 160.

One version may result in a correct image, another with a progressively
increasing error (starting at first as a colored streak which then
steadily increases in intensity as more errors accumulate and then
garbles the whole rest of the image).
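
To make the tie case concrete, here is a small C comparison. The PNG specification resolves ties with '<=' tests in the order A, then B, then C; the strict '<' tree quoted from memory earlier in the thread resolves them differently. The sketch assumes A, B, C map to PNG's left, above, and upper-left samples, and the function names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Tie order as in the PNG specification: prefer A, then B, then C. */
static int paeth_png(int a, int b, int c)
{
    int p  = a + b - c;
    int pa = abs(p - a);
    int pb = abs(p - b);
    int pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc)             return b;
    return c;
}

/* The strict '<' tree as quoted from memory in the thread. */
static int paeth_strict(int a, int b, int c)
{
    int p  = a + b - c;
    int da = abs(p - a);
    int db = abs(p - b);
    int dc = abs(p - c);
    if (da < db) {
        if (da < dc) return a;
        if (db < dc) return b;
        return c;
    }
    return (db < dc) ? b : c;
}

/* Returns 1 when both rules agree for the given neighbours. */
static int paeth_rules_agree(int a, int b, int c)
{
    assert(a >= 0 && b >= 0 && c >= 0);
    return paeth_png(a, b, c) == paeth_strict(a, b, c);
}
```

For A=128, B=176, C=160 (P=144, dA=dC=16) paeth_png returns 128 and paeth_strict returns 160, so an encoder using one rule and a decoder using the other would diverge from that pixel onward.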

>>
>> Terje
>>
>>
>>
>> --
>> - <Terje.Mathisen at tmsw.no>
>> "almost all programming can be viewed as an exercise in caching"

On 8/22/2023 5:42 PM, MitchAlsup wrote:
> On Tuesday, August 22, 2023 at 3:50:49 PM UTC-5, BGB wrote:
>> On 8/22/2023 2:03 PM, Terje Mathisen wrote:
>>> BGB wrote:
>>>> Paeth filter (from memory) is something like:
>>>> P=A+B-C
>>>> dA=abs(P-A)
>>>> dB=abs(P-B)
>>>> dC=abs(P-C)
>>>> if(dA<dB)
>>>> {
>>>> if(dA<dC)
>>>> { D=A; }
>>>> else if(dB<dC)
>>>> { D=B; }
>>>> else
>>>> { D=C; }
>>>> }else
>>>> {
>>>> if(dB<dC)
>>>> { D=B; }
>>>> else
>>>> { D=C; }
>>>> }
>>>
>>> So effectively (using 0/-1 for false/true)
>>>
>>> a_less_b = dA<dB
>>> a_less_c = dA<dC
>>> b_less_c = dB<dC
>>>
>>> select_a = a_less_b & a_less_c
>>> select_b = ^a_less_b & b_less_c
>>> select_c = ^a_less_c & ^b_less_c
>>>
>>> I.e. you find the smallest of the three dX values and pick the
>>> corresponding X?
>>>
>> Yeah.
>>
>> Paeth is basically "pick whichever of the 3 inputs is closest to the
>> target A+B-C prediction..."
>>
>> There are ways to do it faster than the use of if/else branches on more
>> conventional targets, granted.
>>
>> An ISA with conditional select or predication though can handle this
>> transform more efficiently without a need to resort to implementing it
>> via bit-masking or similar.
>>
>> Would have also been nice if PNG also had a plain A+B-C predictor, but
>> alas...
>>
>>
>>
>> Decided to leave out going into a thing about various approaches to
>> lossy and lossless image compression.
>>
>> Eg (small summary):
>> PNG like, optimized for synthetic and lossless (normal PNG)
>> PNG like, but more optimized for natural images and lossy
>> No "real world" examples of this category, but can "sorta work" (*1)
>> JPEG like, optimized for lossless
>> Typically replacing DCT with WHT or similar.
>> JPEG like, but optimized for natural images and lossy
>> Eg: T.81 JPEG, some of the newer "JPEG replacements" (like WebP)
>> Wavelet-based formats (eg: JPEG-2000)
>> ...
> <
> It occurs to me that if you are doing enough of these to matter, that in
> the same way one would offload Texture, or Cyphers, one would offload
> image compression. Then the nuances of ISA don't matter.

Dedicated Paeth instruction? Probably doable...
Full image codec, a little harder.

If one wanted to design a codec to make it easy to pull off a lot of the
implementation in hardware, and wanted something "sort of JPEG-like",
could make sense to build the codec around Rice-coding and a 4x4 WHT (a
4x4 WHT being a bit cheaper/easier to pull off in hardware vs an 8x8 DCT).

Partly to compensate for the smaller block size, could make sense to use
a Paeth predictor for the block DC coefficients rather than simply
encoding the difference from the previous DC.

If the format supports 1:1:1 sub-sampling and a reversible color
transform, the format can be made lossless as well.

Eg:
Y=(2*G+R+B)/4, U=(B-G)+128, V=(R-G)+128
Or:
Y=(8*G+5*R+3*B)/16, U=(B-Y)+128, V=(R-Y)+128
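
A sketch in C of why the first variant round-trips exactly, assuming 8-bit R, G, B and U/V stored with enough range (they span -127..383, so 9 bits or modular arithmetic would be needed in a real format). Function names are illustrative. The key step is Y = G + floor((dR+dB)/4) with dR = R-G and dB = B-G, so G can be recovered from Y, U and V:

```c
#include <assert.h>

/* Floor division by 4, well defined for negative sums in portable C. */
static int floor_div4(int s)
{
    return (s >= 0) ? (s / 4) : -((-s + 3) / 4);
}

static void rct_forward(int r, int g, int b, int *y, int *u, int *v)
{
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    *y = (2 * g + r + b) / 4;   /* numerator is non-negative, so '/' floors */
    *u = (b - g) + 128;
    *v = (r - g) + 128;
}

static void rct_inverse(int y, int u, int v, int *r, int *g, int *b)
{
    int db = u - 128;           /* B - G */
    int dr = v - 128;           /* R - G */
    *g = y - floor_div4(dr + db);
    *b = db + *g;
    *r = dr + *g;
}

/* Exhaustive round-trip check over all 8-bit RGB triples. */
static int rct_roundtrip_ok(void)
{
    for (int r = 0; r < 256; r++)
        for (int g = 0; g < 256; g++)
            for (int b = 0; b < 256; b++) {
                int y, u, v, r2, g2, b2;
                rct_forward(r, g, b, &y, &u, &v);
                rct_inverse(y, u, v, &r2, &g2, &b2);
                if (r2 != r || g2 != g || b2 != b) return 0;
            }
    return 1;
}
```

For example R=10, G=200, B=30 gives Y=110, U=-42, V=-62, and the inverse recovers the original triple. The sketch makes no claim about the second (8/5/3 weighted) variant.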

>>
>> *1: Can basically end up looking sort of like PNG with some parts from
>> FLAC and ADPCM glued on (namely a small FIR filter and dynamic adaptive
>> quantization; possibly using a Rice-coder, ...).
>> Have had OK results with some past experiments in these areas, but no
>> mainstream image formats seem to work this way.
>>
>> Granted, one isn't terribly likely to dethrone JPEG in either speed or
>> Q/bpp with this, but it is possible to pull something like this off with
>> significantly less code (IME, one is looking usually at roughly 2kLOC or
>> so for something like a T.81 JPEG codec; vs, say, something one can
>> implement in around 500 lines or so).

Or, something like the above, which also shouldn't be too difficult to
hardware-accelerate.

Pseudocode for an encoder being something like:
step=0;
for(y=0; y<height; y++)
  for(x=0; x<width; x++)
    for(c=0; c<4; c++)
{
  pr=doFilt(img, x, y, c, xstr);   //prediction for this sample
  px=img[(y*xstr+x)*4+c];
  d=px-pr;                         //prediction residual

  //if lossy
  q=(d*stepRcpTab[step])>>15;      //reciprocals for each step
  qa=abs(q);
  if((qa<qLoThresh) && (step>0))
    step--;
  if(qa>qHiThresh)
    step++;

  emitResidual(q);                 //AdRice+RLE or similar.
}

Thresholds would be used to tune quality, and would need to be known by
the decoder.
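
One property worth spelling out: the step update depends only on the transmitted q, so a decoder can track the same step without any side information beyond the thresholds. A minimal C sketch, with several assumptions that are not from the post: the step table contents, the threshold values, an upper clamp on step, and plain division instead of the reciprocal multiply:

```c
#include <assert.h>
#include <stdlib.h>

enum { NSTEPS = 8, Q_LO_THRESH = 2, Q_HI_THRESH = 12 };   /* hypothetical */
static const int stepTab[NSTEPS] = { 1, 2, 3, 4, 6, 8, 12, 16 };

/* Shared by encoder and decoder: uses only the quantized residual q. */
static int adapt_step(int step, int q)
{
    int qa = abs(q);
    if (qa < Q_LO_THRESH && step > 0)          step--;
    if (qa > Q_HI_THRESH && step < NSTEPS - 1) step++;   /* clamp is an addition */
    return step;
}

/* Encoder side: truncating division stands in for (d*rcp)>>15. */
static int quantize(int d, int step)
{
    assert(step >= 0 && step < NSTEPS);
    return d / stepTab[step];
}

/* Decoder side: reconstruct the residual from q and the tracked step. */
static int dequantize(int q, int step)
{
    assert(step >= 0 && step < NSTEPS);
    return q * stepTab[step];
}
```

Both sides call adapt_step(step, q) after each sample, starting from step=0, so their step sequences stay identical. With this truncating quantizer the reconstruction error per residual stays below the current step size.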

Filter could be something like:
if((x>0) && (y>0))
{
  //P=A+B-C
  pr= img[((y  )*xstr+x-1)*4+c]+
      img[((y-1)*xstr+x  )*4+c]-
      img[((y-1)*xstr+x-1)*4+c];
}else
{
  pr=0;
}

A naive entropy scheme being something like:
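void emitResidual(int q)
{
  int qf;
  //fold sign into the low bit: 0, -1, 1, -2, ... => 0, 3, 2, 5, ...
  qf=(q<0)?(((-q)<<1)|1):(q<<1);

  if(qf)
  {
    if(runZeroCount)
    {
      //flush the pending run of zeroes first
      if(runZeroCount>1)
      {
        emitAdRice(0);
        emitAdRice(runZeroCount+1);
      }else
      {
        emitAdRice(1);
      }
      runZeroCount=0;   //reset after flushing the pending run
    }
    emitAdRice(qf+1);
  }else
  {
    if(runZeroCount>=RUNZEROMAX)
    {
      emitAdRice(0);
      emitAdRice(runZeroCount+1);
      runZeroCount=0;
    }
    runZeroCount++;
  }
}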
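
The sign folding used for qf, together with the inverse a decoder would need, can be sketched in C as follows (function names are illustrative). Note that the folded value 1, which would mean "negative zero", is never produced:

```c
#include <assert.h>

/* Map a signed residual to an unsigned code with the sign in bit 0:
 *   0 -> 0, 1 -> 2, -1 -> 3, 2 -> 4, -2 -> 5, ... */
static unsigned fold_sign(int q)
{
    return (q < 0) ? ((((unsigned)(-q)) << 1) | 1u) : (((unsigned)q) << 1);
}

/* Inverse mapping: bit 0 is the sign, the remaining bits the magnitude. */
static int unfold_sign(unsigned qf)
{
    int mag = (int)(qf >> 1);
    return (qf & 1u) ? -mag : mag;
}

/* Round-trip check over a symmetric range of residuals. */
static int fold_roundtrip_ok(int limit)
{
    assert(limit >= 0);
    for (int q = -limit; q <= limit; q++)
        if (unfold_sign(fold_sign(q)) != q) return 0;
    return 1;
}
```

Small magnitudes of either sign map to small codes, which is what a Rice-style coder wants.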

Rest of the codec mostly being stuff for the AdRice/bitstream handling
and similar.

Note that 0 could not only escape an RLE run, but also be used to further
escape meta-commands or control parameters.

Basic:
1+: single residual
0, 2+: run of zeroes.
0, 1, param, value: Update control parameter
0, 0, x, ...: Command Escape.

Decoding could be single-pass with finite-state-machines for the
entropy/residual stages.

Would likely also use a length-limited AdRice encoding, where for Q:
0: Decrement Rk (if Rk>0)
1: Leave Rk as-is
2..6: Increment Rk
7: Escape, full-length N-bit symbol follows.

Where, one emits each symbol as a Q+1 bit prefix, and an Rk bit suffix.
Say, Q=2, Rk=2: 110zz (Encodes a value of 8..11).
But, after encountering this symbol, Rk would increase to 3.
Then, Q=0, Rk=3: 0zzz (0..7)
Would cause Rk to drop back to 2.
...

Where, Rice decoding can be helped along with a CTNZ instruction (Count
Trailing Non-Zero), but failing this, lookup tables also work, ...
(length-limited variants can also be decoded similar to Huffman if
needed; but unlike Huffman, the lookup tables are constant).
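
A minimal sketch of the encode side of this, writing the bits out as a '0'/'1' string in the same notation as the examples above. Several details are assumptions rather than anything stated here: the escape is written as seven 1 bits plus a 0, the escaped symbol is 16 bits, Rk is capped at 15, and Rk is incremented after an escape. The actual bit order in the stream (LSB-first vs MSB-first) is also left open.

```c
#include <stddef.h>

#define ADRICE_ESCAPE_Q 7   /* from the scheme above: Q=7 is the escape */
#define ADRICE_RAW_BITS 16  /* assumption: width N of an escaped symbol */
#define ADRICE_RK_MAX   15  /* assumption: upper bound on Rk */

/* Append the low n bits of v, MSB first, as '0'/'1' chars; returns new length. */
static size_t put_bits(char *out, size_t len, unsigned v, int n)
{
    for (int i = n - 1; i >= 0; i--)
        out[len++] = ((v >> i) & 1u) ? '1' : '0';
    out[len] = '\0';
    return len;
}

/* Encode val with the current *rk, then adapt *rk. out needs >= 32 chars.
 * Returns the number of bits written. */
static size_t adrice_encode(unsigned val, int *rk, char *out)
{
    unsigned q = val >> *rk;
    size_t len = 0;

    if (q > ADRICE_ESCAPE_Q)
        q = ADRICE_ESCAPE_Q;

    /* Unary prefix: q ones followed by a zero (Q+1 bits). */
    for (unsigned i = 0; i < q; i++)
        out[len++] = '1';
    out[len++] = '0';

    if (q == ADRICE_ESCAPE_Q)
        len = put_bits(out, len, val, ADRICE_RAW_BITS);  /* raw symbol */
    else
        len = put_bits(out, len, val & ((1u << *rk) - 1u), *rk);

    /* Adaptation: Q=0 shrinks Rk, Q=1 keeps it, Q>=2 grows it. */
    if (q == 0) {
        if (*rk > 0)
            (*rk)--;
    } else if (q >= 2) {
        if (*rk < ADRICE_RK_MAX)
            (*rk)++;
    }
    return len;
}
```

Starting from Rk=2, encoding 9 gives 11001 and moves Rk to 3; encoding 5 next gives 0101 and drops Rk back to 2, matching the two cases above.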

For simplicity's sake, assume RGBA32 / RGBA8888.

For simplicity, if developed into a format, would likely borrow the
DIB/BMP packaging.

Though, I guess this does sort of point out the relative lack of formats
between "simple" formats (like TGA or PCX) and "more complex" formats
(like PNG or JPEG).

>>> If you have a vector MIN/MAX which is twice as wide as the values
>>> involved, then it is tempting to put the dX values in the top half and X
>>> in the bottom, and then just return the bottom half?
>>>
>>> This presumes that it would be OK to return the smaller value if two
>>> deltas are equal!
>>>
>> Yeah. The "what happens if two deltas are equal" case is something one
>> has to get correct if they want a PNG implementation to be able to
>> encode/decode images without them turning into an ugly looking mess.
>>
>>
>>> Terje
>>>
>>>
>>>

MitchAlsup wrote:
>> I.e. you find the smallest of the three dX values and pick the
>> corresponding X?
>>
>> If you have a vector MIN/MAX which is twice as wide as the values
>> involved, then it is tempting to put the dX values in the top half and X
>> in the bottom, and then just return the bottom half?
> <
> MIN Rd,Ra,Rb
> MIN Rd,Rd,Rc
> <
> And we have a winner. Moral: express your code correctly.

We do need to merge the deltas with the original values, and mask away
the top at the end, but it is obviously very fast.

>> This presumes that it would be OK to return the smaller value if two
>> deltas are equal!
> <
> Exactly what do you think "equal" means--in almost all circumstances
> equal means one can replace the other (except IEEE ±0)

The problem with a merged key is that in the case where dA == dB but A <
B, then the original logic says that B should be selected (due to <= to
dA), but now we will end up with A since ((dA<<32)|A) < ((dB<<32)|B).

BGB mentioned that this difference would mess up PNG decoding.

We need something similar to CAS2 (CMPXCHG8B) where you have both a
32-bit key and a 32-bit payload which is not part of the comparison.
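
To make the tie problem concrete, here is a small sketch comparing the merged 64-bit key against a compare that looks only at the key and carries the payload along. The struct and function names are made up for illustration, and "first argument wins on a tie" merely stands in for whatever fixed priority the original selection logic uses.

```c
#include <stdint.h>

typedef struct {
    uint32_t key;      /* the delta, takes part in the comparison */
    uint32_t payload;  /* the value to return, should not affect the choice */
} KeyVal;

/* Merged key: the payload ends up in the low bits of the compare,
 * so on equal deltas the smaller payload wins. */
static uint32_t min_merged(KeyVal a, KeyVal b)
{
    uint64_t ka = ((uint64_t)a.key << 32) | a.payload;
    uint64_t kb = ((uint64_t)b.key << 32) | b.payload;
    return (uint32_t)((ka < kb) ? ka : kb);
}

/* Key-only compare: on equal deltas the first argument wins,
 * regardless of the payload values. */
static uint32_t min_key_only(KeyVal first, KeyVal second)
{
    return (first.key <= second.key) ? first.payload : second.payload;
}
```

With equal deltas of 5 and payloads 200 (preferred) and 100, the key-only form returns 200 while the merged form returns 100; when the deltas differ, both agree.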

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: More of my philosophy about CISC and RISC instructions..

<uc70gv$3chcn$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=33792&group=comp.arch#33792


Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Thu, 24 Aug 2023 02:23:09 -0500
Organization: A noiseless patient Spider
Lines: 212
Message-ID: <uc70gv$3chcn$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com>
<bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me>
<4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me>
<299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me>
<7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me>
<13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
<ubu3vh$1ies4$1@dont-email.me>
<5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>
<uc0o08$22t34$1@dont-email.me>
<6e83740c-b80b-4ba1-8249-9a9cfa28ddd3n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 24 Aug 2023 07:23:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="342e4c7d05e29fe3062774f92ffb81e6";
logging-data="3556759"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19o3GU7BIGkCfUdxrxPZ2DE"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:xYurdYqxTsMfnFkNmzq64rVJGxc=
In-Reply-To: <6e83740c-b80b-4ba1-8249-9a9cfa28ddd3n@googlegroups.com>
Content-Language: en-US

by: BGB - Thu, 24 Aug 2023 07:23 UTC

On 8/21/2023 6:56 PM, MitchAlsup wrote:
> On Monday, August 21, 2023 at 5:21:01 PM UTC-5, BGB wrote:
>> On 8/20/2023 5:57 PM, MitchAlsup wrote:
>>
>>
>> But, say, maybe could be possible to, instead of writing, say:
>> MOV 0x3FF0000000000000, R4
>> FLDCH 0x3C00, R5
>> One could write:
>> MOV 1.0D, R4 //Binary64
>> FLDCH 1.0H, R5 //Binary16
> <
> In my case, the space efficient code is:
> <
> CVTSD Rd,#1 // ConVerT signed to double 1-word
> or
> CVTFD Rd,13.7E0 // Convert float to double 2-words
> <
> In practice, these rarely show up except when passing arguments to
> subroutines or results back from functions.

OK.

>>
>>
>
>>
>> 12-bits is 96% of local (intra function) branches, but only 19% of
>> global branches (a mixture of function calls, and the backwards branches
>> for prolog/epilog compression).
> <
> Is this statically linked or dynamically linked ??
>

BGBCC is a "compile everything all at once" compiler design, and in this
case, static linked.

Though, something does seem anomalous, in that the "2 back branches per
function" amounting to roughly 80% of the total branches, does seem a
little suspect...

But, I can't really otherwise explain why the displacements are showing
up in a global pattern something like (roughly):
GLQuake: 8s=0.96% 12s=19.96% 16s=0.04% 20s=79.04%
Doom: 8s=1.08% 12s= 7.58% 16s=0.03% 20s=91.23% 24s=0.08%

This seems like there is a clear split between local branches and global
branches.

OTOH:
These stats are also based on a conservative "branch length estimator"
model (which selects which branch type to use by making an "educated
guess"), rather than the stats from the final binary's relocs.

May need to gather stats based on reloc time statistics as well (which
may well give a different pattern).

Goes and adds stats logic for this...

GLQuake, modeled based on distances while applying relocs:
8s=40.39% 12s=5.48% 16s=4.11% 20s=48.57% 24s=1.45% 33s=0.00%

This is, a bit different...

This stat paints an entirely different picture about the value of adding
Disp12s instructions... (And, that the branch-length estimator may be
significantly underestimating the number of short branches).

Granted, the relative cost of overestimating the required branch length
is significantly worse than underestimating.
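
The bucketing behind stats like these could be sketched roughly as below: each displacement goes into the smallest signed width (from the set 8/12/16/20/24/33 listed above) that can hold it. The function name is hypothetical, and whether the displacement is counted in bytes or in instruction words is left open here.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the smallest listed signed displacement width (in bits) that
 * can hold disp, or 0 if it does not fit even in 33 bits. */
static int disp_class(int64_t disp)
{
    static const int widths[] = { 8, 12, 16, 20, 24, 33 };

    for (size_t i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
        int64_t lim = (int64_t)1 << (widths[i] - 1);
        if (disp >= -lim && disp < lim)
            return widths[i];
    }
    return 0;
}
```

Counting disp_class() over every branch reloc, rather than over the estimator's guesses, would give the second kind of table.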

>>
>> Though, one other tradeoff is that these ops would mostly be useful for
>> loops like:
>> while(n--) { ... }
>> Or:
>> while(p) { ... }
>>
>> But, not so much:
>> for(i=0; i<n; i++)
>> { ... }
> <
> My LOOP OpCodes cover all of these.

These branches would be limited to what can be made to fit the pattern:
if(x CMP 0)
goto Lbl;

>>
>> Where, in this case, the relative usefulness of a dedicated Disp12
>> compare-with-0 branch would also depend on the relative usage of the
>> former vs the latter.
> <
> Compare with anything you want, use any integer comparison you like
> {#0, #integer, Rc},....

There are also "compare two-registers and branch" ops, but no encoding
space to expand these to a larger displacement.

Also no immediate-form.

But, an immediate form would be a problem, as the immediate field is
already in use with holding the branch displacement.

Reason both "compare two regs" and "compare with zero" variants exist,
is because of a lack of an architectural zero register.

Though, if one needs two ops anyways, and doesn't care about preserving
SR.T:
CMPxx Imm, Rn
BT/BF Lbl
Also works well...

>>>>
>>>> So, it is more a tradeoff between burning encoding space, vs needing a
>>>> 64-bit encoding for these.
>>> <
>>> I don't see it as an encoding space issue, I see it as a variable length constant
>>> routing problem from instruction buffer to function unit as part of "forwarding".
>>> So, the majority of instructions (able to be encoded) have a routing OpCode
>>> in addition to a Calculation OpCode. Instructions with 16-bit immediates have
>>> a canned routing OpCode.
>>> <
>>> You can consider the routing OpCode as treating "forwarding" as another
>>> calculation performed prior to execution. {Not dissimilar to how DG NOVA
>>> had shifts with integer arithmetic}
> <
>> Hmm... And/or (partially) separating the matter of instruction-layout
>> from opcode semantics?...
> <
> To do this efficiently in smaller implementations, the decode of this set of
> bits has to be of small gate count.

OK.

>>
>> So, the instruction is expressed as a combination of "layout" (explicit
>> in the encoding) and "opcode" (which instruction should be applied to
>> these parameters).
> <
> I just use the word "modifiers" to access constants, change the sign,
> specify which operand the constant is routed to,....

OK.

>>
>> This could be possible, just sort of implies that all of the major
>> function units accept the same general interface internally.
> <
> Not at all, I have FUs that accept {1,2,3}-operand, and deliver {0,1,2}-results.
> The 2nd result is special and is used to support CARRY without adding
> register ports to the design.

I meant, say, if one has different FUs that expect different input and
output layouts, mix/match may result in a whole lot of "this doesn't
make sense" combinations.

3R doesn't make sense for a branch, and 1R doesn't make sense for most
ALU ops, ...

>>
>> Seems like this would have a higher demand for encoding bits than the
>> strategy I had used, and would lead to a lot of combinations which "are
>> possible to encode but do not make sense". Though, an intermediate (more
>> practical) option being to define the table of opcodes per layout.
> <
> It is the mapping of the bits to the decoded table of "what to do" to "where
> to do it" that is important. As you should have garnered in the OpCode layout
> I illustrated a couple of days ago.

I didn't entirely understand it...

Your approach to the encoding sounds like it is likely very different
from my approach in these areas.

In my case, opcode drives instruction layout and unpacking, not the
other way around.

So, every possible combination of operation and instruction layout
effectively requires its own opcode (and, there not necessarily being a
correlation between where an instruction is located, and its decoding
pattern; with the partial exception of the F8 block).

Though, in the listings, I mostly left out jumbo encodings except in
cases where "new" semantics were expressed, partly because these did
follow a more straightforward pattern.

Say, I don't necessarily need to list out every combination when most
cases can be summarized as, say, "If it is Imm9/Disp9, the Imm/Disp goes
from 9 bits to 33 bits with an FE jumbo prefix", ...

....

Re: More of my philosophy about CISC and RISC instructions..

<10413779-4d21-4856-ac65-233381a700b6n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33802&group=comp.arch#33802


X-Received: by 2002:ad4:55d0:0:b0:649:9ae9:2924 with SMTP id bt16-20020ad455d0000000b006499ae92924mr208186qvb.11.1692895730278;
Thu, 24 Aug 2023 09:48:50 -0700 (PDT)
X-Received: by 2002:a17:902:e743:b0:1b1:e9c0:4625 with SMTP id
p3-20020a170902e74300b001b1e9c04625mr6585124plf.10.1692895730024; Thu, 24 Aug
2023 09:48:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.niel.me!glou.org!news.glou.org!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 24 Aug 2023 09:48:49 -0700 (PDT)
In-Reply-To: <uc70gv$3chcn$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:858d:3bb5:7746:21e2;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:858d:3bb5:7746:21e2
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7d6e8035-0402-4f29-ae39-c467cfa4245cn@googlegroups.com> <bab02209-0dc4-492e-8cf2-ede14635e4d4n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me> <4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me> <299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me> <7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me> <13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
<ubu3vh$1ies4$1@dont-email.me> <5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>
<uc0o08$22t34$1@dont-email.me> <6e83740c-b80b-4ba1-8249-9a9cfa28ddd3n@googlegroups.com>
<uc70gv$3chcn$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <10413779-4d21-4856-ac65-233381a700b6n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 24 Aug 2023 16:48:50 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: MitchAlsup - Thu, 24 Aug 2023 16:48 UTC

On Thursday, August 24, 2023 at 2:23:16 AM UTC-5, BGB wrote:
> On 8/21/2023 6:56 PM, MitchAlsup wrote:
> > On Monday, August 21, 2023 at 5:21:01 PM UTC-5, BGB wrote:
> >> On 8/20/2023 5:57 PM, MitchAlsup wrote:
> >>
> >>
> >> But, say, maybe could be possible to, instead of writing, say:
> >> MOV 0x3FF0000000000000, R4
> >> FLDCH 0x3C00, R5
> >> One could write:
> >> MOV 1.0D, R4 //Binary64
> >> FLDCH 1.0H, R5 //Binary16
> > <
> > In my case, the space efficient code is:
> > <
> > CVTSD Rd,#1 // ConVerT signed to double 1-word
> > or
> > CVTFD Rd,13.7E0 // Convert float to double 2-words
> > <
> > In practice, these rarely show up except when passing arguments to
> > subroutines or results back from functions.
> OK.
> >>
> >>
> >
> >>
> >> 12-bits is 96% of local (intra function) branches, but only 19% of
> >> global branches (a mixture of function calls, and the backwards branches
> >> for prolog/epilog compression).
> > <
> > Is this statically linked or dynamically linked ??
> >
> BGBCC is a "compile everything all at once" compiler design, and in this
> case, static linked.
>
>
> Though, something does seem anomalous, in that the "2 back branches per
> function" amounting to roughly 80% of the total branches, does seem a
> little suspect...
>
>
> But, I can't really otherwise explain why the displacements are showing
> up in a global pattern something like (roughly):
> GLQuake: 8s=0.96% 12s=19.96% 16s=0.04% 20s=79.04%
> Doom: 8s=1.08% 12s= 7.58% 16s=0.03% 20s=91.23% 24s=0.08%
>
> This seems like there is a clear split between local branches and global
> branches.
>
>
> OTOH:
> These stats are also based on a conservative "branch length estimator"
> model (which selects which branch type to use by making an "educated
> guess"), rather than the stats from the final binary's relocs.
>
>
> May need to gather stats based on reloc time statistics as well (which
> may well give a different pattern).
>
>
> Goes and adds stats logic for this...
>
> GLQuake, modeled based on distances while applying relocs:
> 8s=40.39% 12s=5.48% 16s=4.11% 20s=48.57% 24s=1.45% 33s=0.00%
>
> This is, a bit different...
>
>
> This stat paints an entirely different picture about the value of adding
> Disp12s instructions... (And, that the branch-length estimator may be
> significantly underestimating the number of short branches).
>
> Granted, the relative cost of overestimating the required branch length
> is significantly worse than underestimating.
> >>
> >> Though, one other tradeoff is that these ops would mostly be useful for
> >> loops like:
> >> while(n--) { ... }
> >> Or:
> >> while(p) { ... }
> >>
> >> But, not so much:
> >> for(i=0; i<n; i++)
> >> { ... }
> > <
> > My LOOP OpCodes cover all of these.
> These branches would be limited to what can be made to fit the pattern:
> if(x CMP 0)
> goto Lbl;
> >>
> >> Where, in this case, the relative usefulness of a dedicated Disp12
> >> compare-with-0 branch would also depend on the relative usage of the
> >> former vs the latter.
> > <
> > Compare with anything you want, use any integer comparison you like
> > {#0, #integer, Rc},....
> There are also "compare two-registers and branch" ops, but no encoding
> space to expand these to a larger displacement.
>
> Also no immediate-form.
>
> But, an immediate form would be a problem, as the immediate field is
> already in use with holding the branch displacement.
>
> Reason both "compare two regs" and "compare with zero" variants exist,
> is because of a lack of an architectural zero register.
>
>
> Though, if one needs two ops anyways, and doesn't care about preserving
> SR.T:
> CMPxx Imm, Rn
> BT/BF Lbl
> Also works well...
> >>>>
> >>>> So, it is more a tradeoff between burning encoding space, vs needing a
> >>>> 64-bit encoding for these.
> >>> <
> >>> I don't see it as an encoding space issue, I see it as a variable length constant
> >>> routing problem from instruction buffer to function unit as part of "forwarding".
> >>> So, the majority of instructions (able to be encoded) have a routing OpCode
> >>> in addition to a Calculation OpCode. Instructions with 16-bit immediates have
> >>> a canned routing OpCode.
> >>> <
> >>> You can consider the routing OpCode as treating "forwarding" as another
> >>> calculation performed prior to execution. {Not dissimilar to how DG NOVA
> >>> had shifts with integer arithmetic}
> > <
> >> Hmm... And/or (partially) separating the matter of instruction-layout
> >> from opcode semantics?...
> > <
> > To do this efficiently in smaller implementations, the decode of this set of
> > bits has to be of small gate count.
> OK.
> >>
> >> So, the instruction is expressed as a combination of "layout" (explicit
> >> in the encoding) and "opcode" (which instruction should be applied to
> >> these parameters).
> > <
> > I just use the word "modifiers" to access constants, change the sign,
> > specify which operand the constant is routed to,....
> OK.
> >>
> >> This could be possible, just sort of implies that all of the major
> >> function units accept the same general interface internally.
> > <
> > Not at all, I have FUs that accept {1,2,3}-operand, and deliver {0,1,2}-results.
> > The 2nd result is special and is used to support CARRY without adding
> > register ports to the design.
> I meant, say, if one has different FUs that expect different input and
> output layouts, mix/match may result in a whole lot of "this doesn't
> make sense" combinations.
>
> 3R doesn't make sense for a branch, and 1R doesn't make sense for most
> ALU ops, ...
> >>
> >> Seems like this would have a higher demand for encoding bits than the
> >> strategy I had used, and would lead to a lot of combinations which "are
> >> possible to encode but do not make sense". Though, an intermediate (more
> >> practical) option being to define the table of opcodes per layout.
> > <
> > It is the mapping of the bits to the decoded table of "what to do" to "where
> > to do it" that is important. As you should have garnered in the OpCode layout
> > I illustrated a couple of days ago.
> I didn't entirely understand it...
>
>
> Your approach to the encoding sounds like it is likely very different
> from my approach in these areas.
>
> In my case, opcode drives instruction layout and unpacking, not the
> other way around.
<
Yes, this is what I have been trying to explain all these months. It is the
modifiers, and their need to be compact but do everything I wanted done
that drove their position in the instruction. The OpCodes, then, were fitted
in the space remaining.
>
>
> So, every possible combination of operation and instruction layout
> effectively requires its own opcode (and, there not necessarily being a
> correlation between where an instruction is located, and its decoding
> pattern; with the partial exception of the F8 block).
>
This is where doing the modifiers first wins--when you need that 3rd operand
register, it goes where it goes, and then the 3-operand instruction is then
crammed in the space which is left over; and the guiding principle is that
the gates decoding the modifier are not perturbed by all of this.
>
> Though, in the listings, I mostly left out jumbo encodings except in
> cases where "new" semantics were expressed, partly because these did
> follow a more straightforward pattern.
>
> Say, I don't necessarily need to list out every combination when most
> cases can be summarized as, say, "If it is Imm9/Disp9, the Imm/Disp goes
> from 9 bits to 33 bits with an FE jumbo prefix", ...
>
>
> ...
You have gone to great lengths to get 3-wide running at 50 MHz. I wonder if
a 1-wide at 100 MHz would actually perform better ???


Re: More of my philosophy about CISC and RISC instructions..

<uc859r$3ij0r$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=33804&group=comp.arch#33804


Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: More of my philosophy about CISC and RISC instructions..
Date: Thu, 24 Aug 2023 12:50:48 -0500
Organization: A noiseless patient Spider
Lines: 276
Message-ID: <uc859r$3ij0r$1@dont-email.me>
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com>
<d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me>
<3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me>
<2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<ubqr9n$uehf$1@dont-email.me>
<4abb73a0-37f7-410c-9ea1-3d433bf8a80cn@googlegroups.com>
<ubra6f$10m81$1@dont-email.me>
<299eacf1-ed31-4611-a9b0-e5098f85bd8bn@googlegroups.com>
<ubs0ng$17b7g$1@dont-email.me>
<7034e3e8-3a16-488b-9877-89b9169ba8den@googlegroups.com>
<ubtq4l$1gv5c$1@dont-email.me>
<13ad15bd-63f6-466a-8295-097a390a0bf7n@googlegroups.com>
<ubu3vh$1ies4$1@dont-email.me>
<5261c939-c2ef-4ed5-947f-89c482e710f8n@googlegroups.com>
<uc0o08$22t34$1@dont-email.me>
<6e83740c-b80b-4ba1-8249-9a9cfa28ddd3n@googlegroups.com>
<uc70gv$3chcn$1@dont-email.me>
<10413779-4d21-4856-ac65-233381a700b6n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 24 Aug 2023 17:50:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="342e4c7d05e29fe3062774f92ffb81e6";
logging-data="3755035"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/RL4H0v9arQ0n+L41gGcRo"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:PHtJ280pOlSNEThjN51gT9kqPW8=
Content-Language: en-US
In-Reply-To: <10413779-4d21-4856-ac65-233381a700b6n@googlegroups.com>

by: BGB - Thu, 24 Aug 2023 17:50 UTC

On 8/24/2023 11:48 AM, MitchAlsup wrote:
> On Thursday, August 24, 2023 at 2:23:16 AM UTC-5, BGB wrote:
>> On 8/21/2023 6:56 PM, MitchAlsup wrote:
>>> On Monday, August 21, 2023 at 5:21:01 PM UTC-5, BGB wrote:
>>>> On 8/20/2023 5:57 PM, MitchAlsup wrote:
>>>>
>>>>
>>>> But, say, maybe could be possible to, instead of writing, say:
>>>> MOV 0x3FF0000000000000, R4
>>>> FLDCH 0x3C00, R5
>>>> One could write:
>>>> MOV 1.0D, R4 //Binary64
>>>> FLDCH 1.0H, R5 //Binary16
>>> <
>>> In my case, the space efficient code is:
>>> <
>>> CVTSD Rd,#1 // ConVerT signed to double 1-word
>>> or
>>> CVTFD Rd,13.7E0 // Convert float to double 2-words
>>> <
>>> In practice, these rarely show up except when passing arguments to
>>> subroutines or results back from functions.
>> OK.
>>>>
>>>>
>>>
>>>>
>>>> 12-bits is 96% of local (intra function) branches, but only 19% of
>>>> global branches (a mixture of function calls, and the backwards branches
>>>> for prolog/epilog compression).
>>> <
>>> Is this statically linked or dynamically linked ??
>>>
>> BGBCC is a "compile everything all at once" compiler design, and in this
>> case, static linked.
>>
>>
>> Though, something does seem anomalous, in that the "2 back branches per
>> function" amounting to roughly 80% of the total branches, does seem a
>> little suspect...
>>
>>
>> But, I can't really otherwise explain why the displacements are showing
>> up in a global pattern something like (roughly):
>> GLQuake: 8s=0.96% 12s=19.96% 16s=0.04% 20s=79.04%
>> Doom: 8s=1.08% 12s= 7.58% 16s=0.03% 20s=91.23% 24s=0.08%
>>
>> This seems like there is a clear split between local branches and global
>> branches.
>>
>>
>> OTOH:
>> These stats are also based on a conservative "branch length estimator"
>> model (which selects which branch type to use by making an "educated
>> guess"), rather than the stats from the final binary's relocs.
>>
>>
>> May need to gather stats based on reloc time statistics as well (which
>> may well give a different pattern).
>>
>>
>> Goes and adds stats logic for this...
>>
>> GLQuake, modeled based on distances while applying relocs:
>> 8s=40.39% 12s=5.48% 16s=4.11% 20s=48.57% 24s=1.45% 33s=0.00%
>>
>> This is, a bit different...
>>
>>
>> This stat paints an entirely different picture about the value of adding
>> Disp12s instructions... (And, that the branch-length estimator may be
>> significantly underestimating the number of short branches).
>>
>> Granted, the relative cost of overestimating the required branch length
>> is significantly worse than underestimating.
>>>>
>>>> Though, one other tradeoff is that these ops would mostly be useful for
>>>> loops like:
>>>> while(n--) { ... }
>>>> Or:
>>>> while(p) { ... }
>>>>
>>>> But, not so much:
>>>> for(i=0; i<n; i++)
>>>> { ... }
>>> <
>>> My LOOP OpCodes cover all of these.
>> These branches would be limited to what can be made to fit the pattern:
>> if(x CMP 0)
>> goto Lbl;
>>>>
>>>> Where, in this case, the relative usefulness of a dedicated Disp12
>>>> compare-with-0 branch would also depend on the relative usage of the
>>>> former vs the latter.
>>> <
>>> Compare with anything you want, use any integer comparison you like
>>> {#0, #integer, Rc},....
>> There are also "compare two-registers and branch" ops, but no encoding
>> space to expand these to a larger displacement.
>>
>> Also no immediate-form.
>>
>> But, an immediate form would be a problem, as the immediate field is
>> already in use with holding the branch displacement.
>>
>> Reason both "compare two regs" and "compare with zero" variants exist,
>> is because of a lack of an architectural zero register.
>>
>>
>> Though, if one needs two ops anyways, and doesn't care about preserving
>> SR.T:
>> CMPxx Imm, Rn
>> BT/BF Lbl
>> Also works well...
>>>>>>
>>>>>> So, it is more a tradeoff between burning encoding space, vs needing a
>>>>>> 64-bit encoding for these.
>>>>> <
>>>>> I don't see it as an encoding space issue, I see it as a variable length constant
>>>>> routing problem from instruction buffer to function unit as part of "forwarding".
>>>>> So, the majority of instructions (able to be encoded) have a routing OpCode
>>>>> in addition to a Calculation OpCode. Instructions with 16-bit immediates have
>>>>> a canned routing OpCode.
>>>>> <
>>>>> You can consider the routing OpCode as treating "forwarding" as another
>>>>> calculation performed prior to execution. {Not dissimilar to how DG NOVA
>>>>> had shifts with integer arithmetic}
>>> <
>>>> Hmm... And/or (partially) separating the matter of instruction-layout
>>>> from opcode semantics?...
>>> <
>>> To do this efficiently in smaller implementations, the decode of this set of
>>> bits has to be of small gate count.
>> OK.
>>>>
>>>> So, the instruction is expressed as a combination of "layout" (explicit
>>>> in the encoding) and "opcode" (which instruction should be applied to
>>>> these parameters).
>>> <
>>> I just use the word "modifiers" to access constants, change the sign,
>>> specify which operand the constant is routed to,....
>> OK.
>>>>
>>>> This could be possible, just sort of implies that all of the major
>>>> function units accept the same general interface internally.
>>> <
>>> Not at all, I have FUs that accept {1,2,3}-operand, and deliver {0,1,2}-results.
>>> The 2nd result is special and is used to support CARRY without adding
>>> register ports to the design.
>> I meant, say, if one has different FUs that expect different input and
>> output layouts, mix/match may result in a whole lot of "this doesn't
>> make sense" combinations.
>>
>> 3R doesn't make sense for a branch, and 1R doesn't make sense for most
>> ALU ops, ...
>>>>
>>>> Seems like this would have a higher demand for encoding bits than the
>>>> strategy I had used, and would lead to a lot of combinations which "are
>>>> possible to encode but do not make sense". Though, an intermediate (more
>>>> practical) option being to define the table of opcodes per layout.
>>> <
>>> It is the mapping of the bits to the decoded table of "what to do" to "where
>>> to do it" that is important. As you should have garnered in the OpCode layout
>>> I illustrated a couple of days ago.
>> I didn't entirely understand it...
>>
>>
>> Your approach to the encoding sounds like it is likely very different
>> from my approach in these areas.
>>
>> In my case, opcode drives instruction layout and unpacking, not the
>> other way around.
> <
> Yes, this is what I have been trying to explain all these months. It is the
> modifiers, and their need to be compact but do everything I wanted done
> that drove their position in the instruction. The OpCodes, then, were fitted
> in the space remaining.

OK.

>>
>>
>> So, every possible combination of operation and instruction layout
>> effectively requires its own opcode (and, there not necessarily being a
>> correlation between where an instruction is located, and its decoding
>> pattern; with the partial exception of the F8 block).
>>
> This is where doing the modifiers first wins--when you need that 3rd operand
> register, it goes where it goes, and then the 3-operand instruction is then
> crammed in the space which is left over; and the guiding principle is that
> the gates decoding the modifier are not perturbed by all of this.


Re: More of my philosophy about CISC and RISC instructions..

<130086d6-3ea6-41de-a7d6-6e68317c24c5n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33806&group=comp.arch#33806


X-Received: by 2002:a05:620a:8e0e:b0:76d:8643:58b7 with SMTP id re14-20020a05620a8e0e00b0076d864358b7mr158455qkn.4.1692903772284;
Thu, 24 Aug 2023 12:02:52 -0700 (PDT)
X-Received: by 2002:a17:902:cec9:b0:1b8:97ed:a437 with SMTP id
d9-20020a170902cec900b001b897eda437mr7882916plg.4.1692903771882; Thu, 24 Aug
2023 12:02:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 24 Aug 2023 12:02:51 -0700 (PDT)
In-Reply-To: <uc3etu$2jjeu$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:858d:3bb5:7746:21e2;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:858d:3bb5:7746:21e2
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com> <uc3etu$2jjeu$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <130086d6-3ea6-41de-a7d6-6e68317c24c5n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 24 Aug 2023 19:02:52 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 6895

by: MitchAlsup - Thu, 24 Aug 2023 19:02 UTC

On Tuesday, August 22, 2023 at 6:04:34 PM UTC-5, Paul A. Clayton wrote:
> On 8/19/23 12:31 PM, MitchAlsup wrote:
[snip]
>
> While you, Mitch, have argued persuasively for a unified register
> set, there are some benefits to architectural specialization. Of
> course, microarchitectural specialization can be applied if there
> is a natural idiom which can be easily detected. An artificial
> convention (optimization recommendation) can also provide such an
> idiom.
<
Allow me to clarify::
<
I am not trying to create an architecture which is
a) a marvelous microcontroller CPU
b) a marvelous vector supercomputer CPU
What I am trying to do is
c) a marvelous general purpose CPU
with
d) an actually Reduced instruction set.
<
And in this domain I think a unified register set is closer to optimal than
a similar ISA with separate or specialized register sets.
<
Outside of this domain other design points/decisions take over.
<
But within this domain I am getting VAX instruction counts with
RISC pipelineability and pipeline efficiency, and my ISA requires
only 70%± of the instruction count of RISC-V, which should
translate into nearly a 40% performance advantage {under a whole
slew of necessary caveats} at the same operating frequency....
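
The step from instruction count to performance can be checked with the usual "iron law" (time = instructions × CPI ÷ frequency). The following is only a minimal sketch: the function name and the `cpi_ratio` parameter are illustrative, and the 1.02 CPI penalty is an invented placeholder for the "necessary caveats", not a measured figure.

```python
def speedup_from_instruction_ratio(instr_ratio, cpi_ratio=1.0, freq_ratio=1.0):
    """Relative performance from: time = instructions * CPI / frequency.

    instr_ratio: instruction count of this ISA / count of the baseline ISA.
    cpi_ratio:   cycles per instruction of this ISA / baseline (assumption).
    freq_ratio:  clock frequency of this ISA / baseline (1.0 = same clock).
    """
    time_ratio = instr_ratio * cpi_ratio / freq_ratio
    return 1.0 / time_ratio


# 70% of the instruction count, identical CPI, identical frequency.
ideal = speedup_from_instruction_ratio(0.70)
print(f"equal CPI: {(ideal - 1) * 100:.1f}% faster")  # 42.9% faster

# Hypothetical: each (denser) instruction costs 2% more cycles on average.
with_penalty = speedup_from_instruction_ratio(0.70, cpi_ratio=1.02)
print(f"CPI +2%:   {(with_penalty - 1) * 100:.1f}% faster")
```

At equal CPI the upper bound is about 43%, so "nearly 40%" leaves a small margin for per-instruction cost differences.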
>
> E.g., providing a stack cache (or partial frame cache) would be
> easier if the stack pointer was known to the microarchitecture.
<
My ISA understands that R31=SP and that if one allocates space
{SP-=128} and then deallocates space {SP+=128}, then the associated
cache lines do not need to be pushed into the memory hierarchy. They
CAN be, but they do not NEED to be. Whereas: those same manipulations
on any other register do not have that property--even when SW might
want that behavior. {{Different implementations are allowed to do
different things, here, just like they are allowed different cache sizes,
and sets of associativity.}}
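
A toy model of that property, as a sketch only: the class, the 64-byte line size, and the addresses are all assumptions made for illustration, and a real implementation is free to behave differently, as noted above. It tracks dirty lines and drops those that lie wholly inside a frame once SP moves back up past them.

```python
LINE_BYTES = 64  # assumed cache line size, not taken from the post


class StackLineTracker:
    """Tracks dirty cache lines and discards those freed by SP += n."""

    def __init__(self, sp):
        self.sp = sp
        self.dirty = set()  # base addresses of modified lines

    def store(self, addr):
        self.dirty.add(addr // LINE_BYTES * LINE_BYTES)

    def allocate(self, nbytes):
        self.sp -= nbytes  # the stack grows downward

    def deallocate(self, nbytes):
        """Raise SP; return the dirty lines that became dead and were dropped."""
        new_sp = self.sp + nbytes
        dead = {line for line in self.dirty
                if line >= self.sp and line + LINE_BYTES <= new_sp}
        self.dirty -= dead  # no write-back needed for these lines
        self.sp = new_sp
        return dead


tracker = StackLineTracker(sp=0x10000)
tracker.store(0x10008)        # caller's frame: must stay dirty
tracker.allocate(128)         # SP -= 128
tracker.store(0xFF80)         # callee locals
tracker.store(0xFFC8)
dropped = tracker.deallocate(128)  # SP += 128
print(sorted(hex(a) for a in dropped))
```

Only the two lines inside the released 128 bytes are dropped; the caller's line above the original SP remains dirty and would still be written back eventually.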
<
> (In theory, a stack pointer register could be "predicted" by
> looking at the memory access pattern, but that seems pointlessly
> complex and would probably make microarchitectural optimizations
> based on that information not worthwhile.) In this case, there
> seems little (no?) difference between convention and architecture,
> but in other cases there would be.
<
In general:: The stack pointer is manipulated once at the entry of a
subroutine and once at the exit of a subroutine and is constant over
the execution of a subroutine. This is RISC philosophy, foreign to
PDP-11 and VAX philosophy; neither of which lived long enough to
witness the transition x86 made from VAX philosophy to RISC
philosophy at the switch over from x86 to x86-64. {{In block structured
languages, the subroutine boundary changes to a block boundary.}}
>
> (Even software idioms can be almost as difficult to change as
> explicit interfaces. One programming concept that came to mind
> which _might_ moderate this issue would be presenting a generic
> expression of intent and "overloading" expressions with valid
> specific implementations. This seems a little like runtime
> dispatch choice for supporting non-universal features, though
> such have the choice based on feature absence/presence rather
> than a compiler choice based on optimization goals presented
> at compile time.)
<
All of the overload resolution takes place in the compiler before
code generation and linking; and whatever code sequences the
compiler chooses becomes an idiom which could be recognized
and optimized later.
<
But you are correct in your assumption that "some of this stuff"
makes its way back into ISA--and one of the reasons I think
Quadriblock's gyrations are misguided. My 66000 ISA has improved
markedly since Brian (and now Thomas) have been contributing
{Brian doing the compiler and Thomas binutils}. Switch statements
have a single instruction that performs range checking, table
access, default qualification; in a way that remains position
independent. Likewise dynamic linking uses a single instruction
to CALL an external subroutine that is not attackable by current
attack strategies, nor does it use a trampoline to get there and back.
Fewer instructions, fewer cycles, retaining all the desired properties.
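
What such a switch instruction folds together can be sketched as follows. This only illustrates the three steps named above (range check, table access, default) with PC-relative table entries; the function name, parameters, and offsets are hypothetical and do not describe the actual My 66000 encoding.

```python
def switch_target(pc, value, low, offsets, default_offset):
    """Branch target for a bounds-checked, PC-relative jump table.

    offsets[i] is the distance from pc to the handler for case low + i.
    Values outside [low, low + len(offsets)) take default_offset.
    """
    index = value - low
    if 0 <= index < len(offsets):      # range check
        return pc + offsets[index]     # table access
    return pc + default_offset         # default qualification


OFFSETS = [0x20, 0x40, 0x60]           # handlers for cases 3, 4, 5

print(hex(switch_target(0x1000, 4, 3, OFFSETS, 0x80)))  # 0x1040
print(hex(switch_target(0x1000, 9, 3, OFFSETS, 0x80)))  # 0x1080 (default)
```

Because the table holds offsets rather than absolute addresses, the same table works wherever the code is loaded: moving `pc` moves every target by the same amount.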
<
Along the way, several things were "invented", then later modified,
then later parts of them discarded for other "inventions". These
are the things one has to gyrate through before the ISA gets
"public" because afterwards the original mistakes become frozen
{and from x86 experience--frozen for at least 40 years}
<
I could not have understood the proper properties of these "things"
without the feedback of reading the code and relearning the how and why
of that functionality, so that I could then figure out what embodiment
was required to make them small, fast, and efficient. And
this is where I think Quadriblock is going astray--you need guidance
from compiler and runtime development to make the ISA "correct".

On 8/24/2023 12:02 PM, MitchAlsup wrote:
> On Tuesday, August 22, 2023 at 6:04:34 PM UTC-5, Paul A. Clayton wrote:
>> On 8/19/23 12:31 PM, MitchAlsup wrote:
> [snip]
>>
>> While you, Mitch, have argued persuasively for a unified register
>> set, there are some benefits to architectural specialization. Of
>> course, microarchitectural specialization can be applied if there
>> is a natural idiom which can be easily detected. An artificial
>> convention (optimization recommendation) can also provide such an
>> idiom.
> <
> Allow me to clarify::
> <
> I am not trying to create an architecture which is
> a) a marvelous microcontroller CPU
> b) a marvelous vector supercomputer CPU

OK, I will ask. If you were trying to create a marvelous vector
supercomputer CPU, how would it be different from MY66000?

Specifically, would you still use the VVM mechanism? How different
would the ISA be (or would it just use more FP functional units and
perhaps more/bigger buffers for VVM use?)? Would you provide full
support for 128 bit FP? etc., etc.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: More of my philosophy about CISC and RISC instructions..

<1d34de28-7fdf-4368-bb34-f7b69aeb4f89n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33828&group=comp.arch#33828


X-Received: by 2002:a05:620a:8508:b0:76d:86b1:ece8 with SMTP id pe8-20020a05620a850800b0076d86b1ece8mr544940qkn.12.1693078431158;
Sat, 26 Aug 2023 12:33:51 -0700 (PDT)
X-Received: by 2002:a17:902:e548:b0:1bd:df9a:4fc6 with SMTP id
n8-20020a170902e54800b001bddf9a4fc6mr8287151plf.4.1693078430722; Sat, 26 Aug
2023 12:33:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 26 Aug 2023 12:33:50 -0700 (PDT)
In-Reply-To: <ucd0p2$jjfb$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:88d8:58f6:8390:8089;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:88d8:58f6:8390:8089
References: <9dd7cbc7-ec85-4a8b-84f1-266cb47575b2n@googlegroups.com>
<7c9ef480-989b-4cbb-ac7f-db8f3749e8f1n@googlegroups.com> <d1fc890d-e4e5-4afd-8718-86143628ea2an@googlegroups.com>
<ubdrp6$2e6ik$1@dont-email.me> <3b6e5488-022e-4299-ab6b-70a7436bb004n@googlegroups.com>
<ubn20o$5d2d$1@dont-email.me> <2e6de6de-b408-4aa7-a430-ead3f983ed78n@googlegroups.com>
<8eee43b7-cb96-49c1-9735-4bb8c004b5c3n@googlegroups.com> <47020cd0-2ee6-4c51-91a0-885ba719137cn@googlegroups.com>
<8m4EM.686037$TPw2.506418@fx17.iad> <ubqphs$u0gp$1@dont-email.me>
<4941705f-ac14-4f98-b3d1-6fa62bdb4236n@googlegroups.com> <uc3etu$2jjeu$1@dont-email.me>
<130086d6-3ea6-41de-a7d6-6e68317c24c5n@googlegroups.com> <ucd0p2$jjfb$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1d34de28-7fdf-4368-bb34-f7b69aeb4f89n@googlegroups.com>
Subject: Re: More of my philosophy about CISC and RISC instructions..
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 26 Aug 2023 19:33:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4105

by: MitchAlsup - Sat, 26 Aug 2023 19:33 UTC

On Saturday, August 26, 2023 at 9:04:22 AM UTC-5, Stephen Fuld wrote:
> On 8/24/2023 12:02 PM, MitchAlsup wrote:
> > On Tuesday, August 22, 2023 at 6:04:34 PM UTC-5, Paul A. Clayton wrote:
> >> On 8/19/23 12:31 PM, MitchAlsup wrote:
> > [snip]
> >>
> >> While you, Mitch, have argued persuasively for a unified register
> >> set, there are some benefits to architectural specialization. Of
> >> course, microarchitectural specialization can be applied if there
> >> is a natural idiom which can be easily detected. An artificial
> >> convention (optimization recommendation) can also provide such an
> >> idiom.
> > <
> > Allow me to clarify::
> > <
> > I am not trying to create an architecture which is
> > a) a marvelous microcontroller CPU
> > b) a marvelous vector supercomputer CPU
<
> OK, I will ask. If you were trying to create a marvelous vector
> supercomputer CPU, how would it be different from MY66000?
<
What an intriguing question !!
>
> Specifically, would you still use the VVM mechanism? How different
> would the ISA be (or would it just use more FP functional units and
> perhaps more/bigger buffers for VVM use?)? Would you provide full
> support for 128 bit FP? etc., etc.
<
After thinking about this for an hour::
<
ISA would probably be pretty much the same, the memory system and
interconnect would be vastly beefier. I would shoot for a cache line
width of FPUs (8)×{FADD, FMAC, FDIV/SQRT}, 8 to 16 cache line staging
buffers, 4 AGENs per cycle, all feeding off the 1MB 16-banked L2, taking
4 cache misses per cycle. Then over in the memory/DRAM area there
would be a minimum of 16 DIMMs (or HBMs) operating at 2 speed
grades below the maximum BW that DDR <of that generation> could muster.
Every lane would be capable of integer and logical calculations.
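
A back-of-the-envelope per-cycle budget for those numbers, as a sketch under stated assumptions: the 8 lanes and 4 misses per cycle come from the text above, while 8-byte operands (so a 64-byte line) and counting an FMAC as 2 floating-point operations are assumptions added here. No clock frequency is assumed, so everything is per cycle.

```python
# From the post: a cache line's width of FPU lanes, and misses per cycle.
LANES = 8
MISSES_PER_CYCLE = 4

# Assumptions, not from the post: 8-byte doubles, FMAC = multiply + add.
BYTES_PER_DOUBLE = 8
FLOPS_PER_FMAC = 2


def per_cycle_budget(lanes=LANES, misses=MISSES_PER_CYCLE):
    """Peak arithmetic and line-transfer bandwidth per clock cycle."""
    line_bytes = lanes * BYTES_PER_DOUBLE
    peak_flops = lanes * FLOPS_PER_FMAC
    miss_bytes = misses * line_bytes
    return {
        "line_bytes": line_bytes,
        "peak_flops": peak_flops,
        "miss_bytes": miss_bytes,
        "bytes_per_flop": miss_bytes / peak_flops,
    }


budget = per_cycle_budget()
for name, amount in budget.items():
    print(f"{name:15s} {amount}")
```

Under these assumptions the design moves 256 bytes per cycle against 16 peak flops per cycle, which is why the memory system, rather than the ISA, is where the effort goes.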
<
After writing the above and more thought, I can't see any changes in
ISA, as we already get gather (LDD->LD) and scatter (LDD->ST) falling
out for free.
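
In scalar terms, and reading the notation as "a load whose result feeds the address of a second load (gather) or of a store (scatter)", the two loop shapes look like this. This is a plain-Python sketch of the access patterns only; the function names are illustrative and nothing here models how VVM actually executes the loop.

```python
def gather(src, idx):
    """dst[i] = src[idx[i]]: the loaded index addresses a second load."""
    return [src[j] for j in idx]


def scatter(dst, idx, values):
    """dst[idx[i]] = values[i]: the loaded index addresses a store."""
    for j, v in zip(idx, values):
        dst[j] = v
    return dst


table = [10, 20, 30, 40]
print(gather(table, [3, 0, 2]))                 # [40, 10, 30]
print(scatter([0] * 4, [3, 0, 2], [1, 2, 3]))   # [2, 0, 3, 1]
```

Since both are ordinary loops over ordinary loads and stores, vectorizing the loop yields gather and scatter without dedicated instructions.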
<
My hope would be that it would not melt when operating at full throughput.
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)
