Rocksolid Light



Re: Alternative Representations of the Concertina II ISA

<uk5em8$d3ss$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35309&group=comp.arch#35309

Path: i2pn2.org!i2pn.org!news.samoylyk.net!news.szaf.org!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 11:22:48 -0800
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <uk5em8$d3ss$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Nov 2023 19:22:48 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6ab5e1bccaefdaf17b623068b07b8318";
logging-data="429980"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18eYPOtLMqD28+l3m/g1wrIVcSfqvG8WgI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:V8ePXW00K/JNHAkHjZ5ceu8nT/8=
In-Reply-To: <uk5dbu$curb$1@dont-email.me>
Content-Language: en-US
 by: Stephen Fuld - Tue, 28 Nov 2023 19:22 UTC

On 11/28/2023 11:00 AM, BGB wrote:

snip

> Still, it seems the incidence of the integer DIV/MOD/etc instructions
> being used is somewhat rarer than this estimate in the code I was
> measuring.

Is this a result of your using mostly a few old games as your
"reference" code? i.e. your code base may be different from what gave
rise to the statistics mentioned by others.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Alternative Representations of the Concertina II ISA

<ab431f0d71c11d83f395f221633ab692@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35310&group=comp.arch#35310

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 19:37:04 +0000
Organization: novaBBS
Message-ID: <ab431f0d71c11d83f395f221633ab692@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <uk25f4$3pc0r$1@dont-email.me> <2023Nov27.165403@mips.complang.tuwien.ac.at> <uk3ric$5adb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2415209"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$RNh8vO4uvq8kNTV05JFqQecHXxCfwwD.TwMMK7TCvyzlVPrIjy9Sy
 by: MitchAlsup - Tue, 28 Nov 2023 19:37 UTC

Quadibloc wrote:

> On Mon, 27 Nov 2023 15:54:03 +0000, Anton Ertl wrote:

>> Quadibloc <quadibloc@servername.invalid> writes:
>>>On Mon, 27 Nov 2023 09:17:59 +0000, Anton Ertl wrote:
>>>
>>>> That debate has been held, and MIPS has hardware integer divide, Alpha
>>>> and IA-64 don't have a hardware integer divide; they both have FP
>>>> divide instructions.
>>>
>>>I didn't know this about the Itanium. All I remembered hearing was that
>>>the Itanium "didn't have a divide instruction", and so I didn't realize
>>>this applied to fixed-point arithmetic only.
>>
>> You are right, I was wrong.
>> <http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf> says:
>>
>> |A number of floating-point operations defined by the IEEE Standard are
>> |deferred to software by the IA-64 architecture in all its
>> |implementations
>> |* floating-point divide (integer divide, which is based on the
>> |  floating-point divide operation, is also deferred to software)
>>
>> The paper goes on to describe that FP division a/b is based on
>> determining 1/b through Newton-Raphson approximation, using the FMA
>> instruction, and then multiplying with a. It shows a sequence of 13
>> instructions for double precision, 1 frcpa and 12 fma or fnma
>> instructions.
>>
>> Given that integer division is based on FP division, it probably takes
>> even more instructions.

> The fastest _hardware_ implementations of FP division are Goldschmidt
> division and Newton-Raphson division, and they both make use of
> multiplication hardware.

Goldschmidt is about 2× faster than N-R because the Goldschmidt inner loop
has 2 independent multiplies, while N-R has 2 dependent multiplies. To
meet IEEE 754 accuracy, the last iteration in Goldschmidt is ½ an N-R
iteration {{you perform the N-R 1st step and then use the sign bit to
determine +1 or +2, and the +0 can be determined while the multiply is
transpiring.}}
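A minimal C sketch of the two iterations being compared, assuming a divisor already scaled into [1, 2) and a crude linear seed. The names recip_seed, div_newton and div_goldschmidt are hypothetical, and the seed only stands in for a hardware reciprocal table such as IA-64's frcpa. The sketch omits the final correctly rounded step described above, so results can be off by an ulp or so; plain multiplies are used for portability, whereas with an FMA unit each N-R step maps onto two fused multiply-adds.

```c
#include <assert.h>

/* Hypothetical linear seed for 1/b with b in [1, 2): relative error <= 1/17.
   Stands in for a hardware reciprocal-approximation lookup. */
static double recip_seed(double b) {
    return 24.0 / 17.0 - (8.0 / 17.0) * b;
}

/* Newton-Raphson: refine x ~ 1/b, then multiply by a.
   Within each step, the second multiply waits on the first. */
static double div_newton(double a, double b) {
    assert(b >= 1.0 && b < 2.0);
    double x = recip_seed(b);
    for (int i = 0; i < 4; i++) {   /* relative error squares each step */
        double e = 1.0 - b * x;     /* multiply #1 */
        x = x + x * e;              /* multiply #2 needs e from #1 */
    }
    return a * x;
}

/* Goldschmidt: scale numerator and denominator by the same factor until
   the denominator reaches 1. The two multiplies per step only share an
   input, so they can issue in parallel. */
static double div_goldschmidt(double a, double b) {
    assert(b >= 1.0 && b < 2.0);
    double n = a, d = b, f = recip_seed(b);
    for (int i = 0; i < 5; i++) {   /* first pass applies the seed */
        n *= f;                     /* independent of ...          */
        d *= f;                     /* ... this multiply           */
        f = 2.0 - d;                /* d -> 1, so n -> a/b         */
    }
    return n;
}
```

In div_newton the dependency chain is multiply, subtract, multiply, add per step; in div_goldschmidt the chain per step is one multiply plus one subtract, which is the structural difference behind the speed argument.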

> So using Newton-Raphson division to increase the width of an FP division
> result to perform 64-bit integer division won't be too hard.

> Since the Wikipedia article didn't go into detail, I had to look further
> to find out that you _were_ right about the DEC Alpha, however. It did
> have floating divide but not integer divide. The 21264 added floating
> square root.

> John Savard
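To make the integer case concrete, here is a hedged C sketch of an integer divide built on floating point. udiv32_via_fp is a hypothetical name, and this is not the actual Alpha or IA-64 library sequence: it estimates the quotient from a floating-point reciprocal, then fixes it up with exact integer multiplies. It is limited to 32-bit unsigned operands, which a double holds exactly; a 64-bit divide would need more precision than a double's 53-bit significand provides.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32-bit division via a floating-point reciprocal estimate.
   Sketch only; the divisor must be nonzero. */
static uint32_t udiv32_via_fp(uint32_t a, uint32_t b) {
    assert(b != 0);
    double recip = 1.0 / (double)b;               /* reciprocal estimate */
    uint64_t q = (uint64_t)((double)a * recip);   /* may be off by one */
    /* Correct the estimate with exact integer arithmetic; q*b stays far
       below 2^64 because q is within one of a/b. */
    while (q * b > a)
        q--;
    while ((q + 1) * b <= a)
        q++;
    return (uint32_t)q;
}
```

The correction loops matter: in double, 49.0 * (1.0 / 49.0) comes out just below 1.0, so the raw estimate for 49 / 49 truncates to 0 until the second loop bumps it to 1.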

Re: Alternative Representations of the Concertina II ISA

<uk5hdi$dlb6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35311&group=comp.arch#35311

Path: i2pn2.org!i2pn.org!nntp.comgw.net!paganini.bofh.team!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 14:09:19 -0600
Organization: A noiseless patient Spider
Lines: 81
Message-ID: <uk5hdi$dlb6$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Nov 2023 20:09:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c208050d75b3f71c95ee722651efd037";
logging-data="447846"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Mima/qRqwPqBGP2AruPwT"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:KzxilHStw+VtwWVk22q8jt8H65w=
In-Reply-To: <uk5em8$d3ss$1@dont-email.me>
Content-Language: en-US
 by: BGB - Tue, 28 Nov 2023 20:09 UTC

On 11/28/2023 1:22 PM, Stephen Fuld wrote:
> On 11/28/2023 11:00 AM, BGB wrote:
>
> snip
>
>> Still, it seems the incidence of the integer DIV/MOD/etc instructions
>> being used is somewhat rarer than this estimate in the code I was
>> measuring.
>
> Is this a result of your using mostly a few old games as your
> "reference" code? i.e. your code base may be different from what gave
> rise to the statistics mentioned by others.
>

Possibly.

There is my C library and kernel/runtime code,
along with games like Doom, Quake, and similar.

I suspect id also avoided integer division when possible, since it is
rarely used (and was probably still slow on early/mid-90s PCs).

Looking it up, the 486 and Pentium 1 still had ~40+ cycle 32-bit DIV/IDIV.
...

It looks like MUL/IMUL had similar latency to DIV/IDIV on the 386 and 486,
but the latency dropped a fair bit on the Pentium.

However, Doom makes extensive use of integer multiply.

Given the timings, I suspect they were also using a shift-add design
for the multiply/divide unit.

In my case, I am using Shift-Add for DIV and also 64-bit multiply, but a
faster DSP48 based multiplier for 32-bit multiply.
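For reference, a radix-2 shift-add multiply in C. It handles one multiplier bit per step, which is why a 64-bit operand costs on the order of 64 iterations in a simple iterative unit. mul_shift_add is a hypothetical name, and this models the algorithm only, not any particular Verilog.

```c
#include <stdint.h>

/* Radix-2 shift-add multiply: examine one multiplier bit per step.
   Returns the low 64 bits of a*b (same result as a*b in C). */
static uint64_t mul_shift_add(uint64_t a, uint64_t b) {
    uint64_t acc = 0;
    for (int i = 0; i < 64; i++) {
        if (b & 1)
            acc += a;   /* add the multiplicand at this bit position */
        a <<= 1;        /* move the multiplicand to the next position */
        b >>= 1;        /* expose the next multiplier bit */
    }
    return acc;
}
```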

If I were to guess based on the latency, I would say the P1 was
using something like a radix-16 long multiply.
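To illustrate what a radix-16 long multiply means (not a claim about how the Pentium was actually built): it retires 4 multiplier bits per step, so a 32x32->64 product takes 8 steps rather than 32. Each step adds one of 16 multiples of a, formed here from shifted copies of a. mul_radix16 is a hypothetical name.

```c
#include <stdint.h>

/* Radix-16 long multiply: one 4-bit multiplier digit per step. */
static uint64_t mul_radix16(uint32_t a, uint32_t b) {
    uint64_t m = a;
    uint64_t acc = 0;
    for (int step = 0; step < 8; step++) {
        unsigned digit = (b >> (4 * step)) & 0xFu;  /* next 4-bit digit */
        uint64_t pp = 0;                            /* pp = m * digit, built */
        if (digit & 1u) pp += m;                    /* from shifted copies   */
        if (digit & 2u) pp += m << 1;
        if (digit & 4u) pp += m << 2;
        if (digit & 8u) pp += m << 3;
        acc += pp << (4 * step);                    /* weight by 16^step */
    }
    return acc;
}
```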

By comparison, it seems DIV/IDIV didn't actually get much faster until
the Ryzen.

I have noticed that these old games use a nifty trick for calculating
approximate distance, say:
dx=x0-x1;
dy=y0-y1;
adx=dx^(dx>>31);
ady=dy^(dy>>31);
if(ady>adx)
{ t=adx; adx=ady; ady=t; } //common
// { adx^=ady; ady^=adx; adx^=ady; } //sometimes
d=adx+(ady>>1);

Can be extended to 3D without too much issue, but for 4D, the sorting
step becomes more of an issue.

Kinda makes sense though, as finding the square-root of things is also
expensive...
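A runnable version of the 2D trick as a C function (approx_dist2d is a hypothetical name). It differs from the snippet in one deliberate way: dx^(dx>>31) is a one's-complement absolute value, which gives |dx|-1 for negative dx and relies on arithmetic right shift of negative signed ints, which C leaves implementation-defined; the sketch uses an exact absolute value instead. In exact arithmetic, max + min/2 never underestimates and overestimates by at most about 12% (worst case when the shorter side is half the longer); the integer >>1 also floors min/2, losing up to half a unit.

```c
#include <stdint.h>

/* Approximate Euclidean distance as max(|dx|,|dy|) + min(|dx|,|dy|)/2.
   Assumes the coordinate differences fit in int32_t. */
static int32_t approx_dist2d(int32_t x0, int32_t y0, int32_t x1, int32_t y1) {
    int32_t adx = x0 - x1;
    int32_t ady = y0 - y1;
    if (adx < 0) adx = -adx;      /* exact |dx| */
    if (ady < 0) ady = -ady;      /* exact |dy| */
    if (ady > adx) {              /* make adx the larger of the two */
        int32_t t = adx; adx = ady; ady = t;
    }
    return adx + (ady >> 1);
}
```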

But I'm not sure how a lot of this compares with other (non-game)
coding practices.

I have noticed, though, that a lot of these games really like using lots
of global variables (a lot more than I would prefer in my own coding).

This is one case where it is useful to be able to load/store a global
variable in a single instruction (going through something like a GOT
may carry a significant penalty in code that uses lots of global variables).

Re: Alternative Representations of the Concertina II ISA

<uk5hl7$dlb6$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35312&group=comp.arch#35312

Path: i2pn2.org!i2pn.org!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 14:13:26 -0600
Organization: A noiseless patient Spider
Lines: 98
Message-ID: <uk5hl7$dlb6$2@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 28 Nov 2023 20:13:27 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c208050d75b3f71c95ee722651efd037";
logging-data="447846"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/txHNes66ksdiLt4murpyi"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:/dtcB8asKkkQvguuv3VBwAmVgpw=
In-Reply-To: <3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
Content-Language: en-US
 by: BGB - Tue, 28 Nov 2023 20:13 UTC

On 11/27/2023 7:03 PM, MitchAlsup wrote:
> Anton Ertl wrote:
>
>> BGB <cr88192@gmail.com> writes:
>>> On 11/26/2023 7:29 PM, Quadibloc wrote:
>>>> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>>>>
>>>>> But Integer Multiply and Divide can share the FPU that does these.
>>>>
>>>> But giving each one its own multiplier means more superscalar
>>>> goodness!
>
>> Having two multipliers that serve both purposes means even more
>> superscalar goodness for similar area cost.  However, there is the
>> issue of latency.  The Willamette ships the integers over to the FPU
>> for multiplication, and the result back, crossing several clock
>> domains (at one clock loss per domain crossing), resulting in a
>> 10-cycle latency for integer multiplication.  I think that these days
>> every high-performance core with real silicon invests into separate
>> GPR and FP/SIMD (including integer SIMD) multipliers.
>
>>> In most code, FPU ops are comparably sparse
>
>> In terms of executed ops, that depends very much on the code.  GP
>> cores have acquired SIMD cores primarily for FP ops, as can be seen by
>> both SSE and supporting only FP at first, and only later adding
>> integer stuff, because it cost little extra.  Plus, we have added GPUs
>> that are now capable of doing huge amounts of FP ops, with uses in
>> graphics rendering, HPC and AI.
>
> Once SIMD gains Integer operations, the Multiplier has to be built to
> do both, might as well use it for more things than just SIMD.
>

Or just use separate multipliers for each, on the rationale that, at
least on an FPGA, the DSP48s are already there whether or not one uses
them (and there are far more DSP48s than I can really make effective use of).

Albeit with the caveat that they only natively do a signed 18-bit multiply.

>>> Otherwise, one can debate whether or not having DIV/MOD in hardware
>>> makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV
>>> is at least "probably" faster than a generic software-only solution).
>
>> That debate has been held, and MIPS has hardware integer divide, Alpha
>> and IA-64 don't have a hardware integer divide; they both have FP
>> divide instructions.
>
> And all of these led fruitful long productive lives before taking over
> -------Oh wait !!
>

MIPS at least did moderately well for a time, and might have done better
if it had been more open.

Both Alpha and IA-64 had a lot of questionable-seeming design choices.

>> However, looking at more recent architectures, the RISC-V M extension
>> (which is part of RV64G and RV32G, i.e., a standard extension) has not
>> just multiply instructions (MUL, MULH, MULHU, MULHSU, MULW), but also
>> integer divide instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW,
>> and REMUW.
>
> All of which are possible in My 66000 using operand sign control, S-bit,
> and CARRY when you want 64×64->128 or 128/64->{64 quotient, 64 remainder}
>
>>            ARM A64 also has divide instructions (SDIV, UDIV), but
>> RISC-V seems significant to me because there the philosophy seems to
>> be to go for minimalism.  So the debate has apparently come to the
>> conclusion that for general-purpose architectures, you include an
>> integer divide instruction.
>
> Several forms of integer DIV at least signed and unsigned.....

I had left it as optional, but I guess optional is not the same as
absent, and my main profiles ended up including it.

The ops are generally given names like:
DIVS.L, DIVU.L, DIVS.Q, DIVU.Q
MODS.L, MODU.L, MODS.Q, MODU.Q

One may notice a curious level of overlap with the RISC-V ops here...
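For comparison, the RISC-V M extension defines results for the cases where C's / and % are undefined: division by zero gives an all-ones quotient and returns the dividend as the remainder, and signed overflow (most negative value divided by -1) returns the dividend as the quotient and 0 as the remainder. A small C model of the RV64 semantics follows; the function names are hypothetical, and it says nothing about how DIVS.Q/MODS.Q behave in those cases.

```c
#include <stdint.h>

/* RV64 DIV: signed, quotient truncates toward zero. */
static int64_t rv_div(int64_t a, int64_t b) {
    if (b == 0) return -1;                            /* all ones */
    if (a == INT64_MIN && b == -1) return INT64_MIN;  /* overflow: dividend */
    return a / b;
}

/* RV64 REM: a nonzero result takes the sign of the dividend. */
static int64_t rv_rem(int64_t a, int64_t b) {
    if (b == 0) return a;                             /* dividend */
    if (a == INT64_MIN && b == -1) return 0;          /* overflow: zero */
    return a % b;
}

/* RV64 DIVU / REMU. */
static uint64_t rv_divu(uint64_t a, uint64_t b) {
    return b == 0 ? UINT64_MAX : a / b;               /* all ones */
}

static uint64_t rv_remu(uint64_t a, uint64_t b) {
    return b == 0 ? a : a % b;                        /* dividend */
}
```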

Though, for 32-bit:
MULS.L, MULU.L: Perform a narrow multiply.
DMULS.L, DMULU.L: Perform a widening multiply.
Along with:
MULS.Q, MULU.Q: 64-bit multiply
MULHS.Q, MULHU.Q: 64-bit multiply, high 64 bits of result
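As a semantics sketch for the high-half ops (hypothetical names, not BGB's implementation): the high 64 bits of a 64x64 product can be assembled from four 32x32->64 partial products. The signed high half can then be recovered from the unsigned one with two conditional subtractions, since reinterpreting a negative operand as unsigned adds 2^64 times the other operand.

```c
#include <stdint.h>

/* High 64 bits of the unsigned 128-bit product a*b, from 32-bit halves. */
static uint64_t mulhu64(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Middle column: fits in 64 bits (< 3 * 2^32); its carries feed the high word. */
    uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;
    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

/* Signed high half: undo the 2^64 * other-operand term that the unsigned
   reinterpretation of each negative operand adds. */
static int64_t mulhs64(int64_t a, int64_t b) {
    uint64_t h = mulhu64((uint64_t)a, (uint64_t)b);
    if (a < 0) h -= (uint64_t)b;
    if (b < 0) h -= (uint64_t)a;
    return (int64_t)h;
}
```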

DIV, MOD, and 64-bit multiply are drastically slower than 32-bit
multiply, though.

Re: Alternative Representations of the Concertina II ISA

<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35317&group=comp.arch#35317

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 16:23:05 +0000
Organization: novaBBS
Message-ID: <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2612573"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$bT/G/jl5Sjrc/EZ/mSQC9eBzNqaOWG62mYbV3xVozR0J35efHRxmG
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Thu, 30 Nov 2023 16:23 UTC

BGB wrote:

> Have noted that these old games did have a nifty trick for calculating
> approximate distance, say:
> dx=x0-x1;
> dy=y0-y1;
> adx=dx^(dx>>31);
> ady=dy^(dy>>31);
> if(ady>adx)
> { t=adx; adx=ady; ady=t; } //common
> // { adx^=ady; ady^=adx; adx^=ady; } //sometimes
> d=adx+(ady>>1);

Why not:

dx=x0-x1;
dy=y0-y1;
adx=dx^(dx>>31);
ady=dy^(dy>>31);
if(ady>adx)
d=ady+(adx>>1);
else
d=adx+(ady>>1);

Re: Alternative Representations of the Concertina II ISA

<5d68ac8699997099a3609c4836eec658@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35318&group=comp.arch#35318

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 16:25:58 +0000
Organization: novaBBS
Message-ID: <5d68ac8699997099a3609c4836eec658@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uk5hl7$dlb6$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2612573"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$E30lBO2I80qYKCq3JYe4guHoixcnjujFnqYNWiaEK9Fmbe5G4VUWm
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Thu, 30 Nov 2023 16:25 UTC

BGB wrote:

> 64-bit multiply, is drastically slower than 32-bit
> multiply though.

Should only be 1 gate of delay longer......

Re: Alternative Representations of the Concertina II ISA

<ukafih$1ecg6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35321&group=comp.arch#35321

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 12:08:33 -0500
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <ukafih$1ecg6$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uk5hl7$dlb6$2@dont-email.me>
<5d68ac8699997099a3609c4836eec658@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 30 Nov 2023 17:08:34 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="108932fdbdfcb95c13135e42463893dc";
logging-data="1520134"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+49H6Mt4NwRvl0pe8hJXJk0h4iVsmSDyE="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:fPmfz4I4i3CU2z90Cs2drI4DQos=
In-Reply-To: <5d68ac8699997099a3609c4836eec658@news.novabbs.com>
Content-Language: en-US
 by: Robert Finch - Thu, 30 Nov 2023 17:08 UTC

On 2023-11-30 11:25 a.m., MitchAlsup wrote:
> BGB wrote:
>
>>               64-bit multiply, is drastically slower than 32-bit
>> multiply though.
>
> Should only be 1 gate of delay longer......

In FPGA land, using DSPs, a 24x16 multiply is single cycle; 32x32 could be
single cycle if a lower fmax is acceptable, but 64x64 is best pipelined over
several cycles (6). Thor had a fast 24x16 multiply, as it is good enough
for array address calcs in a lot of circumstances.

The slower 64x64 multiply makes the basic shift-and-subtract divide
more appealing compared to N-R or Goldschmidt.
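A minimal restoring shift-subtract divider in C: one quotient bit per step and no multiplier at all, which is what makes it attractive when a 64x64 multiply is slow. udiv_shift_sub is a hypothetical name; hardware versions often use non-restoring or SRT variants instead.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring shift-subtract division, 64 steps for a 64-bit dividend.
   Returns the quotient and stores the remainder in *rem (if non-NULL). */
static uint64_t udiv_shift_sub(uint64_t n, uint64_t d, uint64_t *rem) {
    assert(d != 0);
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        /* r < d here, so for d >= 2^63 the shifted remainder can need
           65 bits; keep the bit that falls off the top. */
        uint64_t top = r >> 63;
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next dividend bit */
        if (top || r >= d) {
            r -= d;                      /* trial subtraction succeeds */
            q |= (uint64_t)1 << i;       /* set this quotient bit */
        }
    }
    if (rem) *rem = r;
    return q;
}
```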

Re: Alternative Representations of the Concertina II ISA

<d692c7b0e4ca88cb93a57c4e5af1a7d8@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35322&group=comp.arch#35322

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 17:50:10 +0000
Organization: novaBBS
Message-ID: <d692c7b0e4ca88cb93a57c4e5af1a7d8@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uk5hl7$dlb6$2@dont-email.me> <5d68ac8699997099a3609c4836eec658@news.novabbs.com> <ukafih$1ecg6$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2619559"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$19cHViVCS2lD71OilygcmOBG4KGtSay7Bh/oGjHgElr55F3S8vap6
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Thu, 30 Nov 2023 17:50 UTC

Robert Finch wrote:

> On 2023-11-30 11:25 a.m., MitchAlsup wrote:
>> BGB wrote:
>>
>>>               64-bit multiply, is drastically slower than 32-bit
>>> multiply though.
>>
>> Should only be 1 gate of delay longer......

> FPGA land using DSPs, 24x16 multiply is single cycle, 32x32 could be
> single cycle if a lower fmax is acceptable, but 64x64 best pipelined for
> several cycles (6). Thor had a fast 24x16 multiply as it is good enough
> for array address calcs in a lot of circumstances.

Nothing stops you from building a multiplier in LUTs with Booth recoding,
each pair of LUTs being a 3-input XOR and a 3-input Majority gate. This also
gets rid of BGB's argument for building an incomplete multiplier tree {because
IEEE needs 53 bits while your DSP only supplies 48.}
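A hedged C sketch of the two ideas together (hypothetical names, not an FPGA netlist). Radix-4 Booth recoding turns a signed 32-bit multiplier into 16 digits in {-2,-1,0,+1,+2}, and each partial product is folded in with a 3:2 compressor whose per-bit sum is a 3-input XOR and whose carry is a 3-input majority, the LUT pair described above. Real designs reduce the partial products with a tree; a linear chain is used here for brevity.

```c
#include <stdint.h>

/* 3:2 compressor on every bit position at once:
   sum = 3-input XOR, carry = 3-input majority moved up one bit. */
static void csa3(uint64_t x, uint64_t y, uint64_t z,
                 uint64_t *sum, uint64_t *carry) {
    *sum = x ^ y ^ z;
    *carry = ((x & y) | (x & z) | (y & z)) << 1;
}

/* Signed 32x32->64 multiply with radix-4 Booth recoding. */
static int64_t mul_booth4(int32_t a, int32_t b) {
    uint64_t ma = (uint64_t)(int64_t)a;   /* sign-extended multiplicand */
    uint64_t ub = (uint64_t)(uint32_t)b;  /* raw multiplier bits */
    uint64_t s = 0, c = 0;                /* carry-save accumulator */
    for (int i = 0; i < 32; i += 2) {
        /* Window b[i+1], b[i], b[i-1], with b[-1] = 0. */
        unsigned w = (unsigned)((ub << 1) >> i) & 7u;
        int digit = (int)((w >> 1) & 1u) + (int)(w & 1u) - 2 * (int)(w >> 2);
        uint64_t pp;                      /* select 0, +-a or +-2a */
        switch (digit) {
        case  0: pp = 0;          break;
        case  1: pp = ma;         break;
        case  2: pp = ma << 1;    break;
        case -1: pp = -ma;        break;
        default: pp = -(ma << 1); break;  /* digit == -2 */
        }
        csa3(s, c, pp << i, &s, &c);      /* weight 4^(i/2) = 2^i */
    }
    return (int64_t)(s + c);              /* final carry-propagate add */
}
```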

Re: Alternative Representations of the Concertina II ISA

<ukaltc$1fmua$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35324&group=comp.arch#35324

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 12:56:42 -0600
Organization: A noiseless patient Spider
Lines: 127
Message-ID: <ukaltc$1fmua$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uk5hl7$dlb6$2@dont-email.me>
<5d68ac8699997099a3609c4836eec658@news.novabbs.com>
<ukafih$1ecg6$1@dont-email.me>
In-Reply-To: <ukafih$1ecg6$1@dont-email.me>
 by: BGB - Thu, 30 Nov 2023 18:56 UTC

On 11/30/2023 11:08 AM, Robert Finch wrote:
> On 2023-11-30 11:25 a.m., MitchAlsup wrote:
>> BGB wrote:
>>
>>>               64-bit multiply, is drastically slower than 32-bit
>>> multiply though.
>>
>> Should only be 1 gate of delay longer......
>
> FPGA land using DSPs, 24x16 multiply is single cycle, 32x32 could be
> single cycle if a lower fmax is acceptable, but 64x64 best pipelined for
> several cycles (6). Thor had a fast 24x16 multiply as it is good enough
> for array address calcs in a lot of circumstances.
>

Yeah, 18x18s (by extension, 16s and 16u) can be single cycle, no issue.

In my case, 32x32 is 3 cycle.
For 64x64, "all hell breaks loose" if one tries to express it directly.

The limiting factor isn't the first stage of multipliers, but all the
adders needed to glue the results together. Likely, whatever is going on
inside the DSP48 is a lot faster than what happens with the generic FPGA
fabric.

For small 3x3->6 bit multipliers or similar, these can be built directly
from LUTs, and for very small values this can be favorable to using a DSP48.

The only reason the multiply in Binary64 FMUL works OK is that I
effectively ignore all the low-order bits that would have existed.
Though, arguably, this could be part of why N-R (Newton-Raphson) can't
converge on the last 4 bits of the result (since this part seemingly
involves sub-ULP shenanigans that would depend on the low-order bits of
the internal multiply, which don't exist in this case).
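The effect of dropping low-order partial products can be shown at a smaller scale. The C sketch below is illustrative only. It uses a binary32-sized 24x24-bit mantissa multiply split into hypothetical 12-bit limbs, not the actual Binary64 FMUL datapath. Omitting the low*low product perturbs only the bits below the kept upper part, so the upper part is off by at most one unit, which is exactly the sub-ULP information that rounding (and a final N-R correction) would need.

```c
#include <stdint.h>

/* Reference: full 48-bit product of two 24-bit mantissas. */
uint64_t mant_mul_full(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Truncated multiply: 12-bit limbs, with the a_lo * b_lo term omitted,
   the way a truncated hardware multiplier would skip it. */
uint64_t mant_mul_truncated(uint32_t a, uint32_t b)
{
    uint64_t a_hi = a >> 12, a_lo = a & 0xFFF;
    uint64_t b_hi = b >> 12, b_lo = b & 0xFFF;
    /* a_lo * b_lo < 2^24, so it only feeds the low 24 bits (plus one carry). */
    return ((a_hi * b_hi) << 24) + ((a_hi * b_lo + a_lo * b_hi) << 12);
}

/* Difference in the upper bits once the low 24 bits are discarded: 0 or 1. */
uint64_t upper_bits_error(uint32_t a, uint32_t b)
{
    return (mant_mul_full(a, b) >> 24) - (mant_mul_truncated(a, b) >> 24);
}
```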

When I looked at it originally, doing a direct 64-bit multiply wasn't
terribly attractive because it would be both fairly expensive and not
that much faster than a plain software option built on 32-bit widening
multiply.

Say (with ADD.X, 64x64 -> 128):
SHAD.Q R4, -32, R6 | SHAD.Q R5, -32, R7 //1c
MOV 0, R3 | DMULU.L R4, R5, R16 //1c
DMULU.L R6, R5, R18 //1c *1
DMULU.L R4, R7, R19 //1c
DMULS.L R6, R7, R17 //1c
MOVLLD R18, R3, R20 | SHAD.Q R18, -32, R21 //1c
MOVLLD R19, R3, R22 | SHAD.Q R19, -32, R23 //2c
ADD.X R20, R22, R2 //2c
ADD.X R2, R16, R2 //1c
RTS //2c
So, ~ 13 cycles (with 2c interlock penalty).
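A portable C sketch of the same decomposition (lo*lo, the two cross products, hi*hi) follows, for unsigned operands only. The listing's DMULS.L on the high halves suggests a signed variant, and this sketch does not attempt that. The `u128_parts` type and function name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type: two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Unsigned 64x64 -> 128 using only 32x32 -> 64 widening multiplies. */
u128_parts mul64x64_128(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t ll = a_lo * b_lo;   /* weight 2^0  */
    uint64_t lh = a_lo * b_hi;   /* weight 2^32 */
    uint64_t hl = a_hi * b_lo;   /* weight 2^32 */
    uint64_t hh = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: upper half of ll plus the low halves of the cross
       products; at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)ll;
    r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return r;
}
```

The carries are handled explicitly here through `mid`. The assembly version gets the same effect from ADD.X, its 128-bit add.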

Or, say 64x64->64:
SHAD.Q R4, -32, R6 | SHAD.Q R5, -32, R7 //1c
MOV 0, R3 | DMULU.L R4, R5, R16 //1c
DMULU.L R6, R5, R18 //2c *1
DMULU.L R4, R7, R19 //3c
ADD R18, R19, R17 //2c
SHAD.Q R17, 32, R17 //2c
ADD R16, R17, R2 //1c
RTS //2c
Here, ~ 14 cycles (5 cycles interlock penalty).

*1: Can't remember for certain ATM whether DMULS.L or DMULU.L is needed
for these parts. This mostly affects the contribution of the sign to the
high-order results. But, for a 64-bit narrow multiply, we can mostly
ignore the sign, as it doesn't change anything in the low 64 bits of the
result.
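Why the sign does not matter for the narrow case can be checked in C. Two's-complement multiplication agrees with unsigned multiplication modulo 2^64, and the hi*hi term only contributes to bits 64 and up, so three 32x32 products suffice. The function names below are hypothetical.

```c
#include <stdint.h>

/* Low 64 bits of a 64x64 multiply from three 32x32 -> 64 products. */
uint64_t mul64_lo(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    /* The cross-product sum may wrap; only its low 32 bits survive the shift. */
    uint64_t cross = a_hi * b_lo + a_lo * b_hi;
    return a_lo * b_lo + (cross << 32);
}

/* Signed version: reinterpret as unsigned; the low 64 bits are identical. */
int64_t smul64_lo(int64_t a, int64_t b)
{
    return (int64_t)mul64_lo((uint64_t)a, (uint64_t)b);
}
```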

Though, there will be some additional overhead for the function call,
which would be avoided with a CPU instruction.

Even with a slow 68-cycle MULS.Q, when function call overheads are
considered, it still isn't too far from break-even.

In my case, this part is optional, and can be controlled with a compiler
command-line option.

> The slower 64x64 multiply means that the basic shift and subtract divide
> is more appealing compared to NR or Goldschmidt.
>

Yes, agreed.

These require a fast full-precision multiplier (or two for
Goldschmidt?), which I don't have either in this case.

But, luckily, I had noted that Shift-and-Subtract was trivially turned
into Shift-and-Add, and that the same general algorithm could be used
(with some minor tweaks) to also function as a 64-bit multiplier.

It isn't fast, but:
It works...
It is moderately affordable.
Can technically give a 64-bit multiply op as well.
Even if, for performance, still better to do 64b MUL in software.
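A C sketch of how closely the two loops mirror each other follows. Both scan the operand MSB-first with one shift per step. Division does a conditional subtract (restoring), and multiplication does a conditional add. This is a software model with hypothetical names, not the actual hardware unit.

```c
#include <stdint.h>

typedef struct { uint64_t q, r; } udiv_result;

/* Restoring shift-and-subtract division, one quotient bit per step.
   Assumes d != 0. rem never exceeds the dividend bits consumed so far
   (fewer than 64), so the left shift cannot overflow. */
udiv_result udiv64(uint64_t n, uint64_t d)
{
    udiv_result res = { 0, 0 };
    for (int i = 63; i >= 0; i--) {
        res.r = (res.r << 1) | ((n >> i) & 1);   /* shift in next dividend bit */
        if (res.r >= d) {                        /* conditional subtract */
            res.r -= d;
            res.q |= 1ull << i;
        }
    }
    return res;
}

/* Shift-and-add multiplication with the same loop shape: low 64 bits of a*b. */
uint64_t umul64_shift_add(uint64_t a, uint64_t b)
{
    uint64_t acc = 0;
    for (int i = 63; i >= 0; i--) {
        acc <<= 1;                               /* shift */
        if ((b >> i) & 1)                        /* conditional add */
            acc += a;
    }
    return acc;
}
```

Both run a fixed 64 iterations. That fixed count is why they are cheap in hardware but slow relative to a real multiplier.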

Though, the difference between the ops isn't too unreasonable when
function-call overhead is considered.

And, for general case DIV and MOD, the ops were still faster than the
software versions (even when the software version may have optimizations
like mapping small divisors through a lookup table of reciprocals).

This was mostly because of overheads:
Function calls aren't free (may need to spill stuff, shuffle values
around in registers, etc);
Special-casing the small divisors requires additional logic, and adds a
layer of function-call overhead for the general case.

And, vs a 36-cycle DIVS.L, it isn't hard to burn that many cycles just on
function-call-related overheads.

Re: Alternative Representations of the Concertina II ISA

<ukan49$1fubo$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35325&group=comp.arch#35325

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 13:17:27 -0600
Organization: A noiseless patient Spider
Lines: 55
Message-ID: <ukan49$1fubo$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uk5hl7$dlb6$2@dont-email.me>
<5d68ac8699997099a3609c4836eec658@news.novabbs.com>
<ukafih$1ecg6$1@dont-email.me>
<d692c7b0e4ca88cb93a57c4e5af1a7d8@news.novabbs.com>
In-Reply-To: <d692c7b0e4ca88cb93a57c4e5af1a7d8@news.novabbs.com>
 by: BGB - Thu, 30 Nov 2023 19:17 UTC

On 11/30/2023 11:50 AM, MitchAlsup wrote:
> Robert Finch wrote:
>
>> On 2023-11-30 11:25 a.m., MitchAlsup wrote:
>>> BGB wrote:
>>>
>>>>               64-bit multiply, is drastically slower than 32-bit
>>>> multiply though.
>>>
>>> Should only be 1 gate of delay longer......
>
>> FPGA land using DSPs, 24x16 multiply is single cycle, 32x32 could be
>> single cycle if a lower fmax is acceptable, but 64x64 best pipelined
>> for several cycles (6). Thor had a fast 24x16 multiply as it is good
>> enough for array address calcs in a lot of circumstances.
>
> Nothing stops you from building a multiplier in LUTs with Booth recoding.
> Each pair of LUTs implements a 3-input XOR and a 3-input Majority gate
> (together, a full adder). This also gets rid of BGB's argument on building
> an incomplete multiplier tree {because IEEE needs 53 bits while your DSP
> only supplies 48.}

But...

Then one needs to deal with the full latency of a multiplier built from
LUTs and CARRY4s, which isn't going to be small...

One could do this, but would likely need to add a few cycles of latency
to the FMUL.

As is, only around 34 bits of the DSP output are used if one is using
them as unsigned 17 bit multipliers. There is still a short-fall of a
few bits, but this can be patched over using 3-bit LUT multipliers or
similar.

It seems like the adder capability of the DSPs isn't getting used, but
this is partly a matter of getting it to synthesize in a useful way.

Say:
D <= A * B + C;
Which means, one needs the 'C' input before they can do the multiply
(and happens to work in a way that is more useful for small MAC
operations than for building a bigger multiplier).
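The serialization can be modelled in C. The sketch below treats a DSP slice as a hypothetical unsigned 17x17 multiply-add, P = A*B + C (the shape of the Verilog line above). It is not Xilinx's DSP48 primitive. Chaining two such units to get a 34x17 product means the second unit's C input is the first unit's shifted result, so the second multiply-add cannot complete until the first has.

```c
#include <stdint.h>

/* Model of one DSP-style multiply-add with an unsigned 17x17 multiplier. */
static uint64_t dsp_mac(uint32_t a17, uint32_t b17, uint64_t c)
{
    return (uint64_t)(a17 & 0x1FFFF) * (b17 & 0x1FFFF) + c;
}

/* 34x17 unsigned multiply from two chained multiply-adds. */
uint64_t mul34x17(uint64_t a34, uint32_t b17)
{
    uint32_t a_lo = (uint32_t)(a34 & 0x1FFFF);
    uint32_t a_hi = (uint32_t)((a34 >> 17) & 0x1FFFF);
    uint64_t p0 = dsp_mac(a_lo, b17, 0);          /* first slice */
    uint64_t p1 = dsp_mac(a_hi, b17, p0 >> 17);   /* depends on p0 via C */
    return (p1 << 17) | (p0 & 0x1FFFF);
}
```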

Luckily at least, my FMUL does produce a full-width mantissa for
Binary64, with the low-end part of the multiplier being cut off mostly
affecting the ULP rounding.

Re: Alternative Representations of the Concertina II ISA

<ukaolb$1g6mo$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35326&group=comp.arch#35326

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 13:43:37 -0600
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <ukaolb$1g6mo$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
In-Reply-To: <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
 by: BGB - Thu, 30 Nov 2023 19:43 UTC

On 11/30/2023 10:23 AM, MitchAlsup wrote:
> BGB wrote:
>
>> Have noted that these old games did have a nifty trick for calculating
>> approximate distance, say:
>>    dx=x0-x1;
>>    dy=y0-y1;
>>    adx=dx^(dx>>31);
>>    ady=dy^(dy>>31);
>>    if(ady>adx)
>>      { t=adx; adx=ady; ady=t; }         //common
>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>    d=adx+(ady>>1);
>
> Why not::
>
>     dx=x0-x1;
>     dy=y0-y1;
>     adx=dx^(dx>>31);
>     ady=dy^(dy>>31);
>     if(ady>adx)
>         d=adx+(ady>>1);
>     else
>         d=ady+(adx>>1);

That also works; I think this is closer to the form used in ROTT
(where I first noticed it).

Swapping the values is easier to extend to 3D though, say:
dx=x0-x1;
dy=y0-y1;
dz=z0-z1;
adx=dx^(dx>>31);
ady=dy^(dy>>31);
adz=dz^(dz>>31);
if(ady>adx)
{ t=adx; adx=ady; ady=t; }
if(adz>ady)
{ t=ady; ady=adz; adz=t; }
if(ady>adx)
{ t=adx; adx=ady; ady=t; }
d=adx+(ady>>1)+(adz>>2);

There isn't a strong reason to prefer one option over another here,
except to avoid the XOR swap, since in that case the XORs are sequentially
dependent on each other.
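Here is a compilable C version of the 2D and 3D snippets above, with hypothetical function names. It keeps the original's `x ^ (x >> 31)` "abs", which gives |x| for x >= 0 but |x| - 1 for negative x (a ones'-complement approximation). It also assumes `>>` on a negative `int32_t` is an arithmetic shift. That is implementation-defined in C but universal in practice.

```c
#include <stdint.h>

/* Ones'-complement abs: exact for x >= 0, off by one for x < 0. */
static int32_t cheap_abs(int32_t x)
{
    return x ^ (x >> 31);
}

/* 2D: larger component plus half the smaller. */
int32_t approx_dist2d(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int32_t adx = cheap_abs(x0 - x1), ady = cheap_abs(y0 - y1), t;
    if (ady > adx) { t = adx; adx = ady; ady = t; }
    return adx + (ady >> 1);
}

/* 3D: three compare-swaps sort so adx >= ady >= adz,
   then max + mid/2 + min/4. */
int32_t approx_dist3d(int32_t x0, int32_t y0, int32_t z0,
                      int32_t x1, int32_t y1, int32_t z1)
{
    int32_t adx = cheap_abs(x0 - x1);
    int32_t ady = cheap_abs(y0 - y1);
    int32_t adz = cheap_abs(z0 - z1);
    int32_t t;
    if (ady > adx) { t = adx; adx = ady; ady = t; }
    if (adz > ady) { t = ady; ady = adz; adz = t; }
    if (ady > adx) { t = adx; adx = ady; ady = t; }
    return adx + (ady >> 1) + (adz >> 2);
}
```

For example, `approx_dist2d(3, 4, 0, 0)` returns 5, and `approx_dist3d(2, 3, 6, 0, 0, 0)` returns 7, matching the exact distances in those cases.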

Not so pretty for 4D though, as swaps quickly come to dominate things.

There is a way of doing it with a "poor man's square root", but it would
be a lot more hairy looking than the above, and for fixed-point requires
knowing where the decimal point is at (whereas the above manages to work
while also remaining agnostic as to the location of the decimal point).

Luckily at least, one doesn't really need to deal with 4D spheres with
fixed-point all that often.

Re: Alternative Representations of the Concertina II ISA

<68b4893596b51df6084755251f7b95e1@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35327&group=comp.arch#35327

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 20:27:39 +0000
Organization: novaBBS
Message-ID: <68b4893596b51df6084755251f7b95e1@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukaolb$1g6mo$1@dont-email.me>
 by: MitchAlsup - Thu, 30 Nov 2023 20:27 UTC

BGB wrote:

> On 11/30/2023 10:23 AM, MitchAlsup wrote:
>> BGB wrote:
>>
>>> Have noted that these old games did have a nifty trick for calculating
>>> approximate distance, say:
>>>    dx=x0-x1;
>>>    dy=y0-y1;
>>>    adx=dx^(dx>>31);
>>>    ady=dy^(dy>>31);
>>>    if(ady>adx)
>>>      { t=adx; adx=ady; ady=t; }         //common
>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>    d=adx+(ady>>1);
>>
>> Why not::
>>
>>     dx=x0-x1;
>>     dy=y0-y1;
>>     adx=dx^(dx>>31);
>>     ady=dy^(dy>>31);
>>     if(ady>adx)
>>         d=adx+(ady>>1);
>>     else
>>         d=ady+(adx>>1);

As long as ISA has max() and min():: Why not::

    dx=x0-x1;
    dy=y0-y1;
    adx=dx^(dx>>31);
    ady=dy^(dy>>31);
    d = min( adx, ady ) + (max( adx, ady ) >> 1);
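Below is a C sketch of the min/max form, assuming plain helper functions (`imin32`/`imax32` are hypothetical names) that a compiler could map onto MIN/MAX instructions where the ISA has them. It keeps the weighting of the swap version quoted above, which is the larger component plus half the smaller: max(adx, ady) + (min(adx, ady) >> 1). The operand order matters. min + (max >> 1) would give 5 rather than 10 for (10, 0).

```c
#include <stdint.h>

static inline int32_t imin32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t imax32(int32_t a, int32_t b) { return a > b ? a : b; }

/* 2D distance estimate: larger component + half the smaller. */
int32_t approx_dist2d_minmax(int32_t dx, int32_t dy)
{
    int32_t adx = dx ^ (dx >> 31);   /* ones'-complement abs, as in the thread */
    int32_t ady = dy ^ (dy >> 31);
    return imax32(adx, ady) + (imin32(adx, ady) >> 1);
}
```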


Re: Alternative Representations of the Concertina II ISA

<ukb0pf$1hhp1$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35329&group=comp.arch#35329

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 16:02:20 -0600
Organization: A noiseless patient Spider
Lines: 110
Message-ID: <ukb0pf$1hhp1$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
In-Reply-To: <68b4893596b51df6084755251f7b95e1@news.novabbs.com>
 by: BGB - Thu, 30 Nov 2023 22:02 UTC

On 11/30/2023 2:27 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 11/30/2023 10:23 AM, MitchAlsup wrote:
>>> BGB wrote:
>>>
>>>> Have noted that these old games did have a nifty trick for
>>>> calculating approximate distance, say:
>>>>    dx=x0-x1;
>>>>    dy=y0-y1;
>>>>    adx=dx^(dx>>31);
>>>>    ady=dy^(dy>>31);
>>>>    if(ady>adx)
>>>>      { t=adx; adx=ady; ady=t; }         //common
>>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>>    d=adx+(ady>>1);
>>>
>>> Why not::
>>>
>>>      dx=x0-x1;
>>>      dy=y0-y1;
>>>      adx=dx^(dx>>31);
>>>      ady=dy^(dy>>31);
>>>      if(ady>adx)
>>>          d=adx+(ady>>1);
>>>      else
>>>          d=ady+(adx>>1);
>
> As long as ISA has max() and min():: Why not::
>
>     dx=x0-x1;
>     dy=y0-y1;
>     adx=dx^(dx>>31);
>     ady=dy^(dy>>31);
>     d = min( adx, ady ) + (max( adx, ady ) >> 1);
>

Yeah, I guess that could work as well for 2D...

Though, makes more sense to use the "__int32_min" intrinsic or similar,
rather than the generic C library "min()"/"max()" macros.
#define min(a, b) (((a)<(b))?(a):(b))
#define max(a, b) (((a)>(b))?(a):(b))

Though, apparently the existence of these macros in stdlib.h is
controversial (they are apparently sort of a POSIX legacy feature, and
not part of the C standard).

Mostly, due to implementation issues in BGBCC, ?: needs an actual
branch, so is ironically actually worse for performance in this case
than using an explicit if/else...

A case could be made for C11 "_Generic", but would need to implement it.

But, yeah, apart from compiler issues, in theory the min/max part could
be 3 instructions.

If done in ASM, might be something like:
// R4..R7 (x0, y0, x1, y1)
SUB R4, R6, R16 | SUB R5, R7, R17
SHAD R16, -31, R18 | SHAD R17, -31, R19
XOR R16, R18, R16 | XOR R17, R19, R17
CMPGT R17, R16
CSELT R16, R17, R18 | CSELT R17, R16, R19
SHAD R19, -1, R19
ADD R18, R19, R2

I don't have actual MIN/MAX ops though.
Though, it seems RISC-V does have them in the Bitmanip extension.

Could probably add them in-theory (maybe along with FMIN/FMAX).

I guess, if I did add FMIN/FMAX, I would technically have enough in
place to "more or less" be able to support the F/D extensions in RISC-V
mode as well at this point, which (if I re-enable the BJX2-LDOP
extension) would in turn be enough to theoretically push it
up to RV64G in userland (or maybe RV64GC if I got around to
finishing the RV-C decoder).

I guess, RV64GC could be useful as this is generally what the Linux
userland is using (but, RV64IM_ZfinxZdinx, not so much...).

But, F/D could have F0..F31 be mapped to R32..R63 in RISC-V mode, ...

But, annoyingly, without implementing the Privileged spec, or going off
and trying to clone a bunch of SiFive's hardware-level interface or
similar, "more correct" CSR handling, ..., there would be little hope of
running an unmodified Linux kernel on it.

But, I guess in theory, if one has RV64GC, the kernel could be ported to
work with the unusual interrupt handling and similar. Theoretically,
there are now mechanisms in place in the design to allow the CPU to
handle interrupts directly in RISC-V mode, avoiding the need for a mixed
ISA kernel (though, very little else would be glossed over at this level).

But, would there be much point in running RV64 Linux on the BJX2
core?... Seems like it would pretty much defeat the whole point in this
case (say, vs using a CPU actually designed to run RV64 as its native ISA).

Well, and/or mostly just run much of the userland in RV64 mode and then
mimic the Linux kernel's syscalls...

Re: Alternative Representations of the Concertina II ISA

<4a7556a465e686b0d75d48238469544c@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35332&group=comp.arch#35332

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 23:15:11 +0000
Organization: novaBBS
Message-ID: <4a7556a465e686b0d75d48238469544c@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukaolb$1g6mo$1@dont-email.me> <68b4893596b51df6084755251f7b95e1@news.novabbs.com> <ukb0pf$1hhp1$1@dont-email.me>
 by: MitchAlsup - Thu, 30 Nov 2023 23:15 UTC

BGB wrote:

> On 11/30/2023 2:27 PM, MitchAlsup wrote:
>> BGB wrote:
>>
>>> On 11/30/2023 10:23 AM, MitchAlsup wrote:
>>>> BGB wrote:
>>>>
>>>>> Have noted that these old games did have a nifty trick for
>>>>> calculating approximate distance, say:
>>>>>    dx=x0-x1;
>>>>>    dy=y0-y1;
>>>>>    adx=dx^(dx>>31);
>>>>>    ady=dy^(dy>>31);
>>>>>    if(ady>adx)
>>>>>      { t=adx; adx=ady; ady=t; }         //common
>>>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>>>    d=adx+(ady>>1);
>>>>
>>>> Why not::
>>>>
>>>>      dx=x0-x1;
>>>>      dy=y0-y1;
>>>>      adx=dx^(dx>>31);
>>>>      ady=dy^(dy>>31);
>>>>      if(ady>adx)
>>>>          d=adx+(ady>>1);
>>>>      else
>>>>          d=ady+(adx>>1);
>>
>> As long as ISA has max() and min():: Why not::
>>
>>     dx=x0-x1;
>>     dy=y0-y1;
>>     adx=dx^(dx>>31);
>>     ady=dy^(dy>>31);
>>     d = min( adx, ady ) + (max( adx, ady ) >> 1);
>>

> Yeah, I guess that could work as well for 2D...

> Though, makes more sense to use the "__int32_min" intrinsic or similar,
> rather than the generic C library "min()"/"max()" macros.
> #define min(a, b) (((a)<(b))?(a):(b))
> #define max(a, b) (((a)>(b))?(a):(b))

max() and min() are single instructions in my ISA. On the FP side, they
even get NaNs put in the proper places, too.

> Though, apparently the existence of these macros in stdlib.h is
> controversial (they are apparently sort of a POSIX legacy feature, and
> not part of the C standard).

It took Brian quite some time to recognize that crap and turn it all into
max() and min() instructions.

> Mostly, due to implementation issues in BGBCC, ?: needs an actual
> branch, so is ironically actually worse for performance in this case
> than using an explicit if/else...

My point exactly--but perhaps you should spend the effort to make your
compiler recognize these as single branch-free instructions.
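Even without MIN/MAX instructions, the selection can avoid a branch. The C sketch below uses the usual mask trick: the comparison produces 0 or 1, negating it gives an all-zeros or all-ones mask, and XOR/AND selects the operand. On most ISAs `a < b` compiles to a compare-and-set rather than a branch, though that depends on the compiler. The function names are hypothetical.

```c
#include <stdint.h>

/* Branch-free signed minimum: mask is -1 when a < b, else 0. */
int32_t min_nobranch(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);
    return b ^ ((a ^ b) & mask);    /* a if a < b, otherwise b */
}

/* Branch-free signed maximum, same mask. */
int32_t max_nobranch(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);
    return a ^ ((a ^ b) & mask);    /* b if a < b, otherwise a */
}
```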

> A case could be made for C11 "_Generic", but would need to implement it.

> But, yeah, apart from compiler issues, in theory the min/max part could
> be 3 instructions.

max() and min() are easy to perform in a single cycle in both int and FP

> If done in ASM, might be something like:
> // R4..R7 (x0, y0, x1, y1)
> SUB R4, R6, R16 | SUB R5, R7, R17
> SHAD R16, -31, R18 | SHAD R17, -31, R19
> XOR R16, R18, R16 | XOR R17, R19, R17
> CMPGT R17, R16
> CSELT R16, R17, R18 | CSELT R17, R16, R19
> SHAD R19, -1, R19
> ADD R18, R19, R2

And you are claiming this has an advantage over::

MAX R9,R8,R7
MIN R10,R8,R7
SRL R10,R10,#1
LEA R10,[R9,R10,#-1]

> I don't have actual MIN/MAX ops though.

perhaps you should reconsider...

> Though, it seems RISC-V does have them in the Bitmanip extension.

> Could probably add them in-theory (maybe along with FMIN/FMAX).

I use the same FU to perform Int max() and FP max()--remember once you
special case out the problems, IEEE is a linearly ordered set of numbers.
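The remark about IEEE values being linearly ordered can be illustrated with the standard mapping from sign-magnitude float bits to an unsigned key. Flip all bits of negative values, and set the sign bit of non-negative ones. Integer order on the keys then matches numeric order. This is a C sketch, not a description of any particular FU. NaNs are not special-cased, and -0.0 sorts just below +0.0.

```c
#include <stdint.h>
#include <string.h>

/* Map a binary32 value to an unsigned key whose integer order
   matches numeric order (NaNs excluded). */
static uint32_t float_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);          /* reinterpret bits without aliasing issues */
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* fmax using only an integer comparison on the keys. */
float fmax_via_int(float a, float b)
{
    return float_key(a) >= float_key(b) ? a : b;
}
```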

Re: Alternative Representations of the Concertina II ISA

<ukbar2$1iufq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35334&group=comp.arch#35334

From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 19:53:53 -0500
Organization: A noiseless patient Spider
Lines: 105
Message-ID: <ukbar2$1iufq$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
In-Reply-To: <4a7556a465e686b0d75d48238469544c@news.novabbs.com>
 by: Robert Finch - Fri, 1 Dec 2023 00:53 UTC

On 2023-11-30 6:15 p.m., MitchAlsup wrote:
> BGB wrote:
>
>> On 11/30/2023 2:27 PM, MitchAlsup wrote:
>>> BGB wrote:
>>>
>>>> On 11/30/2023 10:23 AM, MitchAlsup wrote:
>>>>> BGB wrote:
>>>>>
>>>>>> Have noted that these old games did have a nifty trick for
>>>>>> calculating approximate distance, say:
>>>>>>    dx=x0-x1;
>>>>>>    dy=y0-y1;
>>>>>>    adx=dx^(dx>>31);
>>>>>>    ady=dy^(dy>>31);
>>>>>>    if(ady>adx)
>>>>>>      { t=adx; adx=ady; ady=t; }         //common
>>>>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>>>>    d=adx+(ady>>1);
>>>>>
>>>>> Why not::
>>>>>
>>>>>      dx=x0-x1;
>>>>>      dy=y0-y1;
>>>>>      adx=dx^(dx>>31);
>>>>>      ady=dy^(dy>>31);
>>>>>      if(ady>adx)
>>>>>          d=adx+(ady>>1);
>>>>>      else
>>>>>          d=ady+(adx>>1);
>>>
>>> As long as ISA has max() and min():: Why not::
>>>
>>>      dx=x0-x1;
>>>      dy=y0-y1;
>>>      adx=dx^(dx>>31);
>>>      ady=dy^(dy>>31);
>>>      d = min( adx, ady ) + (max( adx, ady ) >> 1);
>>>
>
>> Yeah, I guess that could work as well for 2D...
>
>
>> Though, makes more sense to use the "__int32_min" intrinsic or
>> similar, rather than the generic C library "min()"/"max()" macros.
>>    #define min(a, b)   (((a)<(b))?(a):(b))
>>    #define max(a, b)   (((a)>(b))?(a):(b))
>
> max() and min() are single instructions in my ISA. On the FP side, they
> even get NaNs put in the proper places, too.
>
>> Though, apparently the existence of these macros in stdlib.h is
>> controversial (they are apparently sort of a POSIX legacy feature, and
>> not part of the C standard).
>
> It took Brian quite some time to recognize that crap and turn it all into
> max() and min() instructions.
>
>> Mostly, due to implementation issues in BGBCC, ?: needs an actual
>> branch, so is ironically actually worse for performance in this case
>> than using an explicit if/else...
>
> My point exactly--but perhaps you should spend the effort to make your
> compiler recognize these as single branch-free instructions.
>
>> A case could be made for C11 "_Generic", but would need to implement it.
>
>
>> But, yeah, apart from compiler issues, in theory the min/max part
>> could be 3 instructions.
>
> max() and min() are easy to perform in a single cycle in both int and FP
>
>> If done in ASM, might be something like:
>>    // R4..R7 (x0, y0, x1, y1)
>>    SUB  R4, R6, R16   |  SUB  R5, R7, R17
>>    SHAD R16, -31, R18 | SHAD R17, -31, R19
>>    XOR  R16, R18, R16 | XOR  R17, R19, R17
>>    CMPGT R17, R16
>>    CSELT R16, R17, R18 | CSELT R17, R16, R19
>>    SHAD R19, -1, R19
>>    ADD  R18, R19, R2
>
> And you are claiming this has an advantage over::
>
>     MAX  R9,R8,R7
>     MIN  R10,R8,R7
>     SRL  R10,R10,#1
>     LEA  R10,[R9,R10,#-1]
>
>
>> I don't have actual MIN/MAX ops though.
>
> perhaps you should reconsider...
>
>>    Though, it seems RISC-V does have them in the Bitmanip extension.
>
>> Could probably add them in-theory (maybe along with FMIN/FMAX).
>
> I use the same FU to perform Int max() and FP max()--remember once you
> special case out the problems, IEEE is a linearly ordered set of numbers.

Q+ has min3 / max3 for minimum or maximum of three values.
But only fmin / fmax for float.

Re: Alternative Representations of the Concertina II ISA

<6439624601c1873b37e03e3c8b95bb22@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35336&group=comp.arch#35336

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Fri, 1 Dec 2023 02:46:36 +0000
Organization: novaBBS
Message-ID: <6439624601c1873b37e03e3c8b95bb22@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukaolb$1g6mo$1@dont-email.me> <68b4893596b51df6084755251f7b95e1@news.novabbs.com> <ukb0pf$1hhp1$1@dont-email.me> <4a7556a465e686b0d75d48238469544c@news.novabbs.com> <ukbar2$1iufq$1@dont-email.me>
 by: MitchAlsup - Fri, 1 Dec 2023 02:46 UTC

Robert Finch wrote:
>>
>>> Could probably add them in-theory (maybe along with FMIN/FMAX).
>>
>> I use the same FU to perform Int max() and FP max()--remember once you
>> special case out the problems, IEEE is a linearly ordered set of numbers.

> Q+ has min3 / max3 for minimum or maximum of three values.
> But only fmin / fmax for float.

Interesting, but I did not have enough room in the 3-operand subGroup.

How do you determine the mid( x, y, z ) ?? allowing::

mx = max( x, y, z );
mn = min( x, y, z );
mi = mid( x, y, z );

??
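One common way to get the median from 2-input min/max is mid(x, y, z) = max(min(x, y), min(max(x, y), z)). That is four min/max operations and no branches if min/max are single instructions. An alternative is x + y + z - max - min, but it can overflow. The C helpers below are hypothetical and say nothing about how Q+ implements min3/max3.

```c
#include <stdint.h>

static int32_t min2(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max2(int32_t a, int32_t b) { return a > b ? a : b; }

int32_t min3(int32_t x, int32_t y, int32_t z) { return min2(min2(x, y), z); }
int32_t max3(int32_t x, int32_t y, int32_t z) { return max2(max2(x, y), z); }

/* Median of three: clamp z into [min(x, y), max(x, y)]. */
int32_t mid3(int32_t x, int32_t y, int32_t z)
{
    return max2(min2(x, y), min2(max2(x, y), z));
}
```

Reading it as a clamp makes the correctness easy to see. If z lies between x and y it is returned. Otherwise the nearer of x and y is returned, which is the middle value.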

Re: Alternative Representations of the Concertina II ISA

<ukbi0o$1jqtv$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35338&group=comp.arch#35338

From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 18:56:23 -0800
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <ukbi0o$1jqtv$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me>
<6439624601c1873b37e03e3c8b95bb22@news.novabbs.com>
 by: Chris M. Thomasson - Fri, 1 Dec 2023 02:56 UTC

On 11/30/2023 6:46 PM, MitchAlsup wrote:
> Robert Finch wrote:
>>>
>>>> Could probably add them in-theory (maybe along with FMIN/FMAX).
>>>
>>> I use the same FU to perform Int max() and FP max()--remember once you
>>> special case out the problems, IEEE is a linearly ordered set of
>>> numbers.
>
>> Q+ has min3 / max3 for minimum or maximum of three values.
>> But only fmin / fmax for float.
>
> Interesting, but I did not have enough room in the 3-operand subGroup.
>
> How do you determine the mid( x, y, z ) ?? allowing::
>
>     mx = max( x, y, z );
>     mn = min( x, y, z );
>     mi = mid( x, y, z );
>
> ??

Think along the lines of two n-ary points, 3-ary here:

Sorry for the pseudo-code:
____________________
// two n-ary points... 3-ary here:
p0 = (-1, 0, 0);
p1 = (1, 0, 0);

// difference and mid...
dif = p1 - p0;
mid = p0 + dif * .5;

// plot some points on the canvas...
plot_point(p0);
plot_point(mid);
plot_point(p1);
____________________

?

Actually, we have a min and a max, so we have a normalized
difference. Would .5 then be halfway between min and max?
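The pseudo-code above, rendered as C for anyone who wants to run it. The vec3 struct and the vec3_* helpers are hypothetical (the post does not fix a representation), and plot_point() is a printf stand-in. With scalar min and max, the same formula gives (min + max) / 2, the halfway value asked about here; for three numbers that equals the middle one only when it happens to sit exactly halfway.

```c
#include <stdio.h>

/* Hypothetical 3-component point type. */
typedef struct { double x, y, z; } vec3;

vec3 vec3_sub(vec3 a, vec3 b) { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
vec3 vec3_add(vec3 a, vec3 b) { return (vec3){ a.x + b.x, a.y + b.y, a.z + b.z }; }
vec3 vec3_scale(vec3 a, double s) { return (vec3){ a.x * s, a.y * s, a.z * s }; }

/* Point a fraction t of the way from p0 to p1; t = 0.5 is the midpoint. */
vec3 vec3_lerp(vec3 p0, vec3 p1, double t)
{
    vec3 dif = vec3_sub(p1, p0);              /* difference... */
    return vec3_add(p0, vec3_scale(dif, t));  /* ...and mid */
}

/* Stand-in for plotting: print the coordinates. */
void plot_point(vec3 p)
{
    printf("(%g, %g, %g)\n", p.x, p.y, p.z);
}
```

With p0 = {-1, 0, 0} and p1 = {1, 0, 0}, plot_point(vec3_lerp(p0, p1, 0.5)) prints (0, 0, 0).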

Re: Alternative Representations of the Concertina II ISA

<ukbo35$1odlg$1@dont-email.me>

From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 23:40:02 -0500
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <ukbo35$1odlg$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me>
<6439624601c1873b37e03e3c8b95bb22@news.novabbs.com>
 by: Robert Finch - Fri, 1 Dec 2023 04:40 UTC

On 2023-11-30 9:46 p.m., MitchAlsup wrote:
> Robert Finch wrote:
>>>
>>>> Could probably add them in-theory (maybe along with FMIN/FMAX).
>>>
>>> I use the same FU to perform Int max() and FP max()--remember once you
>>> special case out the problems, IEEE is a linearly ordered set of
>>> numbers.
>
>> Q+ has min3 / max3 for minimum or maximum of three values.
>> But only fmin / fmax for float.
>
> Interesting, but I did not have enough room in the 3-operand subGroup.
>
> How do you determine the mid( x, y, z ) ?? allowing::
>
>     mx = max( x, y, z );
>     mn = min( x, y, z );
>     mi = mid( x, y, z );
>
> ??

I had not thought to add a mid() operation. It would be fairly easy to
do, but I am not sure it would be worth the opcode space. There are
about 20 opcodes available. Three-operand instructions pretty much use
up a 32-bit opcode, as register specs are six bits, so there is only
one bit left over for sub-grouping. MUX, CMOVxx, and PTRDIF are all
3-operand too.

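/* the middle value lies between the other two, in either order;
   using >= and <= also covers ties */
if ((x >= y && x <= z) || (x <= y && x >= z))
    mi = x;
else if ((y >= x && y <= z) || (y <= x && y >= z))
    mi = y;
else
    mi = z;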

Re: Alternative Representations of the Concertina II ISA

<ukbpkl$1ohp6$1@dont-email.me>

From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 21:06:28 -0800
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <ukbpkl$1ohp6$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me>
<6439624601c1873b37e03e3c8b95bb22@news.novabbs.com>
<ukbi0o$1jqtv$1@dont-email.me>
 by: Chris M. Thomasson - Fri, 1 Dec 2023 05:06 UTC

On 11/30/2023 6:56 PM, Chris M. Thomasson wrote:
> Actually, we have a min and a max, therefore we have a normalized
> difference. .5 can be half way between min and max?

Imvvvho, an interesting question is how to get an equipotential in 3D
space when there are an infinite number of them... One of my thoughts is
to take three points on a field line, then use the surface normal of the
triangle they form for an equipotential.
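The geometric step mentioned here, the surface normal of the triangle through three points, is a cross product of two edge vectors. A minimal sketch with hypothetical names: it returns an unnormalized normal (its length is twice the triangle's area) to avoid depending on libm, so divide by its length if a unit vector is needed. Whether that normal points the right way for an equipotential is the open question in the post; the code only computes the normal. Three points on a straight field line are collinear and yield a zero vector.

```c
/* Hypothetical 3-component vector type. */
typedef struct { double x, y, z; } vec3;

static inline vec3 v3_sub(vec3 a, vec3 b)
{
    return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z };
}

static inline vec3 v3_cross(vec3 a, vec3 b)
{
    return (vec3){ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

/* Normal of triangle (p0, p1, p2), oriented by the right-hand rule.
   Its length is twice the triangle's area; zero if the points are collinear. */
static inline vec3 triangle_normal(vec3 p0, vec3 p1, vec3 p2)
{
    return v3_cross(v3_sub(p1, p0), v3_sub(p2, p0));
}
```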

Re: Alternative Representations of the Concertina II ISA

<ukbq6o$1okqv$1@dont-email.me>

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 30 Nov 2023 23:16:05 -0600
Organization: A noiseless patient Spider
Lines: 181
Message-ID: <ukbq6o$1okqv$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
 by: BGB - Fri, 1 Dec 2023 05:16 UTC

On 11/30/2023 5:15 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 11/30/2023 2:27 PM, MitchAlsup wrote:
>>> BGB wrote:
>>>
>>>> On 11/30/2023 10:23 AM, MitchAlsup wrote:
>>>>> BGB wrote:
>>>>>
>>>>>> Have noted that these old games did have a nifty trick for
>>>>>> calculating approximate distance, say:
>>>>>>    dx=x0-x1;
>>>>>>    dy=y0-y1;
>>>>>>    adx=dx^(dx>>31);
>>>>>>    ady=dy^(dy>>31);
>>>>>>    if(ady>adx)
>>>>>>      { t=adx; adx=ady; ady=t; }         //common
>>>>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>>>>    d=adx+(ady>>1);
>>>>>
>>>>> Why not::
>>>>>
>>>>>      dx=x0-x1;
>>>>>      dy=y0-y1;
>>>>>      adx=dx^(dx>>31);
>>>>>      ady=dy^(dy>>31);
>>>>>      if(ady>adx)
>>>>>          d=adx+(ady>>1);
>>>>>      else
>>>>>          d=ady+(adx>>1);
>>>
>>> As long as ISA has max() and min():: Why not::
>>>
>>>      dx=x0-x1;
>>>      dy=y0-y1;
>>>      adx=dx^(dx>>31);
>>>      ady=dy^(dy>>31);
>>>      d = min( adx, ady ) + (max( adx, ady ) >> 1);
>>>
>
>> Yeah, I guess that could work as well for 2D...
>
>
>> Though, makes more sense to use the "__int32_min" intrinsic or
>> similar, rather than the generic C library "min()"/"max()" macros.
>>    #define min(a, b)   (((a)<(b))?(a):(b))
>>    #define max(a, b)   (((a)>(b))?(a):(b))
>
> max() and min() are single instructions in my ISA. On the FP side, they
> even get NaNs put in the proper places, too.
>
>> Though, apparently the existence of these macros in stdlib.h is
>> controversial (they are apparently sort of a POSIX legacy feature, and
>> not part of the C standard).
>
> It took Brian quite some time to recognize that crap and turn it all into
> max() and min() instructions.
>

OK.

>> Mostly, due to implementation issues in BGBCC, ?: needs an actual
>> branch, so is ironically actually worse for performance in this case
>> than using an explicit if/else...
>
> My point exactly--but perhaps you should spend the effort to make your
> compiler recognize these as single branch-free instructions.
>

I guess it could be possible in theory to use pattern recognition on
the ASTs to turn this into a call to an intrinsic. Might need to look
into this.

For the most part, I hadn't really done much of this sort of
pattern recognition.

>> A case could be made for C11 "_Generic", but would need to implement it.
>
>
>> But, yeah, apart from compiler issues, in theory the min/max part
>> could be 3 instructions.
>
> max() and min() are easy to perform in a single cycle in both int and FP
>

Theoretically, it can be done in two instructions, which is still faster
than what ?: generates currently.

Say:
z=(x<y)?x:y;

Compiling to something like, say:
CMPGT R9, R8
BF .L0
MOV R9, R10
BRA .L1
.L0:
MOV R8, R10
BRA .L1
.L1:
MOV R10, R11

The normal "if()", by contrast, is smart enough to turn it into
predication, but ?: needs a temporary and has no predication special case.

Also, predication needs to be handled partly in the frontend (in the
generated RIL3 bytecode), and is generally omitted if the output is
going to a RIL object (so when compiling for a static library, it
always falls back to actual branches).

Partly this is in turn because my 3AC backend isn't really smart enough
to pull this off.

Putting effort into it wasn't really a high priority though, as ?: in C
seems to be relatively infrequently used.

But, as-is, one will get a 2-op sequence if they use the intrinsic.
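As a portable fallback where a compiler does not if-convert ?:, min and max can also be written branch-free in C with a mask. This is a generic idiom, not BGBCC's __int32_min intrinsic, and the names below are hypothetical; whether it beats a CMP plus conditional-select pair depends on the target. The comparison x < y yields 0 or 1, and negating it gives an all-zeros or all-ones mask that selects between the operands.

```c
#include <stdint.h>

/* (x < y) ? x : y without a branch. */
static inline int32_t min_mask(int32_t x, int32_t y)
{
    int32_t mask = -(int32_t)(x < y);   /* -1 (all ones) if x < y, else 0 */
    return y ^ ((x ^ y) & mask);        /* mask set: y ^ x ^ y = x; clear: y */
}

/* (x < y) ? y : x without a branch. */
static inline int32_t max_mask(int32_t x, int32_t y)
{
    int32_t mask = -(int32_t)(x < y);
    return x ^ ((x ^ y) & mask);        /* mask set: x ^ x ^ y = y; clear: x */
}
```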

>> If done in ASM, might be something like:
>>    // R4..R7 (x0, y0, x1, y1)
>>    SUB  R4, R6, R16   |  SUB  R5, R7, R17
>>    SHAD R16, -31, R18 | SHAD R17, -31, R19
>>    XOR  R16, R18, R16 | XOR  R17, R19, R17
>>    CMPGT R17, R16
>>    CSELT R16, R17, R18 | CSELT R17, R16, R19
>>    SHAD R19, -1, R19
>>    ADD  R18, R19, R2
>
> And you are claiming this has an advantage over::
>
>     MAX  R9,R8,R7
>     MIN  R10,R8,R7
>     SRL  R10,R10,#1
>     LEA  R10,[R9,R10,#-1]
>

At least in so far as it doesn't depend on features that don't currently
exist.

>
>> I don't have actual MIN/MAX ops though.
>
> perhaps you should reconsider...
>

I didn't have them previously as I could express them in a 2-op
sequence, which seemed "good enough".

Though, this was before some more recent changes which (in the name of
improving FPGA timing) effectively turned CMPxx and similar into 2-cycle
ops.

Though, MIN/MAX ops could possibly allow:
MIN Imm10u, Rn // Rn=(Rn<Imm)?Rn:Imm

This could potentially allow implementing "__int32_clamp()" in 2 or 3
instructions, an improvement over the 6 instructions it needs now (with
pretty much all of them having a 2-cycle latency).
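With MIN/MAX available, a clamp is one MAX against the lower bound followed by one MIN against the upper bound. A sketch, assuming __int32_clamp() means clamping v to [lo, hi] with lo <= hi (the post does not spell out its arguments); clamp_i32 is a hypothetical name. With immediate forms such as MIN Imm10u, Rn, both bounds could come from the instructions themselves.

```c
#include <stdint.h>

/* Clamp v into [lo, hi]; assumes lo <= hi. */
static inline int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    int32_t t = v > lo ? v : lo;   /* MAX v, lo */
    return t < hi ? t : hi;        /* MIN t, hi */
}
```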

>>    Though, it seems RISC-V does have them in the Bitmanip extension.
>
>> Could probably add them in-theory (maybe along with FMIN/FMAX).
>
> I use the same FU to perform Int max() and FP max()--remember once you
> special case out the problems, IEEE is a linearly ordered set of numbers.

Yeah.

I was doing both the CMP and FCMP cases via the ALU.
This is why both could likely be added at the same time, since the logic
for adding one would likely apply to the other.

Re: Alternative Representations of the Concertina II ISA

<0abdbee61093c310902a84d0e3934a9f@news.novabbs.com>

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Fri, 1 Dec 2023 18:26:01 +0000
Organization: novaBBS
Message-ID: <0abdbee61093c310902a84d0e3934a9f@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukaolb$1g6mo$1@dont-email.me> <68b4893596b51df6084755251f7b95e1@news.novabbs.com> <ukb0pf$1hhp1$1@dont-email.me> <4a7556a465e686b0d75d48238469544c@news.novabbs.com> <ukbq6o$1okqv$1@dont-email.me>
 by: MitchAlsup - Fri, 1 Dec 2023 18:26 UTC

BGB wrote:

> I was doing both CMP and FCMP cases via the ALU.
> This is why likely both could be added at the same time, since the logic
> for adding one would likely apply to the others.

About 93% of ICMP is used in FCMP.
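One well-known way to see why integer and FP compare can share most of their logic: after remapping the bit pattern of an IEEE-754 binary32 value, a plain signed-integer compare orders floats. This is a sketch of that general trick, not a description of how any particular FU does it. NaNs are not handled, and unlike an IEEE compare it orders -0.0 below +0.0 rather than treating them as equal. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Map a binary32 value to a signed integer key whose order matches the
   float order for non-NaN inputs. Non-negative floats keep their bit
   pattern; negative floats get their 31 low bits inverted, so a larger
   magnitude gives a smaller key. -0.0 gets key -1, just below +0.0's 0. */
static inline int32_t float_order_key(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes float is binary32 */
    return bits < 0 ? bits ^ INT32_MAX : bits;
}

/* FP max using only an integer compare on the keys (NaNs not handled). */
static inline float fmax_via_int(float a, float b)
{
    return float_order_key(a) >= float_order_key(b) ? a : b;
}
```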

Re: Alternative Representations of the Concertina II ISA

<uke17d$23120$1@dont-email.me>

From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 01:28:13 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <uke17d$23120$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me> <ujr9i4$2gau2$1@dont-email.me>
 by: Quadibloc - Sat, 2 Dec 2023 01:28 UTC

On Fri, 24 Nov 2023 22:53:57 +0000, Quadibloc wrote:

> While this is an "improvement", I do realize that this, along with even
> supporting multiple memory widths, even if I have found ways that it can
> technically be done, is sheer insanity.
>
> I do have a rationale for going there. It's simple enough. It comes from
> the x86.
>
> Since the irresistible force of availability of applications means that
> we are doomed to always have just *one* dominant ISA...
>
> then, to minimize the terrible loss that this entails, that one dominant
> ISA should be maximally flexible. To approach, as closely as possible,
> the "good old days" where the PDP-10 and the CDC 6600 coexisted
> alongside the PDP-11 and the IBM System/360.
>
> This is the driving force behind my insane quest to design an ISA for a
> CPU which has the ability to do something which no one seems to want.

And for some people, the KDF 9, with a 48-bit word, was something to
sing about - here, to the tune of "The British Grenadiers":

Some talk of I.B.M.s and some of C.D.C.s,
Of Honeywells and Burroughses, and such great names as these,
But of all the world's computers, there's none that is so fine,
As the English Electric Leo Marconi Kay - Dee - Eff Nine!

Some talk of thirty-two bit words, and some of twenty-four,
Of disks and drums and datacells, and megabytes of core,
But for those who've written usercode there's nothing can outshine,
The subroutine jump nesting store of the Kay - Dee - Eff Nine!

It's a good thing I had saved this in a text file on my computer.

A web search, including a search on Google Groups, was no longer
able to turn this up for me. Also, the KDF 9 was lauded in another
posting for reasons more directly related to this scheme of mine:

(Quoting a brochure for the computer)
The KDF 9 word has 48 bits...

It may be used as...

Eight 6-bit Alphanumeric Characters
One 48-bit Fixed-Point Number
Two 24-bit (Half length) Fixed-Point Numbers
Half of a 96-bit (Double length) Fixed-Point Number
One 48-Bit Floating-Point Number
Two 24-bit (Half length) Floating-Point Numbers
Half of a 96-Bit (Double length) Floating-Point Number
Three 16-Bit (Fixed-Point) Integers
Six 8-Bit Instruction Syllables
(end quote)

An instruction was 1, 2, or 3 syllables; an address was 15 bits.
O, memory! We shall not see its like again.

John Savard

Re: Alternative Representations of the Concertina II ISA

<ukfsog$2f665$1@dont-email.me>

From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 14:24:14 -0400
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <ukfsog$2f665$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
 by: Terje Mathisen - Sat, 2 Dec 2023 18:24 UTC

MitchAlsup wrote:
> BGB wrote:
>
>> Have noted that these old games did have a nifty trick for calculating
>> approximate distance, say:
>>    dx=x0-x1;
>>    dy=y0-y1;
>>    adx=dx^(dx>>31);
>>    ady=dy^(dy>>31);
>>    if(ady>adx)
>>      { t=adx; adx=ady; ady=t; }         //common
>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>    d=adx+(ady>>1);
>
> Why not::
>
>     dx=x0-x1;
>     dy=y0-y1;
>     adx=dx^(dx>>31);
>     ady=dy^(dy>>31);
>     if(ady>adx)
>         d=adx+(ady>>1);
>     else         d=ady+(adx>>1);

Possibly due to wanting to not use more than one branch predictor entry?

Maybe because ady rarely was larger than adx?

Besides, your version has the two terms swapped. :-)
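For comparison, here is the approximation with the terms in the original order (the larger difference plus half the smaller), written with min/max in portable C. A sketch with hypothetical names: it uses a true absolute value in 64-bit arithmetic instead of dx^(dx>>31). That idiom yields |dx| - 1 for negative dx and relies on arithmetic right shift of a negative signed value, which is implementation-defined in C.

```c
#include <stdint.h>

/* |a| in 64-bit, so a difference of two int32 values cannot overflow. */
static inline int64_t iabs64(int64_t a) { return a < 0 ? -a : a; }

/* Approximate distance between (x0, y0) and (x1, y1):
   max(|dx|, |dy|) + min(|dx|, |dy|) / 2. */
static inline int64_t approx_dist(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int64_t adx = iabs64((int64_t)x0 - x1);
    int64_t ady = iabs64((int64_t)y0 - y1);
    int64_t hi = adx > ady ? adx : ady;   /* MAX */
    int64_t lo = adx > ady ? ady : adx;   /* MIN */
    return hi + (lo >> 1);                /* lo >= 0, so the shift halves it */
}
```

approx_dist(0, 0, 3, 4) gives 5. With exact halving the estimate never falls below the true distance and exceeds it by at most about 11.8%, worst when the smaller difference is half the larger; the integer shift can round it slightly lower.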

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Alternative Representations of the Concertina II ISA

<ukftaj$2f665$2@dont-email.me>

From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 14:33:55 -0400
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <ukftaj$2f665$2@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me>
 by: Terje Mathisen - Sat, 2 Dec 2023 18:33 UTC

Robert Finch wrote:
>
> Q+ has min3 / max3 for minimum or maximum of three values.
> But only fmin / fmax for float.
>
For symmetry I would assume/like a median3 as well, so that min3,
median3, max3 would return a sorted list?
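If min3, median3 and max3 all exist, the trio amounts to a three-element sort. In software that is a sorting network of three compare-exchange steps, each one MIN and one MAX, so six MIN/MAX in total. A sketch with hypothetical names.

```c
#include <stdint.h>

/* Compare-exchange: afterwards *a <= *b (one MIN plus one MAX). */
static inline void cmp_swap(int32_t *a, int32_t *b)
{
    int32_t lo = *a < *b ? *a : *b;
    int32_t hi = *a < *b ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Sort three values in place with a 3-comparator network,
   leaving v[0] = min3, v[1] = median3, v[2] = max3. */
static inline void sort3(int32_t v[3])
{
    cmp_swap(&v[0], &v[1]);
    cmp_swap(&v[1], &v[2]);
    cmp_swap(&v[0], &v[1]);
}
```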

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Alternative Representations of the Concertina II ISA

<ukg0ui$2fmpv$5@dont-email.me>

From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 11:35:46 -0800
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <ukg0ui$2fmpv$5@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukfsog$2f665$1@dont-email.me>
 by: Chris M. Thomasson - Sat, 2 Dec 2023 19:35 UTC

On 12/2/2023 10:24 AM, Terje Mathisen wrote:
> MitchAlsup wrote:
>> BGB wrote:
>>
>>> Have noted that these old games did have a nifty trick for
>>> calculating approximate distance, say:
>>>    dx=x0-x1;
>>>    dy=y0-y1;
>>>    adx=dx^(dx>>31);
>>>    ady=dy^(dy>>31);
>>>    if(ady>adx)
>>>      { t=adx; adx=ady; ady=t; }         //common
>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>    d=adx+(ady>>1);
>>
>> Why not::
>>
>>      dx=x0-x1;
>>      dy=y0-y1;
>>      adx=dx^(dx>>31);
>>      ady=dy^(dy>>31);
>>      if(ady>adx)
>>          d=adx+(ady>>1);
>>      else         d=ady+(adx>>1);
>
> Possibly due to wanting to not use more than one branch predictor entry?
>
> Maybe because ady rarely was larger than adx?
>
> Besides, your version has the two terms swapped. :-)

Check this out; I think it might be relevant: Dog Leg Hypotenuse:

https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation

