


Re: Alternative Representations of the Concertina II ISA

<0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35362&group=comp.arch#35362
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 20:41:48 +0000
Organization: novaBBS
Message-ID: <0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukaolb$1g6mo$1@dont-email.me> <68b4893596b51df6084755251f7b95e1@news.novabbs.com> <ukb0pf$1hhp1$1@dont-email.me> <4a7556a465e686b0d75d48238469544c@news.novabbs.com> <ukbar2$1iufq$1@dont-email.me> <ukftaj$2f665$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2849168"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$CIYti60eo33IZLLl2A1uSui33ROWAvXUBZOv8jgMbHjlqBCysOLEO
 by: MitchAlsup - Sat, 2 Dec 2023 20:41 UTC

Terje Mathisen wrote:

> Robert Finch wrote:
>>
>> Q+ has min3 / max3 for minimum or maximum of three values.
>> But only fmin / fmax for float.
>>
> For symmetry I would assume/like a median3 as well, so that min3,
> median3, max3 would return a sorted list?

That was my point 8-10 posts ago.
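
For readers following the median3 idea, here is a minimal C sketch of how the three results line up as a sorted triple. It assumes plain two-input min/max helpers; the names imin, imax and median3 are illustrative only and are not Q+ mnemonics.

```c
/* Hypothetical helpers standing in for two-input MIN/MAX instructions. */
static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Three-input forms built from the two-input ones. */
int min3(int a, int b, int c) { return imin(imin(a, b), c); }
int max3(int a, int b, int c) { return imax(imax(a, b), c); }

/* Median of three: the larger of min(a,b) and min(max(a,b), c). */
int median3(int a, int b, int c)
{
    return imax(imin(a, b), imin(imax(a, b), c));
}
```

With all three available, (min3, median3, max3) of the same inputs is the sorted order of those inputs, which is the symmetry being asked for.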

> Terje

Re: Alternative Representations of the Concertina II ISA

<f722e5b713350f278981fb8bf87bf34e@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35363&group=comp.arch#35363
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 20:42:37 +0000
Organization: novaBBS
Message-ID: <f722e5b713350f278981fb8bf87bf34e@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukfsog$2f665$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2849168"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$BrWgSHLc/.8MmPeNp.1veeJ9ssrRJvzX0CjJLd313HpJM4PV5PCue
 by: MitchAlsup - Sat, 2 Dec 2023 20:42 UTC

Terje Mathisen wrote:

> MitchAlsup wrote:
>> BGB wrote:
>>
>>> Have noted that these old games did have a nifty trick for calculating
>>> approximate distance, say:
>>>    dx=x0-x1;
>>>    dy=y0-y1;
>>>    adx=dx^(dx>>31);
>>>    ady=dy^(dy>>31);
>>>    if(ady>adx)
>>>      { t=adx; adx=ady; ady=t; }         //common
>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>    d=adx+(ady>>1);
>>
>> Why not::
>>
>>     dx=x0-x1;
>>     dy=y0-y1;
>>     adx=dx^(dx>>31);
>>     ady=dy^(dy>>31);
>>     if(ady>adx)
>>         d=adx+(ady>>1);
>>     else
>>         d=ady+(adx>>1);

> Possibly due to wanting to not use more than one branch predictor entry?

> Maybe because ady rarely was larger than adx?

> Besides, your version has the two terms swapped. :-)

Just mimicking the original.

> Terje

Re: Alternative Representations of the Concertina II ISA

<ukgf45$2huej$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35366&group=comp.arch#35366
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 2 Dec 2023 17:37:38 -0600
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <ukgf45$2huej$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukfsog$2f665$1@dont-email.me>
<f722e5b713350f278981fb8bf87bf34e@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 2 Dec 2023 23:37:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="54e7202464a2b74931719912661a1245";
logging-data="2685395"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+WkSPTFkCr2KTwHynD3xOQ"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:0wTbKdA24q4lBO+4+HLGgJ6CTPQ=
Content-Language: en-US
In-Reply-To: <f722e5b713350f278981fb8bf87bf34e@news.novabbs.com>
 by: BGB - Sat, 2 Dec 2023 23:37 UTC

On 12/2/2023 2:42 PM, MitchAlsup wrote:
> Terje Mathisen wrote:
>
>> MitchAlsup wrote:
>>> BGB wrote:
>>>
>>>> Have noted that these old games did have a nifty trick for
>>>> calculating approximate distance, say:
>>>>    dx=x0-x1;
>>>>    dy=y0-y1;
>>>>    adx=dx^(dx>>31);
>>>>    ady=dy^(dy>>31);
>>>>    if(ady>adx)
>>>>      { t=adx; adx=ady; ady=t; }         //common
>>>> //  { adx^=ady; ady^=adx; adx^=ady; }  //sometimes
>>>>    d=adx+(ady>>1);
>>>
>>> Why not::
>>>
>>>      dx=x0-x1;
>>>      dy=y0-y1;
>>>      adx=dx^(dx>>31);
>>>      ady=dy^(dy>>31);
>>>      if(ady>adx)
>>>          d=adx+(ady>>1);
>>>      else
>>>          d=ady+(adx>>1);
>
>> Possibly due to wanting to not use more than one branch predictor entry?
>
>> Maybe because ady rarely was larger than adx?
>
>> Besides, your version has the two terms swapped. :-)
>
> Just mimicking the original.
>

These two versions will give different results, since the "if(ady>adx)"
was the "when to switch values" check, rather than the "when to run the
calculation as-is".

But, yeah, I didn't notice this at first.
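
A minimal C sketch of the difference, under stated assumptions: the function names are made up for illustration, and `x ^ (x >> 31)` relies on an arithmetic right shift of negative values (implementation-defined in C, though common). Note that this "absolute value" is the one's-complement form, so a negative x yields -x-1 rather than -x.

```c
#include <stdint.h>

/* One's-complement |x| as posted; assumes arithmetic right shift. */
static int32_t abs_apx(int32_t x) { return x ^ (x >> 31); }

/* Swap version: make adx the larger term, then larger + smaller/2. */
int32_t dist_swap(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int32_t adx = abs_apx(x0 - x1);
    int32_t ady = abs_apx(y0 - y1);
    if (ady > adx) { int32_t t = adx; adx = ady; ady = t; }
    return adx + (ady >> 1);
}

/* Branch version as posted: the terms are swapped, so this computes
   smaller + larger/2 instead. */
int32_t dist_branch_as_posted(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int32_t adx = abs_apx(x0 - x1);
    int32_t ady = abs_apx(y0 - y1);
    if (ady > adx)
        return adx + (ady >> 1);
    else
        return ady + (adx >> 1);
}

/* Branch version with the terms exchanged; matches dist_swap. */
int32_t dist_branch_fixed(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int32_t adx = abs_apx(x0 - x1);
    int32_t ady = abs_apx(y0 - y1);
    if (ady > adx)
        return ady + (adx >> 1);
    else
        return adx + (ady >> 1);
}
```

For a delta of (10, 4), the swap version gives 10 + 2 = 12, while the as-posted branch version gives 4 + 5 = 9.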

In my compiler, both will likely be translated into predicated sequences
(though, the "fastest possible" would be to use conditional select, or
MIN/MAX if it were available, *).

*: I have now added MIN/MAX as an optional feature, along with logic for
RISC-V's 'FSGNJx' ops, mostly for sake of filling in a few holes needed
to add support for RISC-V's F and D extensions in RV64 mode (which can
now be extended, in theory, out to RV64IMAFD, at least supporting all
the userland parts of the ISA).

This also required adding logic for a single-precision FDIV and similar.
Though, single-precision FSQRT is, at the moment, just using the crude
approximate version.

MIN/MAX and FMIN/FMAX have been added to BJX2 as well, though there is
not currently any plan to add FSGNJx or similar to BJX2 (these ops don't
really seem worth the encoding space).

>> Terje

Fast approx hypotenuse (Was Re: Alternative Representations of the Concertina II ISA)

<uki3bq$2sofj$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35373&group=comp.arch#35373
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Fast approx hypotenuse (Was Re: Alternative Representations of the
Concertina II ISA)
Date: Sun, 3 Dec 2023 10:29:14 -0400
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <uki3bq$2sofj$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukfsog$2f665$1@dont-email.me> <ukg0ui$2fmpv$5@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 3 Dec 2023 14:29:14 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b6358965889427b5bba55c62c2d4f1d8";
logging-data="3039731"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/so1bg6bE0cjnZIBavfM/wN5Ujt3g7lj5GkPdgZocGow=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:c0fWTYjsgSuCScsFJuwO09e49C4=
In-Reply-To: <ukg0ui$2fmpv$5@dont-email.me>
 by: Terje Mathisen - Sun, 3 Dec 2023 14:29 UTC

Chris M. Thomasson wrote:
> Check this out, I think it might be relevant, Dog Leg Hypotenuse:
>
> https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation
>
>
Quoting from that link:

> However, the best solution I found with just shifts and adds was this:
>
> hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8

with a max error of just 2.8%!

That is very good, I get a result in 5 cycles on a cpu with min/max
opcodes, after a rewrite to:

hypot = (hi*8 + lo + max(0,lo*3-hi))>>3;

hi = max(a,b)
lo = min(a,b)

t = lo*3 ;; LEA, single cycle
t2 = hi*8 + lo ;; ditto

t -= hi ;; the "-hi" inside max(0,lo*3-hi)

t = max(0,t)

t += t2

t >>= 3
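
The sequence can be written out as a small C function. This is a sketch under assumptions: inputs are already non-negative and small enough that hi*8 does not overflow, and the helper names are illustrative. It includes the subtraction of hi before the clamp, as in the one-line formula.

```c
#include <stdint.h>

/* Hypothetical stand-ins for MIN/MAX opcodes. */
static int32_t imin(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t imax(int32_t a, int32_t b) { return a > b ? a : b; }

/* hypot ~= (hi*8 + lo + max(0, lo*3 - hi)) >> 3, for a, b >= 0. */
int32_t hypot_dogleg(int32_t a, int32_t b)
{
    int32_t hi = imax(a, b);
    int32_t lo = imin(a, b);
    int32_t t  = lo * 3 - hi;   /* can be negative when lo < hi/3 */
    int32_t t2 = hi * 8 + lo;
    t = imax(0, t);             /* clamp the correction term at zero */
    return (t + t2) >> 3;
}
```

For the 3-4-5 triangle this returns exactly 5; for a = b = 1000 it returns 1375 against a true value of about 1414, which is the worst case of roughly 2.8% low.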

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Alternative Representations of the Concertina II ISA

<uki3pn$2sqjr$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35374&group=comp.arch#35374
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sun, 3 Dec 2023 10:36:39 -0400
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <uki3pn$2sqjr$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me> <ukftaj$2f665$2@dont-email.me>
<0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 3 Dec 2023 14:36:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b6358965889427b5bba55c62c2d4f1d8";
logging-data="3041915"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Mbb3qKGtW9cGmC8D/G6/lkU0lAnV7EDyhPmVkSppuJw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:IybyzYpr2hVCAL39JBV9Xfr4LyQ=
In-Reply-To: <0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>
 by: Terje Mathisen - Sun, 3 Dec 2023 14:36 UTC

MitchAlsup wrote:
> Terje Mathisen wrote:
>
>> Robert Finch wrote:
>>>
>>> Q+ has min3 / max3 for minimum or maximum of three values.
>>> But only fmin / fmax for float.
>>>
>> For symmetry I would assume/like a median3 as well, so that min3,
>> median3, max3 would return a sorted list?
>
> That was my point 8-10 posts ago.

I saw that a bit later: I've been on a sailing trip in the Caribbean
with very sporadic network connections, basically just when in reach of
a French island.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Alternative Representations of the Concertina II ISA

<c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35382&group=comp.arch#35382
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sun, 3 Dec 2023 16:59:31 +0000
Organization: novaBBS
Message-ID: <c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com> <uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me> <uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me> <344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com> <ukaolb$1g6mo$1@dont-email.me> <68b4893596b51df6084755251f7b95e1@news.novabbs.com> <ukb0pf$1hhp1$1@dont-email.me> <4a7556a465e686b0d75d48238469544c@news.novabbs.com> <ukbar2$1iufq$1@dont-email.me> <ukftaj$2f665$2@dont-email.me> <0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com> <uki3pn$2sqjr$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2936178"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$nF.0q5AB35z65aBPXfIzAuGEN83wG5SroWRCc9j42FqzqkCACB4AW
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Sun, 3 Dec 2023 16:59 UTC

Terje Mathisen wrote:

> MitchAlsup wrote:
>> Terje Mathisen wrote:
>>
>>> Robert Finch wrote:
>>>>
>>>> Q+ has min3 / max3 for minimum or maximum of three values.
>>>> But only fmin / fmax for float.
>>>>
>>> For symmetry I would assume/like a median3 as well, so that min3,
>>> median3, max3 would return a sorted list?
>>
>> That was my point 8-10 posts ago.

> I saw that a bit later: I've been on a sailing trip in the Caribbean
> with very sporadic network connections, basically just when in reach of
> a French island.

Must be a tough life............

> Terje

Re: Alternative Representations of the Concertina II ISA

<ukio3a$30p2m$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35388&group=comp.arch#35388
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sun, 3 Dec 2023 12:23:04 -0800
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <ukio3a$30p2m$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me> <ukftaj$2f665$2@dont-email.me>
<0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>
<uki3pn$2sqjr$1@dont-email.me>
<c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 3 Dec 2023 20:23:07 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8868b8f4564f0e4e4a8d04bd2a9ce818";
logging-data="3171414"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/qPdC8Jx8rlVfd3a3+aoevNlJ5D6GX8Xc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:w3su4oCJbT16+eAMces3TKOPLoQ=
Content-Language: en-US
In-Reply-To: <c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>
 by: Chris M. Thomasson - Sun, 3 Dec 2023 20:23 UTC

On 12/3/2023 8:59 AM, MitchAlsup wrote:
> Terje Mathisen wrote:
>
>> MitchAlsup wrote:
>>> Terje Mathisen wrote:
>>>
>>>> Robert Finch wrote:
>>>>>
>>>>> Q+ has min3 / max3 for minimum or maximum of three values.
>>>>> But only fmin / fmax for float.
>>>>>
>>>> For symmetry I would assume/like a median3 as well, so that min3,
>>>> median3, max3 would return a sorted list?
>>>
>>> That was my point 8-10 posts ago.
>
>> I saw that a bit later: I've been on a sailing trip in the Caribbean
>> with very sporadic network connections, basically just when in reach
>> of a French island.
>
> Must be a tough life............

Fwiw, the Sierra people are sailing around as well. Ken and Roberta
Williams. :^) The retired life. ;^) Just kidding. They are hard at work
on some games. One game in particular. Nice to see them working again:

https://www.colossalcave3d.com

Mix colossalcave3d with Half Life! ;^)

https://en.wikipedia.org/wiki/Half-Life_(video_game)

;^)

Re: Fast approx hypotenuse (Was Re: Alternative Representations of the Concertina II ISA)

<ukiv81$323o6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35392&group=comp.arch#35392
Path: i2pn2.org!i2pn.org!news.bbs.nz!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Fast approx hypotenuse (Was Re: Alternative Representations of
the Concertina II ISA)
Date: Sun, 3 Dec 2023 16:25:03 -0600
Organization: A noiseless patient Spider
Lines: 106
Message-ID: <ukiv81$323o6$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukfsog$2f665$1@dont-email.me> <ukg0ui$2fmpv$5@dont-email.me>
<uki3bq$2sofj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 3 Dec 2023 22:25:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="54e7202464a2b74931719912661a1245";
logging-data="3215110"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18rIGObUxzjP+wlReiGxf0W"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:x3bilox013aZ1IAehWPh1eoM24o=
Content-Language: en-US
In-Reply-To: <uki3bq$2sofj$1@dont-email.me>
 by: BGB - Sun, 3 Dec 2023 22:25 UTC

On 12/3/2023 8:29 AM, Terje Mathisen wrote:
> Chris M. Thomasson wrote:
>> Check this out, I think it might be relevant, Dog Leg Hypotenuse:
>>
>> https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation
>>
> Quoting from that link:
>
>> However, the best solution I found with just shifts and adds was this:
>>
>> hypot ~= hi + (lo + max( 0, lo+lo+lo-hi ) ) / 8
>
> with a max error of just 2.8%!
>
> That is very good, I get a result in 5 cycles on a cpu with min/max
> opcodes, after a rewrite to:
>
>  hypot = (hi*8 + lo + max(0,lo*3-hi))>>3;
>
>
>   hi = max(a,b)
>   lo = min(a,b)
>
>   t = lo*3       ;; LEA, single cycle
>   t2 = hi*8 + lo ;; ditto
>
>   t -= hi        ;; the "-hi" inside max(0,lo*3-hi)
>
>   t = max(0,t)
>
>   t += t2
>
>   t >>= 3
>

Yeah, seems like a good option if one needs a better approximation than
the original version.

Which, I guess, exists because:
d=adx+ady;

is too crude even for a lot of stuff where precision doesn't really
matter (say, skipping the rendering of objects that are too far away
from the camera, or ranking objects by relative distance, etc).

One use-case for a 4D distance was mostly for trying to rebuild RGB555
-> Indexed lookup tables for a color palette, where I was doing a
theoretical distance of:
dr=cr1-cr0;
dg=cg1-cg0;
db=cb1-cb0;
cy0=(8*cg0+5*cr0+3*cb0)/16;
cy1=(8*cg1+5*cr1+3*cb1)/16;
dy=cy1-cy0;
d=sqrt_apx(dr*dr+dg*dg+db*db+2*dy*dy);

Where, it is better to try to preserve the luminance of a pixel than its
exact color, so luma is handled specially as an additional parameter, in
addition to just the R/G/B distances.
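
A rough C sketch of this luma-weighted metric used for a nearest-palette search. The struct, the function names and the brute-force search loop are assumptions for illustration, not the actual code. Since only the ranking matters when picking the closest entry, the sketch compares squared distances and skips the square root.

```c
#include <limits.h>

/* Hypothetical 5-bit-per-channel color (each field 0..31). */
typedef struct { int r, g, b; } Rgb5;

/* Luma estimate with the 8/5/3 weights from the post. */
static int luma(Rgb5 c) { return (8 * c.g + 5 * c.r + 3 * c.b) / 16; }

/* Squared "4D" distance: R, G, B plus a doubled luma term. */
int color_dist2(Rgb5 c0, Rgb5 c1)
{
    int dr = c1.r - c0.r;
    int dg = c1.g - c0.g;
    int db = c1.b - c0.b;
    int dy = luma(c1) - luma(c0);
    return dr * dr + dg * dg + db * db + 2 * dy * dy;
}

/* Brute-force search for the closest palette entry; returns -1 if n == 0. */
int nearest_index(const Rgb5 *pal, int n, Rgb5 c)
{
    int best = -1, best_d = INT_MAX;
    for (int i = 0; i < n; i++) {
        int d = color_dist2(pal[i], c);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}
```

Filling a full RGB555 table this way costs 32768 searches over the whole palette, which gives a sense of why a dynamic rebuild is slow on a 50MHz core.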

Where, in this case, sqrt_apx might be, say:
v>>((31-__int_clz(v))/2)

Which, granted, does make the assumption of having a CLZ instruction.
Though, as a downside, for fixed-point, requires more complicated logic
to deal with the location of the decimal point, and is a lot more
"noisy" than the other strategy (since this square-root approximation is
discontinuous).
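
A small C sketch of that shift-based square root, assuming 32-bit unsigned input. The portable clz32 loop is a stand-in for the `__int_clz` intrinsic, and the zero check is an added guard (an assumption, since the one-liner does not define the v==0 case).

```c
#include <stdint.h>

/* Portable count-leading-zeros; stand-in for a CLZ instruction. */
static int clz32(uint32_t v)
{
    int n = 0;
    if (v == 0)
        return 32;
    while (!(v & 0x80000000u)) { v <<= 1; n++; }
    return n;
}

/* v >> (floor(log2(v)) / 2): exact for even powers of two,
   up to about 2x too large just below them. */
uint32_t sqrt_apx(uint32_t v)
{
    if (v == 0)
        return 0;               /* guard: avoids a negative shift count */
    return v >> ((31 - clz32(v)) / 2);
}
```

The discontinuity shows up at each even power of two: sqrt_apx(63) is 15 while sqrt_apx(64) is 8, so the result drops even though the input grew.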

But, the tradeoff is more that the multiplies and square-root
approximation may be faster than the cost of sorting out the 4 distances.

Though, later ended up instead pre-building the lookup table in the form
of an 8-bit BMP image which was added to the resource section of the
shell binary, which was a lot faster than trying to rebuild it
dynamically during GUI start-up. Couldn't come up with a good algo to
rebuild the RGB555 -> 8-bit lookup tables in a time-frame that wasn't
annoyingly slow at 50MHz. Granted, if one does both a primary and
secondary match for dithering, this results in a 66K BMP image, which is
annoyingly large for something which will only be accessed once, when
starting up the GUI mode.

Half tempting to consider defining an LZ compressed 8-bit BMP variant
for this use-case.

Though, I am left to suspect that part of the appeal of 6x6x6 216 color
may have been that it would have been computationally cheaper to rebuild
such an RGB555 lookup table for this than it is for a more free-form
color palette.
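
To illustrate why a fixed 6x6x6 cube is cheap here: each channel can be quantized on its own, so no palette search is needed at all. The following C sketch rests on assumptions: the index order r*36+g*6+b, the RGB555 bit layout with red in the top bits, and the rounding rule are all illustrative choices.

```c
#include <stdint.h>

/* Map a 5-bit channel (0..31) to the nearest of 6 levels (0..5). */
static int level6(int c5) { return (c5 * 5 + 15) / 31; }

/* Palette index within an assumed r*36 + g*6 + b cube layout. */
uint8_t cube_index(int r5, int g5, int b5)
{
    return (uint8_t)(level6(r5) * 36 + level6(g5) * 6 + level6(b5));
}

/* Fill a 32K-entry RGB555 -> index table (assumed layout: R in bits
   14..10, G in bits 9..5, B in bits 4..0). */
void build_cube_lut(uint8_t lut[32768])
{
    for (int rgb = 0; rgb < 32768; rgb++) {
        int r = (rgb >> 10) & 31;
        int g = (rgb >> 5) & 31;
        int b = rgb & 31;
        lut[rgb] = cube_index(r, g, b);
    }
}
```

This is a few integer operations per table entry, compared with a distance computation against every palette entry for a free-form palette.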

Well, and also things like KD-Tree based palette optimization (followed
by rebuilding the RGB555 lookup table) are too slow in this case to be
considered for real-time use. I suspect 90s era OS's did not use
dynamically optimized palettes though.

....

> Terje
>

Re: Alternative Representations of the Concertina II ISA

<ukj39m$32rn1$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35396&group=comp.arch#35396
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.nntp4.net!news.gegeweb.eu!gegeweb.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sun, 3 Dec 2023 17:34:12 -0600
Organization: A noiseless patient Spider
Lines: 75
Message-ID: <ukj39m$32rn1$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me> <ukftaj$2f665$2@dont-email.me>
<0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>
<uki3pn$2sqjr$1@dont-email.me>
<c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>
<ukio3a$30p2m$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 3 Dec 2023 23:34:14 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="7f62334d3366f2a68b6185e423893c6e";
logging-data="3239649"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX199Ho9f2Bx5XCwuyxUi2WCH"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:msbMswYJd8NlW0vYMZF1Hb0vV4Y=
In-Reply-To: <ukio3a$30p2m$1@dont-email.me>
Content-Language: en-US
 by: BGB - Sun, 3 Dec 2023 23:34 UTC

On 12/3/2023 2:23 PM, Chris M. Thomasson wrote:
> On 12/3/2023 8:59 AM, MitchAlsup wrote:
>> Terje Mathisen wrote:
>>
>>> MitchAlsup wrote:
>>>> Terje Mathisen wrote:
>>>>
>>>>> Robert Finch wrote:
>>>>>>
>>>>>> Q+ has min3 / max3 for minimum or maximum of three values.
>>>>>> But only fmin / fmax for float.
>>>>>>
>>>>> For symmetry I would assume/like a median3 as well, so that min3,
>>>>> median3, max3 would return a sorted list?
>>>>
>>>> That was my point 8-10 posts ago.
>>
>>> I saw that a bit later: I've been on a sailing trip in the Caribbean
>>> with very sporadic network connections, basically just when in reach
>>> of a French island.
>>
>> Must be a tough life............
>
> Fwiw, the Sierra people are sailing around as well. Ken and Roberta
> Williams. :^) The retired life. ;^) Just kidding. They are hard at work
> on some games. One game in particular. Nice to see them working again:
>
> https://www.colossalcave3d.com
>
> Mix colossalcave3d with Half Life! ;^)
>
> https://en.wikipedia.org/wiki/Half-Life_(video_game)
>
> ;^)

Half-Life was a game I played a lot back when I was in high school.

Hadn't really played any of the other Sierra games at the time, and I
guess all this was around the time that Sierra imploded.

Now, several decades later, I am right at the end of my 30s...
Time marches on I guess...

Had recently noted while compiling Doom in both RISC-V and BJX2:
  RV64IM: "-ffunction-sections" "-Wl,-gc-sections"
  RV64 can beat out BJX2 XG2 in terms of smaller ".text" with "-Os".
  With "-O2" or "-O3", RV64 generates a bigger ".text" section.
  This is with both using the same C library implementation.
  Without "-ffunction-sections", RV64 still loses with "-Os".
  The actual size difference of the binaries is a fair bit larger.
  It seems that ELF has a lot more space overhead due to metadata.

Both cases seem to be significantly smaller than the x86-64 versions.
  WSL: Has smaller ".text" but bigger ELF;
  Win64: Bigger ".text" but smaller EXE;
  Comparatively, PE/COFF has a lot less metadata.

There is around a 2x size difference between WSL and Win64, with the
Win64 case having around 3x the ".text" size of RV64 and BJX2 (~ 340K in
this case).

BJX2 in Baseline mode with "/Os" would still beat RV64IM "-Os", but this
is not a fair comparison (would need to compare against RV64GC to be fair).

I guess I would need to evaluate whether it is more ISA related or
compiler related that it is not reliably beating RV64IM in terms of code
size.

....

Re: Alternative Representations of the Concertina II ISA

<uknn9p$895a$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35440&group=comp.arch#35440
Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 5 Dec 2023 13:40:08 -0400
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <uknn9p$895a$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr> <uk5dbu$curb$1@dont-email.me>
<uk5em8$d3ss$1@dont-email.me> <uk5hdi$dlb6$1@dont-email.me>
<344c59fecee6751b8e2ce107e211a9b5@news.novabbs.com>
<ukaolb$1g6mo$1@dont-email.me>
<68b4893596b51df6084755251f7b95e1@news.novabbs.com>
<ukb0pf$1hhp1$1@dont-email.me>
<4a7556a465e686b0d75d48238469544c@news.novabbs.com>
<ukbar2$1iufq$1@dont-email.me> <ukftaj$2f665$2@dont-email.me>
<0e80dd840bec852e665aa6eaa62fc20f@news.novabbs.com>
<uki3pn$2sqjr$1@dont-email.me>
<c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 5 Dec 2023 17:40:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b3c99983c6a4fb72ee217325b8e7e237";
logging-data="271530"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19KRdfKIV4HuxzWskBA0YCBPNq6+QOLxRRbN2Inon81fw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:+rBzi56XmT3Edpj+b1X7+AlnrkA=
In-Reply-To: <c8f153ae34fa182ae099ecbf4d64e93b@news.novabbs.com>
 by: Terje Mathisen - Tue, 5 Dec 2023 17:40 UTC

MitchAlsup wrote:
> Terje Mathisen wrote:
>
>> MitchAlsup wrote:
>>> Terje Mathisen wrote:
>>>
>>>> Robert Finch wrote:
>>>>>
>>>>> Q+ has min3 / max3 for minimum or maximum of three values.
>>>>> But only fmin / fmax for float.
>>>>>
>>>> For symmetry I would assume/like a median3 as well, so that min3,
>>>> median3, max3 would return a sorted list?
>>>
>>> That was my point 8-10 posts ago.
>
>> I saw that a bit later: I've been on a sailing trip in the Caribbean
>> with very sporadic network connections, basically just when in reach
>> of a French island.
>
> Must be a tough life............

Indeed.

Our last such trip was in 2018 (Panama City to Jamaica on the Royal
Clipper), this trip was supposed to take place in 2020 but then Covid
happened.

Back in Oslo now, and I just got my very first positive Covid test, so I
must have been exposed to the virus during the trip; thankfully I didn't
really get sick until we got to Oslo Airport in -17C.

Terje
PS. These trips are for orienteering, we ran 10-11 races over 16 days.
I've uploaded headcam videos to youtube from all of them. On those you
can see both my forward view and a map snippet with moving marker which
shows where I'm running.

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Alternative Representations of the Concertina II ISA

<uko9la$e2he$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35455&group=comp.arch#35455
Path: i2pn2.org!i2pn.org!news.furie.org.uk!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 4 Dec 2023 20:14:10 -0500
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <uko9la$e2he$2@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 5 Dec 2023 22:53:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ec68258c84f21c8c1442b2941c45afcb";
logging-data="461358"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18wHBw6gtqwRqwQG1Vc+6j8h975rF7KqJY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:ERfWL3oKATlL1MHctFGA6kZO/ow=
In-Reply-To: <3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
 by: Paul A. Clayton - Tue, 5 Dec 2023 01:14 UTC

On 11/27/23 8:03 PM, MitchAlsup wrote:
> Anton Ertl wrote:
[snip]
>> However, looking at more recent architectures, the RISC-V M
>> extension (which is part of RV64G and RV32G, i.e., a
>> standard extension) has not just multiply instructions (MUL,
>> MULH, MULHU, MULHSU, MULW), but also integer divide
>> instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW, and
>> REMUW.
>
> All of which are possible in My 66000 using operand sign
> control, S-bit, and CARRY when you want 64×64->128 or
> 128/64->{64 quotient, 64 remainder}

What about multiply-high-no-carry-in? There might be cases
where such could be used for division with a reused
(possibly compile-time constant) divisor even without the
carry-in. (Producing the carry-in by a low multiply seems
to have significant overhead in energy, throughput, and/or
latency.) A small immediate operand might be shifted into
the most significant position to facilitate division by a
small constant.

A division-specific instruction (divide-using-reciprocal)
could do more sophisticated manipulation of an immediate and
perhaps provide an instruction a compiler could use with
less knowledge of the dividend. Yet there might be uses for
a simple multiply-high-no-carry-in.
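
For reference, the conventional way of dividing by a constant with a multiply-high looks like the following C sketch. It uses the ordinary full-product high half (as a MULHU-style operation would return for 32-bit operands), not the proposed no-carry-in variant; `mulhu32` and `div10` are illustrative names only.

```c
#include <stdint.h>

/* High 32 bits of a 32x32 -> 64 unsigned product. */
static uint32_t mulhu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 10 for any 32-bit unsigned x.
   0xCCCCCCCD = ceil(2^35 / 10); the remaining shift by 3 completes
   the total shift of 35 bits. */
static uint32_t div10(uint32_t x)
{
    return mulhu32(x, 0xCCCCCCCDu) >> 3;
}
```

Whether a high half computed without the carry from the low partial products would still be exact for a given divisor is the open question here; this sketch only shows the baseline it would be compared against.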

Re: Alternative Representations of the Concertina II ISA

<2023Dec7.143933@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35526&group=comp.arch#35526
Path: i2pn2.org!i2pn.org!nntp.comgw.net!paganini.bofh.team!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 07 Dec 2023 13:39:33 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 16
Message-ID: <2023Dec7.143933@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uko9la$e2he$2@dont-email.me>
Injection-Info: dont-email.me; posting-host="d973e405e1a7fc2e145c0dbaebb3c896";
logging-data="1357530"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18p0spW79tLHMmJDLND3VqA"
Cancel-Lock: sha1:xA6idE0BLxfQjkoPKvDKUwMi+Dk=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 7 Dec 2023 13:39 UTC

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>What about multiply-high-no-carry-in? There might be cases
>where such could be used for division with a reused
>(possibly compile-time constant) divisor even without the
>carry-in. (Producing the carry-in by a low multiply seems
>to have significant overhead in energy, throughput, and/or
>latency.)

What's that operation supposed to produce? How does it differ from,
e.g., RISC-V MULH/MULHU/MULHSU? And if there is a difference, what
makes you think that it still works for division?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<c8a110514d5c8cb0534e217912a0369e@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35535&group=comp.arch#35535
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 7 Dec 2023 18:55:10 +0000
Organization: novaBBS
Message-ID: <c8a110514d5c8cb0534e217912a0369e@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uko9la$e2he$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="3369221"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$oimNOz7eeF9WsFbbZILQieOtPHt59D.NZauOtSNuHDbMm3z3EXR8i
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Thu, 7 Dec 2023 18:55 UTC

Paul A. Clayton wrote:

> On 11/27/23 8:03 PM, MitchAlsup wrote:
>> Anton Ertl wrote:
> [snip]
>>> However, looking at more recent architectures, the RISC-V M
>>> extension (which is part of RV64G and RV32G, i.e., a
>>> standard extension) has not just multiply instructions (MUL,
>>> MULH, MULHU, MULHSU, MULW), but also integer divide
>>> instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW, and
>>> REMUW.
>>
>> All of which are possible in My 66000 using operand sign
>> control, S-bit, and CARRY when you want 64×64->128 or
>> 128/64->{64 quotient, 64 remainder}

> What about multiply-high-no-carry-in?

CARRY R9,{{O}} // carry applies to next inst
// no carry in yes carry out
MUL R10,Rx,Ry // {R9,R10} contain result
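
In C terms, and assuming {R9,R10} means the high and low halves of an unsigned 128-bit product, the result of the pair above could be modelled roughly as below. This is only a portable sketch built from 32-bit partial products; `mul64x64` is a made-up helper, not anything defined by My 66000.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit unsigned product.
   *hi and *lo play the roles of R9 and R10 in the snippet above. */
static void mul64x64(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
    uint64_t x0 = (uint32_t)x, x1 = x >> 32;
    uint64_t y0 = (uint32_t)y, y1 = y >> 32;

    uint64_t p00 = x0 * y0;
    uint64_t p01 = x0 * y1;
    uint64_t p10 = x1 * y0;
    uint64_t p11 = x1 * y1;

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```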

> There might be cases
> where such could be used for division with a reused
> (possibly compile-time constant) divisor even without the
> carry-in. (Producing the carry-in by a low multiply seems
> to have significant overhead in energy, throughput, and/or
> latency.) A small immediate operand might be shifted into
> the most significant position to facilitate division by a
> small constant.

> A division-specific instruction (divide-using-reciprocal)
> could do more sophisticated manipulation of an immediate and
> perhaps provide an instruction a compiler could use with
> less knowledge of the dividend. Yet there might be uses for
> a simple multiply-high-no-carry-in.

Re: Alternative Representations of the Concertina II ISA

<ukths6$1eahr$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35549&group=comp.arch#35549
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 7 Dec 2023 16:44:20 -0600
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <ukths6$1eahr$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 7 Dec 2023 22:44:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f0cffd37592bee6d23d9340da84b6934";
logging-data="1518139"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/59ImtEBdmVFiy85Tc7kVP"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:y8VeSxa7zX5kG2NSfEKi3GcsjYs=
Content-Language: en-US
In-Reply-To: <2023Dec7.143933@mips.complang.tuwien.ac.at>
 by: BGB - Thu, 7 Dec 2023 22:44 UTC

On 12/7/2023 7:39 AM, Anton Ertl wrote:
> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>> What about multiply-high-no-carry-in? There might be cases
>> where such could be used for division with a reused
>> (possibly compile-time constant) divisor even without the
>> carry-in. (Producing the carry-in by a low multiply seems
>> to have significant overhead in energy, throughput, and/or
>> latency.)
>
> What's that operation supposed to produce? How does it differ from,
> e.g., RISC-V MULH/MULHU/MULHSU? And if there is a difference, what
> makes you think that it still works for division?
>

For division by reciprocal, one may need to be able to adjust the right
shift (say, 32..34 bits), and add a bias to the low-order bits if the
input is negative (say, adding "Rcp-1").

Without the variable shift part, one will not get reliable results for
some divisors (IIRC, 7, 17, 31, etc).

Without a bias, it will round towards negative infinity rather than
towards zero (standard for division is to always round towards zero).

As-is, signed division logic (say, "y=x/7;") would be something like:
  MOV      0x49249249, R1
  DMULS.L  R4, R1, R2
  ADD      -1, R1
  CMPGE    0, R4
  ADD?F    R2, R1, R2
  SHAD.Q   R2, -33, R2

Or, if one used jumbo encodings:
  DMULS.L  R4, 0x49249249, R2
  CMPGE    0, R4               // SR.T=(R4>=0)
  ADD?F    R2, 0x49249248, R2
  SHAD.Q   R2, -33, R2

So, sadly, simply taking the high result isn't quite sufficient.
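
As a cross-check on the idea, here is a C sketch of truncating signed division by 7 through a widened multiply. It is not the exact sequence above: it uses the constant 0x92492493 = ceil(2^34 / 7) with a 34-bit shift, which is a different pairing from the 33-bit one written out from memory, and the function names are illustrative. It also assumes that `>>` on a negative `int64_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* Truncating x / 7 for any int32_t x.
   The shifted product is exact for x >= 0 and one too low for x < 0,
   so negative inputs get +1 afterwards. */
static int32_t div7_trunc(int32_t x)
{
    int64_t q = ((int64_t)x * INT64_C(0x92492493)) >> 34;
    return (int32_t)(q + (x < 0));
}

/* Same result, with the correction applied as a bias before the shift. */
static int32_t div7_bias(int32_t x)
{
    int64_t p = (int64_t)x * INT64_C(0x92492493);
    if (x < 0)
        p += (INT64_C(1) << 34) - 1;   /* turns the floor into a ceiling */
    return (int32_t)(p >> 34);
}
```

Either way, the negative case needs an extra step beyond taking the high part of the product, which matches the point made above.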

In my case, I could almost add a dedicated instruction for this, but I
don't expect it would be used often enough to justify doing so.

Internally, it would leverage logic that already exists for DMACS.L and
FAZDIV (which basically already does this), but the existence of this
instruction would effectively mandate that FAZDIV exists...

But, if the compiler already knows FAZDIV exists, then it can do:
  MOV     7, R1
  DIVS.L  R4, R1, R2
And still get a roughly 3 cycle integer division... (At least, if R1 is
1 to 63 ...).

(Or, if DIV exists but FAZDIV does not, it still works but takes 36
cycles instead...).

Or, the basic case will always work, and would not depend on the
existence of the FAZDIV mechanism.

> - anton

Re: Alternative Representations of the Concertina II ISA

<uktifc$1ed0j$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35550&group=comp.arch#35550
Path: i2pn2.org!i2pn.org!paganini.bofh.team!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 7 Dec 2023 16:54:34 -0600
Organization: A noiseless patient Spider
Lines: 78
Message-ID: <uktifc$1ed0j$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at>
<ukths6$1eahr$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 7 Dec 2023 22:54:36 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f0cffd37592bee6d23d9340da84b6934";
logging-data="1520659"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18rqUZC5oh/ZlN+XzkDK2D1"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:D3M9clHmKh7Ftm9koc3v31HvrzM=
Content-Language: en-US
In-Reply-To: <ukths6$1eahr$1@dont-email.me>
 by: BGB - Thu, 7 Dec 2023 22:54 UTC

On 12/7/2023 4:44 PM, BGB wrote:
> On 12/7/2023 7:39 AM, Anton Ertl wrote:
>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>> What about multiply-high-no-carry-in? There might be cases
>>> where such could be used for division with a reused
>>> (possibly compile-time constant) divisor even without the
>>> carry-in. (Producing the carry-in by a low multiply seems
>>> to have significant overhead in energy, throughput, and/or
>>> latency.)
>>
>> What's that operation supposed to produce?  How does it differ from,
>> e.g., RISC-V MULH/MULHU/MULHSU?  And if there is a difference, what
>> makes you think that it still works for division?
>>
>
> For division by reciprocal, one may need to be able to adjust the right
> shift (say, 32..34 bits), and add a bias to the low-order bits if the
> input is negative (say, adding "Rcp-1").
>
> Without the variable shift part, one will not get reliable results for
> some divisors (IIRC, 7, 17, 31, etc).
>
> Without a bias, it will round towards negative infinity rather than
> towards zero (standard for division is to always round towards zero).
>
>
> As-is, signed division logic (say, "y=x/7;") would be something like:
>   MOV      0x49249249, R1
>   DMULS.L  R4, R1, R2
>   ADD      -1, R1
>   CMPGE    0, R4
>   ADD?F    R2, R1, R2
>   SHAD.Q   R2, -33, R2
>
> Or, if one used jumbo encodings:
>   DMULS.L  R4, 0x49249249, R2
>   CMPGE    0, R4               // SR.T=(R4>=0)
>   ADD?F    R2, 0x49249248, R2
>   SHAD.Q   R2, -33, R2
>
>
> So, sadly, simply taking the high result isn't quite sufficient.
>

Self-correction, bias is wrong here, should be something more like:
  ADD?F    R2, 0x1B6DB6DB6, R2

I was writing out the algorithm from memory, so I may be prone to error...

I am still not entirely sure, but in any case, a bias value does need to
be added for negative inputs...

>
> In my case, I could almost add a dedicated instruction for this, but I
> don't expect it would be used often enough to justify doing so.
>
> Internally, it would leverage logic that already exists for DMACS.L and
> FAZDIV (which basically already does this), but the existence of this
> instruction would effectively mandate that FAZDIV exists...
>
>
> But, if the compiler already knows FAZDIV exists, then it can do:
>   MOV     7, R1
>   DIVS.L  R4, R1, R2
> And still get a roughly 3 cycle integer division... (At least, if R1 is
> 1 to 63 ...).
>
> (Or, if DIV exists but FAZDIV does not, it still works but takes 36
> cycles instead...).
>
> Or, the basic case will always work, and would not depend on the
> existence of the FAZDIV mechanism.
>
>
>> - anton
>

Re: Alternative Representations of the Concertina II ISA

<534417a4e3232146a6a3ba6e1113b040@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35555&group=comp.arch#35555
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Fri, 8 Dec 2023 02:44:03 +0000
Organization: novaBBS
Message-ID: <534417a4e3232146a6a3ba6e1113b040@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at> <ukths6$1eahr$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="3400557"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$xLbjinkWvTQKt5Vpg1uqq.k4CvgqKfc3kekkhUHaKkf1Yv7QBIVx.
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Fri, 8 Dec 2023 02:44 UTC

BGB wrote:

> On 12/7/2023 7:39 AM, Anton Ertl wrote:
>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>> What about multiply-high-no-carry-in? There might be cases
>>> where such could be used for division with a reused
>>> (possibly compile-time constant) divisor even without the
>>> carry-in. (Producing the carry-in by a low multiply seems
>>> to have significant overhead in energy, throughput, and/or
>>> latency.)
>>
>> What's that operation supposed to produce? How does it differ from,
>> e.g., RISC-V MULH/MULHU/MULHSU? And if there is a difference, what
>> makes you think that it still works for division?
>>

> For division by reciprocal, one may need to be able to adjust the right
> shift (say, 32..34 bits), and add a bias to the low-order bits if the
> input is negative (say, adding "Rcp-1").

> Without the variable shift part, one will not get reliable results for
> some divisors (IIRC, 7, 17, 31, etc).

> Without a bias, it will round towards negative infinity rather than
> towards zero (standard for division is to always round towards zero).

No:: C99 division is so specified, but there are 3 other (semi) popular
definitions.
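
The post does not say which three are meant. As an illustration only, here are C sketches of the C99 behaviour next to two alternatives that are commonly cited, floored and Euclidean division. The function names are made up, and the overflow case INT32_MIN / -1 and division by zero are left out.

```c
#include <stdint.h>

/* C99 '/': quotient rounds toward zero. */
static int32_t div_trunc(int32_t a, int32_t b) { return a / b; }

/* Floored: quotient rounds toward negative infinity. */
static int32_t div_floor(int32_t a, int32_t b)
{
    int32_t q = a / b, r = a % b;
    return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
}

/* Euclidean: the remainder a - q*b is always non-negative. */
static int32_t div_euclid(int32_t a, int32_t b)
{
    int32_t q = a / b, r = a % b;
    if (r < 0)
        q += (b > 0) ? -1 : 1;
    return q;
}
```

For -7 / 2 these give -3, -4, and -4; for -7 / -2 they give 3, 3, and 4.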

> As-is, signed division logic (say, "y=x/7;") would be something like:
>   MOV      0x49249249, R1
>   DMULS.L  R4, R1, R2
>   ADD      -1, R1
>   CMPGE    0, R4
>   ADD?F    R2, R1, R2
>   SHAD.Q   R2, -33, R2

> Or, if one used jumbo encodings:
>   DMULS.L  R4, 0x49249249, R2
>   CMPGE    0, R4               // SR.T=(R4>=0)
>   ADD?F    R2, 0x49249248, R2
>   SHAD.Q   R2, -33, R2

In 1975 I wrote PDP-11/40 microcode that performed serial DIV algorithm,
and had the opportunity in the loop to short circuit if all the bits of
the denominator had been consumed. {{CMU had an add on board with student
programmable microcode.}}

Thus::
      DIV    R9,R5,#7

should not take more than 5 cycles. Nor should
Thusly::
      DIV    R9,R5,#-7

!!

> So, sadly, simply taking the high result isn't quite sufficient.

It is more practical to make HW DIV fast, than to have everyone {and his
brother} invent new ways to avoid dividing.

Re: Alternative Representations of the Concertina II ISA

<uku584$1kdps$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35557&group=comp.arch#35557
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 7 Dec 2023 22:14:58 -0600
Organization: A noiseless patient Spider
Lines: 153
Message-ID: <uku584$1kdps$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at>
<ukths6$1eahr$1@dont-email.me>
<534417a4e3232146a6a3ba6e1113b040@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 8 Dec 2023 04:15:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="dd11b010c98c2e0ebcf1a63cb087ca72";
logging-data="1718076"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18CodP7dbwTidDaxcBKotRu"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:s/WEPAYpmFRHufLmrzYWv9LTkcs=
In-Reply-To: <534417a4e3232146a6a3ba6e1113b040@news.novabbs.com>
Content-Language: en-US
 by: BGB - Fri, 8 Dec 2023 04:14 UTC

On 12/7/2023 8:44 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 12/7/2023 7:39 AM, Anton Ertl wrote:
>>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>>> What about multiply-high-no-carry-in? There might be cases
>>>> where such could be used for division with a reused
>>>> (possibly compile-time constant) divisor even without the
>>>> carry-in. (Producing the carry-in by a low multiply seems
>>>> to have significant overhead in energy, throughput, and/or
>>>> latency.)
>>>
>>> What's that operation supposed to produce?  How does it differ from,
>>> e.g., RISC-V MULH/MULHU/MULHSU?  And if there is a difference, what
>>> makes you think that it still works for division?
>>>
>
>> For division by reciprocal, one may need to be able to adjust the
>> right shift (say, 32..34 bits), and add a bias to the low-order bits
>> if the input is negative (say, adding "Rcp-1").
>
>> Without the variable shift part, one will not get reliable results for
>> some divisors (IIRC, 7, 17, 31, etc).
>
>> Without a bias, it will round towards negative infinity rather than
>> towards zero (standard for division is to always round towards zero).
>
> No:: C99 division is so specified, but there are 3 other (semi) popular
> definitions.
>

Existing software will break if it isn't round-towards-zero.

Though, yeah, if one doesn't care about this, it is possible to
eliminate the biasing step.

>
>> As-is, signed division logic (say, "y=x/7;") would be something like:
>>    MOV      0x49249249, R1
>>    DMULS.L  R4, R1, R2
>>    ADD      -1, R1
>>    CMPGE    0, R4
>>    ADD?F    R2, R1, R2
>>    SHAD.Q   R2, -33, R2
>
>> Or, if one used jumbo encodings:
>>    DMULS.L  R4, 0x49249249, R2
>>    CMPGE    0, R4               // SR.T=(R4>=0)
>>    ADD?F    R2, 0x49249248, R2
>>    SHAD.Q   R2, -33, R2
>
> In 1975 I wrote PDP-11/40 microcode that performed serial DIV algorithm,
> and had the opportunity in the loop to short circuit if all the bits of
> the denominator had been consumed. {{CMU had an add on board with student
> programmable microcode.}}
>
> Thus::
>       DIV    R9,R5,#7
>
> should not take more than 5 cycles. Nor should
> Thusly::
>       DIV    R9,R5,#-7
>
> !!
>

Timings for the example, fixing bias:
  DMULS.L  R4, 0x49249249, R2  //2c
  CMPGE    0, R4               //2c
  ADD?F    0x1B6DB6DB6, R2     //2c
  SHAD.Q   R2, -33, R2         //1c
Latency, 7 cycles, 3 cycle interlock penalty.

>> So, sadly, simply taking the high result isn't quite sufficient.
>
> It is more practical to make HW DIV fast, than to have everyone {and his
> brother} invent new ways to avoid dividing.

Or, at least make it semi-fast in the compiler.

Then again, I go skimming through my compiler output and encounter this
nugget:
  MOV.Q   (SP, 608), RQ4
  BSR     __va64_arg_l
  MOV     R2, RQ12
  MOV     RQ12, RQ8
  // pdpc201/btshx_supa.c:1597
  MOV     RQ8, RQ7
  EXTU.L  RQ7
  MOV     RQ7, RQ13
  MOV     RQ10, RQ4
  MOV     RQ13, RD5
  BSR     tk_sprint_hex
  MOV     RQ2, RQ10
  // pdpc201/btshx_supa.c:1598

For:
  s1=va_arg(lst, char *);
  ct=tk_sprint_hex(ct, (u32)s1);

Things check out as per the internal rules, but this is a stupid number
of MOV's...

Say, if a person were hand-optimizing this, they could do, say:
  MOV.Q   (SP, 608), RQ4
  BSR     __va64_arg_l
  EXTU.L  R2, RQ5
  // pdpc201/btshx_supa.c:1597
  MOV     RQ10, RQ4
  BSR     tk_sprint_hex
  MOV     RQ2, RQ10
  // pdpc201/btshx_supa.c:1598

But, alas, this is still the sort of stuff I am dealing with from my
compiler...

Then noticed:
The compiler was using "stale" logic here for type conversion:
  Load variable into a scratch register;
  Do the operation on scratch register;
  Store register back to the variable.
Rather than the newer pattern:
  Fetch source and destination variables as registers;
  Do conversion between registers.
The code itself was stale, apparently not having gotten the memo that
pointers are 64 bits now....

But, yeah, fixing up the type conversion case should at least eliminate
a few of the MOV's (still not good, but should be "better").

....

Also found and fixed another bug where apparently the logic trying to
auto-cast a value to a _Bool value would result in it being converted to
_Bool twice in a row, say:
CMPQEQ 0, R8
MOVNT R9
CMPEQ 0, R9
MOVNT R10

....

But, I guess, the stuff one finds if they go skimming through an ASM
output dump...

....

Re: Alternative Representations of the Concertina II ISA

<2023Dec8.162048@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35566&group=comp.arch#35566
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Fri, 08 Dec 2023 15:20:48 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 121
Message-ID: <2023Dec8.162048@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at> <ukths6$1eahr$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="bc1a1ba2cd1bf02df6681add8f69a0c2";
logging-data="1891776"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+fb7FlnY3ZZ/fJV8C0R8iY"
Cancel-Lock: sha1:qo32nCvD8lkSptKUMnNLbq82po4=
X-newsreader: xrn 10.11
 by: Anton Ertl - Fri, 8 Dec 2023 15:20 UTC

BGB <cr88192@gmail.com> writes:
>On 12/7/2023 7:39 AM, Anton Ertl wrote:
>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>> What about multiply-high-no-carry-in? There might be cases
>>> where such could be used for division with a reused
>>> (possibly compile-time constant) divisor even without the
>>> carry-in. (Producing the carry-in by a low multiply seems
>>> to have significant overhead in energy, throughput, and/or
>>> latency.)
>>
>> What's that operation supposed to produce? How does it differ from,
>> e.g., RISC-V MULH/MULHU/MULHSU? And if there is a difference, what
>> makes you think that it still works for division?
>>
>
>For division by reciprocal, one may need to be able to adjust the right
>shift (say, 32..34 bits), and add a bias to the low-order bits if the
>input is negative (say, adding "Rcp-1").
>
>Without the variable shift part, one will not get reliable results for
>some divisors (IIRC, 7, 17, 31, etc).
>
>Without a bias, it will round towards negative infinity rather than
>towards zero (standard for division is to always round towards zero).

Nope. E.g., Forth-83 standardizes floored division; in Forth-94 and
Forth-2012, it's implementation-defined. More generally,
<https://en.wikipedia.org/wiki/Modulo_operation> lists lots of
languages that perform floored or Euclidean modulo, and I would hope
that their division operation agrees with the modulo operation.

As for division by multiplying with the reciprocal, I have written a
paper on it [ertl19kps], but the most important sentence of that paper
is:

|If you read only one paper about the topic, my recommendation is
|Robison’s [Rob05].

My take on this improves the latency of the unsigned case on some CPUs
by avoiding the shift and bias, but it costs an additional
multiplication.
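
A rough C sketch in the spirit of that trade-off, not taken from the paper: a 32-bit unsigned dividend is multiplied by a 64-bit (double-width) round-up reciprocal, and the quotient is simply the top word of the product. The names are hypothetical, and the divisor is assumed to be at least 2.

```c
#include <stdint.h>

/* Hypothetical helper: double-width reciprocal ceil(2^64 / d).
 * Requires d >= 2 (for d == 1 the value 2^64 does not fit). */
uint64_t recip64(uint32_t d)
{
    return UINT64_MAX / d + 1;
}

/* floor(n / d) as bits 64..95 of the 96-bit product inv * n.
 * Two 32x32->64 multiplies; no divisor-dependent shift and no
 * correction step. */
uint32_t div_by_recip64(uint32_t n, uint64_t inv)
{
    uint64_t lo = (inv & 0xFFFFFFFFu) * n;   /* low half of inv times n  */
    uint64_t hi = (inv >> 32) * n;           /* high half of inv times n */
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

Usage would be `div_by_recip64(n, recip64(d))`, with `recip64(d)` computed once per divisor.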

>FAZDIV

What is FAZDIV?

And you failed to answer what multiply-high-no-carry-in does.

@InProceedings{ertl19kps,
author = {M. Anton Ertl},
title = {Integer Division by Multiplying with the
Double-Width Reciprocal},
crossref = {kps19},
pages = {75--84},
url = {http://www.complang.tuwien.ac.at/papers/ertl19kps.pdf},
url-slides = {http://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf},
abstract = {Earlier work on integer division by multiplying with
the reciprocal has focused on multiplying with a
single-width reciprocal, combined with a correction
and followed by a shift. The present work explores
using a double-width reciprocal to allow getting rid
of the correction and shift.}
}

@Proceedings{kps19,
title = {20. Kolloquium Programmiersprachen und Grundlagen
der Programmierung (KPS)},
booktitle = {20. Kolloquium Programmiersprachen und Grundlagen
der Programmierung (KPS)},
year = {2019},
key = {kps19},
editor = {Martin Pl\"umicke and Fayez Abu Alia},
url = {https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf}
}

@InProceedings{robison05,
author = "Arch D. Robison",
title = "{$N$}-Bit Unsigned Division Via {$N$}-Bit
Multiply-Add",
OPTeditor = "Paolo Montuschi and Eric (Eric Mark) Schwarz",
booktitle = "{Proceedings of the 17th IEEE Symposium on Computer
Arithmetic (ARITH-17)}",
publisher = "IEEE Computer Society Press",
ISBN = "0-7695-2366-8",
ISBN-13 = "978-0-7695-2366-8",
year = "2005",
bibdate = "Wed Jun 22 07:02:55 2005",
bibsource = "http://www.math.utah.edu/pub/tex/bib/fparith.bib",
URL = "http://www.acsel-lab.com/arithmetic/arith17/papers/ARITH17_Robison.pdf",
abstract = "Integer division on modern processors is expensive
compared to multiplication. Previous algorithms for
performing unsigned division by an invariant divisor,
via reciprocal approximation, suffer in the worst case
from a common requirement for $ n + 1 $ bit
multiplication, which typically must be synthesized
from $n$-bit multiplication and extra arithmetic
operations. This paper presents, and proves, a hybrid
of previous algorithms that replaces $ n + 1 $ bit
multiplication with a single fused multiply-add
operation on $n$-bit operands, thus reducing any
$n$-bit unsigned division to the upper $n$ bits of a
multiply-add, followed by a single right shift. An
additional benefit is that the prerequisite
calculations are simple and fast. On the Itanium 2
processor, the technique is advantageous for as few as
two quotients that share a common run-time divisor.",
acknowledgement = "Nelson H. F. Beebe, University of Utah, Department
of Mathematics, 110 LCB, 155 S 1400 E RM 233, Salt Lake
City, UT 84112-0090, USA, Tel: +1 801 581 5254, FAX: +1
801 581 4148, e-mail: \path|beebe@math.utah.edu|,
\path|beebe@acm.org|, \path|beebe@computer.org|
(Internet), URL:
\path|http://www.math.utah.edu/~beebe/|",
keywords = "ARITH-17",
pagecount = "9",
}

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<ukvl7u$1r0c6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35570&group=comp.arch#35570
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Fri, 8 Dec 2023 11:54:04 -0600
Organization: A noiseless patient Spider
Lines: 104
Message-ID: <ukvl7u$1r0c6$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at>
<ukths6$1eahr$1@dont-email.me> <2023Dec8.162048@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 8 Dec 2023 17:54:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="dd11b010c98c2e0ebcf1a63cb087ca72";
logging-data="1933702"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+pSsolJFl30yvgKhVlpjh8"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:9DucHYiuBzukjchktExL8sxpO8g=
Content-Language: en-US
In-Reply-To: <2023Dec8.162048@mips.complang.tuwien.ac.at>
 by: BGB - Fri, 8 Dec 2023 17:54 UTC

On 12/8/2023 9:20 AM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> On 12/7/2023 7:39 AM, Anton Ertl wrote:
>>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>>> What about multiply-high-no-carry-in? There might be cases
>>>> where such could be used for division with a reused
>>>> (possibly compile-time constant) divisor even without the
>>>> carry-in. (Producing the carry-in by a low multiply seems
>>>> to have significant overhead in energy, throughput, and/or
>>>> latency.)
>>>
>>> What's that operation supposed to produce? How does it differ from,
>>> e.g., RISC-V MULH/MULHU/MULHSU? And if there is a difference, what
>>> makes you think that it still works for division?
>>>
>>
>> For division by reciprocal, one may need to be able to adjust the right
>> shift (say, 32..34 bits), and add a bias to the low-order bits if the
>> input is negative (say, adding "Rcp-1").
>>
>> Without the variable shift part, one will not get reliable results for
>> some divisors (IIRC, 7, 17, 31, etc).
>>
>> Without a bias, it will round towards negative infinity rather than
>> towards zero (standard for division is to always round towards zero).
>
> Nope. E.g., Forth-83 standardizes floored division; in Forth-94 and
> Forth-2012, it's implementation-defined. More generally,
> <https://en.wikipedia.org/wiki/Modulo_operation> lists lots of
> languages that perform floored or Euclidean modulo, and I would hope
> that their division operation agrees with the modulo operation.
>
> As for division by multiplying with the reciprocal, I have written a
> paper on it [ertl19kps], but the most important sentence of that paper
> is:
>
> |If you read only one paper about the topic, my recommendation is
> |Robison’s [Rob05].
>
> My take on this improves the latency of the unsigned case on some CPUs
> by avoiding the shift and bias, but it costs an additional
> multiplication.
>

OK, how about:
Languages like C, Java, C#, etc, expect division to always round towards
zero.

>> FAZDIV
>
> What is FAZDIV?
>

It was the name I came up with for a CPU feature.

But, basically:
There is a "SlowMulDiv" unit that implements Shift-Add /
Shift-and-Subtract, but isn't very fast.
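
As a point of reference, a shift-and-subtract divider can be modelled in C roughly as below. This is only a software sketch with hypothetical names, assuming a plain restoring scheme that produces one quotient bit per step; it says nothing about how the actual unit is pipelined.

```c
#include <stdint.h>

/* Hypothetical model of a restoring shift-and-subtract divider:
 * 32 steps for 32-bit operands, one quotient bit per step.
 * d must be non-zero. */
uint32_t slow_divu32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint64_t r = 0;   /* partial remainder */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* shift in next dividend bit */
        q <<= 1;
        if (r >= d) {                     /* trial subtract succeeds */
            r -= d;
            q |= 1u;
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```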

There is a 32-bit integer multiplier (that can do integer
multiply-accumulate), which is faster.

I added a case where the integer multiplier can detect a divide that it
can handle itself, and then asserts a signal to the SlowMulDiv unit that
it should treat this one as a NOP. The multiplier then builds a
reciprocal and similar using several lookup tables, feeds this through
the multiplier, and processes the result.

In the current implementation though, it only deals with numbers between
1 and 63. Everything else is left to the SlowMulDiv unit.

The logic in the main pipeline sees that the multiplier had handled this
case instead, and uses the multiplier's output rather than the divider's.

But, used the term "FAZDIV" which more or less just means "Fast Divide"
(also based on the pattern that 'st' followed quickly/immediately by a
consonant sound can get mutated into a 'Z' sound, at least in some US
English dialects).

The actual difference it makes is fairly small though, as while it is at
least fast, it ended up being used relatively infrequently as a lot of
code tends to fairly aggressively avoid integer divide (and the compiler
deals with divide-by-constant cases itself).

> And you failed to answer what multiply-high-no-carry-in does.
>

I am not sure, I didn't come up with that one.

I can imagine though what a "multiply instruction intended specifically
to perform division via multiply by reciprocal" would do.

But, have uncertainty as to whether it would be a good idea to expose
the mechanism directly in an ISA.

If it existed, its main use case would likely be shaving a few clock
cycles off the normal divide by constant case.

Re: Alternative Representations of the Concertina II ISA

<2023Dec9.092634@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35592&group=comp.arch#35592
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 09 Dec 2023 08:26:34 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 26
Message-ID: <2023Dec9.092634@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at> <ukths6$1eahr$1@dont-email.me> <2023Dec8.162048@mips.complang.tuwien.ac.at> <ukvl7u$1r0c6$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="81982e2b7fe220322b820553c5d7fd2f";
logging-data="2265915"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+2iF7PJxM/Q0RUaNTuA5Cm"
Cancel-Lock: sha1:kwjrV5Bf77SG2jeUA53U2nvTGto=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sat, 9 Dec 2023 08:26 UTC

BGB <cr88192@gmail.com> writes:
>On 12/8/2023 9:20 AM, Anton Ertl wrote:
>> BGB <cr88192@gmail.com> writes:
>>> Without a bias, it will round towards negative infinity rather than
>>> towards zero (standard for division is to always round towards zero).
>>
>> Nope. E.g., Forth-83 standardizes floored division; in Forth-94 and
>> Forth-2012, it's implementation-defined. More generally,
>> <https://en.wikipedia.org/wiki/Modulo_operation> lists lots of
>> languages that perform floored or Euclidean modulo, and I would hope
>> that their division operation agrees with the modulo operation.
....
>OK, how about:
>Languages like C, Java, C#, etc, expect division to always round towards
>zero.

Yes, there are programming languages that standardize truncated (aka
symmetric) division. BTW, C89 is not one of them, C99 ff. are. There
are also programming languages that standardize other kinds of
division (or at least modulo), or fail to standardize the behaviour
fully when one or both of the operands is negative.
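
To make the difference concrete, a small C sketch (function names are hypothetical; it assumes C99 or later, a non-zero divisor, and no INT32_MIN / -1 case):

```c
#include <stdint.h>

/* C99 and later: / truncates toward zero. */
int32_t div_trunc(int32_t a, int32_t b)
{
    return a / b;
}

/* Floored division: round toward negative infinity. */
int32_t div_floor(int32_t a, int32_t b)
{
    int32_t q = a / b;
    /* Step down by one when the division is inexact and the
     * operands have opposite signs. */
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q -= 1;
    return q;
}
```

For -7 divided by 2, `div_trunc` gives -3 and `div_floor` gives -4; for operands of the same sign, or exact quotients, the two agree.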

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<ul2bu4$2a7gb$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35601&group=comp.arch#35601
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 7 Dec 2023 22:54:44 -0500
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <ul2bu4$2a7gb$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 9 Dec 2023 18:33:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="10bdb06c9808482ebc8e9231eff84a4f";
logging-data="2432523"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18SKQrSnWtBWFhPR00bRsw8Y+pAHNa9yE8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:gZdEaRAQBUWndPAhdRD3LUXjI5g=
In-Reply-To: <2023Dec7.143933@mips.complang.tuwien.ac.at>
 by: Paul A. Clayton - Fri, 8 Dec 2023 03:54 UTC

On 12/7/23 8:39 AM, Anton Ertl wrote:
> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>> What about multiply-high-no-carry-in? There might be cases
>> where such could be used for division with a reused
>> (possibly compile-time constant) divisor even without the
>> carry-in. (Producing the carry-in by a low multiply seems
>> to have significant overhead in energy, throughput, and/or
>> latency.)
>
> What's that operation supposed to produce? How does it differ from,
> e.g., RISC-V MULH/MULHU/MULHSU? And if there is a difference, what
> makes you think that it still works for division?

RISC-V's multiply high variants give an exact result and require
carry-in from the lower result. What I was proposing is a multiply
that had the same overhead as a multiply-low (ordinary multiply)
but computed a close approximation of the multiply-high result.
Multiplying by the reciprocal would give a close approximation of
a division result; in some cases a correct result would be
provided. (Hardware might be able to provide an exact division
result more efficiently than software correction of a close
approximation.)
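
One possible reading of such an operation, sketched in C with hypothetical names: build the high word from the upper partial products only and drop whatever would carry in from the low half. The exact version is included for comparison; under this reading the approximation is never above the exact result and falls short of it by at most 2.

```c
#include <stdint.h>

/* Exact high 64 bits of a 64x64 product, built from 32-bit halves. */
uint64_t mulhu_exact(uint64_t a, uint64_t b)
{
    uint64_t al = (uint32_t)a, ah = a >> 32;
    uint64_t bl = (uint32_t)b, bh = b >> 32;
    uint64_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
    /* Carry-in from the low half: this sum is below 3 * 2^32. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;
    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Hypothetical "no carry-in" variant: skips al*bl and the low words
 * of the cross products, so no carry from the low half is computed. */
uint64_t mulhu_approx(uint64_t a, uint64_t b)
{
    uint64_t al = (uint32_t)a, ah = a >> 32;
    uint64_t bl = (uint32_t)b, bh = b >> 32;
    return ah * bh + ((al * bh) >> 32) + ((ah * bl) >> 32);
}
```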

Hardware that performed a full (double-width result) multiply
would probably not benefit from such since alternative use or
power-gating of unused portions of the multiplier could introduce
excessive overhead. Hardware that ran the multiplication through
twice to get a high result might be able to benefit.

The use case seems so limited and the benefits so questionable
that this might well be silly, but it seemed an obvious
possibility.

Re: Alternative Representations of the Concertina II ISA

<ul2bu4$2a7gb$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35603&group=comp.arch#35603
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Thu, 7 Dec 2023 22:56:14 -0500
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <ul2bu4$2a7gb$2@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me>
<c8a110514d5c8cb0534e217912a0369e@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 9 Dec 2023 18:33:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="10bdb06c9808482ebc8e9231eff84a4f";
logging-data="2432523"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/fY5Dz/XGM94x7SFYJh1IcOQWCOPy3V3k="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:K5kjuF5bZZPuB5/Kyl5aJjDVbu4=
In-Reply-To: <c8a110514d5c8cb0534e217912a0369e@news.novabbs.com>
 by: Paul A. Clayton - Fri, 8 Dec 2023 03:56 UTC

On 12/7/23 1:55 PM, MitchAlsup wrote:
> Paul A. Clayton wrote:
>
>> On 11/27/23 8:03 PM, MitchAlsup wrote:
>>> Anton Ertl wrote:
>> [snip]
>>>> However, looking at more recent architectures, the RISC-V M
>>>> extension (which is part of RV64G and RV32G, i.e., a standard
>>>> extension) has not just multiply instructions (MUL,
>>>> MULH, MULHU, MULHSU, MULW), but also integer divide
>>>> instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW, and
>>>> REMUW.
>>>
>>> All of which are possible in My 66000 using operand sign
>>> control, S-bit, and CARRY when you want 64×64->128 or
>>> 128/64->{64 quotient, 64 remainder}
>
>> What about multiply-high-no-carry-in?
>
>       CARRY   R9,{{O}}    // carry applies to next inst
>                           // no carry in yes carry out
>       MUL     R10,Rx,Ry   // {R9,R10} contain result

I was thinking of a single register result.

>
>>                                       There might be cases
>> where such could be used for division with a reused
>> (possibly compile-time constant) divisor even without the
>> carry-in. (Producing the carry-in by a low multiply seems
>> to have significant overhead in energy, throughput, and/or
>> latency.) A small immediate operand might be shifted into
>> the most significant position to facilitate division by a
>> small constant.
>
>> A division-specific instruction (divide-using-reciprocal)
>> could do more sophisticated manipulation of an immediate and
>> perhaps provide an instruction a compiler could use with
>> less knowledge of the dividend. Yet there might be uses for
>> a simple multiply-high-no-carry-in.

Re: Alternative Representations of the Concertina II ISA

<ul2kfh$2b9qq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35609&group=comp.arch#35609
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 9 Dec 2023 14:59:28 -0600
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <ul2kfh$2b9qq$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me> <2023Dec7.143933@mips.complang.tuwien.ac.at>
<ul2bu4$2a7gb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 9 Dec 2023 20:59:29 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f2d78be1781fe50ba7c75ccf2cab5887";
logging-data="2467674"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zad73XKx73nwESusNuAzG"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Rkg2aOv2GsMO5oD/ycuo1Y0RzG8=
Content-Language: en-US
In-Reply-To: <ul2bu4$2a7gb$1@dont-email.me>
 by: BGB - Sat, 9 Dec 2023 20:59 UTC

On 12/7/2023 9:54 PM, Paul A. Clayton wrote:
> On 12/7/23 8:39 AM, Anton Ertl wrote:
>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>> What about multiply-high-no-carry-in? There might be cases
>>> where such could be used for division with a reused
>>> (possibly compile-time constant) divisor even without the
>>> carry-in. (Producing the carry-in by a low multiply seems
>>> to have significant overhead in energy, throughput, and/or
>>> latency.)
>>
>> What's that operation supposed to produce?  How does it differ from,
>> e.g., RISC-V MULH/MULHU/MULHSU?  And if there is a difference, what
>> makes you think that it still works for division?
>
> RISC-V's multiply high variants give an exact result and require
> carry-in from the lower result. What I was proposing is a multiply
> that had the same overhead as a multiply-low (ordinary multiply)
> but computed a close approximation of the multiply-high result.
> Multiplying by the reciprocal would give a close approximation of
> a division result; in some cases a correct result would be
> provided. (Hardware might be able to provide an exact division
> result more efficiently than software correction of a close
> approximation.)
>
> Hardware that performed a full (double-width result) multiply
> would probably not benefit from such since alternative use or
> power-gating of unused portions of the multiplier could introduce
> excessive overhead. Hardware that ran the multiplication through
> twice to get a high result might be able to benefit.
>
> The use case seems so limited and the benefits so questionable
> that this might well be silly, but it seemed an obvious
> possibility.

Yeah. This is closer to what I was doing for floating-point operations.

For 32-bit multiply, the multiplier always internally produces a
widening 64-bit result, but then sign or zero extends it for the 32-bit
cases. Since for C, normal int*int or similar will discard any
high-order results in the case of overflow; and otherwise following
every multiply with a sign or zero extension would be wasteful (but is
ironically closer to how early versions of my ISA would have worked; but
I later split up these cases).

I had considered a possible 64*64->128 widening multiply, since
internally the 64-bit multiplier also produces an intermediate 128-bit
result (then returns the low or high half; and doing a multiply for each
half takes twice as long).
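
In C terms, the 32-bit case described above behaves roughly like the sketch below. The function names are hypothetical and this only models the result, not the hardware: form the full 64-bit product, keep the low 32 bits, then sign- or zero-extend them.

```c
#include <stdint.h>

/* Hypothetical model: widening product, low 32 bits kept and
 * sign-extended back to 64 bits. */
int64_t mul32_sext(int32_t a, int32_t b)
{
    uint64_t wide = (uint64_t)((int64_t)a * (int64_t)b);   /* full product */
    uint32_t low = (uint32_t)wide;
    /* Sign-extend bit 31 without an out-of-range conversion. */
    return (int64_t)(low ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
}

/* Same, but the low 32 bits are zero-extended. */
uint64_t mul32_zext(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * b;
    return wide & UINT64_C(0xFFFFFFFF);
}
```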

But, thus far, have not done so...

Re: Alternative Representations of the Concertina II ISA

<a6924ec0fe2b7570aabc143b4e2604e5@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35643&group=comp.arch#35643
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sun, 10 Dec 2023 18:29:50 +0000
Organization: novaBBS
Message-ID: <a6924ec0fe2b7570aabc143b4e2604e5@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com> <uko9la$e2he$2@dont-email.me> <c8a110514d5c8cb0534e217912a0369e@news.novabbs.com> <ul2bu4$2a7gb$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="3688963"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$8w2K53tZZCpepeI.V9t1L.eGUtkRtPlBj4eA4TaE.v9R6uJ.XjMei
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Sun, 10 Dec 2023 18:29 UTC

Paul A. Clayton wrote:

> On 12/7/23 1:55 PM, MitchAlsup wrote:
>> Paul A. Clayton wrote:
>>
>>> On 11/27/23 8:03 PM, MitchAlsup wrote:
>>>> Anton Ertl wrote:
>>> [snip]
>>>>> However, looking at more recent architectures, the RISC-V M
>>>>> extension (which is part of RV64G and RV32G, i.e., a standard
>>>>> extension) has not just multiply instructions (MUL,
>>>>> MULH, MULHU, MULHSU, MULW), but also integer divide
>>>>> instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW, and
>>>>> REMUW.
>>>>
>>>> All of which are possible in My 66000 using operand sign
>>>> control, S-bit, and CARRY when you want 64×64->128 or
>>>> 128/64->{64 quotient, 64 remainder}
>>
>>> What about multiply-high-no-carry-in?
>>
>>       CARRY   R9,{{O}}    // carry applies to next inst
>>                           // no carry in yes carry out
>>       MUL     R10,Rx,Ry   // {R9,R10} contain result

> I was thinking of a single register result.

How can a single 64-bit register hold a 128-bit result ??

>>
>>>                                       There might be cases
>>> where such could be used for division with a reused
>>> (possibly compile-time constant) divisor even without the
>>> carry-in. (Producing the carry-in by a low multiply seems
>>> to have significant overhead in energy, throughput, and/or
>>> latency.) A small immediate operand might be shifted into
>>> the most significant position to facilitate division by a
>>> small constant.
>>
>>> A division-specific instruction (divide-using-reciprocal)
>>> could do more sophisticated manipulation of an immediate and
>>> perhaps provide an instruction a compiler could use with
>>> less knowledge of the dividend. Yet there might be uses for
>>> a simple multiply-high-no-carry-in.

Re: Alternative Representations of the Concertina II ISA

<ul6p4t$m8s4$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=35678&group=comp.arch#35678
Path: i2pn2.org!i2pn.org!paganini.bofh.team!2.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-3b3f-0-e67c-80ca-c105-4ece.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 11 Dec 2023 10:43:41 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <ul6p4t$m8s4$1@newsreader4.netcologne.de>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
<uko9la$e2he$2@dont-email.me>
<c8a110514d5c8cb0534e217912a0369e@news.novabbs.com>
<ul2bu4$2a7gb$2@dont-email.me>
<a6924ec0fe2b7570aabc143b4e2604e5@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 11 Dec 2023 10:43:41 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-3b3f-0-e67c-80ca-c105-4ece.ipv6dyn.netcologne.de:2001:4dd7:3b3f:0:e67c:80ca:c105:4ece";
logging-data="729988"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Mon, 11 Dec 2023 10:43 UTC

MitchAlsup <mitchalsup@aol.com> schrieb:
> Paul A. Clayton wrote:
>
>> On 12/7/23 1:55 PM, MitchAlsup wrote:

>>>> What about multiply-high-no-carry-in?
>>>
>>>       CARRY   R9,{{O}}    // carry applies to next inst
>>>                           // no carry in yes carry out
>>>       MUL     R10,Rx,Ry   // {R9,R10} contain result
>
>> I was thinking of a single register result.
>
> How can a single 64-bit register hold a 128-bit result ??

Sometimes, you don't need the whole result; the upper bits suffice.

Consider dividing a 32-bit unsigned number n on a 32-bit system
by 25 (equally applicable to similar examples with 64-bit systems,
but I happen to have the numbers at hand).

In plain C, you could do this by

uint32_t res = ((uint64_t) n * (uint64_t) 1374389535) >> 35;

where any compiler worth its salt will optimize this to a multiply
high followed by a shift by three bits. Compilers have done
this for a long time, for division by a constant. Example, for
32-bit POWER:

#include <stdint.h>

uint32_t div25 (uint32_t n)
{ return n / 25;
}

yields

div25:
lis 9,0x51eb
ori 9,9,0x851f
mulhwu 3,3,9
srwi 3,3,3
blr

where lis/ori is POWER's way of synthesizing a 32-bit constant,
and 0x51eb851f = 1374389535.

If you can only access the high part via CARRY, this clobbers a
register that you do not need to do otherwise.
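
The div25 example can be checked with a short C sketch. The first function mirrors the mulhwu/srwi pair; the second is a hypothetical helper showing where the constant comes from, namely 1374389535 = ceil(2^35 / 25).

```c
#include <stdint.h>

/* n / 25 as a multiply-high followed by a 3-bit shift,
 * using 0x51EB851F = 1374389535 = ceil(2^35 / 25). */
uint32_t div25_recip(uint32_t n)
{
    uint32_t hi = (uint32_t)(((uint64_t)n * UINT64_C(0x51EB851F)) >> 32); /* mulhwu */
    return hi >> 3;                                                       /* srwi 3 */
}

/* Hypothetical helper: round-up reciprocal ceil(2^shift / d),
 * for d >= 1 and shift < 63. */
uint64_t recip_ceil(uint32_t d, unsigned shift)
{
    return ((UINT64_C(1) << shift) + d - 1) / d;
}
```

Here `recip_ceil(25, 35)` reproduces the constant, and `div25_recip` only ever needs the upper word of the product.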

