Rocksolid Light



devel / comp.arch / Re: What integer C type to use

Re: Cray style vectors

<2024Mar12.232336@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37964&group=comp.arch#37964
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Tue, 12 Mar 2024 22:23:36 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 39
Message-ID: <2024Mar12.232336@mips.complang.tuwien.ac.at>
References: <upq0cr$6b5m$1@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="a3d96a3e5e8c9d1d65febe5a8b08fbcd";
logging-data="569055"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18iRUDR7ifL8IJlBnNdv4K8"
Cancel-Lock: sha1:q3pTfLbtJ4sPvqFWCsUg/iE+5ag=
X-newsreader: xrn 10.11
 by: Anton Ertl - Tue, 12 Mar 2024 22:23 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>Tim Rentsch wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>
>>> If I really had to write a 64x64->128 MUL, with no widening MUL or
>>> MULH which returns the high half, then I would punt and do it using
>>> 32-bit parts (all variables are u64): [...]
>>
>> I wrote some code along the same lines. A difference is you
>> are considering unsigned multiplication, and I am considering
>> signed multiplication.
>>
>Signed mul is just a special case of unsigned mul, right?
>
>I.e. in case of a signed widening mul, you'd first extract the signs,
>convert the inputs to unsigned, then do the unsigned widening mul,
>before finally restoring the sign as the XOR of the input signs?

In Gforth we use:

DCell mmul (Cell a, Cell b) /* signed multiply, mixed precision */
{ DCell res;

res = UD2D(ummul (a, b));
if (a < 0)
res.hi -= b;
if (b < 0)
res.hi -= a;
return res;
}

I have this technique from Andrew Haley. It relies on two's-complement
representation.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
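
The two steps discussed above (an unsigned 64x64->128 multiply built from
32-bit parts, followed by the high-word correction for negative inputs) can
be sketched in self-contained C as follows. This is only an illustration
under stated assumptions: it does not use Gforth's real Cell, DCell, UD2D or
ummul definitions, and the names u128_parts, umul_64x64, smul_64x64 and
smul_examples are hypothetical. Like the original, it assumes the result is
wanted as a two's-complement bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a double-cell value: 128 bits as two halves. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Unsigned 64x64->128 multiply using four 32x32->64 partial products. */
static u128_parts umul_64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: three terms below 2^32 each, so the sum cannot wrap. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    u128_parts r;
    r.lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Signed multiply: unsigned product of the bit patterns, then subtract
   the other operand from the high word for each negative input.
   All arithmetic is on unsigned types, so it wraps modulo 2^64. */
static u128_parts smul_64x64(int64_t a, int64_t b)
{
    u128_parts r = umul_64x64((uint64_t)a, (uint64_t)b);
    if (a < 0)
        r.hi -= (uint64_t)b;
    if (b < 0)
        r.hi -= (uint64_t)a;
    return r;
}

/* Usage examples: results are 128-bit two's-complement patterns. */
static void smul_examples(void)
{
    u128_parts r = smul_64x64(-2, 3);            /* -6 */
    assert(r.hi == UINT64_MAX && r.lo == (uint64_t)-6);
    r = smul_64x64(INT64_MIN, INT64_MIN);        /* 2^126 */
    assert(r.hi == (UINT64_C(1) << 62) && r.lo == 0);
}
```

Keeping the correction in unsigned arithmetic is a deliberate choice in this
sketch: the subtractions are then defined to wrap, so the C code itself does
not depend on signed overflow behavior.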

Re: Cray style vectors

<md5IN.622$_a1e.582@fx16.iad>

https://news.novabbs.org/devel/article-flat.php?id=37965&group=comp.arch#37965
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!panix!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx16.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Cray style vectors
References: <upq0cr$6b5m$1@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me> <2024Mar12.232336@mips.complang.tuwien.ac.at>
In-Reply-To: <2024Mar12.232336@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 41
Message-ID: <md5IN.622$_a1e.582@fx16.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 12 Mar 2024 23:05:54 UTC
Date: Tue, 12 Mar 2024 19:05:44 -0400
X-Received-Bytes: 2362
 by: EricP - Tue, 12 Mar 2024 23:05 UTC

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> Tim Rentsch wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>
>>>> If I really had to write a 64x64->128 MUL, with no widening MUL or
>>>> MULH which returns the high half, then I would punt and do it using
>>>> 32-bit parts (all variables are u64): [...]
>>> I wrote some code along the same lines. A difference is you
>>> are considering unsigned multiplication, and I am considering
>>> signed multiplication.
>>>
>> Signed mul is just a special case of unsigned mul, right?
>>
>> I.e. in case of a signed widening mul, you'd first extract the signs,
>> convert the inputs to unsigned, then do the unsigned widening mul,
>> before finally restoring the sign as the XOR of the input signs?
>
> In Gforth we use:
>
> DCell mmul (Cell a, Cell b) /* signed multiply, mixed precision */
> {
> DCell res;
>
> res = UD2D(ummul (a, b));
> if (a < 0)
> res.hi -= b;
> if (b < 0)
> res.hi -= a;
> return res;
> }
>
> I have this technique from Andrew Haley. It relies on two's-complement
> representation.
>
> - anton

Yeah, that's what Alpha does with UMULH.
I'm still trying to figure out why it works.

Re: Cray style vectors

<86bk7irfjf.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=37967&group=comp.arch#37967
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Tue, 12 Mar 2024 20:12:20 -0700
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <86bk7irfjf.fsf@linuxsc.com>
References: <upq0cr$6b5m$1@dont-email.me> <uqmn7c$3n35k$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at> <uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me> <86jzm7qqdk.fsf@linuxsc.com> <aac798f6f6b611c624b1bb0ad1f7d30a@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="79dd1b0d32cfdf47da0b884151472315";
logging-data="801885"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18bTxeOghPlas/LutYRPNYVEGO7ROgm75w="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:OjbG2pZDLPn6cX8vzJNeKTZDicE=
sha1:sbdbJQLBvuzHlo9HgsWh1dYF+/k=
 by: Tim Rentsch - Wed, 13 Mar 2024 03:12 UTC

mitchalsup@aol.com (MitchAlsup1) writes:

> Tim Rentsch wrote:
>
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>
>>> Tim Rentsch wrote:
>>>
>>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>>
>>>>> If I really had to write a 64x64->128 MUL, with no widening MUL or
>>>>> MULH which returns the high half, then I would punt and do it using
>>>>> 32-bit parts (all variables are u64): [...]
>>>>
>>>> I wrote some code along the same lines. A difference is you
>>>> are considering unsigned multiplication, and I am considering
>>>> signed multiplication.
>>>
>>> Signed mul is just a special case of unsigned mul, right?
>>>
>>> I.e. in case of a signed widening mul, you'd first extract the signs,
>>> convert the inputs to unsigned, then do the unsigned widening mul,
>>> before finally restoring the sign as the XOR of the input signs?
>>>
>>> There is a small gotcha if either of the inputs are of the 0x80000000
>>> form, i.e. MININT, but the naive iabs() conversion will do the right
>>> thing by leaving the input unchanged.
>>>
>>> At the other end there cannot be any issues since restoring a negative
>>> output sign cannot overflow/fail.
>>
>> It isn't quite that simple. Some of what you describe has a risk
>> of running afoul of implementation-defined behavior or undefined
>> behavior (as for example abs( INT_MIN )). I'm pretty sure it's
>> possible to avoid those pitfalls, but it requires a fair amount
>> of care and careful thinking.
>
> It would be supremely nice if we could go back in time before
> computers and reserve an integer encoding that represents the
> value of "there is no value here" and mandate it upon integer
> arithmetic.

ISO C allows such an encoding, even for two's complement.

Sadly it appears that the latest C standard will be taking
away that allowance.

>> Note that my goal is only to avoid the possibility of undefined
>> behavior that comes from signed overflow. My approach is to safely
>> determine whether the signed multiplication would overflow, and if
>> it wouldn't then simply use signed arithmetic to get the result.
>
> Double width multiplication cannot overflow. 2n = nxn then, ignoring
> the top n bits gives you your non-overflowing multiply.

C does not guarantee that. The point of the exercise is to
write code assuming nothing more than what the C standard
mandates.
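
For readers who want something concrete, one well-known way to decide
whether a signed multiplication would overflow, relying only on what the C
standard guarantees, is to compare against quotients of the type's limits
before multiplying. The sketch below shows that generic division-based check
for long long. It is not the code referred to in this post, and the name
checked_mul_ll is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a*b in *out and returns true when the product is representable;
   returns false and leaves *out untouched when it would overflow.
   Every division below has a nonzero divisor and never divides
   LLONG_MIN by -1, so the checks themselves are well defined. */
static bool checked_mul_ll(long long a, long long b, long long *out)
{
    assert(out != NULL);
    if (a > 0) {
        if (b > 0) {
            if (a > LLONG_MAX / b) return false;   /* product > LLONG_MAX */
        } else {
            if (b < LLONG_MIN / a) return false;   /* product < LLONG_MIN */
        }
    } else if (a < 0) {
        if (b > 0) {
            if (a < LLONG_MIN / b) return false;   /* product < LLONG_MIN */
        } else if (b < 0) {
            if (b < LLONG_MAX / a) return false;   /* product > LLONG_MAX */
        }
    }
    *out = a * b;   /* reached only when the product fits */
    return true;
}
```

The cost is up to one division per call, which is the price of not assuming
a wider type or any particular overflow behavior.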

Re: Cray style vectors

<8634surf0r.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=37969&group=comp.arch#37969
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Tue, 12 Mar 2024 20:23:32 -0700
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <8634surf0r.fsf@linuxsc.com>
References: <upq0cr$6b5m$1@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me> <2024Mar12.232336@mips.complang.tuwien.ac.at> <md5IN.622$_a1e.582@fx16.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="79dd1b0d32cfdf47da0b884151472315";
logging-data="801885"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180qn3bkyj6hZ6qebLGmX8FFp+VZXOzFyA="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:mUQjUHR+NJ4PxMJ0ztGU/WuKpRg=
sha1:Q6CYlKez9vs9NMCULwXZTebYy88=
 by: Tim Rentsch - Wed, 13 Mar 2024 03:23 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:

> Anton Ertl wrote:
>
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>
>>> Tim Rentsch wrote:
>>>
>>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>>
>>>>> If I really had to write a 64x64->128 MUL, with no widening MUL or
>>>>> MULH which returns the high half, then I would punt and do it using
>>>>> 32-bit parts (all variables are u64): [...]
>>>>
>>>> I wrote some code along the same lines. A difference is you
>>>> are considering unsigned multiplication, and I am considering
>>>> signed multiplication.
>>>
>>> Signed mul is just a special case of unsigned mul, right?
>>>
>>> I.e. in case of a signed widening mul, you'd first extract the
>>> signs, convert the inputs to unsigned, then do the unsigned
>>> widening mul, before finally restoring the sign as the XOR of the
>>> input signs?
>>
>> In Gforth we use:
>>
>> DCell mmul (Cell a, Cell b) /* signed multiply, mixed precision */
>> {
>> DCell res;
>>
>> res = UD2D(ummul (a, b));
>> if (a < 0)
>> res.hi -= b;
>> if (b < 0)
>> res.hi -= a;
>> return res;
>> }
>>
>> I have this technique from Andrew Haley. It relies on two's-complement
>> representation.
>
> Yeah, that's what Alpha does with UMULH.
> I'm still trying to figure out why it works.

It works because a sign bit works like a value bit
with a weight of -2**(N-1), where N is the width of
the memory holding the signed value. So instead
of subtracting 2**(N-1) * b, assuming a is negative,
we have instead added 2**(N-1) * b, so we need to
subtract 2 * 2**(N-1) * b, or 2**N * b, which means
subtracting b from the high order word of the result.
And of course similarly for when b is negative.

(Note that the above holds for two's complement, but
not for ones' complement or signed magnitude.)
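
The argument above can be written out and then checked exhaustively at a
small width. Writing ua and ub for the unsigned readings of the bit
patterns, a negative a satisfies a = ua - 2**N, so

  a * b = ua * ub - 2**N * (ub if a < 0) - 2**N * (ua if b < 0)  (mod 2**(2N))

because the remaining 2**(2N) term vanishes modulo 2**(2N). The sketch below
checks this for N = 8 over all 65536 input pairs. The function names are
hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 8x8->16 multiply built from the unsigned product of the bit
   patterns plus the high-byte correction; returns the 16-bit pattern. */
static uint16_t smul8_via_unsigned(int8_t a, int8_t b)
{
    uint16_t prod = (uint16_t)((unsigned)(uint8_t)a * (unsigned)(uint8_t)b);
    uint8_t hi = (uint8_t)(prod >> 8);
    uint8_t lo = (uint8_t)(prod & 0xFFu);

    if (a < 0)
        hi = (uint8_t)(hi - (uint8_t)b);   /* subtract 2^8 * b, modulo 2^16 */
    if (b < 0)
        hi = (uint8_t)(hi - (uint8_t)a);   /* subtract 2^8 * a, modulo 2^16 */

    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Compare against the directly computed product for every input pair. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            uint16_t expect = (uint16_t)(a * b);   /* reduced modulo 2^16 */
            if (smul8_via_unsigned((int8_t)a, (int8_t)b) != expect)
                bad++;
        }
    }
    return bad;
}

/* The identity holds for all inputs, so no pair should mismatch. */
static void check_exhaustive(void)
{
    assert(count_mismatches() == 0);
}
```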

Re: What integer C type to use

<usrgcs$qea8$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37972&group=comp.arch#37972
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Wed, 13 Mar 2024 06:17:01 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <usrgcs$qea8$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me> <uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at>
<ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com>
Injection-Date: Wed, 13 Mar 2024 06:17:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="650ffe840b646e1f9e8b5eb2bbdb0a5d";
logging-data="866632"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19iePhz727zEMkxL4eDuZuNkgQywXPF+y4="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:aB/SnvXufiCMiBdFM6PUOGVwH6o=
 by: Thomas Koenig - Wed, 13 Mar 2024 06:17 UTC

Michael S <already5chosen@yahoo.com> schrieb:

> Even for Cray/NEC-style vectors, the same throughput for different
> precision is not a universal property. Cray's and NEC's vector
> processors happen to be designed like that, but one can easily imagine
> vector processors of similar style that have 2 or even 3 times higher
> throughput for SP vs DP.
> I personally never encountered such machines, but would be surprised if
> one was never built and sold by one or another of the usual suspects
> (maybe Fujitsu?) back in the days when designers liked Cray's style.

I worked on such a machine, and (IIRC) single precision was faster
on that machine. That may have been due to the comparatively
low memory throughput of the single load/store pipeline that it had.

But it's been a few decades, and my memory may be off (and I don't
have any handbooks from the time).

Re: What integer C type to use

<20240313110859.000024e9@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37973&group=comp.arch#37973
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Wed, 13 Mar 2024 11:08:59 +0200
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <20240313110859.000024e9@yahoo.com>
References: <upq0cr$6b5m$1@dont-email.me>
<uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me>
<uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at>
<uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me>
<ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at>
<ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me>
<20240312144428.000063f5@yahoo.com>
<19da68f1b874758d42b64203741c325b@www.novabbs.org>
<20240312194918.00002cde@yahoo.com>
<db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="e06ec37471ddbf3b56cad511bd7ef1c1";
logging-data="919684"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Sqh1M5WlgE+PZJT5ATwOt8gI+9CyrwUQ="
Cancel-Lock: sha1:F+3z6dfUO8FaM23LeXAKDAeTLHM=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Wed, 13 Mar 2024 09:08 UTC

On Tue, 12 Mar 2024 19:00:46 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:

> Michael S wrote:
>
> > On Tue, 12 Mar 2024 17:18:36 +0000
> > mitchalsup@aol.com (MitchAlsup1) wrote:
>
> >>
> >> While theoretically possible, they did not do this because both
> >> halves of a 2×SP would not arrive from memory necessarily
> >> simultaneously. {Consider a gather load you need a vector of
> >> addresses 2× as long for pairs of SP going into a single vector
> >> register element.}
>
> > Doctor, it hurts when I do this!
> > So, what prevents you from providing no gather with resolution
> > below 64 bits?
>
> Well, then, you have SP values in a container that could hold 2 and
> you don't get any SIMD speedup.
>

(1) - a need for full gather is hopefully rare. The majority of the time things
are accessed contiguously or, at worst, with non-unit strides.
My personal rule of thumb is "if I need generic gather, most likely I
shouldn't have bothered with vectorizing." Of course, like every
rule of thumb, it's imprecise.
(2) - there are several important applications that naturally have a
pair-wise data layout. Complex numbers are just one.

> >> > Which, of course, leaves the question of what property makes
> >> > vector processor Cray-style. Just having ALU/FPU several times
> >> > narrower than VR is, IMHO, not enough to be considered
> >> > Cray-style.
> >>
> >> That property is that the length of the vector register is chosen
> >> to absorb the latency to memory. SIMD is too short to have this
> >> property.
>
> > I don't like this definition at all.
> > For starters, what is "memory"? Does L1D cache count, or only L2 and
> > higher?
>
> Those machines had no L1 or L2 (or LLC) caches. Consider the problems
> for which they were designed--arrays as big as the memory (sometimes
> bigger !!) and processed over and over again with numerical
> algorithms. Caches would simply miss on each memory reference
> (ignoring TLB effects) With the caches never supplying data to the
> calculations why have them at all ??
>
> > Then, what is "absorb" ?
>
> Absorb means that the first data of a vector arrives and can start
> calculation before the last address of the memory reference goes out.
> This, in turn, means that one can create a continuous stream of
> outbound addresses forever and thus one can create a stream of
> continuous calculations forever. {{Where 'forever' means thousands
> of cycles but nowhere near the lifetime of the universe.}}
>
> Now, obviously, this means the memory system has to be able to make
> forward progress on all those memory accesses continuously.
>
> > Is the whole VR register file part of
> > absorbent or latency should be covered by one register?
>
> A single register covers a single memory reference latency.
>
> > Is OoO
> > machinery part of absorbent?
>
> The only OoO in the CRAYs was delivery of gather data back to the
> vector register*. Scatter stores were sent out in order, as were the
> addresses of the gather loads.
>
> (*) bank conflicts would delay conflicting accesses but not those
> of other banks, creating an OoO effect of returning data. This was
> re-ordered back in order (IO) prior to forwarding data into calculation.
>
> > Is HW threading part of absorbent?
>
> Absolutely not--none of the CRAYs did this--later XMPs and YMPs did
> use lanes (SIMD with vector) but always did calculations in order
> and always sent out addresses (and data when appropriate) in order.
>
> > And for
> > any of your possible answers I have my "Why?".
>
> No harm in asking.
>

It seems we are talking about different things.
You are talking about Cray vectors, as done in Cray's 1/X-MP/Y-MP
series, i.e. something fixed, known and of interest mostly to the
computing historians among us.
OTOH, I am trying to discuss a vague notion of "Cray-style vectors". My
intention is to see what was applicable in more recent times and
which ideas are not totally obsolete for the future.
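
The "absorb" criterion quoted above can be stated as a one-line check. The
sketch below assumes a deliberately simplified model that is not taken from
any real machine: one address is issued per cycle starting at cycle 0, and
each element returns a fixed number of cycles after its address went out.
The function name and the cycle counts used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True when the first element of a vector load returns before the last
   address of that load has been issued, i.e. the vector register is long
   enough to cover the memory latency in this simplified model. */
static bool absorbs_latency(unsigned vector_length, unsigned latency_cycles)
{
    assert(vector_length > 0);
    unsigned last_address_cycle = vector_length - 1;   /* address i goes out on cycle i */
    return latency_cycles < last_address_cycle;        /* first data is back by then */
}
```

With made-up numbers, a 64-element register covers a 20-cycle latency in
this model, while an 8-element register does not.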

Re: What integer C type to use

<b95e1b5c526cf9a13a42048297c1b7ec@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37978&group=comp.arch#37978
Date: Wed, 13 Mar 2024 15:45:00 +0000
Subject: Re: What integer C type to use
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$u07BDHjaBigaQG2uXOeWE.L51vr4HX0WbvRTpmwN1PhrQRf2H/B2u
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <upq0cr$6b5m$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me> <dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org> <2024Feb20.083240@mips.complang.tuwien.ac.at> <2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me> <2024Feb20.184737@mips.complang.tuwien.ac.at> <uraof0$kij0$1@newsreader4.netcologne.de> <3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org> <urfcgs$1rne2$1@dont-email.me> <dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org> <usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com> <19da68f1b874758d42b64203741c325b@www.novabbs.org> <20240312194918.00002cde@yahoo.com> <db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org> <20240313110859.000024e9@yahoo.com>
Organization: Rocksolid Light
Message-ID: <b95e1b5c526cf9a13a42048297c1b7ec@www.novabbs.org>
 by: MitchAlsup1 - Wed, 13 Mar 2024 15:45 UTC

Michael S wrote:

> On Tue, 12 Mar 2024 19:00:46 +0000
> mitchalsup@aol.com (MitchAlsup1) wrote:

>> Michael S wrote:
>>
>> > On Tue, 12 Mar 2024 17:18:36 +0000
>> > mitchalsup@aol.com (MitchAlsup1) wrote:
>>
>> >>
>> >> While theoretically possible, they did not do this because both
>> >> halves of a 2×SP would not arrive from memory necessarily
>> >> simultaneously. {Consider a gather load you need a vector of
>> >> addresses 2× as long for pairs of SP going into a single vector
>> >> register element.}
>>
>> > Doctor, it hurts when I do this!
>> > So, what prevents you from providing no gather with resolution
>> > below 64 bits?
>>
>> Well, then, you have SP values in a container that could hold 2 and
>> you don't get any SIMD speedup.
>>

> (1) - a need for full gather is hopefully rare. The majority of the time things
> are accessed contiguously or, at worst, with non-unit strides.
> My personal rule of thumb is "if I need generic gather, most likely I
> shouldn't have bothered with vectorizing." Of course, like every
> rule of thumb, it's imprecise.

CRAY-1 XMP gained considerable speedup (about 20%) on its benchmarks
of the day after adding Scatter/Gather and 5 more Livermore Loops
would vectorize.

> (2) - there are several important applications that naturally have
> pair-wise data layout. Complex numbers are just one.

What about Quaternions--which relieve the programmer of having to
remember which multiplications are subtracted instead of added?
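
To make the pair-wise layout and the subtract-versus-add point concrete,
here is a minimal sketch of an element-wise complex multiply over
single-precision values stored as interleaved (re, im) pairs. The function
name cmul_interleaved and the array layout are assumptions made for this
illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* z[i] = x[i] * y[i] for n complex numbers, each stored as two adjacent
   floats: real part at index 2*i, imaginary part at index 2*i + 1. */
static void cmul_interleaved(const float *x, const float *y, float *z, size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < n; i++) {
        float a = x[2 * i], b = x[2 * i + 1];   /* x[i] = a + b*i */
        float c = y[2 * i], d = y[2 * i + 1];   /* y[i] = c + d*i */
        z[2 * i]     = a * c - b * d;           /* real part: products subtracted */
        z[2 * i + 1] = a * d + b * c;           /* imaginary part: products added */
    }
}
```

Each 64-bit container then holds one complete complex value, which is the
kind of pair-wise case where a gather at 64-bit resolution loses nothing.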

>> >> > Which, of course, leaves the question of what property makes
>> >> > vector processor Cray-style. Just having ALU/FPU several times
>> >> > narrower than VR is, IMHO, not enough to be considered
>> >> > Cray-style.
>> >>
>> >> That property is that the length of the vector register is chosen
>> >> to absorb the latency to memory. SIMD is too short to have this
>> >> property.
>>
>> > I don't like this definition at all.
>> > For starter, what is "memory"? Does L1D cache count, or only L2 and
>> > higher?
>>
>> Those machines had no L1 or L2 (or LLC) caches. Consider the problems
>> for which they were designed--arrays as big as the memory (sometimes
>> bigger !!) and processed over and over again with numerical
>> algorithms. Caches would simply miss on each memory reference
>> (ignoring TLB effects) With the caches never supplying data to the
>> calculations why have them at all ??
>>
>> > Then, what is "absorb" ?
>>
>> Absorb means that the first data of a vector arrives and can start
>> calculation before the last address of the memory reference goes out.
>> This, in turn, means that one can create a continuous stream of
>> outbound addresses forever and thus one can create a stream of
>> continuous calculations forever. {{Where 'forever' means thousands
>> of cycles but nowhere near the lifetime of the universe.}}
>>
>> Now, obviously, this means the memory system has to be able to make
>> forward progress on all those memory accesses continuously.
>>
>> > Is the whole VR register file part of
>> > absorbent or latency should be covered by one register?
>>
>> A single register covers a single memory reference latency.
>>
>> > Is OoO
>> > machinery part of absorbent?
>>
>> The only OoO in the CRAYs was delivery of gather data back to the
>> vector register*. Scatter stores were sent out in order, as were the
>> addresses of the gather loads.
>>
>> (*) bank conflicts would delay conflicting accesses but not those
>> of other banks, creating an OoO effect of returning data. This was
>> re-ordered back to IO prior to forwarding data into calculation.
>>
>> > Is HW threading part of absorbent?
>>
>> Absolutely not--none of the CRAYs did this--later XMPs and YMPs did
>> use lanes (SIMD with vector) but always did calculations in order
>> and always sent out addresses (and data when appropriate) in order.
>>
>> > And for
>> > any of your possible answers I have my "Why?".
>>
>> No harm in asking.
>>

> It seems, we are talking about different things.
> You are talking about Cray vectors, as done in Cray's 1/X-MP/Y-MP
> series. I.e. something fixed, known and of interest mostly for
> computing historians among us.

Fair enough, but it remains my model for how to discuss vector calculations.

> OTH, I am trying to discuss a vague notion of "Cray-style vectors". My
> intentions are to see what was applicable in more recent times and
> which ideas are not totally obsolete for a future.

Only after you figure out a way to feed 2-LDs and consume 1 ST per-cycle
continuously (cache miss or cache hit; TLB miss or TLB hit) are you in a
position to utilize CRAY-like vector-register architecture effectively.

Re: Cray style vectors

<2024Mar13.180937@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37982&group=comp.arch#37982

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Wed, 13 Mar 2024 17:09:37 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 42
Message-ID: <2024Mar13.180937@mips.complang.tuwien.ac.at>
References: <upq0cr$6b5m$1@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me> <2024Mar12.232336@mips.complang.tuwien.ac.at> <md5IN.622$_a1e.582@fx16.iad>
Injection-Info: dont-email.me; posting-host="648717f0057f198d3829d722151bbbbd";
logging-data="1125857"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+m2ks20S7HIJcoDw7D49Ka"
Cancel-Lock: sha1:BE2tUXo11our53A2qYQhaunJOuM=
X-newsreader: xrn 10.11
 by: Anton Ertl - Wed, 13 Mar 2024 17:09 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Anton Ertl wrote:
>> DCell mmul (Cell a, Cell b) /* signed multiply, mixed precision */
>> {
>> DCell res;
>>
>> res = UD2D(ummul (a, b));
>> if (a < 0)
>> res.hi -= b;
>> if (b < 0)
>> res.hi -= a;
>> return res;
>> }
>>
>> I have this technique from Andrew Haley. It relies on twos-complement
>> representation.
>>
>> - anton
>
>Yeah, that's what Alpha does with UMULH.
>I'm still trying to figure out why it works.

Let's consider the case where a>=0 and b<0, and cells are 64 bits. ua
is a interpreted as unsigned cell, and ub is b interpreted as unsigned
cell. The following computations are in Z (the unlimited
integers). For the case under consideration:

ua=a
ub=b+2^64

res = ua*ub = a*(b+2^64)= a*b + a*2^64

So,

a*b = res - a*2^64

The other cases are similar.
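
The same argument can be checked mechanically on a narrower width. The following is only a minimal sketch, assuming 32-bit cells and a 64-bit double cell instead of the 64/128-bit case above; the name smul_from_umul32 is made up for illustration and is not the Gforth code.

```c
#include <stdint.h>

/* Hypothetical 32-bit analogue of the mmul technique: build a signed
   32x32->64 product from an unsigned one. All arithmetic on res is
   modulo 2^64, which matches "ua*ub" in Z reduced to the double cell. */
static int64_t smul_from_umul32(int32_t a, int32_t b)
{
    /* Unsigned widening multiply of the raw bit patterns: res = ua*ub. */
    uint64_t res = (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;

    /* a < 0 means ua = a + 2^32, so res contains an extra b*2^32. */
    if (a < 0)
        res -= (uint64_t)(uint32_t)b << 32;
    /* b < 0 means ub = b + 2^32, so res contains an extra a*2^32. */
    if (b < 0)
        res -= (uint64_t)(uint32_t)a << 32;

    /* Reinterpret as signed; assumes a twos-complement implementation. */
    return (int64_t)res;
}
```

Comparing the result with a plain (int64_t)a * b over a handful of sign combinations is a quick way to convince oneself that all four cases work.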

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cray style vectors

<9ee27c60a9a2a6eb0f770c1a26b06592@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37986&group=comp.arch#37986

Date: Wed, 13 Mar 2024 18:58:09 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$UCBNdZnLeqvSrDw2OtTaueQCc/0xxJu3vyzc6T.vaRSpX9oiTA/AS
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <upq0cr$6b5m$1@dont-email.me> <uqge2p$279ql$1@dont-email.me> <uqhiqb$2grub$1@dont-email.me> <uqlm2c$3e9bp$1@dont-email.me> <uqmn7c$3n35k$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at> <uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <ur27m7$2gp6e$1@dont-email.me> <ur2oa0$fecv$3@newsreader4.netcologne.de> <usngtr$3ocas$1@dont-email.me> <20240311201015.00006482@yahoo.com> <usnhv7$3olpl$1@dont-email.me> <20240311203844.000071ad@yahoo.com> <usnttn$3re5t$1@dont-email.me> <6ade34bde971a395c0c20bf072ac399b@www.novabbs.org> <usouns$5e0s$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <9ee27c60a9a2a6eb0f770c1a26b06592@www.novabbs.org>
 by: MitchAlsup1 - Wed, 13 Mar 2024 18:58 UTC

Thomas Koenig wrote:

> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>> Thomas Koenig wrote:

>>> However, what I did put in the paper (and what the subsequent
>>> revision by a J3 subcommittee left in) is a prohibition against
>>> using unsigneds in a DO loop. The reason is semantics of
>>> negative strides.
>>
>>> Currently, in Fortran, the number of iterations of the loop
>>
>>> do i=m1,m2,m3
>>> ....
>>> end do
>>
>>> is (m2-m1+m3)/m3 unless that value is negative, in which case it
>>> is zero (m3 defaults to 1 if it is not present).
>>
>>> So,
>>
>>> do i=1,3,-1
>>
>>> will be executed zero times, as will
>>
>>> do i=3,1
>>
>>> Translating that into arithmetic with unsigned integers makes
>>> little sense, how many times should
>>
>>> do i=1,3,4294967295
>>
>>> be executed?
>>
>> 3-1+4294967295 = 4294967297 // (m2-m1+m3)
>>
>> 4294967297 / 4294967295 = 1.0000000004656612874161594750863
>>
>> So the loop should be executed one time. {{And yes I know 4294967297 ==
>> 0x1,0000,0001}} What would you expect on a 36-bit machine (2s-complement)
>> where 4294967295 is representable naturally ??

> Correct (of course).

It seems to me that the problem is not using unsigned integers as
DO LOOP indexes, the problem is there is no compiler error message
from "4294967295 does not fit in integer container". Bringing the
problem to the programmer. THEN everybody is free (under the above)
to implement unsigned DO LOOPs.

> The same result would be expected for

> do i=1u,3u,-1u

> (assuming an u suffix for unsigned numbers).

> The problem is that this violates a Fortran basic assumption since
> FORTRAN 77, which is that DO loops can be zero-trip.

(m2-m1+m3)/m3

(3-1+(-1))/-1 = -1 and the loop should not be taken at all.
BUT
-1u = 4294967295
Therefore:
(3u-1u+(-1u))/-1u =
(3-1+4294967295)/4294967295 = 1.0000000004656612874161594750863 again.

Once again this is a problem only when a constant integer value
cannot be precisely represented--and deserves a warning/error
message instead of a complete ban.
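
For anyone who wants to replay the arithmetic, here is a small sketch of the (m2-m1+m3)/m3 rule in C. Doing the calculation in a 64-bit signed type is an assumption made here purely to mirror the unbounded arithmetic used in this post; it is not a claim about how any Fortran compiler would treat unsigned loop bounds, and do_trip_count is an illustrative name.

```c
#include <stdint.h>

/* Fortran-style DO trip count: MAX((m2 - m1 + m3) / m3, 0).
   Evaluated in int64_t so that 4294967295 is representable without
   wrapping. m3 must be nonzero. C integer division truncates toward
   zero, which is what the formula needs. */
static int64_t do_trip_count(int64_t m1, int64_t m2, int64_t m3)
{
    int64_t n = (m2 - m1 + m3) / m3;
    return n > 0 ? n : 0;   /* a negative count means a zero-trip loop */
}
```

With these definitions do i=1,3,-1 and do i=3,1 both give zero trips, while a stride of 4294967295 gives one trip, as computed above.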

> This is a can of worms that I would like to leave unopened.

I understand why, I just think the fickle finger of fate should
point at the constant rather than the type.

> Same goes for array slices. Even assuming that no negative
> indices are used, the slice a(1:3:-1) is zero-sized in Fortran,
> as is a(3:1) .

> For a(1u:3u:-1u) the same logic that you outlined above would apply,
> making it a slice with one element.

> Not going there :-)

Re: What integer C type to use

<jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org>

https://news.novabbs.org/devel/article-flat.php?id=37987&group=comp.arch#37987

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: monnier@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Wed, 13 Mar 2024 15:02:12 -0400
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org>
References: <upq0cr$6b5m$1@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at>
<uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me>
<ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at>
<ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com>
<19da68f1b874758d42b64203741c325b@www.novabbs.org>
<20240312194918.00002cde@yahoo.com>
<db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
<20240313110859.000024e9@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="62ac73b5c59ed7b948a14cb769144932";
logging-data="1167236"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+QFez75KAXiugyiMH6AhNKcqCuaaFs/HU="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:/b8uoLtzfYdmbFhKHkSx2W1Wtog=
sha1:mi3qajoRyhambXHTSAXAtLWEOto=
 by: Stefan Monnier - Wed, 13 Mar 2024 19:02 UTC

> OTH, I am trying to discuss a vague notion of "Cray-style vectors". My
> intentions are to see what was applicable in more recent times and
> which ideas are not totally obsolete for a future.

Another way to look at the difference between SSE-style vectors (which
I'd call "short vectors") and Cray-style vectors at the ISA level is
the fact that SSE-style vector instructions are designed under the
assumption that the latency
of a vector instruction will be more or less the same as that of
a non-vector instruction (i.e. you have enough ALUs to do all the
operations at the same time), whereas Cray-style vector instructions
(which we could call "long vectors") are designed under the assumption
that the latency will be somewhat proportional to the length of the
vector because the core of the CPU will only access a chunk of the
vector at a time.

So, short vectors have a fairly free hand at shuffling data across their
vector (e.g. bitmatrix transpose), and they can be
implemented/scheduled/dispatched just like any other instruction, but
the vector length tends to be severely limited and exposed all over
the place.

In contrast long vectors usually depend on specialized implementations
(e.g. chaining) to get good performance, but their length is
easier/cheaper to change.

AFAICT long vectors made sense when we could build machines with
a memory bandwidth that was higher and ALUs were more expensive.
Nowadays we tend to have the opposite.

Also, the massive number of transistors we spend nowadays on OoO means
that a good OoO CPU can dispatch individual non-vector instructions to
ALUs just as well as the Cray did with its vectors with chaining.

Stefan

Re: What integer C type to use

<3186c85221c1baff1a23a46449dadd19@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37989&group=comp.arch#37989

Date: Wed, 13 Mar 2024 19:24:15 +0000
Subject: Re: What integer C type to use
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$FvCCUnK.JMeOgy2/y7NfW.0vmWk7sZW7/GFIDW4C0g1Zh15rkoDs.
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <upq0cr$6b5m$1@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me> <dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org> <2024Feb20.083240@mips.complang.tuwien.ac.at> <2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me> <2024Feb20.184737@mips.complang.tuwien.ac.at> <uraof0$kij0$1@newsreader4.netcologne.de> <3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org> <urfcgs$1rne2$1@dont-email.me> <dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org> <usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com> <19da68f1b874758d42b64203741c325b@www.novabbs.org> <20240312194918.00002cde@yahoo.com> <db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org> <20240313110859.000024e9@yahoo.com> <jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org>
Organization: Rocksolid Light
Message-ID: <3186c85221c1baff1a23a46449dadd19@www.novabbs.org>
 by: MitchAlsup1 - Wed, 13 Mar 2024 19:24 UTC

Stefan Monnier wrote:

>> OTH, I am trying to discuss a vague notion of "Cray-style vectors". My
>> intentions are to see what was applicable in more recent times and
>> which ideas are not totally obsolete for a future.

> Another way to look at the difference between SSE-style vectors (which
> I'd call "short vectors") at the ISA level is the fact that SSE-style
> vector instructions are designed under the assumption that the latency
> of a vector instruction will be more or less the same as that of
> a non-vector instruction (i.e. you have enough ALUs to do all the
> operations at the same time),

In the early-mid 1980s there was a class of processor assist engines
using the term Array-Processor that performed a Cray-like vector in
the same latency as a scalar operation. CRAY streamed single-operands
through FUs, Array processors took entire <but shorter> vectors through
lanes of calculations.

I would call these "medium vector" to distinguish from (short vector)
SIMD and (long vector) CRAY {or just vector without qualifier}. So we
have::
SIMD <short> vector
ARRAY <medium> vector
CRAY <long> vector
CDC <memory> vector

> whereas Cray-style vector instructions
> (which we could call "long vectors") are designed under the assumption
> that the latency will be somewhat proportional to the length of the
> vector because the core of the CPU will only access a chunk of the
> vector at a time.

> So, short vectors have a fairly free hand at shuffling data across their
> vector (e.g. bitmatrix transpose), and they can be
> implemented/scheduled/dispatched just like any other instruction, but
> the vector length tends to be severely limited and exposed all over
> the place.

Consuming OpCode space like nobody's business.

> In contrast long vectors usually depend on specialized implementations
> (e.g. chaining) to get good performance, but their length is
> easier/cheaper to change.

The only limitation is when one masks out beats of the vector from
calculation of memory referencing. This is what kept CRAY at 64-element
vectors. It was also the Achilles heel of CRAY--once memory gets more
than 64 beats away, the length of the vector can no longer absorb the
latency to memory. NEC did not have this problem.

> AFAICT long vectors made sense when we could build machines with
> a memory bandwidth that was higher and ALUs were more expensive.
> Nowadays we tend to have the opposite.

While BW is important (very) it is latency that is crucial. Latency
to memory must be smaller than vector length.
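
As a rough illustration of that condition, here is a toy model in C. It assumes one address goes out per cycle and that each datum returns a fixed number of cycles after its address was issued; both assumptions and both function names are invented for this sketch and do not describe any real CRAY or NEC memory system.

```c
/* Toy model: address i is issued in cycle i, its data arrives in
   cycle i + latency. */

/* Nonzero when the first datum is back before the address stream of a
   single vector register has run out, i.e. latency < vector length. */
static int absorbs_latency(int vector_length, int latency)
{
    return latency < vector_length;
}

/* Cycles in which neither an address is issued nor data has started
   arriving: the stream is done at cycle vector_length, the first datum
   shows up at cycle latency. */
static int idle_cycles(int vector_length, int latency)
{
    return latency > vector_length ? latency - vector_length : 0;
}
```

In this model a 64-element register covers a 63-cycle memory, while a 100-cycle memory leaves a 36-cycle hole after every vector of addresses.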

> Also, the massive number of transistors we spend nowadays on OoO means
> that a good OoO CPU can dispatch individual non-vector instructions to
> ALUs just as well as the Cray did with its vectors with chaining.

Not "just as well" but "within spitting distance of"

> Stefan

Re: What integer C type to use

<20240314125833.00001ca3@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=38011&group=comp.arch#38011

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Thu, 14 Mar 2024 12:58:33 +0200
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <20240314125833.00001ca3@yahoo.com>
References: <upq0cr$6b5m$1@dont-email.me>
<uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at>
<uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me>
<ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at>
<ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me>
<20240312144428.000063f5@yahoo.com>
<19da68f1b874758d42b64203741c325b@www.novabbs.org>
<20240312194918.00002cde@yahoo.com>
<db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
<20240313110859.000024e9@yahoo.com>
<jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="5c5cd9e25eab93927139eb22892a80cb";
logging-data="1592818"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+sC+mo61Oc029PkuQK24XMfTwPoMjQnLM="
Cancel-Lock: sha1:Y1hoxRqFGBa1hBDZumrDXvCL71E=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Thu, 14 Mar 2024 10:58 UTC

On Wed, 13 Mar 2024 15:02:12 -0400
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > OTH, I am trying to discuss a vague notion of "Cray-style vectors".
> > My intentions are to see what was applicable in more recent times
> > and which ideas are not totally obsolete for a future.
>
> Another way to look at the difference between SSE-style vectors (which
> I'd call "short vectors") at the ISA level is the fact that SSE-style
> vector instructions are designed under the assumption that the latency
> of a vector instruction will be more or less the same as that of
> a non-vector instruction (i.e. you have enough ALUs to do all the
> operations at the same time), whereas Cray-style vector instructions
> (which we could call "long vectors") are designed under the assumption
> that the latency will be somewhat proportional to the length of the
> vector because the core of the CPU will only access a chunk of the
> vector at a time.
>

I sympathize with your direction, but want to point out that
historically in the x86 world, implementations of one SIMD instruction
set or another that had twice the throughput for scalar FP ops vs
full-width FP ops were and are very common. As to latency, many of
these implementations had (and maybe still have?) one cycle longer
latency for scalar vs full-width.

Similar throughput differences can be seen on a few aarch64 processors.
E.g. on Cortex-A75, where the Q-form of common FP arithmetic instructions
has half the throughput of the scalar and D-forms, but latency is not
affected.
I'd guess that the same applies to LITTLE Arm cores, but can't be sure,
because Cortex-A53 Software Optimization Guide does not exist and
Cortex-A55 Software Optimization Guide is not very comprehensible.

> So, short vectors have a fairly free hand at shuffling data across
> their vector (e.g. bitmatrix transpose), and they can be
> implemented/scheduled/dispatched just like any other instruction, but
> the vector length tends to be severely limited and exposed all over
> the place.
>
> In contrast long vectors usually depend on specialized implementations
> (e.g. chaining) to get good performance, but their length is
> easier/cheaper to change.
>
> AFAICT long vectors made sense when we could build machines with
> a memory bandwidth that was higher and ALUs were more expensive.
> Nowadays we tend to have the opposite.
>

16x DP FMA (dual-issue 512-bit), as in server-class Intel cores, is
still expensive, less so in area, more so in effect on max. power
consumption.
As to memory, I find this argument false, at least in its original form
often stated here by Mitch.
There are many important algorithms, both in dense linear algebra and
in digital signal processing, where it is easily possible to reduce
memory read to DP FMA ratio to 5-6 bytes/DP-FMA. With 32 software
visible vector registers and with more programmer's effort the ratio
could be pushed further down, sometimes below 4 bytes.
I know little about other computationally intensive fields (apart
from dense linear algebra and DSP, that is), but would expect that if
not 5-6 then at least 8-9 bytes/DP-FMA (or equivalents for lower
precisions) are not out of reach in other fields as well.
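
One way to see where ratios of that size come from is a back-of-the-envelope model of a register-blocked matrix multiply inner loop. This is a sketch under assumptions, not a measurement: it supposes an mr x nr tile of accumulators held in registers, with mr values of one operand and nr values of the other read from memory per step, 8 bytes each. The name bytes_per_fma and the tile shapes are illustrative only.

```c
/* Bytes read from memory per DP FMA for an mr x nr accumulator tile:
   each step loads (mr + nr) doubles and performs mr * nr FMAs. */
static double bytes_per_fma(int mr, int nr)
{
    double bytes_loaded = 8.0 * (double)(mr + nr);
    double fmas = (double)mr * (double)nr;
    return bytes_loaded / fmas;
}
```

Under this model a 3x3 tile lands between 5 and 6 bytes per FMA and a 4x4 tile at exactly 4, so larger tiles (which need more registers) push the ratio down.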

> Also, the massive number of transistors we spend nowadays on OoO means
> that a good OoO CPU can dispatch individual non-vector instructions to
> ALUs just as well as the Cray did with its vectors with chaining.
>
>
> Stefan

Re: What integer C type to use

<ut0gsm$23m79$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38051&group=comp.arch#38051

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Thu, 14 Mar 2024 23:56:04 -0400
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <ut0gsm$23m79$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com>
<19da68f1b874758d42b64203741c325b@www.novabbs.org>
<20240312194918.00002cde@yahoo.com>
<db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
<20240313110859.000024e9@yahoo.com>
<jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org>
<3186c85221c1baff1a23a46449dadd19@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Mar 2024 03:56:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f106a7ce8c34302680a5ed4840a9b1d6";
logging-data="2218217"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/DfVx2rnx6nadVLFzJx5ttgjRnxgco1pg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:cOT5WzX3iEZb5Nj9aKz3xFlyAEs=
In-Reply-To: <3186c85221c1baff1a23a46449dadd19@www.novabbs.org>
 by: Paul A. Clayton - Fri, 15 Mar 2024 03:56 UTC

On 3/13/24 3:24 PM, MitchAlsup1 wrote:
> Stefan Monnier wrote:
[snip]
>> So, short vectors have a fairly free hand at shuffling data across their
>> vector (e.g. bitmatrix transpose), and they can be
>> implemented/scheduled/dispatched just like any other instruction, but
>> the vector length tends to be severely limited and exposed all over
>> the place.
>
> Consuming OpCode space like nobody's business.

Is that necessarily the case? Excluding the shuffle operations, I
think only loads and stores would need to have length specifiers.
Shuffle operations become much more expensive with larger
'vectors', so providing the same granularity of shuffle for larger
vectors seems questionable. (With scatter/gather, permute/shuffle
may be less useful anyway.)

The metadata would not even _have_ to be saved, though such would
be better than unnecessarily saving/restoring huge contexts and if
one can support variable-sized contexts one can support additional
metadata.

If loads and stores were masked, the number of instruction
"encodings" would not need to be increased, but using zero-
extended masks to indicate smaller vector size seems less than
ideal.

Lane-based operations with different length operands would
presumably narrow the result (the later elements of the longer
operand would not be used) and allow the possible error to be
detected.

Unlike My 66000's VVM, different-sized elements would require
unpack/pack operations (where VVM needs to implicitly pack an
operand if it wanted to take advantage of reduced storage). This
would not increase instruction encodings as vector length is
increased. (VVM also provides sequential exceptions while SIMD is
conceptually all at once.)

Am I missing something when I assume that lane-based (SIMD)
operations do not need size information in the instruction? The
extra metadata is not free (perhaps especially as that controls
execution at least for efficiency), but if opcode explosion is so
undesirable using metadata might be preferred.

Re: What integer C type to use

<ut13lm$26ugq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38052&group=comp.arch#38052

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Fri, 15 Mar 2024 10:16:37 +0100
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <ut13lm$26ugq$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me> <uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com>
<19da68f1b874758d42b64203741c325b@www.novabbs.org>
<20240312194918.00002cde@yahoo.com>
<db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Mar 2024 09:16:38 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6f1799d9c53815839a977e52488ad30c";
logging-data="2325018"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+2BVs+H5LlZ+Mw8WBT8of+mO7ZsvglBUD43cNwGVSXMw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18.1
Cancel-Lock: sha1:wYS2OQOSLzqCcVM+0bihES68ymA=
In-Reply-To: <db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
 by: Terje Mathisen - Fri, 15 Mar 2024 09:16 UTC

MitchAlsup1 wrote:
> Michael S wrote:
>> Doctor, it hurts when I do this!
>> So, what prevents you from providing no gather with resolution
>> below 64 bits?
>
> Well, then, you have SP values in a container that could hold 2 and you
> don't get any SIMD speedup.

I've looked at this issue since Larrabee (which was the first cpu I had
access to which supported gather):

For a 32-bit gather in a 64-bit environment, the only good solution I've
been able to come up with is to use a 64-bit base register and then
require all the sources to be within 4GB from that base, so that you can
use the 32-bit wide gather addresses as indices/offsets from that base.

<dest_vector_reg> = gather(base_reg, src_vector_reg)

It would be up to the implementer to decide if the src_vector_reg is
used as signed or unsigned offsets from the base. It is also possible to
extend addressability by having the offsets be scaled by the element
size, so effectively

dest[i] = [base+src[i]*4]

for single precision values.
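
A scalar reference model of that addressing scheme might look like the following. This is a sketch only: it picks the unsigned-offset interpretation, assumes a 64-bit size_t, and gather_f32 is a hypothetical name rather than any real intrinsic or instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dest[i] = [base + index[i]*4], with 32-bit unsigned indices scaled by
   the element size, so one 64-bit base reaches 16 GB of floats. */
static void gather_f32(float *dest, const void *base,
                       const uint32_t *index, size_t n)
{
    const unsigned char *p = base;
    for (size_t i = 0; i < n; i++) {
        /* Widen the index before scaling so the multiply cannot wrap
           at 32 bits. */
        size_t offset = (size_t)index[i] * sizeof(float);
        memcpy(&dest[i], p + offset, sizeof(float));
    }
}
```

A signed variant would use int32_t indices and ptrdiff_t offsets instead, trading half the reach for the ability to address below the base.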

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Cray style vectors

<ut14l1$27450$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38053&group=comp.arch#38053

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Fri, 15 Mar 2024 10:33:21 +0100
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <ut14l1$27450$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de>
<86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de>
<861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me>
<864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me>
<2024Mar12.232336@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Mar 2024 09:33:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6f1799d9c53815839a977e52488ad30c";
logging-data="2330784"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Aapd2zzykDf4dhyg4CDpq1Jj+yksBmF2DNOQTFghV+A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18.1
Cancel-Lock: sha1:7FfZLJU8dVfS54CQmGx3wHH3L5A=
In-Reply-To: <2024Mar12.232336@mips.complang.tuwien.ac.at>
 by: Terje Mathisen - Fri, 15 Mar 2024 09:33 UTC

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> Tim Rentsch wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>
>>>> If I really had to write a 64x64->128 MUL, with no widening MUL or
>>>> MULH which returns the high half, then I would punt and do it using
>>>> 32-bit parts (all variables are u64): [...]
>>>
>>> I wrote some code along the same lines. A difference is you
>>> are considering unsigned multiplication, and I am considering
>>> signed multiplication.
>>>
>> Signed mul is just a special case of unsigned mul, right?
>>
>> I.e. in case of a signed widening mul, you'd first extract the signs,
>> convert the inputs to unsigned, then do the unsigned widening mul,
>> before finally restoring the sign as the XOR of the input signs?
>
> In Gforth we use:
>
> DCell mmul (Cell a, Cell b) /* signed multiply, mixed precision */
> {
> DCell res;
>
> res = UD2D(ummul (a, b));
> if (a < 0)
> res.hi -= b;
> if (b < 0)
> res.hi -= a;
> return res;
> }
>
> I have this technique from Andrew Haley. It relies on twos-complement
> representation.

Nice!

It subtracts the contribution that comes from having treated each sign
bit as part of the magnitude in the unsigned multiplication.
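
A short derivation of why that works, as a sketch in my own notation (the symbols are not from the thread): let a_u and b_u be the unsigned readings of the two 64-bit patterns and s_a, s_b in {0,1} their sign bits.

```latex
\begin{aligned}
a &= a_u - 2^{64} s_a, \qquad b = b_u - 2^{64} s_b \\
a\,b &= a_u b_u \;-\; 2^{64}\,(s_a b_u + s_b a_u) \;+\; 2^{128} s_a s_b \\
     &\equiv a_u b_u \;-\; 2^{64}\,(s_a b_u + s_b a_u) \pmod{2^{128}}
\end{aligned}
```

The last term vanishes modulo 2^128, so the signed product is the unsigned product with its high word reduced by b when a is negative and by a when b is negative. The subtraction is done modulo 2^64, so it makes no difference whether the signed or the unsigned reading of the operand is subtracted.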

Here you can probably schedule the fixup to happen in parallel with the
actual multiplication:

;; inputs in r9 & r10, result in rdx:rax, rbx & rcx as scratch

mov rax,r9 ;; All these can start in the first cycle
mul r10
mov rbx,r9 ;; The MOV can be handled by the renamer
sar r9,63
mov rcx,r10 ;; Ditto
sar r10,63

and rbx,r10 ;; Second set of ops: rbx = a if b < 0, else 0
and rcx,r9 ;; rcx = b if a < 0, else 0

add rbx,rcx ;; Third cycle

sub rdx,rbx ;; Do a single adjustment as soon as the MUL finishes
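
The same fixup can be sketched in portable C, as a rough illustration rather than anyone's production code. The names `u128_parts`, `umul_64x64` and `smul_64x64` are made up for this example. The unsigned multiply is built from 32-bit parts, as suggested earlier in the thread, and the sign correction uses masks instead of branches, ANDing each operand with the sign mask of the other one.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type; the names are illustrative only. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Unsigned 64x64->128 multiply using only 64-bit arithmetic on 32-bit halves. */
static u128_parts umul_64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: three values below 2^32, so the sum cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Signed 64x64->128 multiply: unsigned product plus a branch-free fixup.
   Assumes twos-complement representation, as the technique itself does. */
static u128_parts smul_64x64(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    u128_parts r = umul_64x64(ua, ub);

    /* All-ones when the operand is negative, zero otherwise. */
    uint64_t mask_a = (uint64_t)0 - (ua >> 63);
    uint64_t mask_b = (uint64_t)0 - (ub >> 63);

    /* Subtract b if a < 0 and a if b < 0, in one combined adjustment. */
    r.hi -= (ub & mask_a) + (ua & mask_b);
    return r;
}
```

The two mask computations and the two ANDs do not depend on the product, which is what lets them overlap with the multiply; only the final subtraction has to wait for the high word.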

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: What integer C type to use

<20240315140109.00006006@yahoo.com>


https://news.novabbs.org/devel/article-flat.php?id=38058&group=comp.arch#38058

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Fri, 15 Mar 2024 14:01:09 +0200
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <20240315140109.00006006@yahoo.com>
References: <upq0cr$6b5m$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at>
<uqqvkc$i2cu$1@dont-email.me>
<uqvk2o$1snbf$1@dont-email.me>
<ur0ka6$23ma8$1@dont-email.me>
<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
<2024Feb20.083240@mips.complang.tuwien.ac.at>
<2024Feb20.130029@mips.complang.tuwien.ac.at>
<ur2jpf$2j800$1@dont-email.me>
<2024Feb20.184737@mips.complang.tuwien.ac.at>
<uraof0$kij0$1@newsreader4.netcologne.de>
<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
<urfcgs$1rne2$1@dont-email.me>
<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
<usp9un$7pij$1@dont-email.me>
<20240312144428.000063f5@yahoo.com>
<19da68f1b874758d42b64203741c325b@www.novabbs.org>
<20240312194918.00002cde@yahoo.com>
<db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org>
<20240313110859.000024e9@yahoo.com>
<jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org>
<3186c85221c1baff1a23a46449dadd19@www.novabbs.org>
<ut0gsm$23m79$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="33cd23f10fa0274abc2057decddf1d09";
logging-data="2380606"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Lb3wlqDzu9zISJZZ5Fx80iB4aX0wIlMU="
Cancel-Lock: sha1:HgJTUdbwBqmNImSCHzH7dYKAuEQ=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Fri, 15 Mar 2024 12:01 UTC

On Thu, 14 Mar 2024 23:56:04 -0400
"Paul A. Clayton" <paaronclayton@gmail.com> wrote:

> On 3/13/24 3:24 PM, MitchAlsup1 wrote:
> > Stefan Monnier wrote:
> [snip]
> >> So, short vectors have a fairly free hand at shuffling data across
> >> their vector (e.g. bitmatrix transpose), and they can be
> >> implemented/scheduled/dispatched just like any other instruction,
> >> but the vector length tends to be severely limited and exposed all
> >> over the place.
> >
> > Consuming OpCode space like nobody's business.
>
> Is that necessarily the case? Excluding the shuffle operations, I
> think only loads and stores would need to have length specifiers.
> Shuffle operations become much more expensive with larger
> 'vectors', so providing the same granularity of shuffle for larger
> vectors seems questionable. (With scatter/gather, permute/shuffle
> may be less useful anyway.)
>
<snip>
>
> Am I missing something when I assume that lane-based (SIMD)
> operations do not need size information in the instruction? The
> extra metadata is not free (perhaps especially as that controls
> execution at least for efficiency), but if opcode explosion is so
> undesirable using metadata might be preferred.

I would guess that Mitch is operating under the assumption that we are
still talking about an ISA that is very similar to the CRAY Y-MP,
i.e. not load-store, but more like AVX512, where one of the source
operands can be in memory.
To be clear, I have never seen the CRAY Y-MP ISA docs, but I would
imagine that it was like that.

Re: Cray style vectors

<2024Mar15.180719@mips.complang.tuwien.ac.at>


https://news.novabbs.org/devel/article-flat.php?id=38062&group=comp.arch#38062

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Fri, 15 Mar 2024 17:07:19 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 30
Message-ID: <2024Mar15.180719@mips.complang.tuwien.ac.at>
References: <upq0cr$6b5m$1@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com> <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com> <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com> <usp8lp$7i96$1@dont-email.me> <2024Mar12.232336@mips.complang.tuwien.ac.at> <ut14l1$27450$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="8750a290ee5e01c1a817d491b88228ef";
logging-data="2515179"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1916R9CphqgRIVa7Sw1vTxI"
Cancel-Lock: sha1:LjC4AnzJ1NOM0kOZkhctkAJejyw=
X-newsreader: xrn 10.11
 by: Anton Ertl - Fri, 15 Mar 2024 17:07 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>Here you can probably schedule the fixup to happen in parallel with the
>actual multiplication:
>
>;; inputs in r9 & r10, result in rdx:rax, rbx & rcx as scratch
>
> mov rax,r9 ;; All these can start in the first cycle
> mul r10
> mov rbx,r9 ;; The MOV can be handled by the renamer
> sar r9,63
> mov rcx,r10 ;; Ditto
> sar r10,63
>
> and rbx,r10 ;; Second set of ops: rbx = a if b < 0, else 0
> and rcx,r9 ;; rcx = b if a < 0, else 0
>
> add rbx,rcx ;; Third cycle
>
> sub rdx,rbx ;; Do a single adjustment as soon as the MUL finishes

Of course on AMD64 you could just use imul instead.

RISC-V also supports signed as well as unsigned (and also
signed*unsigned) multiplication, and I think that's also the case for
ARM A64. But on Alpha this technique would be useful.
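
In C, one way to leave the instruction choice to the compiler is the `__int128` extension of GCC and Clang. That is an assumption of this sketch: it is not standard C and is not available on every target. The function names are hypothetical. The last function applies the fixup to the high half only, which is the case that matters on a machine that offers just an unsigned high multiply.

```c
#include <stdint.h>

/* High 64 bits of the signed product. Relies on the GCC/Clang __int128
   extension and on arithmetic right shift of negative values, which is
   implementation-defined in C. */
static int64_t mulh_signed(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}

/* High 64 bits of the unsigned product. */
static uint64_t mulh_unsigned(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Signed high half derived from the unsigned one, using the
   twos-complement fixup discussed above. */
static int64_t mulh_signed_via_unsigned(int64_t a, int64_t b)
{
    uint64_t hi = mulh_unsigned((uint64_t)a, (uint64_t)b);
    if (a < 0)
        hi -= (uint64_t)b;
    if (b < 0)
        hi -= (uint64_t)a;
    return (int64_t)hi; /* wraps to the signed value on twos-complement targets */
}
```

On targets with a signed widening or high multiply, `mulh_signed` is the simpler choice; the third function is only interesting where that instruction is missing.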

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: What integer C type to use

<34935c02188495cad04991abacecdf97@www.novabbs.org>


https://news.novabbs.org/devel/article-flat.php?id=38063&group=comp.arch#38063

Date: Fri, 15 Mar 2024 20:03:33 +0000
Subject: Re: What integer C type to use
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$WpTgQJUjEDxVGWjmr880A.yNOyCgBot9F4H04u.hH26MvPprWtABC
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <upq0cr$6b5m$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me> <dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org> <2024Feb20.083240@mips.complang.tuwien.ac.at> <2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me> <2024Feb20.184737@mips.complang.tuwien.ac.at> <uraof0$kij0$1@newsreader4.netcologne.de> <3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org> <urfcgs$1rne2$1@dont-email.me> <dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org> <usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com> <19da68f1b874758d42b64203741c325b@www.novabbs.org> <20240312194918.00002cde@yahoo.com> <db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org> <20240313110859.000024e9@yahoo.com> <jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org> <3186c85221c1baff1a23a46449dadd19@www.novabbs.org> <ut0gsm$23m79$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <34935c02188495cad04991abacecdf97@www.novabbs.org>
 by: MitchAlsup1 - Fri, 15 Mar 2024 20:03 UTC

Paul A. Clayton wrote:

> On 3/13/24 3:24 PM, MitchAlsup1 wrote:
>> Stefan Monnier wrote:
> [snip]
>>> So, short vectors have a fairly free hand at shuffling data across their
>>> vector (e.g. bitmatrix transpose), and they can be
>>> implemented/scheduled/dispatched just like any other instruction, but
>>> the vector length tends to be severely limited and exposed all over
>>> the place.
>>
>> Consuming OpCode space like nobody's business.

> Is that necessarily the case? Excluding the shuffle operations, I
> think only loads and stores would need to have length specifiers.

Add signed byte, add unsigned byte, add signed half, add unsigned half,
add signed int, add unsigned int, add long, add 2 floats, add double

compared to ADD integer, add float and add double.

And then there is the addsub group, too.

It may be possible to avoid OpCode explosion, but neither x86 nor ARM
got anywhere close to avoiding the deluge of OpCodes.

Re: What integer C type to use

<277280765e037dd0fc556e2fc4bc4912@www.novabbs.org>


https://news.novabbs.org/devel/article-flat.php?id=38064&group=comp.arch#38064

Date: Fri, 15 Mar 2024 20:11:36 +0000
Subject: Re: What integer C type to use
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$wgUw9GaKeWo/ljLil6Yb7uIBUDshB6HCH2Hkv3PqX3ZLlsJlRVpF.
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <upq0cr$6b5m$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me> <dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org> <2024Feb20.083240@mips.complang.tuwien.ac.at> <2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me> <2024Feb20.184737@mips.complang.tuwien.ac.at> <uraof0$kij0$1@newsreader4.netcologne.de> <3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org> <urfcgs$1rne2$1@dont-email.me> <dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org> <usp9un$7pij$1@dont-email.me> <20240312144428.000063f5@yahoo.com> <19da68f1b874758d42b64203741c325b@www.novabbs.org> <20240312194918.00002cde@yahoo.com> <db785354ebf90ee6f613fc9c39f8ca72@www.novabbs.org> <20240313110859.000024e9@yahoo.com> <jwva5n2m1lr.fsf-monnier+comp.arch@gnu.org> <3186c85221c1baff1a23a46449dadd19@www.novabbs.org> <ut0gsm$23m79$1@dont-email.me> <34935c02188495cad04991abacecdf97@www.novabbs.org>
Organization: Rocksolid Light
Message-ID: <277280765e037dd0fc556e2fc4bc4912@www.novabbs.org>
 by: MitchAlsup1 - Fri, 15 Mar 2024 20:11 UTC

MitchAlsup1 wrote:

> Paul A. Clayton wrote:

>> On 3/13/24 3:24 PM, MitchAlsup1 wrote:
>>> Stefan Monnier wrote:
>> [snip]
>>>> So, short vectors have a fairly free hand at shuffling data across their
>>>> vector (e.g. bitmatrix transpose), and they can be
>>>> implemented/scheduled/dispatched just like any other instruction, but
>>>> the vector length tends to be severely limited and exposed all over
>>>> the place.
>>>
>>> Consuming OpCode space like nobody's business.

>> Is that necessarily the case? Excluding the shuffle operations, I
>> think only loads and stores would need to have length specifiers.

> Add signed byte, add unsigned byte, add signed half, add unsigned half,
> add signed int, add unsigned int, add long, add 2 floats, add double

All of the above come in 64-bit, 128-bit, 256-bit, and 512-bit variants.

> compared to ADD integer, add float and add double.

> And then there is the addsub group, too.

> It may be possible to avoid OpCode explosion, but neither x86 nor ARM
> got anywhere close to avoiding the deluge of OpCodes.
