Re: Cray style vectors

https://news.novabbs.org/devel/article-flat.php?id=37440&group=comp.arch#37440

Date: Sat, 17 Feb 2024 17:18:36 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Organization: Rocksolid Light
Message-ID: <a0d20048208d09be3944d5609831354d@www.novabbs.org>

EricP wrote:

> MitchAlsup wrote:
>> EricP wrote:
>>
>>> MitchAlsup1 wrote:
>>>>
>>>> You should think of it like:: VVM can execute as many operations per
>>>> cycle as it has function units. In particular, the low end machine
>>>> can execute a LD, an FMAC, and the ADD-CMP-BC loop terminator every
>>>> cycle. LDs operate at 128-bits wide, so one can execute a LD on even
>>>> cycles and a ST on odd cycles--giving 6-IPC on a 1 wide machine.
>>>>
>>>> Bigger implementations can have more cache ports and more FMAC units;
>>>> and include "lanes" in SIMD-like fashion.
>>
>>> Regarding the 128-bit LD and ST, are you saying the LSQ recognizes
>>> two consecutive 64-bit LD or ST to consecutive addresses and merges
>>> them into a single cache access?
>>
>> First: memory is inherently misaligned in the My 66000 architecture. So,
>> since the width of the machine is 64 bits, we read or write in 128-bit
>> quantities so that we have enough bits to extract the misaligned data
>> from, or a container large enough to store a 64-bit value into. {{And
>> there are all the associated corner cases}}
>>
>> Second: over in VVM-land, the implementation can decide to read and write
>> wider, but is architecturally constrained not to shrink below 128-bits.
>>
>> A 1-wide My66160 would read pairs of double precision FP values, or quads
>> of 32-bit values, octets of 16-bit values, and hexadecimals of 8-bit values.
>> This supports loops of 6 IPC or greater in a 1-wide machine. This machine
>> would process suitable loops at 128 bits per cycle--depending on "other
>> things" that are generally allowable.
>>
>> A 6-wide My66650 would read a cache line at a time, and has 3 cache ports
>> per cycle. This supports 20 IPC or greater in the 6-wide machine. As many
>> as 8 DP FP calculations per cycle are possible, with adequate LD/ST
>> bandwidths to support this rate.

> Ah, so it can emit Load/Store Pair LDP/STP (or wider) uOps inside the loop.
> That's more straightforward than fusing LD's or ST's in LSQ.

>>> Is that done by disambiguation logic, checking for same cache line
>>> access?
>>
>> I have said before that the front end observes the first iteration of
>> the loop and makes some determinations as to how wide the loop can be
>> run on the machine at hand. One of those observations is whether memory
>> addresses are dense, whether they all go in the same direction, and what
>> registers carry loop-to-loop dependencies.

> How does it know when to use LDP/STP uOps?

It does not have LDP/STP ops to use.
It uses the width of the cache port it has.
It just so happens that the low end machine has a cache width of 128-bits.
But each implementation gets to choose its own width.
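As a rough illustration of the width point, here is a minimal C sketch that assumes a hypothetical 128-bit (16-byte) cache port. A DAXPY-style loop (load, multiply-add, store) is strip-mined so that each step consumes one port-width chunk, i.e. two doubles, and a scalar tail handles the remainder. `PORT_BYTES`, `LANES`, and `daxpy_chunked` are illustrative names, not My 66000 tooling. Under VVM the hardware, not the programmer, would choose the chunk width.

```c
#include <stddef.h>

/* Hypothetical port width: the low-end machine described above uses 128 bits. */
#define PORT_BYTES 16
#define LANES (PORT_BYTES / sizeof(double)) /* 2 doubles per access */

/* y[i] = a * x[i] + y[i], processed one port-width chunk at a time. */
void daxpy_chunked(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
    /* Main body: each iteration models one 128-bit access of LANES elements. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t k = 0; k < LANES; k++)
            y[i + k] = a * x[i + k] + y[i + k];
    }
    /* Tail: fewer than LANES elements remain. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```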

> That decision would have to be made early in the front end, likely Decode
> and before Rename because you have to know how many dest registers you need.

It is not using a register, although it is using flip-flops: state that
is visible to HW but not to SW.

> But the decision on the legality to use LDP/STP depends on knowing the
> current loop counter >= 2 and address(es) aligned on a 16 byte boundary,
> which are multiple dynamic, possibly calculated, values only available
> much later to the back end.

It does not need the address to be aligned on a 16-byte boundary.
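Here is a minimal C sketch of why a 128-bit container makes 16-byte alignment unnecessary for 64-bit accesses. It assumes little-endian byte order and that the container starts at the 8-byte-aligned doubleword holding the first byte. Under that assumption the value begins at byte offset 0..7 and always lies entirely inside the 16 bytes. `load64_misaligned` is a hypothetical helper, not a description of the actual My 66000 datapath, and it ignores the cache-line and page-crossing corner cases mentioned above.

```c
#include <stdint.h>

/* Extract the 64-bit little-endian value starting at byte `off` (0..7)
   of the 128-bit container formed by the aligned words lo (bytes 0..7)
   and hi (bytes 8..15). */
uint64_t load64_misaligned(uint64_t lo, uint64_t hi, unsigned off)
{
    if (off == 0)
        return lo;              /* aligned case; avoids an undefined 64-bit shift */
    unsigned sh = 8u * off;     /* 8..56 bits */
    return (lo >> sh) | (hi << (64u - sh));
}
```

For example, with `lo = 0x0706050403020100` and `hi = 0x0F0E0D0C0B0A0908` (byte i holds the value i), offset 1 yields `0x0807060504030201`, i.e. bytes 1 through 8.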

Re: Cray style vectors

https://news.novabbs.org/devel/article-flat.php?id=37443&group=comp.arch#37443

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 18:03:53 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Feb17.190353@mips.complang.tuwien.ac.at>

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>On the third (i.e. gripping) hand you could have a language like Java
>where it would be illegal to transform a temporarily trapping loop into
>one that would not trap and give the mathematically correct answer.

What "temporarily trapping loop" and "mathematically correct answer"?

If you are talking about integer arithmetic, the limited integers in
Java have modulo semantics, i.e., they don't trap, and BigIntegers certainly
don't trap.
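Java's `int` addition wraps modulo 2^32 (two's complement) rather than trapping. In C, signed overflow is undefined, so a minimal sketch of the same semantics does the addition in `uint32_t` and converts back. `java_iadd` is a hypothetical name. The final unsigned-to-signed conversion is implementation-defined in C, but it wraps on mainstream compilers such as GCC and Clang.

```c
#include <stdint.h>

/* Java-style int addition: wraps modulo 2^32 instead of trapping. */
int32_t java_iadd(int32_t a, int32_t b)
{
    /* Unsigned arithmetic is defined to wrap in C. */
    uint32_t r = (uint32_t)a + (uint32_t)b;
    /* Implementation-defined for r > INT32_MAX; GCC and Clang wrap. */
    return (int32_t)r;
}
```

On GCC and Clang, `java_iadd(INT32_MAX, 1)` returns `INT32_MIN`, mirroring `Integer.MAX_VALUE + 1 == Integer.MIN_VALUE` in Java.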

If you are talking about FP (like I did), by default FP addition does
not trap in Java, and any mention of "mathematically correct" in
connection with FP needs a lot of further elaboration.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cray style vectors

https://news.novabbs.org/devel/article-flat.php?id=37448&group=comp.arch#37448

From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 19:58:19 +0100
Organization: A noiseless patient Spider
Message-ID: <uqqvkc$i2cu$1@dont-email.me>

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> On the third (i.e. gripping) hand you could have a language like Java
>> where it would be illegal to transform a temporarily trapping loop into
>> one that would not trap and give the mathematically correct answer.
>
> What "temporarily trapping loop" and "mathematically correct answer"?
>
> If you are talking about integer arithmetic, the limited integers in
> Java have modulo semantics, i.e., they don't trap, and BigIntegers certainly
> don't trap.
>
> If you are talking about FP (like I did), by default FP addition does
> not trap in Java, and any mention of "mathematically correct" in
> connection with FP needs a lot of further elaboration.

Sorry to be unclear:

I was specifically talking about adding a bunch of integers together,
some positive and some negative, so that by doing them in program order
you will get an overflow, but if you did them in some other order, or
with a double-wide accumulator, the final result would in fact fit in
the designated target variable.

#include <stdint.h>

int8_t sum(int len, int8_t data[])
{
    int8_t s = 0;
    for (int i = 0; i < len; i++) {
        s += data[i];
    }
    return s;
}

will overflow if called with data = [127, 1, -2], right?

while if you implement it with

#include <stdint.h>

int8_t sum(int len, int8_t data[])
{
    int s = 0;   /* wider accumulator */
    for (int i = 0; i < len; i++) {
        s += data[i];
    }
    return (int8_t) s;
}

then you would be OK, and the final result would be mathematically correct.

For this particular example, you would also get the correct answer with
wrapping arithmetic, even if that by default is UB in modern C.
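Here is a runnable C sketch of the wrapping variant, using the `data = [127, 1, -2]` example from the text. One caveat applies. In C, `int8_t` operands are promoted to `int` before `+=`, so the first version's intermediate 128 is not signed-overflow UB but an implementation-defined narrowing back to `int8_t`. Genuine UB arises only when the accumulator is `int` or wider and overflows. This sketch does the arithmetic in `uint8_t`, which wraps by definition, to make the wraparound explicit. `sum_wrap` is a hypothetical name.

```c
#include <stdint.h>

/* Sum int8_t values with explicit modulo-256 wraparound. */
int8_t sum_wrap(int len, const int8_t data[])
{
    uint8_t s = 0;
    for (int i = 0; i < len; i++)
        s = (uint8_t)(s + (uint8_t)data[i]);   /* wraps modulo 256 */
    /* Narrowing to int8_t: implementation-defined above 127; GCC and Clang wrap. */
    return (int8_t)s;
}
```

With `data = {127, 1, -2}`, the partial sums wrap through 128 (stored as -128) and end at 126. That is the same value the wide-accumulator version returns.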

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Cray style vectors

https://news.novabbs.org/devel/article-flat.php?id=37449&group=comp.arch#37449

Date: Sat, 17 Feb 2024 20:03:01 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Organization: Rocksolid Light
Message-ID: <3ce8b7c45ee7492e539eaeff4163bab5@www.novabbs.org>

Terje Mathisen wrote:

> Anton Ertl wrote:
>>
>> If you are talking about FP (like I did), by default FP addition does
>> not trap in Java, and any mention of "mathematically correct" in
>> connection with FP needs a lot of further elaboration.

> Sorry to be unclear:

> I was specifically talking about adding a bunch of integers together,
> some positive and some negative, so that by doing them in program order
> you will get an overflow, but if you did them in some other order, or
> with a double-wide accumulator, the final result would in fact fit in
> the designated target variable.

> int8_t sum(int len, int8_t data[])
> {
> int8_t s = 0;
> for (int i = 0; i < len; i++) {
> s += data[i];
> }
> return s;
> }

> will overflow if called with data = [127, 1, -2], right?

Yes, and it should not be vectorized when your vector resource has
CRAY-like vector registers; however, it can be vectorized with VVM-like
resources.

> while if you implement it with

> int8_t sum(int len, int8_t data[])
> {
> int s = 0;
> for (int i = 0; i < len; i++) {
> s += data[i];
> }
> return (int8_t) s;
> }

> then you would be OK, and the final result would be mathematically correct.

When len > 2^24 it may still not be mathematically correct for 32-bit ints,
or when len > 2^56 for 64-bit ints.
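The bounds follow from the element range. Each `int8_t` lies in [-2^7, 2^7 - 1], so a sum of n elements can reach -2^7·n, and a W-bit signed accumulator bottoms out at -2^(W-1). Overflow therefore first becomes possible once n exceeds 2^(W-1) / 2^7 = 2^(W-8). That is just past 2^24 for 32-bit accumulators and just past 2^56 for 64-bit ones; the positive side, 127·n, overflows only later. A small C sketch of that arithmetic (`min_overflow_len` is a hypothetical helper):

```c
#include <stdint.h>

/* Smallest element count n for which a sum of n int8_t values can
   overflow a signed accumulator of `bits` bits (16 <= bits <= 64).
   Worst case is every element equal to -128 = -2^7: the sum -2^7 * n
   leaves the range once n exceeds 2^(bits-1) / 2^7 = 2^(bits-8). */
uint64_t min_overflow_len(unsigned bits)
{
    return ((uint64_t)1 << (bits - 8)) + 1;
}
```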

> For this particular example, you would also get the correct answer with
> wrapping arithmetic, even if that by default is UB in modern C.

> Terje

Re: Cray style vectors

https://news.novabbs.org/devel/article-flat.php?id=37451&group=comp.arch#37451

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 14:13:42 -0600
Organization: A noiseless patient Spider
Message-ID: <uqr459$j1cm$1@dont-email.me>

On 2/17/2024 3:20 AM, Terje Mathisen wrote:
> BGB wrote:
>> On 2/16/2024 5:29 AM, Marcus wrote:
>>> I'm saying that I believe that within this category there is an
>>> opportunity for improving performance with very little cost by adding
>>> vector operations.
>>>
>>> E.g. imagine a non-pipelined implementation with a single memory port,
>>> shared by instruction fetch and data load/store, that requires perhaps
>>> two cycles to fetch and decode an instruction, and executes the
>>> instruction in the third cycle (possibly accessing the memory, which
>>> precludes fetching a new instruction until the fourth or even fifth
>>> cycle).
>>>
>>> Now imagine if a single instruction could iterate over several elements
>>> of a vector register. This would mean that the execution unit could
>>> execute up to one operation every clock cycle, approaching similar
>>> performance levels as a pipelined 1 CPI machine. The memory port would
>>> be free for data traffic as no new instructions have to be fetched
>>> during the vector loop. And so on.
>>>
>>
>> I guess possible.
>
> Absolutely possible. After all, the IBM block move and all the 1978 x86
> string ops were designed to make an internal, interruptible, loop. No
> need to load more instructions, just let the internal state machine run
> until completion.
>
> The current state of the art (i.e. VVM) is of course far more capable,
> but the original idea is old.
>

I think the thing was that "REP MOVSB" and friends were either:
Amazingly fast (sometimes);
Painfully slow (often).

So, say, it could give GB/s, or it might only give 200 MB/s or similar,
depending on the computer in question.

I guess doing vectors similarly, say:
VECREP 256 | VADD.H @R4+, @R5+, @R6+
to do a vector operation, could be possible in principle (with "VECREP"
interpreted here as a magic prefix).

Ironically, the original SuperH had a few instructions not too far off
from this, but I pretty quickly dropped them when working on my original
BJX1 design. I dropped pretty much anything that wasn't being actively
used by GCC for the SH-4 target and reclaimed the encoding space for
other purposes, and I dropped "extra hard" anything that didn't fit the
Load/Store model, since SuperH was not entirely free of instructions
that violated Load/Store.

( Decides to omit detour path of how I got from SH4 to the original form
of the BJX2 design... )

Never mind how all this fiddling eventually ended up going in a direction
that more closely resembles a hybrid of SH-5 and RISC-V than SH-4; that
is its own mystery...

But, I am not entirely sure how one would go about implementing it, as
VADD.H would need to do the equivalent of:
MOV.Q (R4), R16
MOV.Q (R5), R17
ADD 8, R4
ADD 8, R5
PADD.H R16, R17, R18
MOV.Q R18, (R6)
ADD 8, R6
All in a single instruction.
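For reference, the expansion above can be written as a plain C loop, which is what the hypothetical `VECREP 256 | VADD.H @R4+, @R5+, @R6+` would have to perform. Each repetition loads one 64-bit quadword from each source and adds the four 16-bit lanes independently. The sketch assumes that is the `PADD.H` behavior, with per-lane wraparound. It then stores the result and advances all three pointers by 8 bytes. `vadd_h_rep` is an illustrative name; this is a semantic model, not a proposed implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "VECREP n | VADD.H @R4+, @R5+, @R6+":
   n repetitions of a packed 4 x 16-bit add over 64-bit quadwords. */
void vadd_h_rep(size_t n, const uint16_t *r4, const uint16_t *r5, uint16_t *r6)
{
    for (size_t rep = 0; rep < n; rep++) {
        /* One quadword = 4 halfword lanes; each lane wraps independently. */
        for (int lane = 0; lane < 4; lane++)
            r6[lane] = (uint16_t)(r4[lane] + r5[lane]);
        /* Post-increment each pointer by 8 bytes (4 halfwords). */
        r4 += 4;
        r5 += 4;
        r6 += 4;
    }
}
```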

Though, this could be reduced if auto-increment were re-added:
MOV.Q @R4+, R16
MOV.Q @R5+, R17
PADD.H R16, R17, R18
MOV.Q R18, @R6+

With the auto-increment ops being decoded as, say:
ADD 8, R4 | MOV.Q (R4), R16
But, with the relaxation of allowing the ADD in parallel with the MOV,
with the ADD's result not taking effect until after the MOV.

I "could" at least be tempted to re-implement auto-increment, if it
seemed like it "would actually be worthwhile" (I originally dropped it
because it did not seem worthwhile; it also necessarily requires an
additional register port).

It would get a bit wonky, though, if I wanted to support it with the
128-bit MOV.X and XMOV.x ops and similar. The auto-increment would need
to go out of Lane3's write port without having access to Lane3's read
ports, so the ADD would need to pull one input from Lane1 and the other
out of thin air. More likely, I would add a special AutoInc operation to
Lane3, whose sole purpose would be to fake the ADD output, with a
side-channel wired to Lane1's Rs port...

Or, basically:
MOV.X @R4+, R8
MOV.X R8, @R5+
This would effectively require doing stuff in all 3 lanes at the same time
(Lanes 1 and 2 deal with the MOV.X, Lane3 supplies the ports for
Mem-Store, and a special op in Lane3, with some funky plumbing, handles
the address increment/decrement).

Otherwise, shouldn't be too much different from the way 128-bit SIMD ops
are already handled (likely with the inner decoder having a way to
signal to the outer decoder, "hey, do the thing for auto-increment").

But, this doesn't save much, if all the logic needs to happen as a
single operation.

Could in theory add microcode support, but this wouldn't do much for
making it fast.

More likely, the hardware would need 3 memory ports and some way for the
FPU or SIMD unit to use them...

But, then one would still have the evils of an instruction with 3 output
registers at the end of it (if the operation behaves like an
auto-increment).

But making this work, with any sort of performance, at a reasonable
hardware cost seems like a stretch...

Meanwhile, as I understand the VVM approach, I can understand how it
could be interpreted as a loop running conventional instructions, but as
for how a hardware implementation would do anything with it beyond
executing a conventional loop, this part escapes me...

Which is, relatedly, part of the whole "I am just going to do the SIMD
thing". SIMD doesn't leave any big perplexing mysteries like this.

I guess, if one wanted to get crazy with it, it could be possible (at
least in principle) to shove some basic SIMD ops into the mechanism used
for the RISC-V 'A' extension (which, in BJX2, also happens to allow
something like x86-style Load-Op and Op-Store).

Never mind that doing SIMD this way would likely require pipeline stalls.
Say, each FP-SIMD LoadOp / OpStore requiring a 3-cycle stall.

But, probably not going to do this. I am still not using these ops, and
it seems I didn't even get around to implementing support for them in
BGBCC. IIRC, the basic logic is in-place in the Verilog implementation
though; and exists as an extension of the RiDisp mechanism.

Which added an equivalent of an x86-style [Rb+Ri*Sc+Disp] addressing
mode, but didn't really seem to add enough to be worthwhile. (In the
basic case, this mode is not allowed; but the encoding can still be used
for the [Rb+Ri*Sc] and [Rb+Disp] cases, which are allowed without
"RiDisp" proper, and by the LdOp extension, which bolts an additional
sub-opcode onto the operation via some of the remaining bits in the FF
prefix.)
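The x86-style `[Rb+Ri*Sc+Disp]` mode computes its effective address as base plus scaled index plus displacement. The `[Rb+Ri*Sc]` and `[Rb+Disp]` forms are the special cases with zero displacement or no index. A minimal C sketch of the address arithmetic follows. The function name is hypothetical, and restricting `scale` to 1, 2, 4, or 8 is an assumption borrowed from x86, not taken from the BJX2 encoding.

```c
#include <stdint.h>

/* Effective address for [Rb + Ri*Sc + Disp]; scale assumed to be 1, 2, 4 or 8. */
uint64_t ea_ridisp(uint64_t rb, int64_t ri, unsigned scale, int64_t disp)
{
    /* Unsigned arithmetic so that address wraparound is well defined. */
    return rb + (uint64_t)ri * scale + (uint64_t)disp;
}
```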

It is almost funny in a way, that RV defines an extension which
basically requires a mechanism capable of Load-Op and Op-Store, without
actually adding these semantics to the RV ISA (and I sort of ended up
adding Load-Op and Op-Store, seemingly initially missing the point of
the original 'A' extension).
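The RISC-V 'A' extension's AMO instructions (for example `AMOADD.W`) are a Load-Op-Store performed atomically. They read the memory word, combine it with a register operand, write the result back, and return the old value to the destination register. C11 `<stdatomic.h>` expresses the same semantics, as in the minimal sketch below. `amoadd_w` is a hypothetical wrapper name, and whether a compiler actually emits `AMOADD.W` for it depends on the target and flags.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomic Load-Op-Store: *p = *p + v; returns the value *p held before. */
int32_t amoadd_w(_Atomic int32_t *p, int32_t v)
{
    return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
}
```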

Well, apart from having an instruction that basically does the
"InterlockedExchange" thing, AKA: "AMOSWAP.W"/"AMOSWAP.D" ("LDOP/XCHG.x"
in BJX2 terms). This seemed like the main relevant one for implementing
spinlocks though (well, and the RV "FENCE" instruction, but as-is, this
is handled in other ways in BJX2, and there is no obvious way to handle
"FENCE" other than ignoring it, or treating it as a trap and letting an
ISR deal with it, likely by performing an L1 flush).
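Because `AMOSWAP` is an atomic exchange, a spinlock built on it simply swaps in a 1 until it gets back a 0. Below is a minimal C11 sketch. The type and function names are hypothetical. `atomic_exchange_explicit` is the portable counterpart of `AMOSWAP.W` / `InterlockedExchange`, and the acquire/release orderings stand in for explicit `FENCE` instructions.

```c
#include <stdatomic.h>

typedef struct { atomic_int held; } swap_lock;  /* 0 = free, 1 = held */

void spin_lock(swap_lock *l)
{
    /* Swap in 1; if the old value was 1, someone else holds the lock. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0)
        ;  /* spin */
}

void spin_unlock(swap_lock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```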

Where, in my case, a cache flush generally involves both a cache-flush
instruction and a sweep over a chunk of dummy memory large enough to
make sure anything is effectively knocked out of the L1 (the cache-flush
instruction mostly needed to deal with the possibility of associative
caches; the memory sweep needed to actually get all the cache-lines
evicted from the cache).
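The eviction half of that recipe can be sketched in C as a read sweep over a dummy buffer, with one access per cache line. The sizes are assumptions for illustration: a 64-byte line, and a sweep buffer twice the size of a hypothetical 16 KiB L1 as a margin against non-LRU replacement. The separate cache-flush instruction has no portable C equivalent and is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES  64                /* assumed cache-line size */
#define SWEEP_BYTES (2 * 16 * 1024)   /* assumed: 2x a 16 KiB L1 */

static uint8_t sweep_buf[SWEEP_BYTES];

/* Touch one byte per cache line so that previously cached lines get evicted.
   Returns the number of lines touched. */
size_t l1_sweep(void)
{
    volatile const uint8_t *p = sweep_buf;  /* volatile: keep the loads */
    size_t lines = 0;
    for (size_t off = 0; off < SWEEP_BYTES; off += LINE_BYTES) {
        (void)p[off];
        lines++;
    }
    return lines;
}
```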

But, yeah, all this is off in the realm of "borderline useful, kinda
trash fire" areas of the ISA.

I guess though, unlike RISC-V, I didn't burn parts of the 32-bit
encoding space on this stuff, but left it in terms of 64-bit encodings.

....

> Terje
>

Re: Cray style vectors

https://news.novabbs.org/devel/article-flat.php?id=37454&group=comp.arch#37454

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 15:03:41 -0600
Organization: A noiseless patient Spider
Message-ID: <uqr6ve$jfij$1@dont-email.me>

On 2/17/2024 12:03 PM, Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> On the third (i.e. gripping) hand you could have a language like Java
>> where it would be illegal to transform a temporarily trapping loop into
>> one that would not trap and give the mathematically correct answer.
>
> What "temporarily trapping loop" and "mathematically correct answer"?
>
> If you are talking about integer arithmetic, the limited integers in
> Java have modulo semantics, i.e., they don't trap, and BigIntegers certainly
> don't trap.
>

Yes.

Trap on overflow is not really a thing in the JVM, the basic integer
types are modulo, and don't actually distinguish signed from unsigned
(unsigned arithmetic is merely faked in some cases with special
operators, with signed arithmetic assumed as the default).

> If you are talking about FP (like I did), by default FP addition does
> not trap in Java, and any mention of "mathematically correct" in
> connection with FP needs a lot of further elaboration.
>

Yeah. No traps, only NaNs.

FWIW: My own languages, and BGBCC, also partly followed Java's model in
this area. But it wasn't hard: this is generally how C behaves on most
targets as well.

Well, except that C will often trap on things like divide by zero, at
least on x86. Though, off-hand, I don't remember whether or not the JVM
throws an exception on divide-by-zero.

On BJX2, there isn't currently any divide-by-zero trap, since:
   This case doesn't happen in normal program execution;
   Handling it with a trap would cost more than not bothering.

So, IIRC, integer divide-by-zero will just give 0, and FP divide-by-zero
will give Inf or NaN.
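A minimal C sketch of the divide behaviour described above, purely for illustration: bjx2_sdiv and bjx2_fdiv are hypothetical names, and how BJX2 handles the INT64_MIN / -1 overflow case is not stated here, so wrapping is an assumption (plain C leaves that case undefined, hence the explicit check).

```c
#include <stdint.h>

/* Hypothetical model of the BJX2 behaviour described above:
   integer division by zero quietly yields 0 instead of trapping. */
int64_t bjx2_sdiv(int64_t x, int64_t y)
{
    if (y == 0)
        return 0;            /* no trap: the result is simply 0 */
    if (x == INT64_MIN && y == -1)
        return INT64_MIN;    /* assumption: overflow wraps (undefined in plain C) */
    return x / y;
}

/* FP needs no special case: with IEEE 754 arithmetic (C Annex F),
   nonzero/0.0 gives +/-Inf and 0.0/0.0 gives NaN, without trapping
   as long as FP exceptions stay masked (the usual default). */
double bjx2_fdiv(double x, double y)
{
    return x / y;
}
```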

> - anton

Re: Cray style vectors

<a303310733eef77a63360606495a761e@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37457&group=comp.arch#37457

Date: Sat, 17 Feb 2024 22:03:16 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Organization: Rocksolid Light
Message-ID: <a303310733eef77a63360606495a761e@www.novabbs.org>
 by: MitchAlsup1 - Sat, 17 Feb 2024 22:03 UTC

BGB wrote:

> On 2/17/2024 3:20 AM, Terje Mathisen wrote:
>> BGB wrote:
>>>

> But, I am not entirely sure how one would go about implementing it, as
> VADD.H would need to do the equivalent of:
>    MOV.Q   (R4), R16
>    MOV.Q   (R5), R17
>    ADD     8, R4
>    ADD     8, R5
>    PADD.H  R16, R17, R18
>    MOV.Q   R18, (R6)
>    ADD     8, R6
> All in a single instruction.

With the proper instruction set, the above is:

    VEC     R9,{}
    LDSH    R10,[R1,Ri<<1]
    LDSH    R11,[R2,Ri<<1]
    ADD     R12,R10,R11
    STH     R12,[R3,Ri<<1]
    LOOP    LT,Ri,#1,Rmax

Once you see that there is no loop-carried recurrence, the iterations can
be run concurrently, as wide as your arithmetic capability and cache
bandwidth allow. In this case we have an arithmetic capability of 4
halfword ADDs per cycle and a memory capability of 128 bits every cycle,
giving 4×5 instructions every 1.5 cycles, or 13.3 IPC: we are memory
limited, not arithmetic limited.
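For reference, a plain C loop with the same per-iteration work as the VEC/LOOP sequence (two halfword loads, an add, a halfword store). vadd_h is a hypothetical name, and unsigned lanes are used only so that the wrapping add is well defined in C. One way to reconstruct the throughput figure: each iteration moves 2 + 2 + 2 = 6 bytes, so 4 iterations move 24 bytes = 192 bits, i.e. 1.5 cycles at 128 bits/cycle, and 4 × 5 instructions / 1.5 cycles ≈ 13.3 IPC.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar C for the loop the VEC/LOOP sequence expresses:
   c[i] = a[i] + b[i] on 16-bit halfwords. Each iteration is independent
   (no loop-carried recurrence), so iterations may run concurrently. */
void vadd_h(uint16_t *c, const uint16_t *a, const uint16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        /* Halfword add; only the low 16 bits are stored (like STH), so
           signed vs. unsigned lanes give the same bit pattern. */
        c[i] = (uint16_t)(a[i] + b[i]);
}
```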

> Though, could be reduced if auto-increment were re-added:
>    MOV.Q   @R4+, R16
>    MOV.Q   @R5+, R17
>    PADD.H  R16, R17, R18
>    MOV.Q   R18, @R6+

You will find the requisite patterns harder to recognize when the memory
reference size is NOT the calculation size. In your case, the calculation
is .H while the memory reference is .Q.

Re: Cray style vectors

<802cbb7bbdfc9f3ee2f52eb7d99acfac@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37459&group=comp.arch#37459

Date: Sat, 17 Feb 2024 22:08:30 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Organization: Rocksolid Light
Message-ID: <802cbb7bbdfc9f3ee2f52eb7d99acfac@www.novabbs.org>
 by: MitchAlsup1 - Sat, 17 Feb 2024 22:08 UTC

BGB wrote:

> On 2/17/2024 12:03 PM, Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> On the third (i.e., gripping) hand you could have a language like Java
>>> where it would be illegal to transform a temporarily trapping loop into
>>> one that would not trap and give the mathematically correct answer.
>>
>> What "temporarily trapping loop" and "mathematically correct answer"?
>>
>> If you are talking about integer arithmetic, the limited integers in
>> Java have modulo semantics, i.e., they don't trap, and BigIntegers certainly
>> don't trap.
>>

> Yes.

> Trap on overflow is not really a thing in the JVM, the basic integer
> types are modulo, and don't actually distinguish signed from unsigned
> (unsigned arithmetic is merely faked in some cases with special
> operators, with signed arithmetic assumed as the default).

>> If you are talking about FP (like I did), by default FP addition does
>> not trap in Java, and any mention of "mathematically correct" in
>> connection with FP needs a lot of further elaboration.

People skilled in numerical analysis hate Java FP semantics.

> Yeah. No traps, only NaNs.

> FWIW: My own languages, and BGBCC, also partly followed Java's model in
> this area. But, it wasn't hard: This is generally how C behaves as well
> on most targets.

> Well, except that C will often trap for things like divide by zero and
> similar, at least on x86. Though, off-hand, I don't remember whether or
> not JVM throws an exception on divide-by-zero.

> On BJX2, there isn't currently any divide-by-zero trap, since:
> This case doesn't happen in normal program execution;
> Handling it with a trap would cost more than not bothering.

This sounds like it should make your machine safe to program and use,
but it does not.

> So, IIRC, integer divide-by-zero will just give 0, and FP divide-by-zero
> will give Inf or NaN.

Can I volunteer this as the worst possible value for int/0? [Un]signedMAX
is only trivially harder to implement.
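A sketch of the suggested alternative, with hypothetical function names: divide-by-zero returns the type's maximum instead of 0. Nothing above says what should happen for the INT64_MIN / -1 overflow case, so saturating that as well is an assumption.

```c
#include <stdint.h>

/* Signed x/0 -> signedMAX instead of 0. */
int64_t sat_sdiv(int64_t x, int64_t y)
{
    if (y == 0)
        return INT64_MAX;
    if (x == INT64_MIN && y == -1)
        return INT64_MAX;   /* assumption: the overflowing case saturates too */
    return x / y;
}

/* Unsigned x/0 -> unsignedMAX (all bits set). */
uint64_t sat_udiv(uint64_t x, uint64_t y)
{
    return y == 0 ? UINT64_MAX : x / y;
}
```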

>> - anton

Re: Cray style vectors

<86y1biy874.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=37467&group=comp.arch#37467

From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 15:36:31 -0800
Organization: A noiseless patient Spider
Message-ID: <86y1biy874.fsf@linuxsc.com>
 by: Tim Rentsch - Sat, 17 Feb 2024 23:36 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:

> [...]
>
> int8_t sum(int len, int8_t data[])
> {
>     int s = 0;
>     for (unsigned i = 0; i < len; i++) {
>         s += data[i];
>     }
>     return (int8_t) s;
> }

The cast in the return statement is superfluous.

Re: Cray style vectors

<uqrfud$latl$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37468&group=comp.arch#37468

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 17:36:43 -0600
Organization: A noiseless patient Spider
Message-ID: <uqrfud$latl$1@dont-email.me>
 by: BGB - Sat, 17 Feb 2024 23:36 UTC

On 2/17/2024 4:08 PM, MitchAlsup1 wrote:
> BGB wrote:
>
>> On 2/17/2024 12:03 PM, Anton Ertl wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>> On the third (i.e., gripping) hand you could have a language like Java
>>>> where it would be illegal to transform a temporarily trapping loop
>>>> into one that would not trap and give the mathematically correct answer.
>>>
>>> What "temporarily trapping loop" and "mathematically correct answer"?
>>>
>>> If you are talking about integer arithmetic, the limited integers in
>>> Java have modulo semantics, i.e., they don't trap, and BigIntegers
>>> certainly don't trap.
>>>
>
>> Yes.
>
>> Trap on overflow is not really a thing in the JVM, the basic integer
>> types are modulo, and don't actually distinguish signed from unsigned
>> (unsigned arithmetic is merely faked in some cases with special
>> operators, with signed arithmetic assumed as the default).
>
>
>>> If you are talking about FP (like I did), by default FP addition does
>>> not trap in Java, and any mention of "mathematically correct" in
>>> connection with FP needs a lot of further elaboration.
>
> People skilled in numerical analysis hate Java FP semantics.
>
>> Yeah. No traps, only NaNs.
>
>
>> FWIW: My own languages, and BGBCC, also partly followed Java's model
>> in this area. But, it wasn't hard: This is generally how C behaves as
>> well on most targets.
>
>> Well, except that C will often trap for things like divide by zero and
>> similar, at least on x86. Though, off-hand, I don't remember whether
>> or not JVM throws an exception on divide-by-zero.
>
>
>> On BJX2, there isn't currently any divide-by-zero trap, since:
>>    This case doesn't happen in normal program execution;
>>    Handling it with a trap would cost more than not bothering.
>
> This sounds like it should make your machine safe to program and use,
> but it does not.
>

It is more concerned with "cheap" than "safe".

Trap on divide-by-zero would require having a way for the divider unit
to signal divide-by-zero has been encountered (say, so some external
logic can raise the corresponding exception code). This is not free.

Granted, since the divider is slow, one could potentially add a few
cycles of latency there without much ill effect.

>> So, IIRC, integer divide-by-zero will just give 0, and FP
>> divide-by-zero will give Inf or NaN.
>
> Can I volunteer this as the worst possible value for int/0? [Un]signedMAX
> is only trivially harder to implement.
>

Probably depends on what one wants.
With 0, the value just sorta goes poof and disappears...

Not exactly the most numerically correct answer, granted.

Probably at least still better than "divide by zero gives some totally
random garbage value", which is, technically, the cheaper option...

Though, at least Inf has the advantage that it just sort of appears on
its own when one tries to find the reciprocal of 0: it initially becomes
a very large value and quickly turns into Inf in the Newton-Raphson (N-R)
stages, with Inf as the special case whenever the exponent goes out of
range.

The NaN then appears if one special-cases 0*Inf to produce NaN:
  x/0 => x*rcp(0) => x*Inf => Inf   (x nonzero)
  0/0 => 0*rcp(0) => 0*Inf => NaN
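A minimal C sketch of the reciprocal-then-multiply path, assuming a textbook Newton-Raphson setup rather than the actual BJX2 hardware: the linear seed 48/17 - (32/17)m and the iteration count are illustrative choices, zero is handled up front as the exponent-out-of-range case, and rcp/div_rcp are hypothetical names. In IEEE 754 arithmetic 0*Inf already yields NaN, so C needs no extra special case for 0/0.

```c
#include <math.h>

/* Reciprocal by Newton-Raphson: write y = m * 2^e with 0.5 <= |m| < 1,
   refine r ~ 1/|m| with r = r * (2 - m * r), then rescale by 2^-e. */
double rcp(double y)
{
    if (y == 0.0)
        return copysign(INFINITY, y);   /* exponent out of range: +/-Inf */
    if (isinf(y))
        return copysign(0.0, y);
    if (isnan(y))
        return y;
    int e;
    double m = fabs(frexp(y, &e));              /* 0.5 <= m < 1 */
    double r = 48.0 / 17.0 - (32.0 / 17.0) * m; /* linear seed, error <= 1/17 */
    for (int i = 0; i < 4; i++)
        r = r * (2.0 - m * r);                  /* error roughly squares each step */
    return copysign(ldexp(r, -e), y);
}

/* Division as x * rcp(y): x/0 -> x*Inf -> Inf, and 0/0 -> 0*Inf -> NaN. */
double div_rcp(double x, double y)
{
    return x * rcp(y);
}
```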

...

>>> - anton

Re: Cray style vectors

<02d4b67c9c2a4807b655ba503240279e@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37471&group=comp.arch#37471

Date: Sun, 18 Feb 2024 01:03:23 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Organization: Rocksolid Light
Message-ID: <02d4b67c9c2a4807b655ba503240279e@www.novabbs.org>
 by: MitchAlsup1 - Sun, 18 Feb 2024 01:03 UTC

Tim Rentsch wrote:

> Terje Mathisen <terje.mathisen@tmsw.no> writes:

>> [...]
>>
>> int8_t sum(int len, int8_t data[])
>> {
>>     int s = 0;
>>     for (unsigned i = 0; i < len; i++) {
>>         s += data[i];
>>     }
>>     return (int8_t) s;
>> }

> The cast in the return statement is superfluous.

But the return statement is where overflow (if any) is detected.

Re: Cray style vectors

<6e35da8f1853877c83a5bd5dc9bbe94e@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37472&group=comp.arch#37472

Date: Sun, 18 Feb 2024 01:06:46 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Organization: Rocksolid Light
Message-ID: <6e35da8f1853877c83a5bd5dc9bbe94e@www.novabbs.org>
 by: MitchAlsup1 - Sun, 18 Feb 2024 01:06 UTC

BGB wrote:

> On 2/17/2024 4:08 PM, MitchAlsup1 wrote:
>> BGB wrote:
>>
>>
>>> On BJX2, there isn't currently any divide-by-zero trap, since:
>>>    This case doesn't happen in normal program execution;
>>>    Handling it with a trap would cost more than not bothering.
>>
>> This sounds like it should make your machine safe to program and use,
>> but it does not.
>>

> It is more concerned with "cheap" than "safe".

> Trap on divide-by-zero would require having a way for the divider unit
> to signal divide-by-zero has been encountered (say, so some external
> logic can raise the corresponding exception code). This is not free.

Most result buses have a bit that carries an exception indication to the
retire end of the pipeline. The retire stage looks at the bit, sees a DIV
instruction, and knows which exception was raised. FP generally needs 3
such bits on the result bus.
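A hypothetical C model of the mechanism: the divider only sets one exception bit on its result, and the retire stage combines that bit with the opcode to decide which exception to raise. The types and names (result_t, divide_unit, retire) are invented for illustration. An FP unit would carry the 3 bits mentioned above instead of one, but their encoding isn't given, so it is left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_ADD, OP_DIV } opcode_t;
typedef enum { EXC_NONE, EXC_OVERFLOW, EXC_DIV_ZERO } exc_t;

/* One result-bus entry: the value plus a single exception bit that
   travels with it to the retire end of the pipeline. */
typedef struct {
    opcode_t op;     /* which instruction produced the result */
    uint64_t value;
    bool     exc;    /* the exception bit on the bus */
} result_t;

/* Divider: on a zero divisor it just sets the bit. */
result_t divide_unit(uint64_t x, uint64_t y)
{
    result_t r = { OP_DIV, 0, false };
    if (y == 0)
        r.exc = true;
    else
        r.value = x / y;
    return r;
}

/* Retire stage: the bit says "something went wrong", the opcode says what. */
exc_t retire(result_t r)
{
    if (!r.exc)
        return EXC_NONE;
    return r.op == OP_DIV ? EXC_DIV_ZERO : EXC_OVERFLOW;
}
```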

Re: Cray style vectors

<uqro9n$mtmi$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37474&group=comp.arch#37474

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 19:59:19 -0600
Organization: A noiseless patient Spider
Message-ID: <uqro9n$mtmi$1@dont-email.me>
 by: BGB - Sun, 18 Feb 2024 01:59 UTC

On 2/17/2024 4:03 PM, MitchAlsup1 wrote:
> BGB wrote:
>
>> On 2/17/2024 3:20 AM, Terje Mathisen wrote:
>>> BGB wrote:
>>>>
>
>> But, I am not entirely sure how one would go about implementing it, as
>> VADD.H would need to do the equivalent of:
>>    MOV.Q   (R4), R16
>>    MOV.Q   (R5), R17
>>    ADD     8, R4
>>    ADD     8, R5
>>    PADD.H  R16, R17, R18
>>    MOV.Q   R18, (R6)
>>    ADD     8, R6
>> All in a single instruction.
>
> With the proper instruction set, the above is:
>
>     VEC     R9,{}
>     LDSH    R10,[R1,Ri<<1]
>     LDSH    R11,[R2,Ri<<1]
>     ADD     R12,R10,R11
>     STH     R12,[R3,Ri<<1]
>     LOOP    LT,Ri,#1,Rmax
>
> Once you see that there is no loop recurrence, then the loops can be run
> concurrently as wide as you have arithmetic capabilities and cache BW--
> in this case we have an arithmetic capability of 4 Halfword ADDs per cycle
> and a memory capability of 128-bits every cycle creating a BW of 4×5 inst
> every 1.5 cycles or 13.3 IPC and we are memory limited, not arithmetic
> limited.
>

Theoretically, yes; but how would the hardware actually work?...

Even if it lets the hardware know that a loop exists, how does the
hardware know to schedule these ops any more efficiently than by simply
looping over them?

And simply looping over them would not gain much here.

My difficulty is that I can't seem to imagine a mechanism that would
actually pull this off (at least, within the limits of a hardware class
vaguely similar to what I am dealing with).

>> Though, could be reduced if auto-increment were re-added:
>>    MOV.Q   @R4+, R16
>>    MOV.Q   @R5+, R17
>>    PADD.H  R16, R17, R18
>>    MOV.Q   R18, @R6+
>
> You will find the requisite patterns harder to recognize when the memory
> reference size is NOT the calculation size. In your case, the calculation
> is .H while memory reference is .Q .

PADD.H operates on 4x Binary16 (a 64-bit vector), so the .Q memory
reference (also 64 bits) matches it.

256 PADD.H operations would cover a vector of 1024 elements.

Though, this contrived case does suggest a possible use case for
auto-increment addressing.
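A rough C sketch of the idea, with integer lanes standing in for PADD.H's Binary16 lanes (an assumption made to keep the example exact and portable): four 16-bit adds per 64-bit word via SWAR masking, and post-incremented pointers playing the role of @Rn+ addressing. With nwords = 256 the loop covers 1024 lanes. padd16 and vadd_q are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Lane-wise add of four 16-bit integer lanes packed in a 64-bit word.
   Carries never cross lane boundaries; each lane wraps modulo 2^16. */
uint64_t padd16(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8000800080008000ULL;  /* top bit of each lane */
    uint64_t low = (a & ~H) + (b & ~H);        /* add low 15 bits per lane */
    return low ^ ((a ^ b) & H);                /* fold the top bits back in */
}

/* Post-incremented pointers mirror "MOV.Q @R4+, R16" style addressing:
   one 64-bit load per source and one 64-bit store per step. */
void vadd_q(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t nwords)
{
    while (nwords--)
        *dst++ = padd16(*a++, *b++);
}
```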

I would almost entirely dismiss this possibility, were it not for the
realization that my performance lead over RV64G (with both running on
the same pipeline, and RV64G getting dual-issue superscalar) is a lot
smaller than I would prefer.

Well, I could always drop back to a 2-cycle ALU and 3-cycle load to make
RV64G suffer more by comparison; but that would be using cheap tricks to
put RV64G at a disadvantage, which I'd rather exclude.

But I'm not quite desperate enough to re-add auto-increment: while it
could potentially help performance, it is not clear that it would be a
good thing for the ISA design as a whole.

As noted, I did end up needing to break out the "BRxx Rm, Rn, Lbl"
instructions to reclaim the lead in Dhrystone, but even then, the lead
isn't all that large...

There are also now the CMP3R instructions. The basic idea behind CMP3R
is that the CMPxx output that would normally go to SR.T is instead sent
to an output GPR (sort of like "SLT" in RISC-V, but expanded out slightly
more).
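As an illustration of why a compare that writes a GPR is handy, here is a C loop that uses a comparison result directly as a 0/1 integer (count_below is a hypothetical helper). Whether a particular compiler actually emits SLT, CMP3R, or a branch for it is up to that compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free counting: each comparison yields 0 or 1, which is exactly
   what a compare-into-register instruction produces, and is then added. */
size_t count_below(const int64_t *a, size_t n, int64_t limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(a[i] < limit);   /* 1 if true, 0 if false */
    return count;
}
```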

It looks like one other case that may need addressing is the current
inability to encode a negative immediate for AND (within a single 32-bit
instruction), which seems to be more common than originally expected:
  y=x&(~3);
and similar.

I originally assumed this case to be "almost never", but this isn't
quite right: it is a less common case, granted, but can still occur
quite often on the scale of a program. At the moment it requires either
a 64-bit encoding or putting the constant into a temporary register (and
the main spot where it might make sense to put this is already used by
the "RSUB" instruction).
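A small C sketch of why this case needs a negative immediate: ~3 is the two's-complement constant -4. The 9-bit immediate width used in the checks is an illustrative assumption, not BJX2's actual encoding, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* y = x & ~3 clears the low two bits (rounds down to a multiple of 4).
   ~3 is ...11111100, i.e. -4: small in magnitude, but negative. */
int64_t align_down4(int64_t x)
{
    return x & ~(int64_t)3;
}

/* Does v fit a zero-extended immediate field of 'bits' bits? */
bool fits_zext_imm(int64_t v, unsigned bits)
{
    return v >= 0 && v < ((int64_t)1 << bits);
}

/* Does v fit a sign-extended immediate field of 'bits' bits? */
bool fits_sext_imm(int64_t v, unsigned bits)
{
    int64_t lim = (int64_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}
```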

There are also inefficiencies due to my compiler failing to realize
that NULL is basically equivalent to a constant 0 (and so does not need
to be handled using temporary registers and the general-purpose
mechanisms one generally needs for "real" pointers). I tried poking at
this case before, but then stuff started breaking.

But, I guess, at least Doom is getting a little bit faster, and is
gradually spending more of its time up against the 35 fps limiter.

...

Re: Cray style vectors

<86h6i6xvxq.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=37475&group=comp.arch#37475

From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sat, 17 Feb 2024 20:01:21 -0800
Organization: A noiseless patient Spider
Message-ID: <86h6i6xvxq.fsf@linuxsc.com>
 by: Tim Rentsch - Sun, 18 Feb 2024 04:01 UTC

mitchalsup@aol.com (MitchAlsup1) writes:

> Tim Rentsch wrote:
>
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>
>>> [...]
>>>
>>> int8_t sum(int len, int8_t data[])
>>> {
>>>     int s = 0;
>>>     for (unsigned i = 0; i < len; i++) {
>>>         s += data[i];
>>>     }
>>>     return (int8_t) s;
>>> }
>>
>> The cast in the return statement is superfluous.
>
> But the return statement is where overflow (if any) is detected.

The cast is superfluous because a conversion to int8_t will be
done in any case, since the return type of the function is
int8_t.
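A C sketch of this point (sum_cast and sum_implicit are hypothetical names; len is changed to size_t to avoid the int/unsigned comparison in the quoted loop). Both return statements perform the same conversion to int8_t; if s is out of int8_t's range, the result is implementation-defined in either form. With GCC's -Wconversion, the implicit form is the one expected to draw a warning.

```c
#include <stddef.h>
#include <stdint.h>

int8_t sum_cast(size_t len, const int8_t data[])
{
    int s = 0;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    return (int8_t) s;   /* explicit conversion */
}

int8_t sum_implicit(size_t len, const int8_t data[])
{
    int s = 0;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    return s;            /* converted to int8_t as if by assignment */
}
```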

Re: Cray style vectors

<2024Feb18.084713@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37476&group=comp.arch#37476

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sun, 18 Feb 2024 07:47:13 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Feb18.084713@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Sun, 18 Feb 2024 07:47 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> On the third (i.e., gripping) hand you could have a language like Java
>>> where it would be illegal to transform a temporarily trapping loop into
>>> one that would not trap and give the mathematically correct answer.
>>
>> What "temporarily trapping loop" and "mathematically correct answer"?
...
>I was specifically talking about adding a bunch of integers together,
>some positive and some negative, so that by doing them in program order
>you will get an overflow, but if you did them in some other order, or
>with a double-wide accumulator, the final result would in fact fit in
>the designated target variable.

As mentioned, Java defines addition of the integral base types to use
modulo (aka wrapping) arithmetic, i.e., overflow is fully defined with
nice properties. In particular, the associative law holds for modulo
addition, which allows all kinds of reassociations that are helpful
for parallelizing reductions.
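A C sketch of that property, using uint32_t as a stand-in for Java's int since unsigned C arithmetic is also defined as modulo 2^32 (function names are hypothetical). Read as Java ints, 0xFFFFFFFE is -2 and 0x80000000 is the most negative value; the sample data overflows at intermediate steps, yet the strict left-to-right sum and a four-way reassociated sum agree.

```c
#include <stddef.h>
#include <stdint.h>

/* Strict left-to-right reduction, wrapping modulo 2^32. */
uint32_t sum_in_order(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four interleaved partial sums combined at the end, roughly what a
   vectorizing compiler would do; this reassociates the additions. */
uint32_t sum_4way(const uint32_t *a, size_t n)
{
    uint32_t p0 = 0, p1 = 0, p2 = 0, p3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        p0 += a[i];
        p1 += a[i + 1];
        p2 += a[i + 2];
        p3 += a[i + 3];
    }
    for (; i < n; i++)
        p0 += a[i];                  /* leftover elements */
    return (p0 + p1) + (p2 + p3);
}
```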

>int8_t sum(int len, int8_t data[])
>{
>    int8_t s = 0;
>    for (unsigned i = 0; i < len; i++) {
>        s += data[i];
>    }
>    return s;
>}
>
>will overflow if called with data = [127, 1, -2], right?

I don't think that int8_t or unsigned are Java types.

If that is C code: C standard lawyers will tell you what the C
standard says about storing 128 into s (the addition itself does not
overflow, because it uses ints).
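A C sketch of the quoted int8_t-accumulator loop with the narrowing made explicit and portable (wrap8 and sum8 are hypothetical names, and len is a size_t here). For [127, 1, -2], 127+1 = 128 wraps to -128, then -128 + -2 = -130 wraps to 126, which is the mathematically correct sum, matching the remark that wrapping arithmetic gives the right answer in this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce an int to the int8_t value congruent to it modulo 256, without
   relying on the implementation-defined conversion of out-of-range values. */
int8_t wrap8(int v)
{
    int r = ((v % 256) + 256) % 256;   /* 0..255 */
    return (int8_t)(r > 127 ? r - 256 : r);
}

/* Each s + data[i] is computed in int; only the store back into s wraps. */
int8_t sum8(size_t len, const int8_t data[])
{
    int8_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = wrap8(s + data[i]);
    return s;
}
```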

>For this particular example, you would also get the correct answer with
>wrapping arithmetic, even if that by default is UB in modern C.

The standardized subset of C is not relevant for discussing Java.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cray style vectors

<2024Feb18.090018@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37477&group=comp.arch#37477

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sun, 18 Feb 2024 08:00:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Feb18.090018@mips.complang.tuwien.ac.at>
 by: Anton Ertl - Sun, 18 Feb 2024 08:00 UTC

BGB <cr88192@gmail.com> writes:
>Well, except that C will often trap for things like divide by zero and
>similar, at least on x86.

The division instructions of IA-32 and AMD64 trap on divide-by-zero
and when the result is out of range. Unsurprisingly, C compilers
usually use these instructions when compiling division on these
architectures. One interesting case is what C compilers do when you
write

long foo(long x)
{ return x/-1;
}

Both gcc and clang compile this to:

0: 48 89 f8 mov %rdi,%rax
3: 48 f7 d8 neg %rax
6: c3 retq

and you don't get a trap when you call foo(LONG_MIN), while you would
if the compiler did not know that the divisor is -1 (and it was -1 at
run-time).

By contrast, when I implemented division-by-constant optimization in
Gforth, I decided not to "optimize" the division by -1 case, so you get
the ordinary division operation and its behaviour. If a programmer
codes a division by -1 rather than just NEGATE, they probably want
something other than NEGATE.

>Though, off-hand, I don't remember whether or
>not JVM throws an exception on divide-by-zero.

Reading up on Java,
<https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.17.2>
says:

|if the dividend is the negative integer of largest possible magnitude
|for its type, and the divisor is -1, then integer overflow occurs and
|the result is equal to the dividend. Despite the overflow, no
|exception is thrown in this case. On the other hand, if the value of
|the divisor in an integer division is 0, then an ArithmeticException
|is thrown.

I expect that the JVM has matching wording.

So on, e.g., AMD64 the JVM has to generate code that catches the
long_min/-1 case and produces long_min rather than just generating the
divide instruction. Alternatively, the generated code could just
produce a division instruction, and the signal handler (on Unix) or
equivalent could then check whether the divisor was 0 (and then throw an
ArithmeticException) or -1 (and then produce a long_min result and
continue execution).
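A C sketch of the first approach, explicit checks before dividing, following the JLS wording quoted above. java_ldiv is a hypothetical name, and the false return stands in for throwing ArithmeticException. The y == -1 branch also sidesteps the undefined LONG_MIN / -1 case in C.

```c
#include <stdbool.h>
#include <stdint.h>

bool java_ldiv(int64_t x, int64_t y, int64_t *q)
{
    if (y == 0)
        return false;            /* Java would throw ArithmeticException */
    if (y == -1) {
        /* plain negation, except the one value whose negation overflows:
           JLS says the result is then equal to the dividend */
        *q = (x == INT64_MIN) ? INT64_MIN : -x;
        return true;
    }
    *q = x / y;                  /* C and Java both truncate toward zero */
    return true;
}
```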

- anton

Re: Cray style vectors

<uqsern$uurm$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37478&group=comp.arch#37478

From: ifonly@youknew.org (Opus)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sun, 18 Feb 2024 09:24:23 +0100
Organization: A noiseless patient Spider
Message-ID: <uqsern$uurm$1@dont-email.me>
 by: Opus - Sun, 18 Feb 2024 08:24 UTC

On 18/02/2024 05:01, Tim Rentsch wrote:
> mitchalsup@aol.com (MitchAlsup1) writes:
>
>> Tim Rentsch wrote:
>>
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>
>>>> [...]
>>>>
>>>> int8_t sum(int len, int8_t data[])
>>>> {
>>>> int s = 0;
>>>> for (unsigned i = 0; i < len; i++) {
>>>> s += data[i];
>>>> }
>>>> return (int8_t) s;
>>>> }
>>>
>>> The cast in the return statement is superfluous.
>>
>> But the return statement is where overflow (if any) is detected.
>
> The cast is superfluous because a conversion to int8_t will be
> done in any case, since the return type of the function is
> int8_t.

Of course the conversion will be done implicitly. C converts almost
anything implicitly. Not that this is its greatest feature.

The explicit cast is still useful: 1/ to express intent (it shows that
the potential loss of data is intentional), and 2/ to avoid compiler
warnings (if you enable -Wconversion, which I usually recommend) or
warnings from any serious static analyzer (which I also highly recommend
using).
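For reference, here is a compilable version of the function under discussion, with the missing `;` in the `for` header restored. As an adaptation beyond the quoted code, the length and index are `size_t` rather than `int`/`unsigned`, which avoids a signed/unsigned comparison. The explicit cast is kept for the reasons given above.

```c
#include <stddef.h>
#include <stdint.h>

/* Adapted from the quoted example. Each term is at most 128 in
   magnitude, so the int accumulator cannot overflow as long as
   len <= INT_MAX / 128. The cast marks the final narrowing as
   deliberate; the result is implementation-defined if s is
   outside the int8_t range. */
int8_t sum(size_t len, const int8_t data[])
{
    int s = 0;
    for (size_t i = 0; i < len; i++) {
        s += data[i];
    }
    return (int8_t) s;
}
```

If you compile with `-Wconversion` and remove the cast, gcc and clang will typically warn about the implicit `int` to `int8_t` conversion in the `return`.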

Re: Cray style vectors

<uqsm01$10b29$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37483&group=comp.arch#37483

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sun, 18 Feb 2024 11:26:09 +0100
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <uqsm01$10b29$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me> <uqge2p$279ql$1@dont-email.me>
<uqhiqb$2grub$1@dont-email.me> <uqlm2c$3e9bp$1@dont-email.me>
<uqmn7c$3n35k$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at>
<uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at>
<uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me>
<86y1biy874.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 18 Feb 2024 10:26:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="eb190a48bde03634b799bcfc94f5bb8b";
logging-data="1059913"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LTpP/0kd1ok8W3IL71/reDh24eKT5f3BhswG3vrdq8Q=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18.1
Cancel-Lock: sha1:BtHe3SzXjuTu2ohWrSIUG6rX8T4=
In-Reply-To: <86y1biy874.fsf@linuxsc.com>
 by: Terje Mathisen - Sun, 18 Feb 2024 10:26 UTC

Tim Rentsch wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>
>> [...]
>>
>> int8_t sum(int len, int8_t data[])
>> {
>> int s = 0;
>> for (unsigned i = 0; i < len; i++) {
>> s += data[i];
>> }
>> return (int8_t) s;
>> }
>
> The cast in the return statement is superfluous.
>
I am normally writing Rust these days, where UB is far less common, but
casts like this are mandatory.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Cray style vectors

<g_pAN.85204$Sf59.5937@fx48.iad>

https://news.novabbs.org/devel/article-flat.php?id=37488&group=comp.arch#37488

Path: i2pn2.org!i2pn.org!news.hispagatos.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx48.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Cray style vectors
Newsgroups: comp.arch
References: <upq0cr$6b5m$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at> <uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <86y1biy874.fsf@linuxsc.com> <02d4b67c9c2a4807b655ba503240279e@www.novabbs.org> <86h6i6xvxq.fsf@linuxsc.com>
Lines: 30
Message-ID: <g_pAN.85204$Sf59.5937@fx48.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 18 Feb 2024 16:10:52 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 18 Feb 2024 16:10:52 GMT
X-Received-Bytes: 1824
 by: Scott Lurndal - Sun, 18 Feb 2024 16:10 UTC

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>mitchalsup@aol.com (MitchAlsup1) writes:
>
>> Tim Rentsch wrote:
>>
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>
>>>> [...]
>>>>
>>>> int8_t sum(int len, int8_t data[])
>>>> {
>>>> int s = 0;
>>>> for (unsigned i = 0; i < len; i++) {
>>>> s += data[i];
>>>> }
>>>> return (int8_t) s;
>>>> }
>>>
>>> The cast in the return statement is superfluous.
>>
>> But the return statement is where overflow (if any) is detected.
>
>The cast is superfluous because a conversion to int8_t will be
>done in any case, since the return type of the function is
>int8_t.

I suspect most experienced C programmers know that.

Yet, the 'superfluous' cast is also documentation that the
programmer _intended_ that the return value would be narrowed.

Re: Cray style vectors

<8a42db6eafb8ebf769763a572bd9cc3a@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37491&group=comp.arch#37491

Date: Sun, 18 Feb 2024 17:48:09 +0000
Subject: Re: Cray style vectors
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$1pm.uJesdYdvaXiBE.Sk3OIAnyDlPxgrhTxe58w2uNC4FBjP4pOje
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <upq0cr$6b5m$1@dont-email.me> <uqge2p$279ql$1@dont-email.me> <uqhiqb$2grub$1@dont-email.me> <uqlm2c$3e9bp$1@dont-email.me> <uqmn7c$3n35k$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at> <uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <86y1biy874.fsf@linuxsc.com> <02d4b67c9c2a4807b655ba503240279e@www.novabbs.org> <86h6i6xvxq.fsf@linuxsc.com>
Organization: Rocksolid Light
Message-ID: <8a42db6eafb8ebf769763a572bd9cc3a@www.novabbs.org>
 by: MitchAlsup1 - Sun, 18 Feb 2024 17:48 UTC

Tim Rentsch wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:

>> Tim Rentsch wrote:
>>
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>
>>>> [...]
>>>>
>>>> int8_t sum(int len, int8_t data[])
>>>> {
>>>> int s = 0;
>>>> for (unsigned i = 0; i < len; i++) {
>>>> s += data[i];
>>>> }
>>>> return (int8_t) s;
>>>> }
>>>
>>> The cast in the return statement is superfluous.
>>
>> But the return statement is where overflow (if any) is detected.

> The cast is superfluous because a conversion to int8_t will be
> done in any case, since the return type of the function is
> int8_t.

Missing my point:: which was::

The summation loop will not overflow, and overflow is detected at
the smash from int to int8_t.
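The point can be made concrete. Because the summation loop cannot overflow, a single range check at the narrowing step is enough to detect a final result that does not fit. The sketch below uses a hypothetical name, `sum_int8_checked`, and assumes `len <= INT_MAX / 128`. Note that it reports only whether the final sum fits. Intermediate excursions outside the int8_t range are deliberately ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulate in int (no overflow possible for
   len <= INT_MAX / 128), then detect overflow once, at the narrowing
   from int to int8_t. Returns false if the sum does not fit. */
static bool sum_int8_checked(size_t len, const int8_t data[], int8_t *out)
{
    int s = 0;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    if (s < INT8_MIN || s > INT8_MAX)
        return false;          /* final value is out of int8_t range */
    *out = (int8_t) s;         /* in range, so the conversion is exact */
    return true;
}
```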

Re: Cray style vectors

<20240218201405.00000226@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37493&group=comp.arch#37493

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sun, 18 Feb 2024 20:14:05 +0200
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <20240218201405.00000226@yahoo.com>
References: <upq0cr$6b5m$1@dont-email.me>
<uqge2p$279ql$1@dont-email.me>
<uqhiqb$2grub$1@dont-email.me>
<uqlm2c$3e9bp$1@dont-email.me>
<uqmn7c$3n35k$1@dont-email.me>
<2024Feb16.082736@mips.complang.tuwien.ac.at>
<uqnmue$3o4m9$1@dont-email.me>
<2024Feb16.152320@mips.complang.tuwien.ac.at>
<uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me>
<uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at>
<uqr6ve$jfij$1@dont-email.me>
<2024Feb18.090018@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="f10e0f7ab1a9cf2f273426e58a091970";
logging-data="1306548"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18gwzsDZPO/IZ2ZSAnvuxo/Rpn1MXttAkg="
Cancel-Lock: sha1:1juBI5O9XhhqSBrrr1VBcUYUMDs=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Sun, 18 Feb 2024 18:14 UTC

On Sun, 18 Feb 2024 08:00:18 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> BGB <cr88192@gmail.com> writes:
> >Well, except that C will often trap for things like divide by zero
> >and similar, at least on x86.
>
> The division instructions of IA-32 and AMD64 trap on divide-by-zero
> and when the result is out of range. Unsurprisingly, C compilers
> usually use these instructions when compiling division on these
> architectures. One interesting case is what C compilers do when you
> write
>
> long foo(long x)
> {
> return x/-1;
> }
>
> Both gcc and clang compile this to
>
> 0: 48 89 f8 mov %rdi,%rax
> 3: 48 f7 d8 neg %rax
> 6: c3 retq
>
> and you don't get a trap when you call foo(LONG_MIN), while you would
> if the compiler did not know that the divisor is -1 (and it was -1 at
> run-time).
>
> By contrast, when I implemented division-by-constant optimization in
> Gforth, I decided not to "optimize" the division by -1 case, so you get
> the ordinary division operation and its behaviour. If a programmer
> codes a division by -1 rather than just NEGATE, they probably want
> something other than NEGATE.
>
> >Though, off-hand, I don't remember whether or
> >not JVM throws an exception on divide-by-zero.
>
> Reading up on Java,
> <https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.17.2>
> says:
>
> |if the dividend is the negative integer of largest possible magnitude
> |for its type, and the divisor is -1, then integer overflow occurs and
> |the result is equal to the dividend. Despite the overflow, no
> |exception is thrown in this case. On the other hand, if the value of
> |the divisor in an integer division is 0, then an ArithmeticException
> |is thrown.
>
> I expect that the JVM has matching wording.
>
> So on, e.g., AMD64 the JVM has to generate code that catches the
> long_min/-1 case and produces long_min rather than just generating the
> divide instruction. Alternatively, the generated code could just
> produce a division instruction, and the signal handler (on Unix) or
> equivalent could then check if the divisor was 0 (and then throw an
> ArithmeticException) or -1 (and then produce a long_min result and
> continue execution).
>
> - anton

I don't understand why the case of LONG_MIN/-1 would require special
handling. IMHO, the regular AMD64 64-bit integer division sequence,
i.e. CQO followed by IDIV, will produce the result expected by the Java
spec without any overflow.

Re: Cray style vectors

<2024Feb18.234008@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37501&group=comp.arch#37501

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Sun, 18 Feb 2024 22:40:08 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 55
Message-ID: <2024Feb18.234008@mips.complang.tuwien.ac.at>
References: <upq0cr$6b5m$1@dont-email.me> <uqmn7c$3n35k$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at> <uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqr6ve$jfij$1@dont-email.me> <2024Feb18.090018@mips.complang.tuwien.ac.at> <20240218201405.00000226@yahoo.com>
Injection-Info: dont-email.me; posting-host="c083118b4b3258bb2f179aec3491fd82";
logging-data="1569570"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+6lKgCLr3WmrmB4F9v1raP"
Cancel-Lock: sha1:+DGtH2rOeNNfxzF6QaiaXuzVsW0=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 18 Feb 2024 22:40 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Sun, 18 Feb 2024 08:00:18 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> Reading up on Java,
>> <https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.17.2>
>> says:
>>
>> |if the dividend is the negative integer of largest possible magnitude
>> |for its type, and the divisor is -1, then integer overflow occurs and
>> |the result is equal to the dividend. Despite the overflow, no
>> |exception is thrown in this case. On the other hand, if the value of
>> |the divisor in an integer division is 0, then an ArithmeticException
>> |is thrown.
>>
>> I expect that the JVM has matching wording.
>>
>> So on, e.g., AMD64 the JVM has to generate code that catches the
>> long_min/-1 case and produces long_min rather than just generating the
>> divide instruction. Alternatively, the generated code could just
>> produce a division instruction, and the signal handler (on Unix) or
>> equivalent could then check if the divisor was 0 (and then throw an
>> ArithmeticException) or -1 (and then produce a long_min result and
>> continue execution).
>>
>> - anton
>
>I don't understand why case of LONG_MIN/-1 would possibly require
>special handling. IMHO, regular AMD64 64-bit integer division sequence,
>i.e. CQO followed by IDIV, will produce result expected by Java spec
>without any overflow.

Try it. E.g., in gforth-fast /S performs this sequence:

see /s
Code /s
0x00005614dd33562d <gforth_engine+3213>: add $0x8,%rbx
0x00005614dd335631 <gforth_engine+3217>: mov 0x8(%r13),%rax
0x00005614dd335635 <gforth_engine+3221>: add $0x8,%r13
0x00005614dd335639 <gforth_engine+3225>: cqto
0x00005614dd33563b <gforth_engine+3227>: idiv %r8
0x00005614dd33563e <gforth_engine+3230>: mov %rax,%r8
0x00005614dd335641 <gforth_engine+3233>: mov (%rbx),%rax
0x00005614dd335644 <gforth_engine+3236>: jmp *%rax
end-code

And when I divide LONG_MIN by -1, I get a trap:

$8000000000000000 -1 /s
*the terminal*:12:22: error: Division by zero
$8000000000000000 -1 >>>/s<<<

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cray style vectors

<20240219012009.00001e47@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37502&group=comp.arch#37502

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Mon, 19 Feb 2024 01:20:09 +0200
Organization: A noiseless patient Spider
Lines: 61
Message-ID: <20240219012009.00001e47@yahoo.com>
References: <upq0cr$6b5m$1@dont-email.me>
<uqmn7c$3n35k$1@dont-email.me>
<2024Feb16.082736@mips.complang.tuwien.ac.at>
<uqnmue$3o4m9$1@dont-email.me>
<2024Feb16.152320@mips.complang.tuwien.ac.at>
<uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me>
<uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at>
<uqr6ve$jfij$1@dont-email.me>
<2024Feb18.090018@mips.complang.tuwien.ac.at>
<20240218201405.00000226@yahoo.com>
<2024Feb18.234008@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="5132bb87c0a9018f5dda277557f1da9c";
logging-data="1509593"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/A7L2xwVNl4utiqoW483lQhvt9xwGwbeU="
Cancel-Lock: sha1:NABa4Y+KIFLXwKExTHo1+kzps2I=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Sun, 18 Feb 2024 23:20 UTC

On Sun, 18 Feb 2024 22:40:08 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Michael S <already5chosen@yahoo.com> writes:
> >On Sun, 18 Feb 2024 08:00:18 GMT
> >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >> Reading up on Java,
> >> <https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.17.2>
> >> says:
> >>
> >> |if the dividend is the negative integer of largest possible
> >> magnitude |for its type, and the divisor is -1, then integer
> >> overflow occurs and |the result is equal to the dividend. Despite
> >> the overflow, no |exception is thrown in this case. On the other
> >> hand, if the value of |the divisor in an integer division is 0,
> >> then an ArithmeticException |is thrown.
> >>
> >> I expect that the JVM has matching wording.
> >>
> >> So on, e.g., AMD64 the JVM has to generate code that catches the
> >> long_min/-1 case and produces long_min rather than just generating
> >> the divide instruction. Alternatively, the generated code could
> >> just produce a division instruction, and the signal handler (on
> >> Unix) or equivalent could then check if the divisor was 0 (and
> >> then throw an ArithmeticException) or -1 (and then produce a
> >> long_min result and continue execution).
> >>
> >> - anton
> >
> >I don't understand why case of LONG_MIN/-1 would possibly require
> >special handling. IMHO, regular AMD64 64-bit integer division
> >sequence, i.e. CQO followed by IDIV, will produce result expected by
> >Java spec without any overflow.
>
> Try it. E.g., in gforth-fast /S performs this sequence:
>
> see /s
> Code /s
> 0x00005614dd33562d <gforth_engine+3213>: add $0x8,%rbx
> 0x00005614dd335631 <gforth_engine+3217>: mov 0x8(%r13),%rax
> 0x00005614dd335635 <gforth_engine+3221>: add $0x8,%r13
> 0x00005614dd335639 <gforth_engine+3225>: cqto
> 0x00005614dd33563b <gforth_engine+3227>: idiv %r8
> 0x00005614dd33563e <gforth_engine+3230>: mov %rax,%r8
> 0x00005614dd335641 <gforth_engine+3233>: mov (%rbx),%rax
> 0x00005614dd335644 <gforth_engine+3236>: jmp *%rax
> end-code
>
> And when I divide LONG_MIN by -1, I get a trap:
>
> $8000000000000000 -1 /s
> *the terminal*:12:22: error: Division by zero
> $8000000000000000 -1 >>>/s<<<
>
> - anton

You are right.
LONG_MIN/1 works, but LONG_MIN/-1 crashes, to my surprise.
Seems like I didn't RTFM with regard to IDIV for too many years.

Re: Cray style vectors

<2024Feb19.100623@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37510&group=comp.arch#37510

Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Mon, 19 Feb 2024 09:06:23 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 11
Message-ID: <2024Feb19.100623@mips.complang.tuwien.ac.at>
References: <upq0cr$6b5m$1@dont-email.me> <uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqr6ve$jfij$1@dont-email.me> <2024Feb18.090018@mips.complang.tuwien.ac.at> <20240218201405.00000226@yahoo.com> <2024Feb18.234008@mips.complang.tuwien.ac.at> <20240219012009.00001e47@yahoo.com>
Injection-Info: dont-email.me; posting-host="71832a8f79071e6b9f8d19fb193e167d";
logging-data="1896738"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YuzMALzGxUerR642ndsjz"
Cancel-Lock: sha1:2Wt6jCaYR3PJDOJHeJZl1Q7wu9o=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 19 Feb 2024 09:06 UTC

Michael S <already5chosen@yahoo.com> writes:
>LONG_MIN/1 works, but LONG_MIN/-1 crashes, to my surprise.
>Seems like I didn't RTFM with regard to IDIV for too many years.

The result of LONG_MIN/1 is LONG_MIN, which is in range, while the
result of LONG_MIN/-1 is LONG_MAX+1, which is not in range.
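The arithmetic below assumes a 64-bit `long`, as on AMD64 Linux. On x86, IDIV raises the same divide-error exception (#DE) both for a zero divisor and for a quotient that does not fit. That behaviour is consistent with gforth reporting "Division by zero" for this case.

```latex
\begin{aligned}
\texttt{LONG\_MIN} &= -2^{63}, \qquad \texttt{LONG\_MAX} = 2^{63} - 1 \\
\texttt{LONG\_MIN} / 1 &= -2^{63} = \texttt{LONG\_MIN} && \text{(representable)} \\
\texttt{LONG\_MIN} / (-1) &= 2^{63} = \texttt{LONG\_MAX} + 1 && \text{(out of range)}
\end{aligned}
```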

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Cray style vectors

<uqvk2o$1snbf$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37511&group=comp.arch#37511

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Mon, 19 Feb 2024 14:11:51 +0100
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <uqvk2o$1snbf$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me> <uqge2p$279ql$1@dont-email.me>
<uqhiqb$2grub$1@dont-email.me> <uqlm2c$3e9bp$1@dont-email.me>
<uqmn7c$3n35k$1@dont-email.me> <2024Feb16.082736@mips.complang.tuwien.ac.at>
<uqnmue$3o4m9$1@dont-email.me> <2024Feb16.152320@mips.complang.tuwien.ac.at>
<uqobhv$3o4m9$2@dont-email.me>
<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
<uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
<2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 19 Feb 2024 13:11:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8f7bfe56a2f930bb381d80e54878b81f";
logging-data="1989999"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19XD6xWQ0gDklg3JwW1eQ18sJzneSQ+HW0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:C+ReVoew3aWYcH8WLBVYIq/YPmI=
In-Reply-To: <uqqvkc$i2cu$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Mon, 19 Feb 2024 13:11 UTC

On 17/02/2024 19:58, Terje Mathisen wrote:
> Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> On the third (i.e gripping) hand you could have a language like Java
>>> where it would be illegal to transform a temporarily trapping loop
>>> into one that would not trap and give the mathematically correct answer.
>>
>> What "temporarily trapping loop" and "mathematically correct answer"?
>>
>> If you are talking about integer arithmetic, the limited integers in
>> Java have modulo semantics, i.e., they don't trap, and BigIntegers
>> certainly
>> don't trap.
>>
>> If you are talking about FP (like I did), by default FP addition does
>> not trap in Java, and any mention of "mathematically correct" in
>> connection with FP needs a lot of further elaboration.
>
> Sorry to be unclear:

I haven't really been following this thread, but there are a few things
here that stand out to me - at least as long as we are talking about C.

>
> I was specifically talking about adding a bunch of integers together,
> some positive and some negative, so that by doing them in program order
> you will get an overflow, but if you did them in some other order, or
> with a double-wide accumulator, the final result would in fact fit in
> the designated target variable.
>
> int8_t sum(int len, int8_t data[])
> {
>     int8_t s = 0;
>     for (unsigned i = 0; i < len; i++) {
>         s += data[i];
>     }
>     return s;
> }
>
> will overflow if called with data = [127, 1, -2], right?

No. In C, int8_t values will be promoted to "int" (which is always at
least 16 bits, on any target) and the operation will therefore not
overflow. The conversion of the result of "s + data[i]" from int to
int8_t, implicit in the assignment, also cannot "overflow" since that
term applies only to the evaluation of operators. But if this value is
outside the range for int8_t, then the conversion is
implementation-defined behaviour. (That is unlike signed integer
overflow, which is undefined behaviour.)
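Here is a small illustration of the promotion rule. The helper name `add_promoted` is made up for this sketch.

```c
#include <stdint.h>

/* Both int8_t operands are promoted to int before the addition, so
   127 + 1 is computed as the int value 128 and nothing overflows.
   Only storing the result back into an int8_t would involve an
   (implementation-defined) out-of-range conversion. */
static int add_promoted(int8_t a, int8_t b)
{
    return a + b;   /* the expression a + b has type int */
}
```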

All real-life implementations will define the conversion as
modulo/truncation/wrapping, however you prefer to think of it, though it
is not specified in the standards.
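You may want the wrapping result without depending on implementation-defined behaviour. One option is to do the reduction in unsigned arithmetic, which C defines to wrap, so that the final conversion to `int8_t` is always in range. This is a sketch, not a standard library facility, and `wrap_to_int8` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: reduce v modulo 2^8 into the int8_t range
   using only well-defined conversions. (unsigned) v is v modulo
   UINT_MAX + 1, a multiple of 256, so masking gives v mod 256. */
static int8_t wrap_to_int8(int v)
{
    unsigned u = (unsigned) v & 0xFFu;                 /* 0..255 */
    return (int8_t) (u < 128u ? (int) u : (int) u - 256);
}
```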

>
> while if you implement it with
>
> int8_t sum(int len, int8_t data[])
> {
>     int s = 0;
>     for (unsigned i = 0; i < len; i++) {
>         s += data[i];
>     }
>     return (int8_t) s;
> }
>
> then you would be OK, and the final result would be mathematically correct.

Converting the "int" to "int8_t" will give the correct value whenever it
is in the range of int8_t. But if we assume that the implementation
does out-of-range conversions as two's complement wrapping, then the
result will be the same no matter when the modulo operations are done.
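To check this on Terje's input, here are both quoted versions with the missing `;` restored. As an adaptation, the indices are `size_t`. The sketch assumes the implementation wraps out-of-range conversions to `int8_t` modulo 2^8, as gcc documents. Under that assumption the narrow and wide accumulators agree.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow accumulator: each s + data[i] is computed in int and then
   converted back to int8_t (wrapping, by assumption) at every step. */
static int8_t sum_narrow(size_t len, const int8_t data[])
{
    int8_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    return s;
}

/* Wide accumulator: exact sum in int, narrowed once at the end. */
static int8_t sum_wide(size_t len, const int8_t data[])
{
    int s = 0;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    return (int8_t) s;
}
```

For {127, 1, -2} both return 126. For {100, 100, 100} both return 44, which is 300 reduced modulo 256.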

>
> For this particular example, you would also get the correct answer with
> wrapping arithmetic, even if that by default is UB in modern C.
>

There's no UB in either case, only IB (implementation-defined behaviour).

