Rocksolid Light



devel / comp.arch / Re: Solving the Floating-Point Conundrum

Subject  Author
* Solving the Floating-Point Conundrum  Quadibloc
+* Re: Solving the Floating-Point Conundrum  Stephen Fuld
|+* Re: Solving the Floating-Point Conundrum  Quadibloc
||+- Re: Solving the Floating-Point Conundrum  John Levine
||`- Re: Solving the Floating-Point Conundrum  Stephen Fuld
|`* Re: Solving the Floating-Point Conundrum  mac
| `- Re: Solving the Floating-Point Conundrum  Thomas Koenig
+* Re: Solving the Floating-Point Conundrum  MitchAlsup
|+* Re: Solving the Floating-Point Conundrum  Quadibloc
||+* Re: Solving the Floating-Point Conundrum  MitchAlsup
|||`* Re: Solving the Floating-Point Conundrum  Quadibloc
||| `* Re: Solving the Floating-Point Conundrum  MitchAlsup
|||  `- Re: Solving the Floating-Point Conundrum  Quadibloc
||`- Re: Solving the Floating-Point Conundrum  John Dallman
|+- Re: Solving the Floating-Point Conundrum  Scott Lurndal
|`* Re: Solving the Floating-Point Conundrum  Quadibloc
| +* Re: Solving the Floating-Point Conundrum  MitchAlsup
| |`* Re: Solving the Floating-Point Conundrum  BGB
| | +* Re: Solving the Floating-Point Conundrum  Scott Lurndal
| | |+* Re: Solving the Floating-Point Conundrum  Quadibloc
| | ||+* Re: Solving the Floating-Point Conundrum  MitchAlsup
| | |||`- Re: Solving the Floating-Point Conundrum  Terje Mathisen
| | ||`* Re: Solving the Floating-Point Conundrum  BGB
| | || `* Re: Solving the Floating-Point Conundrum  Stephen Fuld
| | ||  `* Re: Solving the Floating-Point Conundrum  Scott Lurndal
| | ||   `- Re: Solving the Floating-Point Conundrum  MitchAlsup
| | |`* Re: Solving the Floating-Point Conundrum  Thomas Koenig
| | | `* Re: memory speeds, Solving the Floating-Point Conundrum  John Levine
| | |  +- Re: memory speeds, Solving the Floating-Point Conundrum  Quadibloc
| | |  +* Re: memory speeds, Solving the Floating-Point Conundrum  Scott Lurndal
| | |  |+* Re: memory speeds, Solving the Floating-Point Conundrum  MitchAlsup
| | |  ||+* Re: memory speeds, Solving the Floating-Point Conundrum  EricP
| | |  |||+* Re: memory speeds, Solving the Floating-Point Conundrum  Scott Lurndal
| | |  ||||`* Re: memory speeds, Solving the Floating-Point Conundrum  EricP
| | |  |||| `- Re: memory speeds, Solving the Floating-Point Conundrum  Scott Lurndal
| | |  |||+- Re: memory speeds, Solving the Floating-Point Conundrum  Quadibloc
| | |  |||+* Re: memory speeds, Solving the Floating-Point Conundrum  John Levine
| | |  ||||`* Re: memory speeds, Solving the Floating-Point Conundrum  EricP
| | |  |||| `- Re: memory speeds, Solving the Floating-Point Conundrum  MitchAlsup
| | |  |||+- Re: memory speeds, Solving the Floating-Point Conundrum  MitchAlsup
| | |  |||`- Re: memory speeds, Solving the Floating-Point Conundrum  MitchAlsup
| | |  ||`* Re: memory speeds, Solving the Floating-Point Conundrum  Timothy McCaffrey
| | |  || `- Re: memory speeds, Solving the Floating-Point Conundrum  MitchAlsup
| | |  |`* Re: memory speeds, Solving the Floating-Point Conundrum  Quadibloc
| | |  | +- Re: memory speeds, Solving the Floating-Point Conundrum  MitchAlsup
| | |  | `- Re: memory speeds, Solving the Floating-Point Conundrum  moi
| | |  `* Re: memory speeds, Solving the Floating-Point Conundrum  Anton Ertl
| | |   +* Re: memory speeds, Solving the Floating-Point Conundrum  Michael S
| | |   |+* Re: memory speeds, Solving the Floating-Point Conundrum  John Levine
| | |   ||+- Re: memory speeds, Solving the Floating-Point Conundrum  Lynn Wheeler
| | |   ||`* Re: memory speeds, Solving the Floating-Point Conundrum  Anton Ertl
| | |   || +- Re: memory speeds, Solving the Floating-Point Conundrum  EricP
| | |   || `- Re: memory speeds, Solving the Floating-Point Conundrum  John Levine
| | |   |`* Re: memory speeds, Solving the Floating-Point Conundrum  Anton Ertl
| | |   | `- Re: memory speeds, Solving the Floating-Point Conundrum  Stephen Fuld
| | |   `* Re: memory speeds, Solving the Floating-Point Conundrum  Thomas Koenig
| | |    `- Re: memory speeds, Solving the Floating-Point Conundrum  Anton Ertl
| | +* Re: Solving the Floating-Point Conundrum  Quadibloc
| | |`* Re: Solving the Floating-Point Conundrum  BGB
| | | `- Re: Solving the Floating-Point Conundrum  Stephen Fuld
| | +- Re: Solving the Floating-Point Conundrum  MitchAlsup
| | `- Re: Solving the Floating-Point Conundrum  MitchAlsup
| +* Re: Solving the Floating-Point Conundrum  Quadibloc
| |`* Re: Solving the Floating-Point Conundrum  Quadibloc
| | `* Re: Solving the Floating-Point Conundrum  BGB
| |  `- Re: Solving the Floating-Point Conundrum  Scott Lurndal
| `* Re: Solving the Floating-Point Conundrum  Timothy McCaffrey
|  +- Re: Solving the Floating-Point Conundrum  Scott Lurndal
|  +- Re: Solving the Floating-Point Conundrum  Stephen Fuld
|  +* Re: Solving the Floating-Point Conundrum  Quadibloc
|  |`* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | +* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | |`* Re: Solving the Floating-Point Conundrum  Thomas Koenig
|  | | `* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | |  `* Re: Solving the Floating-Point Conundrum  Thomas Koenig
|  | |   `* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | |    `- Re: Solving the Floating-Point Conundrum  Thomas Koenig
|  | +* Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | |+- Re: Solving the Floating-Point Conundrum  Terje Mathisen
|  | |`* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | +* Re: Solving the Floating-Point Conundrum  Thomas Koenig
|  | | |+* Re: Solving the Floating-Point Conundrum  John Dallman
|  | | ||+- Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | ||+* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | |||+* Re: Solving the Floating-Point Conundrum  Michael S
|  | | ||||+* Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | | |||||`- Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | ||||`- Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | |||+* Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | | ||||`- Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | |||`* Re: Solving the Floating-Point Conundrum  Terje Mathisen
|  | | ||| `* Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | | |||  +* Re: Solving the Floating-Point Conundrum  robf...@gmail.com
|  | | |||  |+- Re: Solving the Floating-Point Conundrum  Scott Lurndal
|  | | |||  |+* Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | | |||  ||`- Re: Solving the Floating-Point Conundrum  George Neuner
|  | | |||  |+- Re: Solving the Floating-Point Conundrum  Thomas Koenig
|  | | |||  |`* Re: Solving the Floating-Point Conundrum  Terje Mathisen
|  | | |||  | `- Re: Solving the Floating-Point Conundrum  BGB
|  | | |||  `* Re: Solving the Floating-Point Conundrum  Terje Mathisen
|  | | |||   +* Re: Solving the Floating-Point Conundrum  comp.arch
|  | | |||   `* Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | | ||`* Re: Solving the Floating-Point Conundrum  Quadibloc
|  | | |`* Re: Solving the Floating-Point Conundrum  John Levine
|  | | `- Re: Solving the Floating-Point Conundrum  MitchAlsup
|  | +- Re: Solving the Floating-Point Conundrum  Quadibloc
|  | `* Re: Solving the Floating-Point Conundrum  Stefan Monnier
|  +* Re: Solving the Floating-Point Conundrum  BGB
|  `- Re: Solving the Floating-Point Conundrum  Thomas Koenig
+* Re: Solving the Floating-Point Conundrum  MitchAlsup
`- Re: Solving the Floating-Point Conundrum  Quadibloc

Re: data sizes, Solving the Floating-Point Conundrum

<uel56v$1nko$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=34274&group=comp.arch#34274

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Fri, 22 Sep 2023 22:41:35 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <uel56v$1nko$1@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com> <71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>
Injection-Date: Fri, 22 Sep 2023 22:41:35 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="56984"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com> <71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Fri, 22 Sep 2023 22:41 UTC

According to Quadibloc <jsavard@ecn.ab.ca>:
>I have an idea, from what I've read, about what lengths are desirable
>for floating-point numbers.
>
>Integers... well, the primary integer type needs to be big enough to
>serve as an index to an array. 32 bits used to do that, and now we need
>64 bits. Although the physical memory addresses are really only 48
>bits... but then, if bigger arrays can live in virtual memory, then indexes
>into them will also be wanted.

The 801 decided that it needed registers big enough to hold addresses,
which in that era were 24 bits. I don't see that anything has changed
there except that addresses are bigger.

>On the System/360, of course, packed decimal integers were like
>*strings*, and could be any length. But those operations were
>memory to memory, and thus they would be very slow on today's
>computers. So the ability to do packed decimal operations in
>registers is important.

They considered and rejected decimal registers because in that era,
decimal calculations tended to be simple and I/O limited so there
weren't enough intermediate values to make registers worth it.

But on z/Series they do indeed have packed decimal vector instructions
using the 128 bit vector registers as 31 digits and a sign. There is
also decimal floating point, but vector floating point is only in
binary.
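For the curious, the 128-bit layout John describes is the classic S/360 packed-decimal format scaled up: 31 digit nibbles followed by a trailing sign nibble (conventionally 0xC for plus, 0xD for minus). A minimal sketch of the encoding (`pack31` is a hypothetical helper, not actual z code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a decimal digit string, right-aligned, into a 16-byte (128-bit)
 * field: 31 digit nibbles followed by one sign nibble (0xC = plus). */
static void pack31(const char *num, uint8_t out[16])
{
    memset(out, 0, 16);
    size_t n = strlen(num);
    for (size_t i = 0; i < n && i < 31; i++) {
        unsigned d = (unsigned)(num[n - 1 - i] - '0');
        size_t nib = 30 - i;                  /* nibble 31 is the sign */
        out[nib / 2] |= d << ((nib % 2) ? 0 : 4);
    }
    out[15] |= 0x0C;                          /* positive sign nibble */
}
```

So +123 packs as 29 zero nibbles, then 1, 2, 3, then the 0xC sign: the last two bytes are 0x12 0x3C.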

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Solving the Floating-Point Conundrum

<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34275&group=comp.arch#34275

X-Received: by 2002:a05:620a:278c:b0:76f:52f:3f86 with SMTP id g12-20020a05620a278c00b0076f052f3f86mr8205qkp.9.1695433851822;
Fri, 22 Sep 2023 18:50:51 -0700 (PDT)
X-Received: by 2002:a9d:62d7:0:b0:6bf:146a:b86 with SMTP id
z23-20020a9d62d7000000b006bf146a0b86mr412452otk.3.1695433851579; Fri, 22 Sep
2023 18:50:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 22 Sep 2023 18:50:51 -0700 (PDT)
In-Reply-To: <9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com> <f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com> <uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com> <9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: robfi680@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 23 Sep 2023 01:50:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: robf...@gmail.com - Sat, 23 Sep 2023 01:50 UTC

On Friday, September 22, 2023 at 10:26:38 AM UTC-4, MitchAlsup wrote:
> On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
> > On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> > > MitchAlsup <Mitch...@aol.com> schrieb:
> > > > On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
> > > >> According to Thomas Koenig <tko...@netcologne.de>:
> > > >> >> That's not a power-of-two length, so how do I keep using these numbers both
> > > >> >> efficient and simple?
> > > >> >
> > > >> >Make the architecture byte-addressable, with another width for the
> > > >> >bytes; possible choices are 6 and 9.
> > > >> I'm pretty sure the world has spoken and we are going to use 8-bit
> > > >> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
> > > ><
> > > > In addition, the world has spoken and little endian also won.
> > > ><
> > > >> >Then make your architecture capable of misaligned loads and stores
> > > >> >and an extra floating point format, maybe 45 bits, with 9 bits
> > > >> >exponent and 36 bits of significand.
> > > ><
> > > >> If you're worried about performance, use your 45 bit format and store
> > > >> it in a 64 bit word.
> > > ><
> > > > In 1985 one could get a decent 32-bit pipelined RISC architecture in 1cm^2
> > > > Today this design fits in < 0.1mm^2, or you can make a GBOoO version in < 2mm^2.
> > > ><
> > > > And you really need 5mm^2 to get enough pins on the part to feed what you
> > > > can put inside; 7mm^2 makes even more sense on pins versus perf.
> > > ><
> > > > So, why are you catering to ANY bit counts less than 64 ??
> > > > Intel has versions with 512-bit data paths; GPUs generally use 1024 bits in
> > > > and 1024 bits out per cycle continuously per shader core.
> > > ><
> > > > It is no longer 1990, adjust your thinking to the modern realities of our time!
> > >
> > > There could be a justification for an intermediate floating point
> > > design - memory bandwidth (and ALU width).
> > >
> > > If you look at linear algebra solvers, these are usually limited
> > > by memory bandwidth. A 512-bit cache line size accomodates
> > > 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> > > 36-bit numbers or 16 32-bit numbers.
> > >
> > > For problems where 32 bits are not enough, but a few more bits
> > > might suffice, having additional intermediate floating point sizes
> > > could offer significant speedup.
> > Ugh. The business case for non-power-of-two floats:
> > The core count (or lane count) increases for shorter floats
> > 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
> > Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
> <
> One builds FP calculation resources as big as longest container needed at full throughput.
> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
> On such a machine, the latency is set by the calculations on this sized number.
> AND
> Smaller width numbers do not save any cycles.
> <
> So, the only advantage one has with 48-bit, ... numbers is memory footprint.
> There is NO (nada, zero, zilch) advantage in calculation latency.
> <
Does that include complicated calculations too? What about trig functions, square root, or other iterative functions?
Having implemented reciprocal square root in microcode, I have found that it
takes longer for greater precision, which makes me think there is some benefit
to supporting varying precisions.
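Iterative algorithms do scale with precision: each Newton-Raphson step for 1/sqrt(x) roughly doubles the number of correct bits, so a narrower target needs fewer microcode iterations. A quick sketch of the refinement step (the seed value and step counts below are illustrative, not any particular machine's):

```c
#include <assert.h>
#include <math.h>

/* One NR step for y ~= 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y).
 * Each step roughly squares the relative error, i.e. doubles the
 * number of correct bits. */
static double rsqrt_nr(double x, double y, int steps)
{
    while (steps-- > 0)
        y = y * (1.5 - 0.5 * x * y * y);
    return y;
}
```

From a small table seed of ~8 good bits, a ~24-bit single-precision result needs about two steps while a ~53-bit double-precision result needs three or four; that is exactly where a precision-dependent latency comes from.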

> > As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
> > Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
> > the effect is less noticeable.
> > The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
> > (a given chip would normally support a single float size or a half or fourth thereof)

Re: Solving the Floating-Point Conundrum

<uelqhs$lprp$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34276&group=comp.arch#34276

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Solving the Floating-Point Conundrum
Date: Fri, 22 Sep 2023 23:45:46 -0500
Organization: A noiseless patient Spider
Lines: 111
Message-ID: <uelqhs$lprp$1@dont-email.me>
References: <ue788u$4u5l$1@newsreader4.netcologne.de>
<memo.20230917185814.16292G@jgd.cix.co.uk>
<ba7d6a5d-2373-4f55-a640-69b1ab3e00bbn@googlegroups.com>
<uea4ou$1sudt$1@dont-email.me>
<de99236c-9bd2-4c5b-98b4-c9e2985eb1b0n@googlegroups.com>
<7df1b835-02e5-4988-96ef-2af73b8587d7n@googlegroups.com>
<uebprn$29i6b$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Sep 2023 04:45:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6f9460333be648395718ddfc402ae9a1";
logging-data="714617"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+43cHTQnL+afco5EhIKGAJ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:9urwwyf9WWl5BcAQ6X/TKPIfKwU=
Content-Language: en-US
In-Reply-To: <uebprn$29i6b$2@dont-email.me>
 by: BGB - Sat, 23 Sep 2023 04:45 UTC

On 9/19/2023 4:32 AM, Terje Mathisen wrote:
> robf...@gmail.com wrote:
>> What kind of precision is needed for space-time co-ordinates? I tried
>> looking this
>> up and get the impression that 64-bit floats may not be enough, but
>> 128-bit floats
>> are overkill. The reference was a space game based in a galaxy.
>>
> I would use the same trick as LiDAR, with coordinates in each chunk/file
> being 32-bit integers, and the header contains the origo offset and
> scale factors needed to convert into global coords.
>
> In a game you probably only need exact coords for items which are
> reasonably close to you.
>

In my 3D engines, one common issue came up here:
OpenGL only really works with 32-bit floats, so if your game world is
"bigger than Texas", this isn't going to work...

So, typically, coordinates were often stored relative to the current
region. One would add these to the region's address to get a location in
the world.

For rendering and local interactions, the whole coordinate space would
move, typically with the region containing the camera being used as the
local origin for the coordinate space. As the camera moves to a
different region, the whole space moves such that the current region is
now the origin.

So, there may end up being several ways to represent locations:
64-bit, 3x Float21 (Binary16+5b): Region local coordinates.
Binary16 isn't quite sufficient even for this.
96/128-bit, 3x Binary32: Camera local coordinates
128-bit, 3x Binary42 (Binary32+10b): World-global coordinates.
This was used for currently active entities.

This was sufficient for a world size of roughly 1024km x 1024km x 256m
(in my BGBTech2 engine; see note 1 below).
In my 3D engines, the world was a torus, rather than a plane;
The way that the coordinate-space worked allowed a seamless wrap.

Some logic (like the terrain system) also used fixed point 3x 20.12
coordinates.

1: In BGBTech3, this was dropped to 64km x 64km x 128m, but was a
similar idea (region size dropping from 256x256x256 to 128x128x128;
which only needs around 1/8 as much RAM per loaded region).

Rendering would work across the seam, which was mostly invisible as far
as the player was concerned (where region coordinates would wrap across
the edge).

For things like entity collisions, there was a little bit of hacking
across the seam (seam-crossing entities would be handled as-if they
existed on both sides of the seam at the same time).
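The seamless wrap can be sketched as picking, for each region coordinate, the representative nearest the camera's region on the torus (`wrap_delta` is a hypothetical helper to show the idea, not BGB's actual code):

```c
#include <assert.h>

/* Signed shortest offset from region b to region a on a torus of n
 * regions: the result lies in (-n/2, n/2], so regions near the camera
 * get small relative coordinates even across the wrap seam. */
static int wrap_delta(int a, int b, int n)
{
    int d = (a - b) % n;
    if (d > n / 2)
        d -= n;
    if (d <= -n / 2)
        d += n;
    return d;
}
```

With the camera's region as `b`, a region just across the seam comes out as +1 or -1 rather than roughly +/-n, which is what makes rendering across the seam invisible to the player.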

Can also note terrain-generation noise functions:
The BGBTech1 engine had used Perlin noise;
The BGBTech2 engine had used spatial hashing:
Hash the coordinates via arcane use of shift and xor.
shift-xor gave better results in this case
multiply-by-prime-and-add has obvious repeating patterns.
Use hash result as a random number between 0 and 1.
Interpolate between points on a 2D or 3D grid.
Scale as necessary to get desired results.
Add multiple "frequencies" at different "weights" to get the final result;
One can mix different magic numbers into the hash to get multiple sub-functions.
IIRC, BGBTech3 had used an intermediate strategy.

Actual Perlin noise required an intermediate lookup table and a bunch of
extra math, which is seemingly unnecessary if the noise is driven
directly by a 2D or 3D hash function.

Similarly, each sub-layer effectively needs its own lookup table for the
noise function, where the costs add up quickly.

Had to get creative with XOR; intuitively, one might think they
could do, say:
h=((((((((x*65521)+y)*65521)+z)*65521)+w)*65521)>>16)&255;
This actually sucks, generating obvious patterns that tend to repeat
along the diagonal axes.

Contrast with, for example:
h=w;
h=(h<<7)^(~(h>>17))^x;
h=(h<<7)^(~(h>>17))^y;
h=(h<<7)^(~(h>>17))^z;
h=(h<<7)^(~(h>>17))^w;
h=(h<<7)^(~(h>>17)); //add more mixing steps here as needed
h=h^(h>>16);
h=(h^(h>>8))&255;
Where, say, x/y/z are (integer) coordinates and w is a magic number
(say, a 'w' value is generated for each sub-function as a random number
generated from the world seed or similar).

And, if one wants a floating point value from 0 .. 1, they do
"v=(h/255.0);" or similar.
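Transcribed into a self-contained function (with unsigned arithmetic assumed, to keep the shifts well-defined in C):

```c
#include <assert.h>
#include <stdint.h>

/* Shift-xor spatial hash as described above: mix the x/y/z coordinates
 * plus a per-layer magic number w, then fold down to 8 bits. */
static uint32_t hash3(uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    uint32_t h = w;
    h = (h << 7) ^ (~(h >> 17)) ^ x;
    h = (h << 7) ^ (~(h >> 17)) ^ y;
    h = (h << 7) ^ (~(h >> 17)) ^ z;
    h = (h << 7) ^ (~(h >> 17)) ^ w;
    h = (h << 7) ^ (~(h >> 17));   /* add more mixing steps as needed */
    h = h ^ (h >> 16);
    return (h ^ (h >> 8)) & 255u;
}

/* Noise sample in [0, 1], as in the "v=(h/255.0)" step above. */
static double noise01(uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    return hash3(x, y, z, w) / 255.0;
}
```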

Not entirely sure why this sort of strategy seemingly isn't more common.

....

Re: data sizes, Solving the Floating-Point Conundrum

<uem23j$mol6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34277&group=comp.arch#34277

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 08:54:42 +0200
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <uem23j$mol6$1@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
<71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com>
<000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>
<uel56v$1nko$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Sep 2023 06:54:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b43d3877bc6f2ad4cccf5e0e9183a6d2";
logging-data="746150"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18qvPmYpWkXyUYJFKbaTMFA1BlG3mTTDYkoF/2D9mGjhQ=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17
Cancel-Lock: sha1:rAv3JyaS7ejI3EPeZBLLSwyMR5Q=
In-Reply-To: <uel56v$1nko$1@gal.iecc.com>
 by: Terje Mathisen - Sat, 23 Sep 2023 06:54 UTC

John Levine wrote:
> According to Quadibloc <jsavard@ecn.ab.ca>:
>> I have an idea, from what I've read, about what lengths are desirable
>> for floating-point numbers.
>>
>> Integers... well, the primary integer type needs to be big enough to
>> serve as an index to an array. 32 bits used to do that, and now we need
>> 64 bits. Although the physical memory addresses are really only 48
>> bits... but then, if bigger arrays can live in virtual memory, then indexes
>> into them will also be wanted.
>
> The 801 decided that it needed registers big enough to hold addresses,
> which in that era were 24 bits. I don't see anything's changed there
> except that addresses are bigger.
>
>> On the System/360, of course, packed decimal integers were like
>> *strings*, and could be any length. But those operations were
>> memory to memory, and thus they would be very slow on today's
>> computers. So the ability to do packed decimal operations in
>> registers is important.
>
> They considered and rejected decimal registers because in that era,
> decimal calculations tended to be simple and I/O limited so there
> weren't enough intermediate values to make registers worth it.
>
> But on z/Series they do indeed have packed decimal vector instructions
> using the 128 bit vector registers as 31 digits and a sign. There is

So really nybble math?

I find it interesting that Intel included a set of ASCII/nybble
instructions on the 8086/88; this might have been the least used part
of the entire x86 instruction set.

There are some exceptions, typically related to size-optimization
contests where these single- and double-byte opcodes were abused because
they were shorter than the normal way to do stuff, like splitting a byte
into hex nybbles by using the initially undocumented feature of AAM with
an immediate of 16 to do div/mod 16 instead of mask and shift.

Even on the original 8086 you could pack 4 nybbles into a register and
operate on them in parallel, and as soon as we got to the 386, packing 8
nybbles into a 32-bit reg was so much faster that any nybble-based loop
was just stupid.
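The point about operating on packed nybbles in parallel can be made concrete with the classic SWAR trick for adding eight packed-BCD digits in one 32-bit register; this sketch follows the well-known add-6/correct-back method rather than any particular poster's code:

```c
#include <assert.h>
#include <stdint.h>

/* Add two 8-digit packed-BCD numbers held in 32-bit registers.
 * Bias the low seven digits by 6 so decimal carries propagate like
 * binary carries, then subtract the 6 back from digits where no
 * decimal carry actually occurred. */
static uint32_t bcd_add(uint32_t a, uint32_t b)
{
    uint32_t t1 = a + 0x06666666u;       /* bias the low seven digits  */
    uint32_t t2 = t1 + b;                /* binary add with bias       */
    uint32_t t3 = t1 ^ b;                /* carry-free (xor) sum       */
    uint32_t t4 = t2 ^ t3;               /* inter-nibble carry bits    */
    uint32_t t5 = ~t4 & 0x11111110u;     /* digits with no carry-out   */
    uint32_t t6 = (t5 >> 2) | (t5 >> 3); /* a 6 in each such digit     */
    return t2 - t6;                      /* remove the unused bias     */
}
```

For example, 0x19 + 0x26 correctly yields 0x45, and 0x99 + 0x01 carries across digits to give 0x100, all without per-digit loops.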

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Solving the Floating-Point Conundrum

<uem2jm$mqmm$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34278&group=comp.arch#34278

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 09:03:18 +0200
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <uem2jm$mqmm$1@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com>
<f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com>
<uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com>
<9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Sep 2023 07:03:18 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b43d3877bc6f2ad4cccf5e0e9183a6d2";
logging-data="748246"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/1ayZM2BWvJ8k/GlnX6CcaAAJdyXVffvLjMr2eg0oDng=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17
Cancel-Lock: sha1:O/m0z4/kO4dBNA/CTJFVErVxlmM=
In-Reply-To: <78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>
 by: Terje Mathisen - Sat, 23 Sep 2023 07:03 UTC

robf...@gmail.com wrote:
> On Friday, September 22, 2023 at 10:26:38 AM UTC-4, MitchAlsup wrote:
>> One builds FP calculation resources as big as longest container needed at full throughput.
>> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
>> On such a machine, the latency is set by the calculations on this sized number.
>> AND
>> Smaller width numbers do not save any cycles.
>> <
>> So, the only advantage one has with 48-bit, ... numbers is memory footprint.
>> There is NO (nada, zero, zilch) advantage in calculation latency.
>> <
> Does that include complicated calculations too? What about trig
> functions, square root, or other iterative functions? As I have
> implemented reciprocal square root in micro-code it takes longer for
> greater precision. Makes me think there is some benefit to supporting
> varying precisions.
This is easy to verify: look up the latency for both the 32-bit and
64-bit versions of the function you are interested in!

If they differ by less than 25%, then anything intermediate really
doesn't make sense as a HW op.

In software you can more easily play tricks like the infamous InvSqrt()
function of Quake III fame, where just 10 bits was sufficient to make
lighting calculations look OK. Today you can do the same, using the
approximate reciprocal square root vector op, and simply skip all the
normal NR stages to follow.
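For reference, the widely circulated Quake III routine looks like this, lightly modernized (memcpy instead of the original pointer punning, which is undefined behavior in standard C):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Quake III fast inverse square root: a bit-level initial guess via
 * the famous 0x5f3759df constant, plus one Newton-Raphson step,
 * giving roughly 10+ good bits, enough for lighting. */
static float Q_rsqrt(float number)
{
    float x2 = number * 0.5f;
    float y  = number;
    uint32_t i;
    memcpy(&i, &y, sizeof i);            /* reinterpret float bits    */
    i = 0x5f3759dfu - (i >> 1);          /* magic initial estimate    */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - x2 * y * y);         /* one NR refinement step    */
    return y;
}
```

The single NR step leaves a maximum relative error of roughly 0.2%, which is the "just 10 bits" Terje mentions; skipping even that step, or the NR stages after a hardware approximate-rsqrt op, trades more accuracy for speed.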

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: data sizes, Solving the Floating-Point Conundrum

<2023Sep23.123024@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34279&group=comp.arch#34279

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 10:30:24 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 21
Message-ID: <2023Sep23.123024@mips.complang.tuwien.ac.at>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com> <71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com>
Injection-Info: dont-email.me; posting-host="a56e51382a37201b890c3dca60a5e2ec";
logging-data="812985"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Z/iniaiV5tw/m+c00GmOk"
Cancel-Lock: sha1:FVTSNxPCO4csf3SS9PN6ZWo9Zqs=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sat, 23 Sep 2023 10:30 UTC

John Levine <johnl@taugh.com> writes:
>But on z/Series they do indeed have packed decimal vector instructions
>using the 128 bit vector registers as 31 digits and a sign. There is
>also decimal floating point

Decimal floating-point hardware seems to be a marketing feature to me.
Is there any real-world application that uses that? How much CPU-time
does it use? Intel has neglected its DFP implementation (in
software), which indicates that any such applications, if they exist,
are insignificant.

As for packed BCD stuff, I expect that the Cobol stuff benefits from
that. But given that the byte-by-byte instructions were good enough
in earlier times with slower CPUs, do we need faster instructions for
that now? Have the amounts of data processed by these programs
increased so much?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Solving the Floating-Point Conundrum

<f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34280&group=comp.arch#34280

X-Received: by 2002:a05:622a:355:b0:410:9b45:d7f6 with SMTP id r21-20020a05622a035500b004109b45d7f6mr22690qtw.10.1695486587862;
Sat, 23 Sep 2023 09:29:47 -0700 (PDT)
X-Received: by 2002:a05:6870:3a14:b0:1c8:ce4b:550c with SMTP id
du20-20020a0568703a1400b001c8ce4b550cmr1165074oab.1.1695486587596; Sat, 23
Sep 2023 09:29:47 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 23 Sep 2023 09:29:47 -0700 (PDT)
In-Reply-To: <78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:f9ce:2f52:47e1:2c96;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:f9ce:2f52:47e1:2c96
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com> <f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com> <uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com> <9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 23 Sep 2023 16:29:47 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 122
 by: MitchAlsup - Sat, 23 Sep 2023 16:29 UTC

On Friday, September 22, 2023 at 8:50:53 PM UTC-5, robf...@gmail.com wrote:
> On Friday, September 22, 2023 at 10:26:38 AM UTC-4, MitchAlsup wrote:
> > On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
> > > On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> > > > MitchAlsup <Mitch...@aol.com> schrieb:
> > > > > On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
> > > > >> According to Thomas Koenig <tko...@netcologne.de>:
> > > > >> >> That's not a power-of-two length, so how do I keep using these numbers both
> > > > >> >> efficient and simple?
> > > > >> >
> > > > >> >Make the architecture byte-addressable, with another width for the
> > > > >> >bytes; possible choices are 6 and 9.
> > > > >> I'm pretty sure the world has spoken and we are going to use 8-bit
> > > > >> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
> > > > ><
> > > > > In addition, the world has spoken and little endian also won.
> > > > ><
> > > > >> >Then make your architecture capable of misaligned loads and stores
> > > > >> >and an extra floating point format, maybe 45 bits, with 9 bits
> > > > >> >exponent and 36 bits of significand.
> > > > ><
> > > > >> If you're worried about performance, use your 45 bit format and store
> > > > >> it in a 64 bit word.
> > > > ><
> > > > > In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
> > > > > Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
> > > > ><
> > > > > And you really need 5mm^2 to get enough pins on the part to feed what you
> > > > > can put inside; 7mm^2 makes even more sense on pins versus perf.
> > > > ><
> > > > > So, why are you catering to ANY bit counts less than 64 ??
> > > > > Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
> > > > > and 1024 bits out per cycle continuously per shader core.
> > > > ><
> > > > > It is no longer 1990, adjust your thinking to the modern realities or our time !
> > > >
> > > > There could be a justification for an intermediate floating point
> > > > design - memory bandwidth (and ALU width).
> > > >
> > > > If you look at linear algebra solvers, these are usually limited
> > > > by memory bandwidth. A 512-bit cache line size accomodates
> > > > 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> > > > 36-bit numbers or 16 32-bit numbers.
> > > >
> > > > For problems where 32 bits are not enough, but a few more bits
> > > > might suffice, having additional intermediate floating point sizes
> > > > could offer significant speedup.
> > > Ugh The business case for non-power-of-two floats:
> > > The core count (or lane count) increases for shorter floats
> > > 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
> > > Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
> > <
> > One builds FP calculation resources as big as longest container needed at full throughput.
> > In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
> > On such a machine, the latency is set by the calculations on this sized number.
> > AND
> > Smaller width numbers do not save any cycles.
> > <
> > So, the only advantage one has with 48-bit, ... numbers is memory footprint.
> > There is NO (nada, zero, zilch) advantage in calculation latency.
> > <
> Does that include complicated calculations too? What about trig functions, square root, or other iterative functions?
<
FDIV 17 cycles Goldschmidt with 1 Newton-Raphson iteration
SQRT 22 cycles Goldschmidt with 1 Newton-Raphson iteration
Ln2 16 cycles
ln/ln10 19 cycles
Exp2 17 cycles
Exp/exp10 20 cycles
Sin/Cos 21 cycles including argument reduction
Tan 21 or 38 including argument reduction
Atan 21 or 38 cycles
Pow 36 cycles
All double precision all faithfully rounded all Chebyshev polynomials except as noted.
<
> As I have implemented reciprocal square root in micro-code it takes longer for greater precision. Makes me think
> there is some benefit to supporting varying precisions.
<
SQRT and RSQRT can be done such that precision doubles each iteration.
As such, if you can do 32 bits in K cycles you can do 64 bits in K+3 cycles;
there is not much room for making 48 bits faster. Oh, BTW, K = 19 and loop
iteration is 3 cycles.
<
> > > As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
> > > Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
> > > the effect is less noticeable.
> > > The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
> > > (a given chip would normally support a single float size or a half or fourth thereof)

Re: data sizes, Solving the Floating-Point Conundrum

<uen9ta$t69d$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34281&group=comp.arch#34281

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 20:14:02 +0200
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <uen9ta$t69d$1@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
<71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com>
<000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>
<uel56v$1nko$1@gal.iecc.com> <2023Sep23.123024@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Sep 2023 18:14:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b43d3877bc6f2ad4cccf5e0e9183a6d2";
logging-data="956717"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18HpqBrYJnx0X/yIoLy9nvoMom41V7DC/tJfPa+xfQUuw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17
Cancel-Lock: sha1:GlHh3f7AZ3t5lFb79cwYm/DRkas=
In-Reply-To: <2023Sep23.123024@mips.complang.tuwien.ac.at>
 by: Terje Mathisen - Sat, 23 Sep 2023 18:14 UTC

Anton Ertl wrote:
> John Levine <johnl@taugh.com> writes:
>> But on z/Series they do indeed have packed decimal vector instructions
>> using the 128 bit vector registers as 31 digits and a sign. There is
>> also decimal floating point
>
> Decimal floating-point hardware seems to be a marketing feature to me.
> Is there any real-world application that uses that? How much CPU-time
> does it use? Intel has neglected its DFP implementation (in
> software), which indicates that any such applications, if they exist,
> are insignificant.
>
> As for packed BCD stuff, I expect that the Cobol stuff benefits from
> that. But given that the byte-by-byte instructions were good enough
> in earlier times with slower CPUs, do we need faster instructions for
> that now. Have the amounts of data processed by these programs
> increased so much?

Both Ascii and Nybble math can be done quite easily with SIMD vector
ops, giving an order of magnitude better throughput than using the
single-digit instructions.
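As a concrete sketch of the nybble case (plain Python on ordinary integers; the same carry-correction trick maps directly onto SIMD lanes), here is the classic pre-bias-by-6 packed-BCD add. The function name and digit-count parameter are illustrative, not from any particular ISA:

```python
def bcd_add(a, b, digits=8):
    """Nibble-parallel packed-BCD addition of two `digits`-digit operands."""
    sixes = int("6" * digits, 16)             # 0x66...6: pre-bias every nibble by 6
    carry_mask = int("1" * digits, 16) << 4   # carry-in bit position of each nibble
    t1 = a + sixes
    t2 = t1 + b                               # binary add; every decimal carry is now a real binary carry
    carries = t2 ^ t1 ^ b                     # bit positions where a carry propagated in
    no_carry = ~carries & carry_mask          # nibbles that did NOT produce a decimal carry
    corr = (no_carry >> 2) | (no_carry >> 3)  # rebuild the 6 to subtract from those nibbles
    return (t2 - corr) & ((1 << (4 * digits)) - 1)

print(hex(bcd_add(0x1234, 0x0877)))  # 0x2111, i.e. 1234 + 877 = 2111
```

All digits are corrected in parallel with a handful of full-width logical ops, which is where the order-of-magnitude win over digit-at-a-time instructions comes from.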

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Solving the Floating-Point Conundrum

<c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=34282&group=comp.arch#34282

X-Received: by 2002:ad4:5691:0:b0:65a:fc29:fc88 with SMTP id bd17-20020ad45691000000b0065afc29fc88mr8110qvb.1.1695496347982;
Sat, 23 Sep 2023 12:12:27 -0700 (PDT)
X-Received: by 2002:a05:6808:14d4:b0:3ad:f525:52bf with SMTP id
f20-20020a05680814d400b003adf52552bfmr1653378oiw.1.1695496347770; Sat, 23 Sep
2023 12:12:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 23 Sep 2023 12:12:27 -0700 (PDT)
In-Reply-To: <f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com> <f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com> <uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com> <9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com> <f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: robfi680@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 23 Sep 2023 19:12:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 8569
 by: robf...@gmail.com - Sat, 23 Sep 2023 19:12 UTC

On Saturday, September 23, 2023 at 12:29:49 PM UTC-4, MitchAlsup wrote:
> On Friday, September 22, 2023 at 8:50:53 PM UTC-5, robf...@gmail.com wrote:
> > On Friday, September 22, 2023 at 10:26:38 AM UTC-4, MitchAlsup wrote:
> > > On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
> > > > On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> > > > > MitchAlsup <Mitch...@aol.com> schrieb:
> > > > > > On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
> > > > > >> According to Thomas Koenig <tko...@netcologne.de>:
> > > > > >> >> That's not a power-of-two length, so how do I keep using these numbers both
> > > > > >> >> efficient and simple?
> > > > > >> >
> > > > > >> >Make the architecture byte-addressable, with another width for the
> > > > > >> >bytes; possible choices are 6 and 9.
> > > > > >> I'm pretty sure the world has spoken and we are going to use 8-bit
> > > > > >> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
> > > > > ><
> > > > > > In addition, the world has spoken and little endian also won.
> > > > > ><
> > > > > >> >Then make your architecture capable of misaligned loads and stores
> > > > > >> >and an extra floating point format, maybe 45 bits, with 9 bits
> > > > > >> >exponent and 36 bits of significand.
> > > > > ><
> > > > > >> If you're worried about performance, use your 45 bit format and store
> > > > > >> it in a 64 bit word.
> > > > > ><
> > > > > > In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
> > > > > > Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
> > > > > ><
> > > > > > And you really need 5mm^2 to get enough pins on the part to feed what you
> > > > > > can put inside; 7mm^2 makes even more sense on pins versus perf..
> > > > > ><
> > > > > > So, why are you catering to ANY bit counts less than 64 ??
> > > > > > Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
> > > > > > and 1024 bits out per cycle continuously per shader core.
> > > > > ><
> > > > > > It is no longer 1990, adjust your thinking to the modern realities or our time !
> > > > >
> > > > > There could be a justification for an intermediate floating point
> > > > > design - memory bandwidth (and ALU width).
> > > > >
> > > > > If you look at linear algebra solvers, these are usually limited
> > > > > by memory bandwidth. A 512-bit cache line size accomodates
> > > > > 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> > > > > 36-bit numbers or 16 32-bit numbers.
> > > > >
> > > > > For problems where 32 bits are not enough, but a few more bits
> > > > > might suffice, having additional intermediate floating point sizes
> > > > > could offer significant speedup.
> > > > Ugh The business case for non-power-of-two floats:
> > > > The core count (or lane count) increases for shorter floats
> > > > 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
> > > > Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
> > > <
> > > One builds FP calculation resources as big as longest container needed at full throughput.
> > > In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
> > > On such a machine, the latency is set by the calculations on this sized number.
> > > AND
> > > Smaller width numbers do not save any cycles.
> > > <
> > > So, the only advantage one has with 48-bit, ... numbers is memory footprint.
> > > There is NO (nada, zero, zilch) advantage in calculation latency.
> > > <
> > Does that include complicated calculations too? What about trig functions, square root, or other iterative functions?
> <
> FDIV 17 cycles Goldschmidt with 1 Newton-Raphson iteration
> SQRT 22 cycles Goldschmidt with 1 Newton-Raphson iteration
> Ln2 16 cycles
> ln/ln10 19 cycles
> Exp2 17 cycles
> Exp/exp10 20 cycles
> Sin/Cos 21 cycles including argument reduction
> Tan 21 or 38 including argument reduction
> Atan 21 or 38 cycles
> Pow 36 cycles
> All double precision all faithfully rounded all Chebyshev polynomials except as noted.
> <
> > As I have implemented reciprocal square root in micro-code it takes longer for greater precision. Makes me think
> > there is some benefit to supporting varying precisions.
> <
> SQRT and RSQRT can be done such that precision doubles each iteration.
> as such, if you can do 32-bits in K cycles you can do 64-bits in K+3 cycles,
> there is not much room for making 48-bits faster. Oh, BTW, K = 19 and loop
> iteration is 3 cycles.
> <

I have many more than 3 cycles for an iteration. An FMA takes 8 cycles and there are multiple per iteration.
However, I should have looked at my micro-code more closely. There is indeed no difference
between calculating out to 64 bits or 48 bits because of the number of bits reached in each iteration.

To get 48 bits one iteration faster would require a much more accurate initial approximation,
which probably is not practical.
// RSQRT initial approximation 0
// y = y*(1.5f - xhalf*y*y); // first NR iteration: 9.16 bits accurate
// y = y*(1.5f - xhalf*y*y); // second NR iteration: 17.69 bits accurate
// y = y*(1.5f - xhalf*y*y); // third NR iteration: 35 bits accurate
// y = y*(1.5f - xhalf*y*y); // fourth NR iteration: 70 bits accurate
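In runnable form (a small Python sketch; the starting guess and test value are chosen arbitrarily), the same iteration shows the accurate-bit count roughly doubling each pass, matching the 9 / 18 / 35 / 70 progression above:

```python
import math

def rsqrt_nr(x, y, iters):
    """Refine y ~ 1/sqrt(x) by Newton-Raphson; the error roughly squares per pass."""
    xhalf = 0.5 * x
    for _ in range(iters):
        y = y * (1.5 - xhalf * y * y)
    return y

x, y0 = 2.0, 0.7                 # crude initial approximation
exact = 1.0 / math.sqrt(x)
for i in range(1, 4):
    rel_err = abs(rsqrt_nr(x, y0, i) - exact) / exact
    print(i, round(-math.log2(rel_err), 1))   # accurate bits roughly double each pass
```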

In my case it still looks like there may be value in supporting separate 8/16/32/64/128 bit ops. I guess it
depends on how fast an iteration is. Limited parallel hardware here.

Reciprocal estimate bits/clocks:
16 – 22 clocks
32 – 38 clocks
64 – 54 clocks
16 clocks per iteration plus some overhead.

Scratching my head over how to make the ISA less dependent on the implementation.

> > > > As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
> > > > Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
> > > > the effect is less noticeable.
> > > > The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
> > > > (a given chip would normally support a single float size or a half or fourth thereof)

Re: Solving the Floating-Point Conundrum

<79c18404-45b4-49b0-b947-9cb838600ddbn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=34283&group=comp.arch#34283

X-Received: by 2002:ad4:5ce1:0:b0:65a:f581:2504 with SMTP id iv1-20020ad45ce1000000b0065af5812504mr21816qvb.8.1695497839949;
Sat, 23 Sep 2023 12:37:19 -0700 (PDT)
X-Received: by 2002:a05:6808:1790:b0:3ae:1e08:41e7 with SMTP id
bg16-20020a056808179000b003ae1e0841e7mr1800362oib.9.1695497839772; Sat, 23
Sep 2023 12:37:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 23 Sep 2023 12:37:19 -0700 (PDT)
In-Reply-To: <c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:f9ce:2f52:47e1:2c96;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:f9ce:2f52:47e1:2c96
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com> <f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com> <uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com> <9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com> <f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>
<c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <79c18404-45b4-49b0-b947-9cb838600ddbn@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 23 Sep 2023 19:37:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 9489
 by: MitchAlsup - Sat, 23 Sep 2023 19:37 UTC

On Saturday, September 23, 2023 at 2:12:30 PM UTC-5, robf...@gmail.com wrote:
> On Saturday, September 23, 2023 at 12:29:49 PM UTC-4, MitchAlsup wrote:
> > On Friday, September 22, 2023 at 8:50:53 PM UTC-5, robf...@gmail.com wrote:
> > > On Friday, September 22, 2023 at 10:26:38 AM UTC-4, MitchAlsup wrote:
> > > > On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
> > > > > On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> > > > > > MitchAlsup <Mitch...@aol.com> schrieb:
> > > > > > > On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
> > > > > > >> According to Thomas Koenig <tko...@netcologne.de>:
> > > > > > >> >> That's not a power-of-two length, so how do I keep using these numbers both
> > > > > > >> >> efficient and simple?
> > > > > > >> >
> > > > > > >> >Make the architecture byte-addressable, with another width for the
> > > > > > >> >bytes; possible choices are 6 and 9.
> > > > > > >> I'm pretty sure the world has spoken and we are going to use 8-bit
> > > > > > >> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
> > > > > > ><
> > > > > > > In addition, the world has spoken and little endian also won.
> > > > > > ><
> > > > > > >> >Then make your architecture capable of misaligned loads and stores
> > > > > > >> >and an extra floating point format, maybe 45 bits, with 9 bits
> > > > > > >> >exponent and 36 bits of significand.
> > > > > > ><
> > > > > > >> If you're worried about performance, use your 45 bit format and store
> > > > > > >> it in a 64 bit word.
> > > > > > ><
> > > > > > > In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
> > > > > > > Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
> > > > > > ><
> > > > > > > And you really need 5mm^2 to get enough pins on the part to feed what you
> > > > > > > can put inside; 7mm^2 makes even more sense on pins versus perf.
> > > > > > ><
> > > > > > > So, why are you catering to ANY bit counts less than 64 ??
> > > > > > > Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
> > > > > > > and 1024 bits out per cycle continuously per shader core.
> > > > > > ><
> > > > > > > It is no longer 1990, adjust your thinking to the modern realities or our time !
> > > > > >
> > > > > > There could be a justification for an intermediate floating point
> > > > > > design - memory bandwidth (and ALU width).
> > > > > >
> > > > > > If you look at linear algebra solvers, these are usually limited
> > > > > > by memory bandwidth. A 512-bit cache line size accomodates
> > > > > > 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> > > > > > 36-bit numbers or 16 32-bit numbers.
> > > > > >
> > > > > > For problems where 32 bits are not enough, but a few more bits
> > > > > > might suffice, having additional intermediate floating point sizes
> > > > > > could offer significant speedup.
> > > > > Ugh The business case for non-power-of-two floats:
> > > > > The core count (or lane count) increases for shorter floats
> > > > > 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
> > > > > Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
> > > > <
> > > > One builds FP calculation resources as big as longest container needed at full throughput.
> > > > In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
> > > > On such a machine, the latency is set by the calculations on this sized number.
> > > > AND
> > > > Smaller width numbers do not save any cycles.
> > > > <
> > > > So, the only advantage one has with 48-bit, ... numbers is memory footprint.
> > > > There is NO (nada, zero, zilch) advantage in calculation latency.
> > > > <
> > > Does that include complicated calculations too? What about trig functions, square root, or other iterative functions?
> > <
> > FDIV 17 cycles Goldschmidt with 1 Newton-Raphson iteration
> > SQRT 22 cycles Goldschmidt with 1 Newton-Raphson iteration
> > Ln2 16 cycles
> > ln/ln10 19 cycles
> > Exp2 17 cycles
> > Exp/exp10 20 cycles
> > Sin/Cos 21 cycles including argument reduction
> > Tan 21 or 38 including argument reduction
> > Atan 21 or 38 cycles
> > Pow 36 cycles
> > All double precision all faithfully rounded all Chebyshev polynomials except as noted.
> > <
> > > As I have implemented reciprocal square root in micro-code it takes longer for greater precision. Makes me think
> > > there is some benefit to supporting varying precisions.
> > <
> > SQRT and RSQRT can be done such that precision doubles each iteration.
> > as such, if you can do 32-bits in K cycles you can do 64-bits in K+3 cycles,
> > there is not much room for making 48-bits faster. Oh, BTW, K = 19 and loop
> > iteration is 3 cycles.
> > <
> I have many more than 3 cycles for an iteration. An FMA takes 8 cycles and there are multiple per iteration.
<
You should be able to start a new FMAC every cycle; once you can do this, the
iterations are just dropping stuff into the pipeline.
<
> However, I should have looked at my micro-code more closely. There is indeed no difference in between
> calculating out to 64 bit or 48 bits because of the number of bits reached in each iteration.
>
> To get 48 bits an iteration faster would require a much more accurate initial approximation which probably
> is not practical.
> // RSQRT initial approximation 0
> // y = y*(1.5f - xhalf*y*y); // first NR iteration: 9.16 bits accurate
> // y = y*(1.5f - xhalf*y*y); // second NR iteration: 17.69 bits accurate
> // y = y*(1.5f - xhalf*y*y); // third NR iteration: 35 bits accurate
> // y = y*(1.5f - xhalf*y*y); // fourth NR iteration: 70 bits accurate
>
3 dependent multiplies per iteration--Goldschmidt changes this to 2 dependent multiplies per iteration.
{{Then inside the FU:: the multiplies are treated as fixed point so the iteration latency is the height
of the multiplier tree not the latency of the FU itself (from 360/91 FDIV)}}
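A hypothetical Python sketch of that Goldschmidt structure (double precision standing in for the hardware datapath; assumes d > 0): within each pass the two multiplies by the same factor f do not depend on each other, so a pipelined multiplier can overlap them:

```python
import math

def goldschmidt_div(n, d, iters=6):
    """Approximate n / d by Goldschmidt iteration (assumes d > 0)."""
    m, e = math.frexp(d)     # d = m * 2**e, with m in [0.5, 1)
    n = math.ldexp(n, -e)    # scale the numerator by the same power of two
    d = m
    for _ in range(iters):
        f = 2.0 - d          # shared correction factor
        n *= f               # these two multiplies are independent of each
        d *= f               # other, so a pipeline can run them back-to-back
    return n                 # d converges to 1, so n converges to the quotient
```

Contrast with plain Newton-Raphson division, where the multiplies within a pass form a dependent chain; that is the 3-vs-2 dependent-multiply difference noted above.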
<
<
> In my case it still looks like there may be value in supporting separate 8/16/32/64/128 bit ops. I guess it
> depends on how fast an iteration is. Limited parallel hardware here.
>
> Reciprocal estimate bits/clocks:
> 16 – 22 clocks
> 32 – 38 clocks
> 64 – 54 clocks
> 16 clocks per iteration plus some overhead.
>
> Scratching my head over how to make the ISA less dependent on the implementation.
> > > > > As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
> > > > > Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
> > > > > the effect is less noticeable.
> > > > > The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
> > > > > (a given chip would normally support a single float size or a half or fourth thereof)

Re: data sizes, Solving the Floating-Point Conundrum

<uenj7h$2mhh$1@gal.iecc.com>


https://news.novabbs.org/devel/article-flat.php?id=34284&group=comp.arch#34284

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 20:53:05 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <uenj7h$2mhh$1@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <uem23j$mol6$1@dont-email.me>
Injection-Date: Sat, 23 Sep 2023 20:53:05 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88625"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <uem23j$mol6$1@dont-email.me>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sat, 23 Sep 2023 20:53 UTC

According to Terje Mathisen <terje.mathisen@tmsw.no>:
>> But on z/Series they do indeed have packed decimal vector instructions
>> using the 128 bit vector registers as 31 digits and a sign. There is
>
>So really nybble math?

No, packed decimal. They're fixed-length 31-digit signed decimal
numbers (the sign is in the 32nd nibble.) Most instructions can
specify the maximum number of significant digits allowed in the
result, and can force positive or negative signs on the result.
There's a multiply and then shift result right, shift dividend left
and divide, and shift and round, again with significant digit limits.
If the result has too many digits, it's reported via either a
condition code or an overflow interrupt. See chapter 25 of the zSeries
POO.

You could do all this with 128 bit integers but the significance
checks and decimal rounding would take a lot of extra instructions.
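A rough model of one such operation in Python (a hypothetical helper, operating on plain integers rather than the nibble encoding): a decimal right shift with round-half-away-from-zero, followed by the significance check:

```python
def shift_and_round(value, shift, max_digits):
    """Model a packed-decimal shift-and-round: drop `shift` decimal places,
    rounding half away from zero, then enforce a significant-digit limit."""
    if shift > 0:
        q, r = divmod(abs(value), 10 ** shift)
        if r >= 5 * 10 ** (shift - 1):        # round half away from zero
            q += 1
        result = -q if value < 0 else q
    else:
        result = value * 10 ** (-shift)       # left shift: multiply by a power of ten
    if abs(result) >= 10 ** max_digits:       # significance check
        raise OverflowError("too many significant digits")
    return result

print(shift_and_round(12355, 2, 31))  # 124: 123.55 rounded to 124
```

Doing the same thing with binary 128-bit integers means reconstructing the rounding digit and the digit count by repeated division by ten, which is exactly the extra-instruction cost described above.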

These are for financial stuff like bond pricing that have to match
formulae invented in the era of mechanical desk calculators. Forty
years ago, I managed to do bond prices and yields using 8087
arithmetic but it would have been a lot easier in decimal since then I
could have directly implemented the spec. I wouldn't have wanted to
try to code up calculations that chained them together.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: data sizes, Solving the Floating-Point Conundrum

<uenjjj$2mhh$2@gal.iecc.com>


https://news.novabbs.org/devel/article-flat.php?id=34285&group=comp.arch#34285

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 20:59:31 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <uenjjj$2mhh$2@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <2023Sep23.123024@mips.complang.tuwien.ac.at>
Injection-Date: Sat, 23 Sep 2023 20:59:31 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="88625"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <2023Sep23.123024@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sat, 23 Sep 2023 20:59 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>John Levine <johnl@taugh.com> writes:
>>But on z/Series they do indeed have packed decimal vector instructions
>>using the 128 bit vector registers as 31 digits and a sign. There is
>>also decimal floating point
>
>Decimal floating-point hardware seems to be a marketing feature to me.
>Is there any real-world application that uses that?

My impression is that there aren't a lot of people who want it, but
for the ones who do, they want it very much to do their financial
calculations.

Like I said a few minutes ago, there are decimal financial formulae
developed long ago for bond pricing and related time and interest
calculations. Since there are literally trillions of dollars of
financial instruments priced and sold this way, I can see that for
some customers it would be worth a lot to have decimal arithmetic that
could implement this directly with well defined decimal rounding at
each step to make it easier to write correct code.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: data sizes, Solving the Floating-Point Conundrum

<09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=34286&group=comp.arch#34286

X-Received: by 2002:a05:620a:8593:b0:76d:8827:11a5 with SMTP id pf19-20020a05620a859300b0076d882711a5mr20755qkn.5.1695504284642;
Sat, 23 Sep 2023 14:24:44 -0700 (PDT)
X-Received: by 2002:a9d:6c04:0:b0:6bd:c74e:f21d with SMTP id
f4-20020a9d6c04000000b006bdc74ef21dmr1093720otq.4.1695504284392; Sat, 23 Sep
2023 14:24:44 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 23 Sep 2023 14:24:44 -0700 (PDT)
In-Reply-To: <uenjjj$2mhh$2@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.115; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.115
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com>
<2023Sep23.123024@mips.complang.tuwien.ac.at> <uenjjj$2mhh$2@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com>
Subject: Re: data sizes, Solving the Floating-Point Conundrum
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Sat, 23 Sep 2023 21:24:44 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2866
 by: Michael S - Sat, 23 Sep 2023 21:24 UTC

On Saturday, September 23, 2023 at 11:59:35 PM UTC+3, John Levine wrote:
> According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
> >John Levine <jo...@taugh.com> writes:
> >>But on z/Series they do indeed have packed decimal vector instructions
> >>using the 128 bit vector registers as 31 digits and a sign. There is
> >>also decimal floating point
> >
> >Decimal floating-point hardware seems to be a marketing feature to me.
> >Is there any real-world application that uses that?
> My impression is that there aren't a lot of people who want it, but
> for the ones who do, they want it very much to do their financial
> calculations.
>
> Like I said a few minutes ago, there are decimal financial formulae
> developed long ago for bond pricing and related time and interest
> calculations. Since there are literally trillions of dollars of
> financial instruments priced and sold this way, I can see that for
> some customers it would be worth a lot to have decimal arithmetic that
> could implement this directly with well defined decimal rounding at
> each step to make it easier to write correct code.
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

All easily done in sw with even more defined rounding semantics.
And considering how fast modern z CPUs are, they sure wouldn't break
a sweat doing so.
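As an illustration of such a software implementation, Python's decimal module gives exactly this kind of well-defined decimal rounding at each step (the round-to-cents quantize below is just an example, not any particular bond formula):

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Divide, then explicitly round to cents at this step with banker's rounding.
price = (Decimal('100') / Decimal('3')).quantize(
    Decimal('0.01'), rounding=ROUND_HALF_EVEN)
```

Every rounding point in a chained calculation can be pinned down this way, independent of the hardware.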

Re: lotsa money and data sizes, Solving the Floating-Point Conundrum

<uenn8d$31cl$1@gal.iecc.com>


https://news.novabbs.org/devel/article-flat.php?id=34287&group=comp.arch#34287

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: lotsa money and data sizes, Solving the Floating-Point Conundrum
Date: Sat, 23 Sep 2023 22:01:49 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <uenn8d$31cl$1@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <2023Sep23.123024@mips.complang.tuwien.ac.at> <uenjjj$2mhh$2@gal.iecc.com> <09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com>
Injection-Date: Sat, 23 Sep 2023 22:01:49 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="99733"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <2023Sep23.123024@mips.complang.tuwien.ac.at> <uenjjj$2mhh$2@gal.iecc.com> <09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sat, 23 Sep 2023 22:01 UTC

According to Michael S <already5chosen@yahoo.com>:
>> Like I said a few minutes ago, there are decimal financial formulae
>> developed long ago for bond pricing and related time and interest
>> calculations. Since there are literally trillions of dollars of
>> financial instruments priced and sold this way, I can see that for
>> some customers it would be worth a lot to have decimal arithmetic that
>> could implement this directly with well defined decimal rounding at
>> each step to make it easier to write correct code.

>All easily done in sw with even more defined rounding semantics.
>And considering how fast modern z CPUs are, they sure wouldn't break
>a sweat doing so.

In the sense that all modern computers are Turing equivalent, sure.
You can do this sort of stuff in python.

But given the cost of getting stuff wrong, it's worth something to
them to know that the decimal stuff is locked down at the hardware
level.

Z implements a lot of the complex instructions in microcode (which they
call millicode) written in the hardwired subset of the instruction set,
but apparently DFP is hardware.

This paper describes the hardware and software support for DFP:

https://speleotrove.com/mfc/files/schwarz2009-decimalFP-on-z10.pdf

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Solving the Floating-Point Conundrum

<ueombe$18fbr$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34288&group=comp.arch#34288

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Solving the Floating-Point Conundrum
Date: Sun, 24 Sep 2023 08:52:29 +0200
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <ueombe$18fbr$1@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com>
<f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com>
<uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com>
<9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com>
<f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>
<c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Sep 2023 06:52:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ad6274270477ca3d3df9816d904a4701";
logging-data="1326459"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX185Oc6w+J4elfpE6cWDOSHnbK3yD1zyLsFuKmQZjHlYFw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17
Cancel-Lock: sha1:TMhWNFl0gLxiARWqzduydOLeYPI=
In-Reply-To: <c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com>
 by: Terje Mathisen - Sun, 24 Sep 2023 06:52 UTC

robf...@gmail.com wrote:
> I have many more than 3 cycles for an iteration. An FMA takes 8 cycles and there are multiple per iteration.
> However, I should have looked at my micro-code more closely. There is indeed no difference between
> calculating out to 64 bits or 48 bits because of the number of bits reached in each iteration.
>
> To get 48 bits an iteration faster would require a much more accurate initial approximation which probably
> is not practical.
> // RSQRT initial approximation 0
> // y = y*(1.5f – xhalf *y*y); // first NR iteration 9.16 bits accurate

What if I told you that you can get up to 1.7 more bits after the first
NR iteration? You use a slightly different magic number in the bit hack,
then you also modify the two constants in that first NR step: I.e. not
exactly 1.5 and 0.5 but modified to give a Chebyshev-style error
distribution over the (0.5 to 2.0) input range.

The result is about 10.8 bits!

> // y = y*(1.5f – xhalf *y*y); // second NR iteration 17.69 bits accurate
~19 bits
> // y = y*(1.5f – xhalf *y*y); // third NR iteration 35 bits accurate
~38 bits
> // y = y*(1.5f – xhalf *y*y); // fourth NR iteration 70 bits accurate
~75 bits

BTW, I independently came up with the idea to modify multiple constants
and got more than a bit extra, then somebody tipped me off about a guy
from Poland who had done the full optimization of all three at the same
time and gotten half a bit more than me. :-)
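The bit hack under discussion is the classic fast reciprocal square root; a minimal Python sketch with the widely known magic number and the plain textbook 1.5/0.5 constants (i.e. without Terje's tuned variants, which would buy the extra ~1.7 bits):

```python
import struct

def fast_rsqrt(x):
    """Bit-hack 1/sqrt(x) seed plus one Newton-Raphson iteration."""
    # Reinterpret the float32 bits of x as a 32-bit unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # Classic magic constant; the tuned variant changes this number
    # and the NR constants below for a Chebyshev-style error curve.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One NR step with the plain constants: roughly 9 bits accurate.
    y = y * (1.5 - 0.5 * x * y * y)
    return y
```

Each further NR step roughly doubles the number of correct bits, which is why the iteration counts quoted above climb from ~9 to ~19 to ~38 bits.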

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: data sizes, Solving the Floating-Point Conundrum

<ueomln$18gim$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34289&group=comp.arch#34289

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Sun, 24 Sep 2023 08:57:58 +0200
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <ueomln$18gim$1@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>
<uel56v$1nko$1@gal.iecc.com> <uem23j$mol6$1@dont-email.me>
<uenj7h$2mhh$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 24 Sep 2023 06:57:59 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ad6274270477ca3d3df9816d904a4701";
logging-data="1327702"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ARl2pFdJzZKcc5SJg2NEwfHjToR0HsNDoYb/KNfDwhA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17
Cancel-Lock: sha1:2RSo4NR/yKQmloxCLBu9KHPMGzI=
In-Reply-To: <uenj7h$2mhh$1@gal.iecc.com>
 by: Terje Mathisen - Sun, 24 Sep 2023 06:57 UTC

John Levine wrote:
> According to Terje Mathisen <terje.mathisen@tmsw.no>:
>>> But on z/Series they do indeed have packed decimal vector instructions
>>> using the 128 bit vector registers as 31 digits and a sign. There is
>>
>> So really nybble math?
>
> No, packed decimal. They're fixed length 31 digit signed decimal
> numbers (the sign is in the 32nd nibble.) Most instructions can
> specify the maximum number of significant digits allowed in the
> result, and can force positive or negative signs on the result.
> There's a multiply and then shift result right, shift dividend left
> and divide, and shift and round, again with significant digit limits.
> If the result has too many digits, it's either reported with a
> condition code or an overflow interrupt. See chapter 25 of the zSeries
> POO.
>
> You could do all this with 128 bit integers but the significance
> checks and decimal rounding would take a lot of extra instructions.
>
> These are for financial stuff like bond pricing that have to match
> formulae invented in the era of mechanical desk calculators. Forty
> years ago, I managed to do bond prices and yields using 8087
> arithmetic but it would have been a lot easier in decimal since then I
> could have directly implemented the spec. I wouldn't have wanted to
> try to code up calculations that chained them together.
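The quoted layout (31 digits, sign in the 32nd nibble, 16 bytes total) can be sketched as a hypothetical Python encoder; 0xC/0xD are the conventional preferred sign codes for packed decimal:

```python
def pack_decimal(n, digits=31):
    """Encode an integer as z-style packed decimal: digits + sign nibble."""
    sign = 0xD if n < 0 else 0xC          # preferred sign codes
    s = str(abs(n)).rjust(digits, '0')     # zero-fill to 31 digits
    nibbles = [int(c) for c in s] + [sign] # 32 nibbles = 16 bytes
    out = bytearray()
    for i in range(0, len(nibbles), 2):
        out.append((nibbles[i] << 4) | nibbles[i + 1])
    return bytes(out)
```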

I did the exact same thing around 1983, using 80-bit FP to implement
Norwegian rounding rules for my father-in-law's gift card business.

I had to extract the fees first from each individual transaction, then
add them all together to get the sum: Doing the same first summing and
then calculating the fee from the total would have given very nearly the
same results, but not guaranteed for all situations.
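A toy version of why the order matters, with a hypothetical 2% fee and half-up rounding to cents (not the actual Norwegian rules):

```python
from decimal import Decimal, ROUND_HALF_UP

def fee(amount, rate=Decimal('0.02')):
    # Hypothetical per-transaction fee, rounded to cents.
    return (amount * rate).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

txns = [Decimal('10.24'), Decimal('10.24'), Decimal('10.24')]
per_txn = sum(fee(t) for t in txns)  # round each fee, then sum
on_total = fee(sum(txns))            # sum first, then take one fee
```

The two results differ by a cent, which is exactly the "very nearly the same, but not guaranteed" situation described above.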

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: lotsa money and data sizes, Solving the Floating-Point Conundrum

<ueoq2k$gcit$1@newsreader4.netcologne.de>


https://news.novabbs.org/devel/article-flat.php?id=34290&group=comp.arch#34290

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-397b-0-9739-9617-e31c-49d9.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: lotsa money and data sizes, Solving the Floating-Point Conundrum
Date: Sun, 24 Sep 2023 07:56:04 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <ueoq2k$gcit$1@newsreader4.netcologne.de>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<2023Sep23.123024@mips.complang.tuwien.ac.at> <uenjjj$2mhh$2@gal.iecc.com>
<09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com>
<uenn8d$31cl$1@gal.iecc.com>
Injection-Date: Sun, 24 Sep 2023 07:56:04 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-397b-0-9739-9617-e31c-49d9.ipv6dyn.netcologne.de:2001:4dd6:397b:0:9739:9617:e31c:49d9";
logging-data="537181"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 24 Sep 2023 07:56 UTC

John Levine <johnl@taugh.com> schrieb:

> Z implements a lot of the complex instructions in microcode (which they
> call millicode) written in the hardwired subset of the instruction set,
> but apparently DFP is hardware.
>
> This paper describes the hardware and software support for DFP:
>
> https://speleotrove.com/mfc/files/schwarz2009-decimalFP-on-z10.pdf

Very interesting link, thanks!

A few interesting snippets: They give the cycle time of the z10
as 15 FO4, which they say is much faster than prior generations.
Not sure how that compares to current designs, but it seems
fast to me.

They also write

"[...] the execution pipeline [for the IBM z] for one instruction
includes both a memory access and an execution stage, whereas
RISC computers require multiple instructions to accomplish the
same task. Nevertheless, resolving memory interlock dependencies
is a concern. Since the operands are in memory, using the result
of a prior operation creates an interlock in memory. If the
operations are not spaced apart in time, the load/store unit (LSU)
or IDU must compare the full addresses to determine the interlock
and somehow bypass the operands. The new decimal floating-point
architecture makes dependencies easier and faster to handle because
the interlocks are simply in registers."

Fixed-point BCD operations are also (to me) surprisingly slow:

"For addition and subtraction, the execution latency is seven
cycles for operands of 8 bytes or less and nine cycles for
operands with greater length. This includes all special cases,
including overflow."

Seven cycles (105 FO4 gate delays) seems like a lot for adding,
but I guess that just speaks to the complexity of BCD arithmetic.
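Part of that complexity is the decimal-adjust correction every nibble needs; a software sketch of the per-nibble logic, operating on BCD-encoded Python integers:

```python
def bcd_add(a, b):
    """Add two BCD-encoded non-negative integers, nibble by nibble."""
    result, carry, shift = 0, 0, 0
    while a or b or carry:
        d = (a & 0xF) + (b & 0xF) + carry
        if d > 9:
            d = (d + 6) & 0xF  # skip the six unused codes 0xA-0xF
            carry = 1
        else:
            carry = 0
        result |= d << shift
        shift += 4
        a >>= 4
        b >>= 4
    return result
```

In hardware the adjust-and-carry decision sits on the critical path of every digit, which helps explain the multi-cycle latency.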

Re: lotsa money and data sizes, Solving the Floating-Point Conundrum

<e2196cd2-5268-4b04-b7ad-19b5b2a4cb8fn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=34291&group=comp.arch#34291

Newsgroups: comp.arch
X-Received: by 2002:ad4:5148:0:b0:65b:7e2:a2db with SMTP id g8-20020ad45148000000b0065b07e2a2dbmr5984qvq.13.1695552590174;
Sun, 24 Sep 2023 03:49:50 -0700 (PDT)
X-Received: by 2002:a05:6870:98a5:b0:1dc:736c:fca8 with SMTP id
eg37-20020a05687098a500b001dc736cfca8mr1691901oab.8.1695552589907; Sun, 24
Sep 2023 03:49:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 24 Sep 2023 03:49:49 -0700 (PDT)
In-Reply-To: <ueoq2k$gcit$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:859c:9294:d188:1b63;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:859c:9294:d188:1b63
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<2023Sep23.123024@mips.complang.tuwien.ac.at> <uenjjj$2mhh$2@gal.iecc.com>
<09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com> <uenn8d$31cl$1@gal.iecc.com>
<ueoq2k$gcit$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e2196cd2-5268-4b04-b7ad-19b5b2a4cb8fn@googlegroups.com>
Subject: Re: lotsa money and data sizes, Solving the Floating-Point Conundrum
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Sun, 24 Sep 2023 10:49:50 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3677
 by: Michael S - Sun, 24 Sep 2023 10:49 UTC

On Sunday, September 24, 2023 at 10:56:08 AM UTC+3, Thomas Koenig wrote:
> John Levine <jo...@taugh.com> schrieb:
> > Z implements a lot of the complex instructions in microcode (which they
> > call millicode) written in the hardwired subset of the instruction set,
> > but apparently DFP is hardware.
> >
> > This paper describes the hardware and software support for DFP:
> >
> > https://speleotrove.com/mfc/files/schwarz2009-decimalFP-on-z10.pdf
> Very interesting link, thanks!
>
> A few interesting snippets: They give the cycle time of the z10
> as 15 FO4, which they say is much faster than prior generations.
> Not sure how that compares to current designs, but it seems
> fast to me.
>
> They also write
>
> "[...] the execution pipeline [for the IBM z] for one instruction
> includes both a memory access and an execution stage, whereas
> RISC computers require multiple instructions to accomplish the
> same task. Nevertheless, resolving memory interlock dependencies
> is a concern. Since the operands are in memory, using the result
> of a prior operation creates an interlock in memory. If the
> operations are not spaced apart in time, the load/store unit (LSU)
> or IDU must compare the full addresses to determine the interlock
> and somehow bypass the operands. The new decimal floating-point
> architecture makes dependencies easier and faster to handle because
> the interlocks are simply in registers."
>
> Fixed-point BCD operations are also (to me) surprisingly slow:
>
> "For addition and subtraction, the execution latency is seven
> cycles for operands of 8 bytes or less and nine cycles for
> operands with greater length. This includes all special cases,
> including overflow."
>
> Seven cycles (105 FO4 gate delays) seems like a lot for adding,
> but I guess that just speaks to the complexity of BCD arithmetic.

IIRC, z10 was IBM's last "native CISC" design in the zArch series.
Starting from z196 they crack load-op into 2 or more uOps, just
like the majority of x86 cores do.
It's hard to be sure, because the terminology used by IBM is so unique.

Re: Solving the Floating-Point Conundrum

<95d4fd70-4d13-4c62-bd1b-fc1b64d2d481n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=34292&group=comp.arch#34292

Newsgroups: comp.arch
X-Received: by 2002:a0c:fc52:0:b0:65a:f499:3826 with SMTP id w18-20020a0cfc52000000b0065af4993826mr25996qvp.4.1695554772884;
Sun, 24 Sep 2023 04:26:12 -0700 (PDT)
X-Received: by 2002:a05:6870:8684:b0:1c8:f237:303a with SMTP id
p4-20020a056870868400b001c8f237303amr2071903oam.5.1695554772377; Sun, 24 Sep
2023 04:26:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 24 Sep 2023 04:26:12 -0700 (PDT)
In-Reply-To: <ueombe$18fbr$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com> <f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com> <uefkrv$ag9f$1@newsreader4.netcologne.de>
<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com> <9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>
<78cd4ff6-d715-4886-950d-cb1a8d3c6654n@googlegroups.com> <f2fd635d-71e6-4757-877a-5bedb276afc0n@googlegroups.com>
<c2f2f9ca-0789-48b5-9047-024f69e2116cn@googlegroups.com> <ueombe$18fbr$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <95d4fd70-4d13-4c62-bd1b-fc1b64d2d481n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: robfi680@gmail.com (robf...@gmail.com)
Injection-Date: Sun, 24 Sep 2023 11:26:12 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3861
 by: robf...@gmail.com - Sun, 24 Sep 2023 11:26 UTC

On Sunday, September 24, 2023 at 2:52:35 AM UTC-4, Terje Mathisen wrote:
> robf...@gmail.com wrote:
> > I have many more than 3 cycles for an iteration. An FMA takes 8 cycles and there are multiple per iteration.
> > However, I should have looked at my micro-code more closely. There is indeed no difference between
> > calculating out to 64 bits or 48 bits because of the number of bits reached in each iteration.
> >
> > To get 48 bits an iteration faster would require a much more accurate initial approximation which probably
> > is not practical.
> > // RSQRT initial approximation 0
> > // y = y*(1.5f – xhalf *y*y); // first NR iteration 9.16 bits accurate
> What if I told you that you can get up to 1.7 more bits after the first
> NR iteration? You use a slightly different magic number in the bit hack,
> then you also modify the two constants in that first NR step: I.e. not
> exactly 1.5 and 0.5 but modified to give a Chebyshev-style error
> distribution over the (0.5 to 2.0) input range.
>
> The result is about 10.8 bits!
> > // y = y*(1.5f – xhalf *y*y); // second NR iteration 17.69 bits accurate
> ~19 bits
> > // y = y*(1.5f – xhalf *y*y); // third NR iteration 35 bits accurate
> ~38 bits
> > // y = y*(1.5f – xhalf *y*y); // fourth NR iteration 70 bits accurate
> ~75 bits
>
> BTW, I independently came up with the idea to modify multiple constants
> and got more than a bit extra, then somebody tipped me off about a guy
> from Poland who had done the full optimization of all three at the same
> time and gotten half a bit more than me. :-)
> Terje
>
I believe it. It is a good hack. I found the hack researching reciprocal square
roots, but decided to go with the simpler algorithm. It cost clock cycles to
load the different constants and the extra accuracy was not needed.

> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: data sizes, Solving the Floating-Point Conundrum

<VbZPM.124844$bmw6.27779@fx10.iad>


https://news.novabbs.org/devel/article-flat.php?id=34293&group=comp.arch#34293

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx10.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Newsgroups: comp.arch
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com> <71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com> <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <2023Sep23.123024@mips.complang.tuwien.ac.at>
Lines: 17
Message-ID: <VbZPM.124844$bmw6.27779@fx10.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 24 Sep 2023 16:10:29 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 24 Sep 2023 16:10:29 GMT
X-Received-Bytes: 1639
 by: Scott Lurndal - Sun, 24 Sep 2023 16:10 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>John Levine <johnl@taugh.com> writes:
>>But on z/Series they do indeed have packed decimal vector instructions
>>using the 128 bit vector registers as 31 digits and a sign. There is
>>also decimal floating point
>
>Decimal floating-point hardware seems to be a marketing feature to me.

Burroughs had decimal floating point (100 digit mantissa, 2 digit exponent)
in the B3500 (circa 1965). It was memory-to-memory, no registers involved.

Turned out that the COBOL customers weren't interested, so the next generation
(B4700) removed it and replaced it with a 24 digit decimal accumulator (20 digit
mantissa, 2 digit exponent, two one-digit signs).

Look for 1025475_B2500_B3500_RefMan_Oct69.pdf on bitsavers.org.

Re: Solving the Floating-Point Conundrum

<fbed57b4-1553-4b63-b39e-c130754b3aa8n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=34294&group=comp.arch#34294

Newsgroups: comp.arch
X-Received: by 2002:ad4:5691:0:b0:65a:fc29:fc88 with SMTP id bd17-20020ad45691000000b0065afc29fc88mr23203qvb.1.1695578379491;
Sun, 24 Sep 2023 10:59:39 -0700 (PDT)
X-Received: by 2002:a05:6808:1511:b0:3a7:56ad:cb9e with SMTP id
u17-20020a056808151100b003a756adcb9emr2766626oiw.9.1695578379073; Sun, 24 Sep
2023 10:59:39 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 24 Sep 2023 10:59:38 -0700 (PDT)
In-Reply-To: <memo.20230917185814.16292G@jgd.cix.co.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa34:c000:9064:ca27:b81:a36e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa34:c000:9064:ca27:b81:a36e
References: <ue788u$4u5l$1@newsreader4.netcologne.de> <memo.20230917185814.16292G@jgd.cix.co.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fbed57b4-1553-4b63-b39e-c130754b3aa8n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: jsavard@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 24 Sep 2023 17:59:39 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 48
 by: Quadibloc - Sun, 24 Sep 2023 17:59 UTC

On Sunday, September 17, 2023 at 11:58:18 AM UTC-6, John Dallman wrote:
> No
> architecture can do everything efficiently.

This is true enough.

I am aiming, though, to err in the direction of doing lots and lots of things _fast_,
with the penalty that the chip has a lot of extra transistors, and thus extra die
size, as the price of doing so.

The reason I'm making this choice is that I envisage the user's situation to be:

The user is sitting in front of one computer, with one particular type of chip in it.
Running out and buying a different computer with a different chip to make one
particular program run faster is not an option.

So I'm going to try to include in my chip stuff like...

hardware support for packed decimal
hardware support for IBM System/360 hexadecimal floating point

because people do run Hercules on their computers and so on.

On the page

http://www.quadibloc.com/arch/per14.htm

I have now added, at the bottom of the page, a scheme, involving having dual-channel
memory where each channel is 192 bits wide, that permits the operating system
to allocate blocks of 384-bit wide memory, 288-bit wide memory, 240-bit wide
memory, and 256-bit wide memory. Only the 256-bit wide memory requires a division
by three during the translation from internal logical addresses to physical addresses.

So a chip making use of this scheme could allow a very wide selection of architectures
from history to be efficiently emulated.

John Savard

Re: lotsa money and data sizes, Solving the Floating-Point Conundrum

<ueq0gm$1glh$1@gal.iecc.com>


https://news.novabbs.org/devel/article-flat.php?id=34295&group=comp.arch#34295

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: lotsa money and data sizes, Solving the Floating-Point Conundrum
Date: Sun, 24 Sep 2023 18:52:06 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <ueq0gm$1glh$1@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com> <uenn8d$31cl$1@gal.iecc.com> <ueoq2k$gcit$1@newsreader4.netcologne.de>
Injection-Date: Sun, 24 Sep 2023 18:52:06 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="49841"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com> <uenn8d$31cl$1@gal.iecc.com> <ueoq2k$gcit$1@newsreader4.netcologne.de>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Sun, 24 Sep 2023 18:52 UTC

According to Thomas Koenig <tkoenig@netcologne.de>:
>John Levine <johnl@taugh.com> schrieb:
>
>> Z implements a lot of the complex instructions in microcode (which they
>> call millicode) written in the hardwired subset of the instruction set,
>> but apparently DFP is hardware.
>>
>> This paper describes the hardware and software support for DFP:
>>
>> https://speleotrove.com/mfc/files/schwarz2009-decimalFP-on-z10.pdf
>
>Very interesting link, thanks!
>
>A few interesting snippets: They give the cycle time of the z10
>as 15 FO4, which they say is much faster than prior generations.
>Not sure how that compares to current designs, but it seems
>fast to me.

The z15 redbook says the CPUs run at 5.2GHz, same as the z14, but it's
25% faster because it has bigger faster caches.

Read all about the z15 here. CPU description start here:

https://www.redbooks.ibm.com/redbooks/pdfs/sg248851.pdf

The machine is packaged in drawers each of which contain CPUs, four
levels of cache, memory, and I/O controllers. Every CPU can see all
the memory but using memory in other drawers is much slower so they
have stuff with odd acronyms to try and keep related work physically
in one place.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Solving the Floating-Point Conundrum

<ueq3hl$h878$1@newsreader4.netcologne.de>


https://news.novabbs.org/devel/article-flat.php?id=34296&group=comp.arch#34296

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd6-397b-0-559a-1e42-bbb6-1acb.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Solving the Floating-Point Conundrum
Date: Sun, 24 Sep 2023 19:43:49 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <ueq3hl$h878$1@newsreader4.netcologne.de>
References: <ue788u$4u5l$1@newsreader4.netcologne.de>
<memo.20230917185814.16292G@jgd.cix.co.uk>
<fbed57b4-1553-4b63-b39e-c130754b3aa8n@googlegroups.com>
Injection-Date: Sun, 24 Sep 2023 19:43:49 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd6-397b-0-559a-1e42-bbb6-1acb.ipv6dyn.netcologne.de:2001:4dd6:397b:0:559a:1e42:bbb6:1acb";
logging-data="565480"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 24 Sep 2023 19:43 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:

> So I'm going to try to include in my chip stuff like...

[...]

> hardware support for IBM System/360 hexadecimal floating point

What on Earth for?

Binary is better, and people have not been running serious
scientific software on IBM mainframes for decades - there is not
even a Fortran 90, let alone a Fortran 2008, compiler for z/OS -
VS Fortran is stuck at Fortran 77.
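For readers unfamiliar with the format under discussion: the S/360 short
hexadecimal float is simple enough to decode in a few lines. This is a
sketch based on the published field layout (1 sign bit, 7-bit excess-64
base-16 exponent, 24-bit fraction with no hidden bit); the function name
is mine, not from any IBM library.

```python
def decode_s360_hexfp(word):
    """Decode a 32-bit IBM System/360 short hexadecimal float.

    Layout: 1 sign bit, 7-bit excess-64 exponent (base 16),
    and a 24-bit fraction interpreted as 0.f (no hidden bit),
    so the value is (-1)^s * 0.f * 16^(e-64).
    """
    sign = -1.0 if word >> 31 else 1.0
    exp = ((word >> 24) & 0x7F) - 64          # excess-64, base 16
    frac = (word & 0xFFFFFF) / float(1 << 24)  # 0 <= frac < 1
    return sign * frac * 16.0 ** exp

# 0x41100000 encodes 1.0: fraction 0x100000/2^24 = 0.0625, times 16^1
print(decode_s360_hexfp(0x41100000))
```

Because normalization only guarantees a nonzero leading hex digit, up to
three leading fraction bits can be zero - the "wobbling precision" that
makes base-16 floats less accurate than binary ones of the same width.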

Re: lotsa money and data sizes, Solving the Floating-Point Conundrum

<8734z3auj4.fsf@localhost>


https://news.novabbs.org/devel/article-flat.php?id=34297&group=comp.arch#34297

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: lynn@garlic.com (Lynn Wheeler)
Newsgroups: comp.arch
Subject: Re: lotsa money and data sizes, Solving the Floating-Point Conundrum
Date: Sun, 24 Sep 2023 15:52:47 -1000
Organization: Wheeler&Wheeler
Lines: 13
Message-ID: <8734z3auj4.fsf@localhost>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<2023Sep23.123024@mips.complang.tuwien.ac.at>
<uenjjj$2mhh$2@gal.iecc.com>
<09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com>
<uenn8d$31cl$1@gal.iecc.com> <ueoq2k$gcit$1@newsreader4.netcologne.de>
<e2196cd2-5268-4b04-b7ad-19b5b2a4cb8fn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="6ca44d31c81d5ef92e269aa8ff865e9d";
logging-data="1819303"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/mjON0lEx8DgaZ6eLsNtlupU3LKwTF1vY="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:qRPHvCajkf3kScCRw7jXHzYBgSI=
sha1:7ugakpONoelykjUrAz8FzjR7ysk=
 by: Lynn Wheeler - Mon, 25 Sep 2023 01:52 UTC

Michael S <already5chosen@yahoo.com> writes:
> IIRC, z10 was IBM's last "native CISC" design in the zArch series.
> Starting from z196 they crack load-op into 2 or more uOps, just
> like the majority of x86 cores do.
> It's hard to be sure, because the terminology used by IBM is so unique.

the other thing about z10->z196 was the claim that at least half of the
per-processor throughput increase came from the introduction of
out-of-order execution, branch prediction, etc ... i also assumed that
implied moving to micro-ops ...

--
virtualization experience starting Jan1968, online at home since Mar1970

Re: data sizes, Solving the Floating-Point Conundrum

<ueqqrd$hfo$1@gal.iecc.com>


https://news.novabbs.org/devel/article-flat.php?id=34298&group=comp.arch#34298

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: data sizes, Solving the Floating-Point Conundrum
Date: Mon, 25 Sep 2023 02:21:33 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <ueqqrd$hfo$1@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <2023Sep23.123024@mips.complang.tuwien.ac.at> <VbZPM.124844$bmw6.27779@fx10.iad>
Injection-Date: Mon, 25 Sep 2023 02:21:33 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="17912"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <uel56v$1nko$1@gal.iecc.com> <2023Sep23.123024@mips.complang.tuwien.ac.at> <VbZPM.124844$bmw6.27779@fx10.iad>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Mon, 25 Sep 2023 02:21 UTC

According to Scott Lurndal <slp53@pacbell.net>:
>>Decimal floating-point hardware seems to be a marketing feature to me.
>
>Burroughs had decimal floating point (100 digit mantissa, 2 digit exponent)
>in the B3500 (circa 1965). It was memory-to-memory, no registers involved.
>
>Turned out that the COBOL customers weren't interested, ...

I'm not surprised. I would guess the modern market for DFP is more
likely to be high-speed financial stuff (what's known as picking up
pennies in front of a steamroller), not accounting.
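The appeal of decimal arithmetic for money is easy to demonstrate in
software; here Python's `decimal` module stands in for hardware DFP (a
sketch, not a claim about any particular machine's DFP instructions):

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so sums of cents drift:
print(0.1 + 0.2)  # not exactly 0.3

# Decimal arithmetic keeps monetary values exact:
total = Decimal("0.10") + Decimal("0.20")
print(total)

# Rounding to cents is explicit and reproducible
# (banker's rounding, the usual choice in finance):
fee = Decimal("101.955").quantize(Decimal("0.01"),
                                  rounding=ROUND_HALF_EVEN)
print(fee)
```

Whether that justifies dedicated hardware, rather than a software
library, is exactly the question the thread is debating.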
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
