devel / comp.arch / Re: memory speeds, Solving the Floating-Point Conundrum

Subject - Author
* Solving the Floating-Point Conundrum - Quadibloc
+* Re: Solving the Floating-Point Conundrum - Stephen Fuld
|+* Re: Solving the Floating-Point Conundrum - Quadibloc
||+- Re: Solving the Floating-Point Conundrum - John Levine
||`- Re: Solving the Floating-Point Conundrum - Stephen Fuld
|`* Re: Solving the Floating-Point Conundrum - mac
| `- Re: Solving the Floating-Point Conundrum - Thomas Koenig
+* Re: Solving the Floating-Point Conundrum - MitchAlsup
|+* Re: Solving the Floating-Point Conundrum - Quadibloc
||+* Re: Solving the Floating-Point Conundrum - MitchAlsup
|||`* Re: Solving the Floating-Point Conundrum - Quadibloc
||| `* Re: Solving the Floating-Point Conundrum - MitchAlsup
|||  `- Re: Solving the Floating-Point Conundrum - Quadibloc
||`- Re: Solving the Floating-Point Conundrum - John Dallman
|+- Re: Solving the Floating-Point Conundrum - Scott Lurndal
|`* Re: Solving the Floating-Point Conundrum - Quadibloc
| +* Re: Solving the Floating-Point Conundrum - MitchAlsup
| |`* Re: Solving the Floating-Point Conundrum - BGB
| | +* Re: Solving the Floating-Point Conundrum - Scott Lurndal
| | |+* Re: Solving the Floating-Point Conundrum - Quadibloc
| | ||+* Re: Solving the Floating-Point Conundrum - MitchAlsup
| | |||`- Re: Solving the Floating-Point Conundrum - Terje Mathisen
| | ||`* Re: Solving the Floating-Point Conundrum - BGB
| | || `* Re: Solving the Floating-Point Conundrum - Stephen Fuld
| | ||  `* Re: Solving the Floating-Point Conundrum - Scott Lurndal
| | ||   `- Re: Solving the Floating-Point Conundrum - MitchAlsup
| | |`* Re: Solving the Floating-Point Conundrum - Thomas Koenig
| | | `* Re: memory speeds, Solving the Floating-Point Conundrum - John Levine
| | |  +- Re: memory speeds, Solving the Floating-Point Conundrum - Quadibloc
| | |  +* Re: memory speeds, Solving the Floating-Point Conundrum - Scott Lurndal
| | |  |+* Re: memory speeds, Solving the Floating-Point Conundrum - MitchAlsup
| | |  ||+* Re: memory speeds, Solving the Floating-Point Conundrum - EricP
| | |  |||+* Re: memory speeds, Solving the Floating-Point Conundrum - Scott Lurndal
| | |  ||||`* Re: memory speeds, Solving the Floating-Point Conundrum - EricP
| | |  |||| `- Re: memory speeds, Solving the Floating-Point Conundrum - Scott Lurndal
| | |  |||+- Re: memory speeds, Solving the Floating-Point Conundrum - Quadibloc
| | |  |||+* Re: memory speeds, Solving the Floating-Point Conundrum - John Levine
| | |  ||||`* Re: memory speeds, Solving the Floating-Point Conundrum - EricP
| | |  |||| `- Re: memory speeds, Solving the Floating-Point Conundrum - MitchAlsup
| | |  |||+- Re: memory speeds, Solving the Floating-Point Conundrum - MitchAlsup
| | |  |||`- Re: memory speeds, Solving the Floating-Point Conundrum - MitchAlsup
| | |  ||`* Re: memory speeds, Solving the Floating-Point Conundrum - Timothy McCaffrey
| | |  || `- Re: memory speeds, Solving the Floating-Point Conundrum - MitchAlsup
| | |  |`* Re: memory speeds, Solving the Floating-Point Conundrum - Quadibloc
| | |  | +- Re: memory speeds, Solving the Floating-Point Conundrum - MitchAlsup
| | |  | `- Re: memory speeds, Solving the Floating-Point Conundrum - moi
| | |  `* Re: memory speeds, Solving the Floating-Point Conundrum - Anton Ertl
| | |   +* Re: memory speeds, Solving the Floating-Point Conundrum - Michael S
| | |   |+* Re: memory speeds, Solving the Floating-Point Conundrum - John Levine
| | |   ||+- Re: memory speeds, Solving the Floating-Point Conundrum - Lynn Wheeler
| | |   ||`* Re: memory speeds, Solving the Floating-Point Conundrum - Anton Ertl
| | |   || +- Re: memory speeds, Solving the Floating-Point Conundrum - EricP
| | |   || `- Re: memory speeds, Solving the Floating-Point Conundrum - John Levine
| | |   |`* Re: memory speeds, Solving the Floating-Point Conundrum - Anton Ertl
| | |   | `- Re: memory speeds, Solving the Floating-Point Conundrum - Stephen Fuld
| | |   `* Re: memory speeds, Solving the Floating-Point Conundrum - Thomas Koenig
| | |    `- Re: memory speeds, Solving the Floating-Point Conundrum - Anton Ertl
| | +* Re: Solving the Floating-Point Conundrum - Quadibloc
| | |`* Re: Solving the Floating-Point Conundrum - BGB
| | | `- Re: Solving the Floating-Point Conundrum - Stephen Fuld
| | +- Re: Solving the Floating-Point Conundrum - MitchAlsup
| | `- Re: Solving the Floating-Point Conundrum - MitchAlsup
| +* Re: Solving the Floating-Point Conundrum - Quadibloc
| |`* Re: Solving the Floating-Point Conundrum - Quadibloc
| | `* Re: Solving the Floating-Point Conundrum - BGB
| |  `- Re: Solving the Floating-Point Conundrum - Scott Lurndal
| `* Re: Solving the Floating-Point Conundrum - Timothy McCaffrey
|  +- Re: Solving the Floating-Point Conundrum - Scott Lurndal
|  +- Re: Solving the Floating-Point Conundrum - Stephen Fuld
|  +* Re: Solving the Floating-Point Conundrum - Quadibloc
|  |`* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | +* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | |`* Re: Solving the Floating-Point Conundrum - Thomas Koenig
|  | | `* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | |  `* Re: Solving the Floating-Point Conundrum - Thomas Koenig
|  | |   `* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | |    `- Re: Solving the Floating-Point Conundrum - Thomas Koenig
|  | +* Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | |+- Re: Solving the Floating-Point Conundrum - Terje Mathisen
|  | |`* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | +* Re: Solving the Floating-Point Conundrum - Thomas Koenig
|  | | |+* Re: Solving the Floating-Point Conundrum - John Dallman
|  | | ||+- Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | ||+* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | |||+* Re: Solving the Floating-Point Conundrum - Michael S
|  | | ||||+* Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | | |||||`- Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | ||||`- Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | |||+* Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | | ||||`- Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | |||`* Re: Solving the Floating-Point Conundrum - Terje Mathisen
|  | | ||| `* Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | | |||  +* Re: Solving the Floating-Point Conundrum - robf...@gmail.com
|  | | |||  |+- Re: Solving the Floating-Point Conundrum - Scott Lurndal
|  | | |||  |+* Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | | |||  ||`- Re: Solving the Floating-Point Conundrum - George Neuner
|  | | |||  |+- Re: Solving the Floating-Point Conundrum - Thomas Koenig
|  | | |||  |`* Re: Solving the Floating-Point Conundrum - Terje Mathisen
|  | | |||  | `- Re: Solving the Floating-Point Conundrum - BGB
|  | | |||  `* Re: Solving the Floating-Point Conundrum - Terje Mathisen
|  | | |||   +* Re: Solving the Floating-Point Conundrum - comp.arch
|  | | |||   `* Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | | ||`* Re: Solving the Floating-Point Conundrum - Quadibloc
|  | | |`* Re: Solving the Floating-Point Conundrum - John Levine
|  | | `- Re: Solving the Floating-Point Conundrum - MitchAlsup
|  | +- Re: Solving the Floating-Point Conundrum - Quadibloc
|  | `* Re: Solving the Floating-Point Conundrum - Stefan Monnier
|  +* Re: Solving the Floating-Point Conundrum - BGB
|  `- Re: Solving the Floating-Point Conundrum - Thomas Koenig
+* Re: Solving the Floating-Point Conundrum - MitchAlsup
`- Re: Solving the Floating-Point Conundrum - Quadibloc

Re: Solving the Floating-Point Conundrum

<5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34242&group=comp.arch#34242

X-Received: by 2002:a05:622a:19a9:b0:417:611e:98f4 with SMTP id u41-20020a05622a19a900b00417611e98f4mr48889qtc.8.1695247710106;
Wed, 20 Sep 2023 15:08:30 -0700 (PDT)
X-Received: by 2002:a05:6808:148f:b0:3a7:b15d:b59d with SMTP id
e15-20020a056808148f00b003a7b15db59dmr1751165oiw.11.1695247709965; Wed, 20
Sep 2023 15:08:29 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 20 Sep 2023 15:08:29 -0700 (PDT)
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:91c2:a2e2:851b:a609;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:91c2:a2e2:851b:a609
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 20 Sep 2023 22:08:30 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1883
 by: MitchAlsup - Wed, 20 Sep 2023 22:08 UTC

On Tuesday, September 12, 2023 at 12:32:41 AM UTC-5, Quadibloc wrote:
> After trying to work out all kinds of bizarre ways to make a computer
> that uses a 12-bit basic unit of memory work efficiently with
> single-precision floats that are 36 bits long, and double-precision
> floats that are 60 bits long, I've decided it's time to throw in the towel
> on that one, and achieve my goals in another way.
<
Is there a floating point conundrum ??
<
Is there a floating point conundrum that needs to be solved ??
<
What properties do current floating point numbers have (or not have) that
is the basis for this conundrum ??

Re: memory speeds, Solving the Floating-Point Conundrum

<87jzskz81v.fsf@localhost>

https://news.novabbs.org/devel/article-flat.php?id=34243&group=comp.arch#34243

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: lynn@garlic.com (Lynn Wheeler)
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
Date: Wed, 20 Sep 2023 14:25:32 -1000
Organization: Wheeler&Wheeler
Lines: 68
Message-ID: <87jzskz81v.fsf@localhost>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<ue0esp$ps2$1@gal.iecc.com>
<2023Sep20.170125@mips.complang.tuwien.ac.at>
<2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com>
<uefmbf$2pmh$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="eedd9c83653f5fc39830a6e4a9a1560d";
logging-data="3347317"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+6fNGACo4KPn0iQ8xIlGOWchF6r5KmhXU="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:J2S84/UU98HoiifC9BLyadTx4Pg=
sha1:acqOT+kdMXMMPF/nSzWmP0d11iI=
 by: Lynn Wheeler - Thu, 21 Sep 2023 00:25 UTC

John Levine <johnl@taugh.com> writes:
> The 801 was a little project at IBM Research to see how much they
> could strip down the hardware and still get good performance with a
> highly optimizing compiler. It was never intended to be a product but
> it worked so well that they later used it in channel controllers and
> it evolved into ROMP and POWER. Vax was always intended to be a
> flagship product since it was evident that the 16 bit PDP-11 was
> running out of gas and (much though its users wished otherwise) word
> addressed 36 bit machines were a dead end,

I would sometimes claim that John was attempting to go to the opposite
extreme of the design of the (failed) Future System effort.
http://www.jfsowa.com/computer/memo125.htm
http://people.cs.clemson.edu/~mark/fs.html

The late 70s plan was to use 801/risc to replace a large number of
different custom CISC microprocessors ... going to common programming (in
place of a large amount of different programming software) ... a common
Iliad chip to replace the custom CISC microprocessors in low & mid-range
370s ... 4361&4381 followon to 4331&4341 (aka the 4341 ran about 1MIPS
370, but its CISC microprocessor averaged ten native instructions per
simulated 370 instruction).

I helped with the white paper that showed VLSI technology had gotten to
the point where most of the 370 instruction set could be directly
implemented in circuits ... rather than simulated in programming at an
avg. of ten native instructions per 370 instruction. For that and other
reasons, the various 801/risc efforts of the period floundered (and some
number of 801/risc engineers went to risc efforts at other vendors).

Iliad was a 16bit chip; the Los Gatos lab was doing "Blue Iliad", the 1st
32bit 801/risc ... a really large, hot chip that never reached production
(Los Gatos had previously done JIB-prime ... a really slow CISC
microprocessor used in the 3880 disk controller through much of the 80s).
In Dec80, one of them gave two weeks notice, but spent the last two weeks
on the "Blue Iliad" chip ... before leaving for risc snake at HP.

801/RISC ROMP was going to be used for the displaywriter follow-on (all
written in PL.8) ... when that got canceled, they decided to retarget it
for the unix workstation market (and got the company that had done the
AT&T unix port to the IBM/PC as PC/IX to do a port for ROMP). Some things
needed to be added to ROMP, like a protection domain for an "open" unix
operating system (not needed for the "closed" displaywriter).

For instance, any code could reload segment registers as easily as it
could load addresses into general registers. This led to ROMP being
called "40bit addresses" and RIOS being called 52bit (even though the
addresses were 32bit). There were sixteen segment registers indexed by
the top four bits of a 32bit address (with a 28bit segment displacement);
ROMP had 12bit segment identifiers (28+12=40) and RIOS had 24bit segment
identifiers (28+24=52) ... aka the 40/52 description was still used long
after the move to the unix model, when user code could no longer
arbitrarily change segment values (the 40/52 designation dates from when
any code could arbitrarily change segment values).
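
As a rough C sketch of the segment-register scheme just described (the
function name and field layout are illustrative only, not taken from any
ROMP documentation):

#include <stdint.h>

/* Sketch: form the ROMP-style 40-bit address described above.  The top
   four bits of a 32-bit address pick one of sixteen segment registers;
   the 12-bit segment id from that register is concatenated with the
   28-bit displacement (12 + 28 = 40).  With 24-bit segment ids this
   becomes the 52-bit RIOS flavor. */
static uint64_t romp_style_address(const uint32_t seg_regs[16], uint32_t addr)
{
    uint32_t seg_id = seg_regs[addr >> 28] & 0xFFFu;   /* 12-bit segment id   */
    uint32_t disp   = addr & 0x0FFFFFFFu;              /* 28-bit displacement */
    return ((uint64_t)seg_id << 28) | disp;
}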

.... and old email from long ago and far away

Date: 79/07/11 11:00:03
To: wheeler

i heard a funny story: seems the MIT LISP machine people proposed that
IBM furnish them with an 801 to be the engine for their prototype.
B.O. Evans considered their request, and turned them down.. offered
them an 8100 instead! (I hope they told him properly what they
thought of that)

--
virtualization experience starting Jan1968, online at home since Mar1970

Re: Solving the Floating-Point Conundrum

<ueggje$3bqoq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34244&group=comp.arch#34244

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Solving the Floating-Point Conundrum
Date: Wed, 20 Sep 2023 23:25:16 -0500
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <ueggje$3bqoq$1@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<8a5563da-3be8-40f7-bfb9-39eb5e889c8an@googlegroups.com>
<f097448b-e691-424b-b121-eab931c61d87n@googlegroups.com>
<ue788u$4u5l$1@newsreader4.netcologne.de> <ue7nkh$ne0$1@gal.iecc.com>
<9f5be6c2-afb2-452b-bd54-314fa5bed589n@googlegroups.com>
<e15a4faa-ac4c-4297-bb36-fb0cfcb8e631n@googlegroups.com>
<ue84ep$jmun$1@dont-email.me>
<069cbff0-1e4b-424a-b43d-439ffa0814b0n@googlegroups.com>
<7818370d-f8ec-4b57-b890-2461bd82904an@googlegroups.com>
<22f045ea-1ec6-4adf-8326-4b9246d17f93n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 21 Sep 2023 04:25:18 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e2fc4c590642e6fbf0dcc38b70c97aae";
logging-data="3533594"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18nBqi7qkRWXeoxGZ/q99hX"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:Dxo2aOfGxoYWPJXrj7I72ymkgOA=
Content-Language: en-US
In-Reply-To: <22f045ea-1ec6-4adf-8326-4b9246d17f93n@googlegroups.com>
 by: BGB - Thu, 21 Sep 2023 04:25 UTC

On 9/18/2023 9:49 PM, Quadibloc wrote:
> On Monday, September 18, 2023 at 8:30:12 PM UTC-6, JimBrakefield wrote:
>
>> For quadibloc, one can also provision seven 36-bit ALU lanes and for others five 48-bit ALU lanes !!
>
> http://www.quadibloc.com/arch/ar050401.htm
>
> Ages ago, I noted that one could put seven 36-bit floats, or five 51-bit floats, in a 256-bit
> memory word.
>

About the only times I really had much reason to vary from the
traditional power-of-2 floating point formats was for 3-wide vectors:
A, 3x S.E5.F4 ( 32 bits)
B, 3x S.E5.F15 ( 64 bits)
C, 3x S.E8.F33 (128 bits)

Usually, these were for specific reasons:
Using a narrower 4-wide representation was insufficient;
Using a wider format wasted too much space.

A, This is a shaved-down half-float. It is conceptually similar to the
GL_R11F_G11F_B10F format in OpenGL, just with 2 bits left over.

B, Technically organized as 3x Binary16 values, but where the high 16
bits (W) are divided into three 5-bit fields, which are glued onto the
low end of the mantissa.

C, organized similarly to B, just with Binary32 and 3x 10-bit fields.

The remaining 2-bits are generally treated as context-dependent, with
the W field typically being decoded as 0.0 (B/C) or 1.0 (A).
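
As a concrete reading of the B layout above, a small C sketch that unpacks
one 64-bit B-format value into three doubles. The exact bit positions (the
three Binary16 elements in the low 48 bits, their 5-bit extensions in the
low 15 bits of W) and the helper names are assumptions for illustration;
the post does not give them.

#include <stdint.h>
#include <math.h>

/* Decode one S.E5.F15 element: a Binary16 with 5 extra mantissa bits
   glued onto the low end of the fraction, as described above. */
static double unpack_e5f15(uint16_t h, unsigned ext5)
{
    int      sign = (h >> 15) & 1;
    int      exp  = (h >> 10) & 0x1F;                  /* 5-bit exponent, bias 15 */
    uint32_t frac = ((uint32_t)(h & 0x3FF) << 5) | (ext5 & 0x1F);  /* 10+5 bits  */
    double   m;

    if (exp == 0)                       /* zero / subnormal */
        m = ldexp((double)frac, 1 - 15 - 15);
    else if (exp == 0x1F)               /* Inf/NaN range, simplified */
        m = frac ? NAN : INFINITY;
    else                                /* normal: implicit leading 1 */
        m = ldexp((double)(frac | (1u << 15)), exp - 15 - 15);
    return sign ? -m : m;
}

/* Unpack the whole 64-bit "B" vector: elements in the low 48 bits,
   extension fields packed into the high 16 bits (W). */
static void unpack_vec3_b(uint64_t v, double out[3])
{
    uint16_t w = (uint16_t)(v >> 48);
    for (int i = 0; i < 3; i++)
        out[i] = unpack_e5f15((uint16_t)(v >> (16 * i)),
                              (unsigned)(w >> (5 * i)) & 0x1F);
}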

Ironically, the B and C formats had their origins in one of my past 3D
engines on PC, where I had noted that (based on how they were being
used, *), it ended up being faster on average to keep data in the
narrower format and then pack/unpack to a wider format as-needed, than
to keep the data in a wider format.

*: They were being shuffled around in memory significantly more often
than they were being used in computation. They were mostly being used
for things like XYZ coordinates.

In some cases, the wider mantissa was typically more important than a
wider dynamic range (so, say, Binary32 with 10 more mantissa bits glued
on, was more useful than, say, Binary64 truncated down to 42 bits, with
effectively only 7 more mantissa bits).

A similar idea was carried over to BJX2 as well, where they only really
exist as special purpose format conversion ops (with any actual
calculations being carried out in the more standard formats).

....

Re: memory speeds, Solving the Floating-Point Conundrum

<2023Sep21.100140@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34245&group=comp.arch#34245

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
Date: Thu, 21 Sep 2023 08:01:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 19
Distribution: world
Message-ID: <2023Sep21.100140@mips.complang.tuwien.ac.at>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <udspsq$27b0q$1@dont-email.me> <qrmMM.7$5jrd.6@fx06.iad> <udu7us$3v2e2$1@newsreader4.netcologne.de> <ue0esp$ps2$1@gal.iecc.com> <2023Sep20.170125@mips.complang.tuwien.ac.at> <uefg6h$acf7$1@newsreader4.netcologne.de>
Injection-Info: dont-email.me; posting-host="5488005b62ec1cfd25855706d7a5ff27";
logging-data="3597685"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18l1+a3iJaZt+AZeue3673D"
Cancel-Lock: sha1:hWjBG26j7hpeYHPJJFdHNSpijC0=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 21 Sep 2023 08:01 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
>> However, given that the IBM 801 was only operational in 1980

And it gives a reference for that.

>https://acg.cis.upenn.edu/milom/cis501-Fall11/papers/cocke-RISC.pdf
>says it was operational in 1978.

Actually, it says: "The original 801 was completed in 1978 ...". My
guess is that the first 801 (with 16 registers and 16-bit and 32-bit
instructions) was completed in 1978, whereas the second 801 (with 32
registers and fixed-width 32-bit instructions) was operational in
1980.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: memory speeds, Solving the Floating-Point Conundrum

<2023Sep21.101341@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34246&group=comp.arch#34246

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
Date: Thu, 21 Sep 2023 08:13:41 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 22
Message-ID: <2023Sep21.101341@mips.complang.tuwien.ac.at>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <ue0esp$ps2$1@gal.iecc.com> <2023Sep20.170125@mips.complang.tuwien.ac.at> <2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com> <uefmbf$2pmh$1@gal.iecc.com>
Injection-Info: dont-email.me; posting-host="5488005b62ec1cfd25855706d7a5ff27";
logging-data="3602949"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18MQ3J2f6DiHaMLZZVGkjsn"
Cancel-Lock: sha1:rNiSFObN3KIQBU1I1X7/l2xdHI0=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 21 Sep 2023 08:13 UTC

John Levine <johnl@taugh.com> writes:
>I agree Vax would never have been a RISC but they do seem to have gone
>overboard with instruction encoding that had to be decoded and
>interpreted a byte at a time.

At least not without significant extra effort. They apparently did
not think about stuff like pipelining at all.

>Even S/360 made it easy to compute the
>addresses and fetch the operands in an instruction in parallel.

Maybe they thought about that, thanks to the Stretch experience, while
DEC engineers only had experience (or at least only that experience
was used) with CPUs where the cycle time was much faster than the
memory speed, so this property was not seen as a problem.
Interestingly, even the Nova with its simple architecture took a
significant number of cycles per instruction.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: memory speeds, Solving the Floating-Point Conundrum

<2023Sep21.102146@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34247&group=comp.arch#34247

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
Date: Thu, 21 Sep 2023 08:21:46 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 27
Message-ID: <2023Sep21.102146@mips.complang.tuwien.ac.at>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <udspsq$27b0q$1@dont-email.me> <qrmMM.7$5jrd.6@fx06.iad> <udu7us$3v2e2$1@newsreader4.netcologne.de> <ue0esp$ps2$1@gal.iecc.com> <2023Sep20.170125@mips.complang.tuwien.ac.at> <2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com>
Injection-Info: dont-email.me; posting-host="5488005b62ec1cfd25855706d7a5ff27";
logging-data="3602949"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/B2rfAPXfJsTKPUQn2/bC9"
Cancel-Lock: sha1:HuRgkMS8GHUydwJgNzMAwcHZMe8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 21 Sep 2023 08:21 UTC

Michael S <already5chosen@yahoo.com> writes:
>> It is interesting that the IBM 801 was started in 1974 and had a first
>> CPU in 1980, while VAX development started in 1976 according to
>> <https://en.wikipedia.org/wiki/VAX-11>, the 11/780 was announced in
>> October 1977, and shipped in February 1978
>> <https://en.wikipedia.org/wiki/Data_General#Fountainhead> despite
>> having more features and a more complex instruction set. I guess that
>> the research aspect of the 801 cost quite a bit of time.
>>
>
>Alternative explanation would be that VAX was of far higher priority
>for DEC than 801 was for IBM.

OTOH, DEC needed to do quite a bit more work for shipping the VAX than
IBM had to do for completing the 801.

>Also faction that fought against VAX inside DEC (mostly PDP-10 part
>of the company) was much weaker than anti-801 forces within IBM.

The 801 seems to have been significantly limited (no virtual memory,
24-bit addresses and registers) to avoid such problems.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: memory speeds, Solving the Floating-Point Conundrum

<odYOM.12281$C_lf.8122@fx33.iad>

https://news.novabbs.org/devel/article-flat.php?id=34249&group=comp.arch#34249

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <ue0esp$ps2$1@gal.iecc.com> <2023Sep20.170125@mips.complang.tuwien.ac.at> <2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com> <uefmbf$2pmh$1@gal.iecc.com> <2023Sep21.101341@mips.complang.tuwien.ac.at>
In-Reply-To: <2023Sep21.101341@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 47
Message-ID: <odYOM.12281$C_lf.8122@fx33.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 21 Sep 2023 14:14:44 UTC
Date: Thu, 21 Sep 2023 10:14:28 -0400
X-Received-Bytes: 2807
 by: EricP - Thu, 21 Sep 2023 14:14 UTC

Anton Ertl wrote:
> John Levine <johnl@taugh.com> writes:
>> I agree Vax would never have been a RISC but they do seem to have gone
>> overboard with instruction encoding that had to be decoded and
>> interpreted a byte at a time.
>
> At least not without significant extra effort. They apparently did
> not think about stuff like pipelining at all.

If anyone is interested, there is a PDF on Gordon Bell's website
containing internal DEC memos to/from him circa 1978 which shows
some of what they were thinking, about the 11, 10/20, VAX, and IBM.

What I found interesting/strange was that as late as 1978/79
Bell still felt the need to lobby for VAX internally
(at least that was the tone I got from reading them).

https://gordonbell.azurewebsites.net/Digital/DECMuseum.htm

VAX Strategy
https://gordonbell.azurewebsites.net/Digital/VAX%20Strategy%20c1979.pdf

On page 14 it talks about some of the reasons for VAX complexity
(though he doesn't refer to it that way).

"VAX was also designed to address the high cost of programming.
....
The architecture has instructions for the important data-types,
the addressing is independent of the data-types and the important
language constants we built into the hardware."
....
The procedure call instruction allows more sub-program sharing than with
architectures that are dependent on conventions (e.g. 360 and 10/20) and it
eliminates a class of systems programming errors resulting from the
multiple assignment of general registers among different programs."

There does not seem to have been any consideration that embedding
such complexity into hardware would make later generations more
difficult to design than competitors' products. Memos in other
documents show that by 1984/85 DEC knew internally that the
VAX architecture could not compete against risc designs.

And yes there was no mention of pipelining.

Re: memory speeds, Solving the Floating-Point Conundrum

<uehp75$3jgc2$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34252&group=comp.arch#34252

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
Date: Thu, 21 Sep 2023 08:58:28 -0700
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <uehp75$3jgc2$2@dont-email.me>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<udspsq$27b0q$1@dont-email.me> <qrmMM.7$5jrd.6@fx06.iad>
<udu7us$3v2e2$1@newsreader4.netcologne.de> <ue0esp$ps2$1@gal.iecc.com>
<2023Sep20.170125@mips.complang.tuwien.ac.at>
<2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com>
<2023Sep21.102146@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 21 Sep 2023 15:58:29 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="626fabe0c5544ad31d7d93442abd8941";
logging-data="3785090"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/gXK+BmUqsWS167VjPirueFgfVse4/O9A="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:K9y2ZYu2diwoCTZTdaQl16fwtF0=
In-Reply-To: <2023Sep21.102146@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: Stephen Fuld - Thu, 21 Sep 2023 15:58 UTC

On 9/21/2023 1:21 AM, Anton Ertl wrote:
> Michael S <already5chosen@yahoo.com> writes:
>>> It is interesting that the IBM 801 was started in 1974 and had a first
>>> CPU in 1980, while VAX development started in 1976 according to
>>> <https://en.wikipedia.org/wiki/VAX-11>, the 11/780 was announced in
>>> October 1977, and shipped in February 1978
>>> <https://en.wikipedia.org/wiki/Data_General#Fountainhead> despite
>>> having more features and a more complex instruction set. I guess that
>>> the research aspect of the 801 cost quite a bit of time.
>>>
>>
>> Alternative explanation would be that VAX was of far higher priority
>> for DEC than 801 was for IBM.
>
> OTOH, DEC needed to do quite a bit more work for shipping the VAX than
> IBM had to do for completing the 801.
>
>
>> Also faction that fought against VAX inside DEC (mostly PDP-10 part
>> of the company) was much weaker than anti-801 forces within IBM.
>
> The 801 seems to have been significantly limited (no virtual memory,
> 24-bit addresses and registers) to avoid such problems.

While I agree, of course, that the 801 didn't have those features, I
don't think the motivation was "to avoid such problems". At least,
according to

https://en.wikipedia.org/wiki/IBM_801
the 801 was designed for "embedded" applications where things such as
those simply weren't needed.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Solving the Floating-Point Conundrum

<c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34253&group=comp.arch#34253

X-Received: by 2002:a05:6214:174c:b0:656:1710:5bb9 with SMTP id dc12-20020a056214174c00b0065617105bb9mr76231qvb.8.1695316283278;
Thu, 21 Sep 2023 10:11:23 -0700 (PDT)
X-Received: by 2002:a05:6808:1589:b0:3a7:392a:7405 with SMTP id
t9-20020a056808158900b003a7392a7405mr3104953oiw.2.1695316283085; Thu, 21 Sep
2023 10:11:23 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 21 Sep 2023 10:11:22 -0700 (PDT)
In-Reply-To: <5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa34:c000:19fc:3a52:7e1:282c;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa34:c000:19fc:3a52:7e1:282c
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: jsavard@ecn.ab.ca (Quadibloc)
Injection-Date: Thu, 21 Sep 2023 17:11:23 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 113
 by: Quadibloc - Thu, 21 Sep 2023 17:11 UTC

On Wednesday, September 20, 2023 at 4:08:31 PM UTC-6, MitchAlsup wrote:
> On Tuesday, September 12, 2023 at 12:32:41 AM UTC-5, Quadibloc wrote:
> > After trying to work out all kinds of bizarre ways to make a computer
> > that uses a 12-bit basic unit of memory work efficiently with
> > single-precision floats that are 36 bits long, and double-precision
> > floats that are 60 bits long, I've decided it's time to throw in the towel
> > on that one, and achieve my goals in another way.
> <
> Is there a floating point conundrum ??
> <
> Is there a floating point conundrum that needs to be solved ??
> <
> What properties do current floating point numbers have (or not have) that
> is the basis for this conundrum ??

The _apparent_ conundrum, at least to me, was this:

The existing 32-bit and 64-bit floating-point formats aren't ideal for
scientific computing.

32 bits are too short, 64 bits are longer than necessary.

From what I had seen, it appeared that the following three lengths would
be a much better fit:

36 bits - unlike 32 bits, this was long enough to allow many problems to be
solved with single precision

48 bits - ten digits of precision, in addition to being used on a lot of pocket
calculators, was also common on higher-end mechanical calculators, and
was the upper limit for mathematical tables; it seemed to be the high precision
for scientific calculations in the pre-computer age

60 bits - it seemed to be every bit as good as 64 bits when double precision
was required, so shortening double precision to make it fit with a common 12-bit
unit was a good idea

Of course, as has been noted, worrying about floats being longer than
necessary may be out of date given modern computer designs.

Where the conundrum comes in... is that 36 bits, 48 bits, and 60 bits
for floats would be just fine if one's computer design were like a PDP-8
or a Control Data 160, with a 12-bit bus. Then they would be 3, 4, and 5
words long.

But how do you make it work on a bigger computer?

You _could_ use a 720-bit bus to memory. Then you could have aligned
36-bit, 48-bit, and 60-bit items all ready for fast access. But then
if you wanted a simple addressing scheme... none of those items would
fit, the bus could be divided down into 45-bit words instead, since 45 times
sixteen is 720.

You could use a memory based around a 48-bit or 96-bit word, and handle
36-bit and 60-bit floats (single and double precision, the _main_ types,
with intermediate 48-bit, the kind best served, being secondary) by standard
methods for unaligned operands. Indexing is only complicated by _multiplying_
by the length, no divisions are required.
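
As a C sketch of the "standard methods for unaligned operands" mentioned
above: the i-th 36-bit element sits at bit offset 36*i (a multiply, no
division), and is assembled from the one or two underlying words it
straddles. Memory is modelled as an array of 64-bit words with
little-endian bit order purely for illustration; the 48- or 96-bit words
discussed above would work the same way, and the function name is mine.

#include <stdint.h>
#include <stddef.h>

/* Fetch the i-th 36-bit item from a packed bit array. */
static uint64_t load36(const uint64_t *mem, size_t i)
{
    size_t   bit  = 36 * i;          /* index -> bit offset: just a multiply */
    size_t   word = bit >> 6;        /* /64 is a shift, not a real division  */
    unsigned off  = (unsigned)(bit & 63);

    uint64_t lo = mem[word] >> off;
    uint64_t hi = (off + 36 > 64) ? mem[word + 1] << (64 - off) : 0;
    return (lo | hi) & ((1ull << 36) - 1);
}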

Each of these alternatives has advantages and disadvantages, but neither
is as simple and fast as the plain power-of-two scheme we have now.

So that was the conundrum I faced. One set of lengths for floating-point
numbers seemed desirable, but there were problems in implementation.

My "solution" was:

1) Make the 36-bit word the fundamental unit.

2) Replace 60-bit floats by 72-bit floats - but *keep the mantissa length*
of the 60-bit float, expanding the exponent. So we're using more memory
bandwidth, but we aren't requiring a bigger ALU.

3) Have _both_ 48-bit and 54-bit intermediate precision floats.

The 48-bit version is an aliquot part of a 144-bit quadword, so it can
be handled efficiently in vector computations (the 720-bit case above),
the 54-bit version is three 18-bit halfwords, so it can be addressed
easily, and scalar computations can make use of standard methods of
handling unaligned numbers (the 96-bit case above).
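
A similar sketch for the 54-bit scalar case: with memory addressed in
18-bit halfwords, element i starts at halfword 3*i, so the index
calculation is again only a multiply. Holding each 18-bit halfword in its
own uint32_t is purely an illustration.

#include <stdint.h>
#include <stddef.h>

/* Assemble the i-th 54-bit scalar from an array of 18-bit halfwords. */
static uint64_t load54(const uint32_t *halfwords, size_t i)
{
    size_t h = 3 * i;                               /* scale by 3, no division */
    return ((uint64_t)(halfwords[h]     & 0x3FFFFu) << 36)
         | ((uint64_t)(halfwords[h + 1] & 0x3FFFFu) << 18)
         |  (uint64_t)(halfwords[h + 2] & 0x3FFFFu);
}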

So I "solve" the conundrum - I give priority to single and double precision,
and while intermediate precision remains an odd length, therefore having
the basic problems noted above, they're minimized by offering intermediate
precision in _two_ sizes, to make use of the two different methods of handling
odd-length operands noted above, so that the right length is chosen for
the easiest handling: 54 bits to make addressing and indexing simple for
scalars, 48 bits to make loading and storing large vectors quick and efficient.

There are two different ways of handling odd-length operands, each
with its own advantages and disadvantages. So the two different sizes,
since one is handled by one method, and the other by the other method,
allow choosing which method to use, and so the architecture chooses 48 bits
for vectors, and 54 bits for scalars, to allow each type of computation to
only experience the minimum impact of dealing with an odd-length operand.

Of course, this fits more with the design constraints of a bygone day.

John Savard

Re: Solving the Floating-Point Conundrum

<6347866e-f5fb-417d-8674-5316a1c9a706n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34256&group=comp.arch#34256

X-Received: by 2002:a05:622a:1353:b0:40f:f22c:2a3b with SMTP id w19-20020a05622a135300b0040ff22c2a3bmr13664qtk.3.1695340030000;
Thu, 21 Sep 2023 16:47:10 -0700 (PDT)
X-Received: by 2002:a05:6870:1844:b0:1d6:3c76:e1c9 with SMTP id
u4-20020a056870184400b001d63c76e1c9mr2714343oaf.6.1695340029787; Thu, 21 Sep
2023 16:47:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 21 Sep 2023 16:47:09 -0700 (PDT)
In-Reply-To: <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6347866e-f5fb-417d-8674-5316a1c9a706n@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: robfi680@gmail.com (robf...@gmail.com)
Injection-Date: Thu, 21 Sep 2023 23:47:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7687
 by: robf...@gmail.com - Thu, 21 Sep 2023 23:47 UTC

On Thursday, September 21, 2023 at 1:11:25 PM UTC-4, Quadibloc wrote:
> On Wednesday, September 20, 2023 at 4:08:31 PM UTC-6, MitchAlsup wrote:
> > On Tuesday, September 12, 2023 at 12:32:41 AM UTC-5, Quadibloc wrote:
> > > After trying to work out all kinds of bizarre ways to make a computer
> > > that uses a 12-bit basic unit of memory work efficiently with
> > > single-precision floats that are 36 bits long, and double-precision
> > > floats that are 60 bits long, I've decided it's time to throw in the towel
> > > on that one, and achieve my goals in another way.
> > <
> > Is there a floating point conundrum ??
> > <
> > Is there a floating point conundrum that needs to be solved ??
> > <
> > What properties do current floating point numbers have (or not have) that
> > is the basis for this conundrum ??
> The _apparent_ conundrum, at least to me, was this:
>
> The existing 32-bit and 64-bit floating-point formats aren't ideal for
> scientific computing.
>
> 32 bits are too short, 64 bits are longer than necessary.
>
> From what I had seen, it appeared that the following three lengths would
> be a much better fit:
>
> 36 bits - unlike 32 bits, this was long enough to allow many problems to be
> solved with single precision
>
> 48 bits - ten digits of precision, in addition to being used on a lot of pocket
> calculators, was also common on higher-end mechanical calculators, and
> was the upper limit for mathematical tables; it seemed to be the high precision
> for scientific calculations in the pre-computer age
>
> 60 bits - it seemed to be every bit as good as 64 bits when double precision
> was required, so shortening double precision to make it fit with a common 12-bit
> unit was a good idea
>
> Of course, as has been noted, worrying about floats being longer than
> necessary may be out of date given modern computer designs.
>
> Where the conundrum comes in... is that 36 bits, 48 bits, and 60 bits
> for floats would be just fine if one's computer design were like a PDP-8
> or a Control Data 160, with a 12-bit bus. Then they would be 3, 4, and 5
> words long.
>
> But how do you make it work on a bigger computer?
>
> You _could_ use a 720-bit bus to memory. Then you could have aligned
> 36-bit, 48-bit, and 60-bit items all ready for fast access. But then
> if you wanted a simple addressing scheme... none of those items would
> fit, the bus could be divided down into 45-bit words instead, since 45 times
> sixteen is 720.
>
> You could use a memory based around a 48-bit or 96-bit word, and handle
> 36-bit and 60-bit floats (single and double precision, the _main_ types,
> with intermediate 48-bit, the kind best served, being secondary) by standard
> methods for unaligned operands. Indexing is only complicated by _multiplying_
> by the length, no divisions are required.
>
> Each of these alternatives has advantages and disadvantages, but neither
> is as simple and fast as the plain power-of-two scheme we have now.
>
> So that was the conundrum I faced. One set of lengths for floating-point
> numbers seemed desirable, but there were problems in implementation.
>
> My "solution" was:
>
> 1) Make the 36-bit word the fundamental unit.
>
> 2) Replace 60-bit floats by 72-bit floats - but *keep the mantissa length*
> of the 60-bit float, expanding the exponent. So we're using more memory
> bandwidth, but we aren't requiring a bigger ALU.
>
> 3) Have _both_ 48-bit and 54-bit intermediate precision floats.
>
> The 48-bit version is an aliquot part of a 144-bit quadword, so it can
> be handled efficiently in vector computations (the 720-bit case above),
> the 54-bit version is three 18-bit halfwords, so it can be addressed
> easily, and scalar computations can make use of standard methods of
> handling unaligned numbers (the 96-bit case above).
>
> So I "solve" the conundrum - I give priority to single and double precision,
> and while intermediate precision remains an odd length, therefore having
> the basic problems noted above, they're minimized by offering intermediate
> precision in _two_ sizes, to make use of the two different methods of handling
> odd-length operands noted above, so that the right length is chosen for
> the easiest handling: 54 bits to make addressing and indexing simple for
> scalars, 48 bits to make loading and storing large vectors quick and efficient.
>
> There are two different ways of handling odd-length operands, each
> with its own advantages and disadvantages. So the two different sizes,
> since one is handled by one method, and the other by the other method,
> allow choosing which method to use, and so the architecture chooses 48 bits
> for vectors, and 54 bits for scalars, to allow each type of computation to
> only experience the minimum impact of dealing with an odd-length operand.
>
> Of course, this fits more with the design constraints of a bygone day.
>
> John Savard
I think the eight-bit byte is the fundamental unit to work with.

Given that many modern CPUs can load unaligned data, I do not think that the
width of FP is critical except that it should be a multiple of eight bits. I would go
with 40-bit floats instead of 36 and 64 bits instead of 60. Odd FP sizes may
require odd-sized index scaling, e.g. scale by 5 or 10 in the ISA.
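
To make the "scale by 5" point concrete, a C sketch of loading one element
of an array of 40-bit floats packed 5 bytes apart. The S/E/M split used
here (1 sign, 8 exponent, 31 mantissa bits, IEEE-style bias) is only an
assumed layout for illustration, since the post does not define one.

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <math.h>

/* Load element i of an array of 40-bit floats stored every 5 bytes.
   Assumes a little-endian host and the illustrative S1.E8.F31 layout. */
static double load_f40(const uint8_t *base, size_t i)
{
    uint64_t raw = 0;
    memcpy(&raw, base + 5 * i, 5);                    /* index scaled by 5 */

    int      sign = (int)((raw >> 39) & 1);
    int      exp  = (int)((raw >> 31) & 0xFF);        /* bias 127 assumed  */
    uint64_t frac = raw & 0x7FFFFFFFull;              /* 31 mantissa bits  */
    double   m;

    if (exp == 0)
        m = ldexp((double)frac, 1 - 127 - 31);        /* zero / subnormal  */
    else if (exp == 0xFF)
        m = frac ? NAN : INFINITY;
    else
        m = ldexp((double)(frac | (1ull << 31)), exp - 127 - 31);
    return sign ? -m : m;
}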

There should be a tool that will allow one to input the S, E, and M settings and
generate the appropriate code for FP of that precision. There is something like
that in FPGA vendor toolsets. It could be built into some sort of reconfigurable
compute generating tool. The result would allow the use of oddball FP if there
are cases where conversions to standard FP are needed.

For myself, I find it too easy to play with the sizes of FP, so I stick to the
“standard” sizes: 16/32/64/128. They are a lot more likely to be correct, or to be
corrected by third parties.

Re: Solving the Floating-Point Conundrum

<71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34257&group=comp.arch#34257

X-Received: by 2002:a05:622a:242:b0:412:26be:4642 with SMTP id c2-20020a05622a024200b0041226be4642mr88908qtx.2.1695342077446;
Thu, 21 Sep 2023 17:21:17 -0700 (PDT)
X-Received: by 2002:a9d:73ce:0:b0:6c0:a080:f1a7 with SMTP id
m14-20020a9d73ce000000b006c0a080f1a7mr2204997otk.2.1695342077254; Thu, 21 Sep
2023 17:21:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 21 Sep 2023 17:21:16 -0700 (PDT)
In-Reply-To: <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:2978:ff9b:2d22:339c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:2978:ff9b:2d22:339c
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 22 Sep 2023 00:21:17 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 6992
 by: MitchAlsup - Fri, 22 Sep 2023 00:21 UTC

On Thursday, September 21, 2023 at 12:11:25 PM UTC-5, Quadibloc wrote:
> On Wednesday, September 20, 2023 at 4:08:31 PM UTC-6, MitchAlsup wrote:
> > On Tuesday, September 12, 2023 at 12:32:41 AM UTC-5, Quadibloc wrote:
> > > After trying to work out all kinds of bizarre ways to make a computer
> > > that uses a 12-bit basic unit of memory work efficiently with
> > > single-precision floats that are 36 bits long, and double-precision
> > > floats that are 60 bits long, I've decided it's time to throw in the towel
> > > on that one, and achieve my goals in another way.
> > <
> > Is there a floating point conundrum ??
> > <
> > Is there a floating point conundrum that needs to be solved ??
> > <
> > What properties do current floating point numbers have (or not have) that
> > is the basis for this conundrum ??
> The _apparent_ conundrum, at least to me, was this:
>
> The existing 32-bit and 64-bit floating-point formats aren't ideal for
> scientific computing.
>
> 32 bits are too short, 64 bits are longer than necessary.
<
These same properties are also found in integers.
>
> From what I had seen, it appeared that the following three lengths would
> be a much better fit:
>
> 36 bits - unlike 32 bits, this was long enough to allow many problems to be
> solved with single precision
>
> 48 bits - ten digits of precision, in addition to being used on a lot of pocket
> calculators, was also common on higher-end mechanical calculators, and
> was the upper limit for mathematical tables; it seemed to be the high precision
> for scientific calculations in the pre-computer age
<
If you have standard integer lengths, would you also want a 48-bit integer width ?
>
> 60 bits - it seemed to be every bit as good as 64 bits when double precision
> was required, so shortening double precision to make it fit with a common 12-bit
> unit was a good idea
>
> Of course, as has been noted, worrying about floats being longer than
> necessary may be out of date given modern computer designs.
>
> Where the conundrum comes in... is that 36 bits, 48 bits, and 60 bits
> for floats would be just fine if one's computer design were like a PDP-8
> or a Control Data 160, with a 12-bit bus. Then they would be 3, 4, and 5
> words long.
>
> But how do you make it work on a bigger computer?
>
> You _could_ use a 720-bit bus to memory. Then you could have aligned
> 36-bit, 48-bit, and 60-bit items all ready for fast access. But then
> if you wanted a simple addressing scheme... none of those items would
> fit, the bus could be divided down into 45-bit words instead, since 45 times
> sixteen is 720.
>
> You could use a memory based around a 48-bit or 96-bit word, and handle
> 36-bit and 60-bit floats (single and double precision, the _main_ types,
> with intermediate 48-bit, the kind best served, being secondary) by standard
> methods for unaligned operands. Indexing is only complicated by _multiplying_
> by the length, no divisions are required.
>
> Each of these alternatives has advantages and disadvantages, but neither
> is as simple and fast as the plain power-of-two scheme we have now.
>
> So that was the conundrum I faced. One set of lengths for floating-point
> numbers seemed desirable, but there were problems in implementation.
>
> My "solution" was:
>
> 1) Make the 36-bit word the fundamental unit.
<
Sounds like a hard sell.
>
> 2) Replace 60-bit floats by 72-bit floats - but *keep the mantissa length*
> of the 60-bit float, expanding the exponent. So we're using more memory
> bandwidth, but we aren't requiring a bigger ALU.
<
I went the other way........
>
> 3) Have _both_ 48-bit and 54-bit intermediate precision floats.
<
Would you want integers of 48-bit and 54-bit ??
>
> The 48-bit version is an aliquot part of a 144-bit quadword, so it can
> be handled efficiently in vector computations (the 720-bit case above),
> the 54-bit version is three 18-bit halfwords, so it can be addressed
> easily, and scalar computations can make use of standard methods of
> handling unaligned numbers (the 96-bit case above).
>
> So I "solve" the conundrum - I give priority to single and double precision,
> and while intermediate precision remains an odd length, therefore having
> the basic problems noted above, they're minimized by offering intermediate
> precision in _two_ sizes, to make use of the two different methods of handling
> odd-length operands noted above, so that the right length is chosen for
> the easiest handling: 54 bits to make addressing and indexing simple for
> scalars, 48 bits to make loading and storing large vectors quick and efficient.
>
> There are two different ways of handling odd-length operands, each
> with its own advantages and disadvantages. So the two different sizes,
> since one is handled by one method, and the other by the other method,
> allow choosing which method to use, and so the architecture chooses 48 bits
> for vectors, and 54 bits for scalars, to allow each type of computation to
> only experience the minimum impact of dealing with an odd-length operand.
>
> Of course, this fits more with the design constraints of a bygone day.
>
> John Savard

Re: memory speeds, Solving the Floating-Point Conundrum

<ueioig$1sr6$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=34258&group=comp.arch#34258

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: memory speeds, Solving the Floating-Point Conundrum
Date: Fri, 22 Sep 2023 00:53:36 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <ueioig$1sr6$1@gal.iecc.com>
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com> <uefmbf$2pmh$1@gal.iecc.com> <2023Sep21.101341@mips.complang.tuwien.ac.at>
Injection-Date: Fri, 22 Sep 2023 00:53:36 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="62310"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <2fac2cbe-da9b-4a0b-aca7-7ec2982468fan@googlegroups.com> <uefmbf$2pmh$1@gal.iecc.com> <2023Sep21.101341@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Fri, 22 Sep 2023 00:53 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>John Levine <johnl@taugh.com> writes:
>>I agree Vax would never have been a RISC but they do seem to have gone
>>overboard with instruction encoding that had to be decoded and
>>interpreted a byte at a time.
>
>At least not without significant extra effort. They apparently did
>not think about stuff like pipelining at all.

Sure looks that way. Also, apropos another message, they don't seem to
have known much about optimizing compilers, since the goal seems to
have been to allow a very simple translation to machine code. They put
all the fancy prolog/epilog stuff in microcode which was a mistake
since it was so slow and did too much.

>>Even S/360 made it easy to compute the
>>addresses and fetch the operands in an instruction in parallel.
>
>Maybe they thought about that, thanks to the Stretch experience, while
>DEC engineers only had experience ...

The 360 architecture paper said that the high end 360 was supposed to be
faster than Stretch, so I'm sure they did. It also mentions in passing
that high end models have dedicated address arithmetic, low end does it
all in microcode.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Solving the Floating-Point Conundrum

<000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34259&group=comp.arch#34259

X-Received: by 2002:a05:622a:188c:b0:417:9e0f:fb30 with SMTP id v12-20020a05622a188c00b004179e0ffb30mr85507qtc.12.1695344260268;
Thu, 21 Sep 2023 17:57:40 -0700 (PDT)
X-Received: by 2002:a9d:6a15:0:b0:6b9:2c07:8849 with SMTP id
g21-20020a9d6a15000000b006b92c078849mr510043otn.0.1695344259978; Thu, 21 Sep
2023 17:57:39 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 21 Sep 2023 17:57:39 -0700 (PDT)
In-Reply-To: <71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa34:c000:3d89:4e68:69e7:15f4;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa34:c000:3d89:4e68:69e7:15f4
References: <57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com>
<5d119c8b-ff4b-4833-bd56-c92ce272b4d6n@googlegroups.com> <c35d6ff9-420e-438f-ac5c-78806df57f91n@googlegroups.com>
<71d6df28-ece0-4aa4-b07c-051ca81aab4an@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <000949fe-1639-41b3-ae9e-764cdf6c9b4bn@googlegroups.com>
Subject: Re: Solving the Floating-Point Conundrum
From: jsavard@ecn.ab.ca (Quadibloc)
Injection-Date: Fri, 22 Sep 2023 00:57:40 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2818
 by: Quadibloc - Fri, 22 Sep 2023 00:57 UTC

On Thursday, September 21, 2023 at 6:21:19 PM UTC-6, MitchAlsup wrote:

> These same properties are also found in integers.
> If you have standard integer lengths, would you also want a 48-bit integer width ?
> Would you want integers of 48-bit and 54-bit ??

I have an idea, from what I've read, about what lengths are desirable
for floating-point numbers.

Integers... well, the primary integer type needs to be big enough to
serve as an index into an array. 32 bits used to do that, and now we need
64 bits. Physical memory addresses are really only 48 bits, but if
bigger arrays can live in virtual memory, then indexes into them will
also be wanted.

So I don't see the same urgency when it comes to binary integers.

On the System/360, of course, packed decimal integers were like
*strings*, and could be any length. But those operations were
memory to memory, and thus they would be very slow on today's
computers. So the ability to do packed decimal operations in
registers is important.
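
For concreteness, a minimal sketch in plain C (my illustration, not
S/360 code) of what digit-at-a-time, memory-to-memory packed-decimal
processing amounts to; the layout assumed here is the usual two BCD
digits per byte with the sign in the low nibble of the last byte:

  #include <stdint.h>

  /* Convert a packed-decimal field of 'len' bytes to binary, one byte
     (two digits) at a time; 0xD in the final low nibble means minus. */
  int64_t packed_to_binary(const uint8_t *p, int len)
  {
      int64_t v = 0;
      for (int i = 0; i < len - 1; i++)              /* all bytes except the last */
          v = v * 100 + (p[i] >> 4) * 10 + (p[i] & 0x0F);
      v = v * 10 + (p[len - 1] >> 4);                /* last digit */
      return ((p[len - 1] & 0x0F) == 0x0D) ? -v : v; /* sign nibble */
  }

E.g. the three bytes 0x12 0x34 0x5C decode to +12345; doing this in
registers rather than storage-to-storage is exactly the speedup being
argued for.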

Possibly, if other sizes of integers _are_ needed, they could piggyback
on the capabilities developed for floating-point. But while I saw a problem
with floating-point, I'm not aware of one with integers, so naturally I
wouldn't suggest transforming computers with a strange size for bytes
or peculiar memory addressing... to fill a need I am not aware of.

John Savard

Re: Solving the Floating-Point Conundrum

<deeae38d-da7a-4495-9558-f73a9f615f02n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34260&group=comp.arch#34260

 by: JimBrakefield - Fri, 22 Sep 2023 02:05 UTC

On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
> >> According to Thomas Koenig <tko...@netcologne.de>:
> >> >> That's not a power-of-two length, so how do I keep using these numbers both
> >> >> efficient and simple?
> >> >
> >> >Make the architecture byte-addressable, with another width for the
> >> >bytes; possible choices are 6 and 9.
> >> I'm pretty sure the world has spoken and we are going to use 8-bit
> >> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
> ><
> > In addition, the world has spoken and little endian also won.
> ><
> >> >Then make your architecture capable of misaligned loads and stores
> >> >and an extra floating point format, maybe 45 bits, with 9 bits
> >> >exponent and 36 bits of significand.
> ><
> >> If you're worried about performance, use your 45 bit format and store
> >> it in a 64 bit word.
> ><
> > In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
> > Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
> ><
> > And you really need 5mm^2 to get enough pins on the part to feed what you
> > can put inside; 7mm^2 makes even more sense on pins versus perf.
> ><
> > So, why are you catering to ANY bit counts less than 64 ??
> > Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
> > and 1024 bits out per cycle continuously per shader core.
> ><
> > It is no longer 1990, adjust your thinking to the modern realities or our time !
>
> There could be a justification for an intermediate floating point
> design - memory bandwidth (and ALU width).
>
> If you look at linear algebra solvers, these are usually limited
> by memory bandwidth. A 512-bit cache line size accomodates
> 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> 36-bit numbers or 16 32-bit numbers.
>
> For problems where 32 bits are not enough, but a few more bits
> might suffice, having additional intermediate floating point sizes
> could offer significant speedup.

Ugh. The business case for non-power-of-two floats:
The core count (or lane count) increases for shorter floats:
25% more for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats, versus 64-bit floats.
Ignoring super-linear transistor counts and logic delay, this translates directly into a performance advantage.
As the L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
the effect is less noticeable.
The instructions for these different floating-point units can remain 8-bit-byte sized, e.g. by employing a Harvard architecture.
(A given chip would normally support a single float size, or a half or a fourth thereof.)
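
As a quick check of those figures (my own arithmetic sketch, counting
whole lanes per 512-bit cache line, which gives +50% rather than +60%
for the 40-bit case):

  #include <stdio.h>

  int main(void)
  {
      const int widths[] = {64, 48, 40, 36, 32};
      for (int i = 0; i < 5; i++) {
          int lanes = 512 / widths[i];               /* whole values per 512-bit line */
          printf("%2d-bit: %2d lanes (%+d%% vs 64-bit)\n",
                 widths[i], lanes, (lanes - 8) * 100 / 8);
      }
      return 0;
  }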

Re: Solving the Floating-Point Conundrum

<uejf0p$2rob$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34261&group=comp.arch#34261

 by: Terje Mathisen - Fri, 22 Sep 2023 07:16 UTC

JimBrakefield wrote:
> On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
>> MitchAlsup <Mitch...@aol.com> schrieb:
>>> On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
>>>> According to Thomas Koenig <tko...@netcologne.de>:
>>>>>> That's not a power-of-two length, so how do I keep using these numbers both
>>>>>> efficient and simple?
>>>>>
>>>>> Make the architecture byte-addressable, with another width for the
>>>>> bytes; possible choices are 6 and 9.
>>>> I'm pretty sure the world has spoken and we are going to use 8-bit
>>>> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
>>> <
>>> In addition, the world has spoken and little endian also won.
>>> <
>>>>> Then make your architecture capable of misaligned loads and stores
>>>>> and an extra floating point format, maybe 45 bits, with 9 bits
>>>>> exponent and 36 bits of significand.
>>> <
>>>> If you're worried about performance, use your 45 bit format and store
>>>> it in a 64 bit word.
>>> <
>>> In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
>>> Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
>>> <
>>> And you really need 5mm^2 to get enough pins on the part to feed what you
>>> can put inside; 7mm^2 makes even more sense on pins versus perf.
>>> <
>>> So, why are you catering to ANY bit counts less than 64 ??
>>> Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
>>> and 1024 bits out per cycle continuously per shader core.
>>> <
>>> It is no longer 1990, adjust your thinking to the modern realities or our time !
>>
>> There could be a justification for an intermediate floating point
>> design - memory bandwidth (and ALU width).
>>
>> If you look at linear algebra solvers, these are usually limited
>> by memory bandwidth. A 512-bit cache line size accomodates
>> 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
>> 36-bit numbers or 16 32-bit numbers.
>>
>> For problems where 32 bits are not enough, but a few more bits
>> might suffice, having additional intermediate floating point sizes
>> could offer significant speedup.
>
> Ugh The business case for non-power-of-two floats:
> The core count (or lane count) increases for shorter floats
> 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
> Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
> As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
> Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
> the effect is less noticeable.
> The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
> (a given chip would normally support a single float size or a half or fourth thereof)
>

The obvious solution to this desire for odd FP storage sizes would be
hardware that can extract an odd number of bytes from the input stream
and then zero-fill the rest of a 64-bit container before feeding the
result into the FPU calculation engine. (I.e. EXTRACT)

It is only when you store back to memory that you need some extra magic
to perform rounding, and only if you are worried about double rounding.

Having a way to configure the FPU for arbitrary rounding points would be
sufficient to support perfect rounding at any of the sizes Quadibloc is
asking for; there is no need for extra hardware support to directly
load/store these sizes from RAM.
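
To make the store-side "extra magic" concrete, a minimal sketch in C (my
illustration, not Terje's; it assumes a hypothetical 48-bit format that
keeps the top 48 bits of a binary64, a little-endian host, and it
ignores NaN/Inf):

  #include <stdint.h>
  #include <string.h>

  void store_f48(uint8_t *dst, double d)
  {
      uint64_t bits;
      memcpy(&bits, &d, sizeof bits);
      uint64_t dropped = bits & 0xFFFF;              /* low 16 fraction bits to discard */
      bits &= ~0xFFFFULL;
      if (dropped > 0x8000 ||
          (dropped == 0x8000 && (bits & 0x10000)))   /* round to nearest, ties to even */
          bits += 0x10000;                           /* carry may bump the exponent, as in hardware */
      memcpy(dst, (const uint8_t *)&bits + 2, 6);    /* write only the top 48 bits */
  }

The load direction needs no rounding at all: copy the 6 bytes into the
top of a zeroed 64-bit container and use the result as a binary64,
exactly the EXTRACT-and-zero-fill step described above.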

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Solving the Floating-Point Conundrum

<uek4ci$db34$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=34262&group=comp.arch#34262

 by: Thomas Koenig - Fri, 22 Sep 2023 13:21 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:
> On Thursday, September 21, 2023 at 6:21:19 PM UTC-6, MitchAlsup wrote:
>
>> These same properties are also found in integers.
>> If you have standard integer lengths, would you also want a 48-bit integer width ?
>> Would you want integers of 48-bit and 54-bit ??
>
> I have an idea, from what I've read, about what lengths are desirable
> for floating-point numbers.
>
> Integers... well, the primary integer type needs to be big enough to
> serve as an index to an array. 32 bits used to do that, and now we need
> 64 bits. Although the physical memory addresses are really only 48
> bits... but then, if bigger arrays can live in virtual memory, then indexes
> into them will also be wanted.
>
> So I don't see the same urgency when it comes to binary integers.

Scientific computing has come to the point where 32-bit integers are
not enough for many cases - at today's computer speeds and memory
sizes, it is very reasonable to tackle problems where array indices
exceed 2^32.
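
A trivial sketch of that point (mine, not Thomas's): once an array has
more than 2^32 elements, the index variable itself has to be 64-bit,
regardless of how wide the element type is:

  #include <stddef.h>

  double sum(const double *a, size_t n)   /* n may exceed 2^32 */
  {
      double s = 0.0;
      for (size_t i = 0; i < n; i++)      /* a 32-bit 'int i' would overflow here */
          s += a[i];
      return s;
  }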

Re: Solving the Floating-Point Conundrum

<9141df99-f363-4d64-9ce3-3d3aaf0f5f40n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34263&group=comp.arch#34263

 by: MitchAlsup - Fri, 22 Sep 2023 14:26 UTC

On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
> On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> > MitchAlsup <Mitch...@aol.com> schrieb:
> > > On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
> > >> According to Thomas Koenig <tko...@netcologne.de>:
> > >> >> That's not a power-of-two length, so how do I keep using these numbers both
> > >> >> efficient and simple?
> > >> >
> > >> >Make the architecture byte-addressable, with another width for the
> > >> >bytes; possible choices are 6 and 9.
> > >> I'm pretty sure the world has spoken and we are going to use 8-bit
> > >> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
> > ><
> > > In addition, the world has spoken and little endian also won.
> > ><
> > >> >Then make your architecture capable of misaligned loads and stores
> > >> >and an extra floating point format, maybe 45 bits, with 9 bits
> > >> >exponent and 36 bits of significand.
> > ><
> > >> If you're worried about performance, use your 45 bit format and store
> > >> it in a 64 bit word.
> > ><
> > > In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
> > > Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
> > ><
> > > And you really need 5mm^2 to get enough pins on the part to feed what you
> > > can put inside; 7mm^2 makes even more sense on pins versus perf.
> > ><
> > > So, why are you catering to ANY bit counts less than 64 ??
> > > Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
> > > and 1024 bits out per cycle continuously per shader core.
> > ><
> > > It is no longer 1990, adjust your thinking to the modern realities or our time !
> >
> > There could be a justification for an intermediate floating point
> > design - memory bandwidth (and ALU width).
> >
> > If you look at linear algebra solvers, these are usually limited
> > by memory bandwidth. A 512-bit cache line size accomodates
> > 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> > 36-bit numbers or 16 32-bit numbers.
> >
> > For problems where 32 bits are not enough, but a few more bits
> > might suffice, having additional intermediate floating point sizes
> > could offer significant speedup.
> Ugh The business case for non-power-of-two floats:
> The core count (or lane count) increases for shorter floats
> 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
> Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
<
One builds FP calculation resources as big as the longest container needed at full throughput.
In a 64-bit machine, this is one with an 11-bit exponent and a 52-bit fraction.
On such a machine, the latency is set by the calculations on numbers of this size.
AND
Smaller-width numbers do not save any cycles.
<
So, the only advantage one has with 48-bit, ... numbers is memory footprint.
There is NO (nada, zero, zilch) advantage in calculation latency.
<
> As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
> Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
> the effect is less noticeable.
> The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
> (a given chip would normally support a single float size or a half or fourth thereof)

Re: Solving the Floating-Point Conundrum

<DDhPM.169751$%uv8.101543@fx15.iad>

https://news.novabbs.org/devel/article-flat.php?id=34264&group=comp.arch#34264

 by: Scott Lurndal - Fri, 22 Sep 2023 14:36 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
>> Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
><
>One builds FP calculation resources as big as longest container needed at full throughput.
>In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
>On such a machine, the latency is set by the calculations on this sized number.
>AND
>Smaller width numbers do not save any cycles.

Yet they do save space.

Note also that the changes to floating point in the last decade
have not been to increase precision, but rather to reduce it.

See FP16/Bfloat16 and FP8.

Re: Solving the Floating-Point Conundrum

<fc7efd6b-3efc-46c0-9493-6ecd351f9636n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34265&group=comp.arch#34265

 by: Quadibloc - Fri, 22 Sep 2023 15:10 UTC

On Friday, September 22, 2023 at 8:26:38 AM UTC-6, MitchAlsup wrote:

> One builds FP calculation resources as big as longest container needed at full throughput.
> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
> On such a machine, the latency is set by the calculations on this sized number.
> AND
> Smaller width numbers do not save any cycles.
> <
> So, the only advantage one has with 48-bit, ... numbers is memory footprint.
> There is NO (nada, zero, zilch) advantage in calculation latency.

That *assumes* you are not so desperate to reduce latency that you have
built a separate single-precision ALU for single-precision floating-point
calculations.

After all, if you've got enough transistors to put multiple CPUs on one die,
and you believe parallelism is far, far inferior to speeding up the clock or
reducing latency by other means, then you might actually think that throwing
transistors at reducing latency in this fashion is rational.

Essentially, I don't buy the idea that programmers are just lazy, and
that if they did things right they could exploit parallelism effectively.
I will grant that they can probably do much better in many cases, but I
also feel there are fundamental limits.

John Savard

Re: Solving the Floating-Point Conundrum

<memo.20230922163245.16292M@jgd.cix.co.uk>

https://news.novabbs.org/devel/article-flat.php?id=34266&group=comp.arch#34266

 by: John Dallman - Fri, 22 Sep 2023 15:32 UTC

In article <DDhPM.169751$%uv8.101543@fx15.iad>, scott@slp53.sl.home
(Scott Lurndal) wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
> >Smaller width numbers do not save any cycles.
>
> Yet they do save space.
>
> Note also that the changes to floating point in the last decade
> have not been to increase precision, but rather to reduce it.
>
> See FP16/Bfloat16 and FP8.

However, that has been for new purposes, for which conventional float
sizes were unnecessarily accurate, and where SIMD was fairly easily used.
32-bit and 64-bit floats are still used for all their normal tasks.

John

Re: Solving the Floating-Point Conundrum

<uekcbs$ab5n$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34267&group=comp.arch#34267

 by: BGB - Fri, 22 Sep 2023 15:37 UTC

On 9/22/2023 9:26 AM, MitchAlsup wrote:
> On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
>> On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
>>> MitchAlsup <Mitch...@aol.com> schrieb:
>>>> On Sunday, September 17, 2023 at 3:30:19 PM UTC-5, John Levine wrote:
>>>>> According to Thomas Koenig <tko...@netcologne.de>:
>>>>>>> That's not a power-of-two length, so how do I keep using these numbers both
>>>>>>> efficient and simple?
>>>>>>
>>>>>> Make the architecture byte-addressable, with another width for the
>>>>>> bytes; possible choices are 6 and 9.
>>>>> I'm pretty sure the world has spoken and we are going to use 8-bit
>>>>> bytes forever. I liked the PDP-8 and PDP-10 but they are, you know, dead.
>>>> <
>>>> In addition, the world has spoken and little endian also won.
>>>> <
>>>>>> Then make your architecture capable of misaligned loads and stores
>>>>>> and an extra floating point format, maybe 45 bits, with 9 bits
>>>>>> exponent and 36 bits of significand.
>>>> <
>>>>> If you're worried about performance, use your 45 bit format and store
>>>>> it in a 64 bit word.
>>>> <
>>>> In 1985 one could get a descent 32-bit pipelined RISC architecture in 1cm^2
>>>> Today this design in < 0.1mm^2 or you can make a GBOoO version < 2mm^2.
>>>> <
>>>> And you really need 5mm^2 to get enough pins on the part to feed what you
>>>> can put inside; 7mm^2 makes even more sense on pins versus perf.
>>>> <
>>>> So, why are you catering to ANY bit counts less than 64 ??
>>>> Intel has version with 512-bit data paths, GPUs generally use 1024-bits in
>>>> and 1024 bits out per cycle continuously per shader core.
>>>> <
>>>> It is no longer 1990, adjust your thinking to the modern realities or our time !
>>>
>>> There could be a justification for an intermediate floating point
>>> design - memory bandwidth (and ALU width).
>>>
>>> If you look at linear algebra solvers, these are usually limited
>>> by memory bandwidth. A 512-bit cache line size accomodates
>>> 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
>>> 36-bit numbers or 16 32-bit numbers.
>>>
>>> For problems where 32 bits are not enough, but a few more bits
>>> might suffice, having additional intermediate floating point sizes
>>> could offer significant speedup.
>> Ugh The business case for non-power-of-two floats:
>> The core count (or lane count) increases for shorter floats
>> 25% increase for 48-bit floats, 60% for 40-bit floats and 75% for 36-bit floats versus 64-bit floats.
>> Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
> <
> One builds FP calculation resources as big as longest container needed at full throughput.
> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
> On such a machine, the latency is set by the calculations on this sized number.
> AND
> Smaller width numbers do not save any cycles.
> <
> So, the only advantage one has with 48-bit, ... numbers is memory footprint.
> There is NO (nada, zero, zilch) advantage in calculation latency.
> <

Sorta...

Say, if one has a way to express "yeah, I don't need full precision
here", one could offload the request to a faster/narrower FPU.

Though, this is mostly independent of storage format:
My FADDA/FSUBA/FMULA instructions still operate on Binary64, but
currently only give results comparable to single-precision truncate
rounding (*).

*: Though, when routing all the 'float' math through these in Quake, it
does work and gives performance gains, but it adds a noticeable bug: the
player slowly turns rightwards due to the truncate rounding (one could
hack around this, or modify things to try to mimic round-nearest behavior).
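
A toy model of that drift in plain C (my sketch, not BGB's code):
accumulate the same small increment with round-to-nearest versus with
every result chopped to 24 significand bits; truncation always errs
toward zero, so the error is one-sided instead of averaging out:

  #include <math.h>
  #include <stdio.h>

  static double trunc24(double x)              /* chop x to 24 significand bits */
  {
      int e;
      double m = frexp(x, &e);                 /* m in [0.5, 1) */
      m = trunc(m * 16777216.0) / 16777216.0;  /* 16777216 = 2^24 */
      return ldexp(m, e);
  }

  int main(void)
  {
      double nearest = 0.0, chopped = 0.0, inc = 0.0001;
      for (int i = 0; i < 1000000; i++) {
          nearest += inc;                      /* ordinary double adds */
          chopped  = trunc24(chopped + inc);   /* truncate after every add */
      }
      printf("nearest: %f   truncated: %f\n", nearest, chopped);
      return 0;
  }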

For storage, I could in theory add a few ops, say:
MOV.L48 //Load/Store low 48 bits.
MOV.H48 //Load/Store high 48 bits.

Which would load or store a 48-bit value from RAM, padding the low
16-bits with zeroes on load (H48) or sign or zero extending (L48).

Could be technically doable, though:
Currently, the only encoding space left would give this Disp5, and with
a scale of 2 (only real way to make this work), direct-displacement
range would kinda suck (60 bytes), and indexing would be annoying.

So, loading from an array would likely need to be, say:
LEA.W (R5, R5), R0 //Scale by 3
MOV.H48 (R4, R0), R7

At present, the above could be faked as:
LEA.W (R5, R5), R0 //Scale by 3
MOV.Q (R4, R0), R7
SHLD.Q R7, 16, R7
But, this would impose an additional interlock penalty.
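
In plain C the faked load sequence above corresponds roughly to the
following (my sketch; little-endian, with the element stored as the top
48 bits of a binary64 as described, and note the 64-bit load reads two
bytes past the element):

  #include <stdint.h>
  #include <string.h>

  double f48_load(const uint8_t *base, size_t i)
  {
      uint64_t q;
      memcpy(&q, base + 6 * i, 8);   /* 64-bit load at base + 6*i (like MOV.Q) */
      q <<= 16;                      /* like SHLD.Q by 16: drop the neighbour's
                                        two bytes, zero-fill the low 16 bits */
      double d;
      memcpy(&d, &q, sizeof d);      /* value is now an ordinary binary64 */
      return d;
  }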

The store ops would be the more annoying case here, as I would need to
add some way to signal a 48-bit store, and to have this performed in the
L1 D$ (though, the EX stage could deal with the Low-vs-High variants).

As-is, this could be faked as, say:
SHLD.Q R7, -16, R16
SHLD.Q R7, -48, R17
LEA.W (R5, R5), R0
LEA.W (R4, R0), R3
MOV.L R16, (R3, 0)
MOV.W R17, (R3, 4)

Technically also sucks...

Note that adding a Scale-6 case to the AGU isn't really going to be
workable; it would likely be a worse issue than adding the 48-bit store
logic to the L1 cache (for the load path there is no need to bother; just
always handle it as a normal 64-bit load internally).

I guess, the main open question is more one of whether 48-bit storage
formats would be useful enough (or not just some bizarre mostly-unused
niche) to justify the costs of adding such a feature.

Beyond all this, to make any use of this, would need to add, say, a
"__float48" type or similar to my C compiler.

Though, in this case, throwing in "signed __int48" and "unsigned
__int48" would also make sense (since most of the hard parts would
already have been done by the time one has "__float48"...).

Maybe also add "__vec3h" and similar as well (effectively __vec4h but
ignores/zeroes the W component).

And, for all this, "__float48" would still be expressed as Binary64 in
registers...

Well, and no significant changes to the memory organization (memory
would still be organized around ye olde 8-bit byte).

>> As L1, L2 and L3 data caches are on chip, they can be specialized for the float size.
>> Data transfers between DRAM and the processor chip become more complicated, but as DRAM is much slower,
>> the effect is less noticeable.
>> The instructions for these different floating point units can remain 8-bit byte sized, e.g. employ a Harvard architecture.
>> (a given chip would normally support a single float size or a half or fourth thereof)

Re: Solving the Floating-Point Conundrum

<uekdq1$aivs$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34268&group=comp.arch#34268

 by: BGB - Fri, 22 Sep 2023 16:02 UTC

On 9/22/2023 9:36 AM, Scott Lurndal wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Thursday, September 21, 2023 at 9:05:14 PM UTC-5, JimBrakefield wrote:
>>> Ignoring super-linear transistor counts and logic delay, this directly translates into performance advantage.
>> <
>> One builds FP calculation resources as big as longest container needed at full throughput.
>> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
>> On such a machine, the latency is set by the calculations on this sized number.
>> AND
>> Smaller width numbers do not save any cycles.
>
> Yet they do save space.
>
> Note also that the changes to floating point in the last decade
> have not been to increase precision, but rather to reduce it.
>
> See FP16/Bfloat16 and FP8.
>

Yeah. I prefer Binary16 for most things over BFloat16 (more mantissa
bits is usually more useful than more exponent range).

Might have technically preferred something like: S.E6.F9 instead as a
better balance, but this ship has long since sailed.

There is dedicated support for Binary16 in my case, but not for
BFloat16; BF16 can, however, be faked easily enough with other operations
(mostly one gets creative with the use of shuffle ops and similar;
presumably this is also how BF16 is pulled off on top of SSE).
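
In plain C (rather than my ISA or SSE), the "fake BF16 with ordinary
ops" trick is just a 16-bit shift each way; a hedged sketch, with NaN
handling omitted:

  #include <stdint.h>
  #include <string.h>

  static inline float bf16_to_float(uint16_t h)
  {
      uint32_t bits = (uint32_t)h << 16;      /* widen; low fraction bits become zero */
      float f;
      memcpy(&f, &bits, sizeof f);
      return f;
  }

  static inline uint16_t float_to_bf16(float f)
  {
      uint32_t bits;
      memcpy(&bits, &f, sizeof bits);
      bits += 0x7FFF + ((bits >> 16) & 1);    /* round to nearest, ties to even */
      return (uint16_t)(bits >> 16);
  }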

There are a few FP8 converter ops in my case as well, but they are less
often used as FP8 is on the lower end of "barely usable for much of
anything" (these ops generally use Binary16 as the intermediary).

So, say, converter paths:
Binary16 <-> Binary64 (Scalar)
Binary32 <-> Binary64 (Scalar and SIMD 2x)
Binary16 <-> Binary32 (SIMD, 2x/4x)
FP8 <-> Binary16 (SIMD, 4x)

Conversion between FPU-SIMD and packed-integer types gets a bit more
wonky in my case (requires manual scaling and biasing to get the values
to/from the range used by the converter ops; though the ops were
designed around the assumption that the default mapping is roughly
unit-range).

Though, say, OpenGL also makes this assumption: if you give it
signed-integer coordinates, it will assume they map to the range -1.0 to
1.0, even if something like 8.8 or 16.16 fixed point might have been
"more useful".

Can scale to other ranges by adding/subtracting a bias value from the
exponents though (needs to be done in the correct order relative to the
DC biasing step to avoid a risk of going outside of the exponent range
when dealing with zeroes or similar).

....

Re: Solving the Floating-Point Conundrum

<2023Sep22.182349@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34269&group=comp.arch#34269

 by: Anton Ertl - Fri, 22 Sep 2023 16:23 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>JimBrakefield wrote:
>> On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
>>> If you look at linear algebra solvers, these are usually limited
>>> by memory bandwidth. A 512-bit cache line size accomodates
>>> 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
>>> 36-bit numbers or 16 32-bit numbers.

Yes, but it can also accommodate 10 51-bit numbers and 12 42-bit
numbers, so 48-bit numbers and 40-bit numbers make no sense with this
reasoning.

>The obvious solution to this desire for odd FP storage sizes would be
>hardware that can extract an odd number of bytes from the input stream
>and then zero-fill the rest of a 64-bit container before feeding the
>result into the FPU calculation engine. (I.e. EXTRACT)

Of course, if the idea is to use these numbers for SIMD stuff, bits
are more relevant than bytes.

With the approach you outline and 51-bit and 42-bit FP numbers, one
would need (for 512-bit-wide SIMD) a SIMD unit with 8 64-bit FMAs, 2
51-bit FMAs, and 2 42-bit FMAs. For 32-bit and 16-bit work, I expect
that the 64-bit FMAs can somehow be split.

>It is only when you store back to memory that you need some extra magic
>to perform rounding, and only if you are worried about double rounding.

If you are worried about double rounding, you need proper rounding for
the mantissa and exponent width at every step, not just on storing.
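
A tiny demonstration of the double-rounding hazard (my example, not
Anton's): round 1.3125 to one fraction bit directly versus via an
intermediate two-fraction-bit rounding; nearbyint() follows the default
round-to-nearest-even mode:

  #include <math.h>
  #include <stdio.h>

  static double round_to(double x, int fraction_bits)
  {
      double scale = ldexp(1.0, fraction_bits);       /* 2^fraction_bits */
      return nearbyint(x * scale) / scale;
  }

  int main(void)
  {
      double x = 1.3125;                              /* 1.0101 in binary */
      printf("direct  : %.4f\n", round_to(x, 1));              /* 1.5000 */
      printf("two-step: %.4f\n", round_to(round_to(x, 2), 1)); /* 1.0000 */
      return 0;
  }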

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Solving the Floating-Point Conundrum

<uekgq2$b36v$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34270&group=comp.arch#34270

 by: BGB - Fri, 22 Sep 2023 16:53 UTC

On 9/22/2023 10:10 AM, Quadibloc wrote:
> On Friday, September 22, 2023 at 8:26:38 AM UTC-6, MitchAlsup wrote:
>
>> One builds FP calculation resources as big as longest container needed at full throughput.
>> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
>> On such a machine, the latency is set by the calculations on this sized number.
>> AND
>> Smaller width numbers do not save any cycles.
>> <
>> So, the only advantage one has with 48-bit, ... numbers is memory footprint.
>> There is NO (nada, zero, zilch) advantage in calculation latency.
>
> That *assumes* you are not so desperate to reduce latency that you haven't
> built a separate single-precision ALU for single-precision floating-point
> calculations.
>

Yeah.
For example, in my case, I have:
1x FPU module that can natively handle Binary64
(6 cycle latency).
4x FPU modules that can natively handle Binary32 and Binary16
(3 cycle latency).

> After all, if you've got enough transistors to put multiple CPUs on one die,
> and you believe parallelism is far, far inferior to speeding up the clock or
> reducing latency by other means, then you might actually think that throwing
> transistors at reducing latency in this fashion is rational.
>
> Essentially, it is true that I don't buy the idea that programmers are just
> lazy, and if they did things right they could exploit parallelism effectively.
> I will grant that they can probably do much better in many cases, but I
> also feel there are fundamental limits.
>

Probably...

Otherwise:

I went and checked, and adding the logic to deal with 48-bit Load/Store
in my core was "easy enough".

I hacked the previously unused "FMOV.x UB/UW/UL" logic (note: SW/SL are
used for Binary16/Binary32, SQ/UQ are used for LDTEX).

Where, here:
UB: Low 48-bit, Sign-Extended
UW: Low 48-bit, Zero-Extended
UL: High 48-bit (Low padded zero)

Just need to verify that it doesn't significantly affect LUT cost or
blow out timing (it probably shouldn't in this case, but one never knows).

Within the L1 cache, Load uses the UQ case (in all of these cases), and
Store was mapped to the UL case (UB/UW/UL were otherwise N/A for store;
UQ was used to encode the 'MOV.X' operation). But, Store is the only
case where the operation needs to differ as far as the L1 cache is
concerned.

Still need to define some encodings, add the logic to the instruction
decoder, and add the corresponding instructions to the emulator, ...
but, thus far, it has seemed easy enough.

> John Savard

Re: Solving the Floating-Point Conundrum

<89d3f35b-c6d1-4fc7-9020-1437f3a3b877n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34271&group=comp.arch#34271

 by: MitchAlsup - Fri, 22 Sep 2023 16:59 UTC

On Friday, September 22, 2023 at 11:39:34 AM UTC-5, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >JimBrakefield wrote:
> >> On Wednesday, September 20, 2023 at 3:32:03 PM UTC-5, Thomas Koenig wrote:
> >>> If you look at linear algebra solvers, these are usually limited
> >>> by memory bandwidth. A 512-bit cache line size accomodates
> >>> 8 64-bit numbers, 10 48-bit numbers, 12 40-bit numbers, 14
> >>> 36-bit numbers or 16 32-bit numbers.
<
> Yes, but it can also accomodate 10 51-bit numbers and 12 42-bit
> numbers, so 48-bit numbers and 40-bit numbers make no sense with this
> reasoning.
<
Are you EVER going to want to perform atomic operations on these sizes ??
<
> >The obvious solution to this desire for odd FP storage sizes would be
> >hardware that can extract an odd number of bytes from the input stream
> >and then zero-fill the rest of a 64-bit container before feeding the
> >result into the FPU calculation engine. (I.e. EXTRACT)
> Of course, if the idea is to use these numbers for SIMD stuff, bits
> are more relevant than bytes.
>
> With the approach you outline and 51-bit and 42-bit FP numbers, one
> would need (for 512-bit-wide SIMD) a SIMD unit with 8 64-bit FMAs, 2
> 51-bitr FMAs, 2 42-bit FMAs. For 32-bit and 16-bit work, I expect
> that the 64-bit FMAs can be somehow split.
<
How are those 8×64-bit multipliers going to work with 10×51-bit containers ??
or 12×42-bit containers ??
<
> >It is only when you store back to memory that you need some extra magic
> >to perform rounding, and only if you are worried about double rounding.
> If you are worried about double rounding, you need proper rounding for
> the mantissa and exponent width at every step, not just on storing.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

