Re: Abrupt underflow mode

<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32966&group=comp.arch#32966

 by: Michael S - Wed, 28 Jun 2023 19:50 UTC

On Wednesday, June 28, 2023 at 10:14:45 PM UTC+3, Terje Mathisen wrote:
> Michael S wrote:
> > On Wednesday, June 28, 2023 at 8:43:41 AM UTC+3, Terje Mathisen wrote:
> >> Michael S wrote:
> >>> On Tuesday, June 27, 2023 at 7:29:25 AM UTC+3, Quadibloc wrote:
> >>>> On Monday, June 26, 2023 at 1:57:28 PM UTC-6, Michael S wrote:
> >>>>> On Monday, June 26, 2023 at 7:12:54 PM UTC+3, BGB wrote:
> >>>>
> >>>>>> I guess one possibility (for cheaper hardware) could be to not perform
> >>>>>> subnormal handling in hardware, but then have a flag in a control
> >>>>>> register that tells the FPU that if a subnormal result would be
> >>>>>> generated, to raise a fault.
> >>>>
> >>>>> That was done, multiple times, and not just on cheap hardware, but also
> >>>>> on pretty expensive HW, like first couple of generations of DEC Alpha.
> >>>>> But never on x86.
> >>>> Of course, the reason _why_ this wasn't done on x86, leaving out SSE,
> >>>
> >>> Of course, in the post above 'x86' means SSE and later.
> >>>
> >>> x87 is irrelevant because it predates IEEE 754 and, except for formats of the
> >>> data, never had aspirations to be fully compatible with IEEE 754.
> >> Please check your history!
> >>
> >> The 8087 was the inspiration for 754, i.e. the 754-1985 standards
> >> development started with the '87, then added a few tweaks which got
> >> included in versions of the FPU which was released after that point in
> >> time, i.e. the 80387. From Wikipedia
> >> <https://en.wikipedia.org/wiki/Intel_8087>:
> >>
> >> IEEE floating-point standard
> >> When Intel designed the 8087, it aimed to make a standard floating-point
> >> format for future designs. An important aspect of the 8087 from a
> >> historical perspective was that it became the basis for the IEEE 754
> >> floating-point standard.
> >> Terje
> >>
> >>
> >> --
> >> - <Terje.Mathisen at tmsw.no>
> >> "almost all programming can be viewed as an exercise in caching"
> >
> > There is no contradiction: x87 served as inspiration for IEEE 754-85, however
> > after the standard was published, developers of next generations of
> > x87-compatible FPUs had no aspirations to provide optional standard
> > compliance.
> >
> > According to my semi-educated estimate, providing such an option would not
> > have been too much of a burden on the i486 silicon budget. Despite that, they
> > didn't do it. Maybe because they were forward-looking and knew that in the coming
> > Pentium, due to the fully pipelined FPU, any complication *would be* a significant
> > burden. Or, more likely, because they didn't consider it important.
> What specifically is your beef? The 80387 was in fact fully compliant
> with the 754 spec as it existed at the time. Yes, it was different from
> pretty much all others due to the internal support for extended/80-bit
> float, but that is still explicitly allowed by the standard.
>

Maybe the -85 standard allows omitting support for arithmetic
operations that take binary64 inputs and produce binary64 outputs as
specified by IEEE, but it is certainly unusual.
It does not appear to be allowed by the -2008 standard, which says: "All conforming
implementations of this standard shall provide the operations listed in this
clause for all supported arithmetic formats, except as stated below."
Maybe the lawyers among us could read it differently from how I read it.
Or maybe one can say that the x87 implementation supports only one format,
80-bit extended precision, and that the rest of the formats are there purely to help
people with data interchange. But that does not sound satisfactory.
And if one nevertheless takes this particular position, then what exactly is the
purpose of the precision control field in the x87 control word?
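
For reference, the precision control (PC) field sits in bits 8-9 of the x87 control word and selects the significand width to which the basic arithmetic instructions round their results. It does not narrow the exponent range. The following is a minimal sketch of decoding that field and the adjacent rounding control field. The function and table names are illustrative only and do not come from any library.

```python
# Field layout of the 16-bit x87 FPU control word:
#   bits 8-9   PC, precision control
#   bits 10-11 RC, rounding control
PRECISION_BITS = {0b00: 24, 0b01: None, 0b10: 53, 0b11: 64}  # None = reserved encoding
ROUNDING_MODE = {0b00: "nearest-even", 0b01: "down", 0b10: "up", 0b11: "toward-zero"}


def decode_control_word(cw: int) -> dict:
    """Return the significand width and rounding mode selected by control word cw."""
    pc = (cw >> 8) & 0b11
    rc = (cw >> 10) & 0b11
    return {"significand_bits": PRECISION_BITS[pc], "rounding": ROUNDING_MODE[rc]}


# 0x037F is the value loaded by FNINIT: 64-bit significand, round to nearest.
print(decode_control_word(0x037F))
# 0x027F selects a 53-bit significand; the exponent range stays extended.
print(decode_control_word(0x027F))
```

Because only the significand width changes, results computed with PC set to 53 bits can still differ from true binary64 arithmetic near overflow and underflow.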

> OTOH, since SSE, the Intel/AMD FPU has reverted to something much closer
> to industry norm, making things like Java's requirement for bit-by-bit
> identical results from all operations much more feasible.

Re: Abrupt underflow mode

<ye1nM.195$edH5.22@fx11.iad>

https://news.novabbs.org/devel/article-flat.php?id=32967&group=comp.arch#32967

 by: EricP - Wed, 28 Jun 2023 21:02 UTC

EricP wrote:
> BGB wrote:
>>
>> Needed to add logic such that instructions in flushed pipeline stages
>> can't trigger pipeline interlocks...
>
> Yes, you are using a global stall where the whole pipeline stalls.
>
> Putting an elastic buffer, aka skid buffer, between ID2 and EX1 would
> allow a stall of the back end to be isolated while the front end continues.
> However once a valid buffer propagates to ID2 the front end stalls too.
>
> In order to allow complete pipeline flexibility you would need elastic
> buffers in each stage. During a stall at any stage these compress out any
> not-Valid earlier buffers and allow all earlier stages to fill up.
>
> But an elastic buffer costs more than twice as much.
> It needs two sets of stage flip-flops, plus a mux, plus some logic.
> So just having one between the front and back ends might be a good
> compromise.

There is another alternative - you can daisy chain the stall signal
across the stages. That only uses one set of flip flops for the
inter-stage buffers and fills in empty earlier stages if one stalls.

Something like:

If ((stage N input buffer is valid) AND
    ((stage N generates a stall internally and no result) OR
     ((stage N calculates a valid result output) AND
      (stage N+1 propagates stall backward to stage N))))
then
    inhibit clocking a new value into stage N input buffer
    propagate a stall backward to stage N-1
else
    enable clocking a new value into stage N input buffer
end if

The problem is that it adds 3 or 4 gate delays *per stage* above the
max stage delay to serially propagate the stall signal across all stages.
And this extra gate delay is applied to all stages because they all operate
at the same frequency which is set by the delay of the worst case stage,
the one at the end of the daisy chain.

As a compromise, you could use daisy-chained stalls between the stages
within the front end and within the back end, and then put an elastic
buffer between the two ends to break the daisy chain.
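
The daisy-chain rule above can be modelled in a few lines of Python. This is only a behavioural sketch under assumed names (`stall_chain`, `hold`, and the three per-stage flag lists are hypothetical), not RTL. The back-to-front loop stands in for the serial chain whose gate delay accumulates per stage.

```python
def stall_chain(valid, internal_stall, has_result, downstream_stall=False):
    """Evaluate the daisy-chained stall rule for one clock cycle.

    valid[i]          -- stage i's input buffer holds a valid entry
    internal_stall[i] -- stage i stalls internally and produces no result
    has_result[i]     -- stage i has a valid result to hand on
    Returns hold[i]: True if stage i's input buffer must not be clocked.
    """
    n = len(valid)
    hold = [False] * n
    stall_from_next = downstream_stall      # stall arriving at the last stage
    for i in reversed(range(n)):            # the chain is evaluated back to front
        hold[i] = valid[i] and (
            internal_stall[i] or (has_result[i] and stall_from_next)
        )
        stall_from_next = hold[i]           # propagate backward to stage i-1
    return hold


# Stage 3 stalls internally; stage 1 holds a bubble (not valid).
print(stall_chain(
    valid=[True, False, True, True],
    internal_stall=[False, False, False, True],
    has_result=[True, False, True, False],
))
```

In this example stages 2 and 3 hold, while the bubble in stage 1 absorbs the stall, so stage 0 keeps advancing and the empty slot fills in.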

Re: Abrupt underflow mode

<862f7ba3-d6a2-4460-b317-9b81f3c78a79n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32968&group=comp.arch#32968

 by: MitchAlsup - Wed, 28 Jun 2023 21:25 UTC

On Wednesday, June 28, 2023 at 2:14:45 PM UTC-5, Terje Mathisen wrote:
> What specifically is your beef? The 80387 was in fact fully compliant
> with the 754 spec as it existed at the time. Yes, it was different from
> pretty much all others due to the internal support for extended/80-bit
> float, but that is still explicitly allowed by the standard.
<
The MC68881 also managed to be compliant with IEEE 754-1985 while using
an 80-bit representation and a CORDIC unit, and it could give different
answers depending on when rounding was applied (before or after normalization).
<
So being compliant does not actually mean that two implementations give
the same result for a given computation set/sequence.
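
One well-known way results can diverge between implementations is double rounding: rounding an exact value to a 64-bit significand and then to 53 bits can differ from rounding it to 53 bits directly. The sketch below illustrates that general effect with exact rational arithmetic. It is not a model of the 68881 or of any particular FPU, and `round_to_bits` is a hypothetical helper written for this example.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to a p-bit significand, round-to-nearest-even, unbounded exponent."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Estimate floor(log2(x)) from bit lengths, then correct by at most one.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    scale = Fraction(2) ** (e - (p - 1))   # weight of the last significand bit
    n = round(x / scale)                   # Fraction rounds ties to even
    return sign * n * scale


# Just above the midpoint between 1 and the next 53-bit value.
x = 1 + Fraction(1, 2**53) + Fraction(1, 2**70)

direct = round_to_bits(x, 53)                       # one rounding step
twice = round_to_bits(round_to_bits(x, 64), 53)     # extended first, then double

print(direct == 1 + Fraction(1, 2**52))
print(twice == 1)
```

The first rounding to 64 bits discards the small term that broke the tie, so the second rounding sees an exact halfway case and rounds to even.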
>
> OTOH, since SSE, the Intel/AMD FPU has reverted to something much closer
> to industry norm, making things like Java's requirement for bit-by-bit
> identical results from all operations much more feasible.
> Terje

Re: Abrupt underflow mode

<u7ib1k$1skas$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32969&group=comp.arch#32969

 by: BGB - Wed, 28 Jun 2023 22:08 UTC

On 6/28/2023 2:36 PM, robf...@gmail.com wrote:
> On Wednesday, June 28, 2023 at 3:18:01 AM UTC-4, BGB wrote:
>> On 6/27/2023 9:17 PM, MitchAlsup wrote:
>>> On Tuesday, June 27, 2023 at 7:55:24 PM UTC-5, BGB wrote:
>>>> On 6/27/2023 2:58 PM, EricP wrote:
>>>>> BGB wrote:
>>>>
>>>> I use a strict in-order pipeline design.
>>>>
>>>> Stuff either advances in lock-step at 1 stage per cycle, or the pipeline
>>>> stalls.
>>> <
>>> This could be a cause of one or more of your speed paths. We found
>>> it difficult in the 1-wide and 2-wide in-order days to take register
>>> specifiers from the instruction, CAM all of the potential results in
>>> the pipeline, find-first the youngest one for each operand register,
>>> and use that as unary select input to the forwarding multiplexer.
>>> Only after you perform all the forwarding do you get to the point
>>> where you can address whether all the potential instructions
>>> actually issued or stalled, and exactly where to cut the pipeline.
>> The register forwarding logic seems to be one of those things that
>> scales poorly.
>
> If you can accept that the forwarding logic will be what limits the
> design, then performance can be improved somewhat by adding
> more complex instructions as long as it is within the timing limit.
>

Had partly started writing up an "idea spec" for a new pipeline design,
where the idea would be to eliminate both the register forwarding
and the use of stalls in the main EX pipeline (instead using a "Low
Speed Queue" for instructions that can't be fully pipelined, with L1
misses also being shunted off to this queue).

It would not likely help performance per clock, but could maybe reduce
net-delay enough that it could allow increasing clock-speed (by not
needing to propagate a global stall signal across the entire core...).

There would still need to be stall handling for the fetch and decode
stages though (but it would mostly be based on "whether or not the
registers are ready").

In the case of a memory access, likely it would start off as a normal
"fast path" access, but if a miss happened, the instruction is shunted
to the low-speed path (instead of sending its result back to the
register file).
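
A toy model may make the idea more concrete. It is only a sketch under loud assumptions: the class and method names are invented here, pipeline latency is collapsed to nothing, and register readiness is tracked with a simple per-register ready bit, which the description above implies but does not spell out.

```python
from collections import deque


class ToyCore:
    """Scoreboard plus a low-speed queue; timing is deliberately not modelled."""

    def __init__(self, num_regs: int = 8):
        self.ready = [True] * num_regs   # one ready bit per register
        self.slow_queue = deque()        # stands in for the "Low Speed Queue"

    def can_issue(self, srcs, dst) -> bool:
        # Decode-stage check: hold the front end only if a register is pending.
        return all(self.ready[r] for r in srcs) and self.ready[dst]

    def issue_load(self, dst: int, hit: bool) -> None:
        self.ready[dst] = False          # destination is pending once issued
        if hit:
            self.ready[dst] = True       # fast path: result reaches the register file
        else:
            self.slow_queue.append(dst)  # miss: shunted aside, EX keeps moving

    def drain_one(self) -> None:
        # The slow path finishes its oldest entry and writes the register.
        if self.slow_queue:
            self.ready[self.slow_queue.popleft()] = True


core = ToyCore()
core.issue_load(dst=4, hit=False)               # load into R4 misses in the L1
print(core.can_issue(srcs=[4, 2], dst=4))       # dependent op must wait in decode
print(core.can_issue(srcs=[1, 2], dst=3))       # independent op may proceed
core.drain_one()
print(core.can_issue(srcs=[4, 2], dst=4))       # R4 is ready again
```

The point of the sketch is that a miss blocks only instructions that name the pending register, rather than freezing every stage through a global stall signal.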

> For instance three source operand instructions, fewer clocks for
> multiplies and divides, performing combo instructions like
> shift-plus-a second op.
>

Had done this to some extent already. The reason the ISA is as "non
minimalist" as it is, is that I couldn't get stuff much faster than 50MHz,
but could add a lot of features into what it could do.

> Thor milestone reached: bluescreen displayed in FPGA.

In my case, just went and added a feature to move the handling of
predication from EX1 partly over to ID2 (with an interlock stall being
inserted between ops that may update SR.T and the predicated ops).

Not much significant change in overall LUT cost or timing (WNS), but did
seem to significantly rebalance LUT usage within the core:
ALU: shrank
EX1 stage modules: got bigger
L1 D$: got bigger
GPR file and top-level: shrank somewhat.

There is a minor reduction in LUT cost though.

It did seem to make Vivado's "implementation" stage a bit faster, though
(it dropped to around 20 minutes from a little over an hour). (On the
XC7S50 version, this stage finishes in around 6 minutes.)

For the XC7A200T version of the core, the L1 D$ is now eating around 11k
LUT (roughly 1/4 of the total LUT cost of the core).

The slow paths seem to still be following a path from the L1 D$ over to
the various parts of the pipeline via the stall signal (asserted
whenever the L1 misses).

The main cost of this feature is that it adds an extra cycle of latency to
things like CMPxx + BT/BF and similar.

>>
>>
>> But there isn't really an obvious way around it short of adding a
>> similar restriction to some VLIW DSP where one can't use the value in
>> the register as an input until the whole bundle has passed out of the
>> pipeline (trying to use a register before it leaves the pipeline
>> resulting in a stale value).
>>
>> As I saw it though, forwarding and interlocks were "necessary evils" for
>> general usability (short of the compiler needing to emit a bunch of NOPs
>> and similar).
>>>>
>>>> Well, except for interlock stalls, where EX1/2/3 and WB advance, but
>>>> PF/IF/ID1/ID2 are stalled.
>>>>
>>>> Avoiding an interlock stall mechanism would, however, require that
>>>> machine-code include NOPs and similar as needed to account for
>>>> instruction latency (where the output value of each pipeline stage also
>>>> includes a flag to say whether the value is ready yet).
>>>>
>>> You can build a less rigid pipeline.
>> Apparently MIPS originally went for a pipeline without interlock
>> handling; and then re-added it not long after...
>>
>> I had noted that paths affecting the stall signals and also the SR.T bit
>> weigh heavily in terms of timing.
>>
>>
>> I had before considered adding "crash zones" into the pipeline, which
>> could allow delaying stall signals by 1 cycle.
>>
>> Say:
>> PF IF ID CZ
>> RF CZ
>> EX1 EX2 EX3 WB
>>
>> Where, say, if a stall or interlock is detected, the results of a stage
>> can be temporarily shunted into a crash-zone, and the crash-zone stage
>> is used as an input the next stage before that stage begins moving again
>> (with a 1-cycle constant delay).
>>
>> My guess is that (besides the effort of doing so) this would likely result in
>> an increase in LUT cost.
>>>>
>>>> Well, and for a while had a bug, that I eventually realized was because
>>>> if (during a branch) the instruction in ID2 stage triggered an interlock
>>>> stall while the branch was initiating, it would cause the branch to fail
>>>> to initiate (as the IF stage simply fails to see the branch-target's
>>>> updated value for the PC register).
>>>>
>>>> Needed to add logic such that instructions in flushed pipeline stages
>>>> can't trigger pipeline interlocks...
>>>>
>>> Yep, that too.....
>> For a while, I thought it was a bug specific to a Load+Branch, but then
>> found that it could happen with other ops with a 3-cycle latency.
>>
>> It took an annoyingly long time to realize that it was due to interlocks
>> with an instruction following the branch.
>>
>> So, say:
>> MOV.L (SP, 40), R4
>> BRA lbl
>> ADD R4, R2, R4
>> Would trigger the bug.
>>
>> Though, there were some ugly workaround hacks until I figured out what
>> was going on here and added a more proper fix...
>>>>
>>>> There are potentially higher performance designs, but "there be dragons"...
>>>>
>>>

Re: Abrupt underflow mode

<cc777e29-d126-447f-9746-505f58cf8e0bn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32970&group=comp.arch#32970

 by: MitchAlsup - Wed, 28 Jun 2023 23:27 UTC

On Wednesday, June 28, 2023 at 5:08:55 PM UTC-5, BGB wrote:
> Had partly started writing up an "idea spec" for a new pipeline design,
> where the idea would be to eliminate the both the register forwarding
> and the use of stalls in the main EX pipeline (instead using a "Low
> Speed Queue" for instructions that can't be fully pipelined, with L1
> misses also being shunted off to this queue).
<
Since you have the entire architecture under your control:
you could reserve register specifier values to indicate explicit
forwarding (and which bus to forward). This totally eliminates
the forwarding logic from any speed path.
<
If you have an n-wide machine and a j-deep pipeline, you need
a field that has n×j values. I did this in a GPU design and the
compiler decorated the fields.
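
As a rough illustration of reserving specifier values, the sketch below decodes a register field in which the top n×j encodings name a forwarding slot instead of a register. Everything about the encoding is an assumption made for the example: the 6-bit field width, the placement of the reserved codes at the top of the range, and the (lane, stage) ordering.

```python
def make_decoder(width: int, depth: int, spec_bits: int = 6):
    """Build a decoder for a specifier field with width*depth reserved codes."""
    total = 1 << spec_bits
    first_fwd = total - width * depth     # lowest encoding used for forwarding
    if first_fwd <= 0:
        raise ValueError("specifier field too small for the forwarding codes")

    def decode(spec: int):
        if not 0 <= spec < total:
            raise ValueError("specifier out of range")
        if spec < first_fwd:
            return ("reg", spec)          # ordinary register-file read
        lane, stage = divmod(spec - first_fwd, depth)
        return ("fwd", lane, stage)       # take the value from this result bus

    return decode


# A 2-wide machine with a 3-deep pipeline reserves 2 * 3 = 6 encodings.
decode = make_decoder(width=2, depth=3)
print(decode(5))
print(decode(58))
print(decode(63))
```

With the selection made by the compiler and carried in the instruction, the hardware needs no operand comparison to pick a forwarding source, at the cost of 6 of the 64 register names in this example.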

Re: Abrupt underflow mode

<u7j7m6$233l8$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32971&group=comp.arch#32971

 by: Terje Mathisen - Thu, 29 Jun 2023 06:17 UTC

Michael S wrote:
> On Wednesday, June 28, 2023 at 10:14:45 PM UTC+3, Terje Mathisen wrote:
>> Michael S wrote:
>>> On Wednesday, June 28, 2023 at 8:43:41 AM UTC+3, Terje Mathisen wrote:
>>>> Michael S wrote:
>>>>> On Tuesday, June 27, 2023 at 7:29:25 AM UTC+3, Quadibloc wrote:
>>>>>> On Monday, June 26, 2023 at 1:57:28 PM UTC-6, Michael S wrote:
>>>>>>> On Monday, June 26, 2023 at 7:12:54 PM UTC+3, BGB wrote:
>>>>>>
>>>>>>>> I guess one possibility (for cheaper hardware) could be to not perform
>>>>>>>> subnormal handling in hardware, but then have a flag in a control
>>>>>>>> register that tells the FPU that if a subnormal result would be
>>>>>>>> generated, to raise a fault.
>>>>>>
>>>>>>> That was done, multiple times, and not just on cheap hardware, but also
>>>>>>> on pretty expensive HW, like first couple of generations of DEC Alpha.
>>>>>>> But never on x86.
>>>>>> Of course, the reason _why_ this wasn't done on x86, leaving out SSE,
>>>>>
>>>>> Of course, in the post above 'x86' means SSE and later.
>>>>>
>>>>> x87 is irrelevant because it predates IEEE 754 and, except for formats of the
>>>>> data, never had aspirations to be fully compatible with IEEE 754.
>>>> Please check your history!
>>>>
>>>> The 8087 was the inspiration for 754, i.e. the 754-1985 standards
>>>> development started with the '87, then added a few tweaks which got
>>>> included in versions of the FPU which were released after that point in
>>>> time, i.e. the 80387. From Wikipedia
>>>> <https://en.wikipedia.org/wiki/Intel_8087>:
>>>>
>>>> IEEE floating-point standard
>>>> When Intel designed the 8087, it aimed to make a standard floating-point
>>>> format for future designs. An important aspect of the 8087 from a
>>>> historical perspective was that it became the basis for the IEEE 754
>>>> floating-point standard.
>>>> Terje
>>>>
>>>>
>>>> --
>>>> - <Terje.Mathisen at tmsw.no>
>>>> "almost all programming can be viewed as an exercise in caching"
>>>
>>> There is no contradiction: x87 served as inspiration for IEEE 754-85, however
>>> after the standard was published, developers of next generations of
>>> x87-compatible FPUs had no aspirations to provide optional standard
>>> compliance.
>>>
>>> According to my semi-educated estimate, providing such an option would
>>> not be too much of a burden on the i486 silicon budget. Despite that, they didn't
>>> do it. Maybe because they were forward-looking and knew that in the coming
>>> Pentium, due to its fully pipelined FPU, any complication *would be* a significant
>>> burden. Or, more likely, because they didn't consider it important.
>> What specifically is your beef? The 80387 was in fact fully compliant
>> with the 754 spec as it existed at the time. Yes, it was different from
>> pretty much all others due to the internal support for extended/80-bit
>> float, but that is still explicitly allowed by the standard.
>>
>
> Maybe the -85 standard allows omitting support for arithmetic
> operations that take binary64 inputs and produce binary64 outputs as
> specified by IEEE, but it is certainly unusual.
> It does not appear to be allowed by the -2008 standard, which says: "All conforming
> implementations of this standard shall provide the operations listed in this
> clause for all supported arithmetic formats, except as stated below."
> Maybe the lawyers among us could read it differently from how I read it.
> Or maybe one can say that the x87 implementation supports only one format,
> 80-bit extended precision, and that the rest of the formats are there purely to help
> people with data interchange. But that does not sound satisfactory.
> And if one nevertheless takes this particular position, then what exactly is the
> purpose of the precision control field in the x87 control word?

The only problem with the precision control word setting single or
double precision is that the exponent range isn't modified: If you want
to do that as well, in order to force under or overflow, then you do
have to store to a memory format. As you guessed above, this is in fact
legal, mostly because it delivers better range than required, but not
currently a mainstream way to operate.
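
For reference, a minimal sketch of how the precision-control (PC) and rounding-control (RC) fields sit in the x87 control word, as I understand the documented layout (PC in bits 8-9, RC in bits 10-11). The helper name and the table names are hypothetical. Note that the PC field only selects the significand width; nothing in the control word narrows the 15-bit exponent.

```python
# x87 control word fields: bits 8-9 = precision control, bits 10-11 = rounding control.
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}  # 0b01 is reserved
ROUNDING_MODE = {
    0b00: "nearest-even",
    0b01: "down",
    0b10: "up",
    0b11: "toward-zero",
}


def decode_control_word(cw: int):
    """Return (significand bits, rounding mode) encoded in an x87 control word."""
    pc = (cw >> 8) & 0b11
    rc = (cw >> 10) & 0b11
    # PRECISION_BITS.get() yields None for the reserved PC encoding.
    return PRECISION_BITS.get(pc), ROUNDING_MODE[rc]


print(decode_control_word(0x037F))  # control word value after FINIT
print(decode_control_word(0x027F))  # same, with PC set to double precision
```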

Afair, only the 68K had a similar 80-bit extended real format.

Please think of 754 as a standard that tries to deliver the best
results possible: more or less anything a vendor does that is a true
superset of the standard would be grandfathered in.

Personally I am a lot more irritated about when to detect (and report)
subnormal results: You can do so either before or after rounding, so it
is perfectly legal to report subnormal for a result which ends up as the
smallest normal number.
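
A small sketch of that situation, using exact rational arithmetic rather than real FPU flags. The helper functions and the chosen value are my own illustration: the exact result lies just below the smallest normal binary64 number, so it is tiny before rounding, yet it rounds up to the smallest normal number.

```python
from fractions import Fraction


def floor_log2(x: Fraction) -> int:
    """Largest e with 2**e <= x, for positive x."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1


def round_rne(x: Fraction, p: int, emin=None) -> Fraction:
    """Round positive x to p significant bits, ties to even.

    With emin set, results below 2**emin lose precision gradually
    (subnormals); with emin=None the exponent range is unbounded.
    """
    e = floor_log2(x)
    if emin is not None and e < emin:
        e = emin
    ulp = Fraction(2) ** (e - p + 1)
    q = x / ulp
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


MIN_NORMAL = Fraction(2) ** -1022
exact = MIN_NORMAL - Fraction(2) ** -1077  # hypothetical exact result

tiny_before = exact < MIN_NORMAL                 # detection before rounding
tiny_after = round_rne(exact, 53) < MIN_NORMAL   # detection after rounding
delivered = round_rne(exact, 53, emin=-1022)     # value actually returned

print(tiny_before, tiny_after, delivered == MIN_NORMAL)  # True False True
```

So an implementation that detects tininess before rounding flags this result, while one that detects it after rounding does not, even though both deliver the same value.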

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Aprupt underflow mode

<u7jf3t$3r8k2$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=32972&group=comp.arch#32972
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-c42d-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Aprupt underflow mode
Date: Thu, 29 Jun 2023 08:24:29 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u7jf3t$3r8k2$1@newsreader4.netcologne.de>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de>
<2023Jun25.180112@mips.complang.tuwien.ac.at>
<b613de1c-7401-4e0d-959b-d8d51f75abd6n@googlegroups.com>
<u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me>
<7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com>
<2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me>
<d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me>
<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me>
Injection-Date: Thu, 29 Jun 2023 08:24:29 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-c42d-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:c42d:0:7285:c2ff:fe6c:992d";
logging-data="4039298"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Thu, 29 Jun 2023 08:24 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

> Afair, only the 68K had a similar 80-bit extended real format.

I believe the MC88110 did as well, in its extended register file.

80x87 and IEEE 754 (was: Aprupt underflow mode)

<2023Jun29.190507@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=32973&group=comp.arch#32973
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: 80x87 and IEEE 754 (was: Aprupt underflow mode)
Date: Thu, 29 Jun 2023 17:05:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 17
Message-ID: <2023Jun29.190507@mips.complang.tuwien.ac.at>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com> <780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com> <u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com> <u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com> <u7j7m6$233l8$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="d014cffa1600dfd4ce6f6e668bc0d5cc";
logging-data="2327324"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/qRwr8BJIFoXOmR8E2nuSZ"
Cancel-Lock: sha1:9a8tF5EQ2r5nGIre/UOy6ipldW8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Thu, 29 Jun 2023 17:05 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>The only problem with the precision control word setting single or
>double precision is that the exponent range isn't modified: If you want
>to do that as well, in order to force under or overflow, then you do
>have to store to a memory format.

That does not give the same results in all cases; it's a
double-rounding problem: First round for (say) 53-bit mantissa with
15-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
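
A minimal sketch of such a case, simulated with exact rationals instead of an actual x87. The helper names and the chosen value are hypothetical: `x` stands for the infinitely precise result of some operation, deep in the binary64 subnormal range. Rounding once to binary64 gives a different answer from rounding first to 53 bits with a wide exponent range and then storing to binary64.

```python
from fractions import Fraction


def floor_log2(x: Fraction) -> int:
    """Largest e with 2**e <= x, for positive x."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1


def round_rne(x: Fraction, p: int, emin=None) -> Fraction:
    """Round positive x to p significant bits, ties to even.

    emin=None models an exponent range wide enough that x stays normal;
    emin=-1022 models binary64 with gradual underflow.
    """
    e = floor_log2(x)
    if emin is not None and e < emin:
        e = emin
    ulp = Fraction(2) ** (e - p + 1)
    q = x / ulp
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


ULP = Fraction(2) ** -1074                      # spacing of binary64 subnormals
x = (Fraction(5, 2) + Fraction(2) ** -60) * ULP  # just above a halfway point

direct = round_rne(x, 53, emin=-1022)           # what binary64 arithmetic requires
in_register = round_rne(x, 53)                  # 53-bit significand, wide exponent
stored = round_rne(in_register, 53, emin=-1022)  # second rounding on the store

print(direct / ULP, stored / ULP)  # 3 2
```

The first rounding discards the small excess over the halfway point, so the store sees an exact tie and rounds to even, one subnormal step away from the correctly rounded result.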

My guess is that Intel did not add a proper binary64 and binary32
mode, because the cases where it makes a difference are rare.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: 80x87 and IEEE 754 (was: Aprupt underflow mode)

<8de2ce9b-2800-42d3-acb6-67bb25c57accn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32974&group=comp.arch#32974
X-Received: by 2002:a05:620a:1a1b:b0:765:444f:785c with SMTP id bk27-20020a05620a1a1b00b00765444f785cmr418qkb.0.1688063429603;
Thu, 29 Jun 2023 11:30:29 -0700 (PDT)
X-Received: by 2002:a05:6871:4692:b0:1b0:8e8e:1e03 with SMTP id
ni18-20020a056871469200b001b08e8e1e03mr641085oab.11.1688063429386; Thu, 29
Jun 2023 11:30:29 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jun 2023 11:30:29 -0700 (PDT)
In-Reply-To: <2023Jun29.190507@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4930:d523:6e50:b6cd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4930:d523:6e50:b6cd
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8de2ce9b-2800-42d3-acb6-67bb25c57accn@googlegroups.com>
Subject: Re: 80x87 and IEEE 754 (was: Aprupt underflow mode)
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 29 Jun 2023 18:30:29 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2818
 by: MitchAlsup - Thu, 29 Jun 2023 18:30 UTC

On Thursday, June 29, 2023 at 12:13:59 PM UTC-5, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >The only problem with the precision control word setting single or
> >double precision is that the exponent range isn't modified: If you want
> >to do that as well, in order to force under or overflow, then you do
> >have to store to a memory format.
> That does not give the same results in all cases; it's a
> double-rounding problem: First round for (say) 53-bit mantissa with
> 15-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>
> My guess is that Intel did not add a proper binary64 and binary32
> mode, because the cases where it makes a difference are rare.
<
One can even argue that the surprise factor is lower with the larger
exponent: fewer programmers are surprised by spurious
results when overflows don't happen than when they do.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: 80x87 and IEEE 754

<u7kmcp$287sv$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32975&group=comp.arch#32975
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754
Date: Thu, 29 Jun 2023 21:34:49 +0200
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <u7kmcp$287sv$1@dont-email.me>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de>
<u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me>
<7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com>
<2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me>
<d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me>
<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 29 Jun 2023 19:34:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f501d9ada2037b71a6620e3a81604758";
logging-data="2367391"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19mX/mwc+N3yYk7nGc1KQw93f305vBm324rseUTmeKuug=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:5HJYBv6BddDFKrySbN40RGS5QK4=
In-Reply-To: <2023Jun29.190507@mips.complang.tuwien.ac.at>
 by: Terje Mathisen - Thu, 29 Jun 2023 19:34 UTC

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> The only problem with the precision control word setting single or
>> double precision is that the exponent range isn't modified: If you want
>> to do that as well, in order to force under or overflow, then you do
>> have to store to a memory format.
>
> That does not give the same results in all cases; it's a
> double-rounding problem: First round for (say) 53-bit mantissa with
> 15-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>
> My guess is that Intel did not add a proper binary64 and binary32
> mode, because the cases where it makes a difference are rare.

I don't know why they did it, but you are absolutely correct: In the
case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
precision mode, where the result happens to be a subnormal, the internal
operation will instead deliver a normal result with a correspondingly
smaller exponent, and the rounding happens after the 52 bits of the
mantissa.

When you next store this to memory, you will get a second rounding
operation corresponding to the denormalized mantissa, and it is in this
particular situation that you can get a different result.

If you operate on float/single values, the double rounding is fine,
because you go from 64 via 53 to 24 significand bits, and that will always
deliver exactly the same final result.
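
A randomized spot check of that claim, not a proof. It is a sketch under stated assumptions: operands are positive numbers with 24-bit significands, the exponent range is treated as unbounded, and the intermediate precision is 53 bits (counting the leading bit). All names are hypothetical. It compares rounding once to 24 bits against rounding to 53 bits and then to 24 bits.

```python
import random
from fractions import Fraction


def floor_log2(x: Fraction) -> int:
    """Largest e with 2**e <= x, for positive x."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1


def round_rne(x: Fraction, p: int) -> Fraction:
    """Round positive x to p significant bits, ties to even (unbounded exponent)."""
    ulp = Fraction(2) ** (floor_log2(x) - p + 1)
    q = x / ulp
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


def random_float24(rng: random.Random) -> Fraction:
    """Random positive value with a full 24-bit significand."""
    m = rng.randrange(2 ** 23, 2 ** 24)
    e = rng.randrange(-40, 41)
    return Fraction(m) * Fraction(2) ** e


def count_mismatches(trials=2000, wide=53, narrow=24, seed=1) -> int:
    """Count results where double rounding differs from single rounding."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        a, b = random_float24(rng), random_float24(rng)
        for exact in (a + b, a * b, a / b):
            once = round_rne(exact, narrow)
            twice = round_rne(round_rne(exact, wide), narrow)
            bad += once != twice
    return bad


print(count_mismatches())
```

If the claim holds, the count stays at zero. The usual explanation is that 53 bits is more than twice 24 bits plus two, so the first rounding can never land on a 24-bit halfway point that the exact result was not already on. The same margin does not exist when going from 64 to 53 bits.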

If Intel had had the foresight (and/or FPU resources) to implement a
proper fp128 format instead of the 80-bit, then all the double rounding
problems would have been moot.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Aprupt underflow mode

<13de7c70-a07d-4eeb-95ba-2420ac263087n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32976&group=comp.arch#32976
X-Received: by 2002:a05:620a:4493:b0:762:495d:8f89 with SMTP id x19-20020a05620a449300b00762495d8f89mr747qkp.2.1688067967817;
Thu, 29 Jun 2023 12:46:07 -0700 (PDT)
X-Received: by 2002:a05:6870:c4d:b0:1b0:21ca:3afe with SMTP id
lf13-20020a0568700c4d00b001b021ca3afemr824377oab.0.1688067967533; Thu, 29 Jun
2023 12:46:07 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jun 2023 12:46:07 -0700 (PDT)
In-Reply-To: <u7jf3t$3r8k2$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.136; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.136
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <2023Jun25.180112@mips.complang.tuwien.ac.at>
<b613de1c-7401-4e0d-959b-d8d51f75abd6n@googlegroups.com> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <u7jf3t$3r8k2$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <13de7c70-a07d-4eeb-95ba-2420ac263087n@googlegroups.com>
Subject: Re: Aprupt underflow mode
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Thu, 29 Jun 2023 19:46:07 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Michael S - Thu, 29 Jun 2023 19:46 UTC

On Thursday, June 29, 2023 at 11:24:33 AM UTC+3, Thomas Koenig wrote:
> Terje Mathisen <terje.m...@tmsw.no> schrieb:
> > Afair, only the 68K had a similar 80-bit extended real format.
> I believe the MC88110 did as well, in its extended register file.

Itanium too.
Also unsuccessful, but nearly as unsuccessful as 88110.

Re: 80x87 and IEEE 754

<d5146573-8bf5-4aa7-bfff-f455655e04c2n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32977&group=comp.arch#32977
X-Received: by 2002:a05:6214:560c:b0:634:ea69:324 with SMTP id mg12-20020a056214560c00b00634ea690324mr1336qvb.13.1688068782403;
Thu, 29 Jun 2023 12:59:42 -0700 (PDT)
X-Received: by 2002:aca:6245:0:b0:3a2:62ec:b7ee with SMTP id
w66-20020aca6245000000b003a262ecb7eemr195857oib.9.1688068782073; Thu, 29 Jun
2023 12:59:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jun 2023 12:59:41 -0700 (PDT)
In-Reply-To: <u7kmcp$287sv$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.136; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.136
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at> <u7kmcp$287sv$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d5146573-8bf5-4aa7-bfff-f455655e04c2n@googlegroups.com>
Subject: Re: 80x87 and IEEE 754
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Thu, 29 Jun 2023 19:59:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4054
 by: Michael S - Thu, 29 Jun 2023 19:59 UTC

On Thursday, June 29, 2023 at 10:34:53 PM UTC+3, Terje Mathisen wrote:
> Anton Ertl wrote:
> > Terje Mathisen <terje.m...@tmsw.no> writes:
> >> The only problem with the precision control word setting single or
> >> double precision is that the exponent range isn't modified: If you want
> >> to do that as well, in order to force under or overflow, then you do
> >> have to store to a memory format.
> >
> > That does not give the same results in all cases; it's a
> > double-rounding problem: First round for (say) 53-bit mantissa with
> > 15-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
> >
> > My guess is that Intel did not add a proper binary64 and binary32
> > mode, because the cases where it makes a difference are rare.
> I don't know why they did it, but you are absolutely correct: In the
> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
> precision mode, where the result happens to be a subnormal, the internal
> operation will instead deliver a normal result with a correspondingly
> smaller exponent, and the rounding happens after the 52 bits of the
> mantissa.
>
> When you next store this to memory, you will get a second rounding
> operation corresponding to the denormalized mantissa, and it is in this
> particular situation that you can get a different result.
>
> If you operate on float/single values, the double rounding is fine,
> because you go from 64 via 53 to 24 significand bits, and that will always
> deliver exactly the same final result.
>

It seems to me that the only difference between binary32 and binary64 is that
in binary32 you can reliably get IEEE-prescribed results of fadd/fsub/fmul if
you *do not* set precision to 24 bits. Instead you store each result as binary32
to memory and then reload it from memory. In this case it does not matter
whether precision was set to 53 bits or 64 bits.
It still does not work for fdiv.

> If Intel had had the foresight (and/or FPU resources) to implement a
> proper fp128 format instead of the 80-bit, then all the double rounding
> problems would have been moot.

For add/mul, but not for div.

> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: 80x87 and IEEE 754 (was: Aprupt underflow mode)

<a397d7e0-24cb-4c4a-9b78-e74981e17ce4n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32978&group=comp.arch#32978
X-Received: by 2002:a05:620a:2950:b0:767:33c3:d51e with SMTP id n16-20020a05620a295000b0076733c3d51emr805qkp.0.1688068916050;
Thu, 29 Jun 2023 13:01:56 -0700 (PDT)
X-Received: by 2002:a05:6870:530e:b0:1b0:5b97:af37 with SMTP id
j14-20020a056870530e00b001b05b97af37mr822108oan.11.1688068915571; Thu, 29 Jun
2023 13:01:55 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jun 2023 13:01:55 -0700 (PDT)
In-Reply-To: <8de2ce9b-2800-42d3-acb6-67bb25c57accn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.136; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.136
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at> <8de2ce9b-2800-42d3-acb6-67bb25c57accn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a397d7e0-24cb-4c4a-9b78-e74981e17ce4n@googlegroups.com>
Subject: Re: 80x87 and IEEE 754 (was: Aprupt underflow mode)
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Thu, 29 Jun 2023 20:01:56 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3102
 by: Michael S - Thu, 29 Jun 2023 20:01 UTC

On Thursday, June 29, 2023 at 9:30:31 PM UTC+3, MitchAlsup wrote:
> On Thursday, June 29, 2023 at 12:13:59 PM UTC-5, Anton Ertl wrote:
> > Terje Mathisen <terje.m...@tmsw.no> writes:
> > >The only problem with the precision control word setting single or
> > >double precision is that the exponent range isn't modified: If you want
> > >to do that as well, in order to force under or overflow, then you do
> > >have to store to a memory format.
> > That does not give the same results in all cases; it's a
> > double-rounding problem: First round for (say) 53-bit mantissa with
> > 15-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
> >
> > My guess is that Intel did not add a proper binary64 and binary32
> > mode, because the cases where it makes a difference are rare.
> <
> One can even argue that the surprise factor is lower with the larger
> exponent: fewer programmers are surprised by spurious
> results when overflows don't happen than when they do.
> <
> > - anton
> > --
> > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

That's correct, but only as long as you never store intermediate results in lower
precision, which is not always possible.

Re: Aprupt underflow mode

<fb15800c-daba-4990-a368-3d44993d3bc6n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32979&group=comp.arch#32979
X-Received: by 2002:a05:620a:1a19:b0:767:33a2:f4c2 with SMTP id bk25-20020a05620a1a1900b0076733a2f4c2mr744qkb.5.1688068991769;
Thu, 29 Jun 2023 13:03:11 -0700 (PDT)
X-Received: by 2002:a05:6870:9d94:b0:1b0:60ff:b756 with SMTP id
pv20-20020a0568709d9400b001b060ffb756mr822537oab.2.1688068991541; Thu, 29 Jun
2023 13:03:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jun 2023 13:03:11 -0700 (PDT)
In-Reply-To: <13de7c70-a07d-4eeb-95ba-2420ac263087n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.136; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.136
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <2023Jun25.180112@mips.complang.tuwien.ac.at>
<b613de1c-7401-4e0d-959b-d8d51f75abd6n@googlegroups.com> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <u7jf3t$3r8k2$1@newsreader4.netcologne.de> <13de7c70-a07d-4eeb-95ba-2420ac263087n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fb15800c-daba-4990-a368-3d44993d3bc6n@googlegroups.com>
Subject: Re: Aprupt underflow mode
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Thu, 29 Jun 2023 20:03:11 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2286
 by: Michael S - Thu, 29 Jun 2023 20:03 UTC

On Thursday, June 29, 2023 at 10:46:09 PM UTC+3, Michael S wrote:
> On Thursday, June 29, 2023 at 11:24:33 AM UTC+3, Thomas Koenig wrote:
> > Terje Mathisen <terje.m...@tmsw.no> schrieb:
> > > Afair, only the 68K had a similar 80-bit extended real format.
> > I believe the MC88110 did as well, in its extended register file.
> Itanium too.
> Also unsuccessful, but nearly as unsuccessful as 88110.

I meant to say "but not nearly as unsuccessful as"

Re: 80x87 and IEEE 754

<5f378818-50ae-4bec-aa02-47f8db022e2fn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32980&group=comp.arch#32980
X-Received: by 2002:a05:622a:134b:b0:400:9314:d5aa with SMTP id w11-20020a05622a134b00b004009314d5aamr1434qtk.12.1688074669492;
Thu, 29 Jun 2023 14:37:49 -0700 (PDT)
X-Received: by 2002:a17:90a:f312:b0:263:165a:9859 with SMTP id
ca18-20020a17090af31200b00263165a9859mr108049pjb.9.1688074668746; Thu, 29 Jun
2023 14:37:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 29 Jun 2023 14:37:48 -0700 (PDT)
In-Reply-To: <u7kmcp$287sv$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4930:d523:6e50:b6cd;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4930:d523:6e50:b6cd
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at> <u7kmcp$287sv$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5f378818-50ae-4bec-aa02-47f8db022e2fn@googlegroups.com>
Subject: Re: 80x87 and IEEE 754
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 29 Jun 2023 21:37:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3807
 by: MitchAlsup - Thu, 29 Jun 2023 21:37 UTC

On Thursday, June 29, 2023 at 2:34:53 PM UTC-5, Terje Mathisen wrote:
> Anton Ertl wrote:
> > Terje Mathisen <terje.m...@tmsw.no> writes:
> >> The only problem with the precision control word setting single or
> >> double precision is that the exponent range isn't modified: If you want
> >> to do that as well, in order to force under or overflow, then you do
> >> have to store to a memory format.
> >
> > That does not give the same results in all cases; it's a
> > double-rounding problem: First round for (say) 53-bit mantissa with
> > 15-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
> >
> > My guess is that Intel did not add a proper binary64 and binary32
> > mode, because the cases where it makes a difference are rare.
> I don't know why they did it, but you are absolutely correct: In the
> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
> precision mode, where the result happens to be a subnormal, the internal
> operation will instead deliver a normal result with a correspondingly
> smaller exponent, and the rounding happens after the 52 bits of the
> mantissa.
<
If the rounding is Round to nearest ODD
>
> When you next store this to memory, you will get a second rounding
> operation corresponding to the denormalized mantissa, and it is in this
> particular situation that you can get a different result.
<
You will not get a different result.
<
Unfortunately IEEE 754-any does not have a RNO.
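
A minimal sketch of the idea, reading "round to odd" as sticky rounding (truncate, then force the last kept bit to 1 if anything was discarded). That reading is an assumption about what is meant here, and the helper names and the test value are hypothetical. The value sits just above a halfway point between two binary64 subnormals, the kind of case where a first round-to-nearest step would lose the information the second rounding needs.

```python
from fractions import Fraction


def floor_log2(x: Fraction) -> int:
    """Largest e with 2**e <= x, for positive x."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1


def round_rne(x: Fraction, p: int, emin=None) -> Fraction:
    """Round positive x to p significant bits, ties to even."""
    e = floor_log2(x)
    if emin is not None and e < emin:
        e = emin
    ulp = Fraction(2) ** (e - p + 1)
    q = x / ulp
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


def round_to_odd(x: Fraction, p: int) -> Fraction:
    """Truncate positive x to p bits; if inexact, make the last kept bit odd."""
    ulp = Fraction(2) ** (floor_log2(x) - p + 1)
    q = x / ulp
    n = q.numerator // q.denominator
    if q != n and n % 2 == 0:
        n += 1
    return n * ulp


ULP = Fraction(2) ** -1074
x = (Fraction(5, 2) + Fraction(2) ** -60) * ULP

direct = round_rne(x, 53, emin=-1022)
via_odd = round_rne(round_to_odd(x, 53), 53, emin=-1022)

print(direct / ULP, via_odd / ULP)  # 3 3
```

As far as I know, this only works when the second rounding drops at least two more bits than the first kept, so a result just below the smallest normal number (where only one bit is lost on the store) would not be covered by this trick.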
>
> If you operate on float/single values, the double rounding is fine,
> because you go from 64 via 53 to 24 significand bits, and that will always
> deliver exactly the same final result.
>
> If Intel had had the foresight (and/or FPU resources) to implement a
> proper fp128 format instead of the 80-bit, then all the double rounding
> problems would have been moot.
> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: 80x87 and IEEE 754 (was: Aprupt underflow mode)

<u7l2cm$29gag$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32981&group=comp.arch#32981

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754 (was: Aprupt underflow mode)
Date: Thu, 29 Jun 2023 17:59:31 -0500
Organization: A noiseless patient Spider
Lines: 98
Message-ID: <u7l2cm$29gag$1@dont-email.me>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de>
<u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me>
<7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com>
<2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me>
<d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me>
<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
<8de2ce9b-2800-42d3-acb6-67bb25c57accn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 29 Jun 2023 22:59:34 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8574c21ff36c78e38c4554a92fa2dc63";
logging-data="2408784"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+s5bValztYuGTx+M41IV5g"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:CRwVWQJuWra7mHl0RhY1BdoGIbE=
Content-Language: en-US
In-Reply-To: <8de2ce9b-2800-42d3-acb6-67bb25c57accn@googlegroups.com>
 by: BGB - Thu, 29 Jun 2023 22:59 UTC

On 6/29/2023 1:30 PM, MitchAlsup wrote:
> On Thursday, June 29, 2023 at 12:13:59 PM UTC-5, Anton Ertl wrote:
>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>> The only problem with the precision control word setting single or
>>> double precision is that the exponent range isn't modified: If you want
>>> to do that as well, in order to force under or overflow, then you do
>>> have to store to a memory format.
>> That does not give the same results in all cases; it's a
>> double-rounding problem: First round for (say) 53-bit mantissa with
>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>>
>> My guess is that Intel did not add a proper binary64 and binary32
>> mode, because the cases where it makes a difference are rare.
> <
> One can even argue that the surprise factor is lower with the larger
> exponent..............fewer programmers are surprised by spurious
> results when overflows don't happen, than when they do.
> <

In most contexts, it shouldn't matter.

Granted, in some of my NN experiments, I did end up needing to emulate
Binary16 on my PC as otherwise my training code would sometimes lead to
intermediate results going outside the range of what could be represented.

Well, and ended up with a "partial back-propagation" hack in that if an
intermediate value went outside of an allowed range, it would go back
and reduce all the weights.

Though, arguably, using BF16 could partly avoid this issue, provided at
least one doesn't overflow the Binary32 exponent range (or limit the
weights to a small enough range that an overflow is "basically impossible").

But, say, clamping Binary32 or BF16 to not go over 2^120, is not asking
quite as much as clamping Binary16 to around 2^8 to be sure that no
intermediate values are able to exceed the 2^15 limit...

Though, for the most part, in my experiment I was "mostly" using a
genetic-algorithm approach, where for the top performing nets in each
generation, it would breed new nets by randomly interpolating the
weights (1).

But, yeah, with emulation logic, and code to detect and reduce all the
weights if at any point the accumulator went over 2^14.
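[Editorial sketch, not from the original post.] The guard described above could look roughly like the following. The function name accumulate_with_rescale, the use of plain float in place of emulated Binary16, and the shrink factor of 0.5 are all assumptions made for illustration; only the 2^14 threshold comes from the post.

```c
#include <assert.h>
#include <stddef.h>

#define ACC_LIMIT 16384.0f   /* 2^14, the threshold named in the post */

/* Hypothetical helper: compute sum(w[i] * x[i]). If the running sum ever
   leaves [-ACC_LIMIT, ACC_LIMIT], halve every weight (assumed shrink
   factor) and start over. Returns the number of rescale passes needed. */
static int accumulate_with_rescale(float *w, const float *x, size_t n, float *out)
{
    assert(w != NULL && x != NULL && out != NULL);
    int passes = 0;
    for (;;) {
        float acc = 0.0f;
        int over = 0;
        for (size_t i = 0; i < n; i++) {
            acc += w[i] * x[i];
            if (acc > ACC_LIMIT || acc < -ACC_LIMIT) {
                over = 1;
                break;
            }
        }
        if (!over) {
            *out = acc;
            return passes;
        }
        for (size_t i = 0; i < n; i++)
            w[i] *= 0.5f;            /* reduce all the weights, then retry */
        passes++;
    }
}
```

With two weights of 100 and two inputs of 100, the first pass reaches 20000, so the weights are halved once and the second pass yields 10000.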

OTOH, arguably if one used S.E4.F3 weights, it would become basically
impossible to overflow the Binary16 range for any "reasonable sized" net...

(1) With an interpolation value of, say:
  0.5+(ssqrt(rand()-16384)/181.0)
Where ssqrt is, say:
  (x>=0)?sqrt(x):(-sqrt(-x))

Where, it will primarily interpolate, but may also extrapolate.

Where, say, an approximation of ssqrt is also used as one of the main
activation functions in this case, ...

Though, one could maybe also argue for using back-propagation, or some
hybrid strategy (back-propagate for the error in each test-run before
selecting and breeding the nets?...).

Well, also maybe one can argue for more than 3 hidden layers, but in my
testing, effectiveness seemed to start to drop (relative to generation
count) when going beyond 3 layers (whereas 3 seemed to be more effective
than 1 or 2 layers). Though, this is likely to be task dependent to some
extent...

Though, I guess it might be interesting to try some standardized
benchmarks here, like digit recognition (say, so I can have some idea
how it compares to "other stuff").

Finds and goes to download the training data for the MNIST benchmark...
... Why exactly does it need to be multiple GB?...
... Why does it need to be an epic crapton of small files?...

This is a total abuse of the NTFS filesystem, but ironically turning on
"file and folder compression" makes the unzipping process faster...

OK, I guess there isn't just digits here, but also a bunch of pictures
of pieces of clothing for some reason. So, I guess the idea is that one
can also train a net to hopefully tell the difference between
pants/shirts/shoes/... ?...

Yeah, unzipping this file is taking a painfully long time in any case...

>> - anton
>> --
>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: 80x87 and IEEE 754

<u7m4a8$2gdkt$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32982&group=comp.arch#32982

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754
Date: Fri, 30 Jun 2023 10:38:31 +0200
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <u7m4a8$2gdkt$1@dont-email.me>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de>
<u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me>
<7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com>
<2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me>
<d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me>
<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
<u7kmcp$287sv$1@dont-email.me>
<d5146573-8bf5-4aa7-bfff-f455655e04c2n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 08:38:32 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="79526d2acda2b8dbcd601c44b2c3b0e5";
logging-data="2635421"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18OqmXB7RYTBGRM5nHAJJCE3BwP+Zl+PhE7JWrjNxadzw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:NvCsHi3KD9z/fsUpNXKXOdYKRV0=
In-Reply-To: <d5146573-8bf5-4aa7-bfff-f455655e04c2n@googlegroups.com>
 by: Terje Mathisen - Fri, 30 Jun 2023 08:38 UTC

Michael S wrote:
> On Thursday, June 29, 2023 at 10:34:53 PM UTC+3, Terje Mathisen wrote:
>> Anton Ertl wrote:
>>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>>> The only problem with the precision control word setting single or
>>>> double precision is that the exponent range isn't modified: If you want
>>>> to do that as well, in order to force under or overflow, then you do
>>>> have to store to a memory format.
>>>
>>> That does not give the same results in all cases; it's a
>>> double-rounding problem: First round for (say) 53-bit mantissa with
>>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>>>
>>> My guess is that Intel did not add a proper binary64 and binary32
>>> mode, because the cases where it makes a difference are rare.
>> I don't know why they did it, but you are absolutely correct: In the
>> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
>> precision mode, where the result happens to be a subnormal, the internal
>> operation will instead deliver a normal result with a correspondingly
>> smaller exponent, and the rounding happens after the 52 bits of the
>> mantissa.
>>
>> When you next store this to memory, you will get a second rounding
>> operation corresponding to the denormalized mantissa, and it is in this
>> particular situation that you can get a different result.
>>
>> If you operate on float/single values, the double rounding is fine,
>> because you go from 64 via 52 to 23 mantissa bits, and that will always
>> deliver exactly the same final result.
>>
>
> It seems to me, the only difference between binary32 and binary64 is that
> in binary32 you can reliably get IEEE-prescribed results of fadd/fsub/fmul if
> you *do not* set precision to 24 bit. Instead you store each result as binary32
> to memory and then reload from memory. In this case it does not matter
> whether precision was set to 53 bits or 64 bits.
> It still does not work for fdiv.

Wow!

Do you have an example? I have never seen this and tended to believe my
own experience with unsigned DIV where having N+1 bits in the reciprocal
is sufficient.

Going from 52 to 23 corresponds to 2N + 4 and that is supposed to always
give the same result...

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: 80x87 and IEEE 754

<u7m4gf$2gdkt$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32983&group=comp.arch#32983

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754
Date: Fri, 30 Jun 2023 10:41:51 +0200
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <u7m4gf$2gdkt$2@dont-email.me>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de>
<u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me>
<7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com>
<2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me>
<d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me>
<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
<u7kmcp$287sv$1@dont-email.me>
<5f378818-50ae-4bec-aa02-47f8db022e2fn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 08:41:51 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="79526d2acda2b8dbcd601c44b2c3b0e5";
logging-data="2635421"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ezBL1WQF2JpyPRgvgD4lbne4oqsH1aFK0dfcJMvDGvQ=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:pInW+JjvUp3BScxf7iTcRWjEy3k=
In-Reply-To: <5f378818-50ae-4bec-aa02-47f8db022e2fn@googlegroups.com>
 by: Terje Mathisen - Fri, 30 Jun 2023 08:41 UTC

MitchAlsup wrote:
> On Thursday, June 29, 2023 at 2:34:53 PM UTC-5, Terje Mathisen wrote:
>> Anton Ertl wrote:
>>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>>> The only problem with the precision control word setting single or
>>>> double precision is that the exponent range isn't modified: If you want
>>>> to do that as well, in order to force under or overflow, then you do
>>>> have to store to a memory format.
>>>
>>> That does not give the same results in all cases; it's a
>>> double-rounding problem: First round for (say) 53-bit mantissa with
>>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>>>
>>> My guess is that Intel did not add a proper binary64 and binary32
>>> mode, because the cases where it makes a difference are rare.
>> I don't know why they did it, but you are absolutely correct: In the
>> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
>> precision mode, where the result happens to be a subnormal, the internal
>> operation will instead deliver a normal result with a correspondingly
>> smaller exponent, and the rounding happens after the 52 bits of the
>> mantissa.
> <
> If the rounding is Round to nearest ODD
>>
>> When you next store this to memory, you will get a second rounding
>> operation corresponding to the denormalized mantissa, and it is in this
>> particular situation that you can get a different result.
> <
> You will not get a different result.
> <
> Unfortunately IEEE 754-any does not have a RNO.

I never looked at RNO, but intuitively the Odd part acts as a proxy for
the sticky bit, so that it will propagate properly through the
subsequent second rounding?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: 80x87 and IEEE 754

<365a4c88-acd1-44c5-912c-d9102f86dc77n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32984&group=comp.arch#32984

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:5c9:b0:635:f079:8dcd with SMTP id t9-20020a05621405c900b00635f0798dcdmr7450qvz.6.1688123671453;
Fri, 30 Jun 2023 04:14:31 -0700 (PDT)
X-Received: by 2002:a05:6a00:802:b0:67f:a7d3:3f67 with SMTP id
m2-20020a056a00080200b0067fa7d33f67mr2546832pfk.0.1688123671168; Fri, 30 Jun
2023 04:14:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 30 Jun 2023 04:14:30 -0700 (PDT)
In-Reply-To: <u7m4a8$2gdkt$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:a4fd:f073:be1e:4ae6;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:a4fd:f073:be1e:4ae6
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
<u7kmcp$287sv$1@dont-email.me> <d5146573-8bf5-4aa7-bfff-f455655e04c2n@googlegroups.com>
<u7m4a8$2gdkt$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <365a4c88-acd1-44c5-912c-d9102f86dc77n@googlegroups.com>
Subject: Re: 80x87 and IEEE 754
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Fri, 30 Jun 2023 11:14:31 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4811
 by: Michael S - Fri, 30 Jun 2023 11:14 UTC

On Friday, June 30, 2023 at 11:38:36 AM UTC+3, Terje Mathisen wrote:
> Michael S wrote:
> > On Thursday, June 29, 2023 at 10:34:53 PM UTC+3, Terje Mathisen wrote:
> >> Anton Ertl wrote:
> >>> Terje Mathisen <terje.m...@tmsw.no> writes:
> >>>> The only problem with the precision control word setting single or
> >>>> double precision is that the exponent range isn't modified: If you want
> >>>> to do that as well, in order to force under or overflow, then you do
> >>>> have to store to a memory format.
> >>>
> >>> That does not give the same results in all cases; it's a
> >>> double-rounding problem: First round for (say) 53-bit mantissa with
> >>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
> >>>
> >>> My guess is that Intel did not add a proper binary64 and binary32
> >>> mode, because the cases where it makes a difference are rare.
> >> I don't know why they did it, but you are absolutely correct: In the
> >> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
> >> precision mode, where the result happens to be a subnormal, the internal
> >> operation will instead deliver a normal result with a correspondingly
> >> smaller exponent, and the rounding happens after the 52 bits of the
> >> mantissa.
> >>
> >> When you next store this to memory, you will get a second rounding
> >> operation corresponding to the denormalized mantissa, and it is in this
> >> particular situation that you can get a different result.
> >>
> >> If you operate on float/single values, the double rounding is fine,
> >> because you go from 64 via 52 to 23 mantissa bits, and that will always
> >> deliver exactly the same final result.
> >>
> >
> > It seems to me, the only difference between binary32 and binary64 is that
> > in binary32 you can reliably get IEEE-prescribed results of fadd/fsub/fmul if
> > you *do not* set precision to 24 bit. Instead you store each result as binary32
> > to memory and then reload from memory. In this case it does not matter
> > whether precision was set to 53 bits or 64 bits.
> > It still does not work for fdiv.
> Wow!
>
> Do you have an example? I have never seen this and tended to believe my
> own experience with unsigned DIV where having N+1 bits in the reciprocal
> is sufficient.
>
> Going from 52 to 23 corresponds to 2N + 4 and that is supposed to always
> give the same result...
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

You are right. The [exact] ratio of two binary32 numbers contains at most
23 consecutive zero bits in between 1 bits. So, double rounding through
53 bits will always give the correct result.
And SQRT is of course OK by virtue of never producing subnormal results.
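[Editorial sketch, not from the original post.] The division claim can be spot-checked in C by comparing a binary32 division against the same division carried out in binary64 and then narrowed. The helper names, the simple generator, and the exponent range are illustrative assumptions, and a sampled check is of course not a proof.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: turn random bits into a positive, normal binary32
   with biased exponent 96..159, so every quotient stays normal. */
static float make_float(uint32_t bits)
{
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t fraction = bits & 0x007FFFFFu;
    uint32_t exponent = 96u + ((bits >> 23) & 63u);
    uint32_t pattern = (exponent << 23) | fraction;
    float f;
    memcpy(&f, &pattern, sizeof f);
    return f;
}

/* Count cases where a/b rounded once to binary32 differs from a/b rounded
   first to binary64 (53 bits) and then to binary32 (24 bits). */
static unsigned count_div_mismatches(unsigned trials)
{
    uint32_t state = 12345u;                      /* fixed seed for a simple LCG */
    unsigned mismatches = 0;
    for (unsigned i = 0; i < trials; i++) {
        state = state * 1664525u + 1013904223u;
        float a = make_float(state >> 3);
        state = state * 1664525u + 1013904223u;
        float b = make_float(state >> 3);
        volatile float direct = a / b;            /* single rounding */
        volatile double wide = (double)a / (double)b;
        volatile float twice = (float)wide;       /* second rounding */
        if (direct != twice)
            mismatches++;
    }
    return mismatches;
}
```

The volatile qualifiers are there only to keep the compiler from folding the two paths together. Under the argument above, the count should stay at zero for any number of trials.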

Re: 80x87 and IEEE 754

<yeDnM.2558$fNr5.1450@fx16.iad>

https://news.novabbs.org/devel/article-flat.php?id=32985&group=comp.arch#32985

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx16.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com> <780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com> <u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com> <u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com> <u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at> <u7kmcp$287sv$1@dont-email.me>
In-Reply-To: <u7kmcp$287sv$1@dont-email.me>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 56
Message-ID: <yeDnM.2558$fNr5.1450@fx16.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Fri, 30 Jun 2023 16:17:34 UTC
Date: Fri, 30 Jun 2023 12:17:12 -0400
X-Received-Bytes: 3441
 by: EricP - Fri, 30 Jun 2023 16:17 UTC

Terje Mathisen wrote:
> Anton Ertl wrote:
>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>> The only problem with the precision control word setting single or
>>> double precision is that the exponent range isn't modified: If you want
>>> to do that as well, in order to force under or overflow, then you do
>>> have to store to a memory format.
>>
>> That does not give the same results in all cases; it's a
>> double-rounding problem: First round for (say) 53-bit mantissa with
>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>>
>> My guess is that Intel did not add a proper binary64 and binary32
>> mode, because the cases where it makes a difference are rare.
>
> I don't know why they did it, but you are absolutely correct: In the
> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
> precision mode, where the result happens to be a subnormal, the internal
> operation will instead deliver a normal result with a correspondingly
> smaller exponent, and the rounding happens after the 52 bits of the
> mantissa.
>
> When you next store this to memory, you will get a second rounding
> operation corresponding to the denormalized mantissa, and it is in this
> particular situation that you can get a different result.
>
> If you operate on float/single values, the double rounding is fine,
> because you go from 64 via 52 to 23 mantissa bits, and that will always
> deliver exactly the same final result.
>
> If Intel had had the foresight (and/or FPU resources) to implement a
> proper fp128 format instead of the 80-bit, then all the double rounding
> problems would have been moot.
>
> Terje
>
>

It took a bit of rummaging about but the 8087 was designed by
John F. Palmer, Bruce W. Ravenel, Rafi Nave.

Palmer wrote a 1980 article which describes some of its design rationale.

The INTEL 8087 Numeric Data Processor, Palmer, 1980
https://search.iczhiku.com/paper/DyXtJLK9sG1DZUjG.pdf

and co-authored a 1979 article with Kahan

On a proposed floating-point standard, Kahan, Palmer, 1979
https://dl.acm.org/doi/pdf/10.1145/1057520.1057522

This is the 8087 patent, which might offer some insight into its design.

Numeric data processor
https://patents.google.com/patent/USRE33629E/

Re: 80x87 and IEEE 754

<u7n00o$3tipq$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=32986&group=comp.arch#32986

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd4-c42d-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754
Date: Fri, 30 Jun 2023 16:31:20 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u7n00o$3tipq$1@newsreader4.netcologne.de>
References: <u78vn4$3keqd$1@newsreader4.netcologne.de>
<u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me>
<7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com>
<2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me>
<d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me>
<77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me>
<2023Jun29.190507@mips.complang.tuwien.ac.at>
<u7kmcp$287sv$1@dont-email.me>
<5f378818-50ae-4bec-aa02-47f8db022e2fn@googlegroups.com>
<u7m4gf$2gdkt$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 16:31:20 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd4-c42d-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd4:c42d:0:7285:c2ff:fe6c:992d";
logging-data="4115258"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 30 Jun 2023 16:31 UTC

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> MitchAlsup wrote:
>> On Thursday, June 29, 2023 at 2:34:53 PM UTC-5, Terje Mathisen wrote:
>>> Anton Ertl wrote:
>>>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>>>> The only problem with the precision control word setting single or
>>>>> double precision is that the exponent range isn't modified: If you want
>>>>> to do that as well, in order to force under or overflow, then you do
>>>>> have to store to a memory format.
>>>>
>>>> That does not give the same results in all cases; it's a
>>>> double-rounding problem: First round for (say) 53-bit mantissa with
>>>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>>>>
>>>> My guess is that Intel did not add a proper binary64 and binary32
>>>> mode, because the cases where it makes a difference are rare.
>>> I don't know why they did it, but you are absolutely correct: In the
>>> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
>>> precision mode, where the result happens to be a subnormal, the internal
>>> operation will instead deliver a normal result with a correspondingly
>>> smaller exponent, and the rounding happens after the 52 bits of the
>>> mantissa.
>> <
>> If the rounding is Round to nearest ODD
>>>
>>> When you next store this to memory, you will get a second rounding
>>> operation corresponding to the denormalized mantissa, and it is in this
>>> particular situation that you can get a different result.
>> <
>> You will not get a different result.
>> <
>> Unfortunately IEEE 754-any does not have a RNO.
>
> I never looked at RNO, but intuitively the Odd part acts as a proxy for
> the sticky bit, so that it will propagate properly through the
> subsequent second rounding?

POWER has both round ties to odd and round ties to even in its
128-bit float operations. For example, xscvqpdp is "VSX Scalar
round & Convert Quad-Precision to Double-Precision", with
xscvqpdpo being the round to odd variant. There are also versions
for the usual arithmetic operations including FMA.

>
> Terje
>

Re: 80x87 and IEEE 754

<6WFnM.627$WpLf.170@fx33.iad>

https://news.novabbs.org/devel/article-flat.php?id=32987&group=comp.arch#32987

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: 80x87 and IEEE 754
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me> <u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com> <780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com> <u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com> <u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com> <u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at> <u7kmcp$287sv$1@dont-email.me> <yeDnM.2558$fNr5.1450@fx16.iad>
In-Reply-To: <yeDnM.2558$fNr5.1450@fx16.iad>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 36
Message-ID: <6WFnM.627$WpLf.170@fx33.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Fri, 30 Jun 2023 19:20:34 UTC
Date: Fri, 30 Jun 2023 15:19:45 -0400
X-Received-Bytes: 2466
 by: EricP - Fri, 30 Jun 2023 19:19 UTC

EricP wrote:
>
> It took a bit of rummaging about but the 8087 was designed by
> John F. Palmer, Bruce W. Ravenel, Rafi Nave.
>
> Palmer wrote a 1980 article which describes some of its design rationale.
>
> The INTEL 8087 Numeric Data Processor, Palmer, 1980
> https://search.iczhiku.com/paper/DyXtJLK9sG1DZUjG.pdf
>
> and co-authored a 1979 article with Kahan
>
> On a proposed floating-point standard, Kahan, Palmer, 1979
> https://dl.acm.org/doi/pdf/10.1145/1057520.1057522
>
> This is the 8087 patent, which might offer some insight into its design.
>
> Numeric data processor
> https://patents.google.com/patent/USRE33629E/

What might have kicked off the whole development is that, according
to the Wikipedia article on the 8087, Intel had previously
manufactured a 32-bit FPU for the 8080 called the 8231/8232,
which were licensed versions of the AMD 9511/9512 chips.

The AMD 9511 uses a 1-7-24 32-bit FP format (which they call "double precision")
and 16- and 32-bit integers, has log and trig functions, etc.
It is a stack design with either 8 words of 16 bits or 4 words of 32 bits.

There are links to manuals for both AMD 9511 and Intel 8231 on the page.
The AMD manual documents the log and trig calculations.

https://en.wikipedia.org/wiki/Intel_8231/8232

Re: 80x87 and IEEE 754

<3da81b3a-8264-451d-95c0-160b8a3168aan@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32988&group=comp.arch#32988

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:303:b0:635:ddbc:cfc7 with SMTP id i3-20020a056214030300b00635ddbccfc7mr12119qvu.1.1688153855045;
Fri, 30 Jun 2023 12:37:35 -0700 (PDT)
X-Received: by 2002:a63:474d:0:b0:55b:33b8:609f with SMTP id
w13-20020a63474d000000b0055b33b8609fmr1754735pgk.11.1688153854777; Fri, 30
Jun 2023 12:37:34 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 30 Jun 2023 12:37:34 -0700 (PDT)
In-Reply-To: <u7m4gf$2gdkt$2@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:442:f22d:37bd:282c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:442:f22d:37bd:282c
References: <u78vn4$3keqd$1@newsreader4.netcologne.de> <u7b8q8$qh6r$1@dont-email.me>
<u7cde2$10aie$1@dont-email.me> <7467e4f5-f727-4d46-b491-8052af8fbcb2n@googlegroups.com>
<780c43c5-d79f-4bfe-91dc-6443b3397325n@googlegroups.com> <2304c8fb-20cd-40e9-a9e4-0149d4d39a87n@googlegroups.com>
<u7gha9$1mms4$1@dont-email.me> <d138dab4-a19a-4e0b-98af-780aa50ecdf4n@googlegroups.com>
<u7i0r1$1rhn2$1@dont-email.me> <77319995-dd17-425a-9427-210feddd2d46n@googlegroups.com>
<u7j7m6$233l8$1@dont-email.me> <2023Jun29.190507@mips.complang.tuwien.ac.at>
<u7kmcp$287sv$1@dont-email.me> <5f378818-50ae-4bec-aa02-47f8db022e2fn@googlegroups.com>
<u7m4gf$2gdkt$2@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3da81b3a-8264-451d-95c0-160b8a3168aan@googlegroups.com>
Subject: Re: 80x87 and IEEE 754
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 30 Jun 2023 19:37:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3288
 by: MitchAlsup - Fri, 30 Jun 2023 19:37 UTC

On Friday, June 30, 2023 at 3:41:55 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Thursday, June 29, 2023 at 2:34:53 PM UTC-5, Terje Mathisen wrote:
> > If the rounding is Round to nearest ODD
> >>
> >> When you next store this to memory, you will get a second rounding
> >> operation corresponding to the denormalized mantissa, and it is in this
> >> particular situation that you can get a different result.
> > <
> > You will not get a different result.
> > <
> > Unfortunately IEEE 754-any does not have a RNO.
<
> I never looked at RNO, but intuitively the Odd part acts as a proxy for
> the sticky bit, so that it will propagate properly through the
> subsequent second rounding?
<
What RNO does is avoid incrementing at the low-order bit (LoB), thus
avoiding a carry cascade that could change the {guard, round} bits of
the smaller size, which gets rounded later.
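
A minimal sketch of that point, using plain integers as significands rather than any real FPU format. The 12/8/4-bit widths, the sample value and both helper names are assumptions chosen for illustration; the first rounding drops 4 bits and the later one drops 4 more.

```python
def round_nearest_even(n, drop):
    """Drop the low `drop` bits of a non-negative integer, ties to even."""
    q, r = divmod(n, 1 << drop)
    half = 1 << (drop - 1)
    if r > half or (r == half and q & 1):
        q += 1  # the increment that can carry into the bits above
    return q


def round_to_odd(n, drop):
    """Truncate the low `drop` bits; set the LSB if anything was lost."""
    q, r = divmod(n, 1 << drop)
    return (q | 1) if r else q  # never increments, so no carry cascade


n = 0b1011_0111_1001  # hypothetical 12-bit exact result

direct = round_nearest_even(n, 8)                            # one rounding
twice_rne = round_nearest_even(round_nearest_even(n, 4), 4)  # RNE, then RNE
via_odd = round_nearest_even(round_to_odd(n, 4), 4)          # RNO, then RNE

print(direct, twice_rne, via_odd)  # prints: 11 12 11

# Exhaustive check over every 12-bit value for these particular widths.
all_match = all(
    round_nearest_even(round_to_odd(v, 4), 4) == round_nearest_even(v, 8)
    for v in range(1 << 12)
)
print(all_match)  # prints: True
```

In the sample value the first nearest-even step carries into bit 3 and manufactures a tie that the exact value never had. The round-to-odd step only ORs in a sticky bit, so the later rounding still sees "below half".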
<
The MC88100 numeric instructions had a size for each operand, and
so also avoided the double rounding when one did:
<
FADD.sdd R5,R6,R8
<
R6 and R8 are double (d) while R5 is single (s).
<
But you could get a double rounding if you WANTED with:
<
FADD.ddd R4,R6,R8
FCVT.sd R5,R4
<
That two-instruction form is the way most architectures define their
semantics and expressibility.
<
Mostly, though, compiler people hated it.
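
A rough model of the two sequences, not a statement about actual 88100 hardware behavior. It assumes Python floats are IEEE binary64, uses exact rationals for the "true" sum, and ignores exponent range; `round_to_bits` and the operand values are made up for the illustration.

```python
from fractions import Fraction


def round_to_bits(x, p):
    """Round a positive Fraction to p significant bits, ties to even.

    Exponent range is treated as unbounded (no overflow or subnormals).
    """
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                      # now 2**e <= x < 2**(e + 1)
    ulp = Fraction(2) ** (e - p + 1)
    return round(x / ulp) * ulp     # round() on a Fraction is half-to-even


a = 1.0 + 2.0 ** -24   # exactly representable in binary64
b = 2.0 ** -60

# Like FADD.sdd: the exact sum is rounded once, straight to 24 bits.
once = round_to_bits(Fraction(a) + Fraction(b), 24)

# Like FADD.ddd followed by FCVT.sd: round to 53 bits, then to 24 bits.
double_sum = a + b                  # hardware binary64 add
twice = round_to_bits(Fraction(double_sum), 24)

print(once == 1 + Fraction(1, 2 ** 23))  # prints: True
print(twice == 1)                        # prints: True
```

The exact sum sits just above the halfway point between two single-precision neighbors, so one rounding goes up. The double-precision add discards the 2**-60 term, leaving an exact tie that then rounds to even, which is down.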
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: 80x87 and IEEE 754

<e9fe29fb-5a74-42b7-bce6-1b5bdfc12a98n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32989&group=comp.arch#32989

 by: MitchAlsup - Fri, 30 Jun 2023 19:37 UTC

On Friday, June 30, 2023 at 2:20:38 PM UTC-5, EricP wrote:
> EricP wrote:
> >
> > It took a bit of rummaging about but the 8087 was designed by
> > John F. Palmer, Bruce W. Ravenel, Rafi Nave.
> >
> > Palmer wrote a 1980 article which describes some of its design rationale.
> >
> > The INTEL 8087 Numeric Data Processor, Palmer, 1980
> > https://search.iczhiku.com/paper/DyXtJLK9sG1DZUjG.pdf
> >
> > and co-authored a 1979 article with Kahan
> >
> > On a proposed floating-point standard, Kahan, Palmer, 1979
> > https://dl.acm.org/doi/pdf/10.1145/1057520.1057522
> >
> > This is the 8087 patent which might offer some insight into its design.
> >
> > Numeric data processor
> > https://patents.google.com/patent/USRE33629E/
> What might have kicked off the whole development is that according
> to the Wikipedia article on the 8087 Intel had previously
> manufactured a 32-bit FPU for the 8080 called the 8231/8232,
> which were licensed versions of the AMD 9511/9512 chips.
>
> AMD 9511 uses a 1-7-24 32-bit FP format (which they call "double precision")
<
Well, if you define a word as 16 bits, it is double the precision of what
fits in a word...
<
> and 16 and 32-bit integers, has log and trig functions, etc.
> It is a stack design with either 8 words of 16 bits or 4 words of 32 bits.
<
From a modern perspective, the thought "What were they thinking?" has to
run through one's head now and again.
>
> There are links to manuals for both AMD 9511 and Intel 8231 on the page.
> The AMD manual documents the log and trig calculations.
>
> https://en.wikipedia.org/wiki/Intel_8231/8232

Re: 80x87 and IEEE 754

<UUZnM.255$1ZN4.198@fx12.iad>


https://news.novabbs.org/devel/article-flat.php?id=32990&group=comp.arch#32990

 by: EricP - Sat, 1 Jul 2023 18:02 UTC

EricP wrote:
> Terje Mathisen wrote:
>> Anton Ertl wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>> The only problem with the precision control word setting single or
>>>> double precision is that the exponent range isn't modified: If you want
>>>> to do that as well, in order to force under or overflow, then you do
>>>> have to store to a memory format.
>>>
>>> That does not give the same results in all cases; it's a
>>> double-rounding problem: First round for (say) 53-bit mantissa with
>>> 16-bit exponent, then round for 53-bit mantissa with 11-bit exponent.
>>>
>>> My guess is that Intel did not add a proper binary64 and binary32
>>> mode, because the cases where it makes a difference are rare.
>>
>> I don't know why they did it, but you are absolutely correct: In the
>> case of doing a single FMUL/FADD/FSUB/FDIV/FSQRT operation in 64-bit
>> precision mode, where the result happens to be a subnormal, the
>> internal operation will instead deliver a normal result with a
>> correspondingly smaller exponent, and the rounding happens after the
>> 52 bits of the mantissa.
>>
>> When you next store this to memory, you will get a second rounding
>> operation corresponding to the denormalized mantissa, and it is in
>> this particular situation that you can get a different result.
>>
>> If you operate on float/single values, the double rounding is fine,
>> because you go from 64 via 52 to 23 mantissa bits, and that will
>> always deliver exactly the same final result.
>>
>> If Intel had had the foresight (and/or FPU resources) to implement a
>> proper fp128 format instead of the 80-bit, then all the double
>> rounding problems would have been moot.
>>
>> Terje
>>
>>
>
> It took a bit of rummaging about but the 8087 was designed by
> John F. Palmer, Bruce W. Ravenel, Rafi Nave.
>
> Palmer wrote a 1980 article which describes some of its design rationale.
>
> The INTEL 8087 Numeric Data Processor, Palmer, 1980
> https://search.iczhiku.com/paper/DyXtJLK9sG1DZUjG.pdf

After poking about some more, just guessing but it looks to me like the
reason that Double Rounding (DR) happened was: nobody thought of it.

From looking at the dates on papers it doesn't look like anyone
even noticed DR was happening until 1995. Then they started
backtracking the problem to its source.

Palmer introduces the FP80 format for the reasons he gives and
that unknowingly creates the potential for DR on FP80 to FP64.
The 8087 launches in 1980.
But it still takes 15 years for anyone to detect that DR is an
issue and only if you spill the FP80 to FP64 with a store.

When is Double Rounding Innocuous, Figueroa, 1995
https://dl.acm.org/doi/pdf/10.1145/221332.221334

If p is the number of fraction bits in the smaller format, then DR
differences only happen when the larger format's fraction has fewer than
2p, 2p+1 or 2p+2 bits, depending on the operation (+ - * / or sqrt).
This could not have happened on PDP11 or VAX FP formats.
It could not happen on 8087 for FP80 to FP32, just FP80 to FP64.
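
A small worked check of that rule for the x87 formats. Assumptions: the precisions count the leading significand bit (24, 53, 64), and since the per-operation thresholds are not spelled out above, the sketch only compares against the loosest (2p) and strictest (2p+2) bounds. The function name and return strings are made up.

```python
# Significand precision in bits, including the leading bit (assumed).
PRECISION = {"FP32": 24, "FP64": 53, "FP80": 64}


def double_rounding_risk(wide, narrow):
    """Classify a wide -> narrow spill against the 2p .. 2p+2 thresholds."""
    p = PRECISION[narrow]
    q = PRECISION[wide]
    if q >= 2 * p + 2:
        return "innocuous"           # wide enough for every listed operation
    if q < 2 * p:
        return "possible"            # too narrow for every listed operation
    return "depends on operation"    # between the two bounds


for wide, narrow in [("FP80", "FP32"), ("FP80", "FP64"), ("FP64", "FP32")]:
    p, q = PRECISION[narrow], PRECISION[wide]
    print(f"{wide}->{narrow}: q={q}, 2p+2={2 * p + 2}:",
          double_rounding_risk(wide, narrow))
```

For FP80 to FP32 the wide format has 64 bits against a bound of 50, so it is safe; for FP80 to FP64 the bound is 108 and 64 falls well short even of 2p = 106.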

IIRC the Microsoft 16-bit compilers for Win3.1 supported FP80 as a
native data type. However, the 32-bit MS compiler for WinNT from 1992 onwards
did not, just FP32 and FP64, for both results and spilled intermediates.
And 32-bit code in WinNT 3.1 and 3.5 kinda languishes until Win95 lands.

Then Win95 launches and the frequency of spilling FP80 to FP64 really
goes up, and that's when people notice it and go "oh... crap".

