Eliminating FDUP F*

<uk7eb0$rni9$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=25483&group=comp.lang.forth#25483
 by: Krishna Myneni - Wed, 29 Nov 2023 13:29 UTC

The sequence "FDUP F*" is ubiquitous in Forth scientific code for lack
of a common word that squares a floating-point number. It is not only
less readable but also conveys less meaning to anyone reading the code.

I've updated the FSL modules in kForth (32, Win32, and 64) to replace
all instances of "FDUP F*" with the (built-in) word FSQUARE. Some FSL
modules provided definitions of FSQR for the same function (by MHX), and
I replaced these instances with FSQUARE, which I find more readable and
less error-prone because FSQR is so close to FSQRT.
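
For systems that lack a built-in FSQUARE, a portable fallback might look like the sketch below. The [UNDEFINED] guard, the stack comments, and the example word DIST^2 are assumptions added for illustration; kForth's built-in word may well be implemented differently.

```forth
\ Portable fallback, compiled only if the system has no FSQUARE.
[UNDEFINED] FSQUARE [IF]
: FSQUARE ( F: r -- r*r )  FDUP F* ;
[THEN]

\ Hypothetical example: squared distance between (x1,y1) and (x2,y2).
: DIST^2 ( F: x1 y1 x2 y2 -- d2 )
  FROT F- FSQUARE        \ (y2-y1)^2, leaving x1 x2 beneath it
  FROT FROT F- FSQUARE   \ (x1-x2)^2
  F+ ;
```

Compared with two inline "FDUP F*" sequences, the definition of DIST^2 states its intent directly.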

--
Krishna Myneni

Re: Eliminating FDUP F*

<8380f3f785960444e8687dabbec4739e@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25485&group=comp.lang.forth#25485
 by: minforth - Wed, 29 Nov 2023 14:29 UTC

Thanks.

In my apps I added for convenience
FINV alias 1/F
F2* F2/
FHYPOT sqrt(a^2+b^2)
FMA horner a*b+c
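
A minimal sketch of how such convenience words could be written in standard Forth. The stack order chosen for FMA and the example polynomial are assumptions, not minforth's definitions, and this FMA is an unfused stand-in that rounds twice (a hardware fused multiply-add would round once).

```forth
[UNDEFINED] F2* [IF] : F2* ( F: r -- 2r )   FDUP F+ ; [THEN]
[UNDEFINED] F2/ [IF] : F2/ ( F: r -- r/2 )  0.5E F* ; [THEN]

\ Unfused stand-in: rounds after F* and again after F+.
: FMA ( F: a b c -- a*b+c )  FROT FROT F* F+ ;

\ Horner evaluation of 2x^2 + 3x + 4, one FMA per coefficient.
: POLY ( F: x -- p )
  2E FOVER 3E FMA     \ 2x+3
  FSWAP 4E FMA ;      \ (2x+3)x+4
```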

Re: Eliminating FDUP F*

<104d31d0da99003ed0dc323134d1243c@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25486&group=comp.lang.forth#25486
 by: mhx - Wed, 29 Nov 2023 18:05 UTC

minforth wrote:

> FHYPOT sqrt(a^2+b^2)

This is a nice one (that iForth does
not have) because FHYPOT is not only
more efficient but also documents a
tricky numerical problem.
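
One plausible reading of the "tricky numerical problem" (an assumption, since the post does not spell it out) is that the intermediate squares in sqrt(a^2+b^2) can overflow or underflow even when the result itself is representable. A common remedy is to scale by the larger magnitude, sketched here. This is not any particular system's FHYPOT, and it makes no attempt to handle infinities or NaNs.

```forth
\ Scaled hypotenuse: max * sqrt(1 + (min/max)^2).
: FHYPOT ( F: a b -- r )
  FABS FSWAP FABS               \ |b| |a|
  FOVER FOVER FMIN              \ |b| |a| min
  FROT FROT FMAX                \ min max
  FDUP F0= IF  FSWAP FDROP EXIT  THEN   \ both zero: return 0
  FSWAP FOVER F/                \ max min/max   (ratio <= 1)
  FDUP F* 1E F+ FSQRT F* ;      \ max * sqrt(1 + ratio^2)
```

With this form, inputs such as 1E200 never get squared directly, so the intermediate values stay in range.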

-marcel

Re: Eliminating FDUP F*

<5b1c9e0e7f744cb7cd2fc01e224f5e05@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25487&group=comp.lang.forth#25487
 by: minforth - Wed, 29 Nov 2023 20:45 UTC

mhx wrote:
>> FHYPOT sqrt(a^2+b^2)

> This is a nice one (that iForth does
> not have) because FHYPOT is not only
> more efficient but also documents a
> tricky numerical problem.

Perhaps as in here?
https://arxiv.org/pdf/1904.09481.pdf

Re: Eliminating FDUP F*

<uk8mvm$12676$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=25488&group=comp.lang.forth#25488
 by: Krishna Myneni - Thu, 30 Nov 2023 01:02 UTC

On 11/29/23 08:29, minforth wrote:
> Thanks.
>
> In my apps I added for convenience
> FINV alias 1/F
> F2* F2/
> FHYPOT    sqrt(a^2+b^2)
> FMA    horner a*b+c

FINV is also a commonly needed word, instead of writing

"1.0E0 FSWAP F/".

The other most useful word for vector/matrix code is F+!, which also
improves the efficiency, readability, and compactness of code. Use of
F+! can be found in the FSL modules.

F+! has common usage and is easily comprehensible so it may be time to
enter it formally into the Forth floating point lexicon.
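
For reference, a sketch of how FINV and F+! are commonly defined in high-level Forth. It assumes a separate floating-point stack; the stack comments and the demo words TOTAL and ADD-INVERSES are illustrative additions, not taken from the FSL.

```forth
[UNDEFINED] FINV [IF]
: FINV ( F: r -- 1/r )  1E FSWAP F/ ;
[THEN]

[UNDEFINED] F+! [IF]
: F+! ( f-addr -- ) ( F: r -- )  DUP F@ F+ F! ;   \ add r to the float at f-addr
[THEN]

FVARIABLE TOTAL   0E TOTAL F!

\ Hypothetical demo: TOTAL += 1/1 + 1/2 + ... + 1/n
: ADD-INVERSES ( n -- )
  1+ 1 ?DO  I S>D D>F FINV  TOTAL F+!  LOOP ;
```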

--
KM

Re: Eliminating FDUP F*

<66dc01276259490340293bbc19930769@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25489&group=comp.lang.forth#25489
 by: minforth - Thu, 30 Nov 2023 08:22 UTC

Krishna Myneni wrote:

> On 11/29/23 08:29, minforth wrote:
>> Thanks.
>>
>> In my apps I added for convenience
>> FINV alias 1/F
>> F2* F2/
>> FHYPOT    sqrt(a^2+b^2)
>> FMA    horner a*b+c

> FINV is also a commonly needed word, instead of writing

> "1.0E0 FSWAP F/".

> The other most useful word for vector/matrix code is F+!, which also
> improves the efficiency, readability, and compactness of code. Use of
> F+! can be found in the FSL modules.

> F+! has common usage and is easily comprehensible so it may be time to
> enter it formally into the Forth floating point lexicon.

May I add F*! for scalar operations on vector/matrix elements

Re: Eliminating FDUP F*

<nnd$5607b88b$0d914501@bdf695eed9921205>

https://news.novabbs.org/devel/article-flat.php?id=25490&group=comp.lang.forth#25490
 by: none - Thu, 30 Nov 2023 09:51 UTC

In article <104d31d0da99003ed0dc323134d1243c@news.novabbs.com>,
mhx <mhx@iae.nl> wrote:
>minforth wrote:
>
>> FHYPOT sqrt(a^2+b^2)
>
>This is a nice one (that iForth does
>not have) because FHYPOT is not only
>more efficient but also documents a
>tricky numerical problem.

The hyp calculation is stable as hell, I can't think
of any numerical problem.
It is also useful. I added a `` HYPOs '' as a separate
screen to my fixed point screen.
(Using DSQRT that is not particularly difficult to
implement.)

>-marcel
Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

Re: Eliminating FDUP F*

<ukauma$1h7a6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=25495&group=comp.lang.forth#25495
 by: Krishna Myneni - Thu, 30 Nov 2023 21:26 UTC

On 11/30/23 02:22, minforth wrote:
> Krishna Myneni wrote:
>
....
>> The other most useful word for vector/matrix code is F+!, which also
>> improves the efficiency, readability, and compactness of code. Use of
>> F+! can be found in the FSL modules.
>
>> F+! has common usage and is easily comprehensible so it may be time to
>> enter it formally into the Forth floating point lexicon.
>
> May I add F*! for scalar operations on vector/matrix elements

It should make the code for loops which scale arrays more compact, but
typically, it is more rare to loop over a sequence of scalars which
multiply a single array element (value at a fixed address) than it is to
loop over a sequence of scalars which accumulate into a single array
element e.g. matrix multiplication.
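
To make the scaling case concrete, here is a sketch of F*! and an in-place vector scaling loop built on it. A separate floating-point stack is assumed, and FVSCALE, VEC, and VEC! are hypothetical names used only for this illustration.

```forth
[UNDEFINED] F*! [IF]
: F*! ( f-addr -- ) ( F: r -- )  DUP F@ F* F! ;   \ multiply the float at f-addr by r
[THEN]

\ Scale u consecutive floats starting at f-addr by r, in place.
: FVSCALE ( f-addr u -- ) ( F: r -- )
  0 ?DO  FDUP DUP F*!  FLOAT+  LOOP
  DROP FDROP ;

\ Demo data: three floats at VEC.
FALIGN HERE 3 FLOATS ALLOT CONSTANT VEC
: VEC! ( F: a b c -- )  VEC 2 FLOATS + F!  VEC FLOAT+ F!  VEC F! ;

1E 2E 3E VEC!   VEC 3 2E FVSCALE   \ VEC now holds 2 4 6
```

Here each element is touched once with the same scalar, which is the array-scaling pattern, in contrast to the accumulate-into-one-element pattern that F+! serves.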

--
Krishna

Re: Eliminating FDUP F*

<f15ee6da7bfcfe0436e0a5c38d46bfaa@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25502&group=comp.lang.forth#25502
 by: minforth - Fri, 1 Dec 2023 12:32 UTC

Krishna Myneni wrote:
> It should make the code for loops which scale arrays more compact, but
> typically, it is more rare to loop over a sequence of scalars which
> multiply a single array element (value at a fixed address) than it is to
> loop over a sequence of scalars which accumulate into a single array
> element e.g. matrix multiplication.

Matrix multiplication (if not available as a primitive or from an external
library) is an example. In other numerical matrix algorithms, pivoting is
rather common, which involves scalar column or row multiplication.
Most occurrences in my code involve shifting and scaling of vectors.

Re: Eliminating FDUP F*

<2023Dec2.080651@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=25506&group=comp.lang.forth#25506
 by: Anton Ertl - Sat, 2 Dec 2023 07:06 UTC

minforth@gmx.net (minforth) writes:
>Krishna Myneni wrote:
>> It should make the code for loops which scale arrays more compact, but
>> typically, it is more rare to loop over a sequence of scalars which
>> multiply a single array element (value at a fixed address) than it is to
>> loop over a sequence of scalars which accumulate into a single array
>> element e.g. matrix multiplication.
>
>Matrix multiplication (if not available as a primitive or from an external
>library) is an example.

Not in my experience. Matrix multiplication always multiplies one
element of one matrix with one element of the other matrix. Since you
still need both matrices, you do not want to use F*! for that. Matrix
multiplication adds a number of the products of these multiplications;
e.g., for a 1000x1000 matrix multiply, it sums up 1000 products
resulting in one element of the target matrix. F+! can be used for
that.
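
The accumulation described above can be sketched in plain Forth as follows. This is the simple dot-product formulation, not the best loop nesting mentioned later in the post; the row-major layout and the names DIM, FMATRIX, MA, MB, MC, ELEM, ACCUM, and MATMUL are all assumptions made for illustration.

```forth
[UNDEFINED] F+! [IF]
: F+! ( f-addr -- ) ( F: r -- )  DUP F@ F+ F! ;
[THEN]

2 CONSTANT DIM                      \ illustrative size; matrices are DIM x DIM
: FMATRIX ( "name" -- )  FALIGN HERE  DIM DIM * FLOATS ALLOT  CONSTANT ;
FMATRIX MA   FMATRIX MB   FMATRIX MC

\ Address of element [row][col] in a row-major matrix.
: ELEM ( base row col -- f-addr )  SWAP DIM * + FLOATS + ;

: CLEAR-MC ( -- )  DIM DIM * 0 DO  0E MC I FLOATS + F!  LOOP ;

\ MC[i][j] += sum over k of MA[i][k] * MB[k][j]
: ACCUM ( i j -- )
  DIM 0 DO
    OVER MA SWAP I ELEM F@          \ MA[i][k]
    DUP  MB SWAP I SWAP ELEM F@     \ MB[k][j]
    F*
    2DUP MC ROT ROT ELEM F+!        \ add the product into MC[i][j]
  LOOP 2DROP ;

: MATMUL ( -- )  CLEAR-MC  DIM 0 DO  DIM 0 DO  J I ACCUM  LOOP LOOP ;
```

Both source matrices are only read, and each target element receives DIM products through F+!, which matches the point that F*! has no role here.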

But for these kinds of things, it's better to use specialized code,
such as OpenBLAS. E.g., if you look at slides 80 and 87 of
https://www.complang.tuwien.ac.at/anton/lvas/efficient.pdf, you see
that OpenBLAS is >13 times as fast for 1000x1000 matrix multiplication
(on a Tiger Lake CPU) than a straightforward scalar implementation of
matrix multiplication that uses the best loop nesting. Compared to
the naive variant that uses a dot product, the speedup exceeds a
factor of 25 (slide 78). Even when the auto-vectorization of gcc
kicks in (with -O3), the result is still >5 times slower than
OpenBLAS.

"THP" on these slides means that transparent huge pages are enabled
and kick in (there is no guarantee that they kick in if they are
enabled).

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Re: Eliminating FDUP F*

<f12a6b634a1f915176394dc87cc727b1@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25507&group=comp.lang.forth#25507
 by: minforth - Sat, 2 Dec 2023 09:01 UTC

Anton Ertl wrote:
>>Matrix multiplication (if not available as a primitive or from an external
>>library) is an example.

> Not in my experience. Matrix multiplication always multiplies one
> element of one matrix with one element of the other matrix. Since you
> still need both matrices, you do not want to use F*! for that. Matrix
> multiplication adds a number of the products of these multiplications;
> e.g., for a 1000x1000 matrix multiply, it sums up 1000 products
> resulting in one element of the target matrix. F+! can be used for
> that.

> But for these kinds of things, it's better to use specialized code,
> such as OpenBLAS. E.g., if you look at slides 80 and 87 of
> https://www.complang.tuwien.ac.at/anton/lvas/efficient.pdf, you see
> that OpenBLAS is >13 times as fast for 1000x1000 matrix multiplication
> (on a Tiger Lake CPU) than a straightforward scalar implementation of
> matrix multiplication that uses the best loop nesting. Compared to
> the naive variant that uses a dot product, the speedup exceeds a
> factor of 25 (slide 78). Even when the auto-vectorization of gcc
> kicks in (with -O3), the result is still >5 times slower than
> OpenBLAS.

> "THP" on these slides means that transparent huge pages are enabled
> and kick in (there is no guarantee that they kick in if they are
> enabled).

Yes. On desktop systems, it makes little sense not to use numerical maths
libraries for such problems. Large matrices are usually decomposed into
blocks, and sparse matrices require special techniques. It would be quite
tedious to reinvent all the wheels and program them by hand in Forth code,
let alone debug and optimise your creation.

Things are different, however, if you don't have the space to hold fat
library files. In resource-constrained systems, you'll prefer in-place
algorithms wherever possible. If you can do the calculations in background
tasks, speed is not important. And LU decomposition helps a lot, but that
is no surprise.

Re: Eliminating FDUP F*

<nnd$1a101463$1a7abd0d@ce9396c29ac6b3bd>

https://news.novabbs.org/devel/article-flat.php?id=25508&group=comp.lang.forth#25508
 by: none - Sat, 2 Dec 2023 13:56 UTC

In article <2023Dec2.080651@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>minforth@gmx.net (minforth) writes:
>>Krishna Myneni wrote:
>>> It should make the code for loops which scale arrays more compact, but
>>> typically, it is more rare to loop over a sequence of scalars which
>>> multiply a single array element (value at a fixed address) than it is to
>>> loop over a sequence of scalars which accumulate into a single array
>>> element e.g. matrix multiplication.
>>
>>Matrix multiplication (if not available as a primitive or from an external
>>library) is an example.
>
>Not in my experience. Matrix multiplication always multiplies one
>element of one matrix with one element of the other matrix. Since you
>still need both matrices, you do not want to use F*! for that. Matrix
>multiplication adds a number of the products of these multiplications;
>e.g., for a 1000x1000 matrix multiply, it sums up 1000 products
>resulting in one element of the target matrix. F+! can be used for
>that.
>
>But for these kinds of things, it's better to use specialized code,
>such as OpenBLAS. E.g., if you look at slides 80 and 87 of
>https://www.complang.tuwien.ac.at/anton/lvas/efficient.pdf, you see
>that OpenBLAS is >13 times as fast for 1000x1000 matrix multiplication
>(on a Tiger Lake CPU) than a straightforward scalar implementation of
>matrix multiplication that uses the best loop nesting. Compared to
>the naive variant that uses a dot product, the speedup exceeds a
>factor of 25 (slide 78). Even when the auto-vectorization of gcc
>kicks in (with -O3), the result is still >5 times slower than
>OpenBLAS.
>
>"THP" on these slides means that transparent huge pages are enabled
>and kick in (there is no guarantee that they kick in if they are
>enabled).

This is an excellent opportunity to introduce a single assembler
routine that does a huge speed up.
Approximately a vector times vector multiplication with
specified start addresses, specified strides, and a length.

>
>- anton
>--

Groetjes Albert

Re: Eliminating FDUP F*

<2023Dec2.174433@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=25510&group=comp.lang.forth#25510
 by: Anton Ertl - Sat, 2 Dec 2023 16:44 UTC

albert@cherry.(none) (albert) writes:
>In article <2023Dec2.080651@mips.complang.tuwien.ac.at>,
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>But for these kinds of things, it's better to use specialized code,
>>such as OpenBLAS. E.g., if you look at slides 80 and 87 of
>>https://www.complang.tuwien.ac.at/anton/lvas/efficient.pdf, you see
>>that OpenBLAS is >13 times as fast for 1000x1000 matrix multiplication
>>(on a Tiger Lake CPU) than a straightforward scalar implementation of
>>matrix multiplication that uses the best loop nesting. Compared to
>>the naive variant that uses a dot product, the speedup exceeds a
>>factor of 25 (slide 78). Even when the auto-vectorization of gcc
>>kicks in (with -O3), the result is still >5 times slower than
>>OpenBLAS.
>>
>>"THP" on these slides means that transparent huge pages are enabled
>>and kick in (there is no guarantee that they kick in if they are
>>enabled).
>
>This is an excellent opportunity to introduce a single assembler
>routine that does a huge speed up.
>Approximately a vector times vector multiplication with
>specified start addresses, specified strides, and a length.

You mean something like:

'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
ucount elements.
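
For readers without Gforth at hand, the stack effect quoted above can be modelled in high-level Forth roughly as follows. FDOT is a hypothetical name, and this is only a reference sketch of the documented behaviour with strides given in address units; it is not Gforth's implementation of v*.

```forth
\ Strided dot product: r = sum of v1[i]*v2[i] for i = 0 .. ucount-1.
: FDOT ( f-addr1 nstride1 f-addr2 nstride2 ucount -- ) ( F: -- r )
  0E                                  \ running sum
  0 ?DO                               ( a1 s1 a2 s2 )
    2OVER DROP F@  OVER F@  F* F+     \ sum += v1[i] * v2[i]
    DUP >R + R>                       \ a2 += s2
    2SWAP DUP >R + R> 2SWAP           \ a1 += s1
  LOOP
  2DROP 2DROP ;
```

For two contiguous vectors, both strides would be 1 FLOATS; a matrix column in row-major storage would use a stride of one row length.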

However, note that the dot-product variant is slower than OpenBLAS by
a factor of 25. The best scalar implementation from slide 80 is quite
a bit faster (Factor 13 slower than OpenBLAS) and can be implemented
with

'faxpy' ( ra f-x nstridex f-y nstridey ucount -- ) gforth-0.5 "faxpy"
vy=ra*vx+vy

FAXPY can be implemented in a way that selects a vectorized
implementation if nstridex=nstridey=1 FLOATS. The result would be
slower than OpenBLAS by a factor of 5 (all numbers for 1000x1000
matrix multiplication).
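
A scalar reference model of the faxpy stack effect, written as a sketch under the same assumptions (strides in address units, separate floating-point stack). FAXPY-SKETCH is a hypothetical name; the code does not attempt the vectorized path mentioned above and is not Gforth's implementation.

```forth
\ vy = ra*vx + vy, element by element.
: FAXPY-SKETCH ( f-x nstridex f-y nstridey ucount -- ) ( F: ra -- )
  0 ?DO                               ( x sx y sy ) ( F: ra )
    FDUP 2OVER DROP F@ F*             \ ra * x[i]
    OVER DUP F@ F+ F!                 \ y[i] = ra*x[i] + y[i]
    DUP >R + R>                       \ y += sy
    2SWAP DUP >R + R> 2SWAP           \ x += sx
  LOOP
  2DROP 2DROP FDROP ;
```

In a matrix multiply with the loop order i, k, j, the innermost loop over j has exactly this shape: one row of the target is updated with a scalar multiple of one row of the second matrix.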

- anton

Re: Eliminating FDUP F*

<9198892c8c20bea6f1d90ee7df9c8f76@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=25511&group=comp.lang.forth#25511
 by: minforth - Sun, 3 Dec 2023 08:21 UTC

Anton Ertl wrote:
> 'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
> dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
> next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
> ucount elements.

> However, note that the dot-product variant is slower than OpenBLAS by
> a factor of 25. The best scalar implementation from slide 80 is quite
> a bit faster (Factor 13 slower than OpenBLAS) and can be implemented
> with

It is not only about speed, but also about minimising calculation errors.

For example, naive dot product summation in a single loop, which is
unfortunately what gforth does, is prone to accumulating rounding errors.

Nothing to blame here, but library functions are often "very smart".

Re: Eliminating FDUP F*

<nnd$1beec904$48a6f5e7@6254ff88a0e86fe0>

From: albert@cherry (none)
Originator: albert@cherry.(none) (albert)
Date: Sun, 03 Dec 2023 12:57:30 +0100
Organization: KPN B.V.

In article <2023Dec2.174433@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>albert@cherry.(none) (albert) writes:
>>This is an excellent opportunity to introduce a single assembler
>>routine that does a huge speed up.
>>Approximately a vector times vector multiplication with
>>specified start addresses, specified strides, and a length.
>
>You mean something like:
>
>'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
> dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
>next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
>ucount elements.

>
>However, note that the dot-product variant is slower than OpenBLAS by
>a factor of 25. The best scalar implementation from slide 80 is quite

Losing that much is astonishing if V* really is implemented in assembler,
assuming it uses all 8 registers of the 8087 stack.

If you do a more sophisticated version with at least 8 fp registers
available, you can easily prefetch 2 fp numbers in advance for
each stride.

>
>- anton

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

Re: Eliminating FDUP F*

<80161f628ef485921bef6dbecf033a94@news.novabbs.com>

From: mhx@iae.nl (mhx)
Date: Sun, 3 Dec 2023 13:26:56 +0000
Organization: novaBBS

none wrote:

> In article <2023Dec2.174433@mips.complang.tuwien.ac.at>,
> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>albert@cherry.(none) (albert) writes:
[..]
> Loosing that much imagining using all 8 registers of the 8087 stack
> is astonishing, if V* really is implemented in assembler.

That is because OpenBLAS uses AVX2 with all cores working
in parallel. Memory access patterns are accounted for, as is
every cycle possibly lost at the start and end of a loop.
It is of course possible to beat it with application-specific
tricks (the most obvious and effective is exploiting sparseness).

iForth's DAXPY is SSE2-based but uses only 1 core.
I have a lot to learn.

CLK 4192 MHz ( 8 core machine )
60x60 mm - normal algorithm                           2.03 GFlops, 2.05 ticks/flop,  0.211 ms
60x60 mm - blocking, factor of 20                     1.02 GFlops, 4.09 ticks/flop,  0.422 ms
60x60 mm - transposed B matrix                        8.58 GFlops, 0.48 ticks/flop, 50.000 us
60x60 mm - transposed B matrix #2                     8.43 GFlops, 0.49 ticks/flop, 51.000 us
60x60 mm - Robert's algorithm                         9.36 GFlops, 0.44 ticks/flop, 46.000 us
60x60 mm - T. Maeno's algorithm, subarray 20x20       1.06 GFlops, 3.91 ticks/flop,  0.403 ms
60x60 mm - D. Warner's algorithm, subarray 20x20      1.02 GFlops, 4.07 ticks/flop,  0.419 ms
60x60 mm - generic mat*                              30.27 GFlops, 0.13 ticks/flop, 14.000 us
60x60 mm - iForth DGEMM1                             54.61 GFlops, 0.07 ticks/flop,  7.000 us
60x60 mm - iForth SMMD*                              54.89 GFlops, 0.07 ticks/flop,  7.000 us
60x60 mm - iForth DAXPY based                         7.76 GFlops, 0.53 ticks/flop, 55.000 us

120x120 mm - normal algorithm                         3.36 GFlops, 1.24 ticks/flop,  1.027 ms
120x120 mm - blocking, factor of 20                   0.99 GFlops, 4.19 ticks/flop,  3.461 ms
120x120 mm - transposed B matrix                     12.07 GFlops, 0.34 ticks/flop,  0.286 ms
120x120 mm - transposed B matrix #2                  11.97 GFlops, 0.35 ticks/flop,  0.288 ms
120x120 mm - Robert's algorithm                      13.01 GFlops, 0.32 ticks/flop,  0.265 ms
120x120 mm - T. Maeno's algorithm, subarray 20x20     1.07 GFlops, 3.89 ticks/flop,  3.210 ms
120x120 mm - D. Warner's algorithm, subarray 20x20    1.03 GFlops, 4.04 ticks/flop,  3.335 ms
120x120 mm - generic mat*                           111.25 GFlops, 0.03 ticks/flop, 31.000 us
120x120 mm - iForth DGEMM1                          120.47 GFlops, 0.03 ticks/flop, 28.000 us
120x120 mm - iForth SMMD*                           119.94 GFlops, 0.03 ticks/flop, 28.000 us
120x120 mm - iForth DAXPY based                      13.22 GFlops, 0.31 ticks/flop,  0.261 ms

500x500 mm - normal algorithm                         4.00 GFlops, 1.04 ticks/flop, 62.407 ms
500x500 mm - blocking, factor of 20                   1.04 GFlops, 4.02 ticks/flop,  0.240 s
500x500 mm - transposed B matrix                     16.75 GFlops, 0.25 ticks/flop, 14.919 ms
500x500 mm - transposed B matrix #2                  16.55 GFlops, 0.25 ticks/flop, 15.099 ms
500x500 mm - Robert's algorithm                      17.26 GFlops, 0.24 ticks/flop, 14.482 ms
500x500 mm - T. Maeno's algorithm, subarray 20x20     1.08 GFlops, 3.87 ticks/flop,  0.231 s
500x500 mm - D. Warner's algorithm, subarray 20x20    1.04 GFlops, 4.02 ticks/flop,  0.240 s
500x500 mm - generic mat*                            14.35 GFlops, 0.29 ticks/flop, 17.410 ms
500x500 mm - iForth DGEMM1                           67.18 GFlops, 0.06 ticks/flop,  3.721 ms
500x500 mm - iForth SMMD*                            67.45 GFlops, 0.06 ticks/flop,  3.706 ms
500x500 mm - iForth DAXPY based                      13.07 GFlops, 0.32 ticks/flop, 19.125 ms

-marcel

Re: Eliminating FDUP F*

<2023Dec3.145403@mips.complang.tuwien.ac.at>

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: Sun, 03 Dec 2023 13:54:03 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien

albert@cherry.(none) (albert) writes:
>In article <2023Dec2.174433@mips.complang.tuwien.ac.at>,
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>albert@cherry.(none) (albert) writes:
>>>This is an excellent opportunity to introduce a single assembler
>>>routine that does a huge speed up.
>>>Approximately a vector times vector multiplication with
>>>specified start addresses, specified strides, and a length.
>>
>>You mean something like:
>>
>>'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
>> dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
>>next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
>>ucount elements.
>
>>
>>However, note that the dot-product variant is slower than OpenBLAS by
>>a factor of 25. The best scalar implementation from slide 80 is quite
>
>Loosing that much imagining using all 8 registers of the 8087 stack
>is astonishing, if V* really is implemented in assembler.

It does not use the 8087 stack at all.

>If you do a more sophisticated version with at least 8 fp registers
>available, you can prefetch easily 2 fp numbers in advance for
>each stride.

That is irrelevant for the reasons given below, but it boils down to:
The Tiger Lake on which I measured these speedups is a CPU with
out-of-order execution (with 26 years of ancestry).

The code in question is:

0x000055ba08700990 <v_star+0>: pxor %xmm1,%xmm1
0x000055ba08700994 <v_star+4>: test %r8,%r8
0x000055ba08700997 <v_star+7>: je 0x55ba087009b8 <v_star+40>
0x000055ba08700999 <v_star+9>: nopl 0x0(%rax)
0x000055ba087009a0 <v_star+16>: movsd (%rdi),%xmm0
0x000055ba087009a4 <v_star+20>: mulsd (%rdx),%xmm0
0x000055ba087009a8 <v_star+24>: add %rsi,%rdi
0x000055ba087009ab <v_star+27>: add %rcx,%rdx
0x000055ba087009ae <v_star+30>: addsd %xmm0,%xmm1
0x000055ba087009b2 <v_star+34>: sub $0x1,%r8
0x000055ba087009b6 <v_star+38>: jne 0x55ba087009a0 <v_star+16>
0x000055ba087009b8 <v_star+40>: movapd %xmm1,%xmm0
0x000055ba087009bc <v_star+44>: ret

with the inner loop from 0x55ba087009a0 <v_star+16> to
0x000055ba087009b6 <v_star+38> (inclusive).

The performance is determined by the dependence of the FP addition
addsd on the result from the previous iteration. The latency of this
FP addition is 4 cycles, and the whole matrix multiplication benchmark
runs at 4.1 cycles per iteration of the inner loop (and the cost of
the rest of the benchmark is spread over these cycles; that's the 0.1
cycle).

So what happens in the steady state is that all the other instructions
are executed early (at around the same time as the addsd from 50
iterations earlier; the Tiger Lake has a reorder buffer of 352
instructions), so fetching two values into registers one iteration
earlier makes hardly any difference. Plus, the Tiger Lake contains
hardware prefetchers that are very good at prefetching with constant
stride, as in V*.

What could be done to make this faster is to add up, say 4
intermediate sums in parallel, and finally compute the sum of these 4
intermediate sums.
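
The restructuring suggested in the last paragraph can be sketched as
follows. This is only an illustration in Python, which shows the shape
of the loop but not the speedup: the benefit comes from a compiled loop
having four independent addition chains instead of one. The name
dot_4acc is hypothetical and unit stride is assumed for brevity. Note
that changing the order of the additions can also change the rounding of
the result slightly.

```python
def dot_4acc(v1, v2):
    """Dot product with four independent partial sums.

    Each partial sum depends only on its own previous value, so no
    single chain of additions runs through every iteration.
    """
    n = len(v1)
    s0 = s1 = s2 = s3 = 0.0
    i = 0
    while i + 4 <= n:
        s0 += v1[i] * v2[i]
        s1 += v1[i + 1] * v2[i + 1]
        s2 += v1[i + 2] * v2[i + 2]
        s3 += v1[i + 3] * v2[i + 3]
        i += 4
    # Leftover elements when n is not a multiple of 4.
    while i < n:
        s0 += v1[i] * v2[i]
        i += 1
    # Final reduction of the four intermediate sums.
    return (s0 + s1) + (s2 + s3)
```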

- anton

Re: Eliminating FDUP F*

<2023Dec3.151859@mips.complang.tuwien.ac.at>

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: Sun, 03 Dec 2023 14:18:59 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien

minforth@gmx.net (minforth) writes:
>Anton Ertl wrote:
>> 'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
>> dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
>> next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
>> ucount elements.
>
>> However, note that the dot-product variant is slower than OpenBLAS by
>> a factor of 25. The best scalar implementation from slide 80 is quite
>> a bit faster (Factor 13 slower than OpenBLAS) and can be implemented
>> with
>
>It is not only about speed, but also about minimising calculation errors.
>
>For example, naive dot product summation in a single loop, which is
>unfortunately what gforth does, is prone to accumulating rounding errors.
>
>Nothing to blame here, but library functions are often "very smart".

The BLAS implementations seem to be only about speed. None that I am
aware of uses, e.g., Kahan summation to reduce rounding errors.

There are other libraries that are about accuracy, but not BLAS.
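
For readers who have not met it, Kahan summation applied to a dot
product looks roughly like the sketch below. It is a generic
illustration in Python, not code from any BLAS or from Gforth, and the
names kahan_dot and naive_dot are made up here. The test data is chosen
so that the plain loop loses every small term.

```python
def kahan_dot(v1, v2):
    """Dot product using Kahan (compensated) summation of the products."""
    total = 0.0
    comp = 0.0  # rounding error carried over from the previous addition
    for a, b in zip(v1, v2):
        y = a * b - comp
        t = total + y
        # (t - total) is the part of y that made it into t; subtracting
        # y leaves the rounding error of this addition.
        comp = (t - total) - y
        total = t
    return total


def naive_dot(v1, v2):
    """Plain single-accumulator dot product, for comparison."""
    total = 0.0
    for a, b in zip(v1, v2):
        total += a * b
    return total


# One large term followed by many terms that are each too small to
# change the running sum on their own.
xs = [1.0] + [1e-16] * 10000
ones = [1.0] * len(xs)
print(naive_dot(xs, ones))  # the small terms are lost entirely
print(kahan_dot(xs, ones))  # close to 1 + 1e-12
```

The price is several extra dependent floating-point operations per
element, all on the critical path of the loop.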

- anton

Re: Eliminating FDUP F*

<e93ff88202425b32916bae8123adf0b2@news.novabbs.com>

From: minforth@gmx.net (minforth)
Date: Sun, 3 Dec 2023 14:58:56 +0000
Organization: novaBBS

Anton Ertl wrote:
> The BLAS implementations seem to be only about speed. None that I am
> aware of uses, e.g., Kahan summation to reduce rounding errors.

Kahan summation gives good results but can be very slow. As a good
compromise, I prefer recursive summation of vector halves for dot products,
until their size is small enough to fit into vector chunks ready for
CPU-supported vector operations or intrinsics.

Wikipedia has a small article on this called Pairwise Summation.
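
A minimal sketch of that idea, in Python for illustration only: the
vector is split in halves recursively, and once a piece is at most
`block` elements long it is summed with a plain loop (the place where a
vectorized kernel would go). The name pairwise_dot and the default block
size of 8 are assumptions for this example, not taken from any library.

```python
def pairwise_dot(v1, v2, lo=0, hi=None, block=8):
    """Dot product of v1[lo:hi] and v2[lo:hi] by recursive halving."""
    if hi is None:
        hi = len(v1)
    if hi - lo <= block:
        # Small chunk: plain loop, stand-in for a SIMD kernel.
        total = 0.0
        for i in range(lo, hi):
            total += v1[i] * v2[i]
        return total
    mid = lo + (hi - lo) // 2
    # Sum the two halves separately, then combine them.
    return (pairwise_dot(v1, v2, lo, mid, block)
            + pairwise_dot(v1, v2, mid, hi, block))
```

The rounding error then grows roughly with the depth of the recursion
rather than with the length of the vector, at the cost of some call
overhead that a larger block size reduces.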

Re: Eliminating FDUP F*

<nnd$74756098$55743a7b@a5a58826f7151fa5>

From: albert@cherry (none)
Originator: albert@cherry.(none) (albert)
Date: Sun, 03 Dec 2023 16:31:13 +0100
Organization: KPN B.V.

In article <e93ff88202425b32916bae8123adf0b2@news.novabbs.com>,
minforth <minforth@gmx.net> wrote:
>Anton Ertl wrote:
>> The BLAS implementations seem to be only about speed. None that I am
>> aware of uses, e.g., Kahan summation to reduce rounding errors.
>
>Kahan summation gives good results but can be very slow. As a good
>compromise, I prefer recursive summation of vector halves for dot products,
>until their size is small enough to fit into vector chunks ready for
>CPU-supported vector operations or intrinsics.
>
>Wikipedia has a small article on this called Pairwise Summation.

Summing numbers that mean something results in a sum whose error
is dominated by the maximum error of the summands.
Imagine a fly landing on the top of a church and a flea on top of
that. If you measure the height of the church precise to one mm,
the total height cannot be made more precise by reordering the
summands.
So I think it is mostly academic. The most precise calculation
I've done is 1/256 of an infrared wavelength over 60 m.
(That really required double precision floats. ESO telescopes in Chile.)
A more practical example is the thickness of steel pipelines on the
Brent oil rigs. You have to be content with 3 significant digits at
the very most.

Groetjes Albert

Re: Eliminating FDUP F*

<620873558762f94e38a196516f45897f@news.novabbs.com>

From: minforth@gmx.net (minforth)
Date: Sun, 3 Dec 2023 16:35:46 +0000
Organization: novaBBS

none wrote:

> In article <e93ff88202425b32916bae8123adf0b2@news.novabbs.com>,
> minforth <minforth@gmx.net> wrote:
>>Anton Ertl wrote:
>>> The BLAS implementations seem to be only about speed. None that I am
>>> aware of uses, e.g., Kahan summation to reduce rounding errors.
>>
>>Kahan summation gives good results but can be very slow. As a good
>>compromise, I prefer recursive summation of vector halves for dot products,
>>until their size is small enough to fit into vector chunks ready for
>>CPU-supported vector operations or intrinsics.
>>
>>Wikipedia has a small article on this called Pairwise Summation.

> Summing numbers that mean something, result in a sum whose error
> is dominated with the maximum error of the summands.
> Imagine a fly landing on the top of a church and a flee on top of
> that. If you measure the height of the church precise to one mm,
> the total height cannot be made more precise on reordering the
> summands.
> So I think it is mostly academic.

Well, we are not in the business of measuring academic belfry bugs ;-),
but signal vectors on the order of up to tens of thousands of samples.
There it is good engineering practice to keep an eye on error propagation.

But you're right, under normal circumstances it doesn't matter. But when
you least expect it, it can ruin your day(s). Better be careful.

Re: Eliminating FDUP F*

<2023Dec3.185807@mips.complang.tuwien.ac.at>

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: Sun, 03 Dec 2023 17:58:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien

mhx@iae.nl (mhx) writes:
>That is because OpenBLAS uses AVX2 with all cores working
>in parallel.

I expect that it uses AVX-512 on the Tiger Lake which I measured. My
measurements used only one core. Using more cores increases the CPU
cycles needed (due to parallelization overhead), although it reduces
the elapsed time.

- anton

Re: Eliminating FDUP F*

<2023Dec3.190208@mips.complang.tuwien.ac.at>

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: Sun, 03 Dec 2023 18:02:08 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien

minforth@gmx.net (minforth) writes:
>Anton Ertl wrote:
>> The BLAS implementations seem to be only about speed. None that I am
>> aware of uses, e.g., Kahan summation to reduce rounding errors.
>
>Kahan summation gives good results but can be very slow. As a good
>compromise, I prefer recursive summation of vector halves for dot products,
>until their size is small enough to fit into vector chunks ready for
>CPU-supported vector operations or intrinsics.

For multiplying big matrices (and why would you care in case of small
matrices?), the question is how to combine that with the memory access
patterns that you want for efficiently using the memory subsystem for
matrix multiplication, if it is possible at all. OpenBLAS certainly
does not do that. The divide-and-conquer approach
<https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm#Divide-and-conquer_algorithm>
deals well with the memory subsystem, and may exhibit some of the
properties you want, but at least in the implementation I did, I did
not form intermediate matrices, but added the intermediate results to
the appropriate elements in the target matrix C, so it does not have
significantly better accuracy than the straightforward algorithm. If
one stored intermediate results elsewhere for adding them pairwise,
that would cost extra overhead. Maybe worth it, maybe not.
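
The variant described above, which adds intermediate results straight
into C, can be sketched like this. It is a generic Python illustration
of the divide-and-conquer scheme from the linked article, not the
implementation that was measured; the names matmul_rec and matmul and
the leaf size are assumptions. For any one element of C the products
arrive in the same order as in the straightforward loop, which is why
the accuracy does not improve.

```python
def matmul_rec(a, b, c, i0, j0, k0, m, p, q, leaf=4):
    """Add A[i0:i0+m, k0:k0+q] * B[k0:k0+q, j0:j0+p] into C[i0:i0+m, j0:j0+p]."""
    if max(m, p, q) <= leaf:
        # Small block: plain triple loop, accumulating directly into C.
        for i in range(i0, i0 + m):
            for k in range(k0, k0 + q):
                aik = a[i][k]
                for j in range(j0, j0 + p):
                    c[i][j] += aik * b[k][j]
    elif m >= p and m >= q:
        h = m // 2  # split the rows of A and C
        matmul_rec(a, b, c, i0, j0, k0, h, p, q, leaf)
        matmul_rec(a, b, c, i0 + h, j0, k0, m - h, p, q, leaf)
    elif p >= q:
        h = p // 2  # split the columns of B and C
        matmul_rec(a, b, c, i0, j0, k0, m, h, q, leaf)
        matmul_rec(a, b, c, i0, j0 + h, k0, m, p - h, q, leaf)
    else:
        h = q // 2  # split the shared dimension; both halves add into C
        matmul_rec(a, b, c, i0, j0, k0, m, p, h, leaf)
        matmul_rec(a, b, c, i0, j0, k0 + h, m, p, q - h, leaf)


def matmul(a, b, leaf=4):
    """Return A*B for matrices given as lists of rows."""
    m, q, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    matmul_rec(a, b, c, 0, 0, 0, m, p, q, leaf)
    return c
```

Getting pairwise accumulation instead would mean returning the two
products of the last branch as separate matrices and adding them
afterwards, which is the extra storage and overhead mentioned above.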

- anton

Re: Eliminating FDUP F*

<2023Dec3.192306@mips.complang.tuwien.ac.at>

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: Sun, 03 Dec 2023 18:23:06 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien

albert@cherry.(none) (albert) writes:
>Summing numbers that mean something, result in a sum whose error
>is dominated with the maximum error of the summands.

1e30 1e f+ -1e30 f+ 1e 0e f~ .

produces 0 (false), even though with exact summation it would produce
true (-1). Of course, you may say that these numbers mean nothing to
you, but you are not the only one in the world.
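
The same effect can be reproduced outside Forth. The short Python
sketch below (an illustration added for comparison, using the standard
library's math.fsum as the stand-in for exact summation) evaluates the
three terms once with ordinary left-to-right addition and once with a
correctly rounded sum.

```python
import math

terms = [1e30, 1.0, -1e30]

# Left-to-right floating-point addition, as in the Forth line above:
# 1e30 + 1.0 rounds back to 1e30, so the 1.0 is lost.
naive = (terms[0] + terms[1]) + terms[2]

# math.fsum keeps track of the lost low-order parts and returns the
# correctly rounded value of the exact sum.
exact = math.fsum(terms)

print(naive == 1.0)  # False
print(exact == 1.0)  # True
```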

- anton

Re: Eliminating FDUP F*

<fc20dfd66efb80f7a6f1c76db2883704@news.novabbs.com>

From: mhx@iae.nl (mhx)
Date: Sun, 3 Dec 2023 21:04:59 +0000
Organization: novaBBS

Anton Ertl wrote:

> 1e30 1e f+ -1e30 f+ 1e 0e f~ .

> produces 0 (false), even though with exact summation it would produce
> true (-1). Of course, you may say that these numbers mean nothing to
> you, but you are not the only one in the world.

Take the number of years since the big bang happened (14.5 billion years ago),
square it and multiply by the height of Church St. Spirit in meters for
good measure. A photon will travel 1e30 meters in that amount of years.
Now add 1 meter ...

-marcel
