
Floating point implementations on AMD64

<2024Apr13.195518@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=26663&group=comp.lang.forth#26663
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Floating point implementations on AMD64
Date: Sat, 13 Apr 2024 17:55:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 83
Message-ID: <2024Apr13.195518@mips.complang.tuwien.ac.at>
Injection-Date: Sat, 13 Apr 2024 20:03:59 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="03bcbe3a9305225e2cc0585bad4c57f2";
logging-data="3290999"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18d6dWZSxr3LNLxjPQHmb62"
Cancel-Lock: sha1:AltRlr8lii83z8cXkSgo66eVP3I=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sat, 13 Apr 2024 17:55 UTC

I just looked at the floating-point implementations of recent
SwiftForth and VFX (finally present in the system from the start), and
on iForth-5.1-mini (for comparison):

1 FLOATS .

reports:

16 iforth
10 sf64
10 vfx64

For

: foo f+ f* ;

the resulting code is:

SwiftForth x64-Linux 4.0.0-RC87 24-Mar-2024
: foo f+ f* ; ok
see foo
44E8B9 ST(0) ST(1) FADDP DEC1
44E8BB ST(0) ST(1) FMULP DEC9
44E8BD RET C3 ok

VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
© MicroProcessor Engineering Ltd, 1998-2023

: foo f+ f* ; ok
see foo
FOO
( 0050A250 DEC1 ) FADDP ST(1), ST
( 0050A252 DEC9 ) FMULP ST(1), ST
( 0050A254 C3 ) RET/NEXT
( 5 bytes, 3 instructions )

iForth:
$10226000  : foo                        488BC04883ED088F4500      H.@H.m..E.
$1022600A  fld           [r13 0 +] tbyte41DB6D00                  A[m.
$1022600E  fld           [r13 #16 +] tbyte
                                        41DB6D10                  A[m.
$10226012  fxch          ST(2)          D9CA                      YJ
$10226014  lea           r13, [r13 #32 +] qword
                                        4D8D6D20                  M.m
$10226018  faddp         ST(1), ST      DEC1                      ^A
$1022601A  fxch          ST(1)          D9C9                      YI
$1022601C  fpopswap,                    41DB6D00D9CA4D8D6D10      A[m.YJM.m.
$10226026  fmulp         ST(1), ST      DEC9                      ^I
$10226028  fpush,                       4D8D6DF0D9C941DB7D00      M.mpYIA[}.
$10226032  ;                            488B45004883C508FFE0      H.E.H.E..` ok

So apparently the 8 hardware FP stack items are enough for SwiftForth
and VFX, while iForth prefers to use an FP stack in memory to allow
for a deeper FP stack.
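
The difference can also be probed from the Forth side by pushing more
than eight items and checking that they survive. The following is only
a minimal sketch, not taken from any of the systems above: push-floats
and sum-floats are made-up names, and only standard words (S>D, D>F,
F+, FDEPTH, F.) are assumed.

```forth
\ Hypothetical probe: is the FP stack deeper than the 8 x87 registers?
decimal
: push-floats ( n -- ) ( F: -- r1 .. rn )     \ push 1 2 .. n as floats
  1+ 1 ?do  i s>d d>f  loop ;
: sum-floats  ( n -- ) ( F: r1 .. rn -- sum ) \ add up the top n floats, n >= 1
  1- 0 ?do  f+  loop ;

10 push-floats  fdepth .   \ a memory-backed FP stack should report 10
10 sum-floats   f.         \ ... and the sum should be 55
```

On a system that keeps all FP stack items in the eight x87 registers,
the ninth push would be expected to overflow; what happens after that
is system-specific.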

Gforth sticks out by using 8-byte FP values; most of those are stored
in memory (supporting deep FP stacks), with the top of FP stack in an
xmm register on AMD64:

: foo f+ f* ; ok
see-code foo
$7FF2CE8034E0 f+ 1->1
7FF2CE4A6E43: mov rax,r12
7FF2CE4A6E46: lea r12,$08[r12]
7FF2CE4A6E4B: addsd xmm15,$08[rax]
$7FF2CE8034E8 f* 1->1
7FF2CE4A6E51: mov rax,r12
7FF2CE4A6E54: lea r12,$08[r12]
7FF2CE4A6E59: mulsd xmm15,$08[rax]
$7FF2CE8034F0 ;s 1->1
7FF2CE4A6E5F: mov rbx,[r14]
7FF2CE4A6E62: add r14,$08
7FF2CE4A6E66: mov rax,[rbx]
7FF2CE4A6E69: jmp eax

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Re: Floating point implementations on AMD64

<uvf5i9$38nl1$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=26665&group=comp.lang.forth#26665
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: krishna.myneni@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sat, 13 Apr 2024 18:47:20 -0500
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <uvf5i9$38nl1$1@dont-email.me>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Apr 2024 01:47:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="803700620d26eb7e6561f16ec8e1ad58";
logging-data="3432097"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX183rpexaBkEmNk+ZJSB5ybt"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:HrhIdIuWtB6NUyeGm7Aofz0JzzM=
In-Reply-To: <2024Apr13.195518@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: Krishna Myneni - Sat, 13 Apr 2024 23:47 UTC

On 4/13/24 12:55, Anton Ertl wrote:
> I just looked at the floating-point implementations of recent
> SwiftForth and VFX (finally present in the system from the start), and
> on iForth-5.1-mini (for comparison):
>
> 1 FLOATS .
>
> reports:
>
> 16 iforth
> 10 sf64
> 10 vfx64
>
> For
>
> : foo f+ f* ;
>
> the resulting code is:
>
> SwiftForth x64-Linux 4.0.0-RC87 24-Mar-2024
> : foo f+ f* ; ok
> see foo
> 44E8B9 ST(0) ST(1) FADDP DEC1
> 44E8BB ST(0) ST(1) FMULP DEC9
> 44E8BD RET C3 ok
>
>
> VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
> © MicroProcessor Engineering Ltd, 1998-2023
>
> : foo f+ f* ; ok
> see foo
> FOO
> ( 0050A250 DEC1 ) FADDP ST(1), ST
> ( 0050A252 DEC9 ) FMULP ST(1), ST
> ( 0050A254 C3 ) RET/NEXT
> ( 5 bytes, 3 instructions )
>
>
> iForth:
> $10226000 : foo 488BC04883ED088F4500 H.@H.m..E.
> $1022600A fld [r13 0 +] tbyte41DB6D00 A[m.
> $1022600E fld [r13 #16 +] tbyte
> 41DB6D10 A[m.
> $10226012 fxch ST(2) D9CA YJ
> $10226014 lea r13, [r13 #32 +] qword
> 4D8D6D20 M.m
> $10226018 faddp ST(1), ST DEC1 ^A
> $1022601A fxch ST(1) D9C9 YI
> $1022601C fpopswap, 41DB6D00D9CA4D8D6D10 A[m.YJM.m.
> $10226026 fmulp ST(1), ST DEC9 ^I
> $10226028 fpush, 4D8D6DF0D9C941DB7D00 M.mpYIA[}.
> $10226032 ; 488B45004883C508FFE0 H.E.H.E..` ok
>
> So apparently the 8 hardware FP stack items are enough for SwiftForth
> and VFX, while iForth prefers to use an FP stack in memory to allow
> for a deeper FP stack.
>
....

For me, an 8 item hardware fp stack limit is too limiting to be useful.
This is mostly because of my use of the fp stack for initializing tables
(arrays and matrices), and my coding style of returning more than 8
floats on the fp stack for some types of computation. No doubt one can
limit themselves to an 8-item fp stack, but I'd hate to have to code with
such a limit.
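
To make the use case concrete, here is a hedged sketch of that kind of
table initialization. It is an illustration only, not code from any
existing program: f>table and tbl are hypothetical names, and only
standard floating-point words (FALIGN, FLOATS, F!, F@) are assumed.

```forth
\ Hypothetical helper: move n floats from the FP stack into a table.
decimal
: f>table ( f-addr n -- ) ( F: r1 .. rn -- )  \ rn ends up in the last slot
  floats over +                \ f-addr end
  begin  2dup u<  while        \ until end has walked back down to f-addr
    1 floats -  dup f!         \ store the current top of the FP stack
  repeat  2drop ;

falign here  10 floats allot  constant tbl   \ float-aligned room for 10 items

1e 2e 3e 4e 5e 6e 7e 8e 9e 10e  tbl 10 f>table   \ needs an FP stack depth of 10
tbl f@ f.   tbl 9 floats + f@ f.                 \ first and last entry: 1 and 10
```

The line with the ten literals is the point: it only works if the FP
stack can hold more than eight items at once.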

--
Krishna

Re: Floating point implementations on AMD64

<661b4d7d$1@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26666&group=comp.lang.forth#26666
Date: Sun, 14 Apr 2024 13:29:01 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
Content-Language: en-GB
From: dxforth@gmail.com (dxf)
In-Reply-To: <2024Apr13.195518@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661b4d7d$1@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 13
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Sun, 14 Apr 2024 03:29 UTC

On 14/04/2024 3:55 am, Anton Ertl wrote:
> I just looked at the floating-point implementations of recent
> SwiftForth and VFX (finally present in the system from the start), and
> on iForth-5.1-mini (for comparison):
> ...
> So apparently the 8 hardware FP stack items are enough for SwiftForth
> and VFX, while iForth prefers to use an FP stack in memory to allow
> for a deeper FP stack.

All the more reason to make fp loadable so users can choose the model
they want instead of built-in. IIRC VFX and SWF previously did this.

Re: Floating point implementations on AMD64

<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=26668&group=comp.lang.forth#26668
Path: i2pn2.org!.POSTED!not-for-mail
From: mhx@iae.nl (mhx)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 07:03:15 +0000
Organization: novaBBS
Message-ID: <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1124178"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 59549e76d0c3560fb37b97f0b9407a8c14054f24
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$N0ymyIcp/vFG1N.P.WhjK.M7.o89iPKCpTBB.SL337svg.cZ1HarO
 by: mhx - Sun, 14 Apr 2024 07:03 UTC

Anton Ertl wrote:
[..]

> iForth:
> $10226000 : foo 488BC04883ED088F4500 H.@H.m..E.
> $1022600A fld [r13 0 +] tbyte41DB6D00 A[m.
> $1022600E fld [r13 #16 +] tbyte
> 41DB6D10 A[m.
> $10226012 fxch ST(2) D9CA YJ
> $10226014 lea r13, [r13 #32 +] qword
> 4D8D6D20 M.m
> $10226018 faddp ST(1), ST DEC1 ^A
> $1022601A fxch ST(1) D9C9 YI
> $1022601C fpopswap, 41DB6D00D9CA4D8D6D10 A[m.YJM.m.
> $10226026 fmulp ST(1), ST DEC9 ^I
> $10226028 fpush, 4D8D6DF0D9C941DB7D00 M.mpYIA[}.
> $10226032 ; 488B45004883C508FFE0 H.E.H.E..` ok

> So apparently the 8 hardware FP stack items are enough for SwiftForth
> and VFX, while iForth prefers to use an FP stack in memory to allow
> for a deeper FP stack.

Turbo Pascal had a fast FP mode that used the FPU stack. I found almost
immediately that that is unusable for serious work.

The scheme used is rather complicated. iForth uses the internal stack
when it can prove that there will be no under- or overflow. Non-inlined
calls (F. below) always use the memory stack.

FORTH> pi fvalue val1 pi/2 fvalue val2 ok
FORTH> : test val1 fdup val2 foo val1 f+ val2 f* f. ; ok
FORTH> see test
Flags: ANSI
$015FDA80 : test
$015FDA8A fld $015FD650 tbyte-offset
$015FDA90 fld ST(0)
$015FDA92 fld $015FD670 tbyte-offset
$015FDA98 faddp ST(1), ST
$015FDA9A fmulp ST(1), ST
$015FDA9C fld $015FD650 tbyte-offset
$015FDAA2 faddp ST(1), ST
$015FDAA4 fld $015FD670 tbyte-offset
$015FDAAA fmulp ST(1), ST
$015FDAAC fpush,
$015FDAB6 jmp F.+10 ( $0124ED42 ) offset NEAR
$015FDABB ;

Apparently there are special interrupts that one can enable
to signal FPU stack underflow (and then spill to memory)
but I never got them to work reliably. The software
analysis works fine, but can be fooled in case of rather
contrived circumstances. I have not encountered a bug in the
past two decades.

-marcel

Re: Floating point implementations on AMD64

<661b8d81$1@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26669&group=comp.lang.forth#26669
Date: Sun, 14 Apr 2024 18:02:10 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
Content-Language: en-GB
From: dxforth@gmail.com (dxf)
In-Reply-To: <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661b8d81$1@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 28
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Sun, 14 Apr 2024 08:02 UTC

On 14/04/2024 5:03 pm, mhx wrote:
> Anton Ertl wrote:
> [..]
>
>> iForth:
>> $10226000  : foo                        488BC04883ED088F4500      H.@H.m..E.
>> $1022600A  fld           [r13 0 +] tbyte41DB6D00                  A[m.
>> $1022600E  fld           [r13 #16 +] tbyte
>>                                         41DB6D10                  A[m.
>> $10226012  fxch          ST(2)          D9CA                      YJ
>> $10226014  lea           r13, [r13 #32 +] qword
>>                                         4D8D6D20                  M.m
>> $10226018  faddp         ST(1), ST      DEC1                      ^A
>> $1022601A  fxch          ST(1)          D9C9                      YI
>> $1022601C  fpopswap,                    41DB6D00D9CA4D8D6D10      A[m.YJM.m.
>> $10226026  fmulp         ST(1), ST      DEC9                      ^I
>> $10226028  fpush,                       4D8D6DF0D9C941DB7D00      M.mpYIA[}.
>> $10226032  ;                            488B45004883C508FFE0      H.E.H.E..` ok
>
>> So apparently the 8 hardware FP stack items are enough for SwiftForth
>> and VFX, while iForth prefers to use an FP stack in memory to allow
>> for a deeper FP stack.
>
> Turbo Pascal had a fast FP mode that used the FPU stack. I found almost
> immediately that that is unusable for serious work.

Were that the case, Intel had plenty of opportunity to change it. They had
an academic advising them.

Re: Floating point implementations on AMD64

<2024Apr14.103435@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=26670&group=comp.lang.forth#26670
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 08:34:35 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 27
Message-ID: <2024Apr14.103435@mips.complang.tuwien.ac.at>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
Injection-Date: Sun, 14 Apr 2024 10:52:10 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="0556dea8bd1e2f732f01e6d9982b6267";
logging-data="3743610"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19YjUoQMhHPKmkNjUr6APKh"
Cancel-Lock: sha1:uhhl2FwvwsKq8bGY1+s5JyPrh+A=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 14 Apr 2024 08:34 UTC

mhx@iae.nl (mhx) writes:
>Apparently there are special interrupts that one can enable
>to signal FPU stack underflow (and then spill to memory)
>but I never got them to work reliably.

From what I read about this, the intention was that the FP stack would
extend into memory (and thus not be limited to 8 elements): software
should react to FP stack overflows and underflows and store some
elements on overflow, and reload some elements on underflow. However,
this functionality was implemented in a buggy way on the 8087, so it
never worked as intended. However, when they noticed this, the 8087
was already on the market, and Hyrum's law ensured that this behaviour
could not be changed.

And apparently this feature was not considered to be important enough
to add a new architectural feature that allows implementing the FP
stack extension to memory. I guess that the implementations of
Fortran and Algol-family languages (e.g., C) in the 1980s only used
the stack within an expression, so avoiding FP stack overflows with
compiler analysis (like you do), is relatively easy.

- anton

Re: Floating point implementations on AMD64

<661baa6c$1@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26671&group=comp.lang.forth#26671
Date: Sun, 14 Apr 2024 20:05:33 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<2024Apr14.103435@mips.complang.tuwien.ac.at>
Content-Language: en-GB
From: dxforth@gmail.com (dxf)
In-Reply-To: <2024Apr14.103435@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661baa6c$1@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 23
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Sun, 14 Apr 2024 10:05 UTC

On 14/04/2024 6:34 pm, Anton Ertl wrote:
> mhx@iae.nl (mhx) writes:
>> Apparently there are special interrupts that one can enable
>> to signal FPU stack underflow (and then spill to memory)
>> but I never got them to work reliably.
>
> From what I read about this, the intention was that the FP stack would
> extend into memory (and thus not be limited to 8 elements): software
> should react to FP stack overflows and underflows and store some
> elements on overflow, and reload some elements on underflow. However,
> this functionality was implemented in a buggy way on the 8087, so it
> never worked as intended. However, when they noticed this, the 8087
> was already on the market, and Hyrum's law ensured that this behaviour
> could not be changed.

Do you have a reference for that? Below is a paper written by one of the
designers and it doesn't appear to be mentioned. It's of course possible
to maintain a stack in software and use the FPU to do the calculations.
There are instructions to load/store Temp Real format to memory and that
gets a mention.

https://dl.acm.org/doi/pdf/10.1145/800053.801923
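
The software-stack approach mentioned above can be sketched in Forth
itself. This is only an illustration under stated assumptions:
sfp-stack, sfp, sfp-addr, >sfp, sfp> and sf+ are made-up names, the
64-item size is arbitrary, and there is no overflow or underflow
checking. The calculation still goes through the system's own F+; only
the storage of the waiting operands moves to memory.

```forth
\ Hypothetical software FP stack kept in ordinary memory.
decimal
64 constant sfp-size                         \ arbitrary capacity
falign here  sfp-size floats allot  constant sfp-stack
variable sfp   0 sfp !                       \ number of items currently stored

: sfp-addr ( u -- f-addr )  floats sfp-stack + ;
: >sfp ( F: r -- )  sfp @ sfp-addr f!   1 sfp +! ;    \ push to memory
: sfp> ( F: -- r )  -1 sfp +!   sfp @ sfp-addr f@ ;   \ pop from memory
: sf+  ( -- )  sfp> sfp> f+ >sfp ;   \ at most two items live on the FP stack

1e >sfp  2e >sfp  3e >sfp   sf+ sf+   sfp> f.   \ 1+2+3, should print 6
```

With this arrangement the system's FP stack never holds more than two
items during sf+, however many operands are waiting in sfp-stack.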

Re: Floating point implementations on AMD64

<nnd$531daf67$282eec20@a0cd55e93bc8e117>

https://news.novabbs.org/devel/article-flat.php?id=26674&group=comp.lang.forth#26674
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
From: albert@spenarnc.xs4all.nl
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$531daf67$282eec20@a0cd55e93bc8e117>
Organization: KPN B.V.
Date: Sun, 14 Apr 2024 13:12:19 +0200
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feed.abavia.com!abe006.abavia.com!abp002.abavia.com!news.kpn.nl!not-for-mail
Lines: 49
Injection-Date: Sun, 14 Apr 2024 13:12:19 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
 by: albert@spenarnc.xs4all.nl - Sun, 14 Apr 2024 11:12 UTC

In article <2024Apr13.195518@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>I just looked at the floating-point implementations of recent
>SwiftForth and VFX (finally present in the system from the start), and
>on iForth-5.1-mini (for comparison):
>
>1 FLOATS .
>
>reports:
>
>16 iforth
>10 sf64
>10 vfx64
>
>For
>
>: foo f+ f* ;
>
>the resulting code is:
>
>SwiftForth x64-Linux 4.0.0-RC87 24-Mar-2024
>: foo f+ f* ; ok
>see foo
>44E8B9 ST(0) ST(1) FADDP DEC1
>44E8BB ST(0) ST(1) FMULP DEC9
>44E8BD RET C3 ok
>
>
>VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
>© MicroProcessor Engineering Ltd, 1998-2023
>
>: foo f+ f* ; ok
>see foo
>FOO
>( 0050A250 DEC1 ) FADDP ST(1), ST
>( 0050A252 DEC9 ) FMULP ST(1), ST
>( 0050A254 C3 ) RET/NEXT
>( 5 bytes, 3 instructions )

I cut the same corners with ciforth. However, I think this
cannot be compliant with the IEEE requirement of the standard?

>- anton
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat purring. - the Wise from Antrim -

Re: Floating point implementations on AMD64

<nnd$197006f0$6c4a6afa@95b317b8491dfc1d>

https://news.novabbs.org/devel/article-flat.php?id=26675&group=comp.lang.forth#26675
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
From: albert@spenarnc.xs4all.nl
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$197006f0$6c4a6afa@95b317b8491dfc1d>
Organization: KPN B.V.
Date: Sun, 14 Apr 2024 13:21:14 +0200
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feed.abavia.com!abe005.abavia.com!abp003.abavia.com!news.kpn.nl!not-for-mail
Lines: 75
Injection-Date: Sun, 14 Apr 2024 13:21:14 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
 by: albert@spenarnc.xs4all.nl - Sun, 14 Apr 2024 11:21 UTC

In article <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>,
mhx <mhx@iae.nl> wrote:
>Anton Ertl wrote:
>[..]
>
>> iForth:
>> $10226000 : foo 488BC04883ED088F4500 H.@H.m..E.
>> $1022600A fld [r13 0 +] tbyte41DB6D00 A[m.
>> $1022600E fld [r13 #16 +] tbyte
>> 41DB6D10 A[m.
>> $10226012 fxch ST(2) D9CA YJ
>> $10226014 lea r13, [r13 #32 +] qword
>> 4D8D6D20 M.m
>> $10226018 faddp ST(1), ST DEC1 ^A
>> $1022601A fxch ST(1) D9C9 YI
>> $1022601C fpopswap, 41DB6D00D9CA4D8D6D10 A[m.YJM.m.
>> $10226026 fmulp ST(1), ST DEC9 ^I
>> $10226028 fpush, 4D8D6DF0D9C941DB7D00 M.mpYIA[}.
>> $10226032 ; 488B45004883C508FFE0 H.E.H.E..` ok
>
>> So apparently the 8 hardware FP stack items are enough for SwiftForth
>> and VFX, while iForth prefers to use an FP stack in memory to allow
>> for a deeper FP stack.
>
>Turbo Pascal had a fast FP mode that used the FPU stack. I found almost
>immediately that that is unusable for serious work.
>
>The used scheme is rather complicated. iForth uses the internal stack
>when it can prove that there will be no under- or overflow. Non-inlined
>calls (F. below) always use the memory stack.
>
>FORTH> pi fvalue val1 pi/2 fvalue val2 ok
>FORTH> : test val1 fdup val2 foo val1 f+ val2 f* f. ; ok
>FORTH> see test
>Flags: ANSI
>$015FDA80 : test
>$015FDA8A fld $015FD650 tbyte-offset
>$015FDA90 fld ST(0)
>$015FDA92 fld $015FD670 tbyte-offset
>$015FDA98 faddp ST(1), ST
>$015FDA9A fmulp ST(1), ST
>$015FDA9C fld $015FD650 tbyte-offset
>$015FDAA2 faddp ST(1), ST
>$015FDAA4 fld $015FD670 tbyte-offset
>$015FDAAA fmulp ST(1), ST
>$015FDAAC fpush,
>$015FDAB6 jmp F.+10 ( $0124ED42 ) offset NEAR
>$015FDABB ;
>
>Apparently there are special interrupts that one can enable
>to signal FPU stack underflow (and then spill to memory)
>but I never got them to work reliably. The software
>analysis works fine, but can be fooled in case of rather
>contrived circumstances. I have not encountered a bug in the
>past two decades.

This is a practical way.

I researched whether it is possible to detect whether the
circular stack overflows. There are instructions to
detect whether a position in this stack is occupied.
For a word that uses a stack 4 deep, you could detect whether
it is necessary to save words this way, I thought.
I couldn't make it work, because essential assembler instructions
are missing. (Or I'm not clever enough.)

>
>-marcel

Re: Floating point implementations on AMD64

<2024Apr14.132507@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=26676&group=comp.lang.forth#26676
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 11:25:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 33
Message-ID: <2024Apr14.132507@mips.complang.tuwien.ac.at>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com> <2024Apr14.103435@mips.complang.tuwien.ac.at> <661baa6c$1@news.ausics.net>
Injection-Date: Sun, 14 Apr 2024 14:02:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="0556dea8bd1e2f732f01e6d9982b6267";
logging-data="3815225"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18MpVrhHfvrTHp7jAzWqOSr"
Cancel-Lock: sha1:xU48I4J1dSRWEVeQouPU0YC6+C8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 14 Apr 2024 11:25 UTC

dxf <dxforth@gmail.com> writes:
>On 14/04/2024 6:34 pm, Anton Ertl wrote:
>> From what I read about this, the intention was that the FP stack would
>> extend into memory (and thus not be limited to 8 elements): software
>> should react to FP stack overflows and underflows and store some
>> elements on overflow, and reload some elements on underflow. However,
>> this functionality was implemented in a buggy way on the 8087, so it
>> never worked as intended. However, when they noticed this, the 8087
>> was already on the market, and Hyrum's law ensured that this behaviour
>> could not be changed.
>
>Do you have a reference for that?

Kahan writes about the original intention in

http://web.archive.org/web/20170118054747/https://cims.nyu.edu/~dbindel/class/cs279/87stack.pdf

especially starting at the last paragraph of page 2.

And about the bug (or rather design mistake):

https://history.siam.org/pdfs2/Kahan_final.pdf

Start with the second-to-last paragraph on page 163. He digresses for
a page, but continues on the fourth paragraph of page 165 and
continues to the first paragraph of page 168.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Re: Floating point implementations on AMD64

<3dad22b21ad3e1afd55fdbf149854792@www.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=26677&group=comp.lang.forth#26677

Path: i2pn2.org!.POSTED!not-for-mail
From: mhx@iae.nl (mhx)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 11:59:51 +0000
Organization: novaBBS
Message-ID: <3dad22b21ad3e1afd55fdbf149854792@www.novabbs.com>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com> <nnd$197006f0$6c4a6afa@95b317b8491dfc1d>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1148474"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 59549e76d0c3560fb37b97f0b9407a8c14054f24
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$CHyF5pqDYRb5zZI/U8klNelQJA.BTR8kH1yB9NyNLS6A29jE4Rhu.
 by: mhx - Sun, 14 Apr 2024 11:59 UTC

albert@spenarnc.xs4all.nl wrote:

> In article <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>,
> mhx <mhx@iae.nl> wrote:
>>Anton Ertl wrote:
>>[..]
> This is a practical way.

> I researched whether it is possible to detect whether the
> circular stack overflows. There are instructions to
> detect whether a position in this stack is occupied.
> For a word that uses a stack 4 deep, you could detect whether
> it is necessary to save words this way, I thought.
> I couldn't make it work, because essential assembler instructions
> are missing. (Or I'm not clever enough.)

I vaguely remember something like that for the FPU stack (combined
with interrupts?). It falls under the category "I couldn't make
it work."

-marcel

Re: Floating point implementations on AMD64

<uvgjf3$3kvt4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=26678&group=comp.lang.forth#26678

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: krishna.myneni@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 07:50:42 -0500
Organization: A noiseless patient Spider
Lines: 139
Message-ID: <uvgjf3$3kvt4$1@dont-email.me>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<661b8d81$1@news.ausics.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Apr 2024 14:50:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="803700620d26eb7e6561f16ec8e1ad58";
logging-data="3833764"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19nERr4ME5pYAayooF0UmbO"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:RNPirlDqdb8WWK8r/YSLosxtgsI=
In-Reply-To: <661b8d81$1@news.ausics.net>
Content-Language: en-US
 by: Krishna Myneni - Sun, 14 Apr 2024 12:50 UTC

On 4/14/24 03:02, dxf wrote:
> On 14/04/2024 5:03 pm, mhx wrote:
>> Anton Ertl wrote:
>> [..]
>>
>>> iForth:
>>> $10226000  : foo                        488BC04883ED088F4500      H.@H.m..E.
>>> $1022600A  fld           [r13 0 +] tbyte 41DB6D00                 A[m.
>>> $1022600E  fld           [r13 #16 +] tbyte
>>>                                         41DB6D10                  A[m.
>>> $10226012  fxch          ST(2)          D9CA                      YJ
>>> $10226014  lea           r13, [r13 #32 +] qword
>>>                                         4D8D6D20                  M.m
>>> $10226018  faddp         ST(1), ST      DEC1                      ^A
>>> $1022601A  fxch          ST(1)          D9C9                      YI
>>> $1022601C  fpopswap,                    41DB6D00D9CA4D8D6D10      A[m.YJM.m.
>>> $10226026  fmulp         ST(1), ST      DEC9                      ^I
>>> $10226028  fpush,                       4D8D6DF0D9C941DB7D00      M.mpYIA[}.
>>> $10226032  ;                            488B45004883C508FFE0      H.E.H.E..` ok
>>
>>> So apparently the 8 hardware FP stack items are enough for SwiftForth
>>> and VFX, while iForth prefers to use an FP stack in memory to allow
>>> for a deeper FP stack.
>>
>> Turbo Pascal had a fast FP mode that used the FPU stack. I found almost
>> immediately that that is unusable for serious work.
>
> Were that the case Intel had plenty opportunity to change it. They had
> an academic advising them.
>

Let's take a non-trivial example to illustrate why the 8-deep fp stack
may not be that useful for numerical computation. This example is
actually from the FSL demo. The word computes the Lorenz equations,
which give rise to the famous butterfly attractor. This is a system of
three nonlinear first order differential equations in three variables,
x, y, z, which are time dependent. The Lorenz equations define the
instantaneous derivatives of these variables:

dx/dt = sigma*(y - x)
dy/dt = x*(rho -z) - y
dz/dt = x*y - beta*z

where sigma, rho, and beta are constant parameters.

Let's say we want to write a word DERIVS which computes and stores the
derivatives, given the instantaneous values of x, y, z. This is the
basis for any numerical code which solves the trajectory in time,
starting from an initial condition.

DERIVS ( F: x y z -- )

Hence, we want to place some values x, y, and z onto the fp stack and
compute the three derivatives. Ideally these three values remain on the
fp stack and don't need to be fetched from memory constantly until the
three derivatives are computed, especially if one is using the hardware
fp stack. We allow the constant parameters to be fetched from memory and
the results of the derivative computation to be stored to memory so they
don't overflow the stack. This should be doable with the 8-element
hardware fp stack.

Below I give Forth code which computes the derivatives. This code is
usable only on systems with a separate FP stack. It will be interesting
to see the compiled code given by Forth systems using the hardware fpu
stack to compute the results. While this example may behave properly, if
we go to a fourth order system or higher, it gets less likely that the
hardware stack remains usable.

--
Krishna

== begin fpstack-test.4th ==
\ fpstack-test.4th
\
\ Compute the Lorenz equations, a set of three coupled
\ nonlinear differential equations.
\
\ dx/dt = sigma*(y - x)
\ dy/dt = x*(rho - z) - y
\ dz/dt = x*y - beta*z
\
\ sigma, rho, and beta are constant parameters.
\
\ The following code requires a separate fp stack

include ans-words \ only for kForth64
include fsl/fsl-util

[UNDEFINED] FPICK [IF]
cr .( Your system may not use a separate floating point stack!)
ABORT
[THEN]

[UNDEFINED] F2OVER [IF]
: f2over ( F: r1 r2 r3 r4 -- r1 r2 r3 r4 r1 r2 )
3 fpick 3 fpick ;
[THEN]

[UNDEFINED] F+! [IF]
: f+! ( a -- ) ( F: r -- ) dup f@ f+ f! ;
[THEN]

16.0e0 fconstant sigma
45.92e0 fconstant rho
4.0e0 fconstant beta

\ Compute the derivatives given the instantaneous values
\ x, y, z for a given time t.

\ xdot{ is an array consisting of dx/dt, dy/dt, dz/dt
3 float array xdot{

: derivs ( F: x y z -- )
    fdup f2over          \ F: x y z z x y
    f- sigma f* fnegate  \ F: x y z z sigma*(y-x)
    xdot{ 0 } f!         \ F: x y z z
    rho fover f-         \ F: x y z z rho-z
    4 fpick f*           \ F: x y z z x*(rho-z)
    3 fpick f-           \ F: x y z z x*(rho-z)-y
    xdot{ 1 } f!         \ F: x y z z
    fdrop                \ F: x y z
    beta f* fnegate      \ F: x y -beta*z
    xdot{ 2 } f!         \ F: x y
    f* xdot{ 2 } f+!     \ F: --   ( adds x*y to the stored -beta*z )
;

0 [IF]
include ttester
\ Test DERIVS
1e-15 set-near
t{ 0.1e 0.6e 4.0e derivs -> }t
t{ xdot{ 0 } f@ -> 8.0e0 }t
t{ xdot{ 1 } f@ -> 3.592e0 }t
t{ xdot{ 2 } f@ -> -15.94e0 }t
[THEN]

== end fpstack-test.4th ==
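
As a cross-check of the expected values in the test block above, here is a minimal sketch that evaluates the same three equations through plain FVARIABLEs instead of stack juggling. It is not part of the FSL demo; the names ending in -ref are hypothetical and chosen only to avoid clashing with the definitions above. It assumes a system with the standard floating-point word set and nothing else.

```forth
\ Reference evaluation of the Lorenz derivatives (sketch only).
\ All names ending in -ref are hypothetical helpers.
16.0e0  fconstant sigma-ref
45.92e0 fconstant rho-ref
4.0e0   fconstant beta-ref

fvariable x-ref  fvariable y-ref  fvariable z-ref

: derivs-ref ( F: x y z -- dx/dt dy/dt dz/dt )
    z-ref f!  y-ref f!  x-ref f!
    y-ref f@ x-ref f@ f-  sigma-ref f*               \ sigma*(y - x)
    rho-ref z-ref f@ f-  x-ref f@ f*  y-ref f@ f-    \ x*(rho - z) - y
    x-ref f@ y-ref f@ f*  beta-ref z-ref f@ f*  f-   \ x*y - beta*z
;

\ Print dx/dt, dy/dt, dz/dt in that order for the test input.
0.1e 0.6e 4.0e derivs-ref
frot f.  fswap f.  f.  cr
```

For the input 0.1, 0.6, 4.0 the hand calculation is 16*(0.6 - 0.1) = 8, 0.1*(45.92 - 4) - 0.6 = 3.592 and 0.1*0.6 - 4*4 = -15.94, so the printed values should be close to those, up to rounding and the output format of F. on the system at hand. This version never holds more than three items on the fp stack, at the price of the memory traffic that DERIVS is trying to avoid.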

Re: Floating point implementations on AMD64

<661bdb9b$1@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26679&group=comp.lang.forth#26679

Date: Sun, 14 Apr 2024 23:35:24 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<2024Apr14.103435@mips.complang.tuwien.ac.at> <661baa6c$1@news.ausics.net>
<2024Apr14.132507@mips.complang.tuwien.ac.at>
Content-Language: en-GB
From: dxforth@gmail.com (dxf)
In-Reply-To: <2024Apr14.132507@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661bdb9b$1@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 41
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Sun, 14 Apr 2024 13:35 UTC

On 14/04/2024 9:25 pm, Anton Ertl wrote:
> dxf <dxforth@gmail.com> writes:
>> On 14/04/2024 6:34 pm, Anton Ertl wrote:
>>> From what I read about this, the intention was that the FP stack would
>>> extend into memory (and thus not be limited to 8 elements): software
>>> should react to FP stack overflows and underflows and store some
>>> elements on overflow, and reload some elements on underflow. However,
>>> this functionality was implemented in a buggy way on the 8087, so it
>>> never worked as intended. However, when they noticed this, the 8087
>>> was already on the market, and Hyrum's law ensured that this behaviour
>>> could not be changed.
>>
>> Do you have a reference for that?
>
> Kahan writes about the original intention in
>
> http://web.archive.org/web/20170118054747/https://cims.nyu.edu/~dbindel/class/cs279/87stack.pdf
>
> especially starting at the last paragraph of page 2.
>
> And about the bug (or rather design mistake):
>
> https://history.siam.org/pdfs2/Kahan_final.pdf
>
> Start with the second-to-last paragraph on page 163. He digresses for
> a page, but continues on the fourth paragraph of page 165 and
> continues to the first paragraph of page 168.

The latter sounds like someone not getting his way more than a design mistake.
In the first reference Kahan states:

"When the 8087 was designed, I knew that stack over/underflow was an issue of
more aesthetic than practical importance. I still regret that the 8087's stack
implementation was not quite so neat as my original intention described in the
accompanying note."

Intel decided Kahan's aesthetic afterthought could be dispensed with. History
appears to have proven them correct. Were 8 levels of stack actually insufficient,
it would have made more sense for Intel to double it (if not for the 8087 then the
next incarnation) than to spill to memory which was bad in every way.

Re: Floating point implementations on AMD64

<661be7dc$1@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26680&group=comp.lang.forth#26680

Date: Mon, 15 Apr 2024 00:27:41 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Content-Language: en-GB
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<661b8d81$1@news.ausics.net> <uvgjf3$3kvt4$1@dont-email.me>
From: dxforth@gmail.com (dxf)
In-Reply-To: <uvgjf3$3kvt4$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661be7dc$1@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 6
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Sun, 14 Apr 2024 14:27 UTC

On 14/04/2024 10:50 pm, Krishna Myneni wrote:
> ...
> Below I give Forth code which computes the derivatives. This code is usable only on systems with a separate FP stack. It will be interesting to see the compiled code given by Forth systems using the hardware fpu stack to compute the results. While this example may behave properly, if we go to a fourth order system or higher, it gets less likely that the hardware stack remains usable.

Systems that default to hardware fpu stack may well offer a software stack option.

Re: Floating point implementations on AMD64

<2024Apr14.171941@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=26682&group=comp.lang.forth#26682

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 15:19:41 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 123
Message-ID: <2024Apr14.171941@mips.complang.tuwien.ac.at>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com> <661b8d81$1@news.ausics.net> <uvgjf3$3kvt4$1@dont-email.me>
Injection-Date: Sun, 14 Apr 2024 17:52:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="0556dea8bd1e2f732f01e6d9982b6267";
logging-data="3908095"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/pyWIlsH2oMUCnDKbHIOU5"
Cancel-Lock: sha1:YB7sjjD9f2Phry2BAr6JrJgClcw=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 14 Apr 2024 15:19 UTC

Krishna Myneni <krishna.myneni@ccreweb.org> writes:
>dx/dt = sigma*(y - x)
>dy/dt = x*(rho -z) - y
>dz/dt = x*y - beta*z
>
>where sigma, rho, and beta are constant parameters.
>
>Let's say we want to write a word DERIVS which computes and stores the
>derivatives, given the instantaneous values of x, y, z. This is the
>basis for any numerical code which solves the trajectory in time,
>starting from an initial condition.
>
>DERIVS ( F: x y z -- )
>
>Hence, we want to place some values x, y, and z onto the fp stack and
>compute the three derivatives. Ideally these three values remain on the
>fp stack and don't need to be fetched from memory constantly until the
>three derivatives are computed, especially if one is using the hardware
>fp stack. We allow the constant parameters to be fetched from memory and
>the results of the derivative computation to be stored to memory so they
>don't overflow the stack. This should be doable with the 8-element
>hardware fp stack.

I have adapted your Forth code:

[UNDEFINED] F2OVER [IF]
: f2over ( F: r1 r2 r3 r4 -- r1 r2 r3 r4 r1 r2 )
3 fpick 3 fpick ;
[THEN]

16.0e0 fconstant sigma
45.92e0 fconstant rho
4.0e0 fconstant beta

fvariable dx/dt
fvariable dy/dt
fvariable dz/dt

: derivs ( F: x y z -- )
    fdup f2over               \ F: x y z z x y
    f- sigma f* fnegate       \ F: x y z z sigma*(y-x)
    dx/dt f!                  \ F: x y z z
    rho fover f-              \ F: x y z z rho-z
    4 fpick f*                \ F: x y z z x*(rho-z)
    3 fpick f-                \ F: x y z z x*(rho-z)-y
    dy/dt f!                  \ F: x y z z
    fdrop                     \ F: x y z
    beta f* fnegate           \ F: x y -beta*z
    frot frot f* f+ dz/dt f!  \ F: --
;

0.1e 0.6e 4.0e derivs
dx/dt f@ f. cr \ 8.
dy/dt f@ f. cr \ 3.592
dz/dt f@ f. cr \ -15.94

In particular, I eliminated the additional memory accesses to DZ/DT.

SwiftForth, VFX and iforth produce the expected results for your test
case. The code is:

SwiftForth 4.0.0-RC87   VFX Forth 64 5.43          iforth-5.1-mini
ST(0) FLD               FLD ST                     fld ST(0)
44E8BC ( f2over ) CALL  CALL 0050A080 F2OVER       fld [r13 0 +] tbyte
ST(0) ST(1) FSUBP       FSUBP ST(1), ST            fxch ST(1)
44E8FB ( sigma ) CALL   CALL 0050A2BB SIGMA        fld [r13 #16 +] tby
ST(0) ST(1) FMULP       FMULP ST(1), ST            lea r13, [r13 #32 +]
FCHS                    FCHS                       fxch ST(3)
-8 [RBP] RBP LEA        FSTP TBYTE FFF9CFE8 [RIP]  fxch ST(1)
RBX 0 [RBP] MOV         CALL 0050A2FB RHO          fld ST(3)
4C508 [RDI] RBX LEA     FLD ST(1)                  fld ST(3)
0 [RBX] TBYTE FSTP      FSUBP ST(1), ST            fsubp ST(1), ST
0 [RBP] RBX MOV         LEA RBP, [RBP+-08]         fld $101BC720 tbyte
8 [RBP] RBP LEA         MOV [RBP], RBX             fmulp ST(1), ST
44E923 ( rho ) CALL     MOV EBX, # 00000004        fchs
ST(1) FLD               CALL 005030C0 FPICK        fstp $10226470 tbyte
ST(0) ST(1) FSUBP       FMULP ST(1), ST            fld $101BC710 tbyte
-8 [RBP] RBP LEA        LEA RBP, [RBP+-08]         fld ST(1)
RBX 0 [RBP] MOV         MOV [RBP], RBX             fsubp ST(1), ST
4 # EBX MOV             MOV EBX, # 00000003        fld ST(4)
43C901 ( FPICK ) CALL   CALL 005030C0 FPICK        fmulp ST(1), ST
ST(0) ST(1) FMULP       FSUBP ST(1), ST            fld ST(3)
-8 [RBP] RBP LEA        FSTP TBYTE FFF9CFC1 [RIP]  fsubp ST(1), ST
RBX 0 [RBP] MOV         FSTP ST                    fstp $10226490 tbyte
3 # EBX MOV             CALL 0050A33B BETA         ffreep ST(0)
43C901 ( FPICK ) CALL   FMULP ST(1), ST            fld $101BC700 tbyte
ST(0) ST(1) FSUBP       FCHS                       fmulp ST(1), ST
-8 [RBP] RBP LEA        FXCH ST(1)                 fchs
RBX 0 [RBP] MOV         FXCH ST(2)                 fxch ST(1)
4C530 [RDI] RBX LEA     FXCH ST(1)                 fxch ST(2)
0 [RBX] TBYTE FSTP      FXCH ST(2)                 fxch ST(1)
0 [RBP] RBX MOV         FMULP ST(1), ST            fxch ST(2)
8 [RBP] RBP LEA         FADDP ST(1), ST            fmulp ST(1), ST
ST(0) FSTP              FSTP TBYTE FFF9CFB4 [RIP]  fxch ST(1)
44E94B ( beta ) CALL    RET/NEXT                   fpopswap,
ST(0) ST(1) FMULP                                  faddp ST(1), ST
FCHS                                               fstp $102264B0 tbyte
43C807 ( FROT ) CALL                               ;
43C807 ( FROT ) CALL
ST(0) ST(1) FMULP
ST(0) ST(1) FADDP
-8 [RBP] RBP LEA
RBX 0 [RBP] MOV
4C558 [RDI] RBX LEA
0 [RBX] TBYTE FSTP
0 [RBP] RBX MOV
8 [RBP] RBP LEA
RET

FPICK is apparently implemented on SwiftForth and VFX through an
indirect branch that branches to one of 8 variants of "FLD ST(...)",
while iForth manages to resolve this during compilation.

I have also looked at VFX 5.11 which uses XMM registers instead of the
FP stack, but it does not inline FP operations, so you mostly see a long
sequence of calls.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Re: Floating point implementations on AMD64

<2024Apr14.175340@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=26683&group=comp.lang.forth#26683

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 15:53:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 42
Message-ID: <2024Apr14.175340@mips.complang.tuwien.ac.at>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com> <2024Apr14.103435@mips.complang.tuwien.ac.at> <661baa6c$1@news.ausics.net> <2024Apr14.132507@mips.complang.tuwien.ac.at> <661bdb9b$1@news.ausics.net>
Injection-Date: Sun, 14 Apr 2024 18:14:37 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="0556dea8bd1e2f732f01e6d9982b6267";
logging-data="3913325"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+4Dn1faO+3RnsSl0Ycypa9"
Cancel-Lock: sha1:4lE7SVR2eNU68gvAX05ppBXooDA=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 14 Apr 2024 15:53 UTC

dxf <dxforth@gmail.com> writes:
>On 14/04/2024 9:25 pm, Anton Ertl wrote:
>> Kahan writes about the original intention in
>>
>> http://web.archive.org/web/20170118054747/https://cims.nyu.edu/~dbindel/class/cs279/87stack.pdf
>>
>> especially starting at the last paragraph of page 2.
>>
>> And about the bug (or rather design mistake):
>>
>> https://history.siam.org/pdfs2/Kahan_final.pdf
>>
>> Start with the second-to-last paragraph on page 163. He digresses for
>> a page, but continues on the fourth paragraph of page 165 and
>> continues to the first paragraph of page 168.
>
>The latter sounds like someone not getting his way more than a design mistake.
>In the first reference Kahan states:
>
> "When the 8087 was designed, I knew that stack over/underflow was an issue of
> more aesthetic than practical importance. I still regret that the 8087's stack
> implementation was not quite so neat as my original intention described in the
> accompanying note."
>
>Intel decided Kahan's aesthetic afterthought could be dispensed with.

In a way, they did, and Kahan obviously did not get his way. But to
me it sounds like they tried and failed at implementing a stack that
extends into memory. The tags that indicate the presence of a stack
item are there. If they had made a conscious decision at the start to
dispense with the idea of an extensible stack, they would have
dispensed with these bits indicating the presence of a stack item as
well. So what happened is that they botched the first attempt, and
then decided that they did not want to do what would have been
necessary to fix it.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Re: Floating point implementations on AMD64

<uvhcnp$3qbqq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=26684&group=comp.lang.forth#26684

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: stephen@vfxforth.com (Stephen Pelc)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 20:02:01 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 85
Message-ID: <uvhcnp$3qbqq$1@dont-email.me>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <uvf5i9$38nl1$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Apr 2024 22:02:02 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8ec680f60ab4308d8802cc0253db5626";
logging-data="4009818"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/TYkChb1VwnIuyp5LdpXVj"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:3Ud434ftoB3NgyMqYRlCQmBbGQU=
X-Usenapp: v1.27.2/l - Full License
 by: Stephen Pelc - Sun, 14 Apr 2024 20:02 UTC

On 14 Apr 2024 at 01:47:20 CEST, "Krishna Myneni" <krishna.myneni@ccreweb.org>
wrote:

> On 4/13/24 12:55, Anton Ertl wrote:
>> I just looked at the floating-point implementations of recent
>> SwiftForth and VFX (finally present in the system from the start), and
>> on iForth-5.1-mini (for comparison):
>>
>> 1 FLOATS .
>>
>> reports:
>>
>> 16 iforth
>> 10 sf64
>> 10 vfx64
>>
>> For
>>
>> : foo f+ f* ;
>>
>> the resulting code is:
>>
>> SwiftForth x64-Linux 4.0.0-RC87 24-Mar-2024
>> : foo f+ f* ; ok
>> see foo
>> 44E8B9 ST(0) ST(1) FADDP DEC1
>> 44E8BB ST(0) ST(1) FMULP DEC9
>> 44E8BD RET C3 ok
>>
>>
>> VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
>> © MicroProcessor Engineering Ltd, 1998-2023
>>
>> : foo f+ f* ; ok
>> see foo
>> FOO
>> ( 0050A250 DEC1 ) FADDP ST(1), ST
>> ( 0050A252 DEC9 ) FMULP ST(1), ST
>> ( 0050A254 C3 ) RET/NEXT
>> ( 5 bytes, 3 instructions )
>>
>>
>> iForth:
>> $10226000  : foo                        488BC04883ED088F4500      H.@H.m..E.
>> $1022600A  fld           [r13 0 +] tbyte 41DB6D00                 A[m.
>> $1022600E  fld           [r13 #16 +] tbyte
>>                                         41DB6D10                  A[m.
>> $10226012  fxch          ST(2)          D9CA                      YJ
>> $10226014  lea           r13, [r13 #32 +] qword
>>                                         4D8D6D20                  M.m
>> $10226018  faddp         ST(1), ST      DEC1                      ^A
>> $1022601A  fxch          ST(1)          D9C9                      YI
>> $1022601C  fpopswap,                    41DB6D00D9CA4D8D6D10      A[m.YJM.m.
>> $10226026  fmulp         ST(1), ST      DEC9                      ^I
>> $10226028  fpush,                       4D8D6DF0D9C941DB7D00      M.mpYIA[}.
>> $10226032  ;                            488B45004883C508FFE0      H.E.H.E..` ok
>>
>> So apparently the 8 hardware FP stack items are enough for SwiftForth
>> and VFX, while iForth prefers to use an FP stack in memory to allow
>> for a deeper FP stack.
>>
> ...
>
> For me, an 8 item hardware fp stack limit is too limiting to be useful.
> This is mostly because of my use of the fp stack for initializing tables
> (arrays and matrices), and my coding style of returning more than 8
> floats on the fp stack for some types of computation. No doubt one can
> limit themselves to an 8-item fp stack, but I'd hate to have to code with
> such a limit.

The manual (gasp) documents how to change the default FP package.

Changing the default pack also changes the system call interfaces to
match.

Stephen
--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - MPE website
http://www.vfxforth.com/downloads/VfxCommunity/ - downloads

Re: Floating point implementations on AMD64

<uvhp1t$3sr31$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=26685&group=comp.lang.forth#26685

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: krishna.myneni@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 18:32:11 -0500
Organization: A noiseless patient Spider
Lines: 138
Message-ID: <uvhp1t$3sr31$1@dont-email.me>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<661b8d81$1@news.ausics.net> <uvgjf3$3kvt4$1@dont-email.me>
<2024Apr14.171941@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 15 Apr 2024 01:32:14 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="393420cb1d124b4d02cb54a56910ed6a";
logging-data="4090977"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19eOf/X65BUMb4lGBOz6oBQ"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:5HVtwcwpcEc0h52r2UGbupK4Unw=
Content-Language: en-US
In-Reply-To: <2024Apr14.171941@mips.complang.tuwien.ac.at>
 by: Krishna Myneni - Sun, 14 Apr 2024 23:32 UTC

On 4/14/24 10:19, Anton Ertl wrote:
> Krishna Myneni <krishna.myneni@ccreweb.org> writes:
>> dx/dt = sigma*(y - x)
>> dy/dt = x*(rho -z) - y
>> dz/dt = x*y - beta*z
>>
>> where sigma, rho, and beta are constant parameters.
>>
>> Let's say we want to write a word DERIVS which computes and stores the
>> derivatives, given the instantaneous values of x, y, z. This is the
>> basis for any numerical code which solves the trajectory in time,
>> starting from an initial condition.
>>
>> DERIVS ( F: x y z -- )
>>
>> Hence, we want to place some values x, y, and z onto the fp stack and
>> compute the three derivatives. Ideally these three values remain on the
>> fp stack and don't need to be fetched from memory constantly until the
>> three derivatives are computed, especially if one is using the hardware
>> fp stack. We allow the constant parameters to be fetched from memory and
>> the results of the derivative computation to be stored to memory so they
>> don't overflow the stack. This should be doable with the 8-element
>> hardware fp stack.
>
> I have adapted your Forth code:
>
> [UNDEFINED] F2OVER [IF]
> : f2over ( F: r1 r2 r3 r4 -- r1 r2 r3 r4 r1 r2 )
> 3 fpick 3 fpick ;
> [THEN]
>
> 16.0e0 fconstant sigma
> 45.92e0 fconstant rho
> 4.0e0 fconstant beta
>
> fvariable dx/dt
> fvariable dy/dt
> fvariable dz/dt
>
> : derivs ( F: x y z -- )
> fdup f2over \ F: x y z z x y
> f- sigma f* fnegate
> dx/dt f! \ F: x y z z
> rho fover f- \ F: x y z z rho-z
> 4 fpick f* \ F: x y z z x*(rho - z)
> 3 fpick f-
> dy/dt f! \ F: x y z z
> fdrop
> beta f* fnegate
> frot frot f* f+ dz/dt f!
> ;
>
> 0.1e 0.6e 4.0e derivs
> dx/dt f@ f. cr \ 8.
> dy/dt f@ f. cr \ 3.592
> dz/dt f@ f. cr \ -15.94
>
> In particular, I eliminated the additional memory accesses to DZ/DT.
>

Nice. FROT FROT is expensive on a memory based FP stack, unless it is
optimized by the compiler, but for fpu stack use it's probably very
fast. I see that VFX Forth and iforth use a series of FXCH instructions
to implement FROT FROT.

> SwiftForth, VFX and iforth produce the expected results for your test
> case. The code is:
>
> SwiftForth 4.0.0-RC87   VFX Forth 64 5.43          iforth-5.1-mini
> ST(0) FLD               FLD ST                     fld ST(0)
> 44E8BC ( f2over ) CALL  CALL 0050A080 F2OVER       fld [r13 0 +] tbyte
> ST(0) ST(1) FSUBP       FSUBP ST(1), ST            fxch ST(1)
> 44E8FB ( sigma ) CALL   CALL 0050A2BB SIGMA        fld [r13 #16 +] tby
> ST(0) ST(1) FMULP       FMULP ST(1), ST            lea r13, [r13 #32 +]
> FCHS                    FCHS                       fxch ST(3)
> -8 [RBP] RBP LEA        FSTP TBYTE FFF9CFE8 [RIP]  fxch ST(1)
> RBX 0 [RBP] MOV         CALL 0050A2FB RHO          fld ST(3)
> 4C508 [RDI] RBX LEA     FLD ST(1)                  fld ST(3)
> 0 [RBX] TBYTE FSTP      FSUBP ST(1), ST            fsubp ST(1), ST
> 0 [RBP] RBX MOV         LEA RBP, [RBP+-08]         fld $101BC720 tbyte
> 8 [RBP] RBP LEA         MOV [RBP], RBX             fmulp ST(1), ST
> 44E923 ( rho ) CALL     MOV EBX, # 00000004        fchs
> ST(1) FLD               CALL 005030C0 FPICK        fstp $10226470 tbyte
> ST(0) ST(1) FSUBP       FMULP ST(1), ST            fld $101BC710 tbyte
> -8 [RBP] RBP LEA        LEA RBP, [RBP+-08]         fld ST(1)
> RBX 0 [RBP] MOV         MOV [RBP], RBX             fsubp ST(1), ST
> 4 # EBX MOV             MOV EBX, # 00000003        fld ST(4)
> 43C901 ( FPICK ) CALL   CALL 005030C0 FPICK        fmulp ST(1), ST
> ST(0) ST(1) FMULP       FSUBP ST(1), ST            fld ST(3)
> -8 [RBP] RBP LEA        FSTP TBYTE FFF9CFC1 [RIP]  fsubp ST(1), ST
> RBX 0 [RBP] MOV         FSTP ST                    fstp $10226490 tbyte
> 3 # EBX MOV             CALL 0050A33B BETA         ffreep ST(0)
> 43C901 ( FPICK ) CALL   FMULP ST(1), ST            fld $101BC700 tbyte
> ST(0) ST(1) FSUBP       FCHS                       fmulp ST(1), ST
> -8 [RBP] RBP LEA        FXCH ST(1)                 fchs
> RBX 0 [RBP] MOV         FXCH ST(2)                 fxch ST(1)
> 4C530 [RDI] RBX LEA     FXCH ST(1)                 fxch ST(2)
> 0 [RBX] TBYTE FSTP      FXCH ST(2)                 fxch ST(1)
> 0 [RBP] RBX MOV         FMULP ST(1), ST            fxch ST(2)
> 8 [RBP] RBP LEA         FADDP ST(1), ST            fmulp ST(1), ST
> ST(0) FSTP              FSTP TBYTE FFF9CFB4 [RIP]  fxch ST(1)
> 44E94B ( beta ) CALL    RET/NEXT                   fpopswap,
> ST(0) ST(1) FMULP                                  faddp ST(1), ST
> FCHS                                               fstp $102264B0 tbyte
> 43C807 ( FROT ) CALL                               ;
> 43C807 ( FROT ) CALL
> ST(0) ST(1) FMULP
> ST(0) ST(1) FADDP
> -8 [RBP] RBP LEA
> RBX 0 [RBP] MOV
> 4C558 [RDI] RBX LEA
> 0 [RBX] TBYTE FSTP
> 0 [RBP] RBX MOV
> 8 [RBP] RBP LEA
> RET
>
> FPICK is apparently implemented on SwiftForth and VFX through an
> indirect branch that branches to one of 8 variants of "FLD ST(...)",
> while iForth manages to resolve this during compilation.
>

Good to see that x, y, z are not repeatedly fetched from memory.

For this example, the hardware fpu stack is sufficient. But, it's easy
to see that the benefits of a hardware-only stack would diminish quickly
as the size of the problem increased a small amount, and then the
programmer (or compiler) would have to keep careful track of how many
fpu registers are used.
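
To make that bookkeeping concrete, here is a minimal sketch of a run-time guard. It assumes that the standard word FDEPTH reports the number of occupied hardware slots on a system that keeps the fp stack in the x87 registers; whether a given system does so is not something I have checked. HW-FP-SLOTS, FP-ROOM? and ?FP-ROOM are hypothetical names, not words from any of the systems discussed.

```forth
\ Sketch only: check that n more items still fit into the
\ eight x87 registers before running a word that needs them.
\ HW-FP-SLOTS, FP-ROOM? and ?FP-ROOM are hypothetical names.
8 constant hw-fp-slots

: fp-room? ( n -- flag )   \ true if n additional items fit
    fdepth + hw-fp-slots > 0= ;

: ?fp-room ( n -- )        \ abort instead of overflowing
    fp-room? 0= abort" FP stack would exceed the hardware slots" ;
```

Going by the stack comments, DERIVS enters with three items and peaks at six, so a `3 ?fp-room` placed just before the call would pass when only x, y and z are on the fp stack, and would abort if the caller already held more than two other items. A check like this costs time on every call, so it is more of a debugging aid than something to leave in an inner loop.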

> I have also looked at VFX 5.11 which uses XMM registers instead of the
> FP stack, but it does not inline FP operations, so you mostly see a long
> sequence of calls.
>

--
Krishna

Re: Floating point implementations on AMD64

<uvhp6l$3sr31$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=26686&group=comp.lang.forth#26686

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: krishna.myneni@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Sun, 14 Apr 2024 18:34:45 -0500
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <uvhp6l$3sr31$2@dont-email.me>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<uvf5i9$38nl1$1@dont-email.me> <uvhcnp$3qbqq$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 15 Apr 2024 01:34:46 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="393420cb1d124b4d02cb54a56910ed6a";
logging-data="4090977"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/lNAd4z70K8t0+FbBFWMQe"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:xA5R5iRpRqV8Zcf9pwq8AXcsmBI=
In-Reply-To: <uvhcnp$3qbqq$1@dont-email.me>
Content-Language: en-US
 by: Krishna Myneni - Sun, 14 Apr 2024 23:34 UTC

On 4/14/24 15:02, Stephen Pelc wrote:
> On 14 Apr 2024 at 01:47:20 CEST, "Krishna Myneni" <krishna.myneni@ccreweb.org>
> wrote:
>
>> On 4/13/24 12:55, Anton Ertl wrote:
>>> I just looked at the floating-point implementations of recent
>>> SwiftForth and VFX (finally present in the system from the start), and
>>> on iForth-5.1-mini (for comparison):
....
>>>
>>> So apparently the 8 hardware FP stack items are enough for SwiftForth
>>> and VFX, while iForth prefers to use an FP stack in memory to allow
>>> for a deeper FP stack.
>>>
>> ...
>>
>> For me, an 8 item hardware fp stack limit is too limiting to be useful.
>> This is mostly because of my use of the fp stack for initializing tables
>> (arrays and matrices), and my coding style of returning more than 8
>> floats on the fp stack for some types of computation. No doubt one can
>> limit themselves to an 8-item fp stack, but I'd hate to have to code with
>> such a limit.
>
> The manual (gasp) documents how to change the default FP package.
>

Good to know.

--
Krishna

Re: Floating point implementations on AMD64

<661c84e4$1@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26687&group=comp.lang.forth#26687

Date: Mon, 15 Apr 2024 11:37:39 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<uvf5i9$38nl1$1@dont-email.me> <uvhcnp$3qbqq$1@dont-email.me>
Content-Language: en-GB
From: dxforth@gmail.com (dxf)
In-Reply-To: <uvhcnp$3qbqq$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661c84e4$1@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 90
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Mon, 15 Apr 2024 01:37 UTC

On 15/04/2024 6:02 am, Stephen Pelc wrote:
> On 14 Apr 2024 at 01:47:20 CEST, "Krishna Myneni" <krishna.myneni@ccreweb.org>
> wrote:
>
>> On 4/13/24 12:55, Anton Ertl wrote:
>>> I just looked at the floating-point implementations of recent
>>> SwiftForth and VFX (finally present in the system from the start), and
>>> on iForth-5.1-mini (for comparison):
>>>
>>> 1 FLOATS .
>>>
>>> reports:
>>>
>>> 16 iforth
>>> 10 sf64
>>> 10 vfx64
>>>
>>> For
>>>
>>> : foo f+ f* ;
>>>
>>> the resulting code is:
>>>
>>> SwiftForth x64-Linux 4.0.0-RC87 24-Mar-2024
>>> : foo f+ f* ; ok
>>> see foo
>>> 44E8B9 ST(0) ST(1) FADDP DEC1
>>> 44E8BB ST(0) ST(1) FMULP DEC9
>>> 44E8BD RET C3 ok
>>>
>>>
>>> VFX Forth 64 5.43 [build 0199] 2023-11-09 for Linux x64
>>> © MicroProcessor Engineering Ltd, 1998-2023
>>>
>>> : foo f+ f* ; ok
>>> see foo
>>> FOO
>>> ( 0050A250 DEC1 ) FADDP ST(1), ST
>>> ( 0050A252 DEC9 ) FMULP ST(1), ST
>>> ( 0050A254 C3 ) RET/NEXT
>>> ( 5 bytes, 3 instructions )
>>>
>>>
>>> iForth:
>>> $10226000 : foo                          488BC04883ED088F4500 H.@H.m..E.
>>> $1022600A fld [r13 0 +] tbyte            41DB6D00 A[m.
>>> $1022600E fld [r13 #16 +] tbyte          41DB6D10 A[m.
>>> $10226012 fxch ST(2)                     D9CA YJ
>>> $10226014 lea r13, [r13 #32 +] qword     4D8D6D20 M.m
>>> $10226018 faddp ST(1), ST                DEC1 ^A
>>> $1022601A fxch ST(1)                     D9C9 YI
>>> $1022601C fpopswap,                      41DB6D00D9CA4D8D6D10 A[m.YJM.m.
>>> $10226026 fmulp ST(1), ST                DEC9 ^I
>>> $10226028 fpush,                         4D8D6DF0D9C941DB7D00 M.mpYIA[}.
>>> $10226032 ;                              488B45004883C508FFE0 H.E.H.E..` ok
>>>
>>> So apparently the 8 hardware FP stack items are enough for SwiftForth
>>> and VFX, while iForth prefers to use an FP stack in memory to allow
>>> for a deeper FP stack.
>>>
>> ...
>>
>> For me, an 8 item hardware fp stack limit is too limiting to be useful.
>> This is mostly because of my use of the fp stack for initializing tables
>> (arrays and matrices), and my coding style of returning more than 8
>> floats on the fp stack for some types of computation. No doubt one can
>> limit themselves to an 8-item fp stack, but I'd hate to have to code with
>> such a limit.
>
> The manual (gasp) documents how to change the default FP package.
>
> Changing the default pack also changes the system call interfaces to
> match.

Specifically chapter 14 in the PDF doc.

integers
remove-FP-pack
include Lib/x64/Hfpx64

swaps in the 80-bit external stack model.

The HTML doc appears to lack this information (or it is hard to find there),
should a user select that version by mistake.

Re: Floating point implementations on AMD64

<661c8bf4@news.ausics.net>

https://news.novabbs.org/devel/article-flat.php?id=26688&group=comp.lang.forth#26688

Date: Mon, 15 Apr 2024 12:07:47 +1000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Floating point implementations on AMD64
Content-Language: en-GB
Newsgroups: comp.lang.forth
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<2024Apr14.103435@mips.complang.tuwien.ac.at> <661baa6c$1@news.ausics.net>
<2024Apr14.132507@mips.complang.tuwien.ac.at> <661bdb9b$1@news.ausics.net>
<2024Apr14.175340@mips.complang.tuwien.ac.at>
From: dxforth@gmail.com (dxf)
In-Reply-To: <2024Apr14.175340@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: news.ausics.net
Message-ID: <661c8bf4@news.ausics.net>
Organization: Ausics - https://newsgroups.ausics.net
Lines: 45
X-Complaints: abuse@ausics.net
Path: i2pn2.org!i2pn.org!news.bbs.nz!news.ausics.net!not-for-mail
 by: dxf - Mon, 15 Apr 2024 02:07 UTC

On 15/04/2024 1:53 am, Anton Ertl wrote:
> dxf <dxforth@gmail.com> writes:
>> On 14/04/2024 9:25 pm, Anton Ertl wrote:
>>> Kahan writes about the original intention in
>>>
>>> http://web.archive.org/web/20170118054747/https://cims.nyu.edu/~dbindel/class/cs279/87stack.pdf
>>>
>>> especially starting at the last paragraph of page 2.
>>>
>>> And about the bug (or rather design mistake):
>>>
>>> https://history.siam.org/pdfs2/Kahan_final.pdf
>>>
>>> Start with the second-to-last paragraph on page 163. He digresses for
>>> a page, but continues on the fourth paragraph of page 165 and
>>> continues to the first paragraph of page 168.
>>
>> The latter sounds like someone not getting his way more than a design mistake.
>> In the first reference Kahan states:
>>
>> "When the 8087 was designed, I knew that stack over/underflow was an issue of
>> more aesthetic than practical importance. I still regret that the 8087's stack
>> implementation was not quite so neat as my original intention described in the
>> accompanying note."
>>
>> Intel decided Kahan's aesthetic afterthought could be dispensed with.
>
> In a way, they did, and Kahan obviously did not get his way. But to
> me it sounds like they tried and failed at implementing a stack that
> extends into memory. The tags that indicate the presence of a stack
> item are there. If they had made a conscious decision at the start to
> dispense with the idea of an extensible stack, they would have
> dispensed with these bits indicating the presence of a stack item as
> well. So what happened is that they botched the first attempt, and
> then decided that they did not want to do what would have been
> necessary to fix it.

My impression is Palmer (the mathematician Intel hired to co-head the
project) was trying to placate Kahan and it fell through for various
reasons.

The design criterion that never changed was the 8-level hardware stack.
Forthers can either accept it for best performance - or pick something
more forgiving at a lesser performance.

Re: Floating point implementations on AMD64

<uvipj7$6irt$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=26689&group=comp.lang.forth#26689

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: krishna.myneni@ccreweb.org (Krishna Myneni)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Mon, 15 Apr 2024 03:47:33 -0500
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <uvipj7$6irt$1@dont-email.me>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at>
<27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com>
<2024Apr14.103435@mips.complang.tuwien.ac.at> <661baa6c$1@news.ausics.net>
<2024Apr14.132507@mips.complang.tuwien.ac.at> <661bdb9b$1@news.ausics.net>
<2024Apr14.175340@mips.complang.tuwien.ac.at> <661c8bf4@news.ausics.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 15 Apr 2024 10:47:35 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="393420cb1d124b4d02cb54a56910ed6a";
logging-data="215933"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LlC6A/kTvZI1QiiZeJ+fT"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:EbtthPYBnLigJCGeKOoR6gD37CU=
Content-Language: en-US
In-Reply-To: <661c8bf4@news.ausics.net>
 by: Krishna Myneni - Mon, 15 Apr 2024 08:47 UTC

On 4/14/24 21:07, dxf wrote:
....
> The design criterion that never changed was the 8-level hardware stack.
> Forthers can either accept it for best performance - or pick something
> more forgiving at a lesser performance.
>

In the Lorenz equation example, which works with the 8 deep fpu stack,
we have assumed that the fpu hardware stack was empty before calling
DERIVS. In a real use case, the call to DERIVS is likely to occur within
a deeper call chain, resulting in items already on the fpu stack before
args for DERIVS are pushed. As Marcel said, using only a hardware-based
fp stack is not realistic for any non-trivial floating point work.

The loss of performance with a memory-based fp stack is far less a
concern than having to consider the limited stack depth when writing
code involving floating point arithmetic. Failure from overflowing the
fpu stack is silent. Debugging is likely to be a nightmare.
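
A quick probe along these lines can show which behaviour a given system has. It is a sketch in standard Forth, not taken from any of the systems above; the word names .FP-STACK-SIZE, PUSH9 and POP9 are made up for illustration. It assumes the system answers the FLOATING-STACK environment query (a system may decline to) and that FDEPTH and F. are available.

```forth
\ Sketch only: ask for the FP stack size, then push one float more
\ than an 8-register x87 stack can hold and see what comes back.

: .FP-STACK-SIZE ( -- )
  S" FLOATING-STACK" ENVIRONMENT? IF
    ." FLOATING-STACK reports " . ." items"
  ELSE
    ." FLOATING-STACK query not answered"
  THEN ;

: PUSH9 ( F: -- r1 ... r9 )  1e 2e 3e 4e 5e 6e 7e 8e 9e ;
: POP9  ( F: r1 ... r9 -- )  9 0 DO F. LOOP ;

CR .FP-STACK-SIZE
PUSH9 CR FDEPTH .     \ depth as the system sees it
CR POP9               \ values actually recovered
```

On a system with a memory-backed FP stack this should report a depth of 9 and give back the values 9 through 1. On a hardware-only stack the outcome depends on the system. On the x87 itself, with the invalid-operation exception masked, a push onto a full register stack stores a NaN instead of the pushed value, so at least one result is likely to come back wrong without any error being raised.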

--
Krishna

Re: Floating point implementations on AMD64

<3e419396b1ee93c7a391a7ffc0e44ed8@www.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=26690&group=comp.lang.forth#26690

Path: i2pn2.org!.POSTED!not-for-mail
From: minforth@gmx.net (minforth)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Mon, 15 Apr 2024 09:35:22 +0000
Organization: novaBBS
Message-ID: <3e419396b1ee93c7a391a7ffc0e44ed8@www.novabbs.com>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <27089a13c7ce61da7ffb927cb6c365d2@www.novabbs.com> <2024Apr14.103435@mips.complang.tuwien.ac.at> <661baa6c$1@news.ausics.net> <2024Apr14.132507@mips.complang.tuwien.ac.at> <661bdb9b$1@news.ausics.net> <2024Apr14.175340@mips.complang.tuwien.ac.at> <661c8bf4@news.ausics.net> <uvipj7$6irt$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1240887"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: d2a19558f194e2f1f8393b8d9be9ef51734a4da3
X-Rslight-Site: $2y$10$1vQS1xrv/qY1tqW9Koj53.X6vndi9mvytd76pbCnpmbPUdrJyNYw.
 by: minforth - Mon, 15 Apr 2024 09:35 UTC

In most cases 'bigger' fp data will be stored in memory anyhow,
which can be cached before disk access. The old 8087 improvements
were caused by its new fp operators, the stack was unusable.

And if CPU based stacks were so lucrative for high performance,
CPU makers would have implemented them long ago for normal
integer data.

Re: Floating point implementations on AMD64

<nnd$405cd711$3a67ac71@29f5e356f3e61d54>

https://news.novabbs.org/devel/article-flat.php?id=26691&group=comp.lang.forth#26691

Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <661c8bf4@news.ausics.net> <uvipj7$6irt$1@dont-email.me> <3e419396b1ee93c7a391a7ffc0e44ed8@www.novabbs.com>
From: albert@spenarnc.xs4all.nl
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$405cd711$3a67ac71@29f5e356f3e61d54>
Organization: KPN B.V.
Date: Mon, 15 Apr 2024 12:37:11 +0200
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!tr1.iad1.usenetexpress.com!feeder.usenetexpress.com!tr2.eu1.usenetexpress.com!2001:67c:174:101:1:67:202:6.MISMATCH!feed.abavia.com!abe006.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 20
Injection-Date: Mon, 15 Apr 2024 12:37:11 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
 by: albert@spenarnc.xs4all.nl - Mon, 15 Apr 2024 10:37 UTC

In article <3e419396b1ee93c7a391a7ffc0e44ed8@www.novabbs.com>,
minforth <minforth@gmx.net> wrote:
>In most cases 'bigger' fp data will be stored in memory anyhow,
>which can be cached before disk access. The old 8087 improvements
>were caused by its new fp operators, the stack was unusable.
>
>And if CPU based stacks were so lucrative for high performance,
>CPU makers would have implemented them long ago for normal
>integer data.

The iA64 comes to mind. Apparently a failure, but was it a
technical or a commercial failure?

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat purring. - the Wise from Antrim -

Re: Floating point implementations on AMD64

<ed271322dea977db293c6e144a31b2f2@www.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=26692&group=comp.lang.forth#26692

Path: i2pn2.org!.POSTED!not-for-mail
From: minforth@gmx.net (minforth)
Newsgroups: comp.lang.forth
Subject: Re: Floating point implementations on AMD64
Date: Mon, 15 Apr 2024 11:37:00 +0000
Organization: novaBBS
Message-ID: <ed271322dea977db293c6e144a31b2f2@www.novabbs.com>
References: <2024Apr13.195518@mips.complang.tuwien.ac.at> <661c8bf4@news.ausics.net> <uvipj7$6irt$1@dont-email.me> <3e419396b1ee93c7a391a7ffc0e44ed8@www.novabbs.com> <nnd$405cd711$3a67ac71@29f5e356f3e61d54>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1251256"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: d2a19558f194e2f1f8393b8d9be9ef51734a4da3
X-Rslight-Site: $2y$10$JJI7eg8FoK89t4GIpgsEMORyGBoOnZYegxKEf9UzALrKrWRC5Kw3W
 by: minforth - Mon, 15 Apr 2024 11:37 UTC

albert@spenarnc.xs4all.nl wrote:

> In article <3e419396b1ee93c7a391a7ffc0e44ed8@www.novabbs.com>,
> minforth <minforth@gmx.net> wrote:
>>In most cases 'bigger' fp data will be stored in memory anyhow,
>>which can be cached before disk access. The old 8087 improvements
>>were caused by its new fp operators, the stack was unusable.
>>
>>And if CPU based stacks were so lucrative for high performance,
>>CPU makers would have implemented them long ago for normal
>>integer data.

> The iA64 comes to mind. Apparently a failure, but was it a
> technical or a commercial failure?

Both, i.e. a poor developer tool stack and strong AMD competition.
And it was overly complex.
https://softwareengineering.stackexchange.com/questions/279334/why-was-the-itanium-processor-difficult-to-write-a-compiler-for
