Rocksolid Light

Welcome to Rocksolid Light

mail  files  register  newsreader  groups  login

Message-ID:  

Dinosaurs aren't extinct. They've just learned to hide in the trees.


devel / comp.arch / Re: AMD CPU funny

SubjectAuthor
* Re: AMD CPU funnyTheo
+- Re: AMD CPU funnyMitchAlsup
`* Re: AMD CPU funnyVir Campestris
 +* Re: AMD CPU funnyScott Lurndal
 |`* Re: AMD CPU funnyVir Campestris
 | +* Re: AMD CPU funnyMichael S
 | |`* Re: AMD CPU funnyMitchAlsup
 | | `* Re: AMD CPU funnyMichael S
 | |  `- Re: AMD CPU funnyMitchAlsup
 | `* Re: AMD CPU funnyAndy Burns
 |  `* Re: AMD CPU funnyChris M. Thomasson
 |   `- Re: AMD CPU funnyChris M. Thomasson
 +* Re: AMD CPU funnyMitchAlsup
 |`* Re: AMD CPU funnyVir Campestris
 | `* Re: AMD CPU funnyTheo
 |  `* Re: AMD CPU funnyMitchAlsup
 |   `* Re: AMD CPU funnyTheo
 |    +- Re: AMD CPU funnyMitchAlsup
 |    `* Re: AMD CPU funnyVir Campestris
 |     `* Re: AMD CPU funnyThomas Koenig
 |      `* Re: AMD CPU funnyDavid Brown
 |       `* Re: AMD CPU funnyVir Campestris
 |        +- Re: AMD CPU funnyDavid Brown
 |        `* Re: AMD CPU funnyMichael S
 |         `* Re: AMD CPU funnyVir Campestris
 |          `* Re: AMD CPU funnyAndy Burns
 |           `* Re: AMD CPU funnyTerje Mathisen
 |            `* Re: AMD CPU funnyBGB
 |             `* Re: AMD CPU funnyMitchAlsup
 |              +* Re: AMD CPU funnyBGB
 |              |+- Re: AMD CPU funnyMitchAlsup
 |              |`* Re: AMD CPU funnyPancho
 |              | +* Re: AMD CPU funnyDaniel James
 |              | |+* Re: AMD CPU funnyPancho
 |              | ||+* Re: AMD CPU funnyTerje Mathisen
 |              | |||+* Re: AMD CPU funnyPancho
 |              | ||||`- Re: AMD CPU funnyMitchAlsup
 |              | |||`- Re: AMD CPU funnyMitchAlsup
 |              | ||+- Re: AMD CPU funnyPancho
 |              | ||`* Re: AMD CPU funnyDaniel James
 |              | || +* Re: AMD CPU funnyTim Rentsch
 |              | || |`* Re: AMD CPU funnyTerje Mathisen
 |              | || | `- Re: AMD CPU funnyTim Rentsch
 |              | || `* Re: AMD CPU funnyPancho
 |              | ||  `- Re: AMD CPU funnyTim Rentsch
 |              | |+* Re: AMD CPU funnyMitchAlsup
 |              | ||`- Re: AMD CPU funnyTerje Mathisen
 |              | |`* Re: AMD CPU funnyTim Rentsch
 |              | | +* Re: AMD CPU funnyTerje Mathisen
 |              | | |+- Re: AMD CPU funnyMitchAlsup
 |              | | |`* Re: AMD CPU funnyTim Rentsch
 |              | | | `* Re: AMD CPU funnyTerje Mathisen
 |              | | |  `- Re: AMD CPU funnyTim Rentsch
 |              | | `* Re: AMD CPU funnyBGB
 |              | |  `- Re: AMD CPU funnyTim Rentsch
 |              | `* Re: AMD CPU funnyTim Rentsch
 |              |  `* Re: AMD CPU funnyThomas Koenig
 |              |   +* Re: AMD CPU funnyMitchAlsup
 |              |   |+* Re: AMD CPU funnyTerje Mathisen
 |              |   ||`- Re: AMD CPU funnyTim Rentsch
 |              |   |`* Re: AMD CPU funnyThomas Koenig
 |              |   | +- Re: AMD CPU funnyMitchAlsup
 |              |   | `- Re: AMD CPU funnyTerje Mathisen
 |              |   `* Re: AMD CPU funnyTim Rentsch
 |              |    +* Re: AMD CPU funnyTerje Mathisen
 |              |    |`* Re: AMD CPU funnyTim Rentsch
 |              |    | +* Re: AMD CPU funnyMitchAlsup
 |              |    | |`- Re: AMD CPU funnyTim Rentsch
 |              |    | `* Re: AMD CPU funnyTerje Mathisen
 |              |    |  +* Re: AMD CPU funnyMitchAlsup
 |              |    |  |+* Re: AMD CPU funnyTerje Mathisen
 |              |    |  ||`* Re: AMD CPU funnyMitchAlsup
 |              |    |  || +- Re: AMD CPU funnyTim Rentsch
 |              |    |  || `* Re: AMD CPU funnyTerje Mathisen
 |              |    |  ||  `* Re: AMD CPU funnyTerje Mathisen
 |              |    |  ||   +* Re: AMD CPU funnyTim Rentsch
 |              |    |  ||   |`* Re: AMD CPU funnyTerje Mathisen
 |              |    |  ||   | `* Re: AMD CPU funnyTim Rentsch
 |              |    |  ||   |  `* Re: AMD CPU funnyTerje Mathisen
 |              |    |  ||   |   `- Re: AMD CPU funnyTim Rentsch
 |              |    |  ||   `* Re: AMD CPU funnyMitchAlsup
 |              |    |  ||    `* Re: AMD CPU funnyMitchAlsup
 |              |    |  ||     `- Re: AMD CPU funnyTim Rentsch
 |              |    |  |`- Re: AMD CPU funnyTim Rentsch
 |              |    |  `* Re: AMD CPU funnyTim Rentsch
 |              |    |   `* Re: AMD CPU funnyTerje Mathisen
 |              |    |    `* Re: AMD CPU funnyTim Rentsch
 |              |    |     `- Re: AMD CPU funnyTerje Mathisen
 |              |    `* Re: AMD CPU funnyThomas Koenig
 |              |     +* Re: AMD CPU funnyTim Rentsch
 |              |     |`- Re: AMD CPU funnyTerje Mathisen
 |              |     `* Re: AMD CPU funnyMitchAlsup
 |              |      `* Re: AMD CPU funnyPaul A. Clayton
 |              |       +* Re: AMD CPU funnyAnton Ertl
 |              |       |+- Re: AMD CPU funnyMitchAlsup
 |              |       |`* Re: AMD CPU funnyGeorge Neuner
 |              |       | `- Re: AMD CPU funnyAnton Ertl
 |              |       `- Re: AMD CPU funnyBGB-Alt
 |              `- Re: AMD CPU funnyTerje Mathisen
 `* Re: AMD CPU funnyAndy Burns
  +* Re: AMD CPU funnyAndy Burns
  `- Re: AMD CPU funnyMitchAlsup

Pages:12345
Re: AMD CPU funny

<kv8gt0Fca4nU6@mid.individual.net>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36394&group=comp.arch#36394

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: usenet@andyburns.uk (Andy Burns)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 18:19:09 +0000
Lines: 23
Message-ID: <kv8gt0Fca4nU6@mid.individual.net>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net wZty+9eg0ifhuwclqeLhmA11upuJb66WESc8YYobHpPDLBcWMd
Cancel-Lock: sha1:gRi0kAGt+oAq+9EgOtZd2jMJ10k= sha256:M5uX4+n6HVxxxRgcZTjTQgMR42+V+JBmGAJTUGoZkio=
User-Agent: Mozilla Thunderbird
Content-Language: en-GB
In-Reply-To: <umn15d$svun$6@dont-email.me>
 by: Andy Burns - Fri, 29 Dec 2023 18:19 UTC

Vir Campestris wrote:

> It is a microbenchmark.
> I was trying to understand cache performance on my system.
> Over in comp.lang.c++ you'll find a thread about Sieve or Eratosthenes.
>
> I and another poster are trying to optimise it. Not for any good reason
> of course... but I was just curious about some of the results I was
> getting.

A few years ago this guy popped-up on youtube, saying "I was the author
of Task Manager on Windows NT", I ignored the recommendation for quite a
while thinking he just wanted people to blow smoke up his arse,
eventually I watched and he seems a decent guy ... anyway he did a drag
racing series using sieve prime algorithm

<https://www.youtube.com/playlist?list=PLF2KJ6Gy3cZ5Er-1eF9fN1Hgw_xkoD9V1>

Others have contributed in all manner of languages

<https://github.com/PlummersSoftwareLLC/Primes>

Re: AMD CPU funny

<kv8s49Fg7k5U3@mid.individual.net>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36401&group=comp.arch#36401

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: usenet@andyburns.uk (Andy Burns)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 21:30:45 +0000
Lines: 43
Message-ID: <kv8s49Fg7k5U3@mid.individual.net>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net G32lDj523VoUjcPVR0QjgANlKRvymCsDgIbcc47WCbTH3zDOyd
Cancel-Lock: sha1:YcgMjVdy4TlfUaWCE/2DNRLdJgQ= sha256:KD4Cd3fm8leV/PKwRI3Z11K5+rFL53kyT1Q0SqUnq24=
User-Agent: Mozilla Thunderbird
Content-Language: en-GB
In-Reply-To: <um1h21$13l52$1@dont-email.me>
 by: Andy Burns - Fri, 29 Dec 2023 21:30 UTC

Vir Campestris wrote:

> The code I have put in a signature block; there's no point in risking
> someone posting it again. I've commented it, but no doubt not in all the
> right places! I'd be interested to know what results other people get.

Built and run under Win11 with g++ 13.2.0
on a laptop with
i7-11370H CPU (3.3GHz base, 4.28GHz turbo)
L1 d-cache 48kB per core
L2 cache 1.2MB
L3 cache 12MB
16GB memory

Size 1 gave 5.12047e+09 bytes/second.
Size 2 gave 5.76467e+09 bytes/second.
Size 4 gave 5.78979e+09 bytes/second.
Size 8 gave 5.6642e+09 bytes/second.
Size 16 gave 5.55303e+09 bytes/second.
Size 32 gave 4.4151e+09 bytes/second.
Size 64 gave 4.67783e+09 bytes/second.
Size 128 gave 4.85115e+09 bytes/second.
Size 256 gave 4.88147e+09 bytes/second.
Size 512 gave 4.5214e+09 bytes/second.
Size 1024 gave 4.64794e+09 bytes/second.
Size 2048 gave 4.69149e+09 bytes/second.
Size 4096 gave 4.69416e+09 bytes/second.
Size 8192 gave 4.73425e+09 bytes/second.
Size 16384 gave 4.76353e+09 bytes/second.
Size 32768 gave 4.70864e+09 bytes/second.
Size 65536 gave 4.75244e+09 bytes/second.
Size 131072 gave 4.76452e+09 bytes/second.
Size 262144 gave 4.69839e+09 bytes/second.
Size 524288 gave 4.71094e+09 bytes/second.
Size 1048576 gave 4.62922e+09 bytes/second.
Size 2097152 gave 4.55435e+09 bytes/second.
Size 4194304 gave 4.58315e+09 bytes/second.
Size 8388608 gave 4.55917e+09 bytes/second.
Size 16777216 gave 4.60661e+09 bytes/second.
Size 33554432 gave 4.63509e+09 bytes/second.
Size 67108864 gave 4.62349e+09 bytes/second.
Size 134217728 gave 4.62406e+09 bytes/second.

Re: AMD CPU funny

<kv8u2tFg7k5U6@mid.individual.net>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36403&group=comp.arch#36403

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: usenet@andyburns.uk (Andy Burns)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 22:04:09 +0000
Lines: 62
Message-ID: <kv8u2tFg7k5U6@mid.individual.net>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<kv8s49Fg7k5U3@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net HqTK3aT2fBRZF2TBRsltWgikW4sCgiS+hi/fLh06403XBkfBYa
Cancel-Lock: sha1:737sFrI1jA6JMwU9jHps3y/v4DY= sha256:Zx8zFzuG44qS0cuPTW4GoxVKDWKgUPx3ZU8DsnI/gZo=
User-Agent: Mozilla Thunderbird
Content-Language: en-GB
In-Reply-To: <kv8s49Fg7k5U3@mid.individual.net>
 by: Andy Burns - Fri, 29 Dec 2023 22:04 UTC

Andy Burns wrote:

> Size 1 gave 5.12047e+09 bytes/second.
> Size 2 gave 5.76467e+09 bytes/second.
> Size 4 gave 5.78979e+09 bytes/second.
> Size 8 gave 5.6642e+09 bytes/second.
> Size 16 gave 5.55303e+09 bytes/second.
> Size 32 gave 4.4151e+09 bytes/second.
> Size 64 gave 4.67783e+09 bytes/second.
> Size 128 gave 4.85115e+09 bytes/second.
> Size 256 gave 4.88147e+09 bytes/second.
> Size 512 gave 4.5214e+09 bytes/second.
> Size 1024 gave 4.64794e+09 bytes/second.
> Size 2048 gave 4.69149e+09 bytes/second.
> Size 4096 gave 4.69416e+09 bytes/second.
> Size 8192 gave 4.73425e+09 bytes/second.
> Size 16384 gave 4.76353e+09 bytes/second.
> Size 32768 gave 4.70864e+09 bytes/second.
> Size 65536 gave 4.75244e+09 bytes/second.
> Size 131072 gave 4.76452e+09 bytes/second.
> Size 262144 gave 4.69839e+09 bytes/second.
> Size 524288 gave 4.71094e+09 bytes/second.
> Size 1048576 gave 4.62922e+09 bytes/second.
> Size 2097152 gave 4.55435e+09 bytes/second.
> Size 4194304 gave 4.58315e+09 bytes/second.
> Size 8388608 gave 4.55917e+09 bytes/second.
> Size 16777216 gave 4.60661e+09 bytes/second.
> Size 33554432 gave 4.63509e+09 bytes/second.
> Size 67108864 gave 4.62349e+09 bytes/second.
> Size 134217728 gave 4.62406e+09 bytes/second.

and (perversely?) with -Ofast

Size 1 gave 4.64966e+09 bytes/second.
Size 2 gave 9.39481e+09 bytes/second.
Size 4 gave 1.59656e+10 bytes/second.
Size 8 gave 2.6789e+10 bytes/second.
Size 16 gave 4.42999e+10 bytes/second.
Size 32 gave 4.51076e+10 bytes/second.
Size 64 gave 4.90103e+10 bytes/second.
Size 128 gave 4.08414e+10 bytes/second.
Size 256 gave 4.36978e+10 bytes/second.
Size 512 gave 4.54119e+10 bytes/second.
Size 1024 gave 4.50594e+10 bytes/second.
Size 2048 gave 4.71622e+10 bytes/second.
Size 4096 gave 4.60261e+10 bytes/second.
Size 8192 gave 3.84764e+10 bytes/second.
Size 16384 gave 3.90878e+10 bytes/second.
Size 32768 gave 3.76718e+10 bytes/second.
Size 65536 gave 3.87339e+10 bytes/second.
Size 131072 gave 3.89058e+10 bytes/second.
Size 262144 gave 3.85662e+10 bytes/second.
Size 524288 gave 4.19209e+10 bytes/second.
Size 1048576 gave 3.00962e+10 bytes/second.
Size 2097152 gave 1.37616e+10 bytes/second.
Size 4194304 gave 1.4136e+10 bytes/second.
Size 8388608 gave 1.39413e+10 bytes/second.
Size 16777216 gave 1.35747e+10 bytes/second.
Size 33554432 gave 1.37286e+10 bytes/second.
Size 67108864 gave 1.38163e+10 bytes/second.
Size 134217728 gave 1.29987e+10 bytes/second.

Re: AMD CPU funny

<4d93fd941f244cf815672ffc126cf485@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36404&group=comp.arch#36404

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 22:02:36 +0000
Organization: novaBBS
Message-ID: <4d93fd941f244cf815672ffc126cf485@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <kv8s49Fg7k5U3@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1591301"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$qrd2TM/s1/53OLXIpmV/IOFAf9RiYq.1fI8Gh6mejSr/y1qoEW9fi
 by: MitchAlsup - Fri, 29 Dec 2023 22:02 UTC

Andy Burns wrote:

> Vir Campestris wrote:

>> The code I have put in a signature block; there's no point in risking
>> someone posting it again. I've commented it, but no doubt not in all the
>> right places! I'd be interested to know what results other people get.

> Built and run under Win11 with g++ 13.2.0
> on a laptop with
> i7-11370H CPU (3.3GHz base, 4.28GHz turbo)
> L1 d-cache 48kB per core
> L2 cache 1.2MB
> L3 cache 12MB
> 16GB memory

> Size 1 gave 5.12047e+09 bytes/second.
> Size 2 gave 5.76467e+09 bytes/second.
> Size 4 gave 5.78979e+09 bytes/second.
> Size 8 gave 5.6642e+09 bytes/second.
> Size 16 gave 5.55303e+09 bytes/second.
> Size 32 gave 4.4151e+09 bytes/second.
> Size 64 gave 4.67783e+09 bytes/second.
> Size 128 gave 4.85115e+09 bytes/second.
> Size 256 gave 4.88147e+09 bytes/second.
> Size 512 gave 4.5214e+09 bytes/second.
> Size 1024 gave 4.64794e+09 bytes/second.
> Size 2048 gave 4.69149e+09 bytes/second.
> Size 4096 gave 4.69416e+09 bytes/second.
> Size 8192 gave 4.73425e+09 bytes/second.
> Size 16384 gave 4.76353e+09 bytes/second.
> Size 32768 gave 4.70864e+09 bytes/second.
> Size 65536 gave 4.75244e+09 bytes/second.
> Size 131072 gave 4.76452e+09 bytes/second.
> Size 262144 gave 4.69839e+09 bytes/second.
> Size 524288 gave 4.71094e+09 bytes/second.
> Size 1048576 gave 4.62922e+09 bytes/second.
> Size 2097152 gave 4.55435e+09 bytes/second.
> Size 4194304 gave 4.58315e+09 bytes/second.
> Size 8388608 gave 4.55917e+09 bytes/second.
> Size 16777216 gave 4.60661e+09 bytes/second.
> Size 33554432 gave 4.63509e+09 bytes/second.
> Size 67108864 gave 4.62349e+09 bytes/second.
> Size 134217728 gave 4.62406e+09 bytes/second.

This data set has a 5/4 ratio best to worst with noise.

Re: AMD CPU funny

<kv8u91Fg7k5U7@mid.individual.net>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36405&group=comp.arch#36405

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: usenet@andyburns.uk (Andy Burns)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 22:07:25 +0000
Lines: 7
Message-ID: <kv8u91Fg7k5U7@mid.individual.net>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<kv8s49Fg7k5U3@mid.individual.net> <kv8u2tFg7k5U6@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net CAgyhizGm3DNhZn5Wu7vTwQI+IbNrfS8rvbqKjuZnwxUUZSvaS
Cancel-Lock: sha1:m2rnKccJFzOl/RkHFnSQvp1xbEg= sha256:HUMrOkFrcCCC/TwX7PqPYIrDC9G1NP6QMPQRE1SVV1s=
User-Agent: Mozilla Thunderbird
Content-Language: en-GB
In-Reply-To: <kv8u2tFg7k5U6@mid.individual.net>
 by: Andy Burns - Fri, 29 Dec 2023 22:07 UTC

Andy Burns wrote:

> and (perversely?) with -Ofast

I must look at the exponent, not just the mantissa
I must look at the exponent, not just the mantissa
I must look at the exponent, not just the mantissa

Re: AMD CPU funny

<5e3d180fa0e5caa113e9b9be8cb8276c@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36408&group=comp.arch#36408

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 23:59:44 +0000
Organization: novaBBS
Message-ID: <5e3d180fa0e5caa113e9b9be8cb8276c@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <kv8s49Fg7k5U3@mid.individual.net> <kv8u2tFg7k5U6@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1600928"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$j1j/s8924lHomeAiralw/OdALk/eFxdNAGl71rf9sidsrO7XihK7u
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0
 by: MitchAlsup - Fri, 29 Dec 2023 23:59 UTC

Andy Burns wrote:

> Andy Burns wrote:

>> Size 1 gave 5.12047e+09 bytes/second.
>> Size 2 gave 5.76467e+09 bytes/second.
>> Size 4 gave 5.78979e+09 bytes/second.
>> Size 8 gave 5.6642e+09 bytes/second.
>> Size 16 gave 5.55303e+09 bytes/second.
>> Size 32 gave 4.4151e+09 bytes/second.
>> Size 64 gave 4.67783e+09 bytes/second.
>> Size 128 gave 4.85115e+09 bytes/second.
>> Size 256 gave 4.88147e+09 bytes/second.
>> Size 512 gave 4.5214e+09 bytes/second.
>> Size 1024 gave 4.64794e+09 bytes/second.
>> Size 2048 gave 4.69149e+09 bytes/second.
>> Size 4096 gave 4.69416e+09 bytes/second.
>> Size 8192 gave 4.73425e+09 bytes/second.
>> Size 16384 gave 4.76353e+09 bytes/second.
>> Size 32768 gave 4.70864e+09 bytes/second.
>> Size 65536 gave 4.75244e+09 bytes/second.
>> Size 131072 gave 4.76452e+09 bytes/second.
>> Size 262144 gave 4.69839e+09 bytes/second.
>> Size 524288 gave 4.71094e+09 bytes/second.
>> Size 1048576 gave 4.62922e+09 bytes/second.
>> Size 2097152 gave 4.55435e+09 bytes/second.
>> Size 4194304 gave 4.58315e+09 bytes/second.
>> Size 8388608 gave 4.55917e+09 bytes/second.
>> Size 16777216 gave 4.60661e+09 bytes/second.
>> Size 33554432 gave 4.63509e+09 bytes/second.
>> Size 67108864 gave 4.62349e+09 bytes/second.
>> Size 134217728 gave 4.62406e+09 bytes/second.

> and (perversely?) with -Ofast
{ A
> Size 1 gave 4.64966e+09 bytes/second.
> Size 2 gave 9.39481e+09 bytes/second.
> Size 4 gave 1.59656e+10 bytes/second.
> Size 8 gave 2.6789e+10 bytes/second.
> Size 16 gave 4.42999e+10 bytes/second.
} A
{ B
> Size 32 gave 4.51076e+10 bytes/second.
> Size 64 gave 4.90103e+10 bytes/second.
> Size 128 gave 4.08414e+10 bytes/second.
> Size 256 gave 4.36978e+10 bytes/second.
> Size 512 gave 4.54119e+10 bytes/second.
> Size 1024 gave 4.50594e+10 bytes/second.
> Size 2048 gave 4.71622e+10 bytes/second.
> Size 4096 gave 4.60261e+10 bytes/second.
} B
{ C
> Size 8192 gave 3.84764e+10 bytes/second.
> Size 16384 gave 3.90878e+10 bytes/second.
> Size 32768 gave 3.76718e+10 bytes/second.
> Size 65536 gave 3.87339e+10 bytes/second.
> Size 131072 gave 3.89058e+10 bytes/second.
> Size 262144 gave 3.85662e+10 bytes/second.
> Size 524288 gave 4.19209e+10 bytes/second.
} C
{ D
> Size 1048576 gave 3.00962e+10 bytes/second.
> Size 2097152 gave 1.37616e+10 bytes/second.
> Size 4194304 gave 1.4136e+10 bytes/second.
> Size 8388608 gave 1.39413e+10 bytes/second.
> Size 16777216 gave 1.35747e+10 bytes/second.
> Size 33554432 gave 1.37286e+10 bytes/second.
> Size 67108864 gave 1.38163e+10 bytes/second.
> Size 134217728 gave 1.29987e+10 bytes/second.
} D

A displays BW follows access size

B displays essentially the same performance throughout

C displays a minor step down in performance due to L2 or TLB effects 5 to 4 ratio

D displays a major stepdown in performance at DIV 3 compared to C 3 to 1 ratio

Re: AMD CPU funny

<ump7l3$19ud7$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36411&group=comp.arch#36411

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 30 Dec 2023 14:57:55 +0100
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <ump7l3$19ud7$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 30 Dec 2023 13:57:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4f06a707ad44c362c56217ebc41338d8";
logging-data="1374631"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/4AkEBh24tAFT8oSxPVEskJvUiWpWqoVq2IVm7C6dZUA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18
Cancel-Lock: sha1:BU1ohQ1ELrqCc1C3Fr6QX8arRJw=
In-Reply-To: <kv8gt0Fca4nU6@mid.individual.net>
 by: Terje Mathisen - Sat, 30 Dec 2023 13:57 UTC

Andy Burns wrote:
> Vir Campestris wrote:
>
>> It is a microbenchmark.
>> I was trying to understand cache performance on my system.
>> Over in comp.lang.c++ you'll find a thread about Sieve or Eratosthenes.
>>
>> I and another poster are trying to optimise it. Not for any good
>> reason of course... but I was just curious about some of the results I
>> was getting.
>
> A few years ago this guy popped-up on youtube, saying "I was the author
> of Task Manager on Windows NT", I ignored the recommendation for quite a
> while thinking he just wanted people to blow smoke up his arse,
> eventually I watched and he seems a decent guy ... anyway he did a drag
> racing series using sieve prime algorithm
>
> <https://www.youtube.com/playlist?list=PLF2KJ6Gy3cZ5Er-1eF9fN1Hgw_xkoD9V1>

Dave is a Good Guy!

I have contributed to his "smallest windows program ever" with a small
tweak that saved 4 (or 6?) bytes.

The sieve algorithm used above is more advanced than the fixed modulo 30
program I wrote a ffew decades ago, but similar in approach.

I decided to use mod 30 because as soon as you get to 30+, there are
exactly 8 possible prime locations in each block, so it fits perfectly
in a byte map. :-)

1,7,11,13,17,19,23,29
>
> Others have contributed in all manner of languages
>
> <https://github.com/PlummersSoftwareLLC/Primes>

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: AMD CPU funny

<umpn9d$1c212$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36413&group=comp.arch#36413

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 30 Dec 2023 12:24:44 -0600
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <umpn9d$1c212$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 30 Dec 2023 18:24:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="7d651c7879b037f034eae6eb942eae4a";
logging-data="1443874"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+supi2TISdriOaiDRAX4rk"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:sLgp/8H6CjprA7CuDGZ5EvKWe+Y=
In-Reply-To: <ump7l3$19ud7$1@dont-email.me>
Content-Language: en-US
 by: BGB - Sat, 30 Dec 2023 18:24 UTC

On 12/30/2023 7:57 AM, Terje Mathisen wrote:
> Andy Burns wrote:
>> Vir Campestris wrote:
>>
>>> It is a microbenchmark.
>>> I was trying to understand cache performance on my system.
>>> Over in comp.lang.c++ you'll find a thread about Sieve or Eratosthenes.
>>>
>>> I and another poster are trying to optimise it. Not for any good
>>> reason of course... but I was just curious about some of the results
>>> I was getting.
>>
>> A few years ago this guy popped-up on youtube, saying "I was the
>> author of Task Manager on Windows NT", I ignored the recommendation
>> for quite a while thinking he just wanted people to blow smoke up his
>> arse, eventually I watched and he seems a decent guy ... anyway he did
>> a drag racing series using sieve prime algorithm
>>
>> <https://www.youtube.com/playlist?list=PLF2KJ6Gy3cZ5Er-1eF9fN1Hgw_xkoD9V1>
>
> Dave is a Good Guy!
>

Yes, and generally interesting videos on YouTube.

He did interviews with David Cutler and Raymond Chen as well, both
fairly big names at Microsoft, which were also kinda interesting.

> I have contributed to his "smallest windows program ever" with a small
> tweak that saved 4 (or 6?) bytes.
>
> The sieve algorithm used above is more advanced than the fixed modulo 30
> program I wrote a ffew decades ago, but similar in approach.
>
> I decided to use mod 30 because as soon as you get to 30+, there are
> exactly 8 possible prime locations in each block, so it fits perfectly
> in a byte map. :-)
>
> 1,7,11,13,17,19,23,29

I before used a prime sieve to test static vs dynamic types performance
in my compiler, and noted that the relative performance difference was
smaller than might otherwise be expected (intuitively, one would expect
dynamic types to be horridly slow, since nearly every operation is
effectively a runtime call).

Namely, I only saw around a 3x difference at the time, rather than
around a 10x+ difference, which is more what I would expect.

Granted, a prime sieve is also not exactly something that performs well
on my ISA design (but, slightly less "aggressively badly" than a the
"recursive Fibonacci number" algorithm, *).

*, IIRC:
long rfib(long x)
{ return((x<2)?1:(rfib(x-1)+rfib(x-2))); }
Which is basically a stress test of function call and return performance...

>>
>> Others have contributed in all manner of languages
>>
>> <https://github.com/PlummersSoftwareLLC/Primes>
>
>
> Terje
>
>

Re: AMD CPU funny

<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36417&group=comp.arch#36417

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 30 Dec 2023 22:05:30 +0000
Organization: novaBBS
Message-ID: <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1707516"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$7lhpPcUDN1aw4ar0u1gdcufwTue0/HZ/Q2cmzgZRTuHaNc7C4hLva
 by: MitchAlsup - Sat, 30 Dec 2023 22:05 UTC

BGB wrote:

> long rfib(long x)
> { return((x<2)?1:(rfib(x-1)+rfib(x-2))); }

At first I thought the following looked pretty good:

rfib:
ENTER R29,R0,#0 // just 3 preserved registers.

CMP R2,R1,#2
PGE R2,TT
MOV R1,#1 // return 1
EXIT R29,R0,#0

SUB R30,R1,#2
SUB R1,R1,#1
CALL rfib // rfib(n-1)
MOV R29,R1
MOV R1,R30
CALL rfib // rfib(n-2)

ADD R1,R1,R29
EXIT R29,R0,#0 // return rfib(n-1)+rfib(n-2)

But then I moved save and restore to the rfib() calls making the last rung
on the call tree less expensive by 6 memory references each, and by saving
restoring only 2 registers rather than 3 at the cost of a MOV, so each non-
leaf level in the call tree saves/restores only 4 registers instead of 6::

rfib:
SUB R2,R1,#2
PNLT R2,TT // (n-2) < 0
MOV R1,#1 // return 1
RET

ENTER R30,R0,#0 // just 2 preserved registers.
MOV R30,R2
SUB R1,R1,#1
CALL rfib // rfib(n-1)
MOV R2,R1
MOV R1,R30
MOV R30,R2
CALL rfib // rfib(n-2)

ADD R1,R1,R30
EXIT R30,R0,#0 // return rfib(n-1)+rfib(n-2)

But then I realized that rfib(n-2) has already been computed by rfib(n-1).
Since the size of the redundant element on the stack is constant, rfib(n-1)
makes a location available to rfib(n-2) so only 1 call is required instead
of 2::

rfib:
ENTER R0,R0,#8
STD #0,[SP+8] // setup rfib[n-2]
CALL rfib1 // recurse through rfib1
EXIT R0,R0,#8

rfib1:
SUB R2,R1,#2
PNLT R2,TT // (n-2) < 0
MOV R1,#1 // return 1
RET

ENTER R0,R0,#8 // no preserved registers just return address

SUB R1,R1,#1
CALL rfib1 // rfib(n-1)
LDD R2,[SP] // save rfib(n-2)

ADD R1,R1,R30
STD R1,[SP+16] // restore rfib(n-2)
EXIT R0,R0,#8 // return rfib(n-1)+rfib(n-2)

.....

Re: AMD CPU funny

<umq8hi$1ebuo$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36419&group=comp.arch#36419

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 30 Dec 2023 17:19:12 -0600
Organization: A noiseless patient Spider
Lines: 207
Message-ID: <umq8hi$1ebuo$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 30 Dec 2023 23:19:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3d435c33f4a089fd22823e1e74aca625";
logging-data="1519576"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Busm87KoOjACnWY//q4wA"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:xKPxyMNq77N46Ty20X7LWaacc6s=
Content-Language: en-US
In-Reply-To: <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
 by: BGB - Sat, 30 Dec 2023 23:19 UTC

On 12/30/2023 4:05 PM, MitchAlsup wrote:
> BGB wrote:
>
>>    long rfib(long x)
>>      { return((x<2)?1:(rfib(x-1)+rfib(x-2))); }
>
> At first I thought the following looked pretty good:
>
> rfib:
>      ENTER   R29,R0,#0          // just 3 preserved registers.
>
>      CMP     R2,R1,#2
>      PGE     R2,TT
>      MOV     R1,#1              // return 1
>      EXIT    R29,R0,#0
>
>      SUB     R30,R1,#2
>      SUB     R1,R1,#1
>      CALL    rfib               // rfib(n-1)
>      MOV     R29,R1
>      MOV     R1,R30
>      CALL    rfib               // rfib(n-2)
>
>      ADD     R1,R1,R29
>      EXIT    R29,R0,#0          // return rfib(n-1)+rfib(n-2)
>

Theoretically, could be done in my ISA something like:
rfib:
ADD -24, SP | MOV LR, R1
MOV 1, R2 | CMPGE 2, R4
BF .L0
MOV.X R12, (SP, 0)
MOV R4, R12 | MOV.Q R1, (SP, 16)
ADD R12, -1, R4
BSR rfib
MOV R2, R13 | ADD R12, -2, R4
BSR rfib
ADD R2, R13, R2
MOV.Q (SP, 16), R1
MOV.X (SP, 0), R12
.L0:
ADD 24, SP
JMP R1

But, my compiler isn't going to pull off anything like this.

Checking, actual BGBCC output:
rfib:
MOV LR, R1
BSR __prolog_0000_0000020000FC
ADD -208, SP
MOV RQ4, RQ13
// tk_shell_chksvar.c:234 { return((x<2)?1:(rfib(x-1)+rfib(x-2))); }
CMPQGE 2, RQ13
BT .L00802A62
MOV 1, RQ10
BRA .L00802A63

..L00802A62:
SUB RQ13, 1, RQ14
MOV RQ14, RQ4
BSR rfib
MOV RQ2, RQ12
SUB RQ13, 2, RQ14
MOV RQ14, RQ4
BSR rfib
MOV RQ2, RQ11
ADD RQ12, RQ11, RQ14
MOV RQ14, RQ10

..L00802A63:
MOV RQ10, RQ2

..L00C108CE:
ADD 208, SP
BRA __epilog_0000_0000020000FC
.balign 4

( Yes, this is more MOV's than "technically necessary", but eliminating
them from the compiler output is "easier said than done" given how some
parts of the compiler work. RQn/RDn are technically equivalent to Rn,
but the compiler distinguishes them as early on the idea was that the
assembler might distinguish instructions based on type, like
EAX/RAX/AX/... on x86-64, but it didn't quite happen this way. )

And, fetching the prolog and epilog:
__prolog_0000_0000020000FC:
ADD -48, SP
MOV.Q R1, (SP, 40)
MOV.X R12, (SP, 16)
MOV.Q R14, (SP, 32)
MOV.X R10, (SP, 0)
RTS

__epilog_0000_0000020000FC:
MOV.Q (SP, 40), R1
MOV.X (SP, 0), R10
MOV.X (SP, 16), R12
MOV.Q (SP, 32), R14
ADD 48, SP
JMP R1

Or, a little closer to the final machine-code (BJX2 Baseline):

rfib: //@03E27C
4911 STC R1, R1
..reloc __prolog_0000_0000020000FC 0F/RELW20_BJX
F000_D000 BSR (PC, 0)
F2F1_D330 ADD -208, SP
18D4 MOV R4, R13
F2DA_C802 CMPQGE 2, R13
..reloc .L00802A62 0F/RELW20_BJX
E000_C000 BT (PC, 0)
DA01 MOV 1, R10
..reloc .L00802A63 0F/RELW20_BJX
F000_C000 BRA (PC, 0)
3000 NOP

..L00802A62: //@03E298
F2ED_11FF ADD R13, -1, R14
F04E_1089 MOV R14, R4
..reloc rfib 0F/RELW20_BJX
F000_D000 BSR (PC, 0)
F4C2_1089 MOV R2, R12 |
F2ED_11FE ADD R13, -2, R14
F04E_1089 MOV R14, R4
..reloc rfib 0F/RELW20_BJX
F000_D000 BSR (PC, 0)
F0B2_1089 MOV R2, R11
F0EC_10B0 ADD R12, R11, R14
F0AE_1089 MOV R14, R10

..L00802A63: //@03E2C0
182A MOV R10, R2

..L00C108CE: //@03E2C2
F2F0_D0D0 ADD 208, SP
..reloc __epilog_0000_0000020000FC 0F/RELW20_BJX
F000_C000 BRA (PC, 0)
3030 BRK

> But then I moved save and restore to the rfib() calls making the last rung
> on the call tree less expensive by 6 memory references each, and by saving
> restoring only 2 registers rather than 3 at the cost of a MOV, so each non-
> leaf level in the call tree saves/restores only 4 registers instead of 6::
>
> rfib:
>      SUB     R2,R1,#2
>      PNLT    R2,TT              // (n-2) < 0
>      MOV     R1,#1              // return 1
>      RET
>
>      ENTER   R30,R0,#0          // just 2 preserved registers.
>      MOV     R30,R2
>      SUB     R1,R1,#1
>      CALL    rfib               // rfib(n-1)
>      MOV     R2,R1
>      MOV     R1,R30
>      MOV     R30,R2
>      CALL    rfib               // rfib(n-2)
>
>      ADD     R1,R1,R30
>      EXIT    R30,R0,#0          // return rfib(n-1)+rfib(n-2)
>
> But then I realized that rfib(n-2) has already been computed by rfib(n-1).
> Since the size of the redundant element on the stack is constant, rfib(n-1)
> makes a location available to rfib(n-2) so only 1 call is required instead
> of 2::
>
> rfib:
>      ENTER   R0,R0,#8
>      STD     #0,[SP+8]        // setup rfib[n-2]
>      CALL    rfib1             // recurse through rfib1
>      EXIT    R0,R0,#8
>
> rfib1:
>      SUB     R2,R1,#2
>      PNLT    R2,TT              // (n-2) < 0
>      MOV     R1,#1              // return 1
>      RET
>
>      ENTER   R0,R0,#8           // no preserved registers just return
> address
>
>      SUB     R1,R1,#1
>      CALL    rfib1              // rfib(n-1)
>      LDD     R2,[SP]            // save rfib(n-2)
>
>      ADD     R1,R1,R30
>      STD     R1,[SP+16]         // restore rfib(n-2)
>      EXIT    R0,R0,#8           // return rfib(n-1)+rfib(n-2)
>
> ....

If you cache the intermediate results of the calculations, this entirely
defeats its use as a benchmark...

Say, because it drops from ~ O(2^n) to ~ O(n)...

Granted, There are plenty of other much more efficient ways of
calculating Fibonacci numbers...

Re: AMD CPU funny

<9325dd64f02f529d09b8fa86fbf62ea2@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36421&group=comp.arch#36421

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 30 Dec 2023 23:55:52 +0000
Organization: novaBBS
Message-ID: <9325dd64f02f529d09b8fa86fbf62ea2@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me> <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com> <umq8hi$1ebuo$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1717284"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$fMC4WGglirD8O2/siMG4Pee42y8a/mnHGxDKqHoFlAydhKNF8IE8u
 by: MitchAlsup - Sat, 30 Dec 2023 23:55 UTC

BGB wrote:

> On 12/30/2023 4:05 PM, MitchAlsup wrote:
>> BGB wrote:
>>
>>>    long rfib(long x)
>>>      { return((x<2)?1:(rfib(x-1)+rfib(x-2))); }
>>
>> At first I thought the following looked pretty good:
>>
>> rfib:
>>      ENTER   R29,R0,#0          // just 3 preserved registers.
>>
>>      CMP     R2,R1,#2
>>      PGE     R2,TT
>>      MOV     R1,#1              // return 1
>>      EXIT    R29,R0,#0
>>
>>      SUB     R30,R1,#2
>>      SUB     R1,R1,#1
>>      CALL    rfib               // rfib(n-1)
>>      MOV     R29,R1
>>      MOV     R1,R30
>>      CALL    rfib               // rfib(n-2)
>>
>>      ADD     R1,R1,R29
>>      EXIT    R29,R0,#0          // return rfib(n-1)+rfib(n-2)
>>

> Theoretically, could be done in my ISA something like:
> rfib:
> ADD -24, SP | MOV LR, R1
> MOV 1, R2 | CMPGE 2, R4
> BF .L0
> MOV.X R12, (SP, 0)
> MOV R4, R12 | MOV.Q R1, (SP, 16)
> ADD R12, -1, R4
> BSR rfib
> MOV R2, R13 | ADD R12, -2, R4
> BSR rfib
> ADD R2, R13, R2
> MOV.Q (SP, 16), R1
> MOV.X (SP, 0), R12
> .L0:
> ADD 24, SP
> JMP R1

> But, my compiler isn't going to pull off anything like this.

> Checking, actual BGBCC output:
> rfib:
> MOV LR, R1
> BSR __prolog_0000_0000020000FC
> ADD -208, SP
> MOV RQ4, RQ13
> // tk_shell_chksvar.c:234 { return((x<2)?1:(rfib(x-1)+rfib(x-2))); }
> CMPQGE 2, RQ13
> BT .L00802A62
> MOV 1, RQ10
> BRA .L00802A63

> ..L00802A62:
> SUB RQ13, 1, RQ14
> MOV RQ14, RQ4
> BSR rfib
> MOV RQ2, RQ12
> SUB RQ13, 2, RQ14
> MOV RQ14, RQ4
> BSR rfib
> MOV RQ2, RQ11
> ADD RQ12, RQ11, RQ14
> MOV RQ14, RQ10

> ..L00802A63:
> MOV RQ10, RQ2

> ..L00C108CE:
> ADD 208, SP
> BRA __epilog_0000_0000020000FC
> .balign 4

> ( Yes, this is more MOV's than "technically necessary", but eliminating
> them from the compiler output is "easier said than done" given how some
> parts of the compiler work. RQn/RDn are technically equivalent to Rn,
> but the compiler distinguishes them as early on the idea was that the
> assembler might distinguish instructions based on type, like
> EAX/RAX/AX/... on x86-64, but it didn't quite happen this way. )

> And, fetching the prolog and epilog:
> __prolog_0000_0000020000FC:
> ADD -48, SP
> MOV.Q R1, (SP, 40)
> MOV.X R12, (SP, 16)
> MOV.Q R14, (SP, 32)
> MOV.X R10, (SP, 0)
> RTS

> __epilog_0000_0000020000FC:
> MOV.Q (SP, 40), R1
> MOV.X (SP, 0), R10
> MOV.X (SP, 16), R12
> MOV.Q (SP, 32), R14
> ADD 48, SP
> JMP R1

> Or, a little closer to the final machine-code (BJX2 Baseline):

> rfib: //@03E27C
> 4911 STC R1, R1
> ..reloc __prolog_0000_0000020000FC 0F/RELW20_BJX
> F000_D000 BSR (PC, 0)
> F2F1_D330 ADD -208, SP
> 18D4 MOV R4, R13
> F2DA_C802 CMPQGE 2, R13
> ..reloc .L00802A62 0F/RELW20_BJX
> E000_C000 BT (PC, 0)
> DA01 MOV 1, R10
> ..reloc .L00802A63 0F/RELW20_BJX
> F000_C000 BRA (PC, 0)
> 3000 NOP

> ..L00802A62: //@03E298
> F2ED_11FF ADD R13, -1, R14
> F04E_1089 MOV R14, R4
> ..reloc rfib 0F/RELW20_BJX
> F000_D000 BSR (PC, 0)
> F4C2_1089 MOV R2, R12 |
> F2ED_11FE ADD R13, -2, R14
> F04E_1089 MOV R14, R4
> ..reloc rfib 0F/RELW20_BJX
> F000_D000 BSR (PC, 0)
> F0B2_1089 MOV R2, R11
> F0EC_10B0 ADD R12, R11, R14
> F0AE_1089 MOV R14, R10

> ..L00802A63: //@03E2C0
> 182A MOV R10, R2

> ..L00C108CE: //@03E2C2
> F2F0_D0D0 ADD 208, SP
> ..reloc __epilog_0000_0000020000FC 0F/RELW20_BJX
> F000_C000 BRA (PC, 0)
> 3030 BRK

>> But then I moved save and restore to the rfib() calls making the last rung
>> on the call tree less expensive by 6 memory references each, and by saving
>> restoring only 2 registers rather than 3 at the cost of a MOV, so each non-
>> leaf level in the call tree saves/restores only 4 registers instead of 6::
>>
>> rfib:
>>      SUB     R2,R1,#2
>>      PNLT    R2,TT              // (n-2) < 0
>>      MOV     R1,#1              // return 1
>>      RET
>>
>>      ENTER   R30,R0,#0          // just 2 preserved registers.
>>      MOV     R30,R2
>>      SUB     R1,R1,#1
>>      CALL    rfib               // rfib(n-1)
>>      MOV     R2,R1
>>      MOV     R1,R30
>>      MOV     R30,R2
>>      CALL    rfib               // rfib(n-2)
>>
>>      ADD     R1,R1,R30
>>      EXIT    R30,R0,#0          // return rfib(n-1)+rfib(n-2)
>>
>> But then I realized that rfib(n-2) has already been computed by rfib(n-1).
>> Since the size of the redundant element on the stack is constant, rfib(n-1)
>> makes a location available to rfib(n-2) so only 1 call is required instead
>> of 2::
>>
>> rfib:
>>      ENTER   R0,R0,#8
>>      STD     #0,[SP+8]        // setup rfib[n-2]
>>      CALL    rfib1             // recurse through rfib1
>>      EXIT    R0,R0,#8
>>
>> rfib1:
>>      SUB     R2,R1,#2
>>      PNLT    R2,TT              // (n-2) < 0
>>      MOV     R1,#1              // return 1
>>      RET
>>
>>      ENTER   R0,R0,#8           // no preserved registers just return
>> address
>>
>>      SUB     R1,R1,#1
>>      CALL    rfib1              // rfib(n-1)
>>      LDD     R2,[SP]            // save rfib(n-2)
>>
>>      ADD     R1,R1,R30
>>      STD     R1,[SP+16]         // restore rfib(n-2)
>>      EXIT    R0,R0,#8           // return rfib(n-1)+rfib(n-2)
>>
>> ....

> If you cache the intermediate results of the calculations, this entirely
> defeats its use as a benchmark...

Exactly:: but there is talk of compilers figuring out that rfib() is a
pure function and then the compiler itself performs that transformation.

{{Oh and BTW: Fibonacci should have never been a benchmark (post 1985)}}

> Say, because it drops from ~ O(2^n) to ~ O(n)...

It also reduces the constant term in BigO ! since the code path is shorter.

Back in 1982±, S.E.L. got their compiler to recognize the transcendental
loop as constant, and this one transformation improved Whetstone by 3×.

> Granted, There are plenty of other much more efficient ways of
> calculating Fibonacci numbers...

Make a file as big as you like, and simply index.

BigO( const )

Re: AMD CPU funny

<umrb9a$1m3mj$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36424&group=comp.arch#36424

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!news.niel.me!glou.org!news.glou.org!usenet-fr.net!proxad.net!feeder1-2.proxad.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Pancho.Jones@proton.me (Pancho)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 09:12:12 +0000
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <umrb9a$1m3mj$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 31 Dec 2023 09:12:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="877927e6e8a39a5a2fc47f133a41c074";
logging-data="1773267"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Y+1xXugoETSXdP0upR39CcjqCMIYBXxo="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:/KPeiCQJIcX0mYUqQQR8RhUjGVw=
In-Reply-To: <umq8hi$1ebuo$1@dont-email.me>
Content-Language: en-GB
 by: Pancho - Sun, 31 Dec 2023 09:12 UTC

On 30/12/2023 23:19, BGB wrote:

> Granted, There are plenty of other much more efficient ways of
> calculating Fibonacci numbers...
>

Yes. In order of effort.

1) Memoize the recursive function.
2) Return values for (n-1,n-2) as a tuple, and make it single (tail)
recursive. Which a smart compiler could theoretically convert to an
iterative loop.
3) Write an iterative loop yourself.
4) Remember college maths, and solve the recurrence relation to give a
very simple closed form solution.

In fact, given that fib grows exponentially, so we only ever need to
calculate it for small values of n, the only “bad” solution is the naive
double recursive one.

Re: AMD CPU funny

<ums0vv$1p1a5$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36425&group=comp.arch#36425

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 16:22:39 +0100
Organization: A noiseless patient Spider
Lines: 105
Message-ID: <ums0vv$1p1a5$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Sun, 31 Dec 2023 15:22:39 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c5e4f11c9061857ac4f60ec384460c79";
logging-data="1869125"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+VY7VFQ3DYraWOu5HhsIMhUZStifUCbJMrI2bfVsK3+w=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18
Cancel-Lock: sha1:DpfhzKqzytyMWpL5AxU9Z0fL610=
In-Reply-To: <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
 by: Terje Mathisen - Sun, 31 Dec 2023 15:22 UTC

MitchAlsup wrote:
> BGB wrote:
>
>>    long rfib(long x)
>>      { return((x<2)?1:(rfib(x-1)+rfib(x-2))); }
>
> At first I thought the following looked pretty good:
>
> rfib:
>      ENTER   R29,R0,#0          // just 3 preserved registers.
>
>      CMP     R2,R1,#2
>      PGE     R2,TT
>      MOV     R1,#1              // return 1
>      EXIT    R29,R0,#0
>
>      SUB     R30,R1,#2
>      SUB     R1,R1,#1
>      CALL    rfib               // rfib(n-1)
>      MOV     R29,R1
>      MOV     R1,R30
>      CALL    rfib               // rfib(n-2)
>
>      ADD     R1,R1,R29
>      EXIT    R29,R0,#0          // return rfib(n-1)+rfib(n-2)
>
> But then I moved save and restore to the rfib() calls making the last rung
> on the call tree less expensive by 6 memory references each, and by saving
> restoring only 2 registers rather than 3 at the cost of a MOV, so each non-
> leaf level in the call tree saves/restores only 4 registers instead of 6::
>
> rfib:
>      SUB     R2,R1,#2
>      PNLT    R2,TT              // (n-2) < 0
>      MOV     R1,#1              // return 1
>      RET
>
>      ENTER   R30,R0,#0          // just 2 preserved registers.
>      MOV     R30,R2
>      SUB     R1,R1,#1
>      CALL    rfib               // rfib(n-1)
>      MOV     R2,R1
>      MOV     R1,R30
>      MOV     R30,R2
>      CALL    rfib               // rfib(n-2)
>
>      ADD     R1,R1,R30
>      EXIT    R30,R0,#0          // return rfib(n-1)+rfib(n-2)
>
> But then I realized that rfib(n-2) has already been computed by rfib(n-1).
> Since the size of the redundant element on the stack is constant, rfib(n-1)
> makes a location available to rfib(n-2) so only 1 call is required instead
> of 2::

That is the huge/obvious/sneaky (take your pick!) Fib optimization,
since you are effectively getting rid of the O(2^n) exponential
recursion fanout, making it O(n) instead.

The step from that to tail recursion elimination is the only thing
remaining, right?

What you've done is similar to converting

f_n = fib(n)

into

(f_n,f_n_minus_1) = fib2(n)
{ if (n < 2) return (1,1);
(f2,f1) = fib2(n-1);
return (f1+f2,f1);
}

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: AMD CPU funny

<ums29d$1p6cc$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36426&group=comp.arch#36426

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: daniel@me.invalid (Daniel James)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 15:45:32 +0000
Organization: Daniel James
Lines: 31
Message-ID: <ums29d$1p6cc$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 31 Dec 2023 15:44:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d83cea0757a7dd18edc1a5424bb94484";
logging-data="1874316"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19wwyIvSA/yL5ElLaoLKCsD"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:FTPRb+MrV4tL0jIVXexczUWhXcc=
Content-Language: en-GB
In-Reply-To: <umrb9a$1m3mj$1@dont-email.me>
 by: Daniel James - Sun, 31 Dec 2023 15:45 UTC

On 31/12/2023 09:12, Pancho wrote:
> In fact, given that fib grows exponentially, so we only ever need to
> calculate it for small values of n, the only “bad” solution is the naive
> double recursive one.

Well, I might disagree.

Given that we only ever need to calculate it for small values of n (as
you say, and if that's true) its efficiency doesn't matter. If you need
to compute it often enough that efficiency might become relevant just
put the results in a table and use that. Given that fib goes over 10^12
after about 60 terms it needn't be a very big table.

The naive doubly recursive solution has the huge merit of being
human-readable.

Unlike, say:

double fib( int n )
{
return (pow( (1+sqrt(5.0))/2, n+1 )
- pow( (1-sqrt(5.0))/2, n+1 ))
/ sqrt(5.0);
}

.... which is great for fibs of big values, but not fast for small ones.

--
Cheers,
Daniel.

Re: AMD CPU funny

<ums5h9$1m63d$2@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36428&group=comp.arch#36428

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Pancho.Jones@proton.me (Pancho)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 16:40:09 +0000
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <ums5h9$1m63d$2@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
<ums29d$1p6cc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 31 Dec 2023 16:40:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="877927e6e8a39a5a2fc47f133a41c074";
logging-data="1775725"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ZhVBnp1t6pPAqZdwNPzSgJ4V6Jc+7Nwg="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:cg3y/4x6qeSvMNqUmbNmPpo0f7M=
Content-Language: en-GB
In-Reply-To: <ums29d$1p6cc$1@dont-email.me>
 by: Pancho - Sun, 31 Dec 2023 16:40 UTC

On 31/12/2023 15:45, Daniel James wrote:
> On 31/12/2023 09:12, Pancho wrote:
>> In fact, given that fib grows exponentially, so we only ever need to
>> calculate it for small values of n, the only “bad” solution is the
>> naive double recursive one.
>
> Well, I might disagree.
>
> Given that we only ever need to calculate it for small values of n (as
> you say, and if that's true) its efficiency doesn't matter. If you need
> to compute it often enough that efficiency might become relevant just
> put the results in a table and use that. Given that fib goes over 10^12
> after about 60 terms it needn't be a very big table.
>

Memoization, is effectively a lookup table, but without the thinking. In
my case, avoiding thinking massively improves code reliability.

> The naive doubly recursive solution has the huge merit of being
> human-readable.
>Yes, but you'd be surprised how quickly double recursion goes bad. I
haven't tested recently, but off the top of my head, the maximum
function call depth would be something like 300, so fib(10) would be a
stack overflow. fib(60) would need a stack function call depth ~ 1e18.

> Unlike, say:
>
>   double fib( int n )
>   {
>       return (pow( (1+sqrt(5.0))/2, n+1 )
>               - pow( (1-sqrt(5.0))/2, n+1 ))
>           / sqrt(5.0);
>   }
>
> ... which is great for fibs of big values, but not fast for small ones.
>

Why isn't it fast for small fibs?

I seem to recall, if statements are more costly than floating point
function calls, like exponential.

Re: AMD CPU funny

<ums69b$1pnss$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36429&group=comp.arch#36429

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 16:52:58 +0000
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <ums69b$1pnss$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<kv8s49Fg7k5U3@mid.individual.net> <kv8u2tFg7k5U6@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 31 Dec 2023 16:52:59 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9e2cabd8a5a19e27c421c7576dce03d8";
logging-data="1892252"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19c63ZiDoSoAO3Ixte30aEiMlp1fLfU2Z8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:IgAeAJPa9e9gBGbzxrUWDCDbR0g=
In-Reply-To: <kv8u2tFg7k5U6@mid.individual.net>
Content-Language: en-GB
 by: Vir Campestris - Sun, 31 Dec 2023 16:52 UTC

On 29/12/2023 22:04, Andy Burns wrote:
> and (perversely?) with -Ofast

Thank you for those. It looks as though there are some other effects
going on apart from the cache size - but I can clearly see the drop when
you run out of cache. Within the caches? Too much noise for me to
understand!

Andy

Re: AMD CPU funny

<ums6te$1pqp3$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36430&group=comp.arch#36430

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 18:03:42 +0100
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <ums6te$1pqp3$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
<ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Sun, 31 Dec 2023 17:03:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c5e4f11c9061857ac4f60ec384460c79";
logging-data="1895203"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+MCGI44ClJwRhC9BYwQ3PwsTLZmyF6Fug7ZVJCrd5Oww=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18
Cancel-Lock: sha1:Ngj6OS3qttPdfKwjUoqrrYvRczk=
In-Reply-To: <ums5h9$1m63d$2@dont-email.me>
 by: Terje Mathisen - Sun, 31 Dec 2023 17:03 UTC

Pancho wrote:
> On 31/12/2023 15:45, Daniel James wrote:
>> On 31/12/2023 09:12, Pancho wrote:
>>> In fact, given that fib grows exponentially, so we only ever need to
>>> calculate it for small values of n, the only “bad” solution is the
>>> naive double recursive one.
>>
>> Well, I might disagree.
>>
>> Given that we only ever need to calculate it for small values of n (as
>> you say, and if that's true) its efficiency doesn't matter. If you
>> need to compute it often enough that efficiency might become relevant
>> just put the results in a table and use that. Given that fib goes over
>> 10^12 after about 60 terms it needn't be a very big table.
>>
>
> Memoization, is effectively a lookup table, but without the thinking. In
> my case, avoiding thinking massively improves code reliability.
>
>> The naive doubly recursive solution has the huge merit of being
>> human-readable.
>> Yes, but you'd be surprised how quickly double recursion goes bad. I
> haven't tested recently, but off the top of my head, the maximum
> function call depth would be something like 300, so fib(10) would be a
> stack overflow. fib(60) would need a stack function call depth ~ 1e18.
>
>> Unlike, say:
>>
>>    double fib( int n )
>>    {
>>        return (pow( (1+sqrt(5.0))/2, n+1 )
>>                - pow( (1-sqrt(5.0))/2, n+1 ))
>>            / sqrt(5.0);
>>    }
>>
>> ... which is great for fibs of big values, but not fast for small ones.
>>
>
> Why isn't it fast for small fibs?
>
> I seem to recall, if statements are more costly than floating point
> function calls, like exponential.

The recursive fob() function has just a single if () statement, so worst
case you'll get one or two branch mispredicts, while the 2^n recursive
calls (for fib(10 or 11)) will probably each take about as long.

An iterative O(n) version will take 2-4 clcok cycles per iteration, so
for n less than about 40, you'd only have ~100 clock cycles to evaluate
the two pow() calls. (I'm assuming the compiler or programmer have
pre-calculated both (1+sqrt(5))*0.5 and (1-sqrt(5))*0.5 as well as
1/sqrt(5), so that only the two pow() calls remain!)

You need Mitch's ~20-cycle pow() to win this one at one or two-digit N.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: AMD CPU funny

<umsl03$1rjgv$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36450&group=comp.arch#36450

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Pancho.Jones@proton.me (Pancho)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 21:04:03 +0000
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <umsl03$1rjgv$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
<ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 31 Dec 2023 21:04:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5a0b82a9cff17340c6d00205c3ae421f";
logging-data="1953311"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18y6z0LpASGWs2IUwX6zMMSbAjTX3Jm5LQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:P1lqt3xtB63rrOpXY5Jq1dI75L0=
In-Reply-To: <ums5h9$1m63d$2@dont-email.me>
Content-Language: en-US
 by: Pancho - Sun, 31 Dec 2023 21:04 UTC

On 31/12/2023 16:40, Pancho wrote:
> I
> haven't tested recently, but off the top of my head, the maximum
> function call depth would be something like 300, so fib(10) would be a
> stack overflow. fib(60) would need a stack function call depth ~ 1e18.
>

Sorry this was a mistake, the function call depth is only ~n not 2^n.
For fib(n), there are ~2^n function calls, but not 2^n deep on the stack.

Re: AMD CPU funny

<umsls6$1roem$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36451&group=comp.arch#36451

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Pancho.Jones@proton.me (Pancho)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sun, 31 Dec 2023 21:19:01 +0000
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <umsls6$1roem$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
<ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me>
<ums6te$1pqp3$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 31 Dec 2023 21:19:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5a0b82a9cff17340c6d00205c3ae421f";
logging-data="1958358"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Dxbqa3Kq7RBJLbY2qjPDYyywin1SPngY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:8KIvBqDVBsl3qbHxCM4oKimBI7c=
Content-Language: en-US
In-Reply-To: <ums6te$1pqp3$1@dont-email.me>
 by: Pancho - Sun, 31 Dec 2023 21:19 UTC

On 31/12/2023 17:03, Terje Mathisen wrote:

>
> The recursive fob() function has just a single if () statement, so worst
> case you'll get one or two branch mispredicts, while the 2^n recursive
> calls (for fib(10 or 11)) will probably each take about as long.
>

How many cycles for a branch mispredict? From real life, I have seen if
statements be more expensve than pow, but it was a long time ago, and
possibly code specific. It surprised me at the time, but ever since I
tend to ignore the cost of functions like pow().

> An iterative O(n) version will take 2-4 clcok cycles per iteration, so
> for n less than about 40, you'd only have ~100 clock cycles to evaluate
> the two pow() calls. (I'm assuming the compiler or programmer have
> pre-calculated both (1+sqrt(5))*0.5 and (1-sqrt(5))*0.5 as well as
> 1/sqrt(5), so that only the two pow() calls remain!)
>
> You need Mitch's ~20-cycle pow() to win this one at one or two-digit N.
>

I generally don't worry about optimizing stuff that is very fast.
Premature optimisation is the root of all evil and what not. If you are
really calling it a huge number of times, a lookup is quickest.

Re: AMD CPU funny

<umuk0t$286vf$1@dont-email.me>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36454&group=comp.arch#36454

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: daniel@me.invalid (Daniel James)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 1 Jan 2024 15:00:28 +0000
Organization: Daniel James
Lines: 52
Message-ID: <umuk0t$286vf$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
<umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net>
<ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me>
<bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com>
<umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
<ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 1 Jan 2024 14:59:41 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b7b6ede821b9053ffd733800c6d04bf0";
logging-data="2366447"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/p2Va6+E6n0WRna5/n68Mx"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Fx+bL6GXdy8SVtAz1psmDFiYaIw=
Content-Language: en-GB
In-Reply-To: <ums5h9$1m63d$2@dont-email.me>
 by: Daniel James - Mon, 1 Jan 2024 15:00 UTC

On 31/12/2023 16:40, Pancho wrote:
>> The naive doubly recursive solution has the huge merit of being
>> human-readable.
>
> Yes, but you'd be surprised how quickly double recursion goes bad. I
> haven't tested recently, but off the top of my head, the maximum
> function call depth would be something like 300, so fib(10) would be
> a stack overflow. fib(60) would need a stack function call depth ~
> 1e18.

On 64-bit gcc here on Debian (so, 64-bit ints) I can call fib(50)
without blowing the stack, but it takes minutes (on a Ryzen 7).

However, the (signed) 64-bit integer result overflows for fib(n) where
n>45, which is more of a problem.

>> Unlike, say:
>>
>> double fib( int n )
>> {
>> return (pow( (1+sqrt(5.0))/2, n+1 )
>> - pow( (1-sqrt(5.0))/2, n+1 ))
>> / sqrt(5.0);
>> }
>>
>> ... which is great for fibs of big values, but not fast for small
>> ones.
>>
>
> Why isn't it fast for small fibs?

Gut feeling. I haven't timed it, and it may be.

I mean ... obviously it will be as fast for small ones as for large, but
for small ones it won't beat an iterative calculation and for very small
ones I doubt it will beat the naive recursive method.

My point, though, was that clear code is often more important than fast
execution, and while the naive recursive method for fibs gets horribly
slow for large values even it is acceptable for small ones.

> I seem to recall, if statements are more costly than floating point
> function calls, like exponential.

if statements are costly if they invalidate instruction prefetch and/or
jump to code that isn't in the cache ... but optimizers and CPUs are
pretty good at branch prediction and cacheing.

--
Cheers,
Daniel.

Re: AMD CPU funny

<fd0ecac8cd27ba889c031da9b90e0c84@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36464&group=comp.arch#36464

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 1 Jan 2024 20:16:43 +0000
Organization: novaBBS
Message-ID: <fd0ecac8cd27ba889c031da9b90e0c84@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me> <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com> <umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me> <ums29d$1p6cc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1928256"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$sfevHwAn/ebNzy7nGJucFehP2E1HL/sYN/rXhFKfPvC7O97EDsOt.
 by: MitchAlsup - Mon, 1 Jan 2024 20:16 UTC

Daniel James wrote:

> Unlike, say:

> double fib( int n )
> {
> return (pow( (1+sqrt(5.0))/2, n+1 )
> - pow( (1-sqrt(5.0))/2, n+1 ))
> / sqrt(5.0);
> }

Looks fast to me::

fib:
ADD R2,R1,#1
POWN R3,#(1+sqrt(5.0))/2,R2
POWN R4,#(1-sqrt(5.0))/2,R2
FADD R2,R3,R4
FDIV R1,R2,#sqrt(5)
RET

Re: AMD CPU funny

<f17db7125403c069bd0753272df51e67@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36466&group=comp.arch#36466

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 1 Jan 2024 20:22:23 +0000
Organization: novaBBS
Message-ID: <f17db7125403c069bd0753272df51e67@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me> <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com> <umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me> <ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me> <ums6te$1pqp3$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1928722"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$CCoYtpYLELYtq3FtNgnwW.HYFrj9Lp7ax2tgU5vPupbAw/fNjBtBK
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Mon, 1 Jan 2024 20:22 UTC

Terje Mathisen wrote:

> Pancho wrote:
>>> Unlike, say:
>>>
>>>    double fib( int n )
>>>    {
>>>        return (pow( (1+sqrt(5.0))/2, n+1 )
>>>                - pow( (1-sqrt(5.0))/2, n+1 ))
>>>            / sqrt(5.0);
>>>    }
>>>
>>> ... which is great for fibs of big values, but not fast for small ones.
>>>
>>
>> Why isn't it fast for small fibs?
>>
>> I seem to recall, if statements are more costly than floating point
>> function calls, like exponential.

> The recursive fob() function has just a single if () statement, so worst
> case you'll get one or two branch mispredicts, while the 2^n recursive
> calls (for fib(10 or 11)) will probably each take about as long.

> An iterative O(n) version will take 2-4 clcok cycles per iteration, so
> for n less than about 40, you'd only have ~100 clock cycles to evaluate
> the two pow() calls.

POW is 39 cycles (unpipelined) in My 66000 ISA.

> (I'm assuming the compiler or programmer have
> pre-calculated both (1+sqrt(5))*0.5 and (1-sqrt(5))*0.5 as well as
> 1/sqrt(5), so that only the two pow() calls remain!)

Plus a FSUB and FDIV.

> You need Mitch's ~20-cycle pow() to win this one at one or two-digit N.

The Ln2() is 14, the exp2() is another 14, then there is the 90-bit×64-bit
multiply 5-cycles and pipeline latency of 5 cycles (and I seem to be missing
a cycle)

> Terje

Re: AMD CPU funny

<43ff8764d4eb940338b051caadcf2838@news.novabbs.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36467&group=comp.arch#36467

  copy link   Newsgroups: uk.comp.homebuilt comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 1 Jan 2024 20:26:14 +0000
Organization: novaBBS
Message-ID: <43ff8764d4eb940338b051caadcf2838@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me> <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com> <umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me> <ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me> <ums6te$1pqp3$1@dont-email.me> <umsls6$1roem$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1928808"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$lqZIQlfJoy6OPoIWvje//.FQl4XyzjqoaUKsqvuxjKfMsFbE66f12
 by: MitchAlsup - Mon, 1 Jan 2024 20:26 UTC

Pancho wrote:

> On 31/12/2023 17:03, Terje Mathisen wrote:

>>
>> The recursive fob() function has just a single if () statement, so worst
>> case you'll get one or two branch mispredicts, while the 2^n recursive
>> calls (for fib(10 or 11)) will probably each take about as long.
>>

> How many cycles for a branch mispredict? From real life, I have seen if
> statements be more expensve than pow, but it was a long time ago, and
> possibly code specific. It surprised me at the time, but ever since I
> tend to ignore the cost of functions like pow().

Long latency branch mispredicts are often in the 13-cycle range. So if you
are issuing 2 instructions per cycle into the pipeline, you may be throwing
~40 instructions per mispredict. For example LD-CMP-BR where the LD misses
L1 but hits in L2.

Re: AMD CPU funny

<86cyuksoc0.fsf@linuxsc.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36472&group=comp.arch#36472

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 01 Jan 2024 14:03:59 -0800
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <86cyuksoc0.fsf@linuxsc.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me> <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com> <umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="4ad5b29cc5222e274263b40ba81c3fd0";
logging-data="2479563"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18WFeg6jhJfJ5C797zMS/pHpJy/0u1SPm4="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:N+NBkvxVTOEt89h1+i8BayImkyI=
sha1:Sk77mVJd6ZJuipNJ1+txss6mIuc=
 by: Tim Rentsch - Mon, 1 Jan 2024 22:03 UTC

Pancho <Pancho.Jones@proton.me> writes:

> On 30/12/2023 23:19, BGB wrote:
>
>> Granted, There are plenty of other much more efficient ways of
>> calculating Fibonacci numbers...
>
> Yes. In order of effort.
>
> 1) Memoize the recursive function.
> 2) Return values for (n-1,n-2) as a tuple, and make it single (tail)
> recursive. Which a smart compiler could theoretically convert to an
> iterative loop.
> 3) Write an iterative loop yourself.
> 4) Remember college maths, and solve the recurrence relation to give a
> very simple closed form solution.

All of these approaches have drawbacks of one sort or another.

For (1), we end up writing more code and using more memory.
Some problems are a good fit to memoization, but computing
Fibonacci numbers isn't one of the.

For (2), compilers are not so good at dealing with tail
recursion when the return value is a pair of results
rather than a single result.

For (3), iterative code takes more mental effort to see
that it works than does a simple recursive version.

For (4), limitations of floating point cause wrong values to
be produced, for just for 64-bit inputs.

Also, simply storing values in a table takes a fair amount
of memory: almost 3k bytes for 128-bit inputs, or about
3/4 k bytes for 64-bit inputs.

Here is a recursive formulation that is better than the
suggestion given in (2):

typedef unsigned long long NN;

static NN fib2( NN a, NN b, NN k );

NN
fibonacci( NN n ){
return fib2( 1, 0, n );
}

static NN
fib2( NN a, NN b, NN k ){
return k == 0 ? b : fib2( b, a+b, k-1 );
}

At optimization -Os or better, the resulting code looks
very much the same as if a single iterative loop had
been written instead.

Actually we can do much better, but we should do at least
as well as the code shown above.

Re: AMD CPU funny

<867ckssn67.fsf@linuxsc.com>

  copy mid

https://news.novabbs.org/devel/article-flat.php?id=36473&group=comp.arch#36473

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 01 Jan 2024 14:29:04 -0800
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <867ckssn67.fsf@linuxsc.com>
References: <ulv7j4$l0o0$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me> <um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me> <umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com> <umn15d$svun$6@dont-email.me> <kv8gt0Fca4nU6@mid.individual.net> <ump7l3$19ud7$1@dont-email.me> <umpn9d$1c212$1@dont-email.me> <bc7bd58cc9f25ca592dc5c62f415ceec@news.novabbs.com> <umq8hi$1ebuo$1@dont-email.me> <umrb9a$1m3mj$1@dont-email.me> <ums29d$1p6cc$1@dont-email.me> <ums5h9$1m63d$2@dont-email.me> <umuk0t$286vf$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="4ad5b29cc5222e274263b40ba81c3fd0";
logging-data="2483172"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/66aTxfGmF8xXCY1I/U/RJD/1A2TMdQqQ="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:/HOggARUAE/k7OQ06KYahYDe+F8=
sha1:n42GER9vdQpQMOkslWpUGQb/lX4=
 by: Tim Rentsch - Mon, 1 Jan 2024 22:29 UTC

Daniel James <daniel@me.invalid> writes:

> However, the (signed) 64-bit integer result overflows for fib(n) where
> n>45, which is more of a problem.

fib( 92 ) is still less than 1UL << 63 (or fib(93) for a 64-bit
unsigned type).

And fib( 186 ) still fits in a 128-bit unsigned type.


devel / comp.arch / Re: AMD CPU funny

Pages:12345
server_pubkey.txt

rocksolid light 0.9.81
clearnet tor