Rocksolid Light

devel / comp.arch / Re: AMD CPU funny

Subject (Author)
* Re: AMD CPU funny (Theo)
+- Re: AMD CPU funny (MitchAlsup)
`* Re: AMD CPU funny (Vir Campestris)
 +* Re: AMD CPU funny (Scott Lurndal)
 |`* Re: AMD CPU funny (Vir Campestris)
 | +* Re: AMD CPU funny (Michael S)
 | |`* Re: AMD CPU funny (MitchAlsup)
 | | `* Re: AMD CPU funny (Michael S)
 | |  `- Re: AMD CPU funny (MitchAlsup)
 | `* Re: AMD CPU funny (Andy Burns)
 |  `* Re: AMD CPU funny (Chris M. Thomasson)
 |   `- Re: AMD CPU funny (Chris M. Thomasson)
 +* Re: AMD CPU funny (MitchAlsup)
 |`* Re: AMD CPU funny (Vir Campestris)
 | `* Re: AMD CPU funny (Theo)
 |  `* Re: AMD CPU funny (MitchAlsup)
 |   `* Re: AMD CPU funny (Theo)
 |    +- Re: AMD CPU funny (MitchAlsup)
 |    `* Re: AMD CPU funny (Vir Campestris)
 |     `* Re: AMD CPU funny (Thomas Koenig)
 |      `* Re: AMD CPU funny (David Brown)
 |       `* Re: AMD CPU funny (Vir Campestris)
 |        +- Re: AMD CPU funny (David Brown)
 |        `* Re: AMD CPU funny (Michael S)
 |         `* Re: AMD CPU funny (Vir Campestris)
 |          `* Re: AMD CPU funny (Andy Burns)
 |           `* Re: AMD CPU funny (Terje Mathisen)
 |            `* Re: AMD CPU funny (BGB)
 |             `* Re: AMD CPU funny (MitchAlsup)
 |              +* Re: AMD CPU funny (BGB)
 |              |+- Re: AMD CPU funny (MitchAlsup)
 |              |`* Re: AMD CPU funny (Pancho)
 |              | +* Re: AMD CPU funny (Daniel James)
 |              | |+* Re: AMD CPU funny (Pancho)
 |              | ||+* Re: AMD CPU funny (Terje Mathisen)
 |              | |||+* Re: AMD CPU funny (Pancho)
 |              | ||||`- Re: AMD CPU funny (MitchAlsup)
 |              | |||`- Re: AMD CPU funny (MitchAlsup)
 |              | ||+- Re: AMD CPU funny (Pancho)
 |              | ||`* Re: AMD CPU funny (Daniel James)
 |              | || +* Re: AMD CPU funny (Tim Rentsch)
 |              | || |`* Re: AMD CPU funny (Terje Mathisen)
 |              | || | `- Re: AMD CPU funny (Tim Rentsch)
 |              | || `* Re: AMD CPU funny (Pancho)
 |              | ||  `- Re: AMD CPU funny (Tim Rentsch)
 |              | |+* Re: AMD CPU funny (MitchAlsup)
 |              | ||`- Re: AMD CPU funny (Terje Mathisen)
 |              | |`* Re: AMD CPU funny (Tim Rentsch)
 |              | | +* Re: AMD CPU funny (Terje Mathisen)
 |              | | |+- Re: AMD CPU funny (MitchAlsup)
 |              | | |`* Re: AMD CPU funny (Tim Rentsch)
 |              | | | `* Re: AMD CPU funny (Terje Mathisen)
 |              | | |  `- Re: AMD CPU funny (Tim Rentsch)
 |              | | `* Re: AMD CPU funny (BGB)
 |              | |  `- Re: AMD CPU funny (Tim Rentsch)
 |              | `* Re: AMD CPU funny (Tim Rentsch)
 |              |  `* Re: AMD CPU funny (Thomas Koenig)
 |              |   +* Re: AMD CPU funny (MitchAlsup)
 |              |   |+* Re: AMD CPU funny (Terje Mathisen)
 |              |   ||`- Re: AMD CPU funny (Tim Rentsch)
 |              |   |`* Re: AMD CPU funny (Thomas Koenig)
 |              |   | +- Re: AMD CPU funny (MitchAlsup)
 |              |   | `- Re: AMD CPU funny (Terje Mathisen)
 |              |   `* Re: AMD CPU funny (Tim Rentsch)
 |              |    +* Re: AMD CPU funny (Terje Mathisen)
 |              |    |`* Re: AMD CPU funny (Tim Rentsch)
 |              |    | +* Re: AMD CPU funny (MitchAlsup)
 |              |    | |`- Re: AMD CPU funny (Tim Rentsch)
 |              |    | `* Re: AMD CPU funny (Terje Mathisen)
 |              |    |  +* Re: AMD CPU funny (MitchAlsup)
 |              |    |  |+* Re: AMD CPU funny (Terje Mathisen)
 |              |    |  ||`* Re: AMD CPU funny (MitchAlsup)
 |              |    |  || +- Re: AMD CPU funny (Tim Rentsch)
 |              |    |  || `* Re: AMD CPU funny (Terje Mathisen)
 |              |    |  ||  `* Re: AMD CPU funny (Terje Mathisen)
 |              |    |  ||   +* Re: AMD CPU funny (Tim Rentsch)
 |              |    |  ||   |`* Re: AMD CPU funny (Terje Mathisen)
 |              |    |  ||   | `* Re: AMD CPU funny (Tim Rentsch)
 |              |    |  ||   |  `* Re: AMD CPU funny (Terje Mathisen)
 |              |    |  ||   |   `- Re: AMD CPU funny (Tim Rentsch)
 |              |    |  ||   `* Re: AMD CPU funny (MitchAlsup)
 |              |    |  ||    `* Re: AMD CPU funny (MitchAlsup)
 |              |    |  ||     `- Re: AMD CPU funny (Tim Rentsch)
 |              |    |  |`- Re: AMD CPU funny (Tim Rentsch)
 |              |    |  `* Re: AMD CPU funny (Tim Rentsch)
 |              |    |   `* Re: AMD CPU funny (Terje Mathisen)
 |              |    |    `* Re: AMD CPU funny (Tim Rentsch)
 |              |    |     `- Re: AMD CPU funny (Terje Mathisen)
 |              |    `* Re: AMD CPU funny (Thomas Koenig)
 |              |     +* Re: AMD CPU funny (Tim Rentsch)
 |              |     |`- Re: AMD CPU funny (Terje Mathisen)
 |              |     `* Re: AMD CPU funny (MitchAlsup)
 |              |      `* Re: AMD CPU funny (Paul A. Clayton)
 |              |       +* Re: AMD CPU funny (Anton Ertl)
 |              |       |+- Re: AMD CPU funny (MitchAlsup)
 |              |       |`* Re: AMD CPU funny (George Neuner)
 |              |       | `- Re: AMD CPU funny (Anton Ertl)
 |              |       `- Re: AMD CPU funny (BGB-Alt)
 |              `- Re: AMD CPU funny (Terje Mathisen)
 `* Re: AMD CPU funny (Andy Burns)
  +* Re: AMD CPU funny (Andy Burns)
  `- Re: AMD CPU funny (MitchAlsup)

Re: AMD CPU funny

<hVh*RTmyz@news.chiark.greenend.org.uk>

https://news.novabbs.org/devel/article-flat.php?id=35922&group=comp.arch#35922

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsfeed.xs3.de!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+news@chiark.greenend.org.uk (Theo)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: 20 Dec 2023 18:08:47 +0000 (GMT)
Organization: University of Cambridge, England
Message-ID: <hVh*RTmyz@news.chiark.greenend.org.uk>
References: <ulv7j4$l0o0$1@dont-email.me>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="12594"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Wed, 20 Dec 2023 18:08 UTC

Vir Campestris <vir.campestris@invalid.invalid> wrote:
> This is not the right group for this - but I don't know where is.
> Suggestions on a postcard please...

I'm crossposting this to comp.arch, where they may have some ideas.

> For reasons I won't go into I've been writing some code to evaluate
> memory performance on my AMD Ryzen 5 3400G.
>
> It says in the stuff I've found that each core has an 8-way set
> associative L1 data cache of 128k (and an L1 instruction cache); an L2
> cache of 512k, also set associative; and there's an L3 cache of 4MB.
>
> To measure the performance I have three nested loops.
>
> The innermost one goes around a loop incrementing a series of 64 bit
> memory locations. The length of the series is set by the outermost loop.
>
> The middle one repeats the innermost loop so that the number of memory
> accesses is constant regardless of the series length.
>
> The outermost one sets the series length. It starts at 1, and doubles it
> each time.
>
> I _thought_ what would happen is that as I increase the length of the
> series after a while the data won't fit in the cache, and I'll see a
> sudden slowdown.
>
> What I actually see is:
>
> With a series length of 56 to 128 bytes I get the highest speed.
>
> With a series length of 500B to 1.5MB, I get a consistent speed of about
> 2/3 the highest speed.
>
> Once the series length exceeds 1.5MB the speed drops, and is consistent
> from then on. That I can see is main memory speed, and is about 40% of
> the highest.
>
> OK so far.
>
> But...
> Series length 8B is about the same as the 56 to 128 speed. Series length
> 16B is a bit less. Series length 32 is a lot less. Not as slow as main
> memory, but not much more than half the peak speed. My next step up is
> the peak speed. Series length 144 to 448 is slower still - slower in
> fact than the main memory speed.
>
> WTF?
>
> I can post the code (C++, but not very complex) if that would help.

For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a single 64
bit word transferred?

What instruction sequences are being generated for the 8/16/32/64 byte
loops? I'm wondering if the compiler is using different instructions,
eg using MMX, SSE, AVX to do the operations. Maybe they are having
different caching behaviour?

It would help if you could tell us the compiler and platform you're using,
including version.

Theo
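Vir's three-loop scheme keeps the total work roughly constant by dividing a fixed budget of increments by the series length. A minimal sketch of that arithmetic (the helper names `accesses_for` and `shortfall` are hypothetical, not from the thread) shows why the count is exact for power-of-two lengths and only approximately constant for other lengths:

```cpp
#include <cassert>
#include <cstddef>

// Number of increments the nested loops perform for one series length:
// the middle loop runs total / set_size times (integer division), and
// each pass touches set_size elements.
std::size_t accesses_for(std::size_t total, std::size_t set_size)
{
    assert(set_size > 0 && set_size <= total);
    std::size_t repeats = total / set_size;  // middle-loop trip count
    return repeats * set_size;               // equals total only if set_size divides it
}

// How far short of `total` a run falls because of the integer division.
// Zero for every power-of-two set_size when total is itself a power of two.
std::size_t shortfall(std::size_t total, std::size_t set_size)
{
    return total - accesses_for(total, set_size);
}
```

With a budget of 100 and a series length of 3, the middle loop runs 33 times and the run does 99 increments instead of 100.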

Re: AMD CPU funny

<2548c6b928abfac1580528dd900ed823@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35926&group=comp.arch#35926

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Wed, 20 Dec 2023 18:58:35 +0000
Organization: novaBBS
Message-ID: <2548c6b928abfac1580528dd900ed823@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="566970"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$humhFVOVc.fuhHmVZfBt3e/Nub6bAOUoSUNm1khOoHM9AphwNxCyS
 by: MitchAlsup - Wed, 20 Dec 2023 18:58 UTC

Theo wrote:

> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>> This is not the right group for this - but I don't know where is.
>> Suggestions on a postcard please...

> I'm crossposting this to comp.arch, where they may have some ideas.

>> For reasons I won't go into I've been writing some code to evaluate
>> memory performance on my AMD Ryzen 5 3400G.
>>
>> It says in the stuff I've found that each core has an 8-way set
>> associative L1 data cache of 128k (and an L1 instruction cache); an L2
>> cache of 512k, also set associative; and there's an L3 cache of 4MB.
>>
>> To measure the performance I have three nested loops.
>>
>> The innermost one goes around a loop incrementing a series of 64 bit
>> memory locations. The length of the series is set by the outermost loop.
>>
>> The middle one repeats the innermost loop so that the number of memory
>> accesses is constant regardless of the series length.
>>
>> The outermost one sets the series length. It starts at 1, and doubles it
>> each time.
>>
>> I _thought_ what would happen is that as I increase the length of the
>> series after a while the data won't fit in the cache, and I'll see a
>> sudden slowdown.
>>
>> What I actually see is:
>>
>> With a series length of 56 to 128 bytes I get the highest speed.
>>
>> With a series length of 500B to 1.5MB, I get a consistent speed of about
>> 2/3 the highest speed.
>>
>> Once the series length exceeds 1.5MB the speed drops, and is consistent
>> from then on. That I can see is main memory speed, and is about 40% of
>> the highest.
>>
>> OK so far.
>>
>> But...
>> Series length 8B is about the same as the 56 to 128 speed. Series length
>> 16B is a bit less. Series length 32 is a lot less. Not as slow as main
>> memory, but not much more than half the peak speed. My next step up is
>> the peak speed. Series length 144 to 448 is slower still - slower in
>> fact than the main memory speed.
>>
>> WTF?
>>
>> I can post the code (C++, but not very complex) if that would help.

> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a single 64
> bit word transferred?

> What instruction sequences are being generated for the 8/16/32/64 byte
> loops? I'm wondering if the compiler is using different instructions,
> eg using MMX, SSE, AVX to do the operations. Maybe they are having
> different caching behaviour?

> It would help if you could tell us the compiler and platform you're using,
> including version.

> Theo

Can we see the code ??

Can you present a table of the timing results ??

Re: AMD CPU funny

<um1h21$13l52$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35950&group=comp.arch#35950

Path: i2pn2.org!rocksolid2!news.neodome.net!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 14:11:12 +0000
Organization: A noiseless patient Spider
Lines: 141
Message-ID: <um1h21$13l52$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 21 Dec 2023 14:11:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5d46455dea781c0d66cdade1098eb582";
logging-data="1168546"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18MEX7CdGbLl1OH0mVp4yWmESvCEE3Z4+8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:23RHPU2wNGg+71JnkrPmQUg1NBs=
In-Reply-To: <hVh*RTmyz@news.chiark.greenend.org.uk>
Content-Language: en-GB
 by: Vir Campestris - Thu, 21 Dec 2023 14:11 UTC

On 20/12/2023 18:08, Theo wrote:
> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>> This is not the right group for this - but I don't know where is.
>> Suggestions on a postcard please...
>
> I'm crossposting this to comp.arch, where they may have some ideas.
>
<snip>
>
> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a single 64
> bit word transferred?
>
Yes. My system has a 64 bit CPU and 64 bit main memory.

> What instruction sequences are being generated for the 8/16/32/64 byte
> loops? I'm wondering if the compiler is using different instructions,
> eg using MMX, SSE, AVX to do the operations. Maybe they are having
> different caching behaviour?
>
It's running the same loop each time, just with different values for
the loop sizes.

> It would help if you could tell us the compiler and platform you're using,
> including version.
>

g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Which of course tells you I'm running Ubuntu!

On 20/12/2023 18:58, MitchAlsup wrote:
>
> Can we see the code ??
>
> Can you present a table of the timing results ??

I've run this with more detailed increments on the line size, but here
are my results for powers of 2.

Size 1 gave 3.82242e+09 bytes/second.
Size 2 gave 3.80533e+09 bytes/second.
Size 4 gave 2.68017e+09 bytes/second.
Size 8 gave 2.33751e+09 bytes/second.
Size 16 gave 2.18424e+09 bytes/second.
Size 32 gave 2.10243e+09 bytes/second.
Size 64 gave 1.99371e+09 bytes/second.
Size 128 gave 1.98475e+09 bytes/second.
Size 256 gave 2.01653e+09 bytes/second.
Size 512 gave 2.00884e+09 bytes/second.
Size 1024 gave 2.02713e+09 bytes/second.
Size 2048 gave 2.01803e+09 bytes/second.
Size 4096 gave 3.26472e+09 bytes/second.
Size 8192 gave 3.85126e+09 bytes/second.
Size 16384 gave 3.85377e+09 bytes/second.
Size 32768 gave 3.85293e+09 bytes/second.
Size 65536 gave 2.06793e+09 bytes/second.
Size 131072 gave 2.06845e+09 bytes/second.

The code will continue, but the results are roughly stable for larger sizes.

I have put the code in a signature block, so there's no risk of someone
quoting it all again in a reply. I've commented it, but no doubt not in all
the right places! I'd be interested to know what results other people get.

Thanks
Andy
--
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // If your computer is much slower or faster than mine
    // you may need to adjust this value.
    constexpr size_t NextCount = 1 << 28;

    std::vector<uint64_t> CacheStore(NextCount, 0);

    // Get a raw pointer to the vector's data.
    // On my machine (Ubuntu, g++) this improves
    // performance. Using vector's operator[]
    // results in a function call.
    uint64_t *CachePtr = CacheStore.data();

    // SetSize is the count of the uint64_t items to be tested.
    // I assume that when this is too big for a cache the data
    // will overflow to the next level.
    // Each loop it doubles in size. I've run it with smaller
    // increments too, and the behaviour is still confusing.
    // (size_t rather than auto/int, to match NextCount.)
    for (size_t SetSize = 1; SetSize < NextCount; SetSize <<= 1)
    {
        size_t loopcount = 0;
        size_t j = NextCount / SetSize;
        auto start = std::chrono::steady_clock::now();

        // The outer loop repeats enough times so that the data
        // written by the inner loops of various sizes is
        // approximately constant.
        for (size_t k = 0; k < j; ++k)
        {
            // The inner loop modifies data
            // within a set of words.
            for (size_t l = 0; l < SetSize; ++l)
            {
                // Read-modify-write some data.
                // Comment this out to confirm that the
                // looping is not the cause of the anomaly.
                ++CachePtr[l];

                // This counts the actual number of memory
                // accesses. Rounding errors mean that for
                // different SetSize values the count is
                // not completely consistent.
                ++loopcount;
            }
        }

        // Work out how long the loops took in microseconds,
        // then scale to seconds.
        auto delta =
            std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - start).count()
            / 1e6;

        // Calculate how many bytes per second, and print.
        std::cout << "Size " << SetSize << " gave "
                  << (double)loopcount * (double)sizeof(uint64_t) / delta
                  << " bytes/second." << std::endl;
    }

    return 0;
}
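For anyone wanting to try variations, here is a hypothetical restructuring of the same measurement into a reusable function. `time_set_size`, `RunResult` and `bytes_per_second` are illustrative names, not from the post. The method is unchanged, but it uses `std::chrono::duration<double>` for the elapsed time and computes the access count directly instead of keeping a separate counter. As Theo suggested, it is worth checking the generated assembly, since the timing only means something if the compiler keeps the loops as written.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of one timed run: how many increments were done and how long they took.
struct RunResult {
    std::size_t accesses;
    double seconds;
};

// Same measurement as the posted program, for a single SetSize:
// increment buf[0..set_size) repeatedly until about `total` increments are done.
RunResult time_set_size(std::vector<std::uint64_t>& buf, std::size_t set_size,
                        std::size_t total)
{
    assert(set_size > 0 && set_size <= buf.size() && set_size <= total);
    std::uint64_t* p = buf.data();
    const std::size_t repeats = total / set_size;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < repeats; ++k)
        for (std::size_t l = 0; l < set_size; ++l)
            ++p[l];  // read-modify-write, as in the original inner loop
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return {repeats * set_size, elapsed.count()};
}

// Bytes per second as the original program reports it (8 bytes per increment).
double bytes_per_second(const RunResult& r)
{
    return r.seconds > 0 ? r.accesses * double(sizeof(std::uint64_t)) / r.seconds
                         : 0.0;
}
```

A driver would allocate one large buffer once and call `time_set_size` for each doubling of the series length, exactly as the original outer loop does.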

Re: AMD CPU funny

<MmYgN.93175$Wp_8.13425@fx17.iad>

https://news.novabbs.org/devel/article-flat.php?id=35954&group=comp.arch#35954

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: AMD CPU funny
Newsgroups: uk.comp.homebuilt,comp.arch
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
Lines: 17
Message-ID: <MmYgN.93175$Wp_8.13425@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Thu, 21 Dec 2023 14:56:44 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Thu, 21 Dec 2023 14:56:44 GMT
X-Received-Bytes: 1227
 by: Scott Lurndal - Thu, 21 Dec 2023 14:56 UTC

Vir Campestris <vir.campestris@invalid.invalid> writes:
>On 20/12/2023 18:08, Theo wrote:
>> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>>> This is not the right group for this - but I don't know where is.
>>> Suggestions on a postcard please...
>>
>> I'm crossposting this to comp.arch, where they may have some ideas.
>>
><snip>
>>
>> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a single 64
>> bit word transferred?
>>
>Yes. My system has a 64 bit CPU and 64 bit main memory.

Surely your main memory is just a sequence of 8-bit bytes (+ECC).

Re: AMD CPU funny

<um1lpq$14br5$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35956&group=comp.arch#35956

Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 15:32:10 +0000
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <um1lpq$14br5$2@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<MmYgN.93175$Wp_8.13425@fx17.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 21 Dec 2023 15:32:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5d46455dea781c0d66cdade1098eb582";
logging-data="1191781"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/tdlfHa6e3n8Wes7FoNSYhTbkR/qu+SKA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:t4Ac5igI1BghYQxA6MmgCe+QTPo=
Content-Language: en-GB
In-Reply-To: <MmYgN.93175$Wp_8.13425@fx17.iad>
 by: Vir Campestris - Thu, 21 Dec 2023 15:32 UTC

On 21/12/2023 14:56, Scott Lurndal wrote:
> Vir Campestris <vir.campestris@invalid.invalid> writes:
>> On 20/12/2023 18:08, Theo wrote:
>>> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>>>> This is not the right group for this - but I don't know where is.
>>>> Suggestions on a postcard please...
>>>
>>> I'm crossposting this to comp.arch, where they may have some ideas.
>>>
>> <snip>
>>>
>>> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a single 64
>>> bit word transferred?
>>>
>> Yes. My system has a 64 bit CPU and 64 bit main memory.
>
> Surely your main memory is just a sequence of 8-bit bytes (+ECC).
>

AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up on
hardware these days, but it used to be if you wanted to write a byte
into your 16-bit memory you had to read both bytes, then write one back.

And
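The older scheme Vir recalls, writing one byte into 16-bit-wide memory by reading the whole word and writing it back, amounts to a masked merge of one byte lane. A minimal sketch, using a hypothetical helper `merge_byte` and assuming lane 0 is the low-order byte:

```cpp
#include <cassert>
#include <cstdint>

// Read-modify-write of one byte inside a 16-bit word: clear the target
// byte lane, then OR in the new value. Lane 0 is the low-order byte
// (an assumption for this sketch).
std::uint16_t merge_byte(std::uint16_t word, unsigned lane, std::uint8_t value)
{
    assert(lane < 2);
    const unsigned shift = lane * 8;
    const std::uint16_t mask = static_cast<std::uint16_t>(0xFFu << shift);
    return static_cast<std::uint16_t>((word & ~mask) |
                                      (std::uint16_t{value} << shift));
}
```

For example, writing `0xAB` into lane 1 of `0x1234` yields `0xAB34`; the untouched lane is carried over from the value that was read.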

Re: AMD CPU funny

<20231221180933.00007baf@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=35957&group=comp.arch#35957

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 18:09:33 +0200
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <20231221180933.00007baf@yahoo.com>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk>
<um1h21$13l52$1@dont-email.me>
<MmYgN.93175$Wp_8.13425@fx17.iad>
<um1lpq$14br5$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="ad61e48ed7f6a2e90dee9db879d36192";
logging-data="1140154"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18aVHmfdfUVUBSYSOwTYUDb+YxQZfAQo+U="
Cancel-Lock: sha1:iyWBm9GCX/pU9SHfi+4zO1bauiE=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Thu, 21 Dec 2023 16:09 UTC

On Thu, 21 Dec 2023 15:32:10 +0000
Vir Campestris <vir.campestris@invalid.invalid> wrote:

> On 21/12/2023 14:56, Scott Lurndal wrote:
> > Vir Campestris <vir.campestris@invalid.invalid> writes:
> >> On 20/12/2023 18:08, Theo wrote:
> >>> Vir Campestris <vir.campestris@invalid.invalid> wrote:
> >>>> This is not the right group for this - but I don't know where is.
> >>>> Suggestions on a postcard please...
> >>>
> >>> I'm crossposting this to comp.arch, where they may have some
> >>> ideas.
> >> <snip>
> >>>
> >>> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a
> >>> single 64 bit word transferred?
> >>>
> >> Yes. My system has a 64 bit CPU and 64 bit main memory.
> >
> > Surely your main memory is just a sequence of 8-bit bytes (+ECC).
> >
>
> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up on
> hardware these days, but it used to be if you wanted to write a byte
> into your 16-bit memory you had to read both bytes, then write one
> back.
>
> And

DIMMs have 64-bit data buses, both these days and in the previous
millennium. These days a DDR5 DIMM splits its 64-bit data bus into a pair
of independent 32-bit channels, but the total is still 64 bits.

That's the physical perspective. From the logical perspective, DDR3 and
DDR4 DIMMs exchange data with the controller in 512-bit chunks. On a DDR5
DIMM each channel exchanges data with the controller in 512-bit chunks.

From the signaling perspective, it is still possible (at least on non-ECC
gear) to tell the memory to write just 8 bits out of 512 and to ignore
the rest. In PC-class hardware this ability is used very rarely, if
at all.
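The 512-bit figure follows from bus width times burst length. The small check below assumes the standard burst lengths (BL8 for DDR3/DDR4, BL16 for DDR5) and non-ECC widths, neither of which the post spells out. The helper names are hypothetical.

```cpp
#include <cassert>

// Bits moved by one DRAM burst = data-bus width x burst length.
constexpr unsigned burst_bits(unsigned bus_width_bits, unsigned burst_length)
{
    return bus_width_bits * burst_length;
}

// DDR3/DDR4 DIMM: one 64-bit channel, burst length 8.
static_assert(burst_bits(64, 8) == 512, "DDR3/DDR4 burst");
// DDR5 DIMM: two independent 32-bit subchannels, burst length 16 each.
static_assert(burst_bits(32, 16) == 512, "DDR5 subchannel burst");

// Bytes per burst; the bit count must be a whole number of bytes.
inline unsigned burst_bytes(unsigned bus_width_bits, unsigned burst_length)
{
    const unsigned bits = burst_bits(bus_width_bits, burst_length);
    assert(bits % 8 == 0);
    return bits / 8;  // 512 bits -> 64 bytes, one cache line
}
```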

Re: AMD CPU funny

<kujd2vFg48cU4@mid.individual.net>

https://news.novabbs.org/devel/article-flat.php?id=35961&group=comp.arch#35961

Path: i2pn2.org!i2pn.org!news.neodome.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: usenet@andyburns.uk (Andy Burns)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 18:05:20 +0000
Lines: 15
Message-ID: <kujd2vFg48cU4@mid.individual.net>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<MmYgN.93175$Wp_8.13425@fx17.iad> <um1lpq$14br5$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net m+QGsRkRURMAmyLUBYDhcQwMuzc05R7qzg5d5r7qteU21rUip0
Cancel-Lock: sha1:zgAhE+TO23AYNWiLNuOy87d5Rp8= sha256:+M1xODpCwJKuXAAJCXd/qJ4YvR2/1vfngYcOrEaZfuY=
User-Agent: Mozilla Thunderbird
Content-Language: en-GB
In-Reply-To: <um1lpq$14br5$2@dont-email.me>
 by: Andy Burns - Thu, 21 Dec 2023 18:05 UTC

Vir Campestris wrote:

> Scott Lurndal wrote:
>
>> Surely your main memory is just a sequence of 8-bit bytes (+ECC).
>
> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up on
> hardware these days, but it used to be if you wanted to write a byte
> into your 16-bit memory you had to read both bytes, then write one back.

I thought Intel x64 machines work in cache lines of 64 bytes per memory
transaction ... maybe AMD processors are different?
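If traffic moves in whole lines, what matters for a series of `uint64_t` values is how many 64-byte lines it spans, and that depends on where the buffer starts. Here is a sketch assuming 64-byte lines. `lines_touched` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes.
constexpr std::uintptr_t kLineBytes = 64;

// Number of distinct 64-byte lines covered by `n_elems` consecutive
// uint64_t values starting at byte address `base`.
std::size_t lines_touched(std::uintptr_t base, std::size_t n_elems)
{
    assert(n_elems > 0);
    const std::uintptr_t first = base / kLineBytes;
    const std::uintptr_t last =
        (base + n_elems * sizeof(std::uint64_t) - 1) / kLineBytes;
    return static_cast<std::size_t>(last - first + 1);
}
```

A misaligned start (for example `base = 16`) makes an 8-element series straddle two lines. A plain `std::vector<uint64_t>`, as in the posted code, does not guarantee 64-byte alignment of its buffer.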

Re: AMD CPU funny

<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35965&group=comp.arch#35965

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 18:30:47 +0000
Organization: novaBBS
Message-ID: <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="678294"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$QwaqnM60cozv6s50xj06Oe5JQmj82Xc0eg7wG38hmp/MzAPKvfa6.
X-Spam-Checker-Version: SpamAssassin 4.0.0
 by: MitchAlsup - Thu, 21 Dec 2023 18:30 UTC

Vir Campestris wrote:

> On 20/12/2023 18:58, MitchAlsup wrote:
> >
> > Can we see the code ??
> >
> > Can you present a table of the timing results ??

> I've run this with more detailed increments on the line size, but here
> are my results for powers of 2.

{ A
> Size 1 gave 3.82242e+09 bytes/second.
> Size 2 gave 3.80533e+09 bytes/second.
> Size 4 gave 2.68017e+09 bytes/second.
> Size 8 gave 2.33751e+09 bytes/second.
> Size 16 gave 2.18424e+09 bytes/second.
> Size 32 gave 2.10243e+09 bytes/second.
> Size 64 gave 1.99371e+09 bytes/second.
> Size 128 gave 1.98475e+09 bytes/second.
> Size 256 gave 2.01653e+09 bytes/second.
> Size 512 gave 2.00884e+09 bytes/second.
> Size 1024 gave 2.02713e+09 bytes/second.
> Size 2048 gave 2.01803e+09 bytes/second.
} A
{ B
> Size 4096 gave 3.26472e+09 bytes/second.
> Size 8192 gave 3.85126e+09 bytes/second.
> Size 16384 gave 3.85377e+09 bytes/second.
> Size 32768 gave 3.85293e+09 bytes/second.
> Size 65536 gave 2.06793e+09 bytes/second.
> Size 131072 gave 2.06845e+09 bytes/second.
} B

A) Here we have a classical sequence that pipelines often encounter, where
a simple loop starts off fast (4 bytes/cycle) and progressively slows
down by a factor of 2 (2 bytes/cycle) over some interval of
complexity (size). What is important is the factor of 2: something
that took 1 cycle early on starts taking 2 cycles later.

B) Here we have a second classical sequence, where at some boundary (4096)
the performance reverts to the 1-cycle pipeline rate, only to degrade
again (in basically the same sequence as A). {{Side note: size=4096 has
"flutter" in the stairstep. size={8192..32768} has peak performance--probably
something to do with sets in the cache, and 4096 is the size of a page
(TLB effects).}}
I suspect the flutter has something to do with your buffer crossing
a page boundary.
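When reading the table, note that `Size` counts `uint64_t` elements (per the `SetSize` comment in the posted code), so each row's working set in bytes is eight times the printed value. Here is a quick conversion sketch. The helper names are hypothetical, and 4 KiB pages plus a page-aligned start are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The "Size" printed by the posted program is SetSize, a count of
// uint64_t elements, so the working set in bytes is Size * 8.
constexpr std::size_t footprint_bytes(std::size_t set_size)
{
    return set_size * sizeof(std::uint64_t);
}

// Pages spanned by the working set if it starts on a page boundary
// (an assumption: the real buffer need not be page-aligned).
inline std::size_t pages_spanned(std::size_t set_size, std::size_t page = 4096)
{
    assert(page > 0);
    return (footprint_bytes(set_size) + page - 1) / page;
}
```

By this conversion, the rows for sizes 2048 and 4096 correspond to working sets of 16 KiB and 32 KiB respectively.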

> The code will continue, but the results are roughly stable for larger sizes.

> The code I have put in a signature block; there's no point in risking
> someone posting it again. I've commented it, but no doubt not in all the
> right places! I'd be interested to know what results other people get.

> Thanks
> Andy

Re: AMD CPU funny

<fbf6155232b1fc656e561e16bd44bafc@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35966&group=comp.arch#35966

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 18:36:42 +0000
Organization: novaBBS
Message-ID: <fbf6155232b1fc656e561e16bd44bafc@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <MmYgN.93175$Wp_8.13425@fx17.iad> <um1lpq$14br5$2@dont-email.me> <20231221180933.00007baf@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="679001"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$8TJVa8XarXQF/Mp9Rgg4QeQhjYa9ffcWOfHlTWGhWIftmkcYWIqvW
 by: MitchAlsup - Thu, 21 Dec 2023 18:36 UTC

Michael S wrote:

> On Thu, 21 Dec 2023 15:32:10 +0000
> Vir Campestris <vir.campestris@invalid.invalid> wrote:

>> On 21/12/2023 14:56, Scott Lurndal wrote:
>> > Vir Campestris <vir.campestris@invalid.invalid> writes:
>> >> On 20/12/2023 18:08, Theo wrote:
>> >>> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>> >>>> This is not the right group for this - but I don't know where is.
>> >>>> Suggestions on a postcard please...
>> >>>
>> >>> I'm crossposting this to comp.arch, where they may have some
>> >>> ideas.
>> >> <snip>
>> >>>
>> >>> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a
>> >>> single 64 bit word transferred?
>> >>>
>> >> Yes. My system has a 64 bit CPU and 64 bit main memory.
>> >
>> > Surely your main memory is just a sequence of 8-bit bytes (+ECC).
>> >
>>
>> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up on
>> hardware these days, but it used to be if you wanted to write a byte
>> into your 16-bit memory you had to read both bytes, then write one
>> back.
>>
>> And

> DIMMs have 64-bit data buses. Both these days and previous millennium.
> Now, these days DDR5 DIMM splits 64-bit data bus into pair of
> independent 32-bit channels, but the total is still 64 bits.

All DDR DIMMs have 64-bit busses (200 pins on the DIMM).
Some SDR (pre 2000) DIMMs had 32-bit busses, some ancient laptop memory
carriers had 32-bit busses.

> That's a physical perspective. From a logical perspective, DDR3 and DDR4
> DIMMs exchange data with the controller in 512-bit chunks. On a DDR5 DIMM,
> each channel exchanges data with the controller in 512-bit chunks.

Note: 512 bits is 64 bytes.
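The 512-bit figure is just data-bus width times burst length. A minimal sketch of that arithmetic, assuming the standard burst lengths (BL8 on the 64-bit DDR3/DDR4 bus, BL16 on each 32-bit DDR5 channel) and ignoring ECC bits; the function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes moved by one DRAM burst: data-bus width (bits) x burst length (beats) / 8. */
static unsigned burst_bytes(unsigned bus_width_bits, unsigned burst_length)
{
    return bus_width_bits * burst_length / 8u;
}

/* Both generations move one 64-byte (512-bit) chunk per burst. */
static void check_burst_sizes(void)
{
    assert(burst_bytes(64, 8) == 64);  /* DDR3/DDR4: 64-bit bus, BL8 */
    assert(burst_bytes(32, 16) == 64); /* DDR5: one 32-bit channel, BL16 */
}

static void print_burst_sizes(void)
{
    printf("DDR4 DIMM:    %u bytes per burst\n", burst_bytes(64, 8));
    printf("DDR5 channel: %u bytes per burst\n", burst_bytes(32, 16));
}
```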

> From a signaling perspective, it is still possible (at least on non-ECC
> gear) to tell the memory to write just 8 bits out of 512 and to ignore
> the rest. In PC-class hardware this ability is used very rarely, if
> at all.

Never justified when the CPU uses a cache, so only unCacheable (sized)
requests can use this. AMD memory controllers hid this from the DRAMs
so we at least had the opportunity to recompute ECC on ECC-carrying
DIMMs. The MC would ask the DRC for the line of data, check (and repair)
the ECC, then write the unCacheable data, and place the written data in
the outbound memory queue with its new ECC.
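The read-modify-write sequence above can be sketched as a toy model. This is not AMD's controller logic: the 64-byte line, the byte-enable mask, and especially the check function (a per-word XOR standing in for real SECDED ECC, so it can detect some errors but cannot repair any) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64
#define WORD_BYTES 8
#define WORDS_PER_LINE (LINE_BYTES / WORD_BYTES)

/* A line as held in DRAM: data plus one check byte per 64-bit word. */
struct dram_line {
    uint8_t data[LINE_BYTES];
    uint8_t ecc[WORDS_PER_LINE];
};

/* Toy check byte: XOR of the word's 8 bytes (stand-in for real SECDED). */
static uint8_t toy_ecc(const uint8_t *word)
{
    uint8_t x = 0;
    for (int i = 0; i < WORD_BYTES; i++)
        x ^= word[i];
    return x;
}

static int line_ecc_ok(const struct dram_line *line)
{
    for (int w = 0; w < WORDS_PER_LINE; w++)
        if (toy_ecc(&line->data[w * WORD_BYTES]) != line->ecc[w])
            return 0;
    return 1;
}

/* Partial (uncacheable) write: check the full line that was read, merge only
 * the enabled bytes, then recompute ECC before the line is queued for writing. */
static int masked_write(struct dram_line *line, const uint8_t *new_data,
                        uint64_t byte_enable)
{
    if (!line_ecc_ok(line))
        return -1; /* a real controller would attempt correction here */
    for (int b = 0; b < LINE_BYTES; b++)
        if (byte_enable & (UINT64_C(1) << b))
            line->data[b] = new_data[b];
    for (int w = 0; w < WORDS_PER_LINE; w++)
        line->ecc[w] = toy_ecc(&line->data[w * WORD_BYTES]);
    return 0;
}
```

The point of the sketch is that a one-byte write still costs a full-line read, because the check bits cover the whole word, not the single byte.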

Re: AMD CPU funny

<um2a7s$17ium$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35970&group=comp.arch#35970

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 13:20:59 -0800
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <um2a7s$17ium$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<MmYgN.93175$Wp_8.13425@fx17.iad> <um1lpq$14br5$2@dont-email.me>
<kujd2vFg48cU4@mid.individual.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 21 Dec 2023 21:21:00 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d5d40264675499f3691087d5c7ec9ace";
logging-data="1297366"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+gB28jP4UY35tYw8ocAECbqSytWRBsMqg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:COo5seBD/YVr6vR/zs1beKEJMO4=
In-Reply-To: <kujd2vFg48cU4@mid.individual.net>
Content-Language: en-US
 by: Chris M. Thomasson - Thu, 21 Dec 2023 21:20 UTC

On 12/21/2023 10:05 AM, Andy Burns wrote:
> Vir Campestris wrote:
>
>> Scott Lurndal wrote:
>>
>>> Surely your main memory is just a sequence of 8-bit bytes (+ECC).
>>
>> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up on
>> hardware these days, but it used to be if you wanted to write a byte
>> into your 16-bit memory you had to read both bytes, then write one back.
>
> I thought Intel x64 machines work in cache lines of 64 bytes per memory
> transaction ... maybe AMD processors are different?
>
>
>

Remember when Intel had a false sharing problem when they had 128-byte cache
lines split into two 64-byte regions? IIRC, it was on some of their first
hyperthreaded processors. A workaround from Intel was to offset each thread's
stack using alloca... I remember it.
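A minimal sketch of that style of workaround, assuming a glibc/Linux toolchain (`alloca` and `<alloca.h>` are not ISO C). Each thread burns a different amount of stack before calling its real work function, so the hot frames of sibling threads start at different addresses. The 1024-byte stride and every name here are hypothetical, not Intel's published values.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread stride; the right value depends on the CPU. */
#define STACK_OFFSET_STRIDE 1024

typedef void (*work_fn)(int thread_index);

/* Call this first thing in each thread's start routine. */
static void run_with_stack_offset(int thread_index, work_fn work)
{
    /* Shift every frame called below this point down by a per-thread amount. */
    volatile char *pad = alloca((size_t)thread_index * STACK_OFFSET_STRIDE + 1);
    pad[0] = 0;
    work(thread_index);
    (void)pad[0]; /* keep the pad (and so the offset) alive across the call */
}

/* Demo work function: records the address of one of its locals. */
static uintptr_t frame_address[8];

static void record_frame_address(int thread_index)
{
    volatile char marker = 0;
    frame_address[thread_index] = (uintptr_t)&marker;
}
```

In a real program `run_with_stack_offset(i, worker)` would be called from the start routine of thread i; here it can be called twice in a row to see the frames land a stride apart.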

Re: AMD CPU funny

<20231222005507.00005e06@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=35974&group=comp.arch#35974

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 22 Dec 2023 00:55:07 +0200
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <20231222005507.00005e06@yahoo.com>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk>
<um1h21$13l52$1@dont-email.me>
<MmYgN.93175$Wp_8.13425@fx17.iad>
<um1lpq$14br5$2@dont-email.me>
<20231221180933.00007baf@yahoo.com>
<fbf6155232b1fc656e561e16bd44bafc@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="a3b4ac901d1c269f864952912c6e40c2";
logging-data="1321646"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+tc4nfzgzap5rs3HjrRCww2pjMmFf7NIg="
Cancel-Lock: sha1:I4jNsQZtB27lEuHJJi4ZdhJxQJo=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Thu, 21 Dec 2023 22:55 UTC

On Thu, 21 Dec 2023 18:36:42 +0000
mitchalsup@aol.com (MitchAlsup) wrote:

> Michael S wrote:
>
> > On Thu, 21 Dec 2023 15:32:10 +0000
> > Vir Campestris <vir.campestris@invalid.invalid> wrote:
>
> >> On 21/12/2023 14:56, Scott Lurndal wrote:
> >> > Vir Campestris <vir.campestris@invalid.invalid> writes:
> >> >> On 20/12/2023 18:08, Theo wrote:
> >> >>> Vir Campestris <vir.campestris@invalid.invalid> wrote:
> >> >>>> This is not the right group for this - but I don't know where
> >> >>>> is. Suggestions on a postcard please...
> >> >>>
> >> >>> I'm crossposting this to comp.arch, where they may have some
> >> >>> ideas.
> >> >> <snip>
> >> >>>
> >> >>> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a
> >> >>> single 64 bit word transferred?
> >> >>>
> >> >> Yes. My system has a 64 bit CPU and 64 bit main memory.
> >> >
> >> > Surely your main memory is just a sequence of 8-bit bytes (+ECC).
> >> >
> >>
> >> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up
> >> on hardware these days, but it used to be if you wanted to write a
> >> byte into your 16-bit memory you had to read both bytes, then
> >> write one back.
> >>
> >> And
>
> > DIMMs have 64-bit data buses, both these days and in the previous
> > millennium. Now, a DDR5 DIMM splits its 64-bit data bus into a
> > pair of independent 32-bit channels, but the total is still 64
> > bits.
>
> All DDR DIMMs have 64-bit busses (200 pins on the DIMM).
> Some SDR (pre 2000) DIMMs had 32-bit busses,

Are you sure? My impression always was that the word DIMM was invented
to clearly separate 64-bit DIMMs from 32-bit SIMMs.

> some ancient laptop
> memory carriers had 32-bit busses.

But were they called just DIMM or SO-DIMM?
The latter indeed had a 72-pin variant with a 32-bit bus.

>
> > That's a physical perspective. From a logical perspective, DDR3 and
> > DDR4 DIMMs exchange data with the controller in 512-bit chunks. On a
> > DDR5 DIMM, each channel exchanges data with the controller in 512-bit chunks.
> >
>
> Note: 512 bits is 64 bytes.
>

Which, *not* coincidentally, matches the cache line size of the majority
of x86 CPUs.
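On Linux with glibc, the L1 data cache line size can be checked at run time. A minimal sketch: `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension to `sysconf`, it may report 0 or -1 where the value is not exposed (some VMs, for example), and the 64-byte fallback is an assumption based on the common x86 value.

```c
#include <assert.h>
#include <unistd.h>

/* L1 data cache line size in bytes, or a 64-byte fallback if unknown. */
static long cache_line_size(void)
{
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return n > 0 ? n : 64; /* assumption: 64 bytes, typical of x86 */
}
```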

> > From a signaling perspective, it is still possible (at least on
> > non-ECC gear) to tell the memory to write just 8 bits out of 512 and
> > to ignore the rest. In PC-class hardware this ability is used very
> > rarely, if at all.
>
> Never justified when the CPU uses a cache, so only unCacheable (sized)
> requests can use this. AMD memory controllers hid this from the DRAMs
> so we at least had the opportunity to recompute ECC on ECC-carrying
> DIMMs. The MC would ask the DRC for the line of data, check (and repair)
> the ECC, then write the unCacheable data, and place the written data in
> the outbound memory queue with its new ECC.

Re: AMD CPU funny

<fb494526330df638d21a061b10f9190d@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35975&group=comp.arch#35975

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 23:49:20 +0000
Organization: novaBBS
Message-ID: <fb494526330df638d21a061b10f9190d@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <MmYgN.93175$Wp_8.13425@fx17.iad> <um1lpq$14br5$2@dont-email.me> <20231221180933.00007baf@yahoo.com> <fbf6155232b1fc656e561e16bd44bafc@news.novabbs.com> <20231222005507.00005e06@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="706018"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$zqm.AO6zYr0x5oruo1a9.ulzj1LZzE7XT15TJDyjDdUBa4B8n.jj2
X-Spam-Checker-Version: SpamAssassin 4.0.0
 by: MitchAlsup - Thu, 21 Dec 2023 23:49 UTC

Michael S wrote:

> On Thu, 21 Dec 2023 18:36:42 +0000
> mitchalsup@aol.com (MitchAlsup) wrote:

>> Michael S wrote:
>>
>> > On Thu, 21 Dec 2023 15:32:10 +0000
>> > Vir Campestris <vir.campestris@invalid.invalid> wrote:
>>
>> >> On 21/12/2023 14:56, Scott Lurndal wrote:
>> >> > Vir Campestris <vir.campestris@invalid.invalid> writes:
>> >> >> On 20/12/2023 18:08, Theo wrote:
>> >> >>> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>> >> >>>> This is not the right group for this - but I don't know where
>> >> >>>> is. Suggestions on a postcard please...
>> >> >>>
>> >> >>> I'm crossposting this to comp.arch, where they may have some
>> >> >>> ideas.
>> >> >> <snip>
>> >> >>>
>> >> >>> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a
>> >> >>> single 64 bit word transferred?
>> >> >>>
>> >> >> Yes. My system has a 64 bit CPU and 64 bit main memory.
>> >> >
>> >> > Surely your main memory is just a sequence of 8-bit bytes (+ECC).
>> >> >
>> >>
>> >> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up
>> >> on hardware these days, but it used to be if you wanted to write a
>> >> byte into your 16-bit memory you had to read both bytes, then
>> >> write one back.
>> >>
>> >> And
>>
>> > DIMMs have 64-bit data buses, both these days and in the previous
>> > millennium. Now, a DDR5 DIMM splits its 64-bit data bus into a
>> > pair of independent 32-bit channels, but the total is still 64
>> > bits.
>>
>> All DDR DIMMs have 64-bit busses (200 pins on the DIMM).
>> Some SDR (pre 2000) DIMMs had 32-bit busses,

> Are you sure? My impression always was that the word DIMM was invented
> to clearly separate 64-bit DIMMs from 32-bit SIMMs.

Dual In-Line Memory Module means they have pins on both sides of the
board where the pins make contact with the socket. They put pins on both
sides so they could get ~200 pins {Vdd, Gnd, clock, reset, control,
and data}; this was in the early 1990s.

>> some ancient laptop
>> memory carriers had 32-bit busses.

> But were they called just DIMM or SO-DIMM?
> The latter indeed had a 72-pin variant with a 32-bit bus.

That sounds right.

>>
>> > That's a physical perspective. From a logical perspective, DDR3 and
>> > DDR4 DIMMs exchange data with the controller in 512-bit chunks. On a
>> > DDR5 DIMM, each channel exchanges data with the controller in 512-bit chunks.
>> >
>>
>> Note: 512 bits is 64 bytes.
>>

> Which, *not* coincidentally, matches the cache line size of the majority
> of x86 CPUs.

Given that x86, at the time the standards committee was doing the first one,
represented 90% of all computers being sold, and that the people on the
committee wanted to keep it that way, that is unsurprising.

>> > From a signaling perspective, it is still possible (at least on
>> > non-ECC gear) to tell the memory to write just 8 bits out of 512 and
>> > to ignore the rest. In PC-class hardware this ability is used very
>> > rarely, if at all.
>>
>> Never justified when the CPU uses a cache, so only unCacheable (sized)
>> requests can use this. AMD memory controllers hid this from the DRAMs
>> so we at least had the opportunity to recompute ECC on ECC-carrying
>> DIMMs. The MC would ask the DRC for the line of data, check (and repair)
>> the ECC, then write the unCacheable data, and place the written data in
>> the outbound memory queue with its new ECC.

Re: AMD CPU funny

<um2s8s$19tfe$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35976&group=comp.arch#35976

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Thu, 21 Dec 2023 18:28:43 -0800
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <um2s8s$19tfe$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<MmYgN.93175$Wp_8.13425@fx17.iad> <um1lpq$14br5$2@dont-email.me>
<kujd2vFg48cU4@mid.individual.net> <um2a7s$17ium$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Dec 2023 02:28:44 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="976eed046f8a4bb8ca380aeef2438635";
logging-data="1373678"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18mCsCIKymR3FidxeJgOs7vOWz2yqM45gE="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:sZEGi6O9LDap8X4WM3Rb2bsJzAc=
In-Reply-To: <um2a7s$17ium$1@dont-email.me>
Content-Language: en-US
 by: Chris M. Thomasson - Fri, 22 Dec 2023 02:28 UTC

On 12/21/2023 1:20 PM, Chris M. Thomasson wrote:
> On 12/21/2023 10:05 AM, Andy Burns wrote:
>> Vir Campestris wrote:
>>
>>> Scott Lurndal wrote:
>>>
>>>> Surely your main memory is just a sequence of 8-bit bytes (+ECC).
>>>
>>> AIUI I have two DIMMs, each of which has a 32-bit bus. I'm not up on
>>> hardware these days, but it used to be if you wanted to write a byte
>>> into your 16-bit memory you had to read both bytes, then write one back.
>>
>> I thought Intel x64 machines work in cache lines of 64 bytes per
>> memory transaction ... maybe AMD processors are different?
>>
>>
>>
>
> Remember when Intel had a false sharing problem when they had 128-byte cache
> lines split into two 64-byte regions? IIRC, it was on some of their first
> hyperthreaded processors. A workaround from Intel was to offset each thread's
> stack using alloca... I remember it.

It was a nightmare; however, at least the workaround did help performance
a bit.

Re: AMD CPU funny

<um43of$1j3jj$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35978&group=comp.arch#35978

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 22 Dec 2023 13:42:39 +0000
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <um43of$1j3jj$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Dec 2023 13:42:39 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1a4df9b3c3237271b090b937785a3a31";
logging-data="1674867"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19duWIKt7685pHIcGb2B2hFNVlX9WkxgeA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:znEiJwBbyQse6x7f1e/V0Z+nZCo=
In-Reply-To: <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
Content-Language: en-GB
 by: Vir Campestris - Fri, 22 Dec 2023 13:42 UTC

On 21/12/2023 18:30, MitchAlsup wrote:
>> I've run this with more detailed increments on the line size, but here
>> are my results for powers of 2.
>
> { A
>> Size 1 gave 3.82242e+09 bytes/second.
>> Size 2 gave 3.80533e+09 bytes/second.
>> Size 4 gave 2.68017e+09 bytes/second.
>> Size 8 gave 2.33751e+09 bytes/second.
>> Size 16 gave 2.18424e+09 bytes/second.
>> Size 32 gave 2.10243e+09 bytes/second.
>> Size 64 gave 1.99371e+09 bytes/second.
>> Size 128 gave 1.98475e+09 bytes/second.
>> Size 256 gave 2.01653e+09 bytes/second.
>> Size 512 gave 2.00884e+09 bytes/second.
>> Size 1024 gave 2.02713e+09 bytes/second.
>> Size 2048 gave 2.01803e+09 bytes/second.
> } A
> { B
>> Size 4096 gave 3.26472e+09 bytes/second.
>> Size 8192 gave 3.85126e+09 bytes/second.
>> Size 16384 gave 3.85377e+09 bytes/second.
>> Size 32768 gave 3.85293e+09 bytes/second.
>> Size 65536 gave 2.06793e+09 bytes/second.
>> Size 131072 gave 2.06845e+09 bytes/second.
> } B
>
> A) Here we have a classical sequence pipelines often encounter, where
> a simple loop starts off fast (4 bytes/cycle) and progressively slows
> down by a factor of 2 (2 bytes/cycle) over some interval of
> complexity (size). What is important is that factor of 2: something
> that took 1 cycle early on starts taking 2 cycles later.
>
> B) Here we have a second classical sequence, where the performance at
> some boundary (4096) reverts to the 1-cycle pipeline performance,
> only to degrade again (in basically the same sequence as A). {{Side
> note: size=4096 has "flutter" in the stairstep. size={8192..32768}
> has peak performance--probably something to do with sets in the cache,
> and 4096 is the size of a page (TLB effects)}}.
> I suspect the flutter has something to do with your buffer crossing a
> page boundary.
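One way to test the page-crossing hypothesis is to allocate the buffer on a page boundary and check whether a given access range straddles a page. A minimal sketch, assuming 4 KiB pages (huge pages would change this) and C11 `aligned_alloc`; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */

/* Allocate `bytes` rounded up to whole pages, starting on a page boundary. */
static void *alloc_page_aligned(size_t bytes)
{
    size_t rounded = (bytes + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED
                     * PAGE_SIZE_ASSUMED;
    /* aligned_alloc needs the size to be a multiple of the alignment */
    return aligned_alloc(PAGE_SIZE_ASSUMED, rounded);
}

/* Nonzero if the range [p, p + len) touches more than one page. */
static int crosses_page(const void *p, size_t len)
{
    assert(len > 0);
    uintptr_t first = (uintptr_t)p / PAGE_SIZE_ASSUMED;
    uintptr_t last = ((uintptr_t)p + len - 1) / PAGE_SIZE_ASSUMED;
    return first != last;
}
```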

I ran it with the memory access commented out. When I do that the
overall time is consistent regardless of how many times it goes around
the inner loop, and how many times the outer one.

But pipelines are funny things.

64K x 64-bit words is, I think, my L3 cache size. It's not very big on my
CPU; they had to fit a GPU on the die.

I'd be interested to see the results from anyone else's computer.

Andy
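The program itself isn't shown in the thread, so the following is only a hypothetical reconstruction of that kind of test, not Vir's code: sweep repeatedly over a buffer of `words` 64-bit words until a fixed total number of words has been read, and report bytes per second. It assumes POSIX `clock_gettime` with `CLOCK_MONOTONIC`; the names and the total-work parameter are illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Sum every word of a `words`-long buffer, repeating until at least
 * `total_words` words have been read; returns bytes per second. */
static double sweep_bytes_per_second(size_t words, size_t total_words,
                                     volatile uint64_t *sink)
{
    assert(words > 0);
    uint64_t *buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < words; i++)
        buf[i] = i;

    struct timespec t0, t1;
    uint64_t sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t done = 0; done < total_words; done += words)
        for (size_t i = 0; i < words; i++)
            sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = sum; /* publish the result so the loop is not optimized away */
    free(buf);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    size_t words_read = (total_words + words - 1) / words * words;
    return secs > 0 ? (double)words_read * sizeof(uint64_t) / secs : 0.0;
}
```

A driver would call this for `words` = 1, 2, 4, ... and print each result; keeping the total work constant means every size does the same number of loads, so the sizes are comparable.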

Re: AMD CPU funny

<eVh*Lrxyz@news.chiark.greenend.org.uk>

https://news.novabbs.org/devel/article-flat.php?id=35983&group=comp.arch#35983

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+news@chiark.greenend.org.uk (Theo)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: 22 Dec 2023 18:12:36 +0000 (GMT)
Organization: University of Cambridge, England
Message-ID: <eVh*Lrxyz@news.chiark.greenend.org.uk>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="31149"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Fri, 22 Dec 2023 18:12 UTC

In comp.arch Vir Campestris <vir.campestris@invalid.invalid> wrote:
> I'd be interested to see the results from anyone else's computer.

$ neofetch --off
OS: Kubuntu 23.10 x86_64
Host: 20QBS03N00 ThinkPad X1 Titanium Gen 1
Kernel: 6.5.0-14-generic
CPU: 11th Gen Intel i5-1130G7 (8) @ 4.000GHz
GPU: Intel Tiger Lake-UP4 GT2 [Iris Xe Graphics]
Memory: 4436MiB / 15704MiB
$ gcc --version
gcc version 13.2.0 (Ubuntu 13.2.0-4ubuntu3)

$ g++ -o cache cache.c
$ ./cache
Size 1 gave 4.13643e+09 bytes/second.
Size 2 gave 4.79971e+09 bytes/second.
Size 4 gave 4.87535e+09 bytes/second.
Size 8 gave 4.8321e+09 bytes/second.
Size 16 gave 4.71703e+09 bytes/second.
Size 32 gave 3.89488e+09 bytes/second.
Size 64 gave 4.02976e+09 bytes/second.
Size 128 gave 4.15832e+09 bytes/second.
Size 256 gave 4.19562e+09 bytes/second.
Size 512 gave 4.08511e+09 bytes/second.
Size 1024 gave 4.0796e+09 bytes/second.
Size 2048 gave 4.11983e+09 bytes/second.
Size 4096 gave 4.06869e+09 bytes/second.
Size 8192 gave 4.06807e+09 bytes/second.
Size 16384 gave 4.06217e+09 bytes/second.
Size 32768 gave 4.06067e+09 bytes/second.
Size 65536 gave 4.04791e+09 bytes/second.
Size 131072 gave 4.06143e+09 bytes/second.
Size 262144 gave 4.04301e+09 bytes/second.
Size 524288 gave 4.03872e+09 bytes/second.
Size 1048576 gave 3.97715e+09 bytes/second.
Size 2097152 gave 3.97609e+09 bytes/second.
Size 4194304 gave 3.98361e+09 bytes/second.
Size 8388608 gave 3.98617e+09 bytes/second.
Size 16777216 gave 3.98376e+09 bytes/second.
Size 33554432 gave 3.98504e+09 bytes/second.
Size 67108864 gave 3.98726e+09 bytes/second.
Size 134217728 gave 3.99495e+09 bytes/second.

Re: AMD CPU funny

<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35985&group=comp.arch#35985

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 22 Dec 2023 19:35:37 +0000
Organization: novaBBS
Message-ID: <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="806327"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$PQJmU6Z5Kq31IgQwUl8JL.Cq05vcWyjX/lTKKUntX9qiHGzCyxu0m
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Fri, 22 Dec 2023 19:35 UTC

Theo wrote:
> $ g++ -o cache cache.c
> $ ./cache
{ A
> Size 1 gave 4.13643e+09 bytes/second.
> Size 2 gave 4.79971e+09 bytes/second.
> Size 4 gave 4.87535e+09 bytes/second.
> Size 8 gave 4.8321e+09 bytes/second.
> Size 16 gave 4.71703e+09 bytes/second.
} A
{ B
> Size 32 gave 3.89488e+09 bytes/second.
> Size 64 gave 4.02976e+09 bytes/second.
> Size 128 gave 4.15832e+09 bytes/second.
> Size 256 gave 4.19562e+09 bytes/second.
> Size 512 gave 4.08511e+09 bytes/second.
> Size 1024 gave 4.0796e+09 bytes/second.
> Size 2048 gave 4.11983e+09 bytes/second.
> Size 4096 gave 4.06869e+09 bytes/second.
> Size 8192 gave 4.06807e+09 bytes/second.
> Size 16384 gave 4.06217e+09 bytes/second.
> Size 32768 gave 4.06067e+09 bytes/second.
> Size 65536 gave 4.04791e+09 bytes/second.
> Size 131072 gave 4.06143e+09 bytes/second.
> Size 262144 gave 4.04301e+09 bytes/second.
> Size 524288 gave 4.03872e+09 bytes/second.
> Size 1048576 gave 3.97715e+09 bytes/second.
> Size 2097152 gave 3.97609e+09 bytes/second.
> Size 4194304 gave 3.98361e+09 bytes/second.
> Size 8388608 gave 3.98617e+09 bytes/second.
> Size 16777216 gave 3.98376e+09 bytes/second.
> Size 33554432 gave 3.98504e+09 bytes/second.
> Size 67108864 gave 3.98726e+09 bytes/second.
> Size 134217728 gave 3.99495e+09 bytes/second.
} B

The B group are essentially 4.0 with noise.
The A group are essentially 4.8 with a startup flutter.
A 20% down or 25% up change at 32::64.
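The down and up figures differ because each is taken relative to a different plateau. A minimal sketch of the arithmetic: with the rounded plateaus of 4.8 and 4.0 it gives about a 16.7% drop or a 20% rise, while a 20%/25% pair corresponds to plateaus in a 5:4 ratio. The function name is illustrative.

```c
#include <assert.h>

/* Percentage change going from `from` to `to`, relative to `from`. */
static double percent_change(double from, double to)
{
    assert(from != 0.0);
    return (to - from) / from * 100.0;
}
```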

Re: AMD CPU funny

<fVh*Dgyyz@news.chiark.greenend.org.uk>

https://news.novabbs.org/devel/article-flat.php?id=35989&group=comp.arch#35989

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsfeed.xs3.de!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+news@chiark.greenend.org.uk (Theo)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: 22 Dec 2023 21:58:13 +0000 (GMT)
Organization: University of Cambridge, England
Message-ID: <fVh*Dgyyz@news.chiark.greenend.org.uk>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="7315"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Fri, 22 Dec 2023 21:58 UTC

In comp.arch MitchAlsup <mitchalsup@aol.com> wrote:
> Theo wrote:
> > $ g++ -o cache cache.c
> > $ ./cache
> { A
> > Size 1 gave 4.13643e+09 bytes/second.
> > Size 2 gave 4.79971e+09 bytes/second.
> > Size 4 gave 4.87535e+09 bytes/second.
> > Size 8 gave 4.8321e+09 bytes/second.
> > Size 16 gave 4.71703e+09 bytes/second.
> } A
> { B
> > Size 32 gave 3.89488e+09 bytes/second.
....
> > Size 134217728 gave 3.99495e+09 bytes/second.
> } B
>
> The B group are essentially 4.0 with noise.
> The A group are essentially 4.8 with a startup flutter.
> A 20% down or 25% up change at 32::64.

The nearest machine I have to the OP is this:

OS: Ubuntu 20.04.6 LTS x86_64
Host: 1U4LW-X570/2L2T
Kernel: 5.4.0-167-generic
CPU: AMD Ryzen 7 5800X (16) @ 3.800GHz
GPU: 29:00.0 ASPEED Technology, Inc. ASPEED Graphics Family
Memory: 3332MiB / 128805MiB

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

Size 1 gave 6.22043e+09 bytes/second.
Size 2 gave 6.35674e+09 bytes/second.
Size 4 gave 4.14766e+09 bytes/second.
Size 8 gave 5.4239e+09 bytes/second.
Size 16 gave 6.01113e+09 bytes/second.
Size 32 gave 7.75976e+09 bytes/second.
Size 64 gave 8.7972e+09 bytes/second.
Size 128 gave 9.71523e+09 bytes/second.
Size 256 gave 9.91644e+09 bytes/second.
Size 512 gave 1.00179e+10 bytes/second.
Size 1024 gave 1.0065e+10 bytes/second.
Size 2048 gave 9.78508e+09 bytes/second.
Size 4096 gave 9.76764e+09 bytes/second.
Size 8192 gave 9.86537e+09 bytes/second.
Size 16384 gave 9.90053e+09 bytes/second.
Size 32768 gave 9.91552e+09 bytes/second.
Size 65536 gave 9.84556e+09 bytes/second.
Size 131072 gave 9.78442e+09 bytes/second.
Size 262144 gave 9.80282e+09 bytes/second.
Size 524288 gave 9.81447e+09 bytes/second.
Size 1048576 gave 9.81981e+09 bytes/second.
Size 2097152 gave 9.81456e+09 bytes/second.
Size 4194304 gave 9.70057e+09 bytes/second.
Size 8388608 gave 9.55507e+09 bytes/second.
Size 16777216 gave 9.44032e+09 bytes/second.
Size 33554432 gave 9.33896e+09 bytes/second.
Size 67108864 gave 9.28529e+09 bytes/second.
Size 134217728 gave 9.25213e+09 bytes/second.

which seems to have more flutter in group A. I get similar results in a VM
running on an AMD Ryzen 9 5950X (32) @ 3.393GHz.

Theo

Re: AMD CPU funny

<b5757dbbd99e9272f93e6a645a967fd3@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35990&group=comp.arch#35990

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 22 Dec 2023 23:24:29 +0000
Organization: novaBBS
Message-ID: <b5757dbbd99e9272f93e6a645a967fd3@news.novabbs.com>
References: <ulv7j4$l0o0$1@dont-email.me> <hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me> <e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com> <um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk> <3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com> <fVh*Dgyyz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="826170"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$dPkzbbnzYkhLoZ.TxE7XQ.gf4abxPRe.EiMH8L85FG9EX.IabnWg6
X-Spam-Checker-Version: SpamAssassin 4.0.0
 by: MitchAlsup - Fri, 22 Dec 2023 23:24 UTC

Theo wrote:

> In comp.arch MitchAlsup <mitchalsup@aol.com> wrote:
>>

> gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

{ A
> Size 1 gave 6.22043e+09 bytes/second.
> Size 2 gave 6.35674e+09 bytes/second.
> Size 4 gave 4.14766e+09 bytes/second.
> Size 8 gave 5.4239e+09 bytes/second.
> Size 16 gave 6.01113e+09 bytes/second.
> Size 32 gave 7.75976e+09 bytes/second.
> Size 64 gave 8.7972e+09 bytes/second.
} A
{ B
> Size 128 gave 9.71523e+09 bytes/second.
> Size 256 gave 9.91644e+09 bytes/second.
> Size 512 gave 1.00179e+10 bytes/second.
> Size 1024 gave 1.0065e+10 bytes/second.
> Size 2048 gave 9.78508e+09 bytes/second.
> Size 4096 gave 9.76764e+09 bytes/second.
> Size 8192 gave 9.86537e+09 bytes/second.
> Size 16384 gave 9.90053e+09 bytes/second.
> Size 32768 gave 9.91552e+09 bytes/second.
> Size 65536 gave 9.84556e+09 bytes/second.
> Size 131072 gave 9.78442e+09 bytes/second.
> Size 262144 gave 9.80282e+09 bytes/second.
> Size 524288 gave 9.81447e+09 bytes/second.
> Size 1048576 gave 9.81981e+09 bytes/second.
> Size 2097152 gave 9.81456e+09 bytes/second.
} B
{ C
> Size 4194304 gave 9.70057e+09 bytes/second.
> Size 8388608 gave 9.55507e+09 bytes/second.
> Size 16777216 gave 9.44032e+09 bytes/second.
> Size 33554432 gave 9.33896e+09 bytes/second.
> Size 67108864 gave 9.28529e+09 bytes/second.
> Size 134217728 gave 9.25213e+09 bytes/second.
} C

> which seems to have more flutter in group A. I get similar results in a VM
> running on an AMD Ryzen 9 5950X (32) @ 3.393GHz.

A has some kind of startup irregularity: down then up.
B is essentially constant.
C is slowly degrading (maybe from TLB effects).
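Whether a slow decline like group C could come from the TLB depends on how much memory the TLB can map without a miss ("TLB reach"). A minimal sketch of that arithmetic; the entry count and page size used in the check are placeholders, not figures for any particular Ryzen part, so substitute the values from the CPU's documentation. Note also that the units of `Size` matter here: Vir's remark about 64K x 64-bit words suggests it counts words, which makes the buffer 8 times larger in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a TLB can map without a miss: entries x page size. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_bytes)
{
    return entries * page_bytes;
}

/* Nonzero if a buffer of `buffer_bytes` is larger than that reach. */
static int exceeds_tlb_reach(uint64_t buffer_bytes, uint64_t entries,
                             uint64_t page_bytes)
{
    return buffer_bytes > tlb_reach_bytes(entries, page_bytes);
}
```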

> Theo

Re: AMD CPU funny

<um7jom$27ikh$3@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35998&group=comp.arch#35998

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 23 Dec 2023 21:34:14 +0000
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <um7jom$27ikh$3@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Dec 2023 21:34:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1ef8412645e9256ce98af330d3194132";
logging-data="2345617"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LLeQH8FhJGtK8ZaP2e/iDrc8GTXuAxKU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:hKcJKGhIAT9uVkbE33ttDwZijvk=
In-Reply-To: <fVh*Dgyyz@news.chiark.greenend.org.uk>
Content-Language: en-GB
 by: Vir Campestris - Sat, 23 Dec 2023 21:34 UTC

On 22/12/2023 21:58, Theo wrote:
<snip>
>
> which seems to have more flutter in group A. I get similar results in a VM
> running on an AMD Ryzen 9 5950X (32) @ 3.393GHz.
>
> Theo

Thank you for those. Neither of them shows the odd thing I was seeing,
but I compiled with -ofast. Does that make a difference?

Andy

Re: AMD CPU funny

<um7o72$1bj9r$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=36001&group=comp.arch#36001

Path: i2pn2.org!i2pn.org!news.niel.me!glou.org!news.glou.org!fdn.fr!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-f912-0-3959-a14-977a-3135.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Sat, 23 Dec 2023 22:50:10 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <um7o72$1bj9r$1@newsreader4.netcologne.de>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
 by: Thomas Koenig - Sat, 23 Dec 2023 22:50 UTC

Vir Campestris <vir.campestris@invalid.invalid> schrieb:
> On 22/12/2023 21:58, Theo wrote:
><snip>
> >
> > which seems to have more flutter in group A. I get similar results in a VM
> > running on a AMD Ryzen 9 5950X (32) @ 3.393GHz.
> >
> > Theo
> Thank you for those. Neither of them show the odd thing I was seeing -
> but I compiled with -ofast. Does that make a difference?

That would put it in an executable named "fast" :-)

-march=native -mtune=native might make more of a difference, depending
on how good the compiler's model of your hardware is.

Re: AMD CPU funny

<umc56l$32ucu$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=36360&group=comp.arch#36360

From: david.brown@hesbynett.no (David Brown)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Mon, 25 Dec 2023 15:56:19 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <umc56l$32ucu$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de>
Content-Language: en-GB
In-Reply-To: <um7o72$1bj9r$1@newsreader4.netcologne.de>
 by: David Brown - Mon, 25 Dec 2023 14:56 UTC

On 23/12/2023 23:50, Thomas Koenig wrote:
> Vir Campestris <vir.campestris@invalid.invalid> schrieb:
>> On 22/12/2023 21:58, Theo wrote:
>> <snip>
>>>
>>> which seems to have more flutter in group A. I get similar results in a VM
>>> running on a AMD Ryzen 9 5950X (32) @ 3.393GHz.
>>>
>>> Theo
>> Thank you for those. Neither of them show the odd thing I was seeing -
>> but I compiled with -ofast. Does that make a difference?
>
> That would put it in an executable named "fast" :-)

Some programs are really fussy about capitalisation!

>
> -march=native -mtune=native might make more of a difference, depending
> on how good the compiler's model of your hardware is.

For gcc on modern x86-64 processors (IME at least), the difference
between -O0 and -O1 is often large, but going from -O1 to higher levels
usually matters far less. The processors themselves do a good job of
things like instruction scheduling and register renaming at runtime, and
are designed to run weakly optimised code well. I find the optimisation
level makes a bigger difference on processors that don't do as much at
run-time, such as microcontroller cores.

But "-march=native" can make a very big difference, especially if it
means the compiler can use SIMD or other advanced instructions. (You
don't need "-mtune" if you have "-march", as "-march" implies the
matching tuning - you only need a separate "-mtune" if you want a build
that will run on many x86-64 variants but is tuned for one particular
one.) And the "-march=native" benefits go well with some of the higher
optimisations - thus it is good to combine "-march=native" with "-Ofast"
or "-O2".

In practice, things also vary dramatically according to the type of program.

Re: AMD CPU funny

<umhbqp$3um9d$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=36380&group=comp.arch#36380

From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Wed, 27 Dec 2023 14:20:08 +0000
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <umhbqp$3um9d$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
In-Reply-To: <umc56l$32ucu$1@dont-email.me>
Content-Language: en-GB
 by: Vir Campestris - Wed, 27 Dec 2023 14:20 UTC

On 25/12/2023 14:56, David Brown wrote:
> On 23/12/2023 23:50, Thomas Koenig wrote:
>> Vir Campestris <vir.campestris@invalid.invalid> schrieb:
>>> On 22/12/2023 21:58, Theo wrote:
>>> <snip>
>>>>
>>>> which seems to have more flutter in group A.  I get similar results in a VM
>>>> running on a AMD Ryzen 9 5950X (32) @ 3.393GHz.
>>>>
>>>> Theo
>>> Thank you for those. Neither of them show the odd thing I was seeing -
>>> but I compiled with -ofast. Does that make a difference?
>>
>> That would put it in an executable named "fast" :-)

YKWIM :P
>
> Some programs are really fussy about capitalisation!
>
>>
>> -march=native -mtune=native might make more of a difference, depending
>> on how good the compiler's model of your hardware is.
>
> For gcc on modern x86-64 processors (IME at least), the difference
> between -O0 and -O1 is often large, but the difference between -O1 and
> higher levels usually makes far less difference.  The processors
> themselves do a good job of things like instruction scheduling and
> register renaming at runtime, and are designed to be good for running
> weakly optimised code.  I find it makes a bigger difference on
> processors that don't do as much at run-time, such as microcontroller
> cores.
>
> But the "-march=native" can make a very big difference, especially if it
> means the compiler can use SIMD or other advanced instructions.  (You
> don't need "-mtune" if you have "march", as it is implied by "-march" -
> you only need both if you want to make a build that will run on many
> x86-64 variants but is optimised for one particular one.)  And the
> "-march=native" benefits go well with some of the higher optimisations -
> thus it is good to combine "-march=native" with "-Ofast" or "-O2".
>
> In practice, things also vary dramatically according to the type of
> program.
>

-mtune=native _does_ make a difference. Somewhat to my surprise, it
makes the code _slower_: the biggest difference is about 5%, at a size
of 128k UINT64.

I checked -O0 (really slow), -O1 (a lot faster), -O3 (quite a bit
faster too), -Ofast (mostly slightly faster than -O3, when I use a
capital O) and -O3 -march=native.

The pipeline thing made me try something different: instead of
incrementing a value, I used std::copy to copy each word to the next one.

That rose to a peak rate of about 64MB/s with a size of 32k, dropped
sharply to 45MB/s and then showed a steady decline to 40MB/s at 256k. It
then dropped sharply to 10MB/s for all larger sizes.

A much more sensible result!

Andy

Re: AMD CPU funny

<umhe9j$3v1gt$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=36382&group=comp.arch#36382

From: david.brown@hesbynett.no (David Brown)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Wed, 27 Dec 2023 16:02:11 +0100
Organization: A noiseless patient Spider
Lines: 79
Message-ID: <umhe9j$3v1gt$1@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me>
Content-Language: en-GB
In-Reply-To: <umhbqp$3um9d$1@dont-email.me>
 by: David Brown - Wed, 27 Dec 2023 15:02 UTC

On 27/12/2023 15:20, Vir Campestris wrote:
> On 25/12/2023 14:56, David Brown wrote:
>> On 23/12/2023 23:50, Thomas Koenig wrote:
>>> Vir Campestris <vir.campestris@invalid.invalid> schrieb:
>>>> On 22/12/2023 21:58, Theo wrote:
>>>> <snip>
>>>>>
>>>>> which seems to have more flutter in group A.  I get similar results in a VM
>>>>> running on a AMD Ryzen 9 5950X (32) @ 3.393GHz.
>>>>>
>>>>> Theo
>>>> Thank you for those. Neither of them show the odd thing I was seeing -
>>>> but I compiled with -ofast. Does that make a difference?
>>>
>>> That would put it in an executable named "fast" :-)
>
> YKWIM :P
>>
>> Some programs are really fussy about capitalisation!
>>
>>>
>>> -march=native -mtune=native might make more of a difference, depending
>>> on how good the compiler's model of your hardware is.
>>
>> For gcc on modern x86-64 processors (IME at least), the difference
>> between -O0 and -O1 is often large, but the difference between -O1 and
>> higher levels usually makes far less difference.  The processors
>> themselves do a good job of things like instruction scheduling and
>> register renaming at runtime, and are designed to be good for running
>> weakly optimised code.  I find it makes a bigger difference on
>> processors that don't do as much at run-time, such as microcontroller
>> cores.
>>
>> But the "-march=native" can make a very big difference, especially if
>> it means the compiler can use SIMD or other advanced instructions.
>> (You don't need "-mtune" if you have "march", as it is implied by
>> "-march" - you only need both if you want to make a build that will
>> run on many x86-64 variants but is optimised for one particular one.)
>> And the "-march=native" benefits go well with some of the higher
>> optimisations - thus it is good to combine "-march=native" with
>> "-Ofast" or "-O2".
>>
>> In practice, things also vary dramatically according to the type of
>> program.
>>
>
> mtune=native _does_ make a difference. Somewhat to my surprise it makes
> it _slower_ - the biggest difference being about 5% with a size of 128k
> UINT64.
>
> I checked O0 (really slow) O1 (a lot faster) O3 (quite a bit faster too)
> Ofast (mostly slightly faster than O3 when I use a capital O) and O3
> march=native.
>

"-Ofast" is like -O3, but additionally allows the compiler to "bend" the
rules somewhat. For example, it allows optimisations of stores that
might cause data races if another thread can see the same data, and it
enables "-ffast-math" which treats floating point as somewhat
approximate and always finite, rather than following IEEE rules
strictly. If you are okay with this, then "-Ofast" is fine, but you
should read the documentation to check that you are happy with it.

While you are there, you can see a number of other optimisation flags
that are not enabled by any of the -O levels. These may or may not help,
depending on your source code.

> The pipeline thing made me try something different - instead of
> incrementing a value I used std::copy to copy each word to the next one.
>
> That rose to a peak rate of about 64MB/S with a size of 32k, dropped
> sharply to 45MB/s and then showed a steady decline to 40MB/s at 256k. It
> then dropped sharply to 10MB/S for all larger sizes.
>
> A much more sensible result!
>
> Andy

Re: AMD CPU funny

<20231227183842.00001b4f@yahoo.com>


https://news.novabbs.org/devel/article-flat.php?id=36383&group=comp.arch#36383

From: already5chosen@yahoo.com (Michael S)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Wed, 27 Dec 2023 18:38:42 +0200
Organization: A noiseless patient Spider
Lines: 84
Message-ID: <20231227183842.00001b4f@yahoo.com>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk>
<um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me>
<eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk>
<um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de>
<umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me>
 by: Michael S - Wed, 27 Dec 2023 16:38 UTC

On Wed, 27 Dec 2023 14:20:08 +0000
Vir Campestris <vir.campestris@invalid.invalid> wrote:

> On 25/12/2023 14:56, David Brown wrote:
> > On 23/12/2023 23:50, Thomas Koenig wrote:
> >> Vir Campestris <vir.campestris@invalid.invalid> schrieb:
> >>> On 22/12/2023 21:58, Theo wrote:
> >>> <snip>
> >>>>
> >>>> which seems to have more flutter in group A.  I get similar
> >>>> results in a VM
> >>>> running on a AMD Ryzen 9 5950X (32) @ 3.393GHz.
> >>>>
> >>>> Theo
> >>> Thank you for those. Neither of them show the odd thing I was
> >>> seeing - but I compiled with -ofast. Does that make a difference?
> >>>
> >>
> >> That would put it in an executable named "fast" :-)
>
> YKWIM :P
> >
> > Some programs are really fussy about capitalisation!
> >
> >>
> >> -march=native -mtune=native might make more of a difference,
> >> depending on how good the compiler's model of your hardware is.
> >
> > For gcc on modern x86-64 processors (IME at least), the difference
> > between -O0 and -O1 is often large, but the difference between -O1
> > and higher levels usually makes far less difference.  The
> > processors themselves do a good job of things like instruction
> > scheduling and register renaming at runtime, and are designed to be
> > good for running weakly optimised code.  I find it makes a bigger
> > difference on processors that don't do as much at run-time, such as
> > microcontroller cores.
> >
> > But the "-march=native" can make a very big difference, especially
> > if it means the compiler can use SIMD or other advanced
> > instructions.  (You don't need "-mtune" if you have "march", as it
> > is implied by "-march" - you only need both if you want to make a
> > build that will run on many x86-64 variants but is optimised for
> > one particular one.)  And the "-march=native" benefits go well with
> > some of the higher optimisations - thus it is good to combine
> > "-march=native" with "-Ofast" or "-O2".
> >
> > In practice, things also vary dramatically according to the type of
> > program.
> >
>
> mtune=native _does_ make a difference. Somewhat to my surprise it
> makes it _slower_ - the biggest difference being about 5% with a size
> of 128k UINT64.
>
> I checked O0 (really slow) O1 (a lot faster) O3 (quite a bit faster
> too) Ofast (mostly slightly faster than O3 when I use a capital O)
> and O3 march=native.
>
> The pipeline thing made me try something different - instead of
> incrementing a value I used std::copy to copy each word to the next
> one.
>
> That rose to a peak rate of about 64MB/S with a size of 32k, dropped
> sharply to 45MB/s and then showed a steady decline to 40MB/s at 256k.
> It then dropped sharply to 10MB/S for all larger sizes.
>
> A much more sensible result!

GB rather than MB, hopefully

>
> Andy

Can you tell us what you are trying to achieve?
Is it a microbenchmark intended to improve your understanding of Zen1
internals?
Or part of code that you want to run as fast as possible?
If the latter, what exactly do you want the code to do?

Re: AMD CPU funny

<umn15d$svun$6@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=36393&group=comp.arch#36393

From: vir.campestris@invalid.invalid (Vir Campestris)
Newsgroups: uk.comp.homebuilt,comp.arch
Subject: Re: AMD CPU funny
Date: Fri, 29 Dec 2023 17:54:53 +0000
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <umn15d$svun$6@dont-email.me>
References: <ulv7j4$l0o0$1@dont-email.me>
<hVh*RTmyz@news.chiark.greenend.org.uk> <um1h21$13l52$1@dont-email.me>
<e2bb329488167ada6ece600c8f28d3ed@news.novabbs.com>
<um43of$1j3jj$1@dont-email.me> <eVh*Lrxyz@news.chiark.greenend.org.uk>
<3d2dcfa36ceca1ee856b9de3ab416a01@news.novabbs.com>
<fVh*Dgyyz@news.chiark.greenend.org.uk> <um7jom$27ikh$3@dont-email.me>
<um7o72$1bj9r$1@newsreader4.netcologne.de> <umc56l$32ucu$1@dont-email.me>
<umhbqp$3um9d$1@dont-email.me> <20231227183842.00001b4f@yahoo.com>
In-Reply-To: <20231227183842.00001b4f@yahoo.com>
Content-Language: en-GB
 by: Vir Campestris - Fri, 29 Dec 2023 17:54 UTC

On 27/12/2023 16:38, Michael S wrote:
> On Wed, 27 Dec 2023 14:20:08 +0000
> Vir Campestris <vir.campestris@invalid.invalid> wrote:
>>
>> That rose to a peak rate of about 64MB/S with a size of 32k, dropped
>> sharply to 45MB/s and then showed a steady decline to 40MB/s at 256k.
>> It then dropped sharply to 10MB/S for all larger sizes.
>>
>> A much more sensible result!
>
> GB rather than MB, hopefully
>
>>
>> Andy
>
> Can you tell us what are you trying to achieve?
> Is it a microbenchmark intended to improve your understanding of Zen1
> internals?
> Or part of the code that you want to run as fast as possible?
> If the later, what exactly do you want to be done by the code?
>

It is a microbenchmark.

I was trying to understand cache performance on my system.

Over in comp.lang.c++ you'll find a thread about the Sieve of Eratosthenes.

Another poster and I are trying to optimise it. Not for any good reason,
of course... but I was curious about some of the results I was getting.

Andy
