Rocksolid Light - comp.arch - Re: Memory dependency microbenchmark


https://news.novabbs.org/devel/article-flat.php?id=35024&group=comp.arch#35024

From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Memory dependency microbenchmark
Date: Wed, 15 Nov 2023 13:27:59 -0800
Organization: A noiseless patient Spider
Lines: 126
Message-ID: <uj3d50$1tb8u$2@dont-email.me>
References: <2023Nov3.101558@mips.complang.tuwien.ac.at>
<82b3b3b710652e607dac6cec2064c90b@news.novabbs.com>
<uisdmn$gd4s$2@dont-email.me> <uiu4t5$t4c2$2@dont-email.me>
<uj3c29$1t9an$1@dont-email.me> <uj3d0a$1tb8u$1@dont-email.me>
In-Reply-To: <uj3d0a$1tb8u$1@dont-email.me>

by: Chris M. Thomasson - Wed, 15 Nov 2023 21:27 UTC

On 11/15/2023 1:25 PM, Chris M. Thomasson wrote:
> On 11/15/2023 1:09 PM, Kent Dickey wrote:
>> In article <uiu4t5$t4c2$2@dont-email.me>,
>> Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
>>> On 11/12/2023 9:54 PM, Kent Dickey wrote:
>>>> In article <82b3b3b710652e607dac6cec2064c90b@news.novabbs.com>,
>>>> MitchAlsup <mitchalsup@aol.com> wrote:
>>>>> Kent Dickey wrote:
>>>>>
>>>>>> In article <uiri0a$85mp$2@dont-email.me>,
>>>>>> Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
>>>>>
>>>>>>> A highly relaxed memory model can be beneficial for certain
>>>>>>> workloads.
>>>>>
>>>>>> I know a lot of people believe that statement to be true. In
>>>>>> general, it is assumed to be true without proof.
>>>>> <
>>>>> In its most general case, relaxed order only provides a performance
>>>>> advantage when the code is single threaded.
>>>>
>>>> I believe a Relaxed Memory model provides a small performance
>>>> improvement ONLY to simple in-order CPUs in an MP system (if you're a
>>>> single CPU, there's nothing to order).
>>>>
>>>> Relaxed Memory ordering provides approximately zero performance
>>>> improvement to an OoO CPU, and in fact, might actually lower
>>>> performance (depends on how barriers are done--if done poorly, it
>>>> could be a big negative).
>>>>
>>>> Yes, the system designers of the world have said: let's slow down our
>>>> fastest most expensive most profitable CPUs, so we can speed up our
>>>> cheapest lowest profit CPUs a few percent, and push a ton of work onto
>>>> software developers.
>>>>
>>>> It's crazy.
>>>>
>>>>>> I believe that statement to be false. Can you describe some of these
>>>>>> workloads?
>>>>> <
>>>>> Relaxed memory order fails spectacularly when multiple threads are
>>>>> accessing data.
>>>>
>>>> Probably need to clarify with "accessing modified data".
>>>>
>>>> Kent
>>>
>>> Huh? So, C++ is crazy for allowing std::memory_order_relaxed to even
>>> exist? I must be misunderstanding your point here. Sorry if I am. ;^o
>>
>> You have internalized weakly ordered memory, and you're having trouble
>> seeing beyond it.
>
> Really? Don't project yourself on me. Altering all of the memory
> barriers of a finely tuned lock-free algorithm to seq_cst is VERY bad.
>
>
>>
>> CPUs with weakly ordered memory are the ones that need all those flags.
>> Yes, you need the flags if you want to use those CPUs. I'm pointing out:
>> we could all just require better memory ordering and get rid of all this
>> cruft. Give the flag, don't give the flag, the program is still correct
>> and works properly.
>
> Huh? Just cruft? Wow. Just because it seems hard for you does not mean
> we should eliminate it. Believe it or not, there are people out there
> who know how to use memory barriers. I suppose you would use seq_cst to
> load each node of a lock-free stack iteration in an RCU read-side region.
> This is terrible! Really bad, bad, BAD! AFAICT, it kind of, sort of, seems
> like you do not have all that much experience with them. Humm...
>
>
>>
>> It's like FP denorms--it's generally been decided the hardware cost
>> to implement it is small, so hardware needs to support it at full speed.
>> No need to write code in a careful way to avoid denorms, to use funky
>> CPU-specific calls to turn on flush-to-0, etc., it just works, we move
>> on to other topics. But we still have flush-to-0 calls available--but
>> you don't need to bother to use them. In my opinion, memory ordering is
>> much more complex for programmers to handle. I maintain it's actually so
>> complex most people cannot get it right in software for non-trivial
>> interactions. I've found many hardware designers have a very hard time
>> reasoning about this as well when I report bugs (since the rules are so
>> complex and poorly described). There are over 100 pages describing
>> memory ordering in the Arm Architecture Reference Manual, and it is very
>> complex (Dependency through registers and memory; Basic Dependency;
>> Address Dependency; Data Dependency; Control Dependency; Pick Basic
>> Dependency; Pick Address Dependency; Pick Data Dependency; Pick
>> Control Dependency; Pick Dependency...and this is just from the
>> definition of terms). It's all very abstract and difficult to follow.
>> I'll be honest: I do not understand all of these rules, and I don't
>> care to. I know how to implement a CPU, so I know what they've done,
>> and that's much simpler to understand. But writing a threaded
>> application is much more complex than it should be for software.
>>
>> The cost to do TSO is some out-of-order tracking structures need to get
>> a little bigger, and some instructions have to stay in queues longer
>> (which is why they may need to get bigger), and allow re-issuing loads
>> which now have stale data. The difference between TSO and Sequential
>> Consistency is to just disallow loads seeing stores queued before they
>> write to the data cache (well, you can speculatively let loads happen,
>> but you need to be able to walk it back, which is not difficult). This
>> is why I say the performance cost is low--normal code missing caches and
>> not being pestered by other CPUs can run at the same speed. But when
>> other CPUs begin pestering us, the interference can all be worked out as
>> efficiently as possible using hardware, and barriers just do not
>> compete.
>
> Having access to fine grain memory barriers is a very good thing. Of
> course we can use C++ right now and make everything seq_cst, but that is
> moronic. Why would you want to use seq_cst everywhere when you do not
> have to? There are rather massive performance implications.
>
> Are you thinking about a magic arch that we cannot use right now?

https://youtu.be/DZJPqTTt7MA
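A minimal C++ sketch, not taken from the thread, of the kind of fine-grained ordering being argued over. The function names count_hits and publish_and_read are hypothetical. A pure statistics counter needs only atomicity, so memory_order_relaxed suffices; publishing ordinary data to another thread needs a release store paired with an acquire load. Making every operation seq_cst would also be correct here; the disagreement above is about whether that costs anything.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative sketch only; names are hypothetical, not from the thread.

// (1) Statistics counter: only atomicity matters, so relaxed is enough.
unsigned long count_hits(int threads, int per_thread) {
    std::atomic<unsigned long> hits{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&hits, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                hits.fetch_add(1, std::memory_order_relaxed);  // no ordering needed
        });
    }
    for (auto& th : pool) th.join();  // join() makes every increment visible here
    return hits.load(std::memory_order_relaxed);
}

// (2) Publishing plain data: a release store paired with an acquire load.
int publish_and_read() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int seen = -1;
    std::thread consumer([&] {
        // This acquire load synchronizes with the release store below.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;  // guaranteed to read 42, never 0
    });
    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);  // publishes payload
    });
    producer.join();
    consumer.join();
    return seen;
}
```

On any conforming implementation, count_hits(4, 100000) returns 400000 and publish_and_read() returns 42. How much the weaker orderings save over seq_cst depends on the target: on x86 (TSO), plain loads and stores already behave as acquire and release, while a seq_cst store is typically compiled to XCHG or MOV plus MFENCE.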

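The TSO versus sequential consistency difference Kent describes is usually illustrated with the store-buffering (Dekker) litmus test. In the sketch below, which is not from the thread and uses the hypothetical name count_both_zero, each thread stores to its own flag and then loads the other's. The C++ memory model forbids both loads returning 0 when everything is seq_cst. With release/acquire that outcome is allowed, and on TSO hardware it can actually show up, because a load may complete while the thread's own earlier store is still sitting in its store buffer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering (SB) litmus test; illustrative sketch only.
// Thread A: x = 1; r1 = y.   Thread B: y = 1; r2 = x.
// Returns how many of `iterations` runs ended with r1 == 0 && r2 == 0.
// store_order must be seq_cst or release; load_order seq_cst or acquire.
int count_both_zero(int iterations, std::memory_order store_order,
                    std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, store_order);
            // Under release/acquire, this load may be satisfied before
            // the store to x is visible to thread b.
            r1 = y.load(load_order);
        });
        std::thread b([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) ++both_zero;  // forbidden under seq_cst
    }
    return both_zero;
}
```

Spawning fresh threads for every iteration makes the race window very small, so this harness may rarely or never catch the weak outcome even where it is permitted; dedicated litmus-test tools keep threads alive and synchronize them per iteration. A result of 0 for the all-seq_cst case is guaranteed regardless.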