comp.arch: Re: Hardware vs. Software bugs

Subject / Author
* Re: Hardware vs. Software bugs (Paul A. Clayton)
`* Re: Hardware vs. Software bugs (MitchAlsup)
 `* Re: Hardware vs. Software bugs (Paul A. Clayton)
  +* Re: Hardware vs. Software bugs (Scott Lurndal)
  |`* Re: Hardware vs. Software bugs (MitchAlsup)
  | `* Re: Hardware vs. Software bugs (Scott Lurndal)
  |  `* Re: Hardware vs. Software bugs (MitchAlsup)
  |   `* Re: Hardware vs. Software bugs (Scott Lurndal)
  |    `- Re: Hardware vs. Software bugs (MitchAlsup)
  `- Re: Hardware vs. Software bugs (MitchAlsup)

Re: Hardware vs. Software bugs
From: Paul A. Clayton <paaronclayton@gmail.com>
Newsgroups: comp.arch
Date: Sat, 10 Jun 2023 19:57 UTC
Message-ID: <u62kij$2dc9h$1@dont-email.me>

MitchAlsup wrote:
> On Thursday, January 19, 2023 at 3:05:23 AM UTC-6, Anton Ertl wrote:
>> MitchAlsup <Mitch...@aol.com> writes:
>
>>> When you have multi-word atomic events, either all the data has been
>>> updated or none. If you have to relinquish a line it is likely that you will
>>> fail the atomic event, and thus have not made forward progress.
>>
>> You will not respond to the line request with a "line has changed to
>> xxx" message then, so in the scenario above the load would not be
>> canceled.
> <
> Right, you respond with:: "I have the line and I can't give it to you yet".
> So when he receives this response one of 2 things happens::
> If he was doing an atomic event, he fails the event
> If he was not doing an atomic event he retries.
> {Since the latency of the response and re-request takes long enough, the NAKer gets to make forward progress.}

Hmm. I wonder if there are cases where pausing/partially replaying
an atomic event might be better than failing and retrying (or
choosing a different action). If other participants are cache
misses, failing on receiving a (short term) NAK and having
software retry might be less efficient/performant. (I suppose it
would also be possible for an atomic operation to release a cache
line temporarily. E.g., a core might gain ownership of a cache
line before data has returned from main memory (or a remote cache
if the home coherence agent granted ownership to a core closer to
that agent before the data was available?). If the "earlier"
atomic operation will have to wait anyway, it might be better to
make it a "later" atomic operation at least with respect to the
one shared participant.)

[snip]
>> 1) The transactional memory stuff could be kept in the speculative
>> part of the core until the transaction is complete, and only then be
>> transferred to the caches, so no cache rollback becomes necessary if a
>> transaction is cancelled. Yes, progress is not guaranteed, but
>> progress is also a problem with Spectre-vulnerable cores. Progress
>> has to be solved separately, fixing Spectre does not help (and
>> probably does not hurt) here.
> <
> What if the transaction's live data exceeds the execution window ??
> TM fails always, sometimes, never ???

I suspect the question of interest is not whether it always fails
but whether it makes sense to use another mechanism. "Always
fails" is not the only case where another mechanism would be
preferred.

> What if the transaction exceeds the L1 cache ??

For the read set, conservative filters can handle that case. It is
also quite possible to have a versioned memory where main memory
pages are allocated on cache eviction (like Mill's backless
memory). For largish *dense* transactions (or tiny page sizes☺)
the memory overhead might not be horrible. (For a limited local
version count, delta compression might help for less dense
transactions.)
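
(As a rough illustration of such a conservative filter, a read set
can be approximated by a Bloom filter over cache-line addresses.
The sizes and hash below are invented for the sketch, and a hit
only ever means "may have been read":)

#include <stdint.h>

#define FILTER_BITS 1024                /* illustrative size */

typedef struct { uint64_t bits[FILTER_BITS / 64]; } readset_t;

static uint64_t mix(uint64_t x)         /* placeholder hash */
{
    x ^= x >> 33;  x *= 0xff51afd7ed558ccdULL;  x ^= x >> 33;
    return x;
}

static void rs_add(readset_t *rs, uint64_t line_addr)
{
    uint64_t h = mix(line_addr);
    unsigned b1 = h % FILTER_BITS, b2 = (h >> 32) % FILTER_BITS;
    rs->bits[b1 / 64] |= 1ULL << (b1 % 64);
    rs->bits[b2 / 64] |= 1ULL << (b2 % 64);
}

/* May report true for lines never read (a false positive, hence
   "conservative") but never false for lines actually read. */
static int rs_maybe_contains(const readset_t *rs, uint64_t line_addr)
{
    uint64_t h = mix(line_addr);
    unsigned b1 = h % FILTER_BITS, b2 = (h >> 32) % FILTER_BITS;
    return ((rs->bits[b1 / 64] >> (b1 % 64)) & 1) &&
           ((rs->bits[b2 / 64] >> (b2 % 64)) & 1);
}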

One problem with large transactions is that the retry cost can be
large. Even if participants are aggressively prefetched, a lot of
work will have to be repeated even if only one value changed [at
least in the usual total failure model]. If the software is
inspecting a large amount of loosely coupled data atomically,
failing the entire transaction from a single write conflict which
could otherwise be locally re-executed seems suboptimal.

Nesting also introduces issues. Flattening to a single transaction
is the simplest way to handle nesting, but such increases the
amount of work that must be redone on a failure/retry.

In some sense ESM provides a lock name composed of the six(?)
cache block addresses. I think denser lock names than a
concatenation of the addresses (or of addresses of elided locks)
might facilitate hardware locking and/or queuing for larger
transactions. (Side thought: in some ways lock-guarded memory
collections are similar to memory partitioning. Just as a
coherence request for addresses outside a partition can be
ignored, coherence requests might be filtered via named locks.)

(Conservative filters based on addresses also provide a somewhat
denser lock name but group the lock-guarded addresses based on
addresses rather than semantic use so that false conflicts may be
more likely.)
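
(To make the "denser lock name" idea concrete: one could hash the
sorted participant line addresses down to a fixed-width value, so
the name does not depend on the order the participants were
touched. This is only a sketch, not ESM's actual encoding, and the
constants are arbitrary:)

#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Fold n participant cache-line addresses into one 64-bit lock
   name. Distinct address sets can collide, so a matching name
   means "possibly the same lock", in keeping with conservative
   filtering. */
uint64_t lock_name(uint64_t *lines, size_t n)
{
    uint64_t h = 0x9e3779b97f4a7c15ULL;     /* arbitrary seed */
    qsort(lines, n, sizeof lines[0], cmp_u64);
    for (size_t i = 0; i < n; i++) {
        h ^= lines[i] >> 6;                 /* drop line-offset bits */
        h *= 0x100000001b3ULL;              /* FNV-style mixing */
    }
    return h;
}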

If the lock names are the same for a retry of a transaction (even
if the participants are different), it might be possible to order
transactions to reduce/avoid conflicts and/or increase throughput.
Lock names might even be helpful for choosing to perform a
transaction remotely (a specific participant list seems more
useful for choosing where to locate the work but that might not be
practical in all cases; the choice of a participant might be
dependent on the value of another participant but the more work
that is done at the original location the more expensive moving
the rest to another location is likely to be).

Being able to "activate" locks under actual contention might
reduce some problems with HTM. Using memory contents for locks has
the advantage of exploiting existing mechanisms, but locks are
somewhat different.

[snip]
> Some people/designers think HTM can be successful.
> I am not one of those.

I am a person but not a designer and I think something broader
than ESM could be useful, particularly if integrated with other
aspects of communication (including "remote procedure calls") and
speculation.

Designing an interface that is extendable (as one learns about
current and potential uses) but also compatible without excessive
costs in use or implementation is difficult.

Extending the read set of ESM to "L1 cache" would seem to require
relatively little additional hardware, but greatly increases the
difficulty of using such effectively. Even probabilistic forward
progress might be difficult to design in software and hardware.
(Cliff Click's "IWannaBit!" proposed a single bit for tracking L1
evictions, believing that this 'simple' HTM would have some uses.)
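
(A toy software model of that single-bit idea, with names invented
here: one "poison" bit per core, set by any L1 eviction while a
speculative region is open and tested at commit:)

#include <stdbool.h>

static bool spec_open, evict_bit;

void tx_begin(void)        { spec_open = true; evict_bit = false; }
void on_l1_eviction(void)  { if (spec_open) evict_bit = true; }

/* True if no eviction occurred, so the region may commit. */
bool tx_try_commit(void)
{
    spec_open = false;
    return !evict_bit;
}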

If a feature is hard to use well or unexpectedly fails to work
well in certain cases, software developers will be offended and
reject the feature. Even presenting a feature as a work-in-
progress — while also guaranteeing some forward compatibility so
that testing effort is not wasted — can hinder use. Dealing with
fuzzy constraints like cache conflict misses is generally
unpleasant; including such frustration as a basic part of using a
feature will not make that feature attractive.

(One of the problems mentioned for Azul Systems HTM was the common
use in Java of software performance counters, which introduced
universal conflicts — the counter participated in all
transactions. Rewriting all of the Java libraries was not an
option.)
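
(The counter problem is easy to see in miniature. A single shared
counter touched inside every transaction makes its cache line a
universal write conflict; the standard workaround, sketched below
with invented names, is to stripe the counter per thread, removing
the shared participant at the cost of a slower aggregate read:)

#include <stdatomic.h>

#define NTHREADS 64                     /* illustrative */

/* One shared counter: every transaction writes this line, so any
   two concurrent transactions conflict on it. */
atomic_long shared_count;

/* Striped counters: each thread updates its own padded slot, so
   transactions no longer share a write participant. */
struct { atomic_long v; char pad[64 - sizeof(atomic_long)]; }
    striped[NTHREADS];

void count_event(int tid)
{
    atomic_fetch_add_explicit(&striped[tid].v, 1,
                              memory_order_relaxed);
}

long count_total(void)                  /* slower: sums all stripes */
{
    long t = 0;
    for (int i = 0; i < NTHREADS; i++)
        t += atomic_load_explicit(&striped[i].v, memory_order_relaxed);
    return t;
}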

Re: Hardware vs. Software bugs
From: MitchAlsup <MitchAlsup@aol.com>
Newsgroups: comp.arch
Date: Sun, 11 Jun 2023 23:18 UTC
Message-ID: <526f5bc6-be6d-45cc-a79a-f991f24d3efcn@googlegroups.com>

On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
> MitchAlsup wrote:
> > On Thursday, January 19, 2023 at 3:05:23 AM UTC-6, Anton Ertl wrote:
> >> MitchAlsup <Mitch...@aol.com> writes:
> >
> >>> When you have multi-word atomic events, either all the data has been
> >>> updated or none. If you have to relinquish a line it is likely that you will
> >>> fail the atomic event, and thus have not made forward progress.
> >>
> >> You will not respond to the line request with a "line has changed to
> >> xxx" message then, so in the scenario above the load would not be
> >> canceled.
> > <
> > Right, you respond with:: "I have the line and I can't give it to you yet".
> > So when he receives this response one of 2 things happens::
> > If he was doing an atomic event, he fails the event
> > If he was not doing an atomic event he retries.
> > {Since the latency of the response and re-request takes long enough, the NAKer gets to make forward progress.}
>
> Hmm. I wonder if there are cases where pausing/partially replaying
> an atomic event might be better than failing and retrying (or
> choosing a different action).
<
I am going to go with NO here. ATOMIC means that one or more
pieces of state change from the prior value set to the current
value set instantaneously as seen by all interested 3rd parties.
<
Adding pause or replay implies you have broken the above rule
by allowing some 3rd party to see the prior value set--which
means that abandonment of the event is the only sane recourse.
<
> If other participants are cache
> misses, failing on receiving a (short term) NAK and having
> software retry might be less efficient/performant. (I suppose it
> would also be possible for an atomic operation to release a cache
> line temporarily. E.g., a core might gain ownership of a cache
> line before data has returned from main memory (or a remote cache
> if the home coherence agent granted ownership to a core closer to
> that agent before the data was available?).
<
What possible good can come of that ? Ownership is a prerequisite
to modification, thus violating the above rule.
<
> If the "earlier"
> atomic operation will have to wait anyway, it might be better to
> make it a "later" atomic operation at least with respect to the
> one shared participant.)
<
You seem to forget that the ATOMIC stuff I am talking about
transpires over multiple instructions. Each subject to cache and TLB
misses and the uncertainty of completion time. Add on top of this
the uncertainty that the event will even succeed.
<
Single instruction ATOMICs are easy compared to multiple instruction
and multiple memory locations participating in a single event. But
multiple memory ATOMICs allow one to decrease the exponent of
interference. Take, for example, moving an element from one place
in a concurrent data structure to another such that no 3rd party can
ever see that the element is ever "not" in the CDS. That is, you cannot
remove the element and then insert it somewhere else because some
3rd party might scan the whole CDS while you are paged out and not
find it in there.
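<
(A sketch of such a move, using invented intrinsic names to stand
for the participating loads and stores of a multi-location atomic
event; the real spelling and semantics belong to the actual ISA:)

#include <stdbool.h>

struct node { struct node *next; };

/* Placeholders: mark an access as participating in the event. All
   stores become visible at once if the event succeeds. */
extern struct node *esm_lock_load(struct node **addr);
extern void esm_lock_store(struct node **addr, struct node *val);

bool move_element(struct node *from_prev, struct node *e,
                  struct node *to_prev)
{
    struct node *fn = esm_lock_load(&from_prev->next);
    struct node *en = esm_lock_load(&e->next);
    struct node *tn = esm_lock_load(&to_prev->next);
    if (fn != e)
        return false;                      /* CDS changed under us */
    esm_lock_store(&from_prev->next, en);  /* unlink e ... */
    esm_lock_store(&e->next, tn);          /* ... and relink it ... */
    esm_lock_store(&to_prev->next, e);     /* ... in one event */
    return true;
}

No 3rd party ever observes e absent from both lists, because the
three stores commit (or fail) as a unit.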
>
> [snip]
> >> 1) The transactional memory stuff could be kept in the speculative
> >> part of the core until the transaction is complete, and only then be
> >> transferred to the caches, so no cache rollback becomes necessary if a
> >> transaction is cancelled. Yes, progress is not guaranteed, but
> >> progress is also a problem with Spectre-vulnerable cores. Progress
> >> has to be solved separately, fixing Spectre does not help (and
> >> probably does not hurt) here.
> > <
> > What if the transaction's live data exceeds the execution window ??
> > TM fails always, sometimes, never ???
>
> I suspect the question of interest is not whether it always fails
> but whether it makes sense to use another mechanism. "Always
> fails" is not the only case where another mechanism would be
> preferred.
>
> > What if the transaction exceeds the L1 cache ??
>
> For the read set, conservative filters can handle that case. It is
> also quite possible to have a versioned memory where main memory
> pages are allocated on cache eviction (like Mill's backless
> memory). For largish *dense* transactions (or tiny page sizes☺)
> the memory overhead might not be horrible. (For a limited local
> version count, delta compression might help for less dense
> transactions.)
>
> One problem with large transactions is that the retry cost can be
> large. Even if participants are aggressively prefetched, a lot of
> work will have to be repeated even if only one value changed [at
> least in the usual total failure model]. If the software is
> inspecting a large amount of loosely coupled data atomically,
> failing the entire transaction from a single write conflict which
> could otherwise be locally re-executed seems suboptimal.
<
The problem with TM is that software can use it in ways that
create unbounded numbers of things to track. HW is never good
at unbounded stuff.....
<
>
> Nesting also introduces issues. Flattening to a single transaction
> is the simplest way to handle nesting, but such increases the
> amount of work that must be redone on a failure/retry.
<
Nesting increases the exponent of unboundedness.
>
> In some sense ESM provides a lock name composed of the six(?)
> cache block addresses. I think denser lock names than a
> concatenation of the addresses (or of addresses of elided locks)
> might facilitate hardware locking and/or queuing for larger
> transactions. (Side thought: in some ways lock-guarded memory
> collections are similar to memory partitioning. Just as a
> coherence request for addresses outside a partition can be
> ignored, coherence requests might be filtered via named locks.)
>
> (Conservative filters based on addresses also provide a somewhat
> denser lock name but group the lock-guarded addresses based on
> addresses rather than semantic use so that false conflicts may be
> more likely.)
>
> If the lock names are the same for a retry of a transaction (even
> if the participants are different), it might be possible to order
> transactions to reduce/avoid conflicts and/or increase throughput.
<
But you see, if an ATOMIC event fails, the data structure is likely
to have already changed by the time you notice your own failure,
and retry is never the proper recourse, you have to start at the
very beginning, and assume that every thing you got from the
CDS has become stale. That is, if you are trying to deQueue a
unit of work and it fails, it is unlikely that the head pointer onto
the list remains valid, and every item on the list could have
changed. You just don't know.
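<
(In plain CAS terms this is why a pop loop re-reads the head from
scratch on every attempt and reuses nothing from the failed one; a
Treiber-style sketch, ignoring ABA and reclamation issues:)

#include <stdatomic.h>
#include <stddef.h>

struct wnode { struct wnode *next; /* ... payload ... */ };

struct wnode *dequeue(_Atomic(struct wnode *) *head)
{
    struct wnode *h;
    do {
        h = atomic_load(head);          /* start over each time */
        if (h == NULL)
            return NULL;
        /* h->next is trusted only until the CAS: success proves
           the head was still h at that instant. */
    } while (!atomic_compare_exchange_weak(head, &h, h->next));
    return h;
}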
<
> Lock names might even be helpful for choosing to perform a
> transaction remotely (a specific participant list seems more
> useful for choosing where to locate the work but that might not be
> practical in all cases; the choice of a participant might be
> dependent on the value of another participant but the more work
> that is done at the original location the more expensive moving
> the rest to another location is likely to be).
>
> Being able to "activate" locks under actual contention might
> reduce some problems with HTM. Using memory contents for locks has
> the advantage of exploiting existing mechanisms, but locks are
> somewhat different.
>
> [snip]
> > Some people/designers think HTM can be successful.
> > I am not one of those.
>
> I am a person but not a designer and I think something broader
> than ESM could be useful, particularly if integrated with other
> aspects of communication (including "remote procedure calls") and
> speculation.
>
> Designing an interface that is extendable (as one learns about
> current and potential uses) but also compatible without excessive
> costs in use or implementation is difficult.
>
> Extending the read set of ESM to "L1 cache" would seem to require
> relatively little additional hardware, but greatly increases the
> difficulty of using such effectively. Even probabilistic forward
> progress might be difficult to design in software and hardware.
> (Cliff Click's "IWannaBit!" proposed a single bit for tracking L1
> evictions, believing that this 'simple' HTM would have some uses.)
>
> If a feature is hard to use well or unexpectedly fails to work
> well in certain cases, software developers will be offended and
> reject the feature. Even presenting a feature as a work-in-
> progress — while also guaranteeing some forward compatibility so
> that testing effort is not wasted — can hinder use. Dealing with
> fuzzy constraints like cache conflict misses is generally
> unpleasant; including such frustration as a basic part of using a
> feature will not make that feature attractive.
>
> (One of the problems mentioned for Azul Systems HTM was the common
> use in Java of software performance counters, which introduced
> universal conflicts — the counter participated in all
> transactions. Rewriting all of the Java libraries was not an
> option.)


[...]
Re: Hardware vs. Software bugs
From: Paul A. Clayton <paaronclayton@gmail.com>
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 17:13 UTC
Message-ID: <u6smnb$2gaa7$6@dont-email.me>

On 6/11/23 7:18 PM, MitchAlsup wrote:
> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
[snip]
>> Hmm. I wonder if there are cases where pausing/partially
>> replaying
>> an atomic event might be better than failing and retrying (or
>> choosing a different action).
> <
> I am going to go with NO here. ATOMIC means that one or more
> pieces of state change from the prior value set to the current
> value set instantaneously as seen by all interested 3rd parties.
> <
> Adding pause or replay implies you have broken the above rule
> by allowing some 3rd party to see the prior value set--which
> means that abandonment of the event is the only sane recourse.

If an external read receives the old value that will be
overwritten by the atomic operation, the operation could still be
atomic but occurring "after" that read. If the external read does
not update the value (atomically), then this seems possible
(though recognizing such read-only uses adds complexity). With
multiple such reads across multiple external agents, it might be
excessively complex (or impossible at some point) to guarantee
consistent ordering.

Delaying the operation until "after" an external write (partial
replay) seems even more challenging. If much of the work in the
operation depends on the value that will be delivered later, full
replay could be faster (or at least nearly as fast, reducing the
incentive to add overhead for handling such special cases).
Dynamically changing a cache hit with follow-on computation into a
cache miss (and repeating the dependent computation) seems non-
trivial. Cache misses seem likely to be easier for handling
dynamic ordering; not only is there more "free time" to make a
decision but dependent work will not have been done.

In the general case, tracking dependencies and ordering seems very
unlikely to be worthwhile, but I _suspect_ there are cases where
there would be a net benefit.

>> If other participants are cache
>> misses, failing on receiving a (short term) NAK and having
>> software retry might be less efficient/performant. (I suppose it
>> would also be possible for an atomic operation to release a cache
>> line temporarily. E.g., a core might gain ownership of a cache
>> line before data has returned from main memory (or a remote cache
>> if the home coherence agent granted ownership to a core closer to
>> that agent before the data was available?).
> <
> What possible good can come of that ? Ownership is a prerequisite
> to modification, thus violating the above rule.

What good? If an operation is waiting on other operands,
transferring ownership of an acquired operand value may allow more
forward progress. The borrowing agent could return ownership
(possibly with a new value) but with some overlap of latencies
(waiting for the other agent to return ownership and waiting for
the memory system to provide other operands).

(In theory, in some cases an update could be "silent" from the
perspective of the updater [e.g., updating a saturating {or
wrapping or otherwise overflow-avoiding} counter] and transferred
to another actor.)
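
(Concretely, a saturating counter is "silent" once pinned: an
increment that observes the maximum can be dropped with no visible
difference. A CAS-loop sketch:)

#include <stdatomic.h>
#include <stdint.h>

void sat_inc(_Atomic uint32_t *ctr)
{
    uint32_t v = atomic_load(ctr);
    /* A failed CAS reloads v; stop retrying once pinned at max,
       since the update would change nothing observable. */
    while (v != UINT32_MAX &&
           !atomic_compare_exchange_weak(ctr, &v, v + 1))
        ;
}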

With multiple agents and/or multiple shared operands, establishing
a consistent ordering would be more challenging.

(False sharing also introduces both potential for parallelism and
complexity for extracting such. Cache line granular operations
encourage using as much of the content of a cache line as possible
in a shortish time, encouraging true sharing. Yet some false
sharing probably still exists.)

>> If the "earlier"
>> atomic operation will have to wait anyway, it might be better to
>> make it a "later" atomic operation at least with respect to the
>> one shared participant.)
> <
> You seem to forget that the ATOMIC stuff I am talking about
> transpires over multiple instructions. Each subject to cache and TLB
> misses and the uncertainty of completion time. Add on top of this
> the uncertainty that the event will even succeed.

I do realize that uncertainty of completion (both control flow
speculation — one would prefer not to wait always until the
operation is known to be attempted — and atomicity speculation) is
a significant difficulty. The _delays_ in completion time (not
their uncertainties) were what I was seeking to exploit, though
delay and uncertainty are not perfectly orthogonal.

An atomic operation that temporarily shares an old value can
happen before or _not happen_. An atomic operation that borrows a
value can return the old value on failure, quite possibly without
delaying the lending atomic operation (if it is still waiting for
cache misses).

> Single instruction ATOMICs are easy compared to multiple instruction
> and multiple memory locations participating in a single event. But
> multiple memory ATOMICs allow one to decrease the exponent of
> interference. Take, for example, moving an element from one place
> in a concurrent data structure to another such that no 3rd party can
> ever see that the element is ever "not" in the CDS. That is, you cannot
> remove the element and then insert it somewhere else because some
> 3rd party might scan the whole CDS while you are paged out and not
> find it in there.

The general case is typically harder to handle efficiently. A
significant question is whether some special cases can be handled
specially with lower total cost. I speculate that there is
potential for improvement in handling some special cases.

ESM has a system-wide arbiter to provide an ordering/interference
count after a second attempt. I suspect my concept is to provide
such an ordering _during the first attempt_. Obviously such would
be more difficult (especially as one would not want to introduce
overheads for non-conflicting atomics and atomics friendly to
simple retry), but I suspect there is an opportunity to reduce
conflict cost.

Providing a global order of all atomic operations seems
straightforward; a simple clock with an agent ID appended for the
least significant bits seems sufficient. Avoiding excessively
constraining ordering (when atomics do not conflict) seems more
challenging. With only one operand (or one conflicting operand), a
timestamp dependency chain seems sufficient (one might even be
able to optimize routing by not actually progressing in timestamp
order at least for single operand atomics). With two operands, the
complexity seems much greater, though limiting the number of
conflicting agents (where 'later' attempts are pushed to a retry
or other mechanism) might allow establishing a consistent order in
a somewhat distributed manner without huge tracking requirements.
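
(The clock-plus-agent-ID composition is just the following, with an
illustrative width for the agent field:)

#include <stdint.h>

#define AGENT_BITS 10                   /* up to 1024 agents */

/* Totally ordered, globally unique stamp: clock in the high bits,
   agent ID in the low bits to break ties within a clock tick. */
static inline uint64_t order_stamp(uint64_t clock, uint32_t agent)
{
    return (clock << AGENT_BITS) | (agent & ((1u << AGENT_BITS) - 1));
}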

(Sometimes software would know how many operands will be used or
may be used [obviously less than 7 for ESM]. Communicating such
information earlier might be useful.)

[snip]
>> One problem with large transactions is that the retry cost can be
>> large. Even if participants are aggressively prefetched, a lot of
>> work will have to be repeated even if only one value changed [at
>> least in the usual total failure model]. If the software is
>> inspecting a large amount of loosely coupled data atomically,
>> failing the entire transaction from a single write conflict which
>> could otherwise be locally re-executed seems suboptimal.
> <
> The problem with TM is that software can use it in ways that
> create unbounded numbers of things to track. HW is never good
> at unbounded stuff.....

Transactions would presumably be bounded by the address space.☺

I would prefer not to set bounds which are "too restrictive".
Providing extra per cache line metadata in L1 does not seem
unreasonably expensive, but it is also not clear that this
provides worthwhile benefits. A conservative filter might be even
less helpful even though it is relatively inexpensive — the longer
duration of a transaction so large to require such a filter would
tend to increase the probability of conflicts.

There is also relatively little interface difference between a
software/firmware abstraction layer and a hardware abstraction
layer. Deadlock, livelock, and glacially slow forward progress are
not appreciably different to a human being. The never-versus-eventually
distinction seems insufficient; "soon enough" or not may not be easily
defined, expressed, or detected/predicted, and may not provide
information a chooser would like that is potentially available to the
progress monitor.

Part of the challenge seems to be in providing an abstraction (for
easier reasoning) while also leaking enough information that
optimization is possible (including choosing a different
abstraction/interface).

>> Nesting also introduces issues. Flattening to a single transaction
>> is the simplest way to handle nesting, but such increases the
>> amount of work that must be redone on a failure/retry.
> <
> Nesting increases the exponent of unboundedness.


[...]
Re: Hardware vs. Software bugs
From: Scott Lurndal <scott@slp53.sl.home>
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 18:03 UTC
Message-ID: <PRlkM.1159$o5e9.652@fx37.iad>

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>On 6/11/23 7:18 PM, MitchAlsup wrote:
>> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
>[snip]
>>> Hmm. I wonder if there are cases where pausing/partially
>>> replaying
>>> an atomic event might be better than failing and retrying (or
>>> choosing a different action).
>> <
>> I am going to go with NO here. ATOMIC means that one or more
>> pieces of state change from the prior value set to the current
>> value set instantaneously as seen by all interested 3rd parties.
>> <
>> Adding pause or replay implies you have broken the above rule
>> by allowing some 3rd party to see the prior value set--which
>> means that abandonment of the event is the only sane recourse.
>
>If an external read receives the old value that will be
>overwritten by the atomic operation, the operation could still be
>atomic but occurring "after" that read. If the external read does
>not update the value (atomically),

Note that external reads may have side effects, e.g. reading
a memory mapped interrupt controller acknowledge register
consumes a pending interrupt. Races here would be bad.

Fortunately ARM requires all aligned loads and stores to be single-copy atomic.

Re: Hardware vs. Software bugs
From: MitchAlsup <MitchAlsup@aol.com>
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 18:58 UTC
Message-ID: <ed216e95-46ee-44c4-8901-8f0a6a44fbc9n@googlegroups.com>

On Tuesday, June 20, 2023 at 12:13:19 PM UTC-5, Paul A. Clayton wrote:
> On 6/11/23 7:18 PM, MitchAlsup wrote:
> > On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
<snip>
> > Single instruction ATOMICs are easy compared to multiple instruction
> > and multiple memory locations participating in a single event. But
> > multiple memory ATOMICs allow one to decrease the exponent of
> > interference. Take, for example, moving an element from one place
> > in a concurrent data structure to another such that no 3rd party can
> > ever see that the element is ever "not" in the CDS. That is, you cannot
> > remove the element and then insert it somewhere else because some
> > 3rd party might scan the whole CDS while you are paged out and not
> > find it in there.
> The general case is typically harder to handle efficiently. A
> significant question is whether some special cases can be handled
> specially with lower total cost. I speculate that there is
> potential for improvement in handling some special cases.
>
> ESM has a system-wide arbiter to provide an ordering/interference
<
optional--small systems do not need it; big systems cannot get by
without it. Small is < 32, big is > 1024 CPUs.
<
> count after a second attempt. I suspect my concept is to provide
> such an ordering _during the first attempt_. Obviously such would
> be more difficult (especially as one would not want to introduce
> overheads for non-conflicting atomics and atomics friendly to
> simple retry), but I suspect there is an opportunity to reduce
> conflict cost.
<
ESM operates under several modes. After a successful ESM
event, the order is set to Optimistic. In optimistic mode the
upcoming ATOMIC event will try to run the event as if there were
no ATOMIC decoration on the instructions performing the event.
That is, they operate as fast as the pipeline can perform them.
<
>
> Providing a global order of all atomic operations seems
> straightforward; a simple clock with an agent ID appended for the
> least significant bits seems sufficient. Avoiding excessively
> constraining ordering (when atomics do not conflict) seems more
> challenging. With only one operand (or one conflicting operand), a
> timestamp dependency chain seems sufficient (one might even be
> able to optimize routing by not actually progressing in timestamp
> order at least for single operand atomics). With two operands, the
> complexity seems much greater, though limiting the number of
> conflicting agents (where 'later' attempts are pushed to a retry
> or other mechanism) might allow establishing a consistent order in
> a somewhat distributed manner without huge tracking requirements.
<
And that is why I did not specify it that way.
<
Once interference has been detected twice, the mode changes to
Methodological. Here, the participating addresses are gathered up
in a message and sent to a system arbiter. The arbiter applies each
address to its "in progress list" and if none of the addresses match
an address already in progress, then up to 8 cache lines are granted
to the requestor, and then the requestor is free to NAK snoops on
those addresses. The addresses are installed in the "stack" and their
counts set to 0.
>
However, if the arbiter finds an address already in use, it sends back
the count of that address as the result of the request. This result is,
in effect, how many others are also trying an atomic event on that
same address. SW can use this to choose different addresses, or wait,
or whatever SW decides is best.
<
Then when the event is complete, the same addresses are again
sent to the arbiter and the arbiter removes these addresses from its
"stack".
>
> (Sometimes software would know how many operands will be used or
> may be used [obviously less than 7 for ESM]. Communicating such
> information earlier might be useful.)
>
> [snip]
> >> One problem with large transactions is that the retry cost can be
> >> large. Even if participants are aggressively prefetched, a lot of
> >> work will have to be repeated even if only one value changed [at
> >> least in the usual total failure model]. If the software is
> >> inspecting a large amount of loosely coupled data atomically,
> >> failing the entire transaction from a single write conflict which
> >> could otherwise be locally re-executed seems suboptimal.
> > <
> > The problem with TM is that software can use it in ways that
> > create unbounded numbers of things to track. HW is never good
> > at unbounded stuff.....
<
> Transactions would presumably be bounded by the address space.☺
<
Great: one only has to monitor 288,230,376,151,711,743 cache lines.
>
> I would prefer not to set bounds which are "too restrictive".
<
Can everyone agree that 1 container in memory is obviously too
small ?? {TS, T&TS, LL-SC, CAS}
<
Can everyone agree that the fewer ATOMIC events the less the
interference ?? That is, more powerful primitives end up needing
fewer events/second.
<
> Providing extra per cache line metadata in L1 does not seem
> unreasonably expensive, but it is also not clear that this
> provides worthwhile benefits.
<
Note: Neither ESM nor ASF modify the L1 cache in any way to
support multi-line ATOMICs. It is done at the Miss buffer, which
has to be snooped for other reasons. And that is the reason to
stop at 8 cache lines.
<
> A conservative filter might be even
> less helpful even though it is relatively inexpensive — the longer
> duration of a transaction so large to require such a filter would
> tend to increase the probability of conflicts.
<
As long as you already have several lines in the miss buffer,
you basically add 0 additional resources to the problem space.
>
> There is also relatively little interface difference between a
> software/firmware abstraction layer and a hardware abstraction
> layer. Deadlock, livelock, and glacially slow forward progress are
> not appreciably different to a human being. The never or
> eventually distinction seems insufficient, though "soon enough" or
> not may not be easily defined, expressed, and/or
> detected/predicted and may not provide some information a chooser
> may like that is potentially available to the progress monitor.
<
Nick McLaren used to note that his 64 CPU SPARC V9 server would
"not appear to be doing anything" for over 1 full second when he was
testing ATOMIC stuff. When one looks into the bus protocol and
the applicable latencies, and the cubic nature of the memory traffic,
1 second is about what one would find for the worst case.
<
1 second is enough time for even a human to notice--and is far from
performant.
>
> Part of the challenge seems to be in providing an abstraction (for
> easier reasoning) while also leaking enough information that
> optimization is possible (including choosing a different
> abstraction/interface).
<
The other half is that SW does not know what it wants wrt ATOMICs.
<
> >> Nesting also introduces issues. Flattening to a single transaction
> >> is the simplest way to handle nesting, but such increases the
> >> amount of work that must be redone on a failure/retry.
> > <
> > Nesting increases the exponent of unboundedness.
> Yet it also presents opportunities similar to lock nesting.
> Perhaps constraints might be developed for transaction nesting
> similar to how constraints have been developed for nested locking
> without blocking forward progress.
>
> Just as locking is not a solved problem, I would not expect
> transactional memory to be solved. However, I am optimistic that
> forward progress is possible.
>
> The appeal of transactional memory seems to be in shifting lock
> identification/naming (conflict detection) to a lower layer and
> dynamically "naming" the lock (avoiding false dependencies while
> having less overhead and more probability of forward
> progress/correctness than fine-grained locks). A compiler could
> presumably remove the former benefit by defining appropriate locks
> given the programmer-provided atomic sections (as well as use
> other mechanisms like read-copy-update and hardware atomics; this
> seems to be "merely" which layer does what). Dynamic 'lock
> management' might be somewhat possible in software by monitoring
> lock contention, but like branch prediction some hardware
> involvement seems appropriate.
<
Transactional Memory should be like shopping in a department store.
As the customer moves around she picks up items and puts them in
her cart. Later she brings the cart to the checkout stand and the items
are recognized, assigned to customer, and paid for.
<
The Transactional Memory model has the added implication that no one
else can go to those displays until she is done at the checkout station.
>
>
> [snip]
> >> If the lock names are the same for a retry of a transaction (even
> >> if the participants are different), it might be possible to order
> >> transactions to reduce/avoid conflicts and/or increase throughput.
> > <
> > But you see, if an ATOMIC event fails, the data structure is likely
> > to have already changed by the time you notice your own failure,
> > and retry is never the proper recourse, you have to start at the
> > very beginning, and assume that every thing you got from the
> > CDS has become stale. That is, if you are trying to deQueue a
> > unit of work and it fails, it is unlikely that the head pointer onto
> > the list remains valid, and every item on the list could have
> > changed. You just don't know.
<
> I think there are times when an agent can be certain that some
> data is not stale. Determining independence might not be tractable
> even when possible, but I feel looking into the possibility might
> be worthwhile (even if partial retries are never practical, one
> might well learn something useful in the exploration — besides
> greater skill in detecting pointless rabbitholes☺).
<
The simple rule for the compiler is that if an ATOMIC event fails
everything you touched in and around the concurrent data
structure is stale--heck you don't know that your thread has not
been swapped out for a week and nothing in the CDS is the same
as when you last looked.
<
Compilers hate having to forget stuff like this.
>
> I do not know if more complex interleaving of loads and stores
> (while maintaining specific ordering guarantees) would be
> worthwhile on the hardware side. Such could also easily impact the
> software side, similar to inserting barriers after every access to
shareable memory to treat a weak consistency model as a strong
> one; a potential benefit that is too complex to exploit (or too
> rarely exploitable to be worthwhile) is not especially beneficial.
<
My 66000 operates in a fairly loose memory ordering roughly
equivalent to Opteron. However, upon encountering the first
instruction of an ATOMIC event, the core switches to sequentially
consistent (with all the ramifications thereto). After completing
the event, memory order is relaxed again.
<
In my days we called this memory ordering "causal".
<
But having the core switch from causal to sequentially consistent or
strongly ordered (MMI/O) alleviates the burden on SW to emit fence
instructions.
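<
(The fences being alleviated are the kind software otherwise writes
by hand; in C11 terms, roughly, with the usual acquire/release
annotations on a spinlock:)

#include <stdatomic.h>

extern _Atomic int lock_word;

/* On a weakly ordered machine the acquire/release annotations (or
   explicit fences) order the guarded accesses; a core that hardens
   to sequential consistency for the duration of the atomic event
   makes such annotations unnecessary. */
void locked_update(int *shared)
{
    while (atomic_exchange_explicit(&lock_word, 1,
                                    memory_order_acquire))
        ;                               /* spin */
    (*shared)++;                        /* the guarded work */
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}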
<
>
> To me it *feels* like increasing flexibility beyond ESM is likely
> to be worthwhile.
<
I would like to increase the flexibility, but how ? and how to do that
without adding new resources ?
<
Effectively ASF and ESM are what HW would have had to implement
anyway in order to microcode DCAS, and other more exotic multi-
container ATOMIC primitives.
<
However, an important point is being missed. ASF and ESM effectively
perform ATOMIC stuff based on the address of the container, not on the
data contained within the container.
<
In SW one can compare pointers for equality, but not easily for nearness !!
In HW one can compare both ways at the same time.
<
// SW
# define near(p) (((uint64_t)p) & ~(uint64_t)(cachelinesize-1)) // line address: mask off the offset bits
<
if( near(p) == near(q) ) // then these point to the same cache line.
<
// HW
if( p<63..6> == q<63..6> ) // then these point to the same cache line.
<
And this is partially because bit fields never became first class citizens
of languages (especially as part of pointers).
<
> The "one, few, many" resource management
> consideration does seem to have some application (though unlike
> register allocation, ESM's six units — even though complicated by
> being cache blocks — is more like "one" because the choice is
> probably just "spill" to a lock if more than 6 rather than trying
> to choose which units would be "spilled" to lock-protected
> storage). "Many" may be easy to handle for correctness
> ("spills/resource depletion" sufficiently rare and heuristics good
> enough), but performance would be a problem. I think I am seduced
> by the "few" option, where one avoids the "always spill"
> shortcoming of "one" and the generally poor and often erratic
> performance of "many". Ignoring complexity costs is probably not a
> sign of wisdom.
<
One has been tried {TS, T&TS, LL-SC, CAS} and found wanting.
Many has been tried {TM} and is yet to be successful.
<
That is why working with a few seems promising.
>
> I seem to be blathering now, so I should quit before I get further
> behind.


[...]
Re: Hardware vs. Software bugs
From: MitchAlsup <MitchAlsup@aol.com>
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 19:02 UTC
Message-ID: <608cabc3-edae-41e1-9750-3c7dc117e9dcn@googlegroups.com>

On Tuesday, June 20, 2023 at 1:03:31 PM UTC-5, Scott Lurndal wrote:
> "Paul A. Clayton" <paaron...@gmail.com> writes:
> >On 6/11/23 7:18 PM, MitchAlsup wrote:
> >> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
> >[snip]
> >>> Hmm. I wonder if there are cases where pausing/partially
> >>> replaying
> >>> an atomic event might be better than failing and retrying (or
> >>> choosing a different action).
> >> <
> >> I am going to go with NO here. ATOMIC means that one or more
> >> pieces of state change from the prior value set to the current
> >> value set instantaneously as seen by all interested 3rd parties.
> >> <
> >> Adding pause or replay implies you have broken the above rule
> >> by allowing some 3rd party to see the prior value set--which
> >> means that abandonment of the event is the only sane recourse.
> >
> >If an external read receives the old value that will be
> >overwritten by the atomic operation, the operation could still be
> >atomic but occurring "after" that read. If the external read does
> >not update the value (atomically),
<
> Note that external reads may have side effects,
<
ROM is external..............and can never have side effects....
<
What I think you are trying to state is that there are storage
containers accessed by LD and ST instructions that do not
act like memory. Memory has the property that a new read
always gets the latest write. MMI/O & configuration accesses
do not have this property.
<
I just can't see the word external being the correct word.
<
> e.g. reading
> a memory mapped interrupt controller acknowledge register
> consumes a pending interrupt. Races here would be bad.
>
> Fortunately ARM requires all aligned loads and stores to be single-copy atomic.

Re: Hardware vs. Software bugs
From: Scott Lurndal <scott@slp53.sl.home>
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 21:00 UTC
Message-ID: <mrokM.8259$Zq81.1050@fx15.iad>

MitchAlsup <MitchAlsup@aol.com> writes:
>On Tuesday, June 20, 2023 at 1:03:31 PM UTC-5, Scott Lurndal wrote:
>> "Paul A. Clayton" <paaron...@gmail.com> writes:
>> >On 6/11/23 7:18 PM, MitchAlsup wrote:
>> >> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
>> >[snip]
>> >>> Hmm. I wonder if there are cases where pausing/partially
>> >>> replaying
>> >>> an atomic event might be better than failing and retrying (or
>> >>> choosing a different action).
>> >> <
>> >> I am going to go with NO here. ATOMIC means that one or more
>> >> pieces of state change from the prior value set to the current
>> >> value set instantaneously as seen by all interested 3rd parties.
>> >> <
>> >> Adding pause or replay implies you have broken the above rule
>> >> by allowing some 3rd party to see the prior value set--which
>> >> means that abandonment of the event is the only sane recourse.
>> >
>> >If an external read receives the old value that will be
>> >overwritten by the atomic operation, the operation could still be
>> >atomic but occurring "after" that read. If the external read does
>> >not update the value (atomically),
><
>> Note that external reads may have side effects,
><
>ROM is external..............and can never have side effects....

Which is why I said "may".

><
>What I think you are trying to state is that there are storage
>containers accessed by LD and ST instructions that do not
>act like memory.

I was referring in general to addresses in the physical address
space, regardless of whether they're backed by memory, MMIO,
ECAM or private CPU registers.

Even a memory read can have notable side effects, such as
remote cache line eviction/writeback, etc. Notable from
a security viewpoint, if not a functional viewpoint.

> Memory has the property that a new read
>always gets the latest write.

Assuming cacheable with appropriate guarantees
or strong ordering at the issuer and any intermediate
agents (bus/mesh/ring protocols, root complexes, et alia).

> MMI/O & configuration accesses
>do not have this property.
><
>I just can't see the word external being the correct word.

Fine. It's certainly not uncommon in this usage, but I'll
defer to a better term.

Re: Hardware vs. Software bugs
From: MitchAlsup <MitchAlsup@aol.com>
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 21:24 UTC
Message-ID: <e7c64f12-91cc-4bdf-be9a-83a8ea518504n@googlegroups.com>

On Tuesday, June 20, 2023 at 4:00:06 PM UTC-5, Scott Lurndal wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Tuesday, June 20, 2023 at 1:03:31 PM UTC-5, Scott Lurndal wrote:
> >> "Paul A. Clayton" <paaron...@gmail.com> writes:
> >> >On 6/11/23 7:18 PM, MitchAlsup wrote:
> >> >> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
> >> >[snip]
> >> >>> Hmm. I wonder if there are cases where pausing/partially
> >> >>> replaying an atomic event might be better than failing and
> >> >>> retrying (or choosing a different action).
> >> >> <
> >> >> I am going to go with NO here. ATOMIC means that one or more
> >> >> pieces of state change from the prior value set to the current
> >> >> value set instantaneously as seen by all interested 3rd parties.
> >> >> <
> >> >> Adding pause or replay implies you have broken the above rule
> >> >> by allowing some 3rd party to see the prior value set--which
> >> >> means that abandonment of the event is the only sane recourse.
> >> >
> >> >If an external read receives the old value that will be
> >> >overwritten by the atomic operation, the operation could still be
> >> >atomic but occurring "after" that read. If the external read does
> >> >not update the value (atomically),
> ><
> >> Note that external reads may have side effects,
> ><
> >ROM is external..............and can never have side effects....
> Which is why I said "may".
> ><
> >What I think you are trying to state is that there are storage
> >containers accessed by LD and ST instructions that do not
> >act like memory.
>
> I was referring in general to addresses in the physical address
> space, regardless of whether they're backed by memory, MMIO,
> ECAM or private CPU registers.
>
> Even a memory read can have notable side effects, such as
> remote cache line eviction/writeback, etc. Notable from
> a security viewpoint, if not a functional viewpoint.
<
In My 66000 architecture a predicted read will not alter the
cache or TLB image (or tablewalk acceleration) unless the
predicted read retires. This eliminates that concern, and
returns caches to being "essentially" invisible <architecturally>.
Spectre attacks are rendered moot.
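
A rough sketch of that staging idea in C -- the structure and names
below are invented for illustration, not taken from any My 66000
documentation -- holds speculative fills off to the side and only
installs them into the cache at retire:

/* A minimal sketch, assuming a single staged fill and a toy
 * direct-placement cache; everything here is hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE 64
#define WAYS 4

struct line { bool valid; uint64_t tag; uint8_t data[LINE]; };

static struct line cache[WAYS];   /* "architectural" cache image */
static struct line spec_buf;      /* one staged speculative fill */

/* Speculative miss: capture the line off to the side; the cache
 * (and thus any observable eviction) is left untouched. */
void spec_fill(uint64_t tag, const uint8_t *data)
{
    spec_buf.valid = true;
    spec_buf.tag   = tag;
    memcpy(spec_buf.data, data, LINE);
}

/* Retire: only now install the line, making the fill (and the
 * victim eviction) visible. */
void retire(void)
{
    if (!spec_buf.valid)
        return;
    cache[spec_buf.tag % WAYS] = spec_buf;  /* trivial placement */
    spec_buf.valid = false;
}

/* Squash (misprediction): discard; no cache or TLB footprint
 * remains for a side channel to measure. */
void squash(void)
{
    spec_buf.valid = false;
}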
<
> > Memory has the property that a new read
> >always gets the latest write.
<
> Assuming cacheable with appropriate guarantees
> or strong ordering at the issuer and any intermediate
> agents (bus/mesh/ring protocols, root complexes, et alia).
<
I don't think you have to apply those distinctions--I think the
reverse is true. Memory is a set of containers which deliver
the last value written whenever read. Memory can be safely
cached (under a reasonable protocol) and retain this property.
<
It is all those other containers that have side effects (at least
occasionally) and should not be labeled as <plain> memory.
External was a good try, but comes with excess baggage.....
<
Back to interconnect technology:: the intermediate agents
operate under a "reasonable" set of cache coherence protocol
rules that deliver causal consistency (often called cache
consistency). The design of said protocols and the verification
thereof is not for the faint of heart. {{Then try sticking a NAK
response in the middle of all of it--to give ATOMIC events better
forward progress guarantees.}}
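
To make the forward-progress trade concrete, here is a hedged sketch
(invented types; no specific protocol is being described) of one
requester-side policy for handling such a NAK:

/* Sketch: a NAKed plain request is simply retried, while a NAK
 * during an ATOMIC event fails the whole event so the line's
 * current holder can make forward progress. */
#include <stdbool.h>

enum resp { RESP_DATA, RESP_NAK };

struct req { bool part_of_atomic_event; int retries; };

enum action { INSTALL_LINE, RETRY_REQUEST, FAIL_ATOMIC_EVENT };

enum action on_response(struct req *r, enum resp resp)
{
    if (resp == RESP_DATA)
        return INSTALL_LINE;       /* normal completion */

    if (r->part_of_atomic_event)
        return FAIL_ATOMIC_EVENT;  /* abandon; software retries */

    r->retries++;                  /* NAK on a plain access: */
    return RETRY_REQUEST;          /* re-issue after the round trip */
}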
<
An interesting property of the My 66000 MMU model is that the
core transitions from causal consistency to sequential
consistency (MMI/O), strongly ordered (configuration),
or weakly ordered (ROM) based on the PTE. In addition,
the first access of an ATOMIC event switches the core to
sequential consistency, switching back to causal after the event.
<
These actions should relieve the programmer/compiler
of having to figure this intricate stuff out.
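
Roughly, in C (the names and encodings below are invented for the
sketch; the actual PTE format is not being described):

/* Hedged sketch: the ordering model an access runs under is a
 * function of the PTE's memory type, overridden while an ATOMIC
 * event is in flight. */
#include <stdbool.h>

enum mem_type { MT_DRAM, MT_MMIO, MT_CONFIG, MT_ROM };
enum ordering { CAUSAL, SEQUENTIAL, STRONG, WEAK };

enum ordering access_ordering(enum mem_type pte_type, bool in_atomic_event)
{
    if (in_atomic_event)
        return SEQUENTIAL;         /* first atomic access switches mode */

    switch (pte_type) {
    case MT_MMIO:   return SEQUENTIAL;
    case MT_CONFIG: return STRONG;
    case MT_ROM:    return WEAK;
    default:        return CAUSAL; /* ordinary cacheable memory */
    }
}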
<
> > MMI/O & configuration accesses
> >do not have this property.
> ><
> >I just can't see the word external being the correct word.
> Fine. It's certainly not uncommon in this usage, but I'll
> defer to a better term.

Re: Hardware vs. Software bugs

<SupkM.7189$Vpga.4898@fx09.iad>


https://news.novabbs.org/devel/article-flat.php?id=32894&group=comp.arch#32894

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx09.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Hardware vs. Software bugs
Newsgroups: comp.arch
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com> <2023Jan19.093620@mips.complang.tuwien.ac.at> <27c10e48-a327-4ffe-a6a3-427034325c49n@googlegroups.com> <u62kij$2dc9h$1@dont-email.me> <526f5bc6-be6d-45cc-a79a-f991f24d3efcn@googlegroups.com> <u6smnb$2gaa7$6@dont-email.me> <PRlkM.1159$o5e9.652@fx37.iad> <608cabc3-edae-41e1-9750-3c7dc117e9dcn@googlegroups.com> <mrokM.8259$Zq81.1050@fx15.iad> <e7c64f12-91cc-4bdf-be9a-83a8ea518504n@googlegroups.com>
Lines: 104
Message-ID: <SupkM.7189$Vpga.4898@fx09.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Tue, 20 Jun 2023 22:12:02 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Tue, 20 Jun 2023 22:12:02 GMT
X-Received-Bytes: 5533
 by: Scott Lurndal - Tue, 20 Jun 2023 22:12 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Tuesday, June 20, 2023 at 4:00:06 PM UTC-5, Scott Lurndal wrote:
>> MitchAlsup <Mitch...@aol.com> writes:
>> >On Tuesday, June 20, 2023 at 1:03:31 PM UTC-5, Scott Lurndal wrote:
>> >> "Paul A. Clayton" <paaron...@gmail.com> writes:
>> >> >On 6/11/23 7:18 PM, MitchAlsup wrote:
>> >> >> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
>> >> >[snip]
>> >> >>> Hmm. I wonder if there are cases where pausing/partially
>> >> >>> replaying an atomic event might be better than failing and
>> >> >>> retrying (or choosing a different action).
>> >> >> <
>> >> >> I am going to go with NO here. ATOMIC means that one or more
>> >> >> pieces of state change from the prior value set to the current
>> >> >> value set instantaneously as seen by all interested 3rd parties.
>> >> >> <
>> >> >> Adding pause or replay implies you have broken the above rule
>> >> >> by allowing some 3rd party to see the prior value set--which
>> >> >> means that abandonment of the event is the only sane recourse.
>> >> >
>> >> >If an external read receives the old value that will be
>> >> >overwritten by the atomic operation, the operation could still be
>> >> >atomic but occurring "after" that read. If the external read does
>> >> >not update the value (atomically),
>> ><
>> >> Note that external reads may have side effects,
>> ><
>> >ROM is external..............and can never have side effects....
>> Which is why I said "may".
>> ><
>> >What I think you are trying to state is that there are storage
>> >containers accessed by LD and ST instructions that do not
>> >act like memory.
>>
>> I was referring in general to addresses in the physical address
>> space, regardless of whether they're backed by memory, MMIO,
>> ECAM or private CPU registers.
>>
>> Even a memory read can have notable side effects, such as
>> remote cache line eviction/writeback, etc. Notable from
>> a security viewpoint, if not a functional viewpoint.
><
>In My 66000 architecture a predicted read will not alter the
>cache or TLB image (or tablewalk acceleration) unless the
>predicted read retires. This eliminates that concern, and
>returns caches to being "essentially" invisible <architecturally>.
>Spectre attacks are rendered moot.

Yet even non-speculative evictions leak information, at a
low bit-rate.

><
>> > Memory has the property that a new read
>> >always gets the latest write.
><
>> Assuming cacheable with appropriate guarantees
>> or strong ordering at the issuer and any intermediate
>> agents (bus/mesh/ring protocols, root complexes, et alia).
><
>I don't think you have to apply those distinctions--I think the
>reverse is true. Memory is a set of containers which deliver
>the last value written whenever read. Memory can be safely
>cached (under a reasonable protocol) and retain this property.

If the "memory" is behind CXL, all the intermediate agents
need to provide "causal consistency", including PCIe switches.

I think memory may be just as fraught as external in these contexts.

><
>It is all those other containers that have side effects (at least
>occasionally) and should not be labeled as <plain> memory.
>External was a good try, but comes with excess baggage.....
><
>Back to interconnect technology:: the intermediate agents
>operate under a "reasonable" set of cache coherence protocol
>rules that deliver causal consistency (often called cache
>consistency). The design of said protocols and the verification
>thereof is not for the faint of heart. {{Then try sticking a NAK
>response in the middle of all of it--to give ATOMIC events better
>forward progress guarantees.}}

At 3Leaf Systems, caching protocols with intermediate agents
were our bread-and-butter (the intermediate agents in this case
were InfiniBand switches), with our ASIC interfacing HT to
IB (and vice versa). There's an abandoned patent application for
a block cache in that environment on the USPTO site.

Just a note: the latest version of the AArch64 ARM (DDI 0487)
now includes a description of the transactional memory
feature.
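
For anyone curious how that surfaces to software: with a TME-enabled
toolchain (e.g. -march=armv8-a+tme) the ACLE intrinsics look roughly
like the sketch below. Since a transaction may always abort, the
non-transactional fallback is mandatory; lock(), unlock(), and
counter are placeholders supplied elsewhere.

/* Minimal TME sketch using the ACLE intrinsics. */
#include <arm_acle.h>
#include <stdint.h>

extern void lock(void);
extern void unlock(void);
extern uint64_t counter;

void increment(void)
{
    uint64_t status = __tstart();  /* 0 => inside the transaction */
    if (status == 0) {
        counter++;                 /* becomes visible atomically ... */
        __tcommit();               /* ... here, or not at all */
        return;
    }
    /* Aborted (status holds the reason): fall back to a lock. */
    lock();
    counter++;
    unlock();
}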

Re: Hardware vs. Software bugs

<b4101bdf-dcf8-4117-a76e-ba8ea63e0a24n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32896&group=comp.arch#32896

Newsgroups: comp.arch
X-Received: by 2002:ad4:4e0c:0:b0:62f:f0c9:1c87 with SMTP id dl12-20020ad44e0c000000b0062ff0c91c87mr2493411qvb.12.1687301425885;
Tue, 20 Jun 2023 15:50:25 -0700 (PDT)
X-Received: by 2002:a4a:d095:0:b0:55e:e016:f166 with SMTP id
i21-20020a4ad095000000b0055ee016f166mr1429813oor.1.1687301425559; Tue, 20 Jun
2023 15:50:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Jun 2023 15:50:25 -0700 (PDT)
In-Reply-To: <SupkM.7189$Vpga.4898@fx09.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:315b:1da6:985c:6101;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:315b:1da6:985c:6101
References: <b6260f7b-fbeb-4549-8579-f12c76bc5b69n@googlegroups.com>
<2023Jan19.093620@mips.complang.tuwien.ac.at> <27c10e48-a327-4ffe-a6a3-427034325c49n@googlegroups.com>
<u62kij$2dc9h$1@dont-email.me> <526f5bc6-be6d-45cc-a79a-f991f24d3efcn@googlegroups.com>
<u6smnb$2gaa7$6@dont-email.me> <PRlkM.1159$o5e9.652@fx37.iad>
<608cabc3-edae-41e1-9750-3c7dc117e9dcn@googlegroups.com> <mrokM.8259$Zq81.1050@fx15.iad>
<e7c64f12-91cc-4bdf-be9a-83a8ea518504n@googlegroups.com> <SupkM.7189$Vpga.4898@fx09.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b4101bdf-dcf8-4117-a76e-ba8ea63e0a24n@googlegroups.com>
Subject: Re: Hardware vs. Software bugs
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Tue, 20 Jun 2023 22:50:25 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7704
 by: MitchAlsup - Tue, 20 Jun 2023 22:50 UTC

On Tuesday, June 20, 2023 at 5:12:06 PM UTC-5, Scott Lurndal wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Tuesday, June 20, 2023 at 4:00:06 PM UTC-5, Scott Lurndal wrote:
> >> MitchAlsup <Mitch...@aol.com> writes:
> >> >On Tuesday, June 20, 2023 at 1:03:31 PM UTC-5, Scott Lurndal wrote:
> >> >> "Paul A. Clayton" <paaron...@gmail.com> writes:
> >> >> >On 6/11/23 7:18 PM, MitchAlsup wrote:
> >> >> >> On Saturday, June 10, 2023 at 2:57:11 PM UTC-5, Paul A. Clayton wrote:
> >> >> >[snip]
> >> >> >>> Hmm. I wonder if there are cases where pausing/partially
> >> >> >>> replaying an atomic event might be better than failing and
> >> >> >>> retrying (or choosing a different action).
> >> >> >> <
> >> >> >> I am going to go with NO here. ATOMIC means that one or more
> >> >> >> pieces of state change from the prior value set to the current
> >> >> >> value set instantaneously as seen by all interested 3rd parties.
> >> >> >> <
> >> >> >> Adding pause or replay implies you have broken the above rule
> >> >> >> by allowing some 3rd party to see the prior value set--which
> >> >> >> means that abandonment of the event is the only sane recourse.
> >> >> >
> >> >> >If an external read receives the old value that will be
> >> >> >overwritten by the atomic operation, the operation could still be
> >> >> >atomic but occurring "after" that read. If the external read does
> >> >> >not update the value (atomically),
> >> ><
> >> >> Note that external reads may have side effects,
> >> ><
> >> >ROM is external..............and can never have side effects....
> >> Which is why I said "may".
> >> ><
> >> >What I think you are trying to state is that there are storage
> >> >containers accessed by LD and ST instructions that do not
> >> >act like memory.
> >>
> >> I was referring in general to addresses in the physical address
> >> space, regardless of whether they're backed by memory, MMIO,
> >> ECAM or private CPU registers.
> >>
> >> Even a memory read can have notable side effects, such as
> >> remote cache line eviction/writeback, etc. Notable from
> >> a security viewpoint, if not a functional viewpoint.
> ><
> >In My 66000 architecture a predicted read will not alter the
> >cache or TLB image (or tablewalk acceleration) unless the
> >predicted read retires. This eliminates that concern, and
> >returns caches to being "essentially" invisible <architecturally>.
> >Spectre attacks are rendered moot.
>
> Yet even non-speculative evictions leak information, at a
> low bit-rate.
<
Can you cite a paper ??
>
> ><
> >> > Memory has the property that a new read
> >> >always gets the latest write.
> ><
> >> Assuming cacheable with appropriate guarantees
> >> or strong ordering at the issuer and any intermediate
> >> agents (bus/mesh/ring protocols, root complexes, et alia).
> ><
> >I don't think you have to apply those distinctions--I think the
> >reverse is true. Memory is a set of containers which deliver
> >the last value written whenever read. Memory can be safely
> >cached (under a reasonable protocol) and retain this property.
<
> If the "memory" is behind CXL, all the intermediate agents
> need to provide "causal consistency", including PCIe switches.
<
I thought the general idea of CXL was so the "chip" did not need to
have a DRAM controller, or a system repeater (a la HyperTransport),
thus decoupling the "chip" design from the memory of the customer's
choice--the customer can simply paste an HBM or DDR[23456]
controller on a PCIe link somewhere and that memory would
become cache coherent by using the CXL protocol. Similar for
the CXL<->CXL links providing chip-to-chip interconnect.
<
This frees the "chip" designers from having to make those
kinds of decisions (or expend the engineering effort).
<
However, the "system" has to have enough bandwidth at reasonable
latencies for all this to end up with performant systems.
>
> I think memory may be just as fraught as external in these contexts.
<
I am suggesting that when you use the word memory, you attach
the "read the value last written" semantic to whatever you are calling
memory.
> ><
> >It is all those other containers that have side effects (at least
> >occasionally) and should not be labeled as <plain> memory.
> >External was a good try, but comes with excess baggage.....
> ><
> >Back to interconnect technology:: the intermediate agents
> >operate under a "reasonable" set of cache coherence protocol
> >rules that deliver causal consistency (often called cache
> >consistency). The design of said protocols and the verification
> >thereof is not for the faint of heart. {{Then try sticking a NAK
> >response in the middle of all of it--to give ATOMIC events better
> >forward progress guarantees.}}
<
> At 3Leaf Systems, caching protocols with intermediate agents
> were our bread-and-butter (the intermediate agents in this case
> were InfiniBand switches), with our ASIC interfacing HT to
> IB (and vice versa). There's an abandoned patent application for
> a block cache in that environment on the USPTO site.
>
> Just a note: the latest version of the AArch64 ARM (DDI 0487)
> now includes a description of the transactional memory
> feature.
