Re: Misc: Design tradeoffs in virtual memory systems...

<369c9330-4f12-45b8-a889-5cab65e55877n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32377&group=comp.arch#32377

by: MitchAlsup - Sun, 28 May 2023 02:32 UTC

On Saturday, May 27, 2023 at 9:15:44 PM UTC-5, BGB wrote:
> On 5/27/2023 4:19 PM, MitchAlsup wrote:
> > On Saturday, May 27, 2023 at 3:18:39 PM UTC-5, robf...@gmail.com wrote:
> >> On Saturday, May 27, 2023 at 3:52:53 PM UTC-4, BGB wrote:
> >>
> >> I hate to ask homework questions, but,
> >>
> >> Am I on the right track thinking that hardware table walking for a table in a
> >> VAS needs to be able to stack TLB misses? I have read through some web
> >> pages, and they just give an overview of address translation, with a little
> >> diamond on a translation diagram that says: ‘if in page table’. I think that
> >> piece is not so simple.
> > <
> > If the tablewalker finds a page fault mid-walk, it simply raises PAGEFAULT
> > and control transfers to the page fault handler.
> > <
> > When the page fault handler returns, the instruction causing the table walk
> > is executed again, and the table walker re-walks the table.
> Yeah, sorta similar for TLB Miss.
> It doesn't really work to try to queue up the TLB misses, or to deal
> with all of them.
>
> Rather the first that is raised is handled, and the others are ignored.
> When the instruction is retried after the ISR returns, it may trigger
> another TLB miss. This may continue until all have been resolved.
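The retry behavior BGB describes can be modeled in a few lines. This is a minimal sketch, assuming a hypothetical fully associative software-managed TLB (`Tlb`, `tlb_miss_handler` and `run_with_retry` are illustrative names, not from any real design) that is large enough that refilling one entry never evicts another entry the same instruction needs. Only the first miss is raised, the handler installs that one translation, and the instruction is retried until every access hits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fully associative, software-managed TLB. TLB_SIZE is
   assumed large enough that no refill evicts a needed entry. */
#define TLB_SIZE 16

typedef struct {
    unsigned long vpn[TLB_SIZE];
    size_t count;
} Tlb;

static bool tlb_lookup(const Tlb *t, unsigned long vpn) {
    for (size_t i = 0; i < t->count; i++)
        if (t->vpn[i] == vpn)
            return true;
    return false;
}

/* Stand-in for the TLB-miss ISR: installs the one faulting translation. */
static void tlb_miss_handler(Tlb *t, unsigned long vpn) {
    assert(t->count < TLB_SIZE);
    t->vpn[t->count++] = vpn;
}

/* Executes an instruction that touches pages[0..n-1]. Only the first
   miss is raised; after the handler returns, the whole instruction is
   retried. Returns the number of attempts, including the one that
   succeeds. */
static int run_with_retry(Tlb *t, const unsigned long *pages, size_t n) {
    int attempts = 0;
    for (;;) {
        attempts++;
        size_t i = 0;
        while (i < n && tlb_lookup(t, pages[i]))
            i++;
        if (i == n)
            return attempts;           /* every access hit */
        tlb_miss_handler(t, pages[i]); /* later misses are not seen yet */
    }
}
```

With three untranslated pages, the instruction takes three misses and succeeds on the fourth attempt; an immediate re-execution succeeds on the first.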
> >>
> >> Suppose there is a six-level hierarchical table. What if there is a TLB miss
> >> in one of the levels while processing a TLB miss? It seems like one would
> >> have to stack the misses, process the new one, then unstack and continue,
> >> all in hardware. That is why I had relegated TLB miss processing to software
> >> where it would be easier to handle. I suppose a custom hardware /
> >> software combo co-processor could be built to handle the misses.
> > <
> > One can take a page fault at every level of the table walk.
> > And in 2-level paging (nested) you can take GuestOS page faults in
> > pages GuestOS manages, and HyperVisor page faults in those pages
> > the HV manages.
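A page fault at any level of the walk can be sketched the same way. The following assumes a hypothetical 4-level radix table with 9 index bits per level and 4 KiB pages, held as in-memory nodes. `walk` reports the first level whose entry is not present, a stand-in `page_fault_handler` repairs only that level, and `translate` re-walks from the root after each fault, as Mitch describes. All names are illustrative, and nested (guest/hypervisor) paging is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 4-level radix page table: 9 index bits per level,
   4 KiB pages. Nodes are plain heap objects; nothing is freed here. */
#define LEVELS 4
#define BITS_PER_LEVEL 9
#define ENTRIES (1u << BITS_PER_LEVEL)
#define PAGE_SHIFT 12

typedef struct Node {
    struct Node *child[ENTRIES]; /* interior levels: NULL = not present */
    uint64_t pte[ENTRIES];       /* leaf level: frame + 1, 0 = not present */
} Node;

static unsigned index_at(uint64_t va, int level) {
    unsigned shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level);
    return (unsigned)((va >> shift) & (ENTRIES - 1));
}

/* Returns -1 on success (frame in *frame), or the level (0 = root)
   whose entry was not present. */
static int walk(const Node *root, uint64_t va, uint64_t *frame) {
    const Node *n = root;
    for (int level = 0; level < LEVELS - 1; level++) {
        n = n->child[index_at(va, level)];
        if (n == NULL)
            return level;
    }
    uint64_t pte = n->pte[index_at(va, LEVELS - 1)];
    if (pte == 0)
        return LEVELS - 1;
    *frame = pte - 1;
    return -1;
}

/* Stand-in OS handler: repairs only the level the walker reported.
   leaf_frame is the (hypothetical) frame assigned if the leaf is missing. */
static void page_fault_handler(Node *root, uint64_t va, int level,
                               uint64_t leaf_frame) {
    Node *n = root;
    for (int l = 0; l < level; l++)
        n = n->child[index_at(va, l)];
    if (level == LEVELS - 1) {
        n->pte[index_at(va, level)] = leaf_frame + 1;
    } else {
        n->child[index_at(va, level)] = calloc(1, sizeof(Node));
        assert(n->child[index_at(va, level)] != NULL);
    }
}

/* Re-executes the access, re-walking from the root after every fault.
   Returns the number of page faults taken. */
static int translate(Node *root, uint64_t va, uint64_t leaf_frame,
                     uint64_t *frame) {
    int faults = 0;
    int level;
    while ((level = walk(root, va, frame)) >= 0) {
        page_fault_handler(root, va, level, leaf_frame);
        faults++;
    }
    return faults;
}
```

On an empty table the first access faults once per level (four times) before it translates; a second page sharing the same leaf table then faults only once.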
> >>
> >> What if coincidentally the TLB misses all use the same TLB entry?
> > <
> > Associativity > total levels solves this problem. VAX could get into situations
> > where it needed 14 PTEs simultaneously present to execute some exotic
> > instructions. With a set-associative TLB this was not always possible.
> > {Time for a VAX aficionado to tell us how it got out of these.}
> > <
> Yeah, multiple memory accesses in the same instruction would be pretty
> bad in this sense.
>
>
> It can be pointed out that I used a 4-way set associative TLB.
> Why, when I could have just used 1- or 2-way, which is cheaper?
>
> What if I said that things can get bad unless one uses an associative
> mapping here (and, 4-way was basically the practical minimum I could
> really get away with in my earlier tests).
>
> Though, I can point out that this is related to the use of
> modulo-addressing for the TLB. Hashing the TLB index effectively doubles
> the required associativity needed to avoid certain "problem cases".
<
In my HW hashes, I like to bit-reverse various fields before XORing them.
a) it gives a better hash
b) it is hard for SW to efficiently compute your hash.
<
hash<7:0> = field<24:31> ^ field<23:16> ^ field<15:8>;
<
You can throw in ASIDs at certain points, too, to improve the hashing.
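One reading of Mitch's formula, assumed here, is that writing `field<24:31>` with the indices ascending means bits 24..31 taken in reversed order, while the other two fields are used as-is. A minimal C sketch of that 8-bit hash follows; `rev8`, `tlb_hash` and `tlb_hash_asid` are hypothetical names, and XORing the ASID in at the end is just one arbitrary placement, not his.

```c
#include <stdint.h>

/* Reverses the order of the 8 bits in b (bit 0 <-> bit 7, and so on). */
static uint8_t rev8(uint8_t b) {
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* hash<7:0> = reverse(field<31:24>) ^ field<23:16> ^ field<15:8>,
   assuming field<24:31> denotes bits 24..31 in reversed order. */
static uint8_t tlb_hash(uint32_t field) {
    return (uint8_t)(rev8((uint8_t)(field >> 24)) ^
                     (uint8_t)(field >> 16) ^
                     (uint8_t)(field >> 8));
}

/* One hypothetical way to mix in an address-space ID. */
static uint8_t tlb_hash_asid(uint32_t field, uint8_t asid) {
    return (uint8_t)(tlb_hash(field) ^ asid);
}
```

In hardware the reversal is only wiring; in software it costs the three swap steps in `rev8`, which is Mitch's point (b).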

Re: Misc: Design tradeoffs in virtual memory systems...

<u4v0hk$pecm$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32378&group=comp.arch#32378

by: Ivan Godard - Sun, 28 May 2023 07:40 UTC

On 5/26/2023 6:49 PM, MitchAlsup wrote:
> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
>> On 5/26/2023 4:15 PM, Dan Cross wrote:
>>> In article <66610ab9-0c63-4b8e...@googlegroups.com>,
>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>> On Friday, May 26, 2023 at 2:59:13 PM UTC-5, Dan Cross wrote:
>>>>> Nor does it imply that a hardware page-table walker is bad.
>>>>
>>>> It has always been my argument that HW tableWalkers are best.
>>>
>>> I concur. OP does not, but OP doesn't seem to be talking from a
>>> particularly knowledgeable position.
>>>
>> If I had no idea what I was doing, I would not likely have gotten as far
>> as I have...
>>
>> But, admittedly, some amount of what information I had came from
>> quick-and-dirty web-searches and gathering information from things like
>> CS/EE PowerPoint slides and similar.
> <
> Quick and dirty web searches often portray x86 as the best architecture
> that could ever be invented, almost always emulated, and never defamed.
>>
>> Some amount of reading PDFs and similar as well.
> <
>>>>> The OP seems to think so, but has yet to provide a particularly
>>>>> compelling argument beyond some measurements from very
>>>>> unrepresentative workloads running on a hobby ISA on an FPGA
>>>>
>>>> One must rate BGB's architecture in the less-than-even-academic
>>>> category; far from industrial quality.
>>>
>>> Agreed.
>>>
>> How many people have done much better with their hobby CPU ISA projects?...
> <
> I will give you great credit for pulling off what you have.
> Where I tend to disagree is when you state what you have done as
> if it is superlative. I am guilty of the same.
> <
>>
>>>> But we enjoy watching him stumble across industrial problems
>>>> making the same mistakes we made 40 years ago.
>>>
>>> It seems like many of those could be avoided if OP were a bit
>>> more open-minded and, dare I say it, self-aware.
>>>
>> ?...
>>
>> I am aware of my own existence, and am able to recognize my own
>> reflection in a mirror, etc.
> <
> Cuttlefish are the boundary line on self-recognition.
>>
>> I am not always the best in "intuitive" contexts.
>> Nor necessarily at noticing or thinking about "obvious" things.
>>
>> Nor, particularly skilled with "top down" thinking.
>>
> Throughout my career, I have been more successful with middle out
> thinking and designing--a both ends towards the middle, rather than
> a) just throw it together and see what you can make work, and b)
> top down, minutia be damned--ways of addressing the problems.

Not both ends at the same time; I use a yo-yo model instead:
loop:
    top down until you run out of insight or questions
    pick a part you know must be there regardless of how the open
        issues are resolved
    bottom up implement that part
    collect insights/questions that arose from the implementation
    goto loop

Re: Misc: Design tradeoffs in virtual memory systems...

<u4vg0q$s9s5$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32379&group=comp.arch#32379

by: Paul A. Clayton - Sun, 28 May 2023 12:04 UTC

On 5/27/23 9:12 AM, Terje Mathisen wrote:
> MitchAlsup wrote:
>> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
>> If I could accurately portray my thought processes to a
>> Psychiatrist,
>> they would probably lock me up..............
>> <
>> I often tell the joke to someone like us::
>> <
>> Sane people think that there is a big difference between being
>> sane and insane.
>> <
>> < longish pause >
>> <
>> We know otherwise.
>
> It is more like not all of us accept the same definition for the
> boundary line (i.e. it is a very fuzzy boundary).

That reminds me of a Philip K. Dick story where some people had
the ability to conform reality to their delusions. One government
agent (?) was telling another that determining when reality was so
changed would not be difficult. At a party, they find that the
host has made a plant that grows flying cars. Everyone panicking
at the distortion of reality demonstrated the government agent's
view. That agent then comments something like "what sane person
would grow flying cars" and then flies off the roof (demonstrating
his own reality-altering delusion).

(G.K. Chesterton made a related comment in _Heretics_: "Some
people have pulled the lamp-post down because they wanted the
electric light; some because they wanted old iron; some because
they wanted darkness, because their deeds were evil. Some thought
it not enough of a lamp-post, some too much; some acted because
they wanted to smash municipal machinery; some because they wanted
to smash something. And there is a war in the night, no man
knowing whom he strikes." Agreeing on particulars seems more
pragmatic than trying to develop a consensus on a more general
level, but such seems to be one of the factors in "design by
committee" — trading favors/appeasement also helps give "design by
committee" its reputation.)

> It is like the old saw about all progress being due to
> unreasonable people, since reasonable people accept the status quo.
>
> Terje
>
>

Re: Misc: Design tradeoffs in virtual memory systems...

<u4vg17$s9s5$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32380&group=comp.arch#32380

by: Paul A. Clayton - Sun, 28 May 2023 12:04 UTC

On 5/26/23 9:49 PM, MitchAlsup wrote:
[snip]
> If I could accurately portray my thought processes to a Psychiatrist,
> they would probably lock me up..............
> <
> I often tell the joke to someone like us::
> <
> Sane people think that there is a big difference between being sane and insane.
> <
> < longish pause >
> <
> We know otherwise.

I am reminded of two statements by Mark Vorkosigan in Bujold's
_Mirror Dance_:

"I do think, half of what we call madness is just some poor slob
dealing with pain by a strategy that annoys the people around
him." (Chapter 18)

"Sometimes, insanity is not a tragedy. Sometimes, it's a strategy
for survival. Sometimes . . . it's a triumph." (Chapter 31)

I suspect your neurodivergence is not that type of coping
mechanism, but the quotes came to mind.

Re: Misc: Design tradeoffs in virtual memory systems...

<u4vidc$si9u$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32381&group=comp.arch#32381

by: Paul A. Clayton - Sun, 28 May 2023 12:45 UTC

On 5/27/23 10:15 PM, BGB wrote:
[snip]
> Though, I can point out that this is related to the use of
> modulo-addressing for the TLB. Hashing the TLB index effectively
> doubles the required associativity needed to avoid certain
> "problem cases".
>
> Even, if taken in isolation, an XOR hashed index looks "better";
> and in an intuitive sense, a modulo index seems like a rather poor
> solution. Hashing adds an ugly edge case: What happens when
> H(A+0)==H(A+1) ?...
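BGB's edge case is easy to reproduce. This sketch assumes a toy 16-set TLB (4 index bits) and compares plain modulo indexing with a simple XOR fold of the next 4 VPN bits; `index_modulo`, `index_xor` and `first_adjacent_collision` are illustrative names. Modulo indexing never maps two consecutive pages to the same set, but the XOR fold does at carry boundaries: with these parameters H(0x7F) == H(0x80) == 8, so a 1-way TLB would thrash on a loop touching both pages.

```c
#include <stdint.h>

/* Toy TLB with 16 sets (4 index bits); parameters are assumptions. */
#define SET_BITS 4
#define SETS (1u << SET_BITS)

/* Modulo indexing: the low bits of the virtual page number. */
static unsigned index_modulo(uint64_t vpn) {
    return (unsigned)(vpn & (SETS - 1));
}

/* XOR-hashed indexing: low bits folded with the next SET_BITS bits. */
static unsigned index_xor(uint64_t vpn) {
    return (unsigned)((vpn ^ (vpn >> SET_BITS)) & (SETS - 1));
}

/* Returns the first vpn a < limit with h(a) == h(a + 1), or limit
   if adjacent pages never share a set in that range. */
static uint64_t first_adjacent_collision(unsigned (*h)(uint64_t),
                                         uint64_t limit) {
    for (uint64_t a = 0; a < limit; a++)
        if (h(a) == h(a + 1))
            return a;
    return limit;
}
```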

You might want to look at skewed associativity, which uses
different hashing functions for different (groups of) ways. The
original paper by André Seznec and François Bodin, "Skewed-
Associative Caches" (1992) noted four traits for the hashing
functions: equability (each entry can be used by the same fraction
of the entire cacheable items), inter-bank dispersion (multiple
mappings in one way that conflict are unlikely to conflict in the
other way(s)), local dispersion in a single bank («Many
applications exhibit spacial locality, then mapping functions must
be chosen in order to avoid having two "almost" neighbor lines of
data conflicting for the same physical line in cache bank i.»),
and simple hardware implementation (one does not want to increase
access latency more than one benefits from lower miss rate).
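The skewed-associative idea is that each way uses its own index function, so lines that conflict in one way are spread out in another. The sketch below is only an illustration, not the functions from the Seznec and Bodin paper: it assumes 16 sets per way and 8 index-relevant address bits, uses `a ^ b` for way 0 and `a ^ rev4(b)` for way 1 (where `a` and `b` are the low and next 4 bits), and counts conflicts. All names are hypothetical.

```c
#include <stdint.h>

#define SET_BITS 4
#define SET_MASK ((1u << SET_BITS) - 1)

/* Reverses the order of the low 4 bits of b. */
static unsigned rev4(unsigned b) {
    return ((b & 1u) << 3) | ((b & 2u) << 1) | ((b & 4u) >> 1) | ((b & 8u) >> 3);
}

/* Way 0 index: low field XOR next field. */
static unsigned skew_index0(unsigned addr) {
    unsigned a = addr & SET_MASK, b = (addr >> SET_BITS) & SET_MASK;
    return a ^ b;
}

/* Way 1 index: low field XOR bit-reversed next field. */
static unsigned skew_index1(unsigned addr) {
    unsigned a = addr & SET_MASK, b = (addr >> SET_BITS) & SET_MASK;
    return a ^ rev4(b);
}

/* Over all 8-bit addresses y != x, counts those sharing x's set in
   way 0, and those sharing x's set in both ways at once. */
static void count_conflicts(unsigned x, int *in_way0, int *in_both) {
    *in_way0 = 0;
    *in_both = 0;
    for (unsigned y = 0; y < 256; y++) {
        if (y == x)
            continue;
        if (skew_index0(y) == skew_index0(x)) {
            (*in_way0)++;
            if (skew_index1(y) == skew_index1(x))
                (*in_both)++;
        }
    }
}
```

For every address, 15 others collide with it in way 0, but only 3 (those whose `b` differs by a bit-palindromic pattern) collide in both ways.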

The "local dispersion" issue that you mention is not that
difficult to avoid. More difficult is avoiding excess conflicts
when two different groups have spatial locality. Conditional bit
inversion (XORing multiple bits based on a single-bit hash of
certain bits) is simple and, for the least significant bits, makes
multiple same-direction linear access patterns cross at one time.
Conditional bit reversal (reversing bit order) also seems somewhat
simple and provides high dispersion (with few bits needed to
produce the hash).
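The two conditional transforms can be sketched as follows, assuming a hypothetical 16-set index taken from the low 4 VPN bits and a caller-chosen `sel_mask` whose parity serves as the single-bit hash (all names are illustrative). The check runs two same-direction linear streams whose low bits line up (VPNs starting at 0x000 and 0x100). With plain indexing they share a set on all 16 steps; inverting the second stream's index (the parity of bit 8 is 1) makes it run backward through the sets, so the streams pass each other once without ever landing in the same set; reversing it instead leaves collisions only at the 4 bit-palindromic indices.

```c
#include <stdint.h>

#define IDX_BITS 4
#define IDX_MASK ((1u << IDX_BITS) - 1)

/* Parity of x: the single-bit XOR hash. */
static unsigned parity(uint32_t x) {
    x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
    return x & 1u;
}

/* Reverses the order of the IDX_BITS low bits of v. */
static unsigned rev_idx(unsigned v) {
    unsigned r = 0;
    for (int i = 0; i < IDX_BITS; i++)
        if (v & (1u << i))
            r |= 1u << (IDX_BITS - 1 - i);
    return r;
}

/* Conditional bit inversion: flip all index bits when the parity of
   the bits selected by sel_mask is 1. */
static unsigned cond_invert_index(uint32_t vpn, uint32_t sel_mask) {
    unsigned idx = vpn & IDX_MASK;
    return parity(vpn & sel_mask) ? (~idx & IDX_MASK) : idx;
}

/* Conditional bit reversal: reverse the index bits under the same test. */
static unsigned cond_reverse_index(uint32_t vpn, uint32_t sel_mask) {
    unsigned idx = vpn & IDX_MASK;
    return parity(vpn & sel_mask) ? rev_idx(idx) : idx;
}

/* Counts steps t in [0, 16) where streams base_a + t and base_b + t
   land in the same set under index function f. */
static int lockstep_collisions(unsigned (*f)(uint32_t, uint32_t),
                               uint32_t sel_mask,
                               uint32_t base_a, uint32_t base_b) {
    int hits = 0;
    for (uint32_t t = 0; t < (1u << IDX_BITS); t++)
        if (f(base_a + t, sel_mask) == f(base_b + t, sel_mask))
            hits++;
    return hits;
}
```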

The ideal mapping function would depend on the capacity and on the
workload as well as how many bits can be used or are available
early enough. (Miss cost could also be a factor.)

Scott Lurndal <scott@slp53.sl.home> wrote:

>> Power and PowerPC

[...]

> Obsolete CPUs, all designed three decades ago.

Power10 became available in 2021, and Power11 (presumably) is on the
way; patches are already appearing in gcc.

Re: Misc: Design tradeoffs in virtual memory systems...

<u4vjtf$sn61$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32383&group=comp.arch#32383

by: Paul A. Clayton - Sun, 28 May 2023 13:11 UTC

On 5/27/23 10:28 PM, MitchAlsup wrote:
> On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
>> MitchAlsup wrote:
>> [snip]
>>> PTPs are handled differently than PTEs. Only PTEs go in the TLB; a different
>>> structure (possibly even flip-flops) can manage the sequentiality of PTPs.
> <
>> Quick niggling response: Linear page tables (table stored in the
>> virtual address space) would naturally store intermediate nodes in
>> the TLB (in fact, the page table base could, in theory, be such a
>> TLB-cached node). Placing such in an L2 TLB or a "side" TLB seems
>> attractive for avoiding interference/capacity waste given more
>> frequent 'ordinary' translation requests.
> <
> We would put PTPs in a table-walk accelerator cache which could
> be searched as the TLB was searched. Thus, after a couple of hits in the
> TWA, by the time you know the TLB missed, you could already
> have the address of the missing PTE. In Ross Colorado chips the
> TWA has 8 entries.
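A table-walk accelerator can be sketched in software terms. This assumes a hypothetical 4-level radix table (9 bits per level, 4 KiB pages) and an 8-entry cache of page-table-pointer nodes keyed by depth and by the VA bits that select each node, with round-robin replacement; the 8 entries echo the Ross Colorado figure, and everything else (names, layout, policy) is illustrative. On a TLB miss the walk starts at the deepest cached node, so a neighboring page costs one PTE read instead of four. Real hardware would search the TWA in parallel with the TLB rather than sequentially as here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVELS 4
#define BITS 9
#define ENTRIES (1u << BITS)
#define PAGE_SHIFT 12
#define TWA_ENTRIES 8

typedef struct Node {
    struct Node *child[ENTRIES]; /* interior levels: NULL = not present */
    uint64_t pte[ENTRIES];       /* leaf: frame + 1, 0 = not present */
} Node;

static unsigned shift_of(int depth) { return PAGE_SHIFT + BITS * (LEVELS - 1 - depth); }
static unsigned idx_of(uint64_t va, int depth) { return (unsigned)((va >> shift_of(depth)) & (ENTRIES - 1)); }
static uint64_t tag_of(uint64_t va, int depth) { return va >> (shift_of(depth) + BITS); }

/* Test helper: maps va -> frame, allocating nodes as needed. */
static void map_page(Node *root, uint64_t va, uint64_t frame) {
    Node *n = root;
    for (int d = 0; d < LEVELS - 1; d++) {
        unsigned i = idx_of(va, d);
        if (!n->child[i]) {
            n->child[i] = calloc(1, sizeof(Node));
            assert(n->child[i] != NULL);
        }
        n = n->child[i];
    }
    n->pte[idx_of(va, LEVELS - 1)] = frame + 1;
}

typedef struct {
    int valid, depth;  /* depth of the cached node (1..LEVELS-1) */
    uint64_t tag;      /* VA bits that select this node */
    Node *node;
} TwaEntry;

typedef struct {
    TwaEntry e[TWA_ENTRIES];
    unsigned next;     /* round-robin replacement pointer */
} Twa;

static Node *twa_lookup(const Twa *t, uint64_t va, int depth) {
    for (int i = 0; i < TWA_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].depth == depth &&
            t->e[i].tag == tag_of(va, depth))
            return t->e[i].node;
    return NULL;
}

static void twa_insert(Twa *t, uint64_t va, int depth, Node *node) {
    if (twa_lookup(t, va, depth))
        return;
    t->e[t->next] = (TwaEntry){1, depth, tag_of(va, depth), node};
    t->next = (t->next + 1) % TWA_ENTRIES;
}

/* Walks for va starting at the deepest node found in the TWA. Returns
   the number of page-table entries read, or -1 if va is unmapped. */
static int walk_with_twa(Node *root, Twa *t, uint64_t va, uint64_t *frame) {
    int d = LEVELS - 1;
    Node *n = NULL;
    while (d > 0 && !(n = twa_lookup(t, va, d)))
        d--;
    if (d == 0)
        n = root;
    int reads = 0;
    for (; d < LEVELS - 1; d++) {
        reads++;
        Node *c = n->child[idx_of(va, d)];
        if (!c)
            return -1;
        twa_insert(t, va, d + 1, c);
        n = c;
    }
    reads++;
    uint64_t pte = n->pte[idx_of(va, LEVELS - 1)];
    if (!pte)
        return -1;
    *frame = pte - 1;
    return reads;
}
```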
>>
>> Also, there is some attraction *to me* for
>> storing PTPs and equivalent-node large page translations in the
>> same structure as this reduces the disincentive for large caches
>> (TLBs) for large pages.
> <
> PTPs in the TLB is a waste of TLB entries--a precious resource.

For large page TLBs, there is a problem of utilization. If no
large pages are used, the area (and power) is wasted. By including
PTPs, most of the area and static power is not wasted (though
dynamic power will be wasted by accessing this special TLB for every
access — unless the large page TLB access can be delayed until
after a small page TLB miss is noted [prediction might also be
used]).

For L2 TLBs, the benefit is even more likely to justify the costs
both because the capacity is larger (so utilization is more
important) and because access frequency is lower (so dynamic power
cost is less). Parallel access of a table walk cache with the TLB
would have the same power waste.

For huge pages, special TLBs and hash/rehash seem to be the most
practical options. (Overlaid skewed associativity could support
parallel access with multiple hashes, but because the bits are
different, conflicts would be possible [possibly handled by
serialization].) Hash/rehash has some utilization advantage, and
for huge pages the modest extra latency of a later indexing may be
covered by greater persistence of entries in a small specialized
TLB. However, it leaves tag bits unused for large pages, and data
bits unused if large pages do not repurpose those bits to extend
the physical address space or to provide extra metadata
[permissions, caching behavior, etc.]. Varying the number of
entries in a set based on the page size would be possible (to
reduce storage "waste"), but that also introduces issues.

Associativity is also a factor: with separate storage, each
storage area has its own set of ways. This seems to be one of those
areas where specialization (efficiency) and utilization tradeoffs
are significant.
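Hash/rehash for two page sizes amounts to probing the same set-associative array twice. A minimal sketch, assuming hypothetical 4 KiB and 2 MiB page sizes, a 64-set, 2-way array whose entries carry a size bit, and no replacement policy (the caller picks the way): the first probe indexes with the small-page VPN, and only on a miss is the large-page VPN used as a second index. A large-page entry's VPN is 9 bits shorter, which is where the unused tag bits come from. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SETS 64
#define WAYS 2
#define SMALL_SHIFT 12 /* 4 KiB pages (assumed) */
#define LARGE_SHIFT 21 /* 2 MiB pages (assumed) */

typedef struct {
    bool valid, large;
    uint64_t vpn; /* va >> SMALL_SHIFT or va >> LARGE_SHIFT */
    uint64_t pfn; /* physical frame, in units of the entry's page size */
} Entry;

typedef struct { Entry set[SETS][WAYS]; } Tlb;

static unsigned set_of(uint64_t vpn) { return (unsigned)(vpn % SETS); }

/* One probe, indexing with the VPN for the given page size. */
static bool tlb_probe(const Tlb *t, uint64_t va, bool large, uint64_t *pa) {
    unsigned shift = large ? LARGE_SHIFT : SMALL_SHIFT;
    uint64_t vpn = va >> shift;
    const Entry *s = t->set[set_of(vpn)];
    for (int w = 0; w < WAYS; w++)
        if (s[w].valid && s[w].large == large && s[w].vpn == vpn) {
            *pa = (s[w].pfn << shift) | (va & ((1ull << shift) - 1));
            return true;
        }
    return false;
}

/* Hash, then rehash: returns probes used (1 or 2), or 0 on a miss. */
static int tlb_lookup(const Tlb *t, uint64_t va, uint64_t *pa) {
    if (tlb_probe(t, va, false, pa))
        return 1;
    if (tlb_probe(t, va, true, pa))
        return 2;
    return 0;
}

static void tlb_insert(Tlb *t, uint64_t va, bool large, uint64_t pfn, int way) {
    unsigned shift = large ? LARGE_SHIFT : SMALL_SHIFT;
    uint64_t vpn = va >> shift;
    t->set[set_of(vpn)][way] = (Entry){true, large, vpn, pfn};
}
```

Small-page hits cost one probe, large-page hits two; putting the more common size first keeps the extra latency on the rarer case.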

> < (There may be some disadvantages of such
>> shared storage. E.g., perhaps PTPs might benefit from sharing the
>> tag among multiple entries differently than large-page PTEs.
>> Complexity would also be a factor.)
> <
> In the past I have held lines of PTPs in TWA so that nearby accesses
> do not need L2 accesses.
>>
>> (I hope to post my own thoughts on hardware TLB fill sometime.)

In article <obocM.254898$LAYb.126941@fx02.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>In article <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>,
>>MitchAlsup <MitchAlsup@aol.com> wrote:
>>>On Friday, May 26, 2023 at 2:19:40 PM UTC-5, Scott Lurndal wrote:
>>>> >You use the word "enhance" in a way contrary to the dictionary definition...
>>>>
>>>> There must have been demand for them from someone.
>>>
>>>A billion demands from a billion different people does not a Mona Lisa make.
>>
>>While I'm sure we can all agree that the x86 is a dog's
>>breakfast, that does not imply that all of its features are bad.
>>Nor does it imply that a hardware page-table walker is bad.
>>
>>The OP seems to think so, but has yet to provide a particularly
>>compelling argument beyond some measurements from very
>>unrepresentative workloads running on a hobby ISA on an FPGA
>
>To be fair to BGB, if software TLBs work for his particular
>hobby ISA, that's fine. It's the idea that software TLBs
>are universally better than hardware page table walkers that
>doesn't stand up to scrutiny.

This exactly. It's one thing to create a hobby ISA as a passion
project; quite another to decide that that gives one the
authority to proclaim oneself an expert on related matters.

- Dan C.

In article <u4rm3o$52b5$1@dont-email.me>, BGB <cr88192@gmail.com> wrote:
>On 5/26/2023 4:15 PM, Dan Cross wrote:
>> In article <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>,
>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>> On Friday, May 26, 2023 at 2:59:13 PM UTC-5, Dan Cross wrote:
>>>> Nor does it imply that a hardware page-table walker is bad.
>>>
>>> It has always been my argument that HW tableWalkers are best.
>>
>> I concur. OP does not, but OP doesn't seem to be talking from a
>> particularly knowledgeable position.
>
>If I had no idea what I was doing, I would not likely have gotten as far
>as I have...
>
>But, admittedly, some amount of what information I had came from
>quick-and-dirty web-searches and gathering information from things like
>CS/EE PowerPoint slides and similar.
>
>Some amount of reading PDFs and similar as well.

An issue is that by focusing on how far you've come, you fail to
see how far you have yet to go. Should you be proud of your
accomplishments? Sure. Do they make you an expert? No. There
is no law against arguing with people on the Internet, but you
run the risk of getting lumped into the same category of cranks
as the public domain DOS guy if you show yourself to be arguing
consistently from a position of ignorance. Being at the bottom
of people's plonk filters isn't a great place to be.

>>>> The OP seems to think so, but has yet to provide a particularly
>>>> compelling argument beyond some measurements from very
>>>> unrepresentative workloads running on a hobby ISA on an FPGA
>>>
>>> One must rate BGB's architecture in the less-than-even-academic
>>> category; far from industrial quality.
>>
>> Agreed.
>
>How many people have done much better with their hobby CPU ISA projects?...

I haven't a clue. However, most undergrads who take an
architecture course do about the same.

- Dan C.

cross@spitfire.i.gajendra.net (Dan Cross) writes:
>In article <u4rm3o$52b5$1@dont-email.me>, BGB <cr88192@gmail.com> wrote:
>>How many people have done much better with their hobby CPU ISA projects?...
>
>I haven't a clue. However, most undergrads who take an
>architecture course do about the same.

Implement a new architecture in an FPGA and get it to run Doom (which
also means retargeting a compiler for his architecture)? I very much
doubt it.

Still, the point by Scott Lurndal stands: Choices that are fine for a
project with the limited resources that BGB has may be suboptimal for
a more resource-rich project.

It seems that the most limited resource here is BGB himself. And it's
not clear to me if implementing the hardware to produce a TLB miss
exception and then implementing the miss handler in software really
takes less of that resource than implementing a hardware page table
walker. But he already seems to have gone for the first choice, so I
would stick to it unless it becomes a significant problem.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Misc: Design tradeoffs in virtual memory systems...

<oMKcM.3451880$iU59.2611656@fx14.iad>

https://news.novabbs.org/devel/article-flat.php?id=32388&group=comp.arch#32388

by: EricP - Sun, 28 May 2023 15:51 UTC

MitchAlsup wrote:
> On Saturday, May 27, 2023 at 3:18:39 PM UTC-5, robf...@gmail.com wrote:
>> What if coincidentally the TLB misses all use the same TLB entry?
> <
> Associativity > total levels solves this problem. VAX could get into situations
> where it needed 14 PTEs simultaneously present to execute some exotic
> instructions. With a set-associative TLB this was not always possible.
> {Time for a VAX aficionado to tell us how it got out of these.}

If I count correctly, the worst-case VAX instruction has 52 virtual
address translates and 78 memory reads. I had thought all the virtual
addresses had to be present in the TLB at once, but just realized
they do not; just the 52 PTEs need to be marked Valid.
The addresses are processed serially, and the hardware walker loads
any that miss the TLB for future lookups. If such a load evicts, by
conflict, an entry that is needed again immediately, that entry just
gets reloaded.

(A software-managed TLB would require all virtual addresses to be
resident in the TLB at once because any TLB miss causes an instruction
abort and must be restarted from the beginning. The hardware walker
does not abort the instruction on a TLB miss and just re-loads them.)
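The restart problem, and Mitch's earlier "associativity > total levels" rule, can be checked with a one-set model. The sketch assumes a hypothetical 2-way set with LRU replacement and an instruction whose three pages all index that set. With a software-refilled TLB, every miss aborts and restarts the instruction, and since three pages can never be resident in two ways at once, it never completes; a walker that fills the miss and continues finishes after three fills. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single TLB set: WAYS entries, LRU replacement. */
#define WAYS 2

typedef struct {
    unsigned long vpn[WAYS];
    bool valid[WAYS];
    unsigned long stamp[WAYS]; /* last-use time for LRU */
    unsigned long clock;
} Set;

static bool set_lookup(Set *s, unsigned long vpn) {
    for (int w = 0; w < WAYS; w++)
        if (s->valid[w] && s->vpn[w] == vpn) {
            s->stamp[w] = ++s->clock;
            return true;
        }
    return false;
}

/* Fills vpn into an invalid way if any, else over the LRU way. */
static void set_fill(Set *s, unsigned long vpn) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!s->valid[w]) { victim = w; break; }
        if (s->stamp[w] < s->stamp[victim]) victim = w;
    }
    s->vpn[victim] = vpn;
    s->valid[victim] = true;
    s->stamp[victim] = ++s->clock;
}

/* Software refill: a miss aborts the instruction, the handler fills the
   entry, and execution restarts at the first access. Returns attempts
   needed, or -1 if not done within max_attempts (livelock). */
static int run_restart(Set *s, const unsigned long *pages, size_t n,
                       int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        size_t i = 0;
        while (i < n && set_lookup(s, pages[i]))
            i++;
        if (i == n)
            return attempt;
        set_fill(s, pages[i]);
    }
    return -1;
}

/* Hardware walker: a miss is filled and the instruction continues with
   the next access. Returns the number of fills. */
static int run_walker(Set *s, const unsigned long *pages, size_t n) {
    int fills = 0;
    for (size_t i = 0; i < n; i++)
        if (!set_lookup(s, pages[i])) {
            set_fill(s, pages[i]);
            fills++;
        }
    return fills;
}
```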

The array INDEX instruction has 6 operands, each of which can use
register indirect deferred addressing (the register contains the address
of the address of the operand). The in-memory address can straddle
two pages, as can the operand data, so that's 4 address translates and
4 memory reads per operand. Plus the instruction itself can straddle two pages.

The P0 and P1 process space page table base registers contain
virtual addresses in S0 system space, so the process page tables are
themselves mapped through the S0 system page table (whose base
register, SBR, holds a physical address). So both the process-space
translation and the system-space translation of the PTE's address
may be cached in the TLB.

The result is that an address translate can require 2 TLB lookups
and 2 PTE reads if both TLB miss, in addition to the operand access.
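Here is a simplified Python sketch of that two-level lookup. It assumes the usual VAX arrangement: 512-byte pages, a P0 page table located by a system-space virtual address, and a system page table located by a physical base address. It ignores valid bits, length registers, P1 space, and protection. The dictionaries that stand in for memory and the TLB are hypothetical.

```python
PAGE_SHIFT = 9                 # VAX pages are 512 bytes
S0_BASE = 0x8000_0000          # start of system (S0) space


def translate(va, tlb, p0br, sbr, phys, stats):
    """Translate a P0 or S0 virtual address, VAX-style (simplified).

    p0br:  S0 *virtual* address of the P0 page table.
    sbr:   *physical* address of the system page table.
    phys:  dict standing in for physical memory: address -> PTE, where a
           PTE here is just a page frame number (valid bits ignored).
    tlb:   dict keyed by ('P0' | 'S0', vpn) -> pfn.
    """
    space = 'S0' if va >= S0_BASE else 'P0'
    vpn = ((va - S0_BASE) if space == 'S0' else va) >> PAGE_SHIFT
    offset = va & ((1 << PAGE_SHIFT) - 1)
    stats['lookups'] += 1
    pfn = tlb.get((space, vpn))
    if pfn is None:
        if space == 'S0':
            pte_pa = sbr + 4 * vpn            # system PTE: physical address
        else:
            pte_va = p0br + 4 * vpn           # process PTE lives in S0 space,
            pte_pa = translate(pte_va, tlb, p0br, sbr, phys, stats)  # so translate it
        stats['pte_reads'] += 1
        pfn = phys[pte_pa]
        tlb[(space, vpn)] = pfn               # fill for future lookups
    return (pfn << PAGE_SHIFT) | offset


phys = {0x1000: 0x20,      # system PTE for S0 page 0 -> frame 0x20
        0x400C: 0x55}      # process PTE for P0 page 3 -> frame 0x55
tlb, stats = {}, {'lookups': 0, 'pte_reads': 0}
pa = translate(0x605, tlb, p0br=S0_BASE, sbr=0x1000, phys=phys, stats=stats)
print(hex(pa), stats)      # 0xaa05 {'lookups': 2, 'pte_reads': 2}
```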

Totaling it all up, 6 operands * 4 addresses + 2 instruction addr = 26,
times 2 for the two page tables = 52 virtual address translates,
where each translate may require a PTE memory read.
Plus the 26 actual instruction and data accesses = 78 memory reads.
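The tally above, written out as arithmetic. The parameter names are mine; the defaults come from the post.

```python
def vax_index_worst_case(operands=6, addrs_per_operand=4, instr_pages=2, levels=2):
    """Worst-case counts for one VAX INDEX instruction, following the post:
    each deferred operand can straddle pages for both its pointer and its
    data (4 addresses), the instruction bytes can straddle 2 pages, and each
    access needs its own translate plus one for its process PTE's S0 address."""
    accesses = operands * addrs_per_operand + instr_pages   # 26
    translates = accesses * levels                          # 52, each may read a PTE
    memory_reads = translates + accesses                    # 52 PTE reads + 26 = 78
    return accesses, translates, memory_reads


print(vax_index_worst_case())   # (26, 52, 78)
```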

Re: Misc: Design tradeoffs in virtual memory systems...

<R4LcM.2218886$Tcw8.1253065@fx10.iad>

https://news.novabbs.org/devel/article-flat.php?id=32389&group=comp.arch#32389

Newsgroups: comp.arch

by: EricP - Sun, 28 May 2023 16:12 UTC

Paul A. Clayton wrote:
> MitchAlsup wrote:
> [snip]
>> PTPs are handled differently than PTEs. Only PTEs go in the TLB; a
>> different structure (possibly even flip-flops) can manage the
>> sequentiality of PTPs.
>
> Quick niggling response: Linear page tables (table stored in the
> virtual address space) would naturally store intermediate nodes in
> the TLB (in fact, the page table base could, in theory, be such a
> TLB-cached node). Placing such in an L2 TLB or a "side" TLB seems
> attractive for avoiding interference/capacity waste given more
> frequent 'ordinary' translation requests.
>
> Also, there is some attraction *to me* for
> storing PTPs and equivalent-node large page translations in the
> same structure as this reduces the disincentive for large caches
> (TLBs) for large pages. (There may be some disadvantages of such
> shared storage. E.g., perhaps PTPs might benefit from sharing the
> tag among multiple entries differently than large-page PTEs.
> Complexity would also be a factor.)
>
> (I hope to post my own thoughts on hardware TLB fill sometime.)

That's how VAX worked: the process page table base is a virtual
address in system space, so process PTE lookups are themselves
translated through the system page table, and each recursive
translate can be cached in the TLB.

One way to improve this is a virtually indexed L1 cache,
with each way limited to one page so it doesn't have aliasing issues.
The virtually indexed cache acts as a large L1 TLB.

This wouldn't have helped VAX much, though: its pages were 512 bytes,
so a 4-way set-associative virtually indexed cache would hold only
2048 bytes, which is probably not large enough to make any difference.
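The 2048-byte figure follows from the usual alias-free constraint, assuming a virtually indexed, physically tagged design. The set-index and line-offset bits must come from the untranslated page offset, so each way can be at most one page. The 4 KiB, 8-way comparison below is an illustrative assumption and does not come from the post.

```python
def max_alias_free_vipt_bytes(page_bytes, ways):
    """Largest virtually indexed cache whose index bits all lie inside the
    page offset: one page per way, times the number of ways."""
    return page_bytes * ways


print(max_alias_free_vipt_bytes(512, 4))    # 2048  (VAX: 512-byte pages)
print(max_alias_free_vipt_bytes(4096, 8))   # 32768 (e.g. 4 KiB pages, 8-way)
```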

Re: Misc: Design tradeoffs in virtual memory systems...

<u4vvc9$udlp$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32390&group=comp.arch#32390

Newsgroups: comp.arch

by: BGB - Sun, 28 May 2023 16:26 UTC

On 5/28/2023 9:57 AM, Anton Ertl wrote:
> cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> In article <u4rm3o$52b5$1@dont-email.me>, BGB <cr88192@gmail.com> wrote:
>>> How many people have done much better with their hobby CPU ISA projects?...
>>
>> I haven't a clue. However, most undergrads who take an
>> architecture course do about the same.
>
> Implement a new architecture in an FPGA and get it to run Doom (which
> also means retargeting a compiler for his architecture)? I very much
> doubt it.
>

In this case, I didn't retarget an existing compiler, I wrote my own...
Though, in retrospect, it might have been better had I retargeted an
existing (popular) compiler.

Never actually went to any university or similar (who can afford
this?...), so what little I have "from" any of those courses comes from
occasionally running into PowerPoint slides on the internet.

Admittedly, never really worked "in the industry" either.

> Still, the point by Scott Lurndal stands: Choices that are fine for a
> project with the limited resources that BGB has may be suboptimal for
> a more resource-rich project.
>
> It seems that the most limited resource here is BGB himself. And it's
> not clear to me if implementing the hardware to produce a TLB miss
> exception and then implementing the miss handler in software really
> takes less of that resource than implementing a hardware page table
> walker. But he already seems to have gone for the first choice, so I
> would stick to it unless it becomes a significant problem.
>

Probably fair enough...

Re: Misc: Design tradeoffs in virtual memory systems...

<u4vvkq$udlp$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32391&group=comp.arch#32391

Newsgroups: comp.arch

by: BGB - Sun, 28 May 2023 16:31 UTC

On 5/28/2023 9:30 AM, Dan Cross wrote:
> In article <obocM.254898$LAYb.126941@fx02.iad>,
> Scott Lurndal <slp53@pacbell.net> wrote:
>> cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> In article <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>,
>>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>>> On Friday, May 26, 2023 at 2:19:40 PM UTC-5, Scott Lurndal wrote:
>>>>>> You use the word "enhance" in a way contrary to the dictionary definition.
>>>>>
>>>>> There must have been demand for them from someone.
>>>>
>>>> A billion demands from a billion different people does not a Mona Lisa make.
>>>
>>> While I'm sure we can all agree that the x86 is a dog's
>>> breakfast, that does not imply that all of its features are bad.
>>> Nor does it imply that a hardware page-table walker is bad.
>>>
>>> The OP seems to think so, but has yet to provide a particularly
>>> compelling argument beyond some measurements from very
>>> unrepresentative workloads running on a hobby ISA on an FPGA
>>
>> To be fair to BGB, if software TLBs work for his particular
>> hobby ISA, that's fine. It's the idea that software TLBs
>> are universally better than hardware page table walkers that
>> doesn't stand up to scrutiny.
>
> This exactly. It's one thing to create a hobby ISA as a passion
> project; quite another to decide that that gives one the
> authority to proclaim oneself an expert on related matters.
>

When or where did I ever claim to be an expert on these topics?...

I make no claims of being an expert, nor "the smartest person in the
room", nor anything similar...

But, admittedly, IRL I don't really know anyone else with similar
interests, so it is mostly all "the internet".

> - Dan C.
>

Re: Misc: Design tradeoffs in virtual memory systems...

<1ELcM.4041590$GNG9.143389@fx18.iad>

https://news.novabbs.org/devel/article-flat.php?id=32392&group=comp.arch#32392

Newsgroups: comp.arch

by: Scott Lurndal - Sun, 28 May 2023 16:51 UTC

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>MitchAlsup wrote:
>[snip]
>> PTPs are handled differently than PTEs. Only PTEs go in the TLB, a different
>> structure (possibly even flip-flops) can manage the sequentiality of PTPs.
>
>Quick niggling response: Linear page tables (table stored in the
>virtual address space) would naturally store intermediate nodes in
>the TLB (in fact, the page table base could, in theory, be such a
>TLB-cached node). Placing such in an L2 TLB or a "side" TLB seems
>attractive for avoiding interference/capacity waste given more
>frequent 'ordinary' translation requests.

Most modern table walkers have intermediate caches for the upper
levels of the page table to improve walk speed on subsequent
lookups. The tricky part is making sure they're invalidated
when the corresponding TLB entry/entries are invalidated by
software.
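Here is a toy model of that hazard. It assumes a hypothetical 4-level radix table with 9 index bits per level and 4 KiB pages, and table contents are plain dictionaries. The walker caches the table pointer for each intermediate level. If software rewrites an upper-level entry but only invalidates the leaf translation, the stale pointer survives. ARMv8 draws a roughly similar line between last-level-only invalidates (TLBI VALE1) and all-level ones (TLBI VAE1).

```python
LEVELS, BITS, PAGE_SHIFT = 4, 9, 12   # assumed: 4-level table, 4 KiB pages


def prefix(va, level):
    """VA bits that identify which table is used at `level` (1..LEVELS-1)."""
    return va >> (PAGE_SHIFT + BITS * (LEVELS - level))


class WalkerWithCaches:
    def __init__(self, tables, root):
        self.tables = tables   # table id -> {index: next table id, or PFN at leaf}
        self.root = root
        self.tlb = {}          # VPN -> PFN
        self.pwc = {}          # (level, VA prefix) -> table id (page-walk cache)

    def index(self, va, level):
        shift = PAGE_SHIFT + BITS * (LEVELS - 1 - level)
        return (va >> shift) & ((1 << BITS) - 1)

    def walk(self, va):
        vpn = va >> PAGE_SHIFT
        if vpn in self.tlb:
            return self.tlb[vpn]
        level, table = 0, self.root
        for lvl in range(LEVELS - 1, 0, -1):      # deepest cached pointer first
            hit = self.pwc.get((lvl, prefix(va, lvl)))
            if hit is not None:
                level, table = lvl, hit
                break
        while level < LEVELS - 1:                 # finish the walk from there
            table = self.tables[table][self.index(va, level)]
            level += 1
            self.pwc[(level, prefix(va, level))] = table
        pfn = self.tables[table][self.index(va, level)]
        self.tlb[vpn] = pfn
        return pfn

    def invalidate(self, va, leaf_only=True):
        """Drop va's TLB entry; unless leaf_only, also drop every cached
        table pointer on va's walk path."""
        self.tlb.pop(va >> PAGE_SHIFT, None)
        if not leaf_only:
            for lvl in range(1, LEVELS):
                self.pwc.pop((lvl, prefix(va, lvl)), None)


tables = {'root': {0: 'L1'}, 'L1': {0: 'L2'}, 'L2': {0: 'L3a'},
          'L3a': {5: 0x111}, 'L3b': {5: 0x222}}
w = WalkerWithCaches(tables, 'root')
va = 5 << PAGE_SHIFT
print(hex(w.walk(va)))            # 0x111
tables['L2'][0] = 'L3b'           # software changes an upper-level entry
w.invalidate(va, leaf_only=True)
print(hex(w.walk(va)))            # 0x111: stale walk-cache pointer still used
w.invalidate(va, leaf_only=False)
print(hex(w.walk(va)))            # 0x222
```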

When the OS updates a mapping on ARMv8, it must first break
the mapping (mark the last level entry unmapped), invalidate
the TLB, then update the last level entry with the new mapping
address and/or attributes. A second TLB invalidate is required.

Thomas Koenig <tkoenig@netcologne.de> writes:
>Scott Lurndal <scott@slp53.sl.home> schrieb:
>
>>> Power and PowerPC
>
>[...]
>
>> Obsolete CPUs, all designed three decades ago.
>
>Power10 became available in 2021, and Power11 (presumably) is on the
>way; patches are already appearing in gcc.

Indeed, Power has been continuously enhanced since 1991, which,
I believe, is three decades ago. As for obsolescence, how many new
design wins has Power had in the last decade?

Re: Misc: Design tradeoffs in virtual memory systems...

<d4f75b55-78f1-4fa4-96fb-9fb45fee8168n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32394&group=comp.arch#32394

Newsgroups: comp.arch

by: MitchAlsup - Sun, 28 May 2023 17:04 UTC

On Sunday, May 28, 2023 at 8:02:07 AM UTC-5, Thomas Koenig wrote:
> Scott Lurndal <sc...@slp53.sl.home> schrieb:
>
> >> Power and PowerPC
>
> [...]
>
> > Obsolete CPUs, all designed three decades ago.
>
> Power10 became available in 2021, and Power11 (presumably) is on the
> way; patches are already appearing in gcc.
<
But Power (on which it is based) is a four-decade-old design.

Re: Misc: Design tradeoffs in virtual memory systems...

<6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32395&group=comp.arch#32395

Newsgroups: comp.arch

by: MitchAlsup - Sun, 28 May 2023 17:06 UTC

On Sunday, May 28, 2023 at 8:12:42 AM UTC-5, Paul A. Clayton wrote:
> On 5/27/23 10:28 PM, MitchAlsup wrote:
> > On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
> >> MitchAlsup wrote:
> >> [snip]
> >>> PTPs are handled differently than PTEs. Only PTEs go in the TLB, a different
> >>> structure (possibly even flip-flops) can manage the sequentiality of PTPs.
> > <
> >> Quick niggling response: Linear page tables (table stored in the
> >> virtual address space) would naturally store intermediate nodes in
> >> the TLB (in fact, the page table base could, in theory, be such a
> >> TLB-cached node). Placing such in an L2 TLB or a "side" TLB seems
> >> attractive for avoiding interference/capacity waste given more
> >> frequent 'ordinary' translation requests.
> > <
> > We would put PTPs in a table-walk accelerator cache which could
> > be searched as the TLB was searched. Thus, a couple hits in the
> > TWA and by the time you know the TLB missed, you could already
> > have the address of the missing PTE. In Ross Colorado chips the
> > TWA has 8 entries.
> >>
> >> Also, there is some attraction *to me* for
> >> storing PTPs and equivalent-node large page translations in the
> >> same structure as this reduces the disincentive for large caches
> >> (TLBs) for large pages.
> > <
> > PTPs in the TLB is a waste of TLB entries--a precious resource.
<
> For large page TLBs, there is a problem of utilization. If no
> large pages are used, the area (and power) is wasted. By including
> PTPs, most of the area and static power is not wasted (though
> dynamic power will be wasted due to accessing this special TLB for every
> access — unless the large page TLB access can be delayed until
> after a small page TLB miss is noted [prediction might also be
> used]).
<
So put large page PTEs in the TLB !! problem solved.
>
> For L2 TLBs, the benefit is even more likely to justify the costs
> both because the capacity is larger (so utilization is more
> important) and because access frequency is lower (so dynamic power
> cost is less). Parallel access of a table walk cache with the TLB
> would have the same power waste.
>
> For huge pages, special TLBs and hash/rehash seem to be the most
> practical options. (Overlaid skewed associativity could support
> parallel access with multiple hashes, but because the bits are
> different conflicts would be possible [possibly handled by
> serialization].) Hash/rehash has some utilization advantage and
> for huge pages the modest extra latency of a later indexing may be
> covered by greater persistence of entries in a small specialized
> TLB but leaves tag bits unused for large pages and data bits
> unused if large pages do not repurpose those bits to extend the
> physical address space or to provide extra metadata [permissions,
> caching behavior, etc.] Varying the number of entries in a set
> based on the page size would be possible (to reduce storage
> "waste"), but that also introduces issues.
>
> Associativity is also a factor: with separate storage, each
> storage area has distinct ways. This seems to be one of those
> areas where specialization (efficiency) and utilization tradeoffs
> are significant.
> > < (There may be some disadvantages of such
> >> shared storage. E.g., perhaps PTPs might benefit from sharing the
> >> tag among multiple entries differently than large-page PTEs.
> >> Complexity would also be a factor.)
> > <
> > In the past I have held lines of PTPs in TWA so that nearby accesses
> > do not need L2 accesses.
> >>
> >> (I hope to post my own thoughts on hardware TLB fill sometime.)
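Mitch's table-walk accelerator (TWA) can be sketched as a tiny, fully associative LRU cache of page-table pointers. It is probed alongside the TLB so that, on a miss, the walker can skip the levels it already knows. The 8-entry size follows the Ross Colorado figure quoted above. The 4-level, 4 KiB layout is an assumption. Table addresses are not modelled; the sketch only tracks which (level, VA prefix) pairs are cached.

```python
from collections import OrderedDict

LEVELS, BITS, PAGE_SHIFT = 4, 9, 12    # assumed: 4-level table, 4 KiB pages


def ptp_key(va, level):
    """Identify the page-table page used at `level` (1..LEVELS-1) for va."""
    return (level, va >> (PAGE_SHIFT + BITS * (LEVELS - level)))


class TableWalkAccelerator:
    """Tiny fully associative LRU cache of page-table pointers (PTPs),
    probed in parallel with the TLB. Only presence is tracked here; real
    hardware would also store each table's physical address."""

    def __init__(self, entries=8):
        self.entries = entries
        self.cache = OrderedDict()

    def probe(self, va):
        """Deepest level whose table is already known (0 = root, which is
        always known from the page-table base register)."""
        for level in range(LEVELS - 1, 0, -1):
            key = ptp_key(va, level)
            if key in self.cache:
                self.cache.move_to_end(key)       # refresh LRU position
                return level
        return 0

    def fill(self, va):
        """After a walk, remember the PTP for every intermediate level."""
        for level in range(1, LEVELS):
            key = ptp_key(va, level)
            self.cache[key] = True
            self.cache.move_to_end(key)
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)    # evict least recently used


def table_reads_on_tlb_miss(twa, va):
    """Page-table reads still needed once the TWA probe has finished."""
    start = twa.probe(va)
    twa.fill(va)
    return LEVELS - start


twa = TableWalkAccelerator()
print(table_reads_on_tlb_miss(twa, 0x5000))    # 4: cold, full walk
print(table_reads_on_tlb_miss(twa, 0x6000))    # 1: same leaf table, only the PTE read
```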

Re: Misc: Design tradeoffs in virtual memory systems...

<u504p0$vefd$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32396&group=comp.arch#32396

Newsgroups: comp.arch

by: BGB - Sun, 28 May 2023 17:58 UTC

On 5/28/2023 11:26 AM, BGB wrote:
> On 5/28/2023 9:57 AM, Anton Ertl wrote:
>> cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> In article <u4rm3o$52b5$1@dont-email.me>, BGB <cr88192@gmail.com>
>>> wrote:
>>>> How many people have done much better with their hobby CPU ISA
>>>> projects?...
>>>
>>> I haven't a clue. However, most undergrads who take an
>>> architecture course do about the same.
>>
>> Implement a new architecture in an FPGA and get it to run Doom (which
>> also means retargeting a compiler for his architecture)? I very much
>> doubt it.
>>
>
> In this case, I didn't retarget an existing compiler, I wrote my own...
> Though, in retrospect, it might have been better had I retargeted an
> existing (popular) compiler.
>

Though, I did have the frontend of a compiler which I had written some
years earlier, back when I was taking college classes (though, this was
at a community college; the compiler was also my own project, unrelated
to anything in the classes).

AFAIK, most of the CS/EE classes have people implement fairly
minimalist RISC-V cores (some of the slides I had encountered implied
that they were often only doing RV32I or RV64I, and even then leaving
out some of the instructions as optional).

In the past, I "could" have "just implemented a RISC-V core", but this
would be kinda lame.

>
> Never actually went to any university or similar (who can afford
> this?...), so what little I have "from" any of those courses was
> occasionally running into power-point slides on the internet.
>

I don't know that much about that whole system though, like "Undergrads"
vs "Postdocs" vs whatever, I have no idea...

As for ports, I also have:
  Quake and GLQuake (both had some tweaks to try to make them faster);
  Heretic, Hexen, ROTT.

A partially done port of Quake 3 Arena:
  I put this on hold because, at the time, the "OS" still lacked some
  functionality; it is still on hold, mostly because I have doubts it
  would give anywhere near usable performance.

Had looked briefly at the Duke Nukem 3D engine, but didn't bother, as a
lot of the code looked terrible and I didn't want to mess with it (ROTT
was already pretty bad in some areas, Duke3D looks worse).

Had ported random misc stuff, like Dhrystone and similar.

I can't really port much that depends on Linux without Linux itself,
which in turn is somewhat dependent on GCC and the GNU environment. But
even then, it looks like it would be too much effort.

Porting over the userland would also likely be a much bigger challenge.

> Admittedly, never really worked "in the industry" either.
>
>
>> Still, the point by Scott Lurndal stands: Choices that are fine for a
>> project with the limited resources that BGB has may be suboptimal for
>> a more resource-rich project.
>>

Part of my design goal was, fairly specifically, trying to optimize for
more resource-limited cases.

As-is, I am running two cores on an XC7A200T, which is bigger than I
would have liked.

But, not really clear what size of ASIC die area the Spartan or Artix
class FPGAs are equivalent to.

But I do know that, as-is, I am looking at around 1 W of power, and an
ASIC could probably use less power than an FPGA while also running at
higher clock speeds. But, dunno...

Something like a "tape out" will cost more money than I am ever likely
to have access to, short of some bigger entity taking interest in my
project.

>> It seems that the most limited resource here is BGB himself. And it's
>> not clear to me if implementing the hardware to produce a TLB miss
>> exception and then implementing the miss handler in software really
>> takes less of that resource than implementing a hardware page table
>> walker. But he already seems to have gone for the first choice, so I
>> would stick to it unless it becomes a significant problem.
>>
>
> Probably fair enough...
>

It was partly inertia:
This is where I started, this is where it was going.

Admittedly, it took around a year or so to really hammer out most of the
edge-cases related to the interrupt handling (such as getting stuck in
states where the CPU effectively deadlocks).

Software TLB won me over with the promise of more flexibility relative
to hardware cost, and at least on an FPGA, "a few Block-RAMs" isn't a
huge cost.

It is possible that on an ASIC the page walker could be cheaper than
the equivalent SRAM; I can't say here.
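Here is a minimal software-refill sketch, for contrast with a hardware walker. The TLB hardware only detects a miss and traps; an OS handler walks whatever table format it likes and writes one entry. All names here are hypothetical and do not describe BGB's actual ISA: SoftTLB, the write method standing in for a TLB-load instruction, and the 10/10/12-bit split of a 32-bit address.

```python
class TLBMiss(Exception):
    pass


class PageFault(Exception):
    pass


class SoftTLB:
    """Hardware side: a direct-mapped TLB that can only report misses."""
    def __init__(self, sets=64):
        self.sets = sets
        self.entries = [None] * sets            # each entry: (vpn, pfn) or None

    def lookup(self, vpn):
        entry = self.entries[vpn % self.sets]
        if entry is None or entry[0] != vpn:
            raise TLBMiss(vpn)                  # trap to software
        return entry[1]

    def write(self, vpn, pfn):                  # stand-in for a "load TLB entry" op
        self.entries[vpn % self.sets] = (vpn, pfn)


def miss_handler(tlb, page_dir, vpn):
    """Software side: walk a two-level table whose format the OS chooses."""
    leaf = page_dir.get(vpn >> 10)              # upper 10 VPN bits pick a leaf table
    pfn = None if leaf is None else leaf.get(vpn & 0x3FF)
    if pfn is None:
        raise PageFault(hex(vpn << 12))         # would be passed on to the pager
    tlb.write(vpn, pfn)


def soft_translate(tlb, page_dir, va):
    vpn, offset = va >> 12, va & 0xFFF
    try:
        pfn = tlb.lookup(vpn)
    except TLBMiss:
        miss_handler(tlb, page_dir, vpn)        # take the "exception", then retry
        pfn = tlb.lookup(vpn)
    return (pfn << 12) | offset


page_dir = {0: {5: 0x42}}
tlb = SoftTLB()
print(hex(soft_translate(tlb, page_dir, 0x5123)))   # 0x42123 (miss, refill, retry)
print(hex(soft_translate(tlb, page_dir, 0x5FFF)))   # 0x42fff (hit)
```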

Had considered maybe trying to use my CPU core to run a small robot,
probably using an Arty S7-50 board as the "brain" (can sorta shove a
single-core version into the XC7S50).

Had ordered a chassis+motor+wheels for this idea (noting that the full
combo didn't cost much more than just the wheels). Slightly lame that
Amazon classified it as a "toy", but oh well.

Already have most of the other parts I could use.

Beyond the Arty, was probably going to stick a camera module and similar
on it. May also need to figure out some form of wireless communication.

Like, some form of wireless RS-232 style interface or similar would be
useful. Actually, would prefer something in the 400kbps to 1Mbps range
if possible (then I could stream the camera image).
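Here is a rough estimate of what 400 kbps to 1 Mbps buys. It assumes uncompressed 160x120 8-bit greyscale frames and 8N1 serial framing (10 line bits per data byte). Both assumptions are hypothetical, and any compression would raise the frame rate.

```python
def raw_frames_per_second(link_bps, width, height, bits_per_pixel, line_bits_per_byte=10):
    """Uncompressed frame rate over a serial link. With 8N1 framing (start bit,
    8 data bits, stop bit) each data byte costs 10 bits on the wire."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return link_bps / (bytes_per_frame * line_bits_per_byte)


# Hypothetical camera format: 160x120, 8-bit greyscale.
for bps in (400_000, 1_000_000):
    print(bps, round(raw_frames_per_second(bps, 160, 120, 8), 1))
# 400000 2.1
# 1000000 5.2
```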

Not sure if there are any off-the-shelf transceiver modules for this
sort of thing though.

Looking it up, apparently there are wireless RS-232 "Packet Radio"
transceivers, but these look much bigger/bulkier/more-expensive than I
would want for this. It doesn't need to reach much further than maybe
5 or 10 meters.

Had also noted while looking around that apparently they are selling
~2-foot-tall humanoid robots on Amazon as "particularly expensive" toys...

Not clear what their capabilities are (eg, if they are actually able to
walk around and similar? ...), not going to buy one though in any case.

....

According to Scott Lurndal <slp53@pacbell.net>:
>>Power10 became available in 2021, and Power11 (presumably) is on the
>>way; patches are already appearing in gcc.
>
>Indeed, Power has been continuously enhanced since 1991. Which,
>I believe is three decades ago. As for obsolescence, how many new
>design wins has Power had in the last decade?

I dunno, how many design wins has x86 had lately?

There are plenty of ISAs that nobody would design now, but they are
good enough that people keep using them. POWER and x86 and 360/370/z
are surely on that list. Seeing how complex ARM is getting (1000-page
architecture manual), it's also on its way.

There are a few mistakes that kill an architecture, with the #1 being
that the address space is too small, and for those of us old enough,
that it didn't have 8-bit byte addressing and twos-complement
arithmetic. Other than that you can throw hardware at it and make it
fast enough.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Misc: Design tradeoffs in virtual memory systems...

<u50ng3$12k82$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32406&group=comp.arch#32406

Newsgroups: comp.arch

by: Paul A. Clayton - Sun, 28 May 2023 23:18 UTC

On 5/28/23 1:06 PM, MitchAlsup wrote:
> On Sunday, May 28, 2023 at 8:12:42 AM UTC-5, Paul A. Clayton wrote:
>> On 5/27/23 10:28 PM, MitchAlsup wrote:
>>> On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
[snip]
>>>> Also, there is some attraction *to me* for
>>>> storing PTPs and equivalent-node large page translations in the
>>>> same structure as this reduces the disincentive for large caches
>>>> (TLBs) for large pages.
>>> <
>>> PTPs in the TLB is a waste of TLB entries--a precious resource.
> <
>> For large page TLBs, there is a problem of utilization. If no
>> large pages are used, the area (and power) is wasted. By including
>> PTPs, most of the area and static power is not wasted (though
>> dynamic power will be due to accessing this special TLB for every
>> access — unless the large page TLB access can be delayed until
>> after a small page TLB miss is noted [prediction might also be
>> used]).
> <
> So put large page PTEs in the TLB !! problem solved.

Including large pages and base pages in a single TLB has
traditionally been done using CAMs, which are relatively
expensive. (For L2, hash/rehash has been used.) In addition to the
difference in tag size and possibly data size, the indexing
methods are not consistent.
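To illustrate why mixing sizes in one set-associative TLB needs hash/rehash, here is a minimal sketch. It assumes a hypothetical `HashRehashTLB` class, a simple modulo index, and a fixed 4K-first probe order. Each page size computes its set index from its own virtual page number, so a large-page hit can cost a second probe:

```python
# Hypothetical hash/rehash TLB: one set-associative array shared by
# 4 KiB and 2 MiB translations. Each size indexes with its own VPN,
# so a lookup may need a second probe (the "rehash").
PAGE_SHIFTS = (12, 21)  # probe order: 4 KiB first, then 2 MiB


class HashRehashTLB:
    def __init__(self, num_sets=64, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # each set holds entries of (page_shift, vpn, pfn)
        self.sets = [[] for _ in range(num_sets)]

    def _index(self, vpn):
        return vpn % self.num_sets

    def insert(self, page_shift, vaddr, pfn):
        vpn = vaddr >> page_shift
        s = self.sets[self._index(vpn)]
        if len(s) >= self.ways:
            s.pop(0)  # FIFO replacement for simplicity
        s.append((page_shift, vpn, pfn))

    def lookup(self, vaddr):
        """Return (physical address, probes used), or (None, probes)."""
        probes = 0
        for shift in PAGE_SHIFTS:
            probes += 1
            vpn = vaddr >> shift
            for entry_shift, entry_vpn, pfn in self.sets[self._index(vpn)]:
                if entry_shift == shift and entry_vpn == vpn:
                    offset = vaddr & ((1 << shift) - 1)
                    return (pfn << shift) | offset, probes
        return None, probes
```

A real design might predict the size or probe in a different order. This sketch only shows why the second probe adds latency for large pages.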

"put the large page PTEs in the TLB" [with the base page PTEs]
solves the utilization issue in terms of entries but increases
latency for large pages if hash/rehash is used (which latency can
be at least partially hidden by using the extra untranslated bits
for cache tag comparison and for L2 the latency would be less
important) or uses more area from a CAM-based TLB. In either case,
the difference in entry size is not exploited.

(A small tag TLB with full-sized payload could also be used for
base size PTEs, with the missing upper bits filled in by a
constant [perhaps zero extension] or using some bits to reference
a table. Such use seems problematic, but it might be worth at
least researching. I seem to recall that x86 added an address
space extension for 4 MiB pages with 32-bit PTEs (PSE-36), though
I doubt such was used much.)
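A rough sketch of the table-reference option follows. It assumes a hypothetical entry layout: a short VPN tag paired with a 2-bit selector into a small table of upper VPN bits (`UPPER_TABLE`). Constant zero-fill corresponds to one fixed slot holding 0. All bit widths and table contents are illustrative only:

```python
# Hypothetical compressed-tag entry: a large-page-sized tag slot reused
# for a base-page translation. Only the low TAG_BITS of the VPN are
# stored; the remaining upper bits come from a small shared table.
TAG_BITS = 24
UPPER_TABLE = [0x0, 0x7FF, 0x123, 0x456]  # illustrative upper-VPN values


def compress_vpn(vpn):
    """Return (selector, short_tag), or None if the upper bits are not in the table."""
    upper = vpn >> TAG_BITS
    if upper not in UPPER_TABLE:
        return None  # cannot be stored in the short-tag structure
    return UPPER_TABLE.index(upper), vpn & ((1 << TAG_BITS) - 1)


def expand_vpn(selector, short_tag):
    """Rebuild the full VPN for tag comparison."""
    return (UPPER_TABLE[selector] << TAG_BITS) | short_tag
```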

I do not claim that having PTEs and PTPs of the same level share
storage is an obviously preferred design, but I do think it
presents a different balance of tradeoffs and should not be so
easily dismissed (I doubt you or anyone else has actually studied
this design choice). I admit that I promote it (to the extent I
do) in large part because, as far as I know, I am the first and
only person to notice this possibility. This *might* merely
indicate that the idea is not useful since negative results are
less commonly published even in academia, but I suspect that the
idea is merely not interesting enough to draw research.

(My motivation for the idea was the limited number of large
page entries supported in some designs, which introduces a
chicken-and-egg issue: providing hardware resources for a less used
feature would be wasteful, while using a feature with fewer hardware
resources is less useful. I also note that a PTP cache could be used
for compressing tags in a TLB similar to Seznec's "Don't use the page
number, but a pointer to it" but replacing 'page number' with 'x
MiB region number' and applying to a TLB and not a general memory
cache. While a version of Seznec's idea was used in Itanium 2's
"prevalidated" L1 tags (which used a one-hot bit to identify the
matching TLB entry, the one-hot format facilitating flash
invalidation on TLB entry eviction), the utility of this kind of a
design for a TLB seems doubtful.)
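A minimal sketch of Seznec-style tag compression applied to a TLB is below. It assumes a hypothetical small region table (which could be the PTP cache) holding the upper VPN bits of a fixed-size region; each TLB entry stores a region-table index instead of those bits. In the sketch, evicting a region invalidates every TLB entry that points at it. This is only loosely analogous to the one-hot flash invalidation described for Itanium 2:

```python
# Hypothetical TLB whose tags hold a pointer (index) into a small region
# table instead of the full upper virtual page number bits.
REGION_SHIFT = 9  # 512 base pages per region (2 MiB with 4 KiB pages)


class RegionPointerTLB:
    def __init__(self, num_regions=4):
        self.regions = [None] * num_regions  # region number per slot
        self.entries = {}  # (region_slot, low_vpn) -> pfn
        self.next_victim = 0

    def _region_slot(self, region, allocate):
        if region in self.regions:
            return self.regions.index(region)
        if not allocate:
            return None
        slot = self.next_victim
        self.next_victim = (slot + 1) % len(self.regions)
        # Evicting a region invalidates all entries that point at it.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != slot}
        self.regions[slot] = region
        return slot

    def insert(self, vpn, pfn):
        slot = self._region_slot(vpn >> REGION_SHIFT, allocate=True)
        self.entries[(slot, vpn & ((1 << REGION_SHIFT) - 1))] = pfn

    def lookup(self, vpn):
        slot = self._region_slot(vpn >> REGION_SHIFT, allocate=False)
        if slot is None:
            return None
        return self.entries.get((slot, vpn & ((1 << REGION_SHIFT) - 1)))
```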

It looks like Intel's Sunny Cove combines hash/rehash for the L2
TLB, fully associative (CAM) and multi-size for L1I TLBs, and
separate size-based TLBs for L1D
(https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/2 ):

TLB  Page size(s)  Entries  Associativity
L1D  4K                 64  4-way
L1D  2M                 32  4-way
L1D  1G                  8  full
L1I  4K+2M               8  full
L1I  4K+2M+1G           16  full
L2   4K+2M            1024  8-way
L2   4K+1G            1024  8-way

It is slightly interesting that the L2 is divided into two
sections, allowing three sizes with only hash/rehash (i.e., without
requiring a third probe). The L1I TLB is also divided into two
parts, with one supporting all three page sizes (and having twice
as many entries) while the other supports only the two smaller
page sizes.
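One way to read the split L2 is sketched below. It assumes (hypothetically) that each section does a 4K probe first and then a rehash probe for its own large size, with both sections probed in parallel. Under that assumption, three page sizes are covered in two probe rounds. The dictionaries stand in for the set-associative arrays, and the actual Sunny Cove probe order is not given in the source:

```python
# Hypothetical model of a two-section L2 TLB: section A holds 4K and 2M
# translations, section B holds 4K and 1G. Both sections are probed in
# parallel, so each round checks one page size per section.
SECTION_SIZES = {"A": (12, 21), "B": (12, 30)}  # page shifts in probe order


def l2_lookup(sections, vaddr):
    """sections maps name -> {(page_shift, vpn): pfn}.

    Return (physical address, rounds used) or (None, rounds)."""
    rounds = max(len(sizes) for sizes in SECTION_SIZES.values())
    for r in range(rounds):
        for name, sizes in SECTION_SIZES.items():  # parallel in hardware
            shift = sizes[r]
            pfn = sections[name].get((shift, vaddr >> shift))
            if pfn is not None:
                return (pfn << shift) | (vaddr & ((1 << shift) - 1)), r + 1
    return None, rounds
```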

Re: Misc: Design tradeoffs in virtual memory systems...

<u50p50$12upf$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=32407&group=comp.arch#32407

Newsgroups: comp.arch

by: BGB - Sun, 28 May 2023 23:46 UTC

On 5/28/2023 4:33 PM, John Levine wrote:
> According to Scott Lurndal <slp53@pacbell.net>:
>>> Power10 became available in 2021, and Power11 (presumably) is on the
>>> way; patches are already appearing in gcc.
>>
>> Indeed, Power has been continuously enhanced since 1991. Which,
>> I believe is three decades ago. As for obsolescence, how many new
>> design wins has Power had in the last decade?
>
> I dunno, how many design wins has x86 had lately?
>

I guess x86-64 was adopted for the PS4, PS5, and Xbox One, rather
than those consoles sticking with PowerPC.

Nintendo ended up mostly jumping from PowerPC to ARM.

> There are plenty of ISAs that nobody would design now, but they are
> good enough that people keep using them. POWER and x86 and 360/370/z
> are surely on that list. Seeing how complex ARM is getting (1000 page
> architecture manual) it's also on its way.
>

Pretty much.

IMO, for the most part, 64-bit ARM is a fairly sensible design.
One of its biggest drawbacks, though, is that it is not an open ISA.

There is a lot of hype for RISC-V, but I suspect it has been "oversold"
in a few areas. Its main merits are that it is an open ISA and that the
core design makes a fair bit of sense for microcontroller-class devices.

Scaling up to bigger systems, I have concerns...

Some of the proposed extensions seem like a needlessly complicated /
ad-hoc mess. And the available encoding space (and extensibility) seems
nowhere near as large as is often implied.

For example, JAL, LUI, and AUIPC ate some "not exactly small" parts of
the encoding space. Using 12-bit immediate values and displacements also
ate some encoding space, ...
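A back-of-envelope calculation of that cost is below. It is a sketch based on the standard 32-bit RISC-V layout, where bits [1:0] = 11 mark a 32-bit instruction and bits [6:2] select one of 32 major opcodes. JAL, LUI, and AUIPC each occupy an entire major opcode. An I-type operation with a 12-bit immediate occupies one funct3 slot of a major opcode:

```python
from fractions import Fraction

# Standard 32-bit RISC-V: bits [1:0] == 0b11, bits [6:2] pick one of 32
# major opcodes, leaving 25 bits per major opcode.
MAJOR_OPCODES = 32
BITS_PER_MAJOR = 25


def fraction_of_32bit_space(encodings):
    """Fraction of all 32-bit-length encodings (2**30 of them)."""
    return Fraction(encodings, MAJOR_OPCODES * 2 ** BITS_PER_MAJOR)


# JAL (J-type), LUI and AUIPC (U-type): rd + 20-bit immediate fill a
# whole major opcode each.
u_and_j = fraction_of_32bit_space(3 * 2 ** (5 + 20))

# One I-type op such as ADDI: rd + funct3 + rs1 + 12-bit immediate, i.e.
# one of 8 funct3 values within a major opcode.
one_i_type = fraction_of_32bit_space(2 ** (5 + 5 + 12))
```

So those three instructions alone take 3/32 (about 9.4%) of the 32-bit encoding space. Each 12-bit-immediate I-type operation takes another 1/256.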

Granted, one can argue that 12 bits is more than 9 bits...

...

> There are a few mistakes that kill an architecture, with the #1 being
> that the address space is too small, and for those of us old enough,
> that it didn't have 8-bit byte addressing and twos-complement
> arithmetic. Other than that you can throw hardware at it and make it
> fast enough.
>

There is at least one recent project using the 6502 (as a "PC") despite
the limitations of its 16-bit address space. The apparent idea is that
they will put 512K of RAM in the thing and then bank-switch a few of the
pages to cover the entire 512K space...
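A minimal sketch of that kind of banking follows. It assumes (hypothetically) that the 64K address space is split into four 16K windows. Each window has a bank register selecting one of 32 16K banks of the 512K RAM. The project's actual window size and layout are not stated, and real designs vary:

```python
# Hypothetical bank-switched memory map for a 16-bit CPU: four 16 KiB
# windows, each backed by a bank register that selects one of 32 16 KiB
# banks in 512 KiB of physical RAM (19-bit physical addresses).
WINDOW_BITS = 14                          # 16 KiB windows
NUM_BANKS = (512 * 1024) >> WINDOW_BITS   # 32 banks


class BankedMemory:
    def __init__(self):
        self.ram = bytearray(512 * 1024)
        self.bank_regs = [0, 1, 2, 3]  # identity mapping at reset

    def set_bank(self, window, bank):
        if not 0 <= bank < NUM_BANKS:
            raise ValueError("bank out of range")
        self.bank_regs[window] = bank

    def translate(self, addr16):
        window = addr16 >> WINDOW_BITS
        offset = addr16 & ((1 << WINDOW_BITS) - 1)
        return (self.bank_regs[window] << WINDOW_BITS) | offset

    def read(self, addr16):
        return self.ram[self.translate(addr16)]

    def write(self, addr16, value):
        self.ram[self.translate(addr16)] = value
```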

I also saw another project (by someone else) talking about trying to
implement paged virtual memory on the Z80...

Granted, the "practicality" here is a bit suspect.

On BJX2, I went with a 48-bit address space.

If that proves insufficient, it is "technically" possible to expand to a
full 96-bit virtual address space. I put this on hold for now mostly
because it seemed overkill (and using 128-bit pointers isn't free).

So the question becomes whether, once one needs more than 256TB of
virtual address space, 128-bit pointers are too much of an ask.
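For reference, a trivial sketch of the arithmetic behind the 256TB figure and the pointer-size cost. The one-million-pointer count is an arbitrary example, not from the post:

```python
TIB = 2 ** 40


def address_space_tib(va_bits):
    """Size of a va_bits-wide byte-addressed space, in TiB."""
    return 2 ** va_bits // TIB


def pointer_bytes(num_pointers, pointer_bits):
    """Memory consumed by num_pointers pointers of the given width."""
    return num_pointers * pointer_bits // 8
```

With these helpers, 48 bits gives 256 TiB. A million 128-bit pointers occupy 16 MB versus 4 MB at 32 bits, which shows the overhead for programs with small working sets.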

But it is kind of a waste to use 128-bit pointers when only dealing
with megabytes of memory (and when 32-bit pointers would technically
have been sufficient...).

Re: Misc: Design tradeoffs in virtual memory systems...

<u50r01$13c6e$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=32408&group=comp.arch#32408


From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sun, 28 May 2023 19:18:02 -0500
Message-ID: <u50r01$13c6e$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com>
<66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
<SrqcM.2167850$MVg8.198396@fx12.iad>
<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com>
<u4tn2h$h7q5$1@dont-email.me>
<c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com>
<44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>
<u4uakj$jmlc$1@dont-email.me>
<6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com>
<u4vjtf$sn61$1@dont-email.me>
<6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com>
<u50ng3$12k82$1@dont-email.me>
In-Reply-To: <u50ng3$12k82$1@dont-email.me>

by: BGB - Mon, 29 May 2023 00:18 UTC

On 5/28/2023 6:18 PM, Paul A. Clayton wrote:
> On 5/28/23 1:06 PM, MitchAlsup wrote:
>> On Sunday, May 28, 2023 at 8:12:42 AM UTC-5, Paul A. Clayton wrote:
>>> On 5/27/23 10:28 PM, MitchAlsup wrote:
>>>> On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
> [snip]
>>>>> Also, there is some attraction *to me* for
>>>>> storing PTPs and equivalent-node large page translations in the
>>>>> same structure as this reduces the disincentive for large caches
>>>>> (TLBs) for large pages.
>>>> <
>>>> PTPs in the TLB is a waste of TLB entries--a precious resource.
>> <
>>> For large page TLBs, there is a problem of utilization. If no
>>> large pages are used, the area (and power) is wasted. By including
>>> PTPs, most of the area and static power is not wasted (though
>>> dynamic power will be due to accessing this special TLB for every
>>> access — unless the large page TLB access can be delayed until
>>> after a small page TLB miss is noted [prediction might also be
>>> used]).
>> <
>> So put large page PTEs in the TLB !! problem solved.
>
> Including large pages and base pages in a single TLB has
> traditionally been done using CAMs, which are relatively
> expensive. (For L2, hash/rehash has been used.) In addition to the
> difference in tag size and possibly data size, the indexing
> methods are not consistent.
>

Yeah...

Making much use of large pages (within a single TLB) requires a highly
associative TLB, which is very expensive.

So, one can have a small fully-associative TLB which "eats all the
LUTs", or a significantly larger set-associative TLB that eats a few
Block-RAMs. I went for the Block-RAMs.

But, this has the limitation that it can't make effective use of large
pages.

So, running the MMU in 4K page mode while using 16K pages effectively
splits each page over 4 TLBEs (TLB entries).
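A minimal sketch of that split is below. It assumes a hypothetical helper that, with the MMU in 4K mode, produces one 4K TLB entry per 4K sub-page of a 16K OS page. The 16K page is assumed to be 16K-aligned in both virtual and physical space, so the four entries have consecutive virtual and physical page numbers:

```python
# Hypothetical: expand one 16 KiB OS page into the 4 KiB TLB entries an
# MMU running in 4 KiB mode would need. Returns (vpn_4k, pfn_4k) pairs.
def split_page(vaddr_16k, paddr_16k, page_size=16 * 1024, tlb_page=4 * 1024):
    if vaddr_16k % page_size or paddr_16k % page_size:
        raise ValueError("16 KiB page must be 16 KiB aligned")
    return [
        ((vaddr_16k + i) // tlb_page, (paddr_16k + i) // tlb_page)
        for i in range(0, page_size, tlb_page)
    ]
```

Whether a miss handler loads all four entries at once or only the one that missed is a design choice the post does not specify.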

> "put the large page PTEs in the TLB" [with the base page PTEs]
> solves the utilization issue in terms of entries but increases
> latency for large pages if hash/rehash is used (which latency can
> be at least partially hidden by using the extra untranslated bits
> for cache tag comparison and for L2 the latency would be less
> important) or uses more area from a CAM-based TLB. In either case,
> the difference in entry size is not exploited.
>
> (A small tag TLB with full-sized payload could also be used for
> base size PTEs with the missing upper bits filled in by a
> constant [perhaps zero extension] or using some bits to reference
> a table. Such use seems problematic, but it might be worth at
> least researching. I seem to recall that x86 added an address
> space extension for 4 MiB pages with 32-bit PTEs, though I doubt
> such was used much.)
>

One could, say, have multiple TLBs internally for different page sizes.

Say, for example:
TLB0: 4K/16K/64K pages
TLB1: 1MB pages (or some other large size)

So, if each is 256x 4-way (1024 entries), then:
TLB0 covers 16MB (with 16K pages)
TLB1 covers 1GB
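These coverage figures follow from sets × ways × page size. The small sketch below reads "256x 4-way" as 256 sets of 4 ways and assumes 16K pages in TLB0:

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(sets, ways, page_size):
    """Bytes of address space mapped when every entry is valid."""
    return sets * ways * page_size
```

With 4K pages TLB0 would cover only 4MB, and with 64K pages it would cover 64MB.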

A big limiting factor, though, is how likely it is in virtual-memory
contexts that one will have large areas where both the virtual and
physical addresses are contiguous.
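A concrete version of that constraint: a region maps to one large page only if it is contiguous in both virtual and physical space. Both start addresses must also be aligned to the large-page size. The minimal check below assumes a hypothetical list of the physical addresses backing each 4K base page in the region:

```python
# Hypothetical check: can a run of 4 KiB base pages be promoted to a
# single large page? base_paddrs[i] is the physical address backing the
# i-th base page starting at vaddr.
def can_use_large_page(vaddr, base_paddrs, large_size, base_size=4096):
    if len(base_paddrs) * base_size != large_size:
        return False  # must cover exactly one large page
    if vaddr % large_size or base_paddrs[0] % large_size:
        return False  # both sides must be large-page aligned
    # physically contiguous: each base page follows the previous one
    return all(
        base_paddrs[i] == base_paddrs[0] + i * base_size
        for i in range(len(base_paddrs))
    )
```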

Large pages seem likely to be relatively niche in this case, unless the
OS is specifically optimized to use them in certain contexts.

> I do not claim that having PTEs and PTPs of the same level share
> storage is an obviously preferred design, but I do think it
> presents a different balance of tradeoffs and should not be so
> easily dismissed (I doubt you or anyone else has actually studied
> this design choice). I admit that I promote it (to the extent I do) in
> large part because, as far as I know, I am the first and
> only person to notice this possibility. This *might* merely
> indicate that the idea is not useful since negative results are
> less commonly published even in academia, but I suspect that the
> idea is merely not interesting enough to draw research.
>
> (My motivation for the idea was from the limited number of large
> page entries supported in some designs, introducing a chicken-and-
> egg issue — providing hardware resources for a less used feature
> would be wasteful, using a feature with less hardware resources is
> less useful. I also note that a PTP cache could be used for
> compressing tags in a TLB similar to Seznec's "Don't use the page
> number, but a pointer to it" but replacing 'page number' with 'x
> MiB region number' and applying to a TLB and not a general memory
> cache. While a version of Seznec's idea was used in Itanium 2's
> "prevalidated" L1 tags (which used a one hot bit to match a TLB
> number, the one hot bit format facilitating flash invalidation on
> TLB entry eviction), the utility of this kind of a design for a
> TLB seems doubtful.)
>
> It looks like Intel's Sunny Cove combines hash/rehash for the L2 TLB,
> fully associative (CAM) and multi-size for L1I TLBs, and separate
> size-based TLBs for L1D
> (https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/2 ):
>
>         Page      # of
>         Size(s) Entries Associativity
> L1D     4K         64         4-way
> L1D     2M         32         4-way
> L1D     1G          8         full
> L1I     4K+2M       8         full
> L1I     4K+2M+1G   16         full
> L2     4K+2M    1024         8-way
> L2     4K+1G     1024         8-way
>
> It is slightly interesting that the L2 is divided into two sections,
> allowing three sizes with only hash/rehash (i.e., not requiring a third
> probe). The L1I TLB is also divided into two parts with one supporting
> all three page sizes (and having twice as many entries) while the other
> only supports the two smaller page sizes.

OK.

Those patterns seem to imply that the tradeoffs for ASICs are not *that*
much different from those of FPGAs in this area.

Re: Misc: Design tradeoffs in virtual memory systems...

<b3ed76d4-9878-403f-aa5f-bec754ec3b62n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32410&group=comp.arch#32410


Newsgroups: comp.arch
Date: Mon, 29 May 2023 02:12:25 -0700 (PDT)
In-Reply-To: <u50r01$13c6e$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com> <SrqcM.2167850$MVg8.198396@fx12.iad>
<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com> <u4tn2h$h7q5$1@dont-email.me>
<c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com> <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>
<u4uakj$jmlc$1@dont-email.me> <6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com>
<u4vjtf$sn61$1@dont-email.me> <6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com>
<u50ng3$12k82$1@dont-email.me> <u50r01$13c6e$1@dont-email.me>
Message-ID: <b3ed76d4-9878-403f-aa5f-bec754ec3b62n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: robfi680@gmail.com (robf...@gmail.com)

by: robf...@gmail.com - Mon, 29 May 2023 09:12 UTC
