
Re: Misc: Design tradeoffs in virtual memory systems...

<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32352&group=comp.arch#32352

Newsgroups: comp.arch
Date: Fri, 26 May 2023 18:49:43 -0700 (PDT)
In-Reply-To: <u4rm3o$52b5$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
Message-ID: <bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
 by: MitchAlsup - Sat, 27 May 2023 01:49 UTC

On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
> On 5/26/2023 4:15 PM, Dan Cross wrote:
> > In article <66610ab9-0c63-4b8e...@googlegroups.com>,
> > MitchAlsup <Mitch...@aol.com> wrote:
> >> On Friday, May 26, 2023 at 2:59:13 PM UTC-5, Dan Cross wrote:
> >>> Nor does it imply that a hardware page-table walker is bad.
> >>
> >> It has always been my argument that HW tableWalkers are best.
> >
> > I concur. OP does not, but OP doesn't seem to be talking from a
> > particularly knowledgeable position.
> >
> If I had no idea what I was doing, I would not likely have gotten as far
> as I have...
>
> But, admittedly, some amount of what information I had came from
> quick-and-dirty web-searches and gathering information from things like
> CS/EE PowerPoint slides and similar.
<
Quick and dirty web searches often portray x86 as the best architecture
that could ever be invented, almost always emulated, and never defamed.
>
> Some amount of reading PDFs and similar as well.
<
> >>> The OP seems to think so, but has yet to provide a particularly
> >>> compelling argument beyond some measurements from very
> >>> unrepresentative workloads running on a hobby ISA on an FPGA
> >>
> >> One must rate BGB's architecture in the less-than-even-academic
> >> category; far from industrial quality.
> >
> > Agreed.
> >
> How many people have done much better with their hobby CPU ISA projects?....
<
I will give you great credit for pulling off what you have.
Where I tend to disagree is when you state what you have done as
if it is superlative. I am guilty of the same.
<
>
> >> But we enjoy watching him stumble across industrial problems
> >> making the same mistakes we made 40 years ago.
> >
> > It seems like many of those could be avoided if OP were a bit
> > more open-minded and, dare I say it, self-aware.
> >
> ?...
>
> I am aware of my own existence, and am able to recognize my own
> reflection in a mirror, etc.
<
Cuttlefish are the boundary line on self recognition.
>
> I am not always the best in "intuitive" contexts.
> Nor necessarily at noticing or thinking about "obvious" things.
>
> Nor, particularly skilled with "top down" thinking.
>
Throughout my career, I have been more successful with middle-out
thinking and designing (working from both ends towards the middle) than with
a) just throw it together and see what you can make work, or b)
top down, minutiae be damned, as ways of addressing the problems.
<
I find it interesting that the original 8080 was (a), as was Unix.
CDC 6600 has a middle-out feel to it.
System 360 has a top-down feel to it.
MMX through SSE3,4,5 has a throw-it-together feel.
x86-64 has a middle-out feel to it.
>
> I am not entirely sure how to describe my thought processes, or how they
> compare to others.
<
If I could accurately portray my thought processes to a psychiatrist,
they would probably lock me up..............
<
I often tell this joke to people like us:
<
Sane people think that there is a big difference between being sane and insane.
<
< longish pause >
<
We know otherwise.
>
> More, it is like mentally stringing random thoughts and ideas together,
> and seeing which (if any) go in seemingly interesting directions.
>
> ...

Re: Misc: Design tradeoffs in virtual memory systems...

<6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32353&group=comp.arch#32353

Newsgroups: comp.arch
Date: Fri, 26 May 2023 19:09:42 -0700 (PDT)
In-Reply-To: <u4rk20$4qbn$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me>
Message-ID: <6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
 by: MitchAlsup - Sat, 27 May 2023 02:09 UTC

On Friday, May 26, 2023 at 7:50:36 PM UTC-5, BGB wrote:
> On 5/26/2023 5:17 PM, MitchAlsup wrote:
> > On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
> >> On 5/26/2023 3:31 PM, MitchAlsup wrote:
> >
> >>>>> My 66000 is not attackable by those means--and probably can
> >>>>> avoid needing ASLR although there is no reason not to use it as
> >>>>> it slows nothing down.
> >>> <
> >>>> I had assumed ASLR as a "standard line of defense".
> >>> <
> >>> If/when the unprivileged cannot access super-address-space no matter
> >>> the bit pattern created at AGEN, ASLR is not needed, as there is no way
> >>> the user can access memory not mapped by his page tables.
> > <
> >> ASLR can help protect against things like userspace code doing
> >> buffer-overflow exploits against system calls or intentionally mangled
> >> data structures (along with things like marking stack and heap memory
> >> and similar as non-executable, ...).
> > <
> > What if the control-flow information is not accessible to the LDs and STs
> > of an ISA ?? That is, you can buffer overflow all you want, but when you
> > execute RET you end up back at who called that subroutine ???
> > <
> > My 66000 architecture has 2 stack pointers, one for the data stack and
> > one for the control flow and protected registers stack. The one for the
> > protected register stack is not accessible to the program except through
> > ENTER and EXIT and RET instructions and the pages are marked RWE = 000
> > and HW verifies that those pages are so marked (at least at user privilege).
> > <
> > So, you can damage as much data memory as you like, but return from a
> > subroutine will end up at the instruction after the call AND all of the
> > preserved registers have their old values.
> Yeah. Only one stack in my case.
>
> There are stack canaries though, which can detect if a buffer overflow
> had overwritten the canary value.
>
> Had at one point considered a feature to hash the state of all of the
> preserved registers, and then verify that everything was intact. No real
> practical way to do this check in-hardware though.
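
The contrast between a single stack guarded by a canary and a separate control-flow stack can be sketched with a toy model. This is only an illustration under stated assumptions: Python lists stand in for memory, and the frame layout, `CANARY` value, and function names are hypothetical rather than anything taken from My 66000 or BJX2.

```python
# Toy model only: plain Python lists stand in for stack memory.
# This is not My 66000 or BJX2 behavior, just the shape of the argument.

CANARY = 0xC0FFEE  # hypothetical canary value


def unchecked_copy(dest, payload):
    """Copy payload into dest without a length check on the buffer itself."""
    for i, word in enumerate(payload[:len(dest)]):
        dest[i] = word


def call_single_stack(payload, buf_len=4, ret_addr=0x1000):
    """One stack: the frame holds buffer, then canary, then return address."""
    frame = [0] * buf_len + [CANARY, ret_addr]
    unchecked_copy(frame, payload)
    canary_intact = frame[buf_len] == CANARY
    return canary_intact, frame[buf_len + 1]


def call_split_stack(payload, buf_len=4, ret_addr=0x1000):
    """Two stacks: stores reach data memory only, never the control stack."""
    control_stack = [ret_addr]           # touched only by call and return
    data_memory = [0] * (buf_len + 2)    # buffer plus neighboring data
    unchecked_copy(data_memory, payload)
    return control_stack.pop()


if __name__ == "__main__":
    overflow = [0x41] * 5 + [0xBAD]
    print(call_single_stack(overflow))
    print(hex(call_split_stack(overflow)))
```

In this model the canary only reports the overflow when the overwritten word differs from `CANARY`, while the split layout returns to the caller no matter what lands in data memory.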
> >>
> >> Partly as buffer-overflow and many forms of "confused deputy" attacks
> >> are not stopped by conventional memory-access protections, but can be
> >> made ineffective by using ASLR.
> >>
> > Look, I am NOT arguing that ASLR is bad, just that if the ISA and MMU
> > is properly defined ASLR brings nothing MORE to the table. I am arguing
> > that the need for ASLR is simply indicative of bad architecture.
> OK.
>
> Most systems use it at least, and it is reasonably cheap.
<
Most systems are based on X86 or ARM--both of which have these kinds
of problems.
>
> In an ideal world, maybe it would be unnecessary, but as I see it, it is
> likely well worth its cost on this front.
> >>
> >> Similar goes for the "compiler shuffles all the functions on each
> >> rebuild", etc.
> > <
> > Oh, and BTW, the user privilege is not allowed to write GOT and My 66000
> > does not need a PLT, either. Cutting off even more attack vectors.
<
> OK.
>
> I am mostly using direct branches for local calls, and had intended
> Abs48 branches for DLL imports (but, with the drawback that Abs48
> branches can't encode Inter-ISA branches).
>
> Though, it is possible I could consider also allowing a Jumbo-Load +
> Register Branch sequence for DLL imports, which is capable of InterISA,
> and/or always leave at least 128 bits.
<
What I did was to invent a LD IP,[address] instruction. This transfers control
to the address in the accessed location and has the side effect of delivering
the return address to R0.
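
As a rough sketch of the semantics just described, the LD IP form can be modeled as one step of a toy interpreter. The register dictionary, the `INSN_BYTES` constant, and the slot addresses are assumptions made for illustration; the post does not give encodings or instruction sizes.

```python
INSN_BYTES = 4  # assumed size of the LD IP instruction itself


def ld_ip(regs, memory, slot_addr):
    """Model of 'LD IP,[slot_addr]': jump via memory, return address into R0."""
    new_regs = dict(regs)
    new_regs["R0"] = regs["IP"] + INSN_BYTES  # address of the next instruction
    new_regs["IP"] = memory[slot_addr]        # target fetched from the slot
    return new_regs


if __name__ == "__main__":
    import_slots = {0x8000: 0x40000}          # hypothetical import-table slot
    state = ld_ip({"IP": 0x1000, "R0": 0}, import_slots, 0x8000)
    print({name: hex(value) for name, value in state.items()})
```

If the slot lives in memory the caller cannot write, the call target cannot be redirected by ordinary stores, which seems to be the point of pairing this with a non-writable GOT.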
>
> It seems reasonable that class VTables and similar could be put into
> read-only pages.
>
Once the method call tables have been assembled--those tables should
have write permission removed.
>
> Still leaves a concern though if a program could compose "malicious COM
> objects" or similar and then put them somewhere where they are not
> supposed to go.
>
> Though, this is an area where (in theory) keyring/ACL checks could help,
> since (ideally) if neither party can execute each others' ".text"
> sections, then one can stop programs from sneaking bad COM objects into
> OS APIs.
<
The first problem is that early PTE structures granted read and execute permission
as if they were the same. Some things in memory one wants you only to be able to
read (.rodata for example), but you would never want to transfer control there;
similarly, one should not be able to read code (.text). Mixing these two up
created a whole slew of problems.
<
The second problem is that there are more than 2 layers. You want a debugger
to be able to read code, the stack walker sometimes needs to read code, the
dynamic linker needs to be able to write what otherwise smells like .rodata--
while the application is not allowed to manipulate any of this, but of course
the operating system IS.
<
In the past we granted excess permission to smooth out the bumps, and
now we have college courses teaching students how to break existing
architectures, implementations, languages, networks,.....
<
We got it wrong up front, and now you are having to deal with the problem.
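
The two problems above (read and execute lumped together, and more than two kinds of accessor) can be made concrete with a small permission table. This is a minimal sketch: the accessor names follow the post, but the specific bit assignments in `RIGHTS` are illustrative guesses, not any real architecture's PTE format.

```python
R, W, X = 0b100, 0b010, 0b001  # independent permission bits

# Hypothetical rights per (section, accessor); the roles come from the post,
# the exact bit assignments are illustrative.
RIGHTS = {
    (".text", "application"): X,
    (".text", "debugger"): R,
    (".text", "os"): R | W | X,
    (".rodata", "application"): R,
    (".rodata", "dynamic_linker"): R | W,
    (".rodata", "os"): R | W,
}


def allowed(section, accessor, access):
    """True if every requested bit is granted to this accessor."""
    granted = RIGHTS.get((section, accessor), 0)
    return (granted & access) == access


def allowed_merged_rx(section, accessor, access):
    """Older style: readable and executable are treated as one permission."""
    granted = RIGHTS.get((section, accessor), 0)
    if granted & (R | X):
        granted |= R | X
    return (granted & access) == access


if __name__ == "__main__":
    print(allowed(".rodata", "application", X))            # separate bits
    print(allowed_merged_rx(".rodata", "application", X))  # merged R and X
```

With separate bits the application can execute `.text` without reading it and read `.rodata` without jumping into it; the merged variant grants both in each case.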
>
> Well, that or run lint-checks on any user-supplied objects and
> disallow ones with VTables or method pointers into writable memory. But,
> this requires the APIs to be "well designed", which is possibly asking a
> lot.
> >>
> >> In my case, there is a hardware RNG that can keep its random seed state
> >> on the SDcard. A "better" option would be some sort of NVRAM, but (much
> >> like a real-time clock), pretty much no normal FPGA boards have this.
> >>
> > Look, if you NEED ASLR to have a modicum of safety and protection,
> > GO FOR IT.
> > <
> > My 66000 has no such need.
<
> It isn't strictly needed in an architectural sense, but I don't trust
> security without it.
>
> Like, seeing how easy x86 systems have been to hack due to these sorts
> of issues (and the epic failure of "just write code that is not weak
> against buffer overflow").
>
My guess is that soon someone will find a hole in PCIe and be able to
attack pretty much ANY system (because PCIe is the I/O hub of every
system and thing). And just wait till you see what the CXL addition to PCIe
will bring to the attackers' game.
>
> Well, and the pros/cons that trying to invoke buffer-overflow exploits
> over USB was one of the major strategies for "jail breaking" or
> "rooting" cell-phones.
>
>
> Well, and as-is, TestKern is still kinda pathetic on this front.
> >>
> >> Would "almost" be nice if capabilities could be supported as well,
> >> except that there are some serious drawbacks with capabilities as well
> >> (to make them "actually effective" adds drawbacks, otherwise one could
> >> sidestep them, which would severely limit their effectiveness).
> > <
> > One way to think about My 66000 is that the user address space is a
> > capability, and that the user does not have a capability to GuestOS
> > address space, GuestOS does not have a capability to HyperVisor
> > address space,.....
> > <
> > It does not matter what bit pattern AGEN spits out, you cannot access
> > the greater privilege level's address spaces. Not from ISA, not from TLB,
> > not from Cache lines, not from watching a high precision timer.
> > <
> > Computer architecture is hard enough, there is no need to add even more
> > fuzz across a myriad of problem-spaces when you can simply design them
> > out before hand.
<
> Hmm...
>
>
> Hard to keep things both "cheap" and "won't interfere with normal C
> coding practices".
<
Yes, modern languages are not really capable of dealing with capabilities.
Anything not a huge-flat address space is problematic with inter process
shared memory.
> >>
> >> There is a feature for bounds-checked pointers in BJX2 (encoding the
> >> bounds in the tag bits), but it falls well short of what would be needed
> >> for an actual capability architecture.
> > <
> > I access bounds checking via the CMP instruction.
<
> I have the LEAT and BNDCHK instructions.
>
> Where:
> LEAT.x (Rm, Ri), Rn
> Behaves like a LEA, but also adjusts the bounds.
> BNDCHK.x Rm, Rn
> Will raise a fault if the access is out-of-bounds.
>
> Whereas, the normal LEA.x will zero the tag bits.
>
> For the XLEA.x and XMOV.x instructions, bound-adjustment and checking
> are the default behaviors (if bounds-checking is enabled in the ISA).
>
> But, unlike a capability machine, nothing prevents a program from
> twiddling the tag bits. A true capability machine would require a way to
> tag the registers and memory to prevent the program from twiddling these
> bits.
<
That twiddling is not inexpensive, and preventing all misuse is like understanding
the Gordian knot.
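
The weakness under discussion (bounds kept in pointer tag bits that the program itself can rewrite) can be shown with a small model. Everything here is an assumption made for illustration: the 48-bit address plus 16-bit "bytes remaining" tag is a hypothetical encoding, not the BJX2 format, and `leat` and `bndchk` only loosely mimic the instructions named in the quoted text.

```python
ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_MAX = 0xFFFF  # hypothetical 16-bit tag holding "bytes remaining"


class BoundsFault(Exception):
    """Raised where the modeled hardware would fault."""


def make_ptr(addr, length):
    """Pack an address and a remaining-length tag into one 64-bit value."""
    if not 0 <= length <= TAG_MAX:
        raise ValueError("length does not fit in the tag")
    return (length << ADDR_BITS) | (addr & ADDR_MASK)


def leat(ptr, offset):
    """LEA-like add that also shrinks the remaining-length tag."""
    remaining = ptr >> ADDR_BITS
    if not 0 <= offset <= remaining:
        raise BoundsFault("offset leaves the object")
    return make_ptr((ptr & ADDR_MASK) + offset, remaining - offset)


def bndchk(ptr, access_bytes):
    """Fault if the access would run past the tagged bound."""
    if access_bytes > (ptr >> ADDR_BITS):
        raise BoundsFault("access out of bounds")


def forge_tag(ptr):
    """Ordinary integer arithmetic is enough to widen the bound."""
    return ptr | (TAG_MAX << ADDR_BITS)


if __name__ == "__main__":
    q = leat(make_ptr(0x1000, 16), 12)
    bndchk(q, 4)             # within the 4 bytes that remain
    bndchk(forge_tag(q), 8)  # passes only because the tag was rewritten
```

Because `forge_tag` needs nothing beyond an OR, the check is advisory in this model; a real capability scheme would have to keep such tags out of reach of normal data operations.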
>
> But, at the moment one is like "well, we can assign 2 bits to every
> 128-bit pair to separate pointers from other data", a massive crap-storm
> is unleashed.
<
But people want and need every bit in every register and container.
If you want tags, you put them elsewhere.
>
>
> Sadly, not enough bits to make them "not suck" or to encode both bounds
> and an element type.
> >>>>>>
> >>>>> <snip>
> >>>>>>
> >>>>>> I guess one possible way would be to organize the ACL table as a
> >>>>>> page-table like structure, say:
> >>>>>> (31:16): KRR_ID
> >>>>>> (15: 0): ACLID
> >>>>> <
> >>>>> In principle, you could have a different page hash table in memory for each
> >>>>> ASID.
> >>> <
> >>>> To clarify, ASID and ACLID are two different features:
> >>>> ASID is used for separating address spaces;
> >>>> ACLID is for per-page per-task access-rights checking.
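
The page-table-like ACL layout suggested above, with KRR_ID in bits 31:16 and ACLID in bits 15:0, might be sketched as a two-level lookup. The function names, the dictionary representation, and the use of a rights string are hypothetical; only the bit split comes from the quoted text.

```python
def acl_key(krr_id, acl_id):
    """Compose the 32-bit key: KRR_ID in bits 31:16, ACLID in bits 15:0."""
    return ((krr_id & 0xFFFF) << 16) | (acl_id & 0xFFFF)


def acl_lookup(table, key):
    """Two-level walk: bits 31:16 pick a sub-table, bits 15:0 pick an entry."""
    level2 = table.get(key >> 16)
    if level2 is None:
        return ""  # no rights recorded for this keyring
    return level2.get(key & 0xFFFF, "")


if __name__ == "__main__":
    table = {0x0012: {0x0034: "rw"}}  # hypothetical contents
    print(acl_lookup(table, acl_key(0x12, 0x34)))
```

A missing entry at either level yields no rights here, which is one plausible default rather than something stated in the thread.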
> >>> <
> >>> Most people place ACLID in the PTE and some in the hierarchy of
> >>> PTPs and PTE.
> >>>
> >> There are some protection flags in the PTEs as well, but they only cover
> >> the traditional User/Supervisor and "Global RWX" state.
> > <
> > Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
> > their applications (the normal way) by having the user program share
> > page tables and "Optimizing" the TLB using the G-bit. Now that you have
> > 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
> > the TLB mappings, and causing a host of other problems.
> > <
> > Now, consider an ASID system where the GuestOSs use different ASIDs
> > and now all those G-bit problems disappear! The TLB remains unconfused,
> > and the memory hierarchy knows what to do. HyperVisors do this to the
> > old MMU models, new architectures should solve these problems without
> > creating new ones.
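
The G-bit hazard with two guests can be illustrated with a toy TLB. This is a sketch under assumptions: the entry fields and both lookup functions are hypothetical, and the `asid` field is used here simply to record which guest installed a translation.

```python
def lookup_with_global_bit(tlb, vpn, current_asid):
    """Legacy-style match: a global entry hits regardless of who is running."""
    for entry in tlb:
        if entry["vpn"] == vpn and (entry["global"] or entry["asid"] == current_asid):
            return entry["ppn"]
    return None


def lookup_asid_only(tlb, vpn, current_asid):
    """ASID-tagged match: an entry hits only for the context that installed it."""
    for entry in tlb:
        if entry["vpn"] == vpn and entry["asid"] == current_asid:
            return entry["ppn"]
    return None


if __name__ == "__main__":
    # Guest 1 installed a "global" mapping for virtual page 0x10.
    tlb = [{"vpn": 0x10, "ppn": 0xA0, "asid": 1, "global": True}]
    print(lookup_with_global_bit(tlb, 0x10, current_asid=2))  # guest 2 hits it
    print(lookup_asid_only(tlb, 0x10, current_asid=2))        # guest 2 misses
```

In the first lookup guest 2 receives guest 1's translation; in the second the same entry is invisible to it, so no flush or special casing is needed to keep the guests apart.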
<
> OK.
>
>
> I guess it came up elsewhere that apparently hypervisors differ more
> from emulators than initially thought.
<
And evolving away at a rapid pace.
>


Re: Misc: Design tradeoffs in virtual memory systems...

<e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32354&group=comp.arch#32354

Newsgroups: comp.arch
Date: Fri, 26 May 2023 21:00:26 -0700 (PDT)
In-Reply-To: <6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me> <6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
Message-ID: <e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: robfi680@gmail.com (robf...@gmail.com)
 by: robf...@gmail.com - Sat, 27 May 2023 04:00 UTC

On Friday, May 26, 2023 at 10:09:44 PM UTC-4, MitchAlsup wrote:
> On Friday, May 26, 2023 at 7:50:36 PM UTC-5, BGB wrote:
> > On 5/26/2023 5:17 PM, MitchAlsup wrote:
> > > On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
> > >> On 5/26/2023 3:31 PM, MitchAlsup wrote:
> > >
> > >>>>> My 66000 is not attackable by those means--and probably can
> > >>>>> avoid needing ASLR although there is no reason not to use it as
> > >>>>> it slows nothing down.
> > >>> <
> > >>>> I had assumed ASLR as a "standard line of defense".
> > >>> <
> > >>> If/when the unprivileged cannot access super-address-space no matter
> > >>> the bit pattern created at AGEN, ASLR is not needed, as there is no way
> > >>> the user can access memory not mapped by his page tables.
> > > <
> > >> ASLR can help protect against things like userspace code doing
> > >> buffer-overflow exploits against system calls or intentionally mangled
> > >> data structures (along with things like marking stack and heap memory
> > >> and similar as non-executable, ...).
> > > <
> > > What if the control-flow information is not accessible to the LDs and STs
> > > of an ISA ?? That is, you can buffer overflow all you want, but when you
> > > execute RET you end up back at who called that subroutine ???
> > > <
> > > My 66000 architecture has 2 stack pointers, one for the data stack and
> > > one for the control flow and protected registers stack. The one for the
> > > protected register stack is not accessible to the program except through
> > > ENTER and EXIT and RET instructions and the pages are marked RWE = 000
> > > and HW verifies that those pages are so marked (at least at user privilege).
> > > <
> > > So, you can damage as much data memory as you like, but return from a
> > > subroutine will end up at the instruction after the call AND all of the
> > > preserved registers have their old values.
> > Yeah. Only one stack in my case.
> >
> > There are stack canaries though, which can detect if a buffer overflow
> > had overwritten the canary value.
> >
> > Had at one point considered a feature to hash the state of all of the
> > preserved registers, and then verify that everything was intact. No real
> > practical way to do this check in-hardware though.
> > >>
> > >> Partly as buffer-overflow and many forms of "confused deputy" attacks
> > >> are not stopped by conventional memory-access protections, but can be
> > >> made ineffective by using ASLR.
> > >>
> > > Look, I am NOT arguing that ASLR is bad, just that if the ISA and MMU
> > > is properly defined ASLR brings nothing MORE to the table. I am arguing
> > > that the need for ASLR is simply indicative of bad architecture.
> > OK.
> >
> > Most systems use it at least, and it is reasonably cheap.
> <
> Most systems are based on X86 or ARM--both of which have these kinds
> of problems.
> >
> > In an ideal world, maybe it would be unnecessary, but as I see it, it is
> > likely well worth its cost on this front.
> > >>
> > >> Similar goes for the "compiler shuffles all the functions on each
> > >> rebuild", etc.
> > > <
> > > Oh, and BTW, the user privilege is not allowed to write GOT and My 66000
> > > does not need a PLT, either. Cutting off even more attack vectors.
> <
> > OK.
> >
> > I am mostly using direct branches for local calls, and had intended
> > Abs48 branches for DLL imports (but, with the drawback that Abs48
> > branches can't encode Inter-ISA branches).
> >
> > Though, it is possible I could consider also allowing a Jumbo-Load +
> > Register Branch sequence for DLL imports, which is capable of InterISA,
> > and/or always leave at least 128 bits.
> <
> What I did was to invent a LD IP,[address] instruction. This transfers control
> to the address in the accessed location and has the side effect of delivering
> the return address to R0.
> >
> > It seems reasonable that class VTable's and similar could be put into
> > read-only pages.
> >
> Once the method call tables have been assembled--those tables should
> have write permission removed.
> >
> > Still leaves a concern though if a program could compose "malicious COM
> > objects" or similar and then put them somewhere where they are not
> > supposed to go.
> >
> > Though, this is an area where (in theory) keyring/ACL checks could help,
> > since (ideally) if neither party can execute each others' ".text"
> > sections, then one can stop programs from sneaking bad COM objects into
> > OS APIs.
> <
> the first problem is that early PTE structures granted read and execute permission
> as if it were the same. There are reasons one wants you only to be able to read
> (.rodata for example) some are in memory, but you would never want to transfer
> control there; similarly, one should not be able to read code (.text). Mixing these
> two up created a whole slew of problems..............
> <
> the second problem is that there are more than 2 layers. You want a debugger
> to be able to read code, the stack walker sometimes needs to read code, the
> dynamic linker needs to be able to write what otherwise smells like .rodata--
> while the application is not allowed to manipulate any of this, but of course
> the operating system IS.
> <
> In the past we granted excess permission to smooth out the bumps, and
> now we have college courses teaching students how to break existing
> architectures, implementations, languages, networks,.....
> <
> We got it wrong up front, and now you are having to deal with the problem..
> >
> > Well, that or run lint-checks on any user-supplied objects and
> > disallow ones with VTables or method pointers into writable memory. But,
> > this requires the APIs to be "well designed", which is possibly asking a
> > lot.
> > >>
> > >> In my case, there is a hardware RNG that can keep its random seed state
> > >> on the SDcard. A "better" option would be some sort of NVRAM, but (much
> > >> like a real-time clock), pretty much no normal FPGA boards have this..
> > >>
> > > Look, if you NEED ASLR to have a modicum of safety and protection,
> > > GO FOR IT.
> > > <
> > > My 66000 has no such need.
> <
> > It isn't strictly needed in an architectural sense, but I don't trust
> > security without it.
> >
> > Like, seeing how easily x86 systems have been to hack due to these sorts
> > of issues (and the epic failure of "just write code that is not weak
> > against buffer overflow").
> >
> My guess is that soon someone will find a hole in PCIe and be able to
> attack pretty much ANY system (because PCIe is the I/O hub of every
> system and thing). And just wait till you see what CXL addition to PCIe
> will bring to the <attackers> game.
> >
> > Well, and the pros/cons that trying to invoke buffer-overflow exploits
> > over USB was one of the major strategies for "jail breaking" or
> > "rooting" cell-phones.
> >
> >
> > Well, and as-is, TestKern is still kinda pathetic on this front.
> > >>
> > >> Would "almost" be nice if capabilities could be supported as well,
> > >> except that there are some serious drawbacks with capabilities as well
> > >> (to make them "actually effective" adds drawbacks, otherwise one could
> > >> sidestep them, which would severely limit their effectiveness).
> > > <
> > > One way to think about My 66000 is that the user address space is a
> > > capability, and that the user does not have a capability to GuestOS
> > > address space, GuestOS does not have a capability to HyperVisor
> > > address space,.....
> > > <
> > > It does not matter what bit pattern AGEN spits out, you cannot access
> > > the greater privilege level's address spaces. Not from ISA, not from TLB,
> > > not from Cache lines, not from watching a high precision timer.
> > > <
> > > Computer architecture is hard enough, there is no need to add even more
> > > fuzz across a myriad of problem-spaces when you can simply design them
> > > out before hand.
> <
> > Hmm...
> >
> >
> > Hard to keep things both "cheap" and "wont interfere with normal C
> > coding practices".
> <
> Yes, modern languages are not really capable of dealing with capabilities..
> Anything not a huge-flat address space is problematic with inter process
> shared memory.
> > >>
> > >> There is a feature for bounds-checked pointers in BJX2 (encoding the
> > >> bounds in the tag bits), but it falls well short of what would be needed
> > >> for an actual capability architecture.
> > > <
> > > I access bounds checking via the CMP instruction.
> <
> > I have the LEAT and BNDCHK instructions.
> >
> > Where:
> > LEAT.x (Rm, Ri), Rn
> > Behaves like a LEA, but also adjusts the bounds.
> > BNDCHK.x Rm, Rn
> > Will raise a fault if the access is out-of-bounds.
> >
> > Whereas, the normal LEA.x will zero the tag bits.
> >
> > For the XLEA.x and XMOV.x instructions, bound-adjustment and checking
> > are the default behaviors (if bounds-checking is enabled in the ISA).
> >
> > But, unlike a capability machine, nothing prevents a program from
> > twiddling the tag bits. A true capability machine would require a way to
> > tag the registers and memory to prevent the program from twiddling these
> > bits.
> <
> That twiddling is not inexpensive, and preventing all misuse is like understanding
> the Gordian knot.
> >
> > But, at the moment one is like "well, we can assign 2 bits to every
> > 128-bit pair to separate pointers from other data", a massive crap-storm
> > is unleashed.
> <
> But people want and need every bit in every register and container.
> If you want tags, you put them elsewhere.
> >
> >
> > Sadly, not enough bits to make them "not suck" or to encode both bounds
> > and an element type.
> > >>>>>>
> > >>>>> <snip>
> > >>>>>>
> > >>>>>> I guess one possible way would be to organize the ACL table as a
> > >>>>>> page-table like structure, say:
> > >>>>>> (31:16): KRR_ID
> > >>>>>> (15: 0): ACLID
> > >>>>> <
> > >>>>> In principle, you could have a different page hash table in memory for each
> > >>>>> ASID.
> > >>> <
> > >>>> To clarify, ASID and ACLID are two different features:
> > >>>> ASID is used for separating address spaces;
> > >>>> ACLID is for per-page per-task access-rights checking.
> > >>> <
> > >>> Most people place ACLID in the PTE and some in the hierarchy of
> > >>> PTPs and PTE.
> > >>>
> > >> There are some protection flags in the PTEs as well, but they only cover
> > >> the traditional User/Supervisor and "Global RWX" state.
> > > <
> > > Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
> > > their applications (the normal way) by having the user program share
> > > page tables and "Optimizing" the TLB using the G-bit. Now that you have
> > > 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
> > > the TLB mappings, and causing a host of other problems.
> > > <
> > > Now, consider an ASID system where the GuestOSs use different ASIDs
> > > and now all those G-bit problems disappear ! the TLB remains unconfused,
> > > and the memory hierarchy knows what to do. HyperVisors do this to the
> > > old MMU models, new architectures should solve these problems without
> > > creating new ones.
> <
> > OK.
> >
> >
> > I guess it came up elsewhere that apparently hypervisors differ more
> > from emulators than initially thought.
> <
> And evolving away at a rapid pace.
> >


Re: Misc: Design tradeoffs in virtual memory systems...

<u4s51v$aeub$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=32355&group=comp.arch#32355

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 00:39:07 -0500
Organization: A noiseless patient Spider
Lines: 385
Message-ID: <u4s51v$aeub$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com>
<66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 05:39:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="342987"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Oq+qwxq3+PTbvF+1X9X8A"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:uFW4JU+MEnW/fgyXXrpqc5GaAqg=
In-Reply-To: <bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 27 May 2023 05:39 UTC

On 5/26/2023 8:49 PM, MitchAlsup wrote:
> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
>> On 5/26/2023 4:15 PM, Dan Cross wrote:
>>> In article <66610ab9-0c63-4b8e...@googlegroups.com>,
>>> MitchAlsup <Mitch...@aol.com> wrote:
>>>> On Friday, May 26, 2023 at 2:59:13 PM UTC-5, Dan Cross wrote:
>>>>> Nor does it imply that a hardware page-table walker is bad.
>>>>
>>>> It has always been my argument that HW tableWalkers are best.
>>>
>>> I concur. OP does not, but OP doesn't seem to be talking from a
>>> particularly knowledgable position.
>>>
>> If I had no idea what I was doing, I would not likely have gotten as far
>> as I have...
>>
>> But, admittedly, some amount of what information I had came from
>> quick-and-dirty web-searches and gathering information from things like
>> CS/EE PowerPoint slides and similar.
> <
> Quick and dirty web searches often portray x86 as the best architecture
> that could ever be invented, almost always emulated, and never defamed.

OK.

A lot of what I had found talks about what has been done, but doesn't
really talk about the tradeoffs, or why one option would be chosen over
another.

But, I did find after more looking, that some of my initial
characterizations were "not quite right", for example:
SPARC and PowerPC were model-dependent on this front;
IA-64 apparently used a sort of RAM-backed fallback strategy.
But, it was more like a giant TLB rather than page-tables.
...

I guess these are using what is known as "Inverted Page Tables".

>>
>> Some amount of reading PDFs and similar as well.
> <
>>>>> The OP seems to think so, but has yet to provide a particularly
>>>>> compelling argument beyond some measurements from very
>>>>> unrepresentative workloads running on a hobby ISA on an FPGA
>>>>
>>>> One must rate BGBs architecture in the less-than-even-academic
>>>> category; far from industrial quality.
>>>
>>> Agreed.
>>>
>> How many people have done much better with their hobby CPU ISA projects?...
> <
> I will give you great credit in pulling what you have off.
> Where I tend to disagree, is when you state what you have done as
> if it is superlative. I am guilty of the same.
> <

Fair enough.

Often I am just trying to state what I have done and sometimes why I
have done it that way. I am not just setting out trying to design
something that sucks.

Sometimes things turn out to kinda suck after the fact.

Some other things I just sort of expected would suck, and that I could
then just sweep them under the carpet (eg, the whole 96-bit virtual
addressing thing). It has instead ended up in this weird category of
"almost but not entirely pointless"; but also not quite expensive enough
to justify dropping the idea outright.

But, still, doesn't do good things to the page-tables.

Like, in some offline tests where one starts adding pages at random
addresses, with a 3-level page-table there is relatively little overhead
vs sequential address assignment.

But, with an 8-level page table, the memory use "takes off like a rocket".

Granted, with a split ASLR scheme (picking random quadrant addresses for
tasks and then picking random addresses within those quadrants), things
are likely to be a bit more modest vs my original "pick and insert pages
at random" test.

But, yeah, in a test where one randomly inserts around 1500 pages and
suddenly needs around 200MB for the page tables, I did not take this as
a good sign... Like, the page-tables needed around an order
of magnitude more memory than the pages contained within.
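
As a rough illustration of this kind of experiment (not the original
test code), the sketch below counts how many table nodes a radix page
table needs for a given set of pages. The 11 index bits per level with
8-byte entries (so 16K per table node), the uniformly random placement,
and the names table_bytes and random_pages are all assumptions made up
for the example. How badly the shallow table fares depends a lot on how
clustered the random addresses are, which the description above leaves
open.

```python
import random

def table_bytes(page_numbers, levels, bits_per_level=11, entry_bytes=8):
    """Bytes of page-table nodes needed to map the given virtual page numbers.

    Every node holds 2**bits_per_level entries, and a node exists as soon as
    any mapped page falls under it. Level 0 is the root, level (levels - 1)
    is the leaf table. All sizes here are assumptions for illustration.
    """
    nodes = set()
    for vpn in page_numbers:
        for level in range(levels):
            # The VPN bits above this node's own index field identify the node.
            prefix = vpn >> (bits_per_level * (levels - level))
            nodes.add((level, prefix))
    return len(nodes) * (1 << bits_per_level) * entry_bytes

def random_pages(count, levels, bits_per_level=11, seed=1):
    """Pick `count` distinct page numbers uniformly over the whole VPN space."""
    rng = random.Random(seed)
    pages = set()
    while len(pages) < count:
        pages.add(rng.getrandbits(bits_per_level * levels))
    return sorted(pages)

if __name__ == "__main__":
    for levels in (3, 8):
        seq = table_bytes(range(1500), levels)
        rnd = table_bytes(random_pages(1500, levels), levels)
        print(f"{levels} levels: sequential {seq // 1024} KiB, "
              f"random {rnd // 1024} KiB")
```

Sequentially assigned pages share one path from root to leaf, whereas a
uniformly random page tends to need its own node at nearly every level
below the first, so the node count grows roughly with (pages x levels).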

Then, I tried AVL-Trees and B-Trees as possible workarounds, because,
while slower, at least they wouldn't eat all the RAM.

Meanwhile, hash tables had a different set of issues, mostly along the
lines of "what happens when they get full?".

....

>>
>>>> But we enjoy watching him stumble across industrial problems
>>>> making the same mistakes we made 40 years ago.
>>>
>>> It seems like many of those could be avoided if OP were a bit
>>> more open-minded and, dare I say it, self-aware.
>>>
>> ?...
>>
>> I am aware of my own existence, and am able to recognize my own
>> reflection in a mirror, etc.
> <
> Cuttlefish are the boundary line on self recognition.

I am at least probably smarter than a cuttlefish...

>>
>> I am not always the best in "intuitive" contexts.
>> Nor necessarily at noticing or thinking about "obvious" things.
>>
>> Nor, particularly skilled with "top down" thinking.
>>
> Throughout my career, I have been more successful with middle out
> thinking and designing--a both ends towards the middle, rather than
> a) just throw it together and see what you can make work, and b)
> top down, minutia be damned--ways of addressing the problems.
> <
> I find it interesting that the original 8080 was (a) as was Unix (a).
> CDC 6600 has the feel for middle out
> System 360 has a top-down feel to it.
> MMX-through SSE3,4,5 has a throw it together feel.
> x86-64 has a middle out feel to it.

I suspect I use a bottom-up strategy much of the time...

Though, a lot of what I end up writing about and implementing has
already "climbed the ladder" a fair bit, and sometimes it is necessary
to prune and reorganize things to "keep the mess at bay".

Admittedly, my "planning" skills tend to be almost nonexistent, mostly
just life with one thing happening after another. So, I don't really
"plan" so much as try to predict what will happen next, and have
potential solutions available for the potential scenarios.

Well, and attempts at using "traditional" planning approaches either
don't work correctly, or fairly quickly devolve into implausible
scenarios (and I am not inclined to take a course of action which leads
straight off into the implausible).

I suspect I mostly just have the power of "obsessing on stuff probably
way more than any normal person would obsess on it".

I am not sure if BJX2 really has a "feel".

It was some parts of:
Clone other stuff;
Clean up stuff;
Random ideas that seemed nifty;
...

But, where I started out (originally with SH-4):
+ 32-bit ops (Immed and 3R forms);
+ more GPRs (initially 16->32)
+ Moved 32 -> 64 bit
+ "Early WEX" (very different from modern WEX)
Had used a more IA-64 like encoding scheme.
Initially would have been a separate CPU mode.
Totally redesigned the encoding (BJX1 -> BJX2, ~ 2017)
- Dropped Shadow Registers
- Dropped Delay Slots
- Dropped Auto-Increment
= MMU/TLB moved from MMIO device to architectural
+ WEX2 (Current WEX)
Now based on daisy-chained 32-bit words.
- Dropped FPRs
+ Predication
- PUSH/POP
+ ALU SIMD (~ 2018)
(Got working on FPGA, BJX2-C)
+ PrWEX
+ Jumbo Encodings
(Breaking encoding changes, BJX2-D)
+ FP-SIMD
+ RGB555 Helper Ops (for TKRA-GL)
+ XGPR (32->64 GPRs)
Initially motivated by high register pressure in TKRA-GL.
+ UTX2 (Compressed textures for TKRA-GL)
+ FMOV.S (Combined Load/Store and Fp32<->Fp64 Conv)
Mostly as this gets a "nice speedup" in Quake.
+ RV64I Alt-Mode Decoder.
(Despite being an unrelated ISA, fit into pipeline well enough)
+ XMOV (96-bit VA extension)
+ LDTEX (originally for "GPU Mode")
+ DIV/MOD (and RV64 'M' extension)
+ XG2 (Drops 16-bit ops, but makes XGPR orthogonal)
XG2 is a separate operating mode from 'Baseline Mode'.
+ FP_IMM (Immediate Values for Floating Point and FP-SIMD ops)
+ XG2RV (XG2 Mode but using RV64I register space)
Intended to allow RV64 code to more easily access BJX2 features.
(Presentish)

The RV64I support is still "mostly untested", and I keep running into
roadblocks with this. At the basic level, the BJX2 core will not run an
OS meant for a RISC-V core, but could (in principle) allow running RISC-V
code in userland. GCC could have made my experiments easier if it
supported PIE binaries...

But, sadly, my project has once again achieved a certain level of "hair"...

Would be nicer if it were all more cleanly organized.

Also would be nice if I could run the CPU core on the FPGA faster than
50MHz (while also getting something "usably faster" than the existing
CPU core running at 50MHz).

>>
>> I am not entirely sure how to describe my thought processes, or how they
>> compare to others.
> <
> If I could accurately portray my thought processes to a Psychiatrist,
> they would probably lock me up..............
> <
> I often tell the joke to someone like us::
> <
> Sane people think that there is a big difference between being sane and insane.
> <
> < longish pause >
> <
> We know otherwise.


Re: Misc: Design tradeoffs in virtual memory systems...

<u4sfkc$bt0t$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=32356&group=comp.arch#32356

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 03:39:36 -0500
Organization: A noiseless patient Spider
Lines: 463
Message-ID: <u4sfkc$bt0t$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me>
<ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me>
<4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me>
<881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me>
<6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 08:39:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="390173"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Ij8ZyELhMEQcs8I8la/w2"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:2y7OajiRv7R08zbeNlKNf73c1tQ=
Content-Language: en-US
In-Reply-To: <6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
 by: BGB - Sat, 27 May 2023 08:39 UTC

On 5/26/2023 9:09 PM, MitchAlsup wrote:
> On Friday, May 26, 2023 at 7:50:36 PM UTC-5, BGB wrote:
>> On 5/26/2023 5:17 PM, MitchAlsup wrote:
>>> On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
>>>> On 5/26/2023 3:31 PM, MitchAlsup wrote:
>>>
>>>>>>> My 66000 is not attackable by those means--and probably can
>>>>>>> avoid needing ASLR although there is no reason not to use it as
>>>>>>> it slows nothing down.
>>>>> <
>>>>>> I had assumed ASLR as a "standard line of defense".
>>>>> <
>>>>> If/when the unprivileged cannot access super-address-space no matter
>>>>> the bit pattern created at AGEN, ASLR is not needed, as there is no way
>>>>> the user can access memory not mapped by his page tables.
>>> <
>>>> ASLR can help protect against things like userspace code doing
>>>> buffer-overflow exploits against system calls or intentionally mangled
>>>> data structures (along with things like marking stack and heap memory
>>>> and similar as non-executable, ...).
>>> <
>>> What if the control-flow information is not accessible to the LDs and STs
>>> of an ISA ?? That is, you can buffer overflow all you want, but when you
>>> execute RET you end up back at who called that subroutine ???
>>> <
>>> My 66000 architecture has 2 stack pointers, one for the data stack and
>>> one for the control flow and protected registers stack. The one for the
>>> protected register stack is not accessible to the program except through
>>> ENTER and EXIT and RET instructions and the pages are marked RWE = 000
>>> and HW verifies that those pages are so marked (at least at user privilege).
>>> <
>>> So, you can damage as much data memory as you like, but return from a
>>> subroutine will end up at the instruction after the call AND all of the
>>> preserved registers have their old values.
>> Yeah. Only one stack in my case.
>>
>> There are stack canaries though, which can detect if a buffer overflow
>> had overwritten the canary value.
>>
>> Had at one point considered a feature to hash the state of all of the
>> preserved registers, and then verify that everything was intact. No real
>> practical way to do this check in-hardware though.
>>>>
>>>> Partly as buffer-overflow and many forms of "confused deputy" attacks
>>>> are not stopped by conventional memory-access protections, but can be
>>>> made ineffective by using ASLR.
>>>>
>>> Look, I am NOT arguing that ASLR is bad, just that if the ISA and MMU
>>> is properly defined ASLR brings nothing MORE to the table. I am arguing
>>> that the need for ASLR is simply indicative of bad architecture.
>> OK.
>>
>> Most systems use it at least, and it is reasonably cheap.
> <
> Most systems are based on X86 or ARM--both of which have these kinds
> of problems.

Yes.

>>
>> In an ideal world, maybe it would be unnecessary, but as I see it, it is
>> likely well worth its cost on this front.
>>>>
>>>> Similar goes for the "compiler shuffles all the functions on each
>>>> rebuild", etc.
>>> <
>>> Oh, and BTW, the user privilege is not allowed to write GOT and My 66000
>>> does not need a PLT, either. Cutting off even more attack vectors.
> <
>> OK.
>>
>> I am mostly using direct branches for local calls, and had intended
>> Abs48 branches for DLL imports (but, with the drawback that Abs48
>> branches can't encode Inter-ISA branches).
>>
>> Though, it is possible I could consider also allowing a Jumbo-Load +
>> Register Branch sequence for DLL imports, which is capable of InterISA,
>> and/or always leave at least 128 bits.
> <
> What I did was to invent a LD IP,[address] instruction. This transfers control
> to the address in the accessed location and has the side effect of delivering
> the return address to R0.

Yeah, multiple possibilities in my case:
1:
  MOV   Imm64, R3
  JMP   R3
2:
  MOV.Q (PC, 8), R3
  NOP4B               //32-bit aligning NOP
  JMP4B R3            //32-bit encoding
  .qword target_addr

The NOP is needed because the MOV.Q displacement is always a multiple of
8, and PC-rel (in BJX2) is relative to the following instruction (unlike
RISC-V, where it is relative to the current instruction).
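
The arithmetic behind the padding can be sketched as below. This is only
an illustration: it assumes every instruction in the stub uses a 32-bit
(4-byte) encoding, and the helper names literal_displacement and
needs_pad are made up for the example.

```python
INSN_BYTES = 4  # assumption: each instruction in the stub is a 32-bit encoding

def literal_displacement(insns_between):
    """Displacement from the PC-relative base to the literal.

    The base is the address of the instruction following the load, so the
    displacement is just the size of whatever sits between load and literal.
    """
    return insns_between * INSN_BYTES

def needs_pad(insns_between, scale=8):
    """True if the displacement is not encodable as a multiple of `scale`."""
    return literal_displacement(insns_between) % scale != 0

if __name__ == "__main__":
    # Only the JMP between the load and the literal: displacement 4.
    print("JMP only:  disp", literal_displacement(1), "pad needed:", needs_pad(1))
    # NOP + JMP between the load and the literal: displacement 8.
    print("NOP + JMP: disp", literal_displacement(2), "pad needed:", needs_pad(2))
```

With the NOP in place the displacement comes out to 8, which matches the
(PC, 8) operand in the second sequence.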

>>
>> It seems reasonable that class VTable's and similar could be put into
>> read-only pages.
>>
> Once the method call tables have been assembled--those tables should
> have write permission removed.

Probably true.

For a C++ compiler, presumably VTables could be put in
".rdata"/".rodata" or similar.

>>
>> Still leaves a concern though if a program could compose "malicious COM
>> objects" or similar and then put them somewhere where they are not
>> supposed to go.
>>
>> Though, this is an area where (in theory) keyring/ACL checks could help,
>> since (ideally) if neither party can execute each others' ".text"
>> sections, then one can stop programs from sneaking bad COM objects into
>> OS APIs.
> <
> the first problem is that early PTE structures granted read and execute permission
> as if it were the same. There are reasons one wants you only to be able to read
> (.rodata for example) some are in memory, but you would never want to transfer
> control there; similarly, one should not be able to read code (.text). Mixing these
> two up created a whole slew of problems..............
> <
> the second problem is that there are more than 2 layers. You want a debugger
> to be able to read code, the stack walker sometimes needs to read code, the
> dynamic linker needs to be able to write what otherwise smells like .rodata--
> while the application is not allowed to manipulate any of this, but of course
> the operating system IS.
> <
> In the past we granted excess permission to smooth out the bumps, and
> now we have college courses teaching students how to break existing
> architectures, implementations, languages, networks,.....
> <
> We got it wrong up front, and now you are having to deal with the problem.

Yeah.

I first designed VUGID, and later ACLID, because I hoped for something
better than "well, we will just throw our hands in the air and accept
that it will happen as some inescapable law of nature".

Or, "we 'could' fix it", but the fix fundamentally breaks working
assumptions about how languages like C work in ways that seem like an
obvious code-portability mess.

One can argue, "Well, maybe C's time has passed?"... But, it is kinda
weak if one hasn't so much "solved" the problem as "pushed things up a
layer of abstraction and called it solved."

Solving the problem means it "actually went away", but not so much
"Security problem has gone away in debug builds, but can return in
optimized release builds because the compiler incorrectly 'proved' that
the access can't go out of bounds; and so optimized away the
bounds-checking on the array..."

So, then it turns out that one's bounds-checked type-safe language is
still, in effect, capable of suffering from buffer overflow exploits.

Or, OTOH, we could just as well have it, in C, that "char *" can
remember the bounds of the array that was assigned to it, and the random
"*t++=*s++" that went out of bounds, can be made to generate a
bounds-check fault (or, not, depending on compiler settings).
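
A toy model of the idea, as a sketch only: Python stands in for the
compiler and hardware here, and BoundedPtr, BoundsFault, and copy_string
are hypothetical names invented for the example, not anything in BJX2 or
a real C runtime. The pointer carries a reference to the array it was
derived from, moving it is unchecked, and every access is checked.

```python
class BoundsFault(Exception):
    """Stands in for the bounds-check fault the text describes."""

class BoundedPtr:
    """A 'char *' that remembers the array it was derived from."""

    def __init__(self, array, index=0):
        self.array = array
        self.index = index

    def _check(self):
        if not (0 <= self.index < len(self.array)):
            raise BoundsFault(
                f"index {self.index} outside 0..{len(self.array) - 1}")

    def load(self):
        self._check()
        return self.array[self.index]

    def store(self, value):
        self._check()
        self.array[self.index] = value

    def advance(self):
        # Moving the pointer is allowed; only the accesses are checked.
        self.index += 1

def copy_string(dst, src):
    """Rough equivalent of the C idiom: while ((*t++ = *s++) != 0) ;"""
    while True:
        value = src.load()
        dst.store(value)
        src.advance()
        dst.advance()
        if value == 0:
            return

if __name__ == "__main__":
    small = bytearray(4)
    try:
        copy_string(BoundedPtr(small), BoundedPtr(bytearray(b"hello\0")))
    except BoundsFault as fault:
        print("fault:", fault)
```

The copy into the 4-byte buffer faults on the fifth store instead of
silently writing past the end of the array.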

Well, or failing this, we have:
ASLR;
Stack canary checks;
Per-page ACL checks;
...

And, cross one's fingers that at least one of those will be able to help
stop the exploit.

Though, at least in the type-safe language, the default expectation is
that the arrays will be bounds-checked, rather than possible annoyance
at all the pointer bounds checking making their program 5% slower (and
that pointer<->integer casts may leave "garbage" in the pointer bits).

>>
>> Well, that or run lint-checks on any user-supplied objects and
>> disallow ones with VTables or method pointers into writable memory. But,
>> this requires the APIs to be "well designed", which is possibly asking a
>> lot.
>>>>
>>>> In my case, there is a hardware RNG that can keep its random seed state
>>>> on the SDcard. A "better" option would be some sort of NVRAM, but (much
>>>> like a real-time clock), pretty much no normal FPGA boards have this.
>>>>
>>> Look, if you NEED ASLR to have a modicum of safety and protection,
>>> GO FOR IT.
>>> <
>>> My 66000 has no such need.
> <
>> It isn't strictly needed in an architectural sense, but I don't trust
>> security without it.
>>
>> Like, seeing how easily x86 systems have been to hack due to these sorts
>> of issues (and the epic failure of "just write code that is not weak
>> against buffer overflow").
>>
> My guess is that soon someone will find a hole in PCIe and be able to
> attack pretty much ANY system (because PCIe is the I/O hub of every
> system and thing). And just wait till you see what CXL addition to PCIe
> will bring to the <attackers> game.


Re: Misc: Design tradeoffs in virtual memory systems...

<u4sian$c97s$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=32357&group=comp.arch#32357

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 04:25:37 -0500
Organization: A noiseless patient Spider
Lines: 384
Message-ID: <u4sian$c97s$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me>
<ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me>
<4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me>
<881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me>
<6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
<e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 09:25:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="402684"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18FAqdoumLi0Ge1nVAfErAA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:Wsh8d9Ov8J2nLYujicQ/Wp5eMfw=
In-Reply-To: <e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 27 May 2023 09:25 UTC

On 5/26/2023 11:00 PM, robf...@gmail.com wrote:
> On Friday, May 26, 2023 at 10:09:44 PM UTC-4, MitchAlsup wrote:
>> On Friday, May 26, 2023 at 7:50:36 PM UTC-5, BGB wrote:
>>> On 5/26/2023 5:17 PM, MitchAlsup wrote:
>>>> On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
>>>>> On 5/26/2023 3:31 PM, MitchAlsup wrote:
>>>>
>>>>>>>> My 66000 is not attackable by those means--and probably can
>>>>>>>> avoid needing ASLR although there is no reason not to use it as
>>>>>>>> it slows nothing down.
>>>>>> <
>>>>>>> I had assumed ASLR as a "standard line of defense".
>>>>>> <
>>>>>> If/when the unprivileged cannot access super-address-space no matter
>>>>>> the bit pattern created at AGEN, ASLR is not needed, as there is no way
>>>>>> the user can access memory not mapped by his page tables.
>>>> <
>>>>> ASLR can help protect against things like userspace code doing
>>>>> buffer-overflow exploits against system calls or intentionally mangled
>>>>> data structures (along with things like marking stack and heap memory
>>>>> and similar as non-executable, ...).
>>>> <
>>>> What if the control-flow information is not accessible to the LDs and STs
>>>> of an ISA ?? That is, you can buffer overflow all you want, but when you
>>>> execute RET you end up back at who called that subroutine ???
>>>> <
>>>> My 66000 architecture has 2 stack pointers, one for the data stack and
>>>> one for the control flow and protected registers stack. The one for the
>>>> protected register stack is not accessible to the program except through
>>>> ENTER and EXIT and RET instructions and the pages are marked RWE = 000
>>>> and HW verifies that those pages are so marked (at least at user privilege).
>>>> <
>>>> So, you can damage as much data memory as you like, but return from a
>>>> subroutine will end up at the instruction after the call AND all of the
>>>> preserved registers have their old values.
>>> Yeah. Only one stack in my case.
>>>
>>> There are stack canaries though, which can detect if a buffer overflow
>>> had overwritten the canary value.
>>>
>>> Had at one point considered a feature to hash the state of all of the
>>> preserved registers, and then verify that everything was intact. No real
>>> practical way to do this check in-hardware though.
>>>>>
>>>>> Partly as buffer-overflow and many forms of "confused deputy" attacks
>>>>> are not stopped by conventional memory-access protections, but can be
>>>>> made ineffective by using ASLR.
>>>>>
>>>> Look, I am NOT arguing that ASLR is bad, just that if the ISA and MMU
>>>> is properly defined ASLR brings nothing MORE to the table. I am arguing
>>>> that the need for ASLR is simply indicative of bad architecture.
>>> OK.
>>>
>>> Most systems use it at least, and it is reasonably cheap.
>> <
>> Most systems are based on X86 or ARM--both of which have these kinds
>> of problems.
>>>
>>> In an ideal world, maybe it would be unnecessary, but as I see it, it is
>>> likely well worth its cost on this front.
>>>>>
>>>>> Similar goes for the "compiler shuffles all the functions on each
>>>>> rebuild", etc.
>>>> <
>>>> Oh, and BTW, the user privilege is not allowed to write GOT and My 66000
>>>> does not need a PLT, either. Cutting off even more attack vectors.
>> <
>>> OK.
>>>
>>> I am mostly using direct branches for local calls, and had intended
>>> Abs48 branches for DLL imports (but, with the drawback that Abs48
>>> branches can't encode Inter-ISA branches).
>>>
>>> Though, it is possible I could consider also allowing a Jumbo-Load +
>>> Register Branch sequence for DLL imports, which is capable of InterISA,
>>> and/or always leave at least 128 bits.
>> <
>> What I did was to invent a LD IP,[address] instruction. This transfers control
>> to the address in the accessed location and has the side effect of delivering
>> the return address to R0.
>>>
>>> It seems reasonable that class VTable's and similar could be put into
>>> read-only pages.
>>>
>> Once the method call tables have been assembled--those tables should
>> have write permission removed.
>>>
>>> Still leaves a concern though if a program could compose "malicious COM
>>> objects" or similar and then put them somewhere where they are not
>>> supposed to go.
>>>
>>> Though, this is an area where (in theory) keyring/ACL checks could help,
>>> since (ideally) if neither party can execute each others' ".text"
>>> sections, then one can stop programs from sneaking bad COM objects into
>>> OS APIs.
>> <
>> The first problem is that early PTE structures granted read and execute permission
>> as if they were the same. Some things in memory should be readable only (.rodata,
>> for example), and you would never want to transfer control there; similarly, one
>> should not be able to read code (.text). Mixing these two up created a whole slew
>> of problems.
>> <
>> The second problem is that there are more than 2 layers. You want a debugger
>> to be able to read code, the stack walker sometimes needs to read code, the
>> dynamic linker needs to be able to write what otherwise smells like .rodata--
>> while the application is not allowed to manipulate any of this, but of course
>> the operating system IS.
>> <
>> In the past we granted excess permission to smooth out the bumps, and
>> now we have college courses teaching students how to break existing
>> architectures, implementations, languages, networks,.....
>> <
>> We got it wrong up front, and now you are having to deal with the problem.
>>>
>>> Well, that or run lint-checks on any user-supplied objects and
>>> disallow ones with VTables or method pointers into writable memory. But,
>>> this requires the APIs to be "well designed", which is possibly asking a
>>> lot.
>>>>>
>>>>> In my case, there is a hardware RNG that can keep its random seed state
>>>>> on the SDcard. A "better" option would be some sort of NVRAM, but (much
>>>>> like a real-time clock), pretty much no normal FPGA boards have this.
>>>>>
>>>> Look, if you NEED ASLR to have a modicum of safety and protection,
>>>> GO FOR IT.
>>>> <
>>>> My 66000 has no such need.
>> <
>>> It isn't strictly needed in an architectural sense, but I don't trust
>>> security without it.
>>>
>>> Like, seeing how easy x86 systems have been to hack due to these sorts
>>> of issues (and the epic failure of "just write code that is not weak
>>> against buffer overflow").
>>>
>> My guess is that soon someone will find a hole in PCIe and be able to
>> attack pretty much ANY system (because PCIe is the I/O hub of every
>> system and thing). And just wait till you see what CXL addition to PCIe
>> will bring to the <attackers> game.
>>>
>>> Well, and the pros/cons that trying to invoke buffer-overflow exploits
>>> over USB was one of the major strategies for "jail breaking" or
>>> "rooting" cell-phones.
>>>
>>>
>>> Well, and as-is, TestKern is still kinda pathetic on this front.
>>>>>
>>>>> Would "almost" be nice if capabilities could be supported as well,
>>>>> except that there are some serious drawbacks with capabilities as well
>>>>> (to make them "actually effective" adds drawbacks, otherwise one could
>>>>> sidestep them, which would severely limit their effectiveness).
>>>> <
>>>> One way to think about My 66000 is that the user address space is a
>>>> capability, and that the user does not have a capability to GuestOS
>>>> address space, GuestOS does not have a capability to HyperVisor
>>>> address space,.....
>>>> <
>>>> It does not matter what bit pattern AGEN spits out, you cannot access
>>>> the greater privilege level's address spaces. Not from ISA, not from TLB,
>>>> not from Cache lines, not from watching a high precision timer.
>>>> <
>>>> Computer architecture is hard enough, there is no need to add even more
>>>> fuzz across a myriad of problem-spaces when you can simply design them
>>>> out beforehand.
>> <
>>> Hmm...
>>>
>>>
>>> Hard to keep things both "cheap" and "won't interfere with normal C
>>> coding practices".
>> <
>> Yes, modern languages are not really capable of dealing with capabilities.
>> Anything not a huge-flat address space is problematic with inter process
>> shared memory.
>>>>>
>>>>> There is a feature for bounds-checked pointers in BJX2 (encoding the
>>>>> bounds in the tag bits), but it falls well short of what would be needed
>>>>> for an actual capability architecture.
>>>> <
>>>> I access bounds checking via the CMP instruction.
>> <
>>> I have the LEAT and BNDCHK instructions.
>>>
>>> Where:
>>> LEAT.x (Rm, Ri), Rn
>>> Behaves like a LEA, but also adjusts the bounds.
>>> BNDCHK.x Rm, Rn
>>> Will raise a fault if the access is out-of-bounds.
>>>
>>> Whereas, the normal LEA.x will zero the tag bits.
>>>
>>> For the XLEA.x and XMOV.x instructions, bound-adjustment and checking
>>> are the default behaviors (if bounds-checking is enabled in the ISA).
>>>
>>> But, unlike a capability machine, nothing prevents a program from
>>> twiddling the tag bits. A true capability machine would require a way to
>>> tag the registers and memory to prevent the program from twiddling these
>>> bits.
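A toy model may make the LEAT/BNDCHK behavior described above easier to follow. This is a sketch under loud assumptions: the 48-bit address with a 16-bit "bytes remaining" tag is invented for illustration and is not the actual BJX2 encoding, and the function names merely mirror the mnemonics.

```python
ADDR_BITS = 48                      # assumed split: low 48 bits are the address
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_MAX = (1 << 16) - 1             # assumed: high 16 bits hold bytes remaining


class BoundsFault(Exception):
    pass


def make_ptr(addr, remaining):
    """Pack an address and a bytes-remaining tag into one 64-bit value."""
    assert 0 <= addr <= ADDR_MASK and 0 < remaining <= TAG_MAX
    return (remaining << ADDR_BITS) | addr


def addr_of(ptr):
    return ptr & ADDR_MASK


def tag_of(ptr):
    return ptr >> ADDR_BITS


def lea(ptr, offset):
    """Plain LEA: compute the new address and zero the tag bits."""
    return (addr_of(ptr) + offset) & ADDR_MASK


def leat(ptr, offset):
    """LEAT-like: advance the address and shrink the remaining-bytes tag."""
    remaining = tag_of(ptr)
    if not 0 <= offset < remaining:
        raise BoundsFault(f"offset {offset} outside {remaining} bytes")
    return make_ptr(addr_of(ptr) + offset, remaining - offset)


def bndchk(ptr, size):
    """BNDCHK-like: fault if an access of `size` bytes runs past the end."""
    if size > tag_of(ptr):
        raise BoundsFault(f"{size}-byte access exceeds {tag_of(ptr)} bytes")


p = make_ptr(0x1000, 16)            # a 16-byte object
q = leat(p, 12)                     # 4 bytes remain after the adjustment
bndchk(q, 4)                        # fine
forged = (TAG_MAX << ADDR_BITS) | addr_of(q)
bndchk(forged, 1000)                # also "fine": the tag was simply rewritten
```

The last two lines show the weakness noted above: because the tag lives in ordinary pointer bits, any code that can do integer arithmetic can forge wider bounds.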
>> <
>> That twiddling is not inexpensive, and preventing all misuse is like understanding
>> the Gordian knot.
>>>
>>> But, at the moment one is like "well, we can assign 2 bits to every
>>> 128-bit pair to separate pointers from other data", a massive crap-storm
>>> is unleashed.
>> <
>> But people want and need every bit in every register and container.
>> If you want tags, you put them elsewhere.
>>>
>>>
>>> Sadly, not enough bits to make them "not suck" or to encode both bounds
>>> and an element type.
>>>>>>>>>
>>>>>>>> <snip>
>>>>>>>>>
>>>>>>>>> I guess one possible way would be to organize the ACL table as a
>>>>>>>>> page-table like structure, say:
>>>>>>>>> (31:16): KRR_ID
>>>>>>>>> (15: 0): ACLID
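The page-table-like ACL layout suggested above could look roughly like this. It is a minimal sketch, assuming a two-level lookup keyed first by KRR_ID and then by ACLID; `acl_key`, `AclTable` and the rights strings are hypothetical, and Python dicts stand in for what would be table pages in memory.

```python
def acl_key(krr_id, acl_id):
    """Pack the two 16-bit IDs as (31:16) KRR_ID, (15:0) ACLID."""
    return ((krr_id & 0xFFFF) << 16) | (acl_id & 0xFFFF)


class AclTable:
    def __init__(self):
        # First level indexed by KRR_ID, second level by ACLID.
        self.top = {}

    def grant(self, krr_id, acl_id, rights):
        self.top.setdefault(krr_id, {})[acl_id] = rights

    def lookup(self, key):
        krr_id, acl_id = key >> 16, key & 0xFFFF
        # A missing entry at either level means no access rights.
        return self.top.get(krr_id, {}).get(acl_id, "")


table = AclTable()
table.grant(krr_id=3, acl_id=7, rights="rx")
print(hex(acl_key(3, 7)), table.lookup(acl_key(3, 7)))
```

As with a page table, second-level tables only need to exist for KRR_IDs that actually hold grants.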
>>>>>>>> <
>>>>>>>> In principle, you could have a different page hash table in memory for each
>>>>>>>> ASID.
>>>>>> <
>>>>>>> To clarify, ASID and ACLID are two different features:
>>>>>>> ASID is used for separating address spaces;
>>>>>>> ACLID is for per-page per-task access-rights checking.
>>>>>> <
>>>>>> Most people place ACLID in the PTE and some in the hierarchy of
>>>>>> PTPs and PTE.
>>>>>>
>>>>> There are some protection flags in the PTEs as well, but they only cover
>>>>> the traditional User/Supervisor and "Global RWX" state.
>>>> <
>>>> Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
>>>> their applications (the normal way) by having the user program share
>>>> page tables and "Optimizing" the TLB using the G-bit. Now that you have
>>>> 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
>>>> the TLB mappings, and causing a host of other problems.
>>>> <
>>>> Now, consider an ASID system where the GuestOSs use different ASIDs
>>>> and now all those G-bit problems disappear! The TLB remains unconfused,
>>>> and the memory hierarchy knows what to do. HyperVisors do this to the
>>>> old MMU models, new architectures should solve these problems without
>>>> creating new ones.
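The G-bit hazard described above can be reproduced in a small model. This is a sketch, not any real MMU: `Tlb` and `honor_global` are made-up names, and the `asid` field is assumed to identify the guest's address space.

```python
class Tlb:
    def __init__(self, honor_global):
        self.honor_global = honor_global
        self.entries = []  # tuples of (asid, vpn, ppn, is_global)

    def insert(self, asid, vpn, ppn, is_global=False):
        self.entries.append((asid, vpn, ppn, is_global))

    def lookup(self, asid, vpn):
        for e_asid, e_vpn, e_ppn, e_global in self.entries:
            if e_vpn != vpn:
                continue
            # A global entry matches any address space when the G-bit is honored.
            if e_asid == asid or (self.honor_global and e_global):
                return e_ppn
        return None  # TLB miss


# Guest 1 maps a "global" kernel page; guest 2 then uses the same virtual page.
legacy = Tlb(honor_global=True)
legacy.insert(asid=1, vpn=0x80, ppn=0xA00, is_global=True)
print(legacy.lookup(asid=2, vpn=0x80))  # hits guest 1's mapping: a leak

tagged = Tlb(honor_global=False)
tagged.insert(asid=1, vpn=0x80, ppn=0xA00, is_global=True)
print(tagged.lookup(asid=2, vpn=0x80))  # miss: guest 2 must install its own entry
```

With the match keyed strictly on the ASID, the same virtual page number in two guests can never alias to one physical page by accident.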
>> <
>>> OK.
>>>
>>>
>>> I guess it came up elsewhere that apparently hypervisors differ more
>>> from emulators than initially thought.
>> <
>> And evolving away at a rapid pace.
>>>
>
> Hey, I had hardware page table walking for both hierarchical and hashed page tables,
> but decided to scrap it. It seems that was a bad choice now, should have kept it. Had
> the entire inverted hash page table for 512MB DRAM implemented in block RAM so
> walking the table was ultra fast compared to using DRAM. One issue was limited
> hardware budget. If the hardware budget were unlimited I would put whatever was
> possible into hardware. I think a software managed table requires less hardware.
> Another issue IIRC was having the page tables in virtual memory and having to
> walk VM, handling double misses etc. It was getting complex.
>
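For readers unfamiliar with inverted hashed page tables, here is a minimal sketch of the lookup. It assumes one slot per physical frame and linear probing on collisions; the class name, the hash function and the probing policy are illustrative choices, not the design that was actually built in block RAM.

```python
class InvertedPageTable:
    """One slot per physical frame; the slot index is the frame number."""

    def __init__(self, num_frames):
        self.slots = [None] * num_frames  # each slot holds (asid, vpn) or None

    def _hash(self, asid, vpn):
        # Arbitrary mixing function chosen for the example.
        return (vpn ^ (asid * 0x9E37)) % len(self.slots)

    def map(self, asid, vpn):
        """Claim a frame for (asid, vpn) by linear probing; return the frame."""
        start = self._hash(asid, vpn)
        for i in range(len(self.slots)):
            frame = (start + i) % len(self.slots)
            if self.slots[frame] is None or self.slots[frame] == (asid, vpn):
                self.slots[frame] = (asid, vpn)
                return frame
        raise MemoryError("no free frame")

    def translate(self, asid, vpn):
        """Return the frame for (asid, vpn), or None to signal a page fault."""
        start = self._hash(asid, vpn)
        for i in range(len(self.slots)):
            frame = (start + i) % len(self.slots)
            entry = self.slots[frame]
            if entry is None:
                return None
            if entry == (asid, vpn):
                return frame
        return None


ipt = InvertedPageTable(num_frames=8)
frame = ipt.map(asid=1, vpn=0x10)
print(ipt.translate(1, 0x10) == frame, ipt.translate(2, 0x10))
```

The table size scales with physical memory rather than with the virtual address space, which is what makes holding the whole table in on-chip RAM plausible for a small system.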


Re: Misc: Design tradeoffs in virtual memory systems...

<80f66bbb-6d75-4237-b6e4-ced7c3967c25n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32358&group=comp.arch#32358

X-Received: by 2002:a05:620a:3953:b0:74e:324:d6f0 with SMTP id qs19-20020a05620a395300b0074e0324d6f0mr535987qkn.7.1685187590164;
Sat, 27 May 2023 04:39:50 -0700 (PDT)
X-Received: by 2002:a05:6871:6a97:b0:19f:3cd:19c9 with SMTP id
zf23-20020a0568716a9700b0019f03cd19c9mr775100oab.7.1685187589881; Sat, 27 May
2023 04:39:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 27 May 2023 04:39:49 -0700 (PDT)
In-Reply-To: <u4sian$c97s$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me> <6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
<e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com> <u4sian$c97s$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <80f66bbb-6d75-4237-b6e4-ced7c3967c25n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: robfi680@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 27 May 2023 11:39:50 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 524
 by: robf...@gmail.com - Sat, 27 May 2023 11:39 UTC

On Saturday, May 27, 2023 at 5:25:47 AM UTC-4, BGB wrote:
> On 5/26/2023 11:00 PM, robf...@gmail.com wrote:
> > Hey, I had hardware page table walking for both hierarchical and hashed page tables,
> > but decided to scrap it. It seems that was a bad choice now, should have kept it. Had
> > the entire inverted hash page table for 512MB DRAM implemented in block RAM so
> > walking the table was ultra fast compared to using DRAM. One issue was limited
> > hardware budget. If the hardware budget were unlimited I would put whatever was
> > possible into hardware. I think a software managed table requires less hardware.
> > Another issue IIRC was having the page tables in virtual memory and having to
> > walk VM, handling double misses etc. It was getting complex.
> >
> I didn't have a page-walker.
>
>
> But, as noted, my design evolved out of another ISA which also didn't
> have a page walker.
>
> Well, and the BJX2 TLB design isn't that far removed from the SuperH TLB....
>
>
> Does seem simpler, since one doesn't really need to handle the situation
> in hardware, more like.
>
> TLB miss:
> Turn RAM request into a TLB Miss response;
> Generate an exception-code and throw it at the CPU.
>
> Getting interrupt handling to work reliably (and not prone to crash or
> deadlock the CPU in any number of ways) is the main challenge.
> > I am not quite sure I understand the issue with having software managed tables. It
> > seems to me software can be used to fake out almost anything in hardware. What
> > if the table walking were performed at the highest hardware operating level so that
> > lower level could not distinguish it from hardware?
> >
> The above was my proposal as well...
> Like, if one has a dedicated CPU register for holding the page-table,
> and CPU/ABI defined page-table layouts, why not?...
>
> One does end up with both TLB Miss and Page Fault exceptions, with the
> TLB-Miss handler potentially needing to re-throw as a Page-Fault in
> these cases, but, ...
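The software-managed flow sketched above (a TLB miss raises an exception, the handler walks the table, and a missing mapping is re-thrown as a page fault) might look like this in miniature. It is a hedged illustration only: `SoftTlbCpu`, the dict-based page table, the 4 KB page size and the evict-oldest policy are all assumptions, not the BJX2 or TestKern implementation.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages


class PageFault(Exception):
    pass


class SoftTlbCpu:
    def __init__(self, page_table, tlb_size=4):
        self.page_table = page_table  # {vpn: ppn}, stands in for the in-memory table
        self.tlb = {}                 # small cache of recent translations
        self.tlb_size = tlb_size
        self.misses = 0

    def _tlb_miss_handler(self, vpn):
        # Runs "in software": walk the table, then refill the TLB.
        self.misses += 1
        ppn = self.page_table.get(vpn)
        if ppn is None:
            # No mapping exists, so the miss is re-thrown as a page fault.
            raise PageFault(f"no mapping for virtual page {vpn:#x}")
        if len(self.tlb) >= self.tlb_size:
            self.tlb.pop(next(iter(self.tlb)))  # evict the oldest entry
        self.tlb[vpn] = ppn

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.tlb:
            self._tlb_miss_handler(vpn)
        return (self.tlb[vpn] << PAGE_SHIFT) | offset


cpu = SoftTlbCpu({0x1: 0x40})
print(hex(cpu.translate(0x1234)), cpu.misses)  # first access misses, then refills
print(hex(cpu.translate(0x1238)), cpu.misses)  # second access hits the TLB
```

The hardware side only has to detect the miss and raise the exception; everything about the table layout stays a software decision.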
> > Compilers can do bottom-up compiles. Things can successfully be built from the
> > bottom up and I think that is maybe what must be done when one does not know in
> > advance what one is building. It is also useful as a learning experience. If one already
> > has a wide knowledge base available then it is better to go from the top down,
> > making use of the knowledge already present.
> >
> Yeah.
>
> My C compiler started out long ago as an interpreter for a JavaScript
> clone...
>
> Which started originally as "I have an XML-RPC implementation, built on
> DOM; originally for shoe-horning XML-RPC requests through the Jabber-IM
> Chat Protocol".
>
> Sorta hacked the XML-RPC implementation to work as a full interpreter,
> and threw a JS-like parser on it... But, it sucked...
>
>
> Then it was turned into a crude C interpreter, but "kinda sucked".
> Then it was a "parse headers for metadata" / FFI glue tool for a while.
>
> Partly, in another direction, I had taken a Scheme interpreter I had
> written earlier, and then written a JS-like parser on top of it. This
> was at least "less terrible" than the one built on a hacked XML-RPC, and
> more useful as a script language.
>
> But, the C compiler mostly ended up as an FFI tool for this interpreter
> (wrapping up designated C functions into a form that the interpreter
> could call into; preferable to writing piles of boilerplate to do this
> stuff manually).
>
> Many years went by...
>
>
> Then, I wrote an SH-4 backend.
>
> Realized SH-4 was kind of a pain, started "fixing" some of the
> annoyances, and adding features. This became BJX1.
>
> Then redesigned BJX1 into BJX2, which was a moving target for a while.
> Compiler is kind of a mess as different parts were written during
> different eras of the ISA's evolution.
>
> Some parts are sort of an awkward SH-4 -> BJX1 -> Early BJX2 -> Newer
> BJX2 translation chain, with a lot of bit-twiddling and "mutators" along
> the way.
>
>
> Early on, I was like:
> Well, SH-4 is a simple ISA with simple 16-bit instructions, I will just
> generate code directly as 16-bit SH-4 words and emit these into the text
> section...
>
> Some years, and a whole lot of bit-twiddly and mutator stages later,
> "yeah, that was a mistake".
>
> Would have been better to code-gen into an ASM-like format, and then
> translate from ASM to BJX2, possibly via a listing table (more like I
> had used in my past x86 related tools), but, alas...
>
> Could rewrite the backend, but this has always been more effort than
> "just hack it some more".
>
>
>
> Despite deriving from SH-4, seemingly it convergently evolved in a
> direction much more like that of RISC-V. This wasn't really "by design",
> just sorta happened.
>
> In some other areas, it had seemingly also converged towards IA-64 as well.
>
> Like, there are some sort of "invisible great attractors" around both
> RISC-V and IA-64, where design fiddling seems to be pulled inexplicably
> towards them in many areas...
>
> But, OTOH, so long as it doesn't turn into a crab, that is probably OK.
> "My CPU has turned into a crab?..."
> "My car has turned crab as well?"
> Now drives sideways and has pincers, ...
> Person looks down at their arms, driving the car, "Oh No!"
>
> ...


Re: Misc: Design tradeoffs in virtual memory systems...

<u4svjf$e5m0$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=32359&group=comp.arch#32359

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 15:12:15 +0200
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <u4svjf$e5m0$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com>
<66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 13:12:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0f2a468d4fadb6daf12247eb7e6e59eb";
logging-data="464576"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/C/1tg9nirO385k/g5NtPz4QDZkknS5CA7PUZ2zVnLJg=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:g8KbvWxbFG2vQ38QYytG6+LZqLE=
In-Reply-To: <bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
 by: Terje Mathisen - Sat, 27 May 2023 13:12 UTC

MitchAlsup wrote:
> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
> If I could accurately portray my thought processes to a Psychiatrist,
> they would probably lock me up..............
> <
> I often tell the joke to someone like us::
> <
> Sane people think that there is a big difference between being sane and insane.
> <
> < longish pause >
> <
> We know otherwise.

It is more like not all of us accept the same definition for the boundary
line (i.e. it is a very fuzzy boundary).

It is like the old saw about all progress being due to unreasonable
people, since reasonable people accept the status quo.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Misc: Design tradeoffs in virtual memory systems...

<j9ocM.254897$LAYb.20065@fx02.iad>


https://news.novabbs.org/devel/article-flat.php?id=32360&group=comp.arch#32360

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Newsgroups: comp.arch
References: <u4por8$3tugb$1@dont-email.me> <Ia3cM.3440031$iU59.2338510@fx14.iad> <u4qntf$1fqa$1@dont-email.me> <PT6cM.355159$eRZ7.6952@fx06.iad> <c9b3bfb5-8c7a-4ad9-a9f4-5877bd4fec4en@googlegroups.com> <cD7cM.2199805$gGD7.233078@fx11.iad> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
Lines: 18
Message-ID: <j9ocM.254897$LAYb.20065@fx02.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 27 May 2023 14:08:15 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 27 May 2023 14:08:15 GMT
X-Received-Bytes: 1553
 by: Scott Lurndal - Sat, 27 May 2023 14:08 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Friday, May 26, 2023 at 2:19:40 PM UTC-5, Scott Lurndal wrote:
>> MitchAlsup <Mitch...@aol.com> writes:

>> >> Unlike the CPUs below, x86 and arm are continuously
>> >> being enhanced with new features (e.g. secure conclaves,
>> >> realms, new instructions, etc).
>> ><
>> >You use the word "enhance" in a way contrary to the dictionary definition..
>>
>> There must have been demand for them from someone.
><
>A billion demands from a billion different people does not a Mona Lisa make.

I don't believe I've ever claimed that the x86 architecture
is a "Mona Lisa". It has, however, been quite successful.

Re: Misc: Design tradeoffs in virtual memory systems...

<obocM.254898$LAYb.126941@fx02.iad>

https://news.novabbs.org/devel/article-flat.php?id=32361&group=comp.arch#32361

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Newsgroups: comp.arch
References: <u4por8$3tugb$1@dont-email.me> <c9b3bfb5-8c7a-4ad9-a9f4-5877bd4fec4en@googlegroups.com> <cD7cM.2199805$gGD7.233078@fx11.iad> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com> <u4r32e$mrq$1@reader2.panix.com>
Lines: 23
Message-ID: <obocM.254898$LAYb.126941@fx02.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 27 May 2023 14:10:28 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 27 May 2023 14:10:28 GMT
X-Received-Bytes: 1830
 by: Scott Lurndal - Sat, 27 May 2023 14:10 UTC

cross@spitfire.i.gajendra.net (Dan Cross) writes:
>In article <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>,
>MitchAlsup <MitchAlsup@aol.com> wrote:
>>On Friday, May 26, 2023 at 2:19:40 PM UTC-5, Scott Lurndal wrote:
>>> >You use the word "enhance" in a way contrary to the dictionary definition..
>>>
>>> There must have been demand for them from someone.
>>
>>A billion demands from a billion different people does not a Mona Lisa make.
>
>While I'm sure we can all agree that the x86 is a dog's
>breakfast, that does not imply that all of its features are bad.
>Nor does it imply that a hardware page-table walker is bad.
>
>The OP seems to think so, but has yet to provide a particularly
>compelling argument beyond some measurements from very
>unrepresentative workloads running on a hobby ISA on an FPGA.
>

To be fair to BGB, if software TLBs work for his particular
hobby ISA, that's fine. It's the idea that software TLBs
are universally better than hardware page table walkers that
doesn't stand up to scrutiny.

Re: Misc: Design tradeoffs in virtual memory systems...

<SrqcM.2167850$MVg8.198396@fx12.iad>

https://news.novabbs.org/devel/article-flat.php?id=32362&group=comp.arch#32362

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx12.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com> <u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com> <u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me> <bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
In-Reply-To: <bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 16
Message-ID: <SrqcM.2167850$MVg8.198396@fx12.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 27 May 2023 16:44:34 UTC
Date: Sat, 27 May 2023 12:44:23 -0400
X-Received-Bytes: 1379
 by: EricP - Sat, 27 May 2023 16:44 UTC

MitchAlsup wrote:
> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
>>
>> I am aware of my own existence, and am able to recognize my own
>> reflection in a mirror, etc.
> <
> Cuttlefish are the boundary line on self recognition.

Possibly cleaner wrasse too.

Fish can recognize themselves in photos,
further evidence they may be self-aware, 6-Feb-2023
https://www.sciencenews.org/article/fish-recognize-photo-self-aware

Re: Misc: Design tradeoffs in virtual memory systems...

<2023May27.191829@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=32363&group=comp.arch#32363

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 17:18:29 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 98
Message-ID: <2023May27.191829@mips.complang.tuwien.ac.at>
References: <u4por8$3tugb$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="3d3ae75f349545b0de86902be6235221";
logging-data="538248"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/0EoZ97eXZGp1xZEHSFAy7"
Cancel-Lock: sha1:ClQlAwGU9NvXdvprgATjJ7A004w=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sat, 27 May 2023 17:18 UTC

BGB <cr88192@gmail.com> writes:
>The topic came up elsewhere, where people were arguing that
>software-managed TLB was bad/useless and that (supposedly) no modern CPU
>architecture would consider using it.
....
>Trying to use a large memory footprint at present will tend to either
>cause lots of L2 cache-misses, or page faults, either of which will
>"ruin one's day" a lot faster than the TLB misses.

What's your page size and the number of TLB entries?

With 4KB pages, and an L2 TLB with 3072 entries like Zen4 has, the TLB
can map 12MB; why would you get page faults before getting TLB misses?
As for L2 cache misses, Zen 4's L2 cache (1MB) contains 16384 lines,
so if your data has low spatial locality (e.g., large strides) you can
easily get a situation where the data fits in the L2 cache, but you
have a TLB miss on every access.
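
For reference, the reach figures above work out as follows. This is only a back-of-envelope sketch: the 64-byte cache line size is an assumption (it is what 1MB / 16384 lines implies), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space a TLB can map at once: entries * page size. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return entries * page_bytes;
}

/* Number of lines in a cache of the given size.
   line_bytes = 64 is an assumption, not stated in the post. */
static uint64_t cache_lines(uint64_t cache_bytes, uint64_t line_bytes)
{
    assert(line_bytes > 0);
    return cache_bytes / line_bytes;
}
```

tlb_reach_bytes(3072, 4096) gives 12MB and cache_lines(1 << 20, 64) gives 16384. If every one of those lines sat on a different 4KB page, covering them would take 16384 TLB entries, far more than 3072.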

One example of a program that suffers from TLB misses is the following
matrix multiplication kernel:

for (j=0; j<p; j++)
  for (k=0; k<m; k++)
    for (i=0; i<n; i++)
      c[i*p+j]+=a[i*m+k]*b[k*p+j];

Two of the array accesses per inner iteration of the loop are
performed with relatively large strides. If the stride is larger than
a page size, you need two TLB entries per iteration of the inner loop.
With 3072 entries, you get TLB misses once n>1536. At n=2000, you can
be sure to get two TLB misses per iteration.
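
The page count per pass of the inner loop can be checked with a small helper. This is a sketch under assumptions that the post does not state: 8-byte elements (double), 4KB pages, and arrays that start on a page boundary. The name pages_touched is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct pages touched by n accesses to one array when the
   index advances by stride_elems elements per iteration.
   Assumes the array starts on a page boundary. */
static size_t pages_touched(size_t n, size_t stride_elems,
                            size_t elem_bytes, size_t page_bytes)
{
    assert(page_bytes > 0);
    size_t count = 0;
    size_t last_page = (size_t)-1;   /* sentinel: no page seen yet */
    for (size_t i = 0; i < n; i++) {
        /* Addresses only grow, so a new page shows up as a change. */
        size_t page = (i * stride_elems * elem_bytes) / page_bytes;
        if (page != last_page) {
            count++;
            last_page = page;
        }
    }
    return count;
}
```

With n = p = m = 2000, each strided array (c and a) touches 2000 pages per pass of the inner loop, so the two together need 4000 entries, more than the 3072 available. With a unit stride the same 2000 accesses would stay within 4 pages.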

And with software TLB miss handling, the cost of a TLB miss on an OoO
CPU is substantial.

>So, it looks like for architectures I can find information on:
>  Hardware Managed (Page-Table):
>    x86 / x86-64
>    ARM / ARM64
>    RISC-V (Privileged ISA Spec)
>  Software Managed TLB:
>    SuperH (SH-4 and SH-5)
>    MIPS
>    SPARC
>    Alpha
>    Power and PowerPC
>    PA-RISC
>    Itanium / IA-64
>    (Hybrid, Supported RAM-Backed TLB)
>    BJX2
>    ...
>  Unknown:
>    M68K
>    PDP / VAX

AFAIK Power/PowerPC and PA-RISC have inverted page tables as a
hardware feature. My understanding is that with software TLB miss
handling, the software defines what the page tables look like
(although having some parts of the PTEs have the same format as TLB
entries saves time).
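
To make the point about software-defined tables concrete, here is a rough sketch of a software TLB-miss refill. Everything in it is hypothetical: the two-level layout, the 10/10/12 bit split, the direct-mapped 64-entry TLB, and the names are all choices the software is free to make, not the format of any real ISA. The leaf PTE is copied into the TLB unchanged, which is the time saving mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12          /* 4KB pages (assumed) */
#define L2_BITS     10          /* index bits for the leaf table (assumed) */
#define L1_ENTRIES  1024        /* root table size (assumed) */
#define TLB_ENTRIES 64          /* direct-mapped software model (assumed) */
#define PTE_VALID   1u

typedef struct { uint32_t vpn; uint32_t pte; int valid; } tlb_entry;
typedef struct { tlb_entry e[TLB_ENTRIES]; } tlb;

/* Software-chosen page table: root array of pointers to leaf PTE arrays. */
typedef struct { uint32_t *leaf[L1_ENTRIES]; } page_table;

/* Returns 1 if the TLB was refilled, 0 if the page is not mapped
   (the handler would then escalate to a page fault). */
static int tlb_miss_refill(tlb *t, const page_table *pt, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t i1 = vpn >> L2_BITS;
    uint32_t i2 = vpn & ((1u << L2_BITS) - 1);
    assert(i1 < L1_ENTRIES);

    const uint32_t *leaf = pt->leaf[i1];
    if (leaf == NULL || !(leaf[i2] & PTE_VALID))
        return 0;

    tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];
    slot->vpn = vpn;
    slot->pte = leaf[i2];       /* PTE format == TLB entry format: plain copy */
    slot->valid = 1;
    return 1;
}
```

Swapping the two-level lookup for a hash probe would turn this into an inverted-table scheme without touching the hardware, which is the freedom the software-managed approach was advertised as providing.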

My impression is that at the start of virtual memory it was a hardware
feature, with hardware dealing with the translation completely (and
TLBs were a microarchitectural, not architectural feature). E.g., the
68851 (the MMU) was a coprocessor to the 68020.

It's only with the RISC revolution that the architects thought: Now
that we have offloaded complex addressing modes to compilers etc.,
what else can we do to unburden the hardware. And they introduced
software-managed TLBs. The early RISCs needed it sorely, in order to
fit on a die, but they also tried to spin this as a feature by
claiming that the OS developer has the freedom to use different memory
management approaches (and at the time this had not been settled; in
the meantime it has: hierarchical page tables is it, inverted page
tables lost).

But even among the RISCs not everyone went for a software-managed TLB.
As mentioned above, Power and PA-RISC went for inverted page tables.
Looking at the 88100 manual, I see no mention of a TLB; it does
mention that the 88200 (a companion chip) contains an MMU, so either I
would have to look in the 88200 manual, or the first 88k
implementation has hardware-managed TLBs.

And with OoO, the case for hardware-managed TLBs is even stronger:
With software-managed TLBs, a TLB miss would result in a complete
pipeline reset, like in a branch misprediction, a pretty expensive
operation. And the miss code tends to be not that much faster than
for a single-issue CPU, but the frequency of TLB misses increases with
IPC. So the proportion of TLB misses in total execution time (already
marginal in single-issue RISCs) would increase. At the same time we
now have more hardware to throw at hardware-managing TLBs.
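
The IPC argument can be put into a toy model. This is a sketch, not a measurement: it assumes a fixed number of TLB misses per instruction and a fixed cost in cycles per miss (handler plus pipeline refill), and all numbers fed into it are made up.

```c
#include <assert.h>

/* Fraction of total cycles spent handling TLB misses.
   base_ipc:        instructions per cycle when no TLB miss occurs
   misses_per_insn: TLB misses per executed instruction
   miss_cycles:     cycles lost per miss (handler + pipeline refill) */
static double tlb_miss_fraction(double base_ipc, double misses_per_insn,
                                double miss_cycles)
{
    assert(base_ipc > 0.0);
    double work = 1.0 / base_ipc;                  /* useful cycles per insn */
    double stall = misses_per_insn * miss_cycles;  /* miss cycles per insn */
    return stall / (work + stall);
}
```

With one miss per 1000 instructions and 50 cycles per miss, tlb_miss_fraction(1.0, 0.001, 50.0) is a bit under 5%, while tlb_miss_fraction(4.0, 0.001, 50.0) is close to 17%: the same miss cost takes a larger share once the rest of the work runs faster.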

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Misc: Design tradeoffs in virtual memory systems...

<u4th9g$gfjc$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32364&group=comp.arch#32364

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 13:14:02 -0500
Organization: A noiseless patient Spider
Lines: 540
Message-ID: <u4th9g$gfjc$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me>
<ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me>
<4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me>
<881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me>
<6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
<e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>
<u4sian$c97s$1@dont-email.me>
<80f66bbb-6d75-4237-b6e4-ced7c3967c25n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 18:14:08 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="540268"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19lnb8+8It/Hq9lzSbps8og"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:oJqJKhbvKfSXTL2XViqevSBzFpU=
Content-Language: en-US
In-Reply-To: <80f66bbb-6d75-4237-b6e4-ced7c3967c25n@googlegroups.com>
 by: BGB - Sat, 27 May 2023 18:14 UTC

On 5/27/2023 6:39 AM, robf...@gmail.com wrote:
> On Saturday, May 27, 2023 at 5:25:47 AM UTC-4, BGB wrote:
>> On 5/26/2023 11:00 PM, robf...@gmail.com wrote:
>>> On Friday, May 26, 2023 at 10:09:44 PM UTC-4, MitchAlsup wrote:
>>>> On Friday, May 26, 2023 at 7:50:36 PM UTC-5, BGB wrote:
>>>>> On 5/26/2023 5:17 PM, MitchAlsup wrote:
>>>>>> On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
>>>>>>> On 5/26/2023 3:31 PM, MitchAlsup wrote:
>>>>>>
>>>>>>>>>> My 66000 is not attackable by those means--and probably can
>>>>>>>>>> avoid needing ASLR although there is no reason not to use it as
>>>>>>>>>> it slows nothing down.
>>>>>>>> <
>>>>>>>>> I had assumed ASLR as a "standard line of defense".
>>>>>>>> <
>>>>>>>> If/when the unprivileged cannot access super-address-space no matter
>>>>>>>> the bit pattern created at AGEN, ASLR is not needed, as there is no way
>>>>>>>> the user can access memory not mapped by his page tables.
>>>>>> <
>>>>>>> ASLR can help protect against things like userspace code doing
>>>>>>> buffer-overflow exploits against system calls or intentionally mangled
>>>>>>> data structures (along with things like marking stack and heap memory
>>>>>>> and similar as non-executable, ...).
>>>>>> <
>>>>>> What if the control-flow information is not accessible to the LDs and STs
>>>>>> of an ISA ?? That is, you can buffer overflow all you want, but when you
>>>>>> execute RET you end up back at who called that subroutine ???
>>>>>> <
>>>>>> My 66000 architecture has 2 stack pointers, one for the data stack and
>>>>>> one for the control flow and protected registers stack. The one for the
>>>>>> protected register stack is not accessible to the program except through
>>>>>> ENTER and EXIT and RET instructions and the pages are marked RWE = 000
>>>>>> and HW verifies that those pages are so marked (at least at user privilege).
>>>>>> <
>>>>>> So, you can damage as much data memory as you like, but return from a
>>>>>> subroutine will end up at the instruction after the call AND all of the
>>>>>> preserved registers have their old values.
>>>>> Yeah. Only one stack in my case.
>>>>>
>>>>> There are stack canaries though, which can detect if a buffer overflow
>>>>> had overwritten the canary value.
>>>>>
>>>>> Had at one point considered a feature to hash the state of all of the
>>>>> preserved registers, and then verify that everything was intact. No real
>>>>> practical way to do this check in-hardware though.
>>>>>>>
>>>>>>> Partly as buffer-overflow and many forms of "confused deputy" attacks
>>>>>>> are not stopped by conventional memory-access protections, but can be
>>>>>>> made ineffective by using ASLR.
>>>>>>>
>>>>>> Look, I am NOT arguing that ASLR is bad, just that if the ISA and MMU
>>>>>> is properly defined ASLR brings nothing MORE to the table. I am arguing
>>>>>> that the need for ASLR is simply indicative of bad architecture.
>>>>> OK.
>>>>>
>>>>> Most systems use it at least, and it is reasonably cheap.
>>>> <
>>>> Most systems are based on X86 or ARM--both of which have these kinds
>>>> of problems.
>>>>>
>>>>> In an ideal world, maybe it would be unnecessary, but as I see it, it is
>>>>> likely well worth its cost on this front.
>>>>>>>
>>>>>>> Similar goes for the "compiler shuffles all the functions on each
>>>>>>> rebuild", etc.
>>>>>> <
>>>>>> Oh, and BTW, the user privilege is not allowed to write GOT and My 66000
>>>>>> does not need a PLT, either. Cutting off even more attack vectors.
>>>> <
>>>>> OK.
>>>>>
>>>>> I am mostly using direct branches for local calls, and had intended
>>>>> Abs48 branches for DLL imports (but, with the drawback that Abs48
>>>>> branches can't encode Inter-ISA branches).
>>>>>
>>>>> Though, it is possible I could consider also allowing a Jumbo-Load +
>>>>> Register Branch sequence for DLL imports, which is capable of InterISA,
>>>>> and/or always leave at least 128 bits.
>>>> <
>>>> What I did was to invent a LD IP,[address] instruction. This transfers control
>>>> to the address in the accessed location and has the side effect of delivering
>>>> the return address to R0.
>>>>>
>>>>> It seems reasonable that class VTable's and similar could be put into
>>>>> read-only pages.
>>>>>
>>>> Once the method call tables have been assembled--those tables should
>>>> have write permission removed.
>>>>>
>>>>> Still leaves a concern though if a program could compose "malicious COM
>>>>> objects" or similar and then put them somewhere where they are not
>>>>> supposed to go.
>>>>>
>>>>> Though, this is an area where (in theory) keyring/ACL checks could help,
>>>>> since (ideally) if neither party can execute each others' ".text"
>>>>> sections, then one can stop programs from sneaking bad COM objects into
>>>>> OS APIs.
>>>> <
>>>> the first problem is that early PTE structures granted read and execute permission
>>>> as if they were the same. There are things one wants you only to be able to read
>>>> (.rodata for example); they are in memory, but you would never want to transfer
>>>> control there; similarly, one should not be able to read code (.text). Mixing these
>>>> two up created a whole slew of problems..............
>>>> <
>>>> the second problem is that there are more than 2 layers. You want a debugger
>>>> to be able to read code, the stack walker sometimes needs to read code, the
>>>> dynamic linker needs to be able to write what otherwise smells like .rodata--
>>>> while the application is not allowed to manipulate any of this, but of course
>>>> the operating system IS.
>>>> <
>>>> In the past we granted excess permission to smooth out the bumps, and
>>>> now we have college courses teaching students how to break existing
>>>> architectures, implementations, languages, networks,.....
>>>> <
>>>> We got it wrong up front, and now you are having to deal with the problem.
>>>>>
>>>>> Well, that or run lint-checks on any user-supplied objects and
>>>>> disallow ones with VTables or method pointers into writable memory. But,
>>>>> this requires the APIs to be "well designed", which is possibly asking a
>>>>> lot.
>>>>>>>
>>>>>>> In my case, there is a hardware RNG that can keep its random seed state
>>>>>>> on the SDcard. A "better" option would be some sort of NVRAM, but (much
>>>>>>> like a real-time clock), pretty much no normal FPGA boards have this.
>>>>>>>
>>>>>> Look, if you NEED ASLR to have a modicum of safety and protection,
>>>>>> GO FOR IT.
>>>>>> <
>>>>>> My 66000 has no such need.
>>>> <
>>>>> It isn't strictly needed in an architectural sense, but I don't trust
>>>>> security without it.
>>>>>
>>>>> Like, seeing how easy x86 systems have been to hack due to these sorts
>>>>> of issues (and the epic failure of "just write code that is not weak
>>>>> against buffer overflow").
>>>>>
>>>> My guess is that soon someone will find a hole in PCIe and be able to
>>>> attack pretty much ANY system (because PCIe is the I/O hub of every
>>>> system and thing). And just wait till you see what CXL addition to PCIe
>>>> will bring to the <attackers> game.
>>>>>
>>>>> Well, and the pros/cons that trying to invoke buffer-overflow exploits
>>>>> over USB was one of the major strategies for "jail breaking" or
>>>>> "rooting" cell-phones.
>>>>>
>>>>>
>>>>> Well, and as-is, TestKern is still kinda pathetic on this front.
>>>>>>>
>>>>>>> Would "almost" be nice if capabilities could be supported as well,
>>>>>>> except that there are some serious drawbacks with capabilities as well
>>>>>>> (to make them "actually effective" adds drawbacks, otherwise one could
>>>>>>> sidestep them, which would severely limit their effectiveness).
>>>>>> <
>>>>>> One way to think about My 66000 is that the user address space is a
>>>>>> capability, and that the user does not have a capability to GuestOS
>>>>>> address space, GuestOS does not have a capability to HyperVisor
>>>>>> address space,.....
>>>>>> <
>>>>>> It does not matter what bit pattern AGEN spits out, you cannot access
>>>>>> the greater privilege level's address spaces. Not from ISA, not from TLB,
>>>>>> not from Cache lines, not from watching a high precision timer.
>>>>>> <
>>>>>> Computer architecture is hard enough, there is no need to add even more
>>>>>> fuzz across a myriad of problem-spaces when you can simply design them
>>>>>> out beforehand.
>>>> <
>>>>> Hmm...
>>>>>
>>>>>
>>>>> Hard to keep things both "cheap" and "won't interfere with normal C
>>>>> coding practices".
>>>> <
>>>> Yes, modern languages are not really capable of dealing with capabilities.
>>>> Anything not a huge-flat address space is problematic with inter process
>>>> shared memory.
>>>>>>>
>>>>>>> There is a feature for bounds-checked pointers in BJX2 (encoding the
>>>>>>> bounds in the tag bits), but it falls well short of what would be needed
>>>>>>> for an actual capability architecture.
>>>>>> <
>>>>>> I access bounds checking via the CMP instruction.
>>>> <
>>>>> I have the LEAT and BNDCHK instructions.
>>>>>
>>>>> Where:
>>>>> LEAT.x (Rm, Ri), Rn
>>>>> Behaves like a LEA, but also adjusts the bounds.
>>>>> BNDCHK.x Rm, Rn
>>>>> Will raise a fault if the access is out-of-bounds.
>>>>>
>>>>> Whereas, the normal LEA.x will zero the tag bits.
>>>>>
>>>>> For the XLEA.x and XMOV.x instructions, bound-adjustment and checking
>>>>> are the default behaviors (if bounds-checking is enabled in the ISA).
>>>>>
>>>>> But, unlike a capability machine, nothing prevents a program from
>>>>> twiddling the tag bits. A true capability machine would require a way to
>>>>> tag the registers and memory to prevent the program from twiddling these
>>>>> bits.
>>>> <
>>>> That twiddling is not inexpensive, and preventing all misuse is like understanding
>>>> the Gordian knot.
>>>>>
>>>>> But, at the moment one is like "well, we can assign 2 bits to every
>>>>> 128-bit pair to separate pointers from other data", a massive crap-storm
>>>>> is unleashed.
>>>> <
>>>> But people want and need every bit in every register and container.
>>>> If you want tags, you put them elsewhere.
>>>>>
>>>>>
>>>>> Sadly, not enough bits to make them "not suck" or to encode both bounds
>>>>> and an element type.
>>>>>>>>>>>
>>>>>>>>>> <snip>
>>>>>>>>>>>
>>>>>>>>>>> I guess one possible way would be to organize the ACL table as a
>>>>>>>>>>> page-table like structure, say:
>>>>>>>>>>> (31:16): KRR_ID
>>>>>>>>>>> (15: 0): ACLID
>>>>>>>>>> <
>>>>>>>>>> In principle, you could have a different page hash table in memory for each
>>>>>>>>>> ASID.
>>>>>>>> <
>>>>>>>>> To clarify, ASID and ACLID are two different features:
>>>>>>>>> ASID is used for separating address spaces;
>>>>>>>>> ACLID is for per-page per-task access-rights checking.
>>>>>>>> <
>>>>>>>> Most people place ACLID in the PTE and some in the hierarchy of
>>>>>>>> PTPs and PTE.
>>>>>>>>
>>>>>>> There are some protection flags in the PTEs as well, but they only cover
>>>>>>> the traditional User/Supervisor and "Global RWX" state.
>>>>>> <
>>>>>> Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
>>>>>> their applications (the normal way) by having the user program share
>>>>>> page tables and "Optimizing" the TLB using the G-bit. Now that you have
>>>>>> 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
>>>>>> the TLB mappings, and causing a host of other problems.
>>>>>> <
>>>>>> Now, consider an ASID system where the GuestOSs use different ASIDs
>>>>>> and now all those G-bit problems disappear ! the TLB remains unconfused,
>>>>>> and the memory hierarchy knows what to do. HyperVisors do this to the
>>>>>> old MMU models, new architectures should solve these problems without
>>>>>> creating new ones.
>>>> <
>>>>> OK.
>>>>>
>>>>>
>>>>> I guess it came up elsewhere that apparently hypervisors differ more
>>>>> from emulators than initially thought.
>>>> <
>>>> And evolving away at a rapid pace.
>>>>>
>>>
>>> Hey, I had hardware page table walking for both hierarchical and hashed page tables,
>>> but decided to scrap it. It seems that was a bad choice now, should have kept it. Had
>>> the entire inverted hash page table for 512MB dram implemented in block RAM so
>>> walking the table was ultra fast compared to using DRAM. One issue was limited
>>> hardware budget. If the hardware budget were unlimited I would put whatever was
>>> possible into hardware. I think a software managed table requires less hardware.
>>> Another issue IIRC was having the page tables in virtual memory and having to
>>> walk VM, handling double misses etc. It was getting complex.
>>>
>> I didn't have a page-walker.
>>
>>
>> But, as noted, my design evolved out of another ISA which also didn't
>> have a page walker.
>>
>> Well, and the BJX2 TLB design isn't that far removed from the SuperH TLB...
>>
>>
>> Does seem simpler, since one doesn't really need to handle the situation
>> in hardware, more like.
>>
>> TLB miss:
>> Turn RAM request into a TLB Miss response;
>> Generate an exception-code and throw it at the CPU.
>>
>> Getting interrupt handling to work reliably (and not prone to crash or
>> deadlock the CPU in any number of ways) is the main challenge.
>>> I am not quite sure I understand the issue with having software managed tables. It
>>> seems to me software can be used to fake out almost anything in hardware. What
>>> if the table walking were performed at the highest hardware operating level so that
>>> lower level could not distinguish it from hardware?
>>>
>> The above was my proposal as well...
>> Like, if one has a dedicated CPU register for holding the page-table,
>> and CPU/ABI defined page-table layouts, why not?...
>>
>> One does end up with both TLB Miss and Page Fault exceptions, with the
>> TLB-Miss handler potentially needing to re-throw it as a Page-Fault in
>> these cases, but, ...
>>> Compilers can do bottom-up compiles. Things can successfully be built from the
>>> bottom up and I think that is maybe what must be done when one does not know in
>>> advance what one is building. It is also useful as a learning experience. If one already
>>> has a wide knowledge base available then it is better to go from the top down,
>>> making use of the knowledge already present.
>>>
>> Yeah.
>>
>> My C compiler started out long ago as an interpreter for a JavaScript
>> clone...
>>
>> Which started originally as "I have an XML-RPC implementation, built on
>> DOM; originally for shoe-horning XML-RPC requests through the Jabber-IM
>> Chat Protocol".
>>
>> Sorta hacked the XML-RPC implementation to work as a full interpreter,
>> and threw a JS-like parser on it... But, it sucked...
>>
>>
>> Then it was turned into a crude C interpreter, but "kinda sucked".
>> Then it was a "parse headers for metadata" / FFI glue tool for a while.
>>
>> Partly, in another direction, I had taken a Scheme interpreter I had
>> written earlier, and then write a JS-like parser on top of this. This
>> was at least "less terrible" than the one built on a hacked XML-RPC, and
>> more useful as a script language.
>>
>> But, the C compiler mostly ended up as an FFI tool for this interpreter
>> (wrapping up designated C functions into a form that the interpreter
>> could call into; preferable to writing piles of boilerplate to do this
>> stuff manually).
>>
>> Many years went by...
>>
>>
>> Then, I wrote an SH-4 backend.
>>
>> Realized SH-4 was kind of a pain, started "fixing" some of the
>> annoyances, and adding features. This became BJX1.
>>
>> Then redesigned BJX1 into BJX2, which was a moving target for a while.
>> Compiler is kind of a mess as different parts were written during
>> different eras of the ISA's evolution.
>>
>> Some parts are sort of an awkward SH-4 -> BJX1 -> Early BJX2 -> Newer
>> BJX2 translation chain, with a lot of bit-twiddling and "mutators" along
>> the way.
>>
>>
>> Early on, I was like:
>> Well, SH-4 is a simple ISA with simple 16-bit instructions, I will just
>> generate code directly as 16-bit SH-4 words and emit these into the text
>> section...
>>
>> Some years, and a whole lot of bit-twiddly and mutator stages later,
>> "yeah, that was a mistake".
>>
>> Would have been better to code-gen into an ASM-like format, and then
>> translate from ASM to BJX2, possibly via a listing table (more like I
>> had used in my past x86 related tools), but, alas...
>>
>> Could rewrite the backend, but this has always been more effort than
>> "just hack it some more".
>>
>>
>>
>> Despite deriving from SH-4, seemingly it convergently evolved in a
>> direction much more like that of RISC-V. This wasn't really "by design",
>> just sorta happened.
>>
>> In some other areas, it had seemingly also converged towards IA-64 as well.
>>
>> Like, there are some sort of "invisible great attractors" around both
>> RISC-V and IA-64, where design fiddling seems to be pulled inexplicably
>> towards them in many areas...
>>
>> But, OTOH, so long as it doesn't turn into a crab, that is probably OK.
>> "My CPU has turned into a crab?..."
>> "My car has turned crab as well?"
>> Now drives sideways and has pincers, ...
>> Person looks down at their arms, driving the car, "Oh No!"
>>
>> ...
>
> I can remember being a crab-like fish at one point in a past life.
>


Re: Misc: Design tradeoffs in virtual memory systems...

<c9503e4f-61a3-4887-aac2-073da9f8fa81n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32365&group=comp.arch#32365

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:4510:b0:75b:161:6e62 with SMTP id t16-20020a05620a451000b0075b01616e62mr971032qkp.5.1685213781926;
Sat, 27 May 2023 11:56:21 -0700 (PDT)
X-Received: by 2002:a05:6830:1daa:b0:6b0:f7c:127e with SMTP id
z10-20020a0568301daa00b006b00f7c127emr1529434oti.1.1685213781664; Sat, 27 May
2023 11:56:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 27 May 2023 11:56:21 -0700 (PDT)
In-Reply-To: <e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1008:b673:73c4:8b61;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1008:b673:73c4:8b61
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<u4rk20$4qbn$1@dont-email.me> <6abdc3ae-e4af-45c8-a324-34ec96b61525n@googlegroups.com>
<e20fa77e-0bfe-4c6c-a771-a5d2bf964712n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c9503e4f-61a3-4887-aac2-073da9f8fa81n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 27 May 2023 18:56:21 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 56
 by: MitchAlsup - Sat, 27 May 2023 18:56 UTC

On Friday, May 26, 2023 at 11:00:28 PM UTC-5, robf...@gmail.com wrote:
> On Friday, May 26, 2023 at 10:09:44 PM UTC-4, MitchAlsup wrote:

> > > I guess it came up elsewhere that apparently hypervisors differ more
> > > from emulators than initially thought.
> > <
> > And evolving away at a rapid pace.
> > >
> Hey, I had hardware page table walking for both hierarchical and hashed page tables,
> but decided to scrap it. It seems that was a bad choice now, should have kept it. Had
> the entire inverted hash page table for 512MB dram implemented in block RAM so
> walking the table was ultra fast compared to using DRAM. One issue was limited
> hardware budget. If the hardware budget were unlimited I would put whatever was
> possible into hardware. I think a software managed table requires less hardware.
> Another issue IIRC was having the page tables in virtual memory and having to
> walk VM, handling double misses etc. It was getting complex.
>
> I am not quite sure I understand the issue with having software managed tables. It
> seems to me software can be used to fake out almost anything in hardware. What
> if the table walking were performed at the highest hardware operating level so that
> lower level could not distinguish it from hardware?
<
Consider a HyperVisor plus multiple GuestOS system.
<
You can't have the GuestOS perform the table walk or it would see HV data.
You can't have the HyperVisor perform the GuestOS part of the table walk or
you need a set of HyperVisor tables mimicking the GuestOS tables--wasting
memory.
In general, GuestOS uses 2-level paging while HV uses 1-level paging.
So, to walk the tables, you make 1 access to GuestOS virtual address space
followed by a hierarchy of requests to HyperVisor virtual address space, followed
by another access to GuestOS virtual address space,.......until done.
Most ISAs are not permissive of these different address space requests.
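
[Editorial sketch, not part of the original post.] The interleaving described above can be counted with a few lines of C. This assumes a fully uncached nested walk in which every guest table pointer, plus the final guest physical address, needs its own walk of the hypervisor tables. The function name and its parameters are hypothetical and are not taken from any particular ISA.

```c
#include <stdio.h>

/* Memory references for one nested (two-dimensional) table walk with no
 * caching of intermediate results.
 *   guest_levels: levels in the GuestOS page table (hypothetical parameter)
 *   host_levels:  levels in the HyperVisor page table (hypothetical parameter)
 * Each guest level costs one read of the guest PTE itself, and each guest
 * table pointer (plus the final guest physical address) costs one full
 * walk of the host tables. */
static unsigned nested_walk_refs(unsigned guest_levels, unsigned host_levels)
{
    unsigned host_walks = guest_levels + 1;  /* one per guest pointer, plus the final address */
    return host_walks * host_levels + guest_levels;
}
```

Under these assumptions, nested_walk_refs(2, 1) gives 5 references and nested_walk_refs(4, 4) gives 24, which is why real implementations tend to cache intermediate translations.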
>
> Compilers can do bottom-up compiles. Things can successfully be built from the
> bottom up and I think that is maybe what must be done when one does not know in
> advance what one is building. It is also useful as a learning experience. If one already
> has a wide knowledge base available then it is better to go from the top down,
> making use of the knowledge already present.

Re: Misc: Design tradeoffs in virtual memory systems...

<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32366&group=comp.arch#32366

X-Received: by 2002:ac8:5c88:0:b0:3f0:abe7:24a2 with SMTP id r8-20020ac85c88000000b003f0abe724a2mr1474724qta.10.1685214552854;
Sat, 27 May 2023 12:09:12 -0700 (PDT)
X-Received: by 2002:a05:6870:b1d1:b0:19a:12aa:e3b8 with SMTP id
x17-20020a056870b1d100b0019a12aae3b8mr1375216oak.4.1685214552531; Sat, 27 May
2023 12:09:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 27 May 2023 12:09:12 -0700 (PDT)
In-Reply-To: <SrqcM.2167850$MVg8.198396@fx12.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1008:b673:73c4:8b61;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1008:b673:73c4:8b61
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com> <SrqcM.2167850$MVg8.198396@fx12.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 27 May 2023 19:09:12 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 15
 by: MitchAlsup - Sat, 27 May 2023 19:09 UTC

On Saturday, May 27, 2023 at 11:45:42 AM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
> >>
> >> I am aware of my own existence, and am able to recognize my own
> >> reflection in a mirror, etc.
> > <
> > Cuttlefish are the boundary line on self recognition.
> Possibly cleaner wrasse too.
>
> Fish can recognize themselves in photos,
> further evidence they may be self-aware, 6-Feb-2023
> https://www.sciencenews.org/article/fish-recognize-photo-self-aware
<
I used cuttlefish because they are the lowest creature that "looks back"
when you look at them.

Re: Misc: Design tradeoffs in virtual memory systems...

<b3258d33-89c0-4d9a-a191-3725e59b7646n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32367&group=comp.arch#32367

X-Received: by 2002:a05:620a:4606:b0:75c:9c99:6e13 with SMTP id br6-20020a05620a460600b0075c9c996e13mr943894qkb.5.1685214977846;
Sat, 27 May 2023 12:16:17 -0700 (PDT)
X-Received: by 2002:aca:c209:0:b0:394:1a7c:61b9 with SMTP id
s9-20020acac209000000b003941a7c61b9mr1074183oif.5.1685214977592; Sat, 27 May
2023 12:16:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.1d4.us!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 27 May 2023 12:16:17 -0700 (PDT)
In-Reply-To: <2023May27.191829@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1008:b673:73c4:8b61;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1008:b673:73c4:8b61
References: <u4por8$3tugb$1@dont-email.me> <2023May27.191829@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b3258d33-89c0-4d9a-a191-3725e59b7646n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 27 May 2023 19:16:17 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4219
 by: MitchAlsup - Sat, 27 May 2023 19:16 UTC

On Saturday, May 27, 2023 at 1:06:16 PM UTC-5, Anton Ertl wrote:
> BGB <cr8...@gmail.com> writes:
>
> AFAIK Power/PowerPC and PA-RISC have inverted page tables as a
> hardware feature. My understanding is that with software TLB miss
> handling, the software defines what the page tables look like
> (although having some parts of the PTEs have the same format as TLB
> entries saves time).
>
> My impression is that at the start of virtual memory it was a hardware
> feature, with hardware dealing with the translation completely (and
> TLBs were a microarchitectural, not architectural feature). E.g., the
> 68451 (the MMU) was a coprocessor to the 68020.
>
> It's only with the RISC revolution that the architects thought: Now
> that we have offloaded complex addressing modes to compilers etc.,
> what else can we do to unburden the hardware. And they introduced
> software-managed TLBs. The early RISCs needed it sorely, in order to
> fit on a die, but they also tried to spin this as a feature by
> claiming that the OS developer has the freedom to use different memory
> management approaches (and at the time this had not been settled; in
> the meantime it has: hierarchical page tables is it, inverted page
> tables lost).
>
> But even among the RISCs not everyone went for a software-managed TLB.
> As mentioned above, Power and PA-RISC went for inverted page tables.
> Looking at the 88100 manual, I see no mention of a TLB; it does
> mention that the 88200 (a companion chip) contains an MMU, so either I
> would have to look in the 88200 manual, or the first 88k
> implementation has hardware-managed TLBs.
<
Yes, because we put the FPU on the chip we ended up in a position where
we needed another chip, and we elected to put the L1 cache and TLB
and table walker on that chip. The CPU never knew a table walk was
happening, just that this request was taking longer than expected.
>
> And with OoO, the case for hardware-managed TLBs is even stronger:
> With software-managed TLBs, a TLB miss would result in a complete
> pipeline reset, like in a branch misprediction, a pretty expensive
> operation.
<
It does not HAVE TO, but almost always does.
<
> And the miss code tends to be not that much faster than
> for a single-issue CPU, but the frequency of TLB misses increases with
> IPC. So the proportion of TLB misses in total execution time (already
> marginal in single-issue RISCs) would increase. At the same time we
> now have more hardware to throw at hardware-managing TLBs.
<
In multiprocessor systems you also update the hash-table map using
locks to guard the updates from each other.
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: Misc: Design tradeoffs in virtual memory systems...

<u4tl51$h0v0$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32368&group=comp.arch#32368

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 14:19:55 -0500
Organization: A noiseless patient Spider
Lines: 234
Message-ID: <u4tl51$h0v0$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<2023May27.191829@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 27 May 2023 19:20:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="558048"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18oIz8CQByw8TElw2Fz0rEp"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:I5lcpF1t/LEQx/fJx+6llRElA1I=
Content-Language: en-US
In-Reply-To: <2023May27.191829@mips.complang.tuwien.ac.at>
 by: BGB - Sat, 27 May 2023 19:19 UTC

On 5/27/2023 12:18 PM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> The topic came up elsewhere, where people were arguing that
>> software-managed TLB was bad/useless and that (supposedly) no modern CPU
>> architecture would consider using it.
> ...
>> Trying to use a large memory footprint at present will tend to either
>> cause lots of L2 cache-misses, or page faults, either of which will
>> "ruin one's day" a lot faster than the TLB misses.
>
> What's your page size and the number of TLB entries?
>

Page Size: 16K
Main TLB Size: 256x 4-way, so 1024 total TLB entries.

Covers an area of roughly 16MB (in 48-bit VA mode, or 8MB in 96-bit VA
mode).

Would drop to 4MB coverage in the 4K page mode.
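
[Editorial sketch, not part of the original post.] The coverage figures above follow from sets x ways x page size. A minimal C helper, with a hypothetical function name, shows the arithmetic. It does not model the 8MB figure for 96-bit VA mode, since the post does not say why the reach halves there.

```c
#include <stdint.h>

/* TLB reach in bytes: every entry maps exactly one page, so the reach is
 * the entry count (sets * ways) times the page size. */
static uint64_t tlb_reach_bytes(uint32_t sets, uint32_t ways, uint32_t page_bytes)
{
    return (uint64_t)sets * ways * page_bytes;
}
```

For example, tlb_reach_bytes(256, 4, 16384) is 16MB and tlb_reach_bytes(256, 4, 4096) is 4MB, matching the numbers quoted above.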

The TLB only supports a single active "primary" page size at a time. So,
using any 4K pages would require dropping the page-size to 4K (a 16K
page-table would then trigger a TLB miss for every 4K piece, so the ISR
needs to behave correctly in this case).

In this case, the TLB design can't accommodate heterogeneous page sizes
directly.

Granted, such a TLB is possibly "abnormally large" for a small CPU core
(roughly half the Block RAM of one of the L1 caches).

But, 256x gave OK performance, and L1 cache size also seems to be pretty
important (the size and performance of the L1 caches having a fairly
large impact on performance, followed closely by the L2 cache).

Unlike some architectures, a single Main TLB is shared between both the
L1 D$ and I$ (they may have smaller TLBs and skip the main TLB, but this
is optional).

In this case, the L1 caches are virtually indexed/tagged (and
direct-mapped), so the TLB is mostly only relevant to L1 misses.

The L2 cache, however, is physically tagged.

Have experimented with both direct-mapped and 2-way L2 cache, seems to
depend on workload which is "better" (Doom and similar seem to strongly
prefer a direct-mapped L2 cache; other programs seem to favor a
set-associative L2 cache; but the difference isn't that large either
way, so it more comes down to direct-mapped also being cheaper).

> With 4KB pages, and an L2 TLB with 3072 entries like Zen4 has, the TLB
> can map 12MB; why would you get page faults before getting TLB misses?
> As for L2 cache misses, Zen 4's L2 cache (1MB) contains 16384 lines,
> so if your data has low spatial locality (e.g., large strides) you can
> easily get a situation where the data fits in the L2 cache, but you
> have a TLB miss on every access.
>

OK.

In the current configuration (for an XC7A200T):
L2: 512K
RAM assigned to pagefile-backed memory: 32MB
May consider expanding to 64MB with 256MB total RAM.
Most of the rest is direct-use for now.
Currently 2 CPU cores.
Could potentially go 3 cores on this FPGA.
And/or, design a more specialized GPU or something.
Or, hell, maybe a "proper" (1) dedicated RISC-V core...

But, yeah, when the area covered by the pagefile-backed RAM is only 2x
the size of the area covered by the TLB, by the time the TLB miss rate
starts to really grow all that much, then one is getting page faults,
and the page-faults are around 3 orders of magnitude slower...

The board I was using previously had a 128MB RAM module.

Seems to be fairly uncommon for affordable FPGA boards to have much more
than 128MB or 256MB of RAM.

In pagefile cases, one can put a pagefile off in an SDcard, and this
gets an additional several GB or so (albeit, all accessed through a 32MB
or so window).

*1: As-in, implementing RISC-V with the Privileged ISA stuff in place so
that it could potentially run an OS designed to run on RISC-V (as
opposed to awkward/stalled attempts to run it in user-mode, and being
frustrated by GCC's lack of ability to produce PIE binaries for RISC-V,
which would actually be needed in this case given how TestKern works...).

But, say, if I am limited to using RISC-V mode for little more than
trying to run Dhrystone at boot-time or similar, this is "kinda useless"
(and, comparatively, the BJX2 core is "sorely lacking" if one wanted to
use it for exclusively running RISC-V...).

> One example of a program that suffers from TLB misses is the following
> matrix multiplication kernel:
>
> for (j=0; j<p; j++)
> for (k=0; k<m; k++)
> for (i=0; i<n; i++)
> c[i*p+j]+=a[i*m+k]*b[k*p+j];
>
> Two of the array accesses per inner iteration of the loop are
> performed with relatively large strides. If the stride is larger than
> a page size, you need two TLB entries per iteration of the inner loop.
> With 3072 entries, you get TLB misses once n>1536. At n=2000, you can
> be sure to get two TLB misses per iteration.
>
> And with software TLB miss handling, the cost of a TLB miss on an OoO
> CPU is substantial.
>
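
[Editorial sketch, not part of the original post.] The n>1536 threshold quoted above can be checked with a small C helper. It assumes that both strided accesses (c and a) land on a new page on every inner-loop iteration, that the single page used by b[k*p+j] is ignored, and that the TLB behaves as if fully associative. The function names are hypothetical.

```c
#include <stddef.h>

/* Distinct pages touched by one pass of the inner i-loop when each of
 * `strided_arrays` arrays steps by at least one page per iteration. */
static size_t inner_loop_pages(size_t n, size_t strided_arrays)
{
    return n * strided_arrays;
}

/* Nonzero if one pass of the inner loop fits in the TLB, so the next
 * pass can reuse the same entries without missing. */
static int inner_loop_fits_tlb(size_t n, size_t tlb_entries)
{
    return inner_loop_pages(n, 2) <= tlb_entries;
}
```

With 3072 entries, n = 1536 still fits while n = 1537 does not, which matches the quoted claim.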

OK.

In my case, I was assuming in-order core designs...
As noted, my ISA is a 3-wide VLIW designed with the assumption of a
strictly in-order core and "very naive" decode and execute stages.

TLB misses are slow, but tend to be relatively infrequent in the current
setup.

I was more specifically assuming "well, say, imagine a world where OoO
is mostly off the table...".

>
>
>> So, it looks like for architectures I can find information on:
>> Hardware Managed (Page-Table):
>> x86 / x86-64
>> ARM / ARM64
>> RISC-V (Privileged ISA Spec)
>> Software Managed TLB:
>> SuperH (SH-4 and SH-5)
>> MIPS
>> SPARC
>> Alpha
>> Power and PowerPC
>> PA-RISC
>> Itanium / IA-64
>> (Hybrid, Supported RAM-Backed TLB)
>> BJX2
>> ...
>> Unknown:
>> M68K
>> PDP / VAX
>
> AFAIK Power/PowerPC and PA-RISC have inverted page tables as a
> hardware feature. My understanding is that with software TLB miss
> handling, the software defines what the page tables look like
> (although having some parts of the PTEs have the same format as TLB
> entries saves time).
>

Yeah. Seems the line between Software TLB and Inverted Page Table gets a
bit fuzzy...

If I were to do anything, it would probably be an inverted page table,
since this is the "most obvious design extension".

> My impression is that at the start of virtual memory it was a hardware
> feature, with hardware dealing with the translation completely (and
> TLBs were a microarchitectural, not architectural feature). E.g., the
> 68451 (the MMU) was a coprocessor to the 68020.
>
> It's only with the RISC revolution that the architects thought: Now
> that we have offloaded complex addressing modes to compilers etc.,
> what else can we do to unburden the hardware. And they introduced
> software-managed TLBs. The early RISCs needed it sorely, in order to
> fit on a die, but they also tried to spin this as a feature by
> claiming that the OS developer has the freedom to use different memory
> management approaches (and at the time this had not been settled; in
> the meantime it has: hierarchical page tables is it, inverted page
> tables lost).
>
> But even among the RISCs not everyone went for a software-managed TLB.
> As mentioned above, Power and PA-RISC went for inverted page tables.
> Looking at the 88100 manual, I see no mention of a TLB; it does
> mention that the 88200 (a companion chip) contains an MMU, so either I
> would have to look in the 88200 manual, or the first 88k
> implementation has hardware-managed TLBs.
>

OK.

As noted, TestKern was mostly using page tables.
3-level for 48-bit mode;
Hybrid B-Tree for 96-bit mode (implausible for HW page walker).
Hash + N-level, could work for HW, so may consider.

There are dedicated CPU registers for the page tables, and defined
layouts, but the CPU doesn't itself use them. The original idea would
have been that it would be left up to the implementation whether or not
hardware or software page-table walking was used.

> And with OoO, the case for hardware-managed TLBs is even stronger:
> With software-managed TLBs, a TLB miss would result in a complete
> pipeline reset, like in a branch misprediction, a pretty expensive
> operation. And the miss code tends to be not that much faster than
> for a single-issue CPU, but the frequency of TLB misses increases with
> IPC. So the proportion of TLB misses in total execution time (already
> marginal in single-issue RISCs) would increase. At the same time we
> now have more hardware to throw at hardware-managing TLBs.
>

OK.

As noted, I had assumed in-order VLIW, and was trying to optimize for
hardware resource budget.

I have ended up throwing a lot of LUTs at things like the FP-SIMD unit,
but like, some of my use-cases have a need for fast FP-SIMD (for
something like a neural-net, the performance difference is significant).


Re: Misc: Design tradeoffs in virtual memory systems...

<97e71967-9856-4550-8797-b4e66c33edffn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32369&group=comp.arch#32369

X-Received: by 2002:a05:620a:46a0:b0:75b:264e:a1c7 with SMTP id bq32-20020a05620a46a000b0075b264ea1c7mr945131qkb.12.1685216808869;
Sat, 27 May 2023 12:46:48 -0700 (PDT)
X-Received: by 2002:aca:3d46:0:b0:399:ed2a:5604 with SMTP id
k67-20020aca3d46000000b00399ed2a5604mr1214231oia.2.1685216808671; Sat, 27 May
2023 12:46:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 27 May 2023 12:46:48 -0700 (PDT)
In-Reply-To: <2023May27.191829@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:1008:b673:73c4:8b61;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:1008:b673:73c4:8b61
References: <u4por8$3tugb$1@dont-email.me> <2023May27.191829@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <97e71967-9856-4550-8797-b4e66c33edffn@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sat, 27 May 2023 19:46:48 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 33
 by: MitchAlsup - Sat, 27 May 2023 19:46 UTC

On Saturday, May 27, 2023 at 1:06:16 PM UTC-5, Anton Ertl wrote:

> One example of a program that suffers from TLB misses is the following
> matrix multiplication kernel:
>
> for (j=0; j<p; j++)
> for (k=0; k<m; k++)
> for (i=0; i<n; i++)
> c[i*p+j]+=a[i*m+k]*b[k*p+j];
>
> Two of the array accesses per inner iteration of the loop are
> performed with relatively large strides. If the stride is larger than
> a page size, you need two TLB entries per iteration of the inner loop.
> With 3072 entries, you get TLB misses once n>1536. At n=2000, you can
> be sure to get two TLB misses per iteration.
<
In the days of the 88120 (6-wide GBOoO), for the first 2 billion instructions of MATRIX300
we were getting 5.99 I/C, then there was a phase change where we transitioned
to only 1.5 I/C--when we looked into it, we were taking a TLB miss every loop,
then another 2 B instructions in, performance dropped to 0.6 I/C and we were
taking a TLB miss every 'cycle'.......
<
The above was with a 64-entry fully associative TLB. A 256-entry direct-mapped
TLB solved the problem for MATRIX300 but would not solve the problem for
modern number crunching.
<
In general I prefer DGEMM over matrix multiplication when analyzing performance
on great big number crunching.

Re: Misc: Design tradeoffs in virtual memory systems...

<u4tn2h$h7q5$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32370&group=comp.arch#32370

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 14:52:43 -0500
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <u4tn2h$h7q5$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com>
<66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
<SrqcM.2167850$MVg8.198396@fx12.iad>
<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 19:52:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="565061"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1968Wj++CApUv5KsEmkXHky"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:8ogItw5btV7nHFrXcMxLg77HYU8=
In-Reply-To: <1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 27 May 2023 19:52 UTC

On 5/27/2023 2:09 PM, MitchAlsup wrote:
> On Saturday, May 27, 2023 at 11:45:42 AM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
>>>>
>>>> I am aware of my own existence, and am able to recognize my own
>>>> reflection in a mirror, etc.
>>> <
>>> Cuttlefish are the boundary line on self recognition.
>> Possibly cleaner wrasse too.
>>
>> Fish can recognize themselves in photos,
>> further evidence they may be self-aware, 6-Feb-2023
>> https://www.sciencenews.org/article/fish-recognize-photo-self-aware
> <
> I used cuttlefish because they are the lowest creature that "looks back"
> when you look at them.

Yeah, I think cuttlefish, squid, octopus, etc, are basically among the
smartest invertebrates.

While insects are also "kinda impressive" for what they can do with such
tiny brains, apparently insect brains tend to be essentially hard-wired
(once they reach adult form) and their brain-cells often tend to
jettison most such non-essential features as a cell nucleus or most
other related organelles (similar to red blood cells in mammals).

These sorts of tradeoffs likely result in some limitations as well.

....

Re: Misc: Design tradeoffs in virtual memory systems...

<u4tof7$hd5c$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32371&group=comp.arch#32371

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Sat, 27 May 2023 15:16:33 -0500
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <u4tof7$hd5c$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<2023May27.191829@mips.complang.tuwien.ac.at>
<97e71967-9856-4550-8797-b4e66c33edffn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 27 May 2023 20:16:39 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c682f1875e0ef9e418a489a2e17601b8";
logging-data="570540"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18XVjX4smKpbjQcuibPQXaf"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:5OywKaHuhDv8rYIGl6NBykj3Ra4=
In-Reply-To: <97e71967-9856-4550-8797-b4e66c33edffn@googlegroups.com>
Content-Language: en-US
 by: BGB - Sat, 27 May 2023 20:16 UTC

On 5/27/2023 2:46 PM, MitchAlsup wrote:
> On Saturday, May 27, 2023 at 1:06:16 PM UTC-5, Anton Ertl wrote:
>
>> One example of a program that suffers from TLB misses is the following
>> matrix multiplication kernel:
>>
>> for (j=0; j<p; j++)
>> for (k=0; k<m; k++)
>> for (i=0; i<n; i++)
>> c[i*p+j]+=a[i*m+k]*b[k*p+j];
>>
>> Two of the array accesses per inner iteration of the loop are
>> performed with relatively large strides. If the stride is larger than
>> a page size, you need two TLB entries per iteration of the inner loop.
>> With 3072 entries, you get TLB misses once n>1536. At n=2000, you can
>> be sure to get two TLB misses per iteration.
> <
> In the days of 88120 (6-wide GBOoO) the first 2 billion instructions of MATRIX300
> we were getting 5.99 I/C, then there was a phase change where we transitioned
> to only 1.5 I/C--when we looked into it, we were taking a TLB miss every loop,
> then another 2 B instructions in performance dropped to 0.6 I/C and we were
> taking a TLB miss every 'cycle'.......
> <
> The above was with a 64-entry fully associative TLB. A 256 Direct Mapped
> TLB solved the problem MATRIX300 but would not solve the problem for
> modern number crunching.
> <
> In general I prefer DGEMM over matrix multiplication when analyzing performance
> on great big number crunching.

In my past testing, numbers were something like, for Doom:
256x 4-way, 16K Page: ~ 50 TLB miss/sec;
64x 4-way, 16K Page: ~ 300 TLB miss/sec;
16x 4-way, 16K Page: ~ 2000 TLB miss/sec;

256x 4-way, 4K Page: ~ 200 TLB miss/sec;
64x 4-way, 4K Page: ~ 1200 TLB miss/sec;
16x 4-way, 4K Page: ~ 6000 TLB miss/sec (*);

256x 4-way, 64K Page: ~ 20 TLB miss/sec;
64x 4-way, 64K Page: ~ 120 TLB miss/sec;
16x 4-way, 64K Page: ~ 800 TLB miss/sec;

*: Say, only reason it isn't higher is, by this point, the ISR is
severely eating into CPU time and causing Doom performance to fall in
the toilet...

(These aren't current measurements; they are based more on my memory of
past patterns.)

Note that in this case, like the L1 caches, the TLB uses
modulo-addressing (so, say, 16K pages will use Addr(21:14) as the TLB
index and ignore all the other bits).
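
[Editorial sketch, not part of the original post.] The modulo addressing described above amounts to a shift and a mask. A minimal C version follows, with a hypothetical function name. It assumes a power-of-two set count with index_bits below 32.

```c
#include <stdint.h>

/* Set index for a modulo-addressed TLB: drop the page-offset bits, then
 * keep only the low `index_bits` bits of the virtual page number.
 * With 16K pages (page_shift = 14) and 256 sets (index_bits = 8), this
 * selects Addr(21:14) and ignores every other bit. */
static uint32_t tlb_set_index(uint64_t vaddr, unsigned page_shift, unsigned index_bits)
{
    uint32_t mask = (1u << index_bits) - 1u;
    return (uint32_t)(vaddr >> page_shift) & mask;
}
```

Two addresses that differ only above bit 21, such as 0x4000 and 0x404000, map to the same set and so compete for the same 4 ways.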

If the TLB size is increased to 1024x 4-way (or 4096 total TLB entries),
for Doom the miss-rate seems to quickly start dropping (after initial
start-up), converging towards around 0.5 to 2.0 TLB misses per second (and
it no longer cares much about page-size).

But, 1024x seemed probably overkill...

Re: Misc: Design tradeoffs in virtual memory systems...

<c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32372&group=comp.arch#32372

X-Received: by 2002:a05:622a:1487:b0:3f5:2698:7e7c with SMTP id t7-20020a05622a148700b003f526987e7cmr1546962qtx.10.1685218718482;
Sat, 27 May 2023 13:18:38 -0700 (PDT)
X-Received: by 2002:a05:6870:50e4:b0:19a:41fb:304f with SMTP id
s36-20020a05687050e400b0019a41fb304fmr1539131oaf.11.1685218718129; Sat, 27
May 2023 13:18:38 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 27 May 2023 13:18:37 -0700 (PDT)
In-Reply-To: <u4tn2h$h7q5$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=99.251.79.92; posting-account=QId4bgoAAABV4s50talpu-qMcPp519Eb
NNTP-Posting-Host: 99.251.79.92
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com> <SrqcM.2167850$MVg8.198396@fx12.iad>
<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com> <u4tn2h$h7q5$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: robfi680@gmail.com (robf...@gmail.com)
Injection-Date: Sat, 27 May 2023 20:18:38 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4356
 by: robf...@gmail.com - Sat, 27 May 2023 20:18 UTC

On Saturday, May 27, 2023 at 3:52:53 PM UTC-4, BGB wrote:
> On 5/27/2023 2:09 PM, MitchAlsup wrote:
> > On Saturday, May 27, 2023 at 11:45:42 AM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Friday, May 26, 2023 at 8:24:13 PM UTC-5, BGB wrote:
> >>>>
> >>>> I am aware of my own existence, and am able to recognize my own
> >>>> reflection in a mirror, etc.
> >>> <
> >>> Cuttlefish are the boundary line on self recognition.
> >> Possibly cleaner wrasse too.
> >>
> >> Fish can recognize themselves in photos,
> >> further evidence they may be self-aware, 6-Feb-2023
> >> https://www.sciencenews.org/article/fish-recognize-photo-self-aware
> > <
> > I used cuttlefish because they are the lowest creature that "looks back"
> > when you look at them.
> Yeah, I think cuttlefish, squid, octopus, etc, are basically among the
> smartest invertebrates.
>
>
> While insects are also "kinda impressive" for what they can do with such
> tiny brains, apparently insect brains tend to be essentially hard-wired
> (once they reach adult form) and their brain-cells often tend to
> jettison most such non-essential features as a cell nucleus or most
> other related organelles (similar to red blood cells in mammals).
>
> These sorts of tradeoffs likely result is some limitations as well.
>
> ...
Spiders are also quite intelligent. In another alternate lifetime or perhaps a
dream reality, I analyzed the DNA of spiders by noting they had an eye
associated with each leg then looking for DNA pattern that repeated eight
times. I can seem to remember alternate realities clear as day sometimes.

I hate to ask homework questions, but,

Am I on the right track thinking that hardware table walking for a table in a
VAS needs to be able to stack TLB misses? I have read through some web
pages, and they just give an overview of address translation, with a little
diamond on a translation diagram that says: ‘if in page table’. I think that
piece is not so simple.

Suppose there is a six-level hierarchical table. What if there is a TLB miss
in one of the levels while processing a TLB miss? It seems like one would
have to stack the misses, process the new one, then unstack and continue,
all in hardware. That is why I had relegated TLB miss processing to software
where it would be easier to handle. I suppose a custom hardware /
software combo co-processor could be built to handle the misses.

What if coincidentally the TLB misses all use the same TLB entry? It is
possible to have six misses in a row with a six-level table. So, if the TLB
is less than six-way associative at least one of the translations will be
overwritten.

Re: Misc: Design tradeoffs in virtual memory systems...

<44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32373&group=comp.arch#32373

Newsgroups: comp.arch
Date: Sat, 27 May 2023 14:19:54 -0700 (PDT)
In-Reply-To: <c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com>
From: MitchAlsup@aol.com (MitchAlsup)
 by: MitchAlsup - Sat, 27 May 2023 21:19 UTC

On Saturday, May 27, 2023 at 3:18:39 PM UTC-5, robf...@gmail.com wrote:
> On Saturday, May 27, 2023 at 3:52:53 PM UTC-4, BGB wrote:
>
> I hate to ask homework questions, but,
>
> Am I on the right track thinking that hardware table walking for a table in a
> VAS needs to be able to stack TLB misses? I have read through some web
> pages, and they just give an overview of address translation, with a little
> diamond on a translation diagram that says: ‘if in page table’. I think that
> piece is not so simple.
<
If the table walker finds a page fault mid-walk, it simply raises PAGEFAULT
and control transfers to the page fault handler.
<
When the page fault handler returns, the instruction that caused the table walk
is executed again, and the table walker re-walks the table.
>
> Suppose there is a six-level hierarchical table. What if there is a TLB miss
> in one of the levels while processing a TLB miss? It seems like one would
> have to stack the misses, process the new one, then unstack and continue,
> all in hardware. That is why I had relegated TLB miss processing to software
> where it would be easier to handle. I suppose a custom hardware /
> software combo co-processor could be built to handle the misses.
<
One can take a page fault at every level of the table walk.
And in 2-level paging (nested) you can take GuestOS page faults in
pages GuestOS manages, and HyperVisor page faults in those pages
the HV manages.
>
> What if, coincidentally, the TLB misses all use the same TLB entry?
<
Associativity > total levels solves this problem. VAX could get into situations
where it needed 14 PTEs simultaneously present to execute some exotic
instructions. With its set-associative TLB this was not always possible.
{Time for a VAX aficionado to tell us how it got out of these.}
<
> It is
> possible to have six misses in a row with a six-level table. So, if the TLB
> is less than six-way associative at least one of the translations will be
> overwritten.
<
PTPs are handled differently than PTEs. Only PTEs go in the TLB; a different
structure (possibly even flip-flops) can manage the sequencing of PTPs.
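
The fault-and-restart behaviour described above can be modelled in a few lines. This is only a toy sketch: the three-level depth, the 9-bit index fields, the dict-based tables, and the names PageFault, walk, handle_fault and access are all assumptions made for illustration, not any real MMU's layout. The point it shows is that the walker never stacks anything: it holds its position in a local variable, raises a fault at whichever level is missing, and the whole walk is simply redone once the handler returns.

```python
# Toy model only. LEVELS, BITS_PER_LEVEL and the dict-based tables are
# illustrative assumptions, not a description of real hardware.
LEVELS = 3            # assumed depth; the thread discusses up to six
BITS_PER_LEVEL = 9    # assumed index width per level
PAGE_SHIFT = 12


class PageFault(Exception):
    """Raised by the walker when an entry is missing at some level."""

    def __init__(self, vaddr, level):
        super().__init__(f"fault at level {level} for {vaddr:#x}")
        self.vaddr = vaddr
        self.level = level


def indices(vaddr):
    """Split a virtual address into per-level table indices, root first."""
    vpn = vaddr >> PAGE_SHIFT
    mask = (1 << BITS_PER_LEVEL) - 1
    return [(vpn >> (BITS_PER_LEVEL * (LEVELS - 1 - i))) & mask
            for i in range(LEVELS)]


def walk(root, vaddr):
    """Walk from the root; a missing entry at any level raises PageFault."""
    node = root  # the walker's only state: where it currently is
    for level, idx in enumerate(indices(vaddr)):
        if idx not in node:
            raise PageFault(vaddr, level)
        node = node[idx]
    return node  # the leaf holds the physical frame number


def handle_fault(root, fault, next_frame):
    """Toy OS handler: populate exactly the level that faulted."""
    node = root
    idxs = indices(fault.vaddr)
    for idx in idxs[:fault.level]:
        node = node[idx]
    is_leaf = fault.level == LEVELS - 1
    node[idxs[fault.level]] = next_frame if is_leaf else {}


def access(root, vaddr, next_frame=0x1234):
    """Re-execute the 'instruction' until the walk completes; count faults."""
    faults = 0
    while True:
        try:
            return walk(root, vaddr), faults
        except PageFault as fault:
            faults += 1
            handle_fault(root, fault, next_frame)
```

Starting from an empty root table, a single access takes one fault per level and then succeeds; a second access to the same address takes none.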

Re: Misc: Design tradeoffs in virtual memory systems...

<u4uakj$jmlc$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32374&group=comp.arch#32374

From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Date: Sat, 27 May 2023 21:26:42 -0400
In-Reply-To: <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>
 by: Paul A. Clayton - Sun, 28 May 2023 01:26 UTC

MitchAlsup wrote:
[snip]
> PTPs are handled differently than PTEs. Only PTEs go in the TLB; a different
> structure (possibly even flip-flops) can manage the sequencing of PTPs.

Quick niggling response: Linear page tables (table stored in the
virtual address space) would naturally store intermediate nodes in
the TLB (in fact, the page table base could, in theory, be such a
TLB-cached node). Placing such in an L2 TLB or a "side" TLB seems
attractive for avoiding interference/capacity waste given more
frequent 'ordinary' translation requests.

Also, there is some attraction *to me* for
storing PTPs and equivalent-node large page translations in the
same structure as this reduces the disincentive for large caches
(TLBs) for large pages. (There may be some disadvantages of such
shared storage. E.g., perhaps PTPs might benefit from sharing the
tag among multiple entries differently than large-page PTEs.
Complexity would also be a factor.)

(I hope to post my own thoughts on hardware TLB fill sometime.)
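
To make the linear page table point concrete, here is a small sketch of why intermediate nodes end up in the TLB under that scheme. Every parameter is an assumption chosen for easy arithmetic (32-bit addresses, 4 KiB pages, 4-byte PTEs, and a hypothetical table base PT_BASE at 0xC0000000); none of them come from the thread. Because the table lives in virtual memory, the address of a PTE is itself a virtual address that needs translating, and so on until the chain reaches an entry that maps its own page.

```python
# Assumed parameters, picked only to keep the arithmetic readable.
PAGE_SHIFT = 12        # 4 KiB pages
PTE_SIZE = 4           # bytes per page table entry
PT_BASE = 0xC0000000   # hypothetical virtual base of the linear table


def pte_vaddr(vaddr):
    """Virtual address of the PTE that maps `vaddr` in a linear table."""
    return PT_BASE + (vaddr >> PAGE_SHIFT) * PTE_SIZE


def translation_chain(vaddr):
    """PTE addresses needed to translate `vaddr`, innermost first.

    Each PTE is itself fetched through a virtual address, so translating
    it needs the next entry in the chain. The chain stops at the entry
    that maps its own page (the self-referencing root entry).
    """
    chain = []
    addr = vaddr
    while True:
        nxt = pte_vaddr(addr)
        if chain and nxt == chain[-1]:
            return chain
        chain.append(nxt)
        addr = nxt
```

With these assumed parameters, translation_chain(0x00401234) yields three addresses: the leaf PTE, the entry that maps the leaf PTE's page, and the self-mapping root entry. The last one is the kind of node that could sit permanently in a TLB or side structure in place of a page table base register.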

Re: Misc: Design tradeoffs in virtual memory systems...

<u4udgc$nka0$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32375&group=comp.arch#32375

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Date: Sat, 27 May 2023 21:15:34 -0500
In-Reply-To: <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>
 by: BGB - Sun, 28 May 2023 02:15 UTC

On 5/27/2023 4:19 PM, MitchAlsup wrote:
> On Saturday, May 27, 2023 at 3:18:39 PM UTC-5, robf...@gmail.com wrote:
>> On Saturday, May 27, 2023 at 3:52:53 PM UTC-4, BGB wrote:
>>
>> I hate to ask homework questions, but,
>>
>> Am I on the right track thinking that hardware table walking for a table in a
>> VAS needs to be able to stack TLB misses? I have read through some web
>> pages, and they just give an overview of address translation, with a little
>> diamond on a translation diagram that says: ‘if in page table’. I think that
>> piece is not so simple.
> <
> If the table walker finds a page fault mid-walk, it simply raises PAGEFAULT
> and control transfers to the page fault handler.
> <
> When the page fault handler returns, the instruction that caused the table walk
> is executed again, and the table walker re-walks the table.

Yeah, sorta similar for a TLB miss.
It doesn't really work to try to queue up the TLB misses, or to deal
with all of them at once.

Rather, the first one raised is handled, and the others are ignored.
When the instruction is retried after the ISR returns, it may trigger
another TLB miss. This may continue until all have been resolved.

>>
>> Suppose there is a six-level hierarchical table. What if there is a TLB miss
>> in one of the levels while processing a TLB miss? It seems like one would
>> have to stack the misses, process the new one, then unstack and continue,
>> all in hardware. That is why I had relegated TLB miss processing to software
>> where it would be easier to handle. I suppose a custom hardware /
>> software combo co-processor could be built to handle the misses.
> <
> One can take a page fault at every level of the table walk.
> And in 2-level paging (nested) you can take GuestOS page faults in
> pages GuestOS manages, and HyperVisor page faults in those pages
> the HV manages.
>>
>> What if, coincidentally, the TLB misses all use the same TLB entry?
> <
> Associativity > total levels solves this problem. VAX could get into situations
> where it needed 14 PTEs simultaneously present to execute some exotic
> instructions. With its set-associative TLB this was not always possible.
> {Time for a VAX aficionado to tell us how it got out of these.}
> <

Yeah, multiple memory accesses in the same instruction would be pretty
bad in this sense.

It can be pointed out that I used a 4-way set-associative TLB.
Why, if I could have just used 1-way or 2-way, which is cheaper?

Because things can get bad unless one uses an associative
mapping here (and 4-way was basically the practical minimum I could
really get away with in my earlier tests).

Though, I can point out that this is related to the use of
modulo-addressing for the TLB. Hashing the TLB index effectively doubles
the associativity needed to avoid certain "problem cases".

Even if, taken in isolation, an XOR-hashed index looks "better", and a
modulo index intuitively seems like a rather poor solution, hashing adds
an ugly edge case: what happens when H(A+0)==H(A+1)?

Of course, the 96-bit mode drops it to 2-way, which (with a modulo
index) is effectively the minimum one can have without running into
"deadlock scenarios". But it is still not ideal, as it can lead to cases
which have an adverse effect on TLB miss rate (pages mutually knocking
each other out of the TLB and resulting in unreasonably large spikes in
TLB miss rate).

But, here, a "better" solution (if the idea is that 96-bit mode will see
serious use) would be either widening the TLBEs internally to 256 bits
(such that 96-bit mode can remain 4-way) or widening the 48-bit case to
8-way.

I am left to wonder if my experiences are related to noticing that other
systems with set-associative software managed TLBs also typically have
either 4 or 8 way; but seemingly never 1 or 2 way...
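
The H(A+0)==H(A+1) edge case can be shown with a small sketch. The actual hash used in the design above is not stated, so the XOR fold below (xor_index), the 64-set TLB size, and the function names are assumptions chosen purely for illustration. With a modulo index, two adjacent pages (for example the two halves of a page-crossing access) can never land in the same set; with an XOR-folded index they occasionally do, so one more way may be needed in that set to hold everything a single instruction touches.

```python
# Assumed geometry and hash; the real design's hash is not given.
INDEX_BITS = 6
SETS = 1 << INDEX_BITS   # 64 sets


def modulo_index(vpn):
    """Set index taken directly from the low bits of the page number."""
    return vpn % SETS


def xor_index(vpn):
    """Assumed hash: fold the next INDEX_BITS bits onto the low bits."""
    return (vpn ^ (vpn >> INDEX_BITS)) % SETS


def adjacent_collisions(index_fn, limit):
    """Page numbers below `limit` whose successor maps to the same set."""
    return [vpn for vpn in range(limit)
            if index_fn(vpn) == index_fn(vpn + 1)]
```

Under these assumptions, adjacent_collisions(modulo_index, n) is empty for any n, while the XOR fold collides for page 0x7FF and its neighbour 0x800 (both map to set 0x20). A different hash would move the collisions elsewhere but, in general, not eliminate them.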

>> It is
>> possible to have six misses in a row with a six-level table. So, if the TLB
>> is less than six-way associative at least one of the translations will be
>> overwritten.
> <
> PTPs are handled differently than PTEs. Only PTEs go in the TLB; a different
> structure (possibly even flip-flops) can manage the sequencing of PTPs.

Makes sense.

Re: Misc: Design tradeoffs in virtual memory systems...

<6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32376&group=comp.arch#32376

Newsgroups: comp.arch
Date: Sat, 27 May 2023 19:28:03 -0700 (PDT)
In-Reply-To: <u4uakj$jmlc$1@dont-email.me>
From: MitchAlsup@aol.com (MitchAlsup)
 by: MitchAlsup - Sun, 28 May 2023 02:28 UTC

On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
> MitchAlsup wrote:
> [snip]
> > PTPs are handled differently than PTEs. Only PTEs go in the TLB; a different
> > structure (possibly even flip-flops) can manage the sequencing of PTPs.
<
> Quick niggling response: Linear page tables (table stored in the
> virtual address space) would naturally store intermediate nodes in
> the TLB (in fact, the page table base could, in theory, be such a
> TLB-cached node). Placing such in an L2 TLB or a "side" TLB seems
> attractive for avoiding interference/capacity waste given more
> frequent 'ordinary' translation requests.
<
We would put PTPs in a table-walk accelerator (TWA) cache which could
be searched at the same time as the TLB. Thus, with a couple of hits in
the TWA, by the time you know the TLB missed you could already
have the address of the missing PTE. In Ross Colorado chips the
TWA has 8 entries.
>
> Also, there is some attraction *to me* for
> storing PTPs and equivalent-node large page translations in the
> same structure as this reduces the disincentive for large caches
> (TLBs) for large pages.
<
Putting PTPs in the TLB is a waste of TLB entries--a precious resource.
<
> (There may be some disadvantages of such
> shared storage. E.g., perhaps PTPs might benefit from sharing the
> tag among multiple entries differently than large-page PTEs.
> Complexity would also be a factor.)
<
In the past I have held lines of PTPs in TWA so that nearby accesses
do not need L2 accesses.
>
> (I hope to post my own thoughts on hardware TLB fill sometime.)
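
A rough software model of the table-walk accelerator idea follows. It is a sketch under stated assumptions: the four-level layout, 9-bit index fields, 8-byte entries, the LRU replacement policy, and the names WalkCache, deepest_hit and entry_address are all invented for illustration; only the default capacity of 8 echoes the figure mentioned above. The cache maps a virtual address prefix to the physical base of the table reached by that prefix, so the longest matching prefix tells the walker where to resume. Real hardware would probe all depths in parallel with the TLB lookup; the loop here is sequential only because it is software.

```python
from collections import OrderedDict

# Assumed table geometry, not taken from any particular architecture.
LEVELS = 4
BITS_PER_LEVEL = 9
PAGE_SHIFT = 12
PTE_SIZE = 8


class WalkCache:
    """Small LRU cache: (depth, VA prefix) -> physical base of that table.

    Depth d means the table reached after consuming the top d index
    fields; depth LEVELS - 1 is the leaf table that holds PTEs.
    """

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()

    @staticmethod
    def prefix(vaddr, depth):
        """Top `depth` index fields of the virtual page number."""
        vpn = vaddr >> PAGE_SHIFT
        return vpn >> (BITS_PER_LEVEL * (LEVELS - depth))

    def insert(self, vaddr, depth, table_base):
        key = (depth, self.prefix(vaddr, depth))
        self.entries[key] = table_base
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def deepest_hit(self, vaddr):
        """Return (depth, table_base) for the longest prefix hit, or None."""
        for depth in range(LEVELS - 1, 0, -1):
            key = (depth, self.prefix(vaddr, depth))
            if key in self.entries:
                self.entries.move_to_end(key)
                return depth, self.entries[key]
        return None


def entry_address(table_base, vaddr, depth):
    """Physical address of the entry that `vaddr` selects at `depth`."""
    vpn = vaddr >> PAGE_SHIFT
    shift = BITS_PER_LEVEL * (LEVELS - 1 - depth)
    index = (vpn >> shift) & ((1 << BITS_PER_LEVEL) - 1)
    return table_base + index * PTE_SIZE
```

A hit at depth LEVELS - 1 gives the address of the missing PTE directly via entry_address, which is the case where the walk costs a single memory access; a shallower hit still skips the upper levels.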

