Rocksolid Light - comp.arch - Re: Misc: Design tradeoffs in virtual memory systems...

Re: Misc: Design tradeoffs in virtual memory systems...

<u52f73$1fvov$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32420&group=comp.arch#32420

by: Paul A. Clayton - Mon, 29 May 2023 15:09 UTC

On 5/28/23 5:33 PM, John Levine wrote:
[snip]
> There are a few mistakes that kill an architecture, with the #1 being
> that the address space is too small, and for those of us old enough,
> that it didn't have 8-bit byte addressing and twos-complement
> arithmetic. Other than that you can throw hardware at it and make it
> fast enough.

I wonder if it would possible even to extend the address space by
throwing hardware at the problem. Merely doubling the size of all
operands *might* work (though I think doubling memory size, memory
bandwidth, and cache size would be considered highly impractical).
(Bandwidth and cache might be addressed somewhat by compression.)
This would also seem to require that the ISA distinguishes pointer
arithmetic (at least for a byte-addressable ISA).

["throw hardware at it" may have been intended to exclude memory
capacity expansion; doubling *system* cost seems a less reasonable
option.]

(Doubling the size of everything and adding load/store "half-byte"
high/low instructions _might_ be made to work. Old software
running in "doubled word size" mode would be a memory hog, but
_might_ work without modification. Transferring data between
systems would be "interesting".)

(Only doubling "word" access sizes would mess with structure
offsets, which are typically hard-coded into the program and do
not distinguish between accumulated sub-word length and count of
word lengths. Encoding addresses as separate word and "byte"
counts would avoid this problem but seems problematic — offset
reach would be reduced for a given number of bits as well as other
complications.)

Tagged memory would also allow arbitrary sizes. (I seem to recall
that there was one architecture that supported arbitrary precision
arithmetic with in-memory tags. Making caches for a memory-memory
architecture work almost as well as registers would be
challenging.)

I suspect a group of clever people could come up with ways to
kludge an address space fix.

Running out of address space is a convenient reason to
update/change an ISA. Making heroic efforts to deal with
increasing 'word'/address size while allowing old programs to
fully exploit the larger size seems unwise.

Re: Misc: Design tradeoffs in virtual memory systems...

<2023May29.171026@mips.complang.tuwien.ac.at>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32422&group=comp.arch#32422

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 15:10:26 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 39
Message-ID: <2023May29.171026@mips.complang.tuwien.ac.at>
References: <u4por8$3tugb$1@dont-email.me> <SrqcM.2167850$MVg8.198396@fx12.iad> <1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com> <u4tn2h$h7q5$1@dont-email.me> <c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com> <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com> <u4uakj$jmlc$1@dont-email.me> <6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com> <u4vjtf$sn61$1@dont-email.me> <6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com> <u50ng3$12k82$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="e0ebe82afba4af75848a26f81b17e38f";
logging-data="1572782"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9B3AieaMD1AyHzyOiIbM/"
Cancel-Lock: sha1:t070uYSl/gEgkgT/1Arorm5c6BE=
X-newsreader: xrn 10.11

by: Anton Ertl - Mon, 29 May 2023 15:10 UTC

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>Including large pages and base pages in a single TLB has
>traditionally been done using CAMs, which are relatively
>expensive.

My impression was that larger pages were implemented by masking away
the low bits of the comparison. Of course, if you have mixed entries,
whether you have to mask away the low bits depends on the contents of
the entry, which increases the latency, so it's no surprise that they
are often separated for the L1 TLB.

>It looks like Intel's Sunny Cove combines hash/rehash for the L2
>TLB, fully associative (CAM) and multi-size for L1I TLBs, and
>separate size-based TLBs for L1D
>(https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/2 ):
>
> Page # of
> Size(s) Entries Associativity
>L1D 4K 64 4-way
>L1D 2M 32 4-way
>L1D 1G 8 full
>L1I 4K+2M 8 full
>L1I 4K+2M+1G 16 full
>L2 4K+2M 1024 8-way
>L2 4K+1G 1024 8-way
>
>It is slightly interesting that the L2 is divided into two
>sections, allowing three sizes with only hash/rehash (i.e., not
>requiring a third probing). The L1I TLB is also divided into two
>parts with one supporting all three page sizes (and having twice
>as many entries) while the other only supports the two smaller
>page sizes.

I wonder what the motivation for 1GB pages for code is.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>On 5/28/23 5:33 PM, John Levine wrote:
>[snip]
>> There are a few mistakes that kill an architecture, with the #1 being
>> that the address space is too small, and for those of us old enough,
>> that it didn't have 8-bit byte addressing and twos-complement
>> arithmetic. Other than that you can throw hardware at it and make it
>> fast enough.

Although it's worth noting that the B5500 descendents are still
operational after over half a century later, albeit emulated
in modern incarnations. 48-bit words. Likewise Univac with
36-bit words and 9-bit bytes.

https://en.wikipedia.org/wiki/Burroughs_Large_Systems#B6500,_B6700/B7700,_and_successors

>
>Tagged memory would also allow arbitrary sizes. (I seem to recall
>that there was one architecture that supported arbitrary precision
>arithmetic with in-memory tags. Making caches for a memory-memory
>architecture work almost as well as registers would be
>challenging.)

https://en.wikipedia.org/wiki/Burroughs_Large_Systems#Tagged_architecture

>Running out of address space is a convenient reason to
>update/change an ISA. Making heroic efforts to deal with
>increasing 'word'/address size while allowing old programs to
>fully exploit the larger size seems unwise.

Burroughs did that with e-mode in the Large systems about 20
years after introduction.

https://web.archive.org/web/20130521171833/http://jack.hoa.org/hoajaa/b5900.htm

We also did that in Medium systems, likewise about 20 years
after they were introduced in 1966. Programs were limited to
500Kbytes (1000 kilodigits) before the architecture upgrade.
After the upgrade the limit was good for the next
century[*] (1,000,000 environments containing 100,000,000 digits each
per task), in variable sized units (pages) of up to 1,000,000 digits
each. With full backward compatability for applications compiled in
1967 where the source deck had been lost for a decade.

[*] The entire line was discontinued[**] in 1992 :-(.

[**] A bit of a medium systems pun, since the command to kill
a job was the DS command - DiScontinue.

Re: Misc: Design tradeoffs in virtual memory systems...

<X%3dM.555286$qjm2.466341@fx09.iad>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32425&group=comp.arch#32425

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx09.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com> <u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com> <u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me> <bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com> <SrqcM.2167850$MVg8.198396@fx12.iad> <1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com> <u4tn2h$h7q5$1@dont-email.me> <c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com> <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com> <u4uakj$jmlc$1@dont-email.me> <6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com> <u4vjtf$sn61$1@dont-email.me> <6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com> <u50ng3$12k82$1@dont-email.me>
In-Reply-To: <u50ng3$12k82$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 141
Message-ID: <X%3dM.555286$qjm2.466341@fx09.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 29 May 2023 16:01:59 UTC
Date: Mon, 29 May 2023 12:01:43 -0400
X-Received-Bytes: 8463

by: EricP - Mon, 29 May 2023 16:01 UTC

Paul A. Clayton wrote:
> On 5/28/23 1:06 PM, MitchAlsup wrote:
>> On Sunday, May 28, 2023 at 8:12:42 AM UTC-5, Paul A. Clayton wrote:
>>> On 5/27/23 10:28 PM, MitchAlsup wrote:
>>>> On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
> [snip]
>>>>> Also, there is some attraction *to me* for
>>>>> storing PTPs and equivalent-node large page translations in the
>>>>> same structure as this reduces the disincentive for large caches
>>>>> (TLBs) for large pages.
>>>> <
>>>> PTPs in the TLB is a waste of TLB entries--a precious resource.
>> <
>>> For large page TLBs, there is a problem of utilization. If no
>>> large pages are used, the area (and power) is wasted. By including
>>> PTPs, most of the area and static power is not wasted (though
>>> dynamic power will be due to accessing this special TLB for every
>>> access — unless the large page TLB access can be delayed until
>>> after a small page TLB miss is noted [prediction might also be
>>> used]).
>> <
>> So put large page PTEs in the TLB !! problem solved.
>
> Including large pages and base pages in a single TLB has
> traditionally been done using CAMs, which are relatively
> expensive. (For L2, hash/rehash has been used.) In addition to the
> difference in tag size and possibly data size, the indexing
> methods are not consistent.

Having multiple pages sizes stored in the same TLB requires
a Ternary-CAM (TCAM) which has 0,1 and dont_care X entries.
Obviously a TCAM is more complex than a Binary CAM (BCAM)
(especially for FPGA's).

A TCAM that stores two fixed size tags accomplishes the same
as having two BCAMs, one for each tag size.
BCAM is more efficient because each only stores 0 or 1 match values
and only for the number of bits each tag requires.

> "put the large page PTEs in the TLB" [with the base page PTEs]
> solves the utilization issue in terms of entries but increases
> latency for large pages if hash/rehash is used (which latency can
> be at least partially hidden by using the extra untranslated bits
> for cache tag comparison and for L2 the latency would be less
> important) or uses more area from a CAM-based TLB. In either case,
> the difference in entry size is not exploited.
>
> (A small tag TLB with full-sized payload could also be used for
> base size PTEs with a the missing upper bits filled in by a
> constant [perhaps zero extension] or using some bits to reference
> a table. Such use seems problematic, but it might be worth at
> least researching. I seem to recall that x86 added an address
> space extension for 4 MiB pages with 32-bit PTEs, though I doubt
> such was used much.)
>
> I do not claim that having PTEs and PTPs of the same level share
> storage is an obviously preferred design, but I do think it
> presents a different balance of tradeoffs and should not be so
> easily dismissed (I doubt you or anyone else has actually studied
> this design choice). I admit that I promote it (to the extent I do) in
> large part because, as far as I know, I am the first and
> only person to notice this possibility. This *might* merely
> indicate that the idea is not useful since negative results are
> less commonly published even in academia, but I suspect that the
> idea is merely not interesting enough to draw research.

The most efficient radix tree table walker I know of caches interior
page table nodes using the relevant part of the virtual address as tag
for each level.

If there is a TLB miss it walks backwards up the interior node caches
checking the virtual address for a tag match at each level.

The interior node for the PTP page-table-pointer for a 4kB page
has the same tag address as the PTE page-table-entry for a 2MB page.

That is, virtual address bits <63:12> are the match tag for 4kB page.
Address <63:21> is the tag for both the interior node PTP and 2M-PTE
so these can both be stored together in their own BCAM.

The advantage of having two BCAMs, one for 4kB and one for 2MB+PTP pages is
(a) the tag match for both levels can be done in parallel with a single
port each, (b) each level BCAM only stores the match address bits
it actually requires.

It also means that 2MB-PTE and 4kB-PTP entries complete for space
in the level-2 TLB. I doubt that is a problem.

The design should also include the ability to skip upper tree levels
on the walk down - go straight from top to say level 2 (2MB) or 3 (1GB)
as 99% of apps will fit into a 1GB range.
The table only needs 1 skip, from top to level-N, so only the top
table entries need a level skip count.

> (My motivation for the idea was from the limited number of large
> page entries supported in some designs, introducing a chicken-and-
> egg issue — providing hardware resources for a less used feature
> would be wasteful, using a feature with less hardware resources is
> less useful. I also note that a PTP cache could be used for
> compressing tags in a TLB similar to Seznec's "Don't use the page
> number, but a pointer to it" but replacing 'page number' with 'x
> MiB region number' and applying to a TLB and not a general memory
> cache. While a version of Seznec's idea was used in Itanium 2's
> "prevalidated" L1 tags (which used a one hot bit to match a TLB
> number, the one hot bit format facilitating flash invalidation on
> TLB entry eviction), the utility of this kind of a design for a
> TLB seems doubtful.)
>
> It looks like Intel's Sunny Cove combines hash/rehash for the L2 TLB,
> fully associative (CAM) and multi-size for L1I TLBs, and separate
> size-based TLBs for L1D
> (https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/2 ):
>
> Page # of
> Size(s) Entries Associativity
> L1D 4K 64 4-way
> L1D 2M 32 4-way
> L1D 1G 8 full
> L1I 4K+2M 8 full
> L1I 4K+2M+1G 16 full
> L2 4K+2M 1024 8-way
> L2 4K+1G 1024 8-way
>
> It is slightly interesting that the L2 is divided into two sections,
> allowing three sizes with only hash/rehash (i.e., not requiring a third
> probing). The L1I TLB is also divided into two parts with one supporting
> all three page sizes (and having twice as many entries) while the other
> only supports the two smaller page sizes.

Last I heard all OS large page management allocated fixed size pools
at boot for each 4k/2M/1G page size and managed within those pools
(no breaking 2MB pages into 4kB and re-integrating them later).
So large pages would mostly be used only for well-known uses.

The 1GB pages are probably only used for mapping graphics memory and
some OS parts. Eg some OS overlay a virtual map on all of physical memory
so OS can access any physical address at an offset in that virtual area.
2MB pages would be used for OS code whose size is known at boot time.
Privileged apps like RDB's with large resident cache could use large pages.
Everything dynamically loaded or managed likely use 4kB pages.

It appears that Paul A. Clayton <paaronclayton@gmail.com> said:
>I wonder if it would possible even to extend the address space by
>throwing hardware at the problem. Merely doubling the size of all
>operands *might* work (though I think doubling memory size, memory
>bandwidth, and cache size would be considered highly impractical).
>(Bandwidth and cache might be addressed somewhat by compression.)
>This would also seem to require that the ISA distinguishes pointer
>arithmetic (at least for a byte-addressable ISA).

That's what x86 did. Going from 16 to 32 bits they made all the
registers twice as wide, and added a width prefix. There were 16 and
32 bit modes, but the prefixes let you use either size in either mode.
For 64 bit mode, they made the registers wider, added more registers,
kept the prefixes, and dropped a lot of the cruft nobody used any
more. On most 64 bit Unix-ish systems you can still run 32 bit
programs, with the operating system handling the width conversion in
system calls.

On s/360 the addresses were originally 24 bits, which was a mistake.
They did a kludge on s/370 with 31 bit addresses, using the high bit
of the address register to indicate which mode the address was in.
There was also a lot of complicated mode switching stuff to allow 31
bit code to call 24 bit code more or less transparently. For 64 bits
they made the registers wider, and added lots of new instructions for
64 bit operands.

A long time ago the DG Eclipse managed to shoehorn the 32 bit
instruction set into the holes in the Nova's 16 bit instruction
set so they could avoid a mode bit. (That was "Soul of a New
Machine.") Dunno anyone else who did that.

DEC did a very clever hack to extend the PDP-10's 18 bit word
addresses to 30 bits, roughly 1MB to 16MB, but by then the era of 36
bit machines was over and they put their effort into the VAX.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: Misc: Design tradeoffs in virtual memory systems...

<u52rs1$1j1nh$1@dont-email.me>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32428&group=comp.arch#32428

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 13:44:43 -0500
Organization: A noiseless patient Spider
Lines: 183
Message-ID: <u52rs1$1j1nh$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<Ia3cM.3440031$iU59.2338510@fx14.iad>
<u4vj9n$25m1c$1@newsreader4.netcologne.de>
<hHLcM.4041635$GNG9.1173433@fx18.iad> <u50hb0$ijs$1@gal.iecc.com>
<u50p50$12upf$1@dont-email.me> <KR1dM.3730494$vBI8.3130881@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 29 May 2023 18:45:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="35878edc648de3651095e2f15a816b94";
logging-data="1672945"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18s9MwEFNnjJAJNBkSkjBrq"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.2
Cancel-Lock: sha1:2T9+Vzxehzpz3IiaYorOAadXGK0=
In-Reply-To: <KR1dM.3730494$vBI8.3130881@fx15.iad>
Content-Language: en-US

by: BGB - Mon, 29 May 2023 18:44 UTC

On 5/29/2023 8:34 AM, Scott Lurndal wrote:
> BGB <cr88192@gmail.com> writes:
>> On 5/28/2023 4:33 PM, John Levine wrote:
>>> According to Scott Lurndal <slp53@pacbell.net>:
>>>>> Power10 became available in 2021, and Power11 (presumably) is on the
>>>>> way; patches are already appearing in gcc.
>>>>
>>>> Indeed, Power has been continuously enhanced since 1991. Which,
>>>> I believe is three decades ago. As for obsolescence, how many new
>>>> design wins has Power had in the last decade?
>>>
>>> I dunno, how many design wins has x86 had lately?
>>>
>>
>> I guess x86-64 was adopted for the PS4 and PS5, and XBox One, rather
>> than them sticking with PowerPC.
>>
>> Nintendo ended up mostly jumping from PowerPC to ARM.
>
> And apple switched from m68k to PPC to X86, then to ARM.
>

Yeah.

But, Apple apparently makes their own ARM cores as well.

Apparently Nintendo used a Tegra-X1 SOC in the Switch (with 4x
Cortex-A57 and 4x Cortex-A53).

I guess this is a little better than had they just thrown a 4x
Cortex-A53 at the problem (say, more like a typical cellphone).

So, abridged history something like:
NES 6502 / Ricoh 12AH
SNES 64C816
N64 MIPS R4000
Game Cube / Wii / Wii U: PowerPC based.
Switch ARM Cortex-A57

The original Game Boy line used a Z80 variant, then also later moved to
ARM with the Game Boy Advance.

I guess, as for Sega:
Master System Z80 variant
Genesis M68000
Saturn SuperH SH-2
Dreamcast SuperH SH-4
(Last console released in 1998/1999).

XBox:
Original x86 (Intel Cellron)
XBox 360 PowerPC (Xenon)
XBox One x86-64 (AMD Jaguar)
Series S/X x86-64 (AMD Zen2 8-core)

PlayStation:
Original MIPS R3000A
PS2 MIPS R5900
PS3 PowerPC / IBM Cell
PS4 x86-64 (AMD Jaguar)
PS5 x86-64 (AMD Zen2 based)

Looks like my PC vs the PS5:
My PC: 16x Zen+ cores at 3.7 GHz
PS5: 8x Zen2 cores at 3.5 GHz

I would assume single-threaded performance is probably fairly similar
between them.

None of the game consoles had ever gone for the Itanium, but it seems
like it could have done OK there (against the PowerPC), but was likely
too expensive at the time.

In my case, I am left to consider trying to build a small robot (has
been a while), but this is currently looking likely to be using an Arty
S7-50 and a RasPi.

Granted, no real practical purpose for this at the moment.

I have the Arty board, not using it for much else at the moment. And, it
is big enough that I should be able to shove a BJX2 core on it.

I would mostly need the RasPi due to a lack of obvious choice of radio
transceiver that is (at the same time):
Fast;
Cheap;
Can easily interface with my PC.

Say:
Packet Radio modules:
56k/112k RS-232 Serial, but big/expensive;
Would make more sense for a "somewhat bigger" robot.
(These modules wanted a 24V 5A supply, ...).
Bluetooth RS-232 modules:
9600 baud (a little too slow)
NRF24L01: 1 or 2 Mbps
No obvious way to connect to a PC (they are SPI based);
Would still need a RasPi or similar as an intermediary.
WiFi:
Needs something like a RasPi or similar (with a TCP/IP stack, ...).

Between the latter two options, this mostly effects the location of the
RasPi (on the robot vs on a desk).

Somewhere around here, I have some 640x480 camera modules (8-bit
parallel interface, + H/V sync strobes, bayer-array output, IIRC 30Hz).

IIRC, the modules basically spew out a stream of 8-bit values on the
data pins, along with the H and V sync pulses, and then repeating the
whole process 30 times per second.

I am likely to leave most of the video processing up to the BJX2 core in
this case...

May consider color-cell encoding it, then streaming it back to a PC.

Maybe also trying to run a depth-inference algorithm (TBD if I will use
mono or stereo).

Mono depth inference:
One can sorta use the depth-of-field of the camera for depth-inference;
Camera being adjusted such that in-focus objects are close to the
camera, further objects out-of-focus.

Stereo depth inference:
Try to align a pair of images and then figure out the point where they
are best aligned (minimal error).

The mono method is computationally cheaper.
Past version had been based on a modified version of the DCT algorithm
(one can use the DCT coefficients to estimate the depth relative to the
camera focus).

The stereo method can be done either with a probe/compare approach, or a
filter / neural-net operating on a pair of rows of pixel values (with
error-weights or neuron outputs for the depth; one can slide it along
for multiple points).

It is unclear if I can pull this off effectively on a 50 MHz CPU though.
Had before run these algorithms on a RasPi.

A lot of the needed stuff has partial ISA level support in the BJX2
core, so could be doable.

I guess, more traditionally, people would use ultrasonic or IR based
sensors here (measuring the amount of reflected ultrasound or reflected
infra-red light).

But, it is at least nominally more interesting to use a camera rather
than IR sensors.

If it can build a (likely voxel based) map of the environment an wander
around with A* pathfinding or similar, this could maybe be interesting.

Could run these algorithms a lot more easily on a PC, but this is kinda
lame (vs running them on the robot itself).

OTOH: Robots are less useful than CNC controllers, but CNC control isn't
quite as interesting, and one can just sort of throw a Cortex-M at a CNC
controller and call it done...

Will see if I get much done, or just end up losing interest and blowing
it off (usually what ends up happening with this sort of stuff).

....

Re: Misc: Design tradeoffs in virtual memory systems...

<2b392b87-691e-4e50-81b3-1c8b761def10n@googlegroups.com>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32429&group=comp.arch#32429

copy link Newsgroups: comp.arch

X-Received: by 2002:a05:622a:144b:b0:3f6:b1ff:3b9b with SMTP id v11-20020a05622a144b00b003f6b1ff3b9bmr2917921qtx.9.1685386219612;
Mon, 29 May 2023 11:50:19 -0700 (PDT)
X-Received: by 2002:a05:6870:c786:b0:19f:6fae:d60f with SMTP id
dy6-20020a056870c78600b0019f6faed60fmr890356oab.7.1685386219310; Mon, 29 May
2023 11:50:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 29 May 2023 11:50:19 -0700 (PDT)
In-Reply-To: <X%3dM.555286$qjm2.466341@fx09.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b116:2eff:734:a100;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b116:2eff:734:a100
References: <u4por8$3tugb$1@dont-email.me> <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com> <66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com> <SrqcM.2167850$MVg8.198396@fx12.iad>
<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com> <u4tn2h$h7q5$1@dont-email.me>
<c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com> <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>
<u4uakj$jmlc$1@dont-email.me> <6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com>
<u4vjtf$sn61$1@dont-email.me> <6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com>
<u50ng3$12k82$1@dont-email.me> <X%3dM.555286$qjm2.466341@fx09.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2b392b87-691e-4e50-81b3-1c8b761def10n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 29 May 2023 18:50:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 10072

by: MitchAlsup - Mon, 29 May 2023 18:50 UTC

On Monday, May 29, 2023 at 11:03:13 AM UTC-5, EricP wrote:
> Paul A. Clayton wrote:
> > On 5/28/23 1:06 PM, MitchAlsup wrote:
> >> On Sunday, May 28, 2023 at 8:12:42 AM UTC-5, Paul A. Clayton wrote:
> >>> On 5/27/23 10:28 PM, MitchAlsup wrote:
> >>>> On Saturday, May 27, 2023 at 8:28:36 PM UTC-5, Paul A. Clayton wrote:
> > [snip]
> >>>>> Also, there is some attraction *to me* for
> >>>>> storing PTPs and equivalent-node large page translations in the
> >>>>> same structure as this reduces the disincentive for large caches
> >>>>> (TLBs) for large pages.
> >>>> <
> >>>> PTPs in the TLB is a waste of TLB entries--a precious resource.
> >> <
> >>> For large page TLBs, there is a problem of utilization. If no
> >>> large pages are used, the area (and power) is wasted. By including
> >>> PTPs, most of the area and static power is not wasted (though
> >>> dynamic power will be due to accessing this special TLB for every
> >>> access — unless the large page TLB access can be delayed until
> >>> after a small page TLB miss is noted [prediction might also be
> >>> used]).
> >> <
> >> So put large page PTEs in the TLB !! problem solved.
> >
> > Including large pages and base pages in a single TLB has
> > traditionally been done using CAMs, which are relatively
> > expensive. (For L2, hash/rehash has been used.) In addition to the
> > difference in tag size and possibly data size, the indexing
> > methods are not consistent.
> Having multiple pages sizes stored in the same TLB requires
> a Ternary-CAM (TCAM) which has 0,1 and dont_care X entries.
> Obviously a TCAM is more complex than a Binary CAM (BCAM)
> (especially for FPGA's).
>
> A TCAM that stores two fixed size tags accomplishes the same
> as having two BCAMs, one for each tag size.
> BCAM is more efficient because each only stores 0 or 1 match values
> and only for the number of bits each tag requires.
<
TCAMs are from the Mc 68851 era.
<
There are other ways, for example, grouping PTE bits in groups of
10-bits each (or whatever suits you) and appending a bit. If this
bit is 1 those 10 bits participate in the CAMing, otherwise they
do not participate. I did this in a SPARC TLB. it adds one Transistor
to the CAM (virtual ground) and is just a trifling slower.
<
> > "put the large page PTEs in the TLB" [with the base page PTEs]
> > solves the utilization issue in terms of entries but increases
> > latency for large pages if hash/rehash is used (which latency can
> > be at least partially hidden by using the extra untranslated bits
> > for cache tag comparison and for L2 the latency would be less
> > important) or uses more area from a CAM-based TLB. In either case,
> > the difference in entry size is not exploited.
> >
> > (A small tag TLB with full-sized payload could also be used for
> > base size PTEs with a the missing upper bits filled in by a
> > constant [perhaps zero extension] or using some bits to reference
> > a table. Such use seems problematic, but it might be worth at
> > least researching. I seem to recall that x86 added an address
> > space extension for 4 MiB pages with 32-bit PTEs, though I doubt
> > such was used much.)
> >
> > I do not claim that having PTEs and PTPs of the same level share
> > storage is an obviously preferred design, but I do think it
> > presents a different balance of tradeoffs and should not be so
> > easily dismissed (I doubt you or anyone else has actually studied
> > this design choice). I admit that I promote it (to the extent I do) in
> > large part because, as far as I know, I am the first and
> > only person to notice this possibility. This *might* merely
> > indicate that the idea is not useful since negative results are
> > less commonly published even in academia, but I suspect that the
> > idea is merely not interesting enough to draw research.
> The most efficient radix tree table walker I know of caches interior
> page table nodes using the relevant part of the virtual address as tag
> for each level.
>
> If there is a TLB miss it walks backwards up the interior node caches
> checking the virtual address for a tag match at each level.
>
> The interior node for the PTP page-table-pointer for a 4kB page
> has the same tag address as the PTE page-table-entry for a 2MB page.
>
> That is, virtual address bits <63:12> are the match tag for 4kB page.
> Address <63:21> is the tag for both the interior node PTP and 2M-PTE
> so these can both be stored together in their own BCAM.
>
> The advantage of having two BCAMs, one for 4kB and one for 2MB+PTP pages is
> (a) the tag match for both levels can be done in parallel with a single
> port each, (b) each level BCAM only stores the match address bits
> it actually requires.
>
> It also means that 2MB-PTE and 4kB-PTP entries complete for space
> in the level-2 TLB. I doubt that is a problem.
>
> The design should also include the ability to skip upper tree levels
> on the walk down - go straight from top to say level 2 (2MB) or 3 (1GB)
> as 99% of apps will fit into a 1GB range.
> The table only needs 1 skip, from top to level-N, so only the top
> table entries need a level skip count.
> > (My motivation for the idea was from the limited number of large
> > page entries supported in some designs, introducing a chicken-and-
> > egg issue — providing hardware resources for a less used feature
> > would be wasteful, using a feature with less hardware resources is
> > less useful. I also note that a PTP cache could be used for
> > compressing tags in a TLB similar to Seznec's "Don't use the page
> > number, but a pointer to it" but replacing 'page number' with 'x
> > MiB region number' and applying to a TLB and not a general memory
> > cache. While a version of Seznec's idea was used in Itanium 2's
> > "prevalidated" L1 tags (which used a one hot bit to match a TLB
> > number, the one hot bit format facilitating flash invalidation on
> > TLB entry eviction), the utility of this kind of a design for a
> > TLB seems doubtful.)
> >
> > It looks like Intel's Sunny Cove combines hash/rehash for the L2 TLB,
> > fully associative (CAM) and multi-size for L1I TLBs, and separate
> > size-based TLBs for L1D
> > (https://www.anandtech.com/show/14664/testing-intel-ice-lake-10nm/2 ):
> >
> > Page # of
> > Size(s) Entries Associativity
> > L1D 4K 64 4-way
> > L1D 2M 32 4-way
> > L1D 1G 8 full
> > L1I 4K+2M 8 full
> > L1I 4K+2M+1G 16 full
> > L2 4K+2M 1024 8-way
> > L2 4K+1G 1024 8-way
> >
> > It is slightly interesting that the L2 is divided into two sections,
> > allowing three sizes with only hash/rehash (i.e., not requiring a third
> > probing). The L1I TLB is also divided into two parts with one supporting
> > all three page sizes (and having twice as many entries) while the other
> > only supports the two smaller page sizes.
> Last I heard all OS large page management allocated fixed size pools
> at boot for each 4k/2M/1G page size and managed within those pools
> (no breaking 2MB pages into 4kB and re-integrating them later).
> So large pages would mostly be used only for well-known uses.
>
> The 1GB pages are probably only used for mapping graphics memory and
> some OS parts. Eg some OS overlay a virtual map on all of physical memory
> so OS can access any physical address at an offset in that virtual area.
> 2MB pages would be used for OS code whose size is known at boot time.
> Privileged apps like RDB's with large resident cache could use large pages.
> Everything dynamically loaded or managed likely use 4kB pages.

Re: Misc: Design tradeoffs in virtual memory systems...

<FK6dM.3749433$vBI8.1054389@fx15.iad>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32430&group=comp.arch#32430

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx15.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com> <u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com> <u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com> <u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
In-Reply-To: <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 65
Message-ID: <FK6dM.3749433$vBI8.1054389@fx15.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 29 May 2023 19:08:21 UTC
Date: Mon, 29 May 2023 15:08:01 -0400
X-Received-Bytes: 3957

by: EricP - Mon, 29 May 2023 19:08 UTC

MitchAlsup wrote:
> On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
>>>
>> There are some protection flags in the PTEs as well, but they only cover
>> the traditional User/Supervisor and "Global RWX" state.
> <
> Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
> their applications (the normal way) by having the user program share
> page tables and "Optimizing" the TLB using the G-bit. Now that you have
> 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
> the TLB mappings, and causing a host of other problems.
> <
> Now, consider an ASID system where the GuestOSs use different ASIDs
> and now all those G-bit problems disappear ! the TLB remains unconfused,
> and the memory hierarchy knows what to do. HyperVisors do this to the
> old MMU models, new architectures should solve these problems without
> creating new ones.

The way I saw this working is that when Hyper-mode is enabled by
the Hypervisor (HV) writing a control register

(a) it enables a register containing a bit map that controls which
of selected instructions trigger a VM-exit exception.
Mostly it traps attempts to read or write certain control registers.

The VM-exit exception info block would contain handy info
parsed from the instruction. Eg a write to certain control
registers triggers a trap with the exception info containing
the control register number and value written.
HV can then just do a switch{} to emulate.

This saves HV having to parse some emulated instructions.

(The HV marks the control register that enables Hyper-mode as trapping.
Then if someone runs a hypervisor as a guest on the real-HV,
it will trap all control register reads and writes and the real-HV
can emulate for the guest-HV.)

(b) it enables the nested Hyper-mode page table with its own TLB and ASID.
Whereas the GuestOS PTE has User and Super RWE access control bits,
those same PTE bits in the HV PTE control Guest and Hypervisor access.

There would be two separate TLB's, each with its ASID.
Guest virtual addresses (VA) translate using the Guest-TLB
to Guest Addresses (GA).
GA's translate using the Hyper-TLB to physical addresses (PA).

The G-global bit in the guest page table PTE's and Guest-ASID apply
to the Guest-TLB entries,
and the HV page table PTE's G-global bit and Hyper-ASID to the Hyper-TLB.

Each TLB also includes caching for interior nodes of its own table
so each of Guest and Hyper table walker can optimize its translates.

To translate a VA->PA requires two sequential TLB lookups,
VA->GA using G-ASID, then GA->PA using H-ASID.

There could be a third TLB combing the two Guest & Hyper translates
to go straight from VA->PA, with entries tagged with both Guest-ASID
and Hyper-ASID. If either GuestOS changes its Guest-ASID then just
those entries are invalidated. If HypervisorOS changes its Hyper-ASID
then all guest entries so tagged are invalidated.

In article <2023May28.165701@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>In article <u4rm3o$52b5$1@dont-email.me>, BGB <cr88192@gmail.com> wrote:
>>>How many people have done much better with their hobby CPU ISA projects?...
>>
>>I haven't a clue. However, most undergrads who take an
>>architecture course do about the same.
>
>Implement a new architecture in an FPGA and get it to run Doom (which
>also means retargeting a compiler for his architecture)? I very much
>doubt it.

Actually, I really don't think that is that far off. Maybe they
won't port a compiler or a reasonably large game, but so what?
By the time anyone is doing that, most of the really hard work
has been done.

>[snip]
>It seems that the most limited resource here is BGB himself.

Well, you said it.

- Dan C.

In article <u4vvkq$udlp$2@dont-email.me>, BGB <cr88192@gmail.com> wrote:
>On 5/28/2023 9:30 AM, Dan Cross wrote:
>> In article <obocM.254898$LAYb.126941@fx02.iad>,
>> Scott Lurndal <slp53@pacbell.net> wrote:
>>> cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> In article <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>,
>>>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>>>> On Friday, May 26, 2023 at 2:19:40 PM UTC-5, Scott Lurndal wrote:
>>>>>>> You use the word "enhance" in a way contrary to the dictionary definition..=
>>>>>>
>>>>>> There must have been demand for them from someone.
>>>>>
>>>>> A billion demands from a billion different people does not a Mona Lisa make.
>>>>
>>>> While I'm sure we can all agree that the x86 is a dog's
>>>> breakfast, that does not imply that all of its features are bad.
>>>> Nor does it imply that a hardware page-table walker is bad.
>>>>
>>>> The OP seems to think so, but has yet to provide a particularly
>>>> compelling argument beyond some measurements from very
>>>> unrepresentative workloads running on a hobby ISA on an FPGA
>>>
>>> To be fair to BGB, if software TLBs work for his particular
>>> hobby ISA, that's fine. It's the idea that software TLBs
>>> are universally better than hardware page table walkers that
>>> doesn't stand up to scrutiny.
>>
>> This exactly. It's one thing to create a hobby ISA as a passion
>> project; quite another to decide that that gives one the
>> authority to proclaim oneself an expert on related matters.
>
>When or where did I ever claim to be an expert on these topics?...

That is a reasonable interpretation of what the manner in which
you were arguing with Scott and I in alt.os.development about
what techniques made virtualization easier. Amazingly, soft
TLBs were among the first things mentioned here as complicating
virtualization.

>I make no claims of being an expert, nor "the smartest person in the
>room", nor anything similar...
>
>But, admittedly, IRL I don't really know anyone else with similar
>interests, so it is mostly all "the internet".

Cool.

- Dan C.

Re: Misc: Design tradeoffs in virtual memory systems...

<2a685537-cce8-467f-8af9-968ebaf696b8n@googlegroups.com>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32435&group=comp.arch#32435

copy link Newsgroups: comp.arch

X-Received: by 2002:a05:620a:4413:b0:75b:38b6:a635 with SMTP id v19-20020a05620a441300b0075b38b6a635mr2383371qkp.6.1685390806992;
Mon, 29 May 2023 13:06:46 -0700 (PDT)
X-Received: by 2002:a05:6871:6baa:b0:19f:90e6:9381 with SMTP id
zh42-20020a0568716baa00b0019f90e69381mr68741oab.5.1685390806684; Mon, 29 May
2023 13:06:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!nntp.club.cc.cmu.edu!45.76.7.193.MISMATCH!3.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 29 May 2023 13:06:46 -0700 (PDT)
In-Reply-To: <FK6dM.3749433$vBI8.1054389@fx15.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b116:2eff:734:a100;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b116:2eff:734:a100
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com>
<u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com>
<u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com>
<u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com>
<FK6dM.3749433$vBI8.1054389@fx15.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2a685537-cce8-467f-8af9-968ebaf696b8n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 29 May 2023 20:06:46 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 80

by: MitchAlsup - Mon, 29 May 2023 20:06 UTC

On Monday, May 29, 2023 at 2:10:25 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
> >>>
> >> There are some protection flags in the PTEs as well, but they only cover
> >> the traditional User/Supervisor and "Global RWX" state.
> > <
> > Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
> > their applications (the normal way) by having the user program share
> > page tables and "Optimizing" the TLB using the G-bit. Now that you have
> > 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
> > the TLB mappings, and causing a host of other problems.
> > <
> > Now, consider an ASID system where the GuestOSs use different ASIDs
> > and now all those G-bit problems disappear ! the TLB remains unconfused,
> > and the memory hierarchy knows what to do. HyperVisors do this to the
> > old MMU models, new architectures should solve these problems without
> > creating new ones.
>
> The way I saw this working is that when Hyper-mode is enabled by
> the Hypervisor (HV) writing a control register
>
> (a) it enables a register containing a bit map that controls which
> of selected instructions trigger a VM-exit exception.
> Mostly it traps attempts to read or write certain control registers.
>
> The VM-exit exception info block would contain handy info
> parsed from the instruction. Eg a write to certain control
> registers triggers a trap with the exception info containing
> the control register number and value written.
> HV can then just do a switch{} to emulate.
>
> This saves HV having to parse some emulated instructions.
>
> (The HV marks the control register that enables Hyper-mode as trapping.
> Then if someone runs a hypervisor as a guest on the real-HV,
> it will trap all control register reads and writes and the real-HV
> can emulate for the guest-HV.)
>
> (b) it enables the nested Hyper-mode page table with its own TLB and ASID..
> Whereas the GuestOS PTE has User and Super RWE access control bits,
> those same PTE bits in the HV PTE control Guest and Hypervisor access.
>
> There would be two separate TLB's, each with its ASID.
> Guest virtual addresses (VA) translate using the Guest-TLB
> to Guest Addresses (GA).
> GA's translate using the Hyper-TLB to physical addresses (PA).
>
> The G-global bit in the guest page table PTE's and Guest-ASID apply
> to the Guest-TLB entries,
> and the HV page table PTE's G-global bit and Hyper-ASID to the Hyper-TLB.
<
Yes, but now when you switch GuestOSs you have to kill all TLB entries with
the G-bit set--because those translations belong to the (now) wrong GuestOS..
<
If instead GuestOS[1] used ASID[1] and GuestOS[2] used ASID[2], the TLB
would not need to be flushed if you had a means to show ASID[1] does not
have the same value as ASID[2].
>
> Each TLB also includes caching for interior nodes of its own table
> so each of Guest and Hyper table walker can optimize its translates.
>
> To translate a VA->PA requires two sequential TLB lookups,
> VA->GA using G-ASID, then GA->PA using H-ASID.
>
> There could be a third TLB combing the two Guest & Hyper translates
> to go straight from VA->PA, with entries tagged with both Guest-ASID
> and Hyper-ASID. If either GuestOS changes its Guest-ASID then just
> those entries are invalidated. If HypervisorOS changes its Hyper-ASID
> then all guest entries so tagged are invalidated.

Re: Misc: Design tradeoffs in virtual memory systems...

<TO7dM.3507588$iU59.2059992@fx14.iad>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32439&group=comp.arch#32439

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!nntp.club.cc.cmu.edu!45.76.7.193.MISMATCH!3.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Newsgroups: comp.arch
References: <u4por8$3tugb$1@dont-email.me> <1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com> <u4tn2h$h7q5$1@dont-email.me> <c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com> <44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com> <u4uakj$jmlc$1@dont-email.me> <6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com> <u4vjtf$sn61$1@dont-email.me> <6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com> <u50ng3$12k82$1@dont-email.me> <2023May29.171026@mips.complang.tuwien.ac.at>
Lines: 46
Message-ID: <TO7dM.3507588$iU59.2059992@fx14.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 29 May 2023 20:21:07 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 29 May 2023 20:21:07 GMT
X-Received-Bytes: 3096

by: Scott Lurndal - Mon, 29 May 2023 20:21 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>Including large pages and base pages in a single TLB has
>>traditionally been done using CAMs, which are relatively
>>expensive.
>
>My impression was that larger pages were implemented by masking away
>the low bits of the comparison. Of course, if you have mixed entries,
>whether you have to mask away the low bits depends on the contents of
>the entry, which increases the latency, so it's no surprise that they
>are often separated for the L1 TLB.

ARMv8 supports a bit in the translation table entry called
the 'contiguous' bit. If set, the mapping is physically
contiguous with respect to prior translation table entry
at the terminal level. This allows the hardware to
'append' it to an existing TLB entry and just increase
the unit size in the TLB.

Not sure how widely used it is, as it relies on having
consecutive physical translation granules, and generally
the next higher translation table entry can be treated
as a block (superpage) anyway.

>>
>>It is slightly interesting that the L2 is divided into two
>>sections, allowing three sizes with only hash/rehash (i.e., not
>>requiring a third probing). The L1I TLB is also divided into two
>>parts with one supporting all three page sizes (and having twice
>>as many entries) while the other only supports the two smaller
>>page sizes.
>
>I wonder what the motivation for 1GB pages for code is.

The biggest use I found for them in the past is in the second
level (NPT, EPT) translation tables to reduce the cost
of guest translation walks rather significantly (from 22 to 11
memory references on a clean miss @4k). Assuming guest physical
address space is allocated in chunks aligned on 1GB boundaries.

ARMv8 allows various larger translation units (TLB entries)
depending on the translation granule (4k, 16k, 64k) and the
level at which the translation table walk terminates. As
big as 512Gbyte pages with a 64k granule size.

Re: Misc: Design tradeoffs in virtual memory systems...

<9Z7dM.3507589$iU59.58085@fx14.iad>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32441&group=comp.arch#32441

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!news.neodome.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Newsgroups: comp.arch
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com> <u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com> <u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com> <u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com> <FK6dM.3749433$vBI8.1054389@fx15.iad>
Lines: 56
Message-ID: <9Z7dM.3507589$iU59.58085@fx14.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 29 May 2023 20:32:05 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 29 May 2023 20:32:05 GMT
X-Received-Bytes: 3721

by: Scott Lurndal - Mon, 29 May 2023 20:32 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>MitchAlsup wrote:
>> On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
>>>>
>>> There are some protection flags in the PTEs as well, but they only cover
>>> the traditional User/Supervisor and "Global RWX" state.
>> <
>> Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
>> their applications (the normal way) by having the user program share
>> page tables and "Optimizing" the TLB using the G-bit. Now that you have
>> 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
>> the TLB mappings, and causing a host of other problems.
>> <
>> Now, consider an ASID system where the GuestOSs use different ASIDs
>> and now all those G-bit problems disappear ! the TLB remains unconfused,
>> and the memory hierarchy knows what to do. HyperVisors do this to the
>> old MMU models, new architectures should solve these problems without
>> creating new ones.
>
>The way I saw this working is that when Hyper-mode is enabled by
>the Hypervisor (HV) writing a control register
>
>(a) it enables a register containing a bit map that controls which
>of selected instructions trigger a VM-exit exception.
>Mostly it traps attempts to read or write certain control registers.

ARMv8 does exactly this. As there are several hundred
system (control) registers, the group functionally related
traps into fewer bits. However, that turned out not to be
as useful as desired, so they added in a recent version of
the architecture FEAT_FGT (fine grained traps) which adds
256 additional trap bits on an individual register basis. IIRC,
this was to efficiently support nested hypervisors; for which
they also have added additional support features (FEAT_NV, FEAT_NV2).

>
>The VM-exit exception info block would contain handy info
>parsed from the instruction. Eg a write to certain control
>registers triggers a trap with the exception info containing
>the control register number and value written.

Yep. For the class of traps that require this information, it
is included in the exception status register by the hardware
on ARMv8.

>(b) it enables the nested Hyper-mode page table with its own TLB and ASID.

In ARM, the hypervisor doesn't require an ASID as the TLB entries
are tagged with the exception level (EL/Ring) at which they were
created and the EL is included when CAMing the TLB - note that
TLB entries for all four exception levels may be mapping the
same VA.

Guest EL1/EL0 TLB entries are tagged with the guest assigned ASID and the hypervisor
assigned VMID (Virtual Machine ID).

Re: Misc: Design tradeoffs in virtual memory systems...

<%48dM.401322$b7Kc.121014@fx39.iad>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32443&group=comp.arch#32443

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
References: <u4por8$3tugb$1@dont-email.me> <16d4aaca-e13b-4330-9a50-fecd8933f5fdn@googlegroups.com> <u4qspr$23sa$1@dont-email.me> <ec6bd9e3-fc64-4c93-9b33-442f8d89a47en@googlegroups.com> <u4r27s$2lrj$1@dont-email.me> <4ac1abec-c685-4871-b2cd-079e4ea04991n@googlegroups.com> <u4r92b$3idb$1@dont-email.me> <881c9ad7-2cae-4ea1-a263-72d4778fb9c0n@googlegroups.com> <FK6dM.3749433$vBI8.1054389@fx15.iad> <2a685537-cce8-467f-8af9-968ebaf696b8n@googlegroups.com>
In-Reply-To: <2a685537-cce8-467f-8af9-968ebaf696b8n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 59
Message-ID: <%48dM.401322$b7Kc.121014@fx39.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 29 May 2023 20:40:27 UTC
Date: Mon, 29 May 2023 16:40:17 -0400
X-Received-Bytes: 3980

by: EricP - Mon, 29 May 2023 20:40 UTC

MitchAlsup wrote:
> On Monday, May 29, 2023 at 2:10:25 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Friday, May 26, 2023 at 4:43:19 PM UTC-5, BGB-Alt wrote:
>>>> There are some protection flags in the PTEs as well, but they only cover
>>>> the traditional User/Supervisor and "Global RWX" state.
>>> <
>>> Consider a HyperVisor hosting 2 GuestOSs. Both Guest OSs want to run
>>> their applications (the normal way) by having the user program share
>>> page tables and "Optimizing" the TLB using the G-bit. Now that you have
>>> 2 GuestOSs, this G-bit is now allowing leaks between GuestOSs, confusing
>>> the TLB mappings, and causing a host of other problems.
>>> <
>>> Now, consider an ASID system where the GuestOSs use different ASIDs
>>> and now all those G-bit problems disappear ! the TLB remains unconfused,
>>> and the memory hierarchy knows what to do. HyperVisors do this to the
>>> old MMU models, new architectures should solve these problems without
>>> creating new ones.
>> The way I saw this working is that when Hyper-mode is enabled by
>> the Hypervisor (HV) writing a control register
>>
>> (b) it enables the nested Hyper-mode page table with its own TLB and ASID..
>> Whereas the GuestOS PTE has User and Super RWE access control bits,
>> those same PTE bits in the HV PTE control Guest and Hypervisor access.
>>
>> There would be two separate TLB's, each with its ASID.
>> Guest virtual addresses (VA) translate using the Guest-TLB
>> to Guest Addresses (GA).
>> GA's translate using the Hyper-TLB to physical addresses (PA).
>>
>> The G-global bit in the guest page table PTE's and Guest-ASID apply
>> to the Guest-TLB entries,
>> and the HV page table PTE's G-global bit and Hyper-ASID to the Hyper-TLB.
> <
> Yes, but now when you switch GuestOSs you have to kill all TLB entries with
> the G-bit set--because those translations belong to the (now) wrong GuestOS..
> <
> If instead GuestOS[1] used ASID[1] and GuestOS[2] used ASID[2], the TLB
> would not need to be flushed if you had a means to show ASID[1] does not
> have the same value as ASID[2].

Ah yes, you are right. Requires a minor tweak to fix.

If hyper-mode is enabled, in the Guest-TLB
- if the G bit is clear, meaning per-process entry,
then it is tagged with both the Guest-ASID and Hyper-ASID.
- if the G bit is set then it is tagged with just the Hyper-ASID.

The Hyper-TLB also tags its G bit clear entries with the Hyper-ASID.
If G bit is set it is not tagged with the Hyper-ASID.

When HV switches GuestOS then it also switches H-ASID so all the
prior GuestOS entries would be invisible and can be retained.

For the smaller L1-TLB's with 8 to 64 entries I have difficulty seeing
that retaining entries across VM switches would be much use.
For the larger L2-TLB's with 1024 entries it probably would help.

John Levine <johnl@taugh.com> writes:
>A long time ago the DG Eclipse managed to shoehorn the 32 bit
>instruction set into the holes in the Nova's 16 bit instruction
>set so they could avoid a mode bit. (That was "Soul of a New
>Machine.") Dunno anyone else who did that.

MIPS, SPARC, and Power can be considered to be like that (or another
way of presenting it is that the 32-bit ISA is a subset of the 64-bit
ISA). Interestingly, that's not the case for RISC-V.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Misc: Design tradeoffs in virtual memory systems...

<u535nj$1lcve$1@dont-email.me>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32446&group=comp.arch#32446

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 17:33:37 -0400
Organization: A noiseless patient Spider
Lines: 128
Message-ID: <u535nj$1lcve$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>
<u4r32e$mrq$1@reader2.panix.com>
<66610ab9-0c63-4b8e-adfc-3acf89a56dc4n@googlegroups.com>
<u4r7hv$6e5$1@reader2.panix.com> <u4rm3o$52b5$1@dont-email.me>
<bacaa0f0-e42e-4455-9cc8-373aa5cef9a0n@googlegroups.com>
<SrqcM.2167850$MVg8.198396@fx12.iad>
<1f00da07-fb5e-4d79-8e42-203daf875bf5n@googlegroups.com>
<u4tn2h$h7q5$1@dont-email.me>
<c74140e9-3f5c-42db-a241-6bb800c2404an@googlegroups.com>
<44e81b24-07a4-4e7b-b0b1-780b95f0e387n@googlegroups.com>
<u4uakj$jmlc$1@dont-email.me>
<6be9f945-59e3-492d-8ac9-35f987b981d2n@googlegroups.com>
<u4vjtf$sn61$1@dont-email.me>
<6c8bb1bd-3ba5-4864-8b46-d1c738133c9cn@googlegroups.com>
<u50ng3$12k82$1@dont-email.me> <u50r01$13c6e$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 29 May 2023 21:33:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="953f34457ed8080dd53a95746db7265a";
logging-data="1749998"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/xvoaDzg2J/fykkXN63BI3CMLoWl6XyQg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:K2+9YPDPKVFKS9Q2aY2gcANdji4=
In-Reply-To: <u50r01$13c6e$1@dont-email.me>

by: Paul A. Clayton - Mon, 29 May 2023 21:33 UTC

On 5/28/23 8:18 PM, BGB wrote:
> On 5/28/2023 6:18 PM, Paul A. Clayton wrote:
[snip]
>> Including large pages and base pages in a single TLB has
>> traditionally been done using CAMs, which are relatively
>> expensive. (For L2, hash/rehash has been used.) In addition to the
>> difference in tag size and possibly data size, the indexing
>> methods are not consistent.
>
> Yeah...
>
> To make much use of large pages (within a single TLB) requires a
> highly associative TLB, which is very expensive.
>
> So, one can have a small fully-associative TLB which "eats all the
> LUTs", or a significantly larger set-associative TLB that eats a
> few Block-RAMs. I went for the Block-RAMs.

For 4K and 16K pages, Seznec's "Concurrent Support of Multiple
Page Sizes On a Skewed Associative TLB" might be worth looking at.
It uses what I call "overlaid" skewed associativity where
different ways/indexing functions use a common storage. Each page
size uses a distinct group of ways that can index into the common
banked storage with the indexing functions guaranteeing no bank
conflicts.

Mode-based or region-based (like Itanium) page sizes would also be
"easy" to support without CAMs. The page size can be determined
before address generation is completed and the appropriate bit
shifting done.

Predicting page size would also be possible. With two page sizes,
assigning an extra predictor bit to each register might provide
good enough performance. This does require special handling of a
page size misprediction.

Another (questionable) option might be to use "sectored" TLB
entries (where each tag maps multiple PTEs). In a straightforward
design for 4K and 16K pages using 4 entries per tag, this would
provide worse performance as 16K pages would waste three PTE
storage slots and many 4K page accesses do not have enough
locality of reference to benefit from caching a chunk of 4 PTEs. A
"modest" adjustment would provide the option of two PTE entries
being used for tag and payload for a different 16K region.
(Possibly indexing the structure on the "right" side for one
length then on the "left" side for another?) [Adding a "left" side
tag is another possibility.] This alteration might perform a
little better than the base design, but introduces the problem of
different tags wanting different PTEs to be valid. Reading from
left-to-right or right-to-left and using compacting based on a
validity mask seems likely to be impractical. (Not compacting and
trying to maximize utilization at TLB-fill with different index
within a slot for different ways _might_ help. This might help
manage the issue of 'slots' using two tags wanting different pages
to be valid.)

Only supporting either full 4-entry for 4K pages or two slots at a
hardware index for 16K page would avoid the validity-based entry
selection issue.

Providing extra storage for a potentially unused tag might be
worth considering. This would allow paired 4K page to be common
(rather than pair and singular for the alternative tag using PTE
storage). The extra tag might also be used to bias replacement
(e.g., a TLB miss that is a tag hit might set a 'keep me' bit on
refill to delay eviction), though I doubt that would be very
helpful.

Yet another unlikely-to-be-useful possibility would be to use 16K
pages with invalid bits for 4K pages that do not share the
translation (and metadata if that is not replicated). This would
be *slightly* better than always using multiple slots at "the
same" index (with skewed associativity conflicts would be smaller)
for 4K pages. If 4K pages commonly have address
contiguity/coloring, the increase in conflict misses might not be
horrible. With valid-bit filtering, if arbitrary pages within a
16K group mismatch in PTE data, the TLB entry can still hold
multiple matching pages without needing special decompaction
hardware.

(AMD has used a fully associative L1 TLB that merged 8 entries
that can share the address as if a single page 8 times larger.
Being fully associative, the conflict issue of potentially 8 page
groups mapping to the same index is not a problem.)

An option presented in "Efficient Address Translation for
Architectures with Multiple Page Sizes"(Guilherme Cox and Abhishek
Bhattacharjee, 2017) is to index all page sizes by one size and
coalesce multiple large pages that share the same index to
compensate for the replication across multiple sets (from indexing
by the smaller page size). This assumes that the number of sets in
which a translation is "mirrored" is less than the replication
factor. I have not thought much about this; that it violates my
intuition is not much of an argument. (Large pages that are
accessed sparsely — i.e., only a few mirrored entries are present
at a time — might be somewhat common for an L1 TLB. The mirroring
is also necessarily constrained to the number of sets, so a TLB
with few sets would have less waste.)

[For an ASIC design supporting a double-base-size page, a CAM-like
chooser among a pair of entries might practical. The chooser might
store the XOR of the (one bit) page size indicator for each entry
of the pair and a distinct one-bit hash of non-shared indexing
bits. Ties — where either could be a hit based on the one bit per
entry — might be resolved with a per-way [or even per chooser]
policy that combined with fill/replacement policy. Filtering out
the rest of the access when neither entry can be a match might
have energy-saving potential, but I doubt it. (Note that a similar
mechanism could be used for a "normal" single-index cache to
select one way from a part of ways based on a one-bit hash. While
larger virtual address hash keys are used for way prediction in
actual processors, a chooser with smaller storage budget might be
useful in some way. {Obviously, one could be completely insane and
use a binary tree instead of an ordinary row decoder.☺})]

Obviously, these kinds of techniques can be mixed and parameters
varied. E.g., using 8K pseudo-pages for indexing with two PTEs per
TLB entry would moderate/shift some of the tradeoffs. Having pairs
of 8K pseudo-pages map to the same entry for some ways in an
overlaid skewed associative design might avoid duplicate entries
for 16K pages (where both 8K halves are referenced near in time)
while moderating conflict misses — one might even use slots 8K
pseudo-pages where 4K pages happen to match.

While I have thought about this before at some length, I have not
made an effort to evaluate the tradeoffs (especially not for two
page sizes only differing by a factor of four). Much of the above
should only be considered food for thought.

Re: Misc: Design tradeoffs in virtual memory systems...

<u537p9$1ltv8$1@dont-email.me>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32447&group=comp.arch#32447

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 17:08:40 -0500
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <u537p9$1ltv8$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<Ia3cM.3440031$iU59.2338510@fx14.iad>
<u4vj9n$25m1c$1@newsreader4.netcologne.de>
<d4f75b55-78f1-4fa4-96fb-9fb45fee8168n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 29 May 2023 22:08:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4432455b5f0239329e1d2bde81148bb5";
logging-data="1767400"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+PZcUc0fk5rFH7QOtBxiQ7K3ometnoS0A="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.1
Cancel-Lock: sha1:uXgMetwcmYuwPII0riwtDUDyqqM=
Content-Language: en-US
In-Reply-To: <d4f75b55-78f1-4fa4-96fb-9fb45fee8168n@googlegroups.com>

by: BGB-Alt - Mon, 29 May 2023 22:08 UTC

On 5/28/2023 12:04 PM, MitchAlsup wrote:
> On Sunday, May 28, 2023 at 8:02:07 AM UTC-5, Thomas Koenig wrote:
>> Scott Lurndal <sc...@slp53.sl.home> schrieb:
>>
>>>> Power and PowerPC
>>
>> [...]
>>
>>> Obsolete CPUs, all designed three decades ago.
>>
>> Power10 became available in 2021, and Power11 (presumably) is on the
>> way; patches are already appearing in gcc.
> <
> But Power (on which it is based) is a 4 decade old design.

Presumably the merit is more based on how effectively things work, than
on the (absolute) age of the design or design elements?...

In my case, a lot of decisions were based on:
How expensive / complicated is the mechanism?
What are its pros/cons.

For example, relative to SH-4, I did eliminate auto-increment and delay
slots. Relative to x86 or ARM and others, I had no interest in adding
condition-codes, ...

There were a whole lot of "nice to have" features that were left out for
sake of eating too many LUTs. And, most of what I do have, is because it
wasn't too expensive.

Does lead to some things seeming a little lopsided and wonky, as "what
is cheaper or more expensive" doesn't necessarily match "what features
make more sense in an intuitive sense" (and, "makes intuitive sense" and
"elegance" and similar were not really design priorities in my case).

Granted, someone could try to argue that my designs are lopsided in
terms of cost as well.

Some things are unclear, for example, I had designed my ISA around the
assumption of not having a zero register, but a zero register could have
eliminated some number of instructions from having needed to exist (Say,
one doesn't need "NEG Rm, Rn" if they have "SUB ZR, Rm, Rn", etc).

But, it is as it is...

Some people have figured out how to get CPU cores with a much smaller
LUT budget and at higher clock-speeds, which does seem more potentially
interesting (well, excluding the ones which internally use 8-bit ALUs or
similar, which is not so interesting...).

Say, 200MHz isn't quite so useful if it takes 10 cycles to add two
numbers or similar...

Re: Misc: Design tradeoffs in virtual memory systems...

<573393d3-ebf9-4f09-8264-22345147d5b1n@googlegroups.com>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32453&group=comp.arch#32453

copy link Newsgroups: comp.arch

X-Received: by 2002:ad4:4b65:0:b0:626:b55:3a8f with SMTP id m5-20020ad44b65000000b006260b553a8fmr38441qvx.0.1685400379883;
Mon, 29 May 2023 15:46:19 -0700 (PDT)
X-Received: by 2002:aca:c2d7:0:b0:397:f064:2a9e with SMTP id
s206-20020acac2d7000000b00397f0642a9emr114612oif.11.1685400379574; Mon, 29
May 2023 15:46:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 29 May 2023 15:46:19 -0700 (PDT)
In-Reply-To: <u537p9$1ltv8$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b116:2eff:734:a100;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b116:2eff:734:a100
References: <u4por8$3tugb$1@dont-email.me> <Ia3cM.3440031$iU59.2338510@fx14.iad>
<u4vj9n$25m1c$1@newsreader4.netcologne.de> <d4f75b55-78f1-4fa4-96fb-9fb45fee8168n@googlegroups.com>
<u537p9$1ltv8$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <573393d3-ebf9-4f09-8264-22345147d5b1n@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 29 May 2023 22:46:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7296

by: MitchAlsup - Mon, 29 May 2023 22:46 UTC

On Monday, May 29, 2023 at 5:08:45 PM UTC-5, BGB-Alt wrote:
<
BGB has an alter ego !?! Who knew !!
<
> On 5/28/2023 12:04 PM, MitchAlsup wrote:
> > On Sunday, May 28, 2023 at 8:02:07 AM UTC-5, Thomas Koenig wrote:
> >> Scott Lurndal <sc...@slp53.sl.home> schrieb:
> >>
> >>>> Power and PowerPC
> >>
> >> [...]
> >>
> >>> Obsolete CPUs, all designed three decades ago.
> >>
> >> Power10 became available in 2021, and Power11 (presumably) is on the
> >> way; patches are already appearing in gcc.
> > <
> > But Power (on which it is based) is a 4 decade old design.
>
> Presumably the merit is more based on how effectively things work, than
> on the (absolute) age of the design or design elements?...
<
A really good design/architecture can last a very long period of time.
>
> In my case, a lot of decisions were based on:
> How expensive / complicated is the mechanism?
> What are its pros/cons.
<
Date 1983:: We were sitting around debating the merits of putting a
<gasp> 32×32 multiplier on what would become Mc88100. The debate
ran long and rabid. I decided to leave the discussion and use the new
layout tool. So, I opened up Neil Weste's book and laied out a multiplier
cell. After playing around with it for a few hours I had demonstrable
data that the 32×32 was not "that big". This eliminated the argumentation.
>
> For example, relative to SH-4, I did eliminate auto-increment and delay
> slots. Relative to x86 or ARM and others, I had no interest in adding
> condition-codes, ...
>
Bad, good.
>
> There were a whole lot of "nice to have" features that were left out for
> sake of eating too many LUTs. And, most of what I do have, is because it
> wasn't too expensive.
>
> Does lead to some things seeming a little lopsided and wonky, as "what
> is cheaper or more expensive" doesn't necessarily match "what features
> make more sense in an intuitive sense" (and, "makes intuitive sense" and
> "elegance" and similar were not really design priorities in my case).
>
> Granted, someone could try to argue that my designs are lopsided in
> terms of cost as well.
>
> Some things are unclear, for example, I had designed my ISA around the
> assumption of not having a zero register, but a zero register could have
> eliminated some number of instructions from having needed to exist (Say,
> one doesn't need "NEG Rm, Rn" if they have "SUB ZR, Rm, Rn", etc).
<
My 66000 does not have a zero register, either. My 66000 has universal
constants, constants that can go in any operand position. In many
implementations, a 5-bit #0 field can be delivered as a zero value at
far lower Power cost than reading R0 from a register file, special casing
it in Operand Forwarding, and delivering same to the calculation unit.
<
Universal constants also solve the reverse-SUB instruction, and cleans
up things like::
<
FDIV Rd,#1,Rs // rather than having a RCP instruction
SLL Rd,#1,RS
STD #3.141592653582763,[IP,R3<<3,DISP64]
FMAC Rd,Rs1,#3.141592653582763,Rs3
<
on all the non-commutative arithmetics.
<
Here is a snippet of FP code where the compiler has been so efficient
the only instructions left are those utilizing the FMAC unit--42 instructions
in a row:: navier_stokes_2d_exact:: rhs_sprial::
<
..LBB15_2: ; =>This Inner Loop Header: Depth=1
ldd r8,[r27,r7<<3,0]
fmul r9,r8,#0x400921FB54442D18
fcos r10,r9
ldd r11,[r28,r7<<3,0]
fmul r12,r11,#0x400921FB54442D18
fsin r13,r12
fmul r14,r13,-r10
fsin r9,r9
fmul r15,r9,#0x400921FB54442D18
fmul r15,r15,r13
fmul r22,r10,#0x4023BD3CC9BE45DE
fmul r22,r22,r13
fmul r21,r10,#0x400921FB54442D18
fcos r12,r12
fmul r21,r21,r12
fmul r10,r1,r10
fmul r10,r10,r13
fmul r20,r9,r12
fmul r18,r9,#0xC023BD3CC9BE45DE
fmul r18,r18,r12
fmul r17,r9,#0xC00921FB54442D18
fmul r13,r17,r13
fmul r9,r1,r9
fmul r9,r9,r12
fmul r8,r8,#0x401921FB54442D18
fmul r11,r11,#0x401921FB54442D18
fsin r8,r8
fmul r8,r2,r8
fsin r11,r11
fmul r11,r2,r11
fmul r12,r3,r14
fmul r14,r3,r15
fmul r15,r3,r22
fmul r22,r3,r21
fmul r21,r3,r20
fmul r20,r3,r18
fmul r13,r3,r13
fmul r9,r3,r9
fmul r8,r4,r8
fmul r11,r4,r11
fmul r18,r12,r14
fmac r10,r3,r10,r18
fmac r10,r22,-r21,r10
fmac r8,r5,r8,r10
fadd r10,r15,r15
fmac r8,r26,-r10,r8
ldd r10,[r25,r7<<3,0]
fadd r8,r8,-r10
std r8,[r29,r7<<3,0]
fmac r8,r12,r22,-r9
fmac r8,r21,r13,r8
fmac r8,r5,r11,r8
fadd r9,r20,r20
fmac r8,r26,-r9,r8
ldd r9,[r24,r7<<3,0]
fadd r8,r8,-r9
std r8,[r30,r7<<3,0]
fadd r8,r14,r13
ldd r9,[r23,r7<<3,0]
fadd r8,r8,-r9
std r8,[r19,r7<<3,0]
add r7,r7,#1
cmp r8,r7,r6
bne r8,.LBB15_2
..LBB15_3: ; %.loopexit
>
> But, it is as it is...
>
And let it remain.
>
> Some people have figured out how to get CPU cores with a much smaller
> LUT budget and at higher clock-speeds, which does seem more potentially
> interesting (well, excluding the ones which internally use 8-bit ALUs or
> similar, which is not so interesting...).
<
in 5nm one can put literally 1,000+ R2000 cores, and probably get them running
at 3+ GHz without much effort. The effort these days is twofold:: utilizing all
these cores (software) and making each core reasonably efficient (hardware)..
>
>
> Say, 200MHz isn't quite so useful if it takes 10 cycles to add two
> numbers or similar...
<
Pentium 4 taught us not to do that. To a similar degree, *Dozer family did too.
<
There is a point where deepening the pipeline is costing power more rapidly
that it is delivering performance.

Re: Misc: Design tradeoffs in virtual memory systems...

<u53b3c$1mi69$1@dont-email.me>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32457&group=comp.arch#32457

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 18:05:16 -0500
Organization: A noiseless patient Spider
Lines: 79
Message-ID: <u53b3c$1mi69$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me> <obocM.254898$LAYb.126941@fx02.iad>
<u4vohj$591$1@reader2.panix.com> <u4vvkq$udlp$2@dont-email.me>
<u52ug7$ngk$2@reader2.panix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 29 May 2023 23:05:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4432455b5f0239329e1d2bde81148bb5";
logging-data="1788105"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+I+yIo13A5EpjsuP0gNwNlf9lTKBUqti0="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.1
Cancel-Lock: sha1:alej1O1EhbkXgV4/6z5gda2hI8I=
In-Reply-To: <u52ug7$ngk$2@reader2.panix.com>
Content-Language: en-US

by: BGB-Alt - Mon, 29 May 2023 23:05 UTC

On 5/29/2023 2:30 PM, Dan Cross wrote:
> In article <u4vvkq$udlp$2@dont-email.me>, BGB <cr88192@gmail.com> wrote:
>> On 5/28/2023 9:30 AM, Dan Cross wrote:
>>> In article <obocM.254898$LAYb.126941@fx02.iad>,
>>> Scott Lurndal <slp53@pacbell.net> wrote:
>>>> cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>>> In article <d4319038-c204-43f5-a093-d00d914f59d8n@googlegroups.com>,
>>>>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>>>>> On Friday, May 26, 2023 at 2:19:40 PM UTC-5, Scott Lurndal wrote:
>>>>>>>> You use the word "enhance" in a way contrary to the dictionary definition..=
>>>>>>>
>>>>>>> There must have been demand for them from someone.
>>>>>>
>>>>>> A billion demands from a billion different people does not a Mona Lisa make.
>>>>>
>>>>> While I'm sure we can all agree that the x86 is a dog's
>>>>> breakfast, that does not imply that all of its features are bad.
>>>>> Nor does it imply that a hardware page-table walker is bad.
>>>>>
>>>>> The OP seems to think so, but has yet to provide a particularly
>>>>> compelling argument beyond some measurements from very
>>>>> unrepresentative workloads running on a hobby ISA on an FPGA
>>>>
>>>> To be fair to BGB, if software TLBs work for his particular
>>>> hobby ISA, that's fine. It's the idea that software TLBs
>>>> are universally better than hardware page table walkers that
>>>> doesn't stand up to scrutiny.
>>>
>>> This exactly. It's one thing to create a hobby ISA as a passion
>>> project; quite another to decide that that gives one the
>>> authority to proclaim oneself an expert on related matters.
>>
>> When or where did I ever claim to be an expert on these topics?...
>
> That is a reasonable interpretation of what the manner in which
> you were arguing with Scott and I in alt.os.development about
> what techniques made virtualization easier. Amazingly, soft
> TLBs were among the first things mentioned here as complicating
> virtualization.
>

I wasn't claiming either that it makes it "easy" to implement on the
software side of things... Rather, that it "could still be done".

But admittedly, what I was imagining would have in some ways been closer
to an emulator than any sort of high-level machine.

But, as noted, with such a design one is already spending a fair bit of
code (in terms of kLOC) managing page-table stuff and similar that
would, arguably, be entirely unnecessary on something like x86 (well,
since there is a whole lot less hand-holding from the architecture in
this case).

But, in this case, the idea was that it "could" allow making the
hardware side of things cheaper (as in, fewer LUTs, or less transistors).

Like, this is coming from the perspective of an architectural category
where one is already expecting, for example, that the compiler will
manage things like scheduling instructions within the CPU's pipeline
(all stuff that would be managed, in hardware, on a traditional
superscalar machine).

Or, IOW, that the compiler is a much bigger pain to write.
So, asking maybe a few extra kLOC here or there being, "not really a big
deal"...

>> I make no claims of being an expert, nor "the smartest person in the
>> room", nor anything similar...
>>
>> But, admittedly, IRL I don't really know anyone else with similar
>> interests, so it is mostly all "the internet".
>
> Cool.
>
> - Dan C.
>

Re: Misc: Design tradeoffs in virtual memory systems...

<u53bmi$1mk1a$1@dont-email.me>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32458&group=comp.arch#32458

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 16:15:29 -0700
Organization: A noiseless patient Spider
Lines: 35
Message-ID: <u53bmi$1mk1a$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<Ia3cM.3440031$iU59.2338510@fx14.iad>
<u4vj9n$25m1c$1@newsreader4.netcologne.de>
<d4f75b55-78f1-4fa4-96fb-9fb45fee8168n@googlegroups.com>
<u537p9$1ltv8$1@dont-email.me>
<573393d3-ebf9-4f09-8264-22345147d5b1n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 29 May 2023 23:15:30 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="764d6fdff06a664a3a8afb35c1dbeeeb";
logging-data="1789994"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19/BPlN+OoQKUU1h3tQXTeQLLREzTPNhms="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.11.2
Cancel-Lock: sha1:fAtT+kBaJsMln7D3C8PsJabIkS4=
In-Reply-To: <573393d3-ebf9-4f09-8264-22345147d5b1n@googlegroups.com>
Content-Language: en-US

by: Stephen Fuld - Mon, 29 May 2023 23:15 UTC

On 5/29/2023 3:46 PM, MitchAlsup wrote:
> On Monday, May 29, 2023 at 5:08:45 PM UTC-5, BGB-Alt wrote:
> <
> BGB has an alter ego !?! Who knew !!
> <
>> On 5/28/2023 12:04 PM, MitchAlsup wrote:
>>> On Sunday, May 28, 2023 at 8:02:07 AM UTC-5, Thomas Koenig wrote:
>>>> Scott Lurndal <sc...@slp53.sl.home> schrieb:
>>>>
>>>>>> Power and PowerPC
>>>>
>>>> [...]
>>>>
>>>>> Obsolete CPUs, all designed three decades ago.
>>>>
>>>> Power10 became available in 2021, and Power11 (presumably) is on the
>>>> way; patches are already appearing in gcc.
>>> <
>>> But Power (on which it is based) is a 4 decade old design.
>>
>> Presumably the merit is more based on how effectively things work, than
>> on the (absolute) age of the design or design elements?...
> <
> A really good design/architecture can last a very long period of time.

Sure, but so can a not so "really good" design/architecture, see X86. I
think a design's longevity has more to do with various market factors,
such as was it incorporated in a very successful product, than its
inherent quality.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Misc: Design tradeoffs in virtual memory systems...

<752b8b87-21d3-49bd-baca-9c42d9c852abn@googlegroups.com>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32460&group=comp.arch#32460

copy link Newsgroups: comp.arch

X-Received: by 2002:a05:620a:25ca:b0:75c:abe8:b54b with SMTP id y10-20020a05620a25ca00b0075cabe8b54bmr130654qko.14.1685404175829;
Mon, 29 May 2023 16:49:35 -0700 (PDT)
X-Received: by 2002:a05:6870:5a89:b0:19a:4eab:8e36 with SMTP id
dt9-20020a0568705a8900b0019a4eab8e36mr148206oab.3.1685404175529; Mon, 29 May
2023 16:49:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 29 May 2023 16:49:35 -0700 (PDT)
In-Reply-To: <u53bmi$1mk1a$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b116:2eff:734:a100;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b116:2eff:734:a100
References: <u4por8$3tugb$1@dont-email.me> <Ia3cM.3440031$iU59.2338510@fx14.iad>
<u4vj9n$25m1c$1@newsreader4.netcologne.de> <d4f75b55-78f1-4fa4-96fb-9fb45fee8168n@googlegroups.com>
<u537p9$1ltv8$1@dont-email.me> <573393d3-ebf9-4f09-8264-22345147d5b1n@googlegroups.com>
<u53bmi$1mk1a$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <752b8b87-21d3-49bd-baca-9c42d9c852abn@googlegroups.com>
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Mon, 29 May 2023 23:49:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 41

by: MitchAlsup - Mon, 29 May 2023 23:49 UTC

On Monday, May 29, 2023 at 6:15:34 PM UTC-5, Stephen Fuld wrote:
> On 5/29/2023 3:46 PM, MitchAlsup wrote:
> > On Monday, May 29, 2023 at 5:08:45 PM UTC-5, BGB-Alt wrote:
> > <
> > BGB has an alter ego !?! Who knew !!
> > <
> >> On 5/28/2023 12:04 PM, MitchAlsup wrote:
> >>> On Sunday, May 28, 2023 at 8:02:07 AM UTC-5, Thomas Koenig wrote:
> >>>> Scott Lurndal <sc...@slp53.sl.home> schrieb:
> >>>>
> >>>>>> Power and PowerPC
> >>>>
> >>>> [...]
> >>>>
> >>>>> Obsolete CPUs, all designed three decades ago.
> >>>>
> >>>> Power10 became available in 2021, and Power11 (presumably) is on the
> >>>> way; patches are already appearing in gcc.
> >>> <
> >>> But Power (on which it is based) is a 4 decade old design.
> >>
> >> Presumably the merit is more based on how effectively things work, than
> >> on the (absolute) age of the design or design elements?...
> > <
> > A really good design/architecture can last a very long period of time.
<
> Sure, but so can a not so "really good" design/architecture, see X86. I
> think a design's longevity has more to do with various market factors,
> such as was it incorporated in a very successful product, than its
> inherent quality.
>
Leading directly to the meme:: Mediocracy First Wins.
<
First x86, then SPARC, then RISC-V, .........
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: Misc: Design tradeoffs in virtual memory systems...

<u53eoq$1mvgs$1@dont-email.me>

copy mid

https://news.novabbs.org/devel/article-flat.php?id=32462&group=comp.arch#32462

copy link Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Misc: Design tradeoffs in virtual memory systems...
Date: Mon, 29 May 2023 20:07:52 -0400
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <u53eoq$1mvgs$1@dont-email.me>
References: <u4por8$3tugb$1@dont-email.me>
<Ia3cM.3440031$iU59.2338510@fx14.iad>
<u4vj9n$25m1c$1@newsreader4.netcologne.de>
<hHLcM.4041635$GNG9.1173433@fx18.iad> <u50hb0$ijs$1@gal.iecc.com>
<2023May29.142305@mips.complang.tuwien.ac.at>
<jV1dM.3731948$vBI8.2298283@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 30 May 2023 00:07:54 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ce9cb021cae3bdf40afb0670a12cfcee";
logging-data="1801756"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ZrHuUkUx3AeqYX8MusjD1mvBj4HEWjjY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:HGRV+uyapimxBicVgdxWSWl/kow=
In-Reply-To: <jV1dM.3731948$vBI8.2298283@fx15.iad>

by: Paul A. Clayton - Tue, 30 May 2023 00:07 UTC

On 5/29/23 9:38 AM, Scott Lurndal wrote:
> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> John Levine <johnl@taugh.com> writes:
>>> Seeing how complex ARM is getting (1000 page
>>> architecture manual) it's also on its way.
>>
>> I have a pdf named "ARM Architecture Reference Manual --- ARMv8, for
>> ARMv8-A architecture profile" that has 7476 pages.
>
> The current version (DDI0487_I_a) is up to 11,952 pages.

While *some* of that is bloat from the print/PDF format which
encourages replication of information and page breaks (e.g., a
brief look noted that instruction descriptions start at the top of
a page) and some comes from how the information is organized, a
*lot* seems to be from "essential complexity".

Even removing FP/SIMD, "multiprocessor" instructions, system
instructions, and various 'extensions', the instruction count
seems to be high. (With respect to "multiprocessor" instructions,
perhaps Mitch Alsup — or anyone who really groks My 66000's Exotic
Synchronization Mechanism — could write "Atomics considered
harmful" as a complement to "SIMD considered harmful".)

Separating instructions that modify condition codes from the same
operations that do not might be considered "organization" bloat,
though even if such were merged the instruction description would
be a little more complex.

(From my quick glance and a little I remember reading earlier,
AArch64 has a substantial variety of special instructions.
Providing some "complex/specialized" instructions, e.g.,
load/store pair, may be preferred over idiom recognition or the
somewhat sophisticated-seeming instruction modifiers of My 66000.
The cost of recognizing the idiom ll/add/sc versus that of
implementing atomic add (versus just letting such simple ll/sc
operations 'unnecessarily' fail sometimes) may not be difficult to
estimate in isolation, but one design choice will impact others.
Optimization of one short instruction sequence might be easily
extended to others. Other design alternatives will also come to
mind — e.g., a special ll that implies a sc using the same address
for the target and the result of the next instruction [or the next
that uses a particular register destination, possibly the zero
register, possibly the address base register of the ll, possibly
the destination register of the ll], possibly simplifying idiom
recognition and certainly reduce code size — expanding the design
space to explore, especially with interactions among architectural
and microarchitectural features.)

Having separate system instructions also increases instruction
count, though some of that complexity and verbosity might be
present in defining the memory-mapped control interfaces for an
ISA without a privileged mode.

There are also complexity factors that are not exposed by the size
of the documentation. It looks like 32-bit and 64-bit GPR
arithmetic operations are defined as one instruction (per
operation type)

Organizing documentation seems challenging and undervalued.
Increasing complexity along with a documentation legacy (both
expectations of established users and effort required to
reorganize content for clarity and concision) increase the cost of
producing good documentation. Yet I think print-oriented and
complete-in-one-volume-for-all-users presentation unnecessarily
increases the difficulty.

"Ahead warp factor 1" -- Captain Kirk

devel / comp.arch / Re: Misc: Design tradeoffs in virtual memory systems...

devel / comp.arch / Re: Misc: Design tradeoffs in virtual memory systems...

Subject	Author
Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Terje Mathisen
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	EricP
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	robf...@gmail.com
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	robf...@gmail.com
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	Anton Ertl
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	EricP
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	EricP
Re: Misc: Design tradeoffs in virtual memory systems...	EricP
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	EricP
Re: Misc: Design tradeoffs in virtual memory systems...	Ivan Godard
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	Anton Ertl
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	Ivan Godard
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Stephen Fuld
Re: Misc: Design tradeoffs in virtual memory systems...	robf...@gmail.com
Re: Misc: Design tradeoffs in virtual memory systems...	Stephen Fuld
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Stephen Fuld
Re: Misc: Design tradeoffs in virtual memory systems...	Ivan Godard
Re: Misc: Design tradeoffs in virtual memory systems...	Thomas Koenig
Re: Misc: Design tradeoffs in virtual memory systems...	Paul A. Clayton
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	Thomas Koenig
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB-Alt
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB-Alt
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	John Levine
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Dan Cross
Re: Misc: Design tradeoffs in virtual memory systems...	BGB
Re: Misc: Design tradeoffs in virtual memory systems...	Scott Lurndal
Re: Misc: Design tradeoffs in virtual memory systems...	Tim Rentsch
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	John Levine
Re: Misc: Design tradeoffs in virtual memory systems...	Thomas Koenig
Re: Misc: Design tradeoffs in virtual memory systems...	MitchAlsup
Re: Misc: Design tradeoffs in virtual memory systems...	Anton Ertl