Tonight's tradeoff

<uis67u$fkj4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34960&group=comp.arch#34960
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Tonight's tradeoff
Date: Sun, 12 Nov 2023 22:47:12 -0500
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <uis67u$fkj4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 13 Nov 2023 03:47:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c85c55f789d3d0838e4ef5619f49bf7e";
logging-data="512612"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18A0cz8twnsme05DXVjCAZMfg23tkXe3I8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ajGn0I+/ES2mWIvBurkNHm1dJbU=
Content-Language: en-US
 by: Robert Finch - Mon, 13 Nov 2023 03:47 UTC

Branch miss logic versus clock frequency.

The branch miss logic for the current OoO version of Thor is quite
involved. It needs to back out the register source indexes to the last
valid source before the branch instruction. To do this in a single
cycle, the logic is about 25+ logic levels deep. I find this somewhat
unacceptable.

I can remove a lot of logic, improving the clock frequency substantially,
by removing the branch miss logic that resets the register source ids to
the last valid source. Instead of stomping on the instructions on a miss
and flushing them in a single cycle, I think the predicate for the
instructions can be cleared, which will effectively turn them into NOPs.
The value of the target register will be propagated in the reorder
buffer, meaning the register source ids need not be reset. The reorder
buffer is only eight entries, so on average four entries would be turned
into NOPs. The NOPs would still propagate through the reorder buffer, so
it may take several clock cycles for them to be flushed from the buffer,
meaning the branch latency for mispredicted branches would be quite
high. However, if the clock frequency can be improved by 20% for all
instructions, much of the lost performance on the branches may be made
up.
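
A rough way to weigh this tradeoff is to compare time per instruction for the two schemes. The sketch below is illustrative only: the base CPI, branch fraction, mispredict rate and single-cycle-flush penalty are assumed numbers, not measurements from Thor, and the function names are made up for this example.

```python
def time_per_instruction(base_cpi, branch_frac, miss_rate, miss_penalty, clock_scale):
    """Relative time per instruction; clock_scale=1.2 means a 20% faster clock."""
    cpi = base_cpi + branch_frac * miss_rate * miss_penalty
    return cpi / clock_scale


def break_even_penalty(base_cpi, branch_frac, miss_rate, fast_penalty, clock_scale):
    """Largest miss penalty (in cycles) the faster-clock design can carry
    while still matching the single-cycle-flush design."""
    baseline = time_per_instruction(base_cpi, branch_frac, miss_rate, fast_penalty, 1.0)
    # Solve (base_cpi + branch_frac * miss_rate * p) / clock_scale == baseline for p.
    return (baseline * clock_scale - base_cpi) / (branch_frac * miss_rate)


# Assumed workload: CPI of 1.0, 20% branches, 10% mispredicted, 2-cycle fast flush.
p = break_even_penalty(1.0, 0.2, 0.1, 2, 1.2)
print(f"break-even miss penalty: {p:.1f} cycles")
```

With those assumed numbers the slower recovery could cost roughly 12 cycles per miss before the 20% clock gain is used up; a workload with more mispredicts would lower that bound.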

Re: Tonight's tradeoff

<aUr4N.33009$BbXa.15163@fx16.iad>

https://news.novabbs.org/devel/article-flat.php?id=34968&group=comp.arch#34968
Path: i2pn2.org!i2pn.org!news.niel.me!glou.org!news.glou.org!fdn.fr!2.eu.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!weretis.net!feeder6.news.weretis.net!panix!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx16.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
References: <uis67u$fkj4$1@dont-email.me>
In-Reply-To: <uis67u$fkj4$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 58
Message-ID: <aUr4N.33009$BbXa.15163@fx16.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Mon, 13 Nov 2023 16:10:46 UTC
Date: Mon, 13 Nov 2023 11:10:19 -0500
X-Received-Bytes: 3753
 by: EricP - Mon, 13 Nov 2023 16:10 UTC

Robert Finch wrote:
> Branch miss logic versus clock frequency.
>
> The branch miss logic for the current OoO version of Thor is quite
> involved. It needs to back out the register source indexes to the last
> valid source before the branch instruction. To do this in a single
> cycle, the logic is about 25+ logic levels deep. I find this somewhat
> unacceptable.
>
> I can remove a lot of logic improving the clock frequency substantially
> by removing the branch miss logic that resets the registers source id to
> the last valid source. Instead of stomping on the instruction on a miss
> and flushing the instructions in a single cycle, I think the predicate
> for the instructions can be cleared which will effectively turn them
> into a NOP. The value of the target register will be propagated in the
> reorder buffer meaning the registers source id need not be reset. The
> reorder buffer is only eight entries. So, on average four entries would
> be turned into NOPs. The NOPs would still propagate through the reorder
> buffer so it may take several clock cycles for them to be flushed from
> the buffer. Meaning the branch latency for miss-predicted branches would
> be quite high. However, if the clock frequency can be improved by 20%
> for all instructions, much of the lost performance on the branches may
> be made up.

Basically it sounds like you want to eliminate the checkpoint and rollback,
and instead let resources be recovered at Retire. That could work.

However you are not restoring the Renamer's future Register Alias Table (RAT)
to its state at the point of the mispredicted branch instruction, which is
what the rollback would have done, so its state will be whatever it was at
the end of the mispredicted sequence. That needs to be re-sync'ed with the
program state as of the branch.

That can be accomplished by stalling the front end, waiting until the
mispredicted branch reaches Retire and then copying the committed RAT,
maintained by Retire, to the future RAT at Rename, and restart front end.
The list of free physical registers is then all those that are not
marked as architectural registers.
This is partly how I handle exceptions.

Also you still need a mechanism to cancel start of execution of the
subset of pending uOps for the purged set. You don't want to launch
a LD or DIV from the mispredicted set if it has not already started.
If you are using a reservation station design then you need some way
to distribute the cancel request to the various FU's and RS's,
and wait for them to clean themselves up.

Note that some things might not be able to cancel immediately,
like an in-flight MUL in a pipeline or an outstanding LD to the cache.
So some of this will be asynchronous (send cancel request, wait for ACK).

There are some other things that might need cleanup.
A Return Stack Predictor might be manipulated by the mispredicted path.
Not sure how to handle that without a checkpoint.
Maybe have two copies like RAT, a future one maintained by Decode and
a committed one maintained by Retire, and copy the committed to future.
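
The stall-and-copy recovery described above can be sketched as a small software model. This is a minimal sketch, not anyone's actual design: the class and method names are hypothetical, the free list is a plain Python list, and architectural register i is assumed to start out mapped to physical register i.

```python
class Renamer:
    def __init__(self, num_arch, num_phys):
        self.num_phys = num_phys
        self.future_rat = list(range(num_arch))     # speculative map used at Rename
        self.committed_rat = list(range(num_arch))  # map maintained by Retire
        self.free = list(range(num_arch, num_phys))

    def rename_dest(self, arch_reg):
        """Allocate a new physical register for a destination at Rename."""
        phys = self.free.pop(0)
        self.future_rat[arch_reg] = phys
        return phys

    def retire(self, arch_reg, phys):
        """Commit a mapping; the previously committed register becomes free."""
        old = self.committed_rat[arch_reg]
        self.committed_rat[arch_reg] = phys
        self.free.append(old)

    def recover_at_retire(self):
        """Called once the mispredicted branch reaches Retire (front end stalled)."""
        self.future_rat = list(self.committed_rat)
        in_use = set(self.committed_rat)
        # Every physical register that is not an architectural mapping is free again.
        self.free = [p for p in range(self.num_phys) if p not in in_use]
```

The model leaves out the cancel/ACK handshake with the function units; it only shows that the future RAT and the free list can both be rebuilt from the committed RAT alone.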

Re: Tonight's tradeoff

<ac120fe66804782fd338198c6d486128@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34974&group=comp.arch#34974
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Mon, 13 Nov 2023 19:47:27 +0000
Organization: novaBBS
Message-ID: <ac120fe66804782fd338198c6d486128@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="810955"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$TZ7KUA.Ikdg9UzSNtaGyceJKL/UFZGPQwq.Qn2FiHazuie3RZfE5y
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Mon, 13 Nov 2023 19:47 UTC

Robert Finch wrote:

> Branch miss logic versus clock frequency.

> The branch miss logic for the current OoO version of Thor is quite
> involved. It needs to back out the register source indexes to the last
> valid source before the branch instruction. To do this in a single
> cycle, the logic is about 25+ logic levels deep. I find this somewhat
> unacceptable.
<
When you launch a predicted branch into execution (the prelude to signaling
that recovery is required), while the branch is determining whether to back
up (or not), have the branch recovery logic set up the register indexes such
that:
a) if the branch succeeds, you keep the current map;
b) if the branch fails, you are 1 multiplexer delay from having the state
you want.
<
That is, move the setup for the repair into the previous clock.
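
One way to picture this two-step arrangement is as a cycle-level sketch. The function names and the dictionary of per-branch checkpoints are assumptions made for illustration; the point is only that the expensive lookup happens while the branch executes, leaving a 2:1 select for the cycle in which it resolves.

```python
def stage_recovery(checkpoints, branch_id):
    """Cycle N (branch executing): fetch the pre-branch map into a staging register."""
    return list(checkpoints[branch_id])


def resolve(current_map, staged_map, mispredicted):
    """Cycle N+1 (branch resolved): a 2:1 select per map entry, nothing deeper."""
    return [s if mispredicted else c for c, s in zip(current_map, staged_map)]


checkpoints = {7: [0, 1, 2, 3]}   # hypothetical map saved when branch 7 was issued
current = [0, 9, 2, 8]            # map after speculative renames past the branch
staged = stage_recovery(checkpoints, 7)
print(resolve(current, staged, mispredicted=True))
```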
<
> I can remove a lot of logic improving the clock frequency substantially
> by removing the branch miss logic that resets the registers source id to
> the last valid source. Instead of stomping on the instruction on a miss
> and flushing the instructions in a single cycle, I think the predicate
> for the instructions can be cleared which will effectively turn them
> into a NOP. The value of the target register will be propagated in the
> reorder buffer meaning the registers source id need not be reset. The
> reorder buffer is only eight entries. So, on average four entries would
> be turned into NOPs. The NOPs would still propagate through the reorder
> buffer so it may take several clock cycles for them to be flushed from
> the buffer. Meaning the branch latency for miss-predicted branches would
> be quite high. However, if the clock frequency can be improved by 20%
> for all instructions, much of the lost performance on the branches may
> be made up.

Re: Tonight's tradeoff

<643607718b82ff03ae09d2b661963223@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34975&group=comp.arch#34975
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Mon, 13 Nov 2023 20:01:53 +0000
Organization: novaBBS
Message-ID: <643607718b82ff03ae09d2b661963223@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="811877"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$6rDO4i5GhnWRiC/ziV/h8umuZBc2Rqor85IcWwfSQ57KFhziX/TfK
 by: MitchAlsup - Mon, 13 Nov 2023 20:01 UTC

EricP wrote:

> Robert Finch wrote:
>> Branch miss logic versus clock frequency.
>>
>> The branch miss logic for the current OoO version of Thor is quite
>> involved. It needs to back out the register source indexes to the last
>> valid source before the branch instruction. To do this in a single
>> cycle, the logic is about 25+ logic levels deep. I find this somewhat
>> unacceptable.
>>
>> I can remove a lot of logic improving the clock frequency substantially
>> by removing the branch miss logic that resets the registers source id to
>> the last valid source. Instead of stomping on the instruction on a miss
>> and flushing the instructions in a single cycle, I think the predicate
>> for the instructions can be cleared which will effectively turn them
>> into a NOP. The value of the target register will be propagated in the
>> reorder buffer meaning the registers source id need not be reset. The
>> reorder buffer is only eight entries. So, on average four entries would
>> be turned into NOPs. The NOPs would still propagate through the reorder
>> buffer so it may take several clock cycles for them to be flushed from
>> the buffer. Meaning the branch latency for miss-predicted branches would
>> be quite high. However, if the clock frequency can be improved by 20%
>> for all instructions, much of the lost performance on the branches may
>> be made up.

> Basically it sounds like you want to eliminate the checkpoint and rollback,
> and instead let resources be recovered at Retire. That could work.

> However you are not restoring the Renamer's future Register Alias Table (RAT)
> to its state at the point of the mispredicted branch instruction, which is
> what the rollback would have done, so its state will be whatever it was at
> the end of the mispredicted sequence. That needs to be re-sync'ed with the
> program state as of the branch.
<
I, personally, don't use a RAT--I use a CAM based architectural decoder
for operand read and a standard physical equality decoder for writes.
<
Every cycle the CAM.valid bits are block loaded into a history table
and if you need to return the CAMs to the checkpointed mappings, you
take the valid bits from the history table and write the CAM.valid
bits back into the physical register file. Presto, the map is how it
used to be.
<
Can even be made to be performed in 0-cycles. {yes: 0 not 1 cycles}
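
As a rough software model of the valid-bit history idea (a sketch under assumptions, not the actual circuit): each physical register carries an architectural tag and a valid bit, a checkpoint saves only the valid-bit vector, and a restore writes that vector back. The class and method names are hypothetical, and the model assumes a physical register's tag is not reused while a checkpoint that depends on it is still live.

```python
class CamRenamer:
    def __init__(self, num_phys):
        self.tag = [None] * num_phys     # architectural register held by each physical register
        self.valid = [False] * num_phys  # True if this entry is the current mapping for its tag
        self.history = []                # saved valid-bit vectors, one per checkpoint

    def write(self, arch_reg, phys):
        """Map arch_reg to phys, invalidating any older mapping of arch_reg."""
        for p in range(len(self.tag)):
            if self.valid[p] and self.tag[p] == arch_reg:
                self.valid[p] = False
        self.tag[phys] = arch_reg
        self.valid[phys] = True

    def lookup(self, arch_reg):
        """CAM match for an operand read: the valid entry whose tag equals arch_reg."""
        for p in range(len(self.tag)):
            if self.valid[p] and self.tag[p] == arch_reg:
                return p
        return None

    def checkpoint(self):
        self.history.append(list(self.valid))
        return len(self.history) - 1

    def restore(self, cp):
        """Write the saved valid bits back; younger checkpoints are discarded."""
        self.valid = list(self.history[cp])
        del self.history[cp:]
```

Because only one bit per physical register is saved and restored, the recovery is a block load of a bit vector rather than a walk through renamed instructions.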
<
> That can be accomplished by stalling the front end, waiting until the
> mispredicted branch reaches Retire and then copying the committed RAT,
> maintained by Retire, to the future RAT at Rename, and restart front end.
> The list of free physical registers is then all those that are not
> marked as architectural registers.
<
Sounds slow.
<
> This is partly how I handle exceptions.

> Also you still need a mechanism to cancel start of execution of the
> subset of pending uOps for the purged set. You don't want to launch
> a LD or DIV from the mispredicted set if it has not already started.
> If you are using a reservation station design then you need some way
> to distribute the cancel request to the various FU's and RS's,
> and wait for them to clean themselves up.
<
I use the concept of an execution window to do this both at the reservation
stations and function units. There is an insert pointer and a consistent
pointer; an RS is only allowed to launch when the instruction is between them.
FUs are only allowed to calculate so long as the instruction remains
between these 2 pointers. The 2 pointers (4-bits each) are broadcast
around the machine every cycle. Each station and unit decides for itself.
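
The "between the two pointers" test can be sketched as a circular range check. This is a minimal sketch with assumed conventions: the window is taken to run from the consistent pointer (inclusive) up to the insert pointer (exclusive), equal pointers are read as an empty window, and `in_window` is a hypothetical name.

```python
WINDOW_BITS = 4
WINDOW_SIZE = 1 << WINDOW_BITS  # 16 slots for 4-bit pointers


def in_window(slot, consistent_ptr, insert_ptr):
    """True if slot lies in the circular range [consistent_ptr, insert_ptr)."""
    span = (insert_ptr - consistent_ptr) % WINDOW_SIZE
    offset = (slot - consistent_ptr) % WINDOW_SIZE
    return offset < span


# Window wraps around: slots 12..15 and 0..2 are live.
print(in_window(2, 12, 3))
# After a mispredict the insert pointer is backed up to 15, so slot 2 drops out.
print(in_window(2, 12, 15))
```

Each station or unit can evaluate this locally from the two broadcast pointers and its own slot number, which is why no explicit cancel message is needed in this scheme.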

> Note that some things might not be able to cancel immediately,
> like an in-flight MUL in a pipeline or an outstanding LD to the cache.
> So some of this will be asynchronous (send cancel request, wait for ACK).
<
If an instruction that should not have its result delivered is delivered,
it is delivered to the physical register it was assigned at its issue time.
But since the value had not been delivered, that register is not in the
pool of assignable registers, so no dependency has been created.
<
> There are some other things that might need cleanup.
> A Return Stack Predictor might be manipulated by the mispredicted path.
<
Do these with a linked list and you can back up a mispredicted return
to a mispredicted call.
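
A minimal sketch of why a linked list helps here, with hypothetical names and an unbounded node pool (real hardware would use a small fixed pool): pushes never overwrite older nodes, so a checkpoint only has to remember the top pointer.

```python
class LinkedReturnStack:
    def __init__(self):
        self.nodes = []  # each node is (return_address, index_of_next_node)
        self.top = -1    # -1 means the stack is empty

    def push(self, return_address):
        """Predicted call: link a new node in front of the current top."""
        self.nodes.append((return_address, self.top))
        self.top = len(self.nodes) - 1

    def pop(self):
        """Predicted return: follow the link; the node itself is left intact."""
        if self.top < 0:
            return None
        addr, nxt = self.nodes[self.top]
        self.top = nxt
        return addr

    def checkpoint(self):
        return self.top

    def restore(self, saved_top):
        """Undo any speculative pushes and pops by resetting the top pointer."""
        self.top = saved_top
```

A mispredicted path that pops an entry and then pushes a new one leaves the old chain untouched, so restoring the saved top pointer brings back the stack exactly as it was at the checkpoint.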
<
> Not sure how to handle that without a checkpoint.
<
Every (non exceptional) flow altering instruction needs a checkpoint.
Predicated strings of instructions use a light weight checkpoint;
predicted branches use a heavy weight version.
<
> Maybe have two copies like RAT, a future one maintained by Decode and
> a committed one maintained by Retire, and copy the committed to future.

Re: Tonight's tradeoff

<uj1o0t$1kves$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35006&group=comp.arch#35006
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Wed, 15 Nov 2023 01:21:16 -0500
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <uj1o0t$1kves$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 15 Nov 2023 06:21:17 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2440471bea9af7862456de6065ed7f6c";
logging-data="1736156"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1977p2yUna7y6cF1bUANRqeBKMaGDC2aso="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:bL/VDDR6jOz1e01Np4EdIF9uVaA=
Content-Language: en-US
In-Reply-To: <643607718b82ff03ae09d2b661963223@news.novabbs.com>
 by: Robert Finch - Wed, 15 Nov 2023 06:21 UTC

Decided to shelve Thor2024 and begin work on Thor2025. While Thor2024 is
very good there are a few issues with it. The ROB is used to store
register values and that is effectively a CAM. It is not very resource
efficient in an FPGA. I have been researching an x86 OoO implementation
(https://www.stuffedcow.net/files/henry-thesis-phd.pdf ) done in an FPGA
and it turns out to be considerably smaller than Thor. There are more
efficient implementations for components than what is currently in use.

Thor2025 will use a PRF approach although using a PRF seems large to me.
To reduce the size and complexity of the register file, separate
register files will be used for float and integer operations, along with
separate register files for vector mask registers and subroutine link
registers. This set of register files limits the GPR file to only 3
write ports and 18 read ports to support all the functional units.
Currently the register file is 10r2w.

The trade-off is block RAM usage instead of LUTs.

While having separate register files seems like a step backwards, it
should ultimately make the hardware more resource efficient. It does
impact the ISA spec.

Re: Tonight's tradeoff

<7761287e80bb22b7742fd7f292664497@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35013&group=comp.arch#35013
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Wed, 15 Nov 2023 19:11:31 +0000
Organization: novaBBS
Message-ID: <7761287e80bb22b7742fd7f292664497@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1025667"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$9AyWQvatXouJMpkFZxM16OtRgBo7lMHOlJ0zrYvTpFvQeORZ30IRy
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Wed, 15 Nov 2023 19:11 UTC

Robert Finch wrote:

> Decided to shelve Thor2024 and begin work on Thor2025. While Thor2024 is
> very good there are a few issues with it. The ROB is used to store
> register values and that is effectively a CAM. It is not very resource
> efficient in an FPGA. I have been researching an x86 OoO implementation
> (https://www.stuffedcow.net/files/henry-thesis-phd.pdf ) done in an FPGA
> and it turns out to be considerably smaller than Thor. There are more
> efficient implementations for components than what is currently in use.

> Thor2025 will use a PRF approach although using a PRF seems large to me.
<
I have a PRF design I could show you--way too big for comp.arch and
with the requisite figures.
<
> To reduce the size and complexity of the register file, separate
> register files will be used for float and integer operations, along with
> separate register files for vector mask registers and subroutine link
> registers. This set of register files limits the GPR file to only 3
> write ports and 18 read ports to support all the functional units.
> Currently the register file is 10r2w.

> The trade-off is block RAM usage instead of LUTs.

> While having separate registers files seems like a step backwards, it
> should ultimately make the hardware more resource efficient. It does
> impact the ISA spec.

Re: Tonight's tradeoff

<uj9bm2$36401$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35079&group=comp.arch#35079
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Fri, 17 Nov 2023 22:39:45 -0500
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <uj9bm2$36401$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 18 Nov 2023 03:39:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a48d31499e9734559e4e8f902ccdee86";
logging-data="3346433"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/cXs76HUS/q8P0/tKv9/OyExanCYqi2xo="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Z18vM/5VRtWWh8S1qIs04WSHVu4=
In-Reply-To: <7761287e80bb22b7742fd7f292664497@news.novabbs.com>
Content-Language: en-US
 by: Robert Finch - Sat, 18 Nov 2023 03:39 UTC

On 2023-11-15 2:11 p.m., MitchAlsup wrote:
> Robert Finch wrote:
>
>> Decided to shelve Thor2024 and begin work on Thor2025. While Thor2024
>> is very good there are a few issues with it. The ROB is used to store
>> register values and that is effectively a CAM. It is not very resource
>> efficient in an FPGA. I have been researching an x86 OoO
>> implementation (https://www.stuffedcow.net/files/henry-thesis-phd.pdf
>> ) done in an FPGA and it turns out to be considerably smaller than
>> Thor. There are more efficient implementations for components than
>> what is currently in use.
>
>> Thor2025 will use a PRF approach although using a PRF seems large to me.
> <
> I have a PRF design I could show you--way to big for comp.arch and
> with the requisite figures.
> <
>> To reduce the size and complexity of the register file, separate
>> register files will be used for float and integer operations, along
>> with separate register files for vector mask registers and subroutine
>> link registers. This set of register files limits the GPR file to only
>> 3 write ports and 18 read ports to support all the functional units.
>> Currently the register file is 10r2w.
>
>> The trade-off is block RAM usage instead of LUTs.
>
>> While having separate registers files seems like a step backwards, it
>> should ultimately make the hardware more resource efficient. It does
>> impact the ISA spec.

Still digesting the PRF diagram.

Decided to go with a unified register file, 27r3w so far. Having
separate register files would not reduce the number of read ports
required and would add complexity to the processor.

Loads, FPU operations and flow control (FCU) operations all share the
third write port of the register file. The other two write ports are
dedicated to the ALU results. I think this will be okay given <1% of
instructions would be FCU updates. Loads are about 25%, and FPU depends
on the application.
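
A quick back-of-the-envelope check on the shared third write port, as a sketch only: the 25% load and 1% FCU figures come from the paragraph above, while the 10% FPU fraction and the IPC of 2 are assumptions picked for illustration, and every such instruction is assumed to write one register.

```python
def shared_port_demand(ipc, load_frac, fpu_frac, fcu_frac):
    """Average writes per cycle arriving at the shared (third) write port."""
    return ipc * (load_frac + fpu_frac + fcu_frac)


def max_ipc_for_port(load_frac, fpu_frac, fcu_frac):
    """IPC at which the shared port would average one write per cycle."""
    return 1.0 / (load_frac + fpu_frac + fcu_frac)


demand = shared_port_demand(2.0, 0.25, 0.10, 0.01)
limit = max_ipc_for_port(0.25, 0.10, 0.01)
print(f"demand {demand:.2f} writes/cycle, port saturates near IPC {limit:.2f}")
```

Average demand below one write per cycle does not rule out short bursts of contention, so some buffering or arbitration on that port would still be needed.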

The ALUs/FPU/Loads have five input operands including the 3 source
operands, a target operand, and a mask register. Stores do not need a
target operand. FCU ops are non-masked so do not need a mask register or
target operand input.

Not planning to implement the vector register file as it would be immense.

Re: Tonight's tradeoff

<uja5d2$39k51$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35082&group=comp.arch#35082
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 18 Nov 2023 05:58:42 -0500
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <uja5d2$39k51$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 18 Nov 2023 10:58:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a48d31499e9734559e4e8f902ccdee86";
logging-data="3461281"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zfrw462pnHf8MOGjUNCT2e7Fu6tHp7po="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:PmMF4LMFJDJx3uGhsW5BcNAM6SU=
Content-Language: en-US
In-Reply-To: <uj9bm2$36401$1@dont-email.me>
 by: Robert Finch - Sat, 18 Nov 2023 10:58 UTC

On 2023-11-17 10:39 p.m., Robert Finch wrote:
> On 2023-11-15 2:11 p.m., MitchAlsup wrote:
>> Robert Finch wrote:
>>
>>> Decided to shelve Thor2024 and begin work on Thor2025. While Thor2024
>>> is very good there are a few issues with it. The ROB is used to store
>>> register values and that is effectively a CAM. It is not very
>>> resource efficient in an FPGA. I have been researching an x86 OoO
>>> implementation (https://www.stuffedcow.net/files/henry-thesis-phd.pdf
>>> ) done in an FPGA and it turns out to be considerably smaller than
>>> Thor. There are more efficient implementations for components than
>>> what is currently in use.
>>
>>> Thor2025 will use a PRF approach although using a PRF seems large to me.
>> <
>> I have a PRF design I could show you--way to big for comp.arch and
>> with the requisite figures.
>> <
>>> To reduce the size and complexity of the register file, separate
>>> register files will be used for float and integer operations, along
>>> with separate register files for vector mask registers and subroutine
>>> link registers. This set of register files limits the GPR file to
>>> only 3 write ports and 18 read ports to support all the functional
>>> units. Currently the register file is 10r2w.
>>
>>> The trade-off is block RAM usage instead of LUTs.
>>
>>> While having separate registers files seems like a step backwards, it
>>> should ultimately make the hardware more resource efficient. It does
>>> impact the ISA spec.
>
> Still digesting the PRF diagram.
>
> Decided to go with a unified register file, 27r3w so far. Having
> separate register files would not reduce the number of read ports
> required and would add complexity to the processor.
>
> Loads, FPU operations and flow control (FCU) operations all share the
> third write port of the register file. The other two write ports are
> dedicated to the ALU results. I think this will be okay given <1% of
> instructions would be FCU updates. Loads are about 25%, and FPU depends
> on the application.
>
> The ALUs/FPU/Loads have five input operands including the 3 source
> operands, a target operand, and a mask register. Stores do not need a
> target operand. FCU ops are non-masked so do not need a mask register or
> target operand input.
>
> Not planning to implement the vector register file as it would be immense.
>
Changed the moniker of my current processor project from Thor to Qupls
(Q-Plus). I wanted a five-letter name beginning with ‘Q’. For a moment
I thought of calling it Quake but thought that would be too confusing.
One must understand the magic behind name choices.

The current design uses instruction postfixes of 32, 48, 80, and 144
bits, which provide constants of 23, 39, 64, and 128 bits. Two bits in the
instruction indicate the postfix size. The 64- and 128-bit constants have
seven unused bits left over, since the fields available are 71 and 135
bits.
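
The postfix arithmetic can be checked with a short sketch. It assumes, as an inference from the quoted numbers rather than anything stated in the post, that every postfix spends the same 9 bits on opcode and size fields (32 - 23 = 48 - 39 = 9).

```python
# Postfix sizes and constant widths quoted in the post, in bits.
POSTFIX_BITS = [32, 48, 80, 144]
CONSTANT_BITS = [23, 39, 64, 128]

# Assumption: the overhead is whatever the two small postfixes do not
# use for the constant (32 - 23 = 48 - 39 = 9 bits).
OVERHEAD_BITS = POSTFIX_BITS[0] - CONSTANT_BITS[0]


def postfix_layout():
    """Return (postfix, field, constant, unused) tuples, all in bits."""
    rows = []
    for size, const in zip(POSTFIX_BITS, CONSTANT_BITS):
        field = size - OVERHEAD_BITS  # bits left over for the constant
        rows.append((size, field, const, field - const))
    return rows


for row in postfix_layout():
    print("postfix %3d: field %3d, constant %3d, unused %d" % row)
```

Under that assumption the 80- and 144-bit postfixes have 71- and 135-bit fields, which matches the seven spare bits mentioned.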

Somewhat ugly, but it is desired to keep instructions a multiple of
16 bits in size. The shortest instruction is a NOP, which is 16 bits so
that it may be used for alignment.

I almost switched to 96-bit floats, which seem appealing, but once again
remembered that the progression of 32-, 64-, and 128-bit floats works very
well for the float approximations.

Branches are 48-bit, being a combination of a compare and a branch with
a 24-bit target address field. Other flow control ops like JSR and JMP
are also 48-bit to keep all flow controls at 48-bit for simplified decoding.

Most instructions are 32 bits in size.

Sticking with a 64-register unified register file.

Removed the vector operations. There is enough play in the ISA to add
them at a later date if desired.

Loads and stores support two address modes, d(Rn) and d(Rn+Rm*Sc). The
scaled index address mode will likely be a 48-bit op.
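
As a rough illustration of the two address modes, here is a minimal sketch of the effective-address calculation. The 64-bit wraparound and the function names are assumptions for illustration; the post does not give the address width or the allowed scale values.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit address arithmetic


def ea_disp(regs, rn, d):
    """d(Rn): base register plus displacement."""
    return (regs[rn] + d) & MASK64


def ea_scaled(regs, rn, rm, sc, d):
    """d(Rn+Rm*Sc): base plus scaled index plus displacement."""
    return (regs[rn] + regs[rm] * sc + d) & MASK64


regs = [0] * 64          # 64-register unified file, as in the post
regs[1] = 0x1000         # base
regs[2] = 3              # index
print(hex(ea_disp(regs, 1, 8)))
print(hex(ea_scaled(regs, 1, 2, 8, 16)))
```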

Re: Tonight's tradeoff

<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35087&group=comp.arch#35087

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 18 Nov 2023 17:27:50 +0000
Organization: novaBBS
Message-ID: <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me>
 by: MitchAlsup - Sat, 18 Nov 2023 17:27 UTC

Robert Finch wrote:

> On 2023-11-15 2:11 p.m., MitchAlsup wrote:
>> Robert Finch wrote:
>>
>>> Decided to shelve Thor2024 and begin work on Thor2025. While Thor2024
>>> is very good there are a few issues with it. The ROB is used to store
>>> register values and that is effectively a CAM. It is not very resource
>>> efficient in an FPGA. I have been researching an x86 OoO
>>> implementation (https://www.stuffedcow.net/files/henry-thesis-phd.pdf
>>> ) done in an FPGA and it turns out to be considerably smaller than
>>> Thor. There are more efficient implementations for components than
>>> what is currently in use.
>>
>>> Thor2025 will use a PRF approach although using a PRF seems large to me.
>> <
>> I have a PRF design I could show you--way to big for comp.arch and
>> with the requisite figures.
>> <
>>> To reduce the size and complexity of the register file, separate
>>> register files will be used for float and integer operations, along
>>> with separate register files for vector mask registers and subroutine
>>> link registers. This set of register files limits the GPR file to only
>>> 3 write ports and 18 read ports to support all the functional units.
>>> Currently the register file is 10r2w.
>>
>>> The trade-off is block RAM usage instead of LUTs.
>>
>>> While having separate registers files seems like a step backwards, it
>>> should ultimately make the hardware more resource efficient. It does
>>> impact the ISA spec.

> Still digesting the PRF diagram.

The diagram is for a 6R6W PRF with a history table, ARN->PRN translation,
Free pool pickers, and register ports. The X with a ½ box is a latch
or flip-flop depending on the clocking that is put around the figure.
It also includes the renamer {history table and free pool pickers}.

> Decided to go with a unified register file, 27r3w so far. Having
> separate register files would not reduce the number of read ports
> required and would add complexity to the processor.

9 Reads per 1 write ?!?!?

> Loads, FPU operations and flow control (FCU) operations all share the
> third write port of the register file. The other two write ports are
> dedicated to the ALU results. I think this will be okay given <1% of
> instructions would be FCU updates. Loads are about 25%, and FPU depends
> on the application.

> The ALUs/FPU/Loads have five input operands including the 3 source
> operands, a target operand, and a mask register. Stores do not need a
> target operand. FCU ops are non-masked so do not need a mask register or
> target operand input.

> Not planning to implement the vector register file as it would be immense.

Re: Tonight's tradeoff

<ujb40q$3eepe$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35091&group=comp.arch#35091

From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 18 Nov 2023 14:41:14 -0500
Organization: A noiseless patient Spider
Lines: 91
Message-ID: <ujb40q$3eepe$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
In-Reply-To: <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
 by: Robert Finch - Sat, 18 Nov 2023 19:41 UTC

On 2023-11-18 12:27 p.m., MitchAlsup wrote:
> Robert Finch wrote:
>
>> On 2023-11-15 2:11 p.m., MitchAlsup wrote:
>>> Robert Finch wrote:
>>>
>>>> Decided to shelve Thor2024 and begin work on Thor2025. While
>>>> Thor2024 is very good there are a few issues with it. The ROB is
>>>> used to store register values and that is effectively a CAM. It is
>>>> not very resource efficient in an FPGA. I have been researching an
>>>> x86 OoO implementation
>>>> (https://www.stuffedcow.net/files/henry-thesis-phd.pdf ) done in an
>>>> FPGA and it turns out to be considerably smaller than Thor. There
>>>> are more efficient implementations for components than what is
>>>> currently in use.
>>>
>>>> Thor2025 will use a PRF approach although using a PRF seems large to
>>>> me.
>>> <
>>> I have a PRF design I could show you--way to big for comp.arch and
>>> with the requisite figures.
>>> <
>>>> To reduce the size and complexity of the register file, separate
>>>> register files will be used for float and integer operations, along
>>>> with separate register files for vector mask registers and
>>>> subroutine link registers. This set of register files limits the GPR
>>>> file to only 3 write ports and 18 read ports to support all the
>>>> functional units. Currently the register file is 10r2w.
>>>
>>>> The trade-off is block RAM usage instead of LUTs.
>>>
>>>> While having separate registers files seems like a step backwards,
>>>> it should ultimately make the hardware more resource efficient. It
>>>> does impact the ISA spec.
>
>> Still digesting the PRF diagram.
>
> The diagram is for a 6R6W PRF with a history table, ARN->PRN translation,
> Free pool pickers, and register ports. The X with a ½ box is a latch
> or flip-flop depending on the clocking that is put around the figure.
> It also includes the renamer {history table and free pool pickers}.
>
>> Decided to go with a unified register file, 27r3w so far. Having
>> separate register files would not reduce the number of read ports
>> required and would add complexity to the processor.
>
> 9 Reads per 1 write ?!?!?
>
>> Loads, FPU operations and flow control (FCU) operations all share the
>> third write port of the register file. The other two write ports are
>> dedicated to the ALU results. I think this will be okay given <1% of
>> instructions would be FCU updates. Loads are about 25%, and FPU
>> depends on the application.
>
>> The ALUs/FPU/Loads have five input operands including the 3 source
>> operands, a target operand, and a mask register. Stores do not need a
>> target operand. FCU ops are non-masked so do not need a mask register
>> or target operand input.
>
>> Not planning to implement the vector register file as it would be
>> immense.
Freelist:

I just used the find-first-one/find-last-one trick on a bit-list to pick a
PR for an AR. It can provide PRs for two ARs per cycle. I have all the PRs
from the ROB feeding into the list manager so that on a branch miss the
PRs can be freed up. (Only the PRs associated with the miss are freed.)
Three discarded PRs from commit also feed into the list manager so they
can be freed. Translating each PR to a bit seems like a lot of logic, and
feeding all the PRs from the ROB to the list manager seems a bit
impractical to me. It can be done with the smallish 16-entry ROB, but for
a larger ROB the freeing may have to be split up or another means found.
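
A software model of the bit-list free list might look like the sketch below. It is not the actual hardware: the class and method names are made up for illustration, and the two picks are done one after the other here, whereas hardware would run the find-first and find-last encoders in parallel.

```python
class FreeList:
    """Bit-list of physical registers; bit i set means PR i is free."""

    def __init__(self, num_prs):
        self.free = (1 << num_prs) - 1

    def pick_two(self):
        """Pick up to two free PRs: the lowest and the highest set bit."""
        picks = []
        if self.free:
            lo = (self.free & -self.free).bit_length() - 1  # find-first-one
            picks.append(lo)
            self.free &= ~(1 << lo)
        if self.free:
            hi = self.free.bit_length() - 1                 # find-last-one
            picks.append(hi)
            self.free &= ~(1 << hi)
        return picks

    def release(self, prs):
        """Return PRs to the list (from commit or a branch miss)."""
        for pr in prs:
            self.free |= 1 << pr   # the PR-to-bit translation step


fl = FreeList(8)
print(fl.pick_two())   # lowest and highest free PR
```

Picking from opposite ends keeps the two selections from colliding without a second priority encoder that has to wait for the first.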

RAT:

A register alias table is being used to track the mappings of ARs to
PRs. It uses two maps: speculative and committed. On instruction enqueue,
the speculative mappings are updated. On commit, the committed mappings
are updated, and on a pipeline flush the committed map is copied to the
speculative map.
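
The two-map scheme can be sketched as follows. The reset state (AR i mapped to PR i) and the method names are assumptions for illustration only.

```python
class RAT:
    """Register alias table with speculative and committed maps."""

    def __init__(self, num_ars):
        # Assumption: at reset, AR i maps to PR i in both maps.
        self.spec = list(range(num_ars))
        self.committed = list(range(num_ars))

    def enqueue(self, ar, new_pr):
        """Rename at enqueue; returns the PR the AR mapped to before."""
        old = self.spec[ar]
        self.spec[ar] = new_pr
        return old

    def commit(self, ar, pr):
        """Update the committed map; returns the PR that is now dead."""
        old = self.committed[ar]
        self.committed[ar] = pr
        return old

    def flush(self):
        """Pipeline flush: discard speculation, restore committed state."""
        self.spec = list(self.committed)


rat = RAT(4)
rat.enqueue(1, 10)      # speculative rename of AR 1
rat.flush()             # mispredict before commit
print(rat.spec[1])      # back to the committed mapping
```

The value returned by `commit` is the previous PR for that AR, which is the one that can go back to the free list.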

Register file:

I’ve reduced the number of read ports by not supporting the vector
stuff. There are now only 18 read ports: six groups of three.

ROB:

The ROB acts like a CAM to store both the aRN and pRN for the target
register. The aRN is needed to know which previous pRN to free on
commit. For source operands only the pRN is stored.

Re: Tonight's tradeoff

<ujrfaa$2h1v9$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35202&group=comp.arch#35202

From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Fri, 24 Nov 2023 19:32:09 -0500
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <ujrfaa$2h1v9$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
<ujb40q$3eepe$1@dont-email.me>
In-Reply-To: <ujb40q$3eepe$1@dont-email.me>
 by: Robert Finch - Sat, 25 Nov 2023 00:32 UTC

On 2023-11-18 2:41 p.m., Robert Finch wrote:
Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
64kB page can handle 512MB of mappings. Tonight’s trade-off is how many
root pointers to support. With a 12-bit ASID, 4096 root pointers are
required to link to the mapping tables with one root pointer for each
address space. A 512MB space is probably sufficient for a large number
of apps. That means a TLB update needs only a single root pointer lookup
followed by a translation lookup in a single memory page, so there is not
much for the table walker to do. The 4096 root pointers use two
block RAMs and require an 8192-byte address space for update, assuming a
32-bit physical address space (a 16-bit root page number).
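
The single-level walk described here could be modelled roughly as below. The 8-byte PTE size follows from 64kB / 8192 entries; the PTE contents (a valid flag and a physical page number) and the dictionary standing in for memory are hypothetical, since the post does not give a PTE format.

```python
PAGE_BITS = 16    # 64kB pages
INDEX_BITS = 13   # 8192 PTEs per table page
PTE_BYTES = 8     # 64kB / 8192 entries


def translate(asid, vaddr, root_table, memory):
    """Return the physical address, or None if there is no valid mapping.

    root_table: list of root page numbers, indexed by ASID.
    memory:     dict from PTE byte address to (valid, ppn); hypothetical.
    """
    if vaddr >> (PAGE_BITS + INDEX_BITS):
        return None  # beyond the 512MB one table page can map
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    # One root pointer lookup, then one PTE fetch from a single page.
    pte_addr = (root_table[asid] << PAGE_BITS) + vpn * PTE_BYTES
    pte = memory.get(pte_addr)
    if pte is None or not pte[0]:
        return None
    return (pte[1] << PAGE_BITS) | offset


root_table = [0] * 4096                  # one root pointer per 12-bit ASID
root_table[5] = 3                        # ASID 5 has its table in page 3
memory = {(3 << 16) + 2 * 8: (True, 0x123)}
print(hex(translate(5, (2 << 16) | 0x44, root_table, memory)))
```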

An IO mapped area of 64kB is available for root pointer memory. 16 block
RAMs could be set up in this area, which would allow 8 root pointers for
each address space. Three bits of the virtual address space could then
be mapped using root pointers. If the root pointer just points to a
single level of page tables, then a 4GB (32-bit) space could be mapped.
I am mulling over whether it is worth it to support the additional root
pointers. It is a chunk of block RAM memory that might be better spent
elsewhere.

If I use an 11-bit ASID, all the root pointers could be present in a
single block RAM. So, the design choices are an 11- or 12-bit ASID and 1
or 8 root pointers per address space.

My thought is to have only a single root pointer per space, and organize
the root pointer table as if there were 32 bits for the pointer. This
would allow the mapping tables to be placed anywhere in a 48-bit physical
address space. The RAM could be mapped so that the high-order bits of the
pointer are assumed to be zero. The system could get by using a single
block RAM if the location of the mapping tables were restricted to a 16MB
address range. Eight-bit pointers could be used then.
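
The options can be compared with a little arithmetic. The sketch below assumes a block RAM holds 4096 bytes, which is inferred from the figures in the post (4096 16-bit pointers taking two block RAMs, 16 block RAMs filling 64kB); the function names are illustrative.

```python
PAGE_OFFSET_BITS = 16   # 64kB pages
PTES_PER_PAGE = 8192
BRAM_BYTES = 4096       # assumption inferred from the post's figures


def table_bytes(asid_bits, ptrs_per_space, ptr_bits):
    """Size of the root pointer table in bytes."""
    return (1 << asid_bits) * ptrs_per_space * ptr_bits // 8


def block_rams(asid_bits, ptrs_per_space, ptr_bits):
    """Block RAMs needed for the root pointer table (rounded up)."""
    return -(-table_bytes(asid_bits, ptrs_per_space, ptr_bits) // BRAM_BYTES)


def table_reach_bytes(ptr_bits):
    """Physical range in which mapping tables can be placed."""
    return 1 << (ptr_bits + PAGE_OFFSET_BITS)


def mapped_bytes(ptrs_per_space):
    """Virtual space covered per address space with one table level."""
    return ptrs_per_space * PTES_PER_PAGE * (1 << PAGE_OFFSET_BITS)


for asid_bits, ptrs, ptr_bits in [(12, 1, 16), (12, 8, 16), (11, 1, 16), (12, 1, 8)]:
    print(asid_bits, ptrs, ptr_bits,
          block_rams(asid_bits, ptrs, ptr_bits), "BRAM,",
          table_reach_bytes(ptr_bits) >> 20, "MB table reach,",
          mapped_bytes(ptrs) >> 20, "MB mapped")
```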

Given that it is a small system, with only 512MB of DRAM, I think it
best to keep the page-table-walker simple, and use the minimum amount of
BRAM (1).

Re: Tonight's tradeoff

<987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35203&group=comp.arch#35203

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 25 Nov 2023 01:00:29 +0000
Organization: novaBBS
Message-ID: <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me>
 by: MitchAlsup - Sat, 25 Nov 2023 01:00 UTC

Robert Finch wrote:

> On 2023-11-18 2:41 p.m., Robert Finch wrote:
> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how many
> root pointers to support. With a 12-bit ASID, 4096 root pointers are
> required to link to the mapping tables with one root pointer for each
> address space.

So, you associate a single ROOT pointer VALUE with an ASID, and manage
in SW who gets that ROOT pointer VALUE; using ASID as an index into
Virtual Address Spaces.

How is this usefully different than only using the ASID to qualify TLB
results ?? <Was this TLB entry installed from the same ASID as is accessing
right now>. And using ASID as an index into any array might lead to some
conundrum down the road apiece.

Secondarily, SUN started out with 12-bit ASID and went to 16-bits just about
as fast as they could--even before main memories went bigger than 4GB.

Re: Tonight's tradeoff

<ujrm4a$2llie$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35205&group=comp.arch#35205

From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Fri, 24 Nov 2023 21:28:25 -0500
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <ujrm4a$2llie$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
<ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me>
<987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
In-Reply-To: <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
 by: Robert Finch - Sat, 25 Nov 2023 02:28 UTC

On 2023-11-24 8:00 p.m., MitchAlsup wrote:
> Robert Finch wrote:
>
>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how
>> many root pointers to support. With a 12-bit ASID, 4096 root pointers
>> are required to link to the mapping tables with one root pointer for
>> each address space.
>
> So, you associate a single ROOT pointer VALUE with an ASID, and manage
> in SW who gets that ROOT pointer VALUE; using ASID as an index into
> Virtual Address Spaces.
>
> How is this usefully different that only using the ASID to qualify TLB
> results ?? <Was this TLB entry installed from the same ASID as is accessing
> right now>. And using ASID as an index into any array might lead to some
> conundrum down the road a apiece.
>
> Secondarily, SUN started out with 12-bit ASID and went to 16-bits just
> about
> as fast as they could--even before main memories went bigger than 4GB.

I view the address space as an entity in its own right to be managed by
the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
identifies the address space has a life outside of just the TLB. I may
be increasing the typical scope of an ASID.

It is the same idea as using the ASID to qualify TLB entries, except
that it qualifies the root pointer as well. So, the root pointer does
not need to be switched by software. Once the root pointer is set for
the AS it simply sits there statically until the AS is reused.

I am using the ASID like a process ID. So, the root pointer register
does not need to be reset on a task switch. Address spaces may not be
mapped 1:1 with processes. An address space may outlive a task if it is
shared with another task. So, I do not want to use the PID to
distinguish tables. This assumes the address space will not be freed up
and reused by another task, if there are tasks using the ASID.

4096 address spaces is a lot. But if using a 16-bit ASID it would no
longer be practical to store a root pointer per ASID in a table.
Instead, the root pointer would have to be managed by software as is
normally done.

I am wondering why the 16-bit ASID? 256 address spaces in 256 processes? I
suspect it is just because 16 bits is easier to pass around/calculate in
an HLL than some other value like 14 bits. Are 65536 address spaces
really needed?

Re: Tonight's tradeoff

<ujrouu$2m1cb$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35208&group=comp.arch#35208

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Fri, 24 Nov 2023 21:16:43 -0600
Organization: A noiseless patient Spider
Lines: 75
Message-ID: <ujrouu$2m1cb$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
<ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me>
<987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
<ujrm4a$2llie$1@dont-email.me>
In-Reply-To: <ujrm4a$2llie$1@dont-email.me>
 by: BGB - Sat, 25 Nov 2023 03:16 UTC

On 11/24/2023 8:28 PM, Robert Finch wrote:
> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>> Robert Finch wrote:
>>
>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A
>>> single 64kB page can handle 512MB of mappings. Tonight’s trade-off is
>>> how many root pointers to support. With a 12-bit ASID, 4096 root
>>> pointers are required to link to the mapping tables with one root
>>> pointer for each address space.
>>
>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>> Virtual Address Spaces.
>>
>> How is this usefully different that only using the ASID to qualify TLB
>> results ?? <Was this TLB entry installed from the same ASID as is
>> accessing
>> right now>. And using ASID as an index into any array might lead to some
>> conundrum down the road a apiece.
>>
>> Secondarily, SUN started out with 12-bit ASID and went to 16-bits just
>> about
>> as fast as they could--even before main memories went bigger than 4GB.
>
> I view the address space as an entity in it own right to be managed by
> the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
> identifies the address space has a life outside of just the TLB. I may
> be increasing the typical scope of an ASID.
>
> It is the same idea as using the ASID to qualify TLB entries, except
> that it qualifies the root pointer as well. So, the root pointer does
> not need to be switched by software. Once the root pointer is set for
> the AS it simply sits there statically until the AS is reused.
>
> I am using the ASID like a process ID. So, the root pointer register
> does not need to be reset on a task switch. Address spaces may not be
> mapped 1:1 with processes. An address space may outlive a task if it is
> shared with another task. So, I do not want to use the PID to
> distinguish tables. This assumes the address space will not be freed up
> and reused by another task, if there are tasks using the ASID.
>
> 4096 address spaces is a lot. But if using a 16-bit ASID it would no
> longer be practical to store a root pointer per ASID in a table.
> Instead, the root pointer would have to be managed by software as is
> normally done.
>
> I am wondering why the 16-bit ASID? 256 address spaces in 256 process? I
> suspect it is just because 16-bit is easier to pass around/calculate in
> a HLL than some other value like 14-bits. Are 65536 address spaces
> really needed?
>

If one assumes one address space per PID, then one is going to hit a
limit of 4K a lot faster than 64K, and when one hits the limit, there is
no good way to "reclaim" previously used address spaces short of
flushing the TLB to be sure that no entries from that space remain in
the TLB (ASID thrashing is likely to be relatively expensive to deal
with as a result).
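
To put rough numbers on the thrashing concern, here is a toy allocator that hands out ASIDs in order and flushes the whole TLB whenever it wraps. This is a deliberately simplified model, not any particular OS's scheme: it ignores address spaces that are still live at wrap time, and the flush callback is hypothetical.

```python
class AsidAllocator:
    """Hands out ASIDs sequentially; flushes the TLB on wraparound."""

    def __init__(self, asid_bits, flush_tlb):
        self.limit = 1 << asid_bits
        self.next_asid = 0
        self.flush_tlb = flush_tlb   # hypothetical callback
        self.flushes = 0

    def allocate(self):
        if self.next_asid == self.limit:
            # Every ASID has been used once. None can be reused safely
            # until no stale entries for it can remain in the TLB.
            self.flush_tlb()
            self.flushes += 1
            self.next_asid = 0
        asid = self.next_asid
        self.next_asid += 1
        return asid


def flushes_for(asid_bits, spaces_created):
    alloc = AsidAllocator(asid_bits, flush_tlb=lambda: None)
    for _ in range(spaces_created):
        alloc.allocate()
    return alloc.flushes


print(flushes_for(12, 10000), flushes_for(16, 10000))
```

In this model, creating 10000 address spaces forces two full flushes with a 12-bit ASID and none with a 16-bit one.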

Well, along with other things, like if/how to allow "Global" pages:
True global pages are likely a foot gun, as there is no way to exclude
them from a given process (where there may be a need to do so);
Disallowing global pages entirely means higher TLB miss rates because no
processes can share TLB entries.

One option seems to be, say, that a few of the high-order bits of the
ASID could be used as a "page group", with global pages only applying
within a single page-group (possibly with one of the page groups being
designated as "No global pages allowed").
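
The page-group idea amounts to a slightly wider TLB match rule, which might be sketched as follows. The 16-bit ASID, the 2-bit group field, and the choice of group 0 as the "no global pages" group are all assumptions made for illustration.

```python
ASID_BITS = 16
GROUP_BITS = 2        # assumption: "a few" high-order bits
NO_GLOBAL_GROUP = 0   # assumption: group 0 never matches global pages


def group_of(asid):
    """Page group is taken from the high-order bits of the ASID."""
    return asid >> (ASID_BITS - GROUP_BITS)


def tlb_hit(entry_asid, entry_global, entry_vpn, cur_asid, vpn):
    """Match rule: exact ASID, or a global entry from the same group."""
    if entry_vpn != vpn:
        return False
    if entry_asid == cur_asid:
        return True
    if not entry_global:
        return False
    group = group_of(cur_asid)
    return group != NO_GLOBAL_GROUP and group == group_of(entry_asid)


# A global entry installed by ASID 0x4001 is visible to 0x4002 (same
# group) but not to 0x8002 (different group).
print(tlb_hit(0x4001, True, 7, 0x4002, 7), tlb_hit(0x4001, True, 7, 0x8002, 7))
```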

....

Re: Tonight's tradeoff

<ujrqqk$2m71v$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35209&group=comp.arch#35209

From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Fri, 24 Nov 2023 22:48:35 -0500
Organization: A noiseless patient Spider
Lines: 89
Message-ID: <ujrqqk$2m71v$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
<ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me>
<987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
<ujrm4a$2llie$1@dont-email.me> <ujrouu$2m1cb$1@dont-email.me>
In-Reply-To: <ujrouu$2m1cb$1@dont-email.me>
 by: Robert Finch - Sat, 25 Nov 2023 03:48 UTC

On 2023-11-24 10:16 p.m., BGB wrote:
> On 11/24/2023 8:28 PM, Robert Finch wrote:
>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>>> Robert Finch wrote:
>>>
>>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A
>>>> single 64kB page can handle 512MB of mappings. Tonight’s trade-off
>>>> is how many root pointers to support. With a 12-bit ASID, 4096 root
>>>> pointers are required to link to the mapping tables with one root
>>>> pointer for each address space.
>>>
>>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>>> Virtual Address Spaces.
>>>
>>> How is this usefully different that only using the ASID to qualify TLB
>>> results ?? <Was this TLB entry installed from the same ASID as is
>>> accessing
>>> right now>. And using ASID as an index into any array might lead to some
>>> conundrum down the road a apiece.
>>>
>>> Secondarily, SUN started out with 12-bit ASID and went to 16-bits
>>> just about
>>> as fast as they could--even before main memories went bigger than 4GB.
>>
>> I view the address space as an entity in it own right to be managed by
>> the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
>> identifies the address space has a life outside of just the TLB. I may
>> be increasing the typical scope of an ASID.
>>
>> It is the same idea as using the ASID to qualify TLB entries, except
>> that it qualifies the root pointer as well. So, the root pointer does
>> not need to be switched by software. Once the root pointer is set for
>> the AS it simply sits there statically until the AS is reused.
>>
>> I am using the ASID like a process ID. So, the root pointer register
>> does not need to be reset on a task switch. Address spaces may not be
>> mapped 1:1 with processes. An address space may outlive a task if it
>> is shared with another task. So, I do not want to use the PID to
>> distinguish tables. This assumes the address space will not be freed
>> up and reused by another task, if there are tasks using the ASID.
>>
>> 4096 address spaces is a lot. But if using a 16-bit ASID it would no
>> longer be practical to store a root pointer per ASID in a table.
>> Instead, the root pointer would have to be managed by software as is
>> normally done.
>>
>> I am wondering why the 16-bit ASID? 256 address spaces in 256 process?
>> I suspect it is just because 16-bit is easier to pass around/calculate
>> in a HLL than some other value like 14-bits. Are 65536 address spaces
>> really needed?
>>
>
> If one assumes one address space per PID, then one is going to hit a
> limit of 4K a lot faster than 64K, and when one hits the limit, there is
> no good way to "reclaim" previously used address spaces short of
> flushing the TLB to be sure that no entries from that space remain in
> the TLB (ASID thrashing is likely to be relatively expensive to deal
> with as a result).
>
I see after reading several webpages that the root pointer is used to
point to only a single table for a process. This is not how I was doing
things. I have MMU tables for each address space as opposed to having
a table for the process. The process may have only a single address
space, or it may use several address spaces.

I am wondering why there is only a single table per process.
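
To make the per-address-space arrangement concrete, here is a minimal sketch of a root-pointer table indexed by ASID. It is not the actual Q+ design: the `mmu_t` structure, the function names, and the 8-byte root pointer size are all assumptions made for illustration. Only the 12-bit ASID width comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define ASID_BITS 12u                 /* 12-bit ASID, as discussed in the thread */
#define NUM_ASIDS (1u << ASID_BITS)   /* 4096 address spaces */

/* Hypothetical MMU state: one root pointer per address space. */
typedef struct {
    uint64_t root[NUM_ASIDS];  /* physical address of each top-level table */
    uint16_t current_asid;     /* the only field a task switch has to write */
} mmu_t;

/* Set once when the address space is created; it stays until the ASID is reused. */
static void mmu_set_root(mmu_t *mmu, uint16_t asid, uint64_t root_pa)
{
    assert(asid < NUM_ASIDS);
    mmu->root[asid] = root_pa;
}

/* Task switch: select the address space; no root pointer is reloaded. */
static void mmu_switch(mmu_t *mmu, uint16_t asid)
{
    assert(asid < NUM_ASIDS);
    mmu->current_asid = asid;
}

/* Root pointer a table walker would start from on a TLB miss. */
static uint64_t mmu_walk_root(const mmu_t *mmu)
{
    return mmu->root[mmu->current_asid];
}

/* Storage cost of the table in bytes, assuming 8-byte root pointers. */
static uint64_t root_table_bytes(unsigned asid_bits)
{
    return ((uint64_t)1 << asid_bits) * sizeof(uint64_t);
}
```

Under the 8-byte assumption the table costs 32 KB for a 12-bit ASID and 512 KB for a 16-bit ASID, which is the storage concern raised earlier about widening the ASID.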
>
>
> Well, along with other things, like if/how to allow "Global" pages:
> True global pages are likely a foot gun, as there is no way to exclude
> them from a given process (where there may be a need to do so);
> Disallowing global pages entirely means higher TLB miss rates because no
> processes can share TLB entries.
>
Global space can be assigned by designating an address space as a global
space and giving it an ASID. All processes wanting access to the global
space need only then use the MMU table for that ASID. E.g., use ASID 0 for
the global address space.

> One option seems to be, say, that a few of the high-order bits of the
> ASID could be used as a "page group", with global pages only applying
> within a single page-group (possibly with one of the page groups being
> designated as "No global pages allowed").
>
> ...
>

Re: Tonight's tradeoff

<NUp8N.22588$yAie.3100@fx44.iad>

https://news.novabbs.org/devel/article-flat.php?id=35215&group=comp.arch#35215

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx44.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Tonight's tradeoff
Newsgroups: comp.arch
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
Lines: 28
Message-ID: <NUp8N.22588$yAie.3100@fx44.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 25 Nov 2023 17:11:09 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 25 Nov 2023 17:11:09 GMT
X-Received-Bytes: 2396
 by: Scott Lurndal - Sat, 25 Nov 2023 17:11 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>Robert Finch wrote:
>
>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how many
>> root pointers to support. With a 12-bit ASID, 4096 root pointers are
>> required to link to the mapping tables with one root pointer for each
>> address space.
>
>So, you associate a single ROOT pointer VALUE with an ASID, and manage
>in SW who gets that ROOT pointer VALUE; using ASID as an index into
>Virtual Address Spaces.
>
>How is this usefully different that only using the ASID to qualify TLB
>results ?? <Was this TLB entry installed from the same ASID as is accessing
>right now>. And using ASID as an index into any array might lead to some
>conundrum down the road a apiece.
>
>Secondarily, SUN started out with 12-bit ASID and went to 16-bits just about
>as fast as they could--even before main memories went bigger than 4GB.

Yeah, the ARMv8 ASID was originally 8-bit, and 16-bit was added even before the spec was dry.

I don't see a benefit to tying the ASID (or VMID for that matter) to
the root of the page table. Especially with the common split
address spaces (ARMv8 has a root pointer for each half of the VA space,
for example, where the upper half is shared by all schedulable entities).

Re: Tonight's tradeoff

<b_p8N.22589$yAie.8187@fx44.iad>

https://news.novabbs.org/devel/article-flat.php?id=35216&group=comp.arch#35216

Path: i2pn2.org!i2pn.org!news.swapon.de!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx44.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Tonight's tradeoff
Newsgroups: comp.arch
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me>
Lines: 60
Message-ID: <b_p8N.22589$yAie.8187@fx44.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 25 Nov 2023 17:16:55 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 25 Nov 2023 17:16:55 GMT
X-Received-Bytes: 3719
 by: Scott Lurndal - Sat, 25 Nov 2023 17:16 UTC

Robert Finch <robfi680@gmail.com> writes:
>On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>> Robert Finch wrote:
>>
>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how
>>> many root pointers to support. With a 12-bit ASID, 4096 root pointers
>>> are required to link to the mapping tables with one root pointer for
>>> each address space.
>>
>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>> Virtual Address Spaces.
>>
>> How is this usefully different that only using the ASID to qualify TLB
>> results ?? <Was this TLB entry installed from the same ASID as is accessing
>> right now>. And using ASID as an index into any array might lead to some
>> conundrum down the road a apiece.
>>
>> Secondarily, SUN started out with 12-bit ASID and went to 16-bits just
>> about
>> as fast as they could--even before main memories went bigger than 4GB.
>
>I view the address space as an entity in it own right to be managed by
>the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
>identifies the address space has a life outside of just the TLB. I may
>be increasing the typical scope of an ASID.
>
>It is the same idea as using the ASID to qualify TLB entries, except
>that it qualifies the root pointer as well. So, the root pointer does
>not need to be switched by software. Once the root pointer is set for
>the AS it simply sits there statically until the AS is reused.
>
>I am using the ASID like a process ID. So, the root pointer register
>does not need to be reset on a task switch. Address spaces may not be
>mapped 1:1 with processes. An address space may outlive a task if it is
>shared with another task. So, I do not want to use the PID to
>distinguish tables. This assumes the address space will not be freed up
>and reused by another task, if there are tasks using the ASID.
>
>4096 address spaces is a lot. But if using a 16-bit ASID it would no
>longer be practical to store a root pointer per ASID in a table.
>Instead, the root pointer would have to be managed by software as is
>normally done.
>
>I am wondering why the 16-bit ASID? 256 address spaces in 256 process? I
>suspect it is just because 16-bit is easier to pass around/calculate in
>a HLL than some other value like 14-bits. Are 65536 address spaces
>really needed?
>

256 is far too small.

$ ps -ef | wc -l
709

Every time the ASID overflows, the system must basically flush
all the caches system-wide. On an 80 processor system, that's a lot of
overhead.
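
A rough sketch of why a narrow ASID hurts: a simple rollover allocator that hands out ASIDs in order and has to invalidate everything whenever the space wraps. This is one common way to handle exhaustion, not a description of any particular kernel. The `asid_alloc_t` type, the function names, and the reservation of ASID 0 are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rollover allocator; all names here are illustrative. */
typedef struct {
    uint32_t limit;       /* number of ASID values: 1 << asid_bits */
    uint32_t next;        /* next unused ASID in the current generation */
    uint64_t generation;  /* incremented each time the ASID space wraps */
    uint64_t flushes;     /* stand-in for system-wide TLB invalidations */
} asid_alloc_t;

static void asid_init(asid_alloc_t *a, unsigned asid_bits)
{
    assert(asid_bits >= 1 && asid_bits <= 16);
    a->limit = 1u << asid_bits;
    a->next = 1;          /* assumption: ASID 0 is reserved */
    a->generation = 0;
    a->flushes = 0;
}

/* Hand out the next ASID; on wrap, start a new generation and count a flush. */
static uint32_t asid_alloc(asid_alloc_t *a)
{
    if (a->next == a->limit) {
        a->generation++;
        a->flushes++;     /* every stale translation must go before reuse */
        a->next = 1;
    }
    return a->next++;
}

/* Count the flushes needed just to give each address space one ASID. */
static uint64_t flushes_for(unsigned asid_bits, unsigned address_spaces)
{
    asid_alloc_t a;
    asid_init(&a, asid_bits);
    for (unsigned i = 0; i < address_spaces; i++)
        (void)asid_alloc(&a);
    return a.flushes;
}
```

Taking the 709 from the `ps` output above as a rough count of address spaces, `flushes_for(8, 709)` wraps twice before every process has even run once, while 12-bit and 16-bit ASIDs do not wrap at all. Process churn on a real system would add further wraps over time.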

Re: Tonight's tradeoff

<C1q8N.22590$yAie.1858@fx44.iad>

https://news.novabbs.org/devel/article-flat.php?id=35217&group=comp.arch#35217

Path: i2pn2.org!i2pn.org!news.nntp4.net!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx44.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Tonight's tradeoff
Newsgroups: comp.arch
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me> <ujrouu$2m1cb$1@dont-email.me> <ujrqqk$2m71v$1@dont-email.me>
Lines: 27
Message-ID: <C1q8N.22590$yAie.1858@fx44.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 25 Nov 2023 17:20:34 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 25 Nov 2023 17:20:34 GMT
X-Received-Bytes: 2384
 by: Scott Lurndal - Sat, 25 Nov 2023 17:20 UTC

Robert Finch <robfi680@gmail.com> writes:
>On 2023-11-24 10:16 p.m., BGB wrote:
>> On 11/24/2023 8:28 PM, Robert Finch wrote:
>>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:

>>
>> If one assumes one address space per PID, then one is going to hit a
>> limit of 4K a lot faster than 64K, and when one hits the limit, there is
>> no good way to "reclaim" previously used address spaces short of
>> flushing the TLB to be sure that no entries from that space remain in
>> the TLB (ASID thrashing is likely to be relatively expensive to deal
>> with as a result).
>>
>I see after reading several webpages that the root pointer is used to
>point to only a single table for a process. This is not how I was doing
>things. I have a MMU tables for each address space as opposed to having
>a table for the process. The process may have only a single address
>space, or it may use several address spaces.
>
>I am wondering why there is only a single table per process.

There are actually two in most operating systems - the lower half
of the VA space is owned by the user-mode code in the process and
the upper half is shared by all processes and used by the
operating system on behalf of the process. For Intel/AMD, the
kernel manages both halves; for ARMv8, each half has a completely
distinct and separate root pointer (at each exception level).
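
As an illustration of the split arrangement, the sketch below picks a root pointer from the top bit of a 64-bit virtual address. It is a simplification and not the ARMv8 rule itself, which depends on the configured address size. The `roots_t` type and function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of root pointers, one per half of the VA space. */
typedef struct {
    uint64_t user_root;    /* lower half: private to the current process */
    uint64_t kernel_root;  /* upper half: shared by every process */
} roots_t;

/* Simplifying assumption: bit 63 alone decides which half a VA falls in. */
static uint64_t select_root(const roots_t *r, uint64_t va)
{
    assert(r != 0);
    return (va >> 63) ? r->kernel_root : r->user_root;
}

/* A process switch replaces only the lower-half root. */
static void switch_user_root(roots_t *r, uint64_t new_user_root)
{
    assert(r != 0);
    r->user_root = new_user_root;
}
```

Because the upper-half root never changes across a process switch, translations for the shared half do not need to be tied to any one process.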

Re: Tonight's tradeoff

<ujtcmg$2sse1$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35218&group=comp.arch#35218

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 25 Nov 2023 11:59:42 -0600
Organization: A noiseless patient Spider
Lines: 148
Message-ID: <ujtcmg$2sse1$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad>
<643607718b82ff03ae09d2b661963223@news.novabbs.com>
<uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
<ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me>
<987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
<ujrm4a$2llie$1@dont-email.me> <ujrouu$2m1cb$1@dont-email.me>
<ujrqqk$2m71v$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 25 Nov 2023 17:59:44 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="7453bc91ed922c1bfb3262e310d4156c";
logging-data="3043777"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/gM8+UEPHwMwa1BsTRwOz"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:RJqhwlLWPHKM3ifM6nH7SGcIhRM=
Content-Language: en-US
In-Reply-To: <ujrqqk$2m71v$1@dont-email.me>
 by: BGB - Sat, 25 Nov 2023 17:59 UTC

On 11/24/2023 9:48 PM, Robert Finch wrote:
> On 2023-11-24 10:16 p.m., BGB wrote:
>> On 11/24/2023 8:28 PM, Robert Finch wrote:
>>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>>>> Robert Finch wrote:
>>>>
>>>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A
>>>>> single 64kB page can handle 512MB of mappings. Tonight’s trade-off
>>>>> is how many root pointers to support. With a 12-bit ASID, 4096 root
>>>>> pointers are required to link to the mapping tables with one root
>>>>> pointer for each address space.
>>>>
>>>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>>>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>>>> Virtual Address Spaces.
>>>>
>>>> How is this usefully different that only using the ASID to qualify TLB
>>>> results ?? <Was this TLB entry installed from the same ASID as is
>>>> accessing
>>>> right now>. And using ASID as an index into any array might lead to
>>>> some
>>>> conundrum down the road a apiece.
>>>>
>>>> Secondarily, SUN started out with 12-bit ASID and went to 16-bits
>>>> just about
>>>> as fast as they could--even before main memories went bigger than 4GB.
>>>
>>> I view the address space as an entity in it own right to be managed
>>> by the MMU. ASIDs and address spaces should be mapped 1:1. The ASID
>>> that identifies the address space has a life outside of just the TLB.
>>> I may be increasing the typical scope of an ASID.
>>>
>>> It is the same idea as using the ASID to qualify TLB entries, except
>>> that it qualifies the root pointer as well. So, the root pointer does
>>> not need to be switched by software. Once the root pointer is set for
>>> the AS it simply sits there statically until the AS is reused.
>>>
>>> I am using the ASID like a process ID. So, the root pointer register
>>> does not need to be reset on a task switch. Address spaces may not be
>>> mapped 1:1 with processes. An address space may outlive a task if it
>>> is shared with another task. So, I do not want to use the PID to
>>> distinguish tables. This assumes the address space will not be freed
>>> up and reused by another task, if there are tasks using the ASID.
>>>
>>> 4096 address spaces is a lot. But if using a 16-bit ASID it would no
>>> longer be practical to store a root pointer per ASID in a table.
>>> Instead, the root pointer would have to be managed by software as is
>>> normally done.
>>>
>>> I am wondering why the 16-bit ASID? 256 address spaces in 256
>>> process? I suspect it is just because 16-bit is easier to pass
>>> around/calculate in a HLL than some other value like 14-bits. Are
>>> 65536 address spaces really needed?
>>>
>>
>> If one assumes one address space per PID, then one is going to hit a
>> limit of 4K a lot faster than 64K, and when one hits the limit, there
>> is no good way to "reclaim" previously used address spaces short of
>> flushing the TLB to be sure that no entries from that space remain in
>> the TLB (ASID thrashing is likely to be relatively expensive to deal
>> with as a result).
>>
> I see after reading several webpages that the root pointer is used to
> point to only a single table for a process. This is not how I was doing
> things. I have a MMU tables for each address space as opposed to having
> a table for the process. The process may have only a single address
> space, or it may use several address spaces.
>
> I am wondering why there is only a single table per process.

I went the opposite route of one big address space, with the idea of
allowing memory protection within this address space via the VUGID/ACL
mechanism. There is a KRR, or Keyring Register, which holds up to 4 keys
that may be used for ACL checking, granting an access if it is allowed
by at least one of the keys; triggering an ISR on miss similar to the
TLB. In this case, the conceptual model is more similar to that
typically used in filesystems.

But, I also have a 16-bit ASID...

As-is, there is at most one set of page tables per address space, or
per-process if processes are given different address spaces.

>>
>>
>> Well, along with other things, like if/how to allow "Global" pages:
>> True global pages are likely a foot gun, as there is no way to exclude
>> them from a given process (where there may be a need to do so);
>> Disallowing global pages entirely means higher TLB miss rates because
>> no processes can share TLB entries.
>>
> Global space can be assigned by designating an address space as a global
> space and giving it an ASID. All process wanting access to the global
> space need only then use the MMU table for that ASID. Eg. use ASID 0 for
> the global address space.
>

Had considered this, but there is a problem:
What if you have a process that you *don't* want to be able to see into
this global space?...

Though, this is where the idea of page-grouping can come in, say, the
ASID becomes:
gggg-pppp-pppp-pppp

Where:
0000 is visible to all of 0zzz
1000 is visible to all of 1zzz
...
Except:
Fzzz, this group does not have any global pages (all one-off ASIDs).

Or, also possible, is a 2.14-bit split.
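
The 4.12 split above can be sketched as a TLB tag comparison. This is only one reading of the scheme: the function names and constants are made up for illustration. The rule assumed is that the ASID with all-zero low bits in a group acts as that group's global space, and that group F has none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GROUP_SHIFT     12u      /* gggg-pppp-pppp-pppp: top 4 bits are the group */
#define GROUP_MASK      0xF000u
#define NO_GLOBAL_GROUP 0xFu     /* group F: one-off ASIDs, no global pages */

static unsigned asid_group(uint16_t asid)
{
    return (unsigned)asid >> GROUP_SHIFT;
}

/* Does a TLB entry tagged entry_asid apply to the currently running cur_asid? */
static bool tlb_asid_match(uint16_t entry_asid, uint16_t cur_asid)
{
    assert(asid_group(cur_asid) <= 0xFu);
    if (entry_asid == cur_asid)
        return true;                        /* the process's own pages */
    if (asid_group(cur_asid) == NO_GLOBAL_GROUP)
        return false;                       /* group F never sees global pages */
    return entry_asid == (cur_asid & GROUP_MASK);  /* g000 is global within group g */
}
```

With this rule, pages tagged 0x0000 are visible to 0x0123 but not to 0x1ABC, and a process in group F sees only entries carrying its own ASID.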

>> One option seems to be, say, that a few of the high-order bits of the
>> ASID could be used as a "page group", with global pages only applying
>> within a single page-group (possibly with one of the page groups being
>> designated as "No global pages allowed").
>>
>> ...
>>
>

Meanwhile:
I went and bought 128GB of RAM, only to realize my PC doesn't work if
one tries to install the full 128GB (the BIOS boot-loops a bunch of
times, and then apparently concludes that there is only 3.5GB ...).

Does work at least if I install 3x 32GB sticks and 1x 16GB stick, giving
112GB. This breaks the pairing rules, but seems to be working.

....

Had I known this, could have spent half as much, and only upgraded to 96GB.

Seemingly MOBO/BIOS/... designers didn't anticipate someone sticking a
full 128GB in this thing?... (BIOS is dated from 2018).

Well, either this, or a hardware compatibility issue with one of the
cards?...

Re: Tonight's tradeoff

<d1f73b9de9ff6f86dac089ebd4bca037@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35222&group=comp.arch#35222

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 25 Nov 2023 19:31:13 +0000
Organization: novaBBS
Message-ID: <d1f73b9de9ff6f86dac089ebd4bca037@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2097450"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$KJwlSC2Xsee9PGneTY4Pzu2JPDV/atw1D.1AiwRRAqEUGAqSSs/.e
X-Spam-Level: *
 by: MitchAlsup - Sat, 25 Nov 2023 19:31 UTC

Robert Finch wrote:

> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>> Robert Finch wrote:
>>
>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how
>>> many root pointers to support. With a 12-bit ASID, 4096 root pointers
>>> are required to link to the mapping tables with one root pointer for
>>> each address space.
>>
>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>> Virtual Address Spaces.
>>
>> How is this usefully different that only using the ASID to qualify TLB
>> results ?? <Was this TLB entry installed from the same ASID as is accessing
>> right now>. And using ASID as an index into any array might lead to some
>> conundrum down the road a apiece.
>>
>> Secondarily, SUN started out with 12-bit ASID and went to 16-bits just
>> about
>> as fast as they could--even before main memories went bigger than 4GB.

> I view the address space as an entity in it own right to be managed by
> the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
> identifies the address space has a life outside of just the TLB. I may
> be increasing the typical scope of an ASID.

Consider the case where two different processes MMAP the same area
of memory.
Should they both end up using the same ASID ??
Should they both take extra TLB walks because they use different ASIDs ??
Should they use their own ASIDs for their own memory but a different ASID
for the shared memory ?? And how do you expect this to happen ??

> It is the same idea as using the ASID to qualify TLB entries, except
> that it qualifies the root pointer as well. So, the root pointer does
> not need to be switched by software. Once the root pointer is set for
> the AS it simply sits there statically until the AS is reused.

> I am using the ASID like a process ID. So, the root pointer register
> does not need to be reset on a task switch. Address spaces may not be
> mapped 1:1 with processes. An address space may outlive a task if it is
> shared with another task. So, I do not want to use the PID to
> distinguish tables. This assumes the address space will not be freed up
> and reused by another task, if there are tasks using the ASID.

> 4096 address spaces is a lot. But if using a 16-bit ASID it would no
> longer be practical to store a root pointer per ASID in a table.
> Instead, the root pointer would have to be managed by software as is
> normally done.

> I am wondering why the 16-bit ASID? 256 address spaces in 256 process? I
> suspect it is just because 16-bit is easier to pass around/calculate in
> a HLL than some other value like 14-bits. Are 65536 address spaces
> really needed?

Re: Tonight's tradeoff

<ce2e03aa9d8651401a05dae6c19bbacf@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35223&group=comp.arch#35223

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 25 Nov 2023 19:44:13 +0000
Organization: novaBBS
Message-ID: <ce2e03aa9d8651401a05dae6c19bbacf@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me> <ujrouu$2m1cb$1@dont-email.me> <ujrqqk$2m71v$1@dont-email.me> <C1q8N.22590$yAie.1858@fx44.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2098558"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$yK46pnlU5p0pRwuRqi8BHeEieGUTTPQQNcJyAJGEwMy9TVcofTLTS
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Sat, 25 Nov 2023 19:44 UTC

Scott Lurndal wrote:

> Robert Finch <robfi680@gmail.com> writes:
>>On 2023-11-24 10:16 p.m., BGB wrote:
>>> On 11/24/2023 8:28 PM, Robert Finch wrote:
>>>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:

>>>
>>> If one assumes one address space per PID, then one is going to hit a
>>> limit of 4K a lot faster than 64K, and when one hits the limit, there is
>>> no good way to "reclaim" previously used address spaces short of
>>> flushing the TLB to be sure that no entries from that space remain in
>>> the TLB (ASID thrashing is likely to be relatively expensive to deal
>>> with as a result).
>>>
>>I see after reading several webpages that the root pointer is used to
>>point to only a single table for a process. This is not how I was doing
>>things. I have a MMU tables for each address space as opposed to having
>>a table for the process. The process may have only a single address
>>space, or it may use several address spaces.
>>
>>I am wondering why there is only a single table per process.

> There is actually two in most operating systems - the lower half
> of the VA space is owned by the user-mode code in the process and
> the upper-half is shared by all processors and used by the
> operating system on behalf of the process. For Intel/AMD, the
> kernel manages both halves, for ARMv8, each half has a completely
> distinct and separate root pointer (at each exeception level).

My 66000 Architecture has 4 Root Pointers available at all instants
of time. The above was designed before the rise of HyperVisors and is
now showing its age. All 4 Root Pointers are used based on
privilege level::

               HOB=0                  HOB=1
Application :: Application 2-level    No Access
Guest OS    :: Application 2-level    Guest OS 2-level
Guest HV    :: Guest HV 1-level       Guest OS 2-level
Real HV     :: Guest HV 1-level       Real HV 1-level

The overhead of Application to Application is no higher than that
of Guest OS to a different Guest OS. On the machines with
VMENTER and VMEXIT, by contrast, the latter takes 10,000 cycles whereas
Application to Application is closer to 1,000 cycles. I want this down
in the 10-100 cycle range.
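
The root-pointer table above can be written as a small lookup, sketched below. This is a reading of the table and not My 66000 documentation: the enum names are invented, and HOB is assumed to mean the high-order bit (bit 63) of the virtual address.

```c
#include <assert.h>
#include <stdint.h>

/* Privilege levels, in the order the table lists them. */
typedef enum {
    PRIV_APPLICATION, PRIV_GUEST_OS, PRIV_GUEST_HV, PRIV_REAL_HV, PRIV_COUNT
} priv_t;

/* Which of the four root pointers translates the access, if any. */
typedef enum {
    ROOT_NONE,         /* "No Access" */
    ROOT_APPLICATION,  /* 2-level translation in the table */
    ROOT_GUEST_OS,     /* 2-level */
    ROOT_GUEST_HV,     /* 1-level */
    ROOT_REAL_HV       /* 1-level */
} root_t;

/* Rows are privilege levels; columns are HOB=0 and HOB=1. */
static const root_t root_select[PRIV_COUNT][2] = {
    [PRIV_APPLICATION] = { ROOT_APPLICATION, ROOT_NONE     },
    [PRIV_GUEST_OS]    = { ROOT_APPLICATION, ROOT_GUEST_OS },
    [PRIV_GUEST_HV]    = { ROOT_GUEST_HV,    ROOT_GUEST_OS },
    [PRIV_REAL_HV]     = { ROOT_GUEST_HV,    ROOT_REAL_HV  },
};

/* Assumption: HOB is bit 63 of the virtual address. */
static root_t pick_root(priv_t p, uint64_t va)
{
    assert((unsigned)p < PRIV_COUNT);
    return root_select[p][va >> 63];
}
```

Each level sees its own root in one half of the address space and the root of the level it manages, or is managed by, in the other half.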

The exception <stack> system is designed to allow Guest HV to
recover a Guest OS that takes page faults while servicing ISRs
(and the like).

The interrupt <stack> system is designed to allow the ISR to
RPC or softIRQ without having to look at the pending stack on
the way out. RTI looks at the pending stack and services the
highest pending RPC/softIRQ affinitized to the CPU with control.

The Interrupt dispatch system allows the CPU to continue running
instructions until the contending CPUs decide which interrupt
is claimed by which CPU (1::1) and then context switch to the
interrupt dispatcher.

Re: Tonight's tradeoff

<bps8N.150652$wvv7.7314@fx14.iad>

https://news.novabbs.org/devel/article-flat.php?id=35225&group=comp.arch#35225

Path: i2pn2.org!i2pn.org!news.swapon.de!news.in-chemnitz.de!3.eu.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Tonight's tradeoff
Newsgroups: comp.arch
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me> <d1f73b9de9ff6f86dac089ebd4bca037@news.novabbs.com>
Lines: 55
Message-ID: <bps8N.150652$wvv7.7314@fx14.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 25 Nov 2023 20:02:15 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 25 Nov 2023 20:02:15 GMT
X-Received-Bytes: 3313
 by: Scott Lurndal - Sat, 25 Nov 2023 20:02 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>Robert Finch wrote:
>
>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>>> Robert Finch wrote:
>>>
>>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>>>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how
>>>> many root pointers to support. With a 12-bit ASID, 4096 root pointers
>>>> are required to link to the mapping tables with one root pointer for
>>>> each address space.
>>>
>>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>>> Virtual Address Spaces.
>>>
>>> How is this usefully different that only using the ASID to qualify TLB
>>> results ?? <Was this TLB entry installed from the same ASID as is accessing
>>> right now>. And using ASID as an index into any array might lead to some
>>> conundrum down the road a apiece.
>>>
>>> Secondarily, SUN started out with 12-bit ASID and went to 16-bits just
>>> about
>>> as fast as they could--even before main memories went bigger than 4GB.
>
>> I view the address space as an entity in it own right to be managed by
>> the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
>> identifies the address space has a life outside of just the TLB. I may
>> be increasing the typical scope of an ASID.
>
>Consider the case where two different processes MMAP the same area
>of memory.

In which case, the area of memory would be mapped to different
virtual address ranges in each process, and thus naturally
consume two TLB entries.

FWIW, MAP_FIXED is specified as an optional feature by POSIX
and may not be supported by the OS at all.

Given various forms of ASLR being used, it's unlikely even in
two instances of the same executable that a call to mmap
with MAP_SHARED without MAP_FIXED would map the region at
the same virtual address in both processes.
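
A small sketch of the underlying point, using only POSIX calls: the same file mapped twice with MAP_SHARED and no address hint lands at two different virtual addresses that alias the same memory. For simplicity it does both mappings in one process, so it is a stand-in for the two-process case and not a demonstration of ASLR. The helper name `map_twice` is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the same temporary file twice with MAP_SHARED and no address hint.
 * Returns 0 on success and stores the two views in *a and *b. */
static int map_twice(size_t len, char **a, char **b)
{
    assert(len > 0);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }
    /* NULL hint and no MAP_FIXED: the kernel chooses each address. */
    *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);  /* the mappings stay valid after the descriptor is closed */
    if (*a == MAP_FAILED || *b == MAP_FAILED)
        return -1;
    return 0;
}
```

A write through one view is visible through the other even though the two pointers differ, so a TLB indexed by virtual address needs a separate entry for each view.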

>Should they both end up using the same ASID ??

They couldn't share an ASID assuming the TLB looks up by VA.

>Should they both take extra TLB walks because they use different ASIDs ??

Given the above, yes. It's likely they'll each be scheduled
on different cores anyway in any modern system.

Re: Tonight's tradeoff

<1cc3cef16ea12c020cb2fd81c9e0e365@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35228&group=comp.arch#35228

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 25 Nov 2023 20:40:11 +0000
Organization: novaBBS
Message-ID: <1cc3cef16ea12c020cb2fd81c9e0e365@news.novabbs.com>
References: <uis67u$fkj4$1@dont-email.me> <aUr4N.33009$BbXa.15163@fx16.iad> <643607718b82ff03ae09d2b661963223@news.novabbs.com> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me> <d1f73b9de9ff6f86dac089ebd4bca037@news.novabbs.com> <bps8N.150652$wvv7.7314@fx14.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2103362"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$sOKQczu8brdlAFw2S8NVH.HDp4jbTT3niYm27Ei62URbOMWI53lUW
X-Spam-Level: *
 by: MitchAlsup - Sat, 25 Nov 2023 20:40 UTC

Scott Lurndal wrote:

> mitchalsup@aol.com (MitchAlsup) writes:
>>Robert Finch wrote:
>>
>>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>>>> Robert Finch wrote:
>>>>
>>>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>>>>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how
>>>>> many root pointers to support. With a 12-bit ASID, 4096 root pointers
>>>>> are required to link to the mapping tables with one root pointer for
>>>>> each address space.
>>>>
>>>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>>>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>>>> Virtual Address Spaces.
>>>>
>>>> How is this usefully different than only using the ASID to qualify TLB
>>>> results ?? <Was this TLB entry installed from the same ASID as is accessing
>>>> right now>. And using ASID as an index into any array might lead to some
>>>> conundrum down the road apiece.
>>>>
>>>> Secondarily, SUN started out with 12-bit ASID and went to 16 bits just about
>>>> as fast as they could--even before main memories went bigger than 4GB.
>>
>>> I view the address space as an entity in its own right to be managed by
>>> the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
>>> identifies the address space has a life outside of just the TLB. I may
>>> be increasing the typical scope of an ASID.
>>
>>Consider the case where two different processes MMAP the same area
>>of memory.

> In which case, the area of memory would be mapped to different
> virtual address ranges in each process, and thus naturally
> consume two TLB entries.

mmap() first, fork() second. Now we have two processes with the
shared memory mapped at the same virtual address.

> FWIW, MAP_FIXED is specified as an optional feature by POSIX
> and may not be supported by the OS at all.

> Given various forms of ASLR being used, it's unlikely even in
> two instances of the same executable that a call to mmap
> with MAP_SHARED without MAP_FIXED would map the region at
> the same virtual address in both processes.

>>Should they both end up using the same ASID ??

> They couldn't share an ASID assuming the TLB looks up by VA.

>>Should they both take extra TLB walks because they use different ASIDs ??

> Given the above, yes. It's likely they'll each be scheduled
> on different cores anyway in any modern system.

Re: Tonight's tradeoff

<Y2u8N.108363$svP4.76046@fx12.iad>

https://news.novabbs.org/devel/article-flat.php?id=35232&group=comp.arch#35232

Path: i2pn2.org!i2pn.org!usenet.network!news.neodome.net!news.mixmin.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx12.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Tonight's tradeoff
Newsgroups: comp.arch
References: <uis67u$fkj4$1@dont-email.me> <uj1o0t$1kves$1@dont-email.me> <7761287e80bb22b7742fd7f292664497@news.novabbs.com> <uj9bm2$36401$1@dont-email.me> <71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com> <ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me> <987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com> <ujrm4a$2llie$1@dont-email.me> <d1f73b9de9ff6f86dac089ebd4bca037@news.novabbs.com> <bps8N.150652$wvv7.7314@fx14.iad> <1cc3cef16ea12c020cb2fd81c9e0e365@news.novabbs.com>
Lines: 64
Message-ID: <Y2u8N.108363$svP4.76046@fx12.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 25 Nov 2023 21:55:04 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 25 Nov 2023 21:55:04 GMT
X-Received-Bytes: 3823
 by: Scott Lurndal - Sat, 25 Nov 2023 21:55 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>Scott Lurndal wrote:
>
>> mitchalsup@aol.com (MitchAlsup) writes:
>>>Robert Finch wrote:
>>>
>>>> On 2023-11-24 8:00 p.m., MitchAlsup wrote:
>>>>> Robert Finch wrote:
>>>>>
>>>>>> On 2023-11-18 2:41 p.m., Robert Finch wrote:
>>>>>> Q+ uses 64kB memory pages containing 8192 PTEs to map memory. A single
>>>>>> 64kB page can handle 512MB of mappings. Tonight’s trade-off is how
>>>>>> many root pointers to support. With a 12-bit ASID, 4096 root pointers
>>>>>> are required to link to the mapping tables with one root pointer for
>>>>>> each address space.
>>>>>
>>>>> So, you associate a single ROOT pointer VALUE with an ASID, and manage
>>>>> in SW who gets that ROOT pointer VALUE; using ASID as an index into
>>>>> Virtual Address Spaces.
>>>>>
>>>>> How is this usefully different than only using the ASID to qualify TLB
>>>>> results ?? <Was this TLB entry installed from the same ASID as is accessing
>>>>> right now>. And using ASID as an index into any array might lead to some
>>>>> conundrum down the road apiece.
>>>>>
>>>>> Secondarily, SUN started out with 12-bit ASID and went to 16 bits just about
>>>>> as fast as they could--even before main memories went bigger than 4GB.
>>>
>>>> I view the address space as an entity in its own right to be managed by
>>>> the MMU. ASIDs and address spaces should be mapped 1:1. The ASID that
>>>> identifies the address space has a life outside of just the TLB. I may
>>>> be increasing the typical scope of an ASID.
>>>
>>>Consider the case where two different processes MMAP the same area
>>>of memory.
>
>> In which case, the area of memory would be mapped to different
>> virtual address ranges in each process, and thus naturally
>> consume two TLB entries.
>
>mmap() first, fork() second. Now we have two processes with the
>shared memory mapped at the same virtual address.

Yes, in that case they'll be mapped at the same VA. All of
the points below still apply so long as TLBs are per core.
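
For anyone who wants to see the mmap-then-fork case on a real system, here is a small sketch. It assumes MAP_ANONYMOUS is available (it is on Linux and the BSDs), and the function name shared_va_after_fork is made up for the example. The child stores its own view of the pointer into the shared page, and the parent checks that the store landed at the same virtual address it holds.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's store through the inherited pointer is visible
 * to the parent at the same virtual address, 0 if not, -1 on error. */
int shared_va_after_fork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* mmap() first: no MAP_FIXED, the kernel picks the address. */
    uintptr_t *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 0;

    /* fork() second: the child inherits the mapping at the same VA. */
    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        p[0] = (uintptr_t)p;  /* record the child's view of the address */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* The parent reads the child's store back through its own pointer. */
    int same = (p[0] == (uintptr_t)p);
    munmap(p, (size_t)page);
    return same;
}
```

Each process still has its own address space here, so with distinct ASIDs each one takes its own TLB miss on the shared page.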

>
>> FWIW, MAP_FIXED is specified as an optional feature by POSIX
>> and may not be supported by the OS at all.
>
>> Given various forms of ASLR being used, it's unlikely even in
>> two instances of the same executable that a call to mmap
>> with MAP_SHARED without MAP_FIXED would map the region at
>> the same virtual address in both processes.
>
>>>Should they both end up using the same ASID ??
>
>> They couldn't share an ASID assuming the TLB looks up by VA.
>
>>>Should they both take extra TLB walks because they use different ASIDs ??
>
>> Given the above, yes. It's likely they'll each be scheduled
>> on different cores anyway in any modern system.

Re: Tonight's tradeoff

<uju4k6$3040j$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35242&group=comp.arch#35242

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: robfi680@gmail.com (Robert Finch)
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Sat, 25 Nov 2023 19:48:06 -0500
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <uju4k6$3040j$1@dont-email.me>
References: <uis67u$fkj4$1@dont-email.me> <uj1o0t$1kves$1@dont-email.me>
<7761287e80bb22b7742fd7f292664497@news.novabbs.com>
<uj9bm2$36401$1@dont-email.me>
<71cb5ad7604b3d909df865a19ee3d52e@news.novabbs.com>
<ujb40q$3eepe$1@dont-email.me> <ujrfaa$2h1v9$1@dont-email.me>
<987455c358f93a9a7896c9af3d5f2b75@news.novabbs.com>
<ujrm4a$2llie$1@dont-email.me>
<d1f73b9de9ff6f86dac089ebd4bca037@news.novabbs.com>
<bps8N.150652$wvv7.7314@fx14.iad>
<1cc3cef16ea12c020cb2fd81c9e0e365@news.novabbs.com>
<Y2u8N.108363$svP4.76046@fx12.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 Nov 2023 00:48:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a2b5c38141bd1c6f19e5121613a6d88a";
logging-data="3149843"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AylcxuugAKrc4epTQQjJmxAVTgHvdYqA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:FwxHY0XpLfaEy2m8Gtzyx/aaio0=
Content-Language: en-US
In-Reply-To: <Y2u8N.108363$svP4.76046@fx12.iad>
 by: Robert Finch - Sun, 26 Nov 2023 00:48 UTC

Are top-level page directory pages shared between tasks? Suppose a task
needs a 32-bit address space. With one level of page maps, 27 bits are
accommodated, which leaves 5 bits of address translation to be done by
the page directory. Using a whole page, which can handle 11 address bits,
would be wasteful. But if root pointers could point into the same page
directory page, then the space would not be wasted. For instance, the root
pointer for task #1 could point at the first 32 entries, the root pointer
for task #2 could point at the next 32 entries, and so on.
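
The arithmetic behind that packing can be sketched as follows, using only the numbers in the paragraph above (32-bit address space, 27 bits per page map level, 11 address bits per directory page). The constant and function names are made up, and slots are numbered from 0, so task #1 above is slot 0.

```c
#include <stdint.h>

enum {
    VA_BITS         = 32,  /* address space the task needs */
    LEAF_BITS       = 27,  /* covered by one level of page maps */
    DIR_INDEX_BITS  = 11,  /* address bits a full directory page handles */
    SLICE_BITS      = VA_BITS - LEAF_BITS,        /* 5 bits left over */
    SLICE_ENTRIES   = 1 << SLICE_BITS,            /* 32 entries per task */
    DIR_ENTRIES     = 1 << DIR_INDEX_BITS,        /* 2048 entries per page */
    SLICES_PER_PAGE = DIR_ENTRIES / SLICE_ENTRIES /* 64 tasks per page */
};

/* Index of the first directory entry owned by the task in this slot;
 * the task's root pointer would point at this entry. */
unsigned root_entry_index(unsigned slot)
{
    return slot * SLICE_ENTRIES;
}

/* Directory entry used when the task in this slot translates va. */
unsigned dir_entry_for(unsigned slot, uint32_t va)
{
    return root_entry_index(slot) + (unsigned)(va >> LEAF_BITS);
}
```

With these numbers one directory page serves 64 tasks. The walker would also have to bound the directory index to 5 bits so one task cannot reach into its neighbour's 32 entries.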

