Re: Load/Store with auto-increment

<nEb7M.1732803$t5W7.275293@fx13.iad>

https://news.novabbs.org/devel/article-flat.php?id=32175&group=comp.arch#32175
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx13.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de> <2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad> <_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad> <2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad> <2023May11.120936@mips.complang.tuwien.ac.at>
In-Reply-To: <2023May11.120936@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 56
Message-ID: <nEb7M.1732803$t5W7.275293@fx13.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 11 May 2023 19:48:35 UTC
Date: Thu, 11 May 2023 15:48:14 -0400
X-Received-Bytes: 3350
 by: EricP - Thu, 11 May 2023 19:48 UTC

Anton Ertl wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
>> Anton Ertl wrote:
> [..]
>> But trying to come up with a circuit that can choose the optimum
>> crossbar configuration in 1 clock proved too difficult at that time.
>
> How about taking >1 cycle? What would suffer in that case?

Continued from prior msg...

To rename 4 or more lanes, what I needed was a faster way to allocate
resources in parallel, with gate delay growing as log N.

One part of the mechanism I came up with I call a "running popcount".
It takes a vector of input bits, and outputs for each bit position
the number of 1's at that position and before.

e.g. from left to right

Pos: 0 1 2 3 4 5 6 7
Bit: 0 0 1 1 0 1 0 1
Sum: 0 0 1 2 2 3 3 4

At position 5 the sum is 3 1-bits there and to the left.
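
As a behavioural reference only (not a circuit), the running popcount in the table above is an inclusive prefix sum over the input bits. A minimal Python sketch, where the function name `running_popcount` is chosen here for illustration:

```python
from itertools import accumulate

def running_popcount(bits):
    """Inclusive prefix sum: out[i] is the number of 1s in bits[0..i]."""
    return list(accumulate(bits))

bits = [0, 0, 1, 1, 0, 1, 0, 1]
sums = running_popcount(bits)
print(sums)     # [0, 0, 1, 2, 2, 3, 3, 4]
print(sums[5])  # 3
```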

A running popcount circuit can be built from a multiple layer
summation tree, requiring log_2(number of input bits) layers.
I start by separating the inputs into groups of 3 because
a single full adder can sum 3 input bits.
Then the 3-groups are joined together into 6-groups, 12-groups, ...
with each layer doubling in size until all the input bits are counted.
So 255 inputs can be running-popcount'ed with 8 layers of
7-bit or less carry-lookahead adders.
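
To illustrate the logarithmic layer count, here is a hedged Python sketch of a generic doubling prefix scan (Hillis-Steele / Kogge-Stone style). It is not the groups-of-3 full-adder arrangement described above, only a stand-in that shows the same log_2 growth; the function name and the returned layer counter are assumptions made for this example.

```python
def running_popcount_log_depth(bits):
    """Inclusive running popcount computed in ceil(log2(n)) layers.

    Each list comprehension models one layer of independent adders:
    every position adds the partial sum found `stride` places to its left.
    """
    sums = list(bits)
    n = len(sums)
    stride = 1
    layers = 0
    while stride < n:
        sums = [sums[i] + (sums[i - stride] if i >= stride else 0)
                for i in range(n)]
        stride *= 2
        layers += 1
    return sums, layers

print(running_popcount_log_depth([0, 0, 1, 1, 0, 1, 0, 1]))
# ([0, 0, 1, 2, 2, 3, 3, 4], 3)
print(running_popcount_log_depth([1] * 255)[1])  # 8
```

With this particular scan, 255 inputs also need 8 layers, which agrees with the layer count quoted above even though the grouping differs.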

The running popcount circuit can be used to select resources.
Each free resource signals its availability by driving a 1 input.
A compare-EQ (XNOR) on the output of each position compares
that position's sum to a fixed number. If the input is a 1 AND
the output sum == 3 then this is the third free resource.
The position number is converted to binary and output as the
resource number of the third available free resource.
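
The selection rule can be sketched behaviourally in Python. This is an illustration under assumptions: the name `pick_kth_free` and the convention that `k` counts from 1 are mine, and the hardware would evaluate every position in parallel rather than scanning a list.

```python
from itertools import accumulate

def pick_kth_free(free_bits, k):
    """Return the position of the k-th free resource (k counts from 1), or None."""
    sums = list(accumulate(free_bits))
    # A position is selected when its input is 1 AND its running sum equals k.
    one_hot = [int(b == 1 and s == k) for b, s in zip(free_bits, sums)]
    return one_hot.index(1) if 1 in one_hot else None

free = [0, 0, 1, 1, 0, 1, 0, 1]
print(pick_kth_free(free, 3))  # 5
print(pick_kth_free(free, 5))  # None
```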

I had planned on using that for choosing free physical registers
for multiple lanes.

A running popcount could also be used to allocate register read ports
to uOp lanes. Each uOp outputs a 1 for each source register that it reads,
the running sum is the port number to be used for reading that operand.
If a uOp gets a port for all its register operands then it can dispatch.
The number of resources to choose from is of course smaller than
what I was choosing from for physical registers. For 4 lanes, with uOps
usually having 2 source registers, plus a couple of extras, that is
maybe 10 read ports.
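
A hedged Python sketch of the read-port idea follows. The data layout (one list of request bits per uOp), the zero-based port numbering, and the rule that a uOp which fails to get all its ports releases none of them to later lanes are all assumptions made for this example.

```python
from itertools import accumulate

def allocate_read_ports(uop_requests, num_ports):
    """uop_requests: one list of 0/1 flags per uOp, one flag per source operand.

    Returns, per uOp, (list of zero-based port numbers, can_dispatch).
    """
    flat = [bit for req in uop_requests for bit in req]
    sums = list(accumulate(flat))  # running popcount over all requests
    result = []
    pos = 0
    for req in uop_requests:
        ports = []
        for bit in req:
            if bit:
                ports.append(sums[pos] - 1)  # running sum gives the port number
            pos += 1
        can_dispatch = all(p < num_ports for p in ports)
        result.append((ports if can_dispatch else [], can_dispatch))
    return result

requests = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
print(allocate_read_ports(requests, num_ports=4))
# [([0, 1], True), ([2], True), ([], False), ([], False)]
```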

Re: Load/Store with auto-increment

<7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32177&group=comp.arch#32177
X-Received: by 2002:a05:6214:911:b0:61b:5dd8:e623 with SMTP id dj17-20020a056214091100b0061b5dd8e623mr4111245qvb.3.1683836863477;
Thu, 11 May 2023 13:27:43 -0700 (PDT)
X-Received: by 2002:aca:a98a:0:b0:38d:788b:38cd with SMTP id
s132-20020acaa98a000000b0038d788b38cdmr2323307oie.3.1683836863215; Thu, 11
May 2023 13:27:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 11 May 2023 13:27:42 -0700 (PDT)
In-Reply-To: <nEb7M.1732803$t5W7.275293@fx13.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:b141:ed72:1f40:88ff;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:b141:ed72:1f40:88ff
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad>
<_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad>
<2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad>
<2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 11 May 2023 20:27:43 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4528
 by: MitchAlsup - Thu, 11 May 2023 20:27 UTC

On Thursday, May 11, 2023 at 2:50:55 PM UTC-5, EricP wrote:
> Anton Ertl wrote:
> > EricP <ThatWould...@thevillage.com> writes:
> >> Anton Ertl wrote:
> > [..]
> >> But trying to come up with a circuit that can choose the optimum
> >> crossbar configuration in 1 clock proved too difficult at that time.
> >
> > How about taking >1 cycle? What would suffer in that case?
> Continued from prior msg...
> To rename 4 or more lanes, what I needed was a faster way to allocate
> resources in parallel, with gate delay growing as log N.
> One part of the mechanism I came up with I call a "running popcount".
> It takes a vector of input bits, and outputs for each bit position
> the number of 1's at that position and before.
>
> e.g. from left to right
>
> Pos: 0 1 2 3 4 5 6 7
> Bit: 0 0 1 1 0 1 0 1
> Sum: 0 0 1 2 2 3 3 4
>
> At position 5 the sum is 3 1-bits there and to the left.
<
An admirable invention.
>
> A running popcount circuit can be built from a multiple layer
> summation tree, requiring log_2(number of input bits) layers.
> I start by separating the inputs into groups of 3 because
> a single full adder can sum 3 input bits.
<
At this point, you have a multiplier summation tree, an odd
multiplication width and an even odder multiplication height.
<
> Then the 3-groups are joined together into 6-groups, 12-groups, ...
> with each layer doubling in size until all the input bits are counted.
> So 255 inputs can be running-popcount'ed with 8 layers of
> 7-bit or less carry-lookahead adders.
<
One can implement Popcount as multiplication by altering the
Booth recoder so that only the center column of the tree has
bit-product terms.
>
> The running popcount circuit can be used to select resources.
> Each free resource indicates with a 1 input.
> A compare-EQ (nxor) on the output of each position compares
> that position sum to a fixed number. If the input is a 1 AND
> the output sum == 3 then this is the third free resource.
> The position number is converted to binary and output as the
> resource number of the third available free resource.
>
> I had planned on using that for choosing free physical registers
> for multiple lanes.
>
> A running popcount could also be used to allocate register read ports
> to uOp lanes. Each uOp outputs a 1 for each source register that it reads,
> the running sum is the port number to be used for reading that operand.
> If a uOp gets a port for all its register operands then it can dispatch.
> The number of resources to choose from is of course smaller than
> what I was choosing from. For 4 lanes, with uOps usually having 2 source
> registers, plus a couple of extras, so maybe 10 read ports.

Re: Load/Store with auto-increment

<Y7P7M.1899588$gGD7.1460511@fx11.iad>

https://news.novabbs.org/devel/article-flat.php?id=32199&group=comp.arch#32199
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de> <2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad> <_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad> <2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad> <2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad> <7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com>
In-Reply-To: <7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 83
Message-ID: <Y7P7M.1899588$gGD7.1460511@fx11.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 13 May 2023 16:44:40 UTC
Date: Sat, 13 May 2023 12:43:02 -0400
X-Received-Bytes: 4608
 by: EricP - Sat, 13 May 2023 16:43 UTC

MitchAlsup wrote:
> On Thursday, May 11, 2023 at 2:50:55 PM UTC-5, EricP wrote:
>> Anton Ertl wrote:
>>> EricP <ThatWould...@thevillage.com> writes:
>>>> Anton Ertl wrote:
>>> [..]
>>>> But trying to come up with a circuit that can choose the optimum
>>>> crossbar configuration in 1 clock proved too difficult at that time.
>>> How about taking >1 cycle? What would suffer in that case?
>> Continued from prior msg...
>> To rename 4 or more lanes, what I needed was a faster way to allocate
>> resources in parallel, with gate delay growing as log N.
>> One part of the mechanism I came up with I call a "running popcount".
>> It takes a vector of input bits, and outputs for each bit position
>> the number of 1's at that position and before.
>>
>> e.g. from left to right
>>
>> Pos: 0 1 2 3 4 5 6 7
>> Bit: 0 0 1 1 0 1 0 1
>> Sum: 0 0 1 2 2 3 3 4
>>
>> At position 5 the sum is 3 1-bits there and to the left.
> <
> An admirable invention.

I'm not the first one to think of a running popcount
but I did have the same idea independently.

It sums up a series of inputs to count resource requests
which is something people have done for a long time,
such as I was doing for allocating read ports to uOp lanes.

I had first thought of doing this serially, which limits its applicability.
Later I realized that counting requests is a kind of popcount, so the
sums can be done in log_2 depth, allowing it to be applied to larger
resource pools.

I see the key idea as having each position's sum drive a binary decoder
in parallel, allowing all the IF-ELSE decisions to run concurrently,
such as outputting a selected resource ID or controlling a mux.
That is where all the selection work is done.

It would take a shit load of gates and having a large number
of positions OR'ing their outputs together in giant trees
will limit how much can actually be done.

>> A running popcount circuit can be built from a multiple layer
>> summation tree, requiring log_2(number of input bits) layers.
>> I start by separating the inputs into groups of 3 because
>> a single full adder can sum 3 input bits.
> <
> At this point, you have a multiplier summation tree, an odd
> multiplication width and an even odder multiplication height.
> <

Yes I knew my summing approach is odd.
My feeling is that I just need to show that there is at least one
feasible implementation of a proposed solution that meets the
requirements of producing a running sum and scales log_2 or better.

>> Then the 3-groups are joined together into 6-groups, 12-groups, ...
>> with each layer doubling in size until all the input bits are counted.
>> So 255 inputs can be running-popcount'ed with 8 layers of
>> 7-bit or less carry-lookahead adders.
> <
> One can implement Popcount as multiplication by altering the
> Booth recoder so that only the center column of the tree has
> bit-product terms.

Yes but that is for a single popcount value.
This produces a vector of running popcount values.
For N input bits it generates a vector of N sums.

Here I was thinking back to your log_2 decoder that takes tokens
and joins them in pairs, then quads, then octets, etc.
That gave me the idea to do similar but with runs of sums.
The number 3 comes from a full adder summing 3 inputs,
then the group of 3 joins to 6, then 12, 24, 48, ... in a tree of adders.

As you point out there are better ways to do this.

Re: Load/Store with auto-increment

<c42be719-39b8-45a0-8c16-f992972b293bn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32200&group=comp.arch#32200
X-Received: by 2002:ad4:5a03:0:b0:5ef:4ec7:c70a with SMTP id ei3-20020ad45a03000000b005ef4ec7c70amr5139946qvb.1.1683997595860;
Sat, 13 May 2023 10:06:35 -0700 (PDT)
X-Received: by 2002:a4a:a8c2:0:b0:547:54e2:688a with SMTP id
r2-20020a4aa8c2000000b0054754e2688amr5476699oom.0.1683997595559; Sat, 13 May
2023 10:06:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 13 May 2023 10:06:35 -0700 (PDT)
In-Reply-To: <nEb7M.1732803$t5W7.275293@fx13.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad>
<_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad>
<2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad>
<2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c42be719-39b8-45a0-8c16-f992972b293bn@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: luke.leighton@gmail.com (luke.l...@gmail.com)
Injection-Date: Sat, 13 May 2023 17:06:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2022
 by: luke.l...@gmail.com - Sat, 13 May 2023 17:06 UTC

On Thursday, May 11, 2023 at 8:50:55 PM UTC+1, EricP wrote:
>
> Pos: 0 1 2 3 4 5 6 7
> Bit: 0 0 1 1 0 1 0 1
> Sum: 0 0 1 2 2 3 3 4
>

there is a vector isa instruction for this, RVV i believe.
nice to see popping up. and impl. discussion.
popcount trees. parallel prefix sum normal way.
1st sum 1st bit, 2nd sum 1st&2nd ...
merge identical tree snippets avoid duplication
should not be mad gates.

l.

Re: Load/Store with auto-increment

<647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32201&group=comp.arch#32201
X-Received: by 2002:ad4:4d51:0:b0:618:ab80:7619 with SMTP id m17-20020ad44d51000000b00618ab807619mr4778051qvm.3.1683998323975;
Sat, 13 May 2023 10:18:43 -0700 (PDT)
X-Received: by 2002:a05:6870:7d13:b0:195:c867:f0e4 with SMTP id
os19-20020a0568707d1300b00195c867f0e4mr9314972oab.2.1683998323640; Sat, 13
May 2023 10:18:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 13 May 2023 10:18:43 -0700 (PDT)
In-Reply-To: <Y7P7M.1899588$gGD7.1460511@fx11.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad>
<_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad>
<2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad>
<2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad>
<7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com> <Y7P7M.1899588$gGD7.1460511@fx11.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: luke.leighton@gmail.com (luke.l...@gmail.com)
Injection-Date: Sat, 13 May 2023 17:18:43 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2179
 by: luke.l...@gmail.com - Sat, 13 May 2023 17:18 UTC

On Saturday, May 13, 2023 at 5:44:44 PM UTC+1, EricP wrote:

> It sums up a series of inputs to count resource requests
> which is something people have done for a long time,
> such as I was doing for allocating read ports to uOp lanes.

num ports != max popcount suggests alternative.
num ports <<< max popcount use chained PriorityPickers?
PPs are flat, big AND gates. output from 1st INVERT and mask
out 2nd. repeat.

thoughts? (sorry brief, extreme pain)
l.
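
One possible reading of the chained-picker suggestion above, sketched in Python. This is an interpretation of a very terse description, not the Libre-SOC code: `priority_pick` and `chained_pickers` are names invented here, and the loop models what would be a chain of combinational picker stages.

```python
def priority_pick(bits):
    """One-hot vector marking the first (lowest-index) set bit, or all zeros."""
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        if b:
            out[i] = 1
            break
    return out

def chained_pickers(bits, num_picks):
    """Chain num_picks priority pickers; each masks its pick out for the next."""
    remaining = list(bits)
    picks = []
    for _ in range(num_picks):
        one_hot = priority_pick(remaining)
        picks.append(one_hot)
        # Invert this stage's output and AND it into the next stage's input.
        remaining = [r & (1 - p) for r, p in zip(remaining, one_hot)]
    return picks

for pick in chained_pickers([0, 0, 1, 1, 0, 1, 0, 1], 2):
    print(pick)
# [0, 0, 1, 0, 0, 0, 0, 0]
# [0, 0, 0, 1, 0, 0, 0, 0]
```

The chain length grows with the number of picks rather than with the input width, which is why it is attractive only when the number of ports is much smaller than the maximum popcount.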

Re: Load/Store with auto-increment

<fa7525ad-02ce-4b47-961b-fa357692ef07n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32202&group=comp.arch#32202
X-Received: by 2002:a05:620a:2985:b0:74a:8fd6:66de with SMTP id r5-20020a05620a298500b0074a8fd666demr9729967qkp.6.1683999822307;
Sat, 13 May 2023 10:43:42 -0700 (PDT)
X-Received: by 2002:a05:6830:1:b0:6a9:3b33:1e3a with SMTP id
c1-20020a056830000100b006a93b331e3amr4688773otp.7.1683999821981; Sat, 13 May
2023 10:43:41 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 13 May 2023 10:43:41 -0700 (PDT)
In-Reply-To: <647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=92.19.80.230; posting-account=soFpvwoAAADIBXOYOBcm_mixNPAaxW9p
NNTP-Posting-Host: 92.19.80.230
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad>
<_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad>
<2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad>
<2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad>
<7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com> <Y7P7M.1899588$gGD7.1460511@fx11.iad>
<647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fa7525ad-02ce-4b47-961b-fa357692ef07n@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: luke.leighton@gmail.com (luke.l...@gmail.com)
Injection-Date: Sat, 13 May 2023 17:43:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 15
 by: luke.l...@gmail.com - Sat, 13 May 2023 17:43 UTC

On Saturday, May 13, 2023 at 6:18:45 PM UTC+1, luke.l...@gmail.com wrote:
> On Saturday, May 13, 2023 at 5:44:44 PM UTC+1, EricP wrote:
>
> > It sums up a series of inputs to count resource requests
> > which is something people have done for a long time,
> > such as I was doing for allocating read ports to uOp lanes.
> num ports != max popcount suggests alternative.

one of our team created undocumented extra idea. saturated-adders
on prefix-sums. implementation looks incomplete, like idea.

https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/picker.py;h=7aab175d8d60c031020b7b208727156123b8c09f;hb=4bf2f20bddc057df1597d14e0b990c0b9bdeb10e#l244

Re: Load/Store with auto-increment

<i%78M.3490407$GNG9.463032@fx18.iad>

https://news.novabbs.org/devel/article-flat.php?id=32212&group=comp.arch#32212
Path: i2pn2.org!i2pn.org!news.neodome.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx18.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de> <2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad> <_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad> <2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad> <2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad> <7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com> <Y7P7M.1899588$gGD7.1460511@fx11.iad> <647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com>
In-Reply-To: <647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 41
Message-ID: <i%78M.3490407$GNG9.463032@fx18.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 14 May 2023 16:29:02 UTC
Date: Sun, 14 May 2023 12:28:17 -0400
X-Received-Bytes: 2973
 by: EricP - Sun, 14 May 2023 16:28 UTC

luke.l...@gmail.com wrote:
> On Saturday, May 13, 2023 at 5:44:44 PM UTC+1, EricP wrote:
>
>> It sums up a series of inputs to count resource requests
>> which is something people have done for a long time,
>> such as I was doing for allocating read ports to uOp lanes.
>
> num ports != max popcount suggests alternative.
> num ports <<< max popcount use chained PriorityPickers?
> PPs are flat, big AND gates. output from 1st INVERT and mask
> out 2nd. repeat.
>
> thoughts? (sorry brief, extreme pain)
> l.

The goal is to dynamically assign source register read ports based
on what each uOp in each lane actually requires so we don't need to
have the worst case number of ports for all uOp lanes.

The problem is that performing conditional resource allocation serially
does not allow many allocations within the stage gate delay width.

Sequentially assigning ports to uOp source registers can be done with
a chain of conditional increments (CI). Say each CI is 3 gate delays.

If each uOp needs 0..3 source ports, and if a stage is 20 gates wide,
then we can afford a chain of 6 CI or just 2 lanes within a stage window,
and we still haven't accounted for other logic overhead.
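
The arithmetic behind that estimate, as a small Python sketch. The figures (3 gate delays per CI, 20 gate delays per stage, up to 3 source ports per uOp) come from the text above; the function and parameter names are chosen here for illustration.

```python
def lanes_per_stage(stage_gate_delays, ci_gate_delays, max_ports_per_uop):
    """How many serial conditional increments, and hence lanes, fit in one stage."""
    chain_length = stage_gate_delays // ci_gate_delays
    lanes = chain_length // max_ports_per_uop
    return chain_length, lanes

print(lanes_per_stage(20, 3, 3))  # (6, 2)
```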

To get more than 2 lanes we have to either assign ports in parallel,
or do it serially but across multiple pipeline stages (assuming that is possible).
But to do this across multiple stages requires fixing
the uOps to their lanes early, which means that any front end
pipeline bubbles are fixed in place and those lanes remain empty
(it precludes a compacting front end pipeline optimization).

This is what I was trying to point out earlier,
that it is a bit like trying to pick up Jello with your hands.

Re: Load/Store with auto-increment

<616a42f5-cab0-4bd8-8810-f575a777ea2dn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32213&group=comp.arch#32213
X-Received: by 2002:ac8:584d:0:b0:3e9:9419:b153 with SMTP id h13-20020ac8584d000000b003e99419b153mr10642793qth.0.1684085828882;
Sun, 14 May 2023 10:37:08 -0700 (PDT)
X-Received: by 2002:a9d:6f04:0:b0:6a5:f20b:f6e6 with SMTP id
n4-20020a9d6f04000000b006a5f20bf6e6mr5184728otq.2.1684085828595; Sun, 14 May
2023 10:37:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 14 May 2023 10:37:08 -0700 (PDT)
In-Reply-To: <i%78M.3490407$GNG9.463032@fx18.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:2599:da7a:47b5:34ef;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:2599:da7a:47b5:34ef
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de>
<2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad>
<_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad>
<2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad>
<2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad>
<7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com> <Y7P7M.1899588$gGD7.1460511@fx11.iad>
<647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com> <i%78M.3490407$GNG9.463032@fx18.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <616a42f5-cab0-4bd8-8810-f575a777ea2dn@googlegroups.com>
Subject: Re: Load/Store with auto-increment
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sun, 14 May 2023 17:37:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4466
 by: MitchAlsup - Sun, 14 May 2023 17:37 UTC

On Sunday, May 14, 2023 at 11:30:45 AM UTC-5, EricP wrote:
> luke.l...@gmail.com wrote:
> > On Saturday, May 13, 2023 at 5:44:44 PM UTC+1, EricP wrote:
> >
> >> It sums up a series of inputs to count resource requests
> >> which is something people have done for a long time,
> >> such as I was doing for allocating read ports to uOp lanes.
> >
> > num ports != max popcount suggests alternative.
> > num ports <<< max popcount use chained PriorityPickers?
> > PPs are flat, big AND gates. output from 1st INVERT and mask
> > out 2nd. repeat.
> >
> > thoughts? (sorry brief, extreme pain)
> > l.
> The goal is to dynamically assign source register read ports based
> on what each uOp in each lane actually requires so we don't need to
> have the worst case number of ports for all uOp lanes.
>
> The problem is that performing conditional resource allocation serially
> does not allow many allocations within the stage gate delay width.
>
> Sequentially assigning ports to uOp source registers can be done with
> a chain of conditional increments (CI). Say each CI is 3 gate delays.
>
> If each uOp needs 0..3 source ports, and if a stage is 20 gates wide,
<
That is what you get if you look at it from the instruction point of view.
<
Instead, if you look at it from the function unit point of view, there is
only 1 FU that services 3-rename ports (FMAC), so, while AGEN appears
to service 3-operands, only 2 of them can be registers, and the rest
are known to be 2-ports or 1-port.
<
I am working on a 6-Wide GBOoO design that only has 12-rename
ports, with one 3-register FU (FMAC and INSert and CMOV) and can
dish out new PRNs 6 per cycle.
<
> then we can afford a chain of 6 CI or just 2 lanes within a stage window,
> and we still haven't accounted for other logic overhead.
>
> To get more than 2 lanes we have to either assign ports in parallel,
> or do it serially but across multiple pipeline stages (assuming that is possible).
<
Route instructions to FUs THEN rename their ports. This eliminates
the 3-rename ports for all instructions; only the FUs that service
3-registers need 3-rename ports.
<
> But to do this across multiple stages requires fixing == routing
> the uOps to their lanes early, which means that any front end
> pipeline bubbles are fixed in place and those lanes remain empty
> (it precludes a compacting front end pipeline optimization).
<
I thought that way too in 1991........ there are solutions......
>
> This is what I was trying to point out earlier,
> that it is a bit like trying to pick up Jello with your hands.

Re: Load/Store with auto-increment

<_o98M.370835$b7Kc.284378@fx39.iad>

https://news.novabbs.org/devel/article-flat.php?id=32215&group=comp.arch#32215
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx39.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de> <2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad> <_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad> <2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad> <2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad> <7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com> <Y7P7M.1899588$gGD7.1460511@fx11.iad> <647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com> <fa7525ad-02ce-4b47-961b-fa357692ef07n@googlegroups.com>
In-Reply-To: <fa7525ad-02ce-4b47-961b-fa357692ef07n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 45
Message-ID: <_o98M.370835$b7Kc.284378@fx39.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 14 May 2023 18:04:42 UTC
Date: Sun, 14 May 2023 14:04:02 -0400
X-Received-Bytes: 3243
 by: EricP - Sun, 14 May 2023 18:04 UTC

luke.l...@gmail.com wrote:
> On Saturday, May 13, 2023 at 6:18:45 PM UTC+1, luke.l...@gmail.com wrote:
>> On Saturday, May 13, 2023 at 5:44:44 PM UTC+1, EricP wrote:
>>
>>> It sums up a series of inputs to count resource requests
>>> which is something people have done for a long time,
>>> such as I was doing for allocating read ports to uOp lanes.
>> num ports != max popcount suggests alternative.
>
> one of our team created undocumented extra idea. saturated-adders
> on prefix-sums. implementation looks incomplete, like idea.
>
> https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/picker.py;h=7aab175d8d60c031020b7b208727156123b8c09f;hb=4bf2f20bddc057df1597d14e0b990c0b9bdeb10e#l244

I don't know Python or your design but if I understand correctly,
yes the BetterMultiPriorityPicker is the same parallel multi-picker
I was referring to. The saturation adder could save propagating carries
unnecessarily (if you only need to pick K items then you only
need to sum to K).
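
A hedged Python sketch of the saturation idea, written without reference to the actual `picker.py` code. One assumption is made loudly here: the sum is capped at K+1 rather than K, so that set bits beyond the K-th carry an "overflow" value and are not mistaken for the K-th pick. All names are illustrative.

```python
def saturating_running_popcount(bits, k):
    """Running popcount that stops counting at k + 1 (assumed overflow code)."""
    cap = k + 1
    sums = []
    total = 0
    for b in bits:
        total = min(total + b, cap)  # saturating add: never exceeds cap
        sums.append(total)
    return sums

def pick_first_k(bits, k):
    """One-hot vectors for the 1st..k-th set bits, using the saturated sums."""
    sums = saturating_running_popcount(bits, k)
    return [[int(b and s == j) for b, s in zip(bits, sums)]
            for j in range(1, k + 1)]

bits = [0, 0, 1, 1, 0, 1, 0, 1]
print(saturating_running_popcount(bits, 2))  # [0, 0, 1, 2, 2, 3, 3, 3]
print(pick_first_k(bits, 2))
# [[0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0]]
```

With this cap every partial sum fits in ceil(log2(K+2)) bits regardless of the input width, which is where the saving in carry propagation would come from.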

For hardware the other part of the problem is how to get
each of the one-hot picks routed to a separate output port.
In other words a black box that takes N input bits and routes
each successive N-bit wide one-hot pick to successive outputs.

N item bits
|
v
Multi-Picker
---> pick[0] (one-hot, N-bits wide),(item number)
---> pick[1] (one-hot, N-bits wide),(item number)
---> pick[2] (one-hot, N-bits wide),(item number)

A decoder attached to each partial sum can drive the output
for each pick[n] plus generate a binary encoding of the bit position
(saving a one-hot to binary converter for each pick).

Used as a free register allocator, for each pick I get a
one-hot register selector plus a binary register number.

In my case the register number goes to Rename, and the one-hot value
is routed back to select the register status to change and remove it
from the free list if the register is taken by Rename.
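
The multi-picker used as a free register allocator can be sketched behaviourally in Python. This is an illustration under assumptions: `multi_picker` and `remove_taken` are names invented here, picks are numbered from 1 internally, and the sketch assumes Rename takes every pick that is offered.

```python
from itertools import accumulate

def multi_picker(item_bits, num_picks):
    """For each pick, return (one-hot selector, binary item number or None)."""
    sums = list(accumulate(item_bits))
    picks = []
    for j in range(1, num_picks + 1):
        # Decoder on each partial sum: set bit whose running sum equals j.
        one_hot = [int(b and s == j) for b, s in zip(item_bits, sums)]
        number = one_hot.index(1) if 1 in one_hot else None
        picks.append((one_hot, number))
    return picks

def remove_taken(free_bits, picks):
    """Clear the free-list bit for every register selected by a one-hot pick."""
    taken = [0] * len(free_bits)
    for one_hot, _ in picks:
        taken = [t | p for t, p in zip(taken, one_hot)]
    return [f & (1 - t) for f, t in zip(free_bits, taken)]

free = [0, 0, 1, 1, 0, 1, 0, 1]
picks = multi_picker(free, 3)
print([number for _, number in picks])  # [2, 3, 5]
print(remove_taken(free, picks))        # [0, 0, 0, 0, 0, 0, 0, 1]
```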

Re: Load/Store with auto-increment

<9u98M.3490410$GNG9.530999@fx18.iad>

https://news.novabbs.org/devel/article-flat.php?id=32216&group=comp.arch#32216
Path: i2pn2.org!i2pn.org!news.neodome.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer01.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx18.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Load/Store with auto-increment
References: <u35prk$2ssbq$1@dont-email.me> <u36fd2$121nc$1@newsreader4.netcologne.de> <2023May9.111344@mips.complang.tuwien.ac.at> <UQt6M.233407$qpNc.65909@fx03.iad> <_Qu6M.539024$Olad.404121@fx35.iad> <Y7y6M.233411$qpNc.12100@fx03.iad> <2023May10.100025@mips.complang.tuwien.ac.at> <LHP6M.2840676$9sn9.1828478@fx17.iad> <2023May11.120936@mips.complang.tuwien.ac.at> <nEb7M.1732803$t5W7.275293@fx13.iad> <7a6c447f-5e17-46db-bca6-80f818a9a202n@googlegroups.com> <Y7P7M.1899588$gGD7.1460511@fx11.iad> <647cfb50-e24e-4c2d-a9aa-269083f50be6n@googlegroups.com> <i%78M.3490407$GNG9.463032@fx18.iad> <616a42f5-cab0-4bd8-8810-f575a777ea2dn@googlegroups.com>
In-Reply-To: <616a42f5-cab0-4bd8-8810-f575a777ea2dn@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 12
Message-ID: <9u98M.3490410$GNG9.530999@fx18.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 14 May 2023 18:10:13 UTC
Date: Sun, 14 May 2023 14:09:53 -0400
X-Received-Bytes: 1806
 by: EricP - Sun, 14 May 2023 18:09 UTC

MitchAlsup wrote:
> On Sunday, May 14, 2023 at 11:30:45 AM UTC-5, EricP wrote:
>> But to do this across multiple stages requires fixing == routing
>> the uOps to their lanes early, which means that any front end
>> pipeline bubbles are fixed in place and those lanes remain empty
>> (it precludes a compacting front end pipeline optimization).
> <
> I thought that way too in 1991........ there are solutions......

I have a thing for elastic pipelines.
