
Re: IPC

<285f512c07cd4355c1caea3d2ef896d8@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37775&group=comp.arch#37775

Date: Sun, 3 Mar 2024 18:26:10 +0000
Subject: Re: IPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$u/dQsJGdJo.K670EoSPRgeZt6SmmmxSALMQImhq2I9Z1S./ke3UzS
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <2024Jan24.084731@mips.complang.tuwien.ac.at> <uorlft$1ti8a$1@dont-email.me> <2024Jan24.225412@mips.complang.tuwien.ac.at> <cslsN.381334$83n7.291175@fx18.iad> <2024Jan25.074631@mips.complang.tuwien.ac.at> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <2024Mar3.092158@mips.complang.tuwien.ac.at>
Organization: Rocksolid Light
Message-ID: <285f512c07cd4355c1caea3d2ef896d8@www.novabbs.org>
 by: MitchAlsup1 - Sun, 3 Mar 2024 18:26 UTC

Anton Ertl wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:

> I don't think that it's plausible that the 88120, which would have
> appeared in the mid-1990s, would perform as well as or better than
> goldencove on this workload. My guess is that it would have had to
> undergo a silicon diet like the PPC620, probably even more so, because
> it was to appear earlier, which probably would have meant fewer
> transistors, which would have reduced the matrix300 IPC, and probably
> to a lesser amount, the Xlisp IPC.

> Also, the question is how fast the result would clock. The 88110 was
> available in 1992 at 50MHz, in the same year as the 200MHz 21064.
> When would the 88120 have been available at what clock rate? The

We were on schedule when I left in 1992 to be in silicon by 1995.
We were targeting 100 MHz.
We did not get far enough to know the die-size*.

(*) Motorola was offering us 0.5µ BiCMOS. SPICE simulations indicated
that the wire attached to the emitters was going to be subject to
crystal migration due to current density. We (our project) were only
using the bipolars in sense amplifiers and not in normal gates.

> PPC620 was available in 1997 at up to 150MHz, while the Pentium II
> Klamath was available in 1997 at clock rates up to 300MHz, and the
> (in-order) 21164a was available at 600MHz; my guess is that the 21164a
> could also produce good matrix300 numbers.

> - anton

Re: 88xxx or PPC

<20240303203052.00007c61@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37776&group=comp.arch#37776

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Sun, 3 Mar 2024 20:30:52 +0200
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <20240303203052.00007c61@yahoo.com>
References: <uigus7$1pteb$1@dont-email.me>
<uorlft$1ti8a$1@dont-email.me>
<2024Jan24.225412@mips.complang.tuwien.ac.at>
<cslsN.381334$83n7.291175@fx18.iad>
<2024Jan25.074631@mips.complang.tuwien.ac.at>
<nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at>
<urg471$215g3$5@dont-email.me>
<d6206301512dacecc2d5648276a6a802@www.novabbs.org>
<us0cn8$24dt8$1@dont-email.me>
<ac55c75a923144f72d204c801ff7f984@www.novabbs.org>
<20240303165533.00004104@yahoo.com>
<2024Mar3.173345@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="f81cf74dab80d461d816c48d20d52fcc";
logging-data="2779746"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX188rEC3MVAwzcEMAhwtjKwUn2vF+y8gz/0="
Cancel-Lock: sha1:s1fiDxnEFDA5U23IAULkLWxc7eo=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Sun, 3 Mar 2024 18:30 UTC

On Sun, 03 Mar 2024 16:33:45 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Michael S <already5chosen@yahoo.com> writes:
> >I can't find information about Matrix300.
> >It seems to be part of SPEC89 FP suite, but spec.org does not provide
> >info about anything older than SPEC92.
> >Can you tell me what exactly does it do?
>
> It's 300x300 FP matrix multiply (not sure if single or double). There
> was a company that had a tool (famous at the time, but I don't
> remember the name) that could transform the original source code into
> a cache-blocked variant, which typically ran at the limits imposed by
> the FUs. Eventually everyone used that tool in their compiler to get
> good SPEC89 results. As a consequence, SPEC eliminated matrix300 in
> SPEC92.
>
> - anton

So, in today's world it would be something like "How fast can you do
DGEMM with 7 out of your 8 [SIMD] hands tied behind your back?"
Or maybe more than 7, if your variant of AMX supports double
precision.
The challenge is funny, but the answer is not particularly useful.
But even in that not particularly useful answer, IPC appears to be the
least useful part. Far worse than FLOPS/Hz.

Re: 88xxx or PPC

<2024Mar3.232237@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37777&group=comp.arch#37777

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Sun, 03 Mar 2024 22:22:37 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 22
Message-ID: <2024Mar3.232237@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <cslsN.381334$83n7.291175@fx18.iad> <2024Jan25.074631@mips.complang.tuwien.ac.at> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com>
Injection-Info: dont-email.me; posting-host="a692ecae0713563188180b63e9c42ebc";
logging-data="2880941"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+KXZd+V9IlPSVfliNtjNxq"
Cancel-Lock: sha1:jytccMnLiQgIo455B2LVjoneHGs=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 3 Mar 2024 22:22 UTC

Michael S <already5chosen@yahoo.com> writes:
>But even in that not particularly useful answer IPC appears to be the
>least useful part. Far worse than FLOPS/Hz.

Those were the days before SIMD, so IPC told you a little about
FLOPS/Hz. I think, though, that you look at it from the other end.
You are asking: Is that a number I want to know for evaluating DGEMM
performance on the 88120? But Mitch Alsup and the other people were
probably thinking: this uarchitecture can never exceed 6 IPC, so
getting 5.9 IPC on an actual SPEC CPU89 benchmark is pretty good. And
matrix300 with its mixture of FP adds, FP muls, loads, stores, address
arithmetic, and control, i.e., making relatively balanced use of many
FUs, is a pretty good benchmark for getting high numbers of this kind.

I guess that the 4-wide 21164 also showed close to 4 IPC on
matrix300, while its 2 integer units would limit it to much lower
performance on, say, intmm.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: 88xxx or PPC

<1a8a8601b08f6cca5457d663f7ffa3b2@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37778&group=comp.arch#37778

Date: Sun, 3 Mar 2024 22:41:03 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$CgJ1NdAHokNJvsYjgWct2ufh6qrta6gPC1Iook0gFTr/7faaquXOa
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <uorlft$1ti8a$1@dont-email.me> <2024Jan24.225412@mips.complang.tuwien.ac.at> <cslsN.381334$83n7.291175@fx18.iad> <2024Jan25.074631@mips.complang.tuwien.ac.at> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com>
Organization: Rocksolid Light
Message-ID: <1a8a8601b08f6cca5457d663f7ffa3b2@www.novabbs.org>
 by: MitchAlsup1 - Sun, 3 Mar 2024 22:41 UTC

Michael S wrote:

> On Sun, 03 Mar 2024 16:33:45 GMT
> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

>> Michael S <already5chosen@yahoo.com> writes:
>> >I can't find information about Matrix300.
>> >It seems to be part of SPEC89 FP suite, but spec.org does not provide
>> >info about anything older than SPEC92.
>> >Can you tell me what exactly does it do?
>>
>> It's 300x300 FP matrix multiply (not sure if single or double). There
>> was a company that had a tool (famous at the time, but I don't
>> remember the name) that could transform the original source code into
>> a cache-blocked variant, which typically ran at the limits imposed by
>> the FUs. Eventually everyone used that tool in their compiler to get
>> good SPEC89 results. As a consequence, SPEC eliminated matrix300 in
>> SPEC92.
>>
>> - anton

> So, in today's world it would be something like "How fast can you do
> DGEMM with 7 out of your 8 [SIMD] hands tied behind your back?"
> Or maybe more than 7, if your variant of AMX supports double
> precision.

xGEMM supports transposes of the input matrixes; D stands for Double
precision, S stands for Single precision. Matrix300 used DGEMM.

SIMD would only support 2 of the 8 calls to DGEMM where the transposes
are not {'N', 'n', 'T', 't', 'C', or 'c'}. SIMD would do nothing for
the 6 transpose calls. It should also be noted that the transposed
matrixes have significantly worse cache behavior than the non-transposed
version, as each access is to a different cache line.

The problem is that typical SIMD does not support the kinds of transposes
xGEMM performs. That is, the problem is not the transposes; it is naïve
SIMD.

So, postulate that one can SIMD the non-transposed loop and gain 4×.
The other 3 loops get 1×, for an overall gain of less than 25%,
where the "less than" is due to the cache and TLB effects.
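As a rough check of that estimate, here is a minimal Amdahl-style sketch. It assumes (this is not stated in the post) that the four loop variants each account for a quarter of the run time and that only the non-transposed one speeds up; the function name overall_speedup is hypothetical.

```c
#include <assert.h>

/* Amdahl-style estimate: a fraction f of the run time is sped up by a
   factor s, the remaining (1 - f) is unchanged. */
static double overall_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

Under those assumptions, overall_speedup(0.25, 4.0) is 1/0.8125, about 1.23, so the overall gain is already below 25% before any cache or TLB effects are counted.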

A fully associative (FA) TLB with as few as 24 entries gets a 100% hit
rate in the non-transpose case, and poor performance on any (all) of the
transpose cases; in the dual transpose case, the TLB takes a miss every
other cache access. Here a 256-entry direct-mapped (DM) TLB gets a 100%
hit rate where a 64-entry FA TLB is getting close to zero hit rate.
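The effect can be reproduced with a toy model. The sketch below is not a model of any real TLB: it assumes 4 KiB pages, a single 300x300 matrix of doubles starting at address 0, LRU replacement for the FA case, and page-number-modulo indexing for the DM case. All names (tlb_t, tlb_init, tlb_access, walk) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIM 300          /* matrix dimension, as in matrix300 */
#define PAGE_SHIFT 12    /* assumption: 4 KiB pages */
#define MAX_ENTRIES 256

typedef struct {
    uint64_t page[MAX_ENTRIES];
    uint64_t last_use[MAX_ENTRIES];  /* LRU timestamp; 0 means empty */
    size_t entries;
    int direct_mapped;               /* 1 = DM, 0 = FA with LRU */
    uint64_t now, hits, misses;
} tlb_t;

static void tlb_init(tlb_t *t, size_t entries, int direct_mapped)
{
    assert(entries > 0 && entries <= MAX_ENTRIES);
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        t->page[i] = t->last_use[i] = 0;
    t->entries = entries;
    t->direct_mapped = direct_mapped;
    t->now = t->hits = t->misses = 0;
}

static void tlb_access(tlb_t *t, uint64_t addr)
{
    uint64_t page = addr >> PAGE_SHIFT;
    size_t first = 0, last = t->entries;      /* FA: search every entry */
    if (t->direct_mapped) {                   /* DM: only one candidate */
        first = (size_t)(page % t->entries);
        last = first + 1;
    }
    size_t victim = first;
    t->now++;
    for (size_t i = first; i < last; i++) {
        if (t->last_use[i] != 0 && t->page[i] == page) {
            t->last_use[i] = t->now;
            t->hits++;
            return;
        }
        if (t->last_use[i] < t->last_use[victim])
            victim = i;                       /* least recently used so far */
    }
    t->misses++;
    t->page[victim] = page;
    t->last_use[victim] = t->now;
}

/* Touch every element of a DIM x DIM row-major matrix of doubles once;
   by_column selects the transposed (stride DIM) traversal order. */
static void walk(tlb_t *t, int by_column)
{
    for (size_t outer = 0; outer < DIM; outer++)
        for (size_t inner = 0; inner < DIM; inner++) {
            size_t row = by_column ? inner : outer;
            size_t col = by_column ? outer : inner;
            tlb_access(t, (uint64_t)(row * DIM + col) * sizeof(double));
        }
}
```

In this model the row-order walk only misses the first time it enters a page, even with 24 FA entries. The column-order walk cycles through more pages than a 64-entry FA TLB holds, so LRU evicts each page before it is reused, while the 256-entry DM TLB keeps every page of the matrix resident after the first column.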

> The challenge is funny, but the answer is not particularly useful.

xGEMM is likely the second most used GB-math number crunching
subroutine in use--FFT <flavors> being the most used.

> But even in that not particularly useful answer IPC appears to be the
> least useful part. Far worse than FLOPS/Hz.

Correct, and this illustrates how times have changed. In 1985 Matrix300
would use * and + as separate instructions. The major loop consists of
2 LDs, 1 *, 1 +, 1 ST, and an ADD-CMP-BC which could be distributed over
the 4-way unrolled loop (in source code). The Mc 88110 compiler would
produce a 24-instruction loop (non-transposed), and the Mc 88120
simulator would run this loop (including DRAM accesses (cache misses and
cache victims) and TLB table-walking) in 4 cycles. Today, FMAC is
<fused into> 1 instruction, saving instruction count and calculation
latency (8->5), allowing Matrix300 to fit in a 78-instruction execution
window instead of a 96-instruction EW. {{But you still have to count
FMAC as 2 FLOPs.}}
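The per-element work described above (2 loads, a multiply, an add, a store) matches a DAXPY-style update. The actual Matrix300 source is not shown in this thread, so the following is only an assumed shape of a 4-way source-unrolled loop, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i], unrolled 4 ways in source.
   Per element: 2 loads (x[i], y[i]), 1 multiply, 1 add, 1 store.
   n must be a multiple of 4 to keep the sketch short. */
static void daxpy_unrolled4(size_t n, double a, const double *x, double *y)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {  /* one add-compare-branch per 4 elements */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}
```

Running a 24-instruction loop body in 4 cycles corresponds to 24/4 = 6 instructions per cycle.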

On My 66000, using VEC-LOOP, the instruction count goes down again to 5
(from 6) per loop, since LOOP performs ADD-CMP-BC in a single instruction
and in a single clock.

Re: 88xxx or PPC

<2024Mar4.095209@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37780&group=comp.arch#37780

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Mon, 04 Mar 2024 08:52:09 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 55
Message-ID: <2024Mar4.095209@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <2024Jan25.074631@mips.complang.tuwien.ac.at> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <1a8a8601b08f6cca5457d663f7ffa3b2@www.novabbs.org>
Injection-Info: dont-email.me; posting-host="5f7b967894908ea9ab01696180a3a5c9";
logging-data="3235555"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18VRXjqgopPDrviJe5fK9UQ"
Cancel-Lock: sha1:pQcIoUb18m16yw51Is0FeAk83Oo=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 4 Mar 2024 08:52 UTC

mitchalsup@aol.com (MitchAlsup1) writes:
>SIMD would only support 2 of the 8 calls to DGEMM where the transposes
>are not {'N', 'n', 'T', 't', 'C', or 'c'}. SIMD would do nothing for
>the 6 transpose calls.

The following version is good for the non-transposed matrices (it's
not DGEMM, but the difference to DGEMM is left as an exercise):

#include <stddef.h>
#include <string.h>

/* c (n x p) = a (n x m) * b (m x p), all row-major */
void matmul(double a[], double b[], double c[], size_t m, size_t n, size_t p)
{
  size_t i, j, k;
  memset(c, 0, n*p*sizeof(double));
  for (i=0; i<n; i++)
    for (k=0; k<m; k++)
      for (j=0; j<p; j++)
        c[i*p+j] += a[i*m+k]*b[k*p+j];
}

But the loops are interchangeable, and the naive i,j,k order is bad
for SIMD and also introduces a recurrence on c[i*p+j] in the inner
loop. Given the amount of parallelism inherent in matrix
multiplication, I would be surprised if transposing some or all
of the involved matrices ruled out every loop order or other
transformation that would enable SIMD. In the extreme case, you just
transpose an appropriate input matrix at the start, or the result at
the end, at O(n^2) effort (for n*n matrices), while the matrix
multiply itself takes O(n^3) effort.
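A minimal sketch of that extreme case, assuming the second operand arrives stored transposed (p x m, row-major): copy it into untransposed form once, then run the unit-stride i,k,j loop. The names transpose, matmul_ikj and matmul_bt are hypothetical, and this is not DGEMM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* dst (cols x rows) = transpose of src (rows x cols), both row-major. */
static void transpose(const double *src, double *dst, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];
}

/* c (n x p) = a (n x m) * b (m x p); i,k,j order gives unit stride on
   b and c in the inner loop. */
static void matmul_ikj(const double *a, const double *b, double *c,
                       size_t m, size_t n, size_t p)
{
    memset(c, 0, n * p * sizeof(double));
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < m; k++)
            for (size_t j = 0; j < p; j++)
                c[i * p + j] += a[i * m + k] * b[k * p + j];
}

/* c (n x p) = a (n x m) * B, where bt holds B transposed (p x m).
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int matmul_bt(const double *a, const double *bt, double *c,
                     size_t m, size_t n, size_t p)
{
    assert(m > 0 && p > 0);
    double *b = malloc(m * p * sizeof *b);
    if (b == NULL)
        return -1;
    transpose(bt, b, p, m);          /* O(m*p) extra work */
    matmul_ikj(a, b, c, m, n, p);    /* O(n*m*p) main work */
    free(b);
    return 0;
}
```

The extra copy costs memory for one scratch matrix; a production routine would more likely fold the transposition into a blocked packing step.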

But of course, for the 88120 that was a non-issue, because it did not
have SIMD.

>It should also be noted that the transposed
>matrixes have significantly worse cache behavior than the non-transposed
>version as each access is to a different cache line.

That was fixed by the cache-blocking transformation that everybody
used, and which resulted in the elimination of matrix300 from SPEC92.

It was not the cache sizes, which could easily have been addressed by
modifying matrix300 into, say, matrix2000.

>On My 66000, using VEC-LOOP, the instruction count goes down again to 5
>(from 6) per loop, since LOOP performs ADD-CMP-BC in a single instruction
>and in a single clock.

If the programmer (or compiler) for My66000 does not process the
elements in the favoured order, it will not perform particularly well
for arbitrary transpositions, either. I don't think that you want to
perform these program transformations in hardware, do you?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: 88xxx or PPC

<20240304171457.000067ea@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37782&group=comp.arch#37782

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Mon, 4 Mar 2024 17:14:57 +0200
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <20240304171457.000067ea@yahoo.com>
References: <uigus7$1pteb$1@dont-email.me>
<cslsN.381334$83n7.291175@fx18.iad>
<2024Jan25.074631@mips.complang.tuwien.ac.at>
<nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at>
<urg471$215g3$5@dont-email.me>
<d6206301512dacecc2d5648276a6a802@www.novabbs.org>
<us0cn8$24dt8$1@dont-email.me>
<ac55c75a923144f72d204c801ff7f984@www.novabbs.org>
<20240303165533.00004104@yahoo.com>
<2024Mar3.173345@mips.complang.tuwien.ac.at>
<20240303203052.00007c61@yahoo.com>
<2024Mar3.232237@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="28d59ce037860cf831847d62725c7e80";
logging-data="3380750"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19vlBeyS5LqqBT8DemcT9Yd28GzPKr4dm4="
Cancel-Lock: sha1:bi70pxoi4PCIa26h4pIRpoYIAfw=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Mon, 4 Mar 2024 15:14 UTC

On Sun, 03 Mar 2024 22:22:37 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Michael S <already5chosen@yahoo.com> writes:
> >But even in that not particularly useful answer IPC appears to be the
> >least useful part. Far worse than FLOPS/Hz.
>
>
> I guess that the 4-wide 21164 also showed close to 4 IPC on
> matrix300, while its 2 integer units would limit it to much lower
> performance on, say, intmm.
>
> - anton

I don't know about the specific case of matrix300 and what transformations
are allowed by SPEC rules and what not, but if I were tasked with
writing generic DGEMM for Alpha 21164 with maximal performance on
non-small, relatively square matrices as a main goal, then I'd start
with something like this:

// innermost_loop_3x6 - multiply 3 rows of A[][]
// by 6 columns of B[][] assuming C-language order
void innermost_loop_3x6(
  const double* A, int lda,
  const double* B, int ldb,
  double* C, int ldc,
  int n)
{
  const double* A0 = A;
  const double* A1 = A0 + lda;
  const double* A2 = A1 + lda;
  double acc00 = 0, acc10 = 0, acc20 = 0;
  double acc01 = 0, acc11 = 0, acc21 = 0;
  double acc02 = 0, acc12 = 0, acc22 = 0;
  double acc03 = 0, acc13 = 0, acc23 = 0;
  double acc04 = 0, acc14 = 0, acc24 = 0;
  double acc05 = 0, acc15 = 0, acc25 = 0;

  for (int i = 0; i < n; ++i) {
    double a0 = A0[i];
    double a1 = A1[i];
    double a2 = A2[i];
    double b;
    b = B[0]; acc00 += a0 * b; acc10 += a1 * b; acc20 += a2 * b;
    b = B[1]; acc01 += a0 * b; acc11 += a1 * b; acc21 += a2 * b;
    b = B[2]; acc02 += a0 * b; acc12 += a1 * b; acc22 += a2 * b;
    b = B[3]; acc03 += a0 * b; acc13 += a1 * b; acc23 += a2 * b;
    b = B[4]; acc04 += a0 * b; acc14 += a1 * b; acc24 += a2 * b;
    b = B[5]; acc05 += a0 * b; acc15 += a1 * b; acc25 += a2 * b;
    B += ldb;
  }

  double* C0 = C;
  double* C1 = C0 + ldc;
  double* C2 = C1 + ldc;
  C0[0] += acc00; C1[0] += acc10; C2[0] += acc20;
  C0[1] += acc01; C1[1] += acc11; C2[1] += acc21;
  C0[2] += acc02; C1[2] += acc12; C2[2] += acc22;
  C0[3] += acc03; C1[3] += acc13; C2[3] += acc23;
  C0[4] += acc04; C1[4] += acc14; C2[4] += acc24;
  C0[5] += acc05; C1[5] += acc15; C2[5] += acc25;
}

The loop consists of 9 loads, 4 pointer updates,
1 counter decrement, 1 conditional branch, 18 DP multiplies and 18
DP adds. 51 total instructions.
Ideally, it will run in 18 clocks, for IPC = 2.83. Realistically, on
real hardware with cache misses etc., it will take 20-23 clocks and IPC
will be proportionally lower.
What is my point? My point is that I expect a "medium-IPC" kernel like
the one above to achieve higher FLOPS (== better performance) than a
simpler, smaller kernel with IPC in excess of 3.5.
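For context, a 3x6 micro-kernel of this kind is normally called from an outer loop that walks the output in 3-row by 6-column tiles. The following is a minimal sketch of such a driver, assuming row-major storage and dimensions that are exact multiples of the tile size (no edge handling, no cache blocking). kernel_3x6 is a compact stand-in with the same interface as the kernel above, and gemm_tiled is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Compact stand-in for the 3x6 micro-kernel: C[3][6] += A[3][n] * B[n][6]. */
static void kernel_3x6(const double *A, int lda, const double *B, int ldb,
                       double *C, int ldc, int n)
{
    double acc[3][6] = {{0}};
    for (int i = 0; i < n; i++)
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 6; c++)
                acc[r][c] += A[r * lda + i] * B[i * ldb + c];
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 6; c++)
            C[r * ldc + c] += acc[r][c];
}

/* C (M x N) += A (M x K) * B (K x N), all row-major.
   Assumes M is a multiple of 3 and N is a multiple of 6. */
static void gemm_tiled(const double *A, const double *B, double *C,
                       int M, int N, int K)
{
    assert(M % 3 == 0 && N % 6 == 0);
    for (int i = 0; i < M; i += 3)        /* 3 rows of A and C per tile */
        for (int j = 0; j < N; j += 6)    /* 6 columns of B and C per tile */
            kernel_3x6(A + (size_t)i * K, K,
                       B + j, N,
                       C + (size_t)i * N + j, N,
                       K);
}
```

Each kernel call reads 3 values of A and 6 values of B per step of the reduction and performs 18 multiply-add pairs with them, which is where the 9 loads per 36 floating-point operations come from.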

Re: 88xxx or PPC

<2024Mar4.191835@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37784&group=comp.arch#37784

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Mon, 04 Mar 2024 18:18:35 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 59
Message-ID: <2024Mar4.191835@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com>
Injection-Info: dont-email.me; posting-host="5f7b967894908ea9ab01696180a3a5c9";
logging-data="3471136"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+N8huokyy/M8OpPMSDJ028"
Cancel-Lock: sha1:zvmzqiUun/Xp4wWFj6xS6/F0Mcw=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 4 Mar 2024 18:18 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Sun, 03 Mar 2024 22:22:37 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
>> Michael S <already5chosen@yahoo.com> writes:
>> >But even in that not particularly useful answer IPC appears to be the
>> >least useful part. Far worse than FLOPS/Hz.
>>
>>
>> I guess that the 4-wide 21164 also showed close to 4 IPC on
>> matrix300, while its 2 integer units would limit it to much lower
>> performance on, say, intmm.
>>
>> - anton
>
>I don't know about the specific case of matrix300 and what transformations
>are allowed by SPEC rules and what not, but if I were tasked with
>writing generic DGEMM for Alpha 21164 with maximal performance on
>non-small, relatively square matrices as a main goal, then I'd start
>with something like this:
>
>// innermost_loop_3x6 - multiply 3 rows of A[][]
>// by 6 columns of B[][] assuming C-language order
....
>The loop consists of 9 loads, 4 pointer updates,
>1 counter decrement, 1 conditional branch, 18 DP multiplies and 18
>DP adds. 51 total instructions.
>Ideally, it will run in 18 clocks, for IPC = 2.83.

Given that starting 18 FP multiplies and 18 FP additions takes 18
cycles, that is optimal. But you unrolled more than is necessary to
achieve 2 FLOPC (FP operations per cycle). With less unrolling, you
could have achieved the same 2 FLOPC, and of course you would see higher
IPC. And as Mitch Alsup explains, his 5.9 IPC was for a non-unrolled
loop.

>What is my point? My point is that I expect a "medium-IPC" kernel like
>the one above to achieve higher FLOPS (== better performance) than a
>simpler, smaller kernel with IPC in excess of 3.5.

These days, with power limits resulting in lower clocks for programs
that do more work, yes, I guess that you will see better FLOPS from
variants that execute fewer instructions. But in the 90s, CPUs ran at
their rated clock rate no matter what, and a 21164 would run a variant
that does 2 FLOPC at the same speed as any other 2 FLOPC variant,
whether that variant performs 0.83 non-flop instructions/cycle or 1.9
non-flop instructions/cycle.
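The rates used in this comparison follow directly from the instruction counts. A minimal sketch of the arithmetic, with hypothetical names (loop_rates, rates):

```c
#include <assert.h>

typedef struct {
    double ipc;                /* instructions per cycle */
    double flopc;              /* FP operations per cycle */
    double nonflop_per_cycle;  /* everything else per cycle */
} loop_rates;

/* Per-cycle rates for one loop iteration, given its instruction count,
   how many of those instructions are FP operations, and its cycle count. */
static loop_rates rates(int instructions, int flops, int cycles)
{
    assert(cycles > 0 && flops >= 0 && flops <= instructions);
    loop_rates r;
    r.ipc = (double)instructions / cycles;
    r.flopc = (double)flops / cycles;
    r.nonflop_per_cycle = (double)(instructions - flops) / cycles;
    return r;
}
```

For the 3x6 kernel quoted above, rates(51, 36, 18) gives an IPC of about 2.83, 2 FLOPC, and about 0.83 non-flop instructions per cycle.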

But yes, 5.9 IPC on matrix300 shows little about the matrix multiply
performance. Still, I think the point is that there are many
hurdles that might result in a lower IPC (for code where only 6 IPC
means 2 FLOPC); the fact that they achieved 5.9 indicates that they
managed to lower the hurdles a lot. True, it would be better if they
could have shown it with code where 6 IPC is more meaningful.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: 88xxx or PPC

<79443a6a95ad834081492149bb5e619a@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37785&group=comp.arch#37785

Date: Mon, 4 Mar 2024 19:43:11 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$BuH5KHcY8lCVLV6PgSSt.OHgSf/Bs834klPyrMpmXQv1kzuGOgl3O
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at>
Organization: Rocksolid Light
Message-ID: <79443a6a95ad834081492149bb5e619a@www.novabbs.org>
 by: MitchAlsup1 - Mon, 4 Mar 2024 19:43 UTC

Anton Ertl wrote:

> Michael S <already5chosen@yahoo.com> writes:
>>On Sun, 03 Mar 2024 22:22:37 GMT
>>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>
>>> Michael S <already5chosen@yahoo.com> writes:
>>> >But even in that not particularly useful answer IPC appears to be the
>>> >least useful part. Far worse than FLOPS/Hz.
>>>
>>>
>>> I guess that the 21164 also showed close to 4 IPC on the 4-wide 21164
>>> on matrix300, while its 2 integer units would limit it to much lower
>>> performance on, say, intmm.
>>>
>>> - anton
>>
>>I don't know about specific case of matrix300 and what transformations
>>are allowed by SPEC rules and what not, but if I were tasked with
>>writing generic DGEMM for Alpha 21164 with maximal performance on
>>non-small relatively square matrices as a main goal, then I'd start
>>with something like that:
>>
>>// main_loop_3x6 - multiply 3 rows of A[][]
>>// by 6 columns of B[][] assuming C-language order
> ....
>>The loop consists of 9 loads, 4 pointer updates,
>>1 counter decrement, 1 conditional branch, 18 DP multiplies and 18
>>DP adds. 51 total instructions.
>>Ideally, it will run in 18 clocks, for IPC = 2.83.

> Given that starting 18 FP multiplies and 18 FP additions takes 18
> cycles, that is optimal. But you unrolled more than is necessary to
> achieve 2FlOPC (FP operations per cycle). With less unrolling, you
> could have achieved the same 2FLOPC and of course you would see higher
> IPC. And as Mitch Alsup explains, his 5.9 IPC was for a non-unrolled
> loop.

>>What is my point? My point is that I expect "medium-IPC" kernel like
>>above to achieve higher FLOPS (== better performance) than simpler,
>>smaller kernel with IPC in excess of 3.5.

> These days, with power limits resulting in lower clocks for programs
> that do more work, yes, I guess that you will see better FLOPS from
> variants that execute fewer instructions. But in the 90s, CPUs ran at
> their rated clock rate no matter what, and a 21164 would run a variant
> that does 2 FLOPc at the same speed as any other 2 FLOPC variant,
> whether that variant performs 0.83 non-flop instructions/cycle or 1.9
> non-flop instructions/cycle.

> But yes, 5.9 IPC on matrix300 shows little about the matrix multiply
> performance. Still, I think that the point is that there are many
> hurdles that might result in a lower IPC (for code where only 6IPC
> means 2FLOPC), the fact that they achieved 5.9 indicates that they
> managed to lower the hurdles a lot; true, it would be better if they
> could have shown it with code where 6IPC is more meaningful.

The processor for which that IPC was stated had a 16KB L1 DM Cache
with 4 banks used twice per cycle and a 16-byte line, so DGEMM was
<essentially> always taking cache misses (every other cycle). Most
of the performance of the overall design came down to 3 things::
a) The DRAM memory system, which could start 2 new accesses every
   cycle (1 RD, 1 WT);
b) The zero-cycle branch mispredict repair; and
c) The short 6-cycle pipeline from Fetch to Retire.
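
To see where "every other" comes from, here is a rough Python sketch of a streaming access pattern against a cache with 16-byte lines. It assumes "DM" means direct-mapped and that the operands are 8-byte doubles read sequentially from an array much larger than the cache; banking, the memory system and everything else about the real machine are ignored, and the names are made up for illustration.

```python
LINE_BYTES = 16                 # line size stated above
CACHE_BYTES = 16 * 1024         # 16KB L1
NUM_LINES = CACHE_BYTES // LINE_BYTES

def miss_rate(addresses):
    """Fraction of accesses that miss in a direct-mapped cache (assumed organization)."""
    resident = [None] * NUM_LINES      # which memory line each cache slot holds
    misses = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        slot = line % NUM_LINES
        if resident[slot] != line:     # miss: fetch the line into its slot
            resident[slot] = line
            misses += 1
        total += 1
    return misses / total

# Stream 8-byte doubles through 1MB of memory, far more than the cache holds.
stream = range(0, 1 << 20, 8)
print(miss_rate(stream))   # 0.5: each 16-byte line holds only two doubles
```

With two doubles per line, a sequential stream misses on the first double of every line and hits on the second, so half of the accesses miss no matter how large the cache is.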

And most of this centered around what we called the conditional
cache--a RoB for memory <if you will>--a place where STs could be
placed and LDs could access them, but whose contents would not get
written to L1 if the instruction packet could not retire (mispredict
or exception).

No processor today is doing any of these (well maybe My 66600...)
The BOOM RISC-V processor has a 7 stage front-end and 3 cycle branch
mispredict repair...

> - anton

Re: Efficiency of in-order vs. OoO

<us5cdl$3bfbl$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37786&group=comp.arch#37786

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Efficiency of in-order vs. OoO
Date: Mon, 4 Mar 2024 21:54:12 +0100
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <us5cdl$3bfbl$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me> <unbumf$lot2$1@dont-email.me>
<uncbrq$o1ml$1@dont-email.me>
<06c802c7848a4a522bc022bbd2fdce68@news.novabbs.com>
<2024Jan7.091347@mips.complang.tuwien.ac.at> <uoovaf$1crob$1@dont-email.me>
<2024Jan24.084731@mips.complang.tuwien.ac.at> <uorlft$1ti8a$1@dont-email.me>
<2024Jan24.225412@mips.complang.tuwien.ac.at>
<cslsN.381334$83n7.291175@fx18.iad>
<2024Jan25.074631@mips.complang.tuwien.ac.at>
<nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at>
<jWPsN.286143$c3Ea.55679@fx10.iad>
<a292b7f4fd21a329c25a686bb16a2b4d@www.novabbs.org>
<gkxtN.44654$SyNd.35025@fx33.iad> <urg470$215g3$3@dont-email.me>
<zp2DN.467627$xHn7.106497@fx14.iad>
<009563f7507fe3c5367a5c4dff2a6bdc@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 4 Mar 2024 20:54:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0b26dd3adf4f69716c8d32985f333305";
logging-data="3521909"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/uAunUWc4gNO+biYcubraJwB2WrOJjBfbLXRS4rCoSiA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18.1
Cancel-Lock: sha1:J2inJXD13UUJumY1fipGGQIyOFM=
In-Reply-To: <009563f7507fe3c5367a5c4dff2a6bdc@www.novabbs.org>
 by: Terje Mathisen - Mon, 4 Mar 2024 20:54 UTC

MitchAlsup1 wrote:
> EricP wrote:
>> My only exception handler that is triggered with any regularity is
>> page fault (assuming a hardware table walker so no TLB miss exceptions),
>> and it typically invokes a handler with many thousands of instructions
>> so prefetching that code a few clocks earlier won't make any difference.
>
> If you use it often enough it will still be in your cache when you next
> need it. {I don't remember exactly who told me this, but it was one of
> the original MIPS (the company not Stanford) guys}; so you don't need to
> prefetch it.

That has been my rule-of-thumb for lookup tables replacing logic: If the
table is small enough and used often enough that it could make a
significant difference to the total runtime, then it will also stay in
cache nearly all the time.

If it does get evicted between uses most of the time, then it simply
wasn't that important.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: 88xxx or PPC

<20240305001833.000027a9@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37788&group=comp.arch#37788

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Tue, 5 Mar 2024 00:18:33 +0200
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <20240305001833.000027a9@yahoo.com>
References: <uigus7$1pteb$1@dont-email.me>
<nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at>
<urg471$215g3$5@dont-email.me>
<d6206301512dacecc2d5648276a6a802@www.novabbs.org>
<us0cn8$24dt8$1@dont-email.me>
<ac55c75a923144f72d204c801ff7f984@www.novabbs.org>
<20240303165533.00004104@yahoo.com>
<2024Mar3.173345@mips.complang.tuwien.ac.at>
<20240303203052.00007c61@yahoo.com>
<2024Mar3.232237@mips.complang.tuwien.ac.at>
<20240304171457.000067ea@yahoo.com>
<2024Mar4.191835@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="ca10b4c024403001a5662ed8e59df858";
logging-data="3548838"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18hB5ulI1VIBEJEUDa/QtGAS8i5b8eg+yc="
Cancel-Lock: sha1:C+AJgtgEc7qr4MRNfUWcRUWINeA=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Mon, 4 Mar 2024 22:18 UTC

On Mon, 04 Mar 2024 18:18:35 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Michael S <already5chosen@yahoo.com> writes:
> >On Sun, 03 Mar 2024 22:22:37 GMT
> >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >
> >> Michael S <already5chosen@yahoo.com> writes:
> >> >But even in that not particularly useful answer IPC appears to be
> >> >the least useful part. Far worse than FLOPS/Hz.
> >>
> >>
> >> I guess that the 21164 also showed close to 4 IPC on the 4-wide
> >> 21164 on matrix300, while its 2 integer units would limit it to
> >> much lower performance on, say, intmm.
> >>
> >> - anton
> >
> >I don't know about specific case of matrix300 and what
> >transformations are allowed by SPEC rules and what not, but if I
> >were tasked with writing generic DGEMM for Alpha 21164 with maximal
> >performance on non-small relatively square matrices as a main goal,
> >then I'd start with something like that:
> >
> >// main_loop_3x6 - multiply 3 rows of A[][]
> >// by 6 columns of B[][] assuming C-language order
> ...
> >The loop consists of 9 loads, 4 pointer updates,
> >1 counter decrement, 1 conditional branch, 18 DP multiplies and 18
> >DP adds. 51 total instructions.
> >Ideally, it will run in 18 clocks, for IPC = 2.83.
>
> Given that starting 18 FP multiplies and 18 FP additions takes 18
> cycles, that is optimal. But you unrolled more than is necessary to
> achieve 2FlOPC (FP operations per cycle). With less unrolling, you
> could have achieved the same 2FLOPC and of course you would see higher
> IPC. And as Mitch Alsup explains, his 5.9 IPC was for a non-unrolled
> loop.
>
> >What is my point? My point is that I expect "medium-IPC" kernel like
> >above to achieve higher FLOPS (== better performance) than simpler,
> >smaller kernel with IPC in excess of 3.5.
>
> These days, with power limits resulting in lower clocks for programs
> that do more work, yes, I guess that you will see better FLOPS from
> variants that execute fewer instructions. But in the 90s, CPUs ran at
> their rated clock rate no matter what, and a 21164 would run a variant
> that does 2 FLOPc at the same speed as any other 2 FLOPC variant,
> whether that variant performs 0.83 non-flop instructions/cycle or 1.9
> non-flop instructions/cycle.
>

In the 90s, CPUs had other reasons to minimize the # of instructions and
esp. the # of load instructions per task. E.g. too few banks in the L1D
cache, so a cache that in theory supports two accesses per clock is in
practice closer to 1. E.g. very few hits served under miss. E.g. low
associativity. E.g. a theoretically 4-wide instruction Fetch/Decode that
in practice delivers 4 decoded instructions only when all inner planets
in the solar system are aligned.
According to my understanding, the 21164, being a speed racer, suffered
from these sorts of problems more than most competitors.
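
A toy model of the banking effect, in Python. It assumes two independent, uniformly distributed accesses are presented each cycle and that a same-bank pair is serviced one at a time; real access streams are correlated, so this is only a rough upper bound, and the function name is hypothetical.

```python
from fractions import Fraction

def expected_accesses_per_cycle(banks):
    """Expected accesses serviced per cycle when two random accesses are presented."""
    conflict = Fraction(1, banks)              # both accesses land in the same bank
    return 1 * conflict + 2 * (1 - conflict)   # one serviced on conflict, else two

for banks in (1, 2, 4, 8):
    print(banks, float(expected_accesses_per_cycle(banks)))
# prints: 1 1.0 / 2 1.5 / 4 1.75 / 8 1.875 (one pair per line)
```

Even under this optimistic assumption a 2-bank cache averages 1.5 accesses per clock rather than 2; correlated addresses, as in strided array code, push it lower.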

> But yes, 5.9 IPC on matrix300 shows little about the matrix multiply
> performance. Still, I think that the point is that there are many
> hurdles that might result in a lower IPC (for code where only 6IPC
> means 2FLOPC), the fact that they achieved 5.9 indicates that they
> managed to lower the hurdles a lot; true, it would be better if they
> could have shown it with code where 6IPC is more meaningful.
>
> - anton

Re: 88xxx or PPC

<0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37789&group=comp.arch#37789

Date: Tue, 5 Mar 2024 00:05:45 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$02.w64fOwRkht6Lt1LKNFevxticsNeHcUnRrmNdGBM8dOcL2c8eCm
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com>
Organization: Rocksolid Light
Message-ID: <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>
 by: MitchAlsup1 - Tue, 5 Mar 2024 00:05 UTC

Michael S wrote:

> On Mon, 04 Mar 2024 18:18:35 GMT
> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

>> These days, with power limits resulting in lower clocks for programs
>> that do more work, yes, I guess that you will see better FLOPS from
>> variants that execute fewer instructions. But in the 90s, CPUs ran at
>> their rated clock rate no matter what, and a 21164 would run a variant
>> that does 2 FLOPc at the same speed as any other 2 FLOPC variant,
>> whether that variant performs 0.83 non-flop instructions/cycle or 1.9
>> non-flop instructions/cycle.
>>

> In 90-x CPUs had other reasons to minimize the # of instructions and

Everyone always has excellent reasons to minimize the number of instructions.

Over in CISC-land, it takes fewer instructions to get the job done.
Over in RISC-land, the instructions run in fewer nanoseconds.

The critical term in performance is::

seconds/program = instructions/program × cycles/instruction × seconds/cycle

RISCs tend to get instructions/program wrong and cycles/instruction right,
while
CISCs tend to get instructions/program right and cycles/instruction wrong.

I happen to believe that between RISC and CISC is a realm where one needs
fewer instructions but sacrifices essentially nothing in the frequency
department.

My 66000 tends to have only 10% more instructions than VAX while RISC-V
tends to have 50% more instructions than VAX--My 66000 needs only 71% of
the instruction count of RISC-V.
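
The performance equation and the instruction-count comparison can be sketched in Python as follows. The relative counts are the round figures quoted above (VAX = 1.00, My 66000 = 1.10, RISC-V = 1.50); the CPI and cycle time used at the end are purely hypothetical, chosen equal so that only the instruction count differs.

```python
def seconds_per_program(instructions, cycles_per_instruction, seconds_per_cycle):
    """seconds/program = instructions/program x cycles/instruction x seconds/cycle"""
    return instructions * cycles_per_instruction * seconds_per_cycle

# Relative instruction counts, normalized to VAX = 1.00 (round figures from the text).
MY66000 = 1.10
RISCV = 1.50
print(f"My 66000 / RISC-V instruction count: {MY66000 / RISCV:.1%}")   # 73.3%

# Hypothetical: same CPI and same clock for both, 1e9 VAX-equivalent instructions.
t_my66000 = seconds_per_program(1_100_000_000, 1.0, 1e-9)
t_riscv = seconds_per_program(1_500_000_000, 1.0, 1e-9)
print(f"{t_my66000:.2f} s vs {t_riscv:.2f} s")   # 1.10 s vs 1.50 s
```

With the round 10% and 50% figures the ratio comes out near 73%, in the same neighborhood as the 71% stated; the exact value depends on the measured counts, which are not given here.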

> esp. the # of load instructions per task. E.g. too few banks in L1D
> cache, so the cache that in theory supports two accesses per clock in
> practice is closer to 1.

CISCs generally have a 45%-50% memory reference density, while
RISCs generally have a 30%-35% memory reference density. So, CISCs tend
to run into the cache banking wall at 2 IPC, while RISCs delay that wall
until 3 IPC.
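
A back-of-the-envelope Python check of where the wall sits. It assumes the cache sustains about one access per clock in practice (my reading of the "closer to 1" remark quoted above, not a measured figure), so the IPC ceiling is simply the access rate divided by the memory reference density.

```python
def ipc_wall(mem_ref_density, accesses_per_cycle=1.0):
    """IPC at which memory references alone saturate the cache ports."""
    return accesses_per_cycle / mem_ref_density

for label, low, high in (("CISC", 0.45, 0.50), ("RISC", 0.30, 0.35)):
    print(f"{label}: {ipc_wall(high):.2f} to {ipc_wall(low):.2f} IPC")
# prints: CISC: 2.00 to 2.22 IPC
#         RISC: 2.86 to 3.33 IPC
```

Under that assumption the densities land on roughly 2 IPC and 3 IPC; doubling `accesses_per_cycle` doubles both ceilings.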

> E.g. very few hits served under miss.

Accesses are correlated, so this is to be expected. The real question is
whether you can still perform with miss under miss !! Even if you don't
take hits, you can still get the next request out "there" sooner. Sooner
saves latency.

> E.g. low associativity.

Associativity costs power and area.

> E.g. theoretically 4-wide instruction Fetch/Decode that
> in practice delivers 4 decoded instructions only when all inner planets
> in solar system are aligned.

A 4-wide instruction fetch yields only 2.5 instructions per random access.
This is just std math:: (1+2+3+4)/4 = 2.5

But access to good predication means up to 1/3rd of all short branches can
be avoided. Few ISAs have access to "good" predication. Here, a good solution
for predication drives the random 4-wide fetch access to deliver 3.25
instructions per fetch; a 50% increase.
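
The 2.5 figure can be reproduced with a few lines of Python. The model assumed here is that a branch lands at a uniformly random slot of an aligned 4-wide fetch group, so only the slots from the entry point to the end of the group are useful; the function name is made up.

```python
from fractions import Fraction

def avg_useful_slots(width):
    """Average useful instructions per fetch when the entry slot is uniformly random."""
    # Entering at slot k (0-based) leaves width - k useful instructions.
    return Fraction(sum(width - k for k in range(width)), width)

baseline = avg_useful_slots(4)
print(float(baseline))                     # 2.5, i.e. (1+2+3+4)/4
print(float(Fraction(13, 4) / baseline))   # 1.3: the quoted 3.25 relative to 2.5
```

The second line simply divides the quoted 3.25 by the baseline under the same model; how predication shifts the distribution of entry points is not modeled.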

> According to my understanding, 21164 being speed racer suffered from
> that sort of problems more than most competitors.

Because it was wider and faster, it was more dependent on "everything
working well all the time", and the fact that it ran at a higher
frequency than the others means all its bad cache behavior got
multiplied by the latency to deeper levels of the memory hierarchy !

Re: 88xxx or PPC

<us5t1c$36voh$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37790&group=comp.arch#37790

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Mon, 4 Mar 2024 17:37:48 -0800
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <us5t1c$36voh$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me> <nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me>
<d6206301512dacecc2d5648276a6a802@www.novabbs.org>
<us0cn8$24dt8$1@dont-email.me>
<ac55c75a923144f72d204c801ff7f984@www.novabbs.org>
<20240303165533.00004104@yahoo.com>
<2024Mar3.173345@mips.complang.tuwien.ac.at>
<20240303203052.00007c61@yahoo.com>
<2024Mar3.232237@mips.complang.tuwien.ac.at>
<20240304171457.000067ea@yahoo.com>
<2024Mar4.191835@mips.complang.tuwien.ac.at>
<20240305001833.000027a9@yahoo.com>
<0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 5 Mar 2024 01:37:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6207aa1737e71cfbbf69f822a72086c0";
logging-data="3374865"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ITZSnic/t4eTefJKMVcsPvlWaUYLoWNs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:O8CUOfY2RT3bW/LNeorq5iConVc=
Content-Language: en-US
In-Reply-To: <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>
 by: Stephen Fuld - Tue, 5 Mar 2024 01:37 UTC

On 3/4/2024 4:05 PM, MitchAlsup1 wrote:
> Michael S wrote:
>
>> On Mon, 04 Mar 2024 18:18:35 GMT
>> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
>>> These days, with power limits resulting in lower clocks for programs
>>> that do more work, yes, I guess that you will see better FLOPS from
>>> variants that execute fewer instructions.  But in the 90s, CPUs ran at
>>> their rated clock rate no matter what, and a 21164 would run a variant
>>> that does 2 FLOPc at the same speed as any other 2 FLOPC variant,
>>> whether that variant performs 0.83 non-flop instructions/cycle or 1.9
>>> non-flop instructions/cycle.
>>>
>
>> In 90-x CPUs had other reasons to minimize the # of instructions and
>
> Everyone Always have excellent reasons to minimize the number of
> instructions.
>
> Over in CISC-land, it takes fewer instructions to get the job done.
> Over in RISC-land, the instructions run in fewer nanoseconds.
>
> The critical term in performance is::
>
> seconds/program = instructions/program × cycles/instruction × seconds/cycle
>
> RISCs tend to get instructions/program wrong and cycles/instruction
> right, while
> CISCs tend to get instructions/program right and cycles/instruction wrong.
>
> I happen to believe that between RISC and CISC is a realm where one needs
> fewer instructions but sacrifices essentially nothing in the frequency
> department.
>
> My 66000 tends to have only 10% more instructions than VAX while RISC-V
> tends to have 50% more instructions than VAX--My 66000 needs only 71%
> the instruction count as RISC-V.
>
>> esp. the # of load instructions per task. E.g. too few banks in L1D
>> cache, so the cache that in theory supports two accesses per clock in
>> practice is closer to 1.
>
> CISCs generally have a 45%-50% memory reference density, while
> RISCs generally have a 30%-35% memory reference density.

If those percentages are number of loads & stores divided by total
instruction count, isn't this just a restatement of your previous point
that CISCs need fewer instructions to do the job? i.e. the *time*
between loads or stores is the same for RISCs and CISCs?
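
To put rough numbers on the question, a small Python sketch. It uses the midpoints of the quoted density ranges and the quoted "RISC-V has 50% more instructions than VAX" as stand-ins for CISC and RISC in general, which is a crude assumption of mine.

```python
def mem_refs_per_unit_work(relative_instruction_count, mem_ref_density):
    """Memory references needed for the same job, in arbitrary units."""
    return relative_instruction_count * mem_ref_density

cisc = mem_refs_per_unit_work(1.00, 0.475)   # VAX-like: fewer instructions, denser
risc = mem_refs_per_unit_work(1.50, 0.325)   # RISC-V-like: more instructions, sparser
print(f"CISC {cisc:.4f} vs RISC {risc:.4f}")  # CISC 0.4750 vs RISC 0.4875
```

With these particular inputs the total number of memory references per job comes out nearly the same, which is the situation the question describes; different instruction-count ratios would move the two apart.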

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: 88xxx or PPC

<us5vlr$3ih4t$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37791&group=comp.arch#37791

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Mon, 4 Mar 2024 21:22:49 -0500
Organization: A noiseless patient Spider
Lines: 8
Message-ID: <us5vlr$3ih4t$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<2024Jan25.074631@mips.complang.tuwien.ac.at>
<nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me>
<d6206301512dacecc2d5648276a6a802@www.novabbs.org>
<us0cn8$24dt8$1@dont-email.me>
<ac55c75a923144f72d204c801ff7f984@www.novabbs.org>
<20240303165533.00004104@yahoo.com>
<2024Mar3.173345@mips.complang.tuwien.ac.at>
<20240303203052.00007c61@yahoo.com>
<1a8a8601b08f6cca5457d663f7ffa3b2@www.novabbs.org>
<2024Mar4.095209@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 5 Mar 2024 02:22:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c220762343412247c24b8a178013045b";
logging-data="3753117"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18n76VOh94NRaUjpSdyFYsrtIHOFZPhgE4="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:Xj0d7jxPo2/Myxj/Flwxzv1wXgY=
In-Reply-To: <2024Mar4.095209@mips.complang.tuwien.ac.at>
 by: Paul A. Clayton - Tue, 5 Mar 2024 02:22 UTC

On 3/4/24 3:52 AM, Anton Ertl wrote:
[snip]
> That was fixed by the cache-blocking transformation that everybody
> used, and which resulted in the elimination of matrix300 from SPEC92.

I seem to recall that art from SPEC CPU 2000 had a similar issue,
where an array-of-structures to structure-of-arrays compiler
transformation significantly improved performance.

Re: 88xxx or PPC

<df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37792&group=comp.arch#37792

Date: Tue, 5 Mar 2024 02:33:21 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$DfP0WXonFPjJxuWILlxOOOrEtSwkWKmbBWByFboRudcitXp5P5aWW
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <nCtsN.64363$Sf59.39184@fx48.iad> <2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me> <d6206301512dacecc2d5648276a6a802@www.novabbs.org> <us0cn8$24dt8$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>
 by: MitchAlsup1 - Tue, 5 Mar 2024 02:33 UTC

Stephen Fuld wrote:

> On 3/4/2024 4:05 PM, MitchAlsup1 wrote:
>> Michael S wrote:
>>
>>> On Mon, 04 Mar 2024 18:18:35 GMT
>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>
>>>> These days, with power limits resulting in lower clocks for programs
>>>> that do more work, yes, I guess that you will see better FLOPS from
>>>> variants that execute fewer instructions.  But in the 90s, CPUs ran at
>>>> their rated clock rate no matter what, and a 21164 would run a variant
>>>> that does 2 FLOPc at the same speed as any other 2 FLOPC variant,
>>>> whether that variant performs 0.83 non-flop instructions/cycle or 1.9
>>>> non-flop instructions/cycle.
>>>>
>>
>>> In 90-x CPUs had other reasons to minimize the # of instructions and
>>
>> Everyone Always have excellent reasons to minimize the number of
>> instructions.
>>
>> Over in CISC-land, it takes fewer instructions to get the job done.
>> Over in RISC-land, the instructions run in fewer nanoseconds.
>>
>> The critical term in performance is::
>>
>> seconds/program = instructions/program × cycles/instruction × seconds/cycle
>>
>> RISCs tend to get instructions/program wrong and cycles/instruction
>> right, while
>> CISCs tend to get instructions/program right and cycles/instruction wrong.
>>
>> I happen to believe that between RISC and CISC is a realm where one needs
>> fewer instructions but sacrifices essentially nothing in the frequency
>> department.
>>
>> My 66000 tends to have only 10% more instructions than VAX while RISC-V
>> tends to have 50% more instructions than VAX--My 66000 needs only 71%
>> the instruction count as RISC-V.
>>
>>> esp. the # of load instructions per task. E.g. too few banks in L1D
>>> cache, so the cache that in theory supports two accesses per clock in
>>> practice is closer to 1.
>>
>> CISCs generally have a 45%-50% memory reference density, while
>> RISCs generally have a 30%-35% memory reference density.

> If those percentages are number of loads & stores divided by total
> instruction count, isn't this just a restatement of your previous point
> that CISCs need fewer instructions to do the job? i.e. the *time*
> between loads or stores is the same for RISCs and CISCs?

Not when you include various other facts::
CISCs tend to have fewer registers
CISCs tend to have LD-OPs and some have LD-OP-STs
Both of the above give the compiler the illusion that inbound memory
references are less expensive than a typical LD because you get the
LD, and you don't have to waste a precious register. Thus there are
more memory references--but RISC compilers have taught us that more
registers beats LD-OPs--pipeline designers have taught us that thinner
pipelines perform better--both stand against LD-OPs and LD-OP-STs.

VAX went so far as to allow any operand and any result to be memory
{Most of us now believe this was a massive overstep.}

CISCs really do perform more memory references--not by as much as the
above statistics imply, but significantly more memory references.

Re: 88xxx or PPC

<us68aa$3jrj2$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=37793&group=comp.arch#37793

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Mon, 4 Mar 2024 20:50:18 -0800
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <us68aa$3jrj2$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me> <nCtsN.64363$Sf59.39184@fx48.iad>
<2024Jan25.162230@mips.complang.tuwien.ac.at> <urg471$215g3$5@dont-email.me>
<d6206301512dacecc2d5648276a6a802@www.novabbs.org>
<us0cn8$24dt8$1@dont-email.me>
<ac55c75a923144f72d204c801ff7f984@www.novabbs.org>
<20240303165533.00004104@yahoo.com>
<2024Mar3.173345@mips.complang.tuwien.ac.at>
<20240303203052.00007c61@yahoo.com>
<2024Mar3.232237@mips.complang.tuwien.ac.at>
<20240304171457.000067ea@yahoo.com>
<2024Mar4.191835@mips.complang.tuwien.ac.at>
<20240305001833.000027a9@yahoo.com>
<0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>
<us5t1c$36voh$1@dont-email.me>
<df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 5 Mar 2024 04:50:18 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6207aa1737e71cfbbf69f822a72086c0";
logging-data="3796578"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19wfv/s43eyRKg67BhJueLOymS6TsIdbNo="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:QcrMPBdXX7DaIMHX5nQ2iBJTLqE=
In-Reply-To: <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>
Content-Language: en-US
 by: Stephen Fuld - Tue, 5 Mar 2024 04:50 UTC

On 3/4/2024 6:33 PM, MitchAlsup1 wrote:
> Stephen Fuld wrote:
>
>> On 3/4/2024 4:05 PM, MitchAlsup1 wrote:

snip

>>> CISCs generally have a 45%-50% memory reference density, while
>>> RISCs generally have a 30%-35% memory reference density.
>
>> If those percentages are number of loads & stores divided by total
>> instruction count, isn't this just a restatement of your previous
>> point that CISCs need fewer instructions to do the job?  i.e. the
>> *time* between loads or stores is the same for RISCs and CISCs?
>
>
> Not when you include various other facts::
> CISCs tend to have fewer registers
> CISCs tend to have LD-OPs and some have LD-OP-STs
> Both of the above give the compiler the illusion that inbound memory
> references are less expensive than a typical LD because you get the
> LD, and you don't have to waste a precious register. Thus there are
> more memory references--but RISC compilers have taught us that more
> registers beats LD-OPs--pipeline designers have taught us that thinner
> pipelines perform better--both stand against LD-OPs and LD-OP-STs.
>
> VAX went so far as to allow any operand and any result to be memory
> {Most of us now believe this was a massive overstep.}
>
> CISCs really do perform more memory references--not by as much as the
> above statistics imply, but significantly more memory references.

Interesting. Thanks.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: 88xxx or PPC

<3hGFN.115182$m4d.77183@fx43.iad>

https://news.novabbs.org/devel/article-flat.php?id=37795&group=comp.arch#37795
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx43.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: 88xxx or PPC
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>
Lines: 27
Message-ID: <3hGFN.115182$m4d.77183@fx43.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Tue, 05 Mar 2024 14:48:31 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Tue, 05 Mar 2024 14:48:31 GMT
X-Received-Bytes: 1903
 by: Scott Lurndal - Tue, 5 Mar 2024 14:48 UTC

mitchalsup@aol.com (MitchAlsup1) writes:
>Stephen Fuld wrote:
>
>> On 3/4/2024 4:05 PM, MitchAlsup1 wrote:

>>>
>>> CISCs generally have a 45%-50% memory reference density, while
>>> RISCs generally have a 30%-35% memory reference density.
>
>> If those percentages are number of loads & stores divided by total
>> instruction count, isn't this just a restatement of your previous point
>> that CISCs need fewer instructions to do the job? i.e. the *time*
>> between loads or stores is the same for RISCs and CISCs?
>
>
>Not when you include various other facts::
>CISCs tend to have fewer registers

How much of that is because active CISC architectures
are forty or fifty years old?

Would a modern, designed from scratch, CISC architecture
still restrict the number of registers?

If memory access ever becomes as fast as register access,
all bets will be off...

Re: 88xxx or PPC

<0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37798&group=comp.arch#37798
Date: Tue, 5 Mar 2024 15:44:29 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$THAMWBoTnOaWvFjHeAlDF.ZspnsoPwNJYtZcyoLw9DMYICmweJFSS
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad>
Organization: Rocksolid Light
Message-ID: <0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org>
 by: MitchAlsup1 - Tue, 5 Mar 2024 15:44 UTC

Scott Lurndal wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
>>Stephen Fuld wrote:
>>
>>> On 3/4/2024 4:05 PM, MitchAlsup1 wrote:

>>>>
>>>> CISCs generally have a 45%-50% memory reference density, while
>>>> RISCs generally have a 30%-35% memory reference density.
>>
>>> If those percentages are number of loads & stores divided by total
>>> instruction count, isn't this just a restatement of your previous point
>>> that CISCs need fewer instructions to do the job? i.e. the *time*
>>> between loads or stores is the same for RISCs and CISCs?
>>
>>
>>Not when you include various other facts::
>>CISCs tend to have fewer registers

> How much of that is because active CISC architectures
> are forty or fifty years old?

The problems of encoding remain as relevant today as 50 years ago.
But the things one wants the ISA to do are larger today than 50 YA.
Those encodings with LD-OPs are pretty much restricted to 16 registers
(16 base registers) and here you still have OpCode mapping difficulties.

If you give up on LD-OPs to gain register count, you are already in the
RISC-camp.

> Would a modern, designed from scratch, CISC architecture
> still restrict the number of registers?

I wanted to do a 64-bit VAX minus the indirect address modes and
give it 32 registers. Never got around to it though.

My experience with My 66000 indicates one can get vanishingly close
to VAX instruction density (and count) with a RISC ISA done right.

> If memory access ever becomes as fast as register access,
> all bets will be off...

It won't, and never has.
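
The encoding pressure mentioned in this reply can be illustrated with a small calculation. This is only a sketch: the three-operand format and the function name are assumptions for illustration and do not describe any particular ISA discussed in the thread.

```python
def register_field_bits(num_registers, num_operands):
    """Bits an instruction spends naming registers.

    Each register specifier needs ceil(log2(num_registers)) bits;
    (n - 1).bit_length() computes that exactly for n >= 1.
    """
    bits_per_field = (num_registers - 1).bit_length()
    return bits_per_field * num_operands


# Hypothetical three-operand instruction (two sources, one destination).
for regs in (16, 32):
    print(regs, register_field_bits(regs, 3))
# prints:
# 16 12
# 32 15
```

Going from 16 to 32 registers costs one extra bit per register field. In a fixed-width instruction those bits have to come from somewhere, such as opcode space or the addressing-mode and displacement bits that an LD-OP form would want.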

Re: 88xxx or PPC

<4FHFN.165452$taff.97167@fx41.iad>

https://news.novabbs.org/devel/article-flat.php?id=37800&group=comp.arch#37800
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx41.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: 88xxx or PPC
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad> <0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org>
Lines: 10
Message-ID: <4FHFN.165452$taff.97167@fx41.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Tue, 05 Mar 2024 16:22:24 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Tue, 05 Mar 2024 16:22:24 GMT
X-Received-Bytes: 1297
 by: Scott Lurndal - Tue, 5 Mar 2024 16:22 UTC

mitchalsup@aol.com (MitchAlsup1) writes:
>Scott Lurndal wrote:

>> If memory access ever becomes as fast as register access,
>> all bets will be off...
>
>It won't, and never has.

There were a number of historic implementations where the
registers were actually stored in low memory.

Re: 88xxx or PPC

<98fa39483907eb8f0b44a13fa1596981@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37803&group=comp.arch#37803
Date: Tue, 5 Mar 2024 17:33:23 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$oJ45UbkUmKGcNxeQNChrGeIg4XygGJ7yeXuoIRqrbOO4jKWhNBNj2
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad> <0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org> <4FHFN.165452$taff.97167@fx41.iad>
Organization: Rocksolid Light
Message-ID: <98fa39483907eb8f0b44a13fa1596981@www.novabbs.org>
 by: MitchAlsup1 - Tue, 5 Mar 2024 17:33 UTC

Scott Lurndal wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
>>Scott Lurndal wrote:

>>> If memory access ever becomes as fast as register access,
>>> all bets will be off...
>>
>>It won't, and never has.

> There were a number of historic implementations where the
> registers were actually stored in low memory.

Yes, but that is making registers as slow as memory,
not making memory as fast as registers.

Re: 88xxx or PPC

<2024Mar5.185714@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=37805&group=comp.arch#37805
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Tue, 05 Mar 2024 17:57:14 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 11
Message-ID: <2024Mar5.185714@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad>
Injection-Info: dont-email.me; posting-host="e3a6c9aa69f794531e98f8bdcaa53fa3";
logging-data="4092778"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+4LSrkDau93PWJuDwgNOgK"
Cancel-Lock: sha1:SoSF4AHS6VX7Tpdw+EsNe6klYXA=
X-newsreader: xrn 10.11
 by: Anton Ertl - Tue, 5 Mar 2024 17:57 UTC

scott@slp53.sl.home (Scott Lurndal) writes:
>Would a modern, designed from scratch, CISC architecture
>still restrict the number of registers?

Designed from scratch? AVX-512 supports 32 xmm/ymm/zmm registers.
Intel APX will support 32 GPRs.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: 88xxx or PPC

<2eJFN.88677$zqTf.47871@fx35.iad>

https://news.novabbs.org/devel/article-flat.php?id=37806&group=comp.arch#37806
Path: i2pn2.org!i2pn.org!newsfeed.endofthelinebbs.com!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx35.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: 88xxx or PPC
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad> <0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org> <4FHFN.165452$taff.97167@fx41.iad> <98fa39483907eb8f0b44a13fa1596981@www.novabbs.org>
Lines: 28
Message-ID: <2eJFN.88677$zqTf.47871@fx35.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Tue, 05 Mar 2024 18:10:06 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Tue, 05 Mar 2024 18:10:06 GMT
X-Received-Bytes: 1990
 by: Scott Lurndal - Tue, 5 Mar 2024 18:10 UTC

mitchalsup@aol.com (MitchAlsup1) writes:
>Scott Lurndal wrote:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>Scott Lurndal wrote:
>
>>>> If memory access ever becomes as fast as register access,
>>>> all bets will be off...
>>>
>>>It won't, and never has.
>
>> There were a number of historic implementations where the
>> registers were actually stored in low memory.
>
>
>Yes, but that is making registers as slow as memory,
> not making memory as fast as registers.

With something like SRAM, with less than 1ns latency, then. With that
you might completely eliminate registers and do everything
memory-to-memory. The small gain lost by eliminating
registers will likely be offset by fewer instructions
to execute. Sufficient for most desktop users, surely.

That's assuming SRAM can scale at reasonable cost due
to some future technology breakthrough (or some other
future non-volatile memory technology with favorable
access timing).

Re: 88xxx or PPC

<08404980d6265588ebd10715e966b1da@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=37814&group=comp.arch#37814
Date: Tue, 5 Mar 2024 19:28:01 +0000
Subject: Re: 88xxx or PPC
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$JyZmgFS842QS/lzJOWw2bOIJp1SFhufNNzodPMDT/15LckvjEwfpa
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad> <0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org> <4FHFN.165452$taff.97167@fx41.iad> <98fa39483907eb8f0b44a13fa1596981@www.novabbs.org> <2eJFN.88677$zqTf.47871@fx35.iad>
Organization: Rocksolid Light
Message-ID: <08404980d6265588ebd10715e966b1da@www.novabbs.org>
 by: MitchAlsup1 - Tue, 5 Mar 2024 19:28 UTC

Scott Lurndal wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
>>Scott Lurndal wrote:
>>
>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>>Scott Lurndal wrote:
>>
>>>>> If memory access ever becomes as fast as register access,
>>>>> all bets will be off...
>>>>
>>>>It won't, and never has.
>>
>>> There were a number of historic implementations where the
>>> registers were actually stored in low memory.
>>
>>
>>Yes, but that is making registers as slow as memory,
>> not making memory as fast as registers.

> With something like SRAM, with less than 1ns latency, then. With that
> you might completely eliminate registers and do everything
> memory-to-memory. The small gain lost by eliminating
> registers will likely be offset by fewer instructions
> to execute. Sufficient for most desktop users, surely.

Registers are 200ps. While on-die SRAM can cycle at 200ps,
you still have to pay the latency to route address-bits
from AGEN to the SRAM arrays, and route the data back,
while one can access a register and run through the
forwarding logic in the same cycle. So, you are essentially
comparing something that costs ½* cycle to one that costs
1¼ cycles.

(*) maybe ¾-cycle on a GB physical RF.
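
As a quick check on the comparison above, the cycle costs quoted in this post can be turned into ratios. This is a minimal sketch that uses only the ½, ¾ and 1¼ cycle figures from the post; the variable names are illustrative.

```python
from fractions import Fraction

# Cycle costs as quoted in the post.
register_cycles = Fraction(1, 2)   # register read plus forwarding
large_rf_cycles = Fraction(3, 4)   # the footnoted larger physical RF case
sram_cycles = Fraction(5, 4)       # AGEN -> SRAM array -> data routed back

print(sram_cycles / register_cycles)  # prints 5/2
print(sram_cycles / large_rf_cycles)  # prints 5/3
```

On these figures an on-die SRAM operand costs 2.5 times as much as a register operand, or about 1.67 times as much in the footnoted case, even though the array itself cycles as fast as the register file.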

> That's assuming SRAM can scale at reasonable cost due
> to some future technology breakthrough (or some other
> future non-volatile memory technology with favorable
> access timing).

I have lived in FAB environments where I was told "SRAM
will approach DRAM densities* in a generation or two," too.
Still, yet to happen.

(*) They thought they had to keep the DRAM capacitor above
a certain amount of fF to retain the 8ms (-16ms) refresh
rates. What they really had to do was control leakage !

Re: 88xxx or PPC

<20240305223439.00001b1a@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37815&group=comp.arch#37815
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Tue, 5 Mar 2024 22:34:39 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <20240305223439.00001b1a@yahoo.com>
References: <uigus7$1pteb$1@dont-email.me>
<2024Mar3.232237@mips.complang.tuwien.ac.at>
<20240304171457.000067ea@yahoo.com>
<2024Mar4.191835@mips.complang.tuwien.ac.at>
<20240305001833.000027a9@yahoo.com>
<0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>
<us5t1c$36voh$1@dont-email.me>
<df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>
<3hGFN.115182$m4d.77183@fx43.iad>
<0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org>
<4FHFN.165452$taff.97167@fx41.iad>
<98fa39483907eb8f0b44a13fa1596981@www.novabbs.org>
<2eJFN.88677$zqTf.47871@fx35.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="31f8214f19f29f680f4ca2d83e873e49";
logging-data="4151095"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19/JbVJzF74127EsTc1iXjTvvaMj50C4sE="
Cancel-Lock: sha1:1j2DipoCjnN9oirvdS8XRzIBjnE=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Tue, 5 Mar 2024 20:34 UTC

On Tue, 05 Mar 2024 18:10:06 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
> >Scott Lurndal wrote:
> >
> >> mitchalsup@aol.com (MitchAlsup1) writes:
> >>>Scott Lurndal wrote:
> >
> >>>> If memory access ever becomes as fast as register access,
> >>>> all bets will be off...
> >>>
> >>>It won't, and never has.
> >
> >> There were a number of historic implementations where the
> >> registers were actually stored in low memory.
> >
> >
> >Yes, but that is making registers as slow as memory,
> > not making memory as fast as registers.
>
> With something like SRAM, with less than 1ns latency, then. With that
> you might completely eliminate registers and do everything
> memory-to-memory. The small gain lost by eliminating
> registers will likely be offset by fewer instructions
> to execute. Sufficient for most desktop users, surely.
>
> That's assuming SRAM can scale at reasonable cost due
> to some future technology breakthrough

1ns latency means the size of your SRAM array is 256KB at best.
More realistically 128 KB.
The only possible breakthrough I can see on this front is 3D
working much better than anticipated (and better than it works
for 3D NAND flash). But even that can improve capacity by a factor of 200
at best, more realistically by a factor of 100.
So, in the best possible future scenario, given every benefit of the doubt,
a 1ns latency requirement limits the size of your SRAM to ~50 MB.
Make it at least 5 ns instead of 1 and then you can start dreaming.
Still, more for the benefit of your grandchildren than for yourself.
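
The estimate above is simple multiplication, sketched here for reference. The sketch assumes KB means 1024 bytes and MB means 1024 KB; the function name is illustrative, and pairing the "realistic" array size with the "realistic" factor is an assumption of this sketch rather than a figure stated in the post.

```python
KIB = 1024
MIB = 1024 * KIB


def scaled_capacity_mib(base_kib, density_factor):
    """Capacity in MiB of a 1ns-class SRAM array of base_kib KiB,
    multiplied by a hypothetical 3D stacking density factor."""
    return base_kib * KIB * density_factor / MIB


best_case = scaled_capacity_mib(256, 200)   # best array size, best factor
realistic = scaled_capacity_mib(128, 100)   # assumed pairing, see lead-in

print(best_case)  # prints 50.0
print(realistic)  # prints 12.5
```

The best case reproduces the ~50 MB figure in the post. The combined "realistic" case comes out at about a quarter of that.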

> (or some other
> future non-volatile memory technology with favorable
> access timing).

Re: 88xxx or PPC

<20240305224037.000077a4@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=37816&group=comp.arch#37816
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Tue, 5 Mar 2024 22:40:37 +0200
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <20240305224037.000077a4@yahoo.com>
References: <uigus7$1pteb$1@dont-email.me>
<2024Mar3.232237@mips.complang.tuwien.ac.at>
<20240304171457.000067ea@yahoo.com>
<2024Mar4.191835@mips.complang.tuwien.ac.at>
<20240305001833.000027a9@yahoo.com>
<0c2e37386287e8a0303191dc7b989c76@www.novabbs.org>
<us5t1c$36voh$1@dont-email.me>
<df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org>
<3hGFN.115182$m4d.77183@fx43.iad>
<0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org>
<4FHFN.165452$taff.97167@fx41.iad>
<98fa39483907eb8f0b44a13fa1596981@www.novabbs.org>
<2eJFN.88677$zqTf.47871@fx35.iad>
<08404980d6265588ebd10715e966b1da@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="31f8214f19f29f680f4ca2d83e873e49";
logging-data="4151095"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9ERc717Uyh5z3nZt2zhcjxYr/HzFRLx8="
Cancel-Lock: sha1:kW7o615IlLMb384AFc9+8s9i7Zg=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Tue, 5 Mar 2024 20:40 UTC

On Tue, 5 Mar 2024 19:28:01 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
>
> I have lived in FAB environments where I was told "SRAM
> will approach DRAM densities* in a generation or two," too.
> Still, yet to happen.
>
> (*) They thought they had to keep the DRAM capacitor above
> a certain amount of fF to retain the 8ms (-16ms) refresh
> rates. What they really had to do was control leakage !

According to what I read on the RWT forum, in the last 3-4 years the tables
have turned completely - the SRAM-to-DRAM area ratio is growing rather than
shrinking. Slowly, of course; nowadays everything is slow.

