devel / comp.arch / Re: The vector machines of ITH Zurich

Subject  Author
* The vector machines of ITH Zurich  JimBrakefield
+* Re: The vector machines of ITH Zurich  MitchAlsup
|+* Re: The vector machines of ITH Zurich  JimBrakefield
||`* Re: The vector machines of ITH Zurich  mac
|| `- Re: The vector machines of ITH Zurich  Ivan Godard
|`* Re: The vector machines of ITH Zurich  Quadibloc
| +- Re: The vector machines of ITH Zurich  Quadibloc
| +* Re: The vector machines of ITH Zurich  luke.l...@gmail.com
| |`* Re: The vector machines of ITH Zurich  Quadibloc
| | +- Re: The vector machines of ITH Zurich  luke.l...@gmail.com
| | `- Re: The vector machines of ITH Zurich  JimBrakefield
| `* Re: The vector machines of ITH Zurich  MitchAlsup
|  +* Re: The vector machines of ITH Zurich  Quadibloc
|  |`* Re: The vector machines of ITH Zurich  luke.l...@gmail.com
|  | `* Re: The vector machines of ITH Zurich  MitchAlsup
|  |  `- Re: The vector machines of ITH Zurich  Paul A. Clayton
|  `* Re: The vector machines of ITH Zurich  Brett
|   +* Re: The vector machines of ITH Zurich  MitchAlsup
|   |+* Re: The vector machines of ITH Zurich  Terje Mathisen
|   ||+- Re: The vector machines of ITH Zurich  Michael S
|   ||`* Re: The vector machines of ITH Zurich  Anton Ertl
|   || +* Re: The vector machines of ITH Zurich  Michael S
|   || |`* Re: The vector machines of ITH Zurich  Anton Ertl
|   || | +* Re: The vector machines of ITH Zurich  Scott Lurndal
|   || | |`* Re: The vector machines of ITH Zurich  Michael S
|   || | | `* Re: The vector machines of ITH Zurich  MitchAlsup
|   || | |  `- Re: The vector machines of ITH Zurich  Michael S
|   || | `- Re: The vector machines of ITH Zurich  Michael S
|   || `- Re: The vector machines of ITH Zurich  Terje Mathisen
|   |+* Re: The vector machines of ITH Zurich  Stephen Fuld
|   ||`* Re: The vector machines of ITH Zurich  Anton Ertl
|   || +- Re: The vector machines of ITH Zurich  Quadibloc
|   || `- Re: The vector machines of ITH Zurich  Stephen Fuld
|   |`- Re: The vector machines of ITH Zurich  John Dallman
|   `* Re: The vector machines of ITH Zurich  Thomas Koenig
|    `- Re: The vector machines of ITH Zurich  MitchAlsup
`- Re: The vector machines of ITH Zurich  luke.l...@gmail.com

Re: The vector machines of ITH Zurich

<u5vmse$1v86a$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32730&group=comp.arch#32730
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: The vector machines of ITH Zurich
Date: Fri, 9 Jun 2023 10:18:06 -0700
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <u5vmse$1v86a$1@dont-email.me>
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com>
<7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com>
<u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 9 Jun 2023 17:18:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3aaf66da1caa3e6f7643eef5c56c093f";
logging-data="2072778"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ipDERiFSlw61IErxWCk6VDqS2xu5FrlU="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:Ln5sE2Z3TnUwQAoMxbvKxqeaG5s=
Content-Language: en-US
In-Reply-To: <1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com>
 by: Stephen Fuld - Fri, 9 Jun 2023 17:18 UTC

On 6/8/2023 10:00 AM, MitchAlsup wrote:
> On Wednesday, June 7, 2023 at 4:42:36 PM UTC-5, Brett wrote:

snip

>> With the M2 machines Apple has dumped DIMMs and brought RAM onto the CPU
>> carrier, quadrupling the dram bandwidth.
> <
> And makes upgrading DRAM size impossible.

I don't know how much better performance the quadrupling of DRAM
bandwidth provides, but if it is at all significant, I would bet that
the average Mac customer would prefer better performance to DRAM size
upgradeability.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The vector machines of ITH Zurich

<2023Jun10.094351@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=32743&group=comp.arch#32743
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The vector machines of ITH Zurich
Date: Sat, 10 Jun 2023 07:43:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 38
Message-ID: <2023Jun10.094351@mips.complang.tuwien.ac.at>
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com> <0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com> <5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me> <1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com> <u5vmse$1v86a$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="533b55acbe9bb2d55ad8492a0146944a";
logging-data="2374501"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/pqIDKEl4wOAkdKCMWTq2o"
Cancel-Lock: sha1:zVepEiTwCJTVbal/RXQqL3629CI=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sat, 10 Jun 2023 07:43 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>I don't know how much better performance the quadrupling of DRAM
>bandwidth provides

That depends on the application. You can get some idea by looking at
the 4-channel (3990X) vs. 8-channel (3995WX) results on
<https://www.anandtech.com/print/16478/64-cores-of-rendering-madness-the-amd-threadripper-pro-3995wx-review>.

And you can find an explicit 2-channel vs. 8-channel comparison in
<https://www.anandtech.com/print/16482/lenovo-thinkstation-p620-review-a-vehicle-for-threadripper-pro>.
In 25 out of 105 tests, the 8-channel configuration outperformed the
2-channel configuration by more than a factor of 1.03.

And note that the latter results were with 64-core CPUs with 51.2GB/s
or 204.8GB/s memory bandwidth, whereas the M2 Max has 12 (8P+4E) cores
with (according to
<https://www.apple.com/mz/newsroom/2023/06/apple-introduces-m2-ultra/>)
400GB/s memory bandwidth and the M2 Ultra has 24 (16P+8E) cores with
800GB/s memory bandwidth. I expect that, as far as CPU performance
goes, both could do with a quarter of the bandwidth, and very few
applications would see a measurable difference; things may be
different for stuff that uses the GPU. If more CPU performance is
desired and the application is parallelizable, there are good chances
that a Threadripper 5995WX box (more than twice the cores) beats an M2
Ultra box (four times the memory bandwidth).
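
To put the figures above side by side, here is a rough sketch of peak bandwidth per core. The core counts and bandwidth numbers are the ones quoted in this post; the labels, and the choice to treat P and E cores alike and to split bandwidth evenly, are simplifying assumptions made for the illustration (on the M2 parts the bandwidth is also shared with the GPU).

```python
# Peak memory bandwidth per core for the configurations mentioned above.
# (cores, GB/s) pairs are taken from the post; the labels are only names
# chosen for this sketch.
configs = {
    "64-core, 2-channel": (64, 51.2),
    "64-core, 8-channel": (64, 204.8),
    "M2 Max (8P+4E)": (12, 400.0),
    "M2 Ultra (16P+8E)": (24, 800.0),
}


def gb_per_s_per_core(cores, bandwidth_gb_s):
    """Peak bandwidth divided evenly across all cores (a simplification)."""
    return bandwidth_gb_s / cores


per_core = {name: gb_per_s_per_core(c, bw) for name, (c, bw) in configs.items()}

for name, value in per_core.items():
    print(f"{name:20s} {value:6.2f} GB/s per core")
```

Under these assumptions, even a quarter of the M2 bandwidth leaves more peak bandwidth per core than the 8-channel 64-core configuration has.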

>but if it is at all significant, I would bet that
>the average Mac customer would prefer better performance to DRAM size
>upgradeability.

Why? Are Macs slow? I expect that my colleague would have preferred
to be able to upgrade the RAM of his laptop over having more memory
bandwidth.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The vector machines of ITH Zurich

<7fa0d5ee-496b-42f6-88e0-50bc9b03fab2n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32746&group=comp.arch#32746
X-Received: by 2002:a37:6487:0:b0:75b:2611:8b66 with SMTP id y129-20020a376487000000b0075b26118b66mr665139qkb.12.1686403569674;
Sat, 10 Jun 2023 06:26:09 -0700 (PDT)
X-Received: by 2002:a05:6870:a8ad:b0:196:6371:c8fb with SMTP id
eb45-20020a056870a8ad00b001966371c8fbmr1240722oab.11.1686403569410; Sat, 10
Jun 2023 06:26:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Jun 2023 06:26:09 -0700 (PDT)
In-Reply-To: <2023Jun10.094351@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa34:c000:fdfa:4a7d:8e64:2953;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa34:c000:fdfa:4a7d:8e64:2953
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com> <u5vmse$1v86a$1@dont-email.me>
<2023Jun10.094351@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7fa0d5ee-496b-42f6-88e0-50bc9b03fab2n@googlegroups.com>
Subject: Re: The vector machines of ITH Zurich
From: jsavard@ecn.ab.ca (Quadibloc)
Injection-Date: Sat, 10 Jun 2023 13:26:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2109
 by: Quadibloc - Sat, 10 Jun 2023 13:26 UTC

On Saturday, June 10, 2023 at 2:13:29 AM UTC-6, Anton Ertl wrote:
> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:

> >but if it is at all significant, I would bet that
> >the average Mac customer would prefer better performance to DRAM size
> >upgradeability.

> Why? Are Macs slow?

I think it is rather that Macintosh computers, in general, have very limited
or no upgrade options, and thus the average Mac customer is not concerned
too much about upgradeability - those who are, are PC customers instead.

John Savard

Re: The vector machines of ITH Zurich

<memo.20230610151613.5208b@jgd.cix.co.uk>

https://news.novabbs.org/devel/article-flat.php?id=32747&group=comp.arch#32747
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jgd@cix.co.uk (John Dallman)
Newsgroups: comp.arch
Subject: Re: The vector machines of ITH Zurich
Date: Sat, 10 Jun 2023 15:16 +0100 (BST)
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <memo.20230610151613.5208b@jgd.cix.co.uk>
References: <1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com>
Reply-To: jgd@cix.co.uk
Injection-Info: dont-email.me; posting-host="bbbad20b04eb1a91c1c1f51d956ba1b3";
logging-data="2452303"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/1CNo0nKeY1AuUZZc9Zl6Jg9Zfbiy3T1I="
Cancel-Lock: sha1:0mQ2Bxe8BItWTPMyCqj1b9cfMjA=
 by: John Dallman - Sat, 10 Jun 2023 14:16 UTC

In article <1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com>,
MitchAlsup@aol.com (MitchAlsup) wrote:

> On Wednesday, June 7, 2023 at 4:42:36 PM UTC-5, Brett wrote:
> > With the M2 machines Apple has dumped DIMMs and brought RAM onto
> > the CPU carrier, quadrupling the dram bandwidth.
> And makes upgrading DRAM size impossible.

M-series Macs do not have spinning disks, only SSDs, and fast ones at
that. They have quite limited RAM, but they can do demand paging very
quickly. This works pretty well in my limited experience of using them
for software building and testing.

John

Re: The vector machines of ITH Zurich

<u620l4$2aqus$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32748&group=comp.arch#32748
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: The vector machines of ITH Zurich
Date: Sat, 10 Jun 2023 07:17:07 -0700
Organization: A noiseless patient Spider
Lines: 52
Message-ID: <u620l4$2aqus$1@dont-email.me>
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com>
<7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com>
<u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com>
<u5vmse$1v86a$1@dont-email.me> <2023Jun10.094351@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 10 Jun 2023 14:17:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3ffbe55976860e96232b7c7ffe7520ff";
logging-data="2452444"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX182XYB9/u0NvyLdQf/VEi4A5ZjzjldZ5XY="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:TzBIY6QqMTBVZtg64vvg5vr2BLs=
Content-Language: en-US
In-Reply-To: <2023Jun10.094351@mips.complang.tuwien.ac.at>
 by: Stephen Fuld - Sat, 10 Jun 2023 14:17 UTC

On 6/10/2023 12:43 AM, Anton Ertl wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> I don't know how much better performance the quadrupling of DRAM
>> bandwidth provides
>
> That depends on the application. You can get some idea by looking at
> the 4-channel (3990X) vs. 8-channel (3995WX) results on
> <https://www.anandtech.com/print/16478/64-cores-of-rendering-madness-the-amd-threadripper-pro-3995wx-review>.
>
> And you can find an explicit 2-channel vs. 8-channel comparison in
> <https://www.anandtech.com/print/16482/lenovo-thinkstation-p620-review-a-vehicle-for-threadripper-pro>.
> In 25 out of 105 tests, the performance of the 8-channel configuration
> was more than a factor 1.03 compared to the 2-channel configuration.
>
> And note that the latter results were with 64-core CPUs with 51.2 GB/s
> or 204.8GB/s memory bandwidth, whereas the M2 Max has 12 (8P+4E) cores
> with (according to
> <https://www.apple.com/mz/newsroom/2023/06/apple-introduces-m2-ultra/>)
> 400GB/s memory bandwidth and the M2 Ultra has 24 (16P+8E) cores with
> 800GB/s memory bandwidth. I expect that, as far as CPU performance
> goes, both could do with a quarter of the bandwidth, and very few
> applications would see a measurable difference; things may be
> different for stuff that uses the GPU. If more CPU performance is
> desired and the application is parallelizable, there are good chances
> that a Threadripper 5995WX box (more than twice the cores) beats an M2
> Ultra box (four times the memory bandwidth).
>
>> but if it is at all significant, I would bet that
>> the average Mac customer would prefer better performance to DRAM size
>> upgradeability.
>
> Why? Are Macs slow?

I am not saying that, but my understanding is that a big market for Macs
is for graphic design/animation, etc. where memory bandwidth is often a
limiting factor, and faster is always better, and often noticeable.

> I expect that my colleague would have preferred
> to be able to upgrade the RAM of his laptop over having more memory
> bandwidth.

While I don't doubt that, from my understanding, your colleague may not
be representative of the "typical" Mac user. Of course YMMV, or perhaps
more accurately YCMMV. :-)

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The vector machines of ITH Zurich

<u62kj6$2dc9h$3@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=32756&group=comp.arch#32756
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: The vector machines of ITH Zurich
Date: Sat, 10 Jun 2023 15:57:26 -0400
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <u62kj6$2dc9h$3@dont-email.me>
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com>
<7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com>
<8b0f3416-68bd-4c73-bc1c-7f795009b7c8n@googlegroups.com>
<cdcd3a51-734a-4634-99e9-0f2210b45de6n@googlegroups.com>
<2aa5217e-8b63-4ba0-a939-24f11e6e1cabn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 10 Jun 2023 19:57:26 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3c2ccde43b5350e23ec8dde74430ce50";
logging-data="2535729"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+bk/D24nBPu/f2iA167zCIh727ny6DpX8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:GFMlVR7y+Vod0K8lMpwUG80kOuM=
X-Mozilla-News-Host: news://news.eternal-september.org
In-Reply-To: <2aa5217e-8b63-4ba0-a939-24f11e6e1cabn@googlegroups.com>
 by: Paul A. Clayton - Sat, 10 Jun 2023 19:57 UTC

On 6/3/23 12:52 PM, MitchAlsup wrote:
[snip]
> On my GBOoO designs, the CAMs only look at 1 of the
> 6 tag-busses; yes all 6 tag-busses go to every CAM,
> but during INSERT the renamer tells the CAM which FU
> will produce the result, and so there is a 6-way mux from
> the tag-busses to the CAM, and the CAM only looks at a
> single bus.

It seems that for very large schedulers one might be able to use
something like set associativity where some scheduler entries only
listen to certain classes of results. (For smaller schedulers
"conflict misses" would excessively shrink utilization.) One might
include a "fully [or more highly] associative victim cache" to
reduce the impact of bad cases.

This would effectively be a hardwired filter/router.
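
As a rough illustration of the filtering idea, here is a toy model of wakeup in which each scheduler entry listens to one tag bus chosen at insert time, next to a reference model in which every entry compares against every bus. The six buses come from the quoted description; the `Entry` class, the function names, and the comparison counting are hypothetical and only meant to show the difference in comparator activity, not any real design.

```python
from dataclasses import dataclass

NUM_BUSES = 6  # six tag buses, as in the quoted description


@dataclass
class Entry:
    tag: int            # tag of the operand this entry is waiting for
    bus_sel: int        # bus chosen at insert time (the producing FU)
    ready: bool = False


def wakeup_selected(entries, buses):
    """Each waiting entry compares against its selected bus only.

    Returns the number of tag comparisons performed this cycle.
    """
    compares = 0
    for e in entries:
        if e.ready:
            continue
        compares += 1
        if buses[e.bus_sel] == e.tag:
            e.ready = True
    return compares


def wakeup_full(entries, buses):
    """Reference model: each waiting entry compares against every bus."""
    compares = 0
    for e in entries:
        if e.ready:
            continue
        for broadcast_tag in buses:
            compares += 1
            if broadcast_tag == e.tag:
                e.ready = True
    return compares


# Tags broadcast this cycle; None means the bus is idle.
buses = [None, 17, None, 42, None, None]
selected = [Entry(17, 1), Entry(42, 3), Entry(99, 0)]
full = [Entry(17, 1), Entry(42, 3), Entry(99, 0)]
print(wakeup_selected(selected, buses), [e.ready for e in selected])
print(wakeup_full(full, buses), [e.ready for e in full])
```

Both models wake the same entries here, but the selected-bus version does one comparison per waiting entry instead of six. The price is that an entry never sees a matching tag on a bus it was not told to watch, which is why the bus choice has to be correct at insert time.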

(If each dynamic operand had a different scheduler entry and
operations with multiple dynamic operands somehow merged the ready
signals, such might reduce the number of CAMs needed as well as
possibly facilitate N-operand operations without a matrix
scheduler. I suspect such would have severe routing issues, but
perhaps constraints could ease those issues without losing most of
the advantages — if those advantages actually exist. Last arriving
prediction can reduce all operations [except perhaps a few that
are hard to predict] to one operand for scheduling, but adds
prediction/recovery overhead.)

Depending on rename-time FU choice would constrain scheduling.
While this would work well when each functional unit has a
dedicated scheduler, I have read that "unified" schedulers pair
execution ports in a scrambled/overlaid manner to provide the
utilization advantages of a unified scheduler with the routing
advantages of localized schedulers. For such a design, the
specific port would not be known at rename/insertion (though a
choice of only two is still easier than fully general checking).
Even with such a design one could filter by the scheduler unit
rather than the execution port (FU).

(Variable latency also seems to introduce complexity. If a port
has operations that take one cycle or two cycles, two results
could be available from that port at the same time.)
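
The collision can be shown with a few lines. This is only a sketch: the function name and the (issue cycle, latency) representation are made up for the example, and a real port would have more constraints than this.

```python
from collections import defaultdict


def writeback_conflicts(issues):
    """Find cycles in which one port produces more than one result.

    issues: list of (issue_cycle, latency) pairs for a single port,
    with at most one operation issued per cycle.
    Returns {completion_cycle: result_count} for counts above one.
    """
    done = defaultdict(int)
    for issue_cycle, latency in issues:
        done[issue_cycle + latency] += 1
    return {cycle: n for cycle, n in done.items() if n > 1}


# A 2-cycle op issued at cycle 0, then 1-cycle ops at cycles 1 and 2.
conflicts = writeback_conflicts([(0, 2), (1, 1), (2, 1)])
print(conflicts)
```

The 2-cycle operation from cycle 0 and the 1-cycle operation from cycle 1 both finish in cycle 2, so either the port needs a second result path or the scheduler has to avoid issuing that pair back to back.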

(I wonder if one could use scheduler limited associativity with
something like overlaid skewed associativity to increase entry
utilization while providing advantages of limited associativity.
Managing routing/locality concerns seems challenging.)

In general, there is often information available earlier to help
filter decisions and specializations can often reduce resource
requirements without detracting excessively from the utilization
advantages of generality.

Re: The vector machines of ITH Zurich

<87eeca96-b881-4944-bb92-3634817a1656n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32758&group=comp.arch#32758
X-Received: by 2002:a05:622a:1a04:b0:3f7:fab0:6317 with SMTP id f4-20020a05622a1a0400b003f7fab06317mr1700387qtb.10.1686427431324;
Sat, 10 Jun 2023 13:03:51 -0700 (PDT)
X-Received: by 2002:a05:6870:b792:b0:19e:8ab9:8f6e with SMTP id
ed18-20020a056870b79200b0019e8ab98f6emr1634753oab.0.1686427431082; Sat, 10
Jun 2023 13:03:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Jun 2023 13:03:50 -0700 (PDT)
In-Reply-To: <2023Jun9.175655@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:781f:25b8:69ef:610e;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:781f:25b8:69ef:610e
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com> <u5ue1c$1r0s1$1@dont-email.me>
<2023Jun9.124633@mips.complang.tuwien.ac.at> <7bb17310-321a-43ae-8181-5d1c10a30086n@googlegroups.com>
<2023Jun9.175655@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <87eeca96-b881-4944-bb92-3634817a1656n@googlegroups.com>
Subject: Re: The vector machines of ITH Zurich
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Sat, 10 Jun 2023 20:03:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3302
 by: Michael S - Sat, 10 Jun 2023 20:03 UTC

On Friday, June 9, 2023 at 7:13:06 PM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >So, you say, I can upgrade E-2176G based development server to
> >128 GB without losing bandwidth or latency?
> >By now, with 16GB DIMMs, it appears to operate at max rate (2666 MT/s),
> >but I am not sure about latency.
> I am just saying that you can upgrade it to 128GB. However, if you
> now use 4 two-sided 16GB DIMMs,

Yes, that's what we have now.

> I don't expect bandwidth or latency to
> get worse if you change to 4 32GB DIMMs. If you have only 2 DIMMs, or
> 4 one-sided DIMMs, you may see worse numbers, but for a development
> server I don't expect it to translate into a noticeable slowdown.
>

Not software development. FPGA.
FPGA P&R tools appear to be quite sensitive to main DRAM latency, and probably
also to the DRAM bandwidth achievable by a single core, but not to the aggregate
bandwidth of all cores. I.e. quite different from, if not the opposite of, the
compilation of a large SW project.

> We have Xeon E-2388G servers with 128GB here, and what dmidecode tells
> me about one of the DIMMs is:
>
> Speed: 3200 MT/s
> Configured Memory Speed: 2933 MT/s

E-2388G is Rocket Lake. Much newer than our Coffee Lake.
Better cores, but slower L2. Not sure how it affects FPGA tools.
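
For reference, the transfer rates mentioned in this exchange translate into theoretical peak bandwidth roughly as follows. This is a back-of-the-envelope sketch: it assumes 64-bit (8-byte) DDR4 channels and two populated channels on both Xeon E platforms, which is my assumption and not something stated in the thread, and it says nothing about latency.

```python
BYTES_PER_TRANSFER = 8  # a 64-bit DDR4 channel moves 8 bytes per transfer


def peak_gb_per_s(mt_per_s, channels=2):
    """Theoretical peak bandwidth in decimal GB/s.

    channels=2 is an assumption for the Xeon E boxes discussed here.
    """
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


for rate in (2666, 2933, 3200):
    print(f"{rate} MT/s: {peak_gb_per_s(rate):.1f} GB/s")
```

Under these assumptions the step from 2666 MT/s to 2933 MT/s is about 10% in peak bandwidth, which is small next to whatever the core and cache differences contribute.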

> - anton

Thank you.
I hadn't noticed that 32 GB DDR4 UDIMMs had finally materialized.

> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: The vector machines of ITH Zurich

<3823e721-b598-4e54-bda5-b8e9ff07089an@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32760&group=comp.arch#32760
X-Received: by 2002:a05:6214:176f:b0:626:1617:bbdf with SMTP id et15-20020a056214176f00b006261617bbdfmr771496qvb.1.1686428331077;
Sat, 10 Jun 2023 13:18:51 -0700 (PDT)
X-Received: by 2002:a05:6830:1da4:b0:6a8:b659:d46d with SMTP id
z4-20020a0568301da400b006a8b659d46dmr1543244oti.3.1686428330934; Sat, 10 Jun
2023 13:18:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Jun 2023 13:18:50 -0700 (PDT)
In-Reply-To: <OUIgM.11898$fZx2.7495@fx14.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:781f:25b8:69ef:610e;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:781f:25b8:69ef:610e
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com> <u5ue1c$1r0s1$1@dont-email.me>
<2023Jun9.124633@mips.complang.tuwien.ac.at> <7bb17310-321a-43ae-8181-5d1c10a30086n@googlegroups.com>
<2023Jun9.175655@mips.complang.tuwien.ac.at> <OUIgM.11898$fZx2.7495@fx14.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3823e721-b598-4e54-bda5-b8e9ff07089an@googlegroups.com>
Subject: Re: The vector machines of ITH Zurich
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Sat, 10 Jun 2023 20:18:51 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3644
 by: Michael S - Sat, 10 Jun 2023 20:18 UTC

On Friday, June 9, 2023 at 8:00:34 PM UTC+3, Scott Lurndal wrote:
> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >Michael S <already...@yahoo.com> writes:
> >>So, you say, I can upgrade E-2176G based development server to
> >>128 GB without losing bandwidth or latency?
> >>By now, with 16GB DIMMs, it appears to operate at max rate (2666 MT/s),
> >>but I am not sure about latency.
> >
> >I am just saying that you can upgrade it to 128GB. However, if you
> >now use 4 two-sided 16GB DIMMs, I don't expect bandwidth or latency to
> >get worse if you change to 4 32GB DIMMs. If you have only 2 DIMMs, or
> >4 one-sided DIMMs, you may see worse numbers, but for a development
> >server I don't expect it to translate into a noticeable slowdown.
> >
> >We have Xeon E-2388G servers with 128GB here, and what dmidecode tells
> >me about one of the DIMMs is:
> >
> > Speed: 3200 MT/s
> > Configured Memory Speed: 2933 MT/s
> Our development machine has 380GB, DDR4, all at the
> same speed as yours using 48 DIMMs.
>
> $ grep DIMM /tmp/dmi.decode | wc -l
> 48
> model name : Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz

Those servers have much higher DRAM latency than Xeon E.
Also, according to AnandTech benchmarks (from back when Andrei and
Ian still worked there), the bandwidth available to an individual core is at best half
of what is available on Xeon E.
Of course, for big SW compilation all that is insignificant; what matters is mostly
the total bandwidth across all cores and the size of the LLC, both of which are a lot
better than on Xeon E. But for FPGA development, our small inexpensive server
probably beats your big and costly box by a wide margin. The only problem is
that 64 GB is not enough to run two Stratix-10 compilations in parallel.
And sometimes, not very often, we want to do just that.

Re: The vector machines of ITH Zurich

<815ff59d-8473-4ae9-b460-b48325edc6cbn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=32762&group=comp.arch#32762
X-Received: by 2002:ac8:590e:0:b0:3f9:a751:1dac with SMTP id 14-20020ac8590e000000b003f9a7511dacmr1702440qty.9.1686446332750;
Sat, 10 Jun 2023 18:18:52 -0700 (PDT)
X-Received: by 2002:a05:6870:a8b0:b0:195:47bc:1f7f with SMTP id
eb48-20020a056870a8b000b0019547bc1f7fmr1757030oab.3.1686446332499; Sat, 10
Jun 2023 18:18:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 10 Jun 2023 18:18:52 -0700 (PDT)
In-Reply-To: <3823e721-b598-4e54-bda5-b8e9ff07089an@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:f0ab:170:51fc:9213;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:f0ab:170:51fc:9213
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com> <u5ue1c$1r0s1$1@dont-email.me>
<2023Jun9.124633@mips.complang.tuwien.ac.at> <7bb17310-321a-43ae-8181-5d1c10a30086n@googlegroups.com>
<2023Jun9.175655@mips.complang.tuwien.ac.at> <OUIgM.11898$fZx2.7495@fx14.iad> <3823e721-b598-4e54-bda5-b8e9ff07089an@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <815ff59d-8473-4ae9-b460-b48325edc6cbn@googlegroups.com>
Subject: Re: The vector machines of ITH Zurich
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sun, 11 Jun 2023 01:18:52 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 4891
 by: MitchAlsup - Sun, 11 Jun 2023 01:18 UTC

On Saturday, June 10, 2023 at 3:18:52 PM UTC-5, Michael S wrote:
> On Friday, June 9, 2023 at 8:00:34 PM UTC+3, Scott Lurndal wrote:
> > an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> > >Michael S <already...@yahoo.com> writes:
> > >>So, you say, I can upgrade E-2176G based development server to
> > >>128 GB without losing bandwidth or latency?
> > >>By now, with 16GB DIMMs, it appears to operate at max rate (2666 MT/s),
> > >>but I am not sure about latency.
> > >
> > >I am just saying that you can upgrade it to 128GB. However, if you
> > >now use 4 two-sided 16GB DIMMs, I don't expect bandwidth or latency to
> > >get worse if you change to 4 32GB DIMMs. If you have only 2 DIMMs, or
> > >4 one-sided DIMMs, you may see worse numbers, but for a development
> > >server I don't expect it to translate into a noticeable slowdown.
> > >
> > >We have Xeon E-2388G servers with 128GB here, and what dmidecode tells
> > >me about one of the DIMMs is:
> > >
> > > Speed: 3200 MT/s
> > > Configured Memory Speed: 2933 MT/s
> > Our development machine has 380GB, DDR4, all at the
> > same speed as yours using 48 DIMMs.
> >
> > $ grep DIMM /tmp/dmi.decode | wc -l
> > 48
> > model name : Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
<
> Those servers have much higher DRAM latency than Xeon E.
<
Servers benefit from closed page mode, too; whereas application processors
benefit from open page mode DRAM access. Lower aggregate latency favors
servers, lower minimum latency favors application processing.
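
A simplified way to see the trade-off is to compare average access latency under the two row-buffer policies as a function of the row-hit rate. The timing values below are illustrative placeholders, not figures for any real DIMM, and the model ignores queuing, refresh, and the row-empty case.

```python
# Illustrative DRAM timings in nanoseconds (assumed, not from the thread).
T_RP = 14.0   # precharge: close the currently open row
T_RCD = 14.0  # activate: open the requested row
T_CL = 14.0   # column access: read from the open row


def open_page_latency(hit_rate):
    """Average latency when rows are left open after each access.

    A row hit pays only the column access; a miss finds another row
    open and pays precharge + activate + column access.
    """
    hit = T_CL
    conflict = T_RP + T_RCD + T_CL
    return hit_rate * hit + (1.0 - hit_rate) * conflict


def closed_page_latency():
    """Latency when rows are closed after each access.

    The precharge is done early, so every access pays activate + column.
    """
    return T_RCD + T_CL


for h in (0.9, 0.5, 0.1):
    print(f"hit rate {h:.1f}: open {open_page_latency(h):.1f} ns, "
          f"closed {closed_page_latency():.1f} ns")
```

With one thread and good locality the hit rate is high and open page has the lower latency; with many threads interleaving their accesses the hit rate drops and closed page gives the lower and more predictable average. In this model the break-even hit rate is T_RP / (T_RP + T_RCD).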
<
Once you get away from:: "I want this thread to run as fast as possible"
and go to "I want all threads to run in aggregate as fast as possible",
lots of things want to be utilized and configured differently.
<
> Also, according to AnandTech benchmarks (from back when Andrei and
> Ian still worked there), the bandwidth available to an individual core is at best half
> of what is available on Xeon E.
<
SUN 10,000 (64 processor) servers had 3× the latency of the 4 processor
servers, yet could perform database stuff more than 16× faster than the
4 processor servers. {Circa 2000}
<
> Of course, for big SW compilation all that is insignificant; what matters is mostly
> the total bandwidth across all cores and the size of the LLC, both of which are a lot
> better than on Xeon E. But for FPGA development, our small inexpensive server
> probably beats your big and costly box by a wide margin. The only problem is
> that 64 GB is not enough to run two Stratix-10 compilations in parallel.
> And sometimes, not very often, we want to do just that.
<
That is what I am getting at:: at one end you are latency bound, at the other
you are throughput bound. I might note that GPUs often average (AVERAGE)
400 clocks to memory*, but with 10s of thousands of threads available, they
dramatically outperform non-WARP-like ISAs even with horribly slow (by CPU
standards) memory.
<
(*) cache hits included
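
The latency-bound versus throughput-bound point can be put in numbers with Little's law (requests in flight = latency × request rate). A rough sketch: only the 400-clock average comes from the post above; the thread counts and the one-outstanding-request-per-thread model are assumptions for illustration, and the function names are hypothetical.

```python
def requests_in_flight(latency_clocks: float, requests_per_clock: float) -> float:
    """Little's law: concurrency needed to sustain a given request rate."""
    return latency_clocks * requests_per_clock


def sustained_rate(latency_clocks: float, max_in_flight: float) -> float:
    """Upper bound on requests per clock for a given amount of concurrency.

    Ignores every other limit (DRAM bandwidth, queue sizes), so it is a
    ceiling, not a prediction.
    """
    return max_in_flight / latency_clocks


LATENCY = 400  # average clocks to memory, from the post

# Assumed: a CPU-like core with 10 outstanding misses.
print(sustained_rate(LATENCY, 10))
# Assumed: 10,000 GPU threads with one outstanding request each.
print(sustained_rate(LATENCY, 10_000))
# Concurrency needed to keep one request per clock flowing.
print(requests_in_flight(LATENCY, 1.0))
```

Under these assumptions the many-thread machine can keep a thousand times more requests in flight at the same latency, which is why it tolerates slow memory.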

Re: The vector machines of ITH Zurich

<bde659c1-1da5-4ffa-8358-5c0c110e1aben@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32763&group=comp.arch#32763

X-Received: by 2002:a37:483:0:b0:75b:7eea:8157 with SMTP id 125-20020a370483000000b0075b7eea8157mr377216qke.14.1686478768688;
Sun, 11 Jun 2023 03:19:28 -0700 (PDT)
X-Received: by 2002:a05:6870:a893:b0:19f:a809:a8a8 with SMTP id
eb19-20020a056870a89300b0019fa809a8a8mr1386107oab.4.1686478768425; Sun, 11
Jun 2023 03:19:28 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 11 Jun 2023 03:19:28 -0700 (PDT)
In-Reply-To: <815ff59d-8473-4ae9-b460-b48325edc6cbn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me>
<1ca69920-e709-4aac-871e-e406d6372a54n@googlegroups.com> <u5ue1c$1r0s1$1@dont-email.me>
<2023Jun9.124633@mips.complang.tuwien.ac.at> <7bb17310-321a-43ae-8181-5d1c10a30086n@googlegroups.com>
<2023Jun9.175655@mips.complang.tuwien.ac.at> <OUIgM.11898$fZx2.7495@fx14.iad>
<3823e721-b598-4e54-bda5-b8e9ff07089an@googlegroups.com> <815ff59d-8473-4ae9-b460-b48325edc6cbn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bde659c1-1da5-4ffa-8358-5c0c110e1aben@googlegroups.com>
Subject: Re: The vector machines of ITH Zurich
From: already5chosen@yahoo.com (Michael S)
Injection-Date: Sun, 11 Jun 2023 10:19:28 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5005
 by: Michael S - Sun, 11 Jun 2023 10:19 UTC

On Sunday, June 11, 2023 at 4:18:54 AM UTC+3, MitchAlsup wrote:
> On Saturday, June 10, 2023 at 3:18:52 PM UTC-5, Michael S wrote:
> > On Friday, June 9, 2023 at 8:00:34 PM UTC+3, Scott Lurndal wrote:
> > > an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> > > >Michael S <already...@yahoo.com> writes:
> > > >>So, you say, I can upgrade E-2176G based development server to
> > > > >>128 GB without losing bandwidth or latency?
> > > > >>By now, with 16GB DIMMs, it appears to operate at max rate (2666 MT/s),
> > > >>but I am not sure about latency.
> > > >
> > > >I am just saying that you can upgrade it to 128GB. However, if you
> > > >now use 4 two-sided 16GB DIMMs, I don't expect bandwidth or latency to
> > > >get worse if you change to 4 32GB DIMMs. If you have only 2 DIMMs, or
> > > >4 one-sided DIMMs, you may see worse numbers, but for a development
> > > > >server I don't expect it to translate into a noticeable slowdown.
> > > >
> > > >We have Xeon E-2388G servers with 128GB here, and what dmidecode tells
> > > >me about one of the DIMMs is:
> > > >
> > > > Speed: 3200 MT/s
> > > > Configured Memory Speed: 2933 MT/s
> > > Our development machine has 380GB, DDR4, all at the
> > > same speed as yours using 48 DIMMs.
> > >
> > > $ grep DIMM /tmp/dmi.decode | wc -l
> > > 48
> > > > model name : Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
> <
> > Those servers have much higher DRAM latency than Xeon E.
> <
> Servers benefit from closed page mode, too; whereas application processors
> benefit from open page mode DRAM access. Lower aggregate latency favors
> servers, lower minimum latency favors application processing.

I am not so sure that it is as true for 3 y.o. Xeon 6246R as it was for 20 y.o.
DDR1-based Opteron. I mean, not sure that closed page mode is still beneficial.

DDR4 has 16 banks vs 2 in DDR1. The number of R-DIMMs per channel is a little
lower than in olden times (3 vs 4?), but today each server-grade DIMM has at least
2 ranks and sometimes 4, while back then it was 1 or sometimes 2. So the average
number of ranks per channel increased, too. And there are 6 logical channels instead of
one (it seems to me those Opterons were running a pair of DIMMs in lockstep,
as a single logical channel, but maybe I am confusing them with contemporary
Xeons?).
On top of that, queues in the memory controllers are probably at least 8 times
deeper, and likely more, so the controller has much better visibility into
what is coming in the near future. All that makes a "smart" open page policy more
attractive even in the most serverish of server scenarios. Especially so for the likes of
the 6246R, which has a modest number of cores (16).

When I say "smart" open page policy I mean that the page is closed either when the MC
sees that the next access to the bank in question is directed to a different row, or when
the request queue holds no accesses to this bank and the time to the next refresh is below
a judiciously chosen threshold. Otherwise the page is kept open.
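
The decision rule described above could be sketched roughly as follows. This is an illustration only, not any real memory controller's logic: the `BankState` type, the queue layout as `(bank_id, row)` pairs in arrival order, and the threshold parameter are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class BankState:
    open_row: Optional[int]  # row currently open in this bank, None if precharged


def should_close_page(
    bank_id: int,
    bank: BankState,
    queue: Sequence[Tuple[int, int]],  # pending (bank_id, row) requests, oldest first
    cycles_to_refresh: int,
    refresh_threshold: int,            # hypothetical tuning knob
) -> bool:
    """Return True if the open row in this bank should be closed now."""
    if bank.open_row is None:
        return False  # nothing is open, nothing to close

    pending_rows = [row for b, row in queue if b == bank_id]
    if pending_rows:
        # Case 1: the next queued access to this bank targets a different row.
        return pending_rows[0] != bank.open_row

    # Case 2: no queued accesses to this bank and a refresh is coming soon.
    return cycles_to_refresh < refresh_threshold


bank0 = BankState(open_row=5)
print(should_close_page(0, bank0, [(0, 5)], 1000, 50))  # same row queued
print(should_close_page(0, bank0, [(0, 7)], 1000, 50))  # different row queued
print(should_close_page(0, bank0, [(1, 7)], 10, 50))    # idle bank, refresh near
```

The deeper the request queue, the more often case 1 can be decided from real information instead of a guess, which is the argument made above for open page policies on modern controllers.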

Re: The vector machines of ITH Zurich

<u6511a$2t5q3$1@newsreader4.netcologne.de>


https://news.novabbs.org/devel/article-flat.php?id=32768&group=comp.arch#32768

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-2d6e-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: The vector machines of ITH Zurich
Date: Sun, 11 Jun 2023 17:42:02 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u6511a$2t5q3$1@newsreader4.netcologne.de>
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com>
<7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com>
<u5qtk8$185op$1@dont-email.me>
Injection-Date: Sun, 11 Jun 2023 17:42:02 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-2d6e-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:2d6e:0:7285:c2ff:fe6c:992d";
logging-data="3053379"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Sun, 11 Jun 2023 17:42 UTC

Brett <ggtgp@yahoo.com> schrieb:

> With the M2 machines Apple has dumped DIMMs and brought RAM onto the CPU
> carrier, quadrupling the dram bandwidth.

I would _love_ to see standard OpenFOAM benchmarks on these machines.

CFD is traditionally memory-bound; twice the memory bandwidth
translates to roughly twice the performance.

Even at the inflated prices of Apple hardware, having n/4 Apple
M2 machines in a cluster could actually be more economical than
having n x86 machines.

Something similar for POWER10, but I assume that the prices are
even more astronomical there.

Re: The vector machines of ITH Zurich

<080e3879-3c90-4b40-a140-7ca37a8000bdn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=32774&group=comp.arch#32774

X-Received: by 2002:ac8:5846:0:b0:3f8:30a6:983a with SMTP id h6-20020ac85846000000b003f830a6983amr2457812qth.3.1686524026687;
Sun, 11 Jun 2023 15:53:46 -0700 (PDT)
X-Received: by 2002:a9d:4d84:0:b0:6b1:58cf:75ac with SMTP id
u4-20020a9d4d84000000b006b158cf75acmr3154371otk.1.1686524026463; Sun, 11 Jun
2023 15:53:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 11 Jun 2023 15:53:46 -0700 (PDT)
In-Reply-To: <u6511a$2t5q3$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:4c76:5eca:db1b:54ff;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:4c76:5eca:db1b:54ff
References: <386698fb-72ba-435e-b99b-e251e29bb773n@googlegroups.com>
<0b405880-1e5f-44eb-a2cb-a0f8099c255bn@googlegroups.com> <7ed26efe-3be6-4cf0-a097-16d98a8f37dcn@googlegroups.com>
<5563d673-57fb-441c-a7f1-81878a0bfee6n@googlegroups.com> <u5qtk8$185op$1@dont-email.me>
<u6511a$2t5q3$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <080e3879-3c90-4b40-a140-7ca37a8000bdn@googlegroups.com>
Subject: Re: The vector machines of ITH Zurich
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Sun, 11 Jun 2023 22:53:46 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2631
 by: MitchAlsup - Sun, 11 Jun 2023 22:53 UTC

On Sunday, June 11, 2023 at 12:42:06 PM UTC-5, Thomas Koenig wrote:
> Brett <gg...@yahoo.com> schrieb:
> > With the M2 machines Apple has dumped DIMMs and brought RAM onto the CPU
> > carrier, quadrupling the dram bandwidth.
> I would _love_ to see standard OpenFOAM benchmarks on these machines.
>
> CFD is traditionally memory-bound; twice the memory bandwidth
> translates to roughly twice the performance.
<
True for 3D CFD not so true for 1D CFD.
<
When you have to touch 1M nodes of 64 bytes each on every iteration step,
memory bandwidth becomes the entire game.
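
As a back-of-the-envelope check of the arithmetic above: 1M nodes of 64 bytes is about 64 MB of traffic per sweep if every node is read once. The sketch below turns that into a lower bound on time per step. The two bandwidth figures are assumed round numbers chosen only to show a 4× ratio, not measurements of any machine, and `min_step_time_s` is a hypothetical helper.

```python
def min_step_time_s(
    nodes: int,
    bytes_per_node: int,
    bandwidth_bytes_per_s: float,
    passes: int = 1,  # times each node crosses the memory bus per step
) -> float:
    """Lower bound on time per iteration when the working set misses cache."""
    return nodes * bytes_per_node * passes / bandwidth_bytes_per_s


NODES = 1_000_000        # "1M nodes", taken as 10**6
BYTES_PER_NODE = 64

for label, bw in [("assumed 50 GB/s", 50e9), ("assumed 200 GB/s", 200e9)]:
    t = min_step_time_s(NODES, BYTES_PER_NODE, bw)
    print(f"{label}: at least {t * 1e3:.2f} ms per step")
```

With no reuse across steps the bound scales inversely with bandwidth, so under these assumptions quadrupling the bandwidth cuts the floor on step time to a quarter regardless of how fast the cores are.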
<
Also FFT:: no matter how you break the thing up, at some point in the
calculation, every complex number gets multiplied by every other
complex number in the data set undergoing transformation. Nothing
caches can do for you here (except become as big as the data set....)
>
> Even at the inflated prices of Apple hardware, having n/4 Apple
> M2 machines in a cluster could actually be more economical than
> having n x86 machines.
>
> Something similar for POWER10, but I assume that the prices are
> even more astronomical there.
