devel / comp.arch / L3 cache allocation on Intel

Subject                                      Author
* L3 cache allocation on Intel               Anton Ertl
+* Re: L3 cache allocation on Intel          Scott Lurndal... 

L3 cache allocation on Intel

<2023Oct18.190923@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34576&group=comp.arch#34576

 by: Anton Ertl - Wed, 18 Oct 2023 17:09 UTC

A student asked me a question that essentially reduces to: How does
Intel allocate/replace cache lines in the L3? The L3 consists of a
number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
is one slice per Raptor Cove core and one per 4 Gracemont cores,
resulting in, e.g., 12 slices for a 13900.

Now if each core allocated only in its own slice, many programs would
see an effect of only 3MB of L3 cache, but Intel processors are well
documented to access all the cache even from one thread running on
only one core.

Using some pseudo-LRU replacement across the whole L3 also seems
unlikely to me (too slow and too power-expensive).

The way I would do it if I don't want to be limited to a slice is to
select some slice (maybe with a per-core round-robin scheme), and
allocate the line there (with pseudo-LRU within the slice).
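
A minimal sketch of that round-robin idea (an illustration of the paragraph
above, not a description of Intel's hardware; the slice count is the 13900
example, the core count is illustrative):

  /* per-core round-robin choice of the L3 slice to allocate into */
  #define NUM_CORES  24
  #define NUM_SLICES 12

  static unsigned next_slice[NUM_CORES];       /* one pointer per core */

  static unsigned alloc_slice(unsigned core)
  {
      unsigned s = next_slice[core];
      next_slice[core] = (s + 1) % NUM_SLICES; /* advance for the next fill */
      return s;  /* replacement within the chosen slice stays pseudo-LRU */
  }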

How does Intel do it?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: L3 cache allocation on Intel

<SCVXM.40044$MJ59.16148@fx10.iad>

https://news.novabbs.org/devel/article-flat.php?id=34580&group=comp.arch#34580

 by: EricP - Wed, 18 Oct 2023 18:38 UTC

Anton Ertl wrote:
> A student asked me a question that essentially reduces to: How does
> Intel allocate/replace cache lines in the L3? The L3 consists of a
> number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
> is one slice per Raptor Cove core and one per 4 Gracemont cores,
> resulting in, e.g., 12 slices for a 13900.
>
> Now if each core allocated only in its own slice, many programs would
> see an effect of only 3MB of L3 cache, but Intel processors are well
> documented to access all the cache even from one thread running on
> only one core.
>
> Using some pseudo-LRU replacement across the whole L3 also seems
> unlikely to me (too slow and too power-expensive).
>
> The way I would do it if I don't want to be limited to a slice is to
> select some slice (maybe with a per-core round-robin scheme), and
> allocate the line there (with pseudo-LRU within the slice).
>
> How does Intel do it?
>
> - anton

This recent article talks some about Intel L3 and its connections:

https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/

Re: L3 cache allocation on Intel

<bTVXM.87760$8fO.38095@fx15.iad>

https://news.novabbs.org/devel/article-flat.php?id=34581&group=comp.arch#34581

 by: EricP - Wed, 18 Oct 2023 18:55 UTC

EricP wrote:
> Anton Ertl wrote:
>> A student asked me a question that essentially reduces to: How does
>> Intel allocate/replace cache lines in the L3? The L3 consists of a
>> number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
>> is one slice per Raptor Cove core and one per 4 Gracemont cores,
>> resulting in, e.g., 12 slices for a 13900.
>>
>> Now if each core allocated only in its own slice, many programs would
>> see an effect of only 3MB of L3 cache, but Intel processors are well
>> documented to access all the cache even from one thread running on
>> only one core.
>>
>> Using some pseudo-LRU replacement across the whole L3 also seems
>> unlikely to me (too slow and too power-expensive).
>>
>> The way I would do it if I don't want to be limited to a slice is to
>> select some slice (maybe with a per-core round-robin scheme), and
>> allocate the line there (with pseudo-LRU within the slice).
>>
>> How does Intel do it?
>>
>> - anton
>
> This recent article talks some about Intel L3 and its connections:
>
> https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/

This also talks about L3 slices

Sapphire Rapids: Golden Cove Hits Servers
https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rapids/

Re: L3 cache allocation on Intel

<vqWXM.40046$MJ59.2768@fx10.iad>

https://news.novabbs.org/devel/article-flat.php?id=34584&group=comp.arch#34584

 by: Scott Lurndal - Wed, 18 Oct 2023 19:33 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>A student asked me a question that essentially reduces to: How does
>Intel allocate/replace cache lines in the L3? The L3 consists of a
>number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
>is one slice per Raptor Cove core and one per 4 Gracemont cores,
>resulting in, e.g., 12 slices for a 13900.
>
>Now if each core allocated only in its own slice, many programs would
>see an effect of only 3MB of L3 cache, but Intel processors are well
>documented to access all the cache even from one thread running on
>only one core.
>
>Using some pseudo-LRU replacement across the whole L3 also seems
>unlikely to me (too slow and too power-expensive).
>
>The way I would do it if I don't want to be limited to a slice is to
>select some slice (maybe with a per-core round-robin scheme), and
>allocate the line there (with pseudo-LRU within the slice).
>
>How does Intel do it?

They probably hash the linear/physical address to select which
L3 island to route to.
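
One simple shape such a hash could take, for a power-of-2 slice count, is to
XOR-fold the line-address bits down to log2(slices) bits (a hedged sketch of
the general idea, not Intel's actual function; the reverse-engineering papers
cited later in the thread reconstruct the real bit combinations):

  #include <stdint.h>

  /* fold the physical line address down to a slice index; illustrative only */
  static unsigned slice_hash(uint64_t pa, unsigned log2_slices)
  {
      uint64_t line = pa >> 6;                 /* drop the 64-byte line offset */
      unsigned h = 0;
      while (line != 0) {
          h ^= (unsigned)(line & ((1u << log2_slices) - 1));
          line >>= log2_slices;
      }
      return h;                                /* in [0, 2^log2_slices) */
  }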

Re: L3 cache allocation on Intel

<67d2e59f-00d6-48aa-886c-90b58e8085d1n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34585&group=comp.arch#34585

 by: MitchAlsup - Wed, 18 Oct 2023 23:05 UTC

On Wednesday, October 18, 2023 at 2:33:52 PM UTC-5, Scott Lurndal wrote:
> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >A student asked me a question that essentially reduces to: How does
> >Intel allocate/replace cache lines in the L3? The L3 consists of a
> >number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
> >is one slice per Raptor Cove core and one per 4 Gracemont cores,
> >resulting in, e.g., 12 slices for a 13900.
> >
> >Now if each core allocated only in its own slice, many programs would
> >see an effect of only 3MB of L3 cache, but Intel processors are well
> >documented to access all the cache even from one thread running on
> >only one core.
> >
> >Using some pseudo-LRU replacement across the whole L3 also seems
> >unlikely to me (too slow and too power-expensive).
> >
> >The way I would do it if I don't want to be limited to a slice is to
> >select some slice (maybe with a per-core round-robin scheme), and
> >allocate the line there (with pseudo-LRU within the slice).
> >
> >How does Intel do it?
> The probably hash the linear/physical address to select which
> L3 island to route to.
<
This recent article talks some about Intel L3 and its connections:
<
https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/
<
One of the drawings shows that all the L3s are on a ring bus.

Re: L3 cache allocation on Intel

<ugqi0c$6he1$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34587&group=comp.arch#34587

 by: Terje Mathisen - Thu, 19 Oct 2023 06:23 UTC

MitchAlsup wrote:
> On Wednesday, October 18, 2023 at 2:33:52 PM UTC-5, Scott Lurndal wrote:
>> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>> A student asked me a question that essentially reduces to: How does
>>> Intel allocate/replace cache lines in the L3? The L3 consists of a
>>> number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
>>> is one slice per Raptor Cove core and one per 4 Gracemont cores,
>>> resulting in, e.g., 12 slices for a 13900.
>>>
>>> Now if each core allocated only in its own slice, many programs would
>>> see an effect of only 3MB of L3 cache, but Intel processors are well
>>> documented to access all the cache even from one thread running on
>>> only one core.
>>>
>>> Using some pseudo-LRU replacement across the whole L3 also seems
>>> unlikely to me (too slow and too power-expensive).
>>>
>>> The way I would do it if I don't want to be limited to a slice is to
>>> select some slice (maybe with a per-core round-robin scheme), and
>>> allocate the line there (with pseudo-LRU within the slice).
>>>
>>> How does Intel do it?
>> The probably hash the linear/physical address to select which
>> L3 island to route to.
> <
> This recent article talks some about Intel L3 and its connections:
> <
> https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/
> <
> One of the drawings shows that all the L3s are on a ring bus.

I _think_ that Intel introduced the dual counter-rotating ring bus
structure for Larrabee, to allow 60+ cores to communicate efficiently
(low latency & power)?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: L3 cache allocation on Intel

<5e8c10fc-d947-4587-8fe3-b110877d663an@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34590&group=comp.arch#34590

 by: MitchAlsup - Thu, 19 Oct 2023 18:09 UTC

On Thursday, October 19, 2023 at 1:23:13 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Wednesday, October 18, 2023 at 2:33:52 PM UTC-5, Scott Lurndal wrote:
> >> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >>> A student asked me a question that essentially reduces to: How does
> >>> Intel allocate/replace cache lines in the L3? The L3 consists of a
> >>> number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
> >>> is one slice per Raptor Cove core and one per 4 Gracemont cores,
> >>> resulting in, e.g., 12 slices for a 13900.
> >>>
> >>> Now if each core allocated only in its own slice, many programs would
> >>> see an effect of only 3MB of L3 cache, but Intel processors are well
> >>> documented to access all the cache even from one thread running on
> >>> only one core.
> >>>
> >>> Using some pseudo-LRU replacement across the whole L3 also seems
> >>> unlikely to me (too slow and too power-expensive).
> >>>
> >>> The way I would do it if I don't want to be limited to a slice is to
> >>> select some slice (maybe with a per-core round-robin scheme), and
> >>> allocate the line there (with pseudo-LRU within the slice).
> >>>
> >>> How does Intel do it?
> >> The probably hash the linear/physical address to select which
> >> L3 island to route to.
> > <
> > This recent article talks some about Intel L3 and its connections:
> > <
> > https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/
> > <
> > One of the drawings shows that all the L3s are on a ring bus.
> I _think_ that Intel introduced the dual counter-rotating ring bus
> structure for Larrabee, to allow 60+ cores to communicate efficiently
> (low latency & power)?
<
Larry-the-bee certainly had a ring bus.
>
> Terje
>
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: L3 cache allocation on Intel

<2023Oct19.211807@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34591&group=comp.arch#34591

 by: Anton Ertl - Thu, 19 Oct 2023 19:18 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>EricP wrote:
>> This recent article talks some about Intel L3 and its connections:
>>
>> https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/
>
>This also talks about L3 slices
>
>Sapphire Rapids: Golden Cove Hits Servers
>https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rapids/

Good stuff, yes, and I looked at it again now, but if it says anything
about L3 replacement policies, I missed it again.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: L3 cache allocation on Intel

<2023Oct19.212155@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34592&group=comp.arch#34592

 by: Anton Ertl - Thu, 19 Oct 2023 19:21 UTC

scott@slp53.sl.home (Scott Lurndal) writes:
>The probably hash the linear/physical address to select which
>L3 island to route to.

The number of slices (islands) depends on the enabled cores (not sure
if that is also true of BIOS-enabled cores, but it certainly is true
of cores enabled/disabled after testing and binning), so such a hash
function would be more complex (and slower) than I would be
comfortable with.

An advantage of such an approach would be that only one slice needs to
make the actual lookup on a later access, which saves power and
increases total bandwidth.

So overall, this looks more plausible to me than my round-robin
concept.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: L3 cache allocation on Intel

<ugt4bf$t978$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34593&group=comp.arch#34593

 by: Terje Mathisen - Fri, 20 Oct 2023 05:48 UTC

Anton Ertl wrote:
> scott@slp53.sl.home (Scott Lurndal) writes:
>> The probably hash the linear/physical address to select which
>> L3 island to route to.
>
> The number of slices (islands) depends on the enabled cores (not sure
> if that is also true of BIOS-enabled cores, but it certainly is true
> of cores enabled/disabled after testing and binning), so such a hash
> function would be more complex (and slower) than I would be
> comfortable with.
>
> An advantage of such an approach would be that only one slice needs to
> make the actual lookup on a later access, which saves power and
> increases total bandwidth.
>
> So overall, this looks more plausible to me than my round-robin
> concept.

If I had to implement a modulo operator for sharding that is fixed for a
given CPU but variable between bins, then it could be feasible to do this
via a reciprocal MUL selected during the binning process.

MUL is expensive, yeah, but L3 accesses are about 5 X more
power-intensive than a double precision FMAC, so a custom, slightly
reduced unsigned MUL must be even cheaper, right?
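
A sketch of how that reciprocal MUL could compute the shard index (the
constants follow the Granlund/Montgomery invariant-division recipe; the
parameter names and bit widths are illustrative, not a known implementation):

  #include <stdint.h>

  #define N_BITS 17u   /* width of the address field being sharded (example) */

  /* x mod s via one multiply, a shift and a multiply-subtract; s and the
     magic constant m are fixed per part, so hardware could hardwire them.
     Exact for x < 2^N_BITS and s <= 2^l. */
  static uint32_t slice_mod(uint32_t x, uint32_t s, uint32_t l /* ceil(log2 s) */)
  {
      uint64_t m = (((uint64_t)1 << (N_BITS + l)) / s) + 1;      /* precomputed */
      uint32_t q = (uint32_t)(((uint64_t)x * m) >> (N_BITS + l)); /* q = x / s */
      return x - q * s;                                           /* x mod s */
  }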

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: L3 cache allocation on Intel

<2023Oct20.103348@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34594&group=comp.arch#34594

 by: Anton Ertl - Fri, 20 Oct 2023 08:33 UTC

Terje Mathisen <terje.mathisen@tmsw.no> writes:
>If I had to implement a fixed for the CPU but variable between bins
>modulo operator for sharding, then it could be feasible to do this via a
>reciprocal MUL selected during the binning process.

Modulo would require division (reciprocal multiplication) followed by
a multiply and a subtraction.

The following is more effective: Preliminaries: We have n sets in an
L3 slice and s slices, with t=ceil(ld(s)).

To compute the slice and set, take n+t bits from the physical address.
Multiply this number by s, giving x. The target slice then is
x/2^(n+t) (which is in the range [0,s)). The target set is bits
n+t-1..t of x.

This requires an (n+t)*t-bit multiplier with an n+2t-bit result, and
for, e.g., Sapphire Rapids, t=6 and n=11 (assuming that each way has
128KB; with smaller ways and more associativity, n would be smaller; n
cannot be larger, because 128KB is the largest power-of-2 that divides
the size of an L3 slice). This multiplier size (17*6->23 bits) seems
downright cheap compared to the integer (64*64->128) or FP
multipliers, and, as you point out, also cheap compared to the rest of
the L3 access (both in latency and in energy).
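
As a concreteness check, the scheme above in C with the example numbers
(n=11, t=6; the enabled-slice count s=56 is an assumed value):

  #include <stdint.h>

  #define N_SET_BITS 11   /* n: log2(sets per slice), from the example above */
  #define T_BITS      6   /* t = ceil(ld(s)) */
  #define NUM_SLICES 56   /* s: assumed count of enabled slices */

  /* map a physical address to (slice, set) as described above */
  static void map_address(uint64_t pa, unsigned *slice, unsigned *set)
  {
      uint32_t a = (uint32_t)(pa >> 6) & ((1u << (N_SET_BITS + T_BITS)) - 1);
      uint32_t x = a * NUM_SLICES;            /* (n+t) x t -> n+2t bit product */
      *slice = x >> (N_SET_BITS + T_BITS);    /* in [0, s) */
      *set   = (x >> T_BITS) & ((1u << N_SET_BITS) - 1);  /* bits n+t-1..t */
  }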

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: L3 cache allocation on Intel

<914e69a3-d66c-4ddd-a136-d54a6742bc3en@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34595&group=comp.arch#34595

 by: Michael S - Fri, 20 Oct 2023 11:44 UTC

On Friday, October 20, 2023 at 12:09:36 PM UTC+3, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
> >If I had to implement a fixed for the CPU but variable between bins
> >modulo operator for sharding, then it could be feasible to do this via a
> >reciprocal MUL selected during the binning process.
> Modulo would require division (reciprocal multiplication) followed by
> a multiply and a subtraction.
>
> The following is more effective: Preliminaries: We have n sets in an
> L3 slice and s slices, with t=ceil(ld(s)).
>
> To compute the slice and set, n+t bits from the physical address.
> Multiply this number with s, giving x. The target slice then is
> x/2^(n+t) (which is in the range [0,n)). The target set is the bits
> n+t-1..t of x.
>
> This requires an (n+t)*t-bit multiplier with an n+2t-bit result, and
> for, e.g., Sapphire Rapids, t=6 and n=11 (assuming that each way has
> 128KB; with smaller ways and more associativity, n would be smaller; n
> cannot be larger, because 128KB is the largest power-of-2 that divides
> the size of an L3 slice). This multiplier size (17*6->23 bits) seems
> downright cheap compared to the integer (64*64->128) or FP
> multipliers, and, as you point out, also cheap compared to the rest of
> the L3 access (both in latency and in energy).
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

When you say "physical address" do you mean PA[16:6] a.k.a. Line Index in
classic Nehalem-to-SkylakeClient Intel LLC? Or something else?

Re: L3 cache allocation on Intel

<4owYM.9183$igw9.27@fx37.iad>

https://news.novabbs.org/devel/article-flat.php?id=34596&group=comp.arch#34596

 by: Scott Lurndal - Fri, 20 Oct 2023 14:45 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>scott@slp53.sl.home (Scott Lurndal) writes:
>>They probably hash the linear/physical address to select which
>>L3 island to route to.
>
>The number of slices (islands) depends on the enabled cores (not sure
>if that is also true of BIOS-enabled cores, but it certainly is true
>of cores enabled/disabled after testing and binning), so such a hash
>function would be more complex (and slower) than I would be
>comfortable with.

For binning, yes, the hash function needs to accommodate fused out islands/slices.
So does the core identifier (core number) space, such that the logical
core number is dense with no gaps.

If the hardware allows "renaming" of the remaining islands such that
they have a monotonically increasing identifier space, then the hash
function will automatically accommodate the reduced island count without
needing to handle missing islands.

Re: L3 cache allocation on Intel

<2023Oct20.171424@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34597&group=comp.arch#34597

 by: Anton Ertl - Fri, 20 Oct 2023 15:14 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Friday, October 20, 2023 at 12:09:36=E2=80=AFPM UTC+3, Anton Ertl wrote:
>> Preliminaries: We have n sets in an
>> L3 slice and s slices, with t=ceil(ld(s)).
>>
>> To compute the slice and set, n+t bits from the physical address.
>> Multiply this number with s, giving x. The target slice then is
>> x/2^(n+t) (which is in the range [0,n)). The target set is the bits
>> n+t-1..t of x.
>>
>> This requires an (n+t)*t-bit multiplier with an n+2t-bit result, and
>> for, e.g., Sapphire Rapids, t=6 and n=11 (assuming that each way has
>> 128KB; with smaller ways and more associativity, n would be smaller; n
>> cannot be larger, because 128KB is the largest power-of-2 that divides
>> the size of an L3 slice). This multiplier size (17*6->23 bits) seems
>> downright cheap compared to the integer (64*64->128) or FP
>> multipliers, and, as you point out, also cheap compared to the rest of
>> the L3 access (both in latency and in energy).
>> - anton
>> --
>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
>
>When you say "physical address" do you mean PA[16:6] a.k.a. Line Index in
>classic Nehalem-to-SkylakeClient Intel LLC? Or something else?

I wrote "n+t bits from the physical address", which would be 17 bits
in the example above. So yes, maybe PA[22:6].

One disadvantage of this scheme AFAICS is that we need to store
essentially the whole physical address (apart from the bottom 6 bits)
as tag, whereas in a normal un"hashed" cache access scheme you do not
need to store the n bits that identify the set.

I can think of ways of varying the scheme to use the physical address
rather than the "hashed" address for accessing the set, and thus
reduce the tags by n bits (per cache line) and also reduces the
multiplier to a t*t->2t bit multiplier. But I am not sure if this
results in a significantly skewed distribution, which would be worse
than expending a few more hardware resources.
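
To put rough numbers on that tag overhead (assuming a 46-bit physical
address, which is an assumption, and the n=11 example above):

  #include <stdio.h>

  int main(void)
  {
      int pa_bits = 46, offset_bits = 6, n = 11;
      printf("conventional set-indexed tag: %d bits\n",
             pa_bits - offset_bits - n);                   /* 29 */
      printf("tag with the hashed set index: %d bits\n",
             pa_bits - offset_bits);                       /* 40 */
      return 0;
  }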

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: L3 cache allocation on Intel

<IQxYM.59453$sxoa.57335@fx13.iad>

https://news.novabbs.org/devel/article-flat.php?id=34598&group=comp.arch#34598

 by: Scott Lurndal - Fri, 20 Oct 2023 16:24 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>Michael S <already5chosen@yahoo.com> writes:
>>On Friday, October 20, 2023 at 12:09:36=E2=80=AFPM UTC+3, Anton Ertl wrote:
>>> Preliminaries: We have n sets in an
>>> L3 slice and s slices, with t=ceil(ld(s)).
>>>
>>> To compute the slice and set, n+t bits from the physical address.
>>> Multiply this number with s, giving x. The target slice then is
>>> x/2^(n+t) (which is in the range [0,n)). The target set is the bits
>>> n+t-1..t of x.
>>>
>>> This requires an (n+t)*t-bit multiplier with an n+2t-bit result, and
>>> for, e.g., Sapphire Rapids, t=6 and n=11 (assuming that each way has
>>> 128KB; with smaller ways and more associativity, n would be smaller; n
>>> cannot be larger, because 128KB is the largest power-of-2 that divides
>>> the size of an L3 slice). This multiplier size (17*6->23 bits) seems
>>> downright cheap compared to the integer (64*64->128) or FP
>>> multipliers, and, as you point out, also cheap compared to the rest of
>>> the L3 access (both in latency and in energy).
>>> - anton
>>> --
>>> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>>> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
>>
>>When you say "physical address" do you mean PA[16:6] a.k.a. Line Index in
>>classic Nehalem-to-SkylakeClient Intel LLC? Or something else?
>
>I wrote "n+t bits from the physical address", which would be 17 bits
>in the example above. So yes, maybe PA[22:6].

To quote one of my colleagues:

"It is well understood that the bits of a key (such as an address)
should not be used as-is to index into a cache or similar table".

One needs to give some thought to how the islands/slices themselves
are identified for routing purposes. For rings, each island has a
unique address (a small integer of log2(island-count) bits). For a mesh, each
island has an X and Y coordinate.

To support binning, there needs to be a function to map the output
of the address hash function into a destination address (ring or mesh);
which could be a simple hardware lookup table (flops or ram) programmed
by firmware or on-board microcontroller early in the boot process
before caching is enabled on the CPU.
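
A sketch of that remap step (names, table size and the mesh-coordinate
encoding are illustrative assumptions):

  #include <stdint.h>

  #define MAX_SLICES 64

  struct dest { uint8_t x, y; };               /* mesh coordinates (or ring stop) */

  static struct dest slice_route[MAX_SLICES];  /* programmed by firmware at boot */

  /* boot-time: skip fused-off slices so the logical slice IDs stay dense */
  static void program_routes(const struct dest *phys, const uint8_t *fused,
                             unsigned total)
  {
      unsigned logical = 0;
      for (unsigned i = 0; i < total; i++)
          if (!fused[i])
              slice_route[logical++] = phys[i];
  }

  /* per-access: hash output (logical slice number) -> physical destination */
  static struct dest route_for(unsigned logical_slice)
  {
      return slice_route[logical_slice];
  }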

A useful characteristic of a hash function is invertibility.

Note that the destination for an access to a physical address may
not necessarily be an L3 island, but rather to an I/O host bridge.

Re: L3 cache allocation on Intel

<uHyYM.84784$tnmf.71840@fx09.iad>

https://news.novabbs.org/devel/article-flat.php?id=34599&group=comp.arch#34599

 by: EricP - Fri, 20 Oct 2023 17:21 UTC

Anton Ertl wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
>> EricP wrote:
>>> This recent article talks some about Intel L3 and its connections:
>>>
>>> https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/
>> This also talks about L3 slices
>>
>> Sapphire Rapids: Golden Cove Hits Servers
>> https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rapids/
>
> Good stuff, yes, and I looked at it again now, but if it says anything
> about L3 replacement policies, I missed it again.
>
> - anton

Sorry, I didn't read your msg closely enough.

I put 'intel "L3" cache slice' into Google Scholar and got a few hits:

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=intel+%22L3%22+cache+slice&btnG=

with titles that sound like what you are looking for, such as:

Systematic reverse engineering of cache slice selection
in Intel processors, 2015
https://eprint.iacr.org/2015/690.pdf

Mapping addresses to l3/cha slices in intel processors, JD McCalpin, 2021
https://repositories.lib.utexas.edu/handle/2152/87595
https://repositories.lib.utexas.edu/bitstream/handle/2152/87595/TR-2021-03_IntelAddressHashing.pdf?sequence=45&isAllowed=y

Mapping core and L3 slice numbering to die location in
Intel Xeon Scalable processors, JD McCalpin, 2021
https://repositories.lib.utexas.edu/handle/2152/86168
https://repositories.lib.utexas.edu/bitstream/handle/2152/86168/TR-2021-01b_SKX_CLX_Die_Layout.pdf?sequence=3

Re: L3 cache allocation on Intel

<2023Oct20.191828@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34600&group=comp.arch#34600

 by: Anton Ertl - Fri, 20 Oct 2023 17:18 UTC

scott@slp53.sl.home (Scott Lurndal) writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>To quote one of my colleagues:
>
> "It is well understood that the bits of a key (such as an address)
> should not be used as-is to index into a cache or similar table".

It's not well understood by most cache designers who have used bits of
the address to select the set. And I don't see any reason why they
should not be used. Maybe your colleague can provide a justification
for the claim rather than just a claim of well-understood-ness.

>A useful characteristic of a hash function is invertibility.

Why? Given that the result is intended to be a non-power-of-2 slice
number, and the actual physical addresses are a very small and
possibly distributed subset of the huge address space, I don't see
that you can get that characteristic even if it is useful.

>Note that the destination for an access to a physical address may
>not necessarily be an L3 island, but rather to an I/O host bridge.

Non-cacheable accesses are identified during translation, and then you
don't need to worry about which L3 slice to access.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: L3 cache allocation on Intel

<m2zYM.140606$rbid.98132@fx18.iad>

https://news.novabbs.org/devel/article-flat.php?id=34601&group=comp.arch#34601

 by: Scott Lurndal - Fri, 20 Oct 2023 17:46 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>scott@slp53.sl.home (Scott Lurndal) writes:
>>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>To quote one of my colleagues:
>>
>> "It is well understood that the bits of a key (such as an address)
>> should not be used as-is to index into a cache or similar table".
>
>It's not well understood by most cache designers who have used bits of
>the address to select the set. And I don't see any reason why they
>should not be used. Maybe your collegue can provide a justification
>for the claim rather than just a claim of well-understood-ness.

"This is due to possible imbalances in the bit statistics
and correlation between bits."

Physical addresses aren't random enough to ensure balance.

Re: L3 cache allocation on Intel

<3hAYM.29016$Ssze.11022@fx48.iad>

https://news.novabbs.org/devel/article-flat.php?id=34602&group=comp.arch#34602

 by: EricP - Fri, 20 Oct 2023 19:10 UTC

Anton Ertl wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
>> EricP wrote:
>>> This recent article talks some about Intel L3 and its connections:
>>>
>>> https://chipsandcheese.com/2023/08/04/sandy-bridge-setting-intels-modern-foundation/
>> This also talks about L3 slices
>>
>> Sapphire Rapids: Golden Cove Hits Servers
>> https://chipsandcheese.com/2023/03/12/a-peek-at-sapphire-rapids/
>
> Good stuff, yes, and I looked at it again now, but if it says anything
> about L3 replacement policies, I missed it again.
>
> - anton

I notice that Intel has Resource Director Technology (RDT)
which also might interact with L3 allocation and access time:
- Cache Allocation Technology (CAT) can control
which cache *ways* are allocated to which cores
- Memory Bandwidth Allocation (MBA) can throttle the bandwidth
between per-core L2 cache and L3 interconnect.

A Closer Look at Intel Resource Director Technology (RDT), 2022
https://par.nsf.gov/biblio/10332455
https://par.nsf.gov/servlets/purl/10332455

Re: L3 cache allocation on Intel

<ff7eae0a-7955-4b11-b857-0393ac340c24n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34603&group=comp.arch#34603

 by: MitchAlsup - Fri, 20 Oct 2023 19:45 UTC

It seems to me that all the L3 chunks are on a ring bus.
And that getting to RingBus[mod-t] requires a trip around the ring.
<
So, why not just snoop that L3 of the ring node you are on while
passing from where you are to where the data is.
<
That is, no special hashing needed, get data if you run into it,
use snoop statistics to decide where to put it if no snoops succeed.
<
So, why are we using a hash again ??

Re: L3 cache allocation on Intel

<VGBYM.64281$HwD9.37647@fx11.iad>

https://news.novabbs.org/devel/article-flat.php?id=34604&group=comp.arch#34604

 by: EricP - Fri, 20 Oct 2023 20:46 UTC

Anton Ertl wrote:
> A student asked me a question that essentially reduces to: How does
> Intel allocate/replace cache lines in the L3? The L3 consists of a
> number of slices; e.g., on the Raptor Lake a slice has 3MB, and there
> is one slice per Raptor Cove core and one per 4 Gracemont cores,
> resulting in, e.g., 12 slices for a 13900.
>
> Now if each core allocated only in its own slice, many programs would
> see an effect of only 3MB of L3 cache, but Intel processors are well
> documented to access all the cache even from one thread running on
> only one core.
>
> Using some pseudo-LRU replacement across the whole L3 also seems
> unlikely to me (too slow and too power-expensive).
>
> The way I would do it if I don't want to be limited to a slice is to
> select some slice (maybe with a per-core round-robin scheme), and
> allocate the line there (with pseudo-LRU within the slice).
>
> How does Intel do it?
>
> - anton

According to this which has two sections on Intel replacement policy:

Understanding HPC Benchmark Performance on Intel Broadwell
and Cascade Lake Processors, 2020
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295341/pdf/978-3-030-50743-5_Chapter_21.pdf

"L3 Cache Replacement Policy
....
new Intel microarchitectures have implemented dynamic replacement policies
....
Instead of applying the same pseudo-LRU policy to all workloads,
post-SNB processors make use of a small amount of dedicated leader sets,
each of which implements a different replacement policy. During execution,
the processor constantly monitors which of the leader sets delivers the
highest hit rate, and instructs all remaining sets (also called follower
sets) to use the best-performing leader set’s replacement strategy"
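
A minimal sketch of that set-dueling mechanism (leader sets pinned to
competing policies, a saturating counter picking the winner for the follower
sets; the policies, spacing and counter width here are illustrative, not
Intel's actual parameters):

  #define LEADER_EVERY 64          /* two leader sets out of every 64 */
  #define PSEL_MAX     1023        /* 10-bit saturating selector */

  enum policy { POLICY_A, POLICY_B };          /* e.g. pseudo-LRU vs. BIP */

  static unsigned psel = PSEL_MAX / 2;

  static enum policy set_policy(unsigned set)
  {
      if (set % LEADER_EVERY == 0) return POLICY_A;   /* leader set for A */
      if (set % LEADER_EVERY == 1) return POLICY_B;   /* leader set for B */
      /* followers copy whichever leader group currently misses less */
      return (psel >= PSEL_MAX / 2) ? POLICY_B : POLICY_A;
  }

  /* called on every L3 miss: misses in a leader set steer the selector */
  static void on_miss(unsigned set)
  {
      if (set % LEADER_EVERY == 0 && psel < PSEL_MAX) psel++;  /* A missed */
      else if (set % LEADER_EVERY == 1 && psel > 0)   psel--;  /* B missed */
  }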

Re: L3 cache allocation on Intel

<u3CYM.17017$rEF.4533@fx47.iad>

https://news.novabbs.org/devel/article-flat.php?id=34605&group=comp.arch#34605

 by: EricP - Fri, 20 Oct 2023 21:12 UTC

MitchAlsup wrote:
> It seems to me that all the L3 chunks are on a ring bus.
> And that getting to RingBus[mod-t] requires a trip around the ring.
> <
> So, why not just snoop that L3 of the ring node you are on while
> passing from where you are to where the data is.
> <
> That is, no special hashing needed, get data if you run into it,
> use snoop statistics to decide where to put it if no snoops succeed.
> <
> So, why are we using a hash again ??

One source said it was to minimize contention.
Presumably they didn't want consecutive lines from
the same page to all come from the same source.
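
As a toy illustration of that contention argument (both selector functions
below are invented for the example; neither is Intel's documented hash),
count how many distinct slices the 64 lines of one 4KB page touch:

#include <stdio.h>
#include <stdint.h>

#define SLICES 8

static unsigned slice_high_bits(uint64_t pa)    /* looks at PA[19:17] only  */
{
    return (unsigned)((pa >> 17) & (SLICES - 1));
}

static unsigned slice_hashed(uint64_t pa)       /* folds in line-index bits */
{
    uint64_t x = pa >> 6;                       /* drop byte-within-line    */
    return (unsigned)((x ^ (x >> 3) ^ (x >> 7) ^ (x >> 13)) & (SLICES - 1));
}

int main(void)
{
    uint64_t page = 0x12345000;                 /* some 4KB-aligned address */
    int seen_a[SLICES] = {0}, seen_b[SLICES] = {0};
    for (int line = 0; line < 64; line++) {
        seen_a[slice_high_bits(page + 64u * line)] = 1;
        seen_b[slice_hashed(page + 64u * line)] = 1;
    }
    int na = 0, nb = 0;
    for (int s = 0; s < SLICES; s++) { na += seen_a[s]; nb += seen_b[s]; }
    printf("high-bits-only selector: %d slice(s); hashed selector: %d slice(s)\n",
           na, nb);
    return 0;
}

A selector that looks only at bits above the page offset sends the whole
page to one slice; folding lower line-address bits into the hash spreads
the same page across most of the slices.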

Re: L3 cache allocation on Intel

<0ea7e82b-99db-4099-8ed1-a269436f0e2en@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=34606&group=comp.arch#34606

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:6594:b0:770:58ab:afb4 with SMTP id qd20-20020a05620a659400b0077058abafb4mr51964qkn.8.1697846192179;
Fri, 20 Oct 2023 16:56:32 -0700 (PDT)
X-Received: by 2002:a05:6870:a446:b0:1e9:9b32:3e56 with SMTP id
n6-20020a056870a44600b001e99b323e56mr1435307oal.7.1697846191936; Fri, 20 Oct
2023 16:56:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 20 Oct 2023 16:56:31 -0700 (PDT)
In-Reply-To: <u3CYM.17017$rEF.4533@fx47.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:c961:cb0b:2ea3:b545;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:c961:cb0b:2ea3:b545
References: <2023Oct18.190923@mips.complang.tuwien.ac.at> <SCVXM.40044$MJ59.16148@fx10.iad>
<bTVXM.87760$8fO.38095@fx15.iad> <2023Oct19.211807@mips.complang.tuwien.ac.at>
<3hAYM.29016$Ssze.11022@fx48.iad> <ff7eae0a-7955-4b11-b857-0393ac340c24n@googlegroups.com>
<u3CYM.17017$rEF.4533@fx47.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0ea7e82b-99db-4099-8ed1-a269436f0e2en@googlegroups.com>
Subject: Re: L3 cache allocation on Intel
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 20 Oct 2023 23:56:32 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2462
 by: MitchAlsup - Fri, 20 Oct 2023 23:56 UTC

On Friday, October 20, 2023 at 4:13:02 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > It seems to me that all the L3 chunks are on a ring bus.
> > And that getting to RingBus[mod-t] requires a trip around the ring.
> > <
> > So, why not just snoop that L3 of the ring node you are on while
> > passing from where you are to where the data is.
> > <
> > That is, no special hashing needed, get data if you run into it,
> > use snoop statistics to decide where to put it if no snoops succeed.
> > <
> > So, why are we using a hash again ??
> One source said it was to minimize contention.
> Presumably they didn't want consecutive lines from
> the same page to all come from the same source.
<
Back in 2000:: optimal large-data-set memory movement minimized
page crossing (linear address sequence) and a single ACT to DRAM
could satisfy at least 4096 bytes of data, thus being faster than
spreading the ACTs across the available DRAM DIMMs; lower power,
too.
<
What has changed ??

Re: L3 cache allocation on Intel

<uh0efg$1o4ut$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34607&group=comp.arch#34607

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: L3 cache allocation on Intel
Date: Sat, 21 Oct 2023 13:59:44 +0200
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <uh0efg$1o4ut$1@dont-email.me>
References: <2023Oct18.190923@mips.complang.tuwien.ac.at>
<SCVXM.40044$MJ59.16148@fx10.iad> <bTVXM.87760$8fO.38095@fx15.iad>
<2023Oct19.211807@mips.complang.tuwien.ac.at>
<3hAYM.29016$Ssze.11022@fx48.iad>
<ff7eae0a-7955-4b11-b857-0393ac340c24n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 21 Oct 2023 11:59:44 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="62550e5a04a0e30a38ef59d26d9548d5";
logging-data="1840093"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/xIk+ijZY5j37udGhKJUZcE1KbptFqklKLYrTpdeGe1A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:rKNZdJmQCXoccCRn3cQu0tuIqLU=
In-Reply-To: <ff7eae0a-7955-4b11-b857-0393ac340c24n@googlegroups.com>
 by: Terje Mathisen - Sat, 21 Oct 2023 11:59 UTC

MitchAlsup wrote:
> It seems to me that all the L3 chunks are on a ring bus.
> And that getting to RingBus[mod-t] requires a trip around the ring.
> <
> So, why not just snoop that L3 of the ring node you are on while
> passing from where you are to where the data is.
> <
> That is, no special hashing needed, get data if you run into it,
> use snoop statistics to decide where to put it if no snoops succeed.
> <
> So, why are we using a hash again ??
>
Each node/L3 slice needs a way to determine if it should cache a
particular line?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: L3 cache allocation on Intel

<2023Oct21.165243@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=34611&group=comp.arch#34611

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: L3 cache allocation on Intel
Date: Sat, 21 Oct 2023 14:52:43 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 53
Message-ID: <2023Oct21.165243@mips.complang.tuwien.ac.at>
References: <2023Oct18.190923@mips.complang.tuwien.ac.at> <SCVXM.40044$MJ59.16148@fx10.iad> <bTVXM.87760$8fO.38095@fx15.iad> <2023Oct19.211807@mips.complang.tuwien.ac.at> <uHyYM.84784$tnmf.71840@fx09.iad>
Injection-Info: dont-email.me; posting-host="c39ae45fc6a8fe0ba320fe5b1bcaadd9";
logging-data="1932252"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ZRlGLClm6YLELFK14ORIg"
Cancel-Lock: sha1:QhHYC2gF0b3uldbUcxoSmrBgEy8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sat, 21 Oct 2023 14:52 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Systematic reverse engineering of cache slice selection
>in Intel processors, 2015
>https://eprint.iacr.org/2015/690.pdf

According to this, Intel selects the slice using a hash function that
uses PA[32:17] for Ivy Bridge and Haswell. Only power-of-two numbers
of slices occur in this paper. The paper also discusses how to
reverse engineer this mapping.

This paper shows in Figure 2 that the set is determined without going
through the hash function. It also shows that the set is determined
from the lowest bits beyond the 6 bits used for addressing the byte
within the cache line. And, as mentioned above, the hash value is
determined by higher bits. So an aligned block of 128KB of
consecutive physical addresses (say, in a huge page) will be allocated
entirely in the same slice. Is this good, bad, or don't care? My
feeling is that I would prefer to distribute consecutive cache lines
across slices.
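
To spell out where the 128KB comes from: with 64-byte lines, PA[5:0]
select the byte within the line, the next bits (PA[16:6] for a 2048-set
slice) select the set, and only PA[32:17] feed the slice hash. A small
sketch with an invented XOR fold standing in for the undisclosed hash:

/* Address decomposition sketch for the case above: 64 B lines, set index
   from the low bits, slice from a hash of PA[32:17] only.  The XOR fold
   below is a placeholder, not the reverse-engineered hash.              */
#include <stdint.h>

#define LINE_BITS 6                    /* 64 B cache lines               */
#define SET_BITS  11                   /* e.g. 2048 sets per slice       */

unsigned l3_set(uint64_t pa)           /* PA[16:6]                       */
{
    return (unsigned)((pa >> LINE_BITS) & ((1u << SET_BITS) - 1));
}

unsigned l3_slice(uint64_t pa, unsigned nslices)
{
    uint64_t h = (pa >> 17) & 0xFFFF;  /* PA[32:17]; PA[16:0] ignored    */
    h ^= h >> 5; h ^= h >> 11;         /* placeholder XOR fold           */
    return (unsigned)(h % nslices);
}
/* Since l3_slice() ignores PA[16:0], every line inside a 128KB-aligned
   block (2^17 bytes, identical PA[32:17]) maps to the same slice, while
   the set-index bits PA[16:6] already cycle within that block.          */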

>Mapping addresses to l3/cha slices in intel processors, JD McCalpin, 2021
>https://repositories.lib.utexas.edu/handle/2152/87595
>https://repositories.lib.utexas.edu/bitstream/handle/2152/87595/TR-2021-03_IntelAddressHashing.pdf?sequence=45&isAllowed=y

This one mainly discusses Skylake-X/Cascade Lake CPUs with various
core counts, plus an Ice Lake-X and a Knights Landing. The hash
functions here use PA[37:10] up to PA[37:20] (depending on the number
of slices).

The hash function is used for determining the slice. If I understand
Table 9 correctly, it is not particularly good at uniform distribution
if the number of slices is not a power of 2, with factors >1.25
between the popularity of slices for some CPUs. Or, for a less
dramatic number, the most heavily allocated slices receive 1.6% to
3.7% (KNL: 7.6%) more cache lines than with a uniform distribution,
while the least allocated slices receive significantly fewer than with
a uniform distribution.
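
One way to check this kind of imbalance for any candidate slice function
is simply to tally a large block of line addresses through it. The sketch
below does that with the invented placeholder hash from the sketch above,
so whatever it prints says nothing about Intel's real distribution; it
only illustrates the measurement:

#include <stdio.h>
#include <stdint.h>

unsigned l3_slice(uint64_t pa, unsigned nslices);   /* placeholder above */

int main(void)
{
    enum { NSLICES = 12 };                     /* e.g. a 13900-like count */
    uint64_t count[NSLICES] = {0};
    for (uint64_t pa = 0; pa < (1ull << 30); pa += 64)  /* 1 GiB of lines */
        count[l3_slice(pa, NSLICES)]++;

    uint64_t min = count[0], max = count[0];
    for (int s = 1; s < NSLICES; s++) {
        if (count[s] < min) min = count[s];
        if (count[s] > max) max = count[s];
    }
    printf("most/least popular slice: %llu / %llu lines\n",
           (unsigned long long)max, (unsigned long long)min);
    return 0;
}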

I have not read through the paper completely, but it seems that the
set number is permuted based on the hash. So while, in the cases
using PA[37:20], an aligned 1MB block of physical addresses will be
allocated to the same L3 slice, the set number depends not only on the
low bits, but also on the higher bits. This may be helpful in cases
where you have a lot of huge pages around of which you don't use the
end: then the permutation helps distribute these addresses more
evenly among the sets.
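
A simple way to get the described effect (the paper's actual permutation
may well differ) is to XOR a few hash-derived high-address bits into the
plain set index, as in this sketch:

#include <stdint.h>

#define SET_BITS 11                               /* e.g. 2048 sets/slice */

/* Hypothetical set permutation: the plain index PA[16:6] is XORed with a
   few bits derived from higher address bits, so identical offsets inside
   different 1MB blocks no longer collide in the same sets.  The choice
   of bits is an assumption, not the permutation from the paper.         */
unsigned permuted_set(uint64_t pa)
{
    unsigned base = (unsigned)((pa >> 6) & ((1u << SET_BITS) - 1));
    unsigned high = (unsigned)((pa >> 20) ^ (pa >> 27));
    return (base ^ high) & ((1u << SET_BITS) - 1);
}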

Thanks for finding these papers.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
