
Re: "Mini" tags to reduce the number of op codes

<4cecd7c9e19ad08267022913d60fc434@www.novabbs.org>


https://news.novabbs.org/devel/article-flat.php?id=38302&group=comp.arch#38302

Date: Tue, 16 Apr 2024 18:14:39 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$tfLslz302u6b2zVGvpXvKOyYtYjbQ/pQpHsC5tnkwjM8fFrqDpMFS
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <2024Apr3.192405@mips.complang.tuwien.ac.at> <86d1dd03deee83e339afa725524ab259@www.novabbs.org> <uvimv7$629s$1@dont-email.me> <983c789e7c6d9f3ca4ffe40fdc3aa709@www.novabbs.org> <uvl6oa$qbkb$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <4cecd7c9e19ad08267022913d60fc434@www.novabbs.org>
 by: MitchAlsup1 - Tue, 16 Apr 2024 18:14 UTC

Terje Mathisen wrote:

> MitchAlsup1 wrote:
>> Terje Mathisen wrote:
>>
>>> MitchAlsup1 wrote:
>>>>
>>
>>> In the non-OoO (i.e., Pentium) days, I would have inverted the loop in
>>> order to hide the latencies as much as possible, resulting in an inner
>>> loop something like this:
>>
>>>   next:
>>>    adc eax,ebx
>>>    mov ebx,[edx+ecx*4]    ; First cycle
>>
>>>    mov [edi+ecx*4],eax
>>>    mov eax,[esi+ecx*4]    ; Second cycle
>>
>>>    inc ecx
>>>    jnz next        ; Third cycle
>>
>>> Terje
>>
>> As opposed to::
>>
>>     .global mpn_add_n
>> mpn_add_n:
>>     MOV   R5,#0     // c
>>     MOV   R6,#0     // i
>>
>>     VEC   R7,{}
>>     LDD   R8,[R2,Ri<<3]       // Load 128-to-512 bits
>>     LDD   R9,[R3,Ri<<3]       // Load 128-to-512 bits
>>     CARRY R5,{{IO}}
>>     ADD   R10,R8,R9           // Add pair to add octal
>>     STD   R10,[R1,Ri<<3]      // Store 128-to-512 bits
>>     LOOP  LT,R6,#1,R4         // increment 2-to-8 times
>>     RET
>>
>> --------------------------------------------------------
>>
>>     LDD   R8,[R2,Ri<<3]       // AGEN cycle 1
>>     LDD   R9,[R3,Ri<<3]       // AGEN cycle 2 data cycle 4
>>     CARRY R5,{{IO}}
>>     ADD   R10,R8,R9           // cycle 4
>>     STD   R10,[R1,Ri<<3]      // AGEN cycle 3 write cycle 5
>>     LOOP  LT,R6,#1,R4         // cycle 3
>>
>> OR
>>
>>     LDD       LDd
>>          LDD       LDd                    ADD
>>               ST        STd
>>               LOOP
>>                    LDD       LDd
>>                         LDD       LDd                    ADD
>>                              ST        STd
>>                              LOOP
>>
>> 10 instructions (2 iterations) in 4 clocks on a 64-bit 1-wide VVM machine !!
>> without code scheduling heroics.
>>
>> 40 instructions (8 iterations) in 4 clocks on a 512 wide SIMD VVM machine !!

> It all comes down to the carry propagation, right?

> The way I understood the original code, you are doing a very wide
> unsigned add, so you need a carry to propagate from each and every block
> to the next, right?

Most ST pipelines have an align stage that aligns the data to be stored to
where it needs to go. One can extend the carry into this stage if needed:
capture both a+b and a+b+1, and use the carry-in to select one or the other.
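
A minimal sketch of that select idea, as a Python model rather than hardware. The names `both_sums` and `mpn_add_n_model` are made up for illustration, and the 64-bit limb width is an assumption taken from the loop above. Both candidate sums for every limb are formed without waiting for any carry; the carry chain then only picks one candidate per limb.

```python
MASK64 = (1 << 64) - 1  # one 64-bit limb (assumed width)

def both_sums(a, b):
    """Return (sum, carry_out) for a+b and for a+b+1, each truncated to 64 bits."""
    s0 = a + b
    s1 = s0 + 1
    return (s0 & MASK64, s0 >> 64), (s1 & MASK64, s1 >> 64)

def mpn_add_n_model(xs, ys, carry_in=0):
    """Add two equal-length little-endian limb lists; return (limbs, carry_out)."""
    if len(xs) != len(ys):
        raise ValueError("operands must have the same number of limbs")
    # Phase 1: candidates per limb, independent of the incoming carry.
    candidates = [both_sums(a, b) for a, b in zip(xs, ys)]
    # Phase 2: the carry only selects between the precomputed candidates.
    limbs, carry = [], carry_in
    for without_carry, with_carry in candidates:
        limb, carry = with_carry if carry else without_carry
        limbs.append(limb)
    return limbs, carry
```

The selection in phase 2 is still sequential in this model; the point is only that the wide adds themselves no longer sit on the carry path.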

> If you can do that at half a clock cycle per 64 bit ADD, then consider
> me very impressed!

> Terje

Re: "Mini" tags to reduce the number of op codes

<uvmsh6$ql2d$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38304&group=comp.arch#38304

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 16 Apr 2024 15:02:13 -0700
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <uvmsh6$ql2d$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
<2024Apr3.192405@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 17 Apr 2024 00:02:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e4560d6e9e80920818ac28626eb60e90";
logging-data="873549"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19bOa7EVK+T9BfGsZCFYpzKeu/4URyDSfI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:zpu5qRhcr+Faj4O/oGoDAT3y41U=
Content-Language: en-US
In-Reply-To: <2024Apr3.192405@mips.complang.tuwien.ac.at>
 by: Stephen Fuld - Tue, 16 Apr 2024 22:02 UTC

On 4/3/2024 10:24 AM, Anton Ertl wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>> The idea is to add 32 bits to the processor state, one per register
>> (though probably not physically part of the register file) as a tag. If
>> set, the bit indicates that the corresponding register contains a
>> floating-point value. Clear indicates not floating point (integer,
>> address, etc.). There would be two additional instructions, load single
>> floating and load double floating, which work the same as the other 32-
>> and 64-bit loads, but in addition to loading the value, set the tag bit
>> for the destination register. Non-floating-point loads would clear the
>> tag bit. As I show below, I don’t think you need any special "store
>> tag" instructions.
> ...
>> But we can go further. There are some opcodes that only make sense for
>> FP operands, e.g. the transcendental instructions. And there are some
>> operations that probably only make sense for non-FP operands, e.g. POP,
>> FF1, probably shifts. Given the tag bit, these could share the same
>> op-code. There may be several more of these.
>
> Certainly makes reading disassembler output fun (or writing the
> disassembler).

Good point. It probably isn't too bad for the arithmetic operations,
etc, but once you extend it as I suggested in the last paragraph it gets
ugly. :-(

big snip
>
>> That is as far as I got. I think you could net save perhaps 8-12 op
>> codes, which is about 10% of the existing op codes - not bad. Is it
>> worth it? To me, a major question is the effect on performance. What
>> is the cost of having to decode the source registers and reading their
>> respective tag bits before knowing which FU to use?
>
> In an OoO CPU, that's pretty heavy.

OK, but in the vast majority of cases (i.e., unless there is something
like a conditional branch that uses floating point or integer depending
upon whether the branch is taken), the tag bit that a register will
have can be known well in advance. As I said, IANAHG, but that might
make it easier.

> But actually, your idea does not need any computation results for
> determining the tag bits of registers (except during EXIT),

But even here, you almost certainly know what the tag bit for any given
register is long before you execute the EXIT instruction. And remember,
on MY 66000 EXIT is performed lazily, so you have time and the mechanism
is in place to wait if needed.

> so you
> probably can handle the tags in the front end (decoder and renamer).
> Then the tags are really separate and not part of the registers that
> have to be renamed, and you don't need to perform any waiting on
> ENTER.
>
> However, in EXIT the front end would have to wait for the result of
> the load/store unit loading the 32 bits, unless you add a special
> mechanism for that. So EXIT would become expensive, one way or the
> other.

Yes.
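
The front-end handling discussed above can be sketched as a small Python model. Everything named here is hypothetical: the mnemonics `LD_FP`, `LD_INT` and `ADD`, the unit names, and the choice to raise on mixed tags (option 1 of the original proposal) are assumptions for illustration only. The point is that in straight-line code the tag of every register follows from the opcodes alone, with no computed data values involved.

```python
def resolve_in_decode(program, tags=0):
    """Walk instructions in decode order, tracking one tag bit per register.

    program: list of (op, rd, rs1, rs2) tuples; unused fields may be None.
    Returns (list of (op, unit), final tag word).
    """
    resolved = []
    for op, rd, rs1, rs2 in program:
        if op == "LD_FP":            # FP load sets the destination tag
            tags |= 1 << rd
            resolved.append((op, "MEM"))
        elif op == "LD_INT":         # non-FP load clears the destination tag
            tags &= ~(1 << rd)
            resolved.append((op, "MEM"))
        elif op == "ADD":            # shared opcode: the tags pick the unit
            t1 = (tags >> rs1) & 1
            t2 = (tags >> rs2) & 1
            if t1 != t2:
                raise ValueError("mixed tags (modelled as an exception)")
            if t1:
                tags |= 1 << rd
            else:
                tags &= ~(1 << rd)
            resolved.append((op, "FPU" if t1 else "ALU"))
        else:
            raise ValueError("unknown op: " + op)
    return resolved, tags
```

What this model cannot cover is exactly the EXIT case: there the tag word comes back from memory, so the front end has nothing to derive it from.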

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: "Mini" tags to reduce the number of op codes

<uvmspo$ql2c$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38305&group=comp.arch#38305

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 16 Apr 2024 15:06:48 -0700
Organization: A noiseless patient Spider
Lines: 67
Message-ID: <uvmspo$ql2c$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <YshPN.227779$hN14.133879@fx17.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 17 Apr 2024 00:06:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e4560d6e9e80920818ac28626eb60e90";
logging-data="873548"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ze8BLQS5Z2lyR4hGsbx4q111jtj1XrIs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:5fpoTIeGdNc4O2/OfRrk4qeYJeI=
Content-Language: en-US
In-Reply-To: <YshPN.227779$hN14.133879@fx17.iad>
 by: Stephen Fuld - Tue, 16 Apr 2024 22:06 UTC

On 4/3/2024 11:44 AM, EricP wrote:
> Stephen Fuld wrote:
>> There has been discussion here about the benefits of reducing the
>> number of op codes.  One reason not mentioned before is if you have
>> fixed length instructions, you may want to leave as many codes as
>> possible available for future use.  Of course, if you are doing a
>> 16-bit instruction design, where instruction bits are especially
>> tight, you may save enough op-codes to save a bit, perhaps allowing a
>> larger register specifier field, or to allow more instructions in the
>> smaller subset.
>>
>> It is in this spirit that I had an idea, partially inspired by Mill’s
>> use of tags in registers, but not memory.  I worked through this idea
>> using the My 66000 as an example “substrate” for two reasons.  First,
>> it has several features that are “friendly” to the idea.  Second, I
>> know Mitch cares about keeping the number of op codes low.
>>
>> Please bear in mind that this is just the germ of an idea.  It is
>> certainly not fully worked out.  I present it here to stimulate
>> discussions, and because it has been fun to think about.
>>
>> The idea is to add 32 bits to the processor state, one per register
>> (though probably not physically part of the register file) as a tag.
>> If set, the bit indicates that the corresponding register contains a
>> floating-point value.  Clear indicates not floating point (integer,
>> address, etc.).  There would be two additional instructions, load
>> single floating and load double floating, which work the same as the
>> other 32- and 64-bit loads, but in addition to loading the value, set
>> the tag bit for the destination register.  Non-floating-point loads
>> would clear the tag bit.  As I show below, I don’t think you need any
>> special "store tag" instructions.
>
> If you are adding a float/int data type flag you might as well
> also add operand size for floats at least, though some ISAs
> have both int32 and int64 ALU operations for result compatibility.

Not needed for My 66000, as all floating point loads convert the loaded
value to double precision.

big snip

> Currently the opcode data type can tell the uArch how to route
> the  operands internally without knowing the data values.
> For example, FPU reservation stations monitor float operands
> and schedule for just the FPU FADD or FMUL units.
>
> Dynamic data typing would change that to be data dependent routing.
> It means, for example, you can't begin to schedule a uOp
> until you know all its operand types and opcode.

Seems right.

>
> Looks like it makes such distributed decisions impossible.
> Probably everything winds up in a big pile of logic in the center,
> which might be problematic for those things whose complexity grows N^2.
> Not sure how significant that is.

Could be. Again, IANAHG.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: "Mini" tags to reduce the number of op codes

<uvmstq$ql2d$2@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38306&group=comp.arch#38306

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 16 Apr 2024 15:08:58 -0700
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <uvmstq$ql2d$2@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukckh$4g83$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 17 Apr 2024 00:08:59 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e4560d6e9e80920818ac28626eb60e90";
logging-data="873549"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18u+DzCjjZshm89T3UmyH+3o93Vo1wZoho="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:B71JxdOFKWNi44YJFFEaVbg0C8Y=
In-Reply-To: <uukckh$4g83$1@dont-email.me>
Content-Language: en-US
 by: Stephen Fuld - Tue, 16 Apr 2024 22:08 UTC

On 4/3/2024 1:02 PM, Thomas Koenig wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>
> [saving opcodes]
>
>
>> The idea is to add 32 bits to the processor state, one per register
>> (though probably not physically part of the register file) as a tag. If
>> set, the bit indicates that the corresponding register contains a
>> floating-point value. Clear indicates not floating point (integer,
>> address, etc.).
>
> I don't think this would save a lot of opcode space, which
> is the important thing.
>
> A typical RISC design has a six-bit major opcode.
> Having three registers takes away fifteen bits, leaving
> eleven, which is far more than anybody would ever want as
> minor opcode for arithmetic instructions. Compare with
> https://en.wikipedia.org/wiki/DEC_Alpha#Instruction_formats
> where DEC actually left out three bits because they did not
> need them.

I think that is probably true for 32 bit instructions, but what about 16
bit?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: "Mini" tags to reduce the number of op codes

<uvmv38$16acb$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38307&group=comp.arch#38307

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 16 Apr 2024 17:46:00 -0500
Organization: A noiseless patient Spider
Lines: 76
Message-ID: <uvmv38$16acb$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukckh$4g83$1@dont-email.me>
<uvmstq$ql2d$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 17 Apr 2024 00:46:18 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f91ef60a688902118f0b09a24a35e407";
logging-data="1255819"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18GK6UQKun2J8q6BW9ACaLopVsn1gVJ7Y0="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:W6XHd2xEMeD/LJvXDbkCNCLTss0=
Content-Language: en-US
In-Reply-To: <uvmstq$ql2d$2@dont-email.me>
 by: BGB-Alt - Tue, 16 Apr 2024 22:46 UTC

On 4/16/2024 5:08 PM, Stephen Fuld wrote:
> On 4/3/2024 1:02 PM, Thomas Koenig wrote:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
>>
>> [saving opcodes]
>>
>>
>>> The idea is to add 32 bits to the processor state, one per register
>>> (though probably not physically part of the register file) as a tag.  If
>>> set, the bit indicates that the corresponding register contains a
>>> floating-point value.  Clear indicates not floating point (integer,
>>> address, etc.).
>>
>> I don't think this would save a lot of opcode space, which
>> is the important thing.
>>
>> A typical RISC design has a six-bit major opcode.
>> Having three registers takes away fifteen bits, leaving
>> eleven, which is far more than anybody would ever want as
>> minor opcode for arithmetic instructions.  Compare with
>> https://en.wikipedia.org/wiki/DEC_Alpha#Instruction_formats
>> where DEC actually left out three bits because they did not
>> need them.
>
> I think that is probably true for 32 bit instructions, but what about 16
> bit?
>

At least, as I see it...

If 4 bit registers:
16-4-4 => 8
If 5 bit registers:
16-5-5 => 6

Realistically, I don't think 6 bits of opcode is enough except if the
purpose of the 16-bit ops is merely to shorten some common 32-bit ops.
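
The bit budget above is simple enough to check mechanically. A small Python sketch follows; the helper name `opcode_bits` is made up here, and it assumes 2R encodings for 16-bit ops and the three-register, six-bit-major-opcode layout Thomas described for 32-bit ops.

```python
def opcode_bits(instr_bits, reg_field_bits, reg_fields):
    """Bits left for the opcode once the register specifiers are taken out."""
    return instr_bits - reg_field_bits * reg_fields

# 16-bit, two-register encodings for several register-field widths.
budget_16 = {width: opcode_bits(16, width, 2) for width in (3, 4, 5, 6)}

# 32-bit, three 5-bit registers, minus a 6-bit major opcode: minor opcode bits.
minor_32 = opcode_bits(32, 5, 3) - 6

for width, bits in budget_16.items():
    print(f"16-bit op, two {width}-bit register fields: {bits} opcode bits")
print(f"32-bit op, 6-bit major opcode, three 5-bit fields: {minor_32} minor bits")
```

Each extra register bit in a 2R encoding costs two opcode bits, which is why 6-bit fields leave almost nothing in a 16-bit op.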

But, a subset of instructions can use 5-bit fields (say, MOV, EXTS.L,
and common Load/Store ops).

Say (in my notation):
MOV Rm, Rn
EXTS.L Rm, Rn
MOV.L (SP, Disp), Rn
MOV.Q (SP, Disp), Rn
MOV.X (SP, Disp), Xn
MOV.L Rn, (SP, Disp)
MOV.Q Rn, (SP, Disp)
MOV.X Xn, (SP, Disp)
As these tend to be some of the most commonly used instructions.

For most everything else, one can limit things either to the first 16
registers, or the most commonly used 16 registers (if not equivalent to
the first 16).

Though, for 1R ops, it can make sense to have 5-bit registers.

I don't really think 3-bit register fields are worth bothering with;
even if limited to the most common registers. Granted, being limited to
2R encodings is also limiting.

Granted, both Thumb and RVC apparently thought 3-bit register fields
were worthwhile, so...

Similarly, not worth bothering (at all) with 6-bit register fields in
16-bit ops.

Though, if one has 16-bit VLE, a question is how best to split up the
16- vs 32-bit encoding space.

....

Re: "Mini" tags to reduce the number of op codes

<uvn2gr$ql2d$3@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38308&group=comp.arch#38308

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 16 Apr 2024 16:44:27 -0700
Organization: A noiseless patient Spider
Lines: 152
Message-ID: <uvn2gr$ql2d$3@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 17 Apr 2024 01:44:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e4560d6e9e80920818ac28626eb60e90";
logging-data="873549"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19PxXJH7XWpNloOwihnkImyuH+K1NSlnRw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:wv6JX54YZLfoBVQ0pmWQ+PRoubw=
Content-Language: en-US
In-Reply-To: <420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
 by: Stephen Fuld - Tue, 16 Apr 2024 23:44 UTC

On 4/3/2024 2:30 PM, MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>> On 4/3/2024 11:43 AM, Stephen Fuld wrote:
>>> There has been discussion here about the benefits of reducing the
>>> number of op codes.  One reason not mentioned before is if you have
>>> fixed length instructions, you may want to leave as many codes as
>>> possible available for future use.  Of course, if you are doing a
>>> 16-bit instruction design, where instruction bits are especially
>>> tight, you may save enough op-codes to save a bit, perhaps allowing a
>>> larger register specifier field, or to allow more instructions in the
>>> smaller subset.
>>>
>>> It is in this spirit that I had an idea, partially inspired by Mill’s
>>> use of tags in registers, but not memory.  I worked through this idea
>>> using the My 6600 as an example “substrate” for two reasons.  First, it
>                66000
Sorry. Typo.

>>> has several features that are “friendly” to the idea.  Second, I know
>>> Mitch cares about keeping the number of op codes low.
>>>
>>> Please bear in mind that this is just the germ of an idea.  It is
>>> certainly not fully worked out.  I present it here to stimulate
>>> discussions, and because it has been fun to think about.
>>>
>>> The idea is to add 32 bits to the processor state, one per register
>>> (though probably not physically part of the register file) as a tag.
>>> If set, the bit indicates that the corresponding register contains a
>>> floating-point value.  Clear indicates not floating point (integer,
>>> address, etc.).  There would be two additional instructions, load
>>> single floating and load double floating, which work the same as the
>>> other 32- and 64-bit loads, but in addition to loading the value, set
>>> the tag bit for the destination register.  Non-floating-point loads
>>> would clear the tag bit.  As I show below, I don’t think you need any
>>> special "store tag" instructions.
>
> What do you do when you want a FP bit pattern interpreted as an integer,
> or vice versa.

As I said below, if you need that, you can use an otherwise "useless"
instruction, such as ORing a register with itself, to modify the tag bits.
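
A minimal sketch of that repurposing, as a Python model. The class `TaggedRegisterFile` and its method names are hypothetical. It assumes the rule from the original proposal: ORing a register with itself sets the tag, ORing with zero clears it. It further assumes that any other OR produces an integer-tagged result. The 64-bit pattern itself is never changed, only how later instructions would interpret it.

```python
import struct

class TaggedRegisterFile:
    """32 registers plus one tag bit each (set = floating point)."""

    def __init__(self):
        self.values = [0] * 32
        self.tags = 0

    def is_fp(self, r):
        return bool((self.tags >> r) & 1)

    def _set_tag(self, r, fp):
        if fp:
            self.tags |= 1 << r
        else:
            self.tags &= ~(1 << r)

    def load_int(self, rd, bits):
        """Non-FP load: write the value, clear the tag."""
        self.values[rd] = bits & 0xFFFFFFFFFFFFFFFF
        self._set_tag(rd, False)

    def or_reg(self, rd, rs1, rs2):
        """OR Rd,Rs,Rs (same source twice) is the 'set tag' idiom."""
        self.values[rd] = self.values[rs1] | self.values[rs2]
        self._set_tag(rd, rs1 == rs2)

    def or_imm(self, rd, rs1, imm):
        """OR Rd,Rs,#0 is the 'clear tag' idiom; other immediates also yield an integer."""
        self.values[rd] = self.values[rs1] | imm
        self._set_tag(rd, False)

    def as_double(self, r):
        """Reinterpret the raw 64 bits as an IEEE-754 double."""
        return struct.unpack("<d", struct.pack("<Q", self.values[r]))[0]
```

One cost of this encoding is that a plain register-to-register copy written as an OR of a register with itself would now change the tag, so the compiler would need a different idiom for moves.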

>
>>> When executing arithmetic instructions, if the tag bits of both
>>> sources of an instruction are the same, do the appropriate operation
>>> (floating or integer), and set the tag bit of the result register
>>> appropriately.
>>> If the tag bits of the two sources are different, I see several
>>> possibilities.
>>>
>>> 1.    Generate an exception.
>>> 2.    Use the sense of source 1 for the arithmetic operation, but
>>> perform the appropriate conversion on the second operand first,
>>> potentially saving an instruction
>
> Conversions to/from FP often require a rounding mode. How do you specify
> that?

Good point.

>
>>> 3.    Always do the operation in floating point and convert the
>>> integer operand prior to the operation.  (Or, if you prefer, change
>>> floating point to integer in the above description.)
>>> 4.    Same as 2 or 3 above, but don’t do the conversions.
>>>
>>> I suspect this is the least useful choice.  I am not sure which is
>>> the best option.
>>>
>>> Given that, use the same op code for the floating-point and fixed
>>> versions of the same operations.  So we can save eight op codes, the
>>> four arithmetic operations, max, min, abs and compare.  So far, a net
>>> savings of six opcodes.
>>>
>>> But we can go further.  There are some opcodes that only make sense
>>> for FP operands, e.g. the transcendental instructions.  And there are
>>> some operations that probably only make sense for non-FP operands,
>>> e.g. POP, FF1, probably shifts.  Given the tag bit, these could share
>>> the same op-code.  There may be several more of these.
>
> Hands waving:: "Danger Will Robinson, Danger" more waving of hands.

Agreed.

>>> I think this all works fine for a single compilation unit, as the
>>> compiler certainly knows the type of the data.  But what happens with
>>> separate compilations?  The called function probably doesn’t know the
>
> The compiler will certainly have a function prototype. In any event, if FP
> and Integers share a register file the lack of prototype is much less
> stressful to the compiler/linking system.
>
>>> tag value for callee saved registers.  Fortunately, the My 66000
>>> architecture comes to the rescue here.  You would modify the Enter
>>> and Exit instructions to save/restore the tag bits of the registers
>>> they are saving or restoring in the same data structure it uses for
>>> the registers (yes, it adds 32 bits to that structure – minimal
>>> cost).  The same mechanism works for interrupts that take control
>>> away from a running process.
>
> Yes, but we do just fine without the tag and without the stuff mentioned
> above. Neither ENTER nor EXIT care about the 64-bit pattern in the
> register.

I think you need it for callee-saved registers to ensure the tag is set
correctly for the calling program upon return to it.
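
To make the callee-saved case concrete, here is a rough Python model. It is not a description of the real My 66000 ENTER/EXIT; the functions `enter_model` and `exit_model` and the frame layout are assumptions for illustration. The only idea taken from the proposal is that the tag word is saved next to the register values and that EXIT restores the tags of exactly the registers it restores.

```python
def enter_model(frames, values, tags, saved):
    """Save the listed registers and the current tag word in a new frame."""
    frames.append(({r: values[r] for r in saved}, tags))

def exit_model(frames, values, tags):
    """Restore saved registers; return the tag word with their tags restored."""
    saved_values, saved_tags = frames.pop()
    mask = 0
    for r, v in saved_values.items():
        values[r] = v
        mask |= 1 << r
    # Tags of restored registers come from the frame; all others keep the
    # callee's current state (for example a floating-point return value).
    return (tags & ~mask) | (saved_tags & mask)
```

Without the saved tag word, a callee that reused a callee-saved register for an integer would hand back the caller's floating-point bits with an integer tag.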

>
>>> I don’t think you need to set or clear the tag bits without doing
>>> anything else, but if you do, I think you could “repurpose” some
>>> other instructions to do this, without requiring another op-code.
>>> For example, ORing a register with itself could be used to set the
>>> tag bit and ORing a register with zero could clear it.  These should
>>> be pretty rare.
>
>>> That is as far as I got.  I think you could net save perhaps 8-12 op
>>> codes, which is about 10% of the existing op codes - not bad.  Is it
>>> worth it?
>
> No.
>
>>> To me, a major question is the effect on performance.  What
>>> is the cost of having to decode the source registers and reading
>>> their respective tag bits before knowing which FU to use?
>
> The problem is you have put decode dependent on dynamic pipeline
> information. I suggest you don't want to do that. Consider a change from
> int to FP instruction as a predicated instruction, so the pipeline cannot
> DECODE the instruction at hand until the predicate resolves. Yech.

Good point.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: "Mini" tags to reduce the number of op codes

<dcea65edc5e31ec3b2ed637a9ad8a0bc@www.novabbs.org>


https://news.novabbs.org/devel/article-flat.php?id=38309&group=comp.arch#38309

Date: Wed, 17 Apr 2024 01:11:12 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$xA17f8FrjkrHJoVCgC/haOiXm.SCFhzcOqM3n1P5dtRuoCQz2AaZy
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <YshPN.227779$hN14.133879@fx17.iad> <uvmspo$ql2c$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <dcea65edc5e31ec3b2ed637a9ad8a0bc@www.novabbs.org>
 by: MitchAlsup1 - Wed, 17 Apr 2024 01:11 UTC

Stephen Fuld wrote:

> On 4/3/2024 11:44 AM, EricP wrote:
>>
>>
>> If you are adding a float/int data type flag you might as well
>> also add operand size for floats at least, though some ISAs
>> have both int32 and int64 ALU operations for result compatibility.

> Not needed for My 66000, as all floating point loads convert the loaded
> value to double precision.

Insufficient verbal precision::

My 66000 only cares about the size of a value being loaded from memory
(or ST into memory).

While (float) LDs load a 32-bit value from memory, the value remains
(float) while residing in the register; the High Order 32-bits are ignored.
The (float) register can be consumed by a (float) FP calculation and it
remains (float) after processing.

Small immediates, when consumed by FP instructions, are converted from
integer to <sized> FP during DECODE. So::

FADD R7,R7,#1

adds 1.0D0 to the (double) value in R7 (and takes one 32-bit instruction),
while:

FADDs R7,R7,#1

adds 1.0E0 to the (float) value in R7.
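
As a rough illustration of what "converted from integer to <sized> FP" produces, the Python sketch below computes the IEEE 754 bit patterns for the immediate #1 in both sizes. The function name `expand_immediate` is made up for this example, and the sketch says nothing about how the DECODE hardware actually performs the conversion; it only shows the resulting operand values.

```python
import struct


def expand_immediate(imm, single):
    """Return the IEEE 754 bit pattern of a small integer immediate.

    single=True models the FADDs (float) case, single=False the FADD
    (double) case. Hypothetical helper, for illustration only.
    """
    if single:
        return struct.unpack(">I", struct.pack(">f", float(imm)))[0]
    return struct.unpack(">Q", struct.pack(">d", float(imm)))[0]


print(hex(expand_immediate(1, single=False)))  # 0x3ff0000000000000
print(hex(expand_immediate(1, single=True)))   # 0x3f800000
```

The two patterns differ in width and in exponent bias, which is why the instruction has to carry the size: the same #1 field expands differently for FADD and FADDs.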

Re: "Mini" tags to reduce the number of op codes

<9ce02jd7i3uhbqv5iq9djiagblu47vkcs2@4ax.com>


https://news.novabbs.org/devel/article-flat.php?id=38310&group=comp.arch#38310

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (John Savard)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 17 Apr 2024 15:06:05 -0600
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <9ce02jd7i3uhbqv5iq9djiagblu47vkcs2@4ax.com>
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com> <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org> <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <9280b28665576d098af53a9416604e36@www.novabbs.org> <e8q71jljlep537vm7tbue7ch37o9q66l8k@4ax.com> <uv19ai$3kitp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 17 Apr 2024 23:06:06 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2945c5dc9f0f6a427cf330164902ac6e";
logging-data="1909015"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+c1+t5zc+oo1Q/fp4gk7ukD/FoTlK1V+0="
Cancel-Lock: sha1:UID0XDd9/92GtT8kt3Dzb8bk5qY=
X-Newsreader: Forte Free Agent 3.3/32.846
 by: John Savard - Wed, 17 Apr 2024 21:06 UTC

On Mon, 8 Apr 2024 17:25:38 -0000 (UTC), Thomas Koenig
<tkoenig@netcologne.de> wrote:

>John Savard <quadibloc@servername.invalid> schrieb:
>
>> Well, when the computer fetches a 256-bit block of code, the first
>> four bits indicate whether it is composed of 36-bit instructions or
>> 28-bit instructions.
>
>Do you think that instructions which require a certain size (almost)
>always happen to be situated together so they fit in a block?

Well, floating-point and integer instructions of one size each can be
arbitrarily mixed. And when different sizes need to mix, going to
36-bit instructions is low overhead.
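
A small Python sketch of the block layout may help here. After the 4-bit header, 252 bits remain, which divide evenly into seven 36-bit or nine 28-bit instructions. The header codes in `FORMAT_WIDTH` and the choice of the most significant bits as "first" are assumptions made for the example, not the actual encoding.

```python
BLOCK_BITS = 256
HEADER_BITS = 4
PAYLOAD_BITS = BLOCK_BITS - HEADER_BITS  # 252

# Hypothetical header codes; the real assignment is not given in this thread.
FORMAT_WIDTH = {0x0: 36, 0x1: 28}


def split_block(block):
    """Split a 256-bit block (as a Python int) into its instruction slots."""
    header = block >> PAYLOAD_BITS            # assume the top 4 bits come first
    width = FORMAT_WIDTH[header]
    count = PAYLOAD_BITS // width             # 7 slots of 36 bits, or 9 of 28
    mask = (1 << width) - 1
    slots = []
    for i in range(count):
        shift = PAYLOAD_BITS - (i + 1) * width
        slots.append((block >> shift) & mask)
    return slots
```

Because the width is fixed per block, mixing sizes means switching the whole block to the wider format, which is the overhead being weighed above.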

John Savard

Re: "Mini" tags to reduce the number of op codes

<6ee02jlul3rameturqm8ss87ht2aq54k3u@4ax.com>


https://news.novabbs.org/devel/article-flat.php?id=38311&group=comp.arch#38311

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (John Savard)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 17 Apr 2024 15:07:18 -0600
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <6ee02jlul3rameturqm8ss87ht2aq54k3u@4ax.com>
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com> <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org> <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <9280b28665576d098af53a9416604e36@www.novabbs.org> <e8q71jljlep537vm7tbue7ch37o9q66l8k@4ax.com> <ab4e76f2dc47f737941f9c385220f2a8@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 17 Apr 2024 23:07:19 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2945c5dc9f0f6a427cf330164902ac6e";
logging-data="1909015"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19lcPnma9fVAxLv96eTXHpXkLl/IEhqEo4="
Cancel-Lock: sha1:LaK0OX1TXRkstdA4LRrPdRNXqvo=
X-Newsreader: Forte Free Agent 3.3/32.846
 by: John Savard - Wed, 17 Apr 2024 21:07 UTC

On Mon, 8 Apr 2024 19:56:27 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:

>So, instead of using the branch target address, one rounds it down to
>a 256-bit boundary, reads 256 bits and looks at the first 4 bits to
>determine the format, and then uses the branch offset to pick a
>container which will become the first instruction executed.
>
>Sounds more complicated than necessary.

Yes, I don't disagree. I'm just pointing out that it's possible to
make the mini tags idea work that way, since it lets you easily turn
mini tags off when you need to.
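
The branch-target handling described in the quote can be sketched roughly as follows. This is Python, with one loud assumption: that the low five bits of a byte-granular target address (free because a 256-bit block is 32 bytes) name the container within the block. The thread does not say how the offset is actually encoded, and `locate_target` and `first_instruction` are names invented here.

```python
BLOCK_BYTES = 32  # 256 bits


def locate_target(target):
    """Split a branch target into (block base address, container index)."""
    base = target & ~(BLOCK_BYTES - 1)   # round down to a 256-bit boundary
    slot = target & (BLOCK_BYTES - 1)    # assumed: low bits select the container
    return base, slot


def first_instruction(blocks, target):
    """Return the container a branch to `target` starts executing.

    `blocks` maps a block base address to its already-split list of
    containers (7 or 9 entries, depending on the block's header).
    """
    base, slot = locate_target(target)
    containers = blocks[base]
    if slot >= len(containers):
        raise ValueError("branch target names a container the block lacks")
    return containers[slot]
```

Note that the container index cannot be validated until the header has been read, which is part of what makes the scheme more involved than a plain byte-addressed fetch.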

John Savard

Re: "Mini" tags to reduce the number of op codes

<v038qo$bmtm$3@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38343&group=comp.arch#38343

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Sat, 20 Apr 2024 19:19:53 -0400
Organization: A noiseless patient Spider
Lines: 61
Message-ID: <v038qo$bmtm$3@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
<15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
<lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
<d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
<uv02dn$3b6ik$1@dont-email.me> <uv415n$ck2j$1@dont-email.me>
<uv46rg$e4nb$1@dont-email.me>
<a81256dbd4f121a9345b151b1280162f@www.novabbs.org>
<uv4ghh$gfsv$1@dont-email.me>
<8e61b7c856aff15374ab3cc55956be9d@www.novabbs.org>
<uv7h9k$1ek3q$1@dont-email.me> <7uSRN.161295$m4d.65414@fx43.iad>
<e4443c417f7145d65b04bec48160c629@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 21 Apr 2024 16:45:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5d52f8e0f0694c11b30894cb014da68f";
logging-data="383926"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19XRrEqjjwrZ/AdcLMHGust7qII3iW5Fbs="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:Q9F+O7czcn0qfW1fzQfFRWDqds4=
In-Reply-To: <e4443c417f7145d65b04bec48160c629@www.novabbs.org>
 by: Paul A. Clayton - Sat, 20 Apr 2024 23:19 UTC

On 4/11/24 7:12 PM, MitchAlsup1 wrote:
> Scott Lurndal wrote:
[snip]
>> It seems to me that an offloaded DMA engine would be a far
>> better way to do memmove (over some threshold, perhaps a
>> cache line) without trashing the caches.   Likewise memset.
>
> Effectively, that is what HW does, even on the lower end machines,
> the AGEN unit of the Cache access pipeline is repeatedly cycled,
> and data is read and/or written. One can execute instructions not
> needing memory references while LDM, STM, ENTER, EXIT, MM, and MS
> are in progress.
>
> Moving this sequencer farther out would still require it to consume
> all L1 BW in any event (snooping) for memory consistency reasons.
> {Note: cache accesses are performed line-wide not register width
> wide}

If the data was not in L1 cache, only its absence would need to be
determined by the DMA engine. A snoop filter, tag-inclusive L2/L3
probing, or similar mechanism could avoid L1 accesses. Even if the
source or destination for a memory copy was in L1, only one L1
access per cache line might be needed.
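
A toy model of the snoop-filter case, in Python, under stated assumptions: 64-byte lines, an exact (non-aliasing) filter, and invented names throughout. It counts how many L1 accesses a DMA-style copy would need if only lines known to be resident in L1 are probed, one probe per line.

```python
LINE_BYTES = 64  # assumed cache line size


class SnoopFilter:
    """Tracks which line numbers may currently reside in L1 (exact here)."""

    def __init__(self):
        self.l1_lines = set()

    def note_fill(self, addr):
        self.l1_lines.add(addr // LINE_BYTES)

    def note_evict(self, addr):
        self.l1_lines.discard(addr // LINE_BYTES)


def l1_probes_for_copy(filt, start, length):
    """L1 accesses needed for a copy touching [start, start + length)."""
    first = start // LINE_BYTES
    last = (start + length - 1) // LINE_BYTES
    return sum(1 for line in range(first, last + 1) if line in filt.l1_lines)
```

With an empty filter the count is zero: the engine learns of the data's absence from the filter alone and never consumes L1 bandwidth.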

I also wonder if the cache fill and/or spill mechanism might be
decoupled from the load/store such that if the cache had enough
banks/subarrays some loads and stores might be done in parallel
with a cache fill or spill/external-read-without-eviction. Tag
checking would limit the utility of such, though tags might also
be banked or access flexibly scheduled (at the cost of choosing a
victim early for fills). Of course, if the cache has such
available bandwidth, why not make it available to the core as well
even if it was rarely useful? (Perhaps higher register bandwidth
might be more difficult than higher cache bandwidth for banking-
friendly patterns?)

Deciding when to bypass cache seems difficult (for both software
developers and hardware). Overwriting cache lines within the same
memory copy is obviously silly. Filling a cache with a memory copy
is also suboptimal, but L1 hardware copy-on-write would probably
be too complicated even with page aligned copies. A copy from
cacheable memory to uncacheable memory (I/O) might be a strong
hint that the source should not be installed into L1 or L2 cache,
but I would guess that not installing the source would often be
the right choice.

I could also imagine a programmer wanting to use memory copy as a
prefetch *directive* for a large chunk of memory (by having source
and destination be the same). This idiom would be easy to detect
(from and to base registers being the same), but may be too niche
to be worth detecting (for most implementations).

(My 66000 might use an idiom with a prefetch instruction preceding
a memory move to indicate the cache level of the destination but
that only manages [some of] the difficulty of the hardware
choice.)

For memset, compression is also an obvious possibility. A memset
might not write any cache lines but rather cache the address range
and the set value and perform hardware copy on access into cache
lines.
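
As a software analogy for that idea, the following Python sketch records each memset as an (address range, value) entry and only materializes the bytes when a line is read. All names are hypothetical, the line size is assumed to be 64 bytes, and ordinary stores into a pending range (which would have to materialize or split the entry) are deliberately left out.

```python
LINE_BYTES = 64  # assumed cache line size


class LazyMemset:
    """Defers memset writes; lines are built on access from pending ranges."""

    def __init__(self, backing):
        self.backing = backing   # bytearray standing in for memory
        self.pending = []        # (start, end, value), end exclusive, oldest first

    def memset(self, start, length, value):
        # No line is written here; only the range and value are cached.
        self.pending.append((start, start + length, value))

    def read_line(self, line_addr):
        """Return one line with all pending memsets applied (copy on access)."""
        line = bytearray(self.backing[line_addr:line_addr + LINE_BYTES])
        for start, end, value in self.pending:
            lo = max(start, line_addr)
            hi = min(end, line_addr + LINE_BYTES)
            for addr in range(lo, hi):   # empty when the range misses this line
                line[addr - line_addr] = value
        return bytes(line)
```

The first and last lines of a range may be only partly covered, which corresponds to the partial-line case where the old contents still have to be fetched.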

Re: "Mini" tags to reduce the number of op codes

<v038qp$bmtm$4@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=38344&group=comp.arch#38344

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaronclayton@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Sat, 20 Apr 2024 20:02:07 -0400
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <v038qp$bmtm$4@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
<15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
<lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
<d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
<uv02dn$3b6ik$1@dont-email.me> <uv415n$ck2j$1@dont-email.me>
<uv46rg$e4nb$1@dont-email.me>
<a81256dbd4f121a9345b151b1280162f@www.novabbs.org>
<uv4ghh$gfsv$1@dont-email.me>
<8e61b7c856aff15374ab3cc55956be9d@www.novabbs.org>
<uv7h9k$1ek3q$1@dont-email.me> <7uSRN.161295$m4d.65414@fx43.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 21 Apr 2024 16:45:45 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5d52f8e0f0694c11b30894cb014da68f";
logging-data="383926"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19RNxz5Vult0ot+r8mq3TjBe7d4O8k5Vzw="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:MAiTSLRlAoOTOuAgBoq0Y/V09GA=
In-Reply-To: <7uSRN.161295$m4d.65414@fx43.iad>
 by: Paul A. Clayton - Sun, 21 Apr 2024 00:02 UTC

On 4/11/24 10:30 AM, Scott Lurndal wrote:
[snip]
>> On 4/9/24 8:28 PM, MitchAlsup1 wrote:
[snip]
>>> MMs and MSs that do not cross page boundaries are ATOMIC. The
>>> entire system
>>> sees only the before or only the after state and nothing in
>>> between.
>
> One might wonder how that atomicity is guaranteed in a
> SMP processor...

While Mitch Alsup's response ("The entire chunk of data traverses
the interconnect as a single transaction." — I am not certain how
that would work given reading up to a page and writing up to a
page) provides one mechanism and probably the best one,
theoretically the *data* does not need to be moved atomically but
only the "ownership" (the source does not have to be owned in the
traditional sense but needs to be marked as readable by the copier).
This is somewhat similar to My 66000's Exotic Synchronization
Mechanism in that once all the addresses involved are known (the
two ranges for memory copy), NAKs can be used for remote requests
for "owned" cache lines while the copy is made.

Only the visibility needs to be atomic.

Memory set provides optimization opportunities in that the source
is small. In theory, the set value could be sent to L3 with the
destination range and all monitoring could be done at L3 and
requested cache lines sent immediately from L3 (hardware copy on
access) — the first and last part of the range might be partial
cache lines requiring read-for-ownership.

For cache line aligned copies, a cache which used indirection
between tags and data might not even copy the data but only the
tag-related metadata. Some forms of cache compression might allow
partial cache lines to be cached such that even unaligned copies
might partially share data by having one tag indicate lossy
compression with an indication of where the stored data is not
valid, but that seems too funky to be practical.
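
The aligned case can be sketched in Python as a cache whose tags point at shared, reference-counted data slots. This is an illustration only: the class and method names are invented, writes replace a whole line, and nothing here models coherence or the compression variant.

```python
class IndirectCache:
    """Tags map to data slots; a line copy duplicates the tag, not the data."""

    def __init__(self):
        self.tags = {}       # line address -> data slot index
        self.data = {}       # data slot index -> line contents
        self.refs = {}       # data slot index -> number of tags sharing it
        self.next_slot = 0

    def _release(self, addr):
        slot = self.tags.pop(addr, None)
        if slot is None:
            return
        self.refs[slot] -= 1
        if self.refs[slot] == 0:     # last tag gone: free the data slot
            del self.refs[slot]
            del self.data[slot]

    def write(self, addr, payload):
        # Whole-line write: detach from any shared slot, then take a new one.
        self._release(addr)
        slot = self.next_slot
        self.next_slot += 1
        self.data[slot] = bytes(payload)
        self.refs[slot] = 1
        self.tags[addr] = slot

    def copy_line(self, src, dst):
        slot = self.tags[src]
        self.refs[slot] += 1         # count dst before releasing its old slot
        self._release(dst)
        self.tags[dst] = slot        # only tag metadata changes

    def read(self, addr):
        return self.data[self.tags[addr]]
```

After `copy_line`, source and destination share one data slot until either is written, at which point the writer gets its own slot and the other line is unaffected.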
