"Mini" tags to reduce the number of op codes

<uuk100$inj$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38211&group=comp.arch#38211

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: "Mini" tags to reduce the number of op codes
Date: Wed, 3 Apr 2024 09:43:44 -0700
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <uuk100$inj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 03 Apr 2024 16:43:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="76b2f13c17873d1fc6bc86107ab60e09";
logging-data="19187"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/UECkw+Y831HOaeBB5sUYMWBYnLPcayHo="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:bjcK9+8ysRcmKO8Yt6s+y20x/wI=
Content-Language: en-US
 by: Stephen Fuld - Wed, 3 Apr 2024 16:43 UTC

There has been discussion here about the benefits of reducing the number
of op codes. One reason not mentioned before is if you have fixed
length instructions, you may want to leave as many codes as possible
available for future use. Of course, if you are doing a 16-bit
instruction design, where instruction bits are especially tight, you may
save enough op-codes to save a bit, perhaps allowing a larger register
specifier field, or to allow more instructions in the smaller subset.

It is in this spirit that I had an idea, partially inspired by Mill’s
use of tags in registers, but not memory. I worked through this idea
using the My 66000 as an example “substrate” for two reasons. First, it
has several features that are “friendly” to the idea. Second, I know
Mitch cares about keeping the number of op codes low.

Please bear in mind that this is just the germ of an idea. It is
certainly not fully worked out. I present it here to stimulate
discussions, and because it has been fun to think about.

The idea is to add 32 bits to the processor state, one per register
(though probably not physically part of the register file) as a tag. If
set, the bit indicates that the corresponding register contains a
floating-point value. Clear indicates not floating point (integer,
address, etc.). There would be two additional instructions, load single
floating and load double floating, which work the same as the other 32-
and 64-bit loads, but in addition to loading the value, set the tag bit
for the destination register. Non-floating-point loads would clear the
tag bit. As I show below, I don’t think you need any special "store
tag" instructions.

When executing arithmetic instructions, if the tag bits of both sources
of an instruction are the same, do the appropriate operation (floating
or integer), and set the tag bit of the result register appropriately.
If the tag bits of the two sources are different, I see several
possibilities.

1. Generate an exception.
2. Use the sense of source 1 for the arithmetic operation, but perform
the appropriate conversion on the second operand first, potentially
saving an instruction.
3. Always do the operation in floating point and convert the integer
operand prior to the operation. (Or, if you prefer, change floating
point to integer in the above description.)
4. Same as 2 or 3 above, but don’t do the conversions.

I suspect option 4 is the least useful choice. I am not sure which is
the best option.
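
The tag scheme and the mixed-tag options can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not a description of any real My 66000 behavior: the class name, method names and the `policy` parameter are hypothetical, registers hold Python ints or floats instead of 64-bit patterns, and the float-to-integer conversion under option 2 simply truncates.

```python
from dataclasses import dataclass, field


class TagMismatch(Exception):
    """Raised under option 1 when the two source tags differ."""


@dataclass
class TaggedRegs:
    # Hypothetical model: 32 registers plus one tag bit per register.
    # tag[i] is True when register i holds a floating-point value.
    value: list = field(default_factory=lambda: [0] * 32)
    tag: list = field(default_factory=lambda: [False] * 32)

    def load_int(self, rd, x):
        self.value[rd] = int(x)
        self.tag[rd] = False  # non-floating-point loads clear the tag

    def load_fp(self, rd, x):
        self.value[rd] = float(x)
        self.tag[rd] = True   # floating-point loads set the tag

    def add(self, rd, rs1, rs2, policy=1):
        """One ADD op code; the source tags select integer or FP addition."""
        a, b = self.value[rs1], self.value[rs2]
        ta, tb = self.tag[rs1], self.tag[rs2]
        if ta != tb:
            if policy == 1:      # option 1: generate an exception
                raise TagMismatch(f"r{rs1} and r{rs2} have different tags")
            elif policy == 2:    # option 2: convert source 2 to source 1's sense
                b = float(b) if ta else int(b)  # int() truncates (assumption)
            elif policy == 3:    # option 3: always operate in floating point
                a, b = float(a), float(b)
                ta = True
            else:
                raise ValueError("unknown policy")
        self.value[rd] = a + b
        self.tag[rd] = ta        # the result tag follows the operation performed
```

With `r1 = 3` (integer) and `r2 = 1.5` (floating), option 2 yields the integer 4 and option 3 yields the floating-point value 4.5.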

Given that, use the same op code for the floating-point and fixed
versions of the same operations. This saves eight op codes: the four
arithmetic operations, max, min, abs and compare. Subtracting the two
new floating-point loads, that is a net savings of six op codes so far.

But we can go further. There are some opcodes that only make sense for
FP operands, e.g. the transcendental instructions. And there are some
operations that probably only make sense for non-FP operands, e.g. POP,
FF1, probably shifts. Given the tag bit, these could share the same
op-code. There may be several more of these.

I think this all works fine for a single compilation unit, as the
compiler certainly knows the type of the data. But what happens with
separate compilations? The called function probably doesn’t know the
tag value for callee saved registers. Fortunately, the My 66000
architecture comes to the rescue here. You would modify the Enter and
Exit instructions to save/restore the tag bits of the registers they are
saving or restoring in the same data structure they use for the registers
(yes, it adds 32 bits to that structure – minimal cost). The same
mechanism works for interrupts that take control away from a running
process.
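
A rough Python model of that save/restore, under assumptions: the function names, the dictionary used as the save area and the register-range arguments are all hypothetical, and nothing here reflects the actual ENTER/EXIT encoding. It only shows one 32-bit tag word travelling with the saved registers, with EXIT restoring tags only for the registers it restores.

```python
def pack_tags(tags):
    """Pack 32 boolean tags into one 32-bit word (bit i = tag of register i)."""
    word = 0
    for i, t in enumerate(tags):
        if t:
            word |= 1 << i
    return word


def enter(values, tags, first, last):
    """Save registers first..last plus their tag bits in one frame record."""
    mask = ((1 << (last - first + 1)) - 1) << first
    return {
        "first": first,
        "regs": list(values[first:last + 1]),
        "tag_word": pack_tags(tags) & mask,  # the 32 extra bits in the frame
    }


def exit_(values, tags, frame):
    """Restore the saved registers and only their tag bits."""
    first = frame["first"]
    for k, v in enumerate(frame["regs"]):
        r = first + k
        values[r] = v
        tags[r] = bool((frame["tag_word"] >> r) & 1)
```

The caller's tags for the saved range come back exactly as they were, whatever the callee did with those registers in between; tags outside the range are left alone.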

I don’t think you need to set or clear the tag bits without doing
anything else, but if you do, I think you could “repurpose” some other
instructions to do this, without requiring another op-code. For
example, ORing a register with itself could be used to set the tag bit
and ORing a register with zero could clear it. These should be pretty rare.

That is as far as I got. I think you could net save perhaps 8-12 op
codes, which is about 10% of the existing op codes - not bad. Is it
worth it? To me, a major question is the effect on performance. What
is the cost of having to decode the source registers and reading their
respective tag bits before knowing which FU to use? If it causes an
extra cycle per instruction, then it is almost certainly not worth it.
IANAHG, so I don’t know. But even if it doesn’t cost any performance, I
think the overall gains are pretty small, and probably not worth it
unless the op-code space is really tight (which, for My 66000 it isn’t).

Anyway, it has been fun thinking about this, so I hope you don’t mind
the, probably too long, post.
Any comments are welcome.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: "Mini" tags to reduce the number of op codes

<2024Apr3.192405@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=38212&group=comp.arch#38212

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 03 Apr 2024 17:24:05 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 111
Message-ID: <2024Apr3.192405@mips.complang.tuwien.ac.at>
References: <uuk100$inj$1@dont-email.me>
Injection-Date: Wed, 03 Apr 2024 17:56:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ba1b3917f92d144e6581c1d70f78ebc1";
logging-data="88138"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX190hhIb6DiFaAspc0XanRMi"
Cancel-Lock: sha1:nAGElC8ZrZ8qBH3rZxLUpS4fqg8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Wed, 3 Apr 2024 17:24 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
>The idea is to add 32 bits to the processor state, one per register
>(though probably not physically part of the register file) as a tag. If
>set, the bit indicates that the corresponding register contains a
>floating-point value. Clear indicates not floating point (integer,
>address, etc.). There would be two additional instructions, load single
>floating and load double floating, which work the same as the other 32-
>and 64-bit loads, but in addition to loading the value, set the tag bit
>for the destination register. Non-floating-point loads would clear the
>tag bit. As I show below, I don’t think you need any special "store
>tag" instructions.
....
>But we can go further. There are some opcodes that only make sense for
>FP operands, e.g. the transcendental instructions. And there are some
>operations that probably only make sense for non-FP operands, e.g. POP,
>FF1, probably shifts. Given the tag bit, these could share the same
>op-code. There may be several more of these.

Certainly makes reading disassembler output fun (or writing the
disassembler). This reminds me of the work on SafeTSA [amme+01] where
they encode only programs that are correct (according to some notion
of correctness).

>I think this all works fine for a single compilation unit, as the
>compiler certainly knows the type of the data. But what happens with
>separate compilations? The called function probably doesn’t know the
>tag value for callee saved registers. Fortunately, the My 66000
>architecture comes to the rescue here. You would modify the Enter and
>Exit instructions to save/restore the tag bits of the registers they are
>saving or restoring in the same data structure it uses for the registers
>(yes, it adds 32 bits to that structure – minimal cost).

That's expensive in an OoO CPU. There you want each tag to be stored
alongside the other 64 bits of the register, because they should
be renamed at the same time. So the ENTER instruction would depend on
all the registers that it saves (or maybe on all registers). And upon
EXIT the restored registers have to be reassembled (which is not that
expensive).

I have a similar problem for the carry and overflow bits in
<http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>, and chose to
let those bits not survive across calls; if there was a cheap solution
for the problem, it would eliminate this drawback of my idea.

>The same
>mechanism works for interrupts that take control away from a running
>process.

For context switches one cannot get around the problem, but they are
much rarer than calls and returns, so requiring a pipeline drain for
them is not so bad.

Concerning interrupts, as long as nesting is limited, one could just
treat the physical registers of the interrupted program as taken, and
execute the interrupt with the remaining physical registers. No need
to save any architectural registers or their tag, carry, or overflow
bits.

>That is as far as I got. I think you could net save perhaps 8-12 op
>codes, which is about 10% of the existing op codes - not bad. Is it
>worth it? To me, a major question is the effect on performance. What
>is the cost of having to decode the source registers and reading their
>respective tag bits before knowing which FU to use?

In an OoO CPU, that's pretty heavy.

But actually, your idea does not need any computation results for
determining the tag bits of registers (except during EXIT), so you
probably can handle the tags in the front end (decoder and renamer).
Then the tags are really separate and not part of the registers that
have to be renamed, and you don't need to perform any waiting on
ENTER.

However, in EXIT the front end would have to wait for the result of
the load/store unit loading the 32 bits, unless you add a special
mechanism for that. So EXIT would become expensive, one way or the
other.
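
As a sketch of the front-end idea, assuming straight-line code and ignoring calls, ENTER/EXIT and branches: the decoder keeps its own table of tag bits and turns each shared op code into a concrete operation before anything executes. The instruction tuples and the mnemonics `LDI`, `LDF`, `IADD` and `FADD` are made up for the illustration.

```python
def resolve_ops(program):
    """Resolve shared op codes at decode time using a front-end tag table.

    program is a list of (op, rd, rs1, rs2) tuples. No register values are
    consulted: the tags follow from the instruction stream alone.
    """
    tags = [False] * 32          # decoder-side copy of the 32 tag bits
    resolved = []
    for op, rd, rs1, rs2 in program:
        if op == "LDI":          # integer load clears the destination tag
            tags[rd] = False
            resolved.append("LDI")
        elif op == "LDF":        # floating-point load sets the destination tag
            tags[rd] = True
            resolved.append("LDF")
        elif op == "ADD":        # shared op code, selected by the source tags
            if tags[rs1] != tags[rs2]:
                raise ValueError("mixed tags: policy not modelled here")
            tags[rd] = tags[rs1]
            resolved.append("FADD" if tags[rd] else "IADD")
        else:
            raise ValueError(f"unknown op {op!r}")
    return resolved
```

Two `LDF`s followed by `ADD` resolve to `FADD`, and the same `ADD` after integer loads resolves to `IADD`, without waiting for any load data.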

@InProceedings{amme+01,
author = {Wolfram Amme and Niall Dalton and Jeffery von Ronne
and Michael Franz},
title = {Safe{TSA}: A Type Safe and Referentially Secure
Mobile-Code Representation Based on Static Single
Assignment Form},
crossref = {sigplan01},
pages = {137--147},
annote = {The basic ideas in this representation are:
variables are named as the pair (distance in the
dominator tree, assignment within basic block);
variables are separated by type, with operations
referring only to variables of the right type (like
integer and FP instructions and registers in
assemblers); memory references use types to encode
that a null-pointer check and/or a range check has
already occurred, allowing optimizing these
operations; the resulting code is encoded (using
text compression methods) in a way that supports
only correct code. These ideas are discussed mostly
in a general way, with some Java-specifics, but the
representation supposedly also supports Fortran95
and Ada95. The representation supports some CSE, but
not for address computation operations. The paper
also gives numbers on size (usually a little smaller
than Java bytecode), and some other static metrics,
especially wrt. the effect of optimizations.}
}

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: "Mini" tags to reduce the number of op codes

<YshPN.227779$hN14.133879@fx17.iad>

https://news.novabbs.org/devel/article-flat.php?id=38214&group=comp.arch#38214

Path: i2pn2.org!i2pn.org!nntp.comgw.net!peer03.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
References: <uuk100$inj$1@dont-email.me>
In-Reply-To: <uuk100$inj$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Lines: 110
Message-ID: <YshPN.227779$hN14.133879@fx17.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Wed, 03 Apr 2024 18:45:12 UTC
Date: Wed, 03 Apr 2024 14:44:27 -0400
X-Received-Bytes: 6732
 by: EricP - Wed, 3 Apr 2024 18:44 UTC

Stephen Fuld wrote:
> There has been discussion here about the benefits of reducing the number
> of op codes. One reason not mentioned before is if you have fixed
> length instructions, you may want to leave as many codes as possible
> available for future use. Of course, if you are doing a 16-bit
> instruction design, where instruction bits are especially tight, you may
> save enough op-codes to save a bit, perhaps allowing a larger register
> specifier field, or to allow more instructions in the smaller subset.
>
> It is in this spirit that I had an idea, partially inspired by Mill’s
> use of tags in registers, but not memory. I worked through this idea
> using the My 66000 as an example “substrate” for two reasons. First, it
> has several features that are “friendly” to the idea. Second, I know
> Mitch cares about keeping the number of op codes low.
>
> Please bear in mind that this is just the germ of an idea. It is
> certainly not fully worked out. I present it here to stimulate
> discussions, and because it has been fun to think about.
>
> The idea is to add 32 bits to the processor state, one per register
> (though probably not physically part of the register file) as a tag. If
> set, the bit indicates that the corresponding register contains a
> floating-point value. Clear indicates not floating point (integer,
> address, etc.). There would be two additional instructions, load single
> floating and load double floating, which work the same as the other 32-
> and 64-bit loads, but in addition to loading the value, set the tag bit
> for the destination register. Non-floating-point loads would clear the
> tag bit. As I show below, I don’t think you need any special "store
> tag" instructions.

If you are adding a float/int data type flag you might as well
also add operand size for floats at least, though some ISAs
have both int32 and int64 ALU operations for result compatibility.

> When executing arithmetic instructions, if the tag bits of both sources
> of an instruction are the same, do the appropriate operation (floating
> or integer), and set the tag bit of the result register appropriately.
> If the tag bits of the two sources are different, I see several
> possibilities.
>
> 1. Generate an exception.
> 2. Use the sense of source 1 for the arithmetic operation, but
> perform the appropriate conversion on the second operand first,
> potentially saving an instruction
> 3. Always do the operation in floating point and convert the integer
> operand prior to the operation. (Or, if you prefer, change floating
> point to integer in the above description.)
> 4. Same as 2 or 3 above, but don’t do the conversions.
>
> I suspect this is the least useful choice. I am not sure which is the
> best option.
>
> Given that, use the same op code for the floating-point and fixed
> versions of the same operations. So we can save eight op codes, the
> four arithmetic operations, max, min, abs and compare. So far, a net
> savings of six opcodes.
>
> But we can go further. There are some opcodes that only make sense for
> FP operands, e.g. the transcendental instructions. And there are some
> operations that probably only make sense for non-FP operands, e.g. POP,
> FF1, probably shifts. Given the tag bit, these could share the same
> op-code. There may be several more of these.
>
> I think this all works fine for a single compilation unit, as the
> compiler certainly knows the type of the data. But what happens with
> separate compilations? The called function probably doesn’t know the
> tag value for callee saved registers. Fortunately, the My 66000
> architecture comes to the rescue here. You would modify the Enter and
> Exit instructions to save/restore the tag bits of the registers they are
> saving or restoring in the same data structure it uses for the registers
> (yes, it adds 32 bits to that structure – minimal cost). The same
> mechanism works for interrupts that take control away from a running
> process.
>
> I don’t think you need to set or clear the tag bits without doing
> anything else, but if you do, I think you could “repurpose” some other
> instructions to do this, without requiring another op-code. For
> example, Oring a register with itself could be used to set the tag bit
> and Oring a register with zero could clear it. These should be pretty
> rare.
>
> That is as far as I got. I think you could net save perhaps 8-12 op
> codes, which is about 10% of the existing op codes - not bad. Is it
> worth it? To me, a major question is the effect on performance. What
> is the cost of having to decode the source registers and reading their
> respective tag bits before knowing which FU to use? If it causes an
> extra cycle per instruction, then it is almost certainly not worth it.
> IANAHG, so I don’t know. But even if it doesn’t cost any performance, I
> think the overall gains are pretty small, and probably not worth it
> unless the op-code space is really tight (which, for My 66000 it isn’t).
>
> Anyway, it has been fun thinking about this, so I hope you don’t mind
> the, probably too long, post.
> Any comments are welcome.

Currently the opcode data type can tell the uArch how to route
the operands internally without knowing the data values.
For example, FPU reservation stations monitor float operands
and schedule for just the FPU FADD or FMUL units.

Dynamic data typing would change that to be data dependent routing.
It means, for example, you can't begin to schedule a uOp
until you know all its operand types and opcode.

Looks like it makes such distributed decisions impossible.
Probably everything winds up in a big pile of logic in the center,
which might be problematic for those things whose complexity grows N^2.
Not sure how significant that is.

Re: "Mini" tags to reduce the number of op codes

<uukckh$4g83$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38215&group=comp.arch#38215

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 3 Apr 2024 20:02:25 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <uukckh$4g83$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
Injection-Date: Wed, 03 Apr 2024 20:02:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="444ecd996ba17c8a0c9281729e348bfe";
logging-data="147715"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+nQzhBXbLw+guAOthu+jmVzDzkiek+Mu4="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:KegH2QaKMMOMbTgA+6RrV5yAbsA=
 by: Thomas Koenig - Wed, 3 Apr 2024 20:02 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:

[saving opcodes]

> The idea is to add 32 bits to the processor state, one per register
> (though probably not physically part of the register file) as a tag. If
> set, the bit indicates that the corresponding register contains a
> floating-point value. Clear indicates not floating point (integer,
> address, etc.).

I don't think this would save a lot of opcode space, which
is the important thing.

A typical RISC design has a six-bit major opcode.
Having three registers takes away fifteen bits, leaving
eleven, which is far more than anybody would ever want as
minor opcode for arithmetic instructions. Compare with
https://en.wikipedia.org/wiki/DEC_Alpha#Instruction_formats
where DEC actually left out three bits because they did not
need them.

What is _really_ eating up opcode space are many- (usually 16-) bit
constants in the instructions.
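
The bit budget behind that argument, worked out in Python. The 32-bit instruction word and the 5-bit register specifiers are assumptions consistent with the figures above (three registers taking fifteen bits); the variable names are only for illustration.

```python
WORD_BITS = 32    # assumed fixed instruction width
MAJOR_BITS = 6    # major opcode field
REG_BITS = 5      # assumed register specifier width (32 registers)

# Three-register format: whatever is left over is the minor opcode.
minor_bits = WORD_BITS - MAJOR_BITS - 3 * REG_BITS
minor_codes = 1 << minor_bits

# Two-register format with an immediate: the constant takes everything left,
# so there is no room for a minor opcode at all.
imm_bits = WORD_BITS - MAJOR_BITS - 2 * REG_BITS

print(minor_bits, minor_codes, imm_bits)
```

Under these assumptions one major opcode holds 2048 three-register operations, while each instruction with a 16-bit constant uses up one of the 64 major opcodes by itself.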

Re: "Mini" tags to reduce the number of op codes

<uukduu$4o4p$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38217&group=comp.arch#38217

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 3 Apr 2024 15:25:01 -0500
Organization: A noiseless patient Spider
Lines: 227
Message-ID: <uukduu$4o4p$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 03 Apr 2024 20:25:03 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a9947662a71fe9f5c84a2ff97ba1cb43";
logging-data="155801"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Nnp3PF27BmjjanOYMINxmbotMiOlGTh8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:hVnrD1y7jZYnDkDNVjODSD/S5A8=
Content-Language: en-US
In-Reply-To: <uuk100$inj$1@dont-email.me>
 by: BGB-Alt - Wed, 3 Apr 2024 20:25 UTC

On 4/3/2024 11:43 AM, Stephen Fuld wrote:
> There has been discussion here about the benefits of reducing the number
> of op codes.  One reason not mentioned before is if you have fixed
> length instructions, you may want to leave as many codes as possible
> available for future use.  Of course, if you are doing a 16-bit
> instruction design, where instruction bits are especially tight, you may
> save enough op-codes to save a bit, perhaps allowing a larger register
> specifier field, or to allow more instructions in the smaller subset.
>
> It is in this spirit that I had an idea, partially inspired by Mill’s
> use of tags in registers, but not memory.  I worked through this idea
> using the My 66000 as an example “substrate” for two reasons.  First, it
> has several features that are “friendly” to the idea.  Second, I know
> Mitch cares about keeping the number of op codes low.
>
> Please bear in mind that this is just the germ of an idea.  It is
> certainly not fully worked out.  I present it here to stimulate
> discussions, and because it has been fun to think about.
>
> The idea is to add 32 bits to the processor state, one per register
> (though probably not physically part of the register file) as a tag.  If
> set, the bit indicates that the corresponding register contains a
> floating-point value.  Clear indicates not floating point (integer,
> address, etc.).  There would be two additional instructions, load single
> floating and load double floating, which work the same as the other 32-
> and 64-bit loads, but in addition to loading the value, set the tag bit
> for the destination register.  Non-floating-point loads would clear the
> tag bit.  As I show below, I don’t think you need any special "store
> tag" instructions.
>
> When executing arithmetic instructions, if the tag bits of both sources
> of an instruction are the same, do the appropriate operation (floating
> or integer), and set the tag bit of the result register appropriately.
> If the tag bits of the two sources are different, I see several
> possibilities.
>
> 1.    Generate an exception.
> 2.    Use the sense of source 1 for the arithmetic operation, but
> perform the appropriate conversion on the second operand first,
> potentially saving an instruction
> 3.    Always do the operation in floating point and convert the integer
> operand prior to the operation.  (Or, if you prefer, change floating
> point to integer in the above description.)
> 4.    Same as 2 or 3 above, but don’t do the conversions.
>
> I suspect this is the least useful choice.  I am not sure which is the
> best option.
>
> Given that, use the same op code for the floating-point and fixed
> versions of the same operations.  So we can save eight op codes, the
> four arithmetic operations, max, min, abs and compare.  So far, a net
> savings of six opcodes.
>
> But we can go further.  There are some opcodes that only make sense for
> FP operands, e.g. the transcendental instructions.  And there are some
> operations that probably only make sense for non-FP operands, e.g. POP,
> FF1, probably shifts.  Given the tag bit, these could share the same
> op-code.  There may be several more of these.
>
> I think this all works fine for a single compilation unit, as the
> compiler certainly knows the type of the data.  But what happens with
> separate compilations?  The called function probably doesn’t know the
> tag value for callee saved registers.  Fortunately, the My 66000
> architecture comes to the rescue here.  You would modify the Enter and
> Exit instructions to save/restore the tag bits of the registers they are
> saving or restoring in the same data structure it uses for the registers
> (yes, it adds 32 bits to that structure – minimal cost).  The same
> mechanism works for interrupts that take control away from a running
> process.
>
> I don’t think you need to set or clear the tag bits without doing
> anything else, but if you do, I think you could “repurpose” some other
> instructions to do this, without requiring another op-code.   For
> example, Oring a register with itself could be used to set the tag bit
> and Oring a register with zero could clear it.  These should be pretty
> rare.
>
> That is as far as I got.  I think you could net save perhaps 8-12 op
> codes, which is about 10% of the existing op codes - not bad.  Is it
> worth it?  To me, a major question is the effect on performance.  What
> is the cost of having to decode the source registers and reading their
> respective tag bits before knowing which FU to use?  If it causes an
> extra cycle per instruction, then it is almost certainly not worth it.
> IANAHG, so I don’t know.  But even if it doesn’t cost any performance, I
> think the overall gains are pretty small, and probably not worth it
> unless the op-code space is really tight (which, for My 66000 it isn’t).
>
> Anyway, it has been fun thinking about this, so I hope you don’t mind
> the, probably too long, post.
> Any comments are welcome.
>
>

FWIW:
This doesn't seem too far off from what would be involved with dynamic
typing at the ISA level, but with many of the same sorts of drawbacks...

Say, for example, top 2 bits of a register:
   00: Object Reference
     Next 2 bits:
       00: Pointer (with type-tag)
       01: ?
       1z: Bounded Array
   01: Fixnum (route to ALU)
   10: Flonum (route to FPU)
   11: Other types
     00: Smaller value types
       Say: int/uint, short/ushort, ...
     ...
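As a rough illustration of the layout above, here is a minimal Python sketch that classifies a 64-bit register value by its top bits. The function name `classify` and the returned labels are made up for this example, and it assumes the tag occupies the most significant bits of a 64-bit register, as the listing suggests.

```python
def classify(reg):
    """Return a type label for a 64-bit register value (hypothetical layout)."""
    reg &= (1 << 64) - 1          # keep only 64 bits
    top = reg >> 62               # top 2 bits select the major type
    if top == 0b01:
        return "fixnum"           # would be routed to the ALU
    if top == 0b10:
        return "flonum"           # would be routed to the FPU
    if top == 0b11:
        return "other"
    # top == 0b00: object reference, next 2 bits select the subtype
    sub = (reg >> 60) & 0b11
    if sub == 0b00:
        return "pointer"
    if sub == 0b01:
        return "unassigned"       # the "01: ?" entry
    return "bounded-array"        # the "1z" entries (10 and 11)
```

For example, `classify((0b01 << 62) | 5)` yields `"fixnum"`. In hardware this check would have to happen before the operation can be steered to a functional unit, which is the timing problem discussed next.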

One issue:
Decoding based on register tags would mean needing to know the register
tag bits at the same time the instruction is being decoded. In this
case, one is likely to need two clock-cycles to fully decode the opcode.

ID1: Unpack instruction to figure out register fields, etc.
ID2: Fetch registers, specialize variable instructions based on tag bits.

For timing though, one ideally doesn't want to do anything with the
register values until the EX stages (since ID2 might already be tied up
with the comparatively expensive register-forwarding logic), but asking for
3 cycles for decode is a bit much.

Otherwise, if one does not know which FU should handle the operation
until EX1, this has its own issues. Or, possibly, the FUs decide
whether to accept the operation:
   ALU: Accepts operation if both are fixnum, FPU if both are Flonum.

But, a proper dynamic language allows mixing fixnum and flonum with the
result being implicitly converted to flonum, but from the FPU's POV,
this would effectively require two chained FADD operations (one for the
Fixnum to Flonum conversion, one for the FADD itself).
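The mixed-type case can be sketched in software. The following Python model is only an illustration: values are `(tag, payload)` tuples, and the operation names `ALU.ADD`, `FPU.CVT` and `FPU.FADD` are hypothetical labels used to count how many functional-unit steps each case needs.

```python
def tagged_add(a, b):
    """Add two tagged values; return (result, list of FU steps used)."""
    (tag_a, val_a), (tag_b, val_b) = a, b
    for tag in (tag_a, tag_b):
        if tag not in ("fixnum", "flonum"):
            raise ValueError("unsupported tag: " + tag)
    ops = []
    if tag_a == tag_b == "fixnum":
        ops.append("ALU.ADD")              # both integer: one ALU step
        return ("fixnum", val_a + val_b), ops
    if tag_a == "fixnum":
        val_a = float(val_a)               # implicit fixnum -> flonum
        ops.append("FPU.CVT")
    if tag_b == "fixnum":
        val_b = float(val_b)
        ops.append("FPU.CVT")
    ops.append("FPU.FADD")                 # the add itself
    return ("flonum", val_a + val_b), ops
```

With a fixnum and a flonum operand the step list comes back as `["FPU.CVT", "FPU.FADD"]`, that is, two dependent FPU operations for what the program wrote as a single add.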

Many other cases could get hairy, but to have any real benefit, the CPU
would need to be able to deal with them. In cases where the compiler
deals with everything, the type-tags become mostly moot (or potentially
detrimental).

But, then, there is another issue:
   C code expects C type semantics to be respected, say:
     Signed int overflow wraps at 32 bits (sign extending);
     Unsigned int overflow wraps at 32 bits (zero extending);
     Variables may not hold values out-of-range for that type;
     The 'long long' and 'unsigned long long' types are exactly 64-bit;
       ...
     ...
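The wrapping behavior in the first two items can be written down as a small Python sketch. The helper names `wrap_u32` and `wrap_s32` are made up here. Note that this models what much existing code expects from the hardware; the C standard itself only defines wraparound for unsigned types, and signed overflow is undefined behavior.

```python
def wrap_u32(x):
    """Keep the low 32 bits (zero-extended result)."""
    return x & 0xFFFFFFFF

def wrap_s32(x):
    """Keep the low 32 bits, then sign-extend from bit 31."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x
```

So `wrap_s32(2**31 - 1 + 1)` gives `-2**31`, and `wrap_u32(0xFFFFFFFF + 1)` gives `0`. A tagged register scheme would have to reproduce these results for every 32-bit operation.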

If one has tagged 64-bit registers, then fixnum might not hold the
entire range of 'long long'. If one has 66 or 68 bit registers, then
memory storage is a problem.
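To put numbers on the range problem, here is a small Python check. It assumes 2 of the 64 bits are used as the tag, leaving a 62-bit two's-complement payload; `FIXNUM_BITS` and `fits_fixnum` are names invented for this sketch.

```python
FIXNUM_BITS = 62                       # 64-bit register minus a 2-bit tag (assumed)
FIXNUM_MIN = -(1 << (FIXNUM_BITS - 1))
FIXNUM_MAX = (1 << (FIXNUM_BITS - 1)) - 1

def fits_fixnum(n):
    """True if n is representable in the tagged 62-bit payload."""
    return FIXNUM_MIN <= n <= FIXNUM_MAX

LLONG_MIN = -(1 << 63)                 # limits of a 64-bit 'long long'
LLONG_MAX = (1 << 63) - 1
```

Under these assumptions neither `LLONG_MIN` nor `LLONG_MAX` fits, so three quarters of the 64-bit range would need some other representation.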

If one has untagged registers for cases where they are needed, one has
not saved any encoding space.

And, if one type-tags statically-typed variables, there is no real
"value-added" here (and saving a little encoding space at the cost of
making the rest of the CPU more complicated and expensive isn't much of
a win).

Better as I see it, to leave the CPU itself mostly working with raw
untagged values.

It can make sense to have helper-ops for type-tags, but these don't save
any encoding space; rather, they make the cases that deal with type-tagged
data a little faster.

Say:
   Sign-extending a fixnum to 64 bits;
   Setting the tag bits for a fixnum;
   Doing the twiddling to convert between Flonum and Double;
   Setting the tag for various bit patterns;
   Checking the tag(s) against various bit patterns;
   ...

Where, on a more traditional ISA, the logic to do the bit-twiddling for
type-checking and tag modification is a significant part of the runtime
cost of a dynamically typed language.
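A few of these helper operations can be sketched in Python to show the kind of twiddling involved. Everything here is an assumption for illustration: a 2-bit tag in the top bits (01 for fixnum, 10 for flonum), a 62-bit payload, and a flonum made by simply dropping the two low mantissa bits of a double. The function names are invented.

```python
import struct

PAYLOAD_MASK = (1 << 62) - 1

def box_fixnum(n):
    """Set the fixnum tag (01) over a 62-bit two's-complement payload."""
    return (0b01 << 62) | (n & PAYLOAD_MASK)

def unbox_fixnum(reg):
    """Strip the tag and sign-extend the 62-bit payload."""
    payload = reg & PAYLOAD_MASK
    return payload - (1 << 62) if payload >> 61 else payload

def double_to_flonum(x):
    """Tag a double as flonum (10), truncating the low 2 mantissa bits."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (0b10 << 62) | (bits >> 2)

def flonum_to_double(reg):
    """Recover a double from a flonum; the 2 dropped bits come back as zero."""
    bits = (reg & PAYLOAD_MASK) << 2
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Values such as 1.5 survive the round trip exactly, while 0.1 comes back slightly changed because its low mantissa bits are not zero. Each of these functions is a handful of shifts and masks, which is the per-operation overhead a software runtime pays on a conventional ISA.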

With luck, one can have dynamic typing that isn't horribly slow.
But, one still isn't likely to see serious use of dynamic typing in
systems-level programming (if anything, Haskell style type-systems seem
to be more in fashion in this space at present, where trying to get the
code to be accepted by the compiler is itself an exercise in pain).


Re: "Mini" tags to reduce the number of op codes

<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=38218&group=comp.arch#38218

Date: Wed, 3 Apr 2024 21:30:02 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$U9Aw.ScwyU3qK7Y8uA9Gn.YS9YN/HLsFskajZYm.sjInYmxsa0.Jy
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
 by: MitchAlsup1 - Wed, 3 Apr 2024 21:30 UTC

BGB-Alt wrote:

> On 4/3/2024 11:43 AM, Stephen Fuld wrote:
>> There has been discussion here about the benefits of reducing the number
>> of op codes.  One reason not mentioned before is if you have fixed
>> length instructions, you may want to leave as many codes as possible
>> available for future use.  Of course, if you are doing a 16-bit
>> instruction design, where instruction bits are especially tight, you may
>> save enough op-codes to save a bit, perhaps allowing a larger register
>> specifier field, or to allow more instructions in the smaller subset.
>>
>> It is in this spirit that I had an idea, partially inspired by Mill’s
>> use of tags in registers, but not memory.  I worked through this idea
>> using the My 6600 as an example “substrate” for two reasons.  First, it
66000
>> has several features that are “friendly” to the idea.  Second, I know
>> Mitch cares about keeping the number of op codes low.
>>
>> Please bear in mind that this is just the germ of an idea.  It is
>> certainly not fully worked out.  I present it here to stimulate
>> discussions, and because it has been fun to think about.
>>
>> The idea is to add 32 bits to the processor state, one per register
>> (though probably not physically part of the register file) as a tag.  If
>> set, the bit indicates that the corresponding register contains a
>> floating-point value.  Clear indicates not floating point (integer,
>> address, etc.).  There would be two additional instructions, load single
>> floating and load double floating, which work the same as the other 32-
>> and 64-bit loads, but in addition to loading the value, set the tag bit
>> for the destination register.  Non-floating-point loads would clear the
>> tag bit.  As I show below, I don’t think you need any special "store
>> tag" instructions.

What do you do when you want an FP bit pattern interpreted as an integer,
or vice versa?
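For readers unfamiliar with why one would want this, here is a small Python sketch of reinterpreting the same 64 bits either way. The helper names `double_bits` and `bits_double` are made up; the example clears the sign bit with an integer AND to get an absolute value, a common use of integer operations on floating-point data.

```python
import struct

def double_bits(x):
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_double(b):
    """Interpret a 64-bit unsigned int as an IEEE 754 binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def fabs_via_bits(x):
    """Absolute value computed with an integer mask on the sign bit."""
    return bits_double(double_bits(x) & ~(1 << 63))
```

With a per-register type tag, the integer AND above would see a register tagged as floating point, so the scheme needs some way to override or change the tag without changing the bits.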
>> When executing arithmetic instructions, if the tag bits of both sources
>> of an instruction are the same, do the appropriate operation (floating
>> or integer), and set the tag bit of the result register appropriately.
>> If the tag bits of the two sources are different, I see several
>> possibilities.
>>
>> 1.    Generate an exception.
>> 2.    Use the sense of source 1 for the arithmetic operation, but
>> perform the appropriate conversion on the second operand first,
>> potentially saving an instruction

Conversions to/from FP often require a rounding mode. How do you specify that?
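A concrete case of why the rounding mode matters: integers above 2**53 are not all representable as doubles, so the conversion has to pick a neighbor. The Python sketch below is illustrative only; `int_to_double` and its mode names are invented, and it relies on `math.nextafter` (Python 3.9 or later).

```python
import math

def int_to_double(n, mode="nearest"):
    """Convert int n to a double under a chosen rounding direction."""
    nearest = float(n)        # CPython rounds to nearest, ties to even
    exact = int(nearest)      # a finite double converts back to int exactly
    if mode == "nearest" or exact == n:
        return nearest
    if mode == "up":          # toward +infinity
        return nearest if exact > n else math.nextafter(nearest, math.inf)
    if mode == "down":        # toward -infinity
        return nearest if exact < n else math.nextafter(nearest, -math.inf)
    raise ValueError("unknown rounding mode: " + mode)
```

For `n = 2**53 + 1` the "nearest" and "down" modes give 9007199254740992.0 while "up" gives 9007199254740994.0. An implicit conversion triggered by a tag mismatch has no instruction field in which to say which of these is wanted.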

>> 3.    Always do the operation in floating point and convert the integer
>> operand prior to the operation.  (Or, if you prefer, change floating
>> point to integer in the above description.)
>> 4.    Same as 2 or 3 above, but don’t do the conversions.
>>
>> I suspect this is the least useful choice.  I am not sure which is the
>> best option.
>>
>> Given that, use the same op code for the floating-point and fixed
>> versions of the same operations.  So we can save eight op codes, the
>> four arithmetic operations, max, min, abs and compare.  So far, a net
>> savings of six opcodes.
>>
>> But we can go further.  There are some opcodes that only make sense for
>> FP operands, e.g. the transcendental instructions.  And there are some
>> operations that probably only make sense for non-FP operands, e.g. POP,
>> FF1, probably shifts.  Given the tag bit, these could share the same
>> op-code.  There may be several more of these.

Hands waving:: "Danger Will Robinson, Danger" more waving of hands.

>> I think this all works fine for a single compilation unit, as the
>> compiler certainly knows the type of the data.  But what happens with
>> separate compilations?  The called function probably doesn’t know the

The compiler will certainly have a function prototype. In any event, if FP
and integers share a register file, the lack of a prototype is much less
stressful to the compiler/linking system.

>> tag value for callee saved registers.  Fortunately, the My 66000
>> architecture comes to the rescue here.  You would modify the Enter and
>> Exit instructions to save/restore the tag bits of the registers they are
>> saving or restoring in the same data structure it uses for the registers
>> (yes, it adds 32 bits to that structure – minimal cost).  The same
>> mechanism works for interrupts that take control away from a running
>> process.

Yes, but we do just fine without the tag and without the stuff mentioned
above. Neither ENTER nor EXIT care about the 64-bit pattern in the register.

>> I don’t think you need to set or clear the tag bits without doing
>> anything else, but if you do, I think you could “repurpose” some other
>> instructions to do this, without requiring another op-code.   For
>> example, Oring a register with itself could be used to set the tag bit
>> and Oring a register with zero could clear it.  These should be pretty
>> rare.

>> That is as far as I got.  I think you could net save perhaps 8-12 op
>> codes, which is about 10% of the existing op codes - not bad.  Is it
>> worth it? 

No.

>> To me, a major question is the effect on performance.  What
>> is the cost of having to decode the source registers and reading their
>> respective tag bits before knowing which FU to use? 

The problem is you have made decode dependent on dynamic pipeline information.
I suggest you don't want to do that. Consider a change from int to FP instruction
as a predicated instruction, so the pipeline cannot DECODE the instruction at
hand until the predicate resolves. Yech.

>> If it causes an
>> extra cycle per instruction, then it is almost certainly not worth it.
>> IANAHG, so I don’t know.  But even if it doesn’t cost any performance, I
>> think the overall gains are pretty small, and probably not worth it
>> unless the op-code space is really tight (which, for My 66000 it isn’t).
>>
>> Anyway, it has been fun thinking about this, so I hope you don’t mind
>> the, probably too long, post.
>> Any comments are welcome.

It is actually an interesting idea if you want to limit your architecture
to 1-wide.

> FWIW:
> This doesn't seem too far off from what would be involved with dynamic
> typing at the ISA level, but with many of same sorts of drawbacks...

> Say, for example, top 2 bits of a register:
>    00: Object Reference
>      Next 2 bits:
>        00: Pointer (with type-tag)
>        01: ?
>        1z: Bounded Array
>    01: Fixnum (route to ALU)
>    10: Flonum (route to FPU)
>    11: Other types
>      00: Smaller value types
>        Say: int/uint, short/ushort, ...
>      ...

> One issue:
> Decoding based on register tags would mean needing to know the register
> tag bits at the same time the instruction is being decoded. In this
> case, one is likely to need two clock-cycles to fully decode the opcode.

More importantly, you added a cycle AFTER register READ/Forward before
you can start executing (more when OoO is in use).

And finally, the compiler KNOWS what the type is at compile time.

Re: "Mini" tags to reduce the number of op codes

<e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=38220&group=comp.arch#38220

Date: Wed, 3 Apr 2024 21:53:26 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$esq/jKpdUDnmSE7kqeCOguzQBw5twAVyRYsuHRyPrq1LcvOuVSnN6
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>
 by: MitchAlsup1 - Wed, 3 Apr 2024 21:53 UTC

BGB-Alt wrote:

>
> FWIW:
> This doesn't seem too far off from what would be involved with dynamic
> typing at the ISA level, but with many of same sorts of drawbacks...

> Say, for example, top 2 bits of a register:
>    00: Object Reference
>      Next 2 bits:
>        00: Pointer (with type-tag)
>        01: ?
>        1z: Bounded Array
>    01: Fixnum (route to ALU)
>    10: Flonum (route to FPU)
>    11: Other types
>      00: Smaller value types
>        Say: int/uint, short/ushort, ...
>      ...

So, you either have 66-bit registers, or you have 62-bit FP numbers ?!?
This solves nobody's problems; not even LISP.

> One issue:
> Decoding based on register tags would mean needing to know the register
> tag bits at the same time the instruction is being decoded. In this
> case, one is likely to need two clock-cycles to fully decode the opcode.

Not good. But what if you don't know the tag until the register is delivered
from a latent FU? Do you stall DECODE, or do you launch and make the instruction
queue element deal with all outcomes?

> ID1: Unpack instruction to figure out register fields, etc.
> ID2: Fetch registers, specialize variable instructions based on tag bits.

> For timing though, one ideally doesn't want to do anything with the
> register values until the EX stages (since ID2 might already be tied up
> with the comparably expensive register-forwarding logic), but asking for
> 3 cycles for decode is a bit much.

> Otherwise, if one does not know which FU should handle the operation
> until EX1, this has its own issues.

Real-friggen-ely

> Or, possible, the FU's decide
> whether to accept the operation:
> ALU: Accepts operation if both are fixnum, FPU if both are Flonum.

What if IMUL is performed in FMAC, IDIV in FDIV,... Int<->FP routing is
based on calculation capability. (Even the CDC 6600 performed int × in the
FP × unit; this is not in Thornton's book, but comes from a conversation with
a 6600 logic designer at Asilomar some time ago. All they had to do to get
FP × to perform int × was disable 1 gate.)

> But, a proper dynamic language allows mixing fixnum and flonum with the
> result being implicitly converted to flonum, but from the FPU's POV,
> this would effectively require two chained FADD operations (one for the
> Fixnum to Flonum conversion, one for the FADD itself).

That is a LANGUAGE problem not an ISA problem. SNOBOL allowed one to add
a string to an integer and the string would be converted to int before.....

> Many other cases could get hairy, but to have any real benefit, the CPU
> would need to be able to deal with them. In cases where the compiler
> deals with everything, the type-tags become mostly moot (or potentially
> detrimental).

You are arguing that the added complexity would somehow pay for itself.
I can't see it paying for itself.

> But, then, there is another issue:
> C code expects C type semantics to be respected, say:
> Signed int overflow wraps at 32 bits (sign extending);
maybe
> Unsigned int overflow wraps at 32 bits (zero extending);
maybe
> Variables may not hold values out-of-range for that type;
LLVM does this; GCC does not.
> The 'long long' and 'unsigned long long' types are exactly 64-bit;
At least 64-bit, not exactly.
> ...
> ...

> If one has tagged 64-bit registers, then fixnum might not hold the
> entire range of 'long long'. If one has 66 or 68 bit registers, then
> memory storage is a problem.

Ya think ?

> If one has untagged registers for cases where they are needed, one has
> not saved any encoding space.

I give up--not worth trying to teach a cosmologist why the color of the
lipstick going on the pig is not the problem.....

Re: "Mini" tags to reduce the number of op codes

<ivlPN.604513$PuZ9.269498@fx11.iad>

https://news.novabbs.org/devel/article-flat.php?id=38221&group=comp.arch#38221

Path: i2pn2.org!i2pn.org!nntp.comgw.net!peer02.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: "Mini" tags to reduce the number of op codes
Newsgroups: comp.arch
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me> <e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>
Lines: 42
Message-ID: <ivlPN.604513$PuZ9.269498@fx11.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Wed, 03 Apr 2024 23:20:46 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Wed, 03 Apr 2024 23:20:46 GMT
X-Received-Bytes: 2591
 by: Scott Lurndal - Wed, 3 Apr 2024 23:20 UTC

mitchalsup@aol.com (MitchAlsup1) writes:
>BGB-Alt wrote:
>
>>

>> But, a proper dynamic language allows mixing fixnum and flonum with the
>> result being implicitly converted to flonum, but from the FPU's POV,
>> this would effectively require two chained FADD operations (one for the
>> Fixnum to Flonum conversion, one for the FADD itself).
>
>That is a LANGUAGE problem not an ISA problem. SNOBOL allowed one to add
>a string to an integer and the string would be converted to int before.....

The Burroughs B3500 would simply ignore the zone digit when adding
a string to an integer, based on the address controller for the
operand.

ADD 1225 010000(UN) 020000(UA) 030000(UN)

Would add the 12 unsigned numeric nibbles at address 10000
to the 25 numeric digits of the 8-bit EBCDIC/ASCII data at address 20000
and store the result as 25 numeric nibbles at address 30000.

ADD 0507 010000(UN) 020000(UN) 030000(UA)

Would add the 5 unsigned numeric nibbles at 10000 to
the 7 unsigned numeric nibbles at 20000 and store them
as 8-bit EBCDIC bytes at 30000 (inserting the zone digit @F@
before each numeric nibble). A processor mode toggle selected
whether the inserted zone digit should be @F@ (EBCDIC) or @3@ (ASCII).
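The two ADD examples above can be modeled loosely in Python. This is a sketch under stated assumptions, not a description of the actual B3500 hardware: a UN field is represented as a list of 4-bit digits, a UA field as a `bytes` object with one digit per byte, the result takes the length of the longer source field, and any carry out of the top digit is simply dropped. The names `digits_of` and `add_fields` are invented.

```python
def digits_of(field):
    """Read decimal digits from a UN (nibble list) or UA (bytes) field.

    Only the low 4 bits of each element are used, so the zone digit
    of a UA byte is ignored.
    """
    return [v & 0x0F for v in field]

def add_fields(a, b, dest_ctl, zone=0xF):
    """Add two decimal fields and store the sum as UN digits or UA bytes."""
    da, db = digits_of(a), digits_of(b)
    width = max(len(da), len(db))             # assumed result length
    total = (int("".join(str(d) for d in da))
             + int("".join(str(d) for d in db)))
    text = str(total).zfill(width)[-width:]   # assumed: overflow digit dropped
    digits = [int(c) for c in text]
    if dest_ctl == "UN":
        return digits
    if dest_ctl == "UA":                      # insert the zone digit per byte
        return bytes((zone << 4) | d for d in digits)
    raise ValueError("unknown address controller: " + dest_ctl)
```

For instance, adding the 5-digit field 12345 to the 7-digit field 0000007 with a UA destination gives the bytes F0 F0 F1 F2 F3 F5 F2 with the EBCDIC zone, or the ASCII string "0012352" when `zone=0x3`.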

Likewise for SUB, INC, DEC, MPY, DIV and data movement instructions.

The data movement instructions would left- or right-align the destination
field (MVN (move numeric) would right justify and MVA (move alphanumeric) would
left justify) when the destination and source field lengths differ.

Floating point was BCD with an exponent sign digit, two exponent digits,
a mantissa sign digit and a variable length mantissa of up
to 100 digits in length. The integer instructions could be used
on either the mantissa or exponent individually, as they were
just fields in memory.

Re: "Mini" tags to reduce the number of op codes

<uuks6s$7p08$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38222&group=comp.arch#38222

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 3 Apr 2024 19:27:59 -0500
Organization: A noiseless patient Spider
Lines: 217
Message-ID: <uuks6s$7p08$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
<e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 04 Apr 2024 00:28:13 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7a85b7280e08e1d7944c412aa4f1d5d9";
logging-data="254984"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/uAVUDSi6Y3+T5xBWHhAz+9+BnqJQEpZ8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:rRpZoXPUj62iOM64UWvqwo+7guo=
In-Reply-To: <e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>
Content-Language: en-US
 by: BGB - Thu, 4 Apr 2024 00:27 UTC

On 4/3/2024 4:53 PM, MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>>
>> FWIW:
>> This doesn't seem too far off from what would be involved with dynamic
>> typing at the ISA level, but with many of same sorts of drawbacks...
>
>
>
>> Say, for example, top 2 bits of a register:
>>    00: Object Reference
>>      Next 2 bits:
>>        00: Pointer (with type-tag)
>>        01: ?
>>        1z: Bounded Array
>>    01: Fixnum (route to ALU)
>>    10: Flonum (route to FPU)
>>    11: Other types
>>      00: Smaller value types
>>        Say: int/uint, short/ushort, ...
>>      ...
>
> So, you either have 66-bit registers, or you have 62-bit FP numbers ?!?
> This solves nobody's problems; not even LISP.
>

Yeah, there is likely no way to make this worthwhile...

>> One issue:
>> Decoding based on register tags would mean needing to know the
>> register tag bits at the same time the instruction is being decoded.
>> In this case, one is likely to need two clock-cycles to fully decode
>> the opcode.
>
> Not good. But what if you don't know the tag until the register is
> delivered from a latent FU, do you stall DECODE, or do you launch and
> make the instruction
> queue element have to deal with all outcomes.
>

It is likely that the pipeline would need to stall until results are
available.

It is also likely that such a CPU would have a minimum effective latency
of 2 or 3 clock cycles for *every* instruction (and probably 4 or 5
cycles for memory load), in addition to requiring pipeline stalls.

>> ID1: Unpack instruction to figure out register fields, etc.
>> ID2: Fetch registers, specialize variable instructions based on tag bits.
>
>> For timing though, one ideally doesn't want to do anything with the
>> register values until the EX stages (since ID2 might already be tied
>> up with the comparably expensive register-forwarding logic), but
>> asking for 3 cycles for decode is a bit much.
>
>> Otherwise, if one does not know which FU should handle the operation
>> until EX1, this has its own issues.
>
> Real-friggen-ely
>

These issues could be a deal-breaker for such a CPU.

>>                                     Or, possible, the FU's decide
>> whether to accept the operation:
>>    ALU: Accepts operation if both are fixnum, FPU if both are Flonum.
>
> What if IMUL is performed in FMAC, IDIV in FDIV,... Int<->FP routing is
> based on calculation capability {Even CDC 6600 performed int × in the FP
> × unit (not in Thornton's book, but via conversation with 6600 logic
> designer at Asilomar some time ago. All they had to do to get FP × to
> perform int × was disable 1 gate.......)
>

Then you have a mess...

So, probably need to sort it out before EX in any case.

>> But, a proper dynamic language allows mixing fixnum and flonum with
>> the result being implicitly converted to flonum, but from the FPU's
>> POV, this would effectively require two chained FADD operations (one
>> for the Fixnum to Flonum conversion, one for the FADD itself).
>
> That is a LANGUAGE problem not an ISA problem. SNOBOL allowed one to add
> a string to an integer and the string would be converted to int before.....
>

If you have dynamic types in hardware in this way, then effectively the
typesystem mechanics switch from being a language issue to a hardware issue.

One may also end up with, say, a CPU that can run Scheme or JavaScript
or similar, but likely couldn't run C without significant hassles.

>> Many other cases could get hairy, but to have any real benefit, the
>> CPU would need to be able to deal with them. In cases where the
>> compiler deals with everything, the type-tags become mostly moot (or
>> potentially detrimental).
>
> You are arguing that the added complexity would somehow pay for itself.
> I can't see it paying for itself.
>

One either goes all in, or abandons the idea entirely.
There isn't really a middle option in this scenario (then one just ends
up with something that is bad at everything).

I was not saying it could work, but rather pointing out the issues
that would likely make this unworkable.

Though, that said, there could be possible merit in a CPU core that
could run a language like ECMAScript at roughly C like speeds, even if
it was basically unusable for pretty much anything else.

Though, for ECMAScript, one could also make a case for taking the
SpiderMonkey option and largely abandoning the use of an integer ALU
(instead running all of the integer math through the FPU, which could be
modified to support bitwise integer operations and similar as well).
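The reason integer math can go through a double-precision FPU at all is that a binary64 value holds every integer up to 2**53 in magnitude exactly. A quick Python check of that limit (the helper name `fits_exactly` is invented for this sketch):

```python
def fits_exactly(n):
    """True if the integer n survives a round trip through a double."""
    return int(float(n)) == n

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

# Sums of two 32-bit integers are far below 2**53, so they stay exact.
sum_is_exact = float(INT32_MAX) + float(INT32_MAX) == 2 * INT32_MAX
```

Here `fits_exactly(2**53)` holds but `fits_exactly(2**53 + 1)` does not, so 32-bit integer adds and subtracts are safe in the FPU, while a full 64-bit integer type would not be.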

>> But, then, there is another issue:
>>    C code expects C type semantics to be respected, say:
>>      Signed int overflow wraps at 32 bits (sign extending);
> maybe
>>      Unsigned int overflow wraps at 32 bits (zero extending);
> maybe

I am dealing with some code that has a bad habit of breaking if integer
overflows don't happen in the expected ways (say, the ROTT engine is
pretty bad about this one...).

When I first started working on my ROTT port, there was also a lot of
wackiness where the engine would go out of bounds, then behavior would
depend on what other things in memory it encountered when it did so.

I have mostly managed to fix up all the out-of-bounds issues, but this
isn't enough to keep the demos from desyncing (a similar issue applies
with my Doom port).

Apparently, other engines like ZDoom and similar needed to do a bit of
"heavy lifting" to get the demos from all of the various WAD versions to
play without desync; as Doom was also dependent on the behavior of
out-of-bounds memory accesses, and it was needed to turn these into
in-bounds accesses (to larger memory objects) with the memory contents
of the out-of-bounds accesses being faked.

Of course, the other option is just to "fix" the out-of-bounds accesses,
and live with a port where the demo playback desyncs.

Meanwhile, Quake entirely avoided this issue:
The demo playback is based on recording the location and orientation of
the player and any enemies at every point in time and similar, rather
than based on recording and replaying the original sequence of keyboard
inputs (and assuming that everything always happens exactly the same
each time).

Then again, these sorts of issues are not unique to these games. I have
watched more than a few speed-runs that use glitches either to
leave the playable parts of the map, or using convoluted sequences of
actions to corrupt memory in such a way as to achieve a desired effect
(such as triggering a warp to the end of the game).

Like, during normal gameplay, these games are seemingly just sorta
corrupting memory all over the place but, for the most part, no one
notices until something goes more obviously wrong...

>>      Variables may not hold values out-of-range for that type;
> LLVM does this GCC does not.
>>      The 'long long' and 'unsigned long long' types are exactly 64-bit;
> At least 64-bit not exactly.

C only requires at least 64 bits.
I suspect that, in practice, most code expects exactly 64 bits.

>>        ...
>>      ...
>
>> If one has tagged 64-bit registers, then fixnum might not hold the
>> entire range of 'long long'. If one has 66 or 68 bit registers, then
>> memory storage is a problem.
>
> Ya think ?
>

Both options suck, granted.

>> If one has untagged registers for cases where they are needed, one has
>> not saved any encoding space.
>
> I give up--not worth trying to teach cosmologist why the color of the
> lipstick going on the pig is not the problem.....

I was not trying to claim that this idea wouldn't suck.

In my case, I went a different route that works a little better:
Leaving all this stuff mostly up to software...

Re: "Mini" tags to reduce the number of op codes

<uulojh$honc$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38223&group=comp.arch#38223

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 4 Apr 2024 10:32:48 +0200
Organization: A noiseless patient Spider
Lines: 87
Message-ID: <uulojh$honc$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Thu, 04 Apr 2024 08:32:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bfdd135880ec513e2e0aed06acad7b8e";
logging-data="582380"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1946WP61eWYwVWGAL7BAke3ztxNcN6VI/4UJyeBK8l6kw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:K8uMU+lNuYJ8Blmu2MiHblYzFUE=
In-Reply-To: <420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
 by: Terje Mathisen - Thu, 4 Apr 2024 08:32 UTC

MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>> On 4/3/2024 11:43 AM, Stephen Fuld wrote:
>>> There has been discussion here about the benefits of reducing the
>>> number of op codes.  One reason not mentioned before is if you have
>>> fixed length instructions, you may want to leave as many codes as
>>> possible available for future use.  Of course, if you are doing a
>>> 16-bit instruction design, where instruction bits are especially
>>> tight, you may save enough op-codes to save a bit, perhaps allowing a
>>> larger register specifier field, or to allow more instructions in the
>>> smaller subset.
>>>
>>> It is in this spirit that I had an idea, partially inspired by Mill’s
>>> use of tags in registers, but not memory.  I worked through this idea
>>> using the My 6600 as an example “substrate” for two reasons.  First, it
>                66000
>>> has several features that are “friendly” to the idea.  Second, I know
>>> Mitch cares about keeping the number of op codes low.
>>>
>>> Please bear in mind that this is just the germ of an idea.  It is
>>> certainly not fully worked out.  I present it here to stimulate
>>> discussions, and because it has been fun to think about.
>>>
>>> The idea is to add 32 bits to the processor state, one per register
>>> (though probably not physically part of the register file) as a tag.
>>> If set, the bit indicates that the corresponding register contains a
>>> floating-point value.  Clear indicates not floating point (integer,
>>> address, etc.).  There would be two additional instructions, load
>>> single floating and load double floating, which work the same as the
>>> other 32- and 64-bit loads, but in addition to loading the value, set
>>> the tag bit for the destination register.  Non-floating-point loads
>>> would clear the tag bit.  As I show below, I don’t think you need any
>>> special "store tag" instructions.
>
> What do you do when you want a FP bit pattern interpreted as an integer,
> or vice versa?

This is why, if you want to copy Mill, you have to do it properly:

Mill does NOT care about the type of data loaded into a particular belt
slot, only the size and if it is a scalar or a vector filling up the
full belt slot. In either case you will also have marker bits for
special types like None and NaR.

So scalar 8/16/32/64/128 and vector 8x16/16x8/32x4/64x2/128x1 (with the
last being the same as the scalar anyway).

Only load ops and explicit widening/narrowing ops set the size tag
bits, from that point any op where it makes sense will do the right
thing for either a scalar or a short vector, so you can add 16+16 8-bit
vars with the same ADD encoding as you would use for a single 64-bit ADD.
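A rough C sketch of the width-tag idea described above, for illustration only. The `Slot` layout, the `slot_load`/`slot_add` names and the wrap-around (modular) add are assumptions made for this example; they are not the Mill's actual encoding or semantics. The point is only that a single add routine can serve a 64-bit scalar and a 16-lane byte vector, because the metadata set at load time tells it where the lanes end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit belt slot: raw bytes plus size metadata. */
typedef struct {
    uint8_t bytes[16];   /* payload, least significant byte of each lane first */
    uint8_t elem_bytes;  /* element width in bytes: 1, 2, 4, 8 or 16 */
    uint8_t is_vector;   /* 0 = one scalar in lane 0, 1 = lanes fill the slot */
} Slot;

/* A "load" is the only place where the size tag is chosen. */
static Slot slot_load(const uint8_t *src, uint8_t elem_bytes, uint8_t is_vector)
{
    Slot s;
    assert(elem_bytes != 0 && elem_bytes <= 16 && 16 % elem_bytes == 0);
    memset(&s, 0, sizeof s);
    s.elem_bytes = elem_bytes;
    s.is_vector = is_vector;
    memcpy(s.bytes, src, is_vector ? sizeof s.bytes : (size_t)elem_bytes);
    return s;
}

/* One ADD for every width: the carry chain is cut at each lane boundary. */
static Slot slot_add(Slot a, Slot b)
{
    Slot r = a;
    size_t total = a.is_vector ? sizeof r.bytes : (size_t)a.elem_bytes;

    assert(a.elem_bytes == b.elem_bytes && a.is_vector == b.is_vector);
    memset(r.bytes, 0, sizeof r.bytes);
    for (size_t lane = 0; lane < total; lane += a.elem_bytes) {
        unsigned carry = 0;
        for (size_t i = 0; i < a.elem_bytes; i++) {
            unsigned sum = a.bytes[lane + i] + b.bytes[lane + i] + carry;
            r.bytes[lane + i] = (uint8_t)sum;
            carry = sum >> 8;   /* dropped at the end of the lane: wraps */
        }
    }
    return r;
}
```

Note that nothing here looks at whether the bytes are "really" an integer or a float; only the width and the scalar/vector flag travel with the value.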

We do NOT make any attempt to interpret the actual bit patterns stored
within each belt slot, that is up to the instructions. This means that
there is no difference between loading a float or an int32_t, it also
means that it is perfectly legal (and supported) to use bit operations
on a FP variable. This can be very useful, not just to fake exact
arithmetic by splitting a double into two 26-bit mantissa parts:
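The rest of that example is not preserved in this archive, so what follows is only a minimal sketch of one way such a split can be done with plain bit operations, assuming IEEE 754 binary64 doubles and finite inputs. The names `split_hi` and `split_lo` are made up for illustration. Masking truncates rather than rounds, so the high half keeps 26 significant bits while the low half may need up to 27; this is not necessarily the exact split the post had in mind.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear the low 27 bits of the 52-bit stored fraction. What remains is the
   implicit leading 1 plus 25 stored bits, i.e. 26 significant bits. */
static double split_hi(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret, no conversion */
    bits &= ~((UINT64_C(1) << 27) - 1);
    memcpy(&x, &bits, sizeof bits);
    return x;
}

/* The remainder. The subtraction is exact because hi and x share an
   exponent and differ only in the bits that were masked off. */
static double split_lo(double x)
{
    return x - split_hi(x);
}
```

With both halves this short, a product such as `split_hi(a) * split_hi(b)` fits in a double without rounding, which is the building block for the "fake exact arithmetic" trick.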

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: "Mini" tags to reduce the number of op codes

<20240404164744.0000371c@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=38224&group=comp.arch#38224

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 4 Apr 2024 16:47:44 +0300
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <20240404164744.0000371c@yahoo.com>
References: <uuk100$inj$1@dont-email.me>
<uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
<uulojh$honc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 04 Apr 2024 13:47:53 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="12b418826dd2f4b5860a6017a4044401";
logging-data="4021724"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19MC7nda4iStFR7OBuhcDOBxNZtwfwNTOY="
Cancel-Lock: sha1:WJ6c/eD6kSDnnWgn1CIH7FvJJ6U=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Thu, 4 Apr 2024 13:47 UTC

On Thu, 4 Apr 2024 10:32:48 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>
> We do NOT make any attempt
>
> Terje
>

Does the present tense mean that you are still involved in the Mill project?

Re: "Mini" tags to reduce the number of op codes

<uumu4h$r0nk$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38226&group=comp.arch#38226

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 4 Apr 2024 21:13:21 +0200
Organization: A noiseless patient Spider
Lines: 21
Message-ID: <uumu4h$r0nk$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
<uulojh$honc$1@dont-email.me> <20240404164744.0000371c@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 04 Apr 2024 19:13:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bfdd135880ec513e2e0aed06acad7b8e";
logging-data="885492"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kZiaQ7GTW76hz+krClYgGuQSJDAog65Lg0uzvKfaAkA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:2y/nqkpv0UhHK5vAB75aropevMc=
In-Reply-To: <20240404164744.0000371c@yahoo.com>
 by: Terje Mathisen - Thu, 4 Apr 2024 19:13 UTC

Michael S wrote:
> On Thu, 4 Apr 2024 10:32:48 +0200
> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>>
>> We do NOT make any attempt
>>
>> Terje
>>
>
> Does the present tense mean that you are still involved in the Mill project?
>
I am much less active than I used to be, but I still get the weekly conf
call invites and respond to any interesting subject on our mailing list.

So, yes, I do consider myself to still be involved.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: "Mini" tags to reduce the number of op codes

<20240404222530.00002a95@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=38227&group=comp.arch#38227

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 4 Apr 2024 22:25:30 +0300
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <20240404222530.00002a95@yahoo.com>
References: <uuk100$inj$1@dont-email.me>
<uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
<uulojh$honc$1@dont-email.me>
<20240404164744.0000371c@yahoo.com>
<uumu4h$r0nk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 04 Apr 2024 19:25:34 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3c4c4a494315ce62f6b780487464cc51";
logging-data="871521"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19mKabM0MY2CEN7RwXjpsteScqtkQd9Xwg="
Cancel-Lock: sha1:ew69voWCg7eG43V1QUjSrSmB260=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
 by: Michael S - Thu, 4 Apr 2024 19:25 UTC

On Thu, 4 Apr 2024 21:13:21 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:

> Michael S wrote:
> > On Thu, 4 Apr 2024 10:32:48 +0200
> > Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> >>
> >> We do NOT make any attempt
> >>
> >> Terje
> >>
> >
> > Does the present tense mean that you are still involved in the Mill
> > project?
> I am much less active than I used to be, but I still get the weekly
> conf call invites and respond to any interesting subject on our
> mailing list.
>
> So, yes, I do consider myself to still be involved.
>
> Terje
>

Thank you

Re: "Mini" tags to reduce the number of op codes

<uun9is$tpk9$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38231&group=comp.arch#38231

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 4 Apr 2024 17:28:43 -0500
Organization: A noiseless patient Spider
Lines: 129
Message-ID: <uun9is$tpk9$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
<uulojh$honc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 04 Apr 2024 22:28:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b6c0ef59d3cbba2f2dcb16f34c8f843d";
logging-data="976521"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/80E6LBM+lpcA+mR80cZ1PWpXPYiICAZg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ArRAneYNpiG2SmKH8VMcPoZozY4=
Content-Language: en-US
In-Reply-To: <uulojh$honc$1@dont-email.me>
 by: BGB-Alt - Thu, 4 Apr 2024 22:28 UTC

On 4/4/2024 3:32 AM, Terje Mathisen wrote:
> MitchAlsup1 wrote:
>> BGB-Alt wrote:
>>
>>> On 4/3/2024 11:43 AM, Stephen Fuld wrote:
>>>> There has been discussion here about the benefits of reducing the
>>>> number of op codes.  One reason not mentioned before is if you have
>>>> fixed length instructions, you may want to leave as many codes as
>>>> possible available for future use.  Of course, if you are doing a
>>>> 16-bit instruction design, where instruction bits are especially
>>>> tight, you may save enough op-codes to save a bit, perhaps allowing
>>>> a larger register specifier field, or to allow more instructions in
>>>> the smaller subset.
>>>>
>>>> It is in this spirit that I had an idea, partially inspired by
>>>> Mill’s use of tags in registers, but not memory.  I worked through
>>>> this idea using the My 6600 as an example “substrate” for two
>>>> reasons.  First, it
>>                 66000
>>>> has several features that are “friendly” to the idea.  Second, I
>>>> know Mitch cares about keeping the number of op codes low.
>>>>
>>>> Please bear in mind that this is just the germ of an idea.  It is
>>>> certainly not fully worked out.  I present it here to stimulate
>>>> discussions, and because it has been fun to think about.
>>>>
>>>> The idea is to add 32 bits to the processor state, one per register
>>>> (though probably not physically part of the register file) as a tag.
>>>> If set, the bit indicates that the corresponding register contains a
>>>> floating-point value.  Clear indicates not floating point (integer,
>>>> address, etc.).  There would be two additional instructions, load
>>>> single floating and load double floating, which work the same as the
>>>> other 32- and 64-bit loads, but in addition to loading the value,
>>>> set the tag bit for the destination register.  Non-floating-point
>>>> loads would clear the tag bit.  As I show below, I don’t think you
>>>> need any special "store tag" instructions.
>>
>> What do you do when you want a FP bit pattern interpreted as an
>> integer, or vice versa?
>
> This is why, if you want to copy Mill, you have to do it properly:
>
> Mill does NOT care about the type of data loaded into a particular belt
> slot, only the size and if it is a scalar or a vector filling up the
> full belt slot. In either case you will also have marker bits for
> special types like None and NaR.
>
> So scalar 8/16/32/64/128 and vector 8x16/16x8/32x4/64x2/128x1 (with the
> last being the same as the scalar anyway).
>
> Only load ops and explicit widening/narrowing ops set the size tag
> bits, from that point any op where it makes sense will do the right
> thing for either a scalar or a short vector, so you can add 16+16 8-bit
> vars with the same ADD encoding as you would use for a single 64-bit ADD.
>
> We do NOT make any attempt to interpret the actual bit patterns stored
> within each belt slot, that is up to the instructions. This means that
> there is no difference between loading a float or an int32_t, it also
> means that it is perfectly legal (and supported) to use bit operations
> on a FP variable. This can be very useful, not just to fake exact
> arithmetic by splitting a double into two 26-bit mantissa parts:
>

I guess useful to know.

Haven't heard much about Mill in a while, so don't know what if any
progress is being made.

As I can note, in my actual ISA, any type-tagging in the registers was
explicit and opt-in, generally managed by the compiler/runtime/etc; in
this case, the ISA merely providing facilities to assist with this.

The main exception would likely have been the possible "Bounds Check
Enforce" mode, which would still need a bit of work to implement, and is
not likely to be terribly useful. Most complicated and expensive parts
are that it will require implicit register and memory tagging (to flag
capabilities). Though, cheaper option is simply to not enable it, in
which case things either behave as before, with the new functionality
essentially being NOP. Much of the work still needed on this would be
getting the 128-bit ABI working, and adding some new tweaks to the ABI
to play well with the capability addressing (effectively it requires
partly reworking how global variables are accessed).

The type-tagging scheme used in my case is very similar to that used in
my previous BGBScript VMs (where, as I can note, BGBCC was itself a fork
off of an early version of the BGBScript VM, and effectively using a lax
hybrid typesystem masquerading as C). Though, it has long since moved to
a more proper C style typesystem, with dynamic types more as an optional
extension.

But, as can be noted, since dynamic typing is implemented via runtime
calls, it is slower than the use of static types. But, this is likely to
be unavoidable with any kind of conventional-ish architecture (and, some
structures, like bounded array objects and ex-nihilo objects, are
difficult to make performance competitive with bare pointers and structs).

Though, it is not so much that I think it is justifiable to forbid their
existence entirely (as is more the philosophy in many strict static
languages), or to mandate that programs roll their own (as typical in C
and C++ land). Where, with compiler and runtime support, it is possible
to provide them in ways that are higher performance than a plain C
implementation.

Well, and also the annoyance that seemingly every dynamic-language VM
takes a different approach to the implementation of its dynamic
typesystem (along with language differences, ...).

For example, Common Lisp is very different from Smalltalk, despite both
being categorically similar in this sense (or, versus Python, or versus
JavaScript, or, ...). Not likely viable to address all of them in the
same runtime (and would likely result in a typesystem that doesn't
really match with any of them, ...).

Though, annoyingly, there are not really any mainstream languages in the
"hybrid" category (say, in the gray area between C and ActionScript).
And, then it ends up being a question of which is better in a choice
between C with AS-like features, or "like AS but with C features".

So, alas...

> Terje
>

Re: "Mini" tags to reduce the number of op codes

<1eda150f3f6f24095c2204722a2fd541@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=38232&group=comp.arch#38232

Date: Fri, 5 Apr 2024 01:48:33 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$sZCefH.9PGF2MK/opTu/SOAzNng3Dkj9CyGcz5Q6Fo/Ks6FO01wJ2
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me> <420556afacf3ef3eea07b95498bcbef0@www.novabbs.org> <uulojh$honc$1@dont-email.me> <uun9is$tpk9$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <1eda150f3f6f24095c2204722a2fd541@www.novabbs.org>
 by: MitchAlsup1 - Fri, 5 Apr 2024 01:48 UTC

BGB-Alt wrote:

> On 4/4/2024 3:32 AM, Terje Mathisen wrote:
>> MitchAlsup1 wrote:
>>>

> As I can note, in my actual ISA, any type-tagging in the registers was
> explicit and opt-in, generally managed by the compiler/runtime/etc; in
> this case, the ISA merely providing facilities to assist with this.

> The main exception would likely have been the possible "Bounds Check
> Enforce" mode, which would still need a bit of work to implement, and is
> not likely to be terribly useful.

A while back (and maybe in the future) My 66000 had what I called the
Foreign Access Mode. When the HoB of the pointer was set, the first
entry in the translation table was a 4-doubleword structure: a Root
pointer, the Lowest addressable Byte, the Highest addressable Byte,
and a DW of access rights, permissions, ... While sort of like a capability,
I don't think it was close enough to actually be a capability or used as
one.

So, it fell out of favor, and it was not clear how it fit into the
HyperVisor/SuperVisor model, either.

> Most complicated and expensive parts
> are that it will require implicit register and memory tagging (to flag
> capabilities). Though, cheaper option is simply to not enable it, in
> which case things either behave as before, with the new functionality
> essentially being NOP. Much of the work still needed on this would be
> getting the 128-bit ABI working, and adding some new tweaks to the ABI
> to play well with the capability addressing (effectively it requires
> partly reworking how global variables are accessed).

> The type-tagging scheme used in my case is very similar to that used in
> my previous BGBScript VMs (where, as I can note, BGBCC was itself a fork
> off of an early version of the BGBScript VM, and effectively using a lax
> hybrid typesystem masquerading as C). Though, it has long since moved to
> a more proper C style typesystem, with dynamic types more as an optional
> extension.

In general, any time one needs to change the type, one wastes an instruction
compared to typeless registers.

Re: "Mini" tags to reduce the number of op codes

<6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>

https://news.novabbs.org/devel/article-flat.php?id=38233&group=comp.arch#38233

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (John Savard)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 04 Apr 2024 21:13:13 -0600
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
References: <uuk100$inj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 05 Apr 2024 03:13:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3c07d85c4e653c1adc6fe68b70f91598";
logging-data="1213503"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX185X2WdxTRFtvSjWXewtWVytmKLEUAM54s="
Cancel-Lock: sha1:tPUIG3W7XZPfo7ef355kMJl1LLc=
X-Newsreader: Forte Free Agent 3.3/32.846
 by: John Savard - Fri, 5 Apr 2024 03:13 UTC

On some older CPUs, there might be one set of integer opcodes and one
set of floating-point opcodes, with a status register containing the
integer precision, and the floating-point precision, currently in use.

The idea was that this would be efficient because most programs only
use one size of each type of number, so the number of opcodes would be
the most appropriate, and that status register wouldn't need to be
reloaded too often.

It's considered dangerous, though, to have a mechanism for changing
what instructions mean, since this could let malware alter what
programs do in a useful and sneaky fashion. Memory bandwidth is no
longer a crippling constraint the way it was back in the days of core
memory and discrete transistors - at least not for program code, even
if memory bandwidth for _data_ often limits the processing speed of
computers.

This is basically because any program that does any real work, taking
any real length of time to do its job, is going to mostly consist of
loops that fit in cache. So letting program code be verbose if there
are other benefits obtained thereby is the current conventional
wisdom.

John Savard

Re: "Mini" tags to reduce the number of op codes

<uuo3pp$16v2r$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38234&group=comp.arch#38234

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Fri, 5 Apr 2024 00:54:54 -0500
Organization: A noiseless patient Spider
Lines: 268
Message-ID: <uuo3pp$16v2r$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me>
<420556afacf3ef3eea07b95498bcbef0@www.novabbs.org>
<uulojh$honc$1@dont-email.me> <uun9is$tpk9$1@dont-email.me>
<1eda150f3f6f24095c2204722a2fd541@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 05 Apr 2024 05:56:10 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ae0c7ce1f1f2160912b24510716acaff";
logging-data="1277019"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19SRP0Q3kLeqBU3wu6PUyj77x2ZkdTAZnA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ofmEpBNO28WTxUiOmT9bSFrnz+8=
In-Reply-To: <1eda150f3f6f24095c2204722a2fd541@www.novabbs.org>
Content-Language: en-US
 by: BGB - Fri, 5 Apr 2024 05:54 UTC

On 4/4/2024 8:48 PM, MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>> On 4/4/2024 3:32 AM, Terje Mathisen wrote:
>>> MitchAlsup1 wrote:
>>>>
>
>> As I can note, in my actual ISA, any type-tagging in the registers was
>> explicit and opt-in, generally managed by the compiler/runtime/etc; in
>> this case, the ISA merely providing facilities to assist with this.
>
>
>> The main exception would likely have been the possible "Bounds Check
>> Enforce" mode, which would still need a bit of work to implement, and
>> is not likely to be terribly useful.
>
> A while back (and maybe in the future) My 66000 had what I called the
> Foreign Access Mode. When the HoB of the pointer was set, the first
> entry in the translation table was a 4 doubleword structure, A Root
> pointer, the Lowest addressable Byte, the Highest addressable Byte,
> and a DW of access rights, permissions,... While sort-of like a capability
> I don't think it was close enough to actually be a capability or used as
> one.
>
> So, it fell out of favor, and it was not clear how it fit into the
> HyperVisor/SuperVisor model, either.
>

Possibly true.

The idea with BCE mode would be that the pointers would contain an
address along with an upper and lower bound, and possibly a few access
flags. It would disable the narrower 64-bit pointer instructions,
forcing the use of the 128-bit pointer instructions; which would perform
bounds checks, and some instructions would gain some additional semantics.
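As a loose illustration of what a bounds-carrying pointer has to check on each access, here is a small C model. The struct layout, the flag values and the function names are hypothetical, and the real packing into 128 bits is not modelled; this only shows the check itself, with the bounds travelling unchanged through pointer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { BP_READ = 1, BP_WRITE = 2 };   /* hypothetical access flags */

/* Hypothetical bounded pointer (the real 128-bit encoding is not shown). */
typedef struct {
    uint64_t addr;    /* address the pointer currently designates */
    uint64_t lower;   /* lowest valid address */
    uint64_t upper;   /* one past the highest valid address */
    unsigned flags;   /* BP_READ and/or BP_WRITE */
} BoundedPtr;

/* Pointer arithmetic moves addr only; bounds and flags are inherited. */
static BoundedPtr bp_offset(BoundedPtr p, int64_t delta)
{
    p.addr += (uint64_t)delta;
    return p;
}

/* The check a load or store of nbytes would perform before touching memory. */
static bool bp_access_ok(BoundedPtr p, uint64_t nbytes, unsigned need)
{
    if ((p.flags & need) != need)
        return false;                       /* missing permission */
    if (p.addr < p.lower || p.addr > p.upper)
        return false;                       /* pointer itself out of range */
    return nbytes <= p.upper - p.addr;      /* whole access must fit */
}
```

For example, a read-only pointer to a 16-byte object passes a 16-byte read at offset 0, fails a 17-byte read, and fails any write.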

In addition, the Boot SRAM and DRAM gain some special "Tag Bits" areas.

However, it is unclear if the enforcing mode gains much over the normal
optional bounds checking to justify the extra cost. The main "merit"
case is that, in theory, it could offer some additional protection
against hostile machine code (whereas the non-enforcing mode is mostly
useful for detecting out-of-bounds memory accesses).

However, the optional mode is compatible with the use of 64-bit pointers
and the existing C ABI, so there is less overhead.

>>                                   Most complicated and expensive parts
>> are that it will require implicit register and memory tagging (to flag
>> capabilities). Though, cheaper option is simply to not enable it, in
>> which case things either behave as before, with the new functionality
>> essentially being NOP. Much of the work still needed on this would be
>> getting the 128-bit ABI working, and adding some new tweaks to the ABI
>> to play well with the capability addressing (effectively it requires
>> partly reworking how global variables are accessed).
>
>
>> The type-tagging scheme used in my case is very similar to that used
>> in my previous BGBScript VMs (where, as I can note, BGBCC was itself a
>> fork off of an early version of the BGBScript VM, and effectively
>> using a lax hybrid typesystem masquerading as C). Though, it has long
>> since moved to a more proper C style typesystem, with dynamic types
>> more as an optional extension.
>
> In general, any time one needs to change the type, one wastes an instruction
> compared to typeless registers.

In my case, both types of values are used:
   int x;        //x is a bare register
   void *p;      //may or may not have tag, high 16 bits 0000 if untagged
   __variant y;  //y is tagged
   auto z;       //may be tagged or untagged

Here, untagged values will generally be used for non-variant types,
whereas tagged values for variant types.

Here, 'auto' and 'variant' differ, in that variant says "the type is
only known at runtime", whereas 'auto' assumes that a type exists and
may optionally be resolved at compile time (or, alternatively, it may
decay into variant; assumption being that one may not use auto in ways
that are incompatible with variant). In terms of behavior, both cases
may appear superficially similar.

Though:
   auto z = expr;
Would instead define 'z' as a type inferred from the expression (in a
similar way to how it works in C++).

Note that:
   __var x;
Would also give a variable of type variant, but is not exactly the same
("__variant" is the type, where "__var" is a statement/expression
keyword that just so happens to declare a variable of type "__variant"
when used in this way).

Say, non-variant:
   int, long, double
   void*, char*, Foo*, ...
   __m128, __vec4f, ...
Variant:
   __variant, __object, __fixnum, __string, ...

Where, for example:
   __variant
     May hold (nearly) any type of value at runtime.
     Though, with some semantic restrictions.
   __object
     Tagged value, like variant;
     But does not allow using operators on it directly.
   __fixnum
     Represents a 62-bit signed integer value.
     Always exists in tagged form.
   __flonum
     Represents a 62-bit floating-point value.
     Effectively a tagged Binary64 shifted-right by 2 bits.
   __string
     Holds a string;
     Essentially 'char*' but with a type-tagged pointer.
     Defaults to CP-1252 at present, but may also hold a UCS-2 string.
     Strings are assumed to be a read-only character array.
   ...
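To make the 62-bit fixnum and flonum cases concrete, here is a small sketch in plain C. The tag values (1 and 2 in the top two bits) and all function names are assumptions for illustration; the post gives only the payload widths, not the actual tag encoding. The flonum here simply truncates the two lowest fraction bits, which is also an assumption (a real runtime might round).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Variant;

#define TAG_SHIFT    62
#define TAG_FIXNUM   UINT64_C(1)   /* hypothetical tag values */
#define TAG_FLONUM   UINT64_C(2)
#define PAYLOAD_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

static unsigned variant_tag(Variant v) { return (unsigned)(v >> TAG_SHIFT); }

/* Keep the low 62 bits of the integer and put the tag on top. */
static Variant box_fixnum(int64_t v)
{
    return (TAG_FIXNUM << TAG_SHIFT) | ((uint64_t)v & PAYLOAD_MASK);
}

/* Strip the tag and sign-extend from bit 61 (assumes two's complement). */
static int64_t unbox_fixnum(Variant v)
{
    uint64_t p = v & PAYLOAD_MASK;
    if (p & (UINT64_C(1) << 61))
        p |= ~PAYLOAD_MASK;
    return (int64_t)p;
}

/* Binary64 shifted right by 2: the two lowest fraction bits are lost. */
static Variant box_flonum(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (TAG_FLONUM << TAG_SHIFT) | (bits >> 2);
}

static double unbox_flonum(Variant v)
{
    uint64_t bits = (v & PAYLOAD_MASK) << 2;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Values whose low two fraction bits are already zero (1.5, -0.25, small integers) survive the round trip unchanged; something like 0.1 comes back slightly different.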

So, say:
   int x, z;
   __variant y;

   y=x;       //implicit int -> __fixnum -> __variant
   z=(int)y;  //coerces y to 'int'

There are some operators that exist for variant types but not for
non-variant types, such as __instanceof.

   if(y __instanceof __fixnum)
   {
     //y is known to be a fixnum here
   }

Where __instanceof can also be used on class instances:
   __class Foo __extends Bar __implements IBaz {
     ... class members ...
   };

In theory, could add a header to #define a lot of these keywords in
non-prefixed forms, in which case one could theoretically write, say:
   public class Foo extends Bar implements IBaz {
     private int x, y;
     public int someMethod()
       { return x+y; }
     public void setX(int val)
       { x=val; }
     ...
   };

And, if one has, say:
   IBaz baz;
   ...
   if(baz instanceof Foo)
   {
     //baz is an instance of the Foo class
   }

Though, will note that object instances are pass-by-reference here (like
in Java and C#) and not by-value. Though, if one is familiar with Java,
probably not too hard to figure out how some of this works. Also, as can
be noted, the object model is more like Java family languages than like C++.

However, unlike Java (and more like ActionScript), one can throw a
'dynamic' (or '__dynamic') keyword on a class, in which case it is
possible to create new members in the object instances merely by
assigning to them (where any members created this way will default to
being 'public variant').

Object member access will differ depending on the type of object.
Direct access to a non-dynamic class member will use a fixed
displacement (like when accessing a struct). Dynamic members will
implicitly access an ex-nihilo object that exists as a hidden member in
the class instance (and using the 'dynamic' modifier on a class will
implicitly create this member).

In this case, interfaces are pulled off by sticking an interface VTable
pointer onto the end of the object, and then encoding the Interface
reference as a pointer to the pointer to this vtable (with the VTable
encoding the offset to adjust the object pointer to give a pointer to
the base class for the virtual method). Note that (unlike in the JVM),
what interfaces a class implements is fixed at compile time ("interface
injection" is not possible in BGBCC).

There was an experimental C++ mode, which tries to mimic C++ syntax and
semantics (kinda), sort of trying to awkwardly fake C++'s object system
on top of the Java-like object system (with POD classes decaying into C
structs; value objects faked with object cloning, ...). Will not take
much to see through this illusion though (and almost doesn't really seem
worth it).

If ex-nihilo objects are used, these are treated as separate from the
instance-of-class objects. In the current implementation, these objects
are represented as small B-Trees representing key/value associations.
Here, each key is a 16-bit number (associated with a "symbol") and the
value is a 64-bit value (variant). Each object has a fixed capacity (16
members), and if exceeded, splits apart into a tree (say, a 2-level tree
representing up to 256 members; with the keys in the top-level node
encoding the ranges of keys present in each sub-node).
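A minimal sketch of what one such 16-entry node might look like, in plain C. Keeping the keys sorted and using a binary search are assumptions of this example (the post only says the upper level encodes key ranges), and all names are made up. Splitting a full node into a tree is left out; `leaf_set` just reports that a split would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_CAP 16

/* Hypothetical leaf node: up to 16 symbol -> value associations. */
typedef struct {
    uint16_t keys[NODE_CAP];   /* 16-bit symbol indices, kept sorted */
    uint64_t vals[NODE_CAP];   /* 64-bit (variant-sized) values */
    int      count;
} LeafNode;

/* Index of the first key that is >= key (binary search). */
static int leaf_lower_bound(const LeafNode *n, uint16_t key)
{
    int lo = 0, hi = n->count;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (n->keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

static bool leaf_get(const LeafNode *n, uint16_t key, uint64_t *out)
{
    int i = leaf_lower_bound(n, key);
    if (i == n->count || n->keys[i] != key)
        return false;
    *out = n->vals[i];
    return true;
}

/* Returns false if the key is new and the node is full (caller would split). */
static bool leaf_set(LeafNode *n, uint16_t key, uint64_t val)
{
    int i = leaf_lower_bound(n, key);
    if (i < n->count && n->keys[i] == key) {
        n->vals[i] = val;                   /* overwrite existing member */
        return true;
    }
    if (n->count == NODE_CAP)
        return false;
    size_t tail = (size_t)(n->count - i);   /* entries to shift up by one */
    memmove(&n->keys[i + 1], &n->keys[i], tail * sizeof n->keys[0]);
    memmove(&n->vals[i + 1], &n->vals[i], tail * sizeof n->vals[0]);
    n->keys[i] = key;
    n->vals[i] = val;
    n->count++;
    return true;
}
```

Assigning to a member that does not exist yet is just `leaf_set` with a new symbol index, which is what makes "create a member by assigning to it" cheap for small objects.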

At present, there is a limit of 64K unique symbols, but this isn't too
big of an issue in practice (each symbol can be seen as a mapping
between a 16-bit number and an ASCII string representing the symbol's name).

If accessing a normal class member, it will be accessed as a direct
memory load or store, or if it is a dynamic member, an implicit runtime
call will be used.


Re: "Mini" tags to reduce the number of op codes

<uupkes$1ilv6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38236&group=comp.arch#38236

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Fri, 5 Apr 2024 14:46:35 -0500
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <uupkes$1ilv6$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
<6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 05 Apr 2024 19:46:37 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="abc1a6b059189d9dcd2a246a5a525a8a";
logging-data="1660902"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YNM25XS5y0IwEaiyxL4rxLPsOzhvfXO0="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:QHssr1XHvhECPWG6NHpUwRgmC54=
Content-Language: en-US
In-Reply-To: <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
 by: BGB-Alt - Fri, 5 Apr 2024 19:46 UTC

On 4/4/2024 10:13 PM, John Savard wrote:
> On some older CPUs, there might be one set of integer opcodes and one
> set of floating-point opcodes, with a status register containing the
> integer precision, and the floating-point precision, currently in use.
>
> The idea was that this would be efficient because most programs only
> use one size of each type of number, so the number of opcodes would be
> the most appropriate, and that status register wouldn't need to be
> reloaded too often.
>
> It's considered dangerous, though, to have a mechanism for changing
> what instructions mean, since this could let malware alter what
> programs do in a useful and sneaky fashion. Memory bandwidth is no
> longer a crippling constraint the way it was back in the days of core
> memory and discrete transistors - at least not for program code, even
> if memory bandwidth for _data_ often limits the processing speed of
> computers.
>
> This is basically because any program that does any real work, taking
> any real length of time to do its job, is going to mostly consist of
> loops that fit in cache. So letting program code be verbose if there
> are other benefits obtained thereby is the current conventional
> wisdom.
>

This was how the FPU worked in SH-4. Reloading some bits in FPSCR would
effectively bank out the current set of FPU instructions (say, between
Single and Double, etc).

It was also how 64-bit operations worked in early 64-bit versions of
BJX1.

Say, there were DQ and JQ bits added to the control register:
DQ=0: 32-bit for variable-sized operations (like SH-4)
DQ=1: 64-bit for variable-sized operations.
JQ=0: 32-bit addressing (SH-4 memory map)
JQ=1: 48-bit addressing (like the later BJX2 memory map).

The DQ bit would also affect whether one had MOV.W or MOV.Q operations
available.
DQ=0: Interpret ops as MOV.W (16-bit)
DQ=1: Interpret ops as MOV.Q (64-bit)

In the DQ=JQ=0 case, it would have been mostly equivalent to SH-4 (and
could still run GCC's compiler output). This was a similar situation to
switching the FPU mode.

Though, a later version of the BJX1 ISA had dropped and repurposed some
encodings, allowing MOV.W and MOV.Q to coexist (and avoiding the need
for the compiler to endlessly toggle this bit), albeit with fewer
addressing modes for the latter.

All this was an issue mostly because SH-4 had used fixed-length 16-bit
instructions, and the encoding space was effectively almost entirely
full when I started (so new instructions required either sacrificing
existing instructions, or using mode bits).

Though, BJX1 did end up with some 32-bit ops, some borrowed from SH-2A
and similar. These were mostly stuck into awkward ad-hoc places in the
16-bit map, so decoding was kind of a pain.

....

When I later rebooted things as my BJX2 project, I effectively dropped
this whole mess and started over (with the caveat that it lost SH-4
compatibility). However, it has since gained RISC-V compatibility, for
better or worse; RISC-V is at least likely to get slightly better
performance than SH-4 (and both ISAs can be 64-bit).

....

> John Savard

Re: "Mini" tags to reduce the number of op codes

<15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=38238&group=comp.arch#38238
Date: Fri, 5 Apr 2024 21:34:16 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$uO8AsEUM.MJV62Ay/nQvzuvcydpXOTBk0o5fxA2lFwol/qKMKUqjG
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
Organization: Rocksolid Light
Message-ID: <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
 by: MitchAlsup1 - Fri, 5 Apr 2024 21:34 UTC

John Savard wrote:

> On some older CPUs, there might be one set of integer opcodes and one
> set of floating-point opcodes, with a status register containing the
> integer precision, and the floating-point precision, currently in use.

> The idea was that this would be efficient because most programs only
> use one size of each type of number, so the number of opcodes would be
> the most appropriate, and that status register wouldn't need to be
> reloaded too often.

Most programs I write use bytes (mostly unsigned), a few halfwords (mostly
signed), a useful count of integers (both signed and unsigned--mainly as
already defined arguments/returns), and a vast majority of doublewords
(invariably unsigned).

Early in My 66000 LLVM development Brian looked at the cost of having
only 1 FP OpCode set--and it did not look good--so we went back to the
standard way of an OpCode for each FP size × calculation.

> It's considered dangerous, though, to have a mechanism for changing
> what instructions mean, since this could let malware alter what
> programs do in a useful and sneaky fashion. Memory bandwidth is no
> longer a crippling constraint the way it was back in the days of core
> memory and discrete transistors - at least not for program code, even
> if memory bandwidth for _data_ often limits the processing speed of
> computers.

> This is basically because any program that does any real work, taking
> any real length of time to do its job, is going to mostly consist of
> loops that fit in cache. So letting program code be verbose if there
> are other benefits obtained thereby is the current conventional
> wisdom.

> John Savard

Re: "Mini" tags to reduce the number of op codes

<lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com>

https://news.novabbs.org/devel/article-flat.php?id=38241&group=comp.arch#38241
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (John Savard)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Sat, 06 Apr 2024 21:30:47 -0600
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com>
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com> <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 07 Apr 2024 03:30:50 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="96157b3672e128fa20c43971736b9ffd";
logging-data="2714474"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/vUIVxuB3p8D9giUaZwuPOA45WJ5GmYVk="
Cancel-Lock: sha1:Tpj2yzw9fVJanvOxwia1juxktBs=
X-Newsreader: Forte Free Agent 3.3/32.846
 by: John Savard - Sun, 7 Apr 2024 03:30 UTC

On Fri, 5 Apr 2024 21:34:16 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:

>Early in My 66000 LLVM development Brian looked at the cost of having
>only 1 FP OpCode set--and it did not look good--so we went back to the
>standard way of an OpCode for each FP size × calculation.

I do tend to agree.

However, a silly idea has now occurred to me.

256 bits can contain eight instructions that are 32 bits long.

Or they can also contain seven instructions that are 36 bits long,
with four bits left over.

So they could contain *nine* instructions that are 28 bits long, also
with four bits left over.

Thus, instead of having mode bits, one _could_ do the following:

Usually, have 28 bit instructions that are shorter because there's
only one opcode for each floating and integer operation. The first
four bits in a block give the lengths of data to be used.

But have one value for the first four bits in a block that indicates
36-bit instructions instead, which do include type information, so
that very occasional instructions for rarely-used types can be mixed
in which don't fill a whole block.
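
The packing arithmetic above can be checked with a few lines of Python. This is only a sketch for illustration; the function name is made up here:

```python
BLOCK_BITS = 256


def pack(instr_bits, block_bits=BLOCK_BITS):
    """Return (instructions per block, leftover bits)."""
    return divmod(block_bits, instr_bits)


for width in (32, 36, 28):
    count, spare = pack(width)
    print(f"{width}-bit: {count} instructions, {spare} bits left over")
```

For both the 28-bit and the 36-bit case the leftover is exactly the four bits used as the block prefix.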

While that's a theoretical possibility, I don't view it as being
worthwhile in practice.

John Savard

Re: "Mini" tags to reduce the number of op codes

<9280b28665576d098af53a9416604e36@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=38242&group=comp.arch#38242
Date: Sun, 7 Apr 2024 20:41:45 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$VtVxYWfxsGsp3TNJYojBEuSD7j7WWK.HxFwo9YEnxGBt3OOy8zuzq
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com> <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org> <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com>
Organization: Rocksolid Light
Message-ID: <9280b28665576d098af53a9416604e36@www.novabbs.org>
 by: MitchAlsup1 - Sun, 7 Apr 2024 20:41 UTC

John Savard wrote:

> On Fri, 5 Apr 2024 21:34:16 +0000, mitchalsup@aol.com (MitchAlsup1)
> wrote:

>>Early in My 66000 LLVM development Brian looked at the cost of having
>>only 1 FP OpCode set--and it did not look good--so we went back to the
>>standard way of an OpCode for each FP size × calculation.

> I do tend to agree.

> However, a silly idea has now occurred to me.

> 256 bits can contain eight instructions that are 32 bits long.

> Or they can also contain seven instructions that are 36 bits long,
> with four bits left over.

> So they could contain *nine* instructions that are 28 bits long, also
> with four bits left over.

I agree with the arithmetic going into this statement. What I don't
have sufficient data on is "whether these extra formats pay
for themselves". For example, how many of the 36-bit encodings are
not redundant with the 32-bit ones, and likewise for the 28-bit ones.

Take::

ADD R7,R7,#1

I suspect there is a 28-bit form, a 32-bit form, and a 36-bit form
for this semantic step, that you pay for multiple times in decoding
and possibly pipelining. {{There may also be other encodings for
this; such as:: INC R7}}

> Thus, instead of having mode bits, one _could_ do the following:

> Usually, have 28 bit instructions that are shorter because there's
> only one opcode for each floating and integer operation. The first
> four bits in a block give the lengths of data to be used.

How do you attach 32-bit or 64-bit constants to 28-bit instructions ??

How do you switch from 64-bit to Byte to 32-bit to 16-bit in one
set of 256-bit instruction decodes ??

> But have one value for the first four bits in a block that indicates
> 36-bit instructions instead, which do include type information, so
> that very occasional instructions for rarely-used types can be mixed
> in which don't fill a whole block.

In complicated if-then-else codes (and switches) I often see one
instruction followed by a branch to a common point. Does your encoding
deal with these efficiently ?? That is:: what happens when you jump to
the middle of a block of 36-bit instructions ??
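
To make the question concrete, here is a small Python sketch of where instruction slots would start inside a block. It assumes the four leftover bits sit at the start of the block as a header; that placement, and the names used, are assumptions for illustration and not something the proposal specifies:

```python
HEADER_BITS = 4  # assumed: the 4 leftover bits form a header at bit 0


def slot_offsets(instr_bits, block_bits=256):
    """Bit offset of each instruction slot within one block."""
    count = (block_bits - HEADER_BITS) // instr_bits
    return [HEADER_BITS + i * instr_bits for i in range(count)]


for width in (28, 36):
    offsets = slot_offsets(width)
    aligned = [o for o in offsets if o % 8 == 0]
    print(width, offsets, "byte-aligned:", aligned)
```

Under that assumption most slots do not start on a byte boundary, so a branch target would presumably have to name a block address plus a slot index, and the header would have to be read before the target slot can be decoded.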

> While that's a theoretical possibility, I don't view it as being
> worthwhile in practice.

Agreed.............

> John Savard

Re: "Mini" tags to reduce the number of op codes

<uuv1ir$30htt$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38243&group=comp.arch#38243
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Sun, 7 Apr 2024 21:01:15 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <uuv1ir$30htt$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
<6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
<15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
<lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com>
Injection-Date: Sun, 07 Apr 2024 21:01:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d8a8289746225a35da0d7c8a2409f6e8";
logging-data="3164093"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ckeCGFjJSLHO4ZJ8PUbyViNy4qZAG0J8="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:+KwlKkw7lzkSA5RU5XbmvrxOpyk=
 by: Thomas Koenig - Sun, 7 Apr 2024 21:01 UTC

John Savard <quadibloc@servername.invalid> schrieb:

> Thus, instead of having mode bits, one _could_ do the following:
>
> Usually, have 28 bit instructions that are shorter because there's
> only one opcode for each floating and integer operation. The first
> four bits in a block give the lengths of data to be used.
>
> But have one value for the first four bits in a block that indicates
> 36-bit instructions instead, which do include type information, so
> that very occasional instructions for rarely-used types can be mixed
> in which don't fill a whole block.
>
> While that's a theoretical possibility, I don't view it as being
> worthwhile in practice.

I played around a bit with another scheme: Encoding things into
128-bit blocks, with either 21-bit or 42-bit or longer instructions
(or a block header with six bits, and 20 or 40 bits for each
instruction).
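
For reference, the capacity of a 128-bit block under both variants works out as follows. This is a Python sketch for illustration; the helper name is made up here:

```python
BLOCK_BITS = 128


def capacity(instr_bits, header_bits=0):
    """Return (instructions per block, unused bits) for one layout."""
    return divmod(BLOCK_BITS - header_bits, instr_bits)


print("21-bit, no header:   ", capacity(21))
print("42-bit, no header:   ", capacity(42))
print("20-bit, 6-bit header:", capacity(20, header_bits=6))
print("40-bit, 6-bit header:", capacity(40, header_bits=6))
```

Either way a block holds six short or three long instructions, with two bits to spare.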

Did that look promising? Not really; the 21 bits offered a lot
of useful opcode space for two-register operations and even for
a few of the often-used three-register, but 42 bits was really
a bit too long, so the advantage wasn't great. And embedding
32-bit or 64-bit instructions in the code stream does not really
fit the 21-bit raster well, so compared to an ISA which can do so
(like My 66000) it came out at a disadvantage. Might be possible
to beat RISC-V, though.

Re: "Mini" tags to reduce the number of op codes

<d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>

https://news.novabbs.org/devel/article-flat.php?id=38245&group=comp.arch#38245
Date: Sun, 7 Apr 2024 21:22:50 +0000
Subject: Re: "Mini" tags to reduce the number of op codes
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$S6/F.OSUdAdpQKCF/ypUae8WizQ4TQ74y.lEDFYl5m9SjZbk9or3G
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com> <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org> <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
Organization: Rocksolid Light
Message-ID: <d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
 by: MitchAlsup1 - Sun, 7 Apr 2024 21:22 UTC

Thomas Koenig wrote:

> John Savard <quadibloc@servername.invalid> schrieb:

>> Thus, instead of having mode bits, one _could_ do the following:
>>
>> Usually, have 28 bit instructions that are shorter because there's
>> only one opcode for each floating and integer operation. The first
>> four bits in a block give the lengths of data to be used.
>>
>> But have one value for the first four bits in a block that indicates
>> 36-bit instructions instead, which do include type information, so
>> that very occasional instructions for rarely-used types can be mixed
>> in which don't fill a whole block.
>>
>> While that's a theoretical possibility, I don't view it as being
>> worthwhile in practice.

> I played around a bit with another scheme: Encoding things into
> 128-bit blocks, with either 21-bit or 42-bit or longer instructions
> (or a block header with six bits, and 20 or 40 bits for each
> instruction).

Not having seen said encoding scheme:: I suspect you used the Rd=Rs1
destructive operand model for the 21-bit encodings. Yes :: no ??
Otherwise one has 3×5-bit registers = 15-bits leaving only 6-bits
for 64 OpCodes. Now if you have floats and doubles and signed and
unsigned, you get 16 of each and we have not looked at memory
references or branching.
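
The bit budget behind that estimate, as a short Python sketch (the function name is made up for illustration, and 5-bit register fields are assumed as in the text above):

```python
def opcode_space(instr_bits, reg_fields, reg_bits=5):
    """Return (free opcode bits, number of opcodes) after register fields."""
    free = instr_bits - reg_fields * reg_bits
    return free, 1 << free


three_reg = opcode_space(21, 3)  # Rd, Rs1, Rs2
two_reg = opcode_space(21, 2)    # destructive form, Rd=Rs1

print("3-register:", three_reg)
print("2-register:", two_reg)
print("per type (float, double, signed, unsigned):", three_reg[1] // 4)
```

The destructive two-register form frees five more bits, which is where the extra opcode room in a 21-bit encoding would have to come from.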

> Did that look promising? Not really; the 21 bits offered a lot
> of useful opcode space for two-register operations and even for
> a few of the often-used three-register, but 42 bits was really
> a bit too long, so the advantage wasn't great. And embedding
> 32-bit or 64-bit instructions in the code stream does not really
> fit the 21-bit raster well, so compared to an ISA which can do so
> (like My 66000) it came out at a disadvantage. Might be possible
> to beat RISC-V, though.

But beating RISC-V is easy, try getting your instruction count down
to VAX counts without losing the ability to pipeline and parallelize
instruction execution.

At handwaving accuracy (relative instruction counts)::
VAX has 1.0 instructions
My 66000 has 1.1 instructions
RISC-V has 1.5 instructions

Re: "Mini" tags to reduce the number of op codes

<uv02dn$3b6ik$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=38247&group=comp.arch#38247
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Mon, 8 Apr 2024 06:21:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 78
Message-ID: <uv02dn$3b6ik$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
<6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
<15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
<lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
<d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 08 Apr 2024 06:21:43 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d6deb714ccc72ec1e80045745fb7bd3c";
logging-data="3512916"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+2XGIqqUzjk8K8QMlGuMyfqps8HSdazN8="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:J4ZoztDGY+j7kyoZlsxCyeQmbHg=
 by: Thomas Koenig - Mon, 8 Apr 2024 06:21 UTC

MitchAlsup1 <mitchalsup@aol.com> schrieb:
> Thomas Koenig wrote:
>
>> John Savard <quadibloc@servername.invalid> schrieb:
>
>>> Thus, instead of having mode bits, one _could_ do the following:
>>>
>>> Usually, have 28 bit instructions that are shorter because there's
>>> only one opcode for each floating and integer operation. The first
>>> four bits in a block give the lengths of data to be used.
>>>
>>> But have one value for the first four bits in a block that indicates
>>> 36-bit instructions instead, which do include type information, so
>>> that very occasional instructions for rarely-used types can be mixed
>>> in which don't fill a whole block.
>>>
>>> While that's a theoretical possibility, I don't view it as being
>>> worthwhile in practice.
>
>> I played around a bit with another scheme: Encoding things into
>> 128-bit blocks, with either 21-bit or 42-bit or longer instructions
>> (or a block header with six bits, and 20 or 40 bits for each
>> instruction).
>
> Not having seen said encoding scheme:: I suspect you used the Rd=Rs1
> destructive operand model for the 21-bit encodings. Yes :: no ??

It was not very well developed, I gave it up when I saw there wasn't
much to gain.

> Otherwise one has 3×5-bit registers = 15-bits leaving only 6-bits
> for 64 OpCodes.

There could have been a case for adding this (maybe just for
a few frequent ones: "add r1,r2,r3", "add r1,r2,-r3", "add
r1,r2,#num" and "add r1,r2,#-num"), but I did not pursue that
further.

I looked at load and store instructions with short offsets
(these would then have been scaled), and short branches. But
the 21-bit opcode space filled up really, really rapidly.

Also, it is easy to synthesize a 3-register operation from
a 2-register operation and a memory move. If the decoder is
set up for 42 bits anyway, instruction fusion is also a possibility.
This got a bit weird.
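
A rough Python sketch of such a synthesis, assuming the move is a plain register-to-register move and that two-operand instructions compute rd = rd op rs. The mnemonics, the scratch register name and the set of commutative operations are made up for illustration:

```python
COMMUTATIVE = {"add", "and", "or", "xor"}  # illustrative set only


def expand3(op, rd, rs1, rs2, tmp="rt"):
    """Expand 'op rd, rs1, rs2' into two-operand instructions."""
    if rd == rs1:
        # Already in destructive form.
        return [f"{op} {rd}, {rs2}"]
    if rd != rs2:
        # Common case: copy the first source, then operate in place.
        return [f"mov {rd}, {rs1}", f"{op} {rd}, {rs2}"]
    if op in COMMUTATIVE:
        # rd aliases rs2, but operand order does not matter.
        return [f"{op} {rd}, {rs1}"]
    # rd aliases rs2 and order matters: go through a scratch register,
    # which is assumed to be free at this point.
    return [f"mov {tmp}, {rs1}", f"{op} {tmp}, {rs2}", f"mov {rd}, {tmp}"]


print(expand3("add", "r1", "r2", "r3"))
print(expand3("sub", "r1", "r2", "r1"))
```

The mov/op pair in the common case is the kind of sequence a decoder could fuse back into one internal operation.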

> Now if you have floats and doubles and signed and
> unsigned, you get 16 of each and we have not looked at memory
> references or branching.

As somebody who does Fortran, I find the frequency of floating
point instructions surprisingly low, even in Fortran code.
>
>> Did that look promising? Not really; the 21 bits offered a lot
>> of useful opcode space for two-register operations and even for
>> a few of the often-used three-register, but 42 bits was really
>> a bit too long, so the advantage wasn't great. And embedding
>> 32-bit or 64-bit instructions in the code stream does not really
>> fit the 21-bit raster well, so compared to an ISA which can do so
>> (like My 66000) it came out at a disadvantage. Might be possible
>> to beat RISC-V, though.
>
> But beating RISC-V is easy, try getting you instruction count down
> to VAX counts without losing the ability to pipeline and parallel
> instruction execution.

> At handwaving accuracy::
> VAX has 1.0 instructions
> My 66000 has 1.1 instructions
> RISC-V has 1.5 instructions

To reach VAX instruction density, one would have to have things
like memory operands (with the associated danger that compilers
will not put intermediate results in registers, but since they have
been optimized for x86 for decades, they are probably better now)
and load with update, which would then have to be cracked
into two micro-ops. Not sure about the benefit.

Re: "Mini" tags to reduce the number of op codes

<2024Apr8.091608@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=38248&group=comp.arch#38248
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Mon, 08 Apr 2024 07:16:08 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 26
Message-ID: <2024Apr8.091608@mips.complang.tuwien.ac.at>
References: <uuk100$inj$1@dont-email.me> <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com> <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org> <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me> <d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org> <uv02dn$3b6ik$1@dont-email.me>
Injection-Date: Mon, 08 Apr 2024 07:38:29 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="9e31c17914111ad31972de48734f5b7d";
logging-data="3545801"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qrS+kuPE9j7zE4T8Ka6Vf"
Cancel-Lock: sha1:ybT3oODmCsS+ZXySxl/sT0apsB0=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 8 Apr 2024 07:16 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>> But beating RISC-V is easy, try getting you instruction count down
>> to VAX counts without losing the ability to pipeline and parallel
>> instruction execution.
>
>> At handwaving accuracy::
>> VAX has 1.0 instructions
>> My 66000 has 1.1 instructions
>> RISC-V has 1.5 instructions
>
>To reach VAX instruction density

Note that in recent times Mitch Alsup is writing not about code
density (static code size or dynamically executed bytes), but about
instruction counts. It's unclear why instruction count would be a
primary metric, except that he thinks that he can score points for My
66000 with it. As VAX demonstrates, you can produce an instruction
set with low instruction counts that is bad at the metrics that really
count: cycles for executing the program (for a given CPU chip area in
a given manufacturing process), and, for very small systems, static
code size.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
