Alternative Representations of the Concertina II ISA

<ujp81t$26ff9$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35173&group=comp.arch#35173

 by: Quadibloc - Fri, 24 Nov 2023 04:15 UTC

The Concertina II ISA, as it stands now, has a
very limited number of opcodes for load and
store memory-reference operations, but a huge
number of opcodes for register-to-register
operate instructions.

This makes it suitable for an implementation where:

Physical memory is some power-of-two multiple of
96 bits in width, for example, 192 bits or 384 bits
wide;

A simple hardware divide-by-three circuit allows
accessing memory normally with the RAM module width
of 64 bits as the fundamental unit;

Applying no conversion allows 24 bits to be the
fundamental unit;

An even simpler _multiply_ by five circuit allows
accessing memory with a 60-bit word as the
fundamental unit.

So it's possible to take an implementation of the
architecture with three-channel memory, and add the
feature of allocating data memory blocks that look
like they're made up of 48-bit words or 60-bit words.
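
As a rough illustration of the address arithmetic described above, here is a minimal Python sketch. It is not part of the Concertina II specification: the 192-bit row width is just the smallest example width mentioned, and the function names, the packing order, and the idea that sixteen 60-bit words are packed into five consecutive rows are all assumptions made for the example.

```python
ROW_BITS = 192  # assumed physical row width: three 64-bit channels

def locate_64(w):
    """64-bit words: three per row, so the row number needs a divide by three."""
    return w // 3, (w % 3) * 64

def locate_24(w):
    """24-bit words: eight per row, so only a shift and a mask are needed."""
    return w >> 3, (w & 7) * 24

def locate_60(w):
    """60-bit words: assumed packing of sixteen words into five rows (960 bits),
    so the group number is multiplied by five to find the first row."""
    group, index = w >> 4, w & 15
    bit = index * 60                      # bit position inside the 960-bit group
    return group * 5 + bit // ROW_BITS, bit % ROW_BITS

# Each call returns (row number, bit offset inside that row).
print(locate_64(4), locate_24(9), locate_60(16))
```

Under this assumed packing a 60-bit word can straddle two rows (word 3 starts at bit 180 of row 0), so a real implementation would need to fetch both rows or pick a different layout.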

When a memory-reference instruction refers to an
address which is inside of such a variant block,
its operation would be modified to handle a data
type of an appropriate width for that kind of memory.

Thus:

In 60-bit memory,

Load Floating would be an invalid operation;
Load Double would load a 60-bit floating-point number;
Load Medium would load a 45-bit floating-point number, as
such a float would still be wide enough to be useful.

Load Byte would be an invalid operation.
Load Halfword would load a 15-bit integer;
Load would load a 30-bit integer;
Load Long would load a 60-bit integer.

In 36-bit memory,

Load Floating would load a 36-bit floating-point number;
Load Double would load a 72-bit floating-point number;
Load Medium would load a 54-bit floating-point number.

Load Byte would load a 9-bit byte;
Load Halfword would load an 18-bit halfword integer;
Load would load a 36-bit integer;
Load Long would load a 72-bit integer into a _pair_
of registers, the first of which would be an even-numbered
register, as the registers are only 64 bits long.

In 24-bit memory,

Load Floating would be an invalid operation;
Load Double would load a 72-bit floating-point number;
Load Medium would load a 48-bit floating-point number;

Load Byte would load a 6-bit character;
Load Halfword would load a 12-bit integer;
Load would load a 24-bit integer;
Load Long would load a 48-bit integer.

Note that the relationship between floating-point
widths and integer widths in instructions is offset
by a factor of two when 24-bit memory is referenced.

And there would be separate opcodes for doing arithmetic
on numbers of these different lengths in the registers.
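
The three lists above can be read as a lookup table keyed by memory width and load opcode. The following Python sketch only restates the widths given in this post; the table name, the helper functions, and the use of None for an invalid operation are assumptions made for illustration.

```python
INTEGER_REGISTER_BITS = 64
INTEGER_LOADS = ("Load Byte", "Load Halfword", "Load", "Load Long")

# Operand width in bits for each load opcode; None marks an invalid operation.
LOAD_WIDTHS = {
    60: {"Load Floating": None, "Load Double": 60, "Load Medium": 45,
         "Load Byte": None, "Load Halfword": 15, "Load": 30, "Load Long": 60},
    36: {"Load Floating": 36, "Load Double": 72, "Load Medium": 54,
         "Load Byte": 9, "Load Halfword": 18, "Load": 36, "Load Long": 72},
    24: {"Load Floating": None, "Load Double": 72, "Load Medium": 48,
         "Load Byte": 6, "Load Halfword": 12, "Load": 24, "Load Long": 48},
}

def load_width(memory_width, opcode):
    """Return the operand width, or raise if the load is invalid there."""
    width = LOAD_WIDTHS[memory_width][opcode]
    if width is None:
        raise ValueError(f"{opcode} is invalid in {memory_width}-bit memory")
    return width

def needs_register_pair(memory_width, opcode):
    """True when an integer load is wider than one 64-bit register."""
    return (opcode in INTEGER_LOADS
            and load_width(memory_width, opcode) > INTEGER_REGISTER_BITS)

print(load_width(60, "Load Medium"), needs_register_pair(36, "Load Long"))
```

Only the integer loads are checked against the 64-bit register width, because the post does not state how wide the floating-point registers are.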

This is all very well. But in program code, which, as
it is so far specified, can _only_ reside in 64-bit memory,
what about pseudo-immediate values?

Since the instruction format that references pseudo-immediate
values is that of an operate instruction, pseudo-immediates
_could_ be allowed, with considerable wasted space in most
cases, in operate instructions for the data types associated
with variant memory widths.

However, I have thought of a way to allow the CPU to treat
the different memory widths with greater equality.

Instead of devising a whole new ISA for each of the other
memory widths, the *existing* ISA, designed around standard
memory built up from the 64-bit unit, could be given
_alternative representations_ wherein programs could be
stored in memory of other widths, using the same ISA with
minimum modifications.

The 36-bit Representation

In the standard 64-bit representation, a block may begin
with 1100, followed by a 2-bit prefix field for each of the
remaining fourteen 16-bit parts of the remaining seven
32-bit instruction slots in the block.

So, make that portion of the ISA the only one supported in
the 36-bit representation. A block is now 288 bits long and
holds eight 36-bit words; it consists of a 2-bit prefix field
followed by 16 instruction bits, repeated sixteen times.
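
A small Python sketch of that layout may help. It packs sixteen (prefix, 16-bit) pairs into eight 36-bit words, 288 bits in all. The function name is hypothetical, and placing the prefix in the top two bits of each 18-bit half, with the first half in the high-order bits of the word, is an assumption about bit order that the post does not spell out.

```python
def pack_block_36(halfwords):
    """Pack sixteen (prefix, bits16) pairs into eight 36-bit words."""
    if len(halfwords) != 16:
        raise ValueError("a block holds exactly sixteen halfwords")
    words = []
    for i in range(0, 16, 2):
        word = 0
        for prefix, bits16 in halfwords[i:i + 2]:
            if not (0 <= prefix < 4 and 0 <= bits16 < (1 << 16)):
                raise ValueError("prefix is 2 bits, instruction part is 16 bits")
            # Each 18-bit half is a 2-bit prefix followed by 16 instruction bits.
            word = (word << 18) | (prefix << 16) | bits16
        words.append(word)
    return words

block = pack_block_36([(0b10, 0xFFFF)] + [(0, 0)] * 15)
print(len(block), oct(block[0]))
```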

But it must still be possible to have block headers if one is
to have pseudo-immediates.

So the first 36-bit instruction slot of a block, if it is to
be a header, would start with 1011100011, and the prefix field
portion of the second 18 bits of that instruction slot would
contain 11. 1011100011 consists of 10 (32-bit instruction)
and 11100011 (header prefix in 32-bit mode).
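
The header test just described can be sketched in Python as follows. The function and constant names are hypothetical, and the code assumes the first 18-bit half sits in the high-order bits of the 36-bit slot with its prefix field on top.

```python
# 10 (32-bit instruction) followed by 11100011 (header prefix in 32-bit mode).
HEADER_LEAD = (0b10 << 8) | 0b11100011

def is_block_header(word36):
    """Check the two conditions given for a header in the first slot."""
    lead = word36 >> 26                    # top ten bits of the 36-bit slot
    second_prefix = (word36 >> 16) & 0b11  # prefix field of the second 18 bits
    return lead == HEADER_LEAD and second_prefix == 0b11

print(bin(HEADER_LEAD), is_block_header((HEADER_LEAD << 26) | (0b11 << 16)))
```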

So the ISA is modified, because in 64-bit memory only seven
instruction slots, not all eight, can contain 36-bit instructions,
and there are no headers inside 36-bit instructions. But the
modifications are minimal, for the purpose of adapting programs
to the different size of memory, with that size being native
for them.

The 24-bit representation

Since a 24-bit word contains three 8-bit bytes, here, the
form of instructions isn't modified at all. But blocks shrink
from eight instruction slots that are 32 bits in length to
eight 24-bit words, so a block would be 192 bits long.
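
As a sketch of the arithmetic, 192 bits is room for six unmodified 32-bit instructions, stored as eight 24-bit words. The Python below shows one way to do that repacking; the function name and the big-endian packing order are assumptions, and the figure of six instructions per block is inferred from the sizes rather than stated in the post.

```python
def pack_block_24(instructions):
    """Pack six 32-bit instructions into eight 24-bit words (192 bits)."""
    if len(instructions) != 6:
        raise ValueError("192 bits hold exactly six 32-bit instructions")
    stream = 0
    for ins in instructions:
        if not 0 <= ins < (1 << 32):
            raise ValueError("each instruction must fit in 32 bits")
        stream = (stream << 32) | ins      # concatenate, first instruction on top
    # Slice the 192-bit stream into eight 24-bit words, most significant first.
    return [(stream >> (24 * (7 - i))) & 0xFFFFFF for i in range(8)]

words = pack_block_24([0x11223344, 0x55667788, 0, 0, 0, 0])
print([hex(w) for w in words[:3]])
```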

The amount of wasted space for data in 24-bit types wouldn't
be changed much in many cases by using this representation
instead of the 32-bit representation for program code,
because the clash between instruction widths and data widths
wouldn't be affected. However, _some_ improvement would be
achieved, because now if there are multiple pseudo-immediate
values used in a single block, the 24-bit data items could
at least be *packed* correctly in the block.

The 60-bit representation

Here, a block is 480 bits long.

It consists of two 240-bit sub-blocks, in the following form:

One 15-bit halfword containing 15 prefix bits,

and fifteen 15-bit halfwords to which those bits apply,
turning them into 16-bit halfwords, which are used to form
instructions.
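
A Python sketch of decoding one 240-bit sub-block under this scheme follows. The function name is hypothetical, and two details are assumptions: that each prefix bit becomes the most significant bit of its 16-bit halfword, and that the prefix bits are applied in most-significant-first order.

```python
def decode_subblock_60(halfwords15):
    """Turn sixteen 15-bit halfwords into fifteen 16-bit halfwords."""
    if len(halfwords15) != 16:
        raise ValueError("a sub-block is sixteen 15-bit halfwords (240 bits)")
    if not all(0 <= h < (1 << 15) for h in halfwords15):
        raise ValueError("every halfword must fit in 15 bits")
    prefix_bits, payload = halfwords15[0], halfwords15[1:]
    decoded = []
    for i, half in enumerate(payload):
        top = (prefix_bits >> (14 - i)) & 1   # prefix bit for halfword i
        decoded.append((top << 15) | half)
    return decoded

out = decode_subblock_60([0b100000000000001] + [0x7FFF] * 15)
print(hex(out[0]), hex(out[1]), hex(out[14]))
```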

The blocks are aligned on 480-bit boundaries. When code
consists of 32-bit instructions, the first sub-block begins
with an instruction, and the second sub-block begins with the
second half of an instruction.

The first whole 32-bit instruction in a sub-block can be a
header, and so one has a header applying to seven subsequent
instruction slots in the first sub-block, and a header applying
to six subsequent instruction slots in the second sub-block.
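
The slot counts can be checked with a little arithmetic. In the Python sketch below, a block holds thirty 16-bit halfwords, which is fifteen 32-bit instructions, and the function reports which sub-block or sub-blocks each instruction falls in. The function name and the zero-based numbering are assumptions made for the example.

```python
HALFWORDS_PER_SUBBLOCK = 15

def subblocks_of(instruction_index):
    """Sub-blocks (0 or 1) touched by a 32-bit instruction within one block."""
    first_half = 2 * instruction_index
    return sorted({first_half // HALFWORDS_PER_SUBBLOCK,
                   (first_half + 1) // HALFWORDS_PER_SUBBLOCK})

for i in range(15):
    print(i, subblocks_of(i))
```

On this reading, instructions 0 to 6 lie wholly in the first sub-block, instruction 7 straddles the boundary, and instructions 8 to 14 lie wholly in the second. A header in slot 0 is then followed by seven slots if the straddling instruction is counted, and a header in slot 8 by six.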

John Savard

Re: Alternative Representations of the Concertina II ISA

<ujp9jm$26sr2$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35175&group=comp.arch#35175

 by: Quadibloc - Fri, 24 Nov 2023 04:42 UTC

On Fri, 24 Nov 2023 04:15:57 +0000, Quadibloc wrote:

> In 24-bit memory,
>
> Load Floating would be an invalid operation
> Load Double would load a 72-bit floating-point number;
> Load Medium would load a 48-bit floating-point number;

Alternatively, it would perhaps be simpler to do it this way:

Load Floating would load a 48-bit floating-point number;
Load Medium would load a 72-bit floating-point number.

This would prioritize structure over function, but only
the compiler would ever know...

The 24-bit representation of code, however, has a bigger problem.

In the 60-bit representation, a 15-bit halfword that can be
addressed easily in 60-bit memory corresponds to a 16-bit area
in the original ISA;

In the 36-bit representation, an 18-bit halfword, easily addressable
in 36-bit memory, does the same.

In the 24-bit representation, though, a problem exists; the simplest
way to deal with that would be, in code consisting entirely of 32-bit
instructions, to allow only every third instruction to be a branch
target.
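
The "every third instruction" rule follows from the sizes: three 32-bit instructions occupy 96 bits, which is exactly four 24-bit words. A short Python check, with a hypothetical function name:

```python
def starts_on_word_boundary(instruction_index, instruction_bits=32, word_bits=24):
    """True when the instruction begins exactly at the start of a memory word."""
    return (instruction_index * instruction_bits) % word_bits == 0

aligned = [i for i in range(9) if starts_on_word_boundary(i)]
print(aligned)
```

Only instructions 0, 3, 6, and so on begin on a 24-bit word boundary, so only they can be addressed as branch targets without a sub-word offset.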

John Savard

Re: Alternative Representations of the Concertina II ISA

<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35192&group=comp.arch#35192

 by: MitchAlsup - Fri, 24 Nov 2023 18:48 UTC

Quadibloc wrote:

> The Concertina II ISA, as it stands now, has a
> very limited number of opcodes for load and
> store memory-reference operations, but a huge
> number of opcodes for register-to-register
> operate instructions.

> This makes it suitable for an implementation where:

> Physical memory is some power-of-two multiple of
> 96 bits in width, for example, 192 bits or 384 bits
> wide;

> A simple hardware divide-by-three circuit allows
> accessing memory normally with the RAM module width
> of 64 bits as the fundamental unit;

> Applying no conversion allows 24 bits to be the
> fundamental unit;

> An even simpler _multiply_ by five circuit allows
> accessing memory with a 60-bit word as the
> fundamental unit.

> So it's possible to take an implementation of the
> architecture with three-channel memory, and add the
> feature of allocating data memory blocks that look
> like they're made up of 48-bit words or 60-bit words.

> When a memory-reference instruction refers to an
> address which is inside of such a variant block,
> its operation would be modified to handle a data
> type of an appropriate width for that kind of memory.

> Thus:

> In 60-bit memory,

> Load Floating would be an invalid operation;
> Load Double would load a 60-bit floating-point number;
> Load Medium would load a 45-bit floating-point number, as
> such a float would still be wide enough to be useful.

I have not found a need to have LD/ST floating point instructions
because I don't have FP register file(s) {or SIMD files}. Why do
you seem to want/need a file dedicated to floating point values ??
This saves a huge chunk of OpCode Space.......

Re: Alternative Representations of the Concertina II ISA

<ujr8l8$2g4e8$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35194&group=comp.arch#35194

 by: Quadibloc - Fri, 24 Nov 2023 22:38 UTC

On Fri, 24 Nov 2023 18:48:53 +0000, MitchAlsup wrote:

> I have not found a need to have LD/ST floating point instructions
> because I don't have FP register file(s) {or SIMD files}. Why do you
> seem to want/need a file dedicated to floating point values ??
> This saves a huge chunk of OpCode Space.......

Well, for one thing, by giving the integer functional unit one register
file, and the floating-point functional unit its own register file, then
those two functional units are not constrained to be adjacent.

I can stick the Decimal Floating-Point functional unit on the other side
of the floating register file, and the Simple Floating functional unit,
or the Register Packed Decimal functional unit, on the other side of the
integer register file.

But of course my main reason is that this is the way everyone (i.e.
the IBM System/360) does it, so of course I should do it that way!

But having an excess of data types is at least a helpful excuse...

John Savard

Re: Alternative Representations of the Concertina II ISA

<ujr8pv$2g4e8$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35195&group=comp.arch#35195

 by: Quadibloc - Fri, 24 Nov 2023 22:41 UTC

On Fri, 24 Nov 2023 04:42:31 +0000, Quadibloc wrote:

> On Fri, 24 Nov 2023 04:15:57 +0000, Quadibloc wrote:
>
>> In 24-bit memory,
>>
>> Load Floating would be an invalid operation
>> Load Double would load a 72-bit floating-point number;
>> Load Medium would load a 48-bit floating-point number;
>
> Alternatively, it would perhaps be simpler to do it this way:
>
> Load Floating would load a 48-bit floating-point number;
> Load Medium would load a 72-bit floating-point number.
>
> This would prioritize structure over function, but only the compiler
> would ever know...

Silly me. How could I have forgotten that 24-bit memory is the
only place where the "ideal" lengths of floating-point numbers
can exist, so I should use them, rather than try to concoct
something that's 24-bit native?

So

Load Floating would load a 36-bit floating-point number,
Load Medium would load a 48-bit floating-point number, and
Load Double would load a 60-bit floating-point number.

John Savard

Re: Alternative Representations of the Concertina II ISA

<ujr90g$2g7qk$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35196&group=comp.arch#35196

 by: Quadibloc - Fri, 24 Nov 2023 22:44 UTC

On Fri, 24 Nov 2023 22:38:32 +0000, Quadibloc wrote:

> On Fri, 24 Nov 2023 18:48:53 +0000, MitchAlsup wrote:
>
>> I have not found a need to have LD/ST floating point instructions
>> because I don't have FP register file(s) {or SIMD files}. Why do you
>> seem to want/need a file dedicated to floating point values ??
>> This saves a huge chunk of OpCode Space.......

Also, because IEEE 754 calls for a hidden first bit, and
gradual underflow, these things make floating-point
arithmetic slightly slower.

Hence, I deal with them at load and store time, by using
the load and store floating point instructions to convert
to and from an *internal representation* of floats. That
in itself precludes using the integer instructions for them.
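
To illustrate the kind of conversion a floating-point load might perform, here is a Python sketch that unpacks an IEEE 754 binary32 value into a form with the leading bit made explicit and subnormals normalised. The actual internal representation is not described in this post, so the (sign, exponent, significand) triple, the function name, and the choice of binary32 are all assumptions.

```python
import struct

def unpack_binary32(x):
    """Split a binary32 value into (sign, exponent, significand), where the
    value equals (-1)**sign * significand * 2**(exponent - 23) and a nonzero
    significand always has its leading 1 at bit 23."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exp_field == 0xFF:
        raise ValueError("infinities and NaNs are left out of this sketch")
    if exp_field == 0:
        if fraction == 0:
            return sign, 0, 0              # zero has no leading 1
        exponent = -126
        while fraction < (1 << 23):        # gradual underflow: renormalise
            fraction <<= 1
            exponent -= 1
        return sign, exponent, fraction
    return sign, exp_field - 127, fraction | (1 << 23)  # restore the hidden bit

print(unpack_binary32(1.0), unpack_binary32(2.0 ** -149))
```

Once the hidden bit and the subnormal case have been handled at load time, the arithmetic units no longer need to special-case either, which is the speed argument made above.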

John Savard

Re: Alternative Representations of the Concertina II ISA

<ujr9i4$2gau2$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35197&group=comp.arch#35197

 by: Quadibloc - Fri, 24 Nov 2023 22:53 UTC

On Fri, 24 Nov 2023 04:15:57 +0000, Quadibloc wrote:

> An even simpler _multiply_ by five circuit allows accessing memory with
> a 60-bit word as the fundamental unit.

As well, a multiply by three circuit is used to address the 36-bit memory.
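
To show why such circuits are cheap, here is a small Python sketch: both
scalings reduce to one shift and one add. The 12-bit addressing unit
(36 = 3 x 12, 60 = 5 x 12) and the function names are assumptions chosen
for illustration, not details taken from the ISA description.

```python
UNIT_BITS = 12  # assumed common addressing unit, for illustration only

def times3(index: int) -> int:
    """index * 3 as one shift and one add."""
    return (index << 1) + index

def times5(index: int) -> int:
    """index * 5 as one shift and one add."""
    return (index << 2) + index

def unit_offset(word_index: int, word_bits: int) -> int:
    """Offset, in 12-bit units, of a 36-bit or 60-bit word."""
    if word_bits == 36:
        return times3(word_index)
    if word_bits == 60:
        return times5(word_index)
    raise ValueError("this sketch covers 36-bit and 60-bit words only")
```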

> Instead of devising a whole new ISA for each of the other memory widths,
> the *existing* ISA, designed around standard memory built up from the
> 64-bit unit, could be given _alternative representations_ wherein
> programs could be stored in memory of other widths, using the same ISA
> with minimum modifications.

While this is an "improvement", I do realize that this, along with even
supporting multiple memory widths, even if I have found ways that it can
technically be done, is sheer insanity.

I do have a rationale for going there. It's simple enough. It comes from
the x86.

Since the irresistible force of availability of applications means that
we are doomed to always have just *one* dominant ISA...

then, to minimize the terrible loss that this entails, that one dominant
ISA should be maximally flexible. To approach, as closely as possible, the
"good old days" where the PDP-10 and the CDC 6600 coexisted alongside the
PDP-11 and the IBM System/360.

This is the driving force behind my insane quest to design an ISA for a CPU
which has the ability to do something which no one seems to want.

John Savard

Re: Alternative Representations of the Concertina II ISA

<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35226&group=comp.arch#35226
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 25 Nov 2023 19:55:59 +0000
Organization: novaBBS
Message-ID: <f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2100096"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$3AHrrNYXTBlry9Qefvz13uw.arbyQHrvqRkBdVn3ShkpMIiN2VULO
 by: MitchAlsup - Sat, 25 Nov 2023 19:55 UTC

Quadibloc wrote:

> On Fri, 24 Nov 2023 18:48:53 +0000, MitchAlsup wrote:

>> I have not found a need to have LD/ST floating point instructions
>> because I don't have FP register file(s) {or SIMD files}. Why do you
>> seem to want/need a file dedicated to floating point values ??
>> This saves a huge chunk of OpCode Space.......

> Well, for one thing, by giving the integer functional unit one register
> file, and the floating-point functional unit its own register file, then
> those two functional units are not constrained to be adjacent.

But Integer Multiply and Divide can share the FPU that does these.
And we have the need to use FP values in deciding to take branches.
So there is a lot of cross pollination of types to FUs. And finally
you have to have 2 function units one to convert from FP to Int and
one to convert INT to FP.

Let's keep them apart and now we need a FU bigger than the FMAC unit
to perform integer multiplies and divides. In addition, you need a Bus
from the FPU side to the Condition unit that cooperates with integer
branch instructions.

It does not look like you can keep them "that far apart", and even when you
do there is a lot of cross wiring, some of which is more costly in HW
than keeping FP and Int in the same RF.

> I can stick the Decimal Floating-Point functional unit on the other side
> of the floating register file, and the Simple Floating functional unit,
> or the Register Packed Decimal functional unit, on the other side of the
> integer register file.

> But of course my main reason is that this is the way everyone (i.e.
> the IBM System/360) does it, so of course I should do it that way!

> But having an excess of data types is at least a helpful excuse...

> John Savard

Re: Alternative Representations of the Concertina II ISA

<ujtmfb$2u9mt$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35229&group=comp.arch#35229
Path: i2pn2.org!i2pn.org!news.niel.me!news.gegeweb.eu!gegeweb.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Sat, 25 Nov 2023 14:46:33 -0600
Organization: A noiseless patient Spider
Lines: 76
Message-ID: <ujtmfb$2u9mt$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me> <ujr90g$2g7qk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 25 Nov 2023 20:46:35 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="7453bc91ed922c1bfb3262e310d4156c";
logging-data="3090141"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+tUJ35hEc/XUinf4ucsz7o"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:TAaFZ5PE91U/xiEc7lJC3QTBJUM=
In-Reply-To: <ujr90g$2g7qk$1@dont-email.me>
Content-Language: en-US
 by: BGB - Sat, 25 Nov 2023 20:46 UTC

On 11/24/2023 4:44 PM, Quadibloc wrote:
> On Fri, 24 Nov 2023 22:38:32 +0000, Quadibloc wrote:
>
>> On Fri, 24 Nov 2023 18:48:53 +0000, MitchAlsup wrote:
>>
>>> I have not found a need to have LD/ST floating point instructions
>>> because I don't have FP register file(s) {or SIMD files}. Why do you
>>> seem to want/need a file dedicated to floating point values ??
>>> This saves a huge chunk of OpCode Space.......
>
> Also, because IEEE 754 calls for a hidden first bit, and
> gradual underflow, these things make floating-point
> arithmetic slightly slower.
>

DAZ/FTZ fixes the gradual underflow issue...
Not the standard solution, but works.

One could maybe also make a case for truncate-only rounding:
Easier to make bit-exact across implementations;
Cheapest option;
...

Though, truncate does occasionally have visible effects on code.
For example, the player's yaw angle in my Quake port was prone to drift,
had to manually add a small bias to stop the drifting.
...

Say, Round-Nearest at least having the advantage that looping relative
computations don't have a tendency to drift towards 0. Though, on the
other side, computations which are stable with truncate rounding will
tend to drift away from zero with round-nearest.
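
A small Python experiment can make the drift toward zero visible. It is
only a sketch under assumptions: `quantize` emulates a short significand
(10 bits here, chosen arbitrarily), and the loop repeatedly scales a value
by 1.1 and back, which is a stand-in for a looping relative computation
rather than the actual yaw-angle code.

```python
import math

def quantize(x: float, bits: int, truncate: bool) -> float:
    """Round x to `bits` significand bits.

    truncate=True chops toward zero; otherwise round to nearest (ties to even).
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scaled = m * (1 << bits)          # exact: scaling by a power of two
    n = math.trunc(scaled) if truncate else round(scaled)
    return math.ldexp(n, e - bits)

def scale_round_trip(steps: int, bits: int, truncate: bool) -> float:
    """Multiply by 1.1 and divide by 1.1 again, quantizing after each op."""
    x = 1.0
    for _ in range(steps):
        x = quantize(x * 1.1, bits, truncate)
        x = quantize(x / 1.1, bits, truncate)
    return x

truncated = scale_round_trip(100, 10, truncate=True)
nearest = scale_round_trip(100, 10, truncate=False)
```

In this particular setup the round-to-nearest value comes back to 1.0 on
every pass, while the truncated value ends up below 1.0, since each
truncation can only lose magnitude.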

In this case, this applied mostly to 'float' and 'short float' (when set
to be routed through the low-precision unit via a compiler flag), with
the main FPU (used for 'double'; and the others with default options)
still doing proper round-nearest.

I don't imagine a non-hidden bit representation would be much better here.

Renormalization would still be required regularly to avoid other issues,
and manual re-normalization would negatively affect performance.

Operations like FADD/FSUB would also still need to be able to
shift-align operands to be able to do their thing, so non-normalized
values would not save much here either.

Well, with the partial exception if one instead re-interprets the
floating-point to be more like variable-fixed-point, in which case the
compiler would need to deal with these issues (and probably the
programmer as well, say, to specify how many bits are above or below the
decimal point).

But, in this case, may as well save a few bits and just use "normal"
fixed-point (or maybe add a compiler type, like "__fixed(16,16)" or
similar, then have the compiler deal with the shifts and similar...).
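
As a sketch of the shifts the compiler would be dealing with, here is 16.16
fixed-point arithmetic written out in Python. The helper names are
hypothetical, and "__fixed(16,16)" remains a hypothetical type; this only
shows the arithmetic such a type would expand to.

```python
FRAC_BITS = 16            # bits below the binary point
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point, rounding to nearest."""
    return int(round(x * ONE))

def from_fixed(q: int) -> float:
    """Convert 16.16 fixed point back to a float."""
    return q / ONE

def fixed_mul(a: int, b: int) -> int:
    # The raw product has 32 fractional bits; shift back down to 16.
    # Python's >> is an arithmetic shift, so negative values round down.
    return (a * b) >> FRAC_BITS

def fixed_div(a: int, b: int) -> int:
    # Pre-shift the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b
```

Addition and subtraction need no adjustment at all, which is the main
attraction over an unnormalized floating-point format.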

> Hence, I deal with them at load and store time, by using
> the load and store floating point instructions to convert
> to and from an *internal representation* of floats. That
> in itself precludes using the integer instructions for them.
>

Possible.

I went with integer operations myself...

> John Savard

Re: Alternative Representations of the Concertina II ISA

<uk0rek$3flrp$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35274&group=comp.arch#35274
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 01:29:56 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <uk0rek$3flrp$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 Nov 2023 01:29:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b074260f4c9a601904ed11e0b3fc3bc5";
logging-data="3659641"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18MuNefclN3I6k6COPpzArNQPSz6FIJS4E="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:yxRlE1/QPJoe3U9SW0KEHky6NSQ=
 by: Quadibloc - Mon, 27 Nov 2023 01:29 UTC

On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:

> But Integer Multiply and Divide can share the FPU that does these.

But giving each one its own multiplier means more superscalar
goodness!

But you _do_ have a good point. This kind of "superscalar goodness"
is usually wasted, because most programs don't do an equal amount
of integer and FP arithmetic. So it would be way better to have
twice as many multipliers and dividers that are designed to be used
either way than having units of two kinds.

I'm assuming, though, that one can make the multipliers and dividers
simpler, hence faster, by making them single-use, and that it's
possible to know, for a given use case, what the ratio between integer
and FP operations will be. After all, integer multiplication discards
the MSBs of the result, and floating-point multiplication discards the
LSBs of the result.
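
To make the MSB/LSB distinction concrete, the Python sketch below forms the
full product in both cases and shows which end is kept. It is an
illustration under assumptions: `fp_mul_truncated` handles only positive
normal doubles and truncates instead of rounding, and neither function
models any particular hardware multiplier.

```python
import math

MASK64 = (1 << 64) - 1

def int_mul64(a: int, b: int) -> int:
    """Wrapping 64-bit multiply: keep the low 64 bits of the full product."""
    return (a * b) & MASK64

def fp_mul_truncated(x: float, y: float) -> float:
    """Multiply two positive normal doubles, keeping the top 53 product bits."""
    mx, ex = math.frexp(x)             # x = mx * 2**ex, 0.5 <= mx < 1
    my, ey = math.frexp(y)
    ix = int(mx * (1 << 53))           # 53-bit significand as an integer
    iy = int(my * (1 << 53))
    full = ix * iy                     # 105- or 106-bit product
    shift = full.bit_length() - 53     # count of low-order bits discarded
    top = full >> shift
    return math.ldexp(top, ex + ey - 106 + shift)
```

Both functions start from the same wide product; whether discarding one
end or the other makes a dedicated multiplier meaningfully simpler is a
separate question.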

> And we have the need to use FP values in deciding to take branches.

What gets used in taking branches is the _condition codes_. They're
conveyed from the integer and floating point functional units, as well
as all the other functional units, to a central place (the program
status word).

John Savard

Re: Alternative Representations of the Concertina II ISA

<aef629c4e888515175917d4b41fe41d8@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35275&group=comp.arch#35275
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 01:56:01 +0000
Organization: novaBBS
Message-ID: <aef629c4e888515175917d4b41fe41d8@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2230169"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$UszqVt7FeBpYoSy4S5FAueAdcRiZrewW8rwbRJpjb1ES9aXz8o2K.
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Mon, 27 Nov 2023 01:56 UTC

Quadibloc wrote:

> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:

>> But Integer Multiply and Divide can share the FPU that does these.

> But giving each one its own multiplier means more superscalar
> goodness!

> But you _do_ have a good point. This kind of "superscalar goodness"
> is usually wasted, because most programs don't do an equal amount
> of integer and FP arithmetic. So it would be way better to have
> twice as many multipliers and dividers that are designed to be used
> either way than having units of two kinds.

> I'm assuming, though, that one can make the multipliers and dividers
> simpler, hence faster, by making them single-use, and that it's
> possible to know, for a given use case, what the ratio between integer
> and FP operations will be.

In non-scientific code, there are 10 integer mul/divs per 1 FP mul/div.
In scientific code, there are 2 integer mul/divs per 10 FP mul/divs.

> After all, integer multiplication discards
> the MSBs of the result, and floating-point multiplication discards the
> LSBs of the result.

That means almost nothing, because an FMAC can end up with the very same
bits that are preferred for integer multiplies.

>> And we have the need to use FP values in deciding to take branches.

> What gets used in taking branches is the _condition codes_. They're
> conveyed from the integer and floating point functional units, as well
> as all the other functional units, to a central place (the program
> status word).

LD R4,[address of FP value]
BFEQ0 R4,some label

How do you do this on a condition-code machine in 2 instructions?
System/360 had to:

LD F4,[address of FP value]
TST F4
BEQ some label

> John Savard

Re: Alternative Representations of the Concertina II ISA

<uk1cbs$3lign$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35280&group=comp.arch#35280
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 00:18:34 -0600
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <uk1cbs$3lign$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Nov 2023 06:18:36 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2fcebed1d03588c400a4bb58f77147d2";
logging-data="3852823"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/usYdWTQjJ1sjFOqo5c9HV"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:THSScwE7q2Z4ZVH//XB8HK1dFEU=
In-Reply-To: <uk0rek$3flrp$1@dont-email.me>
Content-Language: en-US
 by: BGB - Mon, 27 Nov 2023 06:18 UTC

On 11/26/2023 7:29 PM, Quadibloc wrote:
> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>
>> But Integer Multiply and Divide can share the FPU that does these.
>
> But giving each one its own multiplier means more superscalar
> goodness!
>
> But you _do_ have a good point. This kind of "superscalar goodness"
> is usually wasted, because most programs don't do an equal amount
> of integer and FP arithmetic. So it would be way better to have
> twice as many multipliers and dividers that are designed to be used
> either way than having units of two kinds.
>
> I'm assuming, though, that one can make the multipliers and dividers
> simpler, hence faster, by making them single-use, and that it's
> possible to know, for a given use case, what the ratio between integer
> and FP operations will be. After all, integer multiplication discards
> the MSBs of the result, and floating-point multiplication discards the
> LSBs of the result.
>

If one wants to have the most superscalar goodness, for general case
code, this means mostly lots of simple ALU ops (particularly,
ADD/SUB/AND/OR/SHIFT), and ability to do *lots* of memory Load/Store ops.

Where, say, Load/Store ops and simple ALU ops tend to dominate over
pretty much everything else in terms of usage frequency.

In most code, FPU ops are comparably sparse, as is MUL, and DIV/MOD is
fairly rare in comparison to either. MUL is common enough that you want
it to be fast when it happens, but not so common to where one is dealing
with piles of them all at the same time (except maybe in edge cases,
like the DCT transform in JPEG/MPEG).

If FMUL and MUL can't be co-issued, this is unlikely to matter.

DIV is uncommon enough that it can be 80 or 100 cycles or similar, and
most code will not notice (except in rasterizers or similar, which
actually need fast DIV, but can cheat in that they don't usually need
exact DIV and the divisors are small, allowing for "shortcuts", such as
using lookup tables of reciprocals or similar).
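
A reciprocal lookup table of this kind might look like the Python sketch
below. The table size (divisors 1..255), the 16-bit scaling, and the
operand range are assumptions picked for the example, not the parameters of
any particular rasterizer.

```python
RECIP_BITS = 16
MAX_DIVISOR = 255

# RECIP[d] = floor(2**16 / d); entry 0 is a placeholder and must not be used.
RECIP = [0] + [(1 << RECIP_BITS) // d for d in range(1, MAX_DIVISOR + 1)]

def approx_div(x: int, d: int) -> int:
    """Approximate x // d for 0 <= x < 2**16 and 1 <= d <= 255.

    The result is either x // d or one less, because each table entry is
    rounded down.
    """
    return (x * RECIP[d]) >> RECIP_BITS
```

One multiply and one shift replace the divide, at the price of an
occasional off-by-one result.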

Otherwise, one can debate whether or not having DIV/MOD in hardware
makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV is
at least "probably" faster than a generic software-only solution).

For cases like divide-by-constant, also, typically it is possible to
turn it in the compiler into a multiply by reciprocal.
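
The transformation can be sketched in Python as follows, for unsigned
operands. The function names are made up for this example; the constant is
chosen as ceil(2**(width + l) / d) with l = ceil(log2(d)), which is one
standard way to get an exact quotient over the whole operand range. The
multiplier can need one bit more than the operand width, which real
compilers handle with an extra add and shift.

```python
def magic_for(d: int, width: int = 32):
    """Return (multiplier, shift) so that (x * multiplier) >> shift == x // d
    for every 0 <= x < 2**width."""
    if d < 1:
        raise ValueError("divisor must be a positive integer")
    l = (d - 1).bit_length()                  # ceil(log2(d))
    shift = width + l
    multiplier = -((-(1 << shift)) // d)      # ceil(2**shift / d)
    return multiplier, shift

def div_by_const(x: int, multiplier: int, shift: int) -> int:
    """Divide by the constant that (multiplier, shift) was derived from."""
    return (x * multiplier) >> shift

# Example: unsigned 32-bit division by 10.
m10, s10 = magic_for(10)
```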

>> And we have the need to use FP values in deciding to take branches.
>
> What gets used in taking branches is the _condition codes_. They're
> conveyed from the integer and floating point functional units, as well
> as all the other functional units, to a central place (the program
> status word).
>

Yeah, better to not have this style of condition codes...

Like, there are other ways of doing conditional branches...

> John Savard

Re: Alternative Representations of the Concertina II ISA

<2023Nov27.101759@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35284&group=comp.arch#35284
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 09:17:59 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 56
Message-ID: <2023Nov27.101759@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="9f161ffc6bf8f4ad3914c9d4166bd6be";
logging-data="3909547"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Fk6bl3sDEXKwP6b+s9sw1"
Cancel-Lock: sha1:33xAEkD5dIrirw6m1ATh+4IEyT0=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 27 Nov 2023 09:17 UTC

BGB <cr88192@gmail.com> writes:
>On 11/26/2023 7:29 PM, Quadibloc wrote:
>> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>>
>>> But Integer Multiply and Divide can share the FPU that does these.
>>
>> But giving each one its own multiplier means more superscalar
>> goodness!

Having two multipliers that serve both purposes means even more
superscalar goodness for similar area cost. However, there is the
issue of latency. The Willamette ships the integers over to the FPU
for multiplication, and the result back, crossing several clock
domains (at one clock loss per domain crossing), resulting in a
10-cycle latency for integer multiplication. I think that these days
every high-performance core with real silicon invests into separate
GPR and FP/SIMD (including integer SIMD) multipliers.

>In most code, FPU ops are comparably sparse

In terms of executed ops, that depends very much on the code. GP
cores have acquired SIMD cores primarily for FP ops, as can be seen by
both SSE and AVX supporting only FP at first, and only later adding
integer stuff, because it cost little extra. Plus, we have added GPUs
that are now capable of doing huge amounts of FP ops, with uses in
graphics rendering, HPC and AI.

>Otherwise, one can debate whether or not having DIV/MOD in hardware
>makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV is
>at least "probably" faster than a generic software-only solution).

That debate has been held, and MIPS has hardware integer divide, Alpha
and IA-64 don't have a hardware integer divide; they both have FP
divide instructions.

However, looking at more recent architectures, the RISC-V M extension
(which is part of RV64G and RV32G, i.e., a standard extension) has not
just multiply instructions (MUL, MULH, MULHU, MULHSU, MULW), but also
integer divide instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW,
and REMUW. ARM A64 also has divide instructions (SDIV, UDIV), but
RISC-V seems significant to me because there the philosophy seems to
be to go for minimalism. So the debate has apparently come to the
conclusion that for general-purpose architectures, you include an
integer divide instruction.

>For cases like divide-by-constant, also, typically it is possible to
>turn it in the compiler into a multiply by reciprocal.

But for that you want a multiply instruction, which in the RISC-V case
means including the M extension, which also includes divide
instructions. Multiplying by the reciprocal may still be faster.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<uk25f4$3pc0r$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35285&group=comp.arch#35285
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 13:27:01 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <uk25f4$3pc0r$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 Nov 2023 13:27:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b074260f4c9a601904ed11e0b3fc3bc5";
logging-data="3977243"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8XZCHgeBQvsih9jcZye87IOolNLgXDbQ="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:FCYXJ1kxDCfK7miq1eP+prt8+fQ=
 by: Quadibloc - Mon, 27 Nov 2023 13:27 UTC

On Mon, 27 Nov 2023 09:17:59 +0000, Anton Ertl wrote:

> That debate has been held, and MIPS has hardware integer divide, Alpha
> and IA-64 don't have a hardware integer divide; they both have FP divide
> instructions.

I didn't know this about the Itanium. All I remembered hearing was that
the Itanium "didn't have a divide instruction", and so I didn't realize
this applied to fixed-point arithmetic only.

John Savard

Re: Alternative Representations of the Concertina II ISA

<2023Nov27.165403@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35289&group=comp.arch#35289
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 15:54:03 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 42
Message-ID: <2023Nov27.165403@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <uk25f4$3pc0r$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="9f161ffc6bf8f4ad3914c9d4166bd6be";
logging-data="4029951"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/h9ihShjVVk3Ysv9TYDIY9"
Cancel-Lock: sha1:zTSXbz3kdhZX7/hkE25P2tTipo8=
X-newsreader: xrn 10.11
 by: Anton Ertl - Mon, 27 Nov 2023 15:54 UTC

Quadibloc <quadibloc@servername.invalid> writes:
>On Mon, 27 Nov 2023 09:17:59 +0000, Anton Ertl wrote:
>
>> That debate has been held, and MIPS has hardware integer divide, Alpha
>> and IA-64 don't have a hardware integer divide; they both have FP divide
>> instructions.
>
>I didn't know this about the Itanium. All I remembered hearing was that
>the Itanium "didn't have a divide instruction", and so I didn't realize
>this applied to fixed-point arithmetic only.

You are right, I was wrong.
<http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf>
says:

|A number of floating-point operations defined by the IEEE Standard are
|deferred to software by the IA-64 architecture in all its
|implementations
|
|* floating-point divide (integer divide, which is based on the
|  floating-point divide operation, is also deferred to software)

The paper goes on to describe that FP division a/b is based on
determining 1/b through Newton-Raphson approximation, using the FMA
instruction, and then multiplying with a. It shows a sequence of 13
instructions for double precision, 1 frcpa and 12 fma or fnma
instructions.
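
The general shape of such a division can be sketched in Python. This is not
the 13-instruction sequence from the paper: the linear seed stands in for
frcpa, ordinary multiplies and adds stand in for fma/fnma, and the
iteration count is an assumption. Without a fused multiply-add the result
is close to a/b but not guaranteed to be correctly rounded.

```python
import math

def nr_reciprocal(b: float, iterations: int = 5) -> float:
    """Approximate 1/b for finite b > 0 by Newton-Raphson iteration."""
    if not (b > 0.0) or math.isinf(b):
        raise ValueError("this sketch handles finite b > 0 only")
    m, e = math.frexp(b)                  # b = m * 2**e with 0.5 <= m < 1
    # Crude linear seed for 1/m (a stand-in for a hardware estimate).
    y = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        err = 1.0 - m * y                 # residual (fnma-style step)
        y = y + y * err                   # correction (fma-style step)
    return math.ldexp(y, -e)              # undo the exponent scaling

def nr_divide(a: float, b: float) -> float:
    """a / b computed as a * (1/b)."""
    return a * nr_reciprocal(b)
```

Each pass roughly squares the relative error of the estimate, which is why
a handful of multiply-add steps is enough for double precision.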

Given that integer division is based on FP division, it probably takes
even more instructions.

Meanwhile, a lowly Cortex-A53 with its divide instruction produces an
integer division result with a small quotient in a few cycles. In
other words, if you want to use that method for division, still
provide a division instruction, and then implement it with that
method. This provides the flexibility to switch to some other method
(e.g., the one used by ARM) later.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<uk2rcr$3t44f$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35295&group=comp.arch#35295
Path: i2pn2.org!i2pn.org!news.samoylyk.net!news.gegeweb.eu!gegeweb.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Mon, 27 Nov 2023 13:41:12 -0600
Organization: A noiseless patient Spider
Lines: 253
Message-ID: <uk2rcr$3t44f$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Nov 2023 19:41:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2fcebed1d03588c400a4bb58f77147d2";
logging-data="4100239"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+KQnIFWDm1Lal99y31GFKb"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:HGdqVyQ1Wdj3hKMNcMR5Bvsb3sA=
Content-Language: en-US
In-Reply-To: <2023Nov27.101759@mips.complang.tuwien.ac.at>
 by: BGB - Mon, 27 Nov 2023 19:41 UTC

On 11/27/2023 3:17 AM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> On 11/26/2023 7:29 PM, Quadibloc wrote:
>>> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>>>
>>>> But Integer Multiply and Divide can share the FPU that does these.
>>>
>>> But giving each one its own multiplier means more superscalar
>>> goodness!
>
> Having two multipliers that serve both purposes means even more
> superscalar goodness for similar area cost. However, there is the
> issue of latency. The Willamette ships the integers over to the FPU
> for multiplication, and the result back, crossing several clock
> domains (at one clock loss per domain crossing), resulting in a
> 10-cycle latency for integer multiplication. I think that these days
> every high-performance core with real silicon invests into separate
> GPR and FP/SIMD (including integer SIMD) multipliers.
>

I ended up with different multipliers mostly because the requirements
are different...

>> In most code, FPU ops are comparably sparse
>
> In terms of executed ops, that depends very much on the code. GP
> cores have acquired SIMD cores primarily for FP ops, as can be seen by
> both SSE and AVX supporting only FP at first, and only later adding
> integer stuff, because it cost little extra. Plus, we have added GPUs
> that are now capable of doing huge amounts of FP ops, with uses in
> graphics rendering, HPC and AI.
>

Yeah, this was not to say there is not FPU dense code, or that FP-SIMD
is not useful, but rather that in most general code, ALU and LD/ST ops
tend to dominate by a fair margin.

Similarly, SIMD ops may still be useful, even if they are a relative
minority of the executed instructions (even in code sequences which are
actively using SIMD ops...).

Like, typically, for every SIMD op used, there are also things like:
The loads and stores to get the value from memory and put the results in
memory;
ALU ops to calculate the index into the array or similar that we are
loading from or storing to;
...

Well, along with other ops, like shuffles and similar to get the SIMD
elements into the desired order, etc.

Like, some of this is why it is difficult to get anywhere near the
theoretical 200 MFLOP of the SIMD unit, apart from very contrived
use-cases (such as running neural net code), and even that involved wonk,
like operations which combined a SIMD shuffle into the SIMD ADD/SUB/MUL ops.

For a lot of other use cases, I can just be happy enough that the SIMD
ops are "not slow".

>> Otherwise, one can debate whether or not having DIV/MOD in hardware
>> makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV is
>> at least "probably" faster than a generic software-only solution).
>
> That debate has been held, and MIPS has hardware integer divide, Alpha
> and IA-64 don't have a hardware integer divide; they both have FP
> divide instructions.
>

Technically, also, BJX2 has ended up having both Integer DIV and FDIV
instructions. But, they don't gain all that much, so are still left as
optional features.

The integer divide isn't very fast, but it doesn't matter if it isn't
used all that often.

The FDIV is effectively a boat anchor (around 122 clock cycles).
Though, mostly this was based on the observation that with some minor
tweaks, the integer divide unit could be made to perform floating-point
divide as well.

The main merit though (over a software N-R divider) is that it can
apparently give exact results (my N-R dividers generally can't converge
past the low order 4 bits).

> However, looking at more recent architectures, the RISC-V M extension
> (which is part of RV64G and RV32G, i.e., a standard extension) has not
> just multiply instructions (MUL, MULH, MULHU, MULHSU, MULW), but also
> integer divide instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW,
> and REMUW. ARM A64 also has divide instructions (SDIV, UDIV), but
> RISC-V seems significant to me because there the philosophy seems to
> be to go for minimalism. So the debate has apparently come to the
> conclusion that for general-purpose architectures, you include an
> integer divide instruction.
>

Yeah.

I mostly ended up adding integer divide so that the RISC-V mode could
support the 'M' extension, and if I have it for RISC-V, may as well also
add it in BJX2 as its own optional extension.

Had also added a "FAZ DIV" special case that made the integer divide
faster, where over a limited range of input values, the integer divide
would be turned into a multiply by reciprocal.

So, say, DIV latency:
64-bit: 68 cycle
32-bit: 36 cycle
32-bit FAZ: 3 cycle.
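
To make the multiply-by-reciprocal idea concrete, here is a minimal C sketch. It is not the actual FAZ logic: the table size (divisors 1..255), the 16-bit limit on the dividend, and the names faz_init / faz_div / faz_count_mismatches are all assumptions for illustration. With rcp[d] = ceil(2^32 / d), the value (n * rcp[d]) >> 32 equals n / d exactly as long as n * (rcp[d] * d - 2^32) stays below 2^32, which holds for every n and d in this range.

```c
#include <assert.h>
#include <stdint.h>

#define FAZ_MAX_DIVISOR 255u   /* hypothetical table size */

/* faz_rcp[d] = ceil(2^32 / d); needs 33 bits for d == 1, hence uint64_t. */
static uint64_t faz_rcp[FAZ_MAX_DIVISOR + 1];

/* Fill the table once; in hardware this would be a small ROM. */
void faz_init(void)
{
    for (uint32_t d = 1; d <= FAZ_MAX_DIVISOR; d++)
        faz_rcp[d] = ((UINT64_C(1) << 32) + d - 1) / d;
}

/* Exact n / d for 0 <= n < 65536 and 1 <= d <= 255, using only a
   table lookup, one multiply, and one shift. Call faz_init() first. */
uint32_t faz_div(uint32_t n, uint32_t d)
{
    assert(n < 65536u && d >= 1 && d <= FAZ_MAX_DIVISOR);
    return (uint32_t)((n * faz_rcp[d]) >> 32);
}

/* Compare against the ordinary '/' operator over the whole supported
   range and return the number of mismatches. */
unsigned faz_count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t d = 1; d <= FAZ_MAX_DIVISOR; d++)
        for (uint32_t n = 0; n < 65536u; n++)
            if (faz_div(n, d) != n / d)
                bad++;
    return bad;
}
```

Outside the range where the error bound holds, the same trick still produces a result, but it can be off by one, which is the "inexact with larger divisors" tradeoff.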

As it so happens, FAZ also covers a similar range to that typically used
for a rasterizer, but is more limited in that it only handles a range of
values it can calculate exactly. For my hardware rasterizer, a similar
strategy was used, but extended to support bigger divisors at the
tradeoff that it becomes inexact with larger divisors.

However, adding a "Divide two integers quickly, but may give an
inaccurate result" instruction would be a bit niche (and normal C code
couldn't use it without an intrinsic or similar).

Partial observation is that mostly, the actual bit patterns in the
reciprocals tend to be fairly repetitive, so it is possible to
synthesize a larger range of reciprocals using lookup tables and
shift-adjustments.

In the hardware rasterizer, I had experimented with a fixed-point 1/Z
divider for "more accurate" perspective-correct rasterization, but this
feature isn't cheap (fixed-point reciprocal is more complicated), and
ended up not using it for now (in favor of merely dividing S and T by Z
and then multiplying by the interpolated Z again during rasterization,
as a sort of less accurate "poor man's" version).
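
The difference between the options can be sketched for a single texture coordinate along one span. This is only an illustration: the names (span_end, lerp_lin, tex_affine, tex_poor_mans, tex_exact) are made up, and it uses floating point where a rasterizer would use fixed point, so it is not the TKRA-GL code.

```c
/* One end of a span: texture coordinate s and depth z (hypothetical layout). */
typedef struct { double s, z; } span_end;

/* Linear interpolation in screen space, t in [0,1]. */
double lerp_lin(double a, double b, double t)
{
    return a + (b - a) * t;
}

/* Affine: interpolate s directly, ignoring depth. */
double tex_affine(span_end a, span_end b, double t)
{
    return lerp_lin(a.s, b.s, t);
}

/* "Poor man's": interpolate s/z and z linearly, then multiply back by z.
   No per-pixel divide, but z is not really linear in screen space. */
double tex_poor_mans(span_end a, span_end b, double t)
{
    return lerp_lin(a.s / a.z, b.s / b.z, t) * lerp_lin(a.z, b.z, t);
}

/* Exact: interpolate s/z and 1/z linearly, then divide per pixel. */
double tex_exact(span_end a, span_end b, double t)
{
    return lerp_lin(a.s / a.z, b.s / b.z, t)
         / lerp_lin(1.0 / a.z, 1.0 / b.z, t);
}
```

For a span running from (s=0, z=1) to (s=1, z=4), the midpoint comes out as 0.5 (affine), 0.3125 (poor man's) and 0.2 (exact). So the cheap version lands closer to the correct value than plain affine does, but it is still off, and the error grows with the depth range across the primitive.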

The "poor man's perspective correct" strategy doesn't eliminate the need
to subdivide primitives, but does allow the primitives to be somewhat
larger without having as much distortion (mostly relevant as my GL
renderer is still mostly CPU bound in the transform and subdivision
stages, *1).

In theory, proper perspective correct would entirely eliminate the need
to subdivide primitives, but would require clipping geometry to the
viewing frustum (otherwise, the texture coordinates freak out for
primitives crossing outside the frustum or crossing the near plane).

Approximate FDIV is possible, but I have typically used a different
strategy for the reciprocal.

*1: Though, ironically, this does also mean that, via multi-texturing,
it is semi-viable to also use lightmap lighting again in GLQuake (since
this doesn't add too much CPU penalty over the non-multitexture case).

However, dynamic lightmaps still aren't viable, as the CPU-side cost of
drawing the dynamic light-sources to the lightmaps, and then uploading
them, is still a bit too much.

I suspect that, in any case, GLQuake wasn't written to run on a 50MHz
CPU.

Might have been a little easier if the games had poly-counts and
draw distances more on par with PlayStation games (say, if the whole
scene is kept less than 500 polys; rather than 1k-2k polys).

Say, Quake being generally around 1k-2k polys for the scene, and roughly
300-500 triangles per alias model, ... Despite the scenes looking crude
with large flat surfaces, most of these surfaces had already been hacked
up a fair bit by the BSP algorithm (though, theoretically, some
annoyances could be reduced if the Quake1 maps were rebuilt using a
modified Q3BSP or similar, as the Quake3 tools natively support vertex
lighting and also don't hack up the geometry quite as much as the
original Quake1 BSP tool did, but alas...).

Wouldn't save much, as I still couldn't legally redistribute the data
files (and my GLQuake port isn't particularly usable absent using a
process to convert all the textures into DDS files and rebuilding the
PAK's and similar, so...).

To have something redistributable, would need to replace all of the
textures, sound effects, alias models, etc. An old (probably long dead)
"OpenQuartz" project had partly done a lot of this, but "got creative"
with the character models in a way I didn't like (would have preferred
something at least "in the same general spirit" as the original models).

Similar sort of annoyances as with FreeDoom, but at least FreeDoom
stayed closer to the original in these areas.

Also, possibly, I may need to rework some things in terms of how TKRA-GL
works to better match up with more recent developments (all this is
still a bit hacky; effectively linking the whole GL implementation to
the binary that uses GL).

Granted, it is also possible I may need to at some point move away from
linking the whole TestKern kernel to any programs that use it as well,
with the tradeoff that programs would no longer be able to launch "bare
metal" in the emulator (but would always need the kernel to also be
present).


Re: Alternative Representations of the Concertina II ISA

<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35296&group=comp.arch#35296

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 00:58:32 +0000
Organization: novaBBS
Message-ID: <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2332597"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$iyGzruZMX8AXNN3fPPETM.IJvh8BhqvGQEqS9kBYH3hpbyzvzQXYm
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Tue, 28 Nov 2023 00:58 UTC

BGB wrote:

> On 11/26/2023 7:29 PM, Quadibloc wrote:
>> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>>
>>> But Integer Multiply and Divide can share the FPU that does these.
>>
>> But giving each one its own multiplier means more superscalar
>> goodness!
>>
>> But you _do_ have a good point. This kind of "superscalar goodness"
>> is usually wasted, because most programs don't do an equal amount
>> of integer and FP arithmetic. So it would be way better to have
>> twice as many multipliers and dividers that are designed to be used
>> either way than having units of two kinds.
>>
>> I'm assuming, though, that one can make the multipliers and dividers
>> simpler, hence faster, by making them single-use, and that it's
>> possible to know, for a given use case, what the ratio between integer
>> and FP operations will be. After all, integer multiplication discards
>> the MSBs of the result, and floating-point multiplication discards the
>> LSBs of the result.
>>

> If one wants to have the most superscalar goodness, for general case
> code, this means mostly lots of simple ALU ops (particularly,
> ADD/SUB/AND/OR/SHIFT), and ability to do *lots* of memory Load/Store ops.

> Where, say, Load/Store ops and simple ALU ops tend to dominate over
> pretty much everything else in terms of usage frequency.

LD+ST 33%
Int   45%

> In most code, FPU ops are comparably sparse,

FP     9%
>                                              as is MUL,
MUL    2%
>                                                         and DIV/MOD is
DIV  0.3%
> fairly rare in comparison to either. MUL is common enough that you want
> it to be fast when it happens, but not so common to where one is dealing
> with piles of them all at the same time (except maybe in edge cases,
> like the DCT transform in JPEG/MPEG).

MUL is much more common in FORTRAN than in C-like languages as it forms
the underlying math for multidimensional arrays, and more ×s exist in
typical number crunching code (F) than in pointer chasing (C) languages.

> If FMUL and MUL can't be co issued, this is unlikely to matter.

Since both are pipelined, this is a small stall {{Unlike MIPS R2000 where
IMUL was 12 cycles blocking while FMUL was 3 (or was it 4) pipelined.}}

> DIV is uncommon enough that it can be 80 or 100 cycles or similar, and
> most code will not notice (except in rasterizers or similar, which
> actually need fast DIV, but can cheat in that they don't usually need
> exact DIV and the divisors are small, allowing for "shortcuts", such as
> using lookup tables of reciprocals or similar).

We know that FDIV can be performed in 17 cycles in the FMUL unit, and
this has been known since 1989......why are you making it so long ??

0.3% IDIV × 100 cycles and you have added 0.3 cycles to the average
instruction !!!

> Otherwise, one can debate whether or not having DIV/MOD in hardware
> makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV is
> at least "probably" faster than a generic software-only solution).

With an 8-bit start, and Newton-Raphson iteration, SW should be able
to do this in mid-30-cycles. {{See ITANIC FDIV SW profile}}

> For cases like divide-by-constant, also, typically it is possible to
> turn it in the compiler into a multiply by reciprocal.

And you should if it helps.

>>> And we have the need to use FP values in deciding to take branches.
>>
>> What gets used in taking branches is the _condition codes_. They're
>> conveyed from the integer and floating point functional units, as well
>> as all the other functional units, to a central place (the program
>> status word).
>>

> Yeah, better to not have this style of condition codes...

RISC-V, MIPS, Mc 88K, My 66000, have no condition codes......

> Like, there are other ways of doing conditional branches...

"Compare to zero and branch" × {Signed, Unsigned, FP16, FP32, FP64}
for 1 instruction FP branches

Compare and then branch on anything (2 inst::1 int, 1 FP) your heart desires
Branch {NaN, -Infinity, +Infinity, -DeNorm, +Denorm, -Normal, +Normal, Zero,
-zero, +zero, < <= > >= == !=, 0 < x < y, 0 <=x < y, 0 < x <= y, 0 <= x <= y, ... }
Can you even record all these states in a single condition code ??

>> John Savard

Re: Alternative Representations of the Concertina II ISA

<3020102144e0e12cd79c784d2b80af78@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35297&group=comp.arch#35297

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 01:03:39 +0000
Organization: novaBBS
Message-ID: <3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2332921"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$1h0MggHcwn1bPOkFA7HGjeCVk59HBy6J1NfRObhwE3xMoEuBvjB8.
 by: MitchAlsup - Tue, 28 Nov 2023 01:03 UTC

Anton Ertl wrote:

> BGB <cr88192@gmail.com> writes:
>>On 11/26/2023 7:29 PM, Quadibloc wrote:
>>> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>>>
>>>> But Integer Multiply and Divide can share the FPU that does these.
>>>
>>> But giving each one its own multiplier means more superscalar
>>> goodness!

> Having two multipliers that serve both purposes means even more
> superscalar goodness for similar area cost. However, there is the
> issue of latency. The Willamette ships the integers over to the FPU
> for multiplication, and the result back, crossing several clock
> domains (at one clock loss per domain crossing), resulting in a
> 10-cycle latency for integer multiplication. I think that these days
> every high-performance core with real silicon invests into separate
> GPR and FP/SIMD (including integer SIMD) multipliers.

>>In most code, FPU ops are comparably sparse

> In terms of executed ops, that depends very much on the code. GP
> cores have acquired SIMD cores primarily for FP ops, as can be seen by
> both SSE and supporting only FP at first, and only later adding
> integer stuff, because it cost little extra. Plus, we have added GPUs
> that are now capable of doing huge amounts of FP ops, with uses in
> graphics rendering, HPC and AI.

Once SIMD gains Integer operations, the Multiplier has to be built to
do both, might as well use it for more things than just SIMD.

>>Otherwise, one can debate whether or not having DIV/MOD in hardware
>>makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV is
>>at least "probably" faster than a generic software-only solution).

> That debate has been held, and MIPS has hardware integer divide, Alpha
> and IA-64 don't have a hardware integer divide; they both have FP
> divide instructions.

And all of these led fruitful, long, productive lives before taking over
-------Oh wait !!

> However, looking at more recent architectures, the RISC-V M extension
> (which is part of RV64G and RV32G, i.e., a standard extension) has not
> just multiply instructions (MUL, MULH, MULHU, MULHSU, MULW), but also
> integer divide instructions: DIV, DIVU, REM, REMU, DIVW, DIVUW, REMW,
> and REMUW.

All of which are possible in My 66000 using operand sign control, S-bit,
and CARRY when you want 64×64->128 or 128/64->{64 quotient, 64 remainder}

> ARM A64 also has divide instructions (SDIV, UDIV), but
> RISC-V seems significant to me because there the philosophy seems to
> be to go for minimalism. So the debate has apparently come to the
> conclusion that for general-purpose architectures, you include an
> integer divide instruction.

Several forms of integer DIV at least signed and unsigned.....

Re: Alternative Representations of the Concertina II ISA

<330cde3df7584cb16cc1b6b154f2142e@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35298&group=comp.arch#35298

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 01:05:21 +0000
Organization: novaBBS
Message-ID: <330cde3df7584cb16cc1b6b154f2142e@news.novabbs.com>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <uk25f4$3pc0r$1@dont-email.me> <2023Nov27.165403@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="2332921"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$zib86VpfSC4A550dHpU9l.iqRMgL1lG8nX7MOQfUE9goPbpfzSiUe
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Tue, 28 Nov 2023 01:05 UTC

Anton Ertl wrote:

> Quadibloc <quadibloc@servername.invalid> writes:
>>On Mon, 27 Nov 2023 09:17:59 +0000, Anton Ertl wrote:
>>
>>> That debate has been held, and MIPS has hardware integer divide, Alpha
>>> and IA-64 don't have a hardware integer divide; they both have FP divide
>>> instructions.
>>
>>I didn't know this about the Itanium. All I remembered hearing was that
>>the Itanium "didn't have a divide instruction", and so I didn't realize
>>this applied to fixed-point arithmetic only.

> You are right, I was wrong.
> <http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf>
> says:

> |A number of floating-point operations defined by the IEEE Standard are
> |deferred to software by the IA-64 architecture in all its
> |implementations
> |
> |* floating-point divide (integer divide, which is based on the
> | floating-point divide operation, is also deferred to software)

> The paper goes on to describe that FP division a/b is based on
> determining 1/b through Newton-Raphson approximation, using the FMA
> instruction, and then multiplying with a. It shows a sequence of 13
> instructions for double precision, 1 frcpa and 12 fma or fnma
> instructions.

> Given that integer division is based on FP division, it probably takes
> even more instructions.

You have to synthesize the bits between 63 and 53 to get 64/64->64.

> Meanwhile, a lowly Cortex-A53 with its divide instruction produces an
> integer division result with a small quotient in a few cycles. In
> other words, if you want to use that method for division, still
> provide a division instruction, and then implement it with that
> method. This provides the flexibility to switch to some other method
> (e.g., the one used by ARM) later.

> - anton

Re: Alternative Representations of the Concertina II ISA

<uk3ric$5adb$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35299&group=comp.arch#35299

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 04:50:20 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <uk3ric$5adb$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<2023Nov27.101759@mips.complang.tuwien.ac.at>
<uk25f4$3pc0r$1@dont-email.me>
<2023Nov27.165403@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 28 Nov 2023 04:50:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a89adbf7516b70ebc68ba409ba7fb0a5";
logging-data="174507"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/axDfacnTM+TQWMOI2fqFJB4IJyAAYdS8="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:JgE9bPE0SltU7A9Y+92Sh4+/KwA=
 by: Quadibloc - Tue, 28 Nov 2023 04:50 UTC

On Mon, 27 Nov 2023 15:54:03 +0000, Anton Ertl wrote:

> Quadibloc <quadibloc@servername.invalid> writes:
>>On Mon, 27 Nov 2023 09:17:59 +0000, Anton Ertl wrote:
>>
>>> That debate has been held, and MIPS has hardware integer divide, Alpha
>>> and IA-64 don't have a hardware integer divide; they both have FP
>>> divide instructions.
>>
>>I didn't know this about the Itanium. All I remembered hearing was that
>>the Itanium "didn't have a divide instruction", and so I didn't realize
>>this applied to fixed-point arithmetic only.
>
> You are right, I was wrong.
> <http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf> says:
>
> |A number of floating-point operations defined by the IEEE Standard are
> |deferred to software by the IA-64 architecture in all its
> |implementations
> |
> |* floating-point divide (integer divide, which is based on the
> | floating-point divide operation, is also deferred to software)
>
> The paper goes on to describe that FP division a/b is based on
> determining 1/b through Newton-Raphson approximation, using the FMA
> instruction, and then multiplying with a. It shows a sequence of 13
> instructions for double precision, 1 frcpa and 12 fma or fnma
> instructions.
>
> Given that integer division is based on FP division, it probably takes
> even more instructions.

The fastest _hardware_ implementations of FP division are Goldschmidt
division and Newton-Raphson division, and they both make use of
multiplication hardware.

So using Newton-Raphson division to increase the width of an FP division
result to perform 64-bit integer division won't be too hard.
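
For reference, the Newton-Raphson reciprocal iteration under discussion can be sketched in C as below. This is the generic textbook form, not the IA-64 sequence from the paper: the linear first guess, the restriction to 0.5 <= b < 1 (i.e. a normalized mantissa), and the names nr_recip and nr_div are assumptions for illustration. Each step x = x + x*(1 - b*x) roughly doubles the number of correct bits and needs only multiplies and adds, which is why it maps well onto multiplier or FMA hardware.

```c
#include <assert.h>

/* Approximate 1/b for 0.5 <= b < 1 using only multiply and add. */
double nr_recip(double b)
{
    assert(b >= 0.5 && b < 1.0);

    /* Linear first guess; relative error is at most 1/17 on this interval. */
    double x = 48.0 / 17.0 - (32.0 / 17.0) * b;

    /* The error squares on every step:
       1/17 -> ~3e-3 -> ~1e-5 -> ~1e-10 -> below double precision. */
    for (int i = 0; i < 4; i++) {
        double e = 1.0 - b * x;   /* residual; one fnma-style operation */
        x = x + x * e;            /* refined estimate; one fma-style operation */
    }
    return x;
}

/* a / b computed as a * (1/b). Without a final correction step the
   result is not guaranteed to be correctly rounded. */
double nr_div(double a, double b)
{
    return a * nr_recip(b);
}
```

Starting from a better table-based estimate (say 8 bits) instead of the linear guess would save an iteration, and a wider result only needs one more step, since the precision doubles each time.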

Since the Wikipedia article didn't go into detail, I had to look further
to find out that you _were_ right about the DEC Alpha, however. It did
have floating divide but not integer divide. The 21264 added floating
square root.

John Savard

Re: Alternative Representations of the Concertina II ISA

<2023Nov28.101037@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35300&group=comp.arch#35300

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 09:10:37 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 37
Message-ID: <2023Nov28.101037@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <uk25f4$3pc0r$1@dont-email.me> <2023Nov27.165403@mips.complang.tuwien.ac.at> <330cde3df7584cb16cc1b6b154f2142e@news.novabbs.com>
Injection-Info: dont-email.me; posting-host="97289b610d9704ba7ee09f5ae9daaff4";
logging-data="241703"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YhHx/TgJwmXz7mWUHCVrK"
Cancel-Lock: sha1:0iWIPy1dEs8FPIN2Bnq8Ro4fUBE=
X-newsreader: xrn 10.11
 by: Anton Ertl - Tue, 28 Nov 2023 09:10 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>Anton Ertl wrote:
>> <http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf>
>> says:
>
>> |A number of floating-point operations defined by the IEEE Standard are
>> |deferred to software by the IA-64 architecture in all its
>> |implementations
>> |
>> |* floating-point divide (integer divide, which is based on the
>> | floating-point divide operation, is also deferred to software)
>
>> The paper goes on to describe that FP division a/b is based on
>> determining 1/b through Newton-Raphson approximation, using the FMA
>> instruction, and then multiplying with a. It shows a sequence of 13
>> instructions for double precision, 1 frcpa and 12 fma or fnma
>> instructions.
>
>> Given that integer division is based on FP division, it probably takes
>> even more instructions.
>
>You have to synthesize the bits between 63 and 53 to get 64/64->64.

The paper says that they use double-extended division to synthesize
64/64-bit division; I expect that double-extended uses even more
Newton-Raphson steps (and FMAs) than double precision; and then there
are the additional steps for getting an integer.

The paper only shows signed 16/16 bit division, a 15-instruction
sequence including 1 frcpa and 3 fma/fnma instructions. So you can
expect that 64/64 is at least 13+15-4=24 instructions long, probably
longer (because of double-extended precision).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<2023Nov28.102044@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=35301&group=comp.arch#35301

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 09:20:44 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 22
Message-ID: <2023Nov28.102044@mips.complang.tuwien.ac.at>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <2023Nov27.101759@mips.complang.tuwien.ac.at> <3020102144e0e12cd79c784d2b80af78@news.novabbs.com>
Injection-Info: dont-email.me; posting-host="97289b610d9704ba7ee09f5ae9daaff4";
logging-data="249171"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+5L9YXUAe6fU7MzA6i3BVw"
Cancel-Lock: sha1:99OibcMJFQrueW1NzSQawK1rcYM=
X-newsreader: xrn 10.11
 by: Anton Ertl - Tue, 28 Nov 2023 09:20 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>Several forms of integer DIV at least signed and unsigned.....

Gforth has unsigned, floored (signed) and symmetric (signed), in
double-cell/single-cell variants (standardized in Forth) as well as in
single/single variants.

Floored rounds the quotient down, symmetric (often called truncated)
rounds the quotient towards 0. There is also something called
Euclidean that always produces a remainder >=0, but the practical
difference from floored is when the divisor is negative, which is
exceedingly rare.
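
The three signed variants can be sketched in C, where the built-in / and % are the symmetric (truncated) ones. The type and function names here are made up for the example and are not Gforth's; overflow cases such as LONG_MIN / -1 are ignored.

```c
#include <assert.h>

typedef struct { long q, r; } divmod;

/* Symmetric (truncated): quotient rounds toward 0, remainder takes the
   sign of the dividend. This is what C's / and % do. */
divmod div_symmetric(long n, long d)
{
    assert(d != 0);
    divmod x = { n / d, n % d };
    return x;
}

/* Floored: quotient rounds toward -infinity, remainder takes the sign
   of the divisor. */
divmod div_floored(long n, long d)
{
    divmod x = div_symmetric(n, d);
    if (x.r != 0 && ((x.r < 0) != (d < 0))) {
        x.q -= 1;
        x.r += d;
    }
    return x;
}

/* Euclidean: remainder is always >= 0, whatever the signs. */
divmod div_euclidean(long n, long d)
{
    divmod x = div_symmetric(n, d);
    if (x.r < 0) {
        if (d > 0) { x.q -= 1; x.r += d; }
        else       { x.q += 1; x.r -= d; }
    }
    return x;
}
```

For -7 / 2, symmetric gives (-3, -1) while floored and Euclidean both give (-4, 1). The latter two only part ways once the divisor is negative: 7 / -2 is (-4, -1) floored but (-3, 1) Euclidean.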

Given the rareness of negative divisors, one might wonder about
signed/unsigned, but at least implementations of current programming
languages would probably make little use of such instructions even if
they were cheaper than the signed/signed ones.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Alternative Representations of the Concertina II ISA

<uk4fkr$83u4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35304&group=comp.arch#35304

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 04:32:58 -0600
Organization: A noiseless patient Spider
Lines: 190
Message-ID: <uk4fkr$83u4$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 28 Nov 2023 10:32:59 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c208050d75b3f71c95ee722651efd037";
logging-data="266180"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18NTTvZY8/dLLMilAib9kzm"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:zU/Ktt4A8e1DjrxPtS9GrCH+Zfg=
Content-Language: en-US
In-Reply-To: <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
 by: BGB - Tue, 28 Nov 2023 10:32 UTC

On 11/27/2023 6:58 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 11/26/2023 7:29 PM, Quadibloc wrote:
>>> On Sat, 25 Nov 2023 19:55:59 +0000, MitchAlsup wrote:
>>>
>>>> But Integer Multiply and Divide can share the FPU that does these.
>>>
>>> But giving each one its own multiplier means more superscalar
>>> goodness!
>>>
>>> But you _do_ have a good point. This kind of "superscalar goodness"
>>> is usually wasted, because most programs don't do an equal amount
>>> of integer and FP arithmetic. So it would be way better to have
>>> twice as many multipliers and dividers that are designed to be used
>>> either way than having units of two kinds.
>>>
>>> I'm assuming, though, that one can make the multipliers and dividers
>>> simpler, hence faster, by making them single-use, and that it's
>>> possible to know, for a given use case, what the ratio between integer
>>> and FP operations will be. After all, integer multiplication discards
>>> the MSBs of the result, and floating-point multiplication discards the
>>> LSBs of the result.
>>>
>
>> If one wants to have the most superscalar goodness, for general case
>> code, this means mostly lots of simple ALU ops (particularly,
>> ADD/SUB/AND/OR/SHIFT), and ability to do *lots* of memory Load/Store ops.
>
>> Where, say, Load/Store ops and simple ALU ops tend to dominate over
>> pretty much everything else in terms of usage frequency.
>
> LD+ST 33%
> Int   45%
>
>> In most code, FPU ops are comparably sparse,
>
> FP     9%
>>                                              as is MUL,
> MUL    2%
>>                                                         and DIV/MOD is
> DIV  0.3%

Yeah, pretty much...

>> fairly rare in comparison to either. MUL is common enough that you
>> want it to be fast when it happens, but not so common to where one is
>> dealing with piles of them all at the same time (except maybe in edge
>> cases, like the DCT transform in JPEG/MPEG).
>
> MUL is much more common in FORTRAN than in C-like languages as it forms
> the underlying math for multidimensional arrays, and more ×s exist in
> typical number crunching code (F) than in pointer chasing (C) languages.
>

Yeah.

Multidimensional arrays are comparably rarely used in C land, and when
they are used, they typically have power-of-2 dimensions.

>> If FMUL and MUL can't be co issued, this is unlikely to matter.
>
> Since both are pipelined, this is a small stall {{Unlike MIPS R2000 where
> IMUL was 12 cycles blocking while FMUL was 3 (or was it 4) pipelined.}}
>

Slow IMUL but fast FMUL: WTF?...
Unless this was only counting single precision, then it could make more
sense.

>> DIV is uncommon enough that it can be 80 or 100 cycles or similar, and
>> most code will not notice (except in rasterizers or similar, which
>> actually need fast DIV, but can cheat in that they don't usually need
>> exact DIV and the divisors are small, allowing for "shortcuts", such
>> as using lookup tables of reciprocals or similar).
>
> We know that FDIV can be performed in 17 cycles in the FMUL unit, and
> this has been known since 1989......why are you making it so long ??
>
> 0.3% IDIV × 100 cycles and you have added 0.3 cycles to the average
> instruction !!!
>

Can note that using DIV/MOD seems a lot rarer in my case...

But, I can note that my compiler turns constant divide into alternatives:
y=x/C; //~ y=(x*RCP)>>SHR;
y=x%C; //~ y=x-(((x*RCP)>>SHR)*C);

Though, with some extra wonk thrown in so that negative values round the
quotient towards zero and similar (otherwise, with an arithmetic shift,
negative values would round away from zero, which is not the commonly
accepted result).

y=((x+((x<0)?(C-1):0))*RCP)>>SHR;
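
As a rough illustration of the transform, here is a sketch for one specific case: 32-bit signed x and C = 10. The constants (RCP = ceil(2^34 / 10) = 0x66666667, SHR = 34) and the function names are assumptions picked for the example, not taken from the compiler described here. Note that it uses the sign-correction form (add 1 to the quotient when x is negative) rather than the bias-before-multiply form shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for C = 10 (assumed for this example):
   RCP = ceil(2^34 / 10), SHR = 34. */
#define DIV_C   10
#define DIV_RCP INT64_C(0x66666667)
#define DIV_SHR 34

/* y = x / 10, truncating toward zero.
   Assumes >> on a negative int64_t is an arithmetic shift, which holds on
   common compilers but is implementation-defined in ISO C. */
int32_t div_by_const(int32_t x)
{
    int64_t q = ((int64_t)x * DIV_RCP) >> DIV_SHR;  /* floor(x * RCP / 2^SHR) */
    return (int32_t)(q + (x < 0));                  /* floor -> truncate for x < 0 */
}

/* y = x % 10, derived from the quotient as in the second line above. */
int32_t mod_by_const(int32_t x)
{
    return x - div_by_const(x) * DIV_C;
}
```

Other divisors need their own RCP/SHR pair, and some need an extra correction step.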

The variable cases are comparatively infrequent.

>> Otherwise, one can debate whether or not having DIV/MOD in hardware
>> makes sense at all (and if they do have it, "cheap-ish" 68 cycle DIV
>> is at least "probably" faster than a generic software-only solution).
>
> With an 8-bit start, and Newton-Raphson iteration, SW should be able
> to do this in mid-30-cycles. {{See ITANIC FDIV SW profile}}
>

I was thinking of integer divide here (being 68 cycles for DIVS.Q/DIVU.Q
in my case; 36 for DIVS.L/DIVU.L).

But, yeah, the FDIV instruction is even slower (120 cycles); and
software Newton-Raphson will win this race...
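
For reference, a minimal sketch of software Newton-Raphson division, under stated assumptions: it uses a linear seed (48/17 - 32/17 * m) in place of the 8-bit table start mentioned above, runs a fixed 4 iterations, expects a divisor of moderate magnitude, and does not produce a correctly rounded result. The name `fdiv_newton` is made up for illustration.

```c
#include <assert.h>

/* Approximate n / d by computing 1/d with Newton-Raphson iterations.
   Not correctly rounded; d must be nonzero and of moderate magnitude. */
double fdiv_newton(double n, double d)
{
    assert(d != 0.0);
    double a = (d < 0.0) ? -d : d;

    /* Find a power-of-two scale so that m = a * scale lies in [0.5, 1). */
    double scale = 1.0;
    while (a * scale >= 1.0) scale *= 0.5;
    while (a * scale <  0.5) scale *= 2.0;
    double m = a * scale;

    /* Linear seed for 1/m on [0.5, 1); relative error at most 1/17. */
    double r = 48.0 / 17.0 - (32.0 / 17.0) * m;

    /* Each step roughly squares the relative error: r <- r * (2 - m*r). */
    for (int i = 0; i < 4; i++)
        r = r * (2.0 - m * r);

    r *= scale;                 /* 1/|d| = (1/m) * scale */
    if (d < 0.0) r = -r;
    return n * r;
}
```

A better seed (such as a small lookup table) would cut the iteration count, since each step roughly doubles the number of correct bits.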

Partly this is the drawback of using a Shift-Add design.

Say, every clock-cycle, the value is shifted left by 1 bit, and logic
determines whether or not to add a second value to the working value.

For multiply, you can use the bits of one input to mask whether or not
to add the other input. For divide, the carry of the adder can be used
to mask whether to add the inverse of the divisor.

For FDIV, one runs the divider for more clock cycles, so that it also
generates bits below the binary point.

Though, in favor of having an integer DIV op at least: it is faster to use
a 36-cycle integer divide instruction than it is to run a shift-subtract
loop in software.
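
For comparison, the software shift-subtract loop in question looks roughly like the sketch below: a plain restoring divide that produces one quotient bit per iteration, much as the hardware unit produces one bit per clock. This is a generic textbook form, not the BJX2 runtime code, and the name `divu_shift_sub` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restoring (shift-subtract) unsigned 32-bit division.
   Returns the quotient; stores the remainder in *rem_out if non-NULL. */
uint32_t divu_shift_sub(uint32_t n, uint32_t d, uint32_t *rem_out)
{
    assert(d != 0);
    uint64_t rem = 0;   /* wide enough to hold (rem << 1) without overflow */
    uint32_t quo = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        quo <<= 1;
        if (rem >= d) {                       /* subtract only if it fits */
            rem -= d;
            quo |= 1u;
        }
    }
    if (rem_out != NULL)
        *rem_out = (uint32_t)rem;
    return quo;
}
```

Each of the 32 iterations costs several instructions (shift, compare, conditional subtract, loop overhead), which is why a 36-cycle instruction comes out ahead.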

>> For cases like divide-by-constant, also, typically it is possible to
>> turn it in the compiler into a multiply by reciprocal.
>
> And you should if it helps.
>

Yeah, it does.
Multiply by reciprocal is considerably faster.

>>>> And we have the need to use FP values in deciding to take branches.
>>>
>>> What gets used in taking branches is the _condition codes_. They're
>>> conveyed from the integer and floating point functional units, as well
>>> as all the other functional units, to a central place (the program
>>> status word).
>>>
>
>> Yeah, better to not have this style of condition codes...
>
> RISC-V, MIPS, Mc 88K, My 66000, have no condition codes......
>

Yeah.

In BJX2, there is only SR.T, however, the way it is updated and used is
much narrower in scope.

A case could be made for narrowing its scope further though.

>> Like, there are other ways of doing conditional branches...
>
> "Compare to zero and branch" × {Signed, Unsigned, FP16, FP32, FP64}}
> for 1 instruction FP branches
>

Yeah, I had ended up adding this strategy as well as the prior SR.T
mechanism.

> Compare and then branch on anything (2 inst::1 int, 1 FP) your heart
> desires Branch {NaN, -Infinity, +Infinity, -DeNorm, +Denorm, -Normal,
> +Normal, Zero, -zero, +zero, < <= > >= == !=, 0 < x < y, 0 <=x < y, 0 <
> x <= y, 0 <= x <= y, ... }
> Can you even record all these states in a single condition code ??
>

Possible.

>>> John Savard

Re: Alternative Representations of the Concertina II ISA

<uk4p80$14jeu$1@news1.carnet.hr>

https://news.novabbs.org/devel/article-flat.php?id=35306&group=comp.arch#35306

From: zec@fer.hr (Marko Zec)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 13:16:48 -0000 (UTC)
Organization: CARNet, Croatia
Sender: Marko Zec <marko@login.nxlab.fer.hr>
Message-ID: <uk4p80$14jeu$1@news1.carnet.hr>
References: <ujp81t$26ff9$1@dont-email.me> <43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com> <ujr8l8$2g4e8$1@dont-email.me> <f693434f205f88781a86a0c6fac43eda@news.novabbs.com> <uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me> <cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
 by: Marko Zec - Tue, 28 Nov 2023 13:16 UTC

MitchAlsup <mitchalsup@aol.com> wrote:
...
> 0.3% IDIV × 100 cycles and you have added 3 cycles to the average
> instruction !!!

Not really, more likely it would be a 0.33 average CPI bump, assuming
a 1.0 baseline CPI for non-DIV instructions. Still way too much to be
ignored...
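
The arithmetic can be checked with a small sketch like the one below. It is a simple weighted average that ignores any overlap or pipelining of the divide, and the name `average_cpi` is made up for illustration.

```c
#include <assert.h>

/* Average CPI when a fraction `frac` of instructions costs `slow_cpi`
   cycles each and the remainder costs `base_cpi` cycles each. */
double average_cpi(double frac, double slow_cpi, double base_cpi)
{
    assert(frac >= 0.0 && frac <= 1.0);
    return frac * slow_cpi + (1.0 - frac) * base_cpi;
}
```

With frac = 0.003, slow_cpi = 100 and base_cpi = 1.0 this comes out at about 1.297, so the bump is roughly 0.3 cycles per instruction rather than 3.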

Re: Alternative Representations of the Concertina II ISA

<uk5dbu$curb$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35308&group=comp.arch#35308

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Alternative Representations of the Concertina II ISA
Date: Tue, 28 Nov 2023 13:00:11 -0600
Organization: A noiseless patient Spider
Lines: 61
Message-ID: <uk5dbu$curb$1@dont-email.me>
References: <ujp81t$26ff9$1@dont-email.me>
<43f9c94c44017bc3becd8d9c58d846d5@news.novabbs.com>
<ujr8l8$2g4e8$1@dont-email.me>
<f693434f205f88781a86a0c6fac43eda@news.novabbs.com>
<uk0rek$3flrp$1@dont-email.me> <uk1cbs$3lign$1@dont-email.me>
<cf0523e8dc4ba43c9d1a95184186f199@news.novabbs.com>
<uk4p80$14jeu$1@news1.carnet.hr>
In-Reply-To: <uk4p80$14jeu$1@news1.carnet.hr>
 by: BGB - Tue, 28 Nov 2023 19:00 UTC

On 11/28/2023 7:16 AM, Marko Zec wrote:
> MitchAlsup <mitchalsup@aol.com> wrote:
> ...
>> 0.3% IDIV × 100 cycles and you have added 3 cycles to the average
>> instruction !!!
>
> Not really, more likely it would be a 0.33 average CPI bump, assuming
> a 1.0 baseline CPI for non-DIV instructions. Still way too much to be
> ignored...

I missed that, but yeah.

Still, integer DIV/MOD/etc instructions seem to be used somewhat less
often than this estimate in the code I was measuring.

Checking my compiler output for static use count:
Around 0.05% ...

It might have been more common though, if my compiler didn't
aggressively eliminate the divide by constant cases (and my coding
practices didn't actively avoid using division in general, etc).

After all, neither a slowish DIV instruction nor a software
shift-subtract loop was expected to be all that fast.

On pretty much every system I have used IRL, division has tended to be
fairly slow, so I had tended to avoid it whenever possible.

In cases where fast divide was needed, there were often other
workarounds (such as using a lookup table of reciprocals).

If inexact results are acceptable, it is possible to stretch the table
of reciprocals up to pretty much arbitrary divisors (via normalization
and some shift-related trickery).
In software, this is helped by a CLZ instruction or similar, but I had
also used a variant of this effectively in my hardware rasterizer module.

Say:
z=x/y;
y:
0.. 31: Direct multiply by MAX/y (Table A)
32.. 63: Normalized multiply by MAX/y (Table B)
64..127: Use (y>>1) as index to Table B, add 1 to final SHR
128..255: Use (y>>2) as index to Table B, add 2 to final SHR
...
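
A rough software sketch of the scheme above, under some assumptions: MAX is taken as 1<<16, Tables A and B are merged into one 64-entry table built at startup, and the normalization is done with a plain loop where a CLZ instruction would do it in one step. The names `rcp_tab`, `rcp_init` and `div_approx` are made up for illustration, and the results are inexact for divisors of 64 and above.

```c
#include <assert.h>
#include <stdint.h>

#define RCP_SHR 16                 /* assumed: MAX = 1 << 16 */

static uint32_t rcp_tab[64];       /* entries 1..63; Tables A and B merged */

/* Fill the table with ceil(MAX / y). Must be called before div_approx(). */
void rcp_init(void)
{
    for (uint32_t y = 1; y < 64; y++)
        rcp_tab[y] = ((1u << RCP_SHR) + y - 1) / y;
}

/* Approximate x / y for y >= 1 (x is expected to stay within 16 bits).
   Divisors >= 64 are shifted down into 32..63 and the final shift is
   increased by the same amount, so low divisor bits are discarded. */
uint32_t div_approx(uint32_t x, uint32_t y)
{
    assert(y != 0);
    unsigned extra = 0;
    while (y >= 64) {              /* normalize; CLZ would replace this loop */
        y >>= 1;
        extra++;
    }
    return (uint32_t)(((uint64_t)x * rcp_tab[y]) >> (RCP_SHR + extra));
}
```

For example, 1000/101 maps to the same table entry as 1000/100, so the answer can be off by one from the exact quotient.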

The sizes of Tables A/B can be reduced (via another lookup table) by
making use of repetitions in the bit patterns (though, this is more
practical in Verilog than in software).

...

Though, getting acceptable results from fixed-point division is a little
more of an issue.
