devel / comp.arch / Re: Concertina II Progress

Re: Concertina II Progress

<c05870a9671090819ed87c07f6b9c8ad@news.novabbs.com>


https://news.novabbs.org/devel/article-flat.php?id=34939&group=comp.arch#34939

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Sun, 12 Nov 2023 21:37:39 +0000
Organization: novaBBS
Message-ID: <c05870a9671090819ed87c07f6b9c8ad@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com> <uikb5h$2lcq7$1@dont-email.me> <b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com> <uilu1p$2vbev$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="712949"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$jJMUKyz4xM/D4uT882H3OuiT3cTcmYbdEGIQ1VM7ZjUl9KD8Lyuy.
X-Spam-Level: *
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Sun, 12 Nov 2023 21:37 UTC

BGB wrote:

> On 11/10/2023 12:22 PM, MitchAlsup wrote:
>> BGB wrote:
>
>>> One can argue that aligned-only allows for a cheaper L1 D$, but also
>>> "sucks pretty bad" for some tasks:
>>>    Fast memcpy;
>>>    LZ decompression;
>>>    Huffman;
>>>    ...
>> <
>> Time found that HW can solve the problem way more than adequately--
>> obviating its inclusion entirely. {Sooner or later Reduced leads RISC}
>> <
>>

> Wait, are you arguing for aligned-only memory ops here?...
<
I have not argued for aligned memory references since about 2000 (maybe as
early as 1991).
<

Re: Alignment (was: Concertina II Progress)

<uirivn$8dlh$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34941&group=comp.arch#34941

Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: kegs@provalid.com (Kent Dickey)
Newsgroups: comp.arch
Subject: Re: Alignment (was: Concertina II Progress)
Date: Sun, 12 Nov 2023 22:18:31 -0000 (UTC)
Organization: provalid.com
Lines: 41
Message-ID: <uirivn$8dlh$1@dont-email.me>
References: <2023Nov11.112254@mips.complang.tuwien.ac.at> <memo.20231111165327.11928Y@jgd.cix.co.uk> <FlS3N.25739$_Oab.3565@fx15.iad>
Injection-Date: Sun, 12 Nov 2023 22:18:31 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e4750fb26d300cbd10d591f68ada5a48";
logging-data="276145"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19tgqDduD3E4lercbpz4OZV"
Cancel-Lock: sha1:eOMRFJ7sy/ilw3TpzOrU8M97sPY=
X-Newsreader: trn 4.0-test76 (Apr 2, 2001)
Originator: kegs@provalid.com (Kent Dickey)
 by: Kent Dickey - Sun, 12 Nov 2023 22:18 UTC

In article <FlS3N.25739$_Oab.3565@fx15.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
>jgd@cix.co.uk (John Dallman) writes:
>>In article <2023Nov11.112254@mips.complang.tuwien.ac.at>,
>>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>
>>> True, but that has been tried out and, in a world (like Linux) where
>>> software is developed on a platform that supports unaligned
>>> accesses, and then compiled by package maintainers (who often are
>>> not that familiar with the software) on a lot of platforms, the end
>>> result was that the kernel by default performed a fixup (and put a
>>> message in the dmesg buffer) instead of delivering a SIGBUS.
>>
>>Yup. The software I work on is meant, in itself, to work on platforms
>>that enforce alignment, and it was a useful catcher for some kinds of bug.
>>However, I'm now down to one that actually enforces it, in SPARC Solaris,
>>and that isn't long for this world.
>>
>>I dug into what it would take to have x86-64 Linux work with alignment
>>enforcement turned on, and it's a huge job.
>
>It might be easier with AArch64. Just set the A bit (bit 1) in SCTLR_EL1;
>it only affects code executing in usermode.
>
>There may even already be some ELF flag that will set it when the
>file is exec(2)'d.

On AArch64, with GCC at least, you also need to specify "-mstrict-align"
when compiling all source code, to prevent the compiler from assuming it
can access structure fields in an unaligned way, even if all of your
code accesses are fully aligned. GCC can mess around behind your back,
changing ptr->array32[1] = 0 and ptr->array32[2] = 0 into a single
64-bit write of ptr->array32[1] = 0, among other things. If the offset
of array32[1] wasn't 64-bit aligned, it's an alignment trap if
SCTLR_EL1.A=1.
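
A minimal C sketch of the store pattern described above. The struct layout and
the names `struct rec`, `clear_pair`, and `pair_misalignment` are hypothetical,
chosen only so that array32[1] is 4-byte aligned but not 8-byte aligned.
Whether GCC actually merges the two stores depends on the version,
optimization level, and flags, so treat this as an illustration rather than a
guaranteed reproduction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: array32[1] lands at byte offset 12, which is
 * 4-byte aligned but not 8-byte aligned. */
struct rec {
    uint64_t tag;         /* offset 0 */
    uint32_t array32[4];  /* offsets 8, 12, 16, 20 */
};

/* Two adjacent, individually aligned 32-bit stores. A compiler that
 * assumes unaligned access is allowed may emit one 64-bit store at
 * offset 12 instead. */
void clear_pair(struct rec *ptr)
{
    assert(ptr != NULL);
    ptr->array32[1] = 0;
    ptr->array32[2] = 0;
}

/* Offset of array32[1] modulo 8; a non-zero result means a merged
 * 64-bit store to that address would be misaligned. */
size_t pair_misalignment(void)
{
    return offsetof(struct rec, array32[1]) % 8;
}
```

Building the same source with and without -mstrict-align and comparing the
generated stores for `clear_pair` is one way to see whether the merge happens
on a given toolchain.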

On all Arm systems, Device memory accesses must always be aligned. User code
in general does not get access to Device memory, so this does not affect
regular users.

Kent

Re: Concertina II Progress

<2023Nov12.230924@mips.complang.tuwien.ac.at>


https://news.novabbs.org/devel/article-flat.php?id=34945&group=comp.arch#34945

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Sun, 12 Nov 2023 22:09:24 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 35
Message-ID: <2023Nov12.230924@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uire3v$7li2$1@dont-email.me>
Injection-Info: dont-email.me; posting-host="1c67750aa3c13e65e256ac016f5d9798";
logging-data="283393"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+a/qSfi/pqUr78KaU1pmG5"
Cancel-Lock: sha1:8Ssw5OMtJ495tBzQ6CRA4T2SKHM=
X-newsreader: xrn 10.11
 by: Anton Ertl - Sun, 12 Nov 2023 22:09 UTC

Quadibloc <quadibloc@servername.invalid> writes:
>On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>> Errm, splitting up registers like this is likely to hurt far more than
>> anything that 16-bit displacements are likely to gain.
....
>> Much preferable for a compiler to have a flat space of 32 or 64
>> registers. Having 16 sorta works, but does still add a bit to spill and
>> fill.
....
>But if the 16-bit instructions I'm making room for are useless to
>compilers, that's questionable.

It works for the RISC-V C (compressed) extension. Some of these
compressed instructions use registers 8-15 (others use all 32
registers, but have other restrictions). But it works fine exactly
because, if your register usage does not fit the limitations of the
16-bit encoding, you just use the 32-bit version of the instruction.
It seems that they designed the ABI such that registers 8-15 occur
often in the code. Maybe the gcc maintainer also put some work into
preferring these registers.
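
The fallback rule described above can be sketched in C. This is a simplified
illustration, not an assembler: the function names are hypothetical, and real
RVC instructions have further constraints (immediate ranges, source and
destination having to match, and so on) that are ignored here. The only fact
relied on is that several compressed formats use 3-bit register fields that
name x8..x15.

```c
#include <assert.h>
#include <stdbool.h>

/* True if integer register x<reg> can be named by the 3-bit register
 * fields of the compressed formats, which cover only x8..x15. */
bool rvc_short_reg_ok(unsigned reg)
{
    assert(reg < 32);
    return reg >= 8 && reg <= 15;
}

/* 3-bit field value for x8..x15 (x8 -> 0, x15 -> 7). */
unsigned rvc_short_reg_field(unsigned reg)
{
    assert(rvc_short_reg_ok(reg));
    return reg - 8;
}

/* Hypothetical selection rule: use the 16-bit form of a two-register
 * instruction only when both registers fit, else fall back to 32 bits. */
unsigned insn_size_bytes(unsigned rd, unsigned rs)
{
    return (rvc_short_reg_ok(rd) && rvc_short_reg_ok(rs)) ? 2 : 4;
}
```

The point is that the restricted register set never makes code incorrect; it
only decides which of the two encodings gets emitted.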

OTOH, ARM, who have extensive experience with mixed 32-bit/16-bit
instruction sets with their A32/T32 instruction set(s), designed their
A64 instruction set to strictly use 32-bit instructions.

So if MIPS, SPARC, Power, Alpha, and ARM A64 went for fixed-width
32-bit instructions, why make your task harder by also implementing
short instructions? Of course, if that is your goal or you have fun
with this, why not? But if you want to make progress, it seems to be
something that can be skipped.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Concertina II Progress

<uirmau$8tah$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34946&group=comp.arch#34946

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Sun, 12 Nov 2023 23:15:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <uirmau$8tah$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikcd2$2lh5f$3@dont-email.me>
<b8330e20443df008b0ab07560e543581@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 12 Nov 2023 23:15:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6b4eecbb0d62670bc6753d55ba5ba343";
logging-data="292177"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ZQyRcn4QdufDmSwd7shHvCusdvVLtwy0="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:VZbEdMDfz4UUWXZoRxRpFwFpn34=
 by: Quadibloc - Sun, 12 Nov 2023 23:15 UTC

On Sun, 12 Nov 2023 21:25:20 +0000, MitchAlsup wrote:

> I am not buying this. Which takes more opcode space::
> a) an ISA with unaligned only LDs and STs (11)
> or b) an ISA with unaligned LDs and STs (11) and aligned LDs and STs
> (another 11)

That is true, *other things being equal*.

However, what I had was:

An ISA with unaligned loads and stores, that could use all 32 destination
registers, and all 8 index and base registers. (Call this A)

That took up too much opcode space to allow 16-bit instructions.

So I made various compromises to shave one bit off the loads and stores,
and then I could have 16 bit instructions. (Call this B)

But I didn't like the compromises.

So I made _more_ compromises, to shave _another_ bit off the loads and
stores. This way, I had enough opcode space to add aligned-only loads
and stores... that could use all 32 destination registers, and all 8
index and base registers. (Call this C)

Since other things _were not equal_, it was perfectly possible for C
to use less opcode space than A, and about the same amount of opcode
space as B. So I got to use 16-bit instructions AND have a set of loads
and stores that used all 32 destination registers, and all 8 index and
base registers.

The compromises on the _unaligned_ loads and stores were painful, but
they were chosen so that code using them wouldn't have to be
significantly less efficient than code with the set of loads and stores
in A.
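
The bit-shaving argument can be put into numbers with a short C sketch. The
assumptions are loud ones: the 11 from the quoted question is taken as the
number of load/store opcodes, and format A is assumed to carry a 5-bit
destination, 3-bit index, 3-bit base, and 16-bit displacement. None of the
function names come from the actual design, and the post does not say how the
aligned-only set is encoded, so the sketch only computes how much room it
would have.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 32-bit encodings consumed by `opcodes` instructions that
 * each carry `operand_bits` bits of register and displacement fields. */
uint64_t encodings_used(unsigned opcodes, unsigned operand_bits)
{
    assert(operand_bits <= 32);
    return (uint64_t)opcodes << operand_bits;
}

/* Assumed field widths for format A: 5-bit destination, 3-bit index,
 * 3-bit base, 16-bit displacement. */
enum { A_OPERAND_BITS = 5 + 3 + 3 + 16 };

/* Encodings left over for the aligned-only set in C, if C as a whole
 * is to take no more opcode space than B. */
uint64_t room_for_aligned(unsigned opcodes)
{
    uint64_t b = encodings_used(opcodes, A_OPERAND_BITS - 1);            /* one bit shaved  */
    uint64_t c_unaligned = encodings_used(opcodes, A_OPERAND_BITS - 2);  /* two bits shaved */
    return b - c_unaligned;
}
```

Under these assumptions A takes 11/32 of the 32-bit space, B takes 11/64, and
the unaligned half of C takes 11/128, which leaves another 11/128 for the
aligned-only set before C grows past B.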

John Savard

Re: Alignment

<954b771fc393946b7c3b6a40b4a38693@news.novabbs.com>


https://news.novabbs.org/devel/article-flat.php?id=34947&group=comp.arch#34947

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Alignment
Date: Mon, 13 Nov 2023 00:09:00 +0000
Organization: novaBBS
Message-ID: <954b771fc393946b7c3b6a40b4a38693@news.novabbs.com>
References: <2023Nov11.112254@mips.complang.tuwien.ac.at> <memo.20231111165327.11928Y@jgd.cix.co.uk> <FlS3N.25739$_Oab.3565@fx15.iad> <uirivn$8dlh$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="724215"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$j5vJjIpiN5Z00Jf1A1veg.38k7ZHoIy.P6PCj3KDIvY9sz0GsHm2a
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Mon, 13 Nov 2023 00:09 UTC

Kent Dickey wrote:

> In article <FlS3N.25739$_Oab.3565@fx15.iad>,
> Scott Lurndal <slp53@pacbell.net> wrote:
>>jgd@cix.co.uk (John Dallman) writes:
>>>In article <2023Nov11.112254@mips.complang.tuwien.ac.at>,
>>>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>>
>>>> True, but that has been tried out and, in a world (like Linux) where
>>>> software is developed on a platform that supports unaligned
>>>> accesses, and then compiled by package maintainers (who often are
>>>> not that familiar with the software) on a lot of platforms, the end
>>>> result was that the kernel by default performed a fixup (and put a
>>>> message in the dmesg buffer) instead of delivering a SIGBUS.
>>>
>>>Yup. The software I work on is meant, in itself, to work on platforms
>>>that enforce alignment, and it was a useful catcher for some kinds of bug.
>>>However, I'm now down to one that actually enforces it, in SPARC Solaris,
>>>and that isn't long for this world.
>>>
>>>I dug into what it would take to have x86-64 Linux work with alignment
>>>enforcement turned on, and it's a huge job.
>>
>>It might be easier with AArch64. Just set the A bit (bit 1) in SCTLR_EL1;
>>it only affects code executing in usermode.
>>
>>There may even already be some ELF flag that will set it when the
>>file is exec(2)'d.

> On AArch64, with GCC at least, you also need to specify "-mstrict-align"
> when compiling all source code, to prevent the compiler from assuming it
> can access structure fields in an unaligned way, even if all of your
> code accesses are fully aligned. GCC can mess around behind your back,
> changing ptr->array32[1] = 0 and ptr->array32[2] = 0 into a single
> 64-bit write of ptr->array32[1] = 0, among other things. If the offset
> of array32[1] wasn't 64-bit aligned, it's an alignment trap if
> SCTLR_EL1.A=1.

> On all Arm systems, Device memory accesses must always be aligned. User code
> in general does not get access to Device memory, so this does not affect
> regular users.
<
For all the same reasons one does not do misaligned accesses to ATOMIC
memory locations, one does not do misaligned accesses to device control
registers.
<
> Kent

Re: Concertina II Progress

<9ea96b80fece16c3337f16822115bb63@news.novabbs.com>


https://news.novabbs.org/devel/article-flat.php?id=34948&group=comp.arch#34948

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 00:10:44 +0000
Organization: novaBBS
Message-ID: <9ea96b80fece16c3337f16822115bb63@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uire3v$7li2$1@dont-email.me> <2023Nov12.230924@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="724215"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$mwfTaHy/fKUTJDhJFFiHau.zsQJG75mCNj/uPc7FtiTD5kwYIjdKK
 by: MitchAlsup - Mon, 13 Nov 2023 00:10 UTC

Anton Ertl wrote:

> Quadibloc <quadibloc@servername.invalid> writes:
>>On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>>> Errm, splitting up registers like this is likely to hurt far more than
>>> anything that 16-bit displacements are likely to gain.
> ....
>>> Much preferable for a compiler to have a flat space of 32 or 64
>>> registers. Having 16 sorta works, but does still add a bit to spill and
>>> fill.
> ....
>>But if the 16-bit instructions I'm making room for are useless to
>>compilers, that's questionable.

> It works for the RISC-V C (compressed) extension. Some of these
> compressed instructions use registers 8-15 (others use all 32
> registers, but have other restrictions). But it works fine exactly
> because, if your register usage does not fit the limitations of the
> 16-bit encoding, you just use the 32-bit version of the instruction.
> It seems that they designed the ABI such that registers 8-15 occur
> often in the code. Maybe the gcc maintainer also put some work into
> preferring these registers.

> OTOH, ARM, who have extensive experience with mixed 32-bit/16-bit
> instruction sets with their A32/T32 instruction set(s), designed their
> A64 instruction set to strictly use 32-bit instructions.

> So if MIPS, SPARC, Power, Alpha, and ARM A64 went for fixed-width
> 32-bit instructions, why make your task harder by also implementing
> short instructions? Of course, if that is your goal or you have fun
> with this, why not? But if you want to make progress, it seems to be
> something that can be skipped.
<
Sound
<
> - anton

Re: Concertina II Progress

<4e99726a78a7843a505893980635b8dd@news.novabbs.com>


https://news.novabbs.org/devel/article-flat.php?id=34949&group=comp.arch#34949

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 00:16:24 +0000
Organization: novaBBS
Message-ID: <4e99726a78a7843a505893980635b8dd@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com> <uikcd2$2lh5f$3@dont-email.me> <b8330e20443df008b0ab07560e543581@news.novabbs.com> <uirmau$8tah$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="724625"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$ekReDIi2oM0DMzTUkDoDCuHNT1ZMFOaGqG5EY3/4mawCUEyjtFXiK
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
 by: MitchAlsup - Mon, 13 Nov 2023 00:16 UTC

Quadibloc wrote:

> On Sun, 12 Nov 2023 21:25:20 +0000, MitchAlsup wrote:

>> I am not buying this. Which takes more opcode space::
>> a) an ISA with unaligned only LDs and STs (11)
>> or b) an ISA with unaligned LDs and STs (11) and aligned LDs and STs
>> (another 11)

> That is true, *other things being equal*.

> However, what I had was:
<
A poorly chosen starting point (dark alley)
<
> An ISA with unaligned loads and stores, that could use all 32 destination
> registers, and all 8 index and base registers. (Call this A)

> That took up too much opcode space to allow 16-bit instructions.

> So I made various compromises to shave one bit off the loads and stores,
> and then I could have 16 bit instructions. (Call this B)

> But I didn't like the compromises.
<
Captain Obvious to the rescue::
<
> So I made _more_ compromises, to shave _another_ bit off the loads and
> stores. This way, I had enough opcode space to add aligned-only loads
> and stores... that could use all 32 destination registers, and all 8
> index and base registers. (Call this C)
<
Back out of the dark alley, and start from first principles again.
<
> Since other things _were not equal_, it was perfectly possible for C
> to use less opcode space than A, and about the same amount of opcode
> space as B. So I got to use 16-bit instructions AND have a set of loads
> and stores that used all 32 destination registers, and all 8 index and
> base registers.
<
Maybe "less opcode space" if you count bits, but it is "more opcode space"
if/when you enumerate all the opcodes within the space.
<
> The compromises on the _unaligned_ loads and stores were painful, but
> they were chosen so that code using them wouldn't have to be
> significantly less efficient than code with the set of loads and stores
> in A.
<
Does your compiler agree with this assertion ??
<
> John Savard

Re: Concertina II Progress

<uirs4p$a10p$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34953&group=comp.arch#34953

Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 00:54:49 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <uirs4p$a10p$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikcd2$2lh5f$3@dont-email.me>
<b8330e20443df008b0ab07560e543581@news.novabbs.com>
<uirmau$8tah$1@dont-email.me>
<4e99726a78a7843a505893980635b8dd@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 13 Nov 2023 00:54:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6b4eecbb0d62670bc6753d55ba5ba343";
logging-data="328729"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Kcox5xJmnupci2UnHJRuZTKS1LdGXLLE="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:+v4dZmA7HN4xK52fpYxP7tsa54s=
 by: Quadibloc - Mon, 13 Nov 2023 00:54 UTC

On Mon, 13 Nov 2023 00:16:24 +0000, MitchAlsup wrote:

> Does your compiler agree with this assertion ??

As I'm still only in the early stages of roughing out
the bare outlines of an ISA, I have not yet built such
advanced diagnostic tools, I must admit.

However, my original compromise had been to reduce
the number of index registers used with memory-reference
instructions to 3 from 7.

The two improved compromises I used in this later effort
were:

Compromise 1:

Reduce the number of base registers used with memory-reference
instructions (when using a 16-bit displacement) to 3 from 7.

I figured that _this_ was far less likely to reduce efficiency,
since normally not that many base registers were used in any
case.

Compromise 2:

When an instruction is not indexed, reduce the size of the index
register field to two bits, both containing 0.

When an instruction is indexed, reduce the size of the destination
register field to 4 bits from 5, thus allowing only 16 of the 32
registers to be used with indexed memory accesses.

This one is more painful, but it had historical precedent. One
consequence is that the number of index registers is reduced, to
six from 7, because now index register 4 "looks like zero".
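
Why index register 4 "looks like zero" can be checked with a few lines of C.
The sketch assumes that the 2-bit "not indexed" marker occupies the low two
bits of what is otherwise the 3-bit index field; that placement is inferred
from the remark about register 4 and is not stated outright. The function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: the 3-bit index field overlaps a 2-bit "no index"
 * marker in its low two bits, so any value whose low two bits are 00
 * reads as "not indexed". */
bool index_field_usable(unsigned field)
{
    assert(field < 8);
    return (field & 0x3u) != 0;
}

/* Count the 3-bit field values that can still name an index register. */
unsigned usable_index_registers(void)
{
    unsigned count = 0;
    for (unsigned field = 0; field < 8; field++) {
        if (index_field_usable(field))
            count++;
    }
    return count;
}
```

Values 0 and 4 both have 00 in the low two bits, which leaves registers 1, 2,
3, 5, 6, and 7 as index registers.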

John Savard

Re: Concertina II Progress

<uiru4l$af3l$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=34954&group=comp.arch#34954

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Sun, 12 Nov 2023 19:28:51 -0600
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <uiru4l$af3l$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikb5h$2lcq7$1@dont-email.me>
<b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com>
<uilu1p$2vbev$1@dont-email.me>
<c05870a9671090819ed87c07f6b9c8ad@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 13 Nov 2023 01:28:54 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e919696c091d00cd19514acc6193b0c6";
logging-data="343157"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+TEFDAPS2Zk7bgLaX4frFx"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:9vrEJTYimy5wH2oXrI4mz3Iz2LA=
In-Reply-To: <c05870a9671090819ed87c07f6b9c8ad@news.novabbs.com>
Content-Language: en-US
 by: BGB - Mon, 13 Nov 2023 01:28 UTC

On 11/12/2023 3:37 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 11/10/2023 12:22 PM, MitchAlsup wrote:
>>> BGB wrote:
>>
>>>> One can argue that aligned-only allows for a cheaper L1 D$, but also
>>>> "sucks pretty bad" for some tasks:
>>>>    Fast memcpy;
>>>>    LZ decompression;
>>>>    Huffman;
>>>>    ...
>>> <
>>> Time found that HW can solve the problem way more than adequately--
>>> obviating its inclusion entirely. {Sooner or later Reduced leads RISC}
>>> <
>>>
>
>> Wait, are you arguing for aligned-only memory ops here?...
> <
> I have not argued for aligned memory references since about 2000 (maybe as
> early as 1991).
> <

Makes sense, but I was confused as to what was being argued here...

I prefer unaligned memory access, since it allows a lot of nifty stuff
to be done.

But, I can note that the main drawback it has is in terms of requiring a
more expensive L1 cache.

Aligned-only cache only needs:
A single row of cache-lines
To check a single address for hit/miss;
Can use a simpler set of MUX'es for extract/insert.

Vs, say:
Two rows of cache lines (say, even and odd);
Needs to check two addresses;
More complicated extract/insert logic.
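
A small C model of the address check behind the even/odd arrangement. The
16-byte line size and the function names are assumptions for illustration
only; this models the address arithmetic, not any particular cache
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LINE_BYTES = 16 };  /* assumed line size, for illustration only */

/* Index of the cache line holding byte address `addr`. */
uint64_t line_index(uint64_t addr)
{
    return addr / LINE_BYTES;
}

/* True if an access of `size` bytes at `addr` spans two lines. */
bool crosses_line(uint64_t addr, unsigned size)
{
    assert(size >= 1 && size <= LINE_BYTES);
    return line_index(addr) != line_index(addr + size - 1);
}

/* True if the first and last byte of the access fall in lines of
 * different parity. A crossing access always touches two consecutive
 * lines, so with even/odd rows both can be looked up at once. */
bool banks_differ(uint64_t addr, unsigned size)
{
    uint64_t first = line_index(addr);
    uint64_t last  = line_index(addr + size - 1);
    return (first & 1) != (last & 1);
}
```

An aligned-only cache never sees `crosses_line` return true for a naturally
aligned access, which is why a single row and a single tag check suffice
there.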

But, say, if one needs to operate within the limits of an aligned-only
cache, then even something like an LZ4 decompressor is painfully slow,
as it has to basically do damn near everything 1 byte at a time (or, at
least, more so than it does already).
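
To make the difference concrete, here is a minimal sketch of an LZ-style match copy done both ways. It is not taken from any real LZ4 implementation; the function names are made up, and the wide version assumes the caller leaves at least 7 bytes of slack after the output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time match copy: works with aligned-only hardware and
 * handles the overlapping case (offset smaller than the length). */
static void match_copy_bytes(uint8_t *dst, size_t offset, size_t len)
{
    const uint8_t *src = dst - offset;
    while (len--)
        *dst++ = *src++;
}

/* Copy 8 bytes per step. The fixed-size memcpy calls are typically
 * lowered to single unaligned loads/stores where the target allows it.
 * Falls back to bytes when offset < 8, since a wide copy would read
 * bytes that have not been written yet. May write up to 7 bytes past
 * dst+len. */
static void match_copy_wide(uint8_t *dst, size_t offset, size_t len)
{
    if (offset < 8) {
        match_copy_bytes(dst, offset, len);
        return;
    }
    const uint8_t *src = dst - offset;
    uint8_t *end = dst + len;
    while (dst < end) {
        uint64_t v;
        memcpy(&v, src, 8);
        memcpy(dst, &v, 8);
        src += 8;
        dst += 8;
    }
}
```

On a target without unaligned access, the wide path is either unavailable or gets expanded back into byte operations, which is the slowdown described above.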

I once did have a compressor (FeLZ32) more designed for the constraints
of the SuperH ISA (and aligned-only memory access), but its main
"feature" was that pretty much everything was defined in terms of 32-bit
words (it was not copying bytes, rather, 32 bit words, and the encoded
stream was itself an array of 32-bit words).

It also managed to beat out LZ4's performance by a fair margin on the
Piledriver I was using at the time.

But, this performance advantage effectively evaporated on my Ryzen
(where LZ4 speed increased significantly), and was also mostly N/A on
BJX2. In this case, the byte-oriented formats were more preferable as
they got better compression.

Like, a lot of the performance tricks I had developed on the Piledriver
were effectively rendered moot.

Though, some amount of the tricks were mostly workarounds for "things
that were slow", which the newer CPU had made effectively unnecessary or
counterproductive.

....

Re: Concertina II Progress

<uis16r$egrf$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34956&group=comp.arch#34956

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Sun, 12 Nov 2023 20:21:13 -0600
Organization: A noiseless patient Spider
Lines: 268
Message-ID: <uis16r$egrf$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
<uilskk$2v1d2$1@dont-email.me>
<623e659449642a1a6fdc8eda7f5470fe@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 13 Nov 2023 02:21:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e919696c091d00cd19514acc6193b0c6";
logging-data="476015"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+lTu5sTDkxAEKn+O1H+2cK"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:uVMCrpTPhTAzHBW92ynh2WhUK+k=
Content-Language: en-US
In-Reply-To: <623e659449642a1a6fdc8eda7f5470fe@news.novabbs.com>
 by: BGB - Mon, 13 Nov 2023 02:21 UTC

On 11/12/2023 3:35 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 11/10/2023 8:51 AM, Scott Lurndal wrote:
>>>
>
>> As for register arguments:
>> * Probably 8 or 16.
>> ** 8 makes the most sense with 32 GPRs.
>> *** 16 is asking too much.
>> *** 8 deals with around 98% of functions.
>> ** 16 makes sense with 64 GPRs.
>> *** Nearly all functions can use exclusively register arguments.
>> *** Gain is small though, if it only benefits 2% of functions.
>> *** It is almost a "shoo-in", except for cost of fixed spill space
>> *** 128 bytes at the bottom of every non-leaf stack-frame is noticeable.
>> *** Though, an ABI could decide to not have a spill space in this way.
> <
> For the reasons stated above (some clipped) I agree with this whole
> block of statements.
> <
> Since My 66000 has 32 registers, I went with up to 8 arguments in registers,
> up to 8 results in registers, with the 9th of either on-the-stack in such a
> way that if the callee is vararg the argument registers can be pushed on
> the stack to form a memory resident vector of arguments {{just perfect for
> printf().}}
> <
> With 8 registers covering 98%-ile of calls, there is too little left by
> making this boundary 12-16 both of which ARE still possible.
> <

Yeah.

Short of things like using 128-bit pointers, or lots of 128-bit
arguments (with an ABI that expresses these in pairs), the 8 argument
ABI seems to be slightly ahead here (even with 64 registers).

Mostly, because 2% of functions needing to use memory arguments seems to
cost less than the indirect cost of every other non-leaf function
needing to reserve an extra 64 bytes in the stack frame.

Had considered a possible ABI tweak where functions that only call other
functions with fewer than 8 register arguments (likely excluding
vararg) only need to reserve space for the first 8 arguments.

But, the gains are likely to be rather small compared to the added
debugging effort.

>> Though, admittedly, for a lot of my programs I had still ended up
>> going with 8 register arguments with 64 GPRs, mostly as the gains of
>> 16 arguments is small, relative of the cost of spending an additional
>> 64 bytes in nearly every stack frame (and also there are still some
>> unresolved bugs when using 16 argument mode).
> <
> It is a delicate balance and it is easy to make the code look better
> while actually running slower.
> <

Yeah.

I suspect it is likely due mostly to something like L1 cache misses or
similar (bigger stack frame, more area for the L1 cache to miss).

OTOH: Had recently added the logic to shuffle prolog register-stores in
an attempt to reduce WAW stalls. Turned out, fully aligning stuff would
be a much bigger pain than initially hoped (the curse of multiple cases
of duplicated logic that needs to operate in lockstep).

Did come up with an intermediate option:
   Generate a temporary array of which registers are saved at which offsets;
   Generate a permutation array for which order to store these registers;
   Initial permutation uses simple XOR shuffling;
   Have a function to model the WAW cost of each permutation;
   Shuffle the permutations with a PRNG (up to N times);
   Pick the permutation with the smallest WAW cost.
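
The steps above could look roughly like the following. This is a sketch under stated assumptions rather than the BGBCC code: the cost model (a penalty when a store hits the same 16-byte block as one of the two stores before it), the XOR mask of 5, the xorshift PRNG, and all names are invented for illustration. Offsets are assumed non-negative.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SAVES 32

/* Hypothetical cost model (an assumption, not the real pipeline rule):
 * one penalty for each store that lands in the same 16-byte block as
 * either of the two stores issued just before it. */
static int waw_cost(const int *offs, const uint8_t *perm, int n)
{
    int cost = 0;
    for (int i = 1; i < n; i++) {
        int blk = offs[perm[i]] >> 4;
        for (int j = (i >= 2 ? i - 2 : 0); j < i; j++)
            if ((offs[perm[j]] >> 4) == blk)
                cost++;
    }
    return cost;
}

/* Small xorshift PRNG; a fixed seed keeps the result reproducible. */
static uint32_t prng_next(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

static void swap_u8(uint8_t *a, uint8_t *b)
{
    uint8_t t = *a;
    *a = *b;
    *b = t;
}

/* Fills best[] with the cheapest store order found and returns its
 * cost, or -1 if n is out of range. */
static int pick_store_order(const int *offs, int n, int tries, uint8_t *best)
{
    uint8_t cur[MAX_SAVES];
    uint32_t seed = 0x12345678u;

    if (n < 1 || n > MAX_SAVES)
        return -1;

    /* Initial permutation: identity with index pairs (i, i^5) swapped. */
    for (int i = 0; i < n; i++)
        cur[i] = (uint8_t)i;
    for (int i = 0; i < n; i++) {
        int j = i ^ 5;
        if (j > i && j < n)
            swap_u8(&cur[i], &cur[j]);
    }
    memcpy(best, cur, (size_t)n);
    int best_cost = waw_cost(offs, best, n);

    /* Random reshuffles (Fisher-Yates); keep the cheapest one seen. */
    for (int t = 0; t < tries && best_cost > 0; t++) {
        for (int i = n - 1; i > 0; i--)
            swap_u8(&cur[i], &cur[prng_next(&seed) % (uint32_t)(i + 1)]);
        int c = waw_cost(offs, cur, n);
        if (c < best_cost) {
            best_cost = c;
            memcpy(best, cur, (size_t)n);
        }
    }
    return best_cost;
}
```

Since the search is a bounded random walk, it only ever promises "no worse than the starting permutation", which fits the observation that nearly any ordering beats the linear one on this metric.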

Mostly works OK, but granted, nearly any ordering is better at this
metric than saving them in a linear order.

Though, doesn't really gain much if the forwarding option is enabled.

Relatedly, was also able to make Doom a little faster with another trick:
Instead of drawing into an off-screen buffer, and then copying this to
the screen in the form of a DIB Bitmap object...

There can be functions to request and release framebuffers for a given
Drawing-Context (with a supplied BITMAPINFOHEADER; this request failing
and returning NULL if the BITMAPINFOHEADER doesn't match the format used
by the HDC or similar; forcing fallback to the older method).

Similarly, there is a "SwapBuffers" style call, with these buffers
effectively operating in a double-buffering style.

In effect, it is an interface slightly more like what SDL uses.

Was kind of a hassle to modify Doom to play well with double buffering
though, initially it was a strobe-filled / flickering mess, with the
status bar effectively having a seizure. Does still have the annoyance
that when one noclips through a wall, then whatever garbage is left over
is now prone to a strobe effect.

However, using shared buffers and then having Doom draw into them, does
reduce the amount of framebuffer copying needed for each screen update.

As-is, will currently only work though in 320x200 hi-color mode (where
biHeight==-200, where negative height indicates an origin in the
top-left corner).
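
A minimal sketch of the kind of format check such a request might do. The struct is a stand-in carrying only the three BITMAPINFOHEADER fields that matter here, and `direct_fb_ok` is a hypothetical name; hi-color is taken to mean 16 bits per pixel.

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for the relevant BITMAPINFOHEADER fields (illustration only). */
struct bmi_subset {
    int32_t  biWidth;
    int32_t  biHeight;    /* negative: top-down, origin at top-left */
    uint16_t biBitCount;
};

/* A negative height marks a top-down bitmap. */
static bool is_top_down(const struct bmi_subset *h)
{
    return h->biHeight < 0;
}

/* Hypothetical check: direct framebuffer access is only granted for
 * 320x200, 16 bpp, top-down; anything else falls back to the DIB path. */
static bool direct_fb_ok(const struct bmi_subset *h)
{
    return h->biWidth == 320 && h->biHeight == -200 && h->biBitCount == 16;
}
```

A caller would try the direct path first and drop back to the older draw-then-copy method when the check fails.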

However, the DIB drawing method does allow more flexibility here (the
internal bitmap can be in a wider range of formats, and will be
converted as needed).

Granted, one can note that things like pixel format conversion and
similar aren't free.

Also recently encountered a video online where someone was running Doom
on a 386, and, the framerates *sucked*... ( Like, mostly single-digit
territory, and with somewhat longer load-times as well. )

Can at least probably say, with reasonable confidence, that my BJX2 core
is faster than a 386...

Some other information implies that the speeds I am seeing are more
on-par with a high-end 486 or maybe a low-end Pentium.

( Nevermind that Quake performance is still crap in my case... )

( Somehow, it seems like old computers were generally worse and less
capable than my childhood self remembered. )

Formats supported in DIB form at present:
   RGB555, RGB24, RGBA32, Indexed 1/2/4/8-bit, UTX2.

Formats used by the display hardware:
   Color-Cell 8x8 as 4x 4x4x2bpp (2 endpoints per 4x4 cell);
   Color-Cell 8x8x1 (2 color endpoints).
     Also used for text-mode display.
   4x4x16bit RGB555
   4x4x8bit Indexed
   (New/Experimental) Linear RGB555 and Indexed 8-bit
     Framebuffer pixels now in a conventional linear raster ordering.
     Also, the framebuffer is now movable, allowing double-buffering.
     Framebuffer will require a 32 byte alignment though.
     And needs to be in a physically-mapped address range.

Still don't have any "good" 256 color palettes:
   6*6*6 and 6*7*6 (216 and 252 color)
     Good for bright cartoony graphics, poor for much else.
     Generally loses any detail in things like shading.
     6*7*6 can't do grays effectively, only purple and green tints.
   16 shades of 16 colors
     Better "in general", obvious color distortion for cartoon images
   13 shades of 19 colors (*1)
     Slightly better than the previous
     Mostly cutting off "near black" for additional colors.
     Say: adding an Orange, Olive-Green, and Sky-Blue gradient.
     Don't need 48 colors of "almost black"...
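
For reference, mapping an RGB555 pixel onto the 6*6*6 cube is about as cheap as palette lookup gets, which is part of its appeal despite the poor shading. A small sketch, where the r*36 + g*6 + b index ordering and the function names are assumptions:

```c
#include <stdint.h>

/* Map a 5-bit channel (0..31) to roughly the nearest of 6 levels (0..5). */
static unsigned level6(unsigned c5)
{
    return (c5 * 5u + 15u) / 31u;
}

/* RGB555 (R in bits 14..10, G in 9..5, B in 4..0) to a 6*6*6 cube index.
 * The r*36 + g*6 + b ordering is an assumption for illustration. */
static uint8_t rgb555_to_cube216(uint16_t px)
{
    unsigned r = (px >> 10) & 31u;
    unsigned g = (px >> 5) & 31u;
    unsigned b = px & 31u;
    return (uint8_t)(level6(r) * 36u + level6(g) * 6u + level6(b));
}
```

With only 6 levels per channel, a smooth gradient collapses into a handful of bands, which matches the loss of shading detail noted above.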

I don't know of any palette optimization algorithms that are fast enough
to run in real-time on the BJX2 core (I suspect "in the old days",
palette optimization was likely offline only).

Granted, other palettes are possible, mostly just the difficulty of
finding an organization that "looks good in the general case".

*1:
0z: Gray
1z: Blue (High Sat)
2z: Green (High Sat)
3z: Cyan (High Sat)
4z: Red (High Sat)
5z: Magenta (High Sat)
6z: Yellow (High Sat)
7z: Pink (Off-White)
8z: Beige (Off-White)
9z: Blue (Low Sat)
Az: Green (Low Sat)
Bz: Cyan (Low Sat)
Cz: Red (Low Sat)
Dz: Magenta (Low Sat)
Ez: Yellow (Low Sat)
Fz: Sky Blue (Off-White)

z0: Orange (Mid Sat)
z1: Olive (Mid Sat)
z2: Sky Blue (Mid Sat)

00: Black
01, 02: Very dark gray.
10/11/12/20/21/22: Various other "nearly black" colors.
Technically, the bottoms of the orange/olive/sky bars;
But, these can effectively "merge" the other colors.

In my fiddling, this was generally the "best performing" palette layout
I could seem to find thus far.
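
One way to read the table above as code, assuming the index byte splits into a high and a low nibble: low nibbles 3..F select one of 13 shades in the 16 main rows, while low nibbles 0..2 select the orange/olive/sky gradients, with the high nibble as the shade. That gives 16*13 + 3*16 = 256 entries. The struct and function names are made up, and the exact shade ordering is a guess.

```c
#include <stdint.h>

/* hue 0..15: the rows 0z..Fz; hue 16..18: orange, olive, sky blue. */
struct pal_slot {
    int hue;
    int shade;
};

/* Split a palette index into (hue, shade) under the reading above. */
static struct pal_slot pal_classify(uint8_t idx)
{
    int hi = idx >> 4;
    int lo = idx & 15;
    struct pal_slot s;
    if (lo < 3) {          /* z0, z1, z2: the three extra gradients */
        s.hue = 16 + lo;
        s.shade = hi;      /* 16 steps; the darkest few are near black */
    } else {               /* 0z..Fz: 13 shades per main row */
        s.hue = hi;
        s.shade = lo - 3;
    }
    return s;
}
```

Under this reading, entries 00, 01 and 02 are the darkest steps of the three extra gradients, which lines up with them serving as black and very dark gray.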

>> ....
>
>
>
>> Current leaning is also that:
>>    32-bit primary instruction size;
>>    32/64/96 bit for variable-length instructions;
>>    Is "pretty good".
>
>> In performance-oriented use cases, 16-bit encodings "aren't really
>> worth it".
>> In cases where you need a 32 or 64 bit value, being able to encode
>> them or load them quickly into a register is ideal. Spending multiple
>> instructions to glue a value together isn't ideal, nor is needing to
>> load it from memory (this particularly sucks from the compiler POV).
>
>
>> As for addressing modes:
>>    (Rb, Disp) : ~ 66-75%
>>    (Rb, Ri)   : ~ 25-33%
>> Can address the vast majority of cases.
>
>> Displacements are most effective when scaled by the size of the
>> element type, as unaligned displacements are exceedingly rare. The
>> vast majority of displacements are also positive.
>
>> Not having a register-indexed mode is shooting oneself in the foot, as
>> these are "not exactly rare".
>
>> Most other possible addressing modes can be mostly ignored.
>>    Auto-increment becomes moot if one has superscalar or VLIW;
>>    (Rb, Ri, Disp) is only really applicable in niche cases
>>      Eg, array inside struct, etc.
>>    ...
>
>
>
>> RISC-V did sort of shoot itself in the foot in several of these areas,
>> albeit with some workarounds in "Bitmanip":
>>    SHnADD, can mimic a LEA, allowing array access in fewer ops.
>>    PACK, allows an inline 64-bit constant load in 5 instructions...
>>      LUI+ADD+LUI+ADD+PACK
>>    ...
>
>> Still not ideal...
>
>> An extra cycle for memory access is not ideal for a close second place
>> addressing mode; nor are 64-bit constants rare enough that one
>> necessarily wants to spend 5 or so clock cycles on them.
>
>> But, still better than the situation where one does not have these
>> instructions.
>
>> ....


Re: Concertina II Progress

<uis2j9$f06t$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34957&group=comp.arch#34957

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 02:44:57 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <uis2j9$f06t$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikcd2$2lh5f$3@dont-email.me>
<b8330e20443df008b0ab07560e543581@news.novabbs.com>
<uirmau$8tah$1@dont-email.me>
<4e99726a78a7843a505893980635b8dd@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 13 Nov 2023 02:44:57 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6b4eecbb0d62670bc6753d55ba5ba343";
logging-data="491741"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/0alxfJYtS62RsydvIxcxNha1BMOu8szk="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:Quj1qeBECz9TFbfwYXcs30S0PK0=
 by: Quadibloc - Mon, 13 Nov 2023 02:44 UTC

On Mon, 13 Nov 2023 00:16:24 +0000, MitchAlsup wrote:

> A poorly chosen starting point (dark alley)

> Back out of the dark alley, and start from first principles again.

By the way, I think you mean a _blind_ alley.

A dark alley is just a dangerous place, since robbers can attack you
there without being seen.

A _blind_ alley is one that had no exit, one that is a dead end. That
seems to better fit the context of your remarks.

John Savard

Re: Concertina II Progress

<f1e9e6491af62c310d7473f87dcf139e@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34958&group=comp.arch#34958

Newsgroups: comp.arch
Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 03:06:03 +0000
Organization: novaBBS
Message-ID: <f1e9e6491af62c310d7473f87dcf139e@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com> <uikcd2$2lh5f$3@dont-email.me> <b8330e20443df008b0ab07560e543581@news.novabbs.com> <uirmau$8tah$1@dont-email.me> <4e99726a78a7843a505893980635b8dd@news.novabbs.com> <uis2j9$f06t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="735461"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$wwOS/sKkGsmWLhUI6DwLuOq5Fdxbhjg/IT7vpU5E7jrTkyvYjUkBO
 by: MitchAlsup - Mon, 13 Nov 2023 03:06 UTC

Quadibloc wrote:

> On Mon, 13 Nov 2023 00:16:24 +0000, MitchAlsup wrote:

>> A poorly chosen starting point (dark alley)

>> Back out of the dark alley, and start from first principles again.

> By the way, I think you mean a _blind_ alley.

> A dark alley is just a dangerous place, since robbers can attack you
> there without being seen.

> A _blind_ alley is one that had no exit, one that is a dead end. That
> seems to better fit the context of your remarks.
<
Based on those definitions, I definitely meant dark, as in dangerous, as
opposed to no way out except backwards.

> John Savard

Re: Alignment (was: Concertina II Progress)

<3Dq4N.33002$BbXa.1697@fx16.iad>

https://news.novabbs.org/devel/article-flat.php?id=34965&group=comp.arch#34965

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx16.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Alignment (was: Concertina II Progress)
Newsgroups: comp.arch
References: <PQ74N.100$ayBd.39@fx07.iad> <memo.20231112174054.11928i@jgd.cix.co.uk>
Lines: 26
Message-ID: <3Dq4N.33002$BbXa.1697@fx16.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Mon, 13 Nov 2023 14:44:15 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Mon, 13 Nov 2023 14:44:15 GMT
X-Received-Bytes: 1715
 by: Scott Lurndal - Mon, 13 Nov 2023 14:44 UTC

jgd@cix.co.uk (John Dallman) writes:
>In article <PQ74N.100$ayBd.39@fx07.iad>, scott@slp53.sl.home (Scott
>Lurndal) wrote:
>
>> jgd@cix.co.uk (John Dallman) writes:
>> >In article <FlS3N.25739$_Oab.3565@fx15.iad>, scott@slp53.sl.home
>> (Scott Lurndal) wrote:
>> >> jgd@cix.co.uk (John Dallman) writes:
>> >> It might be easier with AArch64. Just set the A bit (bit 1) in
>> >> SCTLR_EL1; it only affects code executing in usermode.
>> >>
>> >> There may even already be some ELF flag that will set it when the
>> >> file is exec(2)'d.
>> >
>> >I'll take a look, but I doubt glibc on Aarch64 is built to be run
>> >with alignment trapping. Should it be EL0 for usermode?
>>
>> The EL1 in the register name describes the minimum exception level
>> allowed to access the register. SCTLR_EL1 includes control bits
>> for both EL1 and EL0.
>
>Aha. It's harder for ARM64: I'd have to be in supervisor mode to set that
>bit, and the stuff I work on is strictly application code.

Unless the ELF flag trick is implemented. I haven't looked at the kernel
with respect to that.

Re: Alignment

<uite8t$moak$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34966&group=comp.arch#34966

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Alignment
Date: Mon, 13 Nov 2023 16:10:20 +0100
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <uite8t$moak$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikb5h$2lcq7$1@dont-email.me>
<b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com>
<uilu1p$2vbev$1@dont-email.me> <2023Nov11.082221@mips.complang.tuwien.ac.at>
<uiokob$3j4r2$1@dont-email.me>
<5d7123751a270e1123737e667b1c31f4@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 13 Nov 2023 15:10:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1eda5e5161597df4c60b057a89c1dff6";
logging-data="745812"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19jmtezFC7hVi3DJHD+CfQs5/FHmk8LR3XXgPwxaXOvKA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:x9+eaeBOMmVQtuvYqttaXjrwZA0=
In-Reply-To: <5d7123751a270e1123737e667b1c31f4@news.novabbs.com>
 by: Terje Mathisen - Mon, 13 Nov 2023 15:10 UTC

MitchAlsup wrote:
> Chris M. Thomasson wrote:
>
>>
>> Think of LL/SC... If one did not honor the reservation granule....
>> well... Shit.. False sharing on a reservation granule can cause live
>> lock and damage forward progress wrt some LL/SC setups.
> <
> One should NEVER (N. E. V. E. R.) attempt ATOMIC stuff on an unaligned
> container. Only aligned containers possess ATOMIC-smelling properties.

This is so obviously correct that you should not have needed to mention
it. Hammering HW with unaligned (maybe even page-straddling) LOCKed
updates is something that should only ever be done for testing purposes.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

short instructions (was: Concertina II Progress)

<jwvil65a9yg.fsf-monnier+comp.arch@gnu.org>

https://news.novabbs.org/devel/article-flat.php?id=34969&group=comp.arch#34969

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: monnier@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: short instructions (was: Concertina II Progress)
Date: Mon, 13 Nov 2023 11:46:47 -0500
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <jwvil65a9yg.fsf-monnier+comp.arch@gnu.org>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikcd2$2lh5f$3@dont-email.me>
<b8330e20443df008b0ab07560e543581@news.novabbs.com>
<uirmau$8tah$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="16f9ef7df38d3fc9f1f0aa027a6fa203";
logging-data="775562"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19vR+k12GkLsMR90aG7Y7lOQAlNrnlkKi4="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:aoEsQw4wlpE9ycQQItiycAP/A/w=
sha1:D4ADGhtj13yxffXqL6gJ2tOfXnQ=
 by: Stefan Monnier - Mon, 13 Nov 2023 16:46 UTC

> That took up too much opcode space to allow 16-bit instructions.

You might want to try and get fancy in your short instructions by
"randomizing" the subset of registers they can access.

E.g. allow both your short LD and ST instruction access 16 registers
but not exactly the same 16.
Or allow your arithmetic instructions to access only 8 registers for their
input and output args but not exactly the same 8 for the two inputs
and/or for the output.

I suspect that if done well, it could give benefits similar to the
skewed-associative caches. The other upside is that it makes register
allocation *really* interesting, thus opening up opportunities to
spend a few more years working on that subproblem :-)
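
As a toy illustration of the idea, each short opcode could carry its own bitmask of reachable registers, and the assembler would pick the short form only when every operand is in its mask. The masks below are invented and do not come from any real ISA.

```c
#include <stdint.h>
#include <stdbool.h>

/* Made-up register subsets, one bit per register out of 32. */
#define SHORT_LD_REGS 0x0000FFFFu  /* R0..R15 */
#define SHORT_ST_REGS 0x00FFFF00u  /* R8..R23 */

/* True if `reg` can be encoded in a short form with this subset. */
static bool fits_short(uint32_t allowed, unsigned reg)
{
    return reg < 32 && ((allowed >> reg) & 1u) != 0;
}
```

With these masks, R4 is loadable but not storable in short form, and R20 the other way round, so the register allocator would have to weigh where each value is defined against how it is later used.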

To up the ante, you could make the set of registers reachable from each
instruction depend not just on the opcode but also on the instruction's
address, so you can sometimes avoid a spill by swapping two
instructions. This would allow the register allocation to interact in
even more interesting ways with instruction scheduling.
There could be a few more PhDs worth of research there.

Stefan

Re: Concertina II Progress

<uitvv4$sfhq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34978&group=comp.arch#34978

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 14:12:16 -0600
Organization: A noiseless patient Spider
Lines: 152
Message-ID: <uitvv4$sfhq$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uire3v$7li2$1@dont-email.me> <2023Nov12.230924@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 13 Nov 2023 20:12:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="e919696c091d00cd19514acc6193b0c6";
logging-data="933434"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18QNh7NoHOKvbV58AtVZigy"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:hA3oDa99o+WFEhvbSMZjaOke7YQ=
In-Reply-To: <2023Nov12.230924@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: BGB - Mon, 13 Nov 2023 20:12 UTC

On 11/12/2023 4:09 PM, Anton Ertl wrote:
> Quadibloc <quadibloc@servername.invalid> writes:
>> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>>> Errm, splitting up registers like this is likely to hurt far more than
>>> anything that 16-bit displacements are likely to gain.
> ...
>>> Much preferable for a compiler to have a flat space of 32 or 64
>>> registers. Having 16 sorta works, but does still add a bit to spill and
>>> fill.
> ...
>> But if the 16-bit instructions I'm making room for are useless to
>> compilers, that's questionable.
>
> It works for the RISC-V C (compressed) extension. Some of these
> compressed instructions use registers 8-15 (others use all 32
> registers, but have other restrictions). But it works fine exactly
> because, if your register usage does not fit the limitations of the
> 16-bit encoding, you just use the 32-bit version of the instruction.
> It seems that they designed the ABI such that registers 8-15 occur
> often in the code. Maybe the gcc maintainer also put some work into
> preferring these registers.
>

Yeah. They can be used by a compiler, and can make a difference for
code-density.

Just, it is more a case of, if one has a tradeoff of:
Fewer instructions but more bytes;
More instructions but fewer bytes.
Then the former is better for performance.

Things like reusing registers more aggressively and using a smaller
subset of the registers, are good for making 16-bit instructions usable,
but are less good for performance.

....

Though, granted, one doesn't want to try to reserve too many registers
(on an ISA with plenty of registers), as one may find that
saving/restoring them costs more than that gained by having them
available for use.

Though, the partial workaround for this (in my case) was dividing the
registers up into sub-groups, and using heuristics to enable these
groups based on an estimate of the register pressure.

Say:
R8 ..R14: Always available, prioritized for size optimization ("/Os");
R24..R31: Enabled as needed for "/Os", always enabled for perf opt.
R40..R47: Enabled with high register pressure.
R56..R63: Enabled with very high register pressure.

Note:
BGBCC's command-line accepts both "/Os" and "-Os" style arguments.
"/Os": Size optimize
"/O1": Moderate speed (try to balance speed and size)
"/O2": Prioritize speed.
"/Z*": Mostly debug related options (like "-g" in GCC)
"/f*": Optional feature flags.
"/m*": Selects target arch/profile.
"/Fe*": Specify output binary (like "-o" in GCC)
Else, it will try to guess an output file name.
Eg: "foo.c" -> "foo.exe"
...

It does try to guess whether the '/' is part of an option or the start
of a filename: if it sees more than one '/', or sees a '.' or similar,
without encountering an '=', it assumes it is a filename.
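
A minimal sketch of that heuristic, for an argument already known to start with '/'. The function name is hypothetical, and the real parser presumably special-cases options such as "/Fe" whose value is itself a filename.

```c
#include <stdbool.h>

/* Guess whether a '/'-prefixed argument is a path rather than an option:
 * more than one '/', or a '.', seen before any '=' means filename. */
static bool looks_like_filename(const char *arg)
{
    int slashes = 0;
    for (const char *p = arg; *p; p++) {
        if (*p == '=')
            return false;          /* option carrying a value */
        if (*p == '/') {
            if (++slashes > 1)
                return true;       /* second '/': treat as a path */
        } else if (*p == '.') {
            return true;           /* looks like an extension */
        }
    }
    return false;                  /* plain option such as "/Os" */
}
```

So "/Os" and "/O2" stay options, while "/src/foo.c" and "/foo.c" are treated as files.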

The register pressure estimate is almost, but not quite, based on a
count of the in-use variables.

It helps to also apply a scale factor for each variable based on how
deeply nested in a loop it is (so that if one has a lot of variables in
use inside a deeply nested loop, the register pressure estimate will be
higher than if most are used outside of a loop).

Though, this scale-factor is nowhere near as severe as with the register
allocation priority (where the nesting level was effectively raised to
an exponent). For pressure estimates, one can use a gentler scale, more
like, say: "scale=sqrt(deepest_nest_level+1.0);".
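
A rough sketch of such an estimate: each in-use variable contributes sqrt(nest+1), so a variable at the top level counts as 1 and one nested three loops deep counts as 2. The function names are made up, and the small Newton-iteration square root is only there to keep the example free of a libm dependency; sqrt() from <math.h> would do the same job.

```c
/* Square root by Newton iteration, valid for x >= 1 (enough here,
 * since the argument is always nest_level + 1). */
static double sqrt_newton(double x)
{
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 32; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Sum of per-variable weights, where nest_level[i] is the deepest loop
 * nesting level at which variable i is used (0 = outside any loop). */
static double pressure_estimate(const int *nest_level, int nvars)
{
    double sum = 0.0;
    for (int i = 0; i < nvars; i++)
        sum += sqrt_newton(nest_level[i] + 1.0);
    return sum;
}
```

The resulting number would then be compared against thresholds to decide which register groups to enable; the threshold values themselves are not given here, so they are left out.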

For dynamically allocated variables in leaf blocks (basic block does not
contain a function call), it may make sense to allocate them in scratch
registers.

Scratch registers are similar:
R0..R1: Not used as GPRs by compiler;
R2..R3: Designated scratch, not used for reg alloc.
R4..R7: Always available;
R16..R17: Designated scratch, not used for reg alloc.
R18..R23: Available when R24..R31 are enabled (always for perf opt);
R32..R39, R48..R55: Available under high register pressure.
Always available if the registers are available and perf optimized.

In performance optimized code, in my case, the spread of the registers
is generally too dispersed to really make any sort of small sub-setting
particularly effective.

> OTOH, ARM who have extensive experience with mixed 32-bit/16-bit
> instruction sets with their A32/T32 instruction set(s), designed their
> A64 instruction set to strictly use 32-bit instructions.
>

I guess it can also be noted, that 64-bit ARM went all-in with a lot of
the sorts of features that RISC-V avoided. For example, it still has
some more complex addressing modes, etc.

I guess also they approached constants a little differently:
You can load a 16-bit value into 1 of 4 positions within a register,
with one of: zero fill, one fill, or keeping the prior contents.

This allows loading an arbitrary constant in between 1 and 4 instructions.
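
As a sketch of where the 1-to-4 figure comes from: counting the 16-bit chunks that differ from all-zeros (a MOVZ start, then MOVK for the rest) or from all-ones (a MOVN start), and taking the smaller count, gives the length of the sequence. This ignores other A64 tricks such as logical-immediate encodings, and the function name is made up.

```c
#include <stdint.h>

/* Number of MOVZ/MOVN + MOVK instructions needed to build a 64-bit
 * constant from 16-bit chunks (other encodings not considered). */
static int a64_mov_count(uint64_t v)
{
    int nonzero = 0, nonones = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t c = (uint16_t)(v >> (16 * i));
        if (c != 0x0000)
            nonzero++;   /* chunks a zero-filled start must still set */
        if (c != 0xFFFF)
            nonones++;   /* chunks a one-filled start must still set */
    }
    int n = nonzero < nonones ? nonzero : nonones;
    return n ? n : 1;    /* even 0 or ~0 takes one instruction */
}
```

For example, 0xFFFFFFFFFFFF1234 takes one instruction via the one-fill form, while 0x123456789ABCDEF0 needs all four.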

Though, I did realize that with RISC-V's Bitmanip extensions, it is
possible to get a 64-bit constant load down to 5 instructions, which is
better than RV64I needing 6 (and in both cases, needing 2 registers).

In BJX2, with Jumbo, it is 3 instruction words and 1 clock cycle.
Without Jumbo, it is 4 instructions (albeit less flexible than the
mechanism in ARM).

> So if MIPS, SPARC, Power, Alpha, and ARM A64 went for fixed-width
> 32-bit instructions, why make your task harder by also implementing
> short instructions? Of course, if that is your goal or you have fun
> with this, why not? But if you want to make progress, it seems to be
> something that can be skipped.
>

In my case, I am left with an awkward split in my ISA:
Baseline Mode, which has both 16 and 32-bit instructions (and bigger);
XG2, which is 32-bit (and bigger).

Some of my newer design variants had leaned towards 32-bit and 64
registers, mostly because the higher register count does help
performance (at least, performance per clock; not so sure it helps with
LUTs or timing constraints though, *).

*: Mostly because the 5-bit LUTRAMs work with 3 bits of data, but the
6-bit LUTRAMs only have 2 bits of data.

Re: Concertina II Progress

<uiu65e$t5p2$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34983&group=comp.arch#34983

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m.thomasson.1@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 13 Nov 2023 13:58:06 -0800
Organization: A noiseless patient Spider
Lines: 20
Message-ID: <uiu65e$t5p2$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikcd2$2lh5f$3@dont-email.me>
<b8330e20443df008b0ab07560e543581@news.novabbs.com>
<uirmau$8tah$1@dont-email.me>
<4e99726a78a7843a505893980635b8dd@news.novabbs.com>
<uis2j9$f06t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 13 Nov 2023 21:58:07 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="55c5ba4d49df2a7646321f8f0cd32f12";
logging-data="956194"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+doNEUda9lhdoRODGmDk9EO4RRgobQhog="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:CbfTn6QAQiKpns/jhIA5W00phyM=
Content-Language: en-US
In-Reply-To: <uis2j9$f06t$1@dont-email.me>
 by: Chris M. Thomasson - Mon, 13 Nov 2023 21:58 UTC

On 11/12/2023 6:44 PM, Quadibloc wrote:
> On Mon, 13 Nov 2023 00:16:24 +0000, MitchAlsup wrote:
>
>> A poorly chosen starting point (dark alley)
>
>> Back out of the dark alley, and start from first principles again.
>
> By the way, I think you mean a _blind_ alley.
>
> A dark alley is just a dangerous place, since robbers can attack you
> there without being seen.

Expose the darkness to the light, before any adventures...? ;^)

>
> A _blind_ alley is one that had no exit, one that is a dead end. That
> seems to better fit the context of your remarks.
>
> John Savard

Re: short instructions (was: Concertina II Progress)

<uj01n8$1916k$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34998&group=comp.arch#34998

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: short instructions (was: Concertina II Progress)
Date: Tue, 14 Nov 2023 14:54:32 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <uj01n8$1916k$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikcd2$2lh5f$3@dont-email.me>
<b8330e20443df008b0ab07560e543581@news.novabbs.com>
<uirmau$8tah$1@dont-email.me> <jwvil65a9yg.fsf-monnier+comp.arch@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 14 Nov 2023 14:54:32 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="52796a42deb26f6c9e81fe6bfbfec78b";
logging-data="1344724"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19mrPw6k9/X4XhgwKosVrXPetPAH+a1ic8="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:p0gTUukgnOQh7ED8zC3wm5fm8+U=
 by: Quadibloc - Tue, 14 Nov 2023 14:54 UTC

On Mon, 13 Nov 2023 11:46:47 -0500, Stefan Monnier wrote:

> You might want to try and get fancy in your short instructions by
> "randomizing" the subset of registers they can access.
>
> E.g. allow both your short LD and ST instruction access 16 registers but
> not exactly the same 16.
> Or allow your arithmetic instructions to access only 8 registers for
> their input and output args but not exactly the same 8 for the two
> inputs and/or for the output.
>
> I suspect that if done well, it could give benefits similar to the
> skewed-associative caches. The other upside is that it makes register
> allocation *really* interesting, thus opening up opportunities to spend
> a few more years working on that subproblem :-)

I would like to be able to say that this idea was too bizarre even for
me.

However, one of the ideas I toyed with before settling on my current
iteration of Concertina II was to

- drop the aligned memory-reference instructions
- somehow squeeze the 32-bit operate instructions into the space left
over by the byte instructions in the family
- thereby doubling the space available for 16-bit instructions.

The instruction slots of the form 0-0- would be as before: two instructions
where both source and destination are in the same group of eight registers.

The instruction slots of the form 0-1- would contain two 16-bit
instructions where the source and destination registers are each
four bits long, allowing (as in the indexed memory-reference
instructions) the use of the first four registers in each of the four
groups of eight registers.

Thus, one instruction type uses all the registers, and the other
allows transfers between the banks of eight registers.
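
As a rough illustration of the two short-instruction forms described
above, the sketch below expands register specifiers into flat register
numbers. It assumes 32 registers numbered 0-31 in four groups of eight,
and that a 4-bit specifier holds the group in its upper two bits and the
register within the group in its lower two. The function names and the
bit assignment are hypothetical, not taken from the Concertina II
definition.

```python
# Hypothetical sketch: 32 registers, viewed as four groups of eight.
GROUP_SIZE = 8

def expand_spec4(spec4):
    """Map a 4-bit specifier to one of the first four registers of a group."""
    group = (spec4 >> 2) & 0b11   # assumed: upper two bits pick the group
    index = spec4 & 0b11          # assumed: lower two bits pick register 0..3
    return group * GROUP_SIZE + index

def expand_spec3(group, spec3):
    """Map a 3-bit specifier to a register inside one shared group of eight."""
    return (group & 0b11) * GROUP_SIZE + (spec3 & 0b111)

# Registers reachable through the 4-bit form: 0-3, 8-11, 16-19, 24-27.
reachable = sorted(expand_spec4(s) for s in range(16))
```

Under these assumptions the 4-bit form reaches only half of the
registers, but it can name a source and a destination in different
groups, which the 3-bit form cannot.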

So, sadly, I actually *did* contemplate going there. Fortunately, I
thought better of it.

> To up the ante, you could make the set of registers reachable from each
> instruction depend not just on the opcode but also on the instruction's
> address, so you can sometimes avoid a spill by swapping two
> instructions. This would allow the register allocation to interact in
> even more interesting ways with instruction scheduling.
> There could be a few more PhDs worth of research there.

That would definitely be one trick to allow access to more registers than
the number of opcode bits allows.
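
One way to picture this is a toy mapping in which a few
instruction-address bits choose the bank that a 3-bit specifier reaches.
The mapping below is purely an assumption for illustration (16-bit
instructions, four banks of eight registers, bank taken from address
bits 1-2); nothing like it was specified in the thread.

```python
def reachable_register(spec3, address):
    """Toy mapping: the instruction's address selects the register bank.

    Assumes 16-bit instructions, so consecutive instructions differ by 2
    in address and therefore land in consecutive banks.
    """
    bank = (address >> 1) & 0b11
    return bank * 8 + (spec3 & 0b111)
```

With such a mapping, swapping two adjacent instructions changes each
one's address and hence the bank its specifier reaches, which is the
interaction between scheduling and register allocation suggested above.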

John Savard

Re: Concertina II Progress

<uj3380$1rnvb$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35008&group=comp.arch#35008

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Wed, 15 Nov 2023 10:38:56 -0800
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <uj3380$1rnvb$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
<uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me>
<74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 15 Nov 2023 18:38:57 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d89a8fafcfc2273ff43f55e3c5fb8545";
logging-data="1957867"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19lUPvZx77+M/PaYSfV7JGAcs8+BHH9HYY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ARtZ9ZYHe4nYS/aC4BVmJmbyjkQ=
In-Reply-To: <74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com>
Content-Language: en-US
 by: Stephen Fuld - Wed, 15 Nov 2023 18:38 UTC

On 11/11/2023 10:11 AM, MitchAlsup wrote:
> Stephen Fuld wrote:
>
>> On 11/10/2023 10:24 AM, BGB wrote:
>>
>>>
>>>
>>> Much better to have a big flat register space.
>
>> Yes, but sometimes you just need "another bit" in the instructions.
>> So an alternative is to break the requirement that all register
>> specifier fields in the instruction be the same length.  So, for
>> example, allow
> <
> Another way to get a few more bits is to use a prefix-instruction like
> CARRY for those seldom needed bits.

Good point. A combination of the two ideas could be to have the prefix
instruction specify which register to use instead of the one specified
in the reduced register specifier for whichever instructions in its
shadow have the bit set in the prefix. Worst case, this is the same as
my original proposal - one extra, not really executed, instruction
(prefix versus register to register move) for one where you need to use
it, but this idea might, by allowing the prefix to specify multiple
instructions, save more than one extra "instruction". The only downside
is it requires an additional op code.
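
A minimal sketch of the combined idea, assuming the prefix carries one
full-width register number plus a bit mask over the instructions in its
shadow (bit i for the i-th following instruction). The function name and
field layout are hypothetical:

```python
def apply_prefix(full_reg, shadow_mask, short_regs):
    """Return the register each shadowed instruction actually uses.

    short_regs[i] is the register named by the reduced specifier of the
    i-th instruction after the prefix; a set bit i in shadow_mask
    replaces it with full_reg.
    """
    return [
        full_reg if (shadow_mask >> i) & 1 else reg
        for i, reg in enumerate(short_regs)
    ]
```

In this sketch each set bit in the mask stands in for one
register-to-register move that would otherwise be needed.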

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Concertina II Progress

<5412afba176e6044e28a72965f13ac4a@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35010&group=comp.arch#35010

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Wed, 15 Nov 2023 19:02:00 +0000
Organization: novaBBS
Message-ID: <5412afba176e6044e28a72965f13ac4a@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad> <uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me> <74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com> <uj3380$1rnvb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1024547"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Spam-Level: *
X-Rslight-Site: $2y$10$6St4pPFAEtj24Du3rrIcpeT42BOvqSVo/uwOk31cnxzA37yO9MzlC
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Wed, 15 Nov 2023 19:02 UTC

Stephen Fuld wrote:

> On 11/11/2023 10:11 AM, MitchAlsup wrote:
>> Stephen Fuld wrote:
>>
>>> On 11/10/2023 10:24 AM, BGB wrote:
>>>
>>>>
>>>>
>>>> Much better to have a big flat register space.
>>
>>> Yes, but sometimes you just need "another bit" in the instructions.
>>> So an alternative is to break the requirement that all register
>>> specifier fields in the instruction be the same length.  So, for
>>> example, allow
>> <
>> Another way to get a few more bits is to use a prefix-instruction like
>> CARRY for those seldom needed bits.

> Good point. A combination of the two ideas could be to have the prefix
> instruction specify which register to use instead of the one specified
> in the reduced register specifier for whichever instructions in its
> shadow have the bit set in the prefix.
<
You could have the prefix instruction supply the missing bits of all
shortened register specifiers.
<
< Worst case, this is the same as
> my original proposal - one extra, not really executed, instruction
<
Which is why I use the term instruction-modifier.
<
> (prefix versus register to register move) for one where you need to use
> it, but this idea might, by allowing the prefix to specify multiple
> instructions, save more than one extra "instruction". The only downside
> is it requires an additional op code.
<
But by having an instruction-modifier that can add bits to several
succeeding instructions, you can avoid cluttering up ISA with things
like ADC, SBC, IMULD, DDIV, ....... So, in the end, you save OpCode
enumeration space not consume it.

Re: Concertina II Progress

<uj37t1$1sgg4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35014&group=comp.arch#35014

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Wed, 15 Nov 2023 11:58:25 -0800
Organization: A noiseless patient Spider
Lines: 67
Message-ID: <uj37t1$1sgg4$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
<uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me>
<74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com>
<uj3380$1rnvb$1@dont-email.me>
<5412afba176e6044e28a72965f13ac4a@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 15 Nov 2023 19:58:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d89a8fafcfc2273ff43f55e3c5fb8545";
logging-data="1982980"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Sy1sHjf3rRa0M3nVq498CFrM4IZRQ/YI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:BII3LkU74S7CiJ8vikLEk7u7rxQ=
In-Reply-To: <5412afba176e6044e28a72965f13ac4a@news.novabbs.com>
Content-Language: en-US
 by: Stephen Fuld - Wed, 15 Nov 2023 19:58 UTC

On 11/15/2023 11:02 AM, MitchAlsup wrote:
> Stephen Fuld wrote:
>
>> On 11/11/2023 10:11 AM, MitchAlsup wrote:
>>> Stephen Fuld wrote:
>>>
>>>> On 11/10/2023 10:24 AM, BGB wrote:
>>>>
>>>>>
>>>>>
>>>>> Much better to have a big flat register space.
>>>
>>>> Yes, but sometimes you just need "another bit" in the instructions.
>>>> So an alternative is to break the requirement that all register
>>>> specifier fields in the instruction be the same length.  So, for
>>>> example, allow
>>> <
>>> Another way to get a few more bits is to use a prefix-instruction like
>>> CARRY for those seldom needed bits.
>
>> Good point. A combination of the two ideas could be to have the prefix
>> instruction specify which register to use instead of the one specified
>> in the reduced register specifier for whichever instructions in its
>> shadow have the bit set in the prefix.
> <
> You could have the prefix instruction supply the missing bits of all
> shortened register specifiers.

I am not sure what you are proposing here. Can you show an example?

> <
> <                                         Worst case, this is the same as
>> my original proposal - one extra, not really executed, instruction
> <
> Which is why I use the term instruction-modifier.

Agreed.

> <
>> (prefix versus register to register move) for one where you need to
>> use it, but this idea might, by allowing the prefix to specify
>> multiple instructions, save more than one extra "instruction".  The
>> only downside is it requires an additional op code.
> <
> But by having an instruction-modifier that can add bits to several
> succeeding instructions, you can avoid cluttering up ISA with things
> like ADC, SBC, IMULD, DDIV, ....... So, in the end, you save OpCode
> enumeration space not consume it.

In the general case, I certainly agree. But here you need a different
op-code than CARRY, as this has different semantics, and I think the new
instruction modifier has no other use, hence it is an additional op code
versus the original proposal of using essentially a register copy
instruction, which already exists (i.e. a load with a zero displacement
and the source register as the address modifier).

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Concertina II Progress

<063885f383205c854c2387dcea32ba7a@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=35022&group=comp.arch#35022

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Wed, 15 Nov 2023 21:10:52 +0000
Organization: novaBBS
Message-ID: <063885f383205c854c2387dcea32ba7a@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad> <uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me> <74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com> <uj3380$1rnvb$1@dont-email.me> <5412afba176e6044e28a72965f13ac4a@news.novabbs.com> <uj37t1$1sgg4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1035833"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$WQ9ebk3j0iWFheq3yRH1MOhTm8Qk437mJtVibLrAnIGRV4zk04gs6
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Spam-Level: *
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
 by: MitchAlsup - Wed, 15 Nov 2023 21:10 UTC

Stephen Fuld wrote:

> On 11/15/2023 11:02 AM, MitchAlsup wrote:
>> Stephen Fuld wrote:
>>
>>> On 11/11/2023 10:11 AM, MitchAlsup wrote:
>>>> Stephen Fuld wrote:
>>>>
>>>>> On 11/10/2023 10:24 AM, BGB wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Much better to have a big flat register space.
>>>>
>>>>> Yes, but sometimes you just need "another bit" in the instructions.
>>>>> So an alternative is to break the requirement that all register
>>>>> specifier fields in the instruction be the same length.  So, for
>>>>> example, allow
>>>> <
>>>> Another way to get a few more bits is to use a prefix-instruction like
>>>> CARRY for those seldom needed bits.
>>
>>> Good point. A combination of the two ideas could be to have the prefix
>>> instruction specify which register to use instead of the one specified
>>> in the reduced register specifier for whichever instructions in its
>>> shadow have the bit set in the prefix.
>> <
>> You could have the prefix instruction supply the missing bits of all
>> shortened register specifiers.

> I am not sure what you are proposing here. Can you show an example?

Let us postulate a MoreBits instruction-modifier with a 16-bit immediate
field. Now each 16-bit instruction, which has access to only 8 registers,
strips off 2 bits per specifier, so all its register specifiers become 5 bits.
The immediate supplies the bits, and as bits are stripped off the Decoder
shifts the field down by the consumed bits. When the last bit has been
stripped off, you need another MB instruction-modifier to supply more bits.
Since only 16-bit instructions are "limited", one MB should last about a
basic block or extended basic block.

Note I don't care how the bits are apportioned, formatted, consumed, ...
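
As a sketch of the decoder-side bookkeeping, assuming the low-order bits
of the immediate are consumed first and become the two high-order bits
of each 5-bit specifier (the class name, method name, and bit order are
all hypothetical, since the apportionment is left open above):

```python
class MoreBits:
    """Hypothetical decoder state for a MoreBits instruction-modifier."""

    def __init__(self, imm16):
        self.bits = imm16 & 0xFFFF   # bits not yet consumed
        self.remaining = 16          # how many of them are left

    def extend(self, spec3):
        """Widen one 3-bit register specifier to 5 bits."""
        if self.remaining < 2:
            raise ValueError("MoreBits exhausted; another modifier is needed")
        high = self.bits & 0b11      # strip two bits off the immediate
        self.bits >>= 2              # shift the field down by the consumed bits
        self.remaining -= 2
        return (high << 3) | (spec3 & 0b111)
```

With 2 bits per specifier, one 16-bit immediate covers eight specifiers
before a further modifier is required.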

>> <
>> <                                         Worst case, this is the same as
>>> my original proposal - one extra, not really executed, instruction
>> <
>> Which is why I use the term instruction-modifier.

> Agreed.

>> <
>>> (prefix versus register to register move) for one where you need to
>>> use it, but this idea might, by allowing the prefix to specify
>>> multiple instructions, save more than one extra "instruction".  The
>>> only downside is it requires an additional op code.
>> <
>> But by having an instruction-modifier that can add bits to several
>> succeeding instructions, you can avoid cluttering up ISA with things
>> like ADC, SBC, IMULD, DDIV, ....... So, in the end, you save OpCode
>> enumeration space not consume it.

> In the general case, I certainly agree. But here you need a different
> op-code than CARRY, as this has different semantics, and I think the new
> instruction modifier has no other use, hence it is an additional op code
> versus the original proposal of using essentially a register copy
> instruction, which already exists (i.e. a load with a zero displacement
> and the source register as the address modifier).

CARRY is your access to ALL extended precision calculations (saving 20+
OpCodes when you consider a robust commercial ISA rather than an Academic
ISA). CARRY accesses integer arithmetic, shifts, extracts, inserts, and
exact floating point calculations larger than 64 bits, including
Kahan-Babuška summation. {{Not bad for 1 OpCode !!}}

Similarly:: VEC-LOOP provide access to 1,000+ SIMD instructions and 400+
Vector instructions at the cost of 2 units in the OpCode Space !! It also
allows a future implementation to execute wider (or narrower) than SIMD
with no change in the instruction sequence.

MoreBits is effectively just like REX except it can span instructions.

Re: Concertina II Progress

<ujg54v$c6r4$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35131&group=comp.arch#35131

Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 20 Nov 2023 09:31:11 -0800
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <ujg54v$c6r4$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
<uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me>
<74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com>
<uj3380$1rnvb$1@dont-email.me>
<5412afba176e6044e28a72965f13ac4a@news.novabbs.com>
<uj37t1$1sgg4$1@dont-email.me>
<063885f383205c854c2387dcea32ba7a@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 20 Nov 2023 17:31:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="18b916a3fcf225267af21695b58813b0";
logging-data="400228"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19PKwlhQEiQqxOPB6ZmEdcb7XCFPHgxScw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:87qrLL4oDe9fZl5pLkJfCQYIfCo=
In-Reply-To: <063885f383205c854c2387dcea32ba7a@news.novabbs.com>
Content-Language: en-US
 by: Stephen Fuld - Mon, 20 Nov 2023 17:31 UTC

On 11/15/2023 1:10 PM, MitchAlsup wrote:
> Stephen Fuld wrote:
>
>> On 11/15/2023 11:02 AM, MitchAlsup wrote:
>>> Stephen Fuld wrote:
>>>
>>>> On 11/11/2023 10:11 AM, MitchAlsup wrote:
>>>>> Stephen Fuld wrote:
>>>>>
>>>>>> On 11/10/2023 10:24 AM, BGB wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Much better to have a big flat register space.
>>>>>
>>>>>> Yes, but sometimes you just need "another bit" in the
>>>>>> instructions. So an alternative is to break the requirement that
>>>>>> all register specifier fields in the instruction be the same
>>>>>> length.  So, for example, allow
>>>>> <
>>>>> Another way to get a few more bits is to use a prefix-instruction like
>>>>> CARRY for those seldom needed bits.
>>>
>>>> Good point. A combination of the two ideas could be to have the
>>>> prefix instruction specify which register to use instead of the one
>>>> specified in the reduced register specifier for whichever
>>>> instructions in its shadow have the bit set in the prefix.
>>> <
>>> You could have the prefix instruction supply the missing bits of all
>>> shortened register specifiers.
>
>> I am not sure what you are proposing here.  Can you show an example?
>
> Let us postulate an MoreBits instruction-modifier with a 16-bit immediate
> field. Now each 16-bit instruction, that has access to only 8 registers,
> strips off 2-bits/specifier, so now all its register specifiers are 5-bits.
> The immediate supplies the bits and as bits are stripped off the Decoder
> shifts the field down by the consumed bits. When the last bit has been
> stripped off you would need another MB im to supply those bits. Since
> only 16-bit instructions are "limited" one MB should last about a basic
> block or extended basic block.
>
> Note I don't care how the bits are apportioned, formatted, consumed, ...

Oh, so you have changed the meaning of the "immediate bit map" from
specifying which of the following instructions it applies to (e.g.
CARRY) to the actual data. I like it!

If using 16 bit instructions, and if you only have one small register
field per instruction, I think it is better to make "MoreBits" a 16 bit
instruction modifier itself, with say a five bit op code and an eleven
bit immediate, which supplies the extra bit for the next 11
instructions. More compact than a 32 bit instruction, and almost as
"far reaching". If you need more than 11 bits, even if you add a second
MB instruction modifier 11 instructions later, you are still no worse
off than an instruction modifier plus a 16 bit immediate.

Of course, if you need more than one extra bit per instruction, then
more "drastic" measures, such as your proposal, are needed.
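
A small sketch of that 16-bit form, assuming the op code sits in the top
five bits, the immediate in the low eleven, and bit i of the immediate
belongs to the i-th following instruction. The op code value and the
helper names are made up for illustration:

```python
MB_OPCODE = 0b11111  # hypothetical 5-bit op code for MoreBits

def encode_mb(extra_bits):
    """Pack up to 11 extra register bits into one 16-bit modifier."""
    if len(extra_bits) > 11:
        raise ValueError("one modifier supplies at most 11 bits")
    imm = 0
    for i, bit in enumerate(extra_bits):
        imm |= (bit & 1) << i
    return (MB_OPCODE << 11) | imm

def decode_mb(word):
    """Return the 11 extra bits, or None if word is not a MoreBits modifier."""
    if (word >> 11) != MB_OPCODE:
        return None
    imm = word & 0x7FF
    return [(imm >> i) & 1 for i in range(11)]

def modifiers_needed(n_instructions):
    """Number of 16-bit modifiers needed to cover n short instructions."""
    return -(-n_instructions // 11)  # ceiling division
```

Two such modifiers occupy 32 bits and cover 22 instructions, compared
with 16 for a 32-bit modifier carrying a 16-bit immediate at one bit per
instruction.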

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Concertina II Progress

<ujgrel$h32p$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=35137&group=comp.arch#35137

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Mon, 20 Nov 2023 17:51:46 -0600
Organization: A noiseless patient Spider
Lines: 270
Message-ID: <ujgrel$h32p$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
<uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me>
<74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com>
<uj3380$1rnvb$1@dont-email.me>
<5412afba176e6044e28a72965f13ac4a@news.novabbs.com>
<uj37t1$1sgg4$1@dont-email.me>
<063885f383205c854c2387dcea32ba7a@news.novabbs.com>
<ujg54v$c6r4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 20 Nov 2023 23:51:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="fee2410350d052836693883466797095";
logging-data="560217"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/y+qhxOhHZYUzC1bfR9jBV"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:iwqTWC+dfcqkP1Fnie8rgA4Vi6s=
In-Reply-To: <ujg54v$c6r4$1@dont-email.me>
Content-Language: en-US
 by: BGB - Mon, 20 Nov 2023 23:51 UTC

On 11/20/2023 11:31 AM, Stephen Fuld wrote:
> On 11/15/2023 1:10 PM, MitchAlsup wrote:
>> Stephen Fuld wrote:
>>
>>> On 11/15/2023 11:02 AM, MitchAlsup wrote:
>>>> Stephen Fuld wrote:
>>>>
>>>>> On 11/11/2023 10:11 AM, MitchAlsup wrote:
>>>>>> Stephen Fuld wrote:
>>>>>>
>>>>>>> On 11/10/2023 10:24 AM, BGB wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Much better to have a big flat register space.
>>>>>>
>>>>>>> Yes, but sometimes you just need "another bit" in the
>>>>>>> instructions. So an alternative is to break the requirement that
>>>>>>> all register specifier fields in the instruction be the same
>>>>>>> length.  So, for example, allow
>>>>>> <
>>>>>> Another way to get a few more bits is to use a prefix-instruction
>>>>>> like
>>>>>> CARRY for those seldom needed bits.
>>>>
>>>>> Good point. A combination of the two ideas could be to have the
>>>>> prefix instruction specify which register to use instead of the one
>>>>> specified in the reduced register specifier for whichever
>>>>> instructions in its shadow have the bit set in the prefix.
>>>> <
>>>> You could have the prefix instruction supply the missing bits of all
>>>> shortened register specifiers.
>>
>>> I am not sure what you are proposing here.  Can you show an example?
>>
>> Let us postulate an MoreBits instruction-modifier with a 16-bit immediate
>> field. Now each 16-bit instruction, that has access to only 8 registers,
>> strips off 2-bits/specifier, so now all its register specifiers are
>> 5-bits.
>> The immediate supplies the bits and as bits are stripped off the Decoder
>> shifts the field down by the consumed bits. When the last bit has been
>> stripped off you would need another MB im to supply those bits. Since
>> only 16-bit instructions are "limited" one MB should last about a
>> basic block or extended basic block.
>>
>> Note I don't care how the bits are apportioned, formatted, consumed, ...
>
> Oh, so you have changed the meaning of the "immediate bit map" from
> specifying which of the following instructions it applies to (e.g.
> CARRY) to the actual data.  I like it!
>
> If using 16 bit instructions, and if you only have one small register
> field per instruction, I think it is better to make "MoreBits" a 16 bit
> instruction modifier itself, with say a five bit op code and an eleven
> bit immediate, which supplies the extra bit for the next 11
> instructions.  More compact than a 32 bit instruction, and almost as
> "far reaching".  If you need more than 11 bits, even if you add a second
> MB instruction modifier 11 instructions later, you are still no worse
> off than an instruction modifier plus a 16 bit immediate.
>
> Of course, if you need more than one extra bit per instruction, then
> more "drastic" measures, such as your proposal, are needed.
>
>

Ironically, this is closer to how 32-bit ops were originally intended to
work in BJX2, and how they worked in BJX1 (where most of the 32-bit ops
were basically prefixes on the existing 16-bit SuperH ops).

Say:
ZnmZ //typical layout of a 16-bit op, R0..R15
8Ceo-ZnmZ //Op gains an extra register field, and R16..R31.

Then, in the original form of BJX2:
ZZnm
F0eo-ZZnm

For some ops, the 3rd register (Ro) would instead operate as a 5-bit
immediate/displacement field. Which was initially a similar idea, with
the 32-bit space mirroring the 16-bit space.

When I later added the Imm9 encodings, the encoding of the other ops was
changed to be more consistent with this:
F0nm-ZeoZ
F2nm-Zeii

This was originally designed as a possible successor ISA, but it seemed
"better" to back-fold it into my existing ISA (effectively replacing the
original encoding scheme in the process).

This encoding was relatively stable, until Jumbo prefixes were added and
shook things up a little more (and the more recent shakeup with XG2,
which has effectively fragmented the ISA into two sub-variants with
neither being a "clear winner", *).

*: The previous Baseline encoding is better for code density (due to
still having 16-bit ops), XG2 is better for performance (due to more
orthogonality, such as the ability to use every register from every
instruction, and adding a bit to the Immed/Displacement fields, or 3 in
the case of plain branches).

Had considered possible options for "Make XG2's encoding less dog
chewed", but the issue is not so simple as simply shifting the bits
around (shuffling the bits would just make it dog-chewed in other ways).

So, existing encoding, expressed in bits, is roughly:
NMOP-ZwZZ-nnnn-mmmm ZZZZ-Qnmo-oooo-ZZZZ

And the possible revised form:
PwZZ-ZZZZ-ZZnn-nnnn-mmmm-mmoo-oooo-ZZZZ

However, what I have thus far would effectively amount to nearly a full
reboot of the encoding (which would be a huge pile of effort), so less
likely to be "worth it" in the name of a slightly less chewed encoding
scheme (and, hell, RISC-V is going along OK with its immediate fields
being effectively confetti).

Though, another option could be closer to a straight reshuffle:
NMOP-ZwZZ-nnnn-mmmm YYYY-Qnmo-oooo-XXXX
NMIP-ZwZZ-nnnn-mmmm YYYY-Qnmi-iiii-iiii
To:
PwZZ-ZQnn-nnnn-YYYY-mmmm-mmoo-oooo-XXXX
PwZZ-ZQnn-nnnn-YYYY-mmmm-mmii-iiii-iiii

So, the existing ISA listing could be mapped over mostly as-is, with the
main changes (besides the bit-reshuffle) being in the immediate field.

However:
DDDP-0w00-nnnn-mmmm 1100-dddd-dddd-dddd
To:
Pw00-0ddd-dddd-YYYY-dddd-dddd-dddd-dddd

Is gonna need some new relocs, ...

OTOH, it would allow making the F8 block's encoding consistent with the
rest of the ISA.

But, recently I am left feeling uncertain if any of this is anything
more than moot...

Did recently make a little bit of progress towards having a GUI in
TestKern, in that I now have a console window with a shell "sorta" able
to run inside this console.

This has partly opened a "Pandora's box", though, in that I now need to
deal with multitasking, re-entrance, and the possible need for mutex
locking (as-is, it was "barely working" in that I had to carefully
avoid re-entrance in a few areas to keep the kernel from exploding, as
none of this stuff has mutexes).

Well, and then having to fix up issues like keeping the scheduler from
trying to schedule the syscall-handler task and promptly causing the "OS"
to explode (for now, these are special-cased; I may need to come up with
a general way of flagging some tasks as "do not schedule", since they
will exist as special cases to handle syscalls or specifically as the
target of inter-process VTable calls, as is the case with TKGDI, where
the call itself will schedule the task). In this case, the
mechanism for inter-task control flow will take a form resembling that
of COM objects (it is likely that TKRA-GL may need to be reworked into
this form as well, *2).
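
The "do not schedule" flag mentioned above could look something like
the following round-robin sketch. The Task fields and pick_next are
hypothetical illustrations, not TestKern's actual structures:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    no_schedule: bool = False  # hypothetical flag: only entered via a call

def pick_next(tasks, current_index):
    """Round-robin over tasks, skipping any flagged as do-not-schedule.

    Returns the index of the next runnable task, or None if every task
    is flagged.
    """
    n = len(tasks)
    for step in range(1, n + 1):
        idx = (current_index + step) % n
        if not tasks[idx].no_schedule:
            return idx
    return None
```

Here the syscall-handler and TKGDI tasks would carry the flag and be
entered only when a syscall or inter-process call transfers control to
them.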

Also looking like I will need to rework how the shell works.
Effectively, now, rather than the CLI running directly in the kernel, it
needs to be a userland (or "superuserland", *) task communicating with
the kernel via syscalls. So, the shell can no longer directly invoke the
PE/COFF loader, but will now need to use a "CreateProcess" call (and
then probably sleep-loop until the created process terminates).

*: Where a task is being run more like a userland task, but still
running in supervisor mode (the syscall handler task and TKGDI backend
run in this mode).

Where, say:
Thread: Logical thread of execution within some existing process;
Process: Distinct collection of 1 or more threads within a shared
address space and shared process identity (may have its own address
space, though as-of-yet, TestKern uses a shared global address space);
Task: Supergroup that includes Threads, Processes, and other thread-like
entities (such as call and method handlers), may be either thread-like
or process-like.

Where, say, the Syscall interrupt handler doesn't generally handle
syscalls itself (since the ISRs will only have access to
physically-mapped addresses), but effectively instead initiates a
context switch to the task that can handle the request (or, to context
switch back to the task that made the request, or to yield to another
task, ...).
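
Roughly, the ISR's job reduces to picking which task to switch to. A
minimal sketch, assuming a fixed id for the syscall-handler task and a
request record that already carries the caller and the next ready task
(all names here are hypothetical):

```c
typedef enum { REQ_SYSCALL, REQ_RETURN, REQ_YIELD } ReqKind;

enum { SYSCALL_TASK_ID = 1 };    /* assumed fixed id of the handler task */

typedef struct {
    ReqKind kind;
    int     caller;              /* task that made the request */
    int     next_ready;          /* task the scheduler would run next */
} SyscallReq;

/* The ISR does not service the request itself; it only selects the
   task that the context switch should land in. */
int isr_select_target(const SyscallReq *r)
{
    switch (r->kind) {
    case REQ_SYSCALL: return SYSCALL_TASK_ID;  /* hand off to handler task */
    case REQ_RETURN:  return r->caller;        /* back to the requester */
    case REQ_YIELD:   return r->next_ready;    /* give up the CPU */
    }
    return r->caller;                          /* unknown kind: resume caller */
}
```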

Though, will probably need to add more special-case handling such that
the Syscall task cannot yield or itself try to make a syscall (the only
valid exit point for this task being where it transfers control back to
the caller and awaits the next syscall to arrive; and it is not valid
for this task to try to syscall back into itself).


Re: Concertina II Progress

<57b4666649236a3e79cd04773a76f7ee@news.novabbs.com>


https://news.novabbs.org/devel/article-flat.php?id=35145&group=comp.arch#35145

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Tue, 21 Nov 2023 22:12:18 +0000
Organization: novaBBS
Message-ID: <57b4666649236a3e79cd04773a76f7ee@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad> <uilskk$2v1d2$1@dont-email.me> <uilvki$2vjld$1@dont-email.me> <74fd95a7bc98b42a4c1c8517ab7cdac8@news.novabbs.com> <uj3380$1rnvb$1@dont-email.me> <5412afba176e6044e28a72965f13ac4a@news.novabbs.com> <uj37t1$1sgg4$1@dont-email.me> <063885f383205c854c2387dcea32ba7a@news.novabbs.com> <ujg54v$c6r4$1@dont-email.me> <ujgrel$h32p$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="1692388"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Rslight-Site: $2y$10$RZEtTjH4IvWBJDdZMfbXbuUyAl1/4opeKYRrsqZ6C40/vy6ZoGwgy
 by: MitchAlsup - Tue, 21 Nov 2023 22:12 UTC

BGB wrote:

> On 11/20/2023 11:31 AM, Stephen Fuld wrote:
>> On 11/15/2023 1:10 PM, MitchAlsup wrote:

> For some ops, the 3rd register (Ro) would instead operate as a 5-bit
> immediate/displacement field. Which was initially a similar idea, with
> the 32-bit space mirroring the 16-bit space.

Almost all My 66000 {1,2,3}-operand instructions can convert a 5-bit register
specifier into a 5-bit immediate of either positive or negative integer
value. This makes::

1<<n
~0<<n
container.bitfield = 7;

single instructions.
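
For reference, the three expressions written out as C. This only shows
the source-level forms; the function names, the 64-bit width, and the
5-bit field width are assumptions, and whether each one compiles to a
single instruction depends on the target and compiler:

```c
#include <stdint.h>

/* 1<<n : the constant 1 fits a small immediate; n must be below 64. */
uint64_t one_shl(unsigned n)
{
    return (uint64_t)1 << n;
}

/* ~0<<n : all-ones (-1 as a signed immediate) shifted left by n. */
uint64_t ones_shl(unsigned n)
{
    return ~(uint64_t)0 << n;
}

/* container.bitfield = 7 : insert a small constant into a bit-field. */
struct Container {
    unsigned bitfield : 5;
    unsigned other    : 27;
};

void set_field(struct Container *c)
{
    c->bitfield = 7;
}
```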

> Where, say:
> Thread: Logical thread of execution within some existing process;

has a register file and a stack.

> Process: Distinct collection of 1 or more threads within a shared
> address space and shared process identity (may have its own address
> space, though as-of-yet, TestKern uses a shared global address space);

has a memory map, a heap, and a vector of threads.

> Task: Supergroup that includes Threads, Processes, and other thread-like
> entities (such as call and method handlers), may be either thread-like
> or process-like.
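
Put together as data structures, that might look like the sketch below.
The field names, the register count, and the tagged-union Task wrapper
are assumptions made for illustration, not anything from TestKern or
My 66000:

```c
#include <stddef.h>
#include <stdint.h>

enum { NUM_REGS = 32 };          /* assumed register-file size */

/* Thread: a register file and a stack. */
typedef struct {
    uint64_t regs[NUM_REGS];
    void    *stack_base;
    size_t   stack_size;
} Thread;

/* Process: a memory map, a heap, and a vector of threads. */
typedef struct {
    void   *memory_map;
    void   *heap;
    Thread *threads;
    size_t  num_threads;
} Process;

/* Task: supergroup covering threads, processes, and thread-like
   handlers (call and method handlers). */
typedef enum { TASK_THREAD, TASK_PROCESS, TASK_HANDLER } TaskKind;

typedef struct {
    TaskKind kind;
    union {
        Thread  *thread;         /* TASK_THREAD and TASK_HANDLER */
        Process *process;        /* TASK_PROCESS */
    } u;
} Task;

/* Number of threads of execution behind a task: process-like tasks
   report their thread vector, thread-like tasks count as one. */
size_t task_thread_count(const Task *t)
{
    if (t->kind == TASK_PROCESS)
        return t->u.process->num_threads;
    return 1;
}
```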

> Where, say, the Syscall interrupt handler doesn't generally handle
> syscalls itself (since the ISRs will only have access to
> physically-mapped addresses), but effectively instead initiates a
> context switch to the task that can handle the request (or, to context
> switch back to the task that made the request, or to yield to another
> task, ...).

We call these things:: dispatchers.

> Though, will need to probably add more special case handling such that
> the Syscall task can not yield or try to itself make a syscall (the only
> valid exit point for this task being where it transfers control back to
> the caller and awaits the next syscall to arrive; and it is not valid
> for this task to try to syscall back into itself).

In My 66000, every <effective> SysCall goes deeper into the privilege
hierarchy. So, Application SysCalls Guest OS, Guest OS SysCalls Guest HV,
Guest HV SysCalls real HV. No data structures need maintenance during
these transitions of the hierarchy.
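
As a toy model of that ordering (this is only the mapping described
above written as C, not My 66000 code; the enum names are made up, and
returning -1 at the bottom level is an assumption, since nothing above
says what a SysCall from the real HV does):

```c
typedef enum {
    LVL_APPLICATION,
    LVL_GUEST_OS,
    LVL_GUEST_HV,
    LVL_REAL_HV
} PrivLevel;

/* Each SysCall lands exactly one level deeper in the hierarchy.
   Returns the target level, or -1 when there is no deeper level. */
int syscall_target(PrivLevel from)
{
    if (from >= LVL_REAL_HV)
        return -1;               /* assumed: nothing below the real HV */
    return (int)from + 1;
}
```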

