Concertina II Progress

<uigus7$1pteb$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34854&group=comp.arch#34854

Path: i2pn2.org!i2pn.org!news.chmurka.net!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Concertina II Progress
Date: Wed, 8 Nov 2023 21:33:59 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <uigus7$1pteb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 8 Nov 2023 21:33:59 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f71880226a02007904df37806b9e2dd7";
logging-data="1897931"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18EKupDanYGlutV2Ae7RlGUTZTcl/YTuXc="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:R8+CGOsXr8VuNsw8n8Dc2splCts=

by: Quadibloc - Wed, 8 Nov 2023 21:33 UTC

Some progress has been made in advancing a small step towards sanity
in the description of the Concertina II architecture described at

http://www.quadibloc.com/arch/ct17int.htm

As Mitch Alsup has rightly noted, I want to have my cake and eat it
too. I want an instruction format that is quick to fetch and decode,
like a RISC format. I want RISC-like banks of 32 registers, and I
want the CISC-like addressing modes of the IBM System/360, but with
16-bit displacements, not 12-bit displacements.

I want memory-reference instructions to still fit in 32 bits, despite
asking for so much more capacity.

So what I had done was, after squeezing as much as I could into a basic
instruction format, I provided for switching into alternate instruction
formats which made different compromises by using the block headers.

This has now been dropped. Since I managed to get the normal (unaligned)
memory-reference instruction squeezed into so much less opcode space that
I also had room for the aligned memory-reference format without compromises
in the basic instruction set, it wasn't needed to have multiple instruction
formats.

I had to change the instructions longer than 32 bits to get them in the
basic instruction format, so now they're less dense.

Block structure is still used, but now for only the two things it's
actually needed for: reserving part of a block as unused for the
pseudo-immediates, and for VLIW features (explicitly indicating
parallelism, and instruction predication).

The ISA is still tremendously complicated, since I've put room in it for
a large assortment of instructions of all kinds, but I think it's
definitely made a significant stride towards sanity.

John Savard

Re: Concertina II Progress

<uihv6r$234ka$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34855&group=comp.arch#34855

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 00:43:27 -0600
Organization: A noiseless patient Spider
Lines: 401
Message-ID: <uihv6r$234ka$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 9 Nov 2023 06:45:47 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6042c65b507cbbda12ce906953e509f6";
logging-data="2200202"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/RjFIoKBThsF7jxwFrKSuY"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:X385dVCdHHlSko/1wazsFBL2VDg=
Content-Language: en-US
In-Reply-To: <uigus7$1pteb$1@dont-email.me>

by: BGB - Thu, 9 Nov 2023 06:43 UTC

On 11/8/2023 3:33 PM, Quadibloc wrote:
> Some progress has been made in advancing a small step towards sanity
> in the description of the Concertina II architecture described at
>
> http://www.quadibloc.com/arch/ct17int.htm
>
> As Mitch Alsup has rightly noted, I want to have my cake and eat it
> too. I want an instruction format that is quick to fetch and decode,
> like a RISC format. I want RISC-like banks of 32 registers, and I
> want the CISC-like addressing modes of the IBM System/360, but with
> 16-bit displacements, not 12-bit displacements.
>

Ironically, I am getting slightly better reach on average with (scaled)
9-bit (and 10-bit) displacements than RISC-V gets with 12 bits...

Say:
  DWORD:
    12s, Unscaled: +/- 2K
    9u, 4B Scale : +   2K
    10s, 4B Scale: +/- 2K (XG2)
  QWORD:
    12s, Unscaled: +/- 2K
    9u, 8B Scale : +   4K
    10s, 8B Scale: +/- 4K (XG2)

It was a pretty tight call between 10s and 10u, but 10s won out by a
slight margin mostly because the majority of structs and stack-frames
tend to be smaller than 4K (but, does create an incentive to use larger
storage formats for on-stack storage).
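
The reach figures above follow directly from the field width, the scale, and whether the field is signed. A minimal sketch of that arithmetic, assuming the displacement is simply multiplied by the access size (the function name `reach` is made up for illustration):

```python
def reach(bits, scale=1, signed=False):
    """Return (lowest, highest) byte offset reachable by a displacement field."""
    if signed:
        lowest = -(1 << (bits - 1)) * scale
        highest = ((1 << (bits - 1)) - 1) * scale
    else:
        lowest = 0
        highest = ((1 << bits) - 1) * scale
    return lowest, highest


# The rows of the comparison: (label, field bits, scale in bytes, signed?)
ROWS = [
    ("DWORD 12s unscaled", 12, 1, True),
    ("DWORD 9u  x4", 9, 4, False),
    ("DWORD 10s x4", 10, 4, True),
    ("QWORD 12s unscaled", 12, 1, True),
    ("QWORD 9u  x8", 9, 8, False),
    ("QWORD 10s x8", 10, 8, True),
]

for label, bits, scale, signed in ROWS:
    lowest, highest = reach(bits, scale, signed)
    print(f"{label:20s} {lowest:6d} .. {highest:5d}")
```

The highest positive offset is always one element short of the round number (2044 rather than 2048 for a 4-byte scale), which is why the table quotes the reach as roughly 2K or 4K.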

Though, for integer immediate instructions, RISC-V would have a slight
advantage. Where, say, roughly 9% of 3R integer immediate values miss
with the existing Imm9u/Imm9n scheme; but the sliver of "Misses with 9
bits, but would hit with 12 bits", is relatively small (most of the
"miss" cases are much larger constants).

However, a fair chunk of these "miss" cases, could be handled with a
bit-set/bit-clear instruction, say:
y=x|0x02000000;
z=x&0xFDFFFFFF;
Turning into, say:
BIS R4, 25, R6
BIC R4, 25, R7

Unclear if this case is quite common enough to justify adding these
instructions though (granted, a case could be made for them).
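
As a rough illustration of how a compiler could spot these cases, here is a sketch that checks whether a 32-bit constant is a single set bit (for OR) or a single clear bit (for AND). The mnemonics BIS and BIC are taken from the example above; the helper names are hypothetical and this is not how any particular compiler does it.

```python
MASK32 = 0xFFFFFFFF


def single_bit_index(value):
    """Return the bit index if exactly one bit of the 32-bit value is set, else None."""
    value &= MASK32
    if value != 0 and value & (value - 1) == 0:
        return value.bit_length() - 1
    return None


def pick_bit_op(op, constant):
    """Map `x | c` or `x & c` to a ('BIS' or 'BIC', bit) pair, or None if it does not apply."""
    if op == "|":
        bit = single_bit_index(constant)
        return ("BIS", bit) if bit is not None else None
    if op == "&":
        # An AND mask qualifies when its complement has exactly one bit set.
        bit = single_bit_index(~constant)
        return ("BIC", bit) if bit is not None else None
    return None


print(pick_bit_op("|", 0x02000000))  # OR mask with only bit 25 set
print(pick_bit_op("&", 0xFDFFFFFF))  # AND mask with only bit 25 clear
```

Bit 25 corresponds to the OR mask 0x02000000 and the AND mask 0xFDFFFFFF; a mask such as 0xFBFFFFFF would clear bit 26 instead.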

However, a few cases do typically need larger displacements:
PC relative, such as branches.
GBR relative, namely constant loads.

For PC relative, 20 bits is "mostly enough", but one program has hit the
20-bit limit (+/- 1MB). Recently, via a tweak, in current forms of the
ISA, the effective branch-displacement limit (for a 32-bit instruction
form) has been increased to 23 bits (+/- 8MB).
Baseline+XGPR: Unconditional BRA and BSR only.
Conditional branches still limited to 20 bits.
XG2: Also includes conditional branches.

In these cases, it was mostly because the bits that were being used to
extend the GPRs to 6 bits were N/A for their original purpose with
branch-ops, and this could be repurposed to the displacement. Main other
alternatives would have been 22 bits + alternate link register, or a
3-bit LR field; however, the cost of supporting this would have been
higher than that of reassigning them simply towards making the
displacement bigger.

Potentially a similar role could have been served by a conjoined "MOV
LR, R1 | BSR Disp" instruction (and/or allowing "MOV LR, R1" in Lane 2
as a special case for this, even if it would not otherwise be allowed
within the ISA rules). Though, would defeat the point if this encoding
foils the branch predictor.

Recently, had ended up adding some Disp11s Compare-with-Zero branches,
mostly as these branches turn out to be useful (in the face of 2-cycle
CMPxx), and 8 bits "wasn't quite enough". Say, Disp11s can cover a much
bigger if/else block or loop body (+/- 2K) than Disp8s (+/- 256B).

For GBR Relative:
The default 9-bit displacement was Byte scaled (for "reasons");
But, a 512B range isn't terribly useful;
Later forms ended up with Disp10u Scaled:
This gives 4K or 8K of range (in Baseline)
This increases to 8K and 16K in XG2.

If the compiler sorts primitive global variables by descending-usage
(and emits the top N specially, at the start of ".data"), then the
Scaled GBR cases can access a majority of the global variables (around
75-80% with a scaled 10-bit displacement).
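
A sketch of the sorting idea, assuming the compiler has a use count per global and that each variable is padded to the scale so a scaled displacement can address it. The function name, the tuple layout, and the padding rule are illustrative assumptions, not a description of the actual compiler.

```python
def layout_hot_globals(globals_by_use, disp_bits=10, scale=4):
    """Place the most-used globals first and report which fall inside the scaled reach.

    globals_by_use: list of (name, size_in_bytes, use_count) tuples (illustrative input).
    Returns (near, far): near is a list of (name, offset), far is a list of names.
    """
    limit = (1 << disp_bits) * scale  # bytes covered by an unsigned scaled displacement
    offset = 0
    near, far = [], []
    for name, size, _uses in sorted(globals_by_use, key=lambda g: -g[2]):
        slot = -(-size // scale) * scale  # round the size up to a multiple of the scale
        if offset + slot <= limit:
            near.append((name, offset))
            offset += slot
        else:
            far.append(name)
    return near, far


demo = [("counter", 4, 900), ("flag", 1, 1500), ("total", 8, 30)]
print(layout_hot_globals(demo))
```

Variables that land in `far` would then need one of the longer addressing forms.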

Effectively, the remaining 20-25% or so need to be handled as one of:
Jumbo Disp33s (if Jumbo prefixes are available, most profiles);
2-op Disp25s (no jumbo, '.data'+'.bss' less than 16MB).
3-op Disp33s (else).
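
The fallback order above can be written as a small selection function. This is only a sketch: it assumes the Disp25s and Disp33s forms take byte offsets, that the short form is the scaled Disp10u, and the returned strings are labels, not real encodings.

```python
def fits_signed(value, bits):
    """True if value fits in a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_unsigned(value, bits):
    """True if value fits in an unsigned field of the given width."""
    return 0 <= value < (1 << bits)


def pick_gbr_form(byte_offset, scale=4, have_jumbo=True):
    """Pick a load/store form for a GBR-relative access (form names are labels only)."""
    if byte_offset % scale == 0 and fits_unsigned(byte_offset // scale, 10):
        return "Disp10u scaled (1 op)"
    if have_jumbo and fits_signed(byte_offset, 33):
        return "Jumbo Disp33s (1 op, 64-bit)"
    if fits_signed(byte_offset, 25):
        return "Disp25s (2 ops)"
    return "Disp33s (3 ops)"


for off in (4092, 4096, 1 << 24):
    print(off, pick_gbr_form(off), "|", pick_gbr_form(off, have_jumbo=False))
```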

Though, as with the stack frames, these instructions do create an
incentive to effectively promote any small global variables to a larger
storage type (such as 'char' or 'short' to 'int'); just with implicit
sign (or zero) extensions to preserve the expected behavior of the
smaller type (though, strictly speaking, only zero-extensions would be
required by the C standard, given signed overflow is technically UB; but
there would be something "deeply wrong" with a 'char' variable being
able to hold, say, -4495213, or similar).

Though, does mean for normal variables, "just use int or similar" is
typically faster (say, because there are dedicated 32-bit sign and zero
extending forms of some of the common ALU ops, but not for 8 or 16 bit
cases).

A Disp16u case could maybe reach 256K or 512K, which could cover much of
a combined data+bss section. While in theory this could be better, to
make effective use of this would require effectively folding much of
".bss" into ".data", which is not such a good thing for the program
loader (as opposed to merely folding the top N most-used variables into
".data").

Then again, uninitialized global arrays could probably still be left in
".bss", which tend to be the main "bulking factor" for this section (as
opposed to normal variables).

> I want memory-reference instructions to still fit in 32 bits, despite
> asking for so much more capacity.
>

Yeah.

If you want a Load/Store to have two 5-bit registers and a 16-bit
displacement, only 6 bits are left in a 32-bit instruction word. This
is not a whole lot...

For a full set of Load/Store ops, this is 4 bits;
For a set of basic ALU ops, this is another 3 bits.

So, just for Load/Store and basic ALU ops, half the encoding space is
gone...

Would it be worth it?...

> So what I had done was, after squeezing as much as I could into a basic
> instruction format, I provided for switching into alternate instruction
> formats which made different compromises by using the block headers.
>
> This has now been dropped. Since I managed to get the normal (unaligned)
> memory-reference instruction squeezed into so much less opcode space that
> I also had room for the aligned memory-reference format without compromises
> in the basic instruction set, it wasn't needed to have multiple instruction
> formats.
>
> I had to change the instructions longer than 32 bits to get them in the
> basic instruction format, so now they're less dense.
>
> Block structure is still used, but now for only the two things it's
> actually needed for: reserving part of a block as unused for the
> pseudo-immediates, and for VLIW features (explicitly indicating
> parallelism, and instruction predication).
>
> The ISA is still tremendously complicated, since I've put room in it for
> a large assortment of instructions of all kinds, but I think it's
> definitely made a significant stride towards sanity.
>

Such is a long standing issue...

I am also annoyed sometimes at how complicated my design has gotten.
Still, it is within reason, and not too far outside the scope of many
existing RISCs.

But, as noted, the reason XG2 exists as-is was sort of a compromise:
I couldn't come up with any encoding which could actually give
everything I wanted, and the "most practical" option was effectively to
dust off an idea I had originally rejected:
Having an alternate encoding which dropped 16-bit ops in favor of
reusing these bits for more GPRs.

At first glance, RISC-V seems cleaner and simpler, but this falls on its
face once one goes outside the scope of RV64IM or similar.

And, it isn't tempting when, at least from my POV, RV64 seems "less
good" than what I have already (others may disagree; but at least to me,
some parts of RISC-V's design seem to me like kind of a trash fire).

The main tempting thing about RV64 is that, maybe, if one goes and
implements RV64GC and clones a bunch of SiFive's hardware interfaces,
then potentially one can run a mainline Linux on it.

There have apparently been some people that have gotten NOMMU Linux
working on RV32IM targets, which is possible (and, ironically, seemingly
basing these on the SuperH branch in the Linux kernel from what I had
seen...).

Seemingly, AMD/Xilinx is jumping over from MicroBlaze to an RV32
variant. But, granted, RV32 isn't too far from what MicroBlaze is
typically used for, so not really a huge stretch.

I sometimes wonder if maybe I would be better off jumping to RV, but
then I end up seeing examples where cores running at somewhat higher
clock speeds still manage to deliver relatively poor framerates in Doom.


Re: Concertina II Progress

<uij9lt$3054t$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=34858&group=comp.arch#34858

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2a0a-a540-1bc1-0-ffd9-420a-30cf-abb.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 18:50:37 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <uij9lt$3054t$1@newsreader4.netcologne.de>
References: <uigus7$1pteb$1@dont-email.me>
Injection-Date: Thu, 9 Nov 2023 18:50:37 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2a0a-a540-1bc1-0-ffd9-420a-30cf-abb.ipv6dyn.netcologne.de:2a0a:a540:1bc1:0:ffd9:420a:30cf:abb";
logging-data="3151005"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Thu, 9 Nov 2023 18:50 UTC

Quadibloc <quadibloc@servername.invalid> wrote:

> As Mitch Alsup has rightly noted, I want to have my cake and eat it
> too. I want an instruction format that is quick to fetch and decode,
> like a RISC format. I want RISC-like banks of 32 registers, and I
> want the CISC-like addressing modes of the IBM System/360, but with
> 16-bit displacements, not 12-bit displacements.

So, r1 = r2 + r3 + offset.

Three registers is 15 bits plus a 16-bit offset, which gives you
31 bits. You're left with one bit of opcode, one for load and
one for store.

The /360 had 12 bits for three registers plus 12 bits of offset, so
24 bits left eight bits for the opcode (the RX format).

So, if you want to do this kind of thing, why not go for a full 32-bit
offset in a second 32-bit word?

[...]

> The ISA is still tremendously complicated, since I've put room in it for
> a large assortment of instructions of all kinds, but I think it's
> definitely made a significant stride towards sanity.

Have you ever written an assembler for your ISA?

Re: Concertina II Progress

<uijjcd$2d9sp$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34859&group=comp.arch#34859

Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 15:36:12 -0600
Organization: A noiseless patient Spider
Lines: 189
Message-ID: <uijjcd$2d9sp$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 9 Nov 2023 21:36:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6b2e97def98605b7eaa8d74daf619e18";
logging-data="2533273"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ac+MnyMAaR/FAsEPBZOny8GLAyspHD3Q="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:y9PFaIYqHYvyuRCERH9aS/FHuj0=
In-Reply-To: <uij9lt$3054t$1@newsreader4.netcologne.de>
Content-Language: en-US

by: BGB-Alt - Thu, 9 Nov 2023 21:36 UTC

On 11/9/2023 12:50 PM, Thomas Koenig wrote:
> Quadibloc <quadibloc@servername.invalid> wrote:
>
>> As Mitch Alsup has rightly noted, I want to have my cake and eat it
>> too. I want an instruction format that is quick to fetch and decode,
>> like a RISC format. I want RISC-like banks of 32 registers, and I
>> want the CISC-like addressing modes of the IBM System/360, but with
>> 16-bit displacements, not 12-bit displacements.
>
> So, r1 = r2 + r3 + offset.
>
> Three registers is 15 bits plus a 16-bit offset, which gives you
> 31 bits. You're left with one bit of opcode, one for load and
> one for store.
>

Oh, that is even worse than I understood it as, namely:
LDx Rd, (Rs, Disp16)
....

But, yeah, 1 bit of opcode clearly wouldn't work...

> The /360 had 12 bits for three registers plus 12 bits of offset, so
> 24 bits left eight bits for the opcode (the RX format).
>
> So, if you want to do this kind of thing, why not go for a full 32-bit
> offset in a second 32-bit word?
>

Originally, I had turned any displacements that didn't fit into 9 bits
into a 2-op sequence:
MOV Imm25s, R0
MOV.x (Rb, R0), Rn

Actually, worse yet, the first form of BJX2 only had 5-bit Load/Store
displacements, but it didn't take long to realize that 5 bits wasn't
really enough (say, when roughly 2/3 of the load and store operations
can't fit in the displacement).

But, now, there are Jumbo-encodings, which can encode a full 33-bit
displacement in a 64-bit encoding. Not everything is perfect though,
mostly because these encodings are bigger and can't be used in a bundle.

But, still "less bad" in this sense than my original 48-bit encodings,
where "for reasons", these couldn't co-exist with bundles in the same
code block.

Despite the loss of 48-bit ops though:
The jumbo encodings give larger displacements (33s vs 24u or 17s);
They reuse the existing 32-bit decoders, rather than needing a dedicated
48-bit decoder.

But, yeah, "use another instruction word" if one needs a larger
displacement, is mostly the option that I would probably recommend.

At first, the 5-bit encodings went away, but later came back as a zombie
of sorts (cases emerged where their existence was still valuable).

But, then it later came up to a tradeoff (with the design of XG2):
Do I expand the Disp9u to Disp10u, and then keep with the XGPR encoding
of using the Disp5u encodings to encode a Disp6s case (for a small range
of negative displacements), or expand Disp9u to Disp10s?...

In this case, Disp10s won out by a small margin, as I needed non-trivial
negative displacements at least slightly more often than I needed 8K for
structs and stack frames and similar.

But, for most things, a 16-bit displacement would be a waste...
If I were going to go the route of using a signed 12-bit displacement
(like RISC-V), would probably still keep it scaled though, as 8K/16K is
still more useful than 2K.

Branch displacements are typically still hard-wired to a 2-byte scale though,
partly as the ISA started out with 16-bit ops, and switching XG2 over to a
4-byte scale would have broken its symmetry with the Baseline ISA.

Though, could pull a cheap trick and repurpose the LSB of branch ops in
XG2, given as-is, it is effectively "Must Be Zero" (all instructions
have a 32-bit alignment in this mode, and branches to an odd address are
not allowed).

So, the idea of a BSR that uses R1 as an alternate Link-Register is
still not (entirely) dead (while at the same time allowing for the
'.text' section to be expanded to 8MB).

There are 64-bit Disp33s and Abs48 branch encodings, but, yeah, they
have costs:
They are 64-bit vs 32-bit, thus, bigger;
Are ignored by the branch predictor, thus, slower;
The Abs48 case is not PC relative
Using it within a program requires a base reloc;
Is generally useful for DLL imports and special cases though (*1).

*1: Its existence is mostly as an alternative in these cases to a more
expensive option:
MOV Addr64, R1
JMP R1
Which needs 128 bits, and is also ignored by the branch predictor.

> [...]
>
>> The ISA is still tremendously complicated, since I've put room in it for
>> a large assortment of instructions of all kinds, but I think it's
>> definitely made a significant stride towards sanity.
>
> Have you ever written an assembler for your ISA?

Yeah, whether someone can write an assembler, or disassembler/emulator,
and not drive themselves insane in the attempt, is possibly a test of
"sanity".

Granted, still not foolproof, as it isn't that bad to write an
assembler/disassembler for x86 either, but trying to decode it in
hardware would be nightmarish.

Best guess I can have would be a "preclassify" stage:
If this is an opcode byte, how long will it be, and will a Mod/RM
follow, ...?
If this is a Mod/RM byte, how many bytes will this add.

Then in theory, one can figure instruction length like:
Fetch OpLen for IP;
Fetch Mod/RM len for IP+OpLen if Mod/RM flag is set;
Add OpLen+ModRmLen.
Add an extra 2/4 bytes if an Immed is present for this opcode.
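
A software sketch of those steps, to make the data flow concrete. The opcode table below is a tiny hand-picked subset for illustration (it is nowhere near a full x86 opcode map), prefixes are ignored, and the Mod/RM helper assumes 32-bit addressing and skips the special case of a SIB byte with no base register.

```python
# Per-opcode preclassification: (opcode length, has Mod/RM, immediate bytes).
# Illustrative subset only; a real table would cover every opcode byte.
OP_INFO = {
    0x90: (1, False, 0),  # NOP
    0x01: (1, True, 0),   # ADD r/m32, r32
    0xB8: (1, False, 4),  # MOV EAX, imm32
    0x81: (1, True, 4),   # group 1: op r/m32, imm32
}


def modrm_extra(modrm):
    """Bytes that follow a Mod/RM byte (SIB and displacement), 32-bit addressing."""
    mod, rm = modrm >> 6, modrm & 7
    extra = 0
    if mod != 3 and rm == 4:
        extra += 1  # SIB byte
    if mod == 1:
        extra += 1  # disp8
    elif mod == 2 or (mod == 0 and rm == 5):
        extra += 4  # disp32
    return extra


def insn_length(code, ip):
    """Instruction length at code[ip], following the preclassify steps."""
    op_len, has_modrm, imm_len = OP_INFO[code[ip]]
    length = op_len
    if has_modrm:
        length += 1 + modrm_extra(code[ip + op_len])
    return length + imm_len


print(insn_length(bytes([0x81, 0x44, 0x24, 0x08, 1, 0, 0, 0]), 0))
```

Even in this reduced form, the length depends on a chain of dependent lookups, which is the part that is painful to do in one cycle in hardware.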

Nicer to not bother.

For my 75 MHz experiment, did end up adding a similar sort of
"preclassify" logic to deal with instruction-lengths though, at the cost
that now L1 I$ cache-lines are specific to the operating mode in which
they were fetched (which now needs to be checked along with the address
and similar).

Mostly all this is a case of "looking up 4 bits of tag metadata" being
less latency than "feed 9 bits of instruction bits through some LUTs"
(or 12 bits if RISC-V decoding is enabled). There is still some latency
due to MUX'ing and similar, but this part is unavoidable.

So, former case:
8 bits: Classify BJX2 instruction length;
1 bit: Specify Baseline or XG2.
Latter case:
8 bits: Classify BJX2 instruction length;
2 bits: Classify RISC-V instruction length (16/32)
2 bits: Specify Baseline, XG2, RISC-V, or XG2RV.

Which map to 4 bits (IIRC):
(0): 16-bit
(1): (WEX && WxE) || Jumbo
(2): WEX
(3): Jumbo

As-is, after MUX'ing, this can effectively turn op-len determination
into a 4 or 6 bit lookup, say (tag bits 1:0 for two adjacent 32-bit words):
00zz: 32-bit
01zz: 16-bit
1000: 64-bit
1001: 48-bit (unused)
1010: 96-bit (*)
1011: Invalid
11zz: Invalid

*: Here, we just assume that the 3rd instruction word is 00.
Would actually need to check this if either 4-wide bundles or 80-bit
encodings were "actually a thing".
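
The 4-bit lookup above is small enough to write out directly. The sketch below assumes the upper two bits of the key are the tag of the first word and the lower two bits are the tag of the following word; that ordering is inferred from the table, and the function name is made up.

```python
def op_length_bits(tag_first, tag_next):
    """Instruction length in bits from the 2-bit tags of two adjacent 32-bit words.

    Returns None for the combinations the table marks as invalid.
    """
    key = ((tag_first & 3) << 2) | (tag_next & 3)
    if key >> 2 == 0b00:
        return 32  # 00zz: the second tag is a don't-care
    if key >> 2 == 0b01:
        return 16  # 01zz
    # 10xx cases depend on the next word; 1011 and 11zz are invalid.
    return {0b1000: 64, 0b1001: 48, 0b1010: 96}.get(key)


for first in range(4):
    for nxt in range(4):
        print(f"{first:02b}{nxt:02b}: {op_length_bits(first, nxt)}")
```

As in the footnote, the 96-bit case here does not check the tag of the third word.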

Where, handling both XG2 and WxE (WEX Enable) in the preclassify step
greatly simplifies the logic during instruction fetch.

This could, in principle, be reduced further in an "XG2 only" core, or to
a lesser extent by eliminating the original XGPR scheme. These are not
currently planned though (say, the first-stage lookup width could be
reduced from 8 to 5 or 7 bits).

....

Re: Concertina II Progress

<uijjgn$2d6t9$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34860&group=comp.arch#34860

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 21:38:31 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 55
Message-ID: <uijjgn$2d6t9$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 9 Nov 2023 21:38:31 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c079aedf6b4bac36d322fcc71f54415d";
logging-data="2530217"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/XiLJ3JTuj/RWOvJlIGmHRbEQriVAI/Bk="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:8KE6XByMzvQ+Js7FCWFXVCxd0BE=

by: Quadibloc - Thu, 9 Nov 2023 21:38 UTC

On Thu, 09 Nov 2023 18:50:37 +0000, Thomas Koenig wrote:

> So, r1 = r2 + r3 + offset.
>
> Three registers is 15 bits plus a 16-bit offset, which gives you 31
> bits. You're left with one bit of opcode, one for load and one for
> store.

Yes, and obviously that isn't enough. So I do have to make some
compromises.

The offset is 16 bits, because the 68000 (and the 8086, and others) had
16-bit offsets!

But the base and index registers are each specified by only 3 bits - only
the destination register gets a 5-bit field.

I need 5 bits for the opcode. That lets me have load and store for four
floating-point types, load, store, unsigned load, and insert for four
integer types (the largest one only uses load and store).

So it is doable! 5 plus 5 plus 3 plus 3 equals 16, so I have 16 bits left
for the offset.
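
To make the bit budget concrete, here is a sketch that packs and unpacks a 32-bit word with those field widths. The field order (opcode in the top bits, offset in the bottom) is an assumption for illustration only; the post gives the widths, not the actual layout.

```python
# Field widths from the post; the ordering is hypothetical.
FIELDS = [("opcode", 5), ("dest", 5), ("index", 3), ("base", 3), ("offset", 16)]
assert sum(width for _, width in FIELDS) == 32


def pack(**values):
    """Pack named field values into one 32-bit word, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a 32-bit word back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


w = pack(opcode=0b10101, dest=31, index=2, base=7, offset=0xBEEF)
print(hex(w), unpack(w))
```

With 3-bit base and index fields, `pack` rejects register numbers above 7, which is the compromise being described: only the destination can name any of the 32 registers.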

But that leaves only 1/4 of the opcode space. Which would be fine for a
conventional RISC design, as that's plenty for the operate instructions.
But I needed to reserve _half_ the opcode space, because I needed another
1/4 of the opcode space for putting two 16-bit instructions in a 32-bit
word for more compact code.

That led me to look for compromises... and I found some that would not
overly impair the effectiveness of the memory reference instructions,
which I discussed previously. I ended up using _both_ of two alternatives
each of which alone would have given me the needed savings in opcode
space... that way, the compromised memory-reference instructions could be
accompanied by another complete set of memory-reference instructions with
_no_ compromise... except for only being able to specify aligned operands.

> The /360 had 12 bits for three registers plus 12 bits of offset, so 24
> bits left eight bits for the opcode (the RX format).

Oh, yes, I remember it well.

> So, if you want to do this kind of thing, why not go for a full 32-bit
> offset in a second 32-bit word?

Because the 360 only took 32 bits for a memory-reference instruction, so
using 64 bits for one is sinfully wasteful!

I want to "have my cake and eat it too" - to have a computer that's just
as good as a PowerPC or a 68000 or a System/360, even though they have
different, incompatible, strengths that conflict with a computer being
able to be good at what each of them is good at simultaneously.

John Savard

Re: Concertina II Progress

<uijjoj$2dc2i$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34861&group=comp.arch#34861

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 21:42:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <uijjoj$2dc2i$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjgn$2d6t9$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 9 Nov 2023 21:42:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c079aedf6b4bac36d322fcc71f54415d";
logging-data="2535506"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+21EIJ3KLEbQZus7fmWcVIUWIyrAoXwDA="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:Mc2JpTbenqeCdbvfhn4vH8gaSb8=

by: Quadibloc - Thu, 9 Nov 2023 21:42 UTC

On Thu, 09 Nov 2023 21:38:31 +0000, Quadibloc wrote:

> I want to "have my cake and eat it too" - to have a computer that's just
> as good as a PowerPC or a 68000 or a System/360, even though they have
> different, incompatible, strengths that conflict with a computer being
> able to be good at what each of them is good at simultaneously.

Actually, it's worse than that, since I also want the virtues of processors
like the TMS320C2000 or the Itanium.

John Savard

Re: Concertina II Progress

<uijk93$2dc2i$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34862&group=comp.arch#34862

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 21:51:31 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 55
Message-ID: <uijk93$2dc2i$2@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 9 Nov 2023 21:51:31 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c079aedf6b4bac36d322fcc71f54415d";
logging-data="2535506"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX183D40KAmslBTl1Q4b9XfZWvgLFfIlUJ3I="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:TVT68k3EpkvCcKzjdD+w47YYiOA=

by: Quadibloc - Thu, 9 Nov 2023 21:51 UTC

On Thu, 09 Nov 2023 15:36:12 -0600, BGB-Alt wrote:
> On 11/9/2023 12:50 PM, Thomas Koenig wrote:

>> So, r1 = r2 + r3 + offset.
>>
>> Three registers is 15 bits plus a 16-bit offset, which gives you 31
>> bits. You're left with one bit of opcode, one for load and one for
>> store.
>>
>>
> Oh, that is even worse than I understood it as, namely:
> LDx Rd, (Rs, Disp16)
> ...
>
> But, yeah, 1 bit of opcode clearly wouldn't work...

And indeed, he is correct, that is what I'm trying to do.

But I easily solve _most_ of the problem.

I just use 3 bits for the index register and the base register.

The 32 general registers aren't _quite_ general. They're divided into
four groups of eight.

16-bit register-to-register instructions use eight bits to specify their
source and destination registers, so both registers must be from the same
group of eight registers.
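
A minimal sketch of that constraint, assuming the eight register bits split into a 2-bit group number plus two 3-bit register numbers within the group. The split is only inferred from "four groups of eight"; the real Concertina II field layout may differ, and the function names are hypothetical.

```python
def encode_reg_pair(src: int, dst: int) -> int:
    """Pack two register numbers (0..31) into 8 bits as gg sss ddd.

    Assumed layout: 2-bit group, 3-bit source, 3-bit destination.
    Raises ValueError if the registers are not in the same group of eight.
    """
    if not (0 <= src < 32 and 0 <= dst < 32):
        raise ValueError("register numbers must be in 0..31")
    if src // 8 != dst // 8:
        raise ValueError("source and destination must share a group of eight")
    group = src // 8
    return (group << 6) | ((src % 8) << 3) | (dst % 8)


def decode_reg_pair(field: int) -> tuple[int, int]:
    """Inverse of encode_reg_pair: return (src, dst) as full register numbers."""
    group = (field >> 6) & 0x3
    return group * 8 + ((field >> 3) & 0x7), group * 8 + (field & 0x7)
```

Under this assumed layout, a pair such as (9, 14) encodes, while (7, 8) is rejected because the two registers fall in different groups.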

This lends itself to writing code where four distinct threads are
interleaved, helping pipelining in implementations too cheap to have
out-of-order execution.

The index register can be one of registers 1 to 7 (0 means no indexing).

The base register can be one of registers 25 to 31. (24, or a 0 in the
three-bit base register field, indicates a special addressing mode.)
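
The field-to-register mapping described here can be sketched as follows. The assumption that base field value b selects register 24 + b is inferred from the remark that field value 0 corresponds to register 24; the function names are hypothetical, and `None` stands in for "no indexing" and "special addressing mode".

```python
def index_register(field: int):
    """3-bit index field: 0 means no indexing, 1..7 select registers 1..7."""
    if not 0 <= field <= 7:
        raise ValueError("index field is three bits")
    return None if field == 0 else field


def base_register(field: int):
    """3-bit base field: 0 selects a special addressing mode (None here),
    1..7 select registers 25..31 (assumed to be 24 + field)."""
    if not 0 <= field <= 7:
        raise ValueError("base field is three bits")
    return None if field == 0 else 24 + field
```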

This is somewhat reminiscent of System/360 coding conventions.

The special addressing modes do stuff like using registers 17 to 23 as
base registers with a 12 bit displacement, so that additional short
segments can be accessed.

As I noted, shaving off two bits each from two fields gives me four more
bits, and five bits is exactly what I need for the opcode field.
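
The bit budget can be checked with a few lines of arithmetic. This is only a sketch: it assumes a 32-bit instruction holding a 5-bit destination register, an index field, a base field, and a 16-bit displacement, with everything left over going to the opcode. The constant and function names are made up for illustration.

```python
WORD_BITS = 32   # instruction width
DEST_BITS = 5    # destination register: any of the 32 registers
DISP_BITS = 16   # displacement


def opcode_bits(index_bits: int, base_bits: int) -> int:
    """Bits left for the opcode after the register and displacement fields."""
    return WORD_BITS - (DEST_BITS + index_bits + base_bits + DISP_BITS)


# Full 5-bit index and base fields leave a single opcode bit.
full = opcode_bits(5, 5)
# Trimming both fields to 3 bits frees four more bits, giving five.
trimmed = opcode_bits(3, 3)
```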

Unfortunately, I needed one more bit, because I also wanted 16-bit
instructions, and they take up too much space. That led me... to some
interesting gyrations, but I finally found a compromise that was
acceptable to me for saving those bits, so acceptable that I could drop
the option of using the block header to switch to using "full" instructions
instead. Finally!

John Savard

Re: Concertina II Progress

<uijlet$2dmqh$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34863&group=comp.arch#34863

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 22:11:41 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <uijlet$2dmqh$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjgn$2d6t9$1@dont-email.me>
<uijjoj$2dc2i$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 9 Nov 2023 22:11:41 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c079aedf6b4bac36d322fcc71f54415d";
logging-data="2546513"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+W8b1zbzEHLOn4rNKT9wxf14FzHi5PHXE="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:OOB39O342eGGEh71GmZmsd2BPwg=

by: Quadibloc - Thu, 9 Nov 2023 22:11 UTC

On Thu, 09 Nov 2023 21:42:43 +0000, Quadibloc wrote:

> On Thu, 09 Nov 2023 21:38:31 +0000, Quadibloc wrote:
>
>> I want to "have my cake and eat it too" - to have a computer that's
>> just as good as a Power PC or a 68000 or a System/360, even though they
>> have different, incompatible, strengths that conflict with a computer
>> being able to be good at what each of them is good at simultaneously.
>
> Actually, it's worse than that, since I also want the virtues of
> processors like the TMS320C2000 or the Itanium.

And don't forget the Cray-I.

So the idea is to have *one* ISA that will serve for...

embedded microcontrollers,
data-base servers,
desktop workstations, and
HPC supercomputers.

Of course, these different tasks will require different implementations,
which focus on doing parts of the ISA well.

John Savard

Re: Concertina II Progress

<uijr5g$2ep8o$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34864&group=comp.arch#34864

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bohannonindustriesllc@gmail.com (BGB-Alt)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 17:49:03 -0600
Organization: A noiseless patient Spider
Lines: 207
Message-ID: <uijr5g$2ep8o$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 9 Nov 2023 23:49:04 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="efb5eeda7acc0ccb1252eabdcfde588b";
logging-data="2581784"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1994dLDXq5F7BP0ks8IFnOrAjhiHdQX1yM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:qJeenanYvhq8v5/BV8wpR95tAd8=
Content-Language: en-US
In-Reply-To: <uijk93$2dc2i$2@dont-email.me>

by: BGB-Alt - Thu, 9 Nov 2023 23:49 UTC

On 11/9/2023 3:51 PM, Quadibloc wrote:
> On Thu, 09 Nov 2023 15:36:12 -0600, BGB-Alt wrote:
>> On 11/9/2023 12:50 PM, Thomas Koenig wrote:
>
>>> So, r1 = r2 + r3 + offset.
>>>
>>> Three registers is 15 bits plus a 16-bit offset, which gives you 31
>>> bits. You're left with one bit of opcode, one for load and one for
>>> store.
>>>
>>>
>> Oh, that is even worse than I understood it as, namely:
>> LDx Rd, (Rs, Disp16)
>> ...
>>
>> But, yeah, 1 bit of opcode clearly wouldn't work...
>
> And indeed, he is correct, that is what I'm trying to do.
>
> But I easily solve _most_ of the problem.
>
> I just use 3 bits for the index register and the base register.
>
> The 32 general registers aren't _quite_ general. They're divided into
> four groups of eight.
>

Errm, splitting up registers like this is likely to hurt far more than
anything that 16-bit displacements are likely to gain.

Unless, maybe, registers were being treated like a stack, but even then,
this is still gonna suck.

Much preferable for a compiler to have a flat space of 32 or 64
registers. Having 16 sorta works, but does still add a bit to spill and
fill.

Theoretically, 32 registers should be "pretty good", but I ended up with
64 partly due to arguable weaknesses in my compiler's register allocation.

Say, 64 makes it possible to statically assign most of the variables in
most of the functions, which avoids the need for spill and fill (at least
with a register allocator that isn't smart enough to locally assign
registers across basic-block boundaries).

I am not sure if a more clever compiler (such as GCC) could also find
ways to make effective use of 64 GPRs.

I guess, IA-64 did have 128 registers in banks of 32. Not sure how well
this worked.

> 16-bit register-to-register instructions use eight bits to specify their
> source and destination registers, so both registers must be from the same
> group of eight registers.
>

When I added R32..R63, I ended up not bothering adding any way to access
them from 16-bit ops.

So:
R0..R15: Generally accessible for all of 16-bit land;
R16..R31: Accessible from a limited subset of 16-bit operations.
R32..R63: Inaccessible from 16-bit land.
Only accessible for an ISA subset for 32-bit ops in XGPR.

Things are more orthogonal in XG2:
No 16-bit ops;
All of the 32-bit ops can access R0..R63 in the same way.

> This lends itself to writing code where four distinct threads are
> interleaved, helping pipelining in implementations too cheap to have
> out-of-order execution.
>

Considered variations on this in my case as well, just with static
control flow.

However, BGBCC is nowhere near clever enough to pull this off...

Best that can be managed is doing this sort of thing manually (this is
sort of how "functions with 100+ local variables" are born).

In theory, a compiler could infer when blocks of code or functions are
not sequentially dependent and inline everything and schedule it in
parallel, but alas, this sort of thing requires a bit of cleverness that
is hard to pull off.

> The index register can be one of registers 1 to 7 (0 means no indexing).
>
> The base register can be one of registers 25 to 31. (24, or a 0 in the
> three-bit base register field, indicates a special addressing mode.)
>
> This sort of is reminiscent of System/360 coding conventions.
>

OK.

> The special addressing modes do stuff like using registers 17 to 23 as
> base registers with a 12 bit displacement, so that additional short
> segments can be accessed.
>
> As I noted, shaving off two bits each from two fields gives me four more
> bits, and five bits is exactly what I need for the opcode field.
>
> Unfortunately, I needed one more bit, because I also wanted 16-bit
> instructions, and they take up too much space. That led me... to some
> interesting gyrations, but I finally found a compromise that was
> acceptable to me for saving those bits, so acceptable that I could drop
> the option of using the block header to switch to using "full" instructions
> instead. Finally!
>

A more straightforward encoding would make things more straightforward...

Main debates I think are, say:
Whether to start with the MSB of each word (what I had often done);
Or, start from the LSB (like RISC-V);
Whether 5 or 6 bit register fields;
How many bits for immediate and opcode fields;
...

Bundling and predication may eat a few bits, say:
00: Scalar
01: Bundle
10/11: If-True / If-False

In my case, this did leave an ugly hack case to support conditional ops
in bundles. Namely, the instruction to "Load 24 bits into R0" has
different interpretations in each case (Scalar: Load 24 bits into R0;
Bundle: Jumbo Prefix; If-True/If-False, repeat a different instruction
block, but understood as both conditional and bundled).

This could be fully orthogonal with 3 bits, but it seems this is a big ask:
000, Unconditional, Scalar
001, Unconditional, Bundle
010, Special, Scalar (Eg: Large constant load or Branch)
011, Special, Bundle (Eg: Jumbo Prefix)
100, If-True, Scalar
101, If-True, Bundle
110, If-False, Scalar
111, If-False, Bundle
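
The 3-bit table above decomposes cleanly into a 2-bit condition class and a 1-bit bundle flag. A minimal sketch of a decoder for it follows; the type and function names are hypothetical, and only the table itself comes from the post.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    condition: str   # "Unconditional", "Special", "If-True" or "If-False"
    bundled: bool    # False = Scalar, True = Bundle


# Upper two bits select the condition class, in table order.
_CONDITIONS = ("Unconditional", "Special", "If-True", "If-False")


def decode_mode(bits3: int) -> Mode:
    """Decode the 3-bit predication/bundling field from the table above."""
    if not 0 <= bits3 <= 7:
        raise ValueError("mode field is three bits")
    return Mode(_CONDITIONS[bits3 >> 1], bool(bits3 & 1))
```

That the low bit alone decides Scalar versus Bundle is what makes this variant orthogonal, in contrast to the 2-bit scheme where conditional-and-bundled needs a special case.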

This leads to a lopsided encoding though, and it seems like things only
really fit together nicely with a limited combination of sizes.

Say, for an immediate field:
24+ 9 => 33s
24+24+16 => 64
This is almost magic...

Though:
26+ 7 => 33s
26+26+12 => 64
Could also work.
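
The sums can be read as "payload bits per prefix word, times the number of prefixes, plus the immediate bits in the base instruction". That reading is an assumption based on the Jumbo Prefix mentioned earlier, and the function name is hypothetical; a quick sketch:

```python
def extended_imm_bits(prefix_payload: int, prefixes: int, base_imm: int) -> int:
    """Total immediate width when `prefixes` prefix words, each carrying
    `prefix_payload` bits, extend a base instruction with `base_imm` bits."""
    return prefix_payload * prefixes + base_imm


# 24-bit prefix payload: 9-bit base immediate -> 33, 16-bit with two prefixes -> 64.
layout_24 = (extended_imm_bits(24, 1, 9), extended_imm_bits(24, 2, 16))
# 26-bit prefix payload: the base immediates shrink to 7 and 12 bits.
layout_26 = (extended_imm_bits(26, 1, 7), extended_imm_bits(26, 2, 12))
```

Both layouts reach the same 33-bit and 64-bit totals; the difference is how many immediate bits remain in an unprefixed instruction.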

But, does end up with an ISA layout where immediate values are mostly 7u
or 7n, which is not nearly as attractive as 9u and 9n.

Say, for Load/Store displacement hit rate (rough approximations, from memory):
5u: 35%
7u: 65%
9u: 90%
....

All turns into a bit of an annoying numbers game sometimes...

But, this ended up as part of why I ended up with XG2, which didn't give
me everything I wanted, and the encodings of some things do have more
"dog chew" than I would like (I would have preferred if everything were
nice contiguous fields, rather than the bits for each register field
being scattered across the instruction word).

But, the numbers added up in a way that worked better than most of the
alternatives I could come up with (and happened to also be the "least
effort" implementation path).

Granted, I still keep half expecting people to be like "Dude, just jump
onto the RISC-V wagon...".

Or, failing this, at least implement enough of RISC-V to be able to run
Linux on it (but, this would require significant architectural changes;
being able to run a "stock" RV64GC Linux build would effectively require
partially cloning a bunch of SiFive's architectural choices or similar;
which is not something I would be happy with).

But, otherwise, pretty much any other option in this area would still
mean a porting effort...

Well, and there is the on-and-off consideration of trying to port a BSD
variant, as BSD seemed like potentially less effort (there are far fewer
implicit assumptions of GNU-related stuff being used).

....

Re: Concertina II Progress

<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34866&group=comp.arch#34866

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 01:11:13 +0000
Organization: novaBBS
Message-ID: <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="403341"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Site: $2y$10$Yot04ah6u5G.09duYqCHb.f/Eiv5CoAiQk1ofxpJf6Jkag9W5rTeK

by: MitchAlsup - Fri, 10 Nov 2023 01:11 UTC

Quadibloc wrote:

> Some progress has been made in advancing a small step towards sanity
> in the description of the Concertina II architecture described at

> http://www.quadibloc.com/arch/ct17int.htm

> As Mitch Alsup has rightly noted, I want to have my cake and eat it
> too. I want an instruction format that is quick to fetch and decode,
> like a RISC format. I want RISC-like banks of 32 registers, and I
> want the CISC-like addressing modes of the IBM System/360, but with
> 16-bit displacements, not 12-bit displacements.
<
My 66000 has all of this.
<
> I want memory-reference instructions to still fit in 32 bits, despite
> asking for so much more capacity.
<
The simple/easy ones definitely, the ones with longer displacements no.
<
> So what I had done was, after squeezing as much as I could into a basic
> instruction format, I provided for switching into alternate instruction
> formats which made different compromises by using the block headers.
<
Block headers are simply consuming entropy.
<
> This has now been dropped. Since I managed to get the normal (unaligned)
> memory-reference instruction squeezed into so much less opcode space that
> I also had room for the aligned memory-reference format without compromises
> in the basic instruction set, it wasn't needed to have multiple instruction
> formats.
<
I never had any aligned memory references. The HW overhead to "fix" the
problem is so small as to be compelling.
<
> I had to change the instructions longer than 32 bits to get them in the
> basic instruction format, so now they're less dense.

> Block structure is still used, but now for only the two things it's
> actually needed for: reserving part of a block as unused for the
> pseudo-immediates, and for VLIW features (explicitly indicating
> parallelism, and instruction predication).

> The ISA is still tremendously complicated, since I've put room in it for
> a large assortment of instructions of all kinds, but I think it's
> definitely made a significant stride towards sanity.
<
Yet, mine remains simple and compact.
<
> John Savard

Re: Concertina II Progress

<uikb5h$2lcq7$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34873&group=comp.arch#34873

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 9 Nov 2023 22:19:48 -0600
Organization: A noiseless patient Spider
Lines: 106
Message-ID: <uikb5h$2lcq7$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 10 Nov 2023 04:22:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b3b72785317a8a803b52477c3548dfe9";
logging-data="2798407"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/d/xKRxljUe6PxvlXSexnF"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:/AhcGZz5L3wXDTbzYMK8kmHi4qk=
In-Reply-To: <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
Content-Language: en-US

by: BGB - Fri, 10 Nov 2023 04:19 UTC

On 11/9/2023 7:11 PM, MitchAlsup wrote:
> Quadibloc wrote:
>

Good to see you are back on here...

>> Some progress has been made in advancing a small step towards sanity
>> in the description of the Concertina II architecture described at
>
>> http://www.quadibloc.com/arch/ct17int.htm
>
>> As Mitch Alsup has rightly noted, I want to have my cake and eat it
>> too. I want an instruction format that is quick to fetch and decode,
>> like a RISC format. I want RISC-like banks of 32 registers, and I
>> want the CISC-like addressing modes of the IBM System/360, but with
>> 16-bit displacements, not 12-bit displacements.
> <
> My 66000 has all of this.
> <
>> I want memory-reference instructions to still fit in 32 bits, despite
>> asking for so much more capacity.
> <
> The simple/easy ones definitely, the ones with longer displacements no.
> <

Yes.

As noted a few times, as I see it, 9 .. 12 is sufficient.
Much less than 9 is "not enough", much more than 12 is wasting entropy,
at least for 32-bit encodings.

12u-scaled would be "pretty good", say, being able to handle 32K for
QWORD ops.
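
As a worked check of the 32K figure: an unsigned displacement scaled by the access size reaches 2^bits times that size. A small sketch, with a hypothetical function name:

```python
def scaled_reach_bytes(disp_bits: int, access_bytes: int) -> int:
    """Bytes addressable by an unsigned displacement of `disp_bits` bits
    that is scaled by the access size."""
    return (1 << disp_bits) * access_bytes


# 12-bit unsigned displacement on 8-byte (QWORD) accesses.
qword_reach = scaled_reach_bytes(12, 8)   # 32768 bytes = 32K
# The same field on byte accesses reaches only 4K.
byte_reach = scaled_reach_bytes(12, 1)
```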

>> So what I had done was, after squeezing as much as I could into a basic
>> instruction format, I provided for switching into alternate instruction
>> formats which made different compromises by using the block headers.
> <
> Block headers are simply consuming entropy.
> <

Also yes.

>> This has now been dropped. Since I managed to get the normal (unaligned)
>> memory-reference instruction squeezed into so much less opcode space that
>> I also had room for the aligned memory-reference format without
>> compromises
>> in the basic instruction set, it wasn't needed to have multiple
>> instruction
>> formats.
> <
> I never had any aligned memory references. The HW overhead to "fix" the
> problem is so small as to be compelling.
> <

In my case, it is only for 128-bit load/store operations, which require
64-bit alignment.

Well, and an esoteric edge case:
if((PC&0xE)==0xE)
You can't use a 96-bit encoding, and will need to insert a NOP if one
needs to do so.
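
That edge case could be handled in an assembler roughly as below. This is a sketch only: the 2-byte NOP size is an assumption (it presumes a mode that has 16-bit instructions), and the function names are hypothetical.

```python
def needs_nop_before(pc: int, length_bits: int) -> bool:
    """True when a 96-bit encoding would start at a PC with (PC & 0xE) == 0xE."""
    return length_bits == 96 and (pc & 0xE) == 0xE


def place_96bit(pc: int, nop_bytes: int = 2) -> int:
    """Return the address at which a 96-bit encoding is placed,
    after inserting one NOP if the starting PC is not allowed."""
    if needs_nop_before(pc, 96):
        pc += nop_bytes   # assumed: a single 16-bit NOP moves past the bad slot
    return pc
```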

One can argue that aligned-only allows for a cheaper L1 D$, but also
"sucks pretty bad" for some tasks:
Fast memcpy;
LZ decompression;
Huffman;
...

>> I had to change the instructions longer than 32 bits to get them in
>> the basic instruction format, so now they're less dense.
>
>> Block structure is still used, but now for only the two things it's
>> actually needed for: reserving part of a block as unused for the
>> pseudo-immediates, and for VLIW features (explicitly indicating
>> parallelism, and instruction predication).
>
>> The ISA is still tremendously complicated, since I've put room in it for
>> a large assortment of instructions of all kinds, but I think it's
>> definitely made a significant stride towards sanity.
> <
> Yet, mine remains simple and compact.
> <

Mostly similar.
Though, I guess some people could debate this in my case.

Granted, I specify the entire ISA in a single location, rather than
spreading it across a bunch of different documents (as was the case with
RISC-V).

Well, and where there is a lot that is left up to the specific hardware
implementations in terms of stuff that one would need to "actually have
an OS run on it", ...

>> John Savard

Re: Concertina II Progress

<uikbng$2lh5f$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34874&group=comp.arch#34874

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 04:31:45 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <uikbng$2lh5f$1@dont-email.me>
References: <uijjoj$2dc2i$1@dont-email.me>
<memo.20231110002917.11928M@jgd.cix.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 10 Nov 2023 04:31:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1a0ac2400ab3a89b8e8874de24426a5c";
logging-data="2802863"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+9AbYO8bhkL0gMIgj2YyANdC2RIB5AzRI="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:iZiho4NKawAPer0hohjSrRkNeqs=

by: Quadibloc - Fri, 10 Nov 2023 04:31 UTC

On Fri, 10 Nov 2023 00:29:00 +0000, John Dallman wrote:

> In article <uijjoj$2dc2i$1@dont-email.me>, quadibloc@servername.invalid
> (Quadibloc) wrote:
>
>> Actually, it's worse than that, since I also want the virtues of
>> processors like the TMS320C2000 or the Itanium.
>
> What do you consider the virtues of Itanium to be?

Well, I think that superscalar operation of microprocessors is a good
thing. Explicitly indicating which instructions may execute in parallel
is one way to facilitate that. Even if the Itanium was an unsuccessful
implementation of that principle.

John Savard

Re: Concertina II Progress

<uikc1s$2lh5f$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34875&group=comp.arch#34875

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 04:37:16 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <uikc1s$2lh5f$2@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 10 Nov 2023 04:37:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1a0ac2400ab3a89b8e8874de24426a5c";
logging-data="2802863"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+X6jRwaHgmOBrRwMnFUaijAEpFM2NGfdo="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:37EeCq/6fIHvufg0dno94YsZlFc=

by: Quadibloc - Fri, 10 Nov 2023 04:37 UTC

On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
> On 11/9/2023 3:51 PM, Quadibloc wrote:

>> The 32 general registers aren't _quite_ general. They're divided into
>> four groups of eight.

> Errm, splitting up registers like this is likely to hurt far more than
> anything that 16-bit displacements are likely to gain.

For 32-bit instructions, the only implication is that the first few
integer registers would be used as index registers, and the last few
would be used as base registers, which is likely to be true in any
case.

It's only in the 16-bit operate instructions that this splitting of
registers is actively present as a constraint. It is needed to make
16-bit operate instructions possible.

So the cure is that if a compiler finds this too much trouble, it
doesn't have to use the 16-bit instructions.

Of course, if compilers can't use them, that raises the question of
whether 16-bit instructions are worth having. Without them, the
complications that I needed to be happy about my memory-reference
instructions could have been entirely avoided.

John Savard

Re: Concertina II Progress

<uikcd2$2lh5f$3@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34876&group=comp.arch#34876

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadibloc@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 04:43:14 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <uikcd2$2lh5f$3@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 10 Nov 2023 04:43:14 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1a0ac2400ab3a89b8e8874de24426a5c";
logging-data="2802863"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18NqETCZL5P4IndLHevCldreExjTpw3N0Y="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:XXcnZI9uusC2ns9RWypjnIidjAs=

by: Quadibloc - Fri, 10 Nov 2023 04:43 UTC

On Fri, 10 Nov 2023 01:11:13 +0000, MitchAlsup wrote:

> I never had any aligned memory references. The HW overhead to "fix" the
> problem is so small as to be compelling.

Since I have a complete set of memory-reference instructions for which
unaligned accesses are supported, the problem isn't that I think
unaligned fetches and stores take too many gates.

Rather, being able to only specify aligned accesses saves *opcode space*,
which lets me fit in one complete set of memory-reference instructions that
can use all the base registers, all the index registers, and always use all
the registers as destination registers.

The unaligned-capable instructions, which also offer important
additional addressing modes, had to accept certain restrictions to fit in.

So they use six out of the seven index registers, they can use only half
the registers as destination registers on indexed accesses, and they use
four out of the seven base registers.

Having 16-bit instructions for the possibility of more compact code meant
that I had to have at least one of the two restrictions noted above -
having both restrictions meant that I could offer the alternative of
aligned-only instructions with neither restriction, which may be far less
painful for some.

John Savard

Re: Concertina II Progress

<uikjp2$2muaq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34877&group=comp.arch#34877

Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 00:46:43 -0600
Organization: A noiseless patient Spider
Lines: 128
Message-ID: <uikjp2$2muaq$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 10 Nov 2023 06:49:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b3b72785317a8a803b52477c3548dfe9";
logging-data="2849114"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18A9xaoshG/spmrIFd6EOmn"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:WDXygZGZoCi54RWWXWm07CpewM8=
In-Reply-To: <uikc1s$2lh5f$2@dont-email.me>
Content-Language: en-US

by: BGB - Fri, 10 Nov 2023 06:46 UTC

On 11/9/2023 10:37 PM, Quadibloc wrote:
> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>> On 11/9/2023 3:51 PM, Quadibloc wrote:
>
>>> The 32 general registers aren't _quite_ general. They're divided into
>>> four groups of eight.
>
>> Errm, splitting up registers like this is likely to hurt far more than
>> anything that 16-bit displacements are likely to gain.
>
> For 32-bit instructions, the only implication is that the first few
> integer registers would be used as index registers, and the last few
> would be used as base registers, which is likely to be true in any
> case.
>
> It's only in the 16-bit operate instructions that this splitting of
> registers is actively present as a constraint. It is needed to make
> 16-bit operate instructions possible.
>

FWIW: I went with 16-bit ops with 4-bit register fields (with a small
subset with 5-bit register fields).

Granted, layout was different than SH:
zzzz-nnnn-mmmm-zzzz //typical SH layout
zzzz-zzzz-nnnn-mmmm //typical BJX2 layout

Where, as noted, typical 32-bit layout in my case is:
111p-ZwZZ-nnnn-mmmm ZZZZ-qnmo-oooo-ZZZZ
And, in XG2:
NMOP-ZwZZ-nnnn-mmmm ZZZZ-qnmo-oooo-ZZZZ

I guess, a "minor" reorganization might yield, say:
PwZZ-ZZZZ-ZZnn-nnnn-mmmm-mmoo-oooo-ZZZZ (3R)
PwZZ-ZZZZ-ZZnn-nnnn-mmmm-mmZZ-ZZZZ-ZZZZ (2R)
PwZZ-ZZZZ-ZZnn-nnnn-mmmm-mmii-iiii-iiii (3RI, Imm10)
PwZZ-ZZZZ-ZZnn-nnnn-ZZZZ-ZZii-iiii-iiii (2RI, Imm10)
PwZZ-ZZZZ-ZZnn-nnnn-iiii-iiii-iiii-iiii (2RI, Imm16)
PwZZ-ZZZZ-iiii-iiii-iiii-iiii-iiii-iiii (Imm24)
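
Reading the patterns above MSB-first (bit 31 on the left), the 3R form puts n in bits 21..16, m in bits 15..10 and o in bits 9..4. A rough field extractor under that reading follows; the bit numbering is an assumption, and the names `bits`, `decode_3r`, `opcode_hi` and `opcode_lo` are hypothetical.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_3r(word: int) -> dict:
    """Split PwZZ-ZZZZ-ZZnn-nnnn-mmmm-mmoo-oooo-ZZZZ into its fields."""
    return {
        "P": bits(word, 31, 31),
        "w": bits(word, 30, 30),
        "opcode_hi": bits(word, 29, 22),   # the eight Z bits after P and w
        "n": bits(word, 21, 16),           # 6-bit register field
        "m": bits(word, 15, 10),           # 6-bit register field
        "o": bits(word, 9, 4),             # 6-bit register field
        "opcode_lo": bits(word, 3, 0),     # the trailing four Z bits
    }
```

Each register field is contiguous here, so the same `bits` helper would also cover the Imm10 (bits 9..0), Imm16 (bits 15..0) and Imm24 (bits 23..0) forms.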

Which seems like actually a relatively nice layout thus far...

Possibly, going further:
Pw00-ZZZZ-ZZnn-nnnn-mmmm-mmoo-oooo-ZZZZ (3R Space)
Pw00-1111-ZZnn-nnnn-mmmm-mmZZ-ZZZZ-ZZZZ (2R Space)

Pw01-ZZZZ-ZZnn-nnnn-mmmm-mmii-iiii-iiii (Ld/St Disp10)

Pw10-0ZZZ-ZZnn-nnnn-mmmm-mmii-iiii-iiii (3RI Imm10, ALU Block)
Pw10-1ZZZ-ZZnn-nnnn-ZZZZ-ZZii-iiii-iiii (2RI Imm10)

Pw11-0ZZZ-ZZnn-nnnn-iiii-iiii-iiii-iiii (2RI, Imm16)

Pw11-1110-iiii-iiii-iiii-iiii-iiii-iiii BRA Disp24s (+/- 32MB)
Pw11-1111-iiii-iiii-iiii-iiii-iiii-iiii BSR Disp24s (+/- 32MB)
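
The +/- 32MB figure is consistent with a signed 24-bit displacement counted in 4-byte units. That scaling is inferred from the stated range, not given explicitly, and whether the displacement is relative to the branch itself or to the following instruction is left open. A sketch with hypothetical names:

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_offset_bytes(disp24: int, scale: int = 4) -> int:
    """Byte offset encoded by a Disp24s field, assuming 4-byte scaling."""
    return sign_extend(disp24, 24) * scale


farthest_back = branch_offset_bytes(0x800000)     # most negative displacement
farthest_forward = branch_offset_bytes(0x7FFFFF)  # most positive displacement
```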

1111-111Z-iiii-iiii-iiii-iiii-iiii-iiii Jumbo

Though, might almost make sense for PrWEX to be N/E, as the PrWEX blocks
seem to be infrequently used in BJX2 (basically, for predicated
instructions that exist as part of an instruction bundle).

Say:
Scalar: 77.3%
WEX : 8.9%
Pred : 13.5%
PrWEX : 0.3%

> So the cure is that if a compiler finds this too much trouble, it
> doesn't have to use the 16-bit instructions.
>
> Of course, if compilers can't use them, that raises the question of
> whether 16-bit instructions are worth having. Without them, the
> complications that I needed to be happy about my memory-reference
> instructions could have been entirely avoided.
>

For performance optimized cases, I am starting to suspect 16-bit ops are
not worth it.

For size optimization, they make sense; but size optimization also means
mostly confining register allocation to R0..R15 in my case, with
heuristics for when to enable additional registers, where enabling the
higher registers effectively hinders the use of 16-bit instructions.

The other option I have found is that, rather than optimizing for
smaller instructions (as in an ISA with 16-bit instructions), one can
instead optimize for doing stuff in as few instructions as is
reasonable, which in turn further goes against the use of 16-bit
instructions.

And, thus far, I am ending up building a lot of my programs in XG2 mode
despite the slightly worse code density (leaving the main "hold outs"
for the Baseline encoding mostly being the kernel and Boot ROM).

The kernel could go over to XG2 without too much issue, mostly leaving
the Boot ROM. Switching over the ROM would require some functional
tweaks (coming out of reset in a different mode), as well as probably
either increasing the size of the ROM or removing some stuff (building
the Boot ROM as-is in XG2 mode would exceed the current 32K limit).

Granted, the main things the ROM contains are a bunch of boot-time sanity
check stuff, a RAM counter, a FAT32 driver, and stuff to init the graphics
module (such as a boot-time ASCII font, *).

*: Though, this font saves some space by only encoding the ASCII-range
characters, and packing the character glyphs into 5*6 pixels (allowing
32-bits, rather than the 64-bits needed for an 8x8 glyph). This won out
aesthetically over using a 7-segment or 14-segment font (as well as it
taking more complex logic to unpack 7 or 14 segment into an 8x8
character cell).
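
To make the 5x6 packing concrete, here is a minimal sketch of expanding such a glyph into an 8x8 cell. The bit order is an assumption (row-major, top row in bits 29..25, leftmost pixel in the highest bit of each row), as is the placement inside the cell; the name `unpack_glyph_5x6` is hypothetical and this is not the actual Boot ROM code.

```c
#include <stdint.h>

/* Expand a 5x6 glyph packed into the low 30 bits of a 32-bit word
   into an 8x8 cell (one byte per row, bit 7 = leftmost pixel).
   Assumed packing: row-major, top row in bits 29..25,
   leftmost pixel in the highest bit of each 5-bit row. */
static void unpack_glyph_5x6(uint32_t glyph, uint8_t cell[8])
{
    int row;
    for (row = 0; row < 8; row++)
        cell[row] = 0;                      /* blank top/bottom rows */
    for (row = 0; row < 6; row++) {
        uint32_t bits = (glyph >> (25 - 5 * row)) & 0x1F;  /* 5 pixels */
        cell[row + 1] = (uint8_t)(bits << 2);   /* columns 1..5 of 8 */
    }
}
```

The unpacking is a shift and a mask per row, which is the sort of logic that stays simpler than decoding a 7-segment or 14-segment description into the same cell.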

Where, say, unlike a CGA or VGA, the initial font is not held in a
hardware ROM. There was one originally, but it was cheaper to manage the
font in software, effectively using the VRAM as a plain color-cell
display in text mode.

....

Re: Concertina II Progress

<4sr3N.17406$AqO5.3263@fx11.iad>

https://news.novabbs.org/devel/article-flat.php?id=34878&group=comp.arch#34878

From: scott@slp53.sl.home (Scott Lurndal)
Subject: Re: Concertina II Progress
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uikc1s$2lh5f$2@dont-email.me>
Message-ID: <4sr3N.17406$AqO5.3263@fx11.iad>
Date: Fri, 10 Nov 2023 14:51:44 GMT

by: Scott Lurndal - Fri, 10 Nov 2023 14:51 UTC

Quadibloc <quadibloc@servername.invalid> writes:
>On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>> On 11/9/2023 3:51 PM, Quadibloc wrote:
>
>>> The 32 general registers aren't _quite_ general. They're divided into
>>> four groups of eight.
>
>> Errm, splitting up registers like this is likely to hurt far more than
>> anything that 16-bit displacements are likely to gain.
>
>For 32-bit instructions, the only implication is that the first few
>integer registers would be used as index registers, and the last few
>would be used as base registers, which is likely to be true in any
>case.

As soon as you make 'general purpose registers' not 'general'
you've significantly complicated register allocation in compilers
and likely caused additional memory accesses due to the need to
spill registers unnecessarily.

Re: Concertina II Progress

<b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34879&group=comp.arch#34879

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 18:22:43 +0000
Message-ID: <b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com> <uikb5h$2lcq7$1@dont-email.me>

by: MitchAlsup - Fri, 10 Nov 2023 18:22 UTC

BGB wrote:

> On 11/9/2023 7:11 PM, MitchAlsup wrote:
>> Quadibloc wrote:
>>

> Good to see you are back on here...

>>> Some progress has been made in advancing a small step towards sanity
>>> in the description of the Concertina II architecture described at
>>
>>> http://www.quadibloc.com/arch/ct17int.htm
>>
>>> As Mitch Alsup has rightly noted, I want to have my cake and eat it
>>> too. I want an instruction format that is quick to fetch and decode,
>>> like a RISC format. I want RISC-like banks of 32 registers, and I
>>> want the CISC-like addressing modes of the IBM System/360, but with
>>> 16-bit displacements, not 12-bit displacements.
>> <
>> My 66000 has all of this.
>> <
>>> I want memory-reference instructions to still fit in 32 bits, despite
>>> asking for so much more capacity.
>> <
>> The simple/easy ones definitely, the ones with longer displacements no.
>> <

> Yes.

> As noted a few times, as I see it, 9 .. 12 is sufficient.
> Much less than 9 is "not enough", much more than 12 is wasting entropy,
> at least for 32-bit encodings.
<
Can you suggest something I could have done by sacrificing 16-bits
down to 12-bits that would have improved "something" in my ISA ??
{{You see I did not have any trouble in having all 16-bits for MEM
references--just like having 16-bits for integer, logical, and branch
offsets.}}
<
> 12u-scaled would be "pretty good", say, being able to handle 32K for
> QWORD ops.
<
IBM 360 found so, EMBench is replete with stack sizes and struct sizes
where My 66000 uses 1×32-bit instruction where RISC-V needs 2×32-bit...
Exactly the difference between 12-bits and 14-bits....

>>> So what I had done was, after squeezing as much as I could into a basic
>>> instruction format, I provided for switching into alternate instruction
>>> formats which made different compromises by using the block headers.
>> <
>> Block headers are simply consuming entropy.
>> <

> Also yes.

>>> This has now been dropped. Since I managed to get the normal (unaligned)
>>> memory-reference instruction squeezed into so much less opcode space that
>>> I also had room for the aligned memory-reference format without
>>> compromises
>>> in the basic instruction set, it wasn't needed to have multiple
>>> instruction
>>> formats.
>> <
>> I never had any aligned memory references. The HW overhead to "fix" the
>> problem is so small as to be compelling.
>> <

> In my case, it is only for 128-bit load/store operations, which require
> 64-bit alignment.
<
VVM does all the wide stuff without necessitating the wide stuff in
registers or instructions.
<
> Well, and an esoteric edge case:
> if((PC&0xE)==0xE)
> You can't use a 96-bit encoding, and will need to insert a NOP if one
> needs to do so.
<
Ehhhhh...
<
> One can argue that aligned-only allows for a cheaper L1 D$, but also
> "sucks pretty bad" for some tasks:
> Fast memcpy;
> LZ decompression;
> Huffman;
> ...
<
Time found that HW can solve the problem way more than adequately--
obviating its inclusion entirely. {Sooner or later Reduced leads RISC}
<

>>> I had to change the instructions longer than 32 bits to get them in
>>> the basic instruction format, so now they're less dense.
>>
>>> Block structure is still used, but now for only the two things it's
>>> actually needed for: reserving part of a block as unused for the
>>> pseudo-immediates, and for VLIW features (explicitly indicating
>>> parallelism, and instruction predication).
>>
>>> The ISA is still tremendously complicated, since I've put room in it for
>>> a large assortment of instructions of all kinds, but I think it's
>>> definitely made a significant stride towards sanity.
>> <
>> Yet, mine remains simple and compact.
>> <

> Mostly similar.
> Though, I guess some people could debate this in my case.

> Granted, I specify the entire ISA in a single location, rather than
> spreading it across a bunch of different documents (as was the case with
> RISC-V).

> Well, and where there is a lot that is left up to the specific hardware
> implementations in terms of stuff that one would need to "actually have
> an OS run on it", ...

>>> John Savard

Re: Concertina II Progress

<uilskk$2v1d2$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34880&group=comp.arch#34880

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 12:24:08 -0600
Message-ID: <uilskk$2v1d2$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
In-Reply-To: <4sr3N.17406$AqO5.3263@fx11.iad>

by: BGB - Fri, 10 Nov 2023 18:24 UTC

On 11/10/2023 8:51 AM, Scott Lurndal wrote:
> Quadibloc <quadibloc@servername.invalid> writes:
>> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>>> On 11/9/2023 3:51 PM, Quadibloc wrote:
>>
>>>> The 32 general registers aren't _quite_ general. They're divided into
>>>> four groups of eight.
>>
>>> Errm, splitting up registers like this is likely to hurt far more than
>>> anything that 16-bit displacements are likely to gain.
>>
>> For 32-bit instructions, the only implication is that the first few
>> integer registers would be used as index registers, and the last few
>> would be used as base registers, which is likely to be true in any
>> case.
>
> As soon as you make 'general purpose registers' not 'general'
> you've significantly complicated register allocation in compilers
> and likely caused additional memory accesses due to the need to
> spill registers unnecessarily.

Yeah.

Either banks of 8, or an 8 data + 8 address, or ... would kinda "rather
suck".

Or, even smaller cases, like, "most instructions can use all the
registers, but these ops only work on a subset" is kind of an annoyance
(this is a big part of why I bothered with the whole XG2 thing).

Much better to have a big flat register space.

Though, within reason.
Say:
* 8: Pain, can barely hold anything in registers.
** One barely has enough for working values for expressions, etc.
* 16: Not quite enough, still lots of spill/fill.
* 32: Can work well, with a good register allocator;
* 64: Can largely eliminate spill/fill, but a little much.
* 128: Too many.
* 256: Absurd.

So, say, 32 and 64 seem to be the "good" area, where with 32, a majority
of the functions can sit comfortably with most or all of their variables
held in registers. But, for functions with a large number of variables
(say, 100 or more), spill/fill becomes an issue (*).

Having 64 allows a majority of functions to use a "static assign
everything" strategy, where spill/fill can be eliminated entirely (apart
from the prolog/epilog sequences), and otherwise seems to deal better
with functions with large numbers of variables.

*: And is more of a pain with a register allocator design which can't
keep any non-static-assigned values in registers across basic-block
boundaries. This issue is, ironically, less obvious with 16 registers
(since spill/fill runs rampant anyways). But having nearly every basic
block start with a blob of stack loads, and end with a blob of stores,
only to reload them all again on the other side of a label, is fairly
obvious.

Having 64 registers does at least mostly hit this nail...

Meanwhile, for 128, there aren't really enough variables and temporaries
in most functions to make effective use of them. Also, 7-bit register
fields won't fit easily into a 32-bit instruction word.

As for register arguments:
* Probably 8 or 16.
** 8 makes the most sense with 32 GPRs.
*** 16 is asking too much.
*** 8 deals with around 98% of functions.
** 16 makes sense with 64 GPRs.
*** Nearly all functions can use exclusively register arguments.
*** Gain is small though, if it only benefits 2% of functions.
*** It is almost a "shoo-in", except for the cost of fixed spill space.
*** 128 bytes at the bottom of every non-leaf stack-frame is noticeable.
*** Though, an ABI could decide to not have a spill space in this way.

Though, admittedly, for a lot of my programs I had still ended up going
with 8 register arguments with 64 GPRs, mostly as the gain from 16
arguments is small relative to the cost of spending an additional 64
bytes in nearly every stack frame (and also there are still some
unresolved bugs when using 16 argument mode).

....

Current leaning is also that:
32-bit primary instruction size;
32/64/96 bit for variable-length instructions;
Is "pretty good".

In performance-oriented use cases, 16-bit encodings "aren't really worth
it".
In cases where you need a 32 or 64 bit value, being able to encode them
or load them quickly into a register is ideal. Spending multiple
instructions to glue a value together isn't ideal, nor is needing to
load it from memory (this particularly sucks from the compiler POV).

As for addressing modes:
(Rb, Disp) : ~ 66-75%
(Rb, Ri) : ~ 25-33%
Can address the vast majority of cases.

Displacements are most effective when scaled by the size of the element
type, as unaligned displacements are exceedingly rare. The vast majority
of displacements are also positive.

Not having a register-indexed mode is shooting oneself in the foot, as
these are "not exactly rare".

Most other possible addressing modes can be mostly ignored.
Auto-increment becomes moot if one has superscalar or VLIW;
(Rb, Ri, Disp) is only really applicable in niche cases
Eg, array inside struct, etc.
...

RISC-V did sort of shoot itself in the foot in several of these areas,
albeit with some workarounds in "Bitmanip":
SHnADD, can mimic a LEA, allowing array access in fewer ops.
PACK, allows an inline 64-bit constant load in 5 instructions...
LUI+ADD+LUI+ADD+PACK
...

Still not ideal...

An extra cycle for memory access is not ideal for a close second place
addressing mode; nor are 64-bit constants rare enough that one
necessarily wants to spend 5 or so clock cycles on them.

But, still better than the situation where one does not have these
instructions.
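
For reference, the LUI+ADD+LUI+ADD+PACK sequence and the SHnADD address calculation can be modeled in plain C as below. This is a sketch, not compiler output: the `rv_*` helpers are hypothetical models of the RV64 instruction semantics as I understand them (LUI sign-extends its 32-bit result, the ADD steps are taken to be ADDI with a signed 12-bit immediate, PACK joins the low 32-bit halves of two registers, SHnADD computes rs2 + (rs1 << n)), and `split32` and `load_const64` are made-up names.

```c
#include <stdint.h>

/* LUI on RV64: rd = sign-extended (imm20 << 12). */
static uint64_t rv_lui(uint32_t imm20)
{
    uint64_t v = (uint64_t)(imm20 & 0xFFFFFu) << 12;
    if (v & 0x80000000u)
        v |= 0xFFFFFFFF00000000ull;
    return v;
}

/* ADDI: rd = rs1 + sign-extended 12-bit immediate (-2048..2047). */
static uint64_t rv_addi(uint64_t rs1, int32_t imm12)
{
    return rs1 + (uint64_t)(int64_t)imm12;
}

/* PACK on RV64: low 32 bits of rs1 in the low half, low 32 of rs2 on top. */
static uint64_t rv_pack(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & 0xFFFFFFFFull) | (rs2 << 32);
}

/* SHnADD: rd = rs2 + (rs1 << n), i.e. base + scaled index. */
static uint64_t rv_shadd(unsigned n, uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << n);
}

/* Split a 32-bit value into LUI/ADDI immediates. The low part is signed,
   so the high part is adjusted to compensate when lo12 is negative. */
static void split32(uint32_t v, uint32_t *hi20, int32_t *lo12)
{
    int32_t lo = (int32_t)(v & 0xFFFu);
    if (lo >= 0x800)
        lo -= 0x1000;
    *lo12 = lo;
    *hi20 = ((v - (uint32_t)lo) >> 12) & 0xFFFFFu;
}

/* Five modeled instructions: LUI, ADDI, LUI, ADDI, PACK. */
static uint64_t load_const64(uint64_t c)
{
    uint32_t hi20;
    int32_t lo12;
    uint64_t t0, t1;

    split32((uint32_t)(c & 0xFFFFFFFFull), &hi20, &lo12);
    t0 = rv_addi(rv_lui(hi20), lo12);       /* low 32 bits of c  */
    split32((uint32_t)(c >> 32), &hi20, &lo12);
    t1 = rv_addi(rv_lui(hi20), lo12);       /* high 32 bits of c */
    return rv_pack(t0, t1);
}
```

The upper halves of t0 and t1 may hold sign-extension garbage after the ADDI step; PACK only looks at the low 32 bits of each input, which is why no extra masking or shifting is needed.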

....

Re: Concertina II Progress

<30c59d005b54669d78ecb0340028c400@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34881&group=comp.arch#34881

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 18:29:56 +0000
Message-ID: <30c59d005b54669d78ecb0340028c400@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uikc1s$2lh5f$2@dont-email.me>

by: MitchAlsup - Fri, 10 Nov 2023 18:29 UTC

Quadibloc wrote:

> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>> On 11/9/2023 3:51 PM, Quadibloc wrote:

>>> The 32 general registers aren't _quite_ general. They're divided into
>>> four groups of eight.

>> Errm, splitting up registers like this is likely to hurt far more than
>> anything that 16-bit displacements are likely to gain.

> For 32-bit instructions, the only implication is that the first few
> integer registers would be used as index registers, and the last few
> would be used as base registers, which is likely to be true in any
> case.

> It's only in the 16-bit operate instructions that this splitting of
> registers is actively present as a constraint. It is needed to make
> 16-bit operate instructions possible.

> So the cure is that if a compiler finds this too much trouble, it
> doesn't have to use the 16-bit instructions.
<
Then why are they there ??
<
I think you will find (like RISC-V is) that having and not mandating use
means you get a bit under ½ of what you think you are getting.
<
> Of course, if compilers can't use them, that raises the question of
> whether 16-bit instructions are worth having. Without them, the
> complications that I needed to be happy about my memory-reference
> instructions could have been entirely avoided.
<
There is a subset of RISC-V designers who want to discard the 16-bit
subset in order to solve the problems of the 32-bit set.
<
I might note: given the space of the compressed ISA in RISC-V, I could
install the entire My 66000 ISA and then not need any of the RISC-V
ISA.....
<
> John Savard

Re: Concertina II Progress

<991895576cac35f060c3dfec992a5efe@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34882&group=comp.arch#34882

From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 18:26:20 +0000
Message-ID: <991895576cac35f060c3dfec992a5efe@news.novabbs.com>
References: <uijjoj$2dc2i$1@dont-email.me> <memo.20231110002917.11928M@jgd.cix.co.uk> <uikbng$2lh5f$1@dont-email.me>

by: MitchAlsup - Fri, 10 Nov 2023 18:26 UTC

Quadibloc wrote:

> On Fri, 10 Nov 2023 00:29:00 +0000, John Dallman wrote:

>> In article <uijjoj$2dc2i$1@dont-email.me>, quadibloc@servername.invalid
>> (Quadibloc) wrote:
>>
>>> Actually, it's worse than that, since I also want the virtues of
>>> processors like the TMS320C2000 or the Itanium.
>>
>> What do you consider the virtues of Itanium to be?

Itanic's main virtue was to consume several Intel design teams, over 20
years, preventing Intel from taking over the entire µprocessor market.

I, personally, don't believe in exposing the scalarity to the compiler,
nor the rotating register file to do what renaming does naturally,
nor the lack of proper FP instructions (FDIV, SQRT), ...

Academic quality at industrial prices.

Re: Concertina II Progress

<uilu1p$2vbev$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34884&group=comp.arch#34884

From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 12:48:10 -0600
Message-ID: <uilu1p$2vbev$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<cb09075f8208771a17611005f8aeb4f3@news.novabbs.com>
<uikb5h$2lcq7$1@dont-email.me>
<b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com>
In-Reply-To: <b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com>

by: BGB - Fri, 10 Nov 2023 18:48 UTC

On 11/10/2023 12:22 PM, MitchAlsup wrote:
> BGB wrote:
>
>> On 11/9/2023 7:11 PM, MitchAlsup wrote:
>>> Quadibloc wrote:
>>>
>
>> Good to see you are back on here...
>
>
>>>> Some progress has been made in advancing a small step towards sanity
>>>> in the description of the Concertina II architecture described at
>>>
>>>> http://www.quadibloc.com/arch/ct17int.htm
>>>
>>>> As Mitch Alsup has rightly noted, I want to have my cake and eat it
>>>> too. I want an instruction format that is quick to fetch and decode,
>>>> like a RISC format. I want RISC-like banks of 32 registers, and I
>>>> want the CISC-like addressing modes of the IBM System/360, but with
>>>> 16-bit displacements, not 12-bit displacements.
>>> <
>>> My 66000 has all of this.
>>> <
>>>> I want memory-reference instructions to still fit in 32 bits, despite
>>>> asking for so much more capacity.
>>> <
>>> The simple/easy ones definitely, the ones with longer displacements no.
>>> <
>
>> Yes.
>
>> As noted a few times, as I see it, 9 .. 12 is sufficient.
>> Much less than 9 is "not enough", much more than 12 is wasting
>> entropy, at least for 32-bit encodings.
> <
> Can you suggest something I could have done by sacrificing 16-bits
> down to 12-bits that would have improved "something" in my ISA ??
> {{You see I did not have any trouble in having all 16-bits for MEM
> references--just like having 16-bits for integer, logical, and branch
> offsets.}}
> <
>> 12u-scaled would be "pretty good", say, being able to handle 32K for
>> QWORD ops.
> <
> IBM 360 found so, EMBench is replete with stack sizes and struct sizes
> where My 66000 uses 1×32-bit instruction where RISC-V needs 2×32-bit...
> Exactly the difference between 12-bits and 14-bits....
>

RISC-V is 12-bit signed unscaled (which can only do +/- 2K).

On average, 12-bit signed unscaled is actually worse than 9-bit unsigned
scaled (4K range, for QWORD).

So, ironically, despite BJX2 having smaller displacements than RISC-V,
it actually deals better with the larger stack frames.

But, if one could address 32K, this should cover the vast majority of
structs and stack-frames.

A 16-bit unsigned scaled displacement would cover 512K for QWORD ops,
which could be nice, but likely unnecessary.
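
The ranges quoted above follow directly from the field widths; a small sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes reachable by an unsigned displacement of 'bits' bits that is
   scaled by the element size: 2^bits elements of elem_bytes each. */
static uint64_t reach_unsigned_scaled(unsigned bits, unsigned elem_bytes)
{
    return ((uint64_t)1 << bits) * elem_bytes;
}

/* Bytes reachable in each direction by a signed, unscaled displacement
   of 'bits' bits: 2^(bits-1). */
static uint64_t reach_signed_unscaled(unsigned bits)
{
    return (uint64_t)1 << (bits - 1);
}

/* Effective address for a scaled displacement: base + disp * elem size. */
static uint64_t ea_scaled(uint64_t base, uint32_t disp, unsigned elem_bytes)
{
    return base + (uint64_t)disp * elem_bytes;
}
```

For QWORD (8-byte) accesses this gives 4K for 9 unsigned scaled bits, 32K for 12, and 512K for 16, against 2K in each direction for 12 signed unscaled bits.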

>>>> So what I had done was, after squeezing as much as I could into a basic
>>>> instruction format, I provided for switching into alternate instruction
>>>> formats which made different compromises by using the block headers.
>>> <
>>> Block headers are simply consuming entropy.
>>> <
>
>> Also yes.
>
>
>>>> This has now been dropped. Since I managed to get the normal
>>>> (unaligned)
>>>> memory-reference instruction squeezed into so much less opcode space
>>>> that
>>>> I also had room for the aligned memory-reference format without
>>>> compromises
>>>> in the basic instruction set, it wasn't needed to have multiple
>>>> instruction
>>>> formats.
>>> <
>>> I never had any aligned memory references. The HW overhead to "fix" the
>>> problem is so small as to be compelling.
>>> <
>
>> In my case, it is only for 128-bit load/store operations, which
>> require 64-bit alignment.
> <
> VVM does all the wide stuff without necessitating the wide stuff in
> registers or instructions.
> <
>> Well, and an esoteric edge case:
>> if((PC&0xE)==0xE)
>> You can't use a 96-bit encoding, and will need to insert a NOP if one
>> needs to do so.
> <
> Ehhhhh...
> <

This is mostly due to a quirk in the L1 I$ design, where "fixing" it
costs more than just being like, "yeah, this case isn't allowed" (and
having the compiler emit a NOP in the rare edge cases it is encountered).
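
On the compiler side, the check is about as small as it sounds. A minimal sketch, assuming the only restricted case is a 96-bit (12-byte) encoding starting where (PC&0xE)==0xE; the function name is hypothetical and this is not the actual BGBCC logic:

```c
#include <stdint.h>

/* Returns nonzero if a NOP should be emitted before an instruction of
   'len_bytes' placed at 'pc', under the restriction described above:
   a 96-bit (12-byte) encoding may not start where (pc & 0xE) == 0xE. */
static int needs_nop_pad(uint64_t pc, unsigned len_bytes)
{
    return len_bytes == 12 && (pc & 0xE) == 0xE;
}
```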

>> One can argue that aligned-only allows for a cheaper L1 D$, but also
>> "sucks pretty bad" for some tasks:
>>    Fast memcpy;
>>    LZ decompression;
>>    Huffman;
>>    ...
> <
> Time found that HW can solve the problem way more than adequately--
> obviating its inclusion entirely. {Sooner or later Reduced leads RISC}
> <
>

Wait, are you arguing for aligned-only memory ops here?...

But, yeah, for me, a major selling point for unaligned access is mostly
that I can copy blocks of memory around like:
v0=((uint64_t *)cs)[0];
v1=((uint64_t *)cs)[1];
v2=((uint64_t *)cs)[2];
v3=((uint64_t *)cs)[3];
((uint64_t *)ct)[0]=v0;
((uint64_t *)ct)[1]=v1;
((uint64_t *)ct)[2]=v2;
((uint64_t *)ct)[3]=v3;
cs+=32; ct+=32;
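
Strictly speaking, the pointer casts above are not portable C (alignment and aliasing rules). A sketch of the same 32-byte copy step written with memcpy-based loads and stores is below; `load_u64`, `store_u64`, and `copy32` are hypothetical names. On targets with cheap unaligned access, compilers usually reduce each fixed-size memcpy to a single load or store, though that is worth checking in the generated code.

```c
#include <stdint.h>
#include <string.h>

/* Load/store 64 bits at any alignment without pointer-cast tricks. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store_u64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Copy 32 bytes from cs to ct; all four loads are done before the
   stores, same as in the cast-based version. */
static void copy32(uint8_t *ct, const uint8_t *cs)
{
    uint64_t v0 = load_u64(cs);
    uint64_t v1 = load_u64(cs + 8);
    uint64_t v2 = load_u64(cs + 16);
    uint64_t v3 = load_u64(cs + 24);
    store_u64(ct, v0);
    store_u64(ct + 8, v1);
    store_u64(ct + 16, v2);
    store_u64(ct + 24, v3);
}
```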

For Huffman, some of the fastest strategies to implement the bitstream
reading/writing, tend to be to casually make use of unaligned access
(shifting in and loading bytes is slower in comparison).
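
As an example of the bitstream case, a bit-peek built on an unaligned 32-bit load might look like the sketch below. The assumptions are mine: bits are packed LSB-first (Deflate-style), the host is little-endian, and at least 4 bytes are readable at the computed offset. `peek_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Peek 'nbits' (1..24) bits at absolute bit position 'bitpos'.
   Assumes LSB-first bit packing, a little-endian host, and at least
   4 readable bytes at buf + (bitpos >> 3). */
static uint32_t peek_bits(const uint8_t *buf, uint64_t bitpos, unsigned nbits)
{
    uint32_t w;
    memcpy(&w, buf + (bitpos >> 3), sizeof w);  /* unaligned 32-bit load */
    return (w >> (bitpos & 7)) & ((1u << nbits) - 1);
}
```

The byte offset is almost never a multiple of 4, so on an aligned-only machine this single load turns into several byte loads and shifts, which is the slower path mentioned above.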

Though, all this falls on its face, if encountering a CPU that uses
traps to emulate unaligned access (apparently a lot of the SiFive cores
and similar).

>>>> I had to change the instructions longer than 32 bits to get them in
>>>> the basic instruction format, so now they're less dense.
>>>
>>>> Block structure is still used, but now for only the two things it's
>>>> actually needed for: reserving part of a block as unused for the
>>>> pseudo-immediates, and for VLIW features (explicitly indicating
>>>> parallelism, and instruction predication).
>>>
>>>> The ISA is still tremendously complicated, since I've put room in it
>>>> for
>>>> a large assortment of instructions of all kinds, but I think it's
>>>> definitely made a significant stride towards sanity.
>>> <
>>> Yet, mine remains simple and compact.
>>> <
>
>> Mostly similar.
>> Though, I guess some people could debate this in my case.
>
>
>> Granted, I specify the entire ISA in a single location, rather than
>> spreading it across a bunch of different documents (as was the case
>> with RISC-V).
>
>> Well, and where there is a lot that is left up to the specific
>> hardware implementations in terms of stuff that one would need to
>> "actually have an OS run on it", ...
>
>
>>>> John Savard

Re: Concertina II Progress

<uilvki$2vjld$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=34885&group=comp.arch#34885

From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 11:17:37 -0800
Message-ID: <uilvki$2vjld$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me> <4sr3N.17406$AqO5.3263@fx11.iad>
<uilskk$2v1d2$1@dont-email.me>
In-Reply-To: <uilskk$2v1d2$1@dont-email.me>

by: Stephen Fuld - Fri, 10 Nov 2023 19:17 UTC

On 11/10/2023 10:24 AM, BGB wrote:
> On 11/10/2023 8:51 AM, Scott Lurndal wrote:
>> Quadibloc <quadibloc@servername.invalid> writes:
>>> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>>>> On 11/9/2023 3:51 PM, Quadibloc wrote:
>>>
>>>>> The 32 general registers aren't _quite_ general. They're divided into
>>>>> four groups of eight.
>>>
>>>> Errm, splitting up registers like this is likely to hurt far more than
>>>> anything that 16-bit displacements are likely to gain.
>>>
>>> For 32-bit instructions, the only implication is that the first few
>>> integer registers would be used as index registers, and the last few
>>> would be used as base registers, which is likely to be true in any
>>> case.
>>
>> As soon as you make 'general purpose registers' not 'general'
>> you've significantly complicated register allocation in compilers
>> and likely caused additional memory accesses due to the need to
>> spill registers unnecessarily.
>
> Yeah.
>
> Either banks of 8, or an 8 data + 8 address, or ... would kinda "rather
> suck".
>
> Or, even smaller cases, like, "most instructions can use all the
> registers, but these ops only work on a subset" is kind of an annoyance
> (this is a big part of why I bothered with the whole XG2 thing).
>
>
> Much better to have a big flat register space.

Yes, but sometimes you just need "another bit" in the instructions. So
an alternative is to break the requirement that all register specifier
fields in the instruction be the same length. So, for example, allow
access to all registers from one source operand position, but say only
half from the other source operand position. So, for a system with 32
registers, you would need 5 plus 5 plus 4 bits. Much of the time, such
as with commutative operations like adds, this doesn't hurt at all.

Yes, this makes register allocation in the compiler harder. And
occasionally you might need an extra instruction to copy a value to the
half size field, but on high end systems, this can be done in the rename
stage without taking an execution slot.
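
A sketch of what this could look like on the encoder side, assuming a made-up 14-bit register layout (5-bit destination, 5-bit first source, 4-bit second source); `encode_regs` and its return convention are hypothetical:

```c
#include <stdint.h>

/* Hypothetical register encoding: 5-bit rd, 5-bit rs1, 4-bit rs2,
   so the second source can only name r0..r15.
   Returns 0 on success, or -1 if the caller must first copy a value
   into r0..r15 (the extra "copy" instruction case). */
static int encode_regs(unsigned rd, unsigned rs1, unsigned rs2,
                       int commutative, uint16_t *out)
{
    if (rs2 > 15) {
        unsigned t;
        if (!commutative || rs1 > 15)
            return -1;          /* swapping does not help here */
        t = rs1;                /* commutative op: swap the sources */
        rs1 = rs2;
        rs2 = t;
    }
    *out = (uint16_t)((rd << 9) | (rs1 << 4) | rs2);
    return 0;
}
```

For commutative operations the swap costs nothing at run time; only the case where both sources live in r16..r31 (or the operation is not commutative) falls back to the extra copy.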

A more extreme alternative is to make the destination field one bit
smaller as well. Of course, this makes things even harder for the
compiler, and probably requires extra "copy" instructions more
frequently, but sometimes you just gotta do what you gotta do. :-(

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Concertina II Progress

<uim9bb$324n9$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=34887&group=comp.arch#34887

From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 22:03:23 -0000 (UTC)
Message-ID: <uim9bb$324n9$1@newsreader4.netcologne.de>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uikc1s$2lh5f$2@dont-email.me>

by: Thomas Koenig - Fri, 10 Nov 2023 22:03 UTC

Quadibloc <quadibloc@servername.invalid> schrieb:
> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>> On 11/9/2023 3:51 PM, Quadibloc wrote:
>
>>> The 32 general registers aren't _quite_ general. They're divided into
>>> four groups of eight.
>
>> Errm, splitting up registers like this is likely to hurt far more than
>> anything that 16-bit displacements are likely to gain.
>
> For 32-bit instructions, the only implication is that the first few
> integer registers would be used as index registers, and the last few
> would be used as base registers, which is likely to be true in any
> case.

This breaks with the central tenet of the /360, the PDP-11,
the VAX, and all RISC architectures: (Almost) all registers are
general-purpose registers.

This would make your ISA very un-S/360-like.

Re: Concertina II Progress

<c9a50eb43dfeef91982b1aea845425cd@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34888&group=comp.arch#34888

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 23:21:08 +0000
Organization: novaBBS
Message-ID: <c9a50eb43dfeef91982b1aea845425cd@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <cb09075f8208771a17611005f8aeb4f3@news.novabbs.com> <uikb5h$2lcq7$1@dont-email.me> <b823b8abcfb22863a70eae7e0283cc39@news.novabbs.com> <uilu1p$2vbev$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="502698"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$YFAQeq/aR59jDQ4EwmDEDuTsHJtW7WqpzH3lLFwxyuAKxm/LQlvdC
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Spam-Level: *

by: MitchAlsup - Fri, 10 Nov 2023 23:21 UTC

BGB wrote:

> On 11/10/2023 12:22 PM, MitchAlsup wrote:
>
>>> One can argue that aligned-only allows for a cheaper L1 D$, but also
>>> "sucks pretty bad" for some tasks:
>>>    Fast memcpy;
>>>    LZ decompression;
>>>    Huffman;
>>>    ...
>> <
>> Time found that HW can solve the problem way more than adequately--
>> obviating its inclusion entirely. {Sooner or later Reduced leads RISC}
>> <
>>

> Wait, are you arguing for aligned-only memory ops here?...
<

No, I am arguing that all memory references are inherently unaligned, but
aligned references never suffer a stall penalty, and the compiler does not
need to know whether a reference is aligned or unaligned.
<
> But, yeah, for me, a major selling point for unaligned access is mostly
> that I can copy blocks of memory around like:
> v0=((uint64_t *)cs)[0];
> v1=((uint64_t *)cs)[1];
> v2=((uint64_t *)cs)[2];
> v3=((uint64_t *)cs)[3];
> ((uint64_t *)ct)[0]=v0;
> ((uint64_t *)ct)[1]=v1;
> ((uint64_t *)ct)[2]=v2;
> ((uint64_t *)ct)[3]=v3;
> cs+=32; ct+=32;
<
MM Rcs,Rct,#length // without the for loop
<
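
For readers who want to try the quoted copy pattern in portable C: casting a possibly misaligned `char *` to `uint64_t *` is undefined behavior in ISO C, so a common workaround is a fixed-size `memcpy`, which compilers typically lower to a single load or store on targets with unaligned access. This is a sketch of that idiom, not the code from the post, and the function name `copy32` is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Copy 32 bytes from cs to ct as four 64-bit loads followed by four
   64-bit stores. Neither pointer needs to be 8-byte aligned. */
static void copy32(unsigned char *ct, const unsigned char *cs)
{
    uint64_t v0, v1, v2, v3;

    /* Loads: each memcpy of constant size 8 is an unaligned 64-bit load. */
    memcpy(&v0, cs +  0, 8);
    memcpy(&v1, cs +  8, 8);
    memcpy(&v2, cs + 16, 8);
    memcpy(&v3, cs + 24, 8);

    /* Stores: same idea in the other direction. */
    memcpy(ct +  0, &v0, 8);
    memcpy(ct +  8, &v1, 8);
    memcpy(ct + 16, &v2, 8);
    memcpy(ct + 24, &v3, 8);
}
```

Because all four loads complete before the first store, a single call also behaves sensibly when the two 32-byte ranges overlap.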
> For Huffman, some of the fastest strategies to implement the bitstream
> reading/writing, tend to be to casually make use of unaligned access
> (shifting in and loading bytes is slower in comparison).

> Though, all this falls on its face, if encountering a CPU that uses
> traps to emulate unaligned access (apparently a lot of the SiFive cores
> and similar).
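
As an illustration of the bitstream point, here is a minimal sketch of an LSB-first bit reader that relies on one unaligned 64-bit load per peek. It assumes a little-endian host and at least 8 readable bytes past the current byte position (for example, a padded input buffer). The names `BitReader`, `peek_bits`, and `skip_bits` are hypothetical and not taken from any particular codec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const unsigned char *buf;  /* input, padded with at least 8 extra bytes */
    size_t bitpos;             /* current position in bits from buf[0] */
} BitReader;

/* Return the next n bits (0 <= n <= 56) without consuming them.
   Assumes a little-endian host so that byte 0 lands in the low bits. */
static uint64_t peek_bits(const BitReader *br, unsigned n)
{
    uint64_t w;
    memcpy(&w, br->buf + (br->bitpos >> 3), 8); /* unaligned 64-bit load */
    w >>= (br->bitpos & 7);                     /* drop bits already used */
    return w & ((UINT64_C(1) << n) - 1);
}

/* Consume n bits. */
static void skip_bits(BitReader *br, unsigned n)
{
    br->bitpos += n;
}
```

A Huffman decoder would typically peek a fixed number of bits, index a lookup table with them, and then skip the actual code length. The byte-at-a-time alternative needs a refill loop and a separate bit buffer.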
<
Traps to perform unaligned accesses are so 1985... Either don't allow them at
all (SIGSEGV) or treat them as first class citizens. The former fails in the market.
<

Re: Concertina II Progress

<0166705bcb25fe905da6138067ebf665@news.novabbs.com>

https://news.novabbs.org/devel/article-flat.php?id=34889&group=comp.arch#34889

Path: i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Fri, 10 Nov 2023 23:25:41 +0000
Organization: novaBBS
Message-ID: <0166705bcb25fe905da6138067ebf665@news.novabbs.com>
References: <uigus7$1pteb$1@dont-email.me> <uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me> <uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me> <uikc1s$2lh5f$2@dont-email.me> <uim9bb$324n9$1@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="503095"; mail-complaints-to="usenet@i2pn2.org";
posting-account="t+lO0yBNO1zGxasPvGSZV1BRu71QKx+JE37DnW+83jQ";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$wx6ZRtdTO6nP1.DRBkbWnO/snA7vo3k3La9IYJwy.tzBSfkgegAzi
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949

by: MitchAlsup - Fri, 10 Nov 2023 23:25 UTC

Thomas Koenig wrote:

> Quadibloc <quadibloc@servername.invalid> schrieb:
>> On Thu, 09 Nov 2023 17:49:03 -0600, BGB-Alt wrote:
>>> On 11/9/2023 3:51 PM, Quadibloc wrote:
>>
>>>> The 32 general registers aren't _quite_ general. They're divided into
>>>> four groups of eight.
>>
>>> Errm, splitting up registers like this is likely to hurt far more than
>>> anything that 16-bit displacements are likely to gain.
>>
>> For 32-bit instructions, the only implication is that the first few
>> integer registers would be used as index registers, and the last few
>> would be used as base registers, which is likely to be true in any
>> case.

> This breaks with the central tenet of the /360, the PDP-11,
> the VAX, and all RISC architectures: (Almost) all registers are
> general-purpose registers.
<
But it follows the S.E.L 32/{...} series and several other minicomputers with
isolated base registers. In the 32/{..} series, there were 2 LDs and 2 STs:
1 LD was byte (signed) with 19-bit displacement
2 LD was size (signed) with the lower bits of displacement specifying size.
3 ST was byte <ibid>
4 ST was size <ibid>
<
Only registers 1-7 could be used as base registers.
<
I saw several others using similar tricks but can't remember.....
<
> This would make your ISA very un-S/360-like.
