Rocksolid Light - comp.arch - Re: How much space did the 68000 registers take up?

Re: bus wars, How much space did the 68000 registers take up?

<c48c688a-5744-453a-a3a2-e3803f0e7958n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33204&group=comp.arch#33204


by: Timothy McCaffrey - Wed, 12 Jul 2023 14:17 UTC

On Wednesday, July 12, 2023 at 1:31:24 AM UTC-4, Anton Ertl wrote:
> Timothy McCaffrey <timca...@aol.com> writes:
> >Also, all those weird indirect modes they
> >added to the '020 probably delayed things (and I suspect they were put in there because
> >of Apple, but I have never confirmed that).
> Seems unlikely to me. Apollo, HP and Sun were using 68000s since
> 1981, while Apple's Lisa was introduced only in 1983 at a high price
> (and it sold few), and the Macintosh was introduced in 1984. The
> 68020 design was completed in the summer of 1983, so why should
> Motorola put the modes in because of Apple? Apple used the 68020 only
> in the Macintosh II in 1987, so it's not as if Apple was particularly
> eager to use the 68020.
>
> Maybe Mitch Alsup knows how the indirect modes were added to the
> 68020. My guess would be that they were added because the VAX had
> them, and the 68020 architects had not heard of the RISC research (or
> did not take it seriously).
>

Your comment about the timing does make it seem unlikely that Apple
had any say, although Lisa/Macintosh development did take a few years.

The reason I kept thinking it was caused by Apple is that the original MacOS
(and maybe the Lisa OS, whatever that was called) used a memory management scheme
where the application had a "handle" to a memory chunk, and needed to
look up the actual memory address pretty much every single time the chunk
was accessed (in practice, there may have been some leeway if you could
ensure you hadn't given up the processor in between accesses). This
allowed the chunk to be moved for garbage collection/allocation purposes.

Many of the indirect memory addressing modes of the '020 would have
the potential to make this easier, and possibly more efficient since it
may have freed up a register or two.
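
As a rough illustration of the handle scheme described above, a handle can be modelled in C as a pointer to a "master pointer" that the allocator rewrites whenever it moves a block. This is only a sketch under stated assumptions: it is not the actual Mac Toolbox API, the names `new_handle`, `dispose_handle` and `compact_heap` are made up, and the tiny fixed-size heap exists just to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_BYTES 256
#define MAX_BLOCKS 8

typedef char **Handle;                 /* a handle points at a master pointer */

static char   heap[HEAP_BYTES];
static char  *master[MAX_BLOCKS];      /* master pointers, rewritten on compaction */
static size_t block_size[MAX_BLOCKS];
static int    in_use[MAX_BLOCKS];
static int    next_slot;               /* slots are never reused, so slot order == address order */
static size_t heap_top;

/* Allocate n bytes at the top of the heap; NULL if out of space or slots. */
Handle new_handle(size_t n) {
    if (next_slot == MAX_BLOCKS || n > HEAP_BYTES - heap_top) return NULL;
    int i = next_slot++;
    master[i] = heap + heap_top;
    block_size[i] = n;
    in_use[i] = 1;
    heap_top += n;
    return &master[i];
}

/* Release a block; its space is reclaimed by the next compaction. */
void dispose_handle(Handle h) {
    int i = (int)(h - master);
    in_use[i] = 0;
    master[i] = NULL;
}

/* Slide live blocks down over the holes and fix up their master pointers. */
void compact_heap(void) {
    size_t dst = 0;
    for (int i = 0; i < next_slot; i++) {
        if (!in_use[i]) continue;
        memmove(heap + dst, master[i], block_size[i]);
        master[i] = heap + dst;
        dst += block_size[i];
    }
    heap_top = dst;
}
```

Every access goes through `*h`, which is the double indirection that a memory-indirect addressing mode could in principle fold into one operand. A raw pointer copied out of `*h` before `compact_heap()` runs may point at stale data afterwards, which is the bug class being described.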

Ironically, I was involved in another product that used a similar scheme in the
same time period. Boy, was it ever bug-prone. When we moved to the 386
and protected mode, the segment registers made all those issues just go away
(well, it flagged a bunch of problems, but then they went away :) ).
I bet something similar would have happened if Apple had used the 8086 ;)
(Which is kinda what happened with Windows....)

- Tim

On 7/11/23 22:23, Timothy McCaffrey wrote:
> On Sunday, July 9, 2023 at 4:49:50 PM UTC-4, MitchAlsup wrote:
>>> I guess a question is if it had been used in the PC instead of x86, if
>>> Motorola could have then made it performance competitive with what later
>>> x86 systems became?...
>> <
>> If 68K was used instead of 8086, PC would have been more like LISA.
>> {which evolved into Macintosh.}
>>>
>
> I think the Atari ST is closer to what the PC would have been capability-wise.
>
>> 68K was out performing 8086 in that era
>> After 486 it was all done.
>
> I always got the impression that Moto management just DID NOT UNDERSTAND
> that they could not just create one product and coast on it for years. Intel management
> (at the time) did seem to understand that, although they wasted a bunch of time
> on the iAPX 432. If you look at the timeline, there was a big pause between 1979 and 1984,
> and by 1984 Motorola was definitely playing catch-up. Also, all those weird indirect modes they
> added to the '020 probably delayed things (and I suspect they were put in there because
> of Apple, but I have never confirmed that).
>
> - Tim

Not sure Motorola did coast on the 68K range. Working in the embedded
space, it was all over the place, in networking, peripherals,
avionics and more. It was the first 16-bit machine you could build serious
products with. That and later derivatives were largely responsible
for the desktop Unix revolution. It's easy to miss just how
significant the 68K series was, even if there were other competitors
such as the Z8000, the 16032 series from Nat Semi and the Texas 9xxx
series.

Motorola took a lot of hints from the DEC PDP-11 and maybe the VAX as well; the
instruction set mnemonics are quite similar to the PDP-11's...

Chris

Timothy McCaffrey <timcaffrey@aol.com> writes:
>The reason I kept thinking it was caused by Apple is the original MacOS
>(and maybe the Lisa OS (whatever that was called)) used a memory management
>where the application had a "handle" to a memory chunk, and needed to
>lookup the actual memory address pretty much every single time the chunk
>was accessed (in practice, there may have been some lee-way if you could
>ensure you hadn't given up the processor in between accesses).

IIRC the original MacOS uses no multi-tasking, so ensuring that was no
problem.

>This
>allowed the chunk to be moved for garbage collection/allocation purposes.
>
>Many of the indirect memory addressing modes of the '020 would have
>the potential to make this easier, and possibly more efficient since it
>may have freed up a register or two.

But MacOS applications were written for the 68000 ISA, which did not
have these addressing modes.

I guess the 68020 architects had some encoding space left, and
wondered what to do with that. Adding indirection probably was cheap
to implement and appeared useful, given that earlier architectures had
such features. Already the 68010 used interruptible (instead of
restartable) instructions (and stack puke), so the indirect modes did
not obviously worsen that aspect. I expect that the 68040 and 68060
designers were less happy about this.

Looking at
<https://archive.computerhistory.org/resources/access/text/2012/04/102658164-05-01-acc.pdf>, I see:

|Gunter: [...] As we went up the chain a little bit to 68010, 68020, we
|created a monster in terms of all the addressing modes that we had. We
|thought that adding more addressing modes was the way you made a
|machine more powerful, totally contrary to the principle of RISC
|later. [...]

One other thing he said:

|Gunter: [...] we did a lot of the design, again too smart for our own
|good, because Pascal was going to be the next language. [...] We put
|a lot of features into the 68000–20 particularly to have modes of
|operation that would underwrite Pascal. I don't think we ever sold
|more than 10,000 to the entire Pascal community.

Although AFAIK Pascal was relevant on Apollo and the Mac in the
beginning.

And another one from him, related to the IBM PC:

|Gunter: We were constantly told that the real decision at IBM was they
|wanted to make sure that we were going to have a competitive product
|from a cost perspective; and the fact that you had the 8088, an 8–bit
|implementation. And we didn’t. That gave you the option to build—IBM
|at that time didn’t have the PC. [...] We refused to do the 8–bit
|version of the 68000. In retrospect, it would take one person less
|than a month to do the conversion. We basically didn't do that until
|after the fact. That would even have fit in a 48–pin package.
|
|House: Well, of course this is all part of the 8086 story as opposed
|to the 68K story but the design team at Intel, they'd done an optional
|8-bit bus, but it was on the die. We never bonded it down, never
|tested it at the time that we proposed that. And the bigger damage was
|that it could use the 8–bit peripherals from the 8085. They were all
|second sourced by AMD at the time and very low priced, which provided
|an advantage. But clearly the 68000, with its register structure and
|addressing modes, had a significant architectural advantage. Our
|feeling was that we had to take a cost approach. I had certainly
|assumed that you had better access to the design engineers, because
|they clearly talked to us about the 68000 multiple times. They were
|looking at it closely.

>Ironically, I was involved in another product that used a similar scheme in the
>same time period. Boy, was it ever bug prone. When we moved to the 386
>and protected mode, the segment registers made all those issues just go away

So you were one of the few users who used segment registers as
envisioned by the 80286 designers?

Earlier I wrote about the 68020 (1984) as equivalent to the i386
(1985), but actually the i386 has a paging MMU built-in, and only the
68030 (1987) included such an MMU. But then very few 386 customers
used the paging in the early years, while I expect that most of the
early 68020s were accompanied by the 68851.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: bus wars, How much space did the 68000 registers take up?

<548d1def-d0f8-4e50-b520-752bb3a0f934n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33207&group=comp.arch#33207


by: Timothy McCaffrey - Wed, 12 Jul 2023 17:27 UTC

On Wednesday, July 12, 2023 at 1:06:23 PM UTC-4, Anton Ertl wrote:
> Timothy McCaffrey <timca...@aol.com> writes:
> IIRC the original MacOS uses no multi-tasking, so ensuring that was no
> problem.

It was a "co-operative multitasking" OS, like early versions of Windows.
However, I remember now, memory chunks could be relocated any time
a new memory chunk was allocated. You had to make sure any raw
pointers to memory chunks were re-loaded any time a memory allocation
call was done. Just think how much fun that would be with modern
languages that do that behind your back...

> >This
> >allowed the chunk to be moved for garbage collection/allocation purposes.
> >
> >Many of the indirect memory addressing modes of the '020 would have
> >the potential to make this easier, and possibly more efficient since it
> >may have freed up a register or two.
> But MacOS applications were written for the 68000 ISA, which did not
> have these addressing modes.
>

Agreed. My point was that the indirect modes would have made things easier for the
programmer if they chose to use them.

[snip]

> >Ironically, I was involved in another product that used a similar scheme in the
> >same time period. Boy, was it ever bug prone. When we moved to the 386
> >and protected mode, the segment registers made all those issues just go away
>
> So you were one of the few users who used segment registers as
> envisioned by the 80286 designers?
>

And so were: users of Windows/286 through ME, CTOS, Xenix 286 (and other Unix/Unix derivatives),
OS/2 and some DOS extenders. I think one of the reasons that 286 protected mode wasn't
used more is the performance impact, up to 50% on our code. If FS & GS had
existed earlier (at least introduced with the 286), segment register thrashing would have been
reduced.

> Earlier I wrote about the 68020 (1984) as equivalent to the i386
> (1985), but actually the i386 has a paging MMU built-in, and only the
> 68030 (1987) included such an MMU. But then very few 386 customers
> used the paging in the early years, while I expect that most of the
> early 68020s were accompanied by the 68851.
>
I think Sun did their own using external logic because the '851 was kind of a dog.

- Tim

Re: bus wars, How much space did the 68000 registers take up?

<fb3f096d-d6f9-4e46-9f1a-3dea7cb001c9n@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33208&group=comp.arch#33208


by: MitchAlsup - Wed, 12 Jul 2023 17:51 UTC

On Wednesday, July 12, 2023 at 12:31:24 AM UTC-5, Anton Ertl wrote:
> Timothy McCaffrey <timca...@aol.com> writes:
> >Also, all those weird indirect modes they
> >added to the '020 probably delayed things (and I suspect they were put in there because
> >of Apple, but I have never confirmed that).
> Seems unlikely to me. Apollo, HP and Sun were using 68000s since
> 1981, while Apple's Lisa was introduced only in 1983 at a high price
> (and it sold few), and the Macintosh was introduced in 1984. The
> 68020 design was completed in the summer of 1983, so why should
> Motorola put the modes in because of Apple? Apple used the 68020 only
> in the Macintosh II in 1987, so it's not as if Apple was particularly
> eager to use the 68020.
<
Bugs and cost. Apple wanted -020s at $20 each. In 1983, when I got there,
Moto's yield was such that selling at Apple's desired cost would put Moto
in the position of wrapping $50-$100 around every chip delivered. Apple
also demanded a warranty on the chips.
<
Then there was the wrangling over second sources........
>
> Maybe Mitch Alsup knows how the indirect modes were added to the
> 68020. My guess would be that they were added because the VAX had
> them, and the 68020 architects had not heard of the RISC research (or
> did not take it seriously).
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: bus wars, How much space did the 68000 registers take up?

<61d07fc7-7bbc-48f3-a615-594cda1b4dacn@googlegroups.com>


https://news.novabbs.org/devel/article-flat.php?id=33209&group=comp.arch#33209


by: MitchAlsup - Wed, 12 Jul 2023 17:58 UTC

On Wednesday, July 12, 2023 at 12:06:23 PM UTC-5, Anton Ertl wrote:
> Timothy McCaffrey <timca...@aol.com> writes:
> >The reason I kept thinking it was caused by Apple is the original MacOS
> >(and maybe the Lisa OS (whatever that was called)) used a memory management
> >where the application had a "handle" to a memory chunk, and needed to
> >lookup the actual memory address pretty much every single time the chunk
> >was accessed (in practice, there may have been some lee-way if you could
> >ensure you hadn't given up the processor in between accesses).
> IIRC the original MacOS uses no multi-tasking, so ensuring that was no
> problem.
> >This
> >allowed the chunk to be moved for garbage collection/allocation purposes.
> >
> >Many of the indirect memory addressing modes of the '020 would have
> >the potential to make this easier, and possibly more efficient since it
> >may have freed up a register or two.
> But MacOS applications were written for the 68000 ISA, which did not
> have these addressing modes.
>
> I guess the 68020 architects had some encoding space left, and
> wondered what to do with that. Adding indirection probably was cheap
> to implement and appeared useful, given that earlier architectures had
> such features. Already the 68010 used interruptible (instead of
> restartable) instructions (and stack puke), so the indirect modes did
> not obviously worsen that aspect. I expect that the 68040 and 68060
> designers were less happy about this.
>
> Looking at
> <https://archive.computerhistory.org/resources/access/text/2012/04/102658164-05-01-acc.pdf>, I see:
>
> |Gunter: [...] As we went up the chain a little bit to 68010, 68020, we
> |created a monster in terms of all the addressing modes that we had. We
> |thought that adding more addressing modes was the way you made a
> |machine more powerful, totally contrary to the principle of RISC
> |later. [...]
>
> One other thing he said:
>
> |Gunter: [...] we did a lot of the design, again too smart for our own
> |good, because Pascal was going to be the next language. [...] We put
> |a lot of features into the 68000–20 particularly to have modes of
> |operation that would underwrite Pascal. I don't think we ever sold
> |more than 10,000 to the entire Pascal community.
>
> Although AFAIK Pascal was relevant on Apollo and the Mac in the
> beginning.
>
> And another one from him, related to the IBM PC:
>
> |Gunter: We were constantly told that the real decision at IBM was they
> |wanted to make sure that we were going to have a competitive product
> |from a cost perspective; and the fact that you had the 8088, an 8–bit
> |implementation. And we didn’t. That gave you the option to build—IBM
> |at that time didn’t have the PC. [...] We refused to do the 8–bit
> |version of the 68000. In retrospect, it would take one person less
> |than a month to do the conversion. We basically didn't do that until
> |after the fact. That would even have fit in a 48–pin package.
> |
> |House: Well, of course this is all part of the 8086 story as opposed
> |to the 68K story but the design team at Intel, they'd done an optional
> |8-bit bus, but it was on the die. We never bonded it down, never
> |tested it at the time that we proposed that. And the bigger damage was
> |that it could use the 8–bit peripherals from the 8085. They were all
> |second sourced by AMD at the time and very low priced, which provided
> |an advantage. But clearly the 68000, with its register structure and
> |addressing modes, had a significant architectural advantage. Our
> |feeling was that we had to take a cost approach. I had certainly
> |assumed that you had better access to the design engineers, because
> |they clearly talked to us about the 68000 multiple times. They were
> |looking at it closely.
>
>
> >Ironically, I was involved in another product that used a similar scheme in the
> >same time period. Boy, was it ever bug prone. When we moved to the 386
> >and protected mode, the segment registers made all those issues just go away
>
> So you were one of the few users who used segment registers as
> envisioned by the 80286 designers?
>
> Earlier I wrote about the 68020 (1984) as equivalent to the i386
> (1985), but actually the i386 has a paging MMU built-in, and only the
> 68030 (1987) included such an MMU. But then very few 386 customers
> used the paging in the early years, while I expect that most of the
> early 68020s were accompanied by the 68851.
<
Sun used the 68K and had their own MMU--an SRAM accessed between DRAM RAS
and CAS supplied the translated CAS address and an interrupt back to the CPU
on access error. This was a much cheaper solution than the -851.
<
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: How much space did the 68000 registers take up?

<u8n9n3$3d8jv$1@dont-email.me>


https://news.novabbs.org/devel/article-flat.php?id=33210&group=comp.arch#33210


by: BGB - Wed, 12 Jul 2023 22:32 UTC

On 7/12/2023 12:55 AM, Thomas Koenig wrote:
> MitchAlsup <MitchAlsup@aol.com> schrieb:
>
>> I chose 64-bit constants because I was doing a 64-bit machine and 64-bit
>> constants are a perfect match. Since the constants are universal, I have
>> easy access to 64-bit displacements--and support Fortran common blocks
>> of any size (up to about 2^62). Fortran programmers can go back and pass
>> everything in common blocks......one can port dusty deck programs change
>> the sizes of array bounds and it all just works.
>
> There is one additional problem (outside the scope of an ISA):
> Integer sizes.
>
> FORTRAN up to 77 (which is what dusty decks are, by definition)
> only had one integer size, which is 32 bits on today's platforms.
> Compilers have switches to set default integer sizes, but these
> switches make the compiler violate the Fortran standard because
> suddenly the size of an INTEGER is twice the size of a REAL,
> which is likely to mess up COMMON-block based storage management
> in dusty deck programs.
>
> When trying to bring an old FORTRAN program into the 64-bit world,
> some decisions and some work will still be required. A significant
> first step would probably be to change the memory management to
> dynamic style (which has only been in the language for ~30 years
> now :-)
>

In C land, we still haven't entirely seen the end of:
  foo(p, q) void *p, *q; { ... }

It is a lot harder to port code when it assumes that
"sizeof(int)==sizeof(void *)", "sizeof(int)==2", ...
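
To make the portability point concrete, here is a small C sketch of the usual fix: round-trip pointers through `uintptr_t` from `<stdint.h>` instead of `int`. The helper names are made up for illustration, and the example assumes a C99-or-later toolchain that provides `uintptr_t` (it is formally optional, though nearly universal).

```c
#include <assert.h>
#include <stdint.h>

/* Reports whether the old "stash a pointer in an int" idiom is even
 * possible on this target; it is false on typical 64-bit ABIs. */
int int_can_hold_pointer(void) {
    return sizeof(int) >= sizeof(void *);
}

/* Portable replacement: uintptr_t is specified to round-trip a void pointer. */
uintptr_t pointer_to_bits(void *p) {
    return (uintptr_t)p;
}

void *bits_to_pointer(uintptr_t bits) {
    return (void *)bits;
}
```

Code that hard-wires `sizeof(int)==2` has no such drop-in fix; it generally needs the fixed-width types (`int16_t` and friends) applied case by case.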

A lot of this code will often need significant modification to port to BJX2.

Well, along with things like:
  unsigned char far *screen;
  screen = MK_FP(0xA000, 0x0000);
  ...

And, my graphics hardware being unable to directly emulate "Mode 13h"
style graphics.
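
For readers who have not met this idiom: in real mode, a far pointer built from a segment and an offset refers to linear address segment * 16 + offset, so segment 0xA000 with offset 0 is 0xA0000, the start of the VGA frame buffer, and Mode 13h is 320x200 at one byte per pixel. A sketch of that arithmetic in portable C (the function names are made up for illustration and are not part of any DOS compiler's library):

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of a real-mode segment:offset pair. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}

/* Byte offset of pixel (x, y) within the 320x200, 8-bit Mode 13h frame buffer. */
uint32_t mode13h_offset(unsigned x, unsigned y) {
    return (uint32_t)y * 320u + x;
}
```

The whole 64000-byte frame fits inside one 64K segment, which is why the old code can index it with a single far pointer.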

Though, did at least at one point think up a possible hack that "could"
be used to support linear-addressed bitmapped graphics modes without too
much modification. Haven't done so yet.

But, yeah, don't know much about FORTRAN, as it wasn't really a language
I have used...

As for constants...

I can also note (compiler stats):
  Disp9 hit rate: 98.15%
  3RI, Imm9 hit rate: 94.47%

  Hit rate, 3RI, Imm9 & Imm33s: 99.44%
  Miss rate: 0.56% (would require a 96-bit JLDI / "Jumbo Load" op).

Constant displacements which exceed 33 bits are basically non-existent
(So, Imm9u+Disp33s: 100%).

Between several programs, it is a similar story (though the exact
percentages are prone to vary).

Say, Imm9 hit rate varies from around ~ 90-98%, ...

Or, IOW: It doesn't particularly seem worth worrying too much about
larger constants, as they are likely to disappear into the noise.
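
Statistics like these come down to asking, per constant, which is the smallest encoding it fits in. A hedged sketch of that classification in C follows; the field widths (unsigned 9-bit, signed 33-bit) are taken from the numbers above, but the function names, the enum, and the exact signedness rules are assumptions for illustration, not the actual BJX2 compiler logic.

```c
#include <assert.h>
#include <stdint.h>

/* True if v fits a two's-complement signed field of `bits` bits (1..63). */
int fits_signed(int64_t v, unsigned bits) {
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

/* True if v fits an unsigned field of `bits` bits (1..62). */
int fits_unsigned(int64_t v, unsigned bits) {
    return v >= 0 && v <= (((int64_t)1 << bits) - 1);
}

enum imm_class { IMM_SHORT9, IMM_JUMBO33, IMM_JUMBO_WIDE };

/* Pick the smallest (hypothetical) encoding class that can hold v. */
enum imm_class classify_imm(int64_t v) {
    if (fits_unsigned(v, 9)) return IMM_SHORT9;
    if (fits_signed(v, 33))  return IMM_JUMBO33;
    return IMM_JUMBO_WIDE;
}
```

Counting how often each class is returned over all constants in a program would give hit rates of the kind quoted above.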

Checking my "cycles used per instruction" stats, the relevant case
'JLDI' is ~ 0.65% of the cycle budget.

Seemingly, its dominant use-case is loading absolute addresses for MMIO
and similar.

The bigger source of overflowed displacements (requiring a jumbo
encoding) is mostly in relation to global variables.

As noted, even with common global variables being sorted by usage
frequency, there are still a lot of global variables that fall well
outside the reach of a 9 or 10 bit GBR displacement (512B or 1K).

If Jumbo is enabled, looks like these cases are mostly being handled
with a 64-bit / Disp33s encoding.

The non-jumbo fallback strategy is loading a 24-bit displacement into
R0, but this will fail if ".data"+".bss" exceeds 16MB. Luckily, any core
that is likely to be running programs big enough for this to matter,
will also be big enough to support jumbo prefixes.

Can assume that ".data"+".bss" is very unlikely to exceed 4GB.
Supporting this would require adding a slower fallback case, say:
  MOV GBR, R4
  MOV Disp48, R0
  ADD R4, R0, R4
  MOV.Q (R4), R5
Would kinda suck though...

And/or allow a special case exception, say:
A Disp57s encoding can be used with LEA.B.
  LEA.B (GBR, Disp57s), R4
  MOV.Q (R4), R5
Mostly by quietly routing the LEA.B operation through the ALU in this
case (and quietly sidestepping the Disp33s limitation for this LEA.B
encoding, which would instead behave more like an ADD instruction).

....

OTOH, I think I just figured out the likely cause of the
"R_AliasDrawModel sometimes gets totally messed up" bug in Quake:
Quake was blowing out the stack something hard (and was going well
outside the 128K stack-size limit...).

Partly it seems to have been a case of the code to fold stack arrays
into temporary heap allocations "wasn't actually fully implemented".
Have gone and implemented more of it now (then stumbled on a few more
implementation holes related to VLA handling in the process, ...).

Technically, it is dealing with large local arrays by flagging them as
VLAs instead (which then use RefArrays built on top of the "alloca"
mechanism which in turn is built on "malloc", ...).
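
At the source level, the transformation described above is roughly equivalent to the following hand-written rewrite. This is only a sketch of the idea in plain C; it is not the compiler's actual output, and `sum_with_heap_array` is a made-up example function.

```c
#include <assert.h>
#include <stdlib.h>

/* Sums 0..n-1 through a scratch array; returns -1 if allocation fails. */
long long sum_with_heap_array(int n) {
    if (n <= 0) return 0;

    /* Originally something like `int scratch[BIG];` on the stack.
     * Here the storage comes from the heap instead. */
    int *scratch = malloc((size_t)n * sizeof *scratch);
    if (scratch == NULL) return -1;

    for (int i = 0; i < n; i++) scratch[i] = i;

    long long total = 0;
    for (int i = 0; i < n; i++) total += scratch[i];

    free(scratch);  /* a compiler doing this automatically frees at scope exit */
    return total;
}
```

The function body is unchanged apart from where the array lives, which is why the rewrite is straightforward for 1D arrays of primitive types and harder once element layout (structs, multiple dimensions) has to be tracked.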

Currently this only works for 1D arrays of primitive types (as opposed
to multidimensional arrays or arrays of structs). This may be a "TODO"
thing (could work for multidimensional arrays, but it looks like the
design of the VLA mechanism falls short of what is needed to deal with
arrays of structs).

But, turns out Quake had a bunch of largish arrays of structs on the
stack that I hadn't noticed (but did go and add a compiler warning for
this stuff...).

Might make sense to add something to properly detect if the user program
has overflowed the stack (vs, say, just having it run off the end and
potentially corrupt memory somewhere else).
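
One possible shape for such a check, sketched in C (all names here are hypothetical, and a byte buffer stands in for the real stack region): a guard word at the low end of the stack catches overflows after the fact, and a limit check catches them before a large frame is set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_MAGIC UINT32_C(0x5AFE57AC)

/* Hypothetical descriptor for a downward-growing stack region. */
struct stack_region {
    uint8_t *low;   /* lowest valid address; the guard word lives here */
    size_t   size;  /* total size in bytes */
};

/* Plant the guard word at the low end of the region. */
void stack_guard_init(struct stack_region *s)
{
    assert(s->size >= sizeof(uint32_t));
    uint32_t magic = GUARD_MAGIC;
    memcpy(s->low, &magic, sizeof magic);
}

/* After-the-fact check: has something scribbled over the guard word? */
bool stack_guard_intact(const struct stack_region *s)
{
    uint32_t v;
    memcpy(&v, s->low, sizeof v);
    return v == GUARD_MAGIC;
}

/* Up-front check: would a frame of 'need' bytes below 'sp' still fit? */
bool stack_has_room(const struct stack_region *s, const uint8_t *sp, size_t need)
{
    if (sp < s->low || sp > s->low + s->size)
        return false;                       /* SP already outside the region */
    size_t avail = (size_t)(sp - s->low);   /* bytes between SP and the low end */
    return avail >= sizeof(uint32_t) && need <= avail - sizeof(uint32_t);
}
```

The guard word only notices overflows that actually write to the bottom of the stack; a frame that jumps clean over it would need the up-front check (or an unmapped guard page).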

....

Re: How much space did the 68000 registers take up?

<7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33211&group=comp.arch#33211

X-Received: by 2002:a05:6214:4a48:b0:635:e5f2:4ecc with SMTP id ph8-20020a0562144a4800b00635e5f24eccmr77107qvb.5.1689202387013;
Wed, 12 Jul 2023 15:53:07 -0700 (PDT)
X-Received: by 2002:a9d:6219:0:b0:6b9:513:e364 with SMTP id
g25-20020a9d6219000000b006b90513e364mr26451otj.1.1689202386715; Wed, 12 Jul
2023 15:53:06 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 12 Jul 2023 15:53:06 -0700 (PDT)
In-Reply-To: <u8n9n3$3d8jv$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6c3a:4c21:b5f4:f0a8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6c3a:4c21:b5f4:f0a8
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<a27529b5-fa8e-4bb8-a1f3-2b285fdf3e7cn@googlegroups.com> <2023Jul9.173411@mips.complang.tuwien.ac.at>
<u8eth9$26qad$1@dont-email.me> <92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me> <5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me> <6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me> <bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me> <bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me> <b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com> <u8lf80$hh9l$1@newsreader4.netcologne.de>
<u8n9n3$3d8jv$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
Subject: Re: How much space did the 68000 registers take up?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Wed, 12 Jul 2023 22:53:07 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: MitchAlsup - Wed, 12 Jul 2023 22:53 UTC

On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
> On 7/12/2023 12:55 AM, Thomas Koenig wrote:
> > MitchAlsup <Mitch...@aol.com> schrieb:
> >
> >> I chose 64-bit constants because I was doing a 64-bit machine and 64-bit
> >> constants are a perfect match. Since the constants are universal, I have
> >> easy access to 64-bit displacements--and support Fortran common blocks
> >> of any size (up to about 2^62). Fortran programmers can go back and pass
> >> everything in common blocks......one can port dusty deck programs change
> >> the sizes of array bounds and it all just works.
> >
> > There is one additional problem (outside the scope of an ISA):
> > Integer sizes.
> >
> > FORTRAN up to 77 (which is what dusty decks are, by definition)
> > only had one integer size, which is 32 bits on today's platforms.
> > Compilers have switches to set default integer sizes, but these
> > switches make the compiler violate the Fortran standard because
> > suddenly the size of an INTEGER is twice the size of a REAL,
> > which is likely to mess up COMMON-block based storage management
> > in dusty deck programs.
> >
> > When trying to bring an old FORTRAN program into the 64-bit world,
> > some decisions and some work will still be required. A significant
> > first step would probably be to change the memory management to
> > dynamic style (which has only been in the language for ~30 years
> > now :-)
> >
>
>
> In C land, we still haven't entirely seen the end of:
> foo(p, q) void *p, *q; { ... }
<
A useful reason for AI programming machines.........
>
> It is a lot harder to port code when it assumes that
> "sizeof(int)==sizeof(void *)", "sizeof(int)==2", ...
<
Another
>
> A lot of this code will often need significant modification to port to BJX2.
>
> Well, along with things like:
> unsigned char * far screen;
> screen = MK_FP(0xA000, 0x0000);
> ...
>
> And, my graphics hardware being unable to directly emulate "Mode 13h"
> style graphics.
>
> Though, did at least at one point think up a possible hack that "could"
> be used to support linear-addressed bitmapped graphics modes without too
> much modification. Haven't done so yet.
>
>
>
>
> But, yeah, don't know much about FORTRAN, as it wasn't really a language
> I have used...
>
>
>
>
> As for constants...
>
> I can also note (compiler stats):
> Disp9 hit rate: 98.15%
> 3RI, Imm9 hit rate: 94.47%
<
It is not the hit rate that matters, it is how bad the hassle is to get to 100%
>
> Hit rate, 3RI, Imm9 & Imm33s: 99.44%
> Miss Rate: 0.56% (would require a 96-bit JLDI / "Jumbo Load" op).
>
> Constant displacements which exceed 33 bits are basically non-existent
> (So, Imm9u+Disp33s: 100%).
<
What about 50 years from now, Windows will be larger than a 48-bit address
space by then.
>
> Between several programs, it is a similar story (though the exact
> percentages are prone to vary).
>
> Say, Imm9 hit rate varies from around ~ 90-98%, ...
>
>
>
> Or, IOW: It doesn't particularly seem worth worrying too much about
> larger constants, as they are likely to disappear into the noise.
<
For now, and especially for the things you are interested in;
What about 50 years from now ??
>
> Checking my "cycles used per instruction" stats, the relevant case
> 'JLDI' is ~ 0.65% of the cycle budget.
>
> Seemingly, its dominant use-case is loading absolute addresses for MMIO
> and similar.
>
>
> The bigger source of overflowed displacements (requiring a jumbo
> encoding) is mostly in relation to global variables.
<
One of my original goals, and one I have not ever lost sight of is
how does one present a 64-bit VAS to software. That is:: the Compiler,
the Linker, and the OS can put anything anywhere they want to put it.
My 66000 supplies the tools, they get to use them however they want.
>
> As noted, even with common global variables being sorted by usage
> frequency, there are still a lot of global variables that fall well
> outside the reach of a 9 or 10 bit GBR displacement (512B or 1K).
>
> If Jumbo is enabled, looks like these cases are mostly being handled
> with a 64-bit / Disp33s encoding.
>
> The non-jumbo fallback strategy is loading a 24-bit displacement into
> R0, but this will fail if ".data"+".bss" exceeds 16MB. Luckily, any core
> that is likely to be running programs big enough for this to matter,
> will also be big enough to support jumbo prefixes.
>
What if .data+.bss+.rodata is bigger than 48-bits ?
>
> Can assume that ".data"+".bss" is very unlikely to exceed 4GB.
<
For 99% of programs, yes; but what about the truly large stadium
sized machines with 16,000 processors, 32EB of DRAM, running
shared database applications in cache coherent memory over CXL
extension to PCIe ??

BGB <cr88192@gmail.com> writes:
>On 7/12/2023 12:55 AM, Thomas Koenig wrote:
>> MitchAlsup <MitchAlsup@aol.com> schrieb:
>>
>>> I chose 64-bit constants because I was doing a 64-bit machine and 64-bit
>>> constants are a perfect match. Since the constants are universal, I have
>>> easy access to 64-bit displacements--and support Fortran common blocks
>>> of any size (up to about 2^62). Fortran programmers can go back and pass
>>> everything in common blocks......one can port dusty deck programs change
>>> the sizes of array bounds and it all just works.
>>
>> There is one additional problem (outside the scope of an ISA):
>> Integer sizes.
>>
>> FORTRAN up to 77 (which is what dusty decks are, by definition)
>> only had one integer size, which is 32 bits on today's platforms.
>> Compilers have switches to set default integer sizes, but these
>> switches make the compiler violate the Fortran standard because
>> suddenly the size of an INTEGER is twice the size of a REAL,
>> which is likely to mess up COMMON-block based storage management
>> in dusty deck programs.
>>
>> When trying to bring an old FORTRAN program into the 64-bit world,
>> some decisions and some work will still be required. A significant
>> first step would probably be to change the memory management to
>> dynamic style (which has only been in the language for ~30 years
>> now :-)
>>
>
>
>In C land, we still haven't entirely seen the end of:
>foo(p, q) void *p, *q; { ... }

Frankly, I haven't seen K&R C style declarations or code for
decades. And iirc, there were several tools to help
convert to ANSI-style declarations back in that timeframe.

I did recently try compiling the version 6 Unix C compiler
with gcc; it wouldn't compile, even in the most lenient mode.

>
>It is a lot harder to port code when it assumes that
>"sizeof(int)==sizeof(void *)", "sizeof(int)==2", ...

SVR4 (circa 1990) introduced a number of synthetic typedefs,
primarily to ameliorate such issues (along with guidance to use
sizeof() on a variable, not the underlying type). Things like
pid_t, uid_t, gid_t were all changed from the 16-bit used in SVR3
to 32-bit in SVR4. Lots of code broke. Using a typedef just requires
a simple recompile (and the kernel can continue to support legacy
binaries so long as no GID/UID is assigned to that process that exceeds
16 bits).
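
A small C sketch of that guidance (the typedef and function names are hypothetical stand-ins, not the real SVR4 definitions): code that takes sizeof on the variable keeps working when the typedef is widened and the program is recompiled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; imagine this was int16_t before the change. */
typedef int32_t demo_uid_t;

/* Copy a uid into a byte buffer; returns the number of bytes written. */
size_t store_uid(unsigned char *buf, demo_uid_t uid)
{
    assert(buf != NULL);
    memcpy(buf, &uid, sizeof uid);   /* sizeof the variable, not "short" or "int" */
    return sizeof uid;
}

/* Read it back using the same rule. */
demo_uid_t load_uid(const unsigned char *buf)
{
    demo_uid_t uid;
    assert(buf != NULL);
    memcpy(&uid, buf, sizeof uid);
    return uid;
}
```

Code that had hard-coded sizeof(short) here would silently truncate any ID above 16 bits.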

Re: How much space did the 68000 registers take up?

<u8nhu0$3du57$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33213&group=comp.arch#33213

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: How much space did the 68000 registers take up?
Date: Wed, 12 Jul 2023 19:53:07 -0500
Organization: A noiseless patient Spider
Lines: 325
Message-ID: <u8nhu0$3du57$1@dont-email.me>
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<a27529b5-fa8e-4bb8-a1f3-2b285fdf3e7cn@googlegroups.com>
<2023Jul9.173411@mips.complang.tuwien.ac.at> <u8eth9$26qad$1@dont-email.me>
<92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me>
<5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me>
<6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me>
<bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me>
<bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me>
<b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com>
<u8lf80$hh9l$1@newsreader4.netcologne.de> <u8n9n3$3d8jv$1@dont-email.me>
<7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 13 Jul 2023 00:53:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d20b2869af2f8a4407bb2e819b7f51f5";
logging-data="3602599"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180F0h1D8MpoRAsSQAGyRcX"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:KvTGN7wbLDr6/AKF+jzZScHLjJk=
In-Reply-To: <7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
Content-Language: en-US

by: BGB - Thu, 13 Jul 2023 00:53 UTC

On 7/12/2023 5:53 PM, MitchAlsup wrote:
> On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
>> On 7/12/2023 12:55 AM, Thomas Koenig wrote:
>>> MitchAlsup <Mitch...@aol.com> schrieb:
>>>
>>>> I chose 64-bit constants because I was doing a 64-bit machine and 64-bit
>>>> constants are a perfect match. Since the constants are universal, I have
>>>> easy access to 64-bit displacements--and support Fortran common blocks
>>>> of any size (up to about 2^62). Fortran programmers can go back and pass
>>>> everything in common blocks......one can port dusty deck programs change
>>>> the sizes of array bounds and it all just works.
>>>
>>> There is one additional problem (outside the scope of an ISA):
>>> Integer sizes.
>>>
>>> FORTRAN up to 77 (which is what dusty decks are, by definition)
>>> only had one integer size, which is 32 bits on today's platforms.
>>> Compilers have switches to set default integer sizes, but these
>>> switches make the compiler violate the Fortran standard because
>>> suddenly the size of an INTEGER is twice the size of a REAL,
>>> which is likely to mess up COMMON-block based storage management
>>> in dusty deck programs.
>>>
>>> When trying to bring an old FORTRAN program into the 64-bit world,
>>> some decisions and some work will still be required. A significant
>>> first step would probably be to change the memory management to
>>> dynamic style (which has only been in the language for ~30 years
>>> now :-)
>>>
>>
>>
>> In C land, we still haven't entirely seen the end of:
>> foo(p, q) void *p, *q; { ... }
> <
> A useful reason for AI programming machines.........
>>
>> It is a lot harder to port code when it assumes that
>> "sizeof(int)==sizeof(void *)", "sizeof(int)==2", ...
> <
> Another

It gets annoying, as there exist profiles in my case where pointers are:
32-bits (32b storage), smaller configs;
48-bits (64b storage), current active configs;
96-bits (128b storage), experimental / not fully debugged.

So, can't just make a blanket assumption of a 64-bit format in a lot of
my runtime code.

>>
>> A lot of this code will often need significant modification to port to BJX2.
>>
>> Well, along with things like:
>> unsigned char * far screen;
>> screen = MK_FP(0xA000, 0x0000);
>> ...
>>
>> And, my graphics hardware being unable to directly emulate "Mode 13h"
>> style graphics.
>>
>> Though, did at least at one point think up a possible hack that "could"
>> be used to support linear-addressed bitmapped graphics modes without too
>> much modification. Haven't done so yet.
>>
>>
>>
>>
>> But, yeah, don't know much about FORTRAN, as it wasn't really a language
>> I have used...
>>
>>
>>
>>
>> As for constants...
>>
>> I can also note (compiler stats):
>> Disp9 hit rate: 98.15%
>> 3RI, Imm9 hit rate: 94.47%
> <
> It is not the hit rate that matters, it is how bad the hassle is to get to 100%

For ASM code, it is trivial to write;
For performance, it isn't too big of an issue.
The compiler can also deal with it without too much issue.

Using 12-bit immediate values would have been slightly better for
hit-rate, but the difference is small.

Though, 5-bit immediate and displacement values would have had a fairly
poor hit rate (Disp5 would have only been around 24%).

Say, for a displacement:
   5u: 24.5%
   6u: 55.6%
   7u: 68.3%
   8u: 85.4%
   9u: 98.2%
  10u: 98.8%
  11u: 99.4%
  12u: 99.7%
With the current programs, 100% would be hit at ~ 22 bits (mostly due to
the size of the ".bss" sections with a few of the programs).
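
For reference, this kind of stat can be gathered with something like the following C sketch (the function names are made up, and this is not the code the compiler actually uses; it just counts how many displacements fit an N-bit unsigned field):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest n such that value < 2^n (so 0 needs 0 bits). */
unsigned bits_needed_unsigned(uint64_t value)
{
    unsigned n = 0;
    while (value) {
        n++;
        value >>= 1;
    }
    return n;
}

/* Percentage of displacements that fit a field_bits-wide unsigned field. */
double hit_rate_percent(const uint64_t *disp, size_t count, unsigned field_bits)
{
    assert(count > 0);
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        if (bits_needed_unsigned(disp[i]) <= field_bits)
            hits++;
    }
    return 100.0 * (double)hits / (double)count;
}
```

The width at which this reaches 100% is just the bits_needed_unsigned of the largest displacement seen.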

>>
>> Hit rate, 3RI, Imm9 & Imm33s: 99.44%
>> Miss Rate: 0.56% (would require a 96-bit JLDI / "Jumbo Load" op).
>>
>> Constant displacements which exceed 33 bits are basically non-existent
>> (So, Imm9u+Disp33s: 100%).
> <
> What about 50 years from now, Windows will be larger than a 48-bit address
> space by then.

Maybe...

But, not exactly like x86-64 would fare much better here with its use of
32-bit displacement fields, ...

The main place where this particular issue is likely to manifest would
be if/when structs and arrays and similar start getting into GB territory.

Granted, x86-64 does use "full sized" index registers, rather than being
like, "Well, it is defined for 33 bits, and then it is undefined...".

>>
>> Between several programs, it is a similar story (though the exact
>> percentages are prone to vary).
>>
>> Say, Imm9 hit rate varies from around ~ 90-98%, ...
>>
>>
>>
>> Or, IOW: It doesn't particularly seem worth worrying too much about
>> larger constants, as they are likely to disappear into the noise.
> <
> For now, and especially for the things you are interested in;
> What about 50 years from now ??

I don't expect things will continue to get bigger:
Moore's law will hit a limit, and then "ever more RAM" will become "ever
more expensive".

My current prediction is mostly that "pretty much everything" is likely
to effectively hit a brick wall in the "not too distant" future; and it
will be unclear if/how it would ever be resolved.

If no one can afford the RAM, one need not care how efficiently one
could address it if it were to exist.

Though, I had experimentally designed features that could allow for
96-bit pseudo-linear addressing, if a need for this became "actually
relevant".

>>
>> Checking my "cycles used per instruction" stats, the relevant case
>> 'JLDI' is ~ 0.65% of the cycle budget.
>>
>> Seemingly, its dominant use-case is loading absolute addresses for MMIO
>> and similar.
>>
>>
>> The bigger source of overflowed displacements (requiring a jumbo
>> encoding) is mostly in relation to global variables.
> <
> One of my original goals, and one I have not ever lost sight of is
> how does one present a 64-bit VAS to software. That is:: the Compiler,
> the Linker, and the OS can put anything anywhere they want to put it.
> My 66000 supplies the tools, they get to use them however they want.

You *can* put anything anywhere in the address space...

Doesn't mean:
  short *ptr;
  int i, j;
  ...
  ptr[i]=j;

Needs to be able to reach it.

Though:
  short *ptr;
  ptrdiff_t i, j;
  ...
  ptr[i]=j;

Will need to reach the whole address space, just, at a
"not-strictly-zero" performance cost.

Though:
  struct foo_t *obj;
  obj->x=i;

Would currently be limited to sizeof(struct foo_t) < 4GB, but if this
case came up, the compiler could be like "Oh crap, this struct is over
4GB!" and then do something about it.

>>
>> As noted, even with common global variables being sorted by usage
>> frequency, there are still a lot of global variables that fall well
>> outside the reach of a 9 or 10 bit GBR displacement (512B or 1K).
>>
>> If Jumbo is enabled, looks like these cases are mostly being handled
>> with a 64-bit / Disp33s encoding.
>>
>> The non-jumbo fallback strategy is loading a 24-bit displacement into
>> R0, but this will fail if ".data"+".bss" exceeds 16MB. Luckily, any core
>> that is likely to be running programs big enough for this to matter,
>> will also be big enough to support jumbo prefixes.
>>
> What if .data+.bss+.rodata is bigger than 48-bits ?

I will just sorta ignore this possibility for now.

But, did technically recently add some "SUBX.P"/"CMPPGTX"/...
instructions, and it would be "trivial enough" to add an "ADDX.P"
instruction (the mechanism already exists in my Verilog code, just
didn't define an encoding for this case; could do so, now realizing that
a possible use-case exists for it).

With these, one does have a mechanism that (albeit a little wonky) could
support arrays up to 2^96 bits. Though, in this case, address
calculation becomes a multi-step operation.

Partial reason these were added initially was to reduce some overheads
associated with the use of bounds-checking (though, "ADDX.P" would be
effectively an ALU only op and would zero the tag bits).

There are also "SUB.P"/"CMPPGT"/... which operate on 48 bits, just sort
of modified ALU ops that pretend as-if the high-order 16 bits were zeroed...

On 7/12/2023 7:21 PM, Scott Lurndal wrote:
> BGB <cr88192@gmail.com> writes:
>> On 7/12/2023 12:55 AM, Thomas Koenig wrote:
>>> MitchAlsup <MitchAlsup@aol.com> schrieb:
>>>
>>>> I chose 64-bit constants because I was doing a 64-bit machine and 64-bit
>>>> constants are a perfect match. Since the constants are universal, I have
>>>> easy access to 64-bit displacements--and support Fortran common blocks
>>>> of any size (up to about 2^62). Fortran programmers can go back and pass
>>>> everything in common blocks......one can port dusty deck programs change
>>>> the sizes of array bounds and it all just works.
>>>
>>> There is one additional problem (outside the scope of an ISA):
>>> Integer sizes.
>>>
>>> FORTRAN up to 77 (which is what dusty decks are, by definition)
>>> only had one integer size, which is 32 bits on today's platforms.
>>> Compilers have switches to set default integer sizes, but these
>>> switches make the compiler violate the Fortran standard because
>>> suddenly the size of an INTEGER is twice the size of a REAL,
>>> which is likely to mess up COMMON-block based storage management
>>> in dusty deck programs.
>>>
>>> When trying to bring an old FORTRAN program into the 64-bit world,
>>> some decisions and some work will still be required. A significant
>>> first step would probably be to change the memory management to
>>> dynamic style (which has only been in the language for ~30 years
>>> now :-)
>>>
>>
>>
>> In C land, we still haven't entirely seen the end of:
>> foo(p, q) void *p, *q; { ... }
>
> Frankly, I haven't seen K&R C style declarations or code for
> decades. And iirc, there were several tools to help
> convert to ANSI-style declarations back in that timeframe.
>
> I did recently try compiling the version 6 Unix C compiler
> with gcc, wouldn't compile, even in most lenient mode.
>

The version of Dhrystone I am using is still using K&R syntax.

The "Duke Nukem" side scroller code was (IIRC) a mix of K&R style and
C90 style. There were some random bits of K&R syntax hanging around in
ROTT as well (though both the Wolf3D and "Commander Keen" code seemed to
be entirely C90 style IIRC, albeit with a fair chunk of real-mode
assembler code in the mix, ...).

Wolf3D, Keen, and Duke Nukem would likely require extensive rewriting to
port. ROTT was a bit of a pain, but I already did this part...

Granted, much of this is code from around 30+ years ago...

>>
>> It is a lot harder to port code when it assumes that
>> "sizeof(int)==sizeof(void *)", "sizeof(int)==2", ...
>
> SVR4 (circa 1990) introduced a number of synthetic typedefs,
> primarily to ameliorate such issues (along with guidance to use
> sizeof() on a variable, not the underlying type). Things like
> pid_t, uid_t, gid_t were all changed from the 16-bit used in SVR3
> to 32-bit in SVR4. Lots of code broke. Using a typedef just requires
> a simple recompile (and the kernel can continue to support legacy
> binaries so long as no GID/UID is assigned to that process that exceeds
> 16 bits).

OK.

Main hassle is mostly things like structure layouts and similar for
things being read into memory.

Say, if the program assumes 16-bit values in a struct, and has packed
most of its data into 4K blocks that are loaded and unloaded from RAM,
etc...

Or, RLE unpacking code that assumes the ability to be like "*(int *)s"
and then get a 16-bit value, ...
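
The portable replacement is to assemble the value from bytes rather than casting. A C sketch (the RLE format here, a 16-bit little-endian count followed by one value byte, is invented purely for illustration and is not the format any of these games use):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value byte by byte; this does not depend
 * on sizeof(int), host endianness, or the alignment of 's'. */
uint16_t read_le16(const unsigned char *s)
{
    return (uint16_t)(s[0] | (s[1] << 8));
}

/* Unpack (count16, value8) runs; returns bytes written, stopping early
 * if the next run would not fit in dst. */
size_t rle_unpack(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    assert(src_len % 3 == 0);
    for (size_t i = 0; i + 3 <= src_len; i += 3) {
        uint16_t count = read_le16(src + i);
        if (count > dst_cap - out)
            break;                      /* run would overflow dst */
        for (uint16_t k = 0; k < count; k++)
            dst[out++] = src[i + 2];
    }
    return out;
}
```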

Sometimes there may be other wonkiness, like say, "long" being 32-bit
but having a 16-bit alignment. Even if the type sizes are converted
over, the structure size and layout might change based on a 32-bit
struct member changing from a 16 to 32 bit alignment.
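
To illustrate the layout shift in C (struct names are made up; "#pragma pack" is a compiler extension supported by GCC, Clang, and MSVC rather than standard C, and the "natural" numbers assume a typical ABI where int32_t is 4-byte aligned):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as seen by an old compiler that aligns 32-bit members to 2 bytes. */
#pragma pack(push, 2)
struct rec_align2 {
    int16_t id;
    int32_t offset;   /* lands at byte 2 */
    int16_t flags;
};
#pragma pack(pop)

/* Same members with the target's natural alignment. */
struct rec_natural {
    int16_t id;
    int32_t offset;   /* typically lands at byte 4, after 2 bytes of padding */
    int16_t flags;
};

/* The 2-byte-aligned layout is fixed by the pragma on these compilers. */
void layout_check(void)
{
    assert(offsetof(struct rec_align2, offset) == 2);
    assert(sizeof(struct rec_align2) == 8);
}
```

On a typical 32- or 64-bit target the second struct comes out at 12 bytes instead of 8, so data blocks written with the old layout can't just be read straight into it.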

....

Re: How much space did the 68000 registers take up?

<07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33215&group=comp.arch#33215

X-Received: by 2002:a05:620a:1aa2:b0:762:48ac:c40b with SMTP id bl34-20020a05620a1aa200b0076248acc40bmr585qkb.14.1689213122287;
Wed, 12 Jul 2023 18:52:02 -0700 (PDT)
X-Received: by 2002:a05:6870:c7ac:b0:1b0:40fb:9a0c with SMTP id
dy44-20020a056870c7ac00b001b040fb9a0cmr496822oab.3.1689213122022; Wed, 12 Jul
2023 18:52:02 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border-2.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 12 Jul 2023 18:52:01 -0700 (PDT)
In-Reply-To: <u8nhu0$3du57$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:6c3a:4c21:b5f4:f0a8;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:6c3a:4c21:b5f4:f0a8
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<a27529b5-fa8e-4bb8-a1f3-2b285fdf3e7cn@googlegroups.com> <2023Jul9.173411@mips.complang.tuwien.ac.at>
<u8eth9$26qad$1@dont-email.me> <92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me> <5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me> <6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me> <bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me> <bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me> <b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com> <u8lf80$hh9l$1@newsreader4.netcologne.de>
<u8n9n3$3d8jv$1@dont-email.me> <7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
Subject: Re: How much space did the 68000 registers take up?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jul 2023 01:52:02 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 20

by: MitchAlsup - Thu, 13 Jul 2023 01:52 UTC

On Wednesday, July 12, 2023 at 7:53:24 PM UTC-5, BGB wrote:
> On 7/12/2023 5:53 PM, MitchAlsup wrote:
> > On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:

> > For 99% of programs, yes; but what about the truly large stadium
> > sized machines with 16,000 processors, 32EB of DRAM, running
> > shared database applications in cache coherent memory over CXL
> > extension to PCIe ??
> >
> Then one deals with it...
<
What if it just ran fine from day 1 without any hassle ?
>
> But, which is worse:
> Being limited to Disp33s in the default case, paying a few extra cycles
> as needed to make it bigger;
> Adding an extra 2 clock cycles or so of latency to every memory access.
<
What makes you think 64-bit constants add any cycles to any memory
references ??

Re: How much space did the 68000 registers take up?

<u8nuvi$3ik9c$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33216&group=comp.arch#33216

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: How much space did the 68000 registers take up?
Date: Wed, 12 Jul 2023 23:35:49 -0500
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <u8nuvi$3ik9c$1@dont-email.me>
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<a27529b5-fa8e-4bb8-a1f3-2b285fdf3e7cn@googlegroups.com>
<2023Jul9.173411@mips.complang.tuwien.ac.at> <u8eth9$26qad$1@dont-email.me>
<92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me>
<5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me>
<6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me>
<bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me>
<bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me>
<b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com>
<u8lf80$hh9l$1@newsreader4.netcologne.de> <u8n9n3$3d8jv$1@dont-email.me>
<7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me>
<07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 13 Jul 2023 04:36:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d20b2869af2f8a4407bb2e819b7f51f5";
logging-data="3756332"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX187UQ9ZE036jHSYwSkp78/h"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:RHVQkq1+o01qLgo0QZk4RF6+tGI=
In-Reply-To: <07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
Content-Language: en-US

by: BGB - Thu, 13 Jul 2023 04:35 UTC

On 7/12/2023 8:52 PM, MitchAlsup wrote:
> On Wednesday, July 12, 2023 at 7:53:24 PM UTC-5, BGB wrote:
>> On 7/12/2023 5:53 PM, MitchAlsup wrote:
>>> On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
>
>>> For 99% of programs, yes; but what about the truly large stadium
>>> sized machines with 16,000 processors, 32EB of DRAM, running
>>> shared database applications in cache coherent memory over CXL
>>> extension to PCIe ??
>>>
>> Then one deals with it...
> <
> What if it just ran fine from day 1 without any hassle ?

Could be done, wouldn't be free.

>>
>> But, which is worse:
>> Being limited to Disp33s in the default case, paying a few extra cycles
>> as needed to make it bigger;
>> Adding an extra 2 clock cycles or so of latency to every memory access.
> <
> What makes you think 64-bit constants add any cycles to any memory
> references ??

If the AGU needed to calculate using a full 96-bit adder, this is going
to have a higher latency cost than, say, adding a 48-bit value to a
36-bit shifted displacement, with the upper 48 bits being copied over
unchanged.

Say:
  (35: 0): Add Base with Disp shifted 0..3 bits.
  (47:36): either val+1, val-1, or val (copy unchanged).
  (95:48): Copy unchanged.
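
A behavioral C model of that split, for illustration only (this is not the Verilog; function names are made up, and it models a 64-bit register where bits 63:48 stand in for the "copy unchanged" upper part):

```c
#include <assert.h>
#include <stdint.h>

#define LOW_BITS  36
#define LOW_MASK  ((UINT64_C(1) << LOW_BITS) - 1)
#define MID_MASK  UINT64_C(0xFFF)               /* bits 47:36 */
#define ADDR_MASK ((UINT64_C(1) << 48) - 1)

/* disp36 is the already-shifted displacement, in [-2^35, 2^35). */
uint64_t agu_split_add(uint64_t base, int64_t disp36)
{
    assert(disp36 >= -(INT64_C(1) << 35) && disp36 < (INT64_C(1) << 35));
    /* (35:0): a real 36-bit add, keeping the carry out. */
    uint64_t low = (base & LOW_MASK) + ((uint64_t)disp36 & LOW_MASK);
    unsigned carry = (unsigned)(low >> LOW_BITS) & 1u;
    /* (47:36): increment, decrement, or copy, chosen by sign and carry. */
    uint64_t mid = (base >> LOW_BITS) & MID_MASK;
    if (disp36 >= 0 && carry)
        mid = (mid + 1) & MID_MASK;
    else if (disp36 < 0 && !carry)
        mid = (mid - 1) & MID_MASK;
    /* Upper bits: copied unchanged. */
    return (base & ~ADDR_MASK) | (mid << LOW_BITS) | (low & LOW_MASK);
}

/* Reference: a full 48-bit add with the upper bits preserved. */
uint64_t agu_reference_add(uint64_t base, int64_t disp36)
{
    return (base & ~ADDR_MASK) | ((base + (uint64_t)disp36) & ADDR_MASK);
}
```

The two functions give the same result for any in-range displacement; the point of the split form is that only 36 bits sit on the carry chain, with the 12-bit increment/decrement selectable in parallel.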

But, say, Disp48 could be supported later or as an optional extension.

The reason for limiting displacements to 33 bits is not about how to get
the displacement value into the AGU, but rather the latency of the carry
propagation within the AGU (which is then "immediately" handed off to
both the L1 D$ and branch-handling logic).

A bigger displacement would likely need to give a full clock cycle for
the AGU to do its thing.

However, if the calculation is done externally via the ALU when larger
displacements are needed, the ALU can be given a full clock cycle to do
its work (since these operations already have a 2-cycle latency).

Similar goes for the 33-bit displacement limit on relative branches, etc...

Likewise, the idea of allowing a larger 57-bit displacement case for
LEA.B isn't about adding a new encoding (essentially, the encoding
exists already, it is just not used), but rather about this case being
quietly redirected through the ALU instead (via the instruction decoder).

Maybe even XLEA.B could also get in on this, though this would require
adding a new immediate-handling special case (for the full 57 bits to
actually be used).

So, say, very long branch:
  XLEA.B (PC, Disp57s), R16
  JMPX R16

It is likely that "ADD.P" and "ADDX.P" would use the same logic, and I
may as well have them be decoded using the same rules as memory Load or
LEA (for sake of allowing PC-rel and GBR-rel).

Possibly then:
ADD.P (Rm, Ro), Rn
Or:
ADD.P Rm, Ro, Rn

Could be left as a choice of preference.

....

As for Imm57s ALU ops, like:
ADD Rm, Imm57s, Rn
These are an optional feature...

They could be used, and may exist depending on what features are
enabled, but only apply to around 0.5% of ADD-Imm cases (and may or may
not save 1 clock-cycle over the alternatives).

They mostly seem to disappear into the noise.

Then again, it is possible that if both XMOV+ALUPTR exist, I may throw
these in as a side feature (similarly, they also come as a side feature
with FPIMM).

Re: How much space did the 68000 registers take up?

<090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33218&group=comp.arch#33218

X-Received: by 2002:ad4:4ba8:0:b0:635:cbc7:95da with SMTP id i8-20020ad44ba8000000b00635cbc795damr672qvw.0.1689263567312;
Thu, 13 Jul 2023 08:52:47 -0700 (PDT)
X-Received: by 2002:a05:6808:30a4:b0:3a3:89a2:50a5 with SMTP id
bl36-20020a05680830a400b003a389a250a5mr2274920oib.10.1689263566923; Thu, 13
Jul 2023 08:52:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jul 2023 08:52:46 -0700 (PDT)
In-Reply-To: <u8nuvi$3ik9c$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cea:3802:b236:969c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cea:3802:b236:969c
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<a27529b5-fa8e-4bb8-a1f3-2b285fdf3e7cn@googlegroups.com> <2023Jul9.173411@mips.complang.tuwien.ac.at>
<u8eth9$26qad$1@dont-email.me> <92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me> <5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me> <6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me> <bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me> <bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me> <b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com> <u8lf80$hh9l$1@newsreader4.netcologne.de>
<u8n9n3$3d8jv$1@dont-email.me> <7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me> <07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
<u8nuvi$3ik9c$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
Subject: Re: How much space did the 68000 registers take up?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jul 2023 15:52:47 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 6176

by: MitchAlsup - Thu, 13 Jul 2023 15:52 UTC

On Wednesday, July 12, 2023 at 11:36:06 PM UTC-5, BGB wrote:
> On 7/12/2023 8:52 PM, MitchAlsup wrote:
> > On Wednesday, July 12, 2023 at 7:53:24 PM UTC-5, BGB wrote:
> >> On 7/12/2023 5:53 PM, MitchAlsup wrote:
> >>> On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
> >
> >>> For 99% of programs, yes; but what about the truly large stadium
> >>> sized machines with 16,000 processors, 32EB of DRAM, running
> >>> shared database applications in cache coherent memory over CXL
> >>> extension to PCIe ??
> >>>
> >> Then one deals with it...
> > <
> > What if it just ran fine from day 1 without any hassle ?
> Could be done, wouldn't be free.
> >>
> >> But, which is worse:
> >> Being limited to Disp33s in the default case, paying a few extra cycles
> >> as needed to make it bigger;
> >> Adding an extra 2 clock cycles or so of latency to every memory access.
> > <
> > What makes you think 64-bit constants add any cycles to any memory
> > references ??
> If the AGU needed to calculate using a full 96 bit adder, this is going
> to have a higher latency cost than, say, adding 48 bit value with a
> 36-bit shifted displacement with the upper 48 bits being copied over
> unchanged.
<
In full custom VLSI, a 64-bit adder is 11-actual gates of delay that generally
get pounded down (i.e., circuit designed) to 8-gates of actual delay. A 3-input
version is 1 gate delay longer. None of these gates are wider than fan-in 4
NAND gates or fan-in 3 NOR gates or fan-in 3-X[N]OR gates.
<
Carry is set up in 16-bit chunks (for fan-out and wire delay reasons).
Carry[15] is routed to carry[31] and then back to output select[15..31]
so the carries move 16-bits every gate of carry chain delay and fan-out
is 1 and fan-in is 1 on this wire.
<
In integer form, those 11 gates include the XOR used for ADD and SUB.
In AGEN form, those 11 gates include the shift {0,1,2,3} for indexing.
These are Carry Select Adders.
<
In comparison, the logic delay of an SRAM is 12-gates of delay and you still
need to put flip-flops on the address-in and data-out busses.
>
>
> Say:
> (35: 0): Add Base with Disp shifted 0..3 bits.
> (47:36): either val+1, val-1, or val (copy unchanged).
> (95:48): Copy unchanged.
>
> But, say, Disp48 could be supported later or as an optional extension.
<
How wide the operands are is determined WAY before the AGEN adder,
AGEN only sees 64-bit inputs, and only performs 3-input 64-bit additions.
All the sizing is performed in Decode and Forwarding. Once something
gets on the operand busses it is 64-bits in size. Oh, and by the way,
selecting between 16-bits, 32-bits, and 64-bits takes less than ½ the
gates of delay that forwarding on a 1-wide machine takes.
>
>
> The reason for limiting displacements to 33 bits is not about how to get
> the displacement value into the AGU, but rather the latency of the carry
> propagation within the AGU (which is then "immediately" handed off to
> both the L1 D$ and branch-handling logic).
<
I have been at this since 1972, adder circuit design.
>
> A bigger displacement would likely need to give a full clock cycle for
> the AGU to do its thing.
<
Nope, no more than 2 gates of delay.
>
>
> However, if the calculation is done externally via the ALU when larger
> displacements are needed, the ALU can be given a full clock cycle to do
> its work (since these operations already have a 2-cycle latency).
<
The ALU is burdened by having to be able to subtract, whereas AGEN is not.
>
> Similar goes for the 33-bit displacement limit on relative branches, etc...
>
At this point it is pretty clear you don't know what you are talking about
to better than hand waving accuracy. Maybe (MAYBE) in FPGA what
you say is true, but it is not in full custom VLSI.

Re: How much space did the 68000 registers take up?

<u8p9rh$3mr6p$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33219&group=comp.arch#33219

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: How much space did the 68000 registers take up?
Date: Thu, 13 Jul 2023 11:47:31 -0500
Organization: A noiseless patient Spider
Lines: 201
Message-ID: <u8p9rh$3mr6p$1@dont-email.me>
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<u8eth9$26qad$1@dont-email.me>
<92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me>
<5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me>
<6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me>
<bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me>
<bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me>
<b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com>
<u8lf80$hh9l$1@newsreader4.netcologne.de> <u8n9n3$3d8jv$1@dont-email.me>
<7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me>
<07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
<u8nuvi$3ik9c$1@dont-email.me>
<090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 13 Jul 2023 16:47:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d20b2869af2f8a4407bb2e819b7f51f5";
logging-data="3894489"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+u8sF4iuAvJ6W1cOAsv+JN"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:ALE+BYkfWUzFTq72Id6HV4qg2FA=
In-Reply-To: <090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
Content-Language: en-US

by: BGB - Thu, 13 Jul 2023 16:47 UTC

On 7/13/2023 10:52 AM, MitchAlsup wrote:
> On Wednesday, July 12, 2023 at 11:36:06 PM UTC-5, BGB wrote:
>> On 7/12/2023 8:52 PM, MitchAlsup wrote:
>>> On Wednesday, July 12, 2023 at 7:53:24 PM UTC-5, BGB wrote:
>>>> On 7/12/2023 5:53 PM, MitchAlsup wrote:
>>>>> On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
>>>
>>>>> For 99% of programs, yes; but what about the truly large stadium
>>>>> sized machines with 16,000 processors, 32EB of DRAM, running
>>>>> shared database applications in cache coherent memory over CXL
>>>>> extension to PCIe ??
>>>>>
>>>> Then one deals with it...
>>> <
>>> What if it just ran fine from day 1 without any hassle ?
>> Could be done, wouldn't be free.
>>>>
>>>> But, which is worse:
>>>> Being limited to Disp33s in the default case, paying a few extra cycles
>>>> as needed to make it bigger;
>>>> Adding an extra 2 clock cycles or so of latency to every memory access.
>>> <
>>> What makes you think 64-bit constants add any cycles to any memory
>>> references ??
>> If the AGU needed to calculate using a full 96 bit adder, this is going
>> to have a higher latency cost than, say, adding 48 bit value with a
>> 36-bit shifted displacement with the upper 48 bits being copied over
>> unchanged.
> <
> In full custom VLSI, a 64-bit adder is 11-actual gates of delay that generally
> get pounded down (i.e., circuit designed) to 8-gates of actual delay. A 3-input
> version is 1 gate delay longer. None of these gates are wider than fan-in 4
> NAND gates or fan-in 3 NOR gates or fan-in 3-X[N]OR gates.
> <

I don't have "gates of delay" here, rather "logic delay" and "net
delay", both measured in nanoseconds. The "net delay" tends to be the
larger of the two (typically by a factor of 2x or 3x).

Time seems to be however long it takes to get through a few CARRY4
units, and associated MUXF units, ...

Then, there is "worst negative slack", where "all is good" so long as
this number remains positive (i.e., it did not fail timing).
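
For anyone not used to FPGA timing reports: a simplified way to think about the slack number is clock period minus total path delay, with the worst path deciding whether the design passes. The sketch below assumes that simplified model (it ignores setup time, clock skew, and uncertainty, which a real timing report does account for), and the path delays in the usage note are made-up numbers, not measurements.

```python
def path_slack(period_ns, logic_delay_ns, net_delay_ns):
    """Slack of one path under the simplified model:
    clock period minus (logic delay + net delay)."""
    return period_ns - (logic_delay_ns + net_delay_ns)


def worst_negative_slack(period_ns, paths):
    """Smallest slack over all (logic_delay, net_delay) pairs.
    Timing passes in this model if the result is not negative."""
    return min(path_slack(period_ns, logic, net) for logic, net in paths)


def passes_timing(period_ns, paths):
    return worst_negative_slack(period_ns, paths) >= 0.0
```

With a 20 ns period (50 MHz) and hypothetical paths of (4.0, 9.0) and (5.5, 13.0) ns, the slacks are 7.0 and 1.5 ns, so the second path sets the worst-case number and the design still passes. Note how the net delay dominates in both made-up paths, which matches the 2x to 3x ratio mentioned above.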

There is also "fanout", which if this number gets large (say, 1000+),
the "net delay" also typically gets bigger.

When timing fails, often it is with paths that seem to zigzag across
much of the FPGA.

> Carry is set up in 16-bit chunks (for fan-out and wire delay reasons).
> Carry[15] is routed to carry[31] and then back to output select[15..31]
> so the carries move 16-bits every gate of carry chain delay and fan-out
> is 1 and fan-in is 1 on this wire.
> <
> In integer form that 11-gates includes the XOR used to ADD and SUB
> In AGEN form that 11-gates includes the shift {0,1,2,3} for indexing.
> These are Carry Select Adders.
> <

Yeah, I had also used 16-bit chunks, but more because fiddling showed
that 16-bit worked out best (though 12 and 20 also work OK). Needs to be
kept a multiple of 4 for sake of the CARRY4's doing 4+4=>4 (with
CIN/COUT signals).
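
For reference, a behavioral Python model of the chunked carry-select idea: each chunk computes its sum for both possible carry-in values, and the incoming carry only selects between them. This is a software sketch with assumed names and defaults (64-bit width, 16-bit chunks); it says nothing about how the logic maps onto CARRY4 units or gates.

```python
def carry_select_add(a, b, width=64, chunk=16):
    """Add two unsigned values chunk by chunk, carry-select style.

    Returns (sum modulo 2**width, carry_out).
    """
    result = 0
    carry = 0
    for i in range(0, width, chunk):
        n = min(chunk, width - i)      # last chunk may be narrower
        mask = (1 << n) - 1
        x = (a >> i) & mask
        y = (b >> i) & mask
        s0 = x + y                     # speculative sum, carry-in = 0
        s1 = x + y + 1                 # speculative sum, carry-in = 1
        s = s1 if carry else s0        # the carry only drives a select
        result |= (s & mask) << i
        carry = s >> n                 # carry into the next chunk
    return result, carry
```

In hardware both speculative sums for every chunk are computed in parallel, so the critical path is one chunk-wide add plus the chain of selects. The `chunk` parameter can be set to 12 or 20 to mirror the other sizes mentioned above.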

> In comparison, the logic delay of an SRAM is 12-gates of delay and you still
> need to put flip-flops on the address-in and data-out busses.

The Block-RAM's operate on a clock-edge.
This is on the end of the EX1 clock, so:

EX1:
  Calculate Address;
  Feed index into Block-RAM's (on clock edge);
EX2:
  Check for Hit/Miss
  Extract sub-block
EX3:
  Result.

For the L2 cache, there is more delay:
  Request arrives on bus.
  -- edge
  Feed into BRAM arrays;
  -- edge
  (idle)
  -- edge
  Check for hit/miss.
  --
  Emit response to bus.

These extra clock edges help significantly with timing.
But I don't want to do this for the L1 cache, as it would increase latency.

The L2 cache XORs all the bits together for the index, whereas the L1
only uses the low-order bits as a modulo index.
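
The two indexing schemes could be sketched roughly like this. The line size, index widths, and address width below are placeholder assumptions picked for the example, not the actual cache parameters, and the function names are hypothetical.

```python
LINE_BITS = 6        # assumed 64-byte cache lines
L1_INDEX_BITS = 8    # assumed 256 L1 sets
L2_INDEX_BITS = 12   # assumed 4096 L2 sets


def l1_index(addr):
    """L1: low-order line-address bits used directly (modulo index)."""
    return (addr >> LINE_BITS) & ((1 << L1_INDEX_BITS) - 1)


def l2_index(addr, addr_bits=48):
    """L2: XOR-fold every line-address bit into the index."""
    line = (addr & ((1 << addr_bits) - 1)) >> LINE_BITS
    mask = (1 << L2_INDEX_BITS) - 1
    index = 0
    while line:
        index ^= line & mask          # fold in the next slice
        line >>= L2_INDEX_BITS
    return index
```

With these assumed sizes, two addresses that are 16 KB apart land in the same L1 set (the differing bit is above the index field), while the XOR fold still separates addresses that differ only in high bits. The trade-off is that the fold needs all address bits before the index is known, which is easier to afford in the L2 pipeline with its extra clock edges.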

>>
>>
>> Say:
>> (35: 0): Add Base with Disp shifted 0..3 bits.
>> (47:36): either val+1, val-1, or val (copy unchanged).
>> (95:48): Copy unchanged.
>>
>> But, say, Disp48 could be supported later or as an optional extension.
> <
> How wide the operands are is determined WAY before the AGEN adder,
> AGEN only sees 64-bit inputs, and only performs 3-input 64-bit additions.
> All the sizing is performed in Decode and Forwarding. Once something
> gets on the operand busses it is 64-bits in size. Oh, and by the way,
> selecting between 16-bits, 32-bits, and 64-bits takes less than ½ the
> gates of delay that forwarding on a 1-wide machine takes.

I was handling AGU in EX1, it gets register inputs in just the same way
as the ALU or the various other units (namely, the values of the Rs and
Rt register ports, some control bits for the scale and other things, ...).

It is just sort of shoe-horned in with forming the address to hand off
to the L1 D$ (within the same clock-cycle), and then is used to index
the array.

Actual hit/miss checking is done during the following clock cycle.

>>
>>
>> The reason for limiting displacements to 33 bits is not about how to get
>> the displacement value into the AGU, but rather the latency of the carry
>> propagation within the AGU (which is then "immediately" handed off to
>> both the L1 D$ and branch-handling logic).
> <
> I have been at this since 1972, adder circuit design.

Granted, I haven't existed this long...

>>
>> A bigger displacement would likely need to give a full clock cycle for
>> the AGU to do its thing.
> <
> Nope, no more than 2-gates of delay,
>>
>>
>> However, if the calculation is done externally via the ALU when larger
>> displacements are needed, the ALU can be given a full clock cycle to do
>> its work (since these operations already have a 2-cycle latency).
> <
> The ALU is burdened by having to be able to subtract, whereas AGEN is not.

Granted.

But, the ALU also has more time given to it.
It has the full EX1 cycle...
(Mostly) we don't use its outputs until the cycle has finished.
So, it produces a result, which we can "see" in EX2.

The AGU produces a result in EX1, and the result is also consumed in EX1.

Though, things like Branch initiation don't actually take hold until EX2
(partly to reduce timing stress on the AGU). Though, the LEA instruction
does produce a result in EX1 (so the AGU's output also goes into all the
register-forwarding logic, etc).

>>
>> Similar goes for the 33-bit displacement limit on relative branches, etc...
>>
> At this point it is pretty clear you don't know what you are talking about
> to better than hand waving accuracy. Maybe (MAYBE) in FPGA what
> you say is true, but it is not in full custom VLSI.
>

I have spent enough time fiddling with getting stuff to pass timing in
Vivado that I think I have an idea how it all behaves.

Carry select makes it faster, but the benefits are mixed (and still
works best when the adder is given a full clock-cycle just to itself).

Carry select seems to help, but many of these adders are already using a
carry-select design (without one, it doesn't really work at all: directly
expressing A+B in Verilog tends to result in a chain of CARRY4's that is
as long as the input values).

But, yeah, fiddling with timing has also resulted in stuff like the
branch-predictor only internally operating on the low 24 bits of the
address, and then falling back to a full branch if it would cross into
another 16MB window (with the Branch ops routed through the AGU in EX1).
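
A small sketch of that 24-bit window check, assuming the predictor only adds the displacement to the low 24 bits of PC and gives up if the result leaves the current 16MB window. The names and the (target, ok) return convention are made up for the example.

```python
WINDOW_BITS = 24                       # 16MB window
WINDOW_MASK = (1 << WINDOW_BITS) - 1


def predict_target(pc, disp):
    """Return (target, True) if the branch stays inside pc's 16MB
    window, or (None, False) if a full branch via the AGU is needed."""
    low = (pc & WINDOW_MASK) + disp    # narrow add on the low bits only
    if 0 <= low <= WINDOW_MASK:
        return (pc & ~WINDOW_MASK) | low, True
    return None, False                 # crossed into another window
```

For example, `predict_target(0x10000010, 0x20)` stays in the window and yields `0x10000030`, whereas a forward branch from `0x10FFFFF0` by `0x20` overflows the low 24 bits and is reported as needing the fallback path.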

Re: How much space did the 68000 registers take up?

<c05e5b3a-6cd8-4af3-8a57-d91e72b5bdc9n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33220&group=comp.arch#33220

X-Received: by 2002:a05:6214:192e:b0:635:dbbe:7a6d with SMTP id es14-20020a056214192e00b00635dbbe7a6dmr6389qvb.13.1689269419413;
Thu, 13 Jul 2023 10:30:19 -0700 (PDT)
X-Received: by 2002:a05:6808:20a0:b0:39c:a74b:81d6 with SMTP id
s32-20020a05680820a000b0039ca74b81d6mr2761261oiw.7.1689269419125; Thu, 13 Jul
2023 10:30:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jul 2023 10:30:18 -0700 (PDT)
In-Reply-To: <u8p9rh$3mr6p$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cea:3802:b236:969c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cea:3802:b236:969c
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<u8eth9$26qad$1@dont-email.me> <92742c70-42d4-44ff-b667-df77c679d5afn@googlegroups.com>
<u8fe16$292j5$1@dont-email.me> <5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me> <6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me> <bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me> <bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me> <b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com> <u8lf80$hh9l$1@newsreader4.netcologne.de>
<u8n9n3$3d8jv$1@dont-email.me> <7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me> <07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
<u8nuvi$3ik9c$1@dont-email.me> <090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
<u8p9rh$3mr6p$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c05e5b3a-6cd8-4af3-8a57-d91e72b5bdc9n@googlegroups.com>
Subject: Re: How much space did the 68000 registers take up?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 13 Jul 2023 17:30:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 11331

by: MitchAlsup - Thu, 13 Jul 2023 17:30 UTC

On Thursday, July 13, 2023 at 11:47:50 AM UTC-5, BGB wrote:
> On 7/13/2023 10:52 AM, MitchAlsup wrote:
> > On Wednesday, July 12, 2023 at 11:36:06 PM UTC-5, BGB wrote:
> >> On 7/12/2023 8:52 PM, MitchAlsup wrote:
> >>> On Wednesday, July 12, 2023 at 7:53:24 PM UTC-5, BGB wrote:
> >>>> On 7/12/2023 5:53 PM, MitchAlsup wrote:
> >>>>> On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
> >>>
> >>>>> For 99% of programs, yes; but what about the truly large stadium
> >>>>> sized machines with 16,000 processors, 32EB of DRAM, running
> >>>>> shared database applications in cache coherent memory over CXL
> >>>>> extension to PCIe ??
> >>>>>
> >>>> Then one deals with it...
> >>> <
> >>> What if it just ran fine from day 1 without any hassle ?
> >> Could be done, wouldn't be free.
> >>>>
> >>>> But, which is worse:
> >>>> Being limited to Disp33s in the default case, paying a few extra cycles
> >>>> as needed to make it bigger;
> >>>> Adding an extra 2 clock cycles or so of latency to every memory access.
> >>> <
> >>> What makes you think 64-bit constants add any cycles to any memory
> >>> references ??
> >> If the AGU needed to calculate using a full 96 bit adder, this is going
> >> to have a higher latency cost than, say, adding 48 bit value with a
> >> 36-bit shifted displacement with the upper 48 bits being copied over
> >> unchanged.
> > <
> > In full custom VLSI, a 64-bit adder is 11-actual gates of delay that generally
> > get pounded down (i.e., circuit designed) to 8-gates of actual delay. A 3-input
> > version is 1 gate delay longer. None of these gates are wider than fan-in 4
> > NAND gates or fan-in 3 NOR gates or fan-in 3-X[N]OR gates.
> > <
> I don't have "gates of delay" here, rather "logic delay" and "net
> delay", both measured in nanoseconds. The "net delay" tends to be the
> larger of the two (typically by a factor of 2x or 3x).
>
> Time seems to be however long it takes to get through a few CARRY4
> units, and associated MUXF units, ...
>
> Then, there is "worst negative slack", where "all is good" so long as
> this number remains positive (eg, it did not fail timing).
>
>
> There is also "fanout", which if this number gets large (say, 1000+),
> the "net delay" also typically gets bigger.
>
> When timing fails, often it is with paths that seem to zigzag across
> much of the FPGA.
> > > Carry is set up in 16-bit chunks (for fan-out and wire delay reasons).
> > Carry[15] is routed to carry[31] and then back to output select[15..31]
> > so the carries move 16-bits every gate of carry chain delay and fan-out
> > is 1 and fan-in is 1 on this wire.
> > <
> > In integer form that 11-gates includes the XOR used to ADD and SUB
> > In AGEN form that 11-gates includes the shift {0,1,2,3} for indexing.
> > These are Carry Select Adders.
> > <
> Yeah, I had also used 16-bit chunks, but more because fiddling showed
> that 16-bit worked out best (though 12 and 20 also work OK). Needs to be
> kept a multiple of 4 for sake of the CARRY4's doing 4+4=>4 (with
> CIN/COUT signals).
> > In comparison, the logic delay of an SRAM is 12-gates of delay and you still
> > need to put flip-flops on the address-in and data-out busses.
> The Block-RAM's operate on a clock-edge.
> This is on the end of the EX1 clock, so:
>
> EX1: Calculate Address;
> Feed index into Block-RAM's (on clock edge);
> EX2:
> Check for Hit/Miss
> Extract sub-block
> EX3:
> Result.
>
>
> For the L2 cache, there is more delay:
> Request arrives on bus.
> -- edge
> Feed into BRAM arrays;
> -- edge
> (idle)
> -- edge
> Check for hit/miss.
> --
> Emit response to bus.
>
>
> These extra clock edges helping significantly with timing.
> But, don't want to do this for L1 cache, as this would increase latency.
>
>
> The L2 cache XORs all the bits together for the index, whereas the L1
> only uses the low-order bits as a modulo index.
> >>
> >>
> >> Say:
> >> (35: 0): Add Base with Disp shifted 0..3 bits.
> >> (47:36): either val+1, val-1, or val (copy unchanged).
> >> (95:48): Copy unchanged.
> >>
> >> But, say, Disp48 could be supported later or as an optional extension.
> > <
> > How wide the operands are is determined WAY before the AGEN adder,
> > AGEN only sees 64-bit inputs, and only performs 3-input 64-bit additions.
> > All the sizing is performed in Decode and Forwarding. Once something
> > gets on the operand busses it is 64-bits in size. Oh, and by the way,
> > selecting between 16-bits, 32-bits, and 64-bits takes less than ½ the
> > gates of delay that forwarding on a 1-wide machine takes.
> I was handling AGU in EX1, it gets register inputs in just the same way
> as the ALU or the various other units (namely, the values of the Rs and
> Rt register ports, some control bits for the scale and other things, ...).
>
> It is just sort of shoe-horned in with forming the address to hand off
> to the L1 D$ (within the same clock-cycle), and then is used to index
> the array.
>
> Actual hit/miss checking being done during the following clock cycle.
> >>
> >>
> >> The reason for limiting displacements to 33 bits is not about how to get
> >> the displacement value into the AGU, but rather the latency of the carry
> >> propagation within the AGU (which is then "immediately" handed off to
> >> both the L1 D$ and branch-handling logic).
> > <
> > I have been at this since 1972, adder circuit design.
> Granted, I haven't existed this long...
> >>
> >> A bigger displacement would likely need to give a full clock cycle for
> >> the AGU to do its thing.
> > <
> > Nope, no more than 2-gates of delay,
> >>
> >>
> >> However, if the calculation is done externally via the ALU when larger
> >> displacements are needed, the ALU can be given a full clock cycle to do
> >> its work (since these operations already have a 2-cycle latency).
> > <
> > The ALU is burdened by having to be able to subtract, whereas AGEN is not.
> Granted.
>
> But, the ALU also has more time given to it.
> It has the full EX1 cycle...
> (Mostly) we don't use its outputs until the cycle has finished.
> So, it produces a result, which we can "see" in EX2.
>
>
> The AGU produces a result in EX1, and the result is also consumed in EX1.
>
> Though, things like Branch initiation don't actually take hold until EX2
> (partly to reduce timing stress on the AGU). Though, the LEA instruction
> does produce a result in EX1 (so the AGU's output also goes into all the
> register-forwarding logic, etc).
> >>
> >> Similar goes for the 33-bit displacement limit on relative branches, etc...
> >>
> > At this point it is pretty clear you don't know what you are talking about
> > to better than hand waving accuracy. Maybe (MAYBE) in FPGA what
> > you say is true, but it is not in full custom VLSI.
> >
> I have spent enough time fiddling with getting stuff to pass timing in
> Vivado that I think I have an idea how it all behaves.
>
>
> Carry select makes it faster, but the benefits are mixed (and still
> works best when the adder is given a full clock-cycle just to itself).
<
If you do this you cannot do back to back dependent adds.
<
Most designs place the loop around
a) add
b) result bus drive
c) operand multiplexer (a.k.a. forwarding)
so that back to back adds work.
I used 3 line items because I have worked on designs that placed the
principal clock edge between {(a and b) (b and c) (c and a)} and I
have worked on designs where the principal clock edge was rising
and falling.
<
My K9 design at AMD used 1 complete cycle for ADD, but it was an
8-gate/clock design--reading the register was 3 clocks !!! Would have
been 5 GHz in 90nm (also would have been HOT).
>
> Carry select seems to help, but many of these adders are already using a
> carry-select design (if not using a carry select design, it doesn't
> really work at all; directly expressing A+B in Verilog tending to result
> in a chain of CARRY4's that is however long as the input values were).
>
Which is why GPUs are down near 1 GHz instead of up at 5 GHz.
>
> But, yeah, fiddling with timing has also resulted in stuff like the
> branch-predictor only internally operating on the low 24 bits of the
> address, and then falling back to a full branch if it would cross into
> another 16MB window (with the Branch ops routed through the AGU in EX1).
<
It is a predictor, it does not have to use all the bits--just enough to do a
good job.

Scott Lurndal <scott@slp53.sl.home> wrote:

> I did recently try compiling the version 6 Unix C compiler
> with gcc, wouldn't compile, even in most lenient mode.

You would probably have to do a reverse bootstrapping - use an old
Linux with gcc 4.something to compile gcc in the 3.* timeframe,
use that for gcc 2.* and that for gcc 1.*; that could then be made
to work. (Some more intermediate steps might be required).

Chances are you have better things to do with your time, though :-)

Thomas Koenig <tkoenig@netcologne.de> writes:
>Scott Lurndal <scott@slp53.sl.home> wrote:
>
>> I did recently try compiling the version 6 Unix C compiler
>> with gcc, wouldn't compile, even in most lenient mode.
>
>You would probably have to do a reverse bootstrapping - use an old
>Linux with gcc 4.something to compile gcc in the 3.* timeframe,
>use that for gcc 2.* and that for gcc 1.*; that could then be made
>to work. (Some more intermediate steps might be required).
>
>Chances are you have better things to do with your time, though :-)

The biggest problem is that MOS names are global symbols and can
be used with any pointer. I don't believe even early gcc supported
that misfeature, which is heavily used by the v6 compiler.

Re: How much space did the 68000 registers take up?

<u8q3rb$3pivn$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33223&group=comp.arch#33223

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: How much space did the 68000 registers take up?
Date: Thu, 13 Jul 2023 19:11:09 -0500
Organization: A noiseless patient Spider
Lines: 394
Message-ID: <u8q3rb$3pivn$1@dont-email.me>
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<u8fe16$292j5$1@dont-email.me>
<5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me>
<6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me>
<bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me>
<bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me>
<b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com>
<u8lf80$hh9l$1@newsreader4.netcologne.de> <u8n9n3$3d8jv$1@dont-email.me>
<7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me>
<07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
<u8nuvi$3ik9c$1@dont-email.me>
<090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
<u8p9rh$3mr6p$1@dont-email.me>
<c05e5b3a-6cd8-4af3-8a57-d91e72b5bdc9n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 14 Jul 2023 00:11:24 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c78e4dbbbde6992bf35f293de486ac2e";
logging-data="3984375"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19XDmB0hSLlgSDDmaO7AeWz"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:0oh5LMlu+e0Ye7Ykt1uBUnbs1sY=
In-Reply-To: <c05e5b3a-6cd8-4af3-8a57-d91e72b5bdc9n@googlegroups.com>
Content-Language: en-US

by: BGB - Fri, 14 Jul 2023 00:11 UTC

On 7/13/2023 12:30 PM, MitchAlsup wrote:
> On Thursday, July 13, 2023 at 11:47:50 AM UTC-5, BGB wrote:
>> On 7/13/2023 10:52 AM, MitchAlsup wrote:
>>> On Wednesday, July 12, 2023 at 11:36:06 PM UTC-5, BGB wrote:
>>>> On 7/12/2023 8:52 PM, MitchAlsup wrote:
>>>>> On Wednesday, July 12, 2023 at 7:53:24 PM UTC-5, BGB wrote:
>>>>>> On 7/12/2023 5:53 PM, MitchAlsup wrote:
>>>>>>> On Wednesday, July 12, 2023 at 5:33:12 PM UTC-5, BGB wrote:
>>>>>
>>>>>>> For 99% of programs, yes; but what about the truly large stadium
>>>>>>> sized machines with 16,000 processors, 32EB of DRAM, running
>>>>>>> shared database applications in cache coherent memory over CXL
>>>>>>> extension to PCIe ??
>>>>>>>
>>>>>> Then one deals with it...
>>>>> <
>>>>> What if it just ran fine from day 1 without any hassle ?
>>>> Could be done, wouldn't be free.
>>>>>>
>>>>>> But, which is worse:
>>>>>> Being limited to Disp33s in the default case, paying a few extra cycles
>>>>>> as needed to make it bigger;
>>>>>> Adding an extra 2 clock cycles or so of latency to every memory access.
>>>>> <
>>>>> What makes you think 64-bit constants add any cycles to any memory
>>>>> references ??
>>>> If the AGU needed to calculate using a full 96 bit adder, this is going
>>>> to have a higher latency cost than, say, adding 48 bit value with a
>>>> 36-bit shifted displacement with the upper 48 bits being copied over
>>>> unchanged.
>>> <
>>> In full custom VLSI, a 64-bit adder is 11-actual gates of delay that generally
>>> get pounded down (i.e., circuit designed) to 8-gates of actual delay. A 3-input
>>> version is 1 gate delay longer. None of these gates are wider than fan-in 4
>>> NAND gates or fan-in 3 NOR gates or fan-in 3-X[N]OR gates.
>>> <
>> I don't have "gates of delay" here, rather "logic delay" and "net
>> delay", both measured in nanoseconds. The "net delay" tends to be the
>> larger of the two (typically by a factor of 2x or 3x).
>>
>> Time seems to be however long it takes to get through a few CARRY4
>> units, and associated MUXF units, ...
>>
>> Then, there is "worst negative slack", where "all is good" so long as
>> this number remains positive (eg, it did not fail timing).
>>
>>
>> There is also "fanout", which if this number gets large (say, 1000+),
>> the "net delay" also typically gets bigger.
>>
>> When timing fails, often it is with paths that seem to zigzag across
>> much of the FPGA.
>>> Carry is set up into 16-bit chunks (for fan-out and wire delay reasons).
>>> Carry[15] is routed to carry[31] and then back to output select[15..31]
>>> so the carries move 16-bits every gate of carry chain delay and fan-out
>>> is 1 and fan-in is 1 on this wire.
>>> <
>>> In integer form that 11-gates includes the XOR used to ADD and SUB
>>> In AGEN form that 11-gates includes the shift {0,1,2,3} for indexing.
>>> These are Carry Select Adders.
>>> <
>> Yeah, I had also used 16-bit chunks, but more because fiddling showed
>> that 16-bit worked out best (though 12 and 20 also work OK). Needs to be
>> kept a multiple of 4 for sake of the CARRY4's doing 4+4=>4 (with
>> CIN/COUT signals).
>>> In comparison, the logic delay of an SRAM is 12-gates of delay and you still
>>> need to put flip-flops on the address-in and data-out busses.
>> The Block-RAM's operate on a clock-edge.
>> This is on the end of the EX1 clock, so:
>>
>> EX1: Calculate Address;
>> Feed index into Block-RAM's (on clock edge);
>> EX2:
>> Check for Hit/Miss
>> Extract sub-block
>> EX3:
>> Result.
>>
>>
>> For the L2 cache, there is more delay:
>> Request arrives on bus.
>> -- edge
>> Feed into BRAM arrays;
>> -- edge
>> (idle)
>> -- edge
>> Check for hit/miss.
>> --
>> Emit response to bus.
>>
>>
>> These extra clock edges help significantly with timing.
>> But, don't want to do this for L1 cache, as this would increase latency.
>>
>>
>> The L2 cache XORs all the bits together for the index, whereas the L1
>> only uses the low-order bits as a modulo index.
>>>>
>>>>
>>>> Say:
>>>> (35: 0): Add Base with Disp shifted 0..3 bits.
>>>> (47:36): either val+1, val-1, or val (copy unchanged).
>>>> (95:48): Copy unchanged.
>>>>
>>>> But, say, Disp48 could be supported later or as an optional extension.
>>> <
>>> How wide the operands are is determined WAY before the AGEN adder,
>>> AGEN only sees 64-bit inputs, and only performs 3-input 64-bit additions.
>>> All the sizing is performed in Decode and Forwarding. Once something
>>> gets on the operand busses it is 64-bits in size. Oh, and by the way,
>>> selecting between 16-bits, 32-bits, and 64-bits takes less than ½ the
>>> gates of delay that forwarding on a 1-wide machine takes.
>> I was handling AGU in EX1, it gets register inputs in just the same way
>> as the ALU or the various other units (namely, the values of the Rs and
>> Rt register ports, some control bits for the scale and other things, ...).
>>
>> It is just sort of shoe-horned in with forming the address to hand off
>> to the L1 D$ (within the same clock-cycle), and then is used to index
>> the array.
>>
>> Actual hit/miss checking being done during the following clock cycle.
>>>>
>>>>
>>>> The reason for limiting displacements to 33 bits is not about how to get
>>>> the displacement value into the AGU, but rather the latency of the carry
>>>> propagation within the AGU (which is then "immediately" handed off to
>>>> both the L1 D$ and branch-handling logic).
>>> <
>>> I have been at this since 1972, adder circuit design.
>> Granted, I haven't existed this long...
>>>>
>>>> A bigger displacement would likely need to give a full clock cycle for
>>>> the AGU to do its thing.
>>> <
>>> Nope, no more than 2-gates of delay,
>>>>
>>>>
>>>> However, if the calculation is done externally via the ALU when larger
>>>> displacements are needed, the ALU can be given a full clock cycle to do
>>>> its work (since these operations already have a 2-cycle latency).
>>> <
>>> The ALU is burdened by having to be able to subtract, whereas AGEN is not.
>> Granted.
>>
>> But, the ALU also has more time given to it.
>> It has the full EX1 cycle...
>> (Mostly) we don't use its outputs until the cycle has finished.
>> So, it produces a result, which we can "see" in EX2.
>>
>>
>> The AGU produces a result in EX1, and the result is also consumed in EX1.
>>
>> Though, things like Branch initiation don't actually take hold until EX2
>> (partly to reduce timing stress on the AGU). Though, the LEA instruction
>> does produce a result in EX1 (so the AGU's output also goes into all the
>> register-forwarding logic, etc).
>>>>
>>>> Similar goes for the 33-bit displacement limit on relative branches, etc...
>>>>
>>> At this point it is pretty clear you don't know what you are talking about
>>> to better than hand waving accuracy. Maybe (MAYBE) in FPGA what
>>> you say is true, but it is not in full custom VLSI.
>>>
>> I have spent enough time fiddling with getting stuff to pass timing in
>> Vivado that I think I have an idea how it all behaves.
>>
>>
>> Carry select makes it faster, but the benefits are mixed (and still
>> works best when the adder is given a full clock-cycle just to itself).
> <
> If you do this you cannot do back to back dependent adds.
> <

Granted, at least not without penalty cycles.
There will be penalty cycles in my case.

> Most designs place the loop around
> a) add
> b) result bus drive
> c) operand multiplexer (a.k.a. forwarding)
> so that back to back adds work.
> I used 3 line items because I have worked in designs that placed the
> principal clock edge between {(a and b) (b and c) (c and a)} and I
> have worked on designs where the principal clock edge was rising
> and falling.
> <
> My K9 design at AMD used 1 complete cycle for ADD, but it was a
> 8-gate/clock design--reading the register was 3 clocks !!! Would have
> been 5 GHz in 90nm (also would have been HOT).

Re: How much space did the 68000 registers take up?

<806ce3e6-0a9f-4104-af9e-06d87f322b1fn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33224&group=comp.arch#33224

X-Received: by 2002:ac8:5bd6:0:b0:403:c99a:600d with SMTP id b22-20020ac85bd6000000b00403c99a600dmr9207qtb.7.1689296080411;
Thu, 13 Jul 2023 17:54:40 -0700 (PDT)
X-Received: by 2002:a05:6870:5aa7:b0:1b0:1225:ffc0 with SMTP id
dt39-20020a0568705aa700b001b01225ffc0mr3117981oab.2.1689296080232; Thu, 13
Jul 2023 17:54:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 13 Jul 2023 17:54:39 -0700 (PDT)
In-Reply-To: <u8q3rb$3pivn$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8cea:3802:b236:969c;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8cea:3802:b236:969c
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<u8fe16$292j5$1@dont-email.me> <5ea12dcd-a5c6-4b74-a580-c0d871694dbdn@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me> <6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me> <bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me> <bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me> <b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com> <u8lf80$hh9l$1@newsreader4.netcologne.de>
<u8n9n3$3d8jv$1@dont-email.me> <7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me> <07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
<u8nuvi$3ik9c$1@dont-email.me> <090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
<u8p9rh$3mr6p$1@dont-email.me> <c05e5b3a-6cd8-4af3-8a57-d91e72b5bdc9n@googlegroups.com>
<u8q3rb$3pivn$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <806ce3e6-0a9f-4104-af9e-06d87f322b1fn@googlegroups.com>
Subject: Re: How much space did the 68000 registers take up?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 14 Jul 2023 00:54:40 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5153

by: MitchAlsup - Fri, 14 Jul 2023 00:54 UTC

On Thursday, July 13, 2023 at 7:11:29 PM UTC-5, BGB wrote:
> On 7/13/2023 12:30 PM, MitchAlsup wrote:
>
> > If you do this you cannot do back to back dependent adds.
> > <
> Granted, at least not without penalty cycles.
> There will be penalty cycles in my case.
> > Most designs place the loop around
> > a) add
> > b) result bus drive
> > c) operand multiplexer (a.k.a. forwarding)
> > so that back to back adds work.
> > I used 3 line items because I have worked in designs that placed the
> > principal clock edge between {(a and b) (b and c) (c and a)} and I
> > have worked on designs where the principal clock edge was rising
> > and falling.
> > <
> > My K9 design at AMD used 1 complete cycle for ADD, but it was a
> > 8-gate/clock design--reading the register was 3 clocks !!! Would have
> > been 5 GHz in 90nm (also would have been HOT).
> As noted, the ADD/SUB/... instructions have a 2 cycle latency...
>
> So:
> ADD R4, 1, R4 //1c
> ADD R5, 1, R5 //1c
> But:
> ADD R4, 1, R4 //2c
> ADD R4, 1, R4 //1c
>
LDA R4,[R4<<2,1]
>
<snip>
> >>
> > Which is why GPUs are down near 1 GHz instead of up at 5 GHz.
> I am just here trying to keep everything happy at 50 MHz.
<
Different ball game.
>
> Everything seemingly goes into a big amorphous blob, and the relative
> size of this blob factors into how quickly it can go.
>
> It is faster if it is smaller, but then if it is smaller, one could use
> a smaller FPGA.
>
>
> Apparently, some people do the "floor planning" thing, though in my
> case, the layout in the FPGA is "whatever Vivado happens to emit" (and
> it usually seems to cluster related things together on its own).
<
Floor planning and clock engineering is the difference between 3 GHz
operation and 5 GHz operation.
> >>
> >> But, yeah, fiddling with timing has also resulted in stuff like the
> >> branch-predictor only internally operating on the low 24 bits of the
> >> address, and then falling back to a full branch if it would cross into
> >> another 16MB window (with the Branch ops routed through the AGU in EX1).
> > <
> > It is a predictor, it does not have to use all the bits--just enough to do a
> > good job.
> Yeah.
>
> All that happens here is that one might get a slowdown if they span a
> program's ".text" section across a 16MB boundary.
>
> It also ignores all the big branches...
>
> So, say, it cares about:
> The 32-bit Disp20 branches;
> The 16-bit Disp8 branches;
> The 32-bit Disp8 branches (Compare+Branch and similar);
> RTS and "JMP R1".
>
It would probably have good performance if it just used specified bits
from somewhere in the instruction that is pertinent to branching.
>
>

Re: How much space did the 68000 registers take up?

<u8qkm2$3ukc6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33225&group=comp.arch#33225

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: How much space did the 68000 registers take up?
Date: Thu, 13 Jul 2023 23:58:27 -0500
Organization: A noiseless patient Spider
Lines: 167
Message-ID: <u8qkm2$3ukc6$1@dont-email.me>
References: <27f3682e-a7cd-44e7-98bb-67b603b6b03en@googlegroups.com>
<u8fmeq$29ukm$1@dont-email.me>
<6beed6fc-0432-4f55-9f38-36a57670c56en@googlegroups.com>
<u8hb2p$2iql4$1@dont-email.me>
<bf88abfc-d0dc-4c97-8717-65d21328bb71n@googlegroups.com>
<u8k7a1$3002q$1@dont-email.me>
<bc7611d8-ba5d-488b-960d-8c539b9cf96cn@googlegroups.com>
<u8kl05$31cjp$1@dont-email.me>
<b0167266-782f-4468-a552-a85e3a5da427n@googlegroups.com>
<4c89ac71-75db-4688-a5b2-e89bf57a1fbdn@googlegroups.com>
<u8lf80$hh9l$1@newsreader4.netcologne.de> <u8n9n3$3d8jv$1@dont-email.me>
<7aae0ec6-7a81-471c-97c2-c4087fd46a33n@googlegroups.com>
<u8nhu0$3du57$1@dont-email.me>
<07dd9312-a6d7-4666-978b-c4e0a0852c67n@googlegroups.com>
<u8nuvi$3ik9c$1@dont-email.me>
<090e577f-e465-4c7a-80a2-c7cc90b54579n@googlegroups.com>
<u8p9rh$3mr6p$1@dont-email.me>
<c05e5b3a-6cd8-4af3-8a57-d91e72b5bdc9n@googlegroups.com>
<u8q3rb$3pivn$1@dont-email.me>
<806ce3e6-0a9f-4104-af9e-06d87f322b1fn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 14 Jul 2023 04:58:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c78e4dbbbde6992bf35f293de486ac2e";
logging-data="4149638"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/2cOOcWqaDyWhTJtzPBBcl"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:PAuoE4TaYqZHLQd394NiRS6lUo0=
In-Reply-To: <806ce3e6-0a9f-4104-af9e-06d87f322b1fn@googlegroups.com>
Content-Language: en-US

by: BGB - Fri, 14 Jul 2023 04:58 UTC

On 7/13/2023 7:54 PM, MitchAlsup wrote:
> On Thursday, July 13, 2023 at 7:11:29 PM UTC-5, BGB wrote:
>> On 7/13/2023 12:30 PM, MitchAlsup wrote:
>>
>>> If you do this you cannot do back to back dependent adds.
>>> <
>> Granted, at least not without penalty cycles.
>> There will be penalty cycles in my case.
>>> Most designs place the loop around
>>> a) add
>>> b) result bus drive
>>> c) operand multiplexer (a.k.a. forwarding)
>>> so that back to back adds work.
>>> I used 3 line items because I have worked in designs that placed the
>>> principal clock edge between {(a and b) (b and c) (c and a)} and I
>>> have worked on designs where the principal clock edge was rising
>>> and falling.
>>> <
>>> My K9 design at AMD used 1 complete cycle for ADD, but it was a
>>> 8-gate/clock design--reading the register was 3 clocks !!! Would have
>>> been 5 GHz in 90nm (also would have been HOT).
>> As noted, the ADD/SUB/... instructions have a 2 cycle latency...
>>
>> So:
>> ADD R4, 1, R4 //1c
>> ADD R5, 1, R5 //1c
>> But:
>> ADD R4, 1, R4 //2c
>> ADD R4, 1, R4 //1c
>>
> LDA R4,[R4<<2,1]
>>
> <snip>

Point was not to show efficient code here...
Rather to point out that, yeah, back-to-back ADD's and similar will have
a penalty.

This is mildly annoying, but doesn't seem to be a particularly
significant contributor to stall cycles, so sorta passable...
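
To make the penalty concrete, here is a minimal Python sketch of an in-order, single-issue pipeline in which an ALU result only becomes usable two cycles after issue. The function name `count_cycles` and the tuple encoding of instructions are made up for illustration; this is a toy model under those assumptions, not a description of the actual core.

```python
def count_cycles(instrs, latency=2):
    """Return the number of issue cycles for an in-order, single-issue stream.

    instrs:  list of (dest_reg, [source_regs]) tuples (hypothetical encoding).
    latency: cycles from issue until the result can feed a dependent op.
    """
    ready = {}   # register name -> first cycle its new value is available
    cycle = 0    # next free issue slot
    for dest, srcs in instrs:
        # Stall until every source operand is available.
        issue = max([cycle] + [ready.get(s, 0) for s in srcs])
        ready[dest] = issue + latency
        cycle = issue + 1
    return cycle


# The two sequences from the quoted example: ADD R4,1,R4 followed by
# either ADD R5,1,R5 (independent) or ADD R4,1,R4 (dependent).
independent = [("R4", ["R4"]), ("R5", ["R5"])]
dependent = [("R4", ["R4"]), ("R4", ["R4"])]
print(count_cycles(independent), count_cycles(dependent))
```

Under these assumptions the independent pair takes 2 cycles and the dependent pair takes 3, which lines up with the 1c/1c versus 2c/1c annotations in the quoted example.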

>>>>
>>> Which is why GPUs are down near 1 GHz instead of up at 5 GHz.
>> I am just here trying to keep everything happy at 50 MHz.
> <
> Different ball game.
>>
>> Everything seemingly goes into a big amorphous blob, and the relative
>> size of this blob factors into how quickly it can go.
>>
>> It is faster if it is smaller, but then if it is smaller, one could use
>> a smaller FPGA.
>>
>>
>> Apparently, some people do the "floor planning" thing, though in my
>> case, the layout in the FPGA is "whatever Vivado happens to emit" (and
>> it usually seems to cluster related things together on its own).
> <
> Floor planning and clock engineering is the difference between 3 GHz
> operation and 5 GHz operation.

OK.

I am more just running stuff with the "Vivado Synthesis Defaults"
preset, and not using much "extra" beyond setting up timing constraints.

Pretty much everything runs on a 50MHz master clock, with most lower
speed clocks being faked using adders...
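
One common way to "fake" a slower clock with an adder is a phase accumulator: every master-clock cycle a fixed increment is added, and the carry out of the adder serves as a one-cycle enable for the slower logic. Whether the design discussed here does exactly this is an assumption; the sketch below only models the idea in Python, and `enable_ticks` and its parameters are hypothetical names.

```python
def enable_ticks(master_hz, target_hz, cycles, acc_bits=32):
    """Count enable pulses produced by a phase accumulator.

    Each master-clock cycle adds a fixed increment to the accumulator;
    a carry out of the top bit acts as a one-cycle clock enable.
    """
    modulus = 1 << acc_bits
    increment = round(modulus * target_hz / master_hz)
    acc = 0
    ticks = 0
    for _ in range(cycles):
        acc += increment
        if acc >= modulus:   # carry out of the adder
            acc -= modulus
            ticks += 1
    return ticks


# 5000 cycles of a 50 MHz master clock span 100 microseconds,
# so a 1 MHz enable should fire about 100 times.
print(enable_ticks(50_000_000, 1_000_000, 5000))
```

Because the slow "clock" is only an enable on the master clock, everything stays in one clock domain, which also keeps the code simulable in Verilator without vendor clock primitives.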

There are a few places where "BUFG" modules or similar could be useful,
but are not used because I want to keep the code able to be used in
Verilator (so, pretty much everything is written in the common style
accepted by both Vivado and Verilator).

I fiddled with stuff a bit more when I was originally trying to target
100 MHz, but after dropping to 50, was like "screw it" and was doing
closer to the bare minimum in terms of fiddling with synthesis and similar.

Like, if I am doing the minimum, it is more likely that others could
more easily reproduce my results, vs getting something which will
invariably fail timing.

Granted, it is possible that I could provide the Vivado constraints
files and similar, which might be helpful (vs people figuring out the
constraints to set and how to set up the IO for the dev-board).

But, these parts aren't "that" difficult to set up...

>>>>
>>>> But, yeah, fiddling with timing has also resulted in stuff like the
>>>> branch-predictor only internally operating on the low 24 bits of the
>>>> address, and then falling back to a full branch if it would cross into
>>>> another 16MB window (with the Branch ops routed through the AGU in EX1).
>>> <
>>> It is a predictor, it does not have to use all the bits--just enough to do a
>>> good job.
>> Yeah.
>>
>> All that happens here is that one might get a slowdown if they span a
>> program's ".text" section across a 16MB boundary.
>>
>> It also ignores all the big branches...
>>
>> So, say, it cares about:
>> The 32-bit Disp20 branches;
>> The 16-bit Disp8 branches;
>> The 32-bit Disp8 branches (Compare+Branch and similar);
>> RTS and "JMP R1".
>>
> It would probably have good performance if it just used specified bits
> from somewhere in the instruction that is pertinent to branching.

It is pattern matching the whole instruction in these cases and then
setting flags for which type of calculation (if any) is applicable.
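
As a rough illustration of the 24-bit windowing described in the quoted text above, the following Python sketch adds the displacement to the low 24 bits of the PC only and reports a fallback whenever the sum leaves the current 16MB window. The names `predict_target` and `WINDOW_BITS` are hypothetical, and the real predictor logic may well differ.

```python
WINDOW_BITS = 24
WINDOW_MASK = (1 << WINDOW_BITS) - 1


def predict_target(pc, disp):
    """Return (target, fast) for a PC-relative branch.

    Only the low 24 bits of pc take part in the addition. If the sum
    leaves the current 16MB window, fast is False and target is None,
    meaning the branch would take the slower full-width path instead.
    """
    low = (pc & WINDOW_MASK) + disp
    if 0 <= low <= WINDOW_MASK:
        # High bits are copied from the PC unchanged.
        return (pc & ~WINDOW_MASK) | low, True
    return None, False


print(predict_target(0x12345678, 0x100))   # stays inside the window
print(predict_target(0x12FFFFF0, 0x20))    # crosses into the next window
```

The narrow adder is the point of the restriction: the predictor never has to propagate a carry beyond bit 23.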

Otherwise:
Seems that the ALUPTR extension was particularly bad for timing.

After switching to 2-cycle LEA, it seems that full 48-bit AGU is no
longer as much of an issue (it seems likely the "hot path" may have been
into the register-forwarding logic).
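
For reference, the split address calculation quoted earlier in the thread (bits 35:0 added, bits 47:36 incremented, decremented, or copied, bits 95:48 copied unchanged) can be sketched in Python as below. This is a behavioural model only, with hypothetical names, and it assumes a signed 33-bit displacement and a scale of 0 to 3 as stated in that description.

```python
LOW_BITS = 36
LOW_MASK = (1 << LOW_BITS) - 1
MID_MASK = (1 << 12) - 1        # bits 47:36
ADDR_MASK = (1 << 48) - 1


def split_agu(base, disp, scale):
    """base: 96-bit value; disp: signed 33-bit displacement; scale: 0..3."""
    offset = disp << scale                   # signed, fits in 36 bits
    low = (base & LOW_MASK) + offset         # the only real addition
    mid = (base >> LOW_BITS) & MID_MASK
    if low > LOW_MASK:                       # carry out of bit 35
        mid += 1
    elif low < 0:                            # borrow out of bit 35
        mid -= 1
    high = base >> 48 << 48                  # bits 95:48 copied unchanged
    return high | ((mid & MID_MASK) << LOW_BITS) | (low & LOW_MASK)


base = 0x0000_0FFF_FFFF_FFFF
full = (base + (1 << 3)) & ADDR_MASK         # reference: full 48-bit add
print(hex(split_agu(base, 1, 3)), hex(full))
```

The result matches a full 48-bit addition on the low bits, but the carry chain that must settle within the cycle is only 36 bits long, with a 12-bit increment or decrement selected afterwards.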

Decided to go and redefine one of the "mostly unused" virtual addressing
modes ("Quadrant Add Mode") as behaving more like a 64-bit flat address
mode (effectively disabling the tag bits).

The mapping scheme for the high-order bits will be a little wonky in
this mode. Also this mode would still be limited to 48 bit addresses for
executable code and similar (code, stack, and data/bss would still need
to be within the lower 48-bit range).

This scheme would effectively replace a former mode that had a 60-bit
addressing mode (but slightly more convoluted address rules).

Still debating whether to define instructions to allow trying to fake a
96-bit linear address mode.

As-is, one wouldn't be able to have arrays larger than 256TB, but this
isn't exactly much of a loss...

Ended up deciding to go with a more straightforward mapping after noting
that this mode would most make sense for userland, which "probably"
isn't going to be located in "Quadrant 0".

Though, this does mean, if GBH==0, then:
0000800000000000..0000FFFFFFFFFFFF
Is going to contain special address bypass ranges and MMIO and similar.

If we set up a userland process where, say, PCH=GBH=0x0000000000010000
or similar, then the MMIO/etc range will not exist (note that if
PCH!=GBH, it would behave more like a Harvard architecture).

....

Thomas Koenig <tkoenig@netcologne.de> writes:
>You would probably have to do a reverse bootstrapping - use an old
>Linux with gcc 4.something to compile gcc in the 3.* timeframe,
>use that for gcc 2.* and that for gcc 1.*; that could then be made
>to work. (Some more intermediate steps might be required).

In 2015 I built gcc-2.7 (from 1997 or so) on a then-current system
(maybe using gcc-4.8). It was not easy, but I did not need several
steps. IIRC the major problem was that I was on an AMD64 system and
gcc-2.7 does not support that architecture; compiling, assembling, and
linking IA-32 stuff on the AMD64 machine took special steps.

And running gcc-2.7 then also was a problem; IIRC I used gcc-2.7 -S to
produce assembly-language code, and then manually started the
assembling and linking, because the gcc-2.7 compiler driver does not
know how to properly call the assembler and linker of the 2015 system.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Scott Lurndal <scott@slp53.sl.home> schrieb:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>>Scott Lurndal <scott@slp53.sl.home> schrieb:
>>
>>> I did recently try compiling the version 6 Unix C compiler
>>> with gcc, wouldn't compile, even in most lenient mode.
>>
>>You would probably have to do a reverse bootstrapping - use an old
>>Linux with gcc 4.something to compile gcc in the 3.* timeframe,
>>use that for gcc 2.* and that for gcc 1.*; that could then be made
>>to work. (Some more intermediate steps might be required).
>>
>>Chances are you have better things to do with your time, though :-)
>
> The biggest problem is that MOS names are global symbols and can
> be used with any pointer. I don't believe even early gcc supported
> that misfeature, which is heavily used by the v6 compiler.

I've just looked at it, and gcc was first released end of 1987. By
that time the v6 compiler was no longer in general use.
Quoting Dennis Ritchie:

"At the start of the decade, nearly every compiler was based on
Johnson's pcc; by 1985 there were many independently-produced
compiler products."

So, this is not surprising.

One question remains: What is an MOS name? I did not find that term.

On 2023-07-15 10:37, Thomas Koenig wrote:
> Scott Lurndal <scott@slp53.sl.home> schrieb:
>> Thomas Koenig <tkoenig@netcologne.de> writes:
>>> Scott Lurndal <scott@slp53.sl.home> schrieb:
>>>
>>>> I did recently try compiling the version 6 Unix C compiler
>>>> with gcc, wouldn't compile, even in most lenient mode.
>>>
>>> You would probably have to do a reverse bootstrapping - use an old
>>> Linux with gcc 4.something to compile gcc in the 3.* timeframe,
>>> use that for gcc 2.* and that for gcc 1.*; that could then be made
>>> to work. (Some more intermediate steps might be required).
>>>
>>> Chances are you have better things to do with your time, though :-)
>>
>> The biggest problem is that MOS names are global symbols and can
>> be used with any pointer. I don't believe even early gcc supported
>> that misfeature, which is heavily used by the v6 compiler.
>
> I've just looked at it, and gcc was first released end of 1987. By
> that time the v6 compiler was no longer in general use.
> Quoting Dennis Ritchie:
>
> "At the start of the decade, nearly every compiler was based on
> Johnson's pcc; by 1985 there were many independently-produced
> compiler products."
>
> So, this is not surprising.
>
> One question remains: What is an MOS name? I did not find that term.

Not sure, but it would make sense if MOS = Member Of Struct.

IIRC, in early C compilers the name of a struct member represented the
offset of the member's value in the struct value, and that offset could
be applied to any pointer, whatever the nominal type of the pointee.
More or less as if any pointer could be understood as pointing to any
known type of struct -- a kind of "implicit union" of all visible structs.
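
A toy model of that behaviour, written in Python rather than C purely for illustration: all member names live in one global table that maps a name to an offset, and a member access is just "pointer plus offset" with no check of what the pointer nominally points to. The struct layouts, the 2-byte member size, and the helper names are assumptions made for the example.

```python
# One global table for every struct member name: name -> (offset, size).
MEMBER_OFFSETS = {}


def declare_struct(members):
    """Register each (name, size_in_bytes) member globally; return struct size."""
    offset = 0
    for name, size in members:
        MEMBER_OFFSETS[name] = (offset, size)
        offset += size
    return offset


def load_member(memory, pointer, name):
    """Model of p->name: read at pointer + offset, ignoring the pointee type."""
    offset, size = MEMBER_OFFSETS[name]
    return int.from_bytes(memory[pointer + offset:pointer + offset + size], "little")


declare_struct([("x", 2), ("y", 2)])          # like struct point { int x, y; }
declare_struct([("flags", 2), ("count", 2)])  # like struct buf { int flags, count; }

memory = bytearray(16)
memory[10:12] = (7).to_bytes(2, "little")     # a "buf" at address 8 with count = 7

# Using a member name from the other struct still works: "y" is just offset 2.
print(load_member(memory, 8, "y"))
```

In this model the access through `y` reads the same bytes as `count`, which is the "implicit union" effect described above.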

Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>On 2023-07-15 10:37, Thomas Koenig wrote:
>> Scott Lurndal <scott@slp53.sl.home> schrieb:
>>> Thomas Koenig <tkoenig@netcologne.de> writes:
>>>> Scott Lurndal <scott@slp53.sl.home> schrieb:
>>>>
>>>>> I did recently try compiling the version 6 Unix C compiler
>>>>> with gcc, wouldn't compile, even in most lenient mode.
>>>>
>>>> You would probably have to do a reverse bootstrapping - use an old
>>>> Linux with gcc 4.something to compile gcc in the 3.* timeframe,
>>>> use that for gcc 2.* and that for gcc 1.*; that could then be made
>>>> to work. (Some more intermediate steps might be required).
>>>>
>>>> Chances are you have better things to do with your time, though :-)
>>>
>>> The biggest problem is that MOS names are global symbols and can
>>> be used with any pointer. I don't believe even early gcc supported
>>> that misfeature, which is heavily used by the v6 compiler.
>>
>> I've just looked at it, and gcc was first released end of 1987. By
>> that time the v6 compiler was no longer in general use.
>> Quoting Dennis Ritchie:
>>
>> "At the start of the decade, nearly every compiler was based on
>> Johnson's pcc; by 1985 there were many independently-produced
>> compiler products."
>>
>> So, this is not surprising.
>>
>> One question remains: What is an MOS name? I did not find that term.
>
>
>Not sure, but it would make sense if MOS = Member Of Struct.
>
>IIRC, in early C compilers the name of a struct member represented the
>offset of the member's value in the struct value, and that offset could
>be applied to any pointer, whatever the nominal type of the pointee.
>More or less as if any pointer could be understood as pointing to any
>known type of struct -- a kind of "implicit union" of all visible structs.
>

Correct.

"The Computer made me do it."
