Rocksolid Light

devel / comp.arch / Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

Subject (Author)
* What did it cost the 8086 to support unaligned access? (Russell Wallace)
+* Re: What did it cost the 8086 to support unaligned access? (John Levine)
|+* Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
||+- Re: What did it cost the 8086 to support unaligned access? (John Levine)
||+- Re: What did it cost the 8086 to support unaligned access? (Quadibloc)
||`* Re: What did it cost the 8086 to support unaligned access? (Thomas Koenig)
|| +* Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
|| |`* Re: What did it cost the 8086 to support unaligned access? (Thomas Koenig)
|| | `* Re: misaligned Fortran, What did it cost the 8086 to support unaligned access? (John Levine)
|| |  +* Re: misaligned Fortran, What did it cost the 8086 to support (Thomas Koenig)
|| |  |`* Re: misaligned Fortran, What did it cost the 8086 to support (John Levine)
|| |  | `- Re: misaligned Fortran, What did it cost the 8086 to support (MitchAlsup)
|| |  `* Re: misaligned Fortran, What did it cost the 8086 to support (Thomas Koenig)
|| |   +- Re: misaligned Fortran, What did it cost the 8086 to support (John Levine)
|| |   `- Re: misaligned Fortran, What did it cost the 8086 to support (MitchAlsup)
|| +- Re: old Fortran, What did it cost the 8086 to support unaligned access? (John Levine)
|| `- Re: What did it cost the 8086 to support unaligned access? (Anton Ertl)
|+* Re: What did it cost the 8086 to support unaligned access? (Thomas Koenig)
||+- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
||+* Re: What did it cost the 8086 to support unaligned access? (Michael S)
|||`- Re: What did it cost the 8086 to support unaligned access? (BGB)
||`* Re: What did it cost the 8086 to support unaligned access? (EricP)
|| +* Re: What did it cost the 8086 to support unaligned access? (Quadibloc)
|| |+* Re: What did it cost the 8086 to support unaligned access? (EricP)
|| ||+* Re: What did it cost the 8086 to support unaligned access? (Quadibloc)
|| |||+- Re: What did it cost the 8086 to support unaligned access? (EricP)
|| |||`- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
|| ||`- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
|| |`- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
|| +* Re: What did it cost the 8086 to support unaligned access? (Anton Ertl)
|| |+* Re: What did it cost the 8086 to support unaligned access? (robf...@gmail.com)
|| ||`- Re: What did it cost the 8086 to support unaligned access? (BGB)
|| |+- Re: What did it cost the 8086 to support unaligned access? (Quadibloc)
|| |`* Re: What did it cost the 8086 to support unaligned access? (EricP)
|| | `- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
|| `* Re: What did it cost the 8086 to support unaligned access? (Timothy McCaffrey)
||  +- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
||  +* Re: What did it cost the 8086 to support unaligned access? (Thomas Koenig)
||  |`- Re: What did it cost the 8086 to support unaligned access? (BGB)
||  +* Re: What did it cost the 8086 to support unaligned access? (Andy Valencia)
||  |`* Re: What did it cost the 8086 to support unaligned access? (Terje Mathisen)
||  | +* Re: What did it cost the 8086 to support unaligned access? (Stephen Fuld)
||  | |+* Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
||  | ||`* Re: What did it cost the 8086 to support unaligned access? (Stephen Fuld)
||  | || `- Re: What did it cost the 8086 to support unaligned access? (Thomas Koenig)
||  | |`- Re: What did it cost the 8086 to support unaligned access? (Terje Mathisen)
||  | `- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
||  `- Re: What did it cost the 8086 to support unaligned access? (Anton Ertl)
|`* Re: What did it cost the 8086 to support unaligned access? (Michael S)
| `- Re: What did it cost the 8086 to support unaligned access? (John Levine)
+- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
+* Re: What did it cost the 8086 to support unaligned access? (Quadibloc)
|`- Re: What did it cost the 8086 to support unaligned access? (MitchAlsup)
+* Re: What did it cost the 8086 to support unaligned access? (Terje Mathisen)
|`* Re: What did it cost the 8086 to support unaligned access? (BGB)
| `* Re: What did it cost the 8086 to support unaligned access? (Terje Mathisen)
|  `* Re: What did it cost the 8086 to support unaligned access? (BGB)
|   `- Re: What did it cost the 8086 to support unaligned access? (BGB)
`- Re: What did it cost the 8086 to support unaligned access? (EricP)

Re: What did it cost the 8086 to support unaligned access?

<3044deb4-13f3-4d55-bb35-2ed39272397cn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33091&group=comp.arch#33091

X-Received: by 2002:a05:620a:3197:b0:765:28ed:22ae with SMTP id bi23-20020a05620a319700b0076528ed22aemr14705qkb.14.1688769839697;
Fri, 07 Jul 2023 15:43:59 -0700 (PDT)
X-Received: by 2002:a05:6870:b7b0:b0:1b0:3f7f:673d with SMTP id
ed48-20020a056870b7b000b001b03f7f673dmr64813oab.6.1688769839174; Fri, 07 Jul
2023 15:43:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!1.us.feeder.erje.net!feeder.erje.net!border-1.nntp.ord.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 Jul 2023 15:43:58 -0700 (PDT)
In-Reply-To: <u89oj9$1dqui$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9856:90d0:2967:a029;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9856:90d0:2967:a029
References: <u89e7b$9l3b$2@newsreader4.netcologne.de> <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org> <u89oj9$1dqui$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3044deb4-13f3-4d55-bb35-2ed39272397cn@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 07 Jul 2023 22:43:59 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 43
 by: MitchAlsup - Fri, 7 Jul 2023 22:43 UTC

On Friday, July 7, 2023 at 2:21:17 PM UTC-5, Terje Mathisen wrote:
> Andy Valencia wrote:
> > Thomas Koenig <tko...@netcologne.de> writes:
> >>> If you get a raw network packet (complete with Ethernet header)
> >>> you can get all sorts of miscellaneous bytes thrown in that screw
> >>> up alignment.
> >> That is the first (for me) really convincing use case for supporting
> >> misaligned data.
> >
> > The first version of L2TP had a header which could be quite compact. I'm
> > told that later versions of the protocol shed much of this so as to let
> > silicon do the processing with less fuss. Fiddly bits and bytes, especially
> > with optional presence, make life harder for silicon.
> Andy, I would argue that this holds even more for software!
>
> I.e. I am pretty sure Mitch would agree that HW is great for doing
> multi-way decisions, which is exactly what software finds to be the hardest.
<
The thing with HW is that it can make multi-way decisions as a set
and each bit in the set can control different HW. SW only has the
notion of a point-of-control, HW has the notion of multiple-points-
of-concurrent-control; even microcode has notions of multiple-points-
of-concurrent-control. Bits<x:y> control ALU, bits<i:j> control AGEN,...
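
To make the contrast concrete, here is a minimal C sketch of a horizontal control word. The field layout and the names (MicroFields, decode_uword) are invented for illustration and do not describe any real machine. A software model has to extract the fields one after another; in hardware each field would simply be wired to its own unit, and all of them would act in the same cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit slice of a horizontal microcode word:
   bits <3:0>   ALU operation
   bits <7:4>   AGEN (address generation) mode
   bits <9:8>   memory request: 0 none, 1 load, 2 store
   bits <15:10> next micro-address */
typedef struct {
    unsigned alu_op, agen_mode, mem_req, next_upc;
} MicroFields;

/* Extract bits <hi:lo> of w. */
static unsigned field(uint32_t w, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi - lo < 31);
    return (unsigned)((w >> lo) & ((1u << (hi - lo + 1)) - 1u));
}

/* Software pulls the fields out serially; hardware would drive the ALU,
   AGEN and memory controls from these bits simultaneously. */
static MicroFields decode_uword(uint32_t w) {
    MicroFields f;
    f.alu_op    = field(w, 3, 0);
    f.agen_mode = field(w, 7, 4);
    f.mem_req   = field(w, 9, 8);
    f.next_upc  = field(w, 15, 10);
    return f;
}
```
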
>
> It is not a coincidence that a HW h264 decoder is both faster and uses
> far less power than a software ditto.
<
If only for the fact it does not have to increment an IP and read an instruction
from SRAM.
>
> This was still true after I had shown Intel how to double the speed of
> their reference implementation, instead of implementing my ideas they
> licensed a chunk of VLSI to do the same in hardware.
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: What did it cost the 8086 to support unaligned access?

<971b7842-c64e-4375-811c-3e03a2b6f25dn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33092&group=comp.arch#33092

X-Received: by 2002:ac8:5e51:0:b0:3fd:e410:7399 with SMTP id i17-20020ac85e51000000b003fde4107399mr34947qtx.2.1688769942306;
Fri, 07 Jul 2023 15:45:42 -0700 (PDT)
X-Received: by 2002:a05:6a00:98e:b0:675:b734:d2fe with SMTP id
u14-20020a056a00098e00b00675b734d2femr8833001pfg.3.1688769941742; Fri, 07 Jul
2023 15:45:41 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 Jul 2023 15:45:41 -0700 (PDT)
In-Reply-To: <u89r8q$1e5ro$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9856:90d0:2967:a029;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9856:90d0:2967:a029
References: <u89e7b$9l3b$2@newsreader4.netcologne.de> <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org> <u89oj9$1dqui$1@dont-email.me>
<u89r8q$1e5ro$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <971b7842-c64e-4375-811c-3e03a2b6f25dn@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 07 Jul 2023 22:45:42 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Fri, 7 Jul 2023 22:45 UTC

On Friday, July 7, 2023 at 3:06:54 PM UTC-5, Stephen Fuld wrote:
> On 7/7/2023 12:21 PM, Terje Mathisen wrote:
>
> snip
> > It is not a coincidence that a HW h264 decoder is both faster and uses
> > far less power than a software ditto.
> While that is certainly true, I think it is an example of the general
> principle that, above some minimum level of complexity (and below
> another, much higher level), almost any task can be done faster with
> lower power in hardware than in software. The issue for designers is
> whether the particular task happens often enough and saves enough in a
> particular workload to justify putting it into hardware. Different
> design teams make different decisions for their needs.
<
If you need to alter the "processing" after manufacture, you need
programmability, otherwise you save by not needing programmability.
>
> For example, at one end, most designs these days put floating point into
> hardware. In the middle, some designs put things like encryption in
> hardware, others don't. And some choose an intermediate solution with
> hardware encryption assist instructions. At the other end, AFAIK, only
> IBM puts sort assist in hardware.
>
>
> --
> - Stephen Fuld
> (e-mail address disguised to prevent spam)

Re: What did it cost the 8086 to support unaligned access?

<u8arne$1lfhi$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33093&group=comp.arch#33093

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 22:20:44 -0700
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <u8arne$1lfhi$1@dont-email.me>
References: <u89e7b$9l3b$2@newsreader4.netcologne.de>
<b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org>
<u89oj9$1dqui$1@dont-email.me> <u89r8q$1e5ro$1@dont-email.me>
<971b7842-c64e-4375-811c-3e03a2b6f25dn@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 8 Jul 2023 05:20:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8013f979822ba159cfa7c801190d53ca";
logging-data="1752626"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1JdHyb/w8ZCV19WFF1VS2rB6gXM5q9T4="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:l/Br/JO8RsB7zZ4fl8nZX6ZP+LY=
Content-Language: en-US
In-Reply-To: <971b7842-c64e-4375-811c-3e03a2b6f25dn@googlegroups.com>
 by: Stephen Fuld - Sat, 8 Jul 2023 05:20 UTC

On 7/7/2023 3:45 PM, MitchAlsup wrote:
> On Friday, July 7, 2023 at 3:06:54 PM UTC-5, Stephen Fuld wrote:
>> On 7/7/2023 12:21 PM, Terje Mathisen wrote:
>>
>> snip
>>> It is not a coincidence that a HW h264 decoder is both faster and uses
>>> far less power than a software ditto.
>> While that is certainly true, I think it is an example of the general
>> principle that, above some minimum level of complexity (and below
>> another, much higher level), almost any task can be done faster with
>> lower power in hardware than in software. The issue for designers is
>> whether the particular task happens often enough and saves enough in a
>> particular workload to justify putting it into hardware. Different
>> design teams make different decisions for their needs.
> <
> If you need to alter the "processing" after manufacture, you need
> programmability, otherwise you save by not needing programmability.

I absolutely agree. The reason for software is to handle those tasks
that either were not anticipated by the hardware designers, i.e.
software is far more flexible and more easily changed, or those tasks
that are just too complex to put in hardware, e.g. no one would consider
putting a Fortran compiler in hardware :-).

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: What did it cost the 8086 to support unaligned access?

<u8ca8q$1qm24$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33094&group=comp.arch#33094

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Sat, 8 Jul 2023 20:35:06 +0200
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <u8ca8q$1qm24$1@dont-email.me>
References: <u89e7b$9l3b$2@newsreader4.netcologne.de>
<b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org>
<u89oj9$1dqui$1@dont-email.me> <u89r8q$1e5ro$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 8 Jul 2023 18:35:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="43de747709394e2d03aaee566f32fa40";
logging-data="1923140"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/IGu7qL2PRSfgSegT4nTxx3pgGdFskKs096iCOBuc+8w=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:CYvWBm4B4Vqz9Nv0pTpuyZV19xY=
In-Reply-To: <u89r8q$1e5ro$1@dont-email.me>
 by: Terje Mathisen - Sat, 8 Jul 2023 18:35 UTC

Stephen Fuld wrote:
> On 7/7/2023 12:21 PM, Terje Mathisen wrote:
>
> snip
>
>> It is not a coincidence that a HW h264 decoder is both faster and uses
>> far less power than a software ditto.
>
> While that is certainly true, I think it is an example of the general
> principle that, above some minimum level of complexity (and below
> another, much higher level), almost any task can be done faster with
> lower power in hardware than in software.  The issue for designers is
> whether the particular task happens often enough and saves enough in a
> particular workload to justify putting it into hardware.  Different
> design teams make different decisions for their needs.
>
> For example, at one end, most designs these days put floating point into
> hardware.  In the middle, some designs put things like encryption in
> hardware, others don't.  And some choose an intermediate solution with
> hardware encryption assist instructions.  At the other end, AFAIK, only
> IBM puts sort assist in hardware.

h264/CABAC is pretty much a worst possible scenario for a sw decoder:

CABAC -> Context Adaptive Binary Arithmetic Coder, i.e. a codec that
(mostly) extracts single bits from the data stream, then branches on the
bit to one of two different contexts. It is only after this branch and
loading the corresponding context that decoding of the next bit can start.
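
The serial dependency described above can be sketched with a small context-adaptive binary range coder. This is an LZMA-style coder with shift-based probability adaptation, swapped in for illustration; it is not the table-driven CABAC engine defined by H.264, and the context rule used here (the previous bit selects one of two probability models) is a hypothetical stand-in for H.264's real context selection. What it does show is the chain: bit i+1 cannot be decoded until bit i is known, because bit i picks the context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Probabilities are 11-bit estimates that the next bit is 0. */
#define PROB_BITS 11
#define PROB_INIT (1u << (PROB_BITS - 1))   /* 0.5 */
#define MOVE_BITS 5                         /* adaptation speed */
#define TOP       (1u << 24)                /* renormalise below this */

typedef struct {
    uint64_t low;        /* 32-bit low plus one carry bit */
    uint32_t range;
    uint8_t cache;       /* byte held back in case a carry arrives */
    size_t cache_size;   /* held-back byte plus pending 0xFF bytes */
    uint8_t *out;
    size_t pos;
} Enc;

typedef struct { uint32_t range, code; const uint8_t *in; size_t pos; } Dec;

/* Move the probability estimate toward the bit just coded. */
static void adapt(uint16_t *p, int bit) {
    if (bit) *p -= *p >> MOVE_BITS;
    else     *p += ((1u << PROB_BITS) - *p) >> MOVE_BITS;
}

/* Emit the top byte of low, propagating any carry into held-back bytes. */
static void shift_low(Enc *e) {
    if (e->low < 0xFF000000u || e->low > 0xFFFFFFFFu) {
        uint8_t carry = (uint8_t)(e->low >> 32);
        uint8_t temp = e->cache;
        do {
            e->out[e->pos++] = (uint8_t)(temp + carry);
            temp = 0xFF;
        } while (--e->cache_size != 0);
        e->cache = (uint8_t)(e->low >> 24);
    }
    e->cache_size++;
    e->low = (e->low & 0x00FFFFFFu) << 8;
}

static void enc_bit(Enc *e, uint16_t *p, int bit) {
    assert(bit == 0 || bit == 1);
    uint32_t bound = (e->range >> PROB_BITS) * *p;
    if (bit) { e->low += bound; e->range -= bound; }
    else     { e->range = bound; }
    adapt(p, bit);
    while (e->range < TOP) { e->range <<= 8; shift_low(e); }
}

static int dec_bit(Dec *d, uint16_t *p) {
    uint32_t bound = (d->range >> PROB_BITS) * *p;
    int bit = d->code >= bound;
    if (bit) { d->code -= bound; d->range -= bound; }
    else     { d->range = bound; }
    adapt(p, bit);
    while (d->range < TOP) {
        d->range <<= 8;
        d->code = (d->code << 8) | d->in[d->pos++];
    }
    return bit;
}

/* The previous bit selects which context codes the next one. */
static size_t encode_bits(const uint8_t *bits, size_t n, uint8_t *out) {
    Enc e = {0, 0xFFFFFFFFu, 0, 1, out, 0};
    uint16_t ctx[2] = {PROB_INIT, PROB_INIT};
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        enc_bit(&e, &ctx[prev], bits[i]);
        prev = bits[i];
    }
    for (int i = 0; i < 5; i++) shift_low(&e);   /* flush */
    return e.pos;
}

static void decode_bits(const uint8_t *in, size_t n, uint8_t *bits) {
    Dec d = {0xFFFFFFFFu, 0, in, 0};
    for (int i = 0; i < 5; i++) d.code = (d.code << 8) | d.in[d.pos++];
    uint16_t ctx[2] = {PROB_INIT, PROB_INIT};
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        bits[i] = (uint8_t)dec_bit(&d, &ctx[prev]);   /* must finish first */
        prev = bits[i];
    }
}
```

encode_bits() returns the number of bytes written, so the caller has to size the output buffer generously. Note that the decoder loop has no way to start the next dec_bit() call early: the context pointer depends on the bit it is still computing.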

With a compressed bitrate of 40 Mbit/s, expanding into a 1080p video
stream with very high fidelity, you are only eating a very small
fraction of an input bit for each output bit produced.
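
To put a number on "a very small fraction", a back-of-envelope sketch. The 30 fps frame rate and the 8-bit 4:2:0 sampling (12 bits per pixel) are assumptions; the post only gives the 1080p resolution and the 40 Mbit/s bitrate.

```c
#include <assert.h>

/* Raw (decoded) output rate in Mbit/s, assuming 8-bit samples with 4:2:0
   chroma subsampling, i.e. 1.5 bytes = 12 bits per pixel. */
static double raw_mbit_per_s(int width, int height, double fps) {
    assert(width > 0 && height > 0 && fps > 0.0);
    return (double)width * height * 12.0 * fps / 1e6;
}

/* Compressed input bits consumed per decoded output bit. */
static double input_bits_per_output_bit(double coded_mbit_per_s,
                                        int width, int height, double fps) {
    return coded_mbit_per_s / raw_mbit_per_s(width, height, fps);
}
```

Under these assumptions the decoded stream is about 746 Mbit/s, so each input bit produces roughly 19 output bits (about 0.054 input bits per output bit); at 60 fps the fraction halves again.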

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: What did it cost the 8086 to support unaligned access?

<u8clae$1rtka$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33095&group=comp.arch#33095

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Sat, 8 Jul 2023 16:43:35 -0500
Organization: A noiseless patient Spider
Lines: 249
Message-ID: <u8clae$1rtka$1@dont-email.me>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u86pdf$vu1u$1@dont-email.me> <u86sra$1070j$1@dont-email.me>
<u86t8u$10b8l$1@dont-email.me> <u87onq$130mj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 8 Jul 2023 21:43:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="c2dac11af96acc6744771f211e668e8a";
logging-data="1963658"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19nTgZ+BSP/BlPiTv+GVwgw"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:FkXA1whwH8/oU+fHwJvOrBoFTZ8=
Content-Language: en-US
In-Reply-To: <u87onq$130mj$1@dont-email.me>
 by: BGB - Sat, 8 Jul 2023 21:43 UTC

On 7/6/2023 8:11 PM, BGB wrote:
> On 7/6/2023 12:22 PM, Terje Mathisen wrote:
>> BGB wrote:
>>> On 7/6/2023 11:16 AM, Terje Mathisen wrote:
>>>> Russell Wallace wrote:
>>>>> The Intel 8086 supported unaligned loads and stores of 16-bit data,
>>>>> e.g. mov ax, foo was guaranteed to work even if foo was odd.
>>>>>
>>>>> What did this cost, in terms of performance and chip area, compared
>>>>> to an alternative architecture that would have been the same except
>>>>> for unaligned access being a trap or undefined behavior?
>>>>>
>>>>> To be clear, I'm not talking about the dynamic behavior of code. On
>>>>> the actual 8086, access was still faster if the pointer did happen to
>>>>> be even. I'm asking, suppose all your pointers for word access were
>>>>> actually even, how much bigger and slower was the chip made by having
>>>>> to support the possibility that some of them could have been odd?
>>>>
>>>> I would suggest that since they already knew that they would make an
>>>> 8-bit bus version (the 8088 which ended up in the IBM PC), the
>>>> control circuits already knew how to combine two 8-bit accesses into
>>>> a 16-bit load. In the '86 an aligned 16-bit load would run a single
>>>> bus cycle (taking 4 clock cycles), while the same operation on the
>>>> '88 took twice as long. Unless the '86 could do unaligned accesses in
>>>> less than 8 cycles, I would guess the mechanism was the same!
>>>>
>>>
>>> This makes it seem like the 8086/8088 would have been painfully slow?...
>>
>> Oh, grasshopper, if only you knew! :-)
>>
>> It was in fact painfully slow. OTOH, it was possible to directly
>> calculate (with very high precision) how long any given code would
>> take since you could simply add together all code and data bytes read
>> or written and multiply by 4. It was only when you ran very slow ops,
>> like MUL/DIV or floating point that this could break down. For my own
>> code I assumed my 4.77 MHz cpu could handle 1 M bytes/second, most
>> fast code would run at maybe 250-300 K instructions/second.
>
> Hmm...
>
> Yeah, that seems kinda slow...
>
>
>
> My project seems to be averaging closer to 40-60 million instructions
> per second...
>
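
The bus behaviour described in the quoted text can be modelled in a few lines of C. This is a toy model under stated assumptions: 4 clocks per bus cycle as quoted, a 16-bit bus on the 8086 and an 8-bit bus on the 8088, and an odd-address word on the 8086 split into two byte cycles (the same combining step the 8088 performs on every word). Wait states and the prefetch queue are ignored, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { CLOCKS_PER_BUS_CYCLE = 4 };

/* One byte-wide bus cycle. */
static uint8_t read8(const uint8_t *mem, uint32_t addr, unsigned *clocks) {
    *clocks += CLOCKS_PER_BUS_CYCLE;
    return mem[addr];
}

/* 8088: a word is always two byte cycles, aligned or not. */
static uint16_t load16_8088(const uint8_t *mem, uint32_t addr, unsigned *clocks) {
    uint16_t lo = read8(mem, addr, clocks);
    uint16_t hi = read8(mem, addr + 1, clocks);
    return (uint16_t)(lo | hi << 8);   /* little-endian */
}

/* 8086: an even address is a single 16-bit cycle; an odd address falls
   back to two byte cycles, so the combining logic is shared. */
static uint16_t load16_8086(const uint8_t *mem, uint32_t addr, unsigned *clocks) {
    if ((addr & 1u) == 0) {
        *clocks += CLOCKS_PER_BUS_CYCLE;
        return (uint16_t)(mem[addr] | mem[addr + 1] << 8);
    }
    return load16_8088(mem, addr, clocks);
}
```

The same 4-clocks-per-byte rule gives the throughput estimate in the quote: at 4.77 MHz the 8088 peaks at about 4.77e6 / 4, roughly 1.19 million bytes/second, which rounds down to the 1 MB/s working figure.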

After adding a dynamic bundles/clock statistic to the Verilog simulation,
Doom startup (as far as it gets at the moment) seems to hover
at around 0.64 bundles/clk, with ~1.19 instructions/bundle.

So, ~40 million instructions/second in this case.

When it starts hitting RAM a bit more, it drops a fair bit
lower (e.g.: ~0.2-0.4 bundles/clock when loading a PEL4 image or similar).

The ~60 million figure is more typical of cases with minimal cache
misses (mostly CPU-bound C code). Higher is possible within ASM code,
at least assuming it doesn't get wrecked by cache misses or
similar.

Where, 120-150 million represents a theoretical hard limit.
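
A quick consistency check of these figures. The clock frequency is not stated in the post; the sketch assumes 50 MHz and bundles of up to 3 instructions, both chosen because they agree with the ~40 million and 150 million figures, so treat them as assumptions rather than specifications.

```c
#include <assert.h>

/* Millions of instructions per second from the two measured ratios.
   clock_mhz is an assumed value, not one given in the post. */
static double mips(double bundles_per_clk, double instrs_per_bundle,
                   double clock_mhz) {
    assert(clock_mhz > 0.0);
    return bundles_per_clk * instrs_per_bundle * clock_mhz;
}
```

With those assumptions, 0.64 x 1.19 x 50 comes to about 38 million, one full 3-instruction bundle per clock gives the 150 million ceiling, and 0.2-0.4 bundles/clock while hitting RAM works out to roughly 12-24 million instructions/second.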

At present, ringbus latency actually seems to be a bigger factor than
DRAM access latency. It also turned out that screen refresh was being done
through the MMIO interface, which is slower than accessing the RAM
backing the framebuffer directly (bypassing the MMIO interface).

I have recently been working on improving this.

It also seems that, due to an issue, the display interface was thrashing
the ringbus (apparently reducing RAM and L2 cache bandwidth to some
extent), mostly by spamming the bus with huge numbers of L2 prefetch
requests... (Not entirely sure this is fixed, but it has been reduced
slightly; I may need to add logic to keep the VRAM mechanism from
spamming the bus when cache lines miss.)

....

> Quake performance still kinda sucks though...
>
> I guess, Doom/Heretic/Hexen/etc run reasonably well, but they were
> (presumably) intended to run mostly on 386 class PCs.
>
> What legacy stats I can find imply that my core is a bit faster than a
> 386 at least.
>

Then again, I guess it is possible that "system requirements" in
those days meant "minimum required for the thing to run", rather
than "system on which this will deliver a consistent 30 fps"; and
people declaring a game "unplayable" whenever it drops below 60 fps
probably wasn't really a thing...

My general metric for "playability" is "mostly stays consistently
above 10 fps". Quake fails this metric, as it is still usually below
10 fps (in some cases my GLQuake port does go above 10 fps, but
not consistently; the SW Quake port is still mostly stuck in the low
single digits).

The SW-Quake port mostly uses a modified variant of the original C
software renderer, adapted to work with 16-bit pixels (RGB555).
In this case, the 8-bit texel and the lightmap texel are combined
to look up an RGB555 value in a colormap (as opposed to mapping back
to an 8-bit value), with a 16-bit framebuffer. Surface textures with
lighting applied are held in a "surface cache", which is then drawn
with an edge-walking rasterizer (which uses a
perspective-correct drawing loop).
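
A minimal sketch of the lookup described above, assuming a 64-level colormap indexed by light level and 8-bit palette index, with each entry stored directly as an RGB555 pixel. The linear light ramp and the table size are assumptions for illustration, not Quake's actual colormap data, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LIGHT_LEVELS = 64 };   /* assumed number of light levels */

/* Pack 8-bit-per-channel color into RGB555 (top 5 bits of each channel). */
static uint16_t rgb555(unsigned r, unsigned g, unsigned b) {
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* Precompute lit colors: level 0 is black, level LIGHT_LEVELS-1 is the
   unmodified palette color (an assumed linear ramp). */
static void build_colormap(uint8_t palette[256][3],
                           uint16_t colormap[LIGHT_LEVELS][256]) {
    for (unsigned l = 0; l < LIGHT_LEVELS; l++)
        for (int i = 0; i < 256; i++)
            colormap[l][i] = rgb555(palette[i][0] * l / (LIGHT_LEVELS - 1),
                                    palette[i][1] * l / (LIGHT_LEVELS - 1),
                                    palette[i][2] * l / (LIGHT_LEVELS - 1));
}

/* One table read per pixel: (light, texel) -> 16-bit framebuffer value,
   with no trip back through an 8-bit palette index. */
static uint16_t lit_texel(uint16_t colormap[LIGHT_LEVELS][256],
                          uint8_t texel, unsigned light) {
    assert(light < LIGHT_LEVELS);
    return colormap[light][texel];
}
```
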

The final span-drawing loop is ASM, but everything before this point is
still C. The perspective-correct drawing code uses an alternate
lower-precision FP divide with 2 Newton-Raphson (N-R) stages (0 or 1 N-R stages
leading to obvious rendering glitches).
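
For readers unfamiliar with the technique: a Newton-Raphson reciprocal refines an estimate x of 1/d with x' = x(2 - dx), and each step squares the relative error. The sketch below uses a common bit-pattern trick for the initial guess; it illustrates the general technique and is not the actual BJX2 divide routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/d for positive, normal, finite d. */
static float approx_recip(float d, int nr_steps) {
    assert(d > 0.0f);
    uint32_t i;
    memcpy(&i, &d, sizeof i);
    i = 0x7EF311C7u - i;            /* rough initial guess via exponent flip */
    float x;
    memcpy(&x, &i, sizeof x);
    for (int k = 0; k < nr_steps; k++)
        x = x * (2.0f - d * x);     /* if d*x = 1 - e, new d*x = 1 - e*e */
    return x;
}

/* Divide as a * (1/b), the way a span loop might use it. */
static float approx_div(float a, float b, int nr_steps) {
    return a * approx_recip(b, nr_steps);
}
```

Because the error only squares per step, a rough initial guess needs a couple of steps before it is accurate to several decimal digits, which is consistent with fewer stages producing visible glitches.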

The GL Quake port was modified somewhat and draws using TKRA-GL, which
uses a mix of C and ASM for primitive drawing (with some use of
SIMD extensions in the C part). It also uses dynamic
tessellation rather than perspective-correct texturing.

Cheap hacks to improve speed:
This port tends to use vertex lighting rather than lightmap blending.
Sadly, dynamic lights and animated lighting don't currently work with
this. The vertex lighting has to be rebuilt dynamically from the
lightmaps, as the Quake BSP format has neither vertex lighting data nor
information about the original light sources.

IIRC, the algorithm for this first tries to rebuild a 3D grid of light
values across the map (for each grid point: check whether the spot is
inside a wall, skip it if it is, and otherwise ray-cast in various
directions and figure out the lighting contribution from any surfaces
hit by the raycasts).

Once this is done, it can figure out lighting for vertices by linearly
interpolating between the points on the grid.
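
The interpolation step can be sketched as plain trilinear interpolation over a regular grid. The grid size, the 64-unit spacing and the names are hypothetical; the ray-casting that fills the grid and the handling of skipped in-wall points are omitted.

```c
#include <assert.h>

enum { NX = 4, NY = 4, NZ = 4 };   /* hypothetical grid dimensions */
#define CELL 64.0f                 /* hypothetical spacing, map units */

typedef struct { float v[NZ][NY][NX]; } LightGrid;

static float lerp(float a, float b, float t) { return a + (b - a) * t; }
static int clampi(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }
static float clamp01(float t) { return t < 0.0f ? 0.0f : t > 1.0f ? 1.0f : t; }

/* Light at world position (x, y, z); sample (0,0,0) sits at the origin. */
static float light_at(const LightGrid *g, float x, float y, float z) {
    float gx = x / CELL, gy = y / CELL, gz = z / CELL;
    int ix = clampi((int)gx, 0, NX - 2);
    int iy = clampi((int)gy, 0, NY - 2);
    int iz = clampi((int)gz, 0, NZ - 2);
    float tx = clamp01(gx - ix), ty = clamp01(gy - iy), tz = clamp01(gz - iz);
    /* interpolate along x on the four cell edges, then y, then z */
    float c00 = lerp(g->v[iz][iy][ix],         g->v[iz][iy][ix + 1],         tx);
    float c10 = lerp(g->v[iz][iy + 1][ix],     g->v[iz][iy + 1][ix + 1],     tx);
    float c01 = lerp(g->v[iz + 1][iy][ix],     g->v[iz + 1][iy][ix + 1],     tx);
    float c11 = lerp(g->v[iz + 1][iy + 1][ix], g->v[iz + 1][iy + 1][ix + 1], tx);
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz);
}
```

Each vertex then costs eight grid reads and seven lerps, which is cheap enough to redo per frame if the grid values change.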

This was a modification of the original algorithm used to figure out
light levels for entities, which would mostly try to raycast down to the
floor and sample the lightmap at that point.

Though, one possible idea for animated lights could be, for each grid
point, figuring out the strongest non-static lightstyle and then
recording this (along with the lightstyle number).
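
A minimal sketch of that idea, assuming each grid point stores its static light plus one (lightstyle, contribution) pair. The style evaluation follows Quake's lightstyle convention: a string of letters stepped ten times per second, 'a' darkest, 'm' normal, each letter step worth 22 (so 'm' is 264 against a unit of 256). The GridSample struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    float static_light;
    float style_light;   /* contribution at unit (256) brightness */
    int   style;         /* lightstyle index, or -1 for none */
} GridSample;

/* Quake-style lightstyle value at time t (seconds). */
static int style_value(const char *pattern, double t) {
    size_t len = strlen(pattern);
    if (len == 0) return 256;                 /* unset style: normal */
    size_t k = (size_t)(t * 10.0) % len;      /* 10 letters per second */
    return (pattern[k] - 'a') * 22;
}

/* Static part plus the one animated contribution, scaled by its style. */
static float sample_light(const GridSample *s, const char *const styles[],
                          double t) {
    float l = s->static_light;
    if (s->style >= 0)
        l += s->style_light * (float)style_value(styles[s->style], t) / 256.0f;
    return l;
}
```
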

Another trick was a sort of "poor man's LOD", where 3D model frames are
rendered down to a sprite map (from a limited number of angles), with
distant 3D models being replaced by their sprite equivalents. Though,
the sprites are rendered at a low resolution and don't have alternate
skins, which is sometimes noticeable (otherwise, GLQuake will try to
draw every alias model in the PVS at full detail).
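
A sketch of the selection logic for this kind of impostor LOD. The 8 pre-rendered angles, the distance-threshold parameter and the names are assumptions; rendering the sprites themselves is not shown.

```c
#include <assert.h>

typedef enum { DRAW_MODEL, DRAW_SPRITE } LodChoice;

enum { N_ANGLES = 8 };   /* assumed number of pre-rendered views */

/* Beyond sprite_dist, draw the sprite; compare squared distances to
   avoid a square root. */
static LodChoice choose_lod(float dx, float dy, float dz, float sprite_dist) {
    float d2 = dx * dx + dy * dy + dz * dz;
    return d2 > sprite_dist * sprite_dist ? DRAW_SPRITE : DRAW_MODEL;
}

/* rel_yaw_deg: angle between the model's facing and the direction to the
   viewer, in degrees. Each view covers 360/N_ANGLES degrees, centred on it. */
static int sprite_view_index(float rel_yaw_deg) {
    while (rel_yaw_deg < 0.0f) rel_yaw_deg += 360.0f;
    while (rel_yaw_deg >= 360.0f) rel_yaw_deg -= 360.0f;
    float step = 360.0f / N_ANGLES;
    int idx = (int)((rel_yaw_deg + step / 2.0f) / step);
    return idx % N_ANGLES;
}
```
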

Well, and rewriting some stuff so that a lot of the use of
glBegin/glEnd/etc is replaced with glDrawArrays (where glBegin/glEnd has
a lot of needless overhead).

But, this isn't a "strictly authentic" port of GLQuake.

Granted, OpenGL "wasn't really a thing" on 386 and 486 class PCs (and
then generally after this, it was only really a thing with 3D cards).

So, using software-rasterized OpenGL for real-time 3D stuff doesn't seem
like it was really a thing.

....

Granted, it is possible there could be lower-overhead ways of doing 3D
rasterization than by exposing it behind the OpenGL API.

>
>
>>>
>>> Like, how exactly did they run programs like Wolfenstein 3D or the
>>> various platformer games?...
>>
>> You did not run Wolfenstein until the 386!
>
> OK. It was real-mode code, so I had guessed maybe it was intended to run
> on older PCs as well.
>
>
> I had noted that the code for the "Commander Keen" and "Duke Nukem"
> games was also 16-bit real-mode (these being side-scrolling
> platformers). I had not bothered trying to port these to BJX2 mostly as
> I don't want to have to deal with rewriting a bunch of 16-bit assembler
> code into C or similar.
>
> And also (like Wolf3D), even if I did port them, I could not legally
> redistribute the modified versions.
>
> Seemingly it was only Doom and later that had their source released
> under GPL. Well, also Doom and later being originally 32-bit code.
>
>
> Can note that for my ROTT port (was effectively a highly modified 32-bit
> variant of the Wolf3D engine), they had been using the VGA card in a
> weird way (basically, a planar mode).
>
> Rather than rewrite basically the whole renderer to use a planar
> framebuffer, just sort of ended up wrapping everything in function calls
> (so, basically function calls to update the plane mask or plot pixels on
> the screen; in place of the IO port twiddling and stores into the VGA
> memory).
>
> Though, some of the column and span-drawing functions partly sidestep
> this (in the name of not being horridly slow).
>
>
>>>
>>> Like, even with all my fancy stuff, and a 1-cycle throughput for many
>>> memory accesses to the L1 cache, still difficult to get any semblance
>>> of usable performance with things like Wolf3D much under ~ 10-14 MHz ...
>>>
>>>
>>> Granted, a lot of these also required VGA, so maybe running them on
>>> the original PC wasn't really a thing even if they were originally
>>> written for 16-bit real-mode?...
>>
>> That's your answer right there. :-)
>>
>
> OK.
>
> Apparently Keen had a CGA mode. No idea how well it would run on an 8088
> though.
>
> Granted, this was "old tech" that had mostly already gone away by the
> time I was in elementary school (which at the time was mostly a world of
> PC's running Win 3.11 or Win 95 or similar...).
>
>
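
The wrapping approach described in the quoted text can be sketched as an emulation layer that keeps planar addressing but writes into a linear framebuffer. This assumes an unchained 320x200, 256-color layout in which plane p holds the pixels with x % 4 == p; the names are hypothetical, not ROTT's or the port's actual functions.

```c
#include <assert.h>
#include <stdint.h>

enum { SCR_W = 320, SCR_H = 200, BYTES_PER_ROW = SCR_W / 4 };

static uint8_t framebuf[SCR_W * SCR_H];   /* linear, one byte per pixel */
static unsigned plane_mask = 0xF;         /* stands in for the map-mask register */

/* Replaces the IO-port write that selects which planes receive stores. */
static void vga_set_plane_mask(unsigned mask) { plane_mask = mask & 0xFu; }

/* Replaces a store into VGA memory at planar offset 'off': one byte
   offset covers 4 adjacent pixels, one per plane, and the value goes to
   every plane enabled in the mask. */
static void vga_write(unsigned off, uint8_t value) {
    unsigned row = off / BYTES_PER_ROW, col4 = off % BYTES_PER_ROW;
    assert(row < SCR_H);
    for (unsigned p = 0; p < 4; p++)
        if (plane_mask & (1u << p))
            framebuf[row * SCR_W + col4 * 4 + p] = value;
}

/* Plot one pixel by selecting its plane, then storing. */
static void vga_plot(unsigned x, unsigned y, uint8_t color) {
    vga_set_plane_mask(1u << (x & 3u));
    vga_write(y * BYTES_PER_ROW + x / 4, color);
}
```

Writing with all four planes enabled fills four pixels per store, which is the behaviour the original code's fast fills rely on, so the wrapper preserves it.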


Re: What did it cost the 8086 to support unaligned access?

<u8iv4l$fs7r$2@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33171&group=comp.arch#33171

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Tue, 11 Jul 2023 07:08:05 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u8iv4l$fs7r$2@newsreader4.netcologne.de>
References: <u89e7b$9l3b$2@newsreader4.netcologne.de>
<b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org>
<u89oj9$1dqui$1@dont-email.me> <u89r8q$1e5ro$1@dont-email.me>
<971b7842-c64e-4375-811c-3e03a2b6f25dn@googlegroups.com>
<u8arne$1lfhi$1@dont-email.me>
Injection-Date: Tue, 11 Jul 2023 07:08:05 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:15fa:0:7285:c2ff:fe6c:992d";
logging-data="520443"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 11 Jul 2023 07:08 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:

> The reason for software is to handle those tasks
> that either were not anticipated by the hardware designers, i.e.
> software is far more flexible and more easily changed, or those tasks
> that are just too complex to put in hardware, e.g. no one would consider
> putting a Fortran compiler in hardware :-).

I think IBM had a linkage editor as a machine instruction in the
Future Systems design. It was microcoded, I assume.

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<u8j1gr$fu2o$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33172&group=comp.arch#33172

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: misaligned Fortran, What did it cost the 8086 to support
unaligned access?
Date: Tue, 11 Jul 2023 07:48:43 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u8j1gr$fu2o$1@newsreader4.netcologne.de>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u86sbp$7vac$2@newsreader4.netcologne.de>
<85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com>
<u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com>
Injection-Date: Tue, 11 Jul 2023 07:48:43 -0000 (UTC)
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Tue, 11 Jul 2023 07:48 UTC

John Levine <johnl@taugh.com> wrote:

> IBM's compilers warn you about misaligned data but they still make it
> work, slowly in software in the 1960s, fast with hardware since then.
> They had to, they had paying customers with large programs they needed
> to run.

Makes me wonder how they handled (in the 1960s, in software)

      DOUBLE PRECISION D,E
      REAL A,B
      COMMON /FOO/ A,D,B,E
      CALL BAR(D,E)

....

      SUBROUTINE BAR(D,E)
      DOUBLE PRECISION D,E

on the callee side?
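
A quick way to see which members of /FOO/ are the problem is to compute their byte offsets. The sketch below is illustrative only and rests on assumptions: S/360-style sizes (4-byte REAL, 8-byte DOUBLE PRECISION), members packed back to back with no padding as Fortran storage association implies, and a COMMON block that itself starts on a doubleword boundary. Under those assumptions D lands at offset 4 and is misaligned, while E lands at offset 16 and is fine.

```python
import struct

# Assumed S/360-style sizes in bytes: REAL = 4, DOUBLE PRECISION = 8.
MEMBERS = [("A", 4), ("D", 8), ("B", 4), ("E", 8)]  # COMMON /FOO/ A,D,B,E


def common_layout(members):
    """Return {name: byte offset}, with members packed back to back (no padding)."""
    offsets = {}
    off = 0
    for name, size in members:
        offsets[name] = off
        off += size
    return offsets


def is_aligned(offset, size):
    """Natural alignment: the offset is a multiple of the operand size.
    Assumes the COMMON block itself starts on an 8-byte boundary."""
    return offset % size == 0


offsets = common_layout(MEMBERS)
for name, size in MEMBERS:
    status = "aligned" if is_aligned(offsets[name], size) else "MISALIGNED"
    print(f"{name} at offset {offsets[name]:2d}: {status}")

# struct's ">" prefix also packs with no alignment padding: 4 + 8 + 4 + 8 bytes.
assert struct.calcsize(">fdfd") == 24
```

The offsets 4 and 16 are the same ones that show up as displacements in MitchAlsup's reply.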

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<u8k3hm$2jj1$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=33185&group=comp.arch#33185

From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?
Date: Tue, 11 Jul 2023 17:29:26 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <u8k3hm$2jj1$1@gal.iecc.com>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com> <u8j1gr$fu2o$1@newsreader4.netcologne.de>
Injection-Date: Tue, 11 Jul 2023 17:29:26 -0000 (UTC)
In-Reply-To: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com> <u8j1gr$fu2o$1@newsreader4.netcologne.de>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Tue, 11 Jul 2023 17:29 UTC

According to Thomas Koenig <tkoenig@netcologne.de>:
>> IBM's compilers warn you about misaligned data but they still make it
>> work, slowly in software in the 1960s, fast with hardware since then.
>> They had to, they had paying customers with large programs they needed
>> to run.
>
>Makes me wonder how they handled (in the 1960s, in software)
>
> DOUBLE PRECISION D,E
> REAL A,B
> COMMON /FOO/ A,D,B,E
> CALL BAR(D,E)
>
>...
>
> SUBROUTINE BAR(D,E)
> DOUBLE PRECISION D,E
>
>on the callee side?

It's what I said: the Fortran library caught and fixed up the
alignment fault when it fetched or stored the variable. It would even
work if you linked assembler subroutines into your Fortran program and
they fetched or stored misaligned operands.
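
The library fixup described above can be modeled roughly as follows. This is a minimal sketch, not IBM's code: AlignmentFault, hw_load_double and the other names are hypothetical, a Python exception stands in for the hardware trap, and struct's big-endian IEEE doubles stand in for the 360's hexadecimal floating point. The only point is that a faulting doubleword access can be redone as eight byte accesses, which can never be misaligned.

```python
import struct


class AlignmentFault(Exception):
    """Hypothetical stand-in for the hardware's alignment trap."""


def hw_load_double(mem, addr):
    """Strict-alignment doubleword load: traps unless addr is a multiple of 8."""
    if addr % 8 != 0:
        raise AlignmentFault(addr)
    return struct.unpack_from(">d", mem, addr)[0]


def hw_store_double(mem, addr, value):
    """Strict-alignment doubleword store: traps unless addr is a multiple of 8."""
    if addr % 8 != 0:
        raise AlignmentFault(addr)
    struct.pack_into(">d", mem, addr, value)


def load_double_with_fixup(mem, addr):
    """Try the aligned load; if it traps, gather the 8 bytes one at a time."""
    try:
        return hw_load_double(mem, addr)
    except AlignmentFault:
        raw = bytes(mem[addr + i] for i in range(8))  # byte loads never trap
        return struct.unpack(">d", raw)[0]


def store_double_with_fixup(mem, addr, value):
    """Try the aligned store; if it traps, scatter the 8 bytes one at a time."""
    try:
        hw_store_double(mem, addr, value)
    except AlignmentFault:
        for i, b in enumerate(struct.pack(">d", value)):
            mem[addr + i] = b  # byte stores never trap


# COMMON /FOO/ A,D,B,E as 24 bytes: D at offset 4 (misaligned), E at 16 (aligned).
mem = bytearray(24)
store_double_with_fixup(mem, 4, 2.5)
store_double_with_fixup(mem, 16, -1.0)
print(load_double_with_fixup(mem, 4), load_double_with_fixup(mem, 16))  # 2.5 -1.0
```

A real trap handler would also have to decode the faulting instruction to find the register and effective address, then resume after it; the sketch skips that by calling the fixup directly, and it also shows why the slow path only costs time on the misaligned operands.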

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<308ce455-15f4-4d51-9837-db038388794cn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33186&group=comp.arch#33186

Newsgroups: comp.arch
Date: Tue, 11 Jul 2023 10:46:36 -0700 (PDT)
In-Reply-To: <u8j1gr$fu2o$1@newsreader4.netcologne.de>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u86sbp$7vac$2@newsreader4.netcologne.de> <85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com>
<u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com> <u8j1gr$fu2o$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <308ce455-15f4-4d51-9837-db038388794cn@googlegroups.com>
Subject: Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Tue, 11 Jul 2023 17:46:37 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Tue, 11 Jul 2023 17:46 UTC

On Tuesday, July 11, 2023 at 2:48:48 AM UTC-5, Thomas Koenig wrote:
> John Levine <jo...@taugh.com> wrote:
> > IBM's compilers warn you about misaligned data but they still make it
> > work, slowly in software in the 1960s, fast with hardware since then.
> > They had to, they had paying customers with large programs they needed
> > to run.
> Makes me wonder how they handled (in the 1960s, in software)
>
> DOUBLE PRECISION D,E
> REAL A,B
> COMMON /FOO/ A,D,B,E
> CALL BAR(D,E)
LD R7,&FOO(R13) // get address of FOO
LDA R1,4(R7) // get address of D
LDA R2,16(R7) // get address of E
BAL BAR // CALL BAR
>
> ...
>
> SUBROUTINE BAR(D,E)
> DOUBLE PRECISION D,E
>
> on the callee side?
<
FORTRAN was pass by address, so the exceptions happen in BAR.
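
The point about pass by address can be illustrated with a toy model. The names call_bar and bar are hypothetical, and the sizes (4-byte REAL, 8-byte DOUBLE PRECISION) and the 8-byte alignment rule are assumptions. The caller only does address arithmetic, like the LDA instructions above, so the CALL itself cannot fault; whether an access traps is decided inside BAR, from nothing more than the address it was handed.

```python
# Assumed sizes in bytes; doubleword operands must sit on 8-byte boundaries.
REAL, DOUBLE = 4, 8


def bar(d_addr, e_addr):
    """Callee: report which doubleword accesses would trap on a strict-alignment
    machine. BAR sees only addresses, never where they came from."""
    return {"D": d_addr % DOUBLE != 0, "E": e_addr % DOUBLE != 0}


def call_bar(foo_base):
    """Caller: CALL BAR(D,E) with COMMON /FOO/ A,D,B,E at foo_base.
    Only address arithmetic happens here, so the call itself cannot trap."""
    d_addr = foo_base + REAL                  # D follows A
    e_addr = foo_base + REAL + DOUBLE + REAL  # E follows A, D and B
    return bar(d_addr, e_addr)


print(call_bar(0x1000))  # {'D': True, 'E': False}
```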
