
devel / comp.arch / Re: What did it cost the 8086 to support unaligned access?

SubjectAuthor
* What did it cost the 8086 to support unaligned access?Russell Wallace
+* Re: What did it cost the 8086 to support unaligned access?John Levine
|+* Re: What did it cost the 8086 to support unaligned access?MitchAlsup
||+- Re: What did it cost the 8086 to support unaligned access?John Levine
||+- Re: What did it cost the 8086 to support unaligned access?Quadibloc
||`* Re: What did it cost the 8086 to support unaligned access?Thomas Koenig
|| +* Re: What did it cost the 8086 to support unaligned access?MitchAlsup
|| |`* Re: What did it cost the 8086 to support unaligned access?Thomas Koenig
|| | `* Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?John Levine
|| |  +* Re: misaligned Fortran, What did it cost the 8086 to supportThomas Koenig
|| |  |`* Re: misaligned Fortran, What did it cost the 8086 to supportJohn Levine
|| |  | `- Re: misaligned Fortran, What did it cost the 8086 to supportMitchAlsup
|| |  `* Re: misaligned Fortran, What did it cost the 8086 to supportThomas Koenig
|| |   +- Re: misaligned Fortran, What did it cost the 8086 to supportJohn Levine
|| |   `- Re: misaligned Fortran, What did it cost the 8086 to supportMitchAlsup
|| +- Re: old Fortran, What did it cost the 8086 to support unaligned access?John Levine
|| `- Re: What did it cost the 8086 to support unaligned access?Anton Ertl
|+* Re: What did it cost the 8086 to support unaligned access?Thomas Koenig
||+- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
||+* Re: What did it cost the 8086 to support unaligned access?Michael S
|||`- Re: What did it cost the 8086 to support unaligned access?BGB
||`* Re: What did it cost the 8086 to support unaligned access?EricP
|| +* Re: What did it cost the 8086 to support unaligned access?Quadibloc
|| |+* Re: What did it cost the 8086 to support unaligned access?EricP
|| ||+* Re: What did it cost the 8086 to support unaligned access?Quadibloc
|| |||+- Re: What did it cost the 8086 to support unaligned access?EricP
|| |||`- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
|| ||`- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
|| |`- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
|| +* Re: What did it cost the 8086 to support unaligned access?Anton Ertl
|| |+* Re: What did it cost the 8086 to support unaligned access?robf...@gmail.com
|| ||`- Re: What did it cost the 8086 to support unaligned access?BGB
|| |+- Re: What did it cost the 8086 to support unaligned access?Quadibloc
|| |`* Re: What did it cost the 8086 to support unaligned access?EricP
|| | `- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
|| `* Re: What did it cost the 8086 to support unaligned access?Timothy McCaffrey
||  +- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
||  +* Re: What did it cost the 8086 to support unaligned access?Thomas Koenig
||  |`- Re: What did it cost the 8086 to support unaligned access?BGB
||  +* Re: What did it cost the 8086 to support unaligned access?Andy Valencia
||  |`* Re: What did it cost the 8086 to support unaligned access?Terje Mathisen
||  | +* Re: What did it cost the 8086 to support unaligned access?Stephen Fuld
||  | |+* Re: What did it cost the 8086 to support unaligned access?MitchAlsup
||  | ||`* Re: What did it cost the 8086 to support unaligned access?Stephen Fuld
||  | || `- Re: What did it cost the 8086 to support unaligned access?Thomas Koenig
||  | |`- Re: What did it cost the 8086 to support unaligned access?Terje Mathisen
||  | `- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
||  `- Re: What did it cost the 8086 to support unaligned access?Anton Ertl
|`* Re: What did it cost the 8086 to support unaligned access?Michael S
| `- Re: What did it cost the 8086 to support unaligned access?John Levine
+- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
+* Re: What did it cost the 8086 to support unaligned access?Quadibloc
|`- Re: What did it cost the 8086 to support unaligned access?MitchAlsup
+* Re: What did it cost the 8086 to support unaligned access?Terje Mathisen
|`* Re: What did it cost the 8086 to support unaligned access?BGB
| `* Re: What did it cost the 8086 to support unaligned access?Terje Mathisen
|  `* Re: What did it cost the 8086 to support unaligned access?BGB
|   `- Re: What did it cost the 8086 to support unaligned access?BGB
`- Re: What did it cost the 8086 to support unaligned access?EricP

Re: What did it cost the 8086 to support unaligned access?

<u86t8u$10b8l$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33055&group=comp.arch#33055
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Thu, 6 Jul 2023 19:22:37 +0200
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <u86t8u$10b8l$1@dont-email.me>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u86pdf$vu1u$1@dont-email.me> <u86sra$1070j$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 6 Jul 2023 17:22:38 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="9899d6265b6f4364a9aa908dba5a298c";
logging-data="1060117"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1LRRM8790+WIB3zvPbI00pqgcN5WX0t2QNT2IyudtjA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:2vK16ZEeVN31aVlvk6mLH69iy10=
In-Reply-To: <u86sra$1070j$1@dont-email.me>
 by: Terje Mathisen - Thu, 6 Jul 2023 17:22 UTC

BGB wrote:
> On 7/6/2023 11:16 AM, Terje Mathisen wrote:
>> Russell Wallace wrote:
>>> The Intel 8086 supported unaligned loads and stores of 16-bit data,
>>> e.g. mov ax, foo was guaranteed to work even if foo was odd.
>>>
>>> What did this cost, in terms of performance and chip area, compared
>>> to an alternative architecture that would have been the same except
>>> for unaligned access being a trap or undefined behavior?
>>>
>>> To be clear, I'm not talking about the dynamic behavior of code. On
>>> the actual 8086, access was still faster if the pointer did happen to
>>> be even. I'm asking, suppose all your pointers for word access were
>>> actually even, how much bigger and slower was the chip made by having
>>> to support the possibility that some of them could have been odd?
>>
>> I would suggest that since they already knew that they would make an
>> 8-bit bus version (the 8088 which ended up in the IBM PC), the control
>> circuits already knew how to combine two 8-bit accesses into a 16-bit
>> load. In the '86 an aligned 16-bit load would run a single bus cycle
>> (taking 4 clock cycles), while the same operation on the '88 took
>> twice as long. Unless the '86 could do unaligned accesses in less than
>> 8 cycles, I would guess the mechanism was the same!
>>
>
> This makes it seem like the 8086/8088 would have been painfully slow?...

Oh, grasshopper, if only you knew! :-)

It was in fact painfully slow. OTOH, it was possible to directly
calculate (with very high precision) how long any given code would take,
since you could simply add together all code and data bytes read or
written and multiply by 4 clock cycles. It was only when you ran very
slow ops, like MUL/DIV or floating point, that this could break down.
For my own code I assumed my 4.77 MHz cpu could handle 1 M bytes/second;
most fast code would run at maybe 250-300 K instructions/second.
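
A rough sketch of that rule of thumb in Python. The function names and the figure of 4 bytes per instruction are assumptions for illustration, not from the post. It assumes an 8-bit bus where every code or data byte costs one 4-clock bus cycle, and it ignores slow ops such as MUL/DIV.

```python
CLOCK_HZ = 4_770_000        # 4.77 MHz, as quoted in the post
CLOCKS_PER_BUS_CYCLE = 4    # one bus cycle moves one byte on an 8-bit bus


def estimated_clocks(code_bytes, data_bytes):
    """Clocks for a code fragment: each byte fetched, read or written costs one bus cycle."""
    return (code_bytes + data_bytes) * CLOCKS_PER_BUS_CYCLE


def bytes_per_second(clock_hz=CLOCK_HZ):
    """Peak bus throughput in bytes per second."""
    return clock_hz / CLOCKS_PER_BUS_CYCLE


def instructions_per_second(avg_bytes_per_instruction, clock_hz=CLOCK_HZ):
    """Instruction rate if each instruction touches avg_bytes_per_instruction bytes."""
    return bytes_per_second(clock_hz) / avg_bytes_per_instruction


print(bytes_per_second())            # 1192500.0
print(instructions_per_second(4))    # 298125.0 (4 bytes/instruction is an assumed average)
```

With those assumptions the numbers land close to the "1 M bytes/second" and "250-300 K instructions/second" quoted above.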
>
> Like, how exactly did they run programs like Wolfenstein 3D or the
> various platformer games?...

You did not run Wolfenstein until the 386!
>
> Like, even with all my fancy stuff, and a 1-cycle throughput for many
> memory accesses to the L1 cache, still difficult to get any semblance of
> usable performance with things like Wolf3D much under ~ 10-14 MHz ...
>
>
> Granted, a lot of these also required VGA, so maybe running them on the
> original PC wasn't really a thing even if they were originally written
> for 16-bit real-mode?...

That's your answer right there. :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: What did it cost the 8086 to support unaligned access?

<aaced7dc-e0dd-49d7-ad63-2774f222a0ffn@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33057&group=comp.arch#33057
X-Received: by 2002:a05:622a:243:b0:3fd:dfa0:12b3 with SMTP id c3-20020a05622a024300b003fddfa012b3mr7868qtx.7.1688666434355;
Thu, 06 Jul 2023 11:00:34 -0700 (PDT)
X-Received: by 2002:a17:903:18b:b0:1b8:921e:e1a3 with SMTP id
z11-20020a170903018b00b001b8921ee1a3mr2460976plg.10.1688666433859; Thu, 06
Jul 2023 11:00:33 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.cmpublishers.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 11:00:33 -0700 (PDT)
In-Reply-To: <7f5a951f-4dd1-45ac-9130-9d783c0eeb37n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <7f5a951f-4dd1-45ac-9130-9d783c0eeb37n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <aaced7dc-e0dd-49d7-ad63-2774f222a0ffn@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 18:00:34 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2662
 by: MitchAlsup - Thu, 6 Jul 2023 18:00 UTC

On Thursday, July 6, 2023 at 1:47:39 AM UTC-5, Quadibloc wrote:
> On Wednesday, July 5, 2023 at 2:14:05 PM UTC-6, Russell Wallace wrote:
>
> > What did this cost, in terms of performance and chip area, compared
> > to an alternative architecture that would have been the same except
> > for unaligned access being a trap or undefined behavior?
> What does support for unaligned operations require?
>
> Basically, what has to happen is:
>
> Every time there's a memory access, there has to be a
> check for an unaligned access.
<
Check
>
> If there is an unaligned access, the sequence of events is now
> changed: the memory access is broken up into a larger number of
> memory accesses, and, in the case of a load, the operand is then
> shifted and assembled; in the case of a store, it's broken up and
> the parts are shifted before the actual memory access.
<
When building an aligned-only machine, cache access width = largest LD/ST datum.
<
When building an unaligned machine, cache access width = 2×
largest LD/ST datum.
<
This makes detecting and processing misaligneds a bit harder
but 7/8ths of them are processed in the same single cycle as
aligneds. Only the cache line crossing misaligned need 2 memory
accesses (or a special sequence).
>
> John Savard
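
One way to arrive at the 7/8ths figure, as a minimal sketch. It assumes 64-byte cache lines, an 8-byte largest datum, and misaligned start offsets spread uniformly over the line; none of those numbers are stated in the post, and the function name is made up for illustration.

```python
from fractions import Fraction


def misaligned_split(line_bytes=64, datum_bytes=8):
    """Classify every start offset in one cache line for a datum_bytes-wide access.

    Returns (misaligned_offsets, offsets_that_cross_into_the_next_line).
    """
    misaligned = crossing = 0
    for offset in range(line_bytes):
        if offset % datum_bytes == 0:
            continue                      # naturally aligned, never crosses a line
        misaligned += 1
        if offset + datum_bytes > line_bytes:
            crossing += 1                 # spills into the next cache line
    return misaligned, crossing


misaligned, crossing = misaligned_split()
print(misaligned, crossing)                          # 56 7
print(Fraction(misaligned - crossing, misaligned))   # 7/8
```

Under these assumptions, only the 7 offsets that cross the line boundary need a second access; the other 49 of 56 misaligned offsets fit within one line.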

Re: What did it cost the 8086 to support unaligned access?

<e1b647b7-955a-4b54-9b71-03fcc8a90a16n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33058&group=comp.arch#33058
X-Received: by 2002:a05:622a:11cb:b0:3fd:f5dd:bd79 with SMTP id n11-20020a05622a11cb00b003fdf5ddbd79mr6920qtk.10.1688666828028;
Thu, 06 Jul 2023 11:07:08 -0700 (PDT)
X-Received: by 2002:a05:6a00:9a9:b0:682:69ee:5037 with SMTP id
u41-20020a056a0009a900b0068269ee5037mr3476711pfg.0.1688666827620; Thu, 06 Jul
2023 11:07:07 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 11:07:07 -0700 (PDT)
In-Reply-To: <923886d2-485f-4447-8629-f62aaebf0cf2n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <923886d2-485f-4447-8629-f62aaebf0cf2n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e1b647b7-955a-4b54-9b71-03fcc8a90a16n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 18:07:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 2905
 by: MitchAlsup - Thu, 6 Jul 2023 18:07 UTC

On Thursday, July 6, 2023 at 1:58:31 AM UTC-5, Quadibloc wrote:
> On Wednesday, July 5, 2023 at 5:48:17 PM UTC-6, EricP wrote:
>
> > And then the Stanford MIPS came along,
> > And then the Alpha 21064 came along,
> Oh, dear. What a pity. While support for unaligned
> memory access is a convenience, I would have thought
> of it as a frill, that can, and should, be dispensed with on
> an architecture designed, say, for ultimate performance
> in high-performance computing.
>
> But if, in the real world, you are actually going to have to
> occasionally trap and emulate unaligned accesses, then
> unless "occasionally" is _very_ rare indeed, hardware support
> will be preferred.
<
If trapping and emulation take 1,000 cycles, misaligned must
be very rare.
<
If trapping and emulation take 100 cycles, misaligned must
be rare.
<
If trapping and emulation take 30 cycles, nobody will notice.
<
And this is why I spent so much time making My 66000
architecture have fast switches.
>
> Since the original System/360 got along reasonably well,
> and people wrote a version of SNOBOL for it, and a version
> of LISP for it, and so on... I had always assumed that
> unaligned access is not really needed; it's just a frill. But
> the history you're giving certainly suggests that for _some_
> reason, the world of computers today _is_ basically
> dependent on having this feature.
>
> John Savard
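
The 1,000 / 100 / 30 cycle argument above can be put as an expected-cost calculation. This is a minimal sketch: the 1-cycle cost of a normal access and the 1-in-1000 misalignment rate are assumed purely for illustration, and the function name is hypothetical.

```python
def average_access_cycles(misaligned_fraction, trap_cycles, normal_cycles=1):
    """Expected cycles per memory access when misaligned accesses trap to software."""
    return ((1 - misaligned_fraction) * normal_cycles
            + misaligned_fraction * trap_cycles)


for trap_cycles in (1000, 100, 30):
    # 0.001 = one access in a thousand is misaligned (assumed rate)
    print(trap_cycles, average_access_cycles(0.001, trap_cycles))
```

At that assumed rate a 1,000-cycle trap roughly doubles the average access cost, while a 30-cycle trap adds about 3%.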

Re: What did it cost the 8086 to support unaligned access?

<fb10b1f0-af9e-4ed4-a765-827467e150e4n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33059&group=comp.arch#33059
X-Received: by 2002:a05:620a:1a89:b0:767:15f4:7a81 with SMTP id bl9-20020a05620a1a8900b0076715f47a81mr5975qkb.10.1688667325728;
Thu, 06 Jul 2023 11:15:25 -0700 (PDT)
X-Received: by 2002:a17:902:e810:b0:1b8:c6ba:bf75 with SMTP id
u16-20020a170902e81000b001b8c6babf75mr1610160plg.0.1688667325281; Thu, 06 Jul
2023 11:15:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 11:15:24 -0700 (PDT)
In-Reply-To: <3czpM.1716$eGef.678@fx47.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <923886d2-485f-4447-8629-f62aaebf0cf2n@googlegroups.com>
<3czpM.1716$eGef.678@fx47.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fb10b1f0-af9e-4ed4-a765-827467e150e4n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 18:15:25 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3750
 by: MitchAlsup - Thu, 6 Jul 2023 18:15 UTC

On Thursday, July 6, 2023 at 8:20:03 AM UTC-5, EricP wrote:
> Quadibloc wrote:
> > On Wednesday, July 5, 2023 at 5:48:17 PM UTC-6, EricP wrote:
> >
> >> And then the Stanford MIPS came along,
> >> And then the Alpha 21064 came along,
> >
> > Oh, dear. What a pity. While support for unaligned
> > memory access is a convenience, I would have thought
> > of it as a frill, that can, and should, be dispensed with on
> > an architecture designed, say, for ultimate performance
> > in high-performance computing.
> >
> > But if, in the real world, you are actually going to have to
> > occasionally trap and emulate unaligned accesses, then
> > unless "occasionally" is _very_ rare indeed, hardware support
> > will be preferred.
> The overhead of trap-and-emulate is the problem people most often cite
> because it is in their face.
>
> The far more insidious hidden problem is that while byte memory access cpus
> *define* that reads and writes to adjacent memory cells are independent,
> this is NOT true for cpus that use larger alignment and must perform
> such updates as a read-modify-write operation in registers.
<
This is why My 66000 architecture describes LDs and STs as ATOMIC
actors only when the access is aligned.
>
> That RMW sequence means that adjacent byte and word and all unaligned
> accesses are not independent as an update to one variable can clobber
> and lose a concurrent update to an adjacent variable.
<
And then there are the false-sharing problems spanning cache lines.
>
> This race condition can show up even on uni-processors due to
<
Poorly defined
<
> interrupts
> and exceptions.
<
models.
>
> The fix is that all such byte, word and misaligned writes must be performed
> as atomic LL-SC sequences. But since you don't know which variables are
> adjacent then to be safe all such updates must be atomic. This turns each
> such write into a subroutine call so the performance is terrible.
> And this change must be applied to all code, just in case.
<
Conversely, you define ATOMIC as only supporting aligned accesses.
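
The lost-update hazard described in the quoted text can be shown with a small deterministic simulation. This is only a sketch: the word-only "machine", the 32-bit word and the function name are assumptions for illustration. A byte store is split into its read, modify and write steps so that a second writer can be scheduled in between.

```python
def store_byte_rmw_steps(memory, word_index, byte_index, value):
    """Generator: a byte store on a word-only machine, split into its RMW steps."""
    word = memory[word_index]                        # 1. read the whole word
    yield "read"
    shift = 8 * byte_index
    word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # 2. modify in a register
    yield "modify"
    memory[word_index] = word                        # 3. write the whole word back
    yield "write"


memory = [0x00000000]
a = store_byte_rmw_steps(memory, 0, 0, 0xAA)   # writer A updates byte 0
b = store_byte_rmw_steps(memory, 0, 1, 0xBB)   # writer B updates the adjacent byte 1

next(a)          # A reads the word
for _ in b:      # B runs to completion in between (interrupt or another core)
    pass
for _ in a:      # A resumes and writes back its stale copy
    pass

print(hex(memory[0]))   # 0xaa: B's update to byte 1 has been lost
```

Run sequentially, the same two stores leave 0xbbaa in the word; only the interleaving loses the update.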

Re: What did it cost the 8086 to support unaligned access?

<08e47e5c-1489-4def-8d0f-0c6dede01201n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33060&group=comp.arch#33060
X-Received: by 2002:a05:6214:2e49:b0:636:af26:6aa with SMTP id my9-20020a0562142e4900b00636af2606aamr29341qvb.3.1688667548588;
Thu, 06 Jul 2023 11:19:08 -0700 (PDT)
X-Received: by 2002:a17:902:e549:b0:1b7:f5be:c934 with SMTP id
n9-20020a170902e54900b001b7f5bec934mr2486484plf.9.1688667548398; Thu, 06 Jul
2023 11:19:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 11:19:07 -0700 (PDT)
In-Reply-To: <77a9c815-775b-4836-b623-73553a5f5a3dn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <923886d2-485f-4447-8629-f62aaebf0cf2n@googlegroups.com>
<3czpM.1716$eGef.678@fx47.iad> <77a9c815-775b-4836-b623-73553a5f5a3dn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <08e47e5c-1489-4def-8d0f-0c6dede01201n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 18:19:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3338
 by: MitchAlsup - Thu, 6 Jul 2023 18:19 UTC

On Thursday, July 6, 2023 at 8:47:17 AM UTC-5, Quadibloc wrote:
> On Thursday, July 6, 2023 at 7:20:03 AM UTC-6, EricP wrote:
>
> > The far more insidious hidden problem is that while byte memory access cpus
> > *define* that reads and writes to adjacent memory cells are independent,
> > this is NOT true for cpus that use larger alignment and must perform
> > such updates as a read-modify-write operation in registers.
> >
> > That RMW sequence means that adjacent byte and word and all unaligned
> > accesses are not independent as an update to one variable can clobber
> > and lose a concurrent update to an adjacent variable.
<
> I thought that x86 CPUs were designed to solve this problem.
<
x86-64 solves the misaligned problem and goes on to say that misaligned
memory containers cannot be relied upon for ATOMIC access.
<
Say you perform a misaligned DW load to a cache line spanning
memory container, and the first line is IN your cache and the
second is not. The core will access the first line, and put data
in a holding register, while sending an access to memory. While
waiting on memory to respond, a snoop comes by and steals the
data in your holding register. Neither you nor the thief see this
access as ATOMIC.
>
> The memory bus is organized so that there is one enable line for each eight
> data lines, and, therefore, any read or write operation can be made to operate
> only on specific bytes, no matter how wide the data bus.
>
> And so a misaligned 64-bit data item, on a computer with a 64-bit data bus,
> can always be fetched or written in only two accesses.
>
> Of course, fetches are usually of an entire DRAM data line into the cache,
> but that's another issue.
>
> John Savard
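
A minimal sketch of the byte-enable scheme in the quoted text, assuming a 64-bit (8-byte) bus with one enable bit per byte lane. The function name and the mask convention (bit 0 = lowest-addressed lane) are made up for illustration.

```python
def bus_accesses(addr, size=8, bus_bytes=8):
    """Return [(aligned_address, byte_enable_mask), ...] covering size bytes at addr."""
    accesses = []
    end = addr + size
    base = addr - addr % bus_bytes           # round down to a bus-width boundary
    while base < end:
        mask = 0
        for lane in range(bus_bytes):
            if addr <= base + lane < end:
                mask |= 1 << lane            # enable this byte lane
        accesses.append((base, mask))
        base += bus_bytes
    return accesses


for base, mask in bus_accesses(0x1003):
    print(hex(base), format(mask, "08b"))
# 0x1000 11111000
# 0x1008 00000111
```

A misaligned 8-byte item therefore takes two bus accesses with complementary enable masks, which is also why the pair cannot be treated as a single atomic access.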

Re: What did it cost the 8086 to support unaligned access?

<85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33061&group=comp.arch#33061
X-Received: by 2002:a05:620a:17ac:b0:762:29f9:c47a with SMTP id ay44-20020a05620a17ac00b0076229f9c47amr6268qkb.15.1688667734199;
Thu, 06 Jul 2023 11:22:14 -0700 (PDT)
X-Received: by 2002:a63:33c7:0:b0:548:31da:92b1 with SMTP id
z190-20020a6333c7000000b0054831da92b1mr1603520pgz.3.1688667733649; Thu, 06
Jul 2023 11:22:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!news.niel.me!glou.org!news.glou.org!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 11:22:13 -0700 (PDT)
In-Reply-To: <u86sbp$7vac$2@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <4733a536-4669-4930-931f-9b58c644b9b4n@googlegroups.com>
<u86sbp$7vac$2@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 18:22:14 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: MitchAlsup - Thu, 6 Jul 2023 18:22 UTC

On Thursday, July 6, 2023 at 12:07:09 PM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > FORTRAN common blocks required misaligned DP FP accesses.
> That is probably the most-ignored part of the Fortran standard.
> Even the very first FORTRAN 77 compiler, by Bell Labs, aligned the
> data in COMMON blocks.
<
HOW ??
<
compile unit 1:
<
COMMON /A/ INT, INT, INT, DP, DP, DP[10]
<
compile unit 2:
<
COMMON /A/ DP, DP, INT, INT, INT, DP[10]
<
?????
>
> The x86-64 psABI also specified aligned COMMON blocks (they are
> treated the same as structs).
>
> Hm... which ABI actually specifies non-aligned access? I don't
> know of any.
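
The two COMMON declarations above can be laid out by hand. This is a sketch assuming 4-byte INT, 8-byte DP and members stored back to back, since both compile units must see the same storage sequence. The sizes and the helper name are assumptions, not from the post.

```python
SIZES = {"INT": 4, "DP": 8}   # assumed 4-byte INTEGER, 8-byte DOUBLE PRECISION


def common_offsets(fields):
    """Byte offsets of COMMON members laid out back to back (no padding).

    fields is a list of (type_name, element_count) pairs; each result entry is
    (type_name, byte_offset, is_naturally_aligned).
    """
    offsets, offset = [], 0
    for type_name, count in fields:
        offsets.append((type_name, offset, offset % SIZES[type_name] == 0))
        offset += SIZES[type_name] * count
    return offsets


unit1 = [("INT", 1), ("INT", 1), ("INT", 1), ("DP", 1), ("DP", 1), ("DP", 10)]
unit2 = [("DP", 1), ("DP", 1), ("INT", 1), ("INT", 1), ("INT", 1), ("DP", 10)]

for name, fields in (("unit 1", unit1), ("unit 2", unit2)):
    for type_name, offset, aligned in common_offsets(fields):
        print(name, type_name, offset, "aligned" if aligned else "MISALIGNED")
```

Under these assumptions the DP[10] array starts at byte 28 in both units, so padding it to an 8-byte boundary in one unit would shift it relative to the other.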

Re: What did it cost the 8086 to support unaligned access?

<jEDpM.3111$cFK.2817@fx34.iad>

https://news.novabbs.org/devel/article-flat.php?id=33062&group=comp.arch#33062
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx34.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
In-Reply-To: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 54
Message-ID: <jEDpM.3111$cFK.2817@fx34.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 06 Jul 2023 18:23:11 UTC
Date: Thu, 06 Jul 2023 14:22:58 -0400
X-Received-Bytes: 3948
 by: EricP - Thu, 6 Jul 2023 18:22 UTC

Russell Wallace wrote:
> The Intel 8086 supported unaligned loads and stores of 16-bit data, e.g. mov ax, foo was guaranteed to work even if foo was odd.
>
> What did this cost, in terms of performance and chip area, compared to an alternative architecture that would have been the same except for unaligned access being a trap or undefined behavior?

In the 8086 patent

Extended address, single and multiple bit microprocessor
https://patents.google.com/patent/US4449184A

in figure 1 there is an adder #60 in the address data path for the
segment offsets and it feeds that to the memory address register #62.
There is also an address feedback path through mux #56.

Once triggered by microcode the bus interface logic likely handles the
bus control line sequencing for a single bus access on its own.
That should be a couple of flip-flops and some glue logic.

My guess - to handle odd addresses microcode would need to test the low
address bit A0 and execute an extra microcycle to recirculate the first
address through mux #56 and increment it. So the cost is 1 extra select
input to the uCode sequencer jump circuit and 1 uWord to do the bus access.

> To be clear, I'm not talking about the dynamic behavior of code. On the actual 8086, access was still faster if the pointer did happen to be even. I'm asking, suppose all your pointers for word access were actually even, how much bigger and slower was the chip made by having to support the possibility that some of them could have been odd?
>
> My first thought is that obviously a load/store circuit that doesn't need to take into account the possibility of unaligned access (and be ready to do the complicated fallback routine of two accesses and splicing the parts together) must clearly be simpler, therefore smaller and faster, than one that does need to take into account this possibility.
>
> On the other hand, the instruction decoder needs to support unaligned access anyway.

Ken Shirriff's blog reverse-engineers lots of things, including the 8086.

http://www.righto.com/search/label/8086

Inside the 8086 processor's instruction prefetch circuitry
http://www.righto.com/2023/01/inside-8086-processors-instruction.html

> On the third hand, that might not be relevant here; the instruction decoder is probably a completely separate circuit that doesn't share parts with the data load/store circuitry.
>
> Which points back to the conclusion that the chip could have been smaller and faster if it didn't support unaligned access.
>
> Then again, maybe the test for the fallback case, wasn't a bottleneck in cycle time? In that case, maybe it only cost chip area?
>
> Or maybe it was an extra microcode stage? In that case, every aligned access would have been at least one full clock cycle slower, just to support the possibility of unaligned access?

Microcode can select signals and branch based on them so it only
needs an extra uCycle if a word operand and the address A0 == 1.

How the 8086 processor's microcode engine works
http://www.righto.com/2022/11/how-8086-processors-microcode-engine.html

Example: microcode jump logic that selects the low bit of the next address.
http://static.righto.com/images/8086-mc-overview/addr-reg.jpg
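
A minimal sketch of the bus-cycle arithmetic behind "an extra uCycle if a word operand and A0 == 1". It models only the number of 4-clock bus cycles and ignores any other microcode overhead; the function names are made up for illustration.

```python
CLOCKS_PER_BUS_CYCLE = 4


def bus_cycles(address, size, bus_bytes):
    """Number of bus cycles needed to move size bytes starting at address."""
    first = address // bus_bytes
    last = (address + size - 1) // bus_bytes
    return last - first + 1


def word_access_clocks(address, bus_bytes):
    """Clocks spent on the bus for one 16-bit operand."""
    return bus_cycles(address, 2, bus_bytes) * CLOCKS_PER_BUS_CYCLE


print(word_access_clocks(0x100, bus_bytes=2))   # 8086, even address: 4
print(word_access_clocks(0x101, bus_bytes=2))   # 8086, odd address:  8
print(word_access_clocks(0x100, bus_bytes=1))   # 8088, any address:  8
```

On this simple model the odd-address word access on the 16-bit bus costs the same as any word access on the 8-bit bus.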

Re: old Fortran, What did it cost the 8086 to support unaligned access?

<u87149$6mu$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=33063&group=comp.arch#33063
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: old Fortran, What did it cost the 8086 to support unaligned access?
Date: Thu, 6 Jul 2023 18:28:25 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <u87149$6mu$1@gal.iecc.com>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u84l35$1aq4$1@gal.iecc.com> <4733a536-4669-4930-931f-9b58c644b9b4n@googlegroups.com> <u86sbp$7vac$2@newsreader4.netcologne.de>
Injection-Date: Thu, 6 Jul 2023 18:28:25 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="6878"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u84l35$1aq4$1@gal.iecc.com> <4733a536-4669-4930-931f-9b58c644b9b4n@googlegroups.com> <u86sbp$7vac$2@newsreader4.netcologne.de>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Thu, 6 Jul 2023 18:28 UTC

It appears that Thomas Koenig <tkoenig@netcologne.de> said:
>Hm... which ABI actually specifies non-aligned access? I don't
>know of any.

That would be Fortran 66 and 77. Try this:

      REAL A(100)
      DOUBLE PRECISION B(49), C(49)
      EQUIVALENCE (A(1), B), (A(2), C)

or this:

      REAL A(3)
      DOUBLE PRECISION C
      COMMON /FOO/ A,C

In both of those, C will be on an odd word. On the 70x machines it
didn't matter, because everything was word addressed, but on S/360 it
did and by then there was a whole lot of Fortran code so it was too
late.

The S/360 Fortran manuals tell you that is a bad idea and misaligned
code will run very slowly. If your Fortran program tried to do a
misaligned instruction, the library caught the trap and fixed it up.
The first ten times it did that it printed a warning, then just shut
up and did it, slowly.

As I said in another message, the 360/91 was intended as a Fortran
machine, with fast out-of-order floating point arithmetic but no
decimal arithmetic. It also had imprecise interrupts, so the Fortran
fixup code didn't work. So by 1968 the follow-on 360/195 had the Byte
Oriented Operand feature, which allowed misaligned data. So did the
360/85, the first machine with a cache. That was sort of a reverse
embarrassment. The cache and fast multiply worked so well that it gave
you most of the performance of a /91 for a lot less money.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: What did it cost the 8086 to support unaligned access?

<XrFpM.44364$N3_4.3796@fx10.iad>

https://news.novabbs.org/devel/article-flat.php?id=33071&group=comp.arch#33071

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!1.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx10.iad.POSTED!not-for-mail
From: ThatWouldBeTelling@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de> <1jnpM.980$8Ma1.956@fx37.iad> <2023Jul6.082239@mips.complang.tuwien.ac.at>
In-Reply-To: <2023Jul6.082239@mips.complang.tuwien.ac.at>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 36
Message-ID: <XrFpM.44364$N3_4.3796@fx10.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 06 Jul 2023 20:26:31 UTC
Date: Thu, 06 Jul 2023 16:26:22 -0400
X-Received-Bytes: 2543
 by: EricP - Thu, 6 Jul 2023 20:26 UTC

Anton Ertl wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
>> And then the Alpha 21064 came along, believed this, eliminated unaligned
>> loads and stores, and found out humans do this *a lot*.
>
> It's also been a while since I used an Alpha, but even on the 21264B
> we saw notices of unaligned accesses by programs in the Linux kernel
> logs, and I wrote <https://www.complang.tuwien.ac.at/anton/uace.c> to
> control this Linux kernel feature (a similar program uac was available
> on Digital OSF/1 (or whatever it had been renamed to at the time)), and
> as a result I (and my students) got SIGBUS signals on unaligned
> accesses even on Linux (where normally the kernel emulated unaligned
> accesses).
>
> Alpha also has ldq_u (which performed an aligned access using a
> possibly unaligned address), and pseudo-instructions like ustq (which
> expands to 11 instructions IIRC). I actually found that the gas
> implementation of ustq was broken, which showed that no Linux software
> used this pseudo-instruction.
>
> You may be confusing the unaligned access issue with byte and 16-bit
> memory access, which was added in EV56.

Not confused but rather lumping them all together on the theory
that once it has a shifter for naturally aligned halfwords and bytes
that the extra logic for unaligned halfwords and added cost to the
critical logic path (which is what the risc's were all concerned about)
is minimal until one deals with straddles. And once straddles are
involved then it requires a whole separate clock cycle with its own
critical path.

So it seems to me that once it does aligned halfwords that full
unaligned accesses should be very little extra.
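
The distinction drawn above, between an access that is merely misaligned and one that straddles, can be written down in a few lines. This is a minimal C sketch, not taken from any poster's design: the helper names and the `width` parameter (the aligned unit the memory path delivers per cycle, for example 8 bytes or a cache line) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when an access of `size` bytes at `addr` is not naturally
   aligned. `size` must be a power of two. */
int is_misaligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) != 0;
}

/* Nonzero when the access crosses a `width`-byte aligned boundary, i.e.
   it straddles two units and needs a second access. `width` must be a
   power of two and `size` must not exceed it. */
int straddles(uint64_t addr, uint64_t size, uint64_t width)
{
    assert(width != 0 && (width & (width - 1)) == 0);
    assert(size != 0 && size <= width);
    return (addr & (width - 1)) + size > width;
}
```

With an 8-byte width, a halfword at address 3 is misaligned but still fits inside one unit, so only the shifter is involved; a halfword at address 7 straddles and is the case that would need the extra cycle.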

Re: What did it cost the 8086 to support unaligned access?

<e45955d0-85c0-4d98-8b4d-ffe75fefcf07n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33073&group=comp.arch#33073

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:28d2:b0:763:a459:529f with SMTP id l18-20020a05620a28d200b00763a459529fmr8854qkp.13.1688675481822;
Thu, 06 Jul 2023 13:31:21 -0700 (PDT)
X-Received: by 2002:a63:2bd0:0:b0:548:7fc1:8cf4 with SMTP id
r199-20020a632bd0000000b005487fc18cf4mr1856925pgr.4.1688675481343; Thu, 06
Jul 2023 13:31:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 13:31:20 -0700 (PDT)
In-Reply-To: <XrFpM.44364$N3_4.3796@fx10.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <2023Jul6.082239@mips.complang.tuwien.ac.at> <XrFpM.44364$N3_4.3796@fx10.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e45955d0-85c0-4d98-8b4d-ffe75fefcf07n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 20:31:21 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3696
 by: MitchAlsup - Thu, 6 Jul 2023 20:31 UTC

On Thursday, July 6, 2023 at 3:26:35 PM UTC-5, EricP wrote:
> Anton Ertl wrote:
> > EricP <ThatWould...@thevillage.com> writes:
> >> And then the Alpha 21064 came along, believed this, eliminated unaligned
> >> loads and stores, and found out humans do this *a lot*.
> >
> > It's also been a while since I used an Alpha, but even on the 21264B
> > we saw notices of unaligned accesses by programs in the Linux kernel
> > logs, and I wrote <https://www.complang.tuwien.ac.at/anton/uace.c> to
> > control this Linux kernel feature (a similar program uac was available
> > on Digital OSF/1 (or whatever it had been renamed to at the time)), and
> > as a result I (and my students) got SIGBUS signals on unaligned
> > accesses even on Linux (where normally the kernel emulated unaligned
> > accesses).
> >
> > Alpha also has ldq_u (which performed an aligned access using a
> > possibly unaligned address), and pseudo-instructions like ustq (which
> > expands to 11 instructions IIRC). I actually found that the gas
> > implementation of ustq was broken, which showed that no Linux software
> > used this pseudo-instruction.
> >
> > You may be confusing the unaligned access issue with byte and 16-bit
> > memory access, which was added in EV56.
> Not confused but rather lumping them all together on the theory
> that once it has a shifter for naturally aligned halfwords and bytes
> that the extra logic for unaligned halfwords and added cost to the
> critical logic path (which is what the risc's were all concerned about)
<
s/risc's/naive RISC designers/
<
> is minimal until one deals with straddles. And once straddles are
> involved then it requires a whole separate clock cycle with its own
> critical path.
<
But once you recognize the misalignment in AGEN (about 4 gates before
the HoB resolves) you have time to insert a cycle in the LD pipeline
subsection.
>
> So it seems to me that once it does aligned halfwords that full
> unaligned accesses should be very little extra.
<
This has been my argument since about 1990. It is so close to free
and the overhead of emulation is so costly, that it is actually free--so
why not just make it so.

Re: What did it cost the 8086 to support unaligned access?

<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33074&group=comp.arch#33074

Newsgroups: comp.arch
X-Received: by 2002:ad4:4f10:0:b0:635:df72:729f with SMTP id fb16-20020ad44f10000000b00635df72729fmr8726qvb.12.1688682264729;
Thu, 06 Jul 2023 15:24:24 -0700 (PDT)
X-Received: by 2002:a17:902:b689:b0:1b8:a134:6fcb with SMTP id
c9-20020a170902b68900b001b8a1346fcbmr2553244pls.7.1688682264156; Thu, 06 Jul
2023 15:24:24 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!newsfeed.hasname.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 15:24:23 -0700 (PDT)
In-Reply-To: <1jnpM.980$8Ma1.956@fx37.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=71.230.96.169; posting-account=ujX_IwoAAACu0_cef9hMHeR8g0ZYDNHh
NNTP-Posting-Host: 71.230.96.169
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de> <1jnpM.980$8Ma1.956@fx37.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: timcaffrey@aol.com (Timothy McCaffrey)
Injection-Date: Thu, 06 Jul 2023 22:24:24 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3225
 by: Timothy McCaffrey - Thu, 6 Jul 2023 22:24 UTC

On Wednesday, July 5, 2023 at 7:48:17 PM UTC-4, EricP wrote:
> And then the Stanford MIPS came along, believed this, eliminated unaligned
> loads and stores, and found out that while compilers may not issue these
> humans do it *a lot*. Faced with the prospect of rewriting lots of code
> to suit their processor they chose to add it back in for their first
> commercial version, the MIPS R2000.
>
> And then the Alpha 21064 came along, believed this, eliminated unaligned
> loads and stores, and found out humans do this *a lot*.
> Then DEC lied it caused problems, blamed the humans, claimed the code was
> broken to begin with (it wasn't), quietly published manuals on how to
> rewrite code to suit their processor, and finally added it back in again
> claiming it was to support Windows. By which time they had, in my opinion,
> driven away Alpha's market due to incompatibility and it never recovered.

What is not understood, I think, is that you cannot always dictate where the data comes from.

If you get a raw network packet (complete with Ethernet header) you can get all sorts of miscellaneous
bytes thrown in that screw up alignment. Yes, you can copy things so they are aligned again, but
how is that more efficient than just allowing unaligned accesses? Yes, I have dealt with this on a MIPS. Functions
that get a pointer to an address field have to copy it one byte at a time, because who knows what alignment
that pointer has? (Admittedly, that code was overly pessimistic, but the programmer got caught once and wasn't about
to deal with it a second time.)
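
For readers who have not hit this, a minimal C sketch of the defensive style described above, assuming a 32-bit field in network byte order somewhere inside a packet buffer. The function names are hypothetical; the only library call used is the standard memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian (network order) field from any address, one
   byte at a time. No wide load is issued, so alignment cannot matter. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Copy four bytes in host byte order. memcpy has no alignment
   requirement; on targets that permit misaligned loads, compilers often
   reduce this to a single load instruction. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

A field that starts right after a 14-byte Ethernet header sits at offset 14, which is 2 mod 4, so a plain 32-bit load through a cast pointer would be misaligned there. Either helper reads it safely; the second one shows why hardware support makes the copy essentially free.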

Even Intel, who should know better, got caught up in this: SSE ops have aligned and unaligned versions.
It took until Nehalem for them to realize that it wasn't necessary (although I think some of the Atoms still had this issue).

- Tim

Re: What did it cost the 8086 to support unaligned access?

<9f4b7458-e6e8-4e3b-88c0-bb4c5a47fa69n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33075&group=comp.arch#33075

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:17a5:b0:763:a2e8:2b1a with SMTP id ay37-20020a05620a17a500b00763a2e82b1amr10048qkb.10.1688687246080;
Thu, 06 Jul 2023 16:47:26 -0700 (PDT)
X-Received: by 2002:a17:90b:1089:b0:262:c252:3724 with SMTP id
gj9-20020a17090b108900b00262c2523724mr2691083pjb.8.1688687245540; Thu, 06 Jul
2023 16:47:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 6 Jul 2023 16:47:24 -0700 (PDT)
In-Reply-To: <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:8a5:2639:4a28:1128;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:8a5:2639:4a28:1128
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad> <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9f4b7458-e6e8-4e3b-88c0-bb4c5a47fa69n@googlegroups.com>
Subject: Re: What did it cost the 8086 to support unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Thu, 06 Jul 2023 23:47:26 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 1985
 by: MitchAlsup - Thu, 6 Jul 2023 23:47 UTC

On Thursday, July 6, 2023 at 5:24:26 PM UTC-5, Timothy McCaffrey wrote:
<
The second half of this thread is illustrating why nobody should
architect a machine that does not support misaligned accesses
ever again in the future.
<
For my part, and for debugging purposes, one can enable a
misaligned access fault and isolate the necessary ones from
the ones that should be fixed. When returning from a mis-
aligned fault, My 66000 will retry the instruction with the
fault temporarily disabled for 1 instruction; so all the handler
has to do is keep statistics.

Re: What did it cost the 8086 to support unaligned access?

<u87onq$130mj$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33076&group=comp.arch#33076

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Thu, 6 Jul 2023 20:11:19 -0500
Organization: A noiseless patient Spider
Lines: 117
Message-ID: <u87onq$130mj$1@dont-email.me>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u86pdf$vu1u$1@dont-email.me> <u86sra$1070j$1@dont-email.me>
<u86t8u$10b8l$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 Jul 2023 01:11:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1268626fcd9f2db665f15ad89aa8232d";
logging-data="1147603"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+uMNcPj7YGMzSyCiKsbFU5"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:k28pz8q6gUpWsVSs7iUdMKe5VW0=
In-Reply-To: <u86t8u$10b8l$1@dont-email.me>
Content-Language: en-US
 by: BGB - Fri, 7 Jul 2023 01:11 UTC

On 7/6/2023 12:22 PM, Terje Mathisen wrote:
> BGB wrote:
>> On 7/6/2023 11:16 AM, Terje Mathisen wrote:
>>> Russell Wallace wrote:
>>>> The Intel 8086 supported unaligned loads and stores of 16-bit data,
>>>> e.g. mov ax, foo was guaranteed to work even if foo was odd.
>>>>
>>>> What did this cost, in terms of performance and chip area, compared
>>>> to an alternative architecture that would have been the same except
>>>> for unaligned access being a trap or undefined behavior?
>>>>
>>>> To be clear, I'm not talking about the dynamic behavior of code. On
>>>> the actual 8086, access was still faster if the pointer did happen to
>>>> be even. I'm asking, suppose all your pointers for word access were
>>>> actually even, how much bigger and slower was the chip made by having
>>>> to support the possibility that some of them could have been odd?
>>>
>>> I would suggest that since they already knew that they would make an
>>> 8-bit bus version (the 8088 which ended up in the IBM PC), the
>>> control circuits already knew how to combine two 8-bit accesses into
>>> a 16-bit load. In the '86 an aligned 16-bit load would run a single
>>> bus cycle (taking 4 clock cycles), while the same operation on the
>>> '88 took twice as long. Unless the '86 could do unaligned accesses in
>>> less than 8 cycles, I would guess the mechanism was the same!
>>>
>>
>> This makes it seem like the 8086/8088 would have been painfully slow?...
>
> Oh, grasshopper, if only you knew! :-)
>
> It was in fact painfully slow. OTOH, it was possible to directly
> calculate (with very high precision) how long any given code would take
> since you could simply add together all code and data bytes read or
> written and multiply by 4. It was only when you ran very slow ops, like
> MUL/DIV or floating point that this could break down. For my own code I
> assumed my 4.77 MHz cpu could handle 1 M bytes/second, most fast code
> would run at maybe 250-300 K instructions/second.
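
The rule of thumb quoted above is simple enough to check by hand. A minimal C sketch of the arithmetic, assuming a byte-wide bus with one byte per 4-clock bus cycle (as on the 8088) and ignoring slow operations such as MUL/DIV; the function names and the bytes-per-instruction parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { CLOCKS_PER_BUS_CYCLE = 4 }; /* from the quoted post */

/* Estimated clocks for code that reads or writes `bytes_transferred`
   code and data bytes in total. */
uint64_t estimate_cycles(uint64_t bytes_transferred)
{
    assert(bytes_transferred <= UINT64_MAX / CLOCKS_PER_BUS_CYCLE);
    return bytes_transferred * CLOCKS_PER_BUS_CYCLE;
}

/* Sustained bus bandwidth in bytes per second at a given clock rate. */
uint64_t bytes_per_second(uint64_t clock_hz)
{
    return clock_hz / CLOCKS_PER_BUS_CYCLE;
}

/* Instruction rate if each instruction touches, on average,
   `bytes_per_instruction` bytes of code and data (an assumed figure). */
uint64_t instructions_per_second(uint64_t clock_hz,
                                 uint64_t bytes_per_instruction)
{
    assert(bytes_per_instruction != 0);
    return bytes_per_second(clock_hz) / bytes_per_instruction;
}
```

At 4,770,000 Hz this gives 1,192,500 bytes per second, in line with the "1 M bytes/second" figure. With an assumed four bytes touched per instruction it gives 298,125 instructions per second, at the upper end of the quoted 250-300 K range.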

Hmm...

Yeah, that seems kinda slow...

My project seems to be averaging closer to 40-60 million instructions
per second...

Quake performance still kinda sucks though...

I guess, Doom/Heretic/Hexen/etc run reasonably well, but these were
(presumably) intended to run mostly on 386 class PCs.

What legacy stats I can find imply that my core is a bit faster than a
386 at least.

>>
>> Like, how exactly did they run programs like Wolfenstein 3D or the
>> various platformer games?...
>
> You did not run Wolfenstein until the 386!

OK. It was real-mode code, so I had guessed maybe it was intended to run
on older PCs as well.

I had noted that the code for the "Commander Keen" and "Duke Nukem"
games was also 16-bit real-mode (these being side-scrolling
platformers). I had not bothered trying to port these to BJX2 mostly as
I don't want to have to deal with rewriting a bunch of 16-bit assembler
code into C or similar.

And also (like Wolf3D), even if I did port them, I could not legally
redistribute the modified versions.

Seemingly it was only Doom and later that had their source released
under GPL. Well, also Doom and later being originally 32-bit code.

Can note that for my ROTT port (was effectively a highly modified 32-bit
variant of the Wolf3D engine), they had been using the VGA card in a
weird way (basically, a planar mode).

Rather than rewrite basically the whole renderer to use a planar
framebuffer, just sort of ended up wrapping everything in function calls
(so, basically function calls to update the plane mask or plot pixels on
the screen; in place of the IO port twiddling and stores into the VGA
memory).

Though, some of the column and span-drawing functions partly sidestep
this (in the name of not being horridly slow).

>>
>> Like, even with all my fancy stuff, and a 1-cycle throughput for many
>> memory accesses to the L1 cache, still difficult to get any semblance
>> of usable performance with things like Wolf3D much under ~ 10-14 MHz ...
>>
>>
>> Granted, a lot of these also required VGA, so maybe running them on
>> the original PC wasn't really a thing even if they were originally
>> written for 16-bit real-mode?...
>
> That's your answer right there. :-)
>

OK.

Apparently Keen had a CGA mode. No idea how well it would run on an 8088
though.

Granted, this was "old tech" that had mostly already gone away by the
time I was in elementary school (which at the time was mostly a world of
PC's running Win 3.11 or Win 95 or similar...).

Re: What did it cost the 8086 to support unaligned access?

<2023Jul7.102249@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=33078&group=comp.arch#33078

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 07 Jul 2023 08:22:49 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 14
Distribution: world
Message-ID: <2023Jul7.102249@mips.complang.tuwien.ac.at>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u84l35$1aq4$1@gal.iecc.com> <4733a536-4669-4930-931f-9b58c644b9b4n@googlegroups.com> <u86sbp$7vac$2@newsreader4.netcologne.de>
Injection-Info: dont-email.me; posting-host="e0289143d6c7d567f3107fd18072655e";
logging-data="1342594"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qsjbeH4b44wcHbR9hWp2p"
Cancel-Lock: sha1:hXnB5QbRKdDK/9k+lmONl7avReM=
X-newsreader: xrn 10.11
 by: Anton Ertl - Fri, 7 Jul 2023 08:22 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>MitchAlsup <MitchAlsup@aol.com> schrieb:
>
>> FORTRAN common blocks required misaligned DP FP accesses.
....
>Hm... which ABI actually specifies non-aligned access?

Intel's IA-32 ABI specifies that binary64 FP numbers are stored at
4-byte aligned addresses.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: What did it cost the 8086 to support unaligned access?

<u894ir$9e6s$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33079&group=comp.arch#33079

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 13:39:39 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u894ir$9e6s$1@newsreader4.netcologne.de>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com>
<4733a536-4669-4930-931f-9b58c644b9b4n@googlegroups.com>
<u86sbp$7vac$2@newsreader4.netcologne.de>
<85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jul 2023 13:39:39 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:15fa:0:7285:c2ff:fe6c:992d";
logging-data="309468"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 7 Jul 2023 13:39 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Thursday, July 6, 2023 at 12:07:09 PM UTC-5, Thomas Koenig wrote:
>> MitchAlsup <Mitch...@aol.com> schrieb:
>> > FORTRAN common blocks required misaligned DP FP accesses.
>> That is probably the most-ignored part of the Fortran standard.
>> Even the very first FORTRAN 77 compiler, by Bell Labs, aligned the
>> data in COMMON blocks.
><
> HOW ??

To quote the original publication, "A Portable Fortran 77 Compiler",
S. I. Feldman, P. J. Weinberger, under "VIOLATIONS OF THE STANDARD":

"We have chosen to require that all double precision real and
complex quantities fall on even word boundaries on machines with
corresponding hardware requirements, and to issue a diagnostic if
the source code demands a violation of the rule."

> compile unit 1:
><
> COMMON /A/ INT, INT, INT, DP, DP, DP[10]

They would flag that (because an INT would take one storage
unit, and a double precision would take two).

So, my statement above was actually incorrect, they simply refused
to compile it.

When I learned FORTRAN, I certainly learned to put DOUBLE PRECISION
before any REAL or INTEGER in a common block, and I certainly learned
to use INCLUDE statements for them (non-standard, but life-saving).

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<u89cig$2dm4$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=33080&group=comp.arch#33080

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 15:56:00 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <u89cig$2dm4$1@gal.iecc.com>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u86sbp$7vac$2@newsreader4.netcologne.de> <85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com> <u894ir$9e6s$1@newsreader4.netcologne.de>
Injection-Date: Fri, 7 Jul 2023 15:56:00 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="79556"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u86sbp$7vac$2@newsreader4.netcologne.de> <85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com> <u894ir$9e6s$1@newsreader4.netcologne.de>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Fri, 7 Jul 2023 15:56 UTC

According to Thomas Koenig <tkoenig@netcologne.de>:
>To quote the original publication, "A Portable Fortran 77 Compiler",
>S. I. Feldman, P. J. Weinberger, under "VIOLATIONS OF THE STANDARD":
>
>"We have chosen to require that all double precision real and
>complex quantities fall on even word boundaries on machines with
>corresponding hardware requirements, and to issue a diagnostic if
>the source code demands a violation of the rule."

That worked for them in their environment, a compiler for small
programs, provided for free with no promise of support. I don't really
understand why they did that since I don't think they had any target
machines where it mattered. I wrote the similar INfort around the same
time and float alignment wasn't even something I worried about.

IBM's compilers warn you about misaligned data but they still make it
work, slowly in software in the 1960s, fast with hardware since then.
They had to, they had paying customers with large programs they needed
to run.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<u89e12$9l3b$1@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33081&group=comp.arch#33081

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: misaligned Fortran, What did it cost the 8086 to support
unaligned access?
Date: Fri, 7 Jul 2023 16:20:50 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u89e12$9l3b$1@newsreader4.netcologne.de>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u86sbp$7vac$2@newsreader4.netcologne.de>
<85db67ca-9fd3-4e7d-bcc3-adea91075af1n@googlegroups.com>
<u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jul 2023 16:20:50 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:15fa:0:7285:c2ff:fe6c:992d";
logging-data="316523"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 7 Jul 2023 16:20 UTC

John Levine <johnl@taugh.com> schrieb:
> According to Thomas Koenig <tkoenig@netcologne.de>:
>>To quote the original publication, "A Portable Fortran 77 Compiler",
>>S. I. Feldman, P. J. Weinberger, under "VIOLATIONS OF THE STANDARD":
>>
>>"We have chosen to require that all double precision real and
>>complex quantities fall on even word boundaries on machines with
>>corresponding hardware requirements, and to issue a diagnostic if
>>the source code demands a violation of the rule."
>
> That worked for them in their environment, a compiler for small
> programs, provided for free with no promise of support. I don't really
> understand why they did that since I don't think they had any target
> machines where it mattered. I wrote the similar INfort around the same
> time and float alignment wasn't even something I worried about.

They explained their reasoning before the paragraph I quoted:

# Some machines (e.g., Honeywell 6000, IBM 360) require that double
# precision quantities be on double word boundaries; other machines
# (e.g., IBM 370), run inefficiently if this alignment rule is not
# observed. It is possible to tell which equivalenced and common
# variables suffer from a forced odd alignment, but every double
# precision argument would have to be assumed on a bad boundary. To
# load such a quantity on some machines, it would be necessary to
# use separate operations to move the upper and lower halves into the
# halves of an aligned temporary, then to load that double precision
# temporary; the reverse would be needed to store a result.
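
The sequence described in that passage can be sketched in C. This is an illustration under stated assumptions, not the code the Portable Fortran 77 compiler generated: it assumes an 8-byte double and a source address that is at least 4-byte aligned, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a double from an address that may be only word (4-byte) aligned:
   move the two halves into an aligned temporary, then use the temporary. */
double load_double_by_halves(const void *src)
{
    const unsigned char *p = (const unsigned char *)src;
    double tmp;                       /* naturally aligned temporary */
    unsigned char *t = (unsigned char *)&tmp;

    assert(sizeof(double) == 8);
    assert(((uintptr_t)src & 3u) == 0);
    memcpy(t, p, 4);                  /* first word  */
    memcpy(t + 4, p + 4, 4);          /* second word */
    return tmp;
}

/* The reverse: store a double as two separate word-sized moves. */
void store_double_by_halves(void *dst, double value)
{
    unsigned char *p = (unsigned char *)dst;
    const unsigned char *v = (const unsigned char *)&value;

    assert(sizeof(double) == 8);
    assert(((uintptr_t)dst & 3u) == 0);
    memcpy(p, v, 4);
    memcpy(p + 4, v + 4, 4);
}
```

Every double precision argument would need this treatment if the callee could not assume alignment, which is the cost the authors chose to avoid by rejecting such layouts.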

> IBM's compilers warn you about misaligned data but they still make it
> work, slowly in software in the 1960s, fast with hardware since then.
> They had to, they had paying customers with large programs they needed
> to run.

The authors may have not wanted to take any chances with what was
intended as a portable compiler across a wide range of architectures.

Its lineal descendant, f2c, which uses C as a portable assembler,
still has it. gfortran warns about the same code:

$ cat common.f
      INTEGER A
      DOUBLE PRECISION B
      COMMON /FOO/ A, B
      END
$ f2c common.f
common.f:
MAIN:
Error processing common blocks before line 4 of common.f: Declaration error for b: common alignment
$ gfortran common.f
common.f:3:18:

3 | COMMON /FOO/ A, B
| 1
Warning: Padding of 4 bytes required before ‘b’ in COMMON ‘foo’ at (1); reorder elements or use ‘-fno-align-commons’ [-Walign-commons]

Re: What did it cost the 8086 to support unaligned access?

<u89e7b$9l3b$2@newsreader4.netcologne.de>

https://news.novabbs.org/devel/article-flat.php?id=33082&group=comp.arch#33082

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoenig@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 16:24:11 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <u89e7b$9l3b$2@newsreader4.netcologne.de>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jul 2023 16:24:11 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-15fa-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:15fa:0:7285:c2ff:fe6c:992d";
logging-data="316523"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)
 by: Thomas Koenig - Fri, 7 Jul 2023 16:24 UTC

Timothy McCaffrey <timcaffrey@aol.com> schrieb:
> On Wednesday, July 5, 2023 at 7:48:17 PM UTC-4, EricP wrote:
>> And then the Stanford MIPS came along, believed this, eliminated unaligned
>> loads and stores, and found out that while compilers may not issue these
>> humans do it *a lot*. Faced with the prospect of rewriting lots of code
>> to suit their processor they chose to add it back in for their first
>> commercial version, the MIPS R2000.
>>
>> And then the Alpha 21064 came along, believed this, eliminated unaligned
>> loads and stores, and found out humans do this *a lot*.
>> Then DEC lied it caused problems, blamed the humans, claimed the code was
>> broken to begin with (it wasn't), quietly published manuals on how to
>> rewrite code to suit their processor, and finally added it back in again
>> claiming it was to support Windows. By which time they had, in my opinion,
>> driven away Alpha's market due to incompatibility and it never recovered.
>
>
> What is not understood, I think, is that you cannot always
> dictate where the data comes from.

>
> If you get a raw network packet (complete with Ethernet header)
> you can get all sorts of miscellaneous bytes thrown in that screws
> up alignment.

That is the first (for me) really convincing use case for supporting
misaligned data.

Count me in as a believer :-)
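
To make the packet case concrete, here is a minimal sketch in C. It assumes an untagged Ethernet II frame carrying IPv4 and a buffer that starts on a 4-byte boundary. The 14-byte Ethernet header then puts the IPv4 source address at byte offset 26, which is 2 mod 4. The helper names (read_be32, frame_ipv4_src) are made up for illustration and do not come from any real network stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: untagged Ethernet II frame carrying IPv4. */
enum {
    ETH_HEADER_LEN  = 14,  /* 6 dst MAC + 6 src MAC + 2 EtherType */
    IPV4_SRC_OFFSET = 12   /* source address offset inside the IPv4 header */
};

/* Read a big-endian (network order) 32-bit value one byte at a time.
   No wide load is performed, so this works at any alignment. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: IPv4 source address of a frame, in host order.
   Returns 0 if the frame is too short to hold the field. */
static uint32_t frame_ipv4_src(const unsigned char *frame, size_t len)
{
    size_t off = ETH_HEADER_LEN + IPV4_SRC_OFFSET;  /* 26, i.e. 2 mod 4 */
    if (len < off + 4)
        return 0;
    return read_be32(frame + off);
}
```

The byte-wise read is the portable fallback. On a machine with cheap misaligned loads, the same field could be fetched with one 32-bit load plus a byte swap on little-endian hosts.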

Re: What did it cost the 8086 to support unaligned access?

<u89g5h$1cqoq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33083&group=comp.arch#33083

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: cr88192@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 11:57:15 -0500
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <u89g5h$1cqoq$1@dont-email.me>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<u89e7b$9l3b$2@newsreader4.netcologne.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jul 2023 16:57:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="0ff97adf131098daf00a65fe5249808f";
logging-data="1469210"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/PkGi7mbk7ElAVg0CWzNW7"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:0fBoS+IPxQAk2PIGXUFl1iEJj0g=
In-Reply-To: <u89e7b$9l3b$2@newsreader4.netcologne.de>
Content-Language: en-US
 by: BGB - Fri, 7 Jul 2023 16:57 UTC

On 7/7/2023 11:24 AM, Thomas Koenig wrote:
> Timothy McCaffrey <timcaffrey@aol.com> schrieb:
>> On Wednesday, July 5, 2023 at 7:48:17 PM UTC-4, EricP wrote:
>>> And then the Stanford MIPS came along, believed this, eliminated unaligned
>>> loads and stores, and found out that while compilers may not issue these
>>> humans do it *a lot*. Faced with the prospect of rewriting lots of code
>>> to suit their processor they chose to add it back in for their first
>>> commercial version, the MIPS R2000.
>>>
>>> And then the Alpha 21064 came along, believed this, eliminated unaligned
>>> loads and stores, and found out humans do this *a lot*.
>>> Then DEC lied it caused problems, blamed the humans, claimed the code was
>>> broken to begin with (it wasn't), quietly published manuals on how to
>>> rewrite code to suit their processor, and finally added it back in again
>>> claiming it was to support Windows. By which time they had, in my opinion,
>>> driven away Alpha's market due to incompatibility and it never recovered.
>>
>>
>> What is not understood, I think, is that you cannot always
>> dictate where the data comes from.
>
>>
>> If you get a raw network packet (complete with Ethernet header)
>> you can get all sorts of miscellaneous bytes thrown in that screws
>> up alignment.
>
> That is the first (for me) really convincing use case for supporting
> misaligned data.
>
> Count me in as a believer :-)

I also like to have a "not painfully slow" LZ4 implementation and similar...
Or, Huffman decoding can also be made faster when one has misaligned access.
....
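
A rough sketch of why LZ-style decoders like misaligned access: literal runs start at arbitrary byte offsets, and copying them in 8-byte steps is only cheap if the loads and stores do not care about alignment. The function name wild_copy8 and the slack requirement below are assumptions for illustration. This is not code from any particular LZ4 implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes in 8-byte steps.
   May read up to 7 bytes past src+len and write up to 7 bytes past dst+len,
   so both buffers need that much slack. The regions must not overlap. */
static void wild_copy8(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;
    for (i = 0; i < len; i += 8) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);  /* 8-byte load, any alignment  */
        memcpy(dst + i, &chunk, sizeof chunk);  /* 8-byte store, any alignment */
    }
}
```

On a target with hardware misaligned access, a compiler will typically turn each fixed-size memcpy into a single load or store. On a strict-alignment target the same source falls back to byte operations, which is where the "painfully slow" comes from.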

Though, as noted, my case is hybrid as there are still some cases where
alignment matters.

But, these are more motivated as a case of:
Well, extract/insert logic that operates on a 128-bit block is cheaper
than logic that operates on a 256-bit block (and the 128-bit Load/Store
case is essentially side-stepping the normal logic to extract or insert
a value).

Well, excluding MMIO, which generally only supports aligned access and
at particular widths (usually 32 or 64 bit). But, the idea is that MMIO
is not normal memory, so such a restriction seems sane.

In my case, there is an "__unaligned" modifier for pointers, but
practically it only really makes much difference for 128-bit types, and
hints for the compiler to use a pair of loads/stores in these cases.

....

Re: What did it cost the 8086 to support unaligned access?

<168875049023.17324.14559715909020368064@media.vsta.org>

https://news.novabbs.org/devel/article-flat.php?id=33085&group=comp.arch#33085

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: vandys@vsta.org (Andy Valencia)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 07 Jul 2023 10:21:30 -0700
Lines: 15
Message-ID: <168875049023.17324.14559715909020368064@media.vsta.org>
References: <u89e7b$9l3b$2@newsreader4.netcologne.de> <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de> <1jnpM.980$8Ma1.956@fx37.iad> <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
X-Trace: individual.net xuYvT0iRa8EzQZhiB4IzHAnaMTU+eQWVI4pBPYXR1E5a2dLrrg
X-Orig-Path: media
Cancel-Lock: sha1:IdxnUA+KvV0groBzYoL6wOJ2L5E= sha256:H4qaIrIYEtbB2YMW74SDoOG44eKM3g22ThfdH8aoKPo=
User-Agent: rn.py v0.0.1
 by: Andy Valencia - Fri, 7 Jul 2023 17:21 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
> > If you get a raw network packet (complete with Ethernet header)
> > you can get all sorts of miscellaneous bytes thrown in that screws
> > up alignment.
> That is the first (for me) really convincing use case for supporting
> misaligned data.

The first version of L2TP had a header which could be quite compact. I'm
told that later versions of the protocol shed much of this so as to let
silicon do the processing with less fuss. Fiddly bits and bytes, especially
with optional presence, make life harder for silicon.

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

Re: What did it cost the 8086 to support unaligned access?

<2023Jul7.190026@mips.complang.tuwien.ac.at>

https://news.novabbs.org/devel/article-flat.php?id=33086&group=comp.arch#33086

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 07 Jul 2023 17:00:26 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 43
Message-ID: <2023Jul7.190026@mips.complang.tuwien.ac.at>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de> <1jnpM.980$8Ma1.956@fx37.iad> <e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
Injection-Info: dont-email.me; posting-host="e0289143d6c7d567f3107fd18072655e";
logging-data="1475699"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19eLHjoS3PQth8LMUQOj8rr"
Cancel-Lock: sha1:ODtnomcxM8QiNehmyvSUinCLObg=
X-newsreader: xrn 10.11
 by: Anton Ertl - Fri, 7 Jul 2023 17:00 UTC

Timothy McCaffrey <timcaffrey@aol.com> writes:
>If you get a raw network packet (complete with Ethernet header) you can get
>all sorts of miscellaneous bytes thrown in that screws up alignment. Yes, you
>can copy things so they are aligned again, and how is that more efficient
>than just allowing un-aligned accesses? Yes, I have dealt with this on a
>MIPS. Functions that get a pointer to an address field, they have to copy it
>one byte at a time because who knows what alignment that pointer has?

On MIPS you can use ulw (which expands to lwl and lwr) for 4-byte
access and similar pseudo-instructions for 2-byte and 8-byte accesses.

In C you write stuff like

int v;
memcpy(&v, p, 4);

and a modern compiler should compile this to ulw on MIPS. Whether it
does now, and whether it did at the time when the programmer you refer
to wrote his code is beyond my knowledge. I have certainly seen cases
where the result of this technique is worse than what one would wish
for.
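
Spelled out as a self-contained sketch, the idiom looks like the following. The names load_u32 and store_u32 are made up for illustration, and the use of uint32_t and sizeof instead of int and 4 is an assumption on top of the fragment above. Which instructions a given compiler emits for it depends on the target and the compiler version.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from an address of unknown alignment.
   The fixed-size memcpy tells the compiler that p may be misaligned;
   it is free to use whatever the target offers for that. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a 32-bit value to an address of unknown alignment. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Unlike casting p to uint32_t * and dereferencing it, this is well-defined C at any alignment. The value is read in host byte order.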

>Even Intel, who should know better, got caught up in this. SSE ops that have
>aligned and unaligned versions. It took until Nehalem for them to realize
>that it wasn't necessary (although I think some of the Atoms still had this
>issue).

In Nehalem and later CPUs SSE loads and stores still have aligned and
unaligned versions. And the SSE load-and-op instructions still
require 16-byte alignment for the load part, in all Intel CPUs (AMD
CPUs have a flag that allows unaligned SSE load-and-op instructions
when set).
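
At the C level the two load/store flavours show up as different intrinsics. The sketch below uses the unaligned forms from <emmintrin.h>. The function name add4_u32 is made up for illustration, and the scalar fallback for non-SSE2 builds is an addition of this sketch rather than anything discussed above.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Add four 32-bit lanes read from a and b and write the sums to out.
   None of the three pointers needs 16-byte alignment. */
static void add4_u32(const void *a, const void *b, void *out)
{
#if defined(__SSE2__)
    /* Unaligned forms (MOVDQU). The aligned forms, _mm_load_si128 and
       _mm_store_si128 (MOVDQA), fault if the address is not a multiple of 16. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Portable fallback for builds without SSE2. */
    uint32_t x[4], y[4];
    int i;
    memcpy(x, a, sizeof x);
    memcpy(y, b, sizeof y);
    for (i = 0; i < 4; i++)
        x[i] += y[i];
    memcpy(out, x, sizeof x);
#endif
}
```

With the explicit unaligned load, the compiler has to keep the load as a separate instruction in legacy SSE code, because it cannot fold it into the add as a memory operand without the alignment guarantee.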

Maybe you are thinking of Sandy Bridge, which introduced AVX, which
allows unaligned load-and-op instructions.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<u89jll$1hhf$1@gal.iecc.com>

https://news.novabbs.org/devel/article-flat.php?id=33087&group=comp.arch#33087

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: johnl@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: misaligned Fortran, What did it cost the 8086 to support
unaligned access?
Date: Fri, 7 Jul 2023 17:57:09 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <u89jll$1hhf$1@gal.iecc.com>
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com> <u89e12$9l3b$1@newsreader4.netcologne.de>
Injection-Date: Fri, 7 Jul 2023 17:57:09 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="50735"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com> <u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com> <u89e12$9l3b$1@newsreader4.netcologne.de>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
 by: John Levine - Fri, 7 Jul 2023 17:57 UTC

According to Thomas Koenig <tkoenig@netcologne.de>:
>> IBM's compilers warn you about misaligned data but they still make it
>> work, slowly in software in the 1960s, fast with hardware since then.
>> They had to, they had paying customers with large programs they needed
>> to run.
>
>The authors may not have wanted to take any chances with what was
>intended as a portable compiler across a wide range of architectures.

As I said, it worked for them. This was a project at Bell Labs that
was never intended to be a commercial project. It was free and if it
didn't do quite what you wanted, well, at the price it wasn't bad.

>Its lineal descendant, f2c, which uses C as a portable assembler,
>still has it. gfortran warns about the same code:

Once that kind of thing is in the compiler, it never goes away.

Are there modern machines where it makes a significant difference? It
is my impression that once everything had a cache, the cost of
misalignment got lost in the noise.

I note that the very first machine with a cache, the 360/85, had the
Byte Oriented Operand feature to allow misaligned data, so they
presumably thought it was an overall benefit.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: misaligned Fortran, What did it cost the 8086 to support unaligned access?

<75b17f5d-98c2-4af4-8c87-749b1d1c3914n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=33088&group=comp.arch#33088

X-Received: by 2002:a05:620a:4720:b0:767:3541:413b with SMTP id bs32-20020a05620a472000b007673541413bmr45622qkb.1.1688757056545;
Fri, 07 Jul 2023 12:10:56 -0700 (PDT)
X-Received: by 2002:a17:90a:8913:b0:263:ef7:c4b9 with SMTP id
u19-20020a17090a891300b002630ef7c4b9mr4511599pjn.6.1688757055925; Fri, 07 Jul
2023 12:10:55 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.cmpublishers.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 7 Jul 2023 12:10:55 -0700 (PDT)
In-Reply-To: <u89jll$1hhf$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:9856:90d0:2967:a029;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:9856:90d0:2967:a029
References: <b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u894ir$9e6s$1@newsreader4.netcologne.de> <u89cig$2dm4$1@gal.iecc.com>
<u89e12$9l3b$1@newsreader4.netcologne.de> <u89jll$1hhf$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <75b17f5d-98c2-4af4-8c87-749b1d1c3914n@googlegroups.com>
Subject: Re: misaligned Fortran, What did it cost the 8086 to support
unaligned access?
From: MitchAlsup@aol.com (MitchAlsup)
Injection-Date: Fri, 07 Jul 2023 19:10:56 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3444
 by: MitchAlsup - Fri, 7 Jul 2023 19:10 UTC

On Friday, July 7, 2023 at 12:57:13 PM UTC-5, John Levine wrote:
> According to Thomas Koenig <tko...@netcologne.de>:
> >> IBM's compilers warn you about misaligned data but they still make it
> >> work, slowly in software in the 1960s, fast with hardware since then.
> >> They had to, they had paying customers with large programs they needed
> >> to run.
> >
> >The authors may not have wanted to take any chances with what was
> >intended as a portable compiler across a wide range of architectures.
> As I said, it worked for them. This was a project at Bell Labs that
> was never intended to be a commercial project. It was free and if it
> didn't do quite what you wanted, well, at the price it wasn't bad.
> >Its lineal descendant, f2c, which uses C as a portable assembler,
> >still has it. gfortran warns about the same code:
> Once that kind of thing is in the compiler, it never goes away.
>
> Are there modern machines where it makes a significant difference? It
> is my impression that once everything had a cache, the cost of
> misalignment got lost in the noise.
>
> I note that the very first machine with a cache, the 360/85, had the
> Byte Oriented Operand feature to allow misaligned data, so they
> presumably thought it was an overall benefit.
<
Looking back at all the arguments I have heard about misalignment
over my time {1970s-present} it occurs to me that those who continue
to argue for slim designs are exactly the same people who would argue
that incorrect rounding on FP is also acceptable (CDC-quality FP).
<
It costs so little more to take a big problem off {the compiler, the
operating system, the libraries, the customer surprise function}
that whatever incentive remains in avoidance of doing the right
thing is penny wise and pound foolish.
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

Re: What did it cost the 8086 to support unaligned access?

<u89oj9$1dqui$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33089&group=comp.arch#33089

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 21:21:12 +0200
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <u89oj9$1dqui$1@dont-email.me>
References: <u89e7b$9l3b$2@newsreader4.netcologne.de>
<b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 Jul 2023 19:21:13 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b058918ea28800eb94464d3dcfc49834";
logging-data="1502162"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19iOMYKcU7dyEd/wZS46u0RgS+QP2wMuWHoPQK3uOIq+A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.16
Cancel-Lock: sha1:C6nKwrSbr3WIFWJNdI/Tdkw3MNk=
In-Reply-To: <168875049023.17324.14559715909020368064@media.vsta.org>
 by: Terje Mathisen - Fri, 7 Jul 2023 19:21 UTC

Andy Valencia wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
>>> If you get a raw network packet (complete with Ethernet header)
>>> you can get all sorts of miscellaneous bytes thrown in that screws
>>> up alignment.
>> That is the first (for me) really convincing use case for supporting
>> misaligned data.
>
> The first version of L2TP had a header which could be quite compact. I'm
> told that later versions of the protocol shed much of this so as to let
> silicon do the processing with less fuss. Fiddly bits and bytes, especially
> with optional presence, make life harder for silicon.

Andy, I would argue that this holds even more for software!

I.e. I am pretty sure Mitch would agree that HW is great for doing
multi-way decisions, which is exactly what software finds to be the hardest.

It is not a coincidence that a HW h264 decoder is both faster and uses
far less power than a software ditto.

This was still true after I had shown Intel how to double the speed of
their reference implementation; instead of implementing my ideas, they
licensed a chunk of VLSI to do the same in hardware.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: What did it cost the 8086 to support unaligned access?

<u89r8q$1e5ro$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=33090&group=comp.arch#33090

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: sfuld@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: What did it cost the 8086 to support unaligned access?
Date: Fri, 7 Jul 2023 13:06:48 -0700
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <u89r8q$1e5ro$1@dont-email.me>
References: <u89e7b$9l3b$2@newsreader4.netcologne.de>
<b2711000-2ce5-4c51-b44d-665e97a4c488n@googlegroups.com>
<u84l35$1aq4$1@gal.iecc.com> <u84lqq$6ioc$2@newsreader4.netcologne.de>
<1jnpM.980$8Ma1.956@fx37.iad>
<e3cfa786-fb8c-4433-8841-1c407a1a94d1n@googlegroups.com>
<168875049023.17324.14559715909020368064@media.vsta.org>
<u89oj9$1dqui$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 7 Jul 2023 20:06:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="edde5a95b37b71f977185ff1cd755633";
logging-data="1513336"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//TWpjG6nfwOJinzi4YtbwlgFhyur9Gi4="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.12.0
Cancel-Lock: sha1:NgulcymxmGOMsBXNRafow72SyQ8=
Content-Language: en-US
In-Reply-To: <u89oj9$1dqui$1@dont-email.me>
 by: Stephen Fuld - Fri, 7 Jul 2023 20:06 UTC

On 7/7/2023 12:21 PM, Terje Mathisen wrote:

snip

> It is not a coincidence that a HW h264 decoder is both faster and uses
> far less power than a software ditto.

While that is certainly true, I think it is an example of the general
principle that, above some minimum level of complexity (and below
another, much higher level), almost any task can be done faster with
lower power in hardware than in software. The issue for designers is
whether the particular task happens often enough and saves enough in a
particular workload to justify putting it into hardware. Different
design teams make different decisions for their needs.

For example, at one end, most designs these days put floating point into
hardware. In the middle, some designs put things like encryption in
hardware, others don't. And some choose an intermediate solution with
hardware encryption assist instructions. At the other end, AFAIK, only
IBM puts sort assist in hardware.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

