devel / comp.lang.c / Re: A Famous Security Bug


Re: A Famous Security Bug

<utme8b$3jtip$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52373&group=comp.lang.c#52373
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 11:26:03 +0000
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <utme8b$3jtip$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Mar 2024 11:26:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3148282f06fd82dfc3c7be70be8dd926";
logging-data="3798617"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Lf4m4Fa8OrGnbvOGRb+5q"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:5v+OpZB2yAOMEG0DyBxOJxORqgs=
In-Reply-To: <utm06k$3glqc$1@dont-email.me>
Content-Language: en-GB
 by: bart - Sat, 23 Mar 2024 11:26 UTC

On 23/03/2024 07:26, James Kuyper wrote:
> bart <bc@freeuk.com> writes:
>> On 22/03/2024 17:14, James Kuyper wrote:
> [...]
>>> If you want to tell a system not only what a program must do, but
>>> also how it must do it, you need to use a lower-level language than
>>> C.
>>
>> Which one?
>
> That's up to you. The point is, C is NOT that language.

I'm asking which /mainstream/ HLL is lower level than C. So specifically
ruling out assembly.

If there is no such choice, then this is the problem: it has to be C or
nothing.

>> I don't think anyone seriously wants to switch to assembly for the
>> sort of tasks they want to use C for.
>
> Why not? Assembly provides the kind of control you're looking for; C
> does not. If that kind of control is important to you, you have to find
> a language which provides it. If not assembler or C, what would you use?

Among non-mainstream ones, my own would fit the bill. Since I write the
implementations, I can ensure the compiler doesn't have a mind of its own.

However if somebody else tried to implement it, then I can't guarantee
the same behaviour. This would need to somehow be enforced with a
precise language spec, or mine would need to be a reference
implementation with a lot of test cases.

-----------------

Take this program:

#include <stdio.h>
int main(void) {
    goto L;
    0x12345678;
L:
    printf("Hello, World!\n");
}

If I use my compiler, then that 12345678 pattern gets compiled into the
binary (because it is loaded into a register then discarded). That means
I can use that value as a marker or sentinel which can be searched for.

However no other compiler I tried will do that. If I instead change that
line to:

int a = 0x12345678;

then a tcc-compiled binary will contain that value. So will
lccwin32-compiled (with a warning). But not DMC or gcc.

If I get rid of the 'goto' , then gcc-O0 will work, but still not DMC or
gcc-O3.

Here I can use `volatile` to ensure that value stays in, but not if I
put the 'goto' back in!

It's all too unpredictable.
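
One approach that tends to depend less on optimisation level is to give the marker static storage duration and read it through a volatile-qualified lvalue. The following is a minimal sketch, not code from the thread: the names `build_marker` and `read_marker` are hypothetical. The C standard says nothing about what bytes end up in a binary, so this relies on how compilers usually implement volatile reads (the initialised object has to exist so that it can be loaded at run time) rather than on any guarantee.

```c
/* Hypothetical marker object. Because it is volatile, the compiler cannot
   assume its value and normally has to keep the initialised object in the
   program image so that it can be loaded at run time. */
static const volatile unsigned int build_marker = 0x12345678u;

/* Hypothetical accessor. The program must actually call it somewhere;
   an unreferenced function and its data may still be discarded,
   for example by link-time garbage collection. */
unsigned int read_marker(void)
{
    return build_marker;   /* volatile read, performed as written */
}
```

If the marker is kept, it appears in the target's byte order, so a search on a little-endian machine would look for the bytes 78 56 34 12.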

Re: A Famous Security Bug

<utmst2$3n7mv$2@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52377&group=comp.lang.c#52377
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 16:36:02 +0100
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <utmst2$3n7mv$2@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Mar 2024 15:36:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="78504154b5b778976e0d4e96fcf6a85e";
logging-data="3907295"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+DK2NnbC6xz9lnFvZFJZ/4HaorYL/x1uY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:WHybhTkKRGPyrzAx+fQqXWmige0=
In-Reply-To: <20240322123323.805@kylheku.com>
Content-Language: en-GB
 by: David Brown - Sat, 23 Mar 2024 15:36 UTC

On 22/03/2024 20:43, Kaz Kylheku wrote:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Is the "call" instruction *observable behavior* as defined in 5.1.2.3?
>

>> Running a program under a test harness is effectively running a
>> different program. Of course it can yield information about the
>> original program, but in effect you're linking the program with a
>> different set of libraries.
>
> It's a different program, but the retained translation unit must be the
> same, except that the external references it makes are resolved to
> different entities.

That is true - /if/ you make the restriction that the translation unit
is compiled completely to linkable machine code or assembly, and that it
is not changed in any way when it is combined into the new program.
Such a setup is common in practice, but it is in no way required by the
C standards and does not apply for more advanced compilation and build
scenarios.

>
> If in one program we have an observable behavior which implies that a
> call took place (that itself not being directly observable, by
> definition, I again acknowledge) then under the same conditions in
> another program, that call also has to take place, by the fact that the
> translation unit has not changed.

Yes - again, /if/ you restrict your tools and build processes to make
this true. (And though the call may still be there, it is still not
observable behaviour, and it may no longer lead to any observable
behaviour in the new program.)

Basically, what you are saying is that if you have a compiler and build
system that compiles individual translation units into fixed individual
object files of linkable machine code, and these units are not
recompiled when you link them again in new programs, then the machine
code for the externally linked functions defined in those translation
units is not changed.

I don't think anyone will argue with that - it is quite solid, and does
not come as news to anybody familiar with compilers and build processes.

The thing you get wrong is believing that the C standards require such a
compiler and build system. They don't - and thus all your beliefs
(about interaction across translation units) which depend on such a
requirement, fall apart.

Re: A Famous Security Bug

<20240323085544.1@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=52378&group=comp.lang.c#52378
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 16:06:34 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <20240323085544.1@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
<20240322083037.20@kylheku.com> <utkgd2$32aj7$1@dont-email.me>
<wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Mar 2024 16:06:34 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bc8ead67574eda43cc8acb80cc4a36a2";
logging-data="3926330"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19D4IiLDlMKPlmV1RbUgDeU18Yz6AVmGGI="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:jukafmldlcFSJXR04xl5LEG2bI4=
 by: Kaz Kylheku - Sat, 23 Mar 2024 16:06 UTC

On 2024-03-23, Richard Kettlewell <invalid@invalid.invalid> wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> I have tried to explain the reality of what the C standards say in a
>> couple of posts (including one that I had not posted before you wrote
>> this one). I have tried to make things as clear as possible, and
>> hopefully you will see the point.
>>
>> If not, then you must accept that you interpret the C standards in a
>> different manner from the main compiler vendors, as well as some "big
>> names" in this group. That is, of course, not proof in itself - but
>> you must realise that for practical purposes you need to be aware of
>> how others interpret the standard, both for your own coding and for
>> the advice or recommendations you give to others.
>
> Agreed that the ship has sailed on whether LTO is a valid optimization.

There is no question that LTO is a "valid" optimization for reasonable
definitions of valid.

> But it’s understandable why someone might reach a different conclusion.

That alone is a problem.

> - Phase 7 says the tokens are “semantically analyzed and translated as a
> translation unit”.
>
> - Phase 8 does not use either verb, “analyzed” or “translated”.

That adds up to requirements that are /obviously/ violated by LTO.

Someone might reach a different conclusion simply by reading the
black-and-white text, which obviously spells out what is required.

When reading the standard, you can't just ignore bits you think
are wrong.

It may be the case that a strictly conforming program cannot tell
whether these requirements are violated.

Strictly conforming programs are not the be all and end all of what is
important.

In the academic paradigm of a strictly conforming program, a security
problem of bytes not being nulled out (or any other such thing) does not
exist.

> This would be very easy to address, by replacing “collected” with a word
> or phrase that makes clear that further analysis and translation can
> happen outside the “as a translation unit” context.

No, it's not that easy to address. The standard should make explicit
provisions for LTO. There should be an optional translation phase
between the current 7 and 8 in which translation units may be
partitioned into subsets, and then subjected to semantic analysis
and further translation within the subsets, prior to linking.

The standard wouldn't describe how the partitioning is requested from
the implementation, since it is part of the manner in which a program is
presented to it. All implementations should support a translation mode
in which no partitioning into subsets takes place.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<20240323090700.848@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=52379&group=comp.lang.c#52379
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 16:07:47 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <20240323090700.848@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
Injection-Date: Sat, 23 Mar 2024 16:07:47 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="bc8ead67574eda43cc8acb80cc4a36a2";
logging-data="3926330"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qHBKiggu79cR5pCRL9cEJO/E7j1gBP30="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:ePTVwSMEiByOICDJqhMbW0GiZ5g=
 by: Kaz Kylheku - Sat, 23 Mar 2024 16:07 UTC

On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
> On 22/03/2024 20:43, Kaz Kylheku wrote:
>> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>> Is the "call" instruction *observable behavior* as defined in 5.1.2.3?
>>
>
>
>>> Running a program under a test harness is effectively running a
>>> different program. Of course it can yield information about the
>>> original program, but in effect you're linking the program with a
>>> different set of libraries.
>>
>> It's a different program, but the retained translation unit must be the
>> same, except that the external references it makes are resolved to
>> different entities.
>
> That is true - /if/ you make the restriction that the translation unit
> is compiled completely to linkable machine code or assembly, and that it
> is not changed in any way when it is combined into the new program.
> Such a setup is common in practice, but it is in no way required by the
> C standards and does not apply for more advanced compilation and build
> scenarios.

Well, it's only not required if you hand-wave away the sentences in
section 5.

You can't just do that!

Re: A Famous Security Bug

<utmuqg$3nr3t$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52380&group=comp.lang.c#52380
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 17:08:48 +0100
Organization: A noiseless patient Spider
Lines: 76
Message-ID: <utmuqg$3nr3t$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
<20240322083037.20@kylheku.com> <utkgd2$32aj7$1@dont-email.me>
<wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Mar 2024 16:08:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="78504154b5b778976e0d4e96fcf6a85e";
logging-data="3927165"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19TalsGcRxRwpotHdTkF4xfH79swY5GWKc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ZPpFgquswxzVYxbpeZP2rU7dXaw=
In-Reply-To: <wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk>
Content-Language: en-GB
 by: David Brown - Sat, 23 Mar 2024 16:08 UTC

On 23/03/2024 10:20, Richard Kettlewell wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> I have tried to explain the reality of what the C standards say in a
>> couple of posts (including one that I had not posted before you wrote
>> this one). I have tried to make things as clear as possible, and
>> hopefully you will see the point.
>>
>> If not, then you must accept that you interpret the C standards in a
>> different manner from the main compiler vendors, as well as some "big
>> names" in this group. That is, of course, not proof in itself - but
>> you must realise that for practical purposes you need to be aware of
>> how others interpret the standard, both for your own coding and for
>> the advice or recommendations you give to others.
>
> Agreed that the ship has sailed on whether LTO is a valid optimization.
> But it’s understandable why someone might reach a different conclusion.

I /do/ understand why Kaz thinks the way he does. I am just trying to
show that his interpretation is wrong, so that he can better understand
what is going on, and how to get the behaviour he wants.

>
> - Phase 7 says the tokens are “semantically analyzed and translated as a
> translation unit”.
>
> - Phase 8 does not use either verb, “analyzed” or “translated”.
>
> - At least two steps (in the abstract, as-if model) are explicitly
> happening in the “as a translation unit” level but not in any wider
> context.
>
> - The result of those two steps (“translator output”) is then
> “collected”.
>
> - Unless you somehow understand that “collected” implicitly includes
> further analysis and translation, it does not seem unnatural to
> conclude that many of the whole-program optimizations done by LTO
> implementations would be outside the spec.
>
> This would be very easy to address, by replacing “collected” with a word
> or phrase that makes clear that further analysis and translation can
> happen outside the “as a translation unit” context.
>

I would be entirely happy to see clearer wording in the standards here,
or at least some footnotes saying what is allowed or not allowed.

> Obviously this would violate the principle from the rationale that
> existing code (that uses TU boundaries to get memset to “work”) is
> important and existing implementations (LTO) are not, but C
> standardization has never actually behaved as if that is true anyway.
>

Oh, I think the C standards committee have done quite well at that. But
doing it /completely/ would clearly be impossible, as different people
have different ideas about how they think C is defined, and how they
think C compilers have to behave. In my line of work, I see plenty of
old code that makes assumptions that are not remotely justified by the C
standards, but which happened to work on the old or limited toolchain
used by the person who wrote the code. If the C standards tried to
codify such practices, or if C compilers tried to make sure that /all/
code that worked with other compilers or older versions works on newer
tools, progress on compilers would be completely stalled and we'd have
no optimisations that weren't already in common use in the 1970's.

What the standards committee try to say is that if code follows C
standard N correctly, then when it is compiled under C standard N+1 it
should have the same semantics and the same behaviour. And they do that
reasonably, but not perfectly.

It would be unreasonable to expect them to guarantee the behaviour of
code under new standards when the code did not have guaranteed behaviour
under the old standards. Using TU boundaries to "get memset to work"
has never been guaranteed.

Re: A Famous Security Bug

<utmvq5$3o50v$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52381&group=comp.lang.c#52381
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 17:25:40 +0100
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <utmvq5$3o50v$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<875xxdzvxj.fsf@nosuchdomain.example.com> <20240322170425.543@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Mar 2024 16:25:41 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="78504154b5b778976e0d4e96fcf6a85e";
logging-data="3937311"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX192XBshDiNSmcK7oUpHcEHlQWRW++iq7dE="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:09mQFwXTZYCnuV6vXvu/mBmKRQo=
Content-Language: en-GB
In-Reply-To: <20240322170425.543@kylheku.com>
 by: David Brown - Sat, 23 Mar 2024 16:25 UTC

On 23/03/2024 01:09, Kaz Kylheku wrote:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> bart <bc@freeuk.com> writes:
>>> On 22/03/2024 17:14, James Kuyper wrote:
>> [...]
>>>> If you want to tell a system not only what a program must do, but
>>>> also how it must do it, you need to use a lower-level language than
>>>> C.
>>>
>>> Which one?
>>
>> Good question.

I have no answer here either.

>>
>>> I don't think anyone seriously wants to switch to assembly for the
>>> sort of tasks they want to use C for.

One of the stated motivations for creating C was to save people from
writing code in assembly!

>>
>> Agreed. What some people seem to be looking for is a language that's
>> about as portable as C, but where every language construct is required
>> to result in generated code that performs the specified operation.
>> There's a lot of handwaving in that description. "C without
>> optimization", maybe?
>>
>> I'm not aware that any such language exists, at least in the mainstream
>> (and I've looked at a *lot* of programming languages). I conclude that
>> there just isn't enough demand for that kind of thing.

I think lack of demand combines with it actually being an extremely
difficult task.

Consider something as simple as "x++;" in C. How could that be
implemented? Perhaps the cpu has an "increment" instruction. Perhaps
it has an "add immediate" instruction. Perhaps it needs to load 1 into
a register, then use an "add" instruction. Perhaps "x" is in memory.
Some cpus can execute an increment directly on the memory address as an
atomic instruction. Some can do so, but only using specific (and more
expensive) instructions. Some can't do it at all without locking
mechanisms and synchronisation loops.

So what does the user of this mythical LLL expect when he/she writes
"x++;" ? If the language had been created in the days of 8086 on DOS,
perhaps it would have been defined as an atomic operation - and now
doing this atomically on an AArch64 device would be extremely inefficient.
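
Standard C since C11 leaves this choice to the programmer instead of fixing one meaning for "x++". The sketch below is illustrative only and assumes a C11 implementation that provides `<stdatomic.h>`. The names `bump_plain`, `bump_atomic` and `bump_relaxed` are hypothetical. Each function increments a counter, but they ask for different guarantees, and the compiler chooses instructions to suit the target.

```c
#include <stdatomic.h>

int plain_counter;          /* ordinary object: no atomicity promised     */
atomic_int atomic_counter;  /* atomic object, zero-initialised (static)   */

/* Plain increment: may compile to "inc", "add immediate", or a
   load/add/store sequence, whatever suits the target. */
int bump_plain(void)
{
    return ++plain_counter;
}

/* Atomic read-modify-write with the default (sequentially consistent)
   ordering. Returns the value after the increment. */
int bump_atomic(void)
{
    return atomic_fetch_add(&atomic_counter, 1) + 1;
}

/* Still atomic, but with relaxed ordering, which can be cheaper on
   weakly ordered targets. */
int bump_relaxed(void)
{
    return atomic_fetch_add_explicit(&atomic_counter, 1,
                                     memory_order_relaxed) + 1;
}
```

Here the cost of atomicity is paid only where it is requested, which is the trade-off a fixed "do what I say" definition of `x++` would have to settle once for every target.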

The big trouble with saying that the compiler should "do what I say" is
that people have very different ideas about what they mean when they
write things. You either have to have quite high-level and abstract
definitions about meanings and give compilers a fair amount of freedom
when implementing them (thus you get high-level languages defined by
behaviours on abstract machines - like C and just about every other
programming language), or you have to tie it tightly to the target
processor (and you get assembly), or the language designer, the compiler
implementers and the programmers all have to think exactly the same way
(which really means one-person languages, like Bart's).

>
> I think you can more or less get something like that with the following
> strategy:
>
> - all memory accesses through pointers are performed as written.
> - local variables are aggressively optimized into registers.
> - basic optimizations:
> - constant folding, dead code elimination.
> - basic control flow ones: jump threading and the like.
> - basic data flow optimizations.
> - peephole, good instruction selection.
>
> In that environment, the way the programmer writes the code is the rest
> of the optimization. Want loop unrolling? Write it yourself.
>

You might like to try to formalise this. You won't be the first to
attempt it. But you might be the first to succeed, because no one
(AFAIK) has managed it so far.

Re: A Famous Security Bug

<utn1a0$3ogob$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52382&group=comp.lang.c#52382
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 17:51:12 +0100
Organization: A noiseless patient Spider
Lines: 125
Message-ID: <utn1a0$3ogob$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me> <utme8b$3jtip$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Mar 2024 16:51:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="78504154b5b778976e0d4e96fcf6a85e";
logging-data="3949323"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qajFyIj7wSLAML8V7iOyhBVamO5AU3NM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Ls7+v8SAyy7e3my4bOwW2O5FrCY=
Content-Language: en-GB
In-Reply-To: <utme8b$3jtip$1@dont-email.me>
 by: David Brown - Sat, 23 Mar 2024 16:51 UTC

On 23/03/2024 12:26, bart wrote:
> On 23/03/2024 07:26, James Kuyper wrote:
>> bart <bc@freeuk.com> writes:
>>> On 22/03/2024 17:14, James Kuyper wrote:
>> [...]
>>>> If you want to tell a system not only what a program must do, but
>>>> also how it must do it, you need to use a lower-level language than
>>>> C.
>>>
>>> Which one?
>>
>> That's up to you. The point is, C is NOT that language.
>
> I'm asking which /mainstream/ HLL is lower level than C. So specifically
> ruling out assembly.
>
> If there is no such choice, then this is the problem: it has to be C or
> nothing.

How much of a problem is it, really?

My field is probably the place where low level programming is most
ubiquitous. There are plenty of people who use assembly - for good
reasons or for bad (or for reasons that some people think are good,
other people think are bad). C is the most common choice.

Other languages used for small systems embedded programming include C++,
Ada, Forth, BASIC, Pascal, Lua, and MicroPython. Forth is the only one
that could be argued as lower-level or more "directly translated" than C.

The trick to writing low-level code in C (or C++) is not to pretend that
C is a "directly translated" language, or to fight with your compiler.
It is to learn how to work /with/ your compiler and its optimisations to
get what you need. Complaining that "LTO broke my code" does not make
your product work. Arbitrarily disabling optimisations that you feel
are "bad" or imagine to be non-conforming is just kicking the can down
the road. You learn what /actually/ works - as guaranteed by the C
standards, or by your compiler.

Sometimes that means using compiler-specific or target-specific
extensions. That's okay. No one ever suggested that pure C-standard C
code was sufficient for all tasks. C was designed to allow some coding
to be done in a highly portable and re-usable manner, and also to
support non-portable systems programming relying on the implementation,
and this has not changed. When I write code for low-level use on a
specific microcontroller, I am not writing portable code anyway.

So what language is lower level than C? GCC C (or clang C, or IAR C for
the 8051, or any other specific C compiler).

How would /I/ ensure that, after "memset(buffer, 0, sizeof(buffer));",
the buffer was really written with zeros? I'd follow it with:

asm ("" : "+m" (buffer));

That's a gcc extension, but it will guarantee that the buffer is cleared
- without any other costs.

(Alternatively, I'd clear the memory using volatile writes, rather than
memset.)
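
A minimal sketch of both approaches, for illustration only. The function names clear_with_barrier and clear_with_volatile are hypothetical, and the empty asm statement is a GCC/Clang extension, so it is guarded by __GNUC__. The sketch uses a "memory" clobber rather than the "+m" operand above, because the buffer is passed by pointer and its size is only known at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clear a buffer, then tell the compiler (GCC/Clang extension) that the
   empty asm statement may read any memory reachable through buf, so the
   preceding stores cannot be discarded as dead. */
static void clear_with_barrier(void *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, 0, len);
#if defined(__GNUC__)
    __asm__ __volatile__("" : : "r"(buf) : "memory");
#endif
}

/* Alternative without extensions: store each byte through a
   volatile-qualified lvalue, which compilers in practice treat as an
   access they must actually perform. */
static void clear_with_volatile(void *buf, size_t len)
{
    assert(buf != NULL);
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

Either function would be called at the point where the plain memset used to be, for example clear_with_barrier(buffer, sizeof buffer). On a compiler without __GNUC__ the first function degrades to a plain memset, so the volatile version is the safer fallback there.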

>
>>> I don't think anyone seriously wants to switch to assembly for the
>>> sort of tasks they want to use C for.
>>
>> Why not? Assembly provides the kind of control you're looking for; C
>> does not. If that kind of control is important to you, you have to find
>> a language which provides it. If not assembler or C, what would you use?
>
> Among non-mainstream ones, my own would fit the bill. Since I write the
> implementations, I can ensure the compiler doesn't have a mind of its own.
>
> However if somebody else tried to implement it, then I can't guarantee
> the same behaviour. This would need to somehow be enforced with a
> precise language spec, or mine would need to be a reference
> implementation with a lot of test cases.
>
>
> -----------------
>
> Take this program:
>
>   #include <stdio.h>
>   int main(void) {
>       goto L;
>       0x12345678;
>   L:
>       printf("Hello, World!\n");
>   }
>
> If I use my compiler, then that 12345678 pattern gets compiled into the
> binary (because it is loaded into a register then discarded). That means
> I can use that value as a marker or sentinel which can be searched for.
>
> However no other compiler I tried will do that. If I instead change that
> line to:
>
>     int a = 0x12345678;
>
> then a tcc-compiled binary will contain that value. So will
> lccwin32-compiled (with a warning). But not DMC or gcc.
>
> If I get rid of the 'goto' , then gcc-O0 will work, but still not DMC or
> gcc-O3.
>
> Here I can use `volatile` to ensure that value stays in, but not if I
> put the 'goto' back in!
>
> It's all too unpredictable.
>

The /minimum/ requirements of the compiler are very predictable. The
details beyond that are not - which is completely as expected. You are
trying to achieve an effect that cannot be expressed in C, and thus it
is folly to expect a simple way to achieve it with any C compiler. You
will find that with many C compilers you can get what you want, but you
have to write it in a way that suits the compiler. For gcc, you might
do it by putting a const variable in an explicit linker section using a
gcc-specific __attribute__. Maybe you can get it by using a volatile
and /not/ removing the "goto".

But if you want to do something that has no semantic meaning in the
language you are using, you can't expect compilers to support a
particular way to achieve this!
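
As a rough sketch of the gcc-specific route mentioned above: the object name build_marker, the section name ".marker" and the helper find_marker are all assumptions made for illustration, and the section syntax shown applies to ELF targets. The "used" attribute asks GCC or Clang to emit the object even though nothing references it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Attribute selection: named section on ELF targets, plain "used" on other
   GCC-compatible targets, nothing elsewhere (where the marker may vanish). */
#if defined(__GNUC__) && defined(__ELF__)
#define MARKER_ATTR __attribute__((used, section(".marker")))
#elif defined(__GNUC__)
#define MARKER_ATTR __attribute__((used))
#else
#define MARKER_ATTR
#endif

/* Hypothetical sentinel value that should survive into the binary. */
static const unsigned int build_marker MARKER_ATTR = 0x12345678u;

/* Search a memory image for the marker's byte pattern (host byte order).
   Returns the offset of the first match, or -1 if there is none. */
static long find_marker(const unsigned char *image, size_t len)
{
    assert(image != NULL);
    for (size_t i = 0; i + sizeof build_marker <= len; i++)
        if (memcmp(image + i, &build_marker, sizeof build_marker) == 0)
            return (long)i;
    return -1;
}
```

Whether the value ends up in the final executable still depends on the toolchain (for instance on linker section garbage collection), which is rather the point being made: the guarantee comes from the implementation, not from the C standard.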

Re: A Famous Security Bug

<bMDLN.127695$zF_1.78843@fx18.iad>

https://news.novabbs.org/devel/article-flat.php?id=52383&group=comp.lang.c#52383

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!news.uni-stuttgart.de!npeer.as286.net!npeer-ng0.as286.net!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx18.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: scott@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: A Famous Security Bug
Newsgroups: comp.lang.c
References: <bug-20240320191736@ram.dialup.fu-berlin.de> <20240320114218.151@kylheku.com> <20240321211306.779b21d126e122556c34a346@gmail.moc> <utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me> <875xxdzvxj.fsf@nosuchdomain.example.com> <20240322170425.543@kylheku.com> <utmvq5$3o50v$1@dont-email.me>
Lines: 18
Message-ID: <bMDLN.127695$zF_1.78843@fx18.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sat, 23 Mar 2024 16:51:19 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sat, 23 Mar 2024 16:51:19 GMT
X-Received-Bytes: 1802
 by: Scott Lurndal - Sat, 23 Mar 2024 16:51 UTC

David Brown <david.brown@hesbynett.no> writes:
>On 23/03/2024 01:09, Kaz Kylheku wrote:
>> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>> bart <bc@freeuk.com> writes:
>>>> On 22/03/2024 17:14, James Kuyper wrote:

>Consider something as simple as "x++;" in C. How could that be
>implemented? Perhaps the cpu has an "increment" instruction. Perhaps
>it has an "add immediate" instruction. Perhaps it needs to load 1 into
>a register, then use an "add" instruction. Perhaps "x" is in memory.
>Some cpus can execute an increment directly on the memory address as an
>atomic instruction. Some can do so, but only using specific (and more
>expensive) instructions. Some can't do it at all without locking
>mechanisms and synchronisation loops.

And some can do it as a side effect of an indirect load, for
example autoindexing on the PDP-8.
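
To make the contrast concrete, here is a small sketch using C11 <stdatomic.h>. The function names are hypothetical. The plain version leaves the instruction choice entirely to the compiler; the atomic version only constrains the observable result (an indivisible read-modify-write), and the implementation still picks whatever the target needs, be it a single instruction or a retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Plain increment: may become "inc", "add immediate", load/add/store, or
   be merged with neighbouring code. It is not atomic. */
static int plain_increment(int *x)
{
    assert(x != NULL);
    (*x)++;
    return *x;
}

/* Atomic increment: atomic_fetch_add returns the previous value, so the
   new value is that result plus one. */
static int atomic_increment(atomic_int *x)
{
    assert(x != NULL);
    return atomic_fetch_add(x, 1) + 1;
}
```

Neither form says /how/ the increment is done, only what must be true afterwards.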

Re: A Famous Security Bug

<20240323094244.435@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=52384&group=comp.lang.c#52384

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 16:56:09 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <20240323094244.435@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me>
<20240322083037.20@kylheku.com> <utkgd2$32aj7$1@dont-email.me>
<wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk> <utmuqg$3nr3t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="bc8ead67574eda43cc8acb80cc4a36a2";
logging-data="3951151"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/fBayGwq0RgwUECyXmlNlNAQFWSMKj6w="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:6d5JsyeoQ3sQJRSgyAUNP10XWFM=
 by: Kaz Kylheku - Sat, 23 Mar 2024 16:56 UTC

On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
> On 23/03/2024 10:20, Richard Kettlewell wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> I have tried to explain the reality of what the C standards say in a
>>> couple of posts (including one that I had not posted before you wrote
>>> this one). I have tried to make things as clear as possible, and
>>> hopefully you will see the point.
>>>
>>> If not, then you must accept that you interpret the C standards in a
>>> different manner from the main compiler vendors, as well as some "big
>>> names" in this group. That is, of course, not proof in itself - but
>>> you must realise that for practical purposes you need to be aware of
>>> how others interpret the standard, both for your own coding and for
>>> the advice or recommendations you give to others.
>>
>> Agreed that the ship has sailed on whether LTO is a valid optimization.
>> But it’s understandable why someone might reach a different conclusion.
>
> I /do/ understand why Kaz thinks the way he does. I am just trying to
> show that his interpretation is wrong, so that he can better understand
> what is going on, and how to get the behaviour he wants.

I'm just looking at what very plain, simple sentences are saying and
taking it as-is.

> I would be entirely happy to see clearer wording in the standards here,
> or at least some footnotes saying what is allowed or not allowed.

The wording isn't unclear in any way, though.

What is needed is equally clear new wording which acknowledges the LTO
model of program construction that is currently not described.

That could be done without changing any of the existing wording.
A new translation phase could be wedged between 7 and 8 stating
that translation units may be optionally partitioned into subsets,
and those subsets subject to further semantic analysis and translation,
resulting in merged translation units.

The standard currently presents a reference model that is squarely based
on traditional technology.

If you read the Rationale for C89, mostly they were concerned with how
different models of linkage treat multiply defined identifiers, and
worked out a common specification that allows programs to be portable
among those different linkage models.

Ideas like LTO were not on the radar.

> It would be unreasonable to expect them to guarantee the behaviour of
> code under new standards when the code did not have guaranteed behaviour
> under the old standards. Using TU boundaries to "get memset to work"
> has never been guaranteed.

memset is part of the language. It doesn't have to be a function
in another translation unit that is reached via external linkage.
The inclusion of <string.h> can bring in an inline or at least static
definition. Compilers have treated memset as if it were a built-in
primitive. That is justified. It is not part of my topic about LTO.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<utn57t$3pbh7$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52386&group=comp.lang.c#52386

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 18:58:21 +0100
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <utn57t$3pbh7$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="78504154b5b778976e0d4e96fcf6a85e";
logging-data="3976743"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18RyOAXbeGizPfUDMXY7MfSXD9WkPn9vbs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:EHOkaRU5/JqZzCamYRcTvPnehVk=
Content-Language: en-GB
In-Reply-To: <20240323090700.848@kylheku.com>
 by: David Brown - Sat, 23 Mar 2024 17:58 UTC

On 23/03/2024 17:07, Kaz Kylheku wrote:
> On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
>> On 22/03/2024 20:43, Kaz Kylheku wrote:
>>> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>>> Is the "call" instruction *observable behavior* as defined in 5.1.2.3?
>>>
>>
>>
>>>> Running a program under a test harness is effectively running a
>>>> different program. Of course it can yield information about the
>>>> original program, but in effect you're linking the program with a
>>>> different set of libraries.
>>>
>>> It's a different program, but the retained translation unit must be the
>>> same, except that the external references it makes are resolved to
>>> different entities.
>>
>> That is true - /if/ you make the restriction that the translation unit
>> is compiled completely to linkable machine code or assembly, and that it
>> is not changed in any way when it is combined into the new program.
>> Such a setup is common in practice, but it is in no way required by the
>> C standards and does not apply for more advanced compilation and build
>> scenarios.
>
> Well, it's only not required if you hand-wave away the sentences in
> section 5.
>
> You can't just do that!

And it is only required if you read between the lines in section 5 and
see things that simply are not there. You can't just do that!

I believe we are at an impasse here, unless someone can think of a new
point to make.

One thing I would ask before leaving this - could you take a look at the
latest draft for the next C standard after C23?

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>

Look at the definitions of the "reproducible" and "unsequenced" function
type attributes in 6.7.13.8. In particular, look at the leeway
explicitly given to the compiler for re-arranging code in 6.7.13.8.3p6
and similar examples. Consider how that fits (or fails to fit) with
your interpretation of the translation phases in section 5.

Re: A Famous Security Bug

<86wmpskcth.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=52388&group=comp.lang.c#52388

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 11:44:42 -0700
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <86wmpskcth.fsf@linuxsc.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de> <20240320114218.151@kylheku.com> <20240321211306.779b21d126e122556c34a346@gmail.moc> <utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me> <utm06k$3glqc$1@dont-email.me> <utme8b$3jtip$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="6af46d5f1415f729bb6ec55b5ea784b7";
logging-data="4000316"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ohsUKAkmgchi5CkFt7kKadHuJ9nCJmLw="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:TpwePTFs7Amp/2RJxf9/QVr8g7Q=
sha1:vdNjdEuX3BLpA7sJCAHIPHzp5Gw=
 by: Tim Rentsch - Sat, 23 Mar 2024 18:44 UTC

bart <bc@freeuk.com> writes:

> On 23/03/2024 07:26, James Kuyper wrote:
>
>> bart <bc@freeuk.com> writes:
>>
>>> On 22/03/2024 17:14, James Kuyper wrote:
>>
>> [...]
>>
>>>> If you want to tell a system not only what a program must do, but
>>>> also how it must do it, you need to use a lower-level language than
>>>> C.
>>>
>>> Which one?
>>
>> That's up to you. The point is, C is NOT that language.
>
> I'm asking which /mainstream/ HLL is lower level than C. So
> specifically ruling out assembly.
>
> If there is no such choice, then this is the problem: it has to be C
> or nothing.

If it has to be C or nothing, then it's nothing. Some people might
not like that, but that's the way it is.

Re: A Famous Security Bug

<utnca0$3r5uk$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52391&group=comp.lang.c#52391

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 19:58:56 +0000
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <utnca0$3r5uk$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<875xxdzvxj.fsf@nosuchdomain.example.com> <20240322170425.543@kylheku.com>
<utmvq5$3o50v$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Mar 2024 19:58:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3148282f06fd82dfc3c7be70be8dd926";
logging-data="4036564"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX181BmxRPxWyTfA4RWX4YLzA"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:gl/DGBn6XgF3QMBvnoJUJkVXNv4=
Content-Language: en-GB
In-Reply-To: <utmvq5$3o50v$1@dont-email.me>
 by: bart - Sat, 23 Mar 2024 19:58 UTC

On 23/03/2024 16:25, David Brown wrote:
> On 23/03/2024 01:09, Kaz Kylheku wrote:
>> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

>>> I'm not aware that any such language exists, at least in the mainstream
>>> (and I've looked at a *lot* of programming languages).  I conclude that
>>> there just isn't enough demand for that kind of thing.
>
> I think lack of demand combines with it actually being an extremely
> difficult task.
>
> Consider something as simple as "x++;" in C.  How could that be
> implemented?  Perhaps the cpu has an "increment" instruction.  Perhaps
> it has an "add immediate" instruction.  Perhaps it needs to load 1 into
> a register, then use an "add" instruction.  Perhaps "x" is in memory.
> Some cpus can execute an increment directly on the memory address as an
> atomic instruction.  Some can do so, but only using specific (and more
> expensive) instructions.  Some can't do it at all without locking
> mechanisms and synchronisation loops.
>
> So what does this user of this mythical LLL expect when he/she writes
> "x++;" ?

This is not the issue that comes up in the OP (or the issue that was
assumed as I don't think the OP has clarified).

There it is not about micro-managing the implementation of x++, but the
compiler deciding it isn't needed at all.

Re: A Famous Security Bug

<utnh5m$3sdhk$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52393&group=comp.lang.c#52393

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc@freeuk.com (bart)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 21:21:58 +0000
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <utnh5m$3sdhk$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me> <utme8b$3jtip$1@dont-email.me>
<utn1a0$3ogob$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Mar 2024 21:21:58 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3148282f06fd82dfc3c7be70be8dd926";
logging-data="4077108"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/TqBcrhwpMGlyteQEF1uNo"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Xu5TNo1OXZ/6eAohocVaankPGmE=
Content-Language: en-GB
In-Reply-To: <utn1a0$3ogob$1@dont-email.me>
 by: bart - Sat, 23 Mar 2024 21:21 UTC

On 23/03/2024 16:51, David Brown wrote:
> On 23/03/2024 12:26, bart wrote:
>> On 23/03/2024 07:26, James Kuyper wrote:
>>> bart <bc@freeuk.com> writes:
>>>> On 22/03/2024 17:14, James Kuyper wrote:
>>> [...]
>>>>> If you want to tell a system not only what a program must do, but
>>>>> also how it must do it, you need to use a lower-level language than
>>>>> C.
>>>>
>>>> Which one?
>>>
>>> That's up to you. The point is, C is NOT that language.
>>
>> I'm asking which /mainstream/ HLL is lower level than C. So
>> specifically ruling out assembly.
>>
>> If there is no such choice, then this is the problem: it has to be C
>> or nothing.
>
> How much of a problem is it, really?
>
> My field is probably the place where low level programming is most
> ubiquitous.  There are plenty of people who use assembly - for good
> reasons or for bad (or for reasons that some people think are good,
> other people think are bad).  C is the most common choice.
>
> Other languages used for small systems embedded programming include C++,
> Ada, Forth, BASIC, Pascal, Lua, and Micropython.  Forth is the only one
> that could be argued as lower-level or more "directly translated" than C.

Well, Forth is certainly cruder than C (it's barely a language IMO). But
I don't remember seeing anything in it resembling a type system that
corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in current
hardware. (Imagine trying to create a precisely laid out struct.)

It is just too weird. I think I'd rather take my chances with C.

> BASIC, ..., Lua, and Micropython.

Hmm, I think my own scripting language is better at low level than any
of these. It supports those low-level types for a start. And I can do
stuff like this:

println peek(0x40'0000, u16):"m"

fun peek(addr, t=byte) = makeref(addr, t)^

This displays 'MZ', the signature of the (low-)loaded EXE image on Windows.

Possibly it is even better than C; is this little program valid (no UB)
C, even when it is known that the program is low-loaded:

#include <stdio.h>
typedef unsigned char byte;

int main(void) {
    printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
}

This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
programs at high addresses. The problem being that the address involved,
while belonging to the program, is outside of any C data objects.
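
For comparison, a hedged sketch of a C "peek" helper. The name peek8 is hypothetical. Converting an integer to a pointer is implementation-defined in C, and the dereference is only meaningful when the implementation maps that address to readable memory, so the only use the standard itself backs is the round trip of a valid object pointer through uintptr_t, as in the usage note below.

```c
#include <assert.h>
#include <stdint.h>

/* Read one byte from a numeric address. The volatile qualifier asks the
   compiler to perform the load as written. Behaviour for addresses that do
   not correspond to an object is up to the implementation and platform. */
static unsigned char peek8(uintptr_t addr)
{
    assert(addr != 0);
    return *(const volatile unsigned char *)(void *)addr;
}
```

Given "unsigned char sig[2] = { 'M', 'Z' };", the call peek8((uintptr_t)(void *)&sig[0]) reads back 'M'. Passing a literal such as 0x400000 compiles just as well, but whether it works is a property of the loader and the compiler, not of C.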

Re: A Famous Security Bug

<utnt30$3v0ck$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52396&group=comp.lang.c#52396

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jameskuyper@alumni.caltech.edu (James Kuyper)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sat, 23 Mar 2024 12:51:58 -0400
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <utnt30$3v0ck$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 24 Mar 2024 00:45:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="811208a30f52470ed826a66a2c4763ae";
logging-data="4161940"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+pV6LNYwMnMh13vKZxsSpE3g7a8dvKwCA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:2kEN3A4wMHouDUdcfGsAoYfi0CA=
In-Reply-To: <20240323090700.848@kylheku.com>
Content-Language: en-US
 by: James Kuyper - Sat, 23 Mar 2024 16:51 UTC

On 3/23/24 12:07, Kaz Kylheku wrote:
> On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
....
>> That is true - /if/ you make the restriction that the translation unit
>> is compiled completely to linkable machine code or assembly, and that it
>> is not changed in any way when it is combined into the new program.
>> Such a setup is common in practice, but it is in no way required by the
>> C standards and does not apply for more advanced compilation and build
>> scenarios.
>
> Well, it's only not required if you hand-wave away the sentences in
> section 5.

Or, you could read the whole of section 5. 5.1.2.3p6 makes it clear that
all of the other requirements of the standard apply only insofar as the
observable behavior of the program is concerned. Any method of achieving
observable behavior that matches the behavior that would be permitted if
the abstract semantics were followed, is permitted, even if the actual
semantics producing that behavior are quite different from those specified.
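
A small illustration of that rule, as a sketch only. The function names are hypothetical, and the closed form assumes unsigned int is at least 32 bits wide. The abstract semantics of sum_loop describe n separate additions, but nothing about those additions is observable behaviour, so a compiler is free to emit code equivalent to sum_closed instead.

```c
#include <assert.h>

/* Abstract machine: one addition per iteration, counting down from n. */
static unsigned sum_loop(unsigned n)
{
    assert(n <= 65535u);
    unsigned total = 0;
    for (unsigned i = n; i != 0; i--)
        total += i;
    return total;
}

/* What an optimiser may generate instead: the same result with no loop.
   The bound on n keeps n * (n + 1) below 2^32, so nothing wraps. */
static unsigned sum_closed(unsigned n)
{
    assert(n <= 65535u);
    return n * (n + 1u) / 2u;
}
```

A program that prints sum_loop(10) must print 55; how many add instructions ran to get there is not something the standard lets the program observe.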

Re: A Famous Security Bug

<20240323173522.946@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=52397&group=comp.lang.c#52397

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 01:23:00 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 88
Message-ID: <20240323173522.946@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com> <utn57t$3pbh7$1@dont-email.me>
Injection-Date: Sun, 24 Mar 2024 01:23:00 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="866d3663afea439826bdeb05c35522a4";
logging-data="4176815"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jQ7JGdupZntLas6HOvwdX2SsaREd9XMQ="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:tu+IITRRNrnVqlWc2KykEP+vpD8=
 by: Kaz Kylheku - Sun, 24 Mar 2024 01:23 UTC

On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
> I believe we are at an impasse here, unless someone can think of a new
> point to make.

How about a completely different one about a related but separate
matter (small one).

It has occurred to me that the definition of "translation unit" is
lacking a little bit in regard to existing practice. Or that at least
it could use a footnote:

"A source file together with all the headers and source files included
via the preprocessing directive #include is known as a preprocessing
translation unit."

But in fact, in actual compilers we can do something like this:

gcc -DMAIN='int main(void) { puts("hello"); }'

and then in the source file we can have

#include <stdio.h>
MAIN

The point is that a translation unit's tokens can come from sources
other than a source file and its included header files.

Say we have:

printf '#include <stdio.h>\nMAIN\n' | \
gcc -DMAIN='int main(void) { puts("hello"); }' -x c -

How we can subject this to a standard-based interpretation is
to identify the output of printf piped into gcc, as well as
the -DMAIN option, as being the "source file".

"The text of the program is kept in units called source files, (or
preprocessing files) in this document."

Thus the unit in which we are keeping the source in the above shell
script is identifiable as the content of the pipe, and the symbol MAIN.
It is understood that the MAIN symbol precedes the content of the pipe.
Those things together are the "source file".

This is all fine, but could benefit from a footnote like "A source file
need not be a single data unit accessible by name in a file system.
Implementations may allow situations such as source code dynamically
generated, transmitted to the translator via an interprocess
communication mechanism or network. Furthermore, implementations may
allow some tokens of a translation unit to be injected via a
configuration mechanism, such as command line arguments."
>
> One thing I would ask before leaving this - could you take a look at the
> latest draft for the next C standard after C23?
>
><https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>

Thanks, I'm now using that, discontinuing most use of n3096.

> Look at the definitions of the "reproducible" and "unsequenced" function
> type attributes in 6.7.13.8. In particular, look at the leeway
> explicitly given to the compiler for re-arranging code in 6.7.13.8.3p6
> and similar examples. Consider how that fits (or fails to fit) with
> your interpretation of the translation phases in section 5.

These are interesting and useful attributes. They are orthogonal to the
translation unit issue though.

If we declare that a function in another translation unit is
reproducible, and we call it twice with the same arguments, then
two calls need not take place.
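
A sketch of that situation, under stated assumptions: the macro name ATTR_UNSEQUENCED and the functions are hypothetical, and the attribute is applied only where the compiler reports support through __has_c_attribute, expanding to nothing elsewhere. For brevity the function is defined in the same file; the case described above has only the attributed declaration visible.

```c
#include <assert.h>

/* Use the attribute from the N3220 draft only if the compiler knows it. */
#if defined(__has_c_attribute)
#  if __has_c_attribute(unsequenced)
#    define ATTR_UNSEQUENCED [[unsequenced]]
#  endif
#endif
#ifndef ATTR_UNSEQUENCED
#  define ATTR_UNSEQUENCED
#endif

/* The result depends only on the argument and there are no side effects,
   so the attribute's promise holds for this function. */
static int square(int x) ATTR_UNSEQUENCED
{
    return x * x;
}

/* With the attribute visible, the two calls below may be merged into one.
   The result is the same either way. */
static int twice_square(int x)
{
    assert(x > -1000 && x < 1000);
    return square(x) + square(x);
}
```

The permission to merge the calls comes from the declaration seen in the calling translation unit, which is the distinction being drawn here.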

That is not anything like LTO: the function attributes which drive
those semantic possibilities come from the same translation unit.

If a function is attributed as "reproducible" or "unsequenced" in another
translation unit, such that this is not visible to our current
translation unit (the header file declaration for the function omits
the attributes), then it looks like an ordinary function. If we call
it twice, it gets called twice.

There is no conflict between the semantics of these advanced attributes,
and the claim that LTO is nonconforming.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<20240323182314.725@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=52400&group=comp.lang.c#52400

Newsgroups: comp.lang.c
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 05:50:44 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 69
Message-ID: <20240323182314.725@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com> <utnt30$3v0ck$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Mar 2024 05:50:44 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="866d3663afea439826bdeb05c35522a4";
logging-data="204872"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19ui4XlYMkTWZ6ymrXjfsVr/kB39cwYyrA="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:/OgJ46LTQWtPBJzM9LtyiTZeENs=
 by: Kaz Kylheku - Sun, 24 Mar 2024 05:50 UTC

On 2024-03-23, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> On 3/23/24 12:07, Kaz Kylheku wrote:
>> On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
> ...
>>> That is true - /if/ you make the restriction that the translation unit
>>> is compiled completely to linkable machine code or assembly, and that it
>>> is not changed in any way when it is combined into the new program.
>>> Such a setup is common in practice, but it is in no way required by the
>>> C standards and does not apply for more advanced compilation and build
>>> scenarios.
>>
>> Well, it's only not required if you hand-wave away the sentences in
>> section 5.
>
> Or, you could read the whole of section 5. 5.1.2.3p6 makes it clear that
> all of the other requirements of the standard apply only insofar as the

Aha, so you agree there are requirements, just that the behavior they
imply can be achieved without them being followed in every detail.

> observable behavior of the program is concerned.

I believe what you're referring to is now in 5.1.2.4¶6 in N3220.

Yes, you make an excellent point.

If we make any claim about conformance, it has to be rooted in
observable behavior, which is the determiner of conformance.

But we will not find that problem in LTO. If any empirical test of an LTO
implementation shows that there is a difference in the ISO C observable
behavior of a strictly conforming program, that LTO implementation
obviously has a bug, not LTO itself. (So why bother looking.) I mean,
the absolute baseline requirement any LTO implementor strives toward is
no change in observable behavior in a strictly conforming program; any
such change would be a showstopper.

At best we may be able to say that if those requirements with regard
to translation phase 7 and 9 separation are assiduously followed, the
implementation belongs to a certain identifiable class, which is
suitable for certain purposes (or for certain ways of expressing those
purposes in a program). Certain techniques will be reliable that
would otherwise not be. However, since it is something not reflected in
observable behavior (as defined in ISO C), the class division does not
land along the line of conforming versus non-conforming.

> Any method of achieving observable behavior that matches the behavior
> that would be permitted if the abstract semantics were followed, is
> permitted, even if the actual semantics producing that behavior are
> quite different from those specified.

I've never lost sight of that; however, in this case somehow,
there is something different.

The problem is that the requirements in question, the ones I have
been concerned about, are not in fact necessary, in the first place, for
establishing what the observable behavior is.

It's not the case that the requirements are necessary, but then
another path can be found to that observable behavior.

That is to say, the description of translation phase 7 (for the purposes
of observable behavior and conformance) could as well say that "the
tokens are semantically analyzed and translated, possibly with the help
of access to any truth whatsoever related to the entire program's
observable behavior, by means of a magic oracle." As long as no
falsehood in relation to observable behavior is relied upon by mistake,
all is well as far as ensuring the right observable behavior,
which is synonymous with conforming.

Re: A Famous Security Bug

<utp9ct$cmur$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52405&group=comp.lang.c#52405

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 14:21:32 +0100
Organization: A noiseless patient Spider
Lines: 72
Message-ID: <utp9ct$cmur$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com> <utnt30$3v0ck$1@dont-email.me>
<20240323182314.725@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Mar 2024 13:21:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="af297f15341d352325f54a52911dae41";
logging-data="416731"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+vo/Y+ieCLIDP2eKD1hpItZD0FhAiHsFE="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:jFkMWDF67p+DvE9Cu71hdEv6EO4=
Content-Language: en-GB
In-Reply-To: <20240323182314.725@kylheku.com>
 by: David Brown - Sun, 24 Mar 2024 13:21 UTC

On 24/03/2024 06:50, Kaz Kylheku wrote:
> On 2024-03-23, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
>> On 3/23/24 12:07, Kaz Kylheku wrote:
>>> On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
>> ...
>>>> That is true - /if/ you make the restriction that the translation unit
>>>> is compiled completely to linkable machine code or assembly, and that it
>>>> is not changed in any way when it is combined into the new program.
>>>> Such a setup is common in practice, but it is in no way required by the
>>>> C standards and does not apply for more advanced compilation and build
>>>> scenarios.
>>>
>>> Well, it's only not required if you hand-wave away the sentences in
>>> section 5.
>>
>> Or, you could read the whole of section 5. 5.1.2.3p6 makes it clear that
>> all of the other requirements of the standard apply only insofar as the
>
> Aha, so you agree there are requirements, just that the behavior they
> imply can be achieved without them being followed in every detail.
>
>> observable behavior of the program is concerned.
>
> I believe what you're referring to is now in 5.1.2.4¶6 in N3220.

Yes. Usually the C standards committee try to avoid inserting sections
and the resulting changes in numbering, but they have, for some reason,
given the first paragraph of 5.1.2 its own section number in n3220 and
bumped everything down a step.

>
> Yes, you make the excellent point.
>
> If we make any claim about conformance, it has to be rooted in
> observable behavior, which is the determiner of conformance.

Agreed.

>
> But we will not find that problem in LTO. If any empirical test of a LTO
> implementation shows that there is a difference in the ISO C observable
> behavior of a strictly conforming program, that LTO implementation
> obviously has a bug, not LTO itself.

Yes. Any optimisation that changes the observable behaviour of a
program (other than amongst alternative correct behaviours - sometimes
there are several for the same input, as a result of unspecified
behaviours) is invalid as an optimisation. (I am assuming the program
does not execute any undefined behaviour - otherwise all bets are off.)

This applies to all optimisations and to the compilation itself -
optimisations don't get to change the observable behaviour. Equally,
any re-arrangement of code or other effects of the compiler that don't
change the observable behaviour are perfectly valid and don't imply
non-conformity.
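
As an illustration (a sketch, not code from the thread; the names "wipe" and "checksum_secret" are made up): a store to a buffer that is never read again is not observable behaviour, so a compiler may drop it, while accesses made through a volatile-qualified lvalue are commonly treated as ones the implementation has to perform. That mainstream compilers keep the stores in "wipe" is an assumption about implementations, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: clears a buffer through a volatile-qualified
   pointer, so every store is made through a volatile lvalue. */
static void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL);
    while (len--)
        *p++ = 0;
}

/* Sums the bytes of a temporary copy of the secret, then clears the copy. */
static unsigned checksum_secret(const char *secret)
{
    char tmp[32] = {0};
    unsigned sum = 0;

    strncpy(tmp, secret, sizeof tmp - 1);   /* tmp stays NUL-terminated */
    for (size_t i = 0; tmp[i] != '\0'; i++)
        sum += (unsigned char)tmp[i];

    /* A plain memset(tmp, 0, sizeof tmp) here would be a dead store:
       tmp is never read again, so removing it changes no observable
       behaviour and the optimiser is entitled to do so. */
    wipe(tmp, sizeof tmp);
    return sum;
}
```

The return value is identical with or without the clearing step, which is exactly why the standard has nothing to say about whether a non-volatile clear survives.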

> (So why bother looking.) I mean,
> the absolute baseline requirement any LTO implementor strives toward is
> no change in observable behavior in a strictly conforming program, which
> would be a showstopper.
>

Yes.

I don't believe anyone - except you - has said anything otherwise. A C
implementation is conforming if and only if it takes any correct C
source code and generates a program image that always has correct
observable behaviour when no undefined behaviour is executed. There are
no extra imaginary requirements to be conforming, such as not being
allowed to use extra information while compiling translation units.

Re: A Famous Security Bug

<utpaj9$cvh3$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52406&group=comp.lang.c#52406

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 14:42:00 +0100
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <utpaj9$cvh3$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<875xxdzvxj.fsf@nosuchdomain.example.com> <20240322170425.543@kylheku.com>
<utmvq5$3o50v$1@dont-email.me> <utnca0$3r5uk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Mar 2024 13:42:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="af297f15341d352325f54a52911dae41";
logging-data="425507"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/VkY4NFqpLpg5DrMxW3B1aSWmG4xL8WHs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:lZhxihJSKQjrshofzkmGj6tYTQM=
Content-Language: en-GB
In-Reply-To: <utnca0$3r5uk$1@dont-email.me>
 by: David Brown - Sun, 24 Mar 2024 13:42 UTC

On 23/03/2024 20:58, bart wrote:
> On 23/03/2024 16:25, David Brown wrote:
>> On 23/03/2024 01:09, Kaz Kylheku wrote:
>>> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>
>>>> I'm not aware that any such language exists, at least in the mainstream
>>>> (and I've looked at a *lot* of programming languages).  I conclude that
>>>> there just isn't enough demand for that kind of thing.
>>
>> I think lack of demand combines with it actually being an extremely
>> difficult task.
>>
>> Consider something as simple as "x++;" in C.  How could that be
>> implemented?  Perhaps the cpu has an "increment" instruction.  Perhaps
>> it has an "add immediate" instruction.  Perhaps it needs to load 1
>> into a register, then use an "add" instruction.  Perhaps "x" is in
>> memory. Some cpus can execute an increment directly on the memory
>> address as an atomic instruction.  Some can do so, but only using
>> specific (and more expensive) instructions.  Some can't do it at all
>> without locking mechanisms and synchronisation loops.
>>
>> So what does this user of this mythical LLL expect when he/she writes
>> "x++;" ?
>
> This is not the issue the comes up in the OP (or the issue that was
> assumed as I don't think the OP has clarified).
>

That is trivially true. I was picking a simple example and showing how
difficult it is to try to define a language where "the compiler does
exactly what I tell it to do". If it is that difficult to define the
programmer's precise expectation of the behaviour of "x++;" at the
lowest level, how could we hope to do it with anything like the OP's case?

It sounds easy to make lists of expected behaviour, like Kaz did and
like you no doubt have (at least in your head, if not written down) for
your own low-level language. Such lists are totally subjective, and
thus inappropriate for general languages usable by a range of people for
a range of tasks.

> There it is not about micro-managing the implementation of x++, but the
> compiler deciding it isn't needed at all.
>

First you have to decide /exactly/ what you mean by "x++;", before you
can decide if it is valid to remove it or not.
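
To make that concrete, here is a small C sketch (the variable and function names are invented for illustration, they are not from the thread). The same "++" spelling carries three different promises depending on the declared type of the operand, and only the declaration tells the compiler which one is meant.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

static int plain_counter;            /* ordinary object */
static volatile int device_counter;  /* each access is a volatile access */
static atomic_int shared_counter;    /* ++ is an atomic read-modify-write */

/* Three spellings of "increment by one", each with a different contract. */
static void bump_all(void)
{
    /* Incrementing an int that holds INT_MAX is undefined behaviour. */
    assert(plain_counter < INT_MAX);

    plain_counter++;    /* may be merged with neighbours, reordered, or
                           removed if the result is never used */
    device_counter++;   /* a volatile read followed by a volatile write;
                           both happen, but the pair is not atomic */
    shared_counter++;   /* one atomic, sequentially consistent update */
}
```

In a single thread all three counters end up with the same value; the differences only show in what the implementation is allowed to do on the way there.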

Re: A Famous Security Bug

<20240324172225.00006b10@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=52407&group=comp.lang.c#52407

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 17:22:25 +0300
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <20240324172225.00006b10@yahoo.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me>
<utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me>
<utme8b$3jtip$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="ea83786ed9d7b4303133f886081061ed";
logging-data="443456"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+uH+uJlyCUrFrfJ2n6SrdYQ4hsgoeYqbI="
Cancel-Lock: sha1:tMNvgtEQ+1YGL/81hABWuTylh4E=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Sun, 24 Mar 2024 14:22 UTC

On Sat, 23 Mar 2024 11:26:03 +0000
bart <bc@freeuk.com> wrote:

> On 23/03/2024 07:26, James Kuyper wrote:
> > bart <bc@freeuk.com> writes:
> >> On 22/03/2024 17:14, James Kuyper wrote:
> > [...]
> >>> If you want to tell a system not only what a program must do, but
> >>> also how it must do it, you need to use a lower-level language
> >>> than C.
> >>
> >> Which one?
> >
> > That's up to you. The point is, C is NOT that language.
>
> I'm asking which /mainstream/ HLL is lower level than C. So
> specifically ruling out assembly.
>

Do you want only the mainstream of today, or does the mainstream of the
past also count? For the latter, I'd think that PL/M and BLISS are lower
level than C. But I know neither, so I could be wrong.
https://en.wikipedia.org/wiki/PL/M
https://en.wikipedia.org/wiki/BLISS

Ada also allows a certain degree of control over how things are done,
but I am not sure that control is tighter than in C. I would think that
in the majority of situations Ada's 'as if' rules are similar to C's.

Re: A Famous Security Bug

<20240324172641.00005ede@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=52408&group=comp.lang.c#52408

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 17:26:41 +0300
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <20240324172641.00005ede@yahoo.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me>
<utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me>
<utme8b$3jtip$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="ea83786ed9d7b4303133f886081061ed";
logging-data="443456"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/SH9loVKg7wrcy8lU7ziHQlPjyMlq5BKU="
Cancel-Lock: sha1:s9BWYOyihGEyr+36mYl4065uvpE=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Sun, 24 Mar 2024 14:26 UTC

On Sat, 23 Mar 2024 11:26:03 +0000
bart <bc@freeuk.com> wrote:

> On 23/03/2024 07:26, James Kuyper wrote:
> > bart <bc@freeuk.com> writes:
> >> On 22/03/2024 17:14, James Kuyper wrote:
> > [...]
> >>> If you want to tell a system not only what a program must do, but
> >>> also how it must do it, you need to use a lower-level language
> >>> than C.
> >>
> >> Which one?
> >
> > That's up to you. The point is, C is NOT that language.
>
> I'm asking which /mainstream/ HLL is lower level than C. So
> specifically ruling out assembly.
>
> If there is no such choice, then this is the problem: it has to be C
> or nothing.
>
> >> I don't think anyone seriously wants to switch to assembly for the
> >> sort of tasks they want to use C for.
> >
> > Why not? Assembly provides the kind of control you're looking for; C
> > does not. If that kind of control is important to you, you have to
> > find a language which provides it. If not assembler or C, what
> > would you use?
>
> Among non-mainstream ones, my own would fit the bill. Since I write
> the implementations, I can ensure the compiler doesn't have a mind of
> its own.
>
> However if somebody else tried to implement it, then I can't
> guarantee the same behaviour. This would need to somehow be enforced
> with a precise language spec, or mine would need to be a reference
> implementation with a lot of test cases.
>
>
> -----------------
>
> Take this program:
>
> #include <stdio.h>
> int main(void) {
> goto L;
> 0x12345678;
> L:
> printf("Hello, World!\n");
> }
>
> If I use my compiler, then that 12345678 pattern gets compiled into
> the binary (because it is loaded into a register then discarded).
> That means I can use that value as a marker or sentinel which can be
> searched for.
>

Does it apply to your aarch64 compiler as well?

Re: A Famous Security Bug

<utpenn$dtnq$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52409&group=comp.lang.c#52409

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 15:52:39 +0100
Organization: A noiseless patient Spider
Lines: 203
Message-ID: <utpenn$dtnq$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me> <utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me> <utme8b$3jtip$1@dont-email.me>
<utn1a0$3ogob$1@dont-email.me> <utnh5m$3sdhk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Mar 2024 14:52:39 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="af297f15341d352325f54a52911dae41";
logging-data="456442"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18tI8c3aGfYouD1zm4Y/3YhtrbkkwUVYvw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:k9A9S7eCxd3HloyeeHgk2Y1sVwo=
In-Reply-To: <utnh5m$3sdhk$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Sun, 24 Mar 2024 14:52 UTC

On 23/03/2024 22:21, bart wrote:
> On 23/03/2024 16:51, David Brown wrote:
>> On 23/03/2024 12:26, bart wrote:
>>> On 23/03/2024 07:26, James Kuyper wrote:
>>>> bart <bc@freeuk.com> writes:
>>>>> On 22/03/2024 17:14, James Kuyper wrote:
>>>> [...]
>>>>>> If you want to tell a system not only what a program must do, but
>>>>>> also how it must do it, you need to use a lower-level language than
>>>>>> C.
>>>>>
>>>>> Which one?
>>>>
>>>> That's up to you. The point is, C is NOT that language.
>>>
>>> I'm asking which /mainstream/ HLL is lower level than C. So
>>> specifically ruling out assembly.
>>>
>>> If there is no such choice, then this is the problem: it has to be C
>>> or nothing.
>>
>> How much of a problem is it, really?
>>
>> My field is probably the place where low level programming is most
>> ubiquitous.  There are plenty of people who use assembly - for good
>> reasons or for bad (or for reasons that some people think are good,
>> other people think are bad).  C is the most common choice.
>>
>> Other languages used for small systems embedded programming include
>> C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython.  Forth is the
>> only one that could be argued as lower-level or more "directly
>> translated" than C.
>
> Well, Forth is certainly cruder than C (it's barely a language IMO). But
> I don't remember seeing anything in it resembling a type system that
> corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in current
> hardware. (Imagine trying to create a precisely laid out struct.)

Forth can be considered a typeless language - you deal with cells (or
double cells, etc.), which have contents but not types. And you can
define structs with specific layouts quite easily. (Note that I've
never tried this myself - my Forth experience is /very/ limited, and you
will get much more accurate information in comp.lang.forth or another
place Forth experts hang out.)

A key thing you miss, in comparison to C, is the type checking and the
structured identifier syntax.

In C, if you have :

struct foo {
int32_t x;
int8_t y;
uint16_t z;
};

struct foo obj;

obj.x = obj.y + obj.z;

then you access the fields as "obj.x", etc. Your struct may or may not
have padding, depending on the target and compiler (or compiler-specific
extensions). If "obj2" is an object of a different type, then "obj2.x"
might be a different field or a compile-time error if that type has no
field "x".
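
A short sketch of what C does and does not pin down about that layout (the helper "foo_padding" is a made-up name, and the figure of one padding byte is an assumption about common ABIs, not a guarantee):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {
    int32_t  x;
    int8_t   y;
    uint16_t z;
};

/* Guaranteed by the language: members are laid out in declaration
   order and the first member sits at offset 0. */
static_assert(offsetof(struct foo, x) == 0, "first member is at offset 0");
static_assert(offsetof(struct foo, y) >= sizeof(int32_t), "y follows x");
static_assert(offsetof(struct foo, z) > offsetof(struct foo, y), "z follows y");

/* Left to the implementation: how many padding bytes are inserted.
   On common ABIs this is 1 (between y and z). */
static size_t foo_padding(void)
{
    return sizeof(struct foo)
         - (sizeof(int32_t) + sizeof(int8_t) + sizeof(uint16_t));
}
```

The static assertions hold on any conforming implementation; the value returned by "foo_padding" is the part that varies with target and compiler.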

In Forth, you write (again, I could be inaccurate here) :

struct
4 field >x
1 field >y
2 field >z
constant /foo

The names - including the punctuation (punctuation characters can be
freely used in identifiers in Forth) - are arbitrary. This is
equivalent to :

: >x 0 + ;
: >y 4 + ;
: >z 5 + ;
: /foo 7 ;

You make your instance "obj" by :

create obj /foo allot

which makes "obj" the address of a block of 7 bytes - but does not give
it a type in any sense. ("/foo" simply means "7").

The equivalent of "obj.x = obj.y + obj.z" would be :

obj >y c@ obj >z w@ + obj >x l!

That is :

1. Put the address of obj on the stack.
2. Add 4 to it (the definition of >y)
3. Use that as an address and fetch the 8-bit value from that address,
putting it on the stack.
4. Put the address of obj on the stack.
5. Add 5 to it (the definition of >z)
6. Use that as an address and fetch the 16-bit value from that address,
putting it on the stack.
7. Add the top two values from the stack and put the result on the stack.
8. Put the address of obj on the stack.
9. Add 0 to it (the definition of >x)
10. Use that as an address and store the 32-bit value from the top of
the stack to that address.

I'm assuming this Forth uses 32-bit stack cells, and ignoring
signed/unsigned issues for simplicity. There are, after all, better
places to find Forth tutorials for the details.
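
For comparison, the ten steps above can be mirrored in C on an untyped block of bytes. This is only a sketch under assumptions: 8-bit bytes, the offsets 0, 4 and 5 from the Forth definitions, and invented names (OFF_X, OFF_Y, OFF_Z, FOO_SIZE, sum_into_x). Unlike the Forth line, it follows the signedness of the C struct (signed 8-bit y, unsigned 16-bit z).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets mirroring the Forth words >x, >y, >z and the size /foo. */
enum { OFF_X = 0, OFF_Y = 4, OFF_Z = 5, FOO_SIZE = 7 };

/* Rough equivalent of "obj >y c@ obj >z w@ + obj >x l!". Nothing ties
   obj to a struct type; it is just an address plus offsets. */
static void sum_into_x(unsigned char *obj)
{
    int8_t   y;
    uint16_t z;
    int32_t  x;

    assert(obj != NULL);
    memcpy(&y, obj + OFF_Y, sizeof y);  /* steps 1-3: fetch 8 bits at +4 */
    memcpy(&z, obj + OFF_Z, sizeof z);  /* steps 4-6: fetch 16 bits at +5 */
    x = y + z;                          /* step 7: add */
    memcpy(obj + OFF_X, &x, sizeof x);  /* steps 8-10: store 32 bits at +0 */
}
```

memcpy is used because the 16-bit field sits at an odd offset, where a direct pointer cast could be a misaligned access.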

At no point is the definition of the struct type attached to "obj". In
fact, there is no struct type - there's just some defined words for
adding offsets to an address (or adding those values to anything else).
You can just as well write "10 >y ." to do "printf("%i", 10 + 4);".

There's therefore also no connection between the field accessor words
and the type, or any requirement that they are only used with the right
kind of object. On the other hand, suppose you wanted to dispense with
storing the field "x" and calculate it as "p->y + p->z" every time you
needed it. In C, you'd write:

int32_t calc_x(const struct foo * p) { return p->y + p->z; }

and replace uses of "obj.x" with "calc_x(&obj)".

In Forth, you might have defined :

: >x@ >x l@ ;
: >y@ >y c@ ;
: >z@ >z w@ ;

and used >x@ as your accessor for reading obj.x (as "obj >x@") in the
rest of your code. Now you can remove ">x" from the struct definition
and write:

: >x@ dup >y@ swap >z@ + ;

and all your uses of "obj >x@" remain unchanged in the rest of your
code, but now they calculate x on the fly.

This is all /way/ off-topic for comp.lang.c, but it's perhaps
interesting to see a completely different way of doing things in a very
different language.

And note that although Forth is often byte-compiled very directly to
give you exactly the actions you specify in the source code, it is also
sometimes compiled to machine code - using optimisations.

>
> It is just too weird. I think I'd rather take my chances with C.

Forth does take some getting used to!

>
> > BASIC, ..., Lua, and Micropython.
>
> Hmm, I think my own scripting language is better at low level than any
> of these.

These all have one key advantage over your language - they are real
languages, available for use by /other/ programmers for development of
products.

> It supports those low-level types for a start. And I can do
> stuff like this:
>
>    println peek(0x40'0000, u16):"m"
>
>    fun peek(addr, t=byte) = makeref(addr, t)^
>
> This displays 'MZ', the signature of the (low-)loaded EXE image on Windows
>
> Possibly it is even better than C; is this little program valid (no UB)
> C, even when it is known that the program is low-loaded:
>
>    #include <stdio.h>
>    typedef unsigned char byte;
>
>    int main(void) {
>        printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
>    }
>
> This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
> programs at high addresses. The problem being that the address involved,
> while belonging to the program, is outside of any C data objects.
>

I think you are being quite unreasonable in blaming gcc - or C - for
generating code that cannot access that particular arbitrary address!
The addresses accessible in a program are defined by the OS and the
target environment, not the language or compiler. And C has a perfectly
good way of forcing access to addresses - use "volatile".
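
A minimal sketch of that approach (the helper name "peek8" is hypothetical): the address is passed as an integer and read through a volatile-qualified pointer. The integer-to-pointer conversion is implementation-defined, and whether the read succeeds depends entirely on what the OS has mapped at that address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read one byte from an address given as an integer.
   The volatile qualifier asks the compiler to perform the access as
   written; it does nothing to make an unmapped address readable. */
static uint8_t peek8(uintptr_t addr)
{
    assert(addr != 0);
    return *(volatile uint8_t *)addr;
}
```

On a low-loaded Windows image, the analogue of the quoted program would be peek8(0x400000) and peek8(0x400001); that call is not tried here, since it only makes sense where that address is mapped.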

Re: A Famous Security Bug

<20240324185353.00002395@yahoo.com>

https://news.novabbs.org/devel/article-flat.php?id=52410&group=comp.lang.c#52410

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5chosen@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 18:53:53 +0300
Organization: A noiseless patient Spider
Lines: 88
Message-ID: <20240324185353.00002395@yahoo.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com>
<20240321211306.779b21d126e122556c34a346@gmail.moc>
<utkea9$31sr2$1@dont-email.me>
<utktul$35ng8$1@dont-email.me>
<utm06k$3glqc$1@dont-email.me>
<utme8b$3jtip$1@dont-email.me>
<utn1a0$3ogob$1@dont-email.me>
<utnh5m$3sdhk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: dont-email.me; posting-host="ea83786ed9d7b4303133f886081061ed";
logging-data="443456"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+rhVr+gdRD7C9ELlHaVVbcfeM9v0xk7sU="
Cancel-Lock: sha1:GlXRzfpa/1d2gELF4SlMeyWVU74=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
 by: Michael S - Sun, 24 Mar 2024 15:53 UTC

On Sat, 23 Mar 2024 21:21:58 +0000
bart <bc@freeuk.com> wrote:

> On 23/03/2024 16:51, David Brown wrote:
> > On 23/03/2024 12:26, bart wrote:
> >> On 23/03/2024 07:26, James Kuyper wrote:
> >>> bart <bc@freeuk.com> writes:
> >>>> On 22/03/2024 17:14, James Kuyper wrote:
> >>> [...]
> >>>>> If you want to tell a system not only what a program must do,
> >>>>> but also how it must do it, you need to use a lower-level
> >>>>> language than C.
> >>>>
> >>>> Which one?
> >>>
> >>> That's up to you. The point is, C is NOT that language.
> >>
> >> I'm asking which /mainstream/ HLL is lower level than C. So
> >> specifically ruling out assembly.
> >>
> >> If there is no such choice, then this is the problem: it has to be
> >> C or nothing.
> >
> > How much of a problem is it, really?
> >
> > My field is probably the place where low level programming is most
> > ubiquitous.  There are plenty of people who use assembly - for good
> > reasons or for bad (or for reasons that some people think are good,
> > other people think are bad).  C is the most common choice.
> >
> > Other languages used for small systems embedded programming include
> > C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython.  Forth is the
> > only one that could be argued as lower-level or more "directly
> > translated" than C.
>
> Well, Forth is certainly cruder than C (it's barely a language IMO).
> But I don't remember seeing anything in it resembling a type system
> that corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in
> current hardware. (Imagine trying to create a precisely laid out
> struct.)
>
> It is just too weird. I think I'd rather take my chances with C.
>
> > BASIC, ..., Lua, and Micropython.
>
> Hmm, I think my own scripting language is better at low level than
> any of these. It supports those low-level types for a start. And I
> can do stuff like this:
>
> println peek(0x40'0000, u16):"m"
>
> fun peek(addr, t=byte) = makeref(addr, t)^
>
> This displays 'MZ', the signature of the (low-)loaded EXE image on
> Windows
>
> Possibly it is even better than C; is this little program valid (no
> UB) C, even when it is known that the program is low-loaded:
>
> #include <stdio.h>
> typedef unsigned char byte;
>
> int main(void) {
> printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
> }
>
> This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
> programs at high addresses. The problem being that the address
> involved, while belonging to the program, is outside of any C data
> objects.
>
>

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    char* p0 = (char*)((size_t)main & -(size_t)0x10000);
    printf("%c%c\n", p0[0], p0[1]);
    return 0;
}

That would work for small programs. Not necessarily for bigger
programs.

Re: A Famous Security Bug

<20240324083718.507@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=52411&group=comp.lang.c#52411

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 16:02:21 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <20240324083718.507@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com> <utnt30$3v0ck$1@dont-email.me>
<20240323182314.725@kylheku.com> <utp9ct$cmur$1@dont-email.me>
Injection-Date: Sun, 24 Mar 2024 16:02:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="866d3663afea439826bdeb05c35522a4";
logging-data="489652"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/xbB9M3QuO9pPOqKcbvmTcNrfqUihtdYw="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:/mu9GKRTM/vQIDiIEI25nUCgovQ=
 by: Kaz Kylheku - Sun, 24 Mar 2024 16:02 UTC

On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
> On 24/03/2024 06:50, Kaz Kylheku wrote:
>> (So why bother looking.) I mean,
>> the absolute baseline requirement any LTO implementor strives toward is
>> no change in observable behavior in a strictly conforming program, which
>> would be a showstopper.
>>
>
> Yes.
>
> I don't believe anyone - except you - has said anything otherwise. A C
> implementation is conforming if and only if it takes any correct C
> source code and generates a program image that always has correct
> observable behaviour when no undefined behaviour is executed. There are
> no extra imaginary requirements to be conforming, such as not being
> allowed to use extra information while compiling translation units.

But the requirement isn't imaginary. The "least requirements"
paragraph doesn't mean that all other requirements are imaginary;
most of them are necessary to describe the language so that we know
how to find the observable behavior.

It takes a modicum of inference to deduce that a certain explicitly
stated requirement doesn't exist as far as observability/conformance.

We are clearly not imagining the sentences which describe a classic
translation and linkage model. The argument that they don't matter
for conformance is different from the argument that we imagined
something between the lines. It is the inference based on 5.1.2.4 that
is between the lines; potentially between any pair of lines anywhere!

Furthermore, the requirements may matter to other kinds of observability.

In C programming, we don't always just care about ISO C observability.

In safety critical coding, we might want to conduct a code review of
the disassembly of an object file (does it correctly implement the
intent we believe to be expressed in the source), and then retain that
exact file until it needs to be recompiled. If the code is actually an
intermediate code that is further translated during linking, that's
not good; we face the prospect of reviewing potentially the entire image
each time. Thus we might want an implementation which has a way of
conforming to the classic linkage model (that happens to be conveniently
described).

We just may not confuse that conformance (private contract between
implementor and user) with ISO C conformance, as I have.
Sorry about that!

What is significant is that the concept has support in ISO C wording.
Such a contract can just refer to that: "our project requires the
classic translation and linkage model that arises from the translation
phases descriptions 7 and 8 being closely followed".
As long as you have a way to disable LTO (or not enable it), you have
that.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: A Famous Security Bug

<utpk90$f8q6$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=52412&group=comp.lang.c#52412

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 17:27:12 +0100
Organization: A noiseless patient Spider
Lines: 109
Message-ID: <utpk90$f8q6$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
<20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
<20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
<20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
<20240322094449.555@kylheku.com> <87cyrmyvnv.fsf@nosuchdomain.example.com>
<20240322123323.805@kylheku.com> <utmst2$3n7mv$2@dont-email.me>
<20240323090700.848@kylheku.com> <utnt30$3v0ck$1@dont-email.me>
<20240323182314.725@kylheku.com> <utp9ct$cmur$1@dont-email.me>
<20240324083718.507@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 24 Mar 2024 16:27:12 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="af297f15341d352325f54a52911dae41";
logging-data="500550"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+agmTByzzDZKncXv1G+NdFWMu8nFQc1sQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:a3dCH1RIJstxhfViInpOmxcIcyw=
Content-Language: en-GB
In-Reply-To: <20240324083718.507@kylheku.com>
 by: David Brown - Sun, 24 Mar 2024 16:27 UTC

On 24/03/2024 17:02, Kaz Kylheku wrote:
> On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
>> On 24/03/2024 06:50, Kaz Kylheku wrote:
>>> (So why bother looking.) I mean,
>>> the absolute baseline requirement any LTO implementor strives toward is
>>> no change in observable behavior in a strictly conforming program, which
>>> would be a showstopper.
>>>
>>
>> Yes.
>>
>> I don't believe anyone - except you - has said anything otherwise. A C
>> implementation is conforming if and only if it takes any correct C
>> source code and generates a program image that always has correct
>> observable behaviour when no undefined behaviour is executed. There are
>> no extra imaginary requirements to be conforming, such as not being
>> allowed to use extra information while compiling translation units.
>
> But the requirement isn't imaginary. The "least requirements"
> paragraph doesn't mean that all other requirements are imaginary;
> most of them are necessary to describe the language so that we know
> how to find the observable behavior.
>

The text is not imaginary - your reading between the lines /is/. There
is no rule in the C standards stopping the compiler from using
additional information or knowledge about other parts of the program.
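
As a concrete illustration of that point, here is a minimal sketch. It is
only an illustration: the function names and the split into two "files"
are made up for this example, and the two translation units are collapsed
into one listing. The second caller shows roughly what a link-time
optimizer might produce after seeing the body of scale(); actual compiler
output will differ.

```c
/* --- imagine this is util.c (hypothetical file name) --- */
int scale(int x)
{
    return 3 * x;
}

/* --- imagine this is main.c, translated separately --- */

/* Classic model: the object code contains a real call to scale(). */
int report_with_call(int x)
{
    return scale(x) + 1;
}

/* Roughly what the caller could become once the translator has used
   its knowledge of scale()'s body: the call has been inlined away. */
int report_inlined(int x)
{
    return 3 * x + 1;
}
```

For any argument that does not overflow, both callers return the same
value (16 for an argument of 5), so a program built either way has the
same observable behaviour.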

> It takes a modicum of inference to deduce that a certain explicitly
> stated requirement doesn't exist as far as observability/conformance.
>
> We are clearly not imagining the sentences which describe a classic
> translation and linkage model. The argument that they don't matter
> for conformance is different from the argument that we imagined
> something between the lines. It is the inference based on 5.1.2.4 that
> is between the lines; potentially between any pair of lines anywhere!
>
> Furthermore, the requirements may matter to other kinds of observability.
>
> In C programming, we don't always just care about ISO C observability.

I agree on that. The C standards are not the be-all and end-all of
things of interest to C programmers. If they were, we'd never have
compilers with extensions.

But it /is/ the only thing that matters when you talk about "conforming"
compilers.

If you want to say that LTO breaks some of the requirements that /you/
have for the way /you/ want to do unit testing, that's absolutely fine.
If you want to say that this applies to many other C developers, I'd
prefer to see a bit of evidence or justification for the claim, but I'd
take it seriously - I fully appreciate that people have needs beyond
what the C standards give them.

But that's not what you have been saying. You have been saying that LTO
breaks the requirements of the C standards, and you are wrong about that.

>
> In safety critical coding, we might want to conduct a code review of
> the disassembly of an object file (does it correctly implement the
> intent we believe to be expressed in the source), and then retain that
> exact file until it needs to be recompiled.

Sure. And for that reason, some developers in that field will not use
LTO. I personally don't make much use of LTO because it makes software
a pain to debug. I do, however, retain the full toolchain used for a
project, including all build scripts and flags, libraries and compilers,
and make sure my builds are reproducible on multiple computers - then
any testing or reviews of the disassembly remain valid over time. With
LTO, at least some parts may need to be re-validated after a build even
for source code changes to apparently different parts of the program -
that is a cost that must be weighed against the benefits of LTO. (I
have considered doing LTO builds in parallel with non-LTO builds - using
the LTO builds solely for more advanced static checking, while using the
more debuggable non-LTO build for the "real" binary.)

I have agreed that there are many reasons why LTO might not be a good
choice for any given project. I have merely contested the claim that
conformity is such a reason.

> If the code is actually
> an intermediate code that is further translated during linking, that's
> not good; we face the prospect of reviewing potentially the entire image
> each time. Thus we might want an implementation which has a way of
> conforming to the classic linkage model (that happens to be conveniently
> described).
>
> We just may not confuse that conformance (private contract between
> implementor and user) with ISO C conformance, as I have.
> Sorry about that!
>

Are you saying that, after dozens of posts back and forth in
comp.lang.c where you made claims about the non-conformity of C
compilers' handling of C code, with heavy references to the C standards
which define the term "conformity", you were not talking about C
standard conformity?

> What is significant is that the concept has support in ISO C wording.
> Such a contract can just refer to that: "our project requires the
> classic translation and linkage model that arises from the translation
> phases descriptions 7 and 8 being closely followed".
> As long as you have a way to disable LTO (or not enable it), you have
> that.
>

Re: A Famous Security Bug

<86o7b3k283.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=52415&group=comp.lang.c#52415

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Sun, 24 Mar 2024 09:45:48 -0700
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <86o7b3k283.fsf@linuxsc.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de> <20240320114218.151@kylheku.com> <20240321211306.779b21d126e122556c34a346@gmail.moc> <20240321131621.321@kylheku.com> <utk1k9$2uojo$1@dont-email.me> <20240322083037.20@kylheku.com> <utkgd2$32aj7$1@dont-email.me> <wwva5mpwbh0.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="ba786330f332cc3d5ff45a8f861eda2e";
logging-data="509397"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19FMOhlf10oBGPEs+5RAXqp1yFell3YcZ8="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:+xJ7d4M00F6ORsiuUmWMN/pJjF0=
sha1:MsQPPDkpfCURF1qSfu/HY22sdno=
 by: Tim Rentsch - Sun, 24 Mar 2024 16:45 UTC

Richard Kettlewell <invalid@invalid.invalid> writes:

> David Brown <david.brown@hesbynett.no> writes:
>
>> I have tried to explain the reality of what the C standards say in a
>> couple of posts (including one that I had not posted before you wrote
>> this one). I have tried to make things as clear as possible, and
>> hopefully you will see the point.
>>
>> If not, then you must accept that you interpret the C standards in a
>> different manner from the main compiler vendors, as well as some "big
>> names" in this group. That is, of course, not proof in itself - but
>> you must realise that for practical purposes you need to be aware of
>> how others interpret the standard, both for your own coding and for
>> the advice or recommendations you give to others.
>
> Agreed that the ship has sailed on whether LTO is a valid optimization.
> But it's understandable why someone might reach a different conclusion.
> [...]

Granted that someone might follow reasoning like the comments you
gave. Even so, some further reflection should be enough for them to
reconsider their original assessment. In particular, the following:

"A C program need not all be translated at the same time." This
excerpt from the Standard implies that C programs may be translated
in their entirety all at the same time.

Notice the lead-in to section 5.1.1.2 p1, describing translation
phases, says "The precedence among the syntax rules of translation
is specified by the following phases." All of phases 1 through 8
involve translation, but they are about when various forms of
source recognition take place, not about when code is generated.

The "semantically analyzed" in translation phase 7 is nothing more
than type determination and verifying constraints are not violated.
Nothing about these analyses changes if optimizations are carried
out in translation phase 8.

Notice that translation phase 8 says translator output is collected
into a program image "which contains information needed for
execution in its execution environment." A reasonable inference
is that all code generation could occur at the end of translation
phase 8, as part of producing that information.

The first two points of paragraph 2 in section 1:

    This International Standard does not specify

      * the mechanism by which C programs are transformed for use
        by a data-processing system;

      * the mechanism by which C programs are invoked for use by a
        data-processing system;

The key phrase in section 5.1.2.3: "The /least requirements/ on a
conforming implementation are: [...]" [emphasis added].

Nothing in the C standard requires an implementation to generate
executable code. The output of translation phase 7 could be a
machine-independent intermediate form. The output of translation
phase 8 could be the same machine-independent intermediate form.
Executing the program could be running an interpreter on the program
"executable" holding only the machine-independent intermediate
parts, and the interpreter might carry out optimizations at run
time. All of these possibilities are allowed in a conforming
implementation as long as the "least requirements" of 5.1.2.3 are
met.
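
To make the "least requirements" more concrete, here is a small sketch,
assuming a hosted implementation. The names status_reg and step are
hypothetical, chosen only for this example. The volatile store and the
data written to the stream fall under the least requirements; the final
memset of a buffer that is never read again does not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a device register: a volatile object. */
volatile int status_reg;

int step(FILE *out)
{
    char scratch[16];
    int sum;

    memset(scratch, 'x', sizeof scratch);
    sum = scratch[0] + scratch[15];      /* the buffer contents are used here */

    /* Dead store: scratch is never read again, so no observable
       behaviour depends on this call. */
    memset(scratch, 0, sizeof scratch);

    status_reg = 1;                      /* access to a volatile object */
    fprintf(out, "sum=%d\n", sum);       /* data written to a file */
    return sum;
}
```

An implementation may drop the second memset, whether at compile time,
at link time or at run time, and still meet the requirements, because
no volatile access and no file output depends on it.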

It's a mistake to draw any firm conclusions based on reading parts
of the standard in isolation. The C standard has been written as a
cohesive whole, and it's important to understand it in the same way.

Related to that, although the C standard gives explicit definitions
for many words and phrases, it also uses words that it does not
define (and presumably are not defined in any of the normative
references, though that may be difficult to verify). When
confronted with one of these non-defined terms, often arguments are
made that a word means X or Y or Z, because of ... (fill in the
blank). It's important to remember that, whatever the case is for X
or Y or Z, what /we/ think doesn't matter; all that does matter is
what the standard's authors (and members of the ISO C committee)
think. The C standard means what the ISO C group thinks it means.
They are the ultimate and sole authority. Any discussion about what
the C standard requires that ignores that or pretends otherwise is
a meaningless exercise.

