Does reading an uninitialized object have undefined behavior?

<87zg3pq1ym.fsf@nosuchdomain.example.com>

https://news.novabbs.org/devel/article-flat.php?id=525&group=comp.std.c#525
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.std.c
Subject: Does reading an uninitialized object have undefined behavior?
Date: Thu, 20 Jul 2023 22:16:01 -0700
Organization: None to speak of
Lines: 83
Message-ID: <87zg3pq1ym.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="2370f913b850030e0527dd0f7396627d";
logging-data="3293548"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18fwVdkCkH+Tmj2ymMo73PE"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:8aki8yKNElyLdN1U3vwilRAU1ME=
sha1:23z1n6lZpxOJ11/InfvxON9VJ2E=
 by: Keith Thompson - Fri, 21 Jul 2023 05:16 UTC

N3096 is the last public draft of the upcoming C23 standard.

N3096 J.2 says:

The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).

I'll use an `int` object in my example.

Reading an object that holds a non-value representation has undefined
behavior, but not all integer types have non-value representations
-- and if an implementation has certain characteristics, we can
reliably infer that int has no non-value representations (called
"trap representations" in C99, C11, and C17).

Consider this program:
```
#include <limits.h>
int main(void) {
    int foo;
    if (sizeof (int) == 4 &&
        CHAR_BIT == 8 &&
        INT_MAX == 2147483647 &&
        INT_MIN == -INT_MAX-1)
    {
        int bar = foo;
    }
}
```

If the condition is true (as it is for many real-world
implementations), then int has no padding bits and no trap
representations. The object `foo` has an indeterminate representation
when it's used to initialize `bar`. Since it cannot have a non-value
representation, it has an unspecified value.

If J.2(11) is correct, then the use of the value results in undefined
behavior.

But Annex J is non-normative, and as far as I can tell there is no
normative text in the standard that says the behavior is undefined.

6.2.4 discusses storage duration.

6.7.10 discusses initialization; p11 implies that the representation of
`foo` is indeterminate. It does not say what happens when that
representation is read.

6.8 discusses statements and blocks, and repeats that "the
representation of objects without an initializer becomes
indeterminate".

None of these discuss what happens when the value of an object with
an indeterminate representation is used -- nor does any other text
I found by searching the standard for "indeterminate representation".

I see no relevant changes between C11 and C23 (except that C23 changes
the term "trap representation" to "non-value representation").

I suggest there are three possible resolutions:

1. J.2(11) is correct and I've missed something (always a possibility,
but so far nobody in comp.lang.c has come up with anything).

2. J.2(11) reflects the intent, and normative text somewhere else
in the standard needs to be updated or added to make it clear
that using the value of an object with automatic storage duration
while the object has an indeterminate representation has undefined
behavior.

3. J.2(11) is incorrect and needs to be modified or deleted.
(This would also imply that compilers may not perform certain
optimizations. I have no idea whether any compilers would actually
be affected.)

I'm going to post this to comp.std.c and email it to the C23 editors.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: Does reading an uninitialized object have undefined behavior?

<87zg3pnuse.fsf@bsb.me.uk>

https://news.novabbs.org/devel/article-flat.php?id=527&group=comp.std.c#527
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Fri, 21 Jul 2023 16:33:53 +0100
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <87zg3pnuse.fsf@bsb.me.uk>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="4d6d411b072660d280bdd2ba4d2c5af7";
logging-data="3483897"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/mGclA8SbQJWSb/ZTx0XkEwbjyrItSSis="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:8oxHKqXO8q0sa5hFcnfXk3saoM0=
sha1:An4K6AH6xt7P4/WFDHjLpsnSWjU=
X-BSB-Auth: 1.4f9c829c913586dfae59.20230721163353BST.87zg3pnuse.fsf@bsb.me.uk
 by: Ben Bacarisse - Fri, 21 Jul 2023 15:33 UTC

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> N3096 is the last public draft of the upcoming C23 standard.
>
> N3096 J.2 says:
>
> The behavior is undefined in the following circumstances:
> [...]
> (11) The value of an object with automatic storage duration is
> used while the object has an indeterminate representation
> (6.2.4, 6.7.10, 6.8).
>
> I'll use an `int` object in my example.
>
> Reading an object that holds a non-value representation has undefined
> behavior, but not all integer types have non-value representations
> -- and if an implementation has certain characteristics, we can
> reliably infer that int has no non-value representations (called
> "trap representations" in C99, C11, and C17).
>
> Consider this program:
> ```
> #include <limits.h>
> int main(void) {
> int foo;
> if (sizeof (int) == 4 &&
> CHAR_BIT == 8 &&
> INT_MAX == 2147483647 &&
> INT_MIN == -INT_MAX-1)
> {
> int bar = foo;
> }
> }
> ```
>
> If the condition is true (as it is for many real-world
> implementations), then int has no padding bits and no trap
> representations. The object `foo` has an indeterminate representation
> when it's used to initialize `bar`. Since it cannot have a non-value
> representation, it has an unspecified value.
>
> If J.2(11) is correct, then the use of the value results in undefined
> behavior.
>
> But Annex J is non-normative, and as far as I can tell there is no
> normative text in the standard that says the behavior is undefined.

6.3.2.1 p2:

"[...] If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage class
(never had its address taken), and that object is uninitialized (not
declared with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined."

seems to cover it. The restriction on not having its address taken
seems odd.

--
Ben.

Re: Does reading an uninitialized object have undefined behavior?

<20230721002225.404@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=528&group=comp.std.c#528
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Fri, 21 Jul 2023 17:42:23 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 66
Message-ID: <20230721002225.404@kylheku.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
Injection-Date: Fri, 21 Jul 2023 17:42:23 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d3cdef0fa121ce30c3085be8a1433f26";
logging-data="3539184"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/H6A+hcUAE7c9hEtDiv9YhcAt1oc79524="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:qwvZAZwaFACjt8+v/3fT+S+i6Gk=
 by: Kaz Kylheku - Fri, 21 Jul 2023 17:42 UTC

On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> N3096 is the last public draft of the upcoming C23 standard.
>
> N3096 J.2 says:
>
> The behavior is undefined in the following circumstances:
> [...]
> (11) The value of an object with automatic storage duration is
> used while the object has an indeterminate representation
> (6.2.4, 6.7.10, 6.8).

Personally, I think that the root cause of this whole issue is
the defective definition of indeterminate value.

Indeterminacy must be an abstract concept that is not encoded
in the bits of the object; it is a matter of provenance.

An indeterminate integer could have a valid bit pattern,
such as all zero, yet the implementation should be free to terminate
with a diagnostic (or behave in other ways) when it is accessed.

It should not be possible to tell whether an object is indeterminate
by looking at its bits.

An implementation can track this with meta data. Translation time
flow-analysis data can catch some uses of uninitialized objects;
that's how we get classic uninitialized variable warnings.

An implementation can track uninitialized bits at run-time with
hidden meta-data. The Valgrind debugging tool does this; for
every bit, whose value is necessarily always 0 or 1, it tracks
whether the bit is initialized.

That poor definition of indeterminate value should go.

Otherwise the standard is contradicting itself and doing
silly things like asserting that using an indeterminate value
is undefined behavior if it is a local variable with automatic
storage.

A reasonable definition of indeterminate might be:

indeterminate

an abstract status indicating that a value is invalid,
irrespective of the content of the bits which constitute
that value.

An improperly obtained value is indeterminate(1).

A previously valid value may lapse into indeterminate status.(2)

Any use of an indeterminate value is undefined behavior.

--
(1) For example, a value obtained accessing an uninitialized
object defined in automatic storage, or in an uninitialized
region of memory obtained from malloc

(2) For example, a pointer to an object becomes indeterminate
if that object is deallocated.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Does reading an uninitialized object have undefined behavior?

<874jlxozzz.fsf@nosuchdomain.example.com>

https://news.novabbs.org/devel/article-flat.php?id=529&group=comp.std.c#529
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Fri, 21 Jul 2023 11:56:00 -0700
Organization: None to speak of
Lines: 109
Message-ID: <874jlxozzz.fsf@nosuchdomain.example.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<87zg3pnuse.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="2370f913b850030e0527dd0f7396627d";
logging-data="3568820"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX190DnGRFUDHJYokwy2xZ5u4"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:EF5yEUeiy2UdIKC6LxdfLFEHwbw=
sha1:7roxsQe7taKkdqyaBvW8OXYiPds=
 by: Keith Thompson - Fri, 21 Jul 2023 18:56 UTC

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>> N3096 is the last public draft of the upcoming C23 standard.
>>
>> N3096 J.2 says:
>>
>> The behavior is undefined in the following circumstances:
>> [...]
>> (11) The value of an object with automatic storage duration is
>> used while the object has an indeterminate representation
>> (6.2.4, 6.7.10, 6.8).
>>
>> I'll use an `int` object in my example.
>>
>> Reading an object that holds a non-value representation has undefined
>> behavior, but not all integer types have non-value representations
>> -- and if an implementation has certain characteristics, we can
>> reliably infer that int has no non-value representations (called
>> "trap representations" in C99, C11, and C17).
>>
>> Consider this program:
>> ```
>> #include <limits.h>
>> int main(void) {
>> int foo;
>> if (sizeof (int) == 4 &&
>> CHAR_BIT == 8 &&
>> INT_MAX == 2147483647 &&
>> INT_MIN == -INT_MAX-1)
>> {
>> int bar = foo;
>> }
>> }
>> ```
>>
>> If the condition is true (as it is for many real-world
>> implementations), then int has no padding bits and no trap
>> representations. The object `foo` has an indeterminate representation
>> when it's used to initialize `bar`. Since it cannot have a non-value
>> representation, it has an unspecified value.
>>
>> If J.2(11) is correct, then the use of the value results in undefined
>> behavior.
>>
>> But Annex J is non-normative, and as far as I can tell there is no
>> normative text in the standard that says the behavior is undefined.
>
> 6.3.2.1 p2:
>
> "[...] If the lvalue designates an object of automatic storage
> duration that could have been declared with the register storage class
> (never had its address taken), and that object is uninitialized (not
> declared with an initializer and no assignment to it has been
> performed prior to use), the behavior is undefined."
>
> seems to cover it. The restriction on not having its address taken
> seems odd.

Good find.

That sentence was added in C11 (it doesn't appear in C99 or in
N1256, which consists of C99 plus the three Technical Corrigenda)
in response to DR #338. Since the wording in Annex J goes back to
C99 in its current form, and to C90 in a slightly different form,
that can't be what Annex J is referring to. And the statement
in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
retroactive justification.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm

Yes, that restriction does seem strange. It was inspired by the
IA64 (Itanium) architecture, which has an extra trap bit in each
CPU register (NaT, "not a thing"). The "could have been declared
with the register storage class" wording is there because the IA64
NaT bit exists only in CPU registers, not in memory.

An object with automatic storage duration might be stored in an IA64
CPU register. If the object is not initialized, the register's
NaT bit would be set. Any attempt to read it would cause a trap.
Writing it would clear the NaT bit.

Which means that a hypothetical CPU with something like a NaT bit
on each word of memory (iAPX 432? i960?) might cause a trap in
circumstances not covered by that wording -- but it *is* covered
by the wording in Annex J.

(Normally, an object whose address is taken can still be stored in
a CPU register for part of its lifetime. The effect is to forbid
certain optimizations on IA64-like systems.)

It's tempting to conclude that reading an uninitialized automatic
object whose address is taken is *not* undefined behavior
(https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
but the standard doesn't say so.

C90's Annex G (renamed to Annex J in later editions) says:

The behavior in the following circumstances is undefined:
[...]
- The value of an uninitialized object that has automatic storage
duration is used before a value is assigned (6.5.7).

6.5.7 discusses initialization, but doesn't say that reading an
uninitialized object has undefined behavior, so the issue is an old one.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: Does reading an uninitialized object have undefined behavior?

<87fs5hnipv.fsf@bsb.me.uk>

https://news.novabbs.org/devel/article-flat.php?id=530&group=comp.std.c#530
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Fri, 21 Jul 2023 20:54:36 +0100
Organization: A noiseless patient Spider
Lines: 140
Message-ID: <87fs5hnipv.fsf@bsb.me.uk>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<87zg3pnuse.fsf@bsb.me.uk> <874jlxozzz.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="4d6d411b072660d280bdd2ba4d2c5af7";
logging-data="3591362"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18SLhr7VzyrprwWBTNwH6p1FLAVIVjK0T8="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:y/FIOoFNJ6k5xJn6BgmAlhJjtMg=
sha1:Iduk4wmU9f79MwiaWSAjhaS4cgE=
X-BSB-Auth: 1.d1f033b13e0172d23bc0.20230721205436BST.87fs5hnipv.fsf@bsb.me.uk
 by: Ben Bacarisse - Fri, 21 Jul 2023 19:54 UTC

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>> N3096 is the last public draft of the upcoming C23 standard.
>>>
>>> N3096 J.2 says:
>>>
>>> The behavior is undefined in the following circumstances:
>>> [...]
>>> (11) The value of an object with automatic storage duration is
>>> used while the object has an indeterminate representation
>>> (6.2.4, 6.7.10, 6.8).
>>>
>>> I'll use an `int` object in my example.
>>>
>>> Reading an object that holds a non-value representation has undefined
>>> behavior, but not all integer types have non-value representations
>>> -- and if an implementation has certain characteristics, we can
>>> reliably infer that int has no non-value representations (called
>>> "trap representations" in C99, C11, and C17).
>>>
>>> Consider this program:
>>> ```
>>> #include <limits.h>
>>> int main(void) {
>>> int foo;
>>> if (sizeof (int) == 4 &&
>>> CHAR_BIT == 8 &&
>>> INT_MAX == 2147483647 &&
>>> INT_MIN == -INT_MAX-1)
>>> {
>>> int bar = foo;
>>> }
>>> }
>>> ```
>>>
>>> If the condition is true (as it is for many real-world
>>> implementations), then int has no padding bits and no trap
>>> representations. The object `foo` has an indeterminate representation
>>> when it's used to initialize `bar`. Since it cannot have a non-value
>>> representation, it has an unspecified value.
>>>
>>> If J.2(11) is correct, then the use of the value results in undefined
>>> behavior.
>>>
>>> But Annex J is non-normative, and as far as I can tell there is no
>>> normative text in the standard that says the behavior is undefined.
>>
>> 6.3.2.1 p2:
>>
>> "[...] If the lvalue designates an object of automatic storage
>> duration that could have been declared with the register storage class
>> (never had its address taken), and that object is uninitialized (not
>> declared with an initializer and no assignment to it has been
>> performed prior to use), the behavior is undefined."
>>
>> seems to cover it. The restriction on not having its address taken
>> seems odd.
>
> Good find.
>
> That sentence was added in C11 (it doesn't appear in C99 or in
> N1256, which consists of C99 plus the three Technical Corrigenda)
> in response to DR #338. Since the wording in Annex J goes back to
> C99 in its current form, and to C90 in a slightly different form,
> that can't be what Annex J is referring to. And the statement
> in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
> retroactive justification.

Thanks for looking into the history. I was going to do that when I had
some time.

There are three relevant clauses in Annex J, and I think we should keep
them all in mind. Sadly, they are not numbered (until C23) so I've
given them 'UB' numbers taken from the similar wording in C23.

— The value of an object with automatic storage duration is used while
it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]

— A trap representation is read by an lvalue expression that does not
have character type (6.2.6.1). [UB-12]

— An lvalue designating an object of automatic storage duration that
could have been declared with the register storage class is used in
a context that requires the value of the designated object, but the
object is uninitialized. (6.3.2.1). [UB-20]

Clearly, UB-20 is explained by the quote I posted, but UB-11 (the one we
are talking about) is there as well and, as you say, can't be fully
explained by that normative quote.

> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
>
> Yes, that restriction does seem strange. It was inspired by the
> IA64 (Itanium) architecture, which has an extra trap bit in each
> CPU register (NaT, "not a thing"). The "could have been declared
> with the register storage class" wording is there because the IA64
> NaT bit exists only in CPU registers, not in memory.

Thanks. I wondered if it might have been some hardware consideration...

> An object with automatic storage duration might be stored in an IA64
> CPU register. If the object is not initialized, the register's
> NaT bit would be set. Any attempt to read it would cause a trap.
> Writing it would clear the NaT bit.
>
> Which means that a hypothetical CPU with something like a NaT bit
> on each word of memory (iAPX 432? i960?) might cause a trap in
> circumstances not covered by that wording -- but it *is* covered
> by the wording in Annex J.

It's covered by UB-12 and that's backed up by normative text,
specifically paragraph 5 of the section cited in UB-12.

> (Normally, an object whose address is taken can still be stored in
> a CPU register for part of its lifetime. The effect is to forbid
> certain optimizations on IA64-like systems.)
>
> It's tempting to conclude that reading an uninitialized automatic
> object whose address is taken is *not* undefined behavior
> (https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
> but the standard doesn't say so.

But it doesn't say that it is UB either, does it? That case is excluded
in 6.3.2.1 p2, but there's nothing else covering it but the non-normative
Annex J.

> C90's Annex G (renamed to Annex J in later editions) says:
>
> The behavior in the following circumstances is undefined:
> [...]
> - The value of an uninitialized object that has automatic storage
> duration is used before a value is assigned (6.5.7).
>
> 6.5.7 discusses initialization, but doesn't say that reading an
> uninitialized object has undefined behavior, so the issue is an old one.

--
Ben.

Re: Does reading an uninitialized object have undefined behavior?

<87a5vpnegz.fsf@nosuchdomain.example.com>

https://news.novabbs.org/devel/article-flat.php?id=531&group=comp.std.c#531
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Fri, 21 Jul 2023 14:26:20 -0700
Organization: None to speak of
Lines: 59
Message-ID: <87a5vpnegz.fsf@nosuchdomain.example.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<87zg3pnuse.fsf@bsb.me.uk> <874jlxozzz.fsf@nosuchdomain.example.com>
<87fs5hnipv.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="2370f913b850030e0527dd0f7396627d";
logging-data="3626045"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/99lDE4wQQxMr4XIKrL/53"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:80wg9vI9wUT2SzurX8aEygjmuso=
sha1:cNa93JCSNVMoCvBeOD8q8JFCvLw=
 by: Keith Thompson - Fri, 21 Jul 2023 21:26 UTC

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
> There are three relevant clauses in Annex J, and I think we should keep
> them all in mind. Sadly, they are not numbered (until C23) so I've
> given them 'UB' numbers taken from the similar wording in C23.
>
> — The value of an object with automatic storage duration is used while
> it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
>
> — A trap representation is read by an lvalue expression that does not
> have character type (6.2.6.1). [UB-12]
>
> — An lvalue designating an object of automatic storage duration that
> could have been declared with the register storage class is used in
> a context that requires the value of the designated object, but the
> object is uninitialized. (6.3.2.1). [UB-20]
[...]
>> An object with automatic storage duration might be stored in an IA64
>> CPU register. If the object is not initialized, the register's
>> NaT bit would be set. Any attempt to read it would cause a trap.
>> Writing it would clear the NaT bit.
>>
>> Which means that a hypothetical CPU with something like a NaT bit
>> on each word of memory (iAPX 432? i960?) might cause a trap in
>> circumstances not covered by that wording -- but it *is* covered
>> by the wording in Annex J.
>
> It's covered by UB-12 and that's backed up by normative text,
> specifically paragraph 5 of the section cited in UB-12.

I don't think so. A "non-value representation" (formerly a "trap
representation") is determined by the bits making up the representation
of an object. For an integer type, such a representation can occur only
if the type has padding bits. The IA64 NaT bit is not part of the
representation; it's neither a value bit nor a padding bit.

For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
defined as a set of 8 bytes that can be copied into an object of type
`unsigned char[8]`. The NaT bit does not contribute to the size of the
object.

I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation.
An automatic object with no initialization, or a malloc()ed object,
starts with an indeterminate value, and accessing that value
(other than as an array of characters) has undefined behavior.
(This is a proposal, not what the standard currently says.)
IA64 happens to have a way of (partially) representing that
provenance in hardware, outside the object in question. Other or
future architectures might do a more complete job.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: Does reading an uninitialized object have undefined behavior?

<874jlwopn5.fsf@bsb.me.uk>

https://news.novabbs.org/devel/article-flat.php?id=532&group=comp.std.c#532
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Fri, 21 Jul 2023 23:39:42 +0100
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <874jlwopn5.fsf@bsb.me.uk>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<87zg3pnuse.fsf@bsb.me.uk> <874jlxozzz.fsf@nosuchdomain.example.com>
<87fs5hnipv.fsf@bsb.me.uk> <87a5vpnegz.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="51a6bbfc275c790b968946b86f670b4c";
logging-data="3647541"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18tr3IwHKV56uZuXXt/24Z7/qGaz6AMwrM="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:OCk2oAONWh0/GtzIYsbRiG23Ahk=
sha1:D2aVoRDVXz7zvnxjp0B4/UjVEpk=
X-BSB-Auth: 1.2bf591313e11716e133c.20230721233942BST.874jlwopn5.fsf@bsb.me.uk
 by: Ben Bacarisse - Fri, 21 Jul 2023 22:39 UTC

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> [...]
>> There are three relevant clauses in Annex J, and I think we should keep
>> them all in mind. Sadly, they are not numbered (until C23) so I've
>> given them 'UB' numbers taken from the similar wording in C23.
>>
>> — The value of an object with automatic storage duration is used while
>> it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
>>
>> — A trap representation is read by an lvalue expression that does not
>> have character type (6.2.6.1). [UB-12]
>>
>> — An lvalue designating an object of automatic storage duration that
>> could have been declared with the register storage class is used in
>> a context that requires the value of the designated object, but the
>> object is uninitialized. (6.3.2.1). [UB-20]
> [...]
>>> An object with automatic storage duration might be stored in an IA64
>>> CPU register. If the object is not initialized, the register's
>>> NaT bit would be set. Any attempt to read it would cause a trap.
>>> Writing it would clear the NaT bit.
>>>
>>> Which means that a hypothetical CPU with something like a NaT bit
>>> on each word of memory (iAPX 432? i960?) might cause a trap in
>>> circumstances not covered by that wording -- but it *is* covered
>>> by the wording in Annex J.
>>
>> It's covered by UB-12 and that's backed up by normative text,
>> specifically paragraph 5 of the section cited in UB-12.
>
> I don't think so. A "non-value representation" (formerly a "trap
> representation") is determined by the bits making up the representation
> of an object. For an integer type, such a representation can occur only
> if the type has padding bits. The IA64 NaT bit is not part of the
> representation; it's neither a value bit nor a padding bit.
>
> For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
> defined as a set of 8 bytes that can be copied into an object of type
> `unsigned char[8]`. The NaT bit does not contribute to the size of the
> object.

Ah, right. I thought you were including it as a padding bit.

> I think the right way for C to permit NaT-like bits is, as Kaz
> suggested, to define "indeterminate value" in terms of provenance,
> not just the bits that make up its current representation.
> An automatic object with no initialization, or a malloc()ed object,
> starts with an indeterminate value, and accessing that value
> (other than as an array of characters) has undefined behavior.
> (This is a proposal, not what the standard currently says.)
> IA64 happens to have a way of (partially) representing that
> provenance in hardware, outside the object in question. Other or
> future architectures might do a more complete job.
>
> [...]

That would work.

--
Ben.

Re: Does reading an uninitialized object have undefined behavior?

<20230721233227.651@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=534&group=comp.std.c#534
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Sat, 22 Jul 2023 06:40:39 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <20230721233227.651@kylheku.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<87zg3pnuse.fsf@bsb.me.uk>
Injection-Date: Sat, 22 Jul 2023 06:40:39 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b30b716b22f4706121b4a047272243a6";
logging-data="3918199"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/byYTraUq1YHpYd/m/MbfOYGU7/IDWb20="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:HR5+rxLQbRSaEWu+XoLgmrar9UY=
 by: Kaz Kylheku - Sat, 22 Jul 2023 06:40 UTC

On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> 6.3.2.1 p2:
>
> "[...] If the lvalue designates an object of automatic storage
> duration that could have been declared with the register storage class
> (never had its address taken), and that object is uninitialized (not
> declared with an initializer and no assignment to it has been
> performed prior to use), the behavior is undefined."
>
> seems to cover it. The restriction on not having its address taken
> seems odd.

Wording like that looks like someone's solo documentation effort,
not peer-reviewed by an expert committee.

That looks as if the intent is to allow some diagnoses of uses of
uninitialized variables, while discouraging others.

However, it doesn't seem a good idea to be constraining
implementations in how clever they can be in identifying
an erroneous situation.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Does reading an uninitialized object have undefined behavior?

<21265efa-1bfe-4049-950f-45b75f0b4f71n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=535&group=comp.std.c#535
X-Received: by 2002:a05:622a:1827:b0:3df:375:5102 with SMTP id t39-20020a05622a182700b003df03755102mr11288qtc.2.1690031034238;
Sat, 22 Jul 2023 06:03:54 -0700 (PDT)
X-Received: by 2002:a4a:3794:0:b0:569:a35b:1bcd with SMTP id
r142-20020a4a3794000000b00569a35b1bcdmr6036202oor.1.1690031033966; Sat, 22
Jul 2023 06:03:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.std.c
Date: Sat, 22 Jul 2023 06:03:53 -0700 (PDT)
In-Reply-To: <20230721233227.651@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a02:810d:acbf:c050:8e97:5915:746b:2720;
posting-account=RQgdUAoAAACC04vq-o2ZyxdALW1NmdRY
NNTP-Posting-Host: 2a02:810d:acbf:c050:8e97:5915:746b:2720
References: <87zg3pq1ym.fsf@nosuchdomain.example.com> <87zg3pnuse.fsf@bsb.me.uk>
<20230721233227.651@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <21265efa-1bfe-4049-950f-45b75f0b4f71n@googlegroups.com>
Subject: Re: Does reading an uninitialized object have undefined behavior?
From: ma.uecker@gmail.com (Martin Uecker)
Injection-Date: Sat, 22 Jul 2023 13:03:54 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3899
 by: Martin Uecker - Sat, 22 Jul 2023 13:03 UTC

On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
> On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> > 6.3.2.1 p2:
> >
> > "[...] If the lvalue designates an object of automatic storage
> > duration that could have been declared with the register storage class
> > (never had its address taken), and that object is uninitialized (not
> > declared with an initializer and no assignment to it has been
> > performed prior to use), the behavior is undefined."
> >
> > seems to cover it. The restriction on not having its address taken
> > seems odd.
> Wording like that looks like someone's solo documentation effort,
> not peer-reviewed by an expert committee.
>
> That looks as if the intent is to allow some diagnoses of uses of
> uninitialized variables, while discouraging others.
>
> However, it doesn't seem a good idea to be constraining
> implementations in how clever they can be in identifying
> an erroneous situation.

I personally like this rule (but I am speaking only for myself; there
is no full consensus about the exact interpretation of the standard,
nor about what it should say). I will try to explain why.

In C, we can also access objects using character pointers. This
should work in all cases, even for non-value (trap) representations,
and is also used a lot in practice to copy uninitialized or partially
initialized objects. If one makes all reads of objects with
indeterminate representation have undefined behavior, then
this would not work anymore.

If one wants to allow this (and a lot of real-world programs rely
on this), then one has to invent rules for how this works with an
abstract (provenance-based) notion of indeterminate values.
This turns out to be difficult.

But if we keep this rule, it becomes very simple: On the one
hand, all reads of uninitialized automatic variables whose
address is not taken are undefined behavior. This is the most
useful behavior for detecting bugs and/or optimization.

On the other hand, taking an address and working with a character
pointer to copy or manipulate an object is always defined: one
simply gets unspecified representation bytes (which may be
a non-value representation for some type, and it is UB to
read them using an lvalue of this type). So low-level operations
with partially initialized objects work as expected without having
to introduce complicated rules.

It will cost a tiny bit of optimization opportunities, but avoid
a lot of trouble.
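
A minimal sketch of the second case, added for illustration and not part of the original post. It assumes the reading described above, under which accessing the bytes of an address-taken object through `unsigned char` is defined even when some bytes were never written. The names `struct pair`, `copy_bytes` and `copy_partially_initialized` are hypothetical.

```c
#include <stddef.h>

struct pair { int x; int y; };

/* Copy n bytes from src to dst using unsigned char lvalues only. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Copy a partially initialized struct and read back only the member
   that was actually assigned. */
static int copy_partially_initialized(void)
{
    struct pair a;                 /* a.y is deliberately never assigned */
    struct pair b;

    a.x = 42;
    copy_bytes(&b, &a, sizeof a);  /* address taken, bytes read as unsigned char */
    return b.x;                    /* neither a.y nor b.y is read as an int */
}
```

The bytes of `a.y` are copied with unspecified contents, but no lvalue of type `int` ever reads them, so the only value the program depends on is the initialized member.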

Martin

Re: Does reading an uninitialized object have undefined behavior?

<hV2dneL6E4fNjSP5nZ2dnZeNn_pg4p2d@giganews.com>

https://news.novabbs.org/devel/article-flat.php?id=536&group=comp.std.c#536
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!69.80.99.26.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 24 Jul 2023 05:46:56 +0000
Subject: Re: Does reading an uninitialized object have undefined behavior?
Newsgroups: comp.std.c
References: <87zg3pq1ym.fsf@nosuchdomain.example.com> <20230721002225.404@kylheku.com>
From: jb-usenet@wisemo.com.invalid (Jakob Bohm)
Organization: WiseMo A/S
Date: Mon, 24 Jul 2023 07:53:59 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:6.2) Goanna/20230604 Epyrus/2.0.2
MIME-Version: 1.0
In-Reply-To: <20230721002225.404@kylheku.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Message-ID: <hV2dneL6E4fNjSP5nZ2dnZeNn_pg4p2d@giganews.com>
Lines: 86
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-e9fsfHViByBg72koPJL9kH5e3jhWZ6+P15/euAo7wv7oNMylXsy5a6CfgdTsBCy7MSGzW83IyznWpxy!i292GotzFknruNpCDKcd3xZ0cw0+AdgUC1PfftGKdDoykvM4GQC0WXtovqMqNn4rWarrMozxI/M=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Jakob Bohm - Mon, 24 Jul 2023 05:53 UTC

On 2023-07-21 19:42, Kaz Kylheku wrote:
> On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> N3096 is the last public draft of the upcoming C23 standard.
>>
>> N3096 J.2 says:
>>
>> The behavior is undefined in the following circumstances:
>> [...]
>> (11) The value of an object with automatic storage duration is
>> used while the object has an indeterminate representation
>> (6.2.4, 6.7.10, 6.8).
>
> Personally, I think that the root cause of this whole issue is
> the defective definition of indeterminate value.
>

The problem is much deeper than that. It all boils down to the
obsession in the official C community with abusing the concept of
"undefined" to cover everything from "arbitrary natural semantics
of the hardware" to "optimizing away code unexpectedly". It would
be highly beneficial for a cleanup in C30, or even a corrective TR, to
split up the concept into explicit cases that vary for each
situation. For example, runtime error reporting should be very
different from optimizing away code that may encounter runtime
errors on different hardware than the one it is actually run on.

From a simplified conceptual machine model that resembles a modern
von Neumann architecture with only floating point types having
actual trap representations, a lot of rules that have at various
times been rephrased using the word "undefined" seem utterly absurd,
and applying the current meaning of "undefined" back to the
actual machines that inspired them will tend to cause even more absurdities.

For example, the ability of the IA64 CPUs to raise an actual trap
exception in response to reading an uninitialized register is very
different from aggressively optimizing away code that might use an
unknown stray value, especially with the aggressive optimization
settings required by the IA64 Explicitly Parallel design.

Some of the things that "undefined" in the current text could map
to:

- anyof(A,B,C) = An implementation specific and possibly uncontrolled
  choice between A, B and C (with no others permitted).
- Continuing as if nothing happened.
- Aborting execution, possibly with an error indication.
- raise(X) where X is specified in the standard.
- An implementation specific value to be listed in the
  implementation documentation.
- A standard specified value.
- Executing machine code at a specified memory address in accordance
  with the actual machine behavior (this is common for calling
  a function pointer that isn't set to a C function of proper type).
- Causing the code to be eliminated (think assume(0);).
- Reserved for future standardization in future editions.
- Reserved for standardization in other ISO documents (such as POSIX
  or C++).
- Reserved for implementation specific behavior to be listed in the
  implementation documentation.

For example, the effect of calling assert() with a false value is
"anyof(continuing as if nothing, abort with error)", with it being
implementation defined how to force either choice (in standard C
the choice is controlled by the NDEBUG macro).
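
A minimal sketch of the "continuing as if nothing happened" branch, added for illustration and not part of the original post. In standard C the macro that selects between the two behaviors is NDEBUG, and its state is sampled each time <assert.h> is included. The function name `continues_past_false_assert` is hypothetical.

```c
#define NDEBUG            /* must be defined before <assert.h> is included */
#include <assert.h>

/* With NDEBUG defined, assert(0) expands to a no-op, so execution
   continues and the function returns 1. */
static int continues_past_false_assert(void)
{
    assert(0);
    return 1;
}

/* <assert.h> may be included again; assert is redefined according to
   the current state of NDEBUG, so checks are active from here on. */
#undef NDEBUG
#include <assert.h>
```

Without the `#define NDEBUG` line the same call would write a diagnostic and call abort(), which is the other branch of the proposed anyof().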

There should also be a way for limits.h (one of the few headers
required in free-standing implementations) to specify via new
standard defines if the implementation conforms to common sets
of implementation specific behaviors such as "twos complement int
with wraparound", "ones complement int with wraparound", "sign
and magnitude int with wraparound", "unsigned with wraparound",
"IEEE nnnn floating point with/without overflow exceptions",
"negative int division by positive int rounds towards zero"
(and the other possibilities for division special cases) etc. etc.
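
For comparison, a sketch of how a program can probe one of these properties indirectly today, added for illustration and not part of the original post. It is not the <limits.h> mechanism proposed above; the names `enum int_repr` and `detect_int_repr` are hypothetical. The probe inspects the two low-order bits of -1, which differ between the three signed representations that C permitted before C23 (C23 itself requires two's complement).

```c
/* Classify the representation of negative int values by looking at
   the two low-order bits of -1:
     two's complement    ...11  -> 3
     ones' complement    ...10  -> 2
     sign and magnitude  ...01  -> 1 */
enum int_repr {
    SIGN_MAGNITUDE  = 1,
    ONES_COMPLEMENT = 2,
    TWOS_COMPLEMENT = 3
};

static enum int_repr detect_int_repr(void)
{
    return (enum int_repr)(-1 & 3);
}
```

A probe like this says nothing about overflow behavior or division rounding, which is part of why explicit macros are being asked for.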

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Re: Does reading an uninitialized object have undefined behavior?

<864jlrs28d.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=540&group=comp.std.c#540
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Tue, 25 Jul 2023 21:53:06 -0700
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <864jlrs28d.fsf@linuxsc.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com> <87zg3pnuse.fsf@bsb.me.uk> <20230721233227.651@kylheku.com> <21265efa-1bfe-4049-950f-45b75f0b4f71n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="5ba297a04e2080c4db730e1e3158367c";
logging-data="1490722"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+3YzKaLHNhqMnLjQSJlAagf6POW7Mys+g="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:DfLRlm0CsTWaX2oESGvB84IaESk=
sha1:gBWhM8pLs9ZN/cfkL1NUwXx/RNQ=
 by: Tim Rentsch - Wed, 26 Jul 2023 04:53 UTC

Martin Uecker <ma.uecker@gmail.com> writes:

> On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
>
>> On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
>>
>>> 6.3.2.1 p2:
>>>
>>> "[...] If the lvalue designates an object of automatic storage
>>> duration that could have been declared with the register storage
>>> class (never had its address taken), and that object is
>>> uninitialized (not declared with an initializer and no
>>> assignment to it has been performed prior to use), the behavior
>>> is undefined."
>>>
>>> seems to cover it. The restriction on not having its address
>>> taken seems odd.
>>
>> [...]
>
> I personally like this rule (but I am speaking only for myself; there is
> no full consensus about the exact interpretation of the standard
> nor about what it should say). I will try to explain why. [...]

It's a good rule. I agree with your comments. I guess it's
possible the wording could be improved, but compared to other
parts of the C standard the clarity of this passage is closer to
the top than it is to the bottom.

Re: Does reading an uninitialized object have undefined behavior?

<86zg3jqngv.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=541&group=comp.std.c#541
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Tue, 25 Jul 2023 21:57:20 -0700
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <86zg3jqngv.fsf@linuxsc.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com> <20230721002225.404@kylheku.com> <hV2dneL6E4fNjSP5nZ2dnZeNn_pg4p2d@giganews.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="5ba297a04e2080c4db730e1e3158367c";
logging-data="1490722"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+UmqnCOL26kshtJV5WN9FqxF5imATbtRU="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:5RCPIb/KnV/i2OpF1NGaokrv1ps=
sha1:WB9r1LSm03V/uEVM5BkkTDdvzbY=
 by: Tim Rentsch - Wed, 26 Jul 2023 04:57 UTC

Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2023-07-21 19:42, Kaz Kylheku wrote:
>
>> On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>
>>> N3096 is the last public draft of the upcoming C23 standard.
>>>
>>> N3096 J.2 says:
>>>
>>> The behavior is undefined in the following circumstances:
>>> [...]
>>> (11) The value of an object with automatic storage duration is
>>> used while the object has an indeterminate representation
>>> (6.2.4, 6.7.10, 6.8).
>>
>> Personally, I think that the root cause of this whole issue is
>> the defective definition of indeterminate value.
>
> The problem is much deeper than that. It all boils down to the
> obsession in the official C community to abuse the concept of
> "undefined" to cover everything from "arbitrary natural semantics
> of the hardware" to "optimizing away code unexpectedly". [...]

This discussion looks interesting but it seems better that
there be a separate thread to take it up.

Re: Does reading an uninitialized object have undefined behavior?

<864jlfj34p.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=545&group=comp.std.c#545
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Thu, 03 Aug 2023 13:13:26 -0700
Organization: A noiseless patient Spider
Lines: 448
Message-ID: <864jlfj34p.fsf@linuxsc.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="4bb75e9aac326999a2b930a7e60305cb";
logging-data="960272"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX191Z1UJl/0gemhEyyAt9KWLiTgeEaEpT0I="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:MSCkZZMoNKkX4D18PGcwmmQUEg8=
sha1:FXWB6fh5sFCT2Ut5Dh+MbURceOc=
 by: Tim Rentsch - Thu, 3 Aug 2023 20:13 UTC

Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

Background: Annex J part 2 says (in various phrasings in
different revisions of the C standard, with the one below
being taken from C90):

    The value of an uninitialized object that has automatic
    storage duration is used before a value is assigned [is
    undefined behavior] (6.5.7)

Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?

I think this question can be answered convincingly by reviewing
the subject's history in each revision of the ISO C standard.

We start in C90.

In C90 reading the value of an uninitialized object is always
undefined behavior (and that includes malloc()ed storage as well
as automatic storage duration objects). The C90 standard says,
in 6.5.7:

    If an object that has automatic storage duration is not
    initialized explicitly, its value is indeterminate.

and in 7.10.3.3:

    The malloc function allocates space for an object whose size
    is specified by size and whose value is indeterminate.

The term "indeterminate" is not defined in C90, but accessing
storage that is indeterminate is explicitly undefined behavior.
Indeed such uses are part of the /definition/ of undefined
behavior - C90 says in 3.16 (which is an entry in Definitions):

    undefined behavior: Behavior, upon use of a nonportable or
    erroneous program construct, of erroneous data, or of
    indeterminately valued objects, for which this International
    Standard imposes no requirements.

So for C90 we have a clear answer: always undefined behavior for
accessing any uninitialized object.

Unfortunately the C90 scheme has some serious issues. There is
no exception for reading using a character type. More seriously,
although C90 gives some situations that cause values to be
indeterminate, it doesn't say anything about making them /not/
be indeterminate. We can guess (but only guess) that assigning
a value to the object as a whole removes indeterminate-ness, but
what about these cases (and other similar ones):

    int x;
    *(char*)&x = 0;
    // is the value of x now indeterminate or not?

    struct { int x, y; } s;
    s.x = 0;
    // is the value of s now indeterminate or not?

Again, we can make guesses about what these answers should be,
but the C90 standard doesn't say. Clearly C90 has some
significant deficiencies.

Next we look at C99.

(Actually, before we do that, I should mention that C90 was
amended and corrected in 1994, 1995, and 1996, by the three
intermediate documents ISO/IEC 9899/COR1, ISO/IEC 9899/AMD1, and
ISO/IEC 9899/COR2. As far as I am aware these revisions have no
bearing on the matter at hand.)

The C99 standard represents a substantial revision and expansion
of the C90 standard. The relationship between uninitialized
memory and undefined behavior is nearly completely rewritten, and
also made more concrete. There's lots to look at here. Starting
at the top, the definition of undefined behavior is revised not
to give any mention of indeterminately valued objects. Here is
section 3.4.3 paragraph 1:

    undefined behavior
    behavior, upon use of a nonportable or erroneous program
    construct or of erroneous data, for which this International
    Standard imposes no requirements

(Incidentally the section and paragraph references given in this
part of the discussion are relative to the ISO N1256 document.)

The next most prominent change is that "indeterminate value" is
explicitly defined, in section 3.17.2 paragraph 1:

    indeterminate value
    either an unspecified value or a trap representation

This definition makes use of two new terms, "unspecified value"
and "trap representation", that were not used in C90. The term
unspecified value is defined immediately following, in 3.17.3 p1:

    unspecified value
    valid value of the relevant type where this International
    Standard imposes no requirements on which value is chosen in
    any instance

There is also an informative note in p2:

    NOTE An unspecified value cannot be a trap representation.

The term "trap representation" is defined in 6.2.6.1 p5:

    Certain object representations need not represent a value of
    the object type. If the stored value of an object has such a
    representation and is read by an lvalue expression that does
    not have character type, the behavior is undefined. If such
    a representation is produced by a side effect that modifies
    all or any part of the object by an lvalue expression that
    does not have character type, the behavior is undefined. 41)
    Such a representation is called a /trap representation/.

The slant characters around "trap representation" indicate
italics, which the C standard uses to denote a term being
defined. Also there is a '41)' footnote reference

    41) Thus, an automatic variable can be initialized to a trap
    representation without causing undefined behavior, but the
    value of the variable cannot be used until a proper value is
    stored in it.

which underscores the non-undefined-behavior aspect of using
character types to change the object representation (and hence
the value) of an object.
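
A minimal sketch of that aspect, added for illustration and not part of the original post: every byte of an uninitialized `int` is written through an `unsigned char` lvalue, after which the object is read as an `int`. It relies on the guarantee in C11 6.2.6.2 paragraph 5 that, for any integer type, the all-bits-zero object representation represents the value zero. The function name `zero_via_bytes` is hypothetical.

```c
#include <stddef.h>

/* Give an uninitialized int a value purely through character-type
   stores, then read it back as an int. */
static int zero_via_bytes(void)
{
    int x;                                   /* no initializer */
    unsigned char *p = (unsigned char *)&x;  /* character-type access */

    for (size_t i = 0; i < sizeof x; i++)
        p[i] = 0;                            /* every byte is now all-bits-zero */
    return x;                                /* all-bits-zero represents 0 */
}
```

No assignment to `x` itself ever takes place, yet the final read is well defined because the object representation it sees is a valid one.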

The C99 text doesn't use the term "trap representation" very
often. There are several cases where certain types are ruled out
from having trap representations; a few cases where a result
/might be/ a trap representation; and a case involving integer
types where there is an implementation-defined choice as to
whether a specific combination of value bits is a valid value or
a trap representation. Also, in Annex J part 2, the list of
undefined behaviors, there are these summary items:

    A trap representation is read by an lvalue expression that
    does not have character type (6.2.6.1).

    A trap representation is produced by a side effect that
    modifies any part of the object using an lvalue expression
    that does not have character type (6.2.6.1).

which of course correspond directly to what is said in the
definition of trap representation. Based on various passages in
section 6.2.6, which describes the representation of types, we
can deduce that for some integer types all bit combinations must
be a valid value, and so no trap representations are possible for
those types. Such types always include 'unsigned char', and may
also include other integer types depending on the size of the
type, the value of CHAR_BIT, and the values given in <limits.h>
for the range of the type in question. (More concretely, if the
set of distinct values for type T has 2**(sizeof(T)*CHAR_BIT)
elements, then all object representations are valid values, and
thus type T cannot have any trap representations.)
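
A minimal sketch of that counting argument for unsigned types, added for illustration and not part of the original post. An unsigned type whose maximum is 2**N - 1 has 2**N distinct values, so it has no padding bits (and hence no trap representations) exactly when N equals sizeof(T)*CHAR_BIT. The function names `value_bits` and `unsigned_type_has_no_padding` are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of value bits N of an unsigned type whose maximum value
   is max_value == 2**N - 1. */
static unsigned value_bits(uintmax_t max_value)
{
    unsigned bits = 0;

    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Nonzero if every bit of the object representation is a value bit,
   in which case all bit patterns are valid values of the type. */
static int unsigned_type_has_no_padding(uintmax_t max_value, size_t size)
{
    return value_bits(max_value) == size * CHAR_BIT;
}
```

For example, `unsigned_type_has_no_padding(UINT_MAX, sizeof(unsigned))` reports whether `unsigned int` can have trap representations on a given implementation; for `unsigned char` the answer is required to be that it cannot.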

There are three points worth mentioning regarding unspecified
values and trap representations. One is that unspecified values
are always valid values, and never by themselves cause undefined
behavior. Two is that the distinction between an unspecified
value and a trap representation depends on the type used to
access the object. Three is that, once we know the type of an
access, whether a given object holds a valid value or a trap
representation depends only on the bits and bytes that make up
the object representation of the object, and in particular not on
any hidden "magic" state associated with the object. (There is
one case though that deserves a closer look, which is explained
further on.)

The rule for trap representations is simple and clear: any
access of an object whose object representation is a trap
representation of the access's type is undefined behavior, and
this consequence is accurately portrayed in Annex J part 2.

Having settled the question for trap representations, how about
indeterminate values?

Ruling out the definition and an entry in the index, the term
"indeterminate value" (or values plural) appears in just six
places in the C99 standard: three in informative passages
(usually examples), and three normative passages, those being
6.7.8 paragraph 9 (about unnamed members), 6.8 paragraph 3 (about
declarations for objects with automatic storage duration), and
7.20.3.4 paragraph 2 (about bytes added by a call to realloc()).
The sentence in 6.8 paragraph 3 deserves quoting:

    The initializers of objects that have automatic storage
    duration, and the variable length array declarators of
    ordinary identifiers with block scope, are evaluated and the
    values are stored in the objects (including storing an
    indeterminate value in objects without an initializer) each
    time the declaration is reached in the order of execution, as
    if it were a statement, and within each declaration in the
    order that declarators appear.

Section 7 has many places where the word "indeterminate" appears
without being followed by "value". I think most of these can be
safely skipped over, but the description of malloc() deserves
quoting (it is 7.20.3.3 paragraph 2):


[...]

Re: Does reading an uninitialized object have undefined behavior?

<871qgjlqe9.fsf@nosuchdomain.example.com>

https://news.novabbs.org/devel/article-flat.php?id=546&group=comp.std.c#546
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Thu, 03 Aug 2023 15:20:14 -0700
Organization: None to speak of
Lines: 101
Message-ID: <871qgjlqe9.fsf@nosuchdomain.example.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<864jlfj34p.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="ad89738ee6b746bb8bd9b1180a09fd6e";
logging-data="1002229"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+vBu8gjAYem7GPAePrMqSn"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:Djk9iKvEsy51fvHLKxPeb2QufXs=
sha1:b7dosTQNRwf5ZIpsSyvJZdL7Hng=
 by: Keith Thompson - Thu, 3 Aug 2023 22:20 UTC

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
> Repeating the question stated in the Subject line:
>
> Does reading an uninitialized object [always] have undefined
> behavior?
>
> Background: Annex J part 2 says (in various phrasings in
> different revisions of the C standard, with the one below
> being taken from C90):
>
> The value of an uninitialized object that has automatic
> storage duration is used before a value is assigned [is
> undefined behavior] (6.5.7)
>
> Remembering that Annex J is informative rather than normative,
> is this statement right even for a type that has no trap
> representations? To ask that question another way, is this
> statement always right or is it just a (perhaps useful)
> approximation?
[400+ lines deleted]
> Summary: my reading is that accessing an object that has not
> been explicitly stored into since its declaration was evaluated
> is necessarily undefined behavior in C90, but not necessarily
> undefined behavior in C99 and C11 (and AFAIAA also in C17 and
> the upcoming C23). My reasoning is given in detail above.
>
>
> Postscript: this commentary has taken much longer to write than
> I thought it would, for the most part because I made an early
> decision to be systematic and thorough. I hope the effort has
> helped the readers gain confidence in the explanations and
> conclusions stated. I may return to the deferred topic about
> pointer types but have no plans at present about when that might
> be.

Thank you for taking the time to write that.

I'd like to offer a brief summary of the points you made. Please let me
know if my summary is incorrect.

- An "indeterminate value" is by definition either an "unspecified
value" or a "trap representation".

- In C90 (which did not yet define all these terms), accessing the value
of an uninitialized object explicitly has undefined behavior.

- In C99 and later, J.2 (which is *not* normative) states that using the
value of an object with automatic storage duration while it is
indeterminate has undefined behavior. This implies that:
    int main(void) {
        int n;
        n;
    }
has undefined behavior, even if int has no trap representations.

- Statements in J.2 *should* be supported by normative text.

- There is no normative text in any post-C90 edition of the C
standard that supports the claim that reading an uninitialized
int object actually has undefined behavior if it does not hold
a trap representation. (Pointers raise other issues, which I'll
ignore for now.)

- The cited statement in J.2 is incorrect, or at least imprecise.

I agree with you on all the above points.
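
Whether a type has trap representations is the hinge of the points above. As a minimal sketch (not taken from the thread; the helper names uint_value_bits and uint_has_padding_bits are hypothetical), the following checks whether unsigned int has padding bits on the current implementation. An unsigned integer type with no padding bits has no trap representations, because every bit pattern is then a value. The sketch uses unsigned int rather than int because, for signed types, the absence of padding bits is not quite sufficient: C99 and C11 allow one particular sign and value bit pattern to be a trap representation.

```c
#include <limits.h>
#include <stddef.h>

/* Counts the value bits of unsigned int by shifting UINT_MAX down to zero. */
int uint_value_bits(void)
{
    int bits = 0;
    for (unsigned int v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}

/* Returns 1 if the object representation of unsigned int is wider than
   its value bits (that is, it has padding bits), 0 otherwise. */
int uint_has_padding_bits(void)
{
    return (size_t)uint_value_bits() < sizeof(unsigned int) * CHAR_BIT;
}
```

On an implementation where uint_has_padding_bits() returns 0, an uninitialized unsigned int cannot hold a trap representation, which is the case the J.2 wording is being questioned for.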

There is one point on which I think we disagree. It is a matter
of opinion, not of fact. You wrote:

Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?

The statement in N1570 J.2 is:

The behavior is undefined in the following circumstances:
[...]
- The value of an object with automatic storage duration is used
while it is indeterminate (6.2.4, 6.7.9, 6.8).

I get the impression that you're not particularly bothered by the fact
that the statement in J.2 is merely an "approximation". In my opinion,
the statement in J.2 is simply incorrect, and should be fixed. (That's
unlikely to be possible at this stage of the C23 process.) The fact
that Annex J is, to quote the standard's foreword, "for information
only", is not an excuse to ignore factual errors. Readers of the
standard rely on the informative annexes to provide correct information.
This particular text is not just a "(perhaps useful) approximation"; it
is actively misleading.

I'm not criticizing the author of the standard for making this mistake.
Stuff happens. It was likely a result of an oversight during the
transition from C90 to C99.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: Does reading an uninitialized object have undefined behavior?

<05f976a4-270f-42bd-8cc7-1d4616422748n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=548&group=comp.std.c#548
 by: Martin Uecker - Sat, 5 Aug 2023 08:15 UTC

On Friday, August 4, 2023 at 12:20:25 AM UTC+2, Keith Thompson wrote:
> Tim Rentsch <tr.1...@z991.linuxsc.com> writes:
> > Repeating the question stated in the Subject line:
> >
> > Does reading an uninitialized object [always] have undefined
> > behavior?
> >
> > Background: Annex J part 2 says (in various phrasings in
> > different revisions of the C standard, with the one below
> > being taken from C90):
> >
> > The value of an uninitialized object that has automatic
> > storage duration is used before a value is assigned [is
> > undefined behavior] (6.5.7)
> >
> > Remembering that Annex J is informative rather than normative,
> > is this statement right even for a type that has no trap
> > representations? To ask that question another way, is this
> > statement always right or is it just a (perhaps useful)
> > approximation?
> [400+ lines deleted]
> > Summary: my reading is that accessing an object that has not
> > been explicitly stored into since its declaration was evaluated
> > is necessarily undefined behavior in C90, but not necessarily
> > undefined behavior in C99 and C11 (and AFAIAA also in C17 and
> > the upcoming C23). My reasoning is given in detail above.
> >
> >
> > Postscript: this commentary has taken much longer to write than
> > I thought it would, for the most part because I made an early
> > decision to be systematic and thorough. I hope the effort has
> > helped the readers gain confidence in the explanations and
> > conclusions stated. I may return to the deferred topic about
> > pointer types but have no plans at present about when that might
> > be.
> Thank you for taking the time to write that.
>
> I'd like to offer a brief summary of the points you made. Please let me
> know if my summary is incorrect.
>
> - An "indeterminate value" is by definition either an "unspecified
> value" or a "trap representation".
>
> - In C90 (which did not yet define all these terms), accessing the value
> of an uninitialized object explicitly has undefined behavior.
>
> - In C99 and later, J.2 (which is *not* normative) states that using the
> value of an object with automatic storage duration while it is
> indeterminate has undefined behavior. This implies that:
>     int main(void) {
>         int n;
>         n;
>     }
> has undefined behavior, even if int has no trap representations.
>
> - Statements in J.2 *should* be supported by normative text.
>
> - There is no normative text in any post-C90 edition of the C
> standard that supports the claim that reading an uninitialized
> int object actually has undefined behavior if it does not hold
> a trap representation. (Pointers raise other issues, which I'll
> ignore for now.)
>
> - The cited statement in J.2 is incorrect, or at least imprecise.
>
> I agree with you on all the above points.
>
> There is one point on which I think we disagree. It is a matter
> of opinion, not of fact. You wrote:
>
> Remembering that Annex J is informative rather than normative,
> is this statement right even for a type that has no trap
> representations? To ask that question another way, is this
> statement always right or is it just a (perhaps useful)
> approximation?
> The statement in N1570 J.2 is:
> The behavior is undefined in the following circumstances:
> [...]
> - The value of an object with automatic storage duration is used
> while it is indeterminate (6.2.4, 6.7.9, 6.8).
>
> I get the impression that you're not particularly bothered by the fact
> that the statement in J.2 is merely an "approximation". In my opinion,
> the statement in J.2 is simply incorrect, and should be fixed. (That's
> unlikely to be possible at this stage of the C23 process.) The fact
> that Annex J is, to quote the standard's foreword, "for information
> only", is not an excuse to ignore factual errors. Readers of the
> standard rely on the informative annexes to provide correct information.
> This particular text is not just a "(perhaps useful) approximation"; it
> is actively misleading.
>
> I'm not criticizing the author of the standard for making this mistake.
> Stuff happens. It was likely a result of an oversight during the
> transition from C90 to C99.

I personally agree with this analysis and also about the need to fix J.2.
Pointers seem to fit into this scheme if you think about the valid
addresses of objects + null pointers as the set of valid values
for a pointer. Any representation not corresponding to such an
address is then a non-value representation.

But note that many people believe that "indeterminate" should be
understood as an abstract property, propagated similarly to
pointer provenance, that can act as an abstract non-value
representation even for types which have no room for such
representations.

For C23 the rules stay the same. We changed the term "trap representation"
to "non-value representation" because people were often confused.
A non-value representation is UB in lvalue conversion but this does
not necessarily imply a trap. On the other hand, a trap might be
defined behavior caused by a valid value of a type.

The term "indeterminate value" was changed to "indeterminate
representation" because the wording "an indeterminate value is
either an unspecified value or a trap representation" does not
make much sense: value and representation are different
things. Also, some compilers, and C++ as well, have indeterminate
values with different semantics, which caused confusion; i.e.,
in C++ you can copy an indeterminate value from one uninitialized
object to another and this is not UB. In C you either directly
have UB or you copy an unspecified value, which is valid, so
there are no indeterminate values as such.

Martin

Re: Does reading an uninitialized object have undefined behavior?

<86a5uv95g7.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=549&group=comp.std.c#549
 by: Tim Rentsch - Sun, 13 Aug 2023 00:00 UTC

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> I think the right way for C to permit NaT-like bits is, as Kaz
> suggested, to define "indeterminate value" in terms of provenance,
> not just the bits that make up its current representation. [...]

This idea is fundamentally wrong. NaT bits are associated with
particular areas of memory, which is to say objects. The point
of provenance is that non-viability is associated with /values/,
not with objects. Once an area of memory acquires an object
representation, the NaT bit or NaT bits for that memory are set
to zero, end of story. Note also that NaT bits are independent
of what type is used to access an object - if the NaT bit is set
then any access is illegal, no matter what type is used to do the
access. By contrast, provenance is used in situations where
non-viability is associated with values, not with objects. But
values are always type dependent; a pointer object that holds
a value that has been passed to free() is "indeterminate" when
accessed as a pointer type, but perfectly okay to access as an
unsigned char type. The two kinds of situations are essentially
different, and the theoretical models used to characterize the
rules in the two kinds of situations should therefore be
correspondingly essentially different.
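
The type dependence described above can be illustrated with a small sketch. It is not from the thread, and the function name inspect_freed_pointer_bytes is hypothetical. After free(), the value of the pointer object is indeterminate when read as a pointer, but the bytes of that same object can still be read through an unsigned char lvalue.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns how many bytes of the pointer object were
   read through an unsigned char lvalue after free(), or 0 if the
   allocation failed. */
size_t inspect_freed_pointer_bytes(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;                 /* allocation failed; nothing to show */
    free(p);
    /* From here on, reading p as an int * would use an indeterminate value. */

    const unsigned char *bytes = (const unsigned char *)&p;
    unsigned checksum = 0;
    size_t n = 0;
    for (; n < sizeof p; n++)
        checksum += bytes[n];     /* character-type access to the representation */
    (void)checksum;               /* the byte values themselves are not meaningful */
    return n;
}
```

The sketch never evaluates p as a pointer after the call to free(); it only takes the address of the pointer object and reads its representation.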

Re: Does reading an uninitialized object have undefined behavior?

<fcb2be8f-b346-421f-9804-5f94c93266b0n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=552&group=comp.std.c#552
 by: Martin Uecker - Mon, 14 Aug 2023 06:41 UTC

On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
> Keith Thompson <Keith.S.T...@gmail.com> writes:
>
> > I think the right way for C to permit NaT-like bits is, as Kaz
> > suggested, to define "indeterminate value" in terms of provenance,
> > not just the bits that make up its current representation. [...]
>
> This idea is fundamentally wrong. NaT bits are associated with
> particular areas of memory, which is to say objects. The point
> of provenance is that non-viability is associated with /values/,
> not with objects. Once an area of memory acquires an object
> representation, the NaT bit or NaT bits for that memory are set
> to zero, end of story. Note also that NaT bits are independent
> of what type is used to access an object - if the NaT bit is set
> then any access is illegal, no matter what type is used to do the
> access. By contrast, provenance is used in situations where
> non-viability is associated with values, not with objects. But
> values are always type dependent; a pointer object that holds
> a value that has been passed to free() is "indeterminate" when
> accessed as a pointer type, but perfectly okay to access as an
> unsigned char type. The two kinds of situations are essentially
> different, and the theoretical models used to characterize the
> rules in the two kinds of situations should therefore be
> correspondingly essentially different.

One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
from pointer provenance, which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
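
The effective-type case can be sketched as follows. This is an illustration under the usual reading of the effective type rules, not text from the thread, and the function name effective_type_demo is hypothetical. Allocated storage has no declared type; a store through a non-character lvalue sets its effective type, and that type is tracked by the abstract machine rather than by the bytes in memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical demo: returns 42 on success, -1 if allocation fails. */
int effective_type_demo(void)
{
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *mem = malloc(size);
    if (mem == NULL)
        return -1;

    *(float *)mem = 1.0f;   /* effective type of the storage is now float */
    *(int *)mem = 42;       /* a store through an int lvalue makes it int */

    int v = *(int *)mem;    /* fine: the read matches the effective type */
    /* Reading *(float *)mem at this point would be undefined behavior,
       whatever bit pattern the storage happens to hold. */

    free(mem);
    return v;
}
```

Nothing in the object representation records which type was stored last, which is what makes this hidden state comparable to the abstract "indeterminate" property under discussion.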

Martin

Re: Does reading an uninitialized object have undefined behavior?

<864jkz7hrm.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=553&group=comp.std.c#553
 by: Tim Rentsch - Wed, 16 Aug 2023 04:06 UTC

Martin Uecker <ma.uecker@gmail.com> writes:

> On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
>
>> Keith Thompson <Keith.S.T...@gmail.com> writes:
>>
>>> I think the right way for C to permit NaT-like bits is, as Kaz
>>> suggested, to define "indeterminate value" in terms of provenance,
>>> not just the bits that make up its current representation. [...]
>>
>> This idea is fundamentally wrong. NaT bits are associated with
>> particular areas of memory, which is to say objects. The point
>> of provenance is that non-viability is associated with /values/,
>> not with objects. Once an area of memory acquires an object
>> representation, the NaT bit or NaT bits for that memory are set
>> to zero, end of story. Note also that NaT bits are independent
>> of what type is used to access an object - if the NaT bit is set
>> then any access is illegal, no matter what type is used to do the
>> access. By contrast, provenance is used in situations where
>> non-viability is associated with values, not with objects. But
>> values are always type dependent; a pointer object that holds
>> a value that has been passed to free() is "indeterminate" when
>> accessed as a pointer type, but perfectly okay to access as an
>> unsigned char type. The two kinds of situations are essentially
>> different, and the theoretical models used to characterize the
>> rules in the two kinds of situations should therefore be
>> correspondingly essentially different.
>
> One could still consider the idea that "indeterminate" is an
> abstract property that yields UB during read even for types
> that do not have trap representations. There is no wording
> in the C standard to support this, but I would not call this
> idea "fundamentally wrong". You are right that this is different
> from pointer provenance, which is about values. What it would
> have in common with pointer provenance is that there is hidden
> state in the abstract machine associated with memory that
> is not part of the representation. With effective types there
> is another example of this.

My preceding comments were meant to be only about NaT bits (or
NaT-like bits) and provenance. There is an inherent mismatch
between the two, as I have tried to explain. It is only the idea
that provenance would provide a good foundation for defining the
semantics of "NaT everywhere" that I am saying is fundamentally
wrong.

I understand that you want to consider a broader topic, and that,
in the realm of that broader topic, something like provenance
could have a role to play. I think it is worth responding to
that thesis, and am expecting to do so in a separate reply (or
new thread?) although probably not right away.

Re: Does reading an uninitialized object have undefined behavior?

<e043af84-3153-4097-9505-666869fcf727n@googlegroups.com>

https://news.novabbs.org/devel/article-flat.php?id=554&group=comp.std.c#554
 by: Martin Uecker - Wed, 16 Aug 2023 05:40 UTC

On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
> Martin Uecker <ma.u...@gmail.com> writes:
> > On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
> >
> >> Keith Thompson <Keith.S.T...@gmail.com> writes:
> >>
> >>> I think the right way for C to permit NaT-like bits is, as Kaz
> >>> suggested, to define "indeterminate value" in terms of provenance,
> >>> not just the bits that make up its current representation. [...]
> >>
> >> This idea is fundamentally wrong. NaT bits are associated with
> >> particular areas of memory, which is to say objects. The point
> >> of provenance is that non-viability is associated with /values/,
> >> not with objects. Once an area of memory acquires an object
> >> representation, the NaT bit or NaT bits for that memory are set
> >> to zero, end of story. Note also that NaT bits are independent
> >> of what type is used to access an object - if the NaT bit is set
> >> then any access is illegal, no matter what type is used to do the
> >> access. By contrast, provenance is used in situations where
> >> non-viability is associated with values, not with objects. But
> >> values are always type dependent; a pointer object that holds
> >> a value that has been passed to free() is "indeterminate" when
> >> accessed as a pointer type, but perfectly okay to access as an
> >> unsigned char type. The two kinds of situations are essentially
> >> different, and the theoretical models used to characterize the
> >> rules in the two kinds of situations should therefore be
> >> correspondingly essentially different.
> >
> > One could still consider the idea that "indeterminate" is an
> > abstract property that yields UB during read even for types
> > that do not have trap representations. There is no wording
> > in the C standard to support this, but I would not call this
> > idea "fundamentally wrong". You are right that this is different
> > from pointer provenance, which is about values. What it would
> > have in common with pointer provenance is that there is hidden
> > state in the abstract machine associated with memory that
> > is not part of the representation. With effective types there
> > is another example of this.
> My preceding comments were meant to be only about NaT bits (or
> NaT-like bits) and provenance. There is an inherent mismatch
> between the two, as I have tried to explain. It is only the idea
> that provenance would provide a good foundation for defining the
> semantics of "NaT everywhere" that I am saying is fundamentally
> wrong.
>
> I understand that you want to consider a broader topic, and that,
> in the realm of that broader topic, something like provenance
> could have a role to play. I think it is worth responding to
> that thesis, and am expecting to do so in a separate reply (or
> new thread?) although probably not right away.

I would love to hear your comments, because some people
want to have such an abstract notion of "indeterminate", and
some believe that this is how the standard should
be understood already today.

Martin

Re: Does reading an uninitialized object have undefined behavior?

<86v8df55a9.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=555&group=comp.std.c#555
 by: Tim Rentsch - Wed, 16 Aug 2023 16:19 UTC

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
>> Repeating the question stated in the Subject line:
>>
>> Does reading an uninitialized object [always] have undefined
>> behavior?
>>
>> Background: Annex J part 2 says (in various phrasings in
>> different revisions of the C standard, with the one below
>> being taken from C90):
>>
>> The value of an uninitialized object that has automatic
>> storage duration is used before a value is assigned [is
>> undefined behavior] (6.5.7)
>>
>> Remembering that Annex J is informative rather than normative,
>> is this statement right even for a type that has no trap
>> representations? To ask that question another way, is this
>> statement always right or is it just a (perhaps useful)
>> approximation?
>
> [400+ lines deleted]
>
>> Summary: my reading is that accessing an object that has not
>> been explicitly stored into since its declaration was evaluated
>> is necessarily undefined behavior in C90, but not necessarily
>> undefined behavior in C99 and C11 (and AFAIAA also in C17 and
>> the upcoming C23). My reasoning is given in detail above.
>>
>>
>> Postscript: this commentary has taken much longer to write than
>> I thought it would, for the most part because I made an early
>> decision to be systematic and thorough. I hope the effort has
>> helped the readers gain confidence in the explanations and
>> conclusions stated. I may return to the deferred topic about
>> pointer types but have no plans at present about when that might
>> be.
>
> Thank you for taking the time to write that.

It's nice to be appreciated. Thank you.

> I'd like to offer a brief summary of the points you made. Please let me
> know if my summary is incorrect.

Excellent. I am writing a reaction directly after each item.

> - An "indeterminate value" is by definition either an "unspecified
> value" or a "trap representation".

Yes.

> - In C90 (which did not yet define all these terms), accessing the value
> of an uninitialized object explicitly has undefined behavior.

C90 made "use [...] of indeterminately valued objects" part of the
definition of undefined behavior. To connect the dots we need to
know that "If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate." These two
normative items are combined into one in J.2: "The value of an
uninitialized object that has automatic storage duration is used
before a value is assigned".

> - In C99 and later, J.2 (which is *not* normative) states that using the
> value of an object with automatic storage duration while it is
> indeterminate has undefined behavior. This implies that:
>     int main(void) {
>         int n;
>         n;
>     }
> has undefined behavior, even if int has no trap representations.

For the J.2 summary, yes. I don't think I gave the implied
conclusion, but I agree with you that the J.2 entry does seem to
imply this.

> - Statements in J.2 *should* be supported by normative text.

I don't think I said this at all. At least for now I offer
no opinion on this recommendation.

> - There is no normative text in any post-C90 edition of the C
> standard that supports the claim that reading an uninitialized
> int object actually has undefined behavior if it does not hold
> a trap representation. (Pointers raise other issues, which I'll
> ignore for now.)

Yes, with a very minor correction that it is C99 and later, because
I haven't looked at the editions of the C standard after C90 but
before C99.

> - The cited statement in J.2 is incorrect, or at least imprecise.

I don't think I said this exactly. I did say or at least imply
that the quoted entry in J.2 is not completely accurate. Certainly
it allows conclusions that are not supported by normative text, and
looked at from that point of view it is "wrong".

> I agree with you on all the above points.
>
> There is one point on which I think we disagree. It is a matter
> of opinion, not of fact. You wrote:
>
> Remembering that Annex J is informative rather than normative,
> is this statement right even for a type that has no trap
> representations? To ask that question another way, is this
> statement always right or is it just a (perhaps useful)
> approximation?
>
> The statement in N1570 J.2 is:
>
> The behavior is undefined in the following circumstances:
> [...]
> - The value of an object with automatic storage duration is used
> while it is indeterminate (6.2.4, 6.7.9, 6.8).
>
> I get the impression that you're not particularly bothered by the fact
> that the statement in J.2 is merely an "approximation". In my opinion,
> the statement in J.2 is simply incorrect, and should be fixed. (That's
> unlikely to be possible at this stage of the C23 process.) The fact
> that Annex J is, to quote the standard's foreword, "for information
> only", is not an excuse to ignore factual errors. Readers of the
> standard rely on the informative annexes to provide correct information.
> This particular text is not just a "(perhaps useful) approximation"; it
> is actively misleading.

Like I said before, for now I offer no opinion on this question. I
wouldn't mind if a footnote were added to help mitigate the problem.

> I'm not criticizing the author of the standard for making this mistake.
> Stuff happens. It was likely a result of an oversight during the
> transition from C90 to C99.

After reading the various standards carefully, I believe the wording
in the J.2 entry was not just an oversight. I suspect there is
something deeper going on. In neither case, however, does it prompt
any specific reaction (i.e., in myself) as to what to do about it (if
anything).

Re: Does reading an uninitialized object have undefined behavior?

<86r0o26en6.fsf@linuxsc.com>

https://news.novabbs.org/devel/article-flat.php?id=556&group=comp.std.c#556

From: tr.17687@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Wed, 16 Aug 2023 11:11:41 -0700
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <86r0o26en6.fsf@linuxsc.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com> <87zg3pnuse.fsf@bsb.me.uk> <20230721233227.651@kylheku.com>
 by: Tim Rentsch - Wed, 16 Aug 2023 18:11 UTC

Kaz Kylheku <864-117-4973@kylheku.com> writes:

> On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>
>> 6.3.2.1 p2:
>>
>> "[...] If the lvalue designates an object of automatic storage
>> duration that could have been declared with the register storage class
>> (never had its address taken), and that object is uninitialized (not
>> declared with an initializer and no assignment to it has been
>> performed prior to use), the behavior is undefined."
>>
>> seems to cover it. The restriction on not having its address taken
>> seems odd.
>
> Wording like that looks like someone's solo documentation effort,
> not peer-reviewed by an expert committee.
>
> That looks as if the intent is to allow some diagnoses of uses of
> uninitialized variables, while discouraging others.

That isn't at all what this passage is about.

Re: Does reading an uninitialized object have undefined behavior?

<20230816123212.416@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=557&group=comp.std.c#557

From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Wed, 16 Aug 2023 19:51:40 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 169
Message-ID: <20230816123212.416@kylheku.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<864jlfj34p.fsf@linuxsc.com> <871qgjlqe9.fsf@nosuchdomain.example.com>
 by: Kaz Kylheku - Wed, 16 Aug 2023 19:51 UTC

On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>> Repeating the question stated in the Subject line:
>>
>> Does reading an uninitialized object [always] have undefined
>> behavior?
>>
>> Background: Annex J part 2 says (in various phrasings in
>> different revisions of the C standard, with the one below
>> being taken from C90):
>>
>> The value of an uninitialized object that has automatic
>> storage duration is used before a value is assigned [is
>> undefined behavior] (6.5.7)
>>
>> Remembering that Annex J is informative rather than normative,
>> is this statement right even for a type that has no trap
>> representations? To ask that question another way, is this
>> statement always right or is it just a (perhaps useful)
>> approximation?
> [400+ lines deleted]
>> Summary: my reading is that accessing an object that has not
>> been explicitly stored into since its declaration was evaluated
>> is necessarily undefined behavior in C90, but not necessarily
>> undefined behavior in C99 and C11 (and AFAIAA also in C17 and
>> the upcoming C23). My reasoning is given in detail above.
>>
>>
>> Postscript: this commentary has taken much longer to write than
>> I thought it would, for the most part because I made an early
>> decision to be systematic and thorough. I hope the effort has
>> helped the readers gain confidence in the explanations and
>> conclusions stated. I may return to the deferred topic about
>> pointer types but have no plans at present about when that might
>> be.
>
> Thank you for taking the time to write that.
>
> I'd like to offer a brief summary of the points you made. Please let me
> know if my summary is incorrect.
>
> - An "indeterminate value" is by definition either an "unspecified
> value" or a "trap representation".
>
> - In C90 (which did not yet define all these terms), accessing the value
> of an uninitialized object explicitly has undefined behavior.
>
> - In C99 and later, J.2 (which is *not* normative) states that using the
> value of an object with automatic storage duration while it is
> indeterminate has undefined behavior. This implies that:
>     int main(void) {
>         int n;
>         n;
>     }
> has undefined behavior, even if int has no trap representations.
>
> - Statements in J.2 *should* be supported by normative text.
>
> - There is no normative text in any post-C90 edition of the C
> standard that supports the claim that reading an uninitialized
> int object actually has undefined behavior if it does not hold
> a trap representation. (Pointers raise other issues, which I'll
> ignore for now.)
>
> - The cited statement in J.2 is incorrect, or at least imprecise.
>
> I agree with you on all the above points.
>
> There is one point on which I think we disagree. It is a matter
> of opinion, not of fact. You wrote:
>
> Remembering that Annex J is informative rather than normative,
> is this statement right even for a type that has no trap
> representations? To ask that question another way, is this
> statement always right or is it just a (perhaps useful)
> approximation?
>
> The statement in N1570 J.2 is:
>
> The behavior is undefined in the following circumstances:
> [...]
> - The value of an object with automatic storage duration is used
> while it is indeterminate (6.2.4, 6.7.9, 6.8).
>
> I get the impression that you're not particularly bothered by the fact
> that the statement in J.2 is merely an "approximation". In my opinion,
> the statement in J.2 is simply incorrect, and should be fixed. (That's
> unlikely to be possible at this stage of the C23 process.) The fact
> that Annex J is, to quote the standard's foreword, "for information
> only", is not an excuse to ignore factual errors. Readers of the
> standard rely on the informative annexes to provide correct information.
> This particular text is not just a "(perhaps useful) approximation"; it
> is actively misleading.
>
> I'm not criticizing the author of the standard for making this mistake.
> Stuff happens. It was likely a result of an oversight during the
> transition from C90 to C99.

I would be in favor of a formal model of what "uninitialized" means
which could be summarized as below.

Implementors wishing to develop tooling to catch uses of uninitialized
data can refer to the model; if their tooling diagnoses only
what the model deems undefined, then the tool can be integrated
into a conforming implementation.

- Certain objects are uninitialized, like auto variables without
an initializer, or new bytes coming from malloc or realloc.

- What is undefined behavior is when an uninitialized value is used
to make a control-flow decision, or when it is output, or otherwise
passed to the host environment.

- The formal model defines "uninitialized" in terms of there being,
in the abstract semantics, a "shadow value" corresponding to every
byte of a value, and that shadow value indicates whether the
corresponding byte is initialized or not.

- Shadow values propagate across copies, accesses and calculations.

- No special exception is needed for unsigned, other than that
it doesn't have trap representations.

- This would be undefined:

    {
        int uninited;
        int *p = &uninited;
        int v = * (unsigned char *) p;

        if (v) ... // undefined here

        printf("%d\n", v); // undefined
    }

No special blessing is required for unsigned char to access
the object. The resulting value keeps carrying the shadow byte
which indicates that it is uninitialized, and so when it is output,
or used for a control flow decision, the behavior is undefined.

memcpy can be written without outputting the bytes being copied,
and without allowing their values to control flow.

If a structure is copied with memcpy, and has uninitialized padding,
the shadow value model says that the destination object now
has uninitialized padding.

- When a value is obtained by accessing an object which has one
or more uninitialized bytes, the corresponding bytes of the
value are uninitialized.

- When a calculation has any operands that have one or more
uninitialized bytes, all bytes of the resulting value
are uninitialized.

E.g. if there is an int *p, which is used to access a value *p,
where the low-order byte is initialized, then the low order
byte of *p is initialized; the other bytes are uninitialized.
But in the value *p + 0, the entire value is uninitialized.
Implementations following the model don't have to track individual
bits or bytes through calculations. This could apply to type
conversions, e.g. if *p is of type unsigned char, and
refers to an uninitialized byte, then the entire promoted
int (or possibly unsigned int) value is uninitialized:
all four bytes (or however many) of it.
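
To make the proposal above concrete, here is a minimal simulation of the
shadow-value idea in C. It is only a sketch under stated assumptions: the
type tracked_int and the functions tracked_uninit, tracked_from,
tracked_store_byte, tracked_use_ok and tracked_add are hypothetical names
invented for illustration, not part of any standard or tool. Shadow flags
are kept per byte, a plain struct copy carries them along unchanged, a
calculation marks the whole result uninitialized if any operand byte is,
and only the check standing in for a control-flow decision or output
reports a problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tracked int: one shadow flag per value byte
   (true = initialized). All names here are illustrative. */
typedef struct {
    unsigned char bytes[sizeof(int)];
    bool init[sizeof(int)];
} tracked_int;

/* Models an automatic object declared without an initializer. */
tracked_int tracked_uninit(void)
{
    tracked_int t;
    memset(t.bytes, 0, sizeof t.bytes); /* placeholder contents for the simulation */
    for (size_t i = 0; i < sizeof(int); i++)
        t.init[i] = false;
    return t;
}

/* Models a fully initialized value. */
tracked_int tracked_from(int v)
{
    tracked_int t;
    memcpy(t.bytes, &v, sizeof v);
    for (size_t i = 0; i < sizeof(int); i++)
        t.init[i] = true;
    return t;
}

/* Stores one byte; only that byte's shadow flag becomes "initialized". */
void tracked_store_byte(tracked_int *t, size_t i, unsigned char b)
{
    assert(i < sizeof(int));
    t->bytes[i] = b;
    t->init[i] = true;
}

/* Stand-in for "used in a control-flow decision or output":
   allowed by the model only if every byte is initialized. */
bool tracked_use_ok(const tracked_int *t)
{
    for (size_t i = 0; i < sizeof(int); i++)
        if (!t->init[i])
            return false;
    return true;
}

/* Calculation: if any operand byte is uninitialized, every byte
   of the result is uninitialized. */
tracked_int tracked_add(const tracked_int *a, const tracked_int *b)
{
    unsigned x, y, sum;
    memcpy(&x, a->bytes, sizeof x);
    memcpy(&y, b->bytes, sizeof y);
    sum = x + y; /* unsigned arithmetic, so the simulator itself cannot overflow */

    tracked_int r;
    memcpy(r.bytes, &sum, sizeof sum);
    bool ok = tracked_use_ok(a) && tracked_use_ok(b);
    for (size_t i = 0; i < sizeof(int); i++)
        r.init[i] = ok;
    return r;
}
```

For instance, after tracked_store_byte sets only byte 0 of an otherwise
uninitialized tracked_int, adding tracked_from(0) to it yields a result
whose bytes are all flagged uninitialized (assuming sizeof(int) > 1),
which mirrors the *p + 0 case; tracked_use_ok then returns false for it.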

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: Does reading an uninitialized object have undefined behavior?

<ubja3a$3e365$1@dont-email.me>

https://news.novabbs.org/devel/article-flat.php?id=558&group=comp.std.c#558

From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Supersedes: <20230816123212.416@kylheku.com>
Date: Wed, 16 Aug 2023 20:03:54 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 86
Message-ID: <ubja3a$3e365$1@dont-email.me>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<864jlfj34p.fsf@linuxsc.com> <871qgjlqe9.fsf@nosuchdomain.example.com>
 by: Kaz Kylheku - Wed, 16 Aug 2023 20:03 UTC

On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>> Repeating the question stated in the Subject line:
>>
>> Does reading an uninitialized object [always] have undefined
>> behavior?
>
> Thank you for taking the time to write that.
[ ... ]
> I'm not criticizing the author of the standard for making this mistake.
> Stuff happens. It was likely a result of an oversight during the
> transition from C90 to C99.

[Supersede attempt to reduce quoted material.]


Re: Does reading an uninitialized object have undefined behavior?

<87350ilnv1.fsf@nosuchdomain.example.com>

https://news.novabbs.org/devel/article-flat.php?id=559&group=comp.std.c#559

From: Keith.S.Thompson+u@gmail.com (Keith Thompson)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Wed, 16 Aug 2023 13:43:30 -0700
Organization: None to speak of
Lines: 48
Message-ID: <87350ilnv1.fsf@nosuchdomain.example.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<864jlfj34p.fsf@linuxsc.com> <871qgjlqe9.fsf@nosuchdomain.example.com>
<ubja3a$3e365$1@dont-email.me>
 by: Keith Thompson - Wed, 16 Aug 2023 20:43 UTC

Kaz Kylheku <864-117-4973@kylheku.com> writes:
> On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>>> Repeating the question stated in the Subject line:
>>>
>>> Does reading an uninitialized object [always] have undefined
>>> behavior?
>>
>> Thank you for taking the time to write that.
> [ ... ]
>> I'm not criticizing the author of the standard for making this mistake.
>> Stuff happens. It was likely a result of an oversight during the
>> transition from C90 to C99.
>
> [Supersede attempt to reduce quoted material.]
>
> I would be in favor of a formal model of what "uninitialized" means
> which could be summarized as below.
>
> Implementors wishing to develop tooling to catch uses of uninitialized
> data can refer to the model; if their tooling diagnoses only
> what the model deems undefined, then the tool can be integrated
> into a conforming implementation.
>
> - Certain objects are uninitialized, like auto variables without
> an initializer, or new bytes coming from malloc or realloc.
>
> - What is undefined behavior is when an uninitialized value is used
> to make a control-flow decision, or when it is output, or otherwise
> passed to the host environment.

Why restrict it to those particular uses, rather than saying that any
attempt to read an uninitialized value has undefined behavior?

For example, something like:
    {
        int uninit;
        int copy = uninit + 1;
    }
might cause a hardware trap on some systems (for example Itanium if
uninit is stored in a register and the NaT bit is set).

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: Does reading an uninitialized object have undefined behavior?

<20230816134842.416@kylheku.com>

https://news.novabbs.org/devel/article-flat.php?id=560&group=comp.std.c#560

From: 864-117-4973@kylheku.com (Kaz Kylheku)
Newsgroups: comp.std.c
Subject: Re: Does reading an uninitialized object have undefined behavior?
Date: Wed, 16 Aug 2023 21:08:19 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 107
Message-ID: <20230816134842.416@kylheku.com>
References: <87zg3pq1ym.fsf@nosuchdomain.example.com>
<864jlfj34p.fsf@linuxsc.com> <871qgjlqe9.fsf@nosuchdomain.example.com>
<ubja3a$3e365$1@dont-email.me> <87350ilnv1.fsf@nosuchdomain.example.com>
 by: Kaz Kylheku - Wed, 16 Aug 2023 21:08 UTC

On 2023-08-16, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <864-117-4973@kylheku.com> writes:
>> On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>>>> Repeating the question stated in the Subject line:
>>>>
>>>> Does reading an uninitialized object [always] have undefined
>>>> behavior?
>>>
>>> Thank you for taking the time to write that.
>> [ ... ]
>>> I'm not criticizing the author of the standard for making this mistake.
>>> Stuff happens. It was likely a result of an oversight during the
>>> transition from C90 to C99.
>>
>> [Supersede attempt to reduce quoted material.]
>>
>> I would be in favor of a formal model of what "uninitialized" means
>> which could be summarized as below.
>>
>> Implementors wishing to develop tooling to catch uses of uninitialized
>> data can refer to the model; if their tooling diagnoses only
>> what the model deems undefined, then the tool can be integrated
>> into a conforming implementation.
>>
>> - Certain objects are uninitialized, like auto variables without
>> an initializer, or new bytes coming from malloc or realloc.
>>
>> - What is undefined behavior is when an uninitialized value is used
>> to make a control-flow decision, or when it is output, or otherwise
>> passed to the host environment.
>
> Why restrict it to those particular uses, rather than saying that any
> attempt to read an uninitialized value has undefined behavior?

Because that then brings back complications like

- unsigned char access has to be exempt

- what happens if we copy through intermediate values:

    int ch = *src++; // *src is uninitialized, therefore so is ch
    *dst++ = ch; // ch is uninitialized and not unsigned char

  Is the second access to ch uninitialized?

- structures: when a struct that has uninitialized padding is
accessed, what happens? We need a rule such as: if those bytes
are accessed, they are accessed as if by unsigned char.

The idea of trapping only control flow decisions or output is inspired
by Valgrind.

Valgrind does not "spaz out" just because an uninitialized value is
accessed, because it would result in useless false positives.

Not all of the reasoning applies to C; part of it is that Valgrind is
working with machine code, with no source language knowledge. The basic
idea makes sense though.

Valgrind usefully finds uninitialized data bugs, while allowing you to
write your own memcpy which can copy a structure full of uninitialized
bytes: and it does so without knowing anything about unsigned char.
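
A hand-written copy routine of the kind described here might look like
the sketch below. The name copy_bytes is hypothetical. The loop condition
depends only on the byte count, and the copied values are never tested or
output, so under the proposed model nothing in it would be diagnosed even
when the source holds uninitialized bytes; the int intermediate mirrors
the ch fragment above. Whether a particular tool stays silent on this
exact code is an assumption, not something verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memcpy-like routine: copies n bytes from src to dst. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0) {             /* loop control depends only on n */
        int ch = *s++;            /* intermediate value is an int, not unsigned char */
        *d++ = (unsigned char)ch; /* value is stored, never branched on or output */
    }
    return dst;
}
```

Used on a struct, copy_bytes(&b, &a, sizeof a) copies the members and any
padding alike; in the shadow-value model the padding of b would then carry
the same initialized or uninitialized status as the padding of a.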

We could make the rule that only visible behavior depending on
an uninitialized byte is undefined; the rule about control flows
makes it a bit tighter, while allowing the copying of uninited
data.

> For example, something like:
> {
> int uninit;
> int copy = uninit + 1;
> }
> might cause a hardware trap on some systems (for example Itanium if
> uninit is stored in a register and the NaT bit is set).

Right, so the model above doesn't speak to traps. We still have those.

You can copy an object using unsigned char not because it's specially
blessed for access (other than in regard to aliasing rules), but because
it has no trap representation.

On a machine without traps, the above code would just result
in copy being uninitialized.

If that value isn't printed, or used in if, or switch, then it
doesn't matter.

If the type int has trap representations, then it's undefined on that
implementation; it's basically just a matter of luck whether uninit is a
trap or a value, so it has to be regarded as undefined.

I believe that the model can be used to implement useful diagnostics
even without realizing the actual shadow bytes. A subset of the
bugs can be diagnosed within a lexical scope, like uses of
uninitialized auto locals. When the compiler is doing data flow
analysis, it just propagates that uninited info around the program
graph. If an uninited data flow reaches certain nodes in the program
graph, like where control decisions are made or certain functions
are called that are known to pass the datum to the host environment,
then it can diagnose.
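
As a rough illustration of that last paragraph, the sketch below
propagates an "uninited" flag through a straight-line instruction
sequence and reports the first branch or output that consumes a flagged
variable. The tiny instruction set (op_kind, insn), the limit MAX_VARS
and the function first_uninit_use are all hypothetical and invented for
this example. A real compiler would run this over a full control-flow
graph with joins, which is not attempted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_VARS = 8 }; /* hypothetical limit on tracked locals */

typedef enum { OP_INIT, OP_COPY, OP_CALC, OP_BRANCH, OP_OUTPUT } op_kind;

typedef struct {
    op_kind kind;
    int dst;  /* variable written by OP_INIT, OP_COPY, OP_CALC; else -1 */
    int src1; /* variable read; -1 for OP_INIT */
    int src2; /* second operand of OP_CALC; else -1 */
} insn;

static bool valid_var(int v)
{
    return v >= 0 && v < MAX_VARS;
}

/* Returns the index of the first instruction that branches on or
   outputs a possibly uninitialized variable, or -1 if there is none.
   Copies and calculations only propagate the flag; they are not
   diagnosed themselves. */
int first_uninit_use(const insn *code, size_t n)
{
    bool uninit[MAX_VARS];
    for (int v = 0; v < MAX_VARS; v++)
        uninit[v] = true; /* every local starts out uninitialized */

    for (size_t i = 0; i < n; i++) {
        const insn *in = &code[i];
        switch (in->kind) {
        case OP_INIT:
            assert(valid_var(in->dst));
            uninit[in->dst] = false;
            break;
        case OP_COPY:
            assert(valid_var(in->dst) && valid_var(in->src1));
            uninit[in->dst] = uninit[in->src1];
            break;
        case OP_CALC:
            assert(valid_var(in->dst) && valid_var(in->src1) && valid_var(in->src2));
            uninit[in->dst] = uninit[in->src1] || uninit[in->src2];
            break;
        case OP_BRANCH:
        case OP_OUTPUT:
            assert(valid_var(in->src1));
            if (uninit[in->src1])
                return (int)i;
            break;
        }
    }
    return -1;
}
```

With this encoding, the sequence "v1 = v0; v2 = v1 + v1; branch on v2"
with v0 never initialized is reported at the branch, while a sequence
that only copies v0 around is not reported at all.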

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
