Feature Request: introduce named constants as const-qualified objects with register storage class

Problem statement

Data types that we declare in C often have some common values that should be shared by all users of that type. The most prominent in C is the all-zero-initialized value, which we can always enforce through a default initializer of a static object; others need detailed initializers:

typedef struct listElem listElem;
struct listElem { unsigned val; listElem* next; };
static listElem const singleton = { 0 };
static listElem const singleOne = { .val = 1 };

There are two ways to deal with such const-qualified objects in file scope. If they are implemented as above with static storage class, they may result in one object per compilation unit, and the compiler might throw spooky warnings at us if we define them in a header file but don't use them at all.

Another way is to declare them extern in a header file

extern listElem const singleton;

and to define them in one of the compilation units:

listElem const singleton = { 0 };

This second method has the big disadvantage that the value of the object is not available in the other units that include the declaration. Therefore we may miss some opportunities for the compiler to optimize our code. For example an initialization function for our structure type may be written as

inline
listElem* listElem_init(listElem* el) {
if (el) *el = singleton;
return el;
}

If the value of singleton is known, the assignment could be realized without loading it, using only immediates. And listElem_init itself could then be inlined much more easily where it is called.
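As a minimal sketch of the situation described above, the following unit makes the value of singleton visible next to the initialization function, here via the static variant from the beginning of this section. The helper init_demo is hypothetical and only exercises the function; static inline is used instead of plain inline merely to keep the sketch in a single file.

```c
#include <assert.h>
#include <stddef.h>

typedef struct listElem listElem;
struct listElem { unsigned val; listElem* next; };

/* The value is visible in every unit that sees this definition. */
static listElem const singleton = { 0 };

/* static inline keeps the sketch in one file; the text uses plain inline. */
static inline listElem* listElem_init(listElem* el) {
  if (el) *el = singleton;   /* a compiler may emit two stores of zero here */
  return el;
}

/* Hypothetical helper, only here to exercise listElem_init. */
static unsigned init_demo(void) {
  listElem other = { 0 };
  listElem el = { .val = 42, .next = &other };
  listElem_init(&el);
  assert(el.next == NULL);
  return el.val;
}
```

With the extern variant, the same function would have to load the value from an object defined in another unit.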

C currently has no real support for global constants for arbitrary data types, not even for all of the standard arithmetic types: all integer constants are at least as large as int. For scalar types a cast can be used to produce a value expression:

#define myVeritableFalse ((_Bool)+0)
#define myVeritableTrue ((_Bool)+1)
#define HELLO ((char const*const)"hello")

Or some predefined macro for the special case of complex types:

#define ORIGIN CMPLXF(0, 0)

The only named constants that can serve as genuine constant expressions as C understands it are of type int and are declared through a declaration of an enumeration:

enum { uchar_upper = ((unsigned)UCHAR_MAX) + 1, };

All this, to just define a constant; we thereby define a new type (the unnamed enumeration type) and define a constant that isn't even of that type, but an int.
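The types involved can be checked with a C11 generic selection. This is only a sketch: TYPE_TAG and constant_type_demo are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <limits.h>

#define myVeritableTrue ((_Bool)+1)
enum { uchar_upper = ((unsigned)UCHAR_MAX) + 1, };

/* Hypothetical helper: maps the type of an expression to a small tag. */
#define TYPE_TAG(X) \
  _Generic((X), _Bool: 'b', int: 'i', unsigned: 'u', default: '?')

static void constant_type_demo(void) {
  assert(TYPE_TAG(myVeritableTrue) == 'b');  /* the cast yields a _Bool */
  assert(TYPE_TAG(uchar_upper) == 'i');      /* enumeration constant: int */
  assert(TYPE_TAG(UCHAR_MAX + 1u) == 'u');   /* the unsigned type is lost above */
  assert(TYPE_TAG('a') == 'i');              /* character constants are int, too */
}
```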

For composite types, in particular structure types, the only way to fabricate a value expression that is not an assignable lvalue and whose address can't be taken is to first define an object of the desired type and to feed it into an expression such that the result is an rvalue. The only operator that can return a composite type as a whole is the assignment operator:

#define SINGLETON ((listElem){ .val = 0, .next = 0, } = (listElem const){ .val = 0, .next = 0, })

Such an expression can be hard to read for a human and for a debugger; we have to use two compound literals to create a simple rvalue, one that may be const-qualified where the other can't. The biggest disadvantage is that such an expression is not suitable as an initializer for objects with static storage duration. We'd have to define two different macros, one for the initializer and one for the constant expression:

#define SINGLETON_INIT { .val = 0, .next = 0, }
#define SINGLETON_MOD_LVALUE (listElem){ .val = 0, .next = 0, }
#define SINGLETON_CONST_LVALUE (listElem const){ .val = 0, .next = 0, }
#define SINGLETON (SINGLETON_MOD_LVALUE = SINGLETON_CONST_LVALUE)
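A short sketch of how these macros would be used, assuming the member designators are spelled .val and .next. The identifiers fileScopeElem and rvalue_demo are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct listElem listElem;
struct listElem { unsigned val; listElem* next; };

#define SINGLETON_INIT         { .val = 0, .next = 0, }
#define SINGLETON_MOD_LVALUE   (listElem)SINGLETON_INIT
#define SINGLETON_CONST_LVALUE (listElem const)SINGLETON_INIT
#define SINGLETON (SINGLETON_MOD_LVALUE = SINGLETON_CONST_LVALUE)

/* Static storage duration: only the brace initializer is usable here. */
static listElem const fileScopeElem = SINGLETON_INIT;

static unsigned rvalue_demo(void) {
  listElem el = { .val = 7 };
  el = SINGLETON;              /* the assignment expression is an rvalue */
  assert(el.next == NULL);
  /* Both of these would be constraint violations:
       &SINGLETON
       SINGLETON = el           */
  return el.val + SINGLETON.val + fileScopeElem.val;
}
```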

In block scope, on the other hand, there is a construct that can be used to declare immutable values of which no address can be taken: const-qualified objects with register storage class. All the above could be given in block scope with functionally equivalent definitions:

register const listElem singleton = { 0 };
register const listElem singleOne = { .val = 1 };
register const _Bool myVeritableFalse = 0;
register const _Bool myVeritableTrue = 1;
register const char *const HELLO = "hello";
register const float _Complex ORIGIN = CMPLXF(0, 0);
register const int uchar_upper = (unsigned)UCHAR_MAX + 1;
register const listElem SINGLETON = { .val = 0, .next = 0, };
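The following sketch shows some of these definitions inside a function, which is valid in current C. The function name register_demo is hypothetical; the forbidden operations appear only in a comment because a conforming compiler has to diagnose them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct listElem listElem;
struct listElem { unsigned val; listElem* next; };

static unsigned register_demo(void) {
  register const listElem singleOne = { .val = 1 };
  register const _Bool myVeritableTrue = 1;
  register const char *const HELLO = "hello";

  listElem el = singleOne;               /* reading the value is fine */
  assert(el.next == NULL);
  assert(HELLO[myVeritableTrue] == 'e');

  /* Each of the following would be a constraint violation:
       singleOne.val = 2;                (not a modifiable lvalue)
       listElem const* p = &singleOne;   (address of a register object) */
  return el.val;
}
```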

The idea of this proposal is that there is no apparent reason that these register definitions couldn't be allowed in file scope.

Proposed modification

The aim of this proposal is to introduce named constants, that is, values that are referred to through an identifier, by means of const-qualified objects with register storage class. Since this construct already exists in block scope, only two features must be introduced to make this concept suitable for the intended use:

  1. Allow the definition of const-qualified objects with register storage class in file scope.
  2. Allow the usage of const-qualified objects with register storage class in constant expressions.

Allow the usage of const-qualified objects with register storage class in constant expressions.

We have to add a new item to the list of valid constants in 6.4.4 (and in the appendix)

constant:
integer-constant
floating-constant
enumeration-constant
character-constant
named-constant

Then add a new section 6.4.4.5:

6.4.4.5 Named constants
Syntax
1 named-constant: identifier
Semantics
2 An identifier of register storage class and of a type that is const qualified with no other qualification, that is not a VM type, and such that its initializer is built only with constant expressions is a named constant.

It would also be appropriate to add an explanatory footnote, here.

XX) A named constant is such that its value and size are entirely determined at compile time. Because it must be initialized it cannot be a VLA. The additional constraint of not being a VM type is necessary to ensure that all uses in constant expressions will indeed be constant.

Add a paragraph (between the current p5 and p6) that explains how named constants can be used in subexpressions.

A named constant expression shall be a named constant optionally followed by a designator list.

In the examples above, the following are valid named constant expressions:

singleton
singleOne.val
HELLO[myVeritableTrue]

The first is of type struct listElem and has the default initialized value for that type; the second is of type unsigned int and value 1; the third is of type char const and value 'e'.

Then we have to add named constants to be valid integer constant expressions in 6.6. Add to p6 before "integer constants,":

named constant expressions of integer type,

and to p7 add as a first list item:

- a named constant expression

Optionally, we could also add suitable compound literals to that list; gcc already has an extension that allows for this.

- a compound literal that is const qualified with no other qualification and such that its initializer is built only with constant expressions, optionally followed by a designator list.

Add to p8 before "integer constants,":

named constant expressions of arithmetic type,

Modify p9 at the end of the first sentence:

... or implicitly by the use of an expression of array or function type, or by the value of a named constant expression.

Allow the definition of const-qualified objects with register storage class in file scope.

In 6.9 replace p2:

The storage-class specifier auto shall not appear in the declaration specifier in an external declaration. If the storage-class specifier register appears in the declaration specifier of an external declaration it shall be the definition of a named constant.

In 6.3.2.1p3 add to the list of exceptional cases

or it occurs inside a named constant expression

and add named constant expressions to the "Forward references" at the end:

named constant expression (6.4.4.5)

In section 6.7.1 there is a footnote that explains the potential implementation and use of register objects.

121) The implementation may simply treat any register declaration in block scope as an auto declaration and in file scope as a static declaration.
... ...
Thus, the only operators that can be applied to an array declared with storage-class specifier register are sizeof, _Alignof or array designators if the array is obtained as the value of a named constant expression (6.4.4.5).

Optionally, in section 6.7.6.2 "Array declarators" p2 emphasize that register storage class in file scope implies that VLA are not allowed. Add a footnote for that:

xx) Thus a file scope named constant can not have a variable length array type, because it has static storage duration. It also can't be of another VM type, because in file scope there exists no constant identifier of VM type that could be used for its initialization.

Discussion

Validity

Clearly the above modifications do not invalidate any valid program under the current standard. They only assure that some formally invalid programs become valid.

For register objects in block scope, the only semantic change would be that some of these objects would now be considered to be constant expressions, and thus be allowed in some contexts where they weren't before. In particular, some array bounds of block scope arrays might now become constant, and thus an array that previously was a VLA might now be an ordinary array. Such a shift only enables more flexibility (by allowing initializers) and in some cases extends the lifetime of an array.
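A minimal sketch of such an array bound under the current rules. The function name bound_demo is hypothetical, and the guard on __STDC_NO_VLA__ only covers implementations that do not provide VLAs.

```c
#include <assert.h>
#include <stddef.h>

static size_t bound_demo(void) {
  register const int n = 4;
#ifndef __STDC_NO_VLA__
  /* Current C: n is not a constant expression, so a is a VLA and must not
     have an initializer. Under the proposal a would be an ordinary array
     and "int a[n] = { 0 };" would become valid. */
  int a[n];
  for (int i = 0; i < n; ++i) a[i] = i;
  assert(a[n - 1] == n - 1);
  return sizeof a / sizeof a[0];
#else
  return (size_t)n;
#endif
}
```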

Register objects in file scope were not permitted before. So here there will be no change for existing conforming programs.

The impact on the allowed grammar is minimal. The only new case is the definition of an identifier in file scope, where now also storage class specifier register may occur, but this only if the type is const qualified in addition.

For the semantics of constant expressions, this adds named constants followed by designators to the possibilities.

For the semantics of integer constant expressions, this adds expressions that contain named constants followed by designators that correspond to fields of integer type.

For the semantics of arithmetic constant expressions, this likewise adds expressions that contain named constants followed by designators that correspond to fields of arithmetic type.

This doesn't add new types of address constants since the address of register objects can't be taken.

New file scope objects

The main new feature that is added by this proposal is a restricted class of register objects in file scope, namely those that have a const-qualified type. Since not much of the rest of the description of file scope objects is changed, such const-qualified objects would have the following properties:

  1. They have "external" linkage.
  2. They have static storage duration.
  3. Taking the address of such an object would be a constraint violation.
  4. Such an object is not a modifiable lvalue.
  5. Since the proposal requires that all declarations are also definitions, there can be exactly one such definition.
  6. If the definition has no initializer the object is default initialized just as other objects of static storage duration.
  7. If there is an initializer, it must only contain constant expressions as partial initializer expressions and for designators.

Modifying an object that is defined with a const-qualified type is always undefined behavior, but if the object is not register, such a modification could slip in unnoticed when the program is executed. With this proposal we would have an even stronger property, namely that a modification could only happen through a constraint violation (and thus require a compile-time diagnostic):

Because of 4. it can never be on the left side of an assignment operator or the operand of an increment operator. A cast never leads to an lvalue. Property 3. ensures that the lvalue can never be accessed through a different type.

Property 1. is desirable to oblige users of this feature to use the identifier that corresponds to a named constant consistently. It doesn't oblige an implementation to provide a symbol in the sense of the linker program of the platform, but it provides the possibility to do so.

Possible realizations

The C++ way

C++ already went part of the path that is proposed here. It allows definitions of const-qualified integer typed objects in file scope, if the initializer is a constant integer expression. This is done in such a way that no linkage conflicts occur and that these "constants" in turn can occur in constant integer expressions.

This model only half fits what we want to achieve here. First, C++ also provides these objects as external symbols and enables the programmer to take their address; C++ often needs references to values. Second, this is restricted to integer typed constants, which is much less than is desirable.

Treat named constants similar to objects with internal linkage

As mentioned in the modified footnote 121) the simplest possible realization of named constants in file scope is to treat them the same as static const qualified objects. If the compiler is not able to optimize all uses of that object into immediates, this realization may have the disadvantage that each compilation unit gets its own copy of this object.

Such a realization would never lead to unspecified behavior. The only unspecified behavior that several realizations as objects with internal linkage would imply is that the addresses of these objects would be different. Since the address of such an object can't be taken, the property of having the same address (or not) would not be observable.

A test macro in P99, P99_CONSTANT, which just declares a static const-qualified object and ensures that the compiler does not issue useless warnings about an unused identifier, shows good results: with optimization enabled, gcc and clang are both able to avoid the allocation of the object and use only its value where appropriate. Obviously, such a replacement macro cannot test whether the address of such an object is taken, but hopefully the implementation of this feature for block scope register objects could easily be reused.
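The emulation described above can be sketched as follows. NAMED_CONSTANT and NC_UNUSED are hypothetical stand-ins and not the actual definition of P99_CONSTANT; the sketch assumes a GCC or Clang style attribute to silence the warning about unused identifiers and falls back to nothing elsewhere.

```c
#include <assert.h>
#include <stddef.h>

typedef struct listElem listElem;
struct listElem { unsigned val; listElem* next; };

/* Assumption: GCC/Clang attribute syntax; empty on other compilers. */
#if defined(__GNUC__)
# define NC_UNUSED __attribute__((__unused__))
#else
# define NC_UNUSED
#endif

/* Hypothetical stand-in for such a macro, not the P99 definition. */
#define NAMED_CONSTANT(T, NAME, ...) \
  NC_UNUSED static T const NAME = __VA_ARGS__

NAMED_CONSTANT(listElem, singleOne, { .val = 1, .next = 0 });
NAMED_CONSTANT(unsigned, neverUsed, 7u);  /* unused, should not warn */

static unsigned emulation_demo(void) {
  listElem el = singleOne;   /* with optimization, typically just immediates */
  assert(el.next == NULL);
  return el.val;
}
```

Unlike a real named constant, nothing here prevents &singleOne from compiling.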

Treat named constants similar to objects with "weak" linkage

Another possibility for a realization is to use a common extension that introduces "weak" linkage, i.e. a linkage property that allows for multiple definitions of an object in different compilation units. If several such objects occur in different units, the linker then asserts that all are of the same size and arbitrarily chooses one of these for the final program.

This technique is e.g. already used by some compiler implementors to realize inline functions. gcc and clang also handle this approach well. But in contrast to the previous approach they always generate an object, so a final program will always contain exactly one copy of each named constant.
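A sketch of this realization, assuming the GCC/Clang weak attribute on a platform whose object format supports it. NC_WEAK and weak_demo are hypothetical names. Where the attribute is unavailable, the macro expands to nothing and the definition is an ordinary external one that may appear in only one unit.

```c
#include <assert.h>
#include <stddef.h>

typedef struct listElem listElem;
struct listElem { unsigned val; listElem* next; };

/* Assumption: weak definitions as offered by GCC and Clang. */
#if defined(__GNUC__) && !defined(_WIN32)
# define NC_WEAK __attribute__((__weak__))
#else
# define NC_WEAK
#endif

/* Would sit in a header: every including unit emits a weak definition
   and the linker keeps one of them. */
NC_WEAK listElem const singleton = { 0 };

static unsigned weak_demo(void) {
  listElem el = { .val = 3 };
  el = singleton;
  assert(el.next == NULL);
  return el.val;
}
```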

Author: Jens Gustedt, INRIA, 2012