
<!--===- docs/RuntimeTypeInfo.md 
  
   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  
-->

# The derived type runtime information table

```{contents}
---
local:
---
```

## Overview

Many operations on derived types must be implemented, or can be
implemented, with calls to the runtime support library rather than
directly with generated code.
Some operations might be initially implemented in the runtime library
and then reimplemented later in generated code for compelling
performance gains in optimized compilations.

The runtime library uses *derived type description* tables to represent
the relevant characteristics of derived types.
This note summarizes the requirements for these descriptions.

The semantics phase of the F18 frontend constructs derived type
descriptions from its scoped symbol table after name resolution
and semantic constraint checking have succeeded.
The lowering phase then transfers the tables to the static
read-only data section of the generated program by translating them into
initialized objects.
During execution, references to the tables occur by passing their addresses
as arguments to relevant runtime library APIs and as pointers in
the addenda of descriptors.

## Requirements

The following Fortran language features require, or may require, the use of
derived type descriptions in the runtime library.

### Components

The components of a derived type need to be described in component
order (7.4.7), but when there is a parent component, its components
can be described by reference to the description of the type of the
parent component.

The ordered component descriptions are needed to implement:
* default initialization
* `ALLOCATE`, with and without `SOURCE=`
* intrinsic assignment of derived types with `ALLOCATABLE` and
  automatic components
* intrinsic I/O of derived type instances
* `NAMELIST` I/O of derived type instances
* "same type" tests

The characteristics of data components include their names, types,
offsets, bounds, cobounds, derived type descriptions when appropriate,
default component initializers, and flags for `ALLOCATABLE`, `POINTER`,
`PRIVATE`, and automatic components (implicit allocatables).
Procedure pointer components require only their offsets and address(es).
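
As a rough illustration of the component information listed above, here is a minimal C++ sketch.
All of the names in it (`ComponentSketch`, `DerivedTypeSketch`, `CollectInComponentOrder`) are hypothetical and are not taken from the runtime library; bounds, cobounds, and initializers are omitted for brevity.
It shows one way an extended type could describe only its own components and reach the inherited ones through the description of its parent type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the runtime library's layout.
struct DerivedTypeSketch;

enum ComponentFlag : std::uint8_t {
  kAllocatable = 1u << 0,
  kPointer = 1u << 1,
  kPrivate = 1u << 2,
  kAutomatic = 1u << 3, // implicit allocatable
};

struct ComponentSketch {
  std::string name;
  std::size_t offset;               // byte offset within an instance
  int rank;                         // 0 for scalar components
  std::uint8_t flags;               // bitwise OR of ComponentFlag values
  const DerivedTypeSketch *derived; // null for components of intrinsic type
};

struct DerivedTypeSketch {
  std::string name;
  const DerivedTypeSketch *parent;         // null without a parent component
  std::vector<ComponentSketch> components; // own components, declaration order
};

// Appends the components of `type` in component order: the parent type's
// components first (found via the parent's description), then its own.
inline void CollectInComponentOrder(const DerivedTypeSketch &type,
    std::vector<const ComponentSketch *> &out) {
  assert(type.parent != &type && "a type cannot extend itself");
  if (type.parent) {
    CollectInComponentOrder(*type.parent, out);
  }
  for (const ComponentSketch &component : type.components) {
    out.push_back(&component);
  }
}
```

Operations such as default initialization or intrinsic I/O could then walk the collected list from front to back.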

### Calls to type-bound procedures

Only extensible derived types -- those without `SEQUENCE` or `BIND(C)`
-- are allowed to have type-bound procedures.
Calls to these bindings will be resolved at compilation time when
the binding is `NON_OVERRIDABLE` or when an object is not polymorphic.
Calls to overridable bindings of polymorphic objects require the
use of a runtime table of procedure addresses.

Each derived type (or instantiation of a parameterized derived type)
will have a complete type-bound procedure table in which all of the
bindings of its ancestor types appear first.
(Specifically, the table offsets of any inherited bindings must be
the same as they are in the ancestral type's table.)
These ancestral bindings reflect their overrides, if any.

The non-inherited bindings of a type then follow the inherited
bindings, and they do so in alphabetical order of binding name.
(This is an arbitrary choice -- binding declaration order would
work as well -- but a consistent
ordering should be used so that relocatables generated by distinct
versions of the F18 compiler will have a better chance to interoperate.)
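
The layout rules above could be sketched in C++ as follows.
This is only an illustration: `BindingSketch`, `ProcAddress`, and `BuildBindingTable` are hypothetical names, binding names are assumed to have been lower-cased already, and the `int (*)()` signature is a stand-in for a real procedure address.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a procedure address (assumption made for this sketch).
using ProcAddress = int (*)();

struct BindingSketch {
  std::string name; // assumed to be lower-cased already
  ProcAddress proc;
};

// Builds the complete binding table of a type from its parent's table and
// the bindings declared in the type itself.
inline std::vector<BindingSketch> BuildBindingTable(
    const std::vector<BindingSketch> &parentTable,
    const std::vector<BindingSketch> &declared) {
  // Inherited bindings come first and keep their parent-table offsets.
  std::vector<BindingSketch> table(parentTable);
  std::vector<BindingSketch> added;
  for (const BindingSketch &binding : declared) {
    assert(!binding.name.empty());
    auto slot = std::find_if(table.begin(), table.end(),
        [&](const BindingSketch &inherited) {
          return inherited.name == binding.name;
        });
    if (slot != table.end()) {
      slot->proc = binding.proc; // an override reuses the inherited slot
    } else {
      added.push_back(binding);
    }
  }
  // New bindings follow the inherited ones in alphabetical order.
  std::sort(added.begin(), added.end(),
      [](const BindingSketch &x, const BindingSketch &y) {
        return x.name < y.name;
      });
  table.insert(table.end(), added.begin(), added.end());
  return table;
}
```

Because an override only replaces the address in an existing slot, code compiled against the parent type can index the table of any extension with the same offsets.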

### Type parameter values and "same type" testing

The values of the `KIND` and `LEN` parameters of a particular derived type
instance can be obtained to implement type parameter inquiries without
requiring derived type information tables.
In the case of a `KIND` type parameter, it's a constant value known at
compilation time, and in the case of a `LEN` type parameter, it's a
member of the addendum to the object's descriptor.

The runtime library will have an API (TBD) to be called as
part of the implementation of `TYPE IS` and `CLASS IS` guards
of the `SELECT TYPE` construct.
This language support predicate returns a true result when
an object's type matches a particular type specification and
`KIND` (but not `LEN`) type parameter values.

Note that this "is same type as" predicate is *not* the same as
the one to be called to implement the `SAME_TYPE_AS()` intrinsic function,
which is specified so as to *ignore* the values of `KIND` type
parameters.

Subclause 7.5.2 defines what being the "same" derived type means
in Fortran.
In short, each definition of a derived type defines a distinct type,
so type equality testing can usually compare addresses of derived
type descriptions at runtime.
The exceptions are `SEQUENCE` types and interoperable (`BIND(C)`)
types.
Independent definitions of each of these are considered to be the "same type"
when these definitions match in terms of names, types, and attributes,
both being either `SEQUENCE` or `BIND(C)`, and containing
no `PRIVATE` components.
These "sequence" derived types cannot have type parameters, type-bound
procedures, an absence of components, or components that are not themselves
of a sequence type, so we can use a static hash code to implement
their "same type" tests.
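
A minimal C++ sketch of such a test follows.
The names `TypeDescSketch`, `HashSignature`, and `SameDerivedType` are hypothetical, the signature string format is invented for the example, and FNV-1a is an arbitrary choice of hash function rather than anything this note prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description record; only the fields needed here are shown.
struct TypeDescSketch {
  bool isSequenceOrBindC;       // SEQUENCE or BIND(C) type
  std::uint64_t structuralHash; // meaningful only when isSequenceOrBindC
};

// 64-bit FNV-1a over a canonical signature string that a compiler could
// build from the type's name and its component names, types, and attributes.
inline std::uint64_t HashSignature(const std::string &signature) {
  std::uint64_t hash = 14695981039346656037ULL; // FNV-1a offset basis
  for (unsigned char ch : signature) {
    hash ^= ch;
    hash *= 1099511628211ULL; // FNV-1a prime
  }
  return hash;
}

inline bool SameDerivedType(const TypeDescSketch &a, const TypeDescSketch &b) {
  if (&a == &b) {
    return true; // same definition, same description address
  }
  // Distinct descriptions can only match for SEQUENCE / BIND(C) types.
  return a.isSequenceOrBindC && b.isSequenceOrBindC &&
      a.structuralHash == b.structuralHash;
}
```

A hash comparison can in principle report a false match; an implementation that cannot tolerate that would follow a hash hit with a full structural comparison.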

### FINAL subroutines

When an instance of a derived type is deallocated or goes out of scope,
one of its `FINAL` subroutines may be called.
Subclause 7.5.6.3 defines when finalization occurs -- it doesn't happen
in all situations.

The subroutines named in a derived type's `FINAL` statements are not
bindings, so their arguments are not passed object dummy arguments and
do not have to satisfy the constraints of a passed object.
Specifically, they can be arrays, and cannot be polymorphic.
If a `FINAL` subroutine's dummy argument is an array, it may be
assumed-shape or assumed-rank, but it could also be an explicit-shape
or assumed-size argument.
This means that it may or may not be passed by means of a descriptor.

Note that a `FINAL` subroutine with a scalar argument does not define
a finalizer for array objects unless the subroutine is elemental
(and probably `IMPURE`).
This seems to be a language pitfall, so F18 will emit a
warning when an array of a finalizable derived type is declared
with a rank to which no `FINAL` subroutine applies while other ranks do have one.
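
The rank matching described above could be sketched in C++ as follows.
`FinalSketch`, `kAssumedRank`, and `SelectFinal` are hypothetical names, and the precedence used here (an exact rank match first, then an elemental or assumed-rank subroutine) is an assumption based on a reading of subclause 7.5.6.2 rather than a statement of what the runtime does.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int kAssumedRank = -1; // marker chosen for this sketch

struct FinalSketch {
  std::string name;     // for illustration only
  void (*address)();    // address of the subroutine
  int rank;             // rank of the dummy argument, or kAssumedRank
  bool isElemental;     // meaningful when rank == 0
  bool needsDescriptor; // meaningful when rank > 0
};

// Returns the FINAL subroutine applicable to an object of the given rank,
// or null when the type defines none for that rank.
inline const FinalSketch *SelectFinal(
    const std::vector<FinalSketch> &finals, int objectRank) {
  assert(objectRank >= 0);
  const FinalSketch *fallback = nullptr;
  for (const FinalSketch &candidate : finals) {
    if (candidate.rank == objectRank) {
      return &candidate; // exact rank match wins
    }
    if (candidate.rank == kAssumedRank ||
        (candidate.rank == 0 && candidate.isElemental)) {
      fallback = &candidate; // applies to any rank
    }
  }
  return fallback;
}
```

With only a non-elemental scalar subroutine in the table, a query for rank 1 returns null, which is the pitfall noted above.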

So the necessary information in the derived type table for a `FINAL`
subroutine comprises:
* address(es) of the subroutine
* rank of the argument, or whether it is assumed-rank
* for rank 0, whether the subroutine is elemental
* for rank > 0, whether the argument requires a descriptor

This descriptor flag is needed to handle a difficult case with
`FINAL` subroutines that most other implementations of Fortran
fail to get right: a `FINAL` subroutine
whose argument is an explicit-shape or assumed-size array may
have to be called upon the parent component of an array of
an extended derived type.

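```fortran
  module m
    type :: parent
      integer :: n
     contains
      final :: subr  ! finalizer for rank-1 arrays of type(parent)
    end type
    type, extends(parent) :: extended
      integer :: m
    end type
   contains
    subroutine subr(a)
      type(parent) :: a(1)  ! explicit-shape dummy: no descriptor is passed
    end subroutine
  end module
  subroutine demo
    use m
    type(extended) :: arr(1)  ! on return, subr applies to arr%parent
  end subroutine
```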

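If the `FINAL` subroutine doesn't use a descriptor -- and it
will not if there are no `LEN` type parameters -- the runtime
will have to allocate and populate a temporary array holding copies
of the elements of the parent component of the array so that it can
be passed by reference to the `FINAL` subroutine.
The copy is needed because the parent components of the elements of an
array of an extended type are generally not contiguous in memory.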
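
The gathering step could look like the following C++ sketch.
`GatherParentComponents` is a hypothetical helper, and it assumes that the parent component sits at offset 0 of each element and can be copied bytewise (which would not hold for every component kind).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies the leading `parentBytes` of each of `count` elements, which are
// `elementBytes` apart in memory, into one contiguous temporary buffer.
inline std::vector<unsigned char> GatherParentComponents(const void *base,
    std::size_t count, std::size_t elementBytes, std::size_t parentBytes) {
  assert(parentBytes <= elementBytes);
  std::vector<unsigned char> temp(count * parentBytes);
  if (parentBytes == 0) {
    return temp; // nothing to copy
  }
  const unsigned char *from = static_cast<const unsigned char *>(base);
  for (std::size_t j = 0; j < count; ++j) {
    std::memcpy(temp.data() + j * parentBytes, from + j * elementBytes,
        parentBytes);
  }
  return temp;
}
```

The address of the temporary's storage is what would be passed to the `FINAL` subroutine in place of a descriptor.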

### Defined assignment

A defined assignment subroutine for a derived type can be declared
by means of a generic `INTERFACE ASSIGNMENT(=)` and by means of
a generic type-bound procedure.
Defined assignments with non-type-bound generic interfaces are
resolved to specific subroutines at compilation time.
Most cases of type-bound defined assignment are resolved to their
bindings at compilation time as well (with possible runtime
resolution of overridable bindings).

Intrinsic assignment of derived types with components that have
derived types with type-bound generic assignments is specified
by subclause 10.2.1.3 paragraph 13 as invoking defined assignment
subroutines, however.

This seems to be the only case of defined assignment that may be of
interest to the runtime library.
If this is correct, then the requirements are somewhat constrained;
we know that the rank of the target of the assignment must match
the rank of the source, and that one of the dummy arguments of the
bound subroutine is a passed object dummy argument and satisfies
all of the constraints of one -- in particular, it's scalar and
polymorphic.

So the derived type information for a defined assignment needs to
comprise:
* address(es) of the subroutine
* whether the first, second, or both arguments are descriptors
* whether the subroutine is elemental (necessarily also impure)

### User defined derived type I/O

Fortran programs can specify subroutines that implement formatted and
unformatted `READ` and `WRITE` operations for derived types.
These defined I/O subroutines may be specified with an explicit `INTERFACE`
or with a type-bound generic.
When specified with an `INTERFACE`, the first argument must not be
polymorphic, but when specified with a type-bound generic, the first
argument is a passed-object dummy argument and is therefore required
to be polymorphic.
In any case, the argument is scalar.

Nearly all invocations of user defined derived type I/O subroutines
are resolved at compilation time to specific procedures or to
overridable bindings.
(The I/O library APIs for acquiring their arguments remain to be
designed, however.)
The case that is of interest to the runtime library is that of
`NAMELIST` I/O, which is specified to invoke user defined derived
type I/O subroutines if they have been defined.

The derived type information for a user defined derived type I/O
subroutine comprises:
* address(es) of the subroutine
* whether it is for a read or a write
* whether it is formatted or unformatted
* whether the first argument is a descriptor (true if it is a
  binding of the derived type, or has a `LEN` type parameter)
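
As an illustration, the entries listed above could be represented and searched as in this C++ sketch.
`DefinedIoSketch`, `IoDirection`, `IoForm`, `FirstArgIsDescriptor`, and `FindDefinedIo` are hypothetical names; the actual table format is not specified by this note.

```cpp
#include <cassert>
#include <vector>

enum class IoDirection { Read, Write };
enum class IoForm { Formatted, Unformatted };

struct DefinedIoSketch {
  void (*address)();         // address of the subroutine
  IoDirection direction;     // read or write
  IoForm form;               // formatted or unformatted
  bool firstArgIsDescriptor; // see FirstArgIsDescriptor()
};

// The first argument is passed by descriptor when the subroutine is a
// binding of the derived type or the type has a LEN type parameter.
inline bool FirstArgIsDescriptor(bool isTypeBound, bool hasLenParameter) {
  return isTypeBound || hasLenParameter;
}

// Finds the entry for one of the four read/write, formatted/unformatted
// combinations, or returns null when the type defines none.
inline const DefinedIoSketch *FindDefinedIo(
    const std::vector<DefinedIoSketch> &table, IoDirection direction,
    IoForm form) {
  for (const DefinedIoSketch &entry : table) {
    if (entry.direction == direction && entry.form == form) {
      return &entry;
    }
  }
  return nullptr;
}
```

`NAMELIST` processing would perform a lookup like this for each derived type item and fall back to intrinsic component-wise I/O when the result is null.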

## Exporting derived type descriptions from module relocatables

Subclause 7.5.2 requires that two objects be considered as having the
same derived type if they are declared "with reference to the same
derived type definition".
For derived types that are defined in modules and accessed by means
of use association, we need to be able to describe the type in the
read-only static data section of the module and access the description
as a link-time external.

This is not always possible to achieve in the case of instantiations
of parameterized derived types, however.
Two identical instantiations in distinct compilation units of the same
use associated parameterized derived type seem impractical to implement
using the same address.
(Perhaps some linkers would support unification of global objects
with "mangled" names and identical contents, but this seems unportable.)

Derived type descriptions therefore will contain pointers to
their "uninstantiated" original derived types.
For derived types with no `KIND` type parameters, these pointers
will be null; for uninstantiated derived types, these pointers
will point at themselves.
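
One way these pointers could support type comparison is sketched below in C++.
`PdtDescSketch` and the three helper functions are hypothetical, and comparing instantiations through their originals plus their `KIND` values is an assumption about how the pointers would be used, not something this note specifies.

```cpp
#include <cassert>
#include <vector>

struct PdtDescSketch {
  // Null when the type has no KIND type parameters, the description's own
  // address for an uninstantiated type, otherwise the original definition.
  const PdtDescSketch *uninstantiated;
  std::vector<long long> kindValues; // KIND parameter values, if any
};

inline const PdtDescSketch *OriginalOf(const PdtDescSketch &desc) {
  return desc.uninstantiated ? desc.uninstantiated : &desc;
}

// Ignores KIND parameter values, in the manner of SAME_TYPE_AS().
inline bool SameOriginalType(const PdtDescSketch &a, const PdtDescSketch &b) {
  return OriginalOf(a) == OriginalOf(b);
}

// Also requires equal KIND values, in the manner of a TYPE IS guard.
inline bool SameTypeAndKinds(const PdtDescSketch &a, const PdtDescSketch &b) {
  return SameOriginalType(a, b) && a.kindValues == b.kindValues;
}
```

Two instantiations emitted by different compilation units then compare equal even though their descriptions live at different addresses.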