6   Dynamic Loading Information

The dynamic linker/loader (commonly referred to as the loader) is responsible for creating a dynamic executable's process image and placing it into system memory so that it can execute. The loader's functions include finding and mapping shared libraries, completing symbol resolution, and finalizing program addresses.

To accomplish these functions, the loader requires information on external symbols and shared libraries. The linker prepares this dynamic loading information for shared objects only. The dynamic loader then uses this information to create and map the process image. The dynamic information consists of the sections highlighted in Figure 6-1.

Figure 6-1 Dynamic Object File Sections

These sections are mapped with the text segment, except for the .got, which contains the GOT (Global Offset Table). The GOT is part of the data segment because it must be written into when addresses are updated.

The function of each dynamic section can be summarized as follows:

This chapter covers the dynamic sections and related topics. The actions of the system dynamic loader are explained in detail. Related material is available in the Programmer's Guide and loader(5).


6.1   New or Changed Dynamic Loading Information Features

Version 3.13 of the object file format introduces a new dynamic tag value for specifying symbol resolution order. See DT_SYMBOLIC in Section 6.2.1 for details.


6.2   Structures, Fields, and Values for Dynamic Loading Information

All structures and macros are declared in the header file coff_dyn.h unless otherwise indicated.


6.2.1   Dynamic Header Entry

typedef struct {
        coff_int      d_tag;
        coff_uint     reserved;
        union {
            coff_uint d_val;
            coff_addr d_ptr;
        } d_un;
} Coff_Dyn;

SIZE - 16 bytes, ALIGNMENT - 8 bytes

 

Dynamic Header Entry Fields

d_tag
Indicates how the d_un field is to be interpreted.
reserved
Must be zero.
d_val
Represents integer values.
d_ptr
Represents virtual addresses. Virtual addresses stored in this field may not match the memory virtual addresses during execution. The dynamic loader computes actual addresses based on the virtual address from the file and the memory base address. Object files do not contain relocation entries to correct addresses in the dynamic section.

The d_tag requirements for dynamic executable files and shared library files are summarized in Table 6-1. "Mandatory" indicates that the dynamic linking array must contain an entry of that type; "optional" indicates that an entry for the tag may exist but is not required.

Table 6-1 Dynamic Array Tags (d_tag)

Name               Value        d_un      Executable   Shared Library

DT_NULL            0            ignored   mandatory    mandatory
DT_NEEDED          1            d_val     optional     optional
DT_PLTGOT          3            d_ptr     optional     optional
DT_HASH            4            d_ptr     mandatory    mandatory
DT_STRTAB          5            d_ptr     mandatory    mandatory
DT_SYMTAB          6            d_ptr     mandatory    mandatory
DT_STRSZ           10           d_val     optional     optional
DT_SYMENT          11           d_val     optional     optional
DT_INIT            12           d_ptr     optional     optional
DT_FINI            13           d_ptr     optional     optional
DT_SONAME          14           d_val     ignored      mandatory
DT_RPATH           15           d_val     optional     ignored
DT_SYMBOLIC        16           ignored   optional     optional
DT_REL             17           d_ptr     mandatory    mandatory
DT_RELSZ           18           d_val     mandatory    mandatory
DT_RELENT          19           d_val     optional     optional
DT_RLD_VERSION     0x70000001   d_val     mandatory    mandatory
DT_TIME_STAMP      0x70000002   d_val     optional     optional
DT_ICHECKSUM       0x70000003   d_val     optional     optional
DT_IVERSION        0x70000004   d_val     optional     optional
DT_FLAGS           0x70000005   d_val     optional     optional
DT_BASE_ADDRESS    0x70000006   d_ptr     optional     optional
DT_MSYM            0x70000007   d_ptr     optional     optional
DT_CONFLICT        0x70000008   d_ptr     optional     optional
DT_LIBLIST         0x70000009   d_ptr     optional     optional
DT_LOCAL_GOTNO     0x7000000A   d_val     mandatory    mandatory
DT_CONFLICTNO      0x7000000B   d_val     optional     optional
DT_LIBLISTNO       0x70000010   d_val     optional     optional
DT_SYMTABNO        0x70000011   d_val     mandatory    mandatory
DT_UNREFEXTNO      0x70000012   d_val     optional     optional
DT_GOTSYM          0x70000013   d_val     mandatory    mandatory
DT_HIPAGENO        0x70000014   d_val     optional     optional
DT_SO_SUFFIX       0x70000017   d_val     optional     optional

The uses of the various dynamic array tags are as follows:

DT_NULL
Marks the end of the array.
DT_NEEDED
Contains the string table offset of a null-terminated string that is the name of a needed library. The offset is an index into the table indicated in the DT_STRTAB entry. The dynamic array can contain multiple entries of this type. The order of these entries is significant.
DT_HASH
Contains the quickstart address of the symbol hash table.
DT_STRTAB
Contains the quickstart address of the string table.
DT_SYMTAB
Contains the quickstart address of the symbol table with Coff_Sym entries.
DT_STRSZ
Contains the size of the string table (in bytes).
DT_SYMENT
Contains the size of a symbol table entry (in bytes).
DT_INIT
Contains the quickstart address of the initialization function.
DT_FINI
Contains the quickstart address of the termination function.
DT_SONAME
Contains the string table offset of a null-terminated string that gives the name of the shared library file. The offset is an index into the table indicated in the DT_STRTAB entry.
DT_RPATH
Contains the string table offset of a null-terminated library search path string. The offset is an index into the table indicated in the DT_STRTAB entry.
DT_SYMBOLIC
The presence of this entry indicates that symbol references should be resolved using a depth-first ring search of the shared object's dependencies. See Section 6.3.4.3 for details on shared object search order.
This dynamic entry is informational only. The search order is actually controlled by DT_FLAGS: when DT_SYMBOLIC is added to the dynamic section, the DT_FLAGS value also includes the RHF_RING_SEARCH and RHF_DEPTH_FIRST flags.
DT_REL
Contains the address of the dynamic relocation table. If this entry is present, the dynamic structure must contain the DT_RELSZ entry.
DT_RELSZ
Contains the size (in bytes) of the dynamic relocation table pointed to by the DT_REL entry.
DT_RELENT
Contains the size (in bytes) of a DT_REL entry.
DT_RLD_VERSION
Contains the version number of the run-time linker interface. The version is:
DT_TIME_STAMP
Contains a 32-bit time stamp.
DT_ICHECKSUM
Contains a checksum value computed from the names and other attributes of all symbols exported by the library.
DT_IVERSION
Contains the string table offset of a series of colon-separated versions. An index value of zero means no version string was specified.
DT_FLAGS
Contains a set of 1-bit flags. The following flags are defined for DT_FLAGS:

Table 6-2 DT_FLAGS Flags

Flag                         Value        Meaning

RHF_QUICKSTART               0x00000001   Object may be quickstarted by loader
RHF_NOTPOT                   0x00000002   Hash size not a power of two
RHF_NO_LIBRARY_REPLACEMENT   0x00000004   Use default system libraries only
RHF_NO_MOVE                  0x00000008   Do not relocate
RHF_TLS                      0x04000000   Identifies objects that use TLS
RHF_RING_SEARCH              0x10000000   Symbol resolution same as DT_SYMBOLIC. This flag is only meaningful when combined with RHF_DEPTH_FIRST
RHF_DEPTH_FIRST              0x20000000   Depth-first symbol resolution
RHF_USE_31BIT_ADDRESSES      0x40000000   TASO (Truncated Address Support Option) objects

DT_BASE_ADDRESS
Contains the quickstart base address of the object.
DT_CONFLICT
Contains the quickstart address of the .conflict section.
DT_LIBLIST
Contains the quickstart address of the .liblist section.
DT_LOCAL_GOTNO
Contains the number of local GOT entries. The dynamic array contains one of these entries for each GOT.
DT_CONFLICTNO
Contains the number of entries in the .conflict section.
DT_LIBLISTNO
Contains the number of entries in the .liblist section.
DT_SYMTABNO
Indicates the number of entries in the .dynsym section.
DT_UNREFEXTNO
Holds the index to the first dynamic symbol table entry that is an external symbol not referenced within the object.
DT_GOTSYM
Holds the index to the first dynamic symbol table entry that corresponds to an entry in the global offset table. The dynamic array contains one of these entries for each GOT.
DT_HIPAGENO
Not used by the default system loader. If present, must contain the value 0.
DT_SO_SUFFIX
Contains a shared library suffix that the loader appends to library names when searching for dependencies. This tag is used, for example, with Atom tools. Instrumented applications may be dependent on instrumented shared libraries identified by a tool-specific suffix.

All other tag values are reserved. Entries can appear in any order, except for the DT_NULL entry at the end of the array and the relative order of the DT_NEEDED entries.
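
As a rough illustration of how a tool might consume the dynamic array, the following sketch scans entries until DT_NULL and returns the first entry that carries a given tag. The integer typedefs are assumptions chosen to reproduce the stated 16-byte entry size; the real definitions are in coff_dyn.h. The helpers find_dyn and runtime_addr are hypothetical names, and the address adjustment assumes that an object's load displacement equals its mapped base minus its DT_BASE_ADDRESS value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the types declared in coff_dyn.h. */
typedef int32_t  coff_int;
typedef uint32_t coff_uint;
typedef uint64_t coff_addr;

typedef struct {
    coff_int  d_tag;
    coff_uint reserved;
    union {
        coff_uint d_val;
        coff_addr d_ptr;
    } d_un;
} Coff_Dyn;

/* A few tag and flag values copied from Table 6-1 and Table 6-2. */
#define DT_NULL          0
#define DT_STRTAB        5
#define DT_SONAME        14
#define DT_FLAGS         0x70000005
#define DT_BASE_ADDRESS  0x70000006
#define RHF_QUICKSTART   0x00000001

/* Hypothetical helper: return the first entry with the given tag,
 * or NULL if the DT_NULL terminator is reached first. */
static const Coff_Dyn *find_dyn(const Coff_Dyn *dyn, coff_int tag)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == tag)
            return dyn;
    }
    return NULL;
}

/* Hypothetical helper (assumption, see lead-in): shift a link-time
 * address by the distance between the mapped base and the quickstart
 * base recorded in DT_BASE_ADDRESS. */
static coff_addr runtime_addr(coff_addr file_addr,
                              coff_addr quickstart_base,
                              coff_addr mapped_base)
{
    return file_addr - quickstart_base + mapped_base;
}
```

Because entries other than DT_NEEDED and DT_NULL can appear in any order, a linear scan such as this one is the simplest approach that does not depend on layout.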


6.2.2   Dynamic Symbol Entry

typedef struct {
        coff_uint       st_name;
        coff_uint       reserved;
        coff_addr       st_value;
        coff_uint       st_size;
        coff_ubyte      st_info;
        coff_ubyte      st_other;
        coff_ushort     st_shndx;
} Coff_Sym;

SIZE - 24 bytes, ALIGNMENT - 8 bytes

See Section 6.3.3 for related information.

 

Dynamic Symbol Entry Fields

st_name
Contains the offset of the symbol's name in the dynamic string section.
reserved
Must be zero.
st_value
Contains the quickstart address if the symbol is defined within the object. Contains 0 for undefined external symbols, the alignment value for commons, or any arbitrary value for absolute symbols.
st_size
Identifies the size of symbols with common storage allocation; otherwise, contains the value zero. For STB_DUPLICATE symbols (see Table 6-4), the size field holds the index of the primary symbol.
st_info
Identifies the symbol's binding and type. The macros COFF_ST_BIND and COFF_ST_TYPE are used to access the individual values. See Table 6-3 and Table 6-4 for the possible values.
st_other
Currently has a value of zero and no defined meaning.
st_shndx
Identifies the symbol's dynamic storage class. See Table 6-5 for the possible values.

Table 6-3 Dynamic Symbol Type (st_info) Constants

Name          Value   Description

STT_NOTYPE    0       Indicates that the symbol has no type or its type is unknown.
STT_OBJECT    1       Indicates that the symbol is a data object.
STT_FUNC      2       Indicates that the symbol is a function.
STT_SECTION   3       Indicates that the symbol is associated with a program section.
STT_FILE      4       Indicates that the symbol is the name of a source file.

Table 6-4 Dynamic Symbol Binding (st_info) Constants

Name            Value   Description

STB_LOCAL       0       Indicates that the symbol is local to the object (or designated as hidden).
STB_GLOBAL      1       Indicates that the symbol is visible to other objects.
STB_WEAK        2       Indicates that the symbol is a weak global symbol.
STB_DUPLICATE   13      Indicates that the symbol is a duplicate. (Used for objects that have multiple GOTs.)

 

Table 6-5 Dynamic Section Index (st_shndx) Constants

Name          Value    Description

SHN_UNDEF     0x0000   Indicates that the symbol is undefined.
SHN_ACOMMON   0xff00   Indicates that the symbol has common storage (allocated).
SHN_TEXT      0xff01   Indicates that the symbol is in a text segment.
SHN_DATA      0xff02   Indicates that the symbol is in a data segment.
SHN_ABS       0xfff1   Indicates that the symbol has an absolute value.
SHN_COMMON    0xfff2   Indicates that the symbol has common storage (unallocated).
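
The sketch below shows one way the constants in Table 6-3 through Table 6-5 might be used to classify a dynamic symbol entry. This chapter does not state how st_info packs the binding and type, so the SKETCH_ST_* macros assume an ELF-style layout (binding in the high four bits, type in the low four bits); the real COFF_ST_BIND and COFF_ST_TYPE macros in coff_dyn.h may differ. The integer typedefs and the helper functions are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the types declared in coff_dyn.h. */
typedef uint8_t  coff_ubyte;
typedef uint16_t coff_ushort;
typedef uint32_t coff_uint;
typedef uint64_t coff_addr;

typedef struct {
    coff_uint   st_name;
    coff_uint   reserved;
    coff_addr   st_value;
    coff_uint   st_size;
    coff_ubyte  st_info;
    coff_ubyte  st_other;
    coff_ushort st_shndx;
} Coff_Sym;

/* ASSUMPTION: ELF-style packing. Not confirmed by this chapter. */
#define SKETCH_ST_BIND(info)       ((unsigned)(info) >> 4)
#define SKETCH_ST_TYPE(info)       ((unsigned)(info) & 0xfu)
#define SKETCH_ST_INFO(bind, type) ((coff_ubyte)(((bind) << 4) | ((type) & 0xf)))

enum { STT_NOTYPE, STT_OBJECT, STT_FUNC, STT_SECTION, STT_FILE };
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2, STB_DUPLICATE = 13 };
enum { SHN_UNDEF = 0x0000, SHN_ACOMMON = 0xff00, SHN_COMMON = 0xfff2 };

/* True for an undefined external symbol. */
static int is_undefined(const Coff_Sym *sym)
{
    assert(sym != NULL);
    return sym->st_shndx == SHN_UNDEF;
}

/* True if the symbol uses common storage, allocated or not. */
static int is_common(const Coff_Sym *sym)
{
    assert(sym != NULL);
    return sym->st_shndx == SHN_ACOMMON || sym->st_shndx == SHN_COMMON;
}

/* True for a defined function with global or weak binding. */
static int is_exported_function(const Coff_Sym *sym)
{
    unsigned bind;

    assert(sym != NULL);
    bind = SKETCH_ST_BIND(sym->st_info);
    return !is_undefined(sym)
        && SKETCH_ST_TYPE(sym->st_info) == STT_FUNC
        && (bind == STB_GLOBAL || bind == STB_WEAK);
}
```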

 


6.2.3   Dynamic Relocation Entry

typedef struct {
        coff_addr       r_offset;
        coff_uint       r_info;
        coff_uint       reserved;
} Coff_Rel;

SIZE - 16 bytes, ALIGNMENT - 8 bytes

See Section 6.3.5 for related information.

 

Dynamic Relocation Entry Fields

r_offset
Indicates the quickstart address within the object that contains the value requiring relocation.
r_info
Indicates the relocation type and the index of the dynamic symbol that is referenced. The macros COFF_R_SYM and COFF_R_TYPE access the individual attributes. The relocation type must be R_REFQUAD, R_REFLONG, or R_NULL.
reserved
Must be zero.


6.2.4   Msym Table Entry

typedef struct {
        coff_uint ms_hash_value;
        coff_uint ms_info;
} Coff_Msym;

SIZE - 8 bytes, ALIGNMENT - 4 bytes

See Section 6.3.3.4 for related information.

 

Msym Table Entry Fields

ms_hash_value
Contains the hash value computed from the name of the corresponding dynamic symbol.
ms_info
Contains both the dynamic relocation index and the symbol flags field. The macros COFF_MS_REL_INDEX and COFF_MS_FLAGS are used to access the individual values. The dynamic relocation index identifies the first entry in the .rel.dyn section that references the dynamic symbol corresponding to this msym entry. If the index is 0, no dynamic relocations are associated with the symbol. The symbol flags field is reserved for future use and should be zero.


6.2.5   Library List Entry

typedef struct {
        coff_uint l_name;
        coff_uint l_time_stamp;
        coff_uint l_checksum;
        coff_uint l_version;
        coff_uint l_flags;
} Coff_Lib;

SIZE - 20 bytes, ALIGNMENT - 4 bytes

See Section 6.3.2 for related information.

 

Library List Entry Fields

l_name
Records the name of a shared library dependency. The value is a string table index. This name can be a full pathname, relative pathname, or file name.
l_time_stamp
Records the time stamp of a shared library dependency. The value can be combined with the l_checksum value and the l_version string to form a unique identifier for this shared library file.
l_checksum
Records the checksum of a shared library dependency.
l_version
Records the interface version of a shared library dependency. The value is a string table index.
l_flags
Specifies a set of 1-bit flags. The l_flags field can have one or more of the flags described in Table 6-6.

Table 6-6 Library List Flags

Name                Value   Description

LL_EXACT_MATCH      0x01    Requires that the run-time dynamic shared library file match exactly the shared library file used at static link time.
LL_IGNORE_INT_VER   0x02    Ignores any version incompatibility between the dynamic shared library file and the shared library file used at link time.
LL_USE_SO_SUFFIX    0x04    Marks shared library dependencies that should be loaded with a suffix appended to the name. The DT_SO_SUFFIX entry in the .dynamic section records the name of this suffix. This is used by object instrumentation tools to distinguish instrumented shared libraries.
LL_NO_LOAD          0x08    Marks entries for shared libraries that are not loaded as direct dependencies of an object. Object instrumentation tools may use LL_NO_LOAD entries to set the LL_USE_SO_SUFFIX for dynamically loaded shared libraries or for indirect shared library dependencies.

If neither LL_EXACT_MATCH nor LL_IGNORE_INT_VER bits are set, the dynamic loader requires that the version of the dynamic shared library match at least one of the colon-separated version strings indexed by the l_version string table index.
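
The default rule above can be sketched as a membership test on a colon-separated list. The helper names version_in_list and uses_default_version_rule are hypothetical, and the sketch treats the library's own version as a single string; it does not model the LL_EXACT_MATCH or LL_IGNORE_INT_VER behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LL_EXACT_MATCH     0x01
#define LL_IGNORE_INT_VER  0x02

/* Hypothetical helper: return 1 if `version` equals one of the
 * colon-separated strings in `accepted`, otherwise 0. */
static int version_in_list(const char *version, const char *accepted)
{
    size_t vlen;

    assert(version != NULL && accepted != NULL);
    vlen = strlen(version);
    for (;;) {
        const char *colon = strchr(accepted, ':');
        size_t len = colon ? (size_t)(colon - accepted) : strlen(accepted);

        if (len == vlen && memcmp(accepted, version, len) == 0)
            return 1;
        if (colon == NULL)
            return 0;
        accepted = colon + 1;   /* advance to the next list element */
    }
}

/* Hypothetical helper: the plain version-string comparison governs
 * only when neither override flag is set in l_flags. */
static int uses_default_version_rule(unsigned l_flags)
{
    return (l_flags & (LL_EXACT_MATCH | LL_IGNORE_INT_VER)) == 0;
}
```

Comparing whole list elements, rather than searching for a substring, avoids accepting "osf" when only "osf.1" is listed.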


6.2.6   Conflict Entry

typedef struct {
        coff_uint   c_index;
} Coff_Conflict;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

The conflict entry is an index into the dynamic symbols (.dynsym) section. See Section 6.3.6.2 for related information.


6.2.7   GOT Entry

typedef struct {
        coff_addr   g_index;
} Coff_Got;

SIZE - 8 bytes, ALIGNMENT - 8 bytes

The GOT entry is a 64-bit address. Most GOT entries map to dynamic symbols. See Section 6.3.3 for details.


6.2.8   Hash Table Entry

The hash table is implemented as an array of 32-bit values. The structure is declared internal to system utilities.

See Section 6.3.3.5 for more information.


6.2.9   Dynamic String Table

The dynamic string table consists of null-terminated character strings. The strings vary in length and are separated only by a single null character. An offset into the dynamic string table gives the number of bytes from the beginning of the string space to the beginning of the name in question.

Offset 0 in the dynamic string table is reserved for the null string.
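
A minimal sketch of a bounds-checked string lookup follows. The helper name dynstr_at is hypothetical; strsz stands for the table size in bytes, as reported by DT_STRSZ when that entry is present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the string that starts at `offset`, or
 * NULL if the offset is out of range or the string is not terminated
 * inside the table. */
static const char *dynstr_at(const char *strtab, size_t strsz, size_t offset)
{
    assert(strtab != NULL);
    if (offset >= strsz)
        return NULL;
    if (memchr(strtab + offset, '\0', strsz - offset) == NULL)
        return NULL;
    return strtab + offset;
}
```

For a table holding "", "libc.so", and "osf.1", offset 0 yields the null string, offset 1 yields "libc.so", and offset 9 yields "osf.1".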


6.3   Dynamic Loading Information Usage


6.3.1   Shared Object Identification

A shared object is either a dynamic executable or a shared library. The file header flags indicate whether the object is a shared object and, if so, what type of shared object it is. The layout of the object is also stated in the file header. Normally shared objects use a ZMAGIC image layout (see Section 2.3.2.3).

Additional information on the shared object is located in the dynamic header (.dynamic section). When the dynamic loader is invoked by the kernel's exec() routine, this header information is read.

The kernel and loader take the following steps upon receiving a user command to execute a dynamic executable:

  1. User enters command.
  2. Shell calls exec() in kernel.
  3. Exec() opens the file and reads the file header.
  4. If the file is a dynamic executable, exec() calls /sbin/loader.
  5. The loader then:
    1. Reads file header and dynamic header information.
    2. Maps the executable into memory.
    3. Locates each shared library dependency, relocates it if necessary, and maps it into memory.
    4. Resolves symbols for all shared objects.
    5. Sets the heap address.
    6. Transfers control to program entry point.
  6. The program entry point (__start in crt0.o) then:
    1. Calls the special symbol __istart, which invokes the loader routine to run INIT routines.
    2. Calls main with __Argc, __Argv, __environ, and _auxv.

 


6.3.2   Shared Library Dependencies

Dynamic executables usually rely on shared libraries. At load time, these shared libraries must be located, validated, and mapped with the process image.

If an executable object refers to a symbol whose definition resides in a shared library, the executable is dependent on that library. This relationship is described as a direct dependency. A shared library dependency also exists if a library is used by any previously identified dependency. This is an indirect dependency for the executable.

In the example shown in Figure 6-2, libA, libB, and libcool are all shared library dependencies for a.out. The library libA is a direct dependency, and the others are indirect dependencies.

Figure 6-2 Shared Library Dependencies

Although the possibility of duplicate dependencies exists, as in the preceding example, each library is mapped only once with the image. The linker also prevents recursive inclusion, which could occur in a case of cyclic dependencies.


6.3.2.1   Identification

A shared object's dependencies are stored in its .liblist entries and in DT_NEEDED entries in the .dynamic section. The linker records this information as dependencies are encountered.

The library list (.liblist section) has name, timestamp, checksum, and version information for every entry, along with a flags field. Taken together, the timestamp and checksum value and the version string form a unique identifier for a shared library. An entry is created for each shared library dependency.

A DT_NEEDED tag in the dynamic header also indicates a shared library dependency. The value of the entry is the string table offset for the needed library's name. Note that this representation of the dependency information is redundant with that contained in the library list. The loader relies on the library list only. The DT_NEEDED entries are maintained for historical reasons.

As an example, an object linked against libc has the following dependency information:

     ***DYNAMIC SECTION***

     LIBLISTNO: 1.
     LIBLIST:   0x0000000120000690
     NEEDED:    libc.so


     ***LIBRARY LIST SECTION***

     Name             Time-Stamp        CheckSum   Flags Version
a.out:
     libc.so      May 19 22:18:46 1996 0xf937323b     0 osf.1

A shared library's checksum is computed by the linker when the library is created or updated, and the value is written into the dynamic header. When an application is linked against the library, the linker copies the library's current checksum into its entry in the application's .liblist.

The checksum computation is a summation of the names of dynamic symbols that meet the following criteria:

Common storage class symbol names are included, along with their size. Weak symbols are included, but the calculation for weak symbols differs from that used for non-weak symbols.

For a single symbol, the checksum is computed using this algorithm:

if (SYMBOL.st_shndx == SHN_COMMON || SYMBOL.st_shndx == SHN_ACOMMON) 
    CHECKSUM = SYMBOL.st_size
else 
    CHECKSUM = 0

for (# of characters in symbol name)
    CHECKSUM = (CHECKSUM << 5) + character_value

if (weak symbol) 
    CHECKSUM = (CHECKSUM << 5) + CHECKSUM + 1

A change in the number of weak symbols or a change in the size of a common storage class symbol is therefore reflected in the checksum. However, the checksum calculation is insensitive to symbol reordering.

The checksums for all symbols included are summed to produce the shared object's checksum.
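
The algorithm above might be written in C as follows. This is a sketch under stated assumptions: the arithmetic is taken to be 32-bit unsigned with wraparound (matching the coff_uint width of l_checksum), each character is read as an unsigned char, and the caller is assumed to have already selected the symbols that take part in the checksum. The SketchSym type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHN_ACOMMON 0xff00
#define SHN_COMMON  0xfff2

/* Hypothetical per-symbol input record for this sketch. */
typedef struct {
    const char *name;
    uint16_t    st_shndx;
    uint32_t    st_size;
    int         is_weak;
} SketchSym;

/* Checksum of one symbol, following the pseudocode step by step. */
static uint32_t symbol_checksum(const SketchSym *sym)
{
    const unsigned char *p;
    uint32_t sum;

    assert(sym != NULL && sym->name != NULL);
    /* Commons start from their size; everything else starts from 0. */
    sum = (sym->st_shndx == SHN_COMMON || sym->st_shndx == SHN_ACOMMON)
              ? sym->st_size : 0;
    for (p = (const unsigned char *)sym->name; *p != '\0'; p++)
        sum = (sum << 5) + *p;
    if (sym->is_weak)
        sum = (sum << 5) + sum + 1;
    return sum;
}

/* Object checksum: the sum of the per-symbol checksums. */
static uint32_t object_checksum(const SketchSym *syms, size_t count)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += symbol_checksum(&syms[i]);
    return total;
}
```

Because the per-symbol values are simply added, listing the same symbols in a different order produces the same total, which is why the checksum is insensitive to reordering.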


6.3.2.2   Searching

After loading an executable, the loader loads the executable's shared library dependencies. The loader searches for shared libraries that match the names contained in the executable's .liblist entries. Subject to the search guidelines described in this section, the loader will load the first matching shared library that it finds for each dependency.

Certain directories are searched by default, in the following order:

  1. /usr/shlib
  2. /usr/ccs/lib
  3. /usr/lib/cmplrs/cc
  4. /usr/lib
  5. /usr/local/lib
  6. /var/shlib

The loader's search path can be altered by several methods:

The -soname option is used to set internal shared library names. The default soname is the output file name of the library when it is built. The linker uses an soname value to record shared library dependencies in the library list. Dependencies containing pathnames are located without prepending search directories to their paths. A pathname is identified by the presence of one or more slashes in the string.

The RPATH is included in a shared object's .dynamic section under an entry tagged DT_RPATH. It is a colon-separated list of shared library search directories. The RPATH is set using the -rpath linker option. The loader will search RPATH directories prior to searching LD_LIBRARY_PATH and default directories.

The environment variables that impact the search order are LD_LIBRARY_PATH and _RLD_ROOT. LD_LIBRARY_PATH has the same format as rpath. No root directories are prepended to the LD_LIBRARY_PATH directories. LD_LIBRARY_PATH can also be set by a program before it calls dlopen().

The _RLD_ROOT environment variable is a colon-separated list of "root" directories that are prepended to other search directories. It modifies RPATH and the default search directories.

The precedence (highest to lowest) of search directories used by the loader is as follows:

  1. soname (if it includes a path)
  2. _RLD_ROOT + RPATH
  3. LD_LIBRARY_PATH
  4. _RLD_ROOT + default search directories
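
The precedence list can be sketched as a function that builds the ordered list of directories to try. Everything here is illustrative: the SearchList type, the helper names, and the fixed buffer sizes are hypothetical, and two details are assumptions rather than documented behavior: each root is applied to the whole directory list in turn, and unrooted copies of RPATH and the default directories are not searched when _RLD_ROOT is set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIRS     64
#define MAX_PATH_LEN 256

typedef struct {
    char dirs[MAX_DIRS][MAX_PATH_LEN];
    int  count;
} SearchList;

/* A dependency name containing a slash is used as is, without searching. */
static int has_pathname(const char *name)
{
    return strchr(name, '/') != NULL;
}

/* Append each entry of the colon-separated `list`, prefixed by `root`. */
static void add_list(SearchList *out, const char *root, const char *list)
{
    while (list != NULL && *list != '\0') {
        size_t len = strcspn(list, ":");

        if (len > 0 && out->count < MAX_DIRS)
            snprintf(out->dirs[out->count++], MAX_PATH_LEN, "%s%.*s",
                     root, (int)len, list);
        list += len;
        if (*list == ':')
            list++;
    }
}

/* Append `list` once per root in `roots`; with no roots, append it as is. */
static void add_rooted(SearchList *out, const char *roots, const char *list)
{
    if (roots == NULL || *roots == '\0') {
        add_list(out, "", list);
        return;
    }
    while (*roots != '\0') {
        char root[MAX_PATH_LEN];
        size_t len = strcspn(roots, ":");

        snprintf(root, sizeof root, "%.*s", (int)len, roots);
        if (len > 0)
            add_list(out, root, list);
        roots += len;
        if (*roots == ':')
            roots++;
    }
}

/* Build the directory list in the documented precedence order. */
static void build_search_list(SearchList *out, const char *rld_root,
                              const char *rpath, const char *ld_library_path)
{
    static const char defaults[] =
        "/usr/shlib:/usr/ccs/lib:/usr/lib/cmplrs/cc:"
        "/usr/lib:/usr/local/lib:/var/shlib";

    assert(out != NULL);
    out->count = 0;
    add_rooted(out, rld_root, rpath);      /* _RLD_ROOT + RPATH */
    add_list(out, "", ld_library_path);    /* no roots are prepended */
    add_rooted(out, rld_root, defaults);   /* _RLD_ROOT + defaults */
}
```

A caller would first test has_pathname on the dependency name and consult the directory list only when that test fails.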

When using non-system libraries, it is often necessary to specify the search path rather than relying on the defaults. Here is one example:

$ ld -shared -o my.so mylib.o -lc
$ cc -o hello hello.c my.so
$ hello
7526:hello: /sbin/loader: Fatal Error: cannot map my.so
$ LD_LIBRARY_PATH=. 
$ export LD_LIBRARY_PATH 
$ hello
Hello, World!

 


6.3.2.3   Validation

One of the loader's jobs is to ensure that correct shared libraries are available to the program. Shared library versioning is used to distinguish incompatible versions of shared libraries. The loader tests for matching versions when shared library dependencies are loaded. If the application is found to be incompatible with a needed shared library, the program may have to be recoded or relinked. Causes of binary incompatibility include altered global data definitions and changes to documented interfaces.

Each shared library is built with a version identifier. This identifier is recorded in the .dynamic section with the tag DT_IVERSION. Each entry in the dependency information (.liblist section) also records the version identifier of a shared library dependency. The -set_version linker option is used to provide the version identifier. Without this option, the linker will build a shared library with a null version. Version identifiers can be any ASCII string.

Version checking can also be controlled by the user. The linker option -exact_version leads to more rigorous version testing by the loader. When this option is in effect, timestamps and checksums are checked in addition to version numbers. The linker-recorded dependency information for the timestamp and checksum must precisely match the load-time values for all shared libraries. Normally, a mismatch leads to additional symbol resolution work instead of a rejected object.

Version checking can be disabled through use of the loader environment variable _RLD_ARGS. Setting this variable to -ignore_all_versions disables version testing for all shared library dependencies. Setting it to -ignore_version with a library name parameter turns off version checking for that specific dependency.

By default, versions are checked, but not checksums or timestamps. If version testing fails, the loader searches for the matching version of the shared library.

The version identifiers are used to locate version-specific libraries. The loader looks for these libraries in:

  1. dirname/version_id
  2. /usr/shlib/version_id

where dirname is the first directory where a library with a matching name but non-matching version is found.

For example, if an application needs version 1 of a shared library but the loader first encounters version 2, it continues looking for the correct version.
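
The two version-specific locations might be formed as shown below. This is only a sketch: the helper name versioned_candidates is hypothetical, and it assumes that the library keeps its ordinary file name inside the version_id directory, which the text above implies but does not state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CAND_LEN 256

/* Hypothetical helper: fill `out` with the two candidate paths, in
 * search order. Returns 2 on success, or 0 if the version is null or
 * a path does not fit in the buffer. */
static int versioned_candidates(const char *dirname, const char *version_id,
                                const char *libname, char out[2][CAND_LEN])
{
    int n0, n1;

    assert(dirname != NULL && version_id != NULL && libname != NULL);
    if (strlen(version_id) == 0)
        return 0;   /* a null version has no version-specific directory */

    n0 = snprintf(out[0], CAND_LEN, "%s/%s/%s", dirname, version_id, libname);
    n1 = snprintf(out[1], CAND_LEN, "/usr/shlib/%s/%s", version_id, libname);
    if (n0 < 0 || n0 >= CAND_LEN || n1 < 0 || n1 >= CAND_LEN)
        return 0;
    return 2;
}
```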


6.3.2.3.1   Backward Compatibility

When shared libraries are modified and new versions built, the older versions are frequently retained to support previously linked applications. Maintaining multiple versions of the library helps ensure backward compatibility for existing applications even after binary-incompatible changes have been made.

Backward-compatible shared libraries can be:

The advantage of partial shared libraries is that they require less disk space; a disadvantage is that they require more swap space.

The linker's -L option can be used to link with backward-compatible shared libraries. Warnings are generated when a shared library is linked with dependencies on different versions of the same shared library. However, the linker tests direct dependencies only. The option -transitive_link should be used to uncover all multiple-version dependencies.

Multiple versions of the same shared library can only be loaded to support partial shared library dependencies. Otherwise, dependencies on multiple versions of a library are invalid.

Figure 6-3 shows examples of valid uses of multiple versions.

Figure 6-3 Valid Shared Library with Multiple Versions

Figure 6-4 shows examples of invalid uses of multiple versions.

Figure 6-4 Invalid Shared Library with Multiple Versions


6.3.2.4   Loading

The executable object is placed in memory first, at the segment base addresses designated by the linker and recorded in the a.out header. These addresses are never changed during the lifetime of the executable's image. After the executable file's segments have been mapped into memory, shared library dependencies are loaded. Shared library dependencies are mapped recursively.

The linker chooses quickstart addresses for the text and data regions of shared libraries. The loader attempts to map shared libraries to their quickstart addresses. If this attempt fails because another library has already been mapped to the same address range, the library is relocated to a different address. Note that this problem could be caused by a library mapped by another process. The system tries to map no more than one shared library at a particular virtual address range, system-wide.

Additional dependencies, not present in the library list, can be dynamically loaded using a dlopen() call. Again, the loader will attempt to load the library at its quickstart addresses and will relocate it if necessary.

When a shared library is relocated, its text and data segments must move the same distance in memory. By fixing the distance between these segments at link time, the number of dynamic relocations is minimized and restricted to the data segment.


6.3.2.4.1   Dynamic Loading and Unloading

Dependencies can be loaded and unloaded during execution by using the dlopen and dlclose system functions.

The dlopen routine accepts a library name and loads the library and its dependencies. The loader resolves all symbols in all shared objects while processing a dlopen call. If the library was previously loaded, dlopen re-resolves global symbols and returns a handle without loading any new objects.

The loader maintains a count of references made to all shared objects that have been loaded. For example, if libm.so is dependent upon libc.so, libc's reference count is incremented when the libraries are loaded. This reference counting is part of an effort to ensure that a library is never unloaded prematurely. As an additional precaution to avoid unloading a library that is still needed, the number of existing dlopen handles is tracked by the loader. This dlopen count is incremented each time a dlopen call is made for a particular object.

The dlclose routine unloads a shared library and its dependencies. It accepts a handle that was returned by dlopen.

The dlclose routine will not unload shared libraries that are still in use. Both the dlopen count and the reference count are checked and should be zero before a library is unloaded.

The dlclose routine cannot unload an executable. It is designed for shared libraries only. It also cannot unload a shared library that was not dynamically loaded by dlopen.
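
The unload conditions described above can be modeled with a small bookkeeping structure. This is a simplified illustration, not the loader's actual data structure: SketchObject and both function names are hypothetical, and the model ignores the propagation of reference counts to dependencies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-object bookkeeping for this sketch. */
typedef struct {
    int is_executable;     /* the main program; never unloaded */
    int loaded_by_dlopen;  /* loaded by dlopen rather than at startup */
    int ref_count;         /* references from other loaded objects */
    int dlopen_count;      /* outstanding dlopen handles */
} SketchObject;

/* An object may be unloaded only if it is a dynamically loaded shared
 * library and both of its counts have dropped to zero. */
static int can_unload(const SketchObject *obj)
{
    assert(obj != NULL);
    if (obj->is_executable || !obj->loaded_by_dlopen)
        return 0;
    return obj->ref_count == 0 && obj->dlopen_count == 0;
}

/* Model one dlclose call: release a handle, then report whether the
 * object has become eligible for unloading. */
static int sketch_dlclose(SketchObject *obj)
{
    assert(obj != NULL);
    if (obj->dlopen_count > 0)
        obj->dlopen_count--;
    return can_unload(obj);
}
```

With two outstanding handles, the first modeled dlclose leaves the library loaded and the second makes it eligible for unloading, provided no other object still refers to it.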

Objects with TLS data can be dynamically loaded or unloaded during process execution. A new TLS region is allocated for all existing threads when an object with TLS data is loaded. Similarly, the TLS region will be deallocated for all threads when the object is unloaded.


6.3.3   Dynamic Symbol Information

The dynamic symbol table is created at link time for shared objects. Its primary purpose is to enable dynamic symbol resolution. Run-time address information for dynamic symbols is contained in the GOT section (.got).

The dynamic symbol section (.dynsym) provides information on globally scoped symbols that are defined or used by the object. This section consists of a table of dynamic symbol entries. The entries are ordered as follows:

  1. A single null entry
  2. Symbols local to the object
  3. Unreferenced global symbols
  4. Referenced global symbols (corresponding to GOT entries)
  5. Relocations-referenced global symbols (corresponding to special final GOT)

Local symbols are global in scope but are not exported to other objects. The local portion of the dynamic symbol table contains system symbols representing the sections of the object: .text, .data, and other linker-defined symbols. Typically, they do not have GOT entries.

Unreferenced globals are symbols that can be exported but are not referenced by the defining object. They are present in the dynamic symbol table so that other shared objects can import and use them. Unreferenced globals do not have GOT entries.

Referenced globals are exported and are used internally. Dynamic symbols in this category have global GOT entries.

Global symbols that are referenced only by the object's dynamic relocation entries are grouped at the end of the dynamic symbol table, corresponding to a special final GOT. These symbols require GOT entries to record their run-time addresses used in processing dynamic relocations. This special GOT is only used by the loader and is never directly referenced by the program itself.

All linker-defined TLS symbols (see Section 2.3.7) have dynamic symbol entries.

Note that the dynamic symbol table itself is never relocated; it contains only link-time addresses (in the st_value field).
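
Given the ordering above, the DT_UNREFEXTNO, DT_GOTSYM, and DT_SYMTABNO values are enough to place a symbol index in its region of the table. The sketch below assumes an object with a single GOT (so a single DT_GOTSYM value) and does not separate the final relocation-only group from other symbols with GOT entries, because this chapter gives no tag for that boundary. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SYM_NULL,                 /* the single null entry at index 0 */
    SYM_LOCAL,                /* symbols local to the object */
    SYM_UNREFERENCED_GLOBAL,  /* exported but not used internally */
    SYM_GOT_GLOBAL,           /* globals that have GOT entries */
    SYM_OUT_OF_RANGE
} SymRegion;

/* Values taken from the corresponding dynamic array entries. */
typedef struct {
    uint32_t unrefextno;  /* DT_UNREFEXTNO */
    uint32_t gotsym;      /* DT_GOTSYM */
    uint32_t symtabno;    /* DT_SYMTABNO */
} DynsymLayout;

static SymRegion classify_symbol(const DynsymLayout *layout, uint32_t index)
{
    assert(layout != NULL);
    assert(layout->unrefextno <= layout->gotsym);
    assert(layout->gotsym <= layout->symtabno);

    if (index >= layout->symtabno)
        return SYM_OUT_OF_RANGE;
    if (index == 0)
        return SYM_NULL;
    if (index < layout->unrefextno)
        return SYM_LOCAL;
    if (index < layout->gotsym)
        return SYM_UNREFERENCED_GLOBAL;
    return SYM_GOT_GLOBAL;
}
```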


6.3.3.1   Symbol Look-Up

Dynamic symbol look-up is performed by the dlsym(handle,name) routine. The routine searches for the symbol name beginning in the object associated with the handle. The search is breadth-first by default and depth-first for objects built with the linker's "-B symbolic" option. If the handle is null, the routine performs a depth-first search beginning at the main executable.

It is important to use the dlsym interface for symbol look-up to avoid using an outdated address. This problem can be caused by an improper compiler assumption that a symbol's address will not change after load-time. A symbol's address may be cached as an optimization and not reloaded thereafter. However, that address may be changed during execution as the result of dynamic loading and unloading.


6.3.3.2   Scope and Binding

The concept of scope in the dynamic symbol table differs somewhat from the concept of scope in the regular symbol table because the dynamic symbol table contains only global user-program symbols. The terms "local" and "external" thus have different meanings in this context.

The two scoping levels for symbols in the dynamic symbol table are object scope and process scope. A symbol with object scope is local to the shared object and can only be referenced in the library or executable where it is defined. A symbol with process scope is visible to all program components, and may be referenced anywhere. A symbol with process scope can also be preempted by a higher-precedence definition in another shared object.

Note that the distinction between object scope and process scope does not correspond directly to the local/global symbol division in the dynamic symbol table. All symbols in the local part of the table have object scope, but global dynamic symbols can be internal to the object as well. Another factor, called binding, comes into play.

The possible bind values in the dynamic symbol table are local, global, weak, and duplicate. These values are encoded in the st_info field of the dynamic symbol entry. (See Section 6.2.2 for details.)

Users are able to designate global symbols as "hidden". In the dynamic symbol table, hidden symbols have a local binding. This representation ensures that they will not be exported from the object and will not preempt any other symbol definition. Also, internal references to hidden symbols will not be preempted. The linker's "-hidden_symbol symbol" option can be used to specify a hidden symbol.

Weak symbols are also a special-case category of global symbols that have the same scope as globals but a lower precedence for symbol resolution conflicts. See Section 6.3.4.2 for details.


6.3.3.3   Multiple GOT Representation

The GOT contains address information for all referenced external symbols in the dynamic symbol table. Observe that the GOT is the source of final, run-time addresses, whereas the symbol table contains only link-time addresses. To access a dynamic symbol, the GOT must be referenced. To associate GOT entries with dynamic symbol table entries, the symbol table and GOT are aligned as shown in Figure 6-5.

Figure 6-5 Dynamic Symbol Table and Multiple-GOT

Note that the GOT also contains entries that do not correspond to dynamic symbols. These are placed at the top of each GOT table.

The maximum number of entries in a GOT is 8189. A single GOT may be sufficient to represent all necessary addresses for an object, but one or more additional GOTs are sometimes required, as illustrated in Figure 6-5. One GOT table can contain entries from multiple input objects, but a single object's entries cannot be split between two tables. The linker also builds a separate, final GOT for relocatable global symbols, referenced only in the dynamic relocation section. These constraints generally result in some unused GOT entries at the bottom of each table.

The loader recognizes a multiple-GOT object by examining the dynamic header. A DT_GOTSYM entry exists in the dynamic header for each GOT. This entry holds the index of the first dynamic symbol table entry corresponding to a GOT entry. A DT_LOCAL_GOTNO entry exists for each GOT as well. This entry contains the index of the first global entry in that GOT. The number of DT_GOTSYM entries and DT_LOCAL_GOTNO entries in the dynamic header should match. They are also expected to occur in ascending numerical order.
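The relationship between a DT_GOTSYM / DT_LOCAL_GOTNO pair and a GOT slot can be sketched as follows. This is only an illustration: the structure got_desc, its nglobal field, and the function name are hypothetical, and the sketch assumes that the global entries of a GOT appear in the same order as their dynamic symbols, as the alignment shown in Figure 6-5 suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-GOT summary built from one DT_GOTSYM / DT_LOCAL_GOTNO pair. */
struct got_desc {
    unsigned long gotsym;       /* DT_GOTSYM: first dynamic symbol with an entry in this GOT */
    unsigned long local_gotno;  /* DT_LOCAL_GOTNO: index of the first global entry in this GOT */
    unsigned long nglobal;      /* number of global entries (assumed known to the caller) */
};

/*
 * Return the GOT slot that holds the address of dynamic symbol `symidx`,
 * or -1 if the symbol has no entry in this GOT.
 */
long got_slot_for_symbol(const struct got_desc *got, unsigned long symidx)
{
    assert(got != NULL);
    if (symidx < got->gotsym || symidx - got->gotsym >= got->nglobal)
        return -1;
    return (long)(got->local_gotno + (symidx - got->gotsym));
}
```

For a GOT described by gotsym 40 and local_gotno 5, dynamic symbol 40 maps to slot 5 and dynamic symbol 41 to slot 6.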

The first (zero-indexed) entry for every GOT in a multiple-GOT object points to the loader's lazy-text-resolve entry point. In the final GOT (consisting of relocatable symbols), it is present even though it is unused.

Multiple-GOT objects may contain duplicate symbols. A symbol appears only once per GOT, but it can be duplicated in other GOTs. All duplicate symbols, marked in the symbol table as STB_DUPLICATE, have an associated primary symbol. The primary symbol is simply the first instance of a duplicate symbol. The st_size field for a duplicate symbol is the dynamic symbol table index of the primary symbol. When a symbol is resolved in a multiple-GOT situation, all duplicates must be found and resolved as well.
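The requirement that all duplicates be resolved together with their primary symbol can be sketched as a scan of the dynamic symbol table. The types and names below are hypothetical stand-ins, not the declarations in coff_dyn.h: the bind field replaces the bind value encoded in st_info, and got_value replaces the symbol's GOT entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bind value encoded in st_info. */
enum sketch_bind { SK_LOCAL, SK_GLOBAL, SK_WEAK, SK_DUPLICATE };

/* Simplified dynamic symbol: only the fields this sketch needs. */
struct sketch_sym {
    enum sketch_bind bind;
    unsigned long st_size;    /* for a duplicate: dynsym index of the primary symbol */
    unsigned long got_value;  /* stands in for the symbol's GOT entry */
};

/*
 * Record `addr` for the primary symbol and for every duplicate that
 * names it. Returns the number of entries updated.
 */
size_t resolve_with_duplicates(struct sketch_sym *syms, size_t nsyms,
                               size_t primary, unsigned long addr)
{
    size_t updated = 0;

    assert(syms != NULL && primary < nsyms);
    syms[primary].got_value = addr;
    updated++;
    for (size_t i = 0; i < nsyms; i++) {
        if (i != primary && syms[i].bind == SK_DUPLICATE && syms[i].st_size == primary) {
            syms[i].got_value = addr;
            updated++;
        }
    }
    return updated;
}
```

A linear scan is used here for clarity; it makes no claim about how the loader actually locates duplicates.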


6.3.3.4   Msym Table

The msym table, which is stored in the .msym section of a shared object file, maps dynamic symbol hash values to the first of any dynamic relocations for that symbol. This section is included for performance reasons to avoid time-consuming and repetitive hashing calculations during symbol resolution.

An entry in the msym table contains a hash value and an information field. The information field can be masked to obtain a dynamic relocation index and a flags field. The size of the msym table is the same as the size of the dynamic symbol table; the two tables line up directly and have matching indices.

The msym table is referenced repeatedly when an object is opened. The loader resolves symbols by searching all shared objects for matching definitions. The search requires a hash value computed from the symbol name. The msym table provides precomputed hash values for symbols to avoid the costly hash computation at load time.

Figure 6-6 Msym Table

The .msym section is an optional object file section; it is not produced by default. The linker's -msym option causes the msym table to be generated. If the .msym section is not present in a shared object, the loader will create the table each time that the object is loaded. For this reason, it is often preferable to specify the .msym section's inclusion when building shared objects.


6.3.3.5   Hash Table

A hash table, stored in the .hash section of a shared object file, provides fast access to symbol entries in the dynamic symbol section. The table is implemented as an array of 32-bit integers.

The hash table has the format shown in Figure 6-7.

Figure 6-7 Hash Table

The entries in the hash table contain the following information:

The hashing function accepts a symbol name and returns the hash value, which can be used to compute a bucket index. If the hashing function returns the value X for a name, X%nbucket is the bucket index. The hash table entry bucket[X%nbucket] gives an index, Y, into the dynamic symbol table.

The loader must determine whether the indexed symbol is the correct one. It checks the corresponding dynamic symbol's hash value in the msym table and its name.

If the symbol table entry indicated is not the correct one, the hash table entry chain[Y] indicates the next symbol table entry for a dynamic symbol with the same hash value. The indexed symbol is again checked by the loader. If it is incorrect, the same index is used in the chain array to try the next symbol that has the same hash value. The chain links can be followed in this manner until the correct symbol table entry is located or until the chain entry contains the value STN_UNDEF.

As an example, assume that a symbol with the hash value 12 is sought. If there are ten buckets, the calculation 12 % 10 gives the bucket index 2, which signifies the third bucket. A bucket index translates into a hash table index as bucket[i]=hash[i+2]. If that bucket contains a 3, the dynamic symbol table entry with an index of 3 is checked. If the symbol is incorrect, the hash table entry chain[3] is accessed to get the next possible symbol index. A chain index translates into a hash table index as chain[i]=hash[nbucket+2+i]. If chain[3] is 7, the dynamic symbol table entry with an index of 7 is checked. If it is the correct symbol, the search is successful and halts.

The structures used in this example are shown in Figure 6-8.

Figure 6-8 Hashing Example
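The look-up described above can be sketched in C as follows. The structure sketch_tables and all names beginning with sketch_ or SKETCH_ are hypothetical. The sketch assumes that the two words preceding the bucket array are nbucket and nchain (implied by bucket[i]=hash[i+2]), that STN_UNDEF is 0, and that the caller supplies the hash value already computed for the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_STN_UNDEF 0u  /* assumed value of STN_UNDEF (the null symbol entry) */

/* Hypothetical view of the data consulted during a look-up. */
struct sketch_tables {
    const unsigned int *hash;       /* .hash contents: nbucket, nchain, bucket[], chain[] */
    const unsigned int *msym_hash;  /* precomputed hash value for each dynamic symbol */
    const char *const *names;       /* name of each dynamic symbol */
};

/* Return the dynamic symbol index for `name`, or SKETCH_STN_UNDEF if it is absent. */
unsigned int sketch_hash_lookup(const struct sketch_tables *t,
                                const char *name, unsigned int hashval)
{
    unsigned int nbucket, y;

    assert(t != NULL && t->hash != NULL && t->hash[0] > 0);
    nbucket = t->hash[0];

    /* bucket[i] = hash[i + 2] */
    y = t->hash[2 + hashval % nbucket];
    while (y != SKETCH_STN_UNDEF) {
        /* Compare the stored hash value first, then the name. */
        if (t->msym_hash[y] == hashval && strcmp(t->names[y], name) == 0)
            return y;
        /* chain[i] = hash[nbucket + 2 + i] */
        y = t->hash[nbucket + 2 + y];
    }
    return SKETCH_STN_UNDEF;
}
```

With ten buckets, bucket[2] set to 3, and chain[3] set to 7, a search for a name whose hash value is 12 checks dynamic symbol 3 first and then dynamic symbol 7, matching the example above.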


6.3.4   Dynamic Symbol Resolution

The dynamic loader must perform symbol resolution for unresolved symbols that remain after link time. A post-link unresolved symbol is one that was not defined in a shared object or in any of the shared object's shared library dependencies searched by the linker. If a dependency is changed before execution or additional libraries are dynamically loaded, the loader will attempt to resolve the symbol.

The linker accepts unresolved symbols when linking shared objects and records them in the dynamic symbol (.dynsym) section. The loader recognizes an unresolved symbol by a symbol type of undefined (st_shndx == SHN_UNDEF) and a symbol value of zero (st_value == 0) in the dynamic symbol table. For such symbols, the GOT value distinguishes imported symbols from symbols that are unresolved across all shared objects.

Table 6-7 summarizes the categories of symbols and how they are represented in the dynamic symbol table. Run-time addresses are stored in the GOT. They can be pre-computed by the linker and adjusted at load time.

Table 6-7 Dynamic Symbol Categories

Description          Type          Section              Value      GOT
defined item         OBJECT, FUNC  TEXT, DATA, ACOMMON  address    address
imported function    FUNC          UNDEF                0          address (in defining object)
imported data        OBJECT        UNDEF                0          address (in defining object)
common               OBJECT        COMMON               alignment  address of allocated common (in defining object)
unresolved function  FUNC          UNDEF                0          stub address
unresolved data      OBJECT        UNDEF                0          0
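The test for an unresolved symbol, and the way the GOT value separates imported data from data that is unresolved everywhere, can be sketched as follows. The structure and names are hypothetical simplifications, and the value used for SHN_UNDEF is an assumption. Function symbols are left out because telling a stub address from a real address requires information that this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SHN_UNDEF 0  /* assumed value; the real SHN_UNDEF comes from the system headers */

/* Simplified dynamic symbol: only the fields this sketch needs. */
struct sketch_dynsym {
    unsigned long st_value;
    unsigned short st_shndx;
};

enum sketch_data_state {
    SK_DATA_NOT_UNDEF,   /* defined or common in this object */
    SK_DATA_IMPORTED,    /* undefined here, GOT holds an address from the defining object */
    SK_DATA_UNRESOLVED   /* undefined here and GOT entry is 0 */
};

/* Classify a data symbol using its dynamic symbol entry and GOT value. */
enum sketch_data_state sketch_classify_data(const struct sketch_dynsym *sym,
                                            unsigned long got_value)
{
    assert(sym != NULL);
    if (!(sym->st_shndx == SKETCH_SHN_UNDEF && sym->st_value == 0))
        return SK_DATA_NOT_UNDEF;
    return got_value != 0 ? SK_DATA_IMPORTED : SK_DATA_UNRESOLVED;
}
```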

The loader performs symbol resolution during initial load of a program. The amount of symbol resolution work required by a program varies (see Section 6.3.4.6).

The loader can also perform dynamic symbol resolution for particular symbols during program execution. If new dependencies are added or existing dependencies are rearranged, externally visible symbols (those with process scope) must be re-resolved.

Unresolved text symbols can be resolved at run time instead of load time (see Section 6.3.4.5).


6.3.4.1   Symbol Preemption and Namespace Pollution

A namespace is a scope within which symbol names should all be unique. In a namespace, a given name is bound to a single item, wherever it may be used. This generic use of the term "namespace" is distinct from the C++ namespace construct, which is discussed in Section 5.3.6.4.

Dynamic executables running on Tru64 UNIX share a namespace with their shared library dependencies. This policy is implemented with symbol preemption. Symbol preemption, also referred to as "hooking", is a mechanism by which all references to a multiply defined symbol are resolved to the same instance of the symbol.

Advantages of symbol preemption include:

Disadvantages include extra load time for symbol resolution and potential problems resulting from namespace pollution.

Namespace pollution can occur during the use of shared libraries. A library routine may malfunction if it calls or accesses a global symbol that is redefined by another shared library or application. Figure 6-9 presents an example of this situation.

Figure 6-9 Namespace Pollution

Namespace pollution is partly covered by ANSI standards. Namespace conflicts that occur between libc and ANSI-compliant programs must not affect the behavior of ANSI-defined functions implemented in libc.

The identifiers reserved for use by the library are:

All other names are available to user programs. User versions of non-reserved identifiers preempt library versions.

Historically, system libraries have used many unreserved symbols. To achieve compliance with the ANSI standard, global symbols have undergone a name change. Documented interfaces have been retained as weak symbols (see Section 6.3.4.2). Their strong counterparts have names that are formed by prepending two underscores to the corresponding weak symbol's name.

Hidden symbols do not cause namespace pollution problems and cannot be preempted because they are not exported from the shared object where they are defined.

The linker options -hidden_symbol and -exported_symbol turn the hidden attribute on or off for a given symbol name. The options -hidden and -non_hidden turn the hidden attribute on or off for all subsequent symbols.

TLS data symbols have the same name scope as hidden symbols. The names are not shared among multiple threads.


6.3.4.2   Weak Symbols

Weak symbols are global symbols that have a lower precedence in symbol resolution than other globals. Strong symbols are any symbols that are not marked as weak.

Weak symbols can be used as aliases for other weak or strong symbols. This technique can be useful when it is desirable to provide both a low-precedence name and a high-precedence name for the same data item or procedure. When the weak symbol is referenced, its strong counterpart is the one actually used.

This aliasing approach employing weak symbols is used in libc.so to avoid namespace pollution problems. In the example in Figure 6-10, the strong symbol definition in the application takes precedence over the weak library definition, and the program functions properly.

Figure 6-10 Weak Symbol Resolution (I)

If no non-weak open symbols were defined, references to open would bind to libc's weak symbol, as shown in Figure 6-11.

Figure 6-11 Weak Symbol Resolution (II)

Weak symbols can also be used to prevent multiple symbol definition errors or warnings when linking. The linker does not require a weak symbol to be aliased to a strong symbol, but the loader produces a warning message if it cannot find a matching strong symbol for a weak symbol it is attempting to resolve.

To find a weak symbol's strong counterpart, the loader follows these steps:

Use hash lookup to find __<NAME> in the dynamic symbol table. 
if (not found or not a match)
    foreach symbol in the dynamic symbol table 
        Test for match

Matching symbols will have the same st_value, COFF_ST_TYPE(st_info) and st_shndx.
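The two steps above can be sketched in C as follows. The structure and function names are hypothetical. A linear search by name stands in for the hash look-up of __<NAME>, and the type field stands in for COFF_ST_TYPE(st_info).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified dynamic symbol: only the fields this sketch needs. */
struct sketch_sym {
    const char *name;
    unsigned long st_value;
    unsigned char type;       /* stands in for COFF_ST_TYPE(st_info) */
    unsigned short st_shndx;
    int is_weak;              /* nonzero for an STB_WEAK symbol */
};

/* Two symbols match if value, type, and section index are all equal. */
static int sketch_match(const struct sketch_sym *a, const struct sketch_sym *b)
{
    return a->st_value == b->st_value && a->type == b->type &&
           a->st_shndx == b->st_shndx;
}

/* Return the index of the strong counterpart of weak symbol `w`, or -1. */
long sketch_find_strong(const struct sketch_sym *syms, size_t nsyms, size_t w)
{
    char wanted[256];
    size_t i;

    assert(syms != NULL && w < nsyms);

    /* Step 1: look for __<NAME> (linear search in place of the hash look-up). */
    snprintf(wanted, sizeof wanted, "__%s", syms[w].name);
    for (i = 0; i < nsyms; i++)
        if (!syms[i].is_weak && strcmp(syms[i].name, wanted) == 0 &&
            sketch_match(&syms[i], &syms[w]))
            return (long)i;

    /* Step 2: not found or not a match, so test every entry. */
    for (i = 0; i < nsyms; i++)
        if (i != w && !syms[i].is_weak && sketch_match(&syms[i], &syms[w]))
            return (long)i;

    return -1;
}
```

A return value of -1 corresponds to the case in which the loader would issue its warning about a missing strong symbol.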

A weak symbol is identified in the dynamic symbol table by a STB_WEAK bind value. In the external symbol table, a weak symbol has its weak_ext flag set in the EXTR entry.

Users can specify weak symbols using the .weakext assembler directive or the C #pragma weak preprocessor directive.


6.3.4.3   Search Order

The symbol resolution policy, or symbol search order, defines the order in which the loader searches for symbol definitions in a dynamic executable and its dependencies.

Default search order is a breadth-first, left-to-right traversal of the shared object dependency graph.

Figure 6-12 Symbol Resolution Search Order

The search order in Figure 6-12 is: a.out libA libB libc.so libD libE
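The default policy can be sketched as an ordinary breadth-first traversal. The structure sketch_obj and the function name are hypothetical, and the fixed array limits exist only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_OBJS 16
#define SKETCH_MAX_DEPS 4

/* Hypothetical dependency graph node; deps[] lists direct dependencies left to right. */
struct sketch_obj {
    const char *name;
    int ndeps;
    int deps[SKETCH_MAX_DEPS];
};

/*
 * Fill order[] with object indices in breadth-first, left-to-right
 * order starting at `root`. Returns the number of objects visited.
 */
int sketch_breadth_first(const struct sketch_obj *objs, int nobjs,
                         int root, int *order)
{
    int seen[SKETCH_MAX_OBJS] = {0};
    int head = 0, tail = 0;

    assert(objs != NULL && order != NULL);
    assert(nobjs > 0 && nobjs <= SKETCH_MAX_OBJS && root >= 0 && root < nobjs);

    order[tail++] = root;   /* order[] doubles as the work queue */
    seen[root] = 1;
    while (head < tail) {
        const struct sketch_obj *o = &objs[order[head++]];
        for (int i = 0; i < o->ndeps; i++) {
            int d = o->deps[i];
            if (!seen[d]) {
                seen[d] = 1;
                order[tail++] = d;
            }
        }
    }
    return tail;
}
```

Assuming a dependency graph in which a.out depends on libA, libB, and libc.so, libA on libD, libB on libE, and both libD and libE on libc.so (an assumption chosen to be consistent with the listed order), the traversal yields a.out libA libB libc.so libD libE.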

Objects loaded dynamically by dlopen() are appended to the search order established at load time. However, dlopen options will determine whether a dynamically loaded object's symbols are visible to objects that do not include it in their dependency lists. See dlopen(3) for details.

Alternatively, the user can specify the search order by using linker or loader options. The linker's -depth_ring_search option causes the loader to use a different symbol resolution policy. This policy is a two-step search:

  1. Depth-first search the referencing object and its dependencies
  2. Depth-first search from the main executable

Using the depth ring search policy and the dependency graph from Figure 6-12, the search order is:

From      Search Order
a.out     a.out libA libD libc.so libB libE
libA      libA libD libc.so a.out libB libE
libB      libB libE libc.so a.out libA libD
libD      libD libc.so a.out libA libB libE
libE      libE libc.so a.out libA libD libB
libc.so   libc.so a.out libA libD libB libE
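The two-step depth ring search can be sketched as two depth-first walks that share one visited set. The structure sketch_obj and the function names are hypothetical, and the fixed array limits exist only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_OBJS 16
#define SKETCH_MAX_DEPS 4

/* Hypothetical dependency graph node; deps[] lists direct dependencies left to right. */
struct sketch_obj {
    const char *name;
    int ndeps;
    int deps[SKETCH_MAX_DEPS];
};

/* Preorder depth-first walk that appends each unvisited object to order[]. */
static void sketch_dfs(const struct sketch_obj *objs, int cur,
                       int *seen, int *order, int *count)
{
    if (seen[cur])
        return;
    seen[cur] = 1;
    order[(*count)++] = cur;
    for (int i = 0; i < objs[cur].ndeps; i++)
        sketch_dfs(objs, objs[cur].deps[i], seen, order, count);
}

/*
 * Fill order[] with the depth ring search order for references made
 * from object `from`. Returns the number of objects visited.
 */
int sketch_depth_ring(const struct sketch_obj *objs, int nobjs,
                      int main_obj, int from, int *order)
{
    int seen[SKETCH_MAX_OBJS] = {0};
    int count = 0;

    assert(objs != NULL && order != NULL);
    assert(nobjs > 0 && nobjs <= SKETCH_MAX_OBJS);
    assert(from >= 0 && from < nobjs && main_obj >= 0 && main_obj < nobjs);

    sketch_dfs(objs, from, seen, order, &count);      /* step 1: referencing object */
    sketch_dfs(objs, main_obj, seen, order, &count);  /* step 2: main executable */
    return count;
}
```

Assuming a dependency graph in which a.out depends on libA, libB, and libc.so, libA on libD, libB on libE, and both libD and libE on libc.so (an assumption chosen to be consistent with the orders listed above), a search from libE yields libE libc.so a.out libA libD libB.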


6.3.4.4   Precedence

The highest-to-lowest precedence order for dynamic symbol resolution is:

  1. Strong text or data
  2. Strong largest allocated common
  3. Weak data
  4. Weak largest allocated common
  5. Largest common
  6. Weak text

In case (5), the loader allocates the common symbol. This situation only arises when an object containing an allocated common of the same name has been changed between link time and load time or is dynamically unloaded during run time. The linker will always allocate a common storage class symbol, but if there are multiple occurrences of that symbol, the others are retained as unallocated commons.

When symbols have equal precedence, the loader relies on the search order to choose the correct definition for the symbol.
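The interaction between precedence and search order can be sketched as follows. The enumeration and function names are hypothetical. The sketch assumes that candidate definitions are supplied in search order and does not model the size comparison implied by "largest" for common symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Ranks follow the list above: a smaller value means a higher precedence. */
enum sketch_prec {
    SK_STRONG_TEXT_OR_DATA = 1,
    SK_STRONG_ALLOCATED_COMMON,
    SK_WEAK_DATA,
    SK_WEAK_ALLOCATED_COMMON,
    SK_COMMON,
    SK_WEAK_TEXT
};

/*
 * cands[] holds the precedence of each candidate definition, listed in
 * search order. Return the index of the chosen definition, or -1 if
 * there are no candidates. On a tie, the earlier candidate wins.
 */
long sketch_choose_definition(const enum sketch_prec *cands, size_t n)
{
    long best = -1;

    assert(cands != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (best < 0 || cands[i] < cands[best])
            best = (long)i;
    return best;
}
```

Because the comparison is strict, a later candidate replaces the current choice only when its precedence is higher, so equal-precedence definitions are decided by search order.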


6.3.4.5   Lazy Text Resolution

Lazy text resolution allows programs to execute without resolving text symbols that are never referenced.

Programs with unresolved text symbols are linked with stub routines. When a program or library calls a stub routine, the stub calls the loader's lazy_text_resolve entry point with a dynamic symbol index as an argument. The loader then resolves the text symbol. Subsequent calls will use the true address, which has replaced the stub in the appropriate GOT entry.
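The stub mechanism can be imitated in ordinary C with a table of function pointers. This is a self-contained simulation, not loader code: the one-slot array stands in for a GOT, sketch_lazy_text_resolve stands in for the loader's entry point, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_fn)(int);

static int sketch_resolve_count;   /* number of times the resolver has run */
static sketch_fn sketch_got[1];    /* one GOT slot, for dynamic symbol index 0 */

/* The true routine, imagined to live in another shared object. */
static int sketch_real_double(int x)
{
    return 2 * x;
}

/* Stand-in for the loader's lazy text resolution entry point. */
static sketch_fn sketch_lazy_text_resolve(size_t symidx)
{
    assert(symidx == 0);
    sketch_resolve_count++;
    sketch_got[symidx] = sketch_real_double;  /* the true address replaces the stub */
    return sketch_got[symidx];
}

/* Stub routine: passes its dynamic symbol index to the resolver, then calls through. */
static int sketch_stub_double(int x)
{
    return sketch_lazy_text_resolve(0)(x);
}

/* Link-time state: the GOT slot initially holds the stub address. */
void sketch_init_got(void)
{
    sketch_resolve_count = 0;
    sketch_got[0] = sketch_stub_double;
}

/* Every call goes through the GOT slot. */
int sketch_call_double(int x)
{
    return sketch_got[0](x);
}

int sketch_resolutions(void)
{
    return sketch_resolve_count;
}
```

After sketch_init_got(), the first call to sketch_call_double() runs the resolver; later calls reach the true routine directly, so the resolver runs only once.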

The dynamic symbol table does not contain any explicit information that indicates whether a text symbol has a stub associated with it. The loader looks for the following clues instead:

The environment variable LD_BIND_NOW controls the loader's text resolution mode. If the variable has a non-null value, the bind mode is immediate. If the value is null, the bind mode is deferred. Immediate binding requires all symbols to be resolved at load time. Deferred binding allows text symbols to be resolved at run time using lazy text resolution. The default is deferred binding.

See Section 3.3.3 for related information.


6.3.4.6   Levels of Resolution

Conditions may exist that cause the loader to do more symbol resolution work for some programs than for others. The amount of symbol resolution work that is necessary can have a significant impact on a program's start-up time.

Descriptions of the possible levels of dynamic symbol resolution follow.

Quickstart Resolution

Minimal symbol resolution. For details on quickstart, see Section 6.3.6.

Timestamp Resolution

Moderate symbol resolution. This is used when any of the following are true:

Checksum Resolution

Extensive symbol resolution. This is used when a shared library dependency has been rebuilt and its checksum no longer matches the dependency information in the executable. The checksum changes if any of the following conditions are met:

Binding Resolution

Re-resolve symbols marked UNDEF for immediate binding. This is used by dlopen() to apply immediate binding symbol resolution to shared objects that were previously resolved with lazy binding.


6.3.5   Dynamic Relocation

The dynamic relocation section describes all locations within an object that must be adjusted if the object is loaded at an address other than its linked base address.

Although an object may have multiple relocation sections, the linker concatenates all relocation information present in its input objects. The dynamic loader is thus faced with a single relocation table. This dynamic relocation table is stored in the .rel.dyn section and is ordered by the corresponding dynamic symbol index.

Offset 0 in the dynamic relocation table is reserved for a null entry with all fields zeroed.

All dynamic relocations must be of the type R_REFQUAD or R_REFLONG. This simplifies the dynamic relocation process. These two relocation types are sufficient to represent all information that is necessary to accomplish dynamic relocations. Dynamic relocation entries must only apply to addresses in an object's data segment. The object's text segment must not contain any relocatable addresses.

Relocation entries are updated during dynamic symbol resolution. When a dynamic symbol's value changes, any dynamic relocations associated with that symbol must be updated. To update the entries, the relocation value is computed by subtracting the old value of the symbol from the new value. This value is then added to the contents of the relocation targets. The old value of a dynamic symbol is always stored in a GOT entry. The new value of a dynamic symbol is stored in that GOT entry after dynamic relocations are processed.
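The update arithmetic can be sketched as follows for 64-bit (R_REFQUAD-style) targets. The function name and the representation of relocation targets as an array of pointers are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a change in a dynamic symbol's value to each relocation target,
 * then record the new value in the symbol's GOT entry.
 */
void sketch_update_relocs(uint64_t *got_entry, uint64_t new_value,
                          uint64_t **targets, size_t ntargets)
{
    uint64_t delta;

    assert(got_entry != NULL && (targets != NULL || ntargets == 0));

    /* The old value is still in the GOT entry at this point. */
    delta = new_value - *got_entry;

    /* Unsigned arithmetic wraps, so a move to a lower address also works. */
    for (size_t i = 0; i < ntargets; i++)
        *targets[i] += delta;

    /* The GOT entry is updated only after the relocations are processed. */
    *got_entry = new_value;
}
```

If a symbol moves from 0x1000 to 0x5000, a target that held 0x1008 holds 0x5008 afterward.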

Relocation types other than R_REFQUAD and R_REFLONG are not allowed for dynamic relocations because no other relocation types apply to absolute addresses stored in data. Most relocation types apply to values that need to be computed at link time and do not change at run time.

A dynamic executable file may also contain normal relocation sections. If normal relocation entries are present, the loader ignores them.


6.3.6   Quickstart

Quickstart is a loading technique that uses predetermined addresses to run a program that depends on shared libraries. It is particularly useful for applications that rely on shared libraries that change infrequently.

The linker chooses quickstart addresses for all shared library dependencies when a dynamic executable is linked. These addresses are stored in the registry file normally named so_locations. For details on the shared library registry file, refer to the Programmer's Guide.

Any modification to a shared library impairs quickstarting of applications that depend on that library. If a shared library dependency has changed, it may be possible to use the fixso utility to update the application and thus enable quickstart to succeed.

To verify that an application is quickstarted, set the _RLD_ARGS environment variable to -quickstart_only.

Additional information on quickstart is available in the Programmer's Guide.


6.3.6.1   Quickstart Levels

Not all shared objects can be successfully quickstarted. If an executable cannot be quickstarted, it still runs, but start-up is slower. Quickstarting is possible for programs requiring minimal symbol resolution at load time. A dynamic executable is quickstarted if:

Each quickstart requirement that is not met by a dynamic executable and its dependencies leads to additional symbol resolution work.

At this point, the timesaving advantage of quickstarting has disappeared.

For quickstart purposes, a link-time shared library matches its associated load-time shared library if the timestamp and checksum are unchanged. If they have been changed, using the fixso tool may remedy the situation and enable quickstart to succeed.


6.3.6.2   Conflict Table

The conflict table, stored in the .conflict section, contains a list of symbols that are multiply defined and must be resolved by the loader. The conflict table is used only when full quickstarting is possible. If any changes preventing quickstart have occurred, the loader resorts to other methods of symbol resolution.

The linker records conflicts in a shared object's .conflict section if a second definition is found for a previously defined symbol. Common storage class symbols are not considered conflicts unless they are allocated in more than one shared object.

Weak symbols aliased to a newly resolved conflict entry are also treated as conflicts. This means the loader does not have to search for weak symbols matching conflict symbols. The weak symbols are added to the conflict list for the first shared library that defined the symbol in question as well as the library where the conflicting definition was found.

Figure 6-13 shows a simple example of the use of conflict entries.

Figure 6-13 Conflict Entry Example

In this example, the a.out executable has been linked with liba.so, and a single conflict has been recorded for the symbol a_error. The conflict is recorded in the executable file at link time because both the executable and shared library define the symbol. At run time, any calls to a_error from a_sort will be preempted by the definition of a_error in the a.out executable. Without the conflict entry, the call to a_error would not be preempted properly when a.out is quickstarted.


6.3.6.3   Repairing Quickstart

The fixso utility updates shared libraries to permit quickstarting of applications that utilize them, even if the libraries have changed since the executable was originally linked against them. Given a shared object as input, it updates the object and its dependencies to make them meet quickstart criteria. The library changes handled by fixso are timestamp and checksum discrepancies.

The fixso utility creates a breadth-first list of the object's dependencies. It then handles conflicts present in the conflict table. Next, fixso resolves globals, updating global symbol values, dynamic relocation entries, and GOT entries where necessary. Lastly, if these actions are successful, fixso resets the timestamp and checksum of its target object.

When a dependency is discovered during processing, fixso automatically opens the associated object and adds it to the object list if possible. The dependency will be found and opened if it is located in the default library search path, the path indicated by the LD_LIBRARY_PATH environment variable, or the path specified in the command line. Otherwise, it may be necessary to run the fixso program on the library separately, before fixing the target object.

Some changes made to shared libraries cannot be reconciled by fixso. The fixso utility does not support: