The dynamic linker/loader (commonly referred to as the loader) is responsible for creating a dynamic executable's process image and placing it into system memory so that it can execute. The loader's functions include finding and mapping shared libraries, completing symbol resolution, and finalizing program addresses.
To accomplish these functions, the loader requires information on external symbols and shared libraries. The linker prepares this dynamic loading information for shared objects only. The dynamic loader then uses this information to create and map the process image. The dynamic information consists of the sections highlighted in Figure 6-1.
These sections are mapped with the text segment, except for the .got section, which contains the GOT (Global Offset Table). The GOT is part of the data segment because it must be written into when addresses are updated.
The function of each dynamic section can be summarized as follows:
The .dynamic section serves as a header for the dynamic information.
The .dynsym section contains the dynamic symbol table.
The .dynstr section contains the names of dynamic symbols and shared library dependencies.
The .hash section holds a hash table to provide quick access into the dynamic symbol table.
The .msym table contains supplemental symbolic information, including pre-computed hash values and dynamic relocation indices.
The .liblist section stores dependency information.
The .conflict section contains a list of multiply-defined symbol names that must be resolved at load time.
The .rel.dyn section contains dynamic relocation entries.
The .got section contains one or more tables of 64-bit run-time addresses.
This chapter covers the dynamic sections and related topics. The actions of the system dynamic loader are explained in detail. Related material is available in the Programmer's Guide and loader(5).
Version 3.13 of the object file format introduces a new dynamic tag value for specifying symbol resolution order. See DT_SYMBOLIC in Section 6.2.1 for details.
All structures and macros are declared in the header file coff_dyn.h unless otherwise indicated.
typedef struct {
    coff_int d_tag;
    coff_uint reserved;
    union {
        coff_uint d_val;
        coff_addr d_ptr;
    } d_un;
} Coff_Dyn;
SIZE - 16 bytes, ALIGNMENT - 8 bytes
Dynamic Header Entry Fields
d_tag
Determines how the d_un field is to be interpreted.
reserved
d_val
d_ptr
The d_tag requirements for dynamic executable files and shared library files are summarized in Table 6-1. "Mandatory" indicates that the dynamic linking array must contain an entry of that type; "optional" indicates that an entry for the tag may exist but is not required.
Table 6-1: Dynamic array tags (d_tag)
Name | Value | d_un | Executable | Shared Library |
DT_NULL | 0 | ignored | mandatory | mandatory |
DT_NEEDED | 1 | d_val | optional | optional |
 | 3 | d_ptr | optional | optional |
DT_HASH | 4 | d_ptr | mandatory | mandatory |
DT_STRTAB | 5 | d_ptr | mandatory | mandatory |
DT_SYMTAB | 6 | d_ptr | mandatory | mandatory |
DT_STRSZ | 10 | d_val | optional | optional |
DT_SYMENT | 11 | d_val | optional | optional |
DT_INIT | 12 | d_ptr | optional | optional |
DT_FINI | 13 | d_ptr | optional | optional |
DT_SONAME | 14 | d_val | ignored | mandatory |
DT_RPATH | 15 | d_val | optional | ignored |
DT_SYMBOLIC | 16 | ignored | optional | optional |
DT_REL | 17 | d_ptr | mandatory | mandatory |
DT_RELSZ | 18 | d_val | mandatory | mandatory |
DT_RELENT | 19 | d_val | optional | optional |
DT_RLD_VERSION | 0x70000001 | d_val | mandatory | mandatory |
DT_TIME_STAMP | 0x70000002 | d_val | optional | optional |
DT_ICHECKSUM | 0x70000003 | d_val | optional | optional |
DT_IVERSION | 0x70000004 | d_val | optional | optional |
DT_FLAGS | 0x70000005 | d_val | optional | optional |
DT_BASE_ADDRESS | 0x70000006 | d_ptr | optional | optional |
 | 0x70000007 | d_ptr | optional | optional |
DT_CONFLICT | 0x70000008 | d_ptr | optional | optional |
DT_LIBLIST | 0x70000009 | d_ptr | optional | optional |
DT_LOCAL_GOTNO | 0x7000000A | d_val | mandatory | mandatory |
DT_CONFLICTNO | 0x7000000B | d_val | optional | optional |
DT_LIBLISTNO | 0x70000010 | d_val | optional | optional |
DT_SYMTABNO | 0x70000011 | d_val | mandatory | mandatory |
DT_UNREFEXTNO | 0x70000012 | d_val | optional | optional |
DT_GOTSYM | 0x70000013 | d_val | mandatory | mandatory |
DT_HIPAGENO | 0x70000014 | d_val | optional | optional |
DT_SO_SUFFIX | 0x70000017 | d_val | optional | optional |
The uses of the various dynamic array tags are as follows:
DT_NULL
DT_NEEDED
... DT_STRTAB entry. The dynamic array can contain multiple entries of this type. The order of these entries is significant.
DT_HASH
DT_STRTAB
DT_SYMTAB
... Coff_Sym entries.
DT_STRSZ
DT_SYMENT
DT_INIT
DT_FINI
DT_SONAME
... DT_STRTAB entry.
DT_RPATH
... DT_STRTAB entry.
DT_SYMBOLIC
... DT_FLAGS setting that includes the RHF_RING_SEARCH and RHF_DEPTH_FIRST flags when DT_SYMBOLIC is added to the dynamic section.
DT_REL
... DT_RELSZ entry.
DT_RELSZ
... DT_REL entry.
DT_RELENT
... DT_REL entry.
DT_RLD_VERSION
DT_TIME_STAMP
DT_ICHECKSUM
DT_IVERSION
DT_FLAGS
... DT_FLAGS:
Flag | Value | Meaning |
 | 0x00000001 | Object may be quickstarted by loader |
 | 0x00000002 | Hash size not a power of two |
 | 0x00000004 | Use default system libraries only |
 | 0x00000008 | Do not relocate |
 | 0x04000000 | Identifies objects that use TLS |
 | 0x10000000 | Symbol resolution same as |
RHF_DEPTH_FIRST | 0x20000000 | Depth-first symbol resolution |
 | 0x40000000 | TASO (Truncated Address Support Option) objects |
DT_BASE_ADDRESS
DT_CONFLICT
... .conflict section.
DT_LIBLIST
... .liblist section.
DT_LOCAL_GOTNO
DT_CONFLICTNO
... .conflict section.
DT_LIBLISTNO
... .liblist section.
DT_SYMTABNO
... .dynsym section.
DT_UNREFEXTNO
DT_GOTSYM
DT_HIPAGENO
DT_SO_SUFFIX
All other tag values are reserved. Entries can appear in any order, except for the DT_NULL entry at the end of the array and the relative order of the DT_NEEDED entries.
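The ordering rules above suggest how a consumer might scan the dynamic array: read entries until the DT_NULL terminator and keep DT_NEEDED values in the order found. The following is a minimal sketch, not the loader's actual code. The fixed-width typedefs standing in for coff_int, coff_uint, and coff_addr are assumptions chosen to match the stated 16-byte size, the tag values 0 and 1 are taken from the first two rows of Table 6-1, and collect_needed is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the coff_* types; not copied from coff_dyn.h. */
typedef int32_t  coff_int;
typedef uint32_t coff_uint;
typedef uint64_t coff_addr;

typedef struct {
    coff_int  d_tag;
    coff_uint reserved;
    union {
        coff_uint d_val;
        coff_addr d_ptr;
    } d_un;
} Coff_Dyn;

/* Tag values assumed from the first two rows of Table 6-1. */
enum { DT_NULL = 0, DT_NEEDED = 1 };

/*
 * Hypothetical helper: copy the d_val of each DT_NEEDED entry into out[],
 * preserving array order, and stop at the DT_NULL terminator.
 * Returns the number of DT_NEEDED entries seen, which may exceed max_out;
 * only the first max_out values are stored.
 */
size_t collect_needed(const Coff_Dyn *dyn, coff_uint *out, size_t max_out)
{
    size_t n = 0;

    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_NEEDED) {
            if (n < max_out)
                out[n] = dyn->d_un.d_val;
            n++;
        }
    }
    return n;
}
```

Each stored value is a string table offset, so a caller would still need the table named by the DT_STRTAB entry to turn it into a library name.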
typedef struct {
    coff_uint st_name;
    coff_uint reserved;
    coff_addr st_value;
    coff_uint st_size;
    coff_ubyte st_info;
    coff_ubyte st_other;
    coff_ushort st_shndx;
} Coff_Sym;
SIZE - 24 bytes, ALIGNMENT - 8 bytes
See Section 6.3.3 for related information.
Dynamic Symbol Entry Fields
st_name
reserved
st_value
st_size
... STB_DUPLICATE symbols (see Table 6-4). The size field holds the index of the primary symbol.
st_info
... COFF_ST_BIND and COFF_ST_TYPE are used to access the individual values. See Table 6-3 and Table 6-4 for the possible values.
st_other
st_shndx
Table 6-3: Symbol type (st_info) constants
Name | Value | Description |
 | 0 | Indicates that the symbol has no type or its type is unknown. |
 | 1 | Indicates that the symbol is a data object. |
 | 2 | Indicates that the symbol is a function. |
 | 3 | Indicates that the symbol is associated with a program section. |
 | 4 | Indicates that the symbol is the name of a source file. |
Table 6-4: Symbol binding (st_info) constants
Name | Value | Description |
 | 0 | Indicates that the symbol is local to the object (or designated as hidden). |
 | 1 | Indicates that the symbol is visible to other objects. |
 | 2 | Indicates that the symbol is a weak global symbol. |
STB_DUPLICATE | 13 | Indicates the symbol is a duplicate. (Used for objects that have multiple GOTs.) |
Table 6-5: st_shndx constants
Name | Value | Description |
SHN_UNDEF | | Indicates that the symbol is undefined. |
SHN_ACOMMON | | Indicates that the symbol has common storage (allocated). |
 | | Indicates that the symbol is in a text segment. |
 | | Indicates that the symbol is in a data segment. |
 | | Indicates that the symbol has an absolute value. |
SHN_COMMON | | Indicates that the symbol has common storage (unallocated). |
typedef struct {
    coff_addr r_offset;
    coff_uint r_info;
    coff_uint reserved;
} Coff_Rel;
SIZE - 16 bytes, ALIGNMENT - 8 bytes
See Section 6.3.5 for related information.
Dynamic Relocation Entry Fields
r_offset
r_info
... COFF_R_SYM and COFF_R_TYPE access the individual attributes. The relocation type must be R_REFQUAD, R_REFLONG, or R_NULL.
reserved
typedef struct {
    coff_uint ms_hash_value;
    coff_uint ms_info;
} Coff_Msym;
SIZE - 8 bytes, ALIGNMENT - 4 bytes
See Section 6.3.3.4 for related information.
Msym Table Entry Fields
ms_hash_value
ms_info
... COFF_MS_REL_INDEX and COFF_MS_FLAGS are used to access the individual values. The dynamic relocation index identifies the first entry in the .rel.dyn section that references the dynamic symbol corresponding to this msym entry. If the index is 0, no dynamic relocations are associated with the symbol. The symbol flags field is reserved for future use and should be zero.
typedef struct {
    coff_uint l_name;
    coff_uint l_time_stamp;
    coff_uint l_checksum;
    coff_uint l_version;
    coff_uint l_flags;
} Coff_Lib;
SIZE - 20 bytes, ALIGNMENT - 4 bytes
See Section 6.3.2 for related information.
Library List Entry Fields
l_name
l_time_stamp
... l_checksum value and the l_version string to form a unique identifier for this shared library file.
l_checksum
l_version
l_flags
The l_flags field can have one or more of the flags described in Table 6-6.
Name | Value | Description |
LL_EXACT_MATCH | | Requires that the run-time dynamic shared library file match exactly the shared library file used at static link time. |
LL_IGNORE_INT_VER | | Ignores any version incompatibility between the dynamic shared library file and the shared library file used at link time. |
 | | Marks shared library dependencies that should be loaded with a suffix appended to the name. The |
 | | Marks entries for shared libraries that are not loaded as direct dependencies of an object. Object instrumentation tools may use |
If neither the LL_EXACT_MATCH nor LL_IGNORE_INT_VER bits are set, the dynamic loader requires that the version of the dynamic shared library match at least one of the colon-separated version strings indexed by the l_version string table index.
typedef struct {
    coff_uint c_index;
} Coff_Conflict;
SIZE - 4 bytes, ALIGNMENT - 4 bytes
The conflict entry is an index into the dynamic symbols (.dynsym) section. See Section 6.3.6.2 for related information.
typedef struct { coff_addr g_index; } Coff_Got;
SIZE - 8 bytes, ALIGNMENT - 8 bytes
The GOT entry is a 64-bit address. Most GOT entries map to dynamic symbols. See Section 6.3.3 for details.
The hash table is implemented as an array of 32-bit values. The structure is declared internal to system utilities.
See Section 6.3.3.5 for more information.
The dynamic string table consists of null-terminated character strings. The strings are of varying length and are separated only by a single null character. An offset into the dynamic string table gives the number of bytes from the beginning of the string space to the beginning of the name in question.
Offset 0 in the dynamic string table is reserved for the null string.
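As an illustration of how such offsets are used, the sketch below returns the name stored at a given byte offset and rejects offsets that fall outside the table. The function name dynstr_name and the explicit size parameter are hypothetical conveniences for this example; they are not part of coff_dyn.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the name that starts at byte offset `off`
 * in a dynamic string table of `size` bytes. Returns NULL if the offset is
 * out of range or if no terminating null byte lies inside the table.
 */
const char *dynstr_name(const char *strtab, size_t size, size_t off)
{
    assert(strtab != NULL);
    if (off >= size)
        return NULL;
    if (memchr(strtab + off, '\0', size - off) == NULL)
        return NULL;
    return strtab + off;
}
```

With a table whose bytes are a null byte, "libc.so", a null byte, and "printf", offset 0 yields the empty string, offset 1 yields "libc.so", and offset 9 yields "printf".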
A shared object is either a dynamic executable or a shared library. The file header flags indicate whether the object is a shared object and, if so, what type of shared object it is. The layout of the object is also stated in the file header. Normally shared objects use a ZMAGIC image layout (see Section 2.3.2.3).
Additional information on the shared object is located in the dynamic header (.dynamic section). When the dynamic loader is invoked by the kernel's exec() routine, this header information is read.
The kernel and loader take the following steps upon receiving a user command to execute a dynamic executable:
... exec() in kernel.
exec() opens the file and reads the file header.
... exec() calls /sbin/loader.
... (__start in crt0.o) then:
    ... __istart, which invokes the loader routine to run INIT routines
    ... main with __Argc, __Argv, __environ, and _auxv.
Dynamic executables usually rely on shared libraries. At load time, these shared libraries must be located, validated, and mapped with the process image.
If an executable object refers to a symbol whose definition resides in a shared library, the executable is dependent on that library. This relationship is described as a direct dependency. A shared library dependency also exists if a library is used by any previously identified dependency. This is an indirect dependency for the executable.
In the example shown in Figure 6-2, libA, libB, and libcool are all shared library dependencies for a.out. The library libA is a direct dependency, and the others are indirect dependencies.
Although the possibility of duplicate dependencies exists, as in the preceding example, each library is mapped only once with the image. The linker also prevents recursive inclusion, which could occur in a case of cyclic dependencies.
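The "map once, never recurse forever" behavior can be pictured with a small model. This is only a sketch of the idea under stated assumptions: the Object structure, the MAX_DEPS limit, and map_with_dependencies are hypothetical names invented for the example, and the dependency graph used with it need not match Figure 6-2.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical in-memory model of a shared object and its direct dependencies. */
typedef struct Object {
    const char    *name;
    struct Object *deps[MAX_DEPS];  /* NULL-terminated unless completely full */
    int            mapped;          /* nonzero once the object has been mapped */
} Object;

/*
 * Map `obj` and then, recursively, its dependencies. An object already
 * marked as mapped is skipped, so a duplicate dependency is mapped only once
 * and a dependency cycle cannot recurse without end.
 * Returns the number of objects newly mapped by this call.
 */
int map_with_dependencies(Object *obj)
{
    int count = 1;

    assert(obj != NULL);
    if (obj->mapped)
        return 0;
    obj->mapped = 1;  /* mark before recursing so that cycles terminate */
    for (size_t i = 0; i < MAX_DEPS && obj->deps[i] != NULL; i++)
        count += map_with_dependencies(obj->deps[i]);
    return count;
}
```

Marking the object before descending into its dependencies is the design choice that makes cycles harmless: a library that is reached again through a cycle is already flagged and is skipped.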
A shared object's dependencies are stored in its .liblist entries and in DT_NEEDED entries in the .dynamic section. The linker records this information as dependencies are encountered.
The library list (.liblist section) has name, timestamp, checksum, and version information for every entry, along with a flags field. Taken together, the timestamp and checksum value and the version string form a unique identifier for a shared library. An entry is created for each shared library dependency.
A DT_NEEDED tag in the dynamic header also indicates a shared library dependency. The value of the entry is the string table offset for the needed library's name. Note that this representation of the dependency information is redundant with that contained in the library list. The loader relies on the library list only. The DT_NEEDED entries are maintained for historical reasons.
As an example, an object linked against libc has the following dependency information:
***DYNAMIC SECTION***
LIBLISTNO: 1.
LIBLIST: 0x0000000120000690
NEEDED: libc.so
***LIBRARY LIST SECTION***
Name Time-Stamp CheckSum Flags Version
a.out:
libc.so May 19 22:18:46 1996 0xf937323b 0 osf.1
A shared library's checksum is computed by the linker when the library is created or updated, and the value is written into the dynamic header. When an application is linked against the library, the linker copies the library's current checksum into its entry in the application's .liblist.
The checksum computation is a summation of the names of dynamic symbols that meet the following criteria:
Common storage class symbol names are included, along with their size. Weak symbols are included, but the calculation for weak symbols differs from that used for non-weak symbols.
For a single symbol, the checksum is computed using this algorithm:
if (SYMBOL.st_shndx == SHN_COMMON || SYMBOL.st_shndx == SHN_ACOMMON)
    CHECKSUM = SYMBOL.st_size
else
    CHECKSUM = 0
for (# of characters in symbol name)
    CHECKSUM = (CHECKSUM << 5) + character_value
if (weak symbol)
    CHECKSUM = (CHECKSUM << 5) + CHECKSUM + 1
A change in the number of weak symbols or a change in the size of a common storage class symbol is therefore reflected in the checksum. However, the checksum calculation is insensitive to symbol reordering.
The checksums for all symbols included are summed to produce the shared object's checksum.
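The per-symbol algorithm and the final summation can be written out in C as follows. This is a sketch rather than the linker's code: it assumes 32-bit unsigned arithmetic with wraparound (suggested by the coff_uint type of l_checksum), it takes "is common" and "is weak" as plain flags instead of testing st_shndx and st_info directly, and the SymInfo structure and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the symbol attributes that feed the checksum. */
typedef struct {
    const char *name;
    int         is_common;  /* st_shndx is SHN_COMMON or SHN_ACOMMON */
    uint32_t    size;       /* st_size, used only for common symbols */
    int         is_weak;    /* symbol has weak binding */
} SymInfo;

/* Checksum of one symbol, following the pseudocode step by step. */
uint32_t symbol_checksum(const SymInfo *sym)
{
    uint32_t sum;

    assert(sym != NULL && sym->name != NULL);
    sum = sym->is_common ? sym->size : 0;
    for (const unsigned char *p = (const unsigned char *)sym->name; *p != '\0'; p++)
        sum = (sum << 5) + *p;
    if (sym->is_weak)
        sum = (sum << 5) + sum + 1;
    return sum;
}

/* Checksum of a shared object: the sum of its per-symbol checksums. */
uint32_t object_checksum(const SymInfo *syms, size_t count)
{
    uint32_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += symbol_checksum(&syms[i]);
    return total;
}
```

For the name "ab" with no common storage and non-weak binding, the loop gives (0 << 5) + 97 = 97 and then (97 << 5) + 98 = 3202. Because the object checksum is a plain sum, listing the same symbols in a different order gives the same total.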
After loading an executable, the loader loads the executable's shared library dependencies. The loader searches for shared libraries that match the names contained in the executable's .liblist entries. Subject to the search guidelines described in this section, the loader will load the first matching shared library that it finds for each dependency.
Certain directories are searched by default, in the following order:
/usr/shlib
/usr/ccs/lib
/usr/lib/cmplrs/cc
/usr/lib
/usr/local/lib
/var/shlib
The loader's search path can be altered by several methods:
-soname linker option
-rpath linker option
The -soname option is used to set internal shared library names. The default soname is the output file name of the library when it is built. The linker uses an soname value to record shared library dependencies in the library list. Dependencies containing pathnames are located without prepending search directories to their paths. A pathname is identified by the presence of one or more slashes in the string.
The RPATH is included in a shared object's .dynamic section under an entry tagged DT_RPATH. It is a colon-separated list of shared library search directories. The RPATH is set using the -rpath linker option. The loader will search RPATH directories prior to searching LD_LIBRARY_PATH and default directories.
The environment variables that impact the search order are LD_LIBRARY_PATH and _RLD_ROOT. LD_LIBRARY_PATH has the same format as rpath. No root directories are prepended to the LD_LIBRARY_PATH directories. LD_LIBRARY_PATH can also be set by a program before it calls dlopen().
The _RLD_ROOT environment variable is a colon-separated list of "root" directories that are prepended to other search directories. It modifies RPATH and the default search directories.
The precedence (highest to lowest) of search directories used by the loader is as follows:
soname (if it includes a path)
_RLD_ROOT + RPATH
LD_LIBRARY_PATH
_RLD_ROOT + default search directories
When using non-system libraries, it is often necessary to specify the search path rather than relying on the defaults. Here is one example:
$ ld -shared -o my.so mylib.o -lc
$ cc -o hello hello.c my.so
$ hello
7526:hello: /sbin/loader: Fatal Error: cannot map my.so
$ LD_LIBRARY_PATH=.
$ export LD_LIBRARY_PATH
$ hello
Hello, World!
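The precedence rules can be made concrete by generating the candidate path names in the order they would be tried. The sketch below is an interpretation, not the loader's implementation. It assumes that every _RLD_ROOT entry is combined with every RPATH or default directory, and that an unset _RLD_ROOT behaves like a single empty root; neither detail is stated in the text. The names candidate_paths, add_dirs, add_rooted, and PATH_LEN are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_LEN 256

/* Default directories, in the order listed in this section. */
static const char *const default_dirs =
    "/usr/shlib:/usr/ccs/lib:/usr/lib/cmplrs/cc:/usr/lib:/usr/local/lib:/var/shlib";

/* Append root + dir + "/" + name for each directory in colon-separated `dirs`. */
static size_t add_dirs(const char *root, size_t root_len, const char *dirs,
                       const char *name, char out[][PATH_LEN], size_t n, size_t max)
{
    while (dirs != NULL && *dirs != '\0') {
        size_t len = strcspn(dirs, ":");
        if (len > 0 && n < max) {
            snprintf(out[n], PATH_LEN, "%.*s%.*s/%s",
                     (int)root_len, root, (int)len, dirs, name);
            n++;
        }
        dirs += len;
        if (*dirs == ':')
            dirs++;
    }
    return n;
}

/* Repeat add_dirs for each root; an unset or empty root list acts as one empty root. */
static size_t add_rooted(const char *roots, const char *dirs, const char *name,
                         char out[][PATH_LEN], size_t n, size_t max)
{
    if (roots == NULL || *roots == '\0')
        return add_dirs("", 0, dirs, name, out, n, max);
    while (*roots != '\0') {
        size_t len = strcspn(roots, ":");
        if (len > 0)
            n = add_dirs(roots, len, dirs, name, out, n, max);
        roots += len;
        if (*roots == ':')
            roots++;
    }
    return n;
}

/* Fill out[] with candidate paths in precedence order; returns how many were written. */
size_t candidate_paths(const char *soname, const char *rld_root, const char *rpath,
                       const char *ld_library_path, char out[][PATH_LEN], size_t max)
{
    size_t n = 0;

    assert(soname != NULL && out != NULL);
    if (strchr(soname, '/') != NULL) {       /* a slash marks a pathname */
        if (max > 0) {
            snprintf(out[0], PATH_LEN, "%s", soname);
            n = 1;
        }
        return n;
    }
    n = add_rooted(rld_root, rpath, soname, out, n, max);           /* _RLD_ROOT + RPATH */
    n = add_dirs("", 0, ld_library_path, soname, out, n, max);      /* LD_LIBRARY_PATH */
    n = add_rooted(rld_root, default_dirs, soname, out, n, max);    /* _RLD_ROOT + defaults */
    return n;
}
```

For the hello example above, calling candidate_paths("my.so", NULL, NULL, ".", out, 16) would place ./my.so ahead of the six default locations, which matches the effect of setting LD_LIBRARY_PATH in the session.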
One of the loader's jobs is to ensure that correct shared libraries are available to the program. Shared library versioning is used to distinguish incompatible versions of shared libraries. The loader tests for matching versions when shared library dependencies are loaded. If the application is found to be incompatible with a needed shared library, the program may have to be recoded or relinked. Causes of binary incompatibility include altered global data definitions and changes to documented interfaces.
Each shared library is built with a version identifier. This identifier is recorded in the .dynamic section with the tag DT_IVERSION. Each entry in the dependency information (.liblist section) also records the version identifier of a shared library dependency. The -set_version linker option is used to provide the version identifier. Without this option, the linker will build a shared library with a null version. Version identifiers can be any ASCII string.
Version checking can also be controlled by the user. The linker option -exact_version leads to more rigorous version testing by the loader. When this option is in effect, timestamps and checksums are checked in addition to version numbers. The linker-recorded dependency information for the timestamp and checksum must precisely match the load-time values for all shared libraries. Normally, a mismatch leads to additional symbol resolution work instead of a rejected object.
Version checking can be disabled through use of the loader environment variable _RLD_ARGS. Setting this variable to -ignore_all_versions disables version testing for all shared library dependencies. Setting it to -ignore_version with a library name parameter turns off version checking for that specific dependency.
By default, versions are checked, but not checksums or timestamps. If version testing fails, the loader searches for the matching version of the shared library.
The version identifiers are used to locate version-specific libraries. The loader looks for these libraries in:
/usr/shlib/version_id
dirname/version_id, where dirname is the first directory where a library with a matching name but non-matching version is found.
For example, if an application needs version 1 of a shared library but the loader first encounters version 2, it continues looking for the correct version.
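The default version test, in which a library's version must match at least one of the colon-separated version strings recorded for the dependency, might be sketched as below. The function name version_matches is hypothetical, and the comparison is assumed to be an exact, case-sensitive string match, which the text does not spell out.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `lib_version` is identical to one of the
 * colon-separated entries in `wanted`, and 0 otherwise. An empty `wanted`
 * string holds a single empty entry, so a null version matches a null version.
 */
int version_matches(const char *lib_version, const char *wanted)
{
    size_t lib_len;

    assert(lib_version != NULL && wanted != NULL);
    lib_len = strlen(lib_version);
    for (;;) {
        size_t len = strcspn(wanted, ":");
        if (len == lib_len && strncmp(wanted, lib_version, len) == 0)
            return 1;
        if (wanted[len] == '\0')
            return 0;
        wanted += len + 1;  /* skip the entry and its colon */
    }
}
```

A loader-like caller would treat a zero result as the signal to keep searching for a library with the required version.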
When shared libraries are modified and new versions built, the older versions are frequently retained to support previously linked applications. Maintaining multiple versions of the library helps ensure backward compatibility for existing applications even after binary-incompatible changes have been made.
Backward-compatible shared libraries can be:
The advantage of partial shared libraries is that they require less disk space; a disadvantage is that they require more swap space.
The linker's -L option can be used to link with backward-compatible shared libraries. Warnings are generated when a shared library is linked with dependencies on different versions of the same shared library. However, the linker tests direct dependencies only. The option -transitive_link should be used to uncover all multiple-version dependencies.
Multiple versions of the same shared library can only be loaded to support partial shared library dependencies. Otherwise, dependencies on multiple versions of a library are invalid.
Figure 6-3 shows examples of valid uses of multiple versions.
Figure 6-4 shows examples of invalid uses of multiple versions.
The executable object is placed in memory first, at the segment base addresses designated by the linker and recorded in the a.out header. These addresses are never changed during the lifetime of the executable's image. After the executable file's segments have been mapped into memory, shared library dependencies are loaded. Shared library dependencies are mapped recursively.
The linker chooses quickstart addresses for the text and data regions of shared libraries. The loader attempts to map shared libraries to their quickstart addresses. If this attempt fails because another library has already been mapped to the same address range, the library is relocated to a different address. Note that this problem could be caused by a library mapped by another process. The system tries to map no more than one shared library at a particular virtual address range, system-wide.
Additional dependencies, not present in the library list, can be dynamically loaded using a dlopen() call. Again, the loader will attempt to load the library at its quickstart addresses and will relocate it if necessary.
When a shared library is relocated, its text and data segments must move the same distance in memory. By fixing the distance between these segments at link time, the number of dynamic relocations is minimized and restricted to the data segment.
Dependencies can be loaded and unloaded during execution by using the dlopen and dlclose system functions.
The dlopen routine accepts a library name and loads the library and its dependencies. The loader resolves all symbols in all shared objects while processing a dlopen call. If the library was previously loaded, dlopen re-resolves global symbols and returns a handle without loading any new objects.
The loader maintains a count of references made to all shared objects that have been loaded. For example, if libm.so is dependent upon libc.so, libc's reference count is incremented when the libraries are loaded. This reference counting is part of an effort to ensure that a library is never unloaded prematurely. As an additional precaution to avoid unloading a library that is still needed, the number of existing dlopen handles is tracked by the loader. This dlopen count is incremented each time a dlopen call is made for a particular object.
The dlclose routine unloads a shared library and its dependencies. It accepts a handle that was returned by dlopen.
The dlclose routine will not unload shared libraries that are still in use. Both the dlopen count and the reference count are checked and should be zero before a library is unloaded.
The dlclose routine cannot unload an executable. It is designed for shared libraries only. It also cannot unload a shared library that was not dynamically loaded by dlopen.
Objects with TLS data can be dynamically loaded or unloaded during process execution. A new TLS region is allocated for all existing threads when an object with TLS data is loaded. Similarly, the TLS region will be deallocated for all threads when the object is unloaded.
The dynamic symbol table is created at link time for shared objects. Its primary purpose is to enable dynamic symbol resolution. Run-time address information for dynamic symbols is contained in the GOT section (.got).
The dynamic symbol section (.dynsym) provides information on globally scoped symbols that are defined or used by the object. This section consists of a table of dynamic symbol entries. The entries are ordered as follows:
Local symbols are global in scope but are not exported to other objects. The local portion of the dynamic symbol table contains system symbols representing the sections of the object: .text, .data, and other linker-defined symbols. Typically, they do not have GOT entries.
Unreferenced globals are symbols that can be exported but are not referenced by the defining object. They are present in the dynamic symbol table so that other shared objects can import and use them. Unreferenced globals do not have GOT entries.
Referenced globals are exported and are used internally. Dynamic symbols in this category have global GOT entries.
Global symbols that are referenced only by the object's dynamic relocation entries are grouped at the end of the dynamic symbol table, corresponding to a special final GOT. These symbols require GOT entries to record their run-time addresses used in processing dynamic relocations. This special GOT is only used by the loader and is never directly referenced by the program itself.
All linker-defined TLS symbols (see Section 2.3.7) have dynamic symbol entries.
Note that the dynamic symbol table itself is never relocated; it contains only link-time addresses (in the st_value field).
Dynamic symbol look-up is performed by the dlsym(handle,name) routine. The routine searches for the symbol name beginning in the object associated with the handle. The search is breadth-first by default and depth-first for objects built with the linker's "-B symbolic" option. If the handle is null, the routine performs a depth-first search beginning at the main executable.
It is important to use the dlsym interface for symbol look-up to avoid using an outdated address. This problem can be caused by an improper compiler assumption that a symbol's address will not change after load time. A symbol's address may be cached as an optimization and not reloaded thereafter. However, that address may be changed during execution as the result of dynamic loading and unloading.
The concept of scope in the dynamic symbol table differs somewhat from the concept of scope in the regular symbol table because the dynamic symbol table contains only global user-program symbols. The terms "local" and "external" thus have different meanings in this context.
The two scoping levels for symbols in the dynamic symbol table are object scope and process scope. A symbol with object scope is local to the shared object and can only be referenced in the library or executable where it is defined. A symbol with process scope is visible to all program components, and may be referenced anywhere. A symbol with process scope can also be preempted by a higher-precedence definition in another shared object.
Note that the distinction between object scope and process scope does not correspond directly to the local/global symbol division in the dynamic symbol table. All symbols in the local part of the table have object scope, but global dynamic symbols can be internal to the object as well. Another factor, called binding, comes into play.
The possible bind values in the dynamic symbol table are local, global, weak, and duplicate. These values are encoded in the st_info field of the dynamic symbol entry. (See Section 6.2.2 for details.)
Users are able to designate global symbols as "hidden". In the dynamic symbol table, hidden symbols have a local binding. This representation ensures that they will not be exported from the object and will not preempt any other symbol definition. Also, internal references to hidden symbols will not be preempted. The linker's "-hidden_symbol symbol" option can be used to specify a hidden symbol.
Weak symbols are also a special-case category of global symbols that have the same scope as globals but a lower precedence for symbol resolution conflicts. See Section 6.3.4.2 for details.
The GOT contains address information for all referenced external symbols in the dynamic symbol table. Observe that the GOT is the source of final, run-time addresses, whereas the symbol table contains only link-time addresses. To access a dynamic symbol, the GOT must be referenced. To associate GOT entries with dynamic symbol table entries, the symbol table and GOT are aligned as shown in Figure 6-5.
Note that the GOT also contains entries that do not correspond to dynamic symbols. These are placed at the top of each GOT table.
The maximum number of entries in a GOT is 8189. A single GOT may be sufficient to represent all necessary addresses for an object, but one or more additional GOTs are sometimes required, as illustrated in Figure 6-5. One GOT table can contain entries from multiple input objects, but a single object's entries cannot be split between two tables. The linker also builds a separate, final GOT for relocatable global symbols, referenced only in the dynamic relocation section. These constraints generally result in some unused GOT entries at the bottom of each table.
The loader recognizes a multiple-GOT object by examining the dynamic header. A DT_GOTSYM entry exists in the dynamic header for each GOT. This entry holds the index of the first dynamic symbol table entry corresponding to a GOT entry. A DT_LOCAL_GOTNO entry exists for each GOT as well. This entry contains the index of the first global entry in that GOT. The number of DT_GOTSYM entries and DT_LOCAL_GOTNO entries in the dynamic header should match. They are also expected to occur in ascending numerical order.
The first (zero-indexed) entry for every GOT in a multiple-GOT object points to the loader's lazy-text-resolve entry point. In the final GOT (consisting of relocatable symbols), it is present even though it is unused.
Multiple-GOT objects may contain duplicate symbols. A symbol appears only once per GOT, but it can be duplicated in other GOTs. All duplicate symbols, marked in the symbol table as STB_DUPLICATE, have an associated primary symbol. The primary symbol is simply the first instance of a duplicate symbol. The st_size field for a duplicate symbol is the dynamic symbol table index of the primary symbol. When a symbol is resolved in a multiple-GOT situation, all duplicates must be found and resolved as well.
The msym table, which is stored in the .msym section of a shared object file, maps dynamic symbol hash values to the first of any dynamic relocations for that symbol. This section is included for performance reasons to avoid time-consuming and repetitive hashing calculations during symbol resolution.
An entry in the msym table contains a hash value and an information field. The information field can be masked to obtain a dynamic relocation index and a flags field. The size of the msym table is the same as the size of the dynamic symbol table; the two tables line up directly and have matching indices.
The msym table is referenced repeatedly when an object is opened. The loader resolves symbols by searching all shared objects for matching definitions. The search requires a hash value computed from the symbol name. The msym table provides precomputed hash values for symbols to avoid the costly hash computation at load time.
The .msym section is an optional object file section; it is not produced by default. The linker's -msym option causes the msym table to be generated. If the .msym section is not present in a shared object, the loader will create the table each time that the object is loaded. For this reason, it is often preferable to specify the .msym section's inclusion when building shared objects.
A hash table, stored in the .hash section of a shared object file, provides fast access to symbol entries in the dynamic symbol section. The table is implemented as an array of 32-bit integers.
The hash table has the format shown in Figure 6-7.
The entries in the hash table contain the following information:
The nbucket entry indicates the number of entries in the bucket array.
The nchain entry indicates the number of entries in the chain array.
The bucket and chain arrays both hold dynamic symbol table indices, and the entries in chain parallel the dynamic symbol table. The value of nchain is equal to the number of symbol table entries. Symbol table indices can be used to select chain entries.
The hashing function accepts a symbol name and returns the hash value, which can be used to compute a bucket index. If the hashing function returns the value X for a name, X%nbucket is the bucket index. The hash table entry bucket[X%nbucket] gives an index, Y, into the dynamic symbol table.
The loader must determine whether the indexed symbol is the correct one. It checks the corresponding dynamic symbol's hash value in the msym table and its name.
If the symbol table entry indicated is not the correct one, the hash table entry chain[Y] indicates the next symbol table entry for a dynamic symbol with the same hash value. The indexed symbol is again checked by the loader. If it is incorrect, the same index is used in the chain array to try the next symbol that has the same hash value. The chain links can be followed in this manner until the correct symbol table entry is located or until the chain entry contains the value STN_UNDEF.
As an example, assume that a symbol with the hash value 12 is sought. If there are ten buckets, the calculation 12 % 10
gives the bucket index 2, which signifies the third bucket. A bucket index translates into a hash table index as bucket[i]=hash[i+2]
. If that bucket contains a 3, the dynamic symbol table entry with an index of 3 is checked. If the symbol is incorrect, the hash table entry chain[3]
is accessed to get the next possible symbol index. A chain index translates into a hash table index as chain[i]=hash[nbucket+2+i]
. If chain[3]
is 7, the dynamic symbol table entry with an index of 7 is checked. If it is the correct symbol, the search is successful and halts.
The structures used in this example are shown in Figure 6-8.
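The lookup walked through in this example can be sketched in C as shown below. This is a minimal sketch, not the loader's code: the function name, the sym_hash array (standing in for the hash values held in the msym table), the sym_name array (standing in for name lookups in .dynstr), and the value 0 for STN_UNDEF are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STN_UNDEF 0u  /* assumed value of the end-of-chain marker */

/* hash[0] = nbucket, hash[1] = nchain, followed by bucket[] and chain[]. */
static uint32_t hash_lookup(const uint32_t *hash,
                            const uint32_t *sym_hash,    /* per-symbol hash values */
                            const char *const *sym_name, /* per-symbol names */
                            const char *name, uint32_t x)
{
    uint32_t nbucket = hash[0];
    uint32_t nchain = hash[1];
    const uint32_t *bucket = &hash[2];           /* bucket[i] = hash[i+2] */
    const uint32_t *chain = &hash[2 + nbucket];  /* chain[i] = hash[nbucket+2+i] */

    assert(nbucket > 0);
    /* Start at bucket[x % nbucket] and follow chain links until a match
     * is found or the chain ends with STN_UNDEF. */
    for (uint32_t y = bucket[x % nbucket]; y != STN_UNDEF; y = chain[y]) {
        assert(y < nchain);
        if (sym_hash[y] == x && strcmp(sym_name[y], name) == 0)
            return y;
    }
    return STN_UNDEF;
}
```

With ten buckets, bucket[2] set to 3, and chain[3] set to 7, a search for a name whose hash value is 12 checks symbol 3 first and then symbol 7, as in the example above.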
The dynamic loader must perform symbol resolution for unresolved symbols that remain after link time. A post-link unresolved symbol is one that was not defined in a shared object or in any of the shared object's shared library dependencies searched by the linker. If a dependency is changed before execution or additional libraries are dynamically loaded, the loader will attempt to resolve the symbol.
The linker accepts unresolved symbols when linking shared objects and records them in the dynamic symbol (.dynsym
) section. The loader recognizes an unresolved symbol by a symbol type of undefined (st_shndx == SHN_UNDEF
) and a symbol value of zero (st_value
== 0) in the dynamic symbol table. For such symbols, the GOT value distinguishes imported symbols from symbols that are unresolved across all shared objects.
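The test described above might be sketched in C as follows. The structure is a hypothetical subset of a dynamic symbol entry, the value 0 for SHN_UNDEF is assumed, and the use of an address range to recognize stub addresses is an assumption of this sketch rather than a documented loader mechanism.

```c
#include <assert.h>
#include <stdint.h>

#define SHN_UNDEF 0u  /* assumed value; see coff_dyn.h for the real one */

/* Hypothetical subset of a dynamic symbol table entry. */
typedef struct {
    uint64_t st_value;
    uint16_t st_shndx;
} dynsym_entry;

typedef enum { SYM_NOT_UNDEFINED, SYM_IMPORTED, SYM_UNRESOLVED } sym_state;

/* got is the symbol's GOT entry; [stubs_lo, stubs_hi) is the assumed
 * address range occupied by the object's stub routines. */
static sym_state classify(const dynsym_entry *s, uint64_t got,
                          uint64_t stubs_lo, uint64_t stubs_hi)
{
    assert(stubs_lo <= stubs_hi);
    if (s->st_shndx != SHN_UNDEF || s->st_value != 0)
        return SYM_NOT_UNDEFINED;
    /* A zero GOT entry (data) or a stub address (function) means that
     * no definition has been found in any shared object. */
    if (got == 0 || (got >= stubs_lo && got < stubs_hi))
        return SYM_UNRESOLVED;
    return SYM_IMPORTED;
}
```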
Table 6-7 summarizes the categories of symbols and how each is represented in the dynamic symbol table. Run-time addresses are stored in the GOT. They can be precomputed by the linker and adjusted at load time.
Description | Type | Section | Value | GOT |
defined item | | | address | address |
imported function | | | 0 | address (in defining object) |
imported data | | | 0 | address (in defining object) |
common | | | alignment | address of allocated common (in defining object) |
unresolved function | | | 0 | stub address |
unresolved data | | | 0 | 0 |
The loader performs symbol resolution during initial load of a program. The amount of symbol resolution work required by a program varies (see Section 6.3.4.6).
The loader can also perform dynamic symbol resolution for particular symbols during program execution. If new dependencies are added or existing dependencies are rearranged, externally visible symbols (those with process scope) must be re-resolved.
Unresolved text symbols can be resolved at run time instead of load time (see Section 6.3.4.5).
A namespace is a scope within which symbol names should all be unique. In a namespace, a given name is bound to a single item, wherever it may be used. This generic use of the term "namespace" is distinct from the C++ namespace construct, which is discussed in Section 5.3.6.4.
Dynamic executables running on Tru64 UNIX share a namespace with their shared library dependencies. This policy is implemented with symbol preemption. Symbol preemption, also referred to as "hooking", is a mechanism by which all references to a multiply defined symbol are resolved to the same instance of the symbol.
Advantages of symbol preemption include:
Disadvantages include extra load time for symbol resolution and potential problems resulting from namespace pollution.
Namespace pollution can occur during the use of shared libraries. A library routine may malfunction if it calls or accesses a global symbol that is redefined by another shared library or application. Figure 6-9 presents an example of this situation.
Namespace pollution is partly addressed by ANSI standards. Namespace conflicts that occur between libc and ANSI-compliant programs must not affect the behavior of ANSI-defined functions implemented in libc.
The identifiers reserved for use by the library are:
fopen
, malloc
, and so forth)
All other names are available to user programs. User versions of non-reserved identifiers preempt library versions.
Historically, system libraries have used many unreserved symbols. To achieve compliance with the ANSI standard, global symbols have undergone a name change. Documented interfaces have been retained as weak symbols (see Section 6.3.4.2). Their strong counterparts have names that are formed by prepending two underscores to the corresponding weak symbol's name.
Hidden symbols do not cause namespace pollution problems and cannot be preempted because they are not exported from the shared object where they are defined.
The linker options -hidden_symbol
and -exported_symbol
turn the hidden attribute on or off for a given symbol name. The options -hidden
and -non_hidden
turn the hidden attribute on or off for all subsequent symbols.
TLS data symbols have the same name scope as hidden symbols. The names are not shared among multiple threads.
Weak symbols are global symbols that have a lower precedence in symbol resolution than other globals. Strong symbols are any symbols that are not marked as weak.
Weak symbols can be used as aliases for other weak or strong symbols. This technique can be useful when it is desirable to provide both a low-precedence name and a high-precedence name for the same data item or procedure. When the weak symbol is referenced, its strong counterpart is the one actually used.
This aliasing approach employing weak symbols is used in libc.so
to avoid namespace pollution problems. In the example in Figure 6-10, the strong symbol definition in the application takes precedence over the weak library definition, and the program functions properly.
If no non-weak open
symbols were defined, references to open
would bind to libc's
weak symbol, as shown in Figure 6-11.
Weak symbols can also be used to prevent multiple symbol definition errors or warnings when linking. The linker does not require a weak symbol to be aliased to a strong symbol, but the loader produces a warning message if it cannot find a matching strong symbol for a weak symbol it is attempting to resolve.
To find a weak symbol's strong counterpart, the loader follows these steps:
    Use hash lookup to find __<NAME> in the dynamic symbol table.
    if (not found or not a match)
        foreach symbol in the dynamic symbol table
            Test for match
Matching symbols will have the same st_value,
COFF_ST_TYPE(st_info)
and st_shndx
.
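The two steps above can be sketched in C as follows. This is an illustrative sketch under several assumptions: the sym structure and function names are hypothetical, the type field stands in for COFF_ST_TYPE(st_info), and a linear scan by name stands in for the hash lookup of the first step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of a dynamic symbol, with its name already resolved. */
typedef struct {
    const char *name;
    uint64_t st_value;
    uint8_t  type;      /* stands in for COFF_ST_TYPE(st_info) */
    uint16_t st_shndx;
} sym;

/* Matching symbols have the same value, type, and section index. */
static int same_item(const sym *a, const sym *b)
{
    return a->st_value == b->st_value && a->type == b->type &&
           a->st_shndx == b->st_shndx;
}

/* Returns the index of the counterpart of tab[weak], or -1 if none. */
static int find_counterpart(const sym *tab, int n, int weak)
{
    char want[256];
    int i;

    assert(weak >= 0 && weak < n);
    snprintf(want, sizeof want, "__%s", tab[weak].name);

    /* Step 1: look for __<NAME> (a loader would use the hash table). */
    for (i = 0; i < n; i++)
        if (i != weak && strcmp(tab[i].name, want) == 0 &&
            same_item(&tab[i], &tab[weak]))
            return i;

    /* Step 2: otherwise test every symbol for a match. */
    for (i = 0; i < n; i++)
        if (i != weak && same_item(&tab[i], &tab[weak]))
            return i;
    return -1;
}
```

The first step succeeds for aliases that follow the two-underscore naming convention; the second step handles aliases whose counterpart has an unrelated name.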
A weak symbol is identified in the dynamic symbol table by a STB_WEAK
bind value. In the external symbol table, a weak symbol has its weak_ext
flag set in the EXTR
entry.
Users can specify weak symbols using the .weakext
assembler directive or the C #pragma weak
preprocessor directive.
The symbol resolution policy, or symbol search order, defines the order in which the loader searches for symbol definitions in a dynamic executable and its dependencies.
The default search order is a breadth-first, left-to-right traversal of the shared object dependency graph.
The search order in Figure 6-12 is: a.out libA libB libc.so libD libE
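A breadth-first, left-to-right traversal of this kind can be sketched in C as follows. The object structure, the size limits, and the function name are hypothetical, and any dependency graph passed to the function is supplied by the caller; the sketch does not reproduce the exact graph in Figure 6-12.

```c
#include <assert.h>

#define MAX_OBJS 16
#define MAX_DEPS 4

/* Hypothetical description of one object in the dependency graph. */
typedef struct {
    const char *name;
    int ndeps;
    int deps[MAX_DEPS];  /* indices of dependencies, left to right */
} object;

/* Writes the breadth-first search order starting at root into order[]
 * and returns the number of objects visited. order[] doubles as the
 * traversal queue, and each object appears only once. */
static int search_order(const object *objs, int nobjs, int root, int *order)
{
    int seen[MAX_OBJS] = {0};
    int head = 0, tail = 0;

    assert(nobjs > 0 && nobjs <= MAX_OBJS && root >= 0 && root < nobjs);
    order[tail++] = root;
    seen[root] = 1;
    while (head < tail) {
        const object *o = &objs[order[head++]];
        for (int i = 0; i < o->ndeps; i++) {
            int d = o->deps[i];
            if (!seen[d]) {
                seen[d] = 1;
                order[tail++] = d;
            }
        }
    }
    return tail;
}
```

For instance, if a.out depends on libA, libB, and libc.so, libA depends on libD, and libB depends on libE, the function yields a.out, libA, libB, libc.so, libD, libE.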
Objects loaded dynamically by dlopen()
are appended to the search order established at load time. However, dlopen
options will determine whether a dynamically loaded object's symbols are visible to objects that do not include it in their dependency lists. See dlopen(3)
for details.
Alternatively, the user can specify the search order by using linker or loader options. The linker's
-depth_ring_search
option causes the loader to use a different symbol resolution policy. This policy is a two-step search:
Using the depth ring search policy and the dependency graph from Figure 6-12, the search order is:
From | Search Order |
| |
| |
| |
| |
| |
| |
The highest-to-lowest precedence order for dynamic symbol resolution is:
In case (5), the loader allocates the common symbol. This situation only arises when an object containing an allocated common of the same name has been changed between link time and load time or is dynamically unloaded during run time. The linker will always allocate a common storage class symbol, but if there are multiple occurrences of that symbol, the others are retained as unallocated commons.
When symbols have equal precedence, the loader relies on the search order to choose the correct definition for the symbol.
Lazy text resolution allows programs to execute without resolving text symbols that are never referenced.
Programs with unresolved text symbols are linked with stub routines. When a program or library calls a stub routine, the stub calls the loader's lazy_text_resolve
entry point with a dynamic symbol index as an argument. The loader then resolves the text symbol. Subsequent calls will use the true address, which has replaced the stub in the appropriate GOT entry.
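The stub mechanism can be modeled in ordinary C with function pointers, as in the sketch below. It is a simulation only: the one-entry got array, the counter, and the square routines are hypothetical, and the lazy_text_resolve function here is a stand-in that merely borrows the name of the loader entry point.

```c
#include <assert.h>

typedef int (*text_fn)(int);

static int real_square(int x) { return x * x; }  /* the true definition */
static int stub_square(int x);                   /* stands in for a linker stub */

static text_fn got[1] = { stub_square };  /* GOT entry initially holds the stub */
static int resolve_calls;                 /* counts simulated loader invocations */

/* Stand-in for the loader: resolve symbol symidx and patch the GOT. */
static text_fn lazy_text_resolve(int symidx)
{
    assert(symidx == 0);
    resolve_calls++;
    got[symidx] = real_square;
    return got[symidx];
}

/* The stub passes its dynamic symbol index to the loader, then
 * forwards the original call to the resolved routine. */
static int stub_square(int x)
{
    return lazy_text_resolve(0)(x);
}

/* Callers always go through the GOT entry. */
static int call_square(int x) { return got[0](x); }
```

The first call to call_square reaches the stub and triggers resolution; later calls find the true address in the GOT entry and bypass the stub.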
The dynamic symbol table does not contain any explicit information that indicates whether a text symbol has a stub associated with it. The loader looks for the following clues instead:
st_shndx
is SHN_UNDEF
st_value
is zero
The environment variable LD_BIND_NOW
controls the loader's text resolution mode. If the variable has a non-null value, the bind mode is immediate. If the value is null, the bind mode is deferred. Immediate binding requires all symbols to be resolved at load time. Deferred binding allows text symbols to be resolved at run time using lazy text resolution. The default is deferred binding.
See Section 3.3.3 for related information.
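The effect of LD_BIND_NOW can be summarized with the following C sketch. The type and function names are hypothetical, and treating an unset variable the same as a null value is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BIND_DEFERRED, BIND_IMMEDIATE } bind_mode;

/* value is the setting of LD_BIND_NOW, for example the result of
 * getenv("LD_BIND_NOW"); NULL (unset) is treated like an empty value. */
static bind_mode bind_mode_from(const char *value)
{
    return (value != NULL && value[0] != '\0') ? BIND_IMMEDIATE
                                               : BIND_DEFERRED;
}

/* Only text symbols may be left for run time, and only in deferred mode. */
static int must_resolve_at_load(bind_mode mode, int is_text_symbol)
{
    assert(mode == BIND_DEFERRED || mode == BIND_IMMEDIATE);
    return mode == BIND_IMMEDIATE || !is_text_symbol;
}
```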
Conditions may exist that cause the loader to do more symbol resolution work for some programs than for others. The amount of symbol resolution work that is necessary can have a significant impact on a program's start-up time.
Descriptions of the possible levels of dynamic symbol resolution follow.
Quickstart Resolution
Minimal symbol resolution. For details on quickstart, see Section 6.3.6.
Timestamp Resolution
Moderate symbol resolution. This is used when any of the following are true:
Checksum Resolution
Extensive symbol resolution. This is used when a shared library dependency has been rebuilt and its checksum no longer matches the dependency information in the executable. The checksum changes if any of the following conditions are met:
Binding Resolution
Re-resolve symbols marked UNDEF
for immediate binding. This is used by dlopen()
to apply immediate binding symbol resolution to shared objects that were previously resolved with lazy binding.
The dynamic relocation section describes all locations that must be adjusted within the object if an object is loaded at an address other than its linked base address.
Although an object may have multiple relocation sections, the linker concatenates all relocation information present in its input objects. The dynamic loader is thus faced with a single relocation table. This dynamic relocation table is stored in the .rel.dyn
section and is ordered by the corresponding dynamic symbol index.
Offset 0 in the dynamic relocation table is reserved for a null entry with all fields zeroed.
All dynamic relocations must be of the type R_REFQUAD
or R_REFLONG
. This simplifies the dynamic relocation process. These two relocation types are sufficient to represent all information that is necessary to accomplish dynamic relocations. Dynamic relocation entries must apply only to addresses in an object's data segment. The object's text segment must not contain any relocatable addresses.
Relocation entries are updated during dynamic symbol resolution. When a dynamic symbol's value changes, any dynamic relocations associated with that symbol must be updated. To update the entries, the relocation value is computed by subtracting the old value of the dynamic symbol from the new value. This value is then added to the contents of the relocation targets. The old value of a dynamic symbol is always stored in a GOT entry. The new value of a dynamic symbol is stored in that GOT entry after dynamic relocations are processed.
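The update arithmetic can be sketched in C for 64-bit targets as follows. The function name and the representation of relocation targets as an array of pointers are assumptions made for illustration; a loader would locate the targets through the entries in the .rel.dyn section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds the change in a symbol's value to each 64-bit relocation target,
 * then records the new value in the symbol's GOT entry. */
static void update_symbol(uint64_t *got_entry, uint64_t new_value,
                          uint64_t *const *targets, size_t ntargets)
{
    assert(got_entry != NULL);

    /* Old value comes from the GOT; unsigned arithmetic wraps, so the
     * adjustment is also correct when the new value is lower. */
    uint64_t delta = new_value - *got_entry;

    for (size_t i = 0; i < ntargets; i++)
        *targets[i] += delta;
    *got_entry = new_value;  /* stored only after relocations are processed */
}
```

For example, if a symbol moves from 0x1000 to 0x3000, a target that held 0x1010 (the symbol plus an offset of 0x10) becomes 0x3010.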
Relocation types other than R_REFQUAD
and R_REFLONG
are not allowed for dynamic relocations because no other relocation types apply to absolute addresses stored in data. Most relocation types apply to values that need to be computed at link time and do not change at run time.
A dynamic executable file may also contain normal relocation sections. If normal relocation entries are present, the loader ignores them.
Quickstart is a loading technique that uses predetermined addresses to run a program that depends on shared libraries. It is particularly useful for applications that rely on shared libraries that change infrequently.
The linker chooses quickstart addresses for all shared library dependencies when a dynamic executable is linked. These addresses are stored in the registry file normally named so_locations
. For details on the shared library registry file, refer to the Programmer's Guide.
Any modification to a shared library impairs quickstarting of applications that depend on that library. If a shared library dependency has changed, it may be possible to use the fixso
utility to update the application and thus enable quickstart to succeed.
To verify that an application is quickstarted, set the _RLD_ARGS
environment variable to
-quickstart_only
.
Additional information on quickstart is available in the Programmer's Guide.
Not all shared objects can be successfully quickstarted. If an executable cannot be quickstarted, it still runs, but start-up is slower. Quickstarting is possible for programs requiring minimal symbol resolution at load time. A dynamic executable is quickstarted if:
Each quickstart requirement that is not met by a dynamic executable and its dependencies leads to additional symbol resolution work.
At this point, the timesaving advantage of quickstarting has disappeared.
For quickstart purposes, a link-time shared library matches its associated load-time shared library if the timestamp and checksum are unchanged. If they have been changed, using the fixso
tool may remedy the situation and enable quickstart to succeed.
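The matching rule can be expressed as a simple comparison, sketched below in C. The lib_stamp structure, its field widths, and the function names are hypothetical, and the sketch covers only this one quickstart requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the identifying data for one shared library. */
typedef struct {
    uint32_t timestamp;
    uint32_t checksum;
} lib_stamp;

/* A library matches if both its timestamp and checksum are unchanged. */
static int lib_matches(const lib_stamp *linked, const lib_stamp *loaded)
{
    return linked->timestamp == loaded->timestamp &&
           linked->checksum == loaded->checksum;
}

/* linked[i] is the data recorded at link time for dependency i;
 * loaded[i] is the data found in the library at load time. */
static int all_libs_match(const lib_stamp *linked, const lib_stamp *loaded,
                          int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (!lib_matches(&linked[i], &loaded[i]))
            return 0;
    return 1;
}
```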
The conflict table, stored in the .conflict
section, contains a list of symbols that are multiply defined and must be resolved by the loader. The conflict table is used only when full quickstarting is possible. If any changes preventing quickstart have occurred, the loader resorts to other methods of symbol resolution.
The linker records conflicts in a shared object's .conflict
section if a second definition is found for a previously defined symbol. Common storage class symbols are not considered conflicts unless they are allocated in more than one shared object.
Weak symbols aliased to a newly resolved conflict entry are also treated as conflicts. This means the loader does not have to search for weak symbols matching conflict symbols. The weak symbols are added to the conflict list for the first shared library that defined the symbol in question as well as the library where the conflicting definition was found.
Figure 6-13 shows a simple example of the use of conflict entries.
In this example, the a.out executable has been linked with liba.so, and a single conflict has been recorded for the symbol a_error
. The conflict is recorded in the executable file at link time because both the executable and shared library define the symbol. At run time, any calls to a_error
from a_sort
will be preempted by the definition of a_error
in the a.out
executable. Without the conflict entry, the call to a_error
would not be preempted properly when a.out
is quickstarted.
The fixso
utility updates shared libraries to permit quickstarting of applications that utilize them, even if the libraries have changed since the executable was originally linked against them. Given a shared object as input, it updates the object and its dependencies to make them meet quickstart criteria. The library changes handled by fixso
are timestamp and checksum discrepancies.
The fixso
utility creates a breadth-first list of the object's dependencies. It then handles conflicts present in the conflict table. Next, fixso
resolves globals, updating global symbol values, dynamic relocation entries, and GOT entries where necessary. Lastly, if these actions are successful, fixso
resets the timestamp and checksum of its target object.
When a dependency is discovered during processing, fixso
automatically opens the associated object and adds it to the object list if possible. The dependency will be found and opened if it is located in the default library search path, the path indicated by the LD_LIBRARY_PATH
environment variable, or the path specified on the command line. Otherwise, it may be necessary to run the fixso
program on the library separately, before fixing the target object.
Some changes made to shared libraries cannot be reconciled by fixso
. The fixso
utility does not support:
soname
values