5   Symbol Table (V3.13)

One of the chief tasks of the compilation process is the production of a symbol table, which is a collection of data structures whose purpose is to store type, scope, and address information about program data. Compilers and assemblers create the symbol table. It is read and may be modified by linkers, profiling tools, and assorted object manipulation tools. It also contains information required for debugging.

For large applications, a single compilation can involve many program components, including source files, header files, and libraries. Data from all of these files must be described in the symbol table.

The Tru64 UNIX eCOFF symbol table, when present, comprises a large portion of the physical object file and is often considered a stand-alone entity. It is divided into numerous sections, including a header section that is used for navigation. The contents of the symbol table are shown in Figure 5-1.

Figure 5-1 Symbol Table Sections

The symbol table has a hierarchical design. The sections storing local symbols, local strings, relative file descriptors, procedure descriptors, line numbers, auxiliary symbols, and optimization symbols are divided into subtables and organized by file. Local symbols, local strings, and optimization symbols are further broken down by procedure. Figure 5-2 depicts this hierarchy.

Figure 5-2 Symbol Table Hierarchy

A particular symbol table may not contain all sections, for one of the following reasons:

The function of each symbol table section is summarized below:

Several tools are available to view the contents of the symbol table. See the stdump(1), odump(1), and nm(1) man pages.

This chapter covers symbol table organization and usage, concentrating on debugging issues in particular. The version of the symbol table covered is V3.13. The dynamic symbol table built by the linker is discussed separately in Section 6.3.3.


5.1   New or Changed Symbol Table Features

Version 3.13 of the symbol table includes the following new or changed features:

 


5.2   Structures, Fields and Values for Symbol Tables

Unless otherwise specified, all structures described in this section are declared in the header file sym.h, and all constants are defined in the header file symconst.h.


5.2.1   Symbolic Header (HDRR)

typedef struct {
        coff_ushort	magic;          
        coff_ushort	vstamp;         
        coff_int	ilineMax;       
        coff_int	idnMax;         
        coff_int	ipdMax;         
        coff_int	isymMax;        
        coff_int	ioptMax;        
        coff_int	iauxMax;        
        coff_int	issMax;         
        coff_int	issExtMax;      
        coff_int	ifdMax;         
        coff_int	crfd;           
        coff_int	iextMax;        
        coff_long	cbLine;         
        coff_off	cbLineOffset;   
        coff_off	cbDnOffset;     
        coff_off	cbPdOffset;     
        coff_off	cbSymOffset;    
        coff_off	cbOptOffset;    
        coff_off	cbAuxOffset;    
        coff_off	cbSsOffset;     
        coff_off	cbSsExtOffset;  
        coff_off	cbFdOffset;     
        coff_off	cbRfdOffset;    
        coff_off	cbExtOffset;    
} HDRR, *pHDRR;

SIZE - 144 bytes, ALIGNMENT - 8 bytes

 

Symbolic Header Fields

magic
To verify validity of the symbol table, this field must contain the constant magicSym, defined as 0x1992.
vstamp
Symbol table version stamp. This value consists of a major version number and a minor version number, as defined in the stamp.h header file:

    MAJ_SYM_STAMP   3    High byte
    MIN_SYM_STAMP   13   Low byte

 
See Section 5.1 for a list of symbol table features introduced with version V3.13.
ilineMax
Number of line number entries (if expanded).
idnMax
Obsolete.
ipdMax
Number of procedure descriptors.
isymMax
Number of local symbols.
ioptMax
Byte size of optimization symbol table.
iauxMax
Number of auxiliary symbols.
issMax
Byte size of local string table.
issExtMax
Byte size of external string table.
ifdMax
Number of file descriptors.
crfd
Number of relative file descriptors.
iextMax
Number of external symbols.
cbLine
Byte size of (packed) line number entries.
cbLineOffset
Byte offset to start of (packed) line numbers.
cbDnOffset
Obsolete.
cbPdOffset
Byte offset to start of procedure descriptors.
cbSymOffset
Byte offset to start of local symbols.
cbOptOffset
Byte offset to start of optimization entries.
cbAuxOffset
Byte offset to start of auxiliary symbols.
cbSsOffset
Byte offset to start of local strings.
cbSsExtOffset
Byte offset to start of external strings.
cbFdOffset
Byte offset to start of file descriptors.
cbRfdOffset
Byte offset to start of relative file descriptors.
cbExtOffset
Byte offset to start of external symbols.
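
The magic and vstamp checks described in the field list can be sketched as follows. This is a minimal illustration rather than code from sym.h or stamp.h: the helper names and the SKETCH_ macros are hypothetical, the constant values are copied from the text, and the two header fields are assumed to have already been read from the object file as 16-bit integers.

```c
#include <stdint.h>

/* Values quoted in the text. The real definitions (magicSym,
 * MAJ_SYM_STAMP, MIN_SYM_STAMP) come from the system headers. */
#define SKETCH_MAGIC_SYM      0x1992
#define SKETCH_MAJ_SYM_STAMP  3
#define SKETCH_MIN_SYM_STAMP  13

/* Major version number: high byte of the 16-bit stamp. */
unsigned vstamp_major(uint16_t vstamp)
{
    return (vstamp >> 8) & 0xffu;
}

/* Minor version number: low byte of the 16-bit stamp. */
unsigned vstamp_minor(uint16_t vstamp)
{
    return vstamp & 0xffu;
}

/* Returns 1 if the header carries the expected magic number and a
 * V3.13 version stamp, 0 otherwise. */
int header_is_v3_13(uint16_t magic, uint16_t vstamp)
{
    return magic == SKETCH_MAGIC_SYM
        && vstamp_major(vstamp) == SKETCH_MAJ_SYM_STAMP
        && vstamp_minor(vstamp) == SKETCH_MIN_SYM_STAMP;
}
```

With these helpers, a V3.13 stamp corresponds to the 16-bit value 0x030d (3 in the high byte, 13 in the low byte).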

 

General Notes

The size and offset fields describing symbol table sections must be set to zero if the section described is not present.

The cb*Offset fields are byte offsets from the beginning of the object file.

The i*Max fields contain the number of entries for a symbol table section, except for ioptMax, issMax, and issExtMax, which hold byte sizes as noted in the field descriptions. Legal index values for a symbol table section range from 0 to the value of the associated i*Max field minus one.

For an explanation of packed and expanded line number entries, see the discussion in Section 5.3.2.2.
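
The notes on absent sections, file-relative offsets, and legal index ranges can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the object file is assumed to be fully mapped into memory at file_base, and the offset and count are passed as plain integers taken from the symbolic header.

```c
#include <stddef.h>

/* Returns a pointer to the start of a symbol table section, or NULL
 * when the header records the section as absent (count or size of
 * zero). cb*Offset values are measured from the start of the object
 * file, so the offset is added to file_base. */
const unsigned char *section_start(const unsigned char *file_base,
                                   long cb_offset, long count_or_size)
{
    if (count_or_size == 0)
        return NULL;
    return file_base + cb_offset;
}

/* Legal indices run from 0 to the associated i*Max value minus one. */
int index_is_legal(long index, long imax)
{
    return index >= 0 && index < imax;
}
```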


5.2.2   File Descriptor Entry (FDR)

typedef struct fdr {
        coff_addr	adr;    
        coff_long	cbLineOffset;   
        coff_long	cbLine;         
        coff_long	cbSs;           
        coff_int	rss;            
        coff_int	issBase;        
        coff_int	isymBase;      
        coff_int	csym;         
        coff_int	ilineBase;    
        coff_int	cline;       
        coff_int	ioptBase;   
        coff_int	copt;      
        coff_int	ipdFirst;       
        coff_int	cpd;            
        coff_int	iauxBase;      
        coff_int	caux;        
        coff_int	rfdBase;    
        coff_int	crfd;           
        coff_uint	lang : 5;      
        coff_uint	fMerge : 1;  
        coff_uint	fReadin : 1;
        coff_uint	fBigendian : 1;
        coff_uint	glevel : 2;    
        coff_uint	fTrim : 1;    
        coff_uint	reserved: 5;  
        coff_ushort	vstamp;
        coff_uint       reserved2;
} FDR, *pFDR;   

SIZE - 96 bytes, ALIGNMENT - 8 bytes

See Section 5.3.2.1 for related information.

 

File Descriptor Table Entry Fields

adr
Address of first instruction generated from this source file, which should be the same value as found in the PDR.adr field of the first procedure descriptor for this file. If no instructions are associated with this source file, this field should be set to 0. File descriptors that have been merged by source language in locally-stripped objects will have this field set to addressNil (-1).
cbLineOffset
Byte offset from start of packed line numbers to start of entries for this file.
cbLine
Byte size of packed line numbers for this file.
cbSs
Byte size of local string table entries for this file.
rss
Byte offset from start of file's local string table entries to source file name; set to issNil (-1) to indicate the source file name is unknown.
issBase
Start of local strings for this file.
isymBase
Starting index of local symbol entries for this file.
csym
Count of local symbol entries for this file.
ilineBase
Starting index of line number entries (if expanded) for this file.
cline
Count of line number entries (if expanded) for this file.
ioptBase
Byte offset from start of optimization symbol table to optimization symbol entries for this file.
copt
Byte size of optimization symbol entries for this file.
ipdFirst
Starting index of procedure descriptors for this file.
cpd
Count of procedure descriptors for this file.
iauxBase
Starting index of auxiliary symbol entries for this file.
caux
Count of auxiliary symbol entries for this file.
rfdBase
Starting index of relative file descriptors for this file.
crfd
Count of relative file descriptors for this file.
lang
Source language for this file (see Table 5-1).
fMerge
Informs linker whether this file can be merged.
fReadin
True if file was read in (as opposed to just created).
fBigendian
Unused.
glevel
Symbolic information level with which this file was compiled. This value is not the same as the user's idea of debugging levels. The value mapping from the user level (-g compiler switch value) to the symbol table value is:

    Debug switch      -g0   -g1   -g2   -g3
    glevel contents   2     1     0     3

 
fTrim
Unused.
vstamp
Symbol table version stamp (HDRR.vstamp) value from the original object module (.o file) that is recorded by the linker. The linker may combine objects that were compiled at different times and potentially contain different versions of the symbol table. In post-link objects, this value may or may not match the version stamp in the symbolic header. For pre-link objects, the values in this field and the symbolic header stamp should be the same.
reserved, reserved2
Must be zero.
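
Because the glevel encoding does not follow the order of the -g switch values, a lookup table is a convenient way to convert between the two. The following sketch uses hypothetical function names; only the mapping itself is taken from the glevel description.

```c
/* Maps a -g switch level (0..3) to the FDR.glevel encoding given in
 * the text. Returns -1 for a level outside that range. */
int glevel_from_debug_switch(int g_level)
{
    static const int glevel_for[4] = { 2, 1, 0, 3 };

    if (g_level < 0 || g_level > 3)
        return -1;
    return glevel_for[g_level];
}

/* Inverse mapping: FDR.glevel value (0..3) back to the -g level.
 * The mapping swaps 0 and 2 and leaves 1 and 3 alone, so the same
 * table serves both directions. */
int debug_switch_from_glevel(int glevel)
{
    static const int switch_for[4] = { 2, 1, 0, 3 };

    if (glevel < 0 || glevel > 3)
        return -1;
    return switch_for[glevel];
}
```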

 

General Notes

The i*Base fields provide the starting indices of this file's subtables within the symbol table sections. If the associated count fields are set to 0, the base fields will also be set to zero.
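
As an illustration of how a base field and its count delimit a file's subtable, the sketch below converts a file-relative local symbol index into an index into the whole local symbol section. The struct is a reduced, hypothetical stand-in that holds only the two FDR fields involved; the real FDR is declared in sym.h.

```c
/* Reduced stand-in for the FDR fields used here. */
struct fdr_symrange {
    int isymBase;   /* starting index of this file's local symbols */
    int csym;       /* count of this file's local symbols */
};

/* Converts a file-relative local symbol index into an index into the
 * whole local symbol section. Returns -1 if the index falls outside
 * this file's subtable. */
long absolute_isym(const struct fdr_symrange *fd, int file_relative_index)
{
    if (file_relative_index < 0 || file_relative_index >= fd->csym)
        return -1;
    return (long)fd->isymBase + file_relative_index;
}
```

The same base-plus-index pattern applies to the other pairs in the file descriptor, such as ipdFirst and cpd, iauxBase and caux, or rfdBase and crfd.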

For an explanation of packed and expanded line number entries, see the discussion in Section 5.3.2.2.

Table 5-1 Source Language (lang) Constants

    Name            Value   Comment
    langC           0
    langPascal      1
    langFortran     2
    langAssembler   3
    langMachine     4
    langNil         5
    langAda         6
    langPl1         7
    langCobol       8
    langStdc        9
    langMIPSCxx     10      Unused.
    langDECCxx      11
    langCxx         12
    langFortran90   13      Not used by all compilers; langFortran might be used instead for both f77 and f90.
    langBliss       14
    langMax         31      Number of language codes available

 


5.2.3   Procedure Descriptor Entry (PDR)

typedef struct pdr {
        coff_addr	adr;    
        coff_long	cbLineOffset;   
        coff_int	isym;          
        coff_int	iline;        
        coff_uint	regmask;     
        coff_int	regoffset;  
        coff_int	iopt;          
        coff_uint	fregmask;     
        coff_int	fregoffset;  
        coff_int	frameoffset;
        coff_int	lnLow;          
        coff_int	lnHigh;        
        coff_uint	gp_prologue : 8; 
        coff_uint	gp_used : 1;   
        coff_uint	reg_frame : 1;
        coff_uint	prof : 1;      
        coff_uint	reserved : 13; 
        coff_uint	localoff : 8; 
        coff_ushort	framereg;     
        coff_ushort	pcreg;         
} PDR, *pPDR;

SIZE - 64 bytes, ALIGNMENT - 8 bytes

See Section 5.3.4 for related information.

 

Procedure Descriptor Table Entry Fields

adr
The start address of this procedure. Set to addressNil (-1) for procedures with no text. This field may not be updated by the linker in symbol table versions prior to V3.13. To determine the procedure start address in pre-V3.13 symbol tables, use the algorithm described in Section 5.3.4.2.
cbLineOffset
Byte offset to the start of this procedure's line numbers from the start of the file descriptor entry (FDR.cbLineOffset).
isym
Start of local symbols for this procedure. This symbol is the symbol for the procedure (symbol type stProc). The name of the procedure can be obtained from the iss field of the symbol table entry.
If the object is stripped of local symbol information, this field contains an external symbol table index for the procedure symbol's entry.
If this procedure has no symbols associated with it, this field should be set to isymNil (-1). This situation occurs for a static procedure in an object stripped of local symbol information.
iline
Start of line number entries (if expanded) for this procedure. Set to ilineNil (-1) to indicate that this procedure does not have line numbers.
regmask
Saved general register mask.
regoffset
Offset from the virtual frame pointer to the general register save area in the stack frame.
iopt
Start of procedure's optimization symbol entries. Set to ioptNil (-1) to indicate that this procedure does not have optimization symbol entries.
fregmask
Saved floating-point register mask.
fregoffset
Offset from the virtual frame pointer to the floating-point register save area in the stack frame.
frameoffset
Size of the fixed part of the stack frame. The actual frame size can exceed this value. A routine can extend its own frame size for frame sizes larger than 2 GB or for dynamic stack allocation requests.
lnLow
Lowest source line number within this file for the procedure. This is typically the line number of the first instruction in the procedure, but not always. Code optimizations can rearrange or remove instructions, so the first instruction may map to a different line number.
lnHigh
Highest source line number within this file for the procedure. This field contains a value of -1 for alternate entry points, which is how an alternate entry point is identified.
gp_prologue
Byte size of gp prologue.
gp_used
Flag set if the procedure uses gp.
reg_frame
True if the procedure is a light-weight or null-weight procedure. See the General Notes section following these definitions for more details on procedure weights.
prof
True if the procedure has been compiled with -pg for gprof profiling.
reserved
Must be zero.
localoff
Bias value for accessing local symbols on the stack at run time.
framereg
Frame pointer register number.
pcreg
PC (Program Counter) register number.

 

General Notes:

For more information on call frames, see Section 5.3.4.1.

If the value of gp_prologue is zero and gp_used is 1, a gp prologue is present but was scheduled into the procedure prologue.

For an explanation of packed and expanded line number entries, see the discussion in Section 5.3.2.2.

A procedure may be heavy-, light-, or null-weight. The weight of a procedure can be determined from its descriptor by using the following guidelines:

    Weight   Indications
    Heavy    reg_frame is 0 and bit 26 of the register mask (regmask) is on
    Light    reg_frame is 1 and regoffset is ra_save
    Null     reg_frame is 1 and regoffset is 26

 

See the Calling Standard for Alpha Systems for details on the calling conventions for different weight procedures. Note that a calling routine does not need to know the weight of the routine being called.
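
The weight guidelines can be turned into a small classifier. The sketch below is one possible reading of the table, not a definitive implementation: the names are hypothetical, any regoffset other than 26 in a reg_frame procedure is assumed to be the ra_save value (and therefore light-weight), and a descriptor with reg_frame clear and bit 26 off is reported as unknown because the table does not cover it. A helper for the lnHigh convention for alternate entry points is included as well.

```c
#define SKETCH_RA_REGISTER 26   /* bit and register number from the table */

enum proc_weight {
    WEIGHT_HEAVY,
    WEIGHT_LIGHT,
    WEIGHT_NULL,
    WEIGHT_UNKNOWN
};

/* Classifies a procedure from three PDR fields. */
enum proc_weight classify_weight(unsigned reg_frame, unsigned regmask,
                                 int regoffset)
{
    if (reg_frame == 0) {
        /* Heavy-weight: bit 26 of the saved register mask is on. */
        if (regmask & (1u << SKETCH_RA_REGISTER))
            return WEIGHT_HEAVY;
        return WEIGHT_UNKNOWN;   /* case not covered by the table */
    }
    /* reg_frame is 1: a regoffset of 26 marks a null-weight procedure.
     * ASSUMPTION: any other value is ra_save, meaning light-weight. */
    if (regoffset == SKETCH_RA_REGISTER)
        return WEIGHT_NULL;
    return WEIGHT_LIGHT;
}

/* Alternate entry points are identified by lnHigh == -1. */
int is_alternate_entry(int lnHigh)
{
    return lnHigh == -1;
}
```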


5.2.4   Line Number Entry (LINER)

Line numbers are represented using two formats: packed and expanded. The packed format is a byte stream that can be interpreted as described in Section 5.3.2.2 to build an expanded table that maps instructions to source line numbers. The LINER type is used to refer to a single entry in the expanded table. It is declared as:

typedef int LINER, *pLINER;

A second, newer form of line number information is located in the optimization symbols section. See Section 5.2.10 and Section 5.3.2.2.


5.2.5   Local Symbol Entry (SYMR)

typedef struct {
        coff_long	value;       
        coff_int	iss;           
        coff_uint	st : 6;      
        coff_uint	sc  : 5;    
        coff_uint	reserved : 1; 
        coff_uint	index : 20; 
} SYMR, *pSYMR;

SIZE - 16 bytes, ALIGNMENT - 8 bytes

See Section 5.2.11, Section 5.3.4, and Section 5.3.8 for related information.

 

Local Symbol Table Entry Fields

value
A field that can contain an address, size, offset, or index. Its interpretation is determined by the symbol type and storage class combination, as explained in Section 5.2.11.
iss
Byte offset from the issBase field of a file descriptor table entry to the name of the symbol. If the symbol does not have a name, this field is set to issNil (-1). Generally, all user-defined symbols have names. A symbol without a name is one that has been created by the compilation system for its own use.
st
Symbol type (see Table 5-2).
sc
Storage class (see Table 5-3).
reserved
Must be zero.
index
An index into either the local symbol table or auxiliary symbol table, depending on the symbol type and class. The index is used as an offset from the isymBase field in the file descriptor entry for an entry in the local symbol table or an offset from the iauxBase field for an entry in the auxiliary symbol table.
The index field may have a value of indexNil, which is defined as (long)0xfffff. This value is used to indicate that the index is not a valid reference.
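
A tool that reads symbol entries on a host other than the one that produced them cannot rely on the bit-field declarations directly, because C leaves bit-field allocation order to the implementation. The sketch below unpacks the 32-bit word that follows iss by hand. It assumes that bit-fields are allocated starting from the least significant bit, in declaration order; that assumption, the struct, and the function names are illustrative and are not taken from sym.h.

```c
#include <stdint.h>

#define SKETCH_INDEX_NIL 0xfffffu   /* indexNil value from the text */

struct sym_bits {
    unsigned st;      /* symbol type, 6 bits */
    unsigned sc;      /* storage class, 5 bits */
    unsigned index;   /* 20-bit index */
};

/* ASSUMPTION: st occupies bits 0-5, sc bits 6-10, reserved bit 11,
 * and index bits 12-31 of the word. */
struct sym_bits unpack_sym_bits(uint32_t word)
{
    struct sym_bits b;

    b.st    = word & 0x3fu;
    b.sc    = (word >> 6) & 0x1fu;
    b.index = (word >> 12) & 0xfffffu;
    return b;
}

/* True when the index field does not hold a valid reference. */
int index_is_nil(unsigned index)
{
    return index == SKETCH_INDEX_NIL;
}
```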

The next two tables contain all defined values for the st and sc constants, along with short descriptions. However, these fields must be considered as pairs that have a limited number of possible pairings as explained in Section 5.2.11.

Table 5-2 Symbol Type (st) Constants

    Constant       Value   Description
    stNil          0       Dummy entry
    stGlobal       1       Global variable
    stStatic       2       Static variable
    stParam        3       Procedure argument
    stLocal        4       Local variable
    stLabel        5       Label
    stProc         6       Global procedure
    stBlock        7       Start of block
    stEnd          8       End of block, file, or procedure
    stMember       9       Member of class, structure, union, or enumeration
    stTypedef      10      User-defined type definition
    stFile         11      Source file name
    stStaticProc   14      Static procedure
    stConstant     15      Constant data
    stBase         17      Base class (for example, C++)
    stVirtBase     18      Virtual base class (for example, C++)
    stTag          19      Data structure tag value (for example, C++ class or struct)
    stInter        20      Interlude (for example, C++)
    stModule       22      Fortran90 module definition; not yet implemented
    stNamespace    22      Namespace definition (for example, C++)
    stModview      23      Modifiers for current view of given module; not yet implemented
    stUsing        23      Namespace use (for example, C++ "using")
    stAlias        24      Defines an alias for another symbol. Currently used only for namespace aliases.

 

Table 5-3 Storage Class (sc) Constants

    Constant         Value   Description
    scNil            0       Dummy entry
    scText           1       Symbol allocated in the .text section
    scData           2       Symbol allocated in the .data section
    scBss            3       Symbol allocated in the .bss section
    scRegister       4       Symbol allocated in a register
    scAbs            5       Symbol value is absolute
    scUndefined      6       Symbol referenced but not defined in the current module
    scUnallocated    7       Storage not allocated for this symbol
    scTlsUndefined   9       Undefined TLS symbol
    scInfo           11      Symbol contains debugger information
    scSData          13      Symbol allocated in the .sdata section
    scSBss           14      Symbol allocated in the .sbss section
    scRData          15      Symbol allocated in the .rdata section
    scVar            16      Parameter passed by reference (for example, Fortran or Pascal)
    scCommon         17      Common symbol
    scSCommon        18      Small common symbol
    scVarRegister    19      Parameter passed by reference in a register
    scVariant        20      Variant record (for example, Pascal or Ada)
    scFileDesc       20      File descriptor (for example, COBOL)
    scSUndefined     21      Small undefined symbol
    scInit           22      Symbol allocated in the .init section
    scReportDesc     23      Report descriptor (for example, COBOL)
    scXData          24      Symbol allocated in the .xdata section
    scPData          25      Symbol allocated in the .pdata section
    scFini           26      Symbol allocated in the .fini section
    scRConst         27      Symbol allocated in the .rconst section
    scTlsCommon      29      TLS unallocated data
    scTlsData        30      Symbol allocated in the .tlsdata section
    scTlsBss         31      Symbol allocated in the .tlsbss section
    scMax            32      Maximum number of storage classes


5.2.6   External Symbol Entry (EXTR)

typedef struct {
        SYMR          asym;     
        coff_uint     jmptbl:1;    
        coff_uint     cobol_main:1;  
        coff_uint     weakext:1;    
        coff_uint     reserved:29; 
        coff_int      ifd;         
} EXTR, *pEXTR;

SIZE - 24 bytes, ALIGNMENT - 8 bytes

 

External Symbol Table Entry Fields

asym
External symbol table entry. This structure has the same format as a local symbol entry. The field interpretations differ somewhat:
value
Contains the symbol address for most defined symbols. See Section 5.2.11 for details.
iss
Byte offset in external string table to symbol name. Set to issNil (-1) if there is no name for this symbol.
st
Symbol type. See Table 5-2 for possible values.
sc
Storage class. See Table 5-3 for possible values.
reserved
Must be zero.
index
Can contain an index into the auxiliary symbol table for a type description or an index into the local symbol table pointing to a related symbol.
jmptbl
Unused.
cobol_main
Flag set to indicate that the symbol is a COBOL main procedure.
weakext
Flag set to identify the symbol as a weak external. See Section 6.3.4.2 for more details on weak symbols.
reserved
Must be zero.
ifd
Index of the file descriptor where the symbol is defined. Set to ifdNil (-1) for undefined symbols and for some compiler system symbols.


5.2.7   Relative File Descriptor Entry (RFDT)

The relative file descriptor table provides a post-link mapping of file descriptor indices. The purpose of this table is to minimize work for the linker, which does not update symbol table references to local symbols. This information is used to obtain the file offset used to bias local symbol indices. Because this table is also known as the File Indirect Table, two declarations are included in the sym.h header file, as shown here.

typedef int RFDT, *pRFDT;
typedef int FIT, *pFIT;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

See Section 5.3.2.1 for related information.


5.2.8   Auxiliary Symbol Table Entry (AUXU)

The auxiliary symbol table entry is a 32-bit union. It is either interpreted as a TIR or RNDXR structure or as an integer value. See Section 5.3.7.3 for detailed instructions on reading the auxiliary symbols.

typedef union {
        TIR 		ti;             
        RNDXR		rndx;          
        coff_int	dnLow;        
        coff_int	dnHigh;      
        coff_int	isym;       
        coff_int	iss;       
        coff_int	width;    
        coff_int	count;   
} AUXU, *pAUXU;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

See Section 5.3.7.3 for related information.

 

Auxiliary Symbol Table Entry Fields

ti
Type information record (TIR), as defined in Section 5.2.8.1.
rndx
Relative index into local or auxiliary symbols (RNDXR), as defined in Section 5.2.8.2.
dnLow
Lower bound of range or array dimension. For large structures, two of these fields can be used together to form one 64-bit number.
dnHigh
Upper bound of range or array dimension. For large structures, two of these fields can be used together to form one 64-bit number.
isym
For procedures (stProc or stStaticProc symbols), this field is an index into the local symbols. It is also used as an index into the relative file descriptors.
iss
Unused.
width
Width of a bit field or array stride in bits. Fortran compilers set the array stride to the array element size in bits. Two of these fields can be used together to form one 64-bit number.
count
Count of ranges for variant arm. This field name is only used within the type description of a variant block (stBlock, scVariant).

General Notes:

The fields dnLow, dnHigh, and width must all use the same representation, either 32-bit or 64-bit, when used together. For example, an array dimension cannot be specified with a 32-bit dnLow and a 64-bit dnHigh.
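
When two consecutive 32-bit auxiliary entries form one 64-bit number, a reader has to join the halves. The sketch below shows one way to do that. The order of the halves is an assumption: this section does not state which entry comes first, and the code treats the first entry as the low-order 32 bits. The function name is hypothetical.

```c
#include <stdint.h>

/* Joins two consecutive 32-bit auxiliary entries into one signed
 * 64-bit value.
 * ASSUMPTION: the first entry carries the low-order 32 bits and the
 * second entry the high-order 32 bits. */
int64_t join_aux_pair(uint32_t first, uint32_t second)
{
    uint64_t bits = ((uint64_t)second << 32) | first;

    return (int64_t)bits;
}
```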


5.2.8.1   Type Information Record (TIR)

typedef struct {
        coff_uint	fBitfield : 1;
        coff_uint	continued : 1;
        coff_uint 	bt  : 6;     
        coff_uint 	tq4 : 4;
        coff_uint 	tq5 : 4;
        coff_uint 	tq0 : 4;
        coff_uint 	tq1 : 4;    
        coff_uint 	tq2 : 4;
        coff_uint 	tq3 : 4;
} TIR, *pTIR;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

Type Information Record Entry Fields

fBitfield
Flag set if bit width is specified.
continued
Flag set to indicate that the type description is continued in another TIR record. This will happen if the type is represented with more than six type qualifiers.
bt
Basic type (see Table 5-4 and Section 5.3.7.1).
tq0, tq1, tq2, tq3, tq4, tq5
Type qualifiers (see Table 5-5 and Section 5.3.7.2). The lower-numbered tq fields must be used first, and all unneeded fields must be set to tqNil (0).
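
The declaration order of the tq fields (tq4 and tq5 precede tq0 through tq3) matters when a TIR is decoded by hand. The sketch below unpacks a 32-bit TIR word and counts the qualifiers in use. It assumes bit-fields are allocated from the least significant bit in declaration order; the struct and function names are hypothetical, and the meaning of the qualifier order is left to Section 5.3.7.2.

```c
#include <stdint.h>

struct tir_fields {
    unsigned fBitfield;
    unsigned continued;
    unsigned bt;
    unsigned tq[6];   /* tq0..tq5 in logical order */
};

/* ASSUMPTION: fields occupy the word from bit 0 upward in declaration
 * order: fBitfield(1) continued(1) bt(6) tq4(4) tq5(4) tq0(4) tq1(4)
 * tq2(4) tq3(4). */
struct tir_fields unpack_tir(uint32_t w)
{
    struct tir_fields t;

    t.fBitfield = w & 1u;
    t.continued = (w >> 1) & 1u;
    t.bt        = (w >> 2) & 0x3fu;
    t.tq[4]     = (w >> 8) & 0xfu;
    t.tq[5]     = (w >> 12) & 0xfu;
    t.tq[0]     = (w >> 16) & 0xfu;
    t.tq[1]     = (w >> 20) & 0xfu;
    t.tq[2]     = (w >> 24) & 0xfu;
    t.tq[3]     = (w >> 28) & 0xfu;
    return t;
}

/* Counts the qualifiers in use. Lower-numbered tq fields are filled
 * first and unused fields hold tqNil (0), so counting stops at the
 * first zero. */
int tir_qualifier_count(const struct tir_fields *t)
{
    int n = 0;

    while (n < 6 && t->tq[n] != 0)
        n++;
    return n;
}
```

If all six qualifiers are in use, the continued flag indicates whether another TIR record follows with further qualifiers.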

Table 5-4 Basic Type (bt) Constants

    Constant      Value   Description
    btNil         0       Undefined or void
    btAdr32       1       Address
    btChar        2       Character
    btUChar       3       Unsigned character
    btShort       4       Short (16 bits)
    btUShort      5       Unsigned short (16 bits)
    btInt         6       Integer (32 bits)
    btUInt        7       Unsigned integer (32 bits)
    btLong32      8       Long (32 bits)
    btULong32     9       Unsigned long (32 bits)
    btFloat       10      Floating point
    btDouble      11      Double-precision floating point
    btStruct      12      Structure or record
    btUnion       13      Union
    btEnum        14      Enumeration
    btTypedef     15      Defined by means of a user-defined type definition
    btRange       16      Range of values (for example, Pascal subrange)
    btSet         17      Sets (for example, Pascal)
    btComplex     18      Currently unused
    btDComplex    19      Currently unused
    btIndirect    20      Indirect definition; following rndx points to an entry in the auxiliary symbol table that contains a TIR (type information record)
    btFixedBin    21      Fixed binary (for example, COBOL)
    btDecimal     22      Packed or unpacked decimal (for example, COBOL)
    btPicture     25      Picture (for example, COBOL)
    btVoid        26      Void
    btPtrMem      27      Currently unused
    btScaledBin   27      Scaled binary (for example, COBOL)
    btVptr        28      Virtual function table (for example, C++)
    btArrayDesc   28      Array descriptor (for example, Fortran, Pascal)
    btClass       29      Class (for example, C++)
    btLong64      30      Long (64 bits)
    btLong        30      Long (64 bits)
    btULong64     31      Unsigned long (64 bits)
    btULong       31      Unsigned long (64 bits)
    btLongLong    32      Long long (64 bits)
    btULongLong   33      Unsigned long long (64 bits)
    btAdr64       34      Address (64 bits)
    btAdr         34      Address (64 bits)
    btInt64       35      Integer (64 bits)
    btUInt64      36      Unsigned integer (64 bits)
    btLDouble     37      Long double floating point (128 bits)
    btInt8        38      Integer (64 bits)
    btUInt8       39      Unsigned integer (64 bits)
    btRange_64    41      64-bit range
    btProc        42      Procedure or function
    btChecksum    63      Symbol table checksum value stored in auxiliary record
    btMax         64      Number of basic type codes

Table Notes:

  1. btInt and btLong32 are synonymous.
  2. btUInt and btULong32 are synonymous.
  3. btLong, btLong64, btLongLong, btInt64, and btInt8 are synonymous.
  4. btULong, btULong64, btULongLong, btUInt64, and btUInt8 are synonymous.

 

Table 5-5 Type Qualifier (tq) Constants

    Constant       Value   Description
    tqNil          0       No qualifier (placeholder)
    tqPtr          1       Pointer
    tqProc         2       Procedure or function (obsolete)
    tqArray        3       Array
    tqFar          4       32-bit pointer; used with the -xtaso emulation
    tqVol          5       Volatile
    tqConst        6       Constant
    tqRef          7       Reference
    tqArray_64     8       Large array
    tqHasLen       9       Reserved
    tqShar         10      Reserved
    tqSharArr_64   11      Reserved
    tqMax          16      Number of type qualifier codes


5.2.8.2   Relative Symbol Record (RNDXR)

typedef struct {
        coff_uint	rfd : 12;    
        coff_uint	index : 20; 
} RNDXR, *pRNDXR;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

Relative Symbol Record Fields

rfd
Index into relative file descriptor table if it exists; otherwise, index into file descriptor table.
This field may have a value of ST_RFDESCAPE, defined as 0xfff in the header file cmplrs/stsupport.h. This value is used to indicate that the next auxiliary entry, interpreted as an isym, contains the index.
index
Symbol index. Used as an offset from either FDR.isymBase or FDR.iauxBase, depending on context.
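
Reading an RNDXR involves two steps: unpacking the record (including the ST_RFDESCAPE case) and mapping rfd to a file descriptor index. The sketch below illustrates both under several assumptions: bit-fields are allocated from the least significant bit, the entry after an escaped RNDXR carries the file index, and "the table exists" is interpreted as the current file having a nonzero crfd. The names are hypothetical, and rfd_entries is taken to point at the current file's slice of the relative file descriptor table (starting at its rfdBase).

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_ST_RFDESCAPE 0xfffu   /* escape value quoted in the text */

struct rndx {
    unsigned rfd;
    unsigned index;
};

/* ASSUMPTION: rfd occupies bits 0-11 and index bits 12-31.
 * aux points at the RNDXR entry. When rfd holds the escape value, the
 * following entry is read as an isym carrying the file index.
 * *consumed receives the number of auxiliary entries used (1 or 2). */
struct rndx read_rndx(const uint32_t *aux, size_t *consumed)
{
    struct rndx r;

    r.rfd   = aux[0] & 0xfffu;
    r.index = (aux[0] >> 12) & 0xfffffu;
    *consumed = 1;
    if (r.rfd == SKETCH_ST_RFDESCAPE) {
        r.rfd = aux[1];
        *consumed = 2;
    }
    return r;
}

/* Maps rfd to a file descriptor index. Returns -1 if rfd is out of
 * range for this file's relative file descriptor entries. */
int resolve_ifd(unsigned rfd, const int *rfd_entries, int crfd)
{
    if (crfd <= 0)
        return (int)rfd;      /* no table: rfd is already an FDR index */
    if (rfd >= (unsigned)crfd)
        return -1;
    return rfd_entries[rfd];
}
```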


5.2.9   String Table

The string table is composed of two parts: the local string table and the external string table. In the on-disk symbol table, the external strings follow the local strings. The local string table is present only for objects created with full debugging information; it is removed if an object is locally stripped.

The storage format for the string table is a list of null-terminated character strings. It is best treated as one long character array, not as an array of strings. Fields in the symbolic header and file descriptors represent string table sizes and offsets in bytes.
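
Because the string table is a single character array addressed by byte offsets, looking up a name is a matter of pointer arithmetic. The following sketch uses hypothetical helper names and assumes the caller has already located the start of the local and external string tables in memory. The offset rules (FDR.issBase plus SYMR.iss for local symbols, iss alone for external symbols, issNil for unnamed symbols) come from the field descriptions earlier in this chapter.

```c
#include <stddef.h>

#define SKETCH_ISS_NIL (-1)   /* issNil value from the text */

/* Returns the name of a local symbol, or NULL if it has none.
 * local_strings: start of the local string table.
 * issBase: start of the file's strings (from its file descriptor).
 * iss: byte offset of the name relative to issBase. */
const char *local_symbol_name(const char *local_strings, int issBase, int iss)
{
    if (iss == SKETCH_ISS_NIL)
        return NULL;
    return local_strings + issBase + iss;
}

/* Returns the name of an external symbol, or NULL if it has none.
 * iss is a byte offset from the start of the external string table. */
const char *external_symbol_name(const char *ext_strings, int iss)
{
    if (iss == SKETCH_ISS_NIL)
        return NULL;
    return ext_strings + iss;
}
```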


5.2.10   Optimization Symbol Entry (PPODHDR)

typedef struct {
        coff_uint	ppode_tag;
        coff_uint 	ppode_len;
        coff_ulong 	ppode_val;
} PPODHDR, *pPPODHDR;

SIZE - 16 bytes, ALIGNMENT - 8 bytes

See Section 5.3.3 for related information.

 

Optimization Symbol Entry Fields

ppode_tag
Identifies the kind of data described by this entry.
ppode_len
Indicates the size in bytes of the data that is found in the raw data area for this entry. When this field is zero, the only data is stored in the ppode_val field.
ppode_val
This field is either a pointer to the entry's data or is itself the data. If ppode_len is nonzero, this field is a relative file offset from the beginning of the current Per-Procedure Optimization Descriptor (PPOD) to the applicable data area. If ppode_len is zero, this field contains the data for the entry.
 

Table 5-6 Optimization Tag Values

    Name                    Value   Description
    PPODE_STAMP             1       Version number of the PPOD stored in ppode_val. The current PPOD_VERSION value is 1.
    PPODE_END               2       End of entries for this PPOD
    PPODE_EXT_SRC           3       Extended source line information
    PPODE_SEM_EVENT         4       Semantic event information. (Reserved for future use.)
    PPODE_SPLIT             5       Split lifetime information. (Reserved for future use.)
    PPODE_DISCONTIG_SCOPE   6       Discontiguous scope information. (Reserved for future use.)
    PPODE_INLINED_CALL      7       Inlined procedure call information. (Reserved for future use.)
    PPODE_PROFILE_INFO      8       Profile feedback information.
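
The relationship between ppode_len and ppode_val can be illustrated with a short walk over the entries of one PPOD. This is a sketch under assumptions, not the layout definition from Section 5.3.3: it assumes a PPOD begins with consecutive PPODHDR entries terminated by a PPODE_END entry, and that offsets in ppode_val are measured from the first of those entries. The struct uses fixed-width types in place of coff_uint and coff_ulong, and all names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_PPODE_END 2   /* tag value from Table 5-6 */

/* Field layout copied from the PPODHDR declaration. */
struct ppod_entry {
    uint32_t ppode_tag;
    uint32_t ppode_len;
    uint64_t ppode_val;
};

/* Finds the first entry with the given tag in one PPOD. Returns NULL
 * if PPODE_END is reached first or max_entries have been examined. */
const struct ppod_entry *ppod_find(const struct ppod_entry *ppod,
                                   size_t max_entries, uint32_t tag)
{
    size_t i;

    for (i = 0; i < max_entries; i++) {
        if (ppod[i].ppode_tag == tag)
            return &ppod[i];
        if (ppod[i].ppode_tag == SKETCH_PPODE_END)
            break;
    }
    return NULL;
}

/* Returns a pointer to an entry's data: the raw data area when
 * ppode_len is nonzero, otherwise the ppode_val field itself. */
const void *ppod_entry_data(const struct ppod_entry *ppod,
                            const struct ppod_entry *e)
{
    if (e->ppode_len != 0)
        return (const unsigned char *)ppod + e->ppode_val;
    return &e->ppode_val;
}
```

The max_entries bound is a defensive choice so that a PPOD missing its PPODE_END entry cannot cause a read past the end of the section.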

 


5.2.11   Symbol Type and Class (st/sc) Combinations

Entries in the symbol table are primarily identified by the combination of their symbol type (st) and storage class (sc) values. Not all combinations are valid. Figure 5-3 indicates which combinations are currently in use.

Figure 5-3 st/sc Combination Matrix

Interpretation of storage class column labels:
    Ab. scAbs         RC. scRConst        TC. scTlsCommon
    BV. scBasedVar    RD. scRData         TD. scTlsData
    Bi. scBits        RI. scRegImage      TU. scTlsUndefined
    Bs. scBss         Re. scRegister      Ua. scUnallocated
    Co. scCommon      Rp. scReportDesc    Un. scUndefined
    Da. scData        SB. scSBss          US. scUserStruct
    FD. scFileDesc    SC. scSCommon       Va. scVar
    Fi. scFini        SD. scSData         VR. scVarRegister
    If. scInfo        SU. scSUndefined    Vt. scVariant
    In. scInit        Sy. scSymref        XD. scXData
    Ni. scNil         Te. scText
    PD. scPData       TB. scTlsBss 


              sc |ABBBC|DFFII|NPRRR|RRSSS|SSTTT|TTUUU|VVVX
    st           |bViso|aDifn|iDCDI|epBCD|UyeBC|DUanS|aRtD
    -------------+-----+-----+-----+-----+-----+-----+----
    stAlias      |     |   X |     |     |     |     |
    stBase       |     |   X |     |     |     |     |
    stBlock      |    X| X X |     | X   |  X  |     |  X
    stConstant   |X  X |X  X |  X  |  X X|     |     |
    stEnd        |    X| X X |     | X   |  X  |     |  X
    stExpr       |     |     |     |     |     |     |
    stFile       |     |     |     |     |  X  |     |
    stForward    |     |     |     |     |     |     |
    stGlobal     |   XX|X    |  XX |  XXX|X  XX|XX X |
    stInter      |     |   X |     |     |     |     |
    stLabel      |X  X |X X X| XXX |  X X|  XX |X X  |   X
    stLocal      |X  X |X X X| XXX |X X X|  XX |X X  |XX X
    stMember     |     | X X |     | X   |     |     |
    stModule     |     |     |     |     |     |     |
    stModview    |     |     |     |     |     |     |
    stNamespace  |     |   X |     |     |     |     |
    stNil        |     |     |     |     |     |     |
    stNumber     |     |     |     |     |     |     |
    stParam      |X  X |X  X |  XX |X X X|     |  X  |XX
    stProc       |     |   X |X    |     |  X  |   X |
    stRegReloc   |     |     |     |     |     |     |
    stSplit      |     |     |     |     |     |     |
    stStaParam   |     |     |     |     |     |     |
    stStatic     |   XX|X  X |  XX |  X X|   X |X    |
    stStaticProc |     |  X X|     |     |  X  |     |
    stStr        |     |     |     |     |     |     |
    stTag        |     |   X |     |     |     |     |
    stType       |     |     |     |     |     |     |
    stTypedef    |     |   X |     |     |     |     |
    stUsing      |     |   X |     |     |     |     |
    stVirtBase   |     |   X |     |     |     |     |

A symbol's type and class, taken together, determine the interpretation of other fields in the symbol table entry. The same combination can be used for different purposes in different contexts. As a result, to understand the symbol entry, it may also be necessary to access type information in the auxiliary table or the source language information in the file descriptor.

The contents of the value and index fields for each combination, with a brief explanation of the symbol's use, are described in the following list of combinations. For many combinations, greater detail can be found in Section 5.3.7 and Section 5.3.8.

stGlobal, sc(S)Data/(S)Bss/RData/Rconst

stGlobal, scTlsData/TlsBss

stGlobal, sc(S)Common/TlsCommon

stGlobal, sc(S)Undefined/TlsUndefined

stStatic, sc(S)Data/(S)Bss/RData/Rconst

stStatic, scTlsData/TlsBss

stStatic, scCommon

stStatic, scInfo

stParam, scAbs

stParam, scRegister

stParam, scVar

stParam, scVarRegister

stParam, scInfo

stParam, sc(S)Data/(S)Bss/Rconst/Rdata

stParam, scUnallocated

stLocal, scAbs

stLocal, scRegister

stLocal, scVar

stLocal, scVarRegister

stLocal, scUnallocated

stLocal, scText/Init/Fini/(S)Data/(S)Bss/Rconst/Rdata/TlsData/TlsBss

stLabel, scAbs

stLabel, scText/Init/Fini/(S|X|P|R)Data/(S)Bss/Rconst/TlsData/TlsBss

stLabel, scUnallocated

stProc, scNil

stProc, scText

stProc, scUndefined

stProc, scInfo

stBlock, scText

stBlock, scInfo

stBlock, scCommon

stBlock, scVariant

stBlock, scFileDesc/scReportDesc

stEnd, scText

stEnd, scInfo

stEnd, scCommon

stEnd, scVariant

stEnd, scFileDesc/scReportDesc

stMember, scInfo

stMember, scFileDesc/scReportDesc

stTypedef, scInfo

stFile, scText

stStaticProc, scText

stStaticProc, scInit/Fini

stConstant, scInfo

stConstant, scAbs

stConstant, sc(S)Data/(S)Bss/RData/Rconst

stBase, scInfo

stVirtBase, scInfo

stTag, scInfo

stInter, scInfo

stNamespace, scInfo

stUsing, scInfo

stAlias, scInfo

 

Combinations may be valid in the local symbol table, the external symbol table, or both. Table 5-7 shows which combinations are valid in which table, based on the symbol type value and also the storage class value where necessary. Only combinations previously specified as valid apply where the storage class value is shown as a wildcard value with the character '*'.

Table 5-7 Valid Placement for st/sc Combinations

    st/sc Combination           External Symbol Table   Local Symbol Table
    stNil, *                    X                       X
    stGlobal, *                 X
    stStatic, *                                         X
    stParam, *                                          X
    stLocal, scSCN (1)          X
    stLocal, not scSCN (1)                              X
    stLabel, *                  X                       X
    stProc, scInfo                                      X
    stProc, scText              X                       X
    stProc, scUndefined         X
    stBlock, *                                          X
    stEnd, *                                            X
    stMember, *                                         X
    stTypedef, *                                        X
    stFile, *                                           X
    stStaticProc, scText                                X
    stStaticProc, scInit/Fini   X
    stConstant, *               X                       X
    stBase, *                                           X
    stVirtBase, *                                       X
    stTag, *                                            X
    stInter, *                                          X
    stNamespace, *                                      X
    stUsing, *                                          X
    stAlias, *                                          X

Table Notes:

  1. scSCN = scData, scSData, scBss, scSBss, scRConst, scRData, scInit, scFini, scText, scXData, scPData, scTlsData, scTlsBss, scTlsInit

 


5.3   Symbol Table Usage


5.3.1   Levels of Symbolic Information

Different levels of symbolic information can be stored with an object file. Compilers often provide options that allow the user to choose the desired level of symbolic information for their program. This choice may be influenced by size considerations and debugging needs. A trade-off exists between the benefit of saving space in the object file and the amount of information available to tools that consume symbolic information.

It is also possible to change the amount of symbolic information present in a program that has already been compiled and linked. Information can be added or deleted. Two of the most common and useful operations are locally stripping and fully stripping the symbol tables in executable files. Tools that modify linked executables, such as instrumentation tools and code optimizers, may rewrite parts of the symbol table to reflect changes that they made.


5.3.1.1   Compilation Levels

The representation of symbolic information supported by compilers can be broken down into four levels:

  1. Minimal: Only information required for linking
  2. Limited: Source file and line number information for profiling and limited debugging (stack-tracing)
  3. Full: Complete debugging information for non-optimized code
  4. Optimized: Debugging information for optimized code

These levels correspond to the system compiler switches -g0 (minimal), -g1 (limited), -g2 (full), and -g3 (optimized). Table 5-8 shows the symbol table sections that are produced by system compilers at each compilation level.

Table 5-8 Symbol Table Sections Produced at Various Compilation Levels

                                 Compilation Level
    Symbol Table Section         Minimal   Limited   Full   Optimized
    Symbolic header              Yes       Yes       Yes    Yes
    File Descriptors             Yes       Yes       Yes    Yes
    External Symbols             Yes       Yes       Yes    Yes
    External Strings             Yes       Yes       Yes    Yes
    Procedure Descriptors        Yes       Yes       Yes    Yes
    Line Numbers                 No        Yes       Yes    Yes
    Relative File Descriptors    No        No        Yes    Yes
    Optimization Symbols         No        Partial   Yes    Yes
    Local Symbols                No        Partial   Yes    Yes
    Local Strings                No        Partial   Yes    Yes
    Auxiliary Symbols            No        Partial   Yes    Yes

 

The minimal level of symbolic information that may be produced during compilation includes only the symbol information required for the linker to function properly. This includes external symbol information that is needed to perform symbol resolution and relocation.

If the limited level of symbolic information is requested, line number entries are generated, but the auxiliary table will contain only external symbol entries. Again, external symbols and procedure descriptors are available. In addition, local symbols for procedures (and the corresponding auxiliary symbols, optimization symbols, and local strings) are present. Limited symbolic information is sufficient to meet the needs of profiling tools. The information present at this level is a subset of that required for full debugger support.

If full symbolic information is included, all symbol table sections are produced in full. This level enables full debugging support with complete type descriptions for local and external symbols. Optimization is disabled.

Optimized symbolic information is designed to balance the aims of performance and debugging capabilities. This level supplies the same information as the full debugging option, but it also allows all compiler optimizations. As a result, some of the correlation is lost between the source code and the executable program.

On Tru64 UNIX systems, users can choose to compile their programs with any one of the four levels of symbolic information. The options -g0, -g1, and -g2 specify increasing levels of symbolic information. The system compiler's default is to produce the minimal level (-g0). Currently, debugging of optimized code (-g3) is not fully supported. See cc(1) for more details.


5.3.1.2   Locally Stripped Images

Objects can be produced with only global symbolic information stored in the symbol table. Selection of the -x option causes the linker to create a locally-stripped object. Reasons for stripping local symbolic information include reducing file size and limiting the amount of symbolic information available to end users of an application.

A locally-stripped object is very similar to an object produced with minimal symbolic information (see Section 5.3.1.1). The difference is the consolidation of file descriptors, which the linker does only for locally-stripped objects.

In a locally-stripped image, the file descriptors are included solely for the purpose of identifying source file languages. One file descriptor is present for each source language involved in the compilation. These file descriptors have their adr field set to addressNil, indicating that they cannot be used to identify text addresses.

The procedure descriptor table is present in full but is rearranged to group procedures by source language. All procedure descriptors for procedures written in a particular source language are thus contiguous, and they reflect the file descriptor's information.

External symbols are also present in a locally-stripped image. The file indices (ifd field) of the external symbols are updated to identify the generic file descriptor for the appropriate source language. The index fields are set to zero to indicate that no type information is available. External symbols with the storage class scNil are removed. These are debugging symbols that are not normally produced for minimal symbol tables.

Limited debugging is possible with locally-stripped objects. Because the procedure descriptors are retained, stack traces are possible. External symbol information can also be viewed, and language-dependent handling of symbols (for example, C++ name demangling) is preserved.

A linked executable file can be locally stripped at any time after its creation using the ostrip -x option. The output is the same as described above. This operation may also alter the raw data of the .comment section. See Chapter 7 for details.


5.3.1.3   (Fully) Stripped Images

Executable files may be fully stripped at any time after creation using either the strip command or the ostrip -s command. Stripping an executable will result in complete removal of the symbol table, including the symbolic header. The file header fields f_symptr and f_nsyms are set to zero to indicate that the file has been stripped.

This operation may also alter the raw data of the .comment section. See Chapter 7 for details.
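
The stripped-file test described above can be sketched in C. This is only an illustration: the structure below is a hypothetical, reduced view that models just the two file header fields named in the text (f_symptr and f_nsyms), and the field widths are assumptions rather than the real header layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced view of the file header. Only the two fields
 * named in the text are modeled; the widths are assumptions. */
struct filehdr_view {
    uint64_t f_symptr;  /* zero in a fully stripped file */
    int32_t  f_nsyms;   /* zero in a fully stripped file */
};

/* Returns 1 when both fields are zero, which the text gives as the
 * indication that the symbol table has been removed; otherwise 0. */
static int is_fully_stripped(const struct filehdr_view *fh)
{
    assert(fh != 0);
    return fh->f_symptr == 0 && fh->f_nsyms == 0;
}
```

A consumer would typically run a check of this kind before trying to locate the symbolic header.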


5.3.2   Source Information

The final executable image for a program bears little resemblance to the source code files from which it was created. One of the principal functions of the symbol table is to track the relationship between the two so that the debugger is able to describe the resulting program in a way that the programmer can recognize.


5.3.2.1   Source Files

Much of the complication of source information stems from the "include" system. When a compilation involves several source files, there may be duplication of the header files included in each source file, or of the source files themselves. To avoid repetition of header file information in the linked object, the linker merges the input objects' included files wherever possible. Compilers mark file descriptors as mergeable or unmergeable. The linker then examines the input file descriptors and performs the merge whenever possible.

The linker considers two file descriptors to be mergeable if all of the following criteria are met:

  1. The file descriptor fMerge bit is set in both (marked as mergeable by compiler).
  2. Files have the same name.
  3. Files are written in the same language.
  4. Files contain the same number of local and auxiliary symbols.
  5. Checksums match.
    The checksums match if either:
    1. Neither file's first auxiliary record is a btChecksum.
    2. Both files' first auxiliary record is a btChecksum and they are identical.
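
The five criteria above can be written as a single predicate. The following C sketch is illustrative only: struct fd_summary is a hypothetical digest of one file descriptor, and apart from fMerge its member names are invented for this example and do not mirror the real FDR layout.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical digest of one file descriptor. Apart from fMerge, the
 * member names are illustrative and are not taken from the FDR. */
struct fd_summary {
    int           fMerge;        /* mergeable bit set by the compiler     */
    const char   *name;          /* source file name                      */
    int           lang;          /* source language code                  */
    long          nlocals;       /* number of local symbols               */
    long          naux;          /* number of auxiliary symbols           */
    int           has_checksum;  /* first auxiliary record is btChecksum  */
    unsigned long checksum;      /* meaningful only if has_checksum != 0  */
};

/* Returns 1 if the two descriptors satisfy all five criteria, else 0. */
static int fds_mergeable(const struct fd_summary *a, const struct fd_summary *b)
{
    assert(a != 0 && b != 0);
    if (!a->fMerge || !b->fMerge) return 0;                  /* criterion 1 */
    if (strcmp(a->name, b->name) != 0) return 0;             /* criterion 2 */
    if (a->lang != b->lang) return 0;                        /* criterion 3 */
    if (a->nlocals != b->nlocals || a->naux != b->naux)      /* criterion 4 */
        return 0;
    /* Criterion 5: either neither file has a checksum record, or both
     * have one and the two checksums are identical. */
    if (!a->has_checksum != !b->has_checksum) return 0;
    if (a->has_checksum && a->checksum != b->checksum) return 0;
    return 1;
}
```

Note that a file with a checksum record never merges with one that lacks it, even if every other criterion holds.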

The role of the relative file descriptor (RFD) tables is to track file-relative information after merging. A relative file descriptor table entry maps the index of each file at compile time to its index after linking. After linking, local or auxiliary symbols must be accessed through the RFD table to obtain the updated file descriptor index. This mechanism is necessary because the indices in the local symbol table are not updated when files are merged.

Figure 5-4 is an example of the use of the relative file descriptor table.

Figure 5-4 Relative File Descriptor Table Example

For a symbol reference composed of a file index and symbol index (offset within file), the relative file descriptor table is used as follows:

  1. Look up the given file index in the RFD table to get the updated file index.
  2. Look up the updated file index in the (merged) file descriptor table to get the base of the symbols for that file.
  3. Add the symbol index to the file's base to access the symbol entry.
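
The three lookup steps can be sketched in C as follows. The types are hypothetical reductions: struct fd_view keeps only a symbol base (named isymBase here on the assumption that it corresponds to the file descriptor field of that name), and the RFD table is modeled as a plain array of post-link file indices for the file that contains the reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced file descriptor: isymBase is assumed to be the
 * index of the file's first entry in the merged local symbol table. */
struct fd_view {
    long isymBase;
};

/* Resolves a (file index, symbol index) reference.
 * rfd/crfd : RFD entries of the referencing file, and their count
 * fds/nfds : merged file descriptor table, and its size
 * Returns the absolute symbol index, or -1 if an index is out of range. */
static long resolve_symbol(const unsigned int *rfd, size_t crfd,
                           const struct fd_view *fds, size_t nfds,
                           size_t file_index, long sym_index)
{
    unsigned int new_ifd;

    assert(rfd != 0 && fds != 0);
    if (file_index >= crfd) return -1;
    new_ifd = rfd[file_index];                 /* step 1: updated file index */
    if (new_ifd >= nfds) return -1;
    return fds[new_ifd].isymBase + sym_index;  /* steps 2 and 3              */
}
```

For example, if compile-time file 0 maps to merged file 2 and that file's symbols start at index 100, the reference (0, 5) resolves to symbol 105.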

See Section 5.3.7.3 for the representation of relative indices in the auxiliary symbol table.


5.3.2.2   Line Number Information

For a debugger to be effective, a connection must be made between high-level-language statements in source files and the executable machine instructions in object files. Line number entries map executable instructions to source lines. This mapping allows a debugger to present to a programmer the line of source code that corresponds to the code being executed. The line number information is produced by the compiler and should be rewritten if an application such as an instrumentation tool or an optimizer modifies code.

In V3.13 of the Tru64 UNIX symbol table, line number information is emitted in two forms, one found in the line number table and one in the optimization symbol table. (Section 5.3.3 describes the structure of the optimization symbol table.) The line number information found in the optimization symbol table is referred to as "extended source location information". This is a new form of line number information introduced in V3.13 symbol tables. The new line number information augments the information in the line number table. If both forms of line number information are present in an object, the extended source location information is present only for procedures that cannot be described adequately by entries in the line number table.


5.3.2.2.1   The Line Number Table

Line number information is generated for each source file that contributes executable code to a program. Within each source file, line numbers are organized by procedure, in the order of appearance in the file. The line number symbol table section is produced only when a program is compiled with limited or greater symbolic information (see Section 5.3.1.1).

Figure 5-5 illustrates the organization of the line number table.

Figure 5-5 Line Number Table

The order outlined in Figure 5-5 is not guaranteed to match the ordering of file descriptors or procedure descriptors in those tables. To determine the bounds of the line number table entries for a specific procedure, fields in the associated file descriptor and procedure descriptors must be used. The starting offset for a procedure's line table entries is calculated directly from these fields. The ending offset can only be determined by finding the starting offset of the next procedure's entries in the line number table. An algorithm to identify the starting and ending line table offsets for a procedure follows.

IPD = index-of-procedure
IFD = index-of-file-containing-procedure

if (FDR[IFD].cbLine == 0 or
    (PDR[IPD].iline == ilineNil ))
    /* No line information for this procedure */

START_FILE_OFFSET = FDR[IFD].cbLineOffset
END_FILE_OFFSET = START_FILE_OFFSET + FDR[IFD].cbLine

START_PROC_OFFSET = START_FILE_OFFSET + PDR[IPD].cbLineOffset

NEXTIPD = -1
for (I = 0; I < FDR[IFD].cpd; I++)
    IPD2 = FDR[IFD].ipdFirst + I
    if (IPD2 != IPD and
        PDR[IPD2].iline != ilineNil and       /* No lines */
        PDR[IPD2].lnHigh != -1 and            /* Alt entry */
        PDR[IPD2].cbLineOffset > PDR[IPD].cbLineOffset)

        if (NEXTIPD == -1 or
            PDR[IPD2].cbLineOffset < PDR[NEXTIPD].cbLineOffset)

            NEXTIPD = IPD2

if (NEXTIPD == -1)
    /* IPD is the last procedure with line numbers in the file */
    END_PROC_OFFSET = END_FILE_OFFSET
else
    END_PROC_OFFSET = START_FILE_OFFSET + PDR[NEXTIPD].cbLineOffset
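
The same search can be expressed as a C function. This is a sketch under stated assumptions: struct fdr_view and struct pdr_view are hypothetical reductions that keep only the fields the algorithm reads, and ILINE_NIL is a stand-in for ilineNil whose value is assumed here rather than taken from the system headers.

```c
#include <assert.h>

#define ILINE_NIL (-1)  /* stand-in for ilineNil; the value is assumed */

/* Hypothetical reduced descriptors holding only the fields used below. */
struct fdr_view { long cbLineOffset; long cbLine; int ipdFirst; int cpd; };
struct pdr_view { int iline; int lnHigh; long cbLineOffset; };

/* Computes the [start, end) line table offsets for procedure ipd in
 * file ifd. Returns 1 on success, 0 if there is no line information. */
static int proc_line_bounds(const struct fdr_view *fdr,
                            const struct pdr_view *pdr,
                            int ifd, int ipd, long *start, long *end)
{
    const struct fdr_view *f = &fdr[ifd];
    const struct pdr_view *p = &pdr[ipd];
    long file_start, file_end;
    int i, next = -1;

    assert(start != 0 && end != 0);
    if (f->cbLine == 0 || p->iline == ILINE_NIL)
        return 0;

    file_start = f->cbLineOffset;
    file_end = file_start + f->cbLine;

    /* Find the procedure whose entries start closest after this one,
     * skipping procedures without lines and alternate entry points. */
    for (i = 0; i < f->cpd; i++) {
        int ipd2 = f->ipdFirst + i;
        const struct pdr_view *q = &pdr[ipd2];
        if (ipd2 != ipd && q->iline != ILINE_NIL && q->lnHigh != -1 &&
            q->cbLineOffset > p->cbLineOffset &&
            (next == -1 || q->cbLineOffset < pdr[next].cbLineOffset))
            next = ipd2;
    }

    *start = file_start + p->cbLineOffset;
    *end = (next == -1) ? file_end : file_start + pdr[next].cbLineOffset;
    return 1;
}
```

The search is linear in the number of procedures in the file because, as noted above, line table order is not guaranteed to match procedure descriptor order.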

Alternate entrypoints have a starting line number, but they have no specific ending line number. Procedure descriptors for a procedure and each of its associated alternate entrypoints share a common end offset in the line number table. See Section 5.3.6.7 for more information on alternate entrypoints.

The line number table has two forms. The "packed" form is used in the object file. The "expanded" form is a more useful representation to programmers and can be derived algorithmically (or by API) from the packed form.

The packed line numbers are stored as bytes. Each packed entry within the single byte value consists of two parts: count and delta. The count is the number of instructions generated from a source line. The delta is the number of source lines between the current source line and the previous one that generated executable instructions.

Figure 5-6 shows how these two values are represented.

Figure 5-6 Line Number Byte Format

The four-bit count is interpreted as an unsigned value between 1 and 16 (0 means 1, 1 means 2, and so forth). An encoding for a count of zero would be wasted, because a source line that generates no instructions has no line number entry at all.

The four-bit delta is interpreted as a signed value in the range -7 to +7. A signed value is needed because code generators may produce instructions that are not in the same order as the corresponding source lines. Therefore, the offset to the "next" source line may be a forward or backward jump.

Either of these quantities may fall outside the permissible range. For a delta outside the range, an extended format exists (as shown in Figure 5-7).

Figure 5-7 Line Number 3-Byte Extended Format

For a count outside the range, one or more additional entries follow, with the delta set to zero.

If both fields are out of range, the delta is handled first. An extended-format delta representation is followed by an entry with the delta bits set to zero and the remainder of the count contained in the count value.
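
The range rules above can be illustrated from the producer's side. The following C sketch packs one (delta, count) pair. It assumes that the escape value in the delta nibble is '1000' and that the two extension bytes hold a 16-bit delta with the high-order byte first; both assumptions are inferred from the "88 00 0a" entry in Table 5-9, not from Figure 5-7. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Packs one source line's entry into out[] and returns the number of
 * bytes written. delta must fit in 16 signed bits and count must be at
 * least 1. out needs room for 3 + (count - 1) / 16 bytes. */
static size_t pack_line_entry(int delta, unsigned count, unsigned char *out)
{
    size_t n = 0;
    unsigned first = count > 16 ? 16 : count;

    assert(out != 0 && count >= 1 && delta >= -32768 && delta <= 32767);

    if (delta >= -7 && delta <= 7) {
        /* Single-byte form: delta in the high nibble, count - 1 in the low. */
        out[n++] = (unsigned char)((((unsigned)delta & 0xFu) << 4) | (first - 1));
    } else {
        /* Extended form: escape nibble, then the 16-bit delta. */
        unsigned u = (unsigned)delta & 0xFFFFu;
        out[n++] = (unsigned char)(0x80u | (first - 1));
        out[n++] = (unsigned char)(u >> 8);
        out[n++] = (unsigned char)(u & 0xFFu);
    }

    /* Any remaining count is carried by entries whose delta is zero. */
    for (count -= first; count > 0; count -= first) {
        first = count > 16 ? 16 : count;
        out[n++] = (unsigned char)(first - 1);
    }
    return n;
}
```

With these assumptions, a delta of 0 and a count of 20 produce two bytes: 0x0f (16 instructions) followed by 0x03 (the remaining 4).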

The packed line number format can be expanded to produce the instruction-to-source-line mapping that is needed for debugging. An algorithm to accomplish this transformation for a given procedure follows. The expanded line number array has a source line number entry for each instruction in the given procedure. The address of the first entry is the address recorded in the PDR.adr field. Subsequent entries correspond to contiguous sequential instruction addresses.

START_PROC_OFFSET = offset-of-procedure's-entries-in-line-table
END_PROC_OFFSET = offset-of-next-procedure's-line-table-entries

PACKED = HDRR.cbLineOffset + START_PROC_OFFSET 
CURRENTLINE = PDR.lnLow
EXPANDED = ALLOCATE(number-of-instructions-in-procedure)
 
/* Entries are one or three bytes long, so advance by bytes consumed */
while (PACKED < HDRR.cbLineOffset + END_PROC_OFFSET)
    COUNT = (unsigned)(PACKED[0] & 0x0F) + 1
    DELTA =   (signed)(PACKED[0] & 0xF0) >> 4

    if (DELTA == (signed)0x8)     /* Extended delta */ 
        DELTA = (signed)((PACKED[1] << 8) | PACKED[2])
        PACKED += 3               /* escape byte plus two delta bytes */
    else
        PACKED += 1

    if (current-offset-matches-offset-of-alternate-entry)
        CURRENTLINE = PDR.lnLow of alternate entry

    CURRENTLINE += DELTA
 
    while (COUNT-- > 0)
        *EXPANDED = CURRENTLINE
        EXPANDED++
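
A C rendering of the expansion may make the byte handling easier to follow. This is a sketch, not the system API mentioned above: the function name and signature are hypothetical, alternate entry point handling is omitted, and the byte order of the extended delta (high-order byte first) is inferred from the "88 00 0a" entry in Table 5-9.

```c
#include <assert.h>
#include <stddef.h>

/* Expands packed[0..len) into lines[], one entry per instruction,
 * starting from line ln_low. Returns the number of entries written,
 * which never exceeds max. */
static size_t expand_lines(const unsigned char *packed, size_t len,
                           int ln_low, int *lines, size_t max)
{
    size_t i = 0, n = 0;
    int line = ln_low;

    assert(packed != 0 && lines != 0);
    while (i < len) {
        unsigned count = (packed[i] & 0x0Fu) + 1u;  /* 1..16 instructions */
        int delta = packed[i] >> 4;                 /* raw 4-bit field    */

        if (delta == 0x8) {                         /* extended delta     */
            unsigned raw;
            if (i + 2 >= len)
                break;                              /* truncated entry    */
            raw = ((unsigned)packed[i + 1] << 8) | packed[i + 2];
            delta = raw >= 0x8000u ? (int)raw - 0x10000 : (int)raw;
            i += 3;
        } else {
            if (delta > 7)
                delta -= 16;                        /* sign-extend nibble */
            i += 1;
        }

        line += delta;
        while (count-- > 0 && n < max)
            lines[n++] = line;
    }
    return n;
}
```

For instance, the single byte 0x44 with a current line of 2 yields five entries, all for line 6.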

The following source listing of a file named lines.c provides an example that shows how the compiler assigns line numbers:

1   #include <stdio.h>
2   main()
3   {
4       char c;
5
6       printf("this program just prints input\n");
7       for (;;) {
8          if ((c =fgetc(stdin)) != EOF) break;
9       /*   this is a greater than 7-line comment
10           * 1
11           * 2
12           * 3
13           * 4
14           * 5
15           * 6
16           * 7
17           */
18           printf("%c", c);
19      } /* end for */
20  } /* end main */

The compiler generates line numbers only for the lines 2, 6, 8, 18, 19, and 20; the other lines are either blank or contain only comments.

Table 5-9 shows the packed entries' interpretation for each source line.

Table 5-9 Line Number Example

    Source Line   LINER contents   Interpretation
    2             03               Delta 0, count 4
    6             44               Delta 4, count 5
    8             29               Delta 2, count 10
    18 (1)        88 00 0a         Delta 10, count 9
    19            10               Delta 1, count 1
    20            14               Delta 1, count 5

Table Note:

  1. Extended format (delta is greater than 7 lines).

The compiler generates the following instructions for the example program:

  [lines.c:   2] 0x0:     ldah    gp, 1(t12)
  [lines.c:   2] 0x4:     lda     gp, -32592(gp)
  [lines.c:   2] 0x8:     lda     sp, -16(sp)
  [lines.c:   2] 0xc:     stq     ra, 0(sp)
  [lines.c:   6] 0x10:    ldq     a0, -32720(gp)
  [lines.c:   6] 0x14:    ldq     t12, -32728(gp)
  [lines.c:   6] 0x18:    jsr     ra, (t12), printf
  [lines.c:   6] 0x1c:    ldah    gp, 1(ra)
  [lines.c:   6] 0x20:    lda     gp, -32620(gp)
  [lines.c:   8] 0x24:    ldq     a0, -32736(gp)
  [lines.c:   8] 0x28:    ldq     t12, -32744(gp)
  [lines.c:   8] 0x2c:    jsr     ra, (t12), fgetc
  [lines.c:   8] 0x30:    ldah    gp, 1(ra)
  [lines.c:   8] 0x34:    lda     gp, -32640(gp)
  [lines.c:   8] 0x38:    and     v0, 0xff, t0
  [lines.c:   8] 0x3c:    stq     v0, 8(sp)
  [lines.c:   8] 0x40:    xor     t0, 0xff, t0
  [lines.c:   8] 0x44:    bne     t0, 0x6c
  [lines.c:  18] 0x48:    ldq     t2, 8(sp)
  [lines.c:  18] 0x4c:    sll     t2, 0x38, t2
  [lines.c:  18] 0x50:    sra     t2, 0x38, a1
  [lines.c:  18] 0x54:    ldq     a0, -32752(gp)
  [lines.c:  18] 0x58:    ldq     t12, -32728(gp)
  [lines.c:  18] 0x5c:    jsr     ra, (t12), printf
  [lines.c:  18] 0x60:    ldah    gp, 1(ra)
  [lines.c:  18] 0x64:    lda     gp, -32688(gp)
  [lines.c:  19] 0x68:    br      zero, 0x24
  [lines.c:  20] 0x6c:    bis     zero, zero, v0
  [lines.c:  20] 0x70:    ldq     ra, 0(sp)
  [lines.c:  20] 0x74:    lda     sp, 16(sp)
  [lines.c:  20] 0x78:    ret     zero, (ra), 1
  [lines.c:  20] 0x7c:    call_pal        halt

After applying the given algorithm, the following instruction-to-source mapping (formatted instruction number. source line number) is obtained:

           0.    2         1.    2         2.    2
           3.    2         4.    6         5.    6
           6.    6         7.    6         8.    6
           9.    8        10.    8        11.    8
          12.    8        13.    8        14.    8
          15.    8        16.    8        17.    8
          18.   18        19.   18        20.   18
          21.   18        22.   18        23.   18
          24.   18        25.   18        26.   19
          27.   20        28.   20        29.   20
          30.   20        31.   20

Header files included in an object have no associated line numbers recorded in the symbol table. Line number information for included files containing source code is not supported.


5.3.2.2.2   Extended Source Location Information (ESLI)

The line number table does not correctly describe optimized code or programs with untraditional source files, resulting in images that are difficult to debug. Extended Source Location Information (ESLI) is intended to provide more information to enable debugging of optimized programs, including PC and line number changes, file transitions, and line and column ranges. ESLI is essentially a superset of the older line number table.

ESLI is stored in the optimization symbols section. This information is accessible on a per-procedure basis from the procedure descriptors. See Section 5.3.3 for more detail on accessing information in the optimization symbols section.

ESLI is a byte stream that can be interpreted in two modes: data mode or command mode. Currently, two formats are defined for data mode. These are designated as "Data Mode 1" and "Data Mode 2". Additional data modes may be defined as needed.

Figure 5-8 ESLI Data Mode Bytes

Data Mode 1 is the initial mode for a procedure's ESLI. Data Mode 1 is identical to the packed line number format with the exception of the interpretation of the delta PC escape value '1000' (which indicates a switch to command mode).

In Data Mode 2, each entry consists of two bytes. The first byte is identical to the encoding and interpretation of Data Mode 1. The second byte is an absolute column number (from 0 to 255), where column number 0 indicates that column information is missing or not meaningful for this entry. The escape from Data Mode 2 to command mode consists of a delta PC escape value set to '1000' and column number set to 0.

In command mode, each byte is either a command or a command parameter. For a command byte, the low-order six bits are a command code, and the two high bits are used as flags, as shown in Figure 5-9. The "mark" flag, if set, announces that a new state has been established. Several commands may be required to fully describe a new state. The "resume" flag, if set, indicates the end of command mode. The next byte following a command with "resume" set will be a data mode byte. The same data mode that was in effect prior to the escape to command mode will be resumed. See Table 5-10 for a complete list of commands.

Figure 5-9 ESLI Command Byte

Command parameters are stored in LEB128 (Little Endian Base 128) format. See Section 1.4.6 for a description of this data representation. PC deltas are always expressed as machine instruction offsets and must be scaled by the size of a machine instruction before adding to the current PC. No other deltas need to be scaled.
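
As an illustration of parameter decoding, the following C sketch reads unsigned (LEB) and signed (SLEB) values. It assumes that the representation described in Section 1.4.6 is the conventional LEB128 encoding: seven value bits per byte, least significant group first, with the high bit set on every byte except the last. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes an unsigned LEB128 value from p[0..len). Returns the number
 * of bytes consumed, or 0 if the input ends before the value does. */
static size_t read_uleb128(const unsigned char *p, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    unsigned shift = 0;
    size_t i = 0;

    assert(p != 0 && out != 0);
    while (i < len) {
        unsigned char b = p[i++];
        if (shift < 64)
            v |= (uint64_t)(b & 0x7F) << shift;
        shift += 7;
        if (!(b & 0x80)) {          /* last byte of the value */
            *out = v;
            return i;
        }
    }
    return 0;
}

/* Decodes a signed LEB128 value; bit 6 of the last byte is the sign. */
static size_t read_sleb128(const unsigned char *p, size_t len, int64_t *out)
{
    uint64_t v = 0;
    unsigned shift = 0;
    size_t i = 0;

    assert(p != 0 && out != 0);
    while (i < len) {
        unsigned char b = p[i++];
        if (shift < 64)
            v |= (uint64_t)(b & 0x7F) << shift;
        shift += 7;
        if (!(b & 0x80)) {
            if (shift < 64 && (b & 0x40))
                v |= ~(uint64_t)0 << shift;   /* sign-extend */
            *out = (int64_t)v;
            return i;
        }
    }
    return 0;
}
```

A decoded PC delta would then be multiplied by the instruction size (4 bytes on Alpha) before being added to the current PC, as the paragraph above requires.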

Table 5-10 shows how to interpret the bytes in command mode. These definitions can be found in the system header file linenum.h.

Table 5-10 ESLI Commands

    Name              Value   Number of Parameters   Type of Parameters
    ADD_PC            1       1                      SLEB
    ADD_LINE          2       1                      SLEB
    SET_COL           3       1                      LEB
    SET_FILE          4       1                      LEB
    SET_DATA_MODE     5       1                      LEB
    ADD_LINE_PC       6       2                      SLEB, SLEB
    ADD_LINE_PC_COL   7       3                      SLEB, SLEB, LEB
    SET_LINE          8       1                      LEB
    SET_LINE_COL      9       2                      LEB, LEB

 

ADD_PC
Parameter is a signed value to add to the current PC value.
ADD_LINE
Parameter is a signed value to add to the current line number.
SET_COL
Parameter is an unsigned value that represents a new column number. The column number is used to associate the PC with a particular location within a source line. Column number parameters use a zero-based representation that must be adjusted by adding 1.
SET_FILE
Parameter is an unsigned value used to switch file context. This command is typically followed by a SET_LINE command.
SET_DATA_MODE
Parameter is an unsigned value used to set current data mode. The only parameter values that are currently accepted are 1 and 2. Additional data modes may be defined in future releases.
ADD_LINE_PC
Both parameters are signed values. The first is added to the line number and the second is added to the PC.
ADD_LINE_PC_COL
The first two parameters are signed values and the third is an unsigned value. The first two are added to the line number and PC respectively. The third is used to set the column number.
SET_LINE
Parameter is an unsigned value that sets the current line number.
SET_LINE_COL
Both parameters are unsigned values. The first represents the line number and the second represents the column number.
 

A tool reading the ESLI must maintain the current PC value, file number, line number, and column. Taken together, these four values represent the current "state". Consumers must also keep track of the mode in effect to interpret the data properly. The following example shows the instructions for consuming ESLI for one procedure.

MODE = data mode 1
FILE = current file
LINE = PDR.lnLow
COLUMN = 0
PC = PDR.adr
STATE_TABLE++ = (FILE,LINE,COLUMN,PC)
ESLI = GET_ESLI(PDR.iopt)
for ppode_len bytes of ESLI do
    if (MODE == data mode 1 or MODE == data mode 2)
        if (ESLI.delta == escape)
            PUSH_MODE(MODE)
            MODE = command mode
        else
            PC += 4 * (ESLI.count + 1)
            LINE += ESLI.delta
            if (MODE == data mode 1)
                STATE_TABLE++ = (FILE,LINE,COLUMN,PC)
        ESLI++
    if (MODE == data mode 2)
        COLUMN = ESLI++
        STATE_TABLE++ = (FILE,LINE,COLUMN,PC)
    if (MODE == command mode)
        read all parameters
        update FILE, LINE, COLUMN and PC as required
        if (mark flag set)
            STATE_TABLE++ = (FILE,LINE,COLUMN,PC)
        if (resume flag set)
            MODE = POP_MODE()
        ESLI += number-of-bytes-read
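
The consumer loop can also be sketched in C. Several details here are assumptions inferred from the worked example in Table 5-11 rather than from the figures: the mark flag is taken to be bit 7 and the resume flag bit 6 of a command byte, any data byte whose high nibble is '1000' is treated as the escape, and a SET_DATA_MODE command is assumed to take effect when data mode resumes. Column parameters are stored as read, without the adjustment described for SET_COL. All names are hypothetical, and the enumerator values simply follow Table 5-10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum esli_cmd { ADD_PC = 1, ADD_LINE, SET_COL, SET_FILE, SET_DATA_MODE,
                ADD_LINE_PC, ADD_LINE_PC_COL, SET_LINE, SET_LINE_COL };

struct esli_state { uint64_t pc; long file, line, col; };

/* Reads one LEB128 parameter at p[*i] and advances *i. Assumes the
 * value is small enough to fit in a long. */
static long leb(const unsigned char *p, size_t len, size_t *i, int is_signed)
{
    long v = 0;
    int shift = 0;
    unsigned char b = 0;
    while (*i < len) {
        b = p[(*i)++];
        v |= (long)(b & 0x7F) << shift;
        shift += 7;
        if (!(b & 0x80)) break;
    }
    if (is_signed && (b & 0x40)) v -= 1L << shift;   /* sign-extend */
    return v;
}

/* Decodes one procedure's ESLI bytes, appending a state to tab[] after
 * each data entry and each marked command. Returns the state count. */
static size_t esli_decode(const unsigned char *p, size_t len,
                          struct esli_state st,
                          struct esli_state *tab, size_t max)
{
    size_t i = 0, n = 0;
    int data_mode = 1, in_cmd = 0;

    assert(p != 0 && tab != 0);
    if (n < max) tab[n++] = st;                      /* initial state */
    while (i < len) {
        unsigned char b = p[i++];
        if (!in_cmd) {
            long col = (data_mode == 2 && i < len) ? p[i++] : st.col;
            int delta = b >> 4;
            if (delta == 0x8) { in_cmd = 1; continue; }   /* escape */
            if (delta > 7) delta -= 16;
            st.line += delta;
            st.pc += 4u * ((b & 0x0Fu) + 1u);        /* 4-byte instructions */
            st.col = col;
            if (n < max) tab[n++] = st;
        } else {
            int code = b & 0x3F;
            long a;
            switch (code) {
            case ADD_PC:   st.pc += (uint64_t)(4 * leb(p, len, &i, 1)); break;
            case ADD_LINE: st.line += leb(p, len, &i, 1); break;
            case SET_COL:  st.col = leb(p, len, &i, 0); break;
            case SET_FILE: st.file = leb(p, len, &i, 0); break;
            case SET_DATA_MODE: data_mode = (int)leb(p, len, &i, 0); break;
            case ADD_LINE_PC:
            case ADD_LINE_PC_COL:
                st.line += leb(p, len, &i, 1);
                a = leb(p, len, &i, 1);
                st.pc += (uint64_t)(4 * a);
                if (code == ADD_LINE_PC_COL) st.col = leb(p, len, &i, 0);
                break;
            case SET_LINE:
            case SET_LINE_COL:
                st.line = leb(p, len, &i, 0);
                if (code == SET_LINE_COL) st.col = leb(p, len, &i, 0);
                break;
            default:
                return n;                            /* unknown command */
            }
            if ((b & 0x80) && n < max) tab[n++] = st;    /* mark   */
            if (b & 0x40) in_cmd = 0;                    /* resume */
        }
    }
    return n;
}
```

The caller supplies the initial state (for example, PDR.adr and PDR.lnLow) and receives the state table as an array.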

Data encoded in ESLI can be represented in tabular format. The PC value and file, line and column numbers can be stored as a state table. The following example shows how to build this state table.

In this example ESLI will record line numbers for a routine that includes text from a header file.

Source listing for line1.c:

1   /* ESLI example using included source lines */
2   
3   main() {
4      char *msg;
5   
6      msg = (char *)0;
7   
8   #include "line2.h"
9   
10     printf("%s", msg);
11  }

Source listing for line2.h

1   msg = (char *)malloc(20);
2   /*
3    *
4    *
5    *
6    *
7    *
8    *
9    *
10   */
11  strcpy(msg, "Hello\n");

The compiler generates the following instructions for the example program:

      main:
[line1.c:   3] 0x1200011d0:     ldah    gp, 8192(t12)
[line1.c:   3] 0x1200011d4:     lda     gp, 28336(gp)
[line1.c:   3] 0x1200011d8:     lda     sp, -16(sp)
[line1.c:   3] 0x1200011dc:     stq     ra, 0(sp)
[line1.c:   3] 0x1200011e0:     stq     s0, 8(sp)
[line1.c:   6] 0x1200011e4:     bis     zero, zero, s0
[line2.h:   1] 0x1200011e8:     bis     zero, 0x14, a0
[line2.h:   1] 0x1200011ec:     ldq     t12, -32560(gp)
[line2.h:   1] 0x1200011f0:     jsr     ra, (t12)
[line2.h:   1] 0x1200011f4:     ldah    gp, 8192(ra)
[line2.h:   1] 0x1200011f8:     lda     gp, 28300(gp)
[line2.h:   1] 0x1200011fc:     bis     zero, v0, s0
[line2.h:  11] 0x120001200:     bis     zero, s0, a0
[line2.h:  11] 0x120001204:     lda     a1, -32768(gp)
[line2.h:  11] 0x120001208:     ldq     t12, -32600(gp)
[line2.h:  11] 0x12000120c:     jsr     ra, (t12)
[line2.h:  11] 0x120001210:     ldah    gp, 8192(ra)
[line2.h:  11] 0x120001214:     lda     gp, 28272(gp)
[line1.c:  10] 0x120001218:     ldq_u   zero, 0(sp)
[line1.c:  10] 0x12000121c:     lda     a0, -32760(gp)
[line1.c:  10] 0x120001220:     bis     zero, s0, a1
[line1.c:  10] 0x120001224:     ldq     t12, -32552(gp)
[line1.c:  10] 0x120001228:     jsr     ra, (t12)
[line1.c:  10] 0x12000122c:     ldah    gp, 8192(gp)
[line1.c:  10] 0x120001230:     lda     gp, 28244(gp)
[line1.c:  11] 0x120001234:     bis     zero, zero, v0
[line1.c:  11] 0x120001238:     ldq     ra, 0(sp)
[line1.c:  11] 0x12000123c:     ldq     s0, 8(sp)
[line1.c:  11] 0x120001240:     lda     sp, 16(sp)
[line1.c:  11] 0x120001244:     ret     zero, (ra)

The ESLI and its interpretation for the generated code is shown in the following table.

Table 5-11 ESLI Example

                             Command (M)ark (R)esume   State (F)ile (L)ine (C)olumn
    ESLI bytes (hex)  Mode   Code               M  R   PC (hex)   F   L   C
    Initial State     Data1                            1200011d0  0   3   0
    04                Data1                            1200011e4  0   3   0
    30                Data1                            1200011e8  0   6   0
    80                Data1  Escape
    04 01             Cmd    set_file(1)                          1
    48 01             Cmd    set_line(1)           X                  1
    05                Data1                            120001200  1   1   0
    80                Data1  Escape
    86 0a 06          Cmd    add_line_pc(10,6)  X      120001218  1   11  0
    04 00             Cmd    set_file(0)                          0
    48 0a             Cmd    set_line(10)          X                  10
    06                Data1                            120001234  0   10  0
    16                Data1                            120001250  0   11  0

 

The handling of alternate entry points differs from the handling of main entry points. Procedure descriptors for alternate entry points are identified by a PDR.lnHigh value of -1. If the PC for an instruction maps to an alternate entry point, the following steps should be taken:


5.3.3   Optimization Symbols

The optimization symbols section gives individual producers and consumers the ability to communicate information about any aspect of the object file, in any form they choose. New information can be generated at any time with minimal coordination between all producers and consumers. In V3.13 of the symbol table, the optimization section may include extended source location information (see Section 5.3.2.2).

The optimization section is organized on a per-procedure basis. Each procedure descriptor has a pointer to the optimization symbols in the field PDR.iopt. If no optimization symbols are associated with the procedure, the field contains ioptNil. Otherwise, it contains the index of the first optimization symbol entry for this procedure. Consumers should access the optimization symbols through the procedure descriptors. The optimization section is not present in a locally-stripped object.

This section consists of a sequence of zero or more Per-Procedure Optimization Descriptions (PPODs), as shown in Figure 5-10. Each PPOD's internal structure consists of two parts:

  1. A leading sequence of structured entries using a Tag-Length-Value model to describe subsequent raw data. The structure of the PPOD entry can be found in Section 5.2.10.
  2. The raw data area.

Figure 5-10 Optimization Symbols Section

This section has the following alignment requirements:

Object file producers must produce either an empty optimization symbols section or a valid one. An empty one has the symbolic header fields cbOptOffset and ioptMax set to zero. If an optimization section is present, but a particular file does not contribute to it, the file descriptor field copt is set to zero. In this case, all procedure descriptors belonging to the file must have their iopt fields set to ioptNil.
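The producer rules above can be expressed as two small checks. This is a minimal sketch, not a definitive validator: the structures are hypothetical subsets of HDRR, FDR, and PDR that carry only the fields named in the text, and IOPT_NIL is a placeholder for ioptNil whose real value should be taken from the system headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOPT_NIL (-1L)  /* placeholder for ioptNil; use the system header value */

typedef struct { long cbOptOffset; long ioptMax; } Hdr;      /* HDRR subset */
typedef struct { long iopt; } Pd;                            /* PDR subset  */
typedef struct { long copt; const Pd *pd; size_t cpd; } Fd;  /* FDR subset  */

/* An empty section has both symbolic header fields set to zero. */
static bool opt_section_is_empty(const Hdr *h)
{
    assert(h != NULL);
    return h->cbOptOffset == 0 && h->ioptMax == 0;
}

/* A file with copt == 0 must have iopt == ioptNil in all of its PDRs. */
static bool file_opt_fields_consistent(const Fd *f)
{
    assert(f != NULL);
    if (f->copt != 0)
        return true;            /* file contributes; nothing to check here */
    for (size_t i = 0; i < f->cpd; i++)
        if (f->pd[i].iopt != IOPT_NIL)
            return false;
    return true;
}
```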

A tool that both reads and writes object files must consume a valid optimization symbols section (if present in the input file) and produce an equivalent and valid section in its output file. If a tool does not know how to process the section contents, the section must be omitted from the output file. If a tool does know how to process portions of the optimization symbols, those portions may be modified and the rest should be removed. As usual, the linker is a special case. It concatenates input optimization symbols sections into one output section without reading or modifying any of the entries.

The format and flexible nature of this section are similar by design to the .comment section. The structures are the same size and contain the same fields (with different names), and the rules of navigation are the same. The primary difference is that the optimization section is broken down by procedure, whereas the comment section must be treated as a whole.


5.3.4   Run-Time Information

The symbol table contains information that debuggers must interpret to find symbols at run time. This section describes the information that the static symbol table structures provide. Algorithms for determining run-time symbol addresses are included.


5.3.4.1   Stack Frames

A stack frame is a run-time memory structure that is created whenever a procedure is called. The Calling Standard for Alpha Systems specifies the stack frame format and related code requirements. This section explains how to interpret procedure descriptor fields related to the stack frame.

Two types of stack frames are supported: fixed-size frames and variable-size frames. The variable frame format is used for procedures that dynamically allocate memory and for those with very large frames. Figure 5-11 shows a fixed-size frame and Figure 5-12 shows a variable-size frame.

From the procedure descriptor, you can determine which type of stack frame the procedure has. The field PDR.framereg stores the frame pointer register number. If this field has a value of 30 ($sp), the stack frame is a fixed-size frame. If it has a value of 15 ($fp), the stack frame is a variable-size frame.
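The test on PDR.framereg can be sketched as follows. The register numbers come from the paragraph above; the names FrameKind and frame_kind are hypothetical, and returning FRAME_UNKNOWN for any other register is an assumption of this sketch rather than behavior defined by the format.

```c
#include <assert.h>

enum { REG_FP = 15, REG_SP = 30 };   /* $fp and $sp register numbers */

typedef enum { FRAME_FIXED, FRAME_VARIABLE, FRAME_UNKNOWN } FrameKind;

/* Classify a stack frame from the value of PDR.framereg. */
static FrameKind frame_kind(int framereg)
{
    assert(framereg >= 0 && framereg < 32);
    if (framereg == REG_SP)
        return FRAME_FIXED;      /* frame addressed through $sp */
    if (framereg == REG_FP)
        return FRAME_VARIABLE;   /* frame addressed through $fp */
    return FRAME_UNKNOWN;        /* not described in the text; sketch-only case */
}
```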

Figure 5-11 Fixed-Size Stack Frame

Figure 5-12 Variable-Size Stack Frame

For both types of stack frames, the value of PDR.frameoffset is the size of the fixed part of the stack frame. In the case of a fixed-size frame, it is the entire frame size. For a variable-size frame, the entire frame size cannot be determined from the symbol table. The code may dynamically increase and decrease the size of the frame multiple times during procedure execution.

The virtual frame pointer represents the contents of the frame pointer register at procedure entry, prior to prologue execution. The (real) frame pointer is the contents of the frame pointer register after prologue execution. The difference between the virtual and real frame pointer values is the fixed frame size, which is subtracted from the $sp contents during the procedure prologue. Note that stack offsets recorded in the symbol table are relative to the virtual frame pointer, not the real value used at run time.

The contents of the frame pointer register are used at run time as the base address for accessing data, such as parameters and local variables, on the stack. See Section 5.3.4.3 for details.


5.3.4.2   Procedure Addresses

The PDR.adr field is reliably updated by the linker starting with version V3.13 of the symbol table. To determine the procedure start address for a given PDR in prior versions of the symbol table, the following algorithm is recommended:

if (HDRR.vstamp >= 0x30D || PDR.isym == isymNil) 
    return(PDR.adr)
else
    foreach FDR in HDRR
        foreach PDR in FDR
            if PDR matches
                if (FDR.csym == 0)  /* Use external symbol */
                    return (EXTR[PDR.isym].asym.value)
                else                /* Use local symbol */
                    return (SYMR[FDR.isymBase + PDR.isym].value)

If local symbol information is present for the given PDR, the isym field identifies the local symbol table entry that contains the start address of the procedure. If no local symbol information is present, the isym field identifies the external symbol table entry containing the start address of the procedure. If no symbol information is present for the PDR, the isym field is set to isymNil and the adr field will contain a reliable start address.
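The algorithm above might look as follows in C. This is a sketch under stated assumptions: the structures are hypothetical subsets of the real records, ISYM_NIL is a placeholder for isymNil, "PDR matches" is implemented by checking whether the procedure descriptor index falls in a file's range (using the FDR fields ipdFirst and cpd), and falling back to PDR.adr when no owning file is found is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISYM_NIL     (-1L)   /* placeholder for isymNil */
#define VSTAMP_V3_13 0x30D   /* version stamp used in the pseudocode above */

typedef struct { uint64_t adr; long isym; } Pd;             /* PDR subset */
typedef struct { long csym, isymBase, ipdFirst, cpd; } Fd;  /* FDR subset */
typedef struct {
    int             vstamp;        /* HDRR.vstamp */
    const Fd       *fd;            /* file descriptor table */
    long            cfd;           /* number of file descriptors */
    const Pd       *pd;            /* procedure descriptor table */
    const uint64_t *local_value;   /* SYMR[i].value */
    const uint64_t *extern_value;  /* EXTR[i].asym.value */
} SymTab;

/* Return the start address of the procedure described by pd[ipd]. */
static uint64_t proc_start(const SymTab *st, long ipd)
{
    assert(st != NULL && ipd >= 0);
    const Pd *p = &st->pd[ipd];

    if (st->vstamp >= VSTAMP_V3_13 || p->isym == ISYM_NIL)
        return p->adr;

    for (long i = 0; i < st->cfd; i++) {
        const Fd *f = &st->fd[i];
        if (ipd < f->ipdFirst || ipd >= f->ipdFirst + f->cpd)
            continue;                                /* PDR not in this file */
        if (f->csym == 0)                            /* use external symbol */
            return st->extern_value[p->isym];
        return st->local_value[f->isymBase + p->isym];  /* use local symbol */
    }
    return p->adr;   /* sketch-only fallback: no owning file found */
}
```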


5.3.4.3   Local Symbol Addresses

Local variables and parameters may be stored in registers or on the stack. Those stored in registers (identified by a storage class of scRegister) do not have addresses. For local variables and parameters with addresses, this section explains how to calculate their run-time locations from the symbol table information.

To calculate the run-time address for a local variable (stLocal) based on its symbol table value:

Frame pointer - PDR.localoff + SYMR.value

To calculate the run-time address for a parameter (stParam) based on its symbol table value:

Frame pointer - argument_home_area_size + SYMR.value

The argument home area is a portion of the stack frame designated for parameter storage. See Figure 5-11 for an illustration. For historical reasons, the size of this area is always 48 bytes.

The calculations above must be performed at run time when the actual frame pointer value is known. Note that the value becomes valid only after the procedure prologue has executed.

To calculate the locations based on static information, convert the symbol's value to an offset from the real frame pointer:

Local:

PDR.frameoffset - PDR.localoff + SYMR.value

Parameter:

PDR.frameoffset - 48 + SYMR.value

The resulting offsets are always positive values because the frame pointer contains the address of the lowest memory in the fixed part of the stack frame at run time.


5.3.4.4   Uplevel Links

An uplevel link is the real frame pointer of an ancestor of a nested routine. The routine nesting may be a feature of the language (such as Pascal), or the nesting may occur in optimized code which has been decomposed for parallel execution into smaller routines. Uplevel links provide debuggers a method of finding all local symbols associated with the ancestor routine.

When a procedure is passed a static link, that static link will be represented within the scope of the procedure definition as a local automatic symbol with a special name beginning with "__StaticLink.". The lifetime of this symbol begins after the procedure prologue has been executed.

The static link symbol will occur between the procedure's parameter definitions and the first stBlock symbol.

The full name of the symbol will be "__StaticLink." followed by a positive decimal integer with no leading zeros. This integer value identifies the number of levels up the ancestor tree the static link points to.

For example, a symbol named "__StaticLink.3" contains the static link of the procedure in which it is defined, and that static link points to a stack frame that is three levels up in the procedure's ancestor tree, that is, the frame of the procedure's great-grandfather.
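The naming rule can be checked mechanically. The following sketch extracts the level count from a static link symbol name; the function name static_link_levels and the convention of returning 0 for a name that does not follow the rule are assumptions of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return N for a name of the form "__StaticLink.N", where N is a positive
 * decimal integer with no leading zeros. Return 0 for any other name. */
static long static_link_levels(const char *name)
{
    static const char prefix[] = "__StaticLink.";
    const size_t n = sizeof prefix - 1;

    assert(name != NULL);
    if (strncmp(name, prefix, n) != 0)
        return 0;

    const char *digits = name + n;
    if (digits[0] < '1' || digits[0] > '9')     /* empty, zero, or leading zero */
        return 0;
    for (const char *p = digits; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return strtol(digits, NULL, 10);
}
```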

Figure 5-13 Representation of Uplevel Reference

Debuggers of Tru64 UNIX object files need to use the uplevel link information to determine which symbols are visible at a location in the program and to compute the addresses of local symbols in ancestor routines. When the debugger needs the current value or address of a name that might be defined as an uplevel reference, two separate actions may be required: finding the procedure that defines the currently visible instance of that name, and finding the address of the currently visible instance of that name. If only type information is required, finding the procedure that defines the name may be sufficient.

Finding the defining procedure is accomplished by repeatedly looking up the name in the local symbol table of a chain of procedures that extends from the current procedure through its chain of ancestors until either the name is found in a procedure or the end of the chain of ancestors is reached without finding the name. If this search terminates without finding the name, the debugger should conclude that the name is not visible by uplevel reference at the current location in the program.

When searching for the desired procedure, the debugger should count how many levels in the ancestor chain were traversed before finding the name. If zero levels were traversed, the name is defined within the current procedure and is not an uplevel reference. The number of levels traversed is assumed to be in the variable LevelsToGo in the algorithm below.

Finding the address for the name involves locating static link values and dereferencing them with appropriate offsets. Basically, while the number of levels to be traversed is greater than zero, find the static link symbol for the current level and obtain its value. Finally, add the desired symbol's offset from the real frame pointer to the final static link value.

The recommended algorithm for finding the address is as follows:

LevelsToGo = <from name lookup above>
NewProc = CurrentProcedure
NewFrame = FramePointerValue(CurrentProcedure)
Failed = false
while (LevelsToGo > 0 && !Failed)
    StaticLink = FindStaticLinkSym(NewProc)
    if (StaticLink == NULL)
        Failed = true
    else
        NewFrame = *(NewFrame + StaticLink->symbol.offset)
        Levels = StaticLinkLevels(StaticLink)
        LevelsToGo = LevelsToGo - Levels
        for (; Levels > 0; Levels--)
            NewProc = NewProc->proc.parent

If Failed is true after executing this algorithm, required information about static links is missing in the symbol table, and an error has occurred. If LevelsToGo ends up less than zero, the optimizer's static link optimization has eliminated a static link level that would be needed to compute the address of the name. It is recommended that debuggers inform the user that optimization prevents the debugger from computing the address of the name.

If Failed is false and LevelsToGo is equal to zero, the address for the currently visible instance of the name is NewFrame plus the offset of the name with respect to the real frame pointer for NewProc.

The function StaticLinkLevels returns the integer at the end of the name for the indicated static link symbol.
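A C rendering of the recommended algorithm is sketched below. It is an illustration under several assumptions: the Proc structure is a hypothetical summary of what a debugger would gather from the symbol table (parent procedure, static link offset from the real frame pointer, and the level count from the symbol name), target memory is read through a caller-supplied function, and demo_mem with demo_read is a toy memory included only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Proc {
    const struct Proc *parent;  /* enclosing procedure, NULL at the top */
    bool has_link;              /* a __StaticLink.N symbol was found */
    long link_offset;           /* its offset from the real frame pointer */
    long link_levels;           /* the N in __StaticLink.N */
} Proc;

typedef uint64_t (*ReadWord)(uint64_t addr);   /* reads target memory */

typedef enum { UPLEVEL_OK, UPLEVEL_MISSING_LINK, UPLEVEL_OPTIMIZED_AWAY } UplevelStatus;

/* Follow static links upward until levels_to_go reaches zero. On success,
 * *out_proc and *out_frame describe the ancestor that defines the name. */
static UplevelStatus uplevel_frame(const Proc *proc, uint64_t frame, long levels_to_go,
                                   ReadWord read_word,
                                   const Proc **out_proc, uint64_t *out_frame)
{
    assert(read_word != NULL && out_proc != NULL && out_frame != NULL);
    assert(levels_to_go >= 0);

    while (levels_to_go > 0) {
        if (proc == NULL || !proc->has_link)
            return UPLEVEL_MISSING_LINK;        /* "Failed" in the pseudocode */
        long levels = proc->link_levels;
        frame = read_word(frame + (uint64_t)proc->link_offset);
        levels_to_go -= levels;
        for (; levels > 0 && proc != NULL; levels--)
            proc = proc->parent;
    }
    if (levels_to_go < 0)
        return UPLEVEL_OPTIMIZED_AWAY;          /* a needed level was skipped */
    *out_proc = proc;
    *out_frame = frame;
    return UPLEVEL_OK;
}

/* Toy target memory for trying the sketch: 8 words starting at DEMO_BASE. */
#define DEMO_BASE 0x1000u
static uint64_t demo_mem[8];

static uint64_t demo_read(uint64_t addr)
{
    assert(addr >= DEMO_BASE && addr < DEMO_BASE + sizeof demo_mem);
    assert(addr % 8 == 0);
    return demo_mem[(addr - DEMO_BASE) / 8];
}
```

On success, the address of the name is the returned frame value plus the name's offset from the real frame pointer of the returned procedure.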


5.3.4.5   Finding Thread Local Storage (TLS) Symbols

This section explains how to interpret symbolic information for TLS symbols (identified by a storage class of scTlsdata or scTlsbss). See Section 3.3.9 or the Programmer's Guide for general information on TLS.

A TLS symbol's value contains its offset from the start of the TLS region for that object. This offset can be used at process execution time to determine the address of the TLS symbol for a particular thread.

A debugger can calculate TLS symbol addresses by looking up the address of the TLS region using run-time structures and adding the offset of the TLS symbol to that address. The following formula can be used to calculate TLS symbol addresses.

TLS sym address = *(TEB.TSD + __tlskey) + SYMR.value

A detailed description of this formula follows:

  1. Get the address of the Thread Environment Block (TEB).
  2. Get the address of the Thread Specific Data (TSD) array from the TEB structure.
  3. Get the offset of the TLS pointer in the TSD array.

    This offset is normally stored in a .lita or .got entry. This value should be accessed using the symbol __tlskey. In spite of the fact that __tlskey is a label symbol, no ampersand is used in this context because the value that the label points to is being retrieved. The address of __tlskey will need to be adjusted by the address mapping displacement in the same manner that the debugger adjusts addresses of text and data symbols.

    For non-shared objects, the .lita entry contains the constant offset (2048). This offset identifies the first and only TSD slot (256) that will be allocated for the TLS pointer.

    For shared objects, the .got entry labeled by __tlskey is initially 0, indicating that the TSD slot has not been allocated yet. After the object's initialization routines have run, a TSD key will be allocated and the .got entry will contain its offset.
  4. Get the TLS pointer value. The TLS pointer is a 64-bit address set to the start of the TLS Region.
  5. Calculate the address of the TLS symbol by adding the offset of the TLS symbol to the TLS pointer value.
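Steps 3 through 5 can be sketched in C as follows. The sketch assumes the debugger has already located the TSD array (steps 1 and 2) and copied it into a local buffer in host byte order; how the TEB and TSD are found is not shown. The function name and parameters are hypothetical. A key of 0 is reported as a failure because, for shared objects, it means the TSD slot has not been allocated yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* tsd       : local copy of the thread's TSD array
 * tsd_size  : size of that copy in bytes
 * tlskey    : byte offset read from the entry labeled __tlskey
 * sym_value : SYMR.value of the TLS symbol
 * Computes *(TSD + __tlskey) + SYMR.value. */
static bool tls_symbol_address(const unsigned char *tsd, size_t tsd_size,
                               uint64_t tlskey, uint64_t sym_value,
                               uint64_t *out)
{
    uint64_t tls_pointer;   /* 64-bit address of the start of the TLS region */

    assert(tsd != NULL && out != NULL);
    if (tlskey == 0 || tlskey > tsd_size || tsd_size - tlskey < sizeof tls_pointer)
        return false;       /* slot not allocated, or offset outside the copy */

    memcpy(&tls_pointer, tsd + tlskey, sizeof tls_pointer);
    *out = tls_pointer + sym_value;
    return true;
}
```

For a non-shared object, the key would be the constant offset 2048 described in step 3.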


5.3.5   Profile Feedback Data

Profile feedback data is stored in entries in the optimization symbols table with tag type PPODE_PROFILE_INFO. The data contained in this section is intended for Compaq internal use only. It contains execution profiling feedback used by compilers and the om utility.

Profile feedback data contains relative file descriptor and local symbol table indexes. If an object tool removes, adds, or rearranges relative file descriptors or local symbol table entries, it must also remove all optimization symbol table entries, including the profile feedback data.


5.3.6   Scopes

From a user program's point of view, an identifier's scope determines its visibility in different parts of the program. Programming languages provide facilities for declaring and defining names of procedures, variables and other program components inside various scoping levels. This section briefly discusses the concept of scope and then explains how it is represented in the symbol table. References are made to structures in the auxiliary symbol table; see Section 5.3.7.3 for details.

Generally speaking, the four main scoping levels in a program are block scope, procedure scope, file scope, and program scope. Most programming languages have constructs to implement at least these scoping levels. Figure 5-14 shows the hierarchy of these scopes.

Figure 5-14 Basic Scopes

Names with block scope can only be referenced inside the declaring block. Blocks are delimited by begin and end markers, the syntax of which varies among languages.

Names with procedure scope are only recognized inside their enclosing subroutines. For instance, the names of formal parameters and local variables declared inside a procedure are accessible only to that procedure's executable statements.

Names with file scope can be referenced by any instruction within the file where they are declared. A file can be composed of procedures and data external to any procedure. Both external data names and procedure names can have file scope or program scope. Note that in a compilation involving only a single file or in a compilation for a programming language with no separate-compilation facilities, file scope and program scope are equivalent.

Names with program scope are visible everywhere in the program, even when the executable program is built from many source and header files. The linker must resolve these names or pass them to the dynamic loader to resolve. See Section 5.3.10 for more information about symbol resolution.

In the symbol table, procedure scope, file scope and program scope correspond to local, static, and global symbols, respectively. Block scope names are also local symbols. Local and static symbols appear in the local symbol table, and global symbols are in the external symbol table.


5.3.6.1   Procedure Scope

Although procedure symbols can only be global or static (with symbol types stProc and stStaticProc, respectively), procedure entries appear in the local symbol table to identify the containing scope of their local data. The set of symbols appearing in the local symbol table to describe a procedure scope and their associated auxiliary entries is shown in Figure 5-15. Global procedures also have entries in the external symbol table. As illustrated, the indices of these external entries point to the scoping entries in the local symbol table.

In this chapter, all diagrams of symbol table representations use arrows to show that one entry contains an index to another entry. For external and local symbol table entries, the index used is contained in the index field. For auxiliary symbols, the isym or RNDXR field is the index used. Any exceptions to this general rule are noted in the diagrams.

Figure 5-15 Procedure Representation

 

A special instance of a procedure definition occurs for a procedure with no text. This type of procedure occurs only in the local symbol table and is very similar to the representation of other procedures. It is generally used for procedures that have been optimized away that still need to be represented for debugging or profiling information.

Figure 5-16 Procedure with No Text

A procedure with no code can contain only nested procedures that also have no code associated with them. If a procedure with no code does not contain any nested procedures, the stBlock/stEnd symbol pair can be omitted from the representation.

The stProc symbol included in this representation is distinguished from similar stProc symbols by its value field that is set to addressNil (-1).


5.3.6.2   File Scope

As in the case of procedures, file name entries appear in the local symbol table to define the file's scope. This representation is shown in Figure 5-17. Note that file symbols appear in the local symbol table only.

Figure 5-17 File Representation


5.3.6.3   Block Scope

In general, the local symbol table denotes scoping levels with stBlock and stEnd pairs, as shown in Figure 5-18.

All symbols contained between these two entries belong to the scope they describe. Nested blocks are possible, and stEnd symbols match the most recent occurrences of stBlock (or other opening symbol entries such as stProc or stTag).

Figure 5-18 Block Representation

Block scopes occur in many languages. In C, they take the form of lexical blocks. In C++, declarations can occur anywhere in the code. In Pascal and Ada, nested procedures are possible, with local variables at any or all levels.
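The pairing rule for stBlock and stEnd entries described above amounts to a stack discipline. The sketch below pairs opening and closing entries using only their order in the table. The SymKind enumeration is a hypothetical stand-in for the real symbol type constants, and MAX_DEPTH is an arbitrary limit chosen for the example; the index fields that real entries carry are not used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of symbol types; not the real st constants. */
typedef enum { ST_OTHER, ST_BLOCK, ST_PROC, ST_TAG, ST_END } SymKind;

#define MAX_DEPTH 64   /* arbitrary nesting limit for this sketch */

/* For each symbol i, match[i] receives the index of its partner:
 * an opening entry gets its stEnd, an stEnd gets its opening entry,
 * and any other symbol gets -1.
 * Returns 0 on success, -1 if the entries are unbalanced or too deep. */
static int match_scopes(const SymKind *syms, size_t n, long *match)
{
    size_t stack[MAX_DEPTH];
    size_t depth = 0;

    assert(syms != NULL && match != NULL);
    for (size_t i = 0; i < n; i++) {
        match[i] = -1;
        switch (syms[i]) {
        case ST_BLOCK:
        case ST_PROC:
        case ST_TAG:
            if (depth == MAX_DEPTH)
                return -1;
            stack[depth++] = i;             /* remember the open scope */
            break;
        case ST_END:
            if (depth == 0)
                return -1;                  /* stEnd without an opener */
            depth--;
            match[i] = (long)stack[depth];  /* most recent open scope */
            match[stack[depth]] = (long)i;
            break;
        default:
            break;
        }
    }
    return depth == 0 ? 0 : -1;
}
```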


5.3.6.4   Namespaces (C++)

A C++ namespace is a mechanism that allows the partitioning of the program global name space. This partitioning is intended to reduce name clashing and provide greater program manageability to C++ developers.

Figure 5-19 C++ Namespace Representation

A namespace definition may exist only at the global scope or within another namespace. The namespace representation in Figure 5-19 shows a single contribution to a namespace. This representation may be replicated many times in the symbol table for a single namespace. A namespace definition may be continued within the same file or over multiple source files.

A single namespace contribution that spans multiple source files is represented as if it were contained entirely within the source file in which it began.

Namespaces may be aliased, allowing a single namespace to be referred to by multiple names. Namespace components may also be referenced without their namespace qualification if they are included within a scope by a using directive or using declaration. The representations of namespace aliases, using directives, and using declarations are shown in Figure 5-19. Namespace definitions, namespace component declarations, namespace aliases, using directives, and using declarations occur only in the local symbol table. Namespace component definitions may occur in the local or external symbol table.


5.3.6.4.1   Namespace Components

The components of a namespace are represented in two parts: declarations and definitions. Namespace components that do not require definition must be declared in the namespace definition. Namespace components that are referenced by a using declaration must be declared in the namespace definition. All other namespace component declarations may be omitted from the namespace definition.

Namespace component names are mangled only as needed. Function and data definitions have mangled name definitions in the local or external symbol table. These entries are mangled for type-safe linkage and as a method of matching components with the namespaces to which they belong. Names of component declarations within a namespace definition may or may not be mangled. They are not required to include the namespace name in their mangled form.

Empty namespace contributions can be omitted, but at least one instance of a namespace definition must occur somewhere in the local symbol table. This definition is required because name mangling rules do not distinguish namespace component definitions from class member definitions.


5.3.6.4.2   Namespace Aliases

Namespace aliases can occur in namespace, file, procedure or block scope in the local symbol table. The index value for the stAlias entry is an auxiliary table index. The auxiliary entry is a RNDXR record containing the local symbol table index of the stNamespace symbol in the first instance of a namespace definition within a compilation unit. For an alias of an alias, the RNDXR record can also contain the index of another stAlias symbol in the local symbol table. Section 9.2.5 provides an example of a namespace alias.

The stAlias symbol type may be used in future versions of the symbol table format as a general purpose symbol alias representation. The semantic interpretation of the stAlias symbol depends on the type of the symbol it aliases.


5.3.6.4.3   Unnamed Namespace

An unnamed namespace can be declared at the global scope or within another namespace. An unnamed namespace is unique within a compilation unit. Multiple contributions to a unique unnamed namespace are not allowed. Unnamed namespace contributions are included in the non-mergeable portion of a C++ header file.

Unnamed namespace components are subject to the same rules as named namespaces for declarations and definitions.

The stNamespace symbol for an unnamed namespace has no name, and its iss field is set to issNil. A compiler-generated name is used to identify the unnamed namespace in the mangled names of unnamed namespace components. A convention for this special name is currently being investigated and will be identified in the next release of this document. The unnamed namespace example in Section 9.2.4 will use the name __unnamed until the actual naming convention has been determined.


5.3.6.4.4   Usage of Namespaces

A C++ using directive or a using declaration is represented by a symbol of type stUsing. It may occur in any scope in the local symbol table. The index value for the stUsing entry is an auxiliary table index. If the stUsing entry represents a using declaration for a single namespace component, the auxiliary entry is a RNDXR record containing the local symbol table index of a namespace component declaration. If the stUsing entry represents a using directive, its RNDXR auxiliary contains the local symbol table index of the stNamespace symbol in the first definition of that namespace in the compilation unit.

A using directive for a namespace alias is represented with a RNDXR auxiliary that directly references the aliased namespace. This representation contains no record of the alias referenced by the using directive.

Names are not required for stUsing entries, but they can be set to match the namespace or namespace component to which they refer.

Namespace components that are referenced by an stUsing symbol must be declared in the namespace definition.

Section 9.2.3 provides an example of namespace definitions and uses.


5.3.6.5   Exception Handling Blocks (C++)

In C++, a special scoping mechanism is introduced to expand user-defined exception-handling capabilities. Exception handlers are defined to "catch" exceptions that are "thrown" by other functions. The symbol table must contain sufficient information to recognize the scope of a handler. The compiler generates special symbols to identify where exception handlers are valid.

Figure 5-20 C++ Exception Handler Representation


5.3.6.6   Common Blocks (Fortran)

Fortran common blocks constitute another scoping level. Fortran uses common blocks as a way of specifying data that is global or shared between program units. A common block is global storage that can be named, allotted, accessed, and used by various subroutines. The block can be named or unnamed; unnamed blocks are known as "blank commons". Internal to the symbol table, blank commons are named "_BLNK_".

Figure 5-21 shows the symbolic representation of Fortran common blocks.

Figure 5-21 Fortran Common Block Representation

Because a Fortran common is represented as a synthesized file, it also has an entry in the file descriptor table. Furthermore, a global symbol with the same name is also present in the external symbol table.

An example of a Fortran common block can be found in Section 9.3.1.


5.3.6.7   Alternate Entry Points

Fortran also has a facility for creating alternate entry points in procedures. An alternate entry point is represented using an stProc, scText symbol. In the procedure descriptor table, an alternate entry point is identified by a lnHigh field with a value of -1. Procedure descriptors for alternate entry points follow the procedure descriptor for the primary entry point. In the local symbol table, an alternate entry point has an entry inside the scope of the procedure's main entry.

The representation of a procedure with an alternate entry point is shown in Figure 5-22.

Figure 5-22 Alternate Entry Point Representation

An example of Fortran alternate entries can be found in Section 9.3.2.
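Because procedure descriptors for alternate entry points follow the descriptor for the primary entry point, a consumer can recover the primary entry by scanning backward. The sketch below assumes that alternate entries immediately follow their primary entry with no unrelated descriptors in between; the text states only that they follow it. The Pd structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long lnHigh; } Pd;   /* PDR subset */

/* An alternate entry point is marked by lnHigh == -1. */
static int is_alternate_entry(const Pd *p)
{
    assert(p != NULL);
    return p->lnHigh == -1;
}

/* Return the index of the primary entry's PDR for pds[i],
 * or -1 if no primary entry precedes it. */
static long primary_entry_index(const Pd *pds, size_t n, size_t i)
{
    assert(pds != NULL && i < n);
    while (is_alternate_entry(&pds[i])) {
        if (i == 0)
            return -1;    /* ran off the start of the table */
        i--;
    }
    return (long)i;
}
```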


5.3.7   Data Types in the Symbol Table

A data element's type dictates its size and interpretation in a programming environment. One of the symbol table's most important tasks is to represent data types in a compact and complete manner.

Type information is stored in the local and auxiliary symbol tables. This section provides guidelines for understanding the type information plus specific examples for depicting a range of types.


5.3.7.1   Basic Types

All programming languages have a set of simple types that are built into the language and from which other data types can be derived. Examples of simple types are integer, character, and floating point. Languages also provide constructs for creating user-defined types based on the simple types. For example, a C++ class can be built using any simple type or previously defined user-defined type and the language facility for declaring classes.

Similarly, a basic type in the symbol table is a building block from which each language constructs its type information. Basic type (bt) values directly represent many of the simple types for supported languages; for instance, the value btChar indicates a character. Other bt values represent language constructs for building aggregate types; a value of btStruct may be used, for example, to represent a C structure or Pascal record.

The symbol table uses approximately forty basic type values. The interpretation of some of these values is language dependent. See Table 5-4 for a list of all values.


5.3.7.2   Type Qualifiers

Type qualifiers can be applied to basic types to create other data types. Examples are "pointer to" and "array of". Generally the number and order of type qualifiers is unrestricted.

The type qualifier "function returning" (tqProc) is not used in V3.13 of the symbol table. However, it is used in prior versions for variables declared as function pointers. This older representation uses a TIR record to store the function type in the bt value followed by as many type qualifiers as necessary. A major limitation of this representation is the inability to represent parameter types.

The symbol table currently uses eight type qualifiers. See Table 5-5 for a list of all possible values.


5.3.7.3   Interpreting Type Descriptions in the Auxiliary Table

This section explains in detail the encoding of type descriptions in the symbol table. To fully describe the type of a symbol, the auxiliary symbol table must be created and referenced. Compilation with full symbolic information (-g option on system compilers) results in the creation of this table.

To correctly decode the type information, proceed sequentially, beginning with the symbol table entry. Several fields may be required from other symbol table structures:

The first step is to determine whether the symbol contains an index of an auxiliary table description.

Table 5-12 Symbol Table Entries with Associated Auxiliary Table Type Descriptions

Symbol Type    Storage Class   Conditions                       SYMR Field Containing AUXU Index
------------------------------------------------------------------------------------------------
stGlobal       Any             None                             index
stStatic       Any             None                             index
stParam        Any             None                             index
stLocal        Any             Local symbol table               index
stProc         Any             Local symbol table only          index
stBlock        scInfo          Inside an scVariant block only   value
stMember       scInfo          None                             index
stTypedef      scInfo          None                             index
stStaticProc   Any             Local symbol table only          index
stConstant     Any             None                             index
stBase         scInfo          None                             index
stVirtBase     scInfo          None                             index
stTag          scInfo          None                             index
stInter        scInfo          None                             index
stNamespace    scInfo          None                             index
stUsing        scInfo          None                             index
stAlias        scInfo          None                             index

 

If the index does represent a record in the auxiliary symbol table, the interpretation of the first auxiliary entry (AUXU) depends on the type of the symbol:

The next task is to examine the contents of the TIR. The TIR contains constants representing the basic type of the symbol and up to six type qualifiers, labeled tq0-tq5. If a type has more than one qualifier, they are ordered from lowest to highest. Lower qualifiers are applied to the basic type before higher qualifiers. All unused tq fields are set to tqNil, and no tqNil fields are present before or between other type qualifiers.

In addition to the basic type and type qualifiers, the TIR contains two flags: an fBitfield flag to mark whether the size of the type is explicitly recorded, and a continued flag to indicate that the type description is continued in another TIR. If fBitfield is set, the TIR is immediately followed by a width entry. If more than six type qualifiers are required for the current definition, the description is continued, and the continued flag is set. If exactly six type qualifiers are needed, all six fields are used and the continued flag is cleared.

To illustrate, consider the type "array of pointers to integers". The basic type is "integer" and has two qualifiers, "array of" and "pointer to". Each element of the array is a "pointer to integer". Therefore, the qualifier "pointer to" must be applied first to the basic type "integer". In this example, the qualifier "pointer to" is lower than the qualifier "array of". The contents of the TIR are as follows:

        bt: btInt
        tq0: tqPtr
        tq1: tqArray
        tq2: tqNil
        tq3: tqNil
        tq4: tqNil
        tq5: tqNil
        continued: 0
        fBitfield: 0
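The ordering rule can be made concrete with a small routine that turns a TIR into an English description. Because lower qualifiers are applied first, the highest qualifier is the outermost one and is printed first. The enumerations and the Tir structure below are hypothetical stand-ins for this example (only two basic types and two qualifiers are modeled), and continued TIR records are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TQ_NIL, TQ_PTR, TQ_ARRAY } Tq;   /* placeholder values */
typedef enum { BT_INT, BT_CHAR } Bt;            /* placeholder values */

typedef struct {
    Bt  bt;
    Tq  tq[6];        /* tq0 through tq5 */
    int continued;
    int fBitfield;
} Tir;

/* Write a description such as "array of pointer to int" into buf. */
static void describe_type(const Tir *t, char *buf, size_t size)
{
    int n = 0;

    assert(t != NULL && buf != NULL && size > 0);
    buf[0] = '\0';

    /* Count the qualifiers in use; no tqNil appears before or between them. */
    while (n < 6 && t->tq[n] != TQ_NIL)
        n++;

    /* The highest qualifier is outermost, so emit from tq[n-1] down to tq0. */
    for (int i = n - 1; i >= 0; i--) {
        const char *q = (t->tq[i] == TQ_PTR) ? "pointer to " : "array of ";
        strncat(buf, q, size - strlen(buf) - 1);
    }
    strncat(buf, t->bt == BT_INT ? "int" : "char", size - strlen(buf) - 1);
}
```

Swapping tq0 and tq1 in the example above would instead describe a pointer to an array of integers.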

The contents of the TIR dictate how to interpret any subsequent records. The records appear in a prescribed order:

For a type description containing more than one TIR, the fields of all TIR records are interpreted in the same way. When a TIR is reached with the continued flag cleared and any records associated with that TIR have been decoded, the type description is complete.

As an example, consider an array of structures with the fBitfield flag set. A total of seven auxiliary records can be used to describe the type:

  1. The TIR with a basic type of btStruct and with tq0 set to tqArray
  2. A width record. The size of the basic type
  3. A RNDXR record. A pointer to the structure definition in the local symbol table
  4. A RNDXR record. A pointer to the array index type description elsewhere in the auxiliary table
  5. A dnlow record. The lower bound of the array's range
  6. A dnhigh record. The upper bound of the array's range
  7. A width record. The distance in bits between each element in the array

If the continued flag of the TIR is cleared, the width record corresponding to the array qualifier is the final AUXU for this type description.
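
The record count in this example can be reproduced with a short sketch. The function below is hypothetical and rests on several assumptions: only the 32-bit array form is considered, no ST_RFDESCAPE isym records are present, the caller states whether the basic type needs an RNDXR, and the value of tqArray is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values (assumption), not taken from the system headers. */
enum { tqNil = 0, tqArray = 3 };

/* Count the auxiliary records in a single-TIR type description.
 *   tq         - the six qualifier fields, tq0 first
 *   fBitfield  - nonzero if a width record follows the TIR
 *   needs_rndx - nonzero if the basic type refers to another symbol
 *                (for example, a structure definition)
 */
int count_aux_records(const unsigned tq[6], int fBitfield, int needs_rndx)
{
    assert(tq != NULL);
    int count = 1;                  /* the TIR itself */
    if (fBitfield)
        count += 1;                 /* width of the basic type */
    if (needs_rndx)
        count += 1;                 /* RNDXR to the referenced symbol */
    for (int i = 0; i < 6 && tq[i] != tqNil; i++) {
        if (tq[i] == tqArray)
            count += 4;             /* RNDXR index type, dnlow, dnhigh, width */
    }
    return count;
}
```

For the array of structures above (tq0 set to tqArray, fBitfield set, RNDXR required), the function returns 7.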

For another view of this process, see Figure 5-23. Each box represents one auxiliary entry belonging to the symbol's type description. Using the flowchart, an ordered list of entries can be assembled.

Figure 5-23 Auxiliary Table Interpretation

Figure 5-24 Auxiliary Table "ti" Interpretation

Figure 5-25 Auxiliary Table "bt vals" Interpretation

Figure 5-26 Auxiliary Table "arrays" Interpretation

 

Figure 5-27 Auxiliary Table "range" Interpretation

Figure 5-28 Auxiliary Table "rndx" Interpretation

 

The final step is to decode the RNDXR records. The basic types that are followed by RNDXR records require reference to another local or auxiliary symbol to complete the type description. Interpret the RNDXR records as follows:

Additionally, the index of every RNDXR used as a pointer must be mapped through the relative file descriptor table (see Section 5.3.2.1), if the table exists. The rfd field of the record controls this mapping. The following algorithm can be used to locate the symbol referenced by the relative index record:

if (RNDXR.rfd == ST_RFDESCAPE)
    RFD = (++AUXU).isym
else 
    RFD = RNDXR.rfd 
if (HDRR.crfd) /* RFD table exists */
    IFD = (current FDR's RFD table)[RFD]
else
    IFD = RFD

if (SYMR needed)
    SYMBASE = FDR[IFD].isymBase
    SYMR = SYMBASE[RNDXR.index]
else if (AUXU needed)
    AUXBASE = FDR[IFD].iauxBase
    AUXU = AUXBASE[RNDXR.index]
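
The same algorithm can be sketched in C. The structure layouts below are simplified, hypothetical stand-ins that keep only the fields named in the pseudocode, and the value used for ST_RFDESCAPE is an assumption. The sketch returns absolute table indices rather than record pointers.

```c
#include <assert.h>
#include <stddef.h>

#define ST_RFDESCAPE 0xfff   /* assumed escape value for the rfd field */

/* Hypothetical, simplified records. */
typedef struct { unsigned rfd; unsigned index; } SketchRNDXR;
typedef struct { unsigned isymBase; unsigned iauxBase; } SketchFDR;

/* Map a relative index record to a file descriptor index.
 *   escape_isym - isym of the AUXU following the RNDXR; used only
 *                 when rfd holds the escape value
 *   rfd_table   - the current FDR's RFD table, or NULL when no RFD
 *                 table exists (HDRR.crfd is zero)
 */
unsigned resolve_ifd(SketchRNDXR r, unsigned escape_isym,
                     const unsigned *rfd_table, size_t rfd_count)
{
    unsigned rfd = (r.rfd == ST_RFDESCAPE) ? escape_isym : r.rfd;
    if (rfd_table == NULL)
        return rfd;                 /* no mapping needed */
    assert(rfd < rfd_count);
    return rfd_table[rfd];
}

/* Absolute index of the referenced entry in the local symbol table. */
unsigned resolve_sym_index(const SketchFDR *fdrs, unsigned ifd, SketchRNDXR r)
{
    return fdrs[ifd].isymBase + r.index;
}

/* Absolute index of the referenced entry in the auxiliary table. */
unsigned resolve_aux_index(const SketchFDR *fdrs, unsigned ifd, SketchRNDXR r)
{
    return fdrs[ifd].iauxBase + r.index;
}
```

A caller first obtains the file index with resolve_ifd and then picks resolve_sym_index or resolve_aux_index, depending on whether the basic type refers to a local symbol or to another auxiliary entry.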

 


5.3.8   Individual Type Representations

This section provides sketches of type representations in the local and auxiliary symbol tables. The connections between the two tables are depicted for each type. This form of representation is only possible when full symbolic information is present.

Note that external symbols as well as local symbols reference the auxiliary table, although the examples in this chapter use local symbols only.


5.3.8.1   Pointer Type

A pointer is a variable containing the address of another variable. Its type is represented by a tqPtr type qualifier modifying another type. A pointer variable is described by a single symbol with an entry in the auxiliary table, as shown in Figure 5-29.

Note that if the pointer referenced a user-defined type, such as a class or structure, the TIR would be followed by an RNDXR (and possibly an isym).

Figure 5-29 Pointer Representation

The combination of type qualifiers tqFar and tqPtr is used to represent a short (32-bit) pointer. This pointer type is used with the XTASO emulation.


5.3.8.2   Array Type

An array is a list of elements that all have the same type. Arrays may be fixed size and allocated at compile time or dynamically sized and allocated at run time. This section describes the fixed-size array symbol table representation. For information on Fortran dynamic arrays, see Section 5.3.8.9. For conformant arrays in Pascal and Ada, see Section 5.3.8.10.

An array is represented by a tqArray or tqArray_64 type qualifier applied to another type. This second type describes the type of all elements in the array. In the local or external symbol table, a single entry represents an array. Figure 5-30 shows the symbol table description for an array.

Figure 5-30 Array Representation

Note that for an array of elements of a user-defined type, such as a class or structure, another RNDXR (and possibly an isym) would be inserted between the TIR and the RNDXR describing the subscript type.

If an array has multiple dimensions, the entries describing the dimensions appear in order from innermost to outermost. For example, the following declaration produces a TIR with the tqArray qualifier, followed by the RNDXR and range description for 0-1, followed by the entries for the dimension 0-99:

float floattable[100][2];
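
The ordering rule can be illustrated with a small sketch that lists the bounds in the order a consumer would meet them. The SketchRange type and the aux_dimension_order function are hypothetical, and the sketch assumes zero-based C arrays, where the innermost dimension is the rightmost subscript in the declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for one dimension's bounds. */
typedef struct { long low; long high; } SketchRange;

/* Fill out[] with the dimension ranges in auxiliary table order:
 * innermost (rightmost C subscript) first, outermost last.
 *   dims  - declared extents in source order, e.g. {100, 2}
 *   ndims - number of dimensions
 */
void aux_dimension_order(const long *dims, size_t ndims, SketchRange *out)
{
    assert(dims != NULL && out != NULL);
    for (size_t i = 0; i < ndims; i++) {
        out[i].low = 0;                          /* C arrays start at 0 */
        out[i].high = dims[ndims - 1 - i] - 1;   /* reverse source order */
    }
}
```

For the extents {100, 2}, the first range produced is 0-1 and the second is 0-99, which matches the order described for floattable.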

Some arrays may have dimensions too large to represent in the 32-bit format shown in Figure 5-30. Such arrays are represented using a 64-bit format in which two auxiliary entries are used for the dimension bounds and size. Figure 5-31 illustrates the 64-bit representation.

Figure 5-31 64-Bit Array Representation


5.3.8.3   Structure, Union, and Enumerated Types

This section applies to data structures in languages other than C++. For the C++ structure, union, or enumerated type representation, see Section 5.3.8.6.

Structures, unions, and enumerated types have a common representation. All three are identified using "tags" and contain zero or more fields. In the symbol table, the tag is the name associated with the starting stBlock symbol for the structure's set of local symbols. Note that it may be empty because the tag is optional. Symbols for fields follow. The definition is completed by a block-end symbol matching the block-start symbol.

Figure 5-32 contains a graphical depiction of this set of symbols.

Figure 5-32 Structure Representation

The structure members have auxiliary table indices pointing to their type descriptions.

Untagged structures and unions are represented with a NULL tag name. Unnamed structures can be embedded in other structures and are represented as a NULL-named member of the outer structure. See Section 9.1.1 for an example of an unnamed structure.

A structure can contain a field that is a pointer to itself. This field is represented by an stMember symbol with an auxiliary table entry that references the beginning of the structure's block of local symbols, as shown in Figure 5-33.

Figure 5-33 Recursive Structure Representation

When a field within a structure is itself a structure, the compiler may choose to generate the structure definitions either sequentially or embedded, as shown in Figure 5-34.

Figure 5-34 Nested Structure Representation

The following declaration might result in the nested structure representation:

struct line {
    struct point {
        float x, y;
    } p1, p2;
};

 


5.3.8.4   Typedef Type

Most languages allow programmers to choose alternate names, or aliases, for data types. The alias created by such a facility (such as C's typedef) is represented as a single local symbol entry that has a pointer to its type description in the auxiliary table. The auxiliary entry contains a pointer to the definition of the type name, as shown in Figure 5-35.

Figure 5-35 Typedef Representation


5.3.8.5   Function Pointer Type

Languages such as C and C++, which allow pointers to functions, represent the type of the function pointer using a special stProc/scInfo block describing the parameters and return value for the function as shown in Figure 5-36.

Figure 5-36 Function Pointer Representation

The stProc/scInfo entry has its value set to -2, which distinguishes it from similar entries used to represent procedures with no text and C++ member functions. The stProc/scInfo and stEnd/scInfo entries have null names in the function pointer representation. The parameters are optional and may or may not be named.

This representation for function pointers is new in V3.13. The previous representation used the combination of type qualifiers tqPtr and tqProc in the TIR of the function pointer variable. Prior to V3.13, it was not possible to represent the parameter types for a function pointer.


5.3.8.6   Class Type (C++)

A C++ class resembles an extended C structure. One major distinction is that class fields (referred to as "members") can be functions as well as variables. The set of symbols created for a class is organized as follows:

Another characteristic of classes is that symbols are defined implicitly. For example, all classes have an operator= operator-overloading function included in the class definition and a "this" pointer to its own type as a parameter to all member functions. These symbols are always included explicitly in the symbol table description.

Figure 5-37 is a graphical representation of the set of symbols for a class.

Figure 5-37 Class Representation

Class members, including member functions, have auxiliary references that point to their type descriptions. Note that member functions are represented as prototypes. The set of symbols defining the member function is elsewhere in the symbol table. To locate the definition of a member function, a name lookup can be performed using the mangled name of the member function with its class name qualifier. See Section 5.3.10.3 for information on name mangling.

C++ structures, unions, and enumerated types are represented the same way as classes. The different data structures are distinguished by basic type value.

The symbol table does not represent class member access attributes.

Examples of base and derived classes can be found in Section 9.2.1.


5.3.8.6.1   Empty Class or Structure (C++)

The representation of empty classes or structures in C++ is shown in Figure 5-38.

Figure 5-38 Empty Class or Structure (C++)


5.3.8.6.2   Base and Derived Classes (C++)

Hierarchical groups of classes can be designed in C++. A base class serves as a wider classification for its derived classes, and a derived class has all of the members and methods of the base class, plus additional members of its own. In the symbol table, the set of symbols denoting a derived class is nearly identical to that for a non-derived class. The derived class includes an additional stBase or stVirtBase symbol that identifies its corresponding base class, and it does not need to duplicate the definitions for the base class members. This representation is shown in Figure 5-39.

Figure 5-39 Base Class Representation

The representation of virtual base classes for C++ relies on the definition of a special symbol that identifies the virtual base table. The name for this symbol is derived from the name of the class to which it belongs. For example, the virtual base table symbol for class C5 would be named "__btbl_2C5". This table contains entries for base class run-time descriptions.

A class can include the special member "__bptr". This class member is a pointer to the virtual base table for that class.

The value field for a virtual base class symbol (stVirtBase/scInfo) serves as an index (starting at 1) into the virtual base class table.


5.3.8.7   Template Type (C++)

Templates are a C++-specific language construct allowing the parameterization of types. C++ class templates are represented in the symbol table for each instantiation, but not for the template itself. The set of class symbols is unchanged from the set shown in Figure 5-37.


5.3.8.8   Interlude Type (C++)

Interludes are compiler-generated functions in C++. They are represented in the local symbol table with special names starting with the "__INTER__" prefix. Their representation in the symbol table makes use of two RNDXR aux entries to identify the related member function and the actual interlude function, both of which are local symbol table entries.

Figure 5-40 Interlude Representation


5.3.8.9   Array Descriptor Type (Fortran90)

A Fortran90 array descriptor is a structure that describes an array: its location, dimensions, bounds, sizes, and other attributes. Array descriptors are described in detail in the Fortran 90 User Manual for Tru64 UNIX. Fortran90 includes several types of arrays for which the dimensions or dimension bounds are determined at run time: allocatable arrays, assumed-shape arrays, and array pointers.

Two symbol table representations can be used for an array descriptor. The default representation describes the array descriptor itself. The alternate representation describes what is known of the array itself at compile time.

No matter what symbolic representation is used, symbols of this type point to a data location at which the array descriptor is allocated. One of the array descriptor fields contains a pointer to the actual array. Other fields are used to describe the attributes of the array. Fields that describe the number of dimensions and upper and lower bounds are filled in at run time.

By default, array descriptors are described by a structure tag representation. Most of the array descriptor fields are represented as structure members. (Excluded fields are not needed by debuggers.) Special tag names are used to identify array descriptor structure definitions: $f90$f90_array_desc (assumed-shape array), $f90$f90_ptr_desc (pointer to array) and $f90$f90_alloc_desc (allocatable array). Figure 5-41 shows the format of this representation.

Some compilers may emit other fields in addition to those shown in Figure 5-41. A consumer's ability to interpret additional fields depends on its knowledge of the producing compiler.

Figure 5-41 Array Descriptor Representation (I)

An example of the default Fortran array descriptor representation can be found in Section 9.3.3.

An alternate representation for array descriptors may be found in symbol tables prior to V3.13. The overloaded basic type value 28 indicates an array descriptor in the TIR, and dimension bounds are set to [1:1], indicating that their true size is unknown. The alternate representation does not provide any information describing the contents of the array descriptor itself, so debuggers must assume a static representation for the descriptor and look up the fields at their expected offsets.

This representation is substantially more compact in the local symbol table, but it provides no way to distinguish between the different types of array descriptors.

Figure 5-42 shows the format of the older array descriptor representation.

Figure 5-42 Array Descriptor Representation (II)


5.3.8.10   Conformant Array Type (Pascal)

Full details are not currently available for Pascal's conformant array representation. A Pascal conformant array is very similar to a Fortran assumed-shape array. It is an array parameter with upper and lower dimension bounds that are determined by the input argument. A conformant array is represented by an array descriptor. The special names used and the format of the array descriptor differ from those used for Fortran. The DEC Pascal release notes contain additional information on conformant arrays.


5.3.8.11   Variant Record Type (Pascal and Ada)

A variant record is an extension to the record data type, which is a Pascal or Ada data structure akin to a C struct and is represented in the same manner in the symbol table. The variant part of the record consists of sets of one or more fields associated with a range of values. Only one such set is part of the record, and it is selected based on the value of another record field. Any number of variant parts can be embedded in a single record.

The local symbol table entries for the variant part of a record are contained within a block with the storage class (sc value) scVariant. The value field of the stBlock entry contains the index of the local symbol entry for the member of the record whose value determines which variant arm is used. The variant block contains multiple inner blocks, each representing a variant arm. The value field of each of these block entries is an auxiliary table index. Each auxiliary table entry starts with a count, which indicates how many range entries follow. The range entries describe the values associated with the block.
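
A consumer that knows the value of the selecting member can test each arm against its range entries. The sketch below assumes, purely for illustration, that the count and ranges have already been unpacked into an array of long values holding the count followed by (low, high) pairs. The actual encoding of the range entries is defined by the figure and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Decide whether a variant arm applies to a given tag value.
 *   aux - unpacked arm description (hypothetical layout):
 *         aux[0] is the count of ranges, followed by that many
 *         (low, high) pairs with inclusive bounds
 *   tag - current value of the record member that selects the arm
 * Returns 1 if the tag falls in any of the arm's ranges, else 0.
 */
int arm_matches(const long *aux, long tag)
{
    assert(aux != NULL && aux[0] >= 0);
    long count = aux[0];
    for (long i = 0; i < count; i++) {
        long low = aux[1 + 2 * i];
        long high = aux[2 + 2 * i];
        if (tag >= low && tag <= high)
            return 1;
    }
    return 0;
}
```

A debugger could call arm_matches once per inner block and display only the fields of the arm for which it returns 1.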

Figure 5-43 is a graphical representation of a variant record.

Figure 5-43 Variant Record Representation

Prior to V3.13 of the symbol table, variant records were represented differently. Figure 5-44 depicts the older representation.

Figure 5-44 Variant Record Representation (pre-V3.13)

An example of a Pascal variant record can be found in Section 9.4.3.


5.3.8.12   Subrange Type (Pascal and Ada)

A subrange data type defines a subset of the values associated with a particular ordinal type (the "base type" of the subrange). Ordinal types in Pascal include integers, characters, and enumerated types. The symbol table representation of a subrange uses the btRange or btRange_64 type followed by an auxiliary index identifying the base type and entries providing the bounds of the subrange. The 32-bit representation is shown in Figure 5-45 and the 64-bit representation is shown in Figure 5-46.

Figure 5-45 Subrange Representation

 

Figure 5-46 64-bit Range Representation

An example of a Pascal subrange can be found in Section 9.4.2.


5.3.8.13   Set Type (Pascal)

A set is a data type that groups ordinal elements in an unordered list. The arithmetic and logical operators are overloaded in Pascal; this enables them to be used with set variables to perform classic set operations such as union and intersection. A special auxiliary type definition btSet exists to identify this type. The symbol table representation is depicted in Figure 5-47.

Figure 5-47 Set Representation

The element type for a set is typically a range or an enumeration. An example of a Pascal set can be found in Section 9.4.1.


5.3.9   Special Debug Symbols

A variety of special symbols are used throughout the symbol table to convey call frame information, special type semantics, or other language-specific information. These names are reserved for use by compilers and other tools that produce Tru64 UNIX object files.

Name                        Purpose

__StaticLink.*              Uplevel link. See Section 5.3.4.4.
_BLNK__                     Fortran unnamed common block. See Section 5.3.6.6.
MAIN__                      Fortran alias for main program unit. See Section 5.3.10.4.
<ARGNAME>.len               Generated parameter for Fortran routines. It contains the length of <ARGNAME>, a parameter of character type.
.lb_<ARRAY>.<dim>           Lower and upper bounds of particular dimensions of arrays, used when the array has an explicit shape but some bounds come from non-constant specification expressions (array arguments in Pascal and Fortran routines).
.ub_<ARRAY>.<dim>
$f90$f90_array_desc         Variants of Fortran-90 described arrays (assumed shape, ALLOCATABLE, and POINTER, respectively). See Section 5.3.8.9.
$f90$f90_alloc_desc
$f90$f90_ptr_desc
cray pointee                Fortran-generated typedef describing the type of a variable pointed to by a CRAY pointer.
pointer                     Fortran-generated typedef describing the type of a scalar with the POINTER attribute.
_DECCXX_generated_name_*    DEC C++ compiler-inserted name for unnamed classes and enumerations.
this                        Hidden parameter in C++ member functions that is a pointer to the current instance of the class. See Section 5.3.8.6.
__vptr                      Hidden C++ class member containing the virtual function table. See example in Section 9.2.2.
__bptr                      Hidden C++ class member containing the virtual base class table. See example in Section 9.2.2.
__vtbl_*                    Global symbols for C++ virtual function tables. See example in Section 9.2.2.
__btbl_*                    Global symbols for C++ virtual base class tables. See example in Section 9.2.2.
__control                   Hidden argument to C++ constructors controlling descent (in the face of virtual base classes).
__t*__evdf                  Structure used to maintain a list of C++ global destructors.
t*__iviw                    C++ static procedure used for global constructors.
t*__evdw                    C++ static procedure used for global destructors.
__t*_thunk                  C++ static procedure used to provide a defaulted argument value.
__INTER__*                  C++ interlude. See example in Section 9.2.2.
__unnamed::*                C++ unnamed namespace components. See example in Section 9.2.4.

 


5.3.10   Symbol Resolution

Among the linker's chief tasks is symbol resolution. Because most compilations involve multiple source files and virtually all programs rely on system libraries, a process is necessary to resolve conflicting uses of global symbol names. The linker must decide which symbol is referenced by a given name. This section highlights the major issues involved in that decision. Related information is contained in Section 6.3.4 and the Programmer's Guide.

Symbol table entries provide information relevant to performing symbol resolution. External symbols with a storage class of sc(S)Undefined, sc(S)Common, or scTlsCommon must be resolved before they are referenced. By default, the linker will not mark an object file with unresolved symbols as executable. However, linker options give programmers a fair measure of control over its symbol resolution behavior. See ld(1) for more information.


5.3.10.1   Library Search

Symbols that are referenced but not defined in the main executable of an application must be matched with definitions in linked-in libraries. The linker combines objects, archives, and shared libraries while attempting to resolve all references to undefined symbols. The Programmer's Guide covers related topics in detail, such as how to specify libraries during compilation and the search order of libraries.

In general, main executable objects and shared libraries are searched before archive libraries. If no undefined external symbols remain, archive libraries in the library list do not have to be searched, because archive members are only loaded to resolve external references. Archives are not used to find "better" common definitions (see Section 5.3.10.2), and no archive definitions preempt symbol definitions from the main object or shared libraries.


5.3.10.2   Resolution of Symbols with Common Storage Class

Symbols with common storage class are a special category of global symbols that have a size but no allocated storage. Symbols with common storage class should not be confused with Fortran common symbols, which are not represented by a single symbol table entry. (See Section 5.3.6.6 for a description of Fortran common symbols.) Common storage classes are scCommon, scSCommon, and scTlsCommon.

The symbol definition model used by Tru64 UNIX allows an unlimited number of common storage class symbols with the same name. Ultimately, the "best" of these must be selected (by the linker or the loader) during symbol resolution. The criteria used to select the best symbol definition include the symbol's allocation status and size.

The symbol table does not provide an "allocated common" storage class. Common storage class symbols adopt a new storage class when they are allocated. Typically, their new storage class is scBss, scSBss, or scTlsBss. On the other hand, the dynamic symbol table does explicitly distinguish common storage class symbols that have been allocated. See Section 6.3.4 for more information on dynamic symbol resolution.

A symbol reference is resolved according to the following precedence rules:

  1. Find a symbol definition that does not have a common storage class and is not identified as an allocated common in the dynamic symbol table.
  2. Find the largest allocated common identified in the dynamic symbol table.
  3. Find the largest common storage class symbol and allocate it. This step will be skipped when the linker produces a relocatable object file.
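
The precedence rules above can be sketched as a selection over candidate definitions. The SketchDef type and pick_definition function are hypothetical. The sketch also assumes that when several candidates tie (for example, two definitions without common storage class, or two commons of equal size), the first one seen is kept. The text does not state a tie-breaking rule.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate kinds, ordered by precedence (lower value wins). */
typedef enum {
    DEF_REGULAR,           /* rule 1: not common, not allocated common */
    DEF_ALLOCATED_COMMON,  /* rule 2: allocated common (dynamic symbol table) */
    DEF_COMMON             /* rule 3: unallocated common storage class */
} SketchDefKind;

typedef struct {
    SketchDefKind kind;
    unsigned long size;
} SketchDef;

/* Return the index of the preferred definition, or -1 if there is none. */
int pick_definition(const SketchDef *defs, int n)
{
    assert(n == 0 || defs != NULL);
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (best < 0) {
            best = i;
        } else if (defs[i].kind < defs[best].kind) {
            best = i;                       /* higher-precedence rule */
        } else if (defs[i].kind == defs[best].kind &&
                   defs[i].kind != DEF_REGULAR &&
                   defs[i].size > defs[best].size) {
            best = i;                       /* larger common wins */
        }
    }
    return best;
}
```

The sketch only selects a candidate. Allocating storage for a chosen common symbol, and skipping that step for relocatable output, is left to the caller.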

Precedence is given to symbol definitions with storage allocation to minimize load time common allocation and redundant storage allocations in shared objects. The loader is capable of allocating space for common storage class symbols, but this should only be necessary when a program references an allocated common symbol in a shared library that is later removed from that shared library.

Note that Fortran common block representations use common storage class symbols. Another very frequent occurrence of a common storage class symbol is a C-language global variable that does not have an initializer in its declaration.


5.3.10.3   Mangling and Demangling

Another issue related to symbol resolution is the need to "mangle" user-level identifiers. For example, C++ allows function overloading, prototyping, and the use of templates, all of which can result in the occurrence of the same names for different entities. The solution employed by the symbol table is to use mangled names that derive from the symbol's type signature.

Object file consumers, such as debuggers and object dumpers, need to "demangle" the identifiers so they can be output in a form that is recognizable to the user. For linking and loading, the mangled names are used for symbol resolution.

The encoding of C++ names is described in the manual Using DEC C++ for Tru64 UNIX Systems.

Other compilers may write symbol names that are modified by prepending or appending special characters such as dollar sign ($) or underscore (_) or by prepending qualifier strings such as file names or namespace names. Uppercasing of names is also common for certain languages such as Fortran. All of these transformations fall into the general category of mangled names. Refer to the release notes for specific compilers for additional information.


5.3.10.4   Mixed Language Resolution

Compilation of a program involving multiple source languages introduces additional symbol resolution issues. One important task is resolving the main program entry point, because conflicting "main" symbols may be present in the different files. For C and C++, the symbol "main" is the main program entry point, but for other languages, "main" will be either an alias for the main program or an interlude. DEC Fortran and DEC COBOL provide interludes that perform some language-specific initializations and then call the real main program entry point. For DEC Fortran, the main program is "MAIN__", and for DEC COBOL, the main program is "__cobol_main". DEC Pascal provides a "main" symbol that aliases the actual main program symbol.

The symbols "MAIN__" and "__cobol_main" can both be present in a mixed language program, and either, neither, or both can be used by the program. Debuggers can set a breakpoint in the user's main program by applying some precedence for selecting the most appropriate symbol. For a mixed language program, there is a slight chance that "MAIN__" or "__cobol_main" will be present but never called.


5.3.10.5   TLS Symbols

TLS symbols, like non-TLS symbols, can be undefined or common. Unresolved TLS symbols are identified by the storage class scTlsUndefined, and TLS commons have the storage class scTlsCommon. The symbol resolution process for TLS names is similar, but separate; TLS symbols cannot be resolved to non-TLS symbols or vice versa.

TLS common symbols are resolved in the same manner as other common storage class symbols (see Section 5.3.10.2), except that, again, only TLS symbols are candidates for resolution.

Another rule special to TLS is that symbol definitions for TLS common and undefined symbols cannot be imported from shared libraries.
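
The TLS rules in this section reduce to a simple compatibility check between a reference and a candidate definition. The following sketch uses a hypothetical SketchSymDef record and can_resolve function. It is not the linker's actual interface.

```c
#include <assert.h>

/* Hypothetical summary of a candidate definition. */
typedef struct {
    int is_tls;               /* nonzero if the definition is a TLS symbol */
    int from_shared_library;  /* nonzero if it comes from a shared library */
} SketchSymDef;

/* Return 1 if a reference may be resolved to the candidate, else 0.
 *   ref_is_tls - nonzero if the unresolved or common symbol is TLS
 */
int can_resolve(int ref_is_tls, SketchSymDef def)
{
    /* TLS and non-TLS symbols never resolve to each other. */
    if (!ref_is_tls != !def.is_tls)
        return 0;
    /* TLS definitions cannot be imported from shared libraries. */
    if (ref_is_tls && def.from_shared_library)
        return 0;
    return 1;
}
```

Candidates that pass this check would then be ranked by the common storage class precedence rules described in Section 5.3.10.2.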

 

 


5.4   Language-Specific Symbol Table Features

Language-specific characteristics are pervasive in the symbol table, particularly in the local, external, and auxiliary symbol tables. See Section 5.2 and Section 5.3.7 for information on language-specific values.

The lang field of the file descriptor entry encodes the source language of the file. This field should be accessed prior to decoding symbolic information, especially type descriptions. This section highlights, by language, language-specific features represented in the symbol table. Additional information on certain features is available elsewhere in this chapter.


5.4.1   Fortran77 and Fortran90

In Fortran, it is possible to create multiple entry points in subroutines. A subroutine has one main entry point and zero or more alternate entry points, indicated by ENTRY statements. See Section 5.3.6.7 for their representation in the symbol table.

Fortran90 array descriptors include allocatable arrays, assumed-shape arrays, and pointers to arrays. Their representation in the symbol table is discussed in Section 5.3.8.9.

Modules provide another scoping level in Fortran90 programs. The symbol table representation for modules has not yet been implemented.


5.4.2   C++

C++ classes encapsulate functions and data inside a single structure. Classes are represented in the symbol table using a btClass basic type and the stBlock/stEnd scoping mechanism. See Section 5.3.8.6.

Templates provide for parameterized types. At present, no special symbol table values are related to templates. The template itself is not represented; rather, entries that correspond to each instantiation are generated. Template instantiations are distinguished by mangled names based on their type signatures.

C++ namespaces, like Fortran modules, offer an additional scope for program identifiers. Again, they are not yet implemented in the symbol table.

The C++ concepts of private, protected, and public data attributes are not currently represented in the symbol table. The C++ concept of "friend" classes and functions is also not represented.


5.4.3   Pascal and Ada

Pascal conformant arrays are function parameters with array dimensions that are determined by the arguments passed to the function at run time. See Section 5.3.8.10.

Variant records are an extension of the record data structure. Variant records allow different sets of fields depending on the value of a particular record member. See Section 5.3.8.11.

Nested procedures are supported in these languages. They are represented using standard scoping mechanisms discussed in Section 5.3.6 and uplevel references described in Section 5.3.4.4.

Sets and subranges are user-defined subsets of ordinal types. Sets are unordered groups of elements, which can be manipulated with the classic set operations. Subranges are ordered and are used with the usual operators. See Section 5.3.8.12 and Section 5.3.8.13.

Ada subtypes of ordinal types are represented in the same manner as Pascal subranges.