2 The Compiler System

This chapter contains information on the following topics:

Data types in the Digital UNIX environment
Using the C preprocessor
Linking object files
Running programs
Object file tools
ANSI name space pollution cleanup in the standard C library

The compiler system is responsible for converting source code into an executable program. This can involve several steps:

Preprocessing - The compiler system performs such operations as expanding macro definitions or including header files in the source code. The output of this operation is an intermediate file with the .i file suffix.
Compiling - The compiler system converts a source file or preprocessed file to an object file with the .o file suffix.
Linking - The compiler system produces a binary image.

These steps can be performed by separate preprocessing, compiling, and linking commands, or they can be performed in a single operation, with the compiler system calling each tool at the appropriate time during the compilation.

Other tools in the compiler system help debug the program after it has been compiled and linked, examine the object files that are produced, create libraries of routines, or analyze the run-time performance of the program.

Table 2-1 summarizes the tools in the compiler system and points to the chapter or section where they are described in this and other documents.

Table 2-1: Compiler System Functions

Task	Tools	Where Documented
Compile, link, and load programs, build shared libraries	Compiler drivers, link editor, dynamic loader	This chapter, Chapter 4, `cc`(1), `c89`(1), `as`(1), `ld`(1), `loader`(5), Assembly Language Programmer's Guide, DEC C Language Reference Manual
Debug programs	Symbolic debuggers (`dbx` and `ladebug`) and Third Degree	Chapter 5, Chapter 6, `dbx`(1), `third`(5), `ladebug`(1), Ladebug Debugger Manual
Profile programs	Profiler, call graph profiler	Chapter 8, `prof`(1), `gprof`(1), `pixie`(5), `atom`(1), `hiprof`(5), `atomtools`(5)
Optimize programs	Optimizer, post-link optimizer	This chapter, Chapter 10, `cc`(1), `third`(5)
Examine object files	`nm`, `file`, `size`, `dis`, `odump`, and `stdump` tools	This chapter, `nm`(1), `file`(1), `size`(1), `dis`(1), `odump`(1), `stdump`(1), Programming Support Tools
Produce necessary libraries	Archiver (`ar`), linker (`ld`) command	This chapter, Chapter 4, `ar`(1), `ld`(1)

2.1 Compiler System Components (Driver Programs)

Figure 2-1 shows the relationship between the major components of the compiler system and their primary inputs and outputs.

Figure 2-1: Compiling a Program

Compiler system commands, sometimes called driver programs, invoke the components of the compiler system. Each language has its own set of compiler commands and flags. In addition, your system might include layered products such as C++, or other languages such as Fortran or Pascal. The languages supported by any one system are determined by the choices made at the time the system is installed or modified. Thus, the configuration of your particular system may not support languages other than C and assembly.

The cc command invokes the C compiler. The -newc and -oldc flags invoke different compiler implementations (where the implementation invoked by -newc is upwardly compatible with that invoked by -oldc). The -newc compiler offers improved optimization, additional features, and greater compatibility with Digital compilers provided on other platforms. The -newc compiler implementation is the default.

The -newc compiler was accessible in previous versions of the Digital UNIX operating system by means of the -migrate flag. The -newc compiler has been made more compatible with the -oldc compiler.

Note
This manual uses the phrase "the C compiler" to refer to both versions of the DEC C compiler, -newc and -oldc. Features supported by only one of the compilers are so marked.

Each compiler implementation supports a slightly different set of compiler flags. See Table 2-4 for a comparison.

In the Digital UNIX programming environment, a single compiler command can perform multiple actions, including the following:

Determine whether to call the appropriate preprocessor, compiler (or assembler), or linker based on the file name suffix of each file. Table 2-2 lists the supported file suffixes, which identify the contents of the input files.
Compile and link a source file to create an executable program. If multiple source files are specified, the files can be passed to other compilers before linking.
Unlike the compilers, the assembler (as) can assemble only a single file, which is assumed to contain assembler code (any file suffix is ignored). The as command does not automatically link the assembled object file. Thus, if you directly invoke the assembler, you need to link the object in a separate step.
Prevent linking and the creation of the executable program, thereby retaining the .o object file for a subsequent link operation.
Pass the major flags associated with the link command (ld) to the linker. For example, you can include the -L flag as part of the cc command to specify the directory path to search for a library. Each language requires different libraries at link time; the driver program for a language passes the appropriate libraries to the linker. For more information on linking with libraries, see Chapter 4 and Section 2.5.3.
Create an executable program file with a default name of a.out or with a name that you specify.

Table 2-2: File Suffixes and Associated Files

Suffix	File
`.a`	Archive library
`.c`	C source code
`.i`	Preprocessed source code. The driver assumes that the source code was processed by the C preprocessor and that the source code is that of the processing driver; for example, `% cc -c source.i`. The file, `source.i`, is assumed to contain C source code.
`.o`	Object file
`.s`	Assembly source code
`.so`	Shared object (shared library)
`.u`	ucode object file (supported only under `-oldc`)
`.b`	ucode object library (supported only under `-oldc`)

2.2 Data Types in the Digital UNIX Environment

The following sections describe how data is represented on the Digital UNIX system.

2.2.1 Data Type Sizes

The Digital UNIX system is little endian; that is, the address of a multibyte integer is the address of its least significant byte; the more significant bytes are at higher addresses. The C compiler supports only little endian byte ordering. The following table gives the sizes of supported data types.

Data Type	Size in Bits
char	8
short	16
int	32
long	64
long long	64
float	32 (IEEE Single)
double	64 (IEEE Double)
pointer	64

2.2.2 Floating-Point Range and Processing

The C compiler supports IEEE single-precision (32-bit float) and double-precision (64-bit double) floating-point data, as defined by the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985).

Floating-point numbers have the following ranges:

float: 1.17549435e-38f to 3.40282347e+38f
double: 2.2250738585072014e-308 to 1.79769313486231570e+308

Digital UNIX provides the basic floating-point number formats, operations (add, subtract, multiply, divide, square root, remainder, and compare), and conversions defined in the standard. You can obtain full IEEE-compliant trapping behavior (including not-a-number values [NaNs]) by specifying a compilation flag, or you can specify a fast mode when IEEE-style traps are not required. You can also select, at compile time, the rounding mode applied to the results of IEEE operations. See cc(1) for information on the flags that support IEEE floating-point processing.

A user program can control the delivery of floating-point traps to a thread by calling ieee_set_fp_control(), or dynamically set the IEEE rounding mode by calling write_rnd(). See ieee(3) for additional information on how to handle IEEE floating-point exceptions.

2.2.3 Structure Alignment

The C compiler aligns structure members on natural boundaries by default. That is, the components of a structure are laid out in memory in the order in which they are declared. The first component has the same address as the entire structure. Each additional component follows its predecessor on the next natural boundary for the component type.

For example, the following structure is aligned as shown in Figure 2-2:

struct {
    char  c1;
    short s1;
    float f;
    char  c2;
};

Figure 2-2: Default Structure Alignment

The first component of the structure, c1, starts at offset 0 and occupies the first byte. The second component, s1, is a short; it must start on a word (2-byte) boundary. Therefore, one byte of padding is added between c1 and s1. No padding is needed to make f and c2 fall on their natural boundaries. However, because the size of a structure is rounded up to a multiple of its alignment (here 4 bytes, the alignment of f), three bytes of padding are added after c2.

The following mechanisms can be used to override the default alignment of structure members:

The #pragma member_alignment and #pragma nomember_alignment directives (-newc only)
The #pragma pack directive (-newc or -oldc)
The -Zpn flag

See Section 3.5 and Section 3.7 for information on these directives.

2.2.4 Bit-Field Alignment

In general, the alignment of a bit field is determined by the bit size and bit offset of the previous field. For example, the following structure is aligned as shown in Figure 2-3:

struct a {
    char  f0:   1;
    short f1:  12;
    char  f2:   3;
} struct_a;

Figure 2-3: Default Bit-Field Alignment

The first bit field, f0, starts on bit offset 0 and occupies 1 bit. The second, f1, starts at offset 1 and occupies 12 bits. The third, f2, starts at offset 13 and occupies 3 bits. The size of the structure is two bytes.

Certain conditions can cause padding to occur prior to the alignment of the bit field:

Bit fields of size 0 cause padding to the next pack boundary. (The pack boundary is determined by the #pragma pack directive (-newc or -oldc) or the -Zpn compiler flag.) For bit fields of size 0, the bit field's base type is ignored. For example, consider the following structure:
```
struct b {
    char f0:  1;
    int    :  0;
    char f1:  2;
} struct_b;
```
If the source file is compiled with the -Zp1 flag or if a #pragma pack 1 directive is encountered in the compilation, f0 would start at offset 0 and occupy 1 bit, the unnamed bit field would start at offset 8 and occupy 0 bits, and f1 would start at offset 8 and occupy 2 bits.
Similarly, if the -Zp2 flag or the #pragma pack 2 directive were used, the unnamed bit field would start at offset 16. With -Zp4 or #pragma pack 4, it would start at offset 32.
If the bit field does not fit in the current unit, padding occurs to either the next pack boundary or the next unit boundary, whichever is closer. (The unit boundary is determined by the bit field's base type; for example, the unit boundary associated with the declaration char foo: 1 is a byte.) The current unit is determined by the current offset, the bit field's base size, and the kind of packing specified, as shown in the following example:
```
struct c {
    char  f0:  7;
    short f1: 11;
} struct_c;
```
Assuming that you specify either the -Zp1 flag or the #pragma pack 1 directive, f0 starts on bit offset 0 and occupies 7 bits in the structure. Because the base size of f1 is 8 bits and the current offset is 7, f1 will not fit in the current unit. Padding is added to reach the next unit boundary or the next pack boundary, whichever comes first, in this case, bit 8. The layout of this structure is shown in Figure 2-4.

Figure 2-4: Padding to the Next Pack Boundary

2.2.5 The _align Storage Class Modifier

Data alignment is implied by data type. For example, the C compiler aligns an int (32 bits) on a 4-byte boundary and a long (64 bits) on an 8-byte boundary. The _align storage-class modifier, supported only by the C compiler using the -std and -newc flags (the default), aligns objects of any of the C data types on the specified storage boundary. It can be used in a data declaration or definition.

The _align modifier has the following format:

_align ( keyword )
_align ( n )

Where keyword is a predefined alignment constant and n is an integer power of 2. The predefined constant or power of 2 tells the compiler the number of bytes to pad in order to align the data.

For example, to align an integer on the next quadword boundary, use any of the following declarations:

   int _align(QUADWORD) data;
   int _align(quadword) data;
   int _align(3) data;

In this example, int _align (3) specifies an alignment of 2x2x2 bytes, which is 8 bytes, or a quadword of memory.

The following table shows the predefined alignment constants, their equivalent power of 2, and equivalent number of bytes.

Constant	Power of 2	Number of Bytes
BYTE or byte	0	1
WORD or word	1	2
LONGWORD or longword	2	4
QUADWORD or quadword	3	8

2.3 Using the C Preprocessor

The C preprocessor performs macro expansion, includes header files, and executes preprocessor directives prior to compiling the source file. The following sections describe the Digital UNIX-specific operations performed by the C preprocessor. For more information on the C preprocessor, see the cc(1) and cpp(1) reference pages and the DEC C Language Reference Manual.

2.3.1 Predefined Macros

When the compiler is invoked, it defines C preprocessor macros that identify the language of the input files and the environments on which the code may run. You can reference these macros in #ifdef statements to isolate code that applies to a particular language or environment. The preprocessor macros are listed in Table 2-3.

The type of source file and the type of standards you apply determine the macros that are defined. The C compiler supports several levels of standardization:

The -std flag enforces the ANSI C standard, but allows some common programming practices disallowed by the standard, and passes the macro __STDC__=0 to the preprocessor.
The -std0 flag enforces the K & R programming style, with certain ANSI extensions in areas where the K & R behavior is undefined or ambiguous. In general, -std0 compiles most pre-ANSI C programs and produces expected results. It causes the __STDC__ macro to be undefined.
The -std1 flag strictly enforces the ANSI C standard and all its prohibitions (such as those that apply to handling a void, the definition of an lvalue in expressions, the mixing of integrals and pointers, and the modification of an rvalue). It passes the macro __STDC__=1 to the preprocessor.

Table 2-3: Predefined Macros

Macro	Source File Type	-std Flag
__DECC (-newc only)	.c	-std0, -std, -std1
LANGUAGE_C	.c	-std0
__LANGUAGE_C__	.c	-std0, -std, -std1
unix	.c, .s	-std0
__unix__	.c, .s	-std0, -std, -std1
__osf__	.c, .s	-std0, -std, -std1
__alpha	.c, .s	-std0, -std, -std1
SYSTYPE_BSD	.c, .s	-std0
_SYSTYPE_BSD	.c, .s	-std0, -std, -std1
LANGUAGE_ASSEMBLY	.s	-std0, -std, -std1
__LANGUAGE_ASSEMBLY__	.s	-std0, -std, -std1

2.3.2 Including Common Files

When writing programs, you often use header files that are common among a program's modules. These files define constants, the parameters for system calls, and so on.

C header files, sometimes known as include files, have a .h suffix. Typically, the reference page for a library routine or system call indicates the required header files. Header files can be used in programs written in different languages.

Note
If you intend to debug your program using dbx or ladebug, do not place executable code in a header file. The debugger interprets a header file as one line of source code; none of the source lines in the file appears during the debugging session. For more information on the dbx debugger, see Chapter 5. For details on ladebug, see the Ladebug Debugger Manual.

You can include header files in a program source file in one of two ways:

#include "filename": Enter this line in column 1 of a source file to indicate that the C macro preprocessor should first search for the include file filename in the directory in which it found the file that contains the directive, then in the search path indicated by the -I flag, and finally in /usr/include.

#include <filename>: Enter this line in column 1 of a source file to indicate that the C macro preprocessor should search for the include file filename only in the search path indicated by the -I flag and in /usr/include, but not in the current directory.

You can also use the -Idir compiler flag to specify additional pathnames (directories) to be searched by the C preprocessor for #include files. The C preprocessor searches first in the directory where the source file resides, followed by the specified pathname, dir, then the default directory, /usr/include. If dir is omitted, the default directory, /usr/include, is not searched.

2.3.3 Setting Up Multilanguage Include Files

C, Fortran, and assembly code can reside in the same include files and then be conditionally included in programs as required. To set up a shareable include file, you must create a .h file and enter the respective code, as shown in the following example:

   #ifdef __LANGUAGE_C__
    .
    .    (C code)
    .
   #endif
   #ifdef __LANGUAGE_ASSEMBLY__
    .
    .    (assembly code)
    .
   #endif

When the compiler includes this file in a C source file, the __LANGUAGE_C__ macro is defined, and the C code is compiled. When the compiler includes this file in an assembly language source file, the __LANGUAGE_ASSEMBLY__ macro is defined, and the assembly language code is compiled.

2.3.4 Implementation-Specific Preprocessor Directives (#pragma)

The #pragma directive is a standard method of implementing features that vary from one compiler to the next. The C compiler supports the following implementation-specific pragmas:

#pragma environment
#pragma function
#pragma inline
#pragma intrinsic
#pragma linkage
#pragma member_alignment
#pragma message
#pragma pack
#pragma pointer_size
#pragma use_linkage
#pragma weak

The pragmas are described in detail in Chapter 3.

2.4 Compiling Source Programs

The cc command provides more than one compilation environment: The -newc and -oldc flags invoke different compiler implementations (where the implementation invoked by -newc is upwardly compatible with that invoked by -oldc). The -newc compiler offers improved optimization, additional features, and greater compatibility with Digital compilers provided on other platforms. The -newc compiler implementation is the default.

The -newc compiler has been accessible in previous versions of the Digital UNIX operating system by means of the -migrate flag. The -newc compiler has been made more compatible with the -oldc compiler.

All compilation environments produce object files that comply with the common object file format (COFF), and their object files can be freely intermixed. The C compiler invoked by the -oldc flag employs ucode-based optimizations; the C compiler invoked by the -newc flag employs other optimizations.

The following sections describe the flags that are available in all compilation environments, the default compiler behavior, and how to compile multilanguage programs.

2.4.1 Compilation Flags

Compiler flags select a variety of program development functions, including debugging, optimizing, and profiling facilities, and the names assigned to output files.

Table 2-4 compares the flags that are available with the three compilation environments. An asterisk (*) indicates that the flag is accepted, but ignored, by the compiler. See the cc(1) reference page for more information on these flags.

Table 2-4: Comparison of Compiler Flags

Flag	-newc	-oldc	-migrate
-ansi_alias	yes	no	yes
-[no_]ansi_args	yes	no	yes
-assume [no]accuracy_sensitive	yes	yes	yes
-assume [no]aligned_objects	yes	no	yes
-assume [no]trusted_short_alignment	yes	no	yes
-B	yes	yes	yes
-c	yes	yes	yes
-C	yes	yes	yes
-call_shared	yes	yes	yes
-check	yes	no	yes
-compress	yes	yes	yes
-cord	yes	yes	yes
-[no_]cpp	yes	yes	yes
-D	yes	yes	yes
-double	yes	yes	yes
-edit	yes	yes	yes
-exact_version	yes	yes	yes
-E	yes	yes	yes
-fast	yes	yes	yes
-feedback	yes	yes	yes
-float	yes	yes	yes
-float_const	yes	yes	yes
-[no_]fp_reorder	yes	yes	yes
-fprm {c \| d \| n \| m}	yes	yes	yes
-fptm {n \| su \| sui \| u}	yes	yes	yes
-framepointer	yes	yes	yes
-g	yes	yes	yes
-G	yes*	yes	yes*
-gen_feedback	yes	no	yes
-h	yes	yes	yes
-H	yes	yes	yes
-I	yes	yes	yes
-ieee	yes	yes	yes
-ifo	yes	yes*	yes
-inline	yes	no	yes
-j	no	yes	no
-k	yes	yes	yes
-K	yes	yes	yes
-ko	yes	yes	yes
-M	yes	yes	yes
-machine_code	yes	no	yes
-MD	yes	yes	yes
-[no_]misalign	yes	yes	yes
-no_archive	yes	yes	yes
-no_inline	yes	yes	yes
-nomember_alignment	yes	no	yes
-non_shared	yes	yes	yes
-noobject	yes	no	yes
-o	yes	yes	yes
-O	yes	yes	yes
-oldcomment	yes	yes	yes
-Olimit	yes*	yes	yes*
-p	yes	yes	yes
-P	yes	yes	yes
-[no_]pg	yes	yes	yes
-portable	yes	no	yes
-preempt_module	yes	no	yes
-preempt_symbol	yes	no	yes
-proto[is]	yes	yes	yes
-pthread	yes	yes	yes
-Q	yes	yes	yes
-readonly_strings	yes	yes	yes
-resumption_safe	yes	yes	yes
-S	yes	yes	yes
-scope_safe	yes	yes	yes
-show	yes	no	yes
-signed	yes	yes	yes
-source_listing	yes	no	yes
-speculate	yes	no	yes
-std[n]	yes	yes	yes
-t	yes	yes	yes
-taso	yes	yes	yes
-threads	yes	yes	yes
-tune	yes	yes	yes
-traditional	yes	yes	yes
-trapuv	yes	yes	yes
-U	yes	yes	yes
-unroll	yes	no	yes
-unsigned	yes	yes	yes
-v	yes	yes	yes
-V	yes	yes	yes
-varargs	yes	yes	yes
-vaxc	yes	no	yes
-verbose	yes	yes	yes
-volatile	yes	yes	yes
-w	yes	yes[Table Note 1]	yes
-W	yes	yes	yes
-warnprotos	yes	yes	yes
-writable_strings	yes	yes	yes
-xtaso	yes	yes	yes
-xtaso_short	yes	yes	yes
-Zp	yes	yes	yes

Table note:

The -w0 flag is not accepted by the compiler invoked by the -oldc flag.

2.4.2 Default Compilation Behavior

Some flags have default values that are used if the flag is not specified on the command line. For example, the default name for an output file is filename.o for object files, where filename is the base name of the source file. The default name for an executable program object is a.out. The following example uses the defaults in compiling two source files named prog1.c and prog2.c:

% cc prog1.c prog2.c

This command runs the C compiler, creating object modules prog1.o and prog2.o and the executable program a.out.

Whether you are new to Digital UNIX, porting applications from other systems, or concerned with compatibility issues, knowing the default behavior of the compiler is useful. When you enter the cc compiler command with no other flags, the following flags are in effect:

-newc: The default compiler flag; invoked when the compiler flag is not specified.

-assume aligned_objects: Allows the compiler to assume that a dereferenced object is aligned at least as strictly as its pointer type indicates, and thereby generate more efficient code for pointer dereferences of aligned pointer types.

-call_shared: Produces a dynamic executable file that uses shareable objects at run time.

-double: Promotes expressions of type float to double.

-fprm n: Performs normal rounding (unbiased round to nearest) of floating-point numbers.

-g0: Does not produce symbol information for symbolic debugging.

-I/usr/include: Specifies that #include files whose names do not begin with / are always sought first in the directory /usr/include.

-inline manual: Inlines only those function calls explicitly requested for inlining by a #pragma inline directive.

-member_alignment: Directs the compiler to naturally align data structure members (with the exception of bit-field members).

-no_fp_reorder: Directs the compiler to use only certain scalar rules for calculations.

-no_misalign: Generates alignment faults for arbitrarily aligned addresses.

-O1: Enables global optimizations.

-oldcomment: Allows traditional token concatenation.

-p0: Disables profiling.

-no_pg: Turns off gprof profiling.

-preempt_symbol: Allows symbol preemption on a symbol-by-symbol basis.

-signed: Causes all char declarations to be signed char.

-std0: Enforces the K&R standard with some ANSI extensions.

-tune generic: Selects instruction tuning that is appropriate for all implementations of the Alpha architecture.

-unroll 0: Directs the optimizer to use its own default loop unrolling amount.

-writable_strings: Makes string literals writable.

The following list includes miscellaneous aspects of the default cc compiler behavior:

The output file is named a.out unless another name is specified by using the -o flag.
Source files are linked automatically if compilation (or assembly) is successful.
Floating-point computations are fast floating point, not full IEEE.
Pointers are 64 bits. For information on using 32-bit pointers, see Appendix A.
Temporary files are placed in the /tmp directory.

2.4.3 Compiling Multilanguage Programs

When the source language of the main program differs from that of a subprogram, compile each program separately with the appropriate driver and link the object files in a separate step. You can create objects suitable for linking by specifying the -c flag, which stops a driver immediately after the object file has been created. For example:

% cc -c main.c

This command produces the object file main.o, not the executable file a.out.

Most language driver programs pass information to cc, which, after processing, passes information to ld. When one of the modules to be compiled is a C program, you can usually use the driver command of the other language to compile and link both modules.

2.5 Linking Object Files

The cc driver command can link object files to produce an executable program. In some cases, you may want to use the ld linker directly. Depending on the nature of the application, you must decide whether to compile and link separately or to compile and link with one compiler command. Factors to consider include:

Whether all source files are in the same language
Whether any files are in source form

2.5.1 Linking Using Compiler Commands

You can use a compiler command instead of the linker command to link separate objects into one executable program. Each compiler (except the assembler) recognizes the .o suffix as the name of a file that contains object code suitable for linking and immediately invokes the linker.

Because the compiler driver programs pass the libraries associated with that language to the linker, using the compiler command is usually recommended. For example, the cc driver uses the C library (libc.so) by default. For information about the default libraries used by each compiler command, see the appropriate command in the reference pages, such as cc(1).

You can also use the -l flag to specify additional libraries to be searched for unresolved references. The following example shows how to use the cc driver to pass the names of two libraries to the linker with the -l flag:

% cc -o all main.o more.o rest.o -lm -lexc

The -lm flag specifies the math library; the -lexc flag specifies the exception library.

You should compile and link modules with a single command when you want to optimize your program. Most compilers support increasing levels of optimization with the use of certain flags. For example:

The -O0 flag requests no optimization (usually for debugging purposes).
The -O1 flag requests certain local (module-specific) optimizations.
Cross-module optimizations can be requested with the -O3 flag to the C compiler using the -oldc flag, or with the -ifo flag to the C compiler using the -newc flag. In this case, compiling multiple files in one operation allows the compiler to perform the maximum possible optimizations.
Certain compilers may provide a combination of flags (such as -c and -o) that compile multiple source files into a single object module. This combination allows interprocedural optimizations to occur, yet retains the object file.

2.5.2 Linking Using the ld Command

Normally, users do not need to run the linker directly, but use the cc command to indirectly invoke the linker. Executables that need to be built solely from assembler objects can be built with the ld command.

The linker (ld) combines one or more object files (in the order specified) into one executable program file, performing relocation, external symbol resolutions, and all other processing required to make object files ready for execution. Unless you specify otherwise, the linker names the executable program file a.out. You can execute the program file or use it as input for another linker operation.

The as assembler does not automatically invoke the linker. To link a program written in assembly language, do either of the following:

Assemble and link with one of the other compiler commands. The .s suffix of the assembly language source file automatically causes the compiler command to invoke the assembler.
Assemble with the as command and then link the resulting object file with the ld command.

For information about the flags and libraries that affect the linking process, see the ld(1) reference page.

2.5.3 Specifying Libraries

When you compile your program on the Digital UNIX system, it is automatically linked with the C library, libc.so. If you call routines that are not in libc.so or one of the archive libraries associated with your compiler command, you must explicitly link your program with the library. Otherwise, your program will not be linked correctly.

You need to explicitly specify libraries in the following situations:

When compiling multilanguage programs
If you compile multilanguage programs, be sure to explicitly request any required run-time libraries to handle unresolved references. Link the libraries by specifying -lstring, where string is an abbreviation of the library name.
For example, if you write a main program in C and some procedures in another language, you must explicitly specify the library for that language and the math library. When you use these flags, the linker replaces the -l with lib and appends the specified characters (for the language library and for the math library) and the .a or .so suffix, depending upon whether it is a static (non-shared archive library) or dynamic (call-shared object or shared library) library. Then, it searches the following directories for the resulting library name:
/usr/shlib
/usr/ccs/lib
/usr/lib/cmplrs/cc
/usr/lib
/usr/local/lib
/var/shlib

For a list of the libraries that each language uses, see the reference pages for the appropriate language compiler driver.
When storing object files in an archive library
You must include the pathname of the library on the compiler or linker command line. For example, the following command specifies that the libfft.a archive library in the /usr/jones directory is to be linked along with the math library:
% cc main.o more.o rest.o /usr/jones/libfft.a -lm

The linker searches libraries in the order you specify. Therefore, if any file in your archive library uses data or procedures from the math library, you must specify the archive library before you specify the math library.
When storing ucode object libraries
To link from a ucode library, specify the -klx compiler flag.
Note
Only the -oldc flag to the C compiler can be used to produce ucode files.
The following example links a file from a ucode library:
% cc -klucode_lib -o output main.u more.u rest.u

Because the libraries are searched as they are encountered on the command line, the order in which you specify them is important. Although a library might be made from both assembly and high-level language routines, the ucode object library contains code only for the high-level language routines.
Unlike an extended COFF object library, the ucode library does not contain code for the routines. You must specify to the ucode linker both the ucode object library and the extended COFF object library, in that order, to ensure that all modules are linked with the proper library.
If the compiler driver is to perform both a ucode link step and a final link step, the object file created after the ucode link step is placed in the position of the first ucode file specified or created on the command line in the final link step.

2.6 Running Programs

To run an executable program in your current working directory, in most cases you enter its file name. For example, to run the program a.out located in your current directory, enter:

% a.out

If the executable program is not in a directory in your path, enter the directory path before the file name, or enter:

% ./a.out

When the program is invoked, the main function in a C program can accept arguments from the command line if the main function is defined with one or more of the following optional parameters:

int main( int argc, char *argv[ ], char *envp[ ] ) [...]

The argc parameter is the number of arguments in the command line that invoked the program. The argv parameter is an array of character strings containing the arguments. The envp parameter is the environment array containing process information, such as the user name and controlling terminal. (The envp parameter has no bearing on passing command-line arguments. Its primary use is during exec and getenv function calls.)
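
As a minimal sketch of how the envp array is laid out, the following helper searches an environment-style array for a variable. It assumes the conventional layout of "NAME=value" strings ending with a null pointer. The name lookup_env is hypothetical and is not a library routine.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libc routine): search an envp-style
 * array for NAME and return a pointer to its value, or NULL if NAME
 * is not present. The array is assumed to hold "NAME=value" strings
 * and to end with a NULL pointer.
 */
const char *lookup_env(char *const envp[], const char *name)
{
    size_t len = strlen(name);
    size_t i;

    for (i = 0; envp[i] != NULL; i++) {
        /* Require '=' right after the name so that "USER" does not
           match an entry such as "USERNAME=..." */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

A main function defined with all three parameters could call lookup_env( envp, "TERM" ). In most programs the standard getenv function is the simpler way to perform the same lookup.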

You can access only the parameters that you define. For example, the following program defines the argc and argv parameters to echo the values of parameters passed to the program:

/*
 * Filename: echo-args.c
 * This program echoes command-line arguments.
 */

#include <stdio.h>

int main( int argc, char *argv[] )
{
	int i;

	printf( "program: %s\n", argv[0] ); /* argv[0] is program name */

	for ( i=1; i < argc; i++ )
		printf( "argument %d: %s\n", i, argv[i] );

	return(0);
}

The program is compiled with the following command to produce a program file called a.out:

$ cc echo-args.c

When the user invokes a.out and passes command-line arguments, the program echoes those arguments on the terminal. For example:

$ a.out Long Day\'s "Journey into Night"

	program: a.out
	argument 1: Long
	argument 2: Day's
	argument 3: Journey into Night

The shell parses all arguments before passing them to a.out. For this reason, a single quote must be preceded by a backslash, alphabetic arguments are delimited by spaces or tabs, and arguments with embedded spaces or tabs are enclosed in quotation marks.

2.7 Object File Tools

After a source file has been compiled, you can examine the object file or executable file with the following tools:

odump - Displays the contents of an object file, including the symbol table and header information.
stdump - Displays symbol table information from an object file.
nm - Displays only symbol table information.
file - Provides descriptive information on the general properties of the specified file, for example, the programming language used.
size - Displays the size of the text, data, and bss segments.
dis - Disassembles object files into machine instructions.

The following sections describe these tools. In addition, see the strings(1) reference page for information on using the strings command to find the printable strings in an object file or other binary file.

2.7.1 Dumping Selected Parts of Files (odump)

The odump tool displays header tables and other selected parts of an object or archive file. For example, odump displays the following information about the file echo-args.o:

% odump -at echo-args.o

			***ARCHIVE SYMBOL TABLE***

			***ARCHIVE HEADER***
	Member Name        Date       Uid     Gid     Mode      Size

			***SYMBOL TABLE INFORMATION***
[Index]	Name	Value	Sclass	Symtype	Ref
echo-args.o:
[0]	 main	0x0000000000000000	0x01	0x06	0xfffff
[1]	 printf	0x0000000000000000	0x06	0x06	0xfffff
[2]	 _fpdata	0x0000000000000000	0x06	0x01	0xfffff

For more information, see the odump(1) reference page.

2.7.2 Listing Symbol Table Information (nm)

The nm tool displays symbol table information for object files. For example, when it is invoked without a file argument, nm uses a.out by default and displays information such as the following:

% nm

nm: Warning: - using a.out

Name                            Value        Type       Size

.bss                   | 0000005368709568 | B | 0000000000000000
.data                  | 0000005368709120 | D | 0000000000000000
.lit4                  | 0000005368709296 | G | 0000000000000000
.lit8                  | 0000005368709296 | G | 0000000000000000
.rconst                | 0000004831842144 | Q | 0000000000000000
.rdata                 | 0000005368709184 | R | 0000000000000000
.
.
.

The Name column contains the symbol or external name; the Value column shows the address of the symbol, or debugging information; the Type column contains a letter showing the symbol type; and the Size column shows the symbol's size (accurate only when the source file is compiled with a debugging flag, for example, -g). Some of the symbol type letters are:

B - External zeroed data
D - External initialized data
G - External small initialized data
Q - Read-only constants
R - External read-only data

For more information, see nm(1).
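
To relate the type letters to source code, the following sketch shows declarations that commonly end up in the corresponding categories. The variable and function names are hypothetical, and the letters in the comments are typical outcomes only. The actual placement depends on the compiler and the flags in use.

```c
/*
 * Illustrative declarations only. Section placement is decided by
 * the compiler and its flags, so the type letters noted below are
 * typical outcomes, not guarantees.
 */
int call_count;              /* uninitialized global: usually zeroed data (B) */
int retry_limit = 3;         /* initialized global: usually initialized data
                                (D, or G when treated as small data) */
const int table_size = 64;   /* read-only global: usually read-only data (R) */

/* Increment and return the call counter, which starts at zero. */
int next_call(void)
{
    return ++call_count;
}
```

Compiling a file like this to an object file and running nm on the result is a quick way to see which letters a particular compiler assigns.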

2.7.3 Determining a File's Type (file)

The file command reads input files, tests each file to classify it by type, and writes the file's type to standard output. The file command uses the /etc/magic file to identify files that contain a magic number. (A magic number is a numeric or string constant that indicates a file's type.)
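
The idea of a magic number can be sketched in a few lines of C: compare the first bytes of a file against a known constant. The two signatures below (the ELF header bytes and the "#!" script prefix) are widely known values chosen only for illustration. They are not taken from the Digital UNIX /etc/magic file, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if buf begins with the given magic bytes, 0 otherwise. */
int has_magic(const unsigned char *buf, size_t len,
              const unsigned char *magic, size_t magic_len)
{
    return len >= magic_len && memcmp(buf, magic, magic_len) == 0;
}

/* Example signatures only; not read from any /etc/magic file. */
static const unsigned char ELF_MAGIC[]    = { 0x7f, 'E', 'L', 'F' };
static const unsigned char SCRIPT_MAGIC[] = { '#', '!' };

/* Classify a buffer holding the first bytes of a file. */
const char *classify(const unsigned char *buf, size_t len)
{
    if (has_magic(buf, len, ELF_MAGIC, sizeof ELF_MAGIC))
        return "ELF object";
    if (has_magic(buf, len, SCRIPT_MAGIC, sizeof SCRIPT_MAGIC))
        return "script";
    return "unknown";
}
```

The real file command works from a table of such tests rather than from hard-coded constants, which is why new file types can be described without changing the program.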

The following example shows the output of the file command on a directory containing a C source file, object file, and executable file:

% file *.*

.:       directory
..:      directory
a.out:   COFF format alpha dynamically linked, demand paged executable
or object module not stripped - version 3.11-8
echo-args.c:    c program text
echo-args.o:    COFF format alpha executable or object module not
stripped - version 3.12-6

For more information, see file(1).

2.7.4 Determining a File's Segment Sizes (size)

The size tool displays information about the text, data, and bss segments of the specified object or archive file or files in octal, hexadecimal, or decimal format. For example, when it is called without any arguments, the size command returns information on a.out. You can also specify the name of an object or executable file on the command line. For example:

% size

text    data    bss    dec      hex
8192    8192    0      16384    4000


% size echo-args.o

text    data    bss    dec      hex
176     96      0      272      110

For more information, see size(1).
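
In the sample output, the dec and hex columns are the sum of the text, data, and bss columns, shown in decimal and hexadecimal. The following sketch reproduces that arithmetic. The structure and function names are hypothetical.

```c
/* Segment sizes in bytes, as shown in the text, data, and bss columns. */
struct segment_sizes {
    unsigned long text;
    unsigned long data;
    unsigned long bss;
};

/* Total size: the value shown in the dec and hex columns. */
unsigned long total_size(const struct segment_sizes *s)
{
    return s->text + s->data + s->bss;
}
```

For a.out, 8192 + 8192 + 0 gives 16384 (hexadecimal 4000). For echo-args.o, 176 + 96 + 0 gives 272 (hexadecimal 110).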

2.7.5 Disassembling an Object File (dis)

The dis tool disassembles object file modules into machine language. For example, the dis command produces the following output when it disassembles the a.out program:

% dis a.out

.
.
.

       __start:
 0x120001080:  23defff0        lda      sp, -16(sp)
 0x120001084:  b7fe0008        stq      zero, 8(sp)
 0x120001088:  c0200000        br       t0, 0x12000108c
 0x12000108c:  a21e0010        ldl      a0, 16(sp)
 0x120001090:  223e0018        lda      a1, 24(sp)

.
.
.

2.8 ANSI Name Space Pollution Cleanup in the Standard C Library

The ANSI C standard states that users whose programs link against libc are guaranteed a certain range of global identifiers that can be used in their programs without danger of conflict with, or preemption of, any global identifiers in libc.

The ANSI C standard also reserves a range of global identifiers libc can use in its internal implementation. These are called reserved identifiers and consist of the following, as defined in ANSI document number X3.159-1989:

Any external identifier beginning with an underscore
Any external identifier beginning with an underscore followed by a capital letter or an underscore

ANSI conformant programs are not permitted to define global identifiers that either match the names of ANSI routines or fall into the reserved name space specified earlier in this section. All other global identifier names are available for use in user programs.

Historical libc implementations contain large numbers of non-ANSI, nonreserved global identifiers that are both documented and supported. These routines are often called from within libc by other libc routines, both ANSI and otherwise. A user's program that defines its own version of one of these non-ANSI, nonreserved items would preempt the routine of the same name in libc. This could alter the behavior of supported libc routines, both ANSI and otherwise, even though the user's program may be ANSI conformant. This potential conflict is known as ANSI name space pollution.

The implementation of libc on Digital UNIX Version 4.0 includes a large number of non-ANSI, nonreserved global identifiers that are both documented and supported. To protect against preemption of these global identifiers within libc and to avoid pollution of the user's name space, the vast majority of these identifiers have been renamed to the reserved name space by prepending two underscores (_ _) to the identifier names. To preserve external access to these items, weak identifiers have been added using the original identifier names that correspond to their renamed reserved counterparts. Weak identifiers work much like symbolic links between files. When the weak identifier is referenced, the strong counterpart is used instead.
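
The weak and strong pairing can be imitated on other systems. The following sketch assumes a GCC-compatible or Clang compiler on an ELF platform and uses the alias attribute. It does not show the Digital UNIX libc source or its toolchain's mechanism, and both function names are hypothetical.

```c
/*
 * Sketch only: relies on the GCC/Clang "weak" and "alias" attributes
 * on an ELF system. A real library would place the strong name in the
 * reserved name space (two leading underscores); the impl_ prefix is
 * used here so that the example itself stays out of that name space.
 */

/* Strong definition: the routine that does the work. */
int impl_double_it(int n)
{
    return 2 * n;
}

/* Weak identifier carrying the original public name. A call through
   this name runs the strong definition above. */
int double_it(int n) __attribute__((weak, alias("impl_double_it")));
```

Because double_it is weak, a program that defines its own double_it replaces only the public name. Code that calls impl_double_it directly still reaches the original routine.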

User programs linked statically against libc may have extra symbol table entries for weak identifiers. Each of these identifiers will have the same address as its reserved counterpart, which will also be included in the symbol table. For example, if a statically linked program simply called the tzset() function from libc, the symbol table would contain two entries for this call, as follows:

# stdump -b a.out | grep tzset
18. (file 9) (4831850384) tzset Proc Text symref 23 (weakext)
39. (file 9) (4831850384) __tzset Proc Text symref 23

In this example, tzset is the weak identifier and __tzset is its strong counterpart. The __tzset identifier is the routine that will actually do the work.

User programs linked as shared should not see such additions to the symbol table because the weak/strong identifier pairs remain in the shared library.

Existing user programs that reference non-ANSI, nonreserved identifiers from libc do not need to be recompiled because of these changes, with one exception: user programs that depended on preemption of these identifiers in libc will no longer be able to preempt them using the nonreserved names. This kind of preemption is not ANSI compliant and is highly discouraged. However, the ability to preempt these identifiers still exists by using the new reserved names (those preceded by two underscores).

These changes apply to the dynamic and static versions of libc:

/usr/shlib/libc.so
/usr/lib/libc.a

When debugging programs linked against libc, references to weak symbols resolve to their strong counterparts, as in the following example:

% dbx a.out
dbx version 3.11.4
Type 'help' for help.

main: 4 tzset
(dbx) stop in tzset
[2] stop in __tzset
(dbx)

When the weak symbol tzset in libc is referenced, the debugger responds with the strong counterpart __tzset instead because the strong counterpart actually does the work. The behavior of the dbx debugger is the same as if __tzset were referenced directly.

Data type	Size in bits
char	8
short	16
int	32
long	64
long long	64
float	32 (IEEE Single)
double	64 (IEEE Double)
pointer	64

Constant	Power of 2	Number of Bytes
BYTE or byte	0	1
WORD or word	1	2
LONGWORD or longword	2	4
QUADWORD or quadword	3	8
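
The two numeric columns are related by bytes = 2^power. The following sketch expresses that relationship. The enumeration reuses the constant names from the table for illustration, and the function name is hypothetical.

```c
/* Powers of 2 from the table, named after the constants listed there. */
enum alignment_power {
    BYTE     = 0,
    WORD     = 1,
    LONGWORD = 2,
    QUADWORD = 3
};

/* Number of bytes for a given power of 2: bytes = 2^power. */
unsigned long bytes_for_power(enum alignment_power power)
{
    return 1UL << power;
}
```

For example, bytes_for_power( LONGWORD ) yields 4 and bytes_for_power( QUADWORD ) yields 8, matching the last column of the table.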