Program analysis tools are extremely important for computer architects and software engineers. Computer architects use them to test and measure new architectural designs, and software engineers use them to identify critical pieces of code in programs or to examine how well a branch prediction or instruction scheduling algorithm is performing. Program analysis tools are needed for problems ranging from basic block counting to instruction and data cache simulation. Although the tools that accomplish these tasks may appear quite different, each can be implemented simply and efficiently through code instrumentation.
Atom provides a flexible code instrumentation interface that is capable of building a wide variety of tools. Atom separates the common part in all problems from the problem-specific part by providing machinery for instrumentation and object-code manipulation, and allowing the tool designer to specify what points in the program are to be instrumented. Atom is independent of any compiler and language as it operates on object modules that make up the complete program.
Atom, as provided in the Digital UNIX operating system, provides the following:
atom application_program -tool toolname -env environment
atom application_program instrumentation_file analysis_file
The atom(1) reference page describes both forms of the atom command.
The Digital UNIX operating system provides and supports the Atom tools listed in Table 9-1.
Tool | Description |
Third Degree (third) | Performs memory access checks and detects memory leaks in an application. The Third Degree Atom tool is described in Chapter 7 and in the third(5) reference page. |
hiprof | Produces a flat profile of an application that shows the execution time spent in a given procedure and a hierarchical profile that shows the execution time spent in a given procedure and all its descendants. The hiprof Atom tool is described in Chapter 8 and hiprof(5). |
pixie | Partitions an application into basic blocks and counts the number of times each basic block is executed. The pixie Atom tool is described in Chapter 8 and pixie(5). |
The Digital UNIX operating system provides the unsupported Atom tools listed in Table 9-2 as examples for programmers developing custom-designed Atom tools. These tools are distributed in source form to illustrate Atom's programming interfaces. Some of the tools are further described in Section 9.2.
Tool | Description |
branch | Instruments all conditional branches to determine how many are predicted correctly. |
cache | Determines the cache miss rate if the application runs in an 8KB direct-mapped cache. |
dtb | Determines the number of dtb (data translation buffer) misses if the application uses 8KB pages and a fully associative translation buffer. |
dyninst | Provides fundamental dynamic counts of instructions, loads, stores, blocks, and procedures. |
inline | Identifies potential candidates for inlining. |
iprof | Prints the number of times each procedure is called as well as the number of instructions executed (dynamic count) by each procedure. |
malloc | Records each call to the malloc function and prints a summary of the application's allocated memory. |
prof | Prints the number of instructions executed (dynamic count) by each procedure. |
ptrace | Prints the name of each procedure as it is called. |
trace | Generates an address trace, logs the effective address of every load and store operation, and logs the address of the start of every basic block as it is executed. |
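To make the cache entry in the table more concrete, the following is a minimal sketch of the kind of bookkeeping an analysis procedure could perform for an 8KB direct-mapped cache. It is not the source of the cache example tool. The 32-byte line size and the names Reference, Misses, and MissRate are assumptions made for illustration.

```c
#include <assert.h>

/* Assumed geometry: 8 KB direct-mapped cache. The 32-byte line size is
 * an assumption; the table above specifies only the cache size. */
#define CACHE_SIZE 8192UL
#define LINE_SIZE  32UL
#define NUM_LINES  (CACHE_SIZE / LINE_SIZE)

static unsigned long tags[NUM_LINES];   /* tag stored in each line     */
static unsigned char valid[NUM_LINES];  /* nonzero once a line is used */
static unsigned long references;        /* total loads and stores seen */
static unsigned long misses;            /* references that missed      */

/* Hypothetical analysis procedure: receives the effective address of
 * one load or store and updates the simulated cache state. */
void Reference(unsigned long addr)
{
    unsigned long line  = addr / LINE_SIZE;   /* memory line number  */
    unsigned long index = line % NUM_LINES;   /* cache slot it maps to */
    unsigned long tag   = line / NUM_LINES;   /* identifies the line */

    references++;
    if (!valid[index] || tags[index] != tag) {
        misses++;
        valid[index] = 1;
        tags[index] = tag;
    }
}

/* Number of misses recorded so far. */
unsigned long Misses(void)
{
    return misses;
}

/* Fraction of references that missed, or 0.0 if none were seen. */
double MissRate(void)
{
    assert(misses <= references);
    return references ? (double)misses / (double)references : 0.0;
}
```

In a real tool, the instrumentation routine would arrange for Reference to be called at each load and store instruction, and for the miss rate to be reported when the program finishes.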
An Atom tool consists of the following:
Atom views an application as a hierarchy of components:
Atom tools insert instrumentation points in an application program at procedure, basic block, or instruction boundaries. For example, basic block counting tools instrument the beginning of each basic block, data cache simulators instrument each load and store instruction, and branch prediction analyzers instrument each conditional branch instruction.
At any instrumentation point, Atom allows a tool to insert a procedure call to an analysis routine. The tool can specify that the procedure call be made before or after an object, procedure, basic block, or instruction.
The command line used to apply Atom tools to an application is described completely in the atom(1) reference page. This section describes the command line and its most commonly used arguments and flags.
The atom command line has two forms:
atom application_program -tool toolname [-env environment] [flags...]
This form of the atom command is used to build an instrumented version of an application program using a prepackaged Atom tool. This form requires the -tool flag and accepts the -env flag. It does not allow either the instrumentation_file or the analysis_file parameter.
The -tool flag identifies the prepackaged Atom tool to be used. By default, Atom searches for prepackaged tools in the /usr/lib/cmplrs/atom/tools and /usr/lib/cmplrs/atom/examples directories. You can add directories to the search path by supplying a colon-separated list of additional directories to the ATOMTOOLPATH environment variable.
The -env flag identifies any special environment (for instance, threads) in which the tool is to operate. The set of environments supported by a given tool is defined by the tool's creator and listed in the tool's documentation. Atom displays an error if you specify an environment that is undefined for the tool. The prepackaged tools allow you to omit the -env flag to obtain a general-purpose environment.
atom application_program instrumentation_file [analysis_file] [flags...]
This form of the atom command is used to apply a tool that instruments an application program. This form requires the instrumentation_file parameter and accepts the analysis_file parameter.
The instrumentation_file parameter specifies the name of a C source file or an object module that contains the Atom tool's instrumentation procedures. By convention, most instrumentation files have the suffix .inst.c or .inst.o.
The analysis_file parameter specifies the name of a C source file or an object module that contains the Atom tool's analysis procedures. You do not need to specify an analysis file if the instrumentation file does not call analysis procedures. By convention, most analysis files have the suffix .anal.c or .anal.o.
You can have multiple instrumentation and analysis source files. The following example creates composite instrumentation and analysis objects from several source files:
% cc -c file1.c file2.c
% cc -c file7.c file8.c
% ld -r -o tool.inst.o file1.o file2.o
% ld -r -o tool.anal.o file7.o file8.o
% atom hello tool.inst.o tool.anal.o -o hello.tool
Note
You can also write analysis procedures in C++. You must assign a type of extern "C" to each procedure to allow it to be called from the application. You must also compile and link the analysis files before issuing the atom command. For example:
% cxx -c tool.a.C
% ld -r -o tool.anal.o tool.a.o -lcxx -lexc
% atom hello tool.inst.c tool.anal.o -o hello.tool
With the exception of the -tool and -env flags, both forms of the atom command accept any of the remaining flags described in the atom(1) reference page.
The following flags deserve special mention:
-A1
-debug
In the following example, the ptrace sample tool is run under the dbx debugger. The instrumentation is stopped at line 12, and the procedure name is printed.
% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace -debug
dbx version 3.11.8
Type 'help' for help.
Stopped in InstrumentAll
(dbx) stop at 12
[4] stop at "/udir/test/scribe/atom.user/tools/ptrace.inst.c":12
(dbx) c
[3] [InstrumentAll:12 ,0x12004dea8] if (name == NULL) name = "UNKNOWN";
(dbx) p name
0x2a391 = "__start"
-g
% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace -g
% dbx hello.ptrace
dbx version 3.11.8
Type 'help' for help.
(dbx) stop in ProcTrace
[2] stop in ProcTrace
(dbx) r
[2] stopped at [ProcTrace:5 ,0x120005574] fprintf (stderr,"%s\n",name);
(dbx) n
__start
[ProcTrace:6 ,0x120005598] }
-toolargs
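Passes arguments to the tool's instrumentation routine, which receives them in the same way that a C program receives the argc and argv arguments to the main program. For example:
#include <stdio.h>
unsigned InstrumentAll(int argc, char **argv)
{
    int i;
    /* Print each argument passed through -toolargs. */
    for (i = 0; i < argc; i++) {
        fprintf(stderr, "argv[%d]: %s\n", i, argv[i]);
    }
    return(0);
}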
The following example shows how Atom passes the -toolargs arguments:
% atom hello args.inst.c -toolargs="8192 4"
argv[0]: hello
argv[1]: 8192
argv[2]: 4
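A tool usually converts these strings into numeric settings before it starts instrumenting. The following sketch shows one way to validate two numeric arguments such as the ones in the example above. The ToolOpts structure, the ParseToolArgs function, and the interpretation of the two values as a cache size and an associativity are hypothetical and are not part of the Atom interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tool settings filled in from the -toolargs string. */
typedef struct {
    long cache_size;   /* for example, 8192 */
    long ways;         /* for example, 4    */
} ToolOpts;

/* Convert a decimal string to a positive long.
 * Returns 0 on success, -1 if the string is malformed. */
static int parse_positive(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || value <= 0)
        return -1;
    *out = value;
    return 0;
}

/* Expects argv[0] to be the application name, followed by exactly two
 * numeric arguments. Returns 0 on success, -1 otherwise. */
int ParseToolArgs(int argc, char **argv, ToolOpts *opts)
{
    assert(opts != NULL);
    if (argc != 3)
        return -1;
    if (parse_positive(argv[1], &opts->cache_size) != 0)
        return -1;
    if (parse_positive(argv[2], &opts->ways) != 0)
        return -1;
    return 0;
}
```

An instrumentation routine could call a function like this first and stop with an error message if it returns -1.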
Atom invokes a tool's instrumentation routine on a given application program when that program is specified as the application_program parameter to the atom command, and either of the following is true:
The tool is named by the -tool flag of an atom command. By default, Atom looks for prepackaged tools in the /usr/lib/cmplrs/atom/tools and /usr/lib/cmplrs/atom/examples directories.
The tool's instrumentation file is named by the instrumentation_file parameter of an atom command.
The instrumentation routine contains the code that traverses the objects, procedures, basic blocks, and instructions to locate instrumentation points; adds calls to analysis procedures; and builds the instrumented version of an application.
As described in the atom_instrumentation_routines(5) reference page, an instrumentation routine can employ one of the following interfaces based on the needs of the tool:
Instrument (int iargc, char **iargv, Obj *obj)
Atom calls the Instrument routine for each object in the application program. As a result, an Instrument routine does not need to use the object navigation routines (such as GetFirstObj). Because Atom automatically writes each object before passing the next to the Instrument routine, the Instrument routine should never call the BuildObj, WriteObj, or ReleaseObj routine. When using the Instrument interface, you can define an InstrumentInit routine to perform tasks required before Atom calls Instrument for the first object (such as defining analysis routine prototypes, adding program-level instrumentation calls, and performing global initializations). You can also define an InstrumentFini routine to perform tasks required after Atom calls Instrument for the last object (such as global cleanup).
InstrumentAll (int iargc, char **iargv)
Atom calls the InstrumentAll routine once for the entire application program, thus allowing a tool's instrumentation code itself to determine how to traverse the application's objects. With this method, there are no InstrumentInit or InstrumentFini routines. An InstrumentAll routine must call the Atom object navigation routines and use the BuildObj, WriteObj, or ReleaseObj routine to manage the application's objects.
Regardless of the instrumentation routine interface, Atom passes the arguments specified in the -toolargs flag to the routine. In the case of the Instrument interface, Atom also passes a pointer to the current object.
Atom provides a comprehensive interface for instrumenting applications. The interface supports the following types of activities:
The Atom application navigation routines, described in the atom_application_navigation(5) reference page, allow an Atom tool's instrumentation routine to find locations in an application at which to add calls to analysis procedures.
The GetFirstObj, GetLastObj, GetNextObj, and GetPrevObj routines navigate among the objects of a program. For nonshared programs, there is only one object. For call-shared programs, the first object corresponds to the main program. The remaining objects are each of its dynamically linked shared libraries.
The GetFirstObjProc and GetLastObjProc routines return a pointer to the first or last procedure, respectively, in the specified object. The GetNextProc and GetPrevProc routines navigate among the procedures of an object.
The GetFirstBlock, GetLastBlock, GetNextBlock, and GetPrevBlock routines navigate among the basic blocks of a procedure.
The GetFirstInst, GetLastInst, GetNextInst, and GetPrevInst routines navigate among the instructions of a basic block.
The GetInstBranchTarget routine returns a pointer to the instruction that is the target of a specified branch instruction.
The GetProcObj routine returns a pointer to the object that contains the specified procedure. Similarly, the GetBlockProc routine returns a pointer to the procedure that contains the specified basic block, and the GetInstBlock routine returns a pointer to the basic block that contains the specified instruction.
The Atom object management routines, described in the atom_object_management(5) reference page, allow an Atom tool's InstrumentAll routine to build, write, and release objects.
The BuildObj routine builds the internal data structures Atom requires to manipulate the object. An InstrumentAll routine must call the BuildObj routine before traversing the procedures in the object and adding analysis routine calls to the object.
The WriteObj routine writes the instrumented version of the specified object, deallocating the internal data structures the BuildObj routine previously created. The ReleaseObj routine deallocates the internal data structures for the given object, but does not write out the instrumented version of the object.
The IsObjBuilt routine returns a nonzero value if the specified object has been built with the BuildObj routine but not yet written with the WriteObj routine or unbuilt with the ReleaseObj routine.
The Atom application query routines, described in the atom_application_query(5) reference page, allow an instrumentation routine to obtain static information about a program and its objects, procedures, basic blocks, and instructions.
The GetAnalName routine returns the name of the analysis file, as passed to the atom command. This routine is useful for tools that have a single instrumentation file and multiple analysis files. For example, multiple cache simulators might share a single instrumentation file but each have a different analysis file.
The GetProgInfo routine returns the number of objects in a program.
Table 9-3 lists the routines that provide information about a program's objects.
Routine | Description |
GetObjInfo | Returns information about an object's text, data, and bss segments; the number of procedures, basic blocks, or instructions it contains; its object ID; or a Boolean hint as to whether the given object should be excluded from instrumentation. |
GetObjInstArray | Returns an array consisting of the 32-bit instructions included in the object. |
GetObjInstCount | Returns the number of instructions in the array returned by the GetObjInstArray routine. |
GetObjName | Returns the original filename of the specified object. |
GetObjOutName | Returns the name of the instrumented object. |
The following instrumentation routine, which prints statistics about the program's objects, demonstrates the use of Atom object query routines:
#include <stdio.h>
#include <cmplrs/atom.inst.h>
unsigned InstrumentAll(int argc, char **argv)
{
    Obj *o;
    const unsigned int *textSection;
    long textStart;
    /* Visit every object: the main program and its shared libraries. */
    for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {
        BuildObj(o);
        textSection = GetObjInstArray(o);
        textStart = GetObjInfo(o,ObjTextStartAddress);
        printf("Object %ld\n", GetObjInfo(o,ObjID));
        printf(" Object name: %s\n", GetObjName(o));
        printf(" Text segment start: 0x%lx\n", textStart);
        printf(" Text size: %ld\n", GetObjInfo(o,ObjTextSize));
        printf(" Second instruction: 0x%x\n", textSection[1]);
        /* Nothing was instrumented, so release the object without writing it. */
        ReleaseObj(o);
    }
    return(0);
}
Because the instrumentation routine adds no calls to the executable, there is no need for an analysis procedure. The following example demonstrates the process of compiling a program and applying this tool to it. Because the printf calls are in the instrumentation routine, the atom command itself prints, for each object, the object identifier, the object name, the compile-time starting address of the text segment, the size of the text segment, and the binary for the second instruction. The disassembler provides a convenient method for finding the corresponding instructions.
% cc hello.c -o hello
% atom hello info.inst.c -o hello.info
Object 0
Object Name: hello
Start Address: 0x120000000
Text Size: 8192
Second instruction: 0x239f001d
Object 1
Object Name: /usr/shlib/libc.so
Start Address: 0x3ff80080000
Text Size: 901120
Second instruction: 0x239f09cb
% dis hello | head -3
0x120000fe0: a77d8010 ldq t12, -32752(gp)
0x120000fe4: 239f001d lda at, 29(zero)
0x120000fe8: 279c0000 ldah at, 0(at)
% dis /usr/shlib/libc.so | head -3
0x3ff800bd9b0: a77d8010 ldq t12,-32752(gp)
0x3ff800bd9b4: 239f09cb lda at,2507(zero)
0x3ff800bd9b8: 279c0000 ldah at, 0(at)
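The binary values printed above can also be checked by hand. The following sketch splits a 32-bit word into the fields of the Alpha memory instruction format (opcode in bits 31 to 26, Ra in bits 25 to 21, Rb in bits 20 to 16, and a signed 16-bit displacement). The MemFormat structure and the DecodeMemFormat function are illustrative names and are not part of the Atom interface; inside a tool, the GetInstInfo routine is the supported way to obtain instruction fields.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an Alpha memory-format instruction such as lda or ldq. */
typedef struct {
    unsigned opcode;   /* bits 31..26 */
    unsigned ra;       /* bits 25..21 */
    unsigned rb;       /* bits 20..16 */
    int32_t  disp;     /* bits 15..0, sign-extended */
} MemFormat;

/* Hypothetical helper: decode one 32-bit instruction word. */
MemFormat DecodeMemFormat(uint32_t word)
{
    MemFormat m;

    m.opcode = (word >> 26) & 0x3f;
    m.ra     = (word >> 21) & 0x1f;
    m.rb     = (word >> 16) & 0x1f;
    m.disp   = (int32_t)(word & 0xffff);
    if (m.disp & 0x8000)          /* sign bit of the 16-bit field is set */
        m.disp -= 0x10000;
    assert(m.disp >= -32768 && m.disp <= 32767);
    return m;
}
```

For 0x239f001d this yields opcode 0x08 (lda), Ra 28 (at), Rb 31 (zero), and displacement 29, which agrees with the disassembly lda at, 29(zero).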
Table 9-4 lists the routines that provide information about an object's procedures:
Routine | Description |
GetProcInfo | Returns information pertaining to the procedure's stack frame, register-saving, register-usage, and prologue characteristics as defined in the Calling Standard for Alpha Systems and the Assembly Language Programmer's Guide. Such values are important to tools, like Third Degree, that monitor the stack for access to uninitialized variables. It can also return such information about the procedure as the number of basic blocks or instructions it contains, its procedure ID, its lowest or highest source line number, or an indication if its address has been taken. |
ProcFileName | Returns the name of the source file that contains the procedure. |
ProcName | Returns the procedure's name. |
ProcPC | Returns the compile-time program counter (PC) of the first instruction in the procedure. |
Table 9-5 lists the routines that provide information about a procedure's basic blocks:
Routine | Description |
BlockPC | Returns the compile-time program counter (PC) of the first instruction in the basic block. |
GetBlockInfo | Returns the number of instructions in the basic block or the block ID. The block ID is unique to this basic block within its containing object. |
IsBranchTarget | Indicates if the block is the target of a branch instruction. |
Table 9-6 lists the routines that provide information about a basic block's instructions:
Routine | Description |
GetInstBinary | Returns a 32-bit binary representation of the assembly language instruction. |
GetInstClass | Returns the instruction class (for instance, floating-point load or integer store) as defined by the Alpha Architecture Reference Manual. An Atom tool uses this information to determine instruction scheduling and dual issue rules. |
GetInstInfo | Parses the entire 32-bit instruction and obtains all or a portion of that instruction. |
GetInstRegEnum | Returns the register type (floating-point or integer) from an instruction field as returned by the GetInstInfo routine. |
GetInstRegUsage | Returns a bit mask with one bit set for each possible source register and one bit set for each possible destination register. |
InstPC | Returns the compile-time program counter (PC) of the instruction. |
InstLineNo | Returns the instruction's source line number. |
IsInstType | Indicates whether the instruction is of the specified type (load instruction, store instruction, conditional branch, or unconditional branch). |
Resolving procedure names and subroutine targets is trivial for nonshared programs because all procedures are contained in the same object. However, the target of a subroutine branch in a call-shared program could be in any object.
The Atom application procedure name and call target resolution routines, described in the atom_application_resolvers(5) reference page, allow an Atom tool's instrumentation routine to find a procedure by name and to find a target procedure for a call site:
The ResolveTargetProc routine attempts to resolve the target of a procedure call.
The ResolveNamedProc routine returns the procedure identified by the specified name string.
The ReResolveProc routine completes a procedure resolution if the procedure initially resided in an unbuilt object.
The Atom application instrumentation routines, described in the atom_application_instrumentation(5) reference page, add arbitrary procedure calls at various points in the application:
Use the AddCallProto routine to specify the prototype of each analysis procedure to be added to the program. In other words, an AddCallProto call must define the procedural interface for each analysis procedure used in calls to AddCallProgram, AddCallObj, AddCallProc, AddCallBlock, and AddCallInst. Atom provides facilities for passing integers and floating-point numbers, arrays, branch condition values, effective addresses, and cycle counters, as well as procedure arguments and return values.
Use the AddCallProgram routine in an instrumentation routine to add a call to an analysis procedure before a program starts execution or after it completes execution. Typically, such an analysis procedure does something that applies to the whole program.
Use the AddCallObj routine in an instrumentation routine to add a call to an analysis procedure before an object starts execution or after it completes execution. Typically, such an analysis procedure does something that applies to the single object, such as initializing some data for its procedures.
Use the AddCallProc routine in an instrumentation routine to add a call to an analysis procedure before a procedure starts execution or after it completes execution.
Use the AddCallBlock routine in an instrumentation routine to add a call to an analysis procedure before a basic block starts execution or after it completes execution.
Use the AddCallInst routine in an instrumentation routine to add a call to an analysis procedure before a given instruction executes or after it executes.
Use the ReplaceProcedure routine to replace a procedure in the instrumented program. For example, the Third Degree Atom tool replaces memory allocation functions such as malloc and free with its own versions to allow it to check for invalid memory accesses and memory leaks.
An Atom tool's description file, as described in the atom_description_file(5) reference page, identifies and describes the tool's instrumentation and analysis files. It can also specify the flags to be used by the cc, ld, and atom commands when the tool is compiled, linked, and invoked. Each Atom tool must supply at least one description file.
There are two types of Atom description file:
tool.desc
tool.environment.desc
The names supplied for the tool and environment portions of these description file names correspond to the values the user specifies to the -tool and -env flags of an atom command when invoking the tool.
An Atom description file is a text file containing a series of tags and values. See atom_description_file(5) for a complete description of the file's syntax.
An instrumented application calls analysis procedures to perform the specific functions defined by an Atom tool. An analysis procedure can use any system call or library function, even if the same call or function is instrumented within the application. The routines used by the analysis routine and the instrumented application are physically distinct.
An analysis procedure that uses the standard I/O library should take care to explicitly close file descriptors before the instrumented application exits. The standard I/O library buffers read and write requests to optimize disk accesses. It flushes an output buffer to disk either when it is full or when a procedure calls the fflush function. If the instrumented application exits before an analysis procedure properly closes its output file descriptors, the procedure's output may not be completely written.
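The following sketch illustrates the pattern: one analysis procedure opens the output file, another writes to it, and a third closes it so that buffered output reaches the disk. The names OpenLog, LogEvent, and CloseLog are hypothetical. In a real tool, the instrumentation routine would add the calls to them, for example at the start and end of the program.

```c
#include <assert.h>
#include <stdio.h>

static FILE *log_file;   /* output stream shared by the analysis procedures */

/* Hypothetical analysis procedure intended to run before the program
 * starts. Returns 0 on success, -1 if the file cannot be opened. */
int OpenLog(const char *path)
{
    log_file = fopen(path, "w");
    return log_file ? 0 : -1;
}

/* Hypothetical analysis procedure that records one event. The text may
 * remain in the stdio buffer until the stream is flushed or closed. */
void LogEvent(const char *name)
{
    assert(log_file != NULL);
    fprintf(log_file, "%s\n", name);
}

/* Hypothetical analysis procedure intended to run after the program
 * completes. fclose flushes any buffered output before closing.
 * Returns 0 on success, -1 on a write or close error. */
int CloseLog(void)
{
    int rc = 0;

    if (log_file != NULL) {
        rc = fclose(log_file);
        log_file = NULL;
    }
    return rc == 0 ? 0 : -1;
}
```

Checking the return value of fclose matters here because a failed final flush is the only indication that part of the output was lost.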
Some Atom tool analysis procedures may print results to stdout or stderr. Because the file descriptors for these I/O streams are closed when an instrumented application calls the exit function, an analysis routine that is called from a ProgramAfter instrumentation point can no longer send output to either.
Analysis procedures written in C++ must also take care when using the cout and cerr streams. Because these streams are buffered by the class library, an analysis routine must call cout.flush() or cerr.flush() before the instrumented application exits.
If a process calls a fork function but does not call an exec function, the process is cloned and the child inherits an exact copy of the parent's state. In many cases, this is exactly the behavior that an Atom tool expects. For example, an instruction-address tracing tool sees references for both the parent and the child, interleaved in the order in which the references occurred.
In the case of an instruction-profiling tool (for example, the trace tool referenced in Table 9-2), the file is opened at a ProgramBefore instrumentation point and, as a result, the output file descriptor is shared between the parent and the child processes. If the results are printed at a ProgramAfter instrumentation point, the output file contains the parent's data, followed by the child's data (assuming that the parent process finishes first).
For tools that count events, the data structures that hold the counts should be returned to zero in the child process after the fork call because the events occurred in the parent, not the child. This type of Atom tool can support correct handling of fork calls by instrumenting the fork library procedure and calling an analysis procedure with the return value of the fork routine as an argument. If the analysis procedure is passed a return value of 0 (zero) in the argument, it knows that it was called from a child process. It can then reset the counts variable or other data structures so that they tally statistics for only the child process.
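The analysis side of this technique can be sketched as follows. The names CountCall, GetCount, and AfterFork, and the fixed table size, are assumptions made for illustration. The instrumentation routine would have to arrange for AfterFork to be called with the return value of the fork routine.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS 1024   /* hypothetical upper bound on counted procedures */

static unsigned long counts[MAX_PROCS];   /* one event counter per procedure */

/* Hypothetical analysis procedure: counts one call of procedure 'id'. */
void CountCall(int id)
{
    assert(id >= 0 && id < MAX_PROCS);
    counts[id]++;
}

/* Read back the counter for procedure 'id'. */
unsigned long GetCount(int id)
{
    assert(id >= 0 && id < MAX_PROCS);
    return counts[id];
}

/* Hypothetical analysis procedure: receives the return value of fork.
 * A value of 0 means this is the child, whose inherited counts describe
 * events that happened in the parent, so they are cleared. */
void AfterFork(long fork_result)
{
    if (fork_result == 0)
        memset(counts, 0, sizeof(counts));
}
```

In the parent process, fork returns the child's process ID (or -1 on failure), so the parent's counts are left untouched.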
The Atom Xlate routines, described in Xlate(5), allow you to determine the instrumented PC for selected instructions. You can use these functions to build a table that translates an instruction's PC in the instrumented application to its PC in the uninstrumented application.
To enable analysis code to determine the instrumented PC of an instruction at runtime, an Atom tool's instrumentation routine must select the instruction and place it into an address translation buffer (XLATE).
An Atom tool's instrumentation routine creates and fills the address translation buffer by calling the CreateXlate and AddXlateAddress routines, respectively. An address translation buffer can only hold instructions from a single object.
The AddXlateAddress routine adds the specified instruction to an existing address translation buffer.
An Atom tool's instrumentation routine passes an address translation buffer to an analysis routine by passing it as a parameter of type XLATE *, as indicated in the analysis routine's prototype definition in an AddCallProto call.
Another way to determine an instrumented PC is to specify a formal parameter type of REGV in an analysis routine's prototype and pass the REG_IPC value.
An Atom tool's analysis routine uses the following interfaces to access an address translation buffer passed to it:
The XlateNum routine returns the number of addresses in the specified address translation buffer.
The XlateInstTextStart routine returns the starting address of the text segment for the instrumented object corresponding to the specified address translation buffer.
The XlateInstTextSize routine returns the size of the text segment.
The XlateLoadShift routine returns the difference between the runtime addresses in the object corresponding to the specified address translation buffer and the compile-time addresses.
The XlateAddr routine returns the instrumented runtime address for the instruction in the specified position of the specified address translation buffer. Note that the runtime address for an instruction in a shared library is not necessarily the same as its compile-time address.
The following example demonstrates the use of the Xlate routines by the instrumentation and analysis files of a tool that prints the target address of every jump instruction. To use it, issue the following command:
% atom progname xlate.inst.c xlate.anal.c -all
The following source listing (xlate.inst.c) contains the instrumentation for the xlate tool:
#include <stdlib.h>
#include <stdio.h>
#include <alpha/inst.h>
#include <cmplrs/atom.inst.h>

static void address_add(unsigned long);
static unsigned address_num(void);
static unsigned long * address_paddrs(void);
static void address_free(void);

void InstrumentInit(int iargc, char **iargv)
{
    /* Create analysis prototypes. */
    AddCallProto("RegisterNumObjs(int)");
    AddCallProto("RegisterXlate(int, XLATE *, long[0])");
    AddCallProto("JmpLog(long, REGV)");

    /* Pass the number of objects to the analysis routines. */
    AddCallProgram(ProgramBefore, "RegisterNumObjs",
        GetProgInfo(ProgNumberObjects));
}

Instrument(int iargc, char **iargv, Obj *obj)
{
    Proc * p;
    Block * b;
    Inst * i;
    XLATE * pxlt;
    union alpha_instruction bin;
    ProcRes pres;
    unsigned long pc;
    char proto[128];

    /*
     * Create an XLATE structure for this Obj. We use this to translate
     * instrumented jump target addresses to pure jump target addresses.
     */
    pxlt = CreateXlate(obj, XLATE_NOSIZE);

    for (p = GetFirstObjProc(obj); p; p = GetNextProc(p)) {
        for (b = GetFirstBlock(p); b; b = GetNextBlock(b)) {
            /*
             * If the first instruction in this basic block has had its
             * address taken, it's a potential jump target. Add the
             * instruction to the XLATE and keep track of the pure address
             * too.
             */
            i = GetFirstInst(b);
            if (GetInstInfo(i, InstAddrTaken)) {
                AddXlateAddress(pxlt, i);
                address_add(InstPC(i));
            }

            for (; i; i = GetNextInst(i)) {
                bin.word = GetInstInfo(i, InstBinary);
                if (bin.common.opcode == op_jsr &&
                    bin.j_format.function == jsr_jmp)
                {
                    /*
                     * This is a jump instruction. Instrument it.
                     */
                    AddCallInst(i, InstBefore, "JmpLog", InstPC(i),
                        GetInstInfo(i, InstRB));
                }
            }
        }
    }

    /*
     * Re-prototype the RegisterXlate() analysis routine now that we
     * know the size of the pure address array.
     */
    sprintf(proto, "RegisterXlate(int, XLATE *, long[%d])", address_num());
    AddCallProto(proto);

    /*
     * Pass the XLATE and the pure address array to this object.
     */
    AddCallObj(obj, ObjBefore, "RegisterXlate", GetObjInfo(obj, ObjID),
        pxlt, address_paddrs());

    /*
     * Deallocate the pure address array.
     */
    address_free();
}

/*
** Maintains a dynamic array of pure addresses.
*/
static unsigned long * pAddrs;
static unsigned maxAddrs = 0;
static unsigned nAddrs = 0;

/*
** Add an address to the array.
*/
static void address_add(
    unsigned long addr)
{
    /*
     * If there's not enough room, expand the array.
     */
    if (nAddrs >= maxAddrs) {
        maxAddrs = (nAddrs + 100) * 2;
        pAddrs = realloc(pAddrs, maxAddrs * sizeof(*pAddrs));
        if (!pAddrs) {
            fprintf(stderr, "Out of memory\n");
            exit(1);
        }
    }

    /*
     * Add the address to the array.
     */
    pAddrs[nAddrs++] = addr;
}

/*
** Return the number of elements in the address array.
*/
static unsigned address_num(void)
{
    return(nAddrs);
}

/*
** Return the array of addresses.
*/
static unsigned long *address_paddrs(void)
{
    return(pAddrs);
}

/*
** Deallocate the address array.
*/
static void address_free(void)
{
    free(pAddrs);
    pAddrs = 0;
    maxAddrs = 0;
    nAddrs = 0;
}
The following source listing (xlate.anal.c) contains the analysis routine for the xlate tool:
#include <stdlib.h>
#include <stdio.h>
#include <cmplrs/atom.anal.h>

/*
 * Each object in the application gets one of the following data
 * structures. The XLATE contains the instrumented addresses for
 * all possible jump targets in the object. The array contains
 * the matching pure addresses.
 */
typedef struct {
    XLATE * pXlt;
    unsigned long * pAddrsPure;
} ObjXlt_t;

/*
 * An array with one ObjXlt_t structure for each object in the
 * application.
 */
static ObjXlt_t * pAllXlts;
static unsigned nObj;
static int translate_addr(unsigned long, unsigned long *);
static int translate_addr_obj(ObjXlt_t *, unsigned long,
    unsigned long *);

/*
** Called at ProgramBefore. Registers the number of objects in
** this application.
*/
void RegisterNumObjs(
    unsigned nobj)
{
    /*
     * Allocate an array with one element for each object. The
     * elements are initialized as each object is loaded.
     */
    nObj = nobj;
    pAllXlts = calloc(nobj, sizeof(*pAllXlts));
    if (!pAllXlts) {
        fprintf(stderr, "Out of Memory\n");
        exit(1);
    }
}

/*
** Called at ObjBefore for each object. Registers an XLATE with
** instrumented addresses for all possible jump targets. Also
** passes an array of pure addresses for all possible jump targets.
*/
void RegisterXlate(
    unsigned iobj,
    XLATE * pxlt,
    unsigned long * paddrs_pure)
{
    /*
     * Initialize this object's element in the pAllXlts array.
     */
    pAllXlts[iobj].pXlt = pxlt;
    pAllXlts[iobj].pAddrsPure = paddrs_pure;
}

/*
** Called at InstBefore for each jump instruction. Prints the pure
** target address of the jump.
*/
void JmpLog(
    unsigned long pc,
    REGV targ)
{
    unsigned long addr;

    printf("0x%lx jumps to - ", pc);
    if (translate_addr(targ, &addr))
        printf("0x%lx\n", addr);
    else
        printf("unknown\n");
}

/*
** Attempt to translate the given instrumented address to its pure
** equivalent. Set '*paddr_pure' to the pure address and return 1
** on success. Return 0 on failure.
**
** Will always succeed for jump target addresses.
*/
static int translate_addr(
    unsigned long addr_inst,
    unsigned long * paddr_pure)
{
    unsigned long start;
    unsigned long size;
    unsigned i;

    /*
     * Find out which object contains this instrumented address.
     */
    for (i = 0; i < nObj; i++) {
        start = XlateInstTextStart(pAllXlts[i].pXlt);
        size = XlateInstTextSize(pAllXlts[i].pXlt);
        if (addr_inst >= start && addr_inst < start + size) {
            /*
             * Found the object, translate the address using that
             * object's data.
             */
            return(translate_addr_obj(&pAllXlts[i], addr_inst,
                paddr_pure));
        }
    }

    /*
     * No object contains this address.
     */
    return(0);
}

/*
** Attempt to translate the given instrumented address to its
** pure equivalent using the given object's translation data.
** Set '*paddr_pure' to the pure address and return 1 on success.
** Return 0 on failure.
*/
static int translate_addr_obj(
    ObjXlt_t * pObjXlt,
    unsigned long addr_inst,
    unsigned long * paddr_pure)
{
    unsigned num;
    unsigned i;

    /*
     * See if the instrumented address matches any element in the XLATE.
     */
    num = XlateNum(pObjXlt->pXlt);
    for (i = 0; i < num; i++) {
        if (XlateAddr(pObjXlt->pXlt, i) == addr_inst) {
            /*
             * Matches this XLATE element, return the matching pure
             * address.
             */
            *paddr_pure = pObjXlt->pAddrsPure[i];
            return(1);
        }
    }

    /*
     * No match found, must not be a possible jump target.
     */
    return(0);
}
This section describes the basic tool building interface by using three simple examples: procedure tracing, instruction profiling, and data cache simulation.
The ptrace tool prints the names of procedures in the order in which they are executed. The implementation adds a call to each procedure in the application. By convention, the instrumentation for the ptrace tool is placed in the file ptrace.inst.c.
 1  #include <stdio.h>
 2  #include <cmplrs/atom.inst.h>                                      [1]
 3
 4  unsigned InstrumentAll(int argc, char **argv)                      [2]
 5  {
 6    Obj *o; Proc *p;
 7    AddCallProto("ProcTrace(char *)");                               [3]
 8    for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {          [4]
 9      if (BuildObj(o)) return 1;                                     [5]
10      for (p = GetFirstObjProc(o); p != NULL; p = GetNextProc(p)) {  [6]
11        const char *name = ProcName(p);                              [7]
12        if (name == NULL) name = "UNKNOWN";                          [8]
13        AddCallProc(p,ProcBefore,"ProcTrace",name);                  [9]
14      }
15      WriteObj(o);                                                   [10]
16    }
17    return(0);
18  }
[2] Defines the InstrumentAll procedure. This instrumentation routine defines the interface to each analysis procedure and inserts calls to those procedures at the correct locations in the applications it instruments.

[3] Calls the AddCallProto routine to define the ProcTrace analysis procedure. ProcTrace takes a single argument of type char *.

[4] Uses the GetFirstObj and GetNextObj routines to cycle through each object in the application. If the program was linked nonshared, there is only a single object. If the program was linked call-shared, it contains multiple objects: one for the main executable and one for each dynamically linked shared library. The main program is always the first object.

[5] Builds the object. If the object cannot be built, the InstrumentAll routine reports this condition to Atom by returning a nonzero value.

[6] Uses the GetFirstObjProc and GetNextProc routines to step through each procedure in the application program.

[7] Uses the ProcName procedure to find the procedure name. Depending on the amount of symbol table information that is available in the application, some procedure names, such as those defined as static, may not be available. (Compiling applications with the -g1 flag provides this level of symbol information.) In these cases, Atom returns NULL.

[8] Converts the NULL procedure name string to UNKNOWN.

[9] Uses the AddCallProc routine to add a call to the procedure pointed to by p. The ProcBefore argument indicates that the analysis procedure is to be added before all other instructions in the procedure. The name of the analysis procedure to be called at this instrumentation point is ProcTrace. The final argument is to be passed to the analysis procedure. In this case, it is the procedure name obtained on Line 11.
The instrumentation file added calls to the ProcTrace analysis procedure. This procedure is defined in the analysis file ptrace.anal.c as shown in the following example:
#include <stdio.h>

void ProcTrace(char *name)
{
  fprintf(stderr, "%s\n",name);
}
The ProcTrace analysis procedure prints, to stderr, the character string passed to it as an argument. Note that an analysis procedure cannot return a value.
Once the instrumentation and analysis files are specified, the tool is complete. To illustrate the application of this tool, we compile and link the following application:
#include <stdio.h>
main()
{
  printf("Hello world!\n");
}
The following example builds a nonshared executable, applies the ptrace tool, and runs the instrumented executable. This simple program calls almost 30 procedures.
% cc -non_shared hello.c -o hello
% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace
% hello.ptrace
__start
main
printf
_doprnt
__getmbcurmax
strchr
strlen
memcpy
.
.
.
The following example repeats this process with the application linked call-shared. The major difference is that the LD_LIBRARY_PATH environment variable must be set to the current directory because Atom creates an instrumented version of the libc.so shared library in the local directory.
% cc hello.c -o hello
% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace
% setenv LD_LIBRARY_PATH `pwd`
% hello.ptrace
__start
_call_add_gp_range
__exc_add_gp_range
malloc
cartesian_alloc
cartesian_growheap2
__getpagesize
__sbrk
.
.
.
The call-shared version of the application calls almost twice the number of procedures that the nonshared version calls.
Note that only calls in the original application program are instrumented. Because the call to the ProcTrace analysis procedure did not occur in the original application, it does not appear in a trace of the instrumented application procedures. Likewise, the standard library calls that print the name of each procedure are also not included. If the application and the analysis program both call the printf function, Atom links two copies of the function into the instrumented application, and only the copy in the application program is instrumented. Atom also correctly instruments procedures that have multiple entry points.
The prof example tool counts the number of instructions a program executes. It is useful for finding critical sections of code. Each time the application is executed, prof creates a file called prof.out that contains a profile of the number of instructions that are executed in each procedure.

The most efficient place to compute instruction counts is inside each basic block. Each time a basic block is executed, a fixed number of instructions are executed. The following example shows how the prof tool's instrumentation procedure (prof.inst.c) performs these tasks:
 1  #include <stdio.h>
 2  #include <cmplrs/atom.inst.h>
 3
 4  unsigned InstrumentAll(int argc, char **argv)
 5  {
 6    Obj *o; Proc *p; Block *b; Inst *i;
 7    int n = 0;
 8    AddCallProto("OpenFile(int)");                                     [1]
 9    AddCallProto("Count(int,int)");
10    AddCallProto("Print(int,char *)");
11    AddCallProto("CloseFile()");
12    for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {            [2]
13      if (BuildObj(o)) return (1);                                     [3]
14      for (p = GetFirstObjProc(o); p != NULL; p = GetNextProc(p)) {    [4]
15        const char *name = ProcName(p);                                [5]
16        if (name == NULL) name = "UNKNOWN";
17        for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b)) {   [6]
18          AddCallBlock(b,BlockBefore,"Count",n,                        [7]
                         GetBlockInfo(b,BlockNumberInsts));
19        }
20        AddCallProgram(ProgramAfter,"Print",n,name);                   [8]
21        n++;                                                           [9]
22      }
23      WriteObj(o);                                                     [10]
24    }
25    AddCallProgram(ProgramBefore,"OpenFile",n);                        [11]
26    AddCallProgram(ProgramAfter,"CloseFile");                          [12]
27    return (0);
28  }
[7] Adds a call to the Count analysis procedure before any of the instructions in this basic block are executed. The argument types of the Count procedure are defined in the prototype on Line 9. The first argument is a procedure index of type int; the second argument, also an int, is the number of instructions in the basic block. The Count analysis procedure adds the number of instructions in the basic block to a per-procedure data structure.

[8] Adds a call to the Print analysis procedure to the end of the program. The Print analysis procedure prints a line summarizing this procedure's instruction use.

[11] Adds a call to the OpenFile analysis procedure to the beginning of the program, passing it an int representing the number of procedures in the application. The OpenFile procedure allocates the per-procedure data structure that tallies instructions and opens the output file.

[12] Adds a call to the CloseFile analysis procedure to the end of the program.
The analysis procedures used by the prof tool are defined in the prof.anal.c file as shown in the following example:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

long instrTotal = 0;     /* total instructions executed, all procedures */
long *instrPerProc;      /* instructions executed, indexed by procedure */
FILE *file;

void OpenFile(int n)
{
  instrPerProc = (long *) calloc(n, sizeof(long));                     [1]
  assert(instrPerProc != NULL);
  file = fopen("prof.out","w");
  assert(file != NULL);
  fprintf(file,"%30s %15s %10s\n","Procedure","Instructions","Percentage");
}

void Count(int n, int instructions)
{
  instrTotal += instructions;
  instrPerProc[n] += instructions;
}

void Print(int n, char *name)
{
  if (instrPerProc[n] > 0) {                                           [2]
    fprintf(file,"%30s %15ld %9.3f\n", name, instrPerProc[n],
            ((float) instrPerProc[n] / instrTotal)*100.0);
  }
}

void CloseFile()                                                       [3]
{
  fprintf(file,"\n%30s %15ld %9.3f\n", "Total", instrTotal,100.0);
  fclose(file);
}
[1] The calloc function zero-fills the counts data.
Once the instrumentation and analysis files are specified, the tool is complete. To illustrate the application of this tool, we compile and link the "Hello" application:
#include <stdio.h>
main()
{
  printf("Hello world!\n");
}
The following example builds a call-shared executable, applies the prof tool, and runs the instrumented executable. In contrast to the ptrace tool described in Section 9.2.7.1, the prof tool sends its output to a file instead of stderr.
% cc hello.c -o hello
% atom hello prof.inst.c prof.anal.c -o hello.prof
% setenv LD_LIBRARY_PATH `pwd`
% hello.prof
Hello world!
% more prof.out
                     Procedure    Instructions Percentage
                       __start             159     4.941
                          main              14     0.435
                             .
                             .
                             .
            _call_add_gp_range              41     1.274
         _call_remove_gp_range              35     1.088

                         Total            3218   100.000
% unsetenv LD_LIBRARY_PATH
Instruction and data address tracing has been used for many years as a technique for capturing and analyzing cache behavior. Unfortunately, current machine speeds make this increasingly difficult. For example, the Alvinn SPEC92 benchmark executes 961,082,150 loads, 260,196,942 stores, and 73,687,356 basic blocks, for a total of 2,603,010,614 Alpha instructions. Storing the address of each basic block and the effective address of all the loads and stores would take in excess of 10GB and slow down the application by a factor of over 100.
The cache tool uses on-the-fly simulation to determine the cache miss rates of an application running in an 8KB direct-mapped cache. The following example shows its instrumentation routine:
#include <stdio.h>
#include <cmplrs/atom.inst.h>

unsigned InstrumentAll(int argc, char **argv)
{
  Obj *o; Proc *p; Block *b; Inst *i;

  AddCallProto("Reference(VALUE)");
  AddCallProto("Print()");
  for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {
    if (BuildObj(o)) return (1);
    for (p = GetFirstObjProc(o); p != NULL; p = GetNextProc(p)) {
      for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b)) {
        for (i = GetFirstInst(b); i != NULL; i = GetNextInst(i)) {         [1]
          if (IsInstType(i,InstTypeLoad) || IsInstType(i,InstTypeStore)) {
            AddCallInst(i,InstBefore,"Reference",EffAddrValue);            [2]
          }
        }
      }
    }
    WriteObj(o);
  }
  AddCallProgram(ProgramAfter,"Print");
  return (0);
}
[2] Adds a call to the Reference analysis procedure, passing the effective address of the data reference.
The analysis procedures used by the cache tool are defined in the cache.anal.c file as shown in the following example:
#include <stdio.h>
#include <assert.h>
#define CACHE_SIZE 8192
#define BLOCK_SHIFT 5
long tags[CACHE_SIZE >> BLOCK_SHIFT];
long references, misses;

void Reference(long address) {
  /* Select the cache line, then compare the stored tag for that line. */
  int index = (address & (CACHE_SIZE-1)) >> BLOCK_SHIFT;
  long tag = address >> BLOCK_SHIFT;
  if (tags[index] != tag) {
    misses++;
    tags[index] = tag;
  }
  references++;
}

void Print() {
  FILE *file = fopen("cache.out","w");
  assert(file != NULL);
  fprintf(file,"References: %ld\n", references);
  fprintf(file,"Cache Misses: %ld\n", misses);
  fprintf(file,"Cache Miss Rate: %f\n", (100.0 * misses) / references);
  fclose(file);
}
Once the instrumentation and analysis files are specified, the tool is complete. To illustrate the application of this tool, we compile and link the "Hello" application:
#include <stdio.h>
main()
{
  printf("Hello world!\n");
}
The following example applies the cache tool to instrument both the nonshared and call-shared versions of the application:
% cc hello.c -o hello
% atom hello cache.inst.c cache.anal.c -o hello.cache -all
% setenv LD_LIBRARY_PATH `pwd`
% hello.cache
Hello world!
% more cache.out
References: 1091
Cache Misses: 225
Cache Miss Rate: 20.623281
% cc -non_shared hello.c -o hello
% atom hello cache.inst.c cache.anal.c -o hello.cache -all
% hello.cache
Hello world!
% more cache.out
References: 382
Cache Misses: 93
Cache Miss Rate: 24.345550