hiprof(5)
NAME
hiprof - Hierarchical instruction profiler
SYNOPSIS
atom appl_prog -tool hiprof [-env threads] [-toolargs="arg1 arg2..."]
[atom_options...]
This interface will be retired in a future major release. See hiprof(1) for
the replacement interface.
OPERANDS
appl_prog
File name of a fully linked shared or nonshared executable to be
profiled. This program should be compiled with the -g1, -g2, or
-g3 option to obtain more complete profiling information. If the
default symbol table level (-g0) has been used, line number
information, static procedure names, and file names are unavailable
to the profiler.
OPTIONS
-tool hiprof
Identifies the hiprof tool to atom.
-env threads
Specifies that the hiprof tool is being invoked on an application that
runs in a threaded environment. To make run-time analysis of an
application threadsafe, you must specify -env threads in the hiprof
command. Only POSIX threads created using the pthread_create function
are supported.
The threadsafe instrumented executable is named
appl_prog.hiprof.threads by default. You may omit the -env threads
option if the application does not create threads; in this case the
instrumented executable is named appl_prog.hiprof.
-toolargs="arg1 arg2 ..."
Passes arguments (listed below, in this section) to the hiprof tool's
instrumentation routines. Use whitespace characters to separate
arguments from their parameters (if any) and from other arguments.
If you need to represent spaces within argument parameters (such as
within a parameter to the -exc argument), use matching single-quotes or
matching double-quotes, making sure that you avoid having the shell
interpret those characters as shell-special characters. For example:
-toolargs="-exc 'strstreambase::strstreambase(char*, \
int, char*)'"
-toolargs='-exc "operator -" -exc "ostream::operator \
<<" -exc main -exc "operator new(unsigned long)"'
atom_options
Specifies options to the atom command. See the atom(1) reference page
for descriptions of other options accepted by the atom command, such as
those that enable instrumentation of shared libraries, specify the
names of instrumented objects, and request debugging information.
After you have instrumented an application that uses libc.so,
libpthread.so, or other shared libraries, you must set the
LD_LIBRARY_PATH environment variable to point to the directory
containing the instrumented shared libraries. Typically, this would be
the current directory or the directory specified by the -shlibdir
option. (You may leave LD_LIBRARY_PATH pointing to this directory while
running other, uninstrumented applications.)
The hiprof tool allows the following arguments (options) to be passed in
the -toolargs option for use by the hiprof tool's instrumentation routine
when instrumenting appl_prog.
-calltime
Causes hiprof to apply more precise, pthread-dependent profiling
process-wide. This style of profiling measures the cost of calls during
each call. By default, hiprof uses threadsafe, pthread-independent
profiling, which shows the cost of calls proportional to the number of
calls.
-cputime
Causes hiprof to use CPU time obtained from the processor cycle
counter, for non-threaded programs only. It has the same effect as
-calltime when -env threads is specified. The cycle counter will wrap,
yielding an incorrect profile, unless an instrumented procedure is
called at least every few seconds.
-dirname directory-path
Specifies the directory path in which hiprof creates the .hiout profile
files. The path specified with -dirname is pre-pended to the path and
filename specified with -hiout, if any. See Specifying Profile File
Names and Locations.
-exc procname
Excludes time spent in procname from the profile. This switch can be
used multiple times to exclude multiple procedures. To represent all
of the variations of an overloaded C++ function name, you can specify
just the part of the name up to but not including the "(".
-fastrecur
Invokes a simpler heuristic for mapping recursion into a hierarchical
report when used with the -calltime, -cputime, or -pagefaults option.
Program execution may be faster, but the profile may be less intuitive.
-fork
Indicates that a call-shared program forks. You must specify the -fork
option if libc.so is not being fully instrumented and the call-shared
program being instrumented makes a fork or vfork system call. When the
-fork option is specified, each child process produces a separate
profiling data file (or possibly several if the -threads option is also
specified) unless it makes an exec system call. A profile generated
from all of the profiling data files represents the behavior of the
parent process and its children; a profile generated from any single
profiling data file represents the single process or thread associated
with that file.
-hiout filename
Specifies a name and, optionally, a directory path for the .hiout
profile file. The filename specified overrides the default appl_prog
portion of the profile filename. Any directory path specified with
-dirname is pre-pended to filename. See Specifying Profile File Names
and Locations.
-nolog
Disables use of a trace buffer for -cputime. This is useful for
studying the performance of hiprof.
-nousr
Excludes user execution time from the profile.
-[no]pids
Includes (or omits) the process ID of the process running the
program in the name of the hiprof profile file produced by the
instrumented application.
-pagefaults
Measures page faults instead of program execution time. Works only for
nonthreaded programs.
-samples
Causes hiprof to profile CPU time in all selected code using profil(2).
The resulting profile is a statistical sampling rather than a
measurement, but it reflects the memory access delays suffered by the
program, and it is usefully accurate when the run time is more than a
few seconds (the longer the better). You can use the -asm, -heavy, and
-lines options of gprof(1) to display more finely grained profiles at
the level of source lines and machine instructions. The -gp and -A0
atom command options should be used with the -samples option.
-sigdump sig
Causes the process running the instrumented application to catch the
signal indicated by sig (see signal(4)). When it receives that signal,
the process writes the current profiling data to the output file,
reinitializes the profile by setting the execution time to zero, and
resumes execution.
-systime
Incorporates cycle counter estimates of system time into instruction
count estimates of user time when used with the -calltime option.
-threads
When used with the -calltime or -cputime options (and -env threads is
specified on the atom command line), causes hiprof to separately
profile each individual thread in the process. Otherwise, hiprof
provides process-wide profiling.
-textout
When used with the -calltime, -cputime, or -pagefaults options,
produces a text-format profile file instead of a binary profiling data
file. This file is similar to the output from gprof, although it cannot
be combined or filtered. It also contains additional statistics on the
instrumentation that has been used on appl_prog. By default, the
profile file contains binary data that the gprof utility can combine
with other profiles and filter, prior to generating a report.
When -textout is specified with -env threads, each thread is
individually profiled, as if -threads had also been specified.
-verbose
Prints the names of any procedures that were not instrumented.
While the instrumented appl_prog is being executed, options specified in
the definition of the HIPROF_ARGS environment variable override any
corresponding settings in the -toolargs options. For example:
% setenv HIPROF_ARGS "-dirname /tmp/profiles -pids"
The -dirname, -fastrecur, -hiout, -pids, -sigdump, -textout, and -threads
options can be specified in HIPROF_ARGS.
DESCRIPTION
The hiprof tool is most conveniently used by means of the hiprof(1)
command. It is an Atom-based program profiling tool that produces both flat
and hierarchical profiles. The flat profile shows the execution time spent
in any given procedure. The hierarchical profile shows the time spent in a
given procedure and all its descendants. The hierarchical profile enables
the user to answer questions of the form "How much time is spent in
printf() and all procedures called by printf()?".
The hiprof tool's output is similar to that generated by the -pg option of
the cc command. However, hiprof uses Atom, not a compiler, to instrument
the program. The gprof command is usually used to filter and merge output
files and to format profile reports.
The hiprof tool generates an instrumented version of appl_prog. The
instrumented program behaves identically to the original except that it
writes out an execution profile after it is done.
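As a rough illustration of the command form shown in the SYNOPSIS, the
following sh sketch only prints an atom command line and the default name
of the instrumented executable; it does not run atom. The function names
and the program name myprog are hypothetical and are not part of hiprof.

```shell
# Print (do not run) an atom command line for the hiprof tool.
# $1 = program, $2 = "threads" to add -env threads, $3 = -toolargs string
hiprof_atom_cmd() {
    cmd="atom $1 -tool hiprof"
    if [ "$2" = threads ]; then cmd="$cmd -env threads"; fi
    if [ -n "$3" ]; then cmd="$cmd -toolargs=\"$3\""; fi
    echo "$cmd"
}

# Default name of the instrumented executable, per the -env threads text.
# $1 = program, $2 = "threads" if -env threads was given
hiprof_instrumented_name() {
    if [ "$2" = threads ]; then
        echo "$1.hiprof.threads"
    else
        echo "$1.hiprof"
    fi
}

hiprof_atom_cmd myprog threads "-calltime -threads"
hiprof_instrumented_name myprog threads
```

Printing the command first makes it easy to check the quoting of the
-toolargs string before instrumenting a large program.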
If you are instrumenting a shared-library program, you will probably need
to set the LD_LIBRARY_PATH environment variable (see atom(1) for more
information).
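For users of sh-style shells, a minimal sketch of that setting follows. The
variable shlibdir is a hypothetical name; its value is assumed to be the
directory that holds the instrumented shared libraries (the current
directory here). In csh, the setenv command is used instead.

```shell
# Assumption: the instrumented shared libraries are in the current directory.
shlibdir=.

# Put that directory first, keeping any existing search path after it.
LD_LIBRARY_PATH=$shlibdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
```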
Multiple profile files can be created by a single program run because a
separate profile can optionally be generated for each thread of each
process.
Specifying Profile File Names and Locations
By default, the profile file is created in the current directory and its
name has the following form:
appl_prog.pid.tid.hiout
The pid (process ID) portion of the filename appears only if you specify
the -pids or -fork option. The tid (thread ID) portion appears only if you
specify both -env threads and -threads.
You can specify that the file be created in another directory by using the
-dirname option.
You can specify a different name (including a directory path) for the
appl_prog portion of the filename by using the -hiout option. For example,
the following -toolargs entry in the atom command line:
-toolargs="-hiout /test/file1"
causes the profile filename to have the form /test/file1.pid.tid.hiout
Any directory path specified with -dirname is pre-pended to the directory
path and filename specified with -hiout, if any.
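The naming rules above can be summarized in a small sh sketch. The function
hiout_name is hypothetical and is not part of hiprof; it only assembles the
expected file name so that a script can locate the profile. Joining the
-dirname path to the rest of the name with a "/" is an assumption.

```shell
# Compose the expected profile file name.
# $1 = appl_prog          $2 = -dirname value, or ""
# $3 = -hiout value or "" $4 = pid, or "" if neither -pids nor -fork is used
# $5 = tid, or "" unless both -env threads and -threads are used
hiout_name() {
    base=${3:-$1}                     # -hiout overrides the appl_prog portion
    if [ -n "$2" ]; then
        base="$2/$base"               # -dirname is prepended (assumed "/" join)
    fi
    name=$base
    if [ -n "$4" ]; then name="$name.$4"; fi
    if [ -n "$5" ]; then name="$name.$5"; fi
    echo "$name.hiout"
}

hiout_name myprog "" "" "" ""                # myprog.hiout
hiout_name myprog "" /test/file1 1234 7      # /test/file1.1234.7.hiout
hiout_name myprog /tmp/profiles "" 1234 ""   # /tmp/profiles/myprog.1234.hiout
```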
Resetting the Profile
It is sometimes useful to start profiling part way into the execution of a
program. For example, a user may wish to omit program initialization from
the profile. Also, it is sometimes useful to force the program to print
its profile even before it has finished executing. For example, a user
might wish to extract the profile of a running file server. The hiprof
tool provides a mechanism to do these things.
If you specify the -sigdump option in the atom command line or define the
-sigdump option in the HIPROF_ARGS environment variable, the specified
signal will be caught by the process. When it receives that signal, the
process writes the current profiling data to the output file, reinitializes
the profile by setting the execution time to zero, and resumes execution.
The process can be signaled any number of times during its execution.
If you do not specify the -textout option in the atom command line or
define it in the HIPROF_ARGS environment variable (that is, when you are
producing binary profile files for gprof), each signal causes the process
to overwrite any existing file.
If you do specify the -textout option (that is, when you are producing
text-format profile files), the output file will contain two sets of
profile data when the process completes execution:
· From the beginning of the program to the point at which the signal was
received
· From the point at which the signal was received to the end of the
program
For example:
setenv HIPROF_ARGS "-sigdump USR1"
application_program.hiprof &
<wait until the desired time>
kill -USR1 pid
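The dump-and-reset sequence for binary profile files (write the current
data, overwriting any existing file, reset to zero, resume) can be imitated
with an ordinary sh trap. This is only a stand-in for illustration and is
not how hiprof is implemented; the variable count plays the role of the
accumulated profile data, and the file name is hypothetical.

```shell
# Stand-in for an instrumented process; this is NOT hiprof itself.
count=0                                   # accumulated "profile data"
out=${TMPDIR:-/tmp}/sigdump_demo.$$.out   # hypothetical output file

dump_and_reset() {
    echo "$count" > "$out"   # write current data, overwriting the file
    count=0                  # reinitialize the profile
}
trap dump_and_reset USR1

count=$((count + 5))         # work done before the first signal
kill -USR1 $$                # first dump: the file holds 5
first=$(cat "$out")

count=$((count + 2))         # work done between the two signals
kill -USR1 $$                # second dump overwrites the file with 2
second=$(cat "$out")

rm -f "$out"
echo "first dump: $first, second dump: $second"
```

Because each dump overwrites the file, a dump should be copied elsewhere
before the next signal is sent if it needs to be kept.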
User Time Profiling
The hiprof tool provides three different ways of estimating user execution
time: instruction count, the cycle counter, and sampling. By default, the
hiprof tool estimates execution time by counting the number of user-level
instructions executed. However, if the -cputime option is specified during
instrumentation, CPU time is estimated using the hardware cycle counter.
This involves looking at the value of the hardware cycle counter before and
after a procedure call to determine the time spent in the procedure. The
same technique is used (with the -pagefaults option) to determine the
number of page faults that occur in each procedure. If the -samples option
is specified, profil(2) is used to sample the program counter (current
instruction pointer) about every millisecond, to yield a statistical
profile.
The advantage of instruction counts is that they are repeatable, at least
for non-threaded programs. If a program is run twice with identical inputs,
the instruction counts for both runs will be identical. The disadvantage
of instruction counts is that they do not account for memory access delays
which degrade the execution time of a real program.
The advantage of using the cycle counter is that memory access delays are
accounted for. The disadvantage is that the presence of the instrumentation
code can degrade the performance of the memory system. If an application
procedure is short (100 or so instructions), then times reported for both
the short procedure and the procedure calling the short procedure can be
unrealistically pessimistic. If a significant fraction of an application's
time is spent in a short procedure, it may be better not to instrument that
procedure at all. To exclude procedure procname from instrumentation, you
can specify the -exc procname option in the atom command line. If a
procedure is not instrumented, its run time is charged to its parent and
all calls made by the procedure appear to be made by the parent.
The advantages of sampling are:
· It reflects memory access delays for either non-threaded or threaded
programs.
· The coarse millisecond precision avoids counter-wrapping problems.
· Its use of a separate counter per instruction (not per procedure)
allows fine grain profiles of source lines and instructions to be
generated with the gprof(1) command's -asm, -heavy, or -lines option.
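The counter-wrapping problem can be estimated with simple arithmetic. The
following sh sketch assumes a 32-bit cycle counter and a hypothetical
500 MHz clock; neither value is stated on this page, so substitute the
figures for the actual processor.

```shell
# Assumptions (not taken from this page): counter width and clock rate.
counter_bits=32
clock_hz=500000000

# Whole seconds until the counter returns to its starting value.
wrap_seconds=$(( (1 << counter_bits) / clock_hz ))
echo "cycle counter wraps about every $wrap_seconds seconds"
```

Under these assumptions the counter wraps roughly every 8 seconds, which is
consistent with the requirement that an instrumented procedure be called at
least every few seconds when -cputime is used.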
System Time Profiling
By default, the hiprof tool uses instruction counts and omits system time
from its estimates of execution time. However, passing the -cputime option
in the -toolargs option to hiprof's instrumentation routine causes the
instrumentation routine to use the hardware cycle counter to measure both
user and system CPU time. If you specify the -calltime option in the
-toolargs option on the atom command line, you can specify the -systime
option (either in -toolargs or in the HIPROF_ARGS environment variable) to
incorporate cycle counter estimates of system time into instruction count
estimates of user time. You can exclude user execution time from the
profile by using the -nousr option in the -toolargs option at
instrumentation time.
Multiple Processes and Threads
When a program calls fork, an additional output file is created for the new
child process if the -fork option was specified. The child's output file
reports only the execution time used by the child process following the
fork. The parent's output file reports the execution time of the parent
process both before and after the fork. Similarly, when a threaded
application creates a new thread, a separate profile is created for that
thread if the -threads option was specified. Note that some procedures
occur both as children of other procedures and as spontaneous procedures. A
procedure with one or more parents is never listed separately in the call
graph display, even if it is sometimes invoked spontaneously.
If a process calls exec and the exec succeeds, then all execution time
statistics from the creation of the process up to the exec are lost. This
occurs because the profile statistics are lost when the exec overwrites the
address space. For the most part, this is not a problem because calls to
exec are usually immediately preceded by a fork. If the program being
invoked by the exec call is instrumented, then the execution time of the
process following the exec is reported in that new program's output file.
RESTRICTIONS
If a procedure contains interprocedural branches or interprocedural jumps,
that procedure will not be instrumented if the -calltime, -cputime, or
-pagefaults option was specified, and no information will be reported about
that procedure. Use the -verbose option to see which procedures were not
instrumented. Compilers can optimize return statements or non-returning
function calls to interprocedural branches. To avoid this, recompile with
-O0 or -no_inline.
FILES
appl_prog.hiprof
Default name for instrumented version of appl_prog
appl_prog.hiout
Default name of profile output file
SEE ALSO
atom(1), hiprof(1), gprof(1), cc(1), dxprof(1). (dxprof is available as an
option.)
Programmer's Guide