
hiprof(5)

NAME

hiprof - Hierarchical instruction profiler

SYNOPSIS

atom appl_prog -tool hiprof [-env threads] [-toolargs="arg1 arg2..."] [atom_options...]

This interface will be retired in a future major release. See hiprof(1) for the replacement interface.

OPERANDS

appl_prog
    File name of a fully linked shared or nonshared executable to be profiled. This program should be compiled with the -g1, -g2, or -g3 option to obtain more complete profiling information. If the default symbol table level (-g0) has been used, line number information, static procedure names, and file names are unavailable to the profiler.

OPTIONS

-tool hiprof
    Identifies the hiprof tool to atom.

-env threads
    Specifies that the hiprof tool is being invoked on an application that runs in a threaded environment. To make run-time analysis of an application threadsafe, you must specify -env threads in the hiprof command. Only POSIX threads created using the pthread_create function are supported. The threadsafe instrumented executable is named appl_prog.hiprof.threads by default. You may omit the -env threads option if the application does not create threads; in this case the instrumented executable is named appl_prog.hiprof.

-toolargs="arg1 arg2 ..."
    Passes arguments (listed below, in this section) to the hiprof tool's instrumentation routines. Use whitespace characters to separate arguments from their parameters (if any) and from other arguments. If you need to represent spaces within argument parameters (such as within a parameter to the -exc argument), use matching single-quotes or matching double-quotes, making sure that you avoid having the shell interpret those characters as shell-special characters. For example:

        -toolargs="-exc 'strstreambase::strstreambase(char*, \
        int, char*)'"

        -toolargs='-exc "operator -" -exc "ostream::operator \
        <<" -exc main -exc "operator new(unsigned long)"'

atom_options
    Specifies options to the atom command. See the atom(1) reference page for descriptions of other options accepted by the atom command, such as those that enable instrumentation of shared libraries, specify the names of instrumented objects, and request debugging information.

    After you have instrumented an application that uses libc.so, libpthread.so, or other shared libraries, you must set the LD_LIBRARY_PATH environment variable to point to the directory containing the instrumented shared libraries. Typically, this would be the current directory or the directory specified by the -shlibdir option. (You may leave LD_LIBRARY_PATH pointing to this directory while running other, uninstrumented applications.)
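The quoting rule for -toolargs can be checked without running atom. The following sketch is an illustration only: it assumes a Bourne-compatible shell (not the csh used in other examples on this page), and show_args is a hypothetical helper that prints each word the shell would hand to a command.

```shell
# show_args is a hypothetical helper (not part of atom or hiprof):
# it prints each argument the shell delivers, one per line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# The outer double quotes keep the whole -toolargs value in one word.
# The inner single quotes stay in that value, so the shell does not
# interpret the space or the parentheses in the procedure name.
show_args -toolargs="-exc 'operator new(unsigned long)' -exc main"
```

The call prints a single bracketed line, which shows that the whole -toolargs value arrives as one argument with its inner quotes intact.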
The hiprof tool allows the following arguments (options) to be passed in the -toolargs option for use by the hiprof tool's instrumentation routine when instrumenting appl_prog.

-calltime
    Causes hiprof to apply more precise, pthread-dependent profiling process-wide. This style of profiling measures the cost of calls during each call. By default, hiprof uses threadsafe, pthread-independent profiling, which shows the cost of calls proportional to the number of calls.

-cputime
    Causes hiprof to use CPU time obtained from the processor cycle counter, for non-threaded programs only. It has the same effect as -calltime when -env threads is specified. The cycle counter will wrap, yielding an incorrect profile, unless an instrumented procedure is called at least every few seconds.

-dirname directory-path
    Specifies the directory path in which hiprof creates the .hiout profile files. The path specified with -dirname is pre-pended to the path and filename specified with -hiout, if any. See Specifying Profile File Names and Locations.

-exc procname
    Excludes time spent in procname from the profile. This switch can be used multiple times to exclude multiple procedures. To represent all of the variations of an overloaded C++ function name, you can specify just the part of the name up to but not including the "(".

-fastrecur
    Invokes a simpler heuristic for mapping recursion into a hierarchical report when used with the -calltime, -cputime, or -pagefaults option. Program execution may be faster, but the profile may be less intuitive.

-fork
    Indicates that a call-shared program forks. You must specify the -fork option if libc.so is not being fully instrumented and the call-shared program being instrumented makes a fork or vfork system call. When the -fork option is specified, each child process produces a separate profiling data file (or possibly several if the -threads option is also specified) unless it makes an exec system call.
    A profile generated from all of the profiling data files represents the behavior of the parent process and its children; a profile generated from any single profiling data file represents the single process or thread associated with that file.

-hiout filename
    Specifies a name and, optionally, a directory path for the .hiout profile file. The filename specified overrides the default appl_prog portion of the profile filename. Any directory path specified with -dirname is pre-pended to filename. See Specifying Profile File Names and Locations.

-nolog
    Disables use of a trace buffer for -cputime. This is useful for studying the performance of hiprof.

-nousr
    Excludes user execution time from the profile.

-[no]pids
    Includes (or omits) the process ID of the process running the program in the name of the hiprof profile file produced by the instrumented application.

-pagefaults
    Measures page faults instead of program execution time. This option works only for nonthreaded programs.

-samples
    Causes hiprof to profile CPU time in all selected code using profil(2). The resulting profile is a statistical sampling rather than a measurement, but it reflects the memory access delays suffered by the program, and it is usefully accurate when the run time is more than a few seconds (the longer the better). You can use the -asm, -heavy, and -lines options of gprof(1) to display more finely grained profiles at the level of source lines and machine instructions. The -gp and -A0 atom command options should be used with the -samples option.

-sigdump sig
    Causes the process running the instrumented application to catch the signal indicated by sig (see signal(4)). When it receives that signal, the process writes the current profiling data to the output file, reinitializes the profile by setting the execution time to zero, and resumes execution.

-systime
    Incorporates cycle counter estimates of system time into instruction count estimates of user time when used with the -calltime option.

-threads
    When used with the -calltime or -cputime options (and -env threads is specified on the atom command line), causes hiprof to separately profile each individual thread in the process. Otherwise, hiprof provides process-wide profiling.

-textout
    When used with the -calltime, -cputime, or -pagefaults options, produces a text-format profile file instead of a binary profiling data file. This file is similar to the output from gprof, although it cannot be combined or filtered. It also contains additional statistics on the instrumentation that has been used on appl_prog. By default, the profile file contains binary data that the gprof utility can combine with other profiles and filter, prior to generating a report. When -textout is specified with -env threads, each thread is individually profiled, as if -threads had also been specified.

-verbose
    Prints the names of any procedures that were not instrumented.

While the instrumented appl_prog is being executed, options specified in the definition of the HIPROF_ARGS environment variable override any corresponding settings in the -toolargs options. For example:

    % setenv HIPROF_ARGS "-dirname /tmp/profiles -pids"

The -dirname, -fastrecur, -hiout, -pids, -sigdump, -textout, and -threads options can be specified in HIPROF_ARGS.
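The setenv example above uses csh syntax. For Bourne-compatible shells, an equivalent might look like the following sketch. The option string is taken from the example above; the program name myprog.hiprof mentioned in the comment is hypothetical.

```shell
# Bourne-shell equivalent of:
#   setenv HIPROF_ARGS "-dirname /tmp/profiles -pids"
HIPROF_ARGS="-dirname /tmp/profiles -pids"
export HIPROF_ARGS

# Show the value that a child process would inherit. An instrumented
# program (for example a hypothetical ./myprog.hiprof) would need to be
# started from this same shell to see the variable.
printf '%s\n' "$HIPROF_ARGS"
```

Because the variable is read while the instrumented program runs, it can be changed between runs without instrumenting the program again.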

DESCRIPTION

The hiprof tool is most conveniently used by means of the hiprof(1) command. It is an Atom-based program profiling tool that produces both flat and hierarchical profiles. The flat profile shows the execution time spent in any given procedure. The hierarchical profile shows the time spent in a given procedure and all its descendants. The hierarchical profile enables the user to answer questions of the form "How much time is spent in printf() and all procedures called by printf()?".

The hiprof tool's output is similar to that generated by the -pg option of the cc command. However, hiprof uses Atom, not a compiler, to instrument the program. The gprof command is usually used to filter and merge output files and to format profile reports.

The hiprof tool generates an instrumented version of appl_prog. The instrumented program behaves identically to the original except that it writes out an execution profile after it is done. If you are instrumenting a shared-library program, you will probably need to set the LD_LIBRARY_PATH environment variable (see atom(1) for more information). Multiple profile files can be created by a single program run because a separate profile can optionally be generated for each thread of each process.

Specifying Profile File Names and Locations

By default, the profile file is created in the current directory and its name has the following form:

    appl_prog.pid.tid.hiout

The pid (process ID) portion of the filename appears only if you specify the -pids or -fork option. The tid (thread ID) portion appears only if you specify both -env threads and -threads.

You can specify that the file be created in another directory by using the -dirname option. You can specify a different name (including a directory path) for the appl_prog portion of the filename by using the -hiout option.
For example, the following -toolargs entry in the atom command line:

    -toolargs="-hiout /test/file1"

causes the profile filename to have the form:

    /test/file1.pid.tid.hiout

Any directory path specified with -dirname is pre-pended to the directory path and filename specified with -hiout, if any.

Resetting the Profile

It is sometimes useful to start profiling part way into the execution of a program. For example, a user may wish to omit program initialization from the profile. Also, it is sometimes useful to force the program to print its profile even before it has finished executing. For example, a user might wish to extract the profile of a running file server.

The hiprof tool provides a mechanism to do these things. If you specify the -sigdump option in the atom command line or define the -sigdump option in the HIPROF_ARGS environment variable, the specified signal will be caught by the process. When it receives that signal, the process writes the current profiling data to the output file, reinitializes the profile by setting the execution time to zero, and resumes execution.

The process can be signaled any number of times during its execution. If you do not specify the -textout option in the atom command line or define it in the HIPROF_ARGS environment variable (that is, when you are producing binary profile files for gprof), each signal causes the process to overwrite any existing file.
If you do specify the -textout option (that is, when you are producing text-format profile files), the output file will contain two sets of profile data when the process completes execution:

· From the beginning of the program to the point at which the signal was received
· From the point each signal was received to the end of the program

For example:

    setenv HIPROF_ARGS "-sigdump USR1"
    application_program.hiprof &
    <wait until the desired time>
    kill -USR1 pid

User Time Profiling

The hiprof tool provides three different ways of estimating user execution time: instruction count, the cycle counter, and sampling.

By default, the hiprof tool estimates execution time by counting the number of user-level instructions executed. However, if the -cputime option is specified during instrumentation, CPU time is estimated using the hardware cycle counter. This involves looking at the value of the hardware cycle counter before and after a procedure call to determine the time spent in the procedure. The same technique is used (with the -pagefaults option) to determine the number of page faults that occur in each procedure. If the -samples option is specified, profil(2) is used to sample the program counter (current instruction pointer) about every millisecond, to yield a statistical profile.

The advantage of instruction counts is that they are repeatable, at least for non-threaded programs. If a program is run twice with identical inputs, the instruction counts for both runs will be identical. The disadvantage of instruction counts is that they do not account for memory access delays, which degrade the execution time of a real program.

The advantage of using the cycle counter is that memory access delays are accounted for. The disadvantage is that the presence of the instrumentation code can degrade the performance of the memory system.
If an application procedure is short (100 or so instructions), then times reported for both the short procedure and the procedure calling the short procedure can be unrealistically pessimistic. If a significant fraction of an application's time is spent in a short procedure, it may be better not to instrument that procedure at all. To exclude procedure procname from instrumentation, you can specify the -exc procname option in the atom command line. If a procedure is not instrumented, its run time is charged to its parent and all calls made by the procedure appear to be made by the parent.

The advantages of sampling are:

· It reflects memory access delays for either non-threaded or threaded programs.
· The coarse millisecond precision avoids counter-wrapping problems.
· Its use of a separate counter per instruction (not per procedure) allows fine grain profiles of source lines and instructions to be generated with the gprof(1) command's -asm, -heavy, or -lines option.

System Time Profiling

By default, the hiprof tool uses instruction counts and omits system time from its estimates of execution time. However, passing the -cputime option in the -toolargs option to hiprof's instrumentation routine causes the instrumentation routine to use the hardware cycle counter to measure both user and system CPU time.

If you specify the -calltime option in the -toolargs option on the atom command line, you can specify the -systime option (either in -toolargs or in the HIPROF_ARGS environment variable) to incorporate cycle counter estimates of system time into instruction count estimates of user time.

You can exclude user execution time from the profile by using the -nousr option in the -toolargs option at instrumentation time.

Multiple Processes and Threads

When a program calls fork, an additional output file is created for the new child process if the -fork option was specified. The child's output file reports only the execution time used by the child process following the fork.
The parent's output file reports the execution time of the parent process both before and after the fork. Similarly, when a threaded application creates a new thread, a separate profile is created for that thread if the -threads option was specified.

Note that some procedures occur as both children of other procedures and as spontaneous procedures. A procedure with one or more parents is never listed separately in the call graph display, even if sometimes it is spontaneously generated.

If a process calls exec and the exec succeeds, then all execution time statistics from the creation of the process up to the exec are lost. This occurs because the profile statistics are lost when the exec overwrites the address space. For the most part, this is not a problem because calls to exec are usually immediately preceded by a fork. If the program being invoked by the exec call is instrumented, then the execution time of the process following the exec is reported in that new program's output file.
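The naming rules given under Specifying Profile File Names and Locations can be summarized in a small Bourne-shell sketch. The helper hiout_name is hypothetical and is not part of hiprof. It also assumes that a "/" joins a -dirname path to the rest of the name, which this page does not state explicitly.

```shell
# hiout_name is a hypothetical helper, not part of hiprof.
# Usage: hiout_name base [dirname] [pid] [tid]
#   base     appl_prog, or the name given with -hiout
#   dirname  value of -dirname, or "" for none
#   pid      process ID, or "" when neither -pids nor -fork is used
#   tid      thread ID, or "" unless -env threads and -threads are both used
hiout_name() {
    name=$1
    # Assumption: a "/" separates the -dirname path from the base name.
    if [ -n "${2:-}" ]; then name="$2/$name"; fi
    if [ -n "${3:-}" ]; then name="$name.$3"; fi
    if [ -n "${4:-}" ]; then name="$name.$4"; fi
    printf '%s.hiout\n' "$name"
}

hiout_name myprog                      # default name in the current directory
hiout_name /test/file1 "" 1234 5       # -hiout /test/file1, with pid and tid
hiout_name myprog /tmp/profiles 1234   # -dirname /tmp/profiles with -pids
```

The program name myprog and the numeric IDs are placeholders; real pid and tid values are chosen at run time.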

RESTRICTIONS

If a procedure contains interprocedural branches or interprocedural jumps, that procedure will not be instrumented if the -calltime, -cputime, or -pagefaults option was specified, and no information will be reported about that procedure. Use the -verbose option to see which procedures were not instrumented. Compilers can optimize return statements or non-returning function calls to interprocedural branches. To avoid this, recompile with -O0 or -no_inline.

FILES

appl_prog.hiprof
    Default name for instrumented version of appl_prog

appl_prog.hiout
    Default name of profile output file

SEE ALSO

atom(1), hiprof(1), gprof(1), cc(1), dxprof(1). (dxprof is available as an option.)

Programmer's Guide
