Profiling is a method of identifying sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, the greatest gains result from improving coding efficiency in time-intensive sections.
This chapter discusses the following topics:
    prof program
    gprof program
    pixie and hiprof Atom tools
    uprofile and kprofile tools
    monitor routines
Profiling methods include:
    The prof and gprof tools use PC sampling to produce a statistical sample showing which portions of code consume the most time. The gprof tool also produces call graphs, which show the relationship of calling and called routines.
To select an appropriate profiling method for an application, you must take into consideration the following factors:
The profiling data display tools, and their respective data collection methods, include the following:
prof
    The prof tool supports the following data collection methods:
    -p flag
        The -p flag supports the profiling of shared libraries, but requires you to at least relink the program. It collects only CPU statistics using PC sampling.
    uprofile tool
        The uprofile tool profiles user code. It does not support the profiling of shared libraries. It does not require you to relink the program and collects either CPU statistics or other information.
    kprofile tool
        The kprofile tool profiles the running operating system kernel. It does not require you to relink the program and collects either CPU statistics or other information.
prof -pixie
    The prof -pixie tool supports the following basic block counting profiling data collection method:
    pixie Atom tool (that is, the atom -tool pixie command) to instrument the program's basic blocks
        The pixie Atom tool supports the profiling of shared libraries and does not require you to relink the program. It supports the prof tool's instruction-level profiling and true cycle-count estimation.
gprof
    The gprof tool supports the following data collection methods:
    -pg flag
        The -pg flag does not allow the profiling of shared libraries. It requires that you recompile the program's sources and uses an apportioned call cost method to determine a given procedure's cost to its callers.
    hiprof Atom tool (that is, the atom -tool hiprof command) to instrument the program
        The hiprof Atom tool supports the profiling of shared libraries and does not require you to recompile or relink. To determine a given procedure's cost to its callers, it supports both the apportioned call cost method and the measured call cost method.
You can also use the monitor routines to perform PC sampling on a specified address range in a program. For more information on using monitor routines, see Section 8.13 and monitor(3).
Table 8-1 provides a concise overview of the profiling tools available in the Digital UNIX operating system.
Tool | Use |
PC-sampling/prof | Link application with -p; analyze results with prof; see prof(1) and monitor(3). |
Call-arcs/gprof | Compile and link with -pg; analyze results with gprof; see gprof(1) and monitor(3). |
pixstats | Additional postprocessor for pixified program output; see pixstats(1). |
uprofile/kprofile | Run application under uprofile or kprofile; requires pfm driver to be installed; analyze results with prof; see uprofile(1), kprofile(1), and pfm(7). |
Atom toolkit | Programmable debug/performance analysis tool. Example tools are contained in /usr/lib/cmplrs/atom/examples; see atom(1) and other Atom reference pages for programming interface. |
pixie | Atom-based basic block profiler; analyze results with prof; see pixie(5). |
hiprof | Atom-based call-arc analyzer; analyze results with gprof; see hiprof(5). |
third | Atom-based memory error/leak detection tool, Third Degree; generates text output. See third(5). |
All profiling tools work on call-shared and nonshared applications.
Statistical PC sampling is useful for diagnosing procedures with high CPU usage in a program, and it supports both threads and shared libraries.
Interface summary:
% cc -p *.o -o program    # Link with libprof1.a
% program                 # Run program to collect data
% prof program            # Process the mon.out file
The gprof tool provides procedure call information coupled with statistical PC sampling. This is useful for determining which routines are called most frequently and from where. The gprof tool also gives a flat profile of CPU usage for the routines. It supports threads and call-shared programs, but does not support shared libraries.
Using the gprof tool, you can retrieve information from libc.a and libm.a because these two libraries are compiled with the -pg flag. Other Digital-supplied libraries are not compiled with -pg, so calling information on these other system libraries is not available.
Interface summary:
% cc -pg *.c -o program    # Compile and link with -pg
% program                  # Run program to collect data
% gprof program            # Process the gmon.out file
The uprofile and kprofile tools use the performance counters on the Alpha chip. They do not collect information on shared libraries. By default, both tools collect cycles for the program. The performance data produced by these tools is processed with the prof command. See uprofile(1) and kprofile(1) for more information.
The Atom toolkit consists of a programmable instrumentation tool and several packaged tools. Examples are included in the /usr/lib/cmplrs/atom/examples directory that demonstrate how to develop instrumentation and analysis code. The instrumentation part of the tool instructs Atom on where to insert calls to analysis routines in the program. When the program is run, the analysis routines are entered and data collection is performed as prescribed by the Atom tool specified on the atom command. Atom does not work on programs built with the -om flag.
Interface summary:
% atom -tool toolname program
% program.tool
Postprocessing is tool-dependent. See Chapter 9 for details on Atom.
The Atom-based pixie is a basic block profiler that supports shared libraries and threaded applications.
Interface summary:
% atom -tool pixie [-env threads] program
% program.pixie[.threads]
% prof -pixie program
The hiprof Atom tool collects call-arc information on a program. By default, it operates like the gprof support provided by the -pg flag, but has flag-selectable options that are more powerful. The hiprof Atom tool supports shared libraries and threaded applications.
Interface summary:
% atom -tool hiprof [-env threads] program
% program.hiprof[.threads]
% gprof program program.hiout
Third Degree is a memory-leak and memory-overwrite detection tool, also based on Atom. Third Degree generates text output to a file called program.3log. The log contains the diagnostics that Third Degree detected (for example, reads of uninitialized heap or stack, memory overwrites, and memory leaks).
Interface summary:
% atom -tool third [-env threads] program
% program.third[.threads]
% cat program.3log
The examples in the remainder of this chapter refer to the sample program, profsample.c, shown in Example 8-1.
#include <math.h>
#include <stdio.h>

#define LEN 100

void mult_by_scalar(double ary[], int len, double num);
void add_vector(double arya[], double aryb[], int len);
double value;
void printit(double value);

main()
{
    double ary1[LEN];
    double ary2[LEN];
    int i;

    for (i=0; i<LEN; i++) {
        ary1[i] = 0.0;
        ary2[i] = sqrt((double)i);
    }
    mult_by_scalar(ary1, LEN, 3.14159);
    mult_by_scalar(ary2, LEN, 2.71828);
    for (i=0; i<20; i++)
        add_vector(ary1, ary2, LEN);
}

void mult_by_scalar(double ary[], int len, double num)
{
    int i;

    for (i=0; i<len; i++) {
        ary[i] *= num;
        value = ary[i];
        printit(value);
    }
}

void add_vector(double arya[], double aryb[], int len)
{
    int i;

    for (i=0; i<len; i++) {
        arya[i] += aryb[i];
        value = arya[i];
        printit(value);
    }
}

void printit(double value)
{
    printf("Value = %f\n", value);
}
To use prof to obtain PC sampling data on a program, follow these steps:
1. Compile and link the program using the -p option, as follows:
% cc -c profsample.c
% cc -p -o profsample profsample.o -lm
You must specify the -p profiling option during the link step to obtain PC sampling information. If you have an existing application, you do not need to recompile to profile the executable program; simply relink the program using the -p option with the cc command. If you are building an application for the first time, you can compile and link in the same step.
In the preceding example, the -lm option ensures that libm.{a,so} is used to resolve symbols that refer to math library functions.
You might also consider compiling with one of the optimization flags to help improve the efficiency of your code, compiling with a debug flag to provide more symbolic information for the profile report, or compiling with both types of flags.
If you are profiling a multithreaded application, use the -threads flag with the cc command. For more information on profiling multithreaded applications, see Section 8.14.
2. Run the program:
% profsample
You can run the program several times, altering the input data (if any) to create multiple profile data files.
During execution, profiling data is saved in a profile data file. The default name for the profile data file is mon.out, unless you have set the environment variable PROFDIR. For more information on using PROFDIR, see Section 8.12.1.
3. Run prof, which extracts information from one or more profile data files and produces a tabular report:
% prof profsample mon.out
Example 8-2 shows output produced by the prof command on the profsample.c program.
Profile listing generated Thu May 26 13:36:14 1994 with:
   prof profsample mon.out

--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 14% of 0.0068 seconds

%time     seconds     cum %   cum sec  procedure (file)

 42.9      0.0029      42.9      0.00  printit (profsample.c)
 42.9      0.0029      85.7      0.01  add_vector (profsample.c) [1]
 14.3      0.0010     100.0      0.01  mult_by_scalar (profsample.c)
add_vector
.
printit
and
add_vector
routines.
mult_by_scalar
is
profsample.c
Because the prof program works by periodic sampling of the program counter, you might see different output when you profile the same program multiple times. A different profiling run of the sample program produced the following output:
Profile listing generated Thu May 26 13:34:00 1994 with:
   prof -procedures profsample mon.out

--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 17% of 0.0059 seconds

%time     seconds     cum %   cum sec  procedure (file)

 66.7      0.0039      66.7      0.00  add_vector (profsample.c)
 33.3      0.0020     100.0      0.01  printit (profsample.c)
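The arithmetic behind the %time column can be sketched in C. The sketch assumes, as the "Each sample covers ... for 14% of 0.0068 seconds" line suggests, that every PC sample stands for an equal slice of run time, so a procedure's share is its sample hits divided by the total hits. The function names are hypothetical and the hit counts used in the note below (3, 3, and 1 of 7 samples in the first listing; 4 and 2 of 6 in the second) are inferred from the listings, not reported by prof.

```c
#include <stdio.h>

/* Hypothetical helper: share of run time attributed to a procedure,
 * assuming every PC sample represents an equal time slice. */
double percent_of_samples(unsigned long hits, unsigned long total_hits)
{
    if (total_hits == 0)
        return 0.0;            /* no samples were collected */
    return 100.0 * (double)hits / (double)total_hits;
}

/* Print one row in the style of the %time column. */
void print_share(const char *name, unsigned long hits,
                 unsigned long total_hits)
{
    printf("%5.1f  %s\n", percent_of_samples(hits, total_hits), name);
}
```

Under these assumptions, 3 hits out of 7 correspond to the 42.9 reported for printit in the first run, and 4 hits out of 6 correspond to the 66.7 reported for add_vector in the second. With so few samples, one sample more or less shifts a procedure's share by many percentage points, which is why the two runs differ so much.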
To determine the manner in which routines call, or are called by, other routines, use the gprof profiling tool. The gprof tool postprocesses both hiprof output and -pg output. To use this tool, follow these steps:
1. Use the hiprof Atom tool to produce an instrumented version of the program:
% atom -tool hiprof profsample
2. Run the instrumented version of profsample, and then process the resulting data file with gprof:
% profsample.hiprof
% gprof profsample profsample.hiout
During execution, profiling data is saved in the data file profsample.hiout, unless you have set the -dirname flag in the HIPROF_ARGS environment variable or on the command line.
Alternatively, you can use the following procedure to collect profiling data for the gprof tool:
1. Compile and link the program using the -pg option, as follows:
% cc -pg -c profsample.c
% cc -pg -o profsample profsample.o -lm
You must specify the -pg flag with the cc command during both the compile and link steps to obtain call graph information.
2. Run the program:
% profsample
When this method is used, profiling data is saved during execution in the data file gmon.out, unless you have set the PROFDIR environment variable. For more information on using this variable, see Section 8.12.1.
3. Run gprof, which extracts information from the data file:
% gprof profsample gmon.out
The output produced by the gprof utility comprises three sections: a call graph profile, a timing profile (similar to the listing that prof produces), and an index.
You can control gprof profiling by file by using the -no_pg flag to the cc command. When you use this flag, you disable gprof profiling for all objects that follow the flag on the command line. You cannot use the -no_pg flag with the -r and -shared flags to the ld command.
Example 8-3 shows output for gprof profiling of the sample program. The -b flag was used with gprof to suppress printing of the description of each output field. The descriptions are valuable, but they are lengthy and were left out due to space considerations. To see these descriptions, follow the steps to produce gprof output and write the output to a file or pipe the output through the more utility.
In the call graph profile section, each routine in the program has its own subsection that is contained within dashed lines and identified by the index number in the first column.
Note that for the purpose of this example output, the three sections have been separated by rows of asterisks that do not appear in the output produced by gprof. Each row of asterisks includes the name of the section. For more information on gprof flags, see the gprof(1) reference page.
*********************** call graph profile *******************

granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds

                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children

                                                     <spontaneous>
[1]    100.0    0.00        0.01                 main [1]
                0.00        0.00      20/20          add_vector [2]
                0.00        0.00       2/2           mult_by_scalar [4]
-----------------------------------------------
                0.00        0.00      20/20          main [1]             [1]
[2]     75.5    0.00        0.00      20         add_vector [2]           [2]
                0.00        0.00    2000/2200        printit [3]          [3]
-----------------------------------------------
                0.00        0.00     200/2200        mult_by_scalar [4]
                0.00        0.00    2000/2200        add_vector [2]
[3]     50.0    0.00        0.00    2200         printit [3]
-----------------------------------------------
                0.00        0.00       2/2           main [1]
[4]      4.5    0.00        0.00       2         mult_by_scalar [4]
                0.00        0.00     200/2200        printit [3]
-----------------------------------------------

*********************** timing profile section ***************

granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds

   %   cumulative    self              self    total
 time    seconds   seconds    calls  ms/call  ms/call  name
 50.0       0.00      0.00     2200     0.00     0.00  printit [3]
 30.0       0.01      0.00       20     0.15     0.37  add_vector [2]
 20.0       0.01      0.00                             main [1]
  0.0       0.01      0.00        2     0.00     0.22  mult_by_scalar [4]

*********************** index section ************************

Index by function name

   [2] add_vector        [4] mult_by_scalar
   [1] main              [3] printit
[1] This line shows the relationship of the main routine to the add_vector routine. Because main is listed above the add_vector routine in the final column of this section, main is identified as the parent of add_vector. The fraction 20/20 indicates that of the 20 times that add_vector was called (the denominator of the fraction), it was called 20 times by main (the numerator of the fraction).
[2] This line describes the add_vector routine, which is the subject of this portion of the call graph profile because it is the leftmost routine in the rightmost column of this section. The index number [2] in the first column corresponds to the index number [2] in the index section at the end of the output. The 75.5% in the second column reports the total amount of time in the sample that is accounted for by the add_vector routine and its descendant, in this case the printit routine. The 20 in the called column indicates the total number of times that the add_vector routine is called.
[3] This line shows the relationship of the printit routine to the add_vector routine. Because the printit routine is below the add_vector routine in this section, printit is identified as the child of add_vector. The fraction 2000/2200 indicates that of the total of 2200 calls to printit, 2000 of these calls came from add_vector.
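The call fractions are also the basis of the apportioned call cost method mentioned earlier in this chapter: a child's time is charged to each parent in proportion to that parent's share of the calls. The following C sketch illustrates the idea only. The function name is hypothetical, and it is not taken from the gprof sources.

```c
/* Hypothetical helper: portion of a child's total time (self plus
 * descendants) charged to one parent under apportioned call cost. */
double apportioned_cost(double child_seconds,
                        unsigned long calls_from_parent,
                        unsigned long total_calls)
{
    if (total_calls == 0)
        return 0.0;            /* child was never called */
    return child_seconds * (double)calls_from_parent / (double)total_calls;
}
```

For instance, if printit had accumulated 0.0055 seconds (an invented figure), the 2000/2200 fraction would charge 0.0050 seconds of it to add_vector and the remaining 0.0005 seconds to mult_by_scalar. The method assumes every call costs the same, which is the limitation that the measured call cost method of hiprof is meant to address.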
A basic block is a set of instructions with one entry and one exit. The pixie Atom tool provides execution counts for the basic blocks of a program. With prof, the execution counts can be viewed at the instruction level.
To obtain data for basic block counting, follow these steps:
1. Compile and link the program:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
2. Run the pixie Atom tool on the program. You do not have to specify a name for the output because pixie produces an output file by default with the same name as the original executable file, but with .pixie appended. For example, the following command causes pixie to create two files, profsample.pixie and profsample.Addrs:
% atom -tool pixie profsample
The profsample.pixie file is equivalent to profsample but contains additional code that counts the execution of each basic block. To create an output file with a name other than pname.pixie, use the -o flag followed by the name you assign to the output file. The profsample.Addrs file contains the address of each of the basic blocks. For more information, see pixie(5).
3. Execute the profsample.pixie file:
% profsample.pixie
This command generates the file profsample.Counts, which contains the basic block counts. Each time you execute the profsample.pixie file, you create a new profsample.Counts file.
4. Run prof with the -pixie flag on the profsample executable file:
% prof -pixie profsample
This command extracts information from profsample.Addrs and profsample.Counts and displays information in an easily readable format. Note that you do not need to specify the .Addrs and .Counts file suffixes because prof searches by default for files containing them.
You can also run the pixstats program on the executable file profsample to generate a detailed report on opcode frequencies, interlocks, a miniprofile, and more. For more information, see pixstats(1).
Note
The pixie profiling tool provided in the current version of the Digital UNIX operating system is the pixie Atom tool. If you use the syntax provided in earlier versions of the operating system to invoke pixie, a script transforms the call into a call to the pixie Atom tool. The previous version of the pixie tool can be found at /usr/opt/obsolete/usr/bin/pixie.
Depending on the size of the application and the type of profiling you request, prof may generate a very large amount of output. However, you are often interested only in profiling data about a particular portion of your application. The prof program provides the following flags to display information selectively by procedure:
    -only
    -exclude
    -Only
    -Exclude
    -totals
The -only option tells prof to print only profiling information for a particular procedure. You can specify the -only option multiple times on the command line. For example, the following command displays profiling information for procedures mult_by_scalar and add_vector from the sample program:
% prof -only mult_by_scalar -only add_vector profsample
The -exclude option tells prof to print profiling information for all procedures except the specified procedure. You can use multiple -exclude flags on the command line. The following command displays profiling information for all procedures except add_vector:
% prof -exclude add_vector profsample
Do not use the -only and -exclude flags on the same command line.
Many of the prof utility's profiling flags print output as percentages, for example, the percentage of total execution time attributed to a particular procedure. By default, the -only and -exclude flags cause prof to calculate percentages based on all of the procedures in the application even if they were omitted from the listing. You can change this behavior with the -Only and -Exclude flags. These flags work the same as -only and -exclude, but cause prof to calculate percentages based only on those procedures that appear in the listing. For example, the following command omits the add_vector procedure from both the listing and from percentage calculations:
% prof -Exclude add_vector profsample
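The difference between the two percentage bases can be illustrated with a short C sketch. It mimics only the arithmetic described above, not prof itself, and the function names are hypothetical.

```c
#include <stddef.h>

/* Sum the seconds of the procedures selected by keep[] (nonzero = listed);
 * pass keep == NULL to sum every procedure. */
double total_seconds(const double *secs, const int *keep, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        if (keep == NULL || keep[i])
            total += secs[i];
    return total;
}

/* Percentage for procedure i.
 * listed_only == 0 mimics -only/-exclude (base: all procedures);
 * listed_only != 0 mimics -Only/-Exclude (base: listed procedures). */
double percent_for(const double *secs, const int *keep, size_t n,
                   size_t i, int listed_only)
{
    double base = total_seconds(secs, listed_only ? keep : NULL, n);
    return base > 0.0 ? 100.0 * secs[i] / base : 0.0;
}
```

Taking the seconds column of Example 8-2 as input (0.0029 for printit, 0.0029 for add_vector, 0.0010 for mult_by_scalar) and excluding add_vector, printit accounts for roughly 43 percent on the -exclude basis but roughly 74 percent on the -Exclude basis. The results differ slightly from the listing because the seconds column is itself rounded.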
The -totals flag, used with the -procedures and -invocations listings, prints cumulative statistics for the entire object file instead of for each procedure in the object.
The -all, -incobj, and -excobj flags allow you to display profiling information for shared libraries used by the program:
    The -all flag causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.
    The -incobj flag causes the profile for the named shared library to be printed, in addition to the profile for the executable.
    The -excobj flag causes the profile for the named executable or shared library not to be printed.
The -heavy and -lines flags cause prof to display the total number of machine cycles executed by each source line in your application. Both of these flags require you to use basic block counting (the -pixie option); they do not work in PC-sampling mode.
The -heavy option prints an entry for every source line that was executed by your application. Each entry shows the total number of machine cycles executed by that line. Entries are sorted from the line with the most machine cycles to the line with the fewest. Because this option often prints a huge number of entries, you might want to use one of the -quit, -only, or -exclude flags to reduce output to a manageable size.
Example 8-4 shows output generated by the following command:
% prof -pixie -heavy -only add_vector -only mult_by_scalar \
    -only main profsample
For example, you can see in Example 8-4 that line 47 of profsample.c in the procedure add_vector() accounts for over 12 percent of the application's total execution time. The listing also shows the size in bytes of each source line.
Profile listing generated Fri May 27 14:09:10 1994 with:
   prof -pixie -heavy -only add_vector -only mult_by_scalar -only main profsample

------------------------------------------------------------------
*  -h[eavy] using basic-block counts;                            *
*  sorted in descending order by the number of cycles executed   *
*  in each line; unexecuted lines are excluded                   *
------------------------------------------------------------------

procedure (file)                  line  bytes  cycles      %   cum %

add_vector (profsample.c)           48     44   22000  23.26   23.26
add_vector (profsample.c)           46     40   20000  21.15   44.41
add_vector (profsample.c)           47     24   12000  12.69   57.10
mult_by_scalar (profsample.c)       36     44    2200   2.33   59.43
main (profsample.c)                 20     60    1500   1.59   61.02
mult_by_scalar (profsample.c)       34     28    1400   1.48   62.50
mult_by_scalar (profsample.c)       35     24    1200   1.27   63.77
main (profsample.c)                 19     12     300   0.32   64.08
main (profsample.c)                 25     48     240   0.25   64.34
add_vector (profsample.c)           41     28     140   0.15   64.48
add_vector (profsample.c)           44     12      60   0.06   64.55
add_vector (profsample.c)           50     12      60   0.06   64.61
mult_by_scalar (profsample.c)       29     28      14   0.01   64.63
main (profsample.c)                 23     32       8   0.01   64.63
main (profsample.c)                 22     32       8   0.01   64.64
mult_by_scalar (profsample.c)       38     12       6   0.01   64.65
mult_by_scalar (profsample.c)       32     12       6   0.01   64.66
main (profsample.c)                 26     16       4   0.00   64.66
main (profsample.c)                 13     16       4   0.00   64.66
main (profsample.c)                 18      8       2   0.00   64.67
main (profsample.c)                 24      8       2   0.00   64.67
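The cycles column in Example 8-4 is consistent with a simple model: each Alpha instruction occupies 4 bytes, and each executed instruction is counted as one cycle, so a line's cycle count is its instruction count multiplied by the number of times its basic block ran. The C sketch below encodes that model. The one-cycle-per-instruction assumption and the function names are illustrative, and prof may not compute every entry this way.

```c
#define ALPHA_INSTRUCTION_BYTES 4   /* fixed-width Alpha instructions */

/* Estimated cycles for a source line: instructions times executions,
 * assuming one cycle per instruction (a simplification). */
unsigned long estimated_cycles(unsigned long line_bytes,
                               unsigned long executions)
{
    unsigned long instructions = line_bytes / ALPHA_INSTRUCTION_BYTES;
    return instructions * executions;
}

/* Share of a grand total, as shown in the % column. */
double percent_of_total(unsigned long cycles, unsigned long total_cycles)
{
    return total_cycles ? 100.0 * (double)cycles / (double)total_cycles : 0.0;
}
```

As a check against the listing, the body of the add_vector loop runs 2000 times (20 calls of 100 iterations each), so the 44-byte line 48 yields 11 instructions times 2000 executions, or 22000 cycles, and the 24-byte line 47 yields 12000 cycles.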
The -lines option is similar to -heavy, but it sorts the output differently. This option prints the lines for each procedure in the order that they occur in the source file. Even lines that never executed are printed. The procedures themselves are sorted from those procedures that execute the most machine cycles to those that execute the fewest.
Example 8-5 shows the same information as Example 8-4, but in a different format as generated by the following command:
% prof -pixie -lines -only add_vector -only mult_by_scalar \
    -only main profsample
Profile listing generated Fri May 27 14:07:28 1994 with:
   prof -pixie -lines -only add_vector -only mult_by_scalar -only main profsample

------------------------------------------------------------------
*  -l[ines] using basic-block counts;                            *
*  grouped by procedure, sorted by cycles executed per procedure;*
*  '?' means that line number information is not available.      *
------------------------------------------------------------------

procedure (file)                  line  bytes  cycles      %   cum %

add_vector (profsample.c)           41     28     140   0.15    0.15
                                    44     12      60   0.06    0.21
                                    46     40   20000  21.15   21.36
                                    47     24   12000  12.69   34.05
                                    48     44   22000  23.26   57.32
                                    50     12      60   0.06   57.38
mult_by_scalar (profsample.c)       29     28      14   0.01   57.39
                                    32     12       6   0.01   57.40
                                    34     28    1400   1.48   58.88
                                    35     24    1200   1.27   60.15
                                    36     44    2200   2.33   62.48
                                    38     12       6   0.01   62.48
main (profsample.c)                 13     16       4   0.00   62.49
                                    18      8       2   0.00   62.49
                                    19     12     300   0.32   62.81
                                    20     60    1500   1.59   64.39
                                    22     32       8   0.01   64.40
                                    23     32       8   0.01   64.41
                                    24      8       2   0.00   64.41
                                    25     48     240   0.25   64.66
                                    26     16       4   0.00   64.67
The -quit option reduces the amount of profiling output displayed. The -quit option affects the output from the -procedures, -heavy, and -lines profiling modes.
The -quit option has three forms:
    -quit n
        The n refers to an integer. All lines after the nth line are truncated.
    -quit n%
        The n is an integer followed by a percent sign (%). All lines after the line containing n% calls in the %calls column of the display are truncated.
    -quit ncum%
        The ncum% refers to an integer n followed by the characters cum (for cumulative) and a percent sign (%). All lines after the line containing ncum% calls in the cum% column of the display are truncated.
If you specify several modes on the same command line, the -quit option affects the output from each mode. For example, the -quit option in the following command reduces the output from both the -procedures and -heavy modes:
% prof -pixie -procedures -heavy -quit 20 profsample
This command prints only the 20 most time-consuming procedures and the 20 most time-consuming source lines. The -quit n option has no effect on the -lines profiling mode.
The -quit n% option restricts the output to those entries that account for at least n% of the total. Depending on the profiling mode, the total can refer to the total amount of time, the total number of machine cycles, or the total number of invocation counts. For example, the following command prints only those source lines that account for at least 2 percent of the application's total number of machine cycles:
% prof -pixie -lines -quit 2% profsample
The -quit ncum% option truncates the output after n% of the total has been accounted for. The definition of total depends on the profiling mode, as described in the preceding paragraph. For example, the following command prints the most heavily used source lines and stops after 30 percent of the application's total number of machine cycles have been accounted for:
% prof -pixie -heavy -quit 30cum% profsample
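The three forms of -quit can be summarized as three cut-off rules applied to a listing that is already sorted in descending order. The C sketch below is an approximation written for illustration: the function names are hypothetical, and the exact boundary behavior (for example, whether the row that crosses the cumulative threshold is printed) is an assumption rather than documented prof behavior.

```c
#include <stddef.h>

/* -quit n: keep at most the first n rows. */
size_t rows_quit_count(size_t nrows, size_t n)
{
    return nrows < n ? nrows : n;
}

/* -quit n%: keep leading rows whose own percentage is at least n.
 * pct[] must be sorted in descending order. */
size_t rows_quit_percent(const double *pct, size_t nrows, double n)
{
    size_t kept = 0;
    while (kept < nrows && pct[kept] >= n)
        kept++;
    return kept;
}

/* -quit ncum%: keep rows until the running total reaches n percent;
 * the row that crosses the threshold is assumed to be included. */
size_t rows_quit_cumulative(const double *pct, size_t nrows, double n)
{
    double cum = 0.0;
    size_t kept = 0;
    while (kept < nrows && cum < n) {
        cum += pct[kept];
        kept++;
    }
    return kept;
}
```

Applied to the % column of Example 8-4 (23.26, 21.15, 12.69, 2.33, 1.59, ...), a 2% threshold would keep the first four rows, and a 30cum% threshold would keep the first two, because the cumulative total passes 30 percent at 44.41.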
A single run of a program may not produce the desired results. You can repeatedly run the version of the program created by pixie, varying the input with each run, and then use the resulting .Counts files to produce a consolidated report. For example:
1. Compile and link the program. Do not use the -p option when linking to produce an executable file for pixie:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
2. Run the program through pixie, as follows:
% atom -tool pixie -toolargs=-pids profsample
This command produces the profsample.Addrs file to be used in step 4, as well as the modified program profsample.pixie.
3. To produce uniquely named .Counts files, set the PIXIE_ARGS environment variable to "-pids", and run the executable program produced by pixie. For example:
% profsample.pixie
The -pids option specified with the atom -tool pixie command in step 2 appends the process ID of the process running the executable program to the name of the profsample.Counts file, for example, profsample.Counts.1753. Each time you run the program, a new profsample.Counts.<pid> file is created.
4. Run prof to create the report as follows:
% prof -pixie profsample profsample.Addrs profsample.Counts.*
If you had run profsample.pixie three times, the prof utility would have averaged the basic block data in the three files generated by the executable (profsample.Counts.<pid1>, profsample.Counts.<pid2>, and profsample.Counts.<pid3>) to produce the profile report.
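Averaging over several runs amounts to an element-wise mean of the per-block counters. The sketch below shows that arithmetic in C on in-memory arrays. It does not read real .Counts files, whose on-disk format is not described here, and the function name is hypothetical.

```c
#include <stddef.h>

/* Element-wise mean of basic block counts from several runs.
 * runs[r][b] is the count for block b in run r; results go to avg[b]. */
void average_counts(const unsigned long *const *runs, size_t nruns,
                    size_t nblocks, double *avg)
{
    for (size_t b = 0; b < nblocks; b++) {
        double sum = 0.0;
        for (size_t r = 0; r < nruns; r++)
            sum += (double)runs[r][b];
        avg[b] = nruns ? sum / (double)nruns : 0.0;
    }
}
```

A block that executes 100, 200, and 300 times in three runs is reported with an average of 200, so a block that is exercised by only one of the inputs still shows up in the consolidated report with a nonzero count.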
When you are writing a test suite for an application, you might want to know how effectively your suite tests the application. The prof utility provides two flags that can help you determine this. The -zero option prints the names of procedures that were never executed by your application. The -testcoverage option lists all of the source lines that were never executed by your application. Both of these flags require basic block counting.
Typically, you would perform the following steps to make use of these flags:
1. Run the pixie Atom tool on your application.
2. Run the test suite against the instrumented application to produce .Counts files.
3. Run prof with the -zero or -testcoverage flags and specify all of the .Counts files produced when you ran the test suite.
If the application you are profiling is fairly complicated, you may want to run it several times with different inputs to get an accurate picture of its profile. If you are using PC sampling, each run of your application produces a new mon.out file, or a pid.progname file if you have set the PROFDIR environment variable. If you are using basic block counting, each run produces a new .Counts file.
You have two ways of displaying profiling information that is based on an average of all of these output files.
The first way is to specify the names of each profiling data file explicitly on the command line. For example, the following command prints profiling information from two profile data files:
% prof -procedures profsample 1510.profsample 1522.profsample
Keeping track of many different profiling data files, however, can be difficult. Therefore, prof provides the -merge option to combine several data files into a single merged file. When prof operates in -pixie mode, the -merge flag combines the .Counts files. When prof operates in PC-sampling mode, this flag combines the mon.out or other profile data files.
The following example combines two profile data files into a single data file named total.out:
% prof -merge total.out profsample 1773.profsample \
    1777.profsample
At a later time, you can then display profiling data using the combined file, just as you would use a normal mon.out file. For example:
% prof -procedures profsample total.out
The merge process is similar for -pixie mode. You must specify the executable file's name, the .Addrs file, and each .Counts file:
% prof -pixie -merge total.Counts a.out a.out.Addrs \
    a.out.Counts.1866 a.out.Counts.1868
Feedback files are useful in identifying portions of a large executable program in which significant percentages of the execution occur. Without feedback, the compiler must make assumptions about call frequency based on nesting levels. These assumptions are almost never as good as actual data from a sample run.
The following sections describe how to use feedback files by using the cc command and the atom -tool pixie and prof commands.
Follow these steps to generate feedback information that can be used to optimize subsequent compilations:
1. Compile and link the program:
% cc -O2 -o profsample profsample.c -lm
2. Run the pixie Atom tool on the executable file:
% atom -tool pixie -toolargs=-o profsample.pixie profsample
This step creates an output executable file named profsample.pixie and a prof input file named profsample.Addrs.
3. Run the instrumented program:
% profsample.pixie
This step creates a file named profsample.Counts, which contains execution statistics.
4. Use prof to create a feedback file from the execution statistics:
% prof -pixie -feedback profsample.feedback profsample
5. Recompile the program with the feedback file. Specify the -O2 or -O3 optimization level when you use the -feedback option with the cc command, as shown in the following example:
% cc -O3 -feedback profsample.feedback -o \
    profsample profsample.c -lm
The feedback file provides the compiler with actual execution information that can be used to improve certain optimizations, such as inlining function calls.
Use a feedback file generated from a -O2 compilation for any subsequent compilations with -O2 or -O3 flags.
You can also use a feedback file as input to the
cord
utility.
The
cord
utility orders the procedures in an executable program
to improve execution time.
The following example shows how to use the
-cord
option as part of a compilation command with a feedback file as input:
%
cc -O2 -cord -feedback profsample.feedback \
-o profsample profsample.c -lm
Use a feedback file generated with the same optimization level as the level you use in subsequent compilations.
You can also use
cord
with the
runcord
utility. For more information, see
runcord
(1).
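The text does not say how cord chooses an order. As a rough illustration only, the following C sketch assumes that the feedback data has been reduced to a call count per procedure and places the most frequently called procedures first. The names struct proc_count and order_by_count are hypothetical, and the real utility may use different criteria.

```c
#include <assert.h>
#include <stdlib.h>

struct proc_count {
    const char *name;      /* procedure name */
    unsigned long calls;   /* call count taken from a sample run */
};

/* qsort() comparison: larger call counts sort first. */
static int by_calls_desc(const void *a, const void *b)
{
    const struct proc_count *pa = a;
    const struct proc_count *pb = b;

    if (pa->calls < pb->calls)
        return 1;
    if (pa->calls > pb->calls)
        return -1;
    return 0;
}

/* Hypothetical helper: reorder procs so that hot procedures come first. */
void order_by_count(struct proc_count *procs, size_t n)
{
    assert(procs != NULL || n == 0);
    if (n > 1)
        qsort(procs, n, sizeof procs[0], by_calls_desc);
}
```

Grouping frequently executed procedures together is one plausible way to improve locality, which is why actual counts from a sample run are more useful here than static guesses.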
By default, the
-p
and
-pg
flags to the
cc
command provide the following:
monitor
utilities, as described in
Section 8.13
and
monitor
(3).
mon.out
(for
-p
)
or
gmon.out
(for
-pg
)
placed in the current directory.
The
-p
flag
supports the profiling of shared libraries.
The
-pg
flag
and
uprofile
tool support the profiling of only the part of a program that is
in the executable. When using these tools to generate profiling
information for library routines, link your object file with the
-non_shared
flag to the
cc
command.
You can use the following environment variables to control profiling behavior:
PROFDIR
PROFFLAGS
By using these variables, you can disable aspects of default profiling behavior, including:
You can use the
PROFFLAGS
and
PROFDIR
environment variables together.
Note that these environment variables have no effect on the
prof
and
gprof
post-processors; they affect the profiling behavior of a program during
its execution. These environment variables have no effect when you use
the
pixie
Atom tool.
By default, profiling data is collected in a data file named
[g]mon.out
.
When you do multiple profiling runs, each run overwrites the existing
[g]mon.out
file. Use the
PROFDIR
environment variable when you want to collect PC sampling data in
files with unique names. Set this environment variable as follows:
C shell:
setenv PROFDIR path
Bourne shell:
PROFDIR=path; export PROFDIR
The results are saved in the file
path/pid.progname
,
which resolves as follows:
path
is the directory named by PROFDIR, which must identify an existing directory.
pid
is the process ID of the program.
progname
is the program name.
When you set
PROFDIR
to a null string, no profiling occurs.
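The naming rules above can be summarized in a few lines of C. This is a minimal sketch, not the profiling library's code: profile_file_name and its parameters are hypothetical, and the caller is assumed to pass the result of getenv("PROFDIR") as profdir and either mon.out or gmon.out as default_name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that mirrors the rules described in the text:
 *   profdir == NULL : variable not set, use the default file name
 *   profdir == ""   : null string, no profiling (returns -1)
 *   otherwise       : path/pid.progname
 * Returns 0 on success, or -1 if no file should be written or if
 * buf is too small to hold the name. */
int profile_file_name(char *buf, size_t size, const char *profdir,
                      long pid, const char *progname,
                      const char *default_name)
{
    int n;

    assert(buf != NULL && progname != NULL && default_name != NULL);
    if (profdir == NULL)
        n = snprintf(buf, size, "%s", default_name);
    else if (strlen(profdir) == 0)
        return -1;
    else
        n = snprintf(buf, size, "%s/%ld.%s", profdir, pid, progname);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, with profdir set to /tmp/prof, a pid of 1866, and a progname of profsample, the sketch produces /tmp/prof/1866.profsample.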
By default, the profiling library
libprof1.a
(or
libprof1_r.a
,
for multithreaded programs) allocates one buffer per process
to record your profiling data, as well as placing the data output
file in your current directory.
To disable this default behavior, set the
PROFFLAGS
environment variable as follows:
C shell:
setenv PROFFLAGS "-disable_default"
Bourne shell:
PROFFLAGS="-disable_default"; export PROFFLAGS
When you have set
PROFFLAGS
to
-disable_default
,
the default profiling support is disabled, allowing you to use the
monitor
calls to profile specific sections of your program for both
nonthreaded and multithreaded programs. See
monitor
(3)
and
Section 8.13
for more information on using the
monitor
,
monstartup
,
and
moncontrol
routines.
For multithreaded programs, you can allocate one buffer per thread by
setting the
PROFFLAGS
environment variable as follows:
C shell:
setenv PROFFLAGS "-threads"
Bourne shell:
PROFFLAGS="-threads"; export PROFFLAGS
When you have set
PROFFLAGS
to
-threads
,
a separate file is produced for each thread and is named
pid.sid.progname
,
which is resolved as follows:
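pid
is the process ID of the program,
sid
corresponds to the order in which the thread was created, and
progname
is the program name.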
You can use the
-threads
and
-disable_default
flags together to control profiling of your program when you
use the
monitor
routines.
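Because both flags can appear in one PROFFLAGS value, a program that inspects the variable itself has to look at each word in turn. The following C sketch shows one way to do that. It is an illustration under stated assumptions, not the profiling library's parser: struct prof_options and parse_profflags are hypothetical names, flags are assumed to be separated by spaces or tabs, and any other flags are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct prof_options {
    int disable_default;   /* nonzero if -disable_default was given */
    int per_thread;        /* nonzero if -threads was given */
};

/* Hypothetical parser: scan a PROFFLAGS-style string for the two flags
 * discussed in the text. A NULL string is treated as empty. */
void parse_profflags(const char *flags, struct prof_options *opts)
{
    const char *p = (flags != NULL) ? flags : "";

    assert(opts != NULL);
    opts->disable_default = 0;
    opts->per_thread = 0;
    while (*p != '\0') {
        size_t len;

        p += strspn(p, " \t");       /* skip separators */
        len = strcspn(p, " \t");     /* length of the next word */
        if (len == strlen("-disable_default") &&
            strncmp(p, "-disable_default", len) == 0)
            opts->disable_default = 1;
        else if (len == strlen("-threads") &&
                 strncmp(p, "-threads", len) == 0)
            opts->per_thread = 1;
        p += len;
    }
}
```

A caller would typically pass getenv("PROFFLAGS") as the first argument.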
You can also set the
PROFFLAGS
environment variable to include or exclude profiling information:
setenv PROFFLAGS "-all"
setenv PROFFLAGS "-incobj lib_name"
Causes the profile for the named shared library to be printed, in addition to the profile for the executable.
setenv PROFFLAGS "-excobj lib_name"
Causes the profile for the named executable or shared library not to be printed.
The default profiling behavior on Digital UNIX systems is to profile
the entire text segment of your program and place the profiling data in
mon.out
for
prof
profiling or in
gmon.out
for
gprof
profiling. For large programs, you might not need to profile the
entire text segment. The
monitor
routines provide the ability to profile portions of your program
specified by the lower and upper address boundaries of a function
address range.
The
monitor
routines are:
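monitor()
Profiles a specified address range of the program. This routine is not supported for gprof profiling.
monstartup()
Similar to monitor, except it specifies address range only and is supported for gprof profiling.
moncontrol()
Used with monitor and monstartup to turn PC sampling on or off during program execution for a specific process or thread.
monitor_signal()
Used to profile programs that do not terminate.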
You can use
monitor
and
monstartup
to profile an address range in each shared library as well as in
the static executable.
For more information on these functions, see
monitor
(3).
By default, profiling begins as soon as your program starts to execute.
You can set the
PROFFLAGS
environment variable to
-disable_default
to prevent profiling from beginning when your program executes.
Then, you can use the
monitor
routines to begin profiling
after the first call to
monitor
or
monstartup.
You can disable the default naming of the profiling data file by
using the
PROFDIR
environment variable. For more information on using this environment
variable, see
Section 8.12.1.
Example 8-6
demonstrates how to use the
monstartup
and
monitor
routines within a program to begin and end profiling.
/* Profile the domath() routine using monstartup.
 * This example allocates a buffer for the entire program.
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */

#include <stdio.h>
#include <math.h>
#include <sys/syslimits.h>

char dir[PATH_MAX];

extern void __start();
extern unsigned long _etext;

void domath(void);

int main(void)
{
    int i;
    int a = 1;

    /* Start profiling between __start (beginning of text)
     * and _etext (end of text). The profiling library
     * routines will allocate the buffer.
     */
    monstartup(__start, &_etext);

    for (i = 0; i < 10; i++)
        domath();

    /* Stop profiling and write the profiling output file. */
    monitor(0);
    return 0;
}

void domath(void)
{
    int i;
    double d1, d2;

    d2 = 3.1415;
    for (i = 0; i < 1000000; i++)
        d1 = sqrt(d2) * sqrt(d2);
}
The external name
_etext
lies just above all the program text. See
end
(3)
for more information.
When you set the
PROFFLAGS
environment variable to
-disable_default
,
you disable default profiling buffer support.
You can allocate buffers within your program, as shown in
Example 8-7.
/* Profile the domath routine using monitor().
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */

#include <math.h>
#include <sys/types.h>
#include <sys/syslimits.h>

extern char *calloc();

void domath(void);
void nextproc(void);

#define INST_SIZE 4 /* Instruction size on Alpha */
char dir[PATH_MAX];

int main(void)
{
    int i;
    char *buffer;
    size_t bufsize;

    /* Allocate one counter for each instruction to
     * be sampled. Each counter is an unsigned short.
     */
    bufsize = (((char *)nextproc - (char *)domath) / INST_SIZE)
              * sizeof(unsigned short);

    /* Use calloc() to ensure that the buffer is clean
     * before sampling begins.
     */
    buffer = calloc(bufsize, 1);

    /* Start sampling. */
    monitor(domath, nextproc, buffer, bufsize, 0);
    for (i = 0; i < 10; i++)
        domath();

    /* Stop sampling and write out profiling buffer. */
    monitor(0);
    return 0;
}

void domath(void)
{
    int i;
    double d1, d2;

    d2 = 3.1415;
    for (i = 0; i < 1000000; i++)
        d1 = sqrt(d2) * sqrt(d2);
}

void nextproc(void) {}
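The buffer arithmetic in this example can be checked separately from the profiling library. The following C sketch restates it with plain integers standing in for addresses. The helper names are hypothetical, and the mapping from a sampled PC to a counter index is an assumption that follows from the one-counter-per-instruction layout described in the example's comments.

```c
#include <assert.h>
#include <stddef.h>

#define INST_SIZE 4   /* instruction size in bytes, as in the example */

/* Bytes needed for one unsigned short counter per instruction
 * in the address range [lowpc, highpc). */
size_t sample_buffer_size(size_t lowpc, size_t highpc)
{
    assert(highpc >= lowpc);
    return ((highpc - lowpc) / INST_SIZE) * sizeof(unsigned short);
}

/* Index of the counter that a sampled PC would map to, assuming
 * counters are laid out in address order starting at lowpc. */
size_t counter_index(size_t lowpc, size_t pc)
{
    assert(pc >= lowpc);
    return (pc - lowpc) / INST_SIZE;
}
```

A range of 1024 bytes therefore needs 256 counters, and a PC that lies 16 bytes past the start of the range falls in the counter at index 4.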
You use the
monitor_signal(
)
routine to profile programs that do not terminate. Declare this
routine as a signal handler in your program and build the program
for
prof
or
gprof
profiling. While the program is executing, send a signal from the
shell by using the
kill
command.
When the signal is received,
monitor_signal
is invoked and writes profiling data to the data file. If the program
receives another signal, the data file is overwritten.
Example 8-8
illustrates how to use the
monitor_signal
routine.
/* From the shell, start up the program in background.
 * Send a signal to the process, for example: kill -30 <pid>
 * Process the [g]mon.out file normally using gprof or prof
 */

#include <math.h>
#include <signal.h>

extern int monitor_signal();

int main(void)
{
    int i;
    double d1, d2;

    /*
     * Declare monitor_signal() as signal handler for SIGUSR1
     */
    signal(SIGUSR1, monitor_signal);
    d2 = 3.1415;
    /*
     * Loop infinitely (absurd example of non-terminating process)
     */
    for (;;)
        d1 = sqrt(d2) * sqrt(d2);
}
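The monitor_signal routine belongs to the profiling library, so the signal mechanics are easier to examine with a stand-in handler. The following C sketch installs a handler that only counts the requests it receives, where the real routine would write the data file. All names are hypothetical, and sigaction() is used instead of signal() as a portability choice, so that the handler stays installed after the first signal.

```c
/* Request POSIX declarations such as sigaction() under strict C modes. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of dump requests received so far. */
static volatile sig_atomic_t dump_requests = 0;

/* Stand-in for monitor_signal(): records that a dump was requested. */
static void on_dump_signal(int signo)
{
    (void)signo;
    dump_requests++;
}

/* Install the stand-in handler; returns 0 on success, -1 on failure. */
int install_dump_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_dump_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Requests seen so far; each one would overwrite the data file. */
int dump_request_count(void)
{
    return (int)dump_requests;
}
```

Each signal sent to the process increments the count once, which corresponds to one rewrite of the data file in the real routine.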
Profiling multithreaded applications is essentially the same as
profiling non-threaded applications. However, to profile
multithreaded applications, you must compile your program with the
-pthread
or
-threads
flag to the
cc
command. Specifying one of these flags and either the
-p
or
-pg
flag enables the thread profiling library,
libprof1_r.a
.
The default case for profiling multithreaded applications is to
provide one sampling buffer for all threads.
In this case, you get sampling across the entire process and you
get one output file comprising sampling data from all threads.
Depending on whether you use the
-p
or
-pg
flag,
your output file will be named
mon.out
or
gmon.out
,
respectively.
To get a separate buffer and a separate output file for
each thread in your program, use the environment variable
PROFFLAGS
.
Set
PROFFLAGS
to
-threads
,
as shown in the following example:
setenv PROFFLAGS "-threads"
The profiling data file will be named according to the following convention:
pid.sid.progname
In this naming convention,
pid
is the process ID of the program,
sid
corresponds to the order in which the thread was created, and
progname
is your program name.
If the application controls profiling by using the
monitor
routines,
sid
corresponds to the order in which profiling was started for
the thread.
If you use the
monitor(
)
or
monstartup(
)
calls in a threaded
program, you must first set
PROFFLAGS
to
"-disable_default
-threads"
,
giving you complete control of profiling the application.
If the application uses
monitor(
)
and allocates separate buffers for
each thread profiled, you must first set
PROFFLAGS
to
"-disable_default
-threads"
,
because this setting affects the file naming conventions that are used.
Without the
-threads
flag, the buffer and address range used as a result of the first
monitor
or
monstartup
call would be applied to every thread that subsequently requests
profiling. In this case, a single data file that covers all threads
being profiled would be created.
Each thread in a process must call the
monitor(
)
or
monstartup(
)
routines to initiate profiling for itself.