

pfm(7)

NAME

pfm - The on-chip performance counter pseudo-device

SYNOPSIS

pseudo-device pfm

DESCRIPTION

The pfm pseudo-device is the interface to Alpha implementation-specific on-chip performance counters. A set of ioctl calls forms the interface, as defined in the <sys/pfcntr.h> header file.

The kernel in use must have the pfm pseudo-device configured into it. To do this, add the following line to the kernel configuration file and rebuild the kernel:

    pseudo-device pfm

EV4 INTERFACE DESCRIPTION

The EV4 implementations (21064, 21064A, 21066, and 21068) have two counters, each of which can be independently programmed to count certain internal or external events. Each counter interrupts the system when a certain number of the selected events have been counted. Any combination of the following three actions can happen at each interrupt (tick):

- Counters (PFM_COUNTERS)
- IPL histogramming (PFM_IPL)
- User or kernel PC profiling (PFM_PROFILING)

These values are defined in <sys/pfcntr.h> and can be selected orthogonally by bitwise ORing the selections together and passing the result to the PCNTSETITEMS ioctl request.

If counters are enabled, the interrupt count for this event is incremented. This records the number of times each event has happened, in multiples of the interrupt frequency selected (PCNTSETMUX). Note that the driver can only count the interrupts generated; no direct access to the EV4 on-chip counter values is provided.

If IPL histogramming is enabled, the appropriate entry in the IPL array is incremented. The entries are:

- 0-5 refer to IPL0-IPL5.
- 6 is unused. (IPL6 is the level of the performance counter interrupts.)
- 7 counts "idle" ticks (IPL = 0 and current_thread = idle_thread).
- 8 counts user mode ticks.

If profiling is enabled, a PC sample is added to the profile histogram if the mode is correct (kernel or user).

Each CPU in a multiprocessor platform has separate counters, and the device can be opened in three different ways:

- PCNTOPENONE opens and collects data on only the CPU that the program is running on.
- PCNTOPENEACH opens all CPUs but keeps data for each one separately.
- PCNTOPENALL opens all CPUs, aggregating the data for all CPUs into one collection.

These values are defined in <sys/pfcntr.h> and are bitwise ORed into the mode passed to the device open call. Note that if PCNTOPENONE is selected, the opening thread/process must be bound to that processor; otherwise, the open will fail.
It must also remain bound to that processor for as long as the driver is in use; otherwise, the results are unpredictable.

The following ioctl calls apply to the performance counter pseudo-device. Note that most of the EV4 ioctls can also be used on EV5 and EV6:

PCNTRDISABLE
    Disables performance counter interrupts on the CPU. Takes no arguments.

PCNTRENABLE
    Enables performance counter interrupts on the CPU. Takes no arguments.

PCNTSETMUX (EV4 only)
    Selects the statistics to be counted by each performance counter and the interrupt frequency. Takes a pointer to a struct iccsr that contains the MUX register values desired. The fields in this register are:

    iccsr_pc0
        Controls the interrupt frequency of performance counter 0. If set, interrupt frequency is every 2^12 events. If clear, interrupt frequency is every 2^16 events.

    iccsr_pc1
        Controls the interrupt frequency of performance counter 1. If set, interrupt frequency is every 2^8 events. If clear, interrupt frequency is every 2^12 events.

    iccsr_mux0
        Selects the event counted by counter 0. One of: PF_ISSUES, PF_PIPEDRY, PF_LOADI, PF_PIPEFROZEN, PF_BRANCHI, PF_CYCLES, PF_PALMODE, PF_NONISSUES, PF_EXTPIN0

    iccsr_mux1
        Selects the event counted by counter 1. One of: PF_DCACHE, PF_ICACHE, PF_DUAL, PF_BRANCHMISS, PF_FPINST, PF_INTOPS, PF_STOREI, PF_EXTPIN1

    iccsr_disable
        Contains two bits, each of which disables data collection on the specified counter. For example, set to 2 to disable counter 1 and enable counter 0. Cannot be set to 3 (which disables both counters, causing PCNTSETMUX to return EINVAL).

    iccsr_ign0, iccsr_ign1, iccsr_ign2, iccsr_ign3
        Do not set these fields. Must be zero.
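The numeric rules in the PCNTSETMUX field descriptions can be restated as a small sketch. The helper names below are hypothetical and are not part of <sys/pfcntr.h>; the code only mirrors the documented event counts per interrupt and the restriction on iccsr_disable, and it does not touch the device.

```c
#include <assert.h>

/* Hypothetical helper (not in <sys/pfcntr.h>): events per interrupt for an
 * EV4 counter, following the iccsr_pc0 and iccsr_pc1 descriptions.
 * Counter 0: 2^12 if the bit is set, 2^16 if clear.
 * Counter 1: 2^8 if the bit is set, 2^12 if clear. */
unsigned long ev4_events_per_interrupt(int counter, int pc_bit_set)
{
    assert(counter == 0 || counter == 1);
    if (counter == 0)
        return pc_bit_set ? (1UL << 12) : (1UL << 16);
    return pc_bit_set ? (1UL << 8) : (1UL << 12);
}

/* Hypothetical helper: iccsr_disable holds one disable bit per counter.
 * The value 3 would disable both counters, which PCNTSETMUX rejects with
 * EINVAL. Returns 1 if the value is acceptable, 0 otherwise. */
int ev4_disable_value_ok(unsigned int disable)
{
    return disable <= 2;
}

/* Hypothetical helper: the driver records interrupts, not raw events, so an
 * event total can only be estimated as interrupts times events per
 * interrupt. */
unsigned long ev4_estimated_events(unsigned long interrupts,
                                   int counter, int pc_bit_set)
{
    return interrupts * ev4_events_per_interrupt(counter, pc_bit_set);
}
```

For example, 10 interrupts recorded on counter 1 with iccsr_pc1 set correspond to roughly 10 * 2^8 = 2560 events.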
PCNTSETITEMS
    Selects the data items to be collected at each tick:

    - Counters (PFM_COUNTERS)
    - IPL histogramming (PFM_IPL)
    - User or kernel PC profiling (PFM_PROFILING; see PCNTSETUADDR, PCNTSETURANGE, PCNTSETKADDR, and PCNTSETKRANGE)

    These values are defined in <sys/pfcntr.h> and can be selected orthogonally by bitwise ORing the selections together into the integer argument. If no items are selected, returns EINVAL.

PCNTLOGALL
    Sets the on-chip counters to count all system activity. Takes no arguments and returns no errors.

PCNTLOGSELECT
    Sets the on-chip counters to count only those threads/processes with the PCB_PME_BIT set in their PCBs, and sets the PCB_PME_BIT for this process. This bit is inherited across fork/exec, setting it for all children. Takes no arguments and returns no errors.

PCNTCLEARPCBPME
    Clears the PCB_PME_BIT in the PCB of the current process. Takes no arguments and returns no errors.

PCNTCLEARCNT
    Clears the driver's internal counters appropriate to the actions selected. If PFM_COUNTERS is enabled, the interrupt counters and cycle counter value are reset. If PFM_IPL is enabled, the IPL histogram is reset. If neither is enabled (PFM_PROFILING only), EINVAL is returned and nothing is cleared. Takes no arguments.

PCNTGETCNT (EV4 only)
    Returns the driver's counter values and the pcc value(s). Takes a pointer to an array of struct pfcntrs; the array is filled in with the values. Sample usage of this ioctl is:

        struct pfcntrs cntrs[NUM_OF_CPUS];
        struct pfcntrs *pfcntrs = cntrs;
        ioctl (fd, PCNTGETCNT, &pfcntrs);

    If the driver is opened in mode PCNTOPENEACH, the underlying array must be big enough to hold all of the data for each CPU; otherwise, EFAULT is returned. If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the array can be one element. If PFM_COUNTERS is not enabled, returns EINVAL.

PCNTGETRSIZE
    Returns the number of bytes of data available to read for getting the PC profiling samples. By default this will be equal to one fourth of the address range being profiled. (By default, profiling data is kept as one bucket per four instructions, which corresponds to a default profiling stride of 4 instructions per sample count.) If the driver is opened in mode PCNTOPENEACH, this number of bytes will be multiplied by the number of CPUs.

    To set the profiling address range and stride (and select user or kernel profiling), use the PCNTSETURANGE or PCNTSETKRANGE ioctl, respectively. To set the address range without changing the stride, you can also use the PCNTSETUADDR or PCNTSETKADDR ioctl.

    The PCNTGETRSIZE ioctl takes a pointer to a long and returns no errors. The returned value will be 0 if profiling is not currently selected or if the address range and mode have not been specified.

PCNTGETIPLHIS
    Returns the current IPL histogram(s). Takes a pointer to an array of struct pfipls; the array is filled in with the values. Sample usage of this ioctl is:

        struct pfipls ipls[NUM_OF_CPUS];
        struct pfipls *pfipls = ipls;
        ioctl (fd, PCNTGETIPLHIS, &pfipls);

    If the driver is opened in mode PCNTOPENEACH, the underlying array must be big enough to hold all of the data for each CPU. If the underlying array is not big enough, EFAULT might be returned or other data in the program might be overwritten. If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the array can be one element. If PFM_IPL is not enabled, EINVAL is returned.

PCNTSETKADDR
    Sets the kernel address range to profile and turns on kernel mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM. Note that PCNTSETKRANGE performs the same functions as PCNTSETKADDR and, in addition, lets you set the profiling stride.

PCNTSETKRANGE
    Sets the kernel address range to profile and sets the profile stride (the number of consecutive instructions grouped together for each sample count). The stride must be zero or a power of two (for example, 0, 1, 2, 4, 8). A zero stride means there should be only one counter for the whole address range. This ioctl also turns on kernel mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM.

PCNTSETUADDR
    Sets the user address range to profile and turns on user mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM. Note that PCNTSETURANGE performs the same functions as PCNTSETUADDR and, in addition, lets you set the profiling stride.

PCNTSETURANGE
    Sets the user address range to profile and sets the profile stride (the number of consecutive instructions grouped together for each sample count). The stride must be zero or a power of two (for example, 0, 1, 2, 4, 8). A zero stride means there should be only one counter for the whole address range. This ioctl also turns on user mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM.

Only one process can have the pfm device open at any point in time. If the device is opened with PCNTOPENONE, only the specified CPU is considered open; subsequent open attempts will return EBUSY. If the device is opened with PCNTOPENALL or PCNTOPENEACH, all CPUs must be available; otherwise, EBUSY is returned.

It is sufficient to open the device read-only. Opening the device will disable interrupts (PCNTRDISABLE) and log all system activity (PCNTLOGALL), generating simple counters only. The counters are not cleared. Closing the device automatically disables interrupts and resets the service routines (PCNTRDISABLE).
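The stride rule for PCNTSETKRANGE and PCNTSETURANGE and the default PCNTGETRSIZE sizing can be checked before any ioctl is issued. The following is a minimal sketch with hypothetical helper names that are not part of <sys/pfcntr.h>. It assumes the address range is measured in bytes, which the text above does not state explicitly.

```c
#include <assert.h>

/* Hypothetical helper: a profiling stride is acceptable if it is zero (one
 * counter for the whole range) or a power of two. */
int pfm_stride_ok(unsigned long stride)
{
    return stride == 0 || (stride & (stride - 1)) == 0;
}

/* Hypothetical helper: default number of bytes reported by PCNTGETRSIZE,
 * taken as one fourth of the profiled range (assumed to be in bytes) and
 * multiplied by the CPU count when the device is opened with PCNTOPENEACH. */
unsigned long pfm_default_rsize(unsigned long range_bytes,
                                unsigned int ncpus, int opened_each)
{
    unsigned long bytes = range_bytes / 4;

    assert(ncpus >= 1);
    return opened_each ? bytes * ncpus : bytes;
}
```

A caller could use pfm_default_rsize() only as a sanity check; the value returned by the PCNTGETRSIZE ioctl itself remains the authoritative size for the read buffer.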

EV4 DETAILED STAT DESCRIPTIONS

Following are more detailed descriptions of each of the events that can be counted by the two on-chip counters associated with the EV4 implementations. For more information, consult the 21064 chip specification.

Counter 0:

Issues (Total Issues Divided By 2)
    This counter is incremented by one for each cycle in which two instructions are issued and is incremented by 1/2 for each cycle in which one instruction is issued. The number of cycles in which one instruction is issued can be found by using the Dual Issues field and the equation S = (I - D) * 2, where S = Single Issues, D = Dual Issues, and I = Issues.

Pipedry
    This counter is incremented by one for each cycle in which nothing is issued due to the lack of valid instruction stream data. The causes could be instruction cache refill operations (due to normal sequential operation or delays while fetching the target of a branch) or delays caused by the draining of the pipeline in response to an exception.

Loads
    This counter is incremented for each load instruction. Note: If a load misses in the primary data cache, the replay of the instruction will cause the load counter to be incremented again.

Pipefrozen
    This counter is incremented for each cycle in which nothing is issued due to a resource conflict within the pipeline. Examples are:

    - Not all source and destination registers are available
    - A load miss or write buffer overflow occurs
    - A conditional branch cannot be issued in the cycle following a jump
    - Memory Barrier instruction processing can cause the pipe to freeze

Branches
    This counter is incremented for each branch instruction.

Cycles
    This counter is incremented for each cycle.

PALcycles
    This counter is incremented for each cycle spent in PALmode.

Nonissues (Total Non-issues Divided By 2)
    This counter is incremented by one for each cycle in which no instructions are issued and is incremented by 1/2 for each cycle in which only one instruction is issued. This counter is the complement of the Issues counter: for each cycle, Non-issues = 1 - Issues.

Victims (External Pin 0)
    This counter is incremented for each external event supplied to external pin 0. On the DEC 3000/500 and DEC 3000/400, this pin is connected to logic that indicates external cache misses with victims. A victim is a data block that must be written back to main memory before it is reused.

Counter 1:

Dcache
    This counter is incremented for each primary data cache miss. Note: this counter is actually incremented each time a primary data cache probe does not complete in one cycle. This includes all misses, but also includes hits that are stalled for other reasons, such as bus traffic holding previous misses pending.

Icache
    This counter is incremented for each primary instruction cache miss.

Dualissues
    This counter is incremented for each cycle in which two instructions are dual-issued.

Mispredicts
    This counter is incremented for each incorrectly predicted branch.

Floatops
    This counter is incremented for each floating-point operate instruction. The floating-point operate instructions do not include the floating-point load, floating-point branch, and floating-point store instructions.

Intops
    This counter is incremented for each integer operate instruction as well as for each Load Address and Load Address High instruction.

Stores
    This counter is incremented for each store instruction.

Novictims (External Pin 1)
    This counter is incremented for each external event supplied to external pin 1. On the DEC 3000/500 and DEC 3000/400, this pin is connected to logic that indicates external cache misses without victims.

Most items count the instances of different types of instructions. These counters are incremented for each occurrence, and they do not give information about the cost of executing the instruction. The Pipefrozen and Pipedry counters increment for each frozen or dry cycle, not for each instance of a pipe freeze or pipe dry.
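The relationships among the Issues, Dualissues, Nonissues, and Cycles events can be worked through in code. This is a minimal sketch with hypothetical function names; it assumes the inputs are whole-number event totals already scaled from interrupt counts, and that they come from comparable runs, since Issues, Nonissues, and Cycles all share counter 0.

```c
#include <assert.h>

/* S = (I - D) * 2: cycles in which exactly one instruction was issued,
 * derived from the Issues total (I) and the Dualissues total (D). */
unsigned long ev4_single_issue_cycles(unsigned long issues,
                                      unsigned long dual_issues)
{
    assert(issues >= dual_issues);
    return (issues - dual_issues) * 2;
}

/* Per cycle, Non-issues = 1 - Issues, so over a whole run the Nonissues
 * total equals the Cycles total minus the Issues total. */
unsigned long ev4_nonissues(unsigned long cycles, unsigned long issues)
{
    assert(cycles >= issues);
    return cycles - issues;
}
```

As a check, a run of 20 cycles with 10 dual-issue cycles, 4 single-issue cycles, and 6 empty cycles gives Issues = 10 + 4/2 = 12, so ev4_single_issue_cycles(12, 10) yields 4 and ev4_nonissues(20, 12) yields 8, which matches 6 + 4/2.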

EV5 INTERFACE DESCRIPTION

The EV5 implementations (21164, 21164A, and 21164PC) have three counters, each of which can be independently programmed to count certain internal or external events. They operate in much the same way as on EV4. Most of the EV4 ioctl calls can also be used on EV5. The EV5-specific ioctl calls are as follows:

PCNT5MUX
    Selects the events counted by all three counters. The argument is a bitwise OR of one event name for each counter. See <sys/pfcntr.h> for the identifiers for the events: PF5_MUX0_*, PF5_MUX1_*, PF5_MUX2_*.

PCNT5FREQ
    Selects the sampling interrupt frequency for all three counters. The argument is a bitwise OR of one frequency indicator for each counter. Interrupting every 256 events places an extremely heavy load on the system, so a lower frequency (more events per interrupt) is usually advisable, for example:

        PF5_C0_INT_EVERY_65536
        PF5_C1_INT_EVERY_65536
        PF5_C2_INT_EVERY_16384

PCNT5ENABLE, PCNT5RESTART
    Enables selected counters. (PCNT5RESTART zeroes them first.) The argument is the address of the pmctrs_ev5_long member of a union pmctrs_ev5, with the following additional field-member assignments:

    - pmctrs_ev5_cpu = PMCTRS_ALL_CPUS
    - pmctrs_ev5_select = any combination of PF5_SEL_COUNTER_0, PF5_SEL_COUNTER_1, and PF5_SEL_COUNTER_2 using a bitwise OR operator

PCNT5DISABLE
    Disables selected counters.

PCNT5CLEAR, PCNT5SETCNTRS
    Clears or writes selected counters on selected CPUs. The argument is the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See <sys/pfcntr.h> for more information.

PCNT5CTXTS
    Sets contexts in which to count. The argument is a bitwise OR of selected PF5_CTXT_* values.

PCNT5GETCNT
    Similar to EV4's PCNTGETCNT except that the argument is a pointer to an array of struct pfcntrs_ev5.

PCNT5GETCNTRS
    Reads the hardware counters from the selected CPU. The argument is the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See <sys/pfcntr.h> for more information.
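The load warning under PCNT5FREQ follows from simple arithmetic: a counter interrupts once per fixed number of events, so the interrupt rate is the event rate divided by that number. The sketch below uses a hypothetical helper name, and the event rate in the usage note is an assumed figure chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: interrupts per second produced by a counter that
 * interrupts once every events_per_interrupt events, given an event rate
 * in events per second. */
double pfm_interrupts_per_second(double events_per_second,
                                 unsigned long events_per_interrupt)
{
    assert(events_per_interrupt > 0);
    return events_per_second / (double)events_per_interrupt;
}
```

With an assumed rate of 65,536,000 events per second, interrupting every 65536 events gives 1000 interrupts per second, while interrupting every 256 events gives 256,000 interrupts per second.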

EV5 DETAILED STAT DESCRIPTIONS

Following are more detailed descriptions of each of the events that can be counted by the three on-chip counters associated with the EV5 implementations. For more information, see the 21164 or 21164PC chip specification.

All EV5 Implementations (EV5, EV56, PCA56)

Counter 0:

Cycles0
    This counter is incremented for each cycle. (Note that counter 2 also has a cycles counter.)

Issues
    This counter is incremented for each instruction.

Counter 1:

Nonissues
    This counter is incremented for each cycle in which valid instructions are ready for issue, but none are issued because of a pipeline stall or because the resources they need are not available.

Splitissue
    This counter is incremented for each cycle in which some but not all of the maximum of four instructions are issued.

Pipedry
    This counter is incremented for each cycle in which no instructions are ready to issue.

Replay
    This counter is incremented each time an instruction has to be executed again (instead of those behind it in the pipeline) because resources it needed were found to be unavailable the first time it executed.

Singleissues
    This counter is incremented for each cycle in which one instruction is issued.

Dualissues
    This counter is incremented for each cycle in which two instructions are issued.

Tripleissues
    This counter is incremented for each cycle in which three instructions are issued.

Quadissues
    This counter is incremented for each cycle in which four instructions are issued.

Flowchanges
    This counter is incremented for each branch, jump, or return instruction.

Intops
    This counter is incremented for each integer operation.

Floatops
    This counter is incremented for each floating-point operation.

Loads
    This counter is incremented for each load operation.

Stores
    This counter is incremented for each store operation.

Icacheacc
    This counter is incremented for each Instruction Cache access.

Dcacheacc
    This counter is incremented for each Data Cache access.

Counter 2:

Longstalls
    This counter is incremented for each long pipeline stall (over 15 cycles).

Pcmispredicts
    This counter is incremented for each PC misprediction.

Branchmispredicts
    This counter is incremented for each branch misprediction.

Icachemisses
    This counter is incremented for each instruction not found in either the Instruction Cache or the associated Refill Buffer.

Itbmisses
    This counter is incremented for each Instruction Cache miss for which the instruction's page entry is not stored in the Instruction Translation Buffer.

Dcacheldmisses
    This counter is incremented for each load of a value that is not in the Data Cache.

Dtbmisses
    This counter is incremented for each Data Cache miss for which the data page entry is not stored in the Data Translation Buffer.

Ldsmerged
    This counter is incremented for each load from an address that misses in the Data Cache but is merged with another load from the same address that is already in the Missed Address File.

Ldureplays
    This counter is incremented for each Data Cache miss (for a load) that causes the replay of a later instruction that uses the loaded value.

Fullreplays
    This counter is incremented for each store that is replayed because the Write Buffer is full and for each load that is replayed because the Missed Address File is full.

Externalinput
    This counter is incremented for each cycle for which the perf_mon_h External Input pin is true.

Cycles2
    This counter is incremented for each cycle. (Note that counter 0 also has a cycles counter.)

Memorybarriers
    This counter is incremented for each stall cycle resulting from a Memory Barrier.

Lockedloads
    This counter is incremented for each Locked Load instruction.

EV5 and EV56 Implementations Only

Counter 1:

Scacheacc
    This counter is incremented for each Secondary Cache access (for either instructions or data).

Scachereads
    This counter is incremented for each read from the Secondary Cache.

Scachewrites1
    This counter is incremented for each write to the Secondary Cache. (Note that counter 2 also has a scachewrites counter.)

Scachevictim
    This counter is incremented each time a data block in the Secondary Cache must be written back to main memory before it is reused.

Bcacheref
    This counter is incremented for each access to the optional, board-level Backup Cache.

Bcachevictim
    This counter is incremented each time a data block in the Backup Cache must be written back to main memory before it is reused.

Sysreqs
    This counter is incremented for each system request.

Counter 2:

Scachemisses
    This counter is incremented for each Secondary Cache miss.

Scachereadmisses
    This counter is incremented for each Secondary Cache Read miss.

Scachewritemisses
    This counter is incremented for each Secondary Cache Write miss.

Scachesharedwrites
    This counter is incremented for each Secondary Cache Shared Write operation.

Scachewrites2
    This counter is incremented for each Secondary Cache Write operation. (Note that counter 1 also has a scachewrites counter.)

Bcachemisses
    This counter is incremented for each miss in the optional board-level Backup Cache.

Systeminvalidates
    This counter is incremented for each System Invalidate operation.

Systemreadrequests
    This counter is incremented for each System Read Request.

PCA56 Implementation Only

Counter 1:

bcachereads
    This counter is incremented for each read request from the MBOX.

bcachedreadhits
    This counter is incremented for each Dstream read request that hits in the Bcache.

bcachedreadfills
    This counter is incremented for each Dstream read fill to the Bcache.

bcachewrites
    This counter is incremented for each write request from the MBOX.

bcachecleanwritehits
    This counter is incremented for each write that hits a clean block in the Bcache.

bcachevictims
    This counter is incremented for each VICTIM command issued by the 21164PC.

readmisstwo
    This counter is incremented each time a second READ_MISS is sent to the system while an earlier READ_MISS command is still outstanding.

Counter 2:

bcachedreads
    This counter is incremented for each Dstream read request from the MBOX.

bcachereadhits
    This counter is incremented for each read request that hits in the Bcache.

bcachereadfills
    This counter is incremented for each read fill to the Bcache.

bcachewritehits
    This counter is incremented for each write that hits in the Bcache.

bcachewritefills
    This counter is incremented for each write fill to the Bcache.

sysreadflushhits
    This counter is incremented for each system READ or FLUSH hit in the Bcache.

sysreadflushmisses
    This counter is incremented for each system READ or FLUSH request.

readmissthree
    This counter is incremented each time a third READ_MISS is sent to the system while two earlier READ_MISS commands are still outstanding.
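The Singleissues, Dualissues, Tripleissues, and Quadissues events described for counter 1 can be combined into an instruction total and a mean issue width. The following is a sketch with hypothetical function names. Because all four events are selected on the same counter, it assumes the totals were gathered in separate, comparable runs.

```c
#include <assert.h>

/* Hypothetical helper: instructions issued, reconstructed from the number
 * of cycles that issued one, two, three, or four instructions. */
unsigned long ev5_instructions_issued(unsigned long single,
                                      unsigned long dual,
                                      unsigned long triple,
                                      unsigned long quad)
{
    return single + 2 * dual + 3 * triple + 4 * quad;
}

/* Hypothetical helper: mean instructions per issuing cycle. Returns 0.0
 * when no issuing cycles were recorded. */
double ev5_mean_issue_width(unsigned long single, unsigned long dual,
                            unsigned long triple, unsigned long quad)
{
    unsigned long cycles = single + dual + triple + quad;
    double width;

    if (cycles == 0)
        return 0.0;
    width = (double)ev5_instructions_issued(single, dual, triple, quad)
            / (double)cycles;
    /* Every issuing cycle issues between one and four instructions. */
    assert(width >= 1.0 && width <= 4.0);
    return width;
}
```

The reconstructed total can be compared against the Issues event on counter 0, which counts instructions directly.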

EV6 INTERFACE DESCRIPTION

The EV6 implementation (21264) has two counters, each of which can be independently programmed to count certain internal or external events. They operate in much the same way as the counters on EV4 and EV5. Most of the EV4 ioctl calls can also be used on EV6. The EV6-specific ioctl calls are as follows:

PCNT6MUX
    Selects the events counted by the two counters. The argument is a bitwise OR of one event name for each counter. See <sys/pfcntr.h> for the identifiers for the events: PF6_MUX0_*, PF6_MUX1_*.

PCNT6ENABLE, PCNT6RESTART, PCNT6ENABWRITE
    Enables selected counters. PCNT6RESTART zeroes them first. PCNT6ENABWRITE sets them to specified values. The argument is the address of the pmctrs_ev6_long member of a union pmctrs_ev6, with the following additional field-member assignments:

    - pmctrs_ev6_cpu = PMCTRS_ALL_CPUS
    - pmctrs_ev6_select = any combination of PF6_SEL_COUNTER_0 and PF6_SEL_COUNTER_1 using a bitwise OR operator

PCNT6DISABLE
    Disables selected counters.

PCNT6CLEAR, PCNT6SETCNTRS
    Clears or writes selected counters on selected CPUs. The argument is the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See <sys/pfcntr.h> for more information.

PCNT6GETCNT
    Similar to EV4's PCNTGETCNT.

PCNT6GETCNTRS
    Reads the hardware counters from the selected CPU. The argument is the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See <sys/pfcntr.h> for more information.

EV6 DETAILED STAT DESCRIPTIONS

Following are more detailed descriptions of each of the events that can be counted by the two on-chip counters associated with the EV6 implementation. For more information, see the 21264 chip specification.

Counter 0:

cycles0
    This counter is incremented for each cycle. (Note that counter 1 also has a cycles counter.)

retinst
    This counter is incremented for every retired instruction.

Counter 1:

cycles1
    This counter is incremented for each cycle. (Note that counter 0 also has a cycles counter.)

retcondbranch
    This counter is incremented for each retired conditional branch.

retbranchmiss
    This counter is incremented for each retired branch mispredict.

retdtb1miss
    This counter is incremented for each retired single dstream translation buffer (DTB) miss.

retdtb2miss
    This counter is incremented for each retired double DTB miss.

retitbmiss
    This counter is incremented for each retired instruction translation buffer (ITB) miss.

retunaltrap
    This counter is incremented for each retired unaligned trap.

replay
    This counter is incremented for each replay trap.
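One use of the EV6 events is to select retinst on counter 0 and cycles1 on counter 1 and divide the two totals to obtain retired instructions per cycle. This derived metric is not defined by the driver; the sketch below, with a hypothetical function name, only shows the arithmetic on totals that the caller has already collected.

```c
#include <assert.h>

/* Hypothetical helper: retired instructions per cycle, from a retinst
 * total (counter 0) and a cycles1 total (counter 1) for the same run. */
double ev6_retired_per_cycle(unsigned long retired_instructions,
                             unsigned long cycles)
{
    assert(cycles > 0);
    return (double)retired_instructions / (double)cycles;
}
```

For instance, 150 retired instructions over 100 cycles gives 1.5 instructions per cycle.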

NOTES

The notes in this section pertain only to EV4 processors.

Disabling an EV4 counter does not actually stop it from interrupting the CPU. However, the interrupt will be dismissed without recording any data.

Connections of the CPU's External Input pins to external events are platform dependent. The DEC 3000/400, /500, /600, and /800 workstations have these connections; they count BCache Misses and BCache Misses with Victims.

Generating statistics on a per-process basis is only possible on 21064 Pass 3 or later processors. Attempts to do this on a Pass 2 or earlier processor will gather statistics for the entire system.

FILES

/dev/pfcntr
    The device entry (character, dev# 26/0)

/usr/include/sys/pfcntr.h
    Structure definitions

SEE ALSO

Commands: kprofile(1), uprofile(1), prof(1), sysconfig(8), autosysconfig(8)