Profiling how to

This is a quick and dirty guide to profiling. You are strongly recommended to read the profiling software documentation to get deeper insight in the commands suggested in the following.

Intel VTune Amplifier XE

This software is a nice tool for profiling and has a GUI working also on Linux. Moreover, you will not need to compile your program with profiling flags on. The only bad thing is that it is commercial software. Profiling takes two steps: 1. run your program under VTune; 2. analyze the collected data.

Assuming your program is called foobar, and further assuming that you only want to collect the hotspots, for step 1 you will launch:

amplxe-cl -collect hotspot -r /results/directory ./foobar

the results of the sampling will be put in /results/directory. If the results directory is not specified VTune will put everything in a subdirectory of the current directory. For step 2:

amplxe-cl -report hotspots -r /results/directory > report_name

this will produce a hotspots report called report_name.

gprof

gprof is free software. It works a bit differently from VTune since we need to explictly compile our code with the -pg flag. This step is needed in order to link correctly the profiling library to the executable that will be produced. Once the program is correctly compiled you just need to execute it without specifying gprof as launcher as is the case for VTune:

./foobar [--flag1 --flag2 ...] [input1 input2 ...]

your program will automatically create a file, called gmon.out containing the profiling information collected. This file will by default sit in the current working directory. After data collection you need to generate a human-readable profile summary. We run then the gprof command:

gprof options [executable-file [profile-data-files ...]] [>outfile]

this will produce a so-called flat profile from the raw data collected. A more detailed introduction to gprof can be found here: gprof.

Profiling DIRAC with Intel VTune Amplifier XE

The situation is a bit different if you want to profile DIRAC, because we usually launch the executable through a script. If you are using the wrapper.py script, you will specify as launcher:

--launcher="amplxe-cl -collect hotspot -r /results/directory"

the rest of the procedure remains the same. What if we want to profile during an MPI run? We modify the launcher as follows:

--launcher="mpirun -np 12 amplxe-cl -collect hotspots -follow-child -mrte-mode=auto -target-duration-type=medium -no-allow-multiple-runs -no-analyze-system -data-limit=100 -slow-frames-threshold=40 -fast-frames-threshold=100 -r /results/directory"

Some more words of comment (shamelessly copied from amplxe-cl -help collect):

  • follow-child, collects data on processes launched by the target process;
  • mrte-mode, selects the profiling mode;
  • target-duration-type, estimates the application duration time. This value affects the size of collected data;
  • no-allow-multiple-runs, disables multiple runs to achieve more precise results for hardware event-based collections;
  • no-analyze-system, disables analyzing all processes running on the system;
  • data-limit, limits the amount of raw data to be collected;
  • slow-frames-threshold, specifies a threshold to separate slow and good frames. It must be smaller than the threshold for fast frames;
  • fast-frames-threshold, specifies the threshold to separate good and fast frames.

Profiling DIRAC with gprof

In contrast with VTune using gprof to profile DIRAC is rather straightforward. The only thing you will need to do is to link the profiling library. Thus if using the setup script you will type:

./setup --fc=... --cc=... --cxx=... --profiling --release

we recommend to build in release mode if you want to have your collected profiling data to be really significant. Once you managed to compile the sources correctly, you just run DIRAC as you’re used to using pam or wrapper.py. The program itself will produce in your current working directory the gmon.out file and you will translate it to human-readable form with gprof as explained before.

One word of caution for MPI runs. To avoid all the processes trying to write to the same gmon.out file you should export the GMON_OUT_PREFIX environment variable:

export GMON_OUT_PREFIX=foobarmon

and pass it to mpirun:

mpirun -x GMON_OUT_PREFIX -np <np> ./foobar

In this way you will have a series of files named GMON_OUT_PREFIX.pid, post-collection analysis works exactly as for non-parallel runs. One warning from the mpirun manual: “Users are advised to set variables in the environment, and then use -x to export (not define) them.