Profiling how-to
This is a quick and dirty guide to profiling. We strongly recommend reading the documentation of the profiling software for deeper insight into the commands suggested below.
Intel VTune Amplifier XE
This software is a nice profiling tool with a GUI that also works on Linux. Moreover, you do not need to compile your program with profiling flags turned on. The only drawback is that it is commercial software. Profiling takes two steps: 1. run your program under VTune; 2. analyze the collected data.
Assuming your program is called
foobar, and further assuming that you only want to
collect the hotspots, for step 1 you will launch:
amplxe-cl -collect hotspots -r /results/directory ./foobar
the results of the sampling will be put in
/results/directory. If the
results directory is not specified VTune will put everything in a subdirectory
of the current directory.
For step 2:
amplxe-cl -report hotspots -r /results/directory > report_name
this will produce a hotspots report called report_name.
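The two steps above can be chained in a small shell function. This is only a sketch: the variable names VTUNE, RESULTS_DIR and REPORT, the function name profile_hotspots and the default directory ./vtune_results are assumptions made for this example, not part of VTune.

```shell
#!/bin/sh
# Sketch: run both VTune steps for one program.
# VTUNE, RESULTS_DIR and REPORT are hypothetical names chosen for this example.
VTUNE="${VTUNE:-amplxe-cl}"
RESULTS_DIR="${RESULTS_DIR:-./vtune_results}"
REPORT="${REPORT:-report_name}"

profile_hotspots() {
    # Step 1: collect hotspots while running the program given as arguments
    "$VTUNE" -collect hotspots -r "$RESULTS_DIR" "$@" || return 1
    # Step 2: write the hotspots report to a text file
    "$VTUNE" -report hotspots -r "$RESULTS_DIR" > "$REPORT"
}

# Only attempt the run if the command-line tool is actually installed
if command -v "$VTUNE" >/dev/null 2>&1; then
    profile_hotspots ./foobar
else
    echo "$VTUNE not found in PATH" >&2
fi
```

Stopping after a failed collection (the `|| return 1`) avoids generating a report from a stale or missing results directory.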
gprof
gprof is free software. It works a bit differently from VTune, since we need to explicitly
compile our code with the
-pg flag. This step is needed to link the
profiling library correctly to the executable that will be produced. Once the program is
compiled, you just need to execute it, without specifying
gprof as a launcher as is the case with VTune:
./foobar [--flag1 --flag2 ...] [input1 input2 ...]
your program will automatically create a file, called
gmon.out containing the profiling
information collected. This file will by default sit in the current working directory.
After data collection you need to generate a human-readable profile summary. We then run gprof:
gprof options [executable-file [profile-data-files ...]] [>outfile]
this will produce a so-called flat profile from the raw data collected. A more detailed introduction to
gprof can be found in its documentation.
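The whole gprof cycle (compile with -pg, run, post-process) might look like the sketch below. The compiler gcc, the function name gprof_workflow and the file names are assumptions made for illustration; the DRY_RUN variable is a common shell idiom, not a gprof feature, and lets you print the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the gprof workflow described above.
# gcc, foobar.c and profile.txt are hypothetical choices for this example.
# Set DRY_RUN=echo to print the commands instead of executing them.
DRY_RUN="${DRY_RUN:-}"

gprof_workflow() {
    src="$1"; exe="$2"; outfile="$3"
    # Compile and link with -pg so that the profiling library is linked in
    $DRY_RUN gcc -pg -o "$exe" "$src" || return 1
    # Run the program normally; it writes gmon.out to the current directory
    $DRY_RUN "./$exe" || return 1
    # Turn the raw data into a human-readable profile
    $DRY_RUN gprof "./$exe" gmon.out > "$outfile"
}

# Typical use:
#   gprof_workflow foobar.c foobar profile.txt
```

Note that in dry-run mode the last command line is written to the output file rather than to the terminal, because the redirection still applies.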
Profiling DIRAC with Intel VTune Amplifier XE
The situation is a bit different if you want to profile DIRAC, because we usually
launch the executable through a script.
If you are using the
wrapper.py script, you will specify as launcher:
--launcher="amplxe-cl -collect hotspots -r /results/directory"
the rest of the procedure remains the same. What if we want to profile during an MPI run? We modify the launcher as follows:
--launcher="mpirun -np 12 amplxe-cl -collect hotspots -follow-child -mrte-mode=auto -target-duration-type=medium -no-allow-multiple-runs -no-analyze-system -data-limit=100 -slow-frames-threshold=40 -fast-frames-threshold=100 -r /results/directory"
Some more words of comment (shamelessly copied from
amplxe-cl -help collect):
follow-child, collects data on processes launched by the target process;
mrte-mode, selects the profiling mode;
target-duration-type, estimates the application duration time. This value affects the size of collected data;
no-allow-multiple-runs, disables multiple runs to achieve more precise results for hardware event-based collections;
no-analyze-system, disables analyzing all processes running on the system;
data-limit, limits the amount of raw data to be collected;
slow-frames-threshold, specifies a threshold to separate slow and good frames. It must be smaller than the threshold for fast frames;
fast-frames-threshold, specifies the threshold to separate good and fast frames.
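Since the MPI launcher string is long, it can help to assemble it from shell variables before handing it to the script. The sketch below only builds and prints the option; the variable names NP, RESULTS_DIR, VTUNE_OPTS and LAUNCHER are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: build the --launcher option for an MPI run under VTune.
# NP, RESULTS_DIR, VTUNE_OPTS and LAUNCHER are hypothetical variable names.
NP="${NP:-12}"
RESULTS_DIR="${RESULTS_DIR:-/results/directory}"

# Collection options, exactly as listed above
VTUNE_OPTS="-collect hotspots -follow-child -mrte-mode=auto \
-target-duration-type=medium -no-allow-multiple-runs -no-analyze-system \
-data-limit=100 -slow-frames-threshold=40 -fast-frames-threshold=100"

LAUNCHER="mpirun -np $NP amplxe-cl $VTUNE_OPTS -r $RESULTS_DIR"

# Print the option in the form expected on the command line
printf '%s\n' "--launcher=\"$LAUNCHER\""
```

Changing the number of processes or the results directory then only requires setting NP or RESULTS_DIR in the environment.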
Profiling DIRAC with gprof
In contrast with VTune, using
gprof to profile DIRAC is rather straightforward.
The only thing you need to do is link the profiling library. Thus, if using the
setup script, you will type:
./setup --fc=... --cc=... --cxx=... --profiling --release
We recommend building in release mode when collecting profiling data.
Once you have compiled the sources successfully, just run DIRAC as usual with
wrapper.py. The program will produce the
gmon.out file in your current working directory, and you can translate it to human-readable form with
gprof as explained before.
One word of caution for MPI runs. To avoid all the processes trying to write to the same
gmon.out file, you should export the
GMON_OUT_PREFIX environment variable
and pass it to mpirun with the -x option:
mpirun -x GMON_OUT_PREFIX -np <np> ./foobar
In this way you will have a series of files, each named after the value of
GMON_OUT_PREFIX followed by the process ID (prefix.pid). Post-collection analysis
works exactly as for non-parallel runs. One warning from the mpirun manual:
“Users are advised to set variables in the environment, and then use -x to export (not define) them.”
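Following that advice, a per-process analysis could be sketched as below. The prefix value gmon.out, the process count 4, the function name analyze_ranks and the output names profile.<pid>.txt are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: produce one gprof profile per MPI process.
# The prefix "gmon.out" and the output names profile.<pid>.txt are
# hypothetical choices for this example.
GPROF="${GPROF:-gprof}"

analyze_ranks() {
    exe="$1"; prefix="$2"
    for data in "$prefix".*; do
        [ -e "$data" ] || continue      # no matching files: nothing to do
        pid="${data##*.}"               # text after the last dot
        "$GPROF" "$exe" "$data" > "profile.$pid.txt"
    done
}

# Typical use: set the variable first, then export it with -x
#   export GMON_OUT_PREFIX=gmon.out
#   mpirun -x GMON_OUT_PREFIX -np 4 ./foobar
#   analyze_ranks ./foobar "$GMON_OUT_PREFIX"
```

Keeping one profile per process makes it easy to spot load imbalance between processes before looking at any combined picture.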