The CHECKPOINT.h5 file¶
What is this file ?¶
It is an hdf5 file containing the essential data given as input to and generated by DIRAC.
All data stored in this file is defined and documented in the DIRAC data schema, the source of which is found in utils/DIRACschema.txt.
What can I do with it ?¶
One purpose is to make restarting and data curation after a run easier.
Another purpose is to facilitate communication with other programs.
With the hdf5 format and h5py it is trivial to import data into Python and further process and view it.
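As a minimal sketch of such an import, assuming h5py is installed and the file contains only groups and datasets: the helper names flatten and read_checkpoint are illustrative and not part of DIRAC.

```python
def flatten(node, prefix=""):
    """Yield (label, leaf) pairs from a nested mapping such as an h5py group.

    Anything with an .items() method is treated as a group; everything
    else is treated as a leaf (an h5py dataset, or a plain value in a dict).
    """
    for name, child in node.items():
        label = f"{prefix}/{name}"
        if hasattr(child, "items"):
            yield from flatten(child, label)
        else:
            yield label, child


def read_checkpoint(path="CHECKPOINT.h5"):
    """Return {label: value} for every dataset stored in the file."""
    import h5py  # third-party package, imported only when a file is read

    with h5py.File(path, "r") as f:
        # ds[()] reads the complete dataset (scalar or array) into memory
        return {label: ds[()] for label, ds in flatten(f)}
```

For example, data = read_checkpoint("CHECKPOINT.h5") followed by data["/result/wavefunctions/scf/energy"] would give the total energy, provided that label is present in the file from your run.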
Can I also extend the schema as I have data that is not listed here ?¶
The first question is whether this data is indeed essential. The CHECKPOINT.h5 file is not intended for large data sets as it gets saved automatically after a run. It is also not intended for highly specialized or intermediate data. If you want to use hdf5 for such data consider making a special hdf5 file using the interface provided in the mh5 module. This is even easier as you need not define everything as thoroughly as with the schema (see below).
In case a new data type is indeed a generally useful addition, please start by documenting it (type and description) and ask for a peer review by one of the developers before proceeding to the next step.
How the schema is processed.¶
The source text in utils/DIRACschema.txt is processed at run time by the Python functions read_schema and write_schema, which are found in utils/process_schema.py and are called by the DIRAC run script pam. This produces a new text file called schema_labels.txt, which is placed in the work directory. This is the file used by the actual DIRAC code; it contains the set of labels also found on CHECKPOINT.h5. To familiarize yourself with this, copy schema_labels.txt from the work directory and compare it to DIRACschema.txt.
Note that the hierarchical structure is defined by slashes (/), much like in a Unix directory path. This also means that a / cannot be used inside a data label, as hdf5 would interpret it as an additional level in the hierarchy.
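To illustrate why a slash cannot appear inside a name, here is a small sketch of how a label decomposes into enclosing groups and a final data name. The function split_label is hypothetical and is not part of the DIRAC scripts.

```python
def split_label(label):
    """Split '/a/b/c' into (['a', 'b'], 'c'): enclosing groups and data name."""
    if not label.startswith("/"):
        raise ValueError(f"label must start with '/': {label!r}")
    parts = label[1:].split("/")
    if "" in parts:
        # covers '//' inside the label and a trailing '/'
        raise ValueError(f"empty name in label: {label!r}")
    return parts[:-1], parts[-1]
```

With this, split_label("/result/wavefunctions/scf/energy") gives the groups result, wavefunctions and scf plus the data name energy. A made-up name such as spin/orbit placed under /result would instead be read as a group spin containing data named orbit, which is not what was intended.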
In the Fortran code the generated labels are used directly. An example is found in gp/dircmo.F90:
call checkpoint_write ('/result/wavefunctions/scf/energy',rdata=toterg)
which writes the total energy (a single real number) with the appropriate label.
Note that data is classified as optional or required in the schema. This classification determines whether a restart is possible: for that purpose, all required data should be present on the checkpoint file.
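The restart criterion can be sketched as a simple membership test. This is an illustration only, not the check DIRAC itself performs: the function names are hypothetical, and which labels are required must be taken from the schema rather than from this example.

```python
def missing_required(checkpoint, required_labels):
    """Return the required labels that are absent from the checkpoint.

    `checkpoint` can be any container supporting `label in checkpoint`,
    for instance a set of labels or an open h5py.File.
    """
    return [label for label in required_labels if label not in checkpoint]


def can_restart(checkpoint, required_labels):
    """A restart is possible only if no required label is missing."""
    return not missing_required(checkpoint, required_labels)
```

Optional labels play no role in this test: their absence does not prevent a restart.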
How can the schema be extended ?¶
To extend the schema, do NOT edit the schema_labels.txt file. All edits should be made in DIRACschema.txt.
Check first whether the data is optional or required. Be careful when defining new required data: a checkpoint file is invalid for restart if such data is missing, so this may hamper restarting from old checkpoint files.
If the data consists of a simple standard type (real, integer or string) that fits in an existing subdirectory, you can simply define it at the appropriate place and the scripts will automatically generate the label. After inspecting the generated label, you can use it in calls in the Fortran code.
If the data is of composite type, you need to define its elements below. This is done by creating a new subsection in the file; an example is the molecule data type, which is part of input and is defined in a separate section. Each section starts with a * and ends with *end.
You may also nest sections; see for instance the data type wavefunctions, which has the composite type scf as an element.
What happens at run time and on the Fortran side ?¶
At the start of the run DIRAC checks whether CHECKPOINT.h5 is present and contains all required data. If it is, a restart will be attempted. Note that you can use the copy facilities of pam to place the file in the work directory.
During the run, the only calls needed on the Fortran side are checkpoint_read and checkpoint_write. These subroutines are found in the module checkpoint and support writing of reals, integers and strings. The intention is to keep the hdf5 interface simple and easily maintainable, so more complicated types should be split up into these standard types. There are two more public routines in this module (for opening and closing a checkpoint file), but these are already called at the appropriate places in DIRAC and should not be called elsewhere.
When the checkpoint_read routine is called, the data is located on the file and returned to the caller. Error handling is currently absent (still to do), so make sure the data is indeed present or the run may crash.
When the checkpoint_write routine is called, the data is stored on file after checking that the label is indeed known in the schema. Data with an undefined label is not written and a warning is issued. This guarantees that all data placed on the checkpoint file is properly documented.
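The write-side check can be mimicked in a few lines of Python. This is only a sketch of the behaviour described here, not the Fortran implementation: write_if_known, store and known_labels are hypothetical names, and a plain dict stands in for the checkpoint file.

```python
import warnings


def write_if_known(store, known_labels, label, value):
    """Store value under label only if the label is defined in the schema.

    Returns True if the value was stored, False if it was rejected.
    """
    if label not in known_labels:
        warnings.warn(f"label {label!r} is not in the schema; data not written")
        return False
    store[label] = value
    return True
```

A call with a label from the set of known labels stores the value, while any other label leaves the store untouched and only produces a warning.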
At the end of the run, pam checks whether CHECKPOINT.h5 is present and contains all required data. If so, the file is copied to the directory from which pam was called and renamed following the same convention as used for the output file, but with the file extension .h5 instead of .out.
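When post-processing many runs, the matching checkpoint name can be derived from the output file name. This is a small sketch, assuming the output file ends in .out as described above; checkpoint_name is a hypothetical helper and the file name in the comment is made up.

```python
from pathlib import Path


def checkpoint_name(output_file):
    """Return the output file name with its extension replaced by .h5."""
    # e.g. a made-up output file "myrun.out" maps to "myrun.h5"
    return Path(output_file).with_suffix(".h5")
```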