# How to specify the molecule and basis set using the traditional mol files¶

The procedure used to read mol files is as follows: first, the basis set for the large component will be read in from the the mol file; then the small component basis set is generated automatically via the kinetic balance prescription, but can (by experienced users) also be specified explicitly. Try this only if you have studied the theory and know what the criteria for small component basis sets are, there are many pitfalls that one needs to be aware of!

Warning

The traditional mol file is entered in fixed format (with few exceptions). This means that exact position and spacing matters. If you misplace the charge, the number of centers, etc., then DIRAC will not be able to read the structure and basis set information correctly. Until you get familiar with the format, we recommend to copy existing mol files and adapt them.

We will learn the mol file format using the following example:

DIRAC

C   2
1.    2
H1   -1.4523499293         .0000000000         .8996235720
H2    1.4523499293         .0000000000         .8996235720
LARGE BASIS cc-pVDZ
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE BASIS cc-pVDZ
FINISH

This is the water molecule using the cc-pVDZ basis for the large component basis set. The first 3 lines are arbitrary comment lines. You can leave them blank or write some note to yourself but they have to be there. Line 4 specifies a cartesian GTO basis set (C) and 2 atom types (basis set types). So after line 4 DIRAC expects 2 basis set types and here they are - first hydrogen:

        1.    2
H1   -1.4523499293         .0000000000         .8996235720
H2    1.4523499293         .0000000000         .8996235720
LARGE BASIS cc-pVDZ

then oxygen:

        8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE BASIS cc-pVDZ

and we finish off with a FINISH in the final line.

You can already guess the meaning of the numbers. We have 2 hydrogens with a nuclear charge of 1.0 and one oxygen with a nuclear charge of 8.0. Then come the atom symbols and the coordinates in bohr followed by the large component basis set label. DIRAC will look up this label in the basis set library and complain if it does not exist. Note that the atom label (H1, H2, O) is only for you so that you find it in the output. It does not matter to the code. The code will identify basis sets based on the nuclear charge, not the label. The cartesian coordinates may be given in free format. However, the name of the atom must still be left four places, and no coordinates must enter the four first positions.

DIRAC typically runs in cartesian GTO. Note that by default the transformation to spherical harmonic basis (modified for the small components) is done in the transformation to the orthogonal basis. Spherical Gaussians (S) are allowed in 2-component calculations.

## Default coordinates are in bohr, how can I use angstrom instead?¶

See the “A” in line 4 in position 20? If it is there, DIRAC will read in angstrom, if this position is blank, then DIRAC will read in bohr (default):

DIRAC
1         2         3           # this is a comment line
1234567890123456789012345678901234567890 # this is a comment line
C   2              A
1.    2
[...]

## Nuclear exponent¶

You can modify the default gaussian exponent for the nuclear charge distribution by giving a floating point value somewhere within the space specified below:

DIRAC
1         2         3           # this is a comment line
1234567890123456789012345678901234567890 # this is a comment line
C   2              A
1.    2--------here--------
[...]

As an example, here is the Ne atom with a custom nuclear exponent of 10.0D8:

DIRAC

--------here--------
C   1
10.     1   10.0D+08
Ne    0.0  0.0  0.0
LARGE BASIS cc-pVDZ
FINISH

## Fitting basis set¶

In the development version of DIRAC it is now possible to make use of density fitting. In such calculations, it is mandatory to include a basis set used to expand the density. This is done adding a line with the FTSET keyword after the coordinates for a given atom type. Basically all the options available in the definition of large or small basis sets are applicable to the fit set (i.e. the use of a set from the library, explicitly typed basis, even/well-tempered etc). Here is an example:

DIRAC

C   2
1.    2
H1   -1.4523499293         .0000000000         .8996235720
H2    1.4523499293         .0000000000         .8996235720
LARGE BASIS cc-pVDZ
FTSET BASIS Ahlrichs-Coulomb-Fit
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE BASIS cc-pVDZ
FTSET BASIS Ahlrichs-Coulomb-Fit
FINISH

## Multiple basis sets from the basis set library¶

As an alternative to the BASIS option described above, it is possible to use specify different basis set files via the MULTIBASIS keyword. The MULTIBASIS option is present to allow easier inclusion of different sets of diffuse and/or polarization functions to a reference basis set. The syntax for this keyword is very similar to that of the BASIS:

DIRAC

C   2
1.    2
H1   -1.4523499293         .0000000000         .8996235720
H2    1.4523499293         .0000000000         .8996235720
LARGE MULTIBASIS 2 cc-pVDZ cc-pVDZ-diffuse
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE MULTIBASIS 2 cc-pVDZ cc-pVDZ-diffuse
FINISH

After the keyword an integer with the number of files to be read should be specified. It is followed by the name of the different basis set files, each separated by a whitespace. The only limitation for the number of basis set files is that the total lenght of this line should not exceed Fortran’s maximum allowed line size (80 characters).

## Explicitly typed basis sets¶

Explicitly typed basis sets are best described using an explicit example:

DIRAC

C   2
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE EXPLICIT  3    1    1    1
# s functions
f  10    4
11720.0000000  0.00071000 -0.00016000  0.00000000  0.00000000
1759.0000000  0.00547000 -0.00126300  0.00000000  0.00000000
400.8000000  0.02783700 -0.00626700  0.00000000  0.00000000
113.7000000  0.10480000 -0.02571600  0.00000000  0.00000000
37.0300000  0.28306200 -0.07092400  0.00000000  0.00000000
13.2700000  0.44871900 -0.16541100  0.00000000  0.00000000
5.0250000  0.27095200 -0.11695500  0.00000000  0.00000000
1.0130000  0.01545800  0.55736800  0.00000000  0.00000000
0.3023000 -0.00258500  0.57275900  1.00000000  0.00000000
0.0789600  0.00000000  0.00000000  0.00000000  1.00000000
# p functions
f   5    3
17.7000000  0.04301800  0.00000000  0.00000000
3.8540000  0.22891300  0.00000000  0.00000000
1.0460000  0.50872800  0.00000000  0.00000000
0.2753000  0.46053100  1.00000000  0.00000000
0.0685600  0.00000000  0.00000000  1.00000000
# d functions
f   2    2
1.1850000  1.00000000  0.00000000
0.3320000  0.00000000  1.00000000
[...]

Following EXPLICIT is the highest angular quantum number plus one. In this case it is 3, since we are using a spd basis set. Following this number are the number of blocks for each l-value. The memory requirements grow rapidly with the number of basis functions in a block (note for instance that four g functions actually are 60 basis functions, as there are 15 cartesian components of each g function). Memory requirements can therefore be reduced by splitting basis functions of the quantum number into different blocks. This will, however, decrease the performance of the integral calculation.

Following the LARGE EXPLICIT line we see a line with a comment. Lines starting with either !, \$, or # are comments and are ignored by the code.

The next line starts with a single character describing the input format of the basis set in this block.

The default format is 8F10.4 which will be used if left blank. Be very careful when using this default format as it will miss any exponential parameter standing to the right of the 10 characters. In this format the first column is the orbital exponent and the seven last columns are contraction coefficients. If no numbers are given, a zero is assumed. If more than 7 contracted functions occur in a given block, the contraction coefficients may be continued on the next line, but the first column (where the orbital exponents are given) must then be left blank.

An F or f in the first position (like in the example above) will indicate that the input is in free format. This will of course require that all contraction coefficients need to be typed in, as all numbers need to be present on each line. However, note that this options is particularly handy together with completely decontracted basis sets, as described below. Note that the program reads the free format input from an internal file that is 80 characters long, and no line should therefore exceed 80 characters.

The numbers in the line “f 10 4” are the number of of primitive Gaussians in this block (10) and the number of contracted Gaussians in this block (4). If a zero is given for he number of contracted Gaussians, an uncontracted basis set will be assumed, and only orbital exponents need to be given.

One may also give the format H or h. This corresponds to high precision format 4F20.8, where the first column again is reserved for the orbital exponents, and the three next columns are designated to the contraction coefficients. If no number is given, a zero is assumed. If there are more than three contracted orbitals in a given block, the contraction coefficients may be continued on the next line, though keeping the column of the orbital exponents blank.

## Specification of how to generate small component functions¶

Coming back to the example above we can add another integer following “f 10 4”, which will specify how to generate small component functions:

DIRAC

C   2
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE EXPLICIT  3    1    1    1
# s functions
f  10    4    0
11720.0000000  0.00071000 -0.00016000  0.00000000  0.00000000
1759.0000000  0.00547000 -0.00126300  0.00000000  0.00000000
400.8000000  0.02783700 -0.00626700  0.00000000  0.00000000
113.7000000  0.10480000 -0.02571600  0.00000000  0.00000000
37.0300000  0.28306200 -0.07092400  0.00000000  0.00000000
13.2700000  0.44871900 -0.16541100  0.00000000  0.00000000
5.0250000  0.27095200 -0.11695500  0.00000000  0.00000000
1.0130000  0.01545800  0.55736800  0.00000000  0.00000000
0.3023000 -0.00258500  0.57275900  1.00000000  0.00000000
0.0789600  0.00000000  0.00000000  0.00000000  1.00000000
# p functions
f   5    3    0
17.7000000  0.04301800  0.00000000  0.00000000
3.8540000  0.22891300  0.00000000  0.00000000
1.0460000  0.50872800  0.00000000  0.00000000
0.2753000  0.46053100  1.00000000  0.00000000
0.0685600  0.00000000  0.00000000  1.00000000
# d functions
f   2    2    0
1.1850000  1.00000000  0.00000000
0.3320000  0.00000000  1.00000000
[...]

If a 0 (as in this example) or no number is given, the small component functions are generated both upwards and downwards (default). If the number is 1, the small component functions are generated upwards, If the number is 2, the small component functions are generated downwards. For other values no small components functions generated.

## MOLFDIR-type basis sets¶

The MOLFDIR basis set file(s) (here: Oxygen-xyz.bas) must be copied to the scratch area, for example using the pam script:

DIRAC

C   2
1.    2
H1   -1.4523499293         .0000000000         .8996235720
H2    1.4523499293         .0000000000         .8996235720
LARGE MOLFBAS Hydrogen-xyz.bas
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE MOLFBAS Oxygen-xyz.bas
FINISH

## Even-tempered basis sets (geometric progressions)¶

The exponents are generated in an even-tempered series:

$\eta_{N - k + 1} = \alpha \beta^{k - 1}$

with

$k = 1, \ldots, N$
LARGE EVENTEMP 0.05 3.0 11 3 2 1 1
1..5
6..11
7..11
9..10

This means that alpha is 0.05, beta is 3.0, N is 11. Highest angular quantum number plus one is 3 (s, p, and d functions), we will have 2 blocks for the s, 1 block for p, and one block for d. The s blocks go from exponent 1 to 5 and from 6 to 11, p goes from 7 to 11, d goes from 9 to 10.

## Well-tempered basis sets¶

This works very much like LARGE EVENTEMP, except that the exponents are generated in an well-tempered series:

$\eta_N = \alpha$

and

$\eta_{N - k + 1} = \eta_{N - k + 2} \beta [ 1 + \gamma (\frac{k}{N})^n ]$

with

$k = 1, \ldots, N$
LARGE WELLTEMP 0.05 2.5 2.0 6.0 11 3 2 1 1
1..5
6..11
7..11
9..10

Here alpha is 0.05, beta is 2.5, gamma is 2.0, n is 6.0, and N is 11.

## Explicit small component basis set¶

This is for experts. Following the line or block specifying the LARGE component basis set you can override the default kinetic balance prescription and specify the small component basis set either using SMALL EXPLICIT or using SMALL MOLFBAS, in analogy to LARGE EXPLICIT and LARGE MOLFBAS After the large component basis set the small component basis set can be specified. If nothing is specified it is equivalent to specifying SMALL KINBAL (see below). There are three possibilities for giving the basis set.

## Family basis sets¶

Input for basis sets where the same set of exponents are used for all functions. This is analogous to the well- and even-tempered basis sets except that the exponents are not calculated from a formula, but must be given in the file. These exponents may come from a basis set optimization with GRASP:

DIRAC

C   2
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE FAMILY   10    3    1    1    1
# exponents
6665.0000000
1000.0000000
228.0000000
64.7100000
21.0600000
7.4950000
2.7970000
0.5215000
0.1596000
0.0469000
# ranges
1..10
6..10
8..9
[...]

## Dual family basis sets¶

A basis set analogous to the family basis set, except one set of exponents are used for s, d, g, ... (gerade) functions, and another set is used for p, f, h, ... (ungerade) functions:

DIRAC

C   2
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE DUALFAMILY   10    5    3    1    1    1
# s, d, g, ...
6665.0000000
1000.0000000
228.0000000
64.7100000
21.0600000
7.4950000
2.7970000
0.5215000
0.1596000
0.0469000
# p, f, h, ...
9.4390000
2.0020000
0.5456000
0.1517000
0.0404100
# ranges
1..10
1..5
8..9
[...]

## How to set the charge of the system¶

By default the charge of the atom/molecule is zero. You can change this in the mol file:

DIRAC

*** <- you can use these positions on next line
C   2  1
1.    2
H1   -1.4523499293         .0000000000         .8996235720
H2    1.4523499293         .0000000000         .8996235720
LARGE BASIS cc-pVDZ
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE BASIS cc-pVDZ
FINISH

This is water with charge plus 1. Alternatively you can specify the occupation explicitly in the inp file and leave it blank in the mol file.

## Automatic augmentation with diffuse functions¶

By using a augmentation prefix you can use automatic augmentation to extend Dunning, Dyall, and Turbomole basis sets with diffuse exponents.

The following augmentation prefixes are recognized:

d-aug-cc-p...
t-aug-cc-p...
q-aug-cc-p...
5-aug-cc-p...
6-aug-cc-p...
7-aug-cc-p...

s-aug-dyall...
d-aug-dyall...
t-aug-dyall...
q-aug-dyall...

s-a-Turbomole...
d-a-Turbomole...
t-a-Turbomole...
q-a-Turbomole...


What the code does is that it sorts exponents for each angular momentum, calculates the factor between the two most diffuse exponents, and adds a new exponent (or new exponents) by keeping this factor (even tempered augmentation). If an angular momentum only has one exponent, then the progression cannot be calculated (obviously), and a factor 3.5 is used. This is a choice.

Automatic augmentation can be useful but it should not be used in a black box manner. You are strongly encouraged to define DEBUG_PRIMITIVES in abacus/herrdn.F and to examine which exponents have been added. The code simply augments and does not consider whether it uses the original SCF set exponents or correlating exponents.

## How to force specific symmetry¶

In all the above examples we have not specified symmetry explicitly. In this case DIRAC will detect and use the highest symmetry available in the code. In doing so it may translate and rotate the molecule (check output).

There are two alternatives to this. The first is that we can turn off symmetry completely and force C1 (by placing a 0 at the right place; the 0 means zero symmetry generators):

DIRAC

C   2    0
[...]

DIRAC

Alternative two is to specify an explicit symmetry other than C1. For this we need to remove all symmetry-generated centers and specify the coordinates only for the symmetry-unique centers:

DIRAC

C   2    2  X  Y
8.    1
O      .0000000000        0.0000000000        -.2249058930
LARGE BASIS cc-pVDZ
1.    1
H     1.4523499293         .0000000000         .8996235720
LARGE BASIS cc-pVDZ
FINISH

We give only one hydrogen center, the other is generated by symmetry operations. This molecule has two symmetry generators (second 2 in line 4).

Warning

If you fail to remove the centers generated by symmetry operations, DIRAC will apply symmetry operations to them which will re-generate the symmetry-unique centers. You will get more than one center at the same position and DIRAC will stop with the error “nuclei too close”.

All available symmetry groups and the corresponding symmetry group specification strings are given in the following list:

C   ?    3  Z  Y  X   # group:      D2h
# operations: reflection(xy)
#             reflection(xz)
#             reflection(yz)

C   ?    2XY  YZ      # group:      D2
# operations: rotation(z)
#             rotation(x)

C   ?    2 Y X        # group:      C2v
# operations: reflection(xz)
#             reflection(yz)

C   ?    2  ZXYZ      # group:      C2h
# operations: reflection(xy)
#             inversion

C   ?    1XY          # group:      C2
# operations: rotation(z)

C   ?    1  Z         # group:      Cs
# operations: reflection(xy)

C   ?    1XYZ         # group:      Ci
# operations: inversion

C   ?    0            # group:      C1