Basic Use

This is a commented step-by-step guide for the usage of XHARPy as a python library. I try to also give some understanding what is happening. If you just want results fast, I would suggest you adapt the L-alanin_gpaw example to your structure. The examples folder is also very useful to see how the library is applied to different problems, after you have a basic understanding of what is happening.

The imports in this document are split up according to the section, but of you would usually still put all of them at the top of a .py or .ipynb file.

The usage of XHARPy as a library to write refinement scripts can be split into four distinct steps: Loading your data, setting refinement and computation options, refinement and writing your data to disk.

Activating a potential conda environment

If you have started a fresh console and have followed the installation instructions, do not forget to start the conda environment by typing;

conda activate xharpy

The following steps are most easily done in a jupyter notebook.

Loading your data

For structures without atoms on special positions you just need a .cif and an .hkl file. If you have atom on special positions, which require restraints, the easiest way is to adapt these from a SHELXL .lst file

So let us import the io functions for input:

from xharpy import shelxl_hkl2pd, cif2data, lst2constraint_dict

As written before, the last one is only strictly necessary if we have atoms on special positions. On the other hand it will also give the correct empty dictionary if there are no constaints resulting from special positions, so you can just import it anyway. This way you will not run into problems if you overlook something.

Let us load the data from the cif-file using the cif2data function:

atom_table, cell, cell_esd, symm_mats_vecs, symm_strings, wavelength  = cif2data('iam.cif', 0)

Let us first talk about the arguments of the function: The first one is simply the path to our cif file. The second argument can either be an integer or a string and can be used to select a specific data block within the .cif file. We have used an integer, which the function will interpret as an index in python convention. So we are selecting the first dataset. If we give a string we select the dataset by name. So the string needs to match whatever is written behind the data\_ keyword in the cif file.

From our function we have got a number of resulting objects, we need to further process. Most importantly we have created an atom_table, which contains all our atomic data in a pandas DataFrame. The columns are shortened versions of the naming in the loop instruction. They are shortened by omitting the common beginning that is unique to each table in a cif file. If values have an esd in the cif file is is output to <column_name>_esd. Missing values are represented by a numpy nan value. The atom_table can be manipulated to change values (especially the adp type and parameters before we start our refinement).

cell and cell_esd are just arrays containing the cell parameters and their estimated standard deviations. The wavelength is a float with the wavelength in Angstroems. symm_mats_vecs is a tuple containing the symmetry matrices and translation vectors for the symmetry elements in the cif file. The symm_strings are the corresponding strings.

Next let us load the reflection information with:

hkl = shelxl_hkl2pd('iam.hkl')

Which will again return a pandas DataFrame, which has the columns: h, k, l, intensity and int_esd, where the last is the estimated standard deviation of the measured intensities.

Finally, for dealing with special position constraints we need a constraint_dict. As mentioned before we can generate this one from an .lst file. If you have hydrogen atoms on special positions, these need to be anisotropic in the run of SHELXL that has been used for the .lst generation. Usually, you would want to fix the hydrogen (or all atoms) with AFIX 1 before you run that refinement.

constraint_dict = lst2constraint_dict('iam.lst')
# constraint_dict = {} # This is also possible if there are no atoms on special positions

A constraint_dict is a nested Dict. The syntax is explained in a separate page about symmetry constraints.

Setting options

In general there are two type of options represented by their own dictionaries. Options that concern the refinement routine and options that concern the computation routines that calculate the atomic form factors.

A basic example for a refinement_dict would look like this:

refinement_dict = {
    'f0j_source': 'gpaw', # GPAW with single-core
    #'f0j_source': 'gpaw_mpi', # GPAW with multi-core
    'core': 'constant', # treatment of the core density
    'extinction': 'none', # Refinement of extinction
    'reload_step': 1, # step where the density is reloaded from the save_file, 1 means first step AFTER initialisation
}

You might notice that two of the options concern the computation of the atomic form factors. The f0j_source is used to actually select the implementation of the atomic form factor calculation within the refinement routine. The implementations are also unaware of the step in the refinement. Therefore, the refinement itself triggers the reloading of a precalculated density. We want to start from a new density, but after initialisation we want to reload previous calculation to speed things up. We also want to calculate core density on a separate spherical grid, as they have sharp maxima at the core positions. This might not be well described on the rectangular grid we use for the valence density. This also means Hirshfeld partitioning will not affect the core density. There are more options for the refinement_dict, which are explained on a separate page.

Next we need to define the options for the atomic form factor calculation. these are directly passed on to the routines that we loaded with the f0j_source. An example the selected GPAW source and a molecular structure might look like this:

computation_dict = {
    # options for the XHARPy implementation
    'save_file': 'gpaw_result.gpw', # Where are results saved and loaded
    'gridinterpolation': 4, # density interpolation to use for Hirshfeld and FFT

    # options that are passed on to the gpaw calculator
    'xc': 'SCAN', # Functional
    'txt': 'gpaw.txt', # Text output for GPAW
    'h': 0.175, # Grid spacing for wavefunction calculation
    'convergence':{'density': 1e-7}, # Higher convergence for density calculation
    'symmetry': {'symmorphic': False}, # Also search for symmetry involving translation
    'nbands': -2 # Number of calculated bands = n(occ) + 2
}

As you can see the function of the GPAW source will read the options that are specific to the XHARPy GPAW plugin and remove it from the dictionary. All options that are not known will be passed on to the GPAW calculator without any further checks. GPAW options can be found in the GPAW documentation <https://wiki.fysik.dtu.dk/gpaw/documentation/basic.html>`_

Options for the calc_f0j function can be found in the specific docstrings which are also output to the in the Options for f0j_calculations page.

Writing data to disk

Finally we want to export our structures. There are three kinds of files that we can write at the moment, and four functions that we need to import

from xharpy import write_cif, write_res, write_fcf, add_density_entries_from_fcf

The crystallographic information file is a standard format for exchanging and depositing crystallographic data. We can write such a file with:

write_cif(
    output_cif_path='xharpy.cif',
    cif_dataset='xharpy',
    shelx_cif_path='iam.cif',
    shelx_dataset=0,
    cell=cell,
    cell_esd=cell_esd,
    symm_mats_vecs=symm_mats_vecs,
    hkl=hkl,
    construction_instructions=construction_instructions,
    parameters=parameters,
    var_cov_mat=var_cov_mat,
    refinement_dict=refinement_dict,
    computation_dict=computation_dict,
    information=information
)

You might notice that we need an original cif file (the library was developed with SHELXL) to generate the new cif file. The reason is that the write-routine does currently not calculate all values by itself. Additional values such as crystal size can also be added to the original cif file and will be then copied to the new one.

The cif file is also used to look up the bond and angle tables. XHARPy will then output recalculated values for existing bonds. However, only bonds that are in the original cif file will be output. Therefore, you need to make sure that your X-H bonds are in the bond table of your template cif. If needed, just add bonds with arbitrary values for the bond lengths.

Fcf files can be written as fcf mode 4 or 6 with the two commands:

write_fcf(
    fcf_path='xharpy.fcf',
    fcf_dataset='xharpy',
    fcf_mode=4,
    cell=cell,
    hkl=hkl,
    construction_instructions=construction_instructions,
    parameters=parameters,
    wavelength=wavelength,
    refinement_dict=refinement_dict,
    symm_strings=symm_strings,
    information=information,
)

write_fcf(
    fcf_path='xharpy_6.fcf',
    fcf_dataset='xharpy_6',
    fcf_mode=6,
    cell=cell,
    hkl=hkl,
    construction_instructions=construction_instructions,
    parameters=parameters,
    wavelength=wavelength,
    refinement_dict=refinement_dict,
    symm_strings=symm_strings,
    information=information,
)

Both outputs will correct for extinction and fcf6 will correct the phases for dispersion effects. If you want to access the corrected values for validation. Both functions return a pandas DataFrame.

XHARPy currently has no means of evaluating the difference electron density by itself. For this reason we need to use an additional function with a cctbx module to add the missing entries to the cif file.

add_density_entries_from_fcf('xharpy.cif', 'xharpy_6.fcf')

For visualisation of the structure and the difference electron density is is also helpful to write a SHELXL .res file. This can be done by:

write_res(
    out_res_path='xharpy_6.res',
    in_res_path='iam.lst',
    cell=cell,
    cell_esd=cell_esd,
    construction_instructions=construction_instructions,
    parameters=parameters,
    wavelength=wavelength
)

Again we need a template res or lst file. Currently XHARPy has no way to divide symmetry cards into those generated by a lattice centring or inversion symmetry and those generated by other symmetry elements, which would be necessary for writing these files on its own.

The created .res file can be used to display the structure easily in programs like ShelXle. If you have written an fcf6 file with the same name you can also plot the difference electron density this way.

Basic Use

Activating a potential conda environment

Loading your data

Setting options

Refinement

Writing data to disk