Basis Optimization

In practical density functional theory calculations, apart from numerical issues there are three main sources of error, namely (i) the use of approximate exchange-correlation functionals, (ii) the approximation introduced by pseudopotentials and, (iii) the approximation introduced by finite basis set. This Chapter deals with issue (iii) by basis function optimization.

Since one cannot use infinite number of atomic orbital as basis set, optimization of LCAO function becomes very important for obtaining accurate results. Some discussions about basis functions are found in section [tip-basis] above and various part of this manual. The neutral atom database came with NanoDCAL was generated by the atomic package nanobase and optimized using NanoDCAL. The rest of this Chapter discusses how to optimize basis functions. For this purpose, one needs to have a nanobase installed in the same computer as NanoDCAL. A combined application of both packages is necessary to achieve basis optimization.

An optimization requires a cost function which is to be minimized. NanoDCAL supplies several cost functions but users can supply their own as discussed in section 1.3 below. The cost functions supplied by NanoDCAL are constructed based on some calculated and/or target physical quantities, such as band structure or total energy. During the optimization, parameters in the basis functions (cutoffs of the orbital, etc) are slightly adjusted until the value of the cost function reaches its minimum. In other words, the optimization procedure is to reduce error between the calculated and the target physical quantities. A good basis is optimized against many target physical quantities.

Preparation

To start, please make sure both NanoDCAL and nanobase are installed properly. One should already know how to generate basis function using nanobase and how to calculate a physical property (band structure, band gap, total energy, etc.) of a system using NanoDCAL. Next, three (sets of) input files are needed for optimization:

An input file for basis optimization used by NanoDCAL, call it “bo.input”. This is the main input file for the calculation (see below).
An input file (or files) for basis generation used by nanobase. The input parameters defined in this input file will be used as their initial values to start optimization. The quality of the optimized basis may highly depend on the initial value provided here. Nanobase and NanoDCAL will check the validity of initial values. If one does not know how to choose initial values, please go to the /neutralatomdatabase of NanoDCAL and check the values there. Refer to the User Reference manual of nanobase about how to set up this input file. Importantly, please put the input file for each atomic element in a different folder. This input file (or files) will be specified in bo.input.
An input file (or files) for calculating the physical property (or properties) of a system (or systems) used by NanoDCAL. The calculated physical properties are used to construct the cost function that is being minimized. Remember that all physical properties used as cost function need to be calculated here. Put each input file of each system in a separate folder. Usually, basis optimization using one target system is recommended due to the efficiency issue. However, a user may require excellent transferability of the optimized basis in many very different chemical environments. For example, a user may wish to find oxygen basis functions that works very well for both molecules containing oxygen and crystalline oxides. NanoDCAL provides such a flexibility but the more complicated the target systems, the more difficult the optimization process. This input file (or files) will be specified in bo.input.

Optimization can be quite lengthy and it is recommended to do parallel computation, for instance the following command starts an 8-processor parallel run (the actual command depends on your parallel computer set up):

mpiexec -n 8 matlab -nodisplay -r “nanodcal -parallel bo.input”

The input files

In this section, we discuss the three input files for basis optimization presented above. We use the Si example in the /examples/Group9/. Other examples can be also be found there.

Input file for optimization

This is the bo.input file introduced in Section1.1which tells nanodcal to do an optimization run:

% This is the bo.input for nanodcal.
calculation.name = ’basisOptimization’
calculation.basisOptimization.basisInputFiles =
{’./Si/Si-nanobase.inp’}
calculation.basisOptimization.basisFileNames = {’./Si_SZP_LDA.nad’}
calculation.basisOptimization.systemInputFiles=
{’./Si-bulk/Si-nanodcal.inp’}
calculation.basisOptimization.simplex.MaxFunEvals = 10000
calculation.basisOptimization.simplex.TolFun = 1e-3
calculation.basisOptimization.constrainType = ’hard’ %soft or hard
calculation.basisOptimization.history = true % true or false

The above parameters are explained in the following.

calculation.name: for basis optimization, it can be ‘basisOptimization’ or simply ‘bo’;
calculation.basisOptimization.basisInputFiles: the name of the input file for basis generation used by nanobase.
calculation.basisOptimization.basisFileNames: file name for the optimized basis output. It is named as Element_OrbitalType. Here it is ‘./Si_SZP_LDA.nad’, meaning it is the optimized Si having the OrbitalType ‘SZP_LDA’ This name should be specified in the AtomBlock in the file of Si-nanodcal.inp (next line), in order to use it for the physical property calculation (see similar input files scf.input in previous chapters).
calculation.basisOptimization.systemInputFiles: the name of the input file for calculating the physical property used by NanoDCAL.
calculation.basisOptimization.simplex.MaxFunEvals: This parameter is used in a simplex algorithm implementation to specify the maximum number of times the cost function is called. Default is 1000 times. Usually, it is set to a large number.
calculation.basisOptimization.simplex.TolFun: this parameter is used in the simplex algorithm implementation. It is the tolerance of the cost function value. Since the default cost value for optimization is the “total energy”, the default unit is ‘au’. As users can define their own cost functions, the unit of the tolerance is therefore varied by the user definition.
calculation.basisOptimization.constrainType: it can be ‘hard’ or ‘soft’. If ‘hard’, the basis parameters are strictly confined within a preset optimization domain. If ‘soft’, the basis parameters are allowed to wander to outside the optimization domain with a large punishing cost value.
calculation.basisOptimization.history: ‘true’ or ‘false’. If true, a file named “searchingHistoryOfTheBasisParameters.txt” is generated to record the searching history of the basis parameters.

Input file for basis generation

The input file for basis generation is the input for the atomic package nanobase. In our Si example, it is called “Si-nanobase.inp” as given in the second line of bo.input above, which is for generation of a set of SZP basis. Please refer to the User Reference manual of nanobase for all the details. Here we only very briefly outline it. The essential part of the nanobase input file Si-nanobase.inp has the following blocks:

Block 1: tells nanobase the tasks

I want to calculate all electron atom: yes
I want to generate pseudopotential: yes
I want to check log derivative: yes
I want to generate confined basis: yes
I want to export basis data file: yes

Block 2: parameters for all electron Si atom

AllElectron.Atom = ‘Si’
AllElectron.Shell.Core = ’1s2 2s2 2p6’
AllElectron.Shell.Valence = ’3s2 3p2 3d0 4f0’
AllElectron.XCfunctional = ‘LSDA_PZ81’

Block 3: parameters for generating pseudopotential

PseudoAtom.CoreRadius = [1.75, 1.75, 1.89, 1.89]
PseudoAtom.PartialCore.Method = ”
PseudoAtom.PartialCore.Radius = 0.00

Block 4: parameters for logarithmic derivative

LogDerivative.type = ’AEvsPSvsKB’;
LogDerivative.E = linspace(-3,3,1001)
LogDerivative.R = 4
LogDerivative.L = 0

Block 5: parameters for generating confined basis

NanoOrbital.BasisSetType = ‘Confined’
NanoOrbital.ChargeTransfer = 0.1
NanoOrbital.ConfinementType = ‘nano-1’
NanoOrbital.ConfinedOrbital{1} = {0, [8.0], [1.5], [20]}
NanoOrbital.ConfinedOrbital{2} = {1, [8.0], [1.5], [20]}
NanoOrbital.ConfinedOrbital{3} = {2, [6.0], [1.0], [20]}

While these parameters make little sense here, they are explained and discussed in detail in the User Reference manual of nanobase. The optimization procedure is to adjust some of these parameters in Block-5 to minimize the cost function. Since the ‘Confined’ basis type is most flexible in nanobase, we suggest to use this type for optimization.

Before the optimization run, please run nanobase once with the above input file. Afterward, please change the ‘Yes’ to ‘No’ for the first three tasks in Block-1. This way, we do not re-solve the all electron atom and pseudopotential during the optimization process.

Input file for calculating the physical property

The input file for calculating the physical property is used by NanoDCAL. In previous Chapters we have discussed such input files for various physical properties of different systems such as molecule, crystal and two probe devices. In our Si example (in subdirectory of Si-bulk), it is called “Si-nanodcal.inp‘̈ as given in the forth line of bo.input above, which is for SCF calculation of a Si crystal, as shown in Fig.[si-simplex].

[si-simplex] 1. system.name = ’Si-bulkfcc’
system.centralCellVector1 = 5.4080 * [0.5 0.5 0.0]
system.centralCellVector2 = 5.4080 * [0.0 0.5 0.5]
system.centralCellVector3 = 5.4080 * [0.5 0.0 0.5]
system.atomCoordinateFormat= ’fractional’
system.atomBlock= 2
AtomType X Y Z OrbitalType
Si 0.125000 0.12500 0.125000 SZP_LDA
Si 0.875000 0.87500 0.875000 SZP_LDA
End
calculation.name = {’SCF’}
calculation.control.logOutputLevel= [0 2]
calculation.realspacegrids.E_cutoff= 200 ’a.u.’
calculation.k_spacegrids.L_cutoff= 60 ’a.u.’
calculation.xcFunctional.Type= LDA_PZ81
calculation.occupationFunction.temperature= 300
calculation.SCF.maximumSteps= 100 %default is 50
calculation.SCF.donatorObject= ’NanodcalObject.mat
’calculation.SCF.startingMode= ’realRho’
calculation.SCF.mixingMode= ’H’
calculation.SCF.mixer.class= cMixerBroyden
calculation.SCF.mixer.parameter.beta= 0.1
calculation.SCF.monitoredVariableName= {’hMatrix’, ’rhoMatrix’,
’bandEnergy’, ’gridCharge’,’orbitalCharge’}
calculation.SCF.convergenceCriteria= {1e-4, 1e-4, 1e-5, [],[]}
calculation.bandStructure.symmetryKPoints= {’W’ ’L’ ’G’ ’X’ ’W’
’K’}
calculation.bandStructure.numberOfKPoints= [50 50 50 50 50]

As you are familiar now, there is really no any essential difference between the input file of Fig.[si-simplex] and other input files for SCF calculation discussed in previous Chapters such as those in Figs.[Fe-in-out] or [au-sc]. Here, it is however important to make the AtomType and OrbitalType, lines 7-9 in Fig.[si-simplex], exactly the same as that in the basis optimization input file bo.input (see subsection 1.2.1) above. Namely we must specify OrbitalType to be SZP_LDA in lines 8-9 because that was what we specified in the bo.input.

Line-12, calculation.control.logOutputLevel, is used to limit how much output we will get. NanoDCAL has a smart way to control standard output on the computer screen. In an optimization process, users may not like the screen clogged with hundreds of output lines. Line-12 tells NanoDCAL to not print anything on screen (level 0) but save them in the log file (level 2). Line-18 specifies an object as the donator object to NanoDCAL. In the basis optimization, the input file in Fig.[si-simplex] is called many times, therefore we donate the results of the last step as the initial condition for the present step to save computation time.

Line-11 indicates that the calculation is for self-consistent-field (SCF). It should be noted that the total energy is automatically calculated when the SCF calculation is finished. If the total energy is the only physical property which is used to construct the cost function (like in the cost function class costEnergy which is the default used by NanoDCAL), no other physical property needs to be calculated here. Otherwise, all physical properties which are used to construct the cost function need to be calculated here by including the corresponding calculation name in the parameter calculation.name, e.g., if band structure is used to construct the cost function, line-11 should be: calculation.name = {‘SCF’, ‘BandStructure’}. See :ref:`input_ref` section of NanoDCAL for all the physical property calculations.

User customized cost function

Besides several built-in cost functions, NanoDCAL provides a convenient way for users to define their own cost functions. In principle, any physical quantity can be taken to construct the cost function. For instance, total energy, band structure, band gap, energy levels, transmission coefficients, or combination of these quantities can be adopted depending on the aim of basis optimization. To be more specific, let’s take the difference between the calculated and the target value of the band gap as the cost function to illustrate the optimization process. Assuming one has obtained a benchmark band gap for Si from a large basis method such as a plane wave DFT method. We wish to optimize the finite LCAO basis functions to reduce the error between the calculated band gap and this benchmark.

NanoDCAL provides an application programming interface (API) to create the cost function calculator (see :ref:`input_ref` section). As an example, the procedure of creating such a plug-in class is as follows:

Create a folder named, for example, “@costBandgap”, where the costBandgap is the class name. Make sure it is in the searching path of Matlab. In fact, you shall find that this folder already exists in /nanodcal/classes/basisoptimization.
cd @costBandgap
Create two files named: “costBandgap.m” and “startCalculation.m”. The function costBandgap.m is the constructor which will be called when an object of the class is constructed, with some proper construction parameters as an input. The startCalculation.m is a public function of the class, which is required by the API supplied by NanoDCAL. In fact, the P-code version of these files already exist. But let’s create the m-code as a practice.

In the costBandgap.m, we define the class of cost function for band gap as follows:

function this = costBandgap(cp)
try
this.standardbandgap =cp.standardbandgap;
catch
error(’you have to specify the standard band gap.’);
end
this = class(this, ’costBandgap’);
end

where the cp is the construction parameter, which should be a structure with a field named as standardbandgap, otherwise an error message will be given. The value of the field of standardbandgap should be a double number (or an \(n\times 1\) double array when deal with \(n\) systems simultaneously); when an object of the class is constructed, an user may pass the target benchmark band gap (standard band gap) to the object, so that the target benchmark band gap is available in the public function startCalculation.

In “startCalculation.m”, it calculates the cost value which is the difference between the benchmark band gap (standard band gap) and the calculated band gap using the LCAO basis. The startCalculation.m is as follows:

function costvalue = startCalculation(this,systemInputFiles)
for ii = 1:length(systemInputFiles)
pathstr = fileparts(systemInputFiles{ii});
fullfilename = fullfile(pathstr, ’BandStructure.mat’);
try
bandstructure = load(fullfilename);
gap(ii) = bandstructure.data.bandGap*27.21138386; errorbandgap(ii) =
abs(gap(ii)-this.standardbandgap(ii));
catch
error([’Can not get the band structure in ’, fullfilename, ’.’])
end
end
costvalue = sum(errorbandgap);
end

where the bandstructure.data.bandGap is the calculated band gap in unit of Hartree, and the this.standardbandgap is the benchmark band gap in unit of eV.

This way, we have created a customized plug-in cost function class that can be used by NanoDCAL in the optimization by claiming it in the input file. For the present example, to optimize the Si SZP basis with our customized plug-in cost function class, please proceed as follows:

First of all, it must be declared in the input file to tell NanoDCAL to use the customized plug-in cost function class, by adding the following two lines into the file bo.input of Section1.2.1:

calculation.basisOptimization.costCalculator.class = ’costBandgap’
calculation.basisOptimization.costCalculator.param
eter.standardbandgap = 0.4522

The first line claims the class costBandgap, our customized plug-in cost function, to be the cost function calculator used in the optimization, i.e. using the error between the calculated band gap and a benchmark as the cost value. Nanodcal provides other classes and users can also plug-in other calculators for the cost function evaluation just like what we are doing here. Please see :ref:`input_ref` section.

The second line provides the construction parameter of the class claimed above, which is the value of the benchmark band gap for the present class costBandgap. It supposes to be the accurate value, but here it is the LDA value of Si obtained by, for instance plane wave DFT methods. These two extra lines tell NanoDCAL to use the customized cost function class and its construction parameter.

Secondly, all physical properties used in the cost function must be calculated in each optimization step. Therefore, the input file for calculating the physical property should be well prepared for the tasks. In the present example, line-11 of file Si-nanodcal.inp in Fig.**`[si-simplex] <#si-simplex>`__**, namely,

calculation.name = {‘SCF’}

should be changed to:

calculation.name = {‘SCF’,‘BS’}

The band structure will now be calculated in each step of the optimization such that the cost value of \(|\)bandGap\(-\)standardbandgap\(|\) defined in the above startCalculation.m can be evaluated.

After these additions to the input files, type:

nanodcal bo.input

NanoDCAL and nanobase will combine force to generate the Si basis SZP functions optimized by the band gap cost function. You can of course do a parallel run as specified in Section 1.1 above. The basis optimization using band gap as the cost function is perhaps the simplest, it is useful for semiconductors or insulators. For the present example, using Si fcc lattice (\(a = 5.4080\)Å) as the system and band gap as the cost function, even SZP basis can capture the Si band structure extremely well. This is especially important for bands near the Fermi level because quantum transport depends these bands sensitively. Fig.[bandopt] plots the comparison of the band structure obtained with different sets of basis to that obtained by the VASP electronic package:raw-latex:cite{basis1}: substantial improvement is evident.

Importantly, in order to achieve good basis sets, one can define more complicated cost functions. For instance, cost function can be an entire band structure of a crystal, or many molecular levels of a molecule. The essential work on the user side is to mimic the codes in Section 1.3 and create your own cost function.

So far we focused on optimizing basis functions. The idea is that after the basis is optimized, one moves forward to do research with the fixed basis. From another point of view, the basis optimization technique of NanoDCAL can be used to do research with a dynamically changing basis. Namely the SCF calculation can be done with a dynamic (called variable) basis.