trainset.in file — Training set and cost-function definition
The trainset.in file defines a training set (cost function) used for force field optimization in ReaxFF.
It compares ReaxFF-computed properties (charges, energies, geometries, cell parameters) against literature or quantum-chemistry (QC) reference values and assigns a weight to each data point.
ReaxFF uses structure identifiers defined in:
- the DESCRP field of .bgf geometry files, or
- the models.in file
Key properties
- Format-free (fields must be space-separated)
- Lines starting with # are comments
- Identifiers must not contain the symbols -, +, or / (they have special meaning)
- The file is divided into five sections, each starting with a keyword and ending with END<KEYWORD>
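The rules above can be sketched as a small reader that drops comments and groups the remaining lines by section. This is only an illustrative sketch, not the parser ReaxFF itself uses: the function name split_sections is hypothetical, and treating everything after a # as a comment (not only lines that start with #) is an assumption based on the trailing comments in the example file.

```python
# Hypothetical helper: group the data lines of a trainset.in text by section.
SECTION_KEYWORDS = ("CHARGE", "HEATFO", "GEOMETRY", "CELL PARAMETERS", "ENERGY")


def split_sections(text):
    """Return {keyword: [data lines]} with comments and blank lines removed."""
    sections = {}
    current = None  # keyword of the section we are inside, if any
    for raw in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Compare without spaces so "CELL PARAMETERS" and its END line match.
        compact = line.replace(" ", "").upper()
        if current is None:
            for keyword in SECTION_KEYWORDS:
                if compact == keyword.replace(" ", ""):
                    current = keyword
                    sections[current] = []
                    break
        elif compact == "END" + current.replace(" ", ""):
            current = None
        else:
            sections[current].append(line)
    return sections


sample = """CHARGE
# Iden Weight Atom Lit
chexane 0.1 1 -0.15
ENDCHARGE
CELL PARAMETERS
chex_cryst 0.01 a 11.20
ENDCELL PARAMETERS
"""
print(split_sections(sample))
```

Lines outside any recognized section are silently ignored in this sketch; a stricter checker might report them instead.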
Sections overview
- CHARGE — fixed atomic charges
- HEATFO — heat of formation
- GEOMETRY — bonds, angles, torsions, or force RMSG
- CELL PARAMETERS — lattice constants and angles
- ENERGY — relative energy differences between structures
Each section communicates a specific data type to ReaxFF.
General data-line structure (most sections)
For CHARGE, HEATFO, GEOMETRY, and CELL PARAMETERS, each data line follows:
<Identifier> <Weight> <Type> <Reference value>
Where:
- Identifier — structure name (from DESCRP or models.in)
- Weight — contribution to the total cost function
- Type — depends on section (see below); omitted in HEATFO and for RMSG entries in GEOMETRY
- Reference value — literature or QC value
Section details
CHARGE section
Defines reference atomic charges.
Format
<Iden> <Weight> <Atom number> <Reference charge>
HEATFO section
Defines reference heats of formation.
Format
<Iden> <Weight> <Reference heat of formation>
No type identifier is required in this section.
GEOMETRY section
Defines reference geometric targets.
Supported targets:
- Bond lengths — At1 At2
- Valence angles — At1 At2 At3
- Torsion angles — At1 At2 At3 At4
- Force RMSG — omit atom identifiers
Format examples
<Iden> <Weight> <At1> <At2> <Ref> # bond
<Iden> <Weight> <At1> <At2> <At3> <Ref> # angle
<Iden> <Weight> <At1> <At2> <At3> <At4> <Ref> # torsion
<Iden> <Weight> <Ref> # RMSG
If no atom identifiers are provided, ReaxFF compares the RMSG of forces.
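Because the target type in GEOMETRY is implied by how many atom numbers sit between the weight and the reference value, a line can be classified by counting fields. The following is a minimal sketch under that reading; parse_geometry_line and the returned dictionary keys are hypothetical names, not part of ReaxFF.

```python
# Hypothetical helper: classify one GEOMETRY data line by its atom count.
KINDS = {0: "rmsg", 2: "bond", 3: "angle", 4: "torsion"}


def parse_geometry_line(line):
    """Split '<Iden> <Weight> [atoms...] <Ref>' into a small dictionary."""
    fields = line.split("#", 1)[0].split()  # assumption: '#' starts a comment
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    atoms = tuple(int(a) for a in fields[2:-1])  # 0, 2, 3 or 4 atom numbers
    if len(atoms) not in KINDS:
        raise ValueError(f"unexpected number of atoms: {line!r}")
    return {
        "iden": fields[0],
        "weight": float(fields[1]),
        "kind": KINDS[len(atoms)],
        "atoms": atoms,
        "ref": float(fields[-1]),
    }


print(parse_geometry_line("chexane 0.01 1 2 1.54 # bond"))
print(parse_geometry_line("chexane 1.00 0.01 # RMSG"))
```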
CELL PARAMETERS section
Defines reference lattice parameters.
Format
<Iden> <Weight> <Type> <Reference value>
Where Type is one of:
- a, b, c
- alpha, beta, gamma
ENERGY section
Defines relative energy differences between structures.
Format
<Weight> <op1> <Id1>/<n1> <op2> <Id2>/<n2> ... <Reference energy>
Where:
- Weight — contribution to the cost function
- op — + or - (+ is default)
- Id/n — structure identifier whose energy is divided by the integer n (the /n part is optional)
- Reference energy — literature/QC value
The /n divider enables comparison between condensed-phase systems and monomers.
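To make the ENERGY line layout concrete, the sketch below splits a line into its weight, signed terms, and reference energy, then evaluates the combination from a table of per-structure energies. The function names parse_energy_line and combine are hypothetical, the energies in the usage example are invented placeholders rather than ReaxFF output, and the sketch assumes every operator is a separate space-delimited field.

```python
# Hypothetical helpers for ENERGY lines: '<Weight> <op> <Id>/<n> ... <Ref>'.
def parse_energy_line(line):
    """Return (weight, [(sign, identifier, divisor), ...], reference)."""
    tokens = line.split("#", 1)[0].split()  # assumption: '#' starts a comment
    if len(tokens) < 3:
        raise ValueError(f"too few fields: {line!r}")
    weight, ref = float(tokens[0]), float(tokens[-1])
    terms, sign = [], 1  # '+' is the default operator
    for tok in tokens[1:-1]:
        if tok in ("+", "-"):
            sign = 1 if tok == "+" else -1
            continue
        iden, _, divisor = tok.partition("/")
        terms.append((sign, iden, int(divisor) if divisor else 1))
        sign = 1  # reset to the default for the next term
    return weight, terms, ref


def combine(terms, energies):
    """Sum sign * E(identifier) / n over all terms."""
    return sum(sign * energies[iden] / n for sign, iden, n in terms)


weight, terms, ref = parse_energy_line("1.0 + chex_cryst/16 - chexane/1 -11.83")
# Placeholder energies, chosen only to illustrate the arithmetic.
placeholder = {"chex_cryst": -1600.0, "chexane": -90.0}
print(terms, combine(terms, placeholder), ref)
```

Since identifiers may not contain -, +, or /, a bare + or - token can be read unambiguously as an operator, and the last field as the reference energy even when it is negative.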
Example: trainset.in file
CHARGE
# Iden Weight Atom Lit
chexane 0.1 1 -0.15
ENDCHARGE
HEATFO
# Iden Weight Lit
methane 2.00 -17.80
chexane 2.00
ENDHEATFO
GEOMETRY
# Iden Weight At1 At2 At3 At4 Lit
chexane 0.01 1 2 1.54 # bond
chexane 1.00 1 2 3 111.0 # valence angle
chexane 1.00 1 2 3 4 56.0 # torsion angle
chexane 1.00 0.01 # RMSG
ENDGEOMETRY
CELL PARAMETERS
# Iden Weight Type Lit
chex_cryst 0.01 a 11.20
ENDCELL PARAMETERS
ENERGY
# Weight op Ide1/n1 op Ide2/n2 Lit
# alfa vs beta vs gamma cleavage in butylbenzene
1.5 + butbenz/1 - butbenz_a/1 -90.00
1.5 + butbenz/1 - butbenz_b/1 -71.00
1.5 + butbenz/1 - butbenz_c/1 -78.00
# cyclohexane heat of vaporization
1.0 + chex_cryst/16 - chexane/1 -11.83
ENDENERGY
Interpretation example (ENERGY section)
The final ENERGY entry computes the heat of vaporization of cyclohexane:
- Divide crystal energy by 16 monomers
- Subtract gas-phase monomer energy
- Compare against −11.83 kcal/mol
The /n divider is optional, so a term such as chexane/1 could be written without it, but including it improves clarity.
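Written as an equation, the quantity compared against the reference in this entry is the following, where E(X) denotes the ReaxFF energy of structure X (the symbols are notation introduced here for illustration, not taken from the ReaxFF manual):

```latex
\Delta E_{\text{calc}}
  = \frac{E(\text{chex\_cryst})}{16} - \frac{E(\text{chexane})}{1},
\qquad
\Delta E_{\text{ref}} = -11.83\ \text{kcal/mol}
```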
Output generated by ReaxFF
After running ReaxFF on all referenced structures:
- fort.13 — detailed training set evaluation
- fort.99 — summarized cost-function contributions
(See the Output section of the manual for details.)
Known issues and best practices
Bugs / limitations
- Reference values of exactly 0.00 are not allowed
  - Use a small value instead (e.g. 0.0001)
- Structure identifiers must be unique
  - Reusing an identifier can cause incorrect ENERGY comparisons
  - ReaxFF does not currently warn about this
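Since ReaxFF does not warn about either problem, a small pre-flight check outside ReaxFF can catch them. The sketch below is one possible approach with hypothetical function names; it assumes the structure identifiers have already been collected (for example from the DESCRP fields or models.in) into a list, and that the reference value is the last field of each data line.

```python
from collections import Counter


def duplicate_identifiers(identifiers):
    """Return the structure identifiers that occur more than once."""
    counts = Counter(identifiers)
    return sorted(name for name, n in counts.items() if n > 1)


def zero_reference_lines(data_lines):
    """Return data lines whose last field parses as exactly zero."""
    flagged = []
    for line in data_lines:
        fields = line.split("#", 1)[0].split()  # assumption: '#' starts a comment
        if not fields:
            continue
        try:
            ref = float(fields[-1])
        except ValueError:
            continue  # last field is not numeric; leave it to other checks
        if ref == 0.0:
            flagged.append(line)
    return flagged


print(duplicate_identifiers(["chexane", "methane", "chexane"]))
print(zero_reference_lines(["chexane 1.00 1 2 3 4 0.00 # torsion",
                            "methane 2.00 -17.80"]))
```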
Best practices
- Use clear, unique identifiers
- Start with modest weights and scale gradually
- Validate each section independently before full optimization
Summary
- trainset.in defines the optimization cost function
- Supports charges, energies, geometries, cell parameters
- Central to ReaxFF force-field fitting workflows
- Produces fort.13 and fort.99 for analysis