trainset.in file — Training set and cost-function definition

The trainset.in file defines a training set (cost function) used for force field optimization in ReaxFF.

It compares ReaxFF-computed properties (charges, energies, geometries, cell parameters) against literature or quantum-chemistry (QC) reference values and assigns weights to each data point.

ReaxFF uses structure identifiers defined in either of two places:

  • the DESCRP field of .bgf geometry files
  • the models.in file


Key properties

  • Format-free (fields must be space-separated)
  • Lines starting with # are comments
  • Identifiers must not contain the symbols -, +, or / (they have special meaning)
  • The file is divided into five sections, each starting with a keyword and ending with END<KEYWORD>
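
The rules above can be sketched as a small reader that groups data lines by section. This is an illustrative Python sketch, not part of ReaxFF: the function name split_sections is hypothetical, everything after a # is treated as a comment (the example file uses trailing comments this way), and END lines are matched with spaces ignored, which is an assumption made here for leniency.

```python
SECTION_KEYWORDS = ("CHARGE", "HEATFO", "GEOMETRY", "CELL PARAMETERS", "ENERGY")


def split_sections(text):
    """Group the data lines of a trainset.in-style text by section keyword."""
    sections = {}
    current = None
    for raw in text.splitlines():
        # Drop comments (assumed: everything after '#') and surrounding spaces.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        normalized = " ".join(line.split())
        if current is None and normalized in SECTION_KEYWORDS:
            # A section keyword opens a new section.
            current = normalized
            sections[current] = []
        elif current is not None and (
            normalized.replace(" ", "") == "END" + current.replace(" ", "")
        ):
            # END<KEYWORD> closes the open section (spaces ignored, an assumption).
            current = None
        elif current is not None:
            # Any other line inside a section is a space-separated data line.
            sections[current].append(line.split())
    return sections
```

Each section maps to a list of token lists, so later checks can work on one section at a time.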

Sections overview

  1. CHARGE — fixed atomic charges
  2. HEATFO — heat of formation
  3. GEOMETRY — bonds, angles, torsions, or force RMSG
  4. CELL PARAMETERS — lattice constants and angles
  5. ENERGY — relative energy differences between structures

Each section communicates a specific data type to ReaxFF.


General data-line structure (most sections)

For CHARGE, HEATFO, GEOMETRY, and CELL PARAMETERS, each data line follows the pattern below. HEATFO lines and GEOMETRY RMSG lines omit the Type field.

<Identifier> <Weight> <Type> <Reference value>

Where:

  • Identifier: structure name (from DESCRP or models.in)
  • Weight: contribution to the total cost function
  • Type: depends on the section (see below)
  • Reference value: literature or QC value


Section details

CHARGE section

Defines reference atomic charges.

Format

<Iden> <Weight> <Atom number> <Reference charge>

HEATFO section

Defines reference heats of formation.

Format

<Iden> <Weight> <Reference heat of formation>

No type identifier is required in this section.


GEOMETRY section

Defines reference geometric targets.

Supported targets:

  • Bond lengths: At1 At2
  • Valence angles: At1 At2 At3
  • Torsion angles: At1 At2 At3 At4
  • Force RMSG: omit atom identifiers

Format examples

<Iden> <Weight> <At1> <At2> <Ref>            # bond
<Iden> <Weight> <At1> <At2> <At3> <Ref>      # angle
<Iden> <Weight> <At1> <At2> <At3> <At4> <Ref> # torsion
<Iden> <Weight> <Ref>                        # RMSG

If no atom identifiers are provided, ReaxFF compares the RMSG of forces.
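
Because the target type is implied by how many atom numbers appear between the weight and the reference value, a GEOMETRY line can be classified by counting fields. The following Python sketch illustrates that idea; parse_geometry_line and the returned dictionary layout are hypothetical names chosen here, and atom identifiers are assumed to be integer atom numbers as in the example file.

```python
# Number of atom fields -> kind of geometric target.
GEOMETRY_KINDS = {0: "rmsg", 2: "bond", 3: "angle", 4: "torsion"}


def parse_geometry_line(line):
    """Classify one GEOMETRY data line by its number of atom fields."""
    # Ignore anything after '#' (assumed to be a trailing comment).
    tokens = line.split("#", 1)[0].split()
    if len(tokens) < 3:
        raise ValueError(f"too few fields in GEOMETRY line: {line!r}")
    # First field is the identifier, second the weight, last the reference.
    identifier, weight, *atoms, reference = tokens
    kind = GEOMETRY_KINDS.get(len(atoms))
    if kind is None:
        raise ValueError(f"unexpected number of atom fields: {line!r}")
    return {
        "identifier": identifier,
        "weight": float(weight),
        "kind": kind,
        "atoms": tuple(int(a) for a in atoms),
        "reference": float(reference),
    }
```

A line with exactly one atom field has no meaning in this scheme, so the sketch rejects it instead of guessing.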


CELL PARAMETERS section

Defines reference lattice parameters.

Format

<Iden> <Weight> <Type> <Reference value>

Where Type is one of:

  • a, b, c (lattice constants)
  • alpha, beta, gamma (lattice angles)


ENERGY section

Defines relative energy differences between structures.

Format

<Weight> <op1> <Id1>/<n1> <op2> <Id2>/<n2> ... <Reference energy>

Where:

  • Weight: contribution to the cost function
  • op: + or - (+ is the default)
  • Id/n: structure identifier, optionally divided by an integer n
  • Reference energy: literature or QC value

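The /n divider scales a structure's energy to a per-unit basis, which enables comparison between condensed-phase systems (for example, a crystal cell containing n molecules) and single monomers.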
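
To make the ENERGY format concrete, the Python sketch below splits a line into its weight, signed terms, and reference energy, then evaluates the signed sum for a given set of energies. The names parse_energy_line and evaluate_terms are hypothetical. The sketch assumes that operators appear as separate space-delimited tokens (as in the example file) and that an omitted divider means division by 1.

```python
def parse_energy_line(line):
    """Split an ENERGY data line into (weight, terms, reference)."""
    # Ignore anything after '#' (assumed to be a trailing comment).
    tokens = line.split("#", 1)[0].split()
    if len(tokens) < 3:
        raise ValueError(f"too few fields in ENERGY line: {line!r}")
    weight = float(tokens[0])
    reference = float(tokens[-1])
    terms = []
    sign = 1  # '+' is the default operator
    for token in tokens[1:-1]:
        if token in ("+", "-"):
            sign = 1 if token == "+" else -1
            continue
        # Identifiers never contain '/', so the first '/' starts the divider.
        identifier, _, divisor = token.partition("/")
        terms.append((sign, identifier, int(divisor) if divisor else 1))
        sign = 1  # reset to the default for the next term
    return weight, terms, reference


def evaluate_terms(terms, energies):
    """Signed sum of per-structure energies, each divided by its divider."""
    return sum(sign * energies[name] / n for sign, name, n in terms)
```

Given a dictionary of computed energies keyed by identifier, evaluate_terms returns the quantity that would be compared against the reference energy on that line.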


Example: trainset.in file

CHARGE
# Iden     Weight  Atom  Lit
chexane   0.1     1     -0.15
ENDCHARGE

HEATFO
# Iden     Weight  Lit
methane   2.00    -17.80
chexane   2.00
ENDHEATFO

GEOMETRY
# Iden     Weight  At1 At2 At3 At4  Lit
chexane   0.01    1   2        1.54     # bond
chexane   1.00    1   2   3    111.0    # valence angle
chexane   1.00    1   2   3   4 56.0    # torsion angle
chexane   1.00    0.01                 # RMSG
ENDGEOMETRY

CELL PARAMETERS
# Iden        Weight  Type  Lit
chex_cryst   0.01    a     11.20
ENDCELL PARAMETERS

ENERGY
# Weight op  Ide1/n1    op  Ide2/n2    Lit
# alpha vs beta vs gamma cleavage in butylbenzene
1.5 + butbenz/1 - butbenz_a/1 -90.00
1.5 + butbenz/1 - butbenz_b/1 -71.00
1.5 + butbenz/1 - butbenz_c/1 -78.00

# cyclohexane heat of vaporization
1.0 + chex_cryst/16 - chexane/1 -11.83
ENDENERGY

Interpretation example (ENERGY section)

The final ENERGY entry computes the heat of vaporization of cyclohexane:

  • Divide the energy of chex_cryst by 16, giving the crystal energy per monomer
  • Subtract the energy of the gas-phase monomer chexane
  • Compare the result against the reference value of −11.83 kcal/mol

The /n divider is optional; writing it explicitly, even as /1, makes the scaling of each term easier to read.
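
Written as a formula, the quantity evaluated for this entry is shown below. The symbols are notation introduced here for illustration: E with a subscript denotes the ReaxFF energy of the structure with that identifier, and ΔE_ref is the reference value from the last field of the line.

```latex
\Delta E = \frac{E_{\text{chex\_cryst}}}{16} - \frac{E_{\text{chexane}}}{1},
\qquad
\Delta E_{\text{ref}} = -11.83\ \text{kcal/mol}
```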


Output generated by ReaxFF

After ReaxFF has run on all referenced structures, it writes two output files:

  • fort.13 — detailed training set evaluation
  • fort.99 — summarized cost-function contributions

(See the Output section of the manual for details.)


Known issues and best practices

Bugs / limitations

  • Reference values of exactly 0.00 are not allowed; use a small value instead (e.g. 0.0001)
  • Structure identifiers must be unique; reusing an identifier can cause incorrect ENERGY comparisons, and ReaxFF does not currently warn about this
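
Since ReaxFF gives no warning for these cases, a small pre-flight check can catch them before an optimization run. The Python sketch below is one possible approach, not a ReaxFF feature: find_zero_references and find_duplicate_identifiers are hypothetical helpers, the first assumes it receives data lines only (no section keywords) with the reference value as the last field, and the second assumes the identifiers have already been collected from the geometry files or models.in.

```python
from collections import Counter


def find_zero_references(data_lines):
    """Return the data lines whose reference value (last field) is exactly zero."""
    flagged = []
    for line in data_lines:
        # Ignore anything after '#' (assumed to be a trailing comment).
        tokens = line.split("#", 1)[0].split()
        if tokens and float(tokens[-1]) == 0.0:
            flagged.append(line)
    return flagged


def find_duplicate_identifiers(identifiers):
    """Return, in sorted order, every identifier that occurs more than once."""
    counts = Counter(identifiers)
    return sorted(name for name, count in counts.items() if count > 1)
```

Flagged lines can then be edited by hand, for example by replacing 0.00 with 0.0001 as suggested above.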

Best practices

  • Use clear, unique identifiers
  • Start with modest weights and scale gradually
  • Validate each section independently before full optimization

Summary

  • trainset.in defines the optimization cost function
  • Supports charges, energies, geometries, cell parameters
  • Central to ReaxFF force-field fitting workflows
  • Produces fort.13 and fort.99 for analysis