Trainset Generator

Trainset generation utilities for ReaxFF parameter training.

This module provides end-to-end helpers for generating elastic-energy training targets (bulk EOS and elastic constants), optional strained geometries, YAML-based configuration, and Materials Project-based bootstrapping for trainset creation.

Typical use cases include:

  • generating trainset_elastic.in plus energy tables (E vs strain/volume)
  • generating strained XYZ/GEO structures for ReaxFF runs
  • writing/reading a trainset_elastic.yaml settings file
  • creating a ready-to-run trainset from a Materials Project material ID

generate_all_energy_vs_volume_data(*, out_dir, bulk_inputs, elastic_inputs, bulk_cell, elastic_volume_cell=None, bulk_options=None, elastic_options=None, trainset_filename='trainset_elastic.in')

Write bulk and elastic energy targets to trainset and table files.

High-level generator that:

  1. generates bulk + elastic energy targets
  2. writes:
    • trainset_elastic.in
    • EvsStrain_bulk.dat
    • EvsStrain_c11.dat ... EvsStrain_c66.dat

This function performs both generation and writing and returns None.

Works on

Elastic-energy training targets written to disk (trainset + tables)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| out_dir | str | Output directory to write files into. | required |
| bulk_inputs | dict | Bulk target inputs with keys such as B0_gpa, B0_prime, and max_volumetric_strain_percent. | required |
| elastic_inputs | dict | Elastic target inputs including max_strain_percent and cij values. | required |
| bulk_cell | dict | Bulk reference cell with keys: a, b, c, alpha, beta, gamma. | required |
| elastic_volume_cell | dict or None | Cell used to compute volume prefactors for elastic targets. If None, uses bulk_cell. | None |
| bulk_options | dict or None | Optional overrides (e.g., linear_strain_step, reference_energy). | None |
| elastic_options | dict or None | Optional overrides (e.g., strain_step). | None |
| trainset_filename | str | Output trainset file name. | 'trainset_elastic.in' |

Returns:

| Type | Description |
| --- | --- |
| None | Writes trainset_elastic.in and E-vs-strain/volume tables to out_dir. |

Examples:

>>> generate_all_energy_vs_volume_data(
...     out_dir="out",
...     bulk_inputs={"B0_gpa": 180, "B0_prime": 4.0, "max_volumetric_strain_percent": 6.0},
...     elastic_inputs={"max_strain_percent": 3.0, "c11": 300, "c22": 300, "c33": 250,
...                    "c12": 120, "c13": 140, "c23": 140, "c44": 80, "c55": 80, "c66": 60},
...     bulk_cell={"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90},
... )
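
The inputs above (B0, B0', the cij values, and a reference cell) are enough to build energy-vs-strain targets. The following is only a minimal sketch of how such targets could be computed; it is not taken from the module. It assumes a third-order Birch-Murnaghan equation of state for the bulk curve and a harmonic energy 0.5 * V * Cij * strain^2 for a single strain component, with energies in eV. The helper names (cell_volume, birch_murnaghan_energy, harmonic_strain_energy) are hypothetical, and the module may use a different EOS, different units, or different strain conventions.

```python
import math

# 1 GPa * 1 Å^3 expressed in eV (1e9 Pa * 1e-30 m^3 / elementary charge).
GPA_A3_TO_EV = 1.0e9 * 1.0e-30 / 1.602176634e-19


def cell_volume(cell):
    """Volume (Å^3) of a cell given a, b, c (Å) and alpha, beta, gamma (degrees)."""
    ca, cb, cg = (math.cos(math.radians(cell[k])) for k in ("alpha", "beta", "gamma"))
    return cell["a"] * cell["b"] * cell["c"] * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


def birch_murnaghan_energy(volume, v0, b0_gpa, b0_prime, e0=0.0):
    """Third-order Birch-Murnaghan energy (eV). Assumed EOS, not confirmed by the module."""
    ratio = (v0 / volume) ** (2.0 / 3.0)
    x = ratio - 1.0
    prefactor = 9.0 * v0 * b0_gpa * GPA_A3_TO_EV / 16.0
    return e0 + prefactor * (x ** 3 * b0_prime + x ** 2 * (6.0 - 4.0 * ratio))


def harmonic_strain_energy(strain, cij_gpa, volume):
    """Harmonic energy (eV) of a single strain component: 0.5 * V * Cij * strain^2."""
    return 0.5 * volume * cij_gpa * GPA_A3_TO_EV * strain ** 2


# Same reference cell and moduli as in the example call above.
cell = {"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90}
v0 = cell_volume(cell)

for eps_v in (-0.06, -0.03, 0.0, 0.03, 0.06):
    energy = birch_murnaghan_energy(v0 * (1.0 + eps_v), v0, 180.0, 4.0)
    print(f"volumetric strain {eps_v:+.2f}: E = {energy:.5f} eV")

print("c11 at 3% strain:", harmonic_strain_energy(0.03, 300.0, v0), "eV")
```

For small volumetric strains the Birch-Murnaghan curve reduces to 0.5 * V0 * B0 * strain^2, so the bulk and elastic tables in this sketch are consistent near equilibrium.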

generate_strained_geometries_with_xtob(*, elastic_xyz, bulk_xyz, elastic_cell, bulk_cell, max_strain_elastic, dstrain_elastic, max_strain_bulk_linear, dstrain_bulk_linear, out_dir, sort_by=None)

Generate strained XYZ structures and convert them to GEO via xtob.

Creates strained XYZ files (with comment=title on line 2) and converts each to GEO using xtob().

Output folders:

  • out_dir/xyz_strained/ (strained .xyz files)
  • out_dir/geo_strained/ (converted .bgf files)

Works on

XYZ input structures + GEO/XTLGRF outputs via xtob

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| elastic_xyz | str or Path | Base XYZ used for elastic strain modes. | required |
| bulk_xyz | str or Path or None | Base XYZ used for bulk mode. Must be passed explicitly; pass None to reuse elastic_xyz. | required |
| elastic_cell | dict | Elastic reference cell with keys: a, b, c, alpha, beta, gamma. | required |
| bulk_cell | dict | Bulk reference cell with keys: a, b, c, alpha, beta, gamma. | required |
| max_strain_elastic | float | Maximum absolute linear strain for elastic modes (unitless). | required |
| dstrain_elastic | float | Linear strain step for elastic modes (unitless). | required |
| max_strain_bulk_linear | float | Maximum absolute linear bulk strain (unitless). | required |
| dstrain_bulk_linear | float | Linear bulk strain step (unitless). | required |
| out_dir | str or Path | Output directory where xyz_strained and geo_strained are created. | required |
| sort_by | str or None | Sorting key passed to xtob (e.g., "z"). | None |

Returns:

| Type | Description |
| --- | --- |
| dict[str, list[Path]] | Mapping from mode name (e.g., "bulk", "c11") to the written GEO paths. |

Examples:

>>> cell = {"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90}
>>> out = generate_strained_geometries_with_xtob(
...     elastic_xyz="ground_elastic.xyz",
...     bulk_xyz=None,
...     elastic_cell=cell,
...     bulk_cell=cell,
...     max_strain_elastic=0.02,
...     dstrain_elastic=0.005,
...     max_strain_bulk_linear=0.01,
...     dstrain_bulk_linear=0.004,
...     out_dir="out",
... )
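
The max-strain and step arguments define a grid of strained cells for each mode. The sketch below illustrates one plausible way to build such a grid; it is not the module's implementation. It assumes the grid is symmetric around zero, contains only multiples of the step that do not exceed the maximum, and that a linear strain scales cell lengths by (1 + strain). The helper names and the choice of which axes are strained per mode are hypothetical.

```python
import math


def symmetric_strain_grid(max_strain, step):
    """Strains n * step for integer n with |n * step| <= max_strain (zero included)."""
    n_max = int(math.floor(max_strain / step + 1e-9))
    return [round(n * step, 12) for n in range(-n_max, n_max + 1)]


def scale_cell_lengths(cell, strain, axes=("a", "b", "c")):
    """Return a copy of `cell` with the chosen lengths multiplied by (1 + strain)."""
    scaled = dict(cell)
    for axis in axes:
        scaled[axis] = cell[axis] * (1.0 + strain)
    return scaled


cell = {"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90}

# Same limits and steps as in the example call above.
elastic_strains = symmetric_strain_grid(0.02, 0.005)
bulk_strains = symmetric_strain_grid(0.01, 0.004)

# Assumption: bulk mode scales all three lengths; a c11-like mode scales only "a".
bulk_cells = [scale_cell_lengths(cell, eps) for eps in bulk_strains]
c11_cells = [scale_cell_lengths(cell, eps, axes=("a",)) for eps in elastic_strains]

print(len(elastic_strains), "elastic strain points;", len(bulk_strains), "bulk strain points")
```

Under these assumptions a maximum that is not a multiple of the step (0.01 with step 0.004) is never reached; the outermost bulk strain is 0.008.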

generate_trainset_from_yaml(yaml_path, out_dir, *, place_all_outputs_in_out_dir=True, copy_input_xyz_into_out_dir=True)

Generate a trainset and optional strained geometries from a YAML settings file.

Works on

YAML settings + XYZ inputs (optional) → trainset files and strained structures

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| yaml_path | str | Path to the trainset settings YAML file. | required |
| out_dir | str | Output directory for generated files. | required |
| place_all_outputs_in_out_dir | bool | If True, place all generated outputs (including geometry outputs) in out_dir. | True |
| copy_input_xyz_into_out_dir | bool | If True, copy input XYZ files into the output directory when geometry generation is enabled. | True |

Returns:

| Type | Description |
| --- | --- |
| None | Writes trainset files and (optionally) strained XYZ/GEO files to disk. |

Examples:

>>> generate_trainset_from_yaml("trainset_elastic.yaml", out_dir="out")

generate_trainset_settings_yaml_from_mp_simple(*, mp_id, out_yaml, structure_dir=None, bulk_mode='vrh', api_key=None, verbose=True)

Generate a trainset settings YAML and structures from a Materials Project ID.

Minimal MP -> (structure + mechanics) -> CIF -> XYZ -> trainset_settings.yaml.

  • Fetches: structure, lattice (a,b,c,alpha,beta,gamma), elastic tensor (6x6), bulk modulus.
  • Writes: .cif and .xyz
  • Writes YAML where:
    • elastic_cell == bulk_cell == MP lattice
    • structure.elastic_xyz == structure.bulk_xyz == generated XYZ
    • geo.enable is set to true (since the GEO files are generated from the XYZ)

Works on

Materials Project API + structure files (CIF/XYZ) + trainset settings YAML

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mp_id | str | Materials Project material ID (e.g., "mp-661"). | required |
| out_yaml | str or Path | Output YAML path to write. | required |
| structure_dir | str or Path or None | Directory to write structure files (CIF/XYZ). If None, uses the folder containing the YAML file. | None |
| bulk_mode | 'voigt', 'reuss', or 'vrh' | Which bulk modulus value (Voigt, Reuss, or Voigt-Reuss-Hill average) to store in the YAML. | 'vrh' |
| api_key | str or None | Materials Project API key. If None, uses the MP_API_KEY environment variable. | None |
| verbose | bool | If True, print written paths to stdout. | True |

Returns:

| Type | Description |
| --- | --- |
| dict[str, str] | Mapping with keys "cif", "xyz", and "yaml" pointing to the written file paths. |

Examples:

>>> out = generate_trainset_settings_yaml_from_mp_simple(
...     mp_id="mp-661",
...     out_yaml="trainset_elastic.yaml",
... )
>>> sorted(out.keys())
['cif', 'xyz', 'yaml']
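
The three bulk_mode options correspond to standard averages of a 6x6 elastic tensor. As an illustration only, the sketch below computes the Voigt, Reuss, and Voigt-Reuss-Hill bulk moduli from a stiffness matrix in Voigt notation using the textbook formulas. Materials Project already supplies these values, so the function above presumably reads them rather than recomputing them; the helper names here are hypothetical.

```python
def invert_matrix(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def bulk_modulus(c, mode="vrh"):
    """Bulk modulus (same units as c) from a 6x6 stiffness matrix in Voigt notation."""
    k_voigt = (c[0][0] + c[1][1] + c[2][2] + 2.0 * (c[0][1] + c[0][2] + c[1][2])) / 9.0
    s = invert_matrix(c)  # compliance matrix
    k_reuss = 1.0 / (s[0][0] + s[1][1] + s[2][2] + 2.0 * (s[0][1] + s[0][2] + s[1][2]))
    values = {"voigt": k_voigt, "reuss": k_reuss, "vrh": 0.5 * (k_voigt + k_reuss)}
    return values[mode]


# Stiffness matrix (GPa) built from the illustrative cij values used earlier on this page.
stiffness = [
    [300.0, 120.0, 140.0, 0.0, 0.0, 0.0],
    [120.0, 300.0, 140.0, 0.0, 0.0, 0.0],
    [140.0, 140.0, 250.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 80.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 80.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 60.0],
]

for mode in ("voigt", "reuss", "vrh"):
    print(mode, round(bulk_modulus(stiffness, mode), 3), "GPa")
```

The Voigt value is an upper bound and the Reuss value a lower bound; for a cubic crystal the two coincide at (c11 + 2 * c12) / 3.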

read_trainset_settings_yaml(yaml_path)

Read a trainset settings YAML file into a configuration dictionary.

Works on

YAML configuration files for trainset generation (trainset_elastic.yaml)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| yaml_path | str | Path to a YAML settings file. | required |

Returns:

| Type | Description |
| --- | --- |
| dict | Parsed configuration mapping containing elastic, bulk, and output sections. |

Examples:

>>> cfg = read_trainset_settings_yaml("trainset_elastic.yaml")
>>> sorted(cfg.keys())[:3]
['bulk', 'elastic', 'metadata']
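
Because the returned value is a plain dict, it can be checked before it is handed to a generator. The following is a small sketch of such a check, assuming only the top-level section names shown in the example above; the helper missing_sections and the stand-in dictionary are hypothetical and do not reflect the full YAML schema.

```python
def missing_sections(cfg, required=("bulk", "elastic", "metadata")):
    """Return the required top-level section names that are absent from cfg."""
    return [name for name in required if name not in cfg]


# Stand-in for the dict returned by read_trainset_settings_yaml (contents are made up).
cfg = {"metadata": {"name": "AlN example"}, "elastic": {}}

absent = missing_sections(cfg)
if absent:
    print("settings file is missing sections:", ", ".join(absent))
```

Returning the list of missing names, rather than raising on the first one, lets a caller report every problem in the settings file at once.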

write_trainset_settings_yaml(*, out_path, name='AlN example', source='manual', mp_id=None, elastic_max_strain_percent=3.0, elastic_dstrain=0.005, cij_gpa=None, elastic_cell=None, B0_gpa=174.0, B0_prime=1.5, bulk_max_volumetric_strain_percent=6.0, bulk_dstrain_linear=0.004, bulk_cell=None, trainset_file='trainset_elastic.in', tables=None, elastic_xyz='ground_elastic.xyz', bulk_xyz='null', geo_enable=True)

Write a trainset settings YAML file for elastic-energy trainset generation.

Works on

YAML configuration files for trainset generation (trainset_elastic.yaml)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| out_path | str | Output YAML file path. | required |
| name | str | Descriptive material name stored in metadata. | 'AlN example' |
| source | str | Settings source label (e.g., "manual" or "materials_project"). | 'manual' |
| mp_id | str or None | Materials Project ID to store in metadata. | None |
| elastic_max_strain_percent | float | Maximum elastic strain magnitude (%). | 3.0 |
| elastic_dstrain | float | Elastic strain step size (unitless). | 0.005 |
| cij_gpa | dict or None | Elastic constants in GPa with keys c11..c66. | None |
| elastic_cell | dict or None | Elastic reference cell with keys a, b, c, alpha, beta, gamma. | None |
| B0_gpa | float | Bulk modulus B0 (GPa). | 174.0 |
| B0_prime | float | Bulk modulus pressure derivative B0' (dimensionless). | 1.5 |
| bulk_max_volumetric_strain_percent | float | Maximum volumetric strain magnitude (%). | 6.0 |
| bulk_dstrain_linear | float | Bulk linear strain step (unitless). | 0.004 |
| bulk_cell | dict or None | Bulk reference cell with keys a, b, c, alpha, beta, gamma. | None |
| trainset_file | str | Trainset file name to store under output settings. | 'trainset_elastic.in' |
| tables | dict or None | Output table filenames keyed by mode (e.g., "bulk", "c11"). | None |
| elastic_xyz | str or Path or None | Base XYZ used for elastic geometry generation when enabled. | 'ground_elastic.xyz' |
| bulk_xyz | str or Path or None | Optional base XYZ for bulk geometry generation when enabled. | 'null' |
| geo_enable | bool | Whether the YAML enables geometry generation. | True |

Returns:

| Type | Description |
| --- | --- |
| None | Writes a YAML settings file to disk. |

Examples:

>>> write_trainset_settings_yaml(
...     out_path="trainset_elastic.yaml",
...     name="AlN example",
...     source="manual",
... )
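
The settings mix units: maximum strains are given in percent, steps are unitless fractions, and the bulk maximum is volumetric while the bulk step is linear. The sketch below shows how these quantities relate, assuming isotropic scaling so that (1 + linear strain)^3 = 1 + volumetric strain. The helper names are hypothetical, and how the module itself performs the conversion is not documented here.

```python
def percent_to_fraction(percent):
    """Convert a strain given in percent to a unitless fraction."""
    return percent / 100.0


def volumetric_to_linear_strain(volumetric_strain):
    """Linear strain that gives the requested volumetric strain under isotropic scaling."""
    return (1.0 + volumetric_strain) ** (1.0 / 3.0) - 1.0


# Defaults from the signature above.
elastic_max = percent_to_fraction(3.0)        # elastic_max_strain_percent
elastic_step = 0.005                          # elastic_dstrain
bulk_max_volumetric = percent_to_fraction(6.0)  # bulk_max_volumetric_strain_percent
bulk_step_linear = 0.004                      # bulk_dstrain_linear

# Linear strain limits for expansion and compression differ slightly in magnitude.
linear_expansion = volumetric_to_linear_strain(bulk_max_volumetric)
linear_compression = volumetric_to_linear_strain(-bulk_max_volumetric)

print("elastic steps per side:", round(elastic_max / elastic_step))
print("linear strain at +6% volume:", linear_expansion)
print("linear strain at -6% volume:", linear_compression)
print("bulk steps within +6% volume:", int(linear_expansion / bulk_step_linear + 1e-9))
```

A 6% volumetric strain corresponds to a linear strain of just under 2%, so with the default step of 0.004 only a handful of bulk points fit on each side of equilibrium.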