Trainset Generator
Trainset generation utilities for ReaxFF parameter training.
This module provides end-to-end helpers for generating elastic-energy training targets (bulk EOS and elastic constants), optional strained geometries, YAML-based configuration, and Materials Project-based bootstrapping for trainset creation.
Typical use cases include:
- generating `trainset_elastic.in` plus energy tables (E vs strain/volume)
- generating strained XYZ/GEO structures for ReaxFF runs
- writing/reading a `trainset_elastic.yaml` settings file
- creating a ready-to-run trainset from a Materials Project material ID
generate_all_energy_vs_volume_data(*, out_dir, bulk_inputs, elastic_inputs, bulk_cell, elastic_volume_cell=None, bulk_options=None, elastic_options=None, trainset_filename='trainset_elastic.in')
Write bulk and elastic energy targets to trainset and table files.
High-level generator that:
1. generates bulk + elastic energy targets
2. writes:
    - trainset_elastic.in
    - EvsStrain_bulk.dat
    - EvsStrain_c11.dat ... EvsStrain_c66.dat
This function performs both generation and writing and returns None.
Works on
Elastic-energy training targets written to disk (trainset + tables)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `out_dir` | `str` | Output directory to write files into. | required |
| `bulk_inputs` | `dict` | Bulk target inputs with keys such as | required |
| `elastic_inputs` | `dict` | Elastic target inputs including | required |
| `bulk_cell` | `dict` | Bulk reference cell with keys: | required |
| `elastic_volume_cell` | `dict or None` | Cell used to compute volume prefactors for elastic targets. If None, uses | `None` |
| `bulk_options` | `dict or None` | Optional overrides (e.g., | `None` |
| `elastic_options` | `dict or None` | Optional overrides (e.g., | `None` |
| `trainset_filename` | `str` | Output trainset file name (default: | `'trainset_elastic.in'` |
Returns:
| Type | Description |
|---|---|
| `None` | Writes |
Examples:
>>> generate_all_energy_vs_volume_data(
... out_dir="out",
... bulk_inputs={"B0_gpa": 180, "B0_prime": 4.0, "max_volumetric_strain_percent": 6.0},
... elastic_inputs={"max_strain_percent": 3.0, "c11": 300, "c22": 300, "c33": 250,
... "c12": 120, "c13": 140, "c23": 140, "c44": 80, "c55": 80, "c66": 60},
... bulk_cell={"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90},
... )
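The call above takes elastic constants, a bulk modulus, and a reference cell, and turns them into energy-versus-strain targets. The module's exact formulas and output units are not shown here, so the following is only a minimal sketch under stated assumptions: a harmonic energy E = ½·V·C·ε² for a single elastic mode, a third-order Birch-Murnaghan equation of state for the bulk curve, and energies in eV. The helper names `cell_volume`, `harmonic_energy_ev`, and `birch_murnaghan_ev` are hypothetical and not part of this module.

```python
import math

# Unit conversion: 1 eV/A^3 = 160.21766 GPa, so 1 GPa*A^3 = 1/160.21766 eV.
GPA_A3_TO_EV = 1.0 / 160.21766


def cell_volume(cell):
    """Volume (A^3) of a triclinic cell from a, b, c (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(cell[k])) for k in ("alpha", "beta", "gamma"))
    root = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    return cell["a"] * cell["b"] * cell["c"] * math.sqrt(root)


def harmonic_energy_ev(c_gpa, volume_a3, strain):
    """Harmonic strain energy 0.5 * V * C * eps^2 in eV (assumed form)."""
    return 0.5 * volume_a3 * c_gpa * strain**2 * GPA_A3_TO_EV


def birch_murnaghan_ev(volume_a3, v0_a3, b0_gpa, b0_prime):
    """Third-order Birch-Murnaghan energy relative to E(V0), in eV (assumed EOS)."""
    eta = (v0_a3 / volume_a3) ** (2.0 / 3.0)
    f = eta - 1.0
    bracket = f**3 * b0_prime + f**2 * (6.0 - 4.0 * eta)
    return 9.0 * v0_a3 * b0_gpa / 16.0 * bracket * GPA_A3_TO_EV


# Same illustrative numbers as the example call above.
bulk_cell = {"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90}
v0 = cell_volume(bulk_cell)

for eps in (-0.03, 0.0, 0.03):
    print(f"c11 mode, strain {eps:+.3f}: {harmonic_energy_ev(300.0, v0, eps):.6f} eV")

for vol_strain in (-0.06, 0.0, 0.06):
    e = birch_murnaghan_ev(v0 * (1.0 + vol_strain), v0, 180.0, 4.0)
    print(f"bulk mode, dV/V0 {vol_strain:+.3f}: {e:.6f} eV")
```

Both curves are zero at the reference geometry, which is why a table of this kind can be interpreted as energies relative to the unstrained cell.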
generate_strained_geometries_with_xtob(*, elastic_xyz, bulk_xyz, elastic_cell, bulk_cell, max_strain_elastic, dstrain_elastic, max_strain_bulk_linear, dstrain_bulk_linear, out_dir, sort_by=None)
Generate strained XYZ structures and convert them to GEO via xtob.
Creates strained XYZ files (with comment=title on line 2) and converts each to GEO using xtob().
Output folders:
- out_dir/xyz_strained/.xyz
- out_dir/geo_strained/.bgf
Works on
XYZ input structures + GEO/XTLGRF outputs via xtob
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `elastic_xyz` | `str or Path` | Base XYZ used for elastic strain modes. | required |
| `bulk_xyz` | `str or Path or None` | Optional base XYZ used for bulk mode. If None, reuse | required |
| `elastic_cell` | `dict` | Elastic reference cell with keys: | required |
| `bulk_cell` | `dict` | Bulk reference cell with keys: | required |
| `max_strain_elastic` | `float` | Maximum absolute linear strain for elastic modes (unitless). | required |
| `dstrain_elastic` | `float` | Linear strain step for elastic modes (unitless). | required |
| `max_strain_bulk_linear` | `float` | Maximum absolute linear bulk strain (unitless). | required |
| `dstrain_bulk_linear` | `float` | Linear bulk strain step (unitless). | required |
| `out_dir` | `str or Path` | Output directory where | required |
| `sort_by` | `str or None` | Sorting key passed to | `None` |
|
Returns:
| Type | Description |
|---|---|
| `dict[str, list[Path]]` | Mapping mode name to written GEO paths (e.g., |
Examples:
>>> cell = {"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90}
>>> out = generate_strained_geometries_with_xtob(
... elastic_xyz="ground_elastic.xyz",
... bulk_xyz=None,
... elastic_cell=cell,
... bulk_cell=cell,
... max_strain_elastic=0.02,
... dstrain_elastic=0.005,
... max_strain_bulk_linear=0.01,
... dstrain_bulk_linear=0.004,
... out_dir="out",
... )
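The `max_strain_*` and `dstrain_*` arguments together define a set of strain values for each mode. How the module builds that set is not documented here; the sketch below assumes a grid that is symmetric about zero, includes the unstrained point, and never exceeds the maximum, and it applies a uniform linear strain to the cell lengths as one plausible reading of the bulk mode. The helpers `symmetric_strain_grid` and `scale_cell_lengths` are hypothetical names used only for illustration.

```python
import math


def symmetric_strain_grid(max_strain, dstrain):
    """Return strains -n*d, ..., 0, ..., +n*d with n*d <= max_strain (assumed layout)."""
    if max_strain < 0 or dstrain <= 0:
        raise ValueError("max_strain must be >= 0 and dstrain must be > 0")
    # Small tolerance so that e.g. 0.02 / 0.005 is not floored to 3 by round-off.
    n = int(math.floor(max_strain / dstrain + 1e-9))
    return [round(i * dstrain, 12) for i in range(-n, n + 1)]


def scale_cell_lengths(cell, strain):
    """Apply a uniform linear strain to a, b, c; angles are left unchanged."""
    scaled = dict(cell)
    for key in ("a", "b", "c"):
        scaled[key] = cell[key] * (1.0 + strain)
    return scaled


cell = {"a": 2.9, "b": 2.9, "c": 3.5, "alpha": 90, "beta": 90, "gamma": 90}

elastic_grid = symmetric_strain_grid(0.02, 0.005)
bulk_grid = symmetric_strain_grid(0.01, 0.004)
print(len(elastic_grid), "elastic strain points:", elastic_grid)
print(len(bulk_grid), "bulk strain points:", bulk_grid)
print("cell at largest bulk strain:", scale_cell_lengths(cell, bulk_grid[-1]))
```

Under these assumptions a maximum that is not a whole multiple of the step (0.01 with a step of 0.004) is truncated to the largest multiple that fits, so the outermost structures sit at ±0.008 rather than ±0.01.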
generate_trainset_from_yaml(yaml_path, out_dir, *, place_all_outputs_in_out_dir=True, copy_input_xyz_into_out_dir=True)
Generate a trainset and optional strained geometries from a YAML settings file.
Works on
YAML settings + XYZ inputs (optional) → trainset files and strained structures
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `yaml_path` | `str` | Path to the trainset settings YAML file. | required |
| `out_dir` | `str` | Output directory for generated files. | required |
| `place_all_outputs_in_out_dir` | `bool` | If True, place all generated outputs (including geometry outputs) in | `True` |
| `copy_input_xyz_into_out_dir` | `bool` | If True, copy input XYZ files into the output directory when geometry generation is enabled. | `True` |
|
Returns:
| Type | Description |
|---|---|
| `None` | Writes trainset files and (optionally) strained XYZ/GEO files to disk. |
Examples:
>>> generate_trainset_from_yaml("trainset_elastic.yaml", out_dir="out")
generate_trainset_settings_yaml_from_mp_simple(*, mp_id, out_yaml, structure_dir=None, bulk_mode='vrh', api_key=None, verbose=True)
Generate a trainset settings YAML and structures from a Materials Project ID.
Minimal MP -> (structure + mechanics) -> CIF -> XYZ -> trainset_settings.yaml.
- Fetches: structure, lattice (a,b,c,alpha,beta,gamma), elastic tensor (6x6), bulk modulus.
- Writes: .cif and .xyz
- Writes YAML where:
    - elastic_cell == bulk_cell == MP lattice
    - structure 1.elastic_xyz == structure 2.bulk_xyz == generated XYZ
    - geo.enable is set true (since geo comes from the XYZ)
Works on
Materials Project API + structure files (CIF/XYZ) + trainset settings YAML
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mp_id` | `str` | Materials Project material ID (e.g., | required |
| `out_yaml` | `str or Path` | Output YAML path to write. | required |
| `structure_dir` | `str or Path or None` | Directory to write structure files (CIF/XYZ). If None, uses the YAML folder. | `None` |
| `bulk_mode` | `'voigt'`, `'reuss'`, or `'vrh'` | Which bulk modulus value to store in YAML. | `'vrh'` |
| `api_key` | `str or None` | Materials Project API key. If None, uses | `None` |
| `verbose` | `bool` | If True, print written paths to stdout. | `True` |
|
Returns:
| Type | Description |
|---|---|
| `dict[str, str]` | Mapping with keys: |
Examples:
>>> out = generate_trainset_settings_yaml_from_mp_simple(
... mp_id="mp-661",
... out_yaml="trainset_elastic.yaml",
... )
>>> sorted(out.keys())
['cif', 'xyz', 'yaml']
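The `bulk_mode` option selects between the Voigt, Reuss, and Voigt-Reuss-Hill (VRH) bulk moduli. This function reads those values from the Materials Project rather than computing them, but it can help to see how the three averages relate to a 6x6 elastic tensor in Voigt notation. The sketch below is a pure-Python illustration using the standard definitions; `invert`, `bulk_moduli`, and `tensor_to_cij` are hypothetical helpers, and the tensor values are illustrative numbers, not data fetched from the Materials Project.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (pure Python, small matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def bulk_moduli(c):
    """Voigt, Reuss and VRH bulk moduli (same units as c) from a 6x6 stiffness matrix."""
    k_voigt = (c[0][0] + c[1][1] + c[2][2] + 2.0 * (c[0][1] + c[0][2] + c[1][2])) / 9.0
    s = invert(c)  # compliance matrix
    k_reuss = 1.0 / (s[0][0] + s[1][1] + s[2][2] + 2.0 * (s[0][1] + s[0][2] + s[1][2]))
    return {"voigt": k_voigt, "reuss": k_reuss, "vrh": 0.5 * (k_voigt + k_reuss)}


def tensor_to_cij(c):
    """Pick the nine c11..c66 entries used as keys in the elastic inputs example."""
    pairs = [(1, 1), (2, 2), (3, 3), (1, 2), (1, 3), (2, 3), (4, 4), (5, 5), (6, 6)]
    return {f"c{i}{j}": c[i - 1][j - 1] for i, j in pairs}


# Illustrative stiffness matrix in GPa (orthorhombic-like layout).
tensor = [
    [300, 120, 140, 0, 0, 0],
    [120, 300, 140, 0, 0, 0],
    [140, 140, 250, 0, 0, 0],
    [0, 0, 0, 80, 0, 0],
    [0, 0, 0, 0, 80, 0],
    [0, 0, 0, 0, 0, 60],
]

moduli = bulk_moduli(tensor)
for mode in ("voigt", "reuss", "vrh"):
    print(f"{mode}: {moduli[mode]:.2f} GPa")
print(tensor_to_cij(tensor))
```

The Voigt value is an upper bound and the Reuss value a lower bound, so the choice of `bulk_mode` shifts the stored bulk modulus only slightly for weakly anisotropic crystals and more for strongly anisotropic ones.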
read_trainset_settings_yaml(yaml_path)
Read a trainset settings YAML file into a configuration dictionary.
Works on
YAML configuration files for trainset generation (trainset_elastic.yaml)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `yaml_path` | `str` | Path to a YAML settings file. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | Parsed configuration mapping containing |
Examples:
>>> cfg = read_trainset_settings_yaml("trainset_elastic.yaml")
>>> sorted(cfg.keys())[:3]
['bulk', 'elastic', 'metadata']
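Because the function returns a plain dictionary, a quick sanity check before handing it to a generator can catch an incomplete settings file early. The sketch below checks only for the three top-level sections that appear in the example output above; the helper `missing_sections` is hypothetical, the full schema is not documented here, and the nested contents of `cfg` are placeholders.

```python
def missing_sections(cfg, required=("bulk", "elastic", "metadata")):
    """Return the required top-level sections that are absent from cfg."""
    return [name for name in required if name not in cfg]


# Placeholder mapping standing in for the result of read_trainset_settings_yaml().
cfg = {"metadata": {"name": "AlN example"}, "elastic": {}}

absent = missing_sections(cfg)
if absent:
    print("settings file is missing sections:", ", ".join(absent))
```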
write_trainset_settings_yaml(*, out_path, name='AlN example', source='manual', mp_id=None, elastic_max_strain_percent=3.0, elastic_dstrain=0.005, cij_gpa=None, elastic_cell=None, B0_gpa=174.0, B0_prime=1.5, bulk_max_volumetric_strain_percent=6.0, bulk_dstrain_linear=0.004, bulk_cell=None, trainset_file='trainset_elastic.in', tables=None, elastic_xyz='ground_elastic.xyz', bulk_xyz='null', geo_enable=True)
Write a trainset settings YAML file for elastic-energy trainset generation.
Works on
YAML configuration files for trainset generation (trainset_elastic.yaml)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `out_path` | `str` | Output YAML file path. | required |
| `name` | `str` | Descriptive material name stored in metadata. | `'AlN example'` |
| `source` | `str` | Settings source label (e.g., | `'manual'` |
| `mp_id` | `str or None` | Materials Project ID to store in metadata. | `None` |
| `elastic_max_strain_percent` | `float` | Maximum elastic strain magnitude (%). | `3.0` |
| `elastic_dstrain` | `float` | Elastic strain step size (unitless). | `0.005` |
| `cij_gpa` | `dict or None` | Elastic constants in GPa with keys | `None` |
| `elastic_cell` | `dict or None` | Elastic reference cell with keys | `None` |
| `B0_gpa` | `float` | Bulk modulus B0 (GPa). | `174.0` |
| `B0_prime` | `float` | Bulk modulus pressure derivative B0' (dimensionless). | `1.5` |
| `bulk_max_volumetric_strain_percent` | `float` | Maximum volumetric strain magnitude (%). | `6.0` |
| `bulk_dstrain_linear` | `float` | Bulk linear strain step (unitless). | `0.004` |
| `bulk_cell` | `dict or None` | Bulk reference cell with keys | `None` |
| `trainset_file` | `str` | Trainset file name to store under output settings. | `'trainset_elastic.in'` |
| `tables` | `dict or None` | Output table filenames keyed by mode (e.g., | `None` |
| `elastic_xyz` | `str or Path or None` | Base XYZ used for elastic geometry generation when enabled. | `'ground_elastic.xyz'` |
| `bulk_xyz` | `str or Path or None` | Optional base XYZ for bulk geometry generation when enabled. | `'null'` |
| `geo_enable` | `bool` | Whether the YAML enables geometry generation. | `True` |
|
Returns:
| Type | Description |
|---|---|
| `None` | Writes a YAML settings file to disk. |
Examples:
>>> write_trainset_settings_yaml(
... out_path="trainset_elastic.yaml",
... name="AlN example",
... source="manual",
... )
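The bulk settings mix two strain measures: `bulk_max_volumetric_strain_percent` is a volumetric strain in percent, while `bulk_dstrain_linear` is a unitless linear step. How the module reconciles the two is not stated here. Assuming isotropic scaling of the cell, the two measures are related by (1 + ε_lin)³ = 1 + ε_vol, which the following sketch uses; `linear_from_volumetric` and `volumetric_from_linear` are hypothetical helper names.

```python
def linear_from_volumetric(vol_strain):
    """Linear strain that gives the requested volumetric strain under isotropic scaling."""
    return (1.0 + vol_strain) ** (1.0 / 3.0) - 1.0


def volumetric_from_linear(lin_strain):
    """Volumetric strain produced by an isotropic linear strain."""
    return (1.0 + lin_strain) ** 3 - 1.0


# Default settings from the signature above: 6.0 % volumetric, 0.004 linear step.
max_lin = linear_from_volumetric(6.0 / 100.0)
steps_per_side = int(max_lin / 0.004 + 1e-9)

print(f"6.0 % volumetric strain corresponds to a linear strain of about {max_lin:.4f}")
print(f"linear steps of 0.004 that fit on each side of zero: {steps_per_side}")
```

With the defaults, a 6 % volume change corresponds to a linear strain just under 0.02, so under this assumption a handful of 0.004 steps span the bulk curve on each side of the reference volume.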