Foyer¶
A package for atom-typing as well as applying and disseminating force fields
Annotate an OpenMM .xml force field file with SMARTS-based atomtypes:
<ForceField>
<AtomTypes>
<Type name="opls_135" class="CT" element="C" mass="12.01100" def="[C;X4](C)(H)(H)H" desc="alkane CH3"/>
<Type name="opls_140" class="HC" element="H" mass="1.00800" def="H[C;X4]" desc="alkane H"/>
</AtomTypes>
</ForceField>
Apply the forcefield to arbitrary chemical topologies. We currently support:
from foyer import Forcefield
import parmed as pmd
untyped_ethane = pmd.load_file('ethane.mol2', structure=True)
oplsaa = Forcefield(forcefield_files='oplsaa.xml')
ethane = oplsaa.apply(untyped_ethane)
# Save to any format supported by ParmEd
ethane.save('ethane.top')
ethane.save('ethane.gro')
Getting started?¶
Check out our example template for disseminating force fields: https://github.com/mosdef-hub/forcefield_template
¶
Various sub-portions of this library may be independently distributed under different licenses. See those files for their specific terms.
This material is based upon work supported by the National Science Foundation under grants NSF ACI-1047828 and NSF ACI-1535150. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Installation¶
Install from conda:
conda install -c conda-forge -c omnia -c mosdef foyer
Install from pip:
pip install foyer
Install an editable version from source:
git clone https://github.com/mosdef-hub/foyer.git
cd foyer
pip install -e .
Supported Python Versions¶
Python 3.6 and 3.7 are officially supported, including testing during development and packaging. Other Python versions, such as 3.8 and 3.5 and older, may successfully build and function but no guarantee is made.
Testing your installation¶
foyer uses py.test
for unit testing. To run them simply type run the
following while in the base directory:
$ conda install pytest
$ py.test -v
Using foyer with Docker¶
As much of scientific software development happens in unix platforms, to avoid the quirks of development dependent on system you use, a recommended way is to use docker or other containerization technologies. This section is a how to guide on using foyer with docker.
Prerequisites¶
A docker installation in your machine. Follow this link to get a docker installation working on your machine. If you are not familiar with docker and want to get started with docker, the Internet is full of good tutorials like the ones here and here.
Quick Start¶
After you have a working docker installation, please use the following command to use run a jupyter-notebook with all the dependencies for foyer installed:
$ docker pull mosdef/foyer:latest
$ docker run -it --name foyer -p 8888:8888 mosdef/foyer:latest su anaconda -s\
/bin/sh -l -c "jupyter-notebook --no-browser --ip="0.0.0.0" --notebook-dir\
/home/anaconda/foyer-notebooks"
If every thing happens correctly, you should a be able to start a jupyter-notebook server running in a python environment with all the dependencies for foyer installed.
Alternatively, you can also start a Bourne shell to use python from the container’s terminal:
$ docker run -it --name foyer mosdef/foyer:latest
Important
The instructions above will start a docker container but containers by nature are ephemeral, so any filesystem changes (like adding a new notebook) you make will only persist till the end of the container’s lifecycle. If the container is removed, any changes or code additions will not persist.
Persisting User Volumes¶
If you will be using foyer from a docker container, a recommended way is to mount what are called user volumes in the container. User volumes will provide a way to persist all filesystem/code additions made to a container regardless of the container lifecycle. For example, you might want to create a directory called foyer-notebooks in your local system, which will store all your foyer notebooks/code. In order to make that accessible to the container(where the notebooks will be created/edited), use the following steps:
Create a directory in your filesystem
$ mkdir -p /path/to/foyer-notebooks
$ cd /path/to/foyer-notebooks
Define an entry-point script. Inside foyer-notebooks in your local file system create a file called
dir_entrypoint.sh
and paste the following content.
#!/bin/sh
chown -R anaconda:anaconda /home/anaconda/foyer-notebooks
su anaconda -s /bin/sh -l -c "jupyter-notebook --no-browser --ip="0.0.0.0" --notebook-dir /home/anaconda/foyer-notebooks"
Run docker image for foyer
$ docker run -it --name foyer -p 8888:8888 --entrypoint /home/anaconda/foyer-notebooks/dir_entrypoint.sh -v $HOME/foyer-notebooks:/home/anaconda/foyer-notebooks mosdef/foyer:latest
Cleaning Up¶
You can remove the created container by using the following command:
$ docker container rm foyer
Note
Instead of using latest, you can use the image mosdef/foyer:stable
for most recent stable release of foyer and run the tutorials.
SMARTS-based atomtyping¶
Foyer allows users to describe atomtypes using a modified version of SMARTS You may already be familiar with SMILES representations for describing chemical structures. SMARTS is a straightforward extension of this notation.
Basic usage¶
Consider the following example defining the OPLS-AA atomtypes for a methyl group carbon and its hydrogen atoms:
<ForceField>
<AtomTypes>
<Type name="opls_135" class="CT" element="C" mass="12.01100" def="[C;X4](C)(H)(H)H" desc="alkane CH3"/>
<Type name="opls_140" class="HC" element="H" mass="1.00800" def="H[C;X4]" desc="alkane H"/>
</AtomTypes>
</ForceField>
This .xml
format is an extension of the OpenMM force field
format
The above example utilizes two additional .xml
attributes supported
by foyer: def
and desc
. The atomtype that we are attempting to
match is always the first token in the SMARTS string, in the above
example, [C;X4]
and H
. The opls_135
(methyl group carbon) is
defined by a SMARTS string indicated a carbon with 4 bonds, a carbon
neighbor and 3 hydrogen neighbors. The opls_140
(alkane hydrogen) is
defined simply as a hydrogen atom bonded to a carbon with 4 bonds.
Overriding atomtypes¶
When multiple atomtype definitions can apply to a given atom, we must establish precedence between those definitions. Many other atomtypers determine rule precedence by placing more specific rules first and evaluate those in sequence, breaking out of the loop as soon as a match is found.
While this approach works, it becomes more challenging to maintain the correct ordering of rules as the number of atomtypes grows. Foyer iteratively runs all rules on all atoms and each atom maintains a whitelist (rules that apply) and a blacklist (rules that have been superceded by another rule). The set difference between the white- and blacklists yields the correct atomtype if the force field is implemented correctly.
We can add a rule to a blacklist using the overrides
attribute in
the .xml
file. To illustrate an example where overriding can be used
consider the following types describing alkenes and benzene:
<ForceField>
<AtomTypes>
<Type name="opls_141" class="CM" element="C" mass="12.01100" def="[C;X3](C)(C)C" desc="alkene C (R2-C=)"/>
<Type name="opls_142" class="CM" element="C" mass="12.01100" def="[C;X3](C)(C)H" desc="alkene C (RH-C=)"/>
<Type name="opls_144" class="HC" element="H" mass="1.00800" def="[H][C;X3]" desc="alkene H"/>
<Type name="opls_145" class="CA" element="C" mass="12.01100" def="[C;X3;r6]1[C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6][C;X3;r6]1" overrides="opls_141,opls_142"/>
<Type name="opls_146" class="HA" element="H" mass="1.00800" def="[H][C;%opls_145]" overrides="opls_144" desc="benzene H"/>
</AtomTypes>
</ForceField>
If we’re atomtyping a benzene molecule, the carbon atoms will match the
SMARTS patterns for both opls_142
and opls_145
. Without the
overrides
attribute, foyer will notify you that multiple atomtypes
were found for each carbon. Providing the overrides
indicates that
if the opls_145
pattern matches, it should supercede the specified
rules.
Current Grammar Supported¶
We currently do not (yet) support all of SMARTS’ features. Here we’re keeping track of which portions are supported.
Parameter definitions¶
Parameter definitions within force field XMLs follow the same conventions as defined in the OpenMM documentation. Currently, only certain functional forms for molecular forces are supported, while future developments are expected to allow Foyer to support any desired functional form, including reactive and tabulated potentials. The currently supported functional forms for molecular forces are:
Nonbonded
Bonds
Angles
Torsions (proper)
Torsions (improper)
Definitions for each molecular force follow the OpenMM standard.
Classes vs. Types¶
OpenMM allows users to specify either a class
or a
type
(See Atom Types and Atom Classes),
to define each particle within the force definition. Here, type
refers to a specific atom type (as defined in the <AtomTypes>
section), while class
refers to a more general description that can
apply to multiple atomtypes (i.e. multiple atomtypes may share the same
class). This aids in limiting the number of force definitions required
in a force field XML, as many similar atom types may share force
parameters.
Assigning parameters by specificity¶
Foyer deviates from OpenMM’s convention when matching force definitions
in a force field XML to instances of these forces in a molecular system.
In OpenMM, forces are assigned according to the first matching
definition in a force field XML, even if multiple matching definitions
exist. In contrast, Foyer assigns force parameters based on definition
specificity, where definitions containing more type
attributes are
considered to be more specific.
Example:
<RBTorsionForce>
<Proper class1="CT" class2="CT" class3="CT" class4="CT" c0="2.9288" c1="-1.4644" c2="0.2092" c3="-1.6736" c4="0.0" c5="0.0"/>
<Proper type1="opls_961" type2="opls_136" type3="opls_136" type4="opls_136" c0="-0.987424" c1="0.08363" c2="-0.08368" c3="-0.401664" c4="1.389088" c5="0.0"/>
</RBTorsionForce>
Above, two proper torsions are defined, both describing a torsional
force between four tetrahedral carbons. However, the first definition
features four class
attributes and zero type
attributes, as this
describes a general dihedral for all tetrahedral carbons. The second
definition features zero class
attributes and four type
attributes, and describes a more specific dihedral for the case where
one end carbon is of type 'opls_961'
(a fluorinated carbon) and the
remaining three carbons are of type 'opls_136'
(alkane carbons). Now
consider we want to use a force field containing the above torsion
definitions to assign parameters to some molecular system that features
partially fluorinated alkanes. When assigning torsion parameters to a
quartet of atoms where one end carbon is fluorinated ('opls_961'
)
and the remaining three are hydrogenated ('opls_136'
), if using the
OpenMM procedure for parameter assignment the more general
'CT-CT-CT-CT'
torsion parameters (the first definition above) would
be assigned because this would be the first matching definition in the
force field XML. However, with Foyer, the second definition will be
auto-detected as more specific, due to the greater number of type
attributes (4 vs. 0) and those parameters will be assigned instead.
It should be noted that if two definitions feature the same specificity
level (i.e. the same number of type
definitions) then automatic
detection of precedence is not possible and parameter assignment will
follow the OpenMM procedure whereby parameters from the first matching
force definition in the XML will be assigned.
Atom-typing Options¶
The primary data structure in foyer is the Forcefield
class, which inherits
from the OpenMM class of the same name. The primary operation on this class is
the .apply()
function, which takes a chemical topology and returns a
parametrized ParmEd
Structure
. The user may pass some otions to this
function based on a particular use case.
Forcefield¶
-
class
foyer.forcefield.
Forcefield
(forcefield_files=None, name=None, validation=True, debug=False)[source]¶ Specialization of OpenMM’s Forcefield allowing SMARTS based atomtyping.
- Parameters
forcefield_files (list of str, optional, default=None) – List of forcefield files to load.
name (str, optional, default=None) – Name of a forcefield to load that is packaged within foyer.
-
apply
(structure, references_file=None, use_residue_map=True, assert_bond_params=True, assert_angle_params=True, assert_dihedral_params=True, assert_improper_params=False, combining_rule='geometric', verbose=False, *args, **kwargs)[source]¶ Apply the force field to a molecular structure
- Parameters
structure (parmed.Structure or mbuild.Compound) – Molecular structure to apply the force field to
references_file (str, optional, default=None) – Name of file where force field references will be written (in Bibtex format)
use_residue_map (boolean, optional, default=True) – Store atomtyped topologies of residues to a dictionary that maps them to residue names. Each topology, including atomtypes, will be copied to other residues with the same name. This avoids repeatedly calling the subgraph isomorphism on idential residues and should result in better performance for systems with many identical residues, i.e. a box of water. Note that for this to be applied to independent molecules, they must each be saved as different residues in the topology.
assert_bond_params (bool, optional, default=True) – If True, Foyer will exit if parameters are not found for all system bonds.
assert_angle_params (bool, optional, default=True) – If True, Foyer will exit if parameters are not found for all system angles.
assert_dihedral_params (bool, optional, default=True) – If True, Foyer will exit if parameters are not found for all system proper dihedrals.
assert_improper_params (bool, optional, default=False) – If True, Foyer will exit if parameters are not found for all system improper dihedrals.
combining_rule (str, optional, default=’geometric’) – The combining rule of the system, stored as an attribute of the ParmEd structure. Accepted arguments are geometric and lorentz.
verbose (bool, optional, default=False) – If True, Foyer will print debug-level information about notable or potentially problematic details it encounters.
-
createSystem
(topology, nonbondedMethod=NoCutoff, nonbondedCutoff=Quantity(value=1.0, unit=nanometer), constraints=None, rigidWater=True, removeCMMotion=True, hydrogenMass=None, switchDistance=None, **args)[source]¶ Construct an OpenMM System representing a Topology with this force field.
- Parameters
topology (Topology) – The Topology for which to create a System
nonbondedMethod (object=NoCutoff) – The method to use for nonbonded interactions. Allowed values are NoCutoff, CutoffNonPeriodic, CutoffPeriodic, Ewald, or PME.
nonbondedCutoff (distance=1*nanometer) – The cutoff distance to use for nonbonded interactions
constraints (object=None) – Specifies which bonds and angles should be implemented with constraints. Allowed values are None, HBonds, AllBonds, or HAngles.
rigidWater (boolean=True) – If true, water molecules will be fully rigid regardless of the value passed for the constraints argument
removeCMMotion (boolean=True) – If true, a CMMotionRemover will be added to the System
hydrogenMass (mass=None) – The mass to use for hydrogen atoms bound to heavy atoms. Any mass added to a hydrogen is subtracted from the heavy atom to keep their total mass the same.
switchDistance (float=None) – The distance at which the potential energy switching function is turned on for
args – Arbitrary additional keyword arguments may also be specified. This allows extra parameters to be specified that are specific to particular force fields.
- Returns
the newly created System
- Return type
system
-
run_atomtyping
(structure, use_residue_map=True, **kwargs)[source]¶ Atomtype the topology
- Parameters
structure (parmed.structure.Structure) – Molecular structure to find atom types of
use_residue_map (boolean, optional, default=True) – Store atomtyped topologies of residues to a dictionary that maps them to residue names. Each topology, including atomtypes, will be copied to other residues with the same name. This avoids repeatedly calling the subgraph isomorphism on idential residues and should result in better performance for systems with many identical residues, i.e. a box of water. Note that for this to be applied to independent molecules, they must each be saved as different residues in the topology.
Usage Examples¶
Foyer supports atomtyping of both all-atom and coarse-grained molecular systems, and also allows for separate force fields to be used to atom-type separate portions of a molecular system.
Creating a box of ethane¶
Here we use mBuild to construct a box filled with ethane molecules and use Foyer to atom-type the system, applying the OPLS force field, and save to run-able GROMACS files.
import mbuild as mb
from mbuild.examples import Ethane
ethane_box = mb.fill_box(compound=Ethane(), n_compounds=100, box=[2, 2, 2])
ethane_box.save('ethane-box.gro')
ethane_box.save('ethane-box.top', forcefield_name='oplsaa')
Creating a box of coarse-grained ethane¶
Again we will use mBuild to construct a box filled with ethane molecules. However, now we will model ethane using a united-atom description and apply the TraPPE force field during atom-typing. Note how particle names are prefixed with an underscore so that Foyer knows these particles are non-atomistic.
import mbuild as mb
ethane_UA = mb.Compound()
ch3_1 = mb.Particle(name='_CH3', pos=[0, 0, 0])
ch3_2 = mb.Particle(name='_CH3', pos=[0.15, 0, 0])
ethane_UA.add([ch3_1, ch3_2])
ethane_UA.add_bond((ch3_1, ch3_2))
ethane_UA_box = mb.fill_box(ethane_UA, 100, box=[2, 2, 2])
ethane_UA_box.save('ethane-UA-box.gro')
ethane_UA_box.save('ethane-UA-box.top', forcefield_name='trappe-ua')
Combining force fields¶
In some instances, the use of multiple force fields may be desired to describe a molecular system. For example, the user may want to use one force field for a surface and another for a fluid in the same system. Foyer supports this functionality by allowing the user to separately atom-type parts of a system. In this example, we take a system featuring bulk united atom ethane above a silica surface and apply the OPLS force field to the surface and the TraPPE force field to the ethane. The two atomtyped Parmed structures are then combined using a simple ‘+’ operator and can be saved to Gromacs files.
from foyer import Forcefield
from foyer.examples.utils import example_file_path
import mbuild as mb
from mbuild.examples import Ethane
from mbuild.lib.atoms import H
from mbuild.lib.bulk_materials import AmorphousSilica
from mbuild.lib.recipes import SilicaInterface
from mbuild.lib.recipes import Monolayer
interface = SilicaInterface(bulk_silica=AmorphousSilica())
interface = Monolayer(surface=interface, chains=H(), guest_port_name='up')
box = mb.Box(mins=[0, 0, max(interface.xyz[:,2])],
maxs=interface.periodicity + [0, 0, 4])
ethane_box = mb.fill_box(compound=Ethane(), n_compounds=200, box=box)
opls = Forcefield(name='oplsaa')
opls_silica = Forcefield(forcefield_files=example_file_path('opls-silica.xml'))
ethane_box = opls.apply(ethane_box)
interface = opls_silica.apply(interface)
system = interface + ethane_box
system.save('ethane-silica.gro')
system.save('ethane-silica.top')
Paper Examples¶
Contained below are the toy examples from the Usage Examples section of the foyer paper. The source code selections are listed below on this page, there are Jupyter
Notebooks
where you can try these examples yourself. Note that these examples are
meant to showcase the abilities of foyer
through simple examples. If
the user would like to examine more in-depth examples using foyer
with mBuild
, please refer to the tutorial
repository.
Below is Listing 6 from the paper, a python script to fill a \(2x2x2 nm\)
box with 100 ethane molecules. The system is then atomtyped using the
OPLS-AA forcefield. There are two approaches to the same problem
detailed below in this listing, the first approach uses the
forcefield_files
function argument from
mBuild to atomptype the
system (using foyer under the hood). While the second approach creates a
foyer
Forcefield
object, which then calls its apply
function, operating on the mBuild
Compound
to return the
properly atomtyped structure. Note that in all instances when using
foyer
, the chemical system of interest is converted into a
ParmEd
Structure
. Even the mBuild
Compounds
, when
calling the save
routine, are converted into a ParmEd
Structure
before foyer
can atomtype them. The object returned by
foyer
after the atomtypes have been applied are ParmEd
Structures
. This is subject to change in later iterations of
foyer
.
Example 1¶
import mbuild as mb
from mbuild.examples import Ethane
from foyer.examples.utils import example_file_path
from foyer import Forcefield
""" Applying a force field while saving from mBuild """
# Create the chemical topology
ethane_fluid = mb.fill_box(compound=Ethane(), n_compounds=100, box=[2, 2, 2])
# Apply and save the topology
ethane_fluid.save("ethane-box.top", forcefield_files=example_file_path("oplsaa_alkane.xml"))
ethane_fluid.save("ethane-box.gro")
""" Applying a force field directly with foyer """
# Create the chemical topology
ethane_fluid = mb.fill_box(compound=Ethane(), n_compounds=100, box=[2, 2, 2])
# Load the forcefield
opls_alkane = Forcefield(forcefield_files=example_file_path("oplsaa_alkane.xml"))
# Apply the forcefield to atom-type
ethane_fluid = opls_alkane.apply(ethane_fluid)
# Save the atom-typed system
ethane_fluid.save("ethane-box.top", overwrite=True)
ethane_fluid.save("ethane-box.gro", overwrite=True)
Example 2¶
The other example listing from the text showcases the ability to create
two separate chemical topologies and applying different forcefield files
to each. The two parameterized systems that are generated are then
combined into a single ParmEd
Structure
and saved to disk.
from foyer import Forcefield
from foyer.examples.utils import example_file_path
import mbuild as mb
from mbuild.examples import Ethane
from mbuild.lib.atoms import H
from mbuild.lib.bulk_materials import AmorphousSilica
from mbuild.lib.recipes import SilicaInterface
from mbuild.lib.recipes import Monolayer
# Create a silica substrate, capping surface oxygens with hydrogen
silica=SilicaInterface(bulk_silica=AmorphousSilica())
silica_substrate=Monolayer(surface=silica,chains=H(),guest_port_name="up")
# Determine the box dimensions dictated by the silica substrate
box=mb.Box(mins=[0, 0,max(silica.xyz[:,2])],maxs=silica.periodicity+ [0, 0, 4])
# Fill the box with ethane
ethane_fluid=mb.fill_box(compound=Ethane(),n_compounds=200,box=box)
# Load the forcefields
#opls_silica=Forcefield(forcefield_files=get_example_file("oplsaa_with_silica.xml"))
opls_silica=Forcefield(forcefield_files=example_file_path("output.xml"))
opls_alkane=Forcefield(forcefield_files=example_file_path("oplsaa_alkane.xml"))
# Apply the forcefields
silica_substrate=opls_silica.apply(silica_substrate)
ethane_fluid=opls_alkane.apply(ethane_fluid)
# Merge the two topologies
system=silica_substrate+ethane_fluid
# Save the atom-typed system
system.save("ethane-silica.top")
system.save("ethane-silica.gro")
Validation of force field files¶
Foyer performs several validation steps to help prevent malformed force field files and SMARTS strings from making it into production code. Our goal is to provide human readable error messages that users who may not be intimately familiar with XML or our SMARTS parsing grammar can easily act upon.
However, if you receive any unclear error messages or warnings we strongly encourage you to submit an issue detailing the error message you received and, if possible, attach a minimal example of the force field file that created the problem.
XML schema¶
As a first line of defense, any force field files loaded by foyer is
validated by this XML schema
definition. Here we enforce which
elements (e.g. HarmonicBondForce
) are valid and how their attributes
should be formatted. Additionally, the schema ensures that atomtypes are
not 1) defined more than once and that 2) atomtypes referenced in other
sections are actually defined in the <AtomTypes>
element.
SMARTS validation¶
All SMARTS strings used to define atomtypes are parsed. Parsing errors are captured and re-raised with error messages that allow you to pin point the location of the problem in the XML file and within the SMARTS string. Wherever possible, we attempt to provide helpful hints and we welcome any contributions that help improve the clarity of our error messages.
Additionally, we ensure that any atomtypes referenced using the
%type
or overrides
syntax are actually defined
in the <AtomTypes>
element.
Citing foyer¶
If you use foyer for your research, please cite our paper or its pre-print:
ACS
Klein, C.; Summers, A. Z.; Thompson, M. W.; Gilmer, J. B.; Mccabe, C.; Cummings, P. T.; Sallai, J.; Iacovella, C. R. Formalizing Atom-Typing and the Dissemination of Force Fields with Foyer. Computational Materials Science 2019, 167, 215–227.
BibTeX
@article{klein2019,
title = "Formalizing atom-typing and the dissemination of force fields with foyer",
journal = "Computational Materials Science",
volume = "167",
pages = "215 - 227",
year = "2019",
issn = "0927-0256",
doi = "https://doi.org/10.1016/j.commatsci.2019.05.026",
url = "http://www.sciencedirect.com/science/article/pii/S0927025619303040",
author = "Christoph Klein and Andrew Z. Summers and Matthew W. Thompson and Justin B. Gilmer and Clare McCabe and Peter T. Cummings and Janos Sallai and Christopher R. Iacovella",
keywords = "Molecular simulation, Force fields, Reproducibility, Open-source software",
}