gunshotmatch-pipeline
GunShotMatch Analysis Pipeline
Installation
From PyPI:
python3 -m pip install gunshotmatch-pipeline --user
From GitHub:
python3 -m pip install git+https://github.com/GunShotMatch/gunshotmatch-pipeline@master --user
Contents
gunshotmatch_pipeline
GunShotMatch Pipeline.
Functions:
|
Pipeline from raw datafile to a |
|
Construct a project from the given |
gunshotmatch_pipeline.config
Configuration for GunShotMatch analysis.
class Configuration(pyms_nist_search)[source]
Bases: MethodBase
Overall GunShotMatch configuration.
Methods:
from_json(json_string): Parse a Configuration from a JSON string.
from_toml(toml_string): Parse a Configuration from a TOML string.
to_toml(): Convert a Configuration to a TOML string.
Attributes:
pyms_nist_search: Configuration for pyms_nist_search.
classmethod from_json(json_string)[source]
Parse a Configuration from a JSON string.
Parameters:
json_string (str)
classmethod from_toml(toml_string)[source]
Parse a Configuration from a TOML string.
Parameters:
toml_string (str)
pyms_nist_search
Type: PyMSNISTSearchCfg
Configuration for pyms_nist_search.
to_toml()[source]
Convert a Configuration to a TOML string.
gunshotmatch_pipeline.decision_tree
Prepare data and train decision trees.
Classes:
|
Class for exporting visualisations of a decision tree or random forest. |
Functions:
|
Returns a |
|
Returns a |
|
Return a dot (graphviz) suitable name for a sample, with special characters escaped. |
|
Fit the classifier to the data. |
|
Return the feature names for the given data. |
|
Predict classes for an unknown sample from a decision tree or random forest. |
|
Generate simulated peak area data for a project. |
|
Visualise a decision tree with graphviz. |
class DecisionTreeVisualiser(classifier, feature_names, factorize_map)[source]
Bases: object
Class for exporting visualisations of a decision tree or random forest.
New in version 0.8.0.
Parameters:
classifier (ClassifierMixin) – Decision tree or random forest classifier.
feature_names (List[str]) – The compounds the decision tree was trained on.
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
Methods:
__eq__(other): Return self == other.
Used for pickling.
__ne__(other): Return self != other.
__repr__(): Return a string representation of the DecisionTreeVisualiser.
__setattr__(name, val): Implement setattr(self, name).
__setstate__(state): Used for pickling.
from_data(data, classifier, factorize_map): Alternative constructor from the pandas dataframe the classifier was trained on.
visualise_tree([filename, filetype]): Visualise the decision tree or random forest as an image.
Attributes:
classifier: Decision tree or random forest classifier.
factorize_map: List of class names in the order they appear as classes in the classifier.
feature_names: The compounds the decision tree was trained on.
__repr__()
Return a string representation of the DecisionTreeVisualiser.
__setattr__(name, val)
Implement setattr(self, name).
classifier
Type: ClassifierMixin
Decision tree or random forest classifier.
factorize_map
List of class names in the order they appear as classes in the classifier.
classmethod from_data(data, classifier, factorize_map)[source]
Alternative constructor from the pandas dataframe the classifier was trained on.
data_from_projects(projects, normalize=False)[source]
Returns a DataFrame containing decision tree data for the given projects.
data_from_unknown(unknown, feature_names, normalize=False)[source]
Returns a DataFrame containing decision tree data for the given unknown.
Parameters:
unknown (UnknownSettings)
feature_names (Collection[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.
-
dotsafe_name
(name)[source] Return a dot (graphviz) suitable name for a sample, with special characters escaped.
New in version 0.5.0.
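As a rough illustration of the kind of escaping involved, the sketch below uses html.escape from the standard library. This is an assumption made for illustration only: the description above does not say which characters dotsafe_name escapes or how, and escape_for_dot is a hypothetical helper that is not part of gunshotmatch_pipeline.

```python
import html


def escape_for_dot(name: str) -> str:
    """
    Hypothetical helper (not the library's dotsafe_name).

    Escapes ``&``, ``<``, ``>`` and quotes so the name can be embedded
    in an HTML-like graphviz label without being parsed as markup.
    """

    return html.escape(name, quote=True)


escaped = escape_for_dot("Winchester & Co <9mm>")
print(escaped)
```

Names without special characters pass through unchanged, so the helper is safe to apply to every sample name.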
-
fit_decision_tree
(data, classifier)[source] Fit the classifier to the data.
- Parameters
data (
DataFrame
)classifier (
ClassifierMixin
)
- Return type
- Returns
List of feature names
predict_unknown(unknown, classifier, factorize_map, feature_names)[source]
Predict classes for an unknown sample from a decision tree or random forest.
Parameters:
unknown (UnknownSettings)
classifier (ClassifierMixin)
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
feature_names (List[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.
Returns: An iterator of predicted class names and their probabilities, ranked from most to least likely.
New in version 0.9.0.
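The ranking described for the return value can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the library's implementation: it assumes the classifier has already produced one probability per class, in the same order as factorize_map.

```python
from typing import Iterator, List, Sequence, Tuple


def rank_predictions(
    probabilities: Sequence[float],
    factorize_map: List[str],
) -> Iterator[Tuple[str, float]]:
    """
    Hypothetical sketch: pair each class name with its probability
    and yield the pairs from most to least likely.
    """

    pairs = zip(factorize_map, probabilities)
    yield from sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Class names and probabilities here are made up for illustration.
ranked = list(rank_predictions([0.1, 0.7, 0.2], ["A", "B", "C"]))
print(ranked)
```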
-
simulate_data
(project, normalize=False, n_simulated=10)[source] Generate simulated peak area data for a project.
visualise_decision_tree(data, classifier, factorize_map, filename='decision_tree_graphivz', filetype='svg')[source]
Visualise a decision tree with graphviz.
Parameters:
data (DataFrame)
classifier (ClassifierMixin)
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
filename (str) – Output filename without extension; for random forest, the base filename (followed by -tree-n). Default 'decision_tree_graphivz'.
filetype (str) – Output filetype (e.g. svg, png, pdf). Default 'svg'.
gunshotmatch_pipeline.decision_tree.export
Export and load decision trees to/from JSON-safe dictionaries.
New in version 0.6.0.
Functions:
|
Serialise a decision tree to a JSON-safe dictionary. |
|
Deserialise a decision tree. |
|
Verify the saved |
|
Serialise a random forest to a JSON-safe dictionary. |
|
Deserialise a random forest. |
|
Verify the saved |
serialise_decision_tree(model)[source]
Serialise a decision tree to a JSON-safe dictionary.
Parameters:
model (DecisionTreeClassifier) – Trained decision tree.
deserialise_decision_tree(model_dict)[source]
Deserialise a decision tree.
verify_saved_decision_tree(in_process, from_file)[source]
Verify the saved DecisionTreeClassifier matches the model in memory.
Will raise an AssertionError if the data do not match.
Parameters:
in_process (DecisionTreeClassifier) – The DecisionTreeClassifier already in memory.
from_file (DecisionTreeClassifier) – A DecisionTreeClassifier loaded from disk.
New in version 0.7.0.
serialise_random_forest(model)[source]
Serialise a random forest to a JSON-safe dictionary.
Parameters:
model (RandomForestClassifier) – Trained random forest.
deserialise_random_forest(model_dict)[source]
Deserialise a random forest.
verify_saved_random_forest(in_process, from_file)[source]
Verify the saved RandomForestClassifier matches the model in memory.
Will raise an AssertionError if the data do not match.
Parameters:
in_process (RandomForestClassifier) – The RandomForestClassifier already in memory.
from_file (RandomForestClassifier) – A RandomForestClassifier loaded from disk.
New in version 0.7.0.
gunshotmatch_pipeline.decision_tree.predictions
Represents random forest classifier predictions for testing classifier performance.
New in version 0.9.0.
Classes:
|
Represents the predicted classes from a random forest classifier. |
Functions:
|
Return a JSON representation of the predictions. |
|
Load predictions from the given JSON string. |
-
namedtuple
PredictionResult
(name, class_name, predictions)[source] Bases:
NamedTuple
Represents the predicted classes from a random forest classifier.
- Fields
-
property
correct
Returns whether the top prediction matches the actual class name.
- Return type
-
__repr__
() Return a nicely formatted representation string
dump_predictions(predictions, indent=2)[source]
Return a JSON representation of the predictions.
Parameters:
predictions (List[PredictionResult])
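To make the shape of a prediction record concrete, here is a minimal sketch of a named tuple with the three fields listed above, a correct property, and a JSON round trip. It is a stand-in written for illustration, not the library's code: the type of the predictions field (a tuple of class name and probability pairs), the JSON layout, and the names PredictionResultSketch, dump_sketch and load_sketch are all assumptions.

```python
import json
from typing import List, NamedTuple, Tuple


class PredictionResultSketch(NamedTuple):
    """Hypothetical stand-in for PredictionResult."""

    name: str
    class_name: str
    # Assumed layout: (class name, probability) pairs, most likely first.
    predictions: Tuple[Tuple[str, float], ...]

    @property
    def correct(self) -> bool:
        """Whether the top prediction matches the actual class name."""
        return bool(self.predictions) and self.predictions[0][0] == self.class_name


def dump_sketch(results: List[PredictionResultSketch], indent: int = 2) -> str:
    """Serialise the results to JSON (tuples become JSON arrays)."""
    return json.dumps([r._asdict() for r in results], indent=indent)


def load_sketch(json_string: str) -> List[PredictionResultSketch]:
    """Rebuild the results, converting JSON arrays back into tuples."""
    return [
        PredictionResultSketch(
            d["name"],
            d["class_name"],
            tuple((cls, prob) for cls, prob in d["predictions"]),
        )
        for d in json.loads(json_string)
    ]


# Sample and class names below are made up for illustration.
results = [PredictionResultSketch("sample-1", "A", (("A", 0.9), ("B", 0.1)))]
restored = load_sketch(dump_sketch(results))
```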
gunshotmatch_pipeline.exporters
Functions and classes for export to disk, and verification of saved data.
Functions:
|
Verify the data in a saved |
|
Verify the data in a saved |
|
Write a CSV file listing the top hits for each peak in the |
|
Write the JSON output file listing the determined “best match” for each peak. |
verify_saved_datafile(in_process, from_file)[source]
Verify the data in a saved Datafile matches the data in memory.
Will raise an AssertionError if the data do not match.
verify_saved_project(in_process, from_file)[source]
Verify the data in a saved Project matches the data in memory.
Will raise an AssertionError if the data do not match.
gunshotmatch_pipeline.nist_ms_search
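Configuration for pyms_nist_search and NIST MS Search.
Classes:
LazyEngine(config, **kwargs): Initialize the NIST MS Search engine on demand.
PyMSNISTSearchCfg(library_path[, user_library]): Configuration for pyms_nist_search.
Functions:
engine_on_demand(config, **kwargs): Defer initialization of the NIST MS Search engine until required (if at all).
nist_ms_search_engine(config, **kwargs): Initialize the NIST MS Search engine from pyms_nist_search.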
class LazyEngine(config, **kwargs)[source]
Bases: object
Initialize the NIST MS Search engine on demand.
Parameters:
config (PyMSNISTSearchCfg)
**kwargs – Keyword arguments for pyms_nist_search.win_engine.Engine
New in version 0.2.0.
Methods:
deinit(): Cleanup the underlying engine and temporary directory.
Attributes:
The NIST MS Search engine.
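The on-demand pattern can be sketched in a few lines. The class below is a generic illustration and not the LazyEngine implementation: LazyResource, its engine property name, and the factory callable are hypothetical, and a plain string stands in for the real search engine.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Hypothetical wrapper that builds an expensive object on first access."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None

    @property
    def engine(self) -> T:
        # Construct the underlying object only when it is first needed.
        if self._instance is None:
            self._instance = self._factory()
        return self._instance

    def deinit(self) -> None:
        # Drop the instance; a later access would construct a new one.
        self._instance = None


calls: List[str] = []


def expensive_factory() -> str:
    calls.append("init")
    return "engine"


lazy = LazyResource(expensive_factory)
first = lazy.engine   # the factory runs here
second = lazy.engine  # the cached instance is reused
```

If the engine property is never accessed, the factory is never called, which is the point of deferring initialisation.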
-
class
PyMSNISTSearchCfg
(library_path, user_library=False)[source] Bases:
libgunshotmatch.method.MethodBase
Configuration for
pyms_nist_search
.- Parameters
Attributes:
Absolute path to the NIST library (mainlib or user).
engine_on_demand(config, **kwargs)[source]
Defer initialization of the NIST MS Search engine until required (if at all).
Parameters:
config (PyMSNISTSearchCfg)
**kwargs – Keyword arguments for pyms_nist_search.win_engine.Engine
New in version 0.2.0.
nist_ms_search_engine(config, **kwargs)[source]
Initialize the NIST MS Search engine from pyms_nist_search.
Parameters:
config (PyMSNISTSearchCfg)
**kwargs – Keyword arguments for pyms_nist_search.win_engine.Engine
gunshotmatch_pipeline.peaks
Peak detection and alignment functions.
Functions:
|
Perform peak alignment and peak filtering for the project, with the given method. |
|
Construct and filter the peak list. |
gunshotmatch_pipeline.projects
Metadata for project pipelines.
Classes:
|
Settings applied for all projects. |
Mixin class providing |
|
|
Settings for a specific project. |
|
Reference data projects to process through the pipeline. |
Functions:
|
Process projects with common methods and config. |
class GlobalSettings(output_directory='output', method=None, config=None, data_directory=None)[source]
Bases: libgunshotmatch.method.MethodBase, gunshotmatch_pipeline.projects.LoaderMixin
Settings applied for all projects.
Parameters:
output_directory (str) – Relative or absolute path to the directory the output files should be placed in. Default 'output'.
method (Optional[str]) – Relative or absolute filename to the method TOML file. The table name is “method”. Default None.
config (Optional[str]) – Relative or absolute filename to the configuration TOML file. The table name is “config”. Default None.
data_directory (Optional[str]) – Relative or absolute path to the directory containing the data files. Default None.
The method and config files may point to the same TOML file.
Attributes:
Relative or absolute filename to the configuration TOML file.
Relative or absolute path to the directory containing the data files.
Relative or absolute filename to the method TOML file.
Relative or absolute path to the directory the output files should be placed in.
-
config
-
Relative or absolute filename to the configuration TOML file. The table name is “gunshotmatch”.
-
data_directory
-
Relative or absolute path to the directory containing the data files.
-
method
-
Relative or absolute filename to the method TOML file. The table name is “method”.
class LoaderMixin[source]
Bases: object
Mixin class providing load_method() and load_config() methods.
Methods:
load_config(): Load the configuration for this project from the specified file.
load_method(): Load the method for this project from the specified file.
class ProjectSettings(name, datafiles, method=None, config=None, data_directory=None)[source]
Bases: libgunshotmatch.method.MethodBase, gunshotmatch_pipeline.projects.LoaderMixin
Settings for a specific project.
Parameters:
name (str) – The project name.
datafiles (List[str]) – List of input datafiles (paths relative to the data_directory option).
method (Optional[str]) – Relative or absolute filename to the method TOML file. The table name is “method”. Default None.
config (Optional[str]) – Relative or absolute filename to the configuration TOML file. The table name is “config”. Default None.
data_directory (Optional[str]) – Relative or absolute path to the directory containing the data files. Default None.
Attributes:
Relative or absolute filename to the configuration TOML file.
Relative or absolute path to the directory containing the data files.
List of input datafiles (paths relative to the data_directory option)
Relative or absolute filename to the method TOML file.
The project name.
Methods:
Returns an iterator over paths to the datafiles.
-
config
-
Relative or absolute filename to the configuration TOML file. The table name is “config”.
-
data_directory
-
Relative or absolute path to the directory containing the data files.
get_datafile_paths()[source]
Returns an iterator over paths to the datafiles.
The paths start with data_directory if set.
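The path behaviour described above can be sketched as follows. This is a hypothetical free function rather than the library's method, and it uses PurePosixPath only so the printed result looks the same on every platform; the filenames are made up.

```python
from pathlib import PurePosixPath
from typing import Iterator, List, Optional


def datafile_paths(
    datafiles: List[str],
    data_directory: Optional[str] = None,
) -> Iterator[PurePosixPath]:
    """
    Hypothetical sketch: yield one path per datafile,
    prefixed with ``data_directory`` when it is set.
    """

    for filename in datafiles:
        if data_directory:
            yield PurePosixPath(data_directory) / filename
        else:
            yield PurePosixPath(filename)


with_dir = list(datafile_paths(["a.jdx", "b.jdx"], "data"))
without_dir = list(datafile_paths(["a.jdx"]))
print(with_dir, without_dir)
```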
-
method
-
Relative or absolute filename to the method TOML file. The table name is “method”.
class Projects(global_settings=GlobalSettings(output_directory='output', method=None, config=None, data_directory=None), per_project_settings={})[source]
Bases: libgunshotmatch.method.MethodBase
Reference data projects to process through the pipeline.
Parameters:
global_settings (GlobalSettings) – Settings applied for all projects. Default GlobalSettings(output_directory='output', method=None, config=None, data_directory=None).
per_project_settings (Dict[str, ProjectSettings]) – Settings for specific projects. Default {}.
Methods:
from_json(json_string): Parse a Projects from a JSON string.
from_toml(toml_string): Parse a Projects from a TOML string.
get_project_settings(project_name): Returns the settings for the given project, taking into account the global settings.
Returns whether all projects have common configuration.
Returns whether all projects have a common method.
Iterate Project objects loaded from disk.
iter_project_settings(): Iterate over the per-project settings, taking into account the global settings.
load_project(project_name): Load a previously created project.
to_toml(): Convert a Projects to a TOML string.
Attributes:
global_settings: Settings applied for all projects.
per_project_settings: Settings for specific projects.
get_project_settings(project_name)[source]
Returns the settings for the given project, taking into account the global settings.
Parameters:
project_name (str)
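One plausible reading of "taking into account the global settings" is that a per-project value wins when it is set and the global value fills the gap otherwise. The sketch below illustrates that reading only; the precedence rule is an assumption, and SettingsSketch and resolve_settings are hypothetical names that do not exist in gunshotmatch_pipeline.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SettingsSketch:
    """Hypothetical subset of the fields shared by global and project settings."""

    method: Optional[str] = None
    config: Optional[str] = None
    data_directory: Optional[str] = None


def resolve_settings(
    global_settings: SettingsSketch,
    project_settings: SettingsSketch,
) -> SettingsSketch:
    """Assumed rule: use the project value unless it is None."""

    merged = {}
    for field in fields(SettingsSketch):
        project_value = getattr(project_settings, field.name)
        global_value = getattr(global_settings, field.name)
        merged[field.name] = project_value if project_value is not None else global_value
    return SettingsSketch(**merged)


# Filenames below are made up for illustration.
common = SettingsSketch(method="common.toml", data_directory="data")
special = SettingsSketch(method="special.toml")
resolved = resolve_settings(common, special)
print(resolved)
```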
-
global_settings
Type:
GlobalSettings
Settings applied for all projects.
-
iter_project_settings
()[source] Iterate over the per-project settings, taking into account the global settings.
- Return type
-
per_project_settings
Type:
Dict
[str
,ProjectSettings
]Settings for specific projects.
to_toml()[source]
Convert a Projects to a TOML string.
gunshotmatch_pipeline.results
Results presented in different formats.
Classes:
Return type from |
|
Type hint for the |
|
Type hint for the |
Functions:
|
Returns data on the compounds in each repeat in the project(s). |
|
Prepares data on the compounds in each repeat from the output of |
|
Returns data formatted for training a decision tree or other machine learning model. |
|
Returns data on the “best match” for each peak. |
|
Returns results for an unknown sample. |
|
Returns data formatted for training a decision tree or other machine learning model. |
typeddict Matches[source]
Bases: TypedDict
Return type from matches().
Required Keys:
metadata (MatchesMetadata)
compounds (Dict[str, MatchesCompounds])
-
compounds
(*project, normalize=False)[source] Returns data on the compounds in each repeat in the project(s).
The output mapping gives the peak areas for each compound in the different projects, grouped by compound.
compounds_from_matches(*matches_data, normalize=False)[source]
Prepares data on the compounds in each repeat from the output of matches() for each project.
The output mapping gives the peak areas for each compound in the different projects, grouped by compound.
-
machine_learning_data
(*project, normalize=False)[source] Returns data formatted for training a decision tree or other machine learning model.
unknown(unknown_project, normalize=False)[source]
Returns results for an unknown sample.
The output mapping is formatted the same as that from compounds(), but with only one “project”.
gunshotmatch_pipeline.unknowns
Metadata and pipeline for unknown samples.
Classes:
|
Settings for an unknown propellant or OGSR sample. |
Functions:
|
Filter peaks by minimum peak area, then identify compounds. |
|
Process an “unknown” sample. |
class UnknownSettings(name, datafile, method, config, output_directory, data_directory='')[source]
Bases: libgunshotmatch.method.MethodBase, gunshotmatch_pipeline.projects.LoaderMixin
Settings for an unknown propellant or OGSR sample.
Parameters:
name (str) – The unknown sample’s name or identifier.
datafile (str) – The input datafile.
method (str) – Relative or absolute filename to the method TOML file. The table name is “method”.
config (str) – Relative or absolute filename to the configuration TOML file. The table name is “config”.
output_directory (str) – Relative or absolute path to the directory the output files should be placed in.
data_directory (str) – Relative or absolute path to the directory containing the data files. Default ''.
Attributes:
Relative or absolute filename to the configuration TOML file.
Relative or absolute path to the directory containing the data files.
The input datafile
The absolute path to the datafile.
Relative or absolute filename to the method TOML file.
The unknown sample’s name or identifier.
Relative or absolute path to the directory the output files should be placed in.
Methods:
from_json(json_string): Parse an UnknownSettings from a JSON string.
from_toml(toml_string): Parse an UnknownSettings from a TOML string.
to_toml(): Convert an UnknownSettings to a TOML string.
config
Type:
str
Relative or absolute filename to the configuration TOML file. The table name is “config”.
classmethod from_json(json_string)[source]
Parse an UnknownSettings from a JSON string.
Parameters:
json_string (str)
classmethod from_toml(toml_string)[source]
Parse an UnknownSettings from a TOML string.
Parameters:
toml_string (str)
-
method
Type:
str
Relative or absolute filename to the method TOML file. The table name is “method”.
-
output_directory
Type:
str
Relative or absolute path to the directory the output files should be placed in.
to_toml()[source]
Convert an UnknownSettings to a TOML string.
gunshotmatch_pipeline.utils
General utility functions.
Classes:
Class for mapping IUPAC preferred names to more common, friendlier names. |
Data:
Mapping of IUPAC preferred names to more common, friendlier names. |
Functions:
project_plural(*args, **kwargs) = Plural('project', 'projects')
domdf_python_tools.words.Plural for project.
unknown_plural(*args, **kwargs) = Plural('unknown', 'unknowns')
domdf_python_tools.words.Plural for unknown.
New in version 0.9.0.
-
friendly_name_mapping
Type:
NameMapping
Mapping of IUPAC preferred names to more common, friendlier names.
class NameMapping[source]
Class for mapping IUPAC preferred names to more common, friendlier names.
On lookup, if the name has no known alias the looked-up name is returned.
New in version 0.4.0.
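The lookup behaviour described above (return the name itself when no alias is known) can be sketched with a dict subclass that defines __missing__. This is an illustration, not the library's NameMapping: the class name NameMappingSketch is hypothetical and the single alias shown is made up for the example rather than taken from friendly_name_mapping.

```python
class NameMappingSketch(dict):
    """Hypothetical mapping that falls back to the looked-up name."""

    def __missing__(self, key: str) -> str:
        # Called by dict.__getitem__ when the key is absent.
        return key


# Illustrative entry only; not taken from the library's mapping.
aliases = NameMappingSketch({"propan-2-one": "acetone"})

known = aliases["propan-2-one"]
unknown = aliases["diphenylamine"]
print(known, unknown)
```

Note that __missing__ only affects subscript lookup; dict.get() still returns None for an absent key, and the fallback does not add the key to the mapping.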
The module also provides either tomli or tomllib (depending on Python version) through the tomllib attribute.
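A common way to provide this kind of version-dependent alias is a conditional import, sketched below. Whether gunshotmatch_pipeline.utils uses exactly this form is an assumption; the snippet only shows the general pattern. It needs Python 3.11 or later, or the third-party tomli package on older versions, and the example_name key in the sample document is hypothetical.

```python
import sys

if sys.version_info >= (3, 11):
    # tomllib joined the standard library in Python 3.11.
    import tomllib
else:
    # tomli is the third-party backport with the same API.
    import tomli as tomllib

# The "method" table name comes from the settings described earlier;
# the key inside it is made up for illustration.
document = '[method]\nexample_name = "demo"\n'
data = tomllib.loads(document)
print(data)
```

Either way, callers can use tomllib.loads() without caring which package supplied it.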
Contributing
gunshotmatch-pipeline uses tox to automate testing and packaging, and pre-commit to maintain code quality.
Install pre-commit with pip and install the git hook:
python -m pip install pre-commit
pre-commit install
Coding style
formate is used for code formatting.
It can be run manually via pre-commit:
pre-commit run formate -a
Or, to run the complete autoformatting suite:
pre-commit run -a
Automated tests
Tests are run with tox and pytest.
To run tests for a specific Python version, such as Python 3.6:
tox -e py36
To run tests for all Python versions, simply run:
tox
Build documentation locally
The documentation is powered by Sphinx. A local copy of the documentation can be built with tox:
tox -e docs
Downloading source code
The gunshotmatch-pipeline source code is available on GitHub, and can be accessed from the following URL: https://github.com/GunShotMatch/gunshotmatch-pipeline
If you have git installed, you can clone the repository with the following command:
git clone https://github.com/GunShotMatch/gunshotmatch-pipeline
Cloning into 'gunshotmatch-pipeline'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 173 (delta 16), reused 17 (delta 6), pack-reused 126
Receiving objects: 100% (173/173), 126.56 KiB | 678.00 KiB/s, done.
Resolving deltas: 100% (66/66), done.

Downloading a ‘zip’ file of the source code
Building from source
The recommended way to build gunshotmatch-pipeline is to use tox:
tox -e build
The source and wheel distributions will be in the directory dist.
If you wish, you may also use pep517.build or another PEP 517-compatible build tool.
License
gunshotmatch-pipeline is licensed under the MIT License.
A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
Copyright (c) 2023 Dominic Davis-Foster
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.