gunshotmatch_pipeline.decision_tree

Prepare data and train decision trees.

Classes:

DecisionTreeVisualiser(classifier, …)

Class for exporting visualisations of a decision tree or random forest.

Functions:

data_from_projects(projects[, normalize])

Returns a DataFrame containing decision tree data for the given projects.

data_from_unknown(unknown, feature_names[, …])

Returns a DataFrame containing decision tree data for the given unknown.

dotsafe_name(name)

Return a dot (graphviz) suitable name for a sample, with special characters escaped.

fit_decision_tree(data, classifier)

Fit the classifier to the data.

get_feature_names(data)

Return the feature names for the given data.

predict_unknown(unknown, classifier, …)

Predict classes for an unknown sample from a decision tree or random forest.

simulate_data(project[, normalize, n_simulated])

Generate simulated peak area data for a project.

visualise_decision_tree(data, classifier, …)

Visualise a decision tree with graphviz.

class DecisionTreeVisualiser(classifier, feature_names, factorize_map)

Bases: object

Class for exporting visualisations of a decision tree or random forest.

New in version 0.8.0.

Parameters
  • classifier (ClassifierMixin) – Decision tree or random forest classifier.

  • feature_names (List[str]) – The compounds the decision tree was trained on.

  • factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
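The relationship between class names and the integer classes seen by the classifier can be sketched as follows. This is a minimal illustration, assuming the map comes from a factorisation step similar to pandas.factorize (codes assigned in order of first appearance); the helper name `factorize` and the second ammo name are hypothetical, not part of this package.

```python
from typing import List, Sequence, Tuple


def factorize(labels: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Encode labels as integer codes in order of first appearance."""
    factorize_map: List[str] = []
    codes: List[int] = []
    for label in labels:
        if label not in factorize_map:
            factorize_map.append(label)
        # The code for a label is its position in the map.
        codes.append(factorize_map.index(label))
    return codes, factorize_map


# Hypothetical class labels, one per training sample.
labels = ["Western Double A", "Hypothetical Ammo B", "Western Double A"]
codes, factorize_map = factorize(labels)
print(codes, factorize_map)
# [0, 1, 0] ['Western Double A', 'Hypothetical Ammo B']
```

Under this assumption, `factorize_map[code]` recovers the class name for any integer class the classifier reports.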

Methods:

__eq__(other)

Return self == other.

__getstate__()

Used for pickling.

__ne__(other)

Return self != other.

__repr__()

Return a string representation of the DecisionTreeVisualiser.

__setattr__(name, val)

Implement setattr(self, name).

__setstate__(state)

Used for pickling.

from_data(data, classifier, factorize_map)

Alternative constructor from the pandas dataframe the classifier was trained on.

visualise_tree([filename, filetype])

Visualise the decision tree or random forest as an image.

Attributes:

classifier

Decision tree or random forest classifier.

factorize_map

List of class names in the order they appear as classes in the classifier.

feature_names

The compounds the decision tree was trained on.

__eq__(other)

Return self == other.

Return type

bool

__getstate__()

Used for pickling.

Automatically created by attrs.

__ne__(other)

Return self != other.

Return type

bool

__repr__()

Return a string representation of the DecisionTreeVisualiser.

Return type

str

__setattr__(name, val)

Implement setattr(self, name).

__setstate__(state)

Used for pickling.

Automatically created by attrs.

classifier

Type:    ClassifierMixin

Decision tree or random forest classifier.

factorize_map

Type:    List[str]

List of class names in the order they appear as classes in the classifier.

feature_names

Type:    List[str]

The compounds the decision tree was trained on.

classmethod from_data(data, classifier, factorize_map)

Alternative constructor from the pandas dataframe the classifier was trained on.

Return type

DecisionTreeVisualiser

visualise_tree(filename='decision_tree_graphivz', filetype='svg')

Visualise the decision tree or random forest as an image.

Parameters
  • filename (str) – Output filename without extension; for random forest, the base filename (followed by -tree-n). Default 'decision_tree_graphivz'.

  • filetype (str) – Output filetype (e.g. svg, png, pdf). Default 'svg'.

data_from_projects(projects, normalize=False)

Returns a DataFrame containing decision tree data for the given projects.

Parameters
  • projects

  • normalize (bool) – Default False.

Return type

Tuple[DataFrame, List[str]]

data_from_unknown(unknown, feature_names, normalize=False)

Returns a DataFrame containing decision tree data for the given unknown.

Parameters
  • unknown (UnknownSettings)

  • feature_names (Collection[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.

  • normalize (bool) – Default False.

Return type

DataFrame
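The exclusion of extra compounds can be pictured as aligning the unknown's peak areas to the trained feature names. The following is a minimal sketch using plain dictionaries rather than a DataFrame; the helper name `align_to_features`, the compound names and values, and the choice to fill missing compounds with 0.0 are all assumptions for illustration, not the package's implementation.

```python
from typing import Collection, Dict, List


def align_to_features(
    peak_areas: Dict[str, float],
    feature_names: Collection[str],
    fill_value: float = 0.0,  # assumption: compounds absent from the unknown become zero
) -> List[float]:
    """Return peak areas ordered by feature_names, dropping extra compounds."""
    return [peak_areas.get(name, fill_value) for name in feature_names]


# Hypothetical peak areas for an unknown sample.
unknown_areas = {"Diphenylamine": 120.5, "Ethyl centralite": 30.0, "Nitroglycerin": 8.2}
# Hypothetical compounds the classifier was trained on.
trained_on = ["Diphenylamine", "Nitroglycerin", "Dibutyl phthalate"]

row = align_to_features(unknown_areas, trained_on)
print(row)  # [120.5, 8.2, 0.0]
```

"Ethyl centralite" is dropped because the classifier never saw it, so the resulting row has exactly one value per trained feature.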

dotsafe_name(name)

Return a dot (graphviz) suitable name for a sample, with special characters escaped.

Parameters

name (str)

Return type

str

New in version 0.5.0.
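To show why escaping matters, here is a minimal sketch of quoting a sample name for use in a DOT file. It assumes the name is placed inside a double-quoted string, where embedded double quotes must be escaped; the helper `quote_dot_id` is hypothetical and may escape a different set of characters than dotsafe_name does.

```python
def quote_dot_id(name: str) -> str:
    """Return name as a double-quoted DOT string with quotes and backslashes escaped."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


label = quote_dot_id('Western "Double A"')
print(label)  # "Western \"Double A\""
```

Without the escaping, the inner quotes would end the DOT string early and graphviz would reject the file.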

fit_decision_tree(data, classifier)

Fit the classifier to the data.

Parameters
  • data (DataFrame)

  • classifier (ClassifierMixin)

Return type

List[str]

Returns

List of feature names

get_feature_names(data)

Return the feature names for the given data.

Parameters

data (DataFrame)

Return type

List[str]

predict_unknown(unknown, classifier, factorize_map, feature_names)

Predict classes for an unknown sample from a decision tree or random forest.

Parameters
  • unknown (UnknownSettings)

  • classifier (ClassifierMixin)

  • factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.

  • feature_names (List[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.

Return type

Iterator[Tuple[str, float]]

Returns

An iterator of predicted class names and their probabilities, ranked from most to least likely.

New in version 0.9.0.
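The ranking step described above can be sketched without a trained model. This is an illustration only: it assumes the classifier yields one probability per class (such as one row of scikit-learn's predict_proba output) and that index i corresponds to factorize_map[i]. The helper `rank_predictions` and the sample values are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def rank_predictions(
    probabilities: Sequence[float],
    factorize_map: List[str],
) -> Iterator[Tuple[str, float]]:
    """Yield (class name, probability) pairs, most likely first."""
    # Assumption: probabilities[i] belongs to the class named factorize_map[i].
    pairs = zip(factorize_map, probabilities)
    yield from sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical class names and probabilities for one unknown sample.
factorize_map = ["Hypothetical Ammo A", "Western Double A", "Hypothetical Ammo C"]
probabilities = [0.2, 0.7, 0.1]

ranked = list(rank_predictions(probabilities, factorize_map))
print(ranked[0])  # ('Western Double A', 0.7)
```

Because the result is an iterator, callers that need to index or reuse the ranking should wrap it in `list()` or `tuple()`, as done here.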

simulate_data(project, normalize=False, n_simulated=10)

Generate simulated peak area data for a project.

Parameters
  • project (Project)

  • normalize (bool) – Default False.

  • n_simulated (int) – The number of values to simulate. Default 10.

Return type

DataFrame

visualise_decision_tree(data, classifier, factorize_map, filename='decision_tree_graphivz', filetype='svg')

Visualise a decision tree with graphviz.

Parameters
  • data (DataFrame)

  • classifier (ClassifierMixin)

  • factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.

  • filename (str) – Output filename without extension; for random forest, the base filename (followed by -tree-n). Default 'decision_tree_graphivz'.

  • filetype (str) – Output filetype (e.g. svg, png, pdf). Default 'svg'.

gunshotmatch_pipeline.decision_tree.export

Export and load decision trees to/from JSON-safe dictionaries.

New in version 0.6.0.

Functions:

serialise_decision_tree(model)

Serialise a decision tree to a JSON-safe dictionary.

deserialise_decision_tree(model_dict)

Deserialise a decision tree.

verify_saved_decision_tree(in_process, from_file)

Verify the saved DecisionTreeClassifier matches the model in memory.

serialise_random_forest(model)

Serialise a random forest to a JSON-safe dictionary.

deserialise_random_forest(model_dict)

Deserialise a random forest.

verify_saved_random_forest(in_process, from_file)

Verify the saved RandomForestClassifier matches the model in memory.

serialise_decision_tree(model)

Serialise a decision tree to a JSON-safe dictionary.

Parameters

model (DecisionTreeClassifier) – Trained decision tree.

Return type

Dict[str, Any]

deserialise_decision_tree(model_dict)

Deserialise a decision tree.

Parameters

model_dict (Dict[str, Any]) – JSON-safe representation of the decision tree.

Return type

DecisionTreeClassifier
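What "JSON-safe" means for the serialise/deserialise pair can be shown with the standard library alone. The dictionary below is a made-up stand-in, since the actual keys produced by serialise_decision_tree are not listed here; the point is only that such a dictionary survives a JSON round trip unchanged.

```python
import json
from typing import Any, Dict

# Hypothetical stand-in for a serialised model; the real keys are not documented here.
model_dict: Dict[str, Any] = {
    "params": {"max_depth": 3, "random_state": 20230703},
    "feature_importances": [0.6, 0.4, 0.0],
    "n_classes": 2,
}

# A JSON-safe dictionary contains only str keys, numbers, strings, booleans,
# None, lists and nested dicts, so dumping and loading returns an equal object.
restored = json.loads(json.dumps(model_dict))
print(restored == model_dict)  # True
```

Values such as tuples or NumPy arrays would not pass this check as-is, which is why a serialisation step is needed before a model can be written to JSON.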

verify_saved_decision_tree(in_process, from_file)

Verify the saved DecisionTreeClassifier matches the model in memory.

Will raise an AssertionError if the data do not match.

Parameters
  • in_process (DecisionTreeClassifier)

  • from_file (DecisionTreeClassifier)

New in version 0.7.0.
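The assert-on-mismatch behaviour can be illustrated with plain dictionaries. This is a sketch of the pattern only: the helper `verify_matches` is hypothetical and compares dictionaries, whereas the function documented here compares classifier objects.

```python
from typing import Any, Dict


def verify_matches(in_process: Dict[str, Any], from_file: Dict[str, Any]) -> None:
    """Raise AssertionError if the two representations differ."""
    assert in_process.keys() == from_file.keys(), "key sets differ"
    for key in in_process:
        assert in_process[key] == from_file[key], f"mismatch for {key!r}"


# Hypothetical in-memory and reloaded representations.
in_memory = {"n_classes": 2, "thresholds": [0.5, 1.25]}
reloaded_ok = {"n_classes": 2, "thresholds": [0.5, 1.25]}
reloaded_bad = {"n_classes": 2, "thresholds": [0.5, 1.5]}

verify_matches(in_memory, reloaded_ok)  # passes silently

try:
    verify_matches(in_memory, reloaded_bad)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)  # True
```

Note that Python strips `assert` statements when run with the `-O` flag, so checks written this way only take effect in non-optimised mode.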

serialise_random_forest(model)

Serialise a random forest to a JSON-safe dictionary.

Parameters

model (RandomForestClassifier) – Trained random forest.

Return type

Dict[str, Any]

deserialise_random_forest(model_dict)

Deserialise a random forest.

Parameters

model_dict (Dict[str, Any]) – JSON-safe representation of the random forest.

Return type

RandomForestClassifier

verify_saved_random_forest(in_process, from_file)

Verify the saved RandomForestClassifier matches the model in memory.

Will raise an AssertionError if the data do not match.

Parameters
  • in_process (RandomForestClassifier)

  • from_file (RandomForestClassifier)

New in version 0.7.0.

gunshotmatch_pipeline.decision_tree.predictions

Represents random forest classifier predictions for testing classifier performance.

New in version 0.9.0.

Classes:

PredictionResult(name, class_name, predictions)

Represents the predicted classes from a random forest classifier.

Functions:

dump_predictions(predictions[, indent])

Return a JSON representation of the predictions.

load_predictions(predictions_json)

Load predictions from the given JSON string.

namedtuple PredictionResult(name, class_name, predictions)

Bases: NamedTuple

Represents the predicted classes from a random forest classifier.

Fields
  1.  name (str) – The sample name, e.g. “Unknown Western Double A”

  2.  class_name (str) – The actual class name, i.e. the ammo type, e.g. “Western Double A”

  3.  predictions (Tuple[Tuple[str, float], …]) – Tuples of (<class name>, <probability>).

property correct

Returns whether the top prediction matches the actual class name.

Return type

bool

__repr__()

Return a nicely formatted representation string.
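The fields and the correct property can be mirrored in a small stand-alone sketch. The class `PredictionResultSketch` is a hypothetical re-creation based only on the description above; it assumes predictions is non-empty and ordered from most to least likely.

```python
from typing import NamedTuple, Tuple


class PredictionResultSketch(NamedTuple):
    """Illustrative stand-in for PredictionResult, not the package's class."""

    name: str
    class_name: str
    predictions: Tuple[Tuple[str, float], ...]

    @property
    def correct(self) -> bool:
        """Whether the top prediction matches the actual class name."""
        # Assumption: the first pair is the most likely class.
        return self.predictions[0][0] == self.class_name


result = PredictionResultSketch(
    name="Unknown Western Double A",
    class_name="Western Double A",
    predictions=(("Western Double A", 0.7), ("Hypothetical Other Ammo", 0.3)),
)
print(result.correct)  # True
```

Averaging `correct` over a list of such results gives a simple accuracy figure for a test set.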

dump_predictions(predictions, indent=2)

Return a JSON representation of the predictions.

Parameters
  • predictions

  • indent – Default 2.

Return type

str

load_predictions(predictions_json)

Load predictions from the given JSON string.

Parameters

predictions_json (str)

Return type

List[PredictionResult]
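A dump/load round trip of this kind can be sketched with the standard json module. Everything below is an assumption for illustration: the `Prediction` class stands in for PredictionResult, and the JSON layout (a list of objects keyed by field name) may differ from what dump_predictions actually writes.

```python
import json
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):  # hypothetical stand-in for PredictionResult
    name: str
    class_name: str
    predictions: Tuple[Tuple[str, float], ...]


def dump_sketch(predictions: List[Prediction], indent: int = 2) -> str:
    """Return a JSON string with one object per prediction."""
    return json.dumps([p._asdict() for p in predictions], indent=indent)


def load_sketch(predictions_json: str) -> List[Prediction]:
    """Rebuild the predictions from a JSON string produced by dump_sketch."""
    loaded = []
    for item in json.loads(predictions_json):
        # JSON has no tuple type, so the nested lists are converted back.
        pairs = tuple((cls, float(prob)) for cls, prob in item["predictions"])
        loaded.append(Prediction(item["name"], item["class_name"], pairs))
    return loaded


original = [
    Prediction("Unknown Western Double A", "Western Double A", (("Western Double A", 0.7),)),
]
round_tripped = load_sketch(dump_sketch(original))
print(round_tripped == original)  # True
```

The explicit tuple conversion on load matters because the restored objects would otherwise hold lists and no longer compare equal to the originals.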