gunshotmatch_pipeline.decision_tree
Prepare data and train decision trees.
Classes:
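DecisionTreeVisualiser(classifier, feature_names, factorize_map) – Class for exporting visualisations of a decision tree or random forest.
Functions:
data_from_projects(projects[, normalize]) – Returns a DataFrame containing decision tree data for the given projects.
data_from_unknown(unknown, feature_names[, normalize]) – Returns a DataFrame containing decision tree data for the given unknown.
dotsafe_name(name) – Return a dot (graphviz) suitable name for a sample, with special characters escaped.
fit_decision_tree(data, classifier) – Fit the classifier to the data.
Return the feature names for the given data.
predict_unknown(unknown, classifier, factorize_map, feature_names) – Predict classes for an unknown sample from a decision tree or random forest.
simulate_data(project[, normalize, n_simulated]) – Generate simulated peak area data for a project.
visualise_decision_tree(data, classifier, factorize_map[, filename, filetype]) – Visualise a decision tree with graphviz.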
-
class DecisionTreeVisualiser(classifier, feature_names, factorize_map)
Bases: object
Class for exporting visualisations of a decision tree or random forest.
New in version 0.8.0.
- Parameters
classifier (ClassifierMixin) – Decision tree or random forest classifier.
feature_names (List[str]) – The compounds the decision tree was trained on.
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
Methods:
__eq__(other) – Return self == other.
Used for pickling.
__ne__(other) – Check equality and either forward a NotImplemented or return the result negated.
__repr__() – Return a string representation of the DecisionTreeVisualiser.
__setattr__(name, val) – Implement setattr(self, name).
__setstate__(state) – Used for pickling.
from_data(data, classifier, factorize_map) – Alternative constructor from the pandas DataFrame the classifier was trained on.
visualise_tree([filename, filetype]) – Visualise the decision tree or random forest as an image.
Attributes:
classifier – Decision tree or random forest classifier.
factorize_map – List of class names in the order they appear as classes in the classifier.
feature_names – The compounds the decision tree was trained on.
-
__ne__(other) Check equality and either forward a NotImplemented or return the result negated.
- Return type
bool
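The `__ne__` behaviour described above follows a common Python pattern. It delegates to `__eq__` and passes `NotImplemented` through unchanged, so that Python can try the reflected comparison. Otherwise it negates the result. The following is a minimal sketch on a hypothetical `Point` class, which is not part of gunshotmatch_pipeline:

```python
from typing import Any


class Point:
    """Hypothetical value class, used only to illustrate the __ne__ pattern."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: Any) -> Any:
        # Only compare against other Points; let Python handle anything else.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other: Any) -> Any:
        result = self.__eq__(other)
        # Forward NotImplemented so Python can try other.__ne__ or fall back to identity.
        if result is NotImplemented:
            return result
        return not result
```

Python 3's default `object.__ne__` already behaves this way. Writing it out explicitly matters mainly when a class overrides `__ne__` itself.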
-
__repr__() Return a string representation of the DecisionTreeVisualiser.
- Return type
str
-
__setattr__(name, val) Implement setattr(self, name).
-
classifier
Type: ClassifierMixin
Decision tree or random forest classifier.
-
factorize_map
Type: List[str]
List of class names in the order they appear as classes in the classifier.
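Here is one way to picture `factorize_map`, whose name presumably comes from `pandas.factorize`. Class labels are encoded as integers by order of first appearance, which is what `pandas.factorize` does by default. The map then records which class name each integer stands for. Below is a stdlib-only sketch using a hypothetical helper, `factorize_labels`; it is not the library's code:

```python
from typing import Dict, List, Sequence, Tuple


def factorize_labels(labels: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Encode labels as integer codes in order of first appearance.

    Returns (codes, factorize_map), where factorize_map[code] is the class name.
    """
    index: Dict[str, int] = {}
    codes: List[int] = []
    for label in labels:
        if label not in index:
            # A new class gets the next unused integer code.
            index[label] = len(index)
        codes.append(index[label])
    # Dicts preserve insertion order, so this lists names by their code.
    factorize_map = list(index)
    return codes, factorize_map
```

With this layout, turning a predicted integer class back into a name is just `factorize_map[code]`.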
-
classmethod from_data(data, classifier, factorize_map) Alternative constructor from the pandas DataFrame the classifier was trained on.
- Parameters
data (DataFrame) – The pandas DataFrame the classifier was trained on.
classifier (ClassifierMixin) – Decision tree or random forest classifier.
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
- Return type
DecisionTreeVisualiser
-
data_from_projects(projects, normalize=False) Returns a DataFrame containing decision tree data for the given projects.
-
data_from_unknown(unknown, feature_names, normalize=False) Returns a DataFrame containing decision tree data for the given unknown.
- Parameters
unknown (UnknownSettings)
feature_names (Collection[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.
- Return type
DataFrame
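Excluding extra compounds amounts to aligning the unknown's peak areas with the training columns. The following is a minimal sketch built on two assumptions: peak areas arrive as a `{compound: area}` mapping, and compounds missing from the unknown are filled with 0.0. The helper `align_to_features` is hypothetical and not part of the library:

```python
from typing import Collection, Dict, List


def align_to_features(peak_areas: Dict[str, float], feature_names: Collection[str]) -> List[float]:
    """Return one value per training feature, in the order of feature_names.

    Compounds in peak_areas that are not in feature_names are dropped.
    Features the unknown lacks are filled with 0.0 (an assumption of this sketch).
    """
    return [peak_areas.get(name, 0.0) for name in feature_names]
```

Column order matters because the classifier expects the same order it was trained on. Pass an ordered collection such as a list, not a set.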
-
dotsafe_name(name) Return a dot (graphviz) suitable name for a sample, with special characters escaped.
New in version 0.5.0.
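Some background on why escaping is needed: in the DOT language, an ID can take four forms:

- an alphanumeric/underscore string that does not start with a digit,
- a numeral,
- a double-quoted string, in which `\"` escapes a quote,
- an HTML string.

The keywords `node`, `edge`, `graph`, `digraph`, `subgraph` and `strict` (case-insensitive) cannot be used as bare IDs. The following is a hedged sketch of that quoting rule. The hypothetical `dot_id` does not claim to reproduce `dotsafe_name` exactly:

```python
import re

# Bare DOT IDs (ASCII only here): letters, digits and underscores, not starting with a digit.
_BARE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# DOT keywords are case-insensitive and must be quoted to be used as IDs.
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def dot_id(name: str) -> str:
    """Return name as a DOT ID, quoting it when it is not a safe bare ID.

    Numerals and HTML strings are not handled; backslashes are left untouched.
    """
    if _BARE_ID.fullmatch(name) and name.lower() not in _KEYWORDS:
        return name
    # Inside a double-quoted DOT string, a literal quote is written as \".
    return '"' + name.replace('"', '\\"') + '"'
```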
-
fit_decision_tree(data, classifier) Fit the classifier to the data.
- Parameters
data (DataFrame)
classifier (ClassifierMixin)
- Return type
List[str]
- Returns
List of feature names
-
predict_unknown(unknown, classifier, factorize_map, feature_names) Predict classes for an unknown sample from a decision tree or random forest.
- Parameters
unknown (UnknownSettings)
classifier (ClassifierMixin)
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
feature_names (List[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.
- Return type
- Returns
An iterator of predicted class names and their probabilities, ranked from most to least likely.
New in version 0.9.0.
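The ranking described for `predict_unknown` can be sketched with the standard library alone. The sketch assumes that a single sample's class probabilities arrive as a sequence whose i-th entry corresponds to `factorize_map[i]`. This holds for scikit-learn's `predict_proba` when the training classes are the integer codes 0..n-1, since its columns follow the sorted `classes_`. The helper `rank_predictions` is hypothetical, not the library's implementation:

```python
from typing import Iterator, List, Sequence, Tuple


def rank_predictions(
    probabilities: Sequence[float], factorize_map: List[str]
) -> Iterator[Tuple[str, float]]:
    """Return an iterator of (class name, probability) pairs, most likely first."""
    # Checked eagerly, before any pairs are produced.
    if len(probabilities) != len(factorize_map):
        raise ValueError("Expected one probability per class")
    pairs = zip(factorize_map, probabilities)
    # sorted() stays stable with reverse=True, so tied classes keep factorize_map order.
    return iter(sorted(pairs, key=lambda pair: pair[1], reverse=True))
```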
-
simulate_data(project, normalize=False, n_simulated=10) Generate simulated peak area data for a project.
-
visualise_decision_tree(data, classifier, factorize_map, filename='decision_tree_graphivz', filetype='svg') Visualise a decision tree with graphviz.
- Parameters
data (DataFrame)
classifier (ClassifierMixin)
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
filename (str) – Output filename without extension; for a random forest, the base filename (followed by -tree-n). Default 'decision_tree_graphivz'.
filetype (str) – Output filetype (e.g. svg, png, pdf). Default 'svg'.
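The `filename` rule for random forests (base filename followed by `-tree-n`) can be illustrated as follows. Two details are assumptions: that `n` starts at 0, and that the extension is appended as `.filetype`. The function `output_filenames` is hypothetical:

```python
from typing import List, Optional


def output_filenames(
    filename: str = "decision_tree_graphivz",
    filetype: str = "svg",
    n_trees: Optional[int] = None,
) -> List[str]:
    """Return the image filenames for a single tree, or one per tree of a forest.

    n_trees=None means a single decision tree. Otherwise trees are numbered
    from 0 (an assumption of this sketch).
    """
    if n_trees is None:
        return [f"{filename}.{filetype}"]
    return [f"{filename}-tree-{n}.{filetype}" for n in range(n_trees)]
```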
gunshotmatch_pipeline.decision_tree.export
Export and load decision trees to/from JSON-safe dictionaries.
New in version 0.6.0.
Functions:
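serialise_decision_tree(model) – Serialise a decision tree to a JSON-safe dictionary.
deserialise_decision_tree(model_dict) – Deserialise a decision tree.
verify_saved_decision_tree(in_process, from_file) – Verify the saved DecisionTreeClassifier matches the model in memory.
serialise_random_forest(model) – Serialise a random forest to a JSON-safe dictionary.
deserialise_random_forest(model_dict) – Deserialise a random forest.
verify_saved_random_forest(in_process, from_file) – Verify the saved RandomForestClassifier matches the model in memory.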
-
serialise_decision_tree(model) Serialise a decision tree to a JSON-safe dictionary.
- Parameters
model (DecisionTreeClassifier) – Trained decision tree.
- Return type
-
deserialise_decision_tree(model_dict) Deserialise a decision tree.
-
verify_saved_decision_tree(in_process, from_file) Verify the saved DecisionTreeClassifier matches the model in memory.
Will raise an AssertionError if the data do not match.
- Parameters
in_process (DecisionTreeClassifier) – The DecisionTreeClassifier already in memory.
from_file (DecisionTreeClassifier) – A DecisionTreeClassifier loaded from disk.
New in version 0.7.0.
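The verify functions compare a model held in memory against one loaded from disk, and raise AssertionError on a mismatch. The same kind of check can be sketched on plain dictionaries of fitted attributes. The function `assert_same_state` and the attribute names in the test are illustrative only; they are not scikit-learn's or the library's internals:

```python
from typing import Any, Dict


def assert_same_state(in_process: Dict[str, Any], from_file: Dict[str, Any]) -> None:
    """Raise AssertionError naming every key that differs or exists on only one side."""
    mismatched = sorted(
        key
        for key in in_process.keys() | from_file.keys()
        if (key in in_process) != (key in from_file)
        or in_process.get(key) != from_file.get(key)
    )
    # Raise explicitly rather than using `assert`, which is skipped under `python -O`.
    if mismatched:
        raise AssertionError(f"Saved model differs in: {', '.join(mismatched)}")
```

One pitfall with JSON round trips: `json` turns tuples into lists, and `(1, 2) != [1, 2]` in Python. A check like this would therefore report a mismatch unless the loader restores the original types.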
-
serialise_random_forest(model) Serialise a random forest to a JSON-safe dictionary.
- Parameters
model (RandomForestClassifier) – Trained random forest.
- Return type
-
deserialise_random_forest(model_dict) Deserialise a random forest.
-
verify_saved_random_forest(in_process, from_file) Verify the saved RandomForestClassifier matches the model in memory.
Will raise an AssertionError if the data do not match.
- Parameters
in_process (RandomForestClassifier) – The RandomForestClassifier already in memory.
from_file (RandomForestClassifier) – A RandomForestClassifier loaded from disk.
New in version 0.7.0.
gunshotmatch_pipeline.decision_tree.predictions
Represents random forest classifier predictions for testing classifier performance.
New in version 0.9.0.
Classes:
PredictionResult(name, class_name, predictions) – Represents the predicted classes from a random forest classifier.
Functions:
dump_predictions(predictions[, indent]) – Return a JSON representation of the predictions.
Load predictions from the given JSON string.
-
namedtuple PredictionResult(name, class_name, predictions)
Bases: NamedTuple
Represents the predicted classes from a random forest classifier.
- Fields
-
property correct Returns whether the top prediction matches the actual class name.
- Return type
bool
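A minimal sketch of how such a `correct` property can be written on a NamedTuple follows. It assumes that `name` is the sample name, `class_name` is the actual class, and `predictions` holds (class name, probability) pairs ranked most likely first. Those field meanings are assumptions, and `PredictionSketch` is a hypothetical stand-in rather than the library's `PredictionResult`:

```python
from typing import NamedTuple, Tuple


class PredictionSketch(NamedTuple):
    """Hypothetical stand-in for PredictionResult, for illustration only."""

    name: str  # assumed: the sample's name
    class_name: str  # assumed: the sample's actual class
    predictions: Tuple[Tuple[str, float], ...]  # assumed: ranked, most likely first

    @property
    def correct(self) -> bool:
        # The top prediction is the first pair; an empty ranking counts as incorrect.
        return bool(self.predictions) and self.predictions[0][0] == self.class_name
```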
-
__repr__() Return a nicely formatted representation string.
-
dump_predictions(predictions, indent=2) Return a JSON representation of the predictions.
- Parameters
predictions (List[PredictionResult])
- Return type
str
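A JSON round trip for prediction results can be sketched with the `json` module. The layout used here is an assumption and may not match the format `dump_predictions` actually writes. It is a list of objects with `name`, `class_name` and `predictions` keys. `Prediction`, `dump` and `load` are hypothetical names:

```python
import json
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    """Hypothetical result record: sample name, actual class, ranked predictions."""

    name: str
    class_name: str
    predictions: Tuple[Tuple[str, float], ...]


def dump(results: List[Prediction], indent: int = 2) -> str:
    # _asdict() maps field names to values; json writes the tuples as lists.
    return json.dumps([r._asdict() for r in results], indent=indent)


def load(json_string: str) -> List[Prediction]:
    results = []
    for item in json.loads(json_string):
        # Rebuild tuples so loaded results compare equal to the originals.
        ranked = tuple((str(cls), float(prob)) for cls, prob in item["predictions"])
        results.append(Prediction(item["name"], item["class_name"], ranked))
    return results
```

Converting lists back to tuples in `load` is what makes `load(dump(x)) == x` hold. Without it, the nested lists would not compare equal to the original tuples.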