gunshotmatch_pipeline.decision_tree
Prepare data and train decision trees.
Classes:
| DecisionTreeVisualiser(classifier, feature_names, factorize_map) | Class for exporting visualisations of a decision tree or random forest. |
Functions:
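| data_from_projects(projects, normalize=False) | Returns a DataFrame containing decision tree data for the given projects. |
| data_from_unknown(unknown, feature_names, normalize=False) | Returns a DataFrame containing decision tree data for the given unknown. |
| dotsafe_name(name) | Return a dot (graphviz) suitable name for a sample, with special characters escaped. |
| fit_decision_tree(data, classifier) | Fit the classifier to the data. |
| | Return the feature names for the given data. |
| predict_unknown(unknown, classifier, factorize_map, feature_names) | Predict classes for an unknown sample from a decision tree or random forest. |
| simulate_data(project, normalize=False, n_simulated=10) | Generate simulated peak area data for a project. |
| visualise_decision_tree(data, classifier, factorize_map, filename, filetype) | Visualise a decision tree with graphviz. |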
class DecisionTreeVisualiser(classifier, feature_names, factorize_map)
Bases: object
Class for exporting visualisations of a decision tree or random forest.
New in version 0.8.0.
- Parameters
classifier (ClassifierMixin) – Decision tree or random forest classifier.
feature_names (List[str]) – The compounds the decision tree was trained on.
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
Methods:
| __eq__(other) | Return self == other. |
| | Used for pickling. |
| __ne__(other) | Return self != other. |
| __repr__() | Return a string representation of the DecisionTreeVisualiser. |
| __setattr__(name, val) | Implement setattr(self, name). |
| __setstate__(state) | Used for pickling. |
| from_data(data, classifier, factorize_map) | Alternative constructor from the pandas dataframe the classifier was trained on. |
| visualise_tree([filename, filetype]) | Visualise the decision tree or random forest as an image. |
Attributes:
| classifier | Decision tree or random forest classifier. |
| factorize_map | List of class names in the order they appear as classes in the classifier. |
| feature_names | The compounds the decision tree was trained on. |
__repr__()
Return a string representation of the DecisionTreeVisualiser.
- Return type
str
__setattr__(name, val)
Implement setattr(self, name).
classifier
Type: ClassifierMixin
Decision tree or random forest classifier.
factorize_map
Type: List[str]
List of class names in the order they appear as classes in the classifier.
classmethod from_data(data, classifier, factorize_map)
Alternative constructor from the pandas dataframe the classifier was trained on.
- Return type
DecisionTreeVisualiser
data_from_projects(projects, normalize=False)
Returns a DataFrame containing decision tree data for the given projects.
data_from_unknown(unknown, feature_names, normalize=False)
Returns a DataFrame containing decision tree data for the given unknown.
- Parameters
unknown (UnknownSettings)
feature_names (Collection[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.
- Return type
DataFrame
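The exclusion of extra compounds can be pictured with a small standard-library sketch. This is not the library's implementation: `align_features` is a hypothetical helper, the compound names are made up, and filling compounds that are missing from the unknown with 0.0 is an assumption of this example.

```python
from typing import Collection, Dict, List


def align_features(peak_areas: Dict[str, float], feature_names: Collection[str]) -> List[float]:
    """Return peak areas in training-feature order (hypothetical helper)."""
    # Compounds the model was not trained on are simply never looked up,
    # so they are excluded. Compounds absent from the unknown become 0.0,
    # which is an assumption of this sketch.
    return [peak_areas.get(name, 0.0) for name in feature_names]


training_features = ["diphenylamine", "ethyl centralite", "nitroglycerin"]
unknown_sample = {"nitroglycerin": 0.6, "diphenylamine": 0.3, "caffeine": 0.1}

row = align_features(unknown_sample, training_features)
print(row)
```

The resulting row has one value per training feature, in training order, which is the shape a fitted classifier expects at prediction time.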
dotsafe_name(name)
Return a dot (graphviz) suitable name for a sample, with special characters escaped.
New in version 0.5.0.
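As an illustration of why escaping matters, the sketch below quotes a sample name as a DOT identifier. It does not reproduce what `dotsafe_name` does internally; `quote_dot_id` is a hypothetical stand-in that relies only on the DOT rule that a double-quoted ID must have embedded double quotes escaped.

```python
def quote_dot_id(name: str) -> str:
    """Wrap a name in double quotes for use as a DOT ID (hypothetical helper)."""
    # Escape backslashes first so they are not read as the start of an
    # escape sequence, then escape embedded double quotes.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_dot_id('Winchester "Super-X" .22'))
```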
fit_decision_tree(data, classifier)
Fit the classifier to the data.
- Parameters
data (DataFrame)
classifier (ClassifierMixin)
- Return type
List[str]
- Returns
List of feature names.
predict_unknown(unknown, classifier, factorize_map, feature_names)
Predict classes for an unknown sample from a decision tree or random forest.
- Parameters
unknown (UnknownSettings)
classifier (ClassifierMixin)
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
feature_names (List[str]) – The compounds the decision tree was trained on. Extra compounds in the unknown will be excluded.
- Returns
An iterator of predicted class names and their probabilities, ranked from most to least likely.
New in version 0.9.0.
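The role of factorize_map in ranking can be sketched without the library. In the example below, `rank_classes` is a hypothetical helper and the probabilities are invented; it only shows how a list of per-class probabilities, indexed in classifier order, can be paired with class names and sorted from most to least likely.

```python
from typing import Iterator, List, Sequence, Tuple


def rank_classes(
    probabilities: Sequence[float],
    factorize_map: List[str],
) -> Iterator[Tuple[str, float]]:
    """Yield (class name, probability) pairs, most likely first (hypothetical helper)."""
    # Position i in `probabilities` corresponds to factorize_map[i].
    pairs = zip(factorize_map, probabilities)
    # sorted() is stable, so tied classes keep their classifier order.
    yield from sorted(pairs, key=lambda pair: pair[1], reverse=True)


factorize_map = ["Ammo A", "Ammo B", "Ammo C"]
probabilities = [0.15, 0.7, 0.15]

ranked = list(rank_classes(probabilities, factorize_map))
print(ranked)
```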
simulate_data(project, normalize=False, n_simulated=10)
Generate simulated peak area data for a project.
visualise_decision_tree(data, classifier, factorize_map, filename='decision_tree_graphivz', filetype='svg')
Visualise a decision tree with graphviz.
- Parameters
data (DataFrame)
classifier (ClassifierMixin)
factorize_map (List[str]) – List of class names in the order they appear as classes in the classifier.
filename (str) – Output filename without extension; for a random forest, the base filename (followed by -tree-n). Default 'decision_tree_graphivz'.
filetype (str) – Output filetype (e.g. svg, png, pdf). Default 'svg'.
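The filename convention described above might produce output names along the following lines. `tree_filenames` is a hypothetical helper, and two details are assumptions of this sketch rather than documented behaviour: that n counts from 0, and that the filetype is appended as a dotted extension.

```python
from typing import List, Optional


def tree_filenames(filename: str, filetype: str = "svg", n_trees: Optional[int] = None) -> List[str]:
    """Return the output names for a single tree or a forest (hypothetical helper)."""
    if n_trees is None:
        # Single decision tree: one file named after `filename`.
        return [f"{filename}.{filetype}"]
    # Random forest: one file per tree, with "-tree-n" after the base name.
    # Counting from 0 is an assumption of this sketch.
    return [f"{filename}-tree-{n}.{filetype}" for n in range(n_trees)]


print(tree_filenames("decision_tree_graphivz"))
print(tree_filenames("forest", "png", n_trees=3))
```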
gunshotmatch_pipeline.decision_tree.export
Export and load decision trees to/from JSON-safe dictionaries.
New in version 0.6.0.
Functions:
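| serialise_decision_tree(model) | Serialise a decision tree to a JSON-safe dictionary. |
| deserialise_decision_tree(model_dict) | Deserialise a decision tree. |
| verify_saved_decision_tree(in_process, from_file) | Verify the saved DecisionTreeClassifier matches the model in memory. |
| serialise_random_forest(model) | Serialise a random forest to a JSON-safe dictionary. |
| deserialise_random_forest(model_dict) | Deserialise a random forest. |
| verify_saved_random_forest(in_process, from_file) | Verify the saved RandomForestClassifier matches the model in memory. |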
serialise_decision_tree(model)
Serialise a decision tree to a JSON-safe dictionary.
- Parameters
model (DecisionTreeClassifier) – Trained decision tree.
deserialise_decision_tree(model_dict)
Deserialise a decision tree.
verify_saved_decision_tree(in_process, from_file)
Verify the saved DecisionTreeClassifier matches the model in memory.
Will raise an AssertionError if the data do not match.
- Parameters
in_process (DecisionTreeClassifier) – The DecisionTreeClassifier already in memory.
from_file (DecisionTreeClassifier) – A DecisionTreeClassifier loaded from disk.
New in version 0.7.0.
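The serialise, reload, verify cycle can be sketched with plain dictionaries and the standard json module. Nothing here calls the library: `verify_saved` is a hypothetical helper, and `model_dict` is an invented stand-in for a JSON-safe model dictionary, not the layout the library actually writes.

```python
import json


def verify_saved(in_process: dict, from_file: dict) -> None:
    """Raise AssertionError if the two dictionaries differ (hypothetical helper)."""
    assert in_process.keys() == from_file.keys(), "parameter names differ"
    for key, value in in_process.items():
        assert from_file[key] == value, f"mismatch for {key!r}"


# Invented stand-in for a serialised model; real dictionaries will differ.
model_dict = {"max_depth": 3, "classes": ["A", "B"], "threshold": [0.5, -2.0, -2.0]}

# Round-trip through a JSON string, as if saving to and loading from disk.
reloaded = json.loads(json.dumps(model_dict))

verify_saved(model_dict, reloaded)  # passes silently when the data match
```

Checking with assertions mirrors the documented behaviour of raising an AssertionError on a mismatch, so a failed round-trip stops a script immediately rather than producing a subtly different model.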
serialise_random_forest(model)
Serialise a random forest to a JSON-safe dictionary.
- Parameters
model (RandomForestClassifier) – Trained random forest.
deserialise_random_forest(model_dict)
Deserialise a random forest.
verify_saved_random_forest(in_process, from_file)
Verify the saved RandomForestClassifier matches the model in memory.
Will raise an AssertionError if the data do not match.
- Parameters
in_process (RandomForestClassifier) – The RandomForestClassifier already in memory.
from_file (RandomForestClassifier) – A RandomForestClassifier loaded from disk.
New in version 0.7.0.
gunshotmatch_pipeline.decision_tree.predictions
Represents random forest classifier predictions for testing classifier performance.
New in version 0.9.0.
Classes:
| PredictionResult(name, class_name, predictions) | Represents the predicted classes from a random forest classifier. |
Functions:
| dump_predictions(predictions, indent=2) | Return a JSON representation of the predictions. |
| | Load predictions from the given JSON string. |
namedtuple PredictionResult(name, class_name, predictions)
Bases: NamedTuple
Represents the predicted classes from a random forest classifier.
- Fields
name
class_name
predictions
property correct
Returns whether the top prediction matches the actual class name.
- Return type
bool
__repr__()
Return a nicely formatted representation string.
dump_predictions(predictions, indent=2)
Return a JSON representation of the predictions.
- Parameters
predictions (List[PredictionResult])
- Return type
str
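To show how a named tuple with a `correct` property and a JSON dump fit together, here is a standard-library sketch. `PredictionSketch` and `dump_sketch` are hypothetical stand-ins, not the library's classes; the field types, the ranked (class name, probability) layout of `predictions`, and the JSON structure are all assumptions of this example.

```python
import json
from typing import List, NamedTuple, Tuple


class PredictionSketch(NamedTuple):
    """Hypothetical stand-in with the same field names as PredictionResult."""

    name: str
    class_name: str
    # Assumed layout: (class name, probability) pairs, most likely first.
    predictions: Tuple[Tuple[str, float], ...]

    @property
    def correct(self) -> bool:
        # True when the top-ranked prediction is the actual class.
        return self.predictions[0][0] == self.class_name


def dump_sketch(results: List[PredictionSketch], indent: int = 2) -> str:
    """Return a JSON string for the results (layout is an assumption)."""
    return json.dumps([result._asdict() for result in results], indent=indent)


result = PredictionSketch("unknown_1", "Ammo B", (("Ammo B", 0.7), ("Ammo A", 0.3)))
print(result.correct)
print(dump_sketch([result]))
```

Note that JSON has no tuple type, so the nested pairs come back as lists when the string is parsed again; a loader would need to convert them if tuples are required.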