podium.validation package¶
Submodules¶
podium.validation.kfold module¶
-
class podium.validation.kfold.KFold(n_splits='warn', shuffle=False, random_state=None)¶
Bases: sklearn.model_selection._split.KFold
Adapter class for the scikit-learn KFold class. Works with podium datasets directly.
-
split(dataset)¶
Splits the dataset into multiple train and test folds, as typically used in model validation.
- Parameters
dataset (Dataset) – The dataset to be split into folds.
- Yields
train_set, test_set – Yields the train and test datasets for every fold.
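Since this adapter subclasses the scikit-learn KFold, its splitting behavior can be sketched with plain index-based folds. This is an illustrative sketch, not the podium API itself: a plain list stands in for a podium Dataset.

```python
from sklearn.model_selection import KFold

# Illustrative sketch of what split() yields: for each fold, a
# (train_set, test_set) pair built from index-based splits.
# A plain list stands in for a podium Dataset here.
data = list(range(10))
kf = KFold(n_splits=5, shuffle=False)

folds = []
for train_idx, test_idx in kf.split(data):
    train_set = [data[i] for i in train_idx]
    test_set = [data[i] for i in test_idx]
    folds.append((train_set, test_set))

print(len(folds))   # 5 folds
print(folds[0][1])  # test split of the first fold: [0, 1]
```

Without shuffling, scikit-learn's KFold produces contiguous test folds, which is why the first test split here is simply the first two elements.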
-
podium.validation.validation.k_fold_classification_metrics(experiment: podium.models.experiment.Experiment, dataset: podium.datasets.dataset.Dataset, n_splits: int, average: str = 'micro', beta: float = 1.0, labels: List[int] = None, pos_label: int = 1, shuffle: Optional[bool] = False, random_state: int = None) → Tuple[float, float, float, float]¶
Calculates the most commonly used classification metrics: accuracy, precision, recall and the F1 score. Each score is calculated for every fold, and the mean of every score over all folds is returned.
- Parameters
experiment (Experiment) – Experiment defining the training and prediction procedure to be evaluated.
dataset (Dataset) – Dataset to be used for experiment evaluation.
n_splits (int) – Number of folds.
average (str, Optional) –
Determines the type of averaging performed.
The supported averaging methods are:
- ’micro’:
Calculate metrics globally by counting the total true positives, false negatives and false positives.
- ’macro’:
Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- ’weighted’:
Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- ’binary’:
Only report results for the class specified by pos_label. This is applicable only if the targets (i.e. the results of predict) are binary.
- None:
The scores for each class are returned.
beta (float) – The strength of recall versus precision in the F-score.
labels (List, optional) – The set of labels to include when average != ‘binary’, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices.
pos_label (int) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
shuffle (bool, optional) – Whether to shuffle the data before splitting it into folds.
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only when shuffle == True.
- Returns
A tuple containing four classification metrics: accuracy, precision, recall and F1. Each returned score is the mean of that score over all folds.
- Return type
tuple(float, float, float, float)
- Raises
ValueError – If average is not one of: micro, macro, weighted, binary
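The averaging behavior described above follows scikit-learn's metrics. A minimal sketch of how the four returned values can be computed per fold and then averaged; the per-fold prediction arrays below are made up for illustration and are not podium API calls:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up (y_true, y_predicted) pairs for two folds.
fold_predictions = [
    (np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])),
    (np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0])),
]

accs, precs, recs, f1s = [], [], [], []
for y_true, y_pred in fold_predictions:
    # With average='micro', precision, recall and F1 are computed from
    # global true-positive/false-positive/false-negative counts.
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, average="micro", beta=1.0
    )
    accs.append(accuracy_score(y_true, y_pred))
    precs.append(p)
    recs.append(r)
    f1s.append(f)

# Mean of each metric over all folds, in the documented order.
metrics = tuple(float(np.mean(s)) for s in (accs, precs, recs, f1s))
print(metrics)  # (0.75, 0.75, 0.75, 0.75)
```

Note that for single-label data, micro-averaged precision, recall and F1 all coincide with accuracy, which is why the four values agree here.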
-
podium.validation.validation.k_fold_validation(experiment: podium.models.experiment.Experiment, dataset: podium.datasets.dataset.Dataset, n_splits: int, score_fun: Callable[[numpy.ndarray, numpy.ndarray], float], shuffle: Optional[bool] = False, random_state: int = None) → Union[numpy.ndarray, int, float]¶
Convenience wrapper around kfold_scores. Calculates the score for every fold and returns the mean of all scores.
- Parameters
experiment (Experiment) – Experiment defining the training and prediction procedure to be evaluated.
dataset (Dataset) – Dataset to be used for experiment evaluation.
n_splits (int) – Number of folds.
score_fun (Callable (y_true, y_predicted) -> score) – Callable used to evaluate the score for a fold. It should take two numpy array arguments, y_true and y_predicted, where y_true is the ground truth and y_predicted are the model’s predictions, and return a score that can be a numpy array, an int or a float.
shuffle (bool, optional) – Whether to shuffle the data before splitting it into folds.
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only when shuffle == True.
- Returns
The mean of the scores over all folds.
- Return type
Union[numpy.ndarray, int, float]
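The score_fun contract and the averaging step can be sketched in a few lines; accuracy here is an example score_fun, and the per-fold arrays are made up for illustration:

```python
import numpy as np

# A score_fun is any callable (y_true, y_predicted) -> score.
def accuracy(y_true, y_predicted):
    return float(np.mean(y_true == y_predicted))

# Made-up per-fold ground truth and predictions.
fold_scores = [
    accuracy(np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1])),  # fold 1
    accuracy(np.array([0, 1, 0, 0]), np.array([0, 1, 0, 1])),  # fold 2
]

# k_fold_validation simply averages the per-fold scores.
mean_score = sum(fold_scores) / len(fold_scores)
print(mean_score)  # 0.75
```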
-
podium.validation.validation.kfold_scores(experiment: podium.models.experiment.Experiment, dataset: podium.datasets.dataset.Dataset, n_splits: int, score_fun: Callable[[numpy.ndarray, numpy.ndarray], Union[numpy.ndarray, int, float]], shuffle: Optional[bool] = False, random_state: int = None) → List[Union[numpy.ndarray, int, float]]¶
Calculates a score for each train/test fold. The score for a fold is calculated by first fitting the experiment to the train split, then using the test split to calculate predictions and evaluate the score. This is repeated for every fold.
- Parameters
experiment (Experiment) – Experiment defining the training and prediction procedure to be evaluated.
dataset (Dataset) – Dataset to be used for experiment evaluation.
n_splits (int) – Number of folds.
score_fun (Callable (y_true, y_predicted) -> score) – Callable used to evaluate the score for a fold. It should take two numpy array arguments, y_true and y_predicted, where y_true is the ground truth and y_predicted are the model’s predictions, and return a score that can be a numpy array, an int or a float.
shuffle (bool, optional) – Whether to shuffle the data before splitting it into folds.
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only when shuffle == True.
- Returns
A list of the scores produced by score_fun for every fold.
- Return type
List[Union[numpy.ndarray, int, float]]
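The fit/predict/score loop described above can be sketched with stand-in objects. MajorityExperiment and the list of (instance, label) pairs below are hypothetical, not part of podium; scikit-learn's KFold supplies the index splits:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in for an Experiment: fit() memorizes the majority
# label of the train split, predict() returns it for every instance.
class MajorityExperiment:
    def fit(self, dataset):
        labels = [label for _, label in dataset]
        self.majority = max(set(labels), key=labels.count)

    def predict(self, dataset):
        return np.array([self.majority] * len(dataset))

def accuracy(y_true, y_predicted):  # an example score_fun
    return float(np.mean(y_true == y_predicted))

# Toy (instance, label) pairs standing in for a podium Dataset.
dataset = [(x, 1 if x < 9 else 0) for x in range(12)]
experiment = MajorityExperiment()

scores = []
for train_idx, test_idx in KFold(n_splits=3).split(dataset):
    train = [dataset[i] for i in train_idx]
    test = [dataset[i] for i in test_idx]
    experiment.fit(train)                         # fit on the train split
    y_true = np.array([label for _, label in test])
    y_predicted = experiment.predict(test)        # predict on the test split
    scores.append(accuracy(y_true, y_predicted))  # evaluate the fold score

print(scores)  # [1.0, 1.0, 0.25]
```

The last fold scores poorly because its test split holds most of the minority class, which the majority-label baseline never predicts; this is exactly the kind of per-fold variance that inspecting the raw score list reveals.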
Module contents¶
This package contains the modules used in model validation.
-
class podium.validation.KFold(n_splits='warn', shuffle=False, random_state=None)¶
Bases: sklearn.model_selection._split.KFold
Adapter class for the scikit-learn KFold class. Works with podium datasets directly.
-
split(dataset)¶
Splits the dataset into multiple train and test folds, as typically used in model validation.
- Parameters
dataset (Dataset) – The dataset to be split into folds.
- Yields
train_set, test_set – Yields the train and test datasets for every fold.
-
podium.validation.kfold_scores(experiment: podium.models.experiment.Experiment, dataset: podium.datasets.dataset.Dataset, n_splits: int, score_fun: Callable[[numpy.ndarray, numpy.ndarray], Union[numpy.ndarray, int, float]], shuffle: Optional[bool] = False, random_state: int = None) → List[Union[numpy.ndarray, int, float]]¶
Calculates a score for each train/test fold. The score for a fold is calculated by first fitting the experiment to the train split, then using the test split to calculate predictions and evaluate the score. This is repeated for every fold.
- Parameters
experiment (Experiment) – Experiment defining the training and prediction procedure to be evaluated.
dataset (Dataset) – Dataset to be used for experiment evaluation.
n_splits (int) – Number of folds.
score_fun (Callable (y_true, y_predicted) -> score) – Callable used to evaluate the score for a fold. It should take two numpy array arguments, y_true and y_predicted, where y_true is the ground truth and y_predicted are the model’s predictions, and return a score that can be a numpy array, an int or a float.
shuffle (bool, optional) – Whether to shuffle the data before splitting it into folds.
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only when shuffle == True.
- Returns
A list of the scores produced by score_fun for every fold.
- Return type
List[Union[numpy.ndarray, int, float]]
-
podium.validation.k_fold_validation(experiment: podium.models.experiment.Experiment, dataset: podium.datasets.dataset.Dataset, n_splits: int, score_fun: Callable[[numpy.ndarray, numpy.ndarray], float], shuffle: Optional[bool] = False, random_state: int = None) → Union[numpy.ndarray, int, float]¶
Convenience wrapper around kfold_scores. Calculates the score for every fold and returns the mean of all scores.
- Parameters
experiment (Experiment) – Experiment defining the training and prediction procedure to be evaluated.
dataset (Dataset) – Dataset to be used for experiment evaluation.
n_splits (int) – Number of folds.
score_fun (Callable (y_true, y_predicted) -> score) – Callable used to evaluate the score for a fold. It should take two numpy array arguments, y_true and y_predicted, where y_true is the ground truth and y_predicted are the model’s predictions, and return a score that can be a numpy array, an int or a float.
shuffle (bool, optional) – Whether to shuffle the data before splitting it into folds.
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only when shuffle == True.
- Returns
The mean of the scores over all folds.
- Return type
Union[numpy.ndarray, int, float]
-
podium.validation.k_fold_classification_metrics(experiment: podium.models.experiment.Experiment, dataset: podium.datasets.dataset.Dataset, n_splits: int, average: str = 'micro', beta: float = 1.0, labels: List[int] = None, pos_label: int = 1, shuffle: Optional[bool] = False, random_state: int = None) → Tuple[float, float, float, float]¶
Calculates the most commonly used classification metrics: accuracy, precision, recall and the F1 score. Each score is calculated for every fold, and the mean of every score over all folds is returned.
- Parameters
experiment (Experiment) – Experiment defining the training and prediction procedure to be evaluated.
dataset (Dataset) – Dataset to be used for experiment evaluation.
n_splits (int) – Number of folds.
average (str, Optional) –
Determines the type of averaging performed.
The supported averaging methods are:
- ’micro’:
Calculate metrics globally by counting the total true positives, false negatives and false positives.
- ’macro’:
Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
- ’weighted’:
Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
- ’binary’:
Only report results for the class specified by pos_label. This is applicable only if the targets (i.e. the results of predict) are binary.
- None:
The scores for each class are returned.
beta (float) – The strength of recall versus precision in the F-score.
labels (List, optional) – The set of labels to include when average != ‘binary’, and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. For multilabel targets, labels are column indices.
pos_label (int) – The class to report if average=’binary’ and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != ‘binary’ will report scores for that label only.
shuffle (bool, optional) – Whether to shuffle the data before splitting it into folds.
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used only when shuffle == True.
- Returns
A tuple containing four classification metrics: accuracy, precision, recall and F1. Each returned score is the mean of that score over all folds.
- Return type
tuple(float, float, float, float)
- Raises
ValueError – If average is not one of: micro, macro, weighted, binary