API Reference

This part of the documentation details the complete SeFEF API.

Modules

sefef.evaluation

This module contains functions to implement time-series cross validation (TSCV).

copyright:
  © 2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

class sefef.evaluation.Dataset(timestamps, samples_duration, sz_onsets)

Bases: object

Create a Dataset with metadata on the data that will be used for training and testing.

timestamps

Unix timestamp (in seconds) of the start of each sample.

Type:

array-like, shape (#samples,)

samples_duration

Duration of samples in seconds.

Type:

array-like, shape (#samples,)

sz_onsets

Contains the Unix-time timestamps (in seconds) corresponding to the onsets of seizures.

Type:

np.array

sampling_frequency

Frequency at which the data is stored in each file.

Type:

int

class sefef.evaluation.TimeSeriesCV(preictal_duration, prediction_latency, n_min_events_train=3, n_min_events_test=1, post_sz_interval=3600, pre_lead_sz_interval=14400, initial_train_duration=None, test_duration=None)

Bases: object

Implements time series cross validation (TSCV).

preictal_duration

Duration of the period (in seconds) that will be labeled as preictal, i.e. that we expect to contain useful information for the forecast.

Type:

int, defaults to 3600 (60min)

prediction_latency

Latency (in seconds) of the preictal period with regard to seizure onset.

Type:

int, defaults to 600 (10min)

n_min_events_train

Minimum number of lead seizures to include in the train set. Should guarantee at least one lead seizure is left for testing.

Type:

int, defaults to 3

n_min_events_test

Minimum number of lead seizures to include in each test set.

Type:

int, defaults to 1

post_sz_interval

Time interval (in seconds) after a lead seizure that should be included in the same set as the corresponding seizure. This time will be removed from the train set, along with the seizure onset and prediction_latency.

Type:

int, defaults to 3600 (60min)

pre_lead_sz_interval

Time interval (in seconds) free of seizures by which a seizure should be preceded to be considered a lead seizure.

Type:

int, defaults to 14400 (4h)

initial_train_duration

Duration (in seconds) of the train set in the initial split.

Type:

int, defaults to 1/3 of total recorded duration

test_duration

Duration (in seconds) of the test set.

Type:

int, defaults to 1/2 of ‘initial_train_duration’

method

Method for TSCV: can be either ‘expanding’ or ‘sliding’. Only ‘expanding’ is currently implemented.

Type:

str

n_folds

Number of folds for the TSCV, determined according to the attributes set by the user and available data.

Type:

int

split_ind_ts

Contains split timestamp indices (train_start_ts, test_start_ts, test_end_ts) for each fold. Initialized as None and populated by the ‘split’ method.

Type:

array-like, shape (n_folds, 3)

split(dataset, iteratively) :

Get timestamp indices to split data for time series cross-validation.

  • The train set can be obtained by metadata.loc[train_start_ts : test_start_ts].

  • The test set can be obtained by metadata.loc[test_start_ts : test_end_ts].

plot(dataset) :

Plots the TSCV folds with the available data.

iterate() :

Iterates over the TSCV folds and at each iteration returns a train set and a test set.

Raises:
  • ValueError : – Raised whenever TSCV cannot be performed with the attributes set by the user and the available data.

  • AttributeError : – Raised when ‘plot’ is called before ‘split’.
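The lead-seizure criterion behind ‘pre_lead_sz_interval’ can be illustrated with a short, self-contained sketch. The helper name `get_lead_seizures` is hypothetical, and treating the first recorded seizure as a lead seizure is an assumption of this example, not necessarily SeFEF's behaviour.

```python
def get_lead_seizures(sz_onsets, pre_lead_sz_interval=14400):
    """Return the onsets (Unix seconds) of lead seizures.

    Hypothetical helper: a seizure is treated as a lead seizure when no other
    seizure occurred in the preceding `pre_lead_sz_interval` seconds.
    Assumption: the first seizure in the recording is always a lead seizure.
    """
    onsets = sorted(sz_onsets)
    lead = []
    for i, onset in enumerate(onsets):
        # Compare with the immediately preceding seizure (lead or not).
        if i == 0 or onset - onsets[i - 1] >= pre_lead_sz_interval:
            lead.append(onset)
    return lead


# The second seizure occurs 1 h after the first, the third 5 h after the second.
onsets = [0, 3600, 3600 + 5 * 3600]
print(get_lead_seizures(onsets))  # [0, 21600]
```

Because the comparison is made with any preceding seizure, a cluster of seizures contributes only its first member as a lead seizure.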

get_TSCV_fold(h5dataset, ifold, remove_non_preictal_interictal_samples=True)

Returns a train set and a test set from corresponding TSCV fold.

Parameters:
  • h5dataset (HDF5 file) – HDF5 file object with the following datasets:
    • “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
    • “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
    • “annotations”: contains the labels (0: interictal, 1: preictal) for each sample in the “data” dataset, with shape (#samples, ).
    • “sz_onsets”: contains the Unix timestamps of the onsets of seizures, with shape (#sz_onsets, ).

  • ifold (int) – Index corresponding to TSCV fold.

  • remove_non_preictal_interictal_samples (bool, defaults to True) – Whether to remove samples that are neither preictal nor interictal, i.e. samples containing the onsets of seizures, as well as the intervals corresponding to “prediction_latency” and “post_sz_interval”.

Returns:

tuple

  • ((train_data, train_annotations, train_timestamps, train_sz_onsets), (test_data, test_annotations, test_timestamps, test_sz_onsets))

  • Where:
    • “[]_data”: A slice of “h5dataset[“data”]”, with shape (#samples, embedding shape), e.g. (#samples, #features) or (#samples, sample duration, #channels), and dtype “float32”.

    • “[]_annotations”: A slice of “h5dataset[“annotations”]”, with shape (#samples, ) and dtype “bool”.

    • “[]_timestamps”: A slice of “h5dataset[“timestamps”]”, with shape (#samples, ) and dtype “int64”.

    • “[]_sz_onsets”: A slice of “h5dataset[“sz_onsets”]”, with shape (#sz onsets, ) and dtype “int64”.
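As a rough illustration of ‘remove_non_preictal_interictal_samples’, the sketch below flags which samples would be kept, assuming a sample is discarded when its time span overlaps the interval from ‘prediction_latency’ seconds before a seizure onset to ‘post_sz_interval’ seconds after it. The helper `keep_mask` and this interval convention are assumptions; SeFEF's exact boundaries may differ.

```python
def keep_mask(timestamps, sample_duration, sz_onsets,
              prediction_latency=600, post_sz_interval=3600):
    """Return True for samples to keep, False for samples to discard.

    Hypothetical helper: a sample spanning [start, start + sample_duration)
    is discarded when it overlaps [onset - prediction_latency,
    onset + post_sz_interval) for any seizure onset, i.e. when it contains
    the onset or lies in the latency or post-seizure interval.
    """
    mask = []
    for start in timestamps:
        end = start + sample_duration
        overlaps = any(
            start < onset + post_sz_interval and end > onset - prediction_latency
            for onset in sz_onsets
        )
        mask.append(not overlaps)
    return mask


# 10-min samples around a seizure at t = 3600 s, with a 20-min post-seizure interval.
print(keep_mask(range(0, 6000, 600), 600, [3600], post_sz_interval=1200))
# [True, True, True, True, True, False, False, False, True, True]
```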

iterate(h5dataset, remove_non_preictal_interictal_samples=True)

Iterates over the TSCV folds and at each iteration returns a train set and a test set.

Parameters:
  • h5dataset (HDF5 file) – HDF5 file object with the following datasets:
    • “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
    • “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
    • “annotations”: contains the labels (0: interictal, 1: preictal) for each sample in the “data” dataset, with shape (#samples, ).
    • “sz_onsets”: contains the Unix timestamps of the onsets of seizures, with shape (#sz_onsets, ).

  • remove_non_preictal_interictal_samples (bool, defaults to True) – Whether to remove samples that are neither preictal nor interictal, i.e. samples containing the onsets of seizures, as well as the intervals corresponding to “prediction_latency” and “post_sz_interval”.

Returns:

tuple

  • ((train_data, train_annotations, train_timestamps), (test_data, test_sz_onsets, test_timestamps))

  • Where:
    • “[]_data”: A slice of “h5dataset[“data”]”, with shape (#samples, embedding shape), e.g. (#samples, #features) or (#samples, sample duration, #channels), and dtype “float32”.

    • “[]_annotations”: A slice of “h5dataset[“annotations”]”, with shape (#samples, ) and dtype “bool”.

    • “[]_sz_onsets”: A slice of “h5dataset[“sz_onsets”]”, with shape (#sz onsets, ) and dtype “int64”.

    • “[]_timestamps”: A slice of “h5dataset[“timestamps”]”, with shape (#samples, ) and dtype “int64”.

plot(dataset, folder_path=None, filename=None, mode='lines')

Plots the TSCV folds with the available data.

Parameters:
  • dataset (Dataset) – Instance of Dataset.

  • mode (str) – Trace scatter mode (“lines” or “markers”). For sparse data, “markers” is more suitable, despite being heavier to plot.

split(dataset, iteratively=False, plot=False, extend_final_test_set=False)

Get timestamp indices to split data for time series cross-validation.

  • The train set would be given by metadata.loc[train_start_ts : test_start_ts].

  • The test set would be given by metadata.loc[test_start_ts : test_end_ts].

Parameters:
  • dataset (Dataset) – Instance of Dataset.

  • iteratively (bool, defaults to False) – Whether to return the timestamp indices for each fold iteratively (True) or only update ‘split_ind_ts’ (False).

  • plot (bool, defaults to False) – Whether to show a diagram illustrating the TSCV at the end. Requires ‘iteratively’ to be False.

  • extend_final_test_set (bool, defaults to False) – Whether to extend the test set of the final fold to include all remaining data, or to keep the test duration approximately the same across folds.

Returns:
  • train_start_ts (int) – Timestamp index for the start of the train set.

  • test_start_ts (int) – Timestamp index for the start of the test set (and end of the train set).

  • test_end_ts (int) – Timestamp index for the end of the test set.
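The expanding-window scheme can be sketched in terms of absolute times, using the documented defaults (initial train duration of 1/3 of the data, test duration of 1/2 of the initial train duration). This simplified, hypothetical `expanding_window_folds` approximates the recorded duration by the span between the first and last timestamps, ignores recording gaps and the seizure-count constraints (‘n_min_events_train’, ‘n_min_events_test’), and does not model ‘extend_final_test_set’.

```python
def expanding_window_folds(t_start, t_end, initial_train_duration=None,
                           test_duration=None):
    """Return (train_start, test_start, test_end) times for each fold.

    Simplified sketch: folds depend on time only. The train set always starts
    at t_start and grows by test_duration from one fold to the next.
    """
    total_duration = t_end - t_start  # stand-in for the total recorded duration
    if initial_train_duration is None:
        initial_train_duration = total_duration / 3
    if test_duration is None:
        test_duration = initial_train_duration / 2
    if test_duration <= 0:
        raise ValueError("test_duration must be positive")
    folds = []
    test_start = t_start + initial_train_duration
    while test_start + test_duration <= t_end:
        folds.append((t_start, test_start, test_start + test_duration))
        test_start += test_duration  # expanding: the next train set absorbs this test set
    return folds


for fold in expanding_window_folds(0, 120, initial_train_duration=40, test_duration=20):
    print(fold)
# (0, 40, 60)
# (0, 60, 80)
# (0, 80, 100)
# (0, 100, 120)
```

In an expanding scheme every fold reuses all earlier data for training, so later folds train on more seizures; a sliding scheme would instead move train_start forward as well.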

sefef.labeling

This module contains functions to automatically label samples according to the desired pre-ictal duration and prediction latency.

copyright:
  © 2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

sefef.labeling.add_annotations(h5dataset, sz_onsets_ts, preictal_duration=3600, prediction_latency=600)

Add “annotations”, with shape (#samples, ) and dtype “bool”, to the HDF5 file object according to the variables “preictal_duration” and “prediction_latency”. Annotations are either 0 (inter-ictal) or 1 (pre-ictal).

Parameters:
  • h5dataset (HDF5 file) – HDF5 file object with the following datasets:
    • “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
    • “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
    • “sz_onsets” (optional): contains the Unix timestamps of the onsets of seizures, with shape (#sz_onsets, ).

  • sz_onsets_ts (array-like, shape (#sz onsets, )) – Contains the unix timestamps (in seconds) of the onsets of seizures.

  • preictal_duration (int, defaults to 3600 (60min)) – Duration of the period (in seconds) that will be labeled as preictal, i.e. that we expect to contain useful information for the forecast.

  • prediction_latency (int, defaults to 600 (10min)) – Latency (in seconds) of the preictal period with regard to seizure onset.

Returns:

None. Adds the “annotations” dataset to the h5dataset file object.
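The labeling rule can be sketched as follows, assuming a sample is pre-ictal when its start timestamp falls in the half-open window [onset - prediction_latency - preictal_duration, onset - prediction_latency) of any seizure. The helper name `label_samples` and the window convention are assumptions for illustration, not necessarily how add_annotations computes its labels.

```python
def label_samples(timestamps, sz_onsets, preictal_duration=3600,
                  prediction_latency=600):
    """Return 1 (pre-ictal) or 0 (inter-ictal) for each sample start timestamp."""
    labels = []
    for ts in timestamps:
        # Assumed pre-ictal window: [onset - latency - preictal, onset - latency)
        is_preictal = any(
            onset - prediction_latency - preictal_duration <= ts < onset - prediction_latency
            for onset in sz_onsets
        )
        labels.append(int(is_preictal))
    return labels


# With a seizure at t = 10000 s, the pre-ictal window is [5800, 9400).
print(label_samples([5000, 5800, 9000, 9400, 9999], sz_onsets=[10000]))
# [0, 1, 1, 0, 0]
```

Samples inside the latency interval (here 9400 and 9999) are labeled 0 by this rule, which is why the evaluation module offers to remove them rather than treat them as inter-ictal.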

sefef.labeling.add_sz_onsets(h5dataset, sz_onsets_ts)

Add “sz_onsets”, with shape (#seizures, ) and dtype “int64”, to HDF5 file object, corresponding to the Unix timestamps of each seizure onset.

Parameters:
  • h5dataset (HDF5 file) – HDF5 file object with the following datasets:
    • “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
    • “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
    • “annotations” (optional): contains the annotations (aka labels) of each sample.

  • sz_onsets_ts (array-like, shape (#sz onsets, )) – Contains the Unix timestamps (in seconds) of the onsets of seizures.

Returns:

None. Adds the “sz_onsets” dataset to the h5dataset file object.

sefef.postprocessing

This module contains functions to process individual predicted probabilities into a unified forecast according to the desired forecast horizon. Author: Ana Sofia Carmo

copyright:
  © 2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

class sefef.postprocessing.Forecast(pred_proba, timestamps)

Bases: object

Stores the forecasts made by the model and processes them.

pred_proba

Probability, predicted by the model, that each sample belongs to the pre-ictal class.

Type:

array-like, shape (#samples, ), dtype “float64”

timestamps

Contains the unix timestamps (in seconds) corresponding to the start-time of each sample.

Type:

array-like, shape (#samples, ), dtype “int64”

append(pred_proba, timestamps) :

Appends new predicted probabilities to the ones already in the Forecast object.

postprocess(forecast_horizon) :

Applies post-processing methodology to the predictions stored in “pred_proba”, according to “forecast_horizon” (in seconds). Returns an array with the new probabilities.

Raises:

ValueError : – Description

append(pred_proba, timestamps)

Appends new predicted probabilities to the ones already in the Forecast object.

postprocess(forecast_horizon, smooth_win, smooth_sliding=False, origin='clock-time')

Applies post-processing methodology to the predictions stored in “pred_proba”. For each period of duration “forecast_horizon”, the predicted probabilities are averaged over groups of consecutive samples (windows of duration “smooth_win”, in seconds, with or without overlap), and the maximum of these averages over the period is taken as the forecast.

Parameters:
  • forecast_horizon (int) – Forecast horizon in seconds, i.e. time in the future for which the forecasts will be issued.

  • smooth_win (int) – Duration of window, in seconds, used to smooth the predicted probabilities. If “smooth_sliding” is False, “forecast_horizon” should be a multiple of “smooth_win”, so that an integer number of windows spans each forecast period.

  • smooth_sliding (bool, defaults to False) – Whether to use a sliding-window approach during smoothing (with a step of 1 sample) or non-overlapping smoothing windows. The sliding-window approach (True) is not yet implemented.

  • origin (str, defaults to "clock-time") – Determines if the forecasts are issued at clock-time (e.g. at the start of each hour) or according to the start-time of the first sample. Options are “clock-time” and “sample-time”, respectively.

Returns:

  • result1 (array-like, shape (#forecasts, ), dtype “float64”) – Contains the predicted probabilities of seizure occurrence for the period with duration “forecast_horizon” and starting at the timestamps in “result2”.

  • result2 (array-like, shape (#forecasts, ), dtype “int64”) – Contains the Unix timestamps, in seconds, for the start of the period for which the forecasts (in “result1”) are valid.
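A simplified sketch of the smoothing-then-maximum procedure, with non-overlapping windows. The function `postprocess_sketch` is hypothetical; it assumes that “clock-time” means periods aligned to multiples of “forecast_horizon” since the Unix epoch (e.g. the start of each hour for a 3600 s horizon), that “sample-time” aligns them to the first timestamp, and that “forecast_horizon” is a multiple of “smooth_win”.

```python
from collections import defaultdict


def postprocess_sketch(pred_proba, timestamps, forecast_horizon, smooth_win,
                       origin="clock-time"):
    """Average over non-overlapping smoothing windows, then take the max per period.

    Returns (forecasts, period_starts), both ordered by time.
    """
    # Reference time for aligning windows and periods.
    t0 = 0 if origin == "clock-time" else min(timestamps)
    # 1) Mean predicted probability within each non-overlapping smoothing window.
    windows = defaultdict(list)
    for p, ts in zip(pred_proba, timestamps):
        windows[(ts - t0) // smooth_win].append(p)
    smoothed = {w: sum(ps) / len(ps) for w, ps in windows.items()}
    # 2) Maximum smoothed value within each forecast period.
    periods = defaultdict(list)
    for w, mean_p in smoothed.items():
        period_start = t0 + (w * smooth_win // forecast_horizon) * forecast_horizon
        periods[period_start].append(mean_p)
    period_starts = sorted(periods)
    return [max(periods[s]) for s in period_starts], period_starts


probs, starts = postprocess_sketch(
    [0.1, 0.3, 0.8, 0.6, 0.2, 0.2, 0.4, 0.0],
    [3600, 4500, 5400, 6300, 7200, 8100, 9000, 9900],
    forecast_horizon=3600, smooth_win=1800)
# starts == [3600, 7200]; probs is approximately [0.7, 0.2]
```

Taking the maximum of window means, rather than the overall mean, lets a short burst of high probability drive the forecast for the whole period while the smoothing step damps single-sample spikes.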

sefef.scoring

This module contains functions to compute both deterministic and probabilistic metrics according to the horizon of the forecast.

copyright:
  © 2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

class sefef.scoring.Scorer(metrics2compute, sz_onsets, forecast_horizon, reference_method='prior_prob', hist_prior_prob=None)

Bases: object

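Computes deterministic and probabilistic performance metrics for seizure forecasts issued with a given forecast horizon.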

metrics2compute

List of metrics to compute. The metrics can be either deterministic or probabilistic, and metric names should be taken from the following list:

  • Deterministic: “Sen” (i.e. sensitivity), “FPR” (i.e. false positive rate), “TiW” (i.e. time in warning), “AUC_TiW” (i.e. area under the curve of Sen vs TiW).

  • Probabilistic: “resolution”, “reliability”, “BS” (i.e. Brier score), “skill” or “BSS” (i.e. Brier skill score).

Type:

list<str>

sz_onsets

Contains the Unix timestamps, in seconds, of each seizure onset.

Type:

array-like, shape (#seizures, ), dtype “int64”

forecast_horizon

Forecast horizon in seconds, i.e. time in the future for which the forecasts are valid.

Type:

int

performance

Dictionary where the keys are the metrics’ names (as in “metrics2compute”) and the value is the corresponding performance. It is initialized as an empty dictionary and populated in “compute_metrics”.

Type:

dict

reference_method

Method to compute the reference forecasts.

Type:

str, defaults to “prior_prob”

hist_prior_prob

Prior probability, i.e. the historical likelihood (relative frequency) of seizures in the train data. Used only as the reference forecast to compute the skill measure.

Type:

float64, defaults to None
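The Brier score and the Brier skill score against a constant prior-probability reference (the “climatology” reference discussed in [Mason2004]) can be sketched as below. The helper names, and the assumption that a forecast period counts as an event when at least one seizure onset falls within [timestamp, timestamp + forecast_horizon), are illustrative and not taken from SeFEF.

```python
def observed_events(timestamps, sz_onsets, forecast_horizon):
    """1 if at least one seizure onset falls in [ts, ts + forecast_horizon), else 0."""
    return [int(any(ts <= onset < ts + forecast_horizon for onset in sz_onsets))
            for ts in timestamps]


def brier_score(forecasts, observed):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observed)) / len(forecasts)


def brier_skill_score(forecasts, observed, prior_prob):
    """BSS = 1 - BS / BS_ref, with a constant prior-probability reference forecast."""
    bs_ref = brier_score([prior_prob] * len(forecasts), observed)
    return 1 - brier_score(forecasts, observed) / bs_ref  # undefined if bs_ref == 0


obs = observed_events([0, 3600, 7200, 10800], sz_onsets=[1000, 11000], forecast_horizon=3600)
print(obs)  # [1, 0, 0, 1]
bs = brier_score([0.9, 0.1, 0.2, 0.8], obs)               # about 0.025
bss = brier_skill_score([0.9, 0.1, 0.2, 0.8], obs, 0.5)   # about 0.9
```

A positive BSS means the forecasts beat always issuing the prior probability; in practice the prior would come from the train data, as “hist_prior_prob” suggests.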

compute_metrics(forecasts, timestamps):

Computes metrics in “metrics2compute” for the probabilities in “forecasts” and populates the “performance” attribute. This method uses techniques described in [Mason2004] and [Stephenson2008].

reliability_diagram() :

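Plots the reliability diagram (forecasted vs. observed probability), along with the no-resolution and perfect-reliability lines.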

Raises:
  • ValueError : – Raised when a metric name in “metrics2compute” is not a valid metric or when “reference_method” is not a valid method.

  • AttributeError : – Raised when ‘reliability_diagram’ is called before ‘compute_metrics’.

References

[Mason2004]

Mason, S. J. “On Using ‘Climatology’ as a Reference Strategy in the Brier and Ranked Probability Skill Scores”, Monthly Weather Review 132, 7 (2004). Available: https://journals.ametsoc.org/view/journals/mwre/132/7/1520-0493_2004_132_1891_oucaar_2.0.co_2.xml (accessed Nov. 06, 2024).

[Stephenson2008]

Stephenson, D. B., C. A. S. Coelho, and I. T. Jolliffe. “Two Extra Components in the Brier Score Decomposition”, Weather and Forecasting 23, 4 (2008): 752-757, doi: https://doi.org/10.1175/2007WAF2006116.1

compute_metrics(forecasts, timestamps, threshold=0.5, binning_method='quantile', num_bins=10, draw_diagram=True)

Computes metrics in “metrics2compute” for the probabilities in “forecasts” and populates the “performance” attribute.

Parameters:
  • forecasts (array-like, shape (#forecasts, ), dtype "float64") – Contains the predicted probabilities of seizure occurrence for the period with duration equal to the forecast horizon and starting at the timestamps in “timestamps”.

  • timestamps (array-like, shape (#forecasts, ), dtype "int64") – Contains the Unix timestamps, in seconds, for the start of the period for which the forecasts (in “forecasts”) are valid.

  • threshold (float64, defaults to 0.5) – Probability value to apply as the high-likelihood threshold.

  • binning_method (str, defaults to "quantile") –

    Method used to define the bins used to compute probabilistic metrics. Available methods are:
    • “uniform”: bins of approximately equal width.

    • “quantile”: bins populated with an approximately equal number of forecasts.

  • num_bins (int64, defaults to 10) – Number of bins used to compute probabilistic metrics. If None, it is calculated as np.ceil(#forecasts^(1/3)); otherwise, “num_bins” is used as the number of bins.

  • draw_diagram (bool, defaults to True) – Whether to draw the reliability diagram after computing all required metrics.

Returns:

performance (dict) – Dictionary where the keys are the metrics’ names (as in “metrics2compute”) and the value is the corresponding performance.
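A minimal sketch of two deterministic metrics at a given “threshold”, under assumed definitions: a forecast period is in warning when its probability is at least the threshold, Sen is the fraction of seizure onsets falling in a warning period, and TiW is the fraction of forecast periods in warning. The function `sensitivity_and_tiw` is hypothetical, and counting onsets outside every forecast period as missed is an assumption of this example.

```python
def sensitivity_and_tiw(forecasts, timestamps, sz_onsets, forecast_horizon,
                        threshold=0.5):
    """Return (Sen, TiW) for a two-level (high/low likelihood) warning scheme."""
    warning = [f >= threshold for f in forecasts]
    detected = 0
    for onset in sz_onsets:
        # Find the forecast period containing this onset; onsets outside all periods count as missed.
        for ts, in_warning in zip(timestamps, warning):
            if ts <= onset < ts + forecast_horizon:
                detected += int(in_warning)
                break
    sen = detected / len(sz_onsets) if len(sz_onsets) > 0 else float("nan")
    tiw = sum(warning) / len(warning)
    return sen, tiw


print(sensitivity_and_tiw([0.9, 0.1, 0.2, 0.8], [0, 3600, 7200, 10800],
                          sz_onsets=[1000, 8000], forecast_horizon=3600))
# (0.5, 0.5)
```

Sweeping “threshold” from 0 to 1 and recording (TiW, Sen) pairs traces the curve whose area “AUC_TiW” refers to.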

reliability_diagram(forecasts, timestamps, binning_method, num_bins)

Plots the reliability diagram (forecasted_proba vs observed_proba), along with the no-resolution and perfect-reliability lines.
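The two binning methods and the points of a reliability diagram can be sketched in plain Python. `make_bins` and `reliability_points` are hypothetical helpers; taking quantile edges from the sorted forecasts is one of several reasonable conventions and may differ from SeFEF's. Plotting each (mean forecast, observed frequency) pair against the diagonal gives the perfect-reliability line, and a horizontal line at the overall event frequency gives the no-resolution line.

```python
import bisect
import math


def make_bins(forecasts, binning_method="quantile", num_bins=None):
    """Return bin edges over [0, 1] for the "uniform" or "quantile" method."""
    n = len(forecasts)
    if num_bins is None:
        num_bins = math.ceil(n ** (1 / 3))  # cube-root rule: ceil(n^(1/3))
    if binning_method == "uniform":
        # Equal-width bins.
        return [i / num_bins for i in range(num_bins + 1)]
    if binning_method == "quantile":
        # Inner edges at evenly spaced ranks of the sorted forecasts,
        # so each bin holds roughly the same number of forecasts.
        s = sorted(forecasts)
        inner = [s[min(n - 1, (i * n) // num_bins)] for i in range(1, num_bins)]
        return [0.0] + inner + [1.0]
    raise ValueError(f"Unknown binning_method: {binning_method!r}")


def reliability_points(forecasts, observed, edges):
    """Return (mean forecast, observed frequency) for each non-empty bin."""
    bins = {}
    for f, o in zip(forecasts, observed):
        # Bin index from the edges; clamp so that f == 1.0 falls in the last bin.
        i = min(bisect.bisect_right(edges, f) - 1, len(edges) - 2)
        bins.setdefault(i, []).append((f, o))
    points = []
    for i in sorted(bins):
        fs = [f for f, _ in bins[i]]
        obs = [o for _, o in bins[i]]
        points.append((sum(fs) / len(fs), sum(obs) / len(obs)))
    return points


f = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
o = [0, 0, 1, 0, 1, 1, 0, 1]
print(make_bins(f, "uniform"))       # [0.0, 0.5, 1.0]
print(make_bins(f, "quantile", 2))   # [0.0, 0.6, 1.0]
points = reliability_points(f, o, [0.0, 0.5, 1.0])  # approximately [(0.25, 0.25), (0.75, 0.75)]
```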

sefef.visualization

This is a helper module for visualization.

copyright:
  © 2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

sefef.visualization.aggregate_plots(figs, folder_path=None, filename=None, show=True)

Receives go.Figure objects created using “plot_forecasts” and aggregates them into a single Figure.

Parameters:

figs (go.Figure) – Figures to aggregate into a single plot.

sefef.visualization.hex_to_rgba(h, alpha)

Converts a color value in hex format to rgba format with alpha transparency.
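A minimal sketch of such a conversion, assuming a six-digit “#RRGGBB” input and a CSS-style “rgba(r, g, b, a)” output string (a format Plotly accepts for colors). The name `hex_to_rgba_sketch` is hypothetical, and the exact output format of the SeFEF function may differ.

```python
def hex_to_rgba_sketch(h, alpha):
    """Convert '#RRGGBB' to a CSS-style 'rgba(r, g, b, alpha)' string."""
    h = h.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"Expected 6 hex digits, got {h!r}")
    # Parse each two-digit channel as a base-16 integer.
    r, g, b = (int(h[i:i + 2], 16) for i in range(0, 6, 2))
    return f"rgba({r}, {g}, {b}, {alpha})"


print(hex_to_rgba_sketch("#1f77b4", 0.3))  # rgba(31, 119, 180, 0.3)
```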

sefef.visualization.html_modelcard_formating(contents)

Courtesy of ChatGPT

sefef.visualization.plot_forecasts(forecasts, ts, sz_onsets, high_likelihood_thr, forecast_horizon, title='Seizure probability', folder_path=None, filename=None, show=True, return_plot=False, n_points=100)

Provide visualization of forecasts.

Parameters:
  • forecasts (array-like, shape (#forecasts, ), dtype "float64") – Contains the predicted probabilities of seizure occurrence for the period with duration “forecast_horizon” and starting at the timestamps in “ts”.

  • ts (array-like, shape (#forecasts, ), dtype "int64") – Contains the Unix timestamps, in seconds, for the start of the period for which the forecasts (in “forecasts”) are valid.

  • sz_onsets (array-like, shape (#sz onsets, )) – Contains the Unix timestamps (in seconds) of the onsets of seizures.

  • high_likelihood_thr (float64) – Value between 0 and 1 corresponding to the threshold of high-likelihood.