API Reference

This part of the documentation details the complete SeFEF API.

Modules

sefef.evaluation 

This module contains functions to implement time-series cross validation (TSCV).

copyright:

2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

class sefef.evaluation.Dataset(timestamps, samples_duration, sz_onsets)[source]

Bases: object

Create a Dataset with metadata on the data that will be used for training and testing.

timestamps

The Unix timestamp (in seconds) of the start of each sample.

Type:: array-like, shape (#samples,)

samples_duration

Duration of samples in seconds.

Type:: array-like, shape (#samples,)

sz_onsets

Contains the Unix-time timestamps (in seconds) corresponding to the onsets of seizures.

Type:: np.array

sampling_frequency

Frequency at which the data is stored in each file.

Type:: int
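As a minimal sketch of the inputs described above, the snippet below assembles the three constructor arguments for a recording made of contiguous, fixed-length samples. The helper name build_metadata and all numeric values are hypothetical and not part of SeFEF; only the argument names come from the signature above.

```python
def build_metadata(start_ts, n_samples, sample_duration):
    """Return (timestamps, samples_duration) for contiguous, fixed-length samples.

    Hypothetical helper for illustration only, not part of SeFEF.
    """
    timestamps = [start_ts + i * sample_duration for i in range(n_samples)]
    samples_duration = [sample_duration] * n_samples
    return timestamps, samples_duration


# Hypothetical recording: 6 samples of 60 s, starting at an arbitrary Unix time.
timestamps, samples_duration = build_metadata(1_700_000_000, 6, 60)
sz_onsets = [1_700_000_250]  # one seizure onset, Unix time in seconds

# These three sequences are the kind of input expected by
# Dataset(timestamps, samples_duration, sz_onsets).
```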

class sefef.evaluation.TimeSeriesCV(preictal_duration, prediction_latency, n_min_events_train=3, n_min_events_test=1, post_sz_interval=3600, pre_lead_sz_interval=14400, initial_train_duration=None, test_duration=None)[source]

Bases: object

Implements time series cross validation (TSCV).

preictal_duration

Duration of the period (in seconds) that will be labeled as preictal, i.e. that we expect to contain useful information for the forecast.

Type:: int, defaults to 3600 (60min)

prediction_latency

Latency (in seconds) of the preictal period with regards to seizure onset.

Type:: int, defaults to 600 (10min)

n_min_events_train

Minimum number of lead seizures to include in the train set. Should guarantee at least one lead seizure is left for testing.

Type:: int, defaults to 3

n_min_events_test

Minimum number of lead seizures to include in the test set. Should guarantee at least one lead seizure is left for testing.

Type:: int, defaults to 1

post_sz_interval

Time interval (in seconds) after a lead seizure that should be included in the same set as the corresponding seizure. This time will be removed from the train set, along with the seizure onset and prediction_latency.

Type:: int

pre_lead_sz_interval

Time interval (in seconds) free of seizures by which a seizure should be preceded to be considered a lead seizure.

Type:: int

initial_train_duration

Set duration of train for initial split (in seconds).

Type:: int, defaults to 1/3 of total recorded duration

test_duration

Set duration of test (in seconds).

Type:: int, defaults to 1/2 of ‘initial_train_duration’

method

Method for TSCV, either ‘expanding’ or ‘sliding’. Only ‘expanding’ is currently implemented.

Type:: str

n_folds

Number of folds for the TSCV, determined according to the attributes set by the user and available data.

Type:: int

split_ind_ts

Contains split timestamp indices (train_start_ts, test_start_ts, test_end_ts) for each fold. Initialized as None and populated by the ‘split’ method.

Type:: array-like, shape (n_folds, 3)

split(dataset, iteratively) :: Get timestamp indices to split data for time series cross-validation.
- The train set can be obtained by metadata.loc[train_start_ts : test_start_ts].
- The test set can be obtained by metadata.loc[test_start_ts : test_end_ts].

plot(dataset) :: Plots the TSCV folds with the available data.

iterate() :: Iterates over the TSCV folds and at each iteration returns a train set and a test set.

Raises:

ValueError : – Raised whenever TSCV cannot be performed under the attributes set by the user and the available data.
AttributeError : – Raised when ‘plot’ is called before ‘split’.
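The notion of a lead seizure, as defined by “pre_lead_sz_interval” above, can be sketched as follows. This is an illustration under stated assumptions rather than the SeFEF implementation: the gap is measured between consecutive onsets, a gap equal to the interval is accepted, and the first seizure is treated as a lead seizure because nothing is known about what preceded it. The function name find_lead_seizures is hypothetical.

```python
def find_lead_seizures(sz_onsets, pre_lead_sz_interval=14400):
    """Return the onsets preceded by at least `pre_lead_sz_interval` seizure-free seconds.

    Hypothetical helper; the boundary handling (>=) and the treatment of the
    first seizure are assumptions, not taken from SeFEF.
    """
    lead = []
    previous = None
    for onset in sorted(sz_onsets):
        if previous is None or onset - previous >= pre_lead_sz_interval:
            lead.append(onset)
        previous = onset  # clustered seizures also reset the seizure-free gap
    return lead


onsets = [0, 3600, 20000, 21000, 40000]
lead_onsets = find_lead_seizures(onsets)  # [0, 20000, 40000]
```

Counting lead seizures this way is what makes “n_min_events_train” and “n_min_events_test” meaningful: clustered seizures that follow a lead seizure do not count as separate events.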

get_TSCV_fold(h5dataset, ifold, remove_non_preictal_interictal_samples=True)[source]

Returns a train set and a test set from corresponding TSCV fold.

Parameters:

h5dataset (HDF5 file) – HDF5 file object with the following datasets:
- “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
- “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
- “annotations”: contains the labels (0: interictal, 1: preictal) for each sample in the “data” dataset, with shape (#samples, ).
- “sz_onsets”: contains the Unix timestamps of the onsets of seizures, with shape (#sz_onsets, ).
ifold (int) – Index corresponding to TSCV fold.
remove_non_preictal_interictal_samples (bool) – Whether to remove samples that are neither preictal nor interictal, i.e. samples containing the onsets of seizures, as well as the intervals corresponding to “prediction_latency” and “lead_sz_post_interval”.

Returns:

tuple –

((train_data, train_annotations, train_timestamps, train_sz_onsets), (test_data, test_annotations, test_timestamps, test_sz_onsets))
Where:
- ”[]_data”: A slice of “h5dataset[“data”]”, with shape (#samples, embedding shape), e.g. (#samples, #features) or (#samples, sample duration, #channels), and dtype “float32”.
- ”[]_annotations”: A slice of “h5dataset[“annotations”]”, with shape (#samples, ) and dtype “bool”.
- ”[]_timestamps”: A slice of “h5dataset[“timestamps”]”, with shape (#samples, ) and dtype “int64”.
- ”[]_sz_onsets”: A slice of “h5dataset[“sz_onsets”]”, with shape (#sz onsets, ) and dtype “int64”.
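The effect of “remove_non_preictal_interictal_samples” can be pictured with the sketch below, which builds a boolean mask marking the samples that would be kept. It is a simplified illustration, not the SeFEF code: it assumes the excluded span runs from (onset - prediction_latency) to (onset + post-seizure interval), inclusive, and it only looks at sample start times. The function name non_excluded_mask is hypothetical.

```python
def non_excluded_mask(timestamps, sz_onsets, prediction_latency, post_sz_interval):
    """Return True for samples that are outside every excluded span.

    Hypothetical helper. Assumes the excluded span for each seizure is
    [onset - prediction_latency, onset + post_sz_interval] and that a sample
    is judged by its start timestamp only.
    """
    mask = []
    for ts in timestamps:
        excluded = any(
            onset - prediction_latency <= ts <= onset + post_sz_interval
            for onset in sz_onsets
        )
        mask.append(not excluded)
    return mask


# One seizure at t=10000 s, 600 s latency, 3600 s post-seizure interval:
# the excluded span is [9400, 13600].
sample_starts = [9000, 9400, 10000, 13600, 14000]
keep = non_excluded_mask(sample_starts, [10000], 600, 3600)
```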

iterate(h5dataset, remove_non_preictal_interictal_samples=True)[source]

Iterates over the TSCV folds and at each iteration returns a train set and a test set.

Parameters:

h5dataset (HDF5 file) – HDF5 file object with the following datasets:
- “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
- “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
- “annotations”: contains the labels (0: interictal, 1: preictal) for each sample in the “data” dataset, with shape (#samples, ).
- “sz_onsets”: contains the Unix timestamps of the onsets of seizures, with shape (#sz_onsets, ).
remove_non_preictal_interictal_samples (bool) – Whether to remove samples that are neither preictal nor interictal, i.e. samples containing the onsets of seizures, as well as the intervals corresponding to “prediction_latency” and “lead_sz_post_interval”.

Returns:

tuple –

((train_data, train_annotations, train_timestamps), (test_data, test_sz_onsets, test_timestamps))
Where:
- ”[]_data”: A slice of “h5dataset[“data”]”, with shape (#samples, embedding shape), e.g. (#samples, #features) or (#samples, sample duration, #channels), and dtype “float32”.
- ”[]_annotations”: A slice of “h5dataset[“annotations”]”, with shape (#samples, ) and dtype “bool”.
- ”[]_sz_onsets”: A slice of “h5dataset[“sz_onsets”]”, with shape (#sz onsets, ) and dtype “int64”.
- ”[]_timestamps”: A slice of “h5dataset[“timestamps”]”, with shape (#samples, ) and dtype “int64”.

plot(dataset, folder_path=None, filename=None, mode='lines')[source]

Plots the TSCV folds with the available data.

Parameters:

dataset (Dataset) – Instance of Dataset.
mode (str) – Trace scatter mode (“lines” or “markers”). For sparse data, “markers” is the more suitable option, despite being heavier to plot.

split(dataset, iteratively=False, plot=False, extend_final_test_set=False)[source]

Get timestamp indices to split data for time series cross-validation.
- The train set would be given by metadata.loc[train_start_ts : test_start_ts].
- The test set would be given by metadata.loc[test_start_ts : test_end_ts].

Parameters:

dataset (Dataset) – Instance of Dataset.
iteratively (bool, defaults to False) – Whether the split should return the timestamp indices for each fold iteratively (True) or simply update ‘split_ind_ts’ (False).
plot (bool, defaults to False) – Whether a diagram illustrating the TSCV should be shown at the end. Requires ‘iteratively’ to be False.
extend_final_test_set (bool) – Whether to extend the test set in the final fold to include all remaining data or to keep the test duration approximately the same across folds.

Returns:

train_start_ts (int) – Timestamp index for the start of the train set.
test_start_ts (int) – Timestamp index for the start of the test set (and end of train set).
test_end_ts (int) – Timestamp index for the end of the test set.
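The ‘expanding’ scheme behind these three values can be sketched in a few lines. This is a deliberately simplified illustration that only uses durations: it ignores the minimum number of lead seizures per set and the post-seizure interval, both of which the real method takes into account, and it works on raw timestamps rather than on a Dataset. The function name expanding_window_folds is hypothetical.

```python
def expanding_window_folds(start_ts, end_ts, initial_train_duration, test_duration):
    """Return (train_start_ts, test_start_ts, test_end_ts) tuples, one per fold.

    Hypothetical, duration-only sketch of an expanding-window split: the train
    set always starts at `start_ts` and grows by one test period per fold.
    """
    folds = []
    test_start = start_ts + initial_train_duration
    while test_start + test_duration <= end_ts:
        folds.append((start_ts, test_start, test_start + test_duration))
        test_start += test_duration  # previous test period joins the train set
    return folds


folds = expanding_window_folds(start_ts=0, end_ts=100,
                               initial_train_duration=40, test_duration=20)
# folds == [(0, 40, 60), (0, 60, 80), (0, 80, 100)]
```

Because the test period always lies after the train period, no fold is ever trained on data recorded later than the data it is tested on.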

sefef.labeling 

This module contains functions to automatically label samples according to the desired pre-ictal duration and prediction latency.

copyright:

2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

sefef.labeling.add_annotations(h5dataset, sz_onsets_ts, preictal_duration=3600, prediction_latency=600)[source]

Add “annotations”, with shape (#samples, ) and dtype “bool”, to HDF5 file object according to the variables “preictal_duration” and “prediction_latency”. Annotations are either 0 (inter-ictal), or 1 (pre-ictal).

Parameters:

h5dataset (HDF5 file) – HDF5 file object with the following datasets:
- “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
- “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
- “sz_onsets”: contains the Unix timestamps of the onsets of seizures, with shape (#sz_onsets, ). (optional)
sz_onsets_ts (array-like, shape (#sz onsets, )) – Contains the Unix timestamps (in seconds) of the onsets of seizures.
preictal_duration (int, defaults to 3600 (60min)) – Duration of the period (in seconds) that will be labeled as preictal, i.e. that we expect to contain useful information for the forecast.
prediction_latency (int, defaults to 600 (10min)) – Latency (in seconds) of the preictal period with regards to seizure onset.

Returns:

None, but adds a dataset instance to the h5dataset file object.
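The labeling rule implied by “preictal_duration” and “prediction_latency” can be sketched as below. It is an illustration under assumptions, not the library code: a sample is judged by its start timestamp, and the preictal window for a seizure is taken as the half-open interval [onset - prediction_latency - preictal_duration, onset - prediction_latency). The function name label_samples is hypothetical, and plain lists are used instead of an HDF5 file.

```python
def label_samples(timestamps, sz_onsets_ts, preictal_duration=3600, prediction_latency=600):
    """Return one bool per sample: True for pre-ictal, False for inter-ictal.

    Hypothetical helper. The half-open window and the use of sample start
    times are assumptions made for this sketch.
    """
    annotations = []
    for ts in timestamps:
        preictal = any(
            onset - prediction_latency - preictal_duration <= ts < onset - prediction_latency
            for onset in sz_onsets_ts
        )
        annotations.append(preictal)
    return annotations


# One seizure at t=10000 s: the preictal window is [5800, 9400).
sample_starts = [5000, 5800, 9000, 9400, 9900]
labels = label_samples(sample_starts, [10000])
```

With the default values, the hour that ends ten minutes before each onset is labeled pre-ictal; the final ten minutes before the onset are not.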

sefef.labeling.add_sz_onsets(h5dataset, sz_onsets_ts)[source]

Add “sz_onsets”, with shape (#seizures, ) and dtype “int64”, to HDF5 file object, corresponding to the Unix timestamps of each seizure onset.

Parameters:

h5dataset (HDF5 file) – HDF5 file object with the following datasets:
- “data”: each entry corresponds to a sample with shape (embedding shape), e.g. (#features, ) or (sample duration, #channels).
- “timestamps”: contains the start timestamp (Unix, in seconds) of each sample in the “data” dataset, with shape (#samples, ).
- “annotations”: contains the annotations (aka labels) of each sample. (optional)
sz_onsets_ts (array-like, shape (#sz onsets, )) – Contains the Unix timestamps (in seconds) of the onsets of seizures.

Returns:

None, but adds a dataset instance to the h5dataset file object.

sefef.postprocessing 

This module contains functions to process individual predicted probabilities into a unified forecast according to the desired forecast horizon. Author: Ana Sofia Carmo

copyright:

2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

class sefef.postprocessing.Forecast(pred_proba, timestamps)[source]

Bases: object

Stores the forecasts made by the model and processes them.

pred_proba

Contains the probability predicted by the model for each sample belonging to the pre-ictal class.

Type:: array-like, shape (#samples, ), dtype “float64”

timestamps

Contains the unix timestamps (in seconds) corresponding to the start-time of each sample.

Type:: array-like, shape (#samples, ), dtype “int64”

append(pred_proba, timestamps) :: Appends new predicted probabilities to the ones already in the Forecast object.

postprocess(forecast_horizon) :: Applies postprocessing methodology to the predictions stored in “pred_proba”, according to “forecast horizon” (in seconds). Returns an array with the new probabilities.

Raises:: ValueError : – Description

append(pred_proba, timestamps)[source]

postprocess(forecast_horizon, smooth_win, smooth_sliding=False, origin='clock-time')[source]

Applies post-processing methodology to the predictions stored in “pred_proba”. For each time period with duration equal to “forecast_horizon”, mean predicted probabilities are calculated for groups of consecutive samples (with a window of duration “smooth_win”, in seconds), with or without overlap, and the maximum across the full period is obtained.

Parameters:

forecast_horizon (int) – Forecast horizon in seconds, i.e. time in the future for which the forecasts will be issued.
smooth_win (int) – Duration of window, in seconds, used to smooth the predicted probabilities. If “smooth_sliding” is set to False, windows of this duration should add up to “forecast_horizon” (i.e. “forecast_horizon” should be a multiple of “smooth_win”).
smooth_sliding (bool, defaults to False) – Whether to use a sliding-window approach during smoothing (with a step of 1 sample) or non-overlapping smoothing windows. The sliding-window option (True) is not yet implemented.
origin (str, defaults to "clock-time") – Determines if the forecasts are issued at clock-time (e.g. at the start of each hour) or according to the start-time of the first sample. Options are “clock-time” and “sample-time”, respectively.

Returns:

result1 (array-like, shape (#forecasts, ), dtype “float64”) – Contains the predicted probabilities of seizure occurrence for the period with duration “forecast_horizon” and starting at the timestamps in “result2”.
result2 (array-like, shape (#forecasts, ), dtype “int64”) – Contains the Unix timestamps, in seconds, for the start of the period for which the forecasts (in “result1”) are valid.
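The smoothing-then-maximum procedure described above can be sketched as follows for the non-overlapping case. This is a simplified illustration, not the SeFEF implementation: periods are counted from the first sample (comparable to the “sample-time” origin), the returned timestamps are the starts of the periods the samples were drawn from, and the function name smooth_and_max is hypothetical.

```python
from statistics import mean


def smooth_and_max(pred_proba, timestamps, forecast_horizon, smooth_win):
    """Mean per non-overlapping smoothing window, then max per horizon-long period.

    Hypothetical sketch. Periods start at the first sample; how SeFEF maps
    periods to forecast validity timestamps is not reproduced here.
    """
    origin = timestamps[0]
    periods = {}  # period index -> {window index -> [probabilities]}
    for proba, ts in zip(pred_proba, timestamps):
        elapsed = ts - origin
        period = elapsed // forecast_horizon
        window = (elapsed % forecast_horizon) // smooth_win
        periods.setdefault(period, {}).setdefault(window, []).append(proba)

    forecasts, period_starts = [], []
    for period in sorted(periods):
        window_means = [mean(values) for values in periods[period].values()]
        forecasts.append(max(window_means))
        period_starts.append(origin + period * forecast_horizon)
    return forecasts, period_starts


# Eight 10 s samples, a 40 s horizon and 20 s smoothing windows.
proba = [0.1, 0.3, 0.6, 0.8, 0.2, 0.2, 0.4, 0.0]
starts = [0, 10, 20, 30, 40, 50, 60, 70]
forecasts, period_starts = smooth_and_max(proba, starts, forecast_horizon=40, smooth_win=20)
```

In the first period the window means are about 0.2 and 0.7, so the forecast is about 0.7; in the second period both window means are about 0.2.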

sefef.scoring 

This module contains functions to compute both deterministic and probabilistic metrics according to the horizon of the forecast.

copyright:

2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

class sefef.scoring.Scorer(metrics2compute, sz_onsets, forecast_horizon, reference_method='prior_prob', hist_prior_prob=None)[source]

Bases: object

Class description

metrics2compute

List of metrics to compute. The metrics can be either deterministic or probabilistic, and metric names should be taken from the following list:
- Deterministic: “Sen” (i.e. sensitivity), “FPR” (i.e. false positive rate), “TiW” (i.e. time in warning), “AUC_TiW” (i.e. area under the curve of Sen vs TiW).
- Probabilistic: “resolution”, “reliability”, “BS” (i.e. Brier score), “skill” or “BSS” (i.e. Brier skill score).

Type:: list<str>

sz_onsets

Contains the Unix timestamps, in seconds, of each seizure onset.

Type:: array-like, shape (#seizures, ), dtype “int64”

forecast_horizon

Forecast horizon in seconds, i.e. time in the future for which the forecasts are valid.

Type:: int

performance

Dictionary where the keys are the metrics’ names (as in “metrics2compute”) and the value is the corresponding performance. It is initialized as an empty dictionary and populated in “compute_metrics”.

Type:: dict

reference_method

Method to compute the reference forecasts.

Type:: str, defaults to “prior_prob”

hist_prior_prob

Prior probability, aka historical likelihood (relative frequency) of seizures in train data. Used only as the reference forecast when computing the skill measure.

Type:: float64, defaults to None

compute_metrics(forecasts, timestamps):: Computes metrics in “metrics2compute” for the probabilities in “forecasts” and populates the “performance” attribute. This method uses techniques described in [Mason2004] and [Stephenson2008].

reliability_diagram() :: Description

Raises:

ValueError : – Raised when a metric name in “metrics2compute” is not a valid metric or when “reference_method” is not a valid method.
AttributeError : – Raised when ‘reliability_diagram’ is called before ‘compute_metrics’.
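Two of the deterministic metrics named above, sensitivity and time in warning, can be sketched as follows. This is an illustration under assumptions and not the Scorer code: a forecast raises a warning when it is greater than or equal to the threshold, a seizure counts as predicted when its onset falls inside a warned period of duration “forecast_horizon”, and time in warning is the fraction of periods with a warning. The function name sensitivity_and_tiw is hypothetical.

```python
def sensitivity_and_tiw(forecasts, timestamps, sz_onsets, forecast_horizon, threshold=0.5):
    """Return (sensitivity, time in warning) for thresholded forecasts.

    Hypothetical sketch; the >= comparison and the half-open period
    [ts, ts + forecast_horizon) are assumptions.
    """
    warnings = [f >= threshold for f in forecasts]
    predicted = 0
    for onset in sz_onsets:
        if any(w and ts <= onset < ts + forecast_horizon
               for w, ts in zip(warnings, timestamps)):
            predicted += 1
    sen = predicted / len(sz_onsets)
    tiw = sum(warnings) / len(warnings)
    return sen, tiw


# Four hourly forecasts; warnings are raised for the 2nd and 4th hour.
sen, tiw = sensitivity_and_tiw(
    forecasts=[0.2, 0.7, 0.1, 0.9],
    timestamps=[0, 3600, 7200, 10800],
    sz_onsets=[4000, 8000],
    forecast_horizon=3600,
)
# One of two seizures falls in a warned hour, and half of the hours are warned.
```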

References

[Mason2004]

S. J. Mason, “On Using ‘Climatology’ as a Reference Strategy in the Brier and Ranked Probability Skill Scores,” Jul. 2004, Accessed: Nov. 06, 2024. [Online]. Available: https://journals.ametsoc.org/view/journals/mwre/132/7/1520-0493_2004_132_1891_oucaar_2.0.co_2.xml

[Stephenson2008]

Stephenson, D. B., C. A. S. Coelho, and I. T. Jolliffe. “Two Extra Components in the Brier Score Decomposition”, Weather and Forecasting 23, 4 (2008): 752-757, doi: https://doi.org/10.1175/2007WAF2006116.1

compute_metrics(forecasts, timestamps, threshold=0.5, binning_method='quantile', num_bins=10, draw_diagram=True)[source]

Computes metrics in “metrics2compute” for the probabilities in “forecasts” and populates the “performance” attribute.

Parameters:

forecasts (array-like, shape (#forecasts, ), dtype "float64") – Contains the predicted probabilities of seizure occurrence for the period with duration equal to the forecast horizon and starting at the timestamps in “timestamps”.
timestamps (array-like, shape (#forecasts, ), dtype "int64") – Contains the Unix timestamps, in seconds, for the start of the period for which the forecasts (in “forecasts”) are valid.
threshold (float64, defaults to 0.5) – Probability value to apply as the high-likelihood threshold.
binning_method (str, defaults to "quantile") –
Method used to set the bins used to compute probabilistic metrics. Available methods are:
- ”uniform”: number of bins corresponds to np.ceil(#forecasts^(1/3)), set at approximately equal distances.
- ”quantile”: number of bins corresponds to np.ceil(#forecasts^(1/3)), which are populated with an approximately equal number of forecasts.
num_bins (int64, defaults to 10) – Number of bins used to compute probabilistic metrics. If None, it is calculated as np.ceil(#forecasts^(1/3)), otherwise “num_bins” is used as the number of bins.
draw_diagram (bool, defaults to True) – Whether to draw the reliability diagram after computing all required metrics.

Returns:

performance (dict) – Dictionary where the keys are the metrics’ names (as in “metrics2compute”) and the value is the corresponding performance.
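For the probabilistic metrics, the standard definitions of the Brier score and of the Brier skill score against a constant prior-probability reference can be sketched as below. These are the textbook formulas, shown for orientation; the function names are hypothetical and the sketch does not reproduce how Scorer derives the observed outcomes from “sz_onsets”.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_skill_score(forecasts, outcomes, prior_prob):
    """1 - BS / BS_ref, where the reference always forecasts `prior_prob`."""
    bs = brier_score(forecasts, outcomes)
    bs_ref = brier_score([prior_prob] * len(outcomes), outcomes)
    return 1 - bs / bs_ref


# Hypothetical forecasts and outcomes (1: a seizure occurred in the period).
example_forecasts = [0.9, 0.1, 0.8, 0.2]
example_outcomes = [1, 0, 1, 0]
bs = brier_score(example_forecasts, example_outcomes)
bss = brier_skill_score(example_forecasts, example_outcomes, prior_prob=0.5)
```

A skill score above 0 means the forecasts beat the constant reference, and a score of 1 corresponds to perfect forecasts.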

reliability_diagram(forecasts, timestamps, binning_method, num_bins)[source]

Plots the reliability diagram (forecasted_proba vs observed_proba), along with the no-resolution and perfect-reliability lines.
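The points of a reliability diagram, and the difference between the “uniform” and “quantile” binning methods, can be sketched as follows. This is an illustration under assumptions, not the SeFEF code: uniform bins are taken as equal-width intervals over [0, 1], quantile bins hold an approximately equal number of forecasts, and all function names are hypothetical.

```python
import math


def default_num_bins(n_forecasts):
    """Number of bins as the ceiling of the cube root of the number of forecasts."""
    return math.ceil(n_forecasts ** (1 / 3))


def uniform_bins(forecasts, num_bins):
    """Group forecast indices into equal-width probability bins over [0, 1]."""
    bins = [[] for _ in range(num_bins)]
    for i, f in enumerate(forecasts):
        bins[min(int(f * num_bins), num_bins - 1)].append(i)
    return bins


def quantile_bins(forecasts, num_bins):
    """Group forecast indices into bins holding roughly equal numbers of forecasts."""
    order = sorted(range(len(forecasts)), key=lambda i: forecasts[i])
    bins = [[] for _ in range(num_bins)]
    for rank, i in enumerate(order):
        bins[rank * num_bins // len(forecasts)].append(i)
    return bins


def reliability_points(forecasts, outcomes, bins):
    """Return (mean forecast, observed frequency) for each non-empty bin."""
    points = []
    for idx in bins:
        if idx:
            mean_forecast = sum(forecasts[i] for i in idx) / len(idx)
            observed = sum(outcomes[i] for i in idx) / len(idx)
            points.append((mean_forecast, observed))
    return points


example_forecasts = [0.05, 0.1, 0.15, 0.2, 0.9, 0.95]
example_outcomes = [0, 0, 0, 1, 1, 1]
u_bins = uniform_bins(example_forecasts, 2)   # bin sizes 4 and 2
q_bins = quantile_bins(example_forecasts, 2)  # bin sizes 3 and 3
points = reliability_points(example_forecasts, example_outcomes, q_bins)
```

Points lying on the diagonal (mean forecast equal to observed frequency) indicate perfectly reliable forecasts. Quantile binning avoids nearly empty bins when forecasts cluster at low probabilities, which is common for rare events such as seizures.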

sefef.visualization 

This is a helper module for visualization.

copyright:

2024 by Ana Sofia Carmo

license:

BSD 3-clause License, see LICENSE for more details.

sefef.visualization.aggregate_plots(figs, folder_path=None, filename=None, show=True)[source]

Receives go.Figure objects created using “plot_forecasts” and aggregates them into a single Figure.

Parameters:: figs (go.Figure) – Figures to aggregate into a single plot.

sefef.visualization.hex_to_rgba(h, alpha)[source]

Converts a color value in hex format to rgba format with alpha transparency.
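A conversion of this kind can be sketched as below. The exact output format of the SeFEF function is not documented here, so the “rgba(r, g, b, a)” string (a CSS-style color string) is an assumption, and the function is given a different, hypothetical name to make that clear.

```python
def hex_to_rgba_sketch(h, alpha):
    """Convert '#rrggbb' (or 'rrggbb') to an 'rgba(r, g, b, alpha)' string.

    Hypothetical stand-in; the output format is assumed, not taken from SeFEF.
    """
    h = h.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


orange_half = hex_to_rgba_sketch("#ff8000", 0.5)  # 'rgba(255, 128, 0, 0.5)'
```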

sefef.visualization.html_modelcard_formating(contents)[source]: Courtesy of ChatGPT

sefef.visualization.plot_forecasts(forecasts, ts, sz_onsets, high_likelihood_thr, forecast_horizon, title='Seizure probability', folder_path=None, filename=None, show=True, return_plot=False, n_points=100)[source]

Provide visualization of forecasts.

Parameters:

forecasts (array-like, shape (#forecasts, ), dtype "float64") – Contains the predicted probabilities of seizure occurrence for the period with duration “forecast_horizon” and starting at the timestamps in “ts”.
ts (array-like, shape (#forecasts, ), dtype "int64") – Contains the Unix timestamps, in seconds, for the start of the period for which the forecasts (in “forecasts”) are valid.
sz_onsets (array-like, shape (#sz onsets, )) – Contains the Unix timestamps (in seconds) of the onsets of seizures.
high_likelihood_thr (float64) – Value between 0 and 1 corresponding to the threshold of high-likelihood.
