msqms.libs.pyprep.ransac module
RANSAC bad channel identification.
Summary
Functions:
find_bad_by_ransac – Detect channels that are not predicted well by other channels.
Reference
- msqms.libs.pyprep.ransac.find_bad_by_ransac(data, sample_rate, complete_chn_labs, chn_pos, exclude, n_samples=50, sample_prop=0.25, corr_thresh=0.75, frac_bad=0.4, corr_window_secs=5.0, channel_wise=False, max_chunk_size=None, random_state=None, matlab_strict=False)
Detect channels that are not predicted well by other channels.
Here, a RANSAC approach (see [1], and a short discussion in [2]) is adopted to predict a “clean EEG” dataset. After clean EEG channels have been identified by the other methods, the clean EEG dataset is constructed by repeatedly sampling a small subset of clean EEG channels and interpolating the complete data from that subset. The median of all those repetitions forms the clean EEG dataset. In a second step, the original and the RANSAC-predicted data are correlated, and channels that do not correlate well with themselves across the two datasets are considered bad_by_ransac.
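The sample-interpolate-median step described above can be sketched in plain Python. This is an illustration only, not the library's implementation: `ransac_predict` is a hypothetical helper, inverse-distance weighting stands in for the interpolation the library actually performs, and leaving a channel out of its own predictor set is an assumption of the sketch.

```python
import math
import random
import statistics


def ransac_predict(data, positions, good_idx, n_samples=50, sample_prop=0.25, seed=0):
    """Predict every channel from random subsets of good channels (sketch).

    data: list of channels, each a list of samples of equal length.
    positions: list of (x, y, z) tuples, one per channel.
    good_idx: list of indices of channels allowed to act as predictors.
    """
    if len(good_idx) < 2:
        raise ValueError("need at least two predictor channels")
    rng = random.Random(seed)
    n_chans = len(data)
    n_times = len(data[0])
    # At least two picks, so a channel always has a predictor other than itself.
    n_pick = min(max(2, round(sample_prop * n_chans)), len(good_idx))

    predictions = []  # one full predicted dataset per RANSAC sample
    for _ in range(n_samples):
        picks = rng.sample(good_idx, n_pick)
        predicted = []
        for ch in range(n_chans):
            sources = [p for p in picks if p != ch]
            # Inverse-distance weights (illustrative stand-in for the real interpolation).
            weights = [1.0 / (math.dist(positions[ch], positions[p]) + 1e-9) for p in sources]
            total = sum(weights)
            predicted.append([
                sum(w * data[p][t] for w, p in zip(weights, sources)) / total
                for t in range(n_times)
            ])
        predictions.append(predicted)

    # The median across all samples forms the predicted "clean" dataset.
    return [
        [statistics.median(pred[ch][t] for pred in predictions) for t in range(n_times)]
        for ch in range(n_chans)
    ]
```

Taking the median rather than the mean keeps a few poor random samples from dominating the prediction.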
- Parameters:
data (np.ndarray) – A 2-D array of detrended EEG data, with bad-by-flat and bad-by-NaN channels removed.
sample_rate (float) – The sample rate (in Hz) of the EEG data.
complete_chn_labs (array_like) – Labels for all channels in data, in the same order as they appear in data.
chn_pos (np.ndarray) – 3-D electrode coordinates for all channels in data, in the same order as they appear in data.
exclude (list) – Labels of channels to exclude as signal predictors during RANSAC (i.e., channels already flagged as bad by metrics other than HF noise).
n_samples (int, optional) – Number of random channel samples to use for RANSAC. Defaults to 50.
sample_prop (float, optional) – Proportion of total channels to use for signal prediction per RANSAC sample. This needs to be in the range [0, 1], where 0 would mean no channels would be used and 1 would mean all channels would be used (neither of which would be useful values). Defaults to 0.25 (e.g., 16 channels per sample for a 64-channel dataset).
corr_thresh (float, optional) – The minimum predicted vs. actual signal correlation for a channel to be considered good within a given RANSAC window. Defaults to 0.75.
frac_bad (float, optional) – The minimum fraction of bad (i.e., below-threshold) RANSAC windows for a channel to be considered bad-by-RANSAC. Defaults to 0.4.
corr_window_secs (float, optional) – The duration (in seconds) of each RANSAC correlation window. Defaults to 5 seconds.
channel_wise (bool, optional) – Whether RANSAC should predict signals for chunks of channels over the entire signal length (“channel-wise RANSAC”, see max_chunk_size parameter). If False, RANSAC will instead predict signals for all channels at once but over a number of smaller time windows instead of over the entire signal length (“window-wise RANSAC”). Channel-wise RANSAC generally has higher RAM demands than window-wise RANSAC (especially if max_chunk_size is None), but can be faster on systems with lots of RAM to spare. Defaults to False.
max_chunk_size ({int, None}, optional) – The maximum number of channels to predict at once during channel-wise RANSAC. If None, RANSAC will use the largest chunk size that will fit into the available RAM, which may slow down other programs on the host system. If using window-wise RANSAC (the default), this parameter has no effect. Defaults to None.
random_state ({int, None, np.random.RandomState}, optional) – The random seed with which to generate random samples of channels during RANSAC. If random_state is an int, it will be used as a seed for RandomState. If None, the seed will be obtained from the operating system (see RandomState for details). Defaults to None.
matlab_strict (bool, optional) – Whether or not RANSAC should strictly follow MATLAB PREP’s internal math, ignoring any improvements made in PyPREP over the original code (see matlab-diffs for more details). Defaults to False.
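To get a feel for how sample_prop and corr_window_secs scale with a recording, the following sketch computes the derived sizes. `ransac_sizes` is a hypothetical helper, and both the rounding and the choice to count only full windows are assumptions; the library's own handling of the trailing window may differ.

```python
def ransac_sizes(n_channels, n_timepoints, sample_rate,
                 sample_prop=0.25, corr_window_secs=5.0):
    """Return (channels per RANSAC sample, samples per window, full windows)."""
    # Number of predictor channels drawn per RANSAC sample (rounding is assumed).
    chans_per_sample = round(sample_prop * n_channels)
    # Length of one correlation window, in samples.
    samples_per_window = int(corr_window_secs * sample_rate)
    # Only windows that fit entirely in the recording are counted here.
    n_windows = n_timepoints // samples_per_window
    return chans_per_sample, samples_per_window, n_windows


# 64 channels, 60 s recorded at 250 Hz
print(ransac_sizes(64, 15000, 250.0))  # (16, 1250, 12)
```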
- Returns:
bad_by_ransac (list) – List containing the labels of all channels flagged as bad by RANSAC.
channel_correlations (np.ndarray) – Array of shape (windows, channels) containing the correlations of the channels with their predicted RANSAC values for each window.
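The relationship between the two return values can be illustrated with a small sketch that applies corr_thresh and frac_bad to a (windows, channels) table of correlations. `flag_bad_channels` is a hypothetical helper using nested lists instead of an array, and the strict "greater than" comparison against frac_bad is an assumption rather than something stated above.

```python
def flag_bad_channels(channel_correlations, labels, corr_thresh=0.75, frac_bad=0.4):
    """Return labels whose fraction of below-threshold windows exceeds frac_bad."""
    n_windows = len(channel_correlations)
    bad = []
    for ch, label in enumerate(labels):
        # Count windows in which this channel correlates poorly with its prediction.
        n_below = sum(1 for window in channel_correlations if window[ch] < corr_thresh)
        if n_below / n_windows > frac_bad:
            bad.append(label)
    return bad


# Rows are windows, columns are channels (Cz, Fz, Pz).
correlations = [
    [0.9, 0.9, 0.1],
    [0.9, 0.5, 0.2],
    [0.9, 0.9, 0.9],
    [0.9, 0.9, 0.3],
    [0.9, 0.9, 0.9],
]
print(flag_bad_channels(correlations, ["Cz", "Fz", "Pz"]))  # ['Pz']
```

Here Fz falls below the threshold in one of five windows (0.2) and stays good, while Pz falls below it in three of five (0.6) and is flagged.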
References