abcgan package

Submodules

abcgan.constants module

File for global constants used in the program.

abcgan.anomaly module

abcgan.anomaly.anomaly_score_bv(bvs, gen_bvs, method: str = 'joint', alpha: float = 2.0)

Returns unbounded anomaly scores for background profiles given a set of generated samples. Low scores represent anomalous events.

Parameters:
  • bvs (np.ndarray) – (n_samples x n_alt x n_bv) real background profiles

  • gen_bvs (np.ndarray) – (n_samples x n_repeat x n_alt x n_feat) set of n_repeat generated background profiles for each input sample

  • method (str, optional) – ‘joint’ estimates a single anomaly score for each altitude bin using the joint distribution; ‘marginal’ estimates anomaly scores at each altitude for each bv feature using marginal distributions

  • alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)

Returns:

anomalies – anomaly scores with shape (n_samples x n_alt) if method is ‘joint’, or (n_samples x n_alt x n_feat) if method is ‘marginal’

Return type:

np.ndarray
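
Because low scores mark anomalous events, a typical follow-up step is to threshold the returned array. The sketch below does not call abcgan: it uses a random stand-in for the (n_samples x n_alt) score array, and the 5th-percentile cut-off is an arbitrary assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the (n_samples x n_alt) array returned with method='joint'.
n_samples, n_alt = 200, 30
scores = rng.normal(size=(n_samples, n_alt))

# Hypothetical cut-off: treat the lowest 5% of all scores as anomalous.
threshold = np.percentile(scores, 5)
is_anomalous = scores < threshold

# Fraction of altitude bins flagged in each sample.
flagged_fraction = is_anomalous.mean(axis=1)
```

In practice the cut-off would more likely be calibrated on scores from data known to be nominal rather than on the batch being tested.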

abcgan.anomaly.anomaly_score_hfp(hfps, gen_hfps, method: str = 'joint', alpha: float = 2.0)

Returns unbounded anomaly scores for HFP waves given a set of generated samples. Low scores represent anomalous events.

Parameters:
  • hfps (np.ndarray) – (n_samples x n_waves x n_bv) real HFPs

  • gen_hfps (np.ndarray) – (n_samples x n_repeat x n_waves x n_feat) set of n_repeat generated hfp waves for each input sample

  • method (str, optional) – ‘joint’ estimates a single anomaly score for each wave using the joint distribution; ‘marginal’ estimates anomaly scores on each hfp feature for each wave using marginal distributions

  • alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)

Returns:

anomalies – anomaly scores with shape (n_samples x n_waves) if method is ‘joint’, or (n_samples x n_waves x n_feat) if method is ‘marginal’

Return type:

np.ndarray

abcgan.anomaly.anomaly_score_wtec(wtecs, gen_wtecs: ndarray, method: str = 'joint', alpha: float = 2.0, dataset_name='LSTIDs_Poker')

Returns unbounded anomaly scores for TEC wave parameters given a set of generated TEC waves. Low scores represent anomalous events.

Parameters:
  • wtecs (np.ndarray) – (n_samples x n_wtec) real TEC wave parameters

  • gen_wtecs (np.ndarray) – (n_samples x n_repeat x n_wtec) set of n_repeat generated tec waves for each input sample

  • method (str, optional) – ‘marginal’ estimates anomaly scores on each tec feature using marginal distributions; ‘joint’ estimates the anomaly score of each tec wave using the joint distribution

  • alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)

  • dataset_name (str) – specify dataset type for z-scaling

Returns:

anomalies – anomaly scores with shape (n_samples) if method is ‘joint’, or (n_samples x n_feat) if method is ‘marginal’

Return type:

np.ndarray

abcgan.anomaly.joint_anomaly_estimation(sampled_feats, feats, alpha=1.0)

Computes anomaly scores from the set of generated features using a logsumexp computation based on the joint distribution.

Parameters:
  • sampled_feats (np.ndarray) – sampled z-scaled features for each input feature

  • feats (np.ndarray) – input feature broadcast to match sampled_feat dim

  • alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)

Returns:

anomalies – joint anomaly scores (feats.shape[:-1])

Return type:

np.ndarray
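
The reference above does not spell out the kernel or how alpha maps to sigma, so the following is only a sketch of one way a joint, logsumexp-based score can be computed. It assumes an isotropic Gaussian kernel with a fixed bandwidth `sigma`, drops the normalizing constant, and takes `feats` before broadcasting. The helper names `logsumexp` and `joint_score_sketch` are hypothetical and not part of abcgan.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True)) + a_max
    return np.squeeze(out, axis=axis)

def joint_score_sketch(sampled_feats, feats, sigma=1.0):
    """Log kernel density of each real feature vector under its generated set.

    sampled_feats: (..., n_repeat, n_feat) generated, z-scaled features
    feats:         (..., n_feat) real, z-scaled features
    returns:       (...) one score per feature vector
    """
    diff = sampled_feats - feats[..., None, :]
    # Squared distance over the whole feature vector (joint distribution).
    sq_dist = np.sum(diff ** 2, axis=-1)
    log_kernels = -0.5 * sq_dist / sigma ** 2
    n_repeat = sampled_feats.shape[-2]
    # Log of the mean kernel value over the n_repeat generated samples.
    return logsumexp(log_kernels, axis=-1) - np.log(n_repeat)

rng = np.random.default_rng(0)
sampled = rng.normal(size=(4, 50, 3))      # 4 inputs, 50 generated samples each
typical = np.zeros((4, 3))                 # points near the generated cloud
outlier = np.full((4, 3), 6.0)             # points far from the generated cloud

typical_scores = joint_score_sketch(sampled, typical)
outlier_scores = joint_score_sketch(sampled, outlier)
```

Points far from the generated samples receive much lower scores, which matches the convention that low scores represent anomalous events.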

abcgan.anomaly.marginal_anomaly_estimation(sampled_feats, feats, alpha=2.0)

Computes anomaly scores from the set of generated features using a logsumexp computation based on the marginal distributions.

Parameters:
  • sampled_feats (np.ndarray) – sampled z-scaled features for each input feature

  • feats (np.ndarray) – input feature broadcast to match sampled_feat dim

  • alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)

Returns:

anomalies – marginal anomaly scores (feats.shape)

Return type:

np.ndarray
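
A marginal score differs from a joint score in that each feature is scored on its own, so the output keeps the feature axis. The sketch below assumes a one-dimensional Gaussian kernel per feature with a fixed bandwidth `sigma`. The kernel, the bandwidth rule, and the name `marginal_score_sketch` are illustrative assumptions, not the package's implementation.

```python
import numpy as np

def marginal_score_sketch(sampled_feats, feats, sigma=2.0):
    """Per-feature log kernel density of real values under generated values.

    sampled_feats: (..., n_repeat, n_feat) generated, z-scaled features
    feats:         (..., n_feat) real, z-scaled features
    returns:       (..., n_feat) one score per feature
    """
    diff = sampled_feats - feats[..., None, :]
    log_kernels = -0.5 * diff ** 2 / sigma ** 2
    # Stable log-mean-exp over the n_repeat axis only; features stay separate.
    k_max = log_kernels.max(axis=-2, keepdims=True)
    mean_kernel = np.mean(np.exp(log_kernels - k_max), axis=-2)
    return np.log(mean_kernel) + k_max.squeeze(-2)

rng = np.random.default_rng(0)
sampled = rng.normal(size=(4, 50, 3))
feats = np.zeros((4, 3))
feats[:, 1] = 8.0                          # only the middle feature is unusual

scores = marginal_score_sketch(sampled, feats)
```

Only the middle feature column receives low scores here, which is the kind of per-feature attribution the marginal method is meant to provide.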

abcgan.attention module

abcgan.attention.collect_bv_attn_map(drivers, bvs, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], model='bv_gan', bv_type='radar')

function to collect attention map from weights in pre-trained model

Parameters:
  • bvs (np.ndarray) – n_samples x n_alt x n_feat (not z-scaled)

  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns:

samples – n_alt x n_alt attention map found in transformer altitude.

Return type:

np.ndarray

abcgan.attention.collect_hfp_attn_map(drivers, bvs, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], model='hfp_gan', bv_type='radar')

function to collect attention map from the HFP GAN’s decoder

Parameters:
  • bvs (np.ndarray) – n_samples x n_alt x n_feat (not z-scaled)

  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • model (str, optional) – name of model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns:

samples – n_waves x n_alt map found in transformer altitude.

Return type:

np.ndarray

abcgan.attention.get_decoder_attn(transformer, tgt, memory, key_padding_mask)

abcgan.attention.get_encoder_attn(layers, src, src_mask, src_key_mask)

Collects attention masks from each encoder layer in the transformer

Parameters:
  • layers (list) – list of pytorch encoder layers from transformer

  • src (torch.tensor) – embedded source tensor

  • src_mask (torch.tensor) – constant source key mask from transformer

  • src_key_mask (torch.tensor) – mask for missing source data

Returns:

samples – n_layers x n_samples x n_alt x n_alt output of attention mask or weights for each encoder layer in the transformer

Return type:

torch.tensor

abcgan.interface module

Code for top level interface.

This code is added to the main package level in __init__.py

abcgan.interface.average_wtec(wtec: ndarray, avg_coefficients: List[float] = [0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05], z_scale_input: bool = False, dataset_name: None | str = 'LSTIDs_Poker')

loads and returns external drivers, tec wave parameters, and unix timestamp all aligned in time with outlier/invalid data filtered out

Parameters:
  • wtec (np.ndarray) – tec wave parameter data

  • avg_coefficients (list (n_wtec_feat,)) – z-scaled averaging coefficients to smooth out the original tec wave parameter distributions.

  • z_scale_input (bool) – set if the input wtec data is already z-scaled

  • dataset_name (str) – specify dataset type for z-scaling

Returns:

wtec – (n_samples x n_wtec) tec wave parameter samples.

Return type:

np.ndarray

abcgan.interface.discriminate(drivers, bvs, hfps=None, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], bv_model='bv_gan', hfp_model='hfp_gan', bv_type='radar')

Score how well the measurements match with historical observations.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • bvs (np.ndarray) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than max_alt.

  • hfps (np.ndarray, optional) – n_samples x n_wave x n_hfps input list of wave measurements.

  • bv_model (str, optional) – name of bv model to use

  • hfp_model (str, optional) – name of hfp model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns:

scores – 1) n_samples x n_alt bv normalcy scores in the range [0, 1.0]. 2) n_samples hfp wave normalcy scores in the range [0, 1.0].

Return type:

(np.ndarray, np.ndarray)

abcgan.interface.estimate_drivers(drivers, model='dr_gan')

Predict drivers 2 hours into the future using the driver GAN model. Used for real-time background predictions using drivers from 2 hours ago.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • model (str, optional) – name of model to use

Returns:

predicted_drivers – estimate of the driver features two hours after the input drivers

Return type:

np.ndarray

abcgan.interface.generate_bvs(drivers: ndarray, bv_measurements: None | ndarray = None, driver_names: List[str] = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], mean_replace_drs: None | List[str] = None, n_alt: int = 30, bv_model: str = 'bv_gan', model_dir: None | str = None, bv_type: str = 'radar', return_z_scale: bool = False, cuda_index: None | int = None, verbose: int = 1)

Generate background variable profiles consistent with the historical distribution.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • mean_replace_drs (list) – list of drivers that will be set to their averages, i.e. a z-scaled value of zero

  • bv_measurements (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on.

  • n_alt (int, optional) – number of altitude measurements to draw, defaults to max_alt

  • return_z_scale (bool, optional) – set to have the function return z scaled feature data

  • bv_model (str, optional) – name of bv GAN to use

  • model_dir (str, optional) – directory to load model from

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

  • cuda_index (int, optional) – GPU index to use when generating BVs and HFPs

  • verbose (bool, optional) – set to show loading bar

Returns:

samples – 1) n_samples x n_alt x n_bvs output measurements at each requested altitude. 2) n_samples x n_hfps generated hfp waves 3) n_sample probabilities that the generated wave is present

Return type:

(np.ndarray, np.ndarray, np.ndarray)
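
The mean_replace_drs option is described as setting a driver to its average, i.e. a z-scaled value of zero. The sketch below illustrates that idea on a toy array with a shortened, three-entry driver list. It computes the scaling statistics from the toy data itself, whereas the package presumably uses stored historical statistics (an assumption), and it does not call abcgan.

```python
import numpy as np

driver_names = ['Ap', 'F10.7', 'dst']      # shortened list for illustration
mean_replace_drs = ['dst']                 # drivers to neutralize

rng = np.random.default_rng(0)
drivers = rng.normal(loc=[10.0, 120.0, -20.0],
                     scale=[3.0, 15.0, 8.0],
                     size=(100, 3))

# Z-scale each driver column (toy statistics, not the package's own).
mean = drivers.mean(axis=0)
std = drivers.std(axis=0)
z_drivers = (drivers - mean) / std

# Setting a z-scaled column to zero is the same as replacing it by its mean.
cols = [driver_names.index(name) for name in mean_replace_drs]
z_drivers[:, cols] = 0.0

# Undo the scaling to see the effect in physical units.
replaced = z_drivers * std + mean
```

After the round trip the `dst` column is constant at its mean while the other columns are unchanged, so any variation in generated output can no longer be attributed to that driver.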

abcgan.interface.generate_hfps(drivers: ndarray, bv_measurements: ndarray, driver_names: List[str] = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], mean_replace_drs: None | List[str] = None, hfp_model: str = 'hfp_gan', model_dir: None | str = None, return_z_scale: bool = False, cuda_index: None | int = None, verbose: int = 1)

Generate background variable profiles and HFP waves consistent with the historical distribution.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • mean_replace_drs (list) – list of drivers that will be set to their averages, i.e. a z-scaled value of zero

  • bv_measurements (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on.

  • return_z_scale (bool, optional) – set to have the function return z scaled feature data

  • hfp_model (str, optional) – name of hfp GAN to use

  • model_dir (str, optional) – directory to load model from

  • cuda_index (int, optional) – GPU index to use when generating BVs and HFPs

  • verbose (bool, optional) – set to show loading bar

Returns:

samples – 1) n_samples x n_hfps generated hfp waves 2) n_samples probabilities that the generated wave is present

Return type:

(np.ndarray, np.ndarray)

abcgan.interface.generate_multi_bv(drivers: ndarray, bvs: None | ndarray = None, n_repeat: int = 10, n_alt: int = 30, bv_model: str = 'bv_gan', bv_type: str = 'radar', cuda_index: None | int = None, verbose: int = 1)

Generate multiple background variable profiles consistent with the historical distribution for each driver sample. Used for anomaly detection.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • bvs (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on. Usually left as default (None)

  • n_repeat (int, optional) – number of bv profiles to generate for each driver sample

  • n_alt (int, optional) – number of altitude measurements to draw, defaults to max_alt

  • bv_model (str, optional) – name of bv GAN to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

  • cuda_index (int, optional) – GPU index to use when generating BVs and HFPs

  • verbose (bool, optional) – set to show loading bar

Returns:

G_bvs – output measurements at each requested altitude.

Return type:

(n_samples x n_repeat x n_alt x n_bvs)
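
How the n_repeat axis is produced internally is not documented here. One plausible approach, sketched below with NumPy only, is to repeat each driver row n_repeat times, run the generator once on the flattened batch, and fold the repeat axis back out. The random array stands in for generator output, and the sizes (29 drivers, 12 background variables) are taken from defaults shown elsewhere in this reference.

```python
import numpy as np

n_samples, n_drivers, n_repeat = 5, 29, 10
drivers = np.arange(n_samples * n_drivers, dtype=float).reshape(n_samples, n_drivers)

# Repeat each driver row n_repeat times: (n_samples * n_repeat, n_drivers).
tiled = np.repeat(drivers, n_repeat, axis=0)

# Stand-in for a generator call: one (n_alt x n_bvs) profile per tiled row.
n_alt, n_bvs = 30, 12
rng = np.random.default_rng(0)
flat_profiles = rng.normal(size=(tiled.shape[0], n_alt, n_bvs))

# Fold the repeat axis back out to the documented output layout.
G_bvs = flat_profiles.reshape(n_samples, n_repeat, n_alt, n_bvs)
```

With this layout, `G_bvs[i, j]` is the j-th generated profile for driver sample i, which is the shape the anomaly scoring functions expect for their generated inputs.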

abcgan.interface.generate_multi_hfp(drivers: ndarray, bvs: ndarray, n_repeat: int = 10, hfp_model: str = 'hfp_gan', cuda_index: None | int = None, verbose: int = 1)

Generate HFP waves consistent with the historical distribution for each driver sample. Creates multiple samples for a single input driver/bv profile to be used in anomaly detection.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • bvs (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt but greater than zero. These represent fixed measurements that the HFP GAN will use as conditioning.

  • n_repeat (int, optional) – number of hfp waves to generate for each driver sample

  • hfp_model (str, optional) – name of hfp gan to use

  • cuda_index (int, optional) – GPU index to use when generating HFPs

  • verbose (bool, optional) – set to show loading bar

Returns:

samples – 1) G_hfps (n_samples x n_repeat x 1 x n_hfps) generated hfp waves 2) G_b (n_samples x n_repeat) probabilities that the generated wave is present

Return type:

(np.ndarray, np.ndarray)

abcgan.interface.generate_multi_wtec(drivers: ndarray, n_repeat: int = 10, wtec_model: str = 'wtec_gan_LSTIDs_Poker', model_dir: None | str = None, dataset_name='LSTIDs_Poker', cuda_index: None | int = None, verbose: int = 0)

Generate multiple background variable profiles and HFP waves consistent with the historical distribution for each driver sample

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • n_repeat (int, optional) – number of waves to generate for each driver sample

  • wtec_model (str, optional) – name of WTEC GAN to use

  • model_dir (str, optional) – directory to load model from

  • dataset_name (str) – specify dataset type for z-scaling

  • cuda_index (int, optional) – GPU index

  • verbose (bool, optional) – set to show loading bar

Returns:

samples – (n_samples x n_repeat x n_wtec) output wtec measurements.

Return type:

np.ndarray

abcgan.interface.generate_wtec(drivers, driver_names: list = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], mean_replace_drs: None | List[str] = None, wtec_model: str = 'wtec_gan_LSTIDs_Poker', dataset_name: None | str = None, model_dir: None | str = None, return_z_scale: bool = False, cuda_index: None | int = None, verbose: int = 1)

Generate background variable profiles and HFP waves consistent with the historical distribution.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • mean_replace_drs (list) – list of drivers that will be set to their averages, i.e. a z-scaled value of zero

  • return_z_scale (bool, optional) – set to have the function return z scaled feature data

  • wtec_model (str, optional) – name of WTEC GAN to use

  • dataset_name (str) – specify dataset type for z-scaling

  • model_dir (str, optional) – directory to load model from

  • cuda_index (int, optional) – GPU index to use when generating BVs and HFPs

  • verbose (bool, optional) – set to show loading bar

Returns:

samples – n_samples x n_wtec output measurements

Return type:

np.ndarray

abcgan.interface.hellinger_scores_bv(real: ndarray, fake: ndarray, mask: None | ndarray = None, bins: None | int = None, filter_length: None | int = None, return_hist_info: bool = False, z_scale: bool = True, z_scale_input: bool = False, bv_type: str = 'radar')

Returns the Hellinger distance score that measures the similarity between real and generated background variable profiles.

Parameters:
  • real (np.ndarray) – tensor of real values for a particular alt and bv feat

  • fake (np.ndarray) – tensor of generated values for a particular alt and bv feat

  • bins (int) – number of bins to use in histogram calculations (If None # of bins will be calculated based on number of samples)

  • filter_length (int) – averaging filter length to smooth out noise in histograms (If None filter length will be calculated based on number of samples)

  • return_hist_info (bool) – set to have the function return the histograms and bin edges that were used to calculate the Hellinger distance metric.

  • z_scale (bool) – use z-scaled values when calculating the Hellinger distance (recommended)

  • z_scale_input (bool) – Set if you are inputting bvs that are already z-scaled

  • bv_type – type of data (radar or lidar)

Returns:

  • dist – the hellinger distance (n_alts x n_feats)

  • hist_info – additional histogram data that was used to calculate the Hellinger distance. Info includes the real and fake histograms and their shared bin edges.
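
For orientation, the Hellinger distance between two normalized histograms p and q over shared bins is sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2)), which lies between 0 (identical) and 1 (no overlap). The sketch below computes it for two one-dimensional samples with shared bin edges and an optional moving-average filter. The function name `hellinger_sketch`, the edge selection, and the filter details are assumptions for illustration and may differ from what abcgan does.

```python
import numpy as np

def hellinger_sketch(real, fake, bins=50, filter_length=None):
    """Hellinger distance between two 1-D samples using shared bin edges."""
    lo = min(real.min(), fake.min())
    hi = max(real.max(), fake.max())
    edges = np.linspace(lo, hi, bins + 1)

    r_hist = np.histogram(real, bins=edges)[0].astype(float)
    f_hist = np.histogram(fake, bins=edges)[0].astype(float)

    if filter_length:
        # Moving-average filter to smooth out sampling noise in the counts.
        kernel = np.ones(filter_length) / filter_length
        r_hist = np.convolve(r_hist, kernel, mode='same')
        f_hist = np.convolve(f_hist, kernel, mode='same')

    # Normalize the (smoothed) counts into probability mass functions.
    p = r_hist / r_hist.sum()
    q = f_hist / f_hist.sum()
    dist = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
    return dist, (p, q, edges)

rng = np.random.default_rng(0)
real = rng.normal(size=5000)
d_same, _ = hellinger_sketch(real, rng.normal(size=5000), filter_length=3)
d_shift, hist_info = hellinger_sketch(real, rng.normal(loc=3.0, size=5000),
                                      filter_length=3)
```

Two samples from the same distribution give a distance near zero, while shifting one sample by three standard deviations gives a distance much closer to one.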

abcgan.interface.hellinger_scores_hfp(real: ndarray, fake: ndarray, r_mask: None | ndarray = None, f_mask: None | ndarray = None, n_bins: None | tuple | int = None, filter_length: None | int = None, return_hist_info: bool = False, z_scale: bool = True, z_scale_input: bool = False)

Returns the hellinger distance score that measures the similarity between real and generated background variable profiles.

Parameters:
  • real – tensor of real values for a particular alt and bv feat

  • fake – tensor of generated values for a particular alt and bv feat

  • n_bins – number of bins to use in histogram calculations

  • filter_length – averaging filter length to smooth out histograms

  • z_scale (bool) – use z-scaled values (recommended)

  • z_scale_input (bool) – Set if you are inputting hfps that are already z-scaled

  • return_hist_info (bool) – set to have function return the real hist, fake hist, and bin edges used in calculation

Returns:

  • dist – the hellinger distance (n_alts or n_waves x n_feats)

  • hist_info – additional histogram data that was used to calculate the Hellinger distance. Info includes the real and fake histograms and their shared bin edges.

abcgan.interface.hellinger_scores_wtec(real: ndarray, fake: ndarray, n_bins: None | int = None, filter_length: None | int = None, z_scale: bool = True, z_scale_inputs: bool = False, dataset_name: None | str = 'LSTIDs_Poker', return_hist_info: bool = False)

Returns the hellinger distance score that measures the similarity between real and generated tec wave.

Parameters:
  • real – array of real tec waves

  • fake – array of fake/generated tec waves

  • n_bins – number of bins to use during hellinger score calculation

  • filter_length – averaging filter length to smooth out histograms

  • z_scale (bool) – use z-scaled values (recommended)

  • z_scale_inputs (bool) – set if you are inputting tec waves that are already z-scaled

  • dataset_name (str) – specify dataset type for z-scaling

  • return_hist_info (bool) – set to have function return the real hist, fake hist, and bin edges used in calculation

Returns:

  • dist – the hellinger distance (1 x n_feats)

  • hist_info – additional histogram data that was used to calculate the Hellinger distance. Info includes the real and fake histograms and their shared bin edges.

abcgan.interface.load_h5_data(fname, bv_type: str = 'radar', load_hfp: bool = False, n_samples: None | int = None, random_start: bool = False, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

loads and returns external drivers, background variables, HFP waves and data mask all aligned in time with outlier/invalid data filtered out

Parameters:
  • fname (str) – name of h5 file to load the data from

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

  • load_hfp (bool, optional) – set to load HFP waves along with the background variables

  • n_samples (int, optional) – number of samples to load (None to load all samples)

  • random_start (bool, optional) – randomize starting index to select n_samples from

  • driver_names (list, optional) – list of driver names to load

Returns:

  • drivers (np.ndarray) – (n_samples x n_dr) external drivers.

  • bvs (np.ndarray) – (n_samples x n_alt x n_bv) background variables.

  • alt_mask (np.ndarray) – (n_samples x n_alt) background variables alt mask.

  • hfps (np.ndarray) – (n_samples x 1 x n_hfp) HFP waves.

  • wave_mask (np.ndarray) – (n_samples x 1) HFP wave mask.

  • unix_time (np.ndarray) – (n_samples, ) time stamp of each sample
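
The n_samples and random_start options suggest that a contiguous window of samples is selected from the file. The sketch below shows that selection logic on an in-memory array. The helper `select_window` is hypothetical, and whether abcgan draws a contiguous window in exactly this way is an assumption based only on the parameter descriptions.

```python
import numpy as np

def select_window(n_total, n_samples=None, random_start=False, rng=None):
    """Return a slice covering n_samples contiguous rows out of n_total."""
    if n_samples is None or n_samples >= n_total:
        return slice(0, n_total)           # load everything
    start = 0
    if random_start:
        if rng is None:
            rng = np.random.default_rng()
        # Any start index that still leaves room for n_samples rows.
        start = int(rng.integers(0, n_total - n_samples + 1))
    return slice(start, start + n_samples)

# Toy time axis: 1000 samples, one every 300 seconds.
unix_time = np.arange(1000) * 300
window = select_window(len(unix_time), n_samples=100, random_start=True,
                       rng=np.random.default_rng(0))
subset = unix_time[window]
```

Applying the same slice to every array read from the file keeps drivers, background variables, masks, and time stamps aligned.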

abcgan.interface.load_wtec_h5(fname: str, dataset_name='LSTIDs_Poker', n_samples: None | int = None, avg_coefficients: None | List[float] = None, random_start: bool = True)

loads and returns external drivers, tec wave parameters, and unix timestamp all aligned in time with outlier/invalid data filtered out

Parameters:
  • fname (str) – name of h5 file to load the data from

  • dataset_name (str) – specify dataset type for z-scaling

  • n_samples (int, optional) – number of samples to load (None to load all samples)

  • avg_coefficients (list (n_wtec_feat,)) – z-scaled averaging coefficients to smooth out the original tec wave parameter distributions.

  • random_start (bool, optional) – randomize starting index to select n_samples from

Returns:

  • drivers (np.ndarray) – (n_samples x n_wtec_dr) external drivers.

  • wtec (np.ndarray) – (n_samples x n_wtec) tec wave parameter samples.

  • unix_time (np.ndarray) – (n_samples, ) time stamp of each sample

abcgan.interface.stack_bvs(bv_dict, bv_type='radar')

Stacks background variables in appropriate format.

This function is provided for convenience.

Parameters:
  • bv_dict (dict) – Dictionary mapping names of background variables to numpy arrays with values for those bvs. Each array should have shape n_samples x n_altitudes. Can also use h5py.Group.

  • bv_type (str) – string specifying whether to stack radar or lidar data

Valid names for background variables can be found at abcgan.bv_names.

Raises:
  • ValueError: – If the input shape of the bv dict values is not correct.

  • KeyError: – If one of the required bvs is missing.

abcgan.interface.stack_drivers(driver_dict, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Stacks drivers in appropriate format.

This function is provided for convenience.

Parameters:
  • driver_dict (dict) – Dictionary mapping names of drivers to the numpy arrays with values for those drivers. Each array has a single dimension of the same length n_samples. Can also use an h5py.Group.

  • driver_names (list) – names of the drivers to load

Valid names for drivers can be found at abcgan.driver_names.

Raises:
  • ValueError: – If the driver values have the wrong type or shape.

  • KeyError: – If one of the required drivers is missing.
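
As a rough picture of what stacking involves, the sketch below turns a dictionary of one-dimensional arrays into an (n_samples x n_drivers) array in a fixed column order, raising the same exception types listed above. The function `stack_drivers_sketch` is a hypothetical stand-in written with NumPy only. It is not the package's implementation, and the three driver names are a shortened subset.

```python
import numpy as np

def stack_drivers_sketch(driver_dict, driver_names):
    """Stack 1-D driver arrays into an (n_samples x n_drivers) array."""
    columns = []
    n_samples = None
    for name in driver_names:
        values = np.asarray(driver_dict[name])   # KeyError if a driver is missing
        if values.ndim != 1:
            raise ValueError(f"driver {name!r} must be one-dimensional")
        if n_samples is None:
            n_samples = values.shape[0]
        elif values.shape[0] != n_samples:
            raise ValueError(f"driver {name!r} has a different length")
        columns.append(values)
    # Column order follows driver_names, not the dictionary's own order.
    return np.stack(columns, axis=-1)

driver_dict = {
    'dst': np.array([-5.0, -12.0, -30.0, -8.0]),
    'Ap': np.array([4.0, 7.0, 15.0, 5.0]),
    'F10.7': np.array([70.0, 72.0, 75.0, 71.0]),
}
stacked = stack_drivers_sketch(driver_dict, ['Ap', 'F10.7', 'dst'])
```

Fixing the column order by name is what allows the stacked array to be passed on together with a matching driver_names list.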

abcgan.mask module

abcgan.mask.mask_altitude(bv_feat)

Creates an altitude mask for nans in bvs.

Also replaces nans with numbers.

Parameters:

bv_feat (torch.Tensor) – background variables

Returns:

  • bv_feat (torch.Tensor) – bv_feat with nans replaced, done in place but returned for clarity

  • alt_mask (torch.Tensor) – Mask that is true for valid altitudes

Raises:

ValueError: – If valid values are not contiguous.
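
The behaviour described above (mask NaN altitudes, fill them in place, and reject profiles whose valid altitudes are not contiguous) can be sketched with NumPy as follows. The real function operates on torch tensors. The fill value of zero, the rule that an altitude is invalid if any of its features is NaN, and the name `mask_altitude_sketch` are all assumptions.

```python
import numpy as np

def mask_altitude_sketch(bv_feat):
    """bv_feat: (n_samples, n_alt, n_feat) float array, modified in place."""
    # An altitude counts as valid when none of its features is NaN.
    alt_mask = ~np.isnan(bv_feat).any(axis=-1)

    # Count runs of valid altitudes per sample via False -> True transitions.
    padded = np.pad(alt_mask, ((0, 0), (1, 0)), constant_values=False)
    n_runs = np.sum(~padded[:, :-1] & padded[:, 1:], axis=1)
    if np.any(n_runs > 1):
        raise ValueError("valid altitudes are not contiguous")

    # Replace NaNs in place (zero is an assumed fill value).
    bv_feat[np.isnan(bv_feat)] = 0.0
    return bv_feat, alt_mask

bv = np.ones((2, 5, 3))
bv[0, 3:, :] = np.nan      # top two altitudes missing in sample 0
bv[1, 0, 1] = np.nan       # lowest altitude partly missing in sample 1
filled, alt_mask = mask_altitude_sketch(bv)

gap = np.ones((1, 5, 3))
gap[0, 2, 0] = np.nan      # hole in the middle of the profile
try:
    mask_altitude_sketch(gap)
    gap_rejected = False
except ValueError:
    gap_rejected = True
```

The contiguity check matters because a single block of valid altitudes can be described by a simple padding mask, while a profile with a hole in the middle cannot.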

abcgan.mask.prev_driver_mask(unix_time)

Creates a driver mask of samples that have a previous sample and a mapping vector to the previous sample.

Parameters:

unix_time (np.array) – time stamp of driver samples

Returns:

  • prev_dr_map (np.array) – vector mapping each sample to its delayed sample

  • dr_mask (torch.Tensor) – Mask of valid driver samples that have a delayed sample
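
The delay between a sample and its "previous" sample is not stated in this entry. The sketch below assumes a two-hour delay (borrowed from the estimate_drivers description) and a 60-second matching tolerance, both hypothetical, and uses NumPy arrays for both outputs. It pairs each time stamp with the sample closest to the delayed time and masks out samples that have no such partner.

```python
import numpy as np

def prev_driver_mask_sketch(unix_time, delay_s=7200, tol_s=60):
    """Map each sample to the sample roughly delay_s seconds earlier."""
    unix_time = np.asarray(unix_time)
    order = np.argsort(unix_time)
    sorted_time = unix_time[order]
    target = unix_time - delay_s

    # Candidate at or after the target time, plus its left neighbour.
    right = np.clip(np.searchsorted(sorted_time, target), 0, len(sorted_time) - 1)
    left = np.clip(right - 1, 0, len(sorted_time) - 1)
    use_left = (np.abs(sorted_time[left] - target)
                < np.abs(sorted_time[right] - target))
    nearest = np.where(use_left, left, right)

    # A sample is valid only if its partner lies within the tolerance.
    dr_mask = np.abs(sorted_time[nearest] - target) <= tol_s
    prev_dr_map = order[nearest]
    return prev_dr_map, dr_mask

# Twelve samples, one every 30 minutes: a two-hour delay is four steps back.
unix_time = np.arange(0, 6 * 3600, 1800)
prev_dr_map, dr_mask = prev_driver_mask_sketch(unix_time)
```

Entries of `prev_dr_map` are meaningful only where `dr_mask` is true. Here the first four samples have no partner two hours earlier and are masked out.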

abcgan.mean_estimation module

class abcgan.mean_estimation.Transformer(d_dr: int = 32, d_bv: int = 12, n_alt: int = 30, d_model: int = 64, nhead: int = 1, num_encoder_layers: int = 1, dim_feedforward: int = 64, dropout: float = 0.0, activation: str = 'relu')

Bases: Module

Transformer with only the encoder

Parameters:
  • d_model – the number of expected features in the encoder/decoder inputs

  • d_stack – the number of features to stack to output

  • nhead – the number of heads in the multiheadattention models

  • num_encoder_layers – the number of sub-encoder-layers in the encoder

  • dim_feedforward – the dimension of the feedforward network model

  • dropout – the dropout value

  • activation – the activation function of encoder/decoder intermediate layer

forward(driver_src: Tensor, bv_src: Tensor, src_key_padding_mask: Tensor | None = None)

Take in and process masked source/target sequences.

Parameters:
  • driver_src – the sequence to the encoder (required).

  • bv_src – the sequence to the decoder (required).

  • src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).

Shape:
  • driver_src: (n_batch, d_dr).

  • bv_src: (n_batch, n_alt, d_bv).

  • src_key_padding_mask: (n_batch, n_alt).

generate_square_subsequent_mask(sz: int) -> Tensor

Generate a square mask for the sequence. The masked positions are filled with float(‘-inf’). Unmasked positions are filled with float(0.0).
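
The description matches the usual causal (subsequent-position) mask used with attention layers. A NumPy sketch of the same layout is shown below. It assumes the conventional form in which entries above the diagonal are masked, as in PyTorch's helper of the same name, and the name `square_subsequent_mask` is a stand-in rather than the method itself.

```python
import numpy as np

def square_subsequent_mask(sz):
    """(sz x sz) additive mask: -inf above the diagonal, 0.0 elsewhere."""
    return np.triu(np.full((sz, sz), -np.inf), k=1)

mask = square_subsequent_mask(4)

# Adding the mask to attention logits before a softmax removes all weight
# on later positions, so row i only attends to positions 0..i.
logits = np.zeros((4, 4)) + mask
weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
```

With uniform logits, row i of `weights` spreads its attention evenly over positions 0 through i and assigns exactly zero to every later position.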

abcgan.bv_model module

class abcgan.bv_model.Critic(transformer: Module, n_layers=4, img_dim=12, hidden_dim=128)

Bases: Module

Critic Class

Parameters:
  • transformer (torch.nn.Module) – transformer for the critic

  • n_layers (int) – number of layers in MLP

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(bv_features, driver_src, real, src_key_mask=None)

Function for completing a forward pass of the critic: Given an image tensor, returns a 1-dimension tensor representing a fake/real prediction.

Parameters:
  • bv_features (torch.Tensor) – a flattened image tensor with dimension (n_batch, max_alt, n_bv_feat)

  • driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)

  • real (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)

  • src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_batch, n_alt)

class abcgan.bv_model.Driver_Critic(n_layers=2, img_dim=32, hidden_dim=64)

Bases: Module

Critic Class

Parameters:
  • n_layers (int) – number of layers in MLP

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(dr_src, dr_prev)

Forward pass of the critic for driver augmentation: given an image tensor, returns a 1-dimension tensor representing a fake/real prediction.

Parameters:
  • dr_src (torch.Tensor) – tensor of driver features (n_batch, n_dr_feat)

  • dr_prev (torch.Tensor) – tensor of past driver features (n_batch, n_dr_feat)

class abcgan.bv_model.Driver_Generator(n_layers=2, latent_dim=16, img_dim=32, hidden_dim=64)

Bases: Module

Generator Class

Parameters:
  • n_layers (int) – number of MLP layers

  • latent_dim (int) – the dimension of the input latent vector

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(dr_prev, noise=None)

Forward pass of the generator for driver augmentation: given a driver sample from the past and a noise tensor, returns a generated driver sample.

Parameters:
  • dr_prev (torch.Tensor) – tensor of past driver features from data loader (n_batch, n_dr_feat)

  • noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)

class abcgan.bv_model.Generator(transformer: Module, n_layers=4, latent_dim=16, img_dim=12, hidden_dim=128)

Bases: Module

Generator Class

Parameters:
  • transformer (torch.nn.Module) – transformer for the generator

  • n_layers (int) – number of MLP layers

  • latent_dim (int) – the dimension of the input latent vector

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(driver_src, bv_src, src_key_mask=None, noise=None)

Function for completing a forward pass of the generator: Given a noise tensor, returns generated images.

Parameters:
  • driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)

  • bv_src (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)

  • src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_alt, n_batch)

  • noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)

abcgan.hfp_model module

class abcgan.hfp_model.HFP_Critic(transformer: Module, n_layers=4, img_dim=8, hidden_dim=128)

Bases: Module

Critic Class

Parameters:
  • transformer (torch.nn.Module) – transformer for the critic

  • n_layers (int) – number of layers in MLP

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(dr_src, real_bv, real_hfp, hfp_feat, src_key_mask=None, tgt_key_mask=None)

Critic forward

Parameters:
  • dr_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)

  • real_bv (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)

  • real_hfp (torch.Tensor) – tensor of real hfp features from data loader (n_batch, n_wave, n_bv_feat)

  • hfp_feat (torch.Tensor) – tensor of hfp features from data loader (n_batch, n_wave, n_bv_feat)

  • src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_batch, n_alt)

  • tgt_key_mask (torch.Tensor, optional) – mask for hfp features from data loader (n_batch, n_wave)

class abcgan.hfp_model.HFP_Generator(transformer: Module, n_layers=4, latent_dim=16, img_dim=8, hidden_dim=128)

Bases: Module

Generator Class

Parameters:
  • transformer (torch.nn.Module) – the transformer model object that estimates hfp features

  • n_layers (int) – number of MLP layers

  • latent_dim (int) – the dimension of the input latent vector

  • img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar

  • hidden_dim (int) – the inner dimension, a scalar

forward(driver_src, bv_src, hfp_tgt, src_key_mask=None, tgt_key_mask=None, noise=None)

Function for completing a forward pass of the generator: Given a noise tensor, returns generated images.

Parameters:
  • driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)

  • bv_src (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)

  • hfp_tgt (torch.Tensor) – tensor of hfp features from data loader (n_batch, n_wave, n_hfp_feat)

  • src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_alt, n_batch)

  • tgt_key_mask (torch.Tensor, optional) – mask for hfp features from data loader (n_batch, n_wave)

  • noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)

class abcgan.hfp_model.HFP_Transformer(d_dr: int = 32, d_bv: int = 12, n_alt: int = 30, n_waves: int = 1, d_hfp: int = 8, d_model: int = 256, output_b: bool = False, nhead: int = 1, num_encoder_layers: int = 1, num_decoder_layers: int = 1, dim_feedforward: int = 1024, dropout: float = 0.1, activation: str = 'relu')

Bases: Module

Transformer with only the encoder

Parameters:
  • d_model – the number of expected features in the encoder/decoder inputs

  • nhead – the number of heads in the multiheadattention models

  • num_encoder_layers – the number of sub-encoder-layers in the encoder

  • dim_feedforward – the dimension of the feedforward network model

  • dropout – the dropout value

  • activation – the activation function of encoder/decoder intermediate layer

forward(driver_src: Tensor, bv_src: Tensor, hfp_tgt: Tensor, src_key_padding_mask: Tensor | None = None, tgt_key_padding_mask: Tensor | None = None)

Take in and process masked source/target sequences.

Parameters:
  • driver_src – conditioning driver input (required).

  • bv_src – the sequence to the encoder (required).

  • hfp_tgt – the sequence to the decoder (required).

  • src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).

  • tgt_key_padding_mask – the ByteTensor mask for tgt keys per batch (optional).

Shape:
  • driver_src: (n_batch, d_dr).

  • bv_src: (n_batch, n_alt, d_bv).

  • src_key_padding_mask: (n_batch, n_alt).

generate_square_subsequent_mask(sz: int) Tensor

Generate a square mask for the sequence. The masked positions are filled with float(‘-inf’). Unmasked positions are filled with float(0.0).
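The mask layout described above can be illustrated with a small NumPy sketch. It assumes the method follows the same convention as torch.nn.Transformer.generate_square_subsequent_mask (positions above the main diagonal are masked); the helper name square_subsequent_mask is hypothetical and not part of abcgan.

```python
import numpy as np


def square_subsequent_mask(sz: int) -> np.ndarray:
    """Hypothetical NumPy stand-in: 0.0 on and below the diagonal, -inf above."""
    # np.triu keeps entries on or above the k-th diagonal and zeros the rest,
    # so k=1 leaves -inf strictly above the main diagonal.
    return np.triu(np.full((sz, sz), -np.inf), k=1)


mask = square_subsequent_mask(3)
print(mask)
```

Under this convention, row i may attend to positions 0..i only, since the -inf entries suppress attention to later positions.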

abcgan.persist module

This module supports persistence of the generator and discriminator.

It saves two files: a parameters file and a configuration file.

It also supports persisting of multiple modules.

To be persistable in this way, the module must have a property, mdl.input_args, containing a JSON-serializable dictionary of its input arguments.
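The input_args requirement can be sketched in plain Python. This is only an illustration of the idea, not the package's implementation: ToyModule, save_config, load_config, the registry dictionary, and the .json file layout are all hypothetical, and the parameters file (the trained weights) is left out.

```python
import json
import os
import tempfile


class ToyModule:
    """Hypothetical stand-in for a module exposing input_args."""

    def __init__(self, n_layers=4, hidden_dim=128):
        # Record the constructor arguments so the module can be rebuilt later.
        self.input_args = {"n_layers": n_layers, "hidden_dim": hidden_dim}


def save_config(mdl, name, dir_path):
    """Write the class name and input_args to <name>.json (config file only)."""
    conf = {"type": type(mdl).__name__, "args": mdl.input_args}
    path = os.path.join(dir_path, name + ".json")
    with open(path, "w") as f:
        json.dump(conf, f)
    return path


def load_config(name, dir_path, registry):
    """Rebuild a module from its saved configuration using a class registry."""
    with open(os.path.join(dir_path, name + ".json")) as f:
        conf = json.load(f)
    return registry[conf["type"]](**conf["args"])


with tempfile.TemporaryDirectory() as tmp:
    save_config(ToyModule(n_layers=2), "toy", tmp)
    restored = load_config("toy", tmp, {"ToyModule": ToyModule})

print(restored.input_args)
```

Keeping the constructor arguments JSON-serializable is what allows the configuration to be stored separately from the parameters and the module to be rebuilt before its weights are loaded.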

abcgan.persist.fullname(inst)
abcgan.persist.persist(generator, critic, name='gan', dir_path='/home/valentic/sandbox/atmosense/atmosense-abcgan/src/abcgan/models', train_conf=None)

Persists abcgan generator and critic modules.

Persists both input arguments and parameters.

Parameters:
  • generator (torch.nn.Module) – module for the generator

  • critic (torch.nn.Module) – module for the critic

  • name (str, optional) – name of the saved configuration

  • dir_path (str, optional) – default is the models directory. None assumes file is in local directory.

  • train_conf (dict, optional) – dictionary of training parameters used. Default is None.

The generator, critic, and any transformers passed in as arguments to these must be registered in persist.py and must have a parameter ‘input_args’ that specifies their input arguments as a dictionary.

abcgan.persist.recreate(name='gan', dir_path='/home/valentic/sandbox/atmosense/atmosense-abcgan/src/abcgan/models')

Load a pre-trained generator and discriminator.

Parameters:
  • name (str, optional) – name of the configuration to load, as saved by persist. default: ‘gan’

  • dir_path (str, optional) – default is the models directory. None assumes file is in local directory.

Returns:

  • generator (torch.nn.Module) – the loaded generator

  • critic (torch.nn.Module) – the loaded critic

Modules must have previously been saved. All modules are loaded on the CPU; they can subsequently be moved.

abcgan.transforms module

Transforms to and from z-scaled variables.

Uses numpy only (no pytorch)
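As a rough illustration of what "to and from z-scaled variables" means, the sketch below z-scales an array and inverts the scaling with NumPy. The helper names z_scale and z_unscale and the statistics used are hypothetical; the package uses its own stored statistics and may apply additional per-variable transforms that are not shown here.

```python
import numpy as np


def z_scale(x, mean, std):
    """Map raw values to zero-mean, unit-variance features."""
    return (x - mean) / std


def z_unscale(feat, mean, std):
    """Invert z_scale to recover the raw values."""
    return feat * std + mean


raw = np.array([[1.0, 10.0], [3.0, 30.0]])
mean, std = raw.mean(axis=0), raw.std(axis=0)  # illustrative statistics only
feat = z_scale(raw, mean, std)
recovered = z_unscale(feat, mean, std)
```

The scale_* and get_* functions documented below come in pairs in the same spirit: one maps measurements to features and the other inverts the featurization.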

abcgan.transforms.compute_valid(bvs, bv_thresholds=array([[-1.00000000e+00, 2.88214929e+14], [1.00000000e+00, 1.89264686e+12], [-1.00000000e+00, 5.00000000e+05], [-1.00000000e+00, 4.39506857e+09], [-1.00000000e+00, 1.00247000e+05], [-1.00000000e+00, 8.45428636e+06], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03]]))

Returns a mask which can be used to get rid of invalid background variables samples and outliers

Parameters:
  • bvs (np.ndarray) – (n_samples x n_alt x n_bv)

  • bv_thresholds (np.ndarray) – Upper and lower bounds of each bv feature.

Returns:

valid_mask – (n_samples,)

Return type:

bool np.ndarray
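A threshold-based validity mask of this kind might look like the following NumPy sketch. It assumes a sample counts as valid only if every value lies within its feature's [lower, upper] bounds; the helper name threshold_mask is hypothetical, and the package's actual handling of missing values (for example NaNs) may differ.

```python
import numpy as np


def threshold_mask(x, thresholds):
    """Hypothetical helper: True for samples whose values all lie in bounds.

    x: (n_samples, n_alt, n_feat); thresholds: (n_feat, 2) as [lower, upper].
    """
    lower, upper = thresholds[:, 0], thresholds[:, 1]
    in_bounds = (x >= lower) & (x <= upper)  # broadcasts over the last axis
    return in_bounds.all(axis=(1, 2))  # one flag per sample


x = np.array([
    [[0.5, 10.0], [0.2, 20.0]],  # all values within bounds
    [[0.5, 10.0], [2.0, 20.0]],  # first feature exceeds its upper bound
])
thresholds = np.array([[0.0, 1.0], [0.0, 100.0]])
valid = threshold_mask(x, thresholds)
print(valid)
```

The resulting boolean array can be used to index the sample axis, for example x[valid], to drop the flagged samples.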

abcgan.transforms.compute_valid_hfp(hfps, hfp_thresholds=array([[0., 250.], [0., 8000.], [-8000., 0.], [0., 4000.], [-4000., 0.], [-180., 180.], [175., 275.], [180., 325.]]))

Returns a mask which can be used to get rid of invalid hfp waves and outliers

Parameters:
  • hfps (np.ndarray) – (n_samples x n_waves x n_hfp_feat)

  • hfp_thresholds (np.ndarray) – Upper and lower bounds of each hfp feature.

Returns:

valid_mask – (n_samples,)

Return type:

bool np.ndarray

abcgan.transforms.compute_valid_wtec(wtec: ndarray, wtec_thresholds: ndarray = array([[-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf]]))

Returns a mask which can be used to get rid of invalid tec waves and outliers

Parameters:
  • wtec (np.ndarray) – (n_samples x n_waves x n_wtec)

  • wtec_thresholds (np.ndarray) – Upper and lower bounds of each wtec variable.

Returns:

valid_mask – (n_samples,)

Return type:

bool np.ndarray

abcgan.transforms.decode(data, driver_names)

Encode variables, or just add extra dimension

Parameters:
  • data (np.ndarray) – array of feature values.

  • driver_names (list: str) – list of driver names in data

Returns:

enc – array of encoded variables

Return type:

np.ndarray

abcgan.transforms.encode(data, name)

Encode variables, or just add extra dimension

Parameters:
  • data (np.ndarray) – array of variable values.

  • name (str) – name of the variable.

Returns:

enc – array of encoded variables (with an extra dimension in all cases)

Return type:

np.ndarray

abcgan.transforms.encoded_driver_names(dr_names: list = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Gets the list of driver feature names, in order.

Parameters:

dr_names (list) – array of driver names.

Returns:

driver_feat_names – list of driver feature names

Return type:

list

abcgan.transforms.get_bv(bv_feat, bv_type='radar')

Invert featurization to recover bvs.

Parameters:
  • bv_feat (np.ndarray) – n_samples x n_bv_feat

  • bv_type (str) – radar or lidar bvs

Returns:

scaled_feat – n_samples x n_bv

Return type:

np.ndarray

abcgan.transforms.get_driver(driver_feat, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Invert featurization to recover driving parameters.

Parameters:
  • driver_feat (np.ndarray) – n_samples x n_driver_feat

  • driver_names (list: str) – list of driver names in driver_feat

Returns:

original driver – n_samples x n_driver

Return type:

np.ndarray

abcgan.transforms.get_hfp(hfp_feat)

Invert featurization to recover hfp.

Parameters:

hfp_feat (np.ndarray) – n_samples x n_hfp_feat

Returns:

hfps – n_samples x n_hfp

Return type:

np.ndarray

abcgan.transforms.get_wtec(wtec_feat: ndarray, dataset_name: None | str = 'LSTIDs_Poker')

Invert featurization to recover tec waves.

Parameters:
  • wtec_feat (np.ndarray) – n_samples x n_wtec_feat

  • dataset_name (str) – specify dataset type for z-scaling

Returns:

wtec – n_samples x n_wtec

Return type:

np.ndarray

abcgan.transforms.scale_bv(bvs, bv_type='radar')

Return a scaled version of the bvs.

Parameters:
  • bvs (np.ndarray) – n_samples x n_bv

  • bv_type (str) – string specifying whether to scale radar or lidar bvs

Returns:

bv_feat – n_samples x n_bv_feat

Return type:

np.ndarray

abcgan.transforms.scale_driver(drivers: ndarray, driver_names: list = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Return a scaled version of the drivers.

Parameters:
  • drivers (np.ndarray) – n_samples x n_driver

  • driver_names (list: str) – list of driver names

Returns:

driver_feat – n_samples x n_driver_feat

Return type:

np.ndarray

abcgan.transforms.scale_hfp(hfps)

Return a scaled version of the hfps.

Parameters:

hfps (np.ndarray) – n_samples x n_waves x n_hfp

Returns:

hfp_feat – n_samples x n_waves x n_hfp_feat

Return type:

np.ndarray

abcgan.transforms.scale_wtec(wtec: ndarray, dataset_name: None | str = 'LSTIDs_Poker')

Return a scaled version of the tec waves.

Parameters:
  • wtec (np.ndarray) – n_samples x n_wtec_waves x n_wtec

  • dataset_name (str) – specify dataset type for z-scaling

Returns:

  • wtec_feat (np.ndarray) – n_samples x n_wtec_waves x n_wtec_feat

  • valid_mask (np.ndarray) – n_samples x 1

Module contents

abcgan.discriminate(drivers, bvs, hfps=None, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], bv_model='bv_gan', hfp_model='hfp_gan', bv_type='radar')

Score how well the measurements match with historical observations.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • driver_names (list) – list of names of driving parameters

  • bvs (np.ndarray) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than max_alt.

  • hfps (np.ndarray, optional) – n_samples x n_wave x n_hfps input list of wave measurements.

  • bv_model (str, optional) – name of bv model to use

  • hfp_model (str, optional) – name of hfp model to use

  • bv_type (str, optional) – name of the type of background variables to use (lidar or radar)

Returns:

scores – 1) n_samples x n_alt bv normalcy scores in the range [0, 1.0]. 2) n_samples hfp wave normalcy scores in the range [0, 1.0].

Return type:

(np.ndarray, np.ndarray)

abcgan.estimate_drivers(drivers, model='dr_gan')

Predict drivers 2 hours into the future using the driver GAN model. Used for real-time background predictions using drivers from 2 hours ago.

Parameters:
  • drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).

  • model (str, optional) – name of model to use

Returns:

predicted_drivers – estimate of the driver features two hours after the input drivers

Return type:

np.ndarray

abcgan.stack_bvs(bv_dict, bv_type='radar')

Stacks bvs in appropriate format.

This function is provided for convenience.

Parameters:
  • bv_dict (dict) – Dictionary mapping names of background variables to numpy arrays with values for those bvs. Each array should have shape n_samples x n_altitudes. Can also use h5py.Group.

  • bv_type (str) – string specifying whether to stack radar or lidar data

Valid names for bvs can be found at abcgan.bv_names.

Raises:
  • ValueError – If the input shape of the bv dict values is not correct.

  • KeyError – If one of the required bvs is missing.

abcgan.stack_drivers(driver_dict, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])

Stacks drivers in appropriate format.

This function is provided for convenience.

Parameters:
  • driver_dict (dict) – Dictionary mapping names of drivers to the numpy arrays with values for those drivers. Each array has a single dimension of the same length n_samples. Can also use an h5py.Group.

  • driver_names (list) – names of the drivers to load

Valid names for drivers can be found at abcgan.driver_names.

Raises:
  • ValueError – If the driver values have the wrong type or shape.

  • KeyError – If one of the required drivers is missing.
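The stacking behaviour described above can be sketched with NumPy as follows. This is an assumption-laden illustration rather than the package's code: stack_columns is a hypothetical helper that turns a dictionary of equal-length 1-D arrays into an n_samples x n_drivers array and raises the same exception types as documented.

```python
import numpy as np


def stack_columns(value_dict, names):
    """Hypothetical helper: stack 1-D arrays into (n_samples, n_names)."""
    missing = [n for n in names if n not in value_dict]
    if missing:
        raise KeyError(f"missing drivers: {missing}")
    cols = [np.asarray(value_dict[n], dtype=float) for n in names]
    if any(c.ndim != 1 or c.shape != cols[0].shape for c in cols):
        raise ValueError("each driver must be 1-D with the same length")
    # Column order follows `names`, not the dictionary's own ordering.
    return np.stack(cols, axis=-1)


drivers = stack_columns(
    {"Ap": [4.0, 5.0, 6.0], "dst": [-10.0, -12.0, -8.0]},
    ["Ap", "dst"],
)
print(drivers.shape)
```

Fixing the column order from the list of names keeps the stacked array aligned with whatever driver_names is later passed to the scaling and model functions.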