abcgan package
Submodules
abcgan.constants module
File for global constants used in the program.
abcgan.anomaly module
- abcgan.anomaly.anomaly_score_bv(bvs, gen_bvs, method: str = 'joint', alpha: float = 2.0)
Returns unbounded anomaly scores for background profiles given a set of generated samples. Low scores represent anomalous events.
- Parameters:
bvs (np.ndarray) – (n_samples x n_alt x n_bv) real background profiles
gen_bvs (np.ndarray) – (n_samples x n_repeat x n_alt x n_feat) set of n_repeat generated background profiles for each input sample
method (str, optional) – ‘joint’: estimates a single anomaly score for each altitude bin using the joint distribution; ‘marginal’: estimates anomaly scores at each altitude for each bv feature using marginal distributions
alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)
- Returns:
anomalies – (n_samples x n_alt) anomaly scores if method is ‘joint’; (n_samples x n_alt x n_feat) anomaly scores if method is ‘marginal’
- Return type:
np.ndarray
- abcgan.anomaly.anomaly_score_hfp(hfps, gen_hfps, method: str = 'joint', alpha: float = 2.0)
Returns unbounded anomaly scores for HFP waves given a set of generated samples. Low scores represent anomalous events.
- Parameters:
hfps (np.ndarray) – (n_samples x n_waves x n_bv) real HFPs
gen_hfps (np.ndarray) – (n_samples x n_repeat x n_waves x n_feat) set of n_repeat generated hfp waves for each input sample
method (str, optional) – ‘joint’: estimates a single anomaly score for each wave using the joint distribution; ‘marginal’: estimates anomaly scores on each hfp feature for each wave using marginal distributions
alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)
- Returns:
anomalies – (n_samples x n_waves) anomaly scores if method is ‘joint’; (n_samples x n_waves x n_feat) anomaly scores if method is ‘marginal’
- Return type:
np.ndarray
- abcgan.anomaly.anomaly_score_wtec(wtecs, gen_wtecs: ndarray, method: str = 'joint', alpha: float = 2.0, dataset_name='LSTIDs_Poker')
Returns unbounded anomaly scores for TEC wave parameters given a set of generated TEC waves. Low scores represent anomalous events.
- Parameters:
wtecs (np.ndarray) – (n_samples x n_wtec) real tec wave parameters
gen_wtecs (np.ndarray) – (n_samples x n_repeat x n_wtec) set of n_repeat generated tec waves for each input sample
method (str, optional) – ‘marginal’: estimates anomaly scores on each tec feature using marginal distributions; ‘joint’: estimates the anomaly score of each tec wave using the joint distribution
alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)
dataset_name (str) – specify dataset type for z-scaling
- Returns:
anomalies – (n_samples) anomaly scores if method is ‘joint’; (n_samples x n_feat) anomaly scores if method is ‘marginal’
- Return type:
np.ndarray
- abcgan.anomaly.joint_anomaly_estimation(sampled_feats, feats, alpha=1.0)
Computes anomaly scores from the set of generated features using a logsumexp computation based on the joint distribution.
- Parameters:
sampled_feats (np.ndarray) – sampled z-scaled features for each input feature
feats (np.ndarray) – input feature broadcast to match sampled_feat dim
alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)
- Returns:
anomalies – joint anomaly scores (feat.shape[:-1])
- Return type:
np.ndarray
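The joint computation can be pictured with a short NumPy sketch. It assumes a Gaussian kernel of width alpha around each generated sample and an array layout of (..., n_repeat, n_feat); both are assumptions made for illustration, and joint_score_sketch is a hypothetical helper, not the abcgan implementation.

```python
import numpy as np

def joint_score_sketch(sampled_feats, feats, alpha=1.0):
    """Illustrative joint score (assumed kernel form, not abcgan's code).

    sampled_feats: (..., n_repeat, n_feat) generated, z-scaled features
    feats:         (..., n_repeat, n_feat) real features broadcast over n_repeat
    returns:       (...) one score per input; lower means more anomalous
    """
    # Squared distance over all features at once, i.e. the joint distribution.
    sq_dist = np.sum((sampled_feats - feats) ** 2, axis=-1)
    n_repeat = sampled_feats.shape[-2]
    # Stable log of the mean kernel value over the generated samples.
    log_kernel = -sq_dist / (2.0 * alpha ** 2)
    return np.logaddexp.reduce(log_kernel, axis=-1) - np.log(n_repeat)

rng = np.random.default_rng(0)
generated = rng.normal(size=(4, 50, 3))  # 4 inputs, 50 repeats, 3 features
typical = np.broadcast_to(np.zeros((4, 1, 3)), generated.shape)
outlier = np.broadcast_to(np.full((4, 1, 3), 6.0), generated.shape)
typical_scores = joint_score_sketch(generated, typical)
outlier_scores = joint_score_sketch(generated, outlier)
```

Under these assumptions an input far from every generated sample receives a much lower score than one near the bulk of the samples, which matches the "low scores represent anomalous events" convention used in this module.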
- abcgan.anomaly.marginal_anomaly_estimation(sampled_feats, feats, alpha=2.0)
Computes anomaly scores from the set of generated features using a logsumexp computation based on the marginal distributions.
- Parameters:
sampled_feats (np.ndarray) – sampled z-scaled features for each input feature
feats (np.ndarray) – input feature broadcast to match sampled_feat dim
alpha (float) – scalar parameter for sigma (lower alpha –> finer resolution)
- Returns:
anomalies – marginal anomaly scores (feat.shape)
- Return type:
np.ndarray
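A marginal score differs from a joint one in that each feature is scored on its own. The sketch below assumes a Gaussian kernel of width alpha and an array layout of (..., n_repeat, n_feat); marginal_score_sketch is a hypothetical helper for illustration, not the abcgan implementation.

```python
import numpy as np

def marginal_score_sketch(sampled_feats, feats, alpha=2.0):
    """Illustrative per-feature score (assumed kernel form, not abcgan's code).

    sampled_feats: (..., n_repeat, n_feat) generated, z-scaled features
    feats:         (..., n_repeat, n_feat) real features broadcast over n_repeat
    returns:       (..., n_feat) one score per feature; lower is more anomalous
    """
    # Keep the feature axis: each feature is compared only with itself.
    sq_dist = (sampled_feats - feats) ** 2
    n_repeat = sampled_feats.shape[-2]
    log_kernel = -sq_dist / (2.0 * alpha ** 2)
    # Reduce over the n_repeat axis only.
    return np.logaddexp.reduce(log_kernel, axis=-2) - np.log(n_repeat)

rng = np.random.default_rng(1)
generated = rng.normal(size=(2, 200, 3))
real = np.zeros((2, 1, 3))
real[:, :, 1] = 8.0  # only the middle feature is far from the samples
scores = marginal_score_sketch(generated, np.broadcast_to(real, generated.shape))
```

Here only the second feature is flagged, which is the kind of per-feature attribution a marginal score allows and a single joint score does not.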
abcgan.attention module
- abcgan.attention.collect_bv_attn_map(drivers, bvs, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], model='bv_gan', bv_type='radar')
function to collect attention map from weights in pre-trained model
- Parameters:
bvs (np.ndarray) – n_samples x n_alt x n_feat (not z-scaled)
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
model (str, optional) – name of model to use
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
- Returns:
samples – n_alt x n_alt attention map found in transformer altitude.
- Return type:
np.ndarray
- abcgan.attention.collect_hfp_attn_map(drivers, bvs, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], model='hfp_gan', bv_type='radar')
function to collect attention map from the HFP GAN’s decoder
- Parameters:
bvs (np.ndarray) – n_samples x n_alt x n_feat (not z-scaled)
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
model (str, optional) – name of model to use
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
- Returns:
samples – n_waves x n_alt attention map found in transformer altitude.
- Return type:
np.ndarray
- abcgan.attention.get_decoder_attn(transformer, tgt, memory, key_padding_mask)
- abcgan.attention.get_encoder_attn(layers, src, src_mask, src_key_mask)
Collects attention masks from each encoder layer in the transformer
- Parameters:
layers (list) – list of pytorch encoder layers from transformer
src (torch.tensor) – embedded source tensor
src_mask (torch.tensor) – constant source key mask from transformer
src_key_mask (torch.tensor) – mask for missing source data
- Returns:
samples – n_layers x n_samples x n_alt x n_alts output of attention mask or weights for each encoder layer in the transformer
- Return type:
torch.tensor
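The output layout described above (one n_alt x n_alt map per sample per layer) can be illustrated in NumPy. This is a simplified sketch: it uses single-head scaled dot-product attention with random, hypothetical projection matrices, and it feeds the same src to every layer instead of propagating it through the encoder as a real transformer would.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_weights(src, w_q, w_k, key_mask=None):
    """Single-head attention weights for src of shape (n_samples, n_alt, d_model)."""
    q = src @ w_q
    k = src @ w_k
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if key_mask is not None:
        # key_mask is True where source data is missing; those keys get zero weight.
        scores = np.where(key_mask[:, None, :], -np.inf, scores)
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
n_layers, n_samples, n_alt, d_model = 2, 3, 5, 8
src = rng.normal(size=(n_samples, n_alt, d_model))
key_mask = np.zeros((n_samples, n_alt), dtype=bool)
key_mask[:, -1] = True  # pretend the top altitude is missing

# One map per layer, stacked to n_layers x n_samples x n_alt x n_alt.
attn = np.stack([
    attention_weights(
        src,
        rng.normal(size=(d_model, d_model)),
        rng.normal(size=(d_model, d_model)),
        key_mask,
    )
    for _ in range(n_layers)
])
```

Each row of a map sums to one, and the column for a masked altitude is zero, so the map shows how much every altitude attends to each valid altitude.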
abcgan.interface module
Code for top level interface.
This code is added to the main package level in __init__.py
- abcgan.interface.average_wtec(wtec: ndarray, avg_coefficients: List[float] = [0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05], z_scale_input: bool = False, dataset_name: None | str = 'LSTIDs_Poker')
Smooths tec wave parameter data using per-feature averaging coefficients and returns the averaged samples.
- Parameters:
wtec (np.ndarray) – tec wave parameter data
avg_coefficients (list (n_wtec_feat,)) – z-scaled averaging coefficients to smooth out the original tec wave parameter distributions.
z_scale_input (bool) – set if the input wtec data is already z-scaled
dataset_name (str) – specify dataset type for z-scaling
- Returns:
wtec – (n_samples x n_wtec) tec wave parameter samples.
- Return type:
np.ndarray
- abcgan.interface.discriminate(drivers, bvs, hfps=None, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], bv_model='bv_gan', hfp_model='hfp_gan', bv_type='radar')
Score how well the measurements match with historical observations.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
bvs (np.ndarray) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than max_alt.
hfps (np.ndarray, optional) – n_samples x n_wave x n_hfps input list of wave measurements.
bv_model (str, optional) – name of bv model to use
hfp_model (str, optional) – name of hfp model to use
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
- Returns:
scores – 1) n_samples x n_alt bv normalcy scores in the range [0, 1.0]. 2) n_samples hfp wave normalcy scores in the range [0, 1.0].
- Return type:
(np.ndarray, np.ndarray)
- abcgan.interface.estimate_drivers(drivers, model='dr_gan')
Predict drivers 2 hours into the future using the driver GAN model. Used for real-time background predictions using drivers from 2 hours ago.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
model (str, optional) – name of model to use
- Returns:
predicted_drivers – estimation of driver features two hours from the drivers inputted
- Return type:
np.ndarray
- abcgan.interface.generate_bvs(drivers: ndarray, bv_measurements: None | ndarray = None, driver_names: List[str] = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], mean_replace_drs: None | List[str] = None, n_alt: int = 30, bv_model: str = 'bv_gan', model_dir: None | str = None, bv_type: str = 'radar', return_z_scale: bool = False, cuda_index: None | int = None, verbose: int = 1)
Generate background variable profiles consistent with the historical distribution.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
mean_replace_drs (list) – list of drivers that will be set to their average, i.e. a z-scaled value of zero
bv_measurements (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on.
n_alt (int, optional) – number of altitude measurements to draw, defaults to max_alt
return_z_scale (bool, optional) – set to have the function return z scaled feature data
bv_model (str, optional) – name of bv GAN to use
model_dir (str, optional) – directory to load model from
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
cuda_index (int, optional) – GPU index to use when generating BVs and HFPs
verbose (bool, optional) – set to show loading bar
- Returns:
samples – 1) n_samples x n_alt x n_bvs output measurements at each requested altitude. 2) n_samples x n_hfps generated hfp waves 3) n_sample probabilities that the generated wave is present
- Return type:
(np.ndarray, np.ndarray, np.ndarray)
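The effect of mean_replace_drs can be sketched as follows. This is a minimal illustration, assuming the z-scaling statistics are computed from the batch itself (the package presumably uses its own stored statistics); z_scale_with_mean_replace is a hypothetical helper, not part of abcgan.

```python
import numpy as np

def z_scale_with_mean_replace(drivers, driver_names, mean_replace_drs=None):
    """Z-scale each driver column, then zero the columns named in mean_replace_drs."""
    mean = drivers.mean(axis=0)
    std = drivers.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant drivers
    z = (drivers - mean) / std
    for name in mean_replace_drs or []:
        # A z-scaled value of zero corresponds to the driver's average.
        z[:, driver_names.index(name)] = 0.0
    return z, mean, std

names = ["Ap", "F10.7", "dst"]
drivers = np.array([
    [1.0, 100.0, -5.0],
    [3.0, 120.0, -15.0],
    [5.0, 140.0, -25.0],
])
z, mean, std = z_scale_with_mean_replace(drivers, names, ["F10.7"])
restored = z * std + mean  # undo the scaling to see what the model is told
```

After undoing the scaling, every sample carries the average F10.7 value, so any variation in the generated output can no longer be attributed to that driver.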
- abcgan.interface.generate_hfps(drivers: ndarray, bv_measurements: ndarray, driver_names: List[str] = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], mean_replace_drs: None | List[str] = None, hfp_model: str = 'hfp_gan', model_dir: None | str = None, return_z_scale: bool = False, cuda_index: None | int = None, verbose: int = 1)
Generate background variable profiles and HFP waves consistent with the historical distribution.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
mean_replace_drs (list) – list of drivers that will be set to their average, i.e. a z-scaled value of zero
bv_measurements (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on.
return_z_scale (bool, optional) – set to have the function return z scaled feature data
hfp_model (str, optional) – name of hfp GAN to use
model_dir (str, optional) – directory to load model from
cuda_index (int, optional) – GPU index to use when generating BVs and HFPs
verbose (bool, optional) – set to show loading bar
- Returns:
samples – 1) n_samples x n_hfps generated hfp waves 2) n_sample probabilities that the generated wave is present
- Return type:
(np.ndarray, np.ndarray)
- abcgan.interface.generate_multi_bv(drivers: ndarray, bvs: None | ndarray = None, n_repeat: int = 10, n_alt: int = 30, bv_model: str = 'bv_gan', bv_type: str = 'radar', cuda_index: None | int = None, verbose: int = 1)
Generate multiple background variable profiles consistent with the historical distribution for each driver sample. Used for anomaly detection.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
bvs (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt. These represent fixed measurements for the lowest altitudes to condition on. Usually left as default (None)
n_repeat (int, optional) – number of bv profiles to generate for each driver sample
n_alt (int, optional) – number of altitude measurements to draw, defaults to max_alt
bv_model (str, optional) – name of bv GAN to use
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
cuda_index (int, optional) – GPU index to use when generating BVs and HFPs
verbose (bool, optional) – set to show loading bar
- Returns:
G_bvs – output measurements at each requested altitude.
- Return type:
(n_samples x n_repeat x n_alt x n_bvs)
- abcgan.interface.generate_multi_hfp(drivers: ndarray, bvs: ndarray, n_repeat: int = 10, hfp_model: str = 'hfp_gan', cuda_index: None | int = None, verbose: int = 1)
Generate HFP waves consistent with the historical distribution for each driver sample. Creates multiple samples for a single input driver/bv profile to be used in anomaly detection.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
bvs (np.ndarray, optional) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than n_alt but greater than zero. These represent fixed measurements that the HFP GAN will use as conditioning.
n_repeat (int, optional) – number of hfp waves to generate for each driver sample
hfp_model (str, optional) – name of hfp gan to use
cuda_index (int, optional) – GPU index to use when generating HFPs
verbose (bool, optional) – set to show loading bar
- Returns:
samples – 1) G_hfps (n_samples x n_repeat x 1 x n_hfps) generated hfp waves 2) G_b (n_sample x n_repeat) probabilities that the generated wave is present
- Return type:
(np.ndarray, np.ndarray)
- abcgan.interface.generate_multi_wtec(drivers: ndarray, n_repeat: int = 10, wtec_model: str = 'wtec_gan_LSTIDs_Poker', model_dir: None | str = None, dataset_name='LSTIDs_Poker', cuda_index: None | int = None, verbose: int = 0)
Generate multiple TEC wave parameter samples consistent with the historical distribution for each driver sample.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
n_repeat (int, optional) – number of waves to generate for each driver sample
wtec_model (str, optional) – name of WTEC GAN to use
model_dir (str, optional) – directory to load model from
dataset_name (str) – specify dataset type for z-scaling
cuda_index (int, optional) – GPU index
verbose (bool, optional) – set to show loading bar
- Returns:
samples – (n_samples x n_repeat x n_wtec) output wtec measurements.
- Return type:
np.ndarray
- abcgan.interface.generate_wtec(drivers, driver_names: list = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], mean_replace_drs: None | List[str] = None, wtec_model: str = 'wtec_gan_LSTIDs_Poker', dataset_name: None | str = None, model_dir: None | str = None, return_z_scale: bool = False, cuda_index: None | int = None, verbose: int = 1)
Generate TEC wave parameters consistent with the historical distribution.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
mean_replace_drs (list) – list of drivers that will be set to their average, i.e. a z-scaled value of zero
return_z_scale (bool, optional) – set to have the function return z scaled feature data
wtec_model (str, optional) – name of WTEC GAN to use
dataset_name (str) – specify dataset type for z-scaling
model_dir (str, optional) – directory to load model from
cuda_index (int, optional) – GPU index to use when generating BVs and HFPs
verbose (bool, optional) – set to show loading bar
- Returns:
samples – 1) n_samples x n_wtec output measurements
- Return type:
np.ndarray
- abcgan.interface.hellinger_scores_bv(real: ndarray, fake: ndarray, mask: None | ndarray = None, bins: None | int = None, filter_length: None | int = None, return_hist_info: bool = False, z_scale: bool = True, z_scale_input: bool = False, bv_type: str = 'radar')
Returns the Hellinger distance score that measures the similarity between real and generated background variable profiles.
- Parameters:
real (np.ndarray) – tensor of real values for a particular alt and bv feat
fake (np.ndarray) – tensor of generated values for a particular alt and bv feat
bins (int) – number of bins to use in histogram calculations (If None # of bins will be calculated based on number of samples)
filter_length (int) – averaging filter length to smooth out noise in histograms (If None filter length will be calculated based on number of samples)
return_hist_info (bool) – set to have the function return the histograms and bin edges that were used to calculate the Hellinger distance metric.
z_scale (bool) – used z-scaled values when calculating hellinger distance (recommended)
z_scale_input (bool) – Set if you are inputting bvs that are already z-scaled
bv_type – type of data (radar or lidar)
- Returns:
dist – the hellinger distance (n_alts x n_feats)
hist_info – additional histogram data that was used to calculate the Hellinger distance. Info includes the real and fake histograms and their shared bin edges.
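For a single altitude and feature, the histogram-based Hellinger distance can be sketched as below. The default bin count, the moving-average smoothing, and the name hellinger_sketch are assumptions for illustration; the package chooses bins and filter_length from the number of samples when they are None, and that rule is not reproduced here.

```python
import numpy as np

def hellinger_sketch(real, fake, bins=20, filter_length=1):
    """Hellinger distance between two 1-D samples using shared bin edges."""
    lo = min(real.min(), fake.min())
    hi = max(real.max(), fake.max())
    edges = np.linspace(lo, hi, bins + 1)  # shared edges for both histograms
    r_hist, _ = np.histogram(real, bins=edges)
    f_hist, _ = np.histogram(fake, bins=edges)
    # Moving-average filter to smooth out noise in the histograms.
    kernel = np.ones(filter_length) / filter_length
    r_hist = np.convolve(r_hist, kernel, mode="same")
    f_hist = np.convolve(f_hist, kernel, mode="same")
    # Normalize to probabilities before comparing.
    p = r_hist / r_hist.sum()
    q = f_hist / f_hist.sum()
    dist = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
    return dist, (p, q, edges)

rng = np.random.default_rng(0)
real = rng.normal(size=5000)
d_same, _ = hellinger_sketch(real, real)
d_close, _ = hellinger_sketch(real, rng.normal(loc=0.5, size=5000))
d_far, hist_info = hellinger_sketch(real, real + 100.0)
```

The distance is 0 for identical histograms and 1 for histograms with no overlapping bins, so smaller values indicate that the generated distribution is closer to the real one.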
- abcgan.interface.hellinger_scores_hfp(real: ndarray, fake: ndarray, r_mask: None | ndarray = None, f_mask: None | ndarray = None, n_bins: None | tuple | int = None, filter_length: None | int = None, return_hist_info: bool = False, z_scale: bool = True, z_scale_input: bool = False)
Returns the Hellinger distance score that measures the similarity between real and generated HFP waves.
- Parameters:
real – tensor of real values for a particular alt and bv feat
fake – tensor of generated values for a particular alt and bv feat
n_bins – number of bins to use in histogram calculations
filter_length – averaging filter length to smooth out histograms
z_scale (bool) – used z-scaled values (recommended)
z_scale_input (bool) – Set if you are inputting hfps that are already z-scaled
return_hist_info (bool) – set to have function return the real hist, fake hist, and bin edges used in calculation
- Returns:
dist – the hellinger distance (n_alts or n_waves x n_feats)
hist_info – additional histogram data that was used to calculate the Hellinger distance. Info includes the real and fake histograms and their shared bin edges.
- abcgan.interface.hellinger_scores_wtec(real: ndarray, fake: ndarray, n_bins: None | int = None, filter_length: None | int = None, z_scale: bool = True, z_scale_inputs: bool = False, dataset_name: None | str = 'LSTIDs_Poker', return_hist_info: bool = False)
Returns the hellinger distance score that measures the similarity between real and generated tec wave.
- Parameters:
real – array of real tec waves
fake – array of fake/generated tec waves
n_bins – number of bins to use during hellinger score calculation
filter_length – averaging filter length to smooth out histograms
z_scale (bool) – used z-scaled values (recommended)
z_scale_inputs (bool) – Set if you are inputting tec waves that are already z-scaled
dataset_name (str) – specify dataset type for z-scaling
return_hist_info (bool) – set to have function return the real hist, fake hist, and bin edges used in calculation
- Returns:
dist – the hellinger distance (1 x n_feats)
hist_info – additional histogram data that was used to calculate the Hellinger distance. Info includes the real and fake histograms and their shared bin edges.
- abcgan.interface.load_h5_data(fname, bv_type: str = 'radar', load_hfp: bool = False, n_samples: None | int = None, random_start: bool = False, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])
loads and returns external drivers, background variables, HFP waves and data mask all aligned in time with outlier/invalid data filtered out
- Parameters:
fname (str) – name of h5 file to load the data from
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
load_hfp (bool, optional) – set to load HFP waves along with the background variables
n_samples (int, optional) – number of samples to load (None to load all samples)
random_start (bool, optional) – randomize starting index to select n_samples from
driver_names (list, optional) – list of driver names to load
- Returns:
drivers (np.ndarray) – (n_samples x n_dr) external drivers.
bvs (np.ndarray) – (n_samples x n_alt x n_bv) background variables.
alt_mask (np.ndarray) – (n_samples x n_alt) background variables alt mask.
hfps (np.ndarray) – (n_samples x 1 x n_hfp) HFP waves.
wave_mask (np.ndarray) – (n_samples x 1) HFP wave mask.
unix_time (np.ndarray) – (n_samples, ) time stamp of each sample
- abcgan.interface.load_wtec_h5(fname: str, dataset_name='LSTIDs_Poker', n_samples: None | int = None, avg_coefficients: None | List[float] = None, random_start: bool = True)
loads and returns external drivers, tec wave parameters, and unix timestamp all aligned in time with outlier/invalid data filtered out
- Parameters:
fname (str) – name of h5 file to load the data from
dataset_name (str) – specify dataset type for z-scaling
n_samples (int, optional) – number of samples to load (None to load all samples)
avg_coefficients (list (n_wtec_feat,)) – z-scaled averaging coefficients to smooth out the original tec wave parameter distributions.
random_start (bool, optional) – randomize starting index to select n_samples from
- Returns:
drivers (np.ndarray) – (n_samples x n_wtec_dr) external drivers.
wtec (np.ndarray) – (n_samples x n_wtec) tec wave parameter samples.
unix_time (np.ndarray) – (n_samples, ) time stamp of each sample
- abcgan.interface.stack_bvs(bv_dict, bv_type='radar')
Stacks background variables in appropriate format.
This function is provided for convenience.
- Parameters:
bv_dict (dict) – Dictionary mapping names of background variables to numpy arrays with values for those bvs. Each array should have shape n_samples x n_altitudes. Can also use h5py.Group.
bv_type (str) – string specifying whether to stack radar or lidar data
Valid names for background variables can be found at abcgan.bv_names.
- Raises:
ValueError: – If the input shape of the bv dict values is not correct.
KeyError: – If one of the required bvs is missing.
- abcgan.interface.stack_drivers(driver_dict, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])
Stacks drivers in appropriate format.
This function is provided for convenience.
- Parameters:
driver_dict (dict) – Dictionary mapping names of drivers to the numpy arrays with values for those drivers. Each array has a single dimension of the same length n_samples. Can also use an h5py.Group.
driver_names (list) – names of the drivers to load
Valid names for drivers can be found at abcgan.driver_names.
- Raises:
ValueError: – If the driver values have the wrong type or shape.
KeyError: – If one of the required drivers is missing.
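The stacking behavior described above can be sketched with plain NumPy. The function stack_drivers_sketch is a hypothetical stand-in written for illustration; the exact checks and error messages in abcgan may differ.

```python
import numpy as np

def stack_drivers_sketch(driver_dict, driver_names):
    """Stack 1-D driver arrays into an (n_samples x n_drivers) array."""
    columns = []
    for name in driver_names:
        # A missing driver raises KeyError from the dictionary lookup.
        values = np.asarray(driver_dict[name], dtype=float)
        if values.ndim != 1:
            raise ValueError(f"driver {name!r} must be 1-D, got shape {values.shape}")
        columns.append(values)
    if len({c.shape[0] for c in columns}) > 1:
        raise ValueError("all drivers must have the same number of samples")
    # Column order follows driver_names, not the dictionary order.
    return np.stack(columns, axis=-1)

driver_dict = {
    "Ap": np.array([1.0, 2.0, 3.0]),
    "dst": np.array([-5.0, -6.0, -7.0]),
    "F10.7": np.array([100.0, 110.0, 120.0]),
}
stacked = stack_drivers_sketch(driver_dict, ["Ap", "F10.7", "dst"])
```

The result has one row per sample and one column per requested driver, which is the n_samples x n_drivers layout expected by the generate and discriminate functions.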
abcgan.mask module
- abcgan.mask.mask_altitude(bv_feat)
Creates an altitude mask for nans in bvs.
Also replaces nans with numbers.
- Parameters:
bv_feat (torch.Tensor) – background variables
- Returns:
bv_feat (torch.Tensor) – bv_feat with nans replaced, done in place but returned for clarity
alt_mask (torch.Tensor) – Mask that is true for valid altitudes
- Raises:
ValueError: – If valid values are not contiguous.
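The masking step can be illustrated in NumPy (the documented function works on torch.Tensor). In this sketch the replacement value of 0.0 and the rule that an altitude is invalid if any of its features is nan are assumptions; mask_altitude_sketch is a hypothetical helper.

```python
import numpy as np

def mask_altitude_sketch(bv_feat):
    """bv_feat: (n_samples, n_alt, n_feat) float array that may contain nans."""
    nan_entries = np.isnan(bv_feat)
    alt_mask = ~nan_entries.any(axis=-1)  # True for valid altitudes
    n_valid = alt_mask.sum(axis=1)
    n_alt = alt_mask.shape[1]
    # Valid altitudes are contiguous when first..last spans exactly n_valid bins.
    first = alt_mask.argmax(axis=1)
    last = n_alt - 1 - alt_mask[:, ::-1].argmax(axis=1)
    has_valid = n_valid > 0
    if np.any((last - first + 1)[has_valid] != n_valid[has_valid]):
        raise ValueError("valid values are not contiguous")
    bv_feat[nan_entries] = 0.0  # replace nans in place
    return bv_feat, alt_mask

bv = np.ones((2, 4, 3))
bv[0, 2:, :] = np.nan  # top two altitudes missing
bv[1, 3, 0] = np.nan   # one feature missing at the top altitude
filled, alt_mask = mask_altitude_sketch(bv)
```

Returning the input array alongside the mask mirrors the documented behavior: the nans are replaced in place, and the return value is provided for clarity.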
- abcgan.mask.prev_driver_mask(unix_time)
Creates a driver mask of samples that have a previous sample and a mapping vector to the previous sample.
- Parameters:
unix_time (np.array) – time stamp of driver samples
- Returns:
prev_dr_map (np.array) – vector mapping each sample to its delayed sample
dr_mask (torch.Tensor) – Mask of valid driver samples that have a delayed sample
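One way to build such a mapping is a sorted lookup on the timestamps. The sketch below assumes a delay of exactly two hours and an exact timestamp match, neither of which is stated for this function; it returns NumPy arrays for both outputs, and prev_driver_mask_sketch is a hypothetical helper.

```python
import numpy as np

def prev_driver_mask_sketch(unix_time, delay=2 * 3600):
    """Map each sample to the sample `delay` seconds earlier, if one exists."""
    unix_time = np.asarray(unix_time)
    order = np.argsort(unix_time)
    sorted_time = unix_time[order]
    target = unix_time - delay
    # Position where each target time would sit in the sorted timestamps.
    pos = np.searchsorted(sorted_time, target)
    pos = np.clip(pos, 0, len(sorted_time) - 1)
    dr_mask = sorted_time[pos] == target  # True where a delayed sample exists
    prev_dr_map = order[pos]              # only meaningful where dr_mask is True
    return prev_dr_map, dr_mask

times = np.array([0, 3600, 7200, 10800, 20000])
prev_dr_map, dr_mask = prev_driver_mask_sketch(times)
```

Only the samples at 7200 and 10800 seconds have a counterpart two hours earlier, so only those two entries of the mask are True.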
abcgan.mean_estimation module
- class abcgan.mean_estimation.Transformer(d_dr: int = 32, d_bv: int = 12, n_alt: int = 30, d_model: int = 64, nhead: int = 1, num_encoder_layers: int = 1, dim_feedforward: int = 64, dropout: float = 0.0, activation: str = 'relu')
Bases:
Module
Transformer with only the encoder.
- Parameters:
d_model – the number of expected features in the encoder/decoder inputs
d_stack – the number of features to stack to output
nhead – the number of heads in the multiheadattention models
num_encoder_layers – the number of sub-encoder-layers in the encoder
dim_feedforward – the dimension of the feedforward network model
dropout – the dropout value
activation – the activation function of encoder/decoder intermediate layer
- forward(driver_src: Tensor, bv_src: Tensor, src_key_padding_mask: Tensor | None = None)
Take in and process masked source/target sequences.
- Parameters:
driver_src – the sequence to the encoder (required).
bv_src – the sequence to the decoder (required).
src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).
- Shape:
driver_src: \((n_batch, d_dr)\).
bv_src: \((n_batch, n_alt, d_bv)\).
src_key_padding_mask: \((n_batch, n_alt)\).
- generate_square_subsequent_mask(sz: int) Tensor
Generate a square mask for the sequence. The masked positions are filled with float(‘-inf’). Unmasked positions are filled with float(0.0).
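The mask layout described here can be reproduced in NumPy for inspection. This is an illustrative equivalent, not the method itself, which returns a Tensor.

```python
import numpy as np

def square_subsequent_mask(sz):
    """Return an (sz x sz) mask: -inf above the diagonal, 0.0 elsewhere."""
    return np.triu(np.full((sz, sz), -np.inf), k=1)

mask = square_subsequent_mask(3)
```

Row i holds 0.0 for positions up to and including i and -inf for later positions, so adding the mask to attention scores before the softmax prevents a position from attending to anything after it.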
abcgan.bv_model module
- class abcgan.bv_model.Critic(transformer: Module, n_layers=4, img_dim=12, hidden_dim=128)
Bases:
Module
Critic Class
- Parameters:
transformer (torch.nn.Module) – transformer for the critic
n_layers (int) – number of layers in MLP
img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar
hidden_dim (int) – the inner dimension, a scalar
- forward(bv_features, driver_src, real, src_key_mask=None)
Function for completing a forward pass of the critic: Given an image tensor, returns a 1-dimension tensor representing a fake/real prediction.
- Parameters:
bv_features (torch.Tensor) – a flattened image tensor with dimension (n_batch, max_alt, n_bv_feat)
driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)
real (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)
src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_batch, n_alt)
- class abcgan.bv_model.Driver_Critic(n_layers=2, img_dim=32, hidden_dim=64)
Bases:
Module
Critic Class
- Parameters:
n_layers (int) – number of layers in MLP
img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar
hidden_dim (int) – the inner dimension, a scalar
- forward(dr_src, dr_prev)
forward pass of the critic for driver augmentation: Given an image tensor, returns a 1-dimension tensor representing a fake/real prediction.
- Parameters:
dr_src (torch.Tensor) – tensor of driver features (n_batch, n_dr_feat)
dr_prev (torch.Tensor) – tensor of past driver features (n_batch, n_dr_feat)
- class abcgan.bv_model.Driver_Generator(n_layers=2, latent_dim=16, img_dim=32, hidden_dim=64)
Bases:
Module
Generator Class
- Parameters:
n_layers (int) – number of MLP layers
latent_dim (int) – the dimension of the input latent vector
img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar
hidden_dim (int) – the inner dimension, a scalar
- forward(dr_prev, noise=None)
Forward pass of the generator for driver augmentation: given a driver sample from the past and a noise tensor, returns a generated driver sample.
- Parameters:
dr_prev (torch.Tensor) – tensor of past driver features from data loader (n_batch, n_dr_feat)
noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)
- class abcgan.bv_model.Generator(transformer: Module, n_layers=4, latent_dim=16, img_dim=12, hidden_dim=128)
Bases:
Module
Generator Class
- Parameters:
transformer (torch.nn.Module) – transformer for the generator
n_layers (int) – number of MLP layers
latent_dim (int) – the dimension of the input latent vector
img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar
hidden_dim (int) – the inner dimension, a scalar
- forward(driver_src, bv_src, src_key_mask=None, noise=None)
Function for completing a forward pass of the generator: Given a noise tensor, returns generated images.
- Parameters:
driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)
bv_src (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)
src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_alt, n_batch)
noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)
abcgan.hfp_model module
- class abcgan.hfp_model.HFP_Critic(transformer: Module, n_layers=4, img_dim=8, hidden_dim=128)
Bases:
Module
Critic Class
- Parameters:
transformer (torch.nn.Module) – transformer for the critic
n_layers (int) – number of layers in MLP
img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar
hidden_dim (int) – the inner dimension, a scalar
- forward(dr_src, real_bv, real_hfp, hfp_feat, src_key_mask=None, tgt_key_mask=None)
Critic forward
- Parameters:
dr_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)
real_bv (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)
real_hfp (torch.Tensor) – tensor of real hfp features from data loader (n_batch, n_wave, n_bv_feat)
hfp_feat (torch.Tensor) – tensor of hfp features from data loader (n_batch, n_wave, n_bv_feat)
src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_batch, n_alt)
tgt_key_mask (torch.Tensor, optional) – mask for hfp features from data loader (n_batch, n_wave)
- class abcgan.hfp_model.HFP_Generator(transformer: Module, n_layers=4, latent_dim=16, img_dim=8, hidden_dim=128)
Bases:
Module
Generator Class
- Parameters:
transformer (torch.nn.Module) – the transformer model object that estimates hfp features
n_layers (int) – number of MLP layers
latent_dim (int) – the dimension of the input latent vector
img_dim (int) – the dimension of the images, fitted for the dataset used, a scalar
hidden_dim (int) – the inner dimension, a scalar
- forward(driver_src, bv_src, hfp_tgt, src_key_mask=None, tgt_key_mask=None, noise=None)
Function for completing a forward pass of the generator: Given a noise tensor, returns generated images.
- Parameters:
driver_src (torch.Tensor) – tensor of driver features from data loader (n_batch, n_dr_feat)
bv_src (torch.Tensor) – tensor of bv features from data loader (n_batch, n_alt, n_bv_feat)
hfp_tgt (torch.Tensor) – tensor of hfp features from data loader (n_batch, n_wave, n_hfp_feat)
src_key_mask (torch.Tensor, optional) – mask for bv features from data loader (n_alt, n_batch)
tgt_key_mask (torch.Tensor, optional) – mask for hfp features from data loader (n_batch, n_wave)
noise (torch.Tensor, optional) – a noise tensor with dimensions (n_batch, latent_dim)
- class abcgan.hfp_model.HFP_Transformer(d_dr: int = 32, d_bv: int = 12, n_alt: int = 30, n_waves: int = 1, d_hfp: int = 8, d_model: int = 256, output_b: bool = False, nhead: int = 1, num_encoder_layers: int = 1, num_decoder_layers: int = 1, dim_feedforward: int = 1024, dropout: float = 0.1, activation: str = 'relu')
Bases:
Module
Transformer with only the encoder
- Parameters:
d_model – the number of expected features in the encoder/decoder inputs
nhead – the number of heads in the multiheadattention models
num_encoder_layers – the number of sub-encoder-layers in the encoder
dim_feedforward – the dimension of the feedforward network model
dropout – the dropout value
activation – the activation function of encoder/decoder intermediate layer
- forward(driver_src: Tensor, bv_src: Tensor, hfp_tgt: Tensor, src_key_padding_mask: Tensor | None = None, tgt_key_padding_mask: Tensor | None = None)
Take in and process masked source/target sequences.
- Parameters:
driver_src – conditioning driver input (required).
bv_src – the sequence to the encoder (required).
hfp_tgt – the sequence to the decoder (required).
src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).
tgt_key_padding_mask – the ByteTensor mask for tgt keys per batch (optional).
- Shape:
driver_src: (n_batch, d_dr).
bv_src: (n_batch, n_alt, d_bv).
src_key_padding_mask: (n_batch, n_alt).
- generate_square_subsequent_mask(sz: int) Tensor
Generate a square mask for the sequence. The masked positions are filled with float(‘-inf’). Unmasked positions are filled with float(0.0).
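As a rough illustration of the mask layout described above, the following NumPy sketch builds a square mask with 0.0 for unmasked positions and -inf for masked ones. It assumes the usual causal convention, in which each position may not attend to later positions, so the strict upper triangle is masked. The function name square_subsequent_mask is hypothetical, and the abcgan method itself returns a torch Tensor rather than a NumPy array.

```python
import numpy as np

def square_subsequent_mask(sz: int) -> np.ndarray:
    """Hypothetical NumPy analogue of a causal (subsequent) attention mask."""
    # Start from a matrix filled with -inf, then keep only the strict upper
    # triangle; np.triu zeroes everything on and below the diagonal.
    return np.triu(np.full((sz, sz), -np.inf), k=1)

mask = square_subsequent_mask(3)
```

Row i of the result is 0.0 up to and including column i and -inf afterwards, so adding the mask to attention scores removes the later positions from the softmax.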
abcgan.persist module
This module supports persistence of the generator and discriminator.
It saves two files: a parameters file and a configuration file.
It also supports persisting of multiple modules.
To be persistable in this way, the module must have a property containing a JSON-serializable input dictionary, available as mdl.input_args.
- abcgan.persist.fullname(inst)
- abcgan.persist.persist(generator, critic, name='gan', dir_path='/home/valentic/sandbox/atmosense/atmosense-abcgan/src/abcgan/models', train_conf=None)
Persists abcgan generator and critic modules.
Persists both input arguments and parameters.
- Parameters:
generator – torch.nn.Module module for the generator
critic – torch.nn.Module module for the critic
name – str, optional name of the saved configuration
dir_path – str, optional default is the models directory. None assumes file is in local directory.
train_conf – dict, optional default is None. dictionary of training parameters used
The generator, critic and any transformers passed in as arguments to these must be registered in persist.py and must have a parameter ‘input_args’ that specifies their input arguments as a dictionary
- abcgan.persist.recreate(name='gan', dir_path='/home/valentic/sandbox/atmosense/atmosense-abcgan/src/abcgan/models')
Load a pre-trained generator and discriminator.
- Parameters:
name (str, optional) – name of the configuration to load, as saved by persist. default: ‘gan’
dir_path (str, optional) – default is the models directory. None assumes file is in local directory.
- Returns:
generator (torch.nn.Module) – the loaded generator
critic (torch.nn.Module) – the loaded critic
Modules must have previously been saved. All modules are loaded on the CPU; they can subsequently be moved.
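The input_args convention described for this module can be sketched in plain Python. The example below is only an illustration of the idea, not the abcgan implementation: TinyModule, REGISTRY, save_config and load_config are hypothetical names, and only the configuration file is shown (the parameters file holding the weights is omitted).

```python
import json
import os
import tempfile

class TinyModule:
    """Hypothetical stand-in for a persistable module."""
    def __init__(self, n_layers=4, hidden_dim=128):
        # Record the constructor arguments so the object can be rebuilt later.
        self.input_args = {"n_layers": n_layers, "hidden_dim": hidden_dim}
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

# Hypothetical registry mapping class names to constructors.
REGISTRY = {"TinyModule": TinyModule}

def save_config(mdl, name, dir_path):
    """Write the class name and input_args of mdl to <name>.json."""
    conf = {"type": type(mdl).__name__, "args": mdl.input_args}
    path = os.path.join(dir_path, name + ".json")
    with open(path, "w") as f:
        json.dump(conf, f)
    return path

def load_config(name, dir_path):
    """Rebuild a registered module from its saved configuration."""
    with open(os.path.join(dir_path, name + ".json")) as f:
        conf = json.load(f)
    return REGISTRY[conf["type"]](**conf["args"])

with tempfile.TemporaryDirectory() as tmp:
    save_config(TinyModule(n_layers=2, hidden_dim=64), "gan", tmp)
    restored = load_config("gan", tmp)
```

Keeping the constructor arguments JSON-serializable is what lets a configuration file alone rebuild an object of the right type, after which saved parameters can be loaded into it.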
abcgan.transforms module
Transforms to and from z-scaled variables.
Uses numpy only (no pytorch)
- abcgan.transforms.compute_valid(bvs, bv_thresholds=array([[-1.00000000e+00, 2.88214929e+14], [1.00000000e+00, 1.89264686e+12], [-1.00000000e+00, 5.00000000e+05], [-1.00000000e+00, 4.39506857e+09], [-1.00000000e+00, 1.00247000e+05], [-1.00000000e+00, 8.45428636e+06], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03], [-2.00000000e+03, 2.00000000e+03], [1.00000000e-06, 2.00000000e+03]]))
Returns a mask which can be used to get rid of invalid background variable samples and outliers
- Parameters:
bvs (np.ndarray) – (n_samples x n_alt x n_bv)
bv_thresholds (np.ndarray) – Lower and upper bounds of each bv feature.
- Returns:
valid_mask – (n_samples,)
- Return type:
bool np.ndarray
- abcgan.transforms.compute_valid_hfp(hfps, hfp_thresholds=array([[0., 250.], [0., 8000.], [-8000., 0.], [0., 4000.], [-4000., 0.], [-180., 180.], [175., 275.], [180., 325.]]))
Returns a mask which can be used to get rid of invalid hfp waves and outliers
- Parameters:
hfps (np.ndarray) – (n_samples x n_waves x n_hfp_feat)
hfp_thresholds (np.ndarray) – Lower and upper bounds of each hfp feature.
- Returns:
valid_mask – (n_samples,)
- Return type:
bool np.ndarray
- abcgan.transforms.compute_valid_wtec(wtec: ndarray, wtec_thresholds: ndarray = array([[-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf], [-inf, inf]]))
Returns a mask which can be used to get rid of invalid tec waves and outliers
- Parameters:
wtec (np.ndarray) – (n_samples x n_waves x n_wtec)
wtec_thresholds (np.ndarray) – Lower and upper bounds of each wtec variable.
- Returns:
valid_mask – (n_samples,)
- Return type:
bool np.ndarray
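The three compute_valid functions above share the same contract: a 3-D input, a per-feature table of bounds, and a boolean mask of shape (n_samples,). A minimal NumPy sketch of that contract follows. The helper name threshold_mask is hypothetical, and the rule that a sample is valid only when every bin and every feature lies within bounds is an assumption, not something the documentation states.

```python
import numpy as np

def threshold_mask(x: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Hypothetical helper: True where every feature of a sample is in bounds.

    x          : (n_samples, n_bins, n_feat)
    thresholds : (n_feat, 2) rows of [lower, upper]
    """
    lower, upper = thresholds[:, 0], thresholds[:, 1]
    # Broadcast the per-feature bounds over samples and bins.
    in_bounds = (x >= lower) & (x <= upper)
    # Assumption: a sample is valid only if all bins and features pass.
    return in_bounds.all(axis=(1, 2))

x = np.array([[[1.0, 10.0]], [[5.0, 99.0]]])      # 2 samples, 1 bin, 2 features
thresholds = np.array([[0.0, 2.0], [0.0, 50.0]])  # per-feature [lower, upper]
valid = threshold_mask(x, thresholds)
```

In this sketch any NaN fails both comparisons, so samples containing NaN are marked invalid; how the abcgan functions treat NaN values is not specified here.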
- abcgan.transforms.decode(data, driver_names)
Encode variables, or just add extra dimension
- Parameters:
data (np.ndarray) – array of feature values.
driver_names (list: str) – list of driver names in data
- Returns:
enc – array of encoded variables
- Return type:
np.ndarray
- abcgan.transforms.encode(data, name)
Encode variables, or just add extra dimension
- Parameters:
data (np.ndarray) – array of variable values.
name (str) – name of the variable.
- Returns:
enc – array of encoded variables (with an extra dimension in all cases)
- Return type:
np.ndarray
- abcgan.transforms.encoded_driver_names(dr_names: list = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])
Gets the ordered list of driver feature names.
- Parameters:
dr_names (list) – array of driver names.
- Returns:
driver_feat_names – list of driver feature names
- Return type:
list
- abcgan.transforms.get_bv(bv_feat, bv_type='radar')
Invert featurization to recover bvs.
- Parameters:
bv_feat (np.ndarray) – n_samples x n_bv_feat
bv_type (str) – radar or lidar bvs
- Returns:
scaled_feat – n_samples x n_bv
- Return type:
np.ndarray
- abcgan.transforms.get_driver(driver_feat, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])
Invert featurization to recover driving parameters.
- Parameters:
driver_feat (np.ndarray) – n_samples x n_driver_feat
driver_names (list: str) – list driver names in driver_feat
- Returns:
original driver – n_samples x n_driver
- Return type:
np.ndarray
- abcgan.transforms.get_hfp(hfp_feat)
Invert featurization to recover hfp.
- Parameters:
hfp_feat (np.ndarray) – n_samples x n_hfp_feat
- Returns:
hfps – n_samples x n_hfp
- Return type:
np.ndarray
- abcgan.transforms.get_wtec(wtec_feat: ndarray, dataset_name: None | str = 'LSTIDs_Poker')
Invert featurization to recover tec waves.
- Parameters:
wtec_feat (np.ndarray) – n_samples x n_wtec_feat
dataset_name (str) – specify dataset type for z-scaling
- Returns:
wtec – n_samples x n_wtec
- Return type:
np.ndarray
- abcgan.transforms.scale_bv(bvs, bv_type='radar')
Return a scaled version of the bvs.
- Parameters:
bvs (np.ndarray) – n_samples x n_bv
bv_type (str) – string specifying whether to scale radar or lidar bvs
- Returns:
bv_feat – n_samples x n_bv_feat
- Return type:
np.ndarray
- abcgan.transforms.scale_driver(drivers: ndarray, driver_names: list = ['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])
Return a scaled version of the drivers.
- Parameters:
drivers (np.ndarray) – n_samples x n_driver
driver_names (list: str) – list of driver names
- Returns:
driver_feat – n_samples x n_driver_feat
- Return type:
np.ndarray
- abcgan.transforms.scale_hfp(hfps)
Return a scaled version of the hfps.
- Parameters:
hfps (np.ndarray) – n_samples x n_waves x n_hfp
- Returns:
hfp_feat – n_samples x n_waves x n_hfp_feat
- Return type:
np.ndarray
- abcgan.transforms.scale_wtec(wtec: ndarray, dataset_name: None | str = 'LSTIDs_Poker')
Return a scaled version of the tec waves.
- Parameters:
wtec (np.ndarray) – n_samples x n_wtec_waves x n_wtec
dataset_name (str) – specify dataset type for z-scaling
- Returns:
wtec_feat (np.ndarray) – n_samples x n_wtec_waves x n_wtec_feat
valid_mask (np.ndarray) – n_samples x 1
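The scale_* and get_* functions in this module come in pairs, one mapping measurements to z-scaled features and the other inverting it. The sketch below shows only the z-scaling step and its inverse with NumPy. MEAN and STD are made-up statistics and z_scale / z_unscale are hypothetical names; the actual transforms use constants stored in the package and may apply further steps that are not shown.

```python
import numpy as np

# Hypothetical per-feature statistics; abcgan keeps its own constants.
MEAN = np.array([10.0, 200.0])
STD = np.array([2.0, 50.0])

def z_scale(x):
    """Map raw values to zero-mean, unit-variance features."""
    return (x - MEAN) / STD

def z_unscale(feat):
    """Invert z_scale to recover raw values."""
    return feat * STD + MEAN

raw = np.array([[12.0, 150.0], [8.0, 250.0]])  # n_samples x n_feat
feat = z_scale(raw)
recovered = z_unscale(feat)
```

Because the statistics broadcast over the last axis, the same two functions work unchanged on arrays with extra leading dimensions such as n_samples x n_waves x n_feat.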
Module contents
- abcgan.discriminate(drivers, bvs, hfps=None, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'], bv_model='bv_gan', hfp_model='hfp_gan', bv_type='radar')
Score how well the measurements match with historical observations.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
driver_names (list) – list of names of driving parameters
bvs (np.ndarray) – n_samples x n_alt_in x n_meas input list of altitude measurements, n_alt_in should be less than max_alt.
hfps (np.ndarray, optional) – n_samples x n_wave x n_hfps input list of wave measurements.
bv_model (str, optional) – name of bv model to use
hfp_model (str, optional) – name of hfp model to use
bv_type (str, optional) – name of the type of background variables to use (lidar or radar)
- Returns:
scores – 1) n_samples x n_alt bv normalcy scores in the range [0, 1.0]. 2) n_samples hfp wave normalcy scores in the range [0, 1.0].
- Return type:
(np.ndarray, np.ndarray)
- abcgan.estimate_drivers(drivers, model='dr_gan')
Predict drivers 2 hours into the future using the driver GAN model. Used for real-time background predictions using drivers from 2 hours ago.
- Parameters:
drivers (np.ndarray) – n_samples x n_drivers input list of driving parameters (not z-scaled).
model (str, optional) – name of model to use
- Returns:
predicted_drivers – estimate of the driver features two hours after the input drivers
- Return type:
np.ndarray
- abcgan.stack_bvs(bv_dict, bv_type='radar')
Stacks background variables (bvs) in appropriate format.
This function is provided for convenience.
- Parameters:
bv_dict (dict) – Dictionary mapping names of background variables to numpy arrays with values for those bvs. Each array should have shape n_samples x n_altitudes. Can also use h5py.Group.
bv_type (str) – string specifying whether to stack radar or lidar data
Valid names for bvs can be found at abcgan.bv_names.
- Raises:
ValueError: – If the input shape of the bv dict values is not correct.
KeyError: – If one of the required bvs is missing.
- abcgan.stack_drivers(driver_dict, driver_names=['Ap', 'F10.7', 'F10.7avg', 'MEI', 'MLT', 'RMM1', 'RMM2', 'SLT', 'SZA', 'ShadHeight', 'T10', 'T100', 'T2', 'T30', 'T5', 'T70', 'TCI', 'U10', 'U100', 'U2', 'U30', 'U5', 'U70', 'ap', 'dst', 'moon_phase', 'moon_x', 'moon_y', 'moon_z'])
Stacks drivers in appropriate format.
This function is provided for convenience.
- Parameters:
driver_dict (dict) – Dictionary mapping names of drivers to the numpy arrays with values for those drivers. Each array has a single dimension of the same length n_samples. Can also use an h5py.Group.
driver_names (list) – names of the drivers to load
Valid names for drivers can be found at abcgan.driver_names.
- Raises:
ValueError: – If the driver values have the wrong type or shape.
KeyError: – If one of the required drivers is missing.
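The stacking behaviour described for stack_drivers can be approximated with a few lines of NumPy. This is a sketch under stated assumptions rather than the library code: stack_columns is a hypothetical helper, and the exact checks behind the documented ValueError and KeyError are guessed from the descriptions above.

```python
import numpy as np

def stack_columns(value_dict, names):
    """Hypothetical helper: stack 1-D arrays into (n_samples, n_names)."""
    columns = []
    for name in names:
        # Indexing raises KeyError if a required name is missing.
        col = np.asarray(value_dict[name])
        if col.ndim != 1:
            raise ValueError(f"{name} must be 1-D, got shape {col.shape}")
        columns.append(col)
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all arrays must share the same n_samples")
    # Place each named array in its own column, in the order given by names.
    return np.stack(columns, axis=-1)

drivers = stack_columns(
    {"Ap": [1.0, 2.0, 3.0], "dst": [-5.0, -7.0, -9.0]},
    ["Ap", "dst"],
)
```

The column order follows the names list rather than the dictionary order, so the result stays aligned with whatever driver ordering a downstream model expects.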