Advanced options to customize Counterfactual Explanations

Here we discuss a few ways to change DiCE’s behavior.

  • Loading a custom ML model

  • Changing the feature weights that decide the relative importance of features in perturbation

  • Trading off between proximity and diversity goals

  • Selecting the features to change

[1]:
from numpy.random import seed

# import DiCE
import dice_ml
from dice_ml.utils import helpers  # helper functions

# Tensorflow libraries
import tensorflow as tf

# suppress deprecation warnings from TF
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
[2]:
%load_ext autoreload
%autoreload 2

Loading dataset

We use the “adult” income dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/adult). For demonstration purposes, we transform the data as detailed in the dice_ml.utils.helpers module.

[3]:
dataset = helpers.load_adult_income_dataset()
dataset.head()
[3]:
age workclass education marital_status occupation race gender hours_per_week income
0 28 Private Bachelors Single White-Collar White Female 60 0
1 30 Self-Employed Assoc Married Professional White Male 65 1
2 32 Private Some-college Married White-Collar White Male 50 0
3 20 Private Some-college Single Service White Female 35 0
4 41 Self-Employed Some-college Married White-Collar White Male 50 0
[4]:
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')

1. Loading a custom ML model

Below, we use an artificial neural network built with the TensorFlow framework.

[5]:
# seeding random numbers for reproducibility
seed(1)
# from tensorflow import set_random_seed; set_random_seed(2) # for tf1
tf.random.set_seed(1)
[6]:
# 'TF1' or 'TF2', depending on the major version of the installed TensorFlow
backend = 'TF'+tf.__version__[0]
# locate the pre-trained ML model that ships with DiCE for this backend
ML_modelpath = helpers.get_adult_income_modelpath(backend=backend)
# provide the trained ML model to DiCE's model object
m = dice_ml.Model(model_path=ML_modelpath, backend=backend)

Generate diverse counterfactuals

[7]:
# initiate DiCE
exp = dice_ml.Dice(d, m)
[8]:
import pandas as pd

# query instance as a single-row DataFrame; columns: feature names, values: feature values
# (passing a plain dict raises AttributeError: 'str' object has no attribute 'columns')
query_instance = pd.DataFrame({'age': 22,
                               'workclass': 'Private',
                               'education': 'HS-grad',
                               'marital_status': 'Single',
                               'occupation': 'Service',
                               'race': 'White',
                               'gender': 'Female',
                               'hours_per_week': 45}, index=[0])

We now generate counterfactuals for this input. This may take some time to run, because the optimization takes longer with TensorFlow 2.

[9]:
# generate counterfactuals
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite")
[10]:
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)

2. Changing feature weights

Some features may be harder to change than others (e.g., education level is harder to change than working hours per week). DiCE lets you express the relative difficulty of changing a feature by specifying feature weights. A higher feature weight means that the feature is harder to change than the others. For instance, one option is to use the median absolute deviation as a measure of the relative difficulty of changing a continuous feature.

The Median Absolute Deviation (MAD) of a continuous feature conveys the variability of the feature, and it is more robust than the standard deviation because it is less affected by outliers and non-normality. A feature with a small MAD rarely deviates far from its median, so the inverse of the MAD is used as the feature weight in our optimization to reflect the difficulty of changing a continuous feature. By default, DiCE computes this internally and divides the distance between continuous features by the MAD of the feature’s values in the training set. Let’s see what their values are by computing them below:

[11]:
# get MAD
mads = d.get_mads(normalized=True)

# create feature weights
feature_weights = {}
for feature in mads:
    feature_weights[feature] = round(1/mads[feature], 2)
print(feature_weights)
{'age': 7.3, 'hours_per_week': 24.5}

The above feature weights encode that changing age is approximately seven times more difficult than changing categorical variables, and that changing hours_per_week is approximately three times more difficult than changing age. Of course, this may sound odd, since a person cannot change their age. What the weights actually reflect is that age values are more spread out than hours_per_week values. Below we show how to override these weights with custom user-defined weights.

Now, let’s assign unit weights to the continuous features and see how this affects the counterfactual generation. DiCE allows this through the feature_weights parameter.
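To make the inverse-MAD weighting concrete, here is a minimal standard-library sketch. It is not DiCE's implementation: the helper names (`min_max_scale`, `median_absolute_deviation`, `inverse_mad_weight`) and the toy samples are hypothetical, and the assumption that values are min-max scaled before taking the MAD is ours.

```python
from statistics import median


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def inverse_mad_weight(values):
    """Feature weight as 1 / MAD of the min-max-scaled values."""
    mad = median_absolute_deviation(min_max_scale(values))
    if mad == 0:
        raise ValueError("MAD is zero; the inverse weight is undefined")
    return round(1 / mad, 2)


# Hypothetical toy samples, not taken from the adult income dataset.
toy_ages = [17, 22, 25, 31, 38, 45, 52, 90]
toy_hours = [1, 38, 40, 40, 40, 40, 45, 99]

toy_weights = {
    "age": inverse_mad_weight(toy_ages),
    "hours_per_week": inverse_mad_weight(toy_hours),
}
print(toy_weights)
```

In this toy data the hours values cluster tightly around 40, so their MAD is small and the resulting weight is larger than the one for age, which mirrors the ordering of the weights printed by the notebook cell.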

[12]:
# assigning equal weights
feature_weights = {'age': 1, 'hours_per_week': 1}
[13]:
# generate counterfactuals
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite",
                                        feature_weights=feature_weights)
[14]:
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)

Note that we scale continuous features and one-hot-encode categorical features so that all values fall between 0 and 1, which puts the features on a comparable scale. However, this also makes continuous features relatively easier to change than categorical features when the number of continuous features is much smaller than the total number of categories across all categorical variables. This is reflected in the table above, where the continuous features (age and hours_per_week) have been pushed to their extreme values (range of age: [17, 90]; range of hours_per_week: [1, 99]) in most of the counterfactuals. This is why the distances are divided by a scaling factor by default. Deviation from the median provides a robust measure of the variability of a feature’s values, and thus dividing by the MAD allows us to capture the relative prevalence of observing the feature at a particular value (see our paper for more details).
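The effect of the scaling factor can be illustrated with a small sketch. This is a simplified stand-in for the continuous part of a proximity term, not DiCE's actual loss: the function `weighted_continuous_distance` and the two candidate counterfactuals are hypothetical, while the feature ranges and the inverse-MAD weights are the ones quoted in this notebook.

```python
def weighted_continuous_distance(query, counterfactual, ranges, weights):
    """Mean of weight * |change| over continuous features, with each
    change expressed as a fraction of the feature's range."""
    total = 0.0
    for name, weight in weights.items():
        lo, hi = ranges[name]
        normalized_gap = abs(counterfactual[name] - query[name]) / (hi - lo)
        total += weight * normalized_gap
    return total / len(weights)


# Feature ranges and inverse-MAD weights as quoted in the text above.
ranges = {"age": (17, 90), "hours_per_week": (1, 99)}
unit_weights = {"age": 1, "hours_per_week": 1}
mad_weights = {"age": 7.3, "hours_per_week": 24.5}

query = {"age": 22, "hours_per_week": 45}
# Hypothetical candidates: one changes only age, the other only hours.
older = {"age": 60, "hours_per_week": 45}
longer_hours = {"age": 22, "hours_per_week": 80}

for label, weights in [("unit", unit_weights), ("inverse MAD", mad_weights)]:
    cost_older = weighted_continuous_distance(query, older, ranges, weights)
    cost_hours = weighted_continuous_distance(query, longer_hours, ranges, weights)
    print(f"{label}: older={cost_older:.3f}, longer_hours={cost_hours:.3f}")
```

With unit weights the change in working hours is the cheaper of the two candidates. With the inverse-MAD weights the ordering flips, because hours_per_week carries the larger weight.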

3. Trading off between proximity and diversity goals

We acknowledge that not all counterfactual explanations may be feasible for a user. In general, counterfactuals closer to an individual’s profile will be more feasible. Diversity is also important, because it helps an individual choose between multiple possible options. DiCE exposes two tunable parameters, proximity_weight (default: 0.5) and diversity_weight (default: 1.0), to control proximity and diversity respectively. Below, we increase the proximity weight and see how the counterfactuals change.

[15]:
# change proximity_weight from default value of 0.5 to 1.5
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite",
                                        proximity_weight=1.5, diversity_weight=1.0)
[16]:
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)

As we can see from the table above, both the continuous and the categorical features are closer to the original query instance, and the counterfactuals are also less diverse than before.
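The trade-off discussed in this section can be sketched with a toy objective. This is only an illustration under stated assumptions: DiCE's actual diversity term is defined differently (see the DiCE paper), and mean pairwise L1 distance is used here as a simple stand-in. The function names and the two candidate sets are hypothetical. Only the parameter names proximity_weight and diversity_weight and their defaults come from the text above.

```python
from itertools import combinations


def l1_distance(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def proximity_cost(query, candidates):
    """Mean L1 distance from the candidates to the query (lower is closer)."""
    return sum(l1_distance(query, c) for c in candidates) / len(candidates)


def diversity_gain(candidates):
    """Mean pairwise L1 distance among candidates (higher is more diverse)."""
    pairs = list(combinations(candidates, 2))
    return sum(l1_distance(a, b) for a, b in pairs) / len(pairs)


def toy_objective(query, candidates, proximity_weight=0.5, diversity_weight=1.0):
    """Toy score to minimize: reward diversity, penalize distance to the query."""
    return (proximity_weight * proximity_cost(query, candidates)
            - diversity_weight * diversity_gain(candidates))


# Hypothetical points in a normalized (age, hours_per_week) space.
query = (0.07, 0.45)
tight_set = [(0.10, 0.50), (0.12, 0.48), (0.09, 0.55)]
spread_set = [(0.80, 0.60), (0.30, 0.95), (0.60, 0.10)]

for pw in (0.5, 1.5):
    tight = toy_objective(query, tight_set, proximity_weight=pw)
    spread = toy_objective(query, spread_set, proximity_weight=pw)
    preferred = "tight_set" if tight < spread else "spread_set"
    print(f"proximity_weight={pw}: preferred={preferred}")
```

With the default proximity weight the spread-out set scores better, and raising the weight to 1.5 makes the set that stays near the query win. This matches the qualitative behavior described above: closer but less diverse counterfactuals.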
