import nd
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np
import calendar
plt.rcParams['figure.figsize'] = (12, 8)
For this example we are using a subset of the GHRSST sea surface temperature data set (https://www.ghrsst.org/ghrsst-data-services/products/).
The provided subset contains the monthly mean temperature in the region of the gulf stream for the year 2019.
We can read the NetCDF file using xarray
and convert the temperature to degrees Celsius:
ds = xr.open_dataset('ghrsst_gulf_2019.nc')
sst_celsius = ds['analysed_sst'] - 273.15
We can create a simple visualization of the data for January using built-in plotting methods in xarray
:
sst_celsius.isel(time=0).plot(cmap='plasma', vmin=-10, vmax=30);
mask = ~sst_celsius.isnull().all('time')
We will now be looking at more advanced visualization methods using the nd
library.
In this example, we are going to create a video from the time series data.
mask = (ds['mask'] != 2).any('time')
sst_celsius.nd.to_video('sst.gif', fps=6, fontcolor=(255, 255, 255), cmap='plasma', mask=mask, width=600)
We will now demonstrate how to use scikit-learn
classifiers with xarray
data sets,
with the help of an nd.classify.Classifier
.
As we have no labels for our data, we are going to do a simple clustering using KMeans.
from sklearn.cluster import KMeans
import nd.classify
As we want the clustering to be time invariant we need to specify that 'time' is a feature dimension. This means that the full time series will be used to derive the clusters, rather than considering each time step individually.
clf = nd.classify.Classifier(
KMeans(n_clusters=3),
feature_dims=['time']
)
clf = clf.fit(sst_celsius)
pred = clf.predict(sst_celsius).transpose('y', 'x')
We can visualize the clustering result by color coding the cluster labels:
im = pred.nd.to_rgb(cmap='cool', mask=~pred.isnull())
plt.imshow(im);
The library makes it easy to reproject an entire xarray data set to a new coordinate reference system. rasterio
is used in the backend.
ds_proj = ds.nd.reproject(src_crs='epsg:4326', dst_crs='epsg:2163')
The following shows a slice of the original data (in EPSG:4326) alongside the reprojected data set (EPSG:2163).
fig, ax = plt.subplots(1, 2, figsize=(20, 8))
ds['analysed_sst'].isel(time=0).plot(ax=ax[0])
ax[0].set_title('EPSG:4326')
ds_proj['analysed_sst'].isel(time=0).plot(ax=ax[1])
ax[1].set_title('EPSG:2163');
We will now demonstrate how to apply a pixelwise function over an entire dataset using the apply
method provided by nd
.
In this example, we use np.argmax()
to find the hottest month at each pixel.
hottest = sst_celsius.nd.apply(np.argmax, signature='(time)->()')
plt.figure(figsize=(16, 8))
plt.imshow(hottest.where(mask), cmap=plt.cm.get_cmap('gist_ncar', 12), vmin=-.5, vmax=11.5)
months = np.arange(12)
month_names = map(calendar.month_name.__getitem__, months + 1)
cbar = plt.colorbar(ticks=months)
cbar.ax.set_yticklabels(month_names);
While this example is quite trivial and could be solved without using nd.apply
, the method is very flexible and can efficiently map functions of any signature over arbitrary dimensions.