Spatial and temporal subsetting

A common task in climate data analysis is subsetting files over a region of interest. Global model simulations and observations cover the entire globe, while impact analyses are often concerned with a region. Instead of downloading the entire file on a local disk, it is often more practical to subset it on the server and only download the relevant part.

This can be done through two ways: interactive analysis using OPeNDAP, or a WPS request for a subsetter. Let’s start with the most direct approach with OPeNDAP. The PAVICS THREDDS server provides two links for each file, a link to the file itself which will download the file locally when accessed, and a dodsC link which supports the OPeNDAP protocol. We’ll use this link and simply pass it to our netCDF library, here xarray.

Subsetting with OPeNDAP

In [1]:
%matplotlib inline
import xarray as xr
import numpy as np
from matplotlib import pyplot as plt
# The dodsC link for the test file
dap = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/'
ncfile = 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc'

# Here we open the file and subset it using xarray fonctionality, which communicates directly with
# the OPeNDAP server to retrieve only the data needed.
ds = xr.open_dataset(dap+ncfile)
tas = ds.tasmax
subtas = tas.sel(time=slice('2006-01-01', '2006-03-01'), lon=slice(188,330), lat=slice(6, 70))
subtas.isel(time=0).plot()
plt.show()
../_images/notebooks_subsetting_1_0.png

Subset processes with WPS and FlyingPigeon

PAVICS offers a number of subsetting processes through the FlyingPigeon WPS server: - subset_continents - subset_countries - subset_bbox - subset_wfs - subset

The subset_continents and subset_countries use a predefined list of polygons for the subsetting. The subset_bbox takes the geographical coordinates of the two opposite corner of a rectangle to define the subset region, while both subset_wfs and subset use a polygon defined on a remote geoserver, identified by a typename and a feature id. The only difference between those two is that subset also does temporal subsetting.

The first step to launch those services is to create a connexion to the WPS server using Birdy’s WPSClient.

In [2]:
from birdy import WPSClient
url = 'https://pavics.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps'
fp = WPSClient(url)

Now we’ll use fp.subset_continents, so let’s first check what arguments it expects and pass those to the function.

In [3]:
help(fp.subset_continents)
Help on method subset_continents in module birdy.client.base:

subset_continents(region='Africa', mosaic=None, resource=None) method of birdy.client.base.WPSClient instance
    Return the data whose grid cells intersect the selected continents for each input dataset.

    Parameters
    ----------
    region : {'Africa', 'Asia', 'Australia', 'North America', 'Oceania', 'South America', 'Antarctica', 'Europe'}string
        Continent name.
    mosaic : boolean
        If True, selected regions will be merged into a single geometry.
    resource : ComplexData:mimetype:`application/x-netcdf`, :mimetype:`application/x-tar`, :mimetype:`application/zip`
        NetCDF Files or archive (tar/zip) containing netCDF files.

    Returns
    -------
    output : ComplexData:mimetype:`application/x-tar`
        Tar archive of the subsetted netCDF files.
    ncout : ComplexData:mimetype:`application/x-netcdf`
        NetCDF file with subset for one dataset.
    output_log : ComplexData:mimetype:`text/plain`
        Collected logs during process run.

In [4]:
thredds = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/'
ncfile = 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc'
resp = fp.subset_continents(resource=thredds+ncfile, region='Africa')

The response we’re getting can either include the data itself or a reference to the data. Using the get method of the response object, we’ll get what was included in the response. If the response holds only a reference (link) to the output, we can retrieve it using the get(as_obj=True) method. Birdy will then inspect the file format of each output and try to find the appropriate way to open the file and return a Python object. A warning is issued if no converter is found, in which case the original reference is returned.

In [5]:
resp.get()
tar_out, nc_out, log = resp.get(asobj=True)
/home/david/src/birdy/birdy/client/outputs.py:65: UserWarning: No converter was found for mime type: application/x-tar
  warnings.warn(UserWarning("No converter was found for mime type: {}".format(output.mimeType)))

Next, we’ll open the netCDF dataset using xarray and plot the result. Note that since nc_out is an already opened netcdf4.Dataset, we’re using the xr.backends.NetCDF4DataStore function to open the dataset.

In [6]:
import xarray as xr
ds = xr.open_dataset(xr.backends.NetCDF4DataStore(nc_out))
ds.tasmax.isel(time=0).plot()
Out[6]:
<matplotlib.collections.QuadMesh at 0x7fd774d672b0>
../_images/notebooks_subsetting_10_1.png
In [ ]: