{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Spatial and temporal subsetting \n", "\n", "A common task in climate data analysis is subsetting files over a region of interest. Global model simulations and observations cover the entire globe, while impact analyses are often concerned with a single region. Instead of downloading the entire file to a local disk, it is often more practical to subset it on the server and download only the relevant part. \n", "\n", "This can be done in two ways: interactive analysis using OPeNDAP, or a WPS request to a subsetting process. Let's start with the most direct approach, OPeNDAP. The PAVICS THREDDS server provides two links for each file: a link to the file itself, which downloads the file locally when accessed, and a *dodsC* link, which supports the OPeNDAP protocol. We'll use the *dodsC* link and simply pass it to our netCDF library, here `xarray`. \n", "\n", "## Subsetting with OPeNDAP" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import xarray as xr\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "# The dodsC link for the test file\n", "dap = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/'\n", "ncfile = 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc'\n", "\n", "# Here we open the file and subset it using xarray fonctionality, which communicates directly with \n", "# the OPeNDAP server to retrieve only the data needed. \n", "ds = xr.open_dataset(dap+ncfile)\n", "tas = ds.tasmax\n", "subtas = tas.sel(time=slice('2006-01-01', '2006-03-01'), lon=slice(188,330), lat=slice(6, 70))\n", "subtas.isel(time=0).plot()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Subset processes with WPS and FlyingPigeon\n", "\n", "PAVICS offers a number of subsetting processes through the FlyingPigeon WPS server:\n", " - subset_continents\n", " - subset_countries\n", " - subset_bbox\n", " - subset_wfs\n", " - subset\n", " \n", "The `subset_continents` and `subset_countries` use a predefined list of polygons for the subsetting. The `subset_bbox` takes the geographical coordinates of the two opposite corner of a rectangle to define the subset region, while both `subset_wfs` and `subset` use a polygon defined on a remote geoserver, identified by a typename and a feature id. The only difference between those two is that `subset` also does temporal subsetting. \n", "\n", "The first step to launch those services is to create a connexion to the WPS server using Birdy's `WPSClient`. 
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from birdy import WPSClient\n", "url = 'https://pavics.ouranos.ca/twitcher/ows/proxy/flyingpigeon/wps'\n", "fp = WPSClient(url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll use `fp.subset_continents`, so let's first check what arguments it expects and pass those to the function." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method subset_continents in module birdy.client.base:\n", "\n", "subset_continents(region='Africa', mosaic=None, resource=None) method of birdy.client.base.WPSClient instance\n", " Return the data whose grid cells intersect the selected continents for each input dataset.\n", " \n", " Parameters\n", " ----------\n", " region : {'Africa', 'Asia', 'Australia', 'North America', 'Oceania', 'South America', 'Antarctica', 'Europe'}string\n", " Continent name.\n", " mosaic : boolean\n", " If True, selected regions will be merged into a single geometry.\n", " resource : ComplexData:mimetype:`application/x-netcdf`, :mimetype:`application/x-tar`, :mimetype:`application/zip`\n", " NetCDF Files or archive (tar/zip) containing netCDF files.\n", " \n", " Returns\n", " -------\n", " output : ComplexData:mimetype:`application/x-tar`\n", " Tar archive of the subsetted netCDF files.\n", " ncout : ComplexData:mimetype:`application/x-netcdf`\n", " NetCDF file with subset for one dataset.\n", " output_log : ComplexData:mimetype:`text/plain`\n", " Collected logs during process run.\n", "\n" ] } ], "source": [ "help(fp.subset_continents)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "thredds = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/'\n", "ncfile = 'birdhouse/testdata/flyingpigeon/cmip5/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc'\n", "resp = 
fp.subset_continents(resource=thredds+ncfile, region='Africa')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The response we're getting can either include the data itself or a reference to the data. Using the `get` method of the response object, we'll get what was included in the response. If the response holds only a reference (link) to the output, we can retrieve the output itself by calling `get(asobj=True)`. Birdy will then inspect the file format of each output and try to find the appropriate way to open the file and return a Python object. A warning is issued if no converter is found, in which case the original reference is returned." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/david/src/birdy/birdy/client/outputs.py:65: UserWarning: No converter was found for mime type: application/x-tar\n", " warnings.warn(UserWarning(\"No converter was found for mime type: {}\".format(output.mimeType)))\n" ] } ], "source": [ "resp.get()\n", "tar_out, nc_out, log = resp.get(asobj=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll open the netCDF dataset using xarray and plot the result. Note that since `nc_out` is an already opened `netCDF4.Dataset`, we wrap it in `xr.backends.NetCDF4DataStore` before passing it to `xr.open_dataset`. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import xarray as xr\n", "ds = xr.open_dataset(xr.backends.NetCDF4DataStore(nc_out))\n", "ds.tasmax.isel(time=0).plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }