Using R to Access Data

Downloading Data From CanWIN (CKAN)

The scripts below demonstrate a few ways to download a csv into your R environment or directly onto your hard drive using the ckanr package. Only csv, xls, xlsx, xml, html, json, shp, geojson, and txt files can be downloaded with these scripts. See pages 15-17 of the ckanr documentation for details on fetching formats other than csv.

In CanWIN we allow users to download a dataset as a compressed folder (.zip); however, these folders cannot be downloaded using R. Refer to the Downloading Data help page for information on how to download data packages.

Loading the Library and Setting the Working Environment

By default, ckanr points to a CKAN site other than CanWIN, so before your code can extract data from CanWIN you need to set it to the correct server. There are many CKAN servers you could extract data from, so it is important to make sure your script is directed to the right one.
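A minimal sketch of the setup step, assuming ckanr is installed from CRAN. `ckanr_setup()` changes the default server for the rest of the session and `ckanr_settings()` prints what is currently set. The CanWIN URL shown is our assumption of the site's CKAN root, so confirm it against the address in your browser.

```r
# install.packages("ckanr")  # run once if ckanr is not yet installed
library(ckanr)

# Assumed root URL of the CanWIN CKAN site; confirm against your browser
canwin_url <- "https://canwin-datahub.ad.umanitoba.ca/data/"

# Point all later ckanr calls at CanWIN instead of ckanr's default server
ckanr_setup(url = canwin_url)

# Print the active settings to confirm the URL was changed
ckanr_settings()
```

Setting the server once with `ckanr_setup()` means the later calls do not each need a `url =` argument.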

Viewing Data Categories

Listing the Theme, Dataset, and Keyword categories used in CanWIN can help you explore the available datasets. In ckanr these categories are called groups, packages, and tags.
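The three category lists can be pulled with ckanr's `group_list()`, `package_list()`, and `tag_list()`. This sketch repeats the server setup so it runs on its own; the URL is an assumed CanWIN address, and the `limit` of 50 is an arbitrary choice (ckanr returns only a small page of names by default).

```r
library(ckanr)

# Assumed CanWIN CKAN root URL; confirm before use
ckanr_setup(url = "https://canwin-datahub.ad.umanitoba.ca/data/")

# Themes are "groups" in CKAN/ckanr
themes <- group_list(as = "table")

# Datasets are "packages"; limit controls how many names come back per call
datasets <- package_list(limit = 50, as = "table")

# Keywords are "tags"
keywords <- tag_list(as = "table")

print(themes)
head(datasets)
head(keywords)
```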

Importing a Dataset File into RStudio

Most of the datasets on CanWIN are csv files. If the resource on a dataset page is a compressed folder, you’ll need to download it directly from the site. Importing a csv into R requires the resource ID of the file you want to access, which can be found in the metadata section at the bottom of the dataset page. Here we present an example using the dplyr package, part of the tidyverse.
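A sketch of importing a csv resource into R. `resource_show()` looks up the resource by ID, and its `url` field is passed to `ckan_fetch()`, which reads a csv into a data frame; dplyr's `glimpse()` then summarizes the columns. The resource ID is a hypothetical placeholder to replace with the one from the dataset page, and the server URL is assumed.

```r
library(ckanr)
library(dplyr)

# Assumed CanWIN CKAN root URL; confirm before use
ckanr_setup(url = "https://canwin-datahub.ad.umanitoba.ca/data/")

# Hypothetical placeholder: paste the resource ID from the metadata section
resource_id <- "PASTE-RESOURCE-ID-HERE"

# Look up the resource metadata; the url field is the direct download link
res <- resource_show(id = resource_id, as = "table")

# Read the csv into the R session as a data frame
df <- ckan_fetch(res$url)

# Explore with dplyr: a column overview, then the first rows as a tibble
glimpse(df)
df %>% as_tibble() %>% slice_head(n = 5)
```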

Downloading a Dataset Directly to Hard Drive

Downloading a dataset from CanWIN onto your hard drive ("disk" in ckanr) requires the direct URL of the resource you want. ckanr provides a function, resource_show(), that needs only the resource ID to look up this URL for you, which you can then pass to ckan_fetch(). Set your working directory to the folder where you want the dataset saved (this is generally done at the beginning of the script). The example below uses our 2016 Lake Waterhen Ecotriplet dataset.
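A sketch of saving a resource to disk with `ckan_fetch(..., store = "disk")`. Replace the placeholder with the resource ID of the 2016 Lake Waterhen Ecotriplet file (not reproduced here); the output file name `lake_waterhen_2016.csv` is hypothetical and the server URL is assumed.

```r
library(ckanr)

# Assumed CanWIN CKAN root URL; confirm before use
ckanr_setup(url = "https://canwin-datahub.ad.umanitoba.ca/data/")

# Files are saved relative to the working directory; change it first if needed
# setwd("path/to/your/folder")

# Hypothetical placeholder: resource ID of the 2016 Lake Waterhen Ecotriplet csv
resource_id <- "PASTE-RESOURCE-ID-HERE"
res <- resource_show(id = resource_id, as = "table")

# Hypothetical name for the local copy
out_file <- file.path(getwd(), "lake_waterhen_2016.csv")

# store = "disk" writes the file to `path` instead of reading it into R
ckan_fetch(res$url, store = "disk", path = out_file)

# Confirm the file was written
file.exists(out_file)
```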

Downloading Data From ERDDAP

As you become familiar with CanWIN datasets, you will notice that some of our data is linked from other platforms. One of these is the CanWIN ERDDAP server, where we host larger datasets with open access. To access CanWIN data hosted on ERDDAP, you can use the R package rerddap. As with ckanr, you must point your coding environment at the correct server: rerddap works with many ERDDAP servers and defaults to the NOAA GEO-IDE UAF ERDDAP server.
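A minimal sketch of pointing rerddap at CanWIN, assuming rerddap is installed from CRAN. `eurl()` reports the server rerddap currently uses, `servers()` lists known ERDDAP servers, and setting the `RERDDAP_DEFAULT_URL` environment variable changes the session default; passing `url =` to individual functions also works. The CanWIN ERDDAP address is our assumption, so confirm it in your browser.

```r
# install.packages("rerddap")  # run once if rerddap is not yet installed
library(rerddap)

# Assumed address of the CanWIN ERDDAP server (keep the trailing slash)
canwin_erddap <- "https://canwinerddap.ad.umanitoba.ca/erddap/"

# The server rerddap uses when no url is given
eurl()

# Browse the ERDDAP servers rerddap knows about
head(servers())

# Make CanWIN the default for this session (or pass url = canwin_erddap to each call)
Sys.setenv(RERDDAP_DEFAULT_URL = canwin_erddap)
eurl()
```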

Searching and Filtering Datasets

Once you have set up your environment and decided which dataset type (tabledap or griddap) you want to access, you can run more specific searches and queries of ERDDAP datasets. For more advanced search options, see page 12 of the rerddap package documentation.
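A sketch of a keyword search. `ed_search()` runs ERDDAP's full-text search and returns the matches in its `info` element, with each dataset's title and `dataset_id`. The search term and server URL are assumptions for illustration.

```r
library(rerddap)

# Assumed address of the CanWIN ERDDAP server
canwin_erddap <- "https://canwinerddap.ad.umanitoba.ca/erddap/"

# Full-text search restricted to tabledap datasets; "nutrient" is an example term
hits <- ed_search(query = "nutrient", which = "tabledap", url = canwin_erddap)

# One row per match: dataset title and dataset_id
hits$info
```

Use `which = "griddap"` instead to search gridded datasets.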

Importing a Dataset File into RStudio

To retrieve a dataset you have found, you will need the dataset ID it was given when it was uploaded to the ERDDAP server. You can copy and paste it from our online ERDDAP dataset table or filter the dataset IDs in R. The example below uses our Greenedge nutrient dataset, which is a tabledap dataset.
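A sketch of loading a tabledap dataset into R. `info()` fetches the dataset's metadata (variables and units) and `tabledap()` downloads the table; `store = memory()` keeps it in the session as a data frame. The dataset ID is a hypothetical placeholder for the Greenedge nutrient dataset ID, and the server URL is assumed.

```r
library(rerddap)

# Assumed address of the CanWIN ERDDAP server
canwin_erddap <- "https://canwinerddap.ad.umanitoba.ca/erddap/"

# Hypothetical placeholder: paste the Greenedge nutrient dataset ID here
dataset_id <- "PASTE-DATASET-ID-HERE"

# Fetch metadata: variable names, units, and dataset type
meta <- info(dataset_id, url = canwin_erddap)
meta

# Download the table into the R session as a data frame
nutrients <- tabledap(meta, url = canwin_erddap, store = memory())
head(nutrients)
```

`tabledap()` also accepts constraint strings such as `'time>=2016-06-01'` and a `fields` argument to narrow the request; the variable names to use are listed in the `info()` output.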

Downloading a Dataset Directly to Hard Drive

As with ckanr, rerddap has a disk() function that lets you specify the folder on your hard drive where your filtered or modified dataset is saved. Alternatively, memory() stores the data in your RStudio environment as a data frame. To save to disk, pass disk() to the function's store argument and, if you wish, specify the path. After the function runs, the path to the saved file is printed in your console along with other information.
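A sketch of saving an ERDDAP download to a folder of your choice. `disk()` takes a directory and rerddap names the file itself; printing the result shows where the file was written. The folder name `erddap_downloads` and the dataset ID are hypothetical placeholders, and the server URL is assumed.

```r
library(rerddap)

# Assumed address of the CanWIN ERDDAP server
canwin_erddap <- "https://canwinerddap.ad.umanitoba.ca/erddap/"

# Hypothetical placeholder: paste the dataset ID here
dataset_id <- "PASTE-DATASET-ID-HERE"

# Hypothetical folder (not a file) for the download; create it if missing
save_dir <- file.path(getwd(), "erddap_downloads")
dir.create(save_dir, showWarnings = FALSE, recursive = TRUE)

# store = disk(path) writes the file under save_dir and still returns the data
out <- tabledap(dataset_id, url = canwin_erddap, store = disk(path = save_dir))

# Printing shows the saved file's path along with a summary of the data
out

# Files now present in the download folder
list.files(save_dir)
```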

You can also use ERDDAP with other packages and programming languages. See the Awesome-ERDDAP GitHub page, or our External Software Help page for a summary of available resources.