Prepare the Data and Metadata

Prepare the Data

After data collection, we recommend storing the data in a secure location and keeping at least two copies that are synced regularly. CanWIN provides a GitLab repository for working data storage, that is, for data that are still being processed before being shared publicly. Where Git is not feasible (for example, if the files are too large), we also have an FTP server where data can be stored. The main purpose of data storage is to ensure the safety (protection from physical loss) and security (protection from unauthorized access) of the data.
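One way to confirm that two copies really are in sync is to compare them by checksum. The sketch below compares a primary directory tree with a backup using SHA-256 digests. It is a minimal illustration under our own assumptions (two local directory trees, Python 3.9 or later); it is not a CanWIN tool, and it only checks the copies rather than syncing them.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(primary: Path, backup: Path) -> dict[str, list[str]]:
    """Compare two directory trees file by file.

    Returns the relative paths that are missing from the backup, missing
    from the primary copy, or present in both but with different contents.
    """

    def relative_files(root: Path) -> set[str]:
        # Collect every regular file below root as a POSIX-style relative path.
        return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

    primary_files = relative_files(primary)
    backup_files = relative_files(backup)
    changed = sorted(
        rel
        for rel in primary_files & backup_files
        if sha256_of(primary / rel) != sha256_of(backup / rel)
    )
    return {
        "missing_from_backup": sorted(primary_files - backup_files),
        "missing_from_primary": sorted(backup_files - primary_files),
        "content_differs": changed,
    }
```

For example, `compare_copies(Path("data_primary"), Path("data_backup"))`, where both directory names are hypothetical, returns three sorted lists; if all three are empty, the two copies hold identical files.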

Note: CanWIN offers both storage (our GitLab repository and FTP server, for working and archival data) and an open-access repository for publishing processed data.

Because working and archival data can be kept in CanWIN's storage, we do not recommend submitting raw, unprocessed files to CanWIN for open-access publishing. Where possible, files should be analysis-ready, which supports the reuse of the data. The data files should also follow structure and formatting best practices where possible.

Steps we recommend:

  • Follow the best-practice recommendations for your data files. See our primer for a summary of these best practices, or our data best practices page for more detail.
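The best practices themselves are described on the pages mentioned above. As an illustration only, the sketch below checks a CSV file against a few widely used conventions for analysis-ready tables: a single header row, unique and non-blank column names, and the same number of fields in every row. The function name and the choice of checks are our assumptions, not CanWIN's official list.

```python
import csv
from pathlib import Path


def check_csv_structure(path: Path) -> list[str]:
    """Return human-readable problems found in a CSV file (empty list if none).

    Assumed conventions: one header row with unique, non-blank column
    names, and the same number of fields in every data row.
    """
    problems: list[str] = []
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        names = [name.strip() for name in header]
        if any(not name for name in names):
            problems.append("header contains a blank column name")
        duplicates = sorted({n for n in names if n and names.count(n) > 1})
        if duplicates:
            problems.append(f"duplicate column names: {', '.join(duplicates)}")
        for row in reader:
            # reader.line_num is the physical line just read, which stays
            # accurate even when quoted fields span several lines.
            if len(row) != len(header):
                problems.append(
                    f"line {reader.line_num} has {len(row)} fields, expected {len(header)}"
                )
    return problems
```

Blank lines inside the file are reported as well, because `csv.reader` returns them as rows with zero fields.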

Prepare the Metadata

Metadata is as important as the data itself when publishing in an open-access data repository. Metadata provides context and makes the research and data easier to understand. See our metadata primer for a quick summary of why metadata is so essential.

Steps we recommend:

  • Have all the mandatory and applicable recommended metadata ready to submit. See the list of metadata that CanWIN collects.
  • Record the steps used to process the data at each stage: from raw, to intermediate, to the data's final processed state. A CanWIN curator will work with you to create a Data Cookbook from these steps.
  • Describe any scripts used to process the data: the purpose of each script and a description of each main function or section. A CanWIN curator will work with you to create a Data Codebook to accompany your code.
  • Create a Data Dictionary for the variables in your data file. It should include the following headers:
      ◦ Common name: a common name for each variable. For example, if T_C represents temperature in the data file, the common name would be Temperature.
      ◦ Units: the units of the variable.
      ◦ Description: a description of the variable.
      ◦ Variable media: the medium in which the variable was collected (controlled list; see the point below on standardized vocabularies).
      ◦ Statistic applied: any statistic applied to the variable (controlled list; see the point below on standardized vocabularies).
  • Consider using standardized vocabularies for unambiguous, interoperable data. See our curated vocabularies page for some of the standardized (controlled) vocabularies and terms that CanWIN uses. We recommend that you:

See our metadata levels primer for the different levels of metadata completeness.
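The Data Dictionary step can be sketched as a small CSV file that uses the five headers listed above. This is a hypothetical helper, not a CanWIN template: the function and constant names are ours, and the example entry for T_C (its units, description, media and statistic values) is a placeholder. Media and statistic terms should come from the controlled vocabularies mentioned above.

```python
import csv
from pathlib import Path

# The five headers listed in the Data Dictionary step.
DATA_DICTIONARY_HEADERS = [
    "Common name",
    "Units",
    "Description",
    "Variable media",
    "Statistic applied",
]


def write_data_dictionary(path: Path, entries: list[dict[str, str]]) -> None:
    """Write one row per variable; every entry must use exactly the five headers."""
    for entry in entries:
        if set(entry) != set(DATA_DICTIONARY_HEADERS):
            missing = sorted(set(DATA_DICTIONARY_HEADERS) - set(entry))
            extra = sorted(set(entry) - set(DATA_DICTIONARY_HEADERS))
            raise ValueError(f"bad entry: missing={missing}, extra={extra}")
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=DATA_DICTIONARY_HEADERS)
        writer.writeheader()
        writer.writerows(entries)


# Hypothetical entry for the T_C example; every value except the common
# name is a placeholder, and media/statistic terms should be taken from
# CanWIN's controlled vocabularies.
EXAMPLE_ENTRIES = [
    {
        "Common name": "Temperature",
        "Units": "degrees Celsius",
        "Description": "Water temperature measured at the sampling site",
        "Variable media": "water",
        "Statistic applied": "mean",
    },
]
```

Values may be left blank (for example, "Statistic applied" when no statistic was applied), but every entry must use exactly these five keys, so a misspelled header is caught before the file is written.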


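The steps on recording processing stages and describing scripts can be illustrated with a short, hypothetical Python script. The module docstring states the script's purpose, each function's docstring describes one section, and a simple in-memory log records each step from raw to intermediate to final. The data, function names and log format are assumptions for illustration only; a CanWIN curator will help turn notes like these into the Data Cookbook and Codebook.

```python
"""Hypothetical processing script: convert raw temperature readings to daily means.

Purpose: turn raw logger output (raw) into cleaned readings (intermediate),
then into daily mean temperatures (final processed state).
"""
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Each processing step appends a short, human-readable record here so the
# steps can later be written up, for example in a Data Cookbook.
PROCESSING_LOG: list[dict[str, str]] = []


def log_step(stage: str, description: str) -> None:
    """Record one processing step with a UTC timestamp."""
    PROCESSING_LOG.append(
        {
            "stage": stage,
            "description": description,
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        }
    )


def clean_readings(raw: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Raw -> intermediate: drop readings whose value is blank or cannot be parsed."""
    cleaned = []
    for date, value in raw:
        try:
            cleaned.append((date, float(value)))
        except ValueError:
            continue
    log_step("intermediate", f"Dropped {len(raw) - len(cleaned)} unparseable readings")
    return cleaned


def daily_means(readings: list[tuple[str, float]]) -> dict[str, float]:
    """Intermediate -> final: average the readings that share a date."""
    by_date: dict[str, list[float]] = defaultdict(list)
    for date, value in readings:
        by_date[date].append(value)
    result = {date: round(mean(values), 2) for date, values in sorted(by_date.items())}
    log_step("final", f"Computed daily means for {len(result)} dates")
    return result
```

After a run, `PROCESSING_LOG` holds one entry per stage, which can be copied into the processing notes you share with the curator.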