Using controlled vocabularies is an important step in data curation. It is a means of standardizing data so that it can be easily understood and reused by domain and non-domain experts, as well as shared between different systems. Controlled vocabularies are applied to your data before it is published.
Certain variable descriptors can be added to tabular data to give users more information about the variables in your data file. These descriptors are added as columns in your data file, beside the variable they describe.
For example, Result Value Qualifiers (RVQs) are a list of codes that explain erroneous or missing data. Consider the case where your dataset contains erroneous values because your instrument failed. Adding a result_value_qualifier column to your data, with the code FEF (Field Equipment failed) wherever the values are erroneous, lets users see right away that those values are unreliable because of instrument failure.
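The RVQ example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (station, water_temp), the sample values, and the use of -999 as the value logged during instrument failure are all hypothetical. Only the result_value_qualifier column name and the FEF code come from the text above.

```python
import csv
import io

# Hypothetical raw data: -999 is assumed to be the value the instrument
# logged when it failed. Column names and values are illustrative only.
RAW_CSV = """station,water_temp
A1,4.2
A2,-999
A3,5.1
"""


def add_result_value_qualifier(rows, variable, sentinel="-999", code="FEF"):
    """Return new rows with a result_value_qualifier column beside `variable`."""
    qualified = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            new_row[key] = value
            if key == variable:
                # FEF marks the value as erroneous because of field equipment
                # failure; an empty qualifier means no known issue.
                new_row["result_value_qualifier"] = code if value == sentinel else ""
        qualified.append(new_row)
    return qualified


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
qualified_rows = add_result_value_qualifier(rows, "water_temp")

# Write the qualified table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=list(qualified_rows[0].keys()), lineterminator="\n"
)
writer.writeheader()
writer.writerows(qualified_rows)
print(out.getvalue())
```

The original value is left in place and only flagged, so a user can decide how to treat the affected rows.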
CanWIN curates a specific controlled vocabulary list for describing variables in your dataset, which we recommend you use where applicable.
Click here for CanWIN's variable descriptors
Controlled vocabularies for variable names are curated naming conventions that associate a standard variable name with a specific definition. Much like Digital Object Identifiers (DOIs), these names and definitions have a permanent location, meaning they can always be referenced. This benefits researchers, since it allows their datasets to be described in a clear and unambiguous manner, facilitating reuse by experts and non-experts alike.
CanWIN recommends that you use standardized variable names as the column headers in your tabular datasets. If your variable is not yet on our list, a data curator can work with you to find an appropriate standardized variable name.
Click here for CanWIN's standardized variable names
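As a rough sketch of what applying standardized variable names to column headers might look like, the Python snippet below renames headers using a lookup table and reports any header that has no standardized name yet. The mapping entries and header names are hypothetical placeholders, not taken from CanWIN's list; in practice the mapping would come from the list linked above.

```python
# Hypothetical mapping from local column headers to standardized names.
# These entries are placeholders, not taken from CanWIN's list.
STANDARD_NAMES = {
    "temp": "water_temperature",
    "sal": "salinity",
}


def standardize_headers(headers, mapping):
    """Return (renamed headers, headers with no standardized name yet)."""
    renamed, unmapped = [], []
    for header in headers:
        if header in mapping:
            renamed.append(mapping[header])
        else:
            # Keep the original header and record it for follow-up.
            renamed.append(header)
            unmapped.append(header)
    return renamed, unmapped


headers = ["temp", "sal", "chl_a"]
renamed, unmapped = standardize_headers(headers, STANDARD_NAMES)
print(renamed)
print(unmapped)
```

Any header that ends up in the unmapped list is one to raise with a data curator.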