---
title: "Idronaut_data_workflow"
author: "Katelyn Rodgers"
date created: '2022-10-24'
last edited: '2023-02-21' # by Katelyn Rodgers
output:
  html_document:
    keep_md: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r load libraries}
library(tidyverse)
library(lubridate)
library(janitor)
library(ggplot2)
library(hms)
library(here)
```

## Intermediate Idronaut dataset

The intermediate dataset contains all the same variables and lines as the raw dataset, but with one additional column: a second conductivity at 25 °C (specific conductance). This second conductivity at 25 °C is calculated externally and is used for comparison with the internally calculated conductivity at 25 °C produced by the Idronaut itself.

Equation and variables for calculating specific conductance at 25 °C:

C25 = CT / (1 + r(T - 25))

where:

- C25 = specific conductance of the sample, in siemens per cm (S/cm) or microsiemens per cm (µS/cm)
- CT = actual conductivity of the sample, in S/cm or µS/cm
- T = temperature of the sample, in °C
- r = temperature correction coefficient for the sample

```{r intermediate files}
# Read file - change file path to data file location
raw <- read.table("raw//.txt") # change folder and file name

# Move the first row to headers
raw_headers <- raw %>%
  row_to_names(row_number = 1)

# Convert column classes
raw_headers$Date <- dmy(raw_headers$Date) # convert date to YYYY-MM-DD
raw_headers[3:9] <- lapply(raw_headers[3:9], as.numeric) # convert columns 3-9 to numeric

# Temperature correction coefficient for the sample - this value can change
r <- 0.02

# Add column of calculated specific conductance
interm <- raw_headers %>%
  mutate(Cond_std25 = round(Cond / (1 + r * (Temp - 25)), 4))

# Write csv file
write.csv(interm, here("processed/intermediate//.csv"), row.names = FALSE) # change folder and file name
```

## Clean Idronaut datasets

There are two sections that need to be removed: the stabilizing/out-of-water period and the upcast.
The first cleaned data file will include the downcast and the upcast. The second cleaned data file will only include the downcast. At the moment, the best way to determine which portion of the stabilizing period and upcast to remove is to examine the file and decide which lines should be removed; a description of this process is given in each subsection below. Double check for multiple casts; multiple casts will lead to making multiple downcast files.

If you just want the downcast and not the upcast, skip to the section *Extracting downcast only*.

### Subset cast period (downcast and upcast)

Typically the Idronaut is turned on and left out of the water for a specific period of time, then placed in the water so all the sensors are submerged and left to stabilize for a specific period of time. These stabilization periods will vary between users. Since these periods vary, the easiest way to clean these files is to examine the file and determine when the downcast starts. The downcast starts when the pressure is positive and begins to increase at a steady rate. The out-of-water period will typically have negative pressure values, and the stabilizing period will have positive pressure values that hover around the same value.

```{r cleaned files step one}
# Load intermediate file - change file path to data file location
interm <- read.csv(here("processed/intermediate//.csv")) # use folder and file name from intermediate files script

# Removing stabilization period via row numbers
cast <- interm[c(:), ] # examine the loaded interm data file (from above line) to determine the row numbers to subset: when the Idronaut was placed in the water and when it was removed from the water

# Write csv file
write.csv(cast, here("processed/cleaned//_cast.csv"), row.names = FALSE) # change folder and file name
```

### Subset downcast period

The upcast is the portion of data recorded while the Idronaut ascends to the surface.
Typically, this is seen when the pressure starts decreasing, going back toward zero. Examine the file and remove the remaining data points from where the upcast starts (after the largest pressure reading). One way to identify the upcast period is to start at the bottom of the file, move upwards, and find the point in time where the pressure is greatest; this point is the bottom of the downcast. Use your own discretion about which point should be the last reading before the upcast starts.

Double check that there aren't multiple casts in one file; if this is the case, you may need to repeat this step multiple times and remove the other casts to single out the cast you are trying to extract. Typically the first cast is used for analysis, unless the first downcast was performed too quickly.

```{r cleaned files step two}
# Load cast file - change file path to data file location
cast <- read.csv(here("processed/cleaned//_cast.csv")) # use folder and file name from cleaned step one script

# Removing upcast period via row numbers
downcast <- cast[c(1:), ] # examine the loaded cast data file (from above line) to determine the row numbers to keep; use row numbers from the loaded cast file in this code

# Write csv file
write.csv(downcast, here("processed/cleaned//_downcast.csv"), row.names = FALSE) # change folder and file name
```

### Extracting downcast only

Skip this section if you already did the above two steps. Use this section if you want to directly extract the downcast section of the data.
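The downcast window can also be located programmatically rather than by eye. A minimal sketch, assuming pressure is in a column named `Pres` and treating the first positive pressure reading before the deepest point as the start of the downcast (`find_downcast_rows` is a hypothetical helper, not part of this workflow):

```r
# Hypothetical helper: return the row numbers of the downcast, assuming the
# downcast runs from the first in-water (positive pressure) reading up to the
# deepest (maximum pressure) reading.
find_downcast_rows <- function(df, surface_threshold = 0.1) {
  bottom   <- which.max(df$Pres)                  # deepest point = end of downcast
  in_water <- which(df$Pres > surface_threshold)  # rows with positive pressure
  start    <- min(in_water[in_water <= bottom])   # first in-water row before the bottom
  start:bottom
}

# Usage sketch: downcast <- interm[find_downcast_rows(interm), ]
```

Always double-check the result against the file itself, since noisy pressure readings or multiple casts can fool a simple rule like this.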
```{r cleaned files downcast only}
# Load intermediate file - change file path to data file location
interm <- read.csv(here("processed/intermediate//.csv")) # use folder and file name from intermediate files script

# Removing stabilization period and upcast via row numbers
downcast <- interm[-c(:), ] # examine the loaded interm data file to determine the row numbers to remove; insert rows so that only the downcast remains

# Write csv file
write.csv(downcast, here("processed/cleaned//_downcast.csv"), row.names = FALSE) # change folder and file name
```

### Checking downcast data

Visualize the downcast data and save it to the working folder. This step also helps check that the data were subsetted correctly.

```{r plot}
# Load downcast data - change file path to data file location
downcast <- read.csv(here("processed/cleaned//_downcast.csv")) # use folder and file name from the cleaning scripts above

# Plot temperature profile
plot <- downcast %>%
  ggplot(aes(x = Temp, y = Pres)) +
  geom_line() +
  scale_x_continuous(position = "top") +
  scale_y_continuous(trans = "reverse") +
  labs(x = "Temperature (°C)",
       y = "Pressure (dbar; represented as depth in m)") +
  theme_test()

plot
```

## Final curated Idronaut dataset

The final curated dataset will include the metadata associated with the data values, such as research program name, station ID, station latitude and longitude, ISO 8601 formatted timestamp in the UTC timezone, instrument name and ID, result value types, and statistics applied. Column names from the cleaned files will also be changed to include units. Note: no units are provided for salinity and sigma-t.

Typically the cleaned downcast file is the input; however, if you want the upcast as well, use the full cast file.
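The timestamp handling below uses base R's `strptime`/`strftime`. As a point of comparison, a minimal lubridate sketch of the same local-to-UTC conversion (the date/time value here is invented for illustration; the timezone matches the America/Winnipeg default used in this workflow):

```r
library(lubridate)

# Invented example value; real values come from pasting the Date and Time columns
local_str <- "2022-10-24 14:30:15"
local_ts  <- ymd_hms(local_str, tz = "America/Winnipeg") # parse in the local timezone
utc_ts    <- with_tz(local_ts, tzone = "UTC")            # same instant, shown in UTC
format(utc_ts, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")         # "2022-10-24T19:30:15Z" (CDT is UTC-5)
```

`with_tz()` keeps the same instant in time and only changes the display timezone, which is what the `as.POSIXlt(..., tz = "UTC")` step below does.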
```{r final curated files}
# Load cleaned downcast file - change file path to data file location
downcast <- read.csv(here("processed/cleaned//_downcast.csv")) # use folder and file name from cleaned step two script

# Merge date and time into one timestamp string
time_merge <- downcast %>%
  mutate(merge_time = paste(Date, Time))

# Convert date and time to UTC time zone
options(digits.secs = 2) # global option to show decimal seconds; may need to change this back to no digits (digits.secs = 0) when done with this code

local_tz <- as.POSIXct(strptime(time_merge$merge_time,
                                format = "%Y-%m-%d %H:%M:%OS",
                                tz = "America/Winnipeg"),
                       tz = "America/Winnipeg") # the timezone the current timestamp is in - change if in a different timezone
utc_tz <- as.POSIXlt(local_tz, tz = "UTC") # convert timestamp to UTC timezone
df_utc_tz <- data.frame(utc_tz) # change POSIXlt format to dataframe

time_utc <- time_merge %>%
  mutate(utc_iso8601 = df_utc_tz$utc_tz, .before = Date) %>% # add UTC timestamp column to the other data
  select(-c(merge_time)) # remove merged column
time_utc$utc_iso8601 <- strftime(time_utc$utc_iso8601, "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC") # change timestamp format to ISO 8601 with 'T' and 'Z'

# Add metadata and rename column names
curated <- time_utc %>%
  mutate(Station_id = <"SiteID">, .before = utc_iso8601) %>% # change station ID
  mutate(Latitude_dd = , .after = Station_id) %>% # change station latitude
  mutate(Longitude_dd = , .after = Latitude_dd) %>% # change station longitude
  rename(UTC_iso8601 = utc_iso8601) %>%
  relocate(UTC_iso8601, .after = Longitude_dd) %>%
  rename(Date_local = Date,
         Time_local = Time) %>%
  rename(Pres_Z = Pres) %>%
  mutate(Pres_Z_result_value_qualifier = "", .after = Pres_Z) %>%
  rename(CTDTmp90 = Temp) %>%
  mutate(CTDTmp90_result_value_qualifier = "", .after = CTDTmp90) %>%
  rename(CTDCond = Cond) %>%
  mutate(CTDCond_result_value_qualifier = "",
         .after = CTDCond) %>%
  rename(CTDSal = Sal) %>%
  mutate(CTDSal_result_value_qualifier = "", .after = CTDSal) %>%
  rename(Turbidity = Turb) %>%
  mutate(Turbidity_result_value_qualifier = "", .after = Turbidity) %>%
  rename(SigTheta = SigmaT) %>%
  mutate(SigTheta_result_value_qualifier = "", .after = SigTheta) %>%
  rename(CTDCond_std25_raw = Cond25) %>%
  mutate(CTDCond_std25_raw_result_value_qualifier = "", .after = CTDCond_std25_raw) %>%
  rename(CTDCond_std25_calc = Cond_std25) %>%
  mutate(CTDCond_std25_calc_result_value_qualifier = "", .after = CTDCond_std25_calc)

# Write csv file
write.csv(curated, here("processed/curated//_curated.csv"), row.names = FALSE) # change folder and file name
```
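As a final sanity check, the externally calculated specific conductance column in the curated file can be spot-checked by hand against the equation from the intermediate step, C25 = CT / (1 + r(T - 25)). A sketch with invented values:

```r
# Invented example values; r matches the coefficient used in the intermediate step
r      <- 0.02
cond   <- 1200   # measured conductivity, uS/cm
temp_c <- 18.5   # measured temperature, degrees C

c25 <- round(cond / (1 + r * (temp_c - 25)), 4)
c25 # 1200 / 0.87 = 1379.3103
```

The value of `c25` should match the corresponding `CTDCond_std25_calc` entry and be close to the instrument's own `CTDCond_std25_raw` value; a large discrepancy suggests the wrong `r` was used.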