Machine readable • sysrevdata

# packages used in this walkthrough
library(sysrevdata)
library(tidyverse)
library(leaflet)

# prefer dplyr::filter over stats::filter whenever the name is ambiguous
conflicted::conflict_prefer("filter", "dplyr")

complex visualisations and analysis

Along with a narrative synthesis, systematic reviews and maps typically require complex visualisations. In addition, it is common for review authors to upgrade a systematic map into one or more focused systematic reviews, which may involve quantitative synthesis such as meta-analysis. Before this can happen, review authors need to convert their database of studies from a condensed or wide format into a format that is ready for analysis. The easiest way to do this is to produce a long or tidy format, with one value of one variable for one study on each row.

This is the ideal way to share data for future syntheses, as it is machine readable for multiple analyses. It also explicitly separates connected data, rather than compressing multiple methods and outcomes onto a single line, losing links between columns. In this long version, each linkage between populations, interventions, comparators, outcomes, study methods, etc. is preserved explicitly on a separate row - a row of independent data.
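A minimal sketch of what losing those links means, using a made-up study (the study name and values below are hypothetical, not taken from the package data). In the condensed row it is impossible to tell which outcome belongs to which intervention; in the long table each link has its own row and can be queried directly.

```r
library(tibble)
library(dplyr)

# hypothetical condensed record: two interventions and two outcomes in one row,
# with no way of knowing which outcome was measured for which intervention
condensed_toy <- tibble(
  study        = "Study A",
  intervention = "Strip presence; Strip width",
  outcome      = "Nitrogen; Biodiversity"
)

# the same (hypothetical) study in long form: one intervention-outcome link per row
long_toy <- tribble(
  ~study,    ~intervention,    ~outcome,
  "Study A", "Strip presence", "Biodiversity",
  "Study A", "Strip width",    "Nitrogen"
)

# the long table can answer "which outcome was measured for strip width?"
width_outcomes <- long_toy %>%
  filter(intervention == "Strip width") %>%
  pull(outcome)

width_outcomes
```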

In this walkthrough, we’ll consider the bufferstrips dataset, which starts off in wide format (each level of each variable is presented as a separate column).

# spatial variables
buffer_example %>% 
  select(short_title, contains("spatial"))
#> # A tibble: 5 x 7
#>   short_title spatialscale_pl~ spatialscale_fi~ spatialscale_fa~
#>   <chr>       <chr>            <chr>            <chr>           
#> 1 Aaron (200~ <NA>             <NA>             <NA>            
#> 2 Aavik (200~ <NA>             <NA>             <NA>            
#> 3 Aavik (201~ <NA>             <NA>             <NA>            
#> 4 Abu-Zreig ~ Plot scale       <NA>             <NA>            
#> 5 Abu-Zreig ~ Plot scale       <NA>             <NA>            
#> # ... with 3 more variables: spatialscale_catchment <chr>,
#> #   spatialscale_regional <chr>, spatialscale_notdescribed <chr>

wide to long

We want to convert these wide data to long-form data, dropping the NA elements and producing a table where each row contains a single value of a single variable for a single study.

We’ll use the collection of variables we obtained in the vignette on creating a narrative synthesis table.

buffer_variables
#>  [1] "es"                      "farmingproductionsystem"
#>  [3] "farmingsystem"           "intervention"           
#>  [5] "measurementquarter"      "outcome"                
#>  [7] "spatialscale"            "striplocation"          
#>  [9] "stripmanagement"         "studydesign"            
#> [11] "vegetationtype"

Here’s one way of transforming the data to long format.

buffer_example_long <-
  buffer_example %>%
  # this function pivots longer
  pivot_longer(
    # see the narrative vignette for how this vector was created
    cols = contains(buffer_variables),
    # name of column we will put the column names of the wide data
    names_to = "category_type",
    # name of column we will put the values of those columns in
    values_to = "subcategory_value",
    # drop the NA values
    values_drop_na = TRUE
  ) %>%
  mutate(
    # everything after the first underscore is the subcategory;
    # str_match() returns NA for column names without an underscore
    subcategory_type = str_match(category_type, "_(\\w+)")[, 2],
    # keep only the lowercase prefix before the underscore as the category
    category_type = if_else(
      str_detect(category_type, "_"),
      str_extract(category_type, "[a-z]*"),
      category_type
    )
  )

# newly created columns
buffer_example_long %>%
  select(short_title, category_type, subcategory_type, subcategory_value)
#> # A tibble: 97 x 4
#>    short_title  category_type           subcategory_type    subcategory_value   
#>    <chr>        <chr>                   <chr>               <chr>               
#>  1 Aaron (2005) vegetated               strip_description   Riparian buffer     
#>  2 Aaron (2005) studydesign             observational       Observational       
#>  3 Aaron (2005) farmingsystem           notdescribed        Not described       
#>  4 Aaron (2005) farmingproductionsystem notdescribed        Not described       
#>  5 Aaron (2005) vegetationtype          notdescribed        Not described       
#>  6 Aaron (2005) stripmanagement         notdescribed        Not described       
#>  7 Aaron (2005) intervention            presence            Strip presence      
#>  8 Aaron (2005) intervention            presenceinfo        Percentage riparian~
#>  9 Aaron (2005) es                      supporting_biodive~ Biodiversity        
#> 10 Aaron (2005) Time since interventio~ <NA>                Not stated          
#> # ... with 87 more rows

condensed to long

Suppose, however, that we began with condensed data, such as the table we created in the vignette on creating a narrative synthesis table.

condensed_buffer_example
#> # A tibble: 5 x 24
#>   item_id short_title title  year period google_scholar_~ nation study_country
#>     <dbl> <chr>       <chr> <dbl> <chr>  <chr>            <chr>  <chr>        
#> 1  2.06e7 Aaron (200~ Inve~  2005 2005-~ http://scholar.~ USA    Maryland, USA
#> 2  2.06e7 Aavik (200~ What~  2008 2005-~ http://scholar.~ Eston~ Estonia      
#> 3  2.06e7 Aavik (201~ Quan~  2010 2010-~ http://scholar.~ Eston~ Estonia      
#> 4  2.06e7 Abu-Zreig ~ Expe~  2004 2000-~ http://scholar.~ Not s~ Not stated   
#> 5  2.06e7 Abu-Zreig ~ Phos~  2003 2000-~ http://scholar.~ Canada Ontario, Can~
#> # ... with 16 more variables: study_location <chr>, latitute <chr>,
#> #   longitude <chr>, `Study length (years)` <chr>,
#> #   itervention_structureinfo <chr>, es <chr>, farmingproductionsystem <chr>,
#> #   farmingsystem <chr>, intervention <chr>, measurementquarter <chr>,
#> #   outcome <chr>, spatialscale <chr>, striplocation <chr>,
#> #   stripmanagement <chr>, studydesign <chr>, vegetationtype <chr>

We want to take these data and create the same long format as above.

condensed_buffer_example %>% 
  pivot_longer(
    contains(buffer_variables),
    names_to = "category_type",
    values_to = "subcategory_value"
  ) %>% 
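  # split each "; "-delimited cell into separate columns
  separate(subcategory_value,
           # there are a maximum of 8 different subcategories
           into = letters[1:8],
           sep = "; ",
           # cells with fewer than 8 values are padded with NA without a warning
           fill = "right") %>% 
  # stack those columns so each value sits on its own row
  pivot_longer(all_of(letters[1:8]),
               values_to = "subcategory_value") %>%
  # get rid of NAs
  filter(!is.na(subcategory_value)) %>%
  # keep only the columns needed for display (this also drops the redundant name column)
  select(short_title, category_type, subcategory_value)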
#> # A tibble: 134 x 3
#>    short_title  category_type           subcategory_value        
#>    <chr>        <chr>                   <chr>                    
#>  1 Aaron (2005) es                      Riparian buffer          
#>  2 Aaron (2005) es                      Observational            
#>  3 Aaron (2005) es                      Not described            
#>  4 Aaron (2005) es                      Not described            
#>  5 Aaron (2005) es                      Not described            
#>  6 Aaron (2005) es                      Not described            
#>  7 Aaron (2005) es                      Strip presence           
#>  8 Aaron (2005) es                      Percentage riparian cover
#>  9 Aaron (2005) studydesign             Observational            
#> 10 Aaron (2005) farmingproductionsystem Not described            
#> # ... with 124 more rows
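The hard-coded maximum of eight subcategories in the code above has to be known in advance. As a hedged alternative, the sketch below uses `tidyr::separate_rows()`, which splits delimited cells straight into rows without needing that number. The toy table and its values are hypothetical; only the column names mirror the condensed data.

```r
library(tibble)
library(tidyr)

# hypothetical condensed data: several values per cell, separated by "; "
toy_condensed <- tribble(
  ~short_title, ~outcome,               ~studydesign,
  "Study A",    "Nitrogen; Phosphorus", "Observational",
  "Study B",    NA,                     "Experimental; Observational"
)

toy_long <- toy_condensed %>%
  pivot_longer(
    cols = c(outcome, studydesign),
    names_to = "category_type",
    values_to = "subcategory_value",
    # drop cells that were empty in the condensed table
    values_drop_na = TRUE
  ) %>%
  # one row per value, however many values a cell holds
  separate_rows(subcategory_value, sep = "; ")

toy_long
```

In tidyr 1.3.0 and later, `separate_longer_delim()` covers the same use case; which of the two to use is a matter of the tidyr version available.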

summarising

Now that our data are in long form, we can perform various analyses, including (if we wanted to) meta-analysis of the full quantitative data extracted from each study.

We might be interested in the countries covered by the systematic review and how many studies each contributes. For this we can use the original data, where each row is a study: each study was conducted in a specific country, so studies are the independent units and long data aren’t necessary yet.

bufferstrips %>% 
  count(study_country)
#> # A tibble: 113 x 2
#>    study_country                               n
#>    <chr>                                   <int>
#>  1 Alberta, Canada                             1
#>  2 Argentina                                   5
#>  3 Arkansas, Kentucky and Mississippi, USA     1
#>  4 Arkansas, USA                               5
#>  5 Austria                                     3
#>  6 Belgium                                    15
#>  7 British Columbia, Canada                    4
#>  8 California, USA                             6
#>  9 Central and eastern USA                     1
#> 10 Central district, Russia                    1
#> # ... with 103 more rows

But to see the number of observations in each farming production system, say, we will use our long data.

buffer_example_long %>%
  filter(category_type == "farmingproductionsystem") %>%
  count(subcategory_value)
#> # A tibble: 5 x 2
#>   subcategory_value                                                  n
#>   <chr>                                                          <int>
#> 1 Cropped fields (arable)                                            1
#> 2 Livestock                                                          1
#> 3 Mixed conventional and organic, multiple farms (not described)     1
#> 4 Not described                                                      3
#> 5 Other (please specify)                                             1
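Note that `count()` tallies rows, not studies. If a study can contribute more than one row to the same subcategory, counting distinct studies may be closer to what is wanted. A minimal sketch with hypothetical data (the column names mirror `buffer_example_long`; the values are made up):

```r
library(tibble)
library(dplyr)

# hypothetical long data in which Study A reports the same outcome twice
toy_long <- tribble(
  ~short_title, ~category_type, ~subcategory_value,
  "Study A",    "outcome",      "Nitrogen",
  "Study A",    "outcome",      "Nitrogen",
  "Study B",    "outcome",      "Nitrogen",
  "Study B",    "outcome",      "Sediment"
)

studies_per_outcome <- toy_long %>%
  filter(category_type == "outcome") %>%
  group_by(subcategory_value) %>%
  summarise(
    n_rows    = n(),                      # what count() would report
    n_studies = n_distinct(short_title)   # number of different studies
  )

studies_per_outcome
```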

creating evidence atlases

For creating evidence atlases we need the data in wide format with one row per study. Our bufferstrips dataset is in wide format already, so let’s try to reconstruct a wide database from the long-formatted data that we created above.

back to wide

back_to_wide <-
  buffer_example_long %>% 
  pivot_wider(
    id_cols = -contains("category"),
    names_from = c(category_type, subcategory_type),
    names_sep = "_",
    values_from = subcategory_value
  )
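To see what this pivot does on a small scale, here is a sketch with a hypothetical three-row long table (the column names follow `buffer_example_long`; the studies and values are invented). Each category and subcategory pair becomes one column, and combinations a study never reported come back as `NA`, just as in the original wide data.

```r
library(tibble)
library(tidyr)

# hypothetical long data: one category/subcategory value per row
toy_long <- tribble(
  ~short_title, ~category_type, ~subcategory_type, ~subcategory_value,
  "Study A",    "spatialscale", "plot",            "Plot scale",
  "Study A",    "studydesign",  "observational",   "Observational",
  "Study B",    "spatialscale", "field",           "Field scale"
)

toy_wide <- toy_long %>%
  pivot_wider(
    id_cols = short_title,
    # column names are rebuilt as category_subcategory
    names_from = c(category_type, subcategory_type),
    names_sep = "_",
    values_from = subcategory_value
  )

toy_wide
```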

We can use a wide-formatted dataset to plot a cartographic map of the study locations, for example.

back_to_wide %>% 
  select(short_title, latitute, longitude, google_scholar_link) %>% 
  mutate(
    # coordinates are stored as text, so convert them for plotting
    lat = as.numeric(latitute),
    lng = as.numeric(longitude),
    # quote the href so that links containing "=" or "&" stay valid HTML
    tag = paste0("Scholar_link: <a href='", google_scholar_link, "'>", google_scholar_link, "</a>")
  ) %>% 
  leaflet(width = "100%") %>% 
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng=~lng, lat=~lat, popup=~tag, clusterOptions = markerClusterOptions())

The code above is applied to just a subset of the data, but we can apply the same code to the full bufferstrips data frame.

map <- sysrevdata::bufferstrips %>% 
  select(short_title, latitute, longitude, google_scholar_link) %>% 
  mutate(
    # coordinates are stored as text, so convert them for plotting
    lat = as.numeric(latitute),
    lng = as.numeric(longitude),
    # quote the href so that links containing "=" or "&" stay valid HTML
    tag = paste0("Scholar_link: <a href='", google_scholar_link, "'>", google_scholar_link, "</a>")
  )

# you might need to tidy up the encoding in the data frame to get it to work with leaflet
Encoding(x = map$tag) <- "UTF-8"

# drop any bytes that are not valid UTF-8 (sub = "" removes them rather than inserting a space)
map$tag <- iconv(x = map$tag, from = "UTF-8", to = "UTF-8", sub = "")

map %>% leaflet(width = "100%") %>% 
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng=~lng, lat=~lat, popup=~tag, clusterOptions = markerClusterOptions())
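Because the coordinates are stored as text, any entry that is not a number turns into `NA` when converted, and such rows cannot be placed on the map. The sketch below shows one way to filter them out beforehand. The toy rows are hypothetical, and whether the real data contain entries like "Not stated" in the coordinate columns is an assumption.

```r
library(tibble)
library(dplyr)

# hypothetical studies; the misspelt column name latitute matches the dataset
toy_studies <- tribble(
  ~short_title, ~latitute,    ~longitude,
  "Study A",    "58.38",      "26.72",
  "Study B",    "Not stated", "Not stated",
  "Study C",    "95.00",      "10.00"        # impossible latitude
)

mappable <- toy_studies %>%
  mutate(
    # non-numeric text becomes NA; the coercion warning is expected here
    lat = suppressWarnings(as.numeric(latitute)),
    lng = suppressWarnings(as.numeric(longitude))
  ) %>%
  # keep only rows with coordinates that can exist on a map
  filter(
    !is.na(lat), !is.na(lng),
    between(lat, -90, 90),
    between(lng, -180, 180)
  )

mappable
```

The resulting data frame can then be piped into `leaflet()` exactly as above.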