
# packages used in this walkthrough
library(sysrevdata)
library(tidyverse)
library(tippy)

# prefer dplyr::filter() over stats::filter() to avoid a name conflict
conflicted::conflict_prefer("filter", "dplyr")

complex visualisations and analysis

In addition to a narrative synthesis, systematic reviews and maps typically require complex visualisations, such as evidence atlases. It is also common for review authors to upgrade a systematic map into one or more focused systematic reviews.

The easiest way to support both is to produce a long, or tidy, format, with one variable per study per row.

This is the ideal way to share data for future syntheses, as it is machine-readable and can be reused across multiple analyses.

In this walkthrough, we’ll consider the bufferstrips dataset, which is in wide format, with one column per subcategory.

# spatial variables
buffer_example %>% 
  select(short_title, contains("spatial"))
#> # A tibble: 5 x 7
#>   short_title spatialscale_pl… spatialscale_fi… spatialscale_fa…
#>   <chr>       <chr>            <chr>            <chr>           
#> 1 Aaron (200… <NA>             <NA>             <NA>            
#> 2 Aavik (200… <NA>             <NA>             <NA>            
#> 3 Aavik (201… <NA>             <NA>             <NA>            
#> 4 Abu-Zreig … Plot scale       <NA>             <NA>            
#> 5 Abu-Zreig … Plot scale       <NA>             <NA>            
#> # … with 3 more variables: spatialscale_catchment <chr>,
#> #   spatialscale_regional <chr>, spatialscale_notdescribed <chr>

wide to long

We wish to convert these wide data to long-form data, dropping the NA elements and producing a table in which each row holds one subcategory of one category for one study.

We’ll use the collection of variables we obtained in the "creating a narrative synthesis table" vignette.

buffer_variables
#> [1] "farmingproductionsystem" "farmingsystem"          
#> [3] "measurementquarter"      "spatialscale"           
#> [5] "striplocation"           "stripmanagement"        
#> [7] "studydesign"             "vegetationtype"

Here’s one way of transforming the data to long format.

buffer_example_long <-
  buffer_example %>%
  # this function pivots longer
  pivot_longer(
    # see the narrative vignette for how this vector was created
    cols = contains(buffer_variables),
    # name of column we will put the column names of the wide data
    names_to = "category",
    # name of column we will put the values of those columns in
    values_to = "subcategory",
    # drop the NA values
    values_drop_na = TRUE
  ) %>%
  # I suspect there's a tidy solution to this. However, finding a regex solution 
  # was just as useful. There is a lot of string manipulation and extraction in 
  # evisynth that is not pivoting. 
  mutate(
    # text after the first underscore, or NA when there is no underscore
    category_suffix = map_chr(
      category,
      .f = function(x) {
        ifelse(
          str_detect(x, "_"),
          str_match(x, "_(\\w+)") %>% pluck(2),
          NA_character_
        )
      }
    ),
    # keep only the lowercase prefix of column names containing an underscore
    category = if_else(
      str_detect(category, "_"),
      str_extract(category, "[a-z]*"),
      category
    )
  )

# newly created columns
buffer_example_long %>%
  select(short_title, category, category_suffix, subcategory)
#> # A tibble: 54 x 4
#>    short_title  category                category_suffix subcategory    
#>    <chr>        <chr>                   <chr>           <chr>          
#>  1 Aaron (2005) farmingproductionsystem notdescribed    Not described  
#>  2 Aaron (2005) farmingsystem           notdescribed    Not described  
#>  3 Aaron (2005) measurementquarter      Q1              Q1             
#>  4 Aaron (2005) measurementquarter      Q2              Q2             
#>  5 Aaron (2005) spatialscale            catchment       Catchment scale
#>  6 Aaron (2005) spatialscale            regional        Regional scale 
#>  7 Aaron (2005) striplocation           riparian        Riparian       
#>  8 Aaron (2005) stripmanagement         notdescribed    Not described  
#>  9 Aaron (2005) studydesign             observational   Observational  
#> 10 Aaron (2005) vegetationtype          notdescribed    Not described  
#> # … with 44 more rows
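
The comment in the code above suspects that a tidier solution exists. One candidate is `tidyr::separate()`, sketched here on a small made-up tibble (`toy_names` is hypothetical, not part of sysrevdata). The sketch assumes that the category prefix never contains an underscore, so splitting at the first underscore gives the same result as the regular expressions.

```r
library(tidyverse)

# hypothetical stand-in for the column names produced by pivot_longer()
toy_names <- tibble(
  category = c("spatialscale_plot", "spatialscale_notdescribed", "studydesign")
)

toy_split <- toy_names %>%
  separate(
    category,
    into = c("category", "category_suffix"),
    sep = "_",
    # keep everything after the first underscore in one piece
    extra = "merge",
    # names without an underscore get NA as their suffix
    fill = "right"
  )

toy_split
```

If the real column names follow a different pattern, the regular expression approach above gives finer control over what counts as the prefix.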

condensed to long

Suppose, however, that we began with condensed data, as we created in the "creating a narrative synthesis table" vignette.

condensed_buffer_example
#> # A tibble: 5 x 22
#>   item_id short_title title  year period google_scholar_… vegetated_strip…
#>     <dbl> <chr>       <chr> <dbl> <chr>  <chr>            <chr>           
#> 1  2.06e7 Aaron (200… Inve…  2005 2005-… http://scholar.… Riparian buffer 
#> 2  2.06e7 Aavik (200… What…  2008 2005-… http://scholar.… Field boundary  
#> 3  2.06e7 Aavik (201… Quan…  2010 2010-… http://scholar.… Field boundary  
#> 4  2.06e7 Abu-Zreig … Expe…  2004 2000-… http://scholar.… Vegetated filte…
#> 5  2.06e7 Abu-Zreig … Phos…  2003 2000-… http://scholar.… Vegetated filte…
#> # … with 15 more variables: nation <chr>, study_country <chr>,
#> #   study_location <chr>, latitute <chr>, longitude <chr>, `Study length
#> #   (years)` <chr>, `Time since intervention (years)` <chr>,
#> #   farmingproductionsystem <chr>, farmingsystem <chr>,
#> #   measurementquarter <chr>, spatialscale <chr>, striplocation <chr>,
#> #   stripmanagement <chr>, studydesign <chr>, vegetationtype <chr>

We want to take these data and create the same long format as above.

condensed_buffer_example %>% 
  pivot_longer(
    contains(buffer_variables),
    names_to = "category",
    values_to = "subcategory"
  ) %>% 
  separate(subcategory,
           # there are a maximum of 8 different subcategories
           into = letters[1:8],
           sep = "; ",
           # pad entries with fewer than 8 subcategories with NA, without a warning
           fill = "right") %>% 
  pivot_longer(letters[1:8],
               values_to = "subcategory") %>%
  # drop the NA padding
  filter(!is.na(subcategory)) %>%
  # drop redundant column
  select(-name) %>% 
  # from here is just for display
  select(short_title, category, subcategory)
#> # A tibble: 59 x 3
#>    short_title  category                subcategory    
#>    <chr>        <chr>                   <chr>          
#>  1 Aaron (2005) farmingproductionsystem Not described  
#>  2 Aaron (2005) farmingsystem           Not described  
#>  3 Aaron (2005) measurementquarter      Q1             
#>  4 Aaron (2005) measurementquarter      Q2             
#>  5 Aaron (2005) spatialscale            Catchment scale
#>  6 Aaron (2005) spatialscale            Regional scale 
#>  7 Aaron (2005) striplocation           Riparian       
#>  8 Aaron (2005) stripmanagement         Not described  
#>  9 Aaron (2005) studydesign             Observational  
#> 10 Aaron (2005) vegetationtype          Not described  
#> # … with 49 more rows
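
The approach above needs to know the maximum number of subcategories in advance. As an alternative sketch, `tidyr::separate_rows()` splits a delimited column directly into one row per value. The tibble `toy_condensed` below is made up for illustration and is not part of sysrevdata; it assumes the same "; " separator as the condensed data.

```r
library(tidyverse)

# hypothetical condensed data: several subcategories share one cell
toy_condensed <- tibble(
  short_title = c("Study A", "Study B"),
  spatialscale = c("Catchment scale; Regional scale", NA),
  studydesign = c("Observational", "Experimental")
)

toy_long <- toy_condensed %>%
  pivot_longer(
    cols = c(spatialscale, studydesign),
    names_to = "category",
    values_to = "subcategory",
    # drop categories with no recorded subcategory
    values_drop_na = TRUE
  ) %>%
  # one row per subcategory, however many each cell holds
  separate_rows(subcategory, sep = "; ")

toy_long
```

This avoids the intermediate padding columns, at the cost of relying on every cell using exactly the same separator.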

summarising

Now that our data are in long form, we can perform various analyses.

We might be interested in the number of studies per country in the systematic review, in which case we can use the original data.

bufferstrips %>% 
  count(study_country)
#> # A tibble: 112 x 2
#>    study_country                               n
#>  * <chr>                                   <int>
#>  1 Alberta, Canada                             1
#>  2 Argentina                                   5
#>  3 Arkansas, Kentucky and Mississippi, USA     1
#>  4 Arkansas, USA                               5
#>  5 Austria                                     3
#>  6 Belgium                                    15
#>  7 British Columbia, Canada                    4
#>  8 California, USA                             6
#>  9 Central and eastern USA                     1
#> 10 Central district, Russia                    1
#> # … with 102 more rows

But to see how often each subcategory of, say, the farming production system occurs, we will use our long data.

buffer_example_long %>%
  filter(category == "farmingproductionsystem") %>%
  count(subcategory)
#> # A tibble: 5 x 2
#>   subcategory                                                        n
#> * <chr>                                                          <int>
#> 1 Cropped fields (arable)                                            1
#> 2 Livestock                                                          1
#> 3 Mixed conventional and organic, multiple farms (not described)     1
#> 4 Not described                                                      3
#> 5 Other (please specify)                                             1
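
The same idea extends to all categories at once. The following is a minimal sketch on a made-up long tibble (`toy_long` is hypothetical and only mimics the `category` and `subcategory` columns created earlier), counting every category and subcategory combination in one call.

```r
library(tidyverse)

# hypothetical long-format data with one row per study, category and subcategory
toy_long <- tibble(
  short_title = c("Study A", "Study A", "Study B", "Study B"),
  category = c("studydesign", "spatialscale", "studydesign", "spatialscale"),
  subcategory = c("Observational", "Plot scale", "Observational", "Catchment scale")
)

toy_counts <- toy_long %>%
  # count rows for each category and subcategory pair, most frequent first
  count(category, subcategory, sort = TRUE)

toy_counts
```

A table like this can feed a plot or a further summary without reshaping the data again.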