complex.Rmd
# packages used in this walkthrough
library(sysrevdata)
library(tidyverse)
library(tippy)
# this avoids tidyverse conflicts with the base function filter
conflicted::conflict_prefer("filter", "dplyr")
In addition to a narrative synthesis, systematic reviews and maps typically require complex visualisations, such as evidence atlases. It is also common for review authors to upgrade a systematic map into one or more focused systematic reviews, which may involve
. The easiest way to do this is to produce the data in a long or tidy format, with one row per variable per study.
This is the ideal way to share data for future syntheses, as it is machine readable and can feed multiple analyses.
In this walkthrough, we’ll consider the bufferstrips dataset, which is wide-formatted by subcategory.
# spatial variables
buffer_example %>%
  select(short_title, contains("spatial"))
#> # A tibble: 5 x 7
#> short_title spatialscale_pl… spatialscale_fi… spatialscale_fa…
#> <chr> <chr> <chr> <chr>
#> 1 Aaron (200… <NA> <NA> <NA>
#> 2 Aavik (200… <NA> <NA> <NA>
#> 3 Aavik (201… <NA> <NA> <NA>
#> 4 Abu-Zreig … Plot scale <NA> <NA>
#> 5 Abu-Zreig … Plot scale <NA> <NA>
#> # … with 3 more variables: spatialscale_catchment <chr>,
#> # spatialscale_regional <chr>, spatialscale_notdescribed <chr>
We wish to convert these wide data to long-form data, dropping the NA elements and producing a table in which each row holds one subcategory of one category for one study.
We’ll use the collection of variables we obtained in the creating a narrative synthesis table vignette.
buffer_variables
#> [1] "farmingproductionsystem" "farmingsystem"
#> [3] "measurementquarter" "spatialscale"
#> [5] "striplocation" "stripmanagement"
#> [7] "studydesign" "vegetationtype"
Here’s one way of transforming the data to long format.
buffer_example_long <-
  buffer_example %>%
  # this function pivots longer
  pivot_longer(
    # see the narrative vignette for how this vector was created
    cols = contains(buffer_variables),
    # name of the column that will hold the column names of the wide data
    names_to = "category",
    # name of the column that will hold the values of those columns
    values_to = "subcategory",
    # drop the NA values
    values_drop_na = TRUE
  ) %>%
  # I suspect there's a tidy solution to this. However, finding a regex solution
  # was just as useful. There is a lot of string manipulation and extraction in
  # evisynth that is not pivoting.
  mutate(
    # extract the suffix of the column names: everything after the first _
    category_suffix = map_chr(
      category,
      .f = function(x) {
        ifelse(
          str_detect(x, "_"),
          str_match(x, "_(\\w+)") %>% pluck(2),
          # typed NA so that map_chr always receives a character value
          NA_character_
        )
      }
    ),
    category = if_else(
      # extract the prefix of the column names with _
      str_detect(category, "_"),
      str_extract(category, "[a-z]*"),
      category
    )
  )
# newly created columns
buffer_example_long %>%
  select(short_title, category, category_suffix, subcategory)
#> # A tibble: 54 x 4
#> short_title category category_suffix subcategory
#> <chr> <chr> <chr> <chr>
#> 1 Aaron (2005) farmingproductionsystem notdescribed Not described
#> 2 Aaron (2005) farmingsystem notdescribed Not described
#> 3 Aaron (2005) measurementquarter Q1 Q1
#> 4 Aaron (2005) measurementquarter Q2 Q2
#> 5 Aaron (2005) spatialscale catchment Catchment scale
#> 6 Aaron (2005) spatialscale regional Regional scale
#> 7 Aaron (2005) striplocation riparian Riparian
#> 8 Aaron (2005) stripmanagement notdescribed Not described
#> 9 Aaron (2005) studydesign observational Observational
#> 10 Aaron (2005) vegetationtype notdescribed Not described
#> # … with 44 more rows
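The comment in the code above wonders whether there is a tidier route. One possible sketch, shown on toy data rather than on buffer_example, lets pivot_longer() split the column names itself through its names_pattern argument. The tibble toy_wide and all of its column names and values are invented for illustration, and the regular expression assumes that the category is everything before the first underscore.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per category_subcategory pair,
# plus one column (vegetationtype) with no underscore in its name.
toy_wide <- tibble(
  short_title = c("Study A", "Study B"),
  spatialscale_plot = c("Plot scale", NA),
  spatialscale_catchment = c(NA, "Catchment scale"),
  studydesign_observational = c("Observational", "Observational"),
  vegetationtype = c("Grass", NA)
)

toy_long <- toy_wide %>%
  pivot_longer(
    cols = -short_title,
    # split each column name into two new columns
    names_to = c("category", "category_suffix"),
    # group 1: text before the first underscore; group 2: the rest, if any
    names_pattern = "^([^_]+)_?(.*)$",
    values_to = "subcategory",
    values_drop_na = TRUE
  ) %>%
  # column names without an underscore yield an empty suffix; store it as NA
  mutate(category_suffix = na_if(category_suffix, ""))

toy_long
```

Whether this reads better than the map_chr() version is a matter of taste. It takes the prefix before the first underscore rather than the leading run of lowercase letters, so the two approaches would differ on column names that contain digits or capitals before the underscore.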
Suppose, however, that we began with condensed data, as we created in the creating a narrative synthesis table vignette.
condensed_buffer_example
#> # A tibble: 5 x 22
#> item_id short_title title year period google_scholar_… vegetated_strip…
#> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 2.06e7 Aaron (200… Inve… 2005 2005-… http://scholar.… Riparian buffer
#> 2 2.06e7 Aavik (200… What… 2008 2005-… http://scholar.… Field boundary
#> 3 2.06e7 Aavik (201… Quan… 2010 2010-… http://scholar.… Field boundary
#> 4 2.06e7 Abu-Zreig … Expe… 2004 2000-… http://scholar.… Vegetated filte…
#> 5 2.06e7 Abu-Zreig … Phos… 2003 2000-… http://scholar.… Vegetated filte…
#> # … with 15 more variables: nation <chr>, study_country <chr>,
#> # study_location <chr>, latitute <chr>, longitude <chr>, `Study length
#> # (years)` <chr>, `Time since intervention (years)` <chr>,
#> # farmingproductionsystem <chr>, farmingsystem <chr>,
#> # measurementquarter <chr>, spatialscale <chr>, striplocation <chr>,
#> # stripmanagement <chr>, studydesign <chr>, vegetationtype <chr>
We want to take these data and create the same long format as above.
condensed_buffer_example %>%
  pivot_longer(
    contains(buffer_variables),
    names_to = "category",
    values_to = "subcategory"
  ) %>%
  separate(subcategory,
    # there are a maximum of 8 different subcategories
    into = letters[1:8],
    sep = "; ",
    # pad cells holding fewer than 8 subcategories with NA, without warnings
    fill = "right"
  ) %>%
  pivot_longer(all_of(letters[1:8]),
    values_to = "subcategory"
  ) %>%
  # get rid of NAs
  filter(!is.na(subcategory)) %>%
  # drop the redundant column of placeholder names (a to h)
  select(-name) %>%
  # from here is just for display
  select(short_title, category, subcategory)
#> # A tibble: 59 x 3
#> short_title category subcategory
#> <chr> <chr> <chr>
#> 1 Aaron (2005) farmingproductionsystem Not described
#> 2 Aaron (2005) farmingsystem Not described
#> 3 Aaron (2005) measurementquarter Q1
#> 4 Aaron (2005) measurementquarter Q2
#> 5 Aaron (2005) spatialscale Catchment scale
#> 6 Aaron (2005) spatialscale Regional scale
#> 7 Aaron (2005) striplocation Riparian
#> 8 Aaron (2005) stripmanagement Not described
#> 9 Aaron (2005) studydesign Observational
#> 10 Aaron (2005) vegetationtype Not described
#> # … with 49 more rows
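The separate-then-pivot approach above needs the maximum number of subcategories in advance. As a hedged alternative, tidyr's separate_rows() splits a delimited cell into one row per piece without that upper bound. The sketch below uses an invented tibble, toy_condensed, and assumes the same "; " delimiter as the condensed data.

```r
library(dplyr)
library(tidyr)

# Hypothetical condensed data: several subcategories share one cell,
# separated by "; ".
toy_condensed <- tibble(
  short_title = c("Study A", "Study B"),
  spatialscale = c("Catchment scale; Regional scale", "Plot scale"),
  studydesign = c("Observational", NA)
)

toy_condensed_long <- toy_condensed %>%
  pivot_longer(
    cols = -short_title,
    names_to = "category",
    values_to = "subcategory",
    # categories a study does not report are dropped here
    values_drop_na = TRUE
  ) %>%
  # one row per subcategory, however many a cell contains
  separate_rows(subcategory, sep = "; ")

toy_condensed_long
```

Recent tidyr releases mark separate_rows() as superseded by separate_longer_delim(); either should suit this purpose, but which one is available depends on the installed version.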
Now that we have our data in long form, we can perform various analyses.
We might be interested in the number of studies per country in the systematic review, in which case we can use the original data.
bufferstrips %>%
  count(study_country)
#> # A tibble: 112 x 2
#> study_country n
#> * <chr> <int>
#> 1 Alberta, Canada 1
#> 2 Argentina 5
#> 3 Arkansas, Kentucky and Mississippi, USA 1
#> 4 Arkansas, USA 5
#> 5 Austria 3
#> 6 Belgium 15
#> 7 British Columbia, Canada 4
#> 8 California, USA 6
#> 9 Central and eastern USA 1
#> 10 Central district, Russia 1
#> # … with 102 more rows
But to see the number of observations in, say, each farming production system subcategory, we will use our long data.
buffer_example_long %>%
  filter(category == "farmingproductionsystem") %>%
  count(subcategory)
#> # A tibble: 5 x 2
#> subcategory n
#> * <chr> <int>
#> 1 Cropped fields (arable) 1
#> 2 Livestock 1
#> 3 Mixed conventional and organic, multiple farms (not described) 1
#> 4 Not described 3
#> 5 Other (please specify) 1
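The same idea extends to every category at once: because the long data hold the category in a column, count() can tabulate all category and subcategory pairs in one call. This is a minimal sketch on an invented tibble, toy_counts_input, not on buffer_example_long.

```r
library(dplyr)

# Hypothetical long data: one row per study, category and subcategory.
toy_counts_input <- tibble(
  short_title = c("Study A", "Study A", "Study B", "Study B", "Study C"),
  category = c("studydesign", "spatialscale", "studydesign",
               "spatialscale", "studydesign"),
  subcategory = c("Observational", "Plot scale", "Observational",
                  "Catchment scale", "Experimental")
)

# n counts rows; it equals the number of studies only if each study
# lists a given subcategory at most once.
toy_counts <- toy_counts_input %>%
  count(category, subcategory)

toy_counts
```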