incorrigible tidy::vert

music

a song with %>% references

Charles T. Gray https://twitter.com/cantabile
10-24-2019

incorrigible tidy::vert %>% exegesis()

This post was adapted for inclusion in a joint publication of mathematically-informed artworks engaging with Magritte’s pipe, an icon of surrealism. This manuscript is from a working group of researchers who met at the 2019 Heidelberg Laureate Forum.

Thank you, Laura Ación, Hao Ye, and James Goldie for checking the maths, and for the reflections on %>%.

In his 1929 surrealist painting, The Treachery of Images, René Magritte declares Ceci n’est pas une pipe (This is not a pipe). In so doing, he highlights that this is but an image, a representation of a pipe, and not truly a pipe itself.

The Treachery of Images, René Magritte, 1929. Image source: Wikipedia.

I recently wrote a song with lots of references to R and the tidyverse:: metapackage (Wickham 2017).

I didn't mean to %>% your bubble
I'm just an incorrigible tidy::vert

In this post, I’ll unpack the pipe operator, %>%, that features throughout the lyrics.

the %>% operator in the R language

In R for Data Science (Grolemund and Wickham 2017), the %>% pipe operator is the first major concept introduced in the programming section, which follows the sections on exploratory data analysis and data wrangling.

With Stefan Milton Bache’s magrittr:: package (Bache and Wickham 2014), a function call can be rewritten with the pipe:

# f(x) is equivalent to x %>% f() with magrittr::

library(magrittr)

# f(x)
round(3.1)
#> [1] 3

# is equivalent to x %>% f()
3.1 %>% round()
#> [1] 3

What is an operator?

We often forget that operators are, themselves, functions.

Brian A. Davey’s MAT4GA General Algebra coursebook (what I have on hand) provides this definition.


For \(n \in \mathbb N_0 := \mathbb N \cup \{ 0\}\), a map \(f : A^n \to A\) is called an n-ary operation on A.


For example, \(+\) is a function that takes two arguments, numbers, and returns a single number. Algebraically, 3 + 2 = 5 is shorthand for +(3, 2) = 5.
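R makes this concrete, because every infix operator is also available as an ordinary function when its name is wrapped in backticks. The following is a minimal sketch in base R; the variable names are only illustrative.

```r
# In R, infix operators are ordinary functions; backticks give access by name.
plus_is_function <- is.function(`+`)

# Prefix form: `+`(3, 2) is the same call as 3 + 2
prefix_sum <- `+`(3, 2)
infix_sum <- 3 + 2

# Because `+` is a function, it can be handed to other functions,
# here folding it over a vector: ((1 + 2) + 3) + 4
folded_sum <- Reduce(`+`, c(1, 2, 3, 4))

print(plus_is_function)
print(identical(prefix_sum, infix_sum))
print(folded_sum)
```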

For those with formal mathematical training, chaining several %>% operators in a single expression can be thought of as a coding instantiation of a composite of functions.

What is a composite?


Let \(f\) and \(g\) be real functions.

The composite of \(f\) with \(g\) is the real function \(g \circ f\) given by the formula \((g \circ f)(x) := g(f(x))\).


For reasons that only made sense to me once I reached graduate-level mathematics, we read a composite of functions from right to left.

And just to break our brains a little, algebraically, the composite operator is itself a function, so, in the same prefix notation as \(+(3, 2)\), we have \(g \circ f = \circ (g, f)\)!
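As a rough illustration in R, the composite operator can itself be written as a function that takes two functions and returns a third. The names compose, %after%, f, and g below are hypothetical, introduced only for this sketch; they are not part of base R or magrittr::.

```r
# Hypothetical helper: compose(g, f) returns the function x -> g(f(x)),
# that is, "g after f".
compose <- function(g, f) {
  function(x) g(f(x))
}

# Hypothetical infix spelling of the same function, so that
# (g %after% f) plays the role of the composite of f with g.
`%after%` <- compose

f <- function(x) x + 1  # applied first
g <- function(x) x * 2  # applied second

infix_result <- (g %after% f)(3)   # g(f(3)) = (3 + 1) * 2
prefix_result <- compose(g, f)(3)  # the same call in prefix form

print(infix_result)
print(prefix_result)
```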

The pipe operator, %>%, is the R-language counterpart of the composite operator \(\circ\) on real functions: x %>% f() %>% g() computes \((g \circ f)(x)\), but is read from left to right.
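A small sketch of this correspondence, assuming magrittr:: is installed; f and g are hypothetical toy functions. The last assignment uses what magrittr:: calls a functional sequence, where a pipeline that starts with the . placeholder defines a function rather than computing a value.

```r
library(magrittr)

f <- function(x) x + 1  # hypothetical toy function, applied first
g <- function(x) x * 2  # hypothetical toy function, applied second

# Nested form, read from right to left: g(f(3))
nested <- g(f(3))

# Piped form, read from left to right: 3 goes into f, the result goes into g
piped <- 3 %>% f() %>% g()

# A pipeline that starts with the placeholder . is a function in its own
# right; here it behaves like "g after f".
g_after_f <- . %>% f() %>% g()

print(nested)
print(piped)
print(g_after_f(3))
```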

Why do I love to %>%?

Here is an example with three functions: \((h \circ g \circ f)(x) := h(g(f(x))).\)

set.seed(39)

# get a random sample size between 20 & 100
sample(seq(20, 100), 1) %>% # this f(x) goes into
  # generate sample from normal distribution with
  # mean 50 & sd 0.5
  rnorm(., 50, 0.5) %>% # g, so now g(f(x)), which goes into
  # calculate mean of that sample
  mean() # h, so h(g(f(x)))
#> [1] 49.94228

To see how this is the \((h \circ g \circ f)(x)\) instantiation, reading from right to left, we take a look at the \(h(g(f(x)))\) instantiation of the same code.

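# this line of code is equivalent to the above
# h(g(f(x))) is less text
# but the algorithm is harder to ascertain
# (the value differs from the piped result only because
# the seed was not reset in between)
mean(rnorm(sample(seq(20, 100), 1), 50, 0.5))
#> [1] 49.98828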

The reader is invited to consider whether they agree with the author that the symbols are harder to read when packed so closely together in this \(h(g(f(x)))\) instantiation of the code. Arguably more importantly, one loses the ability to comment each component of the algorithm.
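To check the equivalence of the two forms directly, one can set the same seed immediately before each of them. This is a minimal sketch, assuming magrittr:: is installed; the names piped and nested are illustrative.

```r
library(magrittr)

# Piped form, with the seed set immediately beforehand
set.seed(39)
piped <- sample(seq(20, 100), 1) %>%
  rnorm(., 50, 0.5) %>%
  mean()

# Nested form, with the same seed set immediately beforehand
set.seed(39)
nested <- mean(rnorm(sample(seq(20, 100), 1), 50, 0.5))

# Same seed and same sequence of random draws, so the two results agree
print(identical(piped, nested))
```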

There is a downside to the %>%, however. The longer a composite becomes, the more difficult it is to identify errors.
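One way to soften this, sketched below under the assumption that magrittr:: is installed, is to name the intermediate results and check each one before moving on. The names n, draws, and sample_mean, and the particular checks, are illustrative and not part of the original example.

```r
library(magrittr)

set.seed(39)

# Step 1 (f): a random sample size between 20 and 100
n <- sample(seq(20, 100), 1)
stopifnot(n >= 20, n <= 100)

# Step 2 (g): n draws from a normal distribution with mean 50 and sd 0.5
draws <- n %>% rnorm(50, 0.5)
stopifnot(length(draws) == n)

# Step 3 (h): the mean of that sample
sample_mean <- draws %>% mean()
stopifnot(is.numeric(sample_mean), length(sample_mean) == 1)

print(sample_mean)
```

If one of the checks fails, the error points at a single named step rather than at the whole pipeline.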

 
On the train to Leipzig with snow falling
And my %>%s are getting too long

incorrigible tidy::vert %>% lyrics()

Caught the train to Leipzig, snow is falling
But I am not nearly done
Rube Goldberging this algorithm
But the sampling is off.

I didn't mean to %>% your bubble
I'm just an incorrigible tidy::vert

And I'm here to tell ya
There's some rhyme and reason
But there's a whole lot that can get fucked up

And I'm here to tell ya
There's scarce rhyme and reason
So let's brace for the shitstorm

Don't think I'll unravel
The mysteries of the beta distribution
On the train to Leipzig with snow falling
And my %>%s are getting too long

But I didn't mean to %>% your bubble
I'm just an incorrigible tidy::vert

And I'm here to tell ya
There's some rhyme and reason
But there's a whole lot that can get fucked up

And I'm here to tell ya
There's scarce rhyme and reason
So let's brace for the CRANstorm

And I didn't mean to %>% your bubble
Let your flame war flame itself out
And I didn't mean to %>% your bubble
Excuse Me, Do You Have a Moment 
 to Talk About Version Control?

And I didn't mean to %>% your bubble
I'm just an incorrigible tidy::vert
And I didn't mean to %>% your bubble
I'm just an incorrigible tidy::vert

iphone balanced on music stand quality recording

Don’t say I didn’t warn you about the sound quality.

References

Bache, Stefan Milton, and Hadley Wickham. 2014. “Magrittr: A Forward-Pipe Operator for R.”
Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science.
Wickham, Hadley. 2017. “Tidyverse: Easily Install and Load the ‘Tidyverse’.”