If you’ve been introduced to R as a simple way to do data analysis, you might have come across this strange operator, %>%. It’s called a pipe because it passes data from one function to the next. Here’s an example of subsetting and transforming data using the pipe from the magrittr package:

library(magrittr)

dat <- airquality %>%
  subset(Ozone > 40) %>%
  transform(Celsius = (Temp - 32) * (5/9)) %>%
  head()

dat

##    Ozone Solar.R Wind Temp Month Day  Celsius
## 1     41     190  7.4   67     5   1 19.44444
## 29    45     252 14.9   81     5  29 27.22222
## 30   115     223  5.7   79     5  30 26.11111
## 40    71     291 13.8   90     6   9 32.22222
## 62   135     269  4.1   84     7   1 28.88889
## 63    49     248  9.2   85     7   2 29.44444

The first line can be read, “I’m going to make a new object called dat and it’s going to start with the airquality data frame”. The %>% at the end of the first line pipes the data frame to the next line, which is the subset function. If you look at the documentation for subset, the first argument is x, an “object to be subsetted”. The %>% takes the data frame immediately before it and places it in the first argument of the function immediately following it. So airquality becomes the object to be subsetted in the subset function.
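To check that reading, here is a minimal sketch comparing the piped call with the same call written out directly. The object names piped and direct are only for illustration.

```r
library(magrittr)

# Piped form: airquality is placed in subset()'s first argument, x.
piped <- airquality %>% subset(Ozone > 40)

# Direct form: the same call with the data frame typed as the first argument.
direct <- subset(airquality, Ozone > 40)

# Compare the two results; they should be the same data frame.
identical(piped, direct)
```

If the pipe works the way it is described above, the comparison should come back TRUE.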

Since the pipe has already filled the first argument of subset with the data frame, the only argument I type is the next one: a logical expression that selects the rows to keep (i.e., subsets the data frame). I want to keep all rows where the ozone values are above 40. Rows where the ozone value is missing are dropped as well, because subset treats a missing result as a row not to keep.

Once the concept sinks in, you can easily read the rest of the code. The output of subset is piped to the first argument of transform. The Celsius = ... expression that I typed inside transform follows the data frame as the next argument, and the output of transform is passed on to the first argument of head.

dplyr

So why use the pipe? For one thing, you avoid reassigning the data frame every time you change it. Here’s the subset/transformation from above without the pipes.

dat <- subset(airquality, Ozone > 40)
dat <- transform(dat, Celsius = (Temp - 32) * (5/9))
dat <- head(dat)

dat

##    Ozone Solar.R Wind Temp Month Day  Celsius
## 1     41     190  7.4   67     5   1 19.44444
## 29    45     252 14.9   81     5  29 27.22222
## 30   115     223  5.7   79     5  30 26.11111
## 40    71     291 13.8   90     6   9 32.22222
## 62   135     269  4.1   84     7   1 28.88889
## 63    49     248  9.2   85     7   2 29.44444

With the pipe, you not only avoid reassigning the data frame every time, you also don’t have to type the data frame object as the first argument in each function.
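Another way to skip the reassignment without the pipe is to nest the calls inside each other. The sketch below is an illustration, not part of the original example; the object names nested and piped are hypothetical. It shows the nested form next to the piped form so the reading order can be compared.

```r
library(magrittr)

# Nested form: read from the inside out (subset, then transform, then head).
nested <- head(
  transform(
    subset(airquality, Ozone > 40),
    Celsius = (Temp - 32) * (5/9)
  )
)

# Piped form: read from top to bottom, in the order the steps run.
piped <- airquality %>%
  subset(Ozone > 40) %>%
  transform(Celsius = (Temp - 32) * (5/9)) %>%
  head()

# Both forms should give the same six rows.
identical(nested, piped)
```

The nested version avoids the repeated dat <- lines, but the first step ends up buried in the middle of the expression. The piped version keeps the steps in the order they happen.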

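Admittedly, the amount of typing being saved is minimal. The other main reason to use pipes is the benefit of chaining dplyr functions together. Those functions were written with the pipe in mind: each one takes a data frame as its first argument and returns a data frame. dplyr also re-exports %>% from magrittr, so loading dplyr is enough to use the pipe.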

library(dplyr)

dat <- airquality %>%
  filter(Ozone > 40) %>%
  mutate(Celsius = (Temp - 32) * (5/9)) %>%
  head()

dat

##   Ozone Solar.R Wind Temp Month Day  Celsius
## 1    41     190  7.4   67     5   1 19.44444
## 2    45     252 14.9   81     5  29 27.22222
## 3   115     223  5.7   79     5  30 26.11111
## 4    71     291 13.8   90     6   9 32.22222
## 5   135     269  4.1   84     7   1 28.88889
## 6    49     248  9.2   85     7   2 29.44444

Once you get used to using the pipe, you gain the ability to quickly read a chain of dplyr functions as a sequence of steps, top to bottom. That can make writing and reviewing analysis code noticeably faster.
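As a sketch of what a slightly longer chain looks like, the following adds grouping and summarising steps to the same data. The group_by and summarise steps, the monthly object, and the mean_ozone column name are not from the example above; they are assumptions chosen for illustration.

```r
library(dplyr)

monthly <- airquality %>%
  filter(!is.na(Ozone)) %>%             # drop rows with missing ozone readings
  group_by(Month) %>%                   # one group per month
  summarise(mean_ozone = mean(Ozone))   # average ozone within each month

monthly
```

Read top to bottom, the chain says: start with airquality, drop the missing ozone readings, group by month, then average the ozone within each month.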

Using Pipes in R

dplyr

Nathan Byers
