Jan 1, 2018
rvest is an R package that makes it easy to scrape text from web pages.
This code is from the GitHub page for the package. It shows how to scrape the rating, cast, and poster for The Lego Movie from IMDb.
    library(rvest)

    # Download and parse the IMDb page for The Lego Movie
    lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")

    # Scrape the rating and convert it to a number
    rating <- lego_movie %>%
      html_nodes("strong span") %>%
      html_text() %>%
      as.numeric()
    rating
    ## [1] 7.8

    # Scrape the names of the cast members
    cast <- lego_movie %>%
      html_nodes("#titleCast .itemprop span") %>%
      html_text()
    cast
    ##  [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"
    ##  [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
    ##  [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson"
    ## [10] "Will Ferrell"    "Will Forte"      "Dave Franco"
    ## [13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"

    # Scrape the website for the URL of the movie poster
    poster <- lego_movie %>%
      html_nodes("#img_primary img") %>%
      html_attr("src")
    poster

CSS selector

The trick to all of this is the text you pass to the html_nodes() function: a CSS selector that identifies which elements of the page to extract.
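To see how a selector picks out elements without depending on a live website, here is a minimal sketch that runs the same kinds of selectors against a small HTML fragment. The fragment, its ids and class names (`cast`, `name`, `poster`), and the values in it are all made up for illustration; they are not taken from IMDb.

```r
library(rvest)

# Hypothetical HTML fragment, invented for this example only
page <- read_html('
  <div id="cast">
    <span class="name">Alice</span>
    <span class="name">Bob</span>
  </div>
  <p><strong><span>7.8</span></strong></p>
  <img id="poster" src="poster.jpg">
')

# "#cast .name": elements with class "name" inside the element with id "cast"
cast_names <- page %>%
  html_nodes("#cast .name") %>%
  html_text()

# "strong span": any <span> nested inside a <strong>
rating <- page %>%
  html_nodes("strong span") %>%
  html_text() %>%
  as.numeric()

# "#poster": the element with id "poster"; html_attr() then reads its src attribute
poster <- page %>%
  html_nodes("#poster") %>%
  html_attr("src")
```

In this sketch, `#` matches an id, `.` matches a class, and a space between two parts means "nested anywhere inside". The selectors in the IMDb example are built from the same pieces.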