West Virginia Web Scraping

West Virginia Data Scraping, Web Scraping Tennessee, Data Extraction Tennessee, Scraping Web Data, Website Data Scraping, Email Scraping Tennessee, Email Database, Data Scraping Services, Scraping Contact Information, Data Scrubbing

Friday 28 November 2014

Scraping XML Tables with R

A couple of my good friends also recently started a sports analytics blog. We’ve decided to collaborate on a couple of studies revolving around NBA data found at www.basketball-reference.com. This will be the first part of that project!

Data scientists need data. The internet has lots of data. How can I get that data into R? Scrape it!

People have been scraping websites for as long as there have been websites. It’s gotten pretty easy using R/Python/whatever other tool you want to use. This post shows how to use R to scrape the demographic information for all NBA and ABA players listed at www.basketball-reference.com.

Here’s the code:

###### Settings

library(XML)

 ###### URLs

url<-paste0("http://www.basketball-reference.com/players/",letters,"/")

len<-length(url)

 ###### Reading data

tbl<-readHTMLTable(url[1])[[1]]

 for (i in 2:len)

    {tbl<-rbind(tbl,readHTMLTable(url[i])[[1]])}

 ###### Formatting data

colnames(tbl)<-c("Name","StartYear","EndYear","Position","Height","Weight","BirthDate","College")

tbl$BirthDate<-as.Date(tbl$BirthDate[1],format="%B %d, %Y")

Created by Pretty R at inside-R.org

And here’s the result:Result

Source: http://www.r-bloggers.com/scraping-xml-tables-with-r/

No comments:

Post a Comment