Debdeep Bhattacharya


Basics of R

18 Jul 2019

Installing packages in R

Installing locally (i.e. in your ~ directory):

  install.packages("ggplot2") 

System-wide installation in Ubuntu (and saving space in /home):

sudo apt-get install r-cran-ggplot2

Reading a CSV

tran <- read.csv(filename, header=TRUE)
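As a minimal, self-contained sketch of the call above, the snippet below writes a small made-up table to a temporary file and reads it back. The column names (Date, Amount, Category, User) are inferred from the examples later in this post, and the values are invented for illustration.

```r
# Toy data; the values are invented for illustration only
toy <- data.frame(
  Date     = c("2019-07-01", "2019-07-02", "2019-07-03"),
  Amount   = c(12.5, 40, 7.25),
  Category = c("Grocery", "Travel", "Grocery"),
  User     = c("A", "B", "A")
)

# Write the toy data to a temporary CSV file
filename <- tempfile(fileext = ".csv")
write.csv(toy, filename, row.names = FALSE)

# header = TRUE treats the first line of the file as column names
tran <- read.csv(filename, header = TRUE)
```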

Exploring a CSV

names(tran)
head(tran)

or

tail(tran, 3)
str(tran)
levels(tran$Category)
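One caveat: since R 4.0.0, read.csv() keeps text columns as character vectors by default, so levels(tran$Category) returns NULL unless the column is converted to a factor first. The sketch below uses a made-up data frame (not the real tran) to show both cases.

```r
# Made-up data, built with stringsAsFactors = FALSE to mimic the R >= 4.0 default
tran <- data.frame(
  Category = c("Grocery", "Travel", "Grocery", "Shopping"),
  Amount   = c(12.5, 40, 7.25, 19.99),
  stringsAsFactors = FALSE
)

# A character column has no levels, so this is NULL
before <- levels(tran$Category)

# Convert explicitly; the levels are the sorted unique values
tran$Category <- factor(tran$Category)
cats <- levels(tran$Category)
```

Passing stringsAsFactors = TRUE to read.csv() should give the same result at read time.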

Data manipulation using date

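# Add a column with the weekday name of each transaction date
tran$day <- weekdays(as.Date(tran$Date))
# `daily` is assumed to be a separate data frame with a weekday-name column DoW.
# Fix the level order so that sorting runs Sunday to Saturday instead of alphabetically.
daily$DoW <- factor(daily$DoW, levels= c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
daily[order(daily$DoW), ]
# Bar chart of the number of transactions per weekday
barplot(table(tran$day))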
t <- Sys.Date()
# Keep dayseq as Date objects so that weekdays() can be applied to it afterwards
dayseq <- seq.Date(t, t+6, by=1)

Get the corresponding weekday names with

weekdays(dayseq)

or, in abbreviated form,

daynames <- weekdays(dayseq, abbreviate=TRUE)
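To make the date sequence reproducible, here is a small sketch with a fixed start date instead of Sys.Date(). The date 2019-07-18 is just an example, and the names returned by weekdays() depend on your locale, so the sketch also computes a locale-independent weekday number.

```r
# Fixed start date for reproducibility (an arbitrary example date)
start <- as.Date("2019-07-18")

# Seven consecutive dates; by = 1 steps one day at a time
dayseq <- seq(start, start + 6, by = 1)

# Full and abbreviated weekday names (the language depends on the locale)
fullnames <- weekdays(dayseq)
daynames  <- weekdays(dayseq, abbreviate = TRUE)

# Locale-independent weekday number: 1 = Monday, ..., 7 = Sunday
daynum <- as.integer(format(dayseq, "%u"))
```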

Renaming attributes:

Assume that you have a data frame with a variable named Category, and Category can be either Grocery, Shopping, or Travel. We would like to anonymize the data by renaming the three categories to the numbers 1, 2, and 3.

In order to do that, first convert the variable into a factor using:

data$Category <- factor(data$Category)

Then, you can use levels(data$Category) to get a vector with only 3 elements, one per category. You can rename the categories by assigning to levels(data$Category) the same way you would change a vector.
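A minimal sketch of the renaming, using made-up data. Which category gets which number follows the alphabetical order of the levels here; that mapping is an assumption, so pick whatever order suits you.

```r
# Made-up data for illustration
data <- data.frame(Category = c("Grocery", "Travel", "Shopping", "Grocery"))
data$Category <- factor(data$Category)

# The levels are the sorted unique values: Grocery, Shopping, Travel
old_levels <- levels(data$Category)

# Replace the level names; every matching entry is relabelled at once
levels(data$Category) <- c("1", "2", "3")

# The entries are now 1, 3, 2, 1
renamed <- as.character(data$Category)
```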

The catch is editing a single entry of a factor column. For example, if you want to change data[4,"Category"] to hello, you cannot simply run data[4,"Category"] <- "hello": since hello is not one of the existing levels, R issues an "invalid factor level" warning and stores NA. Here is what you should do instead:

  1. Change the type of the Category variable to character using:
data$Category <- as.character(data$Category)
  2. Edit the value:
data[4,"Category"] <- "hello"
  3. Change the variable type back to factor:
data$Category <- factor(data$Category)

It is a bit annoying.
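The three steps can be sketched on made-up data as follows. The second half shows a possible shortcut: append the new value to the levels first, then assign directly. It gives the same values, although the order of the levels may differ.

```r
# Made-up data for illustration
data <- data.frame(Category = factor(c("Grocery", "Travel", "Shopping", "Grocery")))

# Steps 1 to 3: go through character and back to factor
data$Category <- as.character(data$Category)
data[4, "Category"] <- "hello"
data$Category <- factor(data$Category)

# Possible shortcut: add the new level first, then assign directly
data2 <- data.frame(Category = factor(c("Grocery", "Travel", "Shopping", "Grocery")))
levels(data2$Category) <- c(levels(data2$Category), "hello")
data2[4, "Category"] <- "hello"
```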

ggplot2 examples

library(ggplot2)
qplot(x=Date, y=Amount, data=tran, geom=c('point','line'), color=Category, alpha = I(0.7))
qplot(factor(timeS), data=tran, geom="bar", fill=factor(Category))
ggplot(tran, aes(timeS, fill=Category)) + geom_bar() + facet_wrap(~ User) 

A slight variant:

ggplot(tran, aes(timeS, fill=User)) + geom_bar() + facet_wrap(~ Category)

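stat='identity' is the option that makes geom_bar() use the y values in the data as bar heights, instead of the default stat='count', which plots the number of rows for each x. (geom_col() is a shorthand for geom_bar(stat='identity').)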

ggplot(tran) + geom_bar(aes(timeS, Amount, fill=Category), stat='identity')
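To see what the two settings compute, here is a sketch in base R that reproduces the bar heights without plotting. The data are made up, and timeS is assumed to be a time-slot label; with stacked bars and positive amounts, the total height per timeS under stat='identity' is the sum of Amount.

```r
# Made-up data; timeS is assumed to be a time-slot label
tran <- data.frame(
  timeS  = c("morning", "morning", "evening", "evening", "evening"),
  Amount = c(10, 5, 20, 1, 4)
)

# Heights under the default stat = 'count': number of rows per timeS
counts <- table(tran$timeS)

# Total stacked heights under stat = 'identity': sum of Amount per timeS
totals <- tapply(tran$Amount, tran$timeS, sum)
```

Here the evening bar has height 3 when counting rows and 25 when using the amounts.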

With a separate panel per user:

ggplot(tran) + geom_bar(aes(timeS, Amount, fill=Category), stat='identity') + facet_wrap(~ User)
ggplot(tran) + geom_bar(aes(x=timeS, y=Amount, fill=User), stat='identity') + facet_wrap(~ Category, nrow = 2)
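# The second geom_bar() layer uses data without the User column, so it is not split by facet and is drawn in every panel as a faint background of the overall totals
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=Category), stat='identity') + geom_bar(data=transform(tran, User=NULL), aes(x=timeS, y=Amount), stat='identity', alpha=I(0.2)) + facet_wrap(~User)
# Same idea with the roles of User and Category swapped
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=User), stat='identity') + geom_bar(data=transform(tran, Category=NULL), aes(x=timeS, y=Amount), stat='identity', alpha=I(0.2)) + facet_wrap(~Category)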

In this context, manysum is nothing but

ggplot(tran) + geom_bar(aes(timeS, Amount, fill=User), stat='identity')