Installing locally (i.e. in your ~ directory):
install.packages("ggplot2")
System-wide installation in Ubuntu (which also saves space in /home):
sudo apt-get install r-cran-ggplot2
tran <- read.csv(filename, header=TRUE)
names(tran)
Show the first (head) or last (tail) few rows (optional argument n, e.g. n=5, to specify how many rows to display):
head(tran)
or
tail(tran, 3)
str(tran)
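To try the inspection commands above without a file on disk, the following sketch reads a small, made-up CSV from a string. The column names and values are assumptions chosen to match the names used in these notes. Note that since R 4.0.0 read.csv no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed explicitly here to make Category a factor.

```r
# Hypothetical sample data; the real file will have its own columns and values.
csv_text <- "Date,Amount,Category,User
2023-12-25,12.50,Grocery,A
2023-12-26,30.00,Travel,B
2023-12-27,8.75,Shopping,A"

# read.csv accepts a text argument in place of a file name.
tran <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

names(tran)        # column names
head(tran, n = 2)  # first two rows
str(tran)          # column types at a glance
```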
List the levels of the Category factor:
levels(tran$Category)
Add a weekday column. This requires tran to be a data frame and tran$Date to be in a date format:
tran$day <- weekdays(as.Date(tran$Date))
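If the dates are not written as YYYY-MM-DD, as.Date needs an explicit format string. A minimal sketch, assuming a hypothetical data frame whose dates are day/month/year strings:

```r
# Hypothetical sample data with non-ISO date strings.
tran <- data.frame(Date = c("25/12/2023", "26/12/2023"),
                   Amount = c(12.5, 30),
                   stringsAsFactors = FALSE)

# Parse the strings explicitly instead of relying on the default formats.
tran$Date <- as.Date(tran$Date, format = "%d/%m/%Y")

# weekdays() returns the day names in the current locale.
tran$day <- weekdays(tran$Date)
```

Because the names returned by weekdays() depend on the locale, format(tran$Date, "%u") (1 = Monday, ..., 7 = Sunday) may be a safer key when the result is compared against fixed strings.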
Order the rows by day of the week (DoW):
daily$DoW <- factor(daily$DoW, levels= c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
daily[order(daily$DoW), ]
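The reason for setting the levels explicitly is that order() on a factor sorts by level position rather than alphabetically. A small sketch with a made-up daily data frame (the Total column is an assumption for illustration):

```r
# Hypothetical daily totals, deliberately out of calendar order.
daily <- data.frame(DoW = c("Wednesday", "Sunday", "Friday"),
                    Total = c(30, 10, 20),
                    stringsAsFactors = FALSE)

week <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")

# Without explicit levels the sort would be alphabetical
# (Friday, Sunday, Wednesday).
daily$DoW <- factor(daily$DoW, levels = week)

# order() on a factor follows the level positions, i.e. calendar order here.
sorted <- daily[order(daily$DoW), ]
```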
barplot(table(tran$day))
t <- Sys.Date()
dayseq <- seq.Date(t,t+6,by=1)
Get the corresponding weekday names with
weekdays(dayseq)
or, in abbreviated form,
daynames <- weekdays(dayseq, abbreviate=TRUE)
Assume that you have a data frame with a variable named Category, and Category can be either Grocery, Shopping or Travel. We would like to anonymize the data by renaming the 3 categories to the numbers 1, 2, and 3.
In order to do that, first convert the variable into a factor using:
data$Category <- factor(data$Category)
Then, you can use levels(data$Category) to get a vector with only 3 elements, one per category. You can rename the levels of the factor data$Category the way you change a vector, by assigning to levels(data$Category).
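The renaming step can be sketched as follows, using a made-up data frame with the three categories from the text:

```r
# Hypothetical data frame with the three categories.
data <- data.frame(Category = c("Travel", "Grocery", "Shopping", "Grocery"),
                   stringsAsFactors = FALSE)
data$Category <- factor(data$Category)

# Levels are sorted alphabetically: Grocery, Shopping, Travel.
levels(data$Category)

# Replace the level labels; every row using a level is relabelled at once.
levels(data$Category) <- c("1", "2", "3")
```

The new labels are matched to the old levels by position, so it is worth printing levels(data$Category) before assigning to check which category ends up with which number.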
The problem arises when you edit an entry of a data frame column that is a factor. For example, if you want to change data[4,"Category"] to hello, you cannot simply use data[4,"Category"] <- "hello": since hello is not an existing level, R stores NA and issues a warning.
Here is what you should do instead:
data$Category <- as.character(data$Category)
data[4,"Category"] <- "hello"
data$Category <- factor(data$Category)
It is a bit annoying.
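A possible alternative that avoids the round trip through character is to register the new level first and then assign. This is only a sketch on a made-up data frame, not necessarily better than the approach above:

```r
# Hypothetical data frame with a factor column.
data <- data.frame(Category = factor(c("Grocery", "Travel", "Shopping", "Grocery")))

# Append the new level to the existing ones.
levels(data$Category) <- c(levels(data$Category), "hello")

# The assignment is now accepted because "hello" is a known level.
data[4, "Category"] <- "hello"
```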
qplot(x=Date, y=Amount, data=tran, geom=c('point','line'), color=Category, alpha = I(0.7))
qplot(factor(timeS), data=tran, geom="bar", fill=factor(Category))
Bar chart with the categories Grocery, Shopping and Travel represented by different colors. The x-axis is the time span.
ggplot(tran, aes(timeS, fill=Category)) + geom_bar() + facet_wrap(~ User)
A slight variant:
ggplot(tran, aes(timeS, fill=User)) + geom_bar() + facet_wrap(~ Category)
stat='identity' is the option that lets you plot y against x instead of the default count statistic.
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=Category), stat='identity')
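When several rows share the same timeS and Category, the stacked bars drawn with stat='identity' add up to the sum of Amount for that pair. To check the numbers behind the plot, the same totals can be computed with base R's aggregate(). The data frame below is made up for illustration:

```r
# Hypothetical transactions; two Grocery rows fall in the same time span.
tran <- data.frame(timeS = c(1, 1, 1, 2),
                   Category = c("Grocery", "Grocery", "Travel", "Grocery"),
                   Amount = c(10, 5, 20, 7))

# Total Amount for each (timeS, Category) pair.
totals <- aggregate(Amount ~ timeS + Category, data = tran, FUN = sum)
totals
```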
With a separate panel per user:
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=Category), stat='identity') + facet_wrap(~ User)
ggplot(tran) + geom_bar(aes(x=timeS, y=Amount, fill=User), stat='identity') + facet_wrap(~ Category, nrow = 2)
facet_wrap with respect to User for all categories, with the total amount of both users shown in grey in the background:
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=Category), stat='identity') + geom_bar(data=transform(tran, User=NULL), aes(x=timeS, y=Amount), stat='identity', alpha=I(0.2)) + facet_wrap(~User)
To have the same picture in the background of every facet, we need to add a layer whose data does not contain the facet variable. For example, in the previous case, transform(tran, User=NULL) gives you a copy of the data without the facet variable User. We plot a bar geometry of this data, and because it has no User column it is repeated in every facet.
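What transform(tran, User=NULL) does to the data can be checked in isolation. A small sketch on a made-up data frame:

```r
# Hypothetical transactions for two users.
tran <- data.frame(timeS = c(1, 1, 2),
                   User = c("A", "B", "A"),
                   Amount = c(10, 20, 5))

# Setting a column to NULL inside transform() drops it from the returned copy.
background <- transform(tran, User = NULL)

names(background)  # the User column is gone
names(tran)        # the original data frame is unchanged
```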
Alternate representation with Category and User interchanged:
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=User), stat='identity') + geom_bar(data=transform(tran, Category=NULL), aes(x=timeS, y=Amount), stat='identity', alpha=I(0.2)) + facet_wrap(~Category)
In this context, manysum is nothing but:
ggplot(tran) + geom_bar(aes(timeS, Amount, fill=User), stat='identity')