Reason #1 Why I Don’t like ggplot

[UPDATE: My best use of academic Twitter yet.  I tweeted a link to this post and quickly got pwnd by the internet.  What I learned:  there is a function, ggsave(), that does this that I did not know about.  So use that instead of the hacky workaround I came up with.  Never would’ve learned if I didn’t ask!]

I use R these days primarily for visualizing, which means I’m supposed to love ggplot2 and everything else Hadley Wickham has put into R.  In fact, I do like a lot of the tidyverse, but the part I like the least is the one almost everyone likes the most: ggplot2.  While I could spend hours detailing all the parts that bother me, I do not have that much time, need to get tenure, and should try not to annoy the hordes.  So let me take the opportunity this blog post provides me to focus on one particularly problematic, hugely annoying feature of ggplot2: the special handling required when saving its output to a file in a function.

With base R, saving a plot inside a function works intuitively, that is, the same way it works outside of a function.  Below is ggplot code that works outside of a function.


jpeg(paste0('Figures/Summer2018/', expName, '_results_heatmap_MeanMedianRatio.jpeg'),
     res=300, width=6, height=6, units='in')
ggplot(data, aes(x=simulationScaling, y=repressionRate)) +
  geom_tile(aes(fill=avgDiffusion/medianDiffusion)) +
  labs(x='Network Scaling', y='Repression Rate') +
  theme(panel.grid.major=element_line(colour="grey95"),
        panel.grid.minor=element_line(colour='grey95'),
        panel.background=element_rect(fill="white"),
        panel.border=element_rect(colour="black", fill=NA)) +
  labs(fill='Mean/Median') +
  scale_fill_gradient(low='white', high='black')
dev.off()

If it were base R code and put inside a function, it would work as is.  But in ggplot world, that is not the case.  Instead, the graph needs to be saved to an object, and that object has to be passed to print() between jpeg() and dev.off().  See below for how ggplot has to be used inside of the function.


wickhamwhy <- ggplot(data, aes(x=simulationScaling, y=repressionRate)) +
  geom_tile(aes(fill=avgDiffusion/medianDiffusion)) +
  labs(x='Network Scaling', y='Repression Rate') +
  theme(panel.grid.major=element_line(colour="grey95"),
        panel.grid.minor=element_line(colour='grey95'),
        panel.background=element_rect(fill="white"),
        panel.border=element_rect(colour="black", fill=NA)) +
  labs(fill='Mean/Median') +
  scale_fill_gradient(low='white', high='black')
jpeg(paste0('Figures/Summer2018/', expName, '_results_heatmap_MeanMedianRatio.jpeg'),
     res=300, width=6, height=6, units='in')
print(wickhamwhy)
dev.off()
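For anyone who wants to try this, here is a minimal sketch of both routes wrapped in actual functions: the jpeg()/print()/dev.off() workaround above, and the ggsave() route from the update at the top.  As I understand it, a ggplot call only builds an object, and the drawing happens when that object is printed, which R does automatically at the top level but not inside a function.  Everything named here (the toy data frame, make_heatmap(), save_with_device(), save_with_ggsave(), and the temp-file paths) is made up for illustration and is not my real simulation code.

```r
library(ggplot2)

# Toy data standing in for simulation results (hypothetical values).
toy <- expand.grid(simulationScaling = 1:5,
                   repressionRate = seq(0.1, 0.5, by = 0.1))
toy$ratio <- toy$simulationScaling * toy$repressionRate

# Hypothetical helper: build the heatmap object without drawing it.
make_heatmap <- function(df) {
  ggplot(df, aes(x = simulationScaling, y = repressionRate)) +
    geom_tile(aes(fill = ratio)) +
    scale_fill_gradient(low = "white", high = "black") +
    labs(x = "Network Scaling", y = "Repression Rate", fill = "Mean/Median")
}

# Route 1: open a device, print() explicitly, close the device.
save_with_device <- function(df, path) {
  jpeg(path, res = 300, width = 6, height = 6, units = "in")
  print(make_heatmap(df))  # nothing autoprints inside a function
  dev.off()
  invisible(path)
}

# Route 2: hand the plot object to ggsave(), which handles the device itself.
save_with_ggsave <- function(df, path) {
  ggsave(filename = path, plot = make_heatmap(df),
         width = 6, height = 6, units = "in", dpi = 300)
  invisible(path)
}

out1 <- save_with_device(toy, file.path(tempdir(), "heatmap_device.jpeg"))
out2 <- save_with_ggsave(toy, file.path(tempdir(), "heatmap_ggsave.jpeg"))
```

Both routes should leave a JPEG on disk.  The ggsave() version skips the separate print() step, which is the line I was complaining about.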

 

Let me reiterate: you have to change how you make a plot just because the plot occurs inside a function, and this change is not required with base R, the much-maligned plotting system no one who is anyone is supposed to use.

That change is incredibly annoying!  First, it adds one line of code for every plot you want to make.  I just made a function creating 14 plots, so it makes my code 14 lines longer.  Second, it is not intuitive.  I am sure there is a beautiful, elegant, genius, grammatical reason that my simpleton mind cannot understand, but so what.  It doesn’t make sense!  The only reason to do it is because Hadley says so, and that’s not a reason I can get behind.

“Zack Zack Zack,” you say, “you are being unreasonable.  You are blowing an edge case way out of proportion.”  No!  If you are doing data visualization on datasets with more than a dozen variables, or many datasets with even a few variables, you want to quickly look at different relationships.  This step is fundamental to the research process, and doing it with any speed, and with an eye to future sanity when you return to the code in 6 months and have forgotten what everything does, requires a function that you can call.  That function will plot relationships, and ggplot is good for (some kinds of) plotting.  In other words, this issue is a key component of the research process.
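To make that concrete, here is a rough sketch of the kind of quick-look function I mean, written with ggsave() so each plot costs no extra print() line.  The function name plot_pairs(), its arguments, the output folder, and the use of the built-in mtcars data are all assumptions for the example, not code from my project.

```r
library(ggplot2)

# Hypothetical helper: one scatterplot per x variable against a single y
# variable, each saved to its own PNG file.
plot_pairs <- function(df, x_vars, y_var, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(0)
  for (x in x_vars) {
    # .data[[...]] lets aes() look up a column by its name stored in a string.
    p <- ggplot(df, aes(x = .data[[x]], y = .data[[y_var]])) +
      geom_point() +
      labs(x = x, y = y_var)
    path <- file.path(out_dir, paste0(y_var, "_vs_", x, ".png"))
    ggsave(path, plot = p, width = 6, height = 6, units = "in", dpi = 150)
    paths <- c(paths, path)
  }
  invisible(paths)  # return the file paths so the caller can check them
}

saved <- plot_pairs(mtcars, c("wt", "hp", "disp"), "mpg",
                    file.path(tempdir(), "quick_looks"))
```

Calling it with three x variables should write three files, and adding a variable to the vector adds a plot without touching the body of the function.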

I’m not alone.  Here is a very smart professor person who doesn’t use ggplot2. But here is another smart person who knows that smart person who does like ggplot2.

Again, let me emphasize that there is lots I like about ggplot2 and the larger tidyverse.  But I really really really do not like this feature, and it’s a very important one.
