A Simple Function for Forest Plots

A great way of conveying regression results is through a forest plot.  Widely used in meta-analyses to compare results across models, they are also a convenient way to visualize regression results.  Wanting to make one for a presentation, I naturally turned to R and its seemingly infinite packages.

The package the internet recommends is forestplot.  I actually found this package very frustrating to use for two reasons.  First, it is designed to work with meta-analyses, so the user interested in plotting just one regression result has to modify a lot of function arguments.  Second, it appears impossible to modify the size of axis labels.  The documentation suggests this is possible, but hours of messing around got me nowhere.  This difficulty could be because forestplot uses the lattice plotting framework, and I am firmly in the base R and ggplot world.  Not worth learning a new approach for a small problem.

I then researched how to make a forest plot in ggplot.  Forest plots in ggplot are doable, but I wasn’t pleased with the syntax required.  Too much hacking for what should be really simple.

After having wasted many hours, I bit the bullet and wrote my own function to make a simple forestplot.  (Many thanks to Alex Hughes for the initial code.)  The code is very simple and so not worth releasing a R package.  Indeed, the initial results for your model will probably require you to tweak the code.  The main benefit of this function therefore is to save the time of setting up the initial code, and you should have a presentable .pdf file with just a couple of minutes of tweaking.  The code is available at this Github repository and pasted below.

Some notes that may not be obvious from my comments.  model should be the results of a regression.   The plot has pretty wide margins, but they may not be wide enough depending on the length of your coefficient labels.  Line width and point size are hard coded but very easy to change if you desire.  The code is designed to produce .pdf output, and the outpath argument should end in ‘.pdf’.

'''
The purpose of this script is to define an R function that makes pretty forest plots based on regression results.
Function arguments:
 1. model = model results
 2. coefficients = character vector of coefficients to keep. Easiest to pass as names(model$coefficients)
 3. coef_labels = labels for y-axis on plot
 4. se = user can pass custom standard errors
 5. zscore = how user defines width of confidence intervals
 6. outfile = name, including filepath, where the output will be saved. Must end in .pdf
 7. cex = magnification for text

NB: This plotting works best when the coefficients have similar values. 
'''
forestPlot <- function(model=NULL, coefficients=NULL, coef_labels=NULL, se = NULL, zscore = 1.96, outpath, cex){
 coef <- model$coefficients # Get coefficients from model

 if(is.null(coefficients) == TRUE){ # Generate coefficie
 coefficients <- names(model$coefficients)
 }

 coef <- coef[coefficients] # Keep user specific coefficients
 coef <- as.numeric(coef)

 if(is.null(se) == TRUE){ # If no standard errors given, take from model
 stdev <- summary(model)$coefficients[,2] # Get coefficient standard deviations
 stdev <- stdev[coefficients]
 stdev <- as.numeric(stdev)
 }

 if(is.null(se) == FALSE){ # If standard errors given, use those
 stdev <- sd
 stdev <- stdev[coefficients]
 stdev <- as.numeric(stdev)
 }

 lower <- coef - zscore*stdev # Lower value of confidence interval
 upper <- coef + zscore*stdev # Upper value of confidence interval

 vars <- length(coef) # Will be used to create y values

 if(is.null(coef_labels) == TRUE){
 coef_labels <- coefficients
 }

 # Define x minimum to plot y-axis labels
 label_pos <- min(lower)*-1.05


 #Plot all results
 pdf(outpath)
 par(mar=c(5,10,3,1)) # Need more space on left side of plot (6) for var. labels
 plot(x=coef, y=vars:1, xlim=c(min(lower)*1.1,max(upper)*1.5), pch = 20, xlab='Coefficient', bty='n', ylab='', yaxt='n', xaxt='n', cex.lab = cex)
 for(i in 1:vars){
 lines(x=c(lower[i], upper[i]), y = rep(vars+1-i, each=2), lwd=2)
 }
 axis(1, cex.axis=cex)
 axis(2, at=vars:1, labels=coef_labels, las=1, lwd=0, pos=-label_pos, outer=TRUE, cex.axis=cex) # pos is x-axis location of labels
 abline(v=0, lty=2)
 dev.off()
}

 

 

Advertisements

One comment

  1. Just crawling through old internet posts… but, where’s the gravy? I wanted to see what this comes out looking like! (And, thanks for the h/t by the way)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: