A great way of conveying regression results is through a forest plot. Widely used in meta-analyses to compare results across models, they are also a convenient way to visualize regression results. Wanting to make one for a presentation, I naturally turned to R and its seemingly infinite packages.

The package the internet recommends is forestplot. I actually found this package very frustrating to use for two reasons. First, it is designed to work with meta-analyses, so the user interested in plotting just one regression result has to modify a lot of function arguments. Second, it appears impossible to modify the size of axis labels. The documentation suggests this is possible, but hours of messing around got me nowhere. This difficulty could be because forestplot uses the lattice plotting framework, and I am firmly in the base R and ggplot world. Not worth learning a new approach for a small problem.

I then researched how to make a forest plot in ggplot. Forest plots in ggplot are doable, but I wasn’t pleased with the syntax required. Too much hacking for what should be really simple.

After having wasted many hours, I bit the bullet and wrote my own function to make a simple forestplot. (Many thanks to Alex Hughes for the initial code.) The code is very simple and so not worth releasing a R package. Indeed, the initial results for your model will probably require you to tweak the code. The main benefit of this function therefore is to save the time of setting up the initial code, and you should have a presentable .pdf file with just a couple of minutes of tweaking. The code is available at this Github repository and pasted below.

Some notes that may not be obvious from my comments. model should be the results of a regression. The plot has pretty wide margins, but they may not be wide enough depending on the length of your coefficient labels. Line width and point size are hard coded but very easy to change if you desire. The code is designed to produce .pdf output, and the outpath argument should end in ‘.pdf’.

''' The purpose of this script is to define an R function that makes pretty forest plots based on regression results. Function arguments: 1. model = model results 2. coefficients = character vector of coefficients to keep. Easiest to pass as names(model$coefficients) 3. coef_labels = labels for y-axis on plot 4. se = user can pass custom standard errors 5. zscore = how user defines width of confidence intervals 6. outfile = name, including filepath, where the output will be saved. Must end in .pdf 7. cex = magnification for text NB: This plotting works best when the coefficients have similar values. ''' forestPlot <- function(model=NULL, coefficients=NULL, coef_labels=NULL, se = NULL, zscore = 1.96, outpath, cex){ coef <- model$coefficients # Get coefficients from model if(is.null(coefficients) == TRUE){ # Generate coefficie coefficients <- names(model$coefficients) } coef <- coef[coefficients] # Keep user specific coefficients coef <- as.numeric(coef) if(is.null(se) == TRUE){ # If no standard errors given, take from model stdev <- summary(model)$coefficients[,2] # Get coefficient standard deviations stdev <- stdev[coefficients] stdev <- as.numeric(stdev) } if(is.null(se) == FALSE){ # If standard errors given, use those stdev <- sd stdev <- stdev[coefficients] stdev <- as.numeric(stdev) } lower <- coef - zscore*stdev # Lower value of confidence interval upper <- coef + zscore*stdev # Upper value of confidence interval vars <- length(coef) # Will be used to create y values if(is.null(coef_labels) == TRUE){ coef_labels <- coefficients } # Define x minimum to plot y-axis labels label_pos <- min(lower)*-1.05 #Plot all results pdf(outpath) par(mar=c(5,10,3,1)) # Need more space on left side of plot (6) for var. labels plot(x=coef, y=vars:1, xlim=c(min(lower)*1.1,max(upper)*1.5), pch = 20, xlab='Coefficient', bty='n', ylab='', yaxt='n', xaxt='n', cex.lab = cex) for(i in 1:vars){ lines(x=c(lower[i], upper[i]), y = rep(vars+1-i, each=2), lwd=2) } axis(1, cex.axis=cex) axis(2, at=vars:1, labels=coef_labels, las=1, lwd=0, pos=-label_pos, outer=TRUE, cex.axis=cex) # pos is x-axis location of labels abline(v=0, lty=2) dev.off() }

Just crawling through old internet posts… but, where’s the gravy? I wanted to see what this comes out looking like! (And, thanks for the h/t by the way)

[…] popular – ok, not wildly popular, but very useful for me – post documenting my function to create a forest plot. Recently, I made models using cross-validation via R’s caret package. The train() […]