A Slightly Less Simple Function for Forest Plots

This post is a sequel to my wildly popular – ok, not wildly popular, but very useful for me – post documenting my function to create a forest plot. Recently, I made models using cross-validation via R’s caret package. The train() function does the heavy lifting, but the problem is that its output is a of the train class, which I presume was created for caret. This is a problem because my function assumes it is receiving a generic model, so it fails.

The new function, pasted at the end of this post, makes two modifications. The main modification is to take the coefficients from model$finalModel$coefficients. You of course could pass those directly to the model argument in forestPlot, but that would require more typing, and I am lazy. The second update, which is not because of train, it to add na.rm=TRUE to min() and max(). Those arguments should have been in the original function, oh well. The third change is simply to change the x-axis label, which is hard coded. That should probably be an argument to pass to the function; oh well.

In response to demand from Alex Hughes, I have included an example of the output. Initial results suggest that violence, small groups, children, and seeing more individuals protest correlates with larger protests in the future, while the presence of police and large groups correlates with smaller protests.

forestPlot_crossvalidated_Models_DVAvg_ProtestImages_Countryday_EGESHKPKSKVEUA

'''
This formula has been updated to handle results of train class, from caret package.
The purpose of this script is to define an R function that makes pretty forest plots based on regression results.
Function arguments:
1. model = model results from caret
2. coefficients = character vector of coefficients to keep. Easiest to pass as names(model$coefficients)
3. coef_labels = labels for y-axis on plot
4. se = user can pass custom standard errors
5. zscore = how user defines width of confidence intervals
6. outfile = name, including filepath, where the output will be saved. Must end in .pdf
7. cex = magnification for text
 
NB: This plotting works best when the coefficients have similar values, i.e. consider standardizing your variables. 
'''

forestPlot <- function(model=NULL, coefficients=NULL, coef_labels=NULL, se = NULL, zscore = 1.96, outpath, cex){
 coef <- model$finalModel$coefficients # Get coefficients from model

if(is.null(coefficients) == TRUE){ # Generate coefficient names
 coefficients <- names(model$finalModel$coefficients)
 }

coef <- coef[coefficients] # Keep user specific coefficients
 coef <- as.numeric(coef)

if(is.null(se) == TRUE){ # If no standard errors given, take from model
 stdev <- summary(model)$coefficients[,2] # Get coefficient standard deviations
 stdev <- stdev[coefficients]
 stdev <- as.numeric(stdev)
 }

if(is.null(se) == FALSE){ # If standard errors given, use those
 stdev <- sd
 stdev <- stdev[coefficients]
 stdev <- as.numeric(stdev)
 }

lower <- coef - zscore*stdev # Lower value of confidence interval
 upper <- coef + zscore*stdev # Upper value of confidence interval

vars <- length(coef) # Will be used to create y values

if(is.null(coef_labels) == TRUE){
 coef_labels <- coefficients
 }

# Define x minimum to plot y-axis labels
label_pos <- min(lower, na.rm=TRUE)*-1.05

#Plot all results
 pdf(outpath)
 par(mar=c(7,10,3,1)) # Need more space on left side of plot (6) for var. labels
 plot(x=coef, y=vars:1, xlim=c(min(lower, na.rm=TRUE)*1.1,max(upper, na.rm=TRUE)*1.5), pch = 20, xlab='Standardized Coefficient', bty='n', ylab='', yaxt='n', xaxt='n', cex.lab = cex)
 for(i in 1:vars){
 lines(x=c(lower[i], upper[i]), y = rep(vars+1-i, each=2), lwd=2)
 }
 axis(1, cex.axis=cex)
 axis(2, at=vars:1, labels=coef_labels, las=1, lwd=0, pos=-label_pos, outer=TRUE, cex.axis=cex) # pos is x-axis location of labels
 abline(v=0, lty=2)
 dev.off()
}

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Share this:

Leave a comment Cancel reply