Darren Dahly PhD Statistical Epidemiology

Reproducible tables with R and LaTeX

Like many people, I used to compose tables for publication by typing or pasting values into cells. This is usually joyless and always time-consuming. The inevitable typos are a nuisance, and having to remake multiple tables because of some minuscule change in the data or analysis is intolerable.

It doesn’t have to be this way. If you don’t already know where I am going with this, read up on Reproducible Research and Literate Programming. Despite a lot of great advice out there, I still haven’t found a satisfying way to produce summary tables.

Here is the skeleton of the table I want to make, produced with this LaTeX code. I want the table to summarize key characteristics for the entire study sample as well as for key subgroups.

To fill the table, I wrote a function in R that loops** over a data frame and pastes each column name (or category label) to the number of missing values and the respective summary value, which depends on the class of the column, along with all the other bits LaTeX needs to make the table. I made another function that leaves out the variable/category names and the number of missing values, and gives only the summary. I then created data frames with the variables I wanted to include, one for each group I am interested in. The last step was to run the appropriate functions on the appropriate data frames, then gather and print the resulting output as a single matrix. The output can then be cut and pasted directly into the LaTeX code.

The functions are here.
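
For readers who just want the general idea, a function of this kind might look roughly like the sketch below. This is not the linked code: the name summary_cells, the labels argument, and the choice of mean (SD) for numeric columns and n (%) for categories are all assumptions made for illustration.

```r
# Illustrative sketch only: summary_cells() is a hypothetical name, and the
# summaries chosen here (mean (SD), n (%)) are assumed, not the linked code.
summary_cells <- function(df, labels = TRUE) {
  out <- character(0)
  for (nm in names(df)) {
    x <- df[[nm]]
    n_miss <- sum(is.na(x))
    if (is.numeric(x)) {
      # Numeric column: a single row with mean (SD)
      value <- sprintf("%.1f (%.1f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
      label <- nm
      miss  <- as.character(n_miss)
    } else {
      # Anything else is treated as categorical: a header row for the
      # variable, then one row per category with n (%)
      f <- factor(x)
      n <- as.integer(table(f))
      value <- c("", sprintf("%d (%.1f\\%%)", n, 100 * n / sum(n)))
      label <- c(nm, paste0("\\quad ", levels(f)))
      miss  <- c(as.character(n_miss), rep("", nlevels(f)))
    }
    # labels = TRUE gives name & missing & summary; FALSE gives summary only
    cells <- if (labels) paste(label, miss, value, sep = " & ") else value
    out <- c(out, cells)
  }
  out
}

# Made-up example data
dat <- data.frame(age = c(30, 40, NA, 50),
                  sex = factor(c("F", "M", "F", NA)))
writeLines(paste(summary_cells(dat), "\\\\"))
```

Calling summary_cells(dat, labels = FALSE) returns only the summary column, which is the role the second function plays for the subgroup data frames.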

Create the output text as:

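# One column of output per group; the last column ends each LaTeX row with \\
table <- cbind(summaryA(dataA),
               summaryB(dataB),
               summaryB(dataC),
               c("\\\\"))
# Write plain text: no row names, no header, no quotation marks
write.table(table, "table.txt",
            row.names = FALSE, col.names = FALSE, quote = FALSE)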

Cut and paste the result into the LaTeX code to get this.

** I’ve tried to do this with apply, but am having trouble with the colnames.
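
One likely reason: apply converts the data frame to a matrix and hands the function each column as a bare vector, so the column name is not available inside the function (and mixed column types get coerced to character). A possible workaround, sketched here with a hypothetical helper called describe_col and made-up data, is to pass the names alongside the columns.

```r
# Hypothetical helper: describe one column given its values and its name
describe_col <- function(x, nm) {
  paste0(nm, ": ", sum(is.na(x)), " missing")
}

# Made-up example data
dat <- data.frame(age = c(30, 40, NA),
                  sex = factor(c("F", NA, NA)))

# Map() walks the columns and their names in parallel
res <- unlist(Map(describe_col, dat, names(dat)))

# Alternative: iterate over the names and index the data frame by name
res2 <- vapply(names(dat),
               function(nm) describe_col(dat[[nm]], nm),
               character(1))
```

Both versions keep each column's original class, which matters when the summary depends on whether a column is numeric or a factor.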