For myself and many colleagues, adopting R as our main analysis tool was only after years of using other products like SPSS, Stata, or Excel. Lets admit it – Excel is all too often what many of us use for initial data inspection, preprocessing, and some descriptive statistics. A common step is to create a quick pivot table in Excel to explore some of the different variables in the dataset. The idea of a pivot table is to use two variables to group data and then summarize their relationship.
For example, I often work with voter files that include voter history. I often want to group by the election on one axis and the ballots cast by party, age, geography, or some other available variable. My old habits have me drop this data into Excel and make a pivot table to summarize these variables. Something that I kept wanting to do with R is replicate this tool.
To create a similar tool in R requires a little script and dplyr. Here is what you need to do. First decide what the variable you will group on for the rows, then what variable to group on for the columns. Finally, decide how you want to measure the interaction between the rows and columns. Using dplyr’s summarise function, you can count, sum, calculate means or medians, and so on.
Here is how the script is set up. If you use it – change out “data” with your dataframe, and “rows” with the variable name you want to use, and “columnvariable” for the variable to use for the columns. How this works is
#Libraries Required
library(dplyr)
library(pander) #Optional if you want the last line to run
pivot <- data %>% group_by(rows) %>% summarise(n=n()) #Data is grouped to create the row titles you want
#Start summarizing by the rows
columns <- unique(columnvariable) #The unique values of the column variable are stored for use in the loop.
#Now, loop through the column variable to create summaries.
#Change the summarise function as desired.
for (i in seq_along(row_category)) {
temp <- data %>% group_by(row) %>% filter(column==columns[i]) %>% summarise(summary=n())
names(temp)[names(temp)=="summary"] <- columns[i] #Renames the summarized column with the particular column name to use in join.
pivot <- full_join(pivot, temp, by="row") #Adds the results for each analysis to make the new table.
}
rm("temp") #Delete the temp file for creating summary
#Be fancy and create markdown table result for a nice report!
pandoc.table(pivot, style="rmarkdown")
You can also add more summary functions to the loop, just remember to adjust the renaming step to address the additional summary columns. The last line in this script makes a table that can be used in Markdown or RMarkdown.