I’m still learning how to share nicely. I’ve uploaded my R script as a GitHub Gist, which might be a better way of “sharing code”.
This script uses the R package CTPPr to pull the California place-level 2012-16 data on households by household size (5) by vehicles available (5) by tenure (5). That’s a pretty complex table, with 125 cells per geography. And CTPPr automatically downloads the standard error (SE, not the 90% margin of error) along with each estimate, so that’s about 250 records for each piece of geography. Ouch.
The function “pivot_wider” in the R package “tidyr” (a tidyverse companion to “dplyr”) is used to reshape the dataset from this “long” format to more of a “wide” format. The script pulls the estimates (households) separately from the standard errors, and then recombines them.
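As a minimal sketch of that pivot-and-recombine step (this is not the Gist itself): the column names place, cell, est and se below are hypothetical stand-ins, not the actual CTPPr output columns, and the numbers are made-up toy values.

```r
library(dplyr)
library(tidyr)

# Toy "long" data: one row per place per table cell.
# Column names and values are assumptions for illustration only.
long <- tibble(
  place = rep(c("Place A", "Place B"), each = 2),
  cell  = rep(c("1-person household, 0 vehicles, Owner",
                "1-person household, 1 vehicle, Owner"), times = 2),
  est   = c(100, 250, 300, 700),
  se    = c(12, 20, 25, 40)
)

# Pivot the estimates on their own: one row per place, one column per cell.
est_wide <- long %>%
  select(place, cell, est) %>%
  pivot_wider(names_from = cell, values_from = est, names_prefix = "est: ")

# Pivot the standard errors the same way.
se_wide <- long %>%
  select(place, cell, se) %>%
  pivot_wider(names_from = cell, values_from = se, names_prefix = "se: ")

# Recombine: one row per place, estimate and SE columns side by side.
wide <- left_join(est_wide, se_wide, by = "place")
print(names(wide))
```

Note how long the resulting column names are, even in this tiny example.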
The result of this first phase is a data set with many fewer rows, and lots of columns with really long variable names. But there’s a solution!
The next phase is to rename variables and re-code their values into much shorter, mnemonic names, and then to run a second round of pivot_wider to create a data set with variable names that are much easier to read!
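A rough sketch of the re-code-then-pivot idea, again on made-up toy data. The short codes (hh1, v0, and so on) and the column names are hypothetical choices for illustration, not the ones used in the Gist.

```r
library(dplyr)
library(tidyr)

# Toy long data with verbose category labels (values are made up).
long <- tibble(
  place    = rep(c("Place A", "Place B"), each = 4),
  hhsize   = rep(c("1-person household", "2-person household"),
                 times = 2, each = 2),
  vehicles = rep(c("0 vehicles", "1 vehicle"), times = 4),
  est      = c(100, 250, 80, 400, 300, 700, 150, 900)
)

# Lookup tables from long labels to short mnemonic codes (hypothetical codes).
hh_codes  <- c("1-person household" = "hh1", "2-person household" = "hh2")
veh_codes <- c("0 vehicles" = "v0", "1 vehicle" = "v1")

short <- long %>%
  mutate(
    hh  = unname(hh_codes[hhsize]),     # re-code household size
    veh = unname(veh_codes[vehicles])   # re-code vehicles available
  ) %>%
  select(place, hh, veh, est) %>%
  # Build column names such as hh1_v0 from the two short codes.
  pivot_wider(names_from = c(hh, veh), values_from = est, names_sep = "_")

print(short)
```

Re-coding before the pivot matters here: pivot_wider builds the new column names from the category values, so short codes in give short names out.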
I would STRONGLY recommend learning the R package “dplyr” (and its companion “tidyr”) if you’re going to be analyzing census data using either CTPPr or tidycensus.
Take care,
Chuck Purvis
Hayward, California