I’m still learning everything about sharing nicely. I’ve uploaded my R script to my GitHub
Gist. This might be a better way of “sharing code”?
https://gist.github.com/chuckpurvis/281a786c06593afbf256f184a567a5ce
This script uses the R package CTPPr to pull the California place-level 2012-16 data on
households by household size (5) by vehicles available (5) by Tenure (5). That’s a pretty
complex table with 125 cells per geography. And CTPPr automatically downloads the standard
error (SE, not the 90% Margin of Error) along with each estimate, so that’s about 250
records for each piece of geography. Ouch.
The function “pivot_wider” in the R package “tidyr” (a close companion of “dplyr”) is used
to rewrite the dataset from this “long” format to more of a “wide” format. It’s pulling the
“estimates” (Households) separately from the “standard errors”, and then re-combining them.
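As a minimal sketch of that split-and-recombine step, here is a toy version. The place names, column names and values are made up for illustration; this is not the actual CTPPr output, nor the code in the Gist.

```r
library(dplyr)
library(tidyr)

# Toy long-format data (hypothetical columns and values, NOT real CTPP data):
# one row per place x household size x record type (estimate or standard error).
long <- tibble(
  place   = rep(c("Hayward", "Oakland"), each = 4),
  hh_size = rep(c("1-person household", "2-person household"), times = 4),
  type    = rep(rep(c("est", "se"), each = 2), times = 2),
  value   = c(100, 200, 10, 20, 300, 400, 30, 40)
)

# Pivot the estimates on their own: one row per place, one column per category.
est_wide <- long %>%
  filter(type == "est") %>%
  select(-type) %>%
  pivot_wider(names_from = hh_size, values_from = value, names_prefix = "est_")

# Pivot the standard errors the same way.
se_wide <- long %>%
  filter(type == "se") %>%
  select(-type) %>%
  pivot_wider(names_from = hh_size, values_from = value, names_prefix = "se_")

# Re-combine the two wide tables by place.
wide <- left_join(est_wide, se_wide, by = "place")
print(wide)
```

Eight long rows collapse to two wide rows, one per place, with column names such as “est_1-person household” built from the original category labels.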
The result of this first phase is a data set with many fewer rows, and lots of columns
with really long variable names. But there’s a solution!
The next phase is to rename variables and re-code the long category labels into much
shorter, mnemonic codes, and then to run a second set of pivot_wider calls to create a data
set with much easier to read variable names!
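A rough sketch of that re-code-then-pivot idea, again with made-up toy data. The short codes “hh1”, “own” and so on are hypothetical choices for this example, not necessarily the ones used in the Gist.

```r
library(dplyr)
library(tidyr)

# Toy long-format estimates (hypothetical labels and values, NOT real CTPP data).
long <- tibble(
  place   = rep(c("Hayward", "Oakland"), each = 4),
  hh_size = rep(c("1-person household", "2-person household"), times = 4),
  tenure  = rep(rep(c("Owner occupied", "Renter occupied"), each = 2), times = 2),
  est     = c(100, 200, 50, 60, 300, 400, 70, 80)
)

# Lookup vectors mapping the long labels to short mnemonic codes.
size_codes   <- c("1-person household" = "hh1", "2-person household" = "hh2")
tenure_codes <- c("Owner occupied" = "own", "Renter occupied" = "rent")

# Re-code the category values using the lookup vectors.
short <- long %>%
  mutate(
    hh_size = unname(size_codes[hh_size]),
    tenure  = unname(tenure_codes[tenure])
  )

# Pivot again: the column names are now built from the short codes.
wide <- short %>%
  pivot_wider(names_from = c(hh_size, tenure), values_from = est, names_sep = "_")
print(wide)
```

The resulting columns are named “hh1_own”, “hh2_rent” and so on, instead of carrying the full category labels.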
I would STRONGLY recommend learning the R packages “dplyr” and “tidyr” if you’re going to
be analyzing census data using either CTPPr or tidycensus.
Take care,
Chuck Purvis
Hayward, California