I’m still learning everything about sharing nicely. I’ve uploaded my R script to a GitHub Gist. This might be a better way of “sharing code”?

https://gist.github.com/chuckpurvis/281a786c06593afbf256f184a567a5ce

This script uses the R package CTPPr to pull the California place-level 2012-16 data on households by household size (5) by vehicles available (5) by tenure (5). That’s a pretty complex table with 125 cells per geography. And CTPPr automatically downloads the standard error (SE, not the 90% margin of error) along with each estimate, so that’s about 250 records for each piece of geography. Ouch.

The function “pivot_wider” in the R package “tidyr” (a companion to “dplyr”) is used to rewrite the dataset from this “long” format to more of a “wide” format. The script pulls the “estimates” (households) separately from the “standard errors”, pivots each, and then re-combines them.
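
Here is a minimal sketch of that long-to-wide step, not the code from the gist itself. The toy data, the column names (geoid, category, estimate, se) and the "est_" / "se_" prefixes are all assumptions made up for illustration; the actual CTPPr output will have different column names.

```r
library(dplyr)
library(tidyr)

# Toy long-format data: one row per place x table cell.
# Column names and values are hypothetical, not CTPPr's real output.
long <- tibble(
  geoid    = c("A", "A", "B", "B"),
  category = c("1-person, 0 vehicles", "1-person, 1 vehicle",
               "1-person, 0 vehicles", "1-person, 1 vehicle"),
  estimate = c(100, 250, 40, 90),
  se       = c(12, 20, 8, 11)
)

# Pivot the estimates: one column per table cell.
est_wide <- long %>%
  select(geoid, category, estimate) %>%
  pivot_wider(names_from = category, values_from = estimate,
              names_prefix = "est_")

# Pivot the standard errors the same way.
se_wide <- long %>%
  select(geoid, category, se) %>%
  pivot_wider(names_from = category, values_from = se,
              names_prefix = "se_")

# Re-combine the two pieces: one row per place.
wide <- left_join(est_wide, se_wide, by = "geoid")
```

In this toy case the four long rows collapse to two rows (one per place), with columns such as "est_1-person, 0 vehicles" that show how unwieldy the names get.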

The result of this first phase is a data set with many fewer rows, and lots of columns with really long variable names. But there’s a solution!

The next phase is to rename the variables and re-code their values into much shorter, mnemonic labels, and then to run a second round of pivot_wider calls to create a data set with much easier-to-read variable names!
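
A rough sketch of the rename, re-code, and pivot idea follows. Again, this is an illustration rather than the gist's code: the input column names, the category labels, and the short codes (hh1, hh2, v0) are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data with long, descriptive names (all hypothetical).
long2 <- tibble(
  geoid            = c("A", "A", "B", "B"),
  `Household size` = c("1-person household", "2-person household",
                       "1-person household", "2-person household"),
  `Vehicles available` = c("0 vehicles", "0 vehicles",
                           "0 vehicles", "0 vehicles"),
  estimate         = c(100, 60, 40, 25)
)

short <- long2 %>%
  # Rename the long column names to short ones.
  rename(hhsize = `Household size`, vehicles = `Vehicles available`) %>%
  # Re-code the long value labels into short mnemonic codes.
  mutate(
    hh  = case_when(hhsize == "1-person household" ~ "hh1",
                    hhsize == "2-person household" ~ "hh2"),
    veh = case_when(vehicles == "0 vehicles" ~ "v0")
  ) %>%
  select(geoid, hh, veh, estimate) %>%
  # Pivot again; the new column names are built from the short codes.
  pivot_wider(names_from = c(hh, veh), values_from = estimate,
              names_sep = "_")
```

The resulting columns are geoid, hh1_v0 and hh2_v0, which are much easier to type and read than the original labels.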

I would STRONGLY recommend learning the R package “dplyr” if you’re going to be analyzing census data using either CTPPr or tidycensus.

Take care,

Chuck Purvis

Hayward, California