Here���s my writeup on creating on what I call a ���stacked��� output from tidycensus: one record per each unique geography / year combination. This approach may be preferred if you���re trying to create a ���data profile��� for a specific geographic area, with rows representing the years, and columns representing the various variables of interest (total population, household population, workers by means of transportation to work, etc.)

I���m almost done! Hope this helps!

Chuck Purvis,
Hayward, California

Example #3. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables. ���Stacked��� results.

This is an example of stacking ���R��� data frames, where each record (row) represents a unique geography/year combination.

Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. I���m loading the ���R��� package ���plyr��� which helps in stacking / concatenating / pancaking data frames.

# Step 0: Load relevant libraries into each R-session.

 

library(tidyverse)

library(tidycensus)

library(janitor)

library(plyr) # This is needed for a function to concatenate a lot of files in one statement!

 

In this set of examples, I���m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California. Very similar to Example #2, but with one record (row) per each geography/year combination.

#  Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018

#  Example 3.15:             Concatenate (pancake) data frames: lots of records

#  Example 3.16:             Merge in a file of Large San Francisco Bay Area places, and subset file.

#  Example 3.17:             Extract data for one place using a string search on the place name

#------------------------------------------------------------------------------------

# Set a list of variables to extract in each iteration of get_acs

#  This is a LOT more efficient for variable naming!!!

 

selvars  <- c(TotalPop_   = "B06001_001", # Total Population

              Med_HHInc_  = "B19013_001", # Median Household Income

              Agg_HHInc_  = "B19025_001", # Aggregate Household Income

              HHldPop_    = "B11002_001", # Population in Households

              Househlds_  = "B25003_001", # Total Households

              Owner_OccDU_= "B25003_002", # Owner-Occupied Dwelling Units

              Rent_OccDU_ = "B25003_003", # Renter-Occupied Dwelling Units

              Med_HHVal_  = "B25077_001")

#------------------------------------------------------------------------------------

temp2005  <- get_acs(survey="acs1", year=2005, geography = "place",   state = "CA",

                     show_call = TRUE,output="wide", variables = selvars)

temp2005$Year       <- "2005"

#------------------------------------------------------------------------------------

temp2006  <- get_acs(survey="acs1", year=2006, geography = "place",   state = "CA",

                     show_call = TRUE,output="wide", variables = selvars)

temp2006$Year       <- "2006"

#------------------------------------------------------------------------------------

temp2007  <- get_acs(survey="acs1", year=2007, geography = "place",   state = "CA",

                     show_call = TRUE,output="wide", variables = selvars)

temp2007$Year       <- "2007"

#------------------------------------------------------------------------------------

 

These sets of codes are repeated for each ACS single-year of interest. Note that I���m adding a new variable ���Year��� to each data frame. Otherwise, I have no indication of the year of each data frame, other than the actual name of the file!

 

In the following ���R��� step, I���m using the ���dplyr��� function ���rbind.fill��� to concatenate a lot of data frames!

 

#  Example 3.15:             Concatenate (pancake) data frames: lots of records

#  Concatenate All Years .....

#  rbind can only concatenate two dataframes at a time. rbind.fill can do 2-or-more data

#   frames to concatenate. It's a plyr function.

# temp0506 <- rbind(temp2005,temp2006)

# temp0507 <- rbind(temp0506,temp2007)

 

tempall <- rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,

                      temp2010,temp2011,temp2012,temp2013,temp2014,

                      temp2015,temp2016,temp2017,temp2018)

 

# Add a couple of useful variables!

# need to have a if/then to catch zero values.. work on this later.

# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E

# tempall$MeanHHInc  <- tempall$Agg_HHInc_E / tempall$Househlds_E

 

# Sort the Results by GEOID and then by Year

 

tempalls <- tempall[order(tempall$GEOID,tempall$Year),]

 

setwd("~/Desktop/tidycensus_work/output")

 

# Export the data frames to CSV files, for importing to Excel, and applying finishing touches

 

write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")

 
 

In the following step I���m extracting data for large places in the San Francisco Bay Area.

 

#  Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Read in a file with the Large SF Bay Area Places, > 65,000 population

# and merge with the All Large California Places

 

bayplace <- read.csv("BayArea_Places_65K.csv")

 

Bayplace1 <- merge(bayplace,tempalls,  by = c('NAME'))

Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]

 

write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")

 

dput(names(Bayplace1))

 
 

In the following step I���m extracting data for just ���Hayward��� city in the San Francisco Bay Area. This uses the ���R��� function ���grepl���. (That���s grep-ell, not grep-one).

 

#  Example 3.17:  Extract data for one place using a string search on the place name

#     Extract one place at a time from Bayplace1

#

Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))

Hayward <- Hayward[order(Hayward$Year),]

 

Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E

Hayward$MeanHHInc  <- Hayward$Agg_HHInc_E / Hayward$Househlds_E

 

selvarxxx <- c("Year","NAME", "GEOID.x", "NAME2", 

             "TotalPop_E", "Med_HHInc_E",

             "Agg_HHInc_E", "HHldPop_E", "Househlds_E",

             "Owner_OccDU_E", "Rent_OccDU_E",

             "Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )

 

Hayward2 <- Hayward[selvarxxx]

write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")

 

#####################################################################################

 

This concludes Example #3: ���multiple geographies / multiple years/ multiple variables��� with only one record (row) per each geography/year combination, or the ���stacked��� output.