Here��s my writeup on creating on what I call a ��stacked�� output from tidycensus: one record per each unique geography / year combination. This approach may be preferred if you��re trying to create a ��data profile�� for a specific geographic area, with rows representing the years, and columns representing the various variables of interest (total population, household population, workers by means of transportation to work, etc.)

I��m almost done! Hope this helps!

Chuck Purvis,

Hayward, California

Example #3. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables. ��Stacked�� results.

This is an example of stacking ��R�� data frames, where each record (row) represents a unique geography/year combination.

Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. I��m loading the ��R�� package ��plyr�� which helps in stacking / concatenating / pancaking data frames.

# Step 0: Load relevant libraries into each R-session.

library(tidyverse)

library(tidycensus)

library(janitor)

library(plyr) # This is needed for a function to concatenate a lot of files in one statement!

In this set of examples, I��m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California. Very similar to Example #2, but with one record (row) per each geography/year combination.

# Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018

# Example 3.15: Concatenate (pancake) data frames: lots of records

# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Example 3.17: Extract data for one place using a string search on the place name

#------------------------------------------------------------------------------------

# Set a list of variables to extract in each iteration of get_acs

# This is a LOT more efficient for variable naming!!!

selvars <- c(TotalPop_ = "B06001_001", # Total Population

Med_HHInc_ = "B19013_001", # Median Household Income

Agg_HHInc_ = "B19025_001", # Aggregate Household Income

HHldPop_ = "B11002_001", # Population in Households

Househlds_ = "B25003_001", # Total Households

Owner_OccDU_= "B25003_002", # Owner-Occupied Dwelling Units

Rent_OccDU_ = "B25003_003", # Renter-Occupied Dwelling Units

Med_HHVal_ = "B25077_001")

#------------------------------------------------------------------------------------

temp2005 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",

show_call = TRUE,output="wide", variables = selvars)

temp2005$Year <- "2005"

#------------------------------------------------------------------------------------

temp2006 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",

show_call = TRUE,output="wide", variables = selvars)

temp2006$Year <- "2006"

#------------------------------------------------------------------------------------

temp2007 <- get_acs(survey="acs1", year=2007, geography = "place", state = "CA",

show_call = TRUE,output="wide", variables = selvars)

temp2007$Year <- "2007"

#------------------------------------------------------------------------------------

These sets of codes are repeated for each ACS single-year of interest. Note that I��m adding a new variable ��Year�� to each data frame. Otherwise, I have no indication of the year of each data frame, other than the actual name of the file!

In the following ��R�� step, I��m using the ��dplyr�� function ��rbind.fill�� to concatenate a lot of data frames!

# Example 3.15: Concatenate (pancake) data frames: lots of records

# Concatenate All Years .....

# rbind can only concatenate two dataframes at a time. rbind.fill can do 2-or-more data

# frames to concatenate. It's a plyr function.

# temp0506 <- rbind(temp2005,temp2006)

# temp0507 <- rbind(temp0506,temp2007)

tempall <- rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,

temp2010,temp2011,temp2012,temp2013,temp2014,

temp2015,temp2016,temp2017,temp2018)

# Add a couple of useful variables!

# need to have a if/then to catch zero values.. work on this later.

# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E

# tempall$MeanHHInc <- tempall$Agg_HHInc_E / tempall$Househlds_E

# Sort the Results by GEOID and then by Year

tempalls <- tempall[order(tempall$GEOID,tempall$Year),]

setwd("~/Desktop/tidycensus_work/output")

# Export the data frames to CSV files, for importing to Excel, and applying finishing touches

write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")

In the following step I��m extracting data for large places in the San Francisco Bay Area.

# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Read in a file with the Large SF Bay Area Places, > 65,000 population

# and merge with the All Large California Places

bayplace <- read.csv("BayArea_Places_65K.csv")

Bayplace1 <- merge(bayplace,tempalls, by = c('NAME'))

Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]

write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")

dput(names(Bayplace1))

In the following step I��m extracting data for just ��Hayward�� city in the San Francisco Bay Area. This uses the ��R�� function ��grepl��. (That��s grep-ell, not grep-one).

# Example 3.17: Extract data for one place using a string search on the place name

# Extract one place at a time from Bayplace1

Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))

Hayward <- Hayward[order(Hayward$Year),]

Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E

Hayward$MeanHHInc <- Hayward$Agg_HHInc_E / Hayward$Househlds_E

selvarxxx <- c("Year","NAME", "GEOID.x", "NAME2",

"TotalPop_E", "Med_HHInc_E",

"Agg_HHInc_E", "HHldPop_E", "Househlds_E",

"Owner_OccDU_E", "Rent_OccDU_E",

"Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )

Hayward2 <- Hayward[selvarxxx]

write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")

#####################################################################################

This concludes Example #3: ��multiple geographies / multiple years/ multiple variables�� with only one record (row) per each geography/year combination, or the ��stacked�� output.