Example #2. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables.

This is a more complex example of a script to "pull" select variables from the ACS using my Census API key and the R package tidycensus.

Step #0. Always load the relevant packages/libraries when starting a new R session. Otherwise, the script won't work.

# Step 0: Load relevant libraries into each R-session.

library(tidyverse)

library(tidycensus)

In this set of examples, I'm extracting single-year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.

# The get_acs function is run for each year of the single-year ACS data, from 2005 to 2018.

# Note that group quarters data was not collected in 2005, but started in 2006.

# Note the "_05_" included in the variable names in the first data "pull". That's a mnemonic device that tells us it's for the year 2005.

# Example 2.1 through 2.14: Run get_acs for large California Places, 2005-2018

# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!

# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

#-------------------------------------------------------------------------------

place05 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",

show_call = TRUE, output="wide",

variables = c(TotalPop_05_ = "B06001_001", # Total Population

Med_HHInc_05_ = "B19013_001", # Median Household Income

Agg_HHInc_05_ = "B19025_001", # Aggregate Household Income

HHldPop_05_ = "B11002_001", # Population in Households

Househlds_05_ = "B25003_001", # Total Households

Owner_OccDU_05_= "B25003_002", # Owner-Occupied Dwelling Units

Rent_OccDU_05_ = "B25003_003", # Renter-Occupied Dwelling Units

Med_HHVal_05_ = "B25077_001")) # Median Value of Owner-Occ DUs

place05$Avg_HHSize_05 <- place05$HHldPop_05_E / place05$Househlds_05_E

place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E / place05$Househlds_05_E

#------------------------------------------------------------------------------------

place06 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",

show_call = TRUE, output="wide",

variables = c(TotalPop_06_ = "B06001_001", # Total Population

Med_HHInc_06_ = "B19013_001", # Median Household Income

Agg_HHInc_06_ = "B19025_001", # Aggregate Household Income

HHldPop_06_ = "B11002_001", # Population in Households

Househlds_06_ = "B25003_001", # Total Households

Owner_OccDU_06_= "B25003_002", # Owner-Occupied Dwelling Units

Rent_OccDU_06_ = "B25003_003", # Renter-Occupied Dwelling Units

Med_HHVal_06_ = "B25077_001")) # Median Value of Owner-Occ DUs

place06$Avg_HHSize_06 <- place06$HHldPop_06_E / place06$Househlds_06_E

place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E / place06$Househlds_06_E

#------------------------------------------------------------------------------------

This block of code is repeated for each single-year ACS of interest, here 2005 through 2018, changing only the year and the two-digit year tag in the variable names. Smarter "R" programmers will be able to tell me about "do loops" to make this process more efficient with magical wild cards.
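The year-by-year repetition could be collapsed into a function that is applied to each year. The following is a minimal sketch rather than the script used here: the names `base_vars`, `year_tag`, `vars_for_year`, and `pull_place_year` are hypothetical helpers, and it assumes tidycensus is installed and a Census API key is already set. The actual data pull is left commented out because it needs network access.

```r
# Census table codes shared by every year; the names become column prefixes.
base_vars <- c(TotalPop    = "B06001_001",  # Total Population
               Med_HHInc   = "B19013_001",  # Median Household Income
               Agg_HHInc   = "B19025_001",  # Aggregate Household Income
               HHldPop     = "B11002_001",  # Population in Households
               Househlds   = "B25003_001",  # Total Households
               Owner_OccDU = "B25003_002",  # Owner-Occupied Dwelling Units
               Rent_OccDU  = "B25003_003",  # Renter-Occupied Dwelling Units
               Med_HHVal   = "B25077_001")  # Median Value of Owner-Occ DUs

# Two-digit year tag, e.g. 2005 -> "05" (hypothetical helper)
year_tag <- function(year) sprintf("%02d", year %% 100)

# Append "_YY_" to each variable name, e.g. "TotalPop_05_"
vars_for_year <- function(year) {
  v <- base_vars
  names(v) <- paste0(names(v), "_", year_tag(year), "_")
  v
}

# Pull one year of data and add the two derived columns
pull_place_year <- function(year) {
  yy <- year_tag(year)
  df <- tidycensus::get_acs(survey = "acs1", year = year, geography = "place",
                            state = "CA", output = "wide",
                            variables = vars_for_year(year))
  hh <- df[[paste0("Househlds_", yy, "_E")]]
  df[[paste0("Avg_HHSize_", yy)]] <- df[[paste0("HHldPop_", yy, "_E")]] / hh
  df[[paste0("MeanHHInc_", yy)]]  <- df[[paste0("Agg_HHInc_", yy, "_E")]] / hh
  df
}

# Not run here (needs network access and an API key):
# place_list <- lapply(2005:2018, pull_place_year)
# names(place_list) <- paste0("place", year_tag(2005:2018))
```

With this approach the yearly results live in one list (`place_list`) instead of fourteen separately named data frames.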

The following step merges the data frames using the GEOID/NAME variables. This creates a very "wide" database: one record per geography, with each column representing a variable/year combination.

The "merge" function in "R" joins only two data frames at a time, matching on their common columns. I have yet to find an "R" function that allows me to merge all of the data frames at once.
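One base-R option for joining many data frames in a single call is Reduce(), which applies a two-argument function successively across a list. A minimal sketch with toy data frames follows; the objects `d05`, `d06`, `d07`, the helper `merge_two`, and all values are made up for illustration and are not ACS output.

```r
# Toy stand-ins for the yearly data frames (made-up values)
d05 <- data.frame(GEOID = c("01", "02"),
                  NAME  = c("A", "B"),
                  Pop_05 = c(10, 20))
d06 <- data.frame(GEOID = c("01", "02", "03"),
                  NAME  = c("A", "B", "C"),
                  Pop_06 = c(11, 21, 31))
d07 <- data.frame(GEOID = c("02", "03"),
                  NAME  = c("B", "C"),
                  Pop_07 = c(22, 32))

# Two-at-a-time merge, keeping places that appear in either input
merge_two <- function(x, y) merge(x, y, by = c("GEOID", "NAME"), all = TRUE)

# Reduce() merges the first two, then merges that result with the third, and so on
wide <- Reduce(merge_two, list(d05, d06, d07))
```

Applied to this script, the list would hold the yearly data frames place05 through place18. Places missing from a given year get NA in that year's columns, just as with the step-by-step merges.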

#####################################################################################

# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!

# Merge the data frames, adding a year in each step. all=TRUE is needed if the number of places differs between years.

# (R-language newbie script...There are probably more terse/exotic ways of doing this!)

place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)

place0507 <- merge(place0506,place07, by = c('GEOID','NAME'), all=TRUE)

place0508 <- merge(place0507,place08, by = c('GEOID','NAME'), all=TRUE)

place0509 <- merge(place0508,place09, by = c('GEOID','NAME'), all=TRUE)

place0510 <- merge(place0509,place10, by = c('GEOID','NAME'), all=TRUE)

place0511 <- merge(place0510,place11, by = c('GEOID','NAME'), all=TRUE)

place0512 <- merge(place0511,place12, by = c('GEOID','NAME'), all=TRUE)

place0513 <- merge(place0512,place13, by = c('GEOID','NAME'), all=TRUE)

place0514 <- merge(place0513,place14, by = c('GEOID','NAME'), all=TRUE)

place0515 <- merge(place0514,place15, by = c('GEOID','NAME'), all=TRUE)

place0516 <- merge(place0515,place16, by = c('GEOID','NAME'), all=TRUE)

place0517 <- merge(place0516,place17, by = c('GEOID','NAME'), all=TRUE)

place0518 <- merge(place0517,place18, by = c('GEOID','NAME'), all=TRUE)

place_all <- place0518

View(place_all)

Sometimes you want to create smaller data frames with just a select number of columns. Here's a good approach for that.

# The following functions output useful lists to the R-studio console which can then be edited

names(place_all)

dput(names(place_all)) # most useful for subsetting variables

# The purpose here is to re-order and select variables into a much more compact

# database, for eventual exporting into a CSV file, and then into Excel for finishing touches.

selvars <- c("GEOID", "NAME",

"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E", "TotalPop_08_E",

"TotalPop_09_E", "TotalPop_10_E", "TotalPop_11_E", "TotalPop_12_E",

"TotalPop_13_E", "TotalPop_14_E", "TotalPop_15_E", "TotalPop_16_E",

"TotalPop_17_E", "TotalPop_18_E")

# Note the square brackets: indexing a data frame by a vector of column names returns a new data frame with just those columns.

place_all2 <- place_all[selvars]
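Because the column names follow a regular pattern, the same kind of selection could also be built by pattern matching instead of typing each name. A small sketch using base R's grep() on a toy data frame follows; the `toy` object and its values are hypothetical.

```r
# Toy data frame with the same naming pattern as place_all (made-up values)
toy <- data.frame(GEOID = "0600001", NAME = "Place A",
                  TotalPop_05_E = 1, TotalPop_05_M = 2,
                  Med_HHInc_05_E = 3, TotalPop_06_E = 4)

# Keep only the TotalPop estimate columns ("_E"), skipping margins of error ("_M")
pop_cols <- grep("^TotalPop_[0-9]{2}_E$", names(toy), value = TRUE)

toy2 <- toy[c("GEOID", "NAME", pop_cols)]
```

The columns come back in the order they appear in the source data frame, so this relies on the years having been merged in sequence.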

# View the Selected Variables Table

View(place_all2)

# Set directory for exported data files, MacOS directory style

setwd("~/Desktop/tidycensus_work/output")

# Export the data frames to CSV files, for importing to Excel, and applying finishing touches

write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")

write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")

In this last example, I'm reading in a file of large places in the Bay Area (manually derived from the CSV file created previously) in order to subset the Bay Area "large places" from the State of California "large places".

#####################################################################################

# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Read in a file with the Large SF Bay Area Places, > 65,000 population

# and merge with the All Large California Places

bayplace <- read.csv("BayArea_Places_65K.csv")

Bayplace1 <- merge(bayplace,place_all, by = c('NAME'))

Bayplace2 <- merge(bayplace,place_all2, by = c('NAME'))

write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")

write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")
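Since merge() keeps only matching rows by default, a Bay Area place whose NAME is spelled even slightly differently in the two files would be dropped without warning. A quick check along these lines can catch that; this is a sketch on toy data, and the `bay` and `all_places` objects, their columns, and their values are hypothetical.

```r
# Toy stand-ins for bayplace and place_all (made-up names and values)
bay <- data.frame(NAME   = c("Place A city, California",
                             "Place B City, California"),  # note the capital "C"
                  County = c("X", "Y"))
all_places <- data.frame(NAME = c("Place A city, California",
                                  "Place B city, California",
                                  "Place C city, California"),
                         TotalPop_18_E = c(1, 2, 3))

# Names in the Bay Area file that have no exact match in the statewide file
missing_names <- setdiff(bay$NAME, all_places$NAME)

# Default merge is an inner join: unmatched rows are silently dropped
matched <- merge(bay, all_places, by = "NAME")
```

If `missing_names` is not empty, the spelling in the Bay Area file needs fixing before the merge.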

This concludes Example #2: "multiple geographies / multiple years / multiple variables" with only one record (row) per geography.