Example
#2. More Complex Tidycensus examples: multiple years, multiple geographies,
multiple variables.
This is a more complex example of a script to ���pull��� select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won���t work.
# Step 0: Load relevant libraries into each
R-session.
library(tidyverse)
library(tidycensus)
In this set of examples, I���m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.
# The get_acs function is run for each year
of the single-year ACS data, from 2005 to 2018.
#
Note that group quarters data was not collected in 2005, but started in
2006.
#
Note the "_05_" included in the variable name in the first
data "pull". That's a # # mnemonic
device that tells us it's for the year 2005.
#
Example 2.1 through 2.14: Run get_acs for large California Places,
2005-2018
#
Example 2.15: Merge
together data frames into a VERY wide database...lots of columns!
#
Example 2.16: Merge in
a file of Large San Francisco Bay Area places, and subset file.
#-------------------------------------------------------------------------------
place05
<- get_acs(survey="acs1", year=2005, geography =
"place", state =
"CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_05_ =
"B06001_001", # Total Population
Med_HHInc_05_ = "B19013_001", # Median Household
Income
Agg_HHInc_05_ = "B19025_001", # Aggregate
Household Income
HHldPop_05_ = "B11002_001", # Population in
Households
Househlds_05_ = "B25003_001", # Total Households
Owner_OccDU_05_=
"B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_05_ =
"B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_05_
= "B25077_001")) # Median Value of Owner-Occ DUs
place05$Avg_HHSize_05 <-
place05$HHldPop_05_E / place05$Househlds_05_E
place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E /
place05$Househlds_05_E
#------------------------------------------------------------------------------------
place06
<- get_acs(survey="acs1", year=2006, geography =
"place", state =
"CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_06_ =
"B06001_001", # Total Population
Med_HHInc_06_ = "B19013_001", # Median Household
Income
Agg_HHInc_06_ = "B19025_001", # Aggregate
Household Income
HHldPop_06_ = "B11002_001", # Population in
Households
Househlds_06_ = "B25003_001", # Total Households
Owner_OccDU_06_=
"B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_06_ =
"B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_06_ = "B25077_001")) # Median Value of
Owner-Occ DUs
place06$Avg_HHSize_06 <-
place06$HHldPop_06_E / place06$Househlds_06_E
place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E /
place06$Househlds_06_E
#------------------------------------------------------------------------------------
These
sets of codes are repeated for each single year ACS of interest, say for 2005
through 2018. Smarter ���R��� programmers will be able to tell me about ���do loops���
to make this process more efficient with magical wild cards.
The following step merges the data frames using the GEOID/NAME variables. This create a very ���wide��� database. One record per geography, and each column representing the variable/year combinations.
The ���merge��� function in ���R��� allows only two data frames to be joined by common columns at a time. I have yet to find a ���R��� function that allows me to merge all of the data frames at once.
#####################################################################################
#
Example 2.15: Merge together data frames into a VERY wide
database...lots of columns!
# Merge the dataframes, adding a year in
each step. All=TRUE is needed if # of places is different.
#
# (R-language newbie script...There are
probably more terse/exotic ways of doing this!)
place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)
place0507 <- merge(place0506,place07, by
= c('GEOID','NAME'), all=TRUE)
place0508 <- merge(place0507,place08, by
= c('GEOID','NAME'), all=TRUE)
place0509 <- merge(place0508,place09, by
= c('GEOID','NAME'), all=TRUE)
place0510 <- merge(place0509,place10, by
= c('GEOID','NAME'), all=TRUE)
place0511 <- merge(place0510,place11, by
= c('GEOID','NAME'), all=TRUE)
place0512 <- merge(place0511,place12, by
= c('GEOID','NAME'), all=TRUE)
place0513 <- merge(place0512,place13, by
= c('GEOID','NAME'), all=TRUE)
place0514 <- merge(place0513,place14, by
= c('GEOID','NAME'), all=TRUE)
place0515 <- merge(place0514,place15, by
= c('GEOID','NAME'), all=TRUE)
place0516 <- merge(place0515,place16, by
= c('GEOID','NAME'), all=TRUE)
place0517 <- merge(place0516,place17, by
= c('GEOID','NAME'), all=TRUE)
place0518 <- merge(place0517,place18, by
= c('GEOID','NAME'), all=TRUE)
place_all <- place0518
View(place_all)
Sometimes
you want to create smaller data frames with just a select number of columns.
Here���s a good approach for that.
# The following functions output useful
lists to the R-studio console which can then be edited
names(place_all)
dput(names(place_all)) # most useful for
subsetting variables
# The purpose here is to re-order and
select variables into a much more compact
#
database, for eventual exporting into a CSV file, and then into Excel
for finishing touches.
selvars <- c("GEOID",
"NAME",
"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E",
"TotalPop_08_E",
"TotalPop_09_E", "TotalPop_10_E",
"TotalPop_11_E", "TotalPop_12_E",
"TotalPop_13_E", "TotalPop_14_E",
"TotalPop_15_E", "TotalPop_16_E",
"TotalPop_17_E", "TotalPop_18_E")
# note the brackets for outputing new data
frame from previous data frame....
place_all2 <- place_all[selvars]
# View the Selected Variables Table
View(place_all2)
# Set directory for exported data files,
MacOS directory style
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for
importing to Excel, and applying finishing touches
write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")
write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")
In this last example, I���m reading in a file of large places in the Bay Area (manually derived from the CSV file created previous) in order to subset Bay Area ���large places��� from State of California ���large places���.
#####################################################################################
#
Example 2.16: Merge in a file of Large San Francisco Bay Area places,
and subset file.
# Read in a file with the Large SF Bay Area
Places, > 65,000 population
# and merge with the All Large California
Places
bayplace <-
read.csv("BayArea_Places_65K.csv")
Bayplace1 <-
merge(bayplace,place_all, by =
c('NAME'))
Bayplace2 <- merge(bayplace,place_all2,
by = c('NAME'))
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")
write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")
This concludes
Example #2: ���multiple geographies / multiple years/ multiple variables��� with
only one record (row) per each geography.