Example #3. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables. ���Stacked��� results.
This is an example of stacking ���R��� data frames, where each record (row) represents a unique geography/year combination.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. I���m loading the ���R��� package ���plyr��� which helps in stacking / concatenating / pancaking data frames.
# Step 0: Load relevant libraries into each
R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
library(plyr) # This is needed for a
function to concatenate a lot of files in one statement!
In this set of examples, I���m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California. Very similar to Example #2, but with one record (row) per each geography/year combination.
#
Example 3.1 through 3.14: Run get_acs for large California Places,
2005-2018
#
Example 3.15:
Concatenate (pancake) data frames: lots of records
#
Example 3.16: Merge in
a file of Large San Francisco Bay Area places, and subset file.
#
Example 3.17: Extract
data for one place using a string search on the place name
#------------------------------------------------------------------------------------
# Set a list of variables to extract in
each iteration of get_acs
#
This is a LOT more efficient for variable naming!!!
selvars
<- c(TotalPop_ =
"B06001_001", # Total Population
Med_HHInc_ = "B19013_001", # Median Household
Income
Agg_HHInc_ = "B19025_001", # Aggregate
Household Income
HHldPop_ = "B11002_001", # Population in
Households
Househlds_ = "B25003_001", # Total Households
Owner_OccDU_=
"B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_ =
"B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_ = "B25077_001")
#------------------------------------------------------------------------------------
temp2005
<- get_acs(survey="acs1", year=2005, geography =
"place", state =
"CA",
show_call =
TRUE,output="wide", variables = selvars)
temp2005$Year <- "2005"
#------------------------------------------------------------------------------------
temp2006
<- get_acs(survey="acs1", year=2006, geography =
"place", state =
"CA",
show_call =
TRUE,output="wide", variables = selvars)
temp2006$Year <- "2006"
#------------------------------------------------------------------------------------
temp2007
<- get_acs(survey="acs1", year=2007, geography =
"place", state =
"CA",
show_call =
TRUE,output="wide", variables = selvars)
temp2007$Year <- "2007"
#------------------------------------------------------------------------------------
These sets of
codes are repeated for each ACS single-year of interest. Note that I���m adding a
new variable ���Year��� to each data frame. Otherwise, I have no indication of the
year of each data frame, other than the actual name of the file!
In the following
���R��� step, I���m using the ���dplyr��� function ���rbind.fill��� to concatenate a lot of
data frames!
#
Example 3.15:
Concatenate (pancake) data frames: lots of records
#
Concatenate All Years .....
#
rbind can only concatenate two dataframes at a time. rbind.fill can do
2-or-more data
#
frames to concatenate. It's a plyr function.
# temp0506 <- rbind(temp2005,temp2006)
# temp0507 <- rbind(temp0506,temp2007)
tempall <-
rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,
temp2010,temp2011,temp2012,temp2013,temp2014,
temp2015,temp2016,temp2017,temp2018)
# Add a couple of useful variables!
# need to have a if/then to catch zero
values.. work on this later.
# tempall$Avg_HHSize <-
tempall$HHldPop_E / tempall$Househlds_E
# tempall$MeanHHInc <- tempall$Agg_HHInc_E /
tempall$Househlds_E
# Sort the Results by GEOID and then by
Year
tempalls <-
tempall[order(tempall$GEOID,tempall$Year),]
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for
importing to Excel, and applying finishing touches
write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")
In the following
step I���m extracting data for large places in the San Francisco Bay Area.
#
Example 3.16: Merge in a file of Large San Francisco Bay Area places,
and subset file.
# Read in a file with the Large SF Bay Area
Places, > 65,000 population
# and merge with the All Large California
Places
bayplace <-
read.csv("BayArea_Places_65K.csv")
Bayplace1 <-
merge(bayplace,tempalls, by = c('NAME'))
Bayplace1 <-
Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")
dput(names(Bayplace1))
In the following
step I���m extracting data for just ���Hayward��� city in the San Francisco Bay Area.
This uses the ���R��� function ���grepl���. (That���s grep-ell, not grep-one).
# Example 3.17:
Extract data for one place using a string search on the place name
#
Extract one place at a time from Bayplace1
#
Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))
Hayward <- Hayward[order(Hayward$Year),]
Hayward$Avg_HHSize <- Hayward$HHldPop_E
/ Hayward$Househlds_E
Hayward$MeanHHInc <- Hayward$Agg_HHInc_E /
Hayward$Househlds_E
selvarxxx <-
c("Year","NAME", "GEOID.x",
"NAME2",
"TotalPop_E", "Med_HHInc_E",
"Agg_HHInc_E", "HHldPop_E", "Househlds_E",
"Owner_OccDU_E", "Rent_OccDU_E",
"Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )
Hayward2 <- Hayward[selvarxxx]
write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")
#####################################################################################
This concludes
Example #3: ���multiple geographies / multiple years/ multiple variables��� with
only one record (row) per each geography/year combination, or the ���stacked��� output.