(I’ve been having outgoing mail issues, as well…)
Just a reminder to all that the single-year 2019 American Community Survey data was released to the public on Thursday, September 17, 2020.
This is the fifteenth year of single-year ACS data (2005 to 2019). (Reminder that group quarters data collection didn’t start until 2006.)
The data was actually under "embargo" (since 9/15/20) and accessible only to the media for analysis. But that's just a short two-day window of opportunity to get the scoop, so to speak.
Here’s a nice summary article, on health insurance coverage, by the nonpartisan Center on Budget and Policy Priorities:
https://www.cbpp.org/research/health/uninsured-rate-rose-again-in-2019-furt…
The bad news is that the share of the population without health insurance is up to 9.2 percent in 2019, compared to a historic low of 8.6 percent in 2016 and a historic high of 15.5 percent in 2010. (Health insurance was first collected in the ACS in 2008.)
And here’s the link to a 9/15/20 20-page report by the US Census Bureau on Health Insurance Coverage in the US based on data from the Current Population Survey (CPS) and the American Community Survey (ACS).
https://www.census.gov/content/dam/Census/library/publications/2020/demo/p6…
https://www.census.gov/library/publications/2020/demo/p60-271.html
Note that the 2019 CPS data was collected February through April 2020, just at the start of our ongoing pandemic. I'd recommend reading both the CBPP and Census Bureau reports!
If any of the national or local media have released 2019 ACS results on transportation-related topics, it would be useful to post them here!
Stay safe!
Chuck Purvis,
Hayward, California
formerly of the Metropolitan Transportation Commission (San Francisco, California).
Can I get a list of places within my multi-county region? Yes, but it takes a little work!
Finding Places within Counties for Your State and Region.
This is an "R" script that uses readily downloadable files from the decennial census to build a file of places within counties within states. This can then be subsetted to extract lists of places within a one-county or multi-county region. The following examples use 2010 Census data for the states of California, Texas, and New York, and can hopefully be adapted easily once the 2020 Census data becomes available in spring 2021.
The standard Census geography hierarchy diagram shows that “places” are below “states.” “Places” are NOT below “County” geographic levels. This is because there are some “places” in the United States that straddle two-or-more counties! This is an inconvenience if the analyst is interested in, say, the population characteristics of all places (cities, census designated places) within a one-county or multi-county region.
A simple, creative way to find places-within-counties is to use GIS software to layer both county boundaries and place boundaries to select your places of interest. But this process doesn’t say anything about the population characteristics of places that potentially straddle two-or-more counties!
But there are two Census “summary levels” that can be used to clearly identify places-within-counties: summary level 160 (state-place); and summary level 155 (state-place-county). The #160 summary level is the more commonly used of the two. The #155 summary level is pretty much a secret summary level only used by curious census data wonks.
The two summary levels, combined, provide an accurate database of places-within-counties, including lists of places that straddle two-or-more counties.
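As a toy illustration of the two summary levels working together (a hypothetical place code and made-up counts): a straddling place shows up once at summary level 160 and once per county at summary level 155, and the 155 parts sum to the 160 total.

```r
# Hypothetical place "99990" straddling counties 001 and 013 (made-up numbers).
geo <- data.frame(sumlev = c("160", "155", "155"),
                  county = c(NA, "001", "013"),
                  place  = c("99990", "99990", "99990"),
                  pop100 = c(1200, 700, 500),
                  stringsAsFactors = FALSE)
parts <- subset(geo, sumlev == "155")  # place-county parts
whole <- subset(geo, sumlev == "160")  # full place
sum(parts$pop100) == whole$pop100      # TRUE: the parts add up to the whole
```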
The simplest approach is to download the available PL 94-171 “Redistricting File” for the US state of interest. Here’s the link to the Census 2010 Redistricting Files:
https://www2.census.gov/census_2010/redistricting_file--pl_94-171/
What is important is the “geographic header” file that includes all geographic summary levels of interest, including summary levels 155 and 160.
The following example extracts summary level 155 and 160 data for places in California, from Summary File #1 (SF1), though the PL 94-171 geographic header file is identical!
The filename “cageo2010.sf1” means “California” + “Geographic” + “2010” from “SF1”
This is a good example of building an “R” data frame from a fixed-format data file.
###########################################################################
## Extract the California Place Names and their Counties from the
## 2010 Decennial Census master geographic file, California, SF #1
###########################################################################
# install.packages("tidyverse")
# install.packages("dplyr")
library(dplyr)
# setwd("~/Desktop/tidycensus_work/output")
setwd("~/Desktop/Census/2010_Census/ca2010.sf1") # directory containing the geo-header file
# setwd("~/Desktop/ca2010.sf1")
x <- readLines(con="cageo2010.sf1") # Very large fixed format file, 843,287 observations for Calif
CalGeo <- data.frame(# fileid = substr(x, 1, 6),
# stusab = substr(x, 7, 8),
sumlev = substr(x, 9, 11),
# geocomp= substr(x, 12, 13),
state = substr(x, 28, 29),
county = substr(x, 30, 32),
place = substr(x, 46, 50),
# tract = substr(x, 55, 60),
# blkgrp = substr(x, 61, 61),
# block = substr(x, 62, 65),
arealand=substr(x, 199, 212),
areawatr=substr(x, 213, 226),
name = substr(x, 227, 316),
pop100 = substr(x, 319, 327),
                     hu100  = substr(x, 328, 336),
                     stringsAsFactors = FALSE) # keep columns as character, not factors (R < 4.0 default)
CalGeo$GEOID <- paste0(CalGeo$state, CalGeo$place)
# Convert counts and areas from character to numeric, so that later
# comparisons (e.g. pop100.x < pop100.y) are numeric, not alphabetical
CalGeo$pop100   <- as.numeric(CalGeo$pop100)
CalGeo$hu100    <- as.numeric(CalGeo$hu100)
CalGeo$arealand <- as.numeric(CalGeo$arealand)
CalGeo$areawatr <- as.numeric(CalGeo$areawatr)
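As an aside, base R's read.fwf() can do the same fixed-width parse in one call (it is slower than readLines()/substr() on very large files). The widths below are derived from the substr() positions above, with negative widths skipping unused columns; the one-record demo file is synthetic, with illustrative San Francisco values.

```r
# Same extraction via read.fwf(); negative widths skip fields. Widths sum to 336.
widths <- c(-8, 3,           # skip fileid/stusab; sumlev (cols 9-11)
            -16, 2, 3,       # skip to state (28-29) and county (30-32)
            -13, 5,          # skip to place (46-50)
            -148, 14, 14,    # skip to arealand (199-212) and areawatr (213-226)
            90,              # name (227-316)
            -2, 9, 9)        # skip 317-318; pop100 (319-327), hu100 (328-336)
geonames <- c("sumlev", "state", "county", "place",
              "arealand", "areawatr", "name", "pop100", "hu100")
# Synthetic one-record demo file (in real use, read "cageo2010.sf1" instead):
line <- paste0(strrep(" ", 8), "160", strrep(" ", 16), "06", "075",
               strrep(" ", 13), "67000", strrep(" ", 148),
               sprintf("%14d", 121455017), sprintf("%14d", 479107241),
               sprintf("%-90s", "San Francisco city"), "  ",
               sprintf("%9d", 805235), sprintf("%9d", 376942))
tf <- tempfile(); writeLines(line, tf)
CalGeo2 <- read.fwf(tf, widths = widths, col.names = geonames,
                    colClasses = "character", strip.white = TRUE)
```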
The following statements extract all summary level 155 and 160 records from the master geo-header file. The two files are then merged by the variables "state" and "place", and variables are renamed to something more recognizable. Note that when the merged data frames share a variable name, "R" uses the convention "variablename.x" for the first data frame and "variablename.y" for the second.
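Here is a tiny self-contained illustration of that .x/.y suffix convention (toy rows, made-up values):

```r
# merge() appends .x/.y to columns that share a name across the two data frames.
a <- data.frame(state = "06", place = "00562", name = "Contra Costa County (part)")
b <- data.frame(state = "06", place = "00562", name = "Acalanes Ridge CDP")
m <- merge(a, b, by = c("state", "place"))
names(m)  # "state" "place" "name.x" "name.y"
```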
sumlev155 <- subset(CalGeo, sumlev == 155) # state-place-county summary level
sumlev160 <- subset(CalGeo, sumlev == 160) # state-place summary level
coplace1 <- merge(sumlev155, sumlev160, by = c('state','place'), all=TRUE)
coplace2 <- dplyr::rename(coplace1, county_name = name.x, # name is eg "Contra Costa County (part)"
place_name = name.y, # name is eg "Acalanes Ridge CDP"
county = county.x,
GEOID = GEOID.x)
The following statements identify places that straddle two-or-more counties in the state. In California (in 2010) we had four places that each straddle two counties, for a total of eight place-county records! These four places are Aromas (Monterey/San Benito counties), Kingvale (Nevada/Placer counties), Kirkwood (Alpine/Amador counties), and Tahoma (El Dorado/Placer counties).
# Extract places that straddle two-or-more counties.
# pop100.x = 2010 population count for the place-county part (sumlev=155)
# pop100.y = 2010 population count for the FULL place (sumlev=160)
# This yields 4 places that straddle 2 counties each, for 8 records in this file.
splittown <- subset(coplace2, pop100.x < pop100.y)
View(splittown)
And lastly, I wanted to extract the places within the nine-county San Francisco Bay Area. This is probably the simplest script for this extraction.
# Subset the Bay Area places from the SUMLEV=155/160 file
BayArea <- subset(coplace2, county== "001" | county=="013" | county=="041" |
county=="055" | county=="075" | county=="081" | county=="085" | county=="095" |
county=="097" )
# c(1,13,41,55,75,81,85,95,97)
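A more compact equivalent of that nine-way OR uses %in% with the county-code vector from the comment (a toy coplace2 stands in here so the snippet runs on its own):

```r
# Stand-in rows for the coplace2 data frame built above (values made up)
coplace2 <- data.frame(county = c("001", "071"),
                       place_name = c("Alameda city", "Fontana city"))
# Zero-pad the Bay Area county codes and subset with %in%
bayco <- sprintf("%03d", c(1, 13, 41, 55, 75, 81, 85, 95, 97))
BayArea <- subset(coplace2, county %in% bayco)
```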
Of course, write out the data frames to CSV files for further analysis.
setwd("~/Desktop/tidycensus_work/output")
write.csv(BayArea,"Census2010_BayArea_Places.csv")
write.csv(coplace2,"Census2010_California_Places.csv")
Let’s check Texas!
Just to check this procedure, I downloaded the PL 94-171 data files for the State of Texas. The geo-header file for Texas is even larger than California's (1,158,742 records versus 843,287!)
Texas has 1,752 places (sumlev=160) and 1,934 place-county parts (sumlev=155). Upon inspection, there are places in Texas (Dallas!) that straddle five counties. That caught me by surprise!
The population-based split procedure used for California (subset(coplace2, pop100.x < pop100.y)) didn't work for Texas, since a few Texas place-county parts have zero population. I found that "arealand" works just fine for Texas. The following code works:
# Extract places that straddle two-or-more counties.
# pop100.x = 2010 population count for the place-county part (sumlev=155)
# pop100.y = 2010 population count for the FULL place (sumlev=160)
# The population-based test misses Texas place-county parts with zero population:
splittown <- subset(coplace2, pop100.x < pop100.y)
View(splittown)
# Land area works better for Texas, since a few place-county parts have zero
# population (100 percent of the population is in the other place-county part).
splittown2 <- subset(coplace2, arealand.x < arealand.y)
View(splittown2)
That’s as far as I’ve carried the Texas example, since I’m not on the lookout for a master national list of places split by county boundaries! Maybe this is something that the Census Bureau Geography Division has ready access to?
Let’s check New York!
This process also works for the State of New York. It’s still best to use the “AREALAND” differences, sumlev=155 versus sumlev=160, to find places that straddle two-or-more-counties (or boroughs in the case of New York City).
Yes, the City of New York straddles/encompasses five boroughs/counties. And there are 13 other places in New York State that straddle two counties: Almond, Attica, Brewerton, Bridgeport, Deposit, Dodgeville, Earlville, Geneva, Gowanda, Keeseville, Peach Lake, Rushville, and Saranac Lake.
These are “R” scripts that don’t use “tidycensus” but are clean methods for answering such a simple question as “what are the census places within my multi-county region?”
I've been having trouble sending my example set #2 for my introduction to tidycensus. Hopefully this time it'll get through.
Chuck Purvis,
Hayward, California
Example #2. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables.
This is a more complex example of a script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.
# The get_acs function is run for each year of the single-year ACS data, from 2005 to 2018.
# Note that group quarters data was not collected in 2005, but started in 2006.
# Note the "_05_" included in the variable names in the first data "pull". That's a
# mnemonic device that tells us it's for the year 2005.
# Example 2.1 through 2.14: Run get_acs for large California Places, 2005-2018
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
#-------------------------------------------------------------------------------
place05 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_05_ = "B06001_001", # Total Population
Med_HHInc_05_ = "B19013_001", # Median Household Income
Agg_HHInc_05_ = "B19025_001", # Aggregate Household Income
HHldPop_05_ = "B11002_001", # Population in Households
Househlds_05_ = "B25003_001", # Total Households
Owner_OccDU_05_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_05_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_05_ = "B25077_001")) # Median Value of Owner-Occ DUs
place05$Avg_HHSize_05 <- place05$HHldPop_05_E / place05$Househlds_05_E
place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E / place05$Househlds_05_E
#------------------------------------------------------------------------------------
place06 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_06_ = "B06001_001", # Total Population
Med_HHInc_06_ = "B19013_001", # Median Household Income
Agg_HHInc_06_ = "B19025_001", # Aggregate Household Income
HHldPop_06_ = "B11002_001", # Population in Households
Househlds_06_ = "B25003_001", # Total Households
Owner_OccDU_06_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_06_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_06_ = "B25077_001")) # Median Value of Owner-Occ DUs
place06$Avg_HHSize_06 <- place06$HHldPop_06_E / place06$Househlds_06_E
place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E / place06$Househlds_06_E
#------------------------------------------------------------------------------------
This block of code is repeated for each single-year ACS of interest, here 2005 through 2018. Smarter "R" programmers will be able to tell me about loops to make this process more efficient with magical wild cards.
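One way to sketch that loop: a helper that builds the year-tagged variable vector, plus lapply() over the years. The helper and loop names (year_vars, pull_places) are my own; the table IDs are the ones used above, and I'm assuming they are valid in every single-year ACS, which is worth double-checking year by year.

```r
# Build the year-tagged variable vector for one year, e.g. TotalPop_05_ for 2005
year_vars <- function(yr) {
  yy <- sprintf("%02d", yr %% 100)
  vars <- c("B06001_001", "B19013_001", "B19025_001", "B11002_001",
            "B25003_001", "B25003_002", "B25003_003", "B25077_001")
  names(vars) <- paste0(c("TotalPop_", "Med_HHInc_", "Agg_HHInc_", "HHldPop_",
                          "Househlds_", "Owner_OccDU_", "Rent_OccDU_", "Med_HHVal_"),
                        yy, "_")
  vars
}
# One get_acs() call per year; returns a list of data frames (needs an API key)
pull_places <- function(years = 2005:2018) {
  lapply(years, function(yr)
    get_acs(survey = "acs1", year = yr, geography = "place", state = "CA",
            output = "wide", variables = year_vars(yr)))
}
```

The derived variables (Avg_HHSize, MeanHHInc) could then be computed inside the loop as well.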
The following step merges the data frames using the GEOID/NAME variables. This creates a very "wide" database: one record per geography, with each column representing a variable/year combination.
The "merge" function in "R" joins only two data frames by common columns at a time. I have yet to find an "R" function that allows me to merge all of the data frames at once.
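For what it's worth, base R's Reduce() can chain the pairwise merge over a whole list of data frames in one statement; here's a sketch with toy stand-ins for place05, place06, place07 (made-up values):

```r
# Reduce() applies the two-way merge() repeatedly across a list of data frames.
d05 <- data.frame(GEOID = c("0600562", "0600675"), NAME = c("A", "B"), pop05 = c(100, 200))
d06 <- data.frame(GEOID = "0600562", NAME = "A", pop06 = 110)
d07 <- data.frame(GEOID = "0600562", NAME = "A", pop07 = 120)
wide <- Reduce(function(x, y) merge(x, y, by = c("GEOID", "NAME"), all = TRUE),
               list(d05, d06, d07))
names(wide)  # "GEOID" "NAME" "pop05" "pop06" "pop07"
```

With all = TRUE, places missing from a given year's pull come through as NA, same as in the pairwise merges.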
#####################################################################################
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Merge the dataframes, adding a year in each step. All=TRUE is needed if # of places is different.
#
# (R-language newbie script...There are probably more terse/exotic ways of doing this!)
place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)
place0507 <- merge(place0506,place07, by = c('GEOID','NAME'), all=TRUE)
place0508 <- merge(place0507,place08, by = c('GEOID','NAME'), all=TRUE)
place0509 <- merge(place0508,place09, by = c('GEOID','NAME'), all=TRUE)
place0510 <- merge(place0509,place10, by = c('GEOID','NAME'), all=TRUE)
place0511 <- merge(place0510,place11, by = c('GEOID','NAME'), all=TRUE)
place0512 <- merge(place0511,place12, by = c('GEOID','NAME'), all=TRUE)
place0513 <- merge(place0512,place13, by = c('GEOID','NAME'), all=TRUE)
place0514 <- merge(place0513,place14, by = c('GEOID','NAME'), all=TRUE)
place0515 <- merge(place0514,place15, by = c('GEOID','NAME'), all=TRUE)
place0516 <- merge(place0515,place16, by = c('GEOID','NAME'), all=TRUE)
place0517 <- merge(place0516,place17, by = c('GEOID','NAME'), all=TRUE)
place0518 <- merge(place0517,place18, by = c('GEOID','NAME'), all=TRUE)
place_all <- place0518
View(place_all)
Sometimes you want to create smaller data frames with just a select number of columns. Here’s a good approach for that.
# The following functions output useful lists to the R-studio console which can then be edited
names(place_all)
dput(names(place_all)) # most useful for subsetting variables
# The purpose here is to re-order and select variables into a much more compact
# database, for eventual exporting into a CSV file, and then into Excel for finishing touches.
selvars <- c("GEOID", "NAME",
"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E", "TotalPop_08_E",
"TotalPop_09_E", "TotalPop_10_E", "TotalPop_11_E", "TotalPop_12_E",
"TotalPop_13_E", "TotalPop_14_E", "TotalPop_15_E", "TotalPop_16_E",
"TotalPop_17_E", "TotalPop_18_E")
# note the brackets for outputting a new data frame from the previous data frame
place_all2 <- place_all[selvars]
# View the Selected Variables Table
View(place_all2)
# Set directory for exported data files, MacOS directory style
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")
write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")
In this last example, I'm reading in a file of large places in the Bay Area (manually derived from the CSV file created previously) in order to subset Bay Area "large places" from the State of California "large places".
#####################################################################################
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
bayplace <- read.csv("BayArea_Places_65K.csv")
Bayplace1 <- merge(bayplace,place_all, by = c('NAME'))
Bayplace2 <- merge(bayplace,place_all2, by = c('NAME'))
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")
write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")
This concludes Example #2: "multiple geographies / multiple years / multiple variables," with only one record (row) per geography.
This is a followup to my 8/6/20 e-mail to the CTPP-News listserv. Just two more after this one. An “R” script is attached.
This is the last tidycensus example that I’ve prepared. Hope this proves useful!
Stay safe!
Chuck Purvis,
Hayward, California
formerly of the Metropolitan Transportation Commission, San Francisco, California.
clpurvis(a)att.net
Example #4. Explore the Geographies that Tidycensus can pull out.
This is an example of using tidycensus to extract 2014/18 ACS data for all available geographies for the entire USA. It’s just a test of capabilities.
Some geographic levels (county subdivision, state upper and lower houses) don’t appear to be working.
There is some extra code in the “PUMA step” to tally the number of PUMAs per US state, and the minimum and maximum total population levels of the PUMAs within each state.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
library(plyr) # This is needed for a function to concatenate a lot of files in one statement!
library(dplyr)
# Add the variable Geography_Name to each data frame. Maybe concatenate/pancake these dataframes?
selvars <- c(TotalPop_ = "B06001_001", # Total Population
SamplePop_ = "B00001_001", # Unweighted Sample Count of Population
HHUnits_ = "B25002_001", # Total Housing Units
Househlds_ = "B25002_002", # Total Households
SampleDU_ = "B00002_001") # Unweighted Sample Count of Dwelling Units
#------------------------------------------------------------------------------------
us <- get_acs(survey="acs5", year=2018, geography = "us",
show_call = TRUE,output="wide", variables = selvars)
us$Geography_Name <- "us"
#------------------------------------------------------------------------------------
region <- get_acs(survey="acs5", year=2018, geography = "region",
show_call = TRUE,output="wide", variables = selvars)
region$Geography_Name <- "region"
#------------------------------------------------------------------------------------
division <- get_acs(survey="acs5", year=2018, geography = "division",
show_call = TRUE,output="wide", variables = selvars)
division$Geography_Name <- "division"
#------------------------------------------------------------------------------------
state <- get_acs(survey="acs5", year=2018, geography = "state",
show_call = TRUE,output="wide", variables = selvars)
state$Geography_Name <- "state"
setwd("~/Desktop/tidycensus_work/output")
write.csv(state,"ACS1418_USA_State_1.csv")
#------------------------------------------------------------------------------------
county <- get_acs(survey="acs5", year=2018, geography = "county",
show_call = TRUE,output="wide", variables = selvars)
county$Geography_Name <- "county"
#------------------------------------------------------------------------------------
# County Subdivision isn't working ...returns an API error (unknown/unsupported geography)
countysubdiv <- get_acs(survey="acs5", year=2018, geography = "county subdivision",
show_call = TRUE,output="wide", variables = selvars)
countysubdiv$Geography_Name <- "countysubdiv"
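In my experience, the Census API wants a state filter for county subdivisions, so a state-limited call may work where the nationwide call fails. Here’s a sketch (untested here, since it needs an installed Census API key), assuming California:

```r
# County subdivisions appear to require a state filter; limiting the call
# to one state (here, California) is a possible workaround.
countysubdiv_ca <- get_acs(survey="acs5", year=2018, geography = "county subdivision",
                           state = "CA", show_call = TRUE, output="wide",
                           variables = selvars)
countysubdiv_ca$Geography_Name <- "countysubdiv"
```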
#------------------------------------------------------------------------------------
# Pull just the tracts in Alameda County, California
tract <- get_acs(survey="acs5", year=2018, geography = "tract", state="CA",
county="Alameda",show_call = TRUE,output="wide", variables = selvars)
tract$Geography_Name <- "tract"
#------------------------------------------------------------------------------------
# Pull just the block groups in Alameda County, California
blockgroup <- get_acs(survey="acs5", year=2018, geography = "block group", state="CA",
county="Alameda", show_call = TRUE, output="wide", variables = selvars)
blockgroup$Geography_Name <- "blockgroup"
#------------------------------------------------------------------------------------
place <- get_acs(survey="acs5", year=2018, geography = "place",
show_call = TRUE,output="wide", variables = selvars)
place$Geography_Name <- "place"
#------------------------------------------------------------------------------------
urban <- get_acs(survey="acs5", year=2018, geography = "urban area",
show_call = TRUE,output="wide", variables = selvars)
urban$Geography_Name <- "urban"
#------------------------------------------------------------------------------------
congdist <- get_acs(survey="acs5", year=2018, geography = "congressional district",
show_call = TRUE,output="wide", variables = selvars)
congdist$Geography_Name <- "congdist"
#------------------------------------------------------------------------------------
puma <- get_acs(survey="acs5", year=2018, geography = "public use microdata area",
show_call = TRUE,output="wide", variables = selvars)
puma$TotalPop2 <- puma$TotalPop_E * 1.0
puma$Tally <- 1.0
puma$State <- substr(puma$GEOID,1,2)
puma$Geography_Name <- "puma"
pumas <- puma[order(puma$State,puma$GEOID),]
summary(pumas)
sum1 <- aggregate(pumas[,3:12],
by = list(pumas$State),
FUN = sum, na.rm=TRUE)
min1 <- aggregate(pumas[,3:12],
by = list(pumas$State),
FUN = min, na.rm=TRUE)
max1 <- aggregate(pumas[,3:12],
by = list(pumas$State),
FUN = max, na.rm=TRUE)
setwd("~/Desktop/tidycensus_work/output")
write.csv(sum1,"ACS1418_USA_PUMA_sum_by_State_1.csv")
write.csv(min1,"ACS1418_USA_PUMA_min_by_State_1.csv")
write.csv(max1,"ACS1418_USA_PUMA_max_by_State_1.csv")
write.csv(pumas,"ACS1418_USA_PUMA_All_1.csv")
sum2 <- pumas %>%
group_by(State) %>%
summarize_at("TotalPop_E",
list(name=sum))
#------------------------------------------------------------------------------------
csa <- get_acs(survey="acs5", year=2018, geography = "combined statistical area",
show_call = TRUE,output="wide", variables = selvars)
csa$Geography_Name <- "csa"
#------------------------------------------------------------------------------------
msamisa <- get_acs(survey="acs5", year=2018, geography = "metropolitan statistical area/micropolitan statistical area",
show_call = TRUE,output="wide", variables = selvars)
msamisa$Geography_Name <- "msamisa"
#------------------------------------------------------------------------------------
zcta <- get_acs(survey="acs5", year=2018, geography = "zcta",
show_call = TRUE,output="wide", variables = selvars)
zcta$Geography_Name <- "zcta"
#------------------------------------------------------------------------------------
# State Senate and House aren't working ...returns an API error (unknown/unsupported geography)
statesenate <- get_acs(survey="acs5", year=2018, geography = "state legislative district (upper chamber)",
show_call = TRUE,output="wide", variables = selvars)
statesenate$Geography_Name <- "statesenate"
#------------------------------------------------------------------------------------
statehouse <- get_acs(survey="acs5", year=2018, geography = "state legislative district (lower chamber)",
show_call = TRUE,output="wide", variables = selvars)
statehouse$Geography_Name <- "statehouse"
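As with county subdivisions, the state legislative district geographies may simply need a state filter to satisfy the API. A sketch (untested, and assuming California):

```r
# State legislative districts may also require a state filter.
statesenate_ca <- get_acs(survey="acs5", year=2018,
                          geography = "state legislative district (upper chamber)",
                          state = "CA", show_call = TRUE, output="wide",
                          variables = selvars)
statesenate_ca$Geography_Name <- "statesenate"
```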
#------------------------------------------------------------------------------------
This concludes Example #4: “exploring tidycensus geography.”
Here’s my writeup on creating what I call a “stacked” output from tidycensus: one record per each unique geography/year combination. This approach may be preferred if you’re trying to create a “data profile” for a specific geographic area, with rows representing the years, and columns representing the various variables of interest (total population, household population, workers by means of transportation to work, etc.)
I’m almost done! Hope this helps!
Chuck Purvis,
Hayward, California
Example #3. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables. “Stacked” results.
This is an example of stacking “R” data frames, where each record (row) represents a unique geography/year combination.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. I’m loading the “R” package “plyr” which helps in stacking / concatenating / pancaking data frames.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
library(plyr) # This is needed for a function to concatenate a lot of files in one statement!
In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California. Very similar to Example #2, but with one record (row) per each geography/year combination.
# Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018
# Example 3.15: Concatenate (pancake) data frames: lots of records
# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Example 3.17: Extract data for one place using a string search on the place name
#------------------------------------------------------------------------------------
# Set a list of variables to extract in each iteration of get_acs
# This is a LOT more efficient for variable naming!!!
selvars <- c(TotalPop_ = "B06001_001", # Total Population
Med_HHInc_ = "B19013_001", # Median Household Income
Agg_HHInc_ = "B19025_001", # Aggregate Household Income
HHldPop_ = "B11002_001", # Population in Households
Househlds_ = "B25003_001", # Total Households
Owner_OccDU_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_ = "B25077_001")
#------------------------------------------------------------------------------------
temp2005 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",
show_call = TRUE,output="wide", variables = selvars)
temp2005$Year <- "2005"
#------------------------------------------------------------------------------------
temp2006 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",
show_call = TRUE,output="wide", variables = selvars)
temp2006$Year <- "2006"
#------------------------------------------------------------------------------------
temp2007 <- get_acs(survey="acs1", year=2007, geography = "place", state = "CA",
show_call = TRUE,output="wide", variables = selvars)
temp2007$Year <- "2007"
#------------------------------------------------------------------------------------
This block of code is repeated for each ACS single year of interest. Note that I’m adding a new variable, “Year”, to each data frame. Otherwise, I’d have no indication of the year of each data frame, other than the name of the data frame itself!
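The fourteen near-identical blocks can also be collapsed into a single loop. Here’s a sketch using the purrr function “map_dfr” (purrr loads with the tidyverse), which both pulls and stacks the years in one statement; untested here, since it needs an installed Census API key:

```r
library(purrr)
# Pull each year, tag it with a Year column, and row-bind the results in one pass.
tempall <- map_dfr(2005:2018, function(yr) {
  df <- get_acs(survey="acs1", year=yr, geography = "place", state = "CA",
                output = "wide", variables = selvars)
  df$Year <- as.character(yr)
  df
})
```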
In the following “R” step, I’m using the “plyr” function “rbind.fill” to concatenate a lot of data frames!
# Example 3.15: Concatenate (pancake) data frames: lots of records
# Concatenate All Years .....
# rbind can only concatenate two dataframes at a time. rbind.fill can do 2-or-more data
# frames to concatenate. It's a plyr function.
# temp0506 <- rbind(temp2005,temp2006)
# temp0507 <- rbind(temp0506,temp2007)
tempall <- rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,
temp2010,temp2011,temp2012,temp2013,temp2014,
temp2015,temp2016,temp2017,temp2018)
# Add a couple of useful variables!
# need to have an if/then to catch zero values ... work on this later.
# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E
# tempall$MeanHHInc <- tempall$Agg_HHInc_E / tempall$Househlds_E
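The zero-value guard that the comment above anticipates can be written with “ifelse()”. A minimal sketch on a toy data frame (the column names mirror the tidycensus output, but the numbers are made up):

```r
# Toy stand-in for tempall; the values are illustrative only.
toy <- data.frame(HHldPop_E   = c(900, 0),
                  Agg_HHInc_E = c(45e6, 0),
                  Househlds_E = c(300, 0))

# ifelse() returns NA for zero-household rows instead of Inf or NaN.
toy$Avg_HHSize <- ifelse(toy$Househlds_E > 0,
                         toy$HHldPop_E / toy$Househlds_E, NA)
toy$MeanHHInc  <- ifelse(toy$Househlds_E > 0,
                         toy$Agg_HHInc_E / toy$Househlds_E, NA)
```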
# Sort the Results by GEOID and then by Year
tempalls <- tempall[order(tempall$GEOID,tempall$Year),]
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")
In the following step I’m extracting data for large places in the San Francisco Bay Area.
# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
bayplace <- read.csv("BayArea_Places_65K.csv")
Bayplace1 <- merge(bayplace,tempalls, by = c('NAME'))
Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")
dput(names(Bayplace1))
In the following step I’m extracting data for just “Hayward” city in the San Francisco Bay Area. This uses the “R” function “grepl”. (That’s grep-ell, not grep-one).
# Example 3.17: Extract data for one place using a string search on the place name
# Extract one place at a time from Bayplace1
#
Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))
Hayward <- Hayward[order(Hayward$Year),]
Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E
Hayward$MeanHHInc <- Hayward$Agg_HHInc_E / Hayward$Househlds_E
selvarxxx <- c("Year","NAME", "GEOID.x", "NAME2",
"TotalPop_E", "Med_HHInc_E",
"Agg_HHInc_E", "HHldPop_E", "Househlds_E",
"Owner_OccDU_E", "Rent_OccDU_E",
"Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )
Hayward2 <- Hayward[selvarxxx]
write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")
#####################################################################################
This concludes Example #3: “multiple geographies / multiple years/ multiple variables” with only one record (row) per each geography/year combination, or the “stacked” output.
Apologies if you have already seen this. According to my read, Table
B98001 (Unweighted Housing Unit Sample) will be published for nation,
state, county and place in both the 1-year and 5-year ACS beginning with
the 2019 ACS release, and I assume moving forward. What struck me was
that there was no reference to the B98001 information ever again being
available for tracts or below.
-------- Forwarded Message --------
Subject: Research Matters Blog: ACS Updates on Disclosure Avoidance and
Release Plans
Date: Thu, 20 Aug 2020 12:20:48 -0500
From: U.S. Census Bureau <census(a)subscriptions.census.gov>
Reply-To: census(a)subscriptions.census.gov
To: edc(a)berwyned.com
Research Matters Blog: ACS Updates on Disclosure Avoidance and Release
Plans
*ACS Updates on Disclosure Avoidance and Release Plans*
*RESEARCH MATTERS BLOG*
AUG. 20, 2020
Written By: Dr. John M. Abowd, chief scientist and associate director
for Research and Methodology, and Donna M. Daily, chief, American
Community Survey Office
Despite changes and delays during this unprecedented time, we are happy
to report that the U.S. Census Bureau is on track to release the 2019
American Community Survey (ACS) 1-year estimates as scheduled Sept. 17,
2020. Check out our website for the full data release schedule and other
details
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDEsInVy…>
As we prepare to release the 2019 ACS products, we want to remind data
users of our commitment to protect respondent privacy and
confidentiality. Prior blogs
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDIsInVy…>
outlined steps the Census Bureau is taking to modernize the procedures
we use to protect respondent data. Our adoption of formal privacy will
allow us to strengthen safeguards and increase transparency about the
impact of privacy protections on data accuracy.
As Deputy Director Ron Jarmin previously stated
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDMsInVy…>,
we do not plan to implement formal privacy for the full suite of ACS
data products before 2025. In the interim, we will continue to evaluate
existing privacy protections and bolster them as necessary to address
privacy risks that emerge. Our goal is to maintain the utility of the
ACS as the preeminent federal survey for federal, state and local data
users, while remaining committed to our legal obligations to protect
confidentiality.
One area where we are strengthening our disclosure avoidance methods is
the count of final interviews published in our quality measures and
other detailed tables. We will continue to publish the quality measures
tables that provide this information for /select/ geographies. But we
are discontinuing the tables that included this information for /all/
geographies.[1] In addition, we are adding “noise” to the
interview counts to cut the risk of disclosure while still providing a
general indicator of data quality for the geography of interest. We will
continue to publish the household sample sizes selected for invitation
to complete the ACS, without added noise or rounding. The reason: These
sample sizes are properties of the ACS design, not its realized sample,
and provide a more robust indicator of data quality for very small
geographies.
Table B98001 (Unweighted Housing Unit Sample) will be published for
nation, state, county and place in both the 1-year and 5-year ACS
beginning with the 2019 ACS release. Previously, the table was not
published for places in the 1-year ACS. A new table will be created of
final person interviews, which will be released for nation, state,
county and place in the 1-year and 5-year ACS. This will provide the
same information as the former B00001, but will be restricted to the
same summary levels as B98001.
The Census Bureau has a tradition and public expectation of producing
high quality statistics while protecting the confidentiality of
respondents. We will work closely with our scientific and data user
communities as we explore options for modernizing ACS privacy
protections while ensuring the data products’ continued high quality and
fitness-for-use. The Census Bureau is funding collaboration
opportunities with external researchers on the issue of formal privacy
for sample surveys. A key deliverable of such collaboration will be
establishing effective data user engagement. We will provide more
information about this effort as it becomes available.
Send comments or questions to <ACSprivacy(a)census.gov>.
------------------------------------------------------------------------
[1]
These data were published in B00001 and B00002 in the 1-year and 5-year
ACS products as well as K200001 and K200002 in the 1-year supplemental
ACS product.
Good Morning,
The marketing department at AASHTO is taking a crack at helping us get more on the map. One major benefit to this is that when our next funding cycle rolls around (2023) we will be a bit ahead in the "materials for decision makers" department. To this end, if you have some success story or testimonial regarding the CTPP, I would love to hear about it. It may be tweeted, posted to Facebook or Instagram (if visual), or used in future marketing. If you would rather send them to me, and not the whole list, that would be great: pweinberger(a)aashto.org<mailto:pweinberger(a)aashto.org>. I've included a couple examples below my signature, to give you ideas.
Thanks,
Penelope Weinberger
CTPP Program Manager
AASHTO
Ctpp.transportation.org
"The CTPP program is a vital component for understanding travel in the state of Florida. The data, along with the technical support provided by AASHTO, helps Florida understand the nature of the workforce in Florida; how, when, and where they travel for work; and the impacts on congestion and transportation operations. The CTPP data is a cost-effective tool for helping Florida DOT achieve its mission of providing for the mobility of people and ensuring economic prosperity by helping provide a data-driven solution to transportation problems."
- Florida Department of Transportation
Several major model development projects in the state of Colorado have used CTPP Journey-to-Work data, including development of activity-based models for the Denver region and for the entire state. CTPP is one of the "go-to" sources of travel pattern data that is genuinely independent of the travel survey data commonly used to estimate these models. Unbiased, independent data of this type is very hard to come by, and the CTPP is a critical piece of just about any model development project puzzle.
Example #1. Tidycensus examples: one year, multiple geographies, multiple variables
This is an example of a simple script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.
Comments start with a hashtag “#”. They’re easier to spot when using RStudio.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
Example #1.1 pulls the 2018 ACS data for San Francisco Bay Area counties, for table C03002 (population by race/ethnicity). It’s pulled into a “data frame” called “county1”.
Though the keywords (survey, year, geography, etc.) can be in any order within the “get_acs()” statement, I prefer leading with:
1. survey="acs1" – am I using the 1-year or the 5-year database?
2. year=2018 – what’s the last year of the 1-year/5-year database?
3. geography="county" – what level of geography am I pulling? US? State? County? Congressional District? Place?
See the tidycensus documentation, and the author’s website, for all of this and more!
https://walker-data.com/tidycensus/articles/basic-usage.html <https://walker-data.com/tidycensus/articles/basic-usage.html>
https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf <https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf>
https://www.rdocumentation.org/packages/tidycensus/ <https://www.rdocumentation.org/packages/tidycensus/>
# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002
# Note that tidycensus can use either the County Name or the County FIPS Code number.
# Experiment with output="wide" versus output="tidy" ("tidy" is the default.)
#####################################################################################
county1 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",
# county=c(1,13,41,55,75,81,85,95,97),
county=c("Alameda","Contra Costa","Marin","Napa","San Francisco",
"San Mateo","Santa Clara","Solano","Sonoma"),
show_call = TRUE, output="wide",
table="C03002")
Example #1.2 is a variation on the previous script portion and pulls out population by race/ethnicity for ALL California counties, 2014/18 five-year ACS. If I used “ACS1” and “2018”, I’d only obtain data for the largest counties with 65,000+ total population!
# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties, Table B03002
# If the list of counties is excluded,
# then data is pulled for all counties in the State
######################################################################################
AllCalCounties <- get_acs(survey="acs5", year=2018, geography = "county",
state = "CA", show_call = TRUE, output="wide", table="B03002")
Example #1.3 pulls out population by race/ethnicity for ALL Congressional Districts in California, for the single year 2018 ACS.
# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists, Table C03002
# This example pulls the congressional districts from California. Eliminate state="CA" to get congressional districts from the entire United States
######################################################################################
congdist1 <- get_acs(survey="acs1", year=2018, geography = "congressional district",
state = "CA", show_call = TRUE, output="wide", table="C03002")
Example #1.4 names the variables using mnemonic names for population by race/ethnicity, 2018 single-year ACS, Bay Area counties. I’m using the janitor package’s “adorn_totals” function to sum up regional totals.
The tidycensus package will append “E” to variable estimates and “M” to variable margins of error (90 percent confidence level, by default). So, the variable “White_NH_E” will mean, to me, at least, “Estimates of White Non-Hispanic Population” and “White_NH_M” will mean: “Margin of Error, 90% confidence level, of White Non-Hispanic Population.”
# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.
# User-defined mnemonic variable names, since "C03002_001_E" doesn't fall trippingly on the tongue!
# the underscore is useful since tidycensus will append "E" to estimates and "M" to margin of error
# variables, e.g., "Total_E" and "Total_M"
######################################################################################
county2 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",
county=c(1,13,41,55,75,81,85,95,97),
show_call = TRUE, output="wide",
variables = c(Total_ = "C03002_001", # Universe is Total Population
White_NH_ = "C03002_003", # Non-Hispanic White
Black_NH_ = "C03002_004", # Non-Hispanic Black
AIAN_NH_ = "C03002_005", # NH, American Indian & Alaskan Native
Asian_NH_ = "C03002_006", # Non-Hispanic Asian
NHOPI_NH_ = "C03002_007", # NH, Native Hawaiian & Other Pacific Isl.
Other_NH_ = "C03002_008", # Non-Hispanic Other
Multi_NH_ = "C03002_009", # Two-or-More Races, Non-Hispanic
Hispanic_ = "C03002_012")) # Hispanic/Latino
# Sometimes the results of TIDYCENSUS aren't sorted, so:
county2 <- county2[order(county2$GEOID),]
###########################################################################
# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9
# adorn_totals is a function from the package janitor.
# The name="06888" is arbitrary, just a filler for the GEOID column.
tempxxx <- adorn_totals(county2,name="06888")
tempxxx[10,2]="San Francisco Bay Area"
county3 <- tempxxx
# Set a working directory, and write out CSV files as wanted.
# This is an example for a Mac, with the folder tidycensus_work on the desktop, and
# the folder output within tidycensus_work
setwd("~/Desktop/tidycensus_work/output")
write.csv(county3,"ACS18_BayAreaCounties.csv")
#############################################################################
At the end of this step I’m writing out CSV (comma-separated value) files, which I then open in Excel for finishing touches to the tables, manually editing the variable names to something less cryptic.
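Some of that renaming can also be done in R before the export. A sketch using base R “sub()” on the column names (the replacement labels here are just my own suggestions):

```r
# Example column names as produced by tidycensus with output = "wide"
cols <- c("GEOID", "NAME", "Total_E", "Total_M", "White_NH_E", "White_NH_M")

# Swap the _E / _M suffixes for friendlier labels before writing the CSV
cols <- sub("_E$", " (Estimate)", cols)
cols <- sub("_M$", " (Margin of Error)", cols)
# In practice: names(county3) <- sub("_E$", " (Estimate)", names(county3))
```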
That’s all for today!
Chuck Purvis,
Retired Person, Hayward, California
(Formerly of the Metropolitan Transportation Commission, San Francisco, California)
Take care!!