Example #1. Tidycensus examples: one year, multiple geographies, multiple variables

This is an example of a simple script to "pull" select variables from the ACS using my Census API key and the R package tidycensus.


Step #0. Always load the relevant packages/libraries when starting a new R session. Otherwise, the script won't work.

Comments start with a hashtag ("#"). They are easier to spot in RStudio.

# Step 0: Load relevant libraries into each R-session.

 

library(tidyverse)

library(tidycensus)

library(janitor)
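
The scripts in this example assume that a Census API key is already available to tidycensus. A minimal sketch of setting one for the current session follows; the key string is a placeholder, not a real key, and my_key is just an illustrative name.

```r
library(tidycensus)

# Placeholder only: substitute the key issued to you by the Census Bureau.
my_key <- "YOUR_CENSUS_API_KEY"

# Makes the key available to get_acs() for this R session.
# Adding install = TRUE would also save it to .Renviron for future sessions.
census_api_key(my_key)
```

tidycensus reads the key from the CENSUS_API_KEY environment variable, which is what census_api_key() sets.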

 

Example #1.1 pulls the 2018 ACS data for San Francisco Bay Area counties, for table C03002 (population by race/ethnicity). It's pulled into a "data frame" called "county1".

Though the keywords (survey, year, geography, etc.) can be in any order within the "get_acs()" statement, I prefer leading with:

1.  survey="acs1" - am I using the 1-year or 5-year database?

2.  year=2018 - what's the last year of the 1-year/5-year database?

3.  geography="county" - what level of geography am I pulling? US? State? County? Congressional District? Place?

 

See the tidycensus documentation, and the author's website, for all of this and more!

https://walker-data.com/tidycensus/articles/basic-usage.html

https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf

https://www.rdocumentation.org/packages/tidycensus/

 

# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002

#  Note that tidycensus can use either the County Name or the County FIPS Code number.

#  Experiment with output="wide" versus output="tidy" ("tidy" is the default.)

#####################################################################################

county1   <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",

                   # county=c(1,13,41,55,75,81,85,95,97),

                    county=c("Alameda","Contra Costa","Marin","Napa","San Francisco",

                              "San Mateo","Santa Clara","Solano","Sonoma"),

                   show_call = TRUE,  output="wide",

                   table="C03002")
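
To compare the two layouts mentioned in the comments above, the same request can be repeated with output="tidy". This is a sketch, not part of the original script: the names bay_fips and county1_tidy are made up here, and because the call needs a Census API key and an internet connection, it is wrapped in a check on the CENSUS_API_KEY environment variable (my own addition, so the script does not stop with an error when no key is set).

```r
library(tidycensus)

# The same nine Bay Area counties, given by FIPS code instead of by name.
bay_fips <- c(1, 13, 41, 55, 75, 81, 85, 95, 97)

if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  county1_tidy <- get_acs(survey = "acs1", year = 2018, geography = "county",
                          state = "CA", county = bay_fips,
                          output = "tidy", table = "C03002")
  # Tidy layout: one row per county per variable, with the columns
  # GEOID, NAME, variable, estimate and moe.
  print(head(county1_tidy))
}
```

In the wide layout, by contrast, each county is a single row, and every variable gets a pair of columns ending in "E" (estimate) and "M" (margin of error).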

 

Example #1.2 is a variation on the previous script and pulls population by race/ethnicity for ALL California counties from the 2014-2018 five-year ACS. If I used "acs1" and "2018", I'd only obtain data for the largest counties, those with 65,000+ total population!

 

# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties, Table B03002

#    If the list of counties is excluded,

#    then data is pulled for all counties in the State

######################################################################################

AllCalCounties   <- get_acs(survey="acs5", year=2018, geography = "county",

               state = "CA", show_call = TRUE,  output="wide", table="B03002")

 

Example #1.3 pulls population by race/ethnicity for ALL Congressional Districts in California from the single-year 2018 ACS.

 

 

# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists, Table C03002

#   This example pulls the congressional districts from California. Eliminate state="CA" to get congressional districts from the entire United States

######################################################################################

congdist1 <- get_acs(survey="acs1", year=2018, geography = "congressional district",  

                     state = "CA", show_call = TRUE, output="wide", table="C03002")

 

Example #1.4 names the variables using mnemonic names for population by race/ethnicity, 2018, single-year ACS, Bay Area counties. I'm using the "adorn_totals" function from the janitor package to sum up regional totals.

The tidycensus package will append "E" to variable estimates and "M" to variable margins of error (90 percent confidence level, by default). So, the variable "White_NH_E" will mean, to me, at least, "Estimates of White Non-Hispanic Population" and "White_NH_M" will mean: "Margin of Error, 90% confidence level, of White Non-Hispanic Population."
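
The table codes behind the mnemonic names (C03002_001 and so on) can be looked up from within R. Here is a sketch using the tidycensus function load_variables(), which downloads the variable list for a given year and dataset. The object names acs18_vars and c03002_vars are made up here, and the call needs an internet connection.

```r
library(tidycensus)

# Download the variable dictionary for the 2018 one-year ACS.
acs18_vars <- load_variables(2018, "acs1", cache = TRUE)

# Keep only the rows for table C03002; the columns include name, label and concept.
c03002_vars <- acs18_vars[grepl("^C03002_", acs18_vars$name), ]
print(c03002_vars[, c("name", "label")])
```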

 

# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.

#  User-defined mnemonic variable names, since "C03002_001E" doesn't fall trippingly on the tongue!

#  the underscore is useful since tidycensus will append "E" to estimates and "M" to margin of error

#  variables, e.g., "Total_E" and "Total_M"

######################################################################################

county2   <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",

                     county=c(1,13,41,55,75,81,85,95,97),

                     show_call = TRUE, output="wide",

              variables = c(Total_    = "C03002_001",  # Universe is Total Population

                            White_NH_ = "C03002_003",  # Non-Hispanic White

                            Black_NH_ = "C03002_004",  # Non-Hispanic Black

                            AIAN_NH_  = "C03002_005",  # NH, American Indian & Alaskan Native

                            Asian_NH_ = "C03002_006",  # Non-Hispanic Asian

                            NHOPI_NH_ = "C03002_007",  # NH, Native Hawaiian & Other Pacific Isl.

                            Other_NH_ = "C03002_008",  # Non-Hispanic Other

                            Multi_NH_ = "C03002_009",  # Two-or-More Races, Non-Hispanic

                            Hispanic_ = "C03002_012")) # Hispanic/Latino

 

# Sometimes the results of TIDYCENSUS aren't sorted, so:

county2 <- county2[order(county2$GEOID),]

 

###########################################################################

# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9

# adorn_totals is a function from the package janitor.

# The name="06888" is arbitrary, just a filler for the GEOID column.

 

tempxxx <- adorn_totals(county2,name="06888")

tempxxx[10,2]="San Francisco Bay Area"

 

county3 <- tempxxx
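
One caution about the total row: adorn_totals adds up every numeric column, so the "_M" values in the Bay Area row are simple sums of the county margins of error. The usual approximation for the margin of error of a sum is the square root of the sum of the squared margins, and tidycensus offers moe_sum() for this purpose. Below is a base-R sketch of the arithmetic, using made-up margins rather than real ACS values.

```r
# Made-up county margins of error, for illustration only (not ACS data).
county_moe <- c(3, 4, 12)

# What a plain column sum (as in adorn_totals) would report.
simple_sum <- sum(county_moe)

# Root-sum-of-squares approximation for the margin of error of a sum.
region_moe <- sqrt(sum(county_moe^2))

print(c(simple_sum = simple_sum, region_moe = region_moe))
```

With these illustrative numbers the simple sum is 19, while the root-sum-of-squares value is 13, so a summed "_M" column overstates the regional margin of error.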

 

# Set a working directory, and write out CSV files as wanted.

# This is an example for a Mac, with the folder tidycensus_work on the desktop, and

# the folder output within tidycensus_work

setwd("~/Desktop/tidycensus_work/output")

 

write.csv(county3,"ACS18_BayAreaCounties.csv")

#############################################################################
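
The write-out step can also be done without setwd(). This sketch uses a temporary folder and a small made-up data frame; demo, out_dir and out_file are illustrative names, and the totals are not ACS data. It also shows one thing to watch for when the CSV is read back: GEOID values such as "06001" lose their leading zero unless the column is read as text.

```r
# Build the output folder path instead of changing the working directory.
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up values, for illustration only.
demo <- data.frame(GEOID = c("06001", "06013"),
                   NAME = c("Alameda County, California",
                            "Contra Costa County, California"),
                   Total_E = c(100, 200),
                   stringsAsFactors = FALSE)

out_file <- file.path(out_dir, "demo_counties.csv")
write.csv(demo, out_file, row.names = FALSE)  # row.names = FALSE drops the row-number column

# Default read: GEOID is converted to a number and the leading zero is lost.
as_number <- read.csv(out_file)
# Reading GEOID as character keeps the code intact.
as_text <- read.csv(out_file, colClasses = c(GEOID = "character"))

print(as_number$GEOID)
print(as_text$GEOID)
```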

 

At the end of this step I'm writing out CSV (comma-separated value) files, which I then open in Excel for finishing touches to tables, manually editing the variable names to something less cryptic: