Example #1. Tidycensus examples: one year, multiple geographies, multiple variables

This is an example of a simple script to "pull" select variables from the ACS using my Census API key and the R package tidycensus.

Step #0. Always load the relevant packages/libraries when starting a new R session. Otherwise, the code below won't work.

Comments start with a hash sign, "#". This is more obvious when using RStudio.

# Step 0: Load relevant libraries into each R-session.

library(tidyverse)

library(tidycensus)

library(janitor)
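The script assumes a Census API key is already available to tidycensus. A minimal sketch of checking for one is shown here. It relies on tidycensus reading the key from the CENSUS_API_KEY environment variable, and "YOUR_KEY_HERE" is a hypothetical placeholder, not a real key.

```r
# Look for a Census API key in the environment variable that tidycensus reads.
key <- Sys.getenv("CENSUS_API_KEY")

if (identical(key, "")) {
  # No key found: it can be registered once with tidycensus's census_api_key().
  # "YOUR_KEY_HERE" is a placeholder for illustration only.
  message('No key found. Run once: census_api_key("YOUR_KEY_HERE", install = TRUE)')
} else {
  message("Census API key found.")
}
```

With install = TRUE, census_api_key() writes the key to .Renviron so that later R sessions pick it up without repeating the call.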

Example #1.1 pulls the 2018 ACS data for San Francisco Bay Area counties, for table C03002 (population by race/ethnicity). It's pulled into a "data frame" called "county1".

Though the keywords (survey, year, geography, etc.) can be in any order within the "get_acs()" statement, I prefer leading with:

1. survey="acs1": am I using the 1-year or 5-year database?

2. year=2018: what's the last year of the 1-year/5-year database?

3. geography="county": what level of geography am I pulling? US? State? County? Congressional District? Place?

See the tidycensus documentation, and the author's website, for all of this and more!

https://walker-data.com/tidycensus/articles/basic-usage.html

https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf

https://www.rdocumentation.org/packages/tidycensus/

# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002

# Note that tidycensus can use either the County Name or the County FIPS Code number.

# Experiment with output="wide" versus output="tidy" ("tidy" is the default.)

#####################################################################################

county1 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",

# county=c(1,13,41,55,75,81,85,95,97),

county=c("Alameda","Contra Costa","Marin","Napa","San Francisco",

"San Mateo","Santa Clara","Solano","Sonoma"),

show_call = TRUE, output="wide",

table="C03002")
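The difference between output="tidy" and output="wide" can be sketched with a toy data frame. The numbers below are made up for illustration, and the reshaping uses tidyr's pivot_wider() only to imitate the two layouts. It is not a description of how tidycensus builds its result internally.

```r
library(tidyr)

# Toy "tidy" layout: one row per county and variable (all values are made up).
# tidycensus names these two value columns "estimate" and "moe";
# they are called E and M here so the wide column names come out with those suffixes.
tidy_example <- data.frame(
  GEOID    = c("06001", "06001", "06013", "06013"),
  NAME     = c("Alameda County, California", "Alameda County, California",
               "Contra Costa County, California", "Contra Costa County, California"),
  variable = c("C03002_001", "C03002_003", "C03002_001", "C03002_003"),
  E        = c(1000, 400, 800, 300),  # estimates
  M        = c(10, 50, 12, 40)        # margins of error
)

# "Wide" layout: one row per county, one E and one M column per variable.
wide_example <- pivot_wider(
  tidy_example,
  names_from  = variable,
  values_from = c(E, M),
  names_glue  = "{variable}{.value}"
)

print(names(wide_example))
```

The wide layout is convenient for spreadsheet-style tables, while the tidy layout is often easier to filter, group, and plot.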

Example #1.2 is a variation on the previous script and pulls population by race/ethnicity for ALL California counties from the 2014-2018 five-year ACS. If I used "acs1" and 2018, I'd only obtain data for the largest counties, those with a total population of 65,000 or more!

# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties, Table B03002

# If the list of counties is excluded,

# then data is pulled for all counties in the State

######################################################################################

AllCalCounties <- get_acs(survey="acs5", year=2018, geography = "county",

state = "CA", show_call = TRUE, output="wide", table="B03002")

Example #1.3 pulls out population by race/ethnicity for ALL Congressional Districts in California, for the single year 2018 ACS.

# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists, Table C03002

# This example pulls the congressional districts from California. Remove state="CA" to get congressional districts for the entire United States.

######################################################################################

congdist1 <- get_acs(survey="acs1", year=2018, geography = "congressional district",

state = "CA", show_call = TRUE, output="wide", table="C03002")

Example #1.4 names the variables using mnemonic names for population by race/ethnicity, 2018, single-year ACS, Bay Area counties. I'm using the janitor package's "adorn_totals" function to sum up regional totals.

The tidycensus package will append "E" to variable estimates and "M" to variable margins of error (90 percent confidence level, by default). So the variable "White_NH_E" means, to me at least, "Estimate of White Non-Hispanic Population", and "White_NH_M" means "Margin of Error, 90% confidence level, of White Non-Hispanic Population."

# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.

# User-defined mnemonic variable names, since "C03002_001E" doesn't fall trippingly on the tongue!

# the underscore is useful since tidycensus will append "E" to estimates and "M" to margin of error

# variables, e.g., "Total_E" and "Total_M"

######################################################################################

county2 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",

county=c(1,13,41,55,75,81,85,95,97),

show_call = TRUE, output="wide",

variables = c(Total_ = "C03002_001", # Universe is Total Population

White_NH_ = "C03002_003", # Non-Hispanic White

Black_NH_ = "C03002_004", # Non-Hispanic Black

AIAN_NH_ = "C03002_005", # NH, American Indian & Alaska Native

Asian_NH_ = "C03002_006", # Non-Hispanic Asian

NHOPI_NH_ = "C03002_007", # NH, Native Hawaiian & Other Pacific Isl.

Other_NH_ = "C03002_008", # Non-Hispanic Other

Multi_NH_ = "C03002_009", # Two-or-More Races, Non-Hispanic

Hispanic_ = "C03002_012")) # Hispanic/Latino

# Sometimes the results from tidycensus aren't sorted, so sort by GEOID:

county2 <- county2[order(county2$GEOID),]
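The "_M" columns returned above are margins of error at the 90 percent confidence level. A minimal sketch of turning an estimate and its margin into an interval, and of rescaling a 90 percent margin to 95 percent, follows. The input numbers are made up, the helper name moe_90_to_95 is hypothetical, and the factors 1.645 and 1.960 are the standard normal multipliers for the two confidence levels.

```r
# Hypothetical helper: rescale a 90% margin of error to the 95% level.
moe_90_to_95 <- function(moe_90) moe_90 * 1.960 / 1.645

# Made-up values standing in for one county's White_NH_E and White_NH_M.
white_nh_e <- 50000
white_nh_m <- 1645

# 90% confidence interval: estimate plus or minus the margin of error.
lower_90 <- white_nh_e - white_nh_m
upper_90 <- white_nh_e + white_nh_m

# The same margin expressed at the 95% level.
white_nh_m95 <- moe_90_to_95(white_nh_m)
```

get_acs() also accepts a moe_level argument (90, 95, or 99) if the margins should be returned at a different level directly.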

###########################################################################

# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9

# adorn_totals is a function from the package janitor.

# The name="06888" is arbitrary, just a filler for the GEOID column.

tempxxx <- adorn_totals(county2,name="06888")

# Label the totals row (the last row) in the NAME column.

tempxxx[nrow(tempxxx), "NAME"] <- "San Francisco Bay Area"

county3 <- tempxxx
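One caution with this approach: adorn_totals sums every numeric column, so the "_M" columns in the regional row are simple sums of the county margins of error. The Census Bureau's usual approximation for the margin of error of a sum is the square root of the sum of the squared margins. A sketch with made-up numbers follows. The helper name moe_of_sum is hypothetical, and tidycensus provides a moe_sum() function for the same purpose.

```r
# Hypothetical helper: approximate the margin of error of a sum of estimates
# as the square root of the sum of squared margins of error.
moe_of_sum <- function(moe) sqrt(sum(moe^2, na.rm = TRUE))

# Made-up county margins of error for one variable.
county_moe <- c(30, 40)

simple_sum   <- sum(county_moe)         # what a plain column total produces
regional_moe <- moe_of_sum(county_moe)  # root-sum-of-squares approximation
```

The simple sum overstates the regional margin of error, so the "_M" values in the totals row are best treated as placeholders until they are recomputed this way.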

# Set a working directory, and write out CSV files as wanted.

# This is an example for a Mac, with the folder tidycensus_work on the desktop, and

# the folder output within tidycensus_work

setwd("~/Desktop/tidycensus_work/output")

write.csv(county3,"ACS18_BayAreaCounties.csv")

#############################################################################

At the end of this step I'm writing out CSV (comma-separated value) files, which I then open in Excel for finishing touches to tables, manually editing the variable names to something less cryptic: