The Census Bureau released the Census 2020 PL 94-171 data in the “user friendly”
(API-ready) format last Thursday, September 16th. The “legacy format” PL 94-171 was
released the previous month, August 12th.
Analysts, planners, data scientists, and retirees who work in R could use the R package
“PL94171” to retrieve the “legacy format” data.
On (the morning of) September 16th, Dr Walker of TCU updated his R package “TIDYCENSUS” to
retrieve the “user-friendly/API-ready” PL 94-171 data. Luckily I follow Dr Walker on
twitter so I was apprised fairly quickly of the updates!
The user can also access the new PL 94-171 from the Census Bureau’s data portal:
data.census.gov <http://data.census.gov/>
I’ve attached two R scripts that make use of the newest TIDYCENSUS version.
PL94181_tidycensus_2020_step1.r == this is my script from 9/16/21. I just tested it for
all California places; all US Counties; all US Places, all blocks in Alameda County,
California; all census tracts in the San Francisco Bay Area; and all block groups in the
San Francisco Bay Area.
The package works perfectly.
If you’re examining data for multiple states, or all states, then TIDYCENSUS is by far the
best way to go. If you’re looking at data for just one state, then the R package
PL94171 is quite sufficient.
PL94171_US_places_allyears_Step1.r == this is my script (from yesterday). I wanted to get
data for all places in the US for all available years. PL 94-171 data for 1990 and earlier
censuses are not available on the Census Bureau’s API, and thus cannot be pulled with
TIDYCENSUS. So, this script pulls selected variables from PL 94-171 for the 2000, 2010,
and 2020 censuses. I merge the data for all places in the three censuses, and then keep
(select) only the large places with 100,000+ population in any year.
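The merge-and-filter step can be sketched in base R as follows. This is only an illustration, not the attached script: the three data frames are tiny made-up stand-ins for the place-level pulls, and the helper names merge_places() and filter_large() and the column names pop2000, pop2010, and pop2020 are hypothetical.

```r
# Toy stand-ins for the place-level pulls from each census.
# GEOIDs, names, and populations are invented for illustration only.
pop2000 <- data.frame(GEOID = c("0000001", "0000002", "0000003"),
                      NAME = c("Place A", "Place B", "Place C"),
                      pop2000 = c(120000, 80000, 40000))
pop2010 <- data.frame(GEOID = c("0000001", "0000002", "0000003"),
                      pop2010 = c(110000, 95000, 45000))
pop2020 <- data.frame(GEOID = c("0000001", "0000002", "0000004"),
                      pop2020 = c(98000, 105000, 30000))

# Full outer join on GEOID, so places that appear in only some
# censuses are kept (with NA for the missing years).
merge_places <- function(...) {
  Reduce(function(x, y) merge(x, y, by = "GEOID", all = TRUE), list(...))
}

# Keep places at or above the threshold in at least one year.
filter_large <- function(df, pop_cols, threshold = 100000) {
  max_pop <- apply(df[, pop_cols, drop = FALSE], 1, max, na.rm = TRUE)
  df[max_pop >= threshold, , drop = FALSE]
}

allyears <- merge_places(pop2000, pop2010, pop2020)
large <- filter_large(allyears, c("pop2000", "pop2010", "pop2020"))
print(large)
```

The outer join matters because places incorporate, disincorporate, and get renumbered between censuses; an inner join would silently drop any place missing from one of the three years.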
Note that you can bring in the TIGER/Line boundary files (shp format) with the TIDYCENSUS
package. Refer to the “geometry” and “keep_geo_vars” arguments of get_decennial() in
TIDYCENSUS. This is crucial if you’re interested in calculating population density (and,
of course, mapping all of these things!). AREALAND is a Census Bureau variable on “land
area in square meters.” Alas, this doesn’t work when pulling all places in the US at once,
but it does work for all counties in the US. (Retrieving AREALAND for places WITHIN a
state works AOK!)
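library(tidycensus)
# selvars20 is the vector of selected variable codes (defined elsewhere in the script)
test2020 <- get_decennial(year = 2020, sumfile = "pl",
                          geography = "county",  # state = "CA",
                          geometry = TRUE, keep_geo_vars = TRUE,
                          show_call = TRUE, output = "wide",
                          variables = selvars20)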
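Once AREALAND is in hand, population density is a one-line conversion. Here is a minimal sketch, assuming land area arrives in square meters as described above; the helper name pop_density() is hypothetical, and the commented-out usage line assumes the total-population variable P1_001N was among the selected variables.

```r
# Square meters in one square mile.
SQ_M_PER_SQ_MILE <- 2589988.11

# Persons per square mile from a population count and land area in
# square meters. Returns NA where land area is zero (e.g. all-water units).
pop_density <- function(pop, arealand_sq_m) {
  area_sq_mi <- arealand_sq_m / SQ_M_PER_SQ_MILE
  ifelse(area_sq_mi > 0, pop / area_sq_mi, NA_real_)
}

# Hypothetical check: 50,000 people on 10 square miles of land.
example_density <- pop_density(50000, 10 * SQ_M_PER_SQ_MILE)
print(example_density)

# Assumed usage with the wide-format result from get_decennial():
# test2020$density <- pop_density(test2020$P1_001N, test2020$AREALAND)
```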
As a side note, TCU featured Dr Walker’s efforts in its Fall 2021 TCU Magazine. It’s
well worth reading. I can’t believe that TIDYCENSUS has only been around since 2017!!
https://magazine.tcu.edu/fall-2021/kyle-walker-open-source-mapping-tools/
If you’re a TIDYCENSUS user, I would strongly recommend following Kyle Walker on twitter
(@kyle_e_walker). You can follow me on twitter, too (@charleypurvis) but I’m more likely
to be tweeting about census data, wine, beer, liberal politics, and baseball.
REQUEST: has anybody created R code (tidycensus or pl94171 package) to calculate the
variables “white alone or in combination with other races” or “black alone or in
combination with other races” … etc? I’m not a member of the State Data Center network,
but maybe folks on the SDC network could pass along this request… It will take SEVERAL
hours to create the code, I think…..?
Take care and stay safe!!
Chuck Purvis
Hayward, California
# # # # #