The Census Bureau released the Census 2020 PL 94-171 data in the “user friendly” (API-ready) format last Thursday, September 16th. The “legacy format” PL 94-171 was released the previous month, August 12th.
For analysts/planners/data scientists/retirees who work in R, the R package “PL94171” could be used to retrieve the “legacy format” data.
On (the morning of) September 16th, Dr Walker of TCU updated his R package “TIDYCENSUS” to retrieve the “user-friendly/API-ready” PL 94-171 data. Luckily I follow Dr Walker on Twitter, so I was apprised fairly quickly of the updates!
The user can also access the new PL 94-171 from the Census Bureau’s data portal: data.census.gov <http://data.census.gov/>
I’ve attached two R scripts that make use of the newest TIDYCENSUS version.
PL94181_tidycensus_2020_step1.r == this is my script from 9/16/21. I just tested it for all California places; all US counties; all US places; all blocks in Alameda County, California; all census tracts in the San Francisco Bay Area; and all block groups in the San Francisco Bay Area.
The package works perfectly.
If you’re examining data for multiple states, or all states, then TIDYCENSUS is by far the best way to go. If you’re looking at data for just one state, then the R package PL94171 is quite sufficient.
PL94171_US_places_allyears_Step1.r == this is my script (from yesterday). I wanted to get data for all places in the US for all available years. The PL 94-171 data for 1990 and earlier censuses are not available on the Census Bureau’s API, and thus cannot be pulled by TIDYCENSUS. So, this script pulls selected variables from PL 94-171 for the 2000, 2010, and 2020 censuses. I merge the data for all places across the three censuses, and then keep only the large places with 100,000+ population in any year.
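The merge-and-filter step described above can be sketched in base R. This is a minimal illustration, not the attached script: the three data frames stand in for the 2000, 2010, and 2020 pulls, and the GEOIDs, column names, and counts are all made up.

```r
# Toy stand-ins for the three census pulls; GEOIDs and counts are invented.
pl2000 <- data.frame(GEOID = c("0600001", "0600002", "0600003"),
                     pop2000 = c(95000, 40000, 120000))
pl2010 <- data.frame(GEOID = c("0600001", "0600002", "0600004"),
                     pop2010 = c(101000, 45000, 30000))
pl2020 <- data.frame(GEOID = c("0600001", "0600002", "0600003", "0600004"),
                     pop2020 = c(110000, 52000, 99000, 35000))

# Full outer joins keep places that appear in only some census years.
allyears <- Reduce(function(x, y) merge(x, y, by = "GEOID", all = TRUE),
                   list(pl2000, pl2010, pl2020))

# Largest population in any year, ignoring years where the place is missing.
allyears$maxpop <- pmax(allyears$pop2000, allyears$pop2010, allyears$pop2020,
                        na.rm = TRUE)

# Keep places that reached 100,000+ in at least one census.
large_places <- allyears[allyears$maxpop >= 100000, ]
print(large_places)
```

The outer join matters because places incorporate, disincorporate, and change codes between censuses; an inner join would silently drop them.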
Note that you can bring in the TIGER/Line boundary files (shp format) with the TIDYCENSUS package. Refer to the “geometry” and “keep_geo_vars” arguments of get_decennial in TIDYCENSUS. This is crucial if you’re interested in calculating population density (and, of course, mapping all of these things!). AREALAND is the Census Bureau variable for land area in square meters. Alas, retrieving it doesn’t work for all places in the US at once, but it does work for all counties in the US. (Retrieving AREALAND for places WITHIN a state works AOK!)
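library(tidycensus)  # assumes a Census API key was already set with census_api_key()
selvars20 <- c(TotalPop = "P1_001N")  # example only; the original variable list is not shown
test2020 <- get_decennial(year = 2020, sumfile = "pl",
                          geography = "county",  # add state = "CA" to limit to one state
                          geometry = TRUE, keep_geo_vars = TRUE,
                          show_call = TRUE, output = "wide", variables = selvars20)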
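Once a land-area column in square meters is attached, population density is just a unit conversion. The sketch below uses a toy data frame rather than a live API pull; the helper name, the column names, and the numbers are assumptions for illustration, and the land-area column in the object returned by TIDYCENSUS may be named differently (the TIGER/Line files typically call it ALAND).

```r
SQ_M_PER_SQ_MILE <- 2589988.11  # square meters in one square mile

# Hypothetical helper: persons per square mile from a population count
# and a land area in square meters. Zero-area records return NA.
pop_density_sqmi <- function(pop, arealand_sqm) {
  ifelse(arealand_sqm > 0, pop / (arealand_sqm / SQ_M_PER_SQ_MILE), NA_real_)
}

# Toy records; names and values are made up.
toy <- data.frame(NAME = c("County A", "County B"),
                  pop = c(1000000, 50000),
                  AREALAND = c(2589988110, 0))
toy$density_sqmi <- pop_density_sqmi(toy$pop, toy$AREALAND)
print(toy)
```

Guarding against zero land area avoids infinite densities for water-only geographies.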
As a side note, TCU featured Dr Walker’s efforts in its Fall 2021 TCU Magazine. It’s well worth reading. I can’t believe that TIDYCENSUS has only been around since 2017!!
https://magazine.tcu.edu/fall-2021/kyle-walker-open-source-mapping-tools/
If you’re a TIDYCENSUS user, I would strongly recommend following Kyle Walker on Twitter (@kyle_e_walker). You can follow me on Twitter, too (@charleypurvis), but I’m more likely to be tweeting about census data, wine, beer, liberal politics, and baseball.
REQUEST: has anybody created R code (tidycensus or pl94171 package) to calculate the variables “white alone or in combination with other races” or “black alone or in combination with other races” … etc? I’m not a member of the State Data Center network, but maybe folks on the SDC network could pass along this request… It will take SEVERAL hours to create the code, I think…..?
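One possible starting point for the request above, offered as a rough sketch rather than tested production code: if the race table is in long form with one row per variable and a label column, an "alone or in combination" total is the sum of every detailed row whose label mentions that race. The toy table below assumes labels that use "!!" as a level separator and end subtotal rows with a colon; the variable names and counts are placeholders, and the real labels should be checked against the package's variable list before relying on this.

```r
# Toy long-form race table; names, labels, and counts are illustrative only.
vars <- data.frame(
  name  = c("total", "white_alone", "black_alone", "white_black", "black_asian"),
  label = c("!!Total:",
            "!!Total:!!Population of one race:!!White alone",
            "!!Total:!!Population of one race:!!Black or African American alone",
            "!!Total:!!Population of two or more races:!!Population of two races:!!White; Black or African American",
            "!!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Asian"),
  value = c(1000, 600, 200, 50, 10)
)

leaf    <- sub(".*!!", "", vars$label)  # text after the last "!!"
is_leaf <- !grepl(":$", leaf)           # assumed: subtotal rows end in a colon

# Sum every detailed (leaf) row whose label mentions the race.
alone_or_combo <- function(race) {
  sum(vars$value[is_leaf & grepl(race, leaf, fixed = TRUE)])
}

white_aoic <- alone_or_combo("White")
black_aoic <- alone_or_combo("Black or African American")
print(c(white = white_aoic, black = black_aoic))
```

Because multiracial residents are counted once for each race they report, these totals will add up to more than the total population.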
Take care and stay safe!!
Chuck Purvis
Hayward, California
# # # # #
Guy-
I saw your August 24 email RE: "We haven't heard about this from our State Data Center or anyone else so I'm not sure if we're supposed to be involved with it or not, seems like we should be."
Right. Census Bureau has put Census State Data Centers (SDCs) in the driver's seat of PUMA delineations.
When your state's SDC Lead Agency reaches out depends on how attentive they are to in-state partnerships. *
* fun footnote: Until end of 2019, I was a board member and then chairman of the national network of SDCs<https://www.census.gov/about/partners/sdc/member-network/steering-committee…>. What I learned from it: Some states' SDCs are well-resourced, high-performing as partners. Others, less so. But in all cases Census Bureau provides zero funding to the SDCs - never has - so anything a state's SDC is doing (or not) is ultimately discretionary & guided by its own agenda or purpose.
Your state's SDC should call the Atlanta Regional Commission. The same goes for SDCs and MPOs in other states.
But if the whole month of September goes by, and Oct. 1st rolls around, you may need to take the initiative and call them. Here is the list of State Lead contacts in all states: https://www.census.gov/about/partners/sdc/member-network.html
In my opinion, any MPO serving a region where individual counties have > 200,000 population should be involved.
And specifically: enlist the MPO analysts who were involved 2+ years ago in the re-tracting initiative (what Census calls the Participant Statistical Areas Program). They will be really well-prepared for this task... If that describes you, contact your Census State Data Center this fall.
Hope that helps.
--Todd Graham
Todd Graham
Principal Forecaster | Research
Metropolitan Council
390 North Robert Street, St. Paul, MN 55101
Ph. 651-602-1322
metrocouncil.org<https://www.metrocouncil.org/data> | facebook<https://www.facebook.com/MetropolitanCouncil> | twitter<https://twitter.com/metcouncilnews>
From: Guy Rousseau <GRousseau(a)atlantaregional.org>
Sent: Tuesday, August 24, 2021 3:48 PM
To: Graham, Todd <todd.graham(a)metc.state.mn.us>; ChristopherEd <edc(a)berwyned.com>; Charles Purvis <clpurvis(a)att.net>
Cc: Weinberger Penelope <pweinberger(a)aashto.org>
Subject: RE: Defining PUMAs for Census 2020
Thanks Todd, Ed and Chuck for the suggestions. We haven't heard about this from our State Data Center or anyone else so I'm not sure if we're supposed to be involved with it or not, seems like we should be. I don't remember MPOs having to delineate PUMAs in the past, especially for 2010's set, though it's possible I missed that, as I wasn't directly involved with that work back then, so I don't think we've been involved with the delineation before. Anyhow, we've got the 2020 Census geography in a geodatabase (GDB), however we don't yet have all the population data, just the tracts. We're still waiting on the tables to join for the blocks and block groups.
We looked at the PUMAs (borders in white) in relation to our ARC Super Districts (multicolored) and it looks like most counties have more super-districts than PUMAs in them. We will also look at how the 2020 tracts nest within the 2010 PUMAs and what the 2020 population is.
[Map: 2010 PUMA borders (white) overlaid on ARC Super Districts (multicolored)]
From: Graham, Todd <todd.graham(a)metc.state.mn.us>
Sent: Monday, August 23, 2021 4:01 PM
To: The Census Transportation Products Program Community of Practice/Users discussion and news list <ctpp(a)listserv.transportation.org>
Subject: [CTPP News] Re: Defining PUMAs for Census 2020
Hi Chuck-
Thanks for the heads-up. Yes, that is the way to think of PUMAs: as "super-districts" or sub-state regions.
Here are my "pro-tips" learned in PUMA drawing 10 years ago:
1. Do not group together fractional pieces of counties when you could keep a county whole, or when you could group multiple whole counties together in a PUMA.
2. When splitting counties into multiple PUMAs, try to arrange for the split lines to be stable city/town boundaries. This means you're looking to create PUMAs where city/town boundaries are aligned with Tract boundaries. (Because Census Geog Dept will require that tracts be the basic units of PUMA assembly.)
The reason I emphasize parsimony with counties in point #1 is: The PUMAs you draw will enable or limit the detail of MIGPUMAs as well. (MIGPUMA= Migration origination geographic units) Census Bureau will create MIGPUMAs as the least common denominator grouping of counties that is entirely coincident with a group of PUMAs. So don't split counties unnecessarily.
The reason I emphasize city/town boundaries in point #2 -- even though Census Geog discusses tracts as the basic units - is this: The PUMAs you draw will enable or limit the detail of POWPUMAs. (POWPUMAs = Place of Work geographic units) Census Bureau will create POWPUMAs as the least common denominator grouping of counties + places that is entirely coincident with a group of PUMAs.
Stated differently: Census looks for combinations of county + place to uniquely nest within a POWPUMA.
Why is this the standard for POWPUMAs? It's because of the questions asked on the ACS: the ACS asks specifically for the county + place of one's work location. The PUMA final criteria document https://www.census.gov/programs-surveys/geography/guidance/geo-areas/pumas/… does say all this, but you'd have to read all the way to the last 3 pages of that document to find it.
That's all my advice. Good luck!
--Todd Graham
From: Charles Purvis <clpurvis(a)att.net>
Sent: Monday, August 23, 2021 12:32 PM
To: ctpp(a)listserv.transportation.org
Subject: [CTPP News] Defining PUMAs for Census 2020
The Census Bureau is ramping up efforts for Census 2020 PUMA delineation. PUMAs are "Public Use Microdata Areas." They are large, contiguous areas of 100,000+ population, built up from census tracts and counties.
Here's the main Census Bureau page on PUMA 2020:
https://www.census.gov/programs-surveys/geography/guidance/geo-areas/pumas/…
The Census Bureau will be kicking off the program in September 2021 (next month!). The kickoff will be an announcement to each State Data Center's point of contact.
If you're an MPO, you might be part of your state's State Data Center Network (as an affiliate data center, regional data center, etc.) Get in touch with your state's SDC. It will be each SDC that provides proposed PUMAs to the Census Bureau.
The actual work on defining the new Census 2020 based PUMAs will be November 2021 through January 2022, with the "final" 2020 PUMAs published by summer 2022.
My key point: the PUMAs are NOT just for use in the Public Use Microdata Sample, but are used as STANDARD tabulations for the American Community Survey, both the 1-year and 5-year products. As such, the PUMAs can be thought of as "Regional Analysis Districts" or "Superdistricts" or "Regional Districts." They can be SUPER useful in MPO transportation planning analyses.
Here is the Census Bureau's statement on the usefulness of PUMAs, from the "Final Criteria" document:
"In addition to PUMS data publication, as the ACS was developed and implemented after the 2000
Census, standard PUMAs were adopted as a basic tabulation geographic entity to present summary
data. This was in response to concerns raised by SDCs and other stakeholders that the minimum
population thresholds for tabulation and dissemination of 1-year and 3-year ACS data (65,000 and
20,000 persons, respectively) would limit the availability of data for the predominantly rural portions of
states as well as for many counties. PUMAs met these population size requirements for all ACS data
tabulations and their adoption resulted in a substantially larger community of PUMA data users, many
of whom do not use PUMS files. This sustained interest in PUMA geography and associated data is
expected to continue, therefore the PUMA criteria and guidelines for the 2020 Census are intended to
help maintain a stable and comparable dataset."
[from: Final Criteria for Public Use Microdata Area for the 2020 Census and the American Community Survey]
Note that the current set of Census 2020 PL 94-171 data files do NOT have PUMAs as a standard summary level. This is because the Census 2020 files are tabulated by 2020 Census Tracts, and the current PUMAs are based on the 2010 Census Tracts.
My recommendation for MPO staffs. To me this is a GIS-heavy process:
1. Map the Census 2010 Census Tracts and PUMAs.
2. Map the Census 2020 Census Tracts. Ideally the 2020 tracts nest within the 2010 tracts, but boundaries do indeed change.
3. Develop an equivalency between 2020 Census Tracts and 2010 PUMAs.
4. Use PL 94-171 data to get Census 2020 population by census tract, and aggregate the tracts to approximate the 2010 PUMAs.
5. If the county is > 200,000 population, consider how to best re-draw PUMA boundaries.
6. It's a jigsaw puzzle, where none of the potential PUMAs can have less than 100,000 population. Consider this as "redistricting for MPOs."
7. Involve local actors who are interested: counties, cities, academics, nonprofits, etc.
8. Consider the Bureau's advice on "stable and comparable dataset"... Sometimes you may just keep the old PUMAs!
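
Steps 3 through 6 above can be sketched in base R once a tract-to-PUMA equivalency exists. This is a minimal illustration under stated assumptions: the crosswalk, PUMA codes, and populations are made up (and far larger per tract than real tracts, to keep the table short), and the "could_split" flag is only a rough screen for areas large enough to hold two PUMAs of 100,000 each.

```r
# Toy equivalency: each 2020 tract assigned to one 2010 PUMA, with its
# 2020 population. All codes and counts are invented for illustration.
tracts <- data.frame(
  tract2020 = c("T1", "T2", "T3", "T4", "T5"),
  puma2010  = c("00101", "00101", "00102", "00102", "00103"),
  pop2020   = c(60000, 55000, 48000, 47000, 210000)
)

# Aggregate 2020 tract population up to the 2010 PUMAs.
puma_pop <- aggregate(pop2020 ~ puma2010, data = tracts, FUN = sum)

# Flag PUMAs that fall under the 100,000 minimum and those big enough
# that a split into two 100,000+ PUMAs is at least arithmetically possible.
puma_pop$below_min   <- puma_pop$pop2020 < 100000
puma_pop$could_split <- puma_pop$pop2020 >= 200000
print(puma_pop)
```

In practice the equivalency would come from a GIS overlay of 2020 tracts on 2010 PUMAs, and tracts that straddle a PUMA boundary need a manual assignment before aggregating.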
Hope this is of interest:
Chuck Purvis, Hayward, California
Hi all,
Join us on Wednesday, September 22 at 2-4 pm ET for a CTPP training course on software basics:
CTPP Data Access Software Basics on September 22 at 2-4 pm ET
The CTPP Program is offering a two-hour training session where you'll learn the basics of the CTPP data access software. You'll learn how to navigate the software, select geographies, find tables, and more! Click the link below to register today. https://aashto.adobeconnect.com/software1/event/registration.html
This is the start of a new monthly training series that will cover a range of topics, including basic and advanced software instruction, data issues, charts and graphs, feature presentations, and more.
Hope to see you there!
Team CTPP