I’ve finished an analysis of the CTPP 2017-21 data for Table B202105 (workers at workplace by detailed means of transportation). The analysis covers California counties and California census tracts, with the tracts summarized to the county level. The purpose is to gauge how much data is missing from the tract-level database because secondary allocation (imputation) is absent.
To reiterate, “primary allocation” of missing values for workplace location is always produced (by the Census Bureau) at the county, place and MCD levels. “Secondary allocation” was used in previous CTPP products, but was discontinued in the 2012-2016 CTPP. Secondary allocation imputes workplace location down to the TAZ, census tract, or block group level. (There may be better ways of stating this. I’m using my human-powered AI to construct these statements.)
Results for the San Francisco Bay Area are included in this inserted graphic:

Overall, at least in the Bay Area, the smallest shares of values missing for lack of secondary allocation are for the bicycle-to-work and walk-to-work modes, at 9 to 11 percent. The largest (worst) shares are for 3-or-more person carpools, 2-person carpools, and other (3) (motorcycle + taxicab + other).
A possible explanation is that bicycle and walk commuters are more savvy and address-conscious than carpoolers?
As expected, people working at home should never have workplace allocation issues, since the block-tract-place-county of workplace for at-home workers is identical to their home location. Only non-home workers should be factored up in Part 2 tables.
The weirdness in the work-at-home totals (593,489 from tract data; 593,510 from county summary level) is due to rounding issues at the tract vs county level. Moral of this story: don’t expect tract-level data from the CTPP to aggregate neatly up to the county level. It’s all due to rounding. But tract-level data from standard ACS five-year tables *should* aggregate neatly up to county level.
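A minimal sketch of why rounded tract values need not sum to the rounded county value. The tract counts are made up, and rounding to the nearest 5 is an illustrative assumption, not the documented CTPP rounding rule.

```r
# Illustrative rounding rule (an assumption, not the official CTPP rule).
round_to_5 <- function(x) 5 * round(x / 5)

# Hypothetical unrounded worker counts for four tracts in one county.
tract_true <- c(12, 13, 17, 21)

# Each tract is rounded on its own, then summed to the county.
sum_of_rounded_tracts <- sum(round_to_5(tract_true))   # 10 + 15 + 15 + 20 = 60

# The county record is rounded once, from the true county total of 63.
rounded_county_total <- round_to_5(sum(tract_true))    # 65

print(c(tracts = sum_of_rounded_tracts, county = rounded_county_total))
```

The two totals differ by 5 here even though no data are missing, which is the same kind of small gap as the 593,489 versus 593,510 work-at-home totals.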
I’m thinking that the most needed set of “county correction factors” will be for part 3: factoring tract-to-tract commuters based on county-of-residence, county-of-work, and means of transportation.
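A minimal sketch of how such correction factors could be computed, assuming a table with one row per county-of-residence, county-of-work and mode. All counts and column names below are hypothetical; they are not CTPP values or CTPP variable names.

```r
# Hypothetical county-to-county flows by mode (not real CTPP numbers).
# county_total: workers from the county-to-county summary level.
# tract_sum:    workers from tract-to-tract records summed to counties.
flows <- data.frame(
  res_county   = c("06001", "06001", "06075"),
  work_county  = c("06001", "06075", "06075"),
  mode         = c("drove_alone", "bicycle", "walk"),
  county_total = c(1000, 200, 500),
  tract_sum    = c(850, 180, 450),
  stringsAsFactors = FALSE
)

# Percent of workers lost at the tract level for each cell.
flows$pct_missing <- 100 * (1 - flows$tract_sum / flows$county_total)

# Factor that scales tract-level flows back up to the county control total.
flows$factor <- flows$county_total / flows$tract_sum

print(flows)
```

Each tract-to-tract flow would then be multiplied by the factor for its county pair and mode. Whether factoring is appropriate for cells with very small tract sums is a separate judgment call.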
Here is my fully fleshed-out R script to analyze Table B202105 from both the tract summary level and county summary level files.
https://github.com/chuckpurvis/r_scripts/blob/main/ctpp1721_california_b202…
r_scripts/ctpp1721_california_b202105_1.R at main · chuckpurvis/r_scripts
I hope this is of use to the community!
Chuck Purvis
Hayward, CA
clpurvis(a)att.net
I’m extremely grateful to Stacey Bricka of MacroSys and Shichen Feng of the Fresno COG for providing insights and helping to clear my logjam/frustrations.
I’ve uploaded an R script with 10 examples that pull data from the CTPP 2017-2021 data for Parts 1, 2, and 3 at various geographic levels.
Here it is:
https://github.com/chuckpurvis/r_scripts/blob/main/ctpp1721_examples_1.R
I recommend having the CTPP Data Portal open in a web browser to help in selecting tables and understanding the geographic levels associated with each table.
Most challenging was my attempt to pull tract-to-tract worker flows for my region. I think there is an upper limit on the number of records that can be pulled through the API, somewhere between 50,000 records and 5,000,000 records.
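One possible way around a record limit is to pull the data in pages and stack the pieces. The sketch below is hypothetical: it assumes the API accepts "size" and "page" parameters (as in the chatbot's suggested call) and that a short or empty page marks the end of the data. The function fake_fetch is a stand-in for a real API call so the loop can be run offline.

```r
# Collect pages until a short or empty page is returned.
# fetch_page is any function(page, size) that returns a data frame.
fetch_all_pages <- function(fetch_page, size = 1000, max_pages = 100) {
  pages <- list()
  for (page in seq_len(max_pages)) {
    chunk <- fetch_page(page = page, size = size)
    if (is.null(chunk) || nrow(chunk) == 0) break
    pages[[length(pages) + 1]] <- chunk
    if (nrow(chunk) < size) break   # last, partial page
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}

# Stand-in for a real request: serves 2,500 fake records page by page.
fake_fetch <- function(page, size) {
  ids <- seq_len(2500)
  start <- (page - 1) * size + 1
  if (start > length(ids)) return(data.frame(id = integer(0)))
  end <- min(page * size, length(ids))
  data.frame(id = ids[start:end])
}

all_rows <- fetch_all_pages(fake_fetch, size = 1000)
print(nrow(all_rows))
```

In real use, fake_fetch would be replaced by a function that sends the GET request for one page and converts the JSON response to a data frame.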
Next steps:
1. renaming variables to be more mnemonic and memorable.
2. attaching GIS files using either tidycensus or the R package tigris.
3. mapping all of the cool patterns
4. summarizing tract-to-tract worker flows to county-to-county level, to ascertain data loss due to lack of secondary allocation.
5. There probably should NOT be any data loss in any place-to-county or county-to-place summary levels. Check this.
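Step 4 could be sketched as follows in base R. The tract IDs and worker counts are made up, and the sketch assumes each tract ID has already been reduced to the 11-digit state + county + tract FIPS code, so that the first five characters identify the county.

```r
# Hypothetical tract-to-tract flows (11-digit FIPS tract codes assumed).
tract_flows <- data.frame(
  res_tract  = c("06001400100", "06001400200", "06075010100", "06001400100"),
  work_tract = c("06075010100", "06075010200", "06075010100", "06001400200"),
  workers    = c(40, 25, 60, 15),
  stringsAsFactors = FALSE
)

# First five characters = state (2) + county (3) FIPS code.
tract_flows$res_county  <- substr(tract_flows$res_tract, 1, 5)
tract_flows$work_county <- substr(tract_flows$work_tract, 1, 5)

# Sum workers for each county-of-residence / county-of-work pair.
county_flows <- aggregate(workers ~ res_county + work_county,
                          data = tract_flows, FUN = sum)
print(county_flows)
```

The resulting county-to-county sums can then be compared with the county-to-county summary level to measure the loss.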
Hope this is of interest and use to the community.
cheers,
Chuck Purvis
Hayward, California
or contact me directly, at: clpurvis(a)att.net
# # # # #
I’ve tested the new chatbot on the CTPP Data Portal, with generally positive results.
Questions I’ve asked include:
1. How can I modify the API scripts to work in the R statistical package?
2. How do I store my API key in an R environment variable?
The chatbot gave full and easy-to-implement code to store my personal CTPP API key. Good.
Different phrasings of my request for API scripts in R gave different results.
It was awkward to cut-and-paste my question and the answer I received from the AI chatbot.
Basically:
1. load the R libraries jsonlite and httr
library(jsonlite)
library(httr)
2. Save your API key in your R environment
# Edit your R profile
usethis::edit_r_profile()
Sys.setenv(CTPP_KEY = "mytopsecretapikeyxxxxxxx")
api_key <- Sys.getenv("CTPP_KEY")
3. Define API Endpoint: Specify the URL of the API endpoint you want to access
url <- "https://ctppdata.transportation.org/api/data/2021"
4. Set Headers: Create a list of headers required for the API request, including the API key.
headers <- c(Accept = "application/json", "x-api-key" = api_key)
5. Prepare Parameters: Define the parameters needed for your API call.
# may include specifying the type of data you want (like geographic IDs)
params <- list(get = "b302100_e1,b302100_m1",geo = "C0300US06001,C0300US06005",
size = 10,page = 1)
6. Make the Request: Use the httr package to send a GET request to the API.
response <- GET(url, add_headers(.headers = headers), query = params)
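One low-risk way to start debugging step 5 is to look at the query string that the parameters produce before sending anything. The sketch below uses base R only and the helper name build_query_url is hypothetical. The parameter names (get, geo, size, page) are copied from the chatbot's answer and have not been checked against the API documentation, so they may be part of the problem.

```r
# Build the request URL by hand so the query string can be inspected.
build_query_url <- function(base_url, params) {
  pairs <- vapply(names(params), function(nm) {
    value <- utils::URLencode(as.character(params[[nm]]), reserved = TRUE)
    paste0(nm, "=", value)
  }, character(1))
  paste0(base_url, "?", paste(pairs, collapse = "&"))
}

url <- "https://ctppdata.transportation.org/api/data/2021"
params <- list(get = "b302100_e1,b302100_m1",
               geo = "C0300US06001,C0300US06005",
               size = 10, page = 1)

request_url <- build_query_url(url, params)
print(request_url)
```

After the GET call in step 6, httr's status_code(response) and content(response, as = "text") show what the server actually returned, which usually says more about a bad table or geography parameter than the R error does.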
############################################################################
The code is failing around step 5: defining the tables and geography for the API data pull….
I really could use some human assistance to fix this R code. I heard there is a cabal of MPO data people who can code anything and everything in R or Python. Can somebody help me?
I’d like to develop examples, in R script, to retrieve data for various tables (Parts 1, 2 and 3) for various geographies (state, county, place, tract). Obviously I’ll share my R code in my GitHub repository.
I just need a collaborator / hand-holder.
I received no feedback on my April 4, 2025 post to this CTPP listserv. My guess is that too few people have actually tried to use the API, or they’re too shy to provide feedback.
Hope to hear from you!
Chuck Purvis,
Hayward, California
PS, The single-year 2024 ACS data is scheduled for release in one month, on September 11, 2025. Should keep us busy for a few days. And the five-year, 2020-2024 ACS is scheduled for December 11, 2025.
###
Hello CTPP Listserv,
I'm excited to share that we have introduced a new feature on the CTPP Data Access site (https://ctppdata.transportation.org/#/index): the new CTPP Chatbot, a tool to help users navigate CTPP data more easily than ever. The chatbot can answer questions about content, methodology, and history. Simply ask a question in your own words, and the chatbot will deliver precise answers in seconds. Thank you to our partners at Macrosys for their work on this tool!
We hope that the chatbot will streamline the user experience. It learns from your interactions and feedback, so it will continually improve to serve you better with every question you ask. You can find the chatbot icon in the bottom right-hand corner of the data site - https://ctppdata.transportation.org/#/index.
If you encounter any issues with the chatbot or other elements of the data access site, please fill out a help ticket, which can be found in the upper right-hand corner of the site.
Thank you,
Julia Glickman
Program Manager for Transportation Data
AASHTO | American Association of State Highway and Transportation Officials
(202) 624-3556 | jglickman(a)aashto.org