The CDC's vaccine data is all wrong
Privacy concerns have trumped accuracy, and nobody knows what’s happening
As Florida experienced its very deadly Delta wave, the volume of deaths and hospitalizations there seemed inconsistent with the state’s superficially average levels of vaccination.
But as David Wallace-Wells pointed out, the official data actually “suggested in some places considerably more than 100 percent of local residents were vaccinated there, in some cases 200 percent of local residents.” Smart people on the internet swiftly reached a consensus that this reflected the miscounting of seasonal residents or vaccine tourists. After all, before Ron DeSantis’ political instincts led him to his current quasi-anti-vax posture, he was an early advocate of a more open, less bureaucratic approach to vaccine allocation. That meant “head to Florida to get a shot” seemed like a plausible option for a lot of affluent, vax-enthusiastic people.
That would be an unfortunate but somewhat understandable vaccine data problem.
But the truth is quite a bit worse: it really seems that all the vaccine data is bad. The CDC says, for example, that 99.9% of the senior citizen population has had at least one shot.
Pennsylvania’s health department actually did some kind of audit of the data they’d been reporting to the CDC, and with that correction, the vaccination percentage has dropped by five percentage points. But are other states correct? Is it really true that 99.9% of West Virginia seniors have had one shot? That would make it very hard to understand why 16 West Virginians are dying of Covid each day. It seems like the counts are just not correct, probably because no system was ever created to count them correctly.
Some of my friends at the Niskanen Center recently launched a project on improving America’s state capacity, and I think these basic counting functions are a good building block for thinking about this issue. Of course, just counting stuff is not a particularly impressive state function. But I think one way of distinguishing “The State” from a random hunter-gatherer band or settled village is that a state can — qua state — count things that are beyond the capacity of an individual observer. The American government is not in a state of pre-literacy where we need to invent some whole new technology in order to count things and keep records. But we are almost shockingly bad at doing it in practice.
CDC vaccination data doesn’t make sense
The second-largest state in the union is Texas, which is home to about 9% of the U.S. population. The CDC says that 92.6% of Texans over the age of 65 have received at least one dose, which is hard to square with the CDC’s own 99.9% national figure.
Or look at Maryland, where they say 99.9% of seniors have gotten at least one shot. About 15% of Maryland lives in Prince George’s County, where 98.6% of seniors are vaccinated. In Baltimore County, home to about 9% of the state, it’s 96.1% of seniors. About 585,000 of Maryland’s six million residents live in Baltimore City (an independent city-county, not part of Baltimore County), and the CDC says that 85.5% of the city’s seniors are vaccinated.
In other words, the states seem to aggregate to a lower number than the national figure. And the county numbers seem to aggregate to a lower number than the state figure.
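To make the aggregation problem concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It asks what rate everyone else would need for the headline number to hold. The function name is hypothetical, and the calculation assumes that seniors are spread across places in proportion to total population, which is a simplification the CDC data does not confirm.

```python
def required_rate_elsewhere(overall_rate, known):
    """Rate (in %) the remaining population would need for overall_rate to hold.

    overall_rate: reported percentage for the whole area.
    known: list of (population_share, reported_rate) pairs for sub-areas.
    Assumption: seniors are distributed like the total population.
    """
    known_share = sum(share for share, _ in known)
    known_contribution = sum(share * rate for share, rate in known)
    return (overall_rate - known_contribution) / (1 - known_share)


# Texas: about 9% of the population, 92.6% of seniors with at least one dose.
rest_of_us = required_rate_elsewhere(99.9, [(0.09, 92.6)])

# Maryland: Prince George's County, Baltimore County, Baltimore City.
maryland_known = [
    (0.15, 98.6),
    (0.09, 96.1),
    (585_000 / 6_000_000, 85.5),
]
rest_of_maryland = required_rate_elsewhere(99.9, maryland_known)

print(f"Rest of the US would need {rest_of_us:.1f}%")
print(f"Rest of Maryland would need {rest_of_maryland:.1f}%")
```

Under that assumption, both results come out above 100%, which is impossible for a share of real people. The exact values matter less than the fact that the pieces cannot add up to the whole.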
The inconsistency strikes me as odd, but I won’t dwell on it too much because the answer to the riddle “Why is the CDC’s vaccination count so bad?” turns out to be that actually nobody is counting. In the footnotes, they explain that they receive data that has been de-identified for privacy purposes, and consequently, you may be reported to them as two separate individuals if you get different doses from different providers. They also explain that not everyone has county of residence information, which is why the county-level counts aggregate up to a lower number than the state-level count. Then, in a methodological choice that I think was a mistake, they top-code everything at 99.9% on the theory that “this cap helps address potential overestimates of vaccination coverage due to first, second, and booster doses that were not linked.”
This top-coding lets the CDC avoid saying absurd things like 117% of the senior citizens in Montgomery County are vaccinated. But by doing so, they are obscuring the underlying flaws in the data.
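A toy illustration of why the cap hides information, with entirely made-up numbers: when an unlinked dose is counted as a new person, two places with quite different true coverage can both end up displayed at 99.9%. The function and the county figures below are hypothetical; only the 99.9% cap comes from the CDC footnote.

```python
CAP = 99.9  # the top-code value described in the CDC footnote


def reported_coverage(population, people_vaccinated, unlinked_doses, cap=CAP):
    """Return (displayed %, uncapped %) when unlinked doses count as new people."""
    counted_people = people_vaccinated + unlinked_doses
    raw = 100 * counted_people / population
    return min(raw, cap), raw


# Two hypothetical counties, each with 100,000 seniors (illustrative only).
scenarios = [
    {"true_vaccinated": 85_000, "unlinked": 15_000},
    {"true_vaccinated": 95_000, "unlinked": 22_000},
]

results = []
for s in scenarios:
    shown, raw = reported_coverage(100_000, s["true_vaccinated"], s["unlinked"])
    true_rate = 100 * s["true_vaccinated"] / 100_000
    results.append((true_rate, raw, shown))
    print(f"true {true_rate:.1f}%  uncapped {raw:.1f}%  displayed {shown:.1f}%")
```

In this sketch a county at 85% and a county at 95% are indistinguishable once the cap is applied, and the uncapped number is no better because it mixes real people with duplicate records.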
The privacy/competence tradeoff
CDC mostly seems to be contemplating data errors as stemming from administrative confusion. I know people who got their first shot in one state and second shot in another. And I know tons of people who got boosted someplace other than their original jurisdiction. Personally, I got my original two shots at a federally run mass vaccination clinic in Baltimore that didn’t exist anymore by the time the FDA authorized boosters for the general population.
Beyond that, I also know plenty of J&J people who got a booster as soon as it became overwhelmingly clear from the scientific data that this was wise. Unfortunately, the FDA took months to acknowledge this data, so many of my J&J friends believed that the best way to get the booster they wanted was to turn up at a pharmacy and claim to be an uninsured person in need of a first shot. I know other Pfizer and Moderna folks who did this in the “boosters only for people who fit certain criteria” era.
In the sense that it was good that these people got unauthorized boosters, I am glad that we opted for a system that prioritizes privacy over administrative competence.
But I don’t actually think that “we need to do a terrible job of keeping track of who is getting which vaccines and when, so that way we have a backstop in case regulators make terrible decisions” is a sound basis for running a country. Besides, the stated reason for collecting such bad data is not to allow people to get illicit boosters; it’s to protect their privacy. As I wrote in “They deliberately put errors in the Census,” I am very skeptical that the privacy value of having the government do inaccurate record-keeping is high.
After all, in the case of vaccines, what is the value of privacy?
In a practical sense, it makes it harder for governments to implement vaccine mandates. But if you think vaccine mandates are good, then we should be making it easier to implement them, not harder. And if you think vaccine mandates are the long arm of tyranny, then the solution is for courts to uphold people’s rights. Whether it’s on the rights-respecting front or on the making-good-regulatory-decisions front, I think the idea that “making the government inept is a good substitute for making it actually perform correctly” is a bad one.
Policy is full of known-unknowns
Another issue that’s been in the news for the past year and a half is the apparently rising number of shootings and murders. I say “apparently” because the FBI’s national compilation of crime statistics for 2020 didn’t come out until September 2021, long after the end of calendar year 2020. So while I think there is good reason to believe that shootings have continued rising in 2021 based on fragmentary data from a few big cities, it’s not completely clear what’s happening. And while the 15,875 law enforcement agencies that contributed to the report account for the vast majority of actual offenses, it’s still striking that there are literally thousands of agencies that don’t report timely data, all the more so because the FBI’s normal reporting window isn’t actually timely.
A lot of people in the community seem to have just reconciled themselves to the idea that this is the way it is. In some ways, it’s not a huge issue because in practice you can track the 25 or 50 biggest cities and infer a national trend from them.
But on another level, it’s actually a very big deal because if you want to make inferences about how political or policy trends may be impacting violence then you need all the data. And “How many people were murdered?” is the simple question. As Jeff Asher notes, other crime statistics like the clearance rate can be subject to bizarre definitional problems.
Unlike in the vaccine case, you can’t blame privacy for this. It seems to me that if nobody can think of a better idea, the DOJ should hire a bunch of people to make phone calls all day to every one of the 3,000 or so county coroners’ offices and ask them how many murders they logged last month. Then by the end of September, you could report on how many people were killed in August. It would be good to know!
The Economist recently reported on the number of lead water pipes in use in America, and the answer is nobody seems to know since most states do not officially track this.
Again, not great. Lead water pipes have really only become a hot policy topic in the past few years, but the broad contours of the issue have been understood scientifically for literally over a century. The government has had plenty of time to try to put together a survey, but instead — like so many things — it seems to have fallen into the crevasse of federalism, where we leave something up to the states and they mostly don’t do it.
The government is really good at counting when it tries
The tragedy of all this is that it’s honestly not as if the government can’t collect and disseminate accurate and timely statistical information. The U.S. Department of Agriculture tracks and releases tons of information about how many animals of various kinds are slaughtered, how much of various staple crops are harvested, and what the price of everything is. The Bureau of Labor Statistics compiles the Consumer Price Index on a monthly basis.
The difference is really just that at some point in the past, Congress sat down and provided the funds and the mandate to create these agencies and empower them to do a good job. But passion for this kind of capacity-building has evaporated.
And while modern computers and information technology should theoretically make this stuff easier, in practice we not only have an endless series of IT procurement fiascos, but collecting good information also rarely seems to be a priority.
At the end of the day, it’s hard to have a successful vaccination campaign if you’re not keeping track of how many people you’ve vaccinated and when. But the root of the oft-absurd CDC vaccine numbers is that the information is anonymized in a way that makes it impossible to know the thing that one would actually want to know. And I think it’s time to start insisting that it’s actually quite important for government agencies to be able to do their jobs well, even if that means spending more money or running some privacy risks.
It's simply absurd that there is a debilitating concern about privacy when it comes to public sector activities while the private sector and especially the tech giants are by far the bigger threat in the privacy realm.
The focus on privacy in the public sector is weaponized by multiple actors. On one side, it is the target of bad-faith complaining by those who wish to diminish the government's capabilities, and on the other, it is a useful bureaucratic scapegoat in the face of larger, structural issues.
This is really great, Matt! Privacy concerns also block inter-agency information sharing, and then bizarre and incomplete MOUs are created to 'share what we can'.
But the claim that no one is counting cannot be overstated. In Virginia, for example, the best eviction data we have is from an academic institution that has an unpaid intern pull the numbers from the court records. Due to the scale, the data is only regional. To the point of the article, eviction data is already registered twice by statute and we still don't care to count it: once in the court documentation mentioned above, and again by sheriffs when they place a notice of eviction.