Harvard researchers recommend Census not use privacy tool
A group of Harvard researchers has come out against the U.S. Census Bureau's use of a controversial method for protecting privacy in the numbers used to redraw congressional and legislative districts, saying the method doesn't produce data good enough for redistricting.
The Harvard researchers said in a paper released last week that using the new privacy method will make it impossible for states to comply with the requirement that districts have equal populations, a principle also known as “One Person, One Vote.” The technique also doesn’t universally protect the privacy of people who participated in the 2020 census, they said.
The privacy method adds "noise," or intentional errors, to the data to obscure the identity of any given participant in the 2020 census while still providing statistically valid information. Rather than use this technique, known as "differential privacy," the researchers said the Census Bureau should rely on a privacy method used in the 2010 census, in which data from some households were swapped with data from other households.
“Over the past half century, the Supreme Court has firmly established the principle of One Person, One Vote, requiring states to minimize the population difference across districts based on the Census data,” they wrote. Differential privacy makes it “impossible to follow this basic principle.”
The technique “negatively impacts the redistricting process and voting rights of minority groups without providing clear benefits,” the researchers said.
The Harvard researchers made the recommendation as the Census Bureau puts the final touches on how it will use differential privacy. Simultaneously, a panel of federal judges in Alabama is deliberating whether the method can be used on the redistricting data expected to be released in mid-August. Alabama’s legal challenge argues differential privacy will produce inaccurate data, and the judges could rule any day.
The Census Bureau says more privacy protections are needed as technological innovations magnify the threat of people being identified through their census answers, which are confidential by law. Computing power is now so vast that it can easily crunch third-party data sets that combine personal information from credit ratings and social media companies, purchasing records, voting patterns and public documents, among other things.
“With today’s powerful computers and cloud-ready software, bad actors can easily find and download data from multiple databases. They can use sophisticated computer programs to match information between those databases and identify the people behind the statistics we publish. And they can do it at lightning speed,” two Census Bureau officials, John Abowd and Victoria Velkoff, wrote in a blog post several weeks ago.
The Harvard team, which included political scientists, statisticians and a data scientist, simulated drawing a large number of realistic maps of political districts in different states. They used data from the 2010 census, applied the most recent Census Bureau version of the privacy technique, and followed rules requiring that districts have equal populations and be compact and contiguous.
According to the researchers, differential privacy made it more difficult to draw districts of equal population, particularly for smaller districts such as state legislative seats.
The researchers found that differential privacy undercounted racially mixed areas, as well as politically mixed areas where both Democrats and Republicans lived, while overcounting racially and politically segregated areas. That made it less predictable whether a minority voter would be included in a district where more than half of registered voters are Black or Hispanic, which could either hamper or artificially inflate the voting power of minority groups, the researchers said.
The technique “tends to introduce more error for minority groups than for White voters, and even more error for voters who are in a minority group” at the neighborhood level, the researchers said.
If the Census Bureau ends up using the technique, statisticians should favor accuracy over privacy when attempting to balance the two principles, the researchers said.
A University of Minnesota analysis, also released last week, reached conclusions similar to those of the Harvard study. While the Census Bureau's latest version of differential privacy reduced errors in total population counts compared with earlier iterations, the errors persisted for minority groups. "This level of error will severely compromise demographic and policy analyses," the University of Minnesota researchers said.
A team of Princeton researchers on Wednesday took issue with the Harvard researchers' conclusions, saying they should be viewed with skepticism until the work has been peer reviewed. In a rebuttal paper, the Princeton researchers said their analysis found no practical consequence from the differences between the raw 2010 data and the data to which differential privacy was applied.
The census data are used not only for redrawing congressional and legislative districts but also for determining how many congressional seats each state gets, as well as for distributing $1.5 trillion in federal spending each year.
Follow Mike Schneider on Twitter at https://twitter.com/MikeSchneiderAP