I don't think people set little store by "privacy" writ large, it's just that conversations about privacy are usually pretty remote from the kinds of privacy that I care about and that I suspect a lot of people care about. Why should anyone care about an Amazon algorithm, where no humans see the results connected to the customer's name, guessing their age or other demographics, in order to show you stuff they think you'd like to buy? It's different when it's the government, though; the government can fine, conscript, or arrest people, among other materially harmful things. People care about privacy relationally; it just doesn't make sense to think of privacy as a binary of "information public" vs. "information hidden". What people care about is like "I don't want my employer seeing my exercise habits / my relatives seeing my sexts / hiring managers and landlords seeing my political activities or family planning intentions / abusive exes seeing my physical location / the state keeping a record of my conduct for use in some kind of social credit system." What's worrisome is that the government can get access via subpoenas to some of the data private companies are tracking, but it's hard to see how making the census marginally less accurate helps.
Good point. These examples clarify what goals-based privacy should try to achieve, instead of vague hand wringing over database reconstructions.
It also points out that some data needs more privacy than other data. Medical data: yeah, needs to be private. The kind of toilet paper I use: happy to let anyone know.
The loss of digital privacy coincides with a massive surplus of privacy in physical reality. Try growing up in a small town. If you took a different girl than usual to the Blockbuster to pick up a movie on Friday night, your mom's friend's cousin would know about it by Saturday morning. I guess it's true that Facebook and Google probably know weird stuff about me; but in every meaningful human sense, I have way more "privacy" than all of my ancestors.
A smaller fraction of people live in small towns now, and a larger fraction of people live in big cities (or at least suburbs) where the physical realm is more anonymous than it used to be.
"This is fine if you’re an ideological libertarian who cares mostly about making the state ineffective."
I know this was tongue-in-cheek, but as an ideological libertarian myself I must object. I don't want an ineffective state--I want a state that is highly effective within a narrowly defined scope.
I know many people who work/worked at the equivalent Australian organisation. The sense I got was always that they just accepted privacy axiomatically as a good thing. No one ever considered trade-offs; it was just a thing which they did.
There's also a significant portion of maths/computer people working at these places for whom privacy maximisation is quite simply an interesting intellectual challenge and that very quickly becomes an end in and of itself.
Also it is/was very common for Australian civil servants to provide greater privacy protections than are actually necessary under the relevant legislation - because they are simply ignorant of the legislation and operate on a gut feel basis.
The legislation basically allows you to do anything reasonable provided you have consent - which is basically how the private sector operates (with consent buried in T&Cs).
That and the law of unintended consequences isn’t something they consider. I generally have the impression that the same people are both advocates of privacy and government openness and it does not even occur to them that these are mutually exclusive goals. For example, when I worked for the government, the Privacy Act was the greatest gift ever for turning away FOIA requests and denying information to the press.
This reminds me of how I recently had a chance to see how people working in real estate development (at least in the Bay Area) think about "equity" (in the social-justice sense, not the investment sense). I was coming from a pretty lefty policy program, and I expected the real estate development people in contrast to be hard-boiled capitalists. Instead, "equity" was just part of their homework: "the community" wants X, we should partner with Y organization to do Z; and they were happy to do it, but they just absolutely didn't think in philosophical terms about what "equity" means or how we really maximize it, or who "the community" is.
It's tangential, but I guess both are cases where teaching people to think more critically about first principles and higher aims would maybe help.
Yup. They crossed the database of substack subscribers with the database of maths/computer people working at these places, and your name popped right up.
Meta-observation: this was probably the first time Slow Boring alerted me to a topic entirely off my radar. (Well, there was the whole "Chad" thing, but that was half-joking...)
I liked it. An interesting twist on wonkery, reporting detailed changes like this. Good to mix in with some of the opinions - which I like, but often simply reinforce opinions I already hold.
I agree with some parts of the article more than others, but it's intellectually stimulating in the best sense. Indeed, why should I care about XYZ, and what are the effects of policy ABC? What is the goal of all of this? The last question feels especially salient in our current moment of 'rona/Afghanistan discourse.
This would get me to subscribe if I weren't already.
If you did not score high on the "openness to new experiences" variable, you would not subscribe to MY's substack to begin with. So, MY's alerting you about topics that are new to you just reinforces the opinion you already hold, that new and unfamiliar experience is good. It's not really telling you anything new about the value of newness. (Meta-meta-observation).
The thing about all the data gathered online for advertisers is that it's used mostly for bullshit. I work in advertising (though on the creative side, not in data, analytics or media buying) and can tell you a lot of what we do is bullshit. Most of it isn't even in the ads – those are straightforward compared to what vendors, agencies and clients tell each other.
Data is the latest hotness in advertising. Previously we had VR/AR and Account Planners/Strategists with British accents. While there's value in all of these, they're often used by agencies to bullshit their clients that they're using the latest, trendiest ideas. The clients then bullshit their bosses, who bullshit the CMO, who bullshits the CEO, who bullshits the board, who bullshits the shareholders. Though in the end, consumers often walk into a store and buy your client's product instead of a competitor's, and this whole apparatus gets justified.
There's a reason targeting often tries to sell you stuff you've already bought and uses dumb localization like "Hey NYC, aren't our pickup trucks great?". It's because advertisers want to sell to clients way more than they want to sell to consumers.
I agree with Matt's larger point about the perverseness of requiring the administrative state to have less access to private data than commercial entities, but I'm still not clear on what Census is doing for the 2020 census that is especially problematic.
Title 13 requires the census to avoid disclosure. Here's a working paper from the Census (https://www.census.gov/library/working-papers/2018/adrm/cdar2018-01.html) that describes how they did it from 1970-2010. If you ask the Census people, the 2020 method that uses differential privacy is less hamfisted than previous methods that just blanked out tables or injected synthetic data using cruder methods. Are they wrong about that?
What I sort of suspect is happening is that privacy researchers in academia know a lot about differential privacy, which requires lots of fancy statistics and is the new hotness, and so poking holes in the Census's 2020 method generates academic papers in a way that saying, "deleting tables and blanking out data is bad" doesn't. It's fine to say that Census is screwing up differential privacy in 2020 but if it's still an improvement (again, I don't know!) on what they were doing 1970-2010 then we've kind of lost the plot here.
Yeah, have to agree: Matt's stepping into a field that is far, far outside his expertise. Lots of researchers agree that the 2020 method is better for the data, regardless of the privacy. The fact that there's serious discussion among people who really know differential privacy about implementation doesn't mean it's a bad idea.
“Lots of researchers agree” = almost entirely CS/math academics and privacy advocates and no actual census data users. Differential privacy is well suited to broad statistical analyses where a few general queries are asked of a database. It’s not designed for conveying many thousands of small counts accurately, which is what small communities (not “researchers”) need and expect from the census to answer many basic questions: has my community grown? By how much? Are we getting older, more diverse, etc.? Add more than a little noise to any one community’s data (as DP almost inevitably will) and that community is unfairly disadvantaged. And for what gain?
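The small-counts worry above is easy to demonstrate. Here's a toy sketch (all numbers and the function name are made up for illustration) of a basic Laplace mechanism, the kind of noise injection DP relies on: the same absolute noise that is invisible on a metro-area count is a substantial relative error on a 40-person block.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon=0.5, sensitivity=1):
    # Laplace mechanism: noise scale = sensitivity / epsilon, so a
    # stricter privacy budget (smaller epsilon) means more noise.
    return true_count + rng.laplace(scale=sensitivity / epsilon)

big_city = 500_000   # hypothetical metro-area count
small_town = 40      # hypothetical rural block count

# Average the relative error over many releases: the same absolute noise
# is invisible on the city but substantial on the block.
big_errs = [abs(laplace_count(big_city) - big_city) / big_city for _ in range(1000)]
small_errs = [abs(laplace_count(small_town) - small_town) / small_town for _ in range(1000)]

print(sum(big_errs) / 1000)    # negligible
print(sum(small_errs) / 1000)  # around 5% on average, and sometimes far worse
```

With these (arbitrary) parameters the typical noise draw is about 2 people either way, which is exactly the "more than a little noise" problem for a community of 40 and a rounding error for a city of 500,000.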
I have seen a lot of stupidity on this subject. Anybody remember how during the Obama years we had the terrible fears that the NSA would be preserving telephone billing records? Yep. They were going to know who was calling who and when forever. Big deal thinks I. That information is absolutely useless except forensically and it can only be accessed by a FISA warrant. I was quite comfortable being buried in a mass of information of stupendous size. I do not care if anyone in any government knows what my favorite pizza place might be. Forensically on the other hand it was a great idea to collect this information and preserve it.
Timothy McVeigh was caught within hours of committing a horrendous act of terrorism. How? Well, the rear axle of the truck he rented and filled with explosives had a serial number on it. All part of the quality control and product tracing every manufacturer does. Through GM's database that number was attached to a VIN, which led to the rental outfit, which led to Tim. And it happened within minutes of knowing that axle number.
Now a stupendous pile of phone records or similar data is absolutely useless for finding anyone before they commit a criminal or terrorist act. But once they have done it then you are absolutely going to want to know who that person was talking to and probably for years. It was a great idea that enhanced everyone's security. Except for the paranoid for whom nothing is ever satisfactory.
Yea, I think that we can say with confidence that the technological means to maintain massive databases of just about everything exist, and that as such this will happen, everywhere.
The goal of "privacy advocates" should now be to limit how the data can be used by governments, lest we end up with China's genuinely terrifying panopticon.
My recollection of some of these NSA metadata complaints was that the warrant protections were insufficient - these warrant requests were basically always rubber-stamped yes.
For things that should only be available "with a warrant", controlling warrant access appropriately seems like a good way to try and regulate privacy.
> Anybody remember how during the Obama years we had the terrible fears that the NSA would be preserving telephone billing records? Yep. They were going to know who was calling who and when forever. Big deal thinks I. That information is absolutely useless except forensically and it can only be accessed by a FISA warrant.
They literally used that information to designate people as terrorists and kill them. Even if you think call logs don't provide any useful information, they certainly thought the information they had was enough to justify killing people. How do you know a future administration won't decide to play 7 Degrees of Terrorist Bacon and kill you? Or someone you care about? Or decide to play 7 Degrees of Political Opponent instead.
> Some jurisdictions have moved toward automated enforcement of speeding rules via cameras which is good ...
This is bad, actually. Yeah it catches more violators, but speed limits being inviolable is a problem. Speed limits aren't well-suited to describing the actual circumstances of the road, where sometimes it makes sense to go faster, and sometimes it makes sense to go slower. If you don't let someone go 50 in a 40 when the road is open and visibility is good, you're actually just incinerating time from people's lives for no gain.
If we want automatically enforced speed limits, we should make speed limits suck less first.
Now I’ve heard everything. Excessive speed is a major factor in automobile accidents. These accidents often kill people, especially in the United States. We’re willing to “Incinerate time” because the benefit (allowing more people to live) outweighs the costs.
The benefit sometimes does and sometimes doesn't outweigh the costs -- it's certainly not true that you'd support a 10 MPH speed limit on all roads at all times. And I'm obviously not saying that speed limits are always bad, but rather that they're generally not well-calibrated, and the fact that they don't adapt to circumstances means that even if we did pick the best single number for each road (which we don't do), we'd be leaving a lot on the table.
IDK about that. I've often heard arguments from the road safety lobby along the lines of "x% of vehicle collisions involve speed". By which they mean any speed in excess of the speed limit.
But what % of vehicles are exceeding speed limits in free-flowing traffic anyway?
This is one of the dumbest takes I've seen in Matt's comments. Your ability to kill someone on an urban road rises steeply with speed. Just going 10 miles over the 25 mph speed limit turns any collision with a pedestrian from something they'll probably survive into something that is almost certainly going to kill them. Now on controlled-access highways things are different, and we can debate speed there, but basically anywhere in the US where pedestrians and vehicles interact, the speed limit is probably too high.
The speed cameras on Connecticut Avenue in DC (actually just inside Montgomery County) have done wonders to slow the average speed to the posted 30 mph. The road itself is capable of much higher speeds but it is interspersed with pedestrian heavy retail and residential stretches.
DC itself is pushing for 20 mph as the standard surface street speed limit. As a bicyclist this would go a long way to making streets much more bicycle friendly.
Thing is, there are just two cameras, quite visible and a couple hundred yards apart. So you tap your brakes twice in 20 seconds and you're good to go as fast as you like.
One of the other issues is that if the limit is 65, but everyone on the road is going 70, then it's better if I go 70. On the other hand, if everyone who went 70 were getting tickets, then that probably wouldn't be an issue. But... when cameras aren't ubiquitous yet... which kind of jurisdiction am I in?
I'd be curious about putting cameras in that don't actually ticket people and gathering information that way first.
I am not authorized to speak on behalf of my employer, but I want to say all the big tech companies are working on differential privacy. I personally work on differential privacy at a big tech firm and I know people who do the same at every other big tech firm. The Census in this case is helping set an example.
Also, the idea of differential privacy is not just that you aggregate at higher levels to eliminate the noise. The shape of the introduced noise is disclosed. When you do statistical analysis, you always assume some noise in your data. When analyzing differentially-private datasets, you basically just include the noise as part of your model.
The other really nice thing about differential privacy is that it is quantifiable and provable. That's going to make it a lot easier to communicate about privacy, set standards, etc... In and of itself, the Census' adoption isn't going to make a huge difference. But it is contributing to an overall trend in the correct direction.
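To make "include the noise as part of your model" concrete, here's a small sketch with made-up numbers: because the noise distribution and its scale are published, a data user aggregating noisy counts can attach an exact noise-based margin of error instead of treating the figures as exact.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical published DP counts: independent Laplace noise with a
# *disclosed* scale b, so the noise variance (2 * b**2 per count) is public.
b = 2.0
true_counts = np.array([120.0, 95.0, 210.0, 150.0])
released = true_counts + rng.laplace(scale=b, size=true_counts.size)

# The noise is zero-mean, so the sum of released counts is an unbiased
# estimate of the true total, and its noise variance is known exactly.
total = released.sum()
noise_sd = (true_counts.size * 2 * b**2) ** 0.5

# Report the total with a noise-aware margin instead of pretending it's exact.
print(f"{total:.1f} +/- {1.96 * noise_sd:.1f}")
```

The quantifiability the parent comment mentions is exactly this: the margin comes from a published parameter, not from guesswork about what the disclosure-avoidance step did.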
Thank you for helping clarify the source of this movement—DP is being driven by tech bros and CS nerds.
However, none of this is helpful for communication. It's obviously much clearer for the census to say that they swap some characteristics in certain tables than it is for them to use arcane “noise injection” methods.
I also work at a big tech firm on differential privacy and my impression is that we make a big deal about it because if we don't, and something bad happens, which it has in the past and probably will again, everyone will be like "Well why didn't you do this one simple thing to prevent this from happening" and it is not an acceptable answer to say "Because actually nobody cares" which although it may be true does not appear from the perspective of elected officials or regulators to be the case.
The first differentially-private mechanism we know of was designed for social science surveys in the 1960s to study sensitive topics such as sexuality and crime. The formal definition of differential privacy we use is from a 2006 paper whose first author is Cynthia Dwork. Dismissing DP as the invention of tech bros is both the genetic fallacy and factually incorrect.
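For the curious, that early survey mechanism is randomized response, and a coin-flip variant of it fits in a few lines (the names below are my own, for illustration): each respondent answers truthfully only half the time, which gives every individual answer plausible deniability, yet the population rate is still recoverable. This variant satisfies epsilon-DP with epsilon = ln 3.

```python
import random

random.seed(42)

def randomized_response(truth: bool) -> bool:
    # Heads (prob 1/2): answer truthfully. Tails: answer uniformly at random.
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

def estimate_rate(answers):
    # P(yes) = p/2 + 1/4, so an unbiased estimator is p = 2*(P(yes) - 1/4).
    frac_yes = sum(answers) / len(answers)
    return 2 * (frac_yes - 0.25)

# Hypothetical survey where 30% of respondents truly answer "yes".
truths = [random.random() < 0.3 for _ in range(100_000)]
answers = [randomized_response(t) for t in truths]
print(estimate_rate(answers))  # close to 0.3, despite every answer being deniable
```

No computer is even required: respondents can flip the coins themselves, which is why this predates the tech industry entirely.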
I’m asserting that the *demand* is an invention of obsessive tech bros—which it is. No normal citizens or political leaders are demanding that you folks fuck up the census data and ruin the apportionment process. This is a solution in search of a problem, classic Silicon Valley.
The census bureau does not add noise to apportionment-related figures. Their mandate to use differential privacy explicitly prohibited them from doing that.
The census bureau is required by law to keep the information they collect confidential, and that's why they are adopting this technique instead of other techniques that do not maintain confidentiality. Maybe keeping the census data confidential is unimportant, but for now it is actually legally mandated, and DP is the best way to do it if they want to release useful information.
On a lighter subject than terrorism, consider transportation planning. You know what is really useful for planning that? Knowing where you are coming from and where you are going, door to door. And not just when you turn a turnstile. If you want better public transportation, stop being a paranoid idiot.
This is a bit tangential, but I've always wondered why there was so much hand wringing about the privacy implications of supermarket and drug store loyalty cards. The deal always seemed straightforward:
1. The store would like to know what you're buying
2. This data has value so they pay you for it in the form of discounts
3. It's easy to opt out on any transaction — don't scan the card, or if you're paranoid, use cash
4. This is information I want the store to know – I'm happy to tell the store which SKU of toothpaste I use so they're more likely to stock it
5. The most privacy-sensitive purchases – prescription meds – I believe are covered by HIPAA
There was a really good article a while back (NYT, I think) about Target's early attempts to send individualized flyers to households based on their purchase history. It turns out that creates enough of a backlash that they had to implement counterintelligence in designing their flyers - Target was figuring out who was pregnant before the rest of the household was informed.
Ha, I remember that one. On a sadder note, it was very depressing to get diaper and formula samples after I went through a miscarriage… (I went on to have a healthy kiddo so it’s water under the bridge now!)
My store mails me coupons for the specific items I buy, and when I order online, they helpfully suggest things I may have forgotten that I often buy. I love it! I wish they went a step further and just predicted everything I was going to buy each week, ha!
I wish they did too, and that coupons were good only for ordering online. My experience of paper coupons is they go right to the trash and are good only for old ladies digging through their purses while inconveniencing everyone behind them in the checkout line.
I don't really understand the practical concerns about privacy. The people I know who are super concerned have a wildly inflated view of their own importance. When they rant I just think to myself - no one gives a shit about you or what you do.
That said, I can certainly understand: if you're friends with your boss or co-workers on Facebook, don't post photos of your drunken shenanigans at 12:30am and then call in sick later that morning. Is that what people are talking about? I don't get the sense that they are. It's more a concern about some faceless entity doing something they can't quite define.
The surveillance the FBI conducted against Muslim Americans after 9/11 was a nightmare, to take a recent example. When I feel concerned about privacy, I’m mostly concerned about what would happen to myself or others who get on the wrong side of the US government surveillance apparatus. Surveillance capitalism seems only to support that vast potential for truly scary stuff. We don’t know the future of American politics and I don’t trust those guys.
It seems like what you're talking about is related more to intent and much less to capabilities. If I wanted to find all the Muslims in America I wouldn't start with a database reconstruction attack. The risk to Muslims from whatever I was doing would be 99.999% whatever intent I had behind it, not the tiny usefulness of additional census data.
Is there a quick summary about what the nightmare was for Muslim Americans? I remember some nightmarish things like being secretly put on do-not-fly lists, but I wasn't sure what the nightmares were that related to privacy.
How about infiltrating churches, synagogues and temples as well as mosques, and broadcasting what the priests, rabbis and imams are spouting to the devout? Would make the 75% of us who loathe religion more confident our government is on top of things.
Thanks - I remembered some stuff about undercover agents infiltrating Islamic groups, but couldn't think of what in that would be "nightmarish". The sting aspect seems deeply problematic, but I'm not totally sure about what the problem is of having government agents within various organizations.
> I'm not totally sure about what the problem is of having government agents within various organizations.
Lawyers tell you that you should always invoke the 5th Amendment when speaking to the police. You don't know every law on the books, so you don't know if some common innocuous thing you're admitting to is actually illegal. You don't know if you're admitting to something innocent that can be contextualized as not innocent. Having government agents infiltrate organizations means you're potentially always talking with the police. I know people who live in places where there are government spies everywhere, and it's not a pleasant way to live.
For one thing, data from the Census Bureau was used to identify where Japanese-Americans lived during WW2 and round them up. This is a step away from them being able to do that kind of thing.
I've noticed that privacy advocates tend to take a very siloed approach when evaluating these issues.
So for COVID19 tracking apps, they might say "here's all the possible data it could reveal about you", rather than "here's the additional information it could reveal, on top of what Facebook and the cellular carriers already track."
That's because if tomorrow we try to get Facebook to be more privacy-preserving, people will respond with "Well, the COVID19 tracking app knows this, so why should we care?" Basically, we're in a flooding ship trying to get some pumps working and while it won't make a big incremental difference today, punching more holes in the hull is going to make things harder in the long run.
I don't know whether opt-in contact tracing was ultimately a good goal or not, but personally it made me feel more comfortable. I really appreciated that Google and Apple seemed to work really hard on making an API that could be used by health services to track data between nearby phones for spread _without_ revealing who those people were (keys changed regularly, it would push to YOU that you had been near someone, but not tell health services).
I felt completely reassured by this, and was eager to download the contact tracing app that supported their API and... couldn't ever find it. And never saw advertisements or any info on Twitter or anything else on how to download it. (Maybe it came about later, but I was barely coming in contact with anyone anyway, so there's some inertia to overcome for checking.)
I'm not convinced privacy issues were the problem here, in that even when I _wanted_ to download an app with these protections that Google/Apple made possible, I could not FIND it.
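For anyone wondering how "track proximity without revealing who" can work at all, here's a deliberately simplified toy sketch of the rotating-identifier idea. To be clear, this is an illustration of the concept, not the actual Google/Apple Exposure Notification spec (which uses different cryptographic primitives and key schedules): phones broadcast short-lived identifiers derived from a per-day secret, and matching happens on the device only after an infected user chooses to publish their daily keys.

```python
import hmac
import hashlib
import os

def daily_key() -> bytes:
    # A fresh random secret generated on the phone each day.
    return os.urandom(16)

def rolling_id(key: bytes, interval: int) -> bytes:
    # Derive a short-lived broadcast identifier from the daily key.
    # Without the key, successive identifiers are unlinkable to observers.
    return hmac.new(key, interval.to_bytes(4, "big"), hashlib.sha256).digest()[:16]

# Alice's phone broadcasts rotating IDs; Bob's phone just stores what it hears.
alice_key = daily_key()
heard_by_bob = {rolling_id(alice_key, i) for i in range(96)}  # e.g. 15-minute slots

# If Alice tests positive and publishes alice_key, Bob's phone re-derives her
# identifiers locally and checks for overlap - no server learns who Bob is.
rederived = {rolling_id(alice_key, i) for i in range(96)}
print(bool(heard_by_bob & rederived))  # True: the match happens on-device
```

The design choice worth noticing is that the health service only ever handles keys of people who test positive and consent, which is what made the API feel reassuring in the first place.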
As for cops bricking cars - sounds good, but any backdoor you put in a car is something that can be used by rogue actors. If you create a backdoor for the gov't you create a backdoor for everyone. I'm _more_ in favor of allowing existing backdoors to be usable by the gov't (perhaps to help with the census) than creating new ones.
On several occasions I’ve done research on sensitive business data that had to be provided by a firm with a government agency overseeing. Invariably the firm reps tell me their competitors already know all this stuff and are pretty nonchalant about disclosure while the gov agency insists I can’t report X or Y because the firms have a right to privacy. So, names can’t be used, data are turned into indices that can’t be reverse engineered, regressions redact coefficients, stuff like that. The agency is just following the rules, of course, but it’s always funny how the protected business shrugs with a “meh” during the data gathering.
As one of the "meh" answerers: usually only about 1% of the private data actually matters, and for that 1% it's usually sufficient that disclosure is legally forbidden. That reduces the risk of harm to acceptable levels.
Regressions redact coefficients? Leaving aside what that has to do with sensitive business data, what is one supposed to do with a regression that has no coefficients? Or is providing nonsense the point?
Yeah, I know. So, what the agency asks us to do is usually something like redact all the intercepts and one coefficient on a particular variable. They are worried someone will be able to figure out a particular firm from the underlying regression.
I don't think people set little store by "privacy" writ large, it's just that conversations about privacy are usually pretty remote from the kinds of privacy that I care about and that I suspect a lot of people care about. Why should anyone care about an Amazon algorithm, where no humans see the results connected to the customer's name, guessing their age or other demographics, in order to show you stuff they think you'd like to buy? It's different when it's the government, though; the government can fine, conscript, or arrest people, among other materially harmful things. People care about privacy relationally; it just doesn't make sense to think of privacy as a binary of "information public" vs. "information hidden". What people care about is like "I don't want my employer seeing my exercise habits / my relatives seeing my sexts / hiring managers and landlords seeing my political activities or family planning intentions / abusive exes seeing my physical location / the state keeping a record of my conduct for use in some kind of social credit system." What's worrisome is that the government can get access via subpoenas to some of the data private companies are tracking, but it's hard to see how making the census marginally less accurate helps.
Good point. These examples clarify what goals-based privacy should try to achieve, instead of vague hand wringing over database reconstructions.
It's also points out that some data needs more privacy than others. Medical data: Yea, needs to be private. The kind of toilet paper I use: Happy to let anyone know.
The loss of digital privacy coincides with a massive surplus of privacy in physical reality. Try growing up in a small town. If you took a different girl than usual to the Blockbuster to pick up a movie on Friday night, your mom's friend's cousin would know about it by Saturday morning. I guess it's true that Facebook and Google probably know weird stuff about me; but in every meaningful human sense, I have way more "privacy" than all of my ancestors.
Why? Gossip still exists today.
A smaller fraction of people live in small towns now, and a larger fraction of people live in big cities (or at least suburbs) where the physical realm is more anonymous than it used to be.
"This is fine if you’re an ideological libertarian who cares mostly about making the state ineffective."
I know this was tongue-in-cheek, but as an ideological libertarian myself I must object. I don't want an ineffective state--I want a state that is highly effective within a narrowly defined scope.
I know many people who work/worked at the equivalent Australian organisation. The sense I got was always that they just accepted privacy axiomatically as a good thing. No one ever considered trade-offs instead it was just a thing which they did.
There's also a significant portion of maths/computer people working at these places for whom privacy maximisation is quite simply an interesting intellectual challenge and that very quickly becomes an end in and of itself.
Also it is/was very common for Australian civil servants to provide greater privacy protections than are actually necessary under the relevant legislation - because they are simply ignorant of the legislation and operate on a gut feel basis.
The legislation basically allows you to do anything reasonable provided you have consent - which is basically how the private sector operates (with consent buried in T&Cs).
That and the law of unintended consequences isn’t something they consider. I generally have the impression that the same people are both advocates of privacy and government openness and it does not even occur to them that these are mutually exclusive goals. For example, when I worked for the government, the Privacy Act was the greatest gift ever for turning away FOIA requests and denying information to the press.
This reminds me of how I recently had a chance to see how people working in real estate development (at least in the Bay Area) think about "equity" (in the social-justice sense, not the investment sense). I was coming from a pretty lefty policy program, and I expected the real estate development people in contrast to be hard-boiled capitalists. Instead, "equity" was just part of their homework: "the community" wants X, we should partner with Y organization to do Z; and they were happy to do it, but they just absolutely didn't think in philosophical terms about what "equity" means or how we really maximize it, or who "the community" is.
It's tangential, but I guess both are cases where teaching people to think more critically about first principles and higher aims would maybe help.
> maths/computer people working at these places for whom privacy maximisation is quite simply an interesting intellectual challenge
I feel like I'm being called out.
Yup. They crossed the database of substack subscribers with the database of maths/computer people working at these places, and your name popped right up.
Meta-observation: this was probably the first time Slow Boring alerted me to a topic entirely off my radar. (Well, there was the whole "Chad" thing, but that was half-joking...)
I liked it. An interesting twist on wonkery, reporting detailed changes like this. Good to mix in with some of the opinions - which I like, but often simply reinforce opinions I already hold.
I agree with some parts of the article more than others, but it's intellectually stimulating in the best sense. Indeed, why should I care about XYZ, and what are the effects of policy ABC? What is the goal of all of this? The last question feels especially salient in our current moment of 'rona/Afghanistan discourse.
This would get me to subscribe if I weren't already.
If you did not score high on the "openness to new experiences" variable, you would not subscribe to MY's substack to begin with. So, MY's alerting you about topics that are new to you just reinforces the opinion you already hold, that new and unfamiliar experience is good. It's not really telling you anything new about the value of newness. (Meta-meta-observation).
The thing about all the data gathered online for advertisers is that it's used mostly for bullshit. I work in advertising (though on the creative side, not in data, analytics or media buying) and can tell you a lot of what we do is bullshit. Most of it isn't even in the ads – those are straightforward compared to what vendors, agencies and clients tell each other.
Data is the latest hotness in advertising. Previously we had VR/AR and Account Planners/Strategists with British accents. While there's value in all of these, they're often used by agencies to bullshit their clients into believing they're using the latest, trendiest ideas. The clients then bullshit their bosses, who bullshit the CMO, who bullshits the CEO, who bullshits the board, which bullshits the shareholders. Though in the end, consumers often walk into a store and buy your client's product instead of a competitor's, and this whole apparatus gets justified.
There's a reason targeting often tries to sell you stuff you've already bought and uses dumb localization like "Hey NYC, aren't our pickup trucks great?". It's because advertisers want to sell to clients way more than they want to sell to consumers.
I agree with Matt's larger point about the perverseness of requiring the administrative state to have less access to private data than commercial entities, but I'm still not clear on what the Census is doing for the 2020 census that is especially problematic.
Title 13 requires the census to avoid disclosure. Here's a working paper from the Census (https://www.census.gov/library/working-papers/2018/adrm/cdar2018-01.html) that describes how they did it from 1970-2010. If you ask the Census people, the 2020 method that uses differential privacy is less hamfisted than previous methods that just blanked out tables or injected synthetic data using cruder methods. Are they wrong about that?
What I sort of suspect is happening is that privacy researchers in academia know a lot about differential privacy, which requires lots of fancy statistics and is the new hotness, and so poking holes in the Census's 2020 method generates academic papers in a way that saying, "deleting tables and blanking out data is bad" doesn't. It's fine to say that Census is screwing up differential privacy in 2020 but if it's still an improvement (again, I don't know!) on what they were doing 1970-2010 then we've kind of lost the plot here.
Yeah, have to agree: Matt's stepping into a field that is far, far outside his expertise. Lots of researchers agree that the 2020 method is better for the data, regardless of the privacy. The fact that there's serious discussion among people who really know differential privacy about implementation doesn't mean it's a bad idea.
“Lots of researchers agree” = almost entirely CS/math academics and privacy advocates and no actual census data users. Differential privacy is well suited to broad statistical analyses where a few general queries are asked of a database. It’s not designed for conveying many thousands of small counts accurately, which is what small communities (not “researchers”) need and expect from the census to answer many basic questions: has my community grown? By how much? Are we getting older, more diverse, etc.? Add more than a little noise to any one community’s data (as DP almost inevitably will) and that community is unfairly disadvantaged. And for what gain?
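The small-counts complaint is easy to illustrate with a toy simulation. This is a hedged sketch, not the Census Bureau's actual TopDown algorithm; the epsilon value and the bare Laplace count query are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
epsilon = 0.5
scale = 1.0 / epsilon  # Laplace noise scale for a sensitivity-1 count query

# The same absolute noise is added to every count, so the *relative*
# error is negligible for a big total but huge for a small community.
rel_err = {}
for true_count in (10, 1_000, 100_000):
    noisy = true_count + rng.laplace(0, scale, size=10_000)
    rel_err[true_count] = np.mean(np.abs(noisy - true_count)) / true_count
    print(f"{true_count}: {rel_err[true_count]:.2%} mean relative error")
```

The mean absolute noise is the same (about 2 people) for every query, which is why a town of 10 sees roughly 20% error while a county of 100,000 sees effectively none.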
The only gain is the satisfaction and career advancement of weird nerds
No actual social science researchers think this is an improvement. CS nerds love DP in and of itself.
That sounds extremely plausible to me.
I have seen a lot of stupidity on this subject. Anybody remember how during the Obama years we had the terrible fears that the NSA would be preserving telephone billing records? Yep. They were going to know who was calling who and when forever. Big deal thinks I. That information is absolutely useless except forensically and it can only be accessed by a FISA warrant. I was quite comfortable being buried in a mass of information of stupendous size. I do not care if anyone in any government knows what my favorite pizza place might be. Forensically on the other hand it was a great idea to collect this information and preserve it.
Timothy McVeigh was caught within hours of committing a horrendous act of terrorism. How? Well, the rear axle of the truck he rented and filled with explosives had a serial number on it, all part of the quality control and product tracing every manufacturer does. Through GM's database that number was attached to a VIN, which led to the rental outfit, which led to Tim. And it happened within minutes of knowing that axle number.
Now a stupendous pile of phone records or similar data is absolutely useless for finding anyone before they commit a criminal or terrorist act. But once they have done it then you are absolutely going to want to know who that person was talking to and probably for years. It was a great idea that enhanced everyone's security. Except for the paranoid for whom nothing is ever satisfactory.
Yea, I think that we can say with confidence that the technological means to maintain massive databases of just about everything exist, and that as such this will happen, everywhere.
The goal of "privacy advocates" should now be to limit how the data can be used by governments, lest we end up with China's genuinely terrifying panopticon.
My recollection of some of these NSA metadata complaints were that the warrant protections were insufficient - some of these warrant requests were always rubber-stamped yes.
For things that should only be available "with a warrant", controlling warrant access appropriately seems like a good way to try and regulate privacy.
Agreed.
That and data security enhancements.
But the latter is a hard sell.
> Anybody remember how during the Obama years we had the terrible fears that the NSA would be preserving telephone billing records? Yep. They were going to know who was calling who and when forever. Big deal thinks I. That information is absolutely useless except forensically and it can only be accessed by a FISA warrant.
They literally used that information to designate people as terrorists and kill them. Even if you think call logs don't provide any useful information, they certainly thought the information they had was enough to justify killing people. How do you know a future administration won't decide to play 7 Degrees of Terrorist Bacon and kill you? Or someone you care about? Or decide to play 7 Degrees of Political Opponent instead.
> Some jurisdictions have moved toward automated enforcement of speeding rules via cameras which is good ...
This is bad, actually. Yeah it catches more violators, but speed limits being inviolable is a problem. Speed limits aren't well-suited to describing the actual circumstances of the road, where sometimes it makes sense to go faster, and sometimes it makes sense to go slower. If you don't let someone go 50 in a 40 when the road is open and visibility is good, you're actually just incinerating time from people's lives for no gain.
If we want automatically enforced speed limits, we should make speed limits suck less, first
Now I’ve heard everything. Excessive speed is a major factor in automobile accidents. These accidents often kill people, especially in the United States. We’re willing to “Incinerate time” because the benefit (allowing more people to live) outweighs the costs.
The benefit sometimes does and sometimes doesn't outweigh the costs -- it's certainly not true that you'd support a 10 MPH speed limit on all roads at all times. And I'm obviously not saying that speed limits are always bad, but rather, that they're generally not well-calibrated and the fact that they don't adapt to circumstances means that even if we did pick the best single number for each road (which we don't do) we'd be leaving a lot on the table
IDK about that. I've often heard arguments from the road safety lobby along the lines of "x% of vehicle collisions involve speed", by which they mean any speed in excess of the speed limit.
But what % of vehicles are exceeding speed limits in free-flowing traffic anyway?
This is one of the dumbest takes I've seen in Matt's comments. Your ability to kill someone on an urban road rises steeply with speed: just going 10 miles over a 25 mph speed limit turns a collision with a pedestrian from something they'll probably survive into something that is almost certain to kill them. Now, on controlled-access highways things are different, and we can debate speed there, but basically anywhere in the US where pedestrians and vehicles interact, the speed limit is probably too high.
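One physical reason "10 over" matters more than it sounds: crash energy is kinetic energy, which grows with the square of speed. A quick back-of-the-envelope illustration:

```python
# Kinetic energy scales as v**2, so the energy ratio between two
# speeds is (v2 / v1) ** 2, independent of the vehicle's mass.
base = 25  # mph
ratios = {mph: (mph / base) ** 2 for mph in (25, 35, 45)}
for mph, r in ratios.items():
    print(f"{mph} mph carries {r:.1f}x the crash energy of {base} mph")
```

Going 35 in a 25 roughly doubles the energy delivered in a collision, and 45 more than triples it.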
I don’t think we’re talking about core urban grids here, but roads whose physical characteristics are wildly out of line with their posted limits.
Lots of core urban roads are like that. Even in a relatively walkable place like Chicago the half mile arterials can be quite dangerous to cross.
The speed cameras on Connecticut Avenue in DC (actually just inside Montgomery County) have done wonders to slow the average speed to the posted 30 mph. The road itself is capable of much higher speeds but it is interspersed with pedestrian heavy retail and residential stretches.
DC itself is pushing for 20 mph as the standard surface street speed limit. As a bicyclist this would go a long way to making streets much more bicycle friendly.
Thing is, there are just two cameras, quite visible and a couple hundred yards apart. So, you tap your brakes twice in 20 seconds and you're good to go as fast as you like
One of the other issues is that if the limit is 65, but everyone on the road is going 70, then it's better if I go 70. On the other hand, if everyone who went 70 were getting tickets, then that probably wouldn't be an issue. But... when cameras aren't ubiquitous yet... which kind of jurisdiction am I in?
I'd be curious about putting cameras in that don't actually ticket people and gathering information that way first.
Speed limits are good example of "whatever is measurable becomes the metric".
Speed is easy to measure, so it's used as a simple proxy for driving safety, even where the evidence shows that the slowest drivers are not the safest.
Slower roads are safer though, so IDK what you're going on about.
I am not authorized to speak on behalf of my employer, but I want to say all the big tech companies are working on differential privacy. I personally work on differential privacy at a big tech firm and I know people who do the same at every other big tech firm. The Census in this case is helping set an example.
Also, the idea of differential privacy is not just that you aggregate at higher levels to eliminate the noise. The shape of the introduced noise is disclosed. When you do statistical analysis, you always assume some noise in your data; when analyzing differentially private datasets, you basically just include the known noise as part of your model.
The other really nice thing about differential privacy is that it is quantifiable and provable. That's going to make it a lot easier to communicate about privacy, set standards, etc... In and of itself, the Census' adoption isn't going to make a huge difference. But it is contributing to an overall trend in the correct direction.
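The "include the noise in your model" point can be made concrete with the classic Laplace mechanism. This is a minimal sketch, not any production DP library; the function name and epsilon value are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon):
    # Adding or removing one person changes a count by at most 1,
    # so Laplace noise with scale 1/epsilon gives epsilon-DP.
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

epsilon = 0.5
releases = [laplace_count(100, epsilon) for _ in range(10_000)]

# Because the noise distribution is public, an analyst can model it:
# each release is unbiased, with known variance 2 / epsilon**2 == 8.
print(np.mean(releases))  # close to the true count of 100
print(np.var(releases))   # close to 8
```

The quantifiable part is exactly this: epsilon is a single public knob, and the noise it implies is fully specified rather than being an undocumented perturbation.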
Thank you for helping clarify the source of this movement—DP is being driven by tech bros and CS nerds.
However, I doubt this is helpful for communication. It's obviously much clearer for the census to say that they swap some characteristics in certain tables than it is for them to explain arcane “noise injection” methods.
I also work at a big tech firm on differential privacy and my impression is that we make a big deal about it because if we don't, and something bad happens, which it has in the past and probably will again, everyone will be like "Well why didn't you do this one simple thing to prevent this from happening" and it is not an acceptable answer to say "Because actually nobody cares" which although it may be true does not appear from the perspective of elected officials or regulators to be the case.
Are there elected officials who are actually demanding that the census bureau put out fake results? Or is this just an invention of obsessive CS bros?
The first differentially-private mechanism we know of was designed for social science surveys in the 1950s to study sensitive topics such as sexuality and crime. The formal definition of differential privacy we use is from a 2006 paper whose first author is Cynthia Dwork. Dismissing DP as the invention of tech bros is both the genetic fallacy and factually incorrect.
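The survey mechanism referenced here is randomized response. A toy simulation (the coin probabilities and rates are illustrative, not a reconstruction of any historical study) shows how individual deniability coexists with an accurate population estimate:

```python
import random

random.seed(42)

def randomized_response(truth: bool) -> bool:
    # Flip a coin: heads, answer truthfully; tails, flip again and
    # report that second coin. Any single "yes" is plausibly deniable,
    # but in aggregate P(yes) = 0.5 * true_rate + 0.25.
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

# Simulate a survey where the true rate of the sensitive trait is 30%.
n = 100_000
true_rate = 0.30
answers = [randomized_response(random.random() < true_rate) for _ in range(n)]
observed = sum(answers) / n
estimate = (observed - 0.25) / 0.5  # invert the known noise process
print(round(estimate, 2))
```

Same structure as the census debate in miniature: deliberately injected, publicly documented noise protects each respondent, and the analyst subtracts it back out.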
I’m asserting that the *demand* is an invention of obsessive tech bros—which it is. No normal citizens or political leaders are demanding that you folks fuck up the census data and ruin the apportionment process. This is a solution in search of a problem, classic Silicon Valley.
The census bureau does not add noise to apportionment-related figures. Their mandate to use differential privacy explicitly prohibited them from doing that.
The census bureau is required by law to keep the information they collect confidential and that's why they are adopting this technique instead of using other techniques that do not maintain confidentiality. Maybe keeping the census data confidential is unimportant, but for now, it is actually legally mandated and DP is the best way to do it if they want to release useful information.
On a lighter subject than terrorism, consider transportation planning. You know what is really useful for planning that? Knowing where you are coming from and where you are going, door to door, and not just when you turn a turnstile. If you want better public transportation, stop being a paranoid idiot.
This is a bit tangential, but I've always wondered why there was so much hand wringing about the privacy implications of supermarket and drug store loyalty cards. The deal always seemed straightforward:
1. The store would like to know what you're buying
2. This data has value so they pay you for it in the form of discounts
3. It's easy to opt out on any transaction — don't scan the card, or if you're paranoid, use cash
4. This is information I want the store to know – I'm happy to tell the store which SKU of toothpaste I use so they're more likely to stock it
5. The most privacy-concerning purchases – prescription meds – I believe are covered by HIPAA
There was a really good article a while back (NYT, I think) about Target's early attempts to send individualized flyers to households based on their purchase history. It turns out that created enough of a backlash that they had to implement counterintelligence in designing their flyers - Target was figuring out who was pregnant before the rest of the household was informed.
Ha, I remember that one. On a sadder note, it was very depressing to get diaper and formula samples after I went through a miscarriage… (I went on to have a healthy kiddo so it’s water under the bridge now!)
My store mails me coupons for the specific items I buy, and when I order online, they helpfully suggest things I may have forgotten that I often buy. I love it! I wish they went a step further and just predicted everything I was going to buy each week, ha!
I wish they did too, and that coupons were good only for ordering online. My experience of coupons is that they go right in the trash and are good only for old ladies digging through their purses while inconveniencing everyone behind them in the checkout line.
^ …it me.
I don't really understand the practical concerns about privacy. The people I know who are super concerned have a wildly inflated view of their own importance. When they rant I just think to myself - no one gives a shit about you or what you do.
That said, I can certainly understand that if you're friends with your boss or co-workers on Facebook, you shouldn't post photos of your drunken shenanigans at 12:30am and then call in sick later that morning. Is that what people are talking about? I don't get the sense that they are. It's more a concern about some faceless entity doing something they can't quite define.
The surveillance the FBI conducted against Muslim Americans after 9/11 was a nightmare, to take a recent example. When I feel concerned about privacy, I’m mostly concerned about what would happen to myself or others who get on the wrong side of the US government surveillance apparatus. Surveillance capitalism seems only to support that vast potential for truly scary stuff. We don’t know the future of American politics and I don’t trust those guys.
It seems like what you're talking about is related more to intent and much less to capabilities. If I wanted to find all the muslims in America I wouldn't start with a database reconstruction attack. The risk to muslims from whatever I was doing would be 99.999% whatever intent I had behind it, not the tiny usefulness of additional census data.
I was replying to BronxZooCobra’s comment “It's more a concern about some faceless entity doing something they can't quite define.”
Is there a quick summary about what the nightmare was for Muslim Americans? I remember some nightmarish things like being secretly put on do-not-fly lists, but I wasn't sure what the nightmares were that related to privacy.
Some flavor. This is about “infiltrating” mosques, but there was a ton of online surveillance that accompanies these tactics: https://www.reuters.com/world/us/us-supreme-court-takes-up-fbi-bid-block-muslim-civil-rights-suit-2021-06-07/
How about infiltrating churches, synagogues and temples as well as mosques, and broadcasting what the priests, rabbis and imams are spouting to the devout? Would make the 75% of us who loathe religion more confident our government is on top of things.
Thanks - I remembered some stuff about undercover agents infiltrating Islamic groups, but couldn't think about what in that would be "nightmarish". The sting aspect seems deeply problematic, but I'm not totally sure about what the problem is of having government agents within various organizations.
> I'm not totally sure about what the problem is of having government agents within various organizations.
Lawyers always tell you that you should always invoke the 5th amendment when speaking to the police. You don't know every law on the book, so you don't know if some common innocuous thing you're admitting to is actually illegal. You don't know if you're admitting to something innocent that can be contextualized as not innocent. Having government agents infiltrate organizations means you're potentially always talking with the police. I know people who live in places where there are government spies everywhere and it's not a pleasant way to live.
The facts here seem like a pretty major 4th amendment problem. That’s a nightmare to me. But sure, nightmare is subjective.
Sure is! the Fourth Amendment is the 'nightmare' to me. The second Amendment too. I'm sure there're some others.
For one thing, data from the Census Bureau was used to identify where Japanese-Americans lived during WW2 and round them up. This is a step away from them being able to do that kind of thing.
I've noticed that privacy advocates tend to take a very siloed approach when evaluating these issues.
So for COVID19 tracking apps, they might say "here's all the possible data it could reveal about you", rather than "here's the additional information it could reveal, on top of what Facebook and the cellular carriers already track."
That's because if tomorrow we try to get Facebook to be more privacy-preserving, people will respond with "Well, the COVID19 tracking app knows this, so why should we care?" Basically, we're in a flooding ship trying to get some pumps working and while it won't make a big incremental difference today, punching more holes in the hull is going to make things harder in the long run.
I don't know whether opt-in contact tracing was ultimately a good goal or not, but personally it made me feel more comfortable. I really appreciated that Google and Apple seemed to work really hard on making an API that could be used by health services to track data between nearby phones for spread _without_ revealing who those people were (keys changed regularly, it would push to YOU that you had been near someone, but not tell health services).
I felt completely reassured by this, and was eager to download the contact tracing app that supported their API and... couldn't ever find it. And I never saw advertisements or any info on Twitter or anything else on how to download it. (Maybe it came along later, but I was barely coming into contact with anyone anyway, so there was some inertia to overcome before checking.)
I'm not convinced privacy issues were the problem here, in that even when I _wanted_ to download an app with these protections that Google/Apple made possible, I could not FIND it.
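The unlinkability property described above can be sketched in a few lines. This is a hypothetical simplification, not the actual Apple/Google Exposure Notification key schedule (which derives keys with HKDF and AES rather than a bare hash); all names and parameters here are illustrative:

```python
import hashlib

def rolling_id(daily_key: bytes, interval: int) -> bytes:
    # A short-lived broadcast identifier derived from a secret daily
    # key. Without the key, a listener cannot link two broadcasts
    # from the same phone across intervals.
    return hashlib.sha256(daily_key + interval.to_bytes(4, "big")).digest()[:16]

# Bob's phone logs the anonymous IDs it hears over Bluetooth.
alice_key = b"\x01" * 16  # Alice's secret daily key (illustrative value)
heard = {rolling_id(alice_key, 7)}

# Alice tests positive and publishes only her daily key. Bob's phone
# re-derives her possible IDs locally (144 ten-minute intervals per
# day) and checks for a match; no server learns who was near whom.
exposed = any(rolling_id(alice_key, i) in heard for i in range(144))
print(exposed)
```

The key design choice is that the match happens on Bob's device: the health service pushes published keys outward instead of pulling contact graphs inward.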
As for cops bricking cars - sounds good, but any backdoor you put in a car is something that can be used by rogue actors. If you create a backdoor for the gov't, you create a backdoor for everyone. I'm _more_ in favor of allowing existing backdoors to be usable by the gov't (perhaps to help with the census) than creating new ones.
On several occasions I’ve done research on sensitive business data that had to be provided by a firm with a government agency overseeing. Invariably the firm reps tell me their competitors already know all this stuff and are pretty nonchalant about disclosure while the gov agency insists I can’t report X or Y because the firms have a right to privacy. So, names can’t be used, data are turned into indices that can’t be reverse engineered, regressions redact coefficients, stuff like that. The agency is just following the rules, of course, but it’s always funny how the protected business shrugs with a “meh” during the data gathering.
As one of the "meh" answerers: usually only about 1% of the private data actually matters, and for that 1% it's usually sufficient that disclosure is legally forbidden. That reduces the risk of harm to acceptable levels.
Regressions redact coefficients? Leaving aside what that has to do with sensitive business data, what is one supposed to do with a regression that has no coefficients? Or is providing nonsense the point?
Yeah, I know. So, what the agency asks us to do is usually something like redact all the intercepts and one coefficient on a particular variable. They are worried someone will be able to figure out a particular firm from the underlying regression.