Whisper is a company that blew it, big time. In theory, you can use it to anonymously send electronic postcards and to receive replies, which leads to lots of sharing of very private or sensitive information. Whisper makes money by curating the content that transits through it and forming partnerships with news organisations. While Guardian journalists were at Whisper's HQ to investigate a potential business deal, the sales pitch they received markedly differed from the pitch normally given to users: Whisper commits numerous and egregious violations of privacy, or at least of the privacy levels promised in its own Terms of Service. Journalists have thicker legitimacy in society, and in this case they really had to make some of this information public: some data was monitored by the US military, a clear no-no for a service used by some for whistleblowing. Judging from the pictures included in the article, the Guardian was there because it was interested in leaks of information from sensitive locations, but it must have quickly determined that Whisper's legitimacy was much too thin to actually partner with. When this came out, the Whisper crew essentially accused the journalists of grave violations of their own ethical standards (inventing characters, quotes, situations). That is not working out so well for Whisper right now.
Do we need to talk about Facebook? It has a history of eroding the public notion of privacy. While Zuckerberg now says that "Privacy is an evolving social norm", this statement hides the fact that he personally created this situation. He also said: "Move fast and break things. Unless you are breaking things you are not moving fast enough", and even earlier: "They trust me - dumb fucks". By repeatedly and intentionally breaking parts of his own product, he was able to lull the general public (or at least a significant enough portion of it) into compliance.
But now Facebook goes beyond that. Around June 29th 2014, an article was published in a scientific journal describing the analysis of data collected by Facebook. Generously described, the goal was to see how the newsfeed affected the emotions of Facebook users. If happy messages are promoted, will that make you happier? Of course, since this is A/B testing, you also need to test the converse: if sad messages are promoted, will that make you sadder? Will it make you leave the service?
[W]e may use the information we receive about you[..] for internal operations, including troubleshooting, data analysis, testing, research and service improvement.
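Part of what makes this kind of experiment so tempting is how mechanically simple it is. A minimal sketch of the pattern, not Facebook's actual pipeline (the assignment scheme, the sentiment scores and every name here are hypothetical):

```python
import hashlib
from statistics import mean

def assign_group(user_id: str, experiment: str) -> str:
    """Deterministically split users 50/50 by hashing their id
    together with the experiment name (hypothetical scheme)."""
    h = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 2 == 0 else "control"

def filter_feed(posts, group):
    """Treatment users have negative posts demoted out of the feed;
    control users see everything. Each post is a (text, sentiment)
    pair with sentiment in [-1, 1]."""
    if group == "treatment":
        return [p for p in posts if p[1] >= 0]
    return list(posts)

def average_sentiment(posts):
    """Outcome metric: mean sentiment of what a user writes afterwards.
    Comparing this across the two groups is the whole 'experiment'."""
    return mean(s for _, s in posts) if posts else 0.0
```

A few dozen lines, no consent form in sight: the user never learns which branch of the feed they were served, which is precisely the ethical problem the rest of this post is about.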
The reactions to the publication of the experiment were very interesting:
- the public reaction was split: people were either shocked or unimpressed;
- Cornell defended its IRB: its researchers only formally intervened after the data was collected, so the study didn't have to go through full IRB review, or at least it was easier to let it through. This obscures the fact that the scientists involved apparently consulted informally with Facebook before the data was collected.
- James Grimmelmann, a law professor at the University of Maryland, argues that what Facebook did is illegal under Maryland law: by presenting its study as science, Facebook may have violated the State's law on human subject testing.
In other words, Facebook is now out to erode the whole scientific process in the social sciences, and it is in a very good position to succeed. The lure of big-data access is proving irresistible to scientists, and it is leading them to cut corners. Jay Rosen, an NYU journalism professor, argued on July 3rd that this is toxic and corrosive to universities, and introduced the distinction between thick and thin legitimacy.
I think he is right. Both research and industry can break things, or even people (in psychology experiments). But the two worlds operate under different procedures, and when these legitimacies collaborate, the ultimate burden should fall on the thick side, not the thin one.
Coursera is very interesting in connection to the Facebook experiment: MOOCs will enable all kinds of data collection that will lead to interesting research.
Records of your participation in Online Courses may be used for researching online education. In the interests of this research, you may be exposed to slight variations in the course materials that will not substantially alter your learning experience. All research findings will be reported at the aggregate level and will not expose your personal identity.
- When Coursera discusses A/B testing, they invite someone from LinkedIn. However grandiloquent its mission, Coursera is the stuff of thin legitimacy: its practices come from an industry that has spawned Google, Facebook, etc., and engineers move between those companies. At the 57:00 mark in the video, Coursera's head of analytics shares what they themselves do. With the goal of improving the user experience, Coursera is already performing hundreds of experiments on its users. Should this be 3 pixels wide? 4? Fine. But they also run other experiments that get much closer to uncomfortable: A/B testing on emails to see which interventions keep users engaged with the course, for instance. This can be seen as marketing (because of the clear distinction between email and the Coursera forums), but will that distinction remain clear on mobile, once these same interventions occur in-app or via push notifications? For how long will Coursera need to maintain this distinction? Is this even legal in Maryland? The Coursera employee only cites a few experiments there, but there are actually hundreds. What are the others? Are they ethically sensitive? Who makes that determination? Is he even supposed to think about that? Did anyone approve the ethics of using robots in Coursera forums to answer student questions?
- Once the data is owned by thin legitimacy, thick legitimacy has to fight to regain access to it. Coursera has a bad track record there: they make promises that they are not able or willing to keep.
- There is a risk that the proximity and aligned interests of academics and Coursera will lead to abuses. The history of psychology is full of such cases, where scientists forgot that they were dealing with human subjects. I suspect the a-posteriori-analysis-of-already-collected-data trick will become widespread. Did anything in this discussion already step over that line? Probably not. But this post is over a year old, and I do know that when I encouraged transparency at Coursera, three days before the Facebook experiment came out, and explicitly mentioned that link, those issues were not understood. The post and comments between Mike Caufield, Alan Levine, Derek Bruff, Kate Bowles, Kristin Palmer and others highlight clearly how the IRB process can be gamed, and will be gamed (I am not saying or implying these posters did game it): one person with thick legitimacy collects the data via an outfit with thin legitimacy, and another one with thick legitimacy analyses it. I also know that four months later I have still heard no discussion about ethics coming from Coursera itself. And I know that Coursera has a Partner Portal, where such IRB discussions are encouraged but walled off, while conveniently centralising information for Coursera itself (i.e. when they want to extend their service in a controversial direction, it will be easier to find partners with thick legitimacy who are willing to break things/people). When making the videos for my Coursera course Massive Teaching: New skills required, about a month earlier, I made some of this clear, although it was still phrased purely positively: