"Don't be evil", or how I learned to behave like a startup and love the data


When Gmail opened in 2004, I received invitations early. If I remember correctly, they came from a friend working at Google who had already snatched a few fun login names. I did the same, and passed on further invitations to my brother and our friends back home.

A year or so later, when my brother was visiting with his friends, we went on a tour of the Googleplex. Randomly passing in front of the cubicle of a namesake, one of the friends suddenly realised why he had not been able to register his own name earlier. In other words, an unknown collision in the physical world had first manifested digitally.

I like to think of those collisions as the digital equivalent of New York overcrowding, trying to fit too many people in just a few login characters.

So which fun pseudonyms did we choose? Which did we consider worthy in this land grab? Certainly many of them were aimed at our shared cultural background as Belgians in Silicon Valley. If you had tintin@gmail.com, or frietkot@gmail.com, that would be pretty impressive, no? Indeed, we grabbed names of regions, superheroes, movie stars, concepts, etc. We certainly thought this was OK, and didn't reflect more on something that became controversial only later.

One of the logins I grabbed had the name of a Belgian politician, let's call him That Guy. He was on TV and I thought my friends would get a chuckle if I emailed them from it. Certainly, I might have crossed a moral line already then, but it felt like a very tiny escalation in this virtual land grab.

What did I do with this account? I mostly used it for spam protection. I set it up so that all emails sent there would be forwarded to my default inbox, and gave this address whenever there was a need to register for a spammy online service. This worked well, possibly because Gmail's algorithms had learned to weigh emails transiting through this address differently and benefited from the additional segmenting.

Around 2008, inevitably, I started receiving emails addressed to That Guy. Those collisions happen to all of us, for all of our email accounts. What is the moral thing to do there? My philosophy is most of the time to let it drop, but sometimes also to reply to the sender telling them that they got the wrong address (due to emails missent to my main account, I must have had to contact a dozen hotels in Quebec by now). In most cases, the only way to know what to do is to read the email, slightly invading this other person's privacy.

Just like Rachel and her Friends in their New York apartment, we struggle to deal with those privacy collisions, especially when we feel a need to intervene.

For That Guy, it was even easier to feel morally OK about it: I never actively sought the emails, had no way to prevent the mistake, and anyway the emails were from cranks. On top of that, by that time I had registered for too many services with that pseudonym, which effectively tied my identity to it, with no way to revert the situation. So in effect this data collection was happening, whether I liked it or not, or at least that was my moral justification.

The problem with data is that it leaks. The cranks don't just email one influential person at a time. They email a few, who are likely to know each other. As a consequence, in this case, the cranks polluted those recipients' email software with a wrong email address. Of course, in due time, the email autocompletion software of those recipients started tripping them up and I started receiving emails from other politicians to That Guy. Algorithmic curation had gone wrong, and actively misled humans. The fact that these were politicians might have misled me: I should have made the effort of explaining the awkward situation to That Guy's interlocutors and tried to correct it. But I didn't. Somehow a couple more emails made it to me that were clearly of a more social nature. Again, I didn't do anything. This data will not disappear unless actively deleted, and even then I can only be so sure.

At this point you will deservedly think that I am a moron. But was it morally wrong? And when exactly did it go wrong?

Throughout, my moral justification was that I was not actively seeking this. Emails would land in my mailbox and I would have to read them to know what to do. Of course, this conveniently ignores what I could have done to prevent those emails from arriving in the first place. Part of my justification was that I wasn't doing anything with the data collected. There was no clear goal, except awareness that this could be used to make a point later, which I guess I am making here now publicly (in fact, I have used this to make this point in private throughout the years).

The more interesting issue here is to understand that this is exactly how many big data companies function. "Don't be evil" Google gobbles data all over the place for purposes that are not always clear at the time, and the justification is often that this was incidental, automated and did not require human intervention. Looking at a corporate setting raises the stakes, and my feeble moral justifications are no longer sufficient. It becomes a matter of ethics, and the ethical position should arguably be that data collection is unethical by default: data should not be kept beyond the time necessary for its intended use, with that use itself subject to precise and established ethical rules. It looks like Google has understood this in some markets, for instance education (unlike other players there), and this will be the topic of a later post.

(Image in the public domain: the Dr. Strangelove War Room, which happens to be replicated in the Airbnb HQ)