The recent Facebook scandal—in which Cambridge Analytica obtained 50 million Facebook users’ personal data—really shouldn’t have been such a big deal. By no means was it the biggest breach of data, nor a breach of the most sensitive kind of data. It wasn’t nearly as salacious as PRISM nor any of the other secret programs the NSA designed to siphon phone and internet data (which remained covert until whistleblower Edward Snowden famously told the Guardian about them in 2013).
The most—if not the only—potentially unlawful act in this saga was the deal struck to share the data between Aleksandr Kogan and Cambridge Analytica, a potential violation of the terms of service for the survey app that Kogan created to harvest the data in the first place. The act of collecting the data, though no longer permitted by Facebook, was perfectly lawful at the time.
So why then was this breach such a big deal?
First of all, we can universally agree that Facebook has amassed a trove of personal data larger than that of any other company on the planet. Given the obvious value of these data, Facebook is constantly targeted. When such a breach occurred on the world’s largest social network, millions of people became upset, and rightfully so. Facebook was pressured to explain how it happened. Still seems like no big deal.
Facebook’s explanation, however, raised an important question: Why does Facebook need all these data in the first place? That in turn led to an interesting Catch-22: In explaining the data breach, Facebook had to draw attention to its business model, namely that it collects data, anonymizes them, and sells them, albeit indirectly, to its real customers. The real customers are not us, the Facebook/Instagram/WhatsApp users. No, Facebook’s customers are ad networks and advertisers, i.e. companies and people who pay to promote their products and services on Facebook.
This “revelation” should have come as no big surprise to anyone, or at least to anyone paying attention. Revenue through tailored advertising is the business model of Google, Facebook and nearly every media entity that operates online. Last year, Facebook reported that 98% of their revenue came from advertising.
Using your data (and mine and everyone else’s), Facebook built an incredibly powerful ad targeting platform, a platform we allowed them to build and deploy when we accepted their terms of service—all two (plus) billion of us.
It’s even possible—through Facebook’s publicly-available advertising platform—to target a 41-year-old man in San Francisco who speaks Spanglish, who has attended at least one Lindyhop event and who belongs to the Bay Area Esk8 group. In other words, I can target an ad so narrowly that it’s shown only to me. (I just tried this, and though the platform gave me a warning that my targeting parameters might be “too specific,” it didn’t stop me from setting up the ad.)
So this is how Facebook makes hay using our personal data. Along with paywalls/subscriptions (e.g. San Francisco Chronicle, Medium, New York Times) and donations (e.g. The Guardian, NPR, Wikipedia), selling ads targeted to people’s personal sensibilities is how hay gets made not just on Facebook, but all over the internet. If that means I get to consume ads for dance camps and wetsuits in lieu of celebrity plastic surgery disasters, then everybody wins. (Facebook infers, correctly, that I surf. O’Neill pays Facebook to advertise the wetsuit to me and other surfers, we buy the wetsuits from O’Neill. Repeat. Cha-ching.)
Somehow we got from selling wetsuits to throwing elections. To understand how our current internet failed us, and to frame where the new internet needs to take us, it’s worth doing a shallow dive into internet history.
A Brief History of the Internet (and Cats)
The internet was never intended to be a money-making machine. In the late 60s and early 70s, large universities wired their computers together in order to share research, primarily through email (of all things), on an early version of the internet known as ARPANET, which was financed by the Department of Defense through its Advanced Research Projects Agency (ARPA, later renamed DARPA). In the 80s, I’m sure sharing cat pictures (uuencoded as streams of text) started to become a thing, if it wasn’t already. Even so, the internet’s only “business model” was government-sponsored academic propeller-spinning.
In 1994, with the advent of the Netscape browser, non-academics flooded onto the internet in droves. Ten years prior, I got my first email account and dial-up access from AppleLink. I connected to and explored BBSs and started using protocols like Gopher and NNTP (Usenet). I read up on “netiquette,” learned how to keep my CAPS LOCK key off, how to spot an AOLer (hint: CAPS LOCK USUALLY ON) and how to construct some basic emoticons, something we once called “ASCII art.” |_|] ← That’s a coffee mug right there. Really, it is.
This early internet, on the precipice of becoming commercial, had the feel of a loosely-coupled collection of “expert communities”—for lack of a better term—scattered amongst BBSs, Usenet and AOL chatrooms. (Keep this notion of “expert communities” in mind as you continue reading; I’ll circle back to it later.)
From roughly 1994-2002, companies flocked to the internet to experiment with the web’s first “real” business model: ecommerce. For a few years, it seemed like every business needed a web storefront. However, when investors realized that selling cat food online wasn’t quite what it was cracked up to be, the bubble burst. The same market forces that quickly evaporated five trillion dollars of value also declared Amazon the clear “winner” of ecommerce, proving that centralized inventory (along with on-demand inventory) and centralized technology and fulfillment logistics were the best way—if not the only way—to sell cat food online and actually turn a profit.
After a brief moment of reckoning, the second wave of the internet (what some call Web 2.0) produced a new, more indirect business model, this one borrowed from traditional media companies. Like newspapers and magazines, “Web 2.0” sites and applications would also run ads, but instead of hiring professional photographers and journalists, they would let everyday users supply the cat photos and write the heartwarming cat stories. Sites like these could save money by letting amateurs create the content, called User Generated Content (or UGC for short), while they collected money for every thousand cat food ad impressions (CPM), every cat photo click-thru (CPC) and every action, e.g. signing up for a site’s feline marketing content or taking a cat survey (CPA).
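To make the three pricing models concrete, here is a minimal Python sketch. The function names are my own, and the rates and traffic numbers are invented purely for illustration; they are not figures from Facebook or any real ad platform.

```python
def cpm_revenue(impressions: int, rate_per_thousand: float) -> float:
    """CPM: the advertiser pays a fixed rate per 1,000 ad impressions."""
    return impressions / 1000 * rate_per_thousand


def cpc_revenue(clicks: int, rate_per_click: float) -> float:
    """CPC: the advertiser pays for each click-through."""
    return clicks * rate_per_click


def cpa_revenue(actions: int, rate_per_action: float) -> float:
    """CPA: the advertiser pays for each completed action (e.g. a signup)."""
    return actions * rate_per_action


# Hypothetical traffic for one cat food campaign on a UGC site.
impressions = 200_000
clicks = 1_500
signups = 40

# Hypothetical rates; a real campaign is usually priced under one model,
# so summing all three is only a way to compare them side by side.
total = (
    cpm_revenue(impressions, rate_per_thousand=2.50)
    + cpc_revenue(clicks, rate_per_click=0.50)
    + cpa_revenue(signups, rate_per_action=5.00)
)
print(f"Total ad revenue: ${total:,.2f}")
```

Notice what is missing from every formula: the cost of the cat photos. The users supply those for free.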
Naturally, the sites with the most users and the most cat photos (predominantly Facebook and Twitter) could provide the richest ad targeting platforms. Facebook’s claim of making the world more connected belied another mission: to create the richest, most effective ad-targeting platform known to mankind.
(It’s worth noting that I’m glossing over huge swaths of the ad industry, including search ads/SEO/SEM and scores of networks that serve up ads on third-party sites and mobile applications. I’m also neglecting to talk about the mobile web in general terms, the Semantic Web, the Internet of Things and a whole host of other topics, just so we can stay focused on UGC.)
User Generated Cats
While it has been part of the technology toolkit and lingo for at least 15 years, many—if not most—people heard about UGC for the first time during the recent fallout from the Facebook/Kogan/Cambridge Analytica scandal. Prior to a few days ago, people thought Facebook was free; in reality it’s not. We pay for Facebook by bartering our personal information in exchange for the Facebook features we enjoy.
Perhaps “used to enjoy” would have been better phrasing, since this latest scandal left angry mobs of people joining the #DeleteFacebook movement. In many ways, they’re doing so in vain, because we would literally need to stop using our smartphones and the entire internet, change our names, addresses, hair/eye color, purchase history and a thousand other things to escape the personal data collection happening everywhere on the web.
On Facebook and elsewhere, UGC greases the gears of an enormous machine designed to turn cat photos into cash. And it works, or at least it works for a few massive companies, which seems to be a theme as far as internet companies go.
In fact, at least three times in the brief history of the internet have we seen huge oligopolies create—and consume—entire online business models: Amazon (for ecommerce), Google (for search advertising) and Facebook (for UGC advertising).
Organic growth and acquisitions by Facebook alone have resulted in more than two billion people’s personal information, likes, preferences and social interactions getting stored inside what is effectively one enormous database.
And that finally explains why this scandal is important: because it has caused people to start asking some really good questions, like: Was it a good idea to allow companies like Facebook to give everyone a free microphone in exchange for harvesting, storing and mining everything everyone says?
It’s Not the Cat Photos; It’s the Cat Distribution
Facebook may be the biggest collector of data, but they certainly aren’t the only one. Plus, they’re not going to delete their data, as it’s the lifeblood of their company. So instead of focusing on Facebook, I want to ask a more fundamental question, one that will surely ignite the ire of free speech advocates everywhere, but one that needs to be asked regardless: Was it even a good idea to give everyone a free microphone in the first place?
Put another way, when is it a good idea—in the real, non-digital world—for us to tell something, instantly, to everyone we know: family, good friends, co-workers, acquaintances, people we just met and immediately befriended? Before Facebook, this wasn’t easily possible. We used to hide our reading materials and journals under the mattress and only send things like baby announcements to everyone we know (even then selectively skipping creepers like Uncle Charlie). Now Facebook has flipped that notion on its head. Your cat photo has more likes than my baby announcement? Does this make any kind of sense IRL? Then why should it be possible online?
But what about free speech? Yes, in this country we are all free to say nearly anything without fear of repercussion. In another sense, however, speech isn’t really free at all. Our precious free speech is utterly worthless without distribution. Without distribution, our posts on the internet are nothing more than trees falling in the forest with no one around to listen for the sounds they might make. Distribution costs money, and that’s why we strike a Faustian bargain with every word and click on Facebook. We provide the content; they provide the distribution. And we pay for the distribution, albeit indirectly, by allowing Facebook to broker our data to advertisers.
Too often and too easily is distribution confused with truth. If something is “widely reported,” that doesn’t make it factual. Therein lies the problem with the awesome distribution power of Facebook: It can be used to distribute facts just as efficiently as it can be used to spread, um, “alternative facts.” As a result, Facebook and Twitter and other UGC sites are heavily moderated both by people and by machines. The other day, Facebook’s censorship robots blocked my friend Tim from saying “trees cause global warming.” Many artists have had their work removed for showing a little too much nipple (or a little too much something). This introduces a whole new set of problems, the most important of which is: Do we trust Facebook to arbitrate “good” speech from “bad”? Under what or whose standards?
I had a revealing personal experience in 2012 when I helped Miso—a Google-backed venture conceived as a social media site for videos—build an application called Quips. This app would allow people to use their phones to take still images from TV shows and movies and create memes from them by adding the chunky white text we’ve come to associate with such artifacts.
Long story short: we didn’t build moderation (a common internet euphemism for censorship) into the first version of the platform. Rather, we gave people unfettered access to tools they could use to create potentially viral content. What could possibly go wrong? Within weeks, Quips had degenerated into the most profoundly hateful cesspool I’ve ever seen on the internet, and I even (sometimes) read YouTube video comments! Who knew Miso was actually short for misogyny, not to mention racism, homophobia, xenophobia and a million other kinds of hate speech?
It was easy for us to sunset Quips and bury the steaming pile of dreck that Quippers created. It’s not so simple for Facebook.
They certainly can’t delete everything without destroying the data vital to their business model. Meanwhile trying to censor posts is an endless game of algorithmic Whack-a-Mole certain to offend the sensibilities of moles on the far-right, the far-left and every mole in between, including my friend Tim (who doesn’t actually believe that trees cause global warming; it was just a joke).
So distribution without moderation/censorship leads to a cesspool. We technologists all knew this already, but it hasn’t stopped a host of really smart people from trying to build a better moderation/censorship mousetrap. Ultimately they will fail because of (what I can only hope is merely a few) creative individuals with a lot of free time producing a seemingly-limitless supply of garbage. Or art. Or jokes! Sarcasm, something nearly impossible to detect on the web, can often be mistaken for hate speech, especially when the whole point of the sarcasm was to raise awareness of the hate speech in the first place.
When faced with an intractable issue like “stamping out misinformation on the internet,” it helps to reframe the problem by looking at the actual root cause. The cause is not fake news per se, nor ad networks, nor Facebook, Cambridge Analytica nor even UGC. Rather, the naive ideology of the internet coupled with the worst traits in humanity formed ideal grounds for a Tragedy of the Commons: If you create something open and free, some people will eventually find a way to exploit it for their own benefit and thereby ruin it for everyone else.
Emerging from the Cesspool
Even though it’s likely a very small segment of “bad actors” who are ruining the internet for everyone, I’m proposing a radical shift: let’s leave the internet for what it is (a cesspool) and build a better one. What if we could start over with the same lofty goals—connecting the world by sharing information—but this time build an internet with failsafes that would prevent us from creating yet another cesspool of misinformation and hate speech?
I’m not suggesting that we shut down the internet, but instead that we build something atop existing protocols that helps the world organize information, validate claims, and establish fact; in other words, we need to build an internet that lives up to its early design considerations, which, obviously, did not include building a cesspool of falsehoods and hate speech.
A recent NYT article really drove this point home for me: “The downgrading of experience and devaluing of expertise can be explained partly by the internet, which allows people to assemble their own preferred information and affords them the delusion of omniscience.”
Note it said “partly.” The internet is partly at fault. Humanity bears responsibility for the rest.
So yes, humanity is a big part of the problem. But it’s also the solution. For every bad actor, there are thousands and thousands of good ones.
What if we could build an internet wherein good actors could drive out bad?
What if we could create an internet consisting only of factual information? An internet devoid of corporate interests? An internet of real people wherein everyone could only interact with the system using a proven identity?
What if we could finally draw the line between private and non-private digital communications, such that private conversations could remain truly private?
What if all information was organized into silos, like the “expert communities” of the early internet, but codified into a meritocratic hierarchy where every claim needed to be vetted by an established community of experts? What if experts could delegate privileges to other experts who prove their worth through contributions? What if the information curated remained free to the consumer, but provided a basic income to its creators and gardeners for the work they put into curating the information? What if this internet could remain completely read-only to everyone not designated an expert in a particular silo?
Much of the technology we need to build something like this already exists. Signal, Keybase and scores of other platforms offer peer-to-peer (serverless) encrypted messaging. StackExchange already provides a model for curated expert communities, entirely based upon Q&A. Modeling the new internet off of StackExchange (or Quora or WhySaurus), each question response could be stored as a block in a blockchain with experts from the appropriate communities recruited to validate the responses, much like block validation already works today for cryptocurrencies.
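Here is a rough idea of what “each response as a block, validated by experts” could look like in Python. Treat it as a toy under loud assumptions: the class names, the quorum rule and the in-memory list are all hypothetical, the expert names are invented, and a simple count of approvals stands in for the proof-of-work or proof-of-stake consensus that real cryptocurrencies use. There are no signatures and no network here.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class AnswerBlock:
    question: str
    answer: str
    author: str
    prev_hash: str
    approvals: set = field(default_factory=set)  # experts who vouched for it

    def digest(self) -> str:
        """Hash the content together with the previous block's hash."""
        payload = json.dumps(
            [self.question, self.answer, self.author, self.prev_hash]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SiloChain:
    """A toy append-only chain for one expert community (a "silo")."""

    def __init__(self, experts, quorum):
        self.experts = set(experts)
        self.quorum = quorum  # approvals required before a block is accepted
        self.blocks = []

    def _tip(self) -> str:
        return self.blocks[-1].digest() if self.blocks else GENESIS

    def propose(self, question, answer, author) -> AnswerBlock:
        return AnswerBlock(question, answer, author, self._tip())

    def approve(self, block, expert) -> None:
        # Everyone outside the silo's expert list is read-only.
        if expert not in self.experts:
            raise PermissionError(f"{expert} is not an expert in this silo")
        block.approvals.add(expert)

    def commit(self, block) -> bool:
        """Append the block only if it has a quorum and links to the tip."""
        if len(block.approvals) < self.quorum or block.prev_hash != self._tip():
            return False
        self.blocks.append(block)
        return True

    def is_intact(self) -> bool:
        """Recompute the hash links; editing an earlier block breaks them."""
        prev = GENESIS
        for block in self.blocks:
            if block.prev_hash != prev:
                return False
            prev = block.digest()
        return True


chain = SiloChain(experts={"ada", "grace", "linus"}, quorum=2)
block = chain.propose(
    "What is uuencoding?",
    "A binary-to-text encoding once used to send files over Usenet and email.",
    author="ada",
)
chain.approve(block, "grace")
first_try = chain.commit(block)   # only one approval so far
chain.approve(block, "linus")
second_try = chain.commit(block)  # quorum reached
print(first_try, second_try, chain.is_intact())
```

The point of the hash links is that nobody can quietly rewrite an accepted answer later: changing one block changes its digest, which no longer matches the `prev_hash` recorded in the block after it.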
Every information silo would require a community of experts to curate it. But what good are these experts if we can’t check their credentials and contributions to validate that they really are experts? The missing piece here is global identity management, i.e. a way to prove that we are who we say we are. We need a biometric-seeded revocable cryptographic key that would allow us to conduct business using our IRL identities or with pseudonyms that the owners can prove are theirs (but not the other way around). The Human Unique Identifier (or HUID) described by the ambitious Cicada Project proposes a clever design for this.
Creating a secure, un-spoofable identity system is a fundamental challenge, but it’s surely not the only challenge. In building this new internet, our biggest enemy is what we don’t know, and what we won’t know until we’ve already written oodles of code and tests, as is often the case with software projects.
But we can’t let fear of the unknown stop us. The time has come (in fact it’s long overdue) to create a new internet, an internet that can’t be defeated by Nigerian scammers, Russian fake news bots or that 400-pound kid in his bed somewhere. Let’s leave the existing internet intact but teach our kids that they should assume that nearly everything they read there is either bullshit or sponsored bullshit. If vetted, cite-able, factual information is what they seek, they need to consult Web X.0.
And yes, this new internet would be read-only for 99.9999% of the world’s population. This would leave about 7,000 experts in control of all the world’s public factual information, with the ability to delegate more experts as needed. No corporations would be allowed; no corporate interests would be tolerated. In this way, the denizens of the new internet would maintain all the world’s information much like the denizens of the early internet “expert communities” on BBSs, Usenet and chatrooms, but this time with HUIDs and block validation keeping everyone honest.
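For anyone who wants to check the math behind that figure, here is the back-of-the-envelope version in Python. The world population is my assumption (a round 7 billion); with a larger estimate the expert count rises proportionally.

```python
from fractions import Fraction

WORLD_POPULATION = 7_000_000_000  # assumption: a round 7 billion people

# Exact fractions avoid floating-point noise on a share this small.
read_only_share = Fraction("99.9999") / 100
expert_share = 1 - read_only_share          # one person in a million
experts = WORLD_POPULATION * expert_share

print(f"{int(experts):,} experts out of {WORLD_POPULATION:,} people")
```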
People could still interact with corporations on the “old internet,” but we could use the Web X.0 HUID to dole out Basic Attention Tokens (or something like them) to allow people to decide for themselves which revocable personal information they want to share with commercial entities, and get compensated with cryptocurrency in return. In other words, corporations would pay consumers directly for paying attention to their messages, eliminating the layers of ad network middlemen who get paid for matching companies to consumers.
The Cicada Project takes this a step further by adding a secure direct democracy component, which would allow populations small and large to self-govern. Direct democracies are notoriously disastrous (e.g. Athens) but given that two of our last three presidents took office despite losing the popular vote, maybe it is an idea worth considering again.
Then again maybe direct democracy is biting off more than we can chew. Maybe we should start by building and deploying the HUID on the existing internet and then go from there.
Maybe this is all hogwash.
But maybe—thanks to Facebook, Kogan and Cambridge Analytica—we’re finally starting to ask the right questions.