Facebook has been accused of blocking the ability of independent researchers to effectively study how political disinformation flows across its ad platform.
Adverts that the social network’s business is designed to monetize have — at the very least — the potential to influence people and push voters’ buttons, as the Cambridge Analytica Facebook data misuse scandal highlighted last year.
Since that story exploded into a major global scandal for Facebook, the company has faced a chorus of calls from policymakers on both sides of the Atlantic for increased transparency and accountability.
Among Facebook’s less controversial efforts to counter the threat that disinformation poses to its business are what it bills as “election security” initiatives, such as identity checks for political advertisers. Even these efforts have looked hopelessly flat-footed, patchy and piecemeal in the face of concerted attempts to use its tools to amplify disinformation in markets around the world.
Perhaps more significantly, under amped-up political pressure, Facebook has launched a searchable ad archive. And access to Facebook ad data certainly has the potential to let external researchers hold the company’s claims to account.
But only if access is not equally flat-footed, patchy and piecemeal, with the risk that selective access to ad data ends up being just as controlled and manipulated as everything else on Facebook’s platform.
So far Facebook’s efforts on this front continue to attract criticism for falling way short.
“The opposite of what they claim to be doing…”
The company opened access to an ad archive API last month, via which it provides rate-limited access to a keyword search tool that lets researchers query historical ad data. (Researchers first need to pass an identity check process and agree to the Facebook developer platform terms of service before they can access the API.)
However, a review of the tool by the not-for-profit Mozilla dismisses the API as weak-sauce “transparency-washing” rather than a good-faith attempt to support public interest research that could genuinely help quantify the societal costs of Facebook’s ad business.
“The fact is, the API doesn’t provide necessary data. And it is designed in ways that hinders the important work of researchers, who inform the public and policymakers about the nature and consequences of misinformation,” it writes in a blog post, where it argues that Facebook’s ad API meets just two out of five minimum standards it previously set out — backed by a group of sixty academics, hailing from research institutions including Oxford University, the University of Amsterdam, Vrije Universiteit Brussel, Stiftung Neue Verantwortung and many more.
Instead of providing comprehensive political advertising content, as the experts argue a good open API must, Mozilla writes that “it’s impossible to determine if Facebook’s API is comprehensive, because it requires you to use keywords to search the database.”
“It does not provide you with all ad data and allow you to filter it down using specific criteria or filters, the way nearly all other online databases do. And since you cannot download data in bulk and ads in the API are not given a unique identifier, Facebook makes it impossible to get a complete picture of all of the ads running on their platform (which is exactly the opposite of what they claim to be doing),” it adds.
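To make Mozilla’s point concrete, here is a minimal Python sketch of a keyword-only archive. Everything in it is an assumption for illustration: the `Ad` record, the toy `ARCHIVE` and the `keyword_search` function are hypothetical stand-ins, not Facebook’s actual API or data. It shows that a researcher who can only search by keyword, and who gets no unique identifier per ad, can neither tell two identical-looking ads apart nor know what the searches missed.

```python
from typing import List, NamedTuple, Set


class Ad(NamedTuple):
    # Deliberately no unique identifier, mirroring Mozilla's complaint.
    sponsor: str
    text: str


# Hypothetical toy archive; real ad data would look different.
ARCHIVE = [
    Ad("Group A", "Vote early on immigration reform"),
    Ad("Group A", "Vote early on immigration reform"),  # a separate ad with identical content
    Ad("Group B", "Protect our borders now"),
    Ad("Group C", "Lower taxes for families"),
]


def keyword_search(term: str) -> List[Ad]:
    """Stand-in for a keyword-only endpoint: ads whose text contains the term."""
    needle = term.lower()
    return [ad for ad in ARCHIVE if needle in ad.text.lower()]


def collect(terms: List[str]) -> Set[Ad]:
    """Union the results of several searches; with no ID, identical-looking ads merge."""
    seen: Set[Ad] = set()
    for term in terms:
        seen.update(keyword_search(term))
    return seen


found = collect(["immigration", "borders", "vote"])
print(len(found), "distinct-looking ads found;", len(ARCHIVE), "actually in the archive")
```

In this toy case the researcher ends up with two records, with no way of knowing that the archive holds four ads, that two of them were duplicates in appearance only, or that a tax-themed ad never matched any search term.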
Facebook’s tool is also criticized for failing to provide targeting criteria and engagement information for ads, thereby making it impossible for researchers to understand whom advertisers on its platform are paying the company to reach, as well as how effective (or otherwise) these Facebook ads might be.
This exact issue was raised with a number of Facebook executives by British parliamentarians last year, during the course of a multi-month investigation into online disinformation. At one point Facebook’s CTO was asked point-blank whether the company would be providing ad targeting data as part of planned political ad transparency measures — only to provide a fuzzy answer.
Of course there are plenty of reasons why Facebook might be reluctant to enable truly independent outsiders to quantify the efficacy of political ads on its platform and therefore, by extension, its ad business.
One of them, of course, is the specific scandalous example of the Cambridge Analytica data heist itself. That was carried out by an academic, Dr. Aleksandr Kogan, then attached to Cambridge University, who used his access to Facebook’s developer platform to deploy a quiz app designed to harvest user data without (most) people’s knowledge or consent, in order to sell the info to the disgraced digital campaign company (which worked on various U.S. campaigns, including the presidential campaigns of Ted Cruz and Donald Trump).
But that just highlights the scale of the problem of so much market power being concentrated in the hands of a single adtech giant that has zero incentive to voluntarily report wholly transparent metrics about its true reach and power to influence the world’s 2 billion+ Facebook users.
Add to that, in a typical crisis PR response to multiple bad headlines last year, Facebook repeatedly sought to paint Kogan as a rogue actor — suggesting he was not at all a representative sample of the advertiser activity on its platform.
So, by the same token, any effort by Facebook to tar genuine research as similarly risky rightly deserves a robust rebuttal. The historical actions of one individual, albeit yes an academic, shouldn’t be used as an excuse to shut the door to a respected research community.
“The current API design puts huge constraints on researchers, rather than allowing them to discover what is really happening on the platform,” Mozilla argues, suggesting the various limitations imposed by Facebook, including search-rate limits, mean it could take researchers “months” to evaluate ads in a particular region or on a certain topic.
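A back-of-the-envelope sketch shows how a rate limit can stretch a study into months. The figures below are invented purely for illustration. Neither Mozilla nor Facebook is quoted here giving the number of keywords, result pages or the hourly cap, and `days_to_finish` is a hypothetical helper.

```python
import math


def days_to_finish(total_requests: int, requests_per_hour: int) -> int:
    """Whole days needed to issue total_requests under a fixed hourly cap, running nonstop."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    hours = math.ceil(total_requests / requests_per_hour)
    return math.ceil(hours / 24)


# All three figures are assumptions, not reported values.
keywords = 10_000        # search terms needed to approximate one topic
pages_per_keyword = 50   # result pages fetched per term
hourly_cap = 200         # requests allowed per hour

days = days_to_finish(keywords * pages_per_keyword, hourly_cap)
print(days, "days of uninterrupted querying")
```

Under these assumed inputs the crawl alone would run for more than three months, before any analysis starts and without any guarantee that the keyword list covered every relevant ad.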
Again, from Facebook’s point of view, there’s plenty to be gained by delaying the release of any more platform usage skeletons from its bulging historical data closet. (The “historical app audit” it announced with much fanfare last year continues to trickle along at a disclosure pace of its own choosing.)
The two areas where Facebook’s API is given a tentative thumbs up by Mozilla are in providing access to up-to-date and historical data (the seven-year availability of the data is badged “pretty good”), and in being accessible to and shareable with the general public (at least once they’ve gone through Facebook’s identity confirmation process).
Though in both cases Mozilla also cautions it’s still possible that further blocking tactics might emerge — depending on how Facebook supports/constrains access going forward.
It does not look entirely coincidental that the criticism of Facebook’s API for being “inadequate” has landed on the same day that Facebook has pushed out publicity about opening access to a database of URLs its users have linked to since 2017 — which is being made available to a select group of academics.
In that case, access goes to 60 researchers, drawn from 30 institutions, who have been chosen by the U.S.-based Social Science Research Council.
Notably the Facebook-selected research data set entirely skips past the 2016 U.S. presidential election, when Russian election propaganda infamously targeted hundreds of millions of U.S. Facebook voters.
The U.K.’s 2016 Brexit vote is also not covered by the January 2017 onwards scope of the data set.
Facebook does say it is “committed to advancing this important initiative,” suggesting it could expand the scope of the data set and/or who can access it at some unspecified future time.
It also claims “privacy and security” considerations are holding up efforts to release research data quicker.
“We understand many stakeholders are eager for data to be made available as quickly as possible,” it writes. “While we remain committed to advancing this important initiative, Facebook is also committed to taking the time necessary to incorporate the highest privacy protections and build a data infrastructure that provides data in a secure manner.”
The EU-wide Code of Practice on disinformation, which Facebook signed up to last year, includes a specific commitment that platform signatories “empower the research community to monitor online disinformation through privacy-compliant access to the platforms’ data,” in addition to other actions such as tackling fake accounts and making political ads and issue-based ads more transparent.
However, here, too, Facebook appears to be using “privacy-compliance” as an excuse to water down the level of transparency that it’s offering to external researchers.
TechCrunch understands that, in private, Facebook has responded to concerns raised about its ad API’s limits by saying it cannot provide researchers with fuller data about ads, including the targeting criteria for ads, because doing so would violate its commitments under the EU’s General Data Protection Regulation (GDPR) framework.
That argument is of course pure “cakeism.” In other words, Facebook is trying to have its cake and eat it where privacy and data protection are concerned.
In plainer English, Facebook is trying to use European privacy regulation to shield its business from deeper and more meaningful scrutiny. Yet this is the very same company — and here comes the richly fudgy cakeism — that elsewhere contends personal data its platform pervasively harvests on users’ interests is not personal data. (In that case Facebook has also been found allowing sensitive inferred data to be used for targeting ads — which experts suggest violates the GDPR.)
So, tl;dr, Facebook can be found seizing upon privacy regulation when it suits its business interests to do so — i.e. to try to avoid the level of transparency necessary for external researchers to evaluate the impact its ad platform and business has on wider society and democracy … yet argues against GDPR when the privacy regulation stands in the way of monetizing users’ eyeballs by stuffing them with intrusive ads targeted by pervasive surveillance of everyone’s interests.
Such contradictions have not at all escaped privacy experts.
“The GDPR in practice — not just Facebook’s usual weak interpretation of it — does not stop organisations from publishing aggregate information, such as which demographics or geographic areas saw or were targeted for certain adverts, where such data is not fine-grained enough to pick an individual out,” says Michael Veale, a research fellow at the Alan Turing Institute — and one of 10 researchers who co-wrote the Mozilla-backed guidelines for what makes an effective ad API.
“Facebook would require a lawful basis to do the aggregation for the purpose of publishing, which would not be difficult, as providing data to enable public scrutiny of the legality and ethics of data processing is a legitimate interest if I have ever seen one,” he also tells us. “Facebook constantly reuse data for different and unclearly related purposes, and so claiming they could legally not reuse data to put their own activities in the spotlight is, frankly, pathetic.
“Statistical agencies have long been familiar with techniques such as differential privacy which stop aggregated information leaking information about specific individuals. Many differential privacy researchers already work at Facebook, so the expertise is clearly there.”
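For readers unfamiliar with the technique Veale mentions, the sketch below shows the textbook Laplace mechanism, one standard way of achieving differential privacy for published counts. It is a generic illustration under stated assumptions, not a description of anything Facebook runs: the age bands and impression counts are invented, `private_counts` is a hypothetical helper, and the code assumes each person falls into exactly one bucket, so adding or removing one person changes one count by at most 1.

```python
import random
from typing import Dict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_counts(counts: Dict[str, int], epsilon: float, rng: random.Random) -> Dict[str, float]:
    """Add Laplace noise of scale 1/epsilon to each bucket count.

    Assumes each person contributes to at most one bucket (sensitivity 1).
    A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    return {bucket: n + laplace_noise(scale, rng) for bucket, n in counts.items()}


# Invented example: how many people in each age band saw one ad.
true_counts = {"18-24": 1200, "25-34": 3400, "35-44": 2100, "45+": 900}
rng = random.Random()
noisy = private_counts(true_counts, epsilon=0.5, rng=rng)

for bucket, value in noisy.items():
    print(bucket, round(value))
```

With buckets in the hundreds or thousands, noise of this size barely changes the overall picture of who saw an ad, while masking whether any single individual is in the data. That is the property Veale is pointing to when he says aggregate targeting information can be published without exposing individuals.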
“It seems more likely that Facebook doesn’t want to release information on targeting as it would likely embarrass [it] and their customers,” Veale adds. “It is also possible that Facebook has confidentiality agreements with specific advertisers who may be caught red-handed for practices that go beyond public expectations. Data protection law isn’t blocking the disinfecting light of transparency, Facebook is.”
Asked about the URL database that Facebook has released to selected researchers today, Veale says it’s a welcome step — while pointing to further limitations.
“It’s a good thing that Facebook is starting to work more openly on research questions, particularly those which might point to problematic use of this platform. The initial cohort appears to be geographically diverse, which is refreshing — although appears to lack any academics from Indian universities, far and away Facebook’s largest user base,” he says. “Time will tell whether this limited data set will later expand to other issues, and how much researchers are expected to moderate their findings if they hope for continued amicable engagement.”
“It’s very possible for Facebook to effectively cherry-pick data sets to try to avoid issues they know exist, but you also cannot start building a collaborative process on all fronts and issues. Time will tell how open the multinational wishes to be,” Veale adds.
We’ve reached out to Facebook for comment on the criticism of its ad archive API.