It’s Time to Open the Black Box of Social Media

Social media companies need to give their data to independent researchers so we can better understand how to keep users safe

By Renée DiResta, Laura Edelson, Brendan Nyhan & Ethan Zuckerman

Woman holding a writing pad stands in front of a screen showing different screenshots of social media sites sharing data with researchers for study. — Kailey Whitman

Social media platforms are where billions of people around the globe go to connect with others, get information and make sense of the world. The companies that run these sites, including Facebook, Twitter, Instagram, TikTok and Reddit, collect vast amounts of data based on every interaction that takes place on their platforms.

Although social media has become one of our most important public forums for speech, several of the most important platforms are controlled by a small number of people. Mark Zuckerberg controls 58 percent of the voting share of Meta, the parent company of both Facebook and Instagram, effectively giving him sole control of two of the largest social platforms. Elon Musk made a $44-billion offer to take Twitter private (although whether that deal goes through will be determined by a lawsuit). [Editor’s Note: Musk completed his acquisition of Twitter in late October.] All these companies have a history of sharing scant portions of the data about their platforms with researchers, preventing us from understanding the impacts of social media on individuals and society. Such singular ownership of the three most powerful social media platforms makes us fear this lockdown on data sharing will continue.

After decades of little regulation, it is time to require more transparency from social media companies.

In 2020 social media was an important mechanism for the spread of false and misleading claims about the election and for mobilization by groups that participated in the January 6, 2021, Capitol insurrection. We have seen misinformation about COVID spread widely online during the pandemic. And today social media companies are failing to remove the Russian propaganda about the war in Ukraine that they promised to ban. Social media has become a major conduit for the spread of false information about every issue of concern to society. We don’t know what the next crisis will be, but we do know that false claims about it will circulate on these platforms.

Unfortunately, social media companies are stingy about releasing data and publishing research, especially when the findings might be unwelcome (although notable exceptions exist). The only way to understand what is happening on the platforms is for lawmakers and regulators to require social media companies to release data to independent researchers. In particular, we need access to data on the structures of social media, such as platform features and algorithms, so we can better analyze how they shape the spread of information and affect user behavior.

For example, platforms have assured legislators that they are taking steps to counter misinformation and disinformation by flagging content and inserting fact-checks. Are these efforts effective? Again, we would need access to data to know. Without better data, we can’t have a substantive discussion about which interventions are most effective and consistent with our values. We also run the risk of creating new laws and regulations that do not adequately address harms or of inadvertently making problems worse.

Some of us have consulted with lawmakers in the U.S. and Europe on potential legislative reforms along these lines. The conversation around transparency and accountability for social media companies has grown deeper and more substantive, moving from vague generalities to specific proposals. The debate still lacks important context, however. Lawmakers and regulators frequently ask us to better explain why we need access to data, what research it would enable, and how that research would help the public and inform regulation of social media platforms.

To address this need, we’ve created this list of questions we could answer if social media companies began to share more of the data they gather about how their services function and how users interact with their systems. We believe such research would help platforms develop better, safer systems and also inform lawmakers and regulators who seek to hold platforms accountable for the promises they make to the public.

Research suggests that misinformation is often more engaging than other types of content. Why is this the case? What features of misinformation are most associated with heightened user engagement and virality? Researchers have proposed that novelty and emotionality are key factors, but we need more research to know whether this is true. A better understanding of why misinformation is so engaging will help platforms improve their algorithms and recommend misinformation less often.
Research shows that the delivery-optimization techniques companies use to maximize revenue, and even the ad-delivery algorithms themselves, can be discriminatory. Are some groups of users significantly more likely than others to see potentially harmful ads, such as consumer scams? Are others less likely to be shown useful ads, such as job postings? How can ad networks improve delivery and optimization to be less discriminatory?
Social media companies attempt to combat misinformation by labeling content of questionable provenance, hoping to push users toward more accurate information. Results from survey experiments show that the effects of labels on beliefs and behavior are mixed. We need to learn more about whether labels are effective when individuals encounter them on platforms. Do labels reduce the spread of misinformation or attract attention to posts that users might otherwise ignore? Do people start to ignore labels as the labels become more familiar?
Internal studies at Twitter show that Twitter’s algorithms amplify right-leaning politicians and political news sources more than left-leaning accounts in six of seven countries studied. Do the algorithms used by other social media platforms show systemic political bias as well?
Because of the central role they now play in public discourse, platforms have a great deal of power over who can speak. Minority groups sometimes feel their views are silenced online as a consequence of platform moderation decisions. Do decisions about what content is allowed on a platform affect some groups disproportionately? Are platforms allowing some users to silence others through the misuse of moderation tools or through systemic harassment designed to silence certain viewpoints?

Social media companies ought to welcome the help of independent researchers to better measure online harm and inform policies. Some companies, such as Twitter and Reddit, have been helpful, but we can’t depend on the goodwill of a few businesses whose policies might change at the whim of a new owner. We hope a potentially Musk-led Twitter would be as forthcoming as before, if not more so. [Editor’s Note: This article was written and posted before Musk took ownership of Twitter.] In our fast-changing information environment, we should not regulate and legislate by anecdote. We need lawmakers to ensure our access to the data we need to help keep users safe.

Editor’s Note (11/11/22): This story was edited after posting to include updates about Elon Musk’s acquisition of Twitter.

A version of this article with the title “Social Media Companies Must Share Data” was adapted for inclusion in the December 2022 issue of Scientific American.