Introduction

What Is Collaborative Data Journalism, and Why Would We Want to Do It?

As budgets shrink and as the internet creates new opportunities for effortless communication and file sharing, newsrooms across the country are turning to new ways of working together to deliver complex data-based stories to their communities.

Huge datasets that cover vital national issues are coming out of the federal government every day, and within them hide endless numbers of story leads for local journalists. Data journalism teams and civic hackers are collecting massive troves of data, which present a rich vein of reporting if they can be efficiently mined. Engagement and social media teams are fielding large numbers of tips and crowdsourcing responses from their communities, only a few of which can become parts of stories.

With the proliferation of available data, it’s become more common for newsrooms to have access to datasets that contain more story leads than they can meaningfully pursue themselves. Collaborative data journalism allows multiple newsrooms to find and tell those stories, make the most out of large datasets and, ultimately, increase the chances that their work will have impact.

The results of collaborative data projects can be anything — a story, a series of stories, an interactive graphic, a documentary, etc. Collaborations can also include entities outside journalism, like universities, nonprofit groups, researchers or libraries.

With the right tools and planning, very large, complex and even secret collaborations are possible. The Panama Papers and Paradise Papers investigations, for example, distributed huge document leaks to dozens of newsrooms around the world. ProPublica's Electionland project tracks voting problems in real time by working with hundreds of newsrooms around the country.

Collaborations run the gamut from independently reporting stories within a coalition or conglomerate, to working together on co-published stories, to sharing resources to report separately. And while some collaborations may last for a limited period of time, others may be long term.

The Center for Cooperative Media, which studies these types of projects, defines six types of collaborations. Here’s how their taxonomy applies to data collaborations:

Temporary and Separate: Covering the same issue and working entirely separately, like the SF Homeless Project
Temporary and Co-creating: Sharing information but reporting independently, like ProPublica’s Electionland or Documenting Hate
Temporary and Integrated: Reporting stories together for a period of time, like the Panama Papers and the Implant Files, or the USA Today network's data projects
Ongoing and Separate: Partners create content separately and share it, like the Marshall Project's Next to Die
Ongoing and Co-creating: Co-reporting in a long-term setting, like Alaska's Energy Desk
Ongoing and Integrated: Partners work together on an operational level, like TapInto in New Jersey

Data partnerships tend to be temporary, taking either the co-creating or the integrated approach. Occasionally, they are ongoing and separate, like the AP's Data World service.

The center has a database of close to 200 different collaborative projects from around the world that shows the diversity and creativity of these types of endeavors.

What Is This Guide?

At ProPublica, collaborative journalism is a central part of our self-identity and ethos. We’ve partnered with hundreds of journalism outlets on thousands of stories. We’ve assembled this guide to pass on what we’ve learned to other newsrooms interested in starting similar efforts.

We also launched a free tool called Collaborate, which is based on the one we built for our own data collaborations.

Finally, this guide is itself a collaboration. We think our process works well, but we don’t have a monopoly on good ideas. We’re posting this guide to GitHub, and we’re eager for your contributions via the normal GitHub methods of issues and pull requests. If you’re not familiar with how to participate using GitHub’s tools, here’s a guide.

This document is, and may always be, a work in progress. New tools will come out that improve the processes we describe here, or even cause us to completely rethink them, and other people who do this work will learn lessons we haven’t. Like any collaboration, the results are stronger when we work on them together.

How Can I Contribute or Ask Questions?

Do you have tips or concrete advice about how to execute a data- or tip-focused collaboration? You can add to this guide in one of three ways:

Email Rachel Glickhouse at rachel.glickhouse@propublica.org. Please include your name, title and company.
Clone this project on GitHub, edit it and submit it back to us via a pull request.
Submit questions, feedback or ideas via GitHub’s issue-tracking system.

