Depending on your dataset, it's important to make clear to participants how complete the data is and how they should refer to it in their stories.
As we've already said, it's a good idea to include a guide on how to use the data in your onboarding materials for partners. Documenting Hate has a guide about how to use our database and how to parse the tips, as well as pitfalls to avoid when trying to do analysis. ProPublica often creates reporting recipes for large datasets to help newsrooms localize stories, and it does the same for its interactive databases, like this one for Documenting Hate.
Reporting recipes are a key part of the Bureau Local's approach to engaging partners. It sends them to partners to give them context about the investigation, key findings, guidance on how to parse the data and ways to localize the data. "The recipes play a key role in improving the accessibility of local information and ensuring it reaches people in a clear and actionable way," the group says in its resource guide.
If your dataset consists solely of tips you've collected from the public, you'll need to make a few things clear to participants:
- The data isn't all-encompassing.
- That means the data isn't suited to statistical analysis or to comparisons over time.
- Tips are just tips until they are verified by journalists, so it's a good idea to be cautious about how you cite numbers from the database.
The important thing to keep in mind with crowdsourced data is that you can cite the number of tips you've collected and the number you've verified, but you should make clear how that data was assembled.
For example, with Documenting Hate, partners doing stories with multiple tips will often say how many they examined and how many they were able to verify. They explain that tips were gathered as part of the Documenting Hate project, a coalition of news organizations.
The bottom line is to make sure all of the participants understand the limitations of the data and how to refer to it.
If your dataset is limited solely to crowdsourced tips, there are several approaches to reporting:
- Report individual tips as standalone stories. This is particularly useful when you're working with local media outlets.
- Identify patterns in the data and report out as many tips as possible that fit that pattern. Remember: a pattern in the data doesn't automatically mean it exists as prevalently, or in the same way, beyond the results of the callout.
- Use the database to identify sources for stories you're already working on.
See story examples from:
- ProPublica's Documenting Hate
- ProPublica's Maternal Mortality
- Reveal's Rehab Reporting Network
- Reveal's Case Cleared Reporting Network
- BBC's Shared Data Unit
- Verificado Mexico project (Spanish)
- First Draft's Comprova project (Portuguese)
- First Draft's Crosscheck project (French)
- Vox's ER bills project*
*This did not operate as a collaborative project, but tips were shared with local reporters at the end of the project.
If you're importing or adding data from other sources in addition to crowdsourcing, you may end up with a more complete dataset. In that case, you may be able to run an analysis and allow partners to do the same. But until your data is comprehensive, it's important to communicate with partners about how to cite it properly.
If a partner conducts its own analysis, it's a good idea to have the partner run it by the other partners or the group coordinator before publishing.