Dedupe.io for journalists

Dedupe.io is proud to support journalists working on important investigations with messy and complex data.

As part of that support, we are happy to offer journalists up to 250,000 free rows for their investigations if they cite Dedupe.io in their story.

Getting started

If you are interested, email us at info@dedupe.io from your publication email address. Tell us your publication and the data you are working with, and CC your editor so we know that they are on board. We’ll get you set up with an account and bonus rows!

If you need more rows, we are also happy to offer discounts.

Citing Dedupe.io

The citation can appear in your story or its methods section. It should name the service “Dedupe.io,” briefly explain how you used it for your story, and (if the story is published on the web) link back to https://dedupe.io.

Here are some examples of acceptable citations:

  • “The Local Herald used Dedupe.io to link officers across court records…”

  • “We merged the voter information into one spreadsheet. We then matched the names in the school employee spreadsheet with all the voter names… The technique we used is called deduping, and the program we used is called Dedupe.io…”

Stories that have used Dedupe.io

Here’s what you should know about the millions fueling Chicago’s aldermanic races

By Jonah Newman, Chicago Reporter on Feb 22, 2019

We used dedupe.io … to match donors based on commonalities in names, addresses, and reported occupations. Matches were then reviewed by hand to verify accuracy and completeness, though with more than 39,000 individual contributions …


Eviction tactics squeeze renters: AJC analysis shows landlords increasingly use filings to collect late rent

By Chris Joyner, Jeff Ernsthausen and Willoughby Mariano, The Atlanta Journal-Constitution on Jun 22, 2018

To investigate differences in eviction filing rates and behavior by different owners or operators, the AJC used Dedupe.io, a machine learning-based application, to link the LLC’s that purchased most multifamily properties together using names and addresses listed on real estate transaction records.


Shielded by the Badge

By Jennifer Bjorhus and MaryJo Webster, Star Tribune on Oct 1, 2017

To assess the record of the Minnesota POST Board, the Star Tribune matched names and dates of birth of licensed law enforcement officers against conviction data from state criminal court records. The POST Board provided names, dates of birth and licensing start and expiration dates for anyone who has held a peace officer’s license since Minnesota’s licensing system started in the late 1970s.

“Dedupe.io simplifies the process of trying to do a messy match of names and dates of birth between two datasets, which I’ve found to be one of the hardest data tasks. Dedupe found matches that I likely would’ve missed using any other tool, and gave me far greater confidence in my results.”

  • MaryJo Webster, Reporter, Star Tribune


The Mossack Fonseca Universe

By Brian Clifton, Fusion on Apr 15, 2016

Cleaning and preparing the data was done with Python, relying heavily on the pandas library and DataMade’s Dedupe. NetworkX was used to create the network graph and to perform operations on the graph. Next, the graph was imported into the open-source visualization software Gephi, which was used to compute the layout of the nodes and edges. A static version of visualization was exported from Gephi, and then the x and y coordinates were translated into latitude/longitude using Python. Then the data was imported into Mapbox for styling, interactivity, and hosting. The Mapbox GL javascript library was used to create a simple website that could be embedded in an article or act as a standalone visualization.


The Cost of D.C. Council’s Power Over Contracts

By Patrick Madden, Chris Baronavski, Rachel Baye and Carrie Moskal, WAMU 88.5 on Oct 14, 2014

WAMU and the Investigative Reporting Workshop obtained legislative records to analyze more than 1,000 contracts that were sent to the D.C. Council for approval from 2007 through January 2014. The reporters also analyzed more than 100,000 campaign contributions to D.C. officials and candidates from the Office of Campaign Finance from 2005 through January 2014. This allowed us to capture contributions made two years before or two years after votes on the contracts.

