Dedupe.io was shut down Jan 31, 2023.
The Dedupe.io team has decided to focus on our consulting practice at DataMade and on projects more closely aligned with our mission: supporting our clients as they work toward democracy, justice, and equity.
We are continuing our consulting practice around the open source dedupe library and would be happy to consult with you on setting up a solution based on it. Contact us to get started.
Dedupe.io is proud to support journalists working on important investigations with messy and complex data.
As part of that support, we are happy to offer journalists up to 250,000 free rows for their investigations if they cite Dedupe.io in their story.
If you are interested, email us at info@datamade.us from your publication email address. Tell us your publication and the data you are working with, and CC your editor so we know they are on board. We’ll get you set up with an account and bonus rows!
If you need more rows, we are happy to offer additional discounts.
The citation can appear in your story or its methods section. It should name the service “Dedupe.io,” briefly explain how you used it for your story, and (if the story is published on the web) link back to https://dedupe.io.
Here are some examples of acceptable citations:
“The Local Herald used Dedupe.io to link officers across court records…”
“We merged the voter information into one spreadsheet. We then matched the names in the school employee spreadsheet with all the voter names… The technique we used is called deduping, and the program we used is called Dedupe.io. …”
By Jonah Newman, Chicago Reporter on Feb 22, 2019
We used dedupe.io … to match donors based on commonalities in names, addresses, and reported occupations. Matches were then reviewed by hand to verify accuracy and completeness, though with more than 39,000 individual contributions …
By Chris Joyner, Jeff Ernsthausen and Willoughby Mariano, The Atlanta Journal-Constitution on Jun 22, 2018
To investigate differences in eviction filing rates and behavior by different owners or operators, the AJC used Dedupe.io, a machine learning-based application, to link the LLC’s that purchased most multifamily properties together using names and addresses listed on real estate transaction records.
By Jennifer Bjorhus and MaryJo Webster, StarTribune on Oct 1, 2017
To assess the record of the Minnesota POST Board, the Star Tribune matched names and dates of birth of licensed law enforcement officers against conviction data from state criminal court records. The POST Board provided names, dates of birth and licensing start and expiration dates for anyone who has held a peace officer’s license since Minnesota’s licensing system started in the late 1970s.
“Dedupe.io simplifies the process of trying to do a messy match of names and dates of birth between two datasets, which I’ve found to be one of the hardest data tasks. Dedupe found matches that I likely would’ve missed using any other tool, and gave me far greater confidence in my results.”
By Brian Clifton, Fusion on Apr 15, 2016
Cleaning and preparing the data was done with Python, relying heavily on the pandas library and DataMade’s Dedupe. NetworkX was used to create the network graph and to perform operations on the graph. Next, the graph was imported into the open-source visualization software Gephi, which was used to compute the layout of the nodes and edges. A static version of visualization was exported from Gephi, and then the x and y coordinates were translated into latitude/longitude using Python. Then the data was imported into Mapbox for styling, interactivity, and hosting. The Mapbox GL javascript library was used to create a simple website that could be embedded in an article or act as a standalone visualization.
By Patrick Madden, Chris Baronavski, Rachel Baye and Carrie Moskal, WAMU 88.5 on Oct 14, 2014
WAMU and the Investigative Reporting Workshop obtained legislative records to analyze more than 1,000 contracts that were sent to the D.C. Council for approval from 2007 through January 2014. The reporters also analyzed more than 100,000 campaign contributions to D.C. officials and candidates from the Office of Campaign Finance from 2005 through January 2014. This allowed us to capture contributions made two years before or two years after votes on the contracts.