Dedupe.io was shut down Jan 31, 2023.
The Dedupe.io team has decided to focus on our consulting practice at DataMade and on projects more closely aligned with our mission of supporting clients working toward democracy, justice, and equity.
We are continuing our consulting practice around the open source dedupe library and would be happy to consult with you on setting up a solution based on it. Contact us to get started.
De-duplicate and find matches in your Excel spreadsheet or database
Dedupe.io is a powerful tool that learns the best way to find similar rows in your data. Using cutting-edge machine learning research, we quickly and accurately identify matches in your Excel spreadsheet or database, saving you time and money.
Trusted at organizations around the world
In today’s world of big data, there has never been more information available to work with. Unfortunately, all this data is hard to use, especially if it has been entered by hand or comes from different systems. Even the seemingly simple job of figuring out who is who in a spreadsheet or database can be daunting and time-consuming.
That’s where Dedupe.io comes in. We developed the best dynamic and scalable solution for de-duplicating and linking datasets, and built a simple step-by-step wizard for anyone to use it.
Read more about how and why we built Dedupe.io
Select examples of impactful projects powered by Dedupe.io and the dedupe Python library.
Upload a spreadsheet and find all exact and similar records within it
Link together two or more spreadsheets and find overlapping records in each
Upload a master list and check new spreadsheets against it
Real-world data is messy, and Dedupe.io was built to work with it
We find matches even when there are major data quality issues
Hand-typed data can contain misspellings, abbreviations, and other inconsistencies
We match them using powerful text similarity algorithms
|atty title guaranty fund||One S. Wacker Dr. 24th Floor Chicago, IL 60606|
|attorneys' title guarantee fund, inc.||1 s. wacker drive 24th floor chicago il 60606|
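To illustrate the idea of text similarity on the two names above, here is a minimal sketch using Python's standard-library `difflib`. This is an assumption for illustration only: the page does not say which similarity algorithms Dedupe.io used, and the `normalize` and `similarity` helpers are hypothetical names.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    # Assumption: difflib stands in for whatever algorithm Dedupe.io used.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


name_a = "atty title guaranty fund"
name_b = "attorneys' title guarantee fund, inc."
unrelated = "kennedy-king college"

# The misspelled and abbreviated pair should score well above the unrelated pair.
print(f"likely match: {similarity(name_a, name_b):.2f}")
print(f"unrelated:    {similarity(name_a, unrelated):.2f}")
```

A score like this only ranks candidate pairs; deciding where to draw the line between match and non-match is a separate step.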
Different people and systems format data differently
We parse names, addresses, and other text to make smart comparisons
|Chicago Commons Guadalupano||1814 S. Paulina 60608||6663883|
|Chicago Commons Guadalupano Family Center||1814 South Paulina 60608||6663884|
|Chicago Commons Association - Guadalupano Family Center||1814 S Paulina St||6663883|
|CHICAGO COMMONS ASSOCIATION GUADALUPANO FAMILY CENTER||1814 S PAULINA 60608||6663883|
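The address variants above differ mostly in case, punctuation, and abbreviations. A rough sketch of how such formatting differences might be normalized before comparison follows. The abbreviation table and the `normalize_address` helper are hypothetical and deliberately tiny; a real address parser handles many more cases.

```python
import re

# Hypothetical, deliberately small abbreviation table for illustration.
# Note: "st" can also mean "saint"; a real parser uses position and context.
ABBREVIATIONS = {
    "s": "south",
    "n": "north",
    "e": "east",
    "w": "west",
    "st": "street",
    "dr": "drive",
    "ave": "avenue",
}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and expand common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


variants = [
    "1814 S. Paulina 60608",
    "1814 South Paulina 60608",
    "1814 S PAULINA 60608",
]

# All three variants collapse to the same normalized string.
for variant in variants:
    print(normalize_address(variant))
```

The fourth variant in the rows above, `1814 S Paulina St`, omits the ZIP code and adds a street suffix, so it would still need a fuzzy comparison after normalization.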
Sometimes, your data doesn't agree with itself
We compare using multiple fields to find records with the most agreement
|kennedy-king college||6301 s halsted street 60621||6025340|
|kennedy-king college||6800 s wentworth avenue 60621||6025340|
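One way to picture comparing on multiple fields is a weighted average of per-field similarities, sketched below for the two records above. The field names, the weights, and the `record_agreement` helper are assumptions made for this example; in Dedupe.io the relative importance of fields was learned from training rather than set by hand.

```python
from difflib import SequenceMatcher

# Hypothetical hand-picked weights; a trained model would learn these instead.
FIELD_WEIGHTS = {"name": 4, "address": 3, "phone": 3}


def field_similarity(a: str, b: str) -> float:
    """Character-level similarity between two field values, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_agreement(r1: dict, r2: dict, weights: dict = FIELD_WEIGHTS) -> float:
    """Weighted average of per-field similarities across all weighted fields."""
    total = sum(weights.values())
    score = sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())
    return score / total


record_a = {
    "name": "kennedy-king college",
    "address": "6301 s halsted street 60621",
    "phone": "6025340",
}
record_b = {
    "name": "kennedy-king college",
    "address": "6800 s wentworth avenue 60621",
    "phone": "6025340",
}

# Name and phone agree exactly, so the score stays high despite the address conflict.
print(f"{record_agreement(record_a, record_b):.2f}")
```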
Upload any spreadsheet or connect directly to your database
You provide training on the right way to identify similar records in your data
Matches are automatically found for you to review and then download
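As a heavily simplified picture of the training step, the sketch below picks a match threshold from a handful of labeled pairs. Everything here is hypothetical: the `best_threshold` helper, the scores, and the labels are invented for illustration, and the actual service learned a richer model than a single cutoff.

```python
def best_threshold(labeled_scores):
    """Pick the cutoff that classifies the most labeled pairs correctly.

    labeled_scores is a list of (similarity_score, is_match) tuples,
    where is_match is the label a reviewer supplied during training.
    """
    candidates = sorted({score for score, _ in labeled_scores})

    def correct(threshold):
        # A pair is predicted to match when its score reaches the threshold.
        return sum(
            (score >= threshold) == is_match
            for score, is_match in labeled_scores
        )

    return max(candidates, key=correct)


# Hypothetical labels: two pairs marked as matches, two marked as distinct.
labels = [(0.95, True), (0.80, True), (0.55, False), (0.30, False)]
threshold = best_threshold(labels)
print(threshold)
```

Pairs scoring at or above the chosen threshold would then be presented for review as likely matches.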
We’re told that we’re in the age of big data and the analytics revolution, leading to “the algorithmic business.” Not only will managers be able to make better strategic decisions based on data, but systems will also generate and then follow data-driven algorithms to make thousands of operations-related decisions each second.
But has your company fully exploited that potential?
Contact us to get a copy of our white paper, Entity Resolution with Machine Learning: Dedupe.io’s Scalable Foundation for Data Quality.
We're happy to help! Read our FAQ