De-duplicate and find matches in your Excel spreadsheet or database

Dedupe.io is a powerful tool that learns the best way to find similar rows in your data. Using cutting-edge research in machine learning, we quickly and accurately identify matches in your Excel spreadsheet or database, saving you time and money.

Trusted at organizations around the world

Minneapolis Star Tribune
Open Secrets
Entytle Aftermarket engagement platform
St Charles County, MO

A simple tool for a complex problem

In today’s world of big data, there’s never been more information available to work with. Unfortunately, all this data is hard to use, especially if it’s been entered by hand or comes from different systems. The seemingly simple task of figuring out who is who in a spreadsheet or database can be daunting and time-consuming.

That’s where Dedupe.io comes in. We developed the best dynamic and scalable solution for de-duplicating and linking datasets, and built a simple step-by-step wizard so that anyone can use it.

Read more about how and why we built Dedupe.io »

Dedupe.io uses

  • De-duplicating customer records
  • Combining lists of addresses or businesses
  • Master data management
  • Merging different database systems
  • Creating a master list of products or parts
  • Cleaning up lists of names and emails
  • Finding contributions in campaign finance
  • Cross-referencing government records

And much more!
Not sure about your use case? Drop us a line

Dedupe in action

Select examples of impactful projects powered by Dedupe.io and the dedupe Python library.

See more examples >

How can you use Dedupe.io?

Find duplicates in a spreadsheet

Upload a spreadsheet and find all exact and similar records within it

Merge multiple files

Link together two or more spreadsheets and find overlapping records in each

Check against a canonical list

Upload a master list and check new spreadsheets against it

We find the hard matches

Real-world data is messy, and Dedupe.io was built to work with it

We find matches even when there are major data quality issues

Typos, misspellings and abbreviations

Hand-typed data can contain typos, misspellings and abbreviations
We match them using powerful text similarity algorithms

name | address
atty title guaranty fund | One S. Wacker Dr. 24th Floor Chicago, IL 60606
attorneys' title guarantee fund, inc. | 1 s. wacker drive 24th floor chicago il 60606
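
To give a feel for what a text similarity score looks like, here is a minimal sketch using `difflib` from Python's standard library. It is a stand-in chosen for illustration and is not necessarily the algorithm used by the product; the `similarity` helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# The two names from the example above, plus an unrelated name for contrast.
name_a = "atty title guaranty fund"
name_b = "attorneys' title guarantee fund, inc."
unrelated = "kennedy-king college"

print("abbreviated vs. full name:", round(similarity(name_a, name_b), 2))
print("abbreviated vs. unrelated:", round(similarity(name_a, unrelated), 2))
```

The abbreviated and full names score much closer to each other than either does to the unrelated name, even though they are far from identical character by character.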

Inconsistent formatting

Different people and systems format data differently
We parse out names, addresses and any text to make smart comparisons

site_name | address | phone
Chicago Commons Guadalupano | 1814 S. Paulina 60608 | 6663883
Chicago Commons Guadalupano Family Center | 1814 South Paulina 60608 | 6663884
Chicago Commons Association - Guadalupano Family Center | 1814 S Paulina St | 6663883
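
As a rough illustration of why parsing helps, the sketch below normalizes street addresses before comparing them. The abbreviation table and the `normalize_address` helper are hypothetical and deliberately tiny; a real address parser handles far more cases, and this is not a description of the product's internals.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "s": "south", "n": "north", "e": "east", "w": "west",
    "st": "street", "dr": "drive", "ave": "avenue",
}


def normalize_address(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand a few abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


# Street portions of the three addresses in the example above.
for variant in ["1814 S. Paulina", "1814 South Paulina", "1814 S Paulina St"]:
    print(normalize_address(variant))
```

After normalization the first two variants become identical, and the third differs only by the trailing street type, so a later comparison step has much less noise to deal with.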

Contradictory fields

Sometimes, your data doesn't agree with itself
We compare using multiple fields to find records with the most agreement

site_name | address | phone
kennedy-king college | 6301 s halsted street 60621 | 6025340
kennedy-king college | 6800 s wentworth avenue 60621 | 6025340
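
One simple way to picture comparing on multiple fields is a weighted average of per-field similarity scores. The sketch below assumes hand-picked weights and uses `difflib` from Python's standard library as the per-field comparison; both are illustrative assumptions, and a trained model would learn how much each field matters rather than rely on fixed numbers like these.

```python
from difflib import SequenceMatcher

# Hypothetical field weights, chosen by hand for illustration.
WEIGHTS = {"site_name": 0.5, "address": 0.3, "phone": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio for a single field."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities, in the range 0..1."""
    return sum(
        weight * field_similarity(rec_a[field], rec_b[field])
        for field, weight in WEIGHTS.items()
    )


# The two records from the example above: same name and phone, different address.
a = {"site_name": "kennedy-king college",
     "address": "6301 s halsted street 60621", "phone": "6025340"}
b = {"site_name": "kennedy-king college",
     "address": "6800 s wentworth avenue 60621", "phone": "6025340"}

print("overall agreement:", round(record_score(a, b), 2))
```

Because the name and phone agree exactly, the overall score stays high even though the addresses disagree, which is the behavior the example above describes.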

How it works

Upload your data

Upload any spreadsheet or connect directly to your database

Train it

You provide training on the right way to identify similar records in your data

Validate and download

Matches are automatically found for you to review and then download

Learn more about how it works »


We're happy to help! Read our FAQ