Field comparators


Dedupe.io can compare your fields in different ways depending on the makeup of the data.

Types of field comparators

The 'Define fields' page on Dedupe.io

Simple comparators

These are the most commonly used comparators. They handle most types of data you will want to compare, such as names, addresses, and dates. If you're not sure which one to pick, Default is always a good option.

Comparator | Description | Example
Default | Default comparator. Fields are compared based on how similar they are to each other, character by character. This is the most versatile and commonly used comparator and works best on most fields. | atty title guaranty fund vs attorneys' title guarantee fund, inc.
Address | Splits addresses into separate components using usaddress. Works best on addresses in the United States. | 1 S. Wacker Dr Chicago vs One South Wacker Chicago, IL
Name | Splits names into separate components using probablepeople. Good for Western person names, company names, and households. | Mr George 'Gob' Bluth II vs George Bluth Jr
Dates and Times | Compares calendar dates and times of day using the Python dateutil library. Good for dates of birth or event dates. | 2017-07-25 vs Tuesday, July 25th 2017
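
To give a feel for what "character by character" similarity means for the Default comparator, here is a minimal Python sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in; that choice is an assumption made for illustration and is not necessarily the algorithm Dedupe.io uses. The function name char_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Return a 0..1 character-level similarity score.

    Illustrative stand-in only: difflib's ratio is an assumption here,
    not Dedupe.io's actual Default comparator.
    """
    # Normalize case and surrounding whitespace so trivial differences
    # do not lower the score.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


# The example pair from the table above, plus an unrelated pair for contrast.
near_duplicate = char_similarity(
    "atty title guaranty fund",
    "attorneys' title guarantee fund, inc.",
)
unrelated = char_similarity("atty title guaranty fund", "city of chicago")

print(f"near duplicate: {near_duplicate:.2f}")
print(f"unrelated:      {unrelated:.2f}")
```

The near-duplicate pair scores noticeably higher than the unrelated pair, which is the kind of signal a character-level comparator provides for abbreviations and small spelling differences.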

Advanced comparators

If you know more about the structure of your data, you can use these advanced comparators to create custom rules that will improve the accuracy of your de-duplication results.

Comparator | Description | Example
Categorical | Used for comparing keywords or categories with a small number of options (5 or fewer). | red vs green
Exact Match | Checks whether the fields match exactly. Good for cleaned and consistent data. | Chicago, IL vs Chicago, IL
Exists? | Measures whether both, one, or neither of the fields are defined. Good for sparsely populated data (when presence is significant). | ABC vs (empty)
Fuzzy Categorical | Used for comparing keywords or categories with a large number of options (more than 5). This is useful for fields like occupation or employer. | Attorney vs Lawyer
Long Text | Compares entire words. Useful for longer text fields. Good for product descriptions or article abstracts. | The quick brown fox jumped over the lazy dog. vs The slow brown squirrel hopped over the sleeping dog.
Set | Used for comparing short lists of pre-defined elements. Good for multiple keywords or categories. List elements should be separated by commas (like "Red, Green, Blue"). | red, round, soft vs blue, round, sharp
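
The ideas behind three of these comparators (Exists?, Set, and Long Text) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Dedupe.io's implementation: the function names are hypothetical, and Jaccard overlap and word-count cosine similarity are simply common choices for set and word-level comparison.

```python
import math
import re
from collections import Counter


def exists_state(a: str, b: str) -> str:
    """Report whether both, one, or neither of two field values is defined."""
    present = sum(bool(v and v.strip()) for v in (a, b))
    return ("neither", "one", "both")[present]


def parse_set(value: str) -> set:
    """Split a comma-separated field into a set of normalized elements."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def set_overlap(a: str, b: str) -> float:
    """Jaccard overlap of two comma-separated lists (assumed metric)."""
    sa, sb = parse_set(a), parse_set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def word_cosine(a: str, b: str) -> float:
    """Cosine similarity over whole-word counts (assumed metric)."""
    wa = Counter(re.findall(r"\w+", a.lower()))
    wb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm = math.sqrt(sum(c * c for c in wa.values())) * math.sqrt(
        sum(c * c for c in wb.values())
    )
    return dot / norm if norm else 0.0


print(exists_state("ABC", ""))  # one
print(set_overlap("red, round, soft", "blue, round, sharp"))  # 0.2
print(
    word_cosine(
        "The quick brown fox jumped over the lazy dog.",
        "The slow brown squirrel hopped over the sleeping dog.",
    )
)
```

In the Set example only "round" is shared out of five distinct elements, so the overlap is 1/5. The Long Text example shares "the", "brown", "over", and "dog", which gives a moderate score even though the sentences differ in several words.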

Custom parsers

For particularly messy datasets, we can improve the results of Dedupe.io by building custom parsers fine-tuned for your data. Custom parsers enable smarter matches by breaking up semi-structured text into separate fields for better comparisons.
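
As a rough illustration of what "breaking up semi-structured text into separate fields" can look like, here is a minimal Python sketch using a regular expression. Everything in it is hypothetical: the input format, the field names, and the parse_vendor function are invented for the example and do not describe an actual Dedupe.io parser.

```python
import re
from typing import Dict, Optional

# Hypothetical format: "<name> (<city>, <two-letter state>) #<account number>"
VENDOR_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*"
    r"\((?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\)\s*"
    r"#(?P<account>\d+)$"
)


def parse_vendor(text: str) -> Optional[Dict[str, str]]:
    """Split one semi-structured string into named fields.

    Returns None when the text does not follow the assumed format, so
    callers can fall back to comparing the raw string instead.
    """
    match = VENDOR_PATTERN.match(text.strip())
    return match.groupdict() if match else None


print(parse_vendor("ACME Corp (Chicago, IL) #12345"))
print(parse_vendor("free-form text with no structure"))
```

Once a value is split this way, each piece can be compared with the comparator that suits it, for example Name for the name and Exact Match for the account number.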

For more information, contact us at dedupe@datamade.us