Instructions and tips on formatting and processing files for upload.
The file must contain one row for every record, with the first row giving the name of each column.
You can import the following file types:
Spreadsheets (.xls, .xlsx)
For spreadsheets, only the first sheet will be imported. Special formatting and formulas will not be imported.
Comma-separated text (.csv)
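Comma-separated value (.csv) files are a standard way to represent tabular data. A cell value that contains a comma, a newline, or a double quote must be enclosed in double quotes, and each double quote inside the value must be escaped by doubling it. For example, the value 17" LCD display is written as:
"17"" LCD display"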
By convention, the first row acts as the header and defines the names of the columns in the data.
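As a minimal sketch of the quoting rules above, Python's standard csv module applies them automatically when it writes a row. The helper names to_csv_line and parse_csv_line and the sample values are illustrative assumptions, not part of Dedupe.io.

```python
import csv
import io


def to_csv_line(values):
    """Return one CSV line for the given cell values (hypothetical helper)."""
    buffer = io.StringIO()
    # The default dialect quotes only the cells that need it
    # and doubles any double quote found inside a cell.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(values)
    return buffer.getvalue()


def parse_csv_line(line):
    """Parse one CSV line back into its cell values (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line)))


# The first row is the header; the second is a record with an embedded quote.
header = to_csv_line(["name", "description"])
row = to_csv_line(["Monitor", '17" LCD display'])
print(header + row, end="")
```

Reading the line back with parse_csv_line recovers the original value, which is a quick way to confirm that a file was quoted correctly.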
If your spreadsheet contains more than one header row, columns with duplicate names, or columns with no field name, you will need to edit your file to correct these issues before uploading.
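Two of these header issues can be checked before uploading with a short script. The sketch below assumes the data has been exported to CSV; the function name find_header_problems is hypothetical, and duplicates are compared by exact match, which may differ from how Dedupe.io compares column names. It does not try to detect extra header rows, which usually need to be spotted by eye.

```python
import csv
import io
from collections import Counter


def find_header_problems(csv_text):
    """Report unnamed and duplicate columns in the first row (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    names = [name.strip() for name in header]
    # 1-based positions of columns whose header cell is empty.
    unnamed = [pos for pos, name in enumerate(names, start=1) if not name]
    # Names that appear more than once in the header row.
    counts = Counter(name for name in names if name)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    return {"unnamed": unnamed, "duplicates": duplicates}


# Made-up sample: the third column has no name and "name" appears twice.
sample = "name,address,,name\nAcme,1 Main St,x,Acme Inc\n"
print(find_header_problems(sample))
```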
Once you upload your data to Dedupe.io, you will not be able to edit it. We take the approach of dealing with data in its original messy state and simply identifying which records to cluster together.
However, there are some cases where editing your data before uploading to Dedupe.io is a good idea. If a column has a lot of blank values, that’s OK: Dedupe.io will know how to ignore them appropriately. But if your data contains placeholder text like “Null” or “n/a”, it is a good idea to clear it out. We recommend using tools like Excel, or OpenRefine for larger spreadsheets, to make these kinds of changes.
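If you prefer a script to a spreadsheet tool, the same cleanup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name clear_placeholders is hypothetical, and the placeholder list holds just the two spellings mentioned above, so extend it to match your own data.

```python
import csv
import io

# Assumed placeholder spellings, compared case-insensitively.
PLACEHOLDERS = {"null", "n/a"}


def clear_placeholders(csv_text):
    """Return CSV text with placeholder cells blanked; the header row is kept as-is."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    cleaned = [rows[0]]
    for row in rows[1:]:
        # Blank a cell only when the whole cell is a placeholder.
        cleaned.append(
            ["" if cell.strip().lower() in PLACEHOLDERS else cell for cell in row]
        )
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue()


# Made-up sample data for illustration.
sample = "name,phone\nAcme,Null\nBolt,n/a\nCore,555-0100\n"
print(clear_placeholders(sample), end="")
```

Only whole-cell matches are cleared, so a value such as "Nullable" is left untouched.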
Dedupe.io works best on spreadsheets that have 100 rows or more. When spreadsheets have fewer than 100 rows, Dedupe.io will have trouble getting a good sample of the data to work with. (It’s usually faster to clean such small spreadsheets by hand, anyway.) For these reasons, we prevent uploads of spreadsheets with fewer than 100 rows.
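To check a CSV file against this limit before uploading, a rough count can be sketched as follows. The function names are hypothetical, and the sketch assumes that the header row and completely empty rows do not count toward the 100 rows; how Dedupe.io itself counts rows is not stated here.

```python
import csv
import io

MIN_ROWS = 100  # the minimum stated above


def count_data_rows(csv_text):
    """Count non-empty rows after the header row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if any(cell.strip() for cell in row))


def has_enough_rows(csv_text, minimum=MIN_ROWS):
    """Return True when the data has at least `minimum` data rows."""
    return count_data_rows(csv_text) >= minimum


# Made-up sample with three data rows, well under the minimum.
sample = "name\nAcme\nBolt\nCore\n"
print(count_data_rows(sample), has_enough_rows(sample))
```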
If you’d like to test the service and you don’t have a large enough messy dataset on hand, you can use our example spreadsheet of early childhood education centers in Chicago:
Download Chicago Early Childhood Locations (800 rows)