Formatting files for upload


Instructions and tips on formatting and processing files for upload.

Supported file types

You can import files of the following types:

  • Spreadsheets (.xls, .xlsx)
  • Comma-separated text (.csv)

The file must contain one row for every record, with the first row indicating the name of each column.

Spreadsheets (.xls, .xlsx)
For spreadsheets, only the first sheet will be imported. Special formatting and formulas will not be imported.

Comma-separated text (.csv)
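Comma-separated value (.csv) files are a standard way to represent tabular data. A cell value that contains a comma, a newline, or a double quote must be enclosed in double quotes, and any double quote inside the value must be escaped by doubling it. For example, the value 17" LCD display is written as "17"" LCD display".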
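To see the quoting rule in action, here is a minimal sketch using Python's standard `csv` module (chosen here only for illustration; dedupe.io does not require it). It writes one made-up record whose cells contain a double quote and a comma, then reads the line back:

```python
import csv
import io

# A made-up record: one cell contains a double quote, the other a comma.
row = ['17" LCD display', "Acme, Inc."]

buffer = io.StringIO()
# csv.writer defaults to QUOTE_MINIMAL: only cells containing the delimiter,
# the quote character, or a line break are wrapped in double quotes, and
# embedded double quotes are doubled (doublequote=True is the default).
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(row)
line = buffer.getvalue()
print(line, end="")
# prints: "17"" LCD display","Acme, Inc."

# Parsing the line again recovers the original cell values.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)  # prints: True
```

Letting a CSV library or your spreadsheet program's CSV export handle quoting is usually safer than building lines by hand, since it applies the same rule consistently.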

By convention, the first row acts as the header and defines the names of the columns in the data.
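As an illustration of how a header row is used, the sketch below reads a small CSV with Python's `csv.DictReader`, which treats the first row as column names and returns each later row as a dictionary keyed by them. The sample data is made up; note that a trailing comma produces a blank cell rather than an error.

```python
import csv
import io

# Made-up sample data: the first row is the header.
data = """name,address,phone
Jane Doe,"12 Main St, Springfield",555-0100
John Roe,34 Elm St,
"""

# DictReader takes its column names from the first row.
reader = csv.DictReader(io.StringIO(data))
print(reader.fieldnames)  # prints: ['name', 'address', 'phone']

records = list(reader)
print(records[0]["address"])  # prints: 12 Main St, Springfield
print(repr(records[1]["phone"]))  # prints: '' (a blank cell)
```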

Pre-processing your data

If your spreadsheet contains more than one header row, columns with duplicate names, or columns with no field name, you will need to edit your file to correct these issues before uploading.
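You can catch two of these issues automatically before uploading. The sketch below is a hypothetical helper, `header_problems`, written in Python with only the standard library; it reports blank and duplicate column names in the first row of a CSV. It cannot tell whether a second row is also a header, so check that by eye.

```python
import csv
import io
from collections import Counter


def header_problems(csv_text):
    """Return human-readable problems found in the header row.

    Hypothetical helper: checks only for blank and duplicate column
    names (ignoring surrounding spaces); extra header rows are not detected.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    # Columns whose name is empty or only whitespace.
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {position} has no name")
    # Non-blank names that appear more than once.
    counts = Counter(name.strip() for name in header if name.strip())
    for name, count in counts.items():
        if count > 1:
            problems.append(f"column name {name!r} appears {count} times")
    return problems


sample = "name,address,,name\nJane Doe,12 Main St,x,Jane\n"
for problem in header_problems(sample):
    print(problem)
# prints:
# column 3 has no name
# column name 'name' appears 2 times
```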

Once you upload your data to dedupe.io, you will not be able to edit it. We work with your data in its original, messy state and simply identify which records should be clustered together.

That said, there are some cases where editing your data before uploading it to dedupe.io is a good idea. If a column has a lot of blank values, that’s OK; Dedupe.io knows how to ignore them. However, if your data contains placeholder text like “Null” or “n/a”, it’s a good idea to clear it out so those cells are blank. We recommend using tools like Excel, or OpenRefine for larger spreadsheets, to make these kinds of changes.
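If your file is a CSV, you could also clear placeholder values with a short script instead of a spreadsheet tool. This is a minimal sketch, assuming a hypothetical `clear_placeholders` helper and that only whole cells reading “Null” or “n/a” (in any letter case, ignoring surrounding spaces) should be blanked; extend the `PLACEHOLDERS` set to match your own data.

```python
import csv
import io

# Placeholder strings to treat as "no value" (compared in lowercase).
# Hypothetical starting list: adjust it to what your data actually uses.
PLACEHOLDERS = {"null", "n/a"}


def clear_placeholders(csv_text, placeholders=PLACEHOLDERS):
    """Return CSV text with placeholder cells replaced by empty strings.

    The header row is copied unchanged; a cell is blanked only when its
    entire (stripped, lowercased) value is in `placeholders`.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if rows:
        writer.writerow(rows[0])  # keep the header as-is
    for row in rows[1:]:
        writer.writerow(
            "" if cell.strip().lower() in placeholders else cell
            for cell in row
        )
    return out.getvalue()


raw = "name,phone,email\nJane Doe,NULL,jane@example.com\nJohn Roe,555-0101,n/a\n"
print(clear_placeholders(raw), end="")
# prints:
# name,phone,email
# Jane Doe,,jane@example.com
# John Roe,555-0101,
```

Cells that merely contain a placeholder word inside other text, such as “n/a until May”, are left untouched, which avoids erasing real values.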