Dedupe.io is a software-as-a-service platform for quickly and accurately identifying clusters of similar records across one or more files or databases.

Dedupe.io is built on top of dedupe, an open-source Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution. The library was developed by Forest Gregg and Derek Eder, partners at DataMade and Dedupe.io LLC, to help clean up and make sense of the messy and disjointed data they encountered while building and consulting on a wide variety of civic technology projects involving campaign finance data, elected officials, health indicators, and government budgets.

After launching the dedupe library, they realized there was a need for a tool that used the same approach but was usable by non-developers. So they set out to build Dedupe.io as a robust, powerful, and easy-to-use online tool that anyone in any industry can use to clean up their data.


Dedupe.io is led by a team of dedicated and experienced engineers and data analysts. They have over a decade of experience working with, cleaning up, linking, and analyzing large datasets from the public and private sectors.

Forest Gregg, Partner
Forest works to find ways that information and information technology can help the people of Chicago recognize, understand, and address our shared challenges and opportunities. He was trained as a sociologist, particularly in quantitative methods and urban sociology. The statistics and machine learning training is useful on projects like dedupe and usaddress; the urban sociology training is useful for projects like Chicago’s Million Dollar Blocks and Where to Buy.

Derek Eder, Partner
Derek is an entrepreneur, technologist, organizer, and one of the leaders of the civic tech community in Chicago. He is founder and partner at DataMade and the lead organizer for Chi Hack Night, Chicago’s premier weekly event for building, sharing, and learning about civic tech. Derek has been building websites in Chicago since 2006 and building up the Chicago civic tech community since 2011. Over the years, Derek has built and collaborated on dozens of civic projects, including Chicago Lobbyists, ClearStreets, 2nd City Zoning, Look at Cook, Large Lots, MyReps, and Is There Sewage in the Chicago River.

Hannah Cushman, Lead Developer
Hannah is a wayward journalist turned software developer. She cut her teeth on public life in mid-Missouri, covering municipal economic development (ask her about enterprise zones) and elections. An alumna of the Missouri School of Journalism and a veteran of the Associated Press, Hannah remains deeply interested in how information is consumed, shared, and acted upon. At DataMade, she loves projects that derive meaning from data, both narratively, like Justice Divided, and practically, like Dedupe.io. She’s also devoted to documenting common patterns and commenting clever functions. In her spare time, Hannah enjoys pondering particle physics, eating just about anything, and simply existing in the Lincoln Square apartment she shares with her fiancé, plants, and three cats.