Project
ProQuest-UMETRICS Linkage
By
Institute for Research on Innovation & Science
Author(s)
Raphael Ku, Natsuko Nicholls, Beth Uberseder and Matt VanEseltine
Link
https://iris.isr.umich.edu/wp-content/uploads/2019/07/2019-Summer-Supplementary-Release.pdf
Published
July 2019
Tool used
dedupe Python library

This supplementary release, issued in summer 2019, presents new results from linking UMETRICS employee transaction records to ProQuest dissertation data, with a focus on dissertation subjects.

Using the Python package dedupe, the 244,023 unique publications were condensed to one row per author, with thesis title and subject information combined in each row, for a final n of 242,316.
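The condensing step can be illustrated with a minimal sketch. It is not the release's actual code: it assumes a matching step (such as dedupe) has already assigned each publication record a cluster ID, and the field names (`author`, `title`, `subjects`), the record IDs, and the helper `condense_by_author` are hypothetical names chosen for illustration. Records that share a cluster ID are merged into one row per author, with their titles and subjects combined.

```python
from collections import defaultdict


def condense_by_author(records, cluster_ids):
    """Merge records that share a cluster ID into one row per author.

    records: dict of record_id -> {"author", "title", "subjects"} (hypothetical schema)
    cluster_ids: dict of record_id -> cluster ID, assumed to come from a
                 prior matching step; no dedupe API is called here.
    """
    grouped = defaultdict(list)
    for record_id, cluster_id in cluster_ids.items():
        grouped[cluster_id].append(records[record_id])

    rows = []
    for cluster_id in sorted(grouped):
        members = grouped[cluster_id]
        titles, subjects = [], []
        for member in members:
            # Keep each title and subject once, in first-seen order.
            if member["title"] not in titles:
                titles.append(member["title"])
            for subject in member["subjects"]:
                if subject not in subjects:
                    subjects.append(subject)
        rows.append({
            "author": members[0]["author"],  # first spelling seen in the cluster
            "titles": titles,
            "subjects": subjects,
        })
    return rows


# Made-up example data: records 1 and 2 are assumed to be the same author.
records = {
    1: {"author": "Doe, Jane", "title": "Thesis A", "subjects": ["Physics"]},
    2: {"author": "Doe, Jane M.", "title": "Thesis A", "subjects": ["Optics"]},
    3: {"author": "Roe, Sam", "title": "Thesis B", "subjects": ["Biology"]},
}
cluster_ids = {1: 0, 2: 0, 3: 1}

rows = condense_by_author(records, cluster_ids)
```

In this toy input, three publication records condense to two author rows, which is the same kind of reduction as the release's 244,023 publications condensing to 242,316 rows.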