Project
The Mossack Fonseca Universe
By
Fusion
Author(s)
Brian Clifton
Link
http://community.globaleditorsnetwork.org/content/mossack-fonseca-universe-0
Published
April 2016
Tool used
dedupe Python library

The data was cleaned and prepared in Python, relying heavily on the pandas library and DataMade’s Dedupe. NetworkX was used to build the network graph and to perform operations on it. Next, the graph was imported into the open-source visualization software Gephi, which computed the layout of the nodes and edges. A static version of the visualization was exported from Gephi, and the x and y coordinates were then translated into latitude/longitude using Python. The data was then imported into Mapbox for styling, interactivity, and hosting. The Mapbox GL JavaScript library was used to create a simple website that could be embedded in an article or act as a standalone visualization.
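
The project's actual conversion code is not shown here, so the following is only a minimal sketch of how layout coordinates might be translated into latitude/longitude. It assumes a simple approach: center the layout on (0, 0) and scale it uniformly so that its larger dimension fits within a small box of degrees. The function name `layout_to_lonlat` and the `max_extent` parameter are hypothetical and chosen for illustration.

```python
def layout_to_lonlat(positions, max_extent=10.0):
    """Map {node_id: (x, y)} layout coordinates to {node_id: (lon, lat)}.

    Assumption (not from the original project): the layout is centered on
    (0, 0) and uniformly scaled so its larger dimension spans
    [-max_extent, max_extent] degrees, which preserves the aspect ratio.
    """
    if not positions:
        return {}

    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]

    # Center of the layout's bounding box.
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2

    # Half of the larger side of the bounding box; fall back to 1.0
    # when all nodes share one position, to avoid dividing by zero.
    half_span = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 or 1.0
    scale = max_extent / half_span

    return {
        node: ((x - cx) * scale, (y - cy) * scale)
        for node, (x, y) in positions.items()
    }


# Hypothetical layout coordinates, as might be exported from a layout tool.
layout = {"a": (0, 0), "b": (200, 100)}
lonlat = layout_to_lonlat(layout)
```

Keeping the extent small and near the equator is a design choice in this sketch: web maps typically use a Mercator-style projection, which stretches shapes at high latitudes, so a layout confined to a few degrees around (0, 0) should look close to the original. The resulting pairs could then be written out as point features in a format a mapping tool accepts, such as GeoJSON.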