How To Manage Duplicates In Datasets — IV

Paul Bradley
3 min read · Jun 27, 2023
Line chart showing the increasing time needed to compare increasing numbers of text pairs using different commercially available applications.
Source: data4decisions

Framework Conclusions

Our framework guides decision-making. After considering each element, you should be able to answer the following questions:
• Do we need a process or is a one-off approach acceptable?
• Can we work with off-the-shelf applications, or is an industrial-scale solution needed?
• Are our duplicates complex or simple?
• Can we simply delete our duplicates, or should we…
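To make the "simple versus complex" and "off-the-shelf versus industrial-scale" questions more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the method used to produce the chart above: the function names, the sample records, and the 0.75 similarity threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever matching technique a real project would choose. Simple duplicates can be grouped in a single pass, while complex (near) duplicates need pairwise comparison, and the number of pairs grows roughly with the square of the record count.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def exact_duplicates(records):
    """Simple case: group records whose normalized text is identical.

    This needs only one pass over the data.
    """
    groups = {}
    for rec in records:
        groups.setdefault(normalize(rec), []).append(rec)
    return [group for group in groups.values() if len(group) > 1]


def near_duplicates(records, threshold=0.75):
    """Complex case: score every pair of records for similarity.

    The threshold is an illustrative assumption, not a recommended value.
    """
    matches = []
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            matches.append((a, b, round(score, 2)))
    return matches


def pair_count(n: int) -> int:
    """Number of distinct pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    records = ["Acme Ltd", "acme  ltd", "Acme Limited", "Zenith Corp"]
    print(exact_duplicates(records))
    print(near_duplicates(records))
    for n in (1_000, 10_000, 100_000):
        print(n, "records ->", pair_count(n), "pairs to compare")
```

If the pair counts for your dataset run into the billions, that is a hint that a one-off, off-the-shelf approach may not be enough and that some way of reducing the number of comparisons will be needed.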

Paul Bradley

co-founder at eQAfy | measuring, analyzing & benchmarking digital estates