Abstract

Data scientists face multiple issues when working with real-life data. Logs are rarely free of incorrect values, and missing values are one of the most common categories of data problems. Gaps in logs come in various shapes, sizes, and quantities, and a plethora of techniques exists to infill, or restore, the missing values. No single algorithm performs best in all scenarios, so exploring various options is necessary in pursuit of the best results. Furthermore, when gaps exist in multiple attributes, certain methods cannot fill them in a single step. This paper explores an automated iterative approach in which a selection of common algorithms and different input combinations are evaluated on existing data, and the best method is selected based on its R² score. With the ability to perform iterative infilling, where previously imputed data is reused as training data to patch other gaps, this represents the most automated and universal approach to gap filling in real-life data series. This paper presents the methodologies and issues behind the automated iterative approach to gap filling, and discusses what is necessary to achieve the final goal of high-quality, one-click, and optimal data infilling.
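
The iterative idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the only candidate "algorithm" assumed here is a single-input least-squares line, candidates are ranked by a hold-out R² computed on rows where both columns are present, and every name (`fit_line`, `ranked_models`, `iterative_infill`, `min_r2`, `max_passes`) is hypothetical. Each gap is filled by the best-ranked candidate whose input is available on that row, and imputed values are then reused as training data in later fits and passes.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant input: fall back to predicting the mean
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def r2_score(ys, preds):
    """Coefficient of determination; 0.0 when the targets are constant."""
    my = mean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def ranked_models(data, target, min_r2=0.0):
    """Score every other column as the sole input; best hold-out R² first."""
    models = []
    for col in data:
        if col == target:
            continue
        rows = [(x, y) for x, y in zip(data[col], data[target])
                if x is not None and y is not None]
        if len(rows) < 4:  # too little overlap to fit and validate
            continue
        train, test = rows[0::2], rows[1::2]  # deterministic interleaved split
        a, b = fit_line(*zip(*train))
        score = r2_score([y for _, y in test], [a * x + b for x, _ in test])
        if score >= min_r2:
            a, b = fit_line(*zip(*rows))  # refit on all overlapping rows
            models.append((score, col, a, b))
    return sorted(models, key=lambda m: m[0], reverse=True)


def iterative_infill(data, min_r2=0.0, max_passes=10):
    """Fill None entries column by column until a pass fills nothing new."""
    data = {name: list(values) for name, values in data.items()}
    for _ in range(max_passes):
        filled = 0
        for target in data:
            if None not in data[target]:
                continue
            models = ranked_models(data, target, min_r2)
            for i, y in enumerate(data[target]):
                if y is not None:
                    continue
                # Use the best-ranked model whose input exists on this row.
                for _, col, a, b in models:
                    if data[col][i] is not None:
                        data[target][i] = a * data[col][i] + b
                        filled += 1
                        break
        if filled == 0:
            break
    return data


# Synthetic, exactly linear example (not data from the paper).
logs = {
    "c": [None, None, None, None, 10, 13, 16, 19, 22, 25],
    "b": [1, 3, None, 7, 9, 11, 13, 15, 17, 19],
    "a": [0, 1, 2, 3, 4, 5, None, None, None, None],
}
filled_logs = iterative_infill(logs)
```

In this synthetic example, row 2 of `c` cannot be filled in the first pass: `b` is missing on that row, and `a` overlaps `c` on too few rows to be scored. Once `b` and `a` have been patched, a second pass can fill it, which is the multi-step situation the abstract describes. A real workflow would compare several algorithms and multi-column input combinations rather than single-input lines.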
