Missing attribute values are a recurrent problem in data mining and machine learning. Although many techniques exist for handling this problem, most of them are too simplistic to provide a good estimate of absent attribute values. A very active research area focuses on solving the missing attribute value problem via imputation methods, which replace missing data with substituted values. This paper proposes a new imputation method that uses a special graph, named the Complete p-Partite Attribute-based Decision Graph (CpP-AbDG), to estimate the missing values in a consistent and plausible way. The graph is built by dividing the range of each attribute that describes the data into sub-intervals; these sub-intervals are taken as the vertices of the graph. Edges are then established between pairs of vertices, provided they are not related to the same attribute. Finally, the edges and vertices are assigned weights based on the class distributions. The resulting CpP-AbDG has been shown to be a suitable and informative data structure for finding the proper interval in which a missing attribute value should lie, taking into account all the attributes that describe the data. Results comparing the proposed approach with classical ones, in a computational environment that uses classification problems as the evaluation criterion, show the potential of the method.
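The construction outlined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the sub-intervals are chosen or how the weights are computed, so the sketch assumes equal-width sub-intervals, per-class co-occurrence counts as weights, and a simple rule that picks the interval with the largest summed edge weight to the observed attributes. All function names (`interval_index`, `build_graph`, `impute_interval`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def interval_index(value, lo, hi, bins):
    """Map a value to an equal-width sub-interval index in [0, bins)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return max(0, min(idx, bins - 1))  # clamp values at or beyond the range ends


def build_graph(rows, labels, bins=3):
    """Collect per-class counts for vertices and edges; None marks a missing value.

    A vertex is a pair (attribute index, interval index). Edges only join
    vertices of different attributes, which makes the graph p-partite.
    Edges that never co-occur are implicit, with weight 0 (assumption).
    """
    p = len(rows[0])
    ranges = []
    for a in range(p):
        observed = [r[a] for r in rows if r[a] is not None]
        ranges.append((min(observed), max(observed)))
    vertex_w = Counter()  # key: (vertex, class label)
    edge_w = Counter()    # key: (frozenset of two vertices, class label)
    for row, label in zip(rows, labels):
        verts = [(a, interval_index(v, *ranges[a], bins))
                 for a, v in enumerate(row) if v is not None]
        for v in verts:
            vertex_w[(v, label)] += 1
        for u, v in combinations(verts, 2):  # u and v belong to different attributes
            edge_w[(frozenset((u, v)), label)] += 1
    return ranges, vertex_w, edge_w


def impute_interval(row, label, attr, ranges, edge_w, bins=3):
    """Pick the interval of `attr` best connected to the row's observed attributes."""
    observed = [(a, interval_index(v, *ranges[a], bins))
                for a, v in enumerate(row) if v is not None and a != attr]

    def score(k):
        return sum(edge_w[(frozenset(((attr, k), u)), label)] for u in observed)

    best = max(range(bins), key=score)
    lo, hi = ranges[attr]
    width = (hi - lo) / bins
    return best, (lo + best * width, lo + (best + 1) * width)


# Toy data: two attributes, three classes (made up for illustration only).
rows = [(1.0, 10.0), (1.2, 11.0), (5.0, 50.0), (5.5, 52.0), (9.0, 90.0), (9.5, 95.0)]
labels = ["a", "a", "b", "b", "c", "c"]
ranges, vertex_w, edge_w = build_graph(rows, labels)

# Estimate the interval of attribute 0 for an instance of class "b".
best, bounds = impute_interval((None, 51.0), "b", 0, ranges, edge_w)
print(best, bounds)
```

The sketch returns an interval rather than a single value, mirroring the abstract's statement that the graph is used to find the interval in which the missing value should lie; choosing a representative value inside that interval is left open here.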
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.