Data blending in manufacturing and supply chains

B. Y. Ong; Rong Wen; Allan N. Zhang

doi:10.1109/BigData.2016.7841047


Source

2016 IEEE International Conference on Big Data (Big Data), pp. 3773-3778

Abstract

The Big Data revolution has transformed the business models of many organizations to include big data analytics. Big Data are believed to be the key basis of competition and growth in today's world, where huge amounts of data are created daily. One of the main challenges of Big Data is not the storage of the data but how to blend different varieties and sources of data together and turn them into value. Because supply chains are complex and dynamic, their data are stored in various forms or managed independently. Data from different nodes in a supply chain seldom communicate with each other, so each source follows its own naming convention. Challenges of data blending include the lack of unique identifiers with which to merge the data and the lack of training data or domain knowledge for determining the criteria by which to blend it. In this paper, an automatic filtering and sorting similarity metric, Term Frequency-Inverse Document Frequency (TF-IDF) Ratcliff/Obershelp, is proposed. The method handles the problem of the same entity appearing under different naming conventions and allows word filtering. The experimental results show that the proposed TF-IDF Ratcliff/Obershelp metric improves the performance of data blending.
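The abstract does not spell out how TF-IDF and Ratcliff/Obershelp are combined. What follows is a minimal sketch, assuming one plausible reading of "filtering and sorting." Each entity name is treated as a document, and tokens whose inverse document frequency falls below a threshold are dropped, since very common words such as "ltd" carry little identifying information. The surviving tokens are then sorted so that word order does not matter. Finally, the two normalized strings are scored with Python's `difflib.SequenceMatcher`. Its `ratio()` implements a refinement of Ratcliff/Obershelp gestalt pattern matching: 2M/T, where M is the number of matched characters and T is the combined length of both strings. The helper names (`tokenize`, `idf_table`, `filter_and_sort`, `tfidf_ro_similarity`), the `min_idf` threshold, and the sample company names are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the abstract does not specify how TF-IDF and
# Ratcliff/Obershelp are combined; IDF-threshold filtering plus token
# sorting is an assumed interpretation, not the authors' exact method.
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(name: str) -> list[str]:
    # Lowercase, turn punctuation into spaces, then split on whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return cleaned.split()


def idf_table(names: list[str]) -> dict[str, float]:
    # Each name is one "document"; IDF = log(N / document frequency).
    df = Counter()
    for name in names:
        df.update(set(tokenize(name)))
    return {tok: math.log(len(names) / count) for tok, count in df.items()}


def filter_and_sort(name: str, idf: dict[str, float], min_idf: float) -> str:
    # Drop common (low-IDF) tokens; tokens unseen in the corpus are kept.
    # Sorting makes the result independent of word order.
    kept = [t for t in tokenize(name) if idf.get(t, math.inf) >= min_idf]
    return " ".join(sorted(kept))


def ratcliff_obershelp(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns 2*M/T (Ratcliff/Obershelp style).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def tfidf_ro_similarity(a: str, b: str, idf: dict[str, float], min_idf: float) -> float:
    # Filter and sort both names, then compare the normalized strings.
    return ratcliff_obershelp(filter_and_sort(a, idf, min_idf),
                              filter_and_sort(b, idf, min_idf))


# Hypothetical supplier names drawn from different supply-chain nodes.
names = [
    "Acme Electronics Pte Ltd",
    "ACME ELECTRONICS PTE. LTD.",
    "Electronics Acme Ltd",
    "Globex Logistics Pte Ltd",
    "Initech Components Pte Ltd",
    "Umbrella Components Ltd",
]
idf = idf_table(names)

print(tfidf_ro_similarity("Acme Electronics Pte Ltd", "Electronics Acme Ltd", idf, 0.5))  # 1.0
print(ratcliff_obershelp("Acme Electronics Pte Ltd", "Electronics Acme Ltd") < 1.0)  # True
```

Without filtering and sorting, the raw Ratcliff/Obershelp score for "Acme Electronics Pte Ltd" versus "Electronics Acme Ltd" is below 1.0 because of the reordered and extra words. Once the IDF filter removes "pte" and "ltd" and the remaining tokens are sorted, both names reduce to "acme electronics". The threshold is a tuning choice: if it is set too high, distinguishing words are discarded as well.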