Review and comparison of works on heterogeneous data and semantic analysis in Big Data
DOI: https://doi.org/10.5433/1679-0375.2021v42n1p113
Keywords: Data analysis, Heterogeneous data, Big data, Semantic heterogeneity, Structural heterogeneity
Abstract
In data integration approaches, heterogeneity is one of the main challenges in integrating different data sources, and its solution lies in finding equivalences among them. This work describes the state of the art and the theoretical foundations of the structural and semantic analysis of heterogeneous data and information. It reviews methods and techniques for data integration in Big Data that address data heterogeneity, including techniques that use Semantic Web, Cloud Computing, Data Analysis, Big Data, Data Warehouse, and other technologies. The research was divided into three stages. In the first stage, articles were selected from digital libraries according to their titles and keywords. In the second stage, the papers were filtered again based on their abstracts, and duplicate articles were removed. In the third stage, the introduction and conclusion of each work were analyzed to select the articles included in this systematic review. Throughout the study, the articles were analyzed, compared, and categorized. At the end of each section, the interrelationships among the works and possible directions for future work are presented.
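The three-stage selection described in the abstract can be sketched as a chain of filters, where each stage only sees the papers that survived the previous one. This is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not state the inclusion criteria, so the `Paper` record, the `TERMS` tuple, the case-insensitive term matching in `mentions_terms`, and the DOI-or-title duplicate key are all hypothetical stand-ins for what was, in the study, human screening.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical inclusion terms; the review does not publish its exact criteria.
TERMS = ("heterogeneous", "heterogeneity", "data integration", "semantic")


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one candidate article."""
    title: str
    keywords: Tuple[str, ...]
    abstract: str
    introduction: str
    conclusion: str
    doi: Optional[str] = None


def mentions_terms(text: str, terms: Iterable[str] = TERMS) -> bool:
    """Case-insensitive substring check standing in for a reviewer's judgment."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def stage1_title_keywords(papers: Iterable[Paper]) -> List[Paper]:
    # Stage 1: keep papers whose title or keywords match the topic.
    return [p for p in papers
            if mentions_terms(p.title) or mentions_terms(" ".join(p.keywords))]


def dedup_key(p: Paper) -> str:
    # Assumed duplicate key: the DOI if present, else a normalized title.
    if p.doi:
        return p.doi.lower()
    return " ".join(p.title.lower().split())


def stage2_abstract_dedup(papers: Iterable[Paper]) -> List[Paper]:
    # Stage 2: remove duplicates (first copy wins), then filter on the abstract.
    seen = set()
    unique = []
    for p in papers:
        key = dedup_key(p)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return [p for p in unique if mentions_terms(p.abstract)]


def stage3_intro_conclusion(papers: Iterable[Paper]) -> List[Paper]:
    # Stage 3: keep papers whose introduction or conclusion matches (assumed rule).
    return [p for p in papers
            if mentions_terms(p.introduction) or mentions_terms(p.conclusion)]


def select(papers: Iterable[Paper]) -> List[Paper]:
    """Run the three stages in order, narrowing the candidate set each time."""
    return stage3_intro_conclusion(stage2_abstract_dedup(stage1_title_keywords(papers)))
```

Removing duplicates before the abstract filter avoids screening the same article twice, and running the stages in sequence mirrors the narrowing from titles and keywords to abstracts to introductions and conclusions.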
License
Copyright of articles published in this journal remains with the authors. Because manuscripts are published in an open-access journal, they may be used freely, with proper attribution, in educational and non-commercial applications. The Journal reserves the right to make changes to the original document regarding linguistic norms, spelling, and grammar, in order to ensure adherence to the standard norms of the language and to preserve the credibility of the Journal. It will, however, respect the authors' writing style. When necessary, conceptual changes, corrections, or suggestions will be forwarded to the authors; in such cases, the manuscript will be subject to a new evaluation after revision. Responsibility for the opinions expressed in the manuscripts lies entirely with the authors.
This journal is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.