A New Algorithm to Predict Missing Values in Datasets

Document Type: Computer Article

Authors

Kerman

Abstract

Most datasets used in data mining and machine learning contain missing values. Handling these values, and in particular estimating them, is therefore an important issue in both fields. Among data mining algorithms, C4.5 has been used repeatedly in various applications because of its performance and its ability to work with, and estimate, missing values in datasets. Researchers have proposed various methods for dealing with missing values and estimating them in datasets processed by C4.5; each of these methods increases the accuracy of the resulting decision tree and therefore leads to more effective and efficient decisions. In this paper, to estimate missing values in datasets, we first review previous methods, then present the proposed approach, a displacement properties method, and finally compare its accuracy with that of the deletion and average (mean substitution) methods.
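The abstract names two baselines, deletion and average, without defining them. The sketch below assumes they mean listwise deletion of incomplete records and column-mean imputation. For contrast, it also shows how C4.5, as described by Quinlan, computes information gain for an attribute with unknown values: the gain is computed on the cases whose value is known and then scaled by the fraction F of such cases. The function names (`delete_incomplete`, `mean_impute`, `entropy`, `c45_gain`) and the use of `None` for a missing value are hypothetical conventions chosen for illustration. The paper's proposed displacement properties method is not reproduced here.

```python
import math
from collections import Counter
from typing import Optional

# Illustrative sketch: names and the None-for-missing convention are hypothetical.
Row = list[Optional[float]]


def delete_incomplete(rows: list[Row]) -> list[Row]:
    """Deletion baseline (assumed listwise): drop every record with a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows: list[Row]) -> list[list[float]]:
    """Average baseline (assumed column mean): replace each missing value with
    the mean of the known values in the same column. Expects a non-empty list."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        known = [r[j] for r in rows if r[j] is not None]
        # Assumption: a column with no known values is filled with 0.0.
        means.append(sum(known) / len(known) if known else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def c45_gain(values: list[Optional[str]], labels: list[str]) -> float:
    """Information gain of a categorical attribute with unknown values, as in
    C4.5: gain on the known cases, multiplied by F, the fraction of known cases.
    Only the gain step is shown; gain ratio and the fractional distribution of
    unknown cases during partitioning are omitted."""
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not known:
        return 0.0
    f = len(known) / len(values)
    # Expected entropy after splitting the known cases on the attribute.
    split_entropy = 0.0
    for value in {v for v, _ in known}:
        subset = [y for v, y in known if v == value]
        split_entropy += len(subset) / len(known) * entropy(subset)
    return f * (entropy([y for _, y in known]) - split_entropy)


if __name__ == "__main__":
    data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
    print(delete_incomplete(data))  # [[1.0, 2.0]]
    print(mean_impute(data))        # [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
    print(c45_gain(["a", "a", "b", None], ["y", "y", "n", "n"]))
```

On this small example, deletion keeps only one of three records, while mean substitution keeps all three but pulls the filled values toward each column's centre. This trade-off is why the choice of baseline matters when comparing accuracy.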
