Outlier Detection for Analysis of Real Estate Price

Cited by: 1
Authors
Cetiner, Meltem [1, 2]
Dincsoy, Ozge [1, 3]
Toraman, Taner [1]
Affiliations
[1] Idea Teknol Cozumleri, Maslak Mah Sanatkarlar Sok 5-8, TR-34398 Istanbul, Turkey
[2] Gebze Tekn Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
[3] Bogazici Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Source
2020 28th Signal Processing and Communications Applications Conference (SIU) | 2020
Keywords
Real Estate Price; Outlier Detection; Univariate Models; Multivariate Models
DOI
10.1109/siu49456.2020.9302110
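Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract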
Real estate data, which is used for purposes such as housing or investment, must be preprocessed in its original form to detect outliers caused by spelling errors, systematic errors, or artificial documents created to attract attention. In the literature reviewed, no study analyzing real estate prices uses multivariate outlier detection. In this study, different outlier detection methods are applied and their results are compared on housing prices in Istanbul province, taking the register office information into account. The models were tested on a labeled data set from Istanbul. The best results were obtained with One-Class Support Vector Machines (OCSVM) and Average K-Nearest Neighbor (AveKNN). After the detected outlier documents are eliminated, a general system is created to generate ranges for the price per square meter of real estate. The best results for estimating the price-per-square-meter range were obtained with the Average KNN model. The price-per-square-meter ranges determined in this study are expected to be useful in various areas, such as investment, tax auditing, and the detection of false advertisements on real estate sites.
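The pipeline outlined in the abstract (score listings, drop the detected outliers, then derive a price-per-square-meter range) might be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the Average KNN score is the mean Euclidean distance to the k nearest neighbours, uses two standardized features (area and price per square meter), and removes a fixed number of top-scoring listings. The function names, the `n_outliers` cut, and the sample listings are all hypothetical.

```python
import math
import statistics


def standardize(column):
    """Z-score a list of numbers (assumption: features are scaled before scoring)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # avoid division by zero
    return [(v - mean) / std for v in column]


def avg_knn_scores(points, k):
    """Outlier score per point: mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def price_per_sqm_range(listings, k=2, n_outliers=1):
    """Drop the n_outliers highest-scoring listings and return
    (low, high, outlier_indices) for price per square meter."""
    areas = [area for area, _ in listings]
    ppsqm = [price / area for area, price in listings]
    points = list(zip(standardize(areas), standardize(ppsqm)))
    scores = avg_knn_scores(points, k)
    ranked = sorted(range(len(listings)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n_outliers])
    kept = [ppsqm[i] for i in range(len(listings)) if i not in outliers]
    return min(kept), max(kept), sorted(outliers)


# Hypothetical listings: (area in square meters, total price).
listings = [
    (80, 400_000),
    (90, 459_000),
    (100, 520_000),
    (110, 550_000),
    (95, 9_500_000),  # made-up entry imitating a typing error in the price
]

low, high, outliers = price_per_sqm_range(listings)
print(f"outlier indices: {outliers}, range: {low:.0f} to {high:.0f} per square meter")
```

In this toy data the mistyped listing sits far from every other point in the standardized feature space, so it receives the largest score and is excluded before the range is taken. A real system would need a principled threshold rather than a fixed count.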
Pages: 4