Content-based text mining technique for retrieval of CAD documents

Cited by: 51
Authors
Yu, Wen-der [1]
Hsu, Jia-yang [1]
Affiliations
[1] Chung Hua Univ, Dept Construct Management, Hsinchu 300, Taiwan
Keywords
CAD; Text mining; Information retrieval; Characteristic document; Construction engineering; Construction management; Classification; Information; Project
DOI
10.1016/j.autcon.2012.11.037
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Computer-aided design (CAD) documents serve as an effective communication medium, a legal contract document, and a reusable design case for a construction project. Owing to technological advances in the CAD industry, the volume of CAD documents held in the databases of construction organizations has increased dramatically. Traditional retrieval methods rely on textual naming and indexing schemes that require designers (engineers and architects) to memorize in detail the meta-information used to characterize the drawings. Such approaches easily exceed users' memory capacity and thus lead to low reusability of CAD documents. In this paper, a content-based text mining technique is adopted to extract the textual content of a CAD document into a characteristic document (CD), which can be retrieved by similarity matching using a Vector Space Model (VSM), making automated and expedited retrieval of CAD documents from vast CAD databases possible. A prototype system, the Content-based CAD document Retrieval System (CCRS), is developed to implement the proposed method. In preliminary testing on a database of 2094 Chinese-annotated CAD drawings collected from two real-world construction projects and a public engineering drawing database, the proposed CCRS retrieved all relevant CAD documents with relatively high precision when an appropriate query was specified. Finally, three search strategies are recommended to help users narrow the search scope when a specific target CAD document is sought. It is concluded that the proposed content-based text mining approach offers a promising solution to the difficulties currently encountered in retrieving and reusing the vast number of CAD documents in the construction industry. © 2012 Elsevier B.V. All rights reserved.
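The similarity matching described in the abstract can be illustrated with a minimal Vector Space Model sketch. This is not the CCRS implementation: the whitespace tokenizer, the smoothed TF-IDF weighting, and the names `tokenize`, `tf_idf_vectors`, `cosine`, and `rank` are all assumptions made for illustration, and the sample "characteristic documents" are invented English strings. Chinese annotations such as those in the paper's test set would need a word segmenter in place of `tokenize`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: terms are whitespace-separated; real CAD annotations
    # (especially Chinese ones) would need a proper segmenter.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an illustrative choice, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection carry no weight and are dropped.
    q = {
        t: tf * idf[t]
        for t, tf in Counter(tokenize(query)).items()
        if t in idf
    }
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    # Hypothetical characteristic documents, one per drawing.
    cds = [
        "beam column reinforcement detail",
        "floor plan elevator lobby",
        "column footing reinforcement schedule",
    ]
    for score, idx in rank("elevator lobby", cds):
        print(f"{score:.3f}  {cds[idx]}")
```

Ranking by cosine similarity makes the score independent of document length, so a drawing with many annotations is not favored merely for containing more text.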
Pages: 65-74
Page count: 10