Technology of text mining

Cited by: 0
Author
Visa, A. [1]
Affiliation
[1] Tampere Univ Technol, FIN-33101 Tampere, Finland
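Source
MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION | 2001 / Vol. 2123
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes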
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large amount of information is stored in databases, in intranets, or on the Internet. This information is organised in documents or in text documents. The difference depends on whether pictures, tables, figures, and formulas are included or not. The common problem is to find the desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally it has been considered under the title of information seeking, that is, the science of how to find a book in a library. Traditionally the problem has been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The problem is that nowadays there are lots of unclassified documents in company databases, in intranets, and on the Internet. First, some terms are defined. Text filtering means an information seeking process in which documents are selected from a dynamic text stream. Text mining is a process of analysing text to extract information from it for particular purposes. Text categorisation means the process of clustering similar documents from a large document set. All these terms overlap to a certain degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents for the purpose of extracting interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail and the most common approaches are given. Finally, some examples of current uses of text mining are given and some potential application areas are mentioned.
Pages: 1-11
Page count: 11