Deriving Term Relations for a Corpus by Graph Theoretical Clusters

Cited by: 14
Authors
Augustson, J. Gary [1]
Minker, Jack [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE | 1970 / Vol. 21 / No. 2
DOI
10.1002/asi.4630210202
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We discuss how alternative methods of automatic term clustering may provide insight into how terms are related within a corpus. The work reported uses a corpus of 2267 documents that contain 3950 index terms. A similarity matrix is developed using the document-term matrix. A threshold level T is applied to the similarity matrix. Entries in the matrix that are greater than or equal to the threshold level are set equal to one, and the remaining entries are set to zero. Three definitions are applied to the corresponding graph of each threshold matrix to develop clusters. These are (1) the connected components of the graph, (2) the maximal complete subgraphs of the graph, and (3) the combined maximal complete subgraphs of the graph as described by Gotlieb and Kumar. Two examples are described that show how insight may be gained into the term relations by varying the threshold levels and the cluster definitions.
Pages: 101-111
Page count: 11
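
The pipeline outlined in the abstract (similarity matrix, threshold T, then clusters read off the resulting graph) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the abstract does not name the similarity measure, so a Jaccard (Tanimoto) coefficient over the sets of documents containing each term is used as a stand-in; the maximal complete subgraphs are enumerated with the basic Bron-Kerbosch procedure, which postdates the paper and is not necessarily the authors' method; and the function names and the five-document toy corpus are hypothetical. The third definition (combined maximal complete subgraphs after Gotlieb and Kumar) is not attempted.

```python
from itertools import combinations


def term_similarity(doc_terms):
    """Return (sorted term list, {(a, b): similarity}) for a list of term sets.

    Assumption: similarity is the Jaccard coefficient of the two terms'
    document sets; the abstract does not specify the measure.
    """
    postings = {}
    for doc_id, terms in enumerate(doc_terms):
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    sim = {}
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        total = len(postings[a] | postings[b])
        sim[(a, b)] = shared / total
    return sorted(postings), sim


def threshold_graph(terms, sim, T):
    """Keep an edge between two terms when their similarity is >= T."""
    adj = {t: set() for t in terms}
    for (a, b), s in sim.items():
        if s >= T:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Cluster definition (1): connected components, by depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)
    return components


def maximal_cliques(adj):
    """Cluster definition (2): maximal complete subgraphs.

    Basic Bron-Kerbosch without pivoting (a swapped-in technique).
    Isolated terms are reported as single-term cliques.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy corpus: each document is a set of index terms.
docs = [
    {"graph", "cluster", "term"},
    {"graph", "cluster"},
    {"term", "index"},
    {"index", "corpus"},
    {"graph", "cluster", "term"},
]
terms, sim = term_similarity(docs)
for T in (0.25, 0.5, 1.0):
    graph = threshold_graph(terms, sim, T)
    print(T, connected_components(graph), maximal_cliques(graph))
```

Lowering T adds edges, so connected components merge quickly while maximal complete subgraphs stay small and may overlap. Comparing the two outputs across thresholds is the kind of variation the abstract describes.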