
Mita Society for Library and Information Science journal article (Article ID LIS026067)

Author
海野敏
Japanese title
出現頻度情報に基づく単語重みづけの原理
English title
Some Principles of Weighting Methods Based on Word Frequencies for Automatic Indexing
Issue and pages
No.26, p.67-88
Publication date
1989-03-25
English abstract

The occurrence frequency of words in natural language texts has long been used as an indicator for selecting significant words in automatic indexing. This paper describes some general principles common to term weighting methods that use occurrence frequency measures.

For this purpose, nearly sixty weighting formulas were collected from the literature published over the past thirty years. Their theoretical characteristics were then analyzed and compared with one another. As a result, the formulas were classified into the following five categories.

1) absolute frequency measures

2) two kinds of relative frequency measures

3) word dispersion measures

4) 2-Poisson model proposed by Harter

5) measures based on information theory similar to that proposed by Shannon

Various mathematical relations peculiar to the formulas of each category were found. These relations are well explained by a model consisting of two word sets, one of which is subsumed by the other: the significance of a word depends on the degree to which its occurrences are skewed toward the subsumed word set.
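
The two-set model in the abstract can be illustrated with a minimal sketch. This is not one of the formulas analyzed in the paper. It assumes, purely for illustration, that the subsumed set is a single document, that the subsuming set is the whole collection containing it, and that skew is scored as the ratio of a word's relative frequency in the document to its relative frequency in the collection. The function name `skew_scores` and the toy data are hypothetical.

```python
from collections import Counter


def skew_scores(subset_tokens, superset_tokens):
    """Score each word of the subsumed set by how strongly it is
    concentrated there relative to the subsuming set.

    Assumption (illustrative only): the score is the ratio of relative
    frequencies; values above 1 mean the word is over-represented in
    the subsumed set.
    """
    sub_counts = Counter(subset_tokens)
    sup_counts = Counter(superset_tokens)
    n_sub = len(subset_tokens)
    n_sup = len(superset_tokens)

    scores = {}
    for word, freq in sub_counts.items():
        # The superset contains the subset, so sup_counts[word] >= freq > 0.
        rel_sub = freq / n_sub
        rel_sup = sup_counts[word] / n_sup
        scores[word] = rel_sub / rel_sup
    return scores


# Hypothetical toy data: one document and the collection that contains it.
document = ["index", "index", "the", "term"]
other_documents = ["the", "the", "the", "of", "term", "the"]
collection = document + other_documents

scores = skew_scores(document, collection)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

In this toy data, "index" occurs only in the document and so scores highest, while "the" is spread across the collection and scores below 1.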

Full text
Full-text PDF (2,760K)
Type
Original article