Mita Society for Library and Information Science journal article (Article ID LIS054001)

Author
安形輝
Japanese title
圧縮プログラムを応用した著者推定
English title
Authorship Attribution by Data Compression Program
Issue and pages
No.54, p.1-18
Publication date
2006-03-10
English abstract

Benedetto et al. recently confirmed the validity of a method for measuring similarity using data compression software. Despite its potential, this method has not yet been applied to the field of information science. The present study proposes the use of CIR, a modified method that uses an improved ratio of compression, and describes two experiments on authorship attribution using data from modern Japanese literature. The first experiment compares the results of applying CIR and Benedetto’s method to test collections of modified data (fixed length) using a procedure similar to that described by Matsuura et al. The second experiment is based on original data (variable length).

The first experiment showed an average precision rate of 97.7% for CIR, while Benedetto’s method gave a rate of 90.5%. The CIR method proves to be an improvement on the best method described by Matsuura et al. The second experiment confirmed the effectiveness of the CIR method, giving an average precision rate of 95.7%.
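
The compression-based similarity idea referred to in the abstract can be illustrated with a minimal sketch. It follows the general scheme attributed to Benedetto et al. (append an unknown sample to each author's reference text and see how much the compressed size grows), not the CIR measure itself, whose definition is not given in this record. The use of Python's `zlib` and the function names `compressed_size`, `compression_increase`, and `attribute` are assumptions made for illustration only.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression at level 9."""
    return len(zlib.compress(data, 9))


def compression_increase(reference: str, sample: str) -> int:
    """Growth in compressed size when sample is appended to reference.

    A small increase suggests the sample shares many patterns with the
    reference text. This is an illustrative measure, not the paper's CIR.
    """
    ref_bytes = reference.encode("utf-8")
    combined = ref_bytes + sample.encode("utf-8")
    return compressed_size(combined) - compressed_size(ref_bytes)


def attribute(sample: str, references: Mapping[str, str]) -> str:
    """Pick the author whose reference text yields the smallest increase."""
    return min(
        references,
        key=lambda author: compression_increase(references[author], sample),
    )


if __name__ == "__main__":
    # Toy data standing in for real author corpora (hypothetical).
    refs = {
        "author_a": "the quick brown fox jumps over the lazy dog. " * 20,
        "author_b": "colorless green ideas sleep furiously tonight. " * 20,
    }
    print(attribute("the quick brown fox jumps over the lazy dog. ", refs))
```

Note that DEFLATE, the algorithm behind `zlib`, only looks back over a 32 KB window, so in a sketch like this only the tail of a long reference text influences how well the appended sample compresses.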

Full text
Full-text PDF (332K)
Type
Original article