RWCP morphological analysis experiment (forward MEMM with Gaussian prior)

Machine

indigo (Athlon XP 2500+, 1.5 GB memory, Windows XP Professional SP2)

Features

RWCP standard settings
Features occurring fewer than 3 times in the corpus are discarded
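
As an illustration of the frequency cutoff above, here is a minimal sketch. It assumes each training instance is a list of feature strings and that frequency is counted per occurrence over the whole corpus; the function name and the toy feature names are hypothetical and are not taken from the actual tool.

```python
from collections import Counter

def prune_rare_features(instances, min_count=3):
    """Drop features that occur fewer than min_count times in the corpus."""
    # Count every occurrence of every feature across all instances.
    counts = Counter(f for feats in instances for f in feats)
    kept = {f for f, c in counts.items() if c >= min_count}
    # Keep only the surviving features, preserving their order.
    pruned = [[f for f in feats if f in kept] for feats in instances]
    return pruned, kept

# Toy data (hypothetical feature names).
instances = [
    ["w=a", "pos=N"],
    ["w=a", "pos=N"],
    ["w=a", "pos=V"],
    ["w=b", "pos=N"],
]
pruned, kept = prune_rare_features(instances)
```

With the threshold at 3, only "w=a" and "pos=N" survive in this toy corpus.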

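Unknown word handling

Full expansion of unknown words, 5 characters
All words occurring exactly once are treated as pseudo-unknown words during training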
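
The pseudo-unknown-word step could look roughly like the sketch below. It assumes sentences are lists of surface forms and that a singleton is simply swapped for a placeholder token; the real system may instead route such words through its unknown-word model, so the token and function name are hypothetical.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical placeholder, not from the original tool

def mark_pseudo_unknown(sentences):
    """Replace every word that occurs exactly once in the corpus with UNK."""
    counts = Counter(w for sent in sentences for w in sent)
    singletons = {w for w, c in counts.items() if c == 1}
    replaced = [[UNK if w in singletons else w for w in sent]
                for sent in sentences]
    return replaced, singletons

# Toy corpus: "b" and "c" each occur once.
sentences = [["a", "b", "a"], ["c", "a"]]
replaced, singletons = mark_pseudo_unknown(sentences)

# Share of singleton words, using the counts reported in these notes.
singleton_ratio = 8928 / 18023
```

Because about half of the vocabulary consists of singletons, this gives the model a sizeable number of unknown-word training examples.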

Other parameters

Number of features: 122025
Number of unique words: 18023
Number of words occurring only once: 8928 (49.5%)

Learning model

Forward MEMM
Gaussian prior; variance (the \sigma^2 in 2*\sigma^2) set to 0.1 for all features
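
For reference, the usual form of a maximum-entropy objective with a zero-mean Gaussian prior is sketched below. This is the standard textbook formulation and is assumed, not confirmed, to match the implementation used here; the symbols \lambda_j (feature weights) and (x_i, y_i) (training events) are introduced only for this illustration.

```latex
% Penalized log-likelihood with a shared variance \sigma^2 for all features
L(\lambda) = \sum_{i} \log p_{\lambda}(y_i \mid x_i)
           - \sum_{j} \frac{\lambda_j^{2}}{2\sigma^{2}},
\qquad \sigma^{2} = 0.1 \;\Rightarrow\; 2\sigma^{2} = 0.2
```

Under this form, a smaller \sigma^2 pulls the weights more strongly toward zero.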

Model file

rwcp_memm_5_100_3_gaussian_01_3 on indigo

Training

Likelihood: 0.00235676 -> 0.00234674 (relative change: 0.00426649)
Log likelihood: -142017 -> -142117 (relative change: 0.00070365)
# of iterations: 34
Elapsed time: 4474.4
Elapsed time per iteration: 131.6
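
The figures in this log can be cross-checked with a few lines of arithmetic. The sketch below assumes the relative change is the absolute difference divided by the new value; that denominator is a guess that happens to reproduce the printed log-likelihood figure, and the function name is hypothetical.

```python
def relative_change(old, new):
    """Absolute change divided by the magnitude of the new value (assumed)."""
    return abs(new - old) / abs(new)

# Values copied from the training log.
ll_change = relative_change(-142017, -142117)
lik_change = relative_change(0.00235676, 0.00234674)
time_per_iteration = 4474.4 / 34
```

The likelihood figure only matches the log approximately, because the printed likelihoods carry just six significant digits.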

Quantitative evaluation

precision for sentences with correct segmentations: 3473 / 7336 = 0.473419
precision for sentences with correct words: 1726 / 7336 = 0.235278

precision for segmentations: 176174 / 187754 = 0.938324
recall for segmentations: 176174 / 186414 = 0.945069
F-measure for segmentations: 0.941684

precision for POS: 171292 / 187754 = 0.912321
recall for POS: 171292 / 186414 = 0.918879
F-measure for POS: 0.915589

precision for fine POS: 167802 / 187754 = 0.893733
recall for fine POS: 167802 / 186414 = 0.900158
F-measure for fine POS: 0.896934

precision for words: 167077 / 187754 = 0.889872
recall for words: 167077 / 186414 = 0.896269
F-measure for words: 0.893059

recall for unknown words: 4644 / 8748 = 0.530864
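
The precision, recall, and F-measure figures above can be reproduced as follows. This sketch assumes that 187754 is the number of words output by the analyzer, that 186414 is the number of words in the reference, and that the F-measure is the balanced harmonic mean; the function name is hypothetical.

```python
def precision_recall_f(correct, n_system, n_gold):
    """Return (precision, recall, balanced F-measure)."""
    precision = correct / n_system
    recall = correct / n_gold
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts copied from the evaluation output.
seg_p, seg_r, seg_f = precision_recall_f(176174, 187754, 186414)
word_p, word_r, word_f = precision_recall_f(167077, 187754, 186414)
```

The same function applies unchanged to the POS and fine POS rows.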

Discussion

Huh? Did the accuracy go down?