RWCP morphological analysis experiment (forward MEMM with Gaussian prior)
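For reference, a forward (left-to-right) MEMM trained with a Gaussian prior is usually written as below. This is the generic textbook form, not a transcription of this run's code. It is stated over a tag sequence t_1..t_n for an input w; in lattice-based morphological analysis the units would be lattice nodes (word plus POS) instead. The feature functions f_k, weights λ_k and prior variance σ² are generic symbols, and this note does not say which σ was used.

```latex
% Local (per-position) model: each decision conditions on the previous tag only
P(t_i \mid t_{i-1}, \mathbf{w}) =
  \frac{\exp\bigl(\sum_k \lambda_k f_k(t_{i-1}, t_i, \mathbf{w}, i)\bigr)}
       {\sum_{t'} \exp\bigl(\sum_k \lambda_k f_k(t_{i-1}, t', \mathbf{w}, i)\bigr)}

% Training objective: log-likelihood plus a zero-mean Gaussian prior on each weight
\mathcal{L}(\boldsymbol{\lambda}) =
  \sum_{(\mathbf{w}, \mathbf{t})} \sum_{i=1}^{n} \log P(t_i \mid t_{i-1}, \mathbf{w})
  - \sum_k \frac{\lambda_k^2}{2\sigma^2}
```

The prior term penalizes large weights, which matters most for rare features such as those that barely pass the frequency cutoff.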
Machine
indigo (Athlon XP 2500+, 1.5 GB memory, Windows XP Professional SP2)
Features
RWCP standard settings
Features occurring fewer than 3 times in the entire corpus are discarded
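The frequency cutoff can be sketched as a count-then-filter pass. This is a minimal sketch assuming each training event is a list of feature strings and that "frequency" means total occurrences across the corpus; the names `frequent_features` and `prune` and the feature-string format are hypothetical, not taken from the actual tool.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_features(events: Iterable[List[str]], min_count: int = 3) -> Set[str]:
    """Return features that occur at least `min_count` times in the whole corpus."""
    counts: Counter = Counter()
    for feats in events:
        counts.update(feats)  # count every occurrence, not just distinct events
    return {f for f, c in counts.items() if c >= min_count}


def prune(events: List[List[str]], keep: Set[str]) -> List[List[str]]:
    """Drop every feature that did not survive the cutoff."""
    return [[f for f in feats if f in keep] for feats in events]


if __name__ == "__main__":
    # Hypothetical feature strings, for illustration only.
    corpus = [["w=犬", "pos=名詞"], ["w=犬", "pos=名詞"], ["w=犬", "pos=動詞"]]
    keep = frequent_features(corpus)
    print(sorted(keep))
    print(prune(corpus, keep))
```

Counting over the whole corpus before training keeps the cutoff independent of the order in which events are read.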
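Unknown-word handling
All unknown-word candidates up to 5 characters long are expanded
Every word that occurs only once is treated as a pseudo-unknown word during training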
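Both unknown-word settings can be sketched in a few lines. This assumes "expanding all candidates up to 5 characters" means adding every substring of length 1 to 5 that starts at a given position to the lattice, and that pseudo-unknown words are simply the training words with count 1. The function names are hypothetical; the real system may also restrict candidates by character type, which this log does not describe.

```python
from collections import Counter
from typing import List, Set

MAX_UNK_LEN = 5  # the "5 characters" from the setting above


def unknown_word_candidates(sentence: str, start: int,
                            max_len: int = MAX_UNK_LEN) -> List[str]:
    """All substrings starting at `start` with length 1..max_len,
    clipped at the end of the sentence."""
    return [sentence[start:start + n]
            for n in range(1, max_len + 1)
            if start + n <= len(sentence)]


def pseudo_unknown_words(training_words: List[str]) -> Set[str]:
    """Words seen exactly once in training; in this sketch they are the ones
    handed to the unknown-word model instead of being looked up as known."""
    counts = Counter(training_words)
    return {w for w, c in counts.items() if c == 1}


if __name__ == "__main__":
    sent = "abcdefg"
    for i in range(len(sent)):
        print(i, unknown_word_candidates(sent, i))
    print(sorted(pseudo_unknown_words(["a", "b", "a", "c"])))
```

Looping over every start position, as in the usage above, yields at most 5 candidates per position, so the lattice grows linearly with sentence length.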
Other parameters
Number of features: 122025
Number of unique words: 18023
Number of words occurring only once: 8928 (49.5%)
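The 49.5% figure is the share of word types that occur exactly once, which are also the words used as pseudo-unknown words above. A minimal sketch that derives these statistics from a token list follows. Whether a "word" here is a surface form or a surface/POS pair is not stated, so the function just counts whatever keys it receives; `vocabulary_stats` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple


def vocabulary_stats(words: Iterable[str]) -> Tuple[int, int, float]:
    """Return (unique word types, types seen exactly once, hapax ratio)."""
    counts = Counter(words)
    unique = len(counts)
    hapax = sum(1 for c in counts.values() if c == 1)
    return unique, hapax, (hapax / unique if unique else 0.0)


# Re-derive the ratio reported above from its two counts.
print(f"{8928 / 18023:.1%}")  # 49.5%
```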
Model file
rwcp_memm_5_100_3_gaussian_01_3 on indigo
Training
Likelihood: 0.00235676 -> 0.00234674 (relatice change: 0.00426649)
Log likelihood: -142017 -> -142117 (relative change: 0.00070365)
# of iterations: 34
Elapsed time: 4474.4
Elapsed time per iteration: 131.6
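The derived numbers in the training log can be checked directly. The relative-change formula below, |new − old| / |new|, is an assumption: it is the variant that reproduces the logged 0.00070365 for the log-likelihood, while dividing by the old value would not. The log does not state time units.

```python
def relative_change(old: float, new: float) -> float:
    # Assumed definition: |new - old| / |new| (matches the logged value)
    return abs(new - old) / abs(new)


ll_change = relative_change(-142017, -142117)  # log-likelihood before/after
time_per_iter = 4474.4 / 34                    # elapsed time / iterations

print(f"{ll_change:.8f}")     # 0.00070365
print(f"{time_per_iter:.1f}")  # 131.6
```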
Quantitative evaluation
precision for sentences with correct segmentations: 3473 / 7336 = 0.473419
precision for sentences with correct words: 1726 / 7336 = 0.235278
precision for segmentations: 176174 / 187754 = 0.938324
recall for segmentations: 176174 / 186414 = 0.945069
F-measure for segmentations: 0.941684
precision for POS: 171292 / 187754 = 0.912321
recall for POS: 171292 / 186414 = 0.918879
F-measure for POS: 0.915589
precision for fine POS: 167802 / 187754 = 0.893733
recall for fine POS: 167802 / 186414 = 0.900158
F-measure for fine POS: 0.896934
precision for words: 167077 / 187754 = 0.889872
word recall per words: 167077 / 186414 = 0.896269
F-measure for words: 0.893059
recall for unknown words: 4644 / 8748 = 0.530864
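Each F-measure line is the harmonic mean of the precision and recall above it. Precision divides the number of correct items by the system output total (187754), and recall divides by the gold-standard total (186414). A minimal sketch reproducing these numbers; `prf` is a hypothetical helper, not part of the evaluation tool.

```python
from typing import NamedTuple


class PRF(NamedTuple):
    precision: float
    recall: float
    f_measure: float


def prf(correct: int, system_total: int, gold_total: int) -> PRF:
    """Precision, recall and F-measure (harmonic mean of P and R)."""
    p = correct / system_total
    r = correct / gold_total
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return PRF(p, r, f)


if __name__ == "__main__":
    for name, correct in [("segmentations", 176174), ("POS", 171292),
                          ("fine POS", 167802), ("words", 167077)]:
        m = prf(correct, 187754, 186414)
        print(f"{name}: P={m.precision:.6f} R={m.recall:.6f} F={m.f_measure:.6f}")
```

Since both denominators are fixed, F simplifies to 2·correct / (187754 + 186414), so the four F-measures differ only through their correct counts.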
Discussion
Huh? Did the accuracy go down?