
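Resolving Crossing Ambiguities in Chinese Word Segmentation with a Word-based Bigram Model
Cite this article: CHEN Xiao-he. Resolving Crossing Ambiguities in Chinese Word Segmentation with a Word-based Bigram Model[J]. 金陵法律评论, 2004, 0(6): 109-113
Author: CHEN Xiao-he
Affiliation: School of Chinese Language and Literature, Nanjing Normal University, Jiangsu
Abstract: Resolving crossing ambiguities in word segmentation is of great importance for building large-scale corpora. We used a word-based bigram model to run disambiguation experiments on three-character crossing-ambiguous strings drawn from two corpora of 2 million characters each. Accuracy exceeded 99% in the closed test and 90% in the open test, a clear improvement over the best previously reported results.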

Keywords: Chinese information processing; word-based bigram model; crossing ambiguities in word segmentation
Article ID: 1001-4608(2004)06-0109-05
Revised: 2004-06-09

Using Word-based Bi-gram as a Discriminator for Crossing Ambiguities in Chinese Word Segmentation
CHEN Xiao-he. Using Word-based Bi-gram as a Discriminator for Crossing Ambiguities in Chinese Word Segmentation[J]. Journal of Nanjing Normal University (Social Science Edition), 2004, 0(6): 109-113
Authors:CHEN Xiao-he
Abstract: Resolving crossing ambiguities in word segmentation is very important for Chinese information processing. We use a word-based bigram model to disambiguate three-character crossing-ambiguous strings in two corpora. Precision exceeds 99% in the closed test and 90% in the open test, both markedly higher than the best previously reported results.
Keywords:Chinese information processing  Word-based Bi-gram  crossing ambiguities in Chinese word segmentation
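
The abstract describes choosing between the two possible segmentations of a three-character string ABC (AB/C or A/BC) with a word-based bigram model. The following is a minimal sketch of that idea, not the author's implementation: the function names (`train_bigram`, `log_prob`, `disambiguate`), the add-one smoothing, the sentence-boundary markers, and the toy training corpus are all assumptions made for illustration, since the abstract gives no such details.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count word unigrams and bigrams from pre-segmented sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]  # assumed boundary markers
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(words, unigrams, bigrams, vocab_size):
    """Bigram log-probability of a word sequence (add-one smoothing is an assumption)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def disambiguate(left, abc, right, unigrams, bigrams):
    """Pick AB/C or A/BC for a three-character string, given its word context."""
    vocab_size = len(unigrams)
    candidates = [[abc[:2], abc[2:]], [abc[:1], abc[1:]]]
    return max(
        candidates,
        key=lambda seg: log_prob(left + seg + right, unigrams, bigrams, vocab_size),
    )


# Toy segmented corpus (hypothetical data, far smaller than the corpora in the paper).
corpus = [["他", "从", "小学", "毕业"], ["她", "从小", "学", "钢琴"]]
unigrams, bigrams = train_bigram(corpus)
print(disambiguate(["他"], "从小学", ["毕业"], unigrams, bigrams))
print(disambiguate(["她"], "从小学", ["钢琴"], unigrams, bigrams))
```

In this sketch the same string 从小学 is split differently depending on the neighbouring words, which is the behaviour a word-based (rather than character-based) bigram model is meant to capture. Testing on the training sentences, as here, corresponds to a closed test; an open test would use contexts not seen in training.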