Discriminative Word Alignment over Multiple Word Segmentations

Cited by: 0
Authors
Xi Ning [1]
Dai Xinyu [1]
Huang Shujian [1]
Chen Jiajun [1]
Affiliations
[1] Nanjing University, Department of Computer Science and Technology, State Key Laboratory for Novel Software Technology, Nanjing 210046, Jiangsu, People's Republic of China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China
Keywords
Natural language processing; Word alignment; Multiple word segmentation; Machine translation
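DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract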
Conventional bilingual word alignment for languages such as Chinese is performed on sentence pairs with a single word segmentation, i.e., single-segmentation-based word alignment (SSWA). However, SSWA risks missing the optimal word segmentation granularity or causing data sparseness in word alignment. This paper proposes multiple-segmentation-based word alignment (MSWA), which uses the diverse and complementary knowledge in multiple word segmentations to reduce these risks. Given k word segmentations of a Chinese sentence, a skeleton segmentation is first constructed. The alignment between the skeleton segmentation and the parallel English sentence is modeled log-linearly, incorporating various features defined over the multiple word segmentations. The Viterbi alignment, i.e., the alignment with the highest score, is then mapped back to k word alignments, one for each of the k segmentations. In experiments, MSWA outperformed SSWA on all k segmentations in both alignment quality and translation performance.
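The skeleton and map-back steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the skeleton segmentation is built, so this sketch assumes it is the finest common refinement of the k segmentations (a cut wherever any segmentation has a word boundary). The function names, the example sentence, and the hand-written skeleton links are all hypothetical; the links stand in for the Viterbi output of the log-linear model, which is not implemented here.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> List[Tuple[int, int]]:
    """Return the (start, end) character offsets of each word."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def build_skeleton(segmentations: List[List[str]]) -> List[str]:
    """Assumed construction: cut wherever ANY segmentation has a boundary."""
    text = "".join(segmentations[0])
    if any("".join(seg) != text for seg in segmentations):
        raise ValueError("all segmentations must cover the same sentence")
    cuts = sorted({end for seg in segmentations for _, end in word_spans(seg)})
    skeleton, start = [], 0
    for end in cuts:
        skeleton.append(text[start:end])
        start = end
    return skeleton


def map_back(skeleton: List[str],
             skeleton_links: Set[Tuple[int, int]],
             segmentation: List[str]) -> List[Tuple[int, int]]:
    """Project (skeleton index, English index) links onto one segmentation."""
    seg = word_spans(segmentation)
    owner, j = [], 0
    for _, end in word_spans(skeleton):
        # Each skeleton unit lies inside exactly one word of the segmentation.
        while seg[j][1] < end:
            j += 1
        owner.append(j)
    return sorted({(owner[i], e) for i, e in skeleton_links})


# Two hypothetical segmentations of the same Chinese sentence.
seg_a = ["南京市", "长江", "大桥"]
seg_b = ["南京", "市", "长江大桥"]
skeleton = build_skeleton([seg_a, seg_b])

# English side: 0=Nanjing, 1=Yangtze, 2=River, 3=Bridge.
# Hand-written links standing in for the model's Viterbi alignment.
links = {(0, 0), (2, 1), (2, 2), (3, 3)}
alignment_a = map_back(skeleton, links, seg_a)
alignment_b = map_back(skeleton, links, seg_b)
```

Under this assumption the skeleton is at least as fine-grained as every input segmentation, so each skeleton unit belongs to exactly one word in each segmentation and the projection is unambiguous.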
Pages: 263-270
Number of pages: 8