Deep Neural Networks for Social Media Word Segmentation of Asian Languages

Authors
Ngoc Tan Le [1]
Sadat, Fatiha [1]
Affiliations
[1] Univ Quebec Montreal, Dept Comp Sci, Montreal, PQ, Canada
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
data mining; information extraction; deep neural network; NLP; word segmentation;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information extraction today faces new challenges from noisy, short, unstructured data. This is especially the case for social media messages, such as tweets, in which the language can be erroneous or cryptic and refers to a great number of new entities. Traditional NLP systems struggle with such data and need new strategies to handle it. With the emergence of neural network-based approaches, research on word segmentation has benefited from large-scale raw texts, which are leveraged to pretrain character and word embeddings. To this end, we experimented with both character and word embeddings to provide extra features to the input layer of our neural network-based system architecture. The system was tested on both Chinese and Japanese social media datasets. With the help of rich pretrained embeddings, our model achieved promising results on both the Chinese and Japanese social media word segmentation tasks compared with state-of-the-art NLP tools.
Pages: 2314 - 2318
Page count: 5
Related papers
50 in total
  • [1] Deep Neural Networks Algorithm for Vietnamese Word Segmentation
    Zheng, Kexiao
    Zheng, Wenkui
    SCIENTIFIC PROGRAMMING, 2022, 2022
  • [2] Text classification in Asian languages without word segmentation
    Peng, Fuchun
    Huang, Xiangji
    Schuurmans, Dale
    Wang, Shaojun
    Proceedings of the 6th International Workshop on Information Retrieval with Asian Languages, IRAL 2003, 2003, : 41 - 48
  • [3] Word Segmentation by Separation Inference for East Asian Languages
    Tong, Yu
    Guo, Jingzhi
    Zhou, Jizhe
    Chen, Ge
    Zheng, Guokai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3924 - 3934
  • [4] Joint Chinese Word Segmentation and Punctuation Prediction Using Deep Recurrent Neural Network for Social Media Data
    Wu, Kui
    Wang, Xuancong
    Zhou, Nina
    Aw, AiTi
    Li, Haizhou
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 41 - 44
  • [5] Separation Inference: A Unified Framework for Word Segmentation in East Asian Languages
    Tong, Yu
    Guo, Jingzhi
    Zhou, Jizhe
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1521 - 1530
  • [6] Optimal Word Segmentation for Neural Machine Translation into Dravidian Languages
    Dhar, Prajit
    Bisazza, Arianna
    van Noord, Gertjan
    WAT 2021: THE 8TH WORKSHOP ON ASIAN TRANSLATION, 2021, : 181 - 190
  • [7] Multimodal Social Media Video Classification with Deep Neural Networks
    Trzcinski, Tomasz
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2018, 2018, 10808
  • [8] Revisiting Tibetan Word Segmentation with Neural Networks
    Duanzhu, Sangjie
    Jiacuo, Cizhen
    Jia, Cairang
    CHINESE LEXICAL SEMANTICS (CLSW 2020), 2021, 12278 : 515 - 524
  • [9] Detecting predatory conversations in social media by deep Convolutional Neural Networks
    Ebrahimi, Mohammadreza
    Suen, Ching Y.
    Ormandjieva, Olga
    DIGITAL INVESTIGATION, 2016, 18 : 33 - 48
  • [10] Neural Networks Incorporating Dictionaries for Chinese Word Segmentation
    Zhang, Qi
    Liu, Xiaoyu
    Fu, Jinlan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5682 - 5689