Segmenting Brazilian legislative text using weak supervision and active learning

Authors
Siqueira, Felipe A. [1]
Pressato, Diany [1]
Pereira, Fabiola S. F. [1, 2]
da Silva, Nadia F. F. [1, 3]
Souza, Ellen [1, 4]
Dias, Marcio S. [1, 5]
de Carvalho, Andre C. P. L. F. [1]
Affiliations
[1] Univ Sao Paulo, Inst Math Sci & Computat, Sao Carlos, SP, Brazil
[2] Univ Fed Uberlandia, Uberlandia, MG, Brazil
[3] Univ Fed Goias, Goiania, GO, Brazil
[4] Rural Fed Univ Pernambuco, Serra Talhada, PE, Brazil
[5] Fed Univ Catalao, Catalao, GO, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Text segmentation; Legislative domain; Weak supervision; Active learning; Portuguese data;
DOI
10.1007/s10506-024-09419-5
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Legislative houses all over the world are adopting tools based on artificial intelligence to support their work. These tools can improve the analysis of the text of proposed laws and speed up their preparation and discussion. The performance of artificial intelligence tools on text processing tasks depends heavily on the corpora used, which should ideally be adapted to the specific domain. When dealing with legislative corpora, text segmentation is often necessary because the segments of a bill serve distinct purposes within its overall structure. While rule-based approaches can be effective when the data follows a consistent format, they fail when inconsistencies arise in the formatting of legislative bills. In this study, we extensively investigate the use of weak supervision and active learning to accurately segment over 100,000 Brazilian federal legislative bills using a sequence tagging approach. The experiments demonstrated that both BERT and LSTM models achieved high statistical performance without the limitations of rule-based systems. In segmenting long documents beyond the limited context window of BERT, we find that simple moving windows suffice because the context required for accurate legislative segmentation is mostly local. We also analyzed transfer learning from our monolingual models to French, Italian, German, and English (US) legislative texts. According to our experimental results, our models show non-trivial zero-shot performance and effective out-of-distribution fine-tuning performance, suggesting potential avenues for multilingual legislative segmentation without the need for computationally expensive models. The models, data, and code are publicly available at https://github.com/ulysses-camara/ulysses-segmenter.
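The moving-window idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `moving_windows` and `tag_long_document`, the default window size and stride, and the policy of keeping the first prediction for overlapping tokens are all assumptions made for illustration, and `tag_window` stands in for any sequence tagger that returns one label per token.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def moving_windows(
    tokens: Sequence[str], window_size: int = 512, stride: int = 256
) -> List[Tuple[int, List[str]]]:
    """Split a token sequence into overlapping (start_index, window) pairs."""
    if window_size <= 0 or stride <= 0 or stride > window_size:
        raise ValueError("need 0 < stride <= window_size")
    windows = []
    start = 0
    while True:
        windows.append((start, list(tokens[start:start + window_size])))
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return windows


def tag_long_document(
    tokens: Sequence[str],
    tag_window: Callable[[List[str]], List[int]],
    window_size: int = 512,
    stride: int = 256,
) -> List[Optional[int]]:
    """Tag each window independently and stitch the labels together.

    Assumed policy (hypothetical): for tokens covered by several windows,
    keep the label from the first window that covered them.
    """
    labels: List[Optional[int]] = [None] * len(tokens)
    for start, window in moving_windows(tokens, window_size, stride):
        for offset, label in enumerate(tag_window(window)):
            if labels[start + offset] is None:
                labels[start + offset] = label
    return labels


def toy_tagger(window: List[str]) -> List[int]:
    """Placeholder tagger: label 1 marks a token that starts a segment."""
    return [1 if token.startswith("Art.") else 0 for token in window]
```

Because each window is tagged on its own, this approach only works well when the label of a token depends on nearby tokens, which is the locality property the abstract reports for legislative segmentation.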
Pages: 82