Corpus-Based Methods for Recognizing the Gender of Anthroponyms

Cited by: 1

Authors:

Nazar, Rogelio [1]
Renau, Irene [1]
Acosta, Nicolas [1]
Robledo, Hernan [1]
Soliman, Maha [1]

Affiliations:

[1] Pontificia Univ Catolica Valparaiso, Inst Literatura & Ciencias Lenguaje, Valparaiso, Chile

Source:

NAMES-A JOURNAL OF ONOMASTICS | 2021 / Vol. 69 / Issue 03

Keywords:

anthroponymy; co-occurrence statistics; corpus linguistics; gender recognition; given names; Spanish; NAMES;

DOI:

10.5195/names.2021.2238

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

This paper presents a series of methods for automatically determining the gender of proper names, based on their co-occurrence with words and grammatical features in a large corpus. Although the results obtained were for Spanish given names, the method presented here can be easily replicated and used for names in other languages. Most methods reported in the literature use pre-existing lists of first names that require costly manual processing and tend to become quickly outdated. Instead, we propose using corpora. Doing so offers the possibility of obtaining real and up-to-date name-gender links. To test the effectiveness of our method, we explored various machine-learning methods as well as another method based on simple frequency of co-occurrence. The latter produced the best results: 93% precision and 88% recall on a database of ca. 10,000 mixed names. Our method can be applied to a variety of natural language processing tasks such as information extraction, machine translation, anaphora resolution or large-scale delivery or email correspondence, among others.
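The frequency-of-co-occurrence idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The cue-word lists, the one-token context window, the function names (`count_cues`, `guess_gender`, `precision_recall`) and the toy sentences are all assumptions chosen for illustration. The sketch counts how often a name is immediately preceded by a masculine or feminine cue word, picks the majority, and abstains when there is no evidence. Abstaining is one reason precision and recall can differ.

```python
from collections import Counter

# Hypothetical cue words (an assumption, not taken from the paper).
MASC_CUES = {"el", "don", "señor"}
FEM_CUES = {"la", "doña", "señora"}


def count_cues(sentences, names):
    """Count masculine/feminine cue words that directly precede each name."""
    counts = {name: Counter() for name in names}
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair every token with the token immediately before it.
        for prev, tok in zip(tokens, tokens[1:]):
            if tok in counts:
                if prev in MASC_CUES:
                    counts[tok]["m"] += 1
                elif prev in FEM_CUES:
                    counts[tok]["f"] += 1
    return counts


def guess_gender(counter, min_evidence=1):
    """Return 'm', 'f', or None (abstain) by simple majority of cue counts."""
    m, f = counter["m"], counter["f"]
    if m + f < min_evidence or m == f:
        return None
    return "m" if m > f else "f"


def precision_recall(predictions, gold):
    """Precision over answered names, recall over all names in the gold set."""
    answered = {n: g for n, g in predictions.items() if g is not None}
    correct = sum(1 for n, g in answered.items() if gold[n] == g)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy corpus (invented for illustration only).
sentences = [
    "Ayer don Pedro habló con doña María",
    "La señora María llegó tarde",
    "El señor Pedro y doña María",
    "Vimos a Alex",
]
names = {"pedro", "maría", "alex"}
counts = count_cues(sentences, names)
predictions = {name: guess_gender(c) for name, c in counts.items()}
```

In this toy run, `alex` never appears next to a cue word, so the sketch abstains on it. That lowers recall without affecting precision.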

Pages: 16-27

Page count: 12
