Details
Original language | English |
---|---|
Title of host publication | 2022 Language Resources and Evaluation Conference, LREC 2022 |
Editors | Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis |
Pages | 4316-4323 |
Number of pages | 8 |
ISBN (electronic) | 9791095546726 |
Publication status | Published - 2022 |
Event | 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 - Marseille, France. Duration: 20 Jun 2022 → 25 Jun 2022 |
Abstract
The disambiguation of causative-passive homonymy (CPH) is potentially tricky for machines, as the causative and the passive are not distinguished by the sentences' syntactic structure. By transforming CPH disambiguation into a challenging natural language inference (NLI) task, we present the first Chinese Adversarial NLI challenge set (CANLI). We show that the pretrained transformer model RoBERTa, fine-tuned on an existing large-scale Chinese NLI benchmark dataset, performs poorly on CANLI. We also employ Word Sense Disambiguation as a probing task to investigate to what extent the CPH feature is captured in the model's internal representation. We find that the model's performance on CANLI does not correspond to its internal representation of CPH, the linguistic ability central to the CANLI dataset. CANLI is available on Hugging Face Datasets (Lhoest et al., 2021) at https://huggingface.co/datasets/sxu/CANLI.
Keywords
- adversarial dataset, causative-passive homonymy, Chinese, natural language inference
ASJC Scopus subject areas
- Arts and Humanities (all): Language and Linguistics
- Social Sciences (all): Library and Information Sciences
- Social Sciences (all): Linguistics and Language
- Social Sciences (all): Education
Cite this
2022 Language Resources and Evaluation Conference, LREC 2022. ed. / Nicoletta Calzolari; Frederic Bechet; Philippe Blache; Khalid Choukri; Christopher Cieri; Thierry Declerck; Sara Goggi; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Helene Mazo; Jan Odijk; Stelios Piperidis. 2022. p. 4316-4323.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - The Chinese Causative-Passive Homonymy Disambiguation
T2 - 13th International Conference on Language Resources and Evaluation Conference, LREC 2022
AU - Xu, Shanshan
AU - Markert, Katja
N1 - Funding Information: This work was supported by the German Federal Ministry of Education and Research (BMBF) under grant agreement No. 01IS19063A. We would like to thank Yinjun Wang for his diligent proofreading of the dataset, and our native speakers Yang Li, Tingxian Wu, Qinghua Chen, Jiaying Ma and Yong Xu for their great efforts. We also thank the three anonymous reviewers for their insightful comments.
PY - 2022
Y1 - 2022
KW - adversarial dataset
KW - causative-passive homonymy
KW - Chinese
KW - natural language inference
UR - http://www.scopus.com/inward/record.url?scp=85143735257&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85143735257
SP - 4316
EP - 4323
BT - 2022 Language Resources and Evaluation Conference, LREC 2022
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Odijk, Jan
A2 - Piperidis, Stelios
Y2 - 20 June 2022 through 25 June 2022
ER -