4th Eval4NLP 2023: Bali, Indonesia
- Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé: Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2023, Bali, Indonesia, November 1, 2023. Association for Computational Linguistics 2023, ISBN 979-8-89176-021-9
- Lukas Weber, Krishnan Jothi Ramalingam, Matthias Beyer, Axel Zimmermann: WRF: Weighted Rouge-F1 Metric for Entity Recognition. 1-11
- Vatsal Raina, Adian Liusie, Mark J. F. Gales: Assessing Distractors in Multiple-Choice Tests. 12-22
- Yixuan Wang, Qingyan Chen, Duygu Ataman: Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages. 23-31
- Zahra Kolagar, Sebastian Steindl, Alessandra Zarcone: EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media. 32-48
- Nitin Ramrakhiyani, Vasudeva Varma, Girish K. Palshikar, Sachin Pawar: Zero-shot Probing of Pretrained Language Models for Geography Knowledge. 49-61
- Yanran Chen, Steffen Eger: Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End. 62-84
- Jeremy Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony, Bonnie J. Dorr: Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models' Interaction with Interaction Log Information. 85-99
- Savita Bhat, Vasudeva Varma: Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content. 100-107
- Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea, Fakhri Karray: Can a Prediction's Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models. 108-116
- Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger: The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics. 117-138
- Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang, Tiejun Zhao: HIT-MI&T Lab's Submission to Eval4NLP 2023 Shared Task. 139-148
- Abhishek Pradhan, Ketan Kumar Todi: Understanding Large Language Model Based Metrics for Text Summarization. 149-155
- Pavan Baswani, Ananya Mukherjee, Manish Shrivastava: LTRC_IIITH's 2023 Submission for Prompting Large Language Models as Explainable Metrics Task. 156-163
- Joonghoon Kim, Sangmin Lee, Seung Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong, Pilsung Kang: Which is better? Exploring Prompting Strategy For LLM-based Metrics. 164-183
- Yuan Lu, Yu-Ting Lin: Characterised LLMs Affect its Evaluation of Summary and Translation. 184-192
- Abbas Akkasi, Kathleen C. Fraser, Majid Komeili: Reference-Free Summarization Evaluation with Large Language Models. 193-201
- Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault, Alejandro Jaimes: Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task. 202-218
- Ghazaleh Mahmoudi: Exploring Prompting Large Language Models as Explainable Metrics. 219-227
- Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko, Steffen Eger: Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation. 228-234