
AraSentEval 2026: A Shared Task on Sentiment Analysis and Swapping in Arabic

Hosted with the OSACT7 Workshop at LREC 2026

1. Overview of the Proposed Task

Sentiment analysis remains a cornerstone of Natural Language Processing (NLP), with critical applications ranging from social media monitoring to customer feedback analysis. While significant progress has been made, the Arabic language presents unique challenges due to its rich morphology and extensive dialectal variation. Most user-generated content, the primary source for sentiment data, is written in dialectal Arabic, which is often under-resourced and differs significantly from Modern Standard Arabic (MSA).

To address these challenges and foster innovation in Arabic NLP, we propose the Shared Task on Sentiment Analysis and Swapping in Arabic (AraSentEval). This task is designed to move beyond standard sentiment classification by evaluating both the understanding and generation of sentiment in diverse Arabic contexts. AraSentEval comprises two distinct but complementary subtasks: Arabic Dialect Sentiment Analysis (Subtask 1) and Arabic Sentiment Swap (Subtask 2).

2. Motivation for the Task

The motivation for AraSentEval is twofold. First, there is a persistent need for high-quality benchmarks and models for Arabic dialect sentiment analysis. While previous shared tasks have addressed Arabic sentiment (El-Beltagy et al., 2017; Rosenthal et al., 2017), the dialectal aspect remains a significant hurdle. By providing a new, multi-dialect dataset, we aim to spur the development of models that are more effective on real-world, user-generated data.

Second, the field of Arabic NLP is mature enough to move towards more complex generative tasks. Sentiment swapping, a form of text style transfer, is a challenging NLG problem that has been explored in English (Shen et al., 2017) but remains largely untouched for Arabic, mainly because of the scarcity of high-quality parallel datasets required to train and robustly evaluate such models. Success in this task has direct applications in data augmentation, controlling the tone of conversational agents, and creative content generation. Subtask 2 pushes the community to develop models with more nuanced generative capabilities for Arabic.

By combining a classification task on diverse dialects with a challenging generative task, AraSentEval will provide a comprehensive benchmark for sentiment understanding and manipulation in Arabic, fostering the development of more sophisticated and practically useful models.

3. Data/Resource Collection and Creation

The datasets for both subtasks are ready, having been collected and annotated.

Current Status: The core dataset is ready. We are expanding it with more examples and may add two more dialects, Jordanian and Yemeni, to further increase the task’s diversity and difficulty before its official release.

4. Task Description

Participants may take part in one or both subtasks. Both subtasks will run on the CodaLab platform, which will handle submissions and host the official leaderboards.

Subtask 1: Arabic Dialect Sentiment Analysis

Subtask 2: Arabic Sentiment Swap

5. Pilot Run Details

We conducted an internal pilot run for both subtasks to validate the datasets and evaluation methodology. For Subtask 1, an initial version of the dataset was verified in a recently organized shared task at RANLP 2025. For Subtask 2, the dataset was benchmarked using several state-of-the-art Large Language Models, including AceGPT, JAIS, and Llama-3. These models were evaluated in three settings (zero-shot, few-shot, and fine-tuning), confirming the dataset’s quality and the task’s feasibility. The pilot also highlighted the necessity of combining automatic metrics with human judgment for a fair assessment of NLG quality.
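
The difference between the zero-shot and few-shot settings mentioned for Subtask 2 can be illustrated with a minimal prompt-building sketch. Everything in it is an assumption made for illustration: the function name build_swap_prompt, the instruction wording, and the example sentences are hypothetical and are not the prompts or data used in the pilot.

```python
from typing import Sequence, Tuple


def build_swap_prompt(
    text: str,
    target: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a prompt asking a model to rewrite `text` with the `target` sentiment.

    `examples` holds (source, rewritten) pairs. An empty sequence yields a
    zero-shot prompt; one or more pairs yield a few-shot prompt.
    The instruction wording below is hypothetical, not the official prompt.
    """
    lines = [
        f"Rewrite the Arabic sentence so that its sentiment is {target}, "
        "keeping the topic unchanged."
    ]
    # Each demonstration is shown as an Input/Output pair before the query.
    for source, rewritten in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {rewritten}")
    # The sentence to be rewritten comes last, with the output left open.
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical demonstration pair (positive -> negative), for illustration only.
demo_pairs = [("الخدمة ممتازة", "الخدمة سيئة")]

zero_shot_prompt = build_swap_prompt("الطعام لذيذ", "negative")
few_shot_prompt = build_swap_prompt("الطعام لذيذ", "negative", demo_pairs)
```

The two prompts differ only in the demonstrations placed before the query, so the same test sentence can be scored under both settings. Fine-tuning is not covered by this sketch.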

6. Tentative Timeline

We will adhere to the tentative timeline proposed by the OSACT7 organizers.

Organizers

Participation Guidelines

References