
AbjadGenEval: Abjad AI Generated Text Detection Shared Task for Languages Using Arabic Script

Hosted with AbjadNLP Workshop within EACL 2026 Conference

1. Overview of the Shared Task

The rapid expansion of user-generated content across social media, digital news platforms, and online communication has created a growing demand for sophisticated Natural Language Processing (NLP) techniques that distinguish human-written from machine-generated text. Recent advances in multilingual Large Language Models (LLMs) have made AI-generated content increasingly difficult to identify, particularly in low- to medium-resource languages.

This shared task, AbjadGenEval, focuses specifically on languages that use the Abjad (Arabic) script, covering Arabic, Urdu, and Persian. While AI detection tools are maturing for English, their performance often degrades significantly for Abjad languages due to complex morphology, script-specific features, and varying degrees of data availability.

We invite participants to develop robust models for the main task, Abjad AI-generated text detection, described in Section 4.

2. Motivation

AI-generated text detection is critical for maintaining information integrity in education, journalism, and social media. This shared task is motivated by the growing ability of LLMs to generate fluent text in Arabic, Urdu, and Persian, combined with the lack of robust detection benchmarks for these languages.

The Abjad domain presents distinct challenges:

Our goal is to inspire researchers to tackle these challenges and enhance detection techniques specifically for the Abjad ecosystem.

3. Data Collection and Creation

The dataset for this task is a curated collection of human-written and machine-generated texts across the three target languages.

4. Task Description: Abjad AI-Generated Text Detection

Participants are required to build models that can classify a given text as either human-written or AI-generated. The task is divided into three language-specific tracks and one combined track:

  1. Arabic Track: Detection on Arabic data only.

  2. Urdu Track: Detection on Urdu data only.

  3. Persian Track: Detection on Persian data only.

  4. Multilingual Abjad Track: A unified model evaluated across all three languages.
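The track structure above can be illustrated with a small scoring sketch. It groups predictions by language and reports one score per track, with the multilingual track covering all three languages. The label names ("human", "ai"), the language codes ("ar", "ur", "fa"), and the use of macro-averaged F1 are assumptions made for illustration only; they are not the official data format or evaluation metric of the shared task.

```python
# Hypothetical label names and language codes, chosen for illustration only.
LABELS = ("human", "ai")
TRACKS = {
    "arabic": {"ar"},
    "urdu": {"ur"},
    "persian": {"fa"},
    "multilingual": {"ar", "ur", "fa"},
}


def f1_for_label(gold, pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (an assumed metric)."""
    return sum(f1_for_label(gold, pred, label) for label in LABELS) / len(LABELS)


def score_by_track(examples):
    """Score each track separately.

    `examples` is an iterable of (language_code, gold_label, predicted_label).
    Tracks with no matching examples are left out of the result.
    """
    examples = list(examples)
    scores = {}
    for track, languages in TRACKS.items():
        rows = [(g, p) for lang, g, p in examples if lang in languages]
        if not rows:
            continue
        gold, pred = zip(*rows)
        scores[track] = macro_f1(gold, pred)
    return scores
```

A participant submitting a single unified model to the multilingual track could pass all of its predictions to `score_by_track` and also inspect the per-language scores to see where the model is weakest.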

5. Tentative Timeline

6. Organizers’ Details

7. Participation Guidelines


Logistics and Support

Stay Updated

Announcements and Updates

Anti-Harassment Policy

We uphold the ACL Anti-Harassment Policy. Participants in this shared task are encouraged to contact any of the shared task organizers with concerns or questions.