AbjadAuthorID: Shared Task on Authorship Identification in Languages that use the Arabic Script (Multiclass Classification)
Hosted with AbjadNLP 2026 at EACL 2026, Rabat, Morocco
1. Overview of the Shared Task
This shared task, AbjadAuthorID, poses a multiclass classification challenge: identifying the author of a given text excerpt drawn from literature across diverse historical periods.
The task targets languages written in the Arabic script, including:
- Modern Standard Arabic (MSA)
- Urdu
- Persian
- Kurdish
Authorship identification is a fundamental problem in Natural Language Processing (NLP) and computational linguistics, with applications ranging from digital humanities and literary analysis to forensic linguistics. This task seeks to advance the state of the art in authorship attribution for low-resource, morphologically rich languages that share the Abjad writing system.
2. Motivation
The motivation for this shared task stems from the need to develop robust authorship identification models for languages beyond English and those using the Latin script. Languages like Arabic, Urdu, Persian, and Kurdish share a common script but exhibit distinct linguistic features. The task therefore emphasizes:
- Script Commonality: Exploring how the shared Abjad script influences authorship attribution features.
- Linguistic Diversity: Addressing the challenges of different language families (Semitic, Indo-Iranian) using the same script.
- Multiclass Classification: Moving beyond binary authorship verification to the more challenging problem of distinguishing between many potential authors.
3. Data
The dataset for this shared task consists of text excerpts from a diverse set of authors in the target languages.
- Languages: MSA, Urdu, Persian, Kurdish.
- Domain: Literature (Novels, Short Stories, Poems, etc.).
- Periods: Texts from multiple historical eras.
Detailed statistics and download links will be provided upon registration.
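The exact file format will be specified with the data release. Purely as an illustration, the sketch below assumes a hypothetical tab-separated layout with text, author, and language columns; adjust it once the official format is announced.

```python
import csv

def load_split(path):
    """Load a TSV split into (text, author, language) tuples.

    The 'text', 'author', and 'language' column names are hypothetical
    placeholders; rename them to match the official data release.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return [(row["text"], row["author"], row["language"])
                for row in csv.DictReader(f, delimiter="\t")]

# Hypothetical usage once the data are released:
# train = load_split("abjadauthorid_train.tsv")
```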
4. Task Description
Participants are invited to submit systems for the following task:
Authorship Identification (Multiclass Classification)
- Goal: Given a text excerpt, predict the correct author from a closed set of candidate authors.
- Input: A text segment (e.g., a paragraph or page).
- Output: The ID or name of the predicted author.
- Evaluation Metrics: Macro-F1 score (primary) and accuracy (secondary); a scoring sketch is given below.
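To make the setup concrete, here is a minimal end-to-end sketch: a character n-gram TF-IDF representation with logistic regression, a common authorship-attribution baseline (not the official one), trained on toy stand-in excerpts and scored with the task metrics via scikit-learn. All texts and author labels here are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for real excerpts; the actual texts come from the task data.
train_texts = ["...excerpt one...", "...excerpt two...", "...excerpt three..."]
train_authors = ["A1", "A2", "A1"]
test_texts = ["...held-out excerpt..."]
test_authors = ["A1"]

# Character n-grams are script-agnostic, which suits a multilingual
# Arabic-script corpus; this choice is an assumption of the sketch,
# not a task requirement.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_authors)
pred = model.predict(test_texts)

print("macro-F1:", f1_score(test_authors, pred, average="macro"))
print("accuracy:", accuracy_score(test_authors, pred))
```

Character-level features tend to capture orthographic and morphological habits without language-specific tokenization, which is why they are a frequent starting point for authorship work on morphologically rich languages.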
5. Tentative Timeline
- June 10, 2025: Release of training data
- July 20, 2025: Release of test data
- July 25, 2025: End of evaluation cycle (test submissions close)
- July 30, 2025: Final results released
- August 22, 2025: Shared task papers due date
- August 25, 2025: Notification of acceptance
- September 5, 2025: Camera-ready versions due
- November 5–9, 2025: Main Conference
6. Organizers
- Shadi Abudalfa, King Fahd University of Petroleum & Minerals
- Saad Ezzini, King Fahd University of Petroleum & Minerals
- Ahmed Abdelali, HUMAIN
- Mustafa Jarrar, Hamad Bin Khalifa University / Birzeit University
- Mo El-Haj, VinUniversity / Lancaster University
7. Participation Guidelines
- For details on how to participate, please refer to the Participation Guidelines.
- Comprehensive instructions for preparing and submitting your paper(s) are available in the Paper Submission Guidelines.
References
- Stamatatos, E. “A survey of modern authorship attribution methods.” Journal of the American Society for Information Science and Technology, 60(3), 538–556, 2009.
- Koppel, M., Schler, J., & Argamon, S. “Computational methods in authorship attribution.” Journal of the American Society for Information Science and Technology, 60(1), 9–26, 2009.
Logistics and Support
- Website Hosting: GitHub Pages
- Submission System: Codabench
- Communication Channels:
  - Slack workspace
  - Mailing list
  - GitHub repository
- Regular updates and participant support
Stay Updated
- Official GitHub Repository: [Link to Repo]
- Join our Slack community
Announcements and Updates
- The final phase will begin on July 20, 2025 (UTC-12) and run for 5 days, ending on July 25, 2025 (UTC-12). During this window, each team must submit its predictions on the test set (not the validation set) via the Codabench system; a packaging sketch follows this list. Only submissions made within this window will count as official contributions when you submit your papers via the OpenReview system.
- The main content of your paper must not exceed 4 pages; appendices and references are excluded from this limit and may be of any length.
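As a rough sketch of packaging predictions for upload: the zipped-CSV layout and the id/author column names below are assumptions made for illustration, not the official specification; consult the Codabench task page for the authoritative format.

```python
import csv
import zipfile

def package_submission(predictions,
                       csv_path="predictions.csv",
                       zip_path="submission.zip"):
    """Write (example_id, author_label) pairs to a CSV and zip it for upload.

    Both the column names and the zip-of-a-CSV layout are assumptions;
    the authoritative submission format is defined on the Codabench page.
    """
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "author"])  # hypothetical header
        writer.writerows(predictions)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path)

# Example with made-up IDs and labels:
# package_submission([("0001", "A7"), ("0002", "A2")])
```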
Anti-Harassment Policy
We uphold the ACL Anti-Harassment Policy, and participants in this shared task are encouraged to reach out with any concerns or questions to any of the shared task organizers.