AbjadAuthorID: Shared Task on Authorship Identification in Languages that use the Arabic Script (Multiclass Classification)
Hosted with AbjadNLP 2026 at EACL 2026, Rabat, Morocco
1. Overview of the Shared Task
This shared task, AbjadAuthorID, poses a multiclass classification challenge: identifying the author of a given text excerpt drawn from literature across diverse historical periods.
The task targets languages written in the Arabic script, including:
- Modern Standard Arabic (MSA)
- Urdu
- Persian
- Kurdish
Authorship identification is a fundamental problem in Natural Language Processing (NLP) and computational linguistics, with applications ranging from digital humanities and literary analysis to forensic linguistics. This task seeks to advance the state of the art in authorship attribution for low-resource, morphologically rich languages that share the Abjad writing system.
2. Motivation
The motivation for this shared task stems from the need to develop robust authorship identification models for languages beyond English and those using the Latin script. Languages like Arabic, Urdu, Persian, and Kurdish share a common script but exhibit distinct linguistic features. The task therefore emphasizes:
- Script Commonality: Exploring how the shared Abjad script influences authorship attribution features.
- Linguistic Diversity: Addressing the challenges of different language families (Semitic, Indo-Iranian) using the same script.
- Multiclass Classification: Moving beyond binary authorship verification to the more challenging problem of distinguishing between many potential authors.
3. Data
The dataset for this shared task consists of text excerpts from a diverse set of authors in the target languages.
- Languages: MSA, Urdu, Persian, Kurdish.
- Domain: Literature (Novels, Short Stories, Poems, etc.).
- Periods: Texts from multiple historical eras.
Detailed statistics and download links will be provided upon registration.
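The exact file format will be specified with the data release. Purely as an illustration, the sketch below assumes a hypothetical tab-separated layout with text, author, and language columns; adjust it once the official format is announced.

```python
import csv

def load_split(path):
    """Load a TSV split into (text, author, language) tuples.

    The 'text', 'author', and 'language' column names are hypothetical
    placeholders; rename them to match the official data release.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return [(row["text"], row["author"], row["language"])
                for row in csv.DictReader(f, delimiter="\t")]

# Hypothetical usage once the data are released:
# train = load_split("abjadauthorid_train.tsv")
```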
4. Task Description
Participants are invited to submit systems for the following task:
Authorship Identification (Multiclass Classification)
- Goal: Given a text excerpt, predict the correct author from a closed set of candidate authors.
- Input: A text segment (e.g., a paragraph or page).
- Output: The ID or name of the predicted author.
- Evaluation Metrics: Macro-F1 score (primary) and accuracy (secondary); a scoring sketch is given below.
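To make the setup concrete, here is a minimal end-to-end sketch: a character n-gram TF-IDF representation with logistic regression, a common authorship-attribution baseline (not the official one), trained on toy stand-in excerpts and scored with the task metrics via scikit-learn. All texts and author labels here are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for real excerpts; the actual texts come from the task data.
train_texts = ["...excerpt one...", "...excerpt two...", "...excerpt three..."]
train_authors = ["A1", "A2", "A1"]
test_texts = ["...held-out excerpt..."]
test_authors = ["A1"]

# Character n-grams are script-agnostic, which suits a multilingual
# Arabic-script corpus; this choice is an assumption of the sketch,
# not a task requirement.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_authors)
pred = model.predict(test_texts)

print("macro-F1:", f1_score(test_authors, pred, average="macro"))
print("accuracy:", accuracy_score(test_authors, pred))
```

Character-level features tend to capture orthographic and morphological habits without language-specific tokenization, which is why they are a frequent starting point for authorship work on morphologically rich languages.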
5. Tentative Timeline
- June 10, 2025: Release of training data
- July 20, 2025: Release of test data
- July 25, 2025: End of evaluation cycle (test submissions close)
- July 30, 2025: Final results released
- August 22, 2025: Shared task papers due date
- August 25, 2025: Notification of acceptance
- September 5, 2025: Camera-ready versions due
- November 5–9, 2025: Main Conference
6. Organizers
- Shadi Abudalfa, King Fahd University of Petroleum & Minerals
- Saad Ezzini, King Fahd University of Petroleum & Minerals
- Ahmed Abdelali, HUMAIN
- Mustafa Jarrar, Hamad Bin Khalifa University / Birzeit University
- Mo El-Haj, VinUniversity / Lancaster University
7. Participation Guidelines
- For details on how to participate, please refer to the Participation Guidelines.
- Comprehensive instructions for preparing and submitting your paper(s) are available in the Paper Submission Guidelines.
References
- Stamatatos, E. “A survey of modern authorship attribution methods.” Journal of the American Society for Information Science and Technology, 60(3), 538–556, 2009.
- Koppel, M., Schler, J., & Argamon, S. “Computational methods in authorship attribution.” Journal of the American Society for Information Science and Technology, 60(1), 9–26, 2009.
Logistics and Support
- Website Hosting: GitHub Pages
- Submission System: Codabench
- Communication Channels:
  - Slack workspace
  - Mailing list
  - GitHub repository
- Regular updates and participant support
Stay Updated
- Official GitHub Repository: [Link to Repo]
- Join our Slack community
Announcements and Updates
- The final phase will begin on July 20, 2025 (UTC-12) and run for 5 days, ending on July 25, 2025 (UTC-12). During this window, each team must submit its predictions on the test set (not the validation set) via the Codabench system; a packaging sketch follows this list. Only submissions made within this window will count as official contributions when you submit your papers via the OpenReview system.
- The main content of your paper must not exceed 4 pages; appendices and references are excluded from this limit and may be of any length.
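As a rough sketch of packaging predictions for upload: the zipped-CSV layout and the id/author column names below are assumptions made for illustration, not the official specification; consult the Codabench task page for the authoritative format.

```python
import csv
import zipfile

def package_submission(predictions,
                       csv_path="predictions.csv",
                       zip_path="submission.zip"):
    """Write (example_id, author_label) pairs to a CSV and zip it for upload.

    Both the column names and the zip-of-a-CSV layout are assumptions;
    the authoritative submission format is defined on the Codabench page.
    """
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "author"])  # hypothetical header
        writer.writerows(predictions)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path)

# Example with made-up IDs and labels:
# package_submission([("0001", "A7"), ("0002", "A2")])
```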
Anti-Harassment Policy
We uphold the ACL Anti-Harassment Policy, and participants in this shared task are encouraged to reach out with any concerns or questions to any of the shared task organizers.