HaHackathon: SemEval-2021

Task 7: HaHackathon: Linking Humor and Offense Across Different Age Groups

Background and Motivation

Humor, like most figurative language, poses interesting linguistic challenges to NLP due to its emphasis on multiple word senses, cultural knowledge, and pragmatic competence. Humor appreciation is also highly subjective: age, gender, and socio-economic status are known to affect how a joke is perceived. In this task, we collected labels and ratings from a balanced set of age groups ranging from 18 to 70. Our annotators also represented a variety of genders, political stances, and income levels.

  • Task 1 emulates previous humor detection tasks in which all ratings were averaged to provide mean classification and rating scores.

    • Task 1a: predict if the text would be considered humorous (for an average user). This is a binary task.
    • Task 1b: if the text is classed as humorous, predict how humorous it is (for an average user). The values vary between 0 and 5.
    • Task 1c: if the text is classed as humorous, predict whether the humor rating would be considered controversial, i.e. whether the variance of the rating between annotators is higher than the median variance across texts. This is a binary task.
  • Task 2 aims to predict how offensive a text would be (for an average user) with values between 0 and 5.

    • Task 2a: predict how generally offensive a text is for users. This score was calculated regardless of whether the text is classed as humorous or offensive overall.
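The averaged targets behind these tasks can be sketched as follows. This is an illustration only, not the organizers' scoring code: the ratings are invented, a text is assumed to count as humorous when more than half of its annotators voted so, and population variance is assumed for the controversy split.

```python
from statistics import mean, median, pvariance

# Hypothetical per-annotator labels (NOT real HaHackathon data).
# is_humor: binary votes; humor: 0-5 ratings from annotators who voted "humorous";
# offense: 0-5 ratings from all annotators.
RATINGS = {
    "t1": {"is_humor": [1, 1, 1, 0], "humor": [3, 4, 3], "offense": [0, 1, 0, 0]},
    "t2": {"is_humor": [1, 1, 0, 1], "humor": [1, 5, 5], "offense": [2, 3, 1, 2]},
    "t3": {"is_humor": [1, 1, 1, 1], "humor": [2, 2, 2, 3], "offense": [0, 0, 0, 0]},
    "t4": {"is_humor": [0, 0, 1, 0], "humor": [2], "offense": [4, 3, 4, 5]},
}


def aggregate(ratings):
    """Sketch of the averaged Task 1 and Task 2 targets."""
    out = {}
    for text_id, r in ratings.items():
        # Task 1a (assumption): humorous if more than half the votes are positive.
        is_humor = int(mean(r["is_humor"]) > 0.5)
        out[text_id] = {
            "is_humor": is_humor,
            # Task 2a: offense is averaged for every text, humorous or not.
            "offense_rating": mean(r["offense"]),
        }
        if is_humor:
            # Task 1b: mean humor rating, only defined for humorous texts.
            out[text_id]["humor_rating"] = mean(r["humor"])
    # Task 1c: controversial if the rating variance exceeds the median variance
    # over all humorous texts (population variance is an assumption here).
    humorous = [t for t in out if out[t]["is_humor"]]
    variances = {t: pvariance(ratings[t]["humor"]) for t in humorous}
    cutoff = median(variances.values())
    for t in humorous:
        out[t]["humor_controversy"] = int(variances[t] > cutoff)
    return out


labels = aggregate(RATINGS)
```

Here t2 is flagged as controversial because its ratings (1, 5, 5) are spread much more widely than those of the other humorous texts, while t4 is not classed as humorous and so receives only an offense score.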


J. A. Meaney, University of Edinburgh, jameaney@ed.ac.uk
Steven Wilson, University of Edinburgh, steven.wilson@ed.ac.uk
Luis Chiruzzo, Universidad de la República, luischir@fing.edu.uy
Walid Magdy, University of Edinburgh, wmagdy@inf.ed.ac.uk

Join our mailing list: hahackathon@googlegroups.com

View the results here: HaHackathon Results

Download the data here: HaHackathon Data

Mazajak: An Online Arabic Sentiment Analyser

Sentiment analysis is one of the most useful natural language processing applications. Although many papers and systems address this task, most of the work focuses on English. We therefore present Mazajak, an online system for Arabic sentiment analysis. The system is based on a deep learning model that achieves state-of-the-art results on many Arabic dialect datasets, including SemEval 2017 and ASTD. It performs three-way sentiment classification, assigning each text to one of three classes: Positive, Negative, or Neutral.

Mazajak provides several features: sentiment analysis of a single sentence or of a file, and a user-level analysis of any Twitter account you submit. An online API is also available.
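The user-level analysis of a Twitter account could, for instance, summarise the account as the share of its tweets in each of the three sentiment classes. The sketch below assumes that summary format; `toy_classify` is a hypothetical keyword stand-in and is neither Mazajak's deep learning model nor its API.

```python
from collections import Counter
from typing import Callable, Iterable

LABELS = ("Positive", "Negative", "Neutral")


def profile_user(tweets: Iterable[str], classify: Callable[[str], str]) -> dict:
    """Share of a user's tweets assigned to each of the three classes."""
    counts = Counter(classify(t) for t in tweets)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def toy_classify(text: str) -> str:
    """Hypothetical keyword stand-in for a real three-way sentiment model."""
    if "رائع" in text:  # "great"
        return "Positive"
    if "سيء" in text:  # "bad"
        return "Negative"
    return "Neutral"


profile = profile_user(["يوم رائع", "طقس سيء", "مرحبا", "فيلم رائع"], toy_classify)
# profile == {"Positive": 0.5, "Negative": 0.25, "Neutral": 0.25}
```

Passing the classifier in as a function keeps the aggregation independent of whichever model produces the per-tweet labels.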

Mazajak was created by Ibrahim Abu Farha and Dr. Walid Magdy at the ILCC, part of the School of Informatics, the University of Edinburgh.

This project was funded by The Alan Turing Institute, UK. The details about the system were published in WANLP-2019, please cite the following paper:

Mazajak: An Online Arabic Sentiment Analyser. Ibrahim Abu Farha and Walid Magdy. In Proceedings of the Fourth Arabic Natural Language Processing Workshop (WANLP). 2019.

You can access the online tool from here.

Urban Dictionary Embeddings

This resource contains a set of English-language word embeddings trained on the entirety of Urban Dictionary (urbandictionary.com) as of October 16, 2019. All terms, definitions, examples, and tags were treated as running text, and embeddings were learned using the fastText framework (fasttext.cc) with a window size of 5, a negative sampling rate of 10, and a word-level dimensionality of 300.
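A minimal sketch of this training setup in Python, assuming the `fasttext` pip package and a hypothetical entry schema with `term`, `definition`, `example`, and `tags` fields. Only the three hyperparameters (window 5, 10 negative samples, 300 dimensions) come from the description; the skipgram objective and the one-entry-per-line layout are assumptions.

```python
# Hyperparameters stated in the description; everything else stays at fastText defaults.
UD_PARAMS = {"dim": 300, "ws": 5, "neg": 10}


def entries_to_text(entries):
    """Flatten (hypothetically structured) Urban Dictionary entries into running text."""
    lines = []
    for e in entries:
        parts = [e["term"], e["definition"], e["example"], " ".join(e["tags"])]
        # One entry per line; empty fields are dropped.
        lines.append(" ".join(p.strip() for p in parts if p.strip()))
    return "\n".join(lines) + "\n"


def train(corpus_path):
    """Train embeddings on a text file produced by entries_to_text."""
    import fasttext  # imported lazily so the helper above works without the package

    # model="skipgram" is an assumption: the description does not say skipgram or CBOW.
    return fasttext.train_unsupervised(corpus_path, model="skipgram", **UD_PARAMS)
```

In use, the output of `entries_to_text` would be written to a file whose path is then passed to `train`.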

These embeddings perform competitively on a range of word-level semantic tasks and were also useful initializations for classifiers trained for sentiment and sarcasm detection. If you work in a domain with a high degree of slang or nonstandard English and want representations that better capture the slang meanings of terms, give them a try!
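To try the embeddings, something like the following loads them and lists a word's nearest neighbours by cosine similarity. It assumes the download uses fastText's standard `.vec` text format (a `count dim` header line, then one word followed by its values per line); the file name in the usage comment is hypothetical.

```python
import math


def load_vec(lines):
    """Parse fastText .vec text: a 'count dim' header, then 'word v1 ... vdim'."""
    it = iter(lines)
    _, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, word, k=5):
    """The k words whose vectors are most cosine-similar to `word`."""
    query = vectors[word]
    scored = [(cosine(query, v), w) for w, v in vectors.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]


# Usage (hypothetical file name):
# with open("ud_embeddings.vec", encoding="utf-8") as f:
#     vecs = load_vec(f)
# print(nearest(vecs, "lit"))
```

For the full 300-dimensional vocabulary, gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` reads the same text format and will be considerably faster and lighter on memory than this pure-Python loader.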

If you use this resource in your work, please cite: Wilson, S. R., Magdy, W., McGillivray, B., Garimella, K., & Tyson, G. Urban Dictionary Embeddings for Slang NLP Applications. In Proceedings of the Language Resources and Evaluation Conference (LREC). Marseille, France, May 2020.

You can download the file from here.

Tutorial: Detection and Characterization of Stance on Social Media

Stance detection is the task of identifying the position that a piece of text, or a user, takes towards a target such as a topic, entity, or claim. A growing body of research in the ICWSM and social computing communities performs and uses stance detection, showing its importance for a variety of applications, including properly analyzing the attitudes of online users.

This tutorial aims to teach participants how to perform and use stance detection. Specifically, we provide a general introduction to the concept of stance and how it differs from sentiment analysis; present recent methodologies for stance detection on social media including supervised, semi-supervised, and unsupervised methods; and introduce various applications of stance detection on social media including how it can be used to support analytical studies. The tutorial concludes with an exploration of open challenges and future directions for stance detection on social media.
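As an illustration of the semi-supervised family of methods, one simple idea is label propagation: start from a few seed users whose stance is known and let unlabelled users adopt the majority stance of the accounts they interact with (for example, by retweeting). The sketch assumes an undirected interaction graph and hypothetical `favor`/`against` labels; it is a generic example, not presented as a specific method from the tutorial.

```python
from collections import Counter


def propagate_stance(edges, seeds, max_iters=10):
    """Spread seed stances over an undirected interaction graph by majority vote."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    labels = dict(seeds)
    for _ in range(max_iters):
        updates = {}
        for user, nbrs in neighbours.items():
            if user in seeds:
                continue  # seed labels stay fixed
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if not votes:
                continue  # no labelled neighbours yet
            (top, count), *rest = votes.most_common()
            if rest and rest[0][1] == count:
                continue  # tie: leave this user's label unchanged for now
            updates[user] = top
        if all(labels.get(u) == s for u, s in updates.items()):
            break  # converged: no label changed in this round
        labels.update(updates)
    return labels


edges = [("alice", "u1"), ("u1", "u2"), ("alice", "u2"), ("bob", "u3"), ("u3", "u4")]
stances = propagate_stance(edges, {"alice": "favor", "bob": "against"})
```

Updates are applied synchronously, once per round, so the result does not depend on the order in which users are visited; u4 only receives a label in the second round, after its neighbour u3 has been labelled.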


Abeer AlDayel, PhD student at the School of Informatics, the University of Edinburgh. Her work focuses on stance detection on social media. Website: https://abeeraldayel.github.io/

Kareem Darwish, Principal Scientist at the Qatar Computing Research Institute, HBKU. Website: http://kareemdarwish.com/

Walid Magdy, Associate Professor at the School of Informatics, the University of Edinburgh, and Faculty Fellow at the Alan Turing Institute. Website: http://homepages.inf.ed.ac.uk/wmagdy/


Other materials