Home
This page contains information about ML-Ask. ML-Ask is a system for Affect Analysis of textual input in Japanese. It is based on the linguistic assumption that the emotional states of a speaker are conveyed by emotional expressions used in emotive utterances. ML-Ask first separates emotive utterances from non-emotive ones, and then searches the emotive utterances for expressions of specific emotion types.
ML-Ask, or eMotive eLement and Expression Analysis system, is a keyword-based, language-dependent system for automatic affect annotation of utterances in Japanese. It uses a two-step procedure:
- Specifying whether a sentence is emotive, and
- Recognizing particular emotion types in utterances classified as emotive.
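The two-step procedure can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the emoteme and expression lists below are toy stand-ins for the real hand-crafted databases.

```python
# Illustrative sketch of ML-Ask's two-step procedure with toy databases
# (the real system uses 907 emotemes and over 2,000 emotive expressions).

# Step 1 resources: emotemes mark an utterance as emotive.
EMOTEMES = ["すごい", "わくわく", "!", "??"]

# Step 2 resources: emotive expressions mapped to emotion types.
EMOTIVE_EXPRESSIONS = {"嬉しい": "joy", "悲しむ": "sadness", "愛情": "fondness"}

def analyze(sentence):
    """Return (is_emotive, detected_emotion_types) for one sentence."""
    # Step 1: is the sentence emotive at all?
    if not any(e in sentence for e in EMOTEMES):
        return (False, [])
    # Step 2: which specific emotion types does it express?
    emotions = [emo for expr, emo in EMOTIVE_EXPRESSIONS.items()
                if expr in sentence]
    return (True, emotions)
```

A sentence without any emoteme is rejected in step 1 and never reaches emotion type recognition, which mirrors the separation of emotive from non-emotive utterances described above.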
ML-Ask is based on the idea of a two-part classification of the realizations of emotions in language:
- Emotive elements, or emotemes, which indicate that a sentence is emotive but do not detail which specific emotions have been expressed. For example, interjections such as "whoa!" or "Oh!" indicate that the speaker (the producer of the utterance) has conveyed some emotion; however, based only on an analysis of those words, it is not possible to estimate precisely what kind of emotion the speaker conveyed.
- Emotive expressions, which are words or phrases that directly describe emotional states, but can be used both to express one's own emotions and to describe an emotion without emotional engagement.
I collected and hand-crafted a database of 907 emotemes, which includes such groups of emotemes as:
- interjections: すごい sugoi (great!)
- mimetic expressions (gitaigo in Japanese): わくわく wakuwaku (heart pounding)
- vulgar language: やがる -yagaru (syntactic morpheme used in verb vulgarization)
- emotive sentence markers: ‘!’, or ‘??’ (sentence markers indicating emotiveness)
A set of features similar to what I define as emotemes has also been applied in other research on discriminating between emotive (emotional/subjective) and non-emotive (neutral/objective) sentences (see, for example, Wiebe et al., 2005; Wilson & Wiebe, 2005; or Aman & Szpakowicz, 2007).
Emotive expressions can be realized by various parts of speech and phrases, such as:
- nouns: 愛情 aijou (love)
- verbs: 悲しむ kanashimu (to feel sad, to grieve)
- adjectives: 嬉しい ureshii (happy)
- phrases: 虫唾が走る mushizu ga hashiru (to give one the creeps [from hate])
As its collection of emotive expressions, ML-Ask uses a database created on the basis of Akira Nakamura's "Emotive Expression Dictionary". The emotive expression database is a collection of over two thousand expressions describing emotional states. It also incorporates an emotion classification reflecting the Japanese language and culture. All expressions are classified as representing a specific emotion type (one or more, if applicable). In particular, the ten emotion types are: 喜 ki/yorokobi (joy, delight), 怒 dō/ikari (anger), 哀 ai/aware (sorrow, sadness, gloom), 怖 fu/kowagari (fear), 恥 chi/haji (shame, shyness, bashfulness), 好 kō/suki (liking, fondness), 厭 en/iya (dislike, detestation), 昂 kō/takaburi (excitement), 安 an/yasuragi (relief), and 驚 kyō/odoroki (surprise, amazement). The distribution of expressions across all emotion classes is shown in the table below.
| Emotion class | Number of expressions |
|---------------|-----------------------|
| dislike | 532 |
| excitement | 269 |
| sadness | 232 |
| joy | 224 |
| anger | 199 |
| fondness | 197 |
| fear | 147 |
| surprise | 129 |
| relief | 106 |
| shame | 65 |
| **Sum** | **2100** |
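Looking up the emotion types of emotive expressions then reduces to a dictionary scan; note that a single expression may carry more than one type. The entries and their type assignments below are illustrative, not taken from the actual database.

```python
# Hypothetical fragment of the emotive expression database.
EXPRESSION_DB = {
    "嬉しい": ["joy"],
    "怖い": ["fear"],
    "懐かしい": ["joy", "fondness"],  # one expression, two emotion types
}

def emotion_types(sentence):
    """Collect every emotion type whose expressions appear in the sentence."""
    found = []
    for expr, types in EXPRESSION_DB.items():
        if expr in sentence:
            for t in types:
                if t not in found:
                    found.append(t)
    return found
```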
ML-Ask also implements the idea of Contextual Valence Shifters (CVS) for Japanese. The idea of CVS, as proposed by Polanyi and Zaenen (2006), assumes two kinds of CVS: negations and intensifiers. Negations are words and phrases like "not", "never" or "not quite", which change the semantic polarity of the evaluative word they refer to. Intensifiers are words like "very", "very much" or "deeply", which intensify the semantic orientation of an evaluative word. ML-Ask incorporates the negation type of CVS, with 108 syntactic negation structures. Examples of CVS negations in Japanese include structures such as:
- あまり~ない amari -nai (not quite-)
- ~とは言えない -to wa ienai (cannot say it is-)
- ~てはいけない -te wa ikenai (cannot+[verb]-)

As for intensifiers, although ML-Ask does not include them as a separate database, most Japanese intensifiers are included in the emoteme database.
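The negation-type CVS check can be sketched with the three structures quoted above. The valence-flip table below is a toy assumption pairing only two classes for illustration; the actual system operates over its ten emotion types.

```python
import re

# The three negation structures quoted above (the full system has 108).
NEGATION_PATTERNS = [
    re.compile("あまり.*ない"),  # amari -nai (not quite-)
    re.compile("とは言えない"),  # -to wa ienai (cannot say it is-)
    re.compile("てはいけない"),  # -te wa ikenai (cannot+[verb]-)
]

# Toy valence-flip table (assumed pairing, for illustration only).
FLIP = {"joy": "dislike", "dislike": "joy"}

def apply_cvs(sentence, emotion):
    """Flip the detected emotion's valence if a negation structure matches."""
    if any(p.search(sentence) for p in NEGATION_PATTERNS):
        return FLIP.get(emotion, emotion)
    return emotion
```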
Finally, ML-Ask implements Russell's two-dimensional model of affect. The model assumes that all emotions can be represented in two dimensions: valence (positive/negative) and activation (activated/deactivated). An example of a negative-activated emotion is "anger"; a positive-deactivated emotion is, e.g., "relief". The emotion classes annotated by the system are also generalized on this model.
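Generalizing emotion classes onto the two dimensions amounts to a lookup table. The quadrant assignments below cover only a few classes; "anger" (negative-activated) and "relief" (positive-deactivated) follow the examples in the text, and the rest are assumed for illustration.

```python
# Partial, illustrative mapping of emotion classes onto Russell's
# two dimensions: (valence, activation).
AFFECT_SPACE = {
    "joy": ("positive", "activated"),
    "anger": ("negative", "activated"),
    "fear": ("negative", "activated"),
    "sadness": ("negative", "deactivated"),
    "relief": ("positive", "deactivated"),
}

def generalize(emotion):
    """Map a specific emotion class to its (valence, activation) quadrant."""
    return AFFECT_SPACE.get(emotion)
```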
2013.08.07 [ML-Ask 4.3.1]
- Fixed a small bug in ML-Ask that caused a crash on startup due to mismatching utf libraries.
2013.05.20 [ML-Ask 4.3, codename: "noregex"]
- Added a few foreach loops, but got rid of most regex.
- Much faster than 4.2 (faster and more furious :-) ).
- Needs much less memory.
- Fixed a bug in 4.2 where, if the same emotive expression appeared in two emotion type databases, the system extracted only one emotion type.
- Beginning with this version, I develop ML-Ask and ML-Ask-simple simultaneously.
2011.10.27 [ML-Ask 4.2.2.1.2a, codename: "simple"]
- Additional version with no emoteme processing.
- Created especially for processing non-conversation-like content, such as blogs, fairy tales, etc.
2011.10.21 [ML-Ask 4.2, codename: "fast and furious"]
- Official release of ML-Ask 4.2.
- Added a new algorithm for fast and precise emoticon detection.
- Optimized all regex (simplified, compressed, added anchors, got rid of irrelevant grouping/brackets).
- Added regex precompilation.
- Where possible, got rid of regex entirely in favor of simpler (faster) operations.
- Got rid of several loops.
- Improved processing speed (up to 10 times compared to 4.0).
- Uses much less memory.
- This version was used to annotate the YACIS corpus.
2011.09.27 [ML-Ask 4.0]
- Official release of ML-Ask 4.0.
- Got rid of many lines of code.
- Improved processing speed (3-6 times).
- Added the RE2 regex engine for faster regex matching.
- Added additional interjection extraction with the MeCab Perl binding.
- Added the basic emoticon database from CAO to detect emoticons.
- Added an improved CVS algorithm.
- Added an improved 2D affect space mapping algorithm.
- Added processing of both whole files and STDIN.
around 2008 Nov [ML-Ask 3.0]
- First attempt to add Russell's 2D affect space (still very clumsy).
around 2008 Sep [ML-Ask 2.0]
- First attempt to support ML-Ask with CVS.
around 2007 Sep [ML-Ask 1.0]
- First version of ML-Ask is created (keyword matching, no CVS, no 2D affect space).
Michal Ptaszynski, Pawel Dybala, Rafal Rzepka and Kenji Araki, “Affecting Corpora: Experiments with Automatic Affect Annotation System - A Case Study of the 2channel Forum -”, In Proceedings of The Conference of the Pacific Association for Computational Linguistics (PACLING-09), September 1-4, 2009, Hokkaido University, Sapporo, Japan, pp. 223-228.
Michal Ptaszynski, Pawel Dybala, Wenhan Shi, Rafal Rzepka and Kenji Araki, “A System for Affect Analysis of Utterances in Japanese Supported with Web Mining”, Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, Vol. 21, No. 2 (April), pp. 30-49 (194-213), 2009.
Download the software from here.