MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis

Instance of: Dataset
Contact Point Datahub/mlsa#N1d932e1d6e5b4a21b939e58e473da115
Description Sentence-layer annotation represents the most coarse-grained annotation in this corpus. We adhere to the definitions of objectivity and subjectivity introduced in (Wiebe et al., 2005). Additionally, we followed guidelines drawn from (Balahur & Steinberger, 2009). Their clarifications proved to be quite effective, raising inter-annotator agreement in a sentence-layer polarity annotation task from about 50% to over 80%. All sentences were annotated in two dimensions. The first dimension covers the factual nature of the sentence, i.e. whether it provides objective information or is intended to express an opinion, belief or subjective argument. A sentence is therefore either objective or subjective. The second dimension covers the semantic orientation of the sentence, i.e. its polarity. A sentence is thus either positive, negative or neutral.
In the second layer, we model the contextually interpreted sentiments on the levels of words and NP/PP phrases. That is, the annotation decisions are based on the meaning of the words in the context of the sentence.
Word sentiment markers: The sentiments on the level of individual words are expressed by single-character markers added at the end of the words. A word might be positive (+), negative (-), neutral (empty), a shifter (~), an intensifier (^), or a diminisher (%). If a word ends with a hyphen (e.g., "auf beziehungs-_ bzw. partnerschaftliche Probleme-"), an underscore is added to the word in order to prevent misinterpretation of the hyphen as a negative marker. Currently, only words that are part of an NP/PP are marked with sentiment markers. Annotated words are nouns, adjectives, negation particles, prepositions and adverbs. The word-level annotation was done by 3 persons individually. The individual results were harmonized into a single reference annotation.
Phrase level markers: Each phrase is marked up textually by brackets, e.g. "[auf beziehungs-_ bzw. partnerschaftliche Probleme-]". The type of a phrase (NP/PP) is not written to the brackets. We largely follow the annotation model of TIGER for structuring embedded NPs and PPs. Currently, the following limitations with regard to TIGER exist: (1) Adjectival phrases are not marked up. (2) Relative or infinitival clauses are not included in NPs/PPs if they appear at the end of a phrase or if they are discontiguous. We do not annotate only the phrases that immediately contain words marked up as polar. Any dependent subphrase (NP/PP) is integrated into all its dominating NPs/PPs, e.g. "[Die tieferen Ursachen [der Faszination+]]". Dependent subphrases without any polar words are also included; however, there is no internal bracketing for them, e.g. "[hohe+ Ansprüche an Qualität und Lage]". At the level of phrases, we distinguish the following markers: positive (+), negative (-), neutral (0), bipolar (#). The category 'bipolar' is used mainly for coordinations where negative and positive sentiments of something are kept in balance by the writer. This is quite common for many binomial constructions such as "Krieg und Frieden".
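The word markers and bracket notation described above can be read with a few lines of Python. The following is only a minimal sketch, not a tool shipped with the corpus: the function names parse_word and parse_phrases are hypothetical, tokens are assumed to be whitespace-separated, and the protective underscore is assumed to be the last character of a hyphen-final word. Phrase-level labels (+, -, 0, #) are not interpreted, because the description does not state where they are written relative to the brackets.

```python
import re

# Word-level suffix markers as listed in the description above.
WORD_MARKERS = {
    "+": "positive",
    "-": "negative",
    "~": "shifter",
    "^": "intensifier",
    "%": "diminisher",
}


def parse_word(token):
    """Split one token into (word, label). Hypothetical helper."""
    # Assumption: a trailing underscore protects a literal word-final
    # hyphen, so the hyphen is kept and the word counts as neutral.
    if len(token) > 1 and token.endswith("_"):
        return token[:-1], "neutral"
    if len(token) > 1 and token[-1] in WORD_MARKERS:
        return token[:-1], WORD_MARKERS[token[-1]]
    # No marker at all means neutral.
    return token, "neutral"


def parse_phrases(text):
    """Turn bracketed text into nested lists of (word, label) pairs."""
    root = []
    stack = [root]
    # Brackets become their own tokens; everything else splits on whitespace.
    for tok in re.findall(r"\[|\]|[^\s\[\]]+", text):
        if tok == "[":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            stack[-1].append(parse_word(tok))
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return root


if __name__ == "__main__":
    print(parse_phrases("[Die tieferen Ursachen [der Faszination+]]"))
    print(parse_phrases("[auf beziehungs-_ bzw. partnerschaftliche Probleme-]"))
```

Each bracketed phrase becomes a Python list, so embedded NPs/PPs appear as lists nested inside the list of their dominating phrase.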
Distribution SPARQL endpoint
Example sentence resource
RDF dump
Diagram of the MLSA linked data model
Identifier 043ad4fd-c827-427d-bf89-277ab7a53cea
Issued 2012-09-21T12:29:17.507831 Date Time
Keyword linguistics
Landing Page
Modified 2015-03-18T14:23:26.614665 Date Time
Title MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis