Search Results

Contact Point Datahub/mlsa#N1d932e1d6e5b4a21b939e58e473da115
Description Sentence-layer annotation is the most coarse-grained annotation in this corpus. We adhere to the definitions of objectivity and subjectivity introduced in (Wiebe et al., 2005). Additionally, we followed guidelines drawn from (Balahur & Steinberger, 2009). Their clarifications proved quite effective, raising inter-annotator agreement in a sentence-layer polarity annotation task from about 50% to more than 80%. All sentences were annotated in two dimensions. The first dimension covers the factual nature of the sentence, i.e. whether it provides objective information or is intended to express an opinion, belief or subjective argument; a sentence is therefore either objective or subjective. The second dimension covers the semantic orientation of the sentence, i.e. its polarity; it is therefore positive, negative or neutral. In the second layer, we model the contextually interpreted sentiments at the level of words and NP/PP phrases. That is, the annotation decisions are based on the meaning of the words in the context of the sentence. Word sentiment markers: Sentiment at the level of individual words is expressed by single-character markers appended to the words. A word may be positive (+), negative (-), neutral (no marker), a shifter (~), an intensifier (^), or a diminisher (%). If a word itself ends with a hyphen (e.g., "auf beziehungs-_ bzw. partnerschaftliche Probleme-"), an underscore is appended to the word to prevent the hyphen from being misinterpreted as a negative marker. Currently, only words that are part of an NP/PP carry sentiment markers. Annotated words are nouns, adjectives, negation particles, prepositions and adverbs. The word-level annotation was done by three annotators independently, and the individual results were harmonized into a single reference annotation. Phrase level markers: Each phrase is marked up textually by brackets, e.g. "[auf beziehungs-_ bzw. partnerschaftliche Probleme-]".
The type of a phrase (NP/PP) is not recorded in the brackets. We largely follow the annotation model of TIGER for structuring embedded NPs and PPs. Currently, the following limitations with regard to TIGER exist: (1) adjectival phrases are not marked up; (2) relative or infinitival clauses are not included in NPs/PPs if they appear at the end of a phrase or if they are discontiguous. We annotate not only the phrases that immediately contain words marked as polar. Any dependent subphrase (NP/PP) is integrated into all of its dominating NPs/PPs, e.g. "[Die tieferen Ursachen [der Faszination+]]". Dependent subphrases without any polar words are also included, but they receive no internal bracketing, e.g. "[hohe+ Ansprüche an Qualität und Lage]". At the level of phrases, we distinguish the following markers: positive (+), negative (-), neutral (0), bipolar (#). The category 'bipolar' is used mainly for coordinations in which the writer keeps negative and positive sentiments in balance. This is quite common in binomial constructions such as "Krieg und Frieden".
Title MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis
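The word markers and phrase brackets described for MLSA can be read with a few lines of Python. The following is a minimal sketch, not the corpus authors' tooling: the function names `parse_word` and `extract_phrases` are hypothetical, tokens are assumed to be separated by whitespace, a word protected by "-_" is assumed to be neutral, and phrase-level markers are not handled because the description does not say where they are written.

```python
# Word-level markers as listed in the description; an unmarked word is neutral.
WORD_MARKERS = {
    "+": "positive",
    "-": "negative",
    "~": "shifter",
    "^": "intensifier",
    "%": "diminisher",
}


def parse_word(token):
    """Split one annotated token into (word, label)."""
    # A trailing "-_" means the hyphen belongs to the word itself: drop only
    # the protective underscore. Treating such a word as neutral is an
    # assumption; the description does not cover this case explicitly.
    if token.endswith("-_"):
        return token[:-1], "neutral"
    if len(token) > 1 and token[-1] in WORD_MARKERS:
        return token[:-1], WORD_MARKERS[token[-1]]
    return token, "neutral"


def extract_phrases(text):
    """Return all bracketed phrases in closing order, with brackets removed."""
    phrases = []
    open_positions = []  # offsets of brackets that are still open
    for i, ch in enumerate(text):
        if ch == "[":
            open_positions.append(i)
        elif ch == "]":
            if not open_positions:
                raise ValueError("unbalanced ']' at offset %d" % i)
            start = open_positions.pop()
            inner = text[start + 1:i]
            # Strip the brackets of embedded subphrases from the outer phrase.
            phrases.append(inner.replace("[", "").replace("]", ""))
    if open_positions:
        raise ValueError("unclosed '['")
    return phrases


if __name__ == "__main__":
    phrase = "auf beziehungs-_ bzw. partnerschaftliche Probleme-"
    print([parse_word(t) for t in phrase.split()])
    print(extract_phrases("[Die tieferen Ursachen [der Faszination+]]"))
```

Because a phrase is recorded when its closing bracket is reached, an embedded subphrase such as "der Faszination+" is returned before the NP that dominates it.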
Contact Point Datahub/usage-review-corpus#Nec5dae964d474855acc326bd570e8f1d
Description This corpus consists of sentiment annotations of Amazon reviews for different product categories in German and English. The reviews themselves are not part of this data publication. The annotations are fine-grained, covering aspects and subjective phrases. In addition, the corpus records which aspect is the target of which subjective phrase, as well as the polarity of each subjective phrase. The corpus consists of 622 English and 611 German reviews for coffee machines, cutlery, microwaves, toasters, trash cans, vacuum cleaners, washing machines and dishwashers. The English part is annotated with more than 8000 aspects and 5000 subjective phrases, the German part with more than 6000 aspects and around 5000 subjective phrases (depending on the annotator). Each review is independently annotated by two annotators.
Title USAGE review corpus