Grupo de Sistemas Ingeligentes Marl Ontology

Marl Ontology Experiments

07 February 2011

This version: http://purl.org/marl/experiments/0.1/
Latest version: http://purl.org/marl/experiments/
Editors: Adam Westerski
Authors: Adam Westerski
Contributors: See acknowledgements

Creative Commons License


Abstract

Marl is a standardised data schema (also referred as "ontology" or "vocabulary") designed to annotate and describe subjective opinions expressed on the web or in particular Information Systems. The following document contains results of data mapping experiments where we try to describe various datasets or output of services with Marl ontology. For the description of ontology and instructions how to connect it with descriptions of other resources see Ontology Specification.


Table of Contents

  1. Introduction
    1. Opinions on the Web and the opinion mining process
    2. The Semantic Web
    3. What is Marl for?
  2. Research Datasets
    1. Congressional speech data (Cornell)
    2. Movie Review Data (Cornell)
    3. Customer Review Data (Hu, Liu)
    4. French Newspaper Articles
    5. Multi-Domain Sentiment Dataset
  3. Online Opinion Analysis Services
    1. Swotti
    2. Tweetsentiments
    3. Mombo
    4. Opinion Crawl (Semantic Engines LLC)
    5. Opal module for Drupal
    6. OPfine / Jane16
    7. Evri sentiment API
    8. Opendover
  4. Summary and Comparison

Appendixes

  1. Changelog
  2. Acknowledgements

1 Introduction

The following document gathers the data results of various experiments done with Marl Ontology. It’s goal it to test the coverage of Marl properties for different datasets constructed independently of the Marl project.

The analysis is split into two parts. Each of the sections presents a list of sources and Marl mappings for them, along with some coverage statistics.

Section two describes usage of the ontology to produce mappings for various datasets published by researchers during their opinion mining algorithms study. The second part relates to the same effort but conducted in context of online services for end users that publish opinion mined data.

The choice of sources for datasets is based on state of the art knowledge of authors (in case of research datasets the list was partially created based on resources listed by Pang et al. [ref]).

An important note is that Marl ontology presented here is not a complete model to address the problem of describing and linking opinions online and inside information systems. It marly defines concepts that are not described yet by the means of other ontologies and provides the data attributes that enable to connect opinions with contextual information already defined in metadata created with other ontologies. For detailed instructions and recommendations how to fully model opinions and the results of opinion mining process refer to analysis done by Gi2MO project.

1.1 Opinions on the Web and the opinion mining process

With the birth of Web 2.0 users started to provide their input and create content on mass scape about their subjective opinions related to various topics (e.g. opinions about movies). While this kind of content can be very beneficial for many different uses (e.g. market analysis or predictions) it's accurate analysis and interpretation has not been fully harnessed yet. Information left by the users is often very disorganized and many portals that enable user input leave the user added information unmoderated.

Opinion mining (often referred as sentiment analysis) is one of the attempts bring order to those vast amounts of user generated content. The domain focuses to analyse textual content using special language processing tools and as output provides a quantified judgement of the sentiments contained in the text (e.g. if the text expresses a positive or negative opinion).

Due to the complexity of the problem and attempts to provide efficient and fast tools the area can be devided into three main research directions:

In relation to the World Wide Web, there is a number of common uses of opinion formalisation and analysis. Firstly, it can be applied on top of search engines to find the desired content and next run it through opinion analysis software to obtain desired statistics (e.g. Swotti). Secondly, such algorithms can used within dedicated systems that use the Web to connect to particular communities and gather their opinions on very specific topics (e.g. Internet shops or review websites).

In relation to the dedicated systems (e.g. Enterprise Systems), there the community collaborative models that have proven successful in the open web are often transferred to large enterprise to enhance knowledge exchange and bring the employees together. The same opinion mining techniques can be applied in such cases to extact particular information and use it for internal statistics and to improve knowledge search across the enterprise (e.g. see use of opinion mining in Idea Management [link]).

1.2 The Semantic Web

The Semantic Web is a W3C initiative that aims to introduce rich metadata to the current Web and provide machine readable and processable data as a supplement to human-readable Web.

Semantic Web is a mature domain that has been in research phase for many years and with the increasing amount of commercial interest and emerging products is starting to gain appreciation and popularity as one of the rising trends for the future Internet.

One of the corner stones of the Semantic Web is research on interlinkable and interoperable data schemas for information published online. Those schemas are often refered to as ontologies or vocabularies. In order to facilitate the concept of ontologies that lead to a truly interoperable Web of Data, W3C has proposed a series of technologies such as RDF and OWL. Marl uses those technologies and the research that comes within to propose an ontology for the particular goal of describing opinions and linking them with contextual information (such as opinion topic, features described in the opinion etc.).

1.3 What is Marl for?

The goals of the Marl ontology to achieve as a data schema are:

For more information please refer to Marl usage study done as part of the research in the Gi2MO project.

2. Research Datasets

The goal of this experiment was to see the capabilities of Marl ontology as a universal data schema used with real data extracted with existing algorithms. In comparison to the use case study [link], here we evaluate if our assumptions about the model of opinion are correct when aligned with work of other researchers.

Before reading the below mappings please be advised that Marl is an ontology meant for annotation of opinions not various (detailed) parameters of opinion mining algorithms performance. Therefore, often the very individual algorithm data is not covered.

2.1 Congressional speech data (Cornell)

Description:
Dataset mined from congressional speech text available online. Apart of polarity the data expresses connections between speakers, speeches and debates etc.
Reference paper:
"Get out the vote: Determining support or opposition from Congressional floor-debate transcripts"
Matt Thomas and Bo Pang and Lillian Lee, Proceedings of EMNLP, 2006
References to other speakers
(edges_reference_set_full.v1.1.csv)
Dataset field Mapping Description Example
Agreement marl:Polarity Agreement between two speakers (binary) 1
Debate Number marl:extractedFrom Reference to the text 52
Speaker ID [Linked] dcterms:creator Person that is expressing an opinion 400011
Referenced Speaker ID marl:describesObject Subject of the opinion 1
Raw Score
uncovered:raw_score
Score 0.48831446
Normalized Score marl:polarityValue Score 0.936875742
Edge Strength
uncovered:edge_strength
- 2342
Overall document sentiment
(edges_individual_document.v1.1.csv)
Dataset field Mapping Description Example
Filename marl:extractedFrom Reference to the text 052_400...txt
Raw Score
uncovered:raw_score
Score -0.27982776
Normalized Score marl:polarityValue Score -0.362440238
Strength from source
uncovered:edge_strengthSource
- 4094
Strength to sink
uncovered:edge_strengthSink
- 5906

2.2 Movie Review Data (Cornell)

Description:
Positive/negative movie reviews, positive/negative sentences, subjective/objective sentences sets.
Reference paper:
"A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts"
Proceedings of the ACL, 2004
RottenTomatoes Reviews
(rotten_imdb.tar.gz)
Dataset field Mapping Description Example
Sentence marl:extractedFrom Line of text extracted from the review (reviews provided in separate files) “color , musical bounce and warm seas lapping on island shores....“
Filename
uncovered:isSubjective
Separate files for subjective and objective -
IMDb archive of the rec.arts.movies.reviews group
(review_polarity.tar.gz)
Dataset field Mapping Description Example
Filename marl:extractedFrom Source on the web -
File location/subdirectory marl:Polarity Separate directories for positive and negative -

2.3. Customer Review Data (Hu, Liu)

Description:
Electronics products reviews downloaded from Amazon and Cnet.
Reference paper:
"Mining and summerizing customer reviews"
Hu, M., Liu, B., Proceeding of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004
Get Sample RDF/XML:
ds3_product_reviews.rdf
Amazon/Cnet customer reviews
(CustomerReviewData.zip)
Dataset field Mapping Description Example
Product name marl:describesObject Reviewed product name “Creative Labs Nomad Jukebox Zen Xtra 40GB”
Review title [Linked] dcterms:title TItle of the user review “get this player ! “
Product feature marl:describesObjectPart / marl:describesFeature Element of a device or its characteristic “wma file”
Opinion polarity marl:Polarity Polarity of opinion “+”
Opinion strength marl:polarityValue Rating from 1-3 “2”
Feature presence
uncovered:hasExplicateTopic
If the detected feature/object appears in the mined sentence (player) ”get it , it 's worth every penny”
Suggestion
uncovered:opinionType
If the review is a suggestions or recommendation -
Comparison [cc]
uncovered:isComparisonTo
If there is a comparison to product of different brand -
Comparison [cs]
uncovered:isComparisonTo
If there is a comparison to product of same brand -

2.4 French Newspaper Articles

File:
Eval.zip
Description:
702 sentences published in the Belgian French-language newspaper Le Soir in 1995. These sentences were evaluated by ten judges. Their task was to indicate, on a seven-point scale, up to what point the contents of each sentence evoked an unpleasant, neutral or pleasant idea.
Reference paper:
"Un barometre affectif effectif: Corpus de reference et methode pour determiner la valence affective de phrases"
Bethard, S., Fairon, C., Kerves, L., JADT 2004
Le Soir newspaper sentences
(Eval.zip)
Dataset field Mapping Description Example
Sentence
uncovered:opinionText
Sentence from the paper “Pour ses opérations souvent très meurtrières, l'armée use.....”
Valence mean marl:polarityValue Number for -3 to 3 that describes emotions caused by reading the sample (-3=very unpleasent, 3= pleasent) -2.4
Standard deviation
uncovered:stdDeviation
- 0.7

2.5 Multi-Domain Sentiment Dataset

Description:
Reviews of different product types taken from Amazon.
Reference paper:
"Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification"
Blitzer, J., Dredze, M.m Pereira, F., Proceedings of the Association for Computational Linguistics (ACL), 2007
Amazon customer reviews
(processed_acl.tar.gz)
Dataset field Mapping Description Example
Filename marl:Polarity Filename in this dataset states the polarity positive/negative
Feature marl:describesFeature Feature name “the_failure”
Feature count marl:algorithmConfidence The amount of times a feature was detected in the review 1
Label marl:Polarity In addition to filename every review has attached polarity as well positive/negative

3. Online Opinion Analysis Services

The following use cases aim to show how Marl Ontology could be used in different environments (as in systems) and when applied to to opinions of various complexity and structure.

3.1 Swotti

Description:
Product/Movie etc. opinions taken from the web (via search engines) and processed with opinion mining to create rankings and summerize.
Get Sample RDF/XML:
swotti.rdf
Aggregated opinions
Dataset field Mapping Description Example
Product Name marl:describesObject Name of the product “iPad”
Feature Name marl:describesFeature Generic feature name “Usability”
Polarity marl:Polarity Positive or negative “Positive”
Rating marl:polarityValue Aggregated rating value based on opinions from the web (rating is 1-5) “5/5”
Opinion count
uncovered:opinionsCount
Total number of aggregated opinions (on all features) “21.553”
Positive count
uncovered:positiveOpinionsCount
Amount of positive opinions vs. negative “70% positive”
Individual opinions
Dataset field Mapping Description Example
Tag marl:describesFeature Feature name “Usability”
Date
uncovered:opinionAnalysisDate
Date the opinion was mined “14.04.2010”
Opinion text
uncovered:opinionText
The fragment of text that expresses an opinion “run on Apple's iPad, which features a lightweight, incredibly easy-to-use, touch-screen ....”
Opinion relevance marl:algorithmConfidence Rank of how valuable the opinion is -
Opinion Value marl:polarityValue Polarity rated from 1-5 “5/5”
Polarity marl:Polarity If the opinion is positive or negative Positive
Source URL marl:extractedFrom The URL of the page where the opinion was expressed -

3.2 Tweetsentiments

Sentiments on tweets
(API)
Dataset field Mapping Description Example
Text
uncovered:opinionText
Text fragment given as parameter “hates being in on a friday :[“
Value marl:describesFeature 1, -1 or 0 “1”
Name marl:Polarity Positive, negative or neutral “Positive”
Sentiments on topic/user
(API)
Dataset field Mapping Description Example
Topic / User marl:describesObject Topic or user name (parameter) “iPad”
Sentiment Index marl:polarityValue Normalized overall value (1-100) “60”
Positive
uncovered:positiveOpinionsCount
Total number of positive tweets “42”
Negative
uncovered:negativeOpinionsCount
Total number of negative tweets “6”
Neutral
uncovered:neutralOpinionsCount
Total number of neutral tweets “52”
Results marl:aggregatesOpinion All tweets on topic/user that contained with sentiments [user id, user name, image, date, ...]
Results Text
uncovered:opinionText
Table with tweets text “hates being in on a friday :[“
Results sentiment marl:polarityValue Table with tweets sentiments “-1”

3.3. Mombo

Description:
Similar as Tweet Sentiment but only for movie reviews. Grabs messages from twitter and attaches sentiments to it. The current API does not allow full access to single twitter posts with sentiment values.
Movies with aggregated sentiment rating
(partially available via API)
Dataset field Mapping Description Example
movieName marl:describesObject Name of the movie “Gnomeo and Juliet”
mentions
uncovered:opinionsCount
Total amount of mentions of the movie on Twitter “1”
avgMentionsPerDay
uncovered:opinionsCountPerDay
Average amount of tweets that expressed a sentiment per day “1131.13”
momboMeter marl:polarityValue 1-5 sentiment rating “3.5”
All tweets marl:aggregatesOpinion Listing of all tweets (text, sentiment) connected to the movie -
Positive tweets
uncovered:aggregatesPositiveOpinion
Listing of positive tweets (text, sentiment) connected to the movie -
Negative tweets
uncovered:aggregatesNegativeOpinion
Listing of negative tweets (text, sentiment) connected to the movie -
Tweet sentiment marl:Polarity Positive or negative ( 1= positive, 2= negative) “1”
Text
uncovered:opinionText
Tweet text “Gnomeo & juliet is AMAZING!”
from_user_id / user_profile [Linked] dcterms:creator Person that expressed the opinion “16110567“
to_user_id [Linked] dcterms:creator For whom the opinion was created (in reply to who's post) “16114367“
source marl:extractedFrom Url of the tweet -
confidence_score marl:algorithmConfidence Mambo algorithm confidence score “3”
override_time
uncovered:opinionChangedTo
Information if the sentiment on the topic by the particular user was changed later -
created_at dcterms:created Date when opinion was created “2011-02-14 03:46:47”
reply_to [Linked] sioc:reply_of If the tweet is a reply this will contain the url to the original tweet that triggered the sentiment -

3.4 Opinion Crawl (Semantic Engines LLC)

Description:
Web sentiment snapshot on a person, company or event. Processes a variety of content sources - news sites, blogs, forums, etc.
Sentiments in web documents
(API)
Dataset field Mapping Description Example
name marl:describesObject / marl:describesFeature Object or feature to which sentiments are attached “cookies“
datestamp
uncovered:opinionAnalysisDate
Date and timestamp of the analysis “09/01/2010 11:34:26 PM”
overall marl:Polarity Overall document polarity: {positive, negative, neutral} “Positive”
mentions
uncovered:opinionsCount
Total number of sentiment expressions in the document “1”
positive
uncovered:positiveOpinionsCount
Number of positive sentiment expressions -
negative
uncovered:negativeOpinionsCount
Number of negative sentiment expressions -
neutral
uncovered:neutralOpinionsCount
Number of neutral sentiment expressions -
bulltobear marl:polarityValue Ratio of positive vs negative sentiments -
concepts marl:describesObject / marl:describesFeature Automatically extracted objects, features related to sentiments “cookies”

3.5 Opal module for Drupal

Description:
A module for analysing comments posted in the Drupal CMS. Uses Sentiwordnet to calculate the polarity of comments as well as overall polarity of the commented text.
Sentiments in Drupal comments
Dataset field Mapping Description Example
Commented resource [Linked] sioc:reply_of Object or feature to which sentiments are attached -
Comment reference marl:extractedFrom Reference to the comment URL -
Polarity marl:Polarity Overall document polarity: {positive, negative, neutral} “Neutral”
Result marl:polarityValue Number representing the polarity “0.0182”
Sentiments in Drupal content types
Dataset field Mapping Description Example
Content type reference marl:extractedFrom Content that has comments -
Comment references marl:aggregatesOpinion References to comments (ids/ URLs) -
Positive
uncovered:positiveOpinionsCount
Number of positive comments "0"
Neutral
uncovered:neutralOpinionsCount
Number of neutral comments "1"
Negative
uncovered:negativeOpinionsCount
Number of negative comments "0"
Result marl:polarityValue Number representing the polarity “0.0182”
Polarity marl:Polarity Overall document polarity: {positive, negative, neutral} “Neutral”

3.6 OPfine / Jane16

Description:
Analysis of sentiments about market (e.g. particular companies), implementation based on Jane16 open-source library.
Individual Documents
Dataset field Mapping Description Example
Polarity marl:Polarity Positive/Negative -
Sentiment marl:polarityValue References to comments (ids/ URLs) "3.2"
Title of the article [Linked] dcterms:title Polarity value normalized from -100 to 100 “No comfort for Ireland buy-to-let investors”
Publish time [Linked] dcterms:created Date the webpage was created -
Source marl:extractedFrom URL of the source -
Topic marl:describesObject keyword representing the topic of the setiment “Ireland”

3.7 Evri sentiment API

Description:
Evri is a news aggregation website, out of others is provides a sentiment API for retrieval of opinions connected to particular items in its database.
Sentiment on Evri entities
*based only on API documentation
Dataset field Mapping Description Example
Opinion topic entity marl:describesObject Who the opinions are about “/organization/nato-0x308f6”
Positive
uncovered:positiveOpinionsCount
Percent of positive opinions "28"
Negative
uncovered:negativeOpinionsCount
Percent of negative opinions “72”
Opinion source entity [Linked] dcterms:creator Who expressed the opinion (can be individual or group) “/russia-0x237ce”
Opinion polarity marl:Polarity The API enables to list individual opinions with detailed listing of source and polarity -

3.8 Opendover

Description:
A web service that provides opinion mining. In addition Opendover is available, similarly like OPAL, as a Drupal plugin.
Sentiments detection and polarity rating
(API documentation)
Dataset field Mapping Description Example
docDate
uncovered:opinionAnalysisDate
Sentiment analysis date -
document [Linked] sioc:Post Full document text "28"
attitude
uncovered:opinionType
Type of sentiment: judgement, appreciation or emotional state. “APPRECIATION”
value
uncovered:opinionTrigger
Word(s) describing sentiment) “bad”
orientation marl:Polarity Positive or negative “-1”
offset
uncovered:opinionTriggerTextOffset
Position of the word in text 12
force marl:polarityValue Polarity value -9 to +9 “-4”
length
uncovered:opinionTriggerTextLength
Length of the word 3
object marl:describesObject Reference to described entity -

4 Summary and Comparison

Coverage per dataset/service
(linked properties are counted as coverd)
Dataset/service name # Covered # Uncovered # Total Coverage (percent)
Congressional speech data (Cornell) 7
5
12 58%
Movie Review Data (Cornell) 3
1
4 75%
Customer Review Data (Hu, Liu) 5
4
9 56%
French Newspaper Articles 1
2
3 33%
Multi-Domain Sentiment Dataset 4
0
4 100%
Swotti 9
4
13 69%
Tweetsentiments 6
5
11 55%
Mombo 10
6
16 63%
Opinion Crawl 4
5
9 44%
OPAL 8
3
11 73%
OPfine 6
0
6 100%
Evri 3
2
5 60%
Opendover 4
5
9 44%
Avarage 5
3
8 64%
Property usage in mappings for datasets and services
(average amount a property was used in a data source)

A Changelog

B Acknowledgements

The style formatting of the following document has been inspired on FOAF specification.

Special thanks for support with Marl ontology creation and research to: Prof. Carlos A. Iglesias and members of the GSI Group of DIT department of Universidad Politécnica de Madrid.