Marl Ontology Experiments

07 February 2011

This version:	http://purl.org/marl/experiments/0.1/
Latest version:	http://purl.org/marl/experiments/
Editors:	Adam Westerski
Authors:	Adam Westerski
Contributors:	See acknowledgements

This work is licensed under a Creative Commons Attribution License. This copyright applies to the Marl Ontology Specification and accompanying documentation in RDF. This ontology uses W3C's RDF technology, an open Web standard that can be freely used by anyone.

Abstract

Marl is a standardised data schema (also referred as "ontology" or "vocabulary") designed to annotate and describe subjective opinions expressed on the web or in particular Information Systems. The following document contains results of data mapping experiments where we try to describe various datasets or output of services with Marl ontology. For the description of ontology and instructions how to connect it with descriptions of other resources see Ontology Specification.

Introduction
Research Datasets
Online Opinion Analysis Services
Summary and Comparison

1 Introduction

The following document gathers the data results of various experiments done with Marl Ontology. It’s goal it to test the coverage of Marl properties for different datasets constructed independently of the Marl project.

The analysis is split into two parts. Each of the sections presents a list of sources and Marl mappings for them, along with some coverage statistics.

Section two describes usage of the ontology to produce mappings for various datasets published by researchers during their opinion mining algorithms study. The second part relates to the same effort but conducted in context of online services for end users that publish opinion mined data.

The choice of sources for datasets is based on state of the art knowledge of authors (in case of research datasets the list was partially created based on resources listed by Pang et al. [ref]).

An important note is that Marl ontology presented here is not a complete model to address the problem of describing and linking opinions online and inside information systems. It marly defines concepts that are not described yet by the means of other ontologies and provides the data attributes that enable to connect opinions with contextual information already defined in metadata created with other ontologies. For detailed instructions and recommendations how to fully model opinions and the results of opinion mining process refer to analysis done by Gi2MO project.

1.1 Opinions on the Web and the opinion mining process

With the birth of Web 2.0 users started to provide their input and create content on mass scape about their subjective opinions related to various topics (e.g. opinions about movies). While this kind of content can be very beneficial for many different uses (e.g. market analysis or predictions) it's accurate analysis and interpretation has not been fully harnessed yet. Information left by the users is often very disorganized and many portals that enable user input leave the user added information unmoderated.

Opinion mining (often referred as sentiment analysis) is one of the attempts bring order to those vast amounts of user generated content. The domain focuses to analyse textual content using special language processing tools and as output provides a quantified judgement of the sentiments contained in the text (e.g. if the text expresses a positive or negative opinion).

Due to the complexity of the problem and attempts to provide efficient and fast tools the area can be devided into three main research directions:

document wide sentiment analysis
sentence sentiment analysis
feature-based sentiment analysis

In relation to the World Wide Web, there is a number of common uses of opinion formalisation and analysis. Firstly, it can be applied on top of search engines to find the desired content and next run it through opinion analysis software to obtain desired statistics (e.g. Swotti). Secondly, such algorithms can used within dedicated systems that use the Web to connect to particular communities and gather their opinions on very specific topics (e.g. Internet shops or review websites).

In relation to the dedicated systems (e.g. Enterprise Systems), there the community collaborative models that have proven successful in the open web are often transferred to large enterprise to enhance knowledge exchange and bring the employees together. The same opinion mining techniques can be applied in such cases to extact particular information and use it for internal statistics and to improve knowledge search across the enterprise (e.g. see use of opinion mining in Idea Management [link]).

1.2 The Semantic Web

The Semantic Web is a W3C initiative that aims to introduce rich metadata to the current Web and provide machine readable and processable data as a supplement to human-readable Web.

Semantic Web is a mature domain that has been in research phase for many years and with the increasing amount of commercial interest and emerging products is starting to gain appreciation and popularity as one of the rising trends for the future Internet.

One of the corner stones of the Semantic Web is research on interlinkable and interoperable data schemas for information published online. Those schemas are often refered to as ontologies or vocabularies. In order to facilitate the concept of ontologies that lead to a truly interoperable Web of Data, W3C has proposed a series of technologies such as RDF and OWL. Marl uses those technologies and the research that comes within to propose an ontology for the particular goal of describing opinions and linking them with contextual information (such as opinion topic, features described in the opinion etc.).

1.3 What is Marl for?

The goals of the Marl ontology to achieve as a data schema are:

enable to publish raw data about opinions and the sentiments expressed in them
deliver schema that will allow to compare opinions coming from different systems (polarity, topics, features)
interconnect opinions by linking them to contextual information expressed with concepts from other popular ontologies or specialised domain ontologies

For more information please refer to Marl usage study done as part of the research in the Gi2MO project.

2. Research Datasets

The goal of this experiment was to see the capabilities of Marl ontology as a universal data schema used with real data extracted with existing algorithms. In comparison to the use case study [link], here we evaluate if our assumptions about the model of opinion are correct when aligned with work of other researchers.

Before reading the below mappings please be advised that Marl is an ontology meant for annotation of opinions not various (detailed) parameters of opinion mining algorithms performance. Therefore, often the very individual algorithm data is not covered.

2.1 Congressional speech data (Cornell)

Source:

http://www.cs.cornell.edu/home/llee/data/convote.html

File:

convote_v1.1.tar.gz

Description:

Dataset mined from congressional speech text available online. Apart of polarity the data expresses connections between speakers, speeches and debates etc.

Reference paper:

"Get out the vote: Determining support or opposition from Congressional floor-debate transcripts"

Matt Thomas and Bo Pang and Lillian Lee, Proceedings of EMNLP, 2006

Get Sample RDF/XML:

ds1_congressionalspeech.rdf, ds1_congressionalspeech_2.rdf

References to other speakers

(edges_reference_set_full.v1.1.csv)

Dataset field	Mapping	Description	Example
Agreement	marl:Polarity	Agreement between two speakers (binary)	1
Debate Number	marl:extractedFrom	Reference to the text	52
Speaker ID	[Linked] dcterms:creator	Person that is expressing an opinion	400011
Referenced Speaker ID	marl:describesObject	Subject of the opinion	1
Raw Score	uncovered:raw_score	Score	0.48831446
Normalized Score	marl:polarityValue	Score	0.936875742
Edge Strength	uncovered:edge_strength	-	2342

Overall document sentiment

(edges_individual_document.v1.1.csv)

Dataset field	Mapping	Description	Example
Filename	marl:extractedFrom	Reference to the text	052_400...txt
Raw Score	uncovered:raw_score	Score	-0.27982776
Normalized Score	marl:polarityValue	Score	-0.362440238
Strength from source	uncovered:edge_strengthSource	-	4094
Strength to sink	uncovered:edge_strengthSink	-	5906

2.2 Movie Review Data (Cornell)

Source:

http://www.cs.cornell.edu/people/pabo/movie-review-data/

File:

dataset v2.0.tar.gz, rt-polaritydata.tar.gz, rotten_imdb.tar.gz

Description:

Positive/negative movie reviews, positive/negative sentences, subjective/objective sentences sets.

Reference paper:

"A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts"

Proceedings of the ACL, 2004

Get Sample RDF/XML:

ds2_movie_reviews.rdf, ds2_movie_reviews_2.rdf

RottenTomatoes Reviews

(rotten_imdb.tar.gz)

Dataset field	Mapping	Description	Example
Sentence	marl:extractedFrom	Line of text extracted from the review (reviews provided in separate files)	“color , musical bounce and warm seas lapping on island shores....“
Filename	uncovered:isSubjective	Separate files for subjective and objective	-

IMDb archive of the rec.arts.movies.reviews group

(review_polarity.tar.gz)

Dataset field	Mapping	Description	Example
Filename	marl:extractedFrom	Source on the web	-
File location/subdirectory	marl:Polarity	Separate directories for positive and negative	-

2.3. Customer Review Data (Hu, Liu)

Source:

http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip

File:

CustomerReviewData.zip, Reviews-9-products.rar

Description:

Electronics products reviews downloaded from Amazon and Cnet.

Reference paper:

"Mining and summerizing customer reviews"

Hu, M., Liu, B., Proceeding of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004

Get Sample RDF/XML:

ds3_product_reviews.rdf

Amazon/Cnet customer reviews

(CustomerReviewData.zip)

Dataset field	Mapping	Description	Example
Product name	marl:describesObject	Reviewed product name	“Creative Labs Nomad Jukebox Zen Xtra 40GB”
Review title	[Linked] dcterms:title	TItle of the user review	“get this player ! “
Product feature	marl:describesObjectPart / marl:describesFeature	Element of a device or its characteristic	“wma file”
Opinion polarity	marl:Polarity	Polarity of opinion	“+”
Opinion strength	marl:polarityValue	Rating from 1-3	“2”
Feature presence	uncovered:hasExplicateTopic	If the detected feature/object appears in the mined sentence	(player) ”get it , it 's worth every penny”
Suggestion	uncovered:opinionType	If the review is a suggestions or recommendation	-
Comparison [cc]	uncovered:isComparisonTo	If there is a comparison to product of different brand	-
Comparison [cs]	uncovered:isComparisonTo	If there is a comparison to product of same brand	-

2.4 French Newspaper Articles

Source:

http://www.psor.ucl.ac.be/personal/yb/Resource.html

File:

Eval.zip

Description:

702 sentences published in the Belgian French-language newspaper Le Soir in 1995. These sentences were evaluated by ten judges. Their task was to indicate, on a seven-point scale, up to what point the contents of each sentence evoked an unpleasant, neutral or pleasant idea.

Reference paper:

"Un barometre affectif effectif: Corpus de reference et methode pour determiner la valence affective de phrases"

Bethard, S., Fairon, C., Kerves, L., JADT 2004

Le Soir newspaper sentences

(Eval.zip)

Dataset field	Mapping	Description	Example
Sentence	uncovered:opinionText	Sentence from the paper	“Pour ses opérations souvent très meurtrières, l'armée use.....”
Valence mean	marl:polarityValue	Number for -3 to 3 that describes emotions caused by reading the sample (-3=very unpleasent, 3= pleasent)	-2.4
Standard deviation	uncovered:stdDeviation	-	0.7

2.5 Multi-Domain Sentiment Dataset

Source:

http://www.cs.jhu.edu/~mdredze/datasets/sentiment/

File:

processed_acl.tar.gz, processed_stars.tar.gz

Description:

Reviews of different product types taken from Amazon.

Reference paper:

"Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification"

Blitzer, J., Dredze, M.m Pereira, F., Proceedings of the Association for Computational Linguistics (ACL), 2007

Amazon customer reviews

(processed_acl.tar.gz)

Dataset field	Mapping	Description	Example
Filename	marl:Polarity	Filename in this dataset states the polarity	positive/negative
Feature	marl:describesFeature	Feature name	“the_failure”
Feature count	marl:algorithmConfidence	The amount of times a feature was detected in the review	1
Label	marl:Polarity	In addition to filename every review has attached polarity as well	positive/negative

3. Online Opinion Analysis Services

The following use cases aim to show how Marl Ontology could be used in different environments (as in systems) and when applied to to opinions of various complexity and structure.

3.1 Swotti

Url:

http://www.swotti.com

Description:

Product/Movie etc. opinions taken from the web (via search engines) and processed with opinion mining to create rankings and summerize.

Get Sample RDF/XML:

swotti.rdf

Aggregated opinions

Dataset field	Mapping	Description	Example
Product Name	marl:describesObject	Name of the product	“iPad”
Feature Name	marl:describesFeature	Generic feature name	“Usability”
Polarity	marl:Polarity	Positive or negative	“Positive”
Rating	marl:polarityValue	Aggregated rating value based on opinions from the web (rating is 1-5)	“5/5”
Opinion count	uncovered:opinionsCount	Total number of aggregated opinions (on all features)	“21.553”
Positive count	uncovered:positiveOpinionsCount	Amount of positive opinions vs. negative	“70% positive”

Individual opinions

Dataset field	Mapping	Description	Example
Tag	marl:describesFeature	Feature name	“Usability”
Date	uncovered:opinionAnalysisDate	Date the opinion was mined	“14.04.2010”
Opinion text	uncovered:opinionText	The fragment of text that expresses an opinion	“run on Apple's iPad, which features a lightweight, incredibly easy-to-use, touch-screen ....”
Opinion relevance	marl:algorithmConfidence	Rank of how valuable the opinion is	-
Opinion Value	marl:polarityValue	Polarity rated from 1-5	“5/5”
Polarity	marl:Polarity	If the opinion is positive or negative	Positive
Source URL	marl:extractedFrom	The URL of the page where the opinion was expressed	-

3.2 Tweetsentiments

Url:

http://www.tweetsentiments.com/

API:

http://intridea.com/2010/11/29/sentiment-analysis-using-tweetsentimentscom-api

Description:

Live mining of sentiments from Twitter.

Sentiments on tweets

(API)

Dataset field	Mapping	Description	Example
Text	uncovered:opinionText	Text fragment given as parameter	“hates being in on a friday :[“
Value	marl:describesFeature	1, -1 or 0	“1”
Name	marl:Polarity	Positive, negative or neutral	“Positive”

Sentiments on topic/user

(API)

Dataset field	Mapping	Description	Example
Topic / User	marl:describesObject	Topic or user name (parameter)	“iPad”
Sentiment Index	marl:polarityValue	Normalized overall value (1-100)	“60”
Positive	uncovered:positiveOpinionsCount	Total number of positive tweets	“42”
Negative	uncovered:negativeOpinionsCount	Total number of negative tweets	“6”
Neutral	uncovered:neutralOpinionsCount	Total number of neutral tweets	“52”
Results	marl:aggregatesOpinion	All tweets on topic/user that contained with sentiments	[user id, user name, image, date, ...]
Results Text	uncovered:opinionText	Table with tweets text	“hates being in on a friday :[“
Results sentiment	marl:polarityValue	Table with tweets sentiments	“-1”

3.3. Mombo

Url:

http://www.mombo.com

API:

http://www.mombo.com/api/documentation

Description:

Similar as Tweet Sentiment but only for movie reviews. Grabs messages from twitter and attaches sentiments to it. The current API does not allow full access to single twitter posts with sentiment values.

Movies with aggregated sentiment rating

(partially available via API)

Dataset field	Mapping	Description	Example
movieName	marl:describesObject	Name of the movie	“Gnomeo and Juliet”
mentions	uncovered:opinionsCount	Total amount of mentions of the movie on Twitter	“1”
avgMentionsPerDay	uncovered:opinionsCountPerDay	Average amount of tweets that expressed a sentiment per day	“1131.13”
momboMeter	marl:polarityValue	1-5 sentiment rating	“3.5”
All tweets	marl:aggregatesOpinion	Listing of all tweets (text, sentiment) connected to the movie	-
Positive tweets	uncovered:aggregatesPositiveOpinion	Listing of positive tweets (text, sentiment) connected to the movie	-
Negative tweets	uncovered:aggregatesNegativeOpinion	Listing of negative tweets (text, sentiment) connected to the movie	-
Tweet sentiment	marl:Polarity	Positive or negative ( 1= positive, 2= negative)	“1”
Text	uncovered:opinionText	Tweet text	“Gnomeo & juliet is AMAZING!”
from_user_id / user_profile	[Linked] dcterms:creator	Person that expressed the opinion	“16110567“
to_user_id	[Linked] dcterms:creator	For whom the opinion was created (in reply to who's post)	“16114367“
source	marl:extractedFrom	Url of the tweet	-
confidence_score	marl:algorithmConfidence	Mambo algorithm confidence score	“3”
override_time	uncovered:opinionChangedTo	Information if the sentiment on the topic by the particular user was changed later	-
created_at	dcterms:created	Date when opinion was created	“2011-02-14 03:46:47”
reply_to	[Linked] sioc:reply_of	If the tweet is a reply this will contain the url to the original tweet that triggered the sentiment	-

3.4 Opinion Crawl (Semantic Engines LLC)

Url:

http://opinioncrawl.com/

API:

http://www.semanticengines.com/sentimentapi.aspx

Description:

Web sentiment snapshot on a person, company or event. Processes a variety of content sources - news sites, blogs, forums, etc.

Sentiments in web documents

(API)

Dataset field	Mapping	Description	Example
name	marl:describesObject / marl:describesFeature	Object or feature to which sentiments are attached	“cookies“
datestamp	uncovered:opinionAnalysisDate	Date and timestamp of the analysis	“09/01/2010 11:34:26 PM”
overall	marl:Polarity	Overall document polarity: {positive, negative, neutral}	“Positive”
mentions	uncovered:opinionsCount	Total number of sentiment expressions in the document	“1”
positive	uncovered:positiveOpinionsCount	Number of positive sentiment expressions	-
negative	uncovered:negativeOpinionsCount	Number of negative sentiment expressions	-
neutral	uncovered:neutralOpinionsCount	Number of neutral sentiment expressions	-
bulltobear	marl:polarityValue	Ratio of positive vs negative sentiments	-
concepts	marl:describesObject / marl:describesFeature	Automatically extracted objects, features related to sentiments	“cookies”

3.5 Opal module for Drupal

Url:

http://www.gi2mo.org/apps/opal/

Description:

A module for analysing comments posted in the Drupal CMS. Uses Sentiwordnet to calculate the polarity of comments as well as overall polarity of the commented text.

Sentiments in Drupal comments

Dataset field	Mapping	Description	Example
Commented resource	[Linked] sioc:reply_of	Object or feature to which sentiments are attached	-
Comment reference	marl:extractedFrom	Reference to the comment URL	-
Polarity	marl:Polarity	Overall document polarity: {positive, negative, neutral}	“Neutral”
Result	marl:polarityValue	Number representing the polarity	“0.0182”

Sentiments in Drupal content types

Dataset field	Mapping	Description	Example
Content type reference	marl:extractedFrom	Content that has comments	-
Comment references	marl:aggregatesOpinion	References to comments (ids/ URLs)	-
Positive	uncovered:positiveOpinionsCount	Number of positive comments	"0"
Neutral	uncovered:neutralOpinionsCount	Number of neutral comments	"1"
Negative	uncovered:negativeOpinionsCount	Number of negative comments	"0"
Result	marl:polarityValue	Number representing the polarity	“0.0182”
Polarity	marl:Polarity	Overall document polarity: {positive, negative, neutral}	“Neutral”

3.6 OPfine / Jane16

Url:

http://www.jane16.com/

Description:

Analysis of sentiments about market (e.g. particular companies), implementation based on Jane16 open-source library.

Individual Documents

Dataset field	Mapping	Description	Example
Polarity	marl:Polarity	Positive/Negative	-
Sentiment	marl:polarityValue	References to comments (ids/ URLs)	"3.2"
Title of the article	[Linked] dcterms:title	Polarity value normalized from -100 to 100	“No comfort for Ireland buy-to-let investors”
Publish time	[Linked] dcterms:created	Date the webpage was created	-
Source	marl:extractedFrom	URL of the source	-
Topic	marl:describesObject	keyword representing the topic of the setiment	“Ireland”

3.7 Evri sentiment API

Url:

http://www.evri.com/developer/rest#API-GetSentimentInformation

Description:

Evri is a news aggregation website, out of others is provides a sentiment API for retrieval of opinions connected to particular items in its database.

Sentiment on Evri entities

*based only on API documentation

Dataset field	Mapping	Description	Example
Opinion topic entity	marl:describesObject	Who the opinions are about	“/organization/nato-0x308f6”
Positive	uncovered:positiveOpinionsCount	Percent of positive opinions	"28"
Negative	uncovered:negativeOpinionsCount	Percent of negative opinions	“72”
Opinion source entity	[Linked] dcterms:creator	Who expressed the opinion (can be individual or group)	“/russia-0x237ce”
Opinion polarity	marl:Polarity	The API enables to list individual opinions with detailed listing of source and polarity	-

3.8 Opendover

Url:

http://www.opendover.nl

Drupal Plugin:

http://drupal.org/project/opendover

API use example:

hhttp://www.sentimentsearch.nl/

Description:

A web service that provides opinion mining. In addition Opendover is available, similarly like OPAL, as a Drupal plugin.

Sentiments detection and polarity rating

(API documentation)

Dataset field	Mapping	Description	Example
docDate	uncovered:opinionAnalysisDate	Sentiment analysis date	-
document	[Linked] sioc:Post	Full document text	"28"
attitude	uncovered:opinionType	Type of sentiment: judgement, appreciation or emotional state.	“APPRECIATION”
value	uncovered:opinionTrigger	Word(s) describing sentiment)	“bad”
orientation	marl:Polarity	Positive or negative	“-1”
offset	uncovered:opinionTriggerTextOffset	Position of the word in text	12
force	marl:polarityValue	Polarity value -9 to +9	“-4”
length	uncovered:opinionTriggerTextLength	Length of the word	3
object	marl:describesObject	Reference to described entity	-

4 Summary and Comparison

Coverage per dataset/service

(linked properties are counted as coverd)

Dataset/service name	# Covered	# Uncovered	# Total	Coverage (percent)
Congressional speech data (Cornell)	7	5	12	58%
Movie Review Data (Cornell)	3	1	4	75%
Customer Review Data (Hu, Liu)	5	4	9	56%
French Newspaper Articles	1	2	3	33%
Multi-Domain Sentiment Dataset	4	0	4	100%
Swotti	9	4	13	69%
Tweetsentiments	6	5	11	55%
Mombo	10	6	16	63%
Opinion Crawl	4	5	9	44%
OPAL	8	3	11	73%
OPfine	6	0	6	100%
Evri	3	2	5	60%
Opendover	4	5	9	44%
Avarage	5	3	8	64%

Property usage in mappings for datasets and services

(average amount a property was used in a data source)

A Changelog

First version of the document

B Acknowledgements

The style formatting of the following document has been inspired on FOAF specification.

Special thanks for support with Marl ontology creation and research to: Prof. Carlos A. Iglesias and members of the GSI Group of DIT department of Universidad Politécnica de Madrid.

Marl Ontology Experiments

07 February 2011

Abstract

Table of Contents

Appendixes

1 Introduction

1.1 Opinions on the Web and the opinion mining process

1.2 The Semantic Web

1.3 What is Marl for?

2. Research Datasets

2.1 Congressional speech data (Cornell)

2.2 Movie Review Data (Cornell)

2.3. Customer Review Data (Hu, Liu)

2.4 French Newspaper Articles

2.5 Multi-Domain Sentiment Dataset

3. Online Opinion Analysis Services

3.1 Swotti

3.2 Tweetsentiments

3.3. Mombo

3.4 Opinion Crawl (Semantic Engines LLC)

3.5 Opal module for Drupal

3.6 OPfine / Jane16

3.7 Evri sentiment API

3.8 Opendover

4 Summary and Comparison

A Changelog

B Acknowledgements