This version: | http://purl.org/marl/experiments/0.1/ |
Latest version: | http://purl.org/marl/experiments/ |
Editors: | Adam Westerski |
Authors: | Adam Westerski |
Contributors: | See acknowledgements |
This work is licensed under a Creative Commons Attribution License. This copyright applies to the Marl Ontology Specification and accompanying documentation in RDF. This ontology uses W3C's RDF technology, an open Web standard that can be freely used by anyone.
Marl is a standardised data schema (also referred as "ontology" or "vocabulary") designed to annotate and describe subjective opinions expressed on the web or in particular Information Systems. The following document contains results of data mapping experiments where we try to describe various datasets or output of services with Marl ontology. For the description of ontology and instructions how to connect it with descriptions of other resources see Ontology Specification.
The following document gathers the data results of various experiments done with Marl Ontology. It’s goal it to test the coverage of Marl properties for different datasets constructed independently of the Marl project.
The analysis is split into two parts. Each of the sections presents a list of sources and Marl mappings for them, along with some coverage statistics.
Section two describes usage of the ontology to produce mappings for various datasets published by researchers during their opinion mining algorithms study. The second part relates to the same effort but conducted in context of online services for end users that publish opinion mined data.
The choice of sources for datasets is based on state of the art knowledge of authors (in case of research datasets the list was partially created based on resources listed by Pang et al. [ref]).
An important note is that Marl ontology presented here is not a complete model to address the problem of describing and linking opinions online and inside information systems. It marly defines concepts that are not described yet by the means of other ontologies and provides the data attributes that enable to connect opinions with contextual information already defined in metadata created with other ontologies. For detailed instructions and recommendations how to fully model opinions and the results of opinion mining process refer to analysis done by Gi2MO project.
With the birth of Web 2.0 users started to provide their input and create content on mass scape about their subjective opinions related to various topics (e.g. opinions about movies). While this kind of content can be very beneficial for many different uses (e.g. market analysis or predictions) it's accurate analysis and interpretation has not been fully harnessed yet. Information left by the users is often very disorganized and many portals that enable user input leave the user added information unmoderated.
Opinion mining (often referred as sentiment analysis) is one of the attempts bring order to those vast amounts of user generated content. The domain focuses to analyse textual content using special language processing tools and as output provides a quantified judgement of the sentiments contained in the text (e.g. if the text expresses a positive or negative opinion).
Due to the complexity of the problem and attempts to provide efficient and fast tools the area can be devided into three main research directions:
In relation to the World Wide Web, there is a number of common uses of opinion formalisation and analysis. Firstly, it can be applied on top of search engines to find the desired content and next run it through opinion analysis software to obtain desired statistics (e.g. Swotti). Secondly, such algorithms can used within dedicated systems that use the Web to connect to particular communities and gather their opinions on very specific topics (e.g. Internet shops or review websites).
In relation to the dedicated systems (e.g. Enterprise Systems), there the community collaborative models that have proven successful in the open web are often transferred to large enterprise to enhance knowledge exchange and bring the employees together. The same opinion mining techniques can be applied in such cases to extact particular information and use it for internal statistics and to improve knowledge search across the enterprise (e.g. see use of opinion mining in Idea Management [link]).
The Semantic Web is a W3C initiative that aims to introduce rich metadata to the current Web and provide machine readable and processable data as a supplement to human-readable Web.
Semantic Web is a mature domain that has been in research phase for many years and with the increasing amount of commercial interest and emerging products is starting to gain appreciation and popularity as one of the rising trends for the future Internet.
One of the corner stones of the Semantic Web is research on interlinkable and interoperable data schemas for information published online. Those schemas are often refered to as ontologies or vocabularies. In order to facilitate the concept of ontologies that lead to a truly interoperable Web of Data, W3C has proposed a series of technologies such as RDF and OWL. Marl uses those technologies and the research that comes within to propose an ontology for the particular goal of describing opinions and linking them with contextual information (such as opinion topic, features described in the opinion etc.).
The goals of the Marl ontology to achieve as a data schema are:
The goal of this experiment was to see the capabilities of Marl ontology as a universal data schema used with real data extracted with existing algorithms. In comparison to the use case study [link], here we evaluate if our assumptions about the model of opinion are correct when aligned with work of other researchers.
Before reading the below mappings please be advised that Marl is an ontology meant for annotation of opinions not various (detailed) parameters of opinion mining algorithms performance. Therefore, often the very individual algorithm data is not covered.
Dataset field | Mapping | Description | Example |
Agreement | marl:Polarity | Agreement between two speakers (binary) | 1 |
Debate Number | marl:extractedFrom | Reference to the text | 52 |
Speaker ID | [Linked] dcterms:creator | Person that is expressing an opinion | 400011 |
Referenced Speaker ID | marl:describesObject | Subject of the opinion | 1 |
Raw Score | uncovered:raw_score |
Score | 0.48831446 |
Normalized Score | marl:polarityValue | Score | 0.936875742 |
Edge Strength | uncovered:edge_strength |
- | 2342 |
Dataset field | Mapping | Description | Example |
Filename | marl:extractedFrom | Reference to the text | 052_400...txt |
Raw Score | uncovered:raw_score |
Score | -0.27982776 |
Normalized Score | marl:polarityValue | Score | -0.362440238 |
Strength from source | uncovered:edge_strengthSource |
- | 4094 |
Strength to sink | uncovered:edge_strengthSink |
- | 5906 |
Dataset field | Mapping | Description | Example |
Sentence | marl:extractedFrom | Line of text extracted from the review (reviews provided in separate files) | “color , musical bounce and warm seas lapping on island shores....“ |
Filename | uncovered:isSubjective |
Separate files for subjective and objective | - |
Dataset field | Mapping | Description | Example |
Filename | marl:extractedFrom | Source on the web | - |
File location/subdirectory | marl:Polarity | Separate directories for positive and negative | - |
Dataset field | Mapping | Description | Example |
Product name | marl:describesObject | Reviewed product name | “Creative Labs Nomad Jukebox Zen Xtra 40GB” |
Review title | [Linked] dcterms:title | TItle of the user review | “get this player ! “ |
Product feature | marl:describesObjectPart / marl:describesFeature | Element of a device or its characteristic | “wma file” |
Opinion polarity | marl:Polarity | Polarity of opinion | “+” |
Opinion strength | marl:polarityValue | Rating from 1-3 | “2” |
Feature presence | uncovered:hasExplicateTopic |
If the detected feature/object appears in the mined sentence | (player) ”get it , it 's worth every penny” |
Suggestion | uncovered:opinionType |
If the review is a suggestions or recommendation | - |
Comparison [cc] | uncovered:isComparisonTo |
If there is a comparison to product of different brand | - |
Comparison [cs] | uncovered:isComparisonTo |
If there is a comparison to product of same brand | - |
Dataset field | Mapping | Description | Example |
Sentence | uncovered:opinionText |
Sentence from the paper | “Pour ses opérations souvent très meurtrières, l'armée use.....” |
Valence mean | marl:polarityValue | Number for -3 to 3 that describes emotions caused by reading the sample (-3=very unpleasent, 3= pleasent) | -2.4 |
Standard deviation | uncovered:stdDeviation |
- | 0.7 |
Dataset field | Mapping | Description | Example |
Filename | marl:Polarity | Filename in this dataset states the polarity | positive/negative |
Feature | marl:describesFeature | Feature name | “the_failure” |
Feature count | marl:algorithmConfidence | The amount of times a feature was detected in the review | 1 |
Label | marl:Polarity | In addition to filename every review has attached polarity as well | positive/negative |
The following use cases aim to show how Marl Ontology could be used in different environments (as in systems) and when applied to to opinions of various complexity and structure.
Dataset field | Mapping | Description | Example |
Product Name | marl:describesObject | Name of the product | “iPad” |
Feature Name | marl:describesFeature | Generic feature name | “Usability” |
Polarity | marl:Polarity | Positive or negative | “Positive” |
Rating | marl:polarityValue | Aggregated rating value based on opinions from the web (rating is 1-5) | “5/5” |
Opinion count | uncovered:opinionsCount |
Total number of aggregated opinions (on all features) | “21.553” |
Positive count | uncovered:positiveOpinionsCount |
Amount of positive opinions vs. negative | “70% positive” |
Dataset field | Mapping | Description | Example |
Tag | marl:describesFeature | Feature name | “Usability” |
Date | uncovered:opinionAnalysisDate |
Date the opinion was mined | “14.04.2010” |
Opinion text | uncovered:opinionText |
The fragment of text that expresses an opinion | “run on Apple's iPad, which features a lightweight, incredibly easy-to-use, touch-screen ....” |
Opinion relevance | marl:algorithmConfidence | Rank of how valuable the opinion is | - |
Opinion Value | marl:polarityValue | Polarity rated from 1-5 | “5/5” |
Polarity | marl:Polarity | If the opinion is positive or negative | Positive |
Source URL | marl:extractedFrom | The URL of the page where the opinion was expressed | - |
Dataset field | Mapping | Description | Example |
Text | uncovered:opinionText |
Text fragment given as parameter | “hates being in on a friday :[“ |
Value | marl:describesFeature | 1, -1 or 0 | “1” |
Name | marl:Polarity | Positive, negative or neutral | “Positive” |
Dataset field | Mapping | Description | Example |
Topic / User | marl:describesObject | Topic or user name (parameter) | “iPad” |
Sentiment Index | marl:polarityValue | Normalized overall value (1-100) | “60” |
Positive | uncovered:positiveOpinionsCount |
Total number of positive tweets | “42” |
Negative | uncovered:negativeOpinionsCount |
Total number of negative tweets | “6” |
Neutral | uncovered:neutralOpinionsCount |
Total number of neutral tweets | “52” |
Results | marl:aggregatesOpinion | All tweets on topic/user that contained with sentiments | [user id, user name, image, date, ...] |
Results Text | uncovered:opinionText |
Table with tweets text | “hates being in on a friday :[“ |
Results sentiment | marl:polarityValue | Table with tweets sentiments | “-1” |
Dataset field | Mapping | Description | Example |
movieName | marl:describesObject | Name of the movie | “Gnomeo and Juliet” |
mentions | uncovered:opinionsCount |
Total amount of mentions of the movie on Twitter | “1” |
avgMentionsPerDay | uncovered:opinionsCountPerDay |
Average amount of tweets that expressed a sentiment per day | “1131.13” |
momboMeter | marl:polarityValue | 1-5 sentiment rating | “3.5” |
All tweets | marl:aggregatesOpinion | Listing of all tweets (text, sentiment) connected to the movie | - |
Positive tweets | uncovered:aggregatesPositiveOpinion |
Listing of positive tweets (text, sentiment) connected to the movie | - |
Negative tweets | uncovered:aggregatesNegativeOpinion |
Listing of negative tweets (text, sentiment) connected to the movie | - |
Tweet sentiment | marl:Polarity | Positive or negative ( 1= positive, 2= negative) | “1” |
Text | uncovered:opinionText |
Tweet text | “Gnomeo & juliet is AMAZING!” |
from_user_id / user_profile | [Linked] dcterms:creator | Person that expressed the opinion | “16110567“ |
to_user_id | [Linked] dcterms:creator | For whom the opinion was created (in reply to who's post) | “16114367“ |
source | marl:extractedFrom | Url of the tweet | - |
confidence_score | marl:algorithmConfidence | Mambo algorithm confidence score | “3” |
override_time | uncovered:opinionChangedTo |
Information if the sentiment on the topic by the particular user was changed later | - |
created_at | dcterms:created | Date when opinion was created | “2011-02-14 03:46:47” |
reply_to | [Linked] sioc:reply_of | If the tweet is a reply this will contain the url to the original tweet that triggered the sentiment | - |
Dataset field | Mapping | Description | Example |
name | marl:describesObject / marl:describesFeature | Object or feature to which sentiments are attached | “cookies“ |
datestamp | uncovered:opinionAnalysisDate |
Date and timestamp of the analysis | “09/01/2010 11:34:26 PM” |
overall | marl:Polarity | Overall document polarity: {positive, negative, neutral} | “Positive” |
mentions | uncovered:opinionsCount |
Total number of sentiment expressions in the document | “1” |
positive | uncovered:positiveOpinionsCount |
Number of positive sentiment expressions | - |
negative | uncovered:negativeOpinionsCount |
Number of negative sentiment expressions | - |
neutral | uncovered:neutralOpinionsCount |
Number of neutral sentiment expressions | - |
bulltobear | marl:polarityValue | Ratio of positive vs negative sentiments | - |
concepts | marl:describesObject / marl:describesFeature | Automatically extracted objects, features related to sentiments | “cookies” |
Dataset field | Mapping | Description | Example |
Commented resource | [Linked] sioc:reply_of | Object or feature to which sentiments are attached | - |
Comment reference | marl:extractedFrom | Reference to the comment URL | - |
Polarity | marl:Polarity | Overall document polarity: {positive, negative, neutral} | “Neutral” |
Result | marl:polarityValue | Number representing the polarity | “0.0182” |
Dataset field | Mapping | Description | Example |
Content type reference | marl:extractedFrom | Content that has comments | - |
Comment references | marl:aggregatesOpinion | References to comments (ids/ URLs) | - |
Positive | uncovered:positiveOpinionsCount |
Number of positive comments | "0" |
Neutral | uncovered:neutralOpinionsCount |
Number of neutral comments | "1" |
Negative | uncovered:negativeOpinionsCount |
Number of negative comments | "0" |
Result | marl:polarityValue | Number representing the polarity | “0.0182” |
Polarity | marl:Polarity | Overall document polarity: {positive, negative, neutral} | “Neutral” |
Dataset field | Mapping | Description | Example |
Polarity | marl:Polarity | Positive/Negative | - |
Sentiment | marl:polarityValue | References to comments (ids/ URLs) | "3.2" |
Title of the article | [Linked] dcterms:title | Polarity value normalized from -100 to 100 | “No comfort for Ireland buy-to-let investors” |
Publish time | [Linked] dcterms:created | Date the webpage was created | - |
Source | marl:extractedFrom | URL of the source | - |
Topic | marl:describesObject | keyword representing the topic of the setiment | “Ireland” |
Dataset field | Mapping | Description | Example |
Opinion topic entity | marl:describesObject | Who the opinions are about | “/organization/nato-0x308f6” |
Positive | uncovered:positiveOpinionsCount |
Percent of positive opinions | "28" |
Negative | uncovered:negativeOpinionsCount |
Percent of negative opinions | “72” |
Opinion source entity | [Linked] dcterms:creator | Who expressed the opinion (can be individual or group) | “/russia-0x237ce” |
Opinion polarity | marl:Polarity | The API enables to list individual opinions with detailed listing of source and polarity | - |
Dataset field | Mapping | Description | Example |
docDate | uncovered:opinionAnalysisDate |
Sentiment analysis date | - |
document | [Linked] sioc:Post | Full document text | "28" |
attitude | uncovered:opinionType |
Type of sentiment: judgement, appreciation or emotional state. | “APPRECIATION” |
value | uncovered:opinionTrigger |
Word(s) describing sentiment) | “bad” |
orientation | marl:Polarity | Positive or negative | “-1” |
offset | uncovered:opinionTriggerTextOffset |
Position of the word in text | 12 |
force | marl:polarityValue | Polarity value -9 to +9 | “-4” |
length | uncovered:opinionTriggerTextLength |
Length of the word | 3 |
object | marl:describesObject | Reference to described entity | - |
Dataset/service name | # Covered | # Uncovered | # Total | Coverage (percent) |
Congressional speech data (Cornell) | 7 | 5 |
12 | 58% |
Movie Review Data (Cornell) | 3 | 1 |
4 | 75% |
Customer Review Data (Hu, Liu) | 5 | 4 |
9 | 56% |
French Newspaper Articles | 1 | 2 |
3 | 33% |
Multi-Domain Sentiment Dataset | 4 | 0 |
4 | 100% |
Swotti | 9 | 4 |
13 | 69% |
Tweetsentiments | 6 | 5 |
11 | 55% |
Mombo | 10 | 6 |
16 | 63% |
Opinion Crawl | 4 | 5 |
9 | 44% |
OPAL | 8 | 3 |
11 | 73% |
OPfine | 6 | 0 |
6 | 100% |
Evri | 3 | 2 |
5 | 60% |
Opendover | 4 | 5 |
9 | 44% |
Avarage | 5 | 3 |
8 | 64% |
The style formatting of the following document has been inspired on FOAF specification.
Special thanks for support with Marl ontology creation and research to: Prof. Carlos A. Iglesias and members of the GSI Group of DIT department of Universidad Politécnica de Madrid.