What is PreMOn?

PreMOn (PRedicate Model for ONtologies) is a linguistic resource for representing predicate models (i.e., PropBank, NomBank, VerbNet, FrameNet), their annotations, the mappings between them (e.g., SemLink, PredicateMatrix), and the mappings to frame-based ontologies in RDF/OWL. PreMOn consists of two components:

  1. the PreMOn Ontology, an OWL 2 ontology that extends lemon (by the W3C Ontology Lexicon Community Group) for modeling the core concepts of semantic class (i.e., roleset in PropBank and NomBank, verb class in VerbNet, and frame in FrameNet), semantic role, mapping, and annotation common to all predicate models; and,

  2. the PreMOn Dataset, a freely-available, interlinked RDF dataset containing the PropBank, NomBank, VerbNet, and FrameNet predicate model data (in various versions), the examples provided with the original resources, and the SemLink and PredicateMatrix mappings, published online as Linked Open Data according to the PreMOn Ontology.

Compared to the current situation where each predicate model has its own proprietary XML format, PreMOn brings several benefits to users of predicate models:

  1. ease of access and reuse of predicate model data, due to the adoption of a common RDF format, stable URIs, and LOD best practices;
  2. possibility to abstract and capture the aspects common to different predicate models, while at the same time keeping track of the peculiarities of each model (using RDFS/OWL subclass/subproperty primitives);
  3. possibility to apply Semantic Web technologies to predicate model data, such as automated reasoning and SPARQL querying, e.g., for retrieving the semantic classes of a lexical entry and the associated mappings;
  4. possibility to combine PreMOn with other linguistic ontologies, e.g., for providing the SRL annotations of a text according to the NLP Interchange Format (NIF);
  5. possibility for third parties to publish and interlink their datasets with PreMOn, extending it in a decentralized way (e.g., with new mappings).

These capabilities are particularly relevant in the context of the Semantic Web (SW), where predicate models are increasingly used for information extraction (e.g., in tools such as NewsReader and PIKES) or as the starting point from which to derive ontologies of extracted knowledge, such as FrameBase and the ESO ontology, both derived from FrameNet.

PreMOn ontology

The PreMOn ontology consists of a core module defining the main abstractions for representing predicate models, and additional modules (PropBank, NomBank, VerbNet, and FrameNet) specific to each predicate model included in PreMOn.

The core module (see figure) extends lemon and thus inherits the capability to represent lexical entries (class LexicalEntry) with their associated forms, and to relate lexical entries to the ontological entities they denote (classes, properties, individuals) using the LexicalSense reified relation. lemon supports mapping entries to LexicalConcepts (subclass of SKOS Concept), each denoting an intensional (∼informal) meaning evoked by a set of lexical entries. Examples of lexical concepts are WordNet synsets, whose semantics is not formally encoded in an ontology.

PreMOn ontology, core module

Semantic Classes and Roles

[in green in the figure]

The PreMOn Ontology Core Module extends lemon by introducing classes pmo:SemanticClass and pmo:SemanticRole. pmo:SemanticClass homogeneously represents the semantic classes from the various predicate models. That is, individuals of this class correspond to rolesets in PropBank and NomBank (e.g., pm:nb10-seller.01 and pm:pb17-sell.01), verb classes in VerbNet (e.g., pm:vn32-give-13.1-1), and frames in FrameNet (e.g., pm:fn15-commerce sell). An instance of pmo:SemanticClass typically has (via property pmo:semRole) a number of pmo:SemanticRoles, representing, from a semantic point of view, the roles the arguments of that pmo:SemanticClass can play.

Semantic roles are defined locally to semantic classes, so VerbNet agent is represented as multiple semantic roles, one for each verb class it occurs in, with each semantic role linked to its specific selectional restrictions (if any). Note that pmo:SemanticClass is defined as a subclass of ontolex:LexicalConcept, as we see pmo:SemanticClasses as essentially informal concepts rather than well-defined concepts of a formal ontology (although an ontology can be derived from them, cf. FrameBase and ESO). Being ontolex:LexicalConcepts, pmo:SemanticClasses inherit the link to lexical entries as well as the link (via ontolex:isConceptOf) to the ontological entities formalizing them, typically event classes.
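The locality of semantic roles can be illustrated with a minimal sketch. It is a plain-Python toy, not part of the PreMOn tooling: triples are tuples of prefixed names, pm:vn32-give-13.1-1 is taken from the text above, while the second class and all role identifiers are hypothetical and only illustrate the idea of one role individual per class.

```python
# Toy triple store: each triple is (subject, predicate, object).
# pm:vn32-give-13.1-1 comes from the text; pm:vn32-other-class and every
# role identifier are hypothetical names used for illustration only.
TRIPLES = {
    ("pm:vn32-give-13.1-1", "rdf:type", "pmo:SemanticClass"),
    ("pm:vn32-other-class", "rdf:type", "pmo:SemanticClass"),
    # One "agent" role individual per verb class, never a shared global one.
    ("pm:vn32-give-13.1-1", "pmo:semRole", "pm:vn32-give-13.1-1-agent"),
    ("pm:vn32-other-class", "pmo:semRole", "pm:vn32-other-class-agent"),
    ("pm:vn32-give-13.1-1", "pmo:semRole", "pm:vn32-give-13.1-1-recipient"),
}


def roles_of(semantic_class, triples=TRIPLES):
    """Return the sorted roles attached to a class via pmo:semRole."""
    return sorted(o for s, p, o in triples
                  if s == semantic_class and p == "pmo:semRole")


def classes_with_role_suffix(suffix, triples=TRIPLES):
    """Return the sorted classes owning a local role whose name ends with suffix."""
    return sorted(s for s, p, o in triples
                  if p == "pmo:semRole" and o.endswith(suffix))
```

Calling roles_of on a class returns only the roles local to it, while classes_with_role_suffix("-agent") shows that the same VerbNet role name yields a distinct individual in each class.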

Properties pmo:classRel and pmo:roleRel, and their resource-specific subproperties, are introduced to express the relations between elements at each level, such as subtyping, and predicate and role inheritance (e.g., pmofn:inheritsFrom and pmofn:inheritsFromFER for FrameNet). Additional resource-specific classes (e.g., pmovn:ThematicRole) and properties (e.g., pmovn:thematicRole) further characterize important aspects of each predicate model, like commonalities between semantic roles.


Mappings

[in red in the figure]

Mappings between different predicate models are practically relevant but cannot be expressed using only the classes above, as they are often defined (e.g., in SemLink and PredicateMatrix) in terms of <pmo:SemanticClass, ontolex:LexicalEntry> pairs. To model these pairs, one could reuse the notion of ontolex:LexicalSense. However, its formalization in lemon as a reified relation depends on the existence of (exactly) one ontological entity for each <ontolex:LexicalConcept, ontolex:LexicalEntry> pair, a strong constraint that we do not necessarily need for our purposes. Therefore, we introduce the pmo:Conceptualization class. Structurally, a pmo:Conceptualization can be seen as the reification of the ontolex:evokes relation between ontolex:LexicalEntry and ontolex:LexicalConcept. Semantically, it can be seen as a very specific intensional concept (among many, in case of polysemy) evoked by a single ontolex:LexicalEntry. It can be generalized to an ontolex:LexicalConcept when multiple entries are considered, but with a possible loss of information that prevents precise alignments from being represented.

Mappings are explicitly represented as individuals of class pmo:Mapping, and can be seen as sets of (or n-ary relations between) either (i) pmo:Conceptualizations, (ii) pmo:SemanticClasses, or (iii) pmo:SemanticRoles, with role mappings anchored to conceptualization or class mappings via property pmo:semRoleMapping. We rely on this set-like modeling because mappings are not necessarily represented as binary relations in predicate mapping resources: e.g., in the PredicateMatrix, each row represents the mapping of a semantic role / lexical entry pair over the different resources (e.g., <13.1-1-agent, deal> in VerbNet, <sell.01-arg0, sell> in PropBank, <Commerce Sell-seller, sell> in FrameNet) as well as the corresponding WordNet verb sense. Reifying the n-ary mapping relation also allows us, if needed, to further characterize each single mapping by asserting additional information such as confidence and reliability. Moreover, mappings can be further specialized (e.g., to model mappings holding only in one direction, from one resource to another, or to represent different types of relationships among the members of the mapping) by subtyping the pmo:Mapping class or the property (pmo:item) relating a pmo:Mapping to its members.
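The set-like modeling of mappings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PreMOn implementation: the class and field names mirror pmo:Conceptualization, pmo:Mapping, pmo:item, and pmo:semRoleMapping, the short identifiers are illustrative rather than actual PreMOn URIs, and the confidence field and its value are made up to show where additional information could be attached.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Conceptualization:
    """A <semantic class, lexical entry> pair (cf. pmo:Conceptualization)."""
    semantic_class: str
    lexical_entry: str


@dataclass(frozen=True)
class Mapping:
    """A reified n-ary mapping: an unordered set of members (cf. pmo:item)."""
    items: FrozenSet[object]
    role_mappings: Tuple["Mapping", ...] = ()  # cf. pmo:semRoleMapping
    confidence: Optional[float] = None         # hypothetical extra attribute


# One PredicateMatrix-like row, built from the pairs quoted in the text.
# The short identifiers are illustrative, not actual PreMOn URIs.
roles = Mapping(items=frozenset({
    "vn:13.1-1-agent", "pb:sell.01-arg0", "fn:Commerce_Sell-seller"}))
row = Mapping(
    items=frozenset({
        Conceptualization("vn:13.1-1", "deal"),
        Conceptualization("pb:sell.01", "sell"),
        Conceptualization("fn:Commerce_Sell", "sell"),
    }),
    role_mappings=(roles,),
    confidence=0.9,  # made-up value, only to show where such data could go
)
```

Because the members are held in a frozenset, a mapping has no privileged direction and can relate any number of resources at once; a directed or typed mapping would correspond to a subclass of Mapping in this sketch.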


Annotations

[in yellow in the figure]

Predicate models are typically complemented by examples showing concrete occurrences of semantic classes and roles in text. More generally, a text can be annotated with semantic classes and roles as a result of manual or automatic SRL.

The PreMOn Ontology provides some common primitives, based on the NLP Interchange Format (NIF), which aim at properly modeling the heterogeneous annotations of a text for different predicate models. NIF introduces the general notion of nif:String to represent arbitrary text strings. nif:Context is a particular subclass of nif:String, representing a whole string of text. Any substring (itself a nif:String) has a nif:referenceContext relation to the nif:Context individual representing the whole text containing it.

To specifically model the aforementioned examples complementing predicate models, we introduce pmo:Example, subclass of nif:Context, to represent the string associated with the example. The occurrence of an ontolex:LexicalEntry, pmo:SemanticClass, or pmo:SemanticRole in a nif:Context is denoted by an instance of nif:Annotation, related to the given ontolex:LexicalEntry, pmo:SemanticClass, or pmo:SemanticRole via property pmo:valueObj (the value attached to the annotation), and to the nif:Context instance via property nif:annotation. If detailed information on the specific span of text (i.e., substring) denoting the ontolex:LexicalEntry, pmo:SemanticClass, or pmo:SemanticRole is available (e.g., FrameNet provides the specific offsets of lexical units, frames, and frame elements in the example text), an additional instance of pmo:Markable, subclass of nif:String, is created and linked to the specific nif:Annotation and nif:Context via properties nif:annotation and nif:referenceContext, respectively. As the same nif:Context may contain multiple nif:Annotations referring to one or more semantic classes and their corresponding roles, an additional pmo:AnnotationSet instance is created to cluster annotations from the same predicate structure.
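The annotation structure described above can be sketched as follows. This is a simplified Python illustration, not the actual RDF encoding: the classes only mirror nif:Context, pmo:Markable, nif:Annotation, and pmo:AnnotationSet, the example sentence is invented, and the role identifiers are hypothetical (only pm:pb17-sell.01 appears in the text).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    """The whole example text (cf. nif:Context / pmo:Example)."""
    text: str


@dataclass
class Markable:
    """A span of the context, given by character offsets (cf. pmo:Markable)."""
    context: Context  # cf. nif:referenceContext
    begin: int
    end: int

    def surface(self) -> str:
        return self.context.text[self.begin:self.end]


@dataclass
class Annotation:
    """An occurrence of an entry, class, or role (cf. nif:Annotation)."""
    value_obj: str                       # cf. pmo:valueObj
    markable: Optional[Markable] = None  # only set when offsets are available


@dataclass
class AnnotationSet:
    """Annotations belonging to one predicate structure (cf. pmo:AnnotationSet)."""
    context: Context
    annotations: List[Annotation] = field(default_factory=list)


# Invented example sentence; role identifiers are hypothetical.
ctx = Context("John sold a car to Mary")
sell_structure = AnnotationSet(ctx, [
    Annotation("pm:pb17-sell.01", Markable(ctx, 5, 9)),
    Annotation("pm:pb17-sell.01-arg0", Markable(ctx, 0, 4)),
    Annotation("pm:pb17-sell.01-arg1", Markable(ctx, 10, 15)),
])
surfaces = [a.markable.surface() for a in sell_structure.annotations]
```

Grouping the three annotations in one AnnotationSet keeps the predicate and its arguments together even when the same context carries annotations for other predicate structures.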

The image below illustrates an instantiation of the PreMOn Ontology with predicate model data, showing some semantic classes, semantic roles, mappings, and examples.

Example of predicate representation using the PreMOn ontology

Additional classes and properties are defined in the specific submodule for each predicate model (PropBank, NomBank, VerbNet, and FrameNet).

PreMOn datasets

To populate PreMOn with content from the various resources (predicate models, mappings), we developed an open-source Java command-line tool. The tool applies pluggable, resource-specific converters to the original distribution files of each resource, instantiating the proper individuals and assertions according to the PreMOn Ontology. If available, mappings to additional resources (e.g., WordNet synsets, OntoNotes groupings) are also extracted. OWL 2 RL inference, statistics extraction, and some cross-resource cleanup (e.g., for dropping inconsistent mappings) are applied to the extracted triples, leveraging RDFpro for RDF processing.

Specific conversion strategies had to be implemented for each predicate model. For example, in VerbNet, semantic roles (with selectional constraints) and frames have to be propagated from a class to its subclasses, unless redefined in the latter. In PropBank (and NomBank), the instantiation of pmopb:SemanticRoles requires creating an individual for each ⟨pmopb:Roleset, pmopb:Argument⟩ pair, as no information is provided on which arguments a predicate may have (besides explicit occurrence in frame files, in which case semantic role attributes pmopb:core/pmonb:core are set to “true”). We applied the conversion suite to a large collection of resources, producing a comprehensive dataset, namely the PreMOn Dataset, containing:

  • PropBank v1.7 (pb17)
  • PropBank v2.1.5 released with OntoNotes v5 (pb215)
  • NomBank v1.0 (nb10)
  • VerbNet v3.2 (vn32)
  • FrameNet v1.5 (fn15)
  • FrameNet v1.6 (fn16)
  • SemLink 1.2.2c (sl122c)
  • PredicateMatrix 1.3 (pm13)
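The VerbNet conversion strategy mentioned above (propagating roles from a class to its subclasses unless they are redefined) can be sketched as below. This is a minimal Python illustration, not the code of the Java conversion tool: the function name, the dictionary-based input, the class names, and the selectional restrictions are all hypothetical, and the class hierarchy is assumed to be a tree (no cycles).

```python
from typing import Dict, Optional


def propagate_roles(parent_of: Dict[str, Optional[str]],
                    own_roles: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return, for each class, its effective roles: inherited plus its own.

    parent_of maps a class to its parent (None for top-level classes).
    own_roles maps a class to {role name: selectional restriction} as
    declared on that class. A role redefined in a subclass overrides the
    inherited definition. The hierarchy is assumed to be acyclic.
    """
    resolved: Dict[str, Dict[str, str]] = {}

    def resolve(cls: str) -> Dict[str, str]:
        if cls not in resolved:
            parent = parent_of.get(cls)
            effective = dict(resolve(parent)) if parent is not None else {}
            effective.update(own_roles.get(cls, {}))  # redefinitions win
            resolved[cls] = effective
        return resolved[cls]

    for cls in parent_of:
        resolve(cls)
    return resolved


# Hypothetical two-level hierarchy with made-up restrictions.
hierarchy = {"give-13.1": None, "give-13.1-1": "give-13.1"}
declared = {
    "give-13.1": {"agent": "animate", "theme": "any"},
    "give-13.1-1": {"theme": "concrete"},  # redefines the inherited theme
}
effective_roles = propagate_roles(hierarchy, declared)
```

In this toy input the subclass inherits agent unchanged and replaces the restriction on theme, while the parent class keeps its own declarations.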

The PreMOn Dataset contains the mappings between semantic classes and roles provided by each predicate model, SemLink and the PredicateMatrix, as well as the mappings between VerbNet classes and lexical senses in WordNet 3.1 (wn31) and OntoNotes 5 groupings.

By adopting a homogeneous schema for heterogeneous predicate models, PreMOn facilitates the joint querying of content from different resources. For instance, a query like

SELECT DISTINCT ?lexEnt (COUNT(?resource) AS ?n)
WHERE {
	{ SELECT DISTINCT ?lexEnt ?resource
	  WHERE {
		GRAPH ?resource
			{ ?lexEnt ontolex:evokes ?semCla .
			  ?semCla a pmo:SemanticClass . }
		# keep only lexical entries that take part in no mapping
		FILTER NOT EXISTS {
			?conc pmo:evokingEntry ?lexEnt .
			?mapping a pmo:Mapping ; pmo:item ?conc }
	  }
	}
}
GROUP BY ?lexEnt ORDER BY DESC(?n) ?lexEnt

looks for lexical entries (?lexEnt) evoking semantic classes in different resources (?resource), for which no mappings are defined (try this query on our SPARQL endpoint). Results are ordered by decreasing number of resources defining the lexical entry. This query hints at a way to exploit PreMOn to investigate, and possibly extend, mappings between predicate models.
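To make the intended semantics of such a query concrete, the following sketch evaluates the same selection over an in-memory list of quads. It is a plain-Python approximation under stated assumptions, not a SPARQL engine: quads are (subject, predicate, object, graph) tuples, the function name and all identifiers in the sample data are hypothetical, and "no mappings are defined" is read as "no conceptualization of the entry is a member of any pmo:Mapping".

```python
from collections import defaultdict


def unmapped_entries(quads):
    """Return [(lexical entry, number of resources)] for unmapped entries.

    An entry qualifies if it evokes a pmo:SemanticClass typed in the same
    graph, and no conceptualization with that entry is a member of a mapping.
    Results are sorted by decreasing count, then by entry name.
    """
    quads = list(quads)
    typed = {(s, g) for s, p, o, g in quads
             if p == "rdf:type" and o == "pmo:SemanticClass"}
    mappings = {s for s, p, o, g in quads
                if p == "rdf:type" and o == "pmo:Mapping"}
    mapped_concs = {o for s, p, o, g in quads
                    if p == "pmo:item" and s in mappings}
    mapped = {o for s, p, o, g in quads
              if p == "pmo:evokingEntry" and s in mapped_concs}
    resources = defaultdict(set)
    for s, p, o, g in quads:
        if p == "ontolex:evokes" and (o, g) in typed and s not in mapped:
            resources[s].add(g)
    return sorted(((e, len(gs)) for e, gs in resources.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data: lex:sell is mapped, the other entries are not.
SAMPLE = [
    ("lex:sell", "ontolex:evokes", "c:pb-sell", "g:pb17"),
    ("c:pb-sell", "rdf:type", "pmo:SemanticClass", "g:pb17"),
    ("conc:1", "pmo:evokingEntry", "lex:sell", "g:map"),
    ("map:1", "rdf:type", "pmo:Mapping", "g:map"),
    ("map:1", "pmo:item", "conc:1", "g:map"),
    ("lex:ramble", "ontolex:evokes", "c:pb-ramble", "g:pb17"),
    ("c:pb-ramble", "rdf:type", "pmo:SemanticClass", "g:pb17"),
    ("lex:ramble", "ontolex:evokes", "c:vn-ramble", "g:vn32"),
    ("c:vn-ramble", "rdf:type", "pmo:SemanticClass", "g:vn32"),
    ("lex:zap", "ontolex:evokes", "c:fn-zap", "g:fn15"),
    ("c:fn-zap", "rdf:type", "pmo:SemanticClass", "g:fn15"),
]
result = unmapped_entries(SAMPLE)
```

Entries that appear in several resources without any mapping come first in the result, which makes them natural candidates when looking for mappings to add.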


Last Published: 2016/05/20.
