WordNet WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
Word Sense Disambiguation in a Slot Grammar Framework This is a preliminary report on a system for word sense disambiguation (WSD) for unrestricted vocabulary, which requires no training on tagged text. Disambiguation is done to WordNet word senses. The “disambiguating power” of the system comes from three sources: (A) Parsing by English Slot Grammar (ESG), (B) the WordNet relation system, and (C) the WordNet sense frequency data.
Mapping of EuroWordnet Top Ontology to Upper Cyc Ontology A mapping of the EuroWordnet Top Ontology into the Upper Cyc Ontology is presented. The mapping is expressed as a CycL microtheory encoding of the EuroWordnet Top Ontology, because it cannot be made by means of equivalence and subsumption relations alone.
WordNet::Similarity This is a CPAN module that implements a variety of semantic similarity measures that can be used in conjunction with WordNet. In particular, it supports the measures of Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and Patwardhan-Pedersen.
eXtended WordNet The goal of this project is to develop a tool that takes as input the current or future versions of WordNet and automatically generates an eXtended WordNet that provides several important enhancements intended to remedy the present limitations of WordNet.
NameNet: a Self-Improving Resource for Name Classification This paper presents a semantically structured resource of more than 1,600 Name Classes. This structure is based on the noun hyperonymy hierarchies in WordNet, expanded and validated by corpus evidence collected from the World Wide Web. The set of seed examples provided by WordNet is bootstrapped and then used to automatically construct an annotated training corpus for each Name Class. The resulting Named Entity resource enables a supervised Named Entity Recognizer to identify all the encoded Name Classes with high accuracy and without any human intervention.
Balkanet The Balkan WordNet aims at the development of a multilingual lexical database comprising individual WordNets for the Balkan languages. The most ambitious feature of BalkaNet is its attempt to represent semantic relations between words in each Balkan language and link them together in order to develop an online multilingual semantic network. The main objective is the development of each language's WordNet from available resources covering the general vocabulary of that language. Semantic relations will be classified in the independent WordNets according to a shared ontology. Then, all individual WordNets will be organized into a common database providing links across them. Each of the WordNets will be structured along the same lines as EuroWordNet through a WordNet Management System. This project is an excellent opportunity to explore the less studied Balkan languages and to combine and compare them cross-linguistically.
WordNet.Net WordNet.Net library - the .Net Framework library for WordNet.
WordNet-based semantic similarity measurement Semantic similarity is a confidence score that reflects the semantic relation between the meanings of two sentences. It is difficult to gain a high accuracy score because the exact semantic meanings are completely understood only in a particular context. | |
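As a rough illustration of the kind of measure the similarity resources above refer to, here is a minimal sketch of the Wu-Palmer score, computed over a tiny hand-made hypernym tree. The `HYPERNYM` table and all function names are hypothetical and are not taken from WordNet or from WordNet::Similarity; real WordNet synsets can have several hypernyms, which this sketch ignores.

```python
# Toy hypernym hierarchy (child -> parent). Hypothetical data for
# illustration only; it is not extracted from WordNet.
HYPERNYM = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}


def path_to_root(concept):
    """Return the list of concepts from `concept` up to the root."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first concept on a's path that also subsumes b is the
    # least common subsumer (lcs) in a single-parent tree.
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wu_palmer("dog", "cat"))  # share the subsumer "animal"
    print(wu_palmer("dog", "car"))  # share only the root "entity"
```

In this toy tree, "dog" and "cat" score higher than "dog" and "car" because their least common subsumer sits deeper in the hierarchy.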
 | |
 |
 |
|
 |
|
Vecteurs conceptuels et fonctions lexicales : application à l'antonymie. This thesis concerns the representation of the thematic aspect of text segments (documents, paragraphs, phrases, etc.). We rely on a mixed approach (symbolic and vector-based) that aims to combine the information deducible from syntactic structures with the information drawn from lexical-semantic representations. Certain syntactic forms indirectly carry meaning and can, in general, be modeled using Meaning-Text Theory and lexical functions. Negation, which is very frequent in texts, makes it possible, among other things, to avoid repetition, or to produce statements whose form is not lexically attested, such as the phrases "il n'est pas sérieux" (he is not serious) and "il n'est pas aimable" (he is not kind). The words "sérieux" and "aimable" have no well-attested opposites; the terms "léger" (frivolous) and "désagréable" (unpleasant) are at best approximations. Negation does not always mean the opposite of an assertion, as in the sentence "elle n'est pas belle, elle est superbe" (she is not beautiful, she is gorgeous). By contrast, in "il n'est pas mort" (he is not dead), negation expresses, a priori, the opposite idea, "il est vivant" (he is alive), though with the problems raised by polysemy and figurative senses: "vivant" can also be used in the sense of cheerful or lively.
Antonymy and Semantic Range in English This dissertation investigates what makes two words antonyms. Previous research has not adequately explained why some words seem to contrast in meaning but are still not considered antonyms (e.g. large and little) nor can it explain why some words have two antonyms (e.g., happy/sad and happy/unhappy). An explanation is given here using the notion of "semantic range" (a description of a word's typical collocation patterns); antonyms are shown to be words which have a great deal of semantic range in common. | |
 | |
 |
 |
| Other resources on ontologies | |
 |
|
Sites Relevant to Ontologies and Knowledge Sharing A list of resources on Ontologies and Knowledge Sharing.
John Bateman's ontology portal This page is a collection of starting points for information on ontologies gathered together for ease of reference for our own ontology-related projects.
Fine-Grained Proper Noun Ontologies for Question Answering The WordNet lexical ontology, which is primarily composed of common nouns, has been widely used in retrieval tasks. Here, we explore the notion of a fine-grained proper noun ontology and argue for the utility of such an ontology in retrieval tasks. To support this claim, we build a fine-grained proper noun ontology from unrestricted news text and use this ontology to improve performance on a question answering task.
An introduction to Ontology by John F. Sowa Ontology is the study of existence. An ontology is a system of categories for classifying and talking about the things that are assumed to exist. This directory contains a summary of the ontology developed and used in the KR book by John Sowa.
KBS / Ontology Projects Worldwide Some ongoing KBS/Ontology projects and groups.
OWL Web Ontology Language Reference The Web Ontology Language OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. | |
 | |
 |
 |
|
 |
|
Sekine's Extended Named Entity Hierarchy The Extended Named Entity Hierarchy is designed and developed to meet the increasing need for a wider range of NE types. It originates from the first Named Entity set defined by MUC (Grishman et al., 1996), the Named Entity set developed by IREX (Sekine et al., 2000), and the Extended Named Entity hierarchy, which contains approximately 150 NE types (Sekine et al., 2002). It has now been extended again to 200 NE types. The applications include Question Answering (Q&A) systems that analyze general texts such as newspaper articles, as well as Information Extraction (IE), Machine Translation (MT), Summarization and Information Retrieval (IR) systems that serve a variety of NLP applications. We designed the Extended Named Entity Hierarchy for Q&A and IE systems, assuming that the information one wants to know is basically in the form of a noun phrase with specific names, time expressions or numerical values.
NameNet: a Self-Improving Resource for Name Classification This paper presents a semantically structured resource of more than 1,600 Name Classes. This structure is based on the noun hyperonymy hierarchies in WordNet, expanded and validated by corpus evidence collected from the World Wide Web. The set of seed examples provided by WordNet is bootstrapped and then used to automatically construct an annotated training corpus for each Name Class. The resulting Named Entity resource enables a supervised Named Entity Recognizer to identify all the encoded Name Classes with high accuracy and without any human intervention. | |
 | |
 |
 |
|
 |
|
Story understanding resources A list of resources on story understanding.
Story understanding through multi-representation model construction We present an implemented model of story understanding and apply it to the understanding of a children’s story. We argue that understanding a story consists of building multirepresentation models of the story and that story models are efficiently constructed using a satisfiability solver. We present a computer program that contains multiple representations of commonsense knowledge, takes a narrative as input, transforms the narrative and representations of commonsense knowledge into a satisfiability problem, runs a satisfiability solver, and produces models of the story as output. The narrative, models, and representations are expressed in the language of Shanahan’s event calculus.
Understanding script-based stories using commonsense reasoning This paper investigates the use of commonsense reasoning to understand texts involving stereotypical activities or scripts. We present a system that understands news stories involving four terrorism scripts. The system (1) builds a commonsense reasoning problem given an information extraction template representing a terrorist incident, and (2) uses commonsense reasoning and a commonsense knowledge base to build a model of the terrorist incident. The reasoning problem, commonsense knowledge base, and model are expressed in the classical logic event calculus. The system was developed using the MUC3 and MUC4 development data set. We present the results of running the system on the MUC3 and MUC4 test data sets, using manually generated answer key templates and templates generated automatically by two MUC4 information extraction systems. We present a detailed analysis of the models produced by the system given automatically generated templates. We present methods for answering questions based on the models produced by our system. We assess the portability of the system by extending it to handle 10 scripts frequent in Project Gutenberg American literature texts.
Prospects for in-depth story understanding by computer (Erik T. Mueller - November 29, 1999) While much research on the hard problem of in-depth story understanding by computer was performed starting in the 1970s, interest shifted in the 1990s to information extraction and word sense disambiguation. Now that a degree of success has been achieved on these easier problems, I propose it is time to return to in-depth story understanding. In this paper I examine the shift away from story understanding, discuss some of the major problems in building a story understanding system, present some possible solutions involving a set of interacting understanding agents, and provide pointers to useful tools and resources for building story understanding systems.
The Plots of Children and Machines: The Statistical and Symbolic Semantic Analysis of Narratives This thesis presents a method of automatic plot analysis of narrative texts that uses both components of traditional symbolic analysis of natural language and statistical machine-learning. In particular, we are investigating the story rewriting task. In the story rewriting task, an exemplar story is read to the pupils and the pupils rewrite the story in StoryStation, which allows them to concentrate more on diction and grammar than on content creation. However, often in the process of content creation the pupil improperly recalls the story. Our method of automatic plot analysis should allow the tutoring system to automatically analyze the plot of the story and provide relevant feedback to both the pupil and teacher. (Harry Reeves Halpin, Master of Science - School of Informatics - University of Edinburgh, 2003) | |
 | |
 |
 |
| Natural Semantic Metalanguage (NSM) | |
 |
|
The Natural Semantic Metalanguage homepage This site contains information and resources about the 'natural semantic metalanguage' (NSM) approach to semantic analysis, which can lay claim to being the most well-developed, comprehensive and practical approach to cross-cultural semantics on the contemporary scene. The approach is based on evidence that there is a small core of basic, universal meanings, known as semantic primes, which can be found as words or other linguistic expressions in all languages. This common core of meaning can be used as a tool for linguistic and cultural analysis: to explicate complex and culture-specific words and grammatical constructions, and to articulate culture-specific values and attitudes (cultural scripts), in terms which are maximally clear and translatable. The theory also provides a semantic foundation for universal grammar and for linguistic typology. It has applications in intercultural communication, lexicography (dictionary making), language teaching, the study of child language acquisition, legal semantics, and other areas. The main author is Anna Wierzbicka, who is the originator of the theory, but she has many colleagues and collaborators whose works are also listed here.
Semantics: Primes and Universals (Anna Wierzbicka) Conceptual primitives and semantic universals are the cornerstones of a semantic theory which Anna Wierzbicka has been developing for many years. Semantics: Primes and Universals is a major synthesis of her work, presenting a full and systematic exposition of that theory in a non-technical and readable way. It delineates a full set of universal concepts, as they have emerged from large-scale investigations across a wide range of languages undertaken by the author and her colleagues. On the basis of empirical cross-linguistic studies it vindicates the old notion of the "psychic unity of mankind", while at the same time offering a framework for the rigorous description of different languages and cultures.
Definition of "Natural semantic metalanguage" From Wikipedia, the free encyclopedia. | |
 | |
 |
 |
| KIM (Knowledge and Information Management) Platform | |
 |
|
Ontotext KIM The KIM Platform provides a novel Knowledge and Information Management (KIM) infrastructure and services for automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content. The most direct applications of KIM are: generation of metadata for the Semantic Web, which allows hyper-linking and advanced visualization and navigation; and Knowledge Management, enhancing the efficiency of existing indexing, retrieval, classification and filtering applications. Ontotext is a Sirma laboratory for R&D related to knowledge representation, linguistics, and web services. We provide core technology with applications in Knowledge Management, Semantic Web, and integration. Ontotext is proven to be knowledgeable, reliable, and cost-effective in: development of tools and solutions (knowledge management; language engineering; semantic web services; custom reasoning services); and ontology design, evaluation, and mapping (domain analysis and modelling; application-specific ontologies). Our most popular product is the KIM platform for semantic annotation, indexing and retrieval. | |
 | |
 |
 |
|
 |
|
MontyLingua V.2.1 (Python and Java) MontyLingua is a free, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial! | |
 | |
 |
 |
|
 |
|
A syntax / semantic interface using broad-coverage resources in English In Natural Language Processing, we must first compute a semantic representation of a text prior to “understanding” it. We describe here how to pass from a syntactic structure (generated by a syntactic parser of English) to a semantic form (in the form of predicates and relations between these predicates). Our approach is based on the interoperability between several resources, covering syntactical (Link Grammar Parser), lexical (WordNet) and semantic (VerbNet) aspects of English. The joint use of these broad-coverage resources leads to encouraging results on lexical and syntactical disambiguation. That also makes it possible to assign a “semantic probability” to each interpretation of a sentence. (MSc dissertation of François-Régis Chaumartin)
A Practical Semantic Representation For Natural Language Parsing This thesis deals with the problem of building fast, accurate and portable parsers for natural language understanding. Our focus is a multi-domain dialogue system in which we need a deep, linguistically motivated parser to produce representations of the input suitable for reasoning. In this dissertation, we are concerned with building parsers which have the wide coverage and portability offered by a general syntactic grammar without sacrificing parsing speed and accuracy.
Synchronisation des connaissances syntaxiques et sémantiques pour l'analyse d'énoncés en langage naturel à l'aide des grammaires d'arbres adjoints lexicalisées - Djamé Seddah An interface between syntax and semantics aims to propose a logical formalization of the relations between the parts of a sentence. This thesis is a proposal based upon the analysis of problematic linguistic phenomena in the Lexicalized Tree Adjoining Grammar (LTAG) framework. LTAG is a linguistic formalism which provides two structures of representation, the derived tree and the derivation tree. The latter is an almost perfect structure to be used as a canvas for semantic analysis. However, the derivation tree cannot represent coindexations in an autonomous way. We base our proposal upon the study of linguistic phenomena induced by control verbs. In order to allow their treatment and their complete formalization, we modify the initial LTAG formalism by introducing a new piece of lexical information: the control canvas. Its purpose is to integrate the inference of missing argumental links into a synchronous traversal of derived trees and derivation trees via a shared forest. We propose a dynamic reconstruction algorithm based on inference rules. These rules are executed while the derivation tree is extracted from the shared forest. As we use tabular techniques, we can extract, into a dependency graph, all the argumental relations described by one shared forest. | |
 | |
 |
 |
| Other resources on Knowledge Management | |
 |
|
|
 | |
 |
 |
|
 |
|
Google Web APIs With the Google Web APIs service, software developers can query billions of web pages directly from their own computer programs. Google uses the SOAP and WSDL standards so a developer can program in his or her favorite environment - such as Java, Perl, or Visual Studio .NET.
Yahoo! Search Web Services Yahoo! Search Web Services allow you to access Yahoo content and services in your favorite programming languages. This means you can now build Yahoo directly into your own applications.
MSN Web Search SDK The MSN Search SDK provides documentation that describes the core concepts, requirements, development guidelines, and class library for the MSN Search Web Service. The SDK also contains sample code that demonstrates application development techniques using the MSN Search Web Service. | |
 | |
 |
 |
| Advanced Web search engine | |
 |
|
AnswerBus AnswerBus is an open-domain question answering (QA) system based on intelligent information retrieval. It accepts your questions in natural language and extracts answers from the Web. Currently, you can use English, German, French, Italian, Spanish and Portuguese as your languages.
KartOO KartOO is a meta search engine that presents its results in the form of a map. The sites found are represented by pages that are larger or smaller depending on their relevance. Between these sites appear topics that you can simply click to refine your search.
WebClust WebClust is a meta search engine based on a technology called "Document clustering": the automatic organization of documents into meaningful groups. WebClust queries one or more web search engines, parses their result pages to extract the documents (titles, URLs, and short descriptions) and groups the documents based on this information. This process presents the best results of the web in a "horizontal" topical arrangement in addition to a single vertical list. | |
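To make the idea of document clustering over search results concrete, here is a minimal sketch that greedily groups result snippets by word overlap (Jaccard similarity). It is not WebClust's actual algorithm, which the description above does not specify; the stopword list, the threshold, the sample snippets and all function names are assumptions chosen for illustration.

```python
import re

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "on"}


def tokens(text):
    """Lowercase word set of a snippet, without stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(snippets, threshold=0.3):
    """Single-pass greedy clustering of snippets.

    Each snippet joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: {"terms": set of words, "docs": list of snippets}
    for snippet in snippets:
        words = tokens(snippet)
        best = max(clusters, key=lambda c: jaccard(words, c["terms"]), default=None)
        if best is not None and jaccard(words, best["terms"]) >= threshold:
            best["docs"].append(snippet)
            best["terms"] |= words
        else:
            clusters.append({"terms": set(words), "docs": [snippet]})
    return [c["docs"] for c in clusters]


if __name__ == "__main__":
    results = [
        "Jaguar car dealer and car prices",
        "Used jaguar car reviews",
        "Jaguar cat habitat in the rainforest",
        "Rainforest cat species: the jaguar",
    ]
    for group in cluster(results):
        print(group)
```

With the sample snippets, the car-related and animal-related results for the ambiguous query "jaguar" end up in separate groups, which is the "horizontal" topical arrangement the description refers to.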