WordNet WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
Word Sense Disambiguation in a Slot Grammar Framework This is a preliminary report on a system for word sense disambiguation (WSD) for unrestricted vocabulary, which requires no training on tagged text. Disambiguation is done to WordNet word senses. The “disambiguating power” of the system comes from three sources: (A) Parsing by English Slot Grammar (ESG), (B) the WordNet relation system, and (C) the WordNet sense frequency data.
Mapping of EuroWordnet Top Ontology to Upper Cyc Ontology A mapping of the EuroWordnet Top Ontology into the Upper Cyc Ontology is presented. The mapping is expressed as a CycL microtheory encoding of the EuroWordnet Top Ontology, because it cannot be expressed by equivalence and subsumption relations alone.
WordNet::Similarity This is a CPAN module that implements a variety of semantic similarity measures that can be used in conjunction with WordNet. In particular, it supports the measures of Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and Patwardhan-Pedersen.
eXtended WordNet The goal of this project is to develop a tool that takes as input the current or future versions of WordNet and automatically generates an eXtended WordNet that provides several important enhancements intended to remedy the present limitations of WordNet.
NameNet: a Self-Improving Resource for Name Classification This paper presents a semantically structured resource of more than 1,600 Name Classes. This structure is based on the noun hyperonymy hierarchies in WordNet, expanded and validated by corpus evidence collected from the World Wide Web. The set of seed examples provided by WordNet is bootstrapped and then used to automatically construct an annotated training corpus for each Name Class. The resulting Named Entity resource enables a supervised Named Entity Recognizer to identify all the encoded Name Classes with high accuracy and without any human intervention.
BalkaNet The Balkan WordNet aims at the development of a multilingual lexical database comprising individual WordNets for the Balkan languages. The most ambitious feature of BalkaNet is its attempt to represent semantic relations between words in each Balkan language and link them together in order to develop an online multilingual semantic network. The main objective is the development of each language's WordNet from available resources covering the general vocabulary of that language. Semantic relations will be classified in the independent WordNets according to a shared ontology. Then, all individual WordNets will be organized into a common database providing links across them. Each of the WordNets will be structured along the same lines as EuroWordNet through a WordNet Management System. This project is an excellent opportunity to explore the less studied Balkan languages and to combine and compare them cross-linguistically.
WordNet.Net WordNet.Net library - the .Net Framework library for WordNet.
WordNet-based semantic similarity measurement Semantic similarity is a confidence score that reflects the semantic relation between the meanings of two sentences. It is difficult to gain a high accuracy score because the exact semantic meanings are completely understood only in a particular context. | |
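As a rough illustration of the taxonomy-based similarity measures mentioned in the entries above, the sketch below computes a Wu-Palmer style score, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest shared ancestor. It is only a sketch under stated assumptions: the `HYPERNYM` tree is a hypothetical toy taxonomy, not WordNet data, and the function names are invented for this example rather than taken from any of the listed libraries.

```python
# Toy hypernym tree (child -> parent). Hypothetical data, NOT taken from WordNet.
HYPERNYM = {
    "entity": None,
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = HYPERNYM[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("no common ancestor")


def wu_palmer(a, b):
    """Wu-Palmer style similarity on the toy tree, in the range (0, 1]."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wu_palmer("dog", "cat"))
    print(wu_palmer("dog", "bird"))
```

On this toy tree, "dog" and "cat" share the ancestor "mammal" and so score higher than "dog" and "bird", which only share "animal". A real system would take depths and ancestors from WordNet itself and would have to handle words with several senses.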
 | |
 |
 |
|
 |
|
Vecteurs conceptuels et fonctions lexicales : application à l'antonymie (Conceptual vectors and lexical functions: application to antonymy). This thesis deals with representing the thematic aspect of textual segments (documents, paragraphs, phrases, etc.). We rely on a mixed approach (symbolic and vector-based) that aims to combine the information deducible from syntactic structures with the information derived from lexical-semantic representations. Some syntactic forms carry meaning indirectly and can, in general, be modeled using Meaning-Text Theory and lexical functions. Negation, which is very frequent in texts, can be used, among other things, to avoid repetition or to produce statements whose form is not lexically attested, as in the phrases "il n'est pas sérieux" ("he is not serious") and "il n'est pas aimable" ("he is not kind"). The words "sérieux" and "aimable" have no well-attested opposites; the terms "léger" ("frivolous") and "désagréable" ("unpleasant") are at best approximations. Negation does not always mean the opposite of an affirmation, as in the sentence "elle n'est pas belle, elle est superbe" ("she is not beautiful, she is gorgeous"). By contrast, in "il n'est pas mort" ("he is not dead"), negation expresses, a priori, the opposite idea, "il est vivant" ("he is alive"), though with the problems of polysemy and figurative senses: "vivant" can also be used in the sense of cheerful or lively.
Antonymy and Semantic Range in English This dissertation investigates what makes two words antonyms. Previous research has not adequately explained why some words seem to contrast in meaning but are still not considered antonyms (e.g. large and little) nor can it explain why some words have two antonyms (e.g., happy/sad and happy/unhappy). An explanation is given here using the notion of "semantic range" (a description of a word's typical collocation patterns); antonyms are shown to be words which have a great deal of semantic range in common. | |
 | |
 |
 |
| Other resources on ontologies | |
 |
|
Sites Relevant to Ontologies and Knowledge Sharing A list of resources on Ontologies and Knowledge Sharing.
John Bateman's ontology portal This page is a collection of starting points for information on ontologies gathered together for ease of reference for our own ontology-related projects.
Fine-Grained Proper Noun Ontologies for Question Answering The WordNet lexical ontology, which is primarily composed of common nouns, has been widely used in retrieval tasks. Here, we explore the notion of a fine-grained proper noun ontology and argue for the utility of such an ontology in retrieval tasks. To support this claim, we build a fine-grained proper noun ontology from unrestricted news text and use this ontology to improve performance on a question answering task.
An introduction to Ontology by John F. Sowa Ontology is the study of existence. An ontology is a system of categories for classifying and talking about the things that are assumed to exist. This directory contains a summary of the ontology developed and used in the KR book by John Sowa.
KBS / Ontology Projects Worldwide Some ongoing KBS/Ontology projects and groups.
OWL Web Ontology Language Reference The Web Ontology Language OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. | |
 | |
 |
 |
|
 |
|
Sekine's Extended Named Entity Hierarchy The Extended Named Entity Hierarchy is designed and developed to meet the increasing need for a wider range of NE types. It originates from the first Named Entity set defined by MUC (Grishman et al., 1996), the Named Entity set developed by IREX (Sekine et al., 2000), and the Extended Named Entity hierarchy, which contains approximately 150 NE types (Sekine et al., 2002). It has since been extended again to 200 NE types. The applications include Question Answering (Q&A) systems that analyze general texts such as newspaper articles, as well as Information Extraction (IE), Machine Translation (MT), Summarization and Information Retrieval (IR) systems that serve a variety of NLP applications. We designed the Extended Named Entity Hierarchy for Q&A and IE systems, assuming that the information one wants to know basically takes the form of a noun phrase with a specific name, a time expression or a numerical value.
NameNet: a Self-Improving Resource for Name Classification This paper presents a semantically structured resource of more than 1,600 Name Classes. This structure is based on the noun hyperonymy hierarchies in WordNet, expanded and validated by corpus evidence collected from the World Wide Web. The set of seed examples provided by WordNet is bootstrapped and then used to automatically construct an annotated training corpus for each Name Class. The resulting Named Entity resource enables a supervised Named Entity Recognizer to identify all the encoded Name Classes with high accuracy and without any human intervention. | |
 | |
 |
 |
|
 |
|
Story understanding resources A list of resources on story understanding.
Story understanding through multi-representation model construction We present an implemented model of story understanding and apply it to the understanding of a children’s story. We argue that understanding a story consists of building multirepresentation models of the story and that story models are efficiently constructed using a satisfiability solver. We present a computer program that contains multiple representations of commonsense knowledge, takes a narrative as input, transforms the narrative and representations of commonsense knowledge into a satisfiability problem, runs a satisfiability solver, and produces models of the story as output. The narrative, models, and representations are expressed in the language of Shanahan’s event calculus.
Understanding script-based stories using commonsense reasoning This paper investigates the use of commonsense reasoning to understand texts involving stereotypical activities or scripts. We present a system that understands news stories involving four terrorism scripts. The system (1) builds a commonsense reasoning problem given an information extraction template representing a terrorist incident, and (2) uses commonsense reasoning and a commonsense knowledge base to build a model of the terrorist incident. The reasoning problem, commonsense knowledge base, and model are expressed in the classical logic event calculus. The system was developed using the MUC3 and MUC4 development data set. We present the results of running the system on the MUC3 and MUC4 test data sets, using manually generated answer key templates and templates generated automatically by two MUC4 information extraction systems. We present a detailed analysis of the models produced by the system given automatically generated templates. We present methods for answering questions based on the models produced by our system. We assess the portability of the system by extending it to handle 10 scripts frequent in Project Gutenberg American literature texts.
Prospects for in-depth story understanding by computer (Erik T. Mueller - November 29, 1999) While much research on the hard problem of in-depth story understanding by computer was performed starting in the 1970s, interest shifted in the 1990s to information extraction and word sense disambiguation. Now that a degree of success has been achieved on these easier problems, I propose it is time to return to in-depth story understanding. In this paper I examine the shift away from story understanding, discuss some of the major problems in building a story understanding system, present some possible solutions involving a set of interacting understanding agents, and provide pointers to useful tools and resources for building story understanding systems.
The Plots of Children and Machines: The Statistical and Symbolic Semantic Analysis of Narratives This thesis presents a method of automatic plot analysis of narrative texts that uses both components of traditional symbolic analysis of natural language and statistical machine-learning. In particular, we are investigating the story rewriting task. In the story rewriting task, an exemplar story is read to the pupils and the pupils rewrite the story in StoryStation, which allows them to concentrate more on diction and grammar than on content creation. However, often in the process of content creation the pupil improperly recalls the story. Our method of automatic plot analysis should allow the tutoring system to automatically analyze the plot of the story and provide relevant feedback to both the pupil and teacher. (Harry Reeves Halpin, Master of Science - School of Informatics - University of Edinburgh, 2003) | |
 | |
 |
 |
| Natural Semantic Metalanguage (NSM) | |
 |
|
The Natural Semantic Metalanguage homepage This site contains information and resources about the 'natural semantic metalanguage' (NSM) approach to semantic analysis, which can lay claim to being the most well-developed, comprehensive and practical approach to cross-cultural semantics on the contemporary scene. The approach is based on evidence that there is a small core of basic, universal meanings, known as semantic primes, which can be found as words or other linguistic expressions in all languages. This common core of meaning can be used as a tool for linguistic and cultural analysis: to explicate complex and culture-specific words and grammatical constructions, and to articulate culture-specific values and attitudes (cultural scripts), in terms which are maximally clear and translatable. The theory also provides a semantic foundation for universal grammar and for linguistic typology. It has applications in intercultural communication, lexicography (dictionary making), language teaching, the study of child language acquisition, legal semantics, and other areas. The main author is Anna Wierzbicka, who is the originator of the theory, but she has many colleagues and collaborators whose works are also listed here.
Semantics: Primes and Universals (Anna Wierzbicka) Conceptual primitives and semantic universals are the cornerstones of a semantic theory which Anna Wierzbicka has been developing for many years. Semantics: Primes and Universals is a major synthesis of her work, presenting a full and systematic exposition of that theory in a non-technical and readable way. It delineates a full set of universal concepts, as they have emerged from large-scale investigations across a wide range of languages undertaken by the author and her colleagues. On the basis of empirical cross-linguistic studies it vindicates the old notion of the "psychic unity of mankind", while at the same time offering a framework for the rigorous description of different languages and cultures.
Definition of "Natural semantic metalanguage" From Wikipedia, the free encyclopedia. | |
 | |
 |
 |
| KIM (Knowledge and Information Management) Platform | |
 |
|
Ontotext KIM The KIM Platform provides a novel Knowledge and Information Management (KIM) infrastructure and services for automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content. The most direct applications of KIM are: generation of metadata for the Semantic Web, which allows hyperlinking and advanced visualization and navigation; and Knowledge Management, enhancing the efficiency of existing indexing, retrieval, classification and filtering applications. Ontotext is a Sirma laboratory for R&D related to knowledge representation, linguistics, and web services. We provide core technology with applications in Knowledge Management, Semantic Web, and integration. Ontotext is proven to be knowledgeable, reliable, and cost-effective in the development of tools and solutions (knowledge management; language engineering; semantic web services; custom reasoning services) and in ontology design, evaluation, and mapping (domain analysis and modelling; application-specific ontologies). Our most popular product is the KIM platform for semantic annotation, indexing and retrieval. | |
 | |
 |
 |
|
 |
|
MontyLingua V.2.1 (Python and Java) MontyLingua is a free, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information. MontyLingua makes traditionally difficult language processing tasks trivial! | |
 | |
 |
 |
|
 |
|
A syntax / semantic interface using broad-coverage resources in English In Natural Language Processing, we must first compute a semantic representation of a text prior to “understanding” it. We describe here how to pass from a syntactic structure (generated by a syntactic parser of English) to a semantic form (in the form of predicates and relations between these predicates). Our approach is based on the interoperability between several resources, covering syntactical (Link Grammar Parser), lexical (WordNet) and semantic (VerbNet) aspects of English. The joint use of these broad-coverage resources leads to encouraging results on lexical and syntactical disambiguation. That also makes it possible to assign a “semantic probability” to each interpretation of a sentence. (MSc dissertation of François-Régis Chaumartin)
A Practical Semantic Representation For Natural Language Parsing This thesis deals with the problem of building fast, accurate and portable parsers for natural language understanding. Our focus is a multi-domain dialogue system in which we need a deep, linguistically motivated parser to produce representations of the input suitable for reasoning. In this dissertation, we are concerned with building parsers which have the wide coverage and portability offered by a general syntactic grammar without sacrificing parsing speed and accuracy.
Synchronisation des connaissances syntaxiques et sémantiques pour l'analyse d'énoncés en langage naturel à l'aide des grammaires d'arbres adjoints lexicalisées - Djamé Seddah An interface between syntax and semantics aims to propose a logical formalization of the relations between the parts of a sentence. This thesis is a proposal based upon the analysis of problematic linguistic phenomena in the Lexicalized Tree Adjoining Grammar (LTAG) framework. LTAG is a linguistic formalism which provides two structures of representation, the derived tree and the derivation tree. The latter is an almost perfect structure to be used as a canvas for semantic analysis. However, the derivation tree cannot represent coindexations in an autonomous way. We base our proposal on the study of linguistic phenomena induced by control verbs. In order to allow their treatment and their complete formalization, we modify the initial LTAG formalism by introducing a new piece of lexical information: the control canvas. Its purpose is to integrate the inference of missing argumental links into a synchronous course of derived trees and derivation trees via a shared forest. We propose a dynamic reconstruction algorithm based on inference rules. These rules are executed while the derivation tree is extracted from the shared forest. As we use tabular techniques, we can extract, into a dependency graph, all the argumental relations described by one shared forest. | |
 | |
 |
 |
| Other resources on Knowledge Management | |
 |
|
|
 | |
 |
 |
|
 |
|
Google Web APIs With the Google Web APIs service, software developers can query billions of web pages directly from their own computer programs. Google uses the SOAP and WSDL standards so a developer can program in his or her favorite environment - such as Java, Perl, or Visual Studio .NET.
Yahoo! Search Web Services Yahoo! Search Web Services allow you to access Yahoo content and services in your favorite programming languages. This means you can now build Yahoo directly into your own applications.
MSN Web Search SDK The MSN Search SDK provides documentation that describes the core concepts, requirements, development guidelines, and class library for the MSN Search Web Service. The SDK also contains sample code that demonstrates application development techniques using the MSN Search Web Service. | |
 | |
 |
 |
| Advanced Web search engine | |
 |
|
AnswerBus AnswerBus is an open-domain question answering (QA) system based on intelligent information retrieval. It accepts questions in natural language and extracts answers from the Web. Currently, you can ask in English, German, French, Italian, Spanish and Portuguese.
KartOO KartOO is a meta search engine that presents its results as a map. The sites found are represented by pages that are larger or smaller depending on their relevance. Between these sites appear topics that you can simply click to refine your search.
WebClust WebClust is a meta search engine based on a technology called "Document clustering": the automatic organization of documents into meaningful groups. WebClust queries one or more web search engines, parses their result pages to extract the documents (titles, URLs, and short descriptions) and groups the documents based on this information. This process presents the best results of the web in a "horizontal" topical arrangement in addition to a single vertical list. | |
 | |
 |
 |
| Other resources on Question Answering / Information Extraction | |
 |
|
Sekine's Extended Named Entity Hierarchy The Extended Named Entity Hierarchy is designed and developed to meet increasing needs for wider range of NE types. It originates from the first Named Entity set defined by MUC (Grishman et al., 1996), the Named Entity set developed by IREX (Sekine et al., 2000), and the Extended Named Entity hierarchy which contains approximately 150 NE types (Sekine et al., 2002). But now it extened again t 200 NE types. The applications include Questions and Answering (Q&A) system that analyzes general texts such as newspaper articles, as well as Information Extraction (IE), Machine Translation (MT), Summarization and Information Retrieval (IR) systems that meet variety of NLP applications. We designe the Extended Named Entity Hierarchy, so that Q&A system or IE system assuming that information one wants know is basically in a form of noun phrase with specific names, time expression or numerical values. | |
 | |
 |
 |
| Jena 2 - A Semantic Web Framework | |
 |
|
Jena 2 Jena is a Java framework for writing Semantic Web applications.
Jena 2 source code Jena is a Java framework for writing Semantic Web applications.
Wicked Cool Java: Crawling the Semantic Web (Get started with RDF) Brian Eubanks explains how Java developers can participate in the Semantic Web, a project that strives to create a universal medium for information exchange by linking concepts together. He introduces the Resource Description Framework standard and presents some APIs that aid in producing or consuming content. | |
 | |
 |
 |
|
 |
|
Introduction to Semantic Web Technologies This is intended to give someone new to the Semantic Web a basic overview of the technologies involved, and a guide to where to go to find out more.
What is the Semantic Web? Currently the focus of a W3C working group, the Semantic Web vision was conceived by Tim Berners-Lee, the inventor of the World Wide Web. The World Wide Web changed the way we communicate, the way we do business, the way we seek information and entertainment – the very way most of us live our daily lives. Calling it the next step in Web evolution, Berners-Lee defines the Semantic Web as “a web of data that can be processed directly and indirectly by machines.”
Cerebra Technologies A list of technologies used by the Cerebra Semantic Web product. | |
 | |
 |
 |
|
 |
|
Encyclopedia.com Encyclopedia.com, the Internet's premier free encyclopedia, provides users with more than 57,000 frequently updated articles from the Columbia Encyclopedia, Sixth Edition. Each article is enhanced with links to newspaper and magazine articles as well as pictures and maps - all provided by HighBeam Research.
Wikipedia Wikipedia is a free-content encyclopedia that anyone can edit.
Probert Encyclopaedia The Probert Encyclopaedia is a truly independent reference work covering all aspects of human knowledge. Because we only use independent researchers, not sponsored by industry, corporations, governments or advertisers, our data is reliable and unbiased, unlike many other sources, including the so-called 'experts' one reads, sees and hears in the media. The data within The Probert Encyclopaedia is arranged into concise articles which are classified by their scope: people, places, nature, food and drink, costume &c. These articles are fully inter-linked, allowing additional data to be quickly and easily obtained as required, and most articles have a research link that retrieves all related data with a single mouse click. You can easily search for a specific topic or subject, or if you prefer, browse through the over 235,000 fully inter-linked articles to discover specific information about everything imaginable, whether it's a famous actor or a particular warship, revolver, phobia or medical complaint.
Probert Encyclopaedia former (freeware) HTML content This text-only encyclopedia (not a program) is comprehensive enough to be useful. It is divided into major topical sections, within which entries are sorted alphabetically. There are some areas covered which some "popular" CD-ROM dictionaries often neglect. This encyclopedia has become a shareware / commercial product – the final freeware versions listed here are still available, but are no longer being updated.
Columbia Encyclopedia The Columbia Electronic Encyclopedia contains almost 52,000 entries (marshalling six and one-half million words on a vast range of topics), with more than 84,000 hypertext cross-references. Columbia Encyclopedia is among the most complete and up-to-date electronic encyclopedias ever produced.
Ethnologue - languages of the world An encyclopedic reference work cataloging all of the world’s 6,912 known living languages.
Britannica Concise Encyclopedia A one-volume encyclopedia that includes 25,000 short entries. | |
 | |
 |
 |
|
 |
|
Wordsmyth Wordsmyth is a dictionary that has several important and distinctive qualities. Chief among the distinctive features are (1) clarity, simplicity, and precision of style resulting in definitions that are more accessible than those of American college dictionaries; and (2) the integration of dictionary and thesaurus data, so that only one entry is required instead of both dictionary and thesaurus entries.
Edventures Term Browser Look up tricky math and science terms
Smartpedia.com An encyclopedia licensed under the GNU Free Documentation License (GFDL).
WordReference.com The WordReference Dictionaries are free online translation dictionaries. Type in a word in the forms to the left for a quick translation or definition.
The Free Dictionary English, Medical, Legal, Financial, and Computer Dictionaries, Thesaurus, Acronyms, Encyclopedia, a Literature Reference Library, and a Search Engine all in one!
OneLook OneLook brings together various general and specialized dictionaries in a single search tool. The subjects cover almost every field. It includes nearly 1,000 dictionaries with more than 6,000,000 words.
Cambridge Dictionaries Online Dictionaries published for people learning English all around the world.
UltraLingua Contains over 120,000 definitions & 80,000 synonyms. The online interface allows you to search for words or parts of words, search within definitions to find headwords (Reverse Dictionary), and search for words that sound alike (Phonetic Dictionary).
Longman Dictionary of Contemporary English (LDOCE) You can use the Longman Web Dictionary to look up ANY word on the Web. It contains over 80,000 words and phrases, including 15,000 references to people, places, events and organizations.
The American Heritage Dictionary of the English Language The American Heritage® Dictionary of the English Language, Third Edition, contains more than 350,000 entries and senses. Word definitions are further illustrated by more than 34,000 usage examples, more than 500 usage notes, and a recently revised appendix of Indo-European roots. | |
 | |
 |
 |
|
 |
|
Webopedia The only online dictionary and search engine you need for computer and Internet technology definitions.
FOLDOC Free On-Line Dictionary Of Computing
Glossary of legal terms Based on Merriam-Webster's Dictionary of Law 2001.
EconomicExpert.com This site is intended as a resource for those working or interested in working on macro-economy research, training, education and economic development. We provide a comprehensive and searchable reference tool on the web; our website is completely free and non-profit.
Dictionary of legal terms An online English dictionary of legal terms with clear explanations of 3,000 common legal terms.
The WorldWideWeb Acronym and Abbreviation Server You can search here for acronyms and for words used in acronyms. An acronym is a label formed from the beginnings of words (Greek: acro [head] and nym [word]) -- or very rarely, from letters in the middle of words. There is no requirement that an acronym be pronounceable as a normal word (this is a curious myth perpetuated by American dictionaries): IBM is just as much an acronym as LASER.
Find out what those acronyms and abbreviations stand for The web's most comprehensive dictionary of acronyms, abbreviations, and initialisms (414,000+ definitions).
Find the meanings of military terms and acronyms With Military Words, you can search for military/government acronyms and abbreviations (powered by Acronym Finder) and military terms from the US DoD Joint Publication
MedTerms The MedTerms Medical Dictionary is somewhat different from the traditional medical dictionary. Since this Medical Dictionary was first conceived some years ago, the medical staff of MedicineNet.com has added (and subtracted) entries almost daily. We have also revised existing entries on an ongoing basis. The MedTerms Medical Dictionary is an online publication with the advantages of this electronic medium.
Merriam-Webster medical dictionary A very comprehensive dictionary of medical English.
BioTech life science dictionary Currently, most of our 8300+ terms deal with biochemistry, biotechnology, botany, cell biology and genetics. We also have some terms relating to ecology, limnology, pharmacology, toxicology and medicine.
The CMU Pronouncing Dictionary The Carnegie Mellon University Pronouncing Dictionary is a machine-readable pronunciation dictionary for North American English that contains over 125,000 words and their transcriptions. This format is particularly useful for speech recognition and synthesis, as it has mappings from words to their pronunciations in the given phoneme set. The current phoneme set contains 39 phonemes, for which the vowels may carry lexical stress. | |
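The word-to-phoneme mapping described in the CMU Pronouncing Dictionary entry above can be read with a few lines of code. The sketch below assumes the commonly distributed plain-text layout: one entry per line, the word followed by whitespace-separated phonemes, a digit suffix on vowels for stress, `;;;` for comment lines, and `WORD(1)` for alternative pronunciations. The `SAMPLE` lines are illustrative, and the function names are hypothetical helpers, not part of any CMU tool. Entries for punctuation symbols, if present, are not handled.

```python
def parse_cmudict_line(line):
    """Parse one dictionary line into (word, [phonemes]), or None if skipped."""
    line = line.strip()
    if not line or line.startswith(";;;"):  # blank line or comment
        return None
    head, *phones = line.split()
    if not phones:
        return None
    # Alternative pronunciations are assumed to look like "READ(1)".
    word = head.split("(", 1)[0]
    return word, phones


def load_lexicon(text):
    """Build {word: [pronunciation, ...]} from dictionary text."""
    lexicon = {}
    for line in text.splitlines():
        parsed = parse_cmudict_line(line)
        if parsed is not None:
            word, phones = parsed
            lexicon.setdefault(word, []).append(phones)
    return lexicon


def stress_pattern(phones):
    """Concatenate the stress digits carried by the vowel phonemes."""
    return "".join(p[-1] for p in phones if p[-1].isdigit())


# Illustrative sample in the assumed format.
SAMPLE = """;;; sample entries
HELLO  HH AH0 L OW1
READ  R EH1 D
READ(1)  R IY1 D
"""

if __name__ == "__main__":
    lexicon = load_lexicon(SAMPLE)
    for word, prons in lexicon.items():
        for phones in prons:
            print(word, " ".join(phones), stress_pattern(phones))
```

Keeping a list of pronunciations per word, rather than a single value, preserves the alternatives that a speech recognizer or synthesizer may need to choose between.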
 | |
 |
 |
|
 |
|
P# P# is a compiler which facilitates interoperation between a concurrent superset of the Prolog programming language and C#. This enables Prolog to be used as a native implementation language for Microsoft's .NET platform. P# compiles a linear logic extension of Prolog to C# source code.
PROLOG tutorial in French These course notes correspond to a 20-hour module intended for second-year IUT students.
On-line guide to PROLOG programming A contribution to the evolving area of logic programming languages, and PROLOG in particular.
PROLOG theorem solver PROLOG theorem solver.
Solutions for "The Zebra Puzzle" This is an example of a completely specified solution which doesn't appear to be specified at all. The constraints are such that the answer is unique, but they are stated in such a way that it is not at all obvious (to this human, at least) what the answer is.
The µ-TBL Homepage The µ-TBL system represents an attempt to use the search and database capabilities of the Prolog programming language to implement a generalized form of transformation-based learning. | |
 | |
 |
 |
|
 |
|
OpenCCG: The OpenNLP CCG Library OpenCCG, the OpenNLP CCG Library, is an open source natural language processing library written in Java, which provides parsing and realization services based on Mark Steedman's Combinatory Categorial Grammar (CCG) formalism.
AnswerFinder This is a general purpose open-domain question answering system (written in Java) that draws its answers from the Internet.
Instance-Based Learning: A Java Implementation Instance-Based Learning (IBL) is defined as the generalizing of a new instance (target) to be classified from the stored training examples. Training examples are processed when a new instance arrives. Instance-Based Learning methods are sometimes called Lazy Learning because they delay the processing until a new instance must be classified. Each time a new query instance is encountered, its relationship to the previously stored examples is examined to assign a target function value for the new instance.
Selection Engine Selection Engine is a Java Case-Based Reasoning (CBR) tool.
Java KBtextmaster Natural Language Processing Toolkit Utilities for reading a variety of file formats (e.g., Microsoft Word, PowerPoint, PDF, OpenOffice.org, AbiWord), part-of-speech tagging, automatic categorization, extraction of human and place names from text, automatic summarization, document clustering, full indexing and search (using Lucene), etc.
GATE GATE is one of the most widely used human language processing systems in the world. It is a tool for: scientists performing experiments that involve processing human language; companies developing applications with language processing components; teachers and students of courses about language and language computation. GATE comprises an architecture, framework (or SDK) and graphical development environment, and has been built over the past eight years in the Sheffield NLP group. The system has been used for many language processing projects; in particular for Information Extraction in many languages. The system supports the full lifecycle of language processing components, from corpus collection and annotation through system evaluation. GATE is funded by the EPSRC and the EU.
Noun Phrase Chunker This application is a Java implementation of the Ramshaw and Marcus BaseNP chunker (in fact the files in the resources directory are taken straight from their original distribution), which attempts to insert brackets marking noun phrases in text that has been marked with POS tags in the same format as the output of Eric Brill's transformational tagger. The output from this version should be identical to the output of the original C++/Perl version released by Ramshaw and Marcus. A wrapper is also included which allows the easy use of this chunker within the GATE framework.
Jena 2 source code Jena is a Java framework for writing Semantic Web applications.
Lucene Java The Apache Lucene project develops open-source search software, including Lucene Java, our flagship sub-project, which provides Java-based indexing and search technology.
PowerLoom Knowledge Representation System PowerLoom™ is the successor to the Loom™ knowledge representation system. It provides a language and environment for constructing intelligent applications. PowerLoom uses a fully expressive, logic-based representation language (a variant of KIF), and it uses a natural-deduction-style backward and forward chainer as its inference engine. The inference engine is not a complete first-order theorem prover, but it can handle complex rules, negation, equality reasoning, subsumption, and restricted forms of higher order reasoning. PowerLoom has a classifier that is able to classify descriptions expressed in full first order predicate calculus [See paper]. PowerLoom uses modules as a structuring device for knowledge bases, and lightweight worlds for classification and hypothetical reasoning. To implement PowerLoom we developed a new programming language called STELLA, which is a Strongly Typed, Lisp-like LAnguage that can be translated into Lisp, C++ and Java. PowerLoom is written in STELLA and therefore available in Common-Lisp, C++ and Java versions.
Wicked Cool Java: Crawling the Semantic Web (Get started with RDF) Brian Eubanks explains how Java developers can participate in the Semantic Web, a project that strives to create a universal medium for information exchange by linking concepts together. He introduces the Resource Description Framework standard and presents some APIs that aid in producing or consuming content.
Stanford Log-linear POS Tagger download This is a Java implementation of the log-linear part-of-speech (POS) taggers. | |
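The Instance-Based Learning entry in this list describes lazy learning: training examples are simply stored, and the work happens only when a query arrives. A minimal sketch of that idea follows, using k-nearest-neighbour voting as one common instance-based method. It is not taken from the linked Java implementation, is written in Python for brevity, and the function name `classify`, the toy data and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter


def classify(query, examples, k=3):
    """Label `query` by majority vote among its k nearest stored examples.

    `examples` is a list of (feature_vector, label) pairs. Nothing is
    precomputed: the stored examples are only examined at query time.
    """
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
```

Calling `classify((1.1, 1.0), training)` examines the stored examples relative to the new point and assigns it the label shared by most of its three closest neighbours.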
 | |
 |
 |
| .NET (C#, VB.NET, Delphi.NET…) | |
 |
|
Artificial Mind : .NET SDK for Artificial Intelligence ArtificialMind is a free Artificial Intelligence platform (SDK) that provides the following services: Search Algorithms for problem solving, Genetic Algorithms and Artificial Neural Networks.
NSolver NSolver is a powerful programming language extension for ECMA CLS-compliant languages. It adds constraint programming capabilities to CLS-compliant languages.
NxBRE NxBRE is a lightweight Business Rule Engine (aka Rule Based Engine) for the .NET platform, composed of a forward-chaining inference engine and an XML-driven flow control engine. It supports RuleML 0.86 Naf Datalog and Visio 2003 modeling.
P# P# is a compiler which facilitates interoperation between a concurrent superset of the Prolog programming language and C#. This enables Prolog to be used as a native implementation language for Microsoft's .NET platform. P# compiles a linear logic extension of Prolog to C# source code.
DotLucene DotLucene is a powerful open-source search engine for .NET.
FLUtE (Fuzzy Logic Ultimate Engine) FLUtE, Fuzzy Logic Ultimate Engine, is a library released under the LGPL license that allows users to strengthen their projects with fuzzy logic techniques.
NooJ NooJ is a linguistic development environment that includes large-coverage dictionaries and grammars, and parses corpora in real time. NooJ includes tools to create and maintain large-coverage lexical resources, as well as morphological and syntactic grammars. Dictionaries and grammars are applied to texts in order to locate morphological, lexical and syntactic patterns and tag simple and compound words. NooJ can build complex concordances, with respect to all types of Finite State and Context-Free patterns. NooJ users can easily develop extractors to identify semantic units in large texts, such as names of persons, locations, dates, technical expressions of finance, etc.
WordNet.Net WordNet.Net library - the .Net Framework library for WordNet.
WordNet-based semantic similarity measurement Semantic similarity is a confidence score that reflects the semantic relation between the meanings of two sentences. It is difficult to gain a high accuracy score because the exact semantic meanings are completely understood only in a particular context. | |
 | |
 |
 |
|
 |
|
Link Grammar Parser The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a "constituent" representation of a sentence (showing noun phrases, verb phrases, etc.).
CASS chunker Cass. A fast, robust partial parser developed by Steven Paul Abney. CASS is a partial parser designed for use with large amounts of noisy text.
YamCha: Yet Another Multipurpose CHunk Annotator YamCha is a generic, customizable, and open source text chunker oriented toward many NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. YamCha uses a state-of-the-art machine learning algorithm called Support Vector Machines (SVMs), first introduced by Vapnik in 1995.
SS Tagger - a part-of-speech tagger for English Tagging speed is crucial in large-scale information extraction and real-time NLP applications. This part-of-speech (POS) tagger offers fast tagging (2400 tokens/sec) with state-of-the-art accuracy (97.10% on the WSJ corpus). The tagger uses an extension of Maximum Entropy Markov Models (MEMM), in which tags are determined in an easiest-first manner.
Eric Brill's trainable rule-based part of speech tagger The NLP programs that you can download: a supervised part of speech tagger, an unsupervised part of speech tagger, and a prepositional phrase attachment program. This tagger is based on transformation-based error-driven learning, a technique that has been effective in a number of natural language applications, including part of speech and word sense tagging, prepositional phrase attachment, and syntactic parsing.
SVMTool The SVMTool is a simple and effective generator of sequential taggers based on Support Vector Machines. We have applied the SVMTool to the problem of part-of-speech tagging. By means of a rigorous experimental evaluation, we conclude that the proposed SVM-based tagger is robust and flexible for feature modelling (including lexicalization), trains efficiently with almost no parameters to tune, and is able to tag thousands of words per second, which makes it really practical for real NLP applications. Regarding accuracy, the SVM-based tagger significantly outperforms the TnT tagger under exactly the same conditions, and achieves a very competitive accuracy of 97.2% for English on the Wall Street Journal corpus, which is comparable to the best taggers reported to date.
NeoClassic: The C++ Version of Classic Classic is a family of knowledge representation (KR) systems designed for applications where only limited expressive power is necessary, but rapid responses to questions are essential. The Classic systems are based on description logics (DLs), which gives them an object-centered flavor, and thus most of the features available in semantic networks are also available in Classic. Classic has a framework that allows users to represent descriptions, concepts, roles, individuals and rules. Classic allows for both primitive concepts, similar to the classes and frames of other knowledge representation systems and object-oriented programming languages, and defined concepts, i.e. concepts that have both necessary and sufficient conditions for membership. Concepts are automatically organized into a generalization taxonomy and objects are automatically made instances of all concepts for which they pass the membership test. Another type of reasoning that Classic does is to detect inconsistencies in information that it is told. In the presence of defined concepts these operations are non-trivial and useful.
ThoughtTreasure ThoughtTreasure is a commonsense knowledge base and architecture for natural language processing that uses multiple representations including logic, finite automata, grids, and scripts.
PowerLoom Knowledge Representation System PowerLoom™ is the successor to the Loom™ knowledge representation system. It provides a language and environment for constructing intelligent applications. PowerLoom uses a fully expressive, logic-based representation language (a variant of KIF), and it uses a natural-deduction-style backward and forward chainer as its inference engine. The inference engine is not a complete first-order theorem prover, but it can handle complex rules, negation, equality reasoning, subsumption, and restricted forms of higher order reasoning. PowerLoom has a classifier that is able to classify descriptions expressed in full first order predicate calculus [See paper]. PowerLoom uses modules as a structuring device for knowledge bases, and lightweight worlds for classification and hypothetical reasoning. To implement PowerLoom we developed a new programming language called STELLA, which is a Strongly Typed, Lisp-like LAnguage that can be translated into Lisp, C++ and Java. PowerLoom is written in STELLA and therefore available in Common-Lisp, C++ and Java versions.
MINIPAR MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient: on a Pentium II 300 with 128MB memory, it parses about 300 words per second. | |
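To make the precision and recall figures quoted for MINIPAR concrete, here is a minimal sketch of how such scores can be computed over dependency relationships. It assumes each relationship is compared as a `(head, relation, dependent)` triple with exact matching, which may differ from the actual SUSANNE evaluation protocol. The example triples and the function names are hypothetical.

```python
def precision_recall(predicted, gold):
    """Score predicted dependency triples against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical triples for one sentence: (head, relation, dependent).
gold = {
    ("saw", "subj", "John"),
    ("saw", "obj", "dog"),
    ("dog", "det", "the"),
    ("saw", "mod", "yesterday"),
}
predicted = {
    ("saw", "subj", "John"),
    ("saw", "obj", "dog"),
    ("dog", "det", "a"),
}

p, r = precision_recall(predicted, gold)
```

Here two of the three predicted triples are correct and two of the four gold triples are found. Applying `f1` to the reported 0.88 precision and 0.80 recall gives roughly 0.84.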
 | |
 |
 |
| Description logics languages | |
 |
|
KRHyper This is the homepage of the new implementation of KRHyper in OCaml. KRHyper is a first-order logic theorem proving and model generation system based on the hyper tableau calculus.
PowerLoom Knowledge Representation System PowerLoom™ is the successor to the Loom™ knowledge representation system. It provides a language and environment for constructing intelligent applications. PowerLoom uses a fully expressive, logic-based representation language (a variant of KIF), and it uses a natural-deduction-style backward and forward chainer as its inference engine. The inference engine is not a complete first-order theorem prover, but it can handle complex rules, negation, equality reasoning, subsumption, and restricted forms of higher order reasoning. PowerLoom has a classifier that is able to classify descriptions expressed in full first order predicate calculus [See paper]. PowerLoom uses modules as a structuring device for knowledge bases, and lightweight worlds for classification and hypothetical reasoning. To implement PowerLoom we developed a new programming language called STELLA, which is a Strongly Typed, Lisp-like LAnguage that can be translated into Lisp, C++ and Java. PowerLoom is written in STELLA and therefore available in Common-Lisp, C++ and Java versions.
NeoClassic: The C++ Version of Classic Classic is a family of knowledge representation (KR) systems designed for applications where only limited expressive power is necessary, but rapid responses to questions are essential. The Classic systems are based on description logics (DLs), which gives them an object-centered flavor, and thus most of the features available in semantic networks are also available in Classic. Classic has a framework that allows users to represent descriptions, concepts, roles, individuals and rules. Classic allows for both primitive concepts, similar to the classes and frames of other knowledge representation systems and object-oriented programming languages, and defined concepts, i.e. concepts that have both necessary and sufficient conditions for membership. Concepts are automatically organized into a generalization taxonomy and objects are automatically made instances of all concepts for which they pass the membership test. Another type of reasoning that Classic does is to detect inconsistencies in information that it is told. In the presence of defined concepts these operations are non-trivial and useful.
RACER RacerPro is an OWL reasoner and inference server for the Semantic Web.
The Description Logic Handbook : Theory, Implementation and Applications Description Logics are knowledge representation languages that have been studied extensively in artificial intelligence over the last two decades. This Handbook covers all aspects of research in this field, including theory, implementation, and applications. Its appeal is broad, ranging from more theoretically-oriented readers to those with more practically-oriented interests who need a sound and modern understanding of knowledge representation systems based on Description Logics. The chapters, written by some of the most prominent researchers in the field, first introduce the basic technical material before addressing the current state of the subject. This unique reference can also be used for self-study or in conjunction with knowledge representation and artificial intelligence courses.
Automated reasoning tools directory A full list of theorem provers and satisfiability solvers. | |
 | |
 |
 |
|
 |
|
Reuters Corpus, Volume 1 (RCV1) In 2000, Reuters Ltd made available a large collection of Reuters News stories for use in research and development of natural language processing, information retrieval, and machine learning systems. This corpus, known as "Reuters Corpus, Volume 1" or RCV1, is significantly larger than the older, well-known Reuters-21578 collection heavily used in the text classification community.
Reuters-21578 The data was originally collected and labeled by Carnegie Group, Inc. and Reuters, Ltd. in the course of developing the CONSTRUE text categorization system. The collection is available here as a gzipped tar archive (8.2 MB; 28.0 MB uncompressed). | |
 | |
 |
 |
|
 |
|
Moby lexicon project Moby Words is part of the Moby Project, a large collection of lists of words and phrases, and works of literature (contents are now in the public domain). Partial contents of Moby Words:
- Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.
- 74,550 common dictionary words. A list of words in common with two or more published dictionaries. This gives the developer of a custom spelling checker a good beginning pool of relatively common words.
- 4,946 female names. Frequent given names of females in English speaking countries.
- 3,897 male names. Frequent given names of males in English speaking countries.
- 21,986 names. This database contains the most common names used in the United States and Great Britain. Spelling checkers may want to supplement their basic word list with this one.
| |
 | |
 |
 |
|
 |
|
GrammarStation.com Come, explore and learn the English language with ease and understand the correct grammar and its usage using GrammarStation.
Coherence: Anaphora and reference Many examples of anaphora.
Online Writing Lab : Grammar, Punctuation, and Spelling In this section of our site, we offer you handouts and exercises on grammar, spelling, and punctuation. We also have PowerPoint presentations related to grammar, and we have an entire section of handouts and resources for English as a Second Language learners that might also prove useful.
English Grammar: explanations and exercises - by Mary Ansell All of the essential points of English grammar are covered. Each point of grammar is clearly explained, and is illustrated by examples. For every important point of grammar, one or more exercises are provided, to make it easier to learn and remember the material. Answers for the exercises are provided. A summary of the uses and formation of the English verb tenses is given for easy reference. Grammatically determined rules for spelling, pronunciation, and punctuation are included. The grammar of North American English is emphasized. Grammatical differences between formal and informal English are pointed out.
Daily Grammar Teachers have our permission to duplicate and use the lessons in their classrooms so long as the copyright information is preserved.
Grammar Bytes! Grammar Instruction with Attitude Find detailed definitions of common grammar terms--everything from abstract nouns to verbs!
Glossary of English Grammar Terms The grammar glossary is a comprehensive site with very clear definitions.
French interactive grammar An interactive French grammar with several hundred practical reference sheets.
Online English course An online course by Kevin Halion.
Grammars and Language Courses Here you will find grammars of over 100 languages where you can look up the rules of a language. Language courses that teach you foreign languages are also linked here, whether on line or on the shelf. Additional language resources such as newspapers, online radio stations, and our dictionaries are linked to each language.
UltraLingua grammar reference Complete on-line grammar references for many languages (English, French, Spanish, German). | |
 | |
 |
 |
| Workgroups working on NLP | |
 |
|
Microsoft Research At Microsoft Research, we have an insatiable curiosity and the desire to create new technology that will help define the computing experience. Whether inspired by a suggestion from a customer or simply the search for a better way, we’re driven to innovate and push the state-of-the-art in computer science as far as our imaginations can reach. To that end, we collaborate with universities, submit papers for peer review, and partner with product groups to bring our research to you. Read on to discover what we’re doing to improve Microsoft products in the next two to ten years.
Google Labs A partial list of papers written by people now at Google, showing the range of backgrounds of people in Google Engineering.
IBM Research - Computer Science - Natural Language Processing Natural Language Processing at IBM is a dynamic research area spanning a wide range of topics vital for the development of cutting-edge applications of language engineering. Our mission is to offer speech and language technologies that form the core of current and future products and solutions for processing natural language. We work on theoretical issues of computational linguistics and develop technologies such as speech processing, machine translation, universal and application-specific dialog engines, information retrieval, text mining and hypertext databases, automatic text summarization, natural language understanding and generation, to mention just a few. One key goal is to provide advanced NLP software for multiple languages and modalities exploited in business applications. Another fundamental goal is to provide the sophisticated NLP technologies required to linguistically enable human-computer interfaces.
TALANA Research activities focus on computational linguistics, in particular: text generation; semantics-syntax-prosody interactions; linguistic modeling and dependency; semantic modeling. Talana is directed by Laurence Danlos, Professor, Université Paris 7.
ATALA ATALA has been dedicated to the development of computational linguistics in France since 1959. ATALA participates in the Technolangue portal.
GREYC The Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen goes by the acronym GREYC. Here, the Y symbolizes both the three I's (Informatique, Image, Instrumentation) and, with a little imagination (!) and once turned upside down, the A of Automatique.
ISI - The University of Southern California Natural Language Processing at USC/ISI USC/ISI is an academic research institute that is part of USC's School of Engineering. Many ISI researchers are also on the computer science faculty, and likewise, many CS graduate students do their dissertation research at ISI. In Business Week's recent survey of academic information technology research, USC ranked fifth overall, and ISI was called "a star" in its survey of international research institutions. ISI's Intelligent Systems Division is one of the largest university-based artificial intelligence groups in the world. | |
 | |
 |
 |
|
 |
|
Sinequa Specialized in corporate applications, Sinequa focuses on providing means to find, understand and seamlessly use textual information through intelligent and intuitive access. Thanks to its expertise in natural language processing, the company has developed on top of its patent protected semantic technology a flagship product called Intuition. This platform is both a search engine with advanced linguistic functionalities enabling true understanding of the meaning of both queries and documents, and an access and navigation tool facilitating each user corporate-wide access to information truly critical for day to day work.
PERTIMM Pertimm is a publisher and integrator of high-value-added information retrieval solutions.
SYSTRAN SYSTRAN is the leading provider of the world's most scalable and modular translation architecture. Its core technology powers revolutionary translation solutions for the Internet, PCs and network infrastructures that facilitate communication in 36 language pairs and in 20 specialized domains.
Connexor Connexor provides linguistic technologies and expertise to software houses and solution providers who tackle the challenge of how to derive useful information from unstructured digital text for different kinds of consumers and analysts.
ONTOLOGOS CORP The evolution of markets and technologies has profoundly changed our societies, now readily described as "information societies", whose new economic challenge is the mastery of knowledge and know-how. This mastery first requires building business terminologies for the enterprise that are consensual, coherent, shareable and reusable; hence the need to introduce ontologies as representations of the meaning of terms for indexing, searching, routing, matching and mapping information.
Synapse Synapse Développement is a Toulouse-based software publisher founded in 1994. Its mission is to develop applications that integrate linguistic and artificial intelligence techniques applied to language processing, such as spelling and syntax correction, language analysis, translation, and natural language processing (TALN).
Softissimo To help you understand or translate documents and web pages, for many language combinations and in a variety of domains, Softissimo offers a complete range of translation software: Reverso.
VirtuOz VirtuOz puts artificial intelligence at the service of your customers and prospects. With the DialogServer and StudiOz software suite, VirtuOz finally brings conversational agent technology to businesses. Able to simulate human dialogue, our solutions make your web applications more human and more efficient: resolving customer problems, presenting products, communicating key messages, marketing surveys...
MemoData MemoData is the European leader in linguistic databases for natural language processing. Developed over 15 years, MemoData's databases cover six European languages. The French dictionary contains more than 185,000 word senses, corresponding to about 700,000 inflected forms. 55,000 nouns, 25,000 verbs, 25,000 adjectives and 10,000 adverbs are covered. In addition, the database contains more than 280,000 ontological links (a florist sells flowers, an animal eats, the cat is an animal...).
Cognitive Relation Cognitive Relation is the only software solution on the market that both reasons and writes. Cognitive Relation offers your enterprise the opportunity to: reproduce any type of intelligent and recurring reasoning process, be it business-related or administrative; with the same skill and quality of reply; for any contact channel used.
TextAnalysis Text Analysis International's flagship product is VisualText, a comprehensive integrated development environment for NLP. VT uses the NLP++ general programming language with specializations for natural language processing. VT integrates the Conceptual Grammar KBMS for ontology and semantics. Analyzers built with VT blend grammars, patterns, keyword, and statistical paradigms in a multi-pass framework.
Lingway Text mining solution, Lingway's technology consists of a natural language multilingual search engine, categorization and coding tools, software for generating an XML structure from textual documents, as well as information extraction and document visualization functions.
Basis Technology Basis Technology provides software solutions for extracting meaningful intelligence from unstructured text in Asian, European and Middle Eastern languages. | |
 | |
|
|
|
|
|