Our technology

For ten years, Proxem has invested heavily in R&D in natural language processing and artificial intelligence.

Proxem's solutions rely on a dedicated R&D team that combines the best linguistic and statistical approaches for a precise understanding of language.

4 engineers dedicated to R&D

A dedicated team

With specialists in mathematics and artificial intelligence, Proxem’s R&D team works with state-of-the-art technology in natural language processing: deep learning, word embeddings, and active learning.

Constant investments

Each year, Proxem invests a significant portion of its earnings to enhance its solutions with the latest technological advancements in the field.

30% of annual sales revenue
5 R&D projects

Scientific recognition

Proxem has carried out several research projects with laboratories and other companies in the field.

Scientific publications

Recognized by the scientific community, Proxem regularly presents at conferences and publishes in international research journals.

30 scientific publications

Scientific publications

  • 2016

    Des humains dans la machine : la conception d’un algorithme de classification sémantique au prisme du concept d’objectivité

    An algorithm is the result of formalizing a procedure which, once implemented in a computer program, can be replayed indefinitely without intervention. The socio-technical materiality of programs places them in systems of contingencies, standards, and habits that leave the human capacity for action at the heart of the process. Neither the mechanical nature of programs nor the structural consistency of their mathematical foundations allows them to produce objectivity by themselves. Objectivity comes from the expertise of their designers interacting, either by direct exchange or through benchmarking tools, with end users whose appreciation pragmatically validates the products of the algorithms. It is, in short, the design of programs through a succession of human choices that makes them machines for knowledge production.

  • 2016

    Trans-gram, Fast Cross-lingual Word-embeddings

    We introduce Trans-gram, a simple and computationally efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute aligned word embeddings for twenty-one languages using English as a pivot language. We show that some linguistic features are aligned across languages for which we do not have aligned data, even though those properties do not exist in the pivot language. We also achieve state-of-the-art results on standard cross-lingual text classification and word translation tasks.

  • 2015

    Convolutional Neural Network for Twitter Sentiment Analysis

    Sentiment analysis is a common task in natural language processing that aims to detect the polarity of a text document (from the most negative to the most positive). In this article, we introduce a neural network that classifies, in a weakly supervised fashion, a set of tweets into three classes: negative, neutral, or positive. The architecture of the model is that of a convolutional neural network with three parallel layers, where each layer detects distinct features. The network is fed with word embeddings learned on a set of corpora, including the French Wikipedia, with little linguistic information. This model achieves a macro-precision that is on average 25% higher than classical methods based on bag of words.

  • 2015

    Elements for an epistemology of instrumentation and collaboration in Twitter data research

    Twitter has become the most studied online social network in academia, in the social sciences as well as in other fields. It is commonly grasped through the collection and analysis of its own data. In this paper, I show through a bibliometric analysis that scholarly publications on this matter come equally from the social and computer sciences, and from the natural sciences to a lesser extent. Social scientists rely mostly on classical quantitative methods, while computer scientists try to improve algorithms and techniques. Twitter data can take several epistemic values, from representing nothing to representing real-world social phenomena. Having observed the infrequency of interdisciplinary work, I make a few suggestions, based on the history of science, for future collaborative projects based on Twitter data.

  • 2015

    L'ambiguïté épistémologique des big data : le cas de la donnée web en sciences sociales

    The myth of big data heralds the advent of new quantitative knowledge in the social sciences. Considering big data as a consequence of the computerization of human activity, we explore the example of data built from the web, showing that they belong neither to the epistemology of the experimental sciences nor to the evidential paradigm specific to the humanities and social sciences. Their use thus places them in several possible epistemological statuses (corpus, autonomous object, mirror of reality), dominated by a disciplinary anchoring in computer science rather than in the social sciences. A wavering and a conceptual circulation between these statuses are accompanied by a succession of epistemic breaks in the use of the data, from its construction to the meaning eventually assigned to it, resulting in ambiguity about the meaning of the new knowledge thus produced.

  • 2015

    La structuration disciplinaire et thématique des humanités numériques

    This work takes a broad look at the disciplinary and thematic structuring of the digital humanities. To that end, it presents the relationship that a number of “humanistic” disciplines have with digital technologies, from an epistemological and historical point of view. I then raise the question of interdisciplinarity, and in particular of the dialogue with computer science, from the point of view of the humanities and social sciences on the one hand and of technology producers on the other. I then propose an approach that could be characteristic of an “applied epistemology” modified by digital technology, in order to study how the scientific output characteristic of the digital humanities is structured into themes and disciplines.
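
The 2015 Twitter sentiment paper listed above reports its results as macro-precision over three classes. As a minimal illustration of that metric (not the paper's evaluation code), the sketch below averages per-class precision with equal weight per class. The function name, the toy labels, and the convention of scoring a never-predicted class as 0.0 are all assumptions made for this example.

```python
from collections import Counter


def macro_precision(y_true, y_pred, labels):
    """Unweighted mean of per-class precision (hypothetical helper)."""
    # How many times each label was predicted.
    predicted = Counter(y_pred)
    # How many of those predictions were correct, per label.
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    per_class = [
        # Assumption: a class that is never predicted contributes 0.0.
        correct[label] / predicted[label] if predicted[label] else 0.0
        for label in labels
    ]
    return sum(per_class) / len(labels)


# Toy data, invented for illustration only.
labels = ["negative", "neutral", "positive"]
y_true = ["negative", "neutral", "positive", "positive", "neutral", "negative"]
y_pred = ["negative", "neutral", "positive", "neutral", "neutral", "positive"]

score = macro_precision(y_true, y_pred, labels)
print(f"macro-precision: {score:.4f}")
```

Because every class counts equally, a rare class weighs as much as a frequent one, which is one reason this metric is often used when tweet classes are imbalanced.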

Collaborative research projects

  • 2008-2010

    SCRIBO (Semi-automatic and Collaborative Retrieval of Information Based on Ontologies)

    SCRIBO (Semi-automatic and Collaborative Retrieval of Information Based on Ontologies) aims to develop algorithms and collaborative tools for extracting knowledge from texts and images and for annotating digital documents semi-automatically. The project has a total budget of €4.3 million, with €2 million in public support divided among the nine participants: AFP, CEA LIST, INRIA, LRDE (EPITA), Mandriva, Nuxeo, Proxem, Tagmatica, and XWiki (coordinator). Proxem is developing named-entity and event recognition (within the UIMA architecture). SCRIBO is accredited by System@tic.

  • 2010-2011

    SIRE

    Accredited by the Cap Digital business cluster and supported by the greater Paris region as part of the ERDF program, the SIRE project (Semantics, Internet, and Recruitment) brings together Lingway, Proxem, and the MoDyCo laboratory. Its goal is to develop semantic tools: for the construction of ontologies specializing in employment, for skills repositories, and for “matching” (automatic alignment) of supply and demand in employment.

  • 2009-2010

    Extended Brain

    The Extended Brain project (AAP Web 2.0), accredited by Cap Digital, is carried out in collaboration with PlasmaSoft and ENSCI (École Nationale Supérieure de Création Industrielle). Extended Brain is an application intended for the general public for participatory processing of digital documentation. It offers a range of widgets that let users quickly extract quotations, pages, or parts of pages in their browser or in Office. The semantic engine developed by Proxem qualifies them in five seconds (title, labels, “theme” or “project”), and then organizes them, regardless of their source or format.

  • 2010-2013

    SOLEN

    The SOLEN project (Interoperable Systems for Nomad Electronic Reading, FUI-AAP9 program) brings together a consortium of several French players in the electronic book field (tablet manufacturers, publishers, social networks, etc.). Proxem’s contribution is to analyze the readers’ opinions (“I liked this book, this author…”) within a social network and to recommend new readings (“If you liked X, you’ll love Y!”).

  • 2013-2016

    Tourinflux

    Tourinflux aims to provide those in the tourist industry (primarily institutions but also private actors) with a set of tools for managing both internal data and information available on the web, in order to better understand how a region is perceived and to act on this perception.

Proxem Software

Proxem Software collects, analyzes, and mines textual data for business.
