Can you introduce yourself and your main missions at Total?
I am Pierre Jallais. I work within the Technology Group of Total, attached to the holding company and more specifically to Strategy & Innovation, where I am in charge of all subjects related to research, innovation and NLP.
Total is a multi-energy group present in 130 countries, with 100,000 employees. We operate throughout the energy chain, from production to processing, and across many forms of energy: traditional ones such as oil and gas, and more recent, renewable ones such as wind and solar power.
Our main mission is to lead the networks of the CTG (Group Technology Committee). Within Total, the CTG is a set of technical business networks common to all branches of the Group. It has existed for more than 20 years and has always been the preferred forum for professionals to share their skills and experience, implement innovations and bring them together.
Why did you set up an NLP semantic analysis solution for the company?
We have two main objectives on the CTG side:
- Address Knowledge Management (KM) issues, the CTG being above all a space for knowledge and for synergy between tools and techniques.
- Provide the means to test and implement new technologies (innovation), and to share and develop the skills the Group needs.
To help meet our KM challenges, we set up a thesaurus. NLP allows us to enrich it and, combined with our tools, makes knowledge easier to access. This also improves the quality of our search engine, in particular the relevance of results for the terms searched, and allows more efficient navigation across all the documents made available.
The second project, “SIL” (Safety Integrity Level), is financed and managed by the CTG. It consists of analyzing equipment breakdown reports from certain industrial sites in order to derive maximum value from them. The objective of the project is to respond to safety issues, more particularly at the Normandy site (Refining & Chemicals branch). This involves analyzing the reports written by operators working on instrumentation-related equipment (equipment with safety functions), which helps guarantee the safety of our facilities.
The whole point is to use NLP to analyze all these short, unstructured reports and check whether the equipment is working properly. This is a very important piece of information. Previously, the analysis was done manually on a very small sample of data.
Today, with Proxem, we have analyzed nearly 400,000 breakdown and maintenance reports.
The two projects have very distinct objectives, while both being consistent with the missions of the CTG. They are nonetheless linked, because both use the part of the thesaurus devoted to phenomena that may have an impact on our facilities. That part also contains the breakdown-related vocabulary used in SIL (for example, corrosion).
As you can see, the semantic analysis solution we implemented allows us to enrich the vocabulary, to combine it with our search tools and thus to improve our results.
In addition, we want this vocabulary to be accessible to everyone in the company (Total group) who works on NLP use cases. Internally, we try to promote the use of these semantic resources, so that they can be reused in as many different contexts as possible.
We now plan to publish this vocabulary in another tool, so that data scientists can add to it and build their own thesauri.
Why did you choose Proxem for your projects?
The “Vocabulary Thesaurus” project took place more than two years ago, after the thesaurus had first been built manually.
We had not seen many other solutions suited to this use. The major advantage of the tool is that it does not require a strong technical background, so the teams, in particular our part-time librarian, were able to use it quickly after a short training session.
Could you describe the results of our solutions for us?
We had 400 keywords in the existing thesaurus. Today, with Proxem, we have nearly 6,000 concepts, which is quite substantial. This plays an important role in improving the relevance of our search engine's results. The vocabulary had an impact, but other related developments also helped improve relevance.
The main objective of the SIL project is not ROI but the safety of our installations. Thanks to these results, we were able to calculate the failure rates of our equipment, and the figures are consistent with what had been evaluated in the past. These good results allow us to validate the method officially. We now hope to automate it and, above all, to take its use further in order to obtain even more detail.
What are the prospects for development?
Regarding the SIL project, we realize that this use case meets essential safety objectives, but that we can also apply it to other issues. We are currently exploring where it can create value:
- Gathering all the information on breakdowns to gain more visibility on maintenance costs.
- Setting up action plans to check the quality of the equipment and improve its availability, and extending this to other industrial sites.
- Comparing sites and highlighting best practices from site to site, going as far as looking at subcategories of equipment and at suppliers to see which perform better or worse than the others.
- Reviewing the equipment purchasing strategy: we want to be able to estimate maintenance costs.
We realize that what we have put in place within the framework of SIL is valuable whenever we have to assess the quality of an industrial asset.
The goal is to evaluate the quality and safety of our equipment on a large scale.