More precisely, analysing texts written by humans requires agreeing on what those texts mean, that is, on what their authors intended. Automatic verbatim analysis therefore needs a reference against which it can interpret a text. The simplest way to build this language reference is to ask humans what they understood.
Because the task is complicated and the texts are sometimes very long, researchers needed a simple way to question many readers. This need explains the growing use of Amazon’s Mechanical Turk, a platform that connects a “Requester” with “Turks” who perform many small, repetitive tasks. A Turk receives a small payment for each completed task, which lets the Requester get a large number of tasks done at a reasonable cost.
As a side note, the name Mechanical Turk refers to a seemingly autonomous automaton built in the 18th century that was supposed to be able to beat a human at chess. It was later revealed that a person hidden inside the machine moved the arms of the supposed robot. The image of the Mechanical Turk has endured and has given its name to this system, which appears automatic but is in fact manual.
However, the enormous success of Amazon Mechanical Turk (AMT) raised a number of questions for the research team, who presented their findings at the TALN 2011 conference in “A Mechanical Turk for linguistic resources: a critique of the crowdsourcing of fragmented labour.”
The authors first showed that using Mechanical Turk poses significant legal problems, notably in France, where paying workers by the task is prohibited. It also poses ethical problems, both because the remuneration is so low and because the tasks are often so repetitive. According to surveys, between 20% and 30% of Turks are located in India and use the platform as a full-time occupation. As a result, the system imposes a division of labour based on the relocation of repetitive, low-paying tasks.
Secondly, beyond these legal and ethical problems, there is the question of how useful this approach really is. Native French speakers are rare on the platform and are therefore more expensive. Translating a need into small tasks that non-experts can perform is often difficult and costly, or even impossible! Finally, paying by the task does not encourage quality work; it encourages the worker to finish each task as quickly as possible.
At this stage, we have to conclude that this solution is not as practical as it may seem and that we should consider alternatives, which fortunately do exist. First, existing resources can be cleverly reused, and several international projects lend themselves to this:
- Wikipedia provides extensive information on vocabulary and proper nouns
- WordNet is a lexical database project designed by linguists
- Freebase is a kind of Wikipedia for semantic web engineers
The good news is that machines also offer solutions of their own. In artificial intelligence, some systems can learn automatically from examples, and with semi-supervised learning we no longer need to annotate every document at start-up, only a small subset, which can then be annotated with a great deal of care and precision. The resulting production process at Proxem can be extremely efficient and fast, and it can operate day and night. In a future article, we will show you in more detail how such feats are possible.
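To make the idea of annotating only a small part of the documents more concrete, here is a minimal sketch of self-training, one common form of semi-supervised learning. It is an illustration only and not a description of Proxem's actual system: the function names (`train`, `predict`, `self_train`), the word-count scoring, the confidence threshold, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter

def train(labeled):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(counts, text):
    """Return (label, confidence) from simple word-overlap scores."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no known word: the model cannot decide
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with the model's own confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        counts = train(labeled)
        confident, remaining = [], []
        for text in pool:
            label, conf = predict(counts, text)
            if label is not None and conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return train(labeled), pool

# Hypothetical toy data: two carefully annotated examples, three raw ones.
seed = [("great product love it", "pos"),
        ("terrible service very slow", "neg")]
raw = ["love the great design",
       "slow and terrible support",
       "design and support team"]

model, leftover = self_train(seed, raw)
print(leftover)  # texts the model was never confident enough to label
```

In this sketch only the two seed sentences are annotated by hand. Texts the model stays unsure about remain in `leftover`, where they could be passed to a human annotator rather than guessed.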
Sagot, B., Fort, K., Adda, G., Mariani, J., & Lang, B. (2011). Un turc mécanique pour les ressources linguistiques : critique de la myriadisation du travail parcellisé. TALN’2011.