As part of our work on categorisation using Wikipedia, we ran semantic analysis experiments on this vast knowledge source. As we have done for our client Apec, we analysed people and the different trades or occupations they have practised.
For example, if we take Namik Kemal, a 19th-century Ottoman writer, we can see that he was an author, a poet, a journalist, a translator, a playwright and a social reformer.
This investigation consisted of extracting people from Wikipedia and then identifying career-related expressions of the form “[person] is / was a [career], [career] and [career]”. But rest assured, not everyone had as many careers as Namik Kemal. Such an expression is known in linguistics as a “copulative defining expression”: “defining” because it defines, “copulative” because it connects the subject with its attributes.
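To make the idea of a copulative defining expression more concrete, here is a minimal sketch in Python of how such a sentence could be split into a person and a list of careers. It is only an illustration and not the pipeline we actually ran: the `COPULA` pattern and the `extract_careers` function are hypothetical names, and the sketch assumes clean English sentences in which the subject comes first and the careers are separated by commas and “and”.

```python
import re

# Hypothetical pattern for "<person> is/was a(n) <career>, <career> and <career>".
# Assumption: the subject opens the sentence and the attribute list runs up to the first full stop.
COPULA = re.compile(r"^(?P<person>.+?)\s+(?:is|was)\s+an?\s+(?P<careers>[^.]+)")


def extract_careers(sentence):
    """Return (person, [careers]), or None when no copular definition is found."""
    match = COPULA.match(sentence)
    if match is None:
        return None
    # Split the attribute list on commas (with an optional "and") and on a bare "and".
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group("careers"))
    careers = []
    for part in parts:
        # Drop a leading article ("a poet" -> "poet") and normalise the case.
        cleaned = re.sub(r"^an?\s+", "", part.strip()).lower()
        if cleaned:
            careers.append(cleaned)
    return match.group("person"), careers


print(extract_careers(
    "Namik Kemal was an author, a poet, a journalist, "
    "a translator, a playwright and a social reformist."
))
```

On the example sentence, this returns the name together with the six careers as separate strings. Real Wikipedia text is much messier than this, which is why the verification work described next was necessary.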
This phase was laborious and complicated: we had to verify that we were dealing with a real person, that the profession existed and that we could link it to a Wikipedia category. We also picked up a whole set of small details that did not interest us in this investigation and that created a lot of noise in the data (if you are very picky, you will notice that a little noise remains in our presentation). In the end, we got a very large list of pairs of trades such as “actor and musician,” “poet and teacher,” “painter and filmmaker.”
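The step from people to pairs of trades can be sketched as follows. This is a simplified illustration under our own assumptions, not the exact code behind the study: `count_career_pairs` is a hypothetical helper, and it expects a dictionary that maps each verified person to the list of careers found for them.

```python
from collections import Counter
from itertools import combinations


def count_career_pairs(careers_by_person):
    """Count how many people share each unordered pair of careers."""
    pair_counts = Counter()
    for careers in careers_by_person.values():
        # sorted(set(...)) removes duplicates and gives every pair one canonical
        # order, so ("poet", "teacher") and ("teacher", "poet") are counted together.
        for pair in combinations(sorted(set(careers)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Invented sample data, for illustration only.
sample = {
    "Person A": ["actor", "musician"],
    "Person B": ["musician", "actor", "poet"],
    "Person C": ["poet"],
}
for pair, count in count_career_pairs(sample).most_common():
    print(pair, count)
```

A person with a single career contributes no pair at all, while a person with six careers contributes fifteen, so people with many careers weigh heavily in the final list.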
And without further delay, here are the results of our investigation. Pairs are linked by a line: the darker the line, the stronger the correlation between the two careers.
We were so pleased to see that the graph was readable, with comprehensible clustering:
- The artistic sphere is linked with musicians, graphic artists and writers
- Of these “artists,” some bridge over to teaching and others into the realm of politics
- Teachers are linked to the scientific domain
- Some scientists are also engineers
- From scientists to engineers, and on to government institutions: the army, politics and administration are all linked
- Religion finds its home between politics, teaching and writing
We can also see certain careers create certain associations:
- Writer, poet and author
Of course, we need to take into account that this information was taken only from Wikipedia, where well-known people are generally the point of reference. It should not be forgotten either that the size of each element reflects the number of pairs that a career appears in, and not the number of people who actually practise this career. This gives a slightly skewed picture of the professional world, as trades that promote celebrity status are over-represented in our investigation. Nevertheless, the result is remarkable and, from our point of view, completely hypnotising. Do not forget to click on the picture to get a good look at the results!