Today, I learned about:
During the month of October, my daughter Karina had the pleasure of presenting a project related to NLP (Natural Language Processing) at an international conference for AI (Artificial Intelligence) in Salvador, Bahia, Brazil.
In the 16th century, while Brazil was still a Portuguese colony, São Salvador da Bahia de Todos os Santos, or just Salvador for short, became the first capital of Brazil. The capital later moved to Rio de Janeiro and then to Brasília. Here are some nice pictures from Salvador. See also reference # 1 below.
Today I received more details about the conference I mentioned above. It was called STIL (XII Brazilian Symposium in Information and Human Language Technology) and was held in Salvador from 2019-10-15 to 2019-10-18, bringing together both academic and industrial participants working in the areas of Linguistics, Computer Science, Psycholinguistics, Information Science, etc.
STIL also had three co-located events, one of them being the VI Student Workshop on Information and Human Language Technology (TILic). It was at TILic that Karina presented her project, “Research of the use of word embeddings for calculation of similarity in translation memories”, with the following abstract:
“The strategy traditionally employed by CAT tools to match the segments of the phrase currently being translated with the segments present in the translation memory considers the intersection of the sequences of words (n-grams) present in the segments being compared. However, this strategy is not capable of capturing semantic similarities beyond the trivial level. This study therefore presents a project that aims to investigate the applicability of monolingual and bilingual word embeddings to implement the matching. The study is still in its initial phase of development. Next, a strategy for calculating similarity using word embeddings will be proposed and implemented, and it will be incorporated into an open-source CAT tool. In order to evaluate the proposed strategies, the quality of matching in the baseline system (a version of a CAT system without any modification) will be compared to that of the system in which the proposed method is implemented. By the conclusion of this project, we expect to have obtained a strategy based on semantic similarity that will be an alternative to the traditional matching strategy based on n-grams. Although there are already texts covering the use of word embeddings to detect textual similarity and to clean translation memories, there is no literature about any work that has investigated the objective of this project. Consequently, this study should be considered a first initiative toward an investigation within this context.”
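To make the contrast in the abstract a bit more concrete, here is a small illustrative Python sketch. It is not taken from Karina's project or from any CAT tool. The function names, the Jaccard overlap measure, the averaging of word vectors, and the tiny hand-made three-dimensional "embeddings" in `TOY_VECTORS` are all assumptions chosen only for illustration; a real system would load trained embeddings and would probably use a more refined matching score.

```python
import math

def ngrams(tokens, n):
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_overlap(seg_a, seg_b, n=2):
    """Jaccard overlap of word n-grams between two segments (one possible n-gram score)."""
    grams_a = ngrams(seg_a.lower().split(), n)
    grams_b = ngrams(seg_b.lower().split(), n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

# Hypothetical toy vectors, invented for this example only.
# Words with similar meanings were given similar vectors by hand.
TOY_VECTORS = {
    "the": (0.1, 0.1, 0.1),
    "is": (0.0, 0.2, 0.1),
    "car": (0.9, 0.1, 0.0),
    "automobile": (0.8, 0.2, 0.0),
    "fast": (0.0, 0.1, 0.9),
    "quick": (0.1, 0.0, 0.8),
}

def mean_vector(segment, vectors):
    """Average the vectors of the known words in a segment; None if no word is known."""
    known = [vectors[w] for w in segment.lower().split() if w in vectors]
    if not known:
        return None
    dims = len(known[0])
    return tuple(sum(v[d] for v in known) / len(known) for d in range(dims))

def embedding_similarity(seg_a, seg_b, vectors=TOY_VECTORS):
    """Cosine similarity between the averaged word vectors of two segments."""
    va, vb = mean_vector(seg_a, vectors), mean_vector(seg_b, vectors)
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    a = "the car is fast"
    b = "the automobile is quick"
    print("n-gram overlap:      ", ngram_overlap(a, b))
    print("embedding similarity:", embedding_similarity(a, b))
```

With these toy vectors, the two segments share no bigram at all, so the n-gram score treats them as unrelated, while the embedding-based score rates them as very close. That is the kind of semantic similarity the abstract says n-gram matching cannot capture.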
The complete presentation (in Portuguese) is in ref. # 2 below.
And here are three photos from the event. They show Karina and her colleague João Gabriel Melo Barbirato, who presented a project named “Linguistic improvements on the text-image aligner LinkPICS”.
That’s what I learned in school!