Language and Technology
Course code
IFI6223.DT
Old course code
Course title in Estonian
Keel ja tehnoloogia
Course title in English
Language and Technology
ECTS credits
6.0
Approximate number of contact lessons
28
Teaching semester
spring
Assessment form
Examination
Lecturer, 2019/2020 autumn semester
Pille Eslon (language of instruction: Estonian), regular course
Lecturer, 2019/2020 spring semester
Lecturer not assigned
Course aims
To develop
a) basic knowledge and practical skills for processing, comparing and visualizing vast amounts of textual data
b) readiness to determine optimal methods and tools for automatic text analysis, as well as explain one’s preferences
c) the ability to qualitatively interpret language patterns according to the set assignment
d) knowledge about how to analyse speech with the help of technological tools.
Brief description of the course
Introducing options for automatic natural language analysis and software applications that are created or can be adapted for the analysis of Estonian language patterns (e.g. TreeTagger, morphological and syntactic analyser, Cluster Catcher, WordSmith Tools, Sketch Engine).
Providing an overview of Estonian-related technological developments and of freeware that can be used to analyse spoken language (text-to-speech synthesis, speech recognition) and written texts, as well as to process digitized archive materials.
Acquiring the ability to use text mining applications that reveal language use patterns related to the content of a text, and to interpret the results qualitatively in order to identify important events, actors, and their attitudes and views, and to draw conclusions about individual, socio-cultural, political and other opinions expressed in various texts.
Acquiring the skills to apply basic statistical methods when comparing large amounts of textual data and identifying the language use patterns (incl. concordances, collocations, idioms, keywords) that distinguish, for example, the discourse of different time periods.
Learning to use thematic toolkits (e.g. Natural Language Toolkit (http://www.nltk.org/) or Pandas (http://pandas.pydata.org/)) when solving different initial tasks.
Discussing problematic questions concerning the term paper (e.g. the topicality of the initial tasks), evaluating whether the methods and analysis tools chosen to solve them are optimal, and debating the validity of hypotheses on the basis of applied and theoretical arguments.
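As an informal illustration of the statistical comparison mentioned in the description above (concordances, and keywords that distinguish the discourse of different periods), the following is a minimal sketch using only the Python standard library. It is not course material: the two sample "period" texts are made up, the function names are hypothetical, and the log-likelihood (G2) keyness score is only one of several measures that could be used for this purpose.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, node, width=3):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def log_likelihood(a, b, total_a, total_b):
    """G2 keyness of a word seen `a` times in corpus A and `b` times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def keywords(tokens_a, tokens_b, top=5):
    """Rank words that are relatively more frequent in corpus A than in B."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    scored = []
    for word, a in freq_a.items():
        b = freq_b.get(word, 0)
        # Keep only words overused in A; score them by G2.
        if a / n_a > b / n_b:
            scored.append((word, log_likelihood(a, b, n_a, n_b)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up toy samples standing in for texts from two time periods.
period_1 = "the party decided the plan and the party praised the plan"
period_2 = "the market decided the price and the market changed the price"

tokens_1, tokens_2 = tokenize(period_1), tokenize(period_2)
print(concordance(tokens_1, "party", width=2))
print(keywords(tokens_1, tokens_2, top=2))
```

With real corpora, the same steps would normally be carried out with dedicated tools such as those named in the description. The sketch only shows what a concordance line and a keyness ranking consist of, so that the results of such tools can be interpreted qualitatively.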
Independent work
Choosing the topic for the term paper, framing the research problem and action plan, solving various other practical tasks. Defending the term paper in front of the group.
Working through the source materials and literature (reading diary).
Learning outcomes in the course
Has acquired an overview of fundamental statistical methods and software for language analysis.
Is able to use them knowingly and purposefully, choosing optimal methods and applications for solving a specific (cultural, social, linguistic, language-technological etc.) research question.
Is able to qualitatively interpret the results of automatic analysis of natural language, to associate them with different sublanguages, individual language use, media events, socio-historic discourse etc.
Assessment methods
Term paper (20,000 characters)
Term paper defense: slides + 10-minute presentation + 15-minute discussion.
Presentation on a freely chosen topic based on the reading diary (10-15 minutes, slide show).
Teacher
Pille Eslon
Study literature
Liin, K., Muischnek, K., Müürisep, K., Vider, K. (2012). The Estonian Language in the Digital Age / Eesti keel digiajastul. Berlin, Heidelberg: Springer. (https://books.google.ee/books?id=h3FsD47LEjIC&lpg=PA11&dq=keeletehnoloogia%20areng%20Eestis&pg=PA11#v=onepage&q&f=false)
Õim, H., Koit, M. (2017). Suundumusi inimsuhtluse keelelises analüüsis ja modelleerimises (I) and (II). - Keel ja Kirjandus, 1 (71-80) and 2 (143-150).
Mihkla, M., Hein, I., Kalvik, M.-L., Kiissel, I., Sirts, R., Tamuri, K. (2012). Estonian speech synthesis: applications and challenges / Синтез речи эстонского языка: применение и вызовы. A. E. Kibrik (Ed.). Computational Linguistics and Intellectual Technologies, Papers from the Annual International Conference "Dialogue" 2012. Moscow: РГГУ, 443-453.
Kaalep, H.-J., Koit, M. (2010). Kuidas masin tõlgib? - Keel ja Kirjandus, 10, 724-738.
Mautner, G. (2007). Mining large corpora for social information: The case
Replacement literature
Groom, N. et al. (eds.) (2015). Corpora, Grammar and Discourse. Amsterdam: Benjamins.
Baker, P. (2006). Using corpora in discourse analysis. London, New York: Continuum.
Mihkla, M. (2009). Eesti keel tehnoloogiate mõjutuses. - Õiguskeel, 4.
Meister, E., Penjam, J., Tõugu, E. Rakendusi reaal- ja humanitaarteaduste sümbioosist.
Eesti keele tekst-kõne süntees (Estonian text-to-speech synthesis). See http://www.eki.ee/keeletehnoloogia/projektid/syntees/tks.html
Text-to-speech synthesis website, see http://synt.think.ee/
Speech processing tools, see https://keeleressursid.ee/et/keeleressursid/konetootlusvahendid
Meister, E. Keeletehnoloogiatest. See https://www.youtube.com/watch?v=iwcahAD4cdw
Speech recognition, see http://veebiakadeemia.ee/puramiidi-tipus/konetuvastus/