EDUCATIONAL SOFTWARE FOR HANDS ON TRAINING IN COMPUTATIONAL LINGUISTICS
Southern Ural State University (RUSSIAN FEDERATION)
About this paper:
Appears in: INTED2018 Proceedings
Publication year: 2018
Pages: 7915-7924
ISBN: 978-84-697-9480-7
ISSN: 2340-1079
doi: 10.21125/inted.2018.1893
Conference name: 12th International Technology, Education and Development Conference
Dates: 5-7 March, 2018
Location: Valencia, Spain
Abstract:
Technology now influences every aspect of life, including education, and the way people learn is changing. To meet students' expectations, teaching must align with both contemporary educational strategies and hands-on experience with IT tools. This is especially important in natural language processing (NLP), which forms the main content of courses in computational linguistics (CL). Most tools for training in CL require programming skills and are developed primarily for programmers. However, CL courses typically bring together students with very different backgrounds, majoring in either linguistics or computer science. The teaching tools must therefore be chosen carefully. On the one hand, such tools should acquaint computer scientists with linguistic rule-based processing techniques, which are increasingly combined with purely statistical methods; on the other hand, they should be suitable for linguists who do not have programming skills. The contribution of linguistic knowledge to developing NLP applications is undeniable, yet the needs of linguists with little programming experience are often neglected.

This paper reports on a software kit that meets the requirements formulated above. It was adapted from a developer environment created earlier for our own research in various areas of computational linguistics, such as machine translation, authoring and summarization. The kit has a modular architecture and includes a lexicon shell with flexible settings (for defining, among other things, tag descriptions, entry structures and depth of knowledge), a number of rule acquisition compilers with universal rule-writing formalisms, and a control interface. The lexicon program permits porting entry structures, tags and knowledge between languages and applications. The lexicon knowledge is pipelined directly to the rule acquisition compilers. Any changes made in the lexicon, e.g., to tagsets, propagate instantly to the compilers and are displayed in the compiler interfaces.
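
The propagation of lexicon changes to the rule compilers can be pictured with a small observer-style sketch. This is only an illustration under stated assumptions: the abstract does not say how the kit implements propagation, and the names `Lexicon`, `RuleCompiler`, `define_tag` and `compile_rule` are hypothetical, not the kit's API.

```python
class Lexicon:
    """Hypothetical lexicon shell: holds a tagset and notifies subscribers."""

    def __init__(self):
        self.tagset = {}        # tag -> human-readable description
        self._listeners = []    # callbacks registered by rule compilers

    def subscribe(self, listener):
        """Register a callback and send it the current tagset right away."""
        self._listeners.append(listener)
        listener(dict(self.tagset))

    def define_tag(self, tag, description):
        """Add or change a tag, then push the new tagset to every listener."""
        self.tagset[tag] = description
        for listener in self._listeners:
            listener(dict(self.tagset))


class RuleCompiler:
    """Hypothetical rule compiler that always sees the current tagset."""

    def __init__(self, name, lexicon):
        self.name = name
        self.known_tags = {}
        lexicon.subscribe(self._on_tagset_change)

    def _on_tagset_change(self, tagset):
        # Replace the local copy whenever the lexicon reports a change.
        self.known_tags = tagset

    def compile_rule(self, pattern, result):
        """Accept a rule only if every tag it mentions is defined."""
        unknown = [t for t in (*pattern, result) if t not in self.known_tags]
        if unknown:
            raise ValueError(f"undefined tags: {unknown}")
        return (tuple(pattern), result)


lexicon = Lexicon()
chunker = RuleCompiler("chunker", lexicon)
for tag, description in [("DET", "determiner"), ("N", "noun"), ("NP", "noun phrase")]:
    lexicon.define_tag(tag, description)

rule = chunker.compile_rule(["DET", "N"], "NP")
```

In this sketch the compiler never stores rules against a stale tagset, which is one simple way to obtain the behaviour described above: a tag defined in the lexicon is usable in the compiler immediately afterwards.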

The umbrella configuration of the kit modules covers the top-level procedures of Rule-Based Machine Translation (analysis, transfer and generation), while particular applications may use only selected modules. Every top-level procedure includes a number of subprocedures. The basic analysis scenario consists of the following sequence of procedures: Tokenization, Tagging, Chunking and Shallow semantic analysis. Tagging includes assigning tags by lexicon lookup and resolving tag ambiguity according to disambiguation rules. Chunking is performed by a bottom-up heuristic parser with a recursive pattern matching technique. It identifies and classifies text constituents as typed phrases. Shallow semantic analysis determines semantic dependency relations between the text chunks and predicates. All modules are compatible and can provide different depths of processing.
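
The basic analysis scenario can be sketched as a four-stage pipeline. The following is a minimal toy illustration, not the kit's implementation: the lexicon, the tags, the two disambiguation rules, the chunk patterns and the `agent`/`theme` relation labels are all assumptions introduced here for the example.

```python
import re

# Hypothetical toy lexicon: word -> candidate tags (not the kit's tagset).
LEXICON = {
    "the": {"DET"},
    "simple": {"ADJ"},
    "parser": {"N"},
    "matches": {"V", "N"},
    "rules": {"N", "V"},
}

# Hypothetical chunk patterns in priority order: tag sequence -> phrase label.
PATTERNS = [
    (("ADJ", "N"), "N"),
    (("DET", "N"), "NP"),
    (("N",), "NP"),
    (("V",), "VP"),
]

def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tag(tokens):
    """Tagging, step 1: lexicon lookup; unknown words default to N."""
    return [(tok, set(LEXICON.get(tok, {"N"}))) for tok in tokens]

def disambiguate(tagged):
    """Tagging, step 2: two toy rules based on the previous tag."""
    result, prev = [], None
    for tok, tags in tagged:
        if len(tags) == 1:
            choice = next(iter(tags))
        elif prev in ("DET", "ADJ") and "N" in tags:
            choice = "N"            # after a determiner or adjective, prefer noun
        elif prev == "N" and "V" in tags:
            choice = "V"            # after a noun, prefer verb
        else:
            choice = sorted(tags)[0]
        result.append((tok, choice))
        prev = choice
    return result

def chunk(tagged):
    """Chunking: bottom-up rewriting, repeated until no pattern applies."""
    items = [(t, [w]) for w, t in tagged]
    changed = True
    while changed:
        changed = False
        for pattern, label in PATTERNS:
            n = len(pattern)
            for i in range(len(items) - n + 1):
                if tuple(lbl for lbl, _ in items[i:i + n]) == pattern:
                    words = [w for _, ws in items[i:i + n] for w in ws]
                    items[i:i + n] = [(label, words)]
                    changed = True
                    break
            if changed:
                break               # restart from the highest-priority pattern
    return items

def shallow_semantics(chunks):
    """Shallow semantic analysis: link each VP to the nearest NP on each side."""
    relations = []
    for i, (label, words) in enumerate(chunks):
        if label != "VP":
            continue
        left = next((" ".join(ws) for l, ws in reversed(chunks[:i]) if l == "NP"), None)
        right = next((" ".join(ws) for l, ws in chunks[i + 1:] if l == "NP"), None)
        relations.append({"predicate": " ".join(words), "agent": left, "theme": right})
    return relations

def analyze(text):
    """Run the four stages in sequence."""
    return shallow_semantics(chunk(disambiguate(tag(tokenize(text)))))

relations = analyze("The simple parser matches the rules")
```

For this input the sketch yields one relation, with "matches" as the predicate, "the simple parser" as its agent and "the rules" as its theme. Because each stage is a separate function, an exercise can stop after any stage, which loosely mirrors the idea that selected modules can be used on their own.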

The kit modules have been used successfully in a number of monolingual and multilingual applications involving English, Danish, French and Russian, and are therefore suitable for training CL students on material in different languages.
Keywords:
Educational software, computational linguistics, hands-on training.