DATA SCIENCE KNOWLEDGE DEMONSTRATOR LAB DEVELOPMENT

L. Gotsev1, E. Kovatcheva1, B. Jekov1, R. Nikolov1, A. Peshev2, I. Barzev1, E. Shoikova1

1University of Library Studies and Information Technologies (BULGARIA)
2Silver Star Retail - Mercedes-Benz (BULGARIA)
The paper presents a guided knowledge discovery process for Data Science in an experimental lab environment enhanced by the development of interactive demonstrators. Problem-solving, decision-making and critical analysis skills are developed through real-world use cases enriched with inquiry-based and cross-disciplinary learning. The main idea is to develop a collaborative, high-performance knowledge lab integrated within the infrastructure of the University Centre of Excellence for Data Science experiments, open for students, researchers, and practitioners to learn, test, refine, upgrade and apply models on real-world data-intensive projects in an interactive, safe, cost-effective environment.

The proposed structure of the Data Science Knowledge Demonstrator Lab consists of three main constructs: multimedia demonstrators dedicated to workflows, Machine Learning algorithms, and software engineering for end-to-end data process management; a Business Cases & Projects Portfolio; and Dataset Libraries.

The presented integration of a demonstration-practice method into the experimental infrastructure facilitates master's students' engagement in research activities. The demonstrator lab enables practical skill-building and broadens Data Science knowledge and competencies, particularly in the Artificial Intelligence & Machine Learning domains, as an organic intersection between Data DevOps and Smart Analytics. The central notion is learning-by-doing: research facilities are expanded into a collaborative assembly with the entrepreneurial world, accelerated by experimental and demonstration lab environments for testing ideas and models. Knowledge, methodology, models and algorithms defined by business need, technology solution, and applicability are integrated into one place: the demonstrator studio. This studio is more than an interactive learning place; it is a way to transform knowledge into real project development by applying the proposed models in various domains.

The paper outlines the lab's maturing concept, from defining the appropriate methodology, through specific workflows and algorithms, to guided real data-intensive experiments open to collaboration among students, researchers and practitioners.

As an illustration, a demonstrator design for fake news identification (a text classifier) is proposed. It includes a representative case of Natural Language Processing with Machine & Deep Learning, along with a visualization of the accuracy of various classifiers (such as logistic regression, decision tree, random forest) for different models pre-trained on five large datasets. The technological solution, implemented in Python, enables easy integration of new Machine Learning libraries. The benefit lies in a direct, metric-based comparison of methods and models, which supports decision-making across divergent scenarios and exposes the limitations that depend on the chosen context. For example, the developed LSTM model achieves very high accuracy (99%) but has limitations when the news theme changes. Moreover, each model is easy to verify in a comparative study against the other datasets. In this way, practitioners, researchers and students can examine a corpus of social media news, extend the models by integrating other open libraries, or develop applications with preferred algorithms.
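
The kind of classifier comparison described above can be illustrated with a minimal sketch. It is not the demonstrator's actual code: it assumes scikit-learn as the Machine Learning library and TF-IDF as the text representation, and it uses a tiny hypothetical corpus (`TRAIN`, `TEST_SAME_THEME`, `TEST_NEW_THEME`) in place of the five large datasets. The function name `compare_classifiers` is likewise illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus (label 1 = fake, 0 = real); a stand-in for real datasets.
TRAIN = [
    ("shocking miracle cure that doctors hate finally revealed", 1),
    ("secret pill melts fat overnight experts stunned", 1),
    ("you will not believe this one weird health trick", 1),
    ("ministry of health publishes annual vaccination report", 0),
    ("hospital opens new cardiology ward after renovation", 0),
    ("study finds moderate exercise lowers blood pressure", 0),
]
TEST_SAME_THEME = [
    ("miracle trick cures everything doctors stunned", 1),
    ("health ministry report shows lower blood pressure rates", 0),
]
TEST_NEW_THEME = [
    ("shocking secret plan to cancel elections revealed", 1),
    ("parliament publishes annual budget report", 0),
]


def compare_classifiers(train, test):
    """Fit each candidate on `train` and return {name: accuracy on `test`}."""
    train_texts, train_labels = zip(*train)
    test_texts, test_labels = zip(*test)
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    scores = {}
    for name, classifier in candidates.items():
        # Each candidate gets its own TF-IDF vectorizer fitted on the training texts only.
        model = make_pipeline(TfidfVectorizer(), classifier)
        model.fit(train_texts, train_labels)
        scores[name] = accuracy_score(test_labels, model.predict(test_texts))
    return scores


if __name__ == "__main__":
    for title, test_set in [("same theme", TEST_SAME_THEME), ("new theme", TEST_NEW_THEME)]:
        print(title)
        for name, accuracy in sorted(compare_classifiers(TRAIN, test_set).items()):
            print(f"  {name}: {accuracy:.2f}")
```

Evaluating the same fitted candidates on a test set drawn from a different theme is one simple way to probe the theme-change limitation noted for the LSTM model. With a corpus this small the printed accuracies carry no statistical meaning; the sketch only shows the shape of the comparison.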