AUTOMATED META-CLASSIFICATION OF SCIENTIFIC PUBLICATIONS - EMPOWERING NOVICE RESEARCHERS CONDUCTING FILTERED LITERATURE RESEARCH WITHOUT REQUIRING DEEP DOMAIN KNOWLEDGE OR EXPERIENCE
RWTH Aachen University (GERMANY)
About this paper:
Conference name: 10th International Conference on Education and New Learning Technologies
Dates: 2-4 July, 2018
Location: Palma, Spain
Abstract:
Familiarizing oneself with and exploring new knowledge domains is indispensable in today's fast-moving information society. This affects private life and the working world, and especially science, which is at the forefront of creating new knowledge. The common way to get an overview of the current state of the art in a research area is to conduct literature research. Although the growing availability of research publications is extremely beneficial, locating the most suitable research material and related work is becoming increasingly difficult. This task is hard by nature, and it is even more challenging for novice researchers, who lack both procedural and factual knowledge. This shows in missing search strategies, a missing overview of the field, etc. At the same time, the biggest problem is that most available aids, in the form of software tools or services, concentrate only on keyword searches and therefore assume an understanding of the domain's content. Beginners are usually not even aware of a domain's terminology and fail to formulate correct and useful search queries. Hence, there is a clear need to support the knowledge acquisition process without requiring expert knowledge and years of experience.
This paper presents an approach that supports novice researchers in the literature research process by offering new filtering mechanisms for scientific publications that are not based on the publications' actual domain content. Based on literature reviews, expert interviews, and intensive data exploration and mining of about 1,500 sample Computer Science publications, indicators and corresponding meta-categories were defined that allow new classifications of scientific publications. The applied machine learning approaches (such as supervised and unsupervised learning, and association rule mining) focused on three kinds of available publication information: document meta-information, containment information (e.g. specific vocabulary appearing in the text), and statistical features. The resulting classifiers (e.g. a C4.5 decision tree classifier) show promising results for aiding researchers. They have been embedded in a support tool that can be integrated into the natural workflow of literature research and can thus base the filtering process on the seekers' actual purpose rather than on their domain knowledge.
Keywords:
Meta-classification, machine learning, literature research, scientific publications, filtering.
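The abstract names C4.5 as one of the resulting classifiers. As a rough illustration of how such a tree chooses its splits, the Python sketch below computes C4.5's gain ratio criterion over three hypothetical publication features, one for each kind of information mentioned in the abstract (meta-information, containment, statistical). The feature names (`venue`, `mentions_survey`, `many_references`), the labels `overview`/`original`, and the six records are invented for illustration; they are not the paper's data, indicators, or meta-categories. The sketch also omits the rest of C4.5 (the above-average-gain filter, recursive tree growth, pruning, and continuous attributes).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """C4.5-style gain ratio for splitting `rows` on a categorical `attribute`."""
    total = len(rows)
    # Group the class labels by the value each row has for the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes that fragment the data into many values.
    split_info = -sum((len(p) / total) * log2(len(p) / total) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_split(rows, attributes, target):
    """Return the attribute with the highest gain ratio (simplified C4.5 node choice)."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical toy records (illustrative only, not data from the paper).
papers = [
    {"venue": "journal",    "mentions_survey": "yes", "many_references": "yes", "label": "overview"},
    {"venue": "conference", "mentions_survey": "yes", "many_references": "yes", "label": "overview"},
    {"venue": "journal",    "mentions_survey": "yes", "many_references": "no",  "label": "overview"},
    {"venue": "journal",    "mentions_survey": "no",  "many_references": "yes", "label": "original"},
    {"venue": "conference", "mentions_survey": "no",  "many_references": "no",  "label": "original"},
    {"venue": "journal",    "mentions_survey": "no",  "many_references": "no",  "label": "original"},
]

if __name__ == "__main__":
    features = ["venue", "mentions_survey", "many_references"]
    for f in features:
        print(f, round(gain_ratio(papers, f, "label"), 3))
    print("root split:", best_split(papers, features, "label"))  # → root split: mentions_survey
```

In this toy data the containment feature separates the two classes perfectly, so it reaches a gain ratio of 1.0 and becomes the root split, while the venue feature carries no information about the label and scores 0.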