About this paper

Appears in: ICERI2011 Proceedings
Pages: 3437-3443
Publication year: 2011
ISBN: 978-84-615-3324-4
ISSN: 2340-1095

Conference name: 4th International Conference of Education, Research and Innovation
Dates: 14-16 November, 2011
Location: Madrid, Spain

INFLUENCE OF FILE CONVERTERS ON KEYWORD EXTRACTION WITH KEA

M. Ramšak1, B. Kaučič2, M. Marolt1

1University of Ljubljana, Faculty of Computer and Information Science (SLOVENIA)
2University of Ljubljana, Faculty of Education (SLOVENIA)
The number of electronic resources rises daily, and with it the number of resource collections, digital libraries, and repositories. Their quality depends in part on how well resources are indexed and on how the search algorithm handles similar words, synonyms, and the like. Both rest on appropriate keywords (sometimes referred to as keyphrases), which makes keyword extraction one of the tasks of resource management. Keywords also have many other useful applications.
Several algorithms and tools for keyword extraction have been reported in the literature. They can be broadly divided into approaches based on natural language processing, approaches based on machine learning, and combinations of the two. The output of an algorithm is a set of the top-ranked keyword candidates. In general, algorithms work in two phases: the first prepares a list of keyword candidates, and the second cleans and orders that list based on keyword features. The efficiency of the second phase therefore depends on the efficiency of the first. In the first phase, many algorithms use phrase boundaries as one of the filters that limit the number of keyword candidates.
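
The two-phase scheme can be illustrated with a minimal Python sketch. It is not Kea's implementation: the stopword list, the function names `candidate_phrases` and `rank_candidates`, the ASCII-only tokenizer, and the frequency-plus-first-occurrence ordering are all assumptions chosen for brevity. It only shows how phrase boundaries limit the candidate list and why text that has lost its punctuation produces different candidates.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use much larger lists).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "the", "to", "with"}

def candidate_phrases(text, max_len=3):
    """Phase 1: list candidate phrases that do not cross a phrase boundary."""
    candidates = []
    # Punctuation and line breaks are treated as phrase boundaries.
    for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
        # ASCII-only tokenizer, kept simple for the sketch.
        words = re.findall(r"[a-z][a-z\-]*", chunk)
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                gram = words[i:i + n]
                if len(gram) < n:
                    break  # ran past the end of this chunk
                # A candidate may not start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                candidates.append(" ".join(gram))
    return candidates

def rank_candidates(candidates, top_k=5):
    """Phase 2: order candidates by frequency, then by first occurrence."""
    counts = Counter(candidates)
    first_seen = {}
    for pos, cand in enumerate(candidates):
        first_seen.setdefault(cand, pos)
    ranked = sorted(counts, key=lambda c: (-counts[c], first_seen[c]))
    return ranked[:top_k]

clean = "Keyword extraction is useful. Keyword extraction depends on file converters."
damaged = "Keyword extraction is useful Keyword extraction depends on file converters"

print(rank_candidates(candidate_phrases(clean), top_k=3))
# With the sentence boundary lost, a phrase spanning both sentences becomes a candidate.
print("useful keyword" in candidate_phrases(damaged))
```

In the second input the full stop is missing, as might happen in converted text, so "useful keyword" enters the candidate list although it spans two sentences.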

In this paper, we consider several file converters and examine how their output influences keyword extraction. In addition, we observe how consolidated text, with supplementary text removed, influences extraction. For keyword extraction, the freely available Kea tool is used. The evaluation uses a collection of PDF documents; extraction results are compared across file converters and consolidated texts, and against the keywords assigned manually by the documents' authors. The information retrieval metrics precision and recall are used.
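
As an illustration of the evaluation metrics, the following Python sketch computes precision and recall of extracted keywords against author-assigned keywords. The function name `precision_recall`, the sample keywords, and the matching rule (exact match after lowercasing and trimming) are assumptions; the abstract does not state how matches were determined.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted keywords against reference keywords.

    Matching is exact after lowercasing and trimming, which is an assumption
    of this sketch; an evaluation could also match on stemmed forms.
    """
    extracted_set = {k.strip().lower() for k in extracted}
    reference_set = {k.strip().lower() for k in reference}
    matches = extracted_set & reference_set
    # Precision: share of extracted keywords that the authors also assigned.
    precision = len(matches) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of author-assigned keywords that were extracted.
    recall = len(matches) / len(reference_set) if reference_set else 0.0
    return precision, recall

# Hypothetical keywords, for illustration only.
extracted = ["keyword extraction", "file converter", "digital library", "pdf"]
author_keywords = ["Keyword extraction", "Kea", "file converter"]

precision, recall = precision_recall(extracted, author_keywords)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same comparison once per file converter would give one precision and recall pair per converter, which is the kind of comparison the abstract describes.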
@InProceedings{RAMSAK2011INF,
author = {Ramšak, M. and Kaučič, B. and Marolt, M.},
title = {INFLUENCE OF FILE CONVERTERS ON KEYWORD EXTRACTION WITH KEA},
series = {4th International Conference of Education, Research and Innovation},
booktitle = {ICERI2011 Proceedings},
isbn = {978-84-615-3324-4},
issn = {2340-1095},
publisher = {IATED},
location = {Madrid, Spain},
month = {14-16 November, 2011},
year = {2011},
pages = {3437-3443}}
TY - CONF
AU - M. Ramšak
AU - B. Kaučič
AU - M. Marolt
TI - INFLUENCE OF FILE CONVERTERS ON KEYWORD EXTRACTION WITH KEA
SN - 978-84-615-3324-4/2340-1095
PY - 2011
Y1 - 14-16 November, 2011
CI - Madrid, Spain
JO - 4th International Conference of Education, Research and Innovation
JA - ICERI2011 Proceedings
SP - 3437
EP - 3443
ER -
M. Ramšak, B. Kaučič, M. Marolt (2011) INFLUENCE OF FILE CONVERTERS ON KEYWORD EXTRACTION WITH KEA, ICERI2011 Proceedings, pp. 3437-3443.