TRAINING IN MULTIVARIATE DATA ANALYSIS IN THE MASTER OF HUMAN EVOLUTION
1 University of Burgos, Laboratory of Human Evolution (SPAIN)
2 University of Burgos, Faculty of Sciences (SPAIN)
3 University of Burgos, Department of Mathematics and Computation (SPAIN)
About this paper:
Appears in:
ICERI2014 Proceedings
Publication year: 2014
Pages: 2785-2793
ISBN: 978-84-617-2484-0
ISSN: 2340-1095
Conference name: 7th International Conference of Education, Research and Innovation
Dates: 17-19 November, 2014
Location: Seville, Spain
Abstract:
The curriculum defined by the University of Burgos for the Master of Human Evolution establishes generic skills such as criticism, analysis, synthesis, problem-solving, independent reading and project work, and specific skills such as handling and processing data [1]. Visualization and data analysis are central to modern anthropology, where thousands of data must be analysed. For several years, our postgraduate students in the Master of Human Evolution have been acquiring these skills through the methodology of statistical techniques taught in the subject “Advanced methods for data analysis”.
The statistical procedures covered in the course are:
i) Univariate analysis;
ii) Distribution fitting to data;
iii) Outlier detection;
iv) Hypothesis testing;
v) Principal Component Analysis;
vi) Multivariate regression.
The tools summarized above are used to evaluate sexual dimorphism by means of craniometric data downloaded from reference [2]. All students carry out a practical exercise that consists of building a decision rule that uses 21 craniometric variables from five populations (BERG, PERU, EGYPT, BUSHMAN and ZULU) to solve the problem of sexual dimorphism. Each student builds a decision rule for every population, which means working with about two thousand data values in each case.
By means of a regression on principal components, the probability distribution that corresponds to the null hypothesis H0 (the individual is a female) versus the alternative Ha (the individual is a male) is obtained. Next, the probabilities of both type I error (affirming that the individual is a male when it is actually a female) and type II error (affirming that the individual is a female when it is actually a male) are evaluated. Finally, the operating curve of this test, which graphically describes the goodness of the decision rule, is built.
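The decision procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the course material itself: the data are synthetic (not the Howells measurements), the helper names `pca_scores`, `fit_pcr` and `error_rates`, the choice of three components and the male offset are all hypothetical, and the error rates are computed empirically on the same sample used to fit the rule, which makes them optimistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one population: 21 variables, 50 females, 50 males.
n_per_sex, n_vars = 50, 21
size = rng.normal(size=(2 * n_per_sex, 1))      # shared size factor, induces collinearity
X = 100.0 + 5.0 * size + rng.normal(size=(2 * n_per_sex, n_vars))
y = np.repeat([0.0, 1.0], n_per_sex)            # 0 = female (H0), 1 = male (Ha)
X[y == 1] += np.linspace(1.0, 4.0, n_vars)      # assumed male offset, varies by variable


def pca_scores(X, k):
    """Project centred data onto the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def fit_pcr(X, y, k):
    """Least-squares regression of the sex indicator on the first k PC scores."""
    Z = np.column_stack([np.ones(len(y)), pca_scores(X, k)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ coef                             # fitted score used by the decision rule


def error_rates(score, y, threshold):
    """Type I: a female declared male. Type II: a male declared female."""
    declared_male = score > threshold
    alpha = np.mean(declared_male[y == 0])
    beta = np.mean(~declared_male[y == 1])
    return alpha, beta


score = fit_pcr(X, y, k=3)

# Sweep the threshold to trace how the two error probabilities trade off.
thresholds = np.linspace(score.min(), score.max(), 101)
curve = np.array([error_rates(score, y, t) for t in thresholds])

alpha_mid, beta_mid = error_rates(score, y, 0.5)
print(f"threshold 0.5: type I = {alpha_mid:.2f}, type II = {beta_mid:.2f}")
```

Plotting the second column of `curve` against the first gives one graphical summary of the rule: raising the threshold lowers the type I error at the cost of a higher type II error. A held-out sample or cross-validation would give less optimistic estimates.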
The exercise requires combining the six statistical tools mentioned to answer a question posed in the common terms of anthropology. The students thus understand that a complex question, “to decide what the sex of an individual in a particular population is”, is solved with indirect multivariate information (21 craniometric variables), taking into account the uncertainty that accompanies these determinations. The exercise concludes with a public oral presentation followed by a debate with all students and teachers.
Every step must be justified according to the mutual relations between variables, which differ in every population; therefore no statistical procedure can be applied automatically. In addition, the ratio of individuals to variables (for males or females) is low in every population, with the added problem of collinearity among the craniometric variables. A projection onto the principal components is therefore required as a preliminary step to reduce the dimension of the raw data without significant loss of information.
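The collinearity problem and the resulting dimension reduction can be illustrated with a short sketch. It assumes synthetic data in which 21 variables are driven by two latent factors, and a hypothetical helper `components_needed` that returns how many principal components retain a chosen fraction of the variance. The 95% level is an arbitrary illustrative choice, not a value taken from the course.

```python
import numpy as np


def components_needed(X, retained=0.95):
    """Smallest number of principal components of the standardized data
    whose cumulative explained variance reaches the `retained` fraction."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Xs, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, retained)) + 1
    return min(k, len(explained)), explained


rng = np.random.default_rng(1)

# Few individuals relative to variables: 30 individuals, 21 variables,
# generated from 2 latent factors plus small noise (hence strongly collinear).
n_individuals, n_vars = 30, 21
factors = rng.normal(size=(n_individuals, 2))
loadings = rng.normal(size=(2, n_vars))
X = factors @ loadings + 0.1 * rng.normal(size=(n_individuals, n_vars))

# A large condition number of the correlation matrix signals collinearity.
condition_number = np.linalg.cond(np.corrcoef(X, rowvar=False))

k, explained = components_needed(X, retained=0.95)
print(f"condition number: {condition_number:.1f}, components retained: {k}")
```

With real craniometric data the number of components would have to be chosen for each population separately, since the relations between variables differ from one population to another.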
References:
[1] http://www.ubu.es/titulaciones/es/master_evolucion/informacion-academica/objetivos-competencias/competencias
[2] W.W. Howells' Craniometric Data Set. Website: http://konig.la.utk.edu/howells.htm
Keywords:
Generic skills, statistical training, data processing, anthropology.