THE ROLE OF STATISTICAL SOFTWARE IN TEACHING DATA ANALYSIS

C. Capilla

Polytechnic University of Valencia (SPAIN)
Teaching statistics at the university level is constantly changing under the influence of modern technology. The use of statistical software for computations and visual representations enables students to construct knowledge actively by “doing” and “seeing” statistics. This paper describes a case study of teaching statistical data analysis using software. The subject belongs to the environmental science degree at the Polytechnic University of Valencia (Spain) and is taught in the second semester of the fourth year of the specialization. In the classroom, a computer display projected on a screen is used to introduce the basic concepts of data analysis. During the semester there are ten computer laboratory classes in which students work in teams of two and apply data analysis techniques using software. SPSS and Statgraphics are the programs available on the university campus for use in the classes. The students’ activities in the computer lab consist of analysing environmental data from different fields (air and water pollution, meteorology, etc.). The first three activities deal with descriptive analysis (one-dimensional and two-dimensional frequency tables, univariate descriptive techniques for quantitative data, and bivariate descriptive techniques for quantitative data). The assessment of these sessions has shown that students have more difficulties and misunderstandings when interpreting univariate descriptive parameters (location, dispersion and shape) and descriptive statistical plots (histograms, box-and-whisker plots). In the fourth session in the computer lab, students have to apply discrete probability models (Binomial and Poisson) to answer different questions, including the design of a sampling procedure. The next session deals with continuous probability models (probability plots; normal, lognormal, uniform, exponential and extreme value distributions).
The evaluation results indicate that students have more difficulty applying discrete models than continuous ones. Statistical inference is the subject of sessions six to eight. Session six requires the use of hypothesis tests and confidence intervals for the mean and standard deviation of a normal population, and a hypothesis test on the correlation coefficient. Session seven is devoted to comparing the means and standard deviations of two normal populations. Session eight introduces the analysis of variance. The last two computer lab sessions introduce more advanced data analysis methods: the multiple regression model, and descriptive analysis of temporal and multivariate data. A comparison of students’ marks across all the evaluations shows that they are significantly lower in sessions two, four and ten. There are statistically significant differences both among the activities and among the students. The students are heterogeneous in their backgrounds, which is an important factor influencing the teaching-learning process of the course.