STUDENTS’ UNDERSTANDING OF LINEAR REGRESSION
Polytechnic University of Valencia (SPAIN)
About this paper:
Appears in:
INTED2015 Proceedings
Publication year: 2015
Pages: 1127-1136
ISBN: 978-84-606-5763-7
ISSN: 2340-1079
Conference name: 9th International Technology, Education and Development Conference
Dates: 2-4 March, 2015
Location: Madrid, Spain
Abstract:
In this paper we analyze students’ beliefs and difficulties in understanding linear regression methods. This knowledge can be used to develop improved teaching programs. Two groups of students are considered. The first group is enrolled in an introductory statistics course in the computer science engineering program. This course is taught in the first semester of the first year of the program. The topics of correlation and association, and simple linear regression, are taught as part of the curriculum. A brief introduction to inference in the regression model is also included. These students work only with quantitative data. The second group is formed by students in the environmental science degree. In this case the course is taught in the fourth year of the degree and covers more advanced statistical methods: analysis of variance, multiple regression, descriptive analysis of time series and some multivariate methods. These students have previously taken introductory courses in statistics covering descriptive and probability methods. The multiple regression methods are taught after an introduction that reviews correlation and association, and simple regression. The topic covers questions such as polynomial regression, the use of dummy variables to include the effects of qualitative factors in the model, analysis of interactions between factors, and inference on the model. In the descriptive analysis of time series, regression is introduced to estimate temporal trends and seasonal components.
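As an illustration of the dummy-variable and interaction ideas mentioned above, the following is a minimal sketch and is not taken from the course materials. It assumes a single quantitative predictor `x`, a two-level qualitative factor coded as a 0/1 dummy `d`, and synthetic noise-free data; the helper names `design_matrix` and `fit_ols` are hypothetical.

```python
import numpy as np


def design_matrix(x, d):
    """Columns: intercept, x, dummy d, and the interaction x*d."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    return np.column_stack([np.ones_like(x), x, d, x * d])


def fit_ols(X, y):
    """Ordinary least-squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Synthetic, noise-free data (an assumption made purely for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = reference level, 1 = other level
y = 1.0 + 2.0 * x + 0.5 * d + 1.5 * x * d

# Because the data are noise-free, the fit recovers the chosen
# coefficients (1.0, 2.0, 0.5, 1.5) up to floating-point error.
coef = fit_ols(design_matrix(x, d), y)
print(coef)
```

In this coding, the coefficient on `d` shifts the intercept for the second level of the factor, and the coefficient on `x * d` changes the slope, which is one common way to read an interaction between a qualitative factor and a quantitative predictor.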
In both cases the courses focus on the application of the methods with real data. The emphasis is placed on the use and interpretation of output from statistical software for the estimation of statistical parameters and regression models. Several examples of activities that students carry out during the courses are presented, and the results of evaluations of these concepts are discussed. Prior to studying linear regression, students develop intuition about bivariate data and statistical association between two variables, also known as covariation. Reasoning about association is an important cognitive activity that humans perform. Several examples are used to illustrate that correlation and association do not necessarily indicate causation. Students develop the ability to interpret scatterplots in order to visually identify and describe relationships between two variables. Students are taught the different steps involved in building a model: a clear statement of the problem, data collection and initial exploration, model formulation, tentative model fitting and inference, validation of assumptions, and the use of the model to make predictions and decisions regarding the problem. The teaching of linear regression is also designed to help students understand that a model is a useful approximation to reality but that it should never be considered the final word.
Keywords:
Statistics education, linear regression models, correlation, computer science degree, environmental science degree, undergraduate students.