11:55 Wednesday 14th October, 2015. See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/

Lecture 13: Simple Linear Regression in Matrix Format
36-401, Section B, Fall 2015
13 October 2015

Contents

1 Least Squares in Matrix Form
  1.1 The Basic Matrices
  1.2 Mean Squared Error
  1.3 Minimizing the MSE
2 Fitted Values and Residuals
  2.1 Residuals
  2.2 Expectations and Covariances
3 Sampling Distribution of Estimators
4 Derivatives with Respect to Vectors
  4.1 Second Derivatives
  4.2 Maxima and Minima
5 Expectations and Variances with Vectors and Matrices
6 Further Reading

So far, we have not used any notions, or notation, that go beyond basic algebra and calculus (and probability). This has forced us to do a fair amount of book-keeping, as it were by hand. This is just about tolerable for the simple linear model, with one predictor variable. It will get intolerable if we have multiple predictor variables. Fortunately, a little application of linear algebra will let us abstract away from a lot of the book-keeping details, and make multiple linear regression hardly more complicated than the simple version.[1] These notes will not remind you of how matrix algebra works. However, they will review some results about calculus with matrices, and about expectations and variances with vectors and matrices. Throughout, bold-faced letters will denote matrices, as $\mathbf{a}$ as opposed to a scalar $a$.

[1] Historically, linear models with multiple predictors evolved before the use of matrix algebra for regression. You may imagine the resulting drudgery.

1 Least Squares in Matrix Form

Our data consists of $n$ paired observations of the predictor variable $X$ and the response variable $Y$, i.e., $(x_1, y_1), \ldots, (x_n, y_n)$. We wish to fit the model
\[
Y = \beta_0 + \beta_1 X + \epsilon \tag{1}
\]
where $E[\epsilon \mid X = x] = 0$, $\mathrm{Var}[\epsilon \mid X = x] = \sigma^2$, and $\epsilon$ is uncorrelated across measurements.[2]

[2] When I need to also assume that $\epsilon$ is Gaussian, and strengthen "uncorrelated" to "independent", I'll say so.

1.1 The Basic Matrices

Group all of the observations of the response into a single column ($n \times 1$) matrix $\mathbf{y}$,
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{2}
\]
Similarly, we group both the coefficients into a single vector (i.e., a $2 \times 1$ matrix)
\[
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \tag{3}
\]
We'd also like to group the observations of the predictor variable together, but we need something which looks a little unusual at first:
\[
\mathbf{x} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \tag{4}
\]
This is an $n \times 2$ matrix, where the first column is always 1, and the second column contains the actual observations of $X$. We have this apparently redundant first column because of what it does for us when we multiply $\mathbf{x}$ by $\beta$:
\[
\mathbf{x}\beta = \begin{bmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{bmatrix} \tag{5}
\]
That is, $\mathbf{x}\beta$ is the $n \times 1$ matrix which contains the point predictions. The matrix $\mathbf{x}$ is sometimes called the design matrix.

1.2 Mean Squared Error

At each data point, using the coefficients $\beta$ results in some error of prediction, so we have $n$ prediction errors. These form a vector:
\[
e(\beta) = \mathbf{y} - \mathbf{x}\beta \tag{6}
\]
(You can check that this subtracts an $n \times 1$ matrix from an $n \times 1$ matrix.)
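To make the matrix bookkeeping concrete, here is a minimal numerical sketch in Python with NumPy. It is not part of the original notes: the data values, the trial coefficient vector, and the variable names are invented purely for illustration.

```python
import numpy as np

# Invented data: n = 5 observations of the predictor and the response.
x_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8]).reshape(-1, 1)   # n x 1 column matrix

# Design matrix (equation 4): first column all ones, second column the x's.
x = np.column_stack([np.ones_like(x_obs), x_obs])        # n x 2

# A trial coefficient vector beta = (beta_0, beta_1)^T.
beta = np.array([[0.5], [1.8]])                          # 2 x 1

# x beta is the n x 1 vector of point predictions (equation 5), and
# e(beta) = y - x beta is the n x 1 vector of prediction errors (equation 6).
point_predictions = x @ beta
errors = y - x @ beta
print(point_predictions.ravel())
print(errors.ravel())
```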
When we derived the least squares estimator, we used the mean squared error,
\[
MSE(\beta) = \frac{1}{n} \sum_{i=1}^{n} e_i^2(\beta) \tag{7}
\]
How might we express this in terms of our matrices? I claim that the correct form is
\[
MSE(\beta) = \frac{1}{n} e^T e \tag{8}
\]
To see this, look at what the matrix multiplication really involves:
\[
\begin{bmatrix} e_1 & e_2 & \ldots & e_n \end{bmatrix}
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \tag{9}
\]
This clearly equals $\sum_i e_i^2$, so the MSE has the claimed form.

Let us expand this a little for further use.
\begin{align*}
MSE(\beta) & = \frac{1}{n} e^T e \tag{10} \\
& = \frac{1}{n} (\mathbf{y} - \mathbf{x}\beta)^T (\mathbf{y} - \mathbf{x}\beta) \tag{11} \\
& = \frac{1}{n} (\mathbf{y}^T - \beta^T \mathbf{x}^T)(\mathbf{y} - \mathbf{x}\beta) \tag{12} \\
& = \frac{1}{n} \left( \mathbf{y}^T \mathbf{y} - \mathbf{y}^T \mathbf{x}\beta - \beta^T \mathbf{x}^T \mathbf{y} + \beta^T \mathbf{x}^T \mathbf{x} \beta \right) \tag{13}
\end{align*}

1.3 Minimizing the MSE

Notice that $(\mathbf{y}^T \mathbf{x}\beta)^T = \beta^T \mathbf{x}^T \mathbf{y}$. Further notice that this is a $1 \times 1$ matrix, so $\mathbf{y}^T \mathbf{x}\beta = \beta^T \mathbf{x}^T \mathbf{y}$. Thus
\[
MSE(\beta) = \frac{1}{n} \left( \mathbf{y}^T \mathbf{y} - 2\beta^T \mathbf{x}^T \mathbf{y} + \beta^T \mathbf{x}^T \mathbf{x} \beta \right) \tag{14}
\]
First, we find the gradient of the MSE with respect to $\beta$:
\begin{align*}
\nabla MSE(\beta) & = \frac{1}{n} \left( \nabla \mathbf{y}^T \mathbf{y} - 2 \nabla \beta^T \mathbf{x}^T \mathbf{y} + \nabla \beta^T \mathbf{x}^T \mathbf{x} \beta \right) \tag{15} \\
& = \frac{1}{n} \left( 0 - 2 \mathbf{x}^T \mathbf{y} + 2 \mathbf{x}^T \mathbf{x} \beta \right) \tag{16} \\
& = \frac{2}{n} \left( \mathbf{x}^T \mathbf{x} \beta - \mathbf{x}^T \mathbf{y} \right) \tag{17}
\end{align*}
We now set this to zero at the optimum, $\hat{\beta}$:
\[
\mathbf{x}^T \mathbf{x} \hat{\beta} - \mathbf{x}^T \mathbf{y} = 0 \tag{18}
\]
This equation, for the two-dimensional vector $\hat{\beta}$, corresponds to our pair of normal or estimating equations for $\hat{\beta}_0$ and $\hat{\beta}_1$. Thus, it, too, is called an estimating equation. Solving,
\[
\hat{\beta} = (\mathbf{x}^T \mathbf{x})^{-1} \mathbf{x}^T \mathbf{y} \tag{19}
\]
That is, we've got one matrix equation which gives us both coefficient estimates.

If this is right, the equation we've got above should in fact reproduce the least-squares estimates we've already derived, which are of course
\[
\hat{\beta}_1 = \frac{c_{XY}}{s^2_X} = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2} \tag{20}
\]
and
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{21}
\]
Let's see if that's right.

As a first step, let's introduce normalizing factors of $1/n$ into both the matrix products:
\[
\hat{\beta} = (n^{-1} \mathbf{x}^T \mathbf{x})^{-1} (n^{-1} \mathbf{x}^T \mathbf{y}) \tag{22}
\]
Now let's look at the two factors in parentheses separately, from right to left.
\begin{align*}
\frac{1}{n} \mathbf{x}^T \mathbf{y} & = \frac{1}{n} \begin{bmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{23} \\
& = \begin{bmatrix} \frac{1}{n} \sum_i y_i \\ \frac{1}{n} \sum_i x_i y_i \end{bmatrix} \tag{24} \\
& = \begin{bmatrix} \bar{y} \\ \overline{xy} \end{bmatrix} \tag{25}
\end{align*}
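Before finishing the algebra, here is a quick numerical sanity check of equation (19) against the scalar formulas (20)-(21). This sketch is not from the original notes; the simulated data, the random seed, and the choice to solve the normal equations with np.linalg.solve rather than invert $\mathbf{x}^T \mathbf{x}$ explicitly are all my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = 2 + 3 X + Gaussian noise, purely for illustration.
n = 200
x_obs = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 3.0 * x_obs + rng.normal(scale=1.5, size=n)

# Design matrix with the leading column of ones.
x = np.column_stack([np.ones(n), x_obs])

# Matrix estimator, equation (19): beta_hat = (x^T x)^{-1} x^T y.
# Solving the normal equations directly avoids forming an explicit inverse.
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)

# Scalar formulas, equations (20) and (21).
beta1_hat = (np.mean(x_obs * y) - x_obs.mean() * y.mean()) / \
            (np.mean(x_obs ** 2) - x_obs.mean() ** 2)
beta0_hat = y.mean() - beta1_hat * x_obs.mean()

print(beta_hat)               # [beta_0_hat, beta_1_hat] from the matrix formula
print(beta0_hat, beta1_hat)   # should agree to floating-point precision
```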