Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
September 2004

A. Introduction and assumptions

The classical normal linear regression model can be written as

(1) y = Xβ + ε

or

(2) y_t = x_t β + ε_t, where x_t is the tth row of the matrix X

or simply as

(3) y_t = x_t β + ε_t

where it is implicit that x_t is a row vector containing the regressors for the tth time period. The classical assumptions on the model can be summarized as

(4) I. y = Xβ + ε; II. E(ε) = 0; III. E(εε′) = σ²I_n; IV. X nonstochastic with full rank; V. ε ~ N(0, σ²I_n)

Assumption V as written implies II and III. These assumptions are described as

1. linearity
2. zero mean of the error vector
3. scalar covariance matrix for the error vector
4. nonstochastic X matrix of full rank
5. normality of the error vector

With normally distributed disturbances, the joint density (and therefore likelihood function) of y is
(5) L(β, σ²) = (2πσ²)^(−n/2) exp[ −(y − Xβ)′(y − Xβ)/(2σ²) ]

The natural log of the likelihood function is given by

(6) ln L = −(n/2) ln(2π) − (n/2) ln σ² − (y − Xβ)′(y − Xβ)/(2σ²)

Maximum likelihood estimators are obtained by setting the derivatives of (6) equal to zero and solving the resulting k+1 equations for the k β's and σ². These first order conditions for the ML estimators are

(7) ∂ln L/∂β = X′(y − Xβ)/σ² = 0, ∂ln L/∂σ² = −n/(2σ²) + (y − Xβ)′(y − Xβ)/(2σ⁴) = 0

Solving, we obtain

(8) β̂ = (X′X)⁻¹X′y, σ̂² = (y − Xβ̂)′(y − Xβ̂)/n

The ordinary least squares estimator is obtained by minimizing the sum of squared errors, which is defined by
(9) SSE(β) = (y − Xβ)′(y − Xβ)

The necessary condition for SSE(β) to be a minimum is that

(10) ∂SSE/∂β = −2X′(y − Xβ) = 0

This gives the normal equations, which can then be solved to obtain the least squares estimator

(11) β̂ = (X′X)⁻¹X′y

The maximum likelihood estimator of β is therefore the same as the least squares estimator. The distribution of this estimator is given as

(12) β̂ ~ N(β, σ²(X′X)⁻¹)

We have shown that the least squares estimator is:

1. unbiased
2. minimum variance of all unbiased estimators
3. consistent
4. asymptotically normal
5. asymptotically efficient

In this section we will discuss how the statistical properties of β̂ depend crucially upon assumptions I-V. The discussion will proceed by dropping one assumption at a time and considering the consequences. Following a general discussion, later sections will analyze specific violations of the assumptions in detail.
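A minimal numerical sketch of the normal equations and the ML variance estimator in (8) and (11), with simulated data (all numbers hypothetical):

```python
import numpy as np

# Hypothetical simulated data: y = X beta + eps, eps ~ N(0, sigma^2 I)
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations X'X b = X'y for the OLS/ML estimator
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# ML estimator of sigma^2 divides by n (the unbiased version uses n - k)
resid = y - X @ beta_hat
sigma2_ml = resid @ resid / n
```

Solving the normal equations directly (rather than inverting X′X) is the standard numerically stable way to compute (11).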
B. Nonlinearity

1. nonlinearity in the variables only

If the model is nonlinear in the variables, but linear in the parameters, it can still be estimated using linear regression techniques. For example, consider a set of variables z = (z_1, z_2, ..., z_p), a set of k functions h_1, ..., h_k, and parameters β_0, β_1, ..., β_k. Now define the model:

(13) y_t = β_0 + β_1 h_1(z_t) + ... + β_k h_k(z_t) + ε_t

This model is linear in the parameters and can be estimated using standard techniques, where the functions h_i take the place of the x variables in the standard model.

2. intrinsic linearity in the parameters

a. idea

Sometimes models are nonlinear in the parameters. Some of these may be intrinsically linear, however. In the classical model, if the k parameters β_1, ..., β_k can be written as k one-to-one functions (perhaps nonlinear) of a set of k underlying parameters θ_1, ..., θ_k, then the model is intrinsically linear in θ.

b. example

(14)

The model is nonlinear in the parameter A, but since it is linear in β_0, and β_0 is a one-to-one function of A, the model is intrinsically linear.

3. inherently nonlinear models

Models that are inherently nonlinear cannot be estimated using ordinary least squares and the previously derived formulas. Alternatives include Taylor's series approximations and direct nonlinear estimation. In the section on nonlinear estimation we showed that the nonlinear least squares estimator is:

1. consistent
2. asymptotically normal

We also showed that the maximum likelihood estimator in a general nonlinear model is:

1. consistent
2. asymptotically normal
3. asymptotically efficient, in the sense that within the consistent asymptotically normal (CAN) class it has minimum variance
If the distribution of the error terms in the nonlinear least squares model is normal, and the errors are iid(0, σ²), then the nonlinear least squares estimator and the maximum likelihood estimator will be the same, just as in the classical normal linear regression model.

C. Nonzero expected value of the error term (E(ε_t) ≠ 0)

Consider the case where ε has a nonzero expectation. The least squares estimator of β is given by

(15) β̂ = (X′X)⁻¹X′y

The expected value of β̂ is given as follows:

(16) E(β̂) = β + (X′X)⁻¹X′E(ε)

which appears to suggest that all of the least squares estimators in the vector β̂ are biased. However, if E(ε_t) = μ for all t, then

(17) E(β̂) = β + μ(X′X)⁻¹X′ι

where ι is an n×1 column of ones. To interpret this, consider the rules for matrix multiplication.
(18) X′ι = (Σ_t x_t1, Σ_t x_t2, ..., Σ_t x_tk)′

Now consider

(19) (X′X)⁻¹X′ι

The first column of the X matrix is a column of ones, so X′ι is exactly the first column of X′X; that is, X′ι = (X′X)e_1, where e_1 = (1, 0, ..., 0)′. Therefore

(20) (X′X)⁻¹X′ι = (X′X)⁻¹(X′X)e_1 = e_1

Thus it is clear that
(21) E(β̂_1) = β_1 + μ, E(β̂_j) = β_j for j = 2, ..., k

and only the estimator of the intercept is biased. This situation can arise if a relevant and important factor has been omitted from the model, but the factor doesn't change over time. The effect of this variable is then included in the intercept, and separate estimates of β_1 and μ can't be obtained. More general violations lead to more serious problems, and in general the least squares estimators of β and σ² are biased.
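The intercept-only bias result can be checked with a small Monte Carlo sketch (all numbers hypothetical): with a constant error mean μ, the average intercept estimate drifts to β_1 + μ while the slope stays centered on its true value.

```python
import numpy as np

# Hypothetical illustration: when E(eps_t) = mu for all t, only the
# intercept estimator absorbs the bias; slope estimators stay unbiased.
rng = np.random.default_rng(1)
n, reps, mu = 100, 2000, 3.0
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
b0, b1 = 1.0, 2.0

estimates = np.empty((reps, 2))
for r in range(reps):
    eps = rng.normal(loc=mu, size=n)        # error mean mu, not zero
    y = b0 + b1 * x + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

mean_b0, mean_b1 = estimates.mean(axis=0)   # approx b0 + mu and b1
```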
D. A nonscalar identity covariance matrix

1. introduction

Assumption III implies that the covariance matrix of the error vector is a constant multiplied by the identity matrix. In general this covariance may be any positive definite matrix. Different assumptions about this matrix will lead to different properties of the various estimators.

2. heteroskedasticity

Heteroskedasticity is the case where the diagonal terms of the covariance matrix are not all equal, i.e. Var(ε_t) = σ_t² is not the same for all t. With heteroskedasticity alone the covariance matrix is given by

(22) E(εε′) = diag(σ_1², σ_2², ..., σ_n²)

This model will have k + n parameters and cannot be estimated using n observations unless some assumptions (restrictions) about the parameters are made.
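Under heteroskedasticity the OLS point estimates are still usable, but the usual variance formula σ²(X′X)⁻¹ is wrong. A sketch of the heteroskedasticity-consistent covariance estimator of White (1980), discussed later in this section, on simulated data with an assumed variance pattern:

```python
import numpy as np

# Sketch of White's heteroskedasticity-consistent covariance estimator:
# (X'X)^{-1} [sum_t e_t^2 x_t x_t'] (X'X)^{-1}. Data are hypothetical.
rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(scale=1 + x)           # assumed heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat                    # OLS residuals

meat = (X * e[:, None] ** 2).T @ X      # sum_t e_t^2 x_t x_t'
V_white = XtX_inv @ meat @ XtX_inv      # sandwich estimator
se_white = np.sqrt(np.diag(V_white))    # robust standard errors
```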
3. autocorrelation

Autocorrelation is the case where the off-diagonal elements of the covariance matrix are not zero, i.e. Cov(ε_t, ε_s) ≠ 0 for t ≠ s. With no autocorrelation, the errors have no discernible pattern. In the autocorrelated case, positive values of ε_t tend to be associated with positive values of ε_t+1, and so on. With autocorrelation alone the covariance matrix is given by

(23) E(εε′) with constant diagonal σ² and nonzero off-diagonal elements σ_ts

This model will have k + 1 + (n(n−1)/2) parameters and cannot be estimated using n observations unless some assumptions (restrictions) about the parameters are made.
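One standard restriction is an assumed AR(1) error process ε_t = ρε_t−1 + u_t, which collapses the n(n−1)/2 free covariances to the single parameter ρ. A sketch of the implied covariance matrix (illustrative values, not from the text):

```python
import numpy as np

# Under an assumed AR(1) process e_t = rho * e_{t-1} + u_t with
# Var(u_t) = sigma2, Cov(e_t, e_s) = sigma2 * rho^|t-s| / (1 - rho^2),
# so the whole n x n covariance matrix depends on only (rho, sigma2).
def ar1_covariance(n, rho, sigma2=1.0):
    t = np.arange(n)
    return sigma2 / (1.0 - rho ** 2) * rho ** np.abs(t[:, None] - t[None, :])

Omega = ar1_covariance(5, rho=0.6)
```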
4. the general linear model

For situations in which autocorrelation or heteroskedasticity exists,

(24) E(εε′) = σ²Ω, with Ω positive definite

and the model can be written more generally as

(25) y = Xβ + ε, ε|X ~ N(0, σ²Ω) (Assumption VI)

Assumption VI as written here allows X to be stochastic but, along with II, allows all results to be conditioned on X in a meaningful way. This model is referred to as the generalized normal linear regression model and includes the classical normal linear regression model as a special case, i.e., when Ω = I. The unknown parameters in the generalized regression model are the β's = (β_1, ..., β_k)′ and the n(n+1)/2 independent elements of the covariance matrix. In general it is not possible to estimate these unless simplifying assumptions are made, since one cannot estimate k + [n(n+1)/2] parameters with n observations.

5. Least squares estimation of β in the general linear model with Ω known

Least squares estimation makes no assumptions about the disturbance covariance matrix and so is defined as before using the sum of squared errors. The sum of squared errors is defined by

(26) SSE(β) = (y − Xβ)′(y − Xβ)

The necessary condition for SSE(β) to be a minimum is that

(27) ∂SSE/∂β = −2X′(y − Xβ) = 0
This gives the normal equations, which can then be solved to obtain the least squares estimator

(28) β̂ = (X′X)⁻¹X′y

The least squares estimator is exactly the same as before. Its properties may be different, however, as will be shown in a later section.

6. Maximum likelihood estimation with Ω known

The likelihood function for the vector random variable y is given by the multivariate normal density. For this model

(29) y ~ N(Xβ, σ²Ω)

Therefore the likelihood function is given by

(30) L(β, σ²) = (2πσ²)^(−n/2) |Ω|^(−1/2) exp[ −(y − Xβ)′Ω⁻¹(y − Xβ)/(2σ²) ]

The natural log of the likelihood function is given as

(31) ln L = −(n/2) ln(2π) − (n/2) ln σ² − (1/2) ln|Ω| − (y − Xβ)′Ω⁻¹(y − Xβ)/(2σ²)

The MLE of β is defined by maximizing (31):

(32) ∂ln L/∂β = X′Ω⁻¹(y − Xβ)/σ² = 0

This then yields as an estimator of β

(33) β̃ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y

This estimator differs from the least squares estimator. Thus the least squares estimator will have different properties than the maximum likelihood estimator. Notice that if Ω is equal to I, the estimators are the same.
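A sketch of the estimator in (33) with a known (assumed) diagonal Ω, compared with plain OLS on the same simulated data (all values hypothetical):

```python
import numpy as np

# GLS/ML estimator with a known covariance Omega; here Omega is
# diagonal with assumed variances Var(eps_t) = x_t^2.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 1.5])

variances = x ** 2                      # assumed known variances
eps = rng.normal(scale=np.sqrt(variances))
y = X @ beta_true + eps

Omega_inv = np.diag(1.0 / variances)
# beta_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
# Plain OLS for comparison: still unbiased, but not minimum variance here
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```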
(34) σ̃² = (y − Xβ̃)′Ω⁻¹(y − Xβ̃)/n

7. Best linear unbiased estimation with Ω known

BLUE estimators are obtained by finding the best estimator that satisfies certain conditions. BLUE estimators have the properties of being linear, unbiased, and minimum variance among all linear unbiased estimators. Linearity and unbiasedness can be summarized as

(35) β* = Ay, with E(β*) = AXβ = β for all β, i.e., AX = I

The estimator must also be minimum variance. One definition of this is that the variance of each β*_i must be a minimum. The variance of the ith β*_i is given by the ith diagonal element of

(36) Var(β*) = σ²AΩA′

This can be denoted as

(37) Var(β*_i) = σ²a_i′Ωa_i

where a_i′ is the ith row of the matrix A, and the unbiasedness condition for the ith element is a_i′X = ι_i′, where ι_i′ is the ith row of a k×k identity matrix. The construction of the estimator can be reduced to selecting the matrix A so that the rows of A minimize (37) subject to the unbiasedness restriction:

(38) min a_i′Ωa_i subject to a_i′X = ι_i′

Because the result will be symmetric for each β*_i (hence, for each a_i), denote a_i by a, where a is an (n by 1) vector. The problem then becomes:
(39) min_a a′Ωa subject to X′a = ι_i

The column vector ι_i is the ith column of the identity matrix. The Lagrangian is as follows:

(40) L = a′Ωa + λ′(ι_i − X′a)

To minimize it, take the derivatives with respect to a and λ:

(41) ∂L/∂a = 2Ωa − Xλ = 0, ∂L/∂λ = ι_i − X′a = 0

Now substitute a′ = (1/2)λ′X′Ω⁻¹ into the second equation in (41) to obtain

(42) λ = 2(X′Ω⁻¹X)⁻¹ι_i, so a′ = ι_i′(X′Ω⁻¹X)⁻¹X′Ω⁻¹ and A = (X′Ω⁻¹X)⁻¹X′Ω⁻¹

It is obvious that AX = I. The BLUE and MLE estimators of β are identical, but different from the least squares estimator of β. We sometimes call the BLUE estimator of β in the general linear model the generalized least squares estimator, GLS. This estimator is also sometimes called the Aitken estimator, after the individual who first proposed it.

8. A note on the distribution of β̂

a. introduction
For the Classical Normal Linear Regression Model we showed that

(43) β̂ ~ N(β, σ²(X′X)⁻¹)

For the Generalized Regression Model

(44) β̂ ~ N(β, σ²(X′X)⁻¹X′ΩX(X′X)⁻¹)

b. unbiasedness of ordinary least squares in the general linear model

As before, write β̂ in the following fashion:

(45) β̂ = β + (X′X)⁻¹X′ε

Now take the expected value of equation (45):

(46) E(β̂|X) = β + (X′X)⁻¹X′E(ε|X) = β

Because (X′X)⁻¹X′ is either fixed, or a function only of X if X is stochastic, it can be factored out of the expectation, leaving E(ε|X), which has an expectation of zero by assumption II. Now find the unconditional expectation of β̂ by using the law of iterated expectations. In the sense of Theorem 3 of the section on alternative estimators, h(X, y) is β̂, and E_y|X computes the expected value of β̂ conditioned on X. The interpretation of this result is that for any particular set of observations X, the least squares estimator has expectation β.

c. variance of the OLS estimator

First rewrite equation (45) as follows:
(48) β̂ − β = (X′X)⁻¹X′ε

Now directly compute the variance of β̂ given X:

(49) Var(β̂|X) = E[(β̂ − β)(β̂ − β)′|X] = σ²(X′X)⁻¹X′ΩX(X′X)⁻¹

If the regressors are nonstochastic, then this is also the unconditional variance of β̂. If the regressors are stochastic, then the unconditional variance is given by σ²E[(X′X)⁻¹X′ΩX(X′X)⁻¹].

d. unbiasedness of MLE and BLUE in the general linear model

First write the GLS estimator as follows:

(50) β̃ = β + (X′Ω⁻¹X)⁻¹X′Ω⁻¹ε

Now take the expected value of equation (50):

(51) E(β̃|X) = β + (X′Ω⁻¹X)⁻¹X′Ω⁻¹E(ε|X) = β

Because (X′Ω⁻¹X)⁻¹X′Ω⁻¹ is either fixed, or a function only of X if X is stochastic, it can be factored out of the expectation, leaving E(ε|X), which has an expectation of zero by assumption II. Now find the unconditional expectation of β̃ by using the law of iterated expectations:

(52) E(β̃) = E_X[E(β̃|X)] = β

The interpretation of this result is that for any particular set of observations X, the generalized least squares estimator has expectation β.

e. variance of the GLS (MLE and BLUE) estimator
First rewrite equation (50) as follows:

(53) β̃ − β = (X′Ω⁻¹X)⁻¹X′Ω⁻¹ε

Now directly compute the variance of β̃ given X:

(54) Var(β̃|X) = σ²(X′Ω⁻¹X)⁻¹

If the regressors are nonstochastic, then this is also the unconditional variance of β̃. If the regressors are stochastic, then the unconditional variance is given by σ²E[(X′Ω⁻¹X)⁻¹].

f. summary of finite sample properties of OLS in the general model

Note that all the estimators are unbiased estimators of β, but Var(β̂) ≥ Var(β̃). If Ω = I then the classical model results are obtained. Thus using least squares in the generalized model gives unbiased estimators, but the variance of the estimator may not be minimal.

9. Consistency of OLS in the generalized linear regression model

We have shown that the least squares estimator in the general model is unbiased. If we can show that its variance goes to zero as n goes to infinity, we will have shown that it is mean square error consistent, and thus that it converges in probability to β. This variance is given by

(55) Var(β̂|X) = σ²(X′X)⁻¹X′ΩX(X′X)⁻¹ = (σ²/n)(X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹

As previously, we will assume that

(56) plim(X′X/n) = Q, a finite nonsingular matrix

With this assumption, we need to consider the remaining term, i.e.,
(57) (σ²/n)(X′ΩX/n)

The leading term, σ²/n, will by itself go to zero. We can write the matrix term in the following useful fashion, similar to the way we wrote out a matrix product in proving the asymptotic normality of the nonlinear least squares estimator. Remember specifically that

(58) X′X = Σ_t x_t′x_t

where x_t is the tth row of X. In similar fashion we can show that the matrix in equation (57) can be written as

(59) (σ²/n)·[(1/n) Σ_t Σ_s ω_ts x_t′x_s]

The second term in equation (59), the bracketed factor, is a sum of n² terms divided by n. In order to check convergence of this product, we need to consider the order of each term. Remember the definition of order given earlier.

Definition of order:

1. A sequence {a_n} is at most of order n^λ, which we denote O(n^λ), if a_n/n^λ is bounded. When λ = 0, {a_n} is bounded, and we also write a_n = O(1), which we say as "big oh one."
2. A sequence {a_n} is of smaller order than n^λ, which we denote o(n^λ), if a_n/n^λ converges to zero. When λ = 0, {a_n} converges to zero, and we also write a_n = o(1), which we say as "little oh one."

The first term in the product is of order 1/n, O(1/n). The second term in general is of O(n). So it appears that if the product of these two terms converges, it might converge to a matrix of nonzero constants. If this were the case, proving consistency would be a problem. At this point we will simply make an assumption as follows.
(60) plim(X′ΩX/n) = Q*, a finite matrix

If this is the case, then the expression in equation (57) will converge in the limit to zero, and β̂ will be consistent. Using arguments similar to those adopted previously, we can also show that the OLS estimator will be asymptotically normal in a wide variety of settings (Amemiya, 1985). Discussion of the GLS estimator comes later.

10. Consistent estimators of the covariance matrix in the case of general error structures

a. general discussion

If Ω were known, then the estimator of the asymptotic covariance matrix of β̂ would be

(61) (1/n)(X′X/n)⁻¹[σ²(X′ΩX/n)](X′X/n)⁻¹

The outer terms are available from the data, and if Ω were known, we would have the information we need to compute standard errors. From a sample of n observations, there is no way to estimate the n(n+1)/2 elements of Ω. But what we really need is an estimator of σ²(X′ΩX/n), which is a symmetric k×k matrix. We can write this in a more useful fashion as follows:

(62) Q* = (1/n) Σ_t Σ_s σ_ts x_t′x_s

where σ_ts is the appropriate element of σ²Ω. The idea will be to use information on the residuals from the least squares regression to devise a way to approximate Q*.

b. heteroskedasticity only

In the case where there is no autocorrelation, that is, when Ω is a diagonal matrix, we can write equation (62) as

(63) Q* = (1/n) Σ_t σ_t² x_t′x_t

White has shown that under very general conditions, the estimator
(64) S_0 = (1/n) Σ_t e_t² x_t′x_t

is consistent for Q*. The proof is based on the fact that β̂ is a consistent estimator of β (meaning the residuals e_t are consistent estimates of ε_t), and fairly mild assumptions on X. Then rather than using σ̂²(X′X)⁻¹ to estimate the variance of β̂ in the general model, we instead use

(65) Est. Var(β̂) = n(X′X)⁻¹S_0(X′X)⁻¹

c. autocorrelation

In the case of a more general covariance matrix, a candidate estimator for Q* might be

(66) (1/n) Σ_t Σ_s e_t e_s x_t′x_s

The difficulty here is that this matrix may not converge in the limit. To obtain convergence, it is necessary to assume that the terms involving unequal subscripts in (66) diminish in importance as n grows. A sufficient condition is that terms with subscript pairs t, s grow smaller as the distance between them grows larger. A more practical problem for estimation is that the estimate of Q* may not be positive definite. Newey and West have proposed an estimator that solves this problem using some of the cross products e_t e_s. This estimator will be discussed in a later section.

E. Stochastic X matrix (possibly less than full rank)

1. X matrix less than full rank

If the X matrix, which is n×k, has rank less than k, then X′X cannot be inverted and the least squares estimator will not be defined. This was discussed in detail in the section on multicollinearity.

2. Stochastic X

Consider the least squares estimator of β in the classical model. We showed that it was unbiased as follows:

(67) E(β̂) = β + E[(X′X)⁻¹X′ε] = β

If the X matrix is stochastic and correlated with ε, we cannot factor it out of the second term in equation (67). If this is the case, E(β̂) ≠ β. In such cases, the least squares estimator is usually not only
biased, but is inconsistent as well. Consider for example the case where X′X/n converges to a finite and nonsingular matrix Q. Then we can compute the probability limit of β̂ as

(68) plim β̂ = β + Q⁻¹ plim(X′ε/n)

which differs from β unless plim(X′ε/n) = 0. We showed previously that a consistent estimator of β could be obtained using instrumental variables (IV). The idea of instrumental variables is to devise an estimator of β such that the second term in equation (68) will have a probability limit of zero. The instrumental variables estimator is based on the idea that the instruments used in the estimation are not highly correlated with ε, and any correlation disappears in large samples. A further condition is that these instruments are correlated with the variables in the matrix X. We defined instrumental variables estimators in two different cases: when the number of instrumental variables was equal to the number of columns of the X matrix, i.e., Z was an n×k matrix, and cases where there were more than k instruments. In either case we assumed that Z had the following properties:

(69) plim(Z′ε/n) = 0, plim(Z′X/n) = Q_ZX, finite and nonsingular

Then the instrumental variables estimator was given by

(70) β̂_IV = (Z′X)⁻¹Z′y

By finding the plim of this estimator, we showed that it was consistent:

(71) plim β̂_IV = β + Q_ZX⁻¹ plim(Z′ε/n) = β
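A just-identified IV sketch on simulated data (the endogeneity structure and all numbers are hypothetical): the regressor shares a shock with the error, so OLS is inconsistent, while the estimator (Z′X)⁻¹Z′y of equation (70) recovers the true slope.

```python
import numpy as np

# Hypothetical just-identified IV example: x is correlated with the
# error (endogeneity); z is correlated with x but not with the error.
rng = np.random.default_rng(4)
n = 20000
z = rng.normal(size=n)
u = rng.normal(size=n)                  # common shock creating endogeneity
x = z + u + rng.normal(size=n)
eps = u + rng.normal(size=n)            # Cov(x, eps) > 0
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent here
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y, consistent
```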
In the case where the number of instrumental variables was greater than k, we formed k instruments by projecting each of the columns of the stochastic X matrix on all of the instruments, and then used the predicted values of X from these regressions as instrumental variables in defining the IV estimator. If we let P_Z be the matrix that projects orthogonally onto the column space defined by the vectors Z, S(Z), then the IV estimator is given by

(72) β̂_IV = (X′P_Z X)⁻¹X′P_Z y

We always assume that the matrix X′P_Z X has full rank, which is a necessary condition for β to be identified. In a similar fashion to equation (71), we can show that this IV estimator is consistent.

F. Random disturbances are not distributed normally (assumptions I-IV hold)

1. General discussion

An inspection of the derivation of the least squares estimator reveals that the deduction is not dependent upon any of the assumptions I-IV except for the full rank condition on X. It really doesn't depend on I if we are simply estimating a linear model, no matter the nature of the underlying model. Thus for the model

(73) y = Xβ + ε

the OLS estimator is always

(74) β̂ = (X′X)⁻¹X′y

even when assumption V is dropped. However, the statistical properties of β̂ are very sensitive to the distribution of ε.

Similarly, we note that while the BLUE of β depends upon II-IV, it is invariant with respect to the assumptions about the underlying density of ε as long as II-IV are valid. We can thus conclude that

(75) β̂ is BLUE

even when the error term is not normally distributed.

2. Properties of the estimators (OLS and BLUE) when ε is not normally distributed

When the error terms in the linear regression model are not normally distributed, the OLS and BLUE estimators are:

a. unbiased
b. minimum variance of all unbiased linear estimators (not necessarily of all unbiased estimators, since the Cramér-Rao lower bound is not known unless we know the density of the residuals)
c. consistent
d. but standard t and F tests and confidence intervals are not necessarily valid for nonnormally
distributed residuals.

The distribution of β̂ (e.g., normal, beta, chi-square, etc.) will depend on the distribution of ε, which determines the distribution of y (y = Xβ + ε). The maximum likelihood estimator, of course, will differ, since it depends explicitly on the joint density function of the residuals. This joint density gives rise to the likelihood function

(76) L(β) = Π_t f(y_t − x_t β)

and requires knowledge of the distribution of the random disturbances. It is not defined otherwise. MLEs are generally efficient estimators, and least squares estimators will be efficient if f(y; β) is normal. However, least squares need not be efficient if the residuals are not distributed normally.
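For intuition, a small simulation with Laplace (double-exponential) errors, the case worked out in the example below: OLS is still unbiased, but the MLE under this density minimizes the sum of absolute deviations. Here that LAD fit is approximated by iteratively reweighted least squares; the data and seed are hypothetical.

```python
import numpy as np

# Hypothetical simulation: Laplace errors. OLS remains unbiased, while
# the Laplace MLE minimizes sum_t |y_t - x_t b|; LAD is approximated
# below by iteratively reweighted least squares (IRLS).
rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.laplace(scale=1.0, size=n)
y = X @ np.array([1.0, 2.0]) + eps

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

beta_lad = beta_ols.copy()
for _ in range(50):                      # IRLS iterations for LAD
    w = 1.0 / np.maximum(np.abs(y - X @ beta_lad), 1e-6)
    XtW = X.T * w                        # reweight by 1/|residual|
    beta_lad = np.linalg.solve(XtW @ X, XtW @ y)
```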
3. example

Consider the case in which the density function of the random disturbances is defined by the Laplace distribution

(77) f(ε_t) = (1/2σ) exp(−|ε_t|/σ)

which is symmetric about zero with fatter tails than the normal. The associated likelihood function is defined by

(78) L = Π_t (1/2σ) exp(−|y_t − x_t β|/σ)

where x_t = (1, x_t2, ..., x_tk) and β′ = (β_1, ..., β_k). The log likelihood function is given by

(79) ln L = −n ln(2σ) − (1/σ) Σ_t |y_t − x_t β|

The MLE of β in this case will minimize

(80) Σ_t |y_t − x_t β|

and is sometimes called the "least lines," minimum absolute deviations (MAD), or least absolute deviations (LAD) estimator. It will have all the properties of maximum likelihood estimators, such as being asymptotically unbiased, consistent, and asymptotically efficient. It need not, however, be unbiased, linear, or minimum variance of all unbiased estimators.

4. Testing for and using other distributions

The functional form of the distribution of the residuals is rarely investigated. This can be done, however, by comparing the distribution of the estimated residuals ε̂_t with the normal.
Various tests have been proposed to test the assumption of normality. These tests take different forms. One class of tests is based on examining the skewness or kurtosis of the distribution of the estimated residuals. Chi-square goodness-of-fit tests have been proposed which are based upon comparing the histogram of estimated residuals with the normal distribution. The Kolmogorov-Smirnov test is based upon the distribution of the maximum vertical distance between the cumulative histogram and the cumulative distribution of the hypothesized distribution. An alternative approach is to consider general distribution functions, such as the beta or gamma, which include many of the common alternative specifications as special cases.

Literature Cited

Aitken, A.C. "On Least Squares and Linear Combinations of Observations." Proceedings of the Royal Society of Edinburgh 55 (1935): 42-48.

Amemiya, T. Advanced Econometrics. Cambridge: Harvard University Press, 1985.

Huber, P.J. Robust Statistics. New York: Wiley, 1981.

Newey, W., and K. West. "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica 55 (1987): 703-708.

White, H. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica 48 (1980): 817-838.
More informationEigenvalues, Eigenvectors, Matrix Factoring, and Principal Components
Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationEstimating Industry Multiples
Estimating Industry Multiples Malcolm Baker * Harvard University Richard S. Ruback Harvard University First Draft: May 1999 Rev. June 11, 1999 Abstract We analyze industry multiples for the S&P 500 in
More information10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method
578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after
More informationECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
More informationSolving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More information3.1 Least squares in matrix form
118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression
More informationChapter 2. Dynamic panel data models
Chapter 2. Dynamic panel data models Master of Science in Economics  University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationChapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem
Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become
More informationINTRODUCTORY STATISTICS
INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore
More informationThe Method of Least Squares
Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationStandard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In nonlinear regression models, such as the heteroskedastic
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationMethods for Finding Bases
Methods for Finding Bases Bases for the subspaces of a matrix Rowreduction methods can be used to find bases. Let us now look at an example illustrating how to obtain bases for the row space, null space,
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulationbased method for estimating the parameters of economic models. Its
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table covariation least squares
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationFactor Analysis. Chapter 420. Introduction
Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.
More informationThe Bivariate Normal Distribution
The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included
More informationα = u v. In other words, Orthogonal Projection
Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v
More informationAlgebra 2 Chapter 1 Vocabulary. identity  A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity  A statement that equates two equivalent expressions. verbal model A word equation that represents a reallife problem. algebraic expression  An expression with variables.
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More information13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.
3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in threespace, we write a vector in terms
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationDepartment of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 14730278 On Testing for Diagonality of Large Dimensional
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationAutocovariance and Autocorrelation
Chapter 3 Autocovariance and Autocorrelation If the {X n } process is weakly stationary, the covariance of X n and X n+k depends only on the lag k. This leads to the following definition of the autocovariance
More informationLinear Models for Continuous Data
Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear
More informationTime Series and Forecasting
Chapter 22 Page 1 Time Series and Forecasting A time series is a sequence of observations of a random variable. Hence, it is a stochastic process. Examples include the monthly demand for a product, the
More informationLecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationCURVE FITTING LEAST SQUARES APPROXIMATION
CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship
More informationChapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More informationThe VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
More informationEstimating an ARMA Process
Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationPARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA
PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationFEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint
More informationA linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form
Section 1.3 Matrix Products A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form (scalar #1)(quantity #1) + (scalar #2)(quantity #2) +...
More informationLinear Algebra Notes for Marsden and Tromba Vector Calculus
Linear Algebra Notes for Marsden and Tromba Vector Calculus ndimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of
More informationAu = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.
Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry
More information1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
More informationRisk Decomposition of Investment Portfolios. Dan dibartolomeo Northfield Webinar January 2014
Risk Decomposition of Investment Portfolios Dan dibartolomeo Northfield Webinar January 2014 Main Concepts for Today Investment practitioners rely on a decomposition of portfolio risk into factors to guide
More informationBias in the Estimation of Mean Reversion in ContinuousTime Lévy Processes
Bias in the Estimation of Mean Reversion in ContinuousTime Lévy Processes Yong Bao a, Aman Ullah b, Yun Wang c, and Jun Yu d a Purdue University, IN, USA b University of California, Riverside, CA, USA
More information