{"id":406,"date":"2022-01-10T08:04:24","date_gmt":"2022-01-10T08:04:24","guid":{"rendered":"http:\/\/www.dissertationcanada.com\/blog\/?p=406"},"modified":"2022-03-10T08:05:27","modified_gmt":"2022-03-10T08:05:27","slug":"how-to-diagnose-fix-violated-assumptions-of-linear-regression-model","status":"publish","type":"post","link":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/","title":{"rendered":"How to \u2018diagnose\u2019 &#038; \u2018fix\u2019 violated assumptions of linear regression model?"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Typically, when a researcher wants to determine the linear relationship between a target and one or more predictors, the first method that comes to mind is the linear regression model.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Linear regression attempts to analyse whether one or more predictor variables explain a dependent variable. In the simplest case, one variable is considered to be explanatory and the other is deemed to be dependent. The linear regression line is represented by <\/span><i><span style=\"font-weight: 400;\">Y = a + bX<\/span><\/i><span style=\"font-weight: 400;\">, where \u2018Y\u2019 is the dependent variable, \u2018X\u2019 is the explanatory variable, \u2018a\u2019 is the intercept and \u2018b\u2019 is the slope.<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">For instance<\/span><\/i><span style=\"font-weight: 400;\">, a researcher might want to relate the heights of individuals to their weights using this model. Before trying to fit a linear model to observed data, the researcher must investigate whether there is a relationship between the variables of interest. A scatterplot is a useful tool for this. 
If no association between the explanatory and dependent variables exists, then fitting a linear regression model to the data will not deliver a useful model.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A numerical measure of the linear association between two variables is the <\/span><i><span style=\"font-weight: 400;\">correlation coefficient<\/span><\/i><span style=\"font-weight: 400;\">, whose value lies between -1 and 1.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Linear regression has five key assumptions:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Linear relationship between the independent &amp; dependent variables<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Statistical independence of errors (in particular, no correlation between consecutive errors in time series data)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Homoscedasticity (constant variance) of errors<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Normality of error distribution<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">No or little multicollinearity\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">If any of these assumptions is violated, then the scientific insights and forecasts yielded by the model may be inefficient, biased or misleading. It is therefore essential to check each assumption and apply the right remedy.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li aria-level=\"1\"><b><i>Violations of linearity\u00a0<\/i><\/b><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><b>Diagnosis<\/b><span style=\"font-weight: 400;\"> &#8211; Non-linearity is usually most evident in a plot of residuals vs predicted values or of observed vs predicted values. 
The points should be symmetrically distributed around a horizontal line in the former plot and around a diagonal line in the latter. Look carefully for evidence of a \u2018<\/span><i><span style=\"font-weight: 400;\">bowed\u2019<\/span><\/i><span style=\"font-weight: 400;\"> pattern, which indicates that the model makes systematic errors whenever it makes unusually large or small predictions.\u00a0<\/span><\/p>\n<p><b>Solution<\/b><span style=\"font-weight: 400;\"> &#8211; A common way to fix this violation is to apply a <\/span><i><span style=\"font-weight: 400;\">nonlinear transformation <\/span><\/i><span style=\"font-weight: 400;\">to the dependent and\/or independent variables. For example, if the data are strictly positive, you can consider the log transformation as an option. Applying a log transformation to the dependent variable is equivalent to assuming that the dependent variable grows or decays exponentially as a function of the independent variables. Applying it to the dependent as well as the independent variables is equivalent to assuming that the effects of the independent variables are multiplicative rather than additive in their original units. This means that a small percentage change in any one of the independent variables results in a proportional percentage<\/span> <span style=\"font-weight: 400;\">change in the expected value of the dependent variable.<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li aria-level=\"1\"><b><i>Violation of independence\u00a0<\/i><\/b><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><b>Diagnosis<\/b><span style=\"font-weight: 400;\"> &#8211; Inspect a time series plot of the residuals (residuals vs row number) and the residual autocorrelations. Most residual autocorrelations should fall within the 95% confidence bands around zero, which lie at roughly plus-or-minus 2 divided by the square root of the sample size. 
Look for significant correlations at the first few lags and in the vicinity of the seasonal period, as these are usually fixable.\u00a0<\/span><\/p>\n<p><b>Solution<\/b><span style=\"font-weight: 400;\"> &#8211; You can add lags of the dependent variable and\/or lags of the independent variables. Alternatively, if you have an ARIMA+regressor procedure, add an AR(1) or MA(1) term to the regression model. An AR(1) term adds a lag of the dependent variable, whereas an MA(1) term adds a lag of the forecast error. If there is seasonality in the data, it can be handled in several ways, for example: (i) seasonally adjust the variables or (ii) add seasonal dummy variables to the model.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li aria-level=\"1\"><b><i>Violation of homoscedasticity<\/i><\/b><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><b>Diagnosis<\/b><span style=\"font-weight: 400;\"> &#8211; Inspect a plot of residuals vs predicted values and, in the case of time series data, a plot of residuals vs time. Because of imprecision in the coefficient estimates, the errors may tend to be slightly larger for forecasts associated with extreme predictions or extreme values of the independent variables. Therefore, also plot the residuals against each independent variable and check that their spread is consistent.\u00a0<\/span><\/p>\n<p><b>Solutions<\/b><span style=\"font-weight: 400;\"> &#8211; If the dependent variable is strictly positive and the residuals vs predicted plot shows that the size of the errors is proportional to the size of the predictions, a log transformation of the dependent variable may help. If a log transformation has already been applied, then<\/span> <span style=\"font-weight: 400;\">additive rather than multiplicative seasonal adjustment should be used (as for the linearity assumption).\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li aria-level=\"1\"><b><i>Violation of normality\u00a0<\/i><\/b><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><b>Diagnosis<\/b><span style=\"font-weight: 400;\"> &#8211; The best way to check whether the errors are normally distributed is to use a normal probability plot. 
This is a plot of the fractiles of the error distribution against the fractiles of a normal distribution. If the distribution is normal, then the points on the plot will be close to the diagonal reference line. An <\/span><i><span style=\"font-weight: 400;\">S-shaped<\/span><\/i><span style=\"font-weight: 400;\"> pattern of deviations indicates that there are either too many or too few large errors in both directions. On the other hand, a <\/span><i><span style=\"font-weight: 400;\">bow-shaped<\/span><\/i><span style=\"font-weight: 400;\"> pattern of deviations indicates that the residuals are skewed, with too many large<\/span> <span style=\"font-weight: 400;\">errors in one direction.\u00a0<\/span><\/p>\n<p><b>Solutions<\/b><span style=\"font-weight: 400;\"> &#8211; A common remedy is a nonlinear transformation of the variables, such as a log transformation. However, this remedy is only needed if the errors are clearly not normally distributed.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li aria-level=\"1\"><b><i>Violation of the no-multicollinearity assumption\u00a0<\/i><\/b><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><b>Diagnosis<\/b><span style=\"font-weight: 400;\"> &#8211; To examine the correlation among the independent variables, use scatter plots of pairs of predictors. Alternatively, you can compute the variance inflation factor (VIF) for each predictor. As a rule of thumb, a VIF value of &gt;= 10 indicates serious multicollinearity, whereas a value &lt;= 4 suggests that multicollinearity is not a concern.\u00a0<\/span><\/p>\n<p><b>Solution <\/b><span style=\"font-weight: 400;\">&#8211; The simplest way to reduce multicollinearity is to remove one of the two highly correlated predictors (those with a high VIF) from the model. You can use stepwise regression or best subsets regression to choose which predictors to remove. 
Alternatively, use partial least squares (PLS) regression to reduce the predictors to a smaller set of uncorrelated components.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By applying the remedies described above, you can fix the violations, refine the analysis and realise the full potential of the linear regression model.\u00a0\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Typically, when a researcher wants to determine the linear relationship between the target and one or more predictors, the one test that would occur to the researcher is the linear regression model.\u00a0 Linear regression attempts to analyse whether one or more predictor variables explain the dependent variables. While one variable is considered to be explanatory, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to \u2018diagnose\u2019 &amp; \u2018fix\u2019 violated assumptions of linear regression model? - Dissertation Canada<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to \u2018diagnose\u2019 &amp; \u2018fix\u2019 violated assumptions of linear regression model? 
- Dissertation Canada\" \/>\n<meta property=\"og:description\" content=\"Typically, when a researcher wants to determine the linear relationship between the target and one or more predictors, the one test that would occur to the researcher is the linear regression model.\u00a0 Linear regression attempts to analyse whether one or more predictor variables explain the dependent variables. While one variable is considered to be explanatory, [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/\" \/>\n<meta property=\"og:site_name\" content=\"Dissertation Canada\" \/>\n<meta property=\"article:published_time\" content=\"2022-01-10T08:04:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-10T08:05:27+00:00\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/\",\"url\":\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/\",\"name\":\"How to \u2018diagnose\u2019 & \u2018fix\u2019 violated assumptions of linear regression model? 
- Dissertation Canada\",\"isPartOf\":{\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/#website\"},\"datePublished\":\"2022-01-10T08:04:24+00:00\",\"dateModified\":\"2022-03-10T08:05:27+00:00\",\"author\":{\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/#\/schema\/person\/be7eef8e524215c34702a6406667f08f\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.dissertationcanada.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to \u2018diagnose\u2019 &#038; \u2018fix\u2019 violated assumptions of linear regression model?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/#website\",\"url\":\"https:\/\/www.dissertationcanada.com\/blog\/\",\"name\":\"Dissertation Canada\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dissertationcanada.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/#\/schema\/person\/be7eef8e524215c34702a6406667f08f\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dissertationcanada.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b3a29821b662a95bc1ac009cbf381ce6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b3a29821b662a95bc1ac009cbf381ce6?s=96&d=mm&r=g\",\"caption\":\"admin\"},\"url\":\"https:\/\/www.dissertationcanada.com\/blog\/author\/fox\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to \u2018diagnose\u2019 & \u2018fix\u2019 violated assumptions of linear regression model? - Dissertation Canada","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/","og_locale":"en_US","og_type":"article","og_title":"How to \u2018diagnose\u2019 & \u2018fix\u2019 violated assumptions of linear regression model? - Dissertation Canada","og_description":"Typically, when a researcher wants to determine the linear relationship between the target and one or more predictors, the one test that would occur to the researcher is the linear regression model.\u00a0 Linear regression attempts to analyse whether one or more predictor variables explain the dependent variables. 
While one variable is considered to be explanatory, [&hellip;]","og_url":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/","og_site_name":"Dissertation Canada","article_published_time":"2022-01-10T08:04:24+00:00","article_modified_time":"2022-03-10T08:05:27+00:00","author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/","url":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/","name":"How to \u2018diagnose\u2019 & \u2018fix\u2019 violated assumptions of linear regression model? - Dissertation Canada","isPartOf":{"@id":"https:\/\/www.dissertationcanada.com\/blog\/#website"},"datePublished":"2022-01-10T08:04:24+00:00","dateModified":"2022-03-10T08:05:27+00:00","author":{"@id":"https:\/\/www.dissertationcanada.com\/blog\/#\/schema\/person\/be7eef8e524215c34702a6406667f08f"},"breadcrumb":{"@id":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.dissertationcanada.com\/blog\/how-to-diagnose-fix-violated-assumptions-of-linear-regression-model\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.dissertationcanada.com\/blog\/"},{"@type":"ListItem","position":2,"name":"How to \u2018diagnose\u2019 &#038; \u2018fix\u2019 violated assumptions of linear regression 
model?"}]},{"@type":"WebSite","@id":"https:\/\/www.dissertationcanada.com\/blog\/#website","url":"https:\/\/www.dissertationcanada.com\/blog\/","name":"Dissertation Canada","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dissertationcanada.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dissertationcanada.com\/blog\/#\/schema\/person\/be7eef8e524215c34702a6406667f08f","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dissertationcanada.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b3a29821b662a95bc1ac009cbf381ce6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b3a29821b662a95bc1ac009cbf381ce6?s=96&d=mm&r=g","caption":"admin"},"url":"https:\/\/www.dissertationcanada.com\/blog\/author\/fox\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/posts\/406"}],"collection":[{"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/comments?post=406"}],"version-history":[{"count":1,"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/posts\/406\/revisions"}],"predecessor-version":[{"id":407,"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/posts\/406\/revisions\/407"}],"wp:attachment":[{"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/media?parent=406"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/categories?post=406"},{
"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dissertationcanada.com\/blog\/wp-json\/wp\/v2\/tags?post=406"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}