What is Simple Linear Regression?
You’re probably familiar with plotting line graphs with one X axis and one Y axis. The X variable is sometimes called the independent variable and the Y variable is called the dependent variable. Simple linear regression plots one independent variable X against one dependent variable Y. Technically, in regression analysis, the independent variable is usually called the predictor variable and the dependent variable is called the criterion variable. However, many people just call them the independent and dependent variables. More advanced regression techniques (like multiple regression) use multiple independent variables.
Regression analysis can result in linear or nonlinear graphs. A linear regression is where the relationships between your variables can be described with a straight line. Nonlinear regressions produce curved lines. (**)
Regression analysis is almost always performed by a computer program, as the calculations are extremely time-consuming to perform by hand.
**As this is an introductory article, I kept it simple. But there’s actually an important technical difference between linear and nonlinear regression that will become more important if you continue studying regression. For details, see the article on nonlinear regression.
The Linear Regression Equation
Linear regression is a way to model the relationship between two variables. You might also recognize the equation as the slope-intercept form of a line. The equation has the form Y = a + bX, where Y is the dependent variable (the variable that goes on the Y axis), X is the independent variable (plotted on the X axis), b is the slope of the line and a is the y-intercept.
The first step in finding a linear regression equation is to determine if there is a relationship between the two variables. This is often a judgment call for the researcher. You’ll also need a list of your data in x-y format (i.e. two columns of data: independent and dependent variables).
- Just because two variables are related, it does not mean that one causes the other. For example, although there is a relationship between high GRE scores and better performance in grad school, it doesn’t mean that high GRE scores cause good grad school performance.
- If you try to find a linear regression equation for a set of data (especially through an automated tool like Excel or a TI-83), you will get one, but that does not necessarily mean the equation is a good fit for your data. One technique is to make a scatter plot first, to see if the data roughly fits a line before you try to find a linear regression equation.
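To see why that second caution matters, here is a minimal Python sketch. It assumes the usual least-squares formulas for a and b, and both data sets are made up for illustration. The helper name `fit_line` is hypothetical. The code fits a line to one roughly linear data set and one clearly curved data set, and reports r squared (the square of the correlation coefficient) as a rough check of fit.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # y-intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Made-up data: one set close to a straight line, one set on a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
roughly_linear = [-5.1, -2.9, -1.2, 1.0, 3.1, 4.8, 7.2]
curved = [x ** 2 for x in xs]

for label, ys in [("roughly linear", roughly_linear), ("curved", curved)]:
    a, b, r2 = fit_line(xs, ys)
    print(f"{label}: Y = {a:.3f} + {b:.3f}X, r squared = {r2:.3f}")
```

Both data sets produce an equation, but for the curved data the fitted line is flat and r squared is essentially zero. That is the kind of mismatch a quick scatter plot would reveal before you run the regression.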
Linear Regression Equation in Microsoft Excel: Steps
Step 1: Install the Data Analysis Toolpak (an Excel add-in), if it isn’t already installed.
Step 2: Type your data into two columns in Excel. For example, type your “x” data into column A and your “y” data into column B. Do not leave any blank cells between your entries.
Step 3: Click the “Data” tab on the Excel ribbon, then click “Data Analysis.”
Step 4: Click “Regression” in the pop-up window and then click “OK.”
Step 5: Select your input Y range. You can do this two ways: either select the data in the worksheet or type the location of your data into the “Input Y Range” box. For example, if your Y data is in B2 through B10 then type “B2:B10” into the “Input Y Range” box.
Step 6: Select your input X range by selecting the data in the worksheet or typing the location of your data into the “Input X Range” box.
Step 7: Select the location where you want your output range to go by selecting a blank area in the worksheet or typing the location of where you want your data to go in the “Output Range” box.
Step 8: Click “OK”. Excel will calculate the linear regression and populate your worksheet with the results.
Tip: The linear regression equation information is given in the last output table, in the “Coefficients” column. The entry in the “Intercept” row is “a” (the y-intercept) and the entry in the “X Variable” row is “b” (the slope).
How to Find Linear Regression Slope: Steps
Step 1: Find the following data from the information given: Σx, Σy, Σxy, Σx², Σy². If you don’t remember how to get those variables from data, see the article on how to find a Pearson’s correlation coefficient. Follow the steps there to create a table and find Σx, Σy, Σxy, Σx², and Σy².
Step 2: Insert the data into the b formula (there is no need to find a): b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), where n is the number of x-y pairs.
If formulas scare you, you can find more comprehensive instructions on how to work the formula in the article How to Find a Linear Regression Equation: Overview.
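The two steps can also be sketched in a few lines of Python. This is only an illustration: the data set is made up, the function names are hypothetical, and the slope uses the standard least-squares formula b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²). Note that Σy² is not needed for the slope itself.

```python
def slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope b of the least-squares line, computed from the summary sums."""
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def intercept_from_sums(n, sum_x, sum_y, b):
    """Optional: y-intercept a, once the slope b is known."""
    return (sum_y - b * sum_x) / n


# Made-up x-y data, used only to show the arithmetic.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Step 1: build the sums you would normally get from a hand-made table.
n = len(xs)
sum_x = sum(xs)                                # 15
sum_y = sum(ys)                                # 20
sum_xy = sum(x * y for x, y in zip(xs, ys))    # 66
sum_x2 = sum(x * x for x in xs)                # 55

# Step 2: insert the sums into the b formula.
b = slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2)
a = intercept_from_sums(n, sum_x, sum_y, b)
print(round(b, 4), round(a, 4))  # 0.6 2.2
```

For this data the slope works out to (5 × 66 − 15 × 20) / (5 × 55 − 15²) = 30 / 50 = 0.6.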
Linear Regression Test Value
Linear regression test values are used in simple linear regression exactly the same way as test values (like the z-score or t-statistic) are used in hypothesis testing. Instead of working with the z-table you’ll be working with a t-distribution table. The linear regression test value is compared to the critical value from that table to help you decide whether to reject the null hypothesis.
Linear Regression Test Value: Steps
Sample question: Given a set of data with sample size 8 and r = 0.454, find the linear regression test value.
Note: r is the correlation coefficient.
Step 1: Find r, the correlation coefficient, unless it has already been given to you in the question. In this case, r is given (r = .454). Not sure how to find r? See: Correlation Coefficient for steps on how to find r.
Step 2: Use the following formula to compute the test value (n is the sample size): T = r√((n − 2)/(1 − r²))
How to solve the formula:
1. Replace the variables with your numbers: T = .454√((8 − 2)/(1 − [.454]²))
2. Subtract 2 from n: 8 − 2 = 6
3. Square r: .454 × .454 = .206116
4. Subtract step (3) from 1: 1 − .206116 = .793884
5. Divide step (2) by step (4): 6 / .793884 = 7.557779
6. Take the square root of step (5): √7.557779 = 2.74914154
7. Multiply r by step (6): .454 × 2.74914154 = 1.24811026
The linear regression test value is T = 1.24811026.
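If you’d rather let a computer do the arithmetic, the same calculation can be sketched in Python. The function name `regression_test_value` is hypothetical, and the formula is the one used in the sample question, T = r√((n − 2)/(1 − r²)). The input checks are an added assumption to avoid dividing by zero or taking the square root of a negative number.

```python
import math


def regression_test_value(r, n):
    """Test value T = r * sqrt((n - 2) / (1 - r^2)) for a sample of size n."""
    if n <= 2:
        raise ValueError("sample size n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Sample question from above: n = 8, r = 0.454.
t = regression_test_value(0.454, 8)
print(round(t, 4))  # 1.2481
```

The result agrees with the hand calculation to the four decimal places shown.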
Leverage in Linear Regression: How it Affects Graphs
In linear regression, an influential point (outlier) pulls the regression line toward itself. The graph below shows what happens to a linear regression line when outlier A is included:
Outliers with extreme x-values (values that aren’t within the range of the other data points) have more leverage in linear regression than points with less extreme x-values. In other words, extreme x-value outliers will move the line more than less extreme values.
The following graph shows a data point outside of the range of the other values. The values range from 0 to about 70,000. This one point has an x-value of about 80,000 which is outside the range. It affects the regression line a lot more than the point in the first image above, which was inside the range of the other values.
In general, outliers that have x-values close to the mean of x will have less leverage than outliers towards the edges of the range. Outliers with values of x outside of the range will have more leverage. Values that are extreme on the y-axis (compared to the other values) will have more influence than values closer to the other y-values.
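Leverage can also be put into numbers. The sketch below assumes the usual leverage (hat value) formula for simple linear regression, h = 1/n + (x − x̄)² / Σ(x − x̄)², which this article does not state explicitly. The x-values are made up to resemble the second graph: most points fall between 0 and about 70,000, with one point at 80,000.

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Leverage depends only on how far each x is from the mean of x.
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Made-up x-values; the last one sits outside the range of the others.
xs = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 80000]
hs = leverages(xs)

for x, h in zip(xs, hs):
    print(f"x = {x:>6}: leverage = {h:.3f}")
```

In this made-up data the point at 80,000 has the largest leverage and the point closest to the mean of x has the smallest, which matches the pattern described above. Note that leverage only looks at x. How much a point actually moves the line also depends on how far its y-value is from the rest.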