The relationship between variables - Draw the correct conclusions
So far we have visualized relationships between two quantitative variables. Correlation measures the relationship between bivariate data: it assesses the strength of the linear relationship between two quantitative variables. It is very important to understand the relationship between variables in order to draw the right conclusions. We need to establish how the variables are related: is the relationship linear or not? When there is no linear relationship between two variables, the correlation coefficient is 0.
Now, there's also this notion of outliers. If I said, hey, this line is trying to describe the data, well, we have some data that is fairly off the line.
Relationship Between Variables
So, for example, even though we're saying it's a positive, weak, linear relationship, this one over here is reasonably high on the vertical variable, but it's low on the horizontal variable. And so, this one right over here is an outlier. It's quite far away from the line.
You could view that as an outlier. And this is a little bit subjective. Outliers, well, what looks pretty far from the rest of the data?
This could also be an outlier. Let me label these.
Bivariate relationship linearity, strength and direction
Now, pause the video and see if you can think about this one. Is this positive or negative, is it linear, non-linear, is it strong or weak?
I'll get my ruler tool out here. So, this goes here. It seems like I can fit a line pretty well to this. So, I could fit, maybe I'll do the line in purple. I could fit a line that looks like that.
And so, this one looks like it's positive. As one variable increases, the other one does, for these data points. So it's a positive. I'd say this was pretty strong. The dots are pretty close to the line there. It really does look like a little bit of a fat line, if you just look at the dots.
So, positive, strong, linear, linear relationship. And none of these data points are really strong outliers. This one's a little bit further out.
But they're all pretty close to the line, and seem to describe that trend roughly. All right, now, let's look at this data right over here. So, let me get my line tool out again. So, it looks like I can fit a line. So it looks, and it looks like it's a positive relationship. The line would be upward sloping. It would look something like this.
Relationships Between Variables, Part 3: Measures of Relationships
And, once again, I'm eyeballing it. You can use computers and other methods to actually find a more precise line that minimizes the collective distance to all of the points, but it looks like there is a positive, but I would say, this one is a weak linear relationship, 'cause we have a lot of points that are far off the line. So, not so strong. So, I would call this a positive, weak, linear relationship. And there's a lot of outliers here.
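One common way to make "minimizes the collective distance to all of the points" precise is least squares, which minimizes the sum of squared vertical distances from the points to the line. The transcript does not say which method is meant, so treat this as one reasonable reading rather than the method used in the video. The function name and the data points below are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
```

With real, noisy data the points will not sit exactly on the fitted line; how far they scatter around it is what "strong" versus "weak" describes.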
This one over here is pretty far, pretty far out. Pause this video and think about, is it positive or negative, is it strong or weak? Is this linear or non-linear? Well, the first thing we wanna do is let's think about it with linear or non-linear.
I could try to put a line on it. But if I try to put a line on it, it's actually quite difficult. If I try to do a line like this, you'll notice everything is kind of bending away from the line.
It looks like, generally, as one variable increases, the other variable decreases, but they're not doing it in a linear fashion.
It looks like there's some other type of curve at play. So, I could try to do a fancier curve that looks something like this, and this seems to fit the data a lot better. So this one, I would describe as non-linear. And it is a negative relationship. As one variable increases, the other variable decreases. So, this is a negative, I would say, reasonably strong non-linear relationship.
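The "bending away from the line" idea can be checked numerically, as a rough sketch: fit a straight line by least squares and look at the signs of the residuals (observed minus fitted). If the points curve, the residuals tend to share one sign at both ends and the opposite sign in the middle. The data below are invented (y = 100 / x), and least squares is an assumed choice of fitting method.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up decreasing, curved data: y = 100 / x
xs = [float(x) for x in range(1, 9)]
ys = [100.0 / x for x in xs]

slope, intercept = least_squares_line(xs, ys)

# Residual = observed y minus the y predicted by the straight line
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
signs = ["+" if r > 0 else "-" for r in residuals]

print("slope:", round(slope, 2))   # negative: y falls as x rises
print("residual signs:", " ".join(signs))
```

A run of same-signed residuals in the middle, flanked by the opposite sign at the ends, is a hint that a curve would describe the data better than a line.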
And once again, this is subjective. So, I'll say negative, reasonably strong, non-linear relationship. And maybe you could call this one an outlier, but it's not that far, and I might even be able to fit a curve that gets a little bit closer to that.
For a given data set, we can always make this measure, the sample covariance $s_{XY}$, larger or smaller by changing the units. Suppose we have a positive linear relationship and X is measured in feet.
If we change the X's to inches then $s_{XY}$ increases by the factor 12. If we change the X's to mm's then $s_{XY}$ increases by the factor 304.8. Thus we need to standardize our measure.
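The unit dependence is easy to see numerically. The sketch below assumes sXY denotes the usual sample covariance (with divisor n - 1); the heights and weights are invented. Since one foot is 12 inches and 304.8 mm, rescaling X multiplies the covariance by those same factors.

```python
def sample_covariance(xs, ys):
    """Sample covariance with divisor n - 1 (assumed meaning of sXY)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Made-up data: X in feet, Y in pounds
x_feet = [5.0, 5.5, 6.0, 6.5, 7.0]
y = [130.0, 155.0, 160.0, 190.0, 200.0]

x_inches = [12.0 * v for v in x_feet]    # 1 ft = 12 in
x_mm = [304.8 * v for v in x_feet]       # 1 ft = 304.8 mm

cov_feet = sample_covariance(x_feet, y)
ratio_inches = sample_covariance(x_inches, y) / cov_feet
ratio_mm = sample_covariance(x_mm, y) / cov_feet

print(ratio_inches, ratio_mm)
```

The relationship between X and Y has not changed at all, yet the measure grows with the unit conversion factor, which is why a standardized measure is needed.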
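We revisit this problem in Chapter 11; in this chapter we will insist on an absolute measure which in absolute value cannot exceed 1. This measure is the sample correlation coefficient, $r = s_{XY} / (s_X s_Y)$, where $s_X$ and $s_Y$ are the sample standard deviations of X and Y. As we said, $-1 \le r \le 1$ for all data sets. The extreme values are interesting: $r = 1$ or $r = -1$ occurs only when the points fall exactly on a line. Values of r close to zero indicate little or no linear relationship.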
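As a small check on the standardization, the sketch below computes r as the sample covariance divided by the product of the two sample standard deviations (the standard definition, assumed here) and confirms that changing X from feet to inches leaves r unchanged. The data are invented.

```python
import math


def sample_correlation(xs, ys):
    """r = s_XY / (s_X * s_Y), using divisor n - 1 throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sxy / (sx * sy)


# Made-up data: X in feet, Y in pounds
x_feet = [5.0, 5.5, 6.0, 6.5, 7.0]
y = [130.0, 155.0, 160.0, 190.0, 200.0]
x_inches = [12.0 * v for v in x_feet]

r_feet = sample_correlation(x_feet, y)
r_inches = sample_correlation(x_inches, y)

print(r_feet, r_inches)   # the two values agree up to rounding error
```

The factor of 12 appears in both the covariance and the standard deviation of X, so it cancels in the ratio.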
[Figure: Scatter plots with values of r]
As we thought, the strongest relationships score 0 with our measure because they are both nonlinear. The best linear pattern is Plot 2, although Plot 3 is close.
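To see how a perfectly predictable but nonlinear relationship can score 0, here is a minimal sketch with made-up data: Y is exactly X squared, with the X values placed symmetrically around zero. This is not the data behind the plots, only an illustration of the same effect.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: Y is completely determined by X, but not linearly
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

r = sample_correlation(xs, ys)
print(r)   # essentially zero
```

The positive and negative cross-products cancel, so r reports no linear relationship even though Y can be predicted from X without error.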
Bivariate relationship linearity, strength and direction (video) | Khan Academy
We can do a bit more with the sample correlation coefficient. It is associated with the LS fit. It can be shown that $r = \hat{b}\, s_X / s_Y$, where $\hat{b}$ is the LS estimate of slope. So r contains information on the fit. We can be more precise. Consider the variation or noise in the Y data. A measure of this variation is the sample variance $s_Y^2$ of the Y's.
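The link between r and the LS slope is the standard identity r = (LS slope) x (sample SD of X) / (sample SD of Y). The sketch below checks it numerically on invented data; the variable names are illustrative only.

```python
import math

# Made-up data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 2.9, 4.8, 5.1, 7.2, 7.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

ls_slope = sxy / sx ** 2             # LS estimate of slope
r_direct = sxy / (sx * sy)           # r from its definition
r_from_slope = ls_slope * sx / sy    # r recovered from the slope

print(r_direct, r_from_slope)
```

So r is the LS slope re-expressed in standard-deviation units: it has the same sign as the slope, and its size does not depend on the measurement units.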
In fact, $R^2 = r^2$, expressed as a percentage, is the percentage of variation in Y accounted for by the LS fit of Y versus X. Consider the values of $R^2$ for the plots. The value of $R^2$ can be obtained using the regression module. The measures r and $R^2$ are not robust. We will consider alternatives to r later, but for now we do offer an alternative to $R^2$, labeled $R_W^2$. This is the measure that corresponds to the robust Wilcoxon fit.
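The "variation accounted for" reading can be verified directly. The sketch below, on invented data, computes R2 as 1 minus (residual sum of squares) / (total sum of squares about the mean of Y) for the LS line and compares it with the square of r. It does not use the regression module mentioned in the text; it is a hand computation.

```python
import math

# Made-up data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 2.9, 4.8, 5.1, 7.2, 7.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in Y

# LS fit
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left over after the fit
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)
r_squared_from_fit = 1.0 - sse / syy

print(r ** 2, r_squared_from_fit)
```

Multiplying either value by 100 gives the percentage of the variation in Y that the line accounts for.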
This is not as sensitive as $R^2$ to outliers. We show this for the baseball height and weight data. Recall that we changed the original data by inserting an outlier.
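The lack of robustness of R2 can be illustrated with a small sketch. The data below are invented (they are not the baseball height and weight data), and the robust Wilcoxon fit is not implemented here; the code only shows how much a single inserted outlier can move the LS-based R2.

```python
def r_squared(xs, ys):
    """Square of the sample correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Made-up data following a tight, roughly linear pattern
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys_clean = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1, 18.2, 19.9]

# Same data with the last Y value replaced by an outlier
ys_outlier = ys_clean[:-1] + [2.0]

r2_clean = r_squared(xs, ys_clean)
r2_outlier = r_squared(xs, ys_outlier)

print(r2_clean, r2_outlier)
```

One changed point out of ten is enough to pull R2 down sharply, which is the behavior a robust alternative is meant to avoid.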
The measure corresponding to the robust Wilcoxon fit changed only slightly.
Reconsider exercise 1 of Exercise 1. The data are given below. Scatterplot the data and guess the correlation coefficient. Then compute it. Recall that the LS estimate of slope was 2.
Suppose the sample standard deviations of x and y are given by 3. Compute the correlation coefficient.