Before going into complex model building, looking at data relation is a sensible step to understand how your different variable interact together. Correlation look at trends shared between two variables, and regression look at causal relation between a predictor (independent variable) and a response (dependent) variable.
Correlation
As mentioned above correlation look at global movement shared between two variables, for example when one variable increases and the other increases as well, then these two variables are said to be positively correlated. The other way round when a variable increase and the other decrease then these two variables are negatively correlated. In the case of no correlation no pattern will be seen between the two variable.
Let’s look at some code before introducing correlation measure:
xHere is the plot:
From the plot we get we see that when we plot the variable y with x, the points form some kind of line, when the value of x get bigger the value of y get somehow proportionally bigger too, we can suspect a positive correlation between x and y.
The measure of this correlation is called the coefficient of correlation and can calculated in different ways, the most usual measure is the Pearson coefficient, it is the covariance of the two variable divided by the product of their variance, it is scaled between 1 (for a perfect positive correlation) to -1 (for a perfect negative correlation), 0 would be complete randomness. We can get the Pearson coefficient of correlation using the function
cor()
:cor(df,method="pearson")x y z
x 1.0000000 0.8736874 -0.2485967
y 0.8736874 1.0000000 -0.2376243
z -0.2485967 -0.2376243 1.0000000cor(df[,1:3],method="spearman")
Source:: r-bloggers.com