Correlations Using R

Correlations are used to explore the relationship between two variables. They measure how strongly, and in which direction, the variables vary together. A correlation does not by itself establish causality. Causes and their effects are correlated, but a correlation between two variables is not proof that one causes the other, so any causal interpretation has to be justified by domain knowledge and common sense. A correlation is considered meaningful only when it is statistically significant, conventionally at a significance level of p ≤ 0.05 (corresponding to a 95% confidence level).
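
As a minimal sketch of such a significance check, the code below uses base R's cor.test() on the built-in mtcars data set to correlate car weight (wt) with fuel economy (mpg). The data set and variable pair are my own choice of illustration, not from the original analysis, and the names r, p_value, ci and alpha are just illustrative variable names.

```r
# Pearson correlation between car weight (wt) and fuel economy (mpg),
# using the built-in mtcars data set as an example
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

r       <- unname(res$estimate)  # sample correlation coefficient
p_value <- res$p.value           # p-value for H0: the true correlation is 0
ci      <- res$conf.int          # 95% confidence interval for r (default conf.level = 0.95)

alpha <- 0.05                    # conventional significance level
cat(sprintf("r = %.3f, p = %.3g, 95%% CI = [%.3f, %.3f]\n",
            r, p_value, ci[1], ci[2]))

if (p_value <= alpha) {
  cat("The correlation is statistically significant at the 5% level.\n")
} else {
  cat("The correlation is not statistically significant at the 5% level.\n")
}
```

A significant result only says that the association is unlikely to be due to chance; whether heavier cars actually cause lower fuel economy is a separate, subject-matter question.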

R supports two kinds of correlation analysis: 1) Pearson correlation; 2) rank correlation, using either Kendall's tau or Spearman's rho (all three are available through the method argument of cor() and cor.test()). Pearson correlation is a parametric method, while the Kendall and Spearman methods are based on the ranks of the data and are therefore non-parametric. The significance test for Pearson correlation assumes that the data follow a normal distribution, so this assumption should be checked before carrying out the analysis, for example with the Shapiro-Wilk normality test (shapiro.test() in R). The null hypothesis of this test is that the data are normally distributed: if the p-value is below the chosen significance level (e.g. 0.05), the null hypothesis is rejected and the data cannot be treated as normal. In that case a data transformation (for example, a logarithmic transformation) can be applied in an attempt to make the data normal, and the transformed data are tested again. If the data still fail the normality test after transformation, a rank correlation (Kendall or Spearman) should be used instead.
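
The decision workflow above can be sketched in R roughly as follows. The helper choose_cor_method() is hypothetical (not part of any package), the natural log is only an assumed example of a transformation, and Spearman is picked as the rank-based fallback where Kendall would be equally valid.

```r
# Hypothetical helper (not part of any package): pick a correlation method
# following the workflow above. Note: shapiro.test() needs 3 to 5000 values.
choose_cor_method <- function(x, y, alpha = 0.05) {
  # Shapiro-Wilk: H0 = normality, so p > alpha means "no evidence against normality"
  is_normal <- function(v) shapiro.test(v)$p.value > alpha

  # 1) Raw data look normal: use Pearson
  if (is_normal(x) && is_normal(y)) {
    return(list(method = "pearson", x = x, y = y))
  }

  # 2) Assumed transformation: natural log (only defined for positive values)
  if (all(x > 0) && all(y > 0)) {
    lx <- log(x)
    ly <- log(y)
    if (is_normal(lx) && is_normal(ly)) {
      return(list(method = "pearson", x = lx, y = ly))
    }
  }

  # 3) Still not normal: fall back to a rank correlation
  #    (Spearman here; "kendall" would be the other choice)
  list(method = "spearman", x = x, y = y)
}

# Usage with the built-in mtcars data
choice <- choose_cor_method(mtcars$wt, mtcars$mpg)
# exact = FALSE avoids the exact-p-value warning that rank methods give
# when there are ties; it is ignored for the Pearson method
result <- cor.test(choice$x, choice$y, method = choice$method, exact = FALSE)
print(choice$method)
print(result)
```

Returning the (possibly transformed) vectors together with the method name keeps the correlation test consistent with whatever data actually passed the normality check.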

R packages also provide functions for plotting correlations in several different ways. One such function, ggscatter() from the ggpubr package, draws a scatter plot with the best-fit line (the regression line) and its confidence interval. The corrplot() function from the corrplot package draws different types of correlogram (a graphical display of a correlation matrix), using visualization methods such as circles, squares, ellipses, numbers or color shading.
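
The sketch below shows both functions on the mtcars data, again only as example data. It assumes the ggpubr and corrplot packages are installed (install.packages(c("ggpubr", "corrplot"))); the requireNamespace() guards simply skip a plot when its package is missing. The argument values used (add = "reg.line", conf.int = TRUE, method = "circle", type = "upper") are one possible configuration, not the only one.

```r
# Correlation matrix of a few numeric columns of the built-in mtcars data
vars    <- c("mpg", "wt", "hp", "disp")
cor_mat <- cor(mtcars[, vars], method = "pearson")
print(round(cor_mat, 2))

# Scatter plot with the regression line and its confidence band (ggpubr)
if (requireNamespace("ggpubr", quietly = TRUE)) {
  p <- ggpubr::ggscatter(mtcars, x = "wt", y = "mpg",
                         add = "reg.line",      # add the best-fit line
                         conf.int = TRUE,       # shade its confidence interval
                         cor.coef = TRUE,       # print r and p on the plot
                         cor.method = "pearson",
                         xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
  print(p)
}

# Correlogram of the correlation matrix (corrplot)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "circle", type = "upper")
}
```

Swapping method = "circle" for "square", "ellipse", "number", "shade", "color" or "pie" gives the other correlogram styles, and type = "upper" shows only the upper triangle so that each coefficient appears once.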

Reference:

Correlation matrix : A quick start guide to analyze, format and visualize a correlation matrix using R software
