페이지

5/17/2016

Correlation Coefficient - Pearson coefficient

  It’s a sequel to the 1st assignment - study on the relationship b.t.w personal income and smoking amount. It’s already been verified through ANOVA test that there’re no significant relationship. 
  Plus, I was interested in association b.t.w personal & family income. To get better meaningful data, I should refine the income data. I got the mean & std value of the personal and family income. And then, discarded the data over 1 std from the mean value as np.nan.(refer to the below)
sub[sub.S1Q10A >= (sub.S1Q10A.mean()+sub.S1Q10A.std())] = np.nan sub[sub.S1Q11A >= (sub.S1Q11A.mean()+sub.S1Q11A.std())] = np.nan
  The results are,


Association between personal income and smoking amount
(0.034107681918807302, 0.0025274573221202926)
  Pearson coefficient is almost close to zero, however, the P-value is under significant level(alpha=0.05). The results made me confused because the plot looks there’re no relationship btw them while the P-value told me there’re certain related to one another. Probably, there’re missing factor beyond my knowledge, however, I can’t find the why for nowI tend to think that there is no ‘Linear’ relationship. But, obviously It seemed they have no relationship from that scatter plot.(and also I know the ANOVA results)


Association between personal income and family income
(0.66086621192976558, 0.0)
  The 2nd results are crystal clear to say that there’re in very strong positive relationship and also significantly related to one another. The interesting part is P-value is ZERO (Is it possible?) which means there’s any chance to be in error when rejecting the null hypothesis. 
 And one more thing, everyone can earn more than at least family income(Is it typical in USA? or Must be wrong something;;) - graph is like a upper diagonal matrix.

댓글 없음:

댓글 쓰기

블로그 보관함

Facebook

Advertising

Disqus Shortname

Popular Posts