16626a4 · Oct 19, 2012

Introduction to an R function: sweep()

Purpose of the function

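The base package documents sweep() under the title *“Sweep out Array Summaries”*, which describes what it does: it takes a summary statistic and sweeps it out of an array. It is therefore usually paired with apply(). When you have computed summary statistics with apply() and need to apply them back to the original dataset along the matching rows or columns, sweep() is the tool to use.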

To sweep away or clear out is also the literal meaning of the word "sweep".
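As a minimal sketch of this apply()-then-sweep() pattern, the following uses a small made-up matrix (the names m, col.sums and prop are illustrative, not from the original text) and divides every entry by its column's sum:

```r
# A small 2 x 3 matrix, filled column by column
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Step 1: compute one summary value per column with apply()
col.sums <- apply(m, 2, sum)          # 3 7 11

# Step 2: sweep that summary back across the columns of the original data
prop <- sweep(m, 2, col.sums, "/")    # each entry divided by its column's sum

# Each column of prop should now sum to 1 (up to floating point error)
print(colSums(prop))
```

The same two steps, with different summaries and operators, are all the examples further down rely on.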

Arguments

sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
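  • x: the original dataset (an array or matrix) to be processed
  • MARGIN: the dimension to operate over: 1 for rows, 2 for columns, or another dimension of a higher-dimensional array
  • STATS: the summary statistic to be swept out of the original dataset
  • FUN: the function used to carry out the sweep. The default is subtraction, "-". It can be changed to "+", "*" or "/" (addition, multiplication, division), or to any other vectorized function of two arguments
  • check.margin: whether to check that the length or dimensions of STATS match the chosen margin of x, with a warning if they do not. The default is TRUE.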
  • ...: optional further arguments passed on to FUN
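The arguments above can be sketched on a toy matrix. This is only an illustration under assumed inputs: the matrix m and the caps vector are hypothetical, and pmin() is picked simply as one example of a non-arithmetic FUN.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns: rows are (1, 3, 5) and (2, 4, 6)

# MARGIN = 1: STATS holds one value per row
row.max <- apply(m, 1, max)               # 5 6
by.row <- sweep(m, 1, row.max)            # default FUN = "-": subtract each row's maximum

# MARGIN = 2: STATS holds one value per column
col.min <- apply(m, 2, min)               # 1 3 5
by.col <- sweep(m, 2, col.min, FUN = "-") # subtract each column's minimum

# FUN is not limited to arithmetic operators: any vectorized
# two-argument function works, for example pmin() to cap each column
caps <- c(1, 3, 6)                        # hypothetical per-column upper limits
capped <- sweep(m, 2, caps, FUN = pmin)

print(by.row)
print(by.col)
print(capped)
```

Note that the length of STATS has to match the extent of the chosen margin (2 values for MARGIN = 1 here, 3 values for MARGIN = 2); with check.margin = TRUE, sweep() warns when it does not.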

Worked examples

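1. Suppose we want to subtract each column's mean from every value in the original dataset. We first compute the mean of each column with apply() and then let sweep() do the subtraction.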

require(stats)  # for sd(), used further down; mean() itself lives in base
head(attitude, 10)
##    rating complaints privileges learning raises critical advance
## 1      43         51         30       39     61       92      45
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
mean.att <- apply(attitude, 2, mean)
mean.att
##     rating complaints privileges   learning     raises   critical 
##      64.63      66.60      53.13      56.37      64.63      74.77 
##    advance 
##      42.93
head(sweep(data.matrix(attitude), 2, mean.att), 10)  # subtract the column means
##        rating complaints privileges learning  raises critical advance
##  [1,] -21.633      -15.6    -23.133 -17.3667  -3.633   17.233   2.067
##  [2,]  -1.633       -2.6     -2.133  -2.3667  -1.633   -1.767   4.067
##  [3,]   6.367        3.4     14.867  12.6333  11.367   11.233   5.067
##  [4,]  -3.633       -3.6     -8.133  -9.3667 -10.633    9.233  -7.933
##  [5,]  16.367       11.4      2.867   9.6333   6.367    8.233   4.067
##  [6,] -21.633      -11.6     -4.133 -12.3667 -10.633  -25.767  -8.933
##  [7,]  -6.633        0.4    -11.133  -0.3667   1.367   -6.767  -7.933
##  [8,]   6.367        8.4     -3.133  -1.3667   5.367   -8.767  -1.933
##  [9,]   7.367       15.4     18.867  10.6333   6.367    8.233 -11.933
## [10,]   2.367       -5.6     -8.133  -9.3667  -2.633    5.233  -1.933
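One way to sanity-check the centering is to confirm that every column of the result has a mean of zero. The sketch below assumes the built-in attitude dataset used above; colMeans() is swapped in as a shortcut for apply(attitude, 2, mean), and the name centered is illustrative.

```r
# colMeans() returns the same named vector as apply(attitude, 2, mean)
mean.att <- colMeans(attitude)

# Subtract each column's mean from that column
centered <- sweep(data.matrix(attitude), 2, mean.att)

# After centering, every column mean should be zero up to floating point error
print(max(abs(colMeans(centered))))
```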

2. The default subtraction can of course be replaced with division, for example dividing by each column's standard deviation.

head(attitude, 10)
##    rating complaints privileges learning raises critical advance
## 1      43         51         30       39     61       92      45
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
sd.att <- apply(attitude, 2, sd)
sd.att
##     rating complaints privileges   learning     raises   critical 
##     12.173     13.315     12.235     11.737     10.397      9.895 
##    advance 
##     10.289
head(sweep(data.matrix(attitude), 2, sd.att, "/"), 10)  # divide by the column sds
##       rating complaints privileges learning raises critical advance
##  [1,]  3.533      3.830      2.452    3.323  5.867    9.298   4.374
##  [2,]  5.176      4.807      4.168    4.601  6.059    7.378   4.568
##  [3,]  5.833      5.257      5.558    5.879  7.310    8.691   4.665
##  [4,]  5.011      4.732      3.678    4.004  5.194    8.489   3.402
##  [5,]  6.654      5.858      4.577    5.623  6.829    8.388   4.568
##  [6,]  3.533      4.131      4.005    3.749  5.194    4.952   3.305
##  [7,]  4.765      5.032      3.433    4.771  6.348    6.872   3.402
##  [8,]  5.833      5.633      4.086    4.686  6.733    6.670   3.985
##  [9,]  5.915      6.159      5.885    5.708  6.829    8.388   3.013
## [10,]  5.504      4.581      3.678    4.004  5.963    8.085   3.985
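A quick check on this step, sketched under the same assumptions (the built-in attitude dataset; the name scaled is illustrative): dividing a column by its own standard deviation should leave it with a standard deviation of 1, even though the column has not been centered.

```r
sd.att <- apply(attitude, 2, sd)

# Divide each column by its standard deviation
scaled <- sweep(data.matrix(attitude), 2, sd.att, "/")

# Every column of the result should have sd 1 up to floating point error
print(apply(scaled, 2, sd))
```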

3. Standardization is also possible: simply combine the two steps above.

head(attitude)
##   rating complaints privileges learning raises critical advance
## 1     43         51         30       39     61       92      45
## 2     63         64         51       54     63       73      47
## 3     71         70         68       69     76       86      48
## 4     61         63         45       47     54       84      35
## 5     81         78         56       66     71       83      47
## 6     43         55         49       44     54       49      34
mean.att <- apply(attitude, 2, mean)

mean.dat <- sweep(data.matrix(attitude), 2, mean.att)  # subtract the column means
sd.att <- apply(attitude, 2, sd)
sd.att
##     rating complaints privileges   learning     raises   critical 
##     12.173     13.315     12.235     11.737     10.397      9.895 
##    advance 
##     10.289
sd.dat <- sweep(data.matrix(mean.dat), 2, sd.att, "/")  # divide by the column sds
head(sd.dat, 10)
##        rating complaints privileges learning  raises critical advance
##  [1,] -1.7772   -1.17163    -1.8907 -1.47965 -0.3495   1.7416  0.2009
##  [2,] -0.1342   -0.19527    -0.1744 -0.20164 -0.1571  -0.1785  0.3953
##  [3,]  0.5230    0.25536     1.2151  1.07637  1.0932   1.1353  0.4924
##  [4,] -0.2985   -0.27038    -0.6647 -0.79805 -1.0227   0.9331 -0.7711
##  [5,]  1.3446    0.85619     0.2343  0.82077  0.6123   0.8321  0.3953
##  [6,] -1.7772   -0.87121    -0.3378 -1.05365 -1.0227  -2.6040 -0.8683
##  [7,] -0.5449    0.03004    -0.9099 -0.03124  0.1314  -0.6839 -0.7711
##  [8,]  0.5230    0.63088    -0.2561 -0.11644  0.5162  -0.8860 -0.1879
##  [9,]  0.6052    1.15661     1.5420  0.90597  0.6123   0.8321 -1.1598
## [10,]  0.1944   -0.42059    -0.6647 -0.79805 -0.2533   0.5289 -0.1879

Compare this with scale(), the standardization function in the base package:

head(sd.dat, 10)
##        rating complaints privileges learning  raises critical advance
##  [1,] -1.7772   -1.17163    -1.8907 -1.47965 -0.3495   1.7416  0.2009
##  [2,] -0.1342   -0.19527    -0.1744 -0.20164 -0.1571  -0.1785  0.3953
##  [3,]  0.5230    0.25536     1.2151  1.07637  1.0932   1.1353  0.4924
##  [4,] -0.2985   -0.27038    -0.6647 -0.79805 -1.0227   0.9331 -0.7711
##  [5,]  1.3446    0.85619     0.2343  0.82077  0.6123   0.8321  0.3953
##  [6,] -1.7772   -0.87121    -0.3378 -1.05365 -1.0227  -2.6040 -0.8683
##  [7,] -0.5449    0.03004    -0.9099 -0.03124  0.1314  -0.6839 -0.7711
##  [8,]  0.5230    0.63088    -0.2561 -0.11644  0.5162  -0.8860 -0.1879
##  [9,]  0.6052    1.15661     1.5420  0.90597  0.6123   0.8321 -1.1598
## [10,]  0.1944   -0.42059    -0.6647 -0.79805 -0.2533   0.5289 -0.1879
head(scale(attitude), 10)
##        rating complaints privileges learning  raises critical advance
##  [1,] -1.7772   -1.17163    -1.8907 -1.47965 -0.3495   1.7416  0.2009
##  [2,] -0.1342   -0.19527    -0.1744 -0.20164 -0.1571  -0.1785  0.3953
##  [3,]  0.5230    0.25536     1.2151  1.07637  1.0932   1.1353  0.4924
##  [4,] -0.2985   -0.27038    -0.6647 -0.79805 -1.0227   0.9331 -0.7711
##  [5,]  1.3446    0.85619     0.2343  0.82077  0.6123   0.8321  0.3953
##  [6,] -1.7772   -0.87121    -0.3378 -1.05365 -1.0227  -2.6040 -0.8683
##  [7,] -0.5449    0.03004    -0.9099 -0.03124  0.1314  -0.6839 -0.7711
##  [8,]  0.5230    0.63088    -0.2561 -0.11644  0.5162  -0.8860 -0.1879
##  [9,]  0.6052    1.15661     1.5420  0.90597  0.6123   0.8321 -1.1598
## [10,]  0.1944   -0.42059    -0.6647 -0.79805 -0.2533   0.5289 -0.1879

As you can see, the results are the same. …………
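Rather than comparing the printed rows by eye, the equality can also be checked programmatically. This is a sketch, assuming the built-in attitude dataset; the names x, z and same are illustrative. Because scale() attaches "scaled:center" and "scaled:scale" attributes to its result, the comparison is restricted to the values.

```r
x <- data.matrix(attitude)

# Center by the column means, then divide by the column sds
z <- sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")

# Compare values only, ignoring the extra attributes that scale() adds
same <- all.equal(z, scale(attitude), check.attributes = FALSE)
print(same)
```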