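In the base package, sweep() is documented under the title *"Sweep out Array Summaries"*, which states its purpose: it is a tool for working with summary statistics. It is therefore usually paired with apply(): when a statistic computed by apply() has to be applied back to the corresponding rows or columns of the original data set, sweep() does the job.
"To sweep away, to clear out" is also the literal meaning of the word sweep.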
sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
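- x: the original data set (an array or matrix) to be processed
- MARGIN: the dimension to operate along: 1 for rows, 2 for columns, or another dimension of the array
- STATS: the summary statistic to be applied to the original data set
- FUN: the function used to carry out the operation. The default is subtraction, "-"; it can be changed to "+", "*" or "/" for addition, multiplication or division, or to any other function that takes two arguments
- check.margin: whether to check that the length or dimensions of STATS match the chosen dimensions of x; defaults to TRUE
- ...: optional further arguments passed on to FUN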
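The arguments above can be tried out on a tiny matrix where the arithmetic is easy to check by hand. This is only a minimal sketch: the matrix `m` and the result names (`by_col`, `by_row`, `capped`, `warned`) are made up for illustration and do not come from the original post.

```r
# A 2 x 3 matrix: rows are (1, 3, 5) and (2, 4, 6).
m <- matrix(1:6, nrow = 2)

# MARGIN = 2: STATS holds one value per column; the default FUN is "-".
by_col <- sweep(m, 2, c(1, 3, 5))

# MARGIN = 1: STATS holds one value per row.
by_row <- sweep(m, 1, c(1, 2))

# FUN does not have to be an arithmetic operator; here pmin() caps each
# column at the corresponding value of STATS.
capped <- sweep(m, 2, c(1, 3, 5), FUN = pmin)

# With check.margin = TRUE (the default), a STATS vector whose length does
# not fit the chosen margin (2 values for 3 columns) triggers a warning.
warned <- tryCatch({
  sweep(m, 2, c(1, 2))
  FALSE
}, warning = function(w) TRUE)

print(by_col)
print(by_row)
print(capped)
print(warned)
```

Choosing MARGIN only changes which dimension STATS is lined up with; the operation itself is still applied element by element.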
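1. For example, suppose we want to subtract the column mean from every value in the original data set. We first compute the mean of each column with apply(), then let sweep() do the subtraction.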
require(stats) # for sd(), used further below; mean() itself is in base
head(attitude, 10)
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
mean.att <- apply(attitude, 2, mean)
mean.att
## rating complaints privileges learning raises critical
## 64.63 66.60 53.13 56.37 64.63 74.77
## advance
## 42.93
head(sweep(data.matrix(attitude), 2, mean.att), 10) # subtract the column means
## rating complaints privileges learning raises critical advance
## [1,] -21.633 -15.6 -23.133 -17.3667 -3.633 17.233 2.067
## [2,] -1.633 -2.6 -2.133 -2.3667 -1.633 -1.767 4.067
## [3,] 6.367 3.4 14.867 12.6333 11.367 11.233 5.067
## [4,] -3.633 -3.6 -8.133 -9.3667 -10.633 9.233 -7.933
## [5,] 16.367 11.4 2.867 9.6333 6.367 8.233 4.067
## [6,] -21.633 -11.6 -4.133 -12.3667 -10.633 -25.767 -8.933
## [7,] -6.633 0.4 -11.133 -0.3667 1.367 -6.767 -7.933
## [8,] 6.367 8.4 -3.133 -1.3667 5.367 -8.767 -1.933
## [9,] 7.367 15.4 18.867 10.6333 6.367 8.233 -11.933
## [10,] 2.367 -5.6 -8.133 -9.3667 -2.633 5.233 -1.933
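One way to sanity-check the centering step is to confirm that every column of the result has a mean of zero. The sketch below assumes the built-in `attitude` data set and uses `colMeans()` as a shorthand for `apply(attitude, 2, mean)`; the variable names `centered` and `col_means_after` are illustrative only.

```r
# Centre each column of attitude around its own mean.
centered <- sweep(data.matrix(attitude), 2, colMeans(attitude))

# After centering, each column mean should be zero up to rounding error.
col_means_after <- colMeans(centered)
print(all(abs(col_means_after) < 1e-10))

# colMeans() and apply(..., 2, mean) give the same statistics here.
print(all.equal(colMeans(attitude), apply(attitude, 2, mean)))
```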
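2. The default subtraction can of course be changed to division, for example to divide by the standard deviation of each column.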
head(attitude, 10)
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
sd.att <- apply(attitude, 2, sd)
sd.att
## rating complaints privileges learning raises critical
## 12.173 13.315 12.235 11.737 10.397 9.895
## advance
## 10.289
head(sweep(data.matrix(attitude), 2, sd.att, "/"), 10) # divide by the column sds
## rating complaints privileges learning raises critical advance
## [1,] 3.533 3.830 2.452 3.323 5.867 9.298 4.374
## [2,] 5.176 4.807 4.168 4.601 6.059 7.378 4.568
## [3,] 5.833 5.257 5.558 5.879 7.310 8.691 4.665
## [4,] 5.011 4.732 3.678 4.004 5.194 8.489 3.402
## [5,] 6.654 5.858 4.577 5.623 6.829 8.388 4.568
## [6,] 3.533 4.131 4.005 3.749 5.194 4.952 3.305
## [7,] 4.765 5.032 3.433 4.771 6.348 6.872 3.402
## [8,] 5.833 5.633 4.086 4.686 6.733 6.670 3.985
## [9,] 5.915 6.159 5.885 5.708 6.829 8.388 3.013
## [10,] 5.504 4.581 3.678 4.004 5.963 8.085 3.985
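A similar check applies to the division step: once each column is divided by its own standard deviation, the standard deviation of every resulting column should be 1. This is a small sketch on the built-in `attitude` data; the names `scaled` and `sd_after` are illustrative.

```r
# Divide each column of attitude by that column's standard deviation.
scaled <- sweep(data.matrix(attitude), 2, apply(attitude, 2, sd), "/")

# The rescaled columns should each have a standard deviation of 1.
sd_after <- apply(scaled, 2, sd)
print(all.equal(unname(sd_after), rep(1, ncol(attitude))))
```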
3. Standardization can be done the same way, simply by combining the two steps above.
head(attitude)
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
mean.att <- apply(attitude, 2, mean)
mean.dat <- sweep(data.matrix(attitude), 2, mean.att) # subtract the column means
sd.att <- apply(attitude, 2, sd)
sd.att
## rating complaints privileges learning raises critical
## 12.173 13.315 12.235 11.737 10.397 9.895
## advance
## 10.289
sd.dat <- sweep(data.matrix(mean.dat), 2, sd.att, "/") # divide by the column sds
head(sd.dat, 10)
## rating complaints privileges learning raises critical advance
## [1,] -1.7772 -1.17163 -1.8907 -1.47965 -0.3495 1.7416 0.2009
## [2,] -0.1342 -0.19527 -0.1744 -0.20164 -0.1571 -0.1785 0.3953
## [3,] 0.5230 0.25536 1.2151 1.07637 1.0932 1.1353 0.4924
## [4,] -0.2985 -0.27038 -0.6647 -0.79805 -1.0227 0.9331 -0.7711
## [5,] 1.3446 0.85619 0.2343 0.82077 0.6123 0.8321 0.3953
## [6,] -1.7772 -0.87121 -0.3378 -1.05365 -1.0227 -2.6040 -0.8683
## [7,] -0.5449 0.03004 -0.9099 -0.03124 0.1314 -0.6839 -0.7711
## [8,] 0.5230 0.63088 -0.2561 -0.11644 0.5162 -0.8860 -0.1879
## [9,] 0.6052 1.15661 1.5420 0.90597 0.6123 0.8321 -1.1598
## [10,] 0.1944 -0.42059 -0.6647 -0.79805 -0.2533 0.5289 -0.1879
head(scale(attitude), 10)
## rating complaints privileges learning raises critical advance
## [1,] -1.7772 -1.17163 -1.8907 -1.47965 -0.3495 1.7416 0.2009
## [2,] -0.1342 -0.19527 -0.1744 -0.20164 -0.1571 -0.1785 0.3953
## [3,] 0.5230 0.25536 1.2151 1.07637 1.0932 1.1353 0.4924
## [4,] -0.2985 -0.27038 -0.6647 -0.79805 -1.0227 0.9331 -0.7711
## [5,] 1.3446 0.85619 0.2343 0.82077 0.6123 0.8321 0.3953
## [6,] -1.7772 -0.87121 -0.3378 -1.05365 -1.0227 -2.6040 -0.8683
## [7,] -0.5449 0.03004 -0.9099 -0.03124 0.1314 -0.6839 -0.7711
## [8,] 0.5230 0.63088 -0.2561 -0.11644 0.5162 -0.8860 -0.1879
## [9,] 0.6052 1.15661 1.5420 0.90597 0.6123 0.8321 -1.1598
## [10,] 0.1944 -0.42059 -0.6647 -0.79805 -0.2533 0.5289 -0.1879
As you can see, the results are identical to those of scale(). …………
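Rather than comparing the printed rows by eye, the equality can also be checked programmatically. The sketch below rebuilds the standardized matrix with two sweep() calls and compares it with scale() using all.equal(); `check.attributes = FALSE` is used because scale() attaches extra "scaled:center" and "scaled:scale" attributes to its result. The names `x`, `z_sweep` and `z_scale` are illustrative.

```r
x <- data.matrix(attitude)

# Standardize with sweep(): subtract column means, then divide by column sds.
z_sweep <- sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")

# Standardize with scale() for comparison.
z_scale <- scale(attitude)

# Compare the numeric values only, ignoring the attributes added by scale().
print(all.equal(z_sweep, z_scale, check.attributes = FALSE))
```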