Parallel Analysis on PCA

In [3]:
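require 'statsample'

samples    = 150 # number of cases
variables  = 30  # number of observed variables
iterations = 50  # random datasets generated by the parallel analysis

Statsample::Analysis.store(Statsample::Factor::ParallelAnalysis) do
  Daru.lazy_update = true

  # Three latent factors. rng supplies a fresh normal draw per call.
  rng = Distribution::Normal.rng
  f1  = rnorm(samples)
  f2  = rnorm(samples)
  f3  = rnorm(samples)

  vectors = {}

  # Each observed variable mixes the three factors with weights that depend on i.
  # By operator precedence, rng.call multiplies only the f3 term.
  variables.times do |i|
    name = "v#{i}".to_sym
    vectors[name] = Daru::Vector.new(samples.times.collect { |nv|
      f1[nv] * i + (f2[nv] * (15 - i)) + ((f3[nv] * (30 - i)) * 1.5) * rng.call
    })
    vectors[name].rename "Vector #{i}"
  end

  ds = Daru::DataFrame.new(vectors)

  # Parallel analysis on the dataset, and a PCA on its correlation matrix
  pa  = Statsample::Factor::ParallelAnalysis.new(ds, :iterations => iterations, :debug => true)
  pca = pca(cor(ds))

  echo "There are 3 real factors on data"
  summary pca
  echo "Traditional Kaiser criterion (k>1) returns #{pca.m} factors"
  summary pa
  echo "Parallel Analysis returns #{pa.number_of_factors} factors to preserve"
  Daru.lazy_update = false
end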

Statsample::Analysis.run_batch
Parallel Analysis: Iteration 0
Parallel Analysis: Iteration 1
Parallel Analysis: Iteration 2
Parallel Analysis: Iteration 3
Parallel Analysis: Iteration 4
Parallel Analysis: Iteration 5
Parallel Analysis: Iteration 6
Parallel Analysis: Iteration 7
Parallel Analysis: Iteration 8
Parallel Analysis: Iteration 9
Parallel Analysis: Iteration 10
Parallel Analysis: Iteration 11
Parallel Analysis: Iteration 12
Parallel Analysis: Iteration 13
Parallel Analysis: Iteration 14
Parallel Analysis: Iteration 15
Parallel Analysis: Iteration 16
Parallel Analysis: Iteration 17
Parallel Analysis: Iteration 18
Parallel Analysis: Iteration 19
Parallel Analysis: Iteration 20
Parallel Analysis: Iteration 21
Parallel Analysis: Iteration 22
Parallel Analysis: Iteration 23
Parallel Analysis: Iteration 24
Parallel Analysis: Iteration 25
Parallel Analysis: Iteration 26
Parallel Analysis: Iteration 27
Parallel Analysis: Iteration 28
Parallel Analysis: Iteration 29
Parallel Analysis: Iteration 30
Parallel Analysis: Iteration 31
Parallel Analysis: Iteration 32
Parallel Analysis: Iteration 33
Parallel Analysis: Iteration 34
Parallel Analysis: Iteration 35
Parallel Analysis: Iteration 36
Parallel Analysis: Iteration 37
Parallel Analysis: Iteration 38
Parallel Analysis: Iteration 39
Parallel Analysis: Iteration 40
Parallel Analysis: Iteration 41
Parallel Analysis: Iteration 42
Parallel Analysis: Iteration 43
Parallel Analysis: Iteration 44
Parallel Analysis: Iteration 45
Parallel Analysis: Iteration 46
Parallel Analysis: Iteration 47
Parallel Analysis: Iteration 48
Parallel Analysis: Iteration 49
Analysis 2015-06-04 12:37:28 +0530
= Statsample::Factor::ParallelAnalysis
  There are 3 real factors on data
  == Principal Component Analysis
    Number of factors: 7
    Communalities
+----------+---------+------------+--------+
| Variable | Initial | Extraction |   %    |
+----------+---------+------------+--------+
| v0       | 1.000   | 0.598      | 59.807 |
| v1       | 1.000   | 0.801      | 80.128 |
| v10      | 1.000   | 0.569      | 56.925 |
| v11      | 1.000   | 0.677      | 67.746 |
| v12      | 1.000   | 0.439      | 43.904 |
| v13      | 1.000   | 0.727      | 72.661 |
| v14      | 1.000   | 0.552      | 55.151 |
| v15      | 1.000   | 0.678      | 67.813 |
| v16      | 1.000   | 0.624      | 62.360 |
| v17      | 1.000   | 0.604      | 60.404 |
| v18      | 1.000   | 0.624      | 62.436 |
| v19      | 1.000   | 0.754      | 75.400 |
| v2       | 1.000   | 0.731      | 73.064 |
| v20      | 1.000   | 0.773      | 77.278 |
| v21      | 1.000   | 0.821      | 82.106 |
| v22      | 1.000   | 0.820      | 82.046 |
| v23      | 1.000   | 0.923      | 92.273 |
| v24      | 1.000   | 0.941      | 94.130 |
| v25      | 1.000   | 0.930      | 92.954 |
| v26      | 1.000   | 0.953      | 95.287 |
| v27      | 1.000   | 0.978      | 97.808 |
| v28      | 1.000   | 0.979      | 97.869 |
| v29      | 1.000   | 0.979      | 97.871 |
| v3       | 1.000   | 0.584      | 58.402 |
| v4       | 1.000   | 0.740      | 74.035 |
| v5       | 1.000   | 0.742      | 74.217 |
| v6       | 1.000   | 0.673      | 67.334 |
| v7       | 1.000   | 0.549      | 54.927 |
| v8       | 1.000   | 0.411      | 41.079 |
| v9       | 1.000   | 0.705      | 70.541 |
+----------+---------+------------+--------+

    Total Variance Explained
+--------------+---------+---------+---------+
|  Component   | E.Total |    %    | Cum. %  |
+--------------+---------+---------+---------+
| Component 1  | 12.649  | 42.163% | 42.163  |
| Component 2  | 2.835   | 9.451%  | 51.613  |
| Component 3  | 1.626   | 5.421%  | 57.035  |
| Component 4  | 1.349   | 4.497%  | 61.532  |
| Component 5  | 1.216   | 4.054%  | 65.586  |
| Component 6  | 1.119   | 3.730%  | 69.316  |
| Component 7  | 1.085   | 3.616%  | 72.932  |
| Component 8  | 0.980   | 3.268%  | 76.200  |
| Component 9  | 0.824   | 2.747%  | 78.947  |
| Component 10 | 0.785   | 2.618%  | 81.565  |
| Component 11 | 0.725   | 2.416%  | 83.981  |
| Component 12 | 0.699   | 2.330%  | 86.311  |
| Component 13 | 0.651   | 2.169%  | 88.480  |
| Component 14 | 0.538   | 1.792%  | 90.272  |
| Component 15 | 0.457   | 1.524%  | 91.797  |
| Component 16 | 0.423   | 1.410%  | 93.207  |
| Component 17 | 0.402   | 1.339%  | 94.547  |
| Component 18 | 0.321   | 1.068%  | 95.615  |
| Component 19 | 0.295   | 0.984%  | 96.599  |
| Component 20 | 0.240   | 0.800%  | 97.398  |
| Component 21 | 0.222   | 0.740%  | 98.138  |
| Component 22 | 0.185   | 0.616%  | 98.754  |
| Component 23 | 0.107   | 0.356%  | 99.110  |
| Component 24 | 0.098   | 0.326%  | 99.436  |
| Component 25 | 0.057   | 0.189%  | 99.625  |
| Component 26 | 0.052   | 0.173%  | 99.798  |
| Component 27 | 0.033   | 0.110%  | 99.908  |
| Component 28 | 0.019   | 0.065%  | 99.972  |
| Component 29 | 0.006   | 0.021%  | 99.993  |
| Component 30 | 0.002   | 0.007%  | 100.000 |
+--------------+---------+---------+---------+

    Component matrix
+-----+-------+-------+-------+-------+-------+-------+-------+
|     | PC_1  | PC_2  | PC_3  | PC_4  | PC_5  | PC_6  | PC_7  |
+-----+-------+-------+-------+-------+-------+-------+-------+
| v0  | .008  | .642  | .122  | -.027 | .261  | .312  | .066  |
| v1  | -.145 | .101  | -.479 | -.279 | .520  | -.385 | -.210 |
| v10 | .410  | .323  | .295  | -.149 | .238  | -.013 | -.362 |
| v11 | .308  | .034  | -.299 | -.122 | .331  | .460  | .394  |
| v12 | .482  | -.121 | -.192 | .110  | -.067 | .209  | -.308 |
| v13 | .448  | .382  | -.264 | .412  | .057  | -.366 | -.060 |
| v14 | .519  | .288  | -.127 | -.355 | -.198 | .028  | -.130 |
| v15 | .624  | .338  | .206  | .161  | -.143 | .012  | -.291 |
| v16 | .707  | .020  | .159  | .291  | .058  | .065  | .080  |
| v17 | .721  | -.110 | -.249 | -.042 | -.006 | -.044 | -.082 |
| v18 | .765  | -.006 | .122  | -.102 | .078  | .035  | .076  |
| v19 | .820  | -.027 | -.143 | -.059 | -.023 | .099  | -.214 |
| v2  | .131  | .703  | .007  | .085  | -.179 | .420  | .059  |
| v20 | .835  | .014  | .043  | -.044 | -.038 | .032  | .262  |
| v21 | .883  | -.072 | .032  | -.038 | .064  | -.012 | .174  |
| v22 | .898  | -.041 | .018  | .061  | .058  | -.075 | .012  |
| v23 | .946  | -.086 | .036  | -.097 | .014  | -.041 | .083  |
| v24 | .964  | -.065 | .002  | .048  | .011  | -.058 | .040  |
| v25 | .956  | -.048 | -.031 | .009  | -.044 | -.090 | .044  |
| v26 | .965  | -.126 | .024  | -.045 | .031  | .017  | .038  |
| v27 | .974  | -.136 | .036  | -.058 | -.027 | -.071 | .034  |
| v28 | .974  | -.139 | .052  | -.045 | .012  | -.058 | .038  |
| v29 | .975  | -.145 | .047  | -.037 | .002  | -.057 | .033  |
| v3  | -.090 | -.161 | -.687 | .065  | .190  | .140  | .135  |
| v4  | -.072 | .734  | -.055 | -.273 | .039  | -.260 | .222  |
| v5  | .066  | .463  | .128  | .552  | .300  | -.210 | .261  |
| v6  | .017  | .478  | -.203 | -.068 | -.515 | -.283 | .232  |
| v7  | .245  | .527  | .061  | -.293 | .189  | .098  | -.276 |
| v8  | .167  | .185  | -.394 | -.207 | -.383 | .030  | .058  |
| v9  | .267  | .161  | -.463 | .497  | -.118 | .186  | -.313 |
+-----+-------+-------+-------+-------+-------+-------+-------+

  Traditional Kaiser criterion (k>1) returns 7 factors
  == Parallel Analysis
    Bootstrap Method: random
    Uses SMC: No
    Correlation Matrix type : correlation_matrix
    Number of variables: 30
    Number of cases: 150
    Number of iterations: 50
    Number or factors to preserve: 2
    Eigenvalues
+----+-----------------+----------------------+--------+-----------+
| n  | data eigenvalue | generated eigenvalue |  p.95  | preserve? |
+----+-----------------+----------------------+--------+-----------+
| 1  | 12.6488         | 1.9482               | 2.0426 | Yes       |
| 2  | 2.8352          | 1.8029               | 1.8892 | Yes       |
| 3  | 1.6264          | 1.7055               | 1.8083 |           |
| 4  | 1.3492          | 1.6212               | 1.7078 |           |
| 5  | 1.2162          | 1.5343               | 1.6195 |           |
| 6  | 1.1190          | 1.4597               | 1.5586 |           |
| 7  | 1.0847          | 1.3873               | 1.4633 |           |
| 8  | 0.9804          | 1.3198               | 1.3579 |           |
| 9  | 0.8242          | 1.2639               | 1.3108 |           |
| 10 | 0.7853          | 1.2097               | 1.2580 |           |
| 11 | 0.7248          | 1.1571               | 1.2002 |           |
| 12 | 0.6991          | 1.1072               | 1.1427 |           |
| 13 | 0.6506          | 1.0566               | 1.0902 |           |
| 14 | 0.5376          | 1.0097               | 1.0522 |           |
| 15 | 0.4573          | 0.9611               | 1.0041 |           |
| 16 | 0.4231          | 0.9127               | 0.9611 |           |
| 17 | 0.4018          | 0.8725               | 0.9004 |           |
| 18 | 0.3205          | 0.8256               | 0.8674 |           |
| 19 | 0.2951          | 0.7902               | 0.8363 |           |
| 20 | 0.2399          | 0.7452               | 0.7848 |           |
| 21 | 0.2219          | 0.7063               | 0.7378 |           |
| 22 | 0.1849          | 0.6680               | 0.7120 |           |
| 23 | 0.1067          | 0.6306               | 0.6696 |           |
| 24 | 0.0978          | 0.5933               | 0.6302 |           |
| 25 | 0.0566          | 0.5507               | 0.5993 |           |
| 26 | 0.0520          | 0.5155               | 0.5522 |           |
| 27 | 0.0329          | 0.4733               | 0.5060 |           |
| 28 | 0.0194          | 0.4336               | 0.4700 |           |
| 29 | 0.0063          | 0.3954               | 0.4309 |           |
| 30 | 0.0020          | 0.3425               | 0.3953 |           |
+----+-----------------+----------------------+--------+-----------+

  Parallel Analysis returns 2 factors to preserve
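
The two factor counts in the report can be reproduced by hand from the eigenvalue table. The sketch below is only an illustration, not statsample's implementation: the helper names `kaiser_count` and `parallel_analysis_count` are hypothetical, and the stopping rule (keep leading components until the first data eigenvalue that does not exceed the p.95 value) is an assumption that happens to agree with the "preserve?" column above. Only the first eight rows of the table are copied, since every later data eigenvalue is below both 1 and its p.95 threshold.

```ruby
# Rows copied from the Eigenvalues table above (components 1 to 8 of 30).
# Each row is [data eigenvalue, p.95 of the generated eigenvalues].
rows = [
  [12.6488, 2.0426],
  [2.8352,  1.8892],
  [1.6264,  1.8083],
  [1.3492,  1.7078],
  [1.2162,  1.6195],
  [1.1190,  1.5586],
  [1.0847,  1.4633],
  [0.9804,  1.3579]
]

# Kaiser criterion: count the components whose eigenvalue is greater than 1.
def kaiser_count(rows)
  rows.count { |data_ev, _threshold| data_ev > 1.0 }
end

# Parallel analysis (assumed rule): keep leading components while the data
# eigenvalue exceeds the 95th percentile obtained from random data.
def parallel_analysis_count(rows)
  rows.take_while { |data_ev, threshold| data_ev > threshold }.size
end

puts "Kaiser criterion:  #{kaiser_count(rows)} factors"
puts "Parallel analysis: #{parallel_analysis_count(rows)} factors"
```

With these rows, the Kaiser count is 7 and the parallel analysis count is 2, matching the report. Components 3 to 7 pass the fixed threshold of 1 but fall below what random data of the same size (150 cases, 30 variables) produces, which is why parallel analysis discards them.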