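SAS Software: Variable Clustering with the VARCLUS Procedure

If no initial classification is supplied to the VARCLUS procedure, it starts by treating all variables as a single cluster. It then repeats the following steps:

(1) Choose a cluster to split. Usually the chosen cluster is the one whose cluster component explains the smallest percentage of variance (option PERCENT=) or the one with the largest eigenvalue associated with the second principal component (option MAXEIGEN=).

(2) Split the chosen cluster into two clusters. The first two principal components are computed and then rotated obliquely, and each variable is assigned to the cluster of the rotated component with which its correlation is largest.

(3) Reassign variables. Over repeated iterations, variables are reassigned among the clusters so that the variance explained by the cluster components is maximized.

VARCLUS stops when every cluster satisfies the user-specified criterion. The criterion is either the percentage of variance explained by each cluster component or a preset bound on the second eigenvalue of each cluster. If no criterion is given, VARCLUS stops when each cluster has only one eigenvalue greater than 1.

SAS program

Enter the following program:

    OPTIONS PS=800;  /* 800 lines per output page, which avoids needless repetition of the SAS title */
    PROC VARCLUS DATA=WORK.XLSSAS;
        VAR X1-X12;
    RUN;

Note: The PROC statement specifies no options, so the default clustering method, principal component clustering, is used.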
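The choice of which cluster to split, and when to stop, can be sketched in a few lines of Python. This is only an illustration of the default behaviour described above (stop when no cluster has a second eigenvalue above 1), not the procedure's actual implementation. The function name `cluster_to_split` and the `maxeigen` parameter are hypothetical names chosen for this sketch. The second eigenvalues are the ones printed in the cluster summaries of the run analysed in this document.

```python
from typing import Optional, Sequence


def cluster_to_split(second_eigenvalues: Sequence[float],
                     maxeigen: float = 1.0) -> Optional[int]:
    """Return the 1-based number of the cluster to split, or None to stop.

    Simplified rule (an assumption of this sketch): pick the cluster with
    the largest second eigenvalue and split it only if that eigenvalue
    exceeds the threshold.
    """
    if not second_eigenvalues:
        return None
    worst = max(range(len(second_eigenvalues)),
                key=lambda i: second_eigenvalues[i])
    if second_eigenvalues[worst] > maxeigen:
        return worst + 1
    return None


# Second eigenvalue of every cluster at each stage, keyed by the
# number of clusters, as printed in the cluster summary tables.
stages = {
    1: [1.5146],
    2: [0.8503, 1.2034],
    3: [0.8503, 0.9270, 1.0410],
    4: [0.8503, 0.9270, 0.8797, 0.9474],
}

decisions = {n: cluster_to_split(values) for n, values in stages.items()}
for n, choice in decisions.items():
    action = "stop" if choice is None else f"split cluster {choice}"
    print(f"{n} cluster(s): {action}")
```

Under these assumptions the sketch selects clusters 1, 2 and 3 in turn and then stops at four clusters, which agrees with the messages in the SAS listing.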
How many clusters the procedure finally produces is determined by the default threshold: VARCLUS stops when each cluster has only one eigenvalue greater than 1.

Analysis of results

    The SAS System        10:04 Wednesday, November 24, 2010  17

    Oblique Principal Component Cluster Analysis

    Observations    1018    PROPORTION    0
    Variables         12    MAXEIGEN      1

    Clustering algorithm converged.

(I) Step 1. This is the first step of oblique principal component clustering, which follows a divisive approach. All 12 variables form a single cluster whose component explains a variance of 2.134427, which is 17.79% of the total variance. The second eigenvalue is 1.5146, and the output announces that this cluster will be split.

    Cluster summary for 1 cluster

    Cluster  Members  Cluster    Variation  Proportion  Second
                      Variation  Explained  Explained   Eigenvalue
    1        12       12         2.134427   0.1779      1.5146

    Total variation explained = 2.134427 Proportion = 0.1779

    Cluster 1 will be split.

(II) Step 2. The single cluster is split into 2 clusters containing 4 and 8 variables. Variation Explained is the variance explained by the cluster component, that is, the first eigenvalue. Proportion Explained is the share of the cluster's own total variance that is explained. Second Eigenvalue is the second eigenvalue within the cluster.

    Cluster summary for 2 clusters

    Cluster  Members  Cluster    Variation  Proportion  Second
                      Variation  Explained  Explained   Eigenvalue
    1        4        4          1.999789   0.4999      0.8503
    2        8        8          1.500503   0.1876      1.2034

    Total variation explained = 3.500292 Proportion = 0.2917

Squared correlations. The third column, R-squared with Own Cluster, is the squared correlation R² between each variable and the component of the cluster it belongs to. Variables X6, X7, X10 and X12 are in cluster 1. For X6, for example, the R² with the cluster 1 component (comparable to the first principal component in principal component analysis) is 0.7550. The fourth column, R-squared with Next Closest, is the squared correlation between each variable and the component of the nearest other cluster. For X6 it is 0.0080. The smaller this value, the more reasonable the classification. The fifth column, 1-R**2 Ratio, is computed from the values in the same row:

    1-R**2 Ratio = [1 - (R-squared with Own Cluster)] / [1 - (R-squared with Next Closest)]

The smaller this ratio, the more reasonable the classification. Many of the ratios in this column are large, which indicates that dividing these 12 variables into 2 clusters is not very appropriate.

                          R-squared with
    Cluster    Variable   Own Cluster  Next Closest  1-R**2 Ratio
    Cluster 1  X6         0.7550       0.0080        0.2470
               X7         0.3534       0.0073        0.6514
               X10        0.5714       0.0002        0.4287
               X12        0.3200       0.0026        0.6818
    Cluster 2  X1         0.0007       0.0000        0.9993
               X2         0.2843       0.0024        0.7175
               X3         0.2888       0.0010        0.7119
               X4         0.0013       0.0000        0.9987
               X5         0.1607       0.0000        0.8393
               X8         0.2198       0.0199        0.7961
               X9         0.2234       0.0520        0.8193
               X11        0.3216       0.0036        0.6808

Standardized regression coefficients for predicting the cluster components from the standardized variables. If C1 and C2 denote the two cluster components, then for example:

    C1 = 0.434500 X6 - 0.297257 X7 + 0.377990 X10 + 0.282886 X12

    Standardized Scoring Coefficients

    Variable   Cluster 1   Cluster 2
    X1         0.000000    0.018208
    X2         0.000000    -.355324
    X3         0.000000    0.358174
    X4         0.000000    -.023689
    X5         0.000000    -.267146
    X6         0.434500    0.000000
    X7         -.297257    0.000000
    X8         0.000000    0.312420
    X9         0.000000    -.314963
    X10        0.377990    0.000000
    X11        0.000000    0.377930
    X12        0.282886    0.000000

Cluster structure. The entries are the correlations between each variable and each cluster component. The cluster structure plays the role of the factor model in factor analysis: each standardized variable can be expressed as a linear combination of all the cluster components. For example:

    X1 = 0.004387 C1 + 0.027322 C2

    Cluster Structure

    Variable   Cluster 1   Cluster 2
    X1         0.004387    0.027322
    X2         -.049085    -.533165
    X3         -.032370    0.537442
    X4         0.003068    -.035546
    X5         0.002935    -.400853
    X6         0.868907    0.089540
    X7         -.594452    -.085225
    X8         -.141021    0.468787
    X9         -.228078    -.472604
    X10        0.755901    0.015031
    X11        0.059668    0.567086
    X12        0.565711    -.051333

Correlation matrix of the cluster components:

    Inter-Cluster Correlations

    Cluster    1         2
    1          1.00000   0.05540
    2          0.05540   1.00000

    Cluster 2 will be split.

This message announces that cluster 2 will be split.

(III) Step 3.

    Clustering algorithm converged.

    Cluster summary for 3 clusters

    Cluster  Members  Cluster    Variation  Proportion  Second
                      Variation  Explained  Explained   Eigenvalue
    1        4        4          1.999789   0.4999      0.8503
    2        3        3          1.253031   0.4177      0.9270
    3        5        5          1.318546   0.2637      1.0410

    Total variation explained = 4.571366 Proportion = 0.3809

                          R-squared with
    Cluster    Variable   Own Cluster  Next Closest  1-R**2 Ratio
    Cluster 1  X6         0.7550       0.0098        0.2474
               X7         0.3534       0.0167        0.6576
               X10        0.5714       0.0012        0.4291
               X12        0.3200       0.0056        0.6838
    Cluster 2  X2         0.4730       0.0068        0.5306
               X3         0.4971       0.0123        0.5092
               X5         0.2829       0.0038        0.7198
    Cluster 3  X1         0.1555       0.0081        0.8514
               X4         0.0784       0.0038        0.9250
               X8         0.3920       0.0199        0.6203
               X9         0.2780       0.0520        0.7616
               X11        0.4146       0.0250        0.6005

    Standardized Scoring Coefficients

    Variable   Cluster 1   Cluster 2   Cluster 3
    X1         0.000000    0.000000    0.299038
    X2         0.000000    0.548880    0.000000
    X3         0.000000    -.562657    0.000000
    X4         0.000000    0.000000    -.212413
    X5         0.000000    0.424515    0.000000
    X6         0.434500    0.000000    0.000000
    X7         -.297257    0.000000    0.000000
    X8         0.000000    0.000000    0.474852
    X9         0.000000    0.000000    -.399912
    X10        0.377990    0.000000    0.000000
    X11        0.000000    0.000000    0.488318
    X12        0.282886    0.000000    0.000000

    Cluster Structure

    Variable   Cluster 1   Cluster 2   Cluster 3
    X1         0.004387    0.089750    0.394295
    X2         -.049085    0.687764    -.082508
    X3         -.032370    -.705027    0.110813
    X4         0.003068    -.061367    -.280076
    X5         0.002935    0.531930    -.061359
    X6         0.868907    0.005406    0.098953
    X7         -.594452    -.017322    -.129087
    X8         -.141021    -.103095    0.626114
    X9         -.228078    0.118370    -.527302
    X10        0.755901    -.035072    -.016183
    X11        0.059668    -.158229    0.643870
    X12        0.565711    -.006093    -.074965

    Inter-Cluster Correlations

    Cluster    1          2          3
    1          1.00000    -0.00748   0.05404
    2          -0.00748   1.00000    -0.13368
    3          0.05404    -0.13368   1.00000

    Cluster 3 will be split.

(IV) Step 4.

    Clustering algorithm converged.

    Cluster summary for 4 clusters

    Cluster  Members  Cluster    Variation  Proportion  Second
                      Variation  Explained  Explained   Eigenvalue
    1        4        4          1.999789   0.4999      0.8503
    2        3        3          1.253031   0.4177      0.9270
    3        2        2          1.120256   0.5601      0.8797
    4        3        3          1.178504   0.3928      0.9474

    Total variation explained = 5.55158 Proportion = 0.4626

                          R-squared with
    Cluster    Variable   Own Cluster  Next Closest  1-R**2 Ratio
    Cluster 1  X6         0.7550       0.0554        0.2594
               X7         0.3534       0.0217        0.6610
               X10        0.5714       0.0071        0.4317
               X12        0.3200       0.0023        0.6815
    Cluster 2  X2         0.4730       0.0062        0.5303
               X3         0.4971       0.0133        0.5097
               X5         0.2829       0.0030        0.7192
    Cluster 3  X1         0.5601       0.0081        0.4434
               X8         0.5601       0.0220        0.4497
    Cluster 4  X4         0.2693       0.0038        0.7335
               X9         0.4466       0.0520        0.5837
               X11        0.4626       0.0250        0.5512

Standardized regression coefficients:

    Standardized Scoring Coefficients

    Variable   Cluster 1   Cluster 2   Cluster 3   Cluster 4
    X1         0.000000    0.000000    0.668077    0.000000
    X2         0.000000    0.548880    0.000000    0.000000
    X3         0.000000    -.562657    0.000000    0.000000
    X4         0.000000    0.000000    0.000000    0.440343
    X5         0.000000    0.424515    0.000000    0.000000
    X6         0.434500    0.000000    0.000000    0.000000
    X7         -.297257    0.000000    0.000000    0.000000
    X8         0.000000    0.000000    0.668077    0.000000
    X9         0.000000    0.000000    0.000000    0.567079
    X10        0.377990    0.000000    0.000000    0.000000
    X11        0.000000    0.000000    0.000000    -.577107
    X12        0.282886    0.000000    0.000000    0.000000

    Cluster Structure

    Variable   Cluster 1   Cluster 2   Cluster 3   Cluster 4
    X1         0.004387    0.089750    0.748417    -.046682
    X2         -.049085    0.687764    0.007102    0.078952
    X3         -.032370    -.705027    0.024408    -.115498
    X4         0.003068    -.061367    -.009986    0.518946
    X5         0.002935    0.531930    0.002167    0.054407
    X6         0.868907    0.005406    -.123094    -.235407
    X7         -.594452    -.017322    -.025528    0.147398
    X8         -.141021    -.103095    0.748417    -.148181
    X9         -.228078    0.118370    -.075518    0.668305
    X10        0.755901    -.035072    -.084362    -.069923
    X11        0.059668    -.158229    0.143754    -.680123
    X12        0.565711    -.006093    -.047717    0.035731

Correlation matrix of the cluster components:

    Inter-Cluster Correlations

    Cluster    1          2          3          4
    1          1.00000    -0.00748   -0.09128   -0.16242
    2          -0.00748   1.00000    -0.00892   0.13142
    3          -0.09128   -0.00892   1.00000    -0.13018
    4          -0.16242   0.13142    -0.13018   1.00000

    No cluster meets the criterion for splitting.

At this point the default stopping threshold has been reached (each cluster has only one eigenvalue greater than 1), so splitting stops. In other words, in every cluster only one of the first eigenvalue (Variation Explained) and the second eigenvalue (Second Eigenvalue) exceeds 1 (see the first table in (IV)).

Finally, the output summarizes the whole clustering process. Column 2 gives the total variance explained when the variables are divided into 1, 2, 3 and 4 clusters. Column 3 gives that explained variance as a proportion of the total variance of all 12 variables. Column 4 gives the smallest proportion of a cluster's own variance that is explained by its cluster component. Column 5 gives the largest second eigenvalue among the clusters. Column 6 gives the smallest squared correlation R² between a variable and the component of its own cluster. Column 7 gives the largest value of the ratio (1-R²)own/(1-R²)next.

    Number    Total        Proportion    Minimum       Maximum       Minimum      Maximum
    of        Variation    of Variation  Proportion    Second        R-squared    1-R**2 Ratio
    Clusters  Explained    Explained     Explained     Eigenvalue    for a        for a
              by Clusters  by Clusters   by a Cluster  in a Cluster  Variable     Variable
    1         2.134427     0.1779        0.1779        1.514596      0.0000
    2         3.500292     0.2917        0.1876        1.203449      0.0007       0.9993
    3         4.571366     0.3809        0.2637        1.041046      0.0784       0.9250
    4         5.551580     0.4626        0.3928        0.947418      0.2693       0.7335

Note: If the number of clusters is specified, or some other stopping criterion is set, the cluster to be split at each step is the one with the largest second eigenvalue.

Note: For a given set of data there is no universal rule for deciding how many clusters are appropriate.
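One can first cluster the data and then weigh the result against subject-matter knowledge and the percentage of total variance explained by the clusters.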
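As a small arithmetic check of the 1-R**2 Ratio formula, the Python sketch below recomputes the ratio for every variable in the four-cluster solution from the two R-squared columns and compares it with the printed ratio. The numbers are read from the four-cluster R-squared table in the SAS listing. The helper name `one_minus_r2_ratio` is hypothetical, and because the listing shows only four decimals, agreement is expected only to within rounding.

```python
def one_minus_r2_ratio(r2_own: float, r2_next: float) -> float:
    """(1 - R2 with own cluster) / (1 - R2 with next closest cluster)."""
    return (1.0 - r2_own) / (1.0 - r2_next)


# (variable, cluster, R2 own, R2 next closest, ratio as printed)
rows = [
    ("X6", 1, 0.7550, 0.0554, 0.2594),
    ("X7", 1, 0.3534, 0.0217, 0.6610),
    ("X10", 1, 0.5714, 0.0071, 0.4317),
    ("X12", 1, 0.3200, 0.0023, 0.6815),
    ("X2", 2, 0.4730, 0.0062, 0.5303),
    ("X3", 2, 0.4971, 0.0133, 0.5097),
    ("X5", 2, 0.2829, 0.0030, 0.7192),
    ("X1", 3, 0.5601, 0.0081, 0.4434),
    ("X8", 3, 0.5601, 0.0220, 0.4497),
    ("X4", 4, 0.2693, 0.0038, 0.7335),
    ("X9", 4, 0.4466, 0.0520, 0.5837),
    ("X11", 4, 0.4626, 0.0250, 0.5512),
]

# Recompute the ratio for each variable from the rounded R2 values.
recomputed = {name: one_minus_r2_ratio(own, nxt)
              for name, _, own, nxt, _ in rows}

# Largest gap between a recomputed ratio and the printed one.
max_abs_diff = max(abs(recomputed[name] - printed)
                   for name, _, _, _, printed in rows)

# Variable with the largest ratio, and the smallest own-cluster R2.
worst = max(recomputed, key=recomputed.get)
min_r2_own = min(own for _, _, own, _, _ in rows)

print(f"largest gap to printed ratio: {max_abs_diff:.5f}")
print(f"largest ratio: {worst} ({recomputed[worst]:.4f})")
print(f"smallest own-cluster R2: {min_r2_own:.4f}")
```

The variable with the largest ratio and the smallest own-cluster R² found this way can be compared with the last two columns of the four-cluster row in the summary table.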