Thursday, July 20, 2017

AMOS model fit measures


Contents

  • The minimum sample discrepancy function: CMIN, P, CMIN/DF, FMIN
  • Measures of parsimony: NPAR, DF, PRATIO
  • Measures based on the population discrepancy: NCP, F0, RMSEA, PCLOSE
  • Information theoretic measures: AIC, BCC, BIC, CAIC, ECVI, MECVI
  • Comparison to a baseline model: NFI, RFI, IFI, TLI, CFI
  • Parsimony adjusted measures: PNFI, PCFI
  • GFI and related measures: GFI, AGFI, PGFI
  • Miscellaneous measures: Hoelter index, RMR
The minimum sample discrepancy function:
The following fit measures are based on the minimum value of the discrepancy.
  • CMIN: CMIN is the minimum value, $\hat{C}$, of the discrepancy, $C$.

  • P: P is the probability of getting as large a discrepancy as occurred with the present sample (under appropriate distributional assumptions and assuming a correctly specified model). That is, P is a "p value" for testing the hypothesis that the model fits perfectly in the population.

One approach to model selection employs statistical hypothesis testing to eliminate from consideration those models that are inconsistent with the available data. Hypothesis testing is a widely accepted procedure, and there is a lot of experience in its use. However, its unsuitability as a device for model selection was pointed out early in the development of analysis of moment structures (Jöreskog, 1969). It is generally acknowledged that most models are useful approximations that do not fit perfectly in the population. In other words, the null hypothesis of perfect fit is not credible to begin with and will, in the end, be accepted only if the sample is not allowed to get too big.

If you encounter resistance to the foregoing view of the role of hypothesis testing in model fitting, the following quotations may come in handy. The first two quotes predate the development of structural modeling and refer to other model fitting problems.

▪           "The power of the test to detect an underlying disagreement between theory and data is controlled largely by the size of the sample. With a small sample an alternative hypothesis which departs violently from the null hypothesis may still have a small probability of yielding a significant value of 7245. In a very large sample, small and unimportant departures from the null hypothesis are almost certain to be detected." (Cochran, 1952)
▪           "If the sample is small then the 7245 test will show that the data are 'not significantly different from' quite a wide range of very different theories, while if the sample is large, the 7245 test will show that the data are significantly different from those expected on a given theory even though the difference may be so very slight as to be negligible or unimportant on other criteria." (Gulliksen & Tukey, 1958, pp. 95–96)
▪           "Such a hypothesis [of perfect fit] may be quite unrealistic in most empirical work with test data. If a sufficiently large sample were obtained this 7245 statistic would, no doubt, indicate that any such non-trivial hypothesis is statistically untenable." (Jöreskog, 1969, p. 200)
▪           "... in very large samples virtually all models that one might consider would have to be rejected as statistically untenable .... In effect, a nonsignificant chi-square value is desired, and one attempts to infer the validity of the hypothesis of no difference between model and data. Such logic is well-known in various statistical guises as attempting to prove the null hypothesis. This procedure cannot generally be justified, since the chi-square variate v can be made small by simply reducing sample size." (Bentler & Bonett, 1980, p. 591)
▪           "Our opinion ... is that this null hypothesis [of perfect fit] is implausible and that it does not help much to know whether or not the statistical test has been able to detect that it is false." (Browne & Mels, 1992, p. 78).

CMIN/DF

CMIN/DF is the minimum discrepancy, $\hat{C}$, divided by its degrees of freedom:

$$\text{CMIN/DF} = \frac{\hat{C}}{d}.$$
Several writers have suggested the use of this ratio as a measure of fit. For every estimation criterion except for Uls and Sls, the ratio should be close to one for correct models. The trouble is that it isn't clear how far from one you should let the ratio get before concluding that a model is unsatisfactory.
Rules of thumb:
"...Wheaton et al. (1977) suggest that the researcher also compute a relative chi-square (Description: 7247) .... They suggest a ratio of approximately five or less 'as beginning to be reasonable.' In our experience, however, Description: 7245 to degrees of freedom ratios in the range of 2 to 1 or 3 to 1 are indicative of an acceptable fit between the hypothetical model and the sample data." (Carmines and McIver, 1981, page 80)
"... different researchers have recommended using ratios as low as 2 or as high as 5 to indicate a reasonable fit." (Marsh & Hocevar, 1985).

"... it seems clear that a Description: 7247 ratio > 2.00 represents an inadequate fit." (Byrne, 1989, p. 55).

 FMIN

FMIN is the minimum value, $\hat{F}$, of the discrepancy, $F$.

Measures of parsimony

Models with relatively few parameters (and relatively many degrees of freedom) are sometimes said to be high in parsimony, or simplicity. Models with many parameters (and few degrees of freedom) are said to be complex, or lacking in parsimony. This use of the terms, simplicity and complexity, does not always conform to everyday usage. For example, the saturated model would be called complex while a model with an elaborate pattern of linear dependencies but with highly constrained parameter values would be called simple.
While one can inquire into the grounds for preferring simple, parsimonious models (e.g., Mulaik, et al., 1989), there does not appear to be any disagreement that parsimonious models are preferable to complex ones. When it comes to parameters, all other things being equal, less is more. At the same time, well fitting models are preferable to poorly fitting ones. Many fit measures represent an attempt to balance these two conflicting objectives—simplicity and goodness of fit.
"In the final analysis, it may be, in a sense, impossible to define one best way to combine measures of complexity and measures of badness-of-fit in a single numerical index, because the precise nature of the best numerical tradeoff between complexity and fit is, to some extent, a matter of personal taste. The choice of a model is a classic problem in the two-dimensional analysis of preference." (Steiger, 1990, p. 179)

NPAR:

NPAR is the number of distinct parameters (q) being estimated. Two parameters (two regression weights, say) that are required to be equal to each other count as a single parameter, not two.

DF:

DF is the number of degrees of freedom for testing the model:
$$d = p - q,$$
where p is the number of sample moments and q is the number of distinct parameters. Rigdon (1994) gives a detailed explanation of the calculation and interpretation of degrees of freedom.
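
For Example 6, which has six observed variables and no mean structure, the count works out as in this sketch (the variable count and NPAR come from the output shown later in this post):

```python
# Sketch: d = p - q for Model A of Example 6.
n_vars = 6                          # observed variables in the Wheaton data
p = n_vars * (n_vars + 1) // 2      # distinct variances and covariances: 21
q = 15                              # NPAR for Model A
print(p - q)                        # 6, matching DF in the Amos output
```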

PRATIO:

The parsimony ratio (James, Mulaik, & Brett, 1982; Mulaik, et al., 1989) expresses the number of constraints in the model being evaluated as a fraction of the number of constraints in the independence model:

$$\text{PRATIO} = \frac{d}{d_b},$$

where d is the degrees of freedom of the model being evaluated and $d_b$ is the degrees of freedom of the independence model. The parsimony ratio is used in the calculation of PNFI and PCFI (see Parsimony adjusted measures below).
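
A one-line illustration with the Example 6 degrees of freedom used throughout this post:

```python
# Sketch: parsimony ratio for Model A of Example 6.
d, db = 6, 15        # df of Model A and of the independence model
print(d / db)        # 0.4
```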


Measures based on population discrepancy:

Steiger and Lind (1980) introduced the use of the population discrepancy function as a measure of model adequacy. The population discrepancy function, $F_0$, is the value of the discrepancy function obtained by fitting a model to the population moments rather than to sample moments. That is,

$$F_0 = \min_{\gamma} F\left(\alpha(\gamma), \alpha_0\right),$$

in contrast to

$$\hat{F} = \min_{\gamma} F\left(\alpha(\gamma), a\right).$$

Steiger, Shapiro and Browne (1985) showed that, under certain conditions, $\hat{C} = n\hat{F}$ has a noncentral chi-square distribution with d degrees of freedom and noncentrality parameter $\delta = C_0 = nF_0$. The Steiger-Lind approach to model evaluation centers around the estimation of $F_0$ and related quantities.
The present discussion of measures related to the population discrepancy relies mainly on Steiger and Lind (1980) and Steiger, Shapiro and Browne (1985). The notation is based on Browne and Mels (1992).

NCP

NCP $= \max\left(\hat{C} - d,\, 0\right)$ is an estimate of the noncentrality parameter, $\delta = C_0 = nF_0$.

The columns labeled LO 90 and HI 90 contain the lower limit ($\delta_L$) and upper limit ($\delta_U$) of a 90% confidence interval for $\delta$. $\delta_L$ is obtained by solving

$$\Phi\left(\hat{C} \mid \delta, d\right) = .95$$

for $\delta$, and $\delta_U$ is obtained by solving

$$\Phi\left(\hat{C} \mid \delta, d\right) = .05$$

for $\delta$, where $\Phi(x \mid \delta, d)$ is the distribution function of the noncentral chi-squared distribution with noncentrality parameter $\delta$ and d degrees of freedom.
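
The confidence limits are easy to reproduce numerically. A sketch, assuming scipy's noncentral chi-square (ncx2) as the distribution function $\Phi$, with Model A's values from Example 6:

```python
# Sketch: NCP and its 90% confidence limits for Model A of Example 6.
from scipy.optimize import brentq
from scipy.stats import ncx2

cmin, d = 71.544, 6

ncp = max(cmin - d, 0.0)                 # point estimate: CMIN - DF
# delta_L solves Phi(CMIN | delta, d) = .95; delta_U solves it for .05.
delta_lo = brentq(lambda nc: ncx2.cdf(cmin, d, nc) - 0.95, 1e-9, 1e4)
delta_hi = brentq(lambda nc: ncx2.cdf(cmin, d, nc) - 0.05, 1e-9, 1e4)
print(ncp, delta_lo, delta_hi)           # 65.544 and the two 90% limits
```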

F0

$\hat{F}_0 = \max\left(\dfrac{\hat{C} - d}{n},\, 0\right) = \dfrac{\text{NCP}}{n}$ is an estimate of $F_0 = \delta / n$.

The columns labeled LO 90 and HI 90 contain the lower limit and upper limit of a 90% confidence interval for $F_0$:

$$\text{LO 90} = \frac{\delta_L}{n}, \qquad \text{HI 90} = \frac{\delta_U}{n}.$$

RMSEA:


$F_0$ incorporates no penalty for model complexity and will tend to favor models with many parameters. In comparing two nested models, $F_0$ will never favor the simpler model. Steiger and Lind (1980) suggested compensating for the effect of model complexity by dividing $F_0$ by the number of degrees of freedom for testing the model. Taking the square root of the resulting ratio gives the population "root mean square error of approximation", called RMS by Steiger and Lind, and RMSEA by Browne and Cudeck (1993):

$$\text{population RMSEA} = \sqrt{\frac{F_0}{d}}, \qquad \text{estimated RMSEA} = \sqrt{\frac{\hat{F}_0}{d}}.$$

The columns labeled LO 90 and HI 90 contain the lower limit and upper limit of a 90% confidence interval for the population value of RMSEA. The limits are given by

$$\text{LO 90} = \sqrt{\frac{\delta_L}{n \cdot d}}, \qquad \text{HI 90} = \sqrt{\frac{\delta_U}{n \cdot d}}.$$
Rule of thumb:
"Practical experience has made us feel that a value of the RMSEA of about .05 or less would indicate a close fit of the model in relation to the degrees of freedom. This figure is based on subjective judgment. It cannot be regarded as infallible or correct, but it is more reasonable than the requirement of exact fit with the RMSEA = 0.0. We are also of the opinion that a value of about 0.08 or less for the RMSEA would indicate a reasonable error of approximation and would not want to employ a model with a RMSEA greater than 0.1." (Browne and Cudeck, 1993)

PCLOSE:

PCLOSE $= 1 - \Phi\left(\hat{C} \mid .05^2 \cdot n \cdot d,\, d\right)$ is a "p value" for testing the null hypothesis that the population RMSEA is no greater than .05:

$$H_0\colon \text{RMSEA} \le .05.$$

By contrast, P $= 1 - \Phi\left(\hat{C} \mid 0, d\right)$ is for testing the hypothesis that the population RMSEA is zero:

$$H_0\colon \text{RMSEA} = 0.$$

Based on their experience with RMSEA, Browne and Cudeck (1993) suggest that an RMSEA of .05 or less indicates a "close fit". Employing this definition of "close fit", PCLOSE gives a test of close fit while P gives a test of exact fit.
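
A sketch, again assuming scipy's noncentral chi-square as $\Phi$, with Model A's values:

```python
# Sketch: PCLOSE = 1 - Phi(CMIN | .05^2 * n * d, d) for Model A.
from scipy.stats import ncx2

cmin, d, n = 71.544, 6, 931
pclose = ncx2.sf(cmin, d, 0.05**2 * n * d)   # survival function = 1 - cdf
print(f"PCLOSE = {pclose:.3f}")              # ~ .000 for Model A
```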

Information theoretic measures:

Amos reports several statistics of the form $\hat{C} + kq$ or $\hat{F} + kq$, where k is some positive constant. Each of these statistics creates a composite measure of badness of fit ($\hat{C}$ or $\hat{F}$) and complexity (q) by forming a weighted sum of the two. Simple models that fit well receive low scores according to such a criterion. Complicated, poorly fitting models get high scores. The constant k determines the relative penalties to be attached to badness of fit and to complexity.
The statistics described in this section are intended for model comparisons and not for the evaluation of an isolated model.
All of these statistics were developed for use with maximum likelihood estimation. Amos reports them for Gls and Adf estimation as well, although it is not clear that their use is appropriate there.

AIC

The Akaike information criterion (Akaike, 1973; Akaike, 1987) is given by

$$\text{AIC} = \hat{C} + 2q.$$

BCC


The Browne-Cudeck (Browne & Cudeck, 1989) criterion is given by

$$\text{BCC} = \hat{C} + 2q \frac{\displaystyle\sum_{g=1}^{G} b^{(g)} \frac{p^{(g)}\left(p^{(g)} + 3\right)}{N^{(g)} - p^{(g)} - 2}}{\displaystyle\sum_{g=1}^{G} p^{(g)}\left(p^{(g)} + 3\right)},$$

where $b^{(g)} = N^{(g)} - 1$ if the Emulisrel6 method has been used, or $b^{(g)} = n\,\dfrac{N^{(g)}}{N}$ if it has not.
BCC imposes a slightly greater penalty for model complexity than does AIC.
BCC is the only measure in this section that was developed specifically for analysis of moment structures. Browne and Cudeck provided some empirical evidence suggesting that BCC may be superior to more generally applicable measures. Arbuckle (unpublished) gives an alternative justification for BCC and derives the above formula for multiple groups.

BIC

The Bayes information criterion (Schwarz, 1978; Raftery, 1995) is given by the formula

$$\text{BIC} = \hat{C} + q \ln N^{(1)},$$

where $N^{(1)}$ is the sample size of the (single) group.
Amos 4 used a different formula, based on Raftery (1993).
In comparison to AIC, BCC, and CAIC, BIC assigns a greater penalty to model complexity, and so has a greater tendency to pick parsimonious models. BIC is reported only for the case of a single group where means and intercepts are not explicit model parameters.

CAIC

Bozdogan's (Bozdogan, 1987) CAIC (consistent AIC) is given by the formula

$$\text{CAIC} = \hat{C} + q\left(\ln N^{(1)} + 1\right).$$
CAIC assigns a greater penalty to model complexity than either AIC or BCC, but not as great a penalty as does BIC. CAIC is reported only for the case of a single group where means and intercepts are not explicit model parameters.

ECVI

Except for a constant scale factor, ECVI is the same as AIC:

$$\text{ECVI} = \frac{1}{n}\left(\hat{C} + 2q\right) = \frac{\text{AIC}}{n}.$$

The columns labeled LO 90 and HI 90 give the lower limit and upper limit of a 90% confidence interval for the population ECVI:

$$\text{LO 90} = \frac{\delta_L + d + 2q}{n}, \qquad \text{HI 90} = \frac{\delta_U + d + 2q}{n}.$$

MECVI

Except for a scale factor, MECVI is identical to BCC:

$$\text{MECVI} = \hat{F} + 2q \frac{\displaystyle\sum_{g=1}^{G} a^{(g)} \frac{p^{(g)}\left(p^{(g)} + 3\right)}{N^{(g)} - p^{(g)} - 2}}{\displaystyle\sum_{g=1}^{G} p^{(g)}\left(p^{(g)} + 3\right)} = \frac{\text{BCC}}{n},$$

where $a^{(g)} = \dfrac{N^{(g)} - 1}{n}$ if the Emulisrel6 method has been used, or $a^{(g)} = \dfrac{N^{(g)}}{N}$ if it has not.
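
For a single group these criteria are easy to reproduce by hand. A sketch using Model A's values from Example 6 (BCC is omitted here because it also depends on the Emulisrel6 setting):

```python
# Sketch: single-group information criteria for Model A of Example 6.
from math import log

cmin, q, N, n = 71.544, 15, 932, 931

aic  = cmin + 2 * q                  # 101.544
bic  = cmin + q * log(N)
caic = cmin + q * (log(N) + 1)
ecvi = aic / n                       # AIC rescaled by 1/n
print(aic, bic, caic, round(ecvi, 3))
# MECVI is the same rescaling applied to BCC: MECVI = BCC / n.
```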

Comparison to baseline model

Several fit measures encourage you to reflect on the fact that, no matter how badly your model fits, things could always be worse.
Bentler and Bonett (1980) and Tucker and Lewis (1973) suggested fitting the independence model or some other very badly fitting "baseline" model as an exercise to see how large the discrepancy function becomes. The object of the exercise is to put the fit of your own model(s) into some perspective. If none of your models fit very well, it may cheer you up to see a really bad model. For example, as the following output shows, Model A from Example 6 has a rather large discrepancy (71.544) in relation to its degrees of freedom. On the other hand, 71.544 does not look so bad compared to 2131.790 (the discrepancy for the independence model).
Model                          NPAR       CMIN   DF      P   CMIN/DF
Model A: No Autocorrelation      15     71.544    6   .000    11.924
Model B: Most General            16      6.383    5   .271     1.277
Model C: Time-Invariance         13      7.501    8   .484      .938
Model D: A and C Combined        12     73.077    9   .000     8.120
Saturated model                  21       .000    0
Independence model                6   2131.790   15   .000   142.119
This things-could-be-worse philosophy of model evaluation is incorporated into a number of fit measures. All of the measures tend to range between zero and one, with values close to one indicating a good fit. Only NFI (described below) is guaranteed to be between zero and one, with one indicating a perfect fit. (CFI is also guaranteed to be between zero and one, but this is because values bigger than one are reported as one, while values less than zero are reported as zero.)
The independence model is only one example of a model that can be chosen as the baseline model, although it is the one most often used, and the one that Amos uses. Sobel and Bohrnstedt (1985) contend that the choice of the independence model as a baseline model is often inappropriate. They suggest alternatives, as did Bentler and Bonett (1980), and give some examples to demonstrate the sensitivity of NFI to the choice of baseline model.

NFI

The Bentler-Bonett (Bentler & Bonett, 1980) normed fit index (NFI), or $\Delta_1$ in the notation of Bollen (1989b), can be written

$$\text{NFI} = \Delta_1 = 1 - \frac{\hat{C}}{\hat{C}_b} = \frac{\hat{C}_b - \hat{C}}{\hat{C}_b},$$

where $\hat{C}$ is the minimum discrepancy of the model being evaluated and $\hat{C}_b$ is the minimum discrepancy of the baseline model.

In Example 6 the independence model can be obtained by adding constraints to any of the other models. Any model can be obtained by constraining the saturated model. So Model A, for instance, with $\hat{C} = 71.544$, is unambiguously "in between" the perfectly fitting saturated model ($\hat{C} = 0$) and the independence model ($\hat{C}_b = 2131.790$).
Model                          NPAR       CMIN   DF      P   CMIN/DF
Model A: No Autocorrelation      15     71.544    6   .000    11.924
Model B: Most General            16      6.383    5   .271     1.277
Model C: Time-Invariance         13      7.501    8   .484      .938
Model D: A and C Combined        12     73.077    9   .000     8.120
Saturated model                  21       .000    0
Independence model                6   2131.790   15   .000   142.119
Looked at in this way, the fit of Model A is a lot closer to the fit of the saturated model than it is to the fit of the independence model. In fact you might say that Model A has a discrepancy that is 96.6% of the way between the (terribly fitting) independence model and the (perfectly fitting) saturated model:
$$\text{NFI} = \frac{2131.790 - 71.544}{2131.790} = .966.$$
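
The same arithmetic works for every model in the table; a sketch:

```python
# Sketch: NFI for the Example 6 models against the independence baseline.
cb = 2131.790                        # CMIN of the independence model
for name, cmin in [("Model A", 71.544), ("Model B", 6.383),
                   ("Model C", 7.501), ("Model D", 73.077)]:
    print(f"{name}: NFI = {(cb - cmin) / cb:.3f}")
# Model A: 0.966, i.e. 96.6% of the way from the independence model
# to the saturated model, as described above.
```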
Rule of thumb:
"Since the scale of the fit indices is not necessarily easy to interpret (e.g., the indices are not squared multiple correlations), experience will be required to establish values of the indices that are associated with various degrees of meaningfulness of results. In our experience, models with overall fit indices of less than .9 can usually be improved substantially. These indices, and the general hierarchical comparisons described previously, are best understood by examples." (Bentler & Bonett, 1980, p. 600, referring to both theNFI and the TLI)

RFI

Bollen's (Bollen, 1986) relative fit index (RFI) is given by

$$\text{RFI} = \rho_1 = 1 - \frac{\hat{C}/d}{\hat{C}_b/d_b},$$

where $\hat{C}$ and $d$ are the discrepancy and the degrees of freedom for the model being evaluated, and $\hat{C}_b$ and $d_b$ are the discrepancy and the degrees of freedom for the baseline model.
The RFI is obtained from the NFI by substituting $\hat{C}/d$ for $\hat{C}$.
RFI values close to 1 indicate a very good fit.

IFI

Bollen's (Bollen, 1989b) incremental fit index (IFI) is given by

$$\text{IFI} = \Delta_2 = \frac{\hat{C}_b - \hat{C}}{\hat{C}_b - d},$$

where $\hat{C}$ and $d$ are the discrepancy and the degrees of freedom for the model being evaluated, and $\hat{C}_b$ and $d_b$ are the discrepancy and the degrees of freedom for the baseline model.
IFI values close to 1 indicate a very good fit.

TLI

The Tucker-Lewis coefficient ($\rho_2$ in the notation of Bollen, 1989b) was discussed by Bentler and Bonett (1980) in the context of analysis of moment structures, and is also known as the Bentler-Bonett non-normed fit index (NNFI):

$$\text{TLI} = \rho_2 = \frac{\dfrac{\hat{C}_b}{d_b} - \dfrac{\hat{C}}{d}}{\dfrac{\hat{C}_b}{d_b} - 1},$$

where $\hat{C}$ and $d$ are the discrepancy and the degrees of freedom for the model being evaluated, and $\hat{C}_b$ and $d_b$ are the discrepancy and the degrees of freedom for the baseline model.
The typical range for TLI lies between zero and one, but it is not limited to that range. TLI values close to 1 indicate a very good fit.

CFI

The comparative fit index (CFI; Bentler, 1990) is given by

$$\text{CFI} = 1 - \frac{\max\left(\hat{C} - d,\, 0\right)}{\max\left(\hat{C}_b - d_b,\, 0\right)} = 1 - \frac{\text{NCP}}{\text{NCP}_b},$$

where $\hat{C}$, $d$, and NCP are the discrepancy, the degrees of freedom, and the noncentrality parameter estimate for the model being evaluated, and $\hat{C}_b$, $d_b$, and $\text{NCP}_b$ are the same quantities for the baseline model.

The CFI is identical to the McDonald and Marsh (1990) relative noncentrality index (RNI),

$$\text{RNI} = 1 - \frac{\hat{C} - d}{\hat{C}_b - d_b},$$

except that the CFI is truncated to fall in the range from 0 to 1. CFI values close to 1 indicate a very good fit.
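
A sketch that reproduces the four baseline-comparison indices above for Model A of Example 6:

```python
# Sketch: RFI, IFI, TLI and CFI for Model A, from the formulas above.
c, d   = 71.544, 6            # Model A discrepancy and df
cb, db = 2131.790, 15         # independence-model discrepancy and df

rfi = 1 - (c / d) / (cb / db)
ifi = (cb - c) / (cb - d)
tli = ((cb / db) - (c / d)) / ((cb / db) - 1)
cfi = 1 - max(c - d, 0) / max(cb - db, 0)
print(f"RFI={rfi:.3f} IFI={ifi:.3f} TLI={tli:.3f} CFI={cfi:.3f}")
```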

Parsimony adjusted measures:

James, Mulaik, and Brett (1982) suggested multiplying the NFI by a "parsimony index" so as to take into account the number of degrees of freedom for testing both the model being evaluated and the baseline model. Mulaik, et al. (1989) suggested applying the same adjustment to the GFI. Amos also applies a parsimony adjustment to the CFI.

PNFI

The PNFI is the result of applying the James, Mulaik, and Brett (1982) parsimony adjustment to the NFI:

$$\text{PNFI} = \text{NFI} \cdot \text{PRATIO} = \text{NFI} \cdot \frac{d}{d_b},$$

where d is the degrees of freedom for the model being evaluated, and $d_b$ is the degrees of freedom for the baseline model.

PCFI

The PCFI is the result of applying the James, Mulaik, and Brett (1982) parsimony adjustment to the CFI:

$$\text{PCFI} = \text{CFI} \cdot \text{PRATIO} = \text{CFI} \cdot \frac{d}{d_b},$$

where d is the degrees of freedom for the model being evaluated, and $d_b$ is the degrees of freedom for the baseline model.
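
A sketch combining the NFI and CFI values from the earlier sketches:

```python
# Sketch: parsimony-adjusted indices for Model A of Example 6.
nfi, cfi = 0.966, 0.969       # from the NFI and CFI sketches above
d, db = 6, 15

print(f"PNFI = {nfi * d / db:.3f}")   # 0.386
print(f"PCFI = {cfi * d / db:.3f}")   # 0.388
```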

GFI and related measures:

GFI

The GFI (goodness of fit index) was devised by Jöreskog and Sörbom (1984) for Ml and Uls estimation, and generalized to other estimation criteria by Tanaka and Huba (1985). The GFI is given by

$$\text{GFI} = 1 - \frac{\hat{F}}{\hat{F}_b},$$

where $\hat{F}$ is the minimum value of the discrepancy function defined in Appendix B and $\hat{F}_b$ is obtained by evaluating $F$ with $\Sigma^{(g)} = 0$, $g = 1, 2, \ldots, G$. An exception has to be made for maximum likelihood estimation, since (D2) in Appendix B is not defined for $\Sigma^{(g)} = 0$. For the purpose of computing GFI in the case of maximum likelihood estimation, $f\left(\Sigma^{(g)}; S^{(g)}\right)$ in Appendix B is calculated as

$$f\left(\Sigma^{(g)}; S^{(g)}\right) = \frac{1}{2} \operatorname{tr}\left[\left(K^{(g)}\right)^{-1}\left(S^{(g)} - \Sigma^{(g)}\right)\right]^2$$

with $K^{(g)} = \Sigma^{(g)}\left(\hat{\gamma}_{\mathrm{ML}}\right)$, where $\hat{\gamma}_{\mathrm{ML}}$ is the maximum likelihood estimate of $\gamma$.

GFI is less than or equal to 1. A value of 1 indicates a perfect fit.

AGFI

The AGFI (adjusted goodness of fit index) takes into account the degrees of freedom available for testing the model. It is given by

$$\text{AGFI} = 1 - \left(1 - \text{GFI}\right)\frac{d_b}{d},$$

where

$$d_b = \sum_{g=1}^{G} p^{*(g)}$$

is the degrees of freedom of the baseline zero model, that is, the number of sample moments. The AGFI is bounded above by one, which indicates a perfect fit. It is not, however, bounded below by zero, as the GFI is.

PGFI

The PGFI (parsimony goodness of fit index), suggested by Mulaik, et al. (1989), is a modification of the GFI that takes into account the degrees of freedom available for testing the model:

$$\text{PGFI} = \text{GFI} \cdot \frac{d}{d_b},$$

where d is the degrees of freedom for the model being evaluated, and

$$d_b = \sum_{g=1}^{G} p^{*(g)}$$

is the degrees of freedom of the baseline zero model.
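
A sketch; the GFI value here is hypothetical (Example 6's GFI is not reproduced in this post), while d = 6 and $d_b$ = 21 are the Example 6 counts:

```python
# Sketch: AGFI and PGFI from a given GFI. db is the number of sample
# moments (the df of the baseline zero model), 21 in Example 6.
gfi = 0.975                   # hypothetical GFI, for illustration only
d, db = 6, 21

agfi = 1 - (1 - gfi) * (db / d)
pgfi = gfi * (d / db)
print(f"AGFI = {agfi:.3f}, PGFI = {pgfi:.3f}")
```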

Miscellaneous measures:

Hoelter index:


Hoelter's "critical N" (Hoelter, 1983) is the largest sample size for which one would accept the hypothesis that a model is correct. Hoelter does not specify a significance level to be used in determining the critical N, although he uses .05 in his examples. Amos reports a critical N for significance levels of .05 and .01. Here are the critical N's displayed by Amos for each of the models in Example 6.
Model                          HOELTER .05   HOELTER .01
Model A: No Autocorrelation            164           219
Model B: Most General                 1615          2201
Model C: Time-Invariance              1925          2494
Model D: A and C Combined              216           277
Independence model                      11            14
Model A, for instance, would have been accepted at the .05 level if the sample moments had been exactly as they were found to be in the Wheaton study, but with a sample size of 164. With a sample size of 165, Model A would have been rejected. Hoelter argues that a critical N of 200 or better indicates a satisfactory fit. In an analysis of multiple groups, he suggests a threshold of 200 times the number of groups. Presumably this threshold is to be used in conjunction with a significance level of .05. This standard eliminates Model A and the independence model in Example 6. Models B, C and D are satisfactory according to the Hoelter criterion. I am not myself convinced by Hoelter's arguments in favor of the 200 benchmark. Unfortunately, the use of critical N as a practical aid to model selection requires some such standard. Bollen and Liang (1988) report some studies of the critical N statistic.
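
The critical N's can be reproduced from CMIN, DF, and the sample size. A sketch, assuming the usual form of Hoelter's formula, in which the critical chi-square value is divided by CMIN/(N-1) and one is added, with the result truncated to an integer:

```python
# Sketch: Hoelter's critical N for Model A of Example 6 (N = 932).
from math import floor
from scipy.stats import chi2

cmin, d, N = 71.544, 6, 932
for alpha in (0.05, 0.01):
    crit = chi2.ppf(1 - alpha, d)            # critical chi-square value
    cn = floor(crit * (N - 1) / cmin + 1)
    print(f"alpha = {alpha}: critical N = {cn}")
# 164 and 219, matching the table above.
```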

RMR

The RMR (root mean square residual) is the square root of the average squared amount by which the sample variances and covariances differ from their estimates obtained under the assumption that your model is correct:
$$\text{RMR} = \sqrt{\frac{\displaystyle\sum_{g=1}^{G} \sum_{i=1}^{p^{(g)}} \sum_{j=1}^{i} \left(s_{ij}^{(g)} - \hat{\sigma}_{ij}^{(g)}\right)^2}{\displaystyle\sum_{g=1}^{G} p^{*(g)}}}.$$
The smaller the RMR is, the better. An RMR of zero indicates a perfect fit.
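
A sketch with made-up two-variable matrices, just to show the averaging over the distinct sample moments:

```python
# Sketch: RMR for a hypothetical single group. s is the sample covariance
# matrix, sigma the model-implied estimate; both are made up.
import numpy as np

s     = np.array([[2.0, 0.8], [0.8, 1.5]])
sigma = np.array([[1.9, 0.9], [0.9, 1.4]])

resid = s - sigma
lower = np.tril_indices_from(resid)   # each distinct moment counted once
rmr = np.sqrt(np.mean(resid[lower] ** 2))
print(round(float(rmr), 3))           # 0.1 for these made-up numbers
```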

References:

  1. Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Proceedings of the 2nd International Symposium on Information Theory (pp. 267–281). Budapest: Akademiai Kiado.
  2. Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.
  3. Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
  4. Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588–606.
  5. Bollen, K. A. (1986). Sample size and Bentler and Bonett's nonnormed fit index. Psychometrika, 51, 375–377.
  6. Bollen, K. A. (1989b). A new incremental fit index for general structural equation models. Sociological Methods and Research, 17, 303–316.
  7. Bollen, K. A., & Liang, J. (1988). Some properties of Hoelter's CN. Sociological Methods and Research, 16, 492–503.
  8. Bollen, K. A., & Long, J. S. (Eds.) (1993). Testing structural equation models. Newbury Park, CA: Sage.
  9. Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345–370.
  10. Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445–455.
  11. Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
  12. Browne, M. W., & Mels, G. (1992). RAMONA user's guide. Columbus, OH: The Ohio State University.
  13. Byrne, B. M. (1989). A primer of LISREL: Basic applications and programming for confirmatory factor analytic models. New York: Springer-Verlag.
  14. Carmines, E., & McIver, J. (1981). Analyzing models with unobserved variables. In Social measurement: Current issues. Beverly Hills: Sage.
  15. Cochran, W. G. (1952). The χ² test of goodness of fit. The Annals of Mathematical Statistics, 23, 315–345.
  16. Gulliksen, H., & Tukey, J. W. (1958). Reliability for the law of comparative judgment. Psychometrika, 23(2), 95–110.
  17. Hoelter, J. W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods and Research, 11, 325–344.
  18. James, L. R., Mulaik, S. A., & Brett, J. M. (1982). Causal analysis: Assumptions, models, and data. Beverly Hills: Sage.
  19. Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183–202.
  20. Jöreskog, K. G., & Sörbom, D. (1984). LISREL-VI user's guide (3rd ed.). Mooresville, IN: Scientific Software.
  21. Marsh, H. W., & Hocevar, D. (1985). Application of confirmatory factor analysis to the study of self-concept: First- and higher order factor models and their invariance across groups. Psychological Bulletin, 97(3), 562–582.
  22. McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247–255.
  23. Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105(3), 430–445.
  24. Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
  25. Raftery, A. E. (1995). Bayesian model selection in social research. In P. Marsden (Ed.), Sociological Methodology 1995 (pp. 111–163). San Francisco.
  26. Rigdon, E. E. (1994). Calculating degrees of freedom for a structural equation model. Structural Equation Modeling, 1(3), 274–278.
  27. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
  28. Sobel, M. E., & Bohrnstedt, G. W. (1985). Use of null models in evaluating the fit of covariance structure models. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 152–178). San Francisco: Jossey-Bass.
  29. Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25(2), 173–180.
  30. Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common factors. Paper presented at the Annual Spring Meeting of the Psychometric Society, Iowa City.
  31. Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–263.
  32. Tanaka, J. S., & Huba, G. J. (1985). A fit index for covariance structure models under arbitrary GLS estimation. British Journal of Mathematical and Statistical Psychology, 38, 197–201.
  33. Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1–10.
  34. Wheaton, B., Muthén, B., Alwin, D. F., & Summers, G. F. (1977). Assessing reliability and stability in panel models. Sociological Methodology, 8, 84–136.