Removing the Scholastic Aptitude Test score from the predictive index

Jonathan Baron
Professor of Psychology and Head, Admissions Committee, School of
Arts and Sciences
March 17, 1992

Here is why I think that the faculty of the School of Arts and
Sciences should vote to remove the SAT (mean Scholastic Aptitude
Test score) from the predictive index (PI).


The McGill Report (1967) established our current admissions
policies, including the predictive index.  The index is an
equally weighted composite of three components: SAT, the mean of
the three highest achievement tests (ACH, one must be English),
and high-school class rank (CLR, rescaled so that the numbers are
comparable to the other components).  The report recognized that
the relevance of the various components could change.  It said
(p. 32), ``The admission procedures proposed herein should be kept
under continual review by the Admissions Committee to make sure
that they are producing the results intended.  This surveillance
can be made far more meaningful if it is accompanied by an
imaginative research program by the appropriate offices of the

Until 1989, when Norman Adler became Undergraduate Dean, the SAS
Admissions Committee was unable to obtain the data that it needed
to monitor admissions procedures.  Nor had anyone else carried
out more than a cursory review of the predictive index.  In
particular, no serious attempt was made to examine the usefulness
of the three components.

In 1989, I obtained data for classes entering in 1983 and 1984. 
These data were originally compiled as part of a study of the
prediction of attrition carried out at the direction of the
Provost, Michael Aiken.  (The results from the attrition study
were never made available to the SAS Admissions Committee.) 
Preliminary analyses indicated that the SAT was playing no role
in prediction, although ACH and CLR were.  I reported these
results to the Provost, to Lee Stetson (Dean of Admissions), to
Hugo Sonnenschein (Dean), to Norman Adler, and to the Admissions
Committee, of which I was then head.  No action ensued, but I was
advised (mostly by Dean Sonnenschein, who was quite sympathetic)
to publish the results.  Many people said that Penn would not be
the first to act because such an action would be perceived as
lowering standards.

With the permission of the Provost and the help of Frank Norman
(a mathematical psychologist in my department), I have prepared a
paper, which is now published as:

Baron, J. & Norman, M. F. (1992).  SATs, achievement tests, and
high-school class rank as predictors of college performance.
Educational and Psychological Measurement, 52, 1047-1055.

For about a year, the Admissions Committee has considered
bringing a motion before the faculty to drop the SAT from the PI.
The latest step was to invite Leonard Ramist from the Educational
Testing Service (ETS) to come and discuss the issue.  He did not
challenge the basic findings, although he presented a different
analysis that made the SAT look more useful.  He asked that we
wait until ETS conducts its own study before taking action.

Summary of the results

The clearest summary is the following table, from our paper:

Table 2
Mean CUM as a function of SAT (here, the sum of both tests) and
PI2 (sum of CLR and ACH).  Cells with fewer than 15 students are
omitted.  The label of each row and column indicates the interval
from five below to four above.

                         PI2
         105    115    125    135    145    155
         ----   ----   ----   ----   ----   ----
SAT
105      2.64   2.88   3.05
115      2.75   2.84   3.07   3.27
125      2.66   2.91   3.12   3.22   3.46
135             2.87   3.10   3.25   3.45
145                    3.18   3.27   3.43   3.69
155                                  3.28

PI2 is, of course, what we are recommending as the new predictive
index, an equally weighted combination of CLR and ACH.  Looking
across each of the rows, it is apparent that CUM (cumulative GPA
for four years) increases as PI2 increases.  Looking down each
column, it is apparent that CUM does not increase as SAT
increases, once PI2 is known.

Of course, SAT is correlated with PI2.  So SAT by itself does
predict grades.  The trouble is that it is redundant.  In
particular, it is redundant with ACH.

These results are summarized in two other tables from our paper.
These tables report coefficients on a scale on which 0 represents
no relationship and 1 represents a perfect relationship.  ``Raw''
correlations are based on one component at a time.  ``Regression
weights'' (and ``logistic weight'') represent the strength of the
relationship after the other predictors are taken into account.
R-squared represents the total predictive power of a group of
components.  Here are the two tables:

Table 1
Predictive power of SAT, ACH, and CLR.
Regression weights are standardized except for the prediction of
status, which is from a logistic regression.  ``Grade'' is the
negative of the number of incompletes or withdrawals.  Asterisk
indicates statistical significance at the .05 level; typically,
levels were much lower.

Measure                        SAT      ACH      CLR
Raw correlation with CUM       .199*    .261*    .305*
  first year only              .264*    .328*    .344*
  Delaware (first year)        .313*    .366*    .409*
Regression weight for CUM     -.013     .220*    .266*
  first year only              .015     .262*    .291*
  Delaware (first year)       -.001     .214*    .326*
Logistic weight for status     .020    -.005     .039*
Regression weight for grade   -.032     .018     .126*

Table 3
R-squared and standardized regression weight (correlations for
single variables) for each combination of predictors.

                         Regression weights
Predictors             SAT      ACH      CLR      R-squared
SAT                    .199      -        -         .040
ACH                     -       .261      -         .068
CLR                     -        -       .305       .093
SAT & ACH              .019     .247      -         .068
SAT & CLR              .145      -       .277       .113
ACH & CLR               -       .211     .265       .136
SAT & ACH & CLR       -.013     .220     .266       .136

The results from Delaware argue that the problem is not a result
of the fact that Penn is highly selective.  Delaware is
demonstrably much less selective.

We also found that these results held for most classes
individually, not just for overall CUM.  Some classes were
exceptions, however.  The SAT did contribute to prediction in
Wharton classes and in some other fields, but there were not
enough of these to affect its usefulness in CUM overall.

One important point is that these results are based on a sample
that excludes underrepresented minorities (Blacks and Hispanics).
The results for these minorities are similar, although samples
are small and the results vary from year to year.  (Ramist
claimed that the SAT is useful for Blacks, but in the two other
years examined, I have not found this, and, in fact, there is no
evidence whatsoever that the predictors for minority and
non-minority students are different.  This is a general finding
in the testing literature, which we replicated in our study.
I believe that Ramist's result is a fluke.  In some cases, I have
found that the SAT is significantly NEGATIVELY correlated with
grades.)  If, however, we include the whole sample without taking
account of minority or non-minority status, then SAT is somewhat
useful.  This is because SAT is lower for minorities and
minorities get lower grades.  This was the result that Ramist
claimed was evidence for the usefulness of the SAT.

Note also that the SAT is useless for predicting attrition or
withdrawals from classes, although CLR is useful here.

Reasons for taking action now

1. The fact that the SAT is useful for weeding out
underrepresented minorities is not a reason to use it.  Even if
we wanted to give up our affirmative action policy, the reason
for the poor average performance of minority students is not at
all clear.  In particular, it is not clear how much of the
problem results from things that happen after students are here,
as opposed to the affirmative action policy itself.  A thorough
study of this problem should be conducted.  In the meantime,
giving up on affirmative action is not an option that we face.

2. Even if ETS does a study, we are going to believe our internal
results rather than the results from other colleges.  Colleges
are likely to differ.  My hunch is that they will not differ
much, but if they did, so what?

3. Giving up a useless test is ethical.
The ``Statement of principles of good practice'' (1989) of the
National Association of College Admissions Counselors says,
``College and University Members agree that they will ... use
test scores and related data discreetly and for purposes that are
appropriate and validated [and] conduct institutional research to
inquire into the most appropriate use of tests for admission
decisions.''  The College Board's ``Guidelines on the uses of
College Board test scores and related data'' says, ``When College
Board tests are used for admissions purposes, the responsible
officials and selection committee members should ... validate
data used in the selection process regularly ... to ensure their
continuing relevance.''  The ``Code of fair testing practices in
education'' (1988) of the American Psychological Association's
Joint Committee on Testing Practices says, ``Test users should
select tests that meet the purpose for which they are to be used
and that are appropriate for the intended test-taking
populations,'' and ``Test users should ... obtain evidence to
help show that the test is meeting its intended purpose.''  These
stipulations would be empty if no action were then taken to
remove a useless test from a predictive index.

More generally, lack of incremental validity of just the sort we
found has been used as a basis for civil suits concerning
employment decisions.  No such suits, to my knowledge, have
concerned college admissions, but the principle seems to be a
good one.  A person denied admission on the basis of a useless
test would seem to have legitimate grounds for complaint.  By not
acting to change the PI, we are perpetuating a system that is
unfair in this way.

4. College admissions criteria have major effects on high-school
education.  College-bound high-school students do what they think
will help them get into a good college.  Students now spend a
considerable amount of time preparing for the SAT.
(When my son was taught how to take multiple-choice exams in
kindergarten, when he was in a group of children who could
already read, I complained that this was an inappropriate
activity, and I was told that it's never too early to start
preparing for the SAT!)  If they were told that the SAT was not
important but their grades and their achievement test scores WERE
important, they might spend more time trying to learn something
and less time trying to learn how to appear to be intelligent on
a test.  This might be reason enough to drop the SAT, even if it
were somewhat useful for prediction.

5. The PI is used for other things aside from admission.  It is,
for example, used to decide on the designation of Benjamin
Franklin Scholars and to decide on financial aid for foreign
students.  We compete with many other excellent institutions for
these top students.  By using a slightly different PI than our
competitors, but one that is just as valid, we could end up
giving these kinds of special offers to some students who do NOT
get identical offers from our competitors but who are excellent
students nonetheless.

Political aspects

Given the fourth reason, it would be futile to change the PI and
not make it public that we have done so.  Clearly, such an
announcement can be handled well or badly, and it would require
considerable thought.  It is perhaps the difficulty of planning
this that has made the Provost and the Dean of Admissions so
reluctant to do anything.  (The latest sign of their reluctance
is that they did not even inform me of the availability of the
data that they gave to Ramist.)

It is clear that such an announcement could have detrimental
effects.  An advantage of waiting is that, once our article is
published [as it was], other Ivy League institutions might
undertake similar studies and might be willing to make some sort
of joint move [which didn't happen].  On the other hand, waiting
for someone else to move could take years.
These studies are not so easy to do without resources, and
resources are in short supply.  (We had no help except in
obtaining the data.)

I believe that an announcement could also have positive effects.
It could be used to gain national publicity for the fact that our
admissions standards have in fact been going up.  We could
emphasize that we expect them to continue to go up.  We should
also emphasize that we are doing this as a matter of intellectual
honesty, which is what universities are about.  In these times of
doubts about even the integrity of universities, that could
probably help too.

In sum, although I think that the publicity has to be handled
very carefully, with full consultation of all relevant parties, I
do think that it can be handled so that it is positive rather
than negative.
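
The redundancy of SAT with ACH reported in Table 3 can be checked
from the published numbers alone.  The Python sketch below is an
illustration only, not an analysis from the paper.  It assumes the
textbook formula for standardized regression weights with two
predictors, the function names are hypothetical, and the SAT-ACH
correlation it backs out is inferred from rounded table entries
rather than a figure reported above.

```python
import math

# Values copied from Table 3: raw correlations with CUM, and the
# standardized weights reported in the "SAT & ACH" row.
R_SAT_CUM = 0.199
R_ACH_CUM = 0.261
BETA_SAT_REPORTED = 0.019
BETA_ACH_REPORTED = 0.247


def implied_predictor_correlation(r1, r2, beta1):
    """Back out r12 from beta1 = (r1 - r12*r2) / (1 - r12**2).

    Hypothetical helper: rearranges the two-predictor formula into
    beta1*r12**2 - r2*r12 + (r1 - beta1) = 0 and returns the root
    that is a valid correlation.  Requires beta1 != 0.
    """
    a, b, c = beta1, -r2, r1 - beta1
    disc = math.sqrt(b * b - 4 * a * c)
    roots = ((-b - disc) / (2 * a), (-b + disc) / (2 * a))
    return next(r for r in roots if -1.0 <= r <= 1.0)


def standardized_weights(r1, r2, r12):
    """Standardized regression weights for two correlated predictors."""
    denom = 1.0 - r12 ** 2
    return (r1 - r12 * r2) / denom, (r2 - r12 * r1) / denom


# Inferred, not reported: the SAT-ACH correlation implied by Table 3.
r_sat_ach = implied_predictor_correlation(
    R_SAT_CUM, R_ACH_CUM, BETA_SAT_REPORTED
)
beta_sat, beta_ach = standardized_weights(R_SAT_CUM, R_ACH_CUM, r_sat_ach)

# R-squared for the pair, and what SAT adds beyond ACH alone.
r2_pair = beta_sat * R_SAT_CUM + beta_ach * R_ACH_CUM
r2_gain_from_sat = r2_pair - R_ACH_CUM ** 2

print(f"implied SAT-ACH correlation: {r_sat_ach:.3f}")
print(f"ACH weight: {beta_ach:.3f} (Table 3 reports {BETA_ACH_REPORTED})")
print(f"R-squared for SAT & ACH: {r2_pair:.3f}")
print(f"R-squared added by SAT: {r2_gain_from_sat:.4f}")
```

Under these assumptions, adding SAT to ACH raises R-squared by
well under .001, which is consistent with the identical .068
entries for ACH alone and for SAT & ACH in Table 3.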