Removing the Scholastic Aptitude Test score from the predictive
index
Jonathan Baron
Professor of Psychology and Head, Admissions Committee, School of
Arts and Sciences
March 17, 1992
Here is why I think that the faculty of the School of Arts and
Sciences should vote to remove the SAT (mean Scholastic Aptitude
Test score) from the predictive index (PI).
Background
The McGill Report (1967) established our current admissions
policies, including the predictive index. The index is an
equally weighted composite of three components: SAT, the mean of
the three highest achievement tests (ACH; one of the three must be
English),
and high-school class rank (CLR, rescaled so that the numbers are
comparable to the other components). The report recognized that
the relevance of the various components could change. It said
(p. 32), ``The admission procedures proposed herein should be kept
under continual review by the Admissions Committee to make sure
that they are producing the results intended. This surveillance
can be made far more meaningful if it is accompanied by an
imaginative research program by the appropriate offices of the
University.''
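In modern notation, the composite can be sketched as follows. This
is an illustrative sketch only, not the actual admissions
computation; the z-score rescaling is an assumption of the sketch,
since the exact rescaling used for class rank is not described here.

```python
def standardize(xs):
    """Rescale a list of scores to mean 0 and standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / sd for x in xs]

def predictive_index(sat, ach, clr):
    """Equally weighted composite of the three rescaled components."""
    return [s + a + c for s, a, c in zip(standardize(sat),
                                         standardize(ach),
                                         standardize(clr))]
```

Rescaling first is what makes the weighting "equal": without it, the
component with the largest spread would dominate the sum.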
Until 1989, when Norman Adler became Undergraduate Dean, the SAS
Admissions Committee was unable to obtain the data that it needed
to monitor admissions procedures. Nor had anyone else carried
out more than a cursory review of the predictive index. In
particular, no serious attempt was made to examine the usefulness
of the three components.
In 1989, I obtained data for classes entering in 1983 and 1984.
These data were originally compiled as part of a study of the
prediction of attrition carried out at the direction of the
Provost, Michael Aiken. (The results from the attrition study
were never made available to the SAS Admissions Committee.)
Preliminary analyses indicated that the SAT was playing no role
in prediction, although ACH and CLR were. I reported these
results to the Provost, to Lee Stetson (Dean of Admissions), to
Hugo Sonnenschein (Dean), to Norman Adler, and to the Admissions
Committee, of which I was then head. No action ensued, but I was
advised (mostly by Dean Sonnenschein, who was quite sympathetic)
to publish the results. Many people said that Penn would not be
the first to act because such an action would be perceived as
lowering standards.
With the permission of the Provost and the help of Frank Norman
(a mathematical psychologist in my department), I have prepared a
paper, which is now published as:
Baron, J. & Norman, M. F. (1992). SATs, achievement tests, and
high-school class rank as predictors of college
performance. Educational and Psychological Measurement, 52,
1047-1055.
For about a year, the Admissions Committee has considered bringing
a motion before the faculty to drop the SAT from the PI. The most
recent step was to invite Leonard Ramist from Educational
Testing Service to come and discuss the issue. He did not
challenge the basic findings, although he presented a different
analysis that made the SAT look more useful. He asked that we
wait until ETS conducts its own study before taking action.
Summary of the results
The clearest summary is the following table, from our paper:
Table 2
Mean CUM as a function of SAT (here, the sum of both tests) and
PI2 (sum of CLR and ACH). Cells with fewer than 15 students are
omitted. The label of each row and column indicates the interval
from five below to four above.
                          PI2
         105    115    125    135    145    155
        -----  -----  -----  -----  -----  -----
SAT 105  2.64   2.88   3.05
    115  2.75   2.84   3.07   3.27
    125  2.66   2.91   3.12   3.22   3.46
    135         2.87   3.10   3.25   3.45
    145                3.18   3.27   3.43   3.69
    155                              3.28
PI2 is, of course, what we are recommending as the new predictive
index, an equally weighted combination of CLR and ACH. Looking
across each of the rows, it is apparent that CUM (cumulative GPA
for four years) increases as PI2 increases. Looking down each
column, it is apparent that CUM does not increase as SAT
increases, once PI2 is known.
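A table of this kind can be computed as follows. This is an
illustrative sketch in modern code, not the paper's actual
procedure; the function name and the data layout are invented, but
the 15-student cutoff and the bin labels follow the table caption.

```python
from collections import defaultdict

def cell_means(records, min_n=15):
    """records: (sat, pi2, cum) triples.
    Returns {(sat_label, pi2_label): mean CUM}, omitting cells
    with fewer than min_n students."""
    cells = defaultdict(list)
    for sat, pi2, cum in records:
        # A label of, say, 125 covers the interval 120-129
        # ("from five below to four above" the label).
        key = ((sat // 10) * 10 + 5, (pi2 // 10) * 10 + 5)
        cells[key].append(cum)
    return {k: sum(v) / len(v) for k, v in cells.items()
            if len(v) >= min_n}
```

Reading mean CUM across a row (PI2 varying) versus down a column
(SAT varying) is then the informal equivalent of asking whether each
predictor adds anything once the other is held fixed.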
Of course, SAT is correlated with PI2. So SAT by itself does
predict grades. The trouble is that it is redundant. In
particular, it is redundant with ACH. These results are
summarized in two other tables from our paper. These tables
report coefficients on a scale on which 0 represents no
relationship and 1 represents a perfect relationship. ``Raw''
correlations are based on one component at a time. ``Regression
weights'' (and ``logistic weight'') represent the strength of the
relationship after the other predictors are taken into account.
R-squared represents the total predictive power of a group of
components. Here are the two tables:
Table 1
Predictive power of SAT, ACH, and CLR. Regression weights are
standardized except for the prediction of status, which is from a
logistic regression. ``Grade'' is the negative of the number of
incompletes or withdrawals. Asterisk indicates statistical
significance at the .05 level; typically, the actual levels were much lower.
Measure SAT ACH CLR
Raw correlation with CUM .199* .261* .305*
first year only .264* .328* .344*
Delaware (first year) .313* .366* .409*
Regression weight for CUM -.013 .220* .266*
first year only .015 .262* .291*
Delaware (first year) -.001 .214* .326*
Logistic weight for status .020 -.005 .039*
Regression weight for grade -.032 .018 .126*
Table 3
R-squared and standardized regression weight (correlations for
single variables) for each combination of predictors.
Regression weights
Predictors SAT ACH CLR R-squared
SAT .199 - - .040
ACH - .261 - .068
CLR - - .305 .093
SAT & ACH .019 .247 - .068
SAT & CLR .145 - .277 .113
ACH & CLR - .211 .265 .136
SAT & ACH & CLR -.013 .220 .266 .136
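What "redundant" means can be seen in a small numerical
illustration. This is a simulation with invented numbers, not our
data: if grades depend only on what ACH measures, and the SAT
merely echoes ACH with extra noise, then the SAT's raw correlation
with grades is sizable while its regression weight, computed with
ACH also in the equation, is near zero.

```python
import random

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
ach = [random.gauss(0, 1) for _ in range(5000)]
sat = [a + random.gauss(0, 1) for a in ach]   # SAT echoes ACH plus noise
cum = [a + random.gauss(0, 1) for a in ach]   # grades driven by ACH alone

r_sc, r_ac, r_sa = corr(sat, cum), corr(ach, cum), corr(sat, ach)
# Standardized weights for the regression of CUM on SAT and ACH together:
beta_sat = (r_sc - r_ac * r_sa) / (1 - r_sa ** 2)
beta_ach = (r_ac - r_sc * r_sa) / (1 - r_sa ** 2)
# r_sc (the raw correlation) is sizable, but beta_sat, the SAT's
# weight once ACH is in the equation, collapses toward zero.
```

This is the pattern of Table 3's bottom rows: adding SAT to ACH and
CLR leaves R-squared unchanged while the SAT's weight goes to zero.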
The results from Delaware suggest that the problem is not an
artifact of Penn's high selectivity; Delaware is demonstrably much
less selective. We also found that these results held for most
classes individually, not just for overall CUM. There were
exceptions, however: the SAT did contribute to prediction in
Wharton classes and in some other fields, but there were too few
such classes to make the SAT useful for predicting overall CUM.
One important point is that these results are based on a sample
that excludes underrepresented minorities (Blacks and Hispanics).
The results for these minorities are similar, although samples
are small and the results vary from year to year. (Ramist
claimed that the SAT is useful for Blacks, but in the two other
years examined, I have not found this, and, in fact, there is no
evidence whatsoever that the predictors for minority and
non-minority students are different. This is a general finding
in the testing literature, which we replicated in our study. I
believe that Ramist's result is a fluke. In some cases, I have
found that the SAT is significantly NEGATIVELY correlated with
grades.) If, however, we include the whole sample without taking
account of minority or non-minority status, then SAT is somewhat
useful. This is because SAT is lower for minorities and
minorities get lower grades. This was the result that Ramist
claimed was evidence for the usefulness of the SAT.
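The pooling effect just described can also be illustrated with
invented numbers (a simulation, not our data): within each of two
groups the SAT is unrelated to grades, yet pooling groups that
differ in both their SAT means and their grade means produces a
positive correlation.

```python
import random

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)
# Within each group, SAT and CUM are independent draws: no relationship.
group_a = [(random.gauss(1200, 80), random.gauss(3.2, 0.3))
           for _ in range(2000)]
group_b = [(random.gauss(1000, 80), random.gauss(2.9, 0.3))
           for _ in range(2000)]

r_a = corr([s for s, _ in group_a], [c for _, c in group_a])
r_b = corr([s for s, _ in group_b], [c for _, c in group_b])
pooled = group_a + group_b
r_pooled = corr([s for s, _ in pooled], [c for _, c in pooled])
# r_a and r_b hover near zero; r_pooled is clearly positive, purely
# because the group with lower SAT scores also has lower grades.
```

A positive pooled correlation of this kind says nothing about
whether the test predicts grades for any individual student.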
Note also that the SAT is useless for predicting attrition or
withdrawals from classes, although CLR is useful here.
Reasons for taking action now
1. The fact that the SAT is useful for weeding out
underrepresented minorities is not a reason to use it. Even if
we wanted to give up our affirmative action policy, the reason
for the poor average performance of minority students is not at all
clear. In particular, it is not clear how much of the problem
results from things that happen after students are here, as
opposed to the affirmative action policy itself. A thorough
study of this problem should be conducted. In the meantime,
giving up on affirmative action is not an option that we face.
2. Even if ETS does a study, we are going to believe our internal
results rather than the results from other colleges. Colleges
are likely to differ. My hunch is that they will not differ
much, but if they did, so what?
3. Giving up a useless test is ethical. The ``Statement of
principles of good practice'' (1989) of the National Association
of College Admissions Counselors says, ``College and University
Members agree that they will ... use test scores and related data
discreetly and for purposes that are appropriate and validated
[and] conduct institutional research to inquire into the most
appropriate use of tests for admission decisions.'' The College
Board's ``Guidelines on the uses of College Board test scores and
related data'' says, ``When College Board tests are used for
admissions purposes, the responsible officials and selection
committee members should ... validate data used in the selection
process regularly ... to ensure their continuing relevance.'' The
``Code of fair testing practices in education'' (1988) of the
American Psychological Association's Joint Committee on Testing
Practices says, ``Test users should select tests that meet the
purpose for which they are to be used and that are appropriate
for the intended test-taking populations,'' and ``Test users should
... obtain evidence to help show that the test is meeting its
intended purpose.''
These stipulations would be empty if no action were then taken to
remove a useless test from a predictive index. More generally,
lack of incremental validity of just the sort we found has been
used as a basis for civil suits concerning employment decisions.
No such suits, to my knowledge, have concerned college
admissions, but the principle seems to be a good one. A person
denied admission on the basis of a useless test would seem to
have legitimate grounds for complaint. By not acting to change
the PI, we are perpetuating a system that is unfair in this way.
4. College admissions criteria have major effects on high-school
education. College-bound high-school students do what they think
will help them get into a good college. Students now spend a
considerable amount of time preparing for the SAT. (When my son
was taught how to take multiple-choice exams in Kindergarten,
when he was in a group of children who could already read, I
complained that this was an inappropriate activity, and I was
told that it's never too early to start preparing for the SAT!)
If they were told that the SAT was not important but their grades
and their achievement test scores WERE important, they might
spend more time trying to learn something and less time trying to
learn how to appear to be intelligent on a test. This might be
reason enough to drop the SAT, even if it were somewhat useful
for prediction.
5. The PI is used for purposes other than admission. It is,
for example, used to decide on the designation of Benjamin
Franklin Scholars and to decide on financial aid for foreign
students. We compete with many other excellent institutions for
these top students. By using a slightly different PI than our
competitors, but one that is just as valid, we could end up
giving these kinds of special offers to some students who do NOT
get identical offers from our competitors but who are excellent
students nonetheless.
Political aspects
Given the fourth reason, it would be futile to change the PI and
not make it public that we have done so. Clearly, such an
announcement can be handled well or badly, and it would require
considerable thought. It is perhaps the difficulty of planning
this that has made the Provost and the Dean of Admissions so
reluctant to do anything. (The latest sign of their reluctance
is that they did not even inform me of the availability of the
data that they gave to Ramist.)
It is clear that such an announcement could have detrimental
effects. An advantage of waiting is that, once our article is
published [as it was], other Ivy League institutions might
undertake similar studies and might be willing to make some sort
of joint move [which didn't happen]. On the other side, waiting
for someone else to move could take years. These studies are not
so easy to do without resources, and resources are in short
supply. (We had no help except in obtaining the data.)
I believe that an announcement could also have positive effects.
It could be used to gain national publicity for the fact that our
admissions standards have in fact been going up. We could
emphasize that we expect them to continue to go up. We should
also emphasize that we are doing this as a matter of intellectual
honesty, which is what universities are about. In these times of
doubt about the very integrity of universities, that could
probably help too.
In sum, although I think that the publicity has to be handled
very carefully, with full consultation of all relevant parties, I
do think that it can be handled so that it is positive rather
than negative.