Howard Wainer’s most recent book, Uneducated Guesses, is both a challenge to education policymakers and a warning to the country about the misguided policies that shape our nation’s educational system. Wainer uses statistical evidence to uncover the problems that threaten education in the United States in a book that is both accessible and eye-opening for any reader. We recently posed some questions to Professor Wainer and are thrilled to post this dialogue about various issues he addresses in his book.
PUP: You discuss a lot of issues surrounding college and university admissions in Uneducated Guesses, one of which is the current trend of not requiring the SAT. Do you think that more schools will follow suit?
Professor Howard Wainer: I hope not. Right now there are powerful forces pushing some schools to abandon admission tests. One of the most insidious is how making such tests optional artificially boosts a school’s US News & World Report ranking. I hope that exposing such strategies will help to stifle such policies.
PUP: I always thought the rankings were done without room to really “cheat”. How do optional SAT admissions allow schools to game the rankings?
HW: If they make an admission test optional, applicants will behave sensibly. If their scores are lower than is typical for the school to which they are applying, they are unlikely to submit them. Therefore the average SAT score, calculated from those who do submit, will be higher than the true, but unknown, all-student average. Thus schools that make the SAT mandatory are placed at a competitive disadvantage. Suppose one school’s average SAT score (an important component of the US News & World Report rankings) included all attending students, whereas another school’s included only the top half? It doesn’t make for fair comparisons.
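Wainer’s point here is ordinary selection bias, and a small simulation makes it concrete. A minimal sketch in Python, with entirely hypothetical numbers (the 1200-point typical score, the 150-point spread, the sample size, and the submit-only-if-at-or-above rule are all made up for illustration):

```python
import random

random.seed(0)

# 1,000 hypothetical applicants; scores centered on the school's
# typical admitted score of 1200 (all numbers invented for illustration).
scores = [random.gauss(1200, 150) for _ in range(1000)]
true_mean = sum(scores) / len(scores)

# Under an optional-SAT policy, suppose applicants submit a score
# only when it is at or above the school's typical score.
submitted = [s for s in scores if s >= 1200]
reported_mean = sum(submitted) / len(submitted)

print(round(true_mean))      # near 1200: the true all-student average
print(round(reported_mean))  # well above 1200: the average the school reports
```

The reported average exceeds the true one simply because the lower half of the distribution never shows up in the calculation, which is exactly why a school that requires scores from everyone looks worse by comparison.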
PUP: So then, how well designed are college rankings and how much do they mean?
HW: I think that the rankings generated by US News & World Report are a sensible way to begin. They choose a set of variables that are positively related to the vague concept ‘quality’, rank schools on each of these variables, and then add up the ranks. The key questions are: (i) are the variables all pointed in the right direction? (ii) are any important variables missing? (iii) are any included variables unrelated to ‘quality’? If these bases are covered, there is a theorem (stated and proved by Princeton statistician Sam Wilks 75 years ago) that tells us that this procedure will work. Consumers of such an index must be worried about two things: first, the extent to which the variables used can be gamed, and second, whether they are interpreting the rankings too finely.
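The rank-and-sum procedure Wainer describes can be sketched in a few lines of Python. The schools and variables below are invented, and each variable is assumed to already point in the “higher is better” direction (his condition (i)):

```python
# Made-up schools with made-up quality-related variables,
# each oriented so that a higher value is better.
schools = {
    "A": {"grad_rate": 0.95, "avg_sat": 1400, "faculty_per_student": 0.12},
    "B": {"grad_rate": 0.85, "avg_sat": 1300, "faculty_per_student": 0.10},
    "C": {"grad_rate": 0.90, "avg_sat": 1350, "faculty_per_student": 0.08},
}

def rank_on(variable):
    # Rank the schools on a single variable, best (highest value) = rank 1.
    ordered = sorted(schools, key=lambda s: schools[s][variable], reverse=True)
    return {school: rank for rank, school in enumerate(ordered, start=1)}

variables = ["grad_rate", "avg_sat", "faculty_per_student"]
rank_sums = {s: sum(rank_on(v)[s] for v in variables) for s in schools}

# A lower rank sum means a better composite ranking.
for school in sorted(rank_sums, key=rank_sums.get):
    print(school, rank_sums[school])  # A 3, then C 7, then B 8
```

Note that the sketch would silently mislead if a variable pointed the wrong way (say, a student-faculty ratio where lower is better), which is precisely the first of the three conditions above.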
PUP: Jumping to another topic you discuss extensively in the book: testing. You note that tests in which examinees are allowed to choose which questions they answer are problematic. Why is this?
HW: Because they are not fair. It is insuperably difficult to write test questions that are of equal difficulty. If examinees choose unwisely they will get lower scores than others who choose to answer easier questions. When we build tests with choice we exacerbate group differences. In one test I looked at, women, for whatever reason, seemed to systematically choose harder questions, and thus obtained lower scores, than comparable men who chose the easier options.
PUP: Your book makes it clear that you believe that essays on large-scale standardized tests also pose special problems: they are time consuming for examinees, expensive to score, and yield less reliable scores than multiple-choice exams. Why do you think the College Board opted to add a writing section to the SAT?
HW: I don’t know. That test was added to the SAT after I left Educational Testing Service (ETS), so I was not privy to the discussions involved in its genesis. But I can guess. It is well known that if a topic is tested its likelihood of being taught increases. I suspect that the College Board wanted to emphasize the importance of developing skills in writing clear prose and that they probably figured that if there was a separate test of writing, schools would be more likely to emphasize its instruction.
PUP: Another takeaway for me was the sheer number and variety of tests that students take. You have a fascinating chapter on AP courses and tests, so let me start by asking: why do you think so many high schools have such a high failure rate among their students who take AP exams?
HW: AP courses have a well-deserved reputation for rigor and quality among both parents and educators. Hence there is pressure on schools to offer as many AP courses as possible and to enroll as many of their students in them as they can. Unfortunately, not all students are prepared for such courses. Nevertheless, too often such students are allowed to take the courses, perhaps because school officials yield to parental pressure, perhaps because schools are judged by the size of their AP enrollments, or both.
When this happens, AP teachers are placed in a difficult position. They must choose between teaching the course in a way that covers the material necessary to pass the exam or using up as much class time as necessary on remedial material. If they choose the latter, they cheat the students who are prepared for the course; if they choose the former, they leave some students hopelessly befuddled. Faced with this Scylla and Charybdis, the only sure outcome is unhappiness all around. Schools that screen students carefully before allowing them into AP courses are the only ones that make full use of the considerable resources required to teach advanced courses.
PUP: And now to one of the hot-button issues in contemporary education policy: Value-Added Models of teacher evaluation. In the book you provide real evidence that this system simply doesn’t work as it stands now. Can you explain this a bit further?
HW: Evaluating a professional’s competence is a task with a long and rocky history. Many people have worked on this in the past, and many more are working on it right now. Why don’t we learn from what is done with other professionals? How are lawyers or doctors evaluated? I suspect that if patient outcome were the principal datum in physician evaluation, we would see many more dermatologists and far fewer oncologists. While it is obvious in medicine that the success of a physician depends crucially on the overall health of the patient, proponents of Value-Added Models seem to believe that a teacher can be evaluated without regard to the students’ initial ability. There is a mistaken belief that somehow the magic of statistics can make equal things that are not. It can’t.
PUP: You speak highly of computer adaptive testing in Uneducated Guesses. What is that, exactly?
HW: To be efficient, a test should be aimed at the ability of the examinee. It makes no sense to ask calculus questions of 3rd graders. Mass-administered tests must include questions that span the range of ability of the examinees. This means that there will be questions that are inappropriately difficult for some people and inappropriately easy for others. A computer adaptive test, a CAT, fixes this. It asks the examinee a question of middling difficulty. If the examinee gets it right, it asks a more difficult one. If the examinee gets it wrong, it asks an easier one. In this way it can quickly zero in on the appropriate level. In practice we have found that a CAT needs only about half as many questions as a paper-and-pencil test to arrive at a score of comparable accuracy. For some purposes this is a significant saving.
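The adapt-up, adapt-down loop Wainer describes is essentially a search over difficulty levels. A minimal sketch, under a deliberately simplified and hypothetical model in which items sit at integer difficulties 1 through 10 and an examinee answers correctly exactly when an item’s difficulty does not exceed their true ability:

```python
def answers_correctly(ability, difficulty):
    # Toy response model: deterministic, no guessing and no careless slips.
    return difficulty <= ability

def adaptive_test(ability):
    low, high, asked, estimate = 1, 10, 0, 0
    while low <= high:
        difficulty = (low + high) // 2          # an item of middling difficulty
        asked += 1
        if answers_correctly(ability, difficulty):
            estimate = difficulty               # right answer: try a harder item
            low = difficulty + 1
        else:
            high = difficulty - 1               # wrong answer: try an easier item
    return estimate, asked

score, items_used = adaptive_test(ability=7)
print(score, items_used)  # prints: 7 4
```

Four items recover the ability level here, where a fixed test spanning all ten difficulty levels would use all ten, which echoes Wainer’s observation that a CAT reaches comparable accuracy with roughly half the questions. A real CAT replaces this deterministic rule with a probabilistic item response model and a pool of pre-calibrated items.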
PUP: Computer adaptive tests seem like the way to go, so why do you think institutions are not adopting the method?
HW: They are expensive to implement because they require a large pool of items that have all been pre-calibrated. Such a system can be built by a large testing organization (e.g. ETS, the US military, the College Board, ACT), but it is nigh onto impossible for a classroom teacher.
PUP: What do you believe is the largest misconception about the school system in the United States?
HW: That it can work miracles without the full cooperation of parents and without a lot of money.
To discuss this we must use the right words. We must distinguish between education and schooling. The former takes place 24 hours a day and for most of that time is determined by the home, the community, the church, and the school. The latter takes place for six hours a day, five days a week, thirty weeks a year. To have an effective education system, all of the components must work together toward common goals. To leave it to teachers whose schools are too often so short of resources that they, the teachers, end up having to buy classroom supplies from their own funds, is a recipe for failure.