So, I think it’s worth stressing that what makes a school good depends on your definition. How likely it is to get a low-income kid to attend or graduate college is certainly one important measure for national policymakers to consider, but it is obviously not what upper-middle-class parents (for whom their kids' college graduation is almost a given) are looking for when they seek a good school.

It would be a mistake, however, to conclude that school choice ought not to matter for those parents. Teaching at an elite college, I’m struck every day by how radically school preparation affects the outcomes of my students. Presumably it also affects their quality of life while in college, and both the college experience and GPA have some medium- and long-term consequences.

Finally, we ought to remember that good schools should be about more than just academics. The social skills you develop, and hopefully the friends you make, can literally change your life too. These aspects should not be neglected in public policy either (socialization is a key function of education), although I’m not sure exactly how we ought to measure them.


I read all of this and I still can’t figure out what makes a “good” school “good” other than it has “good” outcomes, which seems… circular?

Like, what are these schools doing to create these outcomes that other schools aren’t? Is any of it observable? Is any of it replicable at schools that aren’t doing those things?


I could not agree with this article more. I currently teach at a somewhat high-performing charter. The school serves a population that is overwhelmingly poor and from underrepresented minority communities. Demographically similar schools in our area usually score around a 7% pass rate on statewide assessments, whereas 15% of our students passed.

The differences are not massive, but they are real. While it's likely some of the effect comes from direct and indirect selection, there are a thousand things the school does to help our students succeed. Many of the biggest differences I have seen between my current school and the lower-performing schools I've previously taught at aren't measured in tests at all.

I suspect my school is reducing the gap between our students and their more privileged peers by about 10%. The school is far from a silver bullet, but that doesn't mean high quality education can't make real gains.


There’s no way school matters; I have 23 years of schooling and I’m dumb as shit


The reform-pessimistic position is that *education is not an effective instrument at reducing social inequality*, not "lol, nothing matters." Education is tremendously important for its ability to deliver absolute gains at a population level and help students achieve their potential, while providing high-quality daycare and socialization opportunities.

But the pessimists observe that every "once-promising idea" has turned out to be a statistical mirage or an epiphenomenon, just as you yourself observe in this article, despite stowing the information away in a footnote! After touting how a recent paper shows that "achievement scores are rising, and gaps are closing across income and racial line", you acknowledge:

> The data for these three papers ends in 2015, 2015, and 2017, respectively. Achievement scores have fallen since then, particularly for lower-performing students and especially in the wake of COVID-19. But I’d attribute those declines, in part, to the “education reform nihilism” movement that was already taking root by then.

To put it politely, I am skeptical that it makes sense to attribute declines in scores over such a short time frame to the malign influence of a dissident minority within the field. It seems more likely that the gains were actually measurement errors, p-hacking apparitions, and/or confounded.



Every couple days for a while now I’ve read an anecdote from a teacher that suggests that conditions in public schools across the country have basically collapsed? The kids can’t stop looking at their phones, half of them don’t know how to read, can’t be held back, teachers getting assaulted and the kid is back in the classroom the next day, etc. This seems like a huge story?


Another long one from me. Here's the gist: the poor quality of tests used to measure student growth is important and, I think, overlooked. I mentioned in a comment on one of Matt's recent education posts that these tests suffer from both low reliability and low validity. The tests are, in short, poor-quality instruments for informing teacher accountability policies or determining school quality. We need better tests if we want these policies to be meaningful in any way.

The tests are not reliable because they are not reproduced under the same conditions. States change test constructs from year to year, often in ways major enough to make it a completely different test: adding and removing entire sections, changing the number of multiple-choice options per question, lengthening or shortening writing portions. Yet these tests are used to determine student performance, both to compare cohorts of test-takers in a grade across years and as growth measures following a kid's scores up through the grades. Some states even flat-out say that their tests measure "new learning" and are not meant to be compared across school years, but then also have policies where those same tests are used for school-quality purposes (New York, for example).

Furthermore, student populations fluctuate demographically, schools are rezoned to include different student populations, and tested student bodies vary in meaningful ways due to attendance issues. Texas and Florida are busing asylum seekers, often women and children, to New York and DC. Those kids are being enrolled in local schools. They take the tests, fail, and make the schools look worse. Is it any wonder that the schools are trying to block these students' enrollment?

In an example from my personal experience: when our state accountability testing became computerized, the school lacked enough computers to test all the kids at once. The last kids to take the test were students with special needs. Because we had so few computers, they were tested days after the official end of the testing window laid out by the state. This meant that the scores for the SPED students at my school were not initially included in the headline data and ended up in an amended report released later that year. All the attention (and accountability) went to the first, incomplete score report, making the school look significantly better than it was. These tests are not reliable. Also, the tests were only offered in English, and any kid who didn't speak English was still expected to take the test. In English. But that's probably more of a validity issue.

Likewise, these tests have low validity. Or, probably more accurately on my part, these tests are being used in ways that go beyond what they actually measure. The most famous example here is teacher accountability through measuring their "value added." The idea is that if we test a class at, say, the end of fourth grade and again at the end of fifth grade, we will know how much more that class of kids has learned (or can do) as a result of that fifth-grade teacher. But there's a catch. What ends up happening a lot of the time is that there isn't really a value-added measure taking place. Instead, we end up with the less useful "student growth measure," which does not adequately control for student demographics or for changes in the student population of that teacher's class (see above). States think that by simply reporting the average growth of various sub-groups (African American, free-and-reduced-lunch, ESOL, etc.) they are somehow controlling for those differences. But VAM requires proper statistical controls that are often not put in place by the districts or states collating the data. The validity problem emerges because these tests are probably measuring out-of-school factors just as much as they are measuring in-school factors.
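To make that distinction concrete, here's a minimal simulated sketch (all numbers invented for illustration, not drawn from any real state's data) of how a raw "student growth measure" can penalize a teacher for her class's demographics, while a regression that controls for those demographics does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical setup: two teachers with identical true effects, but
# teacher B's class has more students facing an out-of-school
# disadvantage that depresses score growth.
teacher = rng.integers(0, 2, n)                      # 0 = A, 1 = B
disadvantage = rng.binomial(1, 0.2 + 0.5 * teacher)  # 20% of A's class, 70% of B's
true_teacher_effect = 5.0                            # identical for both teachers
pre = rng.normal(50, 10, n)
post = pre + true_teacher_effect - 8.0 * disadvantage + rng.normal(0, 5, n)
growth = post - pre

# Naive "student growth measure": average growth by teacher, no controls.
naive_A = growth[teacher == 0].mean()
naive_B = growth[teacher == 1].mean()

# A minimal value-added-style model: regress growth on a teacher
# indicator *and* the demographic covariate.
X = np.column_stack([np.ones(n), teacher, disadvantage])
coef, *_ = np.linalg.lstsq(X, growth, rcond=None)

print(f"naive growth gap (B - A): {naive_B - naive_A:+.2f}")   # large and negative
print(f"teacher coefficient with controls: {coef[1]:+.2f}")    # near zero
```

The uncontrolled comparison makes teacher B look several points worse despite identical true effects; the controlled coefficient correctly attributes the gap to the out-of-school factor. Real VAM specifications are far more elaborate, but the failure mode when controls are skipped is the same.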

Additionally, the way these tests are used does not differentiate between the impact of multiple teachers. If a student receives strong instruction from her social studies teacher or her science teacher, this can positively impact her scores on ELA and math. I always struggled in math, but after taking a physics class, my algebra skills improved dramatically and my math scores in the latter half of high school were way better. The ongoing Early Childhood Longitudinal Study, for example, finds that students who receive more social studies instruction end up with higher reading scores (interestingly, it results in higher math scores for boys but not for girls). A test that cannot tell us with any degree of precision which teachers contributed to a student's growth is not a valid test when used for teacher accountability purposes.

Likewise, Aldeman points to attendance and shows us a graph saying that, on average, *35%* of students were chronically absent in the 2016-17 school year (holy shit, that's high!). This means they missed upwards of 10% of school days that year, roughly 18-20 days (and that's the bottom cutoff; many kids missed more). We also see in that graph that chronic absenteeism is higher among African American and Hispanic/Latino populations, and we know those students tend to be concentrated in schools that are predominantly students of color, which have higher rates of chronic absenteeism. When those kids are there on test day, they take the test, they fail, and we make a note that their teacher did not add value (though what's really being recorded is uncorrected student growth). But is that what the test's results are showing? Is the test showing us a shitty teacher, or a kid who didn't show up for a tenth of the year? Or, maybe, is it the teacher's responsibility to ensure student attendance so that she can proceed to add value? There is not enough work being done in state departments of education to make sure that tests properly measure what they are meant to measure.

Compare all of this to, say, NAEP, the SAT, or PISA, where tests are deployed after sometimes years of construct evaluation, with high reliability coefficients (good luck finding any states that even bother to measure that!) and careful work to ensure representative populations take the tests year after year. The accountability tests used by states to make determinations about student growth, teacher performance, and school quality are, frankly, awful. If we're going to write ed policy, we need to ensure the mechanisms used to inform and enact those policies are effective.


“Weapons of Math Destruction” had some insightful counter-claims about measuring teachers against expectations. I think the summary was that the standard deviation for a single teacher in a single year is so high that the statistical evaluation mechanism against expectations is almost useless, and that the people in DC who came up with the idea caused a lot of damage and ill will. Would be curious to hear that point grappled with / countered as part of this series.
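For intuition on that variance claim, here's a toy simulation (all numbers are made up for illustration, not taken from the book): suppose a teacher's true value-added is 2 score points, but individual student growth is noisy with a standard deviation of 10. With a class of about 25 students, the standard error of a single year's class-average estimate is about 2 points, as large as the signal itself, so the same teacher's measured rating can swing wildly from year to year:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical numbers: the true teacher effect is 2 points, while
# individual student growth noise has sd 10. One year's estimate is
# the mean over a single class of ~25 students.
true_effect = 2.0        # this teacher's true value-added, in score points
class_size = 25
student_sd = 10.0
years = 10

estimates = np.array([
    (true_effect + rng.normal(0, student_sd, class_size)).mean()
    for _ in range(years)
])
se = student_sd / np.sqrt(class_size)   # sampling error of one year's estimate

print("year-by-year estimates:", np.round(estimates, 1))
print(f"standard error of a single year's estimate: {se:.1f}")
```

With a standard error this large relative to real between-teacher differences, the same teacher plausibly rates "genius" one year and "awful" the next on noise alone.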


We have a big challenge measuring outcomes of school quality because so much depends on going to college. If schools incrementally improve the college attendance rate of their students, they look good. What about the kids who were already going to college, or the ones who probably weren't going to go anyway? All of these studies should look at college-educated outcomes, non-college outcomes, and the shift between these two groups separately, and we generally need a better mental model of good education. Encouraging and enabling more kids to go to college is great, but we risk focusing on that too much.

To get on my soapbox for a minute, our model of college needs to change too. A lot of kids are impatient to get into the real world and do things that have a real impact. College just continues the "sandbox" aspect of formal education, which I think is often pretty demotivating. I bet that results in a lot of kids writing off college as not for them, or showing up and then floating along for a few years with a sense of alienation.

It would be interesting to see more new models, like say work-based college. Let's say you want to be a graphic designer. You do a 6 month bootcamp to get basic skills, then you start a half-time entry level job. The other half of the time, you have classes to fill in your professional skillset and perhaps to teach general education requirements. The education connects to your real work as well as rewards like raises and promotions. If it turns out that you don't like graphic design, you switch jobs during college rather than waiting until your twenties to test the job market. By the time you graduate, you already have X years of work experience and are more likely doing something you enjoy.


It's crucial to evaluate the impact of spending on schools vs. other spending priorities, particularly when it comes to using education as a poverty fighting tool.

I read FDB's post and my conclusion was that the impact of incremental education spending needs to be compared to spending on programs like Child Tax Credits.

I suspect money directly in the hands of parents is more impactful. We did have a one year national experiment in this regard in 2021 when there was a significant federal CTC and childhood poverty went down by about a third.

My second ever post, below, was about the demise of this program: "The Cruel, Untimely, and Much Too Quiet Death of the Expanded Federal Child Tax Credit" (the post is under 1,000 words).



To what extent is the 0.05 standard deviation improvement per decade real? I want to see big improvements, not something only a statistician can peer at and see a difference. I conclude covid vaccines work and masks didn’t because I can look at a graph.
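For scale, assuming roughly normal score distributions, a back-of-the-envelope calculation shows what a 0.05 standard deviation gain actually means: the median student moves from the 50th to about the 52nd percentile of the old distribution, which is indeed hard to see on a graph:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# A 0.05-standard-deviation gain moves the median student from the
# 50th percentile to roughly this percentile of the old distribution:
new_percentile = 100 * normal_cdf(0.05)
print(f"{new_percentile:.1f}th percentile")   # ≈ 52.0
```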


The Harvard study mentioned is plausible, but I did not understand how they identified school “quality” independent of the observed outcomes.

“But it’s still hard to convince the public that school quality should be defined based on outcomes and not all of the surface-level inputs.”

? Isn’t the real difficulty that “quality” is popularly measured by outcomes, not improvement in outcomes?


Was gonna reach for that "like" button, the first I've ever granted to a non-Milan guest post (I think), but...then you did the same thing Matt does, mischaracterizing FdB's stance based on what seems to be an overly literal, surface-level read. This runs both ways, with Freddie continually dunking on Matt's YIMBYism...based solely on his tweets, seemingly. Not sure where this historical beef comes from, but as someone who subscribes to both blogs, it's very weird.

The article in question: https://freddiedeboer.substack.com/p/education-commentary-is-dominated

The larger meat behind that bone: https://freddiedeboer.substack.com/p/education-doesnt-work-20

...building on FdB's first book, "The Cult of Smart", available at your friendly local bookstore.

Obviously no one's gonna wade through all that just on my say-so; I'll try and summarize one last time: Freddie doesn't claim schools "don't work" writ large, actually he goes to great pains showing that of course there's absolute gains from school, and yes a lot of that came from successful reform. I notice that you've grokked his distinction between absolute and relative gains, which is one step in the right direction. But even here, the argument is not that disparate relative gains ("gaps") never change - as you point out, research is quite clear on this, and anyway it follows naturally from absolute values changing. It'd be a very strange situation if absolute shifts just happened to coincidentally line up along the current relative-gap lines.

Here's the rub though: even though we can move the needle here and there, sometimes even in meaningfully big ways, the bulk of those gaps will always come from outside school. Like all other measures, school reflects our society's inequities back at us, so at some point one must change society. These are exogenous factors which school can't do much to change. (Wraparound services attempt to ameliorate somewhat, and are even sometimes kinda successful, e.g. school lunch, but there are plenty of reasons to be wary of "mission creep" here. Iron Law of Institutions, Law of Comparative Advantage, credentialism, etc.) Given this plateau, we ought to be really clear about what school can and cannot do...that is, think a bit like an EA, and realize there are probably better ways to improve those long-term outcomes than indirectly via school's diminishing returns. Get everyone through high school, sure; get everyone through college? Hold up.

That was the point of the optimism-bias post, that "school" has become this way to launder hopes and dreams, a proxy battle that sucks energy out of larger debates. (School lunches work, but wouldn't be necessary if we actually solved hunger in America, which is totally within our power and yet we don't. Incremental gains are great, but don't forget to think bigger!) Ultimately, focusing on improving this correlate to long-term outcomes is a type of Goodharting*. Which isn't to say it doesn't have its own unique and intrinsic benefits - I would not want every kid to be "unschooled", and it's a load-bearing part of the current compromise on welfare (e.g. school-as-childcare, freeing up parents for more work hours). The question is, what is schooling for, what role do we want it to play in society? Because we're really muddled on that direction right now, to everyone's detriment.

I think I'll sit out of future education-related posts if they do the same thing. Too much talking-past, which is unfortunate cause I don't see the positions as contradictory. Not SB's strongest topic.

*And of course this also leads to Goodharting the Goodharting: https://educationrealist.wordpress.com/2021/09/18/false-positives/


How is it determined what a good school is? That seems like exactly the sort of extremely difficult thing to measure that pops up all the time in social science, and instead of acknowledging that it's difficult, "experts" just take bad metrics and pretend that they're good. Is that what's happening here?


I would be fine with school reform if it relied on clear, clean data from about K-6. After that, how can you score and pay a teacher based on value-add if their kids are taking an 8th-grade test and the kid in their room has been to six schools in their life and reads at a 3rd-grade level? If that teacher spends a ton of limited time focusing on building that kid's reading fluency, and even succeeds, it still wouldn't show on an 8th-grade test, because that test is still far beyond the student's reading level. These students are statistical noise because there's nothing you can do for them that shows up on a test. Happen to have fewer of them in one year? Then your scores look great.

Multiply this by 10 in high schools. I worked under value-add, and one year I was a genius AP teacher and the next I was awful. The truth is somewhere in between, but it's not measurable as long as kids who are extremely behind are included in the data.

I’ve come to the conclusion that those kids need one more burst of intensive catch up support in middle school, and then if they’re still behind, they go immediately to an alternative school and graduate in three years with a vocational degree.


This is maybe a nitpick, but...

"It turns out that schools had a big, long-term effect on students. Low-income students who attended a high school at the 80th percentile of quality were 6 percentage points more likely to earn a bachelor’s degree and earned 13% more money (or about $3,600) per year at age 30."

The linked article makes it clear that this increase is in comparison to students who attended a high school at the 20th percentile of quality. The effect is pretty small for going from the 20th to the 80th percentile.

And given that one of (if not the) main factor in school quality is the composition of the student body (and their parents), then there is going to be a hard limit on how many low-income students you can add to a given high-quality school before it is...no longer high-quality.

Additionally, the paper was limited to schools in Massachusetts. Which isn't inherently a problem, but my understanding is that MA is generally a significant outlier with respect to education. So I'm not sure how generalizable it is.
