And does better than you might think!
Wonderful essay, Maya. Yet another piece of evidence that Harvard has lax standards ;)
The answer is simple: have Texas Instruments release a simple LLM and then force students to buy that same largely unmodified LLM for the next 40 years at an inflated price.
My reaction is that the points about how weak ChatGPT was on analysis, substance, and argument are really important and make me optimistic about the role of humans. I think we have a weakness for well-written prose that could be overcome as that becomes abundant. The future is in being super-rational and focusing on the substance. Stop giving credit for well-written work, just as you wouldn't say "well this has no typos so I guess it must be right."
My blackpilled take on all this (as somebody with a PhD in a humanities field who TA’d classes at an elite but not quite Harvard-level university) is that “passing grades in humanities classes” even at elite universities is not— and may never have been— a good signal of genuine critical thinking, analytic writing, or textual research skill. ChatGPT can get passing grades because we give passing grades to a lot of intellectually empty fluff.
At the individual level, the institutional and interpersonal incentives to give almost everyone at least a C are very strong (actually failing students comes with both administrative overhead and the knowledge that your decision will probably make that student’s life materially worse). Departments are also reluctant to raise standards because of the possible negative effect on enrollments.
Ultimately, this dynamic is really bad for the long-term health of these academic fields. Low standards make it impossible for departments to credibly certify that their students actually have their field’s core skills— and that their best students are genuinely excellent rather than just baseline competent. This hurts the disciplines’ prestige and their graduates’ labor market prospects. It also attracts lower-quality students to the major, reducing the quality of classroom discussion and engagement— limiting opportunities for the strongest students to deeply hone their skills.
Unfortunately, faculty in these disciplines are reluctant to identify this issue as a serious collective action problem— and certainly aren’t taking steps to solve it. LLMs might provide some much-needed impetus to change, but I’m not holding my breath.
The in-person essay is the fix. A European style interview could also be fun.
First, terrific essay, Maya. Wow.
In-person proctored exams, potentially combined with PhD-type oral defense, can solve this looming problem in education. But that is gonna mean a VERY different university environment than the one we have now. More expensive and more exclusive, it won't be an education for the masses.
I'm less concerned about AI's effect on the "cerebral class" overall. Generating text is one part of those jobs (journalism mostly) but is not the major focus. Personal interactions, value judgements, crafting bespoke solutions and creatively helping your employer make money are all big parts of those careers and AI can't do those things. Yet.
My initial guess is that it will indeed lead to law school style exams across the humanities…but the future has a habit of throwing slurves at us.
(The reality of the internet is that very smart students could always get away with plagiarism by simply rewriting…it still saves time on writer’s block issues.)
As for law, I’m senior enough that my income depends upon making the hard decisions. I’m a bit sanguine about what AI really means for brief and contract or commercial instrument writing — no one writes those from scratch anyway. (The real reason WordPerfect 5.1 was the favorite legal word processor is that no one who has developed a word processor since then has understood how lawyers work.)
So, Maya, be honest - did you write this piece, or did ChatGPT?
As an engineering major a few years out from college, I think the humanities can learn a bit from STEM courses. Homework was important as a learning tool, but not for grades.
Profs would often give the answers for the problem sets ahead of time, but students had to come up with the work in between - I thought this was useful to de-emphasize the answers relative to the work for both students and instructors.
Instead, most of my upper-level grades came from intensive exams and extended group projects. I don't think the exams are necessarily relevant to work in industry, but they're a great way to assess what students know without much risk for BS. Proctoring was fairly light-touch owing to an honor code with good buy-in across the school.
For younger grades, especially grades 4-8, I'm worried. There is absolutely nothing stopping them from doing all their take-home written work via AI. Those kids are already more tech savvy than their parents, and can find a way to access anything that's available. The time, resources, and agility required for elementary and middle schools to prevent this themselves seem like they're not there, and parents broadly do not have the ability either (some will, but as a whole, nope). And these are crucial ages for writing skills that are life skills, not academic ones. I fear we'll have many students who come into that age range proficient in reading and writing and then slowly drift downward over time before it's obvious they've been using AI and need remedial help.
I think the question about higher ed is much less important than younger grades. For higher ed, so much depends on whether AI can break into the realm of consistently great analysis rather than the merely good writing it produces today (both possibilities seem plausible to me). If it can do that, we're talking about a major breakthrough, and the ability to measure college students' grades is small potatoes in comparison. If it can't break through, colleges will tweak assessment methods or raise standards and it'll be fine.
At Oxford you’d eventually have to sit down with your don and they’d tear your paper apart in front of you while expecting you to defend every choice - both content and style.
Hard to scale but maybe a path forward since humanities enrollment is declining anyway.
yeah yeah, but does it get docked on the "personality" test when trying to get in? 👀
Great first piece, Maya! A couple of thoughts from a grad student in the philosophy department:
As many have noted in the comments already, I think this is at least as indicative of the rampant grade inflation at Harvard as of chatGPT's abilities. But I wanted to explain a little more of *why* I think I've capitulated to the inflation and how it pushes back against your final thesis, that chatGPT's ability to do well on undergrad coursework indicates it will do well at the jobs that coursework is supposed to prepare you for.
So first off, I think it's a big deal that these were freshman courses. I've TFed two big intro philosophy classes, and I keep the knowledge in the back of my mind that most of these students are not going to major in philosophy. The biggest things I want them to take away from their essays are:
1) being able to write legible, specific, concise prose (chatGPT is full of the flowery imprecise language that flies in other humanities classrooms, but not in philosophy. I am constantly training my students out of bad habits here.)
2) having an interlocutor in mind and actually thinking carefully and generously about what it would take to convince them of a thesis they come into the paper disagreeing with. (As far as I've seen, chatGPT doesn't do very well on this more substantive front, and I kind of doubt it will get much better, because *most* writing you'll find in the world doesn't really do well on this front, even among philosophers.)
That's not to say chatGPT wouldn't do well on my freshman essays; it writes a *lot* like my freshmen at the beginning of the course. It would be suspicious to me if the writing contained the same flaws with no improvement towards the end of the semester, but I'd probably assume it was a lazy student who didn't bother to read my feedback, and give them a B+ at worst.
But here's the thing: I *know* most of my students aren't going to be philosophers, particularly the ones who don't bother to respond to my feedback. And it's just not worth tanking my teaching evaluations to give mediocre grades to mediocre students in a freshman course. On the other hand, if my juniors were turning in chatGPT-level work, I'd sit them down and have a serious chat, because presumably by that point they are seriously considering a career in philosophy, and that standard simply won't cut it.
So the tl;dr here, I think, is that what flies for a freshman essay is at least in my discipline not indicative of the ability to succeed in the major, *much less* in the field professionally. Maybe this is less true of fields where you have job options besides academia, but I think that means the humanities at least are relatively safe.
So this is an interesting article and it’s on a theme I’ve heard a lot about (academic cheating) but it’s totally perpendicular to my personal experience.
I live in Australia and recently graduated from a law degree (it’s undergraduate here). I tried out Chat GPT to see what it could do and it was almost entirely useless. There is no way it would’ve been able to get even a passing grade on any legal essay or take home exam I’ve done. It seems the need for highly specific citations for all claims, synthesis of highly complex ideas and the diversity of sources of law were well beyond what Chat GPT could do.
That being said, it's definitely improved over the last 18 months and maybe some day it'll be able to do it - but it is well, well off as of now.
Which is why I am very skeptical AI is going to be replacing very many lawyers in the near future (except maybe improving doc review automation?)
It seems like the key difference here is that most of these essays/take-homes seem very high-schooly to me. Again, this might be due to differences in how university works in different countries - but I've never received coursework that was this easy, even in first year. Is this typical?
My thought on writing papers is that, besides moving to in-class essays, professors in non-creative writing style classes could place a lot more emphasis on including footnotes/endnotes done in legal writing style -- i.e., every statement of fact needs to be "pin cited" to an exact page in the source, preferably with a quotation in the text or a parenthetical quotation after the cite -- and then, instead of focusing mostly on the body of the text, the professor primarily focuses on checking a random sample of the citations to make sure they actually say what they purport to say. This isn't very glamorous, but, given that ChatGPT 4 reportedly still has issues with "hallucinating" sources, it seems like it's still the best non-automated way to see if work product is human generated.
As a former (science) TA, I started thinking about grading humanities essays and shuddered. ChatGPT gets my rubber stamp B+.
Still spouts complete nonsense when you ask it a science question though.