Lessons from 23 months of ChatGPT in schools
Balancing Performance Gains with Learning Outcomes in the Age of ChatGPT
This article was fully written by me, a human. No AI was used in the writing of this article.
It’s been 23 months since ChatGPT was released. I was immediately a superfan and have remained one, closely following developments on an almost weekly basis. Like many academic leaders, I have had mixed feelings about it. On the one hand, it has excited me about the possibilities for bolstering education and the potential positive outcomes for students; on the other, it has deeply concerned me because of the potential for significant learning loss.
Early on, there was obviously no published peer-reviewed research to lean on for guidance on how to navigate the use of LLMs effectively. Instead, guidance took the form of educators collaboratively sharing their thoughts, discoveries, and best ideas. Organizations like ATLiS and NAIS hosted collaborative sharing sessions and forums to discuss frameworks, offered guidance on security and safety concerns, and ran innovative workshops. It’s an understatement to say that it’s been an exciting and wild ride over the last two years for those of us who have been deeply engaged in this work.
All of this collaboration has been incredibly helpful. It's as if an alien technology was dropped into our laps, and educators did what we do best: we engaged in learning and discovery. After 23 months, we not only have a better sense of what these tools are capable of and how to use them, but we also now have some concrete findings on their impact on student learning. What we have learned will help guide us into the next phase of our response to and engagement with generative AI.
In this edition of Fullstack Educator, I will synthesize recent findings into a helpful framework for thinking about generative AI in K-12 spaces and offer some advice on K-12 best practices moving forward. This is not a moment for schools to stick their heads in the sand: AI has fundamentally changed the world our students inhabit, and it would be a disservice to them not to engage AI thoughtfully, providing our students with tools to navigate this new world.
When it comes to K-12 education, there are two different outcomes to consider: performance and learning
As I look back on all of the readings and posts I’ve engaged with over the last two years, these have always been the two competing outcomes dominating the conversation. Recent research has highlighted the tension between performance and learning outcomes that educators must weigh when thinking about ChatGPT.
Performance is a measure of one’s ability to execute a task, carry out a skill, or achieve a goal, along with the speed and quality with which it is done. For example, I can perform complex mathematics much more quickly and accurately with a scientific calculator than with paper and a pencil.
Learning, on the other hand, is what it sounds like. It is the development of a deep conceptual understanding and the ability to independently apply that understanding through the application of skills and knowledge recall.
ChatGPT boosts performance… A LOT
Multiple studies have shown this to be true. Early on, a collaborative study with the Boston Consulting Group showed that consultants who used ChatGPT saw significant gains in their performance.
A recent study conducted by the University of Pennsylvania has shown that the same is true of student work: students produce higher-quality mathematics work when they have access to ChatGPT. As we will discuss later, this effect depends heavily on a few factors, but it makes intuitive sense, right? It seems obvious that a student with a high-powered calculator and knowledge of how to use it would outperform a student, even a knowledgeable and skilled one, with only paper and a pencil.
Though much of the current research has focused on mathematics and coding classes, it seems likely to me that this effect will hold true across disciplines and academic domains. Especially as AI continues to improve in terms of reasoning and functionality, I expect that a student trained in the use of AI tools will be able to write a better history paper with high-quality analysis of primary sources than a student operating without AI technology.
When we think about preparing students for life, whether it be in the short term (college) or long term (the workforce), it seems as if it would be irresponsible not to equip students with an understanding of how to use AI tools effectively to see the biggest performance gains possible.
ChatGPT can harm learning outcomes for many students
There is nuance here: please don’t miss it.
Much research supports the basic, fundamental idea that children (humans of all ages, really) learn best by doing the thing they are learning. While a calculator may be quicker and more accurate at arithmetic, relying solely on a calculator would harm a student’s ability to develop a deep conceptual understanding of arithmetic and the mental frameworks that accompany it.
For students to learn deeply, they need to engage in the doing of learning. Recent studies — again mainly focused on mathematics and coding classes — have shown that the use of ChatGPT was associated with decreases in core learning outcomes or, at best, completely neutral outcomes. Students who heavily relied on ChatGPT essentially outsourced their doing, which resulted in very little learning.
One caveat (and this is where the nuance I mentioned lies) concerns students who do not have access to high-quality learning resources, teachers, or tutors. Students who used LLM-based chatbots that were thoughtfully crafted and prompted never to do a student’s work, but to engage with them like a tutor or teacher, saw significant learning gains similar to those found in studies of the impact of high-quality tutors. I’ll talk more about this later, but the challenge is that students, on their own, don’t know how to make AI act like a high-quality tutor, since they don’t know much about the science of learning, instruction, or pedagogy.
These are serious findings, given the rapid adoption of AI tools by many schools, and they point toward a responsible way forward. Those of us in education need to think more carefully about our learning outcomes. Which are essential and require that students do the learning themselves? Which aren’t essential and would actually benefit from a significant increase in performance?
Here is an example: finding high-quality research articles to read for a lit review. When I teach students to engage with research, the article-hunting part of the process is a slog. Many academic journal databases have poor search functionality; it often feels like I am teaching students to interact with a dysfunctional, poorly designed piece of tech that requires a magical and strange combination of keyword searches to surface quality research. New AI journal-research tools, however, keep getting better at finding high-quality research. Since the outcome I want is for students to be able to find articles, a boost in performance is ideal, and I heavily encourage students to use AI tools for this purpose. Students still need training to use AI tools well for finding research, but it is no more difficult or time-consuming than the older database searches I had to train students to use.
Now consider the part of writing a lit review that requires students to read a large number of articles, critically reflect on them, draw connections, and develop a conceptual or theoretical framework from their reading. That part I need them actually to do, because I want them to develop those skills. Those are core learning outcomes of teaching research methods, and I will not allow students to outsource that part of their learning. Once they have honed those skills and graduated, it will be up to them to decide the extent to which they want AI tools to help with this aspect of research. Until then, it is my job as an educator to navigate these tools in a way that enhances their learning rather than diminishes it. As they get older, I imagine there will be times when leaning on AI tools will benefit them and times when they will choose not to.
This is the crux of the matter and the hard part moving forward: to ensure that AI tools support learning rather than hurt it, we have to know which learning outcomes require students to do the work themselves in order to learn. It is also important that we understand something fundamental:
Just because something can be done more efficiently and at higher quality using AI doesn’t mean students should use AI to do it.
Education has never been about efficiency of task completion. It has always been about the creation of transformational learning experiences that deeply form the learner. We cannot afford to lose these experiences.
Training is required for students to benefit from AI
Let me be clear: training could mean a group of teachers collaborating to develop best practices for themselves; I am not suggesting that outside expertise is needed. However, it is very clear that training, or at least intentional exploration, is currently required to experience the benefits of LLM tools.
Current research is also making clear the need for training focused on using LLMs effectively. There are a few things the research asks us to consider.
When LLMs act like high-quality tutors, students can benefit. But, students do not know how to make LLMs act like high-quality tutors.
In the cases where LLMs supported individualized learning and resulted in learning gains, the LLMs were specifically crafted to function like high-quality tutors. Students are not likely to be able to engage with LLMs on their own in this way. In the studies, researchers, in collaboration with teachers, created the prompts that guided the ways that LLMs supported students.
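To make this concrete, here is a minimal sketch of the general technique, assuming the OpenAI Python SDK: a fixed system prompt constrains the model to guide rather than answer. The model name and prompt wording are my own illustration, not the prompts used in the studies.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative tutor-style system prompt. The studies' actual prompts were
# crafted by researchers and teachers; this wording is hypothetical.
TUTOR_PROMPT = """You are a patient math tutor. Never give the final answer
or do the student's work. Ask one guiding question at a time, check the
student's reasoning, and offer a worked example of a *similar* problem
only if the student is stuck."""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[
        {"role": "system", "content": TUTOR_PROMPT},
        {"role": "user", "content": "What's the answer to 3x + 5 = 20?"},
    ],
)
print(response.choices[0].message.content)  # a guiding question, not "x = 5"
```

The design choice that matters here is that the system prompt is fixed by the adults: the student never sees or edits it, which is exactly the safeguard that unrestricted chat lacks.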
In one study, students were split into three groups. The control group didn’t use any LLM. One treatment group had unrestricted access to an LLM. The other treatment group had access to an LLM designed to act like a tutor: not to provide answers but to guide students in their thinking. Students with unrestricted access saw significant learning losses compared to the control group. Students with access to the tutor-like LLM saw no learning loss, but no learning gains either, compared to the control group.
The implication of these findings is that students do not currently possess the skills necessary to use LLMs on their own in ways that support learning gains. Self-guided LLM use is likely to hurt student learning in the short term, while people are still learning what LLMs are and how to use them well.
Another study corroborates these findings, showing that careful design and implementation of LLM tutors is required for positive learning gains to appear. Researchers developed a high-quality LLM tutor built on research-based best practices for tutoring and gave students in an under-resourced network of schools access to it. Students with access to this LLM tutor saw significant learning gains.
The key takeaway from both of these studies is that intentional design with well-defined safeguards is required for LLM use to result in positive learning outcomes.
Current research suggests that unrestricted use of LLMs by students poses a significant risk of learning loss.
This is not a finding that schools can ignore or approach haphazardly.
Another striking finding from these studies: students who used LLMs, whether unrestricted or well-designed, reported a perceived level of learning significantly greater than their actual learning. This is most concerning in the cases where students experienced learning loss as a result of using an LLM. AI tools produce unjustified confidence in students.
Leveraging LLMs for increased performance is a skill that needs to be taught or developed.
Multiple studies have shown that there are factors that impact the extent to which LLMs boost performance. For example, in the Boston Consulting Group study, high-performing consultants who also had deeply specialized knowledge were able to prompt ChatGPT in ways that resulted in bigger performance gains than consultants who lacked specialized knowledge.
Likewise, students exposed to LLMs designed to respond to prompts with questions and examples engaged significantly more with the tools than students given unrestricted LLMs did. This suggests that students who don’t know how to prompt LLMs effectively find them less helpful and engage with them less.
It is also known that, due to the simplicity of the user interface, many folks don’t know the scope of functionalities available in ChatGPT. The ability to search the web, generate images, work with spreadsheets, run code, and use sophisticated reasoning models is not obvious to all users. Nor is it obvious when or how it is helpful to activate each of the tools.
An example of this challenge is using ChatGPT to do mathematics. When using the base model, ChatGPT makes mistakes when solving math problems. However, when the Data Analysis feature is activated or when the Advanced Reasoning model is used, significantly fewer mistakes occur. This is a socio-economic and accessibility concern, given that these advanced features require a monthly subscription.
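To make the Data Analysis point concrete: under the hood, the feature writes and runs a short Python script instead of predicting the digits of an answer token by token, so the arithmetic comes out exact. A minimal sketch of the difference, with made-up numbers of my own:

```python
# A base LLM generates answer digits token by token and can slip on the
# arithmetic; executed code computes them exactly. This is, in essence,
# what happens when ChatGPT's code-running Data Analysis tool is active.
price_per_student = 12.75  # hypothetical classroom budgeting numbers
students = 487
total = price_per_student * students
print(f"Total cost: ${total:,.2f}")  # exactly $6,209.25

balance = 1500 * (1 + 0.043) ** 12  # compound interest, computed exactly
print(f"Balance after 12 years: ${balance:,.2f}")
```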
It takes intentional training and practice to benefit from the performance gains that are possible through using ChatGPT
This means that when faculty identify a skill where performance matters more than the learning outcome (like my example of searching academic databases for articles), the performance gains are contingent on how well the AI tool is used. It is not obvious to most people how to get higher-quality results from an LLM or AI tool; it takes practice and experimentation. In the absence of instruction, students are unlikely to achieve the highest levels of performance gains.
If our goal is to teach students how to benefit the most from performance gains, then we need to teach this with intention, especially when we weigh the risks of learning loss associated with unrestricted LLM use.
What does this mean for schools and educators?
First, we need to move cautiously and follow both the science and our own experience.
We should move fast to give faculty access to the highest-quality AI tools possible and encourage them to engage with those tools intentionally to develop expertise. Schools are mostly on a level playing field at this point, but schools whose faculty are engaged in this work will have a significant advantage as generative AI continues to advance and become more ubiquitous. Schools that cannot provide access to high-quality AI tools, or whose faculty are resistant, are likely to fall behind, and their students may suffer in the long run. It is important to remember that students are using generative AI tools like ChatGPT regardless of how these tools may be utilized in class; students whose use is unrestricted and uninformed are at risk of learning loss.
It is important for faculty to understand that learning how to use LLMs is essential, just like learning how to use the internet and email, but this does not mean they have to utilize these tools in class. We need informed faculty bodies. That is different from universal, unrestricted adoption of an AI tool with students.
Putting AI tools in the hands of students needs to come more slowly and follow faculty training and further research.
Second, we need to develop high-quality AI literacy curricula that teach students how to leverage AI tools for both increased performance and increased learning. These skills can be taught. We also need students to understand which kinds of use are harmful to learning and would likely result in learning loss. This is similar to the early and ongoing development of media literacy and teaching students about effective and safe internet use. It will naturally develop over time, but schools that begin early will offer their students an advantage and protect them from potential harm.
We need to teach students how to leverage AI tools safely, ethically, and effectively. The speed, thoughtfulness, and quality with which schools do this will set them apart.
Third, we need to lean into skills- and competency-based curriculum mapping and design. Schools that are clear on their learning outcomes will be able to make wise, informed decisions about where AI fits into their curriculum, boosting both learning and performance. Schools that are fuzzy on their learning outcomes are likely to struggle in this area.
There is significant risk of misalignment of the curriculum if AI is implemented without a clear understanding of learning outcomes. For example, one 9th-grade teacher may feel that it is appropriate for students to use ChatGPT to help formulate thesis statements for research papers. If some or all 10th-grade teachers disagree and expect students to be highly skilled at formulating thesis statements without ChatGPT, there is now a misalignment within the curriculum. This is one simple example, but a lack of thoughtful implementation, collaboration within the faculty, and controlled experimentation will likely lead to many, many small misalignments like this one. This creates a chaotic experience for students.
Fourth, we cannot forget the value of unrestricted play and discovery in the development of learning. Students already have access to many AI tools via the internet. Regardless of what rules we put in place, they will play with and experiment with these tools. We might as well intentionally create space to play and learn together.
Find spaces to allow for safe creative exploration of AI tools. Teachers and students exploring and learning together builds community.
If schools or teachers ban the use of AI, students will explore in secret, which is less safe. You need to make room in your school and classroom for safe experimentation with AI.
Fifth, you need to teach faculty and students about the potential dangers and harms that can come from AI.
Students and faculty need to understand the dangers of AI-generated deepfakes and how to protect themselves. They need to understand the potential biases that exist within AI. All of this matters, and much of it is still being researched and understood in real time as the technology develops.
Finally, I recommend that schools consider offering a Machine Learning sequence. These have been growing in popularity over the last decade or so. Machine Learning has never been as accessible to high school students as it is today, and it is becoming more accessible all the time. After just a few lessons, I’ve had students create powerful neural networks that solve real-world problems. I’ve engaged with quite a few schools doing this work, and the level of research and innovation this kind of education empowers students to engage in is nothing short of incredible.
Machine learning is going to be increasingly important in the future, and we need to begin introducing students to the subject in high school.
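To give a sense of how accessible this has become: here is a minimal sketch, in plain NumPy, of the kind of first neural network a student can build and train in a lesson or two. It learns XOR, the classic starter problem; the architecture and hyperparameters are my own illustration, not a specific curriculum.

```python
import numpy as np

# A tiny two-layer neural network that learns XOR from four examples.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([[0], [1], [1], [0]])              # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)  # hidden layer (8 units)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)  # output layer

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 1.0
for _ in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradient of the squared error loss
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```

Twenty-odd lines of code, no GPU, no framework: that is the barrier to entry today.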
This is community work. Let’s take steps together.
This is all very new, and there are no experts. There are just a bunch of us, teacher-explorers and researchers, who love to learn and want the best for our students. The only way forward is together. We need to learn with and from one another.
Research on ChatGPT in K-12 settings is new, and the findings from the first studies are just now being shared.
Let’s form a community. Let’s share our successes and learnings with one another. If you have experimented with ChatGPT and would like to share your experience and learning, consider writing a guest post for Fullstack Educator or share your work with me, and I’ll highlight it in this newsletter.
Please share your thoughts via the chat feature.
And please consider subscribing and telling your friends about the Fullstack Educator Newsletter.
Sources
Here are the sources that I’ve been reading that are informing my thoughts. The studies that I reference throughout this article come from these sources.
Ait Baha, T., El Hajji, M., Es-Saady, Y., & Fadili, H. (2024). The impact of educational chatbot on student learning experience. Education and Information Technologies, 29, 10153–10176. https://doi.org/10.1007/s10639-023-12166-w
Albadarin, Y., Saqr, M., Pope, N., & Tukiainen, M. (2024). A systematic literature review of empirical research on ChatGPT in education. Discover Education, 3, 60. https://doi.org/10.1007/s44217-024-00138-2
Chiu, T. K. F., Ahmad, Z., & Çoban, M. (2024). Development and validation of teacher artificial intelligence competence self-efficacy (TAICS) scale. Education and Information Technologies. https://doi.org/10.1007/s10639-024-13094-z
Dell'Acqua, F., McFowland III, E., Mollick, E., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality (Working Paper No. 24-013). Harvard Business School. https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf
Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257. https://www.pnas.org/cgi/doi/10.1073/pnas.1821936116
George, A. S., Baskar, T., & Srikaanth, P. B. (2024). The erosion of cognitive skills in the technological age: How reliance on technology impacts critical thinking, problem-solving, and creativity. Universal Innovative Research Publication, 2(3), 147–148. https://www.researchgate.net/publication/381452876
Gokce, A. T., Topal, A. D., Geçer, A. K., & Eren, C. D. (2024). Investigating the level of artificial intelligence literacy of university students using decision trees. Education and Information Technologies. https://doi.org/10.1007/s10639-024-13081-4
Jeon, J., & Lee, S. (2023). Large language models in education: A focus on the complementary relationship between human teachers and ChatGPT. Education and Information Technologies, 28, 15873–15892. https://doi.org/10.1007/s10639-023-11834-1
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., Stadler, M., Weller, J., Kuhn, J., & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. https://www.researchgate.net/publication/367541637
Keppler, S., Sinchaisri, W. P., & Snyder, C. (2024). Making ChatGPT work for me. Proceedings of the 2024 ACM Conference on Computer-Supported Cooperative Work and Social Computing. https://ssrn.com/abstract=4700354
Qin, Q., & Zhang, S. (2024). Visualizing the knowledge mapping of artificial intelligence in education: A systematic review. Education and Information Technologies. https://doi.org/10.1007/s10639-024-13076-1