February 17–27, 2025 – In just ten days, we saw a stampede of AI announcements from the biggest names in tech. If you blinked, you might have missed a new AI model or tool that could change how we teach, learn, and lead in education. Let’s break down the highlights – and more importantly, what they mean for educators and school leaders. (Spoiler: it’s okay to feel equal parts excited and uneasy.)
OpenAI
GPT-4.5: The Half-Step Before the Leap
OpenAI decided one big update wasn’t enough – so they rolled out two. First came GPT-4.5, an enhanced version of the AI powering ChatGPT. OpenAI’s CEO, Sam Altman, called it a “giant, expensive model,” noting they literally ran out of GPUs while deploying it. What’s the big deal? GPT-4.5 is better at recognizing patterns and shows a significantly higher level of emotional intelligence (yes, the chatbot might feel a tad more human). It hallucinates (makes stuff up) much less often than previous models. In plain English, it’s more likely to get facts right and feel like you’re talking to a real person. It also supports image uploads (vision) and coding tasks, though voice and video features aren’t live in it yet.
“Deep Research” Mode: Your AI Research Assistant (with Limits)
OpenAI’s new “Deep Research” mode sits right inside ChatGPT (note the blue button), aiming to turn the AI into a super research assistant. It’s one of the most ambitious (and controversial) moves in AI this year. The idea is tantalizing: you give ChatGPT a complex prompt – say, “Analyze the last 5 years of research on project-based learning outcomes in high school science” – and then sit back as the AI goes off to the web, finds and reads sources, and comes back with a detailed report complete with citations. In essence, it’s like an autonomous research intern that works at light speed. For busy educators and grad students, that sounds like a dream come true (who wouldn’t want a 5-minute literature review?).
Here is an example of Deep Research identifying the mathematically perfect burrito combination (ingredients, proportions, arrangement).
However, there’s a catch: Deep Research ain’t cheap or unlimited. Initially, OpenAI is only enabling this mode in earnest for those on its $200/month “Pro” plan. Everyone else gets a taste in tiny doses – 10 uses per month for ChatGPT Plus subscribers, and a mere 2 uses on the free tier at launch. Yes, you read that right: free users can try it only twice a month. The rationale is likely that each Deep Research query consumes a ton of computing resources (it can run for 5-30 minutes per query doing all that work). Still, the steep paywall has drawn some groans. $200 a month is beyond most individual educators’ budgets, putting this powerful tool mostly in the hands of well-heeled organizations… for now. OpenAI says it plans to “scale up” usage limits over time, so we may see the feature gradually open up more broadly if the kinks get worked out.
There’s also the question of quality: early reports say Deep Research is impressive but not infallible. It does produce lengthy, citation-packed reports, but those can still include factual errors or “hallucinations” if the AI misinterprets sources. OpenAI is pitching it as an experiment, almost a glimpse of how AI might autonomously handle research tasks. For educators, the potential is huge – imagine offloading some of your lesson planning (like finding good current event articles that specifically fit into this week’s unit), or having AI analyze student data (that’s right, you can have Deep Research analyze your own PDFs and spreadsheets). But the limitations (both in access and accuracy) mean it’s not replacing human researchers just yet. In short, Deep Research is a cutting-edge tool with a sky-high price and training wheels still on. It’s worth watching (and maybe trying out if you have ChatGPT Plus), but don’t plan your research strategy around it just yet. Especially because, as we’ll see next, the competition is already cooking up cheaper alternatives…
Perplexity AI
Free “Deep Research” for the People
Perplexity’s interface now offers a “Deep research” mode too – and unlike OpenAI, they’re making it freely available right from the mode menu. If OpenAI’s Deep Research is a luxury item, Perplexity AI is positioning itself as the budget-friendly knockoff that might actually outperform the original. Perplexity – a smaller AI search startup – saw OpenAI’s $200/month research bot and basically said, “Nah, we can do this much cheaper.” In a move that must have made OpenAI cringe, Perplexity this month launched Perplexity Deep Research, a feature that also automates the process of searching the web and synthesizing information into reports. The killer difference? Cost: Perplexity’s version is free for anyone to use up to 5 queries a day, and if you pay for their $20/mo plan, you get a whopping 500 queries per day. That’s right: they turned what OpenAI limited to 2 uses for free users into something you could potentially use daily without paying a cent. It’s a bold swipe at both OpenAI and Google’s similar offerings.
Price aside, does it actually work well? Early indicators say yes, quite well. Perplexity claims its Deep Research scored about 20-21% on “Humanity’s Last Exam,” a giant reasoning benchmark – outperforming several other AI research agents including Google’s version. (For context, OpenAI’s Deep Research is reported to score “higher than 20.5%” on that same test, so likely in the same ballpark. Google’s Gemini-based research mode lagged far behind at around 6% – more on that later.) Benchmark bragging rights aside, Perplexity’s tool performs a similar multi-search and synthesis process, usually finishing a report in under 3 minutes. The output is a nicely structured response with citations, and users can even export the reports as PDF or share them with a link. In other words, it’s basically an AI research assistant accessible to anyone with an internet connection and a question to ask. With all that said, from my own experience with both tools, OpenAI’s Deep Research feels like asking a graduate student to produce a report, while Perplexity’s Deep Research feels like the work of a junior undergrad.
For educators, this is big news. Here we have a freely accessible AI that can do a first pass at gathering information on complex questions. A teacher preparing a new unit on, say, climate change effects could ask Perplexity to “deep research recent studies on climate change impacts in Pacific islands” and get a quick overview with sources to explore further. Students could use it (ethically!) to jumpstart research projects by gathering a robust collection of sources. Of course, as with any such tool, we have to stress critical thinking: just because the AI provides citations doesn’t mean it interpreted them correctly. There’s still a need to verify facts and cross-check the AI’s work. But compared to locking this capability behind a corporate paywall, Perplexity’s approach is refreshing and democratizing. It’s also a sign of the competitive pressure in the AI space – OpenAI’s not going to be able to charge premium prices for long if free alternatives perform nearly as well.
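For those who’d rather script this kind of research run than click through the chat interface, Perplexity also offers an OpenAI-compatible developer API. Here’s a minimal Python sketch – the deep-research model identifier below is an assumption on my part, so check Perplexity’s API docs for the current model names and whether Deep Research is exposed there at all:

```python
from openai import OpenAI

# Perplexity's API speaks the OpenAI chat format; only the base URL and key change.
# The model name below is an assumption - consult Perplexity's docs for the real one.
client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar-deep-research",  # assumed identifier for the Deep Research model
    messages=[{
        "role": "user",
        "content": "Summarize recent studies on climate change impacts in Pacific islands, with sources.",
    }],
)

print(response.choices[0].message.content)
```

Even in scripted form, the same caveat applies: treat the citations as leads to verify, not settled facts.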
The takeaway: Perplexity just made AI research assistants accessible to everyone. This could be a boon for under-resourced schools and curious students. At the very least, it ups the ante for OpenAI, Google, and others to keep innovating (or lower their prices). The AI research war has begun, and in a twist, the scrappy underdog might currently have the more compelling offer for educators.
Anthropic (Claude AI)
Claude 3.7: AI Coding Buddy with “Artifacts”
Anthropic, the company behind Claude, hasn’t been sitting still either. They unveiled Claude 3.7 Sonnet, which they tout as “our most intelligent model to date”. What’s special about Claude 3.7? For one, it’s a bit of a split personality – in a good way. This model can operate in two modes: a fast, near-instant answering mode, and an “extended thinking” mode where it takes its time and shows its step-by-step reasoning. In other words, Claude 3.7 can be both speedy and deep, depending on what you need. Anthropic’s philosophy here is interesting: instead of offering separate “quick” and “slow but smart” models, they integrated reasoning into the same AI. Just like a person can choose to respond off the cuff or pause and think harder, Claude 3.7 gives users that choice in one package. This unified approach means you don’t have to switch bots to get better reasoning – you just toggle Claude into a reflective mode when needed (on their API, you can even set how many “thinking” tokens it uses).
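For the technically curious, here’s a rough sketch of what that toggle looks like through Anthropic’s Python SDK – the model ID and token budget here are my own assumptions, so double-check Anthropic’s documentation before relying on them:

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in your environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",   # assumed model ID - verify against Anthropic's docs
    max_tokens=4096,                      # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2000},  # how much "thinking" to allow
    messages=[{
        "role": "user",
        "content": "A train leaves at 2:15 pm averaging 80 km/h. When does it reach a city 300 km away? Show your reasoning.",
    }],
)

# The reply comes back as blocks: the model's visible reasoning, then the final answer.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```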
Why does this matter? For one, Claude 3.7 has shown big improvements in coding, science, and math when using its extended reasoning. Early tests (and some proud Anthropic statements) indicate it might be best-in-class for coding assistance, surpassing other models on real-world programming tasks. Anecdotally, users have noted Claude is excellent at debugging and generating workable code. And here’s a feature everyone is loving: “Artifacts”. This lets Claude output a web-based interactive tool. Essentially, Claude can create persistent pieces of content within the conversation instead of just a wall of text. For example, you can ask Claude to build an interactive timeline of the American Revolution or a simulation of Newton’s Laws of Motion and it produces a shareable web app. It’s pretty cool!
From a teacher’s perspective, Claude 3.7 with artifacts could be a powerful tool for project-based assessment. Teachers can now ask students to create web-based apps as deliverables.
Example 1: A Sleek Homepage
Example 2: A 3D City
Example 3: Interactive Physics Lab
Claude Code: AI that Codes with You, in the Terminal
Another notable release from Anthropic is Claude Code, a new tool that hints at where AI assistants are going next. Claude Code is essentially a command-line interface (CLI) agent that lets developers interact with Claude directly from a terminal window, asking it to generate or modify code as if you were chatting with a colleague on the command line. Anthropic introduced Claude Code as a “limited research preview” for now, but its significance is big: it shows an AI being invited into the programmer’s natural habitat (the terminal and IDE), rather than being siloed in a chat webpage.
With Claude Code, a developer could type a request like, “Hey Claude, create a simple HTML page with a form and some basic CSS,” right in their terminal, and Claude will output the files and even set up a little project structure. It’s like having a pair-programmer who lives in your computer’s console. Anthropic’s goal here is to let developers “delegate substantial engineering tasks to Claude directly from their terminal,” effectively speeding up development workflows. Need boilerplate for a new app? Just ask Claude Code to do it while you watch. Since it’s Claude under the hood, you get that strong coding ability and the option for the AI to “think more” if needed.
For education, imagine integrating something like Claude Code in a computer science class. Students could use it to generate snippets or get guidance without leaving their coding environment. It could lower the barrier for beginners – they can ask the AI to produce an example and then learn by inspecting and tweaking it, all within their coding tool. There’s also a lesson here: AI doesn’t have to live behind fancy chat UIs. It can plug into the tools professionals (and students) already use. We’re likely to see more of this “embedded AI” in software apps, IDEs, and maybe even in Microsoft Office (they’re already headed that way with Copilot). Anthropic’s early move with Claude Code underscores that future.
In summary, Anthropic’s Claude is quietly becoming a powerhouse for coding and complex reasoning tasks. It might not have the hype of GPT-5 or Gemini, but in practice Claude 3.7 is something educators should keep on their radar – especially for coding education and any tasks where seeing the AI’s reasoning or preserving its outputs would be helpful. And given Anthropic’s more measured, research-focused approach (the company was literally founded by former OpenAI folks focused on AI safety), Claude might also come with fewer surprises and a more “thoughtful” style, which some prefer in an educational setting.
Google (DeepMind)
Gemini 2.0 “Flash Thinking”: AI That Shows Its Work
Google’s AI juggernaut (now a collaboration of Google Research and DeepMind) has been pushing its next-gen model Gemini steadily forward. The latest twist is a feature with a sci-fi-sounding name: Flash Thinking. So what is it? In short, Flash Thinking is an experimental mode of Gemini 2.0 where the AI literally reveals its chain-of-thought to you, in real time, as it figures out an answer. If that reminds you of a student “showing their work” on a math problem, you’re spot on. Google is basically training Gemini to break down prompts into steps and display those reasoning steps to the user. The idea is that by making the reasoning process visible (and by encouraging a step-by-step approach), the model will deliver more accurate and transparent answers, especially on complex problems.
This month Google made Flash Thinking available to consumers as a free experiment in the Gemini app. I’ve given it a whirl, and it’s a trip to see. For example, if you ask a tricky multi-part question, you’ll see Gemini start generating something like “Step 1: Understand the question. Step 2: Gather relevant facts…”. It’s as if the AI is muttering to itself while solving the problem – except you get to eavesdrop on the muttering. And then it finalizes an answer for you. In terms of performance, Google claims this mode delivers better results than the standard Gemini Flash mode when solving a complex problem. It’s especially touted for things like math and science questions that benefit from not jumping straight to a conclusion. (Gemini Flash Thinking even reportedly excels at certain math benchmarks now, thanks to this approach.)
For educators, Gemini’s Flash Thinking is one of the most intriguing developments because it addresses a long-standing issue with AI in the classroom: lack of transparency. One big hesitation in using AI helpers for students is that they’re black boxes – they give an answer, but you don’t know how or why. With Flash Thinking, an AI is essentially showing its work, which could make it easier to trust and to diagnose where it might be going wrong. A teacher could actually see the steps the AI took to solve a problem and say, “Ah, I see where it took a wrong turn.” Students using it could learn problem-solving strategies by example, watching how the AI breaks a complex task into simpler parts. It’s not perfect (the AI’s reasoning can still be convoluted or even incorrect in parts), but it’s a huge step toward AI that teaches its process, not just its result.
Aside from Flash Thinking, Google’s Gemini 2.0 is now generally available in various flavors: Flash (the fast model), Flash-Lite (smaller, cheaper), and an advanced Pro model for coding tasks. They are integrating Gemini into all their platforms – from the consumer Gemini app (sort of Google’s answer to ChatGPT) to enterprise APIs and even tools like Vertex AI for developers. Notably, Gemini models come with massive context windows (up to 1 million tokens in the Pro version), native tool use (e.g. web browsing or running code), and multimodal capabilities (vision and voice are built-in). This means Google’s AI can do things like read a PDF or analyze an image as part of answering you – which is very relevant for education (imagine feeding it a graph or an article and having it incorporate that into its reasoning).
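To make that concrete, here’s a small Python sketch using Google’s genai SDK to hand Gemini an image alongside a question – the file name is made up for illustration, and I’m using the generally available Flash model rather than the Flash Thinking experiment:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_API_KEY")

# A hypothetical chart a teacher might want explained for a lesson.
with open("rainfall_chart.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Describe the trend in this chart in language a 7th grader would understand.",
    ],
)

print(response.text)
```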
However, it’s clear Google is playing a bit of catch-up (or at least, don’t-get-left-behind) relative to OpenAI. Their “Deep Research” counterpart (sometimes called Gemini Deep Research) was launched late last year as part of the $20/month Gemini Advanced plan, but, frankly, it hasn’t made waves. As mentioned earlier, its performance on a major test was mediocre, and Google hasn’t heavily promoted it since. Instead, they seem to be leaning into their strength: reasoning with transparency, plus that integration into the Google ecosystem (search, Android, etc.).
Mistral AI
Small Models, Big Dreams (a New Challenger Approaches)
Mistral’s logo hints at its ambition: the letter “M” constructed from tiny building blocks – a nod to their focus on small, efficient models punching above their weight. Among the AI giants, let’s not overlook the feisty startup Mistral AI, which is quickly becoming the darling of the open-source AI world. If OpenAI and Google are in an arms race for the biggest, baddest model, Mistral is trying to change the game by doing more with less. The company (based in France) believes smaller models can be just as powerful if trained cleverly – and they’re walking the walk. Back in late 2023, Mistral released a 7B parameter model that shocked everyone by matching or exceeding some 13B+ models, and they open-sourced it for anyone to use. Fast forward to now, and they’ve unveiled Mistral Small 3, a model with 24B parameters that they claim “matches the performance of models three times its size.” In fact, they boldly stated it’s on par with Meta’s Llama 3 70B model – which is a huge assertion. It runs faster and cheaper, processing 150 tokens/sec and scoring around 81% on standard knowledge benchmarks like MMLU. Crucially, Mistral Small 3 is released under an open Apache 2.0 license, meaning schools or developers can use and even fine-tune it without legal hassles.
Why does this matter? Because if Mistral’s approach works, it democratizes AI. You wouldn’t need Microsoft Azure or Google’s TPU pods to run a powerful model – maybe a decent PC or a single server could suffice. For schools, that could mean local AI models running on-premises, under your control, with no need to send data to third-party clouds. Privacy win, cost win. It also means the barrier to entry for AI development gets lower – more startups (or heck, research labs at universities) can train capable models without $100M budgets.
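As a taste of what “local and under your control” could look like, here’s a hedged Python sketch that loads Mistral Small 3 from Hugging Face with the transformers library – the exact repository name is an assumption, and a 24B model still needs a serious GPU (or a quantized build) to run:

```python
from transformers import pipeline

# Assumed Hugging Face repo ID for Mistral Small 3 - check the hub for the exact name.
MODEL_ID = "mistralai/Mistral-Small-24B-Instruct-2501"

# device_map="auto" spreads the model across available GPUs; expect roughly 50 GB of
# VRAM in bf16, or use a quantized variant on smaller hardware.
chat = pipeline("text-generation", model=MODEL_ID, device_map="auto", torch_dtype="auto")

messages = [{
    "role": "user",
    "content": "Draft three discussion questions about the water cycle for a 7th grade science class.",
}]

result = chat(messages, max_new_tokens=300)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```

Nothing in that snippet leaves your building – which is exactly the privacy argument for open-weight models.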
Le Chat: A Lightning-Fast AI Assistant (That’s Actually Affordable)
Mistral isn’t just tossing models over the wall; they’re also building user-facing products. This month they launched Le Chat, an AI assistant/platform that leverages their models and is directly competing with the ChatGPTs and Claudes of the world. And they’re doing it in style. First off, Le Chat is fast. The company boasts it’s the “fastest chat assistant” out there, capable of spitting out up to 1,000 words per second. They call this capability “Flash Answers,” and using it feels almost instantaneous – blink and the answer’s there. Speed may seem like a gimmick, but in practical use, it changes how you interact (no more twiddling thumbs waiting for a long answer to render).
But Le Chat isn’t just speedy. It’s trying to be smart and well-informed. Under the hood, it combines Mistral’s model (with its strong pre-trained knowledge) and live information retrieval from multiple sources – web search, news sites, social media, you name it. The result is an assistant that doesn’t just rely on a training dataset; it actively pulls in up-to-date info and even claims to balance it across diverse sources (to avoid just parroting one perspective). Mistral emphasizes “robust journalism” as a source, which hints they are trying to give factual, evidence-based answers by default. It’s like an AI with a built-in research librarian.
On top of that, Le Chat comes with all the bells and whistles modern AI users expect: you can upload documents or images for it to analyze (it has top-notch OCR and vision models for this); it has an in-place code interpreter so it can run code or data analysis within the chat (sound familiar? similar to ChatGPT’s Code Interpreter); and it even has integrated image generation (using a model called Flux, not that the name matters). In short, Le Chat is an AI Swiss Army knife – capable of answering questions, doing research, generating creative content, analyzing files, coding, and more, all in one interface. And here’s the kicker: most of it is free. Mistral is following a freemium model where many of their features are available to everyone at no cost. They only charge a modest $14.99/mo for higher usage limits or team-oriented features. This is in stark contrast to, say, OpenAI’s $20 for basic Plus access or the even pricier plans we discussed.
For educators and school leaders, Mistral’s Le Chat is an intriguing new option, especially if budget and data privacy are concerns. A free or low-cost AI assistant that can be deployed even on mobile devices (yes, they have iOS and Android apps now) could make AI more accessible across a district without breaking the bank. Its super-speed means it could be used live in classrooms without slowdowns. And the fact it’s grounded in real-time info might reduce instances of outdated or wrong answers (though we’d have to test that). Mistral’s DNA is open-source, which usually means a bit less cautiousness in content moderation. So far, they haven’t gotten into any big trouble, but schools would need to vet the outputs.
Nonetheless, the broader point stands: the AI field is not a two-horse race. Innovators like Mistral are pushing from below, offering alternatives that are faster, cheaper, and more flexible. For schools, this could be fantastic – more choice means better bargaining power and the ability to pick a solution that fits your needs (be it on-premise for privacy, or a free service for equity). Mistral’s tagline is “Frontier AI in your hands,” and they’re actually trying to deliver that. Keep an eye on these folks – they might soon provide the AI engine in everything from your phone to your smartboard, without you even knowing.
xAI (Elon Musk’s Grok)
Grok 3: An Unfiltered, Unhinged AI – Hype vs. Reality
Leave it to Elon Musk to take AI off the rails – possibly literally. His AI venture xAI has been developing Grok, a chatbot that he promised would be a “maximum truth-seeking,” rebellious alternative to ChatGPT. What does that mean in practice? Well, apparently, it means an AI that will do things most others wouldn’t dare. In the latest update, Grok 3 introduced a Voice Conversation mode with multiple personalities, and these are not your grandma’s Alexa personas. One voice setting is actually called “Unhinged”, and it will scream at you, hurl insults, and act downright crazy if provoked. Another persona, charmingly called “Sexy” mode (18+), engages in full (graphic) erotic roleplay – basically turning Grok into an AI phone sex operator. There’s also a conspiracy theorist mode (for all your Sasquatch and alien needs) and an “Unlicensed Therapist” that offers dubious counseling. In short, Grok’s new abilities read like a lineup of a shock-jock radio show: deliberately controversial and likely to grab headlines.
And headlines it did grab. Clips circulated of Grok’s Unhinged mode letting out a 30-second blood-curdling scream when a user kept asking it to be louder. In Sexy mode, users reported the AI whispering explicit content that no other mainstream chatbot would dare utter (OpenAI and Google would have a heart attack). Musk’s intention here is clear: he wants Grok to be seen as the uncensored, “tell it like it is” AI, free from the shackles of what he calls “politically correct” tech company guardrails. In his view (and indeed there’s an audience for this), AIs like ChatGPT have been censored – they refuse to discuss certain topics, they won’t engage with edgy content, etc. Grok is his answer: an AI that’s edgy and not afraid to offend. Even the name “Grok” comes from a sci-fi novel (“Stranger in a Strange Land”) and implies a deep or intuitive understanding – Musk is positioning it as the AI that truly “gets it” (with a wink).
Now, let’s talk about the reality and the educational perspective. Is Grok actually a great AI assistant, or just a great gimmick? So far, evidence leans toward the latter. Underneath the wild personas, Grok’s core smarts don’t obviously outperform the likes of GPT-4 or Claude. In fact, when it launched in late 2023, testers found Grok’s answers inferior in quality and accuracy compared to GPT-4. It could access real-time info from X (Twitter), which was neat, and it had a snarky style, but it wasn’t magic. The new version 3 might be improved, but xAI hasn’t released any truly convincing benchmarks to prove it’s intellectually on par with the top models. Instead, the emphasis is on how uncensored it is. That in itself is a double-edged sword: yes, Grok will role-play as Hitler or produce explicit content – things other AIs would refuse – but is that actually useful for anyone? In an educational context, I would emphatically argue no – or at least, it’s deeply problematic. The “Sexy” mode might be novel, but do we want students (or anyone) engaging with an AI in that way? The “Conspiracy” mode might happily spout pseudoscience – directly at odds with using AI to help teach critical thinking or accurate content. Even the “Unhinged” screamer, while a funny demo, has no real utility beyond entertainment shock value.
Critics are already pointing out the concerns. Allowing an AI to curse, scream, and indulge in hateful or erotic language breaks the norms that other companies have set for safety. There’s a reason OpenAI and others put guardrails: to prevent harassment, protect minors, avoid reinforcing bad behavior, etc. Grok throwing those out raises the risk of harm. For instance, an “Unlicensed Therapist” AI giving a troubled teen harmful advice could be downright dangerous. A conspiracy-loving AI could amplify false beliefs. Musk seems to be betting that users want this freedom – and some segment might – but it’s a very adult experiment and one laden with liability issues.
From a more technical standpoint, one has to wonder if Grok’s razzle-dazzle is covering up a lack of real progress. It’s like a student who isn’t top of the class in academics but plays the class clown to get attention. By making Grok do outrageous things, xAI ensures we talk about it, rather than compare its IQ to GPT-4o, o3-mini-high, and Claude 3.7 Sonnet (where it might fall short).
Bottom line: Grok is a fascinating case study in “what if we let an AI say anything?” It’s arguably not a tool for education – if anything, it’s a cautionary tale for educators about the extremes of AI. In a school setting, you’d likely ban Grok’s use outright given the lack of filters. However, it does provoke useful discussion: What do we want our AI assistants to sound like? Is there value in an AI that can be a bit irreverent or emotional? Could some of Grok’s personas, in a toned-down form, make AI more engaging (imagine a history tutor AI that role-plays as a historical figure with some personality)? Possibly – but Grok has dialed it up to 11, well past the point of educational appropriateness.
On a more sober note, the emergence of Grok reminds us that AI development is diverging in multiple directions. While some are going for safer, more controlled AIs for work and school, others are exploring the free-speech-at-all-costs route. It will be important for educators to be aware of these differences. Not all “AI tutors” are created equal – a “helper” like ChatGPT is programmed to refuse certain content and maintain a polite tone, whereas Grok might gleefully violate all those norms. As AI becomes more prevalent, teaching AI literacy will include teaching students that the source and design of an AI matters greatly. Who made your AI and what are its goals? In Grok’s case, the goal seems to be “get attention and push boundaries” more than “provide reliably correct answers.” For a school or any serious knowledge work, that’s a huge red flag.
Wrapping Up: The State of AI (Early 2025)
We’ve covered a lot of ground: OpenAI’s near-term plans and pricey research concierge, a challenger offering research tools for free, Anthropic’s thoughtful coder assistant, Google’s transparency push, an open-source upstart revolutionizing speed and access, and the renegade that is Grok. It’s an exciting and occasionally bewildering landscape. For educators and school leaders, the key takeaway is that AI is not monolithic. Each of these developments could impact education in different ways. Some (like OpenAI’s GPT-4.5) might just quietly make your existing tools a bit better. Others (like Perplexity’s free research mode) you might want to adopt to empower students and teachers right now. Some (Claude’s artifacts, Gemini’s reasoning display) point toward a future where AI is more collaborative and transparent – features to wish for in EdTech software. And a few (looking at you, Grok) serve as warnings of what to be cautious about.
The tone from here on out is set: 2025 will be a year of rapid AI iteration. The best way to understand the state of AI is to actively use it. So play around. See what works, see what fails, and share those insights. My bet is that by mid-year, we’ll be talking about GPT-5, perhaps a Gemini update, a few early-stage AI agents, and who knows what else – and they’ll again change the game. Stay curious, stay critical, and don’t be afraid to experiment a little. The AI revolution in education isn’t coming – it’s here – and with a discerning approach, we can harness it to make learning more powerful (while avoiding the pitfalls). Until next time, keep exploring!