If you feel like AI news has been on a rollercoaster lately, you’re not alone. We’ve been riding an AI hype cycle that has everyone talking – and educators are no exception. A new buzzword in this cycle is agentic AI: not just chatbots answering questions, but AI that takes action on our behalf. Think of software that can browse, click, and type for you, almost like a virtual intern. It sounds like science fiction, and to be fair, the technology is still finding its footing. Analysts note that agentic AI is still in its early stages and immature in many ways. Yet it’s advancing at breakneck speed.
In fact, we’re already seeing rapid progress from simple email helpers to more autonomous calendar scheduling agents, all of which are now offered through email clients like Superhuman and Shortwave. OpenAI recently released an experimental agent called Operator, which they claim can handle simple browser-based tasks, like ordering groceries for pickup or delivery. The pace of development is dizzying – new capabilities seem to emerge every month – and it’s not too soon to start paying attention.
So, with equal parts skepticism and curiosity, I decided to put Operator to the test. Would it live up to the hype and actually help with a real-world task, or would it reveal the cracks beneath the buzz? To find out, I gave Operator a challenge that many of us in education and research can appreciate: a tedious data-gathering project that I really didn’t want to do by hand. Here’s how that experiment unfolded.
Experiment Setup
For my test drive of Operator, I chose a task that was both practical and a bit ambitious: gathering financial data for 100 independent schools. Why this task? In the world of independent (private) schools, strategic decisions often rely on understanding finances – budgets, fundraising, endowments, expenditures – in short, the financial strength of peer institutions. Much of that information is publicly available through IRS filings (Form 990), since most independent schools are nonprofit organizations. ProPublica hosts this information in a handy web database called Nonprofit Explorer, which includes summary financial data from those tax returns. In other words, the data was out there on the web, but collecting it into one place would normally mean a lot of repetitive clicking and copy-pasting – exactly the kind of drudgery we hope AI might handle.
The plan was straightforward on paper: have Operator pull key financial metrics from ProPublica’s Nonprofit Explorer for 100 schools and record them in a Google Sheet. I prepared a blank Google Sheet with nothing but a list of the schools’ names (which I got from an o3-mini-high query). Operator would need to:
Search for each school on ProPublica’s Nonprofit Explorer site.
Click on the school’s entry to find the summary financial data (like total revenue and assets for the latest year).
Copy those numbers and paste them into the Google Sheet in the row for that school.
It’s the kind of rote web-based task that could eat up hours of my time – perfect for an AI agent, if it actually works. The rationale was that this experiment would simulate the prep work for a real strategic analysis: before you can analyze or compare schools’ financials, you need to gather the data. Usually, I might write a custom script if I felt like coding. But here, I wanted to see if a no-code AI agent could serve as my data collector in the background while I focused on other work (or maybe just sipped my coffee).
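For comparison, here is a minimal sketch of what that custom script might look like. ProPublica’s Nonprofit Explorer exposes a public JSON API; the endpoint paths and the IRS-derived field names below (totrevenue, totfuncexpns, totassetsend) reflect my understanding of that API and should be verified against ProPublica’s documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL for ProPublica's Nonprofit Explorer API (v2).
API = "https://projects.propublica.org/nonprofits/api/v2"

def get_json(url: str) -> dict:
    """Fetch a URL and parse the JSON response."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)

def search_ein(school_name: str):
    """Return the EIN of the top Nonprofit Explorer search result, if any."""
    data = get_json(f"{API}/search.json?q={quote(school_name)}")
    orgs = data.get("organizations", [])
    return orgs[0]["ein"] if orgs else None

def latest_filing(filings: list) -> dict:
    """Keep only the newest filing's key summary metrics."""
    newest = max(filings, key=lambda f: f.get("tax_prd_yr", 0))
    return {
        "year": newest.get("tax_prd_yr"),
        "revenue": newest.get("totrevenue"),
        "expenses": newest.get("totfuncexpns"),
        "assets": newest.get("totassetsend"),
    }

def fetch_school(school_name: str):
    """Look up a school and pull its most recent summary financials."""
    ein = search_ein(school_name)
    if ein is None:
        return None
    org = get_json(f"{API}/organizations/{ein}.json")
    filings = org.get("filings_with_data", [])
    return latest_filing(filings) if filings else None
```

Note that the same ambiguity problem Operator ran into shows up here too: blindly taking the top search result can grab the wrong organization when a school’s name is common.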
With the stage set—Google Sheet ready, a list of 100 school names in hand, and Operator at my command—I kicked off the experiment.
How Operator Performed
As I set Operator loose on the task, it felt a bit like delegating to a very diligent but junior assistant. I gave it initial instructions explaining the goal (“find financial info for each school from this site and log it in the sheet”) and watched as it began clicking through web pages. Operator uses its own built-in web browser to navigate and act, essentially operating a virtual mouse and keyboard. Sure enough, it started handling the typing, clicking, and scrolling all by itself. I could literally watch it search for a school name, click the relevant result, and attempt to copy the numbers from the page. When it encountered something unexpected – say, multiple organizations with similar names – it would pause, then make a decision before proceeding. I hoped it was doing some advanced reasoning in the background, but it made mistakes and chose the wrong organization multiple times.
The data collection process wasn’t lightning-fast. In fact, watching Operator work was a lesson in patience. It navigated the task methodically, but sometimes quite sluggishly, as if each page load and copy-paste took just a beat longer than a human might take. Multiply that by 100 schools, and you get the idea – this was no instant scrape. At times, I caught myself hovering over the computer, wondering if I should take control to speed things up. But the whole point was to let the AI do its thing, so I resisted the urge to intervene (unless necessary) and let Operator churn away in the background.
Over the course of the run, a few inefficiencies and errors became apparent. For one, after the first few schools, Operator began rounding the figures it recorded – and not in any consistent way. A school’s revenue might be listed as $12,345,678 on the website, but the agent would record it as $12,345,680 in one row and simply “$12 million” in another. Because the rounding was inconsistent, the data was less useful for comparison purposes. Another issue was misidentification: as I mentioned above, on a few occasions Operator grabbed data for the wrong organization – usually when a school’s name was very common or ambiguous. This felt similar to supervising a new hire: I couldn’t altogether leave it unattended.
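A lightweight way to catch both failure modes is to spot-check the agent’s output against the source. This hypothetical helper (my own sketch, not part of Operator) flags any recorded value that drifts more than a small relative tolerance from the true figure; the tolerance is tight enough to catch rounding, and a wildly wrong value from a misidentified organization trips it immediately.

```python
def flag_mismatches(recorded: dict, source: dict, rel_tol: float = 0.0005) -> list:
    """Return the school names whose recorded value deviates from the
    source value by more than rel_tol, or is missing entirely."""
    flagged = []
    for school, true_value in source.items():
        rec = recorded.get(school)
        if rec is None or true_value == 0:
            flagged.append(school)
        elif abs(rec - true_value) / abs(true_value) > rel_tol:
            flagged.append(school)
    return flagged
```

Running a check like this over the finished sheet would surface the “$12 million” entries without re-verifying every school by hand.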
Operator is impressively capable in that it truly can autonomously navigate a browser and handle multi-step tasks. However, it’s also not 100% reliable: I had to keep an eye on it and caught plenty of mistakes. In terms of efficiency, Operator took longer to gather some data than I would have taken doing it manually (especially with its cautious pace). Yet the big win was that Operator did all this while I was doing other things. It was like having a background process running – I could answer emails, work on a report, or step out for a quick break, and the data collection continued without my direct involvement. That said, it took SO LONG that I ended up stopping it after it had collected data for about 25 schools.
Post-Processing with the o1 Model
Gathering the data was the focus of the experiment, but I decided to add a step: actually analyzing and making sense of the financial figures Operator had collected. For this, I tried another cutting-edge AI tool: OpenAI’s recently updated o1 model. If Operator was my data-collecting intern, o1 would be the data analyst I handed things off to. The o1 model is part of a new class of AI that OpenAI designed explicitly for more intensive reasoning tasks. Unlike GPT-4o (the default ChatGPT model – fast and fluent, but sometimes shallow), o1 is built for tasks that require complex reasoning, like analyzing data. In theory, o1 might draw deeper insights from the data – maybe even generate some nifty analysis – albeit at the cost of being slower to produce results.
For my purposes, I wasn’t going to push o1 to its theoretical limits—I just wanted it to help crunch the school finance data and perhaps visualize it. I took the spreadsheet that Operator had populated and fed it into the o1 model via copy and paste. Then, I asked o1 to analyze the data and generate an interactive dashboard that would let me explore the numbers easily. Making an interactive dashboard typically requires programming or using a BI tool, but I was curious to see how far AI assistance has come: could o1 basically become my data analyst and data visualizer in one?
Amazingly, o1 did produce something quite useful. It wrote code inside a Canvas to generate an interactive chart that I could open in my browser. The result wasn’t a polished Tableau or Power BI dashboard, but it was usable! I could, for instance, toggle between financial metrics (revenue, expenses, assets, etc.) and see a bar chart comparing the schools on that metric. You can see an example in the video. One chart let me rank the schools by total revenue, making it easy to spot the heavyweights. With this dashboard, I found myself quickly identifying outliers (oh look, that school has an endowment many times larger than the others, and this one spends unusually much per student). It turned a raw spreadsheet into a story – the kind of data exploration that would have taken much longer if I were stuck sorting rows and creating charts manually.
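The code o1 actually generated was in-browser chart code, but the core interaction (pick a metric, rank the schools, flag outliers) boils down to a few lines. Here is a Python sketch of that logic, using made-up school names and figures purely for illustration:

```python
import statistics

def rank_schools(data: dict, metric: str) -> list:
    """Rank schools by the chosen metric, descending. This is the
    operation behind the dashboard's 'toggle a metric, redraw the
    bar chart' interaction."""
    return sorted(
        ((school, metrics[metric])
         for school, metrics in data.items() if metric in metrics),
        key=lambda pair: pair[1],
        reverse=True,
    )

def outliers(ranked: list, factor: float = 3.0) -> list:
    """Flag schools whose value exceeds factor times the median,
    e.g. an endowment many times larger than the peer group's."""
    values = [value for _, value in ranked]
    median = statistics.median(values)
    return [school for school, value in ranked
            if median and value > factor * median]
```

The point isn’t the code itself, it’s that o1 handled this kind of logic (plus the charting around it) without my having to write any of it.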
The fact that o1 could help build an interactive exploration tool felt like a glimpse of the future. It wasn’t perfect—the dashboard’s design was pretty basic, and I had to tweak a couple of the automatically generated labels—but the heavy lifting of coding was handled by AI. This meant I could focus on interpreting the data rather than wrestling with chart libraries. It’s a big deal when an educator or researcher without a coding background can simply ask an AI to “show me the data this way” and get a usable visual.
With a shiny new dashboard in hand, I then faced a question: Should I share this with others? On one hand, I wanted to show off the results (both to colleagues and perhaps publicly since it was a cool proof-of-concept). On the other hand, we’re dealing with real schools’ financial data. Yes, it’s all public information – anyone could go on ProPublica’s site and look up each Form 990 – but aggregating and presenting it raises ethical considerations. Just because data is publicly available doesn’t automatically make it polite or wise to broadcast it widely. I had to think about context and consent. If I publish a dashboard highlighting that School X had a big deficit last year, how might that be perceived? Could it be misleading without additional context from the school? Could it ruffle feathers in the independent school community? These questions gave me pause.
In the end, I decided not to share the dashboard. However, you can briefly see what it looks like in the video embedded in this post.
My hands-on experiment with Operator (and o1) left me with a mix of excitement, caution, and plenty to think about. On the one hand, AI agents like Operator are already surprisingly capable. I effectively had a tireless (if sometimes clumsy) research assistant at my disposal, one that gathered a substantial dataset while I focused elsewhere. It’s not hard to imagine how, in the very near future, such agents could become routine tools in schools and research offices. We might soon have AI “colleagues” that handle everything from compiling financial reports to scheduling meetings or scouring archives for relevant literature—the mundane yet essential tasks that consume so much of our time. For educators and school leaders, this could mean reclaiming hours in the day to spend on higher-level thinking, relationship-building, or creative work. The productivity implications are immense; indeed, some predict a “digital divide” emerging between those who leverage AI for knowledge work and those who don’t or can’t, giving the adopters an advantage. In other words, if you have an AI agent at your side and I don’t, you might lap me in terms of output and insights.
On the other hand, the experiment also underscored that we’re not close to a point of AI autonomy just yet—and perhaps that’s a good thing. Operator needed extensive supervision and course correction. The o1 model, while smart, still required me to guide its analysis and verify its results. These tools are powerful but also error-prone. The implication for educators and researchers is that our role is evolving, not disappearing. We might spend less time on drudgery but more time on oversight, critical thinking, and fine-tuning what the AI produces. In a sense, we become project managers for our AI assistants: deciding what tasks to delegate, checking the work, and integrating the results into the bigger picture. This collaboration between human and AI is where the work lies for early adopters and future users. I felt that firsthand – without my intervention at key points, the data would have been messier. That’s a lesson worth sharing: even as we embrace agentic AI, our professional judgment and expertise remain irreplaceable… for now.
Looking ahead, I’m optimistic about the near future of agentic AI in education and research. The experience I had—with an AI agent doing hours of work for me—might soon be commonplace. Today, it was fetching financial data; tomorrow, it could be compiling lab results, gathering community surveys, or monitoring campus systems for routine maintenance needs. We’ll likely see rapid improvements in these agents’ accuracy and speed. The errors and slowdowns I encountered could be ironed out with the next model update (or the one after that). It’s very plausible that within a year or two, an Operator-like tool will execute such tasks flawlessly at ten times the speed, entirely in the background, and notify you only when it’s done or if a critical decision point arises.
Anthropic, the company behind the AI model Claude, recently released a timeline for their AI models. It shows that they expect to have functional agentic models available within this calendar year. Who knows if that’s realistic, but it’s interesting to see it in writing.
For those of us in the education sector, the impact could be profound. Imagine research departments that can do months of analysis in days or administrative teams that rely on AI agents to prepare first drafts of budgets and reports. The provocative question is: are we ready for that shift? It requires trust in technology (which itself requires lots of discussion and caution), new skills (like crafting good prompts or verifying AI outputs), and a willingness to rethink traditional workflows. Some tasks that used to be rites of passage for junior staff might be entirely automated. How do we train future school leaders in an age when an “AI intern” is always on call? These are the conversations we need to start having now… along with which aspects of education are core to the human experience and should never be lost.
In the end, my little adventure with OpenAI’s Operator was both humbling and inspiring. Humbling, because it exposed the current limitations—reminding me that, no, AI hasn’t completely revolutionized everything yet. Inspiring, because even in its nascent form, this tool delivered real value and a glimpse of how our work could be transformed. The takeaway for the Fullstack Educator community is this: don’t dismiss these AI agents as mere hype, but don’t blindly trust them either. They are becoming capable co-pilots, and the educators who learn to fly with these co-pilots will likely be ahead of the curve in preparing their students for the future (even if they teach their students why NOT to use them at school). As always, our job is to critically evaluate new tech and harness it for the good of our students. Agentic AI is knocking on the door—it’s up to us to open it thoughtfully, with eyes wide open to both the opportunities and the pitfalls. One browser bot collecting school budgets isn’t going to change the world, but it sure felt like a step into the future of how we might work, and that future is coming at us fast. Are we ready to embrace our AI assistants? The experiment continues…
Sources
Anthropic. Anthropic. Retrieved February 24, 2025, from https://www.anthropic.com/
Anthropic. Claude 3.7 Sonnet. Retrieved February 24, 2025, from https://www.anthropic.com/news/claude-3-7-sonnet
OpenAI. Introducing Operator. Retrieved February 24, 2025, from https://openai.com/index/introducing-operator/
OpenAI. o1 and new tools for developers. Retrieved February 24, 2025, from https://openai.com/index/o1-and-new-tools-for-developers/
OpenAI. OpenAI. Retrieved February 24, 2025, from https://openai.com/
OpenAI. OpenAI o3-mini. Retrieved February 24, 2025, from https://openai.com/index/openai-o3-mini/
OpenAI. Operator system card. Retrieved February 24, 2025, from https://openai.com/index/operator-system-card/
OpenAI. Operator. Retrieved February 24, 2025, from https://operator.chatgpt.com/