AI agents: experiences with Manus, Operator and more

AI agents / agentic AIs have been rapidly evolving, and I think they're going to play a pivotal role in our path towards AGI.

Before we move further, it's helpful to talk about what we mean by AI agents in the first place.

What are AI agents?

Agentic AI is a term that's thrown around a lot these days, often as a marketing term. Let's clarify what it means.

A pretty good definition: an AI agent is an AI you can delegate multi-step tasks to. That is, you're moving beyond a simple input <> output flow. Traditional chatbots (traditional at this point, anyway) like ChatGPT have worked in a question-answering format: you send a message, you get a message back. The chatbot couldn't do things before answering you, and it couldn't perform actions on your behalf.

Then we gave the AI tools, starting with the ability to perform web searches. Now the AI could decide when it needed to reference external sources of information to better answer your query, and instead of a single request-response exchange, it could perform multiple steps internally before replying to you: do one search, maybe another, think about the results, then articulate a response.

Even without increasing the raw model intelligence, this made AI models more capable and useful in a broader range of scenarios and use cases. The idea is basically this: Giving AIs access to tools and resources makes them broadly more capable and useful.
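To make that loop concrete, here's a minimal sketch in Python. Everything in it is a hypothetical stand-in (`call_llm` and `web_search` are fakes, not any real vendor's API); it only shows the shape of the think-act-think cycle.

```python
# A minimal, illustrative agentic loop. call_llm() and web_search()
# are hypothetical stand-ins, not any real vendor's API.

def web_search(query):
    # Stand-in for a real search tool.
    return f"(top results for '{query}')"

def call_llm(messages):
    # Stand-in for a model call: ask for one search, then answer
    # using the tool results it got back.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "query": messages[0]["content"]}
    return {"type": "answer",
            "content": f"Answer based on {messages[-1]['content']}"}

def run_agent(user_message, max_steps=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply["type"] == "tool_call":          # model wants a tool
            result = web_search(reply["query"])
            messages.append({"role": "tool", "content": result})
            continue                              # think again with results
        return reply["content"]                   # final answer
    return "Stopped after max_steps tool calls."

print(run_agent("good hotels in Lisbon under $150"))
```

Real agent frameworks add proper tool schemas, error handling, and step budgets, but the control flow is essentially this loop.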

I like to think of this in terms of my favourite sea creature, the octopus.

Octopuses are considered to be very smart creatures, and part of the reason we consider them intelligent is their ability to use tools. (We also gauge the progress of ancient human civilisations partly by the kinds of tools they used.) Tools let an octopus interact with objects and its environment to achieve things that raw intelligence alone can't. There's thinking, and there's doing. Intelligence and the ability to use tools compound each other.

When humans harnessed fire, invented the wheel, and discovered electricity, our development as a species skyrocketed. Each tool didn't just add to our capabilities; it multiplied them. I think agentic AI represents a similar inflection point. In its early stages it might be underestimated by some, but it could unlock a quantum leap in AI capabilities.

Foundation models vs. Agentic architecture

One fascinating observation is that the performance gains from agentic architecture far exceed what we get from model improvements alone. Upgrading from GPT-4 to Claude 3.5 might give you a 10-30% improvement, but adding agentic capabilities can give you 10-30x improvements. The foundation model sets the ceiling on what can be accomplished, but the agentic architecture determines how close you can get to that ceiling.

My personal experiences with agentic AI

OpenAI Operator: The promise that underwhelmed

Rating: 4/10
Key Strengths: Autonomous multi-step browsing and web-page interactions
Weaknesses: Can't handle complex tasks, shallow outputs, issues with interpreting queries

A few months back, I started playing around with OpenAI Operator. 

Operator is OpenAI's browser-use research preview. It can do multi-step tasks, like checking dozens of hotel listings to find something within your budget that matches your preferences and travel window. And it can interact with these sites, not just read them: it can click buttons, fill forms, and more.

The promise is amazing. I could tell it to find relevant Reddit threads for me to comment about my products on, or to go through my emails and shortlist the most relevant ones. 

That's the theory. In practice, OpenAI fails to deliver substantive results (yet).

If you have tried it and been underwhelmed, I wouldn't blame you; the blame lies with OpenAI. But don't let your experience with Operator colour your perception of the potential of AI agents.

Operator ends up being lazy, putting in the bare minimum of effort, misunderstanding requests, or misinterpreting the outcomes of its research.

I hope it gets better over time; in its current form it's not super useful, more of a fun toy.

Note: I started writing this post almost two months ago. Over the past month, there have apparently been a few updates to Operator, and I expect the product has improved. It is still limited to browser use only though.

Manus: Everything Operator promised to be and more

Rating: 7/10
Key Strengths: Multi-step, in-depth research
Weaknesses: Code generation, sometimes cost

Manus is everything Operator promised to be and more. Where Operator feels lazy and superficial, Manus demonstrates genuine understanding and persistence. It will work through tasks over many more steps than Operator, tackling genuinely complex challenges.

Manus also has access to more than just a browser. It has a code editor, it can read and write files, run code, and perform API-driven searches (Operator's browser-only approach is inherently slower). Perhaps most impressively, it uses multiple agents and can decompose your task into subtasks. I'm not certain, but I think it sometimes even runs these subtasks in parallel.
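Manus hasn't published its internals, so the following is speculation on my part, but the planner/sub-agent pattern it appears to use looks roughly like this sketch, where `plan` and `run_subtask` are hypothetical stand-ins:

```python
# Sketch of a planner / sub-agent pattern. plan() and run_subtask()
# are hypothetical stand-ins; Manus's actual internals aren't public.
from concurrent.futures import ThreadPoolExecutor

def plan(task):
    # A planner model would decompose the task; hard-coded here.
    return ["study the blog", "do keyword research", "map the SEO landscape"]

def run_subtask(subtask):
    # Each subtask would go to its own sub-agent with its own tools
    # (browser, code editor, file system, search APIs, ...).
    return f"findings from '{subtask}'"

def run_task(task):
    subtasks = plan(task)
    # Independent subtasks could run in parallel...
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_subtask, subtasks))
    # ...and a final step merges everything into one deliverable.
    return "\n".join(results)

print(run_task("build me a content calendar"))
```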

I've given it genuinely challenging tasks. For instance, I told it to:

  • Study my products, personal blog, and product blogs
  • Research my writings across the internet
  • Understand the SEO landscape for my products
  • Do keyword research
  • Find the intersection between my interests and content that could generate revenue
  • Create a content calendar

This task took dozens of steps, over 30 minutes, and cost me around $10 to run. You might find that expensive, or you might look at what it would cost to get another human to do the same depth of work and find it insanely cheap.

I've been very impressed with Manus. I'm on their $40 subscription, and I've still topped up my account three times on top of it. The results speak for themselves.

There are certainly tasks that Manus fails at. Website generation, in particular, often comes out poorly; Replit does a much better job there. Manus will do good research but spit out broken code, and the coding subagent seems to lose sight of the bigger picture. All this is to say it's not a perfect tool and it's not right for every use case, but when it works, it can be magical.

Devin: The AI software engineer

Rating: 4/10
Key Strengths: Fully autonomous, hands-off code changes
Weaknesses: Can't handle complex tasks, cost

Devin is an AI software engineer. It's not super capable just yet, but what drew me to it is the promise of what's possible and where it's headed.

Devin can do multi-step tasks: it can test its changes in a browser, run linting and unit tests, and do all of this while you stay largely hands-off.

You ask it to do something via a Slack message, and in 10-30 minutes (depending on project and task complexity), it'll message you back with a GitHub PR link.

Right now, in my experience, it fails at anything non-trivial. But trivial stuff is often exactly what gets pushed off and procrastinated on, so it's nice to be able to pawn those boring tasks off on Devin.

Devin gets very expensive very quickly, though. Nowhere near the cost of a human engineer, but the charges rack up fast for relatively simple tasks, so keep that in mind.

Replit: Full-stack from natural language

Rating: 7/10
Key Strengths: Fully autonomous app development with end-to-end integration
Weaknesses: Poor customer support, not production-ready

Replit Agent was my first taste of AI agents back in 2024.

It's a great product that can develop and deploy full stack web apps, truly end to end. That means it can set up a DB, environment variables, servers, everything you need to go from idea to live on the internet.

My experiences with Replit the company, however, have been disappointing: poor support, unexpectedly high billing that shouldn't have happened, and one total data loss when the agent decided to run a DROP TABLE command on my production DB, after which Replit told me they don't maintain backups.

All that said, in terms of creating functional web apps, it does a superior job to Manus and Lovable, other tools in this category I've tried.

I've used it to create web apps that automate boring, loathsome tasks I'd always wanted to automate but would never have put in the effort to code myself.

That low-effort, hands-off way of building apps reduces friction so much that it opens up new possibilities.

Cline: The sweet spot of human-AI collaboration (and also Claude Code)

Rating: 8/10
Key Strengths: Superior code quality, highly capable, great UX
Weaknesses: API costs rack up quickly, requires more user involvement

Cline is something that's more "semi-agentic", as it's tightly human-in-the-loop. It is a far superior version of Cursor in terms of quality and flexibility of output, and I think it showcases how a tight collaboration between humans and AI can achieve superior results to either approach alone.

Cline can do multi-file edits and multi-step reasoning. It can decompose tasks, auto-correct mistakes and linting errors, run unit tests and actual browser tests (rudimentary).
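Under the hood, this style of tool is essentially a propose/check/retry loop. Here's a rough sketch of that loop; `propose_edit` and `apply_edit` are hypothetical stand-ins for the model call and the file writes, and the `ruff`/`pytest` commands assume a Python project:

```python
# Sketch of the propose-edit / run-checks / self-correct loop that
# tools like Cline run. propose_edit() and apply_edit() are
# hypothetical stand-ins for the model call and the file writes.
import subprocess

def propose_edit(task, feedback=None):
    # A model would generate a patch here, using any failure output.
    return f"patch for: {task} (feedback: {feedback})"

def apply_edit(patch):
    # Write the proposed changes to disk.
    pass

def checks_pass():
    # Run the project's linter and tests; nonzero exit means failure.
    lint = subprocess.run(["ruff", "check", "."], capture_output=True)
    tests = subprocess.run(["pytest", "-q"], capture_output=True)
    return lint.returncode == 0 and tests.returncode == 0

def agentic_edit(task, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        apply_edit(propose_edit(task, feedback))
        if checks_pass():
            return True                    # edits accepted
        feedback = "lint or tests failed"  # try again with the failure
    return False                           # hand back to the human

# agentic_edit("rename the config flag")  # would run lint + tests
```

In Cline's case you also approve or reject steps along the way, which is exactly the human-in-the-loop part.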

It's cut my dev time by a good 70%, I'd wager. At least when it comes to the actual writing code part. Even for planning, architecture, etc., it can be an excellent thinking partner.

Claude Code

Rating: 9/10
Key Strengths: Fixed cost, very active development, very capable agentic coding tool powered by the best coding models
Weaknesses: Cost with API, lack of GUI, requires a lot of supervision and user involvement

Lately I've switched to Claude Code. With Cline, I found myself getting anxious watching the API cost tick upwards with each request; Claude Code is bundled into Claude Max for a fixed cost. It works very similarly to Cline, just as a CLI tool instead of a GUI, though the recently added VS Code integration bridges that gap nicely.

Everything that applies to Cline above applies to Claude Code. It reduces friction a lot and enabled me to build multiple apps from scratch within a week (thanks to the combination of its capabilities and the fixed cost). In general, I think fixed costs enable more experimentation and creative freedom. See my post on LLM subscriptions vs. APIs for more on the value-for-money question.

OpenAI Codex

Rating: 5/10
Key Strengths: Fully autonomous, hands-off code changes, integration in mobile app
Weaknesses: Can't handle complex tasks, slow follow-ups

Codex is an OpenAI product similar to Devin. I haven't used it much yet, but it's integrated into their mobile app and, like Devin, can go from message to pull request.

From what I've heard, it's also good for simple requests but not complex ones, where Cline and company still shine. Still, the low-friction workflow can feel very freeing.


What makes an AI agent great?

There are many agentic AI tools out there. Some good, some bad.

So what makes the good ones good?

I think the quality and assortment of tools available to the agent are one crucial part of it. That's what makes Manus so much more capable than Operator.

Secondly, I'd say it's the model. Manus uses a mixture-of-models approach, with the setup relying on Claude and multiple fine-tuned Qwen models. They've fine-tuned smaller models for specific tasks, and it's widely reported that fine-tuning a smaller model for a specific task can let it beat much larger general-purpose models by a healthy margin.

I think this mixture-of-models approach is important: it's about using the right tool for the job. Different models excel in different areas, and we should leverage their strengths.
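As a concrete sketch of "right tool for the job": a thin router that sends each task type to whichever model handles it best. The model names and `call_model` here are illustrative only, not Manus's actual configuration:

```python
# Sketch of a mixture-of-models router. Model names and call_model()
# are illustrative; this is not Manus's actual configuration.
ROUTES = {
    "planning":   "large-general-model",    # broad reasoning
    "coding":     "frontier-coding-model",  # strongest available coder
    "extraction": "small-finetuned-model",  # cheap, specialised, fast
}

def call_model(model, prompt):
    # Stand-in for a real inference API call.
    return f"[{model}] response to: {prompt}"

def route(task_type, prompt):
    # Fall back to the general model for unknown task types.
    model = ROUTES.get(task_type, "large-general-model")
    return call_model(model, prompt)

print(route("extraction", "pull the main keywords from this page"))
```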

Conclusion

We are at an inflection point in AI capabilities. Looking back at all these tools, the pattern is clear. We've moved from AI that can answer questions to AI that can actually do things. That's a massive shift that is hard to overstate.

Sure, we're early. Operator is frustrating, Devin can't handle complex tasks, Replit might nuke your DB, Manus might generate broken websites, etc. But even with all these limitations, these tools are letting me get real work done faster than ever before.

What excites me the most here is that this is just the beginning. As models improve and smarter agentic architectures are developed, the improvements will keep on compounding.

When tools can spin up tools, agents can spin up specialized sub-agents (already happening in Claude Deep Research), and the friction of going from idea to execution approaches zero, the world will be forever changed. We are marching into a future beyond our wildest imaginations.

Have you used any agentic AI tools yet? If not, give one of the tools listed above a shot and you might be surprised at what is already possible.