
Testing my website with Gemini 2.5 Computer Use and Playwright

2025-10-07 · 5 min read

Google released Gemini 2.5 Computer Use today. It's a model that can look at screenshots and tell you where to click. Like, actually look at a webpage and understand "the blog link is in the top navigation."

I have 8 languages on my site. Sometimes when you switch languages, the page loads empty. No error, just blank. I've been trying to catch when this happens, but testing manually is a pain.

So I decided to let the AI do it. With Claude Code's help, I got a working test in under 15 minutes. I just gave it the Gemini 2.5 Computer Use docs and told it what I needed. It wrote the test.

How it works

The Gemini 2.5 Computer Use model looks at screenshots and generates function calls. Your code captures a screenshot of the browser and sends it to the model; the model analyzes it and returns a function call like "click_at" with coordinates.

The code executes that function call (using Playwright). After clicking, it captures a new screenshot and sends it back to the model as a function response. The model uses this to decide the next action.
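That "send the screenshot back as a function response" step has a specific message shape. This sketch assumes the standard Gemini function-calling convention (a `functionResponse` part plus an `inlineData` part for the screenshot); the exact response payload keys are my assumption and worth checking against the Computer Use docs:

```javascript
// Build the follow-up message sent after executing an action.
// The `url` field in the response payload is an assumption.
function buildFunctionResponse(actionName, pageUrl, screenshotBase64) {
  return {
    role: 'user',
    parts: [
      // Tell the model its last function call was executed
      { functionResponse: { name: actionName, response: { url: pageUrl } } },
      // Attach the fresh screenshot so it can pick the next action
      { inlineData: { mimeType: 'image/png', data: screenshotBase64 } },
    ],
  };
}
```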

I wanted to test if my translations actually work. So I told it:

  • Go to the blog page
  • Click on the first blog post you see
  • Remember the title and first paragraph
  • Switch language to Spanish, German, and French
  • For each language, check if the content changed and is not empty
  • Report any translation issues found
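In code, that checklist is just one plain-English task string handed to the model (the exact wording below is illustrative, not my production prompt):

```javascript
// The whole "test spec" is a natural-language instruction
const task = [
  'Go to the blog page and click on the first blog post you see.',
  'Remember the title and first paragraph.',
  'Switch the language to Spanish, German, and French.',
  'For each language, check that the content changed and is not empty.',
  'Report any translation issues you find.',
].join(' ');
```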

Setting it up

You need three things:

  • A Gemini API key
  • Playwright, to drive the browser
  • The Google GenAI SDK, to call the model

The code is basically a loop:

```javascript
let done = false;
while (!done) {
  // Send the latest screenshot plus the task to the model
  const response = await client.models.generateContent({
    model: 'gemini-2.5-computer-use-preview-10-2025',
    contents: [screenshot, task],
  });

  // Pull the function call (e.g. click_at) out of the response
  const action = response.functionCalls?.[0];

  if (!action) {
    done = true; // no function call means the AI says it's finished
    break;
  }

  // Click where the AI said to click (coordinates still need scaling)
  await page.mouse.click(action.args.x, action.args.y);

  // Take a fresh screenshot for the next turn
  screenshot = await page.screenshot();
}
```

The model returns coordinates on a 1000x1000 normalized grid. You scale them to your actual screen dimensions. I'm using 1440x900 (the recommended size).

```javascript
const actualX = Math.floor((x / 1000) * 1440);
const actualY = Math.floor((y / 1000) * 900);
```
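Wrapped up as a helper, with clamping added as a defensive touch of my own (the viewport size defaults to the post's 1440x900):

```javascript
// Convert the model's 0-1000 normalized coordinates to real pixels.
function denormalize(x, y, width = 1440, height = 900) {
  const clamp = (v) => Math.max(0, Math.min(v, 999)); // guard stray values
  return {
    x: Math.floor((clamp(x) / 1000) * width),
    y: Math.floor((clamp(y) / 1000) * height),
  };
}
```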

Watching it run

I ran it with the browser visible. It's kind of wild to watch.

Turn 1: Looks at my homepage. "I see a Blog link at the top." Clicks it.

Turn 2: On the blog page. "First post is about Kindle AI." Clicks it.

Turn 3: Reading the post. Remembers the title: "How to use AI directly on your Kindle."

Turn 4: Finds the language switcher (that little globe icon). Clicks it.

Turn 5: Menu opens. Clicks "Español."

Turn 6: Spanish version loads. "Content changed. This is working."

Turn 7: Switches to German. Clicks the language switcher again.

Turn 8: Page loads in German. "Wait, this page looks almost empty. Just shows a dot. Content is missing." Stops immediately.

It found the bug (you can read more about what caused the empty translated pages here). Then it generated a detailed report:

📊 FINAL REPORT
⚠️ Found 1 issue(s):

[ES] - 1 issue(s):
Post: Cómo usar IA directamente en tu Kindle
URL: .../traer-ia-a-kindle-como-construi-chatgpt-para-lectores-electronicos
Issue: Page loaded but content area is empty, just showing a bullet point.
Steps: Homepage → Blog → Clicked post → Content missing

The AI caught the exact issue I was looking for. It even described what it saw (empty page with just a dot) and listed the steps to reproduce.

Problems I hit

First run, I got a 503 error. The API is in preview and gets overloaded. Ran it again, worked fine.

The model supports 13 actions: open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, and drag_and_drop. I only implemented 7 for this test. For a real test suite you'd want all of them.
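A fuller implementation boils down to a dispatcher from action names to Playwright calls. This sketch covers a handful of the 13 actions; the argument names (`x`, `y`, `text`, `url`) are my assumptions about the schema, and `page` is a Playwright Page:

```javascript
// Dispatch one model action to Playwright. Only a subset of the 13
// actions is sketched; check the argument schema against the docs.
async function executeAction(page, action) {
  const { x, y, text, url } = action.args ?? {};
  switch (action.name) {
    case 'click_at':
      return page.mouse.click(x, y);
    case 'hover_at':
      return page.mouse.move(x, y);
    case 'type_text_at':
      await page.mouse.click(x, y); // focus the field first
      return page.keyboard.type(text);
    case 'navigate':
      return page.goto(url);
    case 'wait_5_seconds':
      return page.waitForTimeout(5000);
    case 'go_back':
      return page.goBack();
    default:
      throw new Error(`Unsupported action: ${action.name}`);
  }
}
```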

Each screenshot sent to the API is 50-100 KB. My test sent about 30 screenshots. That adds up if you're testing a lot.
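A quick back-of-envelope check using those numbers:

```javascript
// Rough upload estimate for one run, using the post's own figures
const perScreenshotKB = 75; // midpoint of the 50-100 KB range
const screenshots = 30;
const totalMB = (perScreenshotKB * screenshots) / 1024;
console.log(totalMB.toFixed(1) + ' MB per run'); // prints "2.2 MB per run"
```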

Why this is useful

A normal Playwright test for this flow looks like this:

```javascript
await page.click('[data-testid="lang-switcher"]');
await page.click('[data-lang="es"]');
expect(await page.textContent('h1')).toBe('Cómo usar IA...');
```

With Computer Use you just say: "Switch to Spanish and check the title changed."

The AI test is slower and costs money. But:

  • Way faster to write
  • Doesn't break when you change CSS classes
  • Works like an actual user
  • Good for one-off checks

What else you could do with this

The model isn't limited to testing. The documentation mentions use cases like automating repetitive data entry, conducting research across websites, and filling out forms. After seeing this work, I thought of other applications:

  • "Compare prices for this product across these 5 websites"
  • "Fill out this job application form with my resume data"
  • "Find all blog posts about X topic and summarize them"
  • "Monitor this page daily and alert me when content changes"
  • "Extract all product details from this catalog into a spreadsheet"

Anything you'd do manually in a browser, you can automate with plain English instructions.

Try it

The Gemini API has a free tier for testing, though Computer Use uses the same pricing as Gemini 2.5 Pro ($1.25 per 1M input tokens). All you need is an API key and a few lines of code.

Start simple: "Go to example.com and tell me the page title." Then build from there.

Model name is gemini-2.5-computer-use-preview-10-2025.
