Computer use is an experimental capability that lets the AI interact with graphical interfaces (clicking, typing, scrolling, and navigating) as if it were a human user. This enables automation of tasks that have no programmatic API.

Experimental: Computer use is in preview. Behavior may be unpredictable on complex interfaces. Test thoroughly before using in production workflows.

Enabling Computer Use

{
  "organization_id": "org_your_org_id",
  "assistant_id": "asst_abc123",
  "tools": [{"type": "computer_use_preview"}],
  "inputs": [{"role": "user", "content": "Open the settings page and take a screenshot."}]
}
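
As a minimal sketch, the request body above could be assembled in Python as follows. The helper name build_computer_use_request is hypothetical, and the endpoint and authentication are not specified on this page, so the example stops at producing the JSON payload.

```python
import json


def build_computer_use_request(organization_id: str, assistant_id: str, prompt: str) -> dict:
    """Return a request body that enables the computer_use_preview tool.

    Hypothetical helper: the field names mirror the JSON example above,
    but how the body is sent (URL, headers) is not covered here.
    """
    return {
        "organization_id": organization_id,
        "assistant_id": assistant_id,
        "tools": [{"type": "computer_use_preview"}],
        "inputs": [{"role": "user", "content": prompt}],
    }


body = build_computer_use_request(
    "org_your_org_id",
    "asst_abc123",
    "Open the settings page and take a screenshot.",
)
# Serialize the body so it can be passed to whichever HTTP client you use.
payload = json.dumps(body, indent=2)
```

Keeping the body in a plain dict until the last moment makes it easy to add or remove tools before serializing.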

How It Works

  1. The model receives a screenshot of the current screen state
  2. It decides which action to take (click, type, scroll, etc.)
  3. The action is executed and a new screenshot is captured
  4. This loop continues until the task is complete or the model signals it's done

Each action cycle produces a computer_use_preview output item.
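
The loop described above can be sketched in Python roughly as follows. This is an illustration only: capture_screenshot, choose_action, and execute_action are hypothetical callables that you would supply, and the max_steps guard is an added assumption rather than documented behavior.

```python
from typing import Any, Callable, Optional


def run_action_loop(
    capture_screenshot: Callable[[], Any],
    choose_action: Callable[[Any], Optional[dict]],
    execute_action: Callable[[dict], None],
    max_steps: int = 20,
) -> list:
    """Run the screenshot -> decide -> act cycle and return the executed actions.

    choose_action stands in for the model: it receives the latest
    screenshot and returns an action dict, or None to signal it is done.
    """
    history = []
    screenshot = capture_screenshot()  # step 1: initial screen state
    for _ in range(max_steps):  # assumed safety cap against runaway loops
        action = choose_action(screenshot)  # step 2: decide on an action
        if action is None:  # step 4: the model signals completion
            break
        execute_action(action)  # step 3: perform the action...
        history.append(action)
        screenshot = capture_screenshot()  # ...and capture the new state
    return history
```

Passing the three steps in as callables keeps the loop testable with stubs before it is connected to a real screen or model.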

Response Structure

{
  "output": [
    {
      "type": "computer_use_preview",
      "id": "cu_abc123",
      "status": "completed",
      "action": {
        "type": "screenshot"
      },
      "output": {
        "image_url": "https://..."
      }
    }
  ]
}
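
To illustrate how a client might read this structure, here is a small Python sketch that collects screenshot URLs from a parsed response. The function name screenshot_urls is hypothetical, and treating "completed" as the only status worth reading is an assumption, since other status values are not listed here.

```python
# Parsed response, mirroring the example above (the URL is elided in the source).
SAMPLE_RESPONSE = {
    "output": [
        {
            "type": "computer_use_preview",
            "id": "cu_abc123",
            "status": "completed",
            "action": {"type": "screenshot"},
            "output": {"image_url": "https://..."},
        }
    ]
}


def screenshot_urls(response: dict) -> list:
    """Return image URLs from completed computer_use_preview output items."""
    urls = []
    for item in response.get("output", []):
        if item.get("type") != "computer_use_preview":
            continue  # skip other kinds of output items
        if item.get("status") != "completed":
            continue  # assumption: only completed items carry a usable image
        image_url = (item.get("output") or {}).get("image_url")
        if image_url:
            urls.append(image_url)
    return urls


urls = screenshot_urls(SAMPLE_RESPONSE)
```

Using .get() throughout means a missing or null field is skipped rather than raising a KeyError.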

Supported Actions

Action          Description
screenshot      Capture the current screen
click           Click at screen coordinates
double_click    Double-click at coordinates
type            Type text at the current cursor position
key             Press a keyboard key or shortcut
scroll          Scroll the page
drag            Click and drag between coordinates
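
If your own code executes the actions, it may help to reject anything outside the table above before acting on it. The sketch below assumes each action is a dict with a "type" key, as in the response example; validate_action is a hypothetical helper, and any further per-action fields (coordinates, text, and so on) are not specified here, so they are not checked.

```python
# Action types taken from the Supported Actions table.
SUPPORTED_ACTIONS = frozenset(
    {"screenshot", "click", "double_click", "type", "key", "scroll", "drag"}
)


def validate_action(action: dict) -> str:
    """Return the action type if it is supported, otherwise raise ValueError."""
    action_type = action.get("type")
    if action_type not in SUPPORTED_ACTIONS:
        raise ValueError(f"Unsupported action type: {action_type!r}")
    return action_type


checked = validate_action({"type": "screenshot"})
```

Failing loudly on an unknown type is a deliberate choice here: silently ignoring it could leave the screen in a state the model does not expect.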

Tool Modes

Mode    Behavior
on      Enable computer use
off     Disable computer use

Use Cases

  • Automating desktop applications without an API
  • Web scraping and form filling
  • Testing UI workflows
  • Capturing screenshots of dynamic content

Safety Considerations

Computer use executes real actions on the host system. Follow these precautions:

  • Run in an isolated/sandboxed environment
  • Review the model's planned actions before execution when possible
  • Set strict instructions about which applications and URLs are allowed
  • Never give the model access to sensitive systems or credentials
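
One way to back the URL restriction with code, rather than relying on instructions alone, is an allowlist check in the layer that executes navigation. The following is a sketch under stated assumptions: is_url_allowed and the example host are hypothetical, and requiring HTTPS is an added assumption, not a documented requirement.

```python
from urllib.parse import urlsplit


def is_url_allowed(url: str, allowed_hosts: set) -> bool:
    """Return True only for HTTPS URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # assumption: plain HTTP and other schemes are rejected
        return False
    # hostname is already lowercased by urlsplit; it is None when absent.
    host = parts.hostname or ""
    # Exact match, so "app.example.com.evil.net" does not pass as "app.example.com".
    return host in allowed_hosts


ALLOWED_HOSTS = {"app.example.com"}  # hypothetical allowlist
allowed = is_url_allowed("https://app.example.com/settings", ALLOWED_HOSTS)
```

Comparing the parsed hostname exactly, instead of searching the raw URL string, avoids lookalike hosts and URLs that embed an allowed name in the path or query.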