Computer use is an experimental capability that allows the AI to interact with graphical interfaces — clicking, typing, scrolling, and navigating — as if it were a human user. This enables automation of tasks that don't have a programmatic API.
Experimental: Computer use is in preview. Behavior may be unpredictable on complex interfaces. Test thoroughly before using in production workflows.
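To enable computer use, add the `computer_use_preview` tool to the request's `tools` array:

```json
{
  "organization_id": "org_your_org_id",
  "assistant_id": "asst_abc123",
  "tools": [{"type": "computer_use_preview"}],
  "inputs": [{"role": "user", "content": "Open the settings page and take a screenshot."}]
}
```

The model then works in a loop:

- The model receives a screenshot of the current screen state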
- It decides which action to take (click, type, scroll, etc.)
- The action is executed and a new screenshot is captured
- This loop continues until the task is complete or the model signals it's done
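The loop above can be sketched from the client's side as follows. This is a minimal sketch under stated assumptions, not the service's API: `Action`, `run_loop`, and the `capture_screenshot`, `decide`, and `execute` callables are hypothetical stand-ins for your screenshot code, the model call, and your input automation. The `max_steps` cap is an added safeguard against a loop that never signals completion.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One model-chosen step; `type` mirrors names like "click" or "type" (hypothetical shape)."""
    type: str
    params: dict = field(default_factory=dict)


def run_loop(
    capture_screenshot: Callable[[], bytes],
    decide: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> list[Action]:
    """Screenshot -> decide -> execute, repeated until `decide` returns None or max_steps is reached."""
    history: list[Action] = []
    screen = capture_screenshot()
    for _ in range(max_steps):
        action = decide(screen)   # hypothetical model call
        if action is None:        # model signals the task is done
            break
        execute(action)           # perform the click/type/scroll locally
        history.append(action)
        screen = capture_screenshot()  # fresh state for the next decision
    return history


# Usage with fakes: the "model" clicks, types, then stops.
plan = iter([Action("click", {"x": 100, "y": 200}), Action("type", {"text": "hello"}), None])
executed: list[Action] = []
steps = run_loop(lambda: b"fake-png", lambda screen: next(plan), executed.append)
print([a.type for a in steps])  # ['click', 'type']
```

Passing the three steps in as callables keeps the loop itself testable without a real screen or model.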
Each action cycle produces a `computer_use_preview` output item:

```json
{
  "output": [
    {
      "type": "computer_use_preview",
      "id": "cu_abc123",
      "status": "completed",
      "action": {
        "type": "screenshot"
      },
      "output": {
        "image_url": "https://..."
      }
    }
  ]
}
```

The following action types are supported:

| Action | Description |
|---|---|
| `screenshot` | Capture the current screen |
| `click` | Click at screen coordinates |
| `double_click` | Double-click at coordinates |
| `type` | Type text at the current cursor position |
| `key` | Press a keyboard key or shortcut |
| `scroll` | Scroll the page |
| `drag` | Click and drag between coordinates |
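A client typically needs to pull the `action` objects out of the response and route each one to local code. The sketch below assumes the response shape from the example output item; `extract_actions` and `dispatch` are hypothetical helpers, and because the parameters each action carries (such as the coordinates for `click`) are not documented here, handlers receive the raw `action` dict.

```python
import json

# Action names from the supported-actions table.
SUPPORTED_ACTIONS = {"screenshot", "click", "double_click", "type", "key", "scroll", "drag"}


def extract_actions(response_json: str) -> list[dict]:
    """Collect the `action` object of every computer_use_preview output item."""
    response = json.loads(response_json)
    actions = []
    for item in response.get("output", []):
        if item.get("type") != "computer_use_preview":
            continue  # skip other output item types
        action = item.get("action", {})
        if action.get("type") not in SUPPORTED_ACTIONS:
            raise ValueError(f"unsupported action: {action.get('type')!r}")
        actions.append(action)
    return actions


def dispatch(action: dict, handlers: dict) -> None:
    """Route an action to the local handler registered for its type (hypothetical)."""
    handlers[action["type"]](action)


# Usage with the example response item.
sample = """{"output": [{"type": "computer_use_preview", "id": "cu_abc123",
  "status": "completed", "action": {"type": "screenshot"},
  "output": {"image_url": "https://..."}}]}"""
log = []
for action in extract_actions(sample):
    dispatch(action, {"screenshot": lambda a: log.append("captured")})
print(log)  # ['captured']
```

Rejecting unknown action types up front means a new or misspelled action fails loudly instead of being silently ignored.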
Computer use supports two modes:

| Mode | Behavior |
|---|---|
| `on` | Enable computer use |
| `off` | Disable computer use |
Common use cases:

- Automating desktop applications without an API
- Web scraping and form filling
- Testing UI workflows
- Capturing screenshots of dynamic content
Computer use executes real actions on the host system. Always:
- Run in an isolated, sandboxed environment
- Review the model's planned actions before execution when possible
- Set strict instructions about which applications and URLs are allowed
- Never give the model access to sensitive systems or credentials
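Instructions to the model are one layer of control; a client-side check before each action runs is another. The sketch below is a minimal example, assuming you control the code that executes actions: `ALLOWED_HOSTS`, `is_url_allowed`, and `with_approval` are hypothetical names, and the hosts are placeholders.

```python
from urllib.parse import urlparse

# Placeholder allowlist (hypothetical): list only the hosts the task needs.
ALLOWED_HOSTS = {"docs.example.com", "intranet.example.com"}


def is_url_allowed(url: str, allowed_hosts: set = ALLOWED_HOSTS) -> bool:
    """Allow only http(s) URLs whose exact hostname is allowlisted.

    Exact matching rejects look-alikes such as docs.example.com.evil.net,
    which a prefix or substring check would let through.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in allowed_hosts


def with_approval(execute, approve):
    """Wrap an executor so each action runs only if `approve(action)` is truthy."""
    def guarded(action: dict) -> bool:
        if not approve(action):
            return False  # rejected: nothing is executed
        execute(action)
        return True
    return guarded


print(is_url_allowed("https://docs.example.com/page"))       # True
print(is_url_allowed("https://docs.example.com.evil.net/"))  # False
print(is_url_allowed("file:///etc/passwd"))                  # False

ran = []
guarded = with_approval(ran.append, approve=lambda a: a["type"] != "key")
guarded({"type": "click"})
guarded({"type": "key"})  # blocked by the policy
print(ran)  # [{'type': 'click'}]
```

In practice `approve` could prompt a human reviewer or apply a fixed policy; the wrapper only guarantees that nothing executes without passing it.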
Related pages:

- System Tools Overview: all available built-in tools
- Computer Call Output: handling computer action outputs
- Agentic Workflows: multi-step automation