Freddy supports audio input in multimodal requests, allowing models to understand spoken content, transcribe audio, and respond to voice messages.

Audio Input

Send audio as part of your request inputs using the audio_url content type:

{
  "organization_id": "org_your_org_id",
  "model": "gpt-4o",
  "inputs": [
    {
      "role": "user",
      "content": [
        {
          "type": "audio_url",
          "audio_url": {
            "url": "https://storage.example.com/audio/recording.mp3"
          }
        },
        {
          "type": "text",
          "text": "Transcribe and summarize this audio."
        }
      ]
    }
  ]
}

The model processes the audio and responds with text.
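
The request body shown above can also be assembled in code. The following is a minimal sketch: `build_audio_request` is a hypothetical helper name, not part of any Freddy SDK, and its field names simply mirror the JSON example.

```python
import json


def build_audio_request(organization_id, model, audio_url, prompt):
    """Assemble a request body with one audio part followed by one text part.

    Hypothetical helper: the field names mirror the JSON example in this section.
    """
    return {
        "organization_id": organization_id,
        "model": model,
        "inputs": [
            {
                "role": "user",
                "content": [
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_audio_request(
        "org_your_org_id",
        "gpt-4o",
        "https://storage.example.com/audio/recording.mp3",
        "Transcribe and summarize this audio.",
    )
    print(json.dumps(payload, indent=2))
```

Keeping the audio part and the text instruction in the same `content` list matches the example request, where both belong to a single user message.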

Supported Formats

Format  Extension  Notes
MP3     .mp3       Most common, recommended
WAV     .wav       Uncompressed, larger files
M4A     .m4a       Apple format
OGG     .ogg       Open-source format
WebM    .webm      Web streaming format
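
A client can screen file names against the formats listed above before sending a request. This is only a sketch: `has_supported_extension` is a hypothetical helper, and it assumes the extension is a reasonable proxy for the format. The service may inspect the actual file content instead.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the Supported Formats list in this section.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".webm"}


def has_supported_extension(url_or_path):
    """Return True if the URL or path ends in a listed audio extension.

    Hypothetical client-side check; query strings are ignored and the
    comparison is case-insensitive.
    """
    path = urlparse(url_or_path).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS
```

For example, `has_supported_extension("https://storage.example.com/audio/recording.mp3")` returns `True`, while a `.flac` file name returns `False`.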

Audio URL Requirements

  • The URL must be publicly accessible, or hosted on a Freddy-accessible storage endpoint
  • Maximum audio duration: 25 minutes per request
  • Maximum file size: 25 MB
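
The size limit can be checked locally before uploading. The sketch below assumes "25 MB" means 25 × 1024 × 1024 bytes; the source does not say whether the limit is binary or decimal, so adjust `MAX_AUDIO_BYTES` if needed. Both function names are hypothetical.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes. The documented limit is "25 MB".
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def fits_size_limit(num_bytes, max_bytes=MAX_AUDIO_BYTES):
    """Return True if a file of num_bytes is within the size limit."""
    return 0 <= num_bytes <= max_bytes


def file_fits_size_limit(path, max_bytes=MAX_AUDIO_BYTES):
    """Check a local file's size on disk against the limit."""
    return fits_size_limit(os.path.getsize(path), max_bytes)
```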

For long audio files, split them into segments and process each separately.
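
As a rough sketch of the splitting step, the helper below computes segment boundaries in seconds so that no segment exceeds the 25-minute limit. `segment_bounds` is a hypothetical name. Cutting the audio at those boundaries requires a separate audio tool and is not shown, and each resulting segment must still respect the 25 MB size limit.

```python
# 25 minutes, the maximum audio duration per request stated above.
MAX_SEGMENT_SECONDS = 25 * 60


def segment_bounds(total_seconds, max_seconds=MAX_SEGMENT_SECONDS):
    """Return (start, end) pairs in seconds that cover the whole recording.

    Every segment is at most max_seconds long; the last one may be shorter.
    """
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

A 60-minute recording (`segment_bounds(3600)`) yields three segments: two of 25 minutes and one of 10 minutes. Each segment can then be sent in its own request.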

Uploading Audio Files

Upload audio files to Freddy's file storage, then reference them by URL:

import os

import requests

API_KEY = os.environ["FREDDY_API_KEY"]

# Upload the file
with open("audio.mp3", "rb") as f:
    upload_response = requests.post(
        "https://api.aitronos.com/v1/files",
        headers={"X-API-Key": API_KEY},
        files={"file": f},
        data={"organization_id": "org_your_org_id", "purpose": "assistants"},
    )

# Stop here if the upload failed, before reading the response body
upload_response.raise_for_status()
file_data = upload_response.json()
audio_url = file_data["url"]

# Use in a request
response = requests.post(
    "https://api.aitronos.com/v1/model/response",
    headers={"X-API-Key": API_KEY},
    json={
        "organization_id": "org_your_org_id",
        "model": "gpt-4o",
        "inputs": [
            {
                "role": "user",
                "content": [
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                    {"type": "text", "text": "What is being discussed?"},
                ],
            }
        ],
    },
)
response.raise_for_status()

Use Cases

  • Transcription — Convert spoken audio to text
  • Meeting summaries — Summarize recorded discussions
  • Voice Q&A — Answer questions about spoken content
  • Audio analysis — Identify topics, sentiment, or key points

Model Support

Audio input is supported by models with multimodal capabilities. Check that your selected model supports audio by reviewing Available Models.