Freddy supports audio input in multimodal requests, allowing models to understand spoken content, transcribe audio, and respond to voice messages.
Send audio as part of your request inputs using the audio_url content type:
```json
{
  "organization_id": "org_your_org_id",
  "model": "gpt-4o",
  "inputs": [
    {
      "role": "user",
      "content": [
        {
          "type": "audio_url",
          "audio_url": {
            "url": "https://storage.example.com/audio/recording.mp3"
          }
        },
        {
          "type": "text",
          "text": "Transcribe and summarize this audio."
        }
      ]
    }
  ]
}
```

The model processes the audio and responds with text.
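If you assemble request bodies in code, a small helper keeps the structure above consistent across calls. This is an illustrative sketch: `build_audio_request` is a hypothetical helper name, not part of any Freddy SDK.

```python
def build_audio_request(org_id: str, model: str, audio_url: str, prompt: str) -> dict:
    """Build a multimodal request body with one audio_url part and one text part.

    Illustrative helper only; mirrors the JSON shape shown above.
    """
    return {
        "organization_id": org_id,
        "model": model,
        "inputs": [
            {
                "role": "user",
                "content": [
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }
```

The returned dict can be passed directly as the JSON body of the request.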
| Format | Extension | Notes |
|---|---|---|
| MP3 | .mp3 | Most common, recommended |
| WAV | .wav | Uncompressed, larger files |
| M4A | .m4a | Apple format |
| OGG | .ogg | Open source format |
| WebM | .webm | Web streaming format |
- The URL must be publicly accessible or hosted on a Freddy-accessible storage endpoint
- Maximum audio duration: 25 minutes per request
- Maximum file size: 25 MB
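You can catch format and size violations locally before sending a request. The sketch below is illustrative (the helper name is hypothetical and the constants mirror the limits above); duration is not checked here, since that requires decoding the audio.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB per-request limit

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file's extension or size violates the limits above."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"File exceeds 25 MB: {p.stat().st_size} bytes")
```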
For long audio files, split them into segments and process each separately.
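One way to do the split is to compute fixed-length segment boundaries and cut the file with an external tool such as ffmpeg. The helper below is a sketch, not part of the API; 20-minute chunks are chosen to stay under the 25-minute limit.

```python
def segment_bounds(total_seconds: float, max_seconds: float = 20 * 60):
    """Yield (start, end) offsets in seconds covering the whole file in order."""
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        yield (start, end)
        start = end

# Each (start, end) pair can then be cut without re-encoding, e.g.:
#   ffmpeg -i long.mp3 -ss START -to END -c copy segment_N.mp3
```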
Upload audio files to Freddy's file storage, then reference them by URL:
```python
import os

import requests

# Upload the file
with open("audio.mp3", "rb") as f:
    upload_response = requests.post(
        "https://api.aitronos.com/v1/files",
        headers={"X-API-Key": os.environ["FREDDY_API_KEY"]},
        files={"file": f},
        data={"organization_id": "org_your_org_id", "purpose": "assistants"},
    )

file_data = upload_response.json()
audio_url = file_data["url"]

# Use the uploaded file in a request
response = requests.post(
    "https://api.aitronos.com/v1/model/response",
    headers={"X-API-Key": os.environ["FREDDY_API_KEY"]},
    json={
        "organization_id": "org_your_org_id",
        "model": "gpt-4o",
        "inputs": [
            {
                "role": "user",
                "content": [
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                    {"type": "text", "text": "What is being discussed?"},
                ],
            }
        ],
    },
)
```

- Transcription — Convert spoken audio to text
- Meeting summaries — Summarize recorded discussions
- Voice Q&A — Answer questions about spoken content
- Audio analysis — Identify topics, sentiment, or key points
Audio input is supported by models with multimodal capabilities. Check that your selected model supports audio by reviewing Available Models.
- Inputs and Outputs — Full multimodal input reference
- Images and Vision — Image input and generation
- Files API — Uploading files for use in requests
- Available Models — Model capability guide