Perso Interactive SDK details (js)
The Perso Interactive SDK talks to the Perso production API server (https://platform.perso.ai) by default. All SDK calls take an options object — you only need to provide your API key and the call-specific arguments.
PersoInteractive
Create session id
Server-side only: This function requires your API key. Always call it from your server to keep the key secret. Pass only the returned
sessionIdto the browser.
Two call shapes are supported via a discriminated options object:
- Form A — pass
paramswith explicit runtime options. - Form B — pass
sessionTemplateIdto create a session from a SessionTemplate. The SDK resolves the template and maps its fields to the request body. Throws ifmodel_style.platform_typeis not"webrtc".
// Form A: Create from explicit params
function createSessionId(options: {
apiKey: string;
params: {
using_stf_webrtc: boolean;
model_style: string;
prompt: string;
document?: string;
background_image?: string;
mcp_servers?: Array<string>;
padding_left?: number;
padding_top?: number;
padding_height?: number;
llm_type?: string;
tts_type?: string;
stt_type?: string;
text_normalization_config?: string;
text_normalization_locale?: string | null;
stt_text_normalization_config?: string;
stt_text_normalization_locale?: string | null;
};
}): Promise<string>;
// Form B: Create from a SessionTemplate ID
function createSessionId(options: {
apiKey: string;
sessionTemplateId: string;
}): Promise<string>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
params | Form A | Explicit runtime options (see the params.* table below). |
sessionTemplateId | Form B | SessionTemplate ID. The SDK resolves the template and maps its fields to the request body. Throws if model_style.platform_type is not "webrtc". |
params.using_stf_webrtc | Yes | Whether to enable the STF WebRTC pipeline (set to true for the SDK demos) |
params.model_style | Yes | ModelStyle name |
params.prompt | Yes | Prompt prompt_id |
params.llm_type | No | LLM name |
params.tts_type | No | TTS name |
params.stt_type | No | STT name |
params.document | No | Document document_id |
params.background_image | No | BackgroundImage backgroundimage_id |
params.text_normalization_config | No | TextNormalizationConfig textnormalizationconfig_id applied to TTS/LLM input |
params.text_normalization_locale | No | Locale for text normalization (e.g., "ko", "en"). Pass null to explicitly disable. |
params.stt_text_normalization_config | No | TextNormalizationConfig textnormalizationconfig_id applied to STT output |
params.stt_text_normalization_locale | No | Locale for STT text normalization (e.g., "ko", "en", max 10 chars). Pass null to explicitly disable. |
params.mcp_servers | No | MCPServer mcpserver_id array |
params.padding_left | No | AI human horizontal position (A number between -1.0 and 1.0, default 0.0) |
params.padding_top | No | AI human vertical position (A number between 0.0 and 1.0, default 0.0) |
params.padding_height | No | The scale of AI human height; the width of the AI human cannot exceed the width of the background (A number between 0.0 and 5.0, default 1.0) |
// Form A — from explicit params
const sessionId = await createSessionId({
apiKey,
params: {
using_stf_webrtc: true,
model_style: "<model_style_name>",
prompt: "<prompt_id>",
llm_type: "<llm_name>",
tts_type: "<tts_name>",
stt_type: "<stt_name>",
},
});
// Form B — from a SessionTemplate
const sessionId = await createSessionId({
apiKey,
sessionTemplateId: "<sessiontemplate_id>",
});Returns: Session ID (string)
Express.js Example
This example uses Express. Install the required packages:
# npm
npm install express perso-interactive-sdk-web
# yarn
yarn add express perso-interactive-sdk-web
# pnpm
pnpm add express perso-interactive-sdk-web// server.js
const express = require("express");
const { createSessionId } = require("perso-interactive-sdk-web/server");
const app = express();
const API_KEY = process.env.PERSO_INTERACTIVE_API_KEY;
app.post("/api/session", async (req, res) => {
try {
const sessionId = await createSessionId({
apiKey: API_KEY,
params: {
using_stf_webrtc: true,
model_style: "<model_style_name>",
prompt: "<prompt_id>",
llm_type: "<llm_name>",
tts_type: "<tts_name>",
stt_type: "<stt_name>",
// text_normalization_config: "<textnormalizationconfig_id>", // optional
// stt_text_normalization_config: "<textnormalizationconfig_id>", // optional
// stt_text_normalization_locale: "ko", // optional
},
});
res.json({ sessionId });
} catch (error) {
console.error("Session creation failed:", error);
res.status(500).json({ error: "Failed to create session" });
}
});
// Using a SessionTemplate (simpler — no need to specify individual options)
app.post("/api/session-from-template", async (req, res) => {
try {
const sessionId = await createSessionId({
apiKey: API_KEY,
sessionTemplateId: "<sessiontemplate_id>",
});
res.json({ sessionId });
} catch (error) {
console.error("Session creation failed:", error);
res.status(500).json({ error: "Failed to create session" });
}
});
app.listen(3000, () => console.log("Server running on port 3000"));Client-side Testing Only
⚠️ Warning: The following example exposes your API key in the browser. Use this only for local testing. Never deploy this to production. If your API key is compromised due to client-side usage, the SDK provider assumes no responsibility.
import {
createSessionId,
createSession,
} from "perso-interactive-sdk-web/client";
const apiKey = "YOUR_API_KEY"; // ⚠️ NEVER commit or expose this in production
const sessionId = await createSessionId({
apiKey,
params: {
using_stf_webrtc: true,
model_style: "<model_style_name>",
prompt: "<prompt_id>",
llm_type: "<llm_name>",
tts_type: "<tts_name>",
stt_type: "<stt_name>",
// text_normalization_config: "<textnormalizationconfig_id>", // optional
// stt_text_normalization_config: "<textnormalizationconfig_id>", // optional
// stt_text_normalization_locale: "ko", // optional
},
});
const session = await createSession({
sessionId,
width: 1920,
height: 1080,
clientTools: [],
});
const videoEl = document.getElementById("video");
if (videoEl instanceof HTMLVideoElement) {
session.setSrc(videoEl);
}Create session
The sessionId parameter should be obtained from your server endpoint (see Create session id above).
function createSession(options: {
sessionId: string;
width: number;
height: number;
clientTools: Array<ChatTool>;
}): Promise<Session>;| Field | Required | Description |
|---|---|---|
sessionId | Yes | Session ID obtained from your server |
width | Yes | AI human video width |
height | Yes | AI human video height |
clientTools | Yes | Client tools to be registered with the LLM |
const session = await createSession({
sessionId,
width: 1920,
height: 1080,
clientTools: [],
});Returns: Session object
Get LLM list
function getLLMs(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const llms = await getLLMs({ apiKey });Returns: Array of LLM objects
[
{
"name": string
}
]Get TTS list
function getTTSs(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const ttsTypes = await getTTSs({ apiKey });Returns: Array of TTS objects
[
{
"name": string,
"service": string,
"speaker": string
}
]Get STT list
function getSTTs(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const sttTypes = await getSTTs({ apiKey });Returns: Array of STT objects
[
{
"name": string,
"service": string,
"options": string
}
]Get ModelStyle list
function getModelStyles(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const modelStyles = await getModelStyles({ apiKey });Returns: Array of ModelStyle objects
[
{
"name": string,
"model": string,
"style": string
}
]Get prompt list
function getPrompts(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const prompts = await getPrompts({ apiKey });Returns: Array of Prompt objects
[
{
"name": string,
"description": string,
"prompt_id": string,
"system_prompt": string,
"require_document": boolean,
"intro_message": string
}
]Get document list
function getDocuments(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const documents = await getDocuments({ apiKey });Returns: Array of Document objects
[
{
"document_id": string,
"title": string,
"description": string,
"search_count": number,
"processed": boolean,
"created_at": string, // ex) "2024-05-02T09:05:55.395Z",
"updated_at": string // ex) "2024-05-02T09:05:55.395Z"
}
]Get background image list
function getBackgroundImages(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const backgroundImages = await getBackgroundImages({ apiKey });Returns: Array of BackgroundImage objects
[
{
"backgroundimage_id": string,
"title": string,
"image": string,
"created_at": string // ex) "2024-05-02T09:05:55.395Z"
}
]Get Remote MCP Server list
function getMcpServers(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const mcpServers = await getMcpServers({ apiKey });Returns: Array of MCPServer objects
[
{
"mcpserver_id": string,
"name": string,
"url": string,
"description": string
}
]Get Text Normalization Config list
function getTextNormalizations(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const textNormalizations = await getTextNormalizations({ apiKey });Returns: Array of TextNormalizationConfig objects
[
{
"textnormalizationconfig_id": string,
"name": string,
"created_at": string
}
]Download Text Normalization Config
Downloads the ruleset data file for a specific Text Normalization Config. Returns a pre-signed Blob Storage URL for the CSV file that can be downloaded directly. Supports Azure Blob Storage ETag for caching.
Top-level convenience function (available from perso-interactive-sdk-web/client).
function getTextNormalization(options: {
apiKey: string;
configId: string;
}): Promise<TextNormalizationDownload>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
configId | Yes | TextNormalizationConfig textnormalizationconfig_id |
import { getTextNormalization } from "perso-interactive-sdk-web/client";
const download = await getTextNormalization({
apiKey,
configId: "<textnormalizationconfig_id>",
});
// download.file_url → pre-signed URL to the CSV rulesetReturns: TextNormalizationDownload object
interface TextNormalizationDownload {
config_id: string;
config_name: string;
file_url: string; // Pre-signed URL; use Azure Blob Storage ETag for caching
}Get all settings
function getAllSettings(options: { apiKey: string }): Promise<any>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const settings = await getAllSettings({ apiKey });Returns: Object containing all settings
{
"llms": JSON, // result of getLLMs
"ttsTypes": JSON, // result of getTTSs
"sttTypes": JSON, // result of getSTTs
"modelStyles": JSON, // result of getModelStyles
"prompts": JSON, // result of getPrompts
"documents": JSON, // result of getDocuments
"backgroundImages": JSON, // result of getBackgroundImages
"mcpServers": JSON, // result of getMcpServers
"textNormalizations": JSON // result of getTextNormalizations
}Get intro message
Server-side only: This function requires your API key.
function getIntroMessage(options: {
apiKey: string;
promptId: string;
}): Promise<string>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
promptId | Yes | The prompt ID to fetch intro message for |
import { getIntroMessage } from "perso-interactive-sdk-web/server";
const intro = await getIntroMessage({ apiKey, promptId: "<prompt_id>" });Returns: The intro message string for the given prompt.
Throws:
Errorwithcause: 404if the prompt is not foundErrorwith the API error detail if anApiErroroccurs
Get Session Template list
function getSessionTemplates(options: {
apiKey: string;
}): Promise<SessionTemplate[]>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
const templates = await getSessionTemplates({ apiKey });Returns: Array of SessionTemplate objects
[
{
"sessiontemplate_id": string,
"name": string,
"description": string | null,
"prompt": {
"prompt_id": string,
"name": string,
"description"?: string,
"system_prompt": string,
"require_document"?: boolean,
"intro_message"?: string
},
"capability": [
{
"name": string, // "LLM" | "TTS" | "STT" | "STF_ONPREMISE" | "STF_WEBRTC"
"description"?: string | null
}
],
"document": {
"document_id": string,
"title": string,
"file": string,
"description"?: string,
"search_count"?: number,
"ef_search"?: number | null,
"processed": boolean,
"processed_v2": boolean,
"created_at": string,
"updated_at": string
} | null,
"llm_type": {
"name": string,
"service"?: string
},
"tts_type": {
"name": string,
"streamable"?: boolean,
"service": string,
"model"?: string | null,
"voice"?: string | null,
"voice_settings"?: object | null,
"style"?: string | null,
"voice_extra_data"?: object | null
},
"stt_type": {
"name": string,
"service": string,
"options"?: object | null
},
"text_normalization_config"?: {
"textnormalizationconfig_id": string,
"name": string,
"created_at": string
} | null,
"text_normalization_locale"?: string | null,
"stt_text_normalization_config"?: {
"textnormalizationconfig_id": string,
"name": string,
"created_at": string
} | null,
"stt_text_normalization_locale"?: string | null,
"model_style": {
"name": string,
"model": string,
"model_file"?: string | null,
"model_files": [{ "name": string, "file"?: string | null }],
"style": string,
"file"?: string | null,
"platform_type"?: string,
"configs": [{ "modelstyleconfig_id": string, "key": string, "value": string }]
},
"background_image": {
"backgroundimage_id": string,
"title": string,
"image": string,
"created_at": string
} | null,
"agent": string | null,
"padding_left": number | null,
"padding_top": number | null,
"padding_height": number | null,
"extra_data": object | null,
"mcp_servers"?: [
{
"mcpserver_id": string,
"name": string,
"description"?: string,
"url": string,
"transport_protocol"?: string,
"server_timeout_sec"?: number,
"extra_data"?: object | null
}
],
"created_at": string,
"last_used_at": string | null
}
]Get Session Template
Server-side only: This function requires your API key.
function getSessionTemplate(options: {
apiKey: string;
sessionTemplateId: string;
}): Promise<SessionTemplate>;| Field | Required | Description |
|---|---|---|
apiKey | Yes | API Key |
sessionTemplateId | Yes | Session Template ID |
const template = await getSessionTemplate({
apiKey,
sessionTemplateId: "<sessiontemplate_id>",
});Returns: A single SessionTemplate object (same schema as array element above)
Make TTS
function makeTTS(options: {
sessionId: string;
text: string;
locale?: string;
output_format?: string;
}): Promise<{ audio: string }>;| Field | Required | Description |
|---|---|---|
sessionId | Yes | Active session ID |
text | Yes | Text to synthesize |
locale | No | Language/locale code for TTS (e.g., "ko", "en") |
output_format | No | Audio output format (e.g., "wav") |
import { makeTTS } from "perso-interactive-sdk-web/client";
const result = await makeTTS({
sessionId: "<session_id>",
text: "Hello, world!",
locale: "en", // optional
output_format: "wav", // optional
});
// result.audio contains Base64-encoded audio dataReturns: Object with Base64-encoded audio string
{
"audio": string
}PersoUtilServer
Server-side only: Available from
perso-interactive-sdk-web/server.
Server-side alias for the internal PersoUtil class. Exposes low-level static methods for direct API calls (e.g., getLLMs, getPrompts, getModelStyles). Most users should prefer the top-level convenience functions instead.
Get session info
function getSessionInfo(options: {
sessionId: string;
}): Promise<any>;| Field | Required | Description |
|---|---|---|
sessionId | Yes | Session ID |
const info = await getSessionInfo({ sessionId });Returns: Session info object
{
"session_id": string,
"client_sdp": string,
"server_sdp": string,
"prompt": {
"name": string,
"description": string,
"prompt_id": string,
"system_prompt": string,
"require_document": boolean,
"intro_message": string
},
"document": string,
"llm_type": {
"name": string
},
"model_style": {
"name": string,
"model": string,
"model_file": string,
"style": string,
"file": string
},
"tts_type": {
"name": string,
"service": string,
"model": string,
"voice": string,
"style": string,
"voice_extra_data": string
},
"ice_servers": Array<RTCIceServer>,
"status": string, // "CREATED", "EXCHANGED", "IN_PROGRESS", "TERMINATED"
"termination_reason": string, // "GRACEFUL_TERMINATION", "SESSION_EXPIRED_BEFORE_CONNECTION", "SESSION_LOST_AFTER_CONNECTION", "SESSION_MISC_ERROR", "MAX_ACTIVE_SESSION_QUOTA_EXCEEDED", "MAX_MIN_PER_SESSION_QUOTA_EXCEEDED", "TOTAL_MIN_PER_MONTH_QUOTA_EXCEEDED"
"duration_sec": number,
"created_at": string, // ex) "2024-05-02T09:05:55.395Z"
"padding_left": number,
"padding_top": number,
"padding_height": number,
"background_image": {
"backgroundimage_id": string,
"title": string,
"image": string,
"created_at": string // ex) "2024-05-02T09:05:55.395Z"
},
"extra_data": string
}Session
Send a message to the AI human
function processChat(message: string): Promise<void>;'message' must not be empty
ChatState.LLM is set in the 'Chat states'.
While the AI human is preparing a response, ChatState.ANALYZING is set in 'Chat states'.
When speaking begins, ChatState.SPEAKING is set in 'Chat states'.
Stream LLM responses with full control (Advanced)
function processLLM(options: ProcessLLMOptions): AsyncGenerator<LLMStreamChunk>;| Parameter | Description |
|---|---|
options | Configuration object containing message, optional tools, and abort signal |
Returns: AsyncGenerator<LLMStreamChunk> - An async generator that yields streaming chunks
This method provides fine-grained control over LLM streaming responses. Unlike processChat(), it:
- Does NOT automatically trigger TTS/avatar speech
- Does NOT automatically update chat logs
- Does NOT manage chat states
- Gives you full control over the streaming response
Use cases:
- Custom UI rendering of streaming text
- Intercepting and modifying LLM responses
- Building custom chat interfaces
- Integration with external systems
Example usage:
const controller = new AbortController();
for await (const chunk of session.processLLM({
message: "Hello, how are you?",
signal: controller.signal
})) {
if (chunk.type === 'message') {
console.log('Streaming:', chunk.message);
if (chunk.finish) {
console.log('Complete response:', chunk.message);
// Optionally trigger avatar speech
session.processTTSTF(chunk.message);
}
} else if (chunk.type === 'tool_call') {
console.log('Tool called:', chunk.tool_calls);
} else if (chunk.type === 'tool_result') {
console.log('Tool result:', chunk.result);
} else if (chunk.type === 'error') {
console.error('Error:', chunk.error);
}
}Cancellation:
// Cancel the stream
controller.abort();
// Or break from the loop
for await (const chunk of session.processLLM({ message: "..." })) {
if (someCondition) break; // Cleanly exits the generator
}Get message history
function getMessageHistory(): ReadonlyArray<object>;Returns: Read-only array of message history objects
Returns the conversation history used by the LLM. Each entry contains role, type, and content fields.
Example usage:
const history = session.getMessageHistory();
console.log('Conversation has', history.length, 'messages');Transcribe audio to text
function transcribeAudio(audio: Blob | File, language?: string): Promise<string>;| Parameter | Required | Description |
|---|---|---|
audio | Yes | Audio data as Blob or File |
language | No | Language code for STT (e.g., 'ko', 'en') |
Returns: Promise resolving to transcribed text
Throws: STTError if the API call fails
Converts audio to text using the STT API. This is a lower-level method compared to startProcessSTT()/stopProcessSTT() - use it when you have pre-recorded audio or audio from external sources.
Example usage:
// From a file input
const fileInput = document.querySelector('input[type="file"]');
const file = fileInput.files[0];
const text = await session.transcribeAudio(file, 'ko');
// From a Blob
const audioBlob = new Blob([audioData], { type: 'audio/wav' });
const text = await session.transcribeAudio(audioBlob);Transcribe audio (detailed response)
function transcribeAudioDetailed(
audio: Blob | File,
language?: string
): Promise<STTResponse>;
interface STTResponse {
text: string;
}| Parameter | Required | Description |
|---|---|---|
audio | Yes | Audio data as Blob or File |
language | No | Language code for STT (e.g., 'ko', 'en'). When omitted, the server detects the language. |
Returns: STTResponse containing the transcribed text.
The underlying HTTP response may include additional fields such as
localeandnormalized_text. The SDK does not expose them as part ofSTTResponse. Consumers who need those values must call the REST endpoint (POST /api/v1/session/{session_id}/stt/) directly.
Throws: STTError if the API call fails.
Same as transcribeAudio() but returns the STTResponse object form. Use this when you want an extensible result shape rather than a bare string.
const result = await session.transcribeAudioDetailed(audioBlob, 'ko');
console.log(result.text);Custom LLM response (Deprecated)
/** @deprecated Use processTTSTF() with explicit history management instead. */
function processCustomChat(message: string): void;'message' must not be empty
Deprecated: This function is deprecated. Use
processTTSTF()instead and manage message history manually viagetMessageHistory()if needed.
This function makes the AI human speak the response message generated by a custom LLM. This function does not invoke the internal LLM.
While the AI human is preparing a response, ChatState.ANALYZING is set in 'Chat states'.
When speaking begins, ChatState.SPEAKING is set in 'Chat states'.
Important Notes:
- Does not manage messageHistory automatically
subscribeChatLogis not available when using this functionsubscribeChatStatescannot handle ChatState.LLM (only ANALYZING and SPEAKING states are triggered)- Use this function when you want to integrate your own custom LLM instead of using the internal LLM
Related methods:
processTTSTF()- Recommended replacement; adds message to history and chat log automaticallyprocessLLM()- For custom LLM integration with streaming support
AI human speaks the 'message'
function processTTSTF(message);'message' must not be empty
While the AI human is preparing a response, ChatState.ANALYZING is set in 'Chat states'. When speaking begins, ChatState.SPEAKING is set in 'Chat states'.
Generate TTS audio from text
function processTTS(
message: string,
options?: { resample?: boolean; locale?: string; output_format?: string }
): Promise<Blob | undefined>;| Parameter | Required | Description |
|---|---|---|
message | Yes | Text to convert to speech audio |
options | No | Options object |
options.resample | No | Whether to resample audio to TTS target sample rate (default: false) |
options.locale | No | Language locale code for TTS (e.g., 'ko', 'en'). Passed to the TTS API when specified. |
options.output_format | No | Audio output format (e.g., 'wav', 'mp3'). Passed to the TTS API when specified. |
Returns: Blob containing audio data (audio/wav), or undefined if the message is empty after filtering.
This function converts text to speech audio without making the AI human speak. Use this when you need the TTS audio file separately.
While processing, ChatState.TTS is set in the 'Chat states'.
Example usage:
const audioBlob = await session.processTTS("Hello, world!");
if (audioBlob) {
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();
// Remember to revoke the URL when done to prevent memory leaks
audio.onended = () => URL.revokeObjectURL(audioUrl);
}
// With locale and output format
const audioBlob = await session.processTTS("Hello, world!", {
resample: false,
locale: "en",
output_format: "mp3"
});Error handling:
If an error occurs during TTS processing, a TTSError is passed to the error handler set via setErrorHandler().
Send audio/video to STF pipeline
function processSTF(file: Blob, format?: string, message?: string): Promise<string>;| Parameter | Required | Description |
|---|---|---|
file | Yes | The audio file to send as a Blob |
format | No | Format hint. Accepts canonical 'wav' / 'mp3', MIME types ('audio/wav', 'audio/mpeg', 'audio/x-wav', 'audio/wave', 'audio/mp3', with or without parameters), or undefined. When omitted the SDK derives the format from file.type, falling back to 'wav'. |
message | No | Optional text message to accompany the file. Defaults to an empty string. |
Sends an audio file to the STF (Speech-To-Face) pipeline for processing. This method requires an active WebRTC connection (STF mode).
The SDK normalizes the format argument via normalizeAudioFormat before forwarding it, so callers can safely pass the Blob's own type field directly (e.g. processSTF(blob, blob.type, text)).
Returns: Promise<string> - The STF pipeline response
Throws:
Error("processSTF requires WebRTC (STF mode)")if the session does not have an active WebRTC connection
ChatState transitions: ANALYZING → SPEAKING
Example usage:
const audioBlob = new Blob([audioData], { type: 'audio/wav' });
const response = await session.processSTF(audioBlob, 'wav', 'Hello');
// Or let the SDK derive the format from the Blob:
const ttsBlob = await session.processTTS('Hello');
if (ttsBlob) {
await session.processSTF(ttsBlob, ttsBlob.type, 'Hello');
}Stop AI human speaking
function clearBuffer(): Promise<void>;You can stop the AI human's response. If called before speech starts, the pending speech will not begin. If speech has already started, it will stop after the current sentence finishes.
Start STT recording (HTTP API)
function startProcessSTT(timeout?: number): Promise<void>;| Parameter | Required | Description |
|---|---|---|
timeout | No | Optional timeout in milliseconds to automatically stop recording |
Starts recording audio for STT (Speech-To-Text) processing.
How it works:
- Records audio using Web Audio API (client-side)
- Encodes to WAV format
- Sends to STT REST API via HTTP
- Returns transcribed text
This is the recommended STT method with better cross-browser support and explicit control over the transcription flow.
Throws:
Errorif already recordingErrorif microphone access is denied
Browser Support:
- Chrome 66+
- Firefox 76+
- Safari 14.1+
- iOS Safari 14.5+
- Edge 79+
Stop STT recording and transcribe (HTTP API)
function stopProcessSTT(language?: string): Promise<string>;| Parameter | Required | Description |
|---|---|---|
language | No | Optional language code for STT (e.g., 'ko', 'en') |
Stops STT recording and sends the recorded audio to the STT REST API for transcription.
Returns: Promise resolving to the transcribed text (string)
Throws:
STTErrorif the API call failsError("STT recording is not in progress")if the recorder exists but is not recordingError("STT recording has not been started")ifstartProcessSTT()was not called
Usage Example:
// Start recording
await session.startProcessSTT();
// ... user speaks ...
// Stop recording and get transcribed text
try {
const text = await session.stopProcessSTT('ko');
console.log('Transcribed:', text);
// Process the text manually or send to chat
if (text.trim().length > 0) {
session.processChat(text);
}
} catch (error) {
if (error instanceof STTError) {
console.error('STT API failed:', error.message);
}
}Check STT recording status
function isSTTRecording(): boolean;Returns: true if STT recording is currently in progress or has audio pending from timeout, false otherwise
This method returns true in the following cases:
- Recording is actively in progress (after
startProcessSTT()and beforestopProcessSTT()) - Recording was automatically stopped by timeout but
stopProcessSTT()has not been called yet (audio is pending)
Usage Example:
if (!session.isSTTRecording()) {
await session.startProcessSTT();
} else {
const text = await session.stopProcessSTT();
// Handle transcribed text
}Get last recorded audio file
public lastRecordedAudioFile: File | null;Returns: The last recorded WAV audio file from STT processing, or null if no recording has been made.
After calling stopProcessSTT(), you can access the recorded audio file for playback or other purposes.
Usage Example:
// Start recording
await session.startProcessSTT();
// ... user speaks ...
// Stop recording and get transcribed text (language parameter is optional)
const text = await session.stopProcessSTT();
// Access the recorded audio file
if (session.lastRecordedAudioFile) {
const audioUrl = URL.createObjectURL(session.lastRecordedAudioFile);
const audio = new Audio(audioUrl);
audio.play();
// Remember to revoke the URL when done to prevent memory leaks
audio.onended = () => URL.revokeObjectURL(audioUrl);
}startVoiceChat (Deprecated)
/** @deprecated Use startProcessSTT() instead. */
function startVoiceChat(): void;Deprecated: This method is deprecated and will be removed in a future version. Use
startProcessSTT()instead.
Previously used WebRTC data channel for STT.
Migration: Use startProcessSTT() for HTTP API-based STT recording.
stopVoiceChat (Deprecated)
/** @deprecated Use stopProcessSTT() instead. */
function stopVoiceChat(): void;Deprecated: This method is deprecated and will be removed in a future version. Use
stopProcessSTT()instead.
Previously used WebRTC data channel for STT.
Migration: Use stopProcessSTT() for HTTP API-based STT transcription.
Get user's audio stream (Deprecated)
⚠️ Deprecated: Legacy voice chat mode will be removed in a future version.
function getLocalStream(): MediaStream | null;Returns: MediaStream | null — null if not in legacy voice chat mode.
Only available in legacy voice chat mode. Returns null otherwise.
Get AI human's media(video + audio) stream
function getRemoteStream(): MediaStream | undefined;Returns: MediaStream | undefined — undefined if the Perso renderer is not yet initialized.
It can be used for various purposes such as recording.
Get 'SessionID'
function getSessionId(): string;Returns: Session ID (string)
Specifies the HTMLVideoElement to display the AI human video
function setSrc(element: HTMLVideoElement);| Parameter | Description |
|---|---|
element | HTMLVideoElement (<video>) where the AI human's video will be rendered |
Adjusts the size of the AI Human video during execution
function changeSize(width: number, height: number);| Parameter | Description |
|---|---|
width | AI Human video width |
height | AI Human video height |
You can change the size of the AI human even after the session has been created.
Callback for when the session is closed
function onClose(callback: (manualClosed: boolean) => void): () => void;| Parameter | Description |
|---|---|
callback | Callback function invoked when the session is closed |
Callback parameter:
| Parameter | Description |
|---|---|
manualClosed | true if the user closed the session themselves, false if quota exceeded or network error occurred |
Returns: removeOnClose - function to remove the callback
Subscribe to conversation log changes from the session
function subscribeChatLog(callback: (chatLog: Array<Chat>) => void): () => void;| Parameter | Description |
|---|---|
callback | Callback function for receiving conversation log updates |
Chat interface:
interface Chat {
isUser: boolean; // true - user, false - AI human
text: string; // conversation
timestamp: Date; // conversation timestamp
}Returns: unsubscribeChatLog - function to unsubscribe
Receive the entire conversation log whenever it's updated.
Subscribe to 'Chat states' changes from the session
function subscribeChatStates(
callback: (chatStates: Set<ChatState>) => void
): () => void;| Parameter | Description |
|---|---|
callback | Callback function for receiving chat state changes |
Callback parameter:
| Parameter | Description |
|---|---|
chatStates | A set containing the processes currently in progress. Multiple states can be included at the same time. If empty, it indicates an available state. |
Returns: unsubscribeChatStates - function to unsubscribe
Receive 'Chat states' changes during conversation.
Set STT result callback (Voice chat)
function setSttResultCallback(callback: (text: string) => void): () => void;| Parameter | Description |
|---|---|
callback | Callback function that receives STT (Speech-To-Text) results during voice chat |
Returns: Function to remove the callback
Set error handler
function setErrorHandler(callback: (error: Error) => void): () => void;| Parameter | Description |
|---|---|
callback | Callback function that receives errors occurring during the processChat process |
Returns: Function to remove the error handler
Log session event
function logSessionEvent(detail?: string | Record<string, unknown>): Promise<void>;Sends a SESSION_LOG event for the current session. Use this to record custom analytics or debugging information.
| Parameter | Description |
|---|---|
detail | Optional event description. Strings are sent as-is; objects are JSON-stringified. |
Usage:
// String detail
await session.logSessionEvent("user clicked submit button");
// Object detail (JSON-stringified internally)
await session.logSessionEvent({ action: "button_click", target: "submit" });
// No detail
await session.logSessionEvent();Stop session
function stopSession();Closes the session and ends the AI human conversation immediately. If successfully closed, true will be sent to the onClose callback.
ChatState
enum ChatState {
RECORDING = "RECORDING",
LLM = "LLM",
ANALYZING = "ANALYZING",
SPEAKING = "SPEAKING",
TTS = "TTS",
}These values represent the conversation state.
| State | Description |
|---|---|
RECORDING | The user is inputting voice for a voice chat. |
LLM | The LLM is generating a response to the user's message. |
ANALYZING | The response generated by the LLM is being processed for the AI human to speak. |
SPEAKING | The AI human is speaking. |
TTS | The processTTS() function is generating audio from text. |
ChatTool
Perso Interactive supports 'Tool calling'.
There are two types of tool calling: Remote MCP tool and Client tool.
Remote MCP tools can be registered and managed through the Perso Interactive back office.
This section explains Client tools only.
What is 'Client tool'?
A Client tool is an external module or function that performs tasks that the LLM (Large Language Model) cannot execute directly.
In other words, it's a tool that enables the LLM to carry out real-world actions—such as data retrieval, calculations, or file processing—that it can't handle by "just talking."
LLM → calls Client tool → performs actual operation → returns result
The LLM only knows how and when to use the Client tool.
The Client tool handles the actual logic.
'ChatTool' is a Perso Interactive class used to define a Client tool and register it with the LLM.
class ChatTool<TArg = any, TResult extends object = object> {
constructor(
public name: string,
public description: string,
public parameters: object,
public call: (arg: TArg) => TResult | Promise<TResult>,
public executeOnly: boolean = false
);
}| Parameter | Required | Description |
|---|---|---|
name | Yes | The name of the ChatTool. It must be a unique value that distinguishes it from other tools (such as Remote MCP tools or Client tools). |
description | Yes | A description of the ChatTool's purpose and functionality. Since the LLM refers to this information to understand how and when to use the Client tool, it should be written clearly. |
parameters | Yes | The parameters required to call this ChatTool, defined in JSON Schema format. |
call | Yes | Defines the actual operation. You can implement the necessary logic—such as connecting to external APIs—and return the result as an Object. |
executeOnly | No | Set to true if the LLM does not need to provide a response after the ChatTool is executed. In some special cases (e.g., when multiple tools are requested simultaneously), this setting may be ignored. Default: false |
Example: Using a Client tool
Suppose you want the LLM to provide real-time weather information.
(However, by default, an LLM cannot provide real-time weather data.)
But, you already have a function that retrieves weather information.
/**
* Retrieves the current weather for a given location
* @param location - City and country, e.g. 'San Francisco, CA' / examples: 'New York, US', 'Seoul, KR'
* @param units - The temperature unit to use, enum Units
* @returns Weather
*/
function getCurrentWeather(
location: string,
units: Units = Units.CELSIUS
): Promise<Weather> {
...
}
enum Units {
CELSIUS = 'celsius',
FAHRENHEIT = 'fahrenheit'
}
interface Weather {
temperature: number,
condition: string,
humidity: number,
wind: number
}You can define a ChatTool to expose getCurrentWeather as a Client tool and register it with the LLM.
Below is an example ChatTool definition based on getCurrentWeather.
const chatTool = new PersoInteractiveSDK.ChatTool(
"get_current_weather",
"Retrieves the current weather for a given location",
{
type: "object",
properties: {
location: {
examples: ["New York, US", "Seoul, KR"],
type: "string",
description: "City and country, e.g. 'San Francisco, CA'",
},
units: {
type: "string",
description: "The temperature unit to use",
enum: ["celsius", "fahrenheit"],
default: "celsius",
},
},
required: ["location", "units"],
},
async (arg) => {
const location: string = arg.location;
// convert string to Units, 'celsius' -> Units.CELSIUS
const units = Object.values(Units).find((v) => v === arg.units);
if (units === undefined) {
return { result: "failed" };
}
try {
const weather = await getCurrentWeather(location, units);
return {
temperature: `${weather.temperature}${
units === Units.CELSIUS ? "℃" : "℉"
}`, // 30℃, 86℉
condition: weather.condition, // e.g. 'Mostly clear'
humidity: `${weather.humidity}%`, // e.g. 68%
wind: `${weather.wind}km/h`, // e.g. 10.3km/h
};
} catch (error) {
return { result: "failed" };
}
},
false
);Example Interaction
When a user asks:
"Tell me the current weather in Seoul."
The LLM analyzes the request and asks the client to execute the 'get_current_weather' tool.
It sends the tool name and the arguments needed for execution:
name : 'get_current_weather'
arg.location : 'Seoul' (or 'Seoul, KR')
arg.units : 'celsius' (since no specific unit was mentioned, the default 'celsius' is used)
The client receives the 'get_current_weather' request, finds the ChatTool with that name, and calls call(arg).
The result is returned to the LLM, which then generates a suitable response.
Example:
Q. Tell me the current weather in Seoul.
A. It's currently 30°C in Seoul with 68% humidity and clear skies.
Q. Tell me the current weather in Seoul, in Fahrenheit.
A. It's currently 86°F in Seoul with 68% humidity and clear skies.
ApiError
An error that occurs while using the Perso Interactive API. They include details such as the error type and description.
class ApiError extends Error {
constructor(
public errorCode: number,
public code: string,
public detail: string,
public attr?: string
);
}| Property | Required | Description |
|---|---|---|
errorCode | Yes | HTTP error code |
code | Yes | Error code string |
detail | Yes | Detailed error description |
attr | No | Additional attribute information |
SessionCreationError
An error thrown by createSessionId() (including the getSessionTemplate path) when the underlying API returns an ApiError. Extends ApiError, so existing instanceof ApiError branches keep working — use this subclass when you want to distinguish session creation failures from other API errors.
class SessionCreationError extends ApiError {
constructor(source: ApiError);
}The original errorCode, code, detail, and attr fields are preserved verbatim from the server response. The SDK does not parse detail strings — callers inspect the raw fields to decide how to react. For LIV-1681's "feature unavailable" scenarios, the relevant signals are typically:
| Server response | Typical meaning | SDK error |
|---|---|---|
code: "does_not_exist" (often with attr set) | Referenced resource (e.g. prompt) does not exist | DoesNotExistError (subclass of SessionCreationError) |
code: "not_in_organization" | Resource exists but is not assigned to the caller's organization | NotInOrganizationError (subclass of SessionCreationError) |
code: "invalid" with a "… not found" detail | Org-level feature (STT/TTS/LLM/Model Style) is not assigned | SessionCreationError — inspect detail |
DoesNotExistError
A SessionCreationError subclass thrown when the server's response code is "does_not_exist" — the referenced resource (e.g. an invalid prompt_id) does not exist at all. The attr field, when present, identifies which input field referenced the missing resource (e.g. "prompt").
class DoesNotExistError extends SessionCreationError {
constructor(source: ApiError);
}NotInOrganizationError
A SessionCreationError subclass thrown when the server's response code is "not_in_organization" — the resource exists in the platform catalog but is not assigned to the caller's organization (e.g. an LLM/TTS/STT type that requires admin enablement). The attr field, when present, identifies which input field referenced the unavailable resource.
class NotInOrganizationError extends SessionCreationError {
constructor(source: ApiError);
}Usage Example:
import {
createSessionId,
DoesNotExistError,
NotInOrganizationError,
SessionCreationError,
ApiError
} from 'perso-interactive-sdk-web/server';
try {
const sessionId = await createSessionId({ apiKey, params });
} catch (err) {
if (err instanceof DoesNotExistError) {
// e.g. invalid prompt_id — err.attr identifies which input field.
} else if (err instanceof NotInOrganizationError) {
// Resource exists but not enabled for this organization.
} else if (err instanceof SessionCreationError) {
// Other session creation failures (still has errorCode/code/detail/attr).
} else if (err instanceof ApiError) {
// Any other API error.
} else {
throw err;
}
}When chaining
instanceofchecks, order from narrowest to broadest:DoesNotExistError/NotInOrganizationError→SessionCreationError→ApiError. BecauseSessionCreationError extends ApiError, anApiErrorcheck first would absorb every session creation error.
LLMError
An error that occurs during the use of the LLM API, which includes ApiError and LLMStreamingResponseError. This error is passed to the error handler.
class LLMError extends Error {
public underlyingError: ApiError | LLMStreamingResponseError;
}| Property | Description |
|---|---|
underlyingError | The underlying error (ApiError or LLMStreamingResponseError) |
LLMStreamingResponseError
An error that occurs during the streaming of an LLM response.
class LLMStreamingResponseError extends Error {
public description: string;
}| Property | Description |
|---|---|
description | Detailed error description |
STTError
An error that occurs during STT (Speech-To-Text) API calls. This error wraps ApiError and is thrown by stopProcessSTT().
class STTError extends Error {
public underlyingError: ApiError;
}| Property | Description |
|---|---|
underlyingError | The underlying ApiError containing HTTP status code and error details |
Usage Example:
import { STTError } from 'perso-interactive-sdk-web/client';
try {
const text = await session.stopProcessSTT('ko');
} catch (error) {
if (error instanceof STTError) {
console.error('STT API Error:', error.underlyingError.detail);
console.error('Error Code:', error.underlyingError.code);
} else {
console.error('Recording Error:', error.message);
}
}TTSError
An error that occurs during the TTS (Text-to-Speech) process, which includes ApiError and TTSDecodeError. This error is passed to the error handler when processTTS() fails.
class TTSError extends Error {
public underlyingError: ApiError | TTSDecodeError;
}| Property | Description |
|---|---|
underlyingError | The underlying error (ApiError or TTSDecodeError) |
Example error handling:
session.setErrorHandler((error) => {
if (error instanceof TTSError) {
if (error.underlyingError instanceof ApiError) {
console.error("TTS API error:", error.underlyingError.detail);
} else if (error.underlyingError instanceof TTSDecodeError) {
console.error("TTS decode error:", error.underlyingError.description);
}
}
});TTSDecodeError
An error that occurs when decoding Base64 audio data from the TTS API fails.
class TTSDecodeError extends Error {
public description: string;
}| Property | Description |
|---|---|
description | Detailed error description |
ProcessLLMOptions
Options for the processLLM() method.
interface ProcessLLMOptions {
message: string;
tools?: Array<ChatTool>;
signal?: AbortSignal;
}| Property | Required | Description |
|---|---|---|
message | Yes | The user message to send to the LLM |
tools | No | Optional array of tools to override the session's default client tools |
signal | No | AbortSignal for cancellation support |
LLMStreamChunk
Discriminated union type yielded by the processLLM() async generator.
type LLMStreamChunk =
| {
type: 'message';
chunks: string[];
message: string;
finish: boolean;
}
| {
type: 'tool_call';
tool_calls: Array<object>;
}
| {
type: 'tool_result';
tool_call_id: string;
result: object;
}
| {
type: 'error';
error: Error;
};| Type | Description |
|---|---|
message | Streaming text content. chunks contains all accumulated text pieces, message is the full concatenated text, finish indicates if this is the final chunk |
tool_call | LLM requested tool execution. tool_calls contains the tool invocations with function names and arguments |
tool_result | Result of a tool execution. tool_call_id identifies which tool call this result belongs to, result contains the tool's return value |
error | An error occurred during streaming. error contains the LLMError instance |
Example handling all chunk types:
for await (const chunk of session.processLLM({ message: "What's the weather?" })) {
switch (chunk.type) {
case 'message':
// Update UI with streaming text
updateStreamingText(chunk.message);
if (chunk.finish) {
// Final response received
finalizeResponse(chunk.message);
}
break;
case 'tool_call':
// Tool is being executed (handled internally)
console.log('Executing tools:', chunk.tool_calls.map(t => t.function.name));
break;
case 'tool_result':
// Tool execution completed
console.log('Tool result:', chunk.tool_call_id, chunk.result);
break;
case 'error':
// Handle error
handleError(chunk.error);
break;
}
}LlmProcessor
A standalone module for LLM streaming, SSE parsing, tool execution, and message history management. Use this when you need full control over the LLM interaction outside of a Session.
Constructor
class LlmProcessor {
constructor(config: LlmProcessorConfig);
}LlmProcessorConfig
interface LlmProcessorConfig {
apiServer: string;
sessionId: string;
clientTools: Array<ChatTool>;
callbacks: LlmProcessorCallbacks;
}| Property | Description |
|---|---|
apiServer | Perso API server URL |
sessionId | Session ID for the LLM conversation |
clientTools | Default client tools available for LLM tool calling |
callbacks | Callback functions for side effects |
LlmProcessorCallbacks
interface LlmProcessorCallbacks {
onChatStateChange: (add: ChatState | null, remove: ChatState | null) => void;
onError: (error: Error) => void;
onChatLog: (message: string, isUser: boolean) => void;
onTTSTF: (message: string) => void;
}| Callback | Description |
|---|---|
onChatStateChange | Called when chat state should be added or removed |
onError | Called when an error occurs during processing |
onChatLog | Called when a message should be added to the chat log |
onTTSTF | Called when a message should be spoken by the AI human |
Stream LLM responses
function processLLM(options: ProcessLLMOptions): AsyncGenerator<LLMStreamChunk>;| Parameter | Description |
|---|---|
options | Configuration object containing message, optional tools, and abort signal |
Returns: AsyncGenerator<LLMStreamChunk> - An async generator that yields streaming chunks
Streams LLM responses with full tool execution support. Tool calls are executed internally, with tool_call and tool_result chunks yielded for observability. If tools require a follow-up LLM call, the generator loops transparently (up to 10 rounds maximum).
Example usage:
import { LlmProcessor, ChatTool, ChatState } from 'perso-interactive-sdk-web/client';
const processor = new LlmProcessor({
apiServer: "https://platform.perso.ai",
sessionId: "your-session-id",
clientTools: [weatherTool],
callbacks: {
onChatStateChange: (add, remove) => { /* update UI state */ },
onError: (error) => { console.error(error); },
onChatLog: (message, isUser) => { /* update chat UI */ },
onTTSTF: (message) => { /* trigger avatar speech */ },
}
});
for await (const chunk of processor.processLLM({ message: "Hello!" })) {
if (chunk.type === 'message' && chunk.finish) {
console.log('Complete:', chunk.message);
}
}Get message history
function getHistory(): ReadonlyArray<object>;Returns: Read-only array of the conversation history managed by this processor.
Add to message history
function addToHistory(entry: object): void;| Parameter | Description |
|---|---|
entry | A message history entry to append (e.g., { role: 'assistant', content: '...' }) |
WavRecorder
Records audio from the microphone and produces WAV files using Web Audio API with AudioWorklet. This is a standalone utility for capturing audio independently of the Session STT flow.
Browser Support:
- Chrome 66+
- Firefox 76+
- Safari 14.1+
- iOS Safari 14.5+
- Edge 79+
Constructor
class WavRecorder {
constructor(options?: WavRecorderOptions);
}WavRecorderOptions
interface WavRecorderOptions {
channels?: number;
targetSampleRate?: number;
}| Property | Required | Description |
|---|---|---|
channels | No | Number of audio channels (default: 1) |
targetSampleRate | No | Target sample rate for the output WAV. If different from the device's native rate, audio will be resampled. |
Start recording
function start(): Promise<void>;Starts recording audio from the microphone. Requests microphone permission via getUserMedia.
Throws:
Errorif already recordingErrorif microphone access is denied
Stop recording
function stop(): Promise<File>;Stops recording and returns the recorded audio as a WAV File.
Returns: Promise<File> - A File containing the recorded WAV audio (audio/wav)
Throws: Error if not currently recording
Check recording status
function isRecording(): boolean;Returns: true if recording is currently in progress, false otherwise
Factory function
function createWavRecorder(options?: WavRecorderOptions): WavRecorder;Convenience factory function to create a WavRecorder instance.
Example usage:
import { WavRecorder, createWavRecorder } from 'perso-interactive-sdk-web/client';
// Using factory function
const recorder = createWavRecorder({ targetSampleRate: 16000 });
// Or using constructor directly
const recorder = new WavRecorder({ channels: 1, targetSampleRate: 16000 });
// Start recording
await recorder.start();
// ... user speaks ...
// Stop recording and get WAV file
const audioFile = await recorder.stop();
// Use the file (e.g., send to server, play back, or pass to session.transcribeAudio)
const text = await session.transcribeAudio(audioFile, 'en');Audio Utilities
Get WAV sample rate
function getWavSampleRate(wavData: ArrayBuffer): number;| Parameter | Description |
|---|---|
wavData | Raw WAV file data as ArrayBuffer |
Returns: The sample rate of the WAV file (e.g., 16000, 44100, 48000)
Extracts the sample rate from WAV file header data. Useful for inspecting audio files before processing.
TTS Target Sample Rate
const TTS_TARGET_SAMPLE_RATE: number; // 16000The target sample rate (16000 Hz) used by the TTS system. Use this constant when resampling audio for TTS compatibility.