Getting Started
The Perso Interactive SDK lets you embed a real-time, WebRTC-based AI avatar session in a web app, with LLM chat, TTS/STT, and client-side tool calls.
This page is a short orientation. For the complete API surface, see the API Reference.
Install
# npm
npm install perso-interactive-sdk-web
# yarn
yarn add perso-interactive-sdk-web
# pnpm
pnpm add perso-interactive-sdk-web
Entry points
The package exposes two subpath exports:
| Subpath | Use from | Purpose |
|---|---|---|
| perso-interactive-sdk-web/client | Browser | createSession, ChatTool, ChatState, error classes. |
| perso-interactive-sdk-web/server | Node.js / SSR runtime | createSessionId, getIntroMessage. Keeps your API key out of browser code. |
Security
Never call the server-only createSessionId from browser code with a real API key. Create the session id on your server and pass only the id to the client.
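One way to follow this rule is to wrap the server-side call in a small handler that returns nothing but the id. The sketch below is framework-agnostic and makes several assumptions: `handleSessionRequest`, the `HandlerResult` shape, and the status codes are hypothetical, and `createSessionId` is passed in as a parameter typed to resolve to a string, which is inferred from the usage on this page rather than taken from the SDK's type definitions.

```typescript
// Assumed shape of the server-only function (inferred, not the SDK's published type).
type CreateSessionId = (options: {
  apiKey: string;
  params: Record<string, unknown>;
}) => Promise<string>;

// Hypothetical response shape for your own API endpoint.
interface HandlerResult {
  status: number;
  body: { sessionId: string } | { error: string };
}

// Creates a session id on the server and exposes only that id to the caller.
// The API key is read server-side and never appears in the response body.
async function handleSessionRequest(
  createSessionId: CreateSessionId,
  apiKey: string | undefined,
  params: Record<string, unknown>,
): Promise<HandlerResult> {
  if (!apiKey) {
    return { status: 500, body: { error: 'Server is missing its API key' } };
  }
  try {
    const sessionId = await createSessionId({ apiKey, params });
    return { status: 200, body: { sessionId } };
  } catch {
    // Do not forward upstream error details to the browser.
    return { status: 502, body: { error: 'Could not create session' } };
  }
}
```

In a real server you would pass the imported `createSessionId` and `process.env.PERSO_INTERACTIVE_API_KEY`, then serialize `body` as the JSON response of your route.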
Quick start
The two functions you need are createSessionId (server) and createSession (browser). Both take a single options object — no API server URL to wire up.
// --- Server (Node.js) ---
import { createSessionId } from 'perso-interactive-sdk-web/server';
const sessionId = await createSessionId({
  apiKey: process.env.PERSO_INTERACTIVE_API_KEY!,
  params: {
    using_stf_webrtc: true,
    model_style: '<model_style_name>',
    prompt: '<prompt_id>',
    llm_type: '<llm_name>',
    tts_type: '<tts_name>',
    stt_type: '<stt_name>',
  },
});
// Return `sessionId` to the browser via your API endpoint.
// --- Browser ---
import { createSession } from 'perso-interactive-sdk-web/client';
const session = await createSession({
  sessionId, // received from your server endpoint
  width: 1920,
  height: 1080,
  clientTools: [],
});
session.setSrc(document.getElementById('video') as HTMLVideoElement);
Minimal flow
- Fetch settings (LLMs, TTSs, STTs, model styles, prompts, …) from the API server using the helpers under PersoInteractive, e.g. getAllSettings({ apiKey }).
- On your server, call createSessionId({ apiKey, params }) (or pass sessionTemplateId instead of params) and return the id to the browser.
- In the browser, call createSession({ sessionId, width, height, clientTools }).
- Bind media: session.setSrc(videoElement) or use session.getRemoteStream().
- Subscribe to state and chat log:
  - session.subscribeChatStates(handler)
  - session.subscribeChatLog(handler)
  - session.setErrorHandler(handler)
- Drive interaction with one of the main APIs below.
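The browser-side steps in this list (bind media, then subscribe) can be sketched as a single wiring function. This is a minimal sketch under stated assumptions: `SessionLike`, `Handlers`, and `wireSession` are hypothetical names, the handler payloads are typed as `unknown` because their real shapes are documented in the API Reference, and the interface only mirrors the method names that appear on this page.

```typescript
// Structural stand-in for the session object (an assumption, not the SDK's types).
// `V` is the video element type, HTMLVideoElement in the browser.
interface SessionLike<V> {
  setSrc(video: V): void;
  subscribeChatStates(handler: (states: unknown) => void): unknown;
  subscribeChatLog(handler: (log: unknown) => void): unknown;
  setErrorHandler(handler: (error: unknown) => void): unknown;
}

// Hypothetical bundle of application callbacks.
interface Handlers {
  onStates(states: unknown): void;
  onLog(log: unknown): void;
  onError(error: unknown): void;
}

// Binds the remote video, then registers all three observers in one place.
function wireSession<V>(session: SessionLike<V>, video: V, handlers: Handlers): void {
  session.setSrc(video);
  // Arrow wrappers keep `this` intact if the handlers are methods on a class instance.
  session.subscribeChatStates((states) => handlers.onStates(states));
  session.subscribeChatLog((log) => handlers.onLog(log));
  session.setErrorHandler((error) => handlers.onError(error));
}
```

Keeping the wiring in one function makes it easy to call right after createSession resolves and to test with a fake session object.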
Main interaction APIs
Chat (Recommended) — processLLM → processTTS → processSTF
Full pipeline with individual step control. Use this when you need to handle each stage (LLM response, TTS audio, avatar animation) separately.
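When the stages do not need separate handling, they can be composed into one reusable function. The sketch below is an assumption-laden illustration rather than SDK code: `LlmChunk`, `PipelineSession`, and `runChatTurn` are hypothetical names, the chunk fields are inferred from the example on this page, and the audio type is kept generic (a Blob in the browser) so the function can be exercised with a fake session.

```typescript
// Chunk shape inferred from this page; other chunk types and fields are not modeled.
interface LlmChunk {
  type: string;
  finish?: boolean;
  message?: string;
}

// Structural stand-in for the three session methods used here (an assumption).
interface PipelineSession<A extends { type: string }> {
  processLLM(input: { message: string }): AsyncIterable<LlmChunk>;
  processTTS(text: string): Promise<A | null | undefined>;
  processSTF(audio: A, mimeType: string, text: string): Promise<unknown>;
}

// Runs one chat turn: LLM text, then speech audio, then avatar animation.
// Returns the final text and whether the avatar actually spoke it.
async function runChatTurn<A extends { type: string }>(
  session: PipelineSession<A>,
  message: string,
): Promise<{ text: string; spoken: boolean }> {
  let text = '';
  for await (const chunk of session.processLLM({ message })) {
    // Keep only the chunk flagged as the finished message.
    if (chunk.type === 'message' && chunk.finish && chunk.message !== undefined) {
      text = chunk.message;
    }
  }
  if (text === '') return { text, spoken: false };

  const audio = await session.processTTS(text);
  if (!audio) return { text, spoken: false }; // no audio produced, skip animation

  await session.processSTF(audio, audio.type, text);
  return { text, spoken: true };
}
```

The individual calls are written out next, which is the form to use when each stage needs its own handling.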
// 1. Get LLM response
const llmGenerator = session.processLLM({ message: 'Hello!' });
let llmResponse = '';
for await (const chunk of llmGenerator) {
  if (chunk.type === 'message' && chunk.finish) {
    llmResponse = chunk.message;
  }
}
// 2. Convert text to speech
const audioBlob = await session.processTTS(llmResponse);
// 3. Animate avatar with audio
if (audioBlob) {
  await session.processSTF(audioBlob, audioBlob.type, llmResponse);
}
With voice input (STT → LLM → TTS → STF):
await session.startProcessSTT();
const text = await session.stopProcessSTT();
// Pass `text` to the processLLM pipeline above
Direct Speech — processTTSTF
Avatar speaks text directly without LLM. Useful for scripted greetings, announcements, or guided messages.
session.processTTSTF('Welcome! How can I help you today?');
Lifecycle & observability
session.setSrc(videoElement); // bind remote video stream
session.subscribeChatStates(handler); // ChatState set updates
session.subscribeChatLog(handler); // full chat log updates
session.setErrorHandler(handler); // typed Error reporting
session.onClose((manual) => { /* ... */ }); // `manual` separates a 200 close from a disconnect
session.stopSession(); // tear down WebRTC + media
Where to go next
- API Reference — full function/type signatures, return shapes, and error hierarchy.
- README on npm — quick install snippets and project links.