Getting Started

The Perso Interactive SDK lets you embed a real-time, WebRTC-based AI avatar session in a web app, with LLM chat, TTS/STT, and client-side tool calls.

This page is a short orientation. For the complete API surface, see the API Reference.

Install

bash
# npm
npm install perso-interactive-sdk-web

# yarn
yarn add perso-interactive-sdk-web

# pnpm
pnpm add perso-interactive-sdk-web

Entry points

The package exposes two subpath exports:

| Subpath | Use from | Purpose |
| --- | --- | --- |
| perso-interactive-sdk-web/client | Browser | createSession, ChatTool, ChatState, error classes. |
| perso-interactive-sdk-web/server | Node.js / SSR runtime | createSessionId, getIntroMessage; keeps your API key off the wire. |

Security

Never call the server-only createSessionId from browser code with a real API key. Create the session id on your server and pass only the id to the client.
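
One way to keep the key on the server is a small endpoint that creates the session id and returns only that id. The sketch below is framework-agnostic and takes the SDK call as an injected function so it can be read and tested on its own. The names makeSessionHandler, CreateSessionId, and JsonResponse are hypothetical, and the assumption that createSessionId resolves to a string is not confirmed by this page.

```typescript
// Hypothetical shape of the SDK call: only the option names `apiKey` and
// `params` come from this page; the string return type is an assumption.
type CreateSessionId = (options: {
  apiKey: string;
  params: Record<string, unknown>;
}) => Promise<string>;

interface JsonResponse {
  status: number;
  body: { sessionId?: string; error?: string };
}

// Builds a framework-agnostic handler; adapt the returned object to your
// HTTP framework's response type.
function makeSessionHandler(
  createId: CreateSessionId,
  apiKey: string | undefined,
  params: Record<string, unknown>,
): () => Promise<JsonResponse> {
  return async () => {
    if (!apiKey) {
      return { status: 500, body: { error: 'API key is not configured' } };
    }
    try {
      const sessionId = await createId({ apiKey, params });
      // Only the id leaves the server; the key never appears in the body.
      return { status: 200, body: { sessionId } };
    } catch {
      return { status: 502, body: { error: 'Could not create a session' } };
    }
  };
}
```

In a real server you would pass the createSessionId import from perso-interactive-sdk-web/server as the first argument and read the key from an environment variable.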

Quick start

The two functions you need are createSessionId (server) and createSession (browser). Both take a single options object — no API server URL to wire up.

ts
// --- Server (Node.js) ---
import { createSessionId } from 'perso-interactive-sdk-web/server';

const sessionId = await createSessionId({
  apiKey: process.env.PERSO_INTERACTIVE_API_KEY!,
  params: {
    using_stf_webrtc: true,
    model_style: '<model_style_name>',
    prompt: '<prompt_id>',
    llm_type: '<llm_name>',
    tts_type: '<tts_name>',
    stt_type: '<stt_name>',
  },
});
// Return `sessionId` to the browser via your API endpoint.

ts
// --- Browser ---
import { createSession } from 'perso-interactive-sdk-web/client';

const session = await createSession({
  sessionId,           // received from your server endpoint
  width: 1920,
  height: 1080,
  clientTools: [],
});

session.setSrc(document.getElementById('video') as HTMLVideoElement);
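
The browser snippet assumes sessionId has already arrived from your server. A minimal sketch of that fetch step follows. The endpoint path /api/session, the { sessionId } response shape, and the names fetchSessionId and FetchLike are all assumptions for illustration; the fetch function is injected so the helper does not depend on DOM typings.

```typescript
// Minimal subset of the Fetch API this helper relies on.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Asks your own server (not the Perso API) for a fresh session id.
// The '/api/session' path and the `{ sessionId }` body are hypothetical.
async function fetchSessionId(
  fetchFn: FetchLike,
  endpoint: string = '/api/session',
): Promise<string> {
  const response = await fetchFn(endpoint, { method: 'POST' });
  if (!response.ok) {
    throw new Error(`Session endpoint returned ${response.status}`);
  }
  const data = (await response.json()) as { sessionId?: unknown };
  if (typeof data.sessionId !== 'string' || data.sessionId.length === 0) {
    throw new Error('Session endpoint response has no sessionId');
  }
  return data.sessionId;
}
```

In the browser you could call fetchSessionId((url, init) => fetch(url, init)) and pass the result to createSession as sessionId.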

Minimal flow

  1. Fetch settings (LLMs, TTSs, STTs, model styles, prompts, …) from the API server using the helpers under PersoInteractive — e.g. getAllSettings({ apiKey }).
  2. On your server, call createSessionId({ apiKey, params }) (or pass sessionTemplateId instead of params) and return the id to the browser.
  3. In the browser, call createSession({ sessionId, width, height, clientTools }).
  4. Bind media: session.setSrc(videoElement) or use session.getRemoteStream().
  5. Subscribe to state and chat log:
    • session.subscribeChatStates(handler)
    • session.subscribeChatLog(handler)
    • session.setErrorHandler(handler)
  6. Drive interaction with one of the main APIs below.
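
Steps 4 and 5 can be bundled into one helper so every session is wired the same way. The sketch below uses a locally declared interface that lists only the methods named on this page. The handler payload types and return types are assumptions (typed as unknown here), and attachSession, ObservableSession, and SessionHandlers are hypothetical names, not SDK exports.

```typescript
// Structural view of the session: method names come from this page,
// parameter and return types are assumptions.
interface ObservableSession<Video> {
  setSrc(video: Video): unknown;
  subscribeChatStates(handler: (states: unknown) => void): unknown;
  subscribeChatLog(handler: (log: unknown) => void): unknown;
  setErrorHandler(handler: (error: Error) => void): unknown;
}

interface SessionHandlers {
  onStates(states: unknown): void;
  onLog(log: unknown): void;
  onError(error: Error): void;
}

// Binds the video element, then registers the three observers,
// following the order of steps 4 and 5 above.
function attachSession<Video>(
  session: ObservableSession<Video>,
  video: Video,
  handlers: SessionHandlers,
): void {
  session.setSrc(video);
  session.subscribeChatStates((states) => handlers.onStates(states));
  session.subscribeChatLog((log) => handlers.onLog(log));
  session.setErrorHandler((error) => handlers.onError(error));
}
```

The Video type parameter stands in for HTMLVideoElement so the helper also type-checks outside a DOM environment.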

Main interaction APIs

The full pipeline (processLLM → processTTS → processSTF) gives you control over each step. Use it when you need to handle each stage (LLM response, TTS audio, avatar animation) separately.

ts
// 1. Get LLM response
const llmGenerator = session.processLLM({ message: 'Hello!' });
let llmResponse = '';
for await (const chunk of llmGenerator) {
  if (chunk.type === 'message' && chunk.finish) {
    llmResponse = chunk.message;
  }
}

// 2. Convert text to speech
const audioBlob = await session.processTTS(llmResponse);

// 3. Animate avatar with audio
if (audioBlob) {
  await session.processSTF(audioBlob, audioBlob.type, llmResponse);
}

With voice input (STT → LLM → TTS → STF):

ts
await session.startProcessSTT();
const text = await session.stopProcessSTT();
// Pass `text` to the processLLM pipeline above
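
The voice path and the text pipeline can be chained into one function per conversational turn. This is a sketch under stated assumptions: the interfaces below mirror only the calls shown on this page, with guessed types (for example, that stopProcessSTT resolves to a string and that processTTS resolves to an object with a type field). The names replyWithAvatar, voiceTurn, VoiceSession, LlmChunk, and AudioLike are hypothetical.

```typescript
interface LlmChunk { type: string; finish?: boolean; message?: string; }
interface AudioLike { readonly type: string; } // stand-in for the TTS audio blob

// Method names come from this page; the signatures are assumptions.
interface VoiceSession {
  startProcessSTT(): Promise<unknown>;
  stopProcessSTT(): Promise<string>;
  processLLM(input: { message: string }): AsyncIterable<LlmChunk>;
  processTTS(text: string): Promise<AudioLike | null | undefined>;
  processSTF(audio: AudioLike, mimeType: string, text: string): Promise<unknown>;
}

// Text in, spoken reply out: LLM -> TTS -> STF.
async function replyWithAvatar(session: VoiceSession, message: string): Promise<string> {
  let reply = '';
  for await (const chunk of session.processLLM({ message })) {
    if (chunk.type === 'message' && chunk.finish && typeof chunk.message === 'string') {
      reply = chunk.message; // keep only the finished message
    }
  }
  if (reply === '') return reply; // nothing to speak
  const audio = await session.processTTS(reply);
  if (audio) {
    await session.processSTF(audio, audio.type, reply);
  }
  return reply;
}

// Records until `untilStopped` settles (e.g. a button release),
// then feeds the transcript into the pipeline above.
async function voiceTurn(session: VoiceSession, untilStopped: Promise<void>): Promise<string> {
  await session.startProcessSTT();
  await untilStopped;
  const text = (await session.stopProcessSTT()).trim();
  if (text === '') return ''; // skip the LLM call when nothing was heard
  return replyWithAvatar(session, text);
}
```

Skipping the LLM call on an empty transcript is a design choice of this sketch, not documented SDK behavior.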

Direct Speech — processTTSTF

The avatar speaks the given text directly, without calling the LLM. Useful for scripted greetings, announcements, or guided messages.

ts
session.processTTSTF('Welcome! How can I help you today?');

Lifecycle & observability

ts
session.setSrc(videoElement);        // bind remote video stream
session.subscribeChatStates(handler); // ChatState set updates
session.subscribeChatLog(handler);   // full chat log updates
session.setErrorHandler(handler);    // typed Error reporting
session.onClose((manual) => { /* ... */ }); // 200 close vs disconnect
session.stopSession();               // tear down WebRTC + media
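
The onClose callback can be used to tell a deliberate shutdown from a dropped connection. The sketch below assumes that manual is true for a normal close and false for an unexpected disconnect; this page does not spell that out, so verify it against the API Reference. The names watchClose and ClosableSession are hypothetical.

```typescript
// Structural view of the two lifecycle calls; signatures are assumptions.
interface ClosableSession {
  onClose(handler: (manual: boolean) => void): unknown;
  stopSession(): unknown;
}

// Calls `onUnexpected` only when the session closes without a manual stop.
// Returns a function that tears the session down on purpose.
function watchClose(session: ClosableSession, onUnexpected: () => void): () => void {
  let stoppedByUs = false;
  session.onClose((manual) => {
    // ASSUMPTION: `manual === false` means the connection dropped.
    if (!manual && !stoppedByUs) {
      onUnexpected();
    }
  });
  return () => {
    stoppedByUs = true;
    session.stopSession();
  };
}
```

A typical use would be to show a reconnect prompt in onUnexpected and call the returned function when the user leaves the page.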

Where to go next

  • API Reference — full function/type signatures, return shapes, and error hierarchy.
  • README on npm — quick install snippets and project links.

Released under the MIT License.