76 changes: 76 additions & 0 deletions README.md
@@ -159,6 +159,82 @@ Personaplex finetunes Moshi and benefits from the generalization capabilities of
You enjoy having a good conversation. Have a technical discussion about fixing a reactor core on a spaceship to Mars. You are an astronaut on a Mars mission. Your name is Alex. You are already dealing with a reactor core meltdown on a Mars mission. Several ship systems are failing, and continued instability will lead to catastrophic failure. You explain what is happening and you urgently ask for help thinking through how to stabilize the reactor.
```

## Korean Language Support

PersonaPlex supports Korean conversations through a parallel pipeline using best-in-class open-source components:

- **ASR**: `faster-whisper` (Korean speech to text)
- **LLM**: Any OpenAI-compatible API (Korean text generation)
- **TTS**: CosyVoice2-0.5B (Korean text to speech, streaming)
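The hand-off between the three components is a simple chain: speech in, text through the LLM, speech out. A minimal sketch of that flow, where `asr`, `llm`, and `tts` are stand-ins for faster-whisper, an OpenAI-compatible client, and CosyVoice2 (the function and parameter names below are illustrative, not the actual PersonaPlex API):

```python
from typing import Callable, Iterable

def korean_turn(
    audio: bytes,
    asr: Callable[[bytes], str],           # e.g. faster-whisper transcription
    llm: Callable[[str], str],             # e.g. chat.completions wrapper
    tts: Callable[[str], Iterable[bytes]],  # e.g. CosyVoice2 streaming synthesis
) -> Iterable[bytes]:
    """One user turn: Korean speech in, Korean speech out (sketch)."""
    user_text = asr(audio)        # Korean speech -> Korean text
    reply_text = llm(user_text)   # Korean text -> Korean response text
    return tts(reply_text)        # Korean response -> streamed audio chunks
```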

### Additional Dependencies

Install Korean language support dependencies:
```bash
pip install "faster-whisper>=1.0.0" "openai>=1.0.0" "librosa>=0.10.0"
```

For CosyVoice2 TTS, follow the [CosyVoice2 installation guide](https://github.com/FunAudioLLM/CosyVoice).

### LLM Backend Setup

Korean mode requires an OpenAI-compatible LLM backend. The easiest option is [Ollama](https://ollama.ai):

```bash
# Install and start Ollama, then pull a Korean-capable model
ollama pull qwen2.5:7b
```
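Before launching the server, you can sanity-check that the endpoint is up by querying its model list (`GET /v1/models` is part of the OpenAI-compatible surface Ollama exposes). A quick standard-library-only sketch, assuming Ollama's default port:

```python
import json
import urllib.error
import urllib.request

def check_llm_endpoint(base_url: str = "http://localhost:11434/v1",
                       timeout: float = 3.0) -> str:
    """Return a one-line status for an OpenAI-compatible endpoint."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ids = [m.get("id") for m in json.load(resp).get("data", [])]
            return f"reachable, models: {ids}"
    except (urllib.error.URLError, OSError) as err:
        return f"not reachable ({err})"

print(check_llm_endpoint())
```

If the check reports the endpoint as unreachable, start Ollama (`ollama serve`) before launching `moshi.server`.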

### Launching with Korean Support

```bash
# English + Korean (both pipelines)
SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR" --language all \
--llm-endpoint http://localhost:11434/v1 --llm-model qwen2.5:7b

# Korean only
SSL_DIR=$(mktemp -d); python -m moshi.server --ssl "$SSL_DIR" --language ko \
--llm-endpoint http://localhost:11434/v1 --llm-model qwen2.5:7b
```

### Korean CLI Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| `--language` | `en` | Language mode: `en`, `ko`, or `all` |
| `--llm-endpoint` | `http://localhost:11434/v1` | OpenAI-compatible LLM API endpoint |
| `--llm-model` | `qwen2.5:7b` | LLM model name |
| `--llm-api-key` | `ollama` | API key for the LLM endpoint |
| `--whisper-model` | `large-v3` | Whisper model size for Korean ASR |
| `--cosyvoice-model` | `FunAudioLLM/CosyVoice2-0.5B` | CosyVoice2 model for Korean TTS |

### Korean Voices

| Voice | Description | Gender |
|-------|------------|--------|
| 한국어 여성 1 | Korean Female Natural | F |
| 한국어 여성 2 | Korean Female Expressive | F |
| 한국어 남성 1 | Korean Male Natural | M |
| 한국어 남성 2 | Korean Male Expressive | M |

### Korean Pipeline Architecture

```
User Mic → Opus → WebSocket → faster-whisper (Korean ASR)
                                   ↓ Korean text
                              LLM (OpenAI-compatible API)
                                   ↓ Korean response text
                              CosyVoice2-0.5B (Korean TTS, streaming)
                                   ↓
                              PCM → Opus → WebSocket → Client Speaker
```
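Since the LLM streams tokens while the TTS stage synthesizes whole sentences, a pipeline shaped like the one above typically buffers the token stream and flushes complete sentences to TTS as they appear. A simplified sketch of that buffering (the punctuation set and function name are illustrative choices, not the server's actual implementation):

```python
import re
from typing import Iterable, Iterator

# Sentence-final punctuation used as flush boundaries (illustrative choice;
# covers both ASCII and full-width Korean/CJK sentence enders).
_BOUNDARY = re.compile(r"(?<=[.!?。！？])\s*")

def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed LLM tokens; yield complete sentences for TTS."""
    buf = ""
    for tok in tokens:
        buf += tok
        parts = _BOUNDARY.split(buf)
        # Everything before the last split point is a complete sentence.
        for sent in parts[:-1]:
            if sent:
                yield sent
        buf = parts[-1]
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream
```

This keeps time-to-first-audio low: the first sentence goes to TTS while the LLM is still generating the rest of the reply.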

The Korean pipeline runs on a separate WebSocket endpoint (`/api/chat-ko`) and uses the same binary protocol as the English pipeline. Users select their language from the UI before connecting.

## License

The code is provided under the MIT license. The model weights are released under the NVIDIA Open Model License.
40 changes: 33 additions & 7 deletions client/src/app.tsx
@@ -1,18 +1,44 @@
import { useState, useCallback, useMemo } from "react";
import ReactDOM from "react-dom/client";
import {
createBrowserRouter,
RouterProvider,
} from "react-router-dom";
import "./index.css";
import { Queue } from "./pages/Queue/Queue";
import { I18nContext, Language, translate } from "./i18n";

const router = createBrowserRouter([
{
path: "/",
element: <Queue />,
},
]);
const App = () => {
const [language, setLanguage] = useState<Language>("en");

const t = useCallback(
(key: string) => translate(language, key),
[language],
);

const i18nValue = useMemo(
() => ({ language, setLanguage, t }),
[language, t],
);

const router = useMemo(
() =>
createBrowserRouter([
{
path: "/",
element: <Queue />,
},
]),
[],
);

return (
<I18nContext.Provider value={i18nValue}>
<RouterProvider router={router} />
</I18nContext.Provider>
);
};

ReactDOM.createRoot(document.getElementById("root") as HTMLElement).render(
<RouterProvider router={router}/>
<App />
);
48 changes: 48 additions & 0 deletions client/src/i18n/en.json
@@ -0,0 +1,48 @@
{
"app": {
"title": "PersonaPlex",
"description": "Full duplex conversational AI with text and voice control."
},
"queue": {
"textPromptLabel": "Text Prompt:",
"examplesLabel": "Examples:",
"textPromptPlaceholder": "Enter your text prompt...",
"voiceLabel": "Voice:",
"connectButton": "Connect",
"microphoneError": "Please enable your microphone before proceeding",
"languageLabel": "Language:"
},
"conversation": {
"newConversation": "New Conversation",
"disconnect": "Disconnect",
"connecting": "Connecting...",
"downloadAudio": "Download audio",
"connectionIssue": "A connection issue has been detected, you've been reconnected",
"dismiss": "Dismiss"
},
"serverInfo": {
"header": "Our server is running on the following configuration:",
"textTemperature": "Text temperature",
"textTopk": "Text topk",
"audioTemperature": "Audio temperature",
"audioTopk": "Audio topk",
"padMult": "Pad mult",
"repeatPenaltyLastN": "Repeat penalty last N",
"repeatPenalty": "Repeat penalty",
"lmModelFile": "LM model file",
"instanceName": "Instance name"
},
"stats": {
"title": "Server Audio Stats",
"audioPlayed": "Audio played:",
"missedAudio": "Missed audio:",
"latency": "Latency:",
"minMaxBuffer": "Min/Max buffer:"
},
"presets": {
"assistant": "Assistant (default)",
"medical": "Medical office (service)",
"bank": "Bank (service)",
"astronaut": "Astronaut (fun)"
}
}
45 changes: 45 additions & 0 deletions client/src/i18n/index.ts
@@ -0,0 +1,45 @@
import { createContext, useContext } from "react";
import en from "./en.json";
import ko from "./ko.json";

export type Language = "en" | "ko";

const translations: Record<Language, typeof en> = { en, ko };

export type I18nContextType = {
language: Language;
setLanguage: (lang: Language) => void;
t: (key: string) => string;
};

/**
* Get a nested value from an object using a dot-separated key path.
*/
function getNestedValue(obj: Record<string, unknown>, keyPath: string): string {
const keys = keyPath.split(".");
let current: unknown = obj;
for (const k of keys) {
if (current === null || current === undefined || typeof current !== "object") {
return keyPath;
}
current = (current as Record<string, unknown>)[k];
}
return typeof current === "string" ? current : keyPath;
}

export function translate(language: Language, key: string): string {
return getNestedValue(
translations[language] as unknown as Record<string, unknown>,
key,
);
}

export const I18nContext = createContext<I18nContextType>({
language: "en",
setLanguage: () => {},
t: (key: string) => translate("en", key),
});

export function useI18n(): I18nContextType {
return useContext(I18nContext);
}
48 changes: 48 additions & 0 deletions client/src/i18n/ko.json
@@ -0,0 +1,48 @@
{
"app": {
"title": "PersonaPlex",
"description": "텍스트와 음성 제어가 가능한 전이중 대화형 AI."
},
"queue": {
"textPromptLabel": "텍스트 프롬프트:",
"examplesLabel": "예시:",
"textPromptPlaceholder": "텍스트 프롬프트를 입력하세요...",
"voiceLabel": "음성:",
"connectButton": "연결",
"microphoneError": "진행하기 전에 마이크를 활성화해 주세요",
"languageLabel": "언어:"
},
"conversation": {
"newConversation": "새 대화",
"disconnect": "연결 해제",
"connecting": "연결 중...",
"downloadAudio": "오디오 다운로드",
"connectionIssue": "연결 문제가 감지되어 다시 연결되었습니다",
"dismiss": "닫기"
},
"serverInfo": {
"header": "서버 구성 정보:",
"textTemperature": "텍스트 온도",
"textTopk": "텍스트 Top-K",
"audioTemperature": "오디오 온도",
"audioTopk": "오디오 Top-K",
"padMult": "패드 배수",
"repeatPenaltyLastN": "반복 패널티 마지막 N",
"repeatPenalty": "반복 패널티",
"lmModelFile": "LM 모델 파일",
"instanceName": "인스턴스 이름"
},
"stats": {
"title": "서버 오디오 통계",
"audioPlayed": "재생된 오디오:",
"missedAudio": "누락된 오디오:",
"latency": "지연 시간:",
"minMaxBuffer": "최소/최대 버퍼:"
},
"presets": {
"assistant": "AI 비서 (기본)",
"medical": "의료 상담 (서비스)",
"bank": "은행 상담 (서비스)",
"astronaut": "우주비행사 (재미)"
}
}
20 changes: 14 additions & 6 deletions client/src/pages/Conversation/Conversation.tsx
@@ -13,6 +13,7 @@ import { ModelParamsValues, useModelParams } from "./hooks/useModelParams";
import fixWebmDuration from "webm-duration-fix";
import { getMimeType, getExtension } from "./getMimeType";
import { type ThemeType } from "./hooks/useSystemTheme";
import { useI18n, type Language } from "../../i18n";

type ConversationProps = {
workerAddr: string;
@@ -21,6 +22,7 @@ type ConversationProps = {
sessionId?: number;
email?: string;
theme: ThemeType;
language?: Language;
audioContext: MutableRefObject<AudioContext|null>;
worklet: MutableRefObject<AudioWorkletNode|null>;
onConversationEnd?: () => void;
@@ -36,13 +38,15 @@ const buildURL = ({
email,
textSeed,
audioSeed,
language = "en",
}: {
workerAddr: string;
params: ModelParamsValues;
workerAuthId?: string;
email?: string;
textSeed: number;
audioSeed: number;
language?: Language;
}) => {
const newWorkerAddr = useMemo(() => {
if (workerAddr == "same" || workerAddr == "") {
@@ -53,7 +57,8 @@
return workerAddr;
}, [workerAddr]);
const wsProtocol = (window.location.protocol === 'https:') ? 'wss' : 'ws';
const url = new URL(`${wsProtocol}://${newWorkerAddr}/api/chat`);
const chatEndpoint = language === "ko" ? "/api/chat-ko" : "/api/chat";
const url = new URL(`${wsProtocol}://${newWorkerAddr}${chatEndpoint}`);
if(workerAuthId) {
url.searchParams.append("worker_auth_id", workerAuthId);
}
@@ -88,8 +93,10 @@ export const Conversation:FC<ConversationProps> = ({
isBypass=false,
email,
theme,
language = "en",
...params
}) => {
const { t } = useI18n();
const getAudioStats = useRef<() => AudioStats>(() => ({
playedAudioDuration: 0,
missedAudioDuration: 0,
@@ -120,6 +127,7 @@ export const Conversation:FC<ConversationProps> = ({
email: email,
textSeed: textSeed,
audioSeed: audioSeed,
language,
});

const onDisconnect = useCallback(() => {
@@ -223,14 +231,14 @@ export const Conversation:FC<ConversationProps> = ({

const socketButtonMsg = useMemo(() => {
if (isOver) {
return 'New Conversation';
return t("conversation.newConversation");
}
if (socketStatus === "connected") {
return 'Disconnect';
return t("conversation.disconnect");
} else {
return 'Connecting...';
return t("conversation.connecting");
}
}, [isOver, socketStatus]);
}, [isOver, socketStatus, t]);

return (
<SocketContext.Provider
@@ -272,7 +280,7 @@ export const Conversation:FC<ConversationProps> = ({
/>
<UserAudio theme={theme}/>
<div className="pt-8 text-sm flex justify-center items-center flex-col download-links">
{audioURL && <div><a href={audioURL} download={`personaplex_audio.${getExtension("audio")}`} className="pt-2 text-center block">Download audio</a></div>}
{audioURL && <div><a href={audioURL} download={`personaplex_audio.${getExtension("audio")}`} className="pt-2 text-center block">{t("conversation.downloadAudio")}</a></div>}
</div>
</div>
<div className="scrollbar player-text" ref={textContainerRef}>
@@ -2,6 +2,7 @@ import { FC, useRef } from "react";
import { AudioStats, useServerAudio } from "../../hooks/useServerAudio";
import { ServerVisualizer } from "../AudioVisualizer/ServerVisualizer";
import { type ThemeType } from "../../hooks/useSystemTheme";
import { useI18n } from "../../../../i18n";

type ServerAudioProps = {
setGetAudioStats: (getAudioStats: () => AudioStats) => void;
@@ -12,18 +13,19 @@ export const ServerAudio: FC<ServerAudioProps> = ({ setGetAudioStats, theme }) =
setGetAudioStats,
});
const containerRef = useRef<HTMLDivElement>(null);
const { t } = useI18n();
return (
<>
{hasCriticalDelay && (
<div className="fixed left-0 top-0 flex w-screen justify-between bg-red-500 p-2 text-center">
<p>A connection issue has been detected, you've been reconnected</p>
<p>{t("conversation.connectionIssue")}</p>
<button
onClick={async () => {
setHasCriticalDelay(false);
}}
className="bg-white p-1 text-black"
>
Dismiss
{t("conversation.dismiss")}
</button>
</div>
)}