In-browser Speech-to-Speech Integration with ChatGPT
This project is a voice-based chatbot that runs in the browser, built with Express.js, web sockets (socket.io), Google Cloud Speech-to-Text, OpenAI, and Resemble AI. Users speak into their microphone, and the system transcribes their speech in real time using Google Cloud Speech-to-Text. The transcribed text is then sent to OpenAI’s API (GPT-4) to generate a response. That response is converted to speech by Resemble AI’s text-to-speech service and played back to the user.
Here’s a summary of the steps this system takes:
- Set up Express Server: The Express.js app serves the user interface (UI) and handles API requests.
  - Views are rendered using the EJS templating engine.
- Frontend Route: The root route (`/`) renders an `index.ejs` file as the basic user interface.
- Run the Server: The server listens on the defined port and logs that it is running.
- Google Cloud Speech-to-Text: A connection is established using web sockets (`socket.io`) to stream live audio data from the client to the backend.
- Real-Time Communication:
  - On connection, the backend creates a transcription stream for each client.
  - As the client sends audio data (`audioData` event), it is streamed into the Google Speech-to-Text service for transcription.
  - The transcription is returned via the socket (`transcription` event) to update the UI.
- Chat Completion with OpenAI:
  - The `/chat_completion` route handles POST requests containing a conversation history in `jsonData.messages`.
  - The conversation is sent to OpenAI’s Chat Completions API (GPT-4), and the response is streamed directly back to the client.
- Generate Audio Clip with Resemble AI:
  - The `/audio_clip` route receives a text query and calls `generateAudioClip` to send the text to Resemble AI’s API.
  - The API converts the text into speech, and the resulting audio URL is downloaded and played on the frontend.
- Handle Client Disconnects:
  - When a client disconnects, the corresponding transcription stream is closed, and the client is removed from the active streams list.
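The per-client stream lifecycle described in the Real-Time Communication and Handle Client Disconnects steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `createStreamManager` and the injected `createStream` factory are hypothetical names, and the factory stands in for whatever wraps the Google Speech-to-Text streaming request. Only the `audioData`, `transcription`, and `disconnect` event names come from the description above.

```javascript
// Hypothetical helper: keeps one transcription stream per connected client.
// `createStream(onTranscript)` is an assumed factory that returns an object
// with write(chunk) and end(); in the real app it would wrap a
// Google Speech-to-Text streaming request.
function createStreamManager(createStream) {
  const streams = new Map(); // socket id -> active transcription stream

  function handleConnection(socket) {
    // Forward every transcript the stream produces back to this client.
    const stream = createStream((text) => socket.emit("transcription", text));
    streams.set(socket.id, stream);

    // Pipe each audio chunk sent by the browser into the stream.
    socket.on("audioData", (chunk) => stream.write(chunk));

    // On disconnect, close the stream and remove it from the active list.
    socket.on("disconnect", () => {
      stream.end();
      streams.delete(socket.id);
    });
  }

  return { handleConnection, activeCount: () => streams.size };
}
```

With socket.io, a manager like this would typically be attached with `io.on("connection", manager.handleConnection)`. Injecting the stream factory keeps the bookkeeping testable without network access or cloud credentials.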
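For the Chat Completion step, the sketch below shows one way the upstream request and the streamed reply could be handled. It is an assumption-laden illustration rather than the project's implementation (which may well use the official OpenAI SDK): `buildChatRequest` and `extractDeltas` are hypothetical helper names, and `extractDeltas` assumes each chunk contains only complete `data:` lines, whereas a production parser would buffer partial lines across chunks.

```javascript
// Build a streaming request for OpenAI's Chat Completions endpoint.
// `messages` is the conversation history, e.g. [{ role: "user", content: "Hi" }].
function buildChatRequest(messages, apiKey, model = "gpt-4") {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // stream: true asks the API to send the reply as server-sent events.
      body: JSON.stringify({ model, messages, stream: true }),
    },
  };
}

// Pull the text fragments out of one chunk of the server-sent event stream.
// Assumption: the chunk holds whole "data: ..." lines only.
function extractDeltas(sseChunk) {
  const deltas = [];
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank separator lines
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // end-of-stream marker
    const content = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (content) deltas.push(content);
  }
  return deltas;
}
```

The object returned by `buildChatRequest` can be passed to `fetch(url, options)`, and each decoded chunk of the response body can be run through `extractDeltas` before being written to the client.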
Installation
Clone this repository to your local machine:
git clone https://github.com/drewski90/Lucielle-Audio-Chat.git
Install dependencies using npm:
npm install
Set up environment variables by creating a `.env` file at the root of the project and adding the following variables:
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
RESEMBLE_PROJECT=YOUR_RESEMBLE_PROJECT_ID
RESEMBLE_API_KEY=YOUR_RESEMBLE_API_KEY
RESEMBLE_VOICE_ID=YOUR_RESEMBLE_VOICE_ID
GOOGLE_APPLICATION_CREDENTIALS=PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY
Replace `PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY` with the path to your Google Cloud service account key file. The Speech-to-Text API must be enabled on that account.
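Because the app depends on all five variables, a small startup check can surface a missing one before the first request fails. The sketch below is an optional addition, not part of the repository: `findMissingEnvVars` and `assertEnvConfigured` are hypothetical names, and only the variable names are taken from the list above.

```javascript
// Variable names taken from the .env listing in this README.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "RESEMBLE_PROJECT",
  "RESEMBLE_API_KEY",
  "RESEMBLE_VOICE_ID",
  "GOOGLE_APPLICATION_CREDENTIALS",
];

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter(
    (name) => !env[name] || env[name].trim() === ""
  );
}

// Throw a readable error if any required variable is missing.
function assertEnvConfigured(env = process.env) {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnvConfigured()` once, after the `.env` file has been loaded and before the server starts listening, would stop the process early with the names of the variables that still need values.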
Run the app:
npm start