In-browser Speech-to-Speech Integration with ChatGPT

This project is a voice-based chatbot that combines Express.js, WebSockets (socket.io), Google Cloud Speech-to-Text, OpenAI, and Resemble AI. It lets users hold fluid, real-time conversations with OpenAI’s GPT-4 through a browser-based interface. Users speak directly into their microphone, and the system transcribes their speech in real time using Google Cloud Speech-to-Text. The transcribed text is sent to OpenAI’s API to generate a response, which Resemble AI’s text-to-speech service converts into audio that is played back to the user.

Here’s a summary of the steps this system takes:

Set up Express Server: The Express.js app serves the user interface (UI) and handles API requests.
- Views are rendered using the EJS templating engine.
Frontend Route: The root route (/) renders an index.ejs file as the basic user interface.
Run the Server: The server listens on the defined port and logs that it is running.
Google Cloud Speech-to-Text: A WebSocket connection (socket.io) streams live audio data from the client to the backend.
Real-Time Communication:
- On connection, the backend creates a transcription stream for each client.
- As the client sends audio data (audioData event), it is streamed into the Google Speech-to-Text service for transcription.
- The transcription is returned via the socket (transcription event) to update the UI.
Chat Completion with OpenAI:
- The /chat_completion route handles POST requests containing a conversation history in jsonData.messages.
- The conversation is sent to OpenAI’s Chat Completions API (GPT-4), and the response is streamed directly back to the client.
Generate Audio Clip with Resemble AI:
- The /audio_clip route receives a text query and calls generateAudioClip to send the text to Resemble AI’s API.
- The API converts the text into speech, and the frontend downloads and plays the audio from the resulting URL.
Handle Client Disconnects:
- When a client disconnects, the corresponding transcription stream is closed, and the client is removed from the active streams list.
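
The per-client stream lifecycle described above (create a stream on connection, forward audioData chunks, return transcription events, clean up on disconnect) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: TranscriptionStreams, attachClient, and the createStream factory are hypothetical names, and the factory stands in for whatever wraps the Google Cloud Speech-to-Text streaming request. The only thing assumed about the socket is that it exposes id, on(), and emit(), as socket.io sockets do.

```javascript
// Hypothetical helper: tracks one transcription stream per connected client.
// `createStream` is an assumed factory; in the real app it would wrap a
// Google Cloud Speech-to-Text streaming request.
class TranscriptionStreams {
  constructor(createStream) {
    this.createStream = createStream;
    this.streams = new Map(); // client id -> active stream
  }

  // On connection: open a stream that reports transcripts via `onTranscript`.
  open(clientId, onTranscript) {
    const stream = this.createStream(onTranscript);
    this.streams.set(clientId, stream);
    return stream;
  }

  // On each audioData event: forward the chunk if the client has a stream.
  write(clientId, chunk) {
    const stream = this.streams.get(clientId);
    if (!stream) return false;
    stream.write(chunk);
    return true;
  }

  // On disconnect: end the stream and remove the client from the registry.
  close(clientId) {
    const stream = this.streams.get(clientId);
    if (!stream) return false;
    stream.end();
    this.streams.delete(clientId);
    return true;
  }
}

// Wire the registry to any socket-like object exposing id, on(), and emit().
function attachClient(registry, socket) {
  registry.open(socket.id, (text) => socket.emit('transcription', text));
  socket.on('audioData', (chunk) => registry.write(socket.id, chunk));
  socket.on('disconnect', () => registry.close(socket.id));
}
```

With socket.io, attachClient would typically be called from the server's connection handler. Keeping the registry separate from the socket wiring makes the cleanup path easy to exercise with fake sockets and streams.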

Installation

Clone this repository to your local machine:

git clone https://github.com/drewski90/Lucielle-Audio-Chat.git

Install dependencies using npm:

npm install

Set up environment variables by creating a .env file at the root of the project and adding the following variables:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY
RESEMBLE_PROJECT=YOUR_RESEMBLE_PROJECT_ID
RESEMBLE_API_KEY=YOUR_RESEMBLE_API_KEY
RESEMBLE_VOICE_ID=YOUR_RESEMBLE_VOICE_ID
GOOGLE_APPLICATION_CREDENTIALS=PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY

Replace PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY with the path to your Google Cloud service account key file. The Speech-to-Text API must be enabled for the Google Cloud project that the service account belongs to.
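
A missing variable tends to surface only when the first API call fails, so it can help to check the configuration at startup. The sketch below is one possible way to do that and is not part of the repository: missingEnvVars and assertEnv are hypothetical helpers, and the list of names is copied from the .env example above.

```javascript
// Variable names taken from the .env example above.
const REQUIRED_ENV_VARS = [
  'OPENAI_API_KEY',
  'RESEMBLE_PROJECT',
  'RESEMBLE_API_KEY',
  'RESEMBLE_VOICE_ID',
  'GOOGLE_APPLICATION_CREDENTIALS',
];

// Hypothetical helper: returns the required names that are unset or blank.
function missingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Hypothetical helper: throws one readable error listing every missing name.
function assertEnv(env) {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling assertEnv(process.env) once the .env file has been loaded, and before the server starts listening, would report all missing values in a single error. How the project loads its .env file is not shown here, so that step is left out of the sketch.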

Run the app:

npm start

