In-browser Speech-to-Speech Integration with ChatGPT

This project is a voice-based interactive chatbot that combines Express.js, WebSockets (socket.io), Google Cloud Speech-to-Text, OpenAI, and Resemble AI. It lets users hold fluid, real-time conversations with OpenAI’s GPT-4 through a browser-based interface. Users speak directly into their microphone, and the system transcribes their speech in real time using Google Cloud Speech-to-Text. The transcribed text is sent to OpenAI’s API to generate a response, which Resemble AI’s text-to-speech service converts into audio that is played back to the user.
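The flow described above can be sketched as a single conversational turn. This is a minimal sketch under stated assumptions, not the project's actual code: `runTurn`, `complete`, and `synthesize` are hypothetical names, and the OpenAI and Resemble AI calls are passed in as functions so the example runs without network access or API keys.

```javascript
// Hypothetical sketch of one conversational turn: transcript -> reply -> audio.
// `complete` and `synthesize` stand in for the OpenAI and Resemble AI calls;
// they are injected so the flow can be exercised without any API keys.
async function runTurn(history, transcript, { complete, synthesize }) {
  // Append the user's transcribed speech to the conversation history.
  const messages = [...history, { role: "user", content: transcript }];

  // Ask the language model for a reply based on the whole conversation.
  const reply = await complete(messages);

  // Convert the reply text to speech; assumed here to resolve to an audio URL.
  const audioUrl = await synthesize(reply);

  // Return the updated history so the next turn keeps its context.
  return {
    messages: [...messages, { role: "assistant", content: reply }],
    reply,
    audioUrl,
  };
}

// Stand-in services (assumptions, for illustration only).
const fakeServices = {
  complete: async (messages) =>
    `You said: ${messages[messages.length - 1].content}`,
  synthesize: async (text) =>
    `https://example.invalid/clip?text=${encodeURIComponent(text)}`,
};

const turnResult = runTurn([], "hello", fakeServices);
turnResult.then(({ reply, audioUrl }) => console.log(reply, audioUrl));
```

Injecting the two services keeps the turn logic independent of any particular SDK, so the real API clients can be swapped in without changing how the history is built.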

Here’s a summary of the steps this system takes:

  1. Set up Express Server: The Express.js app serves the user interface (UI) and handles API requests.
    • Views are rendered using the EJS templating engine.
  2. Frontend Route: The root route (/) renders an index.ejs file as the basic user interface.
  3. Run the Server: The server listens on the defined port and logs that it is running.
  4. Google Cloud Speech-to-Text: A WebSocket connection (socket.io) is established to stream live audio data from the client to the backend.
  5. Real-Time Communication:
    • On connection, the backend creates a transcription stream for each client.
    • As the client sends audio data (audioData event), it is streamed into the Google Speech-to-Text service for transcription.
    • The transcription is returned via the socket (transcription event) to update the UI.
  6. Chat Completion with OpenAI:
    • The /chat_completion route handles POST requests containing a conversation history in jsonData.messages.
    • The conversation is sent to OpenAI’s Chat Completions API (GPT-4), and the response is streamed directly back to the client.
  7. Generate Audio Clip with Resemble AI:
    • The /audio_clip route receives a text query and calls generateAudioClip to send the text to Resemble AI’s API.
    • The API converts the text into speech, and the resulting audio URL is downloaded and played on the frontend.
  8. Handle Client Disconnects:
    • When a client disconnects, the corresponding transcription stream is closed, and the client is removed from the active streams list.
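
Steps 4, 5, and 8 above (one transcription stream per client, closed on disconnect) can be sketched as follows. This is a rough illustration rather than the project's code: `attachTranscription` and `createStream` are hypothetical names, and plain Node `EventEmitter` objects stand in for the socket.io server and client sockets so the example runs without any dependencies. In a real server, `createStream` might wrap the streaming recognition call of the `@google-cloud/speech` client.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical sketch: keep one transcription stream per connected client.
// `createStream` is an assumed factory that takes a callback for recognized
// text and returns an object with write(chunk) and end().
function attachTranscription(io, createStream) {
  const activeStreams = new Map(); // socket id -> transcription stream

  io.on("connection", (socket) => {
    // Create a dedicated stream and send its results to this client only.
    const stream = createStream((text) => socket.emit("transcription", text));
    activeStreams.set(socket.id, stream);

    // Each audio chunk from the browser is written into the client's stream.
    socket.on("audioData", (chunk) => stream.write(chunk));

    // On disconnect, close the stream and drop the client from the registry.
    socket.on("disconnect", () => {
      stream.end();
      activeStreams.delete(socket.id);
    });
  });

  return activeStreams;
}

// Stand-ins for socket.io and the speech service (assumptions for the demo).
const fakeIo = new EventEmitter();
const fakeCreateStream = (onText) => ({
  write: (chunk) => onText(`heard ${chunk.length} bytes`),
  end: () => {},
});
const streams = attachTranscription(fakeIo, fakeCreateStream);

// A fake client socket; on a plain EventEmitter, emit() loops back locally.
const fakeSocket = Object.assign(new EventEmitter(), { id: "client-1" });
const received = [];
fakeSocket.on("transcription", (text) => received.push(text));

fakeIo.emit("connection", fakeSocket);
fakeSocket.emit("audioData", Buffer.from([1, 2, 3]));
const sizeWhileConnected = streams.size;
fakeSocket.emit("disconnect");

console.log(received, sizeWhileConnected, streams.size);
```

Keying the registry by socket id means a disconnect only touches that client's stream, which is the cleanup behavior step 8 describes.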

Installation

Clone this repository to your local machine:

git clone https://github.com/drewski90/Lucielle-Audio-Chat.git

Install dependencies using npm:

npm install

Set up environment variables by creating a .env file at the root of the project and adding the following variables:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY
RESEMBLE_PROJECT=YOUR_RESEMBLE_PROJECT_ID
RESEMBLE_API_KEY=YOUR_RESEMBLE_API_KEY
RESEMBLE_VOICE_ID=YOUR_RESEMBLE_VOICE_ID
GOOGLE_APPLICATION_CREDENTIALS=PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY

Replace PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY with the path to your Google Cloud service account key file. The Speech-to-Text API must be enabled for the Google Cloud project that the service account belongs to.
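
A missing variable typically surfaces only when the first API call fails, so a startup check can help. The sketch below is an assumption, not part of the project: `missingEnvVars` is a hypothetical helper that reports which of the variables listed above are unset or blank.

```javascript
// Hypothetical startup check: list which required variables are missing.
// The names match the .env entries above; the helper itself is an assumption.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "RESEMBLE_PROJECT",
  "RESEMBLE_API_KEY",
  "RESEMBLE_VOICE_ID",
  "GOOGLE_APPLICATION_CREDENTIALS",
];

function missingEnvVars(env) {
  // Treat unset and blank values the same way.
  return REQUIRED_ENV_VARS.filter(
    (name) => !env[name] || env[name].trim() === ""
  );
}

// Example with a partial configuration object instead of process.env.
const missing = missingEnvVars({
  OPENAI_API_KEY: "sk-example",
  RESEMBLE_PROJECT: " ",
});
console.log(missing);
```

In a real server, the same function could be called with `process.env` once the .env file has been loaded, exiting with a clear message if the returned list is not empty.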

Run the app

npm start

 
