In-browser Speech-to-Speech Integration with ChatGPT

This project is a voice-based interactive chatbot built with Express.js, WebSockets, Google Cloud Speech-to-Text, OpenAI, and Resemble AI. It lets users hold fluid, real-time spoken conversations with OpenAI’s GPT-4 through a browser-based interface. Users speak directly into their microphone, and Google Cloud Speech-to-Text transcribes the speech in real time. The transcribed text is then sent to OpenAI’s API to generate a response. Resemble AI’s text-to-speech service converts that response into speech, and the resulting audio is played back to the user.
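Each conversational turn is therefore a three-stage pipeline: speech to text, text to reply, reply to speech. The TypeScript sketch below shows only that sequencing and is not the project’s actual code. The `Pipeline` interface and its `transcribe`, `complete`, and `synthesize` methods are hypothetical stand-ins for the Google, OpenAI, and Resemble AI calls, and a single audio buffer replaces the live stream the real system uses, so the control flow can be read (and run with stubs) without any API keys.

```typescript
// Message shape shared by the stages (modeled on OpenAI chat messages).
type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical stage signatures; in the real app each one calls a single service.
interface Pipeline {
  transcribe(audio: Uint8Array): Promise<string>; // Google Cloud Speech-to-Text
  complete(history: Message[]): Promise<string>; // OpenAI GPT-4
  synthesize(text: string): Promise<string>; // Resemble AI, resolves to an audio URL
}

interface TurnResult {
  transcript: string;
  reply: string;
  audioUrl: string;
}

// One conversational turn: audio in, playable audio URL out.
// `history` is appended to in place so the next turn keeps its context.
async function runTurn(p: Pipeline, history: Message[], audio: Uint8Array): Promise<TurnResult> {
  const transcript = await p.transcribe(audio);
  history.push({ role: "user", content: transcript });

  const reply = await p.complete(history);
  history.push({ role: "assistant", content: reply });

  const audioUrl = await p.synthesize(reply);
  return { transcript, reply, audioUrl };
}
```

Passing the whole `history` to `complete` on every turn is what lets GPT-4 answer follow-up questions, which matches the conversation history the `/chat_completion` route receives.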

Here’s a summary of the steps this system takes:

  1. Set up Express Server: The Express.js app serves the user interface (UI) and handles API requests.
    • Views are rendered using the EJS templating engine.
  2. Frontend Route: The root route (/) renders an index.ejs file as the basic user interface.
  3. Run the Server: The server listens on the defined port and logs that it is running.
  4. Google Cloud Speech-to-Text: A WebSocket connection is established to stream live audio data from the client to the backend.
  5. Real-Time Communication:
    • On connection, the backend creates a transcription stream for each client.
    • As the client sends audio data (audioData event), it is streamed into the Google Speech-to-Text service for transcription.
    • The transcription is returned via the socket (transcription event) to update the UI.
  6. Chat Completion with OpenAI:
    • The /chat_completion route handles POST requests containing a conversation history in jsonData.messages.
    • The conversation is sent to OpenAI’s Chat Completions API (GPT-4), and the response is streamed directly back to the client.
  7. Generate Audio Clip with Resemble AI:
    • The /audio_clip route receives a text query and calls generateAudioClip to send the text to Resemble AI’s API.
    • The API converts the text into speech, and the frontend downloads the audio from the resulting URL and plays it.
  8. Handle Client Disconnects:
    • When a client disconnects, the corresponding transcription stream is closed, and the client is removed from the active streams list.
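Steps 5 and 8 together describe a per-client stream lifecycle: open on connect, write on each `audioData` event, close and forget on disconnect. The sketch below models only that bookkeeping. `StreamRegistry`, `TranscriptionStream`, and `StreamFactory` are hypothetical names. The assumption is that in the real server the factory would wrap a Google streaming recognition request (for example one created with `SpeechClient.streamingRecognize()` from `@google-cloud/speech`) and that `emitTranscription` would emit the `transcription` event on the client’s socket.

```typescript
// Minimal contract for a per-client transcription stream. Assumption: in the
// real server this wraps a Google Speech-to-Text streaming request.
interface TranscriptionStream {
  write(chunk: Uint8Array): void;
  end(): void;
}

// Creates a stream for one client; `onText` receives each transcript result.
type StreamFactory = (clientId: string, onText: (text: string) => void) => TranscriptionStream;

// Keeps exactly one transcription stream per connected client.
class StreamRegistry {
  private readonly streams = new Map<string, TranscriptionStream>();
  private readonly createStream: StreamFactory;
  private readonly emitTranscription: (clientId: string, text: string) => void;

  constructor(createStream: StreamFactory, emitTranscription: (clientId: string, text: string) => void) {
    this.createStream = createStream;
    this.emitTranscription = emitTranscription;
  }

  // On "connection": open a dedicated stream whose results go back to this client.
  connect(clientId: string): void {
    if (this.streams.has(clientId)) return; // ignore duplicate connects
    const stream = this.createStream(clientId, (text) => this.emitTranscription(clientId, text));
    this.streams.set(clientId, stream);
  }

  // On "audioData": forward the chunk; returns false if the client has no open stream.
  audio(clientId: string, chunk: Uint8Array): boolean {
    const stream = this.streams.get(clientId);
    if (!stream) return false;
    stream.write(chunk);
    return true;
  }

  // On "disconnect": close the stream and remove the client from the active list.
  disconnect(clientId: string): void {
    const stream = this.streams.get(clientId);
    if (!stream) return;
    stream.end();
    this.streams.delete(clientId);
  }

  get activeCount(): number {
    return this.streams.size;
  }
}
```

Because every call looks the client up by ID (a socket ID would be a natural choice), audio that arrives after a disconnect is dropped rather than written to a closed stream. Injecting the factory keeps the Google client out of the bookkeeping, so the lifecycle can be tested with a fake stream.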
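The `/chat_completion` route in step 6 has two jobs: check the `jsonData.messages` history it receives, and relay GPT-4’s output to the client as it is generated. The sketch below covers both without any network calls. `parseMessages`, `relayStream`, and `ChunkLike` are hypothetical names. The assumption is that in the real route the chunks come from the official `openai` Node SDK, where `openai.chat.completions.create({ model: "gpt-4", messages, stream: true })` returns an async iterable whose text deltas sit at `choices[0].delta.content`, and that `write` would be `(t) => res.write(t)` followed by `res.end()` once the promise resolves.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Only the fields read below; modeled on OpenAI's streamed chat-completion chunks.
interface ChunkLike {
  choices: { delta: { content?: string | null } }[];
}

const ROLES: readonly string[] = ["system", "user", "assistant"];

function isRole(value: unknown): value is Role {
  return typeof value === "string" && ROLES.indexOf(value) !== -1;
}

// Validate jsonData.messages from the POST body before it is sent to OpenAI.
function parseMessages(jsonData: unknown): ChatMessage[] {
  const messages = (jsonData as { messages?: unknown } | null | undefined)?.messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error("jsonData.messages must be a non-empty array");
  }
  return messages.map((m: unknown, i: number): ChatMessage => {
    const { role, content } = (m ?? {}) as { role?: unknown; content?: unknown };
    if (!isRole(role) || typeof content !== "string") {
      throw new Error(`invalid message at index ${i}`);
    }
    return { role, content };
  });
}

// Forward each streamed text delta as soon as it arrives and return the full reply.
async function relayStream(
  chunks: AsyncIterable<ChunkLike>,
  write: (text: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta.content ?? "";
    if (delta) {
      write(delta);
      full += delta;
    }
  }
  return full;
}
```

Returning the accumulated text lets the server keep the complete reply (for logging, say) while the client still receives it incrementally. Rejecting malformed histories early avoids spending an API call on a request OpenAI would refuse anyway.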


Clone this repository to your local machine.

git clone

Install dependencies using npm:

npm install

Set up environment variables by creating a .env file at the root of the project and adding the following variables:


Replace PATH_TO_YOUR_GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY with the path to your Google Cloud service account key file. The Speech-to-Text API must be enabled for the Google Cloud project that the service account belongs to.

Run the app:

npm start

