I turned the most evil thing in the universe into a lightswitch.
I did not want Google or Amazon listening to me, so I created my own passive-aggressive voice assistant based on the main villain from Portal 2 to control my smart home.
You can now test out the GLaDOS Voice Generator TTS module.
This is an ongoing DIY project by a Portal fan, not a commercially available product. The code is distributed as-is, as an example for others.
About the DIY voice assistant project
I wanted to control my smart home by voice, but having Amazon or Google devices around creeped me out, because their devices and AI are black boxes. I have no idea what they are doing in the background, and it’s out of my control.
So I created my own voice assistant that runs locally, using Python and a potato running Ubuntu. This project is still ongoing, and this is a short overview of what I’ve done so far. You can follow me (@nerdaxic) on Twitter, where I post updates. This project seems to have turned into functional art.
Why GLaDOS as a voice assistant?
Mostly because I found GLaDOS funny. The character’s voice is recognizable and relatively easy to generate with a female TTS engine and Melodyne.
In addition, GLaDOS is a well-written character in the Portal game series, with a good backstory and personality. This matters when writing the assistant's responses, so that they stay in character.
- Local trigger word detection using PocketSphinx
- Speech-to-text processing using Google’s API (for now)
- GLaDOS Text-to-Speech generation using locally hosted TTS
- Animatronic eye control using servos
- Round LCD in the eye to display textures
GLaDOS Text-to-speech engine
The voice assistant uses a locally hosted TTS API to generate the voice samples.
The open-source neural TTS runs fast enough that the voice samples do not need to be cached.
If you wish to try it out, you can test my public GLaDOS Voice Generator.
Works with Home Assistant
The device communicates locally with the Home Assistant server to speak notifications aloud, read sensor data, and control lights, scenes and devices. For example:
- “Turn on living room ac”
- “When is the sauna ready”
- “What is the bedroom humidity”
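For anyone building something similar, commands like these map naturally onto Home Assistant's REST API: a POST to `/api/services/<domain>/<service>` runs an action, and a GET on `/api/states/<entity_id>` reads a sensor, both authenticated with a long-lived access token. The sketch below assumes that API; the host name, token and entity IDs (`climate.living_room_ac`, `sensor.bedroom_humidity`) are hypothetical placeholders, and this is not necessarily how the project itself talks to Home Assistant.

```python
import json
import urllib.request

# Hypothetical connection details; 8123 is Home Assistant's default port
HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def _headers(token):
    # Home Assistant expects a bearer token and a JSON body
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def service_request(base_url, token, domain, service, entity_id):
    """Build a POST to /api/services/<domain>/<service> targeting one entity."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/services/{domain}/{service}",
        data=body, headers=_headers(token), method="POST")


def state_request(base_url, token, entity_id):
    """Build a GET to /api/states/<entity_id> to read an entity's current state."""
    return urllib.request.Request(
        f"{base_url}/api/states/{entity_id}", headers=_headers(token), method="GET")


def send(request, timeout=5):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage might look like `send(service_request(HA_URL, HA_TOKEN, "climate", "turn_on", "climate.living_room_ac"))` for the AC, or `send(state_request(HA_URL, HA_TOKEN, "sensor.bedroom_humidity"))["state"]` for the humidity question. Building the requests separately from sending them keeps the URL logic testable without a live server.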
Infrared face recognition
There is a Raspberry Pi NoIR camera built into the bezel of the eye.
The idea is to mount the head on a neck mechanism and use the face coordinates from the camera to turn the head towards you when you speak to GLaDOS. Even in the dark!
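One simple way to turn face coordinates into head movement is proportional control: measure how far the face centre sits from the middle of the frame, convert that to degrees using the camera's field of view, and move the pan servo part of the way there on each frame. This is a sketch of that idea, not the planned implementation; the `(x, y, w, h)` box format, the 62.2° horizontal field of view (the v2 camera module's figure) and the gain are all assumptions, and any face detector (for example OpenCV's Haar cascades) could supply the box.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def pan_angle_for_face(face_box, frame_width, current_angle,
                       fov_deg=62.2, gain=0.5, min_angle=0.0, max_angle=180.0):
    """Return a new pan angle that nudges the head towards the detected face.

    face_box: (x, y, w, h) in pixels (assumed detector output format).
    fov_deg: assumed horizontal field of view of the camera.
    gain: fraction of the error corrected per frame; below 1 smooths motion.
    """
    x, _, w, _ = face_box
    half_width = frame_width / 2
    # Horizontal offset of the face centre, normalised to the range [-1, 1]
    offset = ((x + w / 2) - half_width) / half_width
    # Convert to degrees; flip the sign if the servo is mounted the other way round
    error_deg = offset * fov_deg / 2
    # Move part of the way and keep within the servo's travel
    return clamp(current_angle + gain * error_deg, min_angle, max_angle)
```

Correcting only a fraction of the error each frame avoids overshooting when detection is noisy, which matters more with an IR image in the dark.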
The large animatronic eye can move in and out, and open and close its eyelids.
The eye has a round LCD and a NeoPixel ring to provide light and texture.
What can it do?
| Basic functionality | Novelty | Home Assistant |
|---|---|---|
| Tell the current time | Magic 8 ball answers | Control lights and scenes |
| Timers | Lore-friendly jokes | Control devices and AC |
| Add things to shopping list | Answer pleasantries | Tell the temperature / humidity |
| Current weather | Move eye around | Speak notifications aloud |
| Weekly weather forecast | IR camera for face recognition | Estimate when the sauna is ready |
Skills in development
- Local speech recognition for added data security
- Wolfram Alpha integration for math and numeric data, e.g. “How much sodium in 25 grams of salt” or “What was the oil price in 1972”
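One possible route for the Wolfram Alpha skill is the Spoken Results API, which returns a single sentence intended to be read aloud, so the answer could go straight to the TTS engine. The sketch below assumes that API and a placeholder App ID; since the skill is still in development, this is an illustration rather than the project's code.

```python
import urllib.error
import urllib.parse
import urllib.request

WOLFRAM_APP_ID = "YOUR_APP_ID"  # placeholder; requires a Wolfram Alpha developer App ID


def spoken_result_url(query, app_id):
    """Build a Spoken Results API URL for a natural-language query."""
    params = urllib.parse.urlencode({"appid": app_id, "i": query})
    return f"https://api.wolframalpha.com/v1/spoken?{params}"


def ask_wolfram(query, app_id=WOLFRAM_APP_ID, timeout=10):
    """Return a spoken-style answer, or None if the API cannot answer."""
    try:
        with urllib.request.urlopen(spoken_result_url(query, app_id), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError:
        # The API responds with an HTTP error when it cannot interpret the query
        return None
```

For example, `ask_wolfram("How much sodium in 25 grams of salt")` would return a plain-text sentence, or `None`, which the assistant could turn into an in-character "I don't know" reply.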
GLaDOS Animatronic Hardware
The animatronic GLaDOS body is mostly 3D-printed, based on YouTuber Mr. Volt’s open-source design.
Check out his original video on YouTube.
The main difference between our projects is that Mr. Volt used an Amazon Echo as the voice assistant, whereas I built my own. His published design was not complete, so I had to redesign some of the internal mechanics myself.
| Component | Part |
|---|---|
| Power supply for digital + audio | Raspberry Pi 15W USB-C power supply |
| Camera | Raspberry Pi NoIR camera |
| Microcontroller | Teensy 4, to control the eye LCD and NeoPixels |
| Eye lights | Adafruit NeoPixel Diffused 5mm Through-Hole, for the “REC” light |
| Eye lights | Adafruit 16 x 5050 NeoPixel Ring |
| Eye LCD | 1.28-inch round TFT LCD display module, GC9A01 driver, SPI interface, 240 x 240 |
Adafruit stereo amplifier
GLaDOS has two paper-cone speakers built into her ears, giving her stereo sound.
They are powered by a 3.7W Class D amplifier: audio in, audio out, with the volume set by a jumper.
ReSpeaker Mic Array
A far-field microphone array capable of picking up voices up to 5 m away. It detects the direction of sound and improves audio quality with onboard processing.
| Component | Part |
|---|---|
| Audio amplifier | Adafruit Stereo 3.7W Class D Audio Amplifier |
| Speakers | Visaton FRS 7 |
| Microphone & audio interface | ReSpeaker Mic Array V2.0 |
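Because the mic array doubles as the audio interface, the speech-to-text step has to open the right input device rather than whatever the system default happens to be. A minimal sketch using the `speech_recognition` library's device listing; it assumes the array's device name contains "ReSpeaker", which may differ on your system, and it is not taken from the project's code.

```python
def find_device_index(names, keyword="respeaker"):
    """Return the index of the first audio device whose name contains `keyword`, or None."""
    for index, name in enumerate(names):
        if keyword.lower() in name.lower():
            return index
    return None


def open_array_microphone(keyword="respeaker", sample_rate=16000):
    """Open the mic array via speech_recognition, falling back to the default input."""
    import speech_recognition as sr  # imported lazily so find_device_index works without it

    index = find_device_index(sr.Microphone.list_microphone_names(), keyword)
    # device_index=None makes speech_recognition use the system default input device
    return sr.Microphone(device_index=index, sample_rate=sample_rate)
```

The returned object can be used wherever a `sr.Microphone()` would be, e.g. `with open_array_microphone() as source: ...`.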
The mechanics run from their own power supply, which gives the servos more power and prevents brown-outs.
| Component | Part |
|---|---|
| Power supply | MeanWell LRS-50-5 5V |
| Servo controller | Pololu Micro Maestro |
| Servo: eye movement | 35 kg DS3235 (180° control angle) |
| Servo: eyelids | 25 kg DS3225 (180° control angle) |
| Screws | Various M3 and M4 screws |
| Jumper wires | 0.32 mm²/22 AWG assortment |
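The Micro Maestro can be driven over its USB serial port using Pololu's compact protocol, where a Set Target command is the byte 0x84, the channel number, and the target pulse width in quarter-microseconds split into two 7-bit bytes. The sketch below assumes a 500 to 2500 µs pulse range for the 180° DS32xx servos and `/dev/ttyACM0` as the Maestro's command port; both are assumptions to verify, and this is not necessarily how the project moves its servos.

```python
def angle_to_target(angle_deg, min_us=500.0, max_us=2500.0, max_angle=180.0):
    """Map a servo angle to a Maestro target in quarter-microseconds (assumed pulse range)."""
    angle = max(0.0, min(max_angle, angle_deg))
    pulse_us = min_us + (max_us - min_us) * angle / max_angle
    return round(pulse_us * 4)


def set_target_command(channel, target):
    """Compact-protocol Set Target: 0x84, channel, low 7 bits, high 7 bits of the target."""
    return bytes([0x84, channel, target & 0x7F, (target >> 7) & 0x7F])


def send_angle(port, channel, angle_deg):
    """Write one Set Target command to the Maestro's serial command port."""
    import serial  # pyserial; imported lazily so the helpers above work without it

    with serial.Serial(port, 9600, timeout=1) as link:
        link.write(set_target_command(channel, angle_to_target(angle_deg)))
```

For example, `send_angle("/dev/ttyACM0", 0, 90)` would centre a servo on channel 0. The Maestro limits targets to each channel's configured minimum and maximum, so those settings (in the Maestro Control Center) need to cover the servo's full pulse range.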
Software & AI-Voice Assistant pipeline
The software consists of a few main parts, built into the glados.py and gladosTTS.py files.
1) Keyword detection with Pocketsphinx LiveSpeech
The main script hooks into the microphone plugged into the Raspberry Pi and listens constantly for the selected trigger word. When PocketSphinx detects the trigger word, control passes to the speech-to-text engine.
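A minimal sketch of that wake-word loop, assuming the older `pocketsphinx-python` (0.1.x) package, whose `LiveSpeech` supports keyword spotting with `lm=False`, `keyphrase` and `kws_threshold`. The trigger word `computer` is a hypothetical placeholder: a custom word such as "glados" would need its own pronunciation dictionary entry, and the threshold usually needs tuning. The matching logic takes any iterable of phrases, so it can be tested without a microphone.

```python
def wait_for_trigger(phrases, trigger):
    """Consume recognised phrases until one contains the trigger word; return it, or None."""
    trigger = trigger.lower()
    for phrase in phrases:
        text = str(phrase).strip().lower()
        if trigger in text.split():
            return text
    return None


def live_keyword_phrases(trigger, threshold=1e-20):
    """Return an iterable of keyword hits from the default microphone."""
    # Assumes the pocketsphinx-python 0.1.x API; imported lazily so the matcher works without it
    from pocketsphinx import LiveSpeech

    return LiveSpeech(lm=False, keyphrase=trigger, kws_threshold=threshold)


def main():
    trigger = "computer"  # hypothetical trigger word; must exist in the pronunciation dictionary
    hit = wait_for_trigger(live_keyword_phrases(trigger), trigger)
    print("Trigger word heard:", hit)
    # The next step would hand over to the speech-to-text engine
```

Calling `main()` blocks until the keyword is spotted, after which the assistant can start recording the actual command.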
2) Python speech_recognition library
Once the trigger word is detected, speech_recognition starts listening for input from the microphone. After a short timeout, the recorded audio is passed to a recognizer built into the library. It sends the audio to Google for speech-to-text recognition, and the service returns the spoken input as text.
This is not optimal, and I’m looking for a reliable solution that works locally. With this setup, though, I know what is being sent to Google, and it is not tied to any Google account.
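A sketch of this step with the `speech_recognition` library's `Recognizer.listen` and `recognize_google`, assuming default microphone input. The `timeout` and `phrase_time_limit` values are illustrative guesses, not the project's settings, and the normalisation helper is a hypothetical addition that makes the later keyword matching more predictable.

```python
def normalize_command(text):
    """Lower-case the transcript and collapse whitespace so keyword matching is predictable."""
    return " ".join(text.lower().split())


def listen_for_command(timeout=5, phrase_time_limit=8):
    """Record one utterance and return it as normalised text, or None on failure."""
    import speech_recognition as sr  # imported lazily so normalize_command works without it

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold to the room noise before listening
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None  # nobody started speaking within `timeout` seconds
    try:
        # Sends the recorded audio to Google's web speech service
        return normalize_command(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        return None  # audio was captured but could not be understood
    except sr.RequestError as err:
        print("Speech recognition service unavailable:", err)
        return None
```

Returning `None` for every failure case lets the caller answer with a single in-character error line instead of handling each exception type.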
3) Natural language processing & voice assistant skills
Once the command text comes back from the speech recognition API, it is parsed. Currently this is done with a long if-else statement:
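```python
import logging


def process_command(command):
    # Keywords are checked in order; the first matching branch wins
    if 'cancel' in command:
        return
    elif 'time' in command:
        readTime()
    elif 'turn on' in command and 'daylight' in command:
        activateScene("scene.daylight")
    else:
        # Tell the user about an error and log the failed command (only logging shown here)
        logging.warning("Unrecognized command: %s", command)
```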
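As the number of skills grows, the same first-match-wins logic could be expressed as a table of keyword rules instead of branches. This is a sketch of one alternative, not the project's code; the handlers are stand-in lambdas, and like the original it uses plain substring tests, so "time" would also match "sometimes".

```python
def make_dispatcher(rules, fallback):
    """Build a command dispatcher.

    rules: list of (keywords, handler) pairs; the first rule whose keywords
    all appear in the command wins. fallback handles unmatched commands.
    """
    def dispatch(command):
        text = command.lower()
        for keywords, handler in rules:
            if all(keyword in text for keyword in keywords):
                return handler(text)
        return fallback(text)

    return dispatch


# Stand-in handlers mirroring the branches of the if-else chain
rules = [
    (("cancel",), lambda c: None),
    (("time",), lambda c: "time"),
    (("turn on", "daylight"), lambda c: "scene.daylight"),
]
dispatch = make_dispatcher(rules, fallback=lambda c: "unknown")
```

Adding a skill then means appending one `(keywords, handler)` pair, and rule order still controls priority just as branch order does in the if-else version.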
4) GLaDOS Text-to-Speech
In the Portal games, GLaDOS has a distinctive, calm female voice. The original voice was created by recording opera singer Ellen McLain speaking the lines and running the audio through an autotune process that throws off the intonation, making her sound more robotic.
Local GLaDOS Text-to-speech (TTS) Voice Generator
I have moved to a neural-network-based GLaDOS TTS engine that generates the GLaDOS voice locally, on the same computer that runs the voice assistant. This makes the whole process more robust and secure. Longer responses such as weather forecasts take about 10 seconds to generate on an old laptop.
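Since the engine is served as a local HTTP API, the assistant only needs a small client. The sketch below assumes a server that returns WAV audio for text placed in the URL path; the `http://localhost:8124/synthesize/` address is a hypothetical placeholder, so check the TTS engine's own code for the actual route and port.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: adjust host, port and path to whatever your TTS server exposes
TTS_BASE_URL = "http://localhost:8124/synthesize/"


def tts_url(text, base_url=TTS_BASE_URL):
    # Percent-encode the text so spaces and punctuation survive in the URL path
    return base_url + urllib.parse.quote(text)


def synthesize(text, out_path, base_url=TTS_BASE_URL, timeout=60):
    """Fetch audio for `text` from the local TTS server and save it to `out_path`."""
    with urllib.request.urlopen(tts_url(text, base_url), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

The saved file can then be played with any player, for example `aplay` on Linux. The generous timeout reflects that longer responses can take several seconds to generate.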
Online GLaDOS Voice Generator
I used to use glados.c-net.org to generate the voice samples over its API. The service uses a more traditional female TTS voice, which is then pushed through Melodyne, but the process is really slow. With the queue, it can take half an hour to generate a simple voice sample!
This is why I created a local cache of TTS audio samples and broke sentences down into short segments. If the script finds a sample on the memory card, it plays it; if not, it generates it.
Most of the tasks and responses are repetitive, like time, weather and shopping list items, so the device rarely needs to generate new samples.
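The cache-then-generate idea can be sketched in a few lines: split a response into sentence-sized segments, derive a stable file name from each segment's text, and only call the (slow) generator on a cache miss. This is an illustration, not the project's implementation; the SHA-1 file naming, the `.wav` extension and the `generate(text, path)` callback are assumptions.

```python
import hashlib
import os
import re


def split_segments(text):
    """Split a response into sentence-sized segments at . ! ? boundaries."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def cache_path(text, cache_dir):
    # Hash the exact text so every segment maps to a safe, fixed-length file name
    key = hashlib.sha1(text.strip().encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, key + ".wav")


def get_sample(text, cache_dir, generate):
    """Return the cached audio path for `text`, calling generate(text, path) on a miss."""
    path = cache_path(text, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        generate(text, path)
    return path
```

Short segments raise the hit rate: a time announcement split into a fixed phrase plus a number reuses the fixed part every time. The simple splitter would also break after abbreviations such as "Mr.", which is harmless for caching but worth knowing.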
Free and open-source (FOSS) Python voice assistant
You can download the code for free from GitHub.
You can use it for reference or build your own system for free!
GLaDOS Voice Assistant:
Old version GLaDOS Voice Assistant that works on Raspberry Pi:
AI-based stand-alone GLaDOS Text-to-speech engine:
Mr. Volt’s CAD-files: