
Mr. Bones

Animatronic Halloween Friend

Mr. Bones intro video

Motivation

After completing GUNTHER, my roommate and I wanted another project that would challenge us and still produce a comical result. It was September 28th, and with Halloween close enough, we decided to build something on theme. That's when Mr. Bones was born: a skeleton that would look around and converse with anybody on Halloween.

Implementation

Mr. Bones was implemented as a distributed system of four specialized processes running on a Raspberry Pi, communicating through POSIX message queues and pipes to enable real-time conversation and physical animation.
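For context, each command channel is a POSIX message queue created with mq_open. A minimal sketch of how a controller might open its queue follows; the queue names come from the controllers described below, but the attribute values are my assumptions:

```c
#include <fcntl.h>   /* O_* constants */
#include <mqueue.h>  /* mq_open; link with -lrt on Linux */
#include <stdio.h>

/* Open (or create) a one-byte command queue such as /jaw-control or
 * /neck-control. The attribute values here are illustrative. */
static mqd_t open_command_queue(const char *name)
{
    struct mq_attr attr = {
        .mq_maxmsg  = 8,  /* small backlog of pending commands */
        .mq_msgsize = 1,  /* the protocol uses single-byte commands */
    };
    mqd_t q = mq_open(name, O_CREAT | O_RDONLY, 0644, &attr);
    if (q == (mqd_t)-1)
        perror("mq_open");
    return q;
}
```

A sender (here, the speech process) would open the same names with O_WRONLY to issue commands.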

System Architecture

The system is composed of four independent processes that work in concert:

Jaw Controller

This process handles the physical actuation of the animatronic's jaw. It opens a message queue at /jaw-control and listens for a simple binary protocol, sketched below:

  • 0x01 → Start talking (jaw movement)
  • 0x00 → Stop talking (jaw at rest)
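A minimal sketch of that loop, assuming the queue was opened as above and that the actual servo driving happens elsewhere (the `talking` flag and its consumer are illustrative):

```c
#include <mqueue.h>
#include <stdio.h>

/* Set by the command loop, read by whatever actually drives the jaw
 * servo (a timer or thread, not shown here). */
static volatile int talking = 0;

static void jaw_command_loop(mqd_t q)
{
    char cmd;
    for (;;) {
        /* Block until the speech process sends a one-byte command. */
        if (mq_receive(q, &cmd, 1, NULL) == -1) {
            perror("mq_receive");
            continue;
        }
        talking = (cmd == 0x01);  /* 0x01 = start talking, 0x00 = stop */
    }
}
```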

Neck Controller

Responsible for managing the neck motion states, this process listens on a message queue at /neck-control and responds to the following commands (a control-loop sketch follows the list):

  • 0x00 → Idle: Look around to random angles with noise every 2-8 seconds
  • 0x01 → Listening: Look straight forward with slight movement every 5 seconds
  • 0x02 → Thinking: Look upward in thought with subtle head movements
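One way to structure that controller is a single loop that waits for a command but times out on the current state's movement interval. The use of mq_timedreceive and the servo helper names below are my assumptions, not a description of the actual code:

```c
#include <mqueue.h>
#include <stdlib.h>
#include <time.h>

enum neck_state { IDLE = 0x00, LISTENING = 0x01, THINKING = 0x02 };

/* Hypothetical pan/tilt helpers; the real servo code is not shown here. */
void neck_look_random(void);   /* idle: glance to a random angle */
void neck_look_forward(void);  /* listening: face the speaker */
void neck_look_up(void);       /* thinking: tilt upward slightly */

void neck_control_loop(mqd_t q)
{
    enum neck_state state = IDLE;
    for (;;) {
        /* Wait for the next command, but give up after the current
         * state's movement interval so the head keeps moving. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += (state == IDLE) ? 2 + rand() % 7 : 5;  /* 2-8 s idle */

        char cmd;
        if (mq_timedreceive(q, &cmd, 1, NULL, &deadline) >= 0) {
            state = (enum neck_state)cmd;  /* command arrived: switch state */
            continue;
        }
        switch (state) {  /* timed out: perform this state's motion */
        case IDLE:      neck_look_random();  break;
        case LISTENING: neck_look_forward(); break;
        case THINKING:  neck_look_up();      break;
        }
    }
}
```

Structured this way, the receive timeout doubles as the movement timer, so one thread both reacts to commands immediately and keeps the head in motion.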

Speech Process

This is the central coordination process that uses Vosk for speech-to-text recognition. It manages a state machine with two primary states (sketched after the list):

  • Wake Word Detection: Listens continuously for "Hey Mr Bones"
  • Conversation Mode: Captures user input after wake word detection
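A sketch of that two-state loop against Vosk's C API; the audio-capture helper, model path, and naive wake-word matching are assumptions for illustration:

```c
#include <string.h>
#include <vosk_api.h>  /* link with -lvosk */

/* Hypothetical microphone helper: fills buf with 16 kHz mono PCM,
 * returns the number of bytes read. */
int read_audio_chunk(char *buf, int cap);

enum mode { WAIT_FOR_WAKE_WORD, CONVERSATION };

void speech_loop(void)
{
    VoskModel *model = vosk_model_new("vosk-model");  /* path is illustrative */
    VoskRecognizer *rec = vosk_recognizer_new(model, 16000.0f);
    enum mode mode = WAIT_FOR_WAKE_WORD;
    char buf[4096];

    for (;;) {
        int n = read_audio_chunk(buf, sizeof buf);
        /* Returns nonzero when an utterance has ended and a final
         * result is available. */
        if (vosk_recognizer_accept_waveform(rec, buf, n)) {
            const char *result = vosk_recognizer_result(rec);  /* JSON: {"text": ...} */
            if (mode == WAIT_FOR_WAKE_WORD) {
                if (strstr(result, "hey mr bones"))  /* naive wake-word check */
                    mode = CONVERSATION;
            } else {
                /* Forward the utterance to the LLM and speak the reply
                 * (see the next sketch), then go back to waiting. */
                mode = WAIT_FOR_WAKE_WORD;
            }
        }
    }
}
```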

When speech is detected, it sends the text to the LLM process through a pipe and coordinates the physical responses by sending the appropriate commands to the jaw and neck controllers. When the LLM responds, the speech process synthesizes the reply with Piper and plays the audio through the speakers.
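Putting those pieces together, one conversation turn might look like the following sketch; the line-based reply framing, the Piper voice, and the exact playback pipeline are all assumptions:

```c
#include <mqueue.h>
#include <stdio.h>

/* One conversation turn. llm_in/llm_out are the pipe ends connected to
 * the LLM process's stdin/stdout; the queues match the controllers above. */
void converse_once(const char *user_text, FILE *llm_in, FILE *llm_out,
                   mqd_t jaw_q, mqd_t neck_q)
{
    char cmd;

    cmd = 0x02; mq_send(neck_q, &cmd, 1, 0);  /* neck: thinking pose */
    fprintf(llm_in, "%s\n", user_text);       /* hand the utterance to the LLM */
    fflush(llm_in);

    char reply[4096];
    if (!fgets(reply, sizeof reply, llm_out)) /* line-based framing is an assumption */
        return;

    cmd = 0x01; mq_send(jaw_q, &cmd, 1, 0);   /* jaw: start talking */

    /* Piper invocation is illustrative; model and flags may differ. */
    FILE *tts = popen("piper --model en_US-lessac-medium.onnx --output-raw"
                      " | aplay -r 22050 -f S16_LE -t raw -", "w");
    if (tts) {
        fputs(reply, tts);
        pclose(tts);                          /* blocks until playback finishes */
    }

    cmd = 0x00; mq_send(jaw_q, &cmd, 1, 0);   /* jaw: back to rest */
    cmd = 0x00; mq_send(neck_q, &cmd, 1, 0);  /* neck: back to idle */
}
```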

LLM Backend

Built using llama.cpp with calme-3.1-llamaloi-3b-Q6_K.gguf as the backing pretrained model, this process listens for conversation input on stdin (redirected from the speech process via pipes) and streams responses token-by-token through stdout. The model was chosen for its balance of performance and resource requirements, a good fit for the Raspberry Pi.
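For illustration, here is one way the speech process could spawn that child with its stdin and stdout redirected over pipes; the llama.cpp binary name and flags vary by version and are assumptions here:

```c
#include <stdio.h>
#include <unistd.h>

/* Spawn the LLM child with its stdin/stdout redirected over pipes.
 * On success, *to_llm writes prompts to the model and *from_llm reads
 * its streamed tokens. */
int spawn_llm(FILE **to_llm, FILE **from_llm)
{
    int in_pipe[2], out_pipe[2];
    if (pipe(in_pipe) == -1 || pipe(out_pipe) == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                        /* child: become the LLM process */
        dup2(in_pipe[0], STDIN_FILENO);    /* read prompts from the pipe */
        dup2(out_pipe[1], STDOUT_FILENO);  /* stream tokens back out */
        close(in_pipe[1]); close(out_pipe[0]);
        /* Binary name and flags are illustrative and vary by version. */
        execlp("llama-cli", "llama-cli",
               "-m", "calme-3.1-llamaloi-3b-Q6_K.gguf",
               "--interactive", (char *)NULL);
        _exit(127);                        /* exec failed */
    }
    close(in_pipe[0]); close(out_pipe[1]); /* parent keeps the other ends */
    *to_llm   = fdopen(in_pipe[1], "w");
    *from_llm = fdopen(out_pipe[0], "r");
    return 0;
}
```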

Performance Optimization

To keep conversation latency low, the system leans on a few optimizations: the LLM is a small quantized model (a 3B-parameter Q6_K build) that fits comfortably on the Pi, its responses are streamed token-by-token rather than waiting on a full completion, and everything (speech recognition, synthesis, and generation) runs locally, with no network round trips.

The complete system successfully ran on Halloween, engaging in real-time conversations with my friends and family. It was pretty awesome :)