New Code Repo

Recently, we’ve had the chance to rewrite the bot code. Over time, the old ROS codebase had become progressively more difficult to work with. Keeping the realtime code and the training code in two different repos was a liability (it’s hard to keep the vector spaces in sync when the code lives in two places and in two different languages). Finally, adding more training data was more likely to cause weird NaN errors than to improve robot performance.

RobotAI on GitHub

There are some nice new features based on learnings in the rewrite:

  • Cereal message system from comma.ai, based on Cap’n Proto.
  • MsgVec - a single C++ class responsible for translating between messages and vectors (see the sketch after this list).
    • It takes a stream of input messages and builds a float vector for feeding into an ONNX graph.
    • It takes an action vector from your RL output and converts it back into messages to send out and control your robot.
    • It works the same whether you are feeding live messages or replaying a log.
    • Cython bindings let you use the code from Python.
  • Companion web service
    • Automatically upload log files after the robot runs
    • Automatically download ONNX files, convert to TensorRT, validate results match between host and device
    • Download and view logs and metadata
  • On-device video compression, based on NVIDIA libraries, allows higher-resolution recordings while saving space.
  • Unit test all the things: the new code is quite paranoid about making sure the inputs and outputs match between training and runtime. Any weirdness raises an exception so that you can debug, rather than living with silent failures.
  • SAC reinforcement learning based on Stable Baselines 3
  • Vision model based on an unmodified YOLOv7
  • Reward calculations are a separate, fully contained ONNX graph
  • Better config and caching to keep iteration speed fast
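
To make the MsgVec idea concrete, here is a hypothetical sketch of the flow in Python. The class, method, and field names are illustrative only, not the actual robotai API:

    import numpy as np

    # Illustrative stand-in for the real Cython-bound MsgVec class.
    class MsgVecSketch:
        def __init__(self, obs_fields, act_fields):
            self.obs_fields = obs_fields   # e.g. ["headPitch", "voltage"]
            self.act_fields = act_fields   # e.g. ["leftWheel", "rightWheel"]
            self.obs = np.zeros(len(obs_fields), dtype=np.float32)

        def input(self, msg):
            # Fold one message into the observation vector; the same code
            # path handles live messages and log replay.
            for i, field in enumerate(self.obs_fields):
                if field in msg:
                    self.obs[i] = msg[field]

        def obs_vector(self):
            # Float vector, ready to feed into the ONNX graph.
            return self.obs.copy()

        def actions_to_messages(self, action):
            # Convert the RL action vector back into outgoing control messages.
            return [{"type": f, "value": float(v)}
                    for f, v in zip(self.act_fields, action)]

    mv = MsgVecSketch(["headPitch", "voltage"], ["leftWheel", "rightWheel"])
    mv.input({"headPitch": 0.1, "voltage": 13.2})   # live or replayed
    obs = mv.obs_vector()                           # -> ONNX graph
    cmds = mv.actions_to_messages([0.5, -0.5])      # <- RL output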

Basic Obstacle Avoidance

Here we see our little GoodDog automatically avoiding an obstacle. I’ll be honest: it only avoids the obstacle cleanly about 25% of the time; another 25% of the time it moves in a direction that could potentially avoid the collision, and the rest of the time it simply fails to avoid it.

Still, this is impressive, as the behavior is trained purely end-to-end on around 6 hours of real-world robot data. The reward function gives a reward signal when detectable objects (from a YOLOv5 network) are in view, with a bonus for looking at humans. Additionally, as the human, you can press a button on your phone while the robot is training to send an extra reward or punishment signal; we often do this when the robot has gotten stuck on a chair or in a corner.
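
As a rough illustration, that reward shape might look something like the following. The real reward lives in its own ONNX graph; the weights and names here are assumptions:

    PERSON = 0  # COCO class id for "person"

    def reward(detections, button_signal=0.0):
        # detections: (class_id, confidence) pairs above some threshold
        r = 0.0
        if detections:
            r += 1.0                                   # something detectable in view
        if any(cls == PERSON for cls, _ in detections):
            r += 2.0                                   # assumed bonus for humans
        return r + button_signal                       # +/- from the phone button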

(Video from Feb 2nd, using the usual-paper-495 checkpoint)

Charging Station

Bumble testing out his new charging dock

We’ve built a new charging dock for the robot, with Taras H’s help!

The internals are pretty simple: it’s just an enclosure for a standard 4S LiFePO4 battery charger, plus some spring contacts and plates added to the underside of the robot.

The next plan is to modify the robot’s reward function to reward finding the charging dock when the battery is low, and then to have the robot dock itself.
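
One possible shape for that reward term, purely as a hypothetical sketch (the threshold and weights are made up, not a decided design):

    LOW_VOLTS = 12.8  # roughly "low" for a 4S LiFePO4 pack; assumed threshold

    def docking_bonus(pack_volts, dock_in_view, on_charger):
        if pack_volts >= LOW_VOLTS:
            return 0.0                       # only shape behavior when battery is low
        if on_charger:
            return 10.0                      # large reward for a successful dock
        return 1.0 if dock_in_view else 0.0  # small bonus for finding the dock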

Charging dock Fusion 360 models

GoodDog Bumble v0.2

We just got Bumble v0.2 operational; a big thanks to Taras H. for the mechanical design here!

As you can see in the video, the motion of the bot, especially the head, is extremely smooth. We decided to drop the Dynamixel servos that were in version 0.1 and instead go with brushless gimbal motors of the kind you’d typically find stabilizing the camera on a drone.

In real life, it’s almost surreal watching the robot operate at this speed. The head moves smoothly and completely silently, and the main drive motors are also very quiet. This means the robot should end up being low maintenance and should stand up to lots of repetitive motion without degradation.

It’s unfortunately not a cheap robot, but the total BOM cost is just under $2,000 USD (the most expensive part is the Jetson AGX Xavier dev kit). A comparable robot built on something like a Segway platform would probably run $5k-$10k. A GoodDog Bumble is something you can build with just basic tools and a generic 3D printer.

All of the mechanical design files, 3D printing files, and provisioning instructions are available below:

Open Sourcing

The code for GoodDog.ai is now fully open source.

rossac - SAC training from ROS Bag files

bumble - Run ONNX models in ROS


Project Log:

  • Ability to keep an arbitrary-length history buffer, so the RL network can see events from the past.
  • Currently trying just the last 4 to 10 entries (a few seconds’ worth) of history, but RNNs are also possible with this approach (see the sketch after this list).
  • New YOLOv5 checkpoints with higher accuracy and fewer custom steps needed to update.
  • Ability to run inference on YOLO’s ONNX export on the GPU during offline training.
  • Progress on the “rewardbutton” Bluetooth connection plus Android app.
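
For the history buffer, a minimal sketch of the idea (the real layout in rossac may differ):

    from collections import deque
    import numpy as np

    class ObsHistory:
        def __init__(self, obs_dim, history_len=4):
            zero = np.zeros(obs_dim, dtype=np.float32)
            self.buf = deque([zero] * history_len, maxlen=history_len)

        def push(self, obs):
            self.buf.append(np.asarray(obs, dtype=np.float32))

        def network_input(self):
            # Newest entry first, concatenated into one flat vector.
            return np.concatenate(list(self.buf)[::-1])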

Robot following indoors

Here’s a short video of our latest checkpoint. We’ve previously run into problems where the robot would learn only very limited behaviors within one training checkpoint. For example, you’d train a single checkpoint and its behavior would collapse: it would only drive backwards, or only turn left, or only follow a human in an outdoor environment (with fewer cluttered objects). This checkpoint (sac-peachy-resonance-379-21504) is the first to show a variety of behaviors depending on the situation.

Project Log:

  • Reduced everything down to a 2 Hz update rate, so that we don’t rattle the pan-tilt module to pieces.
  • Fixed a bug with dropout being too high in training. (We can now set dropout for different parts of the observation space independently; see the sketch after this list.)
  • Discovered some LR schedules and rates which help prevent Q-function collapse.
  • Up next: being able to pass an N-element observation history to the network, so it can start to gain a memory.
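
Here is a sketch of what per-group dropout over the observation vector can look like; the split point and rates are illustrative, not the values we use:

    import torch
    import torch.nn as nn

    class GroupedDropout(nn.Module):
        def __init__(self, vision_dim, p_vision=0.5, p_proprio=0.1):
            super().__init__()
            self.vision_dim = vision_dim
            self.drop_vision = nn.Dropout(p_vision)    # YOLO feature slice
            self.drop_proprio = nn.Dropout(p_proprio)  # odometry, pan/tilt, etc.

        def forward(self, obs):
            vision = obs[..., :self.vision_dim]
            proprio = obs[..., self.vision_dim:]
            return torch.cat([self.drop_vision(vision),
                              self.drop_proprio(proprio)], dim=-1)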

Following humans

In this video, the robot has learned to keep a human “in-frame” by turning the wheels left and right. As I move to the left and right, it will slowly turn the motors to keep me centered. This behavior is trained fully end-to-end: the camera input goes to a pre-trained vanilla YOLOv5s network, and one of the penultimate layers is then fed into a small MLP, trained with SAC, that predicts motor speeds and pan-tilt angles for the head.
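
Conceptually, the control path looks something like this; the dimensions are assumptions, not the actual network sizes:

    import torch
    import torch.nn as nn

    FEAT_DIM = 1024  # assumed size of the YOLOv5s penultimate features
    ACT_DIM = 4      # wheel speeds plus pan/tilt angles

    policy = nn.Sequential(                  # small MLP head, trained with SAC
        nn.Linear(FEAT_DIM, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, ACT_DIM), nn.Tanh(),  # squash actions to [-1, 1]
    )

    yolo_feats = torch.randn(1, FEAT_DIM)    # stand-in for frozen YOLO features
    action = policy(yolo_feats)              # -> motors and pan/tilt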

One thing we noticed: the training data all came from the backyard. When running the same checkpoint indoors, the robot basically sits still, or rotates in place without much activity. But take it back outside to where it was trained, and the variety of behaviors increases dramatically.

Another downside is that the left/right motion does not occur in all orientations or parts of the backyard; it can be location dependent.

Running the sac-amber-snow-284-05632 checkpoint.

Project Log:

  • Fixed a bug in calculating the reward function from the YOLO intermediate frame.
  • Sped up training by keeping the replay buffer on the GPU (see the sketch after this list).
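
The GPU replay buffer idea, sketched with assumed sizes (not our exact implementation): every sampled batch stays on-device, skipping the host-to-device copy on each gradient step.

    import torch

    class GpuReplayBuffer:
        def __init__(self, capacity, obs_dim, act_dim, device="cuda"):
            self.obs = torch.zeros(capacity, obs_dim, device=device)
            self.act = torch.zeros(capacity, act_dim, device=device)
            self.rew = torch.zeros(capacity, device=device)
            self.capacity, self.idx, self.full = capacity, 0, False

        def add(self, obs, act, rew):
            self.obs[self.idx] = obs
            self.act[self.idx] = act
            self.rew[self.idx] = rew
            self.idx = (self.idx + 1) % self.capacity
            self.full = self.full or self.idx == 0

        def sample(self, batch_size):
            # Sample indices directly on the GPU; returned tensors are views
            # into device memory, so no CPU round-trip is needed.
            hi = self.capacity if self.full else self.idx
            i = torch.randint(0, hi, (batch_size,), device=self.obs.device)
            return self.obs[i], self.act[i], self.rew[i]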

Next up:

  • Can we train in a “small-offset” simulator?
  • Continue training same checkpoint, potentially with newer bag files
  • Fix bugs with bag recording and message sending reliability

Exploring the lawn

Just a shot of the robot exploring the lawn. Running the sac-13200-bs2048-dropout0.88-normalized_env_newbags checkpoint.

Project Log:

  • Identified and fixed NaNs and large values in the observation/reward buffer (see the sketch after this list).
  • Identified a throttling issue with bag recording, even with reduced camera framerates.
  • Partial success testing the Bluetooth reward/penalty button; decided to move to a phone-app-based design.
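
The kind of sanity check we mean, sketched in Python (the magnitude threshold is an assumption):

    import numpy as np

    def check_finite(name, arr, max_abs=1e4):
        # Turn silent NaN/Inf poisoning into a debuggable exception.
        arr = np.asarray(arr)
        bad = ~np.isfinite(arr)
        if bad.any():
            raise ValueError(f"{name}: NaN/Inf at indices {np.argwhere(bad)[:5]}")
        if np.abs(arr).max() > max_abs:
            raise ValueError(f"{name}: suspiciously large value {np.abs(arr).max():.3g}")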

Introducing GoodDog.ai

Very little work in our field of deep learning has focused on the practical objective of building a synthetic, animal-like consciousness. This lack of effort holds us back from achieving empathetic robots that can interact with their physical and social environments the way animals do, and thus holds us back from having empathy towards these synthetic entities ourselves. Imagine how nice it would be to have a pet “robot” who could give you the attention and love that a puppy or kitten could, all the while knowing that any attention you give it will be rewarded with real growth and learning for both of you.

Despite all the hype around machine learning, and all the results achieved by deep neural networks like GPT-3 and others, why hasn’t anyone worked on this? There has been extensive focus on dry, statistical approaches to text, image, and video understanding. These deep neural networks have now surpassed human ability in many subtasks of daily life, such as object recognition, video segmentation, depth perception, and language skills. But no project has been able to combine these available pieces into an agent that’s as flexible or enjoyable to spend time with as even a pet rodent.

I predict that simple reinforcement learning algorithms, trained with input from the final hidden layers of state-of-the-art perception networks, can achieve rich and complex objectives in the physical world. We know it will not be easy to design the proper reward functions, to gather the vast amount of needed training data, or to more deeply explore this promising space. However, we think that this path represents the most promising one yet towards creating synthetic conscious entities with empathy for the common person.

It is time that we connect the amazing perceptual abilities of networks such as YOLO and EfficientNet to a physical presence in the real world. Let’s build a robot that combines a state-of-the-art vision network (YOLOv5) as the basic input for reinforcement learning (SAC) that can optimize for the objective of exploring and interacting with the real world.

In this light, we are launching GoodDog.ai, an open project whose aim is to use the latest state of the art in machine learning to create a mobile robot capable of passing the Puppy Turing Test. Namely, we are building a synthetic entity capable of interacting with its surroundings in a way that evokes sustained empathy from humans. The chief theory behind this project is to take state-of-the-art perceptual neural networks, run them in real time on inexpensive mobile robots, and then train reinforcement learning algorithms on the final perceptual layers of these nets.

Stay tuned for more information on our published robot designs, BOMs, and software stack.