Introducing PetCam: a non-invasive, machine learning-powered pet tracker that runs on older smartphones. This project is a collaboration between me and Jason Mayes, who came up with the idea. Also, funny story: my colleague Marco Lepisto built (almost) the exact same project on his YouTube show, Level Up, at the same time, which you can see here. We use old smartphones; he uses a Coral Dev Board. Choose your own adventure.
When I was young and living at home in New Jersey, my parents were really strict with me about locking the garage at night, because if we didn't lock the garage, something like this would happen:
Then the next morning, we'd walk out the front door, get hit with the stench of dirty diapers, and find a trash bag ripped open and strewn all over our driveway. Someone had clearly had a wild night.
Bears love to eat garbage. Raccoons love to eat garbage. Even foxes will get in on it if you make it easy for them. All things considered, I probably should have been better about remembering to lock the garage. But it would also have been nice to have a little machine learning app hacked onto my dad's Nest camera rig, so that whenever it saw a bear taking out our trash, it would sound a loud, scary air horn and scare the bandit back into the forest.
What I wanted at the time was PetCam, the project I'm bringing you today: an app that alerts you when your dog, cat, bird, or chicken (more animals pending) jumps on your bed, sofa, chair, laptop, etc.
When Fluffy jumps on your couch (“event of interest”), PetCam sends you an alert via Slack and saves the snapshot to a “Diary” app in the cloud.
PetCam runs on a smartphone and processes all video locally (i.e. the video stream never leaves your device). But if you want to save a photo of an event of interest (i.e. snap a picture of Fluffy on the sofa and keep it), you can configure PetCam to send a snapshot of that moment to the cloud.
Here's a preview:
Architecture
The project is divided into three main parts:
1. Pet detection front end
This is the bit that uses your camera and a TensorFlow.js machine learning model to analyze your pet's behavior. It's built in plain JavaScript and runs in the browser, and you can try it out for yourself right now, here.
It uses a TensorFlow.js machine learning model to analyze data from your phone's camera (or webcam) right in the browser, without ever sending that data to the cloud (which is what makes it “privacy-preserving”). However, to store events in a log (e.g. “cat on the couch”), we send images and event strings to Firestore, so that we can later view the events in the diary (more on that later).
2. Pet “diary” frontend
The “diary” is a second front-end app that lets you view a log of all the activities your PetCam has detected. It basically loads the data stored in Firestore into a pretty UI.
3. Firebase Backend (+ Slack API)
The serverless “backend” does two things: it stores a log of events and images, and it powers Slack notifications. So when your pet does something neat, you get a Slack ping about it:
It's all powered by Firebase, Google's lightweight tool suite for building serverless backends, plus the Slack API.
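The notification code itself isn't shown in this post, but a rough sketch of the Slack side might look like this. Slack's incoming webhooks accept a JSON payload with a `text` field; the `notifySlack` helper name and the injectable `fetchFn` parameter are my own choices for illustration, not the project's actual code.

```javascript
// Hypothetical sketch: post an event message to a Slack incoming webhook.
// `fetchFn` is injectable to make the function easy to test; in a Cloud
// Function you'd just use the global fetch.
async function notifySlack(webhookUrl, text, fetchFn = fetch) {
  const res = await fetchFn(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }), // incoming-webhook payload shape
  });
  return res.ok;
}
```

You'd call something like `notifySlack(url, 'Millie is on the couch!')` from whatever backend code handles a new event.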
3a. AutoML Vision chicken detection model
This part is really a bonus. For PetCam 1.0, we use a pre-built machine learning model (COCO-SSD) that recognizes lots of animals by default. But it doesn't recognize baby chicks (my new and only pets) very well. So in this third part, we'll build a custom TensorFlow chicken detection model using Google Cloud AutoML:
Phew, this project is kind of a beast, isn't it? (Two different front ends? Isn't that a bit redundant?) As always, you can find all the code to build this app yourself here, in the Making with ML repo. Or, if you're just here for the front end of pet tracking, you can find the code and a live demo here.
Now, since this project is huge and I have a very short attention span, I won't bore you with all the minutiae of how Jason and I authenticated and installed TensorFlow and deployed the Cloud Functions and blahdy-blah. You can figure most of that out yourself by reading the code, studying the README, watching our YouTube video [TODO], or (please don't kill me) Googling around.
Instead, I'll cover each component at the 5,000-foot level and focus mainly on the difficult, unexpected obstacles Jason and I had to deal with while building this thing, so that you won't have to deal with them yourself and we can feel smart.
Now, on to the code mines!
Tracking pets (or people or cars or Amazon packages)
Have you ever tried object detection on your webcam, in a browser, using JavaScript?
Image by M. Thaler from Wikipedia
Turns out it's pretty simple, if you ask me! Soon you'll be building web apps to track your own cats.
You can set this up in a few lines of code:
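Here's a minimal sketch of that setup. It assumes the COCO-SSD browser bundle from the TensorFlow.js models repo is loaded on the page; the element id `webcam` and the `handlePredictions` callback name are placeholders of my own.

```javascript
// One detection pass: ask the model what it sees in the current video frame.
async function detectFrame(model, video, onPredictions) {
  // Each prediction looks like:
  //   { class: 'dog', score: 0.97, bbox: [x, y, width, height] }
  const predictions = await model.detect(video);
  onPredictions(predictions);
  return predictions;
}

// In the browser, you'd load COCO-SSD and loop over webcam frames:
// const model = await cocoSsd.load();
// const video = document.getElementById('webcam');
// (function loop() {
//   detectFrame(model, video, handlePredictions);
//   requestAnimationFrame(loop);
// })();
```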
The first thing this code does is load a pre-trained, general-purpose computer vision TensorFlow.js model called COCO-SSD. COCO-SSD recognizes 80 types of objects, including but not limited to: giraffe, tie, wine glass, broccoli, hair dryer.
Using the model is as simple as calling `model.detect(video)`, where `video` is a pointer to your webcam stream. (That's `<video id="webcam" autoplay>` in HTML land; it makes more sense when you look at the code here.)
To analyze a video stream instead of just a single frame, run `model.detect` in a loop. Voila! You have real-time object detection.
How to calculate bounding box intersections/distances.
Jason and I designed this app so that when two objects of interest intersect (a dog and a bowl of water, a cat and your laptop, you and your refrigerator), the app triggers an event, e.g. “man in the fridge.” We then store this event in a Firestore backend, so that we can later see a log of all past events, and trigger a Slack notification (more on the backend in a bit). By “intersecting,” I mean the bounding boxes around two detected objects overlap:
In this photo, my sweet little girl Millie is “intersecting” with her water dish, so I conclude she is probably drinking.
Jason has also created a nice interface that helps you choose which two items you want to track:
How do you calculate how close two bounding boxes are to each other? For this project, we wanted to know not only whether two bounding boxes (“bboxes”) intersect, but also, if they don't, how far apart they are. This is the code:
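Sketched in code, the idea looks like this. I'm assuming the `[x, y, width, height]` box format that COCO-SSD returns; the helper name is mine.

```javascript
// Distance between two axis-aligned bounding boxes in [x, y, width, height]
// format (the format COCO-SSD uses). Returns 0 if the boxes intersect,
// otherwise the shortest distance between their edges.
function bboxDistance([x1, y1, w1, h1], [x2, y2, w2, h2]) {
  // Gap along each axis; 0 means the boxes overlap on that axis.
  const dx = Math.max(0, x1 - (x2 + w2), x2 - (x1 + w1));
  const dy = Math.max(0, y1 - (y2 + h2), y2 - (y1 + h1));
  return Math.hypot(dx, dy);
}
```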
It returns 0 if the two boxes intersect, or the distance between them if they don't.
Know when to send alerts.
Once I figured out this box-intersection code, I felt great! But then Jason and I discovered a less sexy but more annoying problem that we didn't know how to solve: How do you know when to save an event and send an alert to the user?
Note that for real-time object detection and tracking, we run the code that analyzes our webcam image and detects intersections many times a second. So if your corgi Rufus is just chilling on the couch, we'll detect the “dog” bounding box intersecting the “couch” bounding box many times per second. Obviously, we only want to count an “event” when it first happens, i.e. when Rufus jumps on the couch, but not afterwards. Then if Rufus leaves and comes back later, we can fire the event again. So maybe we create a variable that tracks whether we've already sent a notification to the user, and reset it when Rufus leaves the couch.
Except it's more complicated than that, because what if Rufus is hovering in front of the couch, or running circles around it? Our code will recognize him as being “on” and “off” the couch several times in the same few seconds. Like a dog with the zoomies?
We don't want to spam our user with event notifications that aren't really “unique.” We need some sort of “debounce” that limits how often we can send alerts. Sounds simple, right? Like we should just add a cooldown interval so we don't send users too many notifications over time. This is, in fact, what Jason and I did, and in code it looks like this:
```javascript
// Min number of seconds before we send another alert.
const MIN_ALERT_COOLDOWN_TIME = 60;

// Re-enable alerts once the cooldown expires.
function cooldown() {
  sendAlerts = true;
}

if (sendAlerts) {
  sendAlerts = false;
  sendAlert(naughtyAnimals);
  setTimeout(cooldown, MIN_ALERT_COOLDOWN_TIME * 1000);
}
```
But a cooldown period alone isn't enough! Because what if you have two dogs, or lots of dogs, running on and off the couch? If Rufus jumps up and then Milo jumps up right behind him, those are two unique events, and we want to alert the user to both. Suddenly you have to know which dogs are where and track their positions, and you have the hairy “multi-object tracking” problem, which sounds like someone's PhD thesis, and it is! (Actually, you get this functionality for free with the Google Cloud Video Intelligence API, but we're stuck in TensorFlow.js land here, where it's hard.)
This problem really made me sweat, and for a while, Jason and I thought we were screwed. But then we did what one always does when one is out of one's technical depth: solve a slightly worse but much simpler problem. Our compromise was this: we would alert users when the number of dogs on the couch increased, but not when the number of dogs was decreasing, like this:
To implement this, we kept a counter of how many animals were in the frame. When the number of animals increased, we sent a notification and increased the counter. However, we never decremented the counter until the number of animals in the frame reached zero (at which point, we performed a hard reset). Sounds weird, but take a look at the code here:
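The project's actual implementation isn't reproduced here, but the counting trick can be sketched as follows (the function names are mine):

```javascript
// Sketch of the "alert only on increases" counter. `sendAlert` is called
// whenever the number of animals in the frame goes up; decreases are
// ignored until the frame empties out, which triggers a hard reset.
function makeAnimalCounter(sendAlert) {
  let count = 0; // high-water mark of animals seen in the frame
  return function update(animalsInFrame) {
    if (animalsInFrame > count) {
      sendAlert(animalsInFrame); // a new animal appeared: alert!
      count = animalsInFrame;
    } else if (animalsInFrame === 0) {
      count = 0; // everyone left: reset so the next arrival alerts again
    }
    // Decreases above zero are deliberately ignored, so a dog hopping
    // off and right back on doesn't re-trigger an alert.
  };
}
```

Feeding it the per-frame animal counts reproduces the scenarios in the table below.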
Setting up an alert system like this is a bit counter-intuitive, but take a second to convince yourself that it actually gives you the behavior you, as a user, probably want. There are a few different cases in which our algorithm does or does not fire an alert.
| Scenario | Send an alert? |
|---|---|
| 1 dog -> 2 dogs | Yes |
| 2 dogs -> 3 dogs | Yes |
| 3 dogs -> 2 dogs | No |
| 2 dogs -> 1 dog -> 2 dogs | No |
| 2 dogs -> 1 dog -> 0 dogs -> 1 dog | Yes (for that last dog that jumps on the couch) |
Phew, on to the easy stuff!
Creating a pet diary/viewer app
So far, everything we've discussed runs entirely in your browser. This is good, because no data leaves your phone or computer, so you can feel good from a privacy perspective. However, when events of interest occur, we want to be able to remember them and revisit them later. For this, we'll use Firebase Firestore, a simple 'n' lightweight database in the cloud that you can easily read from and write to in the browser.
I've used Firestore in lots of projects and talked about it many times on Dale on AI, so I won't go into detail here. When the front-end PetCam app detects an event (via the algorithm above), it writes some event data to Firestore. Here's what my Firestore database looks like:
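The write that creates those documents can be sketched with the Firebase web SDK's namespaced API; the collection name `events` and the field names here are my guesses, not necessarily what the project uses.

```javascript
// Hypothetical sketch: log a detected event to Firestore. `db` is a
// firebase.firestore() instance; 'events' and the fields are assumptions.
function logPetEvent(db, animal, object, imageUrl) {
  return db.collection('events').add({
    animal,                // e.g. 'cat'
    object,                // e.g. 'couch'
    imageUrl,              // snapshot of the moment, stored in the cloud
    timestamp: Date.now(), // when the event fired
  });
}
```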
Because writing to and reading from Firestore is so easy, building a front-end “diary” viewer app around the data was fairly straightforward. I built it using React, and you can find the code here. Here's what the viewer app looks like: