Can you caption better than a 4-year-old?
By נביא שקר
(In English this time)
I recently tested the limits and failure modes of Microsoft's new CaptionBot. It seems that CaptionBot doesn't really understand what it is looking at.
There are a few counters to that conclusion:
A) Neural networks see the world differently than we do, man!
B) Pfft. The neural-network behind CaptionBot just wasn't trained on enough data.
C) You need to be more forgiving and nurturing. Neural networks are like precious children, figuring out the world for the first time.
To which I say:
A) Maybe, but then the PR surrounding it shouldn't be "Our bot understands the content of images", but more like "Our bot sort of gets image-classification, and can sort of link that up to Flickr-type captions, if the images are similar to a dataset we trained on".
C) This post.
I managed to secure a 4-year-old's attention for about 20 minutes, and asked him to tell me what was in a series of photos ("describe what you see").
I offered encouragement, but not feedback.
Then I asked CaptionBot to do the same.
Here are the results:
4YO: This is a castle and these are buildings and that’s a wall and that’s a sign, and there’s grass
CaptionBot: I think it's a large stone building
4YO: I see a eagle and people are falling into the water, and that’s Moses, and those are the bad guys in the water
CaptionBot: I am not really confident, but I think it's a group of stuffed animals
4YO: The lion is trying to eat that person up
CaptionBot: I am not really confident, but I think it's a cat playing with a toy
4YO: People are pulling the horse with carriages, and they’re pulling it and pushing it and pulling it and pushing it, cause they want to break other people with that horse, and he’ll step on them, and he likes the nice guys
CaptionBot: I am not really confident, but I think it's a group of men standing next to a horse
4YO: They’re fighting. That’s Alexander and he’s going to win. He’s holding the bow, the arrow in his mouth. These guys have swords and shields
CaptionBot: I am not really confident, but I think it's a couple of giraffes that are next to a book
4YO: The Israelites. They’re tired and they’re walking and they don’t want to live in their land
CaptionBot: I think it's a group of people sitting in front of a building and they seem 😐 😐 😐
4YO: People fighting with dragons. The dragons might blow fire on them and eat them. And they’re opening their mouth
CaptionBot: I am not really confident, but I think it's a vase with flowers in it
4YO: A kayak, and this guy wants to throw that big rock at the kayak to make it sink it and break it
CaptionBot: I am not really confident, but I think it's a sandwich on a rock
4YO: It’s Alexander and his great people. There’s lots of horses, Alexander is on his horses, dancing
CaptionBot: I am not really confident, but I think it's a group of people standing outside of a building
4YO: Hanukah people carrying Hanukah things and stuff
CaptionBot: I am not really confident, but I think it's an old photo of a stone wall
That'll do, CaptionBot. That'll do.