Can you caption better than a 4-year-old?

By נביא שקר ("False Prophet")

(In English this time)

I recently tested the limits and failure modes of Microsoft's new CaptionBot. It seems that CaptionBot doesn't really understand what it is looking at.

There are a few counters to that conclusion:

A) Neural networks see the world differently than we do, man!

B) Pfft. The neural network behind CaptionBot just wasn't trained on enough data.

C) You need to be more forgiving and nurturing. Neural networks are like precious children, figuring out the world for the first time.

To which I say:

A) Maybe, but then the PR surrounding it shouldn't be "Our bot understands the content of images", but more like "Our bot sort of gets image-classification, and can sort of link that up to Flickr-type captions, if the images are similar to a dataset we trained on".

B) Nonsense.

C) This post.

I managed to secure a 4-year-old's attention for about 20 minutes, and asked him to tell me what was in a series of photos ("describe what you see").

I offered encouragement, but not feedback.

Then I asked CaptionBot to do the same.

Here are the results:

[Photo 1]

4YO: This is a castle and these are buildings and that’s a wall and that’s a sign, and there’s grass

CaptionBot: I think it's a large stone building

[Photo 2]

4YO: I see a eagle and people are falling into the water, and that’s Moses, and those are the bad guys in the water

CaptionBot: I am not really confident, but I think it's a group of stuffed animals

[Photo 3]

4YO: The lion is trying to eat that person up

CaptionBot: I am not really confident, but I think it's a cat playing with a toy

[Photo 4]

4YO: People are pulling the horse with carriages, and they’re pulling it and pushing it and pulling it and pushing it, cause they want to break other people with that horse, and he’ll step on them, and he likes the nice guys

CaptionBot: I am not really confident, but I think it's a group of men standing next to a horse

[Photo 5]

4YO: They’re fighting. That’s Alexander and he’s going to win. He’s holding the bow, the arrow in his mouth. These guys have swords and shields

CaptionBot: I am not really confident, but I think it's a couple of giraffes that are next to a book

[Photo 6]

4YO: The Israelites. They’re tired and they’re walking and they don’t want to live in their land

CaptionBot: I think it's a group of people sitting in front of a building and they seem 😐 😐 😐

[Photo 7]

4YO: People fighting with dragons. The dragons might blow fire on them and eat them. And they’re opening their mouth

CaptionBot: I am not really confident, but I think it's a vase with flowers in it

[Photo 8]

4YO: A kayak, and this guy wants to throw that big rock at the kayak to make it sink it and break it

CaptionBot: I am not really confident, but I think it's a sandwich on a rock

[Photo 9]

4YO: It’s Alexander and his great people. There’s lots of horses, Alexander is on his horses, dancing

CaptionBot: I am not really confident, but I think it's a group of people standing outside of a building

[Photo 10]

4YO: Hanukah people carrying Hanukah things and stuff

CaptionBot: I am not really confident, but I think it's an old photo of a stone wall

That'll do, CaptionBot. That'll do. 
