We all know the story of the first YouTube video: a grainy 19-second clip of co-founder Jawed Karim at the zoo, remarking on the elephants behind him. That video was a defining moment for digital media, and in some ways it is a mirror image, albeit a reversed one, of today's moment as we digest the arrival of VEO 3.
Part of Google Gemini, VEO 3 was unveiled at Google I/O 2025 and is the first generative video platform that, from a single prompt, generates video with synchronized dialogue, sound effects and background audio. Most of these 8-second clips arrive less than 5 minutes after you enter the prompt.
I’ve been playing with VEO 3 for a few days, and for my latest challenge I tried to go back to the beginning of social video and that YouTube “Me at the zoo” moment. Specifically, I wondered if VEO 3 could recreate that video.
As I have written before, the key to a good VEO 3 result is the prompt. Without detail and structure, VEO 3 tends to make the choices for you, and you usually do not end up with what you want. For this experiment, I wondered how I could possibly describe all the details I wanted from that short video and deliver them to VEO 3 in the form of a prompt. So, naturally, I turned to another AI.
Google Gemini 2.5 Pro is not currently able to analyze a URL, but Google AI Mode, the brand-new form of search that is rapidly rolling out across the United States, can.
Here’s the prompt I fed into Google AI Mode:
Google AI Mode returned almost instantly with a detailed description, which I copied and pasted into the Gemini VEO 3 Fast prompt field.
I did some editing, mostly to remove phrases like “The video appears…” and the concluding analysis at the end, but otherwise I left most of it intact and added this at the top of the prompt:
“Let’s make a video based on these details. The output must be in a 4:3 aspect ratio and look like it was recorded on 8mm videotape.”
It took a while for VEO 3 to generate the video (I suspect the service is being hammered right now), and because it only creates 8-second chunks at a time, the result was incomplete, cutting off the dialogue mid-phrase.
The result is still impressive. I would not say the protagonist looks like Karim, but to be fair, the prompt does not describe Karim’s haircut, the shape of his face or his deep-set eyes. Google AI Mode’s description of his clothing was also probably inadequate. I’m sure it would have done a better job if I had fed it a screenshot of the original video.
Note to self: you can never offer enough detail in a generative prompt.
8 seconds at a time
The zoo in the VEO 3 video is nicer than the one Karim visited, and the elephants are much farther away, though they are moving around back there.
VEO 3 got the film quality right, giving the clip a nice 2005-era look, but not the 4:3 aspect ratio. It also added archaic and unnecessary titles at the top that, fortunately, disappear quickly. I realize I should have removed the “title” bit from my prompt.
The sound is especially good. The dialogue syncs well with my protagonist, and if you listen carefully, you will also hear the background sounds.
The biggest problem is that this covered only half of the short YouTube video. I wanted a full recreation, so I decided to go back with a much shorter prompt:
Continue with the same video, and have him look back at the elephants and then look at the camera while he says this dialogue:
“Trunks, and that’s cool.” “And that’s pretty much all there is to say.”
VEO 3 kept the setting and the protagonist but lost some of the plot and dropped the old-school grainy video look of the first generated clip. That means that when I present the clips together (as I do above), we lose significant continuity. It’s as if a film crew suddenly got a much better camera mid-shoot.
I’m also a little frustrated that all my VEO 3 videos have nonsensical captions. I have to remember to ask VEO 3 to remove them, hide them or place them outside the video frame.
I think about how hard it probably was for Karim to film, edit and upload that first short video, and how I just remade the same clip without the need for people, lighting, microphones, cameras or elephants. I didn’t have to transfer recordings from tape or even from an iPhone. I simply conjured it out of an algorithm. We have truly stepped through the looking glass, my friends.
I learned one more thing through this project: as a Google AI Pro member, I get two VEO 3 video generations per day. That means I can do this again tomorrow. Tell me in the comments what you would like me to create.