How do computers see the world? It’s not quite the same way humans do.
Recent advances in generative artificial intelligence (AI) have expanded what computers can do with images. You might ask an AI tool to describe an image, for example, or to create an image from a description you provide.
As generative AI tools and services become more embedded in day-to-day life, knowing more about how computer vision compares to human vision is becoming essential.
My latest research, published in Visual Communication, uses AI-generated descriptions and images to get a sense of how AI models “see” – and reveals a bright, sensational world of generic images quite different from the human visual realm.

Elise Racine / Better Images of AI / Emotion: Joy, CC BY
Comparing human and computer vision
Humans see when light waves enter our eyes through the cornea, pupil and lens. Light is converted into electrical signals by a light-sensitive surface inside the eyeball called the retina, and our brains then interpret these signals as the images we see.
Our vision focuses on key aspects such as colour, shape, movement and depth. Our eyes let us detect changes in the environment and identify potential threats and hazards.
Computers work very differently. They process images by standardising them, inferring context through metadata (such as the time and location information stored in an image file), and comparing images to others they have previously been trained on. Computers focus on features such as edges, corners or textures present in the image. They also look for patterns and try to classify objects.
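To make the idea of edge detection concrete, here is a minimal sketch of one classic technique a computer vision system might use: the Sobel filter, which highlights places where pixel brightness changes sharply. This is an illustrative example, not the method used in any particular AI model; the function name and the tiny test image are invented for the demonstration, and only numpy is assumed.

```python
import numpy as np

def sobel_edges(image):
    """Estimate edge strength at each pixel with Sobel gradient filters.

    `image` is a 2-D numpy array of grayscale intensities.
    Returns an array of the same shape holding the gradient magnitude;
    the image is zero-padded so the border can be processed too.
    """
    # Sobel kernels: kx responds to horizontal change, ky to vertical.
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    ky = kx.T

    padded = np.pad(image.astype(float), 1)
    gx = np.zeros(image.shape)
    gy = np.zeros(image.shape)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3]  # 3x3 neighbourhood
            gx[y, x] = np.sum(window * kx)
            gy[y, x] = np.sum(window * ky)
    return np.hypot(gx, gy)  # combined gradient magnitude

# A tiny test image: a dark left half and a bright right half.
img = np.zeros((5, 5))
img[:, 3:] = 255
edges = sobel_edges(img)
```

In `edges`, the strongest responses cluster along the boundary between the dark and bright halves, while uniform regions score zero – the filter “sees” nothing where nothing changes, much like the edge- and corner-focused processing described above.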

CAPTCHA
You’ve likely helped computers learn how to “see” by completing online CAPTCHA tests.
These are typically used to help computers differentiate between humans and bots. But they’re also used to train and improve machine learning algorithms.
So, when you’re asked to “select all the images with a bus”, you’re helping software learn the difference between different types of vehicles as well as proving you’re human.
Exploring how computers ‘see’ differently
In my new research, I asked a large language model to describe two visually distinct sets of human-created images.
One set contained hand-drawn illustrations while the other was made up of camera-produced photographs.





