How we see

It may be obvious, but sometimes the obvious needs stating: television pictures can only be assessed by the viewer through the human visual system. It is equally obvious that moving picture reproduction systems developed without an understanding of human vision will be sub-optimal. That is where we are at the moment: today’s TV and cinema standards were specified before much of what we now know about sight was well understood. With that expanded understanding, we should apply the science to future television systems.

The Human Visual System (HVS) is only one of a set of senses that have evolved over millennia to give us some idea of our surroundings. That is what consciousness is: quite simply, a creature that is more conscious of its environment is more likely to avoid becoming a victim, more likely to find food and a mate, and more likely to survive and pass on its genes to the next generation. It follows that all of today’s surviving species have been selected for good senses.

What we call reality is actually a restricted model of our surroundings held in our minds. As we have no way of bypassing our senses, we have no choice in the matter, but our reality is a small subset of what is out there. The range of optical wavelengths over which human vision works is tiny, and we have little or no idea what is happening at other wavelengths. Our reality is restricted, and we cannot grasp what it must be like to be a fly with a near-spherical field of view, or a bat whose image of its surroundings was created in total darkness with ultrasound.

There is significant variation in the operation of human senses both between and within ethnic groups. What you see is not necessarily what I see. Vision changes with age, and can be altered by substances of varying legality. One cup of coffee changes colour vision measurably. Tiredness alters our critical flicker frequency. An alcoholic drink alters the critical bandwidth and sensitivity of human hearing. There is no one reality and many things must remain subjective.

Another aspect of human vision that it is vital to understand is that most of the time people don’t see what is actually in front of them. They see what they believe they ought to see and often fail to see things that are truly remarkable if their attention is diverted.

In the days before digital cinema, when several reels of film were needed for the average movie, most moviegoers never noticed the cue dots used to synchronise reel changes. The colour temperature of daylight changes throughout the day, yet we see colours as being fairly constant. We can discuss colorimetry on another occasion.

Some people have little stereoscopic vision, whereas others are partially colour blind, often without realising it, because that is their reality. It follows that any one viewer, watching a TV screen or the real world, will see more than some people and less than others.

How the eye “sees”

I think it is possible that the Human Visual System can learn, or be trained, to see more. Obviously this is impossible in any physical sense, but in the sense of extracting more information from what can be seen it is true. Photographers, videographers and cinematographers see more for a living, and I doubt this stops when the camera is put down.

What we call vision is a long way from just the image captured by the retina; it has been subject to extensive mental processing. What we believe we are seeing comes from a kind of three-dimensional frame store that holds an approximate model of our surroundings. We are also in the model, so we can reach out for things and have some chance of taking hold of them.

Another reason for this arrangement is that it allows a huge field of view, around 180 degrees horizontally, without requiring massive and unattainable bandwidth between eye and brain. The mental frame store only needs to be updated when something changes, allowing a huge reduction in the information sent from the eye. The second bandwidth-reduction measure is that the acuity of the eye is not uniform: best acuity is only present in a very small central circular area known as the fovea. The eye therefore needs to be able to turn rapidly so that it can place the item of interest on the fovea, and it will do so if any sense detects that something has changed in the environment.
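The update-on-change idea can be illustrated with a toy sketch. This is not a model of the HVS, merely an analogy for the bandwidth saving described above; the function name and the frames are invented for illustration.

```python
# Toy sketch: update a stored "frame" only where the new input differs,
# counting how many values actually had to be transmitted.

def delta_update(stored, incoming):
    """Return the updated frame and the number of values actually sent."""
    sent = 0
    updated = list(stored)
    for i, (old, new) in enumerate(zip(stored, incoming)):
        if old != new:          # only changed samples cross the "optic nerve"
            updated[i] = new
            sent += 1
    return updated, sent

frame = [0, 0, 0, 0, 0, 0, 0, 0]
next_frame = [0, 0, 5, 0, 0, 0, 7, 0]        # only two samples changed
frame, cost = delta_update(frame, next_frame)
print(frame, cost)                           # only 2 of 8 values were sent
```

A static scene costs nothing to maintain; only change consumes bandwidth, which is the essence of the arrangement described above.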

The trigger might be a sound, a vibration, a change in radiant heat or touch sensed by the skin, or something detected by peripheral vision. Because the primary purpose of peripheral vision is to alert us to a change in the environment, its response rate is higher than that of foveal vision. Stand so that a TV set appears right at the edge of your field of view and you will see it flickering; you will also experience a strong urge to look at it, which stops the flickering. Some people are addicted to television, and this is one of the reasons.

The retina contains what we in television would recognise as photosites, although they are properly called rods and cones. The density of photosites is low in the peripheral area and higher in the fovea. Only the fovea has colour vision; peripheral vision sees in monochrome, and the sensation of colour there comes from the frame store. The density of photosites in the fovea is much lower than we would find in a TV camera of similar acuity, but this does not mean sampling theory is wrong, because there is another process the TV camera doesn’t have: saccadic motion.

Saccadic motion is the constant, involuntary, minute oscillation of the eyeball. It shifts a given photosite to a large number of slightly different locations, so that over time a high-resolution image can be built up from measurements made in many places.

The HVS causes the eye motion and then cancels the resulting image shift in a kind of DVE, with the result that the photosites are effectively moved across a stabilised image. The HVS then integrates a sharp image by adding a number of images over time, with the photosites in multiple locations. The time this takes is one of the reasons for persistence of vision and for the reduced visibility of flicker in the fovea.
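The principle of building fine detail from shifted coarse samples can be sketched in one dimension. This is an illustration only, loosely analogous to saccadic integration; the signal values and sensor pitch are arbitrary.

```python
# Illustrative sketch: a coarse "sensor" samples every STEP-th value of a
# fine signal. Repeating the sampling at shifted offsets and interleaving
# the results recovers the full-detail signal.

fine_signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # "scene" at full detail
STEP = 4                                             # coarse sensor spacing

def coarse_sample(signal, offset, step=STEP):
    """One 'fixation': sample the scene at a coarse pitch, shifted by offset."""
    return signal[offset::step]

# Integrate several shifted coarse samplings back into a full-detail image.
reconstructed = [None] * len(fine_signal)
for offset in range(STEP):              # each small shift fills new positions
    for i, value in enumerate(coarse_sample(fine_signal, offset)):
        reconstructed[offset + i * STEP] = value

print(reconstructed == fine_signal)     # True: shifted samples rebuild detail
```

Each individual sampling is far below the detail of the scene, yet the set of shifted samplings together contains it all, which is why a sparsely populated fovea does not contradict sampling theory.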

Our eyes can never stop moving, and this is vital because our vision is AC coupled: we can only perceive changes. Another reason is that evolution left the retina back to front, with the light-sensitive layer underneath the blood vessels and nerves, so the raw retinal image is seen as if through a spider’s web. By keeping the eye moving, we average out the shadows of the blood vessels and obtain a better image.
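AC coupling can be made concrete with a toy photosite that reports only frame-to-frame change. A perfectly static stimulus then produces no signal at all, which is why the eye must keep moving. The function and values below are invented for illustration.

```python
# Toy "AC-coupled" sensor: it reports only the change between successive
# frames, so an unchanging input is invisible to it.

def temporal_difference(frames):
    """Return what an AC-coupled photosite 'sees': frame-to-frame change."""
    return [b - a for a, b in zip(frames, frames[1:])]

static = [7, 7, 7, 7, 7]             # unchanging stimulus at one photosite
moving = [7, 7, 9, 9, 7]             # stimulus that steps up, then back

print(temporal_difference(static))   # [0, 0, 0, 0] -> nothing perceived
print(temporal_difference(moving))   # [0, 2, 0, -2] -> only changes register
```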

The human eye continually moves to gain as much information as possible. Image courtesy Wikipedia.


The delays caused by image and hearing processing are quite significant, so what we see and hear is behind real time. However, if you clap your hands, you feel, hear and see them touch at the same moment, because the mind has time-base corrected all the sensations. The HVS shifts the position of our hands in the image to where it should be now, because it knows they are moving and how fast. Were it not for this we would be unable to catch a ball.
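The compensation amounts to extrapolating a moving object forward across the processing lag. The sketch below is a minimal illustration of that idea; the delay figure and all names are made-up values, not physiological data.

```python
# Minimal sketch of lag compensation: predicted position is the measured
# position plus velocity times the processing delay.

PERCEPTUAL_DELAY = 0.1   # seconds of processing lag (illustrative value only)

def predicted_position(measured_pos, velocity, delay=PERCEPTUAL_DELAY):
    """Extrapolate a moving object's position forward across the lag."""
    return measured_pos + velocity * delay

# A ball moving at 10 m/s: where it was when the light left it,
# versus where prediction places it "now".
measured = 2.0                              # metres, as sensed ~0.1 s ago
actual_now = measured + 10.0 * PERCEPTUAL_DELAY
print(predicted_position(measured, 10.0))   # 3.0, matching actual_now
```

Without this forward shift, every perceived position would be stale by the length of the lag, and intercepting a moving object would be impossible.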

Try panning a TV camera rapidly. The image blurs and flies across the screen. But when our eyes move rapidly from one point of interest to another, we don’t see image smear and we don’t think the world is moving. What happens is that the eyes are essentially switched off when they move rapidly. We don’t see darkness because we continue to see what is in our frame store. Magicians and pickpockets alike know this and can do things “invisibly”: they elicit a rapid eye shift during the action they want to conceal. You can try looking at one of your eyes in a mirror, and then switching your gaze to the other eye. Rest a fingertip lightly on an eyelid. You will feel your eyes move but you will never see them move.

The HVS has some superficial similarity to a camera. It has a lens, an iris and a sensor, but that is as far as it goes. In order to use a TV or movie camera effectively it is important to appreciate the difference. Reality for a camera is completely different to the human sensation. The old saying “the camera never lies” is about as far from the truth as it is possible to get. Good cinematographers and videographers know that the camera always lies, and part of the art is to keep it at least faithful to the intended illusion.

Possibly the most significant consequence of eyes that can move is that they have the ability to track moving objects to render them stationary on the retina. TV and movie cameras can’t do that and this turns on its head the way we should go about reproducing moving pictures. That is a long story and it will have to wait for another time.
