2020 Annual Report

“Vision or ‘seeing’ is an active process of discovering, from images, what is present in the world and where it is.”

Robotic vision, like human vision, involves ‘seeing’ and ‘understanding’ to enable useful actions in the world. It is a complex sense that is about more than just patterns of shades and colours; it is about understanding the elements of the scene: what they are, where they are and what they are doing. This rich “higher level” knowledge about the world, gleaned from the “low level” patterns of light, is what enables us to understand, plan and safely achieve goals.

For humans, everyday vision relies on an extraordinary range of abilities that we mostly take for granted. We can effortlessly identify objects; see colour; detect motion; gauge speed and distance; navigate; avoid obstructions; fill in blind spots; quickly correct distorted information; and recognise people and understand their emotions.

Vision involves not just our eyes, but also about one third of our brain. It also involves our whole body: we use vision to help us move and balance, and we move our body in order to see better. Robotic vision is similar; we just use camera chips instead of eyes, high-performance computers instead of a brain, and a robot instead of a body.

There are two primary reasons why vision has, relatively recently, become a practical sensor for robots:

1. Cameras are now very cheap because they are commodities, used in huge numbers in mobile phones, laptops, dash cams, security cameras and so on. This means that the actual sensor – equivalent to the retina of a human eye – now costs less than a dollar.

2. Computation is also very cheap, driven by the demands of mobile devices and games. We have access to powerful computers with lots of memory, which allow researchers to run sophisticated algorithms to process the data from the cameras.