Acting: Vision & Action

The Vision & Action program is about robots manipulating real objects by pushing the boundaries of speed, coordination and complexity.

VA projects



Research leader: Juxi Leitner (CI)

Research team: Peter Corke (CI), Rob Mahony (CI), Chunhua Shen (CI), François Chaumette (PI), Chris Lehnert (Research Affiliate), Norton Kelly-Boxall (PhD), Sean McMahon (PhD), Douglas Morrison (PhD), Adam Tow (PhD), Fangyi Zhang (PhD), Zheyu Zhang (PhD)

Project aim: Replicating hand-eye coordination in complex visual environments involves developing robust motion control of robotic platforms based on vision data, being able to deal with the variations and complexities the robot will encounter as it operates in a real-world environment.

This project aims to go beyond engineered visual features and environments to develop demonstrator systems that allow manipulation of real-world objects including cups, pens, tools, vegetables, toys and many other examples. A key aspect of this project is developing systems and architectures that are robust enough to deal with a variety of conditions and can adapt easily to new tasks, new objects and new environments.

Centre PhD Researcher Fangyi Zhang pictured with a Baxter robot during his internship at the University of Maryland.


Research leader: Rob Mahony (CI)

Research team: Peter Corke (CI), Jonathan Roberts (CI), Jochen Trumpf (AI), Alex Martin (RE), Juan Adarve (PhD), Peter Kujala (PhD), Sean O’Brien (PhD), Jean-Luc Stevens (PhD), Dorian Tsai (PhD)

Project aim: The project is exploring real-world control scenarios in which there are multiple ways to complete a set task, including motions that interact with the environment, for instance moving a glass out of the way to reach a bottle, or picking up a sequence of items from a cluttered workspace in a learned order, rather than a pre-programmed order. Achieving this successfully involves two capabilities, which this project is addressing:

  • moving quickly, while maintaining stability, through a cluttered environment by calling on integrated control and sensing systems, in this case, developing algorithms to visually detect obstacles quickly and independently formulate strategies to avoid them
  • coming up with, and executing, an effective and efficient solution to a complex task by calling on integrated decision and planning capabilities, in this case, a vision-based solution that exploits semantic information and other relevant cues.


Organisers of the “New Frontiers for Deep Learning in Robotics” workshop

Key Results in 2017

Research Fellow Juxi Leitner, CI Niko Sünderhauf, CI Michael Milford and Centre Director Peter Corke collaborated with Associate Professor Pieter Abbeel from UC Berkeley to present the workshop “New Frontiers for Deep Learning in Robotics” at the 2017 Robotics: Science and Systems (RSS) conference held in Boston, USA. The workshop was sponsored by Google and OSARO and attracted over 200 people.

The workshop “Deep Learning for Robotic Vision” was held at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR) in Hawaii, involving researchers from our Adelaide and QUT nodes. The workshop was led by CI Gustavo Carneiro and involved CI Niko Sünderhauf, RF Juxi Leitner, RF Vijay Kumar, RF Trung Pham, CI Michael Milford, CI Ian Reid and Centre Director Peter Corke. It was an international collaboration with Google Brain, Google Research, the University of Oxford and the University of Freiburg. The workshop attracted around 200 attendees. Following the success of this workshop, the group is organising a special issue on “Deep Learning for Robotic Vision” for the International Journal of Computer Vision (IJCV) in 2018.

Adam Tow’s PhD research at the intersection of learning, vision and manipulation led to some interesting ideas on how to train a robot’s actions. The paper “What Would You Do? Acting by Learning to Predict” proposed a novel way to learn tasks directly from visual demonstrations.

By learning to predict the outcome of human and robot actions on an environment, we enable a robot to physically perform a human-demonstrated task without knowledge of the thought processes or actions of the human, only their visually observable state transitions. The approach was shown to work in proof-of-concept table-top object manipulation tasks and to generalise to previously unseen states, while reducing the priors required to implement a robot task learning system. The need for a large number of trials or very strong priors is one of the biggest shortfalls of existing approaches such as Learning from Demonstration, Reinforcement Learning and Inverse Reinforcement Learning. Adam is currently on leave and working for a Chinese start-up in the warehouse automation space.

Building on a visit in 2016, QUT and international partner INRIA have continued their joint research in combining novel deep learning techniques with classical visual servo control approaches. Quentin Bateux, a PhD researcher supervised by Eric Marchand and François Chaumette in Rennes, led these efforts on the French side, with RF Juxi Leitner leading them from Brisbane. After attending the Robotic Vision Summer School, Quentin also visited QUT for a few weeks to implement, train and test various approaches.

photometric approaches developed at INRIA. The naïve integration yielded promising results and over the course of the year better results were achieved by tweaking the network architecture and the data training method. A paper, presented at the Deep Learning workshop at RSS 2017, highlighted that this can be trained quickly if one finds a smart way of training. In our case the idea was that accuracy is more important when you are close to s*, your desired feature state. Taking this into account during data augmentation yielded significant benefits.
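As a rough illustration of that data augmentation idea, the sketch below draws training offsets from the desired state s* so that most samples fall close to it and a smaller share fall further away. This is not the training code from the paper: the function name `sample_offsets`, the one-dimensional offsets, the two Gaussian scales and the 70/30 split are all assumptions chosen only to show the principle.

```python
import random


def sample_offsets(n, near_fraction=0.7, near_scale=0.01, far_scale=0.10, seed=0):
    """Draw n scalar offsets from the desired state s* (hypothetical helper).

    A share `near_fraction` of the samples comes from a narrow Gaussian
    centred on s*, the rest from a wider one, so the training set is
    densest where accuracy matters most. All values are illustrative.
    """
    rng = random.Random(seed)
    offsets = []
    for _ in range(n):
        # Pick the narrow or the wide distribution for this sample.
        scale = near_scale if rng.random() < near_fraction else far_scale
        offsets.append(rng.gauss(0.0, scale))
    return offsets


if __name__ == "__main__":
    offsets = sample_offsets(1000)
    close = sum(1 for o in offsets if abs(o) < 0.02)
    print(f"{close} of {len(offsets)} samples lie within 0.02 of s*")
```

In a real visual servoing setup each offset would be a full camera pose perturbation used to render or capture a training image; the scalar version here only shows how the sampling density can be biased towards s*.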

A paper extending this work, which required scene-specific training, to scene-agnostic visual servoing networks was submitted to ICRA 2018 and has since been accepted for presentation. Quentin has finalised his thesis and is now involved in a French startup company.

The Centre competed as one of 16 teams from 10 countries. Participants were tasked with building their own automated robot, including hardware and software, to successfully pick and stow items in a warehouse.

The Challenge combined object recognition, pose recognition, grasp planning, compliant manipulation, motion planning, task planning, task execution, error detection and error recovery. The robots were scored by how many items they successfully picked and stowed in a fixed amount of time.

For the challenge, the Australian Centre for Robotic Vision developed “Cartman”, a Cartesian robot that can move along three axes at right angles to each other, like a gantry crane, and features a rotating gripper that allows the robot to pick up items using either suction or a simple two-finger grip. “Cartman provides us the right tools and is specialised for the task at hand, picking out of boxes”, explains Project Leader Juxi Leitner. “We are world leaders in robotic vision and we are pushing the boundaries of computer vision and machine learning to complete these tasks in an unstructured environment.”

Cartman’s vision system was the result of hours of training data time, according to Dr Anton Milan from The University of Adelaide. “We had to create a robust vision system to cope with objects that we only got to see during the competition.”

“One feature of our vision system was that it worked off a very small amount of hand-annotated training data and we needed just seven images of each unseen item for us to be able to detect them, aided by the use of weighing scales.”