Semantic Representations

Overview

The Semantic Representations (SR) program develops models, representations and learning algorithms that will allow robots to reason about their visual percepts, describe what they see and plan actions accordingly. The representations explored in this project will enable the recognition of activities from observer and actor viewpoints, fine-grained understanding of human-object, object-object and robot-object interactions, and large-scale semantic maps of environments. The project also investigates the mapping of visual inputs to sequence outputs for achieving a given task (e.g., describing a scene textually or providing a symbolic representation for a robotic task derived from human demonstration).


People


  • Ian Reid, University of Adelaide: Deputy Director, University of Adelaide Node Leader, Chief Investigator, Project Leader (Scene Understanding)
  • Basura Fernando
  • Anton van den Hengel
  • Niko Sünderhauf, Queensland University of Technology: Chief Investigator, Project Leader (Robotic Vision Evaluation Testbed)
  • Tom Drummond
  • Michael Milford, Queensland University of Technology: Chief Investigator, Project Leader (Self-driving Cars Demonstrator)
  • Chunhua Shen
  • Gordon Wyeth
  • Anthony Dick
  • Trung Pham
  • Yasir Latif
  • Chao Ma
  • Feras Dayoub
  • Qi Wu
  • Rodrigo Santa Cruz
  • Saroj Weerasekera
  • Mehdi Hosseinzadeh
  • Fahimeh Rezazadegan
  • Peter Anderson
  • Ming Cai
  • Kejie ‘Nic’ Li
  • Lachlan Nicholson
  • William Hooper
  • Natalie Jablonsky
  • Huangying Zhan
  • Stephen Gould, Australian National University: ANU Node Leader, Chief Investigator, Project Leader (Robots, Humans and Action)
  • Sareh Shirazi, Queensland University of Technology: Research Fellow (currently on leave)
  • Anoop Cherian, Mitsubishi Electric Research Labs, Cambridge: Former Research Fellow, Australian National University
  • Bohan Zhuang

Projects


Scene Understanding


2018 onwards

Ian Reid, Niko Sünderhauf, Tat-Jun Chin, Viorela Ila, Andrew Spek, Yasir Latif, Thanh-Toan Do, Saroj Weerasekera, Huangying Zhan, Kejie ‘Nic’ Li, Lachlan Nicholson, Mina Henein, Jun Zhang, Natalie Jablonsky, Mehdi Hosseinzadeh, Sourav Garg, Shin Fang Ch’ng, Ravi Garg, Yu Liu

This project will create geometric and semantic models and representations of the environment from visual observations. This ability is crucial for a robotic system to reason about a scene, its objects and their relationships, and their affordances, so that it can plan actions effectively. Particular competences to be developed include: (1) semantically labelled 3D maps; (2) object-based, as opposed to point-cloud-based, mapping of a scene; (3) models for incorporating physically based rules such as gravity and forces, for predicting the consequences of actions; (4) the use of deep learning together with geometry, where thousands or millions of observations of scenes help create priors that resolve ambiguity.
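
As a rough illustration of competence (2), object-based rather than point-cloud-based mapping, the Python sketch below keeps a map of object landmarks, each with a pose, a coarse extent and a semantic label distribution fused over repeated detections. The class list, association rule and thresholds are assumptions for illustration, not the project's implementation.

    # Minimal sketch of an object-based semantic map: the map stores object
    # landmarks (not raw points), each carrying a pose, an extent and a
    # per-class label distribution fused over repeated observations.
    # The class list and the nearest-centroid association rule are assumed.
    from dataclasses import dataclass, field
    import numpy as np

    CLASSES = ["chair", "table", "door", "monitor"]   # assumed label set

    @dataclass
    class ObjectLandmark:
        position: np.ndarray          # 3D centroid in the world frame
        extent: np.ndarray            # axis-aligned half-sizes (w, h, d)
        label_probs: np.ndarray = field(
            default_factory=lambda: np.full(len(CLASSES), 1.0 / len(CLASSES)))

        def update(self, position, extent, class_probs, blend=0.3):
            # Blend the geometry and fuse the label distribution (naive Bayes style).
            self.position = (1 - blend) * self.position + blend * position
            self.extent = np.maximum(self.extent, extent)
            fused = self.label_probs * class_probs
            self.label_probs = fused / fused.sum()

    class ObjectMap:
        def __init__(self, assoc_radius=0.5):
            self.landmarks = []
            self.assoc_radius = assoc_radius          # metres, an assumed threshold

        def integrate_detection(self, position, extent, class_probs):
            # Associate by nearest centroid; spawn a new landmark if nothing is close.
            position = np.asarray(position, dtype=float)
            extent = np.asarray(extent, dtype=float)
            class_probs = np.asarray(class_probs, dtype=float)
            for lm in self.landmarks:
                if np.linalg.norm(lm.position - position) < self.assoc_radius:
                    lm.update(position, extent, class_probs)
                    return lm
            lm = ObjectLandmark(position, extent)
            lm.update(position, extent, class_probs)
            self.landmarks.append(lm)
            return lm

    # Usage: two noisy detections of (probably) the same chair.
    world = ObjectMap()
    world.integrate_detection([1.0, 0.2, 0.5], [0.3, 0.5, 0.3], [0.7, 0.1, 0.1, 0.1])
    lm = world.integrate_detection([1.1, 0.25, 0.5], [0.3, 0.5, 0.3], [0.6, 0.2, 0.1, 0.1])
    print(CLASSES[int(np.argmax(lm.label_probs))])    # most likely: "chair"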

ian.reid@adelaide.edu.au

Robots, Humans and Action


2018 onwards

Stephen Gould, Hongdong Li, Richard Hartley, Sareh Shirazi, Dylan Campbell, Rodrigo Santa Cruz, Cristian Rodriguez, Amir Rahimi, Zhen (Frederic) Zhang, Qinfeng ‘Javen’ Shi, Hamid Rezatofighi, Sadegh Aliakbarian

Activities are a sequence of interactions between one or more agents (human or robot) and their environment in pursuit of some goal. To fully understand activities, we need to model the dynamic relationship between an agent, its environment, objects in the environment, and other agents. This project develops models for learning to recognise and describe activities from video. We focus on representations that will allow a robot to monitor, understand, and predict the actions and intent of a human. Moreover, the project investigates ways that understanding a visual environment can be used to predict the consequences of robot actions. The ultimate goal is to develop methods that facilitate robot-human interaction and cooperation. The project links to other Centre projects: Scene Understanding for detecting objects and modelling physical environments, Vision and Language for describing tasks and action sequences, Learning for improving underlying algorithms and models, and Manipulation and Vision for robot control and human-robot cooperation.
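
A minimal sketch of the video-level recognition component described above, assuming per-frame CNN features summarised by a recurrent network; the backbone, feature size, hidden size and action vocabulary are placeholders rather than the project's actual architecture.

    # Minimal sketch of recognising an activity from a video clip: per-frame
    # CNN features are summarised by a GRU and mapped to action scores.
    # Feature dimension, hidden size and the action vocabulary are assumed.
    import torch
    import torch.nn as nn

    class ActivityRecogniser(nn.Module):
        def __init__(self, feat_dim=2048, hidden=512, num_actions=20):
            super().__init__()
            self.temporal = nn.GRU(feat_dim, hidden, batch_first=True)
            self.classifier = nn.Linear(hidden, num_actions)

        def forward(self, frame_feats):
            # frame_feats: (batch, time, feat_dim) features from a per-frame CNN.
            _, last_hidden = self.temporal(frame_feats)
            return self.classifier(last_hidden[-1])   # (batch, num_actions)

    # Usage on dummy features standing in for two 30-frame clips.
    model = ActivityRecogniser()
    scores = model(torch.randn(2, 30, 2048))
    print(scores.shape)   # torch.Size([2, 20])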

stephen.gould@anu.edu.au

Vision and Language


2018 onwards

Anton van den Hengel, Stephen Gould, Chunhua Shen, Anthony Dick, Qi Wu, Hui Li, Violetta Shevchenko

The project uses technology developed for vision-and-language purposes to develop capabilities relevant to visual robotics. This is more than just VQA for robots or Dialogue for Tasking: it includes questions of what needs to be learned, stored, and reasoned over for a robot to be able to carry out a general task specified by a human through natural language. Recent progress in Visual Question Answering has allowed the development of methods which are capable of learning to respond to unforeseen questions about unforeseen images. This is particularly interesting because it requires developing a method which is not designed for a single predefined task (such as segmenting cows), but rather aims to respond in real time to unforeseen input. This seems a good simile for the broader goal of the Centre in moving from controlled to uncontrolled environments. The project will develop these technologies towards solving a subset of key problems in visual robotics, primarily centred around the goal of natural-language robot tasking.
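
To make the vision-and-language setting concrete, here is a toy VQA-style sketch that fuses an image feature with an encoded question (or task instruction) and scores a fixed answer vocabulary; the dimensions, the elementwise-product fusion and the answer set are illustrative assumptions only.

    # Toy VQA-style model: fuse image features with an encoded question and
    # score a fixed answer vocabulary. All sizes and the elementwise-product
    # fusion are illustrative assumptions, not the project's model.
    import torch
    import torch.nn as nn

    class ToyVQA(nn.Module):
        def __init__(self, vocab_size=10000, img_dim=2048, embed=300,
                     hidden=512, num_answers=1000):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed)
            self.question_enc = nn.LSTM(embed, hidden, batch_first=True)
            self.img_proj = nn.Linear(img_dim, hidden)
            self.answer_head = nn.Linear(hidden, num_answers)

        def forward(self, image_feats, question_tokens):
            # image_feats: (batch, img_dim); question_tokens: (batch, length) word ids.
            _, (h, _) = self.question_enc(self.embed(question_tokens))
            q = h[-1]                                   # (batch, hidden)
            v = torch.relu(self.img_proj(image_feats))  # (batch, hidden)
            return self.answer_head(q * v)              # elementwise-product fusion

    model = ToyVQA()
    logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 12)))
    print(logits.shape)   # torch.Size([2, 1000])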

anton.vandenhengel@adelaide.edu.au

Previous project: SR1: Understanding Human and Robot Actions and Interactions


Ongoing

Basura Fernando, Gordon Wyeth, Fahimeh Rezazadegan, Rodrigo Santa Cruz, Stephen Gould, Anoop Cherian, Sareh Shirazi, Bohan Zhuang

Robots should not view the world as a series of static images – they need to understand movement and dynamics. Furthermore, arguably the most crucial dynamic content in any scene is the movement of the human (and other robotic) elements in the scene. This project addresses the question of understanding human and robot actions and interactions, primarily from video. The project considers ways in which videos or short video segments can be represented usefully for robots, ways in which a robot can monitor, understand and predict the actions and interactions of a human, and ways that the video feed can be used to predict the consequences of robot actions. The project investigates learning robotic tasks from observation and dynamic scene understanding for collaborative/cooperative tasks.

basura.fernando@anu.edu.au

Previous project: SR3: Scene Understanding for Robotic Vision


Ongoing

Niko Sünderhauf, Ian Reid, Tom Drummond, Michael Milford, Trung Pham, Yasir Latif, Feras Dayoub, Saroj Weerasekera, Mehdi Hosseinzadeh, Ming Cai, Kejie ‘Nic’ Li, Huangying Zhan, Lachlan Nicholson, William Hooper, Natalie Jablonsky

For many robotic applications we need models of the environment that enable joint reasoning about geometric concepts, semantic concepts and "affordances" (i.e. action possibilities). This project aims to develop algorithms and representations to acquire and reason about the uncertain and dynamic environments in which robots must operate. Among other objectives, the project will provide the semantic representations for ACRV SLAM.

Initial work in this project aims to develop maps of the environment that are labelled semantically with a "standard" set of useful class labels. Work in VL1 is showing how such labels can be generated for single views, but here the aim is to ensure that such labels are consistent (and accurate) within the 3D structure of the scene. We also aim to leverage prior information from scenes that can be learnt via CNNs, and will investigate how information from thousands or millions of exemplars can be used to improve scene structure without imposing hard constraints such as Manhattan-world models.

Subsequent work aims to develop representations that enable a scene to be decomposed into its constituent parts and thereby used for planning robotic navigation or acquisition/manipulation. Representation of uncertainty is a key element here; it is well understood in the context of geometry, but it remains a research question how to determine, represent and use the uncertainty resulting from inference over semantic entities. SR3 will draw on advances in VL1 to bridge this gap. Later work aims to go beyond simple type labels to a deeper and more powerful set of labels such as the affordances of objects.
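
As a hedged illustration of fusing per-view labels into a 3D map while tracking semantic uncertainty, the sketch below keeps a class distribution for each map point, updates it with each view's CNN probabilities, and exposes the remaining uncertainty as entropy; the recursive product fusion rule is a common baseline assumed here, not the project's method.

    # Sketch of multi-view semantic fusion: each 3D map point keeps a class
    # distribution updated with per-view CNN probabilities, and an entropy
    # score exposes the remaining semantic uncertainty. The Bayesian product
    # rule below is an assumed baseline.
    import numpy as np

    def fuse_view(point_probs, view_probs, eps=1e-8):
        # point_probs, view_probs: (num_points, num_classes) arrays.
        fused = point_probs * (view_probs + eps)
        return fused / fused.sum(axis=1, keepdims=True)

    def semantic_entropy(point_probs, eps=1e-8):
        # High entropy = the map is still unsure what this point is.
        return -(point_probs * np.log(point_probs + eps)).sum(axis=1)

    # Usage: 5 map points, 4 classes, uniform prior, two noisy views.
    probs = np.full((5, 4), 0.25)
    for _ in range(2):
        view = np.random.dirichlet(np.ones(4), size=5)   # stand-in CNN outputs
        probs = fuse_view(probs, view)
    print(semantic_entropy(probs))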

niko.suenderhauf@qut.edu.au

Previous project: SR4: Joint Vision & Language Reps for Robots


-2017

Anton van den Hengel, Chunhua Shen, Stephen Gould, Anthony Dick, Basura Fernando, Chao Ma, Qi Wu, Peter Anderson

This project looks at building joint vision and language representations for describing scenes (captioning) and answering queries (VQA). Going beyond natural language, the project considers image-to-sequence models where a sequence may be intended as commands for robot control. Recent progress in Visual Question Answering has allowed the development of methods which are capable of learning to respond to unforeseen questions about unforeseen images. This is particularly interesting because it requires developing a method which is not designed for a single predefined task (such as segmenting cows), but rather aims to respond in real time to unforeseen events. This seems a good simile for the broader goal of the Centre in moving from controlled to uncontrolled environments. The VQA work has the additional advantage of providing a means of incorporating sequences within deep learning. Given that so much of robotics is concerned with sequences as both inputs and outputs, this is an important capability.
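
A compact sketch of the image-to-sequence idea, in which an image feature conditions a recurrent decoder that emits a token sequence (caption words or, in principle, symbolic robot commands); the vocabulary, dimensions and greedy decoding loop are assumptions for illustration.

    # Compact image-to-sequence sketch: an image feature initialises a GRU
    # decoder that greedily emits tokens. Vocabulary, dimensions and greedy
    # decoding are illustrative choices, not the project's model.
    import torch
    import torch.nn as nn

    class ImageToSequence(nn.Module):
        def __init__(self, vocab_size=5000, img_dim=2048, hidden=512, embed=256):
            super().__init__()
            self.init_state = nn.Linear(img_dim, hidden)
            self.embed = nn.Embedding(vocab_size, embed)
            self.decoder = nn.GRUCell(embed, hidden)
            self.output = nn.Linear(hidden, vocab_size)

        def generate(self, image_feat, start_token=1, max_len=15):
            # image_feat: (1, img_dim); greedy decoding, one token at a time.
            h = torch.tanh(self.init_state(image_feat))
            token = torch.tensor([start_token])
            sequence = []
            for _ in range(max_len):
                h = self.decoder(self.embed(token), h)
                token = self.output(h).argmax(dim=1)
                sequence.append(token.item())
            return sequence

    model = ImageToSequence()
    print(model.generate(torch.randn(1, 2048)))   # list of 15 token ids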

anton.vandenhengel@adelaide.edu.au

Previous project: SR2: Representing Human-Object Interactions


2017 (merged with SR1)

Anoop Cherian, Stephen Gould, Bohan Zhuang

This project investigates representations and models of human-object interaction and the recognition of these interactions in video. Actions are expressed as the change in physical state of objects, and new object instances may be discovered while observing the activities. In comparison to SR1, which aims at capturing holistic representations for action recognition, SR2 investigates the explicit and fine-grained details that make up an action, so that they can be used for generating fine-grained robotic control commands and for human-robot interaction. As part of SR2, we investigate problems such as fine-grained activity recognition, human pose estimation and forecasting, human-object and object-object spatial reasoning, and vision-based robotic control generation.
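
As a simple stand-in for the human-object spatial reasoning mentioned above, the sketch below turns a detected (person, object) box pair into a relational feature vector (relative offset, scale ratio, overlap) that a downstream classifier could map to an interaction label; the feature design is an assumed baseline, not SR2's representation.

    # Relational features for a detected (person, object) box pair: relative
    # offset, log scale ratio and intersection-over-union. A downstream
    # classifier could map this vector to an interaction label such as
    # "holding" or "pushing". The feature design is an assumed baseline.
    import numpy as np

    def box_centre_size(box):
        x1, y1, x2, y2 = box
        return np.array([(x1 + x2) / 2, (y1 + y2) / 2]), np.array([x2 - x1, y2 - y1])

    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
        return inter / (area(a) + area(b) - inter + 1e-8)

    def interaction_features(person_box, object_box):
        pc, ps = box_centre_size(person_box)
        oc, osz = box_centre_size(object_box)
        offset = (oc - pc) / ps                 # object offset, person-normalised
        scale = np.log((osz + 1e-8) / (ps + 1e-8))
        return np.concatenate([offset, scale, [iou(person_box, object_box)]])

    print(interaction_features([50, 40, 150, 300], [120, 150, 180, 220]))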

anoop.cherian@roboticvision.org

Australian Centre for Robotic Vision
2 George Street Brisbane, 4001
+61 7 3138 7549