Robots that see and understand


Semantic Representations

The Semantic Representations (SR) program develops models, representations and learning algorithms that will allow robots to reason about their visual percepts, describe what they see and plan actions accordingly. The representations explored in this project will enable the recognition of activities from observer and actor viewpoints, fine-grained understanding of human-object, object-object and robot-object interactions, and large-scale semantic maps of environments. The project also investigates the mapping of visual inputs to sequence outputs for achieving a given task (e.g., describing a scene textually, or providing a symbolic representation for a robotic task derived from human demonstration).

Program Leader Stephen Gould talks about the Semantic Representations program in this video.


  • Ian Reid (University of Adelaide): Deputy Director, Semantic Representations Program Leader, University of Adelaide Node Leader and Chief Investigator
  • Stephen Gould
  • Basura Fernando
  • Anoop Cherian (Mitsubishi Electric Research Labs, Cambridge): Former Research Fellow, Australian National University
  • Niko Sünderhauf
  • Anton van den Hengel
  • Tom Drummond (Monash University): Monash University Node Leader and Chief Investigator, AA1 Project Leader
  • Michael Milford (Queensland University of Technology): Chief Investigator, Robust Vision Program Leader, RV1 Project Leader
  • Chunhua Shen
  • Sareh Shirazi
  • Trung Pham
  • Yasir Latif
  • Chao Ma
  • Rodrigo Santa Cruz
  • Saroj Weerasekera
  • Peter Anderson


SR1: Video Representations


Basura Fernando, Stephen Gould, Anoop Cherian, Sareh Shirazi, Rodrigo Santa Cruz

Robots should not view the world as a series of static images; they need to understand movement and dynamics. However, how to represent videos or short video segments in a way that is useful for robots is poorly understood. This project addresses the question of learning video representations.
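One simple family of video representations aggregates per-frame features while preserving their temporal order. The sketch below is an illustrative numpy-only baseline, not the project's actual method: it rank-pools a video by fitting a linear function whose score over running-mean frame features increases with time, and uses the fitted weights as the video descriptor.

```python
import numpy as np

def rank_pool(frames):
    """Rank-pool a video: fit weights w so that the score w @ m_t
    increases with time t, where m_t is the running mean of frame
    features; the fitted w serves as the video descriptor.
    frames: (T, D) array of per-frame feature vectors."""
    T = len(frames)
    # Running means smooth out per-frame noise before fitting.
    means = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
    times = np.arange(1, T + 1, dtype=float)
    # Least-squares fit (a cheap stand-in for a full ranking SVM).
    w, *_ = np.linalg.lstsq(means, times, rcond=None)
    return w

rng = np.random.default_rng(0)
video = rng.normal(size=(30, 8))   # 30 frames of 8-D features
descriptor = rank_pool(video)
print(descriptor.shape)            # one fixed-length vector per video
```

Because the descriptor encodes the ordering of frames, playing the same clip in reverse generally yields a different representation, which is exactly the dynamics-sensitivity a static pooling (e.g. a plain mean) lacks.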


SR2: Representing Human-Object Interactions


Anoop Cherian, Stephen Gould

Human-Object Interactions investigates representations and models for recognising the interactions of humans with objects in an activity. In this setup, actions are expressed as the change in physical state of objects, and new object instances may be discovered while observing the activities. The project deals with human pose estimation and forecasting (in 2D and 3D) and representations for fine-grained activity recognition.
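To make the pose-forecasting task concrete, here is a minimal constant-velocity baseline, an illustrative sketch rather than the project's model: each joint of an observed 2D skeleton is extrapolated forward by its most recent displacement.

```python
import numpy as np

def forecast_pose(poses, horizon):
    """Forecast future 2D poses with a constant-velocity baseline.
    poses: (T, J, 2) array of J joint positions over T observed frames.
    Returns a (horizon, J, 2) array of extrapolated poses."""
    velocity = poses[-1] - poses[-2]            # per-joint displacement/frame
    steps = np.arange(1, horizon + 1)[:, None, None]
    return poses[-1] + steps * velocity

# Two observed frames of a 3-joint "skeleton" drifting right by 1 unit/frame.
observed = np.array([
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]],
])
future = forecast_pose(observed, horizon=2)
print(future[1])   # joint positions two frames ahead
```

Learned forecasters are judged by how much they beat exactly this kind of trivial extrapolation, which is why it is a standard point of comparison.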


SR3: Scene Understanding for Robotic Vision


Niko Sünderhauf, Ian Reid, Tom Drummond, Michael Milford, Trung Pham, Yasir Latif, Saroj Weerasekera

For many robotic applications we need models of the environment that enable joint reasoning about geometric concepts, semantic concepts and “affordances” (i.e. action possibilities). This project aims to develop algorithms and representations to acquire and reason about the uncertain and dynamic environments in which robots must operate. Among other objectives, the project will provide the semantic representations for ACRV SLAM.

Initial work in this project aims to develop maps of the environment that are labelled semantically with a “standard” set of useful class labels. Work in VL1 is showing how such labels can be generated for single views; here the aim is to ensure that such labels are consistent (and accurate) within the 3D structure of the scene. We also aim to leverage prior information from scenes that can be learnt via CNNs, and will investigate how information from thousands or millions of exemplars can be used to improve scene structure without imposing hard constraints such as Manhattan-world models.

Subsequent work aims to develop representations that enable a scene to be decomposed into its constituent parts and thereby used for planning robotic navigation or acquisition/manipulation. Representation of uncertainty is a key element here: it is well understood in the context of geometry, but it remains a research question how to determine, represent and use uncertainty resulting from inference over semantic entities. SR3 will draw on advances in VL1 to bridge this gap. Later work aims to go beyond simple type labels to a deeper and more powerful set of labels, such as the affordances of objects.
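A common baseline for keeping single-view semantic labels consistent within a 3D map, shown here only as an illustration and not necessarily the project's approach, is a per-element Bayesian update: each map element (voxel, surfel, point) accumulates the class probabilities predicted from every view that observes it.

```python
import numpy as np

def fuse_labels(per_view_probs):
    """Fuse independent per-view class probabilities for one map element
    with a naive Bayes update: multiply per-view likelihoods in log
    space (for numerical stability) and renormalise.
    per_view_probs: (V, C) array, each row a distribution over C classes."""
    log_post = np.sum(np.log(per_view_probs), axis=0)
    log_post -= log_post.max()        # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Three noisy single-view predictions over {wall, chair, table}.
views = np.array([
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.4, 0.4, 0.2],
])
fused = fuse_labels(views)
print(fused)   # the "wall" hypothesis sharpens as evidence accumulates
```

The fused distribution is more confident than any single view, which is the basic payoff of enforcing label consistency across the 3D structure; representing and propagating the remaining uncertainty over these semantic entities is the open question the project targets.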


SR4: Joint Vision & Language Reps for Robots


Anton van den Hengel, Chunhua Shen, Stephen Gould, Basura Fernando, Chao Ma, Peter Anderson

This project aims to build joint vision and language representations for describing scenes (captioning), answering queries (VQA), and describing and defining robotic actions. Beyond natural language, the project also considers more general image-to-sequence models, where a sequence may be intended as commands for robot control. Unlike many computer vision tasks, which can be precisely circumscribed (e.g., image segmentation), robots in an open world must be capable of learning to respond to unforeseen questions about unforeseen images, or of developing action strategies for previously unseen circumstances.
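The image-to-sequence idea can be sketched in a few lines. The toy decoder below uses random, untrained weights and a made-up vocabulary purely to show the mechanics (encode the image into a state, then greedily emit one token at a time, feeding each token back in); it is not the project's architecture.

```python
import numpy as np

def greedy_decode(image_feat, params, max_len=5, eos=0):
    """Greedily decode a token sequence from an image feature vector.
    The image feature initialises the hidden state; at each step the
    highest-scoring token is emitted and fed back via an embedding."""
    W_init, W_h, E, W_out = params
    h = np.tanh(image_feat @ W_init)     # encode the image
    tokens = []
    for _ in range(max_len):
        logits = h @ W_out               # score every vocabulary token
        tok = int(np.argmax(logits))
        tokens.append(tok)
        if tok == eos:                   # stop at the end-of-sequence token
            break
        h = np.tanh(h @ W_h + E[tok])    # condition on the emitted token
    return tokens

rng = np.random.default_rng(1)
D, H, V = 16, 8, 6                       # feature, hidden and vocab sizes
params = (rng.normal(size=(D, H)), rng.normal(size=(H, H)),
          rng.normal(size=(V, H)), rng.normal(size=(H, V)))
img = rng.normal(size=D)
caption = greedy_decode(img, params)
print(caption)   # a short list of token ids
```

With trained weights the same loop emits words for captioning, answer tokens for VQA, or control symbols for a robot, which is why the sequence output is the natural common interface for these tasks.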


Australian Centre for Robotic Vision
2 George Street Brisbane, 4001
+61 7 3138 7549