

Robotic Vision Summer School (RVSS) 2018


The Australian Centre for Robotic Vision presents Robotic Vision Summer School (RVSS) in Kioloa, Australia.

The ability to see is the remaining technological roadblock to the ubiquitous deployment of robots into society. The Robotic Vision Summer School (RVSS) provides a premium venue for graduate students and industry researchers to learn about fundamental and advanced topics in computer vision and robotics. The summer school also provides a unique opportunity to experiment with computer vision algorithms on actual robotic hardware.

The summer school will be held at Kioloa on the NSW south coast, Australia, from Sunday, 4th February 2018 to Friday, 9th February 2018. Bus transportation (included in the registration fee) is provided to/from Canberra airport and the Kioloa campus.

The program incorporates both talks and practical sessions run by world-leading researchers in robotics and computer vision. In addition, there will be ample free time during the summer school for attendees to collaborate on research projects and explore the Kioloa surroundings. Further information on speakers and the program will be available here shortly.

The summer school is held annually at the Kioloa campus, Australian National University on the NSW south coast, Australia. This is an international summer school targeting Masters (later-year)/PhD students, academics, and industrial researchers.

Important Dates
Registration deadline: 22nd December 2017 (Extended to 12th January 2018)
Summer School: 4th to 9th February 2018



RVSS 2018 Workshop Ticket | Inclusions | $AUD (Australian Dollars)
External Attendees | workshops, accommodation, meals, and transport from Canberra | $825
Centre Affiliates (includes partner institutions, invited speakers, chief investigators, research fellows, and PhD students) | workshops, accommodation, meals, and transport from Canberra | Fee covered by centre

All attendees must register – Click and Register

Centre staff and students

All Centre-affiliated students are expected to attend RVSS in the first year of their PhD candidature. Later-year Centre students should seek their supervisors’ approval before registering. Centre-affiliated staff who are presenting must still register. Other Centre staff should seek approval from their node leader before registering.

Non-attendance and refunds

If you are unable to attend RVSS for compassionate or legal reasons you must contact the organisers at rvss@roboticvision.org. Depending on the situation and how soon before the event, partial or full refunds may be possible.

Visa requirements

Please note that in order to be granted a visa to visit Australia you must demonstrate that you meet all the legislative requirements. It is strongly recommended that you apply for your visa 6-8 weeks prior to the event date. Details of visa application requirements are available on the Australian Department of Immigration’s website. The organisers take no responsibility for visa applications or processes.


Venue and Transport

RVSS 2018 will be held at the ANU Kioloa Coastal Campus, Australian National University. The campus offers accommodation and research facilities at a unique location extending from the beach front through a diverse ecology to thick bushland. The campus is one of Australia’s premier field stations situated on the NSW south coast between the Kioloa and Bawley Point villages, easily accessible from Canberra, Sydney, and Wollongong.

Kioloa is a small hamlet, with a population of about 200, located on the South Coast of New South Wales, Australia. It is pronounced by locals as ‘Ky-ola’. Click here for more about the Kioloa campus, facilities, accommodation, and travel/tourism details.

Kioloa neighbours some of the best beaches in Australia. For example, Batemans Bay, with its majestic seascapes and lovely beaches, is a popular holiday destination on the east coast and is less than 50 km from the Summer School venue. Other beaches, such as Pretty Beach, Bawley Coast, Pebbly Beach, and Termeil Beach, are just a few kilometres from the venue. Another popular beach destination, Jervis Bay, is less than 90 km away.

Transport to Venue

Coach transportation between Canberra and Kioloa will be provided for Summer School participants. Participants are expected to make their own travel arrangements to Canberra.
Coaches will pick up and drop off at two locations in Canberra: ANU (Fulton Muir Building, Cnr North and Daley Roads) and the Canberra Domestic Airport. On Sunday, 4 February 2018, coaches will leave Canberra from mid-morning, and no later than 2:00pm. Please ensure that you arrive in Canberra well in advance of the last pick-up. On Friday, 9 February 2018, coaches will leave Kioloa at 2:00pm, so please ensure that you do not book a return flight before 6:15pm if using the coach service. Normal travel time from Canberra is approximately 2.5 hours; coach travel time is approximately 3 hours 15 minutes.

Click on the link to check out your bus allocation: RVSS Bus Allocation

Registering at Venue

Delegates will be provided with conference name tags as proof of registration. Name tags should be worn to all events.


Accommodation at the Kioloa Coastal Campus is shared in either twin or bunk rooms. A linen pack (pillow, sheets, blanket and towel) is provided. Click here to check your room allocations. Note that we are required to check out by 9am on Friday morning.


All meals are included in the registration cost. If you have any special dietary or religious requirements, please advise us of them through your registration form.


The weather on the south coast of NSW can be variable.  Please ensure that you bring adequate clothing for both hot and cool weather, a hat and sunscreen.  You may wish to attend a range of social activities, including a bushwalk and sporting activities, so please bring appropriate comfortable clothing.

Information Booklet

Click on the link to download the information pack for the summer school: information pack



Margarita Chli (ETH Zurich)

Vision-based perception for aerial robots

Abstract: This talk will describe the journey of the evolution of visual perception for aerial robots, presenting our latest results at the Vision for Robotics Lab on building blocks enabling autonomous navigation of a small aircraft and the transition from single to multi-robot collaborative estimation, touching on some of the most challenging problems we are faced with currently.

Bio: Margarita Chli is an Assistant Professor leading the Vision for Robotics Lab (V4RL) of ETH Zurich, Switzerland and the vice-director of the Institute of Robotics and Intelligent Systems at ETH Zurich. Originally from Greece and Cyprus, she received both her Bachelor and Master degrees in Information and Computing Engineering from Trinity College of the University of Cambridge, UK, and her PhD from Imperial College London. After a postdoctoral position at the ASL of ETH Zurich, she moved to the University of Edinburgh, UK, to accept the prestigious Chancellor’s Fellowship and initiate V4RL. In 2015, she relocated with her group to ETH Zurich to accept a Swiss National Science Foundation Assistant Professorship (success rate of 15.7%). Prof Chli received numerous academic scholarships from both Cyprus and Cambridge and she continues to hold an Honorary Fellowship from the University of Edinburgh. She is a World Economic Forum (WEF) Expert in Artificial Intelligence and Robotics and was a speaker at WEF 2017 in Davos, and she featured in Robohub’s 2016 list of 25 women in Robotics you need to know about. Her research interests lie in developing vision-based perception for robots, such as small aircraft, and she leads teams in multiple national and international projects, such as the European Commission projects sFly, myCopter, SHERPA, and AEROWORKS. Prof Chli participated in the first vision-based autonomous flight of a small helicopter, and her work has received international acclaim from the community, recently featuring in Reuters.

Vincent Lepetit (University of Bordeaux)

3D Registration with Deep Learning

Abstract: The first part of my talk will describe a novel method for 3D object detection and pose estimation from color images only. We introduce a “holistic” approach that relies on a representation of a 3D pose suitable to Deep Networks and on a feedback loop. This approach, like many previous ones, is however not sufficient for handling objects with an axis of rotational symmetry, as the pose of these objects is in fact ambiguous. We show how to relax this ambiguity with a combination of classification and regression. The second part will describe an approach bridging the gap between learning-based approaches and geometric approaches, for accurate and robust camera pose estimation in urban environments from single images and simple 2D maps.

Bio: Dr. Vincent Lepetit is a Full Professor at the LaBRI, University of Bordeaux, and an associate member of the Inria Manao team. He also supervises a research group in Computer Vision for Augmented Reality at the Institute for Computer Graphics and Vision, TU Graz.

He received the PhD degree in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. He became a Professor at TU Graz in February 2014, and at University of Bordeaux in January 2017. His research interests include computer vision and machine learning, and their application to 3D hand pose estimation, feature point detection and description, and 3D registration from images. In particular, he introduced with his colleagues methods such as Ferns, BRIEF, LINE-MOD, and DeepPrior for feature point matching and 3D object recognition.

He often serves as program committee member and area chair of major vision conferences (CVPR, ICCV, ECCV, ACCV, BMVC). He is an editor for the International Journal of Computer Vision (IJCV) and the Computer Vision and Image Understanding (CVIU) journal.

Yarin Gal (University of Oxford)

Bayesian Deep Learning

Abstract: Bayesian models are rooted in Bayesian statistics and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. These two fields are perceived as fairly antipodal to each other in their respective communities. It is perhaps astonishing then that most modern deep learning models can be cast as performing approximate inference in a Bayesian setting. The implications of this are profound: we can use the rich Bayesian statistics literature with deep learning models, explain away many of the curiosities with this technique, combine results from deep learning into Bayesian modeling, and much more. In this talk I will review a new theory linking Bayesian modeling and deep learning and demonstrate the practical impact of the framework with a range of real-world applications. I will also explore open problems for future research—problems that stand at the forefront of this new and exciting field.

Bio: Yarin Gal obtained his PhD from the Machine Learning group at the University of Cambridge, and was a Research Fellow at St Catherine’s College, Cambridge. He is currently Associate Professor of Machine Learning in the Computer Science department at the University of Oxford. He is also a Tutorial Fellow in Computer Science at Christ Church, Oxford, a Visiting Researcher at the University of Cambridge, and a Turing Fellow at the Alan Turing Institute, the UK’s national institute for data science.


Andrea Cherubini (Université de Montpellier)

Perception to Inter-Action

Abstract:  Traditionally, heterogeneous sensor data was fed to fusion algorithms (e.g., Kalman or Bayesian-based), so as to provide state estimation for modeling the environment. However, since robot sensors generally measure different physical phenomena, it is preferable to use them directly in the low-level servo controller rather than to apply them to multi-sensory fusion or to design complex state machines. This idea, originally proposed in the hybrid position-force control paradigm, when extended to multiple sensors brings new challenges to the control design; challenges related to the task representation and to the sensor characteristics (synchronization, hybrid control, task compatibility, etc.). The rationale behind our work has precisely been to use sensor-based control as a means to facilitate the physical interaction between robots and humans.

In particular, we have used vision, proprioceptive force, touch and distance to address case studies, targeting four main research axes: teach-and-repeat navigation of wheeled mobile robots, collaborative industrial manipulation with safe physical interaction, force and visual control for interacting with humanoid robots, and shared robot control. Each of these axes will be presented here, before concluding with a general view of the issues at stake, and on the research projects that we plan to carry out in the upcoming years.

Bio: Andrea Cherubini has been Associate Professor at Université de Montpellier and Researcher at LIRMM IDH (Interactive Digital Humans Group) since 2011. He received an MSc in 2001 from the University of Rome « La Sapienza » and a second one in 2003 from the University of Sheffield, U.K. From 2004 to 2008, he was a PhD student, and then a postdoctoral fellow, at the Dipartimento di Informatica e Sistemistica (now DIAG), University of Rome « La Sapienza ». Then, from 2008 to 2011, he worked as a postdoctoral researcher at INRIA Rennes. With IDH, he was involved in European projects VERE and RoboHow.Cog, and in the French Project ANR ICARO.
His main research interests include sensor-based control, humanoid robotics, and physical human-robot interaction. This research is targeted by the French projects CoBot@LR and ANR SISCOB, and by the European project H2020 VERSATILE, all of which he manages as Principal Investigator at LIRMM.

Elizabeth Croft (Monash University)

Hey robot – do you see what I see?  Creating common task frameworks through visual cue

Abstract: To be confirmed

Bio: Professor Elizabeth A. Croft (B.A.Sc UBC ’88, M.A.Sc Waterloo ’92, Ph.D. Toronto ’95) is the Dean of Engineering at Monash University commencing January 2018. She is formerly a Professor of Mechanical Engineering and Senior Associate Dean for the Faculty of Applied Science at the University of British Columbia (UBC) and Director of the Collaborative Advanced Robotics and Intelligent Systems (CARIS) Laboratory. Her research investigates how robotic systems can behave, and be perceived to behave, in a safe, predictable, and helpful manner, and how people interact with, and understand, robotic systems with applications ranging from manufacturing to healthcare and assistive technology. She held the NSERC Chair for Women in Science and Engineering (BC/Yukon) from 2010-2015 and the Marshall Bauder Professorship in Engineering Economics, Business and Management Training from 2015-2017. Her recognitions include a Peter Wall Early Career Scholar award, an NSERC Accelerator award, and WXN’s top 100 most powerful women in Canada. She is a Fellow of the Canadian Academy of Engineers, Engineers Canada, and the American Society of Mechanical Engineers.


Chunhua Shen (University of Adelaide)

Title: Deep Learning for Dense Per-Pixel Prediction and Vision-to-Language Problems

Abstract: Dense per-pixel prediction provides an estimate for each pixel given an image, offering much richer information than conventional sparse prediction models. Thus the Computer Vision community has been increasingly shifting its research focus to per-pixel prediction. In the first part of my talk, I will introduce my team’s recent work on deep structured methods for per-pixel prediction that combine deep learning and graphical models such as conditional random fields. I show how to improve depth estimation from single images and semantic segmentation by using contextual information within deep structured learning.

Recent advances in computer vision and natural language processing (NLP) have led to interesting new applications. Two popular ones are automatically generating natural captions for images/video and answering questions relevant to a given image (i.e., visual question answering or VQA). In the second part of my talk, I will describe several recent works from my group that take advantage of state-of-the-art computer vision and NLP techniques to produce promising results on both image captioning and VQA.

Bio: Chunhua is a Professor of Computer Science at University of Adelaide, leading the Machine Learning Group. He held an ARC Future Fellowship from 2012 to 2016. His research and teaching have focused on Statistical Machine Learning and Computer Vision. These days his team focuses its effort on Deep Learning. In particular, with tools from deep learning, his research contributes to understanding the visual world around us by exploiting the large amounts of imaging data.

Chunhua received a PhD degree at University of Adelaide; then worked at the NICTA (National ICT Australia) computer vision program for about six years. From 2006 to 2011, he held an adjunct position at College of Engineering & Computer Science, Australian National University. He moved back to University of Adelaide in 2011.


Tom Drummond (Monash University)

Algorithms and Architecture: Past, Present and Future

Abstract: To be confirmed.

Bio: Professor Drummond is a Chief Investigator based at Monash. He completed a BA in mathematics at the University of Cambridge. In 1989 he emigrated to Australia and worked for CSIRO in Melbourne for four years before moving to Perth for his PhD in Computer Science at Curtin University. In 1998 he returned to Cambridge as a post-doctoral Research Associate and in 2002 was appointed as a University Lecturer. In 2010 he returned to Melbourne and took up a Professorship at Monash University. His research is principally in the field of real-time computer vision (i.e. processing of information from a video camera in a computer in real-time, typically at frame rate), machine learning and robust methods. These have applications in augmented reality, robotics, assistive technologies for visually impaired users as well as medical imaging.


Matthew Dunbabin (Queensland University of Technology)

Title: To be confirmed

Abstract: To be confirmed

Bio: Dr Matthew Dunbabin joined QUT as a Principal Research Fellow (Autonomous Systems) in 2013. He is known internationally for his research into field robotics, particularly environmental robots, and their application to large-scale marine habitat monitoring, marine pest (Crown-of-Thorns Starfish) control, and aquatic greenhouse gas mapping. He has wide research interests including adaptive sampling and path planning, vision-based navigation, cooperative robotics, as well as robot and sensor network interactions. Dr Dunbabin received his Bachelor of Engineering in Aerospace Engineering from the Royal Melbourne Institute of Technology and his PhD from the Queensland University of Technology. He started his professional career in 1995 as a project engineer at Roaduser Research International, and following his PhD joined the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in the Autonomous Systems Laboratory. At CSIRO he held various roles including Principal Research Scientist, project leader and the Robotics Systems and Marine Robotics team leader before moving to QUT in 2013. A strong advocate of robotic systems in civilian applications, Dr Dunbabin is involved in a number of initiatives aimed at promoting, educating and demonstrating autonomous systems to a range of interest groups nationally and internationally.

Nick Barnes (Australian National University & Data61)

Title: Low level computer vision techniques for 3D scene parsing in bionic eyes and endoscopy

Abstract: Implantable visual prosthetic devices have low dynamic range and so users may have difficulty with poorly contrasted objects. We have shown that computer vision techniques can help by ensuring the visibility of key objects in the scene. In Computer Vision this is semantic segmentation. Underlying this are techniques in visual saliency and edge detection. I’ll present some of our recent work in this area as well as results in human implanted vision and our ongoing studies with Bionic Vision Technologies.

Bio: Nick Barnes received the B.Sc. degree with honours in 1992, and a Ph.D. in computer vision for robot navigation in 1999 from the University of Melbourne. From 1992-1994 he worked as a consultant in the IT industry. In 1999 he was a visiting research fellow at the LIRA-Lab at the University of Genoa, Italy, supported by an Achiever Award from the Queens’ Trust for Young Australians. From 2000 to 2003, he was a lecturer with the Department of Computer Science and Software Engineering, The University of Melbourne. Since 2003 he has been with NICTA’s Canberra Research Laboratory, which merged to become Data61@CSIRO. He has been conducting research in the areas of computer vision, vision for driver and low vision assistance, and vision for vehicle guidance for more than 15 years. His team developed vision processing for bionic vision that was tested with three individuals implanted with a retinal prosthesis in 2012-2014. Their results showed that by using improved vision processing, implanted individuals could achieve better results on standard low vision tests, and functional vision tests. Further trials will commence during 2016. He has more than 100 peer reviewed scientific publications and is co-inventor of eight patent applications. He is currently a senior principal researcher and research group leader in computer vision for Data61 and an Adjunct Associate Professor with the Australian National University. His research interests include visual dynamic scene analysis, wearable sensing, vision for low vision assistance, computational models of biological vision, feature detection, vision for vehicle guidance and medical image analysis.

Tutorials and Workshops

Workshop: The Great Escape

The robot that will be used during the workshop.

Organizers: Feras Dayoub, Dimity Miller, Robert Lee, Lachlan Nicholson, Troy Cordie, Steven Martine

Summary:  The workshop focuses on the use of vision to build an inner representation of the world so a robot can plan and navigate in order to escape a maze. The robot uses a camera tilted towards the ground plane. Using a homography estimation of the plane, the robot has to build an occupancy grid of a network of tracks on the floor marked using colored tapes. As an added level of complexity, the floor plane might contain vertical structures.  The goal is to make the robot escape a maze. The process involves:

  • image segmentation (image processing)
  • the use of a homography to project image information to the floor plane (geometry)
  • occupancy grid of the floor plane (data structure)
  • path planning (search)

The aim is to make the students think about the role of a camera as a sensor on a mobile robot and how the robot uses visual information to build an inner representation of its workspace. The students will use familiar concepts such as image segmentation as well as unfamiliar concepts of using the camera to populate a 2D occupancy grid of the floor.
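The four steps listed above can be sketched end to end in a few lines of Python. This is only an illustrative sketch and not the workshop's actual code: the function names, the grid convention (True means a cell lies on a track and is traversable), the cell size, and the example homography are all assumptions made here for illustration. Segmentation is reduced to a per-pixel colour test, and the search is a plain breadth-first search on a 4-connected grid.

```python
from collections import deque

def segment_track(image, is_track):
    """Return (u, v) pixel coordinates whose colour passes the is_track test.
    image is a list of rows of (r, g, b) tuples; u is the column, v the row."""
    return [(u, v) for v, row in enumerate(image)
            for u, px in enumerate(row) if is_track(px)]

def apply_homography(H, u, v):
    """Map pixel (u, v) to floor-plane coordinates using a 3x3 homography H.
    Assumes the point is not mapped to infinity (w != 0)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

def build_grid(H, track_pixels, rows, cols, cell_size):
    """Mark every grid cell hit by a projected track pixel as traversable."""
    grid = [[False] * cols for _ in range(rows)]
    for u, v in track_pixels:
        x, y = apply_homography(H, u, v)
        r, c = int(y // cell_size), int(x // cell_size)
        if 0 <= r < rows and 0 <= c < cols:
            grid[r][c] = True
    return grid

def shortest_path(grid, start, goal):
    """Breadth-first search over 4-connected traversable cells.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None

# Hypothetical example: a pure scaling homography (1 pixel = 0.01 m),
# 0.1 m grid cells, and an L-shaped track seen by the camera.
H = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
pixels = [(5, 5), (15, 5), (25, 5), (25, 15), (25, 25)]
grid = build_grid(H, pixels, rows=3, cols=3, cell_size=0.1)
path = shortest_path(grid, (0, 0), (2, 2))
```

In a real setup the homography would be estimated from known correspondences between image points and floor points, and the grid would be updated as the robot moves. The sketch only shows how the pieces fit together.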

We have created a Slack workspace for all the students in the workshop to join. This workspace will allow us, the workshop organisers, to provide technical help and provide answers to the students before the school starts. This way we make sure all the required setups are done before Sunday 4th. Also, this workspace will allow the students to communicate and discuss ideas while solving the workshop exercise. You can access the workspace here.

Tutorial A: Region based CNN for Computer Vision and Robotics

Presenters: Ravi Garg, Chao Ma, Thanh-Toan Do

Summary: Deep neural networks have shown state-of-the-art performance on many computer vision problems, including image classification, object detection, semantic segmentation, visual tracking and scene classification. This tutorial will cover the basics of deep neural networks and their application to some computer vision and robotics problems. The first part will cover the basic building blocks of deep networks, the training procedure and popular network architectures. The second part will focus on a particular type of deep network called “Region based CNN” and how it can be applied to various computer vision and robotics problems such as object detection, semantic/instance level segmentation, visual object tracking, object affordances and pose estimation.

Tutorial B: Semantic SLAM – Making robots map and understand the world

Presenters: Yasir Latif, Vincent Lui, Viorela Ila, Trung Pham

Summary: The goal of Simultaneous Localization and Mapping (SLAM) is to construct a representation of an environment while localizing the robot with respect to it. However, this does not provide any understanding of the physical world that the robot is moving in. Such understanding is important for meaningful interactions with the world and also results in improved performance for the mapping and localization tasks. This tutorial will introduce the problem of SLAM at its current stage of development and then address various developments towards a semantic understanding of the world and their effects on the original SLAM formulation.

Tutorial C: Vision and Action

Presenters: Suman Bista, Valerio Ortenzi, Juxi Leitner

Summary: For an effective deployment of robotics in real-world applications, a robot must be able to perceive, detect and locate objects in its surroundings in order to inform future motion control and decision-making. Interesting examples of acting effectively on the environment are robotic manipulation of objects; visual navigation in complex dynamic environments; and an active understanding of the environment [1]. In this tutorial, we will explore most of these aspects. We begin with some fundamentals of visual servoing and grasping, and explore their applications in manipulation and visual navigation. Furthermore, we present the case of Cartman, our robot that won the Amazon Picking Challenge 2017, as an exercise to think about and discuss the use of deep learning methods for robotic vision and action. Finally, we will have some concluding discussions on possible additional relevant topics.



Robert Mahony

Chief Investigator

Vincent Lui

Research Fellow

Valerio Ortenzi

Research Fellow

Carol Taylor

ANU Node Administration Officer

Presentation Slides

On this page you will find the slides that have been presented during the summer school. They are still being updated at the moment, so some folders may not be complete. Do continue to check this page periodically for the next 2-3 weeks.

Deep Dives: Slides

Technical Sessions: Slides

Tutorials: Slides

Australian Centre for Robotic Vision
2 George Street Brisbane, 4001
+61 7 3138 7549