In recent years, deep learning has been a major force of change in most computer vision tasks. In video analysis problems, however, such as action recognition and detection, motion analysis, and tracking, shallow architectures remain surprisingly competitive. What is the reason for this conundrum? Larger datasets are part of the solution: the recently proposed Sports1M has made it possible to train large motion networks under realistic conditions. Still, the breakthrough has not yet arrived.
Assuming that the recently proposed video datasets are large enough to train deep networks for video, another likely culprit for the standstill in video analysis is the capacity of the existing deep models. More specifically, existing deep networks for video analysis might not be sophisticated enough to address the complexity of motion information. This makes sense, as videos introduce exponential complexity compared to static images. Unfortunately, state-of-the-art motion representation models are extensions of existing image representations rather than models dedicated to motion. Bold, new, motion-specific representations are likely to be needed for a breakthrough in video analysis.
Visit here for full details of the workshop