Activity recognition in video is a challenging task that requires an understanding of both appearance and motion. Dynamic images provide an efficient and effective way of encoding the motion in a sequence of video frames into a single compact image that is amenable to classification by state-of-the-art convolutional neural networks. From a technical perspective, dynamic images can be viewed as an approximation to the rank-pooling operator, which represents a video by the parameters of a learned mapping from frame features to their temporal ordering and therefore captures the dynamics of activities in the scene.
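The approximation itself is not spelled out here. As a minimal sketch, assuming the closed-form approximate rank pooling coefficients commonly used for dynamic images, alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) with H_t the t-th harmonic number (H_0 = 0), a dynamic image reduces to a fixed weighted sum of the T frames, so no per-video optimisation is needed. The function names (`approx_rank_pooling_coeffs`, `dynamic_image`, `to_uint8`) are hypothetical, pooling is applied directly to raw pixels, and the min-max rescaling to 8-bit is only an assumed way of handing the result to an image CNN.

```python
import numpy as np


def approx_rank_pooling_coeffs(T: int) -> np.ndarray:
    """Assumed approximate rank pooling weights alpha_t for t = 1..T.

    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}),
    where H_k = sum_{i<=k} 1/i is the k-th harmonic number and H_0 = 0.
    """
    if T < 1:
        raise ValueError("need at least one frame")
    t = np.arange(1, T + 1)
    # harmonic[k] holds H_k for k = 0..T
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / t)))
    return 2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])


def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, C) clip into one (H, W, C) image.

    Computes sum_t alpha_t * frames[t] directly on raw pixel values.
    """
    frames = np.asarray(frames, dtype=np.float64)
    alpha = approx_rank_pooling_coeffs(frames.shape[0])
    # Contract the time axis: (T,) x (T, H, W, C) -> (H, W, C)
    return np.tensordot(alpha, frames, axes=(0, 0))


def to_uint8(d: np.ndarray) -> np.ndarray:
    """Min-max rescale to [0, 255] (assumed preprocessing for a CNN)."""
    lo, hi = d.min(), d.max()
    if hi == lo:
        # A constant dynamic image carries no motion; return all zeros
        return np.zeros(d.shape, dtype=np.uint8)
    return np.round(255.0 * (d - lo) / (hi - lo)).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((10, 32, 32, 3))  # synthetic 10-frame RGB clip
    di = to_uint8(dynamic_image(clip))
    print(di.shape, di.dtype)  # (32, 32, 3) uint8
```

Because the weights sum to zero, a clip with no change over time yields an all-zero dynamic image, so only temporal variation survives the pooling; for example, with T = 3 the weights are (-4/3, 2/3, 2/3).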