Reconstructions from Ground Truth Affine Transforms

Left column: original video sequence from the UCF-101 dataset (192x192 crops taken at random locations away from the border). Right column: reconstructions using the estimated affine transforms described in sec. 2.1. Note that these reconstructions do not use the affine transforms predicted by our model; the transforms are instead estimated from the ground truth frames.
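As an illustration of what "estimated from the ground truth frames" can mean for a single patch, the sketch below fits a 2x3 affine transform by ordinary least squares to point correspondences. It is a minimal sketch, not the procedure of sec. 2.1: the function name `fit_affine` is hypothetical, and it assumes the correspondences (pixel positions in a patch and where they land in the other frame) are already available, e.g. from some optical-flow estimate.

```python
import numpy as np

def fit_affine(src, dst):
    """Hypothetical helper: least-squares 2x3 affine A with dst ~= A @ [x, y, 1]^T.

    src, dst: (N, 2) arrays of corresponding (x, y) positions. Obtaining
    these correspondences (e.g. from optical flow) is assumed, not shown.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous design matrix: one row [x, y, 1] per correspondence.
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve X @ P ~= dst for P (3x2); the affine matrix is P transposed.
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return P.T  # shape (2, 3): [[a, b, tx], [c, d, ty]]
```

With six unknowns per patch, any patch with at least three non-collinear correspondences determines the transform; a full patch of pixels over-determines it, which makes the fit robust to small per-pixel errors.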

As can be seen, there is barely any noticeable difference between the ground truth and the reconstructed video sequences. This qualitatively validates two of our assumptions: (i) a frame can be decomposed into patches, and (ii) the motion of each patch is well modeled by an affine transform.
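The two assumptions translate directly into a reconstruction loop: split the next frame into patches and predict each one by warping the previous ground truth frame with that patch's affine transform. The sketch below is a simplified, hypothetical version (`warp_patch` and `reconstruct` are illustrative names): it assumes non-overlapping square tiles, frame sides divisible by the tile size, transforms that map next-frame coordinates back to previous-frame coordinates, and nearest-neighbour sampling with border clamping. The procedure in sec. 2.1 may differ in these choices.

```python
import numpy as np

def warp_patch(prev, A, y0, x0, size):
    """Predict the size x size patch at (y0, x0) of the next frame.

    Assumes A (2x3) maps next-frame coords to previous-frame coords:
    (x_prev, y_prev) = A @ [x, y, 1]. Nearest-neighbour sampling,
    out-of-range positions clamped to the image border.
    """
    ys, xs = np.mgrid[y0:y0 + size, x0:x0 + size]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src_x, src_y = A @ coords
    h, w = prev.shape
    sx = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    sy = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    return prev[sy, sx].reshape(size, size)

def reconstruct(prev, affines, size):
    """Rebuild the next frame tile by tile from the previous frame.

    affines[i][j] is the 2x3 transform for the tile in tile-row i,
    tile-column j. Assumes both frame sides are divisible by `size`.
    """
    h, w = prev.shape
    assert h % size == 0 and w % size == 0, "frame must tile exactly"
    out = np.zeros_like(prev)
    for i, y0 in enumerate(range(0, h, size)):
        for j, x0 in enumerate(range(0, w, size)):
            out[y0:y0 + size, x0:x0 + size] = warp_patch(
                prev, affines[i][j], y0, x0, size)
    return out
```

For example, giving every tile the transform `[[1, 0, -1], [0, 1, 0]]` reproduces a one-pixel shift of the whole frame to the right, with the leftmost column repeated because of border clamping. Overlapping patches with blended outputs would avoid visible seams at tile boundaries, at the cost of a slightly more involved loop.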

[Figure: left column, frames from the sequence; right column, the sequence re-created by applying the estimated affine transformations to the previous ground truth frame.]