Deep video representation learning: a survey
Document Type
Article
Publication Date
6-1-2024
Abstract
This paper reviews representation learning for videos. We classify recent spatio-temporal feature learning methods for sequential visual data and compare their pros and cons for general video analysis. Building effective features for videos is a fundamental problem in computer vision tasks involving video analysis and understanding. Existing features can be broadly categorized into spatial and temporal features. We discuss their effectiveness under variations in illumination, occlusion, view, and background. Finally, we outline the remaining challenges in existing deep video representation learning studies.
Publication Source (Journal or Book title)
Multimedia Tools and Applications
First Page
59195
Last Page
59225
Recommended Citation
Ravanbakhsh, E., Liang, Y., Ramanujam, J., & Li, X. (2024). Deep video representation learning: a survey. Multimedia Tools and Applications, 83(20), 59195-59225. https://doi.org/10.1007/s11042-023-17815-3