Aliaksandr Siarohin

Welcome! I am a Staff Research Scientist leading video generation research at Snap. Previously, I was a Ph.D. student at the University of Trento under the supervision of Prof. Nicu Sebe. During my Ph.D., I created the First Order Motion Model, a video animation technology that launched several startups and was for several years one of the most popular models on Runway. The First Order Motion Model was also one of the first AI technologies used for commercial media creation. My team and I developed SnapVideo, a family of foundation video generation models that combine unprecedented speed and low cost with quality on par with leading models such as Veo and Sora. SnapVideo now powers all video generation applications at Snap. I have published more than 30 papers at top computer vision and machine learning conferences.

Contact: aliaksandr [dot] siarohin [at] gmail [dot] com

[Google Scholar] [GitHub] [CV]

Publications:


DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

Ziyi Wu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ashkan Mirzaei, Igor Gilitschenski, Sergey Tulyakov, Aliaksandr Siarohin
NeurIPS 2025

[Paper] [Website]

4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Chaoyang Wang, Ashkan Mirzaei, Vidit Goel, Willi Menapace, Aliaksandr Siarohin, Avalon Vinella, Michael Vasilkovsky, Ivan Skorokhodov, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Peter Wonka
NeurIPS 2025

[Paper] [Website]

Improving Progressive Generation with Decomposable Flow Matching

Moayed Haji-Ali, Willi Menapace, Ivan Skorokhodov, Arpit Sahni, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin
NeurIPS 2025

[Paper] [Website]

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin
ICML 2025

[Paper] [Code]

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian, Aliaksandr Siarohin, Willi Menapace, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov
CVPR 2025

[Paper] [Website]

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov
ICCV 2025

[Paper] [Website]

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee
CVPR 2025

[Paper] [Website]

VideoAlchemy: Open-Set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov
CVPR 2025

[Paper] [Website]

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Ziyi Wu, Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Yuwei Fang, Varnith Chordia, Igor Gilitschenski, Sergey Tulyakov
CVPR 2025

[Paper] [Website]

SF-V: Single Forward Video Generation Model

Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren
NeurIPS 2024

[Paper] [Website]

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee
NeurIPS 2024

[Paper] [Website]

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov
CVPR 2024

[Paper] [Website]

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov
CVPR 2024

[Paper] [Website]

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov
CVPR 2024

[Paper] [Website]

SPAD: Spatially Aware Multi-View Diffusers

Yash Kant, Aliaksandr Siarohin, Ziyi Wu, Michael Vasilkovsky, Guocheng Qian, Jian Ren, Riza Alp Guler, Bernard Ghanem, Sergey Tulyakov, Igor Gilitschenski
CVPR 2024

[Paper] [Website]

Promptable Game Models: Text-guided Game Simulation via Masked Diffusion Models

Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci
TOG 2024

[Paper] [Website]

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
ICLR 2024

[Paper] [Website]

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov
ICLR 2024

[Paper] [Website]

Text-Guided Synthesis of Eulerian Cinemagraphs

Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu
SIGGRAPH Asia 2023

[Paper] [Website]

Autodecoding Latent 3D Diffusion Models

Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov
NeurIPS 2023

[Paper] [Website]

Unsupervised Volumetric Animation

Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov
CVPR 2023

[Paper] [Website]

Playable Environments: Video Manipulation in Space and Time

Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci
CVPR 2022

[Paper] [Website] [Code]

Motion Representations for Articulated Animation

Aliaksandr Siarohin, Oliver Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov
CVPR 2021

[Paper] [Website] [Code]

Playable Video Generation

Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci
CVPR 2021

[Paper] [Website] [Code]

Motion-supervised Co-Part Segmentation

Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe
ICPR 2021

[Paper] [Code]

TriGAN: Image-to-Image Translation for Multi-Source Domain Adaptation

Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe, Elisa Ricci
Machine Vision and Applications 2021

[Paper]

First Order Motion Model for Image Animation

Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe
NeurIPS 2019

[Paper] [Code]

Attention-based Fusion for Multi-source Human Image Generation

Stéphane Lathuilière, Enver Sangineto, Aliaksandr Siarohin, Nicu Sebe
WACV 2020

[Paper]

DwNet: Dense Warp-Based Network for Pose-Guided Human Video Generation

Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, Leonid Sigal
BMVC 2019

[Paper] [Code]

Increasing Image Memorability with Neural Style Transfer

Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe
TOMM 2019 (Best Paper Award)

[Paper] [Code]

Appearance and Pose-Conditioned Human Image Generation using Deformable GANs

Aliaksandr Siarohin, Stéphane Lathuilière, Enver Sangineto, Nicu Sebe
PAMI 2019

[Paper] [Code]

Unsupervised Domain Adaptation using Feature-Whitening and Consensus Loss

Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, Elisa Ricci
CVPR 2019

[Paper] [Code]

Animating Arbitrary Objects via Deep Motion Transfer

Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe
CVPR 2019

[Paper] [Code]

Whitening and Coloring Batch Transform for GANs

Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe
ICLR 2019

[Paper] [Code]

Enhancing Perceptual Attributes with Bayesian Style Generation

Aliaksandr Siarohin, Gloria Zen, Nicu Sebe, Elisa Ricci
ACCV 2018

[Paper] [Code]

Deformable GANs for Pose-Based Human Image Generation

Aliaksandr Siarohin, Enver Sangineto, Stéphane Lathuilière, Nicu Sebe
CVPR 2018

[Paper] [Code]

How to Make an Image More Memorable? A Deep Style Transfer Approach

Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe
ICMR 2017

[Paper] [Code]