ControlMM: Controllable Masked Motion Generation

Ekkasit Pinyoanuntapong¹, Muhammad Usama Saleem¹, Korrawe Karunratanakul², Pu Wang¹, Hongfei Xue¹, Chen Chen³, Chuan Guo⁴, Junli Cao⁴, Jian Ren⁴, Sergey Tulyakov⁴

¹University of North Carolina at Charlotte, ²ETH Zurich, ³University of Central Florida, ⁴Snap Inc.


TL;DR

Masked Motion Models have shown superior quality and speed compared to Motion Diffusion Models. However, current SOTA methods for motion control primarily rely on Motion Diffusion Models. ControlMM is the first to introduce controllability to Masked Motion Models through two novel components:

Motion Control Model (ControlNet-like for Masked Model)
Logits/Codebook Editing (Inference time guidance for Masked Model)

Moreover, ControlMM achieves SOTA results in both generation quality and control precision, while supporting real-time generation and a wide range of applications.
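To make the idea of inference-time guidance on logits more concrete, here is a minimal sketch. It is not ControlMM's implementation: the toy one-dimensional codebook, the squared-error control loss, the step size, and the function names (`softmax`, `expected_value`, `edit_logits`) are all assumptions made for illustration. The sketch treats the softmax-weighted average of codebook entries as a differentiable stand-in for the decoded value and nudges the logits by gradient descent so that this value moves toward a control target.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_value(logits, codebook):
    """Softmax-weighted average of codebook entries (a soft 'decode')."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, codebook))

def edit_logits(logits, codebook, target, lr=0.5, steps=200):
    """Gradient descent on the logits to reduce (expected_value - target)^2.

    Hypothetical helper: a toy stand-in for inference-time logits guidance.
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        mean = sum(p * c for p, c in zip(probs, codebook))
        err = mean - target
        # d(err^2)/d(logit_k) = 2 * err * p_k * (c_k - mean)
        logits = [
            l - lr * 2.0 * err * p * (c - mean)
            for l, p, c in zip(logits, probs, codebook)
        ]
    return logits

# Toy setup (assumed): four scalar codebook entries and uniform initial logits.
codebook = [0.0, 1.0, 2.0, 3.0]
logits = [0.0, 0.0, 0.0, 0.0]
target = 2.5

before = expected_value(logits, codebook)
edited = edit_logits(logits, codebook, target)
after = expected_value(edited, codebook)
print(f"before={before:.3f} after={after:.3f} target={target}")
```

In a real masked motion model the codebook entries would be learned motion-token embeddings and the control loss would be measured on decoded joint positions; the scalar version above only shows the mechanics of editing logits rather than model weights.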


Comparison of FID score, spatial control error, and motion generation speed (circle size) between our accurate and fast models and state-of-the-art models. The closer a point is to the origin and the smaller its circle, the better the performance.

Method


Compared to SOTA - Multiple Joints

a person crosses their arms for chest fly

ControlMM (ours)

OmniControl

MotionLCM

a person jumps in the air once

ControlMM (ours)

OmniControl

MotionLCM

a person walks in a circle clockwise

ControlMM (ours)

OmniControl

MotionLCM

a person walks forward and waves his hands

ControlMM (ours)

OmniControl

MotionLCM

Compared to SOTA - Pelvis Only

a person walks forward and waves his hands

ControlMM (ours)

GMD

a person dances to salsa music

ControlMM (ours)

GMD

a person walks forward and come back to the same position from where we started

ControlMM (ours)

GMD

Dense Signals

the person draws a heart with hand

person walks down and up in a figure 8 pattern

A figure walks forward in a zig zag pattern

a person waves both his arms

someone is lifting something up

a person stands and waving

a man walks in a curved line with his hands at his sides

a person walks with support

a person walks

Sparse Signals

A person walks forward with their hands up in a surrender pose

person walks over and sits down in a chair.

A person jumps and kicks a football in the air with their head

A person walks forward, casually greeting others with a wave or hello

a man walks left and right

A person walks, pauses, and performs a high kick in the air.

Body Part Timeline Control

Upper Body: a person puts hands in the air.

Left Foot: a person kicks left legs.

Lower Body: a person jumps forward.

Timeline (frames): 0, 60, 120

Upper-body motion is generated for frames 0 to 120 from the prompt “a person puts hands in the air.” Lower-body motion is generated in two parts: frames 0 to 60 from the prompt “a person kicks left legs.” and frames 60 to 120 from the prompt “a person jumps forward.”
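A timeline like this can be written down as a small data structure. The sketch below is only an illustration of the schedule described above, not ControlMM's interface: the `Segment` class, the `prompts_at` helper, the `"upper"`/`"lower"` labels, and the choice of half-open frame ranges are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """One timeline entry: a body part driven by a prompt over a frame range."""
    body_part: str  # hypothetical label, e.g. "upper" or "lower"
    start: int      # first frame, inclusive
    end: int        # last frame, exclusive (assumed convention)
    prompt: str

def prompts_at(timeline, frame):
    """Return {body_part: prompt} for every segment active at `frame`."""
    return {
        s.body_part: s.prompt
        for s in timeline
        if s.start <= frame < s.end
    }

# The schedule from the example above, with prompts copied verbatim.
timeline = [
    Segment("upper", 0, 120, "a person puts hands in the air."),
    Segment("lower", 0, 60, "a person kicks left legs."),
    Segment("lower", 60, 120, "a person jumps forward."),
]

for frame in (30, 90):
    print(frame, prompts_at(timeline, frame))
```

Half-open ranges keep frame 60 unambiguous: it belongs to the second lower-body segment only.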

Upper Body: the person is bending over forward

Left Foot: shake with their left leg

Timeline (frames): 0, 60, 120

Upper-body motion is generated for frames 0 to 120 from the prompt “the person is bending over forward”. Simultaneously, lower-body motion is generated for frames 0 to 120 from the prompt “shake with their left leg”.

Obstacle Avoidance

the man walks zig zag.

the man walks forward in a straight line.