
Paper 
Data 
Code Coming soon 

To what extent does Bob’s behavior affect Alice’s behavior? We study this question in a couple’s dance, an example of full-body dyadic physical social interaction. We predict the full-body motion of a dancer, Alice (yellow), given their own past motion and the motion of their partner, Bob (blue).
Test-time autoregressive prediction in the unary (left) and dyadic (right) tasks. In the unary prediction task, we start from Alice's past motion up to time \( t_\pi \) and predict the next time step in each iteration. In the dyadic prediction task, we start from Alice and Bob's motion up to time \( t_\pi \). In each step, we predict Alice's next token from her past ground truth motion until \( t_\pi \), her predicted motion for \( t > t_\pi \), and Bob's past ground truth motion.
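The rollout described in this caption can be sketched as a short loop. This is a minimal illustration, not the authors' code: `predict_next` is a hypothetical stand-in for the trained model, tokens are plain integers, and we assume Bob's ground truth is available for every frame before the one being predicted. Passing an empty sequence for Bob gives the unary case.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption for this sketch):
# takes Alice's tokens so far and Bob's past tokens, returns Alice's next token.
NextTokenFn = Callable[[Sequence[int], Sequence[int]], int]


def rollout(
    alice_gt: Sequence[int],
    bob_gt: Sequence[int],
    t_pi: int,
    horizon: int,
    predict_next: NextTokenFn,
) -> List[int]:
    """Autoregressively extend Alice's motion for `horizon` steps after t_pi."""
    # Alice's ground truth is only used up to t_pi.
    alice = list(alice_gt[:t_pi])
    for step in range(horizon):
        t = t_pi + step
        # Bob's ground truth before frame t (assumed available at test time).
        # An empty bob_gt yields an empty slice, i.e. the unary task.
        bob_past = bob_gt[:t]
        # Alice's own predictions are fed back in for t > t_pi.
        alice.append(predict_next(alice, bob_past))
    return alice


def toy_predictor(alice: Sequence[int], bob: Sequence[int]) -> int:
    """Toy rule standing in for the transformer: mix the two latest tokens."""
    last_bob = bob[-1] if bob else 0
    return (alice[-1] + last_bob) % 8


dyadic = rollout([1, 2, 3, 4, 5], [0, 1, 0, 1, 0, 1], t_pi=3, horizon=2,
                 predict_next=toy_predictor)
unary = rollout([1, 2, 3, 4, 5], [], t_pi=3, horizon=2,
                predict_next=toy_predictor)
```

In the dyadic call the prediction changes with Bob's tokens, while the unary call depends only on Alice's own history.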
Illustration of the VQ-VAEs. We learn three separate codebooks, one for each body model parameter. An encoder, E·, maps the body parameter to the codebook, Ζ·. The decoder, D·, brings the codebook latent vectors back into body model parameter space. To obtain 3D meshes, we jointly pass the parameters through the body model function.
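The codebook lookup at the heart of a VQ-VAE can be illustrated with a nearest-neighbor search. The sketch below covers only the quantization step, under assumptions: the encoder and decoder networks are omitted, the codebook values are made up, and the dictionary keys `pose`, `orientation`, and `translation` are illustrative names for the three body model parameters.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def quantize(z: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to the latent z."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))


def lookup(index: int, codebook: Sequence[Vector]) -> List[float]:
    """Return the codebook vector that the decoder would receive."""
    return list(codebook[index])


# One separate (toy) codebook per body model parameter.
codebooks: Dict[str, List[List[float]]] = {
    "pose": [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    "orientation": [[0.0, 1.0], [1.0, 0.0]],
    "translation": [[-1.0, -1.0], [1.0, 1.0]],
}

# Pretend these latents came out of the three encoders.
latents = {
    "pose": [0.9, 1.2],
    "orientation": [0.8, 0.1],
    "translation": [-0.5, -2.0],
}

# Each frame is then represented by one discrete index per parameter.
indices = {name: quantize(latents[name], codebooks[name]) for name in codebooks}
quantized_pose = lookup(indices["pose"], codebooks["pose"])
```

The resulting indices are the discrete tokens that a sequence model can consume.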
Transformer training procedure in the dyadic case. The major part of our network is a transformer-decoder block with causal masking, such that Alice and Bob can only attend to their past motion. The inputs to our model are Alice and Bob's codebook indices for body pose, Θ, orientation, Φ, and translation, Γ. We embed the tokens into a latent space and add time, person, and parameter encodings. The final layer in our network generates probability scores over codebook indices, representing the likelihood of an index being the next motion \(s_{t_{\pi}+1}\).
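Three ingredients of this caption can be sketched in a few lines: summing the token embedding with time, person, and parameter encodings, building a causal mask, and turning final-layer scores into probabilities over codebook indices. This is a simplified sketch under assumptions: the encodings are toy vectors, the mask is the standard lower-triangular one over sequence positions (how Alice's and Bob's tokens are actually ordered is not specified here), and the attention layers themselves are left out.

```python
import math
from typing import List, Sequence


def embed(
    token_vec: Sequence[float],
    time_vec: Sequence[float],
    person_vec: Sequence[float],
    param_vec: Sequence[float],
) -> List[float]:
    """Add time, person, and parameter encodings to a token embedding."""
    return [a + b + c + d
            for a, b, c, d in zip(token_vec, time_vec, person_vec, param_vec)]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert final-layer scores into probabilities over codebook indices."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


x = embed([1.0, 2.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
mask = causal_mask(3)
probs = softmax([1.0, 2.0, 3.0])
next_index = max(range(len(probs)), key=probs.__getitem__)
```

Taking the most likely index is one simple decoding choice; sampling from `probs` is another.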
Citation 
Acknowledgements 
We thank Evonne Ng and members of the BAIR community for helpful discussions. This work was supported by BAIR/BDD sponsors, ONR MURI (N000142112801), and the DARPA MCS program. This webpage template was borrowed from some colorful folks.