We propose a new approach to human clothing modeling based on point clouds. Within this approach, we learn a deep model that can predict point clouds of various outfits, for various human poses, and for various human body shapes. Notably, outfits of various types and topologies can be handled by the same model. Using the learned model, we can infer the geometry of new outfits from as little as a single image, and perform outfit retargeting to new bodies in new poses. We complement our geometric model with appearance modeling that uses the point cloud geometry as a geometric scaffolding and employs neural point-based graphics to capture outfit appearance from videos and to re-render the captured outfits. We validate both geometric modeling and appearance modeling aspects of the proposed approach against recently proposed methods and establish the viability of point-based clothing modeling.
Our method can be viewed as consisting of three parts:
The draping network takes the latent code of a clothing outfit and a subset of vertices of an SMPL body mesh as input, and predicts the point cloud of the clothing outfit adapted to the body shape and pose. We use the recently proposed Cloud Transformer architecture to perform this mapping.
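The sketch below illustrates only the draping-network interface described above; the backbone is a stand-in MLP rather than the actual Cloud Transformer, and the code dimension and per-vertex displacement parameterization are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DrapingNetwork(nn.Module):
    """Maps an outfit latent code plus posed SMPL body vertices to an outfit point cloud."""

    def __init__(self, code_dim=512):
        super().__init__()
        # Placeholder backbone; the actual method uses a Cloud Transformer.
        self.backbone = nn.Sequential(
            nn.Linear(code_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, outfit_code, body_vertices):
        # outfit_code: (B, code_dim); body_vertices: (B, N, 3) subset of SMPL vertices
        B, N, _ = body_vertices.shape
        code = outfit_code.unsqueeze(1).expand(B, N, -1)
        # Predict clothing points as displacements from the body vertices
        offsets = self.backbone(torch.cat([code, body_vertices], dim=-1))
        return body_vertices + offsets
```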
We train the model by fitting it to the Cloth3D synthetic dataset of physically simulated clothing using the Generative Latent Optimization (GLO) approach. After fitting, each outfit in the training dataset is assigned a latent code.
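A minimal sketch of the GLO-style training loop, assuming a Cloth3D-style dataset that yields (outfit index, body vertices, ground-truth cloth points). The per-outfit codes are free parameters optimized jointly with the network; the Chamfer distance here is a stand-in for whatever point-cloud reconstruction loss is actually used.

```python
import torch

num_outfits, code_dim = 1000, 512                   # assumed sizes for illustration
codes = torch.nn.Embedding(num_outfits, code_dim)   # one learnable latent code per outfit
net = DrapingNetwork(code_dim=code_dim)             # from the sketch above
optimizer = torch.optim.Adam(list(net.parameters()) + list(codes.parameters()), lr=1e-4)

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between point clouds a: (B, N, 3) and b: (B, M, 3)
    d = torch.cdist(a, b)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def training_step(outfit_idx, body_vertices, gt_cloth_points):
    # Look up the current latent code for each outfit and drape it on the given body
    pred = net(codes(outfit_idx), body_vertices)
    loss = chamfer_distance(pred, gt_cloth_points)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```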
Given an input image, its clothing segmentation, and its SMPL mesh fit, we can find the latent code of the outfit. The code is obtained by minimizing the mismatch between the segmentation and the projection of the predicted point cloud. The draping network remains frozen during this process; only the outfit code vector is updated (we are "searching" in the latent space).
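A hedged sketch of this latent-space search: the draping network is frozen and only the code is optimized so that the projected point cloud matches the clothing mask. The `project_points` function and the silhouette loss are placeholders, not the paper's exact differentiable projection.

```python
import torch

def fit_outfit_code(net, body_vertices, cloth_mask, camera, code_dim=512, steps=500):
    net.eval()
    for p in net.parameters():
        p.requires_grad_(False)                      # draping network stays frozen

    code = torch.zeros(1, code_dim, requires_grad=True)
    opt = torch.optim.Adam([code], lr=1e-2)

    for _ in range(steps):
        points = net(code, body_vertices)            # (1, N, 3) predicted outfit points
        # Placeholder: differentiable soft-silhouette projection of the point cloud
        rendered = project_points(points, camera)
        loss = torch.nn.functional.binary_cross_entropy(rendered, cloth_mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach()
```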
We further extend our geometric model to include appearance modeling, using the neural point-based graphics approach. In more detail, we assign each outfit point a neural appearance descriptor and introduce a rendering network. We then fit the parameters of this network, as well as the appearance descriptors, to a video of a person wearing a certain outfit. The appearance fitting is performed after we fit the clothing geometry.
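A rough sketch of this appearance-fitting stage in the spirit of neural point-based graphics: each outfit point carries a learnable descriptor, a rasterizer splats the descriptors into an image-space feature map, and a rendering network turns that map into an RGB frame. `rasterize_descriptors` and `RenderingUNet` are placeholders, not the paper's exact implementation.

```python
import torch

num_points, desc_dim = 8192, 8                                   # assumed sizes
descriptors = torch.nn.Parameter(torch.randn(num_points, desc_dim) * 0.01)
renderer = RenderingUNet(in_channels=desc_dim, out_channels=3)   # any image-to-image CNN
opt = torch.optim.Adam([descriptors] + list(renderer.parameters()), lr=1e-3)

def appearance_step(outfit_points, camera, target_frame):
    # Placeholder: splat per-point descriptors into a (desc_dim, H, W) feature image
    feat = rasterize_descriptors(outfit_points, descriptors, camera)
    rgb = renderer(feat.unsqueeze(0))                             # (1, 3, H, W) rendered frame
    # Photometric loss against the corresponding video frame
    loss = torch.nn.functional.l1_loss(rgb, target_frame)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```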
Given an input video with an SMPL mesh computed for each frame, our model can retarget and repose the clothing learned from any other video. Our approach can also be used for virtual try-on on images (a simpler case than video).
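An illustrative retargeting sketch built on the assumptions above: an outfit code and appearance descriptors fitted on one video are re-draped and re-rendered on the SMPL fits of another video, frame by frame.

```python
def retarget(net, outfit_code, descriptors, renderer, target_smpl_frames, cameras):
    frames = []
    for body_vertices, camera in zip(target_smpl_frames, cameras):
        points = net(outfit_code, body_vertices)                   # drape onto the new body and pose
        feat = rasterize_descriptors(points, descriptors, camera)  # splat appearance features
        frames.append(renderer(feat.unsqueeze(0)))                 # render the RGB frame
    return frames
```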