
MoCapDT: Temporal Guided Motion Capture Solving with Diffusion Transfer


DOI: 10.23977/jeis.2023.080309

Author(s)

Butian Xiong 1

Affiliation(s)

1 Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China

Corresponding Author

Butian Xiong

ABSTRACT

We present an approach for reconstructing joint locations from noisy marker positions in 4D data, where "4D" refers to the 3D locations of the markers together with the time sequence. At the core of our approach, a modified diffusion model architecture transfers and denoises the raw marker information in a latent space under the guidance of temporal data from neighboring frames, and the latent representation is then decoded into the 3D skeleton space. This lets us exploit not only the temporal guidance but also the iterative denoising procedure that gives diffusion networks their power. Furthermore, we demonstrate that our method outperforms autoencoder-based deep learning models by a large margin in experiments on the CMU synthesized dataset and on real-world data provided by NC-SOFT.
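
To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of solving a single frame: encode the noisy markers into a latent, run reverse-diffusion denoising steps conditioned on the latents of temporally adjacent frames, then decode to joint positions. All module names, tensor shapes, timestep schedule, and the simplified DDIM-style update are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch of latent diffusion solving with temporal guidance.
import torch
import torch.nn as nn

MARKERS, JOINTS, LATENT, T_STEPS = 56, 24, 128, 50  # assumed sizes

encoder = nn.Linear(MARKERS * 3, LATENT)   # noisy markers -> latent
decoder = nn.Linear(LATENT, JOINTS * 3)    # latent -> joint positions
denoiser = nn.Sequential(                  # predicts the noise in the latent
    nn.Linear(LATENT * 3 + 1, 256), nn.ReLU(), nn.Linear(256, LATENT))

betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def solve_frame(markers, prev_lat, next_lat):
    """Denoise one frame; markers: (MARKERS, 3) -> joints: (JOINTS, 3)."""
    z = encoder(markers.reshape(1, -1))            # noisy latent, (1, LATENT)
    for t in reversed(range(T_STEPS)):
        t_emb = torch.full((1, 1), t / T_STEPS)    # scalar timestep embedding
        # condition on the latents of the neighboring frames (temporal guidance)
        eps = denoiser(torch.cat([z, prev_lat, next_lat, t_emb], dim=-1))
        a_bar = alphas_bar[t]
        z0 = (z - (1.0 - a_bar).sqrt() * eps) / a_bar.sqrt()  # predicted clean latent
        if t > 0:  # deterministic DDIM-style step toward timestep t-1
            a_prev = alphas_bar[t - 1]
            z = a_prev.sqrt() * z0 + (1.0 - a_prev).sqrt() * eps
        else:
            z = z0
    return decoder(z).reshape(JOINTS, 3)

# Usage with random stand-ins for one frame and its (encoded) neighbors.
frame = torch.randn(MARKERS, 3)
prev_lat = next_lat = torch.zeros(1, LATENT)
print(solve_frame(frame, prev_lat, next_lat).shape)  # torch.Size([24, 3])
```

In a trained system the encoder, decoder, and denoiser would be learned jointly on paired marker/skeleton data, and the neighbor latents would come from encoding adjacent frames rather than the zero placeholders used here.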

KEYWORDS

MoCapDT, 3D space, NC-SOFT

CITE THIS PAPER

Butian Xiong, MoCapDT: Temporal Guided Motion Capture Solving with Diffusion Transfer. Journal of Electronics and Information Science (2023) Vol. 8: 75-82. DOI: http://dx.doi.org/10.23977/jeis.2023.080309.




