Learning Gait Using a Neuromusculoskeletal Model and Imitation Learning

Can we learn natural-looking humanoid locomotion skills with reinforcement learning?

Project Overview

This research addresses limitations in current neuromechanical control models, which struggle with dynamic tasks, environmental adaptation, and long-term planning. While deep reinforcement learning (RL) offers advantages for high-dimensional neuromusculoskeletal systems, existing implementations often produce unnatural movements despite achieving high task rewards. In this project, I explored combining reinforcement learning with imitation learning techniques to generate more physiologically plausible and human-like movements.

Report

Figure: Unnatural walking examples, showing walking by jumping (left) and large knee extension (right).

Methodology

  • Physiologically Plausible Model: Utilized a planar humanoid lower limb model (H0918) with 9 degrees-of-freedom and 18 muscles, including realistic neural delays of 0.015 seconds.
  • Biomechanically Accurate Simulation: Leveraged the SconeGym environment, built on the Hyfydy simulator, which offers significant speed advantages (roughly 100x faster than OpenSim) while maintaining accuracy in tendon elasticity and contact dynamics.
  • Proximal Policy Optimization (PPO): Applied this state-of-the-art policy gradient method, combined with imitation learning components, to train the musculoskeletal model; a minimal training sketch follows this list.
  • Carefully Designed Reward Function: Incorporated multiple reward components, including forward progression, imitation of human motion capture data, muscle effort minimization, pain simulation for unnatural postures, and head stabilization; a sketch of this composite reward appears below.
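
As a rough illustration of how such a training pipeline can be wired together, the sketch below creates a SconeGym environment and trains it with PPO. The environment ID, the use of stable-baselines3, and all hyperparameter values are assumptions made for illustration, not the exact configuration used in this project.

```python
# Minimal training sketch. Assumptions: sconegym registers a Gym environment
# with an ID such as "sconewalk_h0918-v1" (check the installed registry for
# the exact name), and PPO is taken from stable-baselines3.
import gym
import sconegym  # noqa: F401  (importing registers the SCONE/Hyfydy environments)
from stable_baselines3 import PPO

env = gym.make("sconewalk_h0918-v1")   # planar H0918 model: 9 DoF, 18 muscles

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,   # illustrative hyperparameters, not the tuned values
    n_steps=4096,
    batch_size=256,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=5_000_000)  # walking converged at roughly 5M env steps
model.save("ppo_h0918_walk")
```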

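The reward terms listed above can be combined into a single scalar as a weighted sum. The function below is a minimal sketch of that structure; the observation field names, weights, and the α value are placeholders, not the project's actual implementation or the SconeGym API.

```python
import numpy as np

def composite_reward(obs, ref, alpha=10.0, w_forward=1.0, w_imitate=2.0,
                     w_effort=0.1, w_pain=1.0, w_head=0.5):
    """Illustrative weighted sum of the reward terms described in Methodology.

    `obs` holds simulated quantities (pelvis velocity, joint angles, muscle
    activations, joint-limit torques, head acceleration) and `ref` holds the
    corresponding motion-capture reference pose. All field names and weights
    are hypothetical placeholders.
    """
    r_forward = obs["pelvis_vel_x"]                          # forward progression
    r_imitate = np.exp(-alpha * np.sum((obs["joint_pos"] - ref["joint_pos"]) ** 2))
    r_effort = -np.sum(obs["muscle_act"] ** 2)               # muscle effort penalty
    r_pain = -np.sum(np.abs(obs["joint_limit_torque"]))      # "pain" for unnatural postures
    r_head = -abs(obs["head_acc"])                           # head stabilization
    return (w_forward * r_forward + w_imitate * r_imitate
            + w_effort * r_effort + w_pain * r_pain + w_head * r_head)
```
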
Key Experiments

The research focused on two primary tasks:

  1. Balancing: The policy learned to maintain static balance, converging after approximately 400,000 environment steps and remaining robust under both delayed and non-delayed sensory feedback.
  2. Walking: With non-delayed observations, the walking policy converged after approximately 5 million environment steps, though producing natural gait patterns remained challenging.

Results and Insights

The preliminary findings revealed several important insights:

  • Reward Design Criticality: The balance between imitation and task rewards significantly impacts movement naturalness. Different reward weightings produced distinct movement patterns, with some exhibiting unnatural behaviors like hopping or excessive knee extension.
  • Hyperparameter Sensitivity: Performance and motion quality proved highly sensitive to hyperparameter settings, particularly the α parameter of the radial basis function used for the imitation reward (illustrated in the snippet after this list).
  • Simulation Fidelity Importance: Working with physiologically accurate models required careful integration of biomechanical principles into the reward structure and simulation environment.
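
To make the α sensitivity concrete, the snippet below evaluates a radial-basis imitation kernel of the form exp(-α * ||q - q_ref||^2) for a fixed tracking error. The error magnitude is invented purely for demonstration: a small α barely distinguishes good from bad tracking, while a large α makes the reward vanish and provides almost no learning signal.

```python
import numpy as np

# Hypothetical squared pose-tracking error: 0.1 rad RMS across 9 joints.
err_sq = 9 * 0.1 ** 2

for alpha in (1.0, 10.0, 100.0):
    print(f"alpha={alpha:>5}: imitation reward = {np.exp(-alpha * err_sq):.5f}")
# alpha=  1.0: imitation reward = 0.91393   (barely penalizes tracking error)
# alpha= 10.0: imitation reward = 0.40657
# alpha=100.0: imitation reward = 0.00012   (reward collapses toward zero)
```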

Conclusion and Future Directions

The project demonstrated that simply adding imitation rewards to deep RL training does not automatically yield natural walking behaviors. Success requires careful reward function design, extensive hyperparameter optimization, and consistent implementation environments. Future work will focus on improved imitation learning strategies and on unsupervised skill discovery to capture the different movement strategies observed in humans.

This research contributes to the growing field of biomechanically accurate reinforcement learning for human movement simulation, with potential applications in rehabilitation robotics, prosthetics design, and human movement understanding.