A simple solution that uses the cross-entropy method to train an agent on the OpenAI Gym Acrobot environment. Training converges within 1 minute on an Intel i5-6500 CPU, which makes the method a practical tool for simple reinforcement learning scenarios.
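For readers new to the cross-entropy method, here is a minimal, generic sketch of the idea, not necessarily the implementation used in this repository. The function name `cross_entropy_method`, its parameters, and the toy objective `toy_score` are all assumptions made for illustration. In the Acrobot setting, the score function would instead be the total reward of an episode played with the candidate parameters.

```python
import random
import statistics

def cross_entropy_method(score_fn, dim, n_iters=30, pop_size=50,
                         elite_frac=0.2, init_std=2.0, seed=0):
    """Maximize score_fn over a dim-dimensional parameter vector.

    Hypothetical helper: samples candidates from a diagonal Gaussian,
    keeps the best-scoring fraction, and refits the Gaussian to them.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample a population of candidate parameter vectors.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(pop_size)
        ]
        # Rank candidates by score (higher is better) and keep the elites.
        population.sort(key=score_fn, reverse=True)
        elites = population[:n_elite]
        # Refit the sampling distribution to the elites, one dimension at a time.
        columns = list(zip(*elites))
        mean = [statistics.fmean(col) for col in columns]
        # A small constant keeps the search from collapsing to zero width.
        std = [statistics.pstdev(col) + 1e-3 for col in columns]

    return mean

# Toy objective (an assumption for this demo): the negative squared
# distance to a fixed target, so the maximum is at TARGET itself.
TARGET = [1.0, -2.0, 0.5]

def toy_score(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))

best = cross_entropy_method(toy_score, dim=len(TARGET))
print([round(x, 2) for x in best])
```

The method needs no gradients, only a way to score each candidate, which is why it suits small control problems where an episode return is cheap to evaluate.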
Acrobot is a 2-link pendulum with only the second joint actuated.
Initially, both links point downwards. The goal is to swing the
end-effector to a height at least the length of one link above the base.
The action is either applying +1, 0 or -1 torque on the joint between
the two pendulum links.
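To make the action description concrete, the sketch below shows one possible way to turn an observation into one of the three torques. It assumes the Gym `Acrobot-v1` conventions: a 6-value observation (cosine and sine of both joint angles, plus both angular velocities) and discrete actions 0, 1, 2 corresponding to torques -1, 0, +1. The linear policy and the names `linear_policy`, `TORQUES`, and `theta` are hypothetical and may differ from what this repository actually uses.

```python
# Discrete action index -> torque applied on the joint between the two links.
TORQUES = (-1.0, 0.0, 1.0)
OBS_DIM = 6                 # assumed Acrobot-v1 observation size
N_ACTIONS = len(TORQUES)

def linear_policy(theta, observation):
    """Return the index of the action with the highest linear score.

    theta is a flat list of N_ACTIONS * OBS_DIM weights, one row per action.
    """
    scores = []
    for action in range(N_ACTIONS):
        row = theta[action * OBS_DIM:(action + 1) * OBS_DIM]
        scores.append(sum(w * x for w, x in zip(row, observation)))
    return max(range(N_ACTIONS), key=scores.__getitem__)

# Hand-written weights for illustration only: push the second joint in the
# direction it is already moving (the last observation entry is its velocity).
theta = [0.0] * (N_ACTIONS * OBS_DIM)
theta[0 * OBS_DIM + 5] = -1.0   # action 0 (torque -1) prefers negative velocity
theta[2 * OBS_DIM + 5] = 1.0    # action 2 (torque +1) prefers positive velocity

hanging_moving_forward = [1.0, 0.0, 1.0, 0.0, 0.0, 0.5]
hanging_moving_backward = [1.0, 0.0, 1.0, 0.0, 0.0, -0.5]

action_fwd = linear_policy(theta, hanging_moving_forward)
action_bwd = linear_policy(theta, hanging_moving_backward)
print(TORQUES[action_fwd], TORQUES[action_bwd])
```

A flat weight vector like `theta` is convenient because a sampling-based search can treat the whole policy as a single point in parameter space.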
Initial behavior: purely random movement
8 steps: still lost
64 steps: starting to get a feel for it
200 steps: touches the line for the first time!
1000 steps: more confident, consistently touches the line in less time
4000 steps: more mature behavior