First reinforcement learning showcase: cross-entropy method for OpenAI Gym Acrobot

A simple solution that uses the cross-entropy method (CEM) to train an agent for the OpenAI Gym Acrobot environment. It converges within 1 minute on an Intel Core i5-6500 CPU, which makes CEM a powerful tool for simple reinforcement learning scenarios.
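The core CEM loop is short. It samples candidates from a distribution, keeps the best-scoring "elite" fraction, refits the distribution to the elites, and repeats. Below is a minimal sketch of one common variant, parameter-space CEM with a diagonal Gaussian, using only the Python standard library. It is not necessarily the variant this project uses; another common one keeps the top-percentile episodes and trains a policy network on their state-action pairs. The names `cross_entropy_method`, `score_fn` and `elite_frac`, and all default hyperparameters, are hypothetical. For Acrobot, `score_fn` would be assumed to run a few episodes with a policy parameterized by the sampled vector and return the mean reward. The toy quadratic in the usage note lets the block run without Gym.

```python
import random
import statistics
from typing import Callable, List, Tuple


def cross_entropy_method(
    score_fn: Callable[[List[float]], float],
    dim: int,
    n_iterations: int = 50,
    batch_size: int = 50,
    elite_frac: float = 0.2,
    init_std: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Hypothetical sketch: maximize score_fn over R^dim with diagonal-Gaussian CEM.

    Each iteration samples candidates from N(mean, std^2), scores them,
    keeps the top elite_frac, and refits mean/std to those elites.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(batch_size * elite_frac))
    best_params, best_score = list(mean), score_fn(mean)

    for _ in range(n_iterations):
        # Sample a batch of candidate parameter vectors from the current Gaussian.
        batch = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(batch_size)]
        # Score every candidate and sort from best to worst.
        scored = sorted(((score_fn(p), p) for p in batch), key=lambda t: t[0], reverse=True)
        elites = [p for _, p in scored[:n_elite]]
        # Remember the best candidate seen so far.
        if scored[0][0] > best_score:
            best_score, best_params = scored[0][0], scored[0][1]
        # Refit the sampling distribution to the elite set, per dimension;
        # a small floor on std keeps the search from collapsing completely.
        mean = [statistics.mean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return best_params, best_score
```

For example, `cross_entropy_method(lambda p: -sum((x - t) ** 2 for x, t in zip(p, [1.0, -1.0, 0.5])), dim=3, init_std=2.0)` should return parameters close to `[1.0, -1.0, 0.5]`. Refitting only a mean and a per-dimension standard deviation keeps the update cheap, which is part of why CEM converges quickly on small problems.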

Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base.
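The success condition can be written out explicitly. The sketch below assumes Gym's conventions: both links have length 1, θ1 = 0 means the first link hangs straight down, and θ2 is measured relative to the first link. Under those assumptions the end-effector height above the base is -cos θ1 - cos(θ1 + θ2), and the goal is reached once that height exceeds one link length. The function names `tip_height` and `goal_reached` are hypothetical.

```python
import math


def tip_height(theta1: float, theta2: float, l1: float = 1.0, l2: float = 1.0) -> float:
    """Height of the end-effector above the base (hypothetical helper).

    Assumes theta1 = 0 means link 1 hangs straight down and theta2 is the
    angle of link 2 relative to link 1; link lengths default to 1.
    """
    return -l1 * math.cos(theta1) - l2 * math.cos(theta1 + theta2)


def goal_reached(theta1: float, theta2: float, l1: float = 1.0, l2: float = 1.0) -> bool:
    """True once the tip is more than one link length (l1) above the base."""
    return tip_height(theta1, theta2, l1, l2) > l1
```

In the initial state (both angles 0) the tip sits 2 link lengths below the base. With both links pointing straight up (θ1 = π, θ2 = 0) it sits 2 link lengths above it.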

The action applies a torque of +1, 0 or -1 to the joint between the two pendulum links.
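The three actions are discrete choices, so a policy only has to output an index. This sketch assumes the mapping used by Gym's Acrobot, where action 0, 1, 2 corresponds to a torque of -1, 0, +1. It also shows the uniformly random policy that produces the untrained behavior. `TORQUES`, `action_to_torque` and `random_action` are hypothetical names.

```python
import random

# Assumption: mirrors Gym's Acrobot, where discrete actions 0, 1, 2
# apply a torque of -1, 0, +1 to the actuated (second) joint.
TORQUES = (-1.0, 0.0, 1.0)


def action_to_torque(action: int) -> float:
    """Translate a discrete action index into the applied torque."""
    if action not in range(len(TORQUES)):
        raise ValueError(f"invalid action {action}; expected 0, 1 or 2")
    return TORQUES[action]


def random_action(rng: random.Random) -> int:
    """Pick an action uniformly at random (the untrained baseline policy)."""
    return rng.randrange(len(TORQUES))
```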

Initial behavior: purely random movement.


8 steps: still at a loss.

64 steps: starting to get a feel for the task.

200 steps: touches the line successfully for the first time!

1000 steps: more confident, consistently touching the line in less time.

4000 steps: more mature behavior.



