Uncertainty Aware Imitation Learning in Multiple Contexts using Bayesian Neural Networks

Link to paper

MuJoCo Experiments

Set-up

The experiments were run with the following packages:
- OpenAI Gym, version 0.9.3
- MuJoCo mjpro131
Some experiments require changes to the OpenAI Gym source code. Do one of the following to set things up:
- Copy every file inside MuJoCo/gym_files_to_be_merged to its equivalent location in your gym installation, replacing existing files if asked.
- Use the OpenAI gym source code from here. Note that the new files needed for the experiments do not break any other part of the original code.
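
The first option can also be scripted. The following is a minimal sketch, not part of the repository: the helper names merge_tree and gym_install_dir are hypothetical, and it assumes that the layout of MuJoCo/gym_files_to_be_merged mirrors the layout of the installed gym package directory. Check that assumption against your checkout before running it, because existing files are overwritten.

```python
import os
import shutil


def merge_tree(src_root, dst_root):
    """Copy every file under src_root to the same relative path under dst_root.

    Existing files are overwritten and missing directories are created.
    Returns the sorted list of destination paths that were written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            target = os.path.join(target_dir, name)
            shutil.copy2(os.path.join(dirpath, name), target)
            written.append(target)
    return sorted(written)


def gym_install_dir():
    """Return the directory of the installed gym package.

    gym is imported lazily so that merge_tree can be used without it.
    """
    import gym
    return os.path.dirname(os.path.abspath(gym.__file__))


# Hypothetical usage, run from the repository root (assumes the folder
# mirrors the gym package directory):
#   merge_tree("MuJoCo/gym_files_to_be_merged", gym_install_dir())
```

Backing up the gym installation first, or working inside a dedicated virtual environment, makes the overwrite easy to undo.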

Setting up the demonstrator controllers

There are 20 tasks defined for both the Swimmer and HalfCheetah domains. To generate demonstrators for these tasks, use the script train.py under MuJoCo/PPO, or run ./overnight_1 to generate the HalfCheetah demonstrators and ./overnight_2 to generate the Swimmer demonstrators. The generated demonstrator controllers are stored under the MuJoCo/saved_demonstrator_models directory. After a new demonstrator controller finishes training, a fixed number of demonstration episodes are run and stored for later use under the directory MuJoCo/demonstrator_trajectories. To change how many episodes are recorded, set the variable DEMONSTRATOR_EPISODES_TO_LOG in the file Housekeeping.py. Note that the code for PPO has been taken from here.
To check the quality of the demonstrators, run the script test_demonstrator_models.py under MuJoCo/PPO. It generates plots of controller performance under the directory MuJoCo/demonstrator_controller_reward_log. Example usage: python test_demonstrator_models.py HalfCheetah.

Reproducing the results from the paper

To reproduce the top subfigure in figure 2, run the following commands from the directory sliding_block/.

python Generate_BBB_Controllers.py -c 3 -ws 2 -po True
python Validate_GP_Controller.py -c 3 -ws 2 -po True
python Validate_BBB_Controller.py -c 3 -ws 2 -po True

After these commands finish execution, open sliding_block/plots.ipynb and run the following.

  all_task_configurations = [
      {BEHAVIORAL_CONTROLLER_KEY: 'BBB', CONTEXT_CODE_KEY: '3', WINDOW_SIZE_KEY: '2', PARTIAL_OBSERVABILITY_KEY: 'True'},
      {BEHAVIORAL_CONTROLLER_KEY: 'GP', CONTEXT_CODE_KEY: '3', WINDOW_SIZE_KEY: '2', PARTIAL_OBSERVABILITY_KEY: 'True'},
      {BEHAVIORAL_CONTROLLER_KEY: 'BBB', CONTEXT_CODE_KEY: '3', WINDOW_SIZE_KEY: '5', PARTIAL_OBSERVABILITY_KEY: 'True'},
      {BEHAVIORAL_CONTROLLER_KEY: 'GP', CONTEXT_CODE_KEY: '3', WINDOW_SIZE_KEY: '5', PARTIAL_OBSERVABILITY_KEY: 'True'},
  ]

  plt.rcParams["grid.alpha"] = 1
  plt.rcParams["grid.color"] = "#cccccc"

  final_flourish(all_task_configurations, uncertainty_ylim=250., cost_ylim=1e6, predictive_error_ylim=1e6)

To reproduce the bottom subfigure in figure 2, run the following commands from the directory MuJoCo/.

python Generate_BBB_Controller.py -d Swimmer -t 4 -ws 1 -nd 1
python Generate_GP_Controller.py -d Swimmer -t 4 -ws 1 -nd 1
python Validate_BBB_Controller.py -d Swimmer -t 4 -ws 1 -nd 1

After these commands finish execution, open MuJoCo/Plots.ipynb and run the following.

  BBBvsGP_generalization(domain_name='Swimmer', number_demonstrations='1', window_size='1', behavioral_controller='4')

To reproduce the results shown in figure 4, run the following commands from directory MuJoCo/.

python proposed_mechanism.py -et active_learning_proof_of_concept -c BBB -dn HalfCheetah -dc 7.0 -dm 200
python proposed_mechanism.py -et active_learning_proof_of_concept -c BBB -dn Swimmer -dc 20.0 -dm 200
python proposed_mechanism.py -et active_learning_proof_of_concept -c NAIVE -dn HalfCheetah
python proposed_mechanism.py -et active_learning_proof_of_concept -c NAIVE -dn Swimmer

After these commands finish execution, open MuJoCo/Plots.ipynb and run the following.

  all_configurations = [
      {'controller': 'BBB' , 'detector_c': 7.0, 'detector_m': 200, 'number_demonstrations': 10},
      {'controller': 'NAIVE', 'number_demonstrations': 10},
  ]

  compare_data_efficiency(domain_name='HalfCheetah',
                          all_configurations=all_configurations,
                          simulation_iterators=[0],
                          demonstration_request_to_gauge=2)

  all_configurations = [
      {'controller': 'BBB' , 'detector_c': 20.0, 'detector_m': 200, 'number_demonstrations': 10},
      {'controller': 'NAIVE', 'number_demonstrations': 10},
  ]

  compare_data_efficiency(domain_name='Swimmer',
                          all_configurations=all_configurations,
                          simulation_iterators=[0],
                          demonstration_request_to_gauge=2)

To reproduce the results shown in figure 4, run the following commands from directory MuJoCo/.

python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn HalfCheetah -dc 1.7 -dm 200
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn HalfCheetah -dc 1.7 -dm 50
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn HalfCheetah -dc 1.5 -dm 100
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn HalfCheetah -dc 1.2 -dm 200
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn HalfCheetah -dc 1.2 -dm 50
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn Swimmer -dc 1.7 -dm 200
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn Swimmer -dc 1.7 -dm 50
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn Swimmer -dc 1.5 -dm 100
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn Swimmer -dc 1.2 -dm 200
python proposed_mechanism.py -et data_efficient_active_learning -c BBB -dn Swimmer -dc 1.2 -dm 50
python proposed_mechanism.py -et data_efficient_active_learning -c NAIVE -dn HalfCheetah
python proposed_mechanism.py -et data_efficient_active_learning -c NAIVE -dn Swimmer
python random_controller.py HalfCheetah
python random_controller.py Swimmer

After these commands finish execution, open MuJoCo/Plots.ipynb and run the following.

  all_configurations = [
      {'controller': 'BBB' , 'detector_c': 1.7, 'detector_m': 200, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.7, 'detector_m': 50, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.5, 'detector_m': 100, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.2, 'detector_m': 200, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.2, 'detector_m': 50, 'number_demonstrations': 10},
      {'controller': 'NAIVE', 'number_demonstrations': 10},
      {'controller': 'RANDOM'},
  ]

  compare_conservativeness_with_scatter_plot(domain_name='HalfCheetah',
                           all_configurations=all_configurations, ymax=49000.,
                           simulation_iterators=[0,1,2,3,4])

  all_configurations = [
      {'controller': 'BBB' , 'detector_c': 1.7, 'detector_m': 200, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.7, 'detector_m': 50, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.5, 'detector_m': 100, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.2, 'detector_m': 200, 'number_demonstrations': 10},
      {'controller': 'BBB' , 'detector_c': 1.2, 'detector_m': 50, 'number_demonstrations': 10},
      {'controller': 'NAIVE', 'number_demonstrations': 10},
      {'controller': 'RANDOM'},
  ]

  compare_conservativeness_with_scatter_plot(domain_name='Swimmer',
                           all_configurations=all_configurations, ymax=3500.,
                           simulation_iterators=[0,1,2,3,4])
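
The fourteen command lines listed for the data_efficient_active_learning experiment follow a regular pattern: five detector settings for each of two domains, a NAIVE baseline per domain, and a random controller per domain. As a convenience, they can be generated rather than typed by hand. The sketch below is not part of the repository; the names sweep_commands, DOMAINS and DETECTOR_SETTINGS are hypothetical, and only the flags that appear in the commands above are used.

```python
def sweep_commands(experiment_type, domains, detector_settings):
    """Build the proposed_mechanism.py command lines for one experiment type.

    detector_settings is a list of (detector_c, detector_m) pairs that are
    applied to the BBB controller for every domain.
    """
    commands = []
    # BBB runs: every detector setting for every domain, grouped by domain.
    for domain in domains:
        for detector_c, detector_m in detector_settings:
            commands.append(
                "python proposed_mechanism.py -et {} -c BBB -dn {} -dc {} -dm {}".format(
                    experiment_type, domain, detector_c, detector_m))
    # NAIVE baseline: one run per domain, no detector flags.
    for domain in domains:
        commands.append(
            "python proposed_mechanism.py -et {} -c NAIVE -dn {}".format(
                experiment_type, domain))
    return commands


DOMAINS = ["HalfCheetah", "Swimmer"]
DETECTOR_SETTINGS = [(1.7, 200), (1.7, 50), (1.5, 100), (1.2, 200), (1.2, 50)]

commands = sweep_commands("data_efficient_active_learning", DOMAINS, DETECTOR_SETTINGS)
# The random-controller baseline uses a separate script.
commands += ["python random_controller.py {}".format(d) for d in DOMAINS]

for command in commands:
    print(command)
```

Printing the commands first lets you compare them against the list above before executing anything. To run them, each line could be passed to subprocess.run(shlex.split(command), check=True, cwd="MuJoCo") from the repository root.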
