Hi Antymon,
Thanks very much for the repo, which lets people export a trained agent from Python to C++.
I would like to confirm the changes I have to make in order to run my own task:
- Make my environment inherit from the Env abstract class under env\env.hpp
- Modify the main ppo2.cpp, which creates an instance of an environment and passes it to PPO
I have two questions:
a. If I would like to run inference only, should I use algorithm.eval(obs) directly? It seems to use get_deterministic_action(), and I noticed there are also step(const tensorflow::Tensor& obs) and value(const tensorflow::Tensor& obs); I can't tell exactly what the differences between them are.
b. Is there a way to resume training using your implementation so that online learning can be achieved?
- Create my own computational graph and potentially make some small modifications to the core algorithm if using more involved policies (the current implementation supports only MLP policies). Graph generation is mentioned below.
- I used 'tf.train.export_meta_graph(graph=model.graph, filename='my-model.meta', clear_devices=True, clear_extraneous_savers=True, strip_default_attrs=True)' as mentioned here, but the generated model contains many redundant tensors when I visualize it. Some say it's because parts of the graph used only for training are preserved. Have you encountered this situation, and if so, how did you solve it?
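To make the first step concrete, here is a minimal sketch of a custom environment derived from an abstract base class. The Env interface shown is a simplified stand-in: its method names, and the use of std::vector<float> instead of tensorflow::Tensor, are assumptions for illustration and may not match what env\env.hpp actually declares. PointEnv is a hypothetical toy task.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the abstract class in env\env.hpp.
// The real interface in the repo may use different names and types.
class Env {
public:
    virtual ~Env() = default;
    // Starts a new episode and returns the initial observation.
    virtual std::vector<float> reset() = 0;
    // Applies `action`, writes the next observation and the done flag,
    // and returns the reward for this step.
    virtual float step(const std::vector<float>& action,
                       std::vector<float>& obs, bool& done) = 0;
    virtual std::size_t observation_size() const = 0;
    virtual std::size_t action_size() const = 0;
};

// Hypothetical toy task: move a 1-D point towards the origin.
class PointEnv : public Env {
public:
    std::vector<float> reset() override {
        position_ = 1.0f;
        steps_ = 0;
        return {position_};
    }

    float step(const std::vector<float>& action,
               std::vector<float>& obs, bool& done) override {
        position_ += action.at(0);       // the action is a displacement
        ++steps_;
        obs = {position_};
        done = steps_ >= kMaxSteps;      // fixed-length episodes
        return -position_ * position_;   // negative squared distance
    }

    std::size_t observation_size() const override { return 1; }
    std::size_t action_size() const override { return 1; }

private:
    static constexpr int kMaxSteps = 10;
    float position_ = 1.0f;
    int steps_ = 0;
};
```

With an interface like this, ppo2.cpp would only need to construct PointEnv in place of the existing environment before handing it to PPO, if the real Env class is similar.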
Thank you again!