rlify.agents package
This package contains all agents and their helper classes.
rlify.agents.drl_agent module
- class rlify.agents.drl_agent.RL_Agent(obs_space, action_space, batch_size=256, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Bases:
ABC
RL_Agent is an abstract class that defines the basic structure of an RL agent. It is used as a base class for all RL agents.
- TRAIN = 0
- EVAL = 1
- __init__(obs_space, action_space, batch_size=256, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
- Parameters:
obs_space (gym.spaces) – observation space of the environment
action_space (gym.spaces) – action space of the environment
batch_size (int, optional) – batch size for training. Defaults to 256.
explorer (Explorer, optional) – exploration method. Defaults to RandomExplorer().
num_parallel_envs (int) – number of parallel environments. Defaults to 4.
num_epochs_per_update (int) – Training epochs per update. Defaults to 10.
lr (float, optional) – learning rate. Defaults to 3e-4.
device (torch.device, optional) – device to run on. Defaults to None.
experience_class (object, optional) – experience replay class. Defaults to ExperienceReplay.
max_mem_size (int, optional) – maximum size of the experience replay buffer. Defaults to 10e6.
discount_factor (float, optional) – discount factor. Defaults to 0.99.
reward_normalization (bool, optional) – whether to normalize the rewards by the maximum absolute value. Defaults to True.
tensorboard_dir (str, optional) – tensorboard directory. Defaults to ‘./tensorboard’.
dataloader_workers (int, optional) – number of workers for the dataloader. Defaults to 0.
accumulate_gradients_per_epoch (bool, optional) – whether to update the model every epoch or every batch. Defaults to None; when None, it is set to True for recurrent models and False for non-recurrent models.
- get_train_batch_size()
Returns batch_size for a standard NN; returns ceil(batch_size / num_parallel_envs) for a recurrent NN.
- contains_reccurent_nn()
- validate_models(models)
- init_tb_writer(tensorboard_dir=None)
Initializes tensorboard writer
- Parameters:
tensorboard_dir (str) – tensorboard directory
- abstract static get_models_input_output_shape(obs_space, action_space)
Calculates the input and output shapes of the models
- Parameters:
obs_space (gym.spaces) – observation space
action_space (gym.spaces) – action space
- Returns:
dictionary containing the input and output shapes of the models
- Return type:
dictionary
- abstract setup_models()
Initializes the NN models
- Return type:
list[Module]
- abstract update_policy(trajectories_dataset)
Updates the models according to the agent's logic
- get_train_metrics()
Returns the training metrics
- read_obs_space_properties()
Returns the observation space properties
- read_action_space_properties()
Returns the action space properties
- define_action_space(action_space)
Defines the action space
- __del__()
Destructor
- static read_nn_properties(ckpt_fname)
- _generate_nn_save_key(model)
Generates a key for saving the model. The key includes the approximated args, the class type (for reproducibility), and the state dict of the model.
- Parameters:
model (Module) – the model to save
- Returns:
dictionary containing the model's state
- Return type:
dictionary
- abstract save_agent(f_name)
Saves the agent to a file.
- Parameters:
f_name (str) – file name
- Return type:
dict
Returns: a dictionary containing the agent’s state.
- abstract load_agent(f_name)
Loads the agent from a file. Returns: a dictionary containing the agent’s state.
- abstract set_train_mode()
sets the agent to train mode - all models are set to train mode
- abstract set_eval_mode()
sets the agent to eval mode - all models are set to eval mode
- gracefully_close_envs()
A decorator that closes the environment processes in case of an exception
- Parameters:
func – the function to wrap
- Returns:
the wrapped function
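As an illustration only (not the library's exact implementation), such a decorator could be sketched as follows, relying on the documented close_env_procs() cleanup method:
import functools

def gracefully_close_envs_sketch(func):  # hypothetical name, for illustration
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception:
            # close the parallel environment processes before propagating the error
            self.close_env_procs()
            raise
    return wrapper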
- train_episodial(*args, **kwargs)
- train_n_steps(*args, **kwargs)
- _train_n_iters(env, n_iters, episodes=False, max_episode_len=None, disable_tqdm=False)
Trains the agent for a given number of steps or episodes
- Parameters:
env (gym.Env) – the environment to train on
n_iters (int) – number of steps/episodes to train
episodes (bool, optional) – whether to train for episodes or steps. Defaults to False.
max_episode_len (int, optional) – maximum episode length - truncates after that. Defaults to None.
disable_tqdm (bool, optional) – disable tqdm. Defaults to False.
- Returns:
train rewards
- abstract get_trajectories_data()
Returns the trajectories data
- criterion_using_loss_flag(func, arg1, arg2, loss_flag)
Applies the function only where the loss flag is True
- Parameters:
func – the function to apply
arg1 – the first argument
arg2 – the second argument
loss_flag – the loss flag
- Returns:
the result of the function
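A minimal sketch of the masking idea, assuming loss_flag is a boolean tensor marking the non-padded entries of a batch (the tensors below are dummies):
import torch

func = torch.nn.functional.mse_loss   # the criterion to apply
arg1 = torch.randn(8, 4)              # e.g. predictions
arg2 = torch.randn(8, 4)              # e.g. targets
loss_flag = torch.ones(8, 4, dtype=torch.bool)
loss_flag[:, -1] = False              # e.g. last timestep is padding
# apply the criterion only where the flag is True
loss = func(arg1[loss_flag], arg2[loss_flag])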
- apply_regularization(reg_coeff, vector, loss_flag)
Applies regularization (weighted by reg_coeff) to the vector, using the loss flag
- set_num_parallel_env(num_parallel_envs)
Sets the number of parallel environments
- Parameters:
num_parallel_envs (int) – number of parallel environments
- abstract act(observations, num_obs=1)
- Parameters:
observations (array) – The observations to act on
num_obs (int) – The number of observations to act on
- Return type:
array
- Returns:
The selected actions (np.ndarray)
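A hedged usage sketch, assuming an already constructed agent and Gymnasium environment (see the per-agent examples later on this page); observation pre-processing is handled internally by pre_process_obs_for_act:
obs, info = env.reset()
actions = agent.act(obs, num_obs=1)              # possibly exploratory actions (np.ndarray)
greedy_actions = agent.best_act(obs, num_obs=1)  # deterministic, highest-probability actions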
- load_highest_score_agent()
Loads the highest score agent from training
- get_highest_score_agent_ckpt_path()
Returns the path of the highest score agent from training
- abstract best_act(observations, num_obs=1)
Selects the highest-probability actions in a deterministic way
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The highest-probability action, to be taken in a deterministic way
- norm_obs(observations)
Normalizes the observations according to pre-given normalization parameters [future API - currently not available]
- pre_process_obs_for_act(observations, num_obs)
Pre-processes the observations for act
- Parameters:
observations (ObsWrapper | dict) – The observations to act on
num_obs (int) – The number of observations to act on
- Returns:
The pre-processed observations as an ObsWrapper object with the right dimensions
- return_correct_actions_dim(actions, num_obs)
Returns the correct actions dimension
- Parameters:
actions (array) – The selected actions
num_obs (int) – The number of observations to act on
- close_env_procs()
Closes the environment processes
- set_intrisic_reward_func(func)
Sets the agent's intrinsic reward function to a custom function that takes (state, action, reward) and returns the reward used by the algorithm:
# Create some agent
agent = PPO_Agent(obs_space=env.observation_space, action_space=env.action_space, tensorboard_dir=None)

def dummy_reward_func(state, action, reward):
    if state[0] > 0:
        return reward + 1
    return reward

agent.set_intrisic_reward_func(dummy_reward_func)
# now train normally
- Parameters:
func (function) – a function that takes state, action, reward and returns reward for the algorithm
- intrisic_reward_func(state, action, reward)
Calculates the agent's intrinsic reward
- collect_episode_obs(env, max_episode_len=None, num_to_collect_in_parallel=None)
Collects observations from the environment
- Parameters:
env (gym.env) – gym environment
max_episode_len (int, optional) – maximum episode length. Defaults to None.
num_to_collect_in_parallel (int, optional) – number of parallel environments. Defaults to None.
env_funcs (dict, optional) – dictionary of env functions mapping to call on the environment. Defaults to {“step”: “step”, “reset”: “reset”}.
- Returns:
total reward collected
- Return type:
float
- abstract reset_rnn_hidden()
Resets the RNN hidden states, if the agent uses an RNN. This callback is called in many places, so please implement it in your agent.
- get_last_collected_experiences(number_of_episodes)
returns the last collected experiences
- Parameters:
number_of_episodes (int) – number of episodes to return
- clear_exp()
clears the experience replay buffer
- run_env(*args, **kwargs)
rlify.agents.vdqn_agent module
- class rlify.agents.vdqn_agent.DQNDataset(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
Bases:
Dataset
Dataset for DQN
- __init__(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
- Parameters:
states (np.array) – the states
actions (np.array) – the actions
rewards (np.array) – the rewards
returns (np.array) – the returns
dones (np.array) – the done flags
truncated (np.array) – the truncated flags
next_states (np.array) – the next states
prepare_for_rnn (bool) – whether to prepare the data for an RNN
- __len__()
- __getitems__(idx)
- __getitem__(idx)
- collate_fn(batch)
- class rlify.agents.vdqn_agent.DQNData(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
Bases:
IData
DQN Data
- __init__(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
- Parameters:
states (np.array) – the states
actions (np.array) – the actions
rewards (np.array) – the rewards
returns (np.array) – the returns
dones (np.array) – the done flags
truncated (np.array) – the truncated flags
next_states (np.array) – the next states
prepare_for_rnn (bool) – whether to prepare the data for an RNN
- class rlify.agents.vdqn_agent.VDQN_Agent(obs_space, action_space, Q_model, dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)
Bases:
RL_Agent
DQN Agent
- __init__(obs_space, action_space, Q_model, dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)
Example:
env_name = "CartPole-v1" env = gym.make(env_name, render_mode=None) models_shapes = VDQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space) Q_input_shape = models_shapes["Q_model"]["input_shape"] Q_out_shape = models_shapes["Q_model"]["out_shape"] Q_model = fc.FC(input_shape=Q_input_shape, out_shape=Q_out_shape) agent = VDQN_Agent(obs_space=env.observation_space, action_space=env.action_space, batch_size=64, max_mem_size=10**5, num_parallel_envs=16, lr=3e-4, Q_model=Q_model, discount_factor=0.99, target_update='hard[update_freq=10]', tensorboard_dir = None, num_epochs_per_update=2) train_stats = agent.train_n_steps(env=env,n_steps=40000)
- Parameters:
obs_space (gym.spaces) – The observation space.
action_space (gym.spaces) – The action space.
Q_model (BaseModel) – The Q model.
dqn_reg (float, optional) – The DQN regularization. Defaults to 0.0.
batch_size (int, optional) – The batch size. Defaults to 64.
soft_exploit (bool, optional) – Whether to use soft exploitation. Defaults to True.
explorer (Explorer, optional) – The explorer. Defaults to RandomExplorer().
num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.
num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.
lr (float, optional) – The learning rate. Defaults to 3e-4.
device (str, optional) – The device. Defaults to None.
experience_class (object, optional) – The experience class. Defaults to ExperienceReplay.
max_mem_size (int, optional) – The maximum memory size. Defaults to int(10e6).
discount_factor (float, optional) – The discount factor. Defaults to 0.99.
reward_normalization (bool, optional) – Whether to normalize the rewards. Defaults to True.
tensorboard_dir (str, optional) – The tensorboard directory. Defaults to “./tensorboard”.
dataloader_workers (int, optional) – The number of dataloader workers. Defaults to 0.
accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.
- check_action_space()
- setup_models()
Initializes the Q Model and optimizer.
- static get_models_input_output_shape(obs_space, action_space)
Calculates the input and output shapes of the models
- Parameters:
obs_space (gym.spaces) – observation space
action_space (gym.spaces) – action space
- Returns:
dictionary containing the input and output shapes of the models
- Return type:
dictionary
- set_train_mode()
sets the agent to train mode - all models are set to train mode
- set_eval_mode()
sets the agent to eval mode - all models are set to eval mode
- best_act(observations, num_obs=1)
Selects the highest-probability actions in a deterministic way
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The highest-probability action, to be taken in a deterministic way
- save_agent(f_name)
Saves the agent to a file.
- Parameters:
f_name (str) – file name
- Return type:
dict
Returns: a dictionary containing the agent’s state.
- load_agent(f_name)
Loads the agent from a file. Returns: a dictionary containing the agent’s state.
- contains_reccurent_nn()
- act_base(observations, num_obs=1)
Returns the Q values for the given observations.
- Parameters:
observations (np.array) – The observations.
num_obs (int, optional) – The number of observations. Defaults to 1.
- Return type:
Tensor
- Returns:
The Q values (torch.tensor)
- act(observations, num_obs=1)
- Parameters:
observations (array) – The observations to act on
num_obs (int) – The number of observations to act on
- Return type:
ndarray
- Returns:
The selected actions (np.ndarray)
- reset_rnn_hidden()
Resets the RNN hidden states, if the agent uses an RNN. This callback is called in many places, so please implement it in your agent.
- get_trajectories_data()
Returns the trajectories data
- _get_dqn_experiences()
loads experiences from the replay buffer and returns them as tensors.
- Returns:
(states, actions, rewards, dones, truncated, next_states, returns)
- Return type:
tuple
- update_policy(trajectory_data)
Updates the models according to the agent's logic
rlify.agents.dqn_agent module
- class rlify.agents.dqn_agent.DQN_Agent(obs_space, action_space, Q_model, target_update='hard[update_freq=10]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Bases:
VDQN_Agent
DQN Agent
- __init__(obs_space, action_space, Q_model, target_update='hard[update_freq=10]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Example:
env_name = "CartPole-v1" env = gym.make(env_name, render_mode=None) models_shapes = DQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space) Q_input_shape = models_shapes["Q_model"]["input_shape"] Q_out_shape = models_shapes["Q_model"]["out_shape"] Q_model = fc.FC(input_shape=Q_input_shape, out_shape=Q_out_shape) agent = DQN_Agent( obs_space=env.observation_space, action_space=env.action_space, Q_model=Q_model, batch_size=64, max_mem_size=int(10e6), num_parallel_envs=4, num_epochs_per_update=10, lr=3e-4, discount_factor=0.99, target_update="hard[update_freq=10]", ) train_stats = agent.train_n_steps(env=env, n_steps=40000)
- Parameters:
obs_space (gym.spaces) – The observation space of the environment.
action_space (gym.spaces) – The action space of the environment.
Q_model (BaseModel) – The Q-network model.
dqn_reg (float, optional) – The L2 regularization coefficient for the Q-network. Defaults to 0.0.
target_update (str, optional) – The target update rule. Defaults to “hard[update_freq=10]”.
batch_size (int, optional) – The batch size for training. Defaults to 64.
soft_exploit (bool, optional) – Whether to use soft exploitation during action selection. Defaults to True.
explorer (Explorer, optional) – The exploration strategy. Defaults to RandomExplorer().
num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.
num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.
lr (float, optional) – The learning rate. Defaults to 3e-4.
device (str, optional) – The device to use for training. Defaults to None.
experience_class (object, optional) – The experience replay class. Defaults to ExperienceReplay.
max_mem_size (int, optional) – The maximum size of the experience replay memory. Defaults to int(10e6).
discount_factor (float, optional) – The discount factor for future rewards. Defaults to 0.99.
reward_normalization (bool, optional) – Whether to normalize rewards. Defaults to True.
tensorboard_dir (str, optional) – The directory to save TensorBoard logs. Defaults to “./tensorboard”.
dataloader_workers (int, optional) – The number of workers for the data loader. Defaults to 0.
accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.
- setup_models()
Initializes the Q and target Q networks.
- static get_models_input_output_shape(obs_space, action_space)
Calculates the input and output shapes of the models
- Parameters:
obs_space (gym.spaces) – observation space
action_space (gym.spaces) – action space
- Returns:
dictionary containing the input and output shapes of the models
- Return type:
dictionary
- init_target_update_rule(target_update)
Initializes the target update rule.
- Parameters:
target_update (str) – ‘soft[tau=0.01]’ or ‘hard[update_freq=10]’ target update
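For illustration, the two documented update rules are selected via the constructor's target_update string (reusing env and Q_model from the example above; the soft-update comment describes the usual Polyak rule and is an assumption):
# hard update: copy the Q network into the target network every 10 updates
agent = DQN_Agent(obs_space=env.observation_space, action_space=env.action_space,
                  Q_model=Q_model, target_update="hard[update_freq=10]")
# soft update: target <- (1 - tau) * target + tau * Q, with tau = 0.01 (assumed Polyak averaging)
agent = DQN_Agent(obs_space=env.observation_space, action_space=env.action_space,
                  Q_model=Q_model, target_update="soft[tau=0.01]")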
- set_train_mode()
sets the agent to train mode - all models are set to train mode
- set_eval_mode()
sets the agent to eval mode - all models are set to eval mode
- hard_target_update(manual_update=False)
Hard update model parameters.
- Parameters:
manual_update (bool, optional) – Whether to force an update. Defaults to False. In case of a forced update, target_update_counter is not updated.
- soft_target_update()
Soft update model parameters.
- save_agent(f_name)
Saves the agent to a file.
- Parameters:
f_name (str) – file name
- Return type:
dict
Returns: a dictionary containing the agent’s state.
- load_agent(f_name)
Loads the agent from a file. Returns: a dictionary containing the agent’s state.
- reset_rnn_hidden()
Resets the RNN hidden states, if the agent uses an RNN. This callback is called in many places, so please implement it in your agent.
- update_policy(trajectory_data)
Updates the policy. Using the DQN algorithm.
rlify.agents.ddpg_agent module
- class rlify.agents.ddpg_agent.DDPG_Agent(obs_space, action_space, Q_model, Q_mle_model, target_update='soft[tau=0.005]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Bases:
DQN_Agent
DDPG Agent
- __init__(obs_space, action_space, Q_model, Q_mle_model, target_update='soft[tau=0.005]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Example:
env_name = "CartPole-v1" env = gym.make(env_name, render_mode=None) models_shapes = DDPG_Agent.get_models_input_output_shape( env.observation_space, env.action_space) Q_input_shape = models_shapes["Q_model"]["input_shape"] Q_out_shape = models_shapes["Q_model"]["out_shape"] Q_mle_input_shape = models_shapes["Q_mle_model"]["input_shape"] Q_mle_out_shape = models_shapes["Q_mle_model"]["out_shape"] Q_model = fc.FC( input_shape=Q_input_shape, out_shape=Q_out_shape, ) Q_mle_model = fc.FC( input_shape=Q_mle_input_shape, out_shape=Q_mle_out_shape, ) agent = DDPG_Agent(obs_space=env.observation_space, action_space= env.action_space, Q_model=Q_model, Q_mle_model=Q_mle_model) train_stats = agent.train_n_steps(env=env_c,n_steps=40000)
- Parameters:
obs_space (gym.spaces) – The observation space of the environment
action_space (gym.spaces) – The action space of the environment
Q_model (BaseModel) – The Q model
Q_mle_model (BaseModel) – The MLE model
target_update (str, optional) – The target update strategy. Defaults to "soft[tau=0.005]".
dqn_reg (float, optional) – The regularization factor for the Q model. Defaults to 0.0.
batch_size (int, optional) – The batch size. Defaults to 64.
soft_exploit (bool, optional) – Whether to use soft exploitation. Defaults to True.
explorer (Explorer, optional) – The explorer. Defaults to RandomExplorer().
num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.
num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.
lr (float, optional) – The learning rate. Defaults to 3e-4.
device (str, optional) – The device to use. Defaults to None.
experience_class (object, optional) – The experience class. Defaults to ExperienceReplay.
max_mem_size (int, optional) – The maximum memory size. Defaults to int(10e6).
discount_factor (float, optional) – The discount factor. Defaults to 0.99.
reward_normalization (bool, optional) – Whether to normalize the rewards. Defaults to True.
tensorboard_dir (str, optional) – The tensorboard directory. Defaults to "./tensorboard".
dataloader_workers (int, optional) – The number of dataloader workers. Defaults to 0.
accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.
- setup_models()
Initializes the Q, target Q and MLE networks.
- static get_models_input_output_shape(obs_space, action_space)
Returns the input and output shapes of the Q and Q_mle models.
- Return type:
dict
- check_action_space()
- set_train_mode()
sets the agent to train mode - all models are set to train mode
- set_eval_mode()
sets the agent to eval mode - all models are set to eval mode
- hard_target_update(manual_update=False)
Hard update model parameters.
- Parameters:
manual_update (bool, optional) – Whether to force an update. Defaults to False. In case of a forced update, target_update_counter is not updated.
- soft_target_update()
Soft update model parameters.
- best_act(observations, num_obs=1)
Selects the highest-probability actions in a deterministic way
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The highest-probability action, to be taken in a deterministic way
- reset_rnn_hidden()
Resets the RNN hidden states, if the agent uses an RNN. This callback is called in many places, so please implement it in your agent.
- save_agent(f_name)
Saves the agent to a file.
- Parameters:
f_name (str) – file name
- Return type:
dict
Returns: a dictionary containing the agent’s state.
- load_agent(f_name)
Loads the agent from a file. Returns: a dictionary containing the agent’s state.
- actor_action(observations, num_obs=1, use_target=False)
Returns the actor action for a batch of observations.
- Parameters:
observations (np.ndarray, torch.tensor) – The observations to act on
num_obs (int, optional) – The number of observations to act on. Defaults to 1.
- Returns:
The actions
- Return type:
torch.tensor
- get_actor_action_value(states, actions, use_target=False)
Returns the actor action value for a batch of observations.
- Parameters:
states (torch.tensor) – The observations to act on
actions (torch.tensor) – The actions to act on
- Returns:
The actions values
- Return type:
torch.tensor
- act(observations, num_obs=1)
- Parameters:
observations (array) – The observations to act on
num_obs (int) – The number of observations to act on
- Returns:
The selected actions (np.ndarray)
- update_policy(trajectory_data)
Updates the policy, using the DDPG algorithm.
rlify.agents.ppo_agent module
- class rlify.agents.ppo_agent.PPODataset(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
Bases:
Dataset
Dataset for PPO.
- __init__(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
- Parameters:
states (np.ndarray) – The states.
actions (np.ndarray) – The actions.
dones (np.ndarray) – The dones.
returns (np.ndarray) – The returns.
advantages (np.ndarray) – The advantages.
logits (np.ndarray) – The logits.
prepare_for_rnn (bool) – Whether to prepare for RNN.
- __len__()
- __getitems__(idx)
- __getitem__(idx)
- collate_fn(batch)
- class rlify.agents.ppo_agent.PPOData(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
Bases:
IData
A class for PPO data.
- __init__(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
- Parameters:
states (np.ndarray) – The states.
actions (np.ndarray) – The actions.
dones (np.ndarray) – The dones.
returns (np.ndarray) – The returns.
advantages (np.ndarray) – The advantages.
logits (np.ndarray) – The logits.
prepare_for_rnn (bool) – Whether to prepare for RNN.
- class rlify.agents.ppo_agent.PPO_Agent(obs_space, action_space, policy_nn, critic_nn, batch_size=1024, entropy_coeff=0.1, kl_div_thresh=0.03, clip_param=0.1, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ForgettingExperienceReplay'>, max_mem_size=1000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)
Bases:
RL_Agent
Proximal Policy Optimization (PPO) reinforcement learning agent. Inherits from RL_Agent.
- __init__(obs_space, action_space, policy_nn, critic_nn, batch_size=1024, entropy_coeff=0.1, kl_div_thresh=0.03, clip_param=0.1, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ForgettingExperienceReplay'>, max_mem_size=1000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)
Example:
env_name = 'Pendulum-v1'
env = gym.make(env_name, render_mode=None)
models_shapes = PPO_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
policy_input_shape = models_shapes["policy_nn"]["input_shape"]
policy_out_shape = models_shapes["policy_nn"]["out_shape"]
critic_input_shape = models_shapes["critic_nn"]["input_shape"]
critic_out_shape = models_shapes["critic_nn"]["out_shape"]
policy_nn = fc.FC(input_shape=policy_input_shape, embed_dim=128, depth=3,
                  activation=torch.nn.ReLU(), out_shape=policy_out_shape)
critic_nn = fc.FC(input_shape=critic_input_shape, embed_dim=128, depth=3,
                  activation=torch.nn.ReLU(), out_shape=critic_out_shape)
agent = PPO_Agent(obs_space=env.observation_space, action_space=env.action_space, batch_size=1024,
                  max_mem_size=10**5, num_parallel_envs=4, lr=3e-4, entropy_coeff=0.05,
                  policy_nn=policy_nn, critic_nn=critic_nn, discount_factor=0.99, tensorboard_dir=None)
train_stats = agent.train_n_steps(env=env, n_steps=250000)
- Parameters:
obs_space (gym.spaces) – The observation space of the environment.
action_space (gym.spaces) – The action space of the environment.
policy_nn (nn.Module) – The policy neural network.
critic_nn (nn.Module) – The critic neural network.
batch_size (int) – The batch size for training.
entropy_coeff (float) – The coefficient for the entropy regularization term.
kl_div_thresh (float) – The threshold for the KL divergence between old and new policy.
clip_param (float) – The clipping parameter for the PPO loss.
explorer (Explorer) – The exploration strategy.
num_parallel_envs (int) – The number of parallel environments.
num_epochs_per_update (int) – The number of epochs per update.
lr (float) – The learning rate.
device (str) – The device to use for training.
experience_class (object) – The experience replay class.
max_mem_size (int) – The maximum memory size for experience replay.
discount_factor (float) – The discount factor for future rewards.
reward_normalization (bool) – Whether to normalize rewards.
tensorboard_dir (str) – The directory to save tensorboard logs.
dataloader_workers (int) – The number of workers for the dataloader.
accumulate_gradients_per_epoch (bool) – Whether to accumulate gradients per epoch.
- set_train_mode()
sets the agent to train mode - all models are set to train mode
- set_eval_mode()
sets the agent to eval mode - all models are set to eval mode
- static get_models_input_output_shape(obs_space, action_space)
Returns the input and output shapes of the policy and critic models.
- Return type:
dict
- setup_models()
Initializes the NN models
- save_agent(f_name)
Saves the agent to a file.
- Parameters:
f_name (str) – file name
- Return type:
dict
Returns: a dictionary containing the agent’s state.
- load_agent(f_name)
Loads the agent from a file. Returns: a dictionary containing the agent’s state.
- reset_rnn_hidden()
Resets the RNN hidden states, if the agent uses an RNN. This callback is called in many places, so please implement it in your agent.
- set_num_parallel_env(num_parallel_envs)
Sets the number of parallel environments
- Parameters:
num_parallel_envs (int) – number of parallel environments
- best_act(observations, num_obs=1)
Selects the highest-probability actions in a deterministic way
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The highest-probability action, to be taken in a deterministic way
- best_act_discrete(observations, num_obs=1)
- best_act_cont(observations, num_obs=1)
- act(observations, num_obs=1)
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The selected actions (np.ndarray)
- get_trajectories_data()
Returns the trajectories data
- calc_logits_values(states, actions, dones)
- _get_ppo_experiences(num_episodes=None)
Get the experiences for PPO
- Parameters:
num_episodes (int) – Number of episodes to get.
- Returns:
(states, actions, rewards, dones, truncated, next_states)
- Return type:
tuple
- update_policy(trajectory_data)
Update the policy network.
- Parameters:
trajectory_data (tuple) – Experience data.
rlify.agents.heuristic_agent module
- class rlify.agents.heuristic_agent.Heuristic_Agent(heuristic_func, **kwargs)
Bases:
RL_Agent
A Heuristic Agent that uses a heuristic function to act.
- __init__(heuristic_func, **kwargs)
- Parameters:
heuristic_func – A function that takes (inner_state, observation (ObsWrapper)) and returns a tuple (inner_state, action): the inner state (which can be None) and the action to be taken. Note that the actions shape is (b, n_actions, action_dim). Please check ObsWrapper for more info on the observation input object.
kwargs – Arguments for the RL_Agent base class
Example:
env_name = "CartPole-v1" env_c = gym.make(env_name, render_mode=None) def heuristic_func(inner_state, obs: ObsWrapper): # an function that does not keep inner state b_shape = len(obs) actions = np.zeros((b_shape, 1)) # single discrete action # just a dummy heuristic for a gym env with np.array observations (for more details about the obs object check ObsWrapper) # the heuristic check whether the first number of each observation is positive, if so, it returns action=1, else 0 actions[torch.where(obs['data'][:,0] > 0)[0].cpu()] = 1 return None, actions agent_c = Heuristic_Agent(obs_space=env_c.observation_space, action_space=env_c.action_space, heuristic_func=heuristic_func) reward = agent_c.run_env(env_c, best_act=True) print("Run Reward:", reward)
- setup_models()
Does nothing in this agent.
- get_models_input_output_shape(action_space)
Does nothing in this agent.
- set_train_mode()
sets the agent to train mode - all models are set to train mode
- set_eval_mode()
sets the agent to eval mode - all models are set to eval mode
- save_agent(f_name)
Saves the agent to a file.
- Parameters:
f_name (str) – file name
- Return type:
dict
Returns: a dictionary containing the agent’s state.
- load_agent(f_name)
Loads the agent from a file. Returns: a dictionary containing the agent’s state.
- train(env, n_episodes)
- act(observations, num_obs=1)
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The selected actions (np.ndarray)
- best_act(observations, num_obs=1)
Selects the highest-probability actions in a deterministic way
- Parameters:
observations – The observations to act on
num_obs – The number of observations to act on
- Returns:
The highest-probability action, to be taken in a deterministic way
- reset_rnn_hidden()
Resets the NN hidden state - does nothing in this agent.
- update_policy(trajectory)
does nothing in this agent.
- get_trajectories_data(num_episodes)
Mainly for Paired Algorithm support
- clear_exp()
clears the experience replay buffer
rlify.agents.explorers module
- class rlify.agents.explorers.Explorer
Bases:
ABC
Abstract Exploration Class
- __init__()
- abstract explore()
Returns True if it is an exploration action time step
- abstract update()
updates the exploration epsilon
- abstract act(action_space, obs, num_obs)
Responsible for storing an inner state if needed (in the self.inner_state attribute). Returns the action to be taken.
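A hedged sketch of a custom Explorer subclass; the behavior below is illustrative only:
from rlify.agents.explorers import Explorer

class NoExploration(Explorer):
    """Illustrative explorer that never takes exploratory actions."""
    def explore(self):
        return False  # never an exploration time step
    def update(self):
        pass  # nothing to decay
    def act(self, action_space, obs, num_obs):
        # never reached, since explore() always returns False
        raise NotImplementedError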
- class rlify.agents.explorers.RandomExplorer(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
Bases:
Explorer
Class that acts as a linear exploration method
- __init__(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
- Parameters:
exploration_epsilon (int) – The initial exploration epsilon
eps_end (float) – The final exploration epsilon
eps_dec (float) – The decay rate of the exploration epsilon
- explore()
Returns True if it is an exploration action time step (randomness based on the exploration epsilon)
- update()
updates the exploration epsilon in linear mode: exploration_epsilon * (1-self.eps_dec)
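A small usage sketch of the decay schedule, assuming epsilon is floored at eps_end:
from rlify.agents.explorers import RandomExplorer

explorer = RandomExplorer(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
for _ in range(100):
    if explorer.explore():
        pass  # the agent would query explorer.act(action_space, obs, num_obs) here
    explorer.update()  # epsilon <- epsilon * (1 - eps_dec), presumably floored at eps_end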
- act(action_space, obs, num_obs)
- Parameters:
action_space – The action space of the env
obs – The observation of the env
num_obs – The number of observations to act on
Returns a random action from the action space
- _act_discrete(action_space, obs, num_obs)
- _act_cont(action_space, obs, num_obs)
- class rlify.agents.explorers.HeuristicExplorer(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
Bases:
RandomExplorer
A class for custom exploration methods, defined by the user at init: heuristic_function(inner_state, obs) -> (inner_state, action)
- __init__(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
- Parameters:
heuristic_function – A function that takes (inner_state, observation (ObsWrapper)) and returns a tuple (inner_state, action): the inner state (which can be None) and the action to be taken. Note that the actions shape is (b, n_actions, action_dim).
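A hedged sketch of wiring a heuristic into the explorer, mirroring the Heuristic_Agent example above; the always-zero action is purely illustrative:
import numpy as np
from rlify.agents.explorers import HeuristicExplorer

def heuristic_function(inner_state, obs):
    # return (inner_state, actions); here a single discrete action per observation
    actions = np.zeros((len(obs), 1))  # illustrative: always choose action 0
    return None, actions

explorer = HeuristicExplorer(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)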
- explore()
Returns True if it is an exploration action time step (randomness based on the exploration epsilon)
- update()
updates the exploration epsilon in linear mode: exploration_epsilon * (1-self.eps_dec)
- act(action_space, obs, num_obs)
Calls the heuristic function to get the action; also updates the inner state
rlify.agents.action_spaces_utils module
- class rlify.agents.action_spaces_utils.MCAW(lows, highs, locs_scales)
Bases:
object
Multivariate Continuous Action Space wrapper
- __init__(lows, highs, locs_scales)
- Parameters:
lows (list) – the lower bounds of the actions
highs (list) – the upper bounds of the actions
locs_scales (torch.tensor) – the means and scales of all the actions
- sample(sample_shape=())
- Parameters:
sample_shape (torch.Size) – the shape of the sample
- Returns:
a tensor of shape (b, n_actions, sample_shape)
- log_prob(actions)
Calculates the log prob of each action
- Parameters:
actions (torch.tensor) – a tensor of shape (b, n_actions)
- Returns:
a tensor of shape (b, n_actions)
- property loc
- property scale
- entropy()
Calculates the mean entropy of all the actions
- Returns:
a tensor of shape (b, 1)
- class rlify.agents.action_spaces_utils.CAW(low, high, loc, scale)
Bases:
Normal
Continuous Action Wrapper
- __init__(low, high, loc, scale)
- Parameters:
low (float) – the lower bound of the action
high (float) – the higher bound of the action
loc (torch.tensor) – the mean of the action
scale (torch.tensor) – the scale of the action
- sample(sample_shape=())
- Parameters:
sample_shape (torch.Size) – the shape of the sample
- Returns:
a tensor of shape (b, sample_shape)
- class rlify.agents.action_spaces_utils.MDA(start, possible_actions, n_actions, x)
Bases:
object
Multivariate Discrete Action Space
- __init__(start, possible_actions, n_actions, x)
- Parameters:
start (np.array) – an offset for start of each action
possible_actions (int) – number of possible actions
n_actions (np.array) – number of actions for each action
x (torch.tensor) – the logits for each action
- sample(sample_shape=())
- Returns:
a tensor of shape (b, n_actions, sample_shape)
- log_prob(actions)
Calculates the log prob of each action
- Parameters:
actions (torch.tensor) – a tensor of shape (b, n_actions)
- Returns:
a tensor of shape (b, n_actions)
- property probs
Returns: a tensor of shape (b, n_actions)
- entropy()
rlify.agents.agent_utils module
- rlify.agents.agent_utils.pad_from_done_indices(data, dones)
Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence
- rlify.agents.agent_utils.pad_states_from_done_indices(data, dones)
Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence
- rlify.agents.agent_utils.pad_tensors_from_done_indices(data, dones)
Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence
- rlify.agents.agent_utils.calc_gaes(rewards, values, terminated, discount_factor=0.99, decay=0.9)
Works with a rewards vector that consists of many episodes. Returns the Generalized Advantage Estimates from the given rewards and values. Paper: https://arxiv.org/pdf/1506.02438.pdf
- rlify.agents.agent_utils.calc_returns(rewards, terminated, discount_factor=0.99)
Works with a rewards vector that consists of many episodes. Returns the discounted returns from the given rewards.
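A hedged usage sketch of the two helpers on a flat rollout containing two episodes (all values are dummies):
import numpy as np
from rlify.agents.agent_utils import calc_gaes, calc_returns

rewards = np.array([1.0, 1.0, 1.0, 0.0, 1.0, 1.0], dtype=np.float32)
terminated = np.array([0, 0, 1, 0, 0, 1], dtype=np.float32)          # episode boundaries
values = np.array([0.5, 0.5, 0.5, 0.4, 0.4, 0.4], dtype=np.float32)  # critic estimates

returns = calc_returns(rewards, terminated, discount_factor=0.99)
gaes = calc_gaes(rewards, values, terminated, discount_factor=0.99, decay=0.9)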
- class rlify.agents.agent_utils.ObsShapeWraper(obs_shape)
Bases:
dict
- dict_types = [<class 'dict'>, <class 'gymnasium.spaces.dict.Dict'>]
- __init__(obs_shape)
- class rlify.agents.agent_utils.ObsWrapper(data=None, keep_dims=True, tensors=False)
Bases:
object
A class for wrapping observations. The object is roughly a dict of np.arrays or torch.tensors. A default key 'data' is used for the main data if it is given as either a np.array or a torch.tensor.
Example:
obs = ObsWrapper({'data': np.array([1, 2, 3]), 'data2': np.array([4, 5, 6])})
print(obs['data'])
print(obs['data2'])
print(obs['data'][0])
obs = ObsWrapper(np.array([1, 2, 3]))
print(obs['data'])
- __init__(data=None, keep_dims=True, tensors=False)
- Parameters:
data (dict | array | tensor) – The data to wrap
keep_dims (bool) – Whether to keep the dimensions of the data; if False, a batch dimension is added to the data
tensors (bool) – Whether to keep the data as torch.tensor
- update_shape()
Updates the shape of the object
- init_from_dict(data, keep_dims, tensors)
Initializes from a dict
- Parameters:
data – The data to initialize from
keep_dims – Whether to keep the dimensions of the data, if False will add a dimension of batch to the data
tensors – Whether to keep the data in torch.tensor
- init_from_list_obsWrapper_obs(obs_list)
Initializes from a list of ObsWrapper objects
- Parameters:
obs_list – The list of ObsWrapper objects
- init_from_list_generic_data(obs_list)
Initializes from a list of generic data
- Parameters:
obs_list – The list of generic data
- _init_from_none_(keep_dims, tensors)
Initializes an object without data
- __setitem__(key, value)
Sets an item in the object
- Parameters:
key – The key to set
value – The value to set
- __delitem__(key)
Deletes an item in the object
- Parameters:
key – The key to delete
- __iter__()
- Returns:
an iterator over the object
- __getitem__(key)
- Parameters:
key – The key to get
- Returns:
The relevant item in respect to the key
- slice_tensors(key)
- Parameters:
key – The key to get
- Returns:
The sliced tensors
- keys()
- Returns:
the keys of the object
- items()
- Returns:
the items of the object
- values()
- Returns:
the values of the object
- __len__()
- Returns:
The length of the object
- __str__()
Returns the string representation of the object
- Return type:
str
- __repr__()
Return repr(self).
- Return type:
str
- __mul__(other)
Multiplies the object by another object, key by key, using the <*> pointwise operator
- Parameters:
other – The other object to multiply by
- __add__(other)
Adds the object to another object, key by key, using the <+> pointwise operator
- Parameters:
other – The other object to add
- __neg__()
Negates the object
- __sub__(other)
Subtracts another object from the object, key by key, using the <-> pointwise operator
- Parameters:
other – The other object to subtract
- __truediv__(other)
Divides the object by another object, key by key, using the </> pointwise operator
- Parameters:
other – The other object to divide by
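A small illustrative sketch of the key-by-key pointwise operators above (array values are dummies):
import numpy as np
from rlify.agents.agent_utils import ObsWrapper

a = ObsWrapper({'data': np.array([[1.0, 2.0]]), 'aux': np.array([[3.0]])})
b = ObsWrapper({'data': np.array([[10.0, 20.0]]), 'aux': np.array([[30.0]])})
print((a + b)['data'])  # pointwise addition, key by key
print((a * b)['aux'])   # pointwise multiplication, key by key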
- unsqueeze(dim=0)
- Parameters:
dim – The dimension along which to unsqueeze
- Returns:
The unsqueezed object
- squeeze(dim=0)
- Parameters:
dim – The dimension along which to squeeze
- Returns:
The squeezed object
- flatten(start_dim=None, env_dim=None)
- Parameters:
start_dim – The first dimension to flatten
- Returns:
The flattened object
- get_as_tensors(device)
- Parameters:
device – The device to put the tensors on
- Returns:
The object as tensors
- to(device, non_blocking=False)
- Parameters:
device – The device to put the tensors on
- Returns:
The object as tensors
- stack()
stack a list of objects
- cat(other, axis=0)
Concatenates another object onto this object, key by key
- Parameters:
other – The other object to concatenate
- np_roll(indx, inplace=False)
Rolls the data by indx and fills the empty space with zeros - only on axis 0
- Parameters:
indx – The index to roll by
inplace – Whether to do the roll inplace
- Returns:
The rolled object
- class rlify.agents.agent_utils.IData(dataset, prepare_for_rnn)
Bases:
ABC
An abstract class for agents data
- __init__(dataset, prepare_for_rnn)
- Parameters:
dataset (Dataset) – The dataset to use
prepare_for_rnn – Whether to prepare the data for RNN
- get_dataloader(batch_size, shuffle, num_workers)
- Parameters:
batch_size – The batch size
shuffle – Whether to shuffle the data
num_workers – The number of workers
- Returns:
A DataLoader object
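A minimal sketch of building a concrete IData (here the DQNData documented above) and requesting a dataloader; the arrays are dummies and the shapes are assumptions:
import numpy as np
from rlify.agents.vdqn_agent import DQNData

n = 128
states = np.random.randn(n, 4).astype(np.float32)
actions = np.random.randint(0, 2, size=(n, 1))
rewards = np.zeros(n, dtype=np.float32)
returns = np.zeros(n, dtype=np.float32)
dones = np.zeros(n, dtype=np.float32)
dones[-1] = 1  # mark the end of the (single) dummy episode
truncated = np.zeros(n, dtype=np.float32)
next_states = np.random.randn(n, 4).astype(np.float32)

data = DQNData(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn=False)
loader = data.get_dataloader(batch_size=32, shuffle=True, num_workers=0)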
- class rlify.agents.agent_utils.LambdaDataset(obs_collection, tensor_collection, dones, prepare_for_rnn)
Bases:
Dataset
A dataset class for general purposes
- __init__(obs_collection, tensor_collection, dones, prepare_for_rnn)
- Parameters:
obs_collection (tuple[ObsWrapper]) – The observation collection
tensor_collection (tuple[tensor]) – The tensor collection
dones (tensor) – The dones tensor
prepare_for_rnn (bool) – Whether to prepare the data for RNN
- __len__()
- _prepare_data(obs_collection, tensor_collection, dones)
Prepares the data for the dataset in the form of tensors
- Parameters:
obs_collection – The observation collection
tensor_collection – The tensor collection
dones – The dones tensor
- Returns:
The prepared data
- _pad_experiecne(obs_collection, tensor_collection, dones)
Pads the experience for RNN
- Parameters:
obs_collection – The observation collection
tensor_collection – The tensor collection
dones – The dones tensor
- Returns:
The padded experience and the loss flag; the loss flag is a tensor of ones where the data are not padded
- __getitems__(idx)
- __getitem__(idx)
- collate_fn(batch)
- class rlify.agents.agent_utils.LambdaData(obs_collection, tensor_collection, dones, prepare_for_rnn)
Bases:
IData
- __init__(obs_collection, tensor_collection, dones, prepare_for_rnn)
- Parameters:
dataset – The dataset to use
prepare_for_rnn (bool) – Whether to prepare the data for RNN
- class rlify.agents.agent_utils.TrainMetrics
Bases:
object
- __init__()
- Parameters:
metrics – The metrics to store.
- add(metric_name, value)
Adds a metric to the metrics.
- Parameters:
metric_name – The name of the metric.
value – The value of the metric.
- on_epoch_end()
Finalizes the metrics collected during the current epoch.
- get_metrcis_df()
- Returns:
The metrics as a dataframe.
- __iter__()
- Returns:
An iterator over the metrics.
- __next__()
Returns: The next metric.
- __getitem__(key)
Returns: The metric stored under the given key.
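A small hedged sketch of using the metrics container (only the method names documented above are assumed):
from rlify.agents.agent_utils import TrainMetrics

metrics = TrainMetrics()
metrics.add("q_loss", 0.42)
metrics.add("q_loss", 0.40)
metrics.on_epoch_end()           # close the current epoch
df = metrics.get_metrcis_df()    # metrics as a dataframe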