API Documentation
HNP
- class hnp.hnp.HNP(slow_continuous_idx)
Bases:
object
Class for Hyperspace Neighbour Penetration (HNP) computation
- __init__(slow_continuous_idx) → None
Constructor for HNP object
- Parameters:
slow_continuous_idx – Indices for slowly-changing continuous variables
- Returns:
None
- get_next_value(vtb, full_obs_index, cont_obs_index_floats)
Computes the next state value of the tiles using HNP
HNP is applied only to slowly-changing continuous variables. The next-state tile portions are first computed from the float indices, and the next state value is then computed from those tile portions.
- Parameters:
vtb – State value table
full_obs_index – Value table index of observation
cont_obs_index_floats – Continuous variable indices
- Returns:
Next state value for continuous variables
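A minimal sketch of this interpolation idea for a single slowly-changing variable, assuming a NumPy value table laid out as vtb[slow_tile, discrete, action]; the layout and weighting here are illustrative, not the package's exact implementation:

```python
import numpy as np

# Illustrative value table: 10 tiles for one slow continuous variable,
# 4 levels for one discrete variable, 2 actions.
vtb = np.random.rand(10, 4, 2)
float_idx = 2.3      # fractional tile index of the next state
discrete_idx = 1     # index of the discrete variable

low = int(np.floor(float_idx))
high = min(low + 1, vtb.shape[0] - 1)
portion_high = float_idx - low   # portion of the state in the upper tile

# Blend the best achievable values of the two neighbouring tiles by
# their portions instead of rounding to a single tile.
next_value = (
    (1 - portion_high) * vtb[low, discrete_idx].max()
    + portion_high * vtb[high, discrete_idx].max()
)
```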
Agents
- class hnp.agents.Agent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)
Bases:
ABC
Parent Reinforcement Learning Agent Class
- __init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None
Constructor for RL agent
- Parameters:
env – Gym environment
config – Agent configuration
results_dir – Directory to save results
use_beobench – Enable Beobench
- Returns:
None
- save_results() → None
Saves training results
- abstract train() → None
RL agent training
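Because Agent is abstract, concrete agents must implement train(). A minimal hypothetical subclass, assuming the parent constructor stores the environment as self.env and the classic 4-tuple Gym step API:

```python
from hnp.agents import Agent

class DoNothingAgent(Agent):
    """Hypothetical agent that always sends action 0."""

    def train(self) -> None:
        obs = self.env.reset()
        done = False
        while not done:
            # Step with a constant action until the episode ends.
            obs, reward, done, info = self.env.step(0)
```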
- class hnp.agents.RandomActionAgent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)
Bases:
Agent
Random Action Agent Class
- __init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None
Constructor for RL agent
- Parameters:
env – Gym environment
config – Agent configuration
results_dir – Directory to save results
use_beobench – Enable Beobench
- Returns:
None
- train() → None
Random Action agent training
- class hnp.agents.FixedActionAgent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)
Bases:
Agent
Fixed Action Agent Class
- __init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None
Constructor for RL agent
- Parameters:
env – Gym environment
config – Agent configuration
results_dir – Directory to save results
use_beobench – Enable Beobench
- Returns:
None
- train() → None
Fixed Action agent training
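A sketch of running one of these baseline agents end to end; the environment name and config keys are illustrative, not confirmed by this documentation:

```python
from hnp.agents import RandomActionAgent
from hnp.environment import create_env

env = create_env({"name": "Eplus-5Zone-hot-discrete-v1"})  # illustrative name
config = {"num_episodes": 1}                               # illustrative config

agent = RandomActionAgent(env, config, results_dir="training_results")
agent.train()
agent.save_results()
```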
- class hnp.agents.QLearningAgent(env: sinergym.envs.EplusEnv, config: dict, obs_mask: numpy.ndarray, results_dir: str = 'training_results', use_beobench: bool = False, use_hnp: bool = True)
Bases:
Agent
Q-Learning Agent Class
- __init__(env: sinergym.envs.EplusEnv, config: dict, obs_mask: numpy.ndarray, results_dir: str = 'training_results', use_beobench: bool = False, use_hnp: bool = True) → None
Constructor for Q-Learning agent
- Parameters:
env – Gym environment
config – Agent configuration
obs_mask – Mask to categorize variables into slowly-changing continuous, fast-changing continuous, and discrete variables
results_dir – Directory to save results
use_beobench – Enable Beobench
use_hnp – Enable HNP
- Returns:
None
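For illustration, a mask for a three-variable observation might look like the sketch below; the integer codes (0 = slowly-changing continuous, 1 = fast-changing continuous, 2 = discrete) and all other values are assumptions to check against the package source:

```python
import numpy as np
from hnp.agents import QLearningAgent
from hnp.environment import create_env

env = create_env({"name": "Eplus-5Zone-hot-discrete-v1"})  # illustrative name
config = {"num_episodes": 1}                               # illustrative config

# Assumed codes: 0 = slowly-changing continuous, 1 = fast-changing
# continuous, 2 = discrete; one entry per observation variable.
obs_mask = np.array([0, 1, 2])

agent = QLearningAgent(env, config, obs_mask, use_hnp=True)
agent.train()
```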
- choose_action(obs_index: numpy.ndarray, mode: str = 'explore') → int
Get an action following the epsilon-greedy policy
- Parameters:
obs_index – Observation index
mode – Training or evaluation
- Returns:
Action
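A minimal sketch of a standard epsilon-greedy rule of this kind; the exploration logic is generic and the epsilon value is a placeholder:

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float, mode: str = "explore") -> int:
    """Random action with probability epsilon while exploring,
    otherwise the greedy (highest-value) action."""
    if mode == "explore" and np.random.rand() < epsilon:
        return int(np.random.randint(len(q_values)))
    return int(np.argmax(q_values))

action = epsilon_greedy(np.array([0.1, 0.5, 0.2]), epsilon=0.1)
```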
- get_act_shape() → int
Get the action space shape
- Returns:
Action space shape
- get_next_value(obs: numpy.ndarray) → tuple[float, numpy.ndarray]
Computes the new state value
If not using HNP, the new state value is retrieved from the value table directly. For the HNP computation, refer to the get_next_value function in the HNP class.
- Parameters:
obs – Observation
- Returns:
Next state value and value table index of observation
- get_obs_shape() → tuple
Get the observation space shape
The state space for continuous variables is tile coded based on the number of tiles set in the configuration file.
- Returns:
Tuple of discretized observation space for continuous variables and the observation space for discrete variables
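For example, with two continuous variables and one discrete variable with four levels, and assuming 20 tiles per continuous variable in the configuration (the values here are illustrative), the resulting shapes could be built like this:

```python
num_tiles = 20            # illustrative "number of tiles" config value
n_continuous = 2          # e.g. outdoor and indoor temperature
discrete_sizes = (4,)     # one discrete variable with four levels

continuous_shape = (num_tiles,) * n_continuous   # (20, 20)
obs_shape = continuous_shape + discrete_sizes    # (20, 20, 4)
```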
- get_vtb_idx_from_obs(obs: numpy.ndarray) → tuple[numpy.ndarray, numpy.ndarray]
Get the value table index from observation
For continuous variables, first get the indices as floats, then round them to integers and stack them with the discrete variable indices.
- Parameters:
obs – Observation
- Returns:
Value table index and continuous variable indices as floats
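A sketch of the float-index computation for one continuous variable, assuming uniform tiles over a known variable range:

```python
import numpy as np

def continuous_float_index(value: float, low: float, high: float, num_tiles: int) -> float:
    """Map a continuous observation to its fractional tile index."""
    frac = (value - low) / (high - low)   # normalize to [0, 1]
    return frac * (num_tiles - 1)         # fractional position in the table

float_idx = continuous_float_index(22.4, low=15.0, high=30.0, num_tiles=20)
int_idx = int(np.round(float_idx))        # rounded index used in the value table
```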
- save_results() → None
Saves training results and the learned Q table
- train() → None
Q-Learning agent training
The training follows the conventional Q-Learning update rule. If the episode length is less than the total number of weather data points, the environment does not reset at the end of an episode; it resets only once all the weather data points have been used.
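A sketch of that conventional update, where next_value is the quantity returned by get_next_value (the HNP-interpolated value when use_hnp is enabled); the learning-rate and discount names are placeholders:

```python
import numpy as np

def q_update(q_table: np.ndarray, obs_index: tuple, action: int,
             reward: float, next_value: float,
             alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Move the current estimate toward reward + discounted next value."""
    idx = (*obs_index, action)
    q_table[idx] += alpha * (reward + gamma * next_value - q_table[idx])
```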
Environment
- class hnp.environment.ObservationWrapper(*args: Any, **kwargs: Any)
Bases:
ObservationWrapper
Sinergym environment wrapper to modify observations
- __init__(env, obs_to_keep)
Constructor for observation wrapper
- Parameters:
env – Sinergym environment
obs_to_keep – Indices of state variables that are used
- Returns:
None
- observation(observation)
Remove the unused state variables from the observation
- Parameters:
observation – Full observation
- Returns:
Filtered observation
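A sketch of wrapping an environment to keep only a subset of observation variables; the indices and environment name are illustrative:

```python
from hnp.environment import ObservationWrapper, create_env

env = create_env({"name": "Eplus-5Zone-hot-discrete-v1"})  # illustrative name
# Keep only the observation variables at indices 0, 2, and 5.
env = ObservationWrapper(env, obs_to_keep=[0, 2, 5])
```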
- hnp.environment.create_env(env_config: dict | None = None) → gym.Env
Create a Sinergym environment
- Parameters:
env_config – Configuration kwargs for Sinergym. Currently this dictionary contains a single key, "name", which sets the name of the environment.
- Returns:
A configured gym environment.
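For example (the environment name is illustrative; use any name registered by your Sinergym installation):

```python
from hnp.environment import create_env

env = create_env({"name": "Eplus-5Zone-hot-discrete-v1"})
```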