API Documentation

HNP

class hnp.hnp.HNP(slow_continuous_idx)

Bases: object

Class for HNP computation

__init__(slow_continuous_idx) → None

Constructor for HNP object

Parameters:

slow_continuous_idx – Indices of the slowly-changing continuous variables

Returns:

None

get_next_value(vtb, full_obs_index, cont_obs_index_floats)

Computes the new state value of tiles using HNP

HNP is applied only to slowly-changing continuous variables. The next-state tile portions are first computed from the float indices, and the next-state value is then computed from those tile portions.

Parameters:
  • vtb – State value table

  • full_obs_index – Value table index of observation

  • cont_obs_index_floats – Continuous variable indices

Returns:

Next state value for continuous variables
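Below is a minimal sketch, not the library's implementation, of how a next-state value can be computed from tile portions for a single slowly-changing continuous variable: the fractional part of the float index weights a linear interpolation between the two neighbouring tiles.

    import numpy as np

    def interpolated_value(vtb_slice: np.ndarray, float_index: float) -> float:
        """Weight the two neighbouring tile values by their tile portions."""
        low = int(np.floor(float_index))
        high = min(low + 1, len(vtb_slice) - 1)
        portion_high = float_index - low      # fraction belonging to the next tile
        portion_low = 1.0 - portion_high      # fraction belonging to the current tile
        return portion_low * vtb_slice[low] + portion_high * vtb_slice[high]

    values_along_axis = np.array([0.0, 1.0, 4.0, 9.0])  # hypothetical value-table slice
    print(interpolated_value(values_along_axis, 1.25))  # 0.75 * 1.0 + 0.25 * 4.0 = 1.75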

Agents

class hnp.agents.Agent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)

Bases: ABC

Parent Reinforcement Learning Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None

Constructor for RL agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

Returns:

None

save_results() → None

Saves training results

abstract train() → None

RL agent training
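As a hedged sketch, a custom agent subclasses Agent and implements train(); the self.env and self.config attributes and the num_steps config key used below are assumptions, not documented internals:

    from hnp.agents import Agent

    class DoNothingAgent(Agent):
        """Hypothetical agent that always sends action 0 (illustrative only)."""

        def train(self) -> None:
            # `self.env` and `self.config` are assumed to be stored by the Agent
            # constructor; reset/step signatures depend on the gym/Sinergym version.
            obs = self.env.reset()
            for _ in range(self.config.get("num_steps", 100)):  # placeholder config key
                obs, reward, done, info = self.env.step(0)
                if done:
                    break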

class hnp.agents.RandomActionAgent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)

Bases: Agent

Random Action Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None

Constructor for RL agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

Returns:

None

train() → None

Random Action agent training

class hnp.agents.FixedActionAgent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)

Bases: Agent

Fixed Action Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None

Constructor for RL agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

Returns:

None

train() → None

Fixed Action agent training
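A hedged usage sketch for the two baseline agents; the environment name and the config keys below are placeholders rather than values documented here:

    from hnp.agents import RandomActionAgent
    from hnp.environment import create_env

    env = create_env({"name": "Eplus-5Zone-hot-continuous-v1"})  # hypothetical env name
    config = {"num_episodes": 1}                                 # placeholder config keys

    agent = RandomActionAgent(env, config, results_dir="training_results")
    agent.train()
    agent.save_results()

    # FixedActionAgent is constructed and trained the same way; the fixed action
    # itself would come from the agent configuration (exact key not documented here).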

class hnp.agents.QLearningAgent(env: sinergym.envs.EplusEnv, config: dict, obs_mask: numpy.ndarray, results_dir: str = 'training_results', use_beobench: bool = False, use_hnp: bool = True)

Bases: Agent

Q-Learning Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, obs_mask: numpy.ndarray, results_dir: str = 'training_results', use_beobench: bool = False, use_hnp: bool = True) → None

Constructor for Q-Learning agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • obs_mask – Mask to categorize variables into slowly-changing continuous, fast-changing continuous, and discrete variables

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

  • use_hnp – Enable HNP

Returns:

None
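A hedged construction sketch for a Q-Learning agent; the integer codes used in obs_mask (0 = slowly-changing continuous, 1 = fast-changing continuous, 2 = discrete), the environment name, and the config keys are assumptions about the expected conventions:

    import numpy as np

    from hnp.agents import QLearningAgent
    from hnp.environment import create_env

    env = create_env({"name": "Eplus-5Zone-hot-continuous-v1"})  # hypothetical env name

    # Assumed mask convention: one entry per observation variable, where
    # 0 = slowly-changing continuous, 1 = fast-changing continuous, 2 = discrete.
    obs_mask = np.array([0, 0, 1, 2])

    config = {"num_tiles": 20, "learning_rate": 0.1}             # placeholder config keys
    agent = QLearningAgent(env, config, obs_mask, use_hnp=True)
    agent.train()
    agent.save_results()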

choose_action(obs_index: numpy.ndarray, mode: str = 'explore') → int

Get an action following the epsilon-greedy policy

Parameters:
  • obs_index – Observation index

  • mode – Training or evaluation

Returns:

Action
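For illustration, a generic epsilon-greedy rule of the kind choose_action follows; this is not the library's exact code, and q_values stands for the row of the value table at the current observation index:

    import numpy as np

    def epsilon_greedy(q_values: np.ndarray, epsilon: float, mode: str = "explore") -> int:
        """Pick a random action with probability epsilon while exploring, else the greedy one."""
        if mode == "explore" and np.random.random() < epsilon:
            return int(np.random.randint(len(q_values)))  # exploratory action
        return int(np.argmax(q_values))                   # greedy action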

get_act_shape() → int

Get the action space shape

Returns:

Action space shape

get_next_value(obs: numpy.ndarray) → tuple[float, numpy.ndarray]

Computes the new state value

If HNP is not used, the new state value is retrieved directly from the value table. For the HNP computation, refer to the get_next_value method of the HNP class.

Parameters:

obs – Observation

Returns:

Next state value and value table index of observation

get_obs_shape() → tuple

Get the observation space shape

The state space for continuous variables is tile-coded based on the number of tiles set in the configuration file.

Returns:

Tuple of discretized observation space for continuous variables and the observation space for discrete variables
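A hedged illustration of the resulting shape, assuming three continuous variables tile-coded with a configured tile count and one discrete variable with four values (all numbers are placeholders):

    # Hypothetical layout: three continuous variables tile-coded into 20 tiles each,
    # plus one discrete variable with 4 possible values.
    num_tiles = 20                                  # placeholder for the configured tile count
    continuous_shape = (num_tiles, num_tiles, num_tiles)
    discrete_shape = (4,)

    obs_shape = continuous_shape + discrete_shape   # state axes of the value table
    print(obs_shape)                                # (20, 20, 20, 4)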

get_vtb_idx_from_obs(obs: numpy.ndarray) → tuple[numpy.ndarray, numpy.ndarray]

Get the value table index from observation

For continuous variables, the indices are first obtained as floats, then rounded to integers and stacked with the discrete variable indices.

Parameters:

obs – Observation

Returns:

Value table index and continuous variable indices as floats
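A hedged illustration of the index construction described above, assuming continuous observations normalized to [0, 1] and a placeholder tile count:

    import numpy as np

    num_tiles = 20                                        # placeholder for the configured tile count

    obs_continuous = np.array([0.37, 0.80])               # normalized continuous observations
    cont_index_floats = obs_continuous * (num_tiles - 1)  # float positions on the tile grid
    disc_indices = np.array([3])                          # indices of discrete variables

    vtb_index = np.concatenate([np.round(cont_index_floats).astype(int), disc_indices])
    print(cont_index_floats)  # [ 7.03 15.2 ]
    print(vtb_index)          # [ 7 15  3]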

save_results() → None

Saves training results and the learned Q-table

train() → None

Q-Learning agent training

Training follows the conventional Q-Learning update rule. If the episode length is shorter than the total number of weather data points, the environment does not reset at the end of an episode; it resets only once all weather data points have been used.
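For reference, a minimal sketch of the conventional Q-Learning update applied during training; the hyperparameter names alpha and gamma are generic, not the library's config keys:

    def q_update(q_sa: float, reward: float, next_value: float,
                 alpha: float = 0.1, gamma: float = 0.99) -> float:
        """Q(s, a) <- Q(s, a) + alpha * (reward + gamma * V(s') - Q(s, a)).

        With HNP enabled, next_value is the interpolated next-state value from
        get_next_value rather than a plain table lookup.
        """
        return q_sa + alpha * (reward + gamma * next_value - q_sa)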

Environment

class hnp.environment.ObservationWrapper(*args: Any, **kwargs: Any)

Bases: ObservationWrapper

Sinergym environment wrapper to modify observations

__init__(env, obs_to_keep)

Constructor for observation wrapper

Parameters:
  • env – Sinergym environment

  • obs_to_keep – Indices of the state variables to keep

Returns:

None

observation(observation)

Removes the unused state variables from the observation

Parameters:

observation – Full observation

Returns:

Filtered observation

hnp.environment.create_env(env_config: dict | None = None) → gym.Env

Creates a Sinergym environment

Parameters:

env_config – Configuration kwargs for Sinergym. Currently the dictionary supports a single key, “name”, which sets the name of the environment.

Returns:

A configured gym environment.
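A hedged end-to-end sketch of the environment helpers; the environment name and the obs_to_keep indices are placeholders:

    from hnp.environment import ObservationWrapper, create_env

    # Hypothetical Sinergym environment id; any registered Sinergym environment works.
    env = create_env({"name": "Eplus-5Zone-hot-continuous-v1"})

    # Keep only the state variables at these (placeholder) indices.
    env = ObservationWrapper(env, obs_to_keep=[0, 1, 8])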