Configuration

HNP reads the agent and environment configuration from a dictionary, so you can store it in any file format that loads into one (YAML, JSON, etc.). Below is a sample configuration in YAML format. The comment above each setting describes what it does.

agent:
    # The number of episodes for training
    num_episodes: 1460
    # Maximum number of time steps in each episode. Each time step lasts 15 minutes in Sinergym.
    horizon: 24
    # Discount factor
    gamma: 0.99
    # Number of tiles for tile coding. Use an integer to apply the same number of tiles to all continuous variables.
    # Use a list to set a different number of tiles for each continuous variable. The numbers in the list must follow the order of obs_to_keep.
    num_tiles: 20
    # Initial value for Epsilon
    initial_epsilon: 1
    # Annealing rate for Epsilon
    epsilon_annealing: 0.999
    # Initial learning rate
    learning_rate: 0.1
    # Annealing rate for learning rate
    learning_rate_annealing: 0.999
    # The action index for training. This is only used by the fixed action agent.
    action_index: 9

env:
    # Sinergym environment name
    name: Eplus-5Zone-hot-discrete-v1
    # Whether to normalize observations
    normalize: True
    # The observation variables to use for training. Use an empty list to use all observation variables.
    obs_to_keep: [1, 2, 8, 10]
    # The type of each observation variable:
    #   0 - slowly-changing continuous variable
    #   1 - fast-changing continuous variable
    #   2 - discrete variable
    mask: [0, 0, 0, 0]
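
Because the configuration is read as a dictionary, any loader that produces nested dictionaries should work. The sketch below is a minimal illustration and not part of HNP: it parses a JSON version of part of the sample above with Python's standard json module and runs a few sanity checks before the dictionary is used. The helper check_config is hypothetical, and the rule that mask has one entry per variable in obs_to_keep is an assumption inferred from the sample, not a documented requirement.

```python
import json

# JSON equivalent of a trimmed version of the YAML sample above.
CONFIG_JSON = """
{
    "agent": {
        "num_episodes": 1460,
        "horizon": 24,
        "gamma": 0.99,
        "num_tiles": 20
    },
    "env": {
        "name": "Eplus-5Zone-hot-discrete-v1",
        "normalize": true,
        "obs_to_keep": [1, 2, 8, 10],
        "mask": [0, 0, 0, 0]
    }
}
"""


def check_config(config):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    obs_to_keep = config["env"]["obs_to_keep"]
    mask = config["env"]["mask"]

    # Assumption: when obs_to_keep is non-empty, mask has one entry
    # per kept observation variable.
    if obs_to_keep and len(mask) != len(obs_to_keep):
        problems.append("mask and obs_to_keep differ in length")

    # The sample documents three variable types: 0, 1, and 2.
    if any(m not in (0, 1, 2) for m in mask):
        problems.append("mask entries must be 0, 1, or 2")

    return problems


# json.loads returns plain nested dictionaries and lists.
config = json.loads(CONFIG_JSON)
problems = check_config(config)
print(problems or "configuration looks consistent")
```

A YAML file can be handled the same way by swapping the loader for a YAML parser that returns a dictionary. The checks operate on the dictionary only, so they do not depend on the file format.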