self-driven-cab
PROBLEM: We aim to create a self-driving cab whose job is to pick up a passenger at one location and drop them off at another. The cab must drop the passenger off at the right location, save the passenger's time by taking the minimum time possible, and respect the passenger's safety and the traffic rules.

ENVIRONMENT:

(figure: the grid-world environment in which the cab operates)
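The training functions below assume an environment env, a Q-table q_table, and an action-selection helper choose_a, none of which are defined in this README. The following is a minimal setup sketch, assuming the environment is OpenAI Gym's Taxi-v3 (which matches the pick-up/drop-off task and the -10 penalty checked below, classic Gym API) and that choose_a is epsilon-greedy; the exploration rate epsilon is a hypothetical value not given in the original.

import gym
import numpy as np

# Assumption: the cab environment is Gym's Taxi-v3 (classic Gym API, where
# env.reset() returns a state and env.step() returns 4 values).
env = gym.make("Taxi-v3")

# One Q-value per (state, action) pair, initialised to zero.
q_table = np.zeros([env.observation_space.n, env.action_space.n])

epsilon = 0.1  # hypothetical exploration rate, not specified in the original

def choose_a(state):
    # Epsilon-greedy selection over the current (global) Q-table.
    if np.random.rand() < epsilon:
        return env.action_space.sample()   # explore
    return int(np.argmax(q_table[state]))  # exploit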

ALGORITHM: The q_learning function implements tabular Q-learning, moving Q(s, a) toward the target r + gamma * max_a' Q(s', a') with learning rate alpha.

def q_learning(episode_limit, q_table, alpha=0.1, gamma=0.6):
    # Tabular Q-learning; fills y[i] with the number of steps taken in episode i.
    y = np.zeros((episode_limit, 1))
    for i in range(episode_limit):
        state = env.reset()  # reset the environment at the start of every episode
        epoch, penalty, reward = 0, 0, 0
        done = False

        while not done:
            action = choose_a(state)  # epsilon-greedy action selection
            next_state, reward, done, info = env.step(action)
            # One-step Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
            )
            if reward == -10:  # the environment signals a penalised action with reward -10
                penalty += 1
            state = next_state
            epoch += 1
        y[i] = epoch  # steps needed to finish this episode
    return y
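A usage sketch for the function above, assuming the setup shown earlier; the episode count of 10000 is a hypothetical choice, not taken from the original.

steps = q_learning(10000, q_table)  # hypothetical episode count
print("mean steps over the last 100 episodes:", steps[-100:].mean())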

The q_learning_lambda function implements Q-learning with eligibility traces (Watkins's Q(lambda)): every visited state-action pair accumulates a trace, the TD error is applied to all pairs in proportion to their traces, and the traces decay by gamma * lam while the agent keeps acting greedily but are cut when an exploratory action is taken.

def q_learning_lambda(episode_limit, q_table, alpha=0.1, gamma=0.6, lam=0.5):
    # Watkins's Q(lambda): Q-learning with eligibility traces.
    y = np.zeros((episode_limit, 1))

    for i in range(episode_limit):
        state = env.reset()  # reset the environment at the start of every episode
        epoch, penalty, reward = 0, 0, 0
        done = False
        # One eligibility trace per (state, action) pair, reset each episode.
        e = np.zeros([env.observation_space.n, env.action_space.n])

        while not done:
            action = choose_a(state)
            next_state, reward, done, info = env.step(action)
            next_action = choose_a(next_state)
            best_a = np.argmax(q_table[next_state])
            # TD error uses the greedy action in the next state (off-policy target).
            error = reward + gamma * q_table[next_state, best_a] - q_table[state, action]
            e[state, action] += 1  # accumulate the trace for the visited pair
            # Propagate the TD error to all state-action pairs, weighted by their traces.
            q_table += alpha * error * e
            if next_action == best_a:
                e *= gamma * lam  # decay all traces while the behaviour stays greedy
            else:
                e[:] = 0  # exploratory action: cut the traces
            if reward == -10:
                penalty += 1
            state = next_state
            epoch += 1
        y[i] = epoch
    return y
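A usage sketch for the Q(lambda) variant, trained from a freshly re-initialised Q-table so the two methods can be compared; the episode count and the matplotlib plot are assumptions, not part of the original.

import matplotlib.pyplot as plt

# Re-initialise the shared Q-table so the Q(lambda) run starts from scratch
# (choose_a reads the same global q_table that the training functions update in place).
q_table = np.zeros([env.observation_space.n, env.action_space.n])
steps_lam = q_learning_lambda(10000, q_table, lam=0.5)  # hypothetical episode count

# Plot steps per episode for both methods.
plt.plot(steps, label="Q-learning")
plt.plot(steps_lam, label="Q(lambda)")
plt.xlabel("episode")
plt.ylabel("steps to finish")
plt.legend()
plt.show()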

RESULTS: After training, the cab can pick up and drop off the passenger in fewer than 20 steps per episode.

(result plots)
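A sketch of how the trained greedy policy could be checked against the "fewer than 20 steps" claim, assuming the same classic Gym API as above; the single-episode roll-out below is an illustration, not code from the repository.

state = env.reset()
done = False
steps_taken = 0
while not done:
    action = int(np.argmax(q_table[state]))  # act greedily w.r.t. the learned Q-table
    state, reward, done, info = env.step(action)
    steps_taken += 1
print("steps to pick up and drop off the passenger:", steps_taken)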
