GitHub - manqingzhou/self-driven-cab: reinforcement learning algorithms

PROBLEM: We aim to create a self-driving cab, its job is to pick up the passenger at one location and drop them off in another. This cab will also take care of: Drop off the passenger to the right location. Save passenger's time by taking minimum time possible to drop off Take care of passenger's safety and traffic rules.

ENVIROMENT:

ALGORITHM: The function of Q_learning

def q_learning(episode_limit,q_table,alpha =0.1,gamma = 0.6):
    y = np.zeros((episode_limit,1))
    for i in range(1,episode_limit):
        state = env.reset() #every single time we reset the enviroment
        epoch,penalty,reward =0,0,0
        done = False
   
        while not done:
            action = choose_a(state)
            next_state,reward,done,info = env.step(action)
            q_table[state,action] = q_table[state,action] + alpha *(reward + gamma * np.max(q_table[next_state])-q_table[state,action])
            if reward == -10:
                penalty += 1
            state = next_state
     
      
            epoch += 1
        y[i] = epoch

The function of q_learning_lambda

def q_learning_lambda(episode_limit,q_table,alpha =0.1,gamma = 0.6,lam = 0.5):
    y = np.zeros((episode_limit,1))

    for i in range(1,episode_limit):
        state = env.reset() #every single time we reset the enviroment
        epoch,penalty,reward =0,0,0
        done = False
        e = np.zeros([env.observation_space.n,env.action_space.n])
   
        while not done:
            action = choose_a(state)
            next_state,reward,done,info = env.step(action)
            next_action = choose_a(next_state)
            best_a = np.argmax(q_table[next_state])
            error = reward + gamma * q_table[next_state,best_a] - q_table[state,action]
            e[state,action] += 1
            q_table[state,action] += alpha * error * e[state,action]
            if next_action == best_a:
                e[state,action] = gamma * lam * e[state,action]
            else:
                e[state,action] = 0
            if reward == -10:
                penalty += 1
            state = next_state
     
      
            epoch += 1
        y[i] = epoch

RESULTS: Through learning from the past, the cab now can pick up and drop off passenger in less than 20 steps.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
image		image
.DS_Store		.DS_Store
README.md		README.md
project.ipynb		project.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

manqingzhou/self-driven-cab

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages