Loss function (in Policy Gradient section), optimizer and entropy

Dear Mr. Hongzi,
I am interested in your resource scheduling method, but I am stuck on your network class. I can't understand why you used the function below:
`loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N`
Did you design this loss function yourself? If not, what is this loss function called?
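
To make sure I am reading the expression correctly, here is a minimal NumPy sketch of what I think it computes. I am assuming that `prob_act` is an `N x num_actions` matrix of action probabilities, `actions` holds the index of the action taken in each sample, `values` holds one weight per sample, and `N` is the batch size. The function name `weighted_log_prob` and the sample numbers are mine, not from the repository.

```python
import numpy as np

def weighted_log_prob(prob_act, actions, values):
    """NumPy reading of: T.log(prob_act[T.arange(N), actions]).dot(values) / N"""
    prob_act = np.asarray(prob_act, dtype=float)
    actions = np.asarray(actions, dtype=int)
    values = np.asarray(values, dtype=float)
    n = prob_act.shape[0]  # assumed to be the batch size N

    # For each row i, pick the probability of the action taken in that sample.
    chosen = prob_act[np.arange(n), actions]

    # Weight each log-probability by its value, sum, and average over the batch.
    return np.log(chosen).dot(values) / n

# Made-up inputs, only to check the arithmetic.
prob_act = np.array([[0.5, 0.5], [0.25, 0.75]])
actions = np.array([0, 1])
values = np.array([1.0, 2.0])
result = weighted_log_prob(prob_act, actions, values)
```

If this reading is right, the result is the batch average of `log(prob of the taken action) * value`, i.e. `(log(0.5) * 1.0 + log(0.75) * 2.0) / 2` for the inputs above.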