Can you tell me how to preprocess skeleton sequence data (the coordinates of 15 keypoints per frame, over several frames) and then train a network on it for prediction?
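
To make the setup concrete, here is a minimal sketch of one common way such data might be prepared. It rests on several assumptions that are not stated in the question: the coordinates are 2D (x, y), joint 0 serves as the root joint, and "prediction" means forecasting a future frame from a few past frames. The names `center_on_root`, `flatten`, and `make_windows` are hypothetical helpers, not part of any library.

```python
from typing import List, Tuple

NUM_KEYPOINTS = 15                  # from the question
Point = Tuple[float, float]         # assumption: 2D (x, y) coordinates
Frame = List[Point]                 # one frame = 15 keypoints


def center_on_root(frame: Frame, root: int = 0) -> Frame:
    """Subtract the root joint so the pose is translation-invariant.

    Assumption: joint index `root` (0 here) is a stable joint such as the hip.
    """
    rx, ry = frame[root]
    return [(x - rx, y - ry) for x, y in frame]


def flatten(frame: Frame) -> List[float]:
    """Turn 15 (x, y) pairs into one feature vector of length 30."""
    return [coord for point in frame for coord in point]


def make_windows(
    frames: List[Frame], history: int, horizon: int = 1
) -> List[Tuple[List[List[float]], List[float]]]:
    """Build (input, target) pairs with a sliding window.

    input  = `history` consecutive frames (each a 30-value vector)
    target = the frame `horizon` steps after the last input frame
    """
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be >= 1")
    for frame in frames:
        if len(frame) != NUM_KEYPOINTS:
            raise ValueError("every frame must contain 15 keypoints")

    feats = [flatten(center_on_root(f)) for f in frames]
    pairs = []
    for start in range(len(feats) - history - horizon + 1):
        x = feats[start:start + history]
        y = feats[start + history + horizon - 1]
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Synthetic 6-frame sequence, for illustration only (not real data).
    demo = [[(float(t + k), float(t - k)) for k in range(NUM_KEYPOINTS)]
            for t in range(6)]
    pairs = make_windows(demo, history=3)
    print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

Each (input, target) pair could then be fed to a sequence model such as an RNN or a temporal convolution. If the goal is classification rather than forecasting, the target would be a label per window instead of a future frame.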