Confused about the training process

Thanks for sharing your code and for the fantastic paper. I am confused about the training process. The pipeline in the paper states that the estimated depth and the poses are both trained. The code, however, splits this process into **two independent parts**, whereas SFM is a complete process that estimates and refines camera poses and depth at the same time. Am I misunderstanding your idea?