Thanks for sharing your code and the fantastic paper. I am confused about the training process. The pipeline in the paper states that the estimated depth and poses are trained jointly. The code, however, splits this process into two independent parts, whereas SfM is a complete process that estimates and refines camera poses and depth at the same time. Am I misunderstanding your approach?