From @rpc333
On my first test run I didn't properly set the source and title properties when ingesting files, which ended up being very painful because paragraphs and full text are stored in separate indices. Having more control over the data already in the ES store, and being able to manipulate it more generally, seems important for being able to use the system with confidence. Once we've ingested 25k+ papers, starting over again won't be an option for us. We should find ways to give ourselves more "access ducts" into the internals here.
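One possible "access duct" for the source/title case is a small helper that patches fields in place across both indices, so a metadata mistake doesn't force a re-ingest. This is only a minimal sketch built on Elasticsearch's `_update_by_query` endpoint. The index names (`paragraphs`, `fulltext`), the `file_id` match field, and the helper names are all hypothetical, since I don't know how the real indices are named or which field ties a paragraph back to its file. It also assumes the match field is mapped as a keyword so a `term` query hits it exactly.

```python
import json
import urllib.request

# Hypothetical index names: the real paragraph and full-text indices may differ.
INDICES = ["paragraphs", "fulltext"]


def build_update_body(match_field, match_value, new_values):
    """Build an _update_by_query body that overwrites fields on matching docs.

    match_field is assumed to be a keyword field shared by both indices
    (for example a hypothetical "file_id").
    """
    return {
        # Select only the documents that belong to the mis-ingested file.
        "query": {"term": {match_field: match_value}},
        # Painless script: copy every key in params.values into the document.
        "script": {
            "lang": "painless",
            "source": "ctx._source.putAll(params.values)",
            "params": {"values": new_values},
        },
    }


def update_index(base_url, index, body):
    """POST the body to <base_url>/<index>/_update_by_query and return the JSON reply."""
    url = f"{base_url}/{index}/_update_by_query?conflicts=proceed"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def update_all(base_url, body, indices=INDICES):
    """Apply the same fix to every index so paragraphs and full text stay in sync."""
    return {index: update_index(base_url, index, body) for index in indices}
```

Usage would look something like `update_all("http://localhost:9200", build_update_body("file_id", "abc123", {"title": "Corrected title", "source": "arxiv"}))`, where the URL and values are placeholders. Running the same body against both indices is the point: it removes the need to remember that the fix has to be made twice.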