-
Notifications
You must be signed in to change notification settings - Fork 4
Description
How it's leaky abstraction?
output_path is the path to store all the pechas created by the formatters. It defaults to ~/.openpecha/pechas/
But in OCRFormatter's create_opf, the output_path is passed as path of OpenPecha which is opf_path
now, the caller code, eg OCR-pipelines, needs to create opf_path for pecha, which in turn requires to create pecha_id. But the pecha_id generation is handled by Metadata, which is only created in the Formatters. So, with currently implementation, pecha will be saved at opf_path created by caller code. Since, it's doesn't have access to Metadata creation, metadata will generate new pecha_id. Now, we ended up with different, pecha_id in opf_path and `meta.yml.
Therefore, I think this is a leaky abstraction. The caller code only needs to provide where to store the all the pechas to the Formatters.
Why this problem exists?
I think, there is two scenarios, creating and loading pecha and we don't have clean way to handle these two.
Solution
1. When Creating new pechas
We should initialise the pecha object with the actual data like, base, layers, metadata, etc.
When saving, we should provide output_path which is the parent path of the pecha path like {output_path}/{pecha_id}/{pecha_id}.opf. Now the output_path is configurable, which is desired behaviour.
class OpenPechaFS(OpenPecha):
...
@property
def opf_path(self):
return self.base_path / self.pecha_id / f"{self.pecha_id}.opf"
def save(output_path: Path) -> Path:
self.base_path = output_path
...
return self.opf_path2. When Loading existing pechas
I think we should go with classmethods, for eg:
pecha = OpenPechaFS.from_path(<path_to_pecha>) # loads local pechapecha = OpenPechaFS.from_id(<pecha_id>) # downloads pecha from githubpecha = OpenPechaGitRepo.from_commit_sha(<commit_sha>) # loads pecha from specific commit sha