Hello @AbhinavChede This is wonderful work! I can see that you have carefully researched the formats of the example files and written the regexes deliberately. I can only comment on the Greengenes file:
Hello @pavia27 Please work with Abhinav to sort out the KEGG file format. It may be helpful if we have a small sample set of files placed under
@pavia27 any thoughts? Thanks!
Hi @qiyunzhu,
I have added functionality to BinaRena for reading annotation files. This is only a preliminary version: it gets the job done, but it is probably not optimal in terms of run time. I am also not sure whether this is how you imagined the code should check if the current file is an annotation file. See here. I still need to find a better regex for the Greengenes file to account for exceptions in the order of the taxa. For now, it recognizes most of the taxa in the test cases.
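One possible way to make the Greengenes check independent of rank order is to test each semicolon-separated token on its own instead of matching the whole lineage with a single regex. The sketch below is in Python for brevity and is not BinaRena's actual code; the function name `parse_greengenes_lineage` is hypothetical, and it assumes the usual Greengenes rank prefixes (`k__`, `p__`, `c__`, `o__`, `f__`, `g__`, `s__`).

```python
import re

# Assumed Greengenes-style rank prefixes: one of k, p, c, o, f, g, s plus "__".
RANK_TOKEN = re.compile(r"^([kpcofgs])__(.*)$")


def parse_greengenes_lineage(lineage):
    """Return {rank_code: name} for a lineage string, or None if it does not
    look like a Greengenes lineage. The order of the ranks is not checked."""
    ranks = {}
    for token in lineage.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled semicolons
        match = RANK_TOKEN.match(token)
        if match is None:
            return None  # any non-conforming token rejects the whole string
        code, name = match.groups()
        ranks[code] = name  # name may be empty, e.g. "s__"
    return ranks or None


# Hypothetical usage: ranks listed out of order are still recognized.
print(parse_greengenes_lineage("g__Escherichia; k__Bacteria; s__coli"))
```

Checking tokens individually trades one complex pattern for a simple one applied several times, which may also make it easier to decide whether a whole column is taxonomy by sampling a few cells.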
Also, @pavia27, is this how you imagined the KEGG support should work? Right now, the code reads the KEGG annotation file and outputs which genes each contig has. It does not indicate the position of each gene, nor does it show the missing genes. I only tested the KEGG support with a very small dataset because I lack proper test samples. Let me know what you think and whether there is anything I should add.
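To make the KEGG discussion concrete, here is a minimal sketch of grouping genes by contig and listing the ones that are missing relative to a reference set. It is in Python for brevity and is not BinaRena's actual code. The input layout is an assumption, since the KEGG file format is still to be agreed on: two tab-separated columns, contig ID then KO identifier. The names `kos_by_contig` and `missing_kos` are hypothetical.

```python
from collections import defaultdict


def kos_by_contig(lines):
    """Group KO identifiers by contig.

    Assumed layout (hypothetical): "<contig_id>\t<KO_id>" per line.
    Blank lines, "#" comments, and rows without a KO are skipped.
    """
    contig_kos = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2 or not fields[1]:
            continue  # contig with no KO assignment
        contig_kos[fields[0]].add(fields[1])
    return dict(contig_kos)


def missing_kos(found, expected):
    """Return the expected KOs that were not found, sorted for stable output."""
    return sorted(set(expected) - set(found))


# Hypothetical usage with made-up rows and a made-up reference set.
rows = ["c1\tK00001", "c1\tK00002", "c2\tK00001", "c3"]
per_contig = kos_by_contig(rows)
print(missing_kos(per_contig["c2"], ["K00001", "K00002"]))
```

Reporting gene positions would need an extra column in the input, which this sketch does not assume.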