* Add WanDB to Hinglish (#48)
* Adding SentencePiece
* Rearrange
* Adding finetuning scripts
* Making dir structure more managable
* Add Random Search
* Add majority voting
* Strip notebooks
* Add nb stripout from fastai standalone version
* Move everything to one notebook
* Change the name of the file saved
* Change the parameters
* Combine common code
* Pass "name" to methods in hinglish utils
* fix imports
* Pass name as variable to add_padding
* Remove output
* Fix typo
* Change names for ensemble models
* Change from "output" to name of the LM model
* Fix typo
* Remove hardcoding for epochs
* isort and black :)
* Move everything to hinglish
* Coffe isn't good for me
* Break everything into sensible methods
* import clash
* fix run_valid
* Change notebook according to hinglish.py
* Change logs
* Fix imports
* Change method names
* Remove tf dependency
* Remove tf from requirements.txt
* Remove hardcoding from pd columns
* Use store_attr to load variables for class
* Fix the size mismatch error by changing final_test.json file
* nb-stripout worked
* Add majority voting explanation
* extract if tarfile and run_language_modeling documentation
* Split the transformers notebook
* Something broke, I don't know what.
* Remove setup
* Things would be easier if I knew OOP or Python better
* Will fix this later is this works¿
* nan sent¿
* Print eval and test metrics
* Change the label for eval
* Change the file with empty clean_text
* Remove eval testing for now
* Logfile name
* moving the part which copies things to drive here
* Fix formatting add pathlib
* add drivepath
* Changed the file paths
* Remove additional code
* Update model performance after reproducing code (#40)
  - Reproduced on Monday 29th December.
  - [Training Files](https://drive.google.com/drive/folders/12qEbxbefBY24-YqahVV0v7q_IFyxz3L8?usp=sharing)
  - [Model/Output files](https://drive.google.com/drive/folders/1x-6klxSJEQu5gUOR1zHUHyHjHrApKmRD?usp=sharing)
* Strip NB
* Black and Isort all Python code
* Squashed commit of the following (Ref wandb):
  commit 733d6eb Author: meghanabhange <meghanabhange13@gmail.com> Date: Tue Oct 13 14:18:24 2020 +0530 Add sweeps for wandb
  commit 2f10366 Author: meghanabhange <meghanabhange13@gmail.com> Date: Tue Oct 13 12:45:24 2020 +0530 Add command line training
  commit 9f3680d Author: meghanabhange <meghanabhange13@gmail.com> Date: Tue Oct 13 00:48:44 2020 +0530 Removing print statements for the variables which are being logged by wandb
  commit e4d15be Author: meghanabhange <meghanabhange13@gmail.com> Date: Tue Oct 13 00:35:54 2020 +0530 Add wname
  commit e3c9896 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 23:54:18 2020 +0530 Change timestamp logic
  commit a80379d Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 23:50:41 2020 +0530 Add configs to wandb
  commit 77024af Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 23:43:28 2020 +0530 Add hyperparameters
  commit 5aeb78f Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:52:08 2020 +0530 Need to push before power cuts
  commit 738f1fb Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:47:27 2020 +0530 We need apex?
  commit 37361b0 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:46:05 2020 +0530 transformers??
  commit f9f9a11 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:43:29 2020 +0530 Clean notebooks
  commit 3d6f182 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:41:45 2020 +0530 Fix requirements
  commit fec8e54 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:39:13 2020 +0530 Edit requirements and typo
  commit eaf18d3 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:35:13 2020 +0530 Wandb notebook changes
  commit b35d72f Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:32:58 2020 +0530 Add wandb
  commit 9ad9372 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 22:13:16 2020 +0530 Add final metrics to wandb too
  commit 129a6ab Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:43:58 2020 +0530 Remove stray print statement
  commit 66009c6 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:39:29 2020 +0530 Remove redundant print statements
  commit 4497615 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:31:42 2020 +0530 Change valid logging
  commit 05e7fe8 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:24:13 2020 +0530 Make logging per epoch
  commit 5709ff3 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:15:22 2020 +0530 Log traning loss per batch
  commit 0c63488 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:14:22 2020 +0530 Print metrics too
  commit 12115e8 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:11:42 2020 +0530 Change run valid
  commit 8b76118 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 21:03:52 2020 +0530 Init wandb only once and run valid at every batch
  commit 3a6742a Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 20:50:54 2020 +0530 Change name of logging variables
  commit 435ecc4 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 20:46:59 2020 +0530 save all output files created
  commit e898996 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 20:40:20 2020 +0530 Only log metrics for wandb
  commit fc997e9 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 20:23:53 2020 +0530 Make wandb logs to dict
  commit 07c8e5f Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 20:22:00 2020 +0530 Undo run_lm changes
  commit 9fbc95e Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 19:58:30 2020 +0530 Remove logger
  commit 18a5fc1 Author: meghanabhange <meghanabhange13@gmail.com> Date: Mon Oct 12 19:46:54 2020 +0530 Convert the existing logging to wandb logs
* Remove the branch switching in notebook
* Didn't remove merge conflict properly
* Put random back in the universe
* Resolve merge conflicts in hinglishutils
* Black Hinglishutils
* Fix import errors
* Extract only tarfile
* This is where tests would have helped right?
* import time
* Fix imports
* Fix imports more

Co-authored-by: meghanabhange <meghana.bhange@cumminscollege.in>
Co-authored-by: meghanabhange <34004739+meghanabhange@users.noreply.github.com>
Co-authored-by: meghanabhange <meghanabhange13@gmail.com>

* Add MIT License
* Created using Colaboratory
* graph exploration
* resolving conflicts

Co-authored-by: Nirant <NirantK@users.noreply.github.com>
Co-authored-by: meghanabhange <meghana.bhange@cumminscollege.in>
Co-authored-by: meghanabhange <34004739+meghanabhange@users.noreply.github.com>
Co-authored-by: meghanabhange <meghanabhange13@gmail.com>
@@ -0,0 +1,165 @@
{
Can we have a single clean function for tweets that all notebooks use, instead of a new one in each notebook? Having a separate one per notebook will change how the cleaned file looks, right?
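A minimal sketch of such a shared helper, assuming it lives in one module (for example `hinglishutils`) that every notebook imports. The name `clean_tweet` and the cleaning rules (drop URLs and @mentions, keep the hashtag word, collapse whitespace, lowercase) are hypothetical, not what the notebooks currently do.

```python
import re

# Hypothetical cleaning rules; adjust to whatever the team agrees on.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalise one tweet so every notebook produces the same cleaned text."""
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = HASHTAG_RE.sub(r"\1", text)   # keep the hashtag word, drop the '#'
    text = WHITESPACE_RE.sub(" ", text)  # collapse runs of whitespace
    return text.strip().lower()
```

For example, `clean_tweet("@user Kya baat hai!! https://t.co/xyz #awesome")` returns `"kya baat hai!! awesome"`. Because every notebook would call the same function, the cleaned files stay identical, and changing a rule means editing one place.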
@@ -0,0 +1,165 @@
{
Um, I'm confused. Where are we using data_lst again? Shouldn't this be appended to text?
@@ -0,0 +1,74 @@
{
A question here: is "outputs" inside the "data" dir? Are we creating this dir somewhere? Also, shouldn't we keep the code that downloads and extracts the 7z archive from gdrive somewhere? Otherwise this will be hard to figure out again in 5 months. Or do we not want to keep this code public?
Good question. We should remove this code from the public repo right away.
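On the first question: if nothing else creates `outputs`, the notebook could create it explicitly before writing to it. A minimal sketch, assuming the `data/outputs` layout asked about above; the helper name `ensure_output_dir` is hypothetical.

```python
from pathlib import Path
from typing import Union


def ensure_output_dir(data_dir: Union[str, Path] = "data", name: str = "outputs") -> Path:
    """Return data_dir/name, creating it (and data_dir itself) if missing."""
    out_dir = Path(data_dir) / name
    # parents=True also creates data_dir; exist_ok=True makes re-running the cell harmless
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling it at the top of the notebook means later cells can write into the returned path without checking whether the directory exists.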
@@ -0,0 +1,903 @@
{
@/radhika, look into relative paths using pathlib. This notebook is very Colab-specific.
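One way to keep the Colab-specific part in a single place, sketched under assumptions: Colab is detected by whether `google.colab` can be imported, `colab_root` stands for wherever Drive is mounted (a hypothetical value, not a path from this repo), and everywhere else the data directory is the relative path `data`. The names `in_colab` and `get_data_dir` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def in_colab() -> bool:
    """True if the google.colab package is importable, i.e. we are on Colab."""
    try:
        import google.colab  # noqa: F401  (only importable on Colab)
    except ImportError:
        return False
    return True


def get_data_dir(colab_root: Optional[str] = None, relative: str = "data") -> Path:
    """Resolve the data directory once, instead of hardcoding Colab paths per cell."""
    if colab_root is not None and in_colab():
        # On Colab, data lives under the (hypothetical) mounted Drive folder
        return Path(colab_root) / relative
    # Elsewhere, use a path relative to the current working directory
    return Path(relative).resolve()
```

Every later cell would then use `data_dir = get_data_dir(colab_root=...)` and build file paths from `data_dir`, so running locally needs no edits.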
@@ -0,0 +1,903 @@
{
Avoid using %cd directly in the code. Could we use the Path(data_dir) / "filename" convention instead?
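Roughly what that convention looks like: build the full path with `/` instead of changing directories, so later cells do not depend on which `%cd` ran last. `read_text_file` and the file names are placeholders.

```python
from pathlib import Path


def read_text_file(data_dir: str, filename: str) -> str:
    """Read data_dir/filename without changing the working directory."""
    # Instead of `%cd data_dir` followed by open(filename):
    path = Path(data_dir) / filename
    return path.read_text(encoding="utf-8")
```

The working directory stays wherever the notebook started, so cells can be re-run in any order.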
@@ -0,0 +1,903 @@
{
Try to use a consistent naming convention for variable names. Ref: https://www.python.org/dev/peps/pep-0008/#naming-conventions
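For reference, the PEP 8 conventions applied to names like the ones in this notebook; the identifiers below are illustrative only.

```python
# PEP 8: constants in UPPER_CASE, classes in CapWords,
# functions, methods and variables in snake_case.
MAX_TEXT_LEN = 280


class TweetDataset:  # not tweet_dataset or tweetDataset
    def __init__(self, texts):
        # snake_case attribute (not cleanTexts / Clean_Texts)
        self.clean_texts = [t.strip()[:MAX_TEXT_LEN] for t in texts]

    def num_examples(self) -> int:  # not numExamples
        return len(self.clean_texts)


train_data = TweetDataset(["  hello ", "world"])  # not trainData / TrainData
```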
@@ -0,0 +1,903 @@
{
Is stopwords-hinglish.txt uploaded locally?
That will be hard to reproduce for someone who doesn't have access to the file. Could we add the text file to Drive and download it with gdown instead?
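A minimal sketch of the gdown approach, assuming the stopword file is shared on Drive with link access and has one word per line. `STOPWORDS_FILE_ID` is a placeholder for the real file id, and `fetch_stopwords` / `load_stopwords` are hypothetical helpers; gdown is imported only when a download is actually needed.

```python
from pathlib import Path
from typing import Set

# Placeholder: replace with the id of the shared stopwords-hinglish.txt on Drive.
STOPWORDS_FILE_ID = "YOUR_DRIVE_FILE_ID"


def fetch_stopwords(dest: str = "data/stopwords-hinglish.txt",
                    file_id: str = STOPWORDS_FILE_ID) -> Path:
    """Download the stopword list from Drive with gdown unless it already exists."""
    path = Path(dest)
    if not path.exists():
        import gdown  # imported lazily so a local copy works without gdown installed

        path.parent.mkdir(parents=True, exist_ok=True)
        gdown.download(f"https://drive.google.com/uc?id={file_id}", str(path), quiet=False)
    return path


def load_stopwords(path: Path) -> Set[str]:
    """Read one stopword per line, ignoring blank lines and surrounding spaces."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}
```

Usage would be `stopwords = load_stopwords(fetch_stopwords())`; anyone with the link gets the same file, and re-runs skip the download.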
Also, @NirantK, when reorganising, a lot of files will break because of their file paths. Currently they have data_path like

Also, notebooks like
meghanabhange left a comment
- Reordering would require changing the data paths in the notebooks so that they don't break.
- It would also require changing how imports work in notebooks that import from the `hinglish` and `hinglishutils` scripts.
- `Vocabulary_Count.ipynb` is repeated.
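For the imports, one common workaround is to put the repository root on `sys.path` at the top of each notebook. This sketch assumes the `hinglish` and `hinglishutils` scripts sit at the repository root and the notebooks one folder below it; `add_repo_root_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(notebook_dir: str = ".", levels_up: int = 1) -> Path:
    """Put the directory `levels_up` above notebook_dir at the front of sys.path."""
    if levels_up < 1:
        raise ValueError("levels_up must be at least 1")
    # parents[0] is the immediate parent, so levels_up=1 means "one folder up"
    root = Path(notebook_dir).resolve().parents[levels_up - 1]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

A notebook in a subfolder would call `add_repo_root_to_path()` before `import hinglishutils`; if the folders are moved again, only `levels_up` changes.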
No description provided.