T2T batching by varisd · Pull Request #786 · ufal/neuralmonkey

varisd · 2019-01-30T12:20:08Z

included batching scheme methods from:
https://github.com/tensorflow/tensor2tensor/blob/415585f40d9f21c56df7bda35033bc915d82321e/tensor2tensor/utils/data_reader.py

jindrahelcl · 2019-02-22T13:47:54Z

neuralmonkey/readers/string_vector_reader.py

        return np.array(numbers, dtype=dtype)

-    def reader(files: List[str])-> Iterable[List[np.ndarray]]:
+    def reader(files: List[str]) -> Iterable[List[np.ndarray]]:


This is unrelated to the change; it will only introduce a conflict with the TF dataset branch.

But if it won't pass Travis otherwise, leave it in.

jindrahelcl · 2019-02-22T13:48:33Z

tests/hier-multiattention.ini

 output="tests/outputs/hier-multiattention"
 overwrite_output_dir=True
 epochs=1
+batch_size=1


Batch size shouldn't become mandatory just because there is a workaround somewhere.

jindrahelcl · 2019-02-22T13:49:21Z

neuralmonkey/learning_utils.py

                        summaries=True)
+                    # workaround: we need to use validation batching scheme
+                    #             during evaluation
+                    batch.batching = BatchingScheme(batch_size=cfg.batch_size)


This is not a validation batching scheme. Drop this change; it already works correctly in my refactor, and this would needlessly introduce a conflict.

jindrahelcl · 2019-02-22T13:49:46Z

neuralmonkey/dataset.py

+            batch sizes and sequence length tolerance.
+        min_length: int, sequences shorter than this will be skipped.
+    Return:
+         A dictionary with parameters that can be passed to input_pipeline:


This is not true.

jindrahelcl · 2019-02-22T13:50:55Z

neuralmonkey/dataset.py



+def _bucket_boundaries(max_length, min_length=8, length_bucket_step=1.1):
+    """Create a default set of length-bucket boundaries."""


I would add an example of the input and output; I don't really understand why length_bucket_step is a float.
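To illustrate the input/output example requested here, the following is a minimal sketch of how such a helper might behave, assuming the boundaries grow geometrically as in the tensor2tensor code the PR borrows from. Under that assumption, `length_bucket_step` is a float because it acts as a multiplicative growth factor rather than an additive step, so buckets get wider as sequences get longer. The un-prefixed function name, the type annotations, and the `ValueError` check are illustrative choices, not the code in this PR.

```python
from typing import List


def bucket_boundaries(max_length: int,
                      min_length: int = 8,
                      length_bucket_step: float = 1.1) -> List[int]:
    """Return increasing length-bucket boundaries below max_length.

    Illustrative sketch only; assumes geometric growth of the boundaries.
    """
    if length_bucket_step <= 1.0:
        raise ValueError("length_bucket_step must be greater than 1.0")

    boundaries = []  # type: List[int]
    x = min_length
    while x < max_length:
        boundaries.append(x)
        # Grow by the multiplicative factor, but always by at least one token.
        x = max(x + 1, int(x * length_bucket_step))
    return boundaries


print(bucket_boundaries(40, min_length=8, length_bucket_step=1.5))
# [8, 12, 18, 27]
```

With the default factor of 1.1, short lengths advance by one token at a time (the `x + 1` branch wins), and the multiplicative growth only becomes visible for longer sequences.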

jindrahelcl · 2019-02-22T13:51:02Z

neuralmonkey/dataset.py

 # pylint: enable=too-few-public-methods


+def _bucket_boundaries(max_length, min_length=8, length_bucket_step=1.1):


Type annotations are missing.

jindrahelcl · 2019-02-22T13:52:42Z

neuralmonkey/dataset.py

+    max_length = max_length or batch_size
+    if max_length < min_length:
+        raise ValueError("max_length must be greater or equal to min_length")
+


This is where it should check that length_bucket_step is > 1.0 and raise a ValueError with a message, rather than leaving it to an assert in the helper function.
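A minimal sketch of the up-front validation suggested here, assuming the surrounding function receives `batch_size`, `max_length`, `min_length`, and `length_bucket_step` as arguments. The function name `resolve_max_length` and the exact error messages are hypothetical; only the first check appears in the diff above.

```python
from typing import Optional


def resolve_max_length(batch_size: int,
                       max_length: Optional[int] = None,
                       min_length: int = 8,
                       length_bucket_step: float = 1.1) -> int:
    """Validate the batching arguments and return the effective max_length.

    Hypothetical helper; it mirrors the check in the diff and adds the
    length_bucket_step check asked for in the review.
    """
    # Fall back to the batch size when no maximum length is given.
    max_length = max_length or batch_size
    if max_length < min_length:
        raise ValueError(
            "max_length must be greater or equal to min_length")
    # Fail early with a readable message instead of an assert further down.
    if length_bucket_step <= 1.0:
        raise ValueError(
            "length_bucket_step must be greater than 1.0, got {}".format(
                length_bucket_step))
    return max_length
```

Raising `ValueError` at the public entry point gives the user a message that names the offending argument, and unlike an `assert` it is not stripped when Python runs with `-O`.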

jindrahelcl · 2019-02-22T13:57:36Z

Re the workaround: it already has its own pull request, so why is it here as well?

varisd · 2019-02-25T12:07:51Z

Workaround == it is broken (that is, it crashes in normal scenarios), so I need a quick fix in order to be able to work on other things.

Most of these changes depend on each other; on the other hand, they can be split semantically, which is what I did with the pull requests. Next time I can happily make one big PR and we won't have to deal with dependencies.

jlibovicky · 2019-03-18T16:47:42Z

Do I understand correctly that this needs to be merged first? What exactly is it stuck on, then?

varisd · 2019-03-18T17:43:47Z

> Do I understand correctly that this needs to be merged first? What exactly is it stuck on, then?

Yes, because this PR brings a human-friendly way of building the scheme for bucketed token-level batching (which is de facto essential for Transformers).

The documentation in the dataset.* methods lifted from t2t needs to be fixed (and we should state that we took them from there). The type annotations also need to be added... As I said, it was practically a copy-paste, so that I wouldn't have to compute bucket_batch_sizes and bucket_boundaries by hand every time.

Of course, the other PRs should work without this one, but you will have to rebase them :)
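For context on what computing `bucket_batch_sizes` by hand involves, here is a minimal sketch assuming token-level batching, where each bucket's batch size is a token budget divided by the longest sequence length that bucket admits. The function signature and the `token_budget` parameter name are hypothetical and are not taken from this PR.

```python
from typing import List


def bucket_batch_sizes(boundaries: List[int],
                       max_length: int,
                       token_budget: int) -> List[int]:
    """Return one batch size (in sequences) per length bucket.

    Hypothetical sketch: there is one bucket per boundary plus a final
    bucket that holds sequences up to max_length.
    """
    # Longer buckets get fewer sequences, so that every batch holds
    # roughly token_budget tokens; never go below one sequence.
    return [max(1, token_budget // length)
            for length in boundaries + [max_length]]


print(bucket_batch_sizes([8, 12, 18, 27], max_length=40, token_budget=400))
# [50, 33, 22, 14, 10]
```

Doing this arithmetic manually for every configuration is tedious, since the list of batch sizes must always be one element longer than the list of boundaries.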

jindrahelcl · 2019-05-09T13:04:38Z

So is this part of #802? If so, please close it.

varisd · 2019-05-09T13:37:01Z

It isn't. I rebased incorrectly.

varisd added 2 commits January 9, 2019 16:29

workaround for train_set batching during inference time

299c1bc

added batching schemes from tensor2tensor

7a62312

varisd mentioned this pull request Feb 6, 2019

added simplified BERT support #791

Closed

fixing failed travis tests

1d968b5

varisd force-pushed the t2t_batching branch from a97affc to 1d968b5 Compare February 6, 2019 15:42

jindrahelcl requested changes Feb 22, 2019


jindrahelcl mentioned this pull request Mar 20, 2019

Attentive interface #802

Open



3 participants