# Changelog

## 2.5.0

> [!WARNING]
>
> The `imperial_units` keyword argument for `parse_ingredient` is deprecated and will be removed in the next major release.
>
> Use the new `volumetric_units_system="imperial"` keyword argument for the same functionality.

* Improve the execution and accuracy performance of the foundation foods matching functionality.
  * See the docs [here](https://ingredient-parser.readthedocs.io/en/latest/explanation/foundation.html) for details on how this now works.
  * Execution performance is ~2.5x faster than in version 2.4.0.
* Add a `volumetric_units_system` keyword argument for `parse_ingredient`, which allows specifying the unit system used for volumetric units such as cup and tablespoon, where there are multiple options with slight differences in volume.
  * This replaces the `imperial_units` argument, which will be removed in a future release.
  * Supported options are `us_customary` (default), `imperial`, `metric` (for metric tablespoon and teaspoon definitions), `australian` (for Australian pints and tablespoons) and `japanese` (for Japanese cups).
  * See the docs [here](https://ingredient-parser.readthedocs.io/en/latest/tutorials/options.html#volumetric-units-system) for specific details.
* The customised Pint units registry (`UREG`), which contains additional units relevant to cooking (such as metric cups and tablespoons, Japanese cups etc.), is now more easily importable:

  ```py
  from ingredient_parser import UREG
  ```

* Add a `unit_system` attribute to `IngredientAmount` and `CompositeIngredientAmount` to indicate which unit system the amount uses.
  * This is an Enum with the following values: METRIC, US_CUSTOMARY, IMPERIAL, AUSTRALIAN, JAPANESE, OTHER, NONE.
* Fix a bug where an exception was raised if a quantity range ended with `x` (e.g. `3-4x`).
* If an amount has `MULTIPLIER=True`, set `SINGULAR=True` for any immediately subsequent amounts.
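The `MULTIPLIER`/`SINGULAR` behaviour in the last bullet can be sketched as follows; the `Amount` class and `propagate_singular` helper here are illustrative stand-ins, not the library's actual API:

```python
from dataclasses import dataclass


@dataclass
class Amount:
    """Minimal stand-in for the library's IngredientAmount."""
    quantity: str
    unit: str
    MULTIPLIER: bool = False
    SINGULAR: bool = False


def propagate_singular(amounts: list[Amount]) -> list[Amount]:
    # A multiplier amount (e.g. the "2" in "2x 400 g cans") means the
    # amount that immediately follows it describes a single item.
    for prev, current in zip(amounts, amounts[1:]):
        if prev.MULTIPLIER:
            current.SINGULAR = True
    return amounts


# "2x 400 g cans": "400 g" refers to one can, so it is marked SINGULAR.
amounts = propagate_singular([Amount("2", "", MULTIPLIER=True), Amount("400", "g")])
print(amounts[1].SINGULAR)  # True
```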
## 2.4.0
### General
…

>
> This release only contains changes related to the development tools for this library. There are no changes to the functionality of the library.

### Development Tools

* Replace the labeler and webapp tools with a new tool ("webtools") written in React. Many thanks to @[mcioffi](https://github.com/mcioffi) for this contribution. Key functionality:
  * Parser, to display the parsed output of an input ingredient sentence.
  * Labeler, to edit the labelled training data or add new training data.
  * Trainer, to initiate training of models.
…

## 2.2.0

### Foundation Foods

* Bias foundation food matching to prefer "raw" FDC ingredients, but only if the ingredient name does not include any verbs that indicate the ingredient is not raw (e.g. "cooked").
* Normalise the spelling of tokens in ingredient names to align with the spelling used in FDC ingredient descriptions.
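The raw-preference bias in the first bullet amounts to a guard on the ingredient name. A minimal sketch, assuming a hand-picked verb list (both the names and the word list are illustrative, not the library's implementation):

```python
# Illustrative verb list; the library derives this check differently.
NON_RAW_VERBS = {"cooked", "roasted", "boiled", "fried", "smoked"}


def prefer_raw(ingredient_name: str) -> bool:
    # Bias matching towards "raw" FDC entries only when no token in the
    # ingredient name suggests the ingredient has already been processed.
    tokens = ingredient_name.lower().split()
    return not any(tok in NON_RAW_VERBS for tok in tokens)


print(prefer_raw("chicken breast"))  # True
print(prefer_raw("cooked chicken"))  # False
```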
…

> [!WARNING]
>
> This version replaces the floret dependency with NumPy.
>
> NumPy was already a dependency of floret, so if you are upgrading from v2.0.0 there should be little impact.

* This release overhauls the foundation foods functionality so that ingredient names are matched to entries in the [FoodData Central](https://fdc.nal.usda.gov/) (FDC) database.
* This update does not change the API. It adds additional fields to `FoundationFood` objects for the FDC ID, category, and data type. The `text` field now returns the description for the matching FDC entry.
* Beware that enabling this functionality makes the `parse_ingredient` function much slower than when it is disabled (the default).
…

* Various minor improvements to feature generation.
* Add a PREPARED_INGREDIENT flag to IngredientAmount objects. This indicates whether the amount refers to the prepared ingredient (`PREPARED_INGREDIENT=True`) or the unprepared ingredient (`PREPARED_INGREDIENT=False`).
* Add a `starting_index` attribute to IngredientText objects, indicating the index of the token that starts the IngredientText.
…

### Processing

* Change the processing of numbers written as words (e.g. 'one', 'two'). If the token is labelled as QTY, the number will be converted to a digit (i.e. 'one' -> 1) or collapsed into a range (i.e. 'one or two' -> 1-2); otherwise the token is left unchanged.
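The label-gated word-number conversion could be sketched like this, so words outside QTY positions (e.g. in 'five-spice') are untouched; the names and the word map are illustrative, and range collapsing of 'one or two' is omitted:

```python
WORD_NUMBERS = {"one": "1", "two": "2", "three": "3"}  # illustrative subset


def replace_word_numbers(tokens: list[str], labels: list[str]) -> list[str]:
    # Convert a written number to a digit only when the token was
    # labelled QTY; otherwise leave it exactly as written.
    return [
        WORD_NUMBERS.get(tok, tok) if label == "QTY" else tok
        for tok, label in zip(tokens, labels)
    ]


print(replace_word_numbers(["one", "clove", "garlic"], ["QTY", "UNIT", "NAME"]))
# ['1', 'clove', 'garlic']
```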
## 1.0.1

> [!WARNING]
>
> This version requires NLTK >= 3.8.2

NLTK 3.8.2 changes the file format (from pickle to JSON) of the weights used by the part-of-speech tagger used in this project, to address some security concerns. This patch updates the NLTK resource checks performed when `ingredient-parser` is imported to check for the new JSON files, and downloads them if they are not present.

This version requires NLTK >= 3.8.2.
…

### Processing

* Various bug fixes to the post-processing of tokens with the labels NAME, COMMENT, PREP, PURPOSE and SIZE, to correct punctuation and confidence calculations.
* Modification of the tokenizer to split full stops from the end of tokens. This helps the model avoid treating "`token.`" and "`token`" as different cases to learn.
* Add fallback functionality to `parse_ingredient` for cases where none of the tokens are labelled as NAME. This selects as the name the token with the highest confidence of being labelled NAME, even though a different label has a higher confidence for that token. This can be disabled by setting `expect_name_in_output=False` in `parse_ingredient`.

## 0.1.0-beta10
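The tokenizer change described above can be sketched as follows (a hypothetical helper, not the library's tokenizer):

```python
def split_trailing_full_stop(token: str) -> list[str]:
    # Split "token." into ["token", "."] so the model sees the same
    # word form with and without sentence-final punctuation.
    if len(token) > 1 and token.endswith("."):
        return [token[:-1], "."]
    return [token]


print(split_trailing_full_stop("sliced."))  # ['sliced', '.']
```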
…

### General

- Add GitHub Actions to run tests (#7, @boxydog)
- Add pre-commit for use with development (#10, @boxydog)

### Model

- Add additional model performance metrics.
- Add model hyperparameter tuning functionality with `python train.py gridsearch` to iterate over specified training algorithms and hyperparameters.
- Add a `--detailed` argument to output detailed information about model performance on test data. (#9, @boxydog)
- Change model labels to treat all punctuation as PUNC - this resolves some of the ambiguity in token labelling.
- Introduce a SIZE label for tokens that modify the size of the ingredient. Note that this only applies to size modifiers of the ingredient. Size modifiers of the unit will remain part of the unit, e.g. large clove.
…

- By default, units in `IngredientAmount` objects will be returned as `pint.Unit` objects (where possible). This enables easy conversion of amounts between different units. It can be disabled by setting `string_units=True` in `parse_ingredient` function calls.
- For units that have US customary and Imperial versions with the same name (e.g. cup), setting `imperial_units=True` in `parse_ingredient` function calls will return the Imperial version. The default is US customary.
  - This only applies to units in `pint`'s unit registry (basically all common, standardised units). If the unit can't be found, then the string is returned as previously.
- Additions to the `IngredientAmount` object:
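The US customary vs Imperial distinction matters because the same unit name denotes different volumes. The figures below are the standard definitions in millilitres; the lookup helper itself is illustrative, not part of the library:

```python
# Approximate volume of one "cup" in millilitres under different unit systems.
CUP_ML = {
    "us_customary": 236.588,
    "imperial": 284.131,
    "metric": 250.0,
    "japanese": 200.0,
}


def cup_in_ml(system: str = "us_customary") -> float:
    return CUP_ML[system]


print(cup_in_ml("imperial"))  # 284.131
```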
…

  - RANGE is set to True for quantity ranges, e.g. `1-2`
  - MULTIPLIER is set to True for quantities like `1x`
  - Conversion of the quantity field to `float` where possible
- `PreProcessor` improvements
  - Be less aggressive about replacing written numbers (e.g. one) with the digit version. For example, in sentences like `1 tsp Chinese five-spice`, `five-spice` is now kept as written instead of being replaced by two tokens: `5 spice`.
  - Improve handling of ranges that duplicate the units, e.g. `1 pound to 2 pound` is now returned as `1-2 pound`
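The duplicated-unit range handling in the last bullet could be sketched with a single regex; this is an illustrative approximation, not the library's implementation:

```python
import re

# Match "<qty> <unit> to <qty> <unit>" where both units are identical,
# and collapse it to "<qty>-<qty> <unit>".
DUPLICATED_UNIT_RANGE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(\w+)\s+to\s+(\d+(?:\.\d+)?)\s*\2\b"
)


def collapse_range(text: str) -> str:
    return DUPLICATED_UNIT_RANGE.sub(r"\1-\3 \2", text)


print(collapse_range("1 pound to 2 pound"))  # 1-2 pound
print(collapse_range("100g to 200g"))  # 100-200 g
```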
…

### Model

- Include more training data, expanding the Cookstr and BBC data by 5,000 additional sentences each
- Change how the training data is stored. An SQLite database is now used to store the sentences and their tokens and labels. This fixes a long-standing bug where tokens in the training data would be assigned the wrong label. CSV exports are still available.
- Discard any sentences containing the OTHER label prior to training the model, so a parsed ingredient sentence can never contain anything labelled OTHER.

### Processing

- Remove the `other` field from the `ParsedIngredient` object returned by the `parse_ingredient` function.
- Added a `text` field to `IngredientAmount`. This is autogenerated when the object is created and provides a human-readable string for the amount, e.g. "100 g"
- Allow the SINGULAR flag to be set if the amount it is applied to is in brackets
- Where a sentence has multiple related amounts, e.g. `14 ounce (400 g)`, any flags set for one of the related amounts are applied to all of them
- Rewrite the tokenizer so it doesn't require all handled characters to be explicitly stated
- Add an option to `parse_ingredient` to discard isolated stop words that appear in the name, comment, and preparation fields.
- `IngredientAmount.amount` elements are now ordered to match the order in which they appear in the sentence.
- Initial support for composite ingredient amounts, e.g. `1 lb 2 oz` is now considered to be a single `CompositeIngredientAmount` instead of two separate `IngredientAmount` objects.
  - Further work is required to handle other cases such as `1 tablespoon plus 1 teaspoon`.
  - This solution may change as it develops
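A composite amount can be modelled as an ordered series of parts that together read as one value. A minimal stand-in sketch (the class names echo but do not reproduce the library's types):

```python
from dataclasses import dataclass


@dataclass
class Amount:
    quantity: float
    unit: str


@dataclass
class CompositeAmount:
    # Ordered parts that together express one amount, e.g. "1 lb 2 oz".
    parts: list[Amount]

    @property
    def text(self) -> str:
        return " ".join(f"{p.quantity:g} {p.unit}" for p in self.parts)


composite = CompositeAmount([Amount(1, "lb"), Amount(2, "oz")])
print(composite.text)  # 1 lb 2 oz
```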
…

- Removal of the StrangerFoods dataset from model training due to lack of PREP labels
- Addition of a BBC Food dataset to the model training
- 10,000 additional ingredient sentences from the archive of 10599 recipes found at https://archive.org/details/recipes-en-201706
- Miscellaneous bugfixes to the preprocessing steps to resolve reported issues
  - Handling of fractions with the format: 1 and 1/2
  - Handling of amounts followed by 'x', e.g. 1x can
  - Handling of ranges where the units were duplicated: 100g - 200g
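The "1 and 1/2" fraction format in the first sub-bullet can be normalised with a small parser; this is illustrative, not the library's code:

```python
import re
from fractions import Fraction


def parse_mixed_number(text: str) -> float:
    # "1 and 1/2" -> 1 + 1/2 = 1.5
    m = re.fullmatch(r"(\d+)\s+and\s+(\d+)/(\d+)", text)
    if m is None:
        raise ValueError(f"not a mixed number: {text!r}")
    whole, num, den = (int(g) for g in m.groups())
    return float(whole + Fraction(num, den))


print(parse_mixed_number("1 and 1/2"))  # 1.5
```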
…

- Support the extraction of multiple amounts from the input sentence.
- Change the output dataclass to put confidence values with each field.
  - The name, comment and other fields are output as an `IngredientText` object containing the text and confidence
  - The amounts are output as an `IngredientAmount` object containing the quantity, unit, confidence, and flags for whether the amount is approximate or for a singular item of the ingredient.
- Rewrite the post-processing functionality to make it more maintainable and extensible in the future.
- Add a [model card](https://github.com/strangetom/ingredient-parser/blob/master/ingredient_parser/ModelCard.md), which provides information about the data used to train and evaluate the model, the purpose of the model and its limitations.
- Increase L1 regularisation during model training.