Conversation
(in InternationalizedHumanizer class) |
|
|
||
| // *** if the expression $date->getDay() === null && $date->getMonth() === null | ||
| // is superfluous as it seems the function can be restored to its | ||
| // original declaration ( int $year, UnspecifiedDigit $unspecifiedDigit ) |
There was a problem hiding this comment.
We'll, find out by removing the code. We have the tests, and you can add more tests if there are cases you are not sure about.
| // is superfluous as it seems the function can be restored to its | ||
| // original declaration ( int $year, UnspecifiedDigit $unspecifiedDigit ) | ||
|
|
||
| if ( $unspecifiedDigit->getYear() === 4 && $date->getDay() === null && $date->getMonth() === null ) { |
There was a problem hiding this comment.
What is this 4 about?
If all 3 of these conditions are needed, then extracting into isUnspecifiedYear(): bool would be nice. Remember: one level of abstraction.
There was a problem hiding this comment.
Also, I think that -5XXXX is possible, which might break your === 4 check.
| yield 'Day only' => [ 'XXXX-XX-12', '12th' ]; | ||
|
|
||
| // https://github.com/ProfessionalWiki/EDTF/issues/80 | ||
| yield 'Some year' => [ 'XXXX', 'some year' ]; |
There was a problem hiding this comment.
Maybe add
yield 'Some year' => [ 'X', 'some year' ];
yield 'Some year' => [ 'XXX', 'some year' ];
Curious as to what "-X" would yield.
There was a problem hiding this comment.
And check what -5XXXX results into, unless already tested
There was a problem hiding this comment.
And check what -5XXXX results into, unless already tested
50000 BC
this is nice, but still doesn't account for the unspecified digits
There was a problem hiding this comment.
it should perhaps be:
50000 BC (millennium, century and year in decem millennium unspecified)
There was a problem hiding this comment.
That does not scale. Because what is -5XXXXXXX ? How about 50000 BC (year unspecified)
There was a problem hiding this comment.
-5XXXXXXX would be "50 millions years BC" and -5XXXX " 5 decem millennium BC",
so (given that your solution is the only viable at the moment) in general for unspecified digits we could use always the literal expression, that is "5 millenniums BC" instead than "5000 BC (year unspecified)" (with the first notation it is implicit)
There was a problem hiding this comment.
That means implementing scales (https://en.wikipedia.org/wiki/Long_and_short_scales#Short_scale) e.g. the short scale to billion? As (according to Google) the universe is 13 billion years old and will likely exists for 22 billion years and this is about dates. :)
There was a problem hiding this comment.
I was checking further the specification here https://www.loc.gov/standards/datetime/
(Unspecified Digit) and the parser,
but I'm not sure that the parser is checking whether
the unspecified digit is placed incrementally from the right of the date component.
e.g.
the following should throw an error:
-XXXXXXX4 (4th year of any millennium BC)
e008d42 to
edb98e4
Compare
| yield 'Scales (8 digits)' => [ '5XXXXXXXX', '5 hundreds of millions' ]; | ||
| yield 'Scales (9 digits)' => [ '5XXXXXXXXX', '5 billions' ]; | ||
| yield 'Scales (10 digits)' => [ '5XXXXXXXXXX', '5 tens of billions' ]; | ||
| yield 'Scales (11 digits)' => [ '5XXXXXXXXXXX', '5 hundreds of billions' ]; |
There was a problem hiding this comment.
"5 decem millennia"? What does decem mean here?
"5 hundreds of thousands" is just an amount; should it not also have "years" in there somewhere? Is this even a valid EDTF date to begin with? @mzeinstra
There was a problem hiding this comment.
yes, sometimes the word "year" is missing, I have still to add it where necessary
There was a problem hiding this comment.
"5 decem millennia"? What does
decemmean here?
"Decem millennium" theoretically means "Ten thousand years" it could be also used the expression "tens of thousands"
There was a problem hiding this comment.
tens of thousands is indeed better, but we can also leave that up to the translators.
Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?
The most years I can find in the sciences is the half-life of Xenon, which is 18 billion trillion years :)
There was a problem hiding this comment.
As we should not forget the objective here, that is to make these edtf-string human readable. If 99% of the use cases can be captured than that is ok with me.
There was a problem hiding this comment.
Previously I also posted this link
https://asistdl.onlinelibrary.wiley.com/doi/epdf/10.1002/pra2.552
in the element.io SemanticMediawiki group because somebody asked about the support for "geological and historical eras".
Regarding the following
Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?
in my opinion if the user registers a date with unspecified digits and the library outputs a human readable format, and additionally computes correctly comparisons and intervals, that's really useful.
Still in my opinion the approach to present a year with mixed unspecified digits using numbers for the specified digits and literals for the unspecified (scaled) part, seems also formally rigorous. So using the example above -5XXXXXXX is not to be precise "50 millions years BC" but "5 tens of millions years BC" (in the output the number represents the specified digits, and the literal part the unspecified digits)
So what about, when this is more refined, to publish a demo (which could query the library using Ajax) and to propose this approach to the authors of the format, the Washington' Library of Congress, as far as I know ?
There was a problem hiding this comment.
Afaik there is no standard for the humanisation of EDTF, humanised examples are used on the library of congres website but they are not the owner of the standard.
So your best efforts here will de facto help set the humanisation standard.
@mnhn-paul you might be able to provide us with a standard of humanising eras in EDTF. Eg. What would the date: -5XXXXXXX be humanised to?
Paul is from our museum of natural history and might have more insight into this.
|
|
||
| return $ret + 1; | ||
| } | ||
| return 0; |
There was a problem hiding this comment.
Can apply https://www.entropywins.wtf/blog/2019/01/14/readable-functions-guard-clause/
$ret is not a good var name
| } | ||
|
|
||
| private function humanizeYear( int $year, UnspecifiedDigit $unspecifiedDigit ): string { | ||
| private function scaleToMessageKey( int $scale ): string { |
There was a problem hiding this comment.
Yay for putting this in its own method!
|
|
||
| // FIXME: reuse recursively the scale with trillions | ||
| // e.g. tens-of-trillions etc., | ||
| return 'edtf-tens-of-trillions'; |
There was a problem hiding this comment.
Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.
There was a problem hiding this comment.
sure, the only sensible use-case is in cosmology
There was a problem hiding this comment.
Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.
actually they are required only for testing purposes, e.g. by an user who wonder what is the output entering something like that 56XXXXXXXXXXXXXXXXXXX
|
|
||
| if ( $date->isBC() ) { | ||
| $ret .= " " . $this->message( "edtf-date-BC" ); | ||
| } |
There was a problem hiding this comment.
Concatenating messages like this bakes in an assumption about the order of these messages, which might not hold in all languages. That is why for instance we have messages such as "edtf-qualified-date": "$1 ($2)".
There was a problem hiding this comment.
yes, I guessed something like that, so that should be taken into account and enhanced
| $ret .= " " . $this->message( "edtf-date-BC" ); | ||
| } | ||
|
|
||
| return $ret; |
fix #80