Draft
1775 commits
69e5e3d
review glutton consolidation and raw DOI normalisation
kermitt2 Feb 12, 2019
1a55156
review consolidation
kermitt2 Feb 12, 2019
6d487aa
better glutton interfacing
kermitt2 Feb 12, 2019
52ff6a3
some improvement in consolidation
kermitt2 Feb 13, 2019
9472e55
Add better control when calling crossref REST API
kermitt2 Feb 14, 2019
196f322
update lexicon
Feb 15, 2019
dbbbe0f
hotfix for pdf2xml renaming
kermitt2 Feb 16, 2019
9758d1b
update renaming in test resource
kermitt2 Feb 16, 2019
85df4ee
cleaning of lexical resources
kermitt2 Feb 18, 2019
c496e0a
Merge pull request #396 from geritwagner/master
kermitt2 Feb 18, 2019
60f0b6e
Avoid extra reference parsing when using glutton
kermitt2 Feb 20, 2019
0b92f5f
Merge branch 'master' of https://github.com/kermitt2/grobid
kermitt2 Feb 20, 2019
d304dd9
[pdfalto] Fix svg coordinates.
Feb 20, 2019
d04ed4d
Put back sense description file to avoid grobid-ner breaking
kermitt2 Feb 22, 2019
3aea34b
Use svg vectorial image extension.
Feb 26, 2019
8a79a7f
Update Xqueries.
Feb 26, 2019
ad298ba
More radical tmp file cleaning
kermitt2 Feb 26, 2019
704894c
Merge branch 'master' of https://github.com/kermitt2/grobid
kermitt2 Feb 26, 2019
fcd0ba3
Fix issue invalid utf8 sequence.
Feb 26, 2019
533a178
fix an error when x and y scaling are different
kermitt2 Feb 28, 2019
acc71d0
Update to latest DeLFT version
kermitt2 Mar 1, 2019
82395ac
[pdfalto] Fix issue with coordinates #330.
Mar 5, 2019
0439efd
[pdfalto] Fix svg bounding box & fix token issue.
Mar 12, 2019
b973846
Renaming to pdfalto.
Mar 12, 2019
ac4959b
Fixing missing pdf.js in the grobid-service jar and docker image #405
lfoppiano Mar 12, 2019
5c72b9c
Update pdfalto sax parser tests.
Mar 12, 2019
4fa512d
Remove old pdf2xml sax parser.
Mar 12, 2019
41fd74e
additional correction web page #405
lfoppiano Mar 12, 2019
a55deaa
Update unit test for pdfalto parsers.
Mar 13, 2019
0a9ea3f
Update annotation actions.
Mar 19, 2019
99d7083
Avoid processing short texts as only one continuous chunk of layout t…
kermitt2 Apr 1, 2019
d8ecab7
Merge branch 'master' of https://github.com/kermitt2/grobid
kermitt2 Apr 1, 2019
828c1b9
Add workaround for Java version to Troubleshooting
rgieseke Apr 2, 2019
d5d713d
Merge pull request #414 from rgieseke/patch-1
kermitt2 Apr 2, 2019
e24a7d4
Add missing win-64 binary
kermitt2 Apr 6, 2019
1e2cc19
Workaround when tini is not PID 1 (to avoid zombies) #416
lfoppiano Apr 11, 2019
b33428f
Updating docker documentation #416
lfoppiano Apr 11, 2019
f8322f3
Add workaround for compiling with recent java version
Aazhar Apr 12, 2019
97bceac
Update Troubleshooting.md
Aazhar Apr 12, 2019
22666b0
make it build in IntelliJ
boumenot Apr 14, 2019
a35341b
Merge pull request #421 from boumenot/boumenot/playground
kermitt2 Apr 14, 2019
da0a8f1
Some dependency updates for JVM version 10
kermitt2 Apr 15, 2019
db23ff8
test citation style
kermitt2 Apr 23, 2019
346543d
test variant citation style
kermitt2 Apr 23, 2019
10a96a6
Fix a wrong parameter name in the demo
kermitt2 Apr 24, 2019
0feb144
Clean pdftoxml exe and dlls.
Apr 25, 2019
5d07f93
Update pdfalto and add windows/cygwin dependencies.
Apr 25, 2019
2285253
Revert "Clean pdftoxml exe and dlls."
Apr 26, 2019
fe8c8b8
Use batch files to resolve conflict between pdf2xml & pdfalto dlls.
Apr 26, 2019
4e1bb24
Add subdirectory for pdfalto executables and dlls.
Apr 26, 2019
b4ff5b7
add software heritage citation
kermitt2 May 4, 2019
ec23b2f
Use pdf metadata when available when everything else failed
kermitt2 May 4, 2019
dad047c
try to make a more decent readme
kermitt2 May 4, 2019
ad47453
exploit full glutton metadata when selected
kermitt2 May 5, 2019
68f43e6
better glutton integration for oa links
kermitt2 May 7, 2019
262475b
support glutton server with prefix path
kermitt2 May 7, 2019
523e37c
ignore new submodule
kermitt2 May 12, 2019
76fd65d
more robust publication date selection for consolidation
kermitt2 May 15, 2019
edea564
add all consolidation identifiers in header too
kermitt2 May 15, 2019
dd9a7df
review glutton consolidation strategy in case of doi for header extra…
kermitt2 May 16, 2019
332e303
consistent doi consolidation for header in case of crossref
kermitt2 May 17, 2019
8ef4ca5
update date demo
kermitt2 May 20, 2019
4957316
prepare crossref api User-Agent header
kermitt2 May 26, 2019
a01aa93
Refine crossref call; documentation on deep learning models and conso…
kermitt2 May 27, 2019
e01592b
add optional authorization token for Crossref Metadata Plus service
kermitt2 May 27, 2019
997cf87
update tests
lfoppiano Mar 11, 2019
d747d4f
Adding the jep library for mac #409
lfoppiano May 28, 2019
4cd5173
Adding the jep library for mac #409
lfoppiano May 28, 2019
49c6421
Cleaning
kermitt2 May 28, 2019
1fbfb19
[Gradle Release Plugin] - pre tag commit: '0.5.5'.
kermitt2 May 28, 2019
f5238bf
[Gradle Release Plugin] - new version commit: '0.5.6-SNAPSHOT'.
kermitt2 May 28, 2019
d446747
Update doc to new version
kermitt2 May 28, 2019
bfcd747
Merge pull request #433 from kermitt2/0.5.4-fixes
lfoppiano May 28, 2019
51475f4
Updating gradle build for docker image creation
lfoppiano May 28, 2019
f93e4ab
insomniac fix of #295
kermitt2 May 31, 2019
65c07ab
Merge branch 'master' of https://github.com/kermitt2/grobid
kermitt2 May 31, 2019
dff1715
Adding the jep library for mac #409
lfoppiano Jun 6, 2019
5d98df9
Add missing case sensitiveness option for matching layout tokens
kermitt2 Jun 6, 2019
c59cea3
Add unicode equivalent classes for parentheses
kermitt2 Jun 6, 2019
18f18ed
prepare the addition of subscript/superscript feature
kermitt2 Jun 8, 2019
ae1ba53
Correct TEI serialization of list items, thanks @Vitaliy-1 #429
kermitt2 Jun 9, 2019
20d0aeb
migration to gradle 5 #432
lfoppiano Jun 9, 2019
d372e6c
Merge pull request #435 from kermitt2/gradle5
kermitt2 Jun 10, 2019
e3b8886
Correct version in docker documentation
lfoppiano Jun 10, 2019
f1f7129
Merge branch 'master' into jep_macOs
lfoppiano Jun 11, 2019
d8bcaa1
Adding the possibility to run jep via a virtual environment (ideally …
lfoppiano Jun 11, 2019
5b7f91b
loading libraries using the right name for the right OS #409
lfoppiano Jun 11, 2019
44f6241
Trying to load the library installed in the virtualenv + some test in…
lfoppiano Jun 11, 2019
3f154ee
Adding try-with-resources when opening jep + correcting library expli…
lfoppiano Jun 11, 2019
34d59bb
Swap w/h in coordinates documentation
bfirsh Jun 14, 2019
9eac968
Merge pull request #437 from bfirsh/patch-1
kermitt2 Jun 14, 2019
4e31b83
changed python prefix to python3.6
de-code Jun 21, 2019
460ef0f
do not load the python library
de-code Jun 21, 2019
40c4f02
fixed closing jep instance
de-code Jun 21, 2019
1d107a7
looking for venv lib containing python3
de-code Jun 21, 2019
bd654e3
added logging for creating new JEP instance
de-code Jun 21, 2019
9a45408
adding virtual env jep library path
de-code Jun 21, 2019
62b2dd6
extracted method
de-code Jun 21, 2019
a7f61d9
extracted PythonVirtualEnvConfig
de-code Jun 21, 2019
1d6c035
try to add site packages as an include path
de-code Jun 21, 2019
e48fb84
renamed to PythonEnvironmentConfig
de-code Jun 21, 2019
774aceb
support for detecting active environment via VIRTUAL_ENV
de-code Jun 21, 2019
1fc3b85
polishing the mac-osx version #409
lfoppiano Jun 26, 2019
f29fcce
Merge branch 'de-code-fix-jep' into jep_macOs
lfoppiano Jun 26, 2019
4ef1163
Some doc update and cleaning
kermitt2 Jun 26, 2019
6a0dd1c
use https:// in links (not http://)
bnewbold Jun 27, 2019
fed8ffa
consistency in DOI capitalization (not 'doi') in XML
bnewbold Jun 27, 2019
2fdb9cf
Merge pull request #447 from bnewbold/bnewbold-tei-doi-capitalization
kermitt2 Jun 27, 2019
41f3bbb
Merge pull request #446 from bnewbold/bnewbold-https
kermitt2 Jun 27, 2019
c9ffc1b
Throw an Exception if jep cannot be loaded at startup #409
lfoppiano Jun 27, 2019
ab91807
added support for activated conda virtual envs
de-code Jun 27, 2019
4e0034c
added getNativeLibPaths
de-code Jun 27, 2019
b2f0ddc
Fixing delft path
lfoppiano Jun 27, 2019
5bb34c6
add library paths to the front (and preferring paths from the venv)
de-code Jun 27, 2019
f45dec5
synchronize getJEPInstance
de-code Jun 27, 2019
95bf021
reverted setting activeVirtualEnv to virtualEnv
de-code Jun 27, 2019
152ac01
on linux let jep be installed with pip or anaconda #409
lfoppiano Jun 27, 2019
6d2f76f
on linux let jep be installed with pip or anaconda #409
lfoppiano Jun 27, 2019
39c23f4
on linux let jep be installed with pip or anaconda #409
lfoppiano Jun 27, 2019
cf4a83a
on linux let jep be installed with pip or anaconda #409
lfoppiano Jun 27, 2019
50b5d3b
Merge branch 'revise-jep-v2' of https://github.com/de-code/grobid int…
lfoppiano Jun 27, 2019
1146c2c
Another merge
lfoppiano Jun 27, 2019
d68b6a9
Merge branch 'de-code-revise-jep-v2' into jep_macOs
lfoppiano Jun 27, 2019
b34affe
Getting a python version and load it in mac #409
lfoppiano Jun 28, 2019
39e79f0
use getNativeLibPaths
de-code Jun 28, 2019
6093e42
clear grobid.delft.python.virtualEnv
de-code Jun 28, 2019
b976ef4
Merge pull request #452 from de-code/revise-jep-v3
lfoppiano Jun 28, 2019
b638483
Merge branch 'master' into check-evaluation
lfoppiano Jul 4, 2019
457f860
forgotten import
lfoppiano Jul 4, 2019
fce3cd1
trying to block the service when the models cannot be loaded. Loading…
lfoppiano Jul 8, 2019
2890aae
updates #409
lfoppiano Jul 9, 2019
7b0f27f
regenerate feature files for fulltext model; add robustness when crea…
kermitt2 Jul 9, 2019
7d14184
Merge branch 'master' of https://github.com/kermitt2/grobid
kermitt2 Jul 9, 2019
e98b300
some doc fix
kermitt2 Jul 9, 2019
b01c2ad
update fulltext model
kermitt2 Jul 10, 2019
080668c
first implementation of n-fold evaluation #453
lfoppiano Jul 11, 2019
08f2e3a
Adding more tests and more output #453
lfoppiano Jul 11, 2019
e115396
Adding dummy model for testing #410 #453
lfoppiano Jul 11, 2019
b273728
More unit tests and fixes #453
lfoppiano Jul 11, 2019
fbd798a
Printing out more information #453
lfoppiano Jul 11, 2019
59122b4
Adding more information in output #453
lfoppiano Jul 11, 2019
0752f9f
add docker badge in the doc
kermitt2 Jul 11, 2019
55f060f
cleanup #453
lfoppiano Jul 11, 2019
05270e7
dummy edit in the doc to test the webhook
kermitt2 Jul 11, 2019
561c019
new dummy edit in the doc to test the webhook
kermitt2 Jul 11, 2019
8cb5851
implementing output report on file - moved printing code within the m…
lfoppiano Jul 11, 2019
ec1b178
Fixing results output #453 #58
lfoppiano Jul 12, 2019
c72ff35
removing token-level results #453
lfoppiano Jul 12, 2019
e6782f9
fixed docker build with latest openjdk:8-jre-slim image (#458)
de-code Jul 16, 2019
bc1105b
test docker hub badge
kermitt2 Jul 16, 2019
2ba3bfd
Improving output format #453
lfoppiano Jul 16, 2019
7af4150
Using java base libraries instead of guava #453
lfoppiano Jul 16, 2019
244a09a
Print output #453
lfoppiano Jul 16, 2019
156190c
Print output #453
lfoppiano Jul 16, 2019
8b83ffe
Adding label support in result evaluation + cosmetics #453
lfoppiano Jul 16, 2019
5c6f0cd
output label support #453
lfoppiano Jul 16, 2019
25af2b7
Set n = 10 by default and throw exception for n = 1 #453
lfoppiano Jul 16, 2019
0049ecc
Implementing averages on labels for 10-fold #453
lfoppiano Jul 16, 2019
31da48b
test jdk 11
lfoppiano Jul 16, 2019
ff9e6b2
calm down and go to sleep #453
lfoppiano Jul 16, 2019
6699124
change travis jdk to openjdk
kermitt2 Jul 16, 2019
8a6331e
optionally redirect jep output
de-code Jul 16, 2019
15c9a87
Adding raw result output #453
lfoppiano Jul 17, 2019
3271c48
cosmetics #453
lfoppiano Jul 17, 2019
b9c6961
Improving visualisation - more cosmetics #453
lfoppiano Jul 17, 2019
62a897c
Adding output of raw results for n-fold evaluation #453
lfoppiano Jul 17, 2019
4f308a5
fixing copy-pasta distraction problem #453
lfoppiano Jul 17, 2019
e675b11
Fixing other minor and nasty annoying errors #453
lfoppiano Jul 17, 2019
149d3b7
do not include the raw results in the output #453
lfoppiano Jul 17, 2019
9a20338
fix issue #461
kermitt2 Jul 23, 2019
604d678
Merge branch 'redirect-jep-output' of https://github.com/elifescience…
lfoppiano Jul 26, 2019
cd35302
Adding documentation and requirements files for conda (GPU version is…
lfoppiano Jul 26, 2019
2255b82
added workaround for setting JEP value with very special characters
de-code Jul 26, 2019
aff3a3d
delete temp file
de-code Jul 26, 2019
c3322d9
revised logging message
de-code Jul 26, 2019
e40f620
added dot to temp file extension
de-code Jul 26, 2019
e54b3c6
support python 3.7
lfoppiano Jul 29, 2019
cb1f538
improving documentation
lfoppiano Jul 29, 2019
4f3a906
create training data: log full exception (#471)
de-code Jul 30, 2019
9395ce1
disable header heuristics by default
de-code Aug 7, 2019
6021517
changed header use heuristics default to true
de-code Aug 7, 2019
aefa5df
Merge pull request #479 from elifesciences/disable-header-heuristics
kermitt2 Aug 7, 2019
f4e945d
Update pdfalto with last fixes
lfoppiano Jul 31, 2019
1c343f7
minor cosmetics, renaming test on pdf alto to match the main class
lfoppiano Aug 8, 2019
24487e3
Merge branch 'master' into check-evaluation
lfoppiano Aug 8, 2019
67243f4
fixing test
lfoppiano Aug 8, 2019
270c83e
fixing test
lfoppiano Aug 8, 2019
7fc02a8
Remove 10-fold from date trainer - forgot there from testing
lfoppiano Aug 9, 2019
d6226a8
remove useless trace
kermitt2 Aug 13, 2019
5e3a81c
revert delft as default sequence labelling
kermitt2 Aug 13, 2019
379c77a
Merge pull request #454 from kermitt2/jep_macOs
kermitt2 Aug 13, 2019
b073e9a
add option to get the raw reference string in the extracted citation …
kermitt2 Aug 15, 2019
dcd1c2f
documentation about the option to add the raw reference string to the…
kermitt2 Aug 15, 2019
662c814
adapt tests for the option to add the raw reference string to the ext…
kermitt2 Aug 15, 2019
cad7683
Merge pull request #483 from kermitt2/option-442
kermitt2 Aug 15, 2019
27a1eed
Merge pull request #468 from elifesciences/fix-label-task-very-specia…
kermitt2 Aug 15, 2019
bb8cf62
document optional parameter includeRawCitations for patent processing
kermitt2 Aug 16, 2019
abc9490
adding more tests for evaluation and fixing small bug on support metrics
lfoppiano Aug 20, 2019
ce933aa
Adding more tests and moving code around
lfoppiano Aug 20, 2019
2b09153
saved by a test :-)
lfoppiano Aug 20, 2019
c6d2930
create valid DocumentPiece for further structuring abstract
kermitt2 Aug 20, 2019
5f71cd6
added bin to .gitignore
de-code Aug 20, 2019
01b9238
Merge pull request #488 from elifesciences/added-bin-to-gitignore
kermitt2 Aug 20, 2019
17d83dd
cleaning remaining bin/
kermitt2 Aug 20, 2019
2525857
better PMID and PMC ID recognition, update citation model with some P…
kermitt2 Aug 22, 2019
626ad60
update processShort for applying the fulltext model to short piece of…
kermitt2 Aug 22, 2019
377ad90
rollback
kermitt2 Aug 22, 2019
2087e78
use previous processShort for all short texts
kermitt2 Aug 23, 2019
2a6ab09
Implementing review remarks #453
lfoppiano Aug 27, 2019
2adac37
documentation for n-folds evaluation
kermitt2 Aug 28, 2019
27cde82
correct spelling in new doc
kermitt2 Aug 28, 2019
64a2a46
extra explanations on grobid-home for the batch mode to avoid any con…
kermitt2 Aug 31, 2019
9e44d1c
improving naming
lfoppiano Sep 4, 2019
fc0cafa
cleanup dehyphenisation
lfoppiano Sep 4, 2019
3ccfd89
improving dehyphenisation using coordinates to check breakline
lfoppiano Sep 6, 2019
f6b2434
avoiding going out of bounds
lfoppiano Sep 6, 2019
4b14c67
cosmetics
lfoppiano Sep 6, 2019
d6b1d0e
getting instance of GrobidProperties before running tests
lfoppiano Sep 6, 2019
a22098e
adding subList by Offset for layout tokens
lfoppiano Sep 6, 2019
02612ff
Implementing suggestions and move code into methods + adding some uni…
lfoppiano Aug 21, 2019
c534c41
avoid that the python.virtualEnv property breaks the modules performi…
kermitt2 Sep 7, 2019
db33786
Merge branch 'master' of https://github.com/kermitt2/grobid
kermitt2 Sep 7, 2019
12e392c
ignore submodule grobid-keyterm
kermitt2 Sep 11, 2019
38489ed
Merge pull request #280 from kermitt2/check-evaluation
kermitt2 Sep 11, 2019
1a1653b
add model declaration for dataseer
kermitt2 Sep 11, 2019
345c6ae
review processShort; fix bug for DocumentPiece handling in feature ge…
kermitt2 Sep 12, 2019
6a9e167
fix #424, fix labeled abstract mapping
kermitt2 Sep 12, 2019
06da47f
Add a cleaning method for abstract working with layout tokens
kermitt2 Sep 12, 2019
f184546
Merge pull request #486 from kermitt2/duplicated-body-parts-476
kermitt2 Sep 12, 2019
bc6bd9a
fix merging issue with master
kermitt2 Sep 13, 2019
c71f879
merge with master for benchmark
kermitt2 Sep 13, 2019
bf7e1de
cleaning
kermitt2 Sep 13, 2019
8086c8c
Review usage of XMP PDF embedded metadata
kermitt2 Sep 13, 2019
bc18bed
Do not use the XMP embedded metadata for the moment; cleaning
kermitt2 Sep 14, 2019
ce45a96
Fix #505
kermitt2 Sep 23, 2019
24e6f0e
doc update
kermitt2 Sep 27, 2019
5f4af22
Merge branch 'master' into improved-dehypenisation
kermitt2 Sep 28, 2019
883b3cb
do not use anymore deprecated dehyphenization methods in grobid core
kermitt2 Sep 28, 2019
f395e21
support unicode strings
kermitt2 Sep 28, 2019
3212b61
fix test
kermitt2 Sep 28, 2019
472324a
Merge pull request #498 from kermitt2/improved-dehypenisation
kermitt2 Sep 28, 2019
acfc775
add a basic fatcat release JSON parser
bnewbold Jun 26, 2019
d284dbc
add support for additional identifiers
bnewbold Jun 26, 2019
0940820
add glutton_fatcat as a consolidation option
bnewbold Jun 26, 2019
c006a1b
include fatcat+wikidata in citation render; fatcat not fatcat_ident
bnewbold Jun 27, 2019
5d72a0b
expand support for person 'rawName'
bnewbold Jun 27, 2019
d7230e6
better fatcat parsing of names
bnewbold Jun 27, 2019
df2d90b
add fatcat links, and impose my opinion
bnewbold Jun 27, 2019
6dd44fd
remove strong comment
bnewbold Oct 4, 2019
3969c87
set an explicit version to distinguish this branch
bnewbold Oct 4, 2019

Sorry, this diff is taking too long to generate.

It may be too large to display on GitHub.