new-figure-table-extraction - Extract figures from SVG #1297
45da2f2 to d23c92f
# Conflicts:
#   grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java
#   grobid-home/models/fulltext/model.wapiti
#   grobid-service/src/main/java/org/grobid/service/GrobidPaths.java
#   grobid-service/src/main/java/org/grobid/service/GrobidRestService.java
#   grobid-service/src/main/java/org/grobid/service/process/GrobidRestProcessFiles.java
Check warning
Code scanning / CodeQL
Information exposure through an error message Medium
Copilot Autofix
AI 3 months ago
To fix the information exposure issue, we need to ensure that exceptions thrown from restProcessFiles.getFigures(inputStream) do not result in sensitive error messages reaching the REST client. The best way is to wrap this call in a try-catch block, log the details of the exception on the server (using the local logger), and return a generic error message to the client with an appropriate HTTP status code (such as 500 Internal Server Error). Only the generic message should go to the response. We will edit only the relevant method in GrobidRestService.java, add logging of the error (using SLF4J, already imported as logger), and suppress details in the client response. No changes to the rest of the code are necessary.
@@ -860,8 +860,16 @@
     @Produces(MediaType.APPLICATION_XML)
     @POST
     public Response getFiguresAndTables(
-        @FormDataParam(INPUT) InputStream inputStream) throws Exception {
-        return restProcessFiles.getFigures(inputStream);
+        @FormDataParam(INPUT) InputStream inputStream) {
+        try {
+            return restProcessFiles.getFigures(inputStream);
+        } catch (Exception ex) {
+            logger.error("Exception occurred while extracting figures and tables.", ex);
+            return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
+                .entity("An internal server error occurred while processing the request.")
+                .type(MediaType.TEXT_PLAIN)
+                .build();
+        }
     }
 
     @Path(PATH_CREATE_TRAINING)
@kermitt2 I've started working on the new figure and table extraction. Since I rebased your initial branch on master, I created a new branch to keep the original intact in case I break something.
From now on I'll work in incremental branches. This first PR implements the SVG parsing and extraction.
There may still be problems with images that extend beyond their actual zone, which I have not found a way to exclude (checking for transparency, etc. did not help in these edge cases; any ideas are welcome 👍):
(For reference, this is the SVG of Figure 1):
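One direction that might be worth trying for the oversized-image case, sketched here under assumptions rather than taken from this PR: if the SVG exposes a clip rectangle (or any other trusted zone) for an image, the image's bounding box could be intersected with that zone before it is reported. The class `ClipImageBounds` and the method `visibleBounds` are hypothetical names for illustration only and are not part of GROBID. The sketch relies only on `java.awt.geom.Rectangle2D` from the JDK.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of GROBID: clips an image box to a zone.
class ClipImageBounds {

    /**
     * Returns the part of the image box that lies inside the clip zone.
     * Assumes both rectangles are expressed in the same coordinate space.
     */
    static Rectangle2D visibleBounds(Rectangle2D image, Rectangle2D clip) {
        Rectangle2D result = image.createIntersection(clip);
        // createIntersection may yield a negative width or height when the
        // rectangles do not overlap, so normalise that case to an empty box.
        if (result.isEmpty()) {
            return new Rectangle2D.Double();
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values only: an image that spills outside its zone.
        Rectangle2D image = new Rectangle2D.Double(0, 0, 600, 400);
        Rectangle2D zone = new Rectangle2D.Double(50, 50, 200, 150);
        System.out.println(visibleBounds(image, zone));
    }
}
```

Whether this helps depends on the SVG actually carrying a usable clip region for the problematic images, which is not established here.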
I'll try to post a few benchmarks for each new implementation so that we can track the progress.
Here are other images (of correctly identified figures) 😄: