-pdfcpu is a PDF processing library written in [Go](http://golang.org) supporting encryption.
-It provides both an API and a CLI. Supported are all versions up to PDF 1.7 (ISO-32000).
+pdfcpu is a PDF processing library written in [Go](https://go.dev/) that supports encryption and offers both an API and a command-line interface (CLI). It is compatible with all PDF versions, with basic support and ongoing improvements for PDF 2.0 (ISO-32000-2).
-Support for PDF 2.0 is basic and ongoing work.
## Motivation
@@ -41,7 +42,8 @@ This is an effort to build a comprehensive PDF processing library from the groun
## Focus
-The main focus lies on strong support for batch processing and scripting via a rich command line. At the same time pdfcpu wants to make it easy to integrate PDF processing into your Go based backend system by providing a robust command set.
+The primary emphasis is on providing robust assistance for batch processing and scripting through a comprehensive command-line interface.
+Simultaneously, pdfcpu aims to simplify the integration of PDF processing into your Go-based backend system by offering a versatile set of commands.
## Command Set
@@ -50,10 +52,12 @@ The main focus lies on strong support for batch processing and scripting via a r
* [booklet](https://pdfcpu.io/generate/booklet)
* [bookmarks](https://pdfcpu.io/bookmarks/bookmarks)
* [boxes](https://pdfcpu.io/boxes/boxes)
+* [certificates](https://pdfcpu.io/core/certs)
* [change owner password](https://pdfcpu.io/encrypt/change_opw)
* [change user password](https://pdfcpu.io/encrypt/change_upw)
* [collect](https://pdfcpu.io/core/collect)
-* [create](https://pdfcpu.io/generate/create)
+* [config](https://pdfcpu.io/config/config)
+* [create](https://pdfcpu.io/create/create)
* [crop](https://pdfcpu.io/core/crop)
* [cut](https://pdfcpu.io/generate/cut)
* [decrypt](https://pdfcpu.io/encrypt/decryptPDF)
@@ -73,25 +77,27 @@ The main focus lies on strong support for batch processing and scripting via a r
* [pagelayout](https://pdfcpu.io/pagelayout/pagelayout)
* [pagemode](https://pdfcpu.io/pagemode/pagemode)
* [pages](https://pdfcpu.io/pages/pages)
-* [permissions](https://pdfcpu.io/encrypt/perm_add)
+* [permissions](https://pdfcpu.io/encrypt/perm_set)
* [portfolio](https://pdfcpu.io/portfolio/portfolio)
* [poster](https://pdfcpu.io/generate/poster)
* [properties](https://pdfcpu.io/properties/properties)
* [resize](https://pdfcpu.io/core/resize)
* [rotate](https://pdfcpu.io/core/rotate)
+* [signatures](https://pdfcpu.io/core/sign)
* [split](https://pdfcpu.io/core/split)
* [stamp](https://pdfcpu.io/core/stamp)
* [trim](https://pdfcpu.io/core/trim)
-* [validate](https://pdfcpu.io/core/validate) 👉 now including rudimentory support for PDF 2.0
+* [validate](https://pdfcpu.io/core/validate)
* [viewerpref](https://pdfcpu.io/viewerpref/viewerpref)
* [watermark](https://pdfcpu.io/core/watermark)
* [zoom](https://pdfcpu.io/core/zoom)
## Documentation
-* The main entry point is [pdfcpu.io](https://pdfcpu.io).
-* For CLI examples also go to [pdfcpu.io](https://pdfcpu.io). There you will find explanations of all the commands and their parameters.
-* For API examples of all pdfcpu operations please refer to [GoDoc](https://pkg.go.dev/github.com/pdfcpu/pdfcpu/pkg/api).
+* [pdfcpu.io](https://pdfcpu.io)
+* [API tests](https://github.com/pdfcpu/pdfcpu/tree/master/pkg/api/test)
+* [API samples](https://github.com/pdfcpu/pdfcpu/tree/master/pkg/samples)
+* CLI usage: `$ pdfcpu help cmd`
### GoDoc
@@ -147,10 +153,10 @@ $ pdfcpu version
### Run in a Docker container
-```
+```shell
$ docker build -t pdfcpu .
-# mount current folder into container to process local files
-$ docker run -it --mount type=bind,source="$(pwd)",target=/app pdfcpu ./pdfcpu validate /app/pdfs/a.pdf
+# mount current host folder into container as /app to process files in the local host folder
+$ docker run -it -v "$(pwd)":/app pdfcpu validate a.pdf
```
## Contributing
@@ -204,7 +210,8 @@ Thanks 💚 goes to these wonderful people:
| [This is some rich text.
+ // + // `, // rich text (ignored by Mac Preview and rendered mediocre by Adobe Reader) + types.AlignCenter, // horizontal alignment + "Helvetica", // font name (TODO) + 12, // font size in points (TODO) + &color.Green, // font color + "", // DS (default style string) + nil, // Intent + nil, // callOutLine + nil, // callOutLineEndingStyle + 0, 0, 0, 0, // margin + 0, // borderWidth + model.BSSolid, // borderStyle + false, // cloudyBorder + 0) // cloudyBorderIntensity var linkAnn model.AnnotationRenderer = model.NewLinkAnnotation( - *types.NewRectangle(200, 0, 300, 100), - nil, - nil, - "https://pdfcpu.io", - "ID2", - 0, - 1, - model.BSSolid, - &color.Red, - true) + *types.NewRectangle(200, 0, 300, 100), // rect + 0, // apObjNr + "", // contents + "ID2", // id + "", // modDate + 0, // f + &color.Red, // borderCol + nil, // dest + "https://pdfcpu.io", // uri + nil, // quad + true, // border + 1, // borderWidth + model.BSSolid, // borderStyle +) var squareAnn model.AnnotationRenderer = model.NewSquareAnnotation( - *types.NewRectangle(300, 0, 350, 50), - "Square Annotation", - "ID3", - 0, - 1, - model.BSSolid, - &color.Blue, - false, - 0, - nil, - 0, 0, 0, 0) + *types.NewRectangle(300, 0, 350, 50), // rect + 0, // apObjNr + "Square Annotation", // contents + "ID3", // id + "", // modDate + 0, // f + &color.Gray, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + &color.Blue, // fillCol + 0, // MLeft + 0, // MTop + 0, // MRight + 0, // MBot + 1, // borderWidth + model.BSSolid, // borderStyle + false, // cloudyBorder + 0, // cloudyBorderIntensity +) + +var squareAnnCJK model.AnnotationRenderer = model.NewSquareAnnotation( + *types.NewRectangle(300, 50, 350, 100), // rect + 0, // apObjNr + "方形注释", // contents + "ID3CJK", // id + "", // modDate + 0, // f + &color.Gray, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + &color.Green, // fillCol + 0, // MLeft + 0, // MTop + 0, // MRight + 
0, // MBot + 1, // borderWidth + model.BSDashed, // borderStyle + false, // cloudyBorder + 0, // cloudyBorderIntensity +) var circleAnn model.AnnotationRenderer = model.NewCircleAnnotation( - *types.NewRectangle(400, 0, 450, 50), - "Circle Annotation", - "ID4", - model.AnnLocked, - 3, - model.BSBeveled, - &color.Green, - true, - 1, - &color.Blue, - 10, 10, 10, 10) + *types.NewRectangle(400, 0, 450, 50), // rect + 0, // apObjNr + "Circle Annotation", // contents + "ID4", // id + "", // modDate + 0, // f + &color.Gray, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + &color.Blue, // fillCol + 0, // MLeft + 0, // MTop + 0, // MRight + 0, // MBot + 1, // borderWidth + model.BSSolid, // borderStyle + false, // cloudyBorder + 0, // cloudyBorderIntensity +) + +var circleAnnCJK model.AnnotationRenderer = model.NewCircleAnnotation( + *types.NewRectangle(400, 50, 450, 100), // rect + 0, // apObjNr + "圆圈注释", // contents + "ID4CJK", // id + "", // modDate + 0, // f + &color.Green, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + &color.Blue, // fillCol + 10, // MLeft + 10, // MTop + 10, // MRight + 10, // MBot + 1, // borderWidth + model.BSBeveled, // borderStyle + false, // cloudyBorder + 0, // cloudyBorderIntensity +) func annotationCount(t *testing.T, inFile string) int { t.Helper() @@ -434,24 +571,30 @@ func TestAddAnnotationsLowLevel(t *testing.T) { func TestAddLinkAnnotationWithDest(t *testing.T) { msg := "TestAddLinkAnnotationWithDest" + // Best viewed with Adobe Reader. + inFile := filepath.Join(inDir, "Walden.pdf") outFile := filepath.Join(samplesDir, "annotations", "LinkAnnotWithDestTopLeft.pdf") // Create internal link: // Add a 100x100 link rectangle on the bottom left corner of page 2. // Set destination to top left corner of page 1. 
+ dest := &model.Destination{Typ: model.DestXYZ, PageNr: 1, Left: -1, Top: -1} internalLink := model.NewLinkAnnotation( - *types.NewRectangle(0, 0, 100, 100), - nil, - &model.Destination{Typ: model.DestXYZ, PageNr: 1, Left: -1, Top: -1}, - "", - "id", - 0, - 1, - model.BSSolid, - &color.Red, - true, + *types.NewRectangle(0, 0, 100, 100), // rect + 0, // apObjNr + "", // contents + "ID2", // id + "", // modDate + 0, // f + &color.Red, // borderCol + dest, // dest + "", // uri + nil, // quad + true, // border + 1, // borderWidth + model.BSSolid, // borderStyle ) err := api.AddAnnotationsFile(inFile, outFile, []string{"2"}, internalLink, nil, false) @@ -463,14 +606,21 @@ func TestAddLinkAnnotationWithDest(t *testing.T) { func TestAddAnnotationsFile(t *testing.T) { msg := "TestAddAnnotationsFile" + // Best viewed with Adobe Reader. + inFile := filepath.Join(inDir, "test.pdf") - outFile := filepath.Join(samplesDir, "annotations", "TestAnnotationsFile.pdf") + outFile := filepath.Join(samplesDir, "annotations", "Annotations.pdf") // Add text annotation. if err := api.AddAnnotationsFile(inFile, outFile, nil, textAnn, nil, false); err != nil { t.Fatalf("%s add: %v\n", msg, err) } + // Add CJK text annotation. + if err := api.AddAnnotationsFile(outFile, outFile, nil, textAnnCJK, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } + // Add link annotation. if err := api.AddAnnotationsFile(outFile, outFile, nil, linkAnn, nil, false); err != nil { t.Fatalf("%s add: %v\n", msg, err) @@ -481,17 +631,28 @@ func TestAddAnnotationsFile(t *testing.T) { t.Fatalf("%s add: %v\n", msg, err) } + // Add CJK square annotation. + if err := api.AddAnnotationsFile(outFile, outFile, nil, squareAnnCJK, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } + // Add circle annotation. if err := api.AddAnnotationsFile(outFile, outFile, nil, circleAnn, nil, false); err != nil { t.Fatalf("%s add: %v\n", msg, err) } + + // Add CJK circle annotation. 
+ if err := api.AddAnnotationsFile(outFile, outFile, nil, circleAnnCJK, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } + } func TestAddAnnotations(t *testing.T) { msg := "TestAddAnnotations" inFile := filepath.Join(inDir, "test.pdf") - outFile := filepath.Join(samplesDir, "annotations", "TestAnnotations.pdf") + outFile := filepath.Join(outDir, "Annotations.pdf") // Create a context from inFile. ctx, err := api.ReadContextFile(inFile) @@ -501,14 +662,19 @@ func TestAddAnnotations(t *testing.T) { // Prepare annotations for page 1. m := map[int][]model.AnnotationRenderer{} - anns := make([]model.AnnotationRenderer, 4) + anns := make([]model.AnnotationRenderer, 7) + anns[0] = textAnn - anns[1] = linkAnn + anns[1] = textAnnCJK anns[2] = squareAnn - anns[3] = circleAnn + anns[3] = squareAnnCJK + anns[4] = circleAnn + anns[5] = circleAnnCJK + anns[6] = linkAnn + m[1] = anns - // Add 4 annotations to page 1. + // Add 7 annotations to page 1. if ok, err := pdfcpu.AddAnnotationsMap(ctx, m, false); err != nil || !ok { t.Fatalf("%s add: %v\n", msg, err) } @@ -519,3 +685,412 @@ func TestAddAnnotations(t *testing.T) { } } + +func TestPopupAnnotation(t *testing.T) { + msg := "TestPopupAnnotation" + + // Add a Markup annotation and a linked Popup annotation. + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "PopupAnnotation.pdf") + + incr := false + pageNr := 1 + + // Create a context. + ctx, err := api.ReadContextFile(inFile) + if err != nil { + t.Fatalf("%s readContext: %v\n", msg, err) + } + + // Add Markup annotation. + parentIndRef, textAnnotDict, err := pdfcpu.AddAnnotationToPage(ctx, pageNr, textAnn, incr) + if err != nil { + t.Fatalf("%s Add Text AnnotationToPage: %v\n", msg, err) + } + + // Add Markup annotation as parent of Popup annotation. 
+ popupAnn := model.NewPopupAnnotation( + *types.NewRectangle(0, 0, 100, 100), // rect + 0, // apObjNr + "Popup content", // contents + "IDPopup", // id + "", // modDate + 0, // f + &color.Green, // col + 0, // borderRadX + 0, // borderRadY + 2, // borderWidth + parentIndRef, // parentIndRef, + false, // displayOpen + ) + + // Add Popup annotation. + popupIndRef, _, err := pdfcpu.AddAnnotationToPage(ctx, pageNr, popupAnn, incr) + if err != nil { + t.Fatalf("%s Add Popup AnnotationToPage: %v\n", msg, err) + } + + // Add Popup annotation to Markup annotation. + textAnnotDict["Popup"] = *popupIndRef + + // Write context to file. + if err := api.WriteContextFile(ctx, outFile); err != nil { + t.Fatalf("%s write: %v\n", msg, err) + } +} + +func TestInkAnnotation(t *testing.T) { + msg := "TestInkAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "InkAnnotation.pdf") + + p1 := model.InkPath{100., 542., 150., 492., 200., 542.} + p2 := model.InkPath{100, 592, 150, 592} + + inkAnn := model.NewInkAnnotation( + *types.NewRectangle(0, 0, 100, 100), // rect + 0, // apObjNr + "Ink content", // contents + "IDInk", // id + "", // modDate + 0, // f + &color.Red, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + []model.InkPath{p1, p2}, // InkList + 0, // borderWidth + model.BSSolid, // borderStyle + ) + + // Add Ink annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, inkAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestHighlightAnnotation(t *testing.T) { + msg := "TestHighlightAnnotation" + + // Best viewed with Adobe Reader. 
+ + inFile := filepath.Join(inDir, "testWithText.pdf") + outFile := filepath.Join(samplesDir, "annotations", "HighlightAnnotation.pdf") + + r := types.NewRectangle(205, 624.16, 400, 645.88) + + ql := types.NewQuadLiteralForRect(r) + + inkAnn := model.NewHighlightAnnotation( + *r, // rect + 0, // apObjNr + "Highlight content", // contents + "IDHighlight", // id + "", // modDate + model.AnnLocked, // f + &color.Yellow, // col + 0, // borderRadX + 0, // borderRadY + 2, // borderWidth + "Comment by Horst", // title + nil, // popupIndRef + nil, // ca + "", // rc + "Subject", // subject + types.QuadPoints{*ql}, // quad points + ) + + // Add Highlight annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, inkAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestUnderlineAnnotation(t *testing.T) { + msg := "TestUnderlineAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "testWithText.pdf") + outFile := filepath.Join(samplesDir, "annotations", "UnderlineAnnotation.pdf") + + r := types.NewRectangle(205, 624.16, 400, 645.88) + + ql := types.NewQuadLiteralForRect(r) + + underlineAnn := model.NewUnderlineAnnotation( + *r, // rect + 0, // apObjNr + "Underline content", // contents + "IDUnderline", // id + "", // modDate + model.AnnLocked, // f + &color.Yellow, // col + 0, // borderRadX + 0, // borderRadY + 2, // borderWidth + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.QuadPoints{*ql}, // quad points + ) + + // Add Underline annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, underlineAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestSquigglyAnnotation(t *testing.T) { + msg := "TestSquigglyAnnotation" + + // Best viewed with Adobe Reader. 
+ + inFile := filepath.Join(inDir, "testWithText.pdf") + outFile := filepath.Join(samplesDir, "annotations", "SquigglyAnnotation.pdf") + + r := types.NewRectangle(205, 624.16, 400, 645.88) + + ql := types.NewQuadLiteralForRect(r) + + squigglyAnn := model.NewSquigglyAnnotation( + *r, // rect + 0, // apObjNr + "Squiggly content", // contents + "IDSquiggly", // id + "", // modDate + model.AnnLocked, // f + &color.Yellow, // col + 0, // borderRadX + 0, // borderRadY + 2, // borderWidth + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.QuadPoints{*ql}, // quad points + ) + + // Add Squiggly annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, squigglyAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestStrikeOutAnnotation(t *testing.T) { + msg := "TestStrikeOutAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "testWithText.pdf") + outFile := filepath.Join(samplesDir, "annotations", "StrikeOutAnnotation.pdf") + + r := types.NewRectangle(205, 624.16, 400, 645.88) + + ql := types.NewQuadLiteralForRect(r) + + strikeOutAnn := model.NewStrikeOutAnnotation( + *r, // rect + 0, // apObjNr + "StrikeOut content", // contents + "IDStrikeOut", // id + "", // modDate + model.AnnLocked, // f + &color.Yellow, // col + 0, // borderRadX + 0, // borderRadY + 2, // borderWidth + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.QuadPoints{*ql}, // quad points + ) + + // Add StrikeOut annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, strikeOutAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestFreeTextAnnotation(t *testing.T) { + msg := "TestFreeTextAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "FreeTextAnnotation.pdf") + + // Add Free text annotation. 
+ if err := api.AddAnnotationsFile(inFile, outFile, nil, freeTextAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestPolyLineAnnotation(t *testing.T) { + msg := "TestPolyLineAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "PolyLineAnnotation.pdf") + + leButt := model.LEButt + leOpenArrow := model.LEOpenArrow + + polyLineAnn := model.NewPolyLineAnnotation( + *types.NewRectangle(30, 30, 110, 110), // rect + 0, // apObjNr + "PolyLine Annotation", // contents + "IDPolyLine", // id + "", // modDate + 0, // f + &color.Gray, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.NewNumberArray(30, 30, 110, 110, 110, 30), // vertices + nil, // path + nil, // intent + nil, // measure + &color.Green, // fillCol + 1, // borderWidth + model.BSDashed, // borderStyle + &leButt, // start lineEndingStyle + &leOpenArrow, // end lineEndingStyle + ) + + // Add PolyLine annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, polyLineAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestPolygonAnnotation(t *testing.T) { + msg := "TestPolygonAnnotation" + + // Best viewed with Adobe Reader. 
+ + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "PolygonAnnotation.pdf") + + polygonAnn := model.NewPolygonAnnotation( + *types.NewRectangle(30, 30, 110, 110), // rect + 0, // apObjNr + "Polygon Annotation", // contents + "IDPolygon", // id + "", // modDate + 0, // f + &color.Gray, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.NewNumberArray(30, 30, 110, 110, 110, 30), // vertices + nil, // path + nil, // intent + nil, // measure + &color.Green, // fillCol + 5, // borderWidth + model.BSDashed, // borderStyle + true, // cloudyBorder + 2) // cloudyBorderIntensity + + // Add Polygon annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, polygonAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestLineAnnotation(t *testing.T) { + msg := "TestLineAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "LineAnnotation.pdf") + + leOpenArrow := model.LEOpenArrow + + lineAnn := model.NewLineAnnotation( + *types.NewRectangle(30, 30, 110, 110), // rect + 0, // apObjNr + "Diagonal", // contents + "IDLine", // id + "", // modDate + 0, // f + &color.DarkGray, // col + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.NewPoint(148.75, 140.33), // P1 + types.NewPoint(297.5, 280.66), // P2 + &leOpenArrow, // start lineEndingStyle + &leOpenArrow, // end lineEndingStyle + 50, // leader line length + 0, // leader line offset + 10, // leader line extension length + nil, // intent + nil, // measure + true, // caption + false, // caption position top + 0, // caption offset X + 0, // caption offset Y + nil, // fillCol + 1, // borderWidth + model.BSSolid) // borderStyle + + // Add line annotation. 
+ if err := api.AddAnnotationsFile(inFile, outFile, nil, lineAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} + +func TestCaretAnnotation(t *testing.T) { + msg := "TestCaretAnnotation" + + // Best viewed with Adobe Reader. + + inFile := filepath.Join(inDir, "test.pdf") + outFile := filepath.Join(samplesDir, "annotations", "CaretAnnotation.pdf") + + caretAnn := model.NewCaretAnnotation( + *types.NewRectangle(30, 30, 110, 110), // rect + 0, // apObjNr + "Caret Annotation", // contents + "IDCaret", // id + "", // modDate + 0, // f, + nil, // col + 0, // borderRadX + 0, // borderRadY + 0, // borderWidth + "Title1", // title + nil, // popupIndRef + nil, // ca + "", // rc + "", // subject + types.NewRectangle(20, 20, 20, 20), // RD + true) // paragraph symbol + + // Add line annotation. + if err := api.AddAnnotationsFile(inFile, outFile, nil, caretAnn, nil, false); err != nil { + t.Fatalf("%s add: %v\n", msg, err) + } +} diff --git a/pkg/api/test/api_test.go b/pkg/api/test/api_test.go index c559cbfd..14de21c7 100644 --- a/pkg/api/test/api_test.go +++ b/pkg/api/test/api_test.go @@ -61,6 +61,10 @@ func TestMain(m *testing.M) { samplesDir = filepath.Join("..", "..", "samples") conf = api.LoadConfiguration() + if os.Getenv("GITHUB_ACTIONS") == "true" { + conf.Offline = true + } + fmt.Printf("conf.Offline: %t\n", conf.Offline) // Install test user fonts from pkg/testdata/fonts. fonts, err := userFonts(filepath.Join(inDir, "fonts")) @@ -186,6 +190,8 @@ func TestValidate(t *testing.T) { msg := "TestValidate" inFile := filepath.Join(inDir, "Acroforms2.pdf") + //log.SetDefaultStatsLogger() + // Validate inFile. 
if err := api.ValidateFile(inFile, nil); err != nil { t.Fatalf("%s: %v\n", msg, err) @@ -230,7 +236,7 @@ func TestInfo(t *testing.T) { } defer f.Close() - info, err := api.PDFInfo(f, inFile, nil, conf) + info, err := api.PDFInfo(f, inFile, nil, true, conf) if err != nil { t.Fatalf("%s: %v\n", msg, err) } diff --git a/pkg/api/test/attachment_test.go b/pkg/api/test/attachment_test.go index ed8ec614..449be26d 100644 --- a/pkg/api/test/attachment_test.go +++ b/pkg/api/test/attachment_test.go @@ -17,6 +17,7 @@ limitations under the License. package test import ( + "fmt" "io" "os" "path/filepath" @@ -260,3 +261,35 @@ func TestAttachmentsLowLevel(t *testing.T) { removeAttachment(t, msg, outFile, a, ctx) } + +func TestSanitizePath(t *testing.T) { + + msg := "TestSanitizePath" + + testPaths := []string{ + "", + ".", + "..", + "../..", + "foo/.", + "bar/..", + "foo/bar/.", + "foo/bar/", + "foo/./bar/..", + "foo/./bar/./..", + "foo/./bar/../.", + "foo/./bar/../..", + "foo/./bar/", + "foo/../bar/..", + "docs/report.pdf", + "../../etc/passwd", + "/etc/passwd", + "subdir/../bar//../file.txt", + } + + for _, path := range testPaths { + result := api.SanitizePath(path) + fmt.Printf("%s: %q -> %q \n", msg, path, result) + } + +} diff --git a/pkg/api/test/booklet_test.go b/pkg/api/test/booklet_test.go index 7977fd43..32861f4d 100644 --- a/pkg/api/test/booklet_test.go +++ b/pkg/api/test/booklet_test.go @@ -156,7 +156,7 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTest.pdf")}, filepath.Join(outDir, "BookletFromPDFLetter_2Up_perfectbound.pdf"), []string{"1-24"}, - "p:LetterP, g:on, btype:perfectbound", + "p:LetterP, g:on, btype:perfectbound, ma:10, bgcol:#f7e6c7", "points", 2, false, @@ -165,27 +165,18 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTest.pdf")}, filepath.Join(outDir, "BookletFromPDFLedger_6Up.pdf"), []string{"1-24"}, - "p:LedgerP, g:on", + "p:LedgerP, g:on, ma:10, bgcol:#f7e6c7", "points", 6, false, }, - 
{"TestBookletFromPDF_8up", - []string{filepath.Join(inDir, "bookletTest.pdf")}, - filepath.Join(outDir, "BookletFromPDFLedger_8Up.pdf"), - []string{"1-32"}, - "p:LedgerP, g:on", - "points", - 8, - false, - }, // misc orientations and booklet types on 4-up {"TestBookletFromPDF_4up_portrait_short", []string{filepath.Join(inDir, "bookletTest.pdf")}, filepath.Join(outDir, "BookletFromPDFLedger_4Up_portrait_short.pdf"), []string{"1-24"}, - "p:LedgerP, g:on, binding:short", + "p:LedgerP, g:on, binding:short, ma:10, bgcol:#f7e6c7", "points", 4, false, @@ -194,7 +185,7 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTestLandscape.pdf")}, filepath.Join(outDir, "BookletFromPDFLedger_4Up_landscape_long.pdf"), []string{"1-24"}, - "p:LedgerL, g:on", + "p:LedgerL, g:on, ma:10, bgcol:#f7e6c7", "points", 4, false, @@ -203,7 +194,7 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTestLandscape.pdf")}, filepath.Join(outDir, "BookletFromPDFLedger_4Up_landscape_short.pdf"), []string{"1-24"}, - "p:LedgerL, g:on, binding:short", + "p:LedgerL, g:on, binding:short, ma:10, bgcol:#f7e6c7", "points", 4, false, @@ -212,7 +203,7 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTest.pdf")}, filepath.Join(outDir, "BookletFromPDFLedger_4Up_portrait_long_advanced.pdf"), []string{"1-24"}, - "p:LedgerP, g:on, btype:bookletadvanced", + "p:LedgerP, g:on, btype:bookletadvanced, ma:10, bgcol:#f7e6c7", "points", 4, false, @@ -221,7 +212,7 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTestLandscape.pdf")}, filepath.Join(outDir, "BookletFromPDFLedger_4Up_landscape_short_advanced.pdf"), []string{"1-24"}, - "p:LedgerL, g:on, binding:short, btype:bookletadvanced", + "p:LedgerL, g:on, binding:short, btype:bookletadvanced, ma:10, bgcol:#f7e6c7", "points", 4, false, @@ -230,16 +221,53 @@ func TestBooklet(t *testing.T) { []string{filepath.Join(inDir, "bookletTest.pdf")}, filepath.Join(outDir, 
"BookletFromPDFLedger_4Up_perfectbound.pdf"), []string{"1-24"}, - "p:LedgerP, g:on, btype:perfectbound", + "p:LedgerP, g:on, btype:perfectbound, ma:10, bgcol:#f7e6c7", "points", 4, false, }, + // 8up + {"TestBookletFromPDF8Up", + []string{filepath.Join(inDir, "bookletTestA6.pdf")}, + filepath.Join(outDir, "BookletFromPDF8Up.pdf"), + nil, + "p:A3, g:on, ma:10, bgcol:#f7e6c7", + "points", + 8, + false, + }, + {"TestBookletFromPDF8UpPortraitShort", + []string{filepath.Join(inDir, "bookletTestA6.pdf")}, + filepath.Join(outDir, "BookletFromPDF8UpPortraitShort.pdf"), + nil, + "p:A3, binding:short, g:on, ma:10, bgcol:#f7e6c7", + "points", + 8, + false, + }, + {"TestBookletFromPDF8UpLandscapeLong", + []string{filepath.Join(inDir, "bookletTestA6L.pdf")}, + filepath.Join(outDir, "BookletFromPDF8UpLandscapeLong.pdf"), + nil, + "p:A3, binding:long, g:on, ma:10, bgcol:#f7e6c7", + "points", + 8, + false, + }, + {"TestBookletFromPDF8UpLandscapeShort", + []string{filepath.Join(inDir, "bookletTestA6L.pdf")}, + filepath.Join(outDir, "BookletFromPDF8UpLandscapeShort.pdf"), + nil, + "p:A3, binding:short, g:on, ma:10, bgcol:#f7e6c7", + "points", + 8, + false, + }, + // 2-up multi folio booklet from PDF on A4 using 8 sheets per folio // using the default foliosize:8 // Here we print 2 complete folios (2 x 8 sheets) + 1 partial folio - // multi folio only makes sense for n = 2 // See also https://www.instructables.com/How-to-bind-your-own-Hardback-Book/ {"TestHardbackBookFromPDF", []string{filepath.Join(inDir, "WaldenFull.pdf")}, @@ -251,8 +279,10 @@ func TestBooklet(t *testing.T) { false, }, } { - conf := model.NewDefaultConfiguration() - conf.SetUnit(tt.unit) - testBooklet(t, tt.msg, tt.inFiles, tt.outFile, tt.selectedPages, tt.desc, tt.n, tt.isImg, conf) + t.Run(tt.msg, func(subTest *testing.T) { + conf := model.NewDefaultConfiguration() + conf.SetUnit(tt.unit) + testBooklet(subTest, tt.msg, tt.inFiles, tt.outFile, tt.selectedPages, tt.desc, tt.n, tt.isImg, conf) + }) } } diff --git 
a/pkg/api/test/bookmark_test.go b/pkg/api/test/bookmark_test.go
index 51e44272..98c243a2 100644
--- a/pkg/api/test/bookmark_test.go
+++ b/pkg/api/test/bookmark_test.go
@@ -66,15 +66,24 @@ func TestListBookmarks(t *testing.T) {
 	}
 }

-func InactiveTestAddDuplicateBookmarks(t *testing.T) {
+func TestAddDuplicateBookmarks(t *testing.T) {
 	msg := "TestAddDuplicateBookmarks"
 	inFile := filepath.Join(inDir, "CenterOfWhy.pdf")
 	outFile := filepath.Join("..", "..", "samples", "bookmarks", "bookmarkDuplicates.pdf")

 	bms := []pdfcpu.Bookmark{
-		{PageFrom: 2, Title: "Duplicate Name"},
-		{PageFrom: 3, Title: "Duplicate Name"},
-		{PageFrom: 5, Title: "Duplicate Name"},
+		{PageFrom: 1, Title: "Parent1",
+			Kids: []pdfcpu.Bookmark{
+				{PageFrom: 2, Title: "kid1"},
+				{PageFrom: 3, Title: "kid2"},
+			},
+		},
+		{PageFrom: 4, Title: "Parent2",
+			Kids: []pdfcpu.Bookmark{
+				{PageFrom: 5, Title: "kid1"},
+				{PageFrom: 6, Title: "kid2"},
+			},
+		},
 	}

 	replace := true // Replace existing bookmarks.
diff --git a/pkg/api/test/certificate_test.go b/pkg/api/test/certificate_test.go
new file mode 100644
index 00000000..51e6f7ac
--- /dev/null
+++ b/pkg/api/test/certificate_test.go
@@ -0,0 +1,34 @@
+/*
+	Copyright 2025 The pdfcpu Authors.
+
+	Licensed under the Apache License, Version 2.0 (the "License");
+	you may not use this file except in compliance with the License.
+	You may obtain a copy of the License at
+
+		http://www.apache.org/licenses/LICENSE-2.0
+
+	Unless required by applicable law or agreed to in writing, software
+	distributed under the License is distributed on an "AS IS" BASIS,
+	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+	See the License for the specific language governing permissions and
+	limitations under the License.
+*/
+
+package test
+
+import (
+	"testing"
+
+	"github.com/angel-one/pdfcpu/pkg/api"
+)
+
+func TestListCertificates(t *testing.T) {
+	msg := "TestListCertificates"
+
+	n, err := api.LoadCertificates()
+	if err != nil {
+		t.Fatalf("%s: %v\n", msg, err)
+	}
+
+	t.Logf("Loaded %d certs", n)
+}
diff --git a/pkg/api/test/encryption_test.go b/pkg/api/test/encryption_test.go
index a6e4e2ad..9478fd8f 100644
--- a/pkg/api/test/encryption_test.go
+++ b/pkg/api/test/encryption_test.go
@@ -189,6 +189,22 @@ func TestEncryption(t *testing.T) {
 	}
 }

+func TestPDF20Encryption(t *testing.T) {
+	// PDF 2.0 encryption assumes aes/256.
+	for _, fileName := range []string{
+		"i277.pdf",
+		"imageWithBPC.pdf",
+		"pageLevelOutputIntent.pdf",
+		"SimplePDF2.0.pdf",
+		"utf8stringAndAnnotation.pdf",
+		"utf8test.pdf",
+		"viaIncrementalSave.pdf",
+		"withOffsetStart.pdf",
+	} {
+		testEncryption(t, filepath.Join("pdf20", fileName), "aes", 256)
+	}
+}
+
 func TestSetPermissions(t *testing.T) {
 	msg := "TestSetPermissions"
 	inFile := filepath.Join(inDir, "5116.DCT_Filter.pdf")
diff --git a/pkg/api/test/extract_test.go b/pkg/api/test/extract_test.go
index b8830ab6..6ab511fb 100644
--- a/pkg/api/test/extract_test.go
+++ b/pkg/api/test/extract_test.go
@@ -201,7 +201,7 @@ func TestExtractFontsLowLevel(t *testing.T) {

 	// Extract fonts for page 1.
 	i := 1
-	ff, err := pdfcpu.ExtractPageFonts(ctx, i)
+	ff, err := pdfcpu.ExtractPageFonts(ctx, 1, types.IntSet{}, types.IntSet{})
 	if err != nil {
 		t.Fatalf("%s extractPageFonts(%d): %v\n", msg, i, err)
 	}
@@ -227,7 +227,7 @@ func TestExtractPages(t *testing.T) {
 func TestExtractPagesLowLevel(t *testing.T) {
 	msg := "TestExtractPagesLowLevel"
 	inFile := filepath.Join(inDir, "TheGoProgrammingLanguageCh1.pdf")
-	outFile := filepath.Join(outDir, "MyExtractedAndProcessedSinglePage.pdf")
+	outFile := "MyExtractedAndProcessedSinglePage.pdf"

 	// Create a context.
 	ctx, err := api.ReadContextFile(inFile)
@@ -237,17 +237,16 @@ func TestExtractPagesLowLevel(t *testing.T) {

 	// Extract page 1.
 	i := 1
-	ctxNew, err := pdfcpu.ExtractPage(ctx, i)
+
+	r, err := api.ExtractPage(ctx, i)
 	if err != nil {
 		t.Fatalf("%s extractPage(%d): %v\n", msg, i, err)
 	}

-	// Here you can process this single page PDF context.
-
-	// Write context to file.
-	if err := api.WriteContextFile(ctxNew, outFile); err != nil {
-		t.Fatalf("%s write: %v\n", msg, err)
+	if err := api.WritePage(r, outDir, outFile, i); err != nil {
+		t.Fatalf("%s writePage(%d): %v\n", msg, i, err)
 	}
+
 }

 func TestExtractContent(t *testing.T) {
@@ -266,7 +265,7 @@ func TestExtractContentLowLevel(t *testing.T) {
 	// Create a context.
 	ctx, err := api.ReadContextFile(inFile)
 	if err != nil {
-		t.Fatalf("%s readContext: %v\n", msg, err)
+		t.Fatalf("%s read context: %v\n", msg, err)
 	}

 	// Extract page content for page 2.
diff --git a/pkg/api/test/images_test.go b/pkg/api/test/images_test.go
new file mode 100644
index 00000000..6ddfa9d8
--- /dev/null
+++ b/pkg/api/test/images_test.go
@@ -0,0 +1,131 @@
+/*
+Copyright 2024 The pdf Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+	http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/ + +package test + +import ( + "path/filepath" + "testing" + + "github.com/angel-one/pdfcpu/pkg/api" +) + +func testUpdateImages(t *testing.T, msg string, inFile, imgFile, outFile string, objNr, pageNr int, id string) { + t.Helper() + + if err := api.UpdateImagesFile(inFile, imgFile, outFile, objNr, pageNr, id, conf); err != nil { + t.Fatalf("%s %s: %v\n", msg, outFile, err) + } + if err := api.ValidateFile(outFile, conf); err != nil { + t.Fatalf("%s: %v\n", msg, err) + } +} + +func TestUpdateImages(t *testing.T) { + + outDir := filepath.Join(samplesDir, "images") + inDir := outDir + + for _, tt := range []struct { + msg string + inFile string + imgFile string + outFile string + objNr int // by objNr + pageNr int // or by (pageNr, id) + id string + }{ + {"TestUpdateByObjNr", + "test.pdf", + "test_1_Im1.png", + "ImageUpdatedByObjNr.pdf", + 8, + 0, + ""}, + + {"TestUpdateByPageNrAndId", + "test.pdf", + "test_1_Im1.png", + "imageUpdatedByPageNrAndIdPage1.pdf", + 0, + 1, + "Im1"}, + + {"TestUpdateByPageNrAndId", + "test.pdf", + "test_1_Im1.png", + "imageUpdatedByPageNrAndIdPage2.pdf", + 0, + 2, + "Im1"}, + + {"TestUpdateByImageFileName", + "test.pdf", + "test_1_Im1.png", + "imageUpdatedByFileName.pdf", + 0, + 0, + ""}, + + {"TestUpdateByPageNrAndId", + "test.pdf", + "any.png", + "imageUpdatedByPageNrAndIdAny.pdf", + 0, + 1, + "Im1"}, + + {"TestUpdateByObjNrPNG", + "test.pdf", + "any.png", + "imageUpdatedByObjNrPNG.pdf", + 8, + 0, + ""}, + + {"TestUpdateByObjNrJPG", + "test.pdf", + "any.jpg", + "imageUpdatedByObjNrJPG.pdf", + 8, + 0, + ""}, + + {"TestUpdateByObjNrTIFF", + "test.pdf", + "any.tiff", + "imageUpdatedByObjNrTIFF.pdf", + 8, + 0, + ""}, + + {"TestUpdateByObjNrWEBP", + "test.pdf", + "any.webp", + "imageUpdatedByObjNrWEBP.pdf", + 8, + 0, + ""}, + } { + testUpdateImages(t, tt.msg, + filepath.Join(inDir, tt.inFile), + filepath.Join(outDir, tt.imgFile), + filepath.Join(outDir, tt.outFile), + tt.objNr, + tt.pageNr, + tt.id) + } +} diff --git 
a/pkg/api/test/keyword_test.go b/pkg/api/test/keyword_test.go index ebd8a840..58664b14 100644 --- a/pkg/api/test/keyword_test.go +++ b/pkg/api/test/keyword_test.go @@ -23,6 +23,7 @@ import ( "github.com/angel-one/pdfcpu/pkg/api" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" + "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" ) func listKeywordsFile(t *testing.T, fileName string, conf *model.Configuration) ([]string, error) { @@ -51,8 +52,9 @@ func listKeywords(t *testing.T, msg, fileName string, want []string) []string { if len(got) != len(want) { t.Fatalf("%s: list keywords %s: want %d got %d\n", msg, fileName, len(want), len(got)) } - for i, v := range got { - if v != want[i] { + + for _, v := range got { + if !types.MemberOf(v, want) { t.Fatalf("%s: list keywords %s: want %v got %v\n", msg, fileName, want, got) } } @@ -70,19 +72,22 @@ func TestKeywords(t *testing.T) { // # of keywords must be 0 listKeywords(t, msg, fileName, nil) - keywords := []string{"Ö", "keyword2"} - + keywords := []string{"Ö", "你好"} if err := api.AddKeywordsFile(fileName, "", keywords, nil); err != nil { t.Fatalf("%s add keywords: %v\n", msg, err) } - listKeywords(t, msg, fileName, keywords) - if err := api.RemoveKeywordsFile(fileName, "", []string{"keyword2"}, nil); err != nil { - t.Fatalf("%s remove 1 keyword: %v\n", msg, err) + keywords = []string{"world"} + if err := api.AddKeywordsFile(fileName, "", keywords, nil); err != nil { + t.Fatalf("%s add keywords: %v\n", msg, err) } + listKeywords(t, msg, fileName, []string{"Ö", "你好", "world"}) - listKeywords(t, msg, fileName, []string{"Ö"}) + if err := api.RemoveKeywordsFile(fileName, "", []string{"你好"}, nil); err != nil { + t.Fatalf("%s remove 1 keyword: %v\n", msg, err) + } + listKeywords(t, msg, fileName, []string{"Ö", "world"}) if err := api.RemoveKeywordsFile(fileName, "", nil, nil); err != nil { t.Fatalf("%s remove all keywords: %v\n", msg, err) diff --git a/pkg/api/test/merge_test.go b/pkg/api/test/merge_test.go index 07425b2b..4c7edfc4 
100644 --- a/pkg/api/test/merge_test.go +++ b/pkg/api/test/merge_test.go @@ -64,8 +64,8 @@ func TestMergeCreateZipped(t *testing.T) { // The actual use case for this is the recombination of 2 PDF files representing even and odd pages of some PDF source. // See #716 - inFile2 := filepath.Join(inDir, "adobe_errata.pdf") inFile1 := filepath.Join(inDir, "Acroforms2.pdf") + inFile2 := filepath.Join(inDir, "adobe_errata.pdf") outFile := filepath.Join(outDir, "out.pdf") if err := api.MergeCreateZipFile(inFile1, inFile2, outFile, nil); err != nil { diff --git a/pkg/api/test/page_test.go b/pkg/api/test/page_test.go index a6b1338e..f8896e3a 100644 --- a/pkg/api/test/page_test.go +++ b/pkg/api/test/page_test.go @@ -21,6 +21,8 @@ import ( "testing" "github.com/angel-one/pdfcpu/pkg/api" + "github.com/angel-one/pdfcpu/pkg/pdfcpu" + "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" ) func TestInsertRemovePages(t *testing.T) { @@ -34,7 +36,7 @@ func TestInsertRemovePages(t *testing.T) { } // Insert an empty page before pages 1 and 2. - if err := api.InsertPagesFile(inFile, outFile, []string{"-2"}, true, nil); err != nil { + if err := api.InsertPagesFile(inFile, outFile, []string{"-2"}, true, nil, nil); err != nil { t.Fatalf("%s %s: %v\n", msg, outFile, err) } if err := api.ValidateFile(outFile, nil); err != nil { @@ -63,3 +65,36 @@ func TestInsertRemovePages(t *testing.T) { t.Fatalf("%s %s: pageCount want:%d got:%d\n", msg, inFile, n1, n2) } } + +func TestInsertCustomBlankPage(t *testing.T) { + msg := "TestInsertCustomBlankPage" + inFile := filepath.Join(inDir, "Acroforms2.pdf") + outFile := filepath.Join(outDir, "test.pdf") + + selectedPages := []string{"2"} + + before := false + + pageConf, err := pdfcpu.ParsePageConfiguration("f:A5L", conf.Unit) + if err != nil { + t.Fatalf("%s %s: %v\n", msg, outFile, err) + } + + // Insert an empty A5 page in landscape mode after page 2. 
+ if err := api.InsertPagesFile(inFile, outFile, selectedPages, before, pageConf, conf); err != nil { + t.Fatalf("%s %s: %v\n", msg, outFile, err) + } + + selectedPages = []string{"odd"} + + pageConf, err = pdfcpu.ParsePageConfiguration("dim:5 10", types.CENTIMETRES) + if err != nil { + t.Fatalf("%s %s: %v\n", msg, outFile, err) + } + + // Insert an empty page with dimensions 5 x 10 cm after every odd page. + if err := api.InsertPagesFile(inFile, outFile, selectedPages, before, pageConf, conf); err != nil { + t.Fatalf("%s %s: %v\n", msg, outFile, err) + } + +} diff --git a/pkg/api/test/property_test.go b/pkg/api/test/property_test.go index 3ef3b5ed..4c0851b6 100644 --- a/pkg/api/test/property_test.go +++ b/pkg/api/test/property_test.go @@ -83,18 +83,18 @@ func TestProperties(t *testing.T) { // # of properties must be 0 listProperties(t, msg, fileName, nil) - properties := map[string]string{"name1": "value1", "nameÖ": "valueö"} + properties := map[string]string{"name1": "value1", "nameÖ": "valueö", "cjkv": "你好"} if err := api.AddPropertiesFile(fileName, "", properties, nil); err != nil { t.Fatalf("%s add properties: %v\n", msg, err) } - listProperties(t, msg, fileName, []string{"name1 = value1", "nameÖ = valueö"}) + listProperties(t, msg, fileName, []string{"cjkv = 你好", "name1 = value1", "nameÖ = valueö"}) if err := api.RemovePropertiesFile(fileName, "", []string{"nameÖ"}, nil); err != nil { t.Fatalf("%s remove 1 property: %v\n", msg, err) } - listProperties(t, msg, fileName, []string{"name1 = value1"}) + listProperties(t, msg, fileName, []string{"cjkv = 你好", "name1 = value1"}) if err := api.RemovePropertiesFile(fileName, "", nil, nil); err != nil { t.Fatalf("%s remove all properties: %v\n", msg, err) diff --git a/pkg/api/test/selectPages_test.go b/pkg/api/test/selectPages_test.go index 92c4d5e2..1b367cfe 100644 --- a/pkg/api/test/selectPages_test.go +++ b/pkg/api/test/selectPages_test.go @@ -144,6 +144,7 @@ func TestSelectedPages(t *testing.T) { 
testSelectedPages("l,even", pageCount, "01011", t) testSelectedPages("1-l,!2-l-1", pageCount, "10001", t) + testSelectedPages("1-l,!2-l-1", pageCount, "10001", t) } func collectedPagesString(cp []int) string { @@ -188,7 +189,6 @@ func TestCollectedPages(t *testing.T) { testCollectedPages("3", pageCount, "[3]", t) testCollectedPages("4", pageCount, "[4]", t) testCollectedPages("5", pageCount, "[5]", t) - testCollectedPages("6", pageCount, "[]", t) testCollectedPages("-3", pageCount, "[1 2 3]", t) testCollectedPages("3-", pageCount, "[3 4 5]", t) @@ -201,18 +201,12 @@ func TestCollectedPages(t *testing.T) { testCollectedPages("5-7", pageCount, "[5]", t) testCollectedPages("4-", pageCount, "[4 5]", t) testCollectedPages("5-", pageCount, "[5]", t) - testCollectedPages("!4", pageCount, "[]", t) testCollectedPages("-l", pageCount, "[1 2 3 4 5]", t) testCollectedPages("-l-1", pageCount, "[1 2 3 4]", t) testCollectedPages("2-l", pageCount, "[2 3 4 5]", t) testCollectedPages("2-l-2", pageCount, "[2 3]", t) testCollectedPages("2-l-3", pageCount, "[2]", t) - testCollectedPages("2-l-4", pageCount, "[]", t) - testCollectedPages("!l", pageCount, "[]", t) - testCollectedPages("nl", pageCount, "[]", t) - testCollectedPages("!l-2", pageCount, "[]", t) - testCollectedPages("nl-2", pageCount, "[]", t) testCollectedPages("l", pageCount, "[5]", t) testCollectedPages("l-1", pageCount, "[4]", t) testCollectedPages("l-1-", pageCount, "[4 5]", t) @@ -226,6 +220,4 @@ func TestCollectedPages(t *testing.T) { testCollectedPages("1-,!l", pageCount, "[1 2 3 4]", t) testCollectedPages("1-,nl", pageCount, "[1 2 3 4]", t) - testSelectedPages("1-l,!2-l-1", pageCount, "10001", t) - } diff --git a/pkg/api/test/sign_test.go b/pkg/api/test/sign_test.go new file mode 100644 index 00000000..72316ba2 --- /dev/null +++ b/pkg/api/test/sign_test.go @@ -0,0 +1,107 @@ +/* +Copyright 2025 The pdf Authors. 
+ +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package test + +import ( + "fmt" + "path/filepath" + "testing" + + "github.com/angel-one/pdfcpu/pkg/api" +) + +func logResults(ss []string) { + for _, s := range ss { + fmt.Println(s) + } +} + +func TestValidateSignature_X509_RSA_SHA1(t *testing.T) { + msg := "ValidateSignature_X509_RSA_SHA1" + + // You may provide your signed PDFs in this dir. + dir := filepath.Join(samplesDir, "signatures", "adbe.x509.rsa_sha1") + + for _, fn := range AllPDFs(t, dir) { + inFile := filepath.Join(dir, fn) + fmt.Println("\nvalidate signatures of " + inFile) + all := true + full := false + ss, err := api.ValidateSignaturesFile(inFile, all, full, conf) + if err != nil { + t.Fatalf("%s: %v\n", msg, err) + } + logResults(ss) + } +} + +func TestValidateSignature_PKCS7_SHA1(t *testing.T) { + msg := "ValidateSignature_PKCS7_SHA1" + + // You may provide your signed PDFs in this dir. + dir := filepath.Join(samplesDir, "signatures", "adbe.pkcs7.sha1") + + for _, fn := range AllPDFs(t, dir) { + inFile := filepath.Join(dir, fn) + fmt.Println("validate signatures of " + inFile) + all := true + full := false + ss, err := api.ValidateSignaturesFile(inFile, all, full, conf) + if err != nil { + t.Fatalf("%s: %v\n", msg, err) + } + logResults(ss) + } +} + +func TestValidateSignature_PKCS7_Detached(t *testing.T) { + msg := "ValidateSignature_PKCS7_Detached" + + // You may provide your signed PDFs in this dir. 
+ dir := filepath.Join(samplesDir, "signatures", "adbe.pkcs7.detached") + + for _, fn := range AllPDFs(t, dir) { + inFile := filepath.Join(dir, fn) + fmt.Println("\nvalidate signatures of " + inFile) + all := true + full := true + ss, err := api.ValidateSignaturesFile(inFile, all, full, conf) + if err != nil { + t.Fatalf("%s: %v\n", msg, err) + } + logResults(ss) + } +} + +func TestValidateSignature_ETSI_CAdES_Detached(t *testing.T) { + msg := "ValidateSignature_ETSI_CAdES_Detached" + + // You may provide your signed PDFs in this dir. + dir := filepath.Join(samplesDir, "signatures", "ETSI.CAdES.detached") + + for _, fn := range AllPDFs(t, dir) { + inFile := filepath.Join(dir, fn) + fmt.Println("\nvalidate signatures of " + inFile) + all := true + full := true + ss, err := api.ValidateSignaturesFile(inFile, all, full, conf) + if err != nil { + t.Fatalf("%s: %v\n", msg, err) + } + logResults(ss) + } +} diff --git a/pkg/api/test/stampVersatile_test.go b/pkg/api/test/stampVersatile_test.go index 3cafe061..57f415a1 100644 --- a/pkg/api/test/stampVersatile_test.go +++ b/pkg/api/test/stampVersatile_test.go @@ -77,7 +77,7 @@ func TestAlternatingPageNumbersViaWatermarkMap(t *testing.T) { t.Fatalf("%s %s: %v\n", msg, outFile, err) } - // Add a "Draft" stamp with opacity 0.6 along the 1st diagonale in light blue using Courier. + // Add a "Draft" stamp with opacity 0.6 along the 1st diagonal in light blue using Courier. if err := api.AddTextWatermarksFile(outFile, outFile, nil, true, "Draft", "fo:Courier, scale:.9, fillcol:#00aacc, op:.6", nil); err != nil { t.Fatalf("%s %s: %v\n", msg, outFile, err) } @@ -134,7 +134,7 @@ func TestAlternatingPageNumbersViaWatermarkMapLowLevel(t *testing.T) { t.Fatalf("%s %s: %v\n", msg, outFile, err) } - // Add a "Draft" stamp with opacity 0.6 along the 1st diagonale in light blue using Courier. + // Add a "Draft" stamp with opacity 0.6 along the 1st diagonal in light blue using Courier. 
wm, err = api.TextWatermark("Draft", "fo:Courier, scale:.9, fillcol:#00aacc, op:.6", true, false, unit) if err != nil { t.Fatalf("%s %s: %v\n", msg, outFile, err) @@ -204,7 +204,7 @@ func TestAlternatingPageNumbersViaWatermarkSliceMap(t *testing.T) { wms = append(wms, wm) // 3rd watermark on page - // Add a "Draft" stamp with opacity 0.6 along the 1st diagonale in light blue using Courier. + // Add a "Draft" stamp with opacity 0.6 along the 1st diagonal in light blue using Courier. text = "Draft" desc = fmt.Sprintf("fo:Courier, scale:.9, fillcol:#00aacc, op:%f", opacity) wm, err = api.TextWatermark(text, desc, onTop, update, unit) diff --git a/pkg/api/test/stamp_test.go b/pkg/api/test/stamp_test.go index 99be4423..763420f0 100644 --- a/pkg/api/test/stamp_test.go +++ b/pkg/api/test/stamp_test.go @@ -540,8 +540,8 @@ func hasWatermarks(inFile string, t *testing.T) bool { return ok } -func TestStampingLifecyle(t *testing.T) { - msg := "TestStampingLifecyle" +func TestStampingLifecycle(t *testing.T) { + msg := "TestStampingLifecycle" inFile := filepath.Join(inDir, "Acroforms2.pdf") outFile := filepath.Join(outDir, "stampLC.pdf") onTop := true // we are testing stamps diff --git a/pkg/api/validate.go b/pkg/api/validate.go index 4df1c59b..8d2bd4db 100644 --- a/pkg/api/validate.go +++ b/pkg/api/validate.go @@ -23,6 +23,7 @@ import ( "time" "github.com/angel-one/pdfcpu/pkg/log" + "github.com/angel-one/pdfcpu/pkg/pdfcpu" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/pkg/errors" ) @@ -56,6 +57,15 @@ func Validate(rs io.ReadSeeker, conf *model.Configuration) error { err = errors.Wrap(err, fmt.Sprintf("validation error (obj#:%d)%s", ctx.CurObj, s)) } + if err == nil { + if conf.Optimize { + if log.CLIEnabled() { + log.CLI.Println("optimizing...") + } + err = pdfcpu.OptimizeXRefTable(ctx) + } + } + dur2 := time.Since(from2).Seconds() dur := time.Since(from1).Seconds() diff --git a/pkg/api/viewerPreferences.go b/pkg/api/viewerPreferences.go index b43f6c4e..c3a0370d 
100644 --- a/pkg/api/viewerPreferences.go +++ b/pkg/api/viewerPreferences.go @@ -48,7 +48,7 @@ func ViewerPreferences(rs io.ReadSeeker, conf *model.Configuration) (*model.View return nil, nil, err } - v := ctx.Version() + v := ctx.XRefTable.Version() return ctx.ViewerPref, &v, nil } @@ -98,7 +98,7 @@ func ListViewerPreferences(rs io.ReadSeeker, all bool, conf *model.Configuration return []string{"No viewer preferences available."}, nil } - vp1, err := model.ViewerPreferencesWithDefaults(ctx.ViewerPref, ctx.Version()) + vp1, err := model.ViewerPreferencesWithDefaults(ctx.ViewerPref, ctx.XRefTable.Version()) if err != nil { return nil, err } @@ -183,7 +183,7 @@ func SetViewerPreferences(rs io.ReadSeeker, w io.Writer, vp model.ViewerPreferen return err } - version := ctx.Version() + version := ctx.XRefTable.Version() if err := vp.Validate(version); err != nil { return err diff --git a/pkg/cli/cli.go b/pkg/cli/cli.go index c81bc003..b0b8c4b2 100644 --- a/pkg/cli/cli.go +++ b/pkg/cli/cli.go @@ -115,7 +115,7 @@ func InsertPages(cmd *Command) ([]string, error) { if cmd.Mode == model.INSERTPAGESAFTER { before = false } - return nil, api.InsertPagesFile(*cmd.InFile, *cmd.OutFile, cmd.PageSelection, before, cmd.Conf) + return nil, api.InsertPagesFile(*cmd.InFile, *cmd.OutFile, cmd.PageSelection, before, cmd.PageConf, cmd.Conf) } // RemovePages removes selected pages. @@ -185,7 +185,7 @@ func ExtractAttachments(cmd *Command) ([]string, error) { // ListInfo gathers information about inFile and returns the result as []string. func ListInfo(cmd *Command) ([]string, error) { - return ListInfoFiles(cmd.InFiles, cmd.PageSelection, cmd.BoolVal1, cmd.Conf) + return ListInfoFiles(cmd.InFiles, cmd.PageSelection, cmd.BoolVal1, cmd.BoolVal2, cmd.Conf) } // CreateCheatSheetsFonts creates single page PDF cheat sheets for user fonts in current dir. 
@@ -275,6 +275,24 @@ func ListImages(cmd *Command) ([]string, error) { return ListImagesFile(cmd.InFiles, cmd.PageSelection, cmd.Conf) } +// UpdateImages replaces image objects. +func UpdateImages(cmd *Command) ([]string, error) { + var ( + objNr int + pageNr int + id string + ) + if cmd.IntVal > 0 { + if cmd.StringVal != "" { + pageNr = cmd.IntVal + id = cmd.StringVal + } else { + objNr = cmd.IntVal + } + } + return nil, api.UpdateImagesFile(cmd.InFiles[0], cmd.InFiles[1], *cmd.OutFile, objNr, pageNr, id, cmd.Conf) +} + // Dump known object to stdout. func Dump(cmd *Command) ([]string, error) { mode := cmd.IntVals[0] @@ -422,3 +440,23 @@ func ResetViewerPreferences(cmd *Command) ([]string, error) { func Zoom(cmd *Command) ([]string, error) { return nil, api.ZoomFile(*cmd.InFile, *cmd.OutFile, cmd.PageSelection, cmd.Zoom, cmd.Conf) } + +// ListCertificates returns installed certificates. +func ListCertificates(cmd *Command) ([]string, error) { + return ListCertificatesAll(cmd.BoolVal1, cmd.Conf) +} + +// ImportCertificates imports certificates. +func ImportCertificates(cmd *Command) ([]string, error) { + return api.ImportCertificates(cmd.InFiles) +} + +// InspectCertificates prints the certificate details. +func InspectCertificates(cmd *Command) ([]string, error) { + return api.InspectCertificates(cmd.InFiles) +} + +// ValidateSignatures validates contained digital signatures. 
+func ValidateSignatures(cmd *Command) ([]string, error) { + return api.ValidateSignaturesFile(*cmd.InFile, cmd.BoolVal1, cmd.BoolVal2, cmd.Conf) +} diff --git a/pkg/cli/cmd.go b/pkg/cli/cmd.go index e67561c4..bbb51e16 100644 --- a/pkg/cli/cmd.go +++ b/pkg/cli/cmd.go @@ -55,6 +55,7 @@ type Command struct { Zoom *model.Zoom Watermark *model.Watermark ViewerPreferences *model.ViewerPreferences + PageConf *pdfcpu.PageConfiguration Conf *model.Configuration } @@ -110,6 +111,7 @@ var cmdMap = map[model.CommandMode]func(cmd *Command) ([]string, error){ model.LISTANNOTATIONS: processPageAnnotations, model.REMOVEANNOTATIONS: processPageAnnotations, model.LISTIMAGES: processImages, + model.UPDATEIMAGES: processImages, model.DUMP: Dump, model.CREATE: Create, model.LISTFORMFIELDS: processForm, @@ -138,6 +140,10 @@ var cmdMap = map[model.CommandMode]func(cmd *Command) ([]string, error){ model.SETVIEWERPREFERENCES: processViewerPreferences, model.RESETVIEWERPREFERENCES: processViewerPreferences, model.ZOOM: Zoom, + model.LISTCERTIFICATES: processCertificates, + model.INSPECTCERTIFICATES: processCertificates, + model.IMPORTCERTIFICATES: processCertificates, + model.VALIDATESIGNATURES: processSignatures, } // ValidateCommand creates a new command to validate a file. @@ -515,7 +521,7 @@ func ImportImagesCommand(imageFiles []string, outFile string, imp *pdfcpu.Import } // InsertPagesCommand creates a new command to insert a blank page before or after selected pages. 
-func InsertPagesCommand(inFile, outFile string, pageSelection []string, conf *model.Configuration, mode string) *Command { +func InsertPagesCommand(inFile, outFile string, pageSelection []string, conf *model.Configuration, mode string, pageConf *pdfcpu.PageConfiguration) *Command { if conf == nil { conf = model.NewDefaultConfiguration() } @@ -529,6 +535,7 @@ func InsertPagesCommand(inFile, outFile string, pageSelection []string, conf *mo InFile: &inFile, OutFile: &outFile, PageSelection: pageSelection, + PageConf: pageConf, Conf: conf} } @@ -592,7 +599,7 @@ func BookletCommand(inFiles []string, outFile string, pageSelection []string, nu } // InfoCommand creates a new command to output information about inFile. -func InfoCommand(inFiles []string, pageSelection []string, json bool, conf *model.Configuration) *Command { +func InfoCommand(inFiles []string, pageSelection []string, fonts, json bool, conf *model.Configuration) *Command { if conf == nil { conf = model.NewDefaultConfiguration() } @@ -601,7 +608,8 @@ func InfoCommand(inFiles []string, pageSelection []string, json bool, conf *mode Mode: model.LISTINFO, InFiles: inFiles, PageSelection: pageSelection, - BoolVal1: json, + BoolVal1: fonts, + BoolVal2: json, Conf: conf} } @@ -835,6 +843,22 @@ func ListImagesCommand(inFiles []string, pageSelection []string, conf *model.Con Conf: conf} } +// UpdateImagesCommand creates a new command to update images. +func UpdateImagesCommand(inFile, imageFile, outFile string, objNrOrPageNr int, id string, conf *model.Configuration) *Command { + if conf == nil { + conf = model.NewDefaultConfiguration() + } + conf.Cmd = model.UPDATEIMAGES + + return &Command{ + Mode: model.UPDATEIMAGES, + InFiles: []string{inFile, imageFile}, + OutFile: &outFile, + IntVal: objNrOrPageNr, + StringVal: id, + Conf: conf} +} + // DumpCommand creates a new command to dump objects on stdout. 
func DumpCommand(inFilePDF string, vals []int, conf *model.Configuration) *Command { if conf == nil { @@ -1226,3 +1250,53 @@ func ZoomCommand(inFile, outFile string, pageSelection []string, zoom *model.Zoo Zoom: zoom, Conf: conf} } + +// ListCertificatesCommand creates a new command to list installed certificates. +func ListCertificatesCommand(json bool, conf *model.Configuration) *Command { + if conf == nil { + conf = model.NewDefaultConfiguration() + } + conf.Cmd = model.LISTCERTIFICATES + return &Command{ + Mode: model.LISTCERTIFICATES, + BoolVal1: json, + Conf: conf} +} + +// InspectCertificatesCommand creates a new command to inspect certificates. +func InspectCertificatesCommand(inFiles []string, conf *model.Configuration) *Command { + if conf == nil { + conf = model.NewDefaultConfiguration() + } + conf.Cmd = model.INSPECTCERTIFICATES + return &Command{ + Mode: model.INSPECTCERTIFICATES, + InFiles: inFiles, + Conf: conf} +} + +// ImportCertificatesCommand creates a new command to import certificates. +func ImportCertificatesCommand(inFiles []string, conf *model.Configuration) *Command { + if conf == nil { + conf = model.NewDefaultConfiguration() + } + conf.Cmd = model.IMPORTCERTIFICATES + return &Command{ + Mode: model.IMPORTCERTIFICATES, + InFiles: inFiles, + Conf: conf} +} + +// ValidateSignaturesCommand creates a new command to validate encountered digital signatures. +func ValidateSignaturesCommand(inFile string, all, full bool, conf *model.Configuration) *Command { + if conf == nil { + conf = model.NewDefaultConfiguration() + } + conf.Cmd = model.VALIDATESIGNATURES + return &Command{ + Mode: model.VALIDATESIGNATURES, + InFile: &inFile, + BoolVal1: all, + BoolVal2: full, + Conf: conf} +} diff --git a/pkg/cli/list.go b/pkg/cli/list.go index 07d08f7d..502b30d8 100644 --- a/pkg/cli/list.go +++ b/pkg/cli/list.go @@ -18,13 +18,17 @@ limitations under the License. 
package cli import ( + "crypto/x509" "encoding/json" + "encoding/pem" "fmt" "io" "math" "os" + "path/filepath" "sort" "strconv" + "strings" "time" "github.com/angel-one/pdfcpu/pkg/api" @@ -33,6 +37,7 @@ import ( "github.com/angel-one/pdfcpu/pkg/pdfcpu/form" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" + "github.com/hhrutter/pkcs7" "github.com/pkg/errors" ) @@ -211,7 +216,7 @@ func listImages(rs io.ReadSeeker, selectedPages []string, conf *model.Configurat } conf.Cmd = model.LISTIMAGES - ctx, err := api.ReadAndValidate(rs, conf) + ctx, err := api.ReadValidateAndOptimize(rs, conf) if err != nil { return nil, err } @@ -260,14 +265,14 @@ func ListImagesFile(inFiles []string, selectedPages []string, conf *model.Config } // ListInfoFile returns formatted information about inFile. -func ListInfoFile(inFile string, selectedPages []string, conf *model.Configuration) ([]string, error) { +func ListInfoFile(inFile string, selectedPages []string, fonts bool, conf *model.Configuration) ([]string, error) { f, err := os.Open(inFile) if err != nil { return nil, err } defer f.Close() - info, err := api.PDFInfo(f, inFile, selectedPages, conf) + info, err := api.PDFInfo(f, inFile, selectedPages, fonts, conf) if err != nil { return nil, err } @@ -277,7 +282,7 @@ func ListInfoFile(inFile string, selectedPages []string, conf *model.Configurati return nil, err } - ss, err := pdfcpu.ListInfo(info, pages) + ss, err := pdfcpu.ListInfo(info, pages, fonts) if err != nil { return nil, err } @@ -352,7 +357,7 @@ func jsonInfo(info *pdfcpu.PDFInfo, pages types.IntSet) (map[string]model.PageBo return nil, dims } -func listInfoFilesJSON(inFiles []string, selectedPages []string, conf *model.Configuration) ([]string, error) { +func listInfoFilesJSON(inFiles []string, selectedPages []string, fonts bool, conf *model.Configuration) ([]string, error) { var infos []*pdfcpu.PDFInfo for _, fn := range inFiles { @@ -363,7 +368,7 @@ func 
listInfoFilesJSON(inFiles []string, selectedPages []string, conf *model.Con } defer f.Close() - info, err := api.PDFInfo(f, fn, selectedPages, conf) + info, err := api.PDFInfo(f, fn, selectedPages, fonts, conf) if err != nil { return nil, err } @@ -395,10 +400,10 @@ func listInfoFilesJSON(inFiles []string, selectedPages []string, conf *model.Con } // ListInfoFiles returns formatted information about inFiles. -func ListInfoFiles(inFiles []string, selectedPages []string, json bool, conf *model.Configuration) ([]string, error) { +func ListInfoFiles(inFiles []string, selectedPages []string, fonts, json bool, conf *model.Configuration) ([]string, error) { if json { - return listInfoFilesJSON(inFiles, selectedPages, conf) + return listInfoFilesJSON(inFiles, selectedPages, fonts, conf) } var ss []string @@ -407,7 +412,7 @@ func ListInfoFiles(inFiles []string, selectedPages []string, json bool, conf *mo if i > 0 { ss = append(ss, "") } - ssx, err := ListInfoFile(fn, selectedPages, conf) + ssx, err := ListInfoFile(fn, selectedPages, fonts, conf) if err != nil { if len(inFiles) == 1 { return nil, err @@ -446,10 +451,6 @@ func listPermissions(rs io.ReadSeeker, conf *model.Configuration) ([]string, err return nil, err } - if ctx.Version() == model.V20 { - return nil, pdfcpu.ErrUnsupportedVersion - } - return pdfcpu.Permissions(ctx), nil } @@ -545,3 +546,128 @@ func ListBookmarksFile(inFile string, conf *model.Configuration) ([]string, erro return listBookmarks(f, conf) } + +func listPEM(fName string) (int, error) { + bb, err := os.ReadFile(fName) + if err != nil { + fmt.Printf("%v\n", err) + return 0, err + } + + if len(bb) == 0 { + //return 0, errors.Errorf("%s is empty\n", filepath.Base(fName)) + return 0, errors.New("is empty\n") + } + + ss := []string{} + for len(bb) > 0 { + var block *pem.Block + block, bb = pem.Decode(bb) + if block == nil { + break + } + if block.Type != "CERTIFICATE" || len(block.Headers) != 0 { + continue + } + + certBytes := block.Bytes + cert, err 
:= x509.ParseCertificate(certBytes) + if err != nil { + fmt.Printf("%v\n", err) + continue + } + ss = append(ss, model.CertString(cert)) + } + + sort.Strings(ss) + for i, s := range ss { + fmt.Printf("%03d:\n%s\n", i+1, s) + } + + return len(ss), nil +} + +func listP7C(fName string) (int, error) { + bb, err := os.ReadFile(fName) + if err != nil { + fmt.Printf("%v\n", err) + return 0, err + } + + if len(bb) == 0 { + //return 0, errors.Errorf("%s is empty\n", filepath.Base(fName)) + return 0, errors.New("is empty\n") + } + + // // Check if the data starts with PEM markers (for Base64 encoding) + // if isPEM(data) { + // // If the file is Base64 encoded (PEM format), decode it + // decodedData, err := base64.StdEncoding.DecodeString(string(data)) + // if err != nil { + // log.Fatalf("Error decoding Base64: %v", err) + // } + // data = decodedData + // } + + p7, err := pkcs7.Parse(bb) + if err != nil { + return 0, err + } + + ss := []string{} + for _, cert := range p7.Certificates { + ss = append(ss, model.CertString(cert)) + } + + sort.Strings(ss) + for i, s := range ss { + fmt.Printf("%03d:\n%s\n", i+1, s) + } + + return len(ss), nil +} + +// ListCertificatesAll returns formatted information about installed certificates. 
+func ListCertificatesAll(json bool, conf *model.Configuration) ([]string, error) { + // Process *.pem and *.p7c + fmt.Printf("certDir: %s\n", model.CertDir) + + if err := os.MkdirAll(model.CertDir, os.ModePerm); err != nil { + return nil, err + } + + count := 0 + + err := filepath.WalkDir(model.CertDir, func(path string, d os.DirEntry, err error) error { + if err != nil { + return err + } + if d.IsDir() { + return nil + } + if !model.IsPEM(path) && !model.IsP7C(path) { + return nil + } + + fmt.Printf("\n%s:\n", strings.TrimPrefix(path, model.CertDir)) + + if model.IsPEM(path) { + c, err := listPEM(path) + if err != nil { + fmt.Printf("%v\n", err) + } + count += c + return nil + } + c, err := listP7C(path) + if err != nil { + fmt.Printf("%v\n", err) + } + count += c + return nil + }) + + fmt.Printf("total installed certs: %d\n", count) + + return nil, err +} diff --git a/pkg/cli/process.go b/pkg/cli/process.go index ed17c263..f9d65d9e 100644 --- a/pkg/cli/process.go +++ b/pkg/cli/process.go @@ -38,19 +38,6 @@ func Process(cmd *Command) (out []string, err error) { return nil, errors.Errorf("pdfcpu: process: Unknown command mode %d\n", cmd.Mode) } -func processPageAnnotations(cmd *Command) (out []string, err error) { - switch cmd.Mode { - - case model.LISTANNOTATIONS: - out, err = ListAnnotations(cmd) - - case model.REMOVEANNOTATIONS: - out, err = RemoveAnnotations(cmd) - } - - return out, err -} - func processAttachments(cmd *Command) (out []string, err error) { switch cmd.Mode { @@ -70,38 +57,23 @@ func processAttachments(cmd *Command) (out []string, err error) { return out, err } -func processKeywords(cmd *Command) (out []string, err error) { - switch cmd.Mode { - - case model.LISTKEYWORDS: - out, err = ListKeywords(cmd) - - case model.ADDKEYWORDS: - out, err = AddKeywords(cmd) - - case model.REMOVEKEYWORDS: - out, err = RemoveKeywords(cmd) - - } - - return out, err -} - -func processProperties(cmd *Command) (out []string, err error) { +func processBookmarks(cmd 
*Command) (out []string, err error) { switch cmd.Mode { - case model.LISTPROPERTIES: - out, err = ListProperties(cmd) + case model.LISTBOOKMARKS: + return ListBookmarks(cmd) - case model.ADDPROPERTIES: - out, err = AddProperties(cmd) + case model.EXPORTBOOKMARKS: + return ExportBookmarks(cmd) - case model.REMOVEPROPERTIES: - out, err = RemoveProperties(cmd) + case model.IMPORTBOOKMARKS: + return ImportBookmarks(cmd) + case model.REMOVEBOOKMARKS: + return RemoveBookmarks(cmd) } - return out, err + return nil, nil } func processEncryption(cmd *Command) (out []string, err error) { @@ -123,46 +95,32 @@ func processEncryption(cmd *Command) (out []string, err error) { return nil, nil } -func processPermissions(cmd *Command) (out []string, err error) { - switch cmd.Mode { - - case model.LISTPERMISSIONS: - return ListPermissions(cmd) - - case model.SETPERMISSIONS: - return SetPermissions(cmd) - } - - return nil, nil -} - -func processPages(cmd *Command) (out []string, err error) { +func processForm(cmd *Command) (out []string, err error) { switch cmd.Mode { - case model.INSERTPAGESBEFORE, model.INSERTPAGESAFTER: - return InsertPages(cmd) + case model.LISTFORMFIELDS: + return ListFormFields(cmd) - case model.REMOVEPAGES: - return RemovePages(cmd) - } + case model.REMOVEFORMFIELDS: + return RemoveFormFields(cmd) - return nil, nil -} + case model.LOCKFORMFIELDS: + return LockFormFields(cmd) -func processPageBoundaries(cmd *Command) (out []string, err error) { - switch cmd.Mode { + case model.UNLOCKFORMFIELDS: + return UnlockFormFields(cmd) - case model.LISTBOXES: - return ListBoxes(cmd) + case model.RESETFORMFIELDS: + return ResetFormFields(cmd) - case model.ADDBOXES: - return AddBoxes(cmd) + case model.EXPORTFORMFIELDS: + return ExportFormFields(cmd) - case model.REMOVEBOXES: - return RemoveBoxes(cmd) + case model.FILLFORMFIELDS: + return FillFormFields(cmd) - case model.CROP: - return Crop(cmd) + case model.MULTIFILLFORMFIELDS: + return MultiFillFormFields(cmd) } return 
nil, nil @@ -173,56 +131,58 @@ func processImages(cmd *Command) (out []string, err error) { case model.LISTIMAGES: return ListImages(cmd) + + case model.UPDATEIMAGES: + return UpdateImages(cmd) } return nil, nil } -func processForm(cmd *Command) (out []string, err error) { +func processKeywords(cmd *Command) (out []string, err error) { switch cmd.Mode { - case model.LISTFORMFIELDS: - return ListFormFields(cmd) + case model.LISTKEYWORDS: + out, err = ListKeywords(cmd) - case model.REMOVEFORMFIELDS: - return RemoveFormFields(cmd) + case model.ADDKEYWORDS: + out, err = AddKeywords(cmd) - case model.LOCKFORMFIELDS: - return LockFormFields(cmd) + case model.REMOVEKEYWORDS: + out, err = RemoveKeywords(cmd) - case model.UNLOCKFORMFIELDS: - return UnlockFormFields(cmd) + } - case model.RESETFORMFIELDS: - return ResetFormFields(cmd) + return out, err +} - case model.EXPORTFORMFIELDS: - return ExportFormFields(cmd) +func processPageAnnotations(cmd *Command) (out []string, err error) { + switch cmd.Mode { - case model.FILLFORMFIELDS: - return FillFormFields(cmd) + case model.LISTANNOTATIONS: + out, err = ListAnnotations(cmd) - case model.MULTIFILLFORMFIELDS: - return MultiFillFormFields(cmd) + case model.REMOVEANNOTATIONS: + out, err = RemoveAnnotations(cmd) } - return nil, nil + return out, err } -func processBookmarks(cmd *Command) (out []string, err error) { +func processPageBoundaries(cmd *Command) (out []string, err error) { switch cmd.Mode { - case model.LISTBOOKMARKS: - return ListBookmarks(cmd) + case model.LISTBOXES: + return ListBoxes(cmd) - case model.EXPORTBOOKMARKS: - return ExportBookmarks(cmd) + case model.ADDBOXES: + return AddBoxes(cmd) - case model.IMPORTBOOKMARKS: - return ImportBookmarks(cmd) + case model.REMOVEBOXES: + return RemoveBoxes(cmd) - case model.REMOVEBOOKMARKS: - return RemoveBookmarks(cmd) + case model.CROP: + return Crop(cmd) } return nil, nil @@ -260,6 +220,49 @@ func processPageMode(cmd *Command) (out []string, err error) { return nil, nil 
} +func processPages(cmd *Command) (out []string, err error) { + switch cmd.Mode { + + case model.INSERTPAGESBEFORE, model.INSERTPAGESAFTER: + return InsertPages(cmd) + + case model.REMOVEPAGES: + return RemovePages(cmd) + } + + return nil, nil +} + +func processPermissions(cmd *Command) (out []string, err error) { + switch cmd.Mode { + + case model.LISTPERMISSIONS: + return ListPermissions(cmd) + + case model.SETPERMISSIONS: + return SetPermissions(cmd) + } + + return nil, nil +} + +func processProperties(cmd *Command) (out []string, err error) { + switch cmd.Mode { + + case model.LISTPROPERTIES: + out, err = ListProperties(cmd) + + case model.ADDPROPERTIES: + out, err = AddProperties(cmd) + + case model.REMOVEPROPERTIES: + out, err = RemoveProperties(cmd) + + } + + return out, err +} + func processViewerPreferences(cmd *Command) (out []string, err error) { switch cmd.Mode { @@ -275,3 +278,30 @@ func processViewerPreferences(cmd *Command) (out []string, err error) { return nil, nil } + +func processCertificates(cmd *Command) (out []string, err error) { + switch cmd.Mode { + + case model.LISTCERTIFICATES: + return ListCertificates(cmd) + + case model.INSPECTCERTIFICATES: + return InspectCertificates(cmd) + + case model.IMPORTCERTIFICATES: + return ImportCertificates(cmd) + + } + + return nil, nil +} + +func processSignatures(cmd *Command) (out []string, err error) { + switch cmd.Mode { + + case model.VALIDATESIGNATURES: + return ValidateSignatures(cmd) + } + + return nil, nil +} diff --git a/pkg/cli/test/certificate_test.go b/pkg/cli/test/certificate_test.go new file mode 100644 index 00000000..25ac462f --- /dev/null +++ b/pkg/cli/test/certificate_test.go @@ -0,0 +1,32 @@ +/* +Copyright 2025 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package test + +import ( + "testing" + + "github.com/angel-one/pdfcpu/pkg/cli" +) + +func TestListCertificates(t *testing.T) { + msg := "TestListCertificates" + + cmd := cli.ListCertificatesCommand(false, conf) + if _, err := cli.Process(cmd); err != nil { + t.Fatalf("%s: %v\n", msg, err) + } +} diff --git a/pkg/cli/test/cli_test.go b/pkg/cli/test/cli_test.go index 543d88d0..84ced87b 100644 --- a/pkg/cli/test/cli_test.go +++ b/pkg/cli/test/cli_test.go @@ -153,7 +153,7 @@ func TestInfoCommand(t *testing.T) { msg := "TestInfoCommand" inFile := filepath.Join(inDir, "5116.DCT_Filter.pdf") - cmd := cli.InfoCommand([]string{inFile}, nil, true, conf) + cmd := cli.InfoCommand([]string{inFile}, nil, true, true, conf) if _, err := cli.Process(cmd); err != nil { t.Fatalf("%s: %v\n", msg, err) } diff --git a/pkg/cli/test/images_test.go b/pkg/cli/test/images_test.go new file mode 100644 index 00000000..4150994e --- /dev/null +++ b/pkg/cli/test/images_test.go @@ -0,0 +1,120 @@ +/* +Copyright 2024 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package test + +import ( + "path/filepath" + "testing" + + "github.com/angel-one/pdfcpu/pkg/cli" +) + +func testUpdateImages(t *testing.T, msg string, inFile, imgFile, outFile string, objNrOrPageNr int, id string) { + t.Helper() + + cmd := cli.UpdateImagesCommand(inFile, imgFile, outFile, objNrOrPageNr, id, conf) + if _, err := cli.Process(cmd); err != nil { + t.Fatalf("%s %s: %v\n", msg, inFile, err) + } + + if err := validateFile(t, outFile, conf); err != nil { + t.Fatalf("%s: %v\n", msg, err) + } +} + +func TestUpdateImages(t *testing.T) { + inDir := filepath.Join(samplesDir, "images") + + for _, tt := range []struct { + msg string + inFile string + imgFile string + outFile string + objNrOrPageNr int + id string + }{ + {"TestUpdateByObjNr", + "test.pdf", + "test_1_Im1.png", + "ImageUpdatedByObjNr.pdf", + 8, + ""}, + + {"TestUpdateByPageNrAndId", + "test.pdf", + "test_1_Im1.png", + "imageUpdatedByPageNrAndIdPage1.pdf", + 1, + "Im1"}, + + {"TestUpdateByPageNrAndId", + "test.pdf", + "test_1_Im1.png", + "imageUpdatedByPageNrAndIdPage2.pdf", + 2, + "Im1"}, + + {"TestUpdateByImageFileName", + "test.pdf", + "test_1_Im1.png", + "imageUpdatedByFileName.pdf", + 0, + ""}, + + {"TestUpdateByPageNrAndId", + "test.pdf", + "any.png", + "imageUpdatedByPageNrAndIdAny.pdf", + 1, + "Im1"}, + + {"TestUpdateByObjNrPNG", + "test.pdf", + "any.png", + "imageUpdatedByObjNrPNG.pdf", + 8, + ""}, + + {"TestUpdateByObjNrJPG", + "test.pdf", + "any.jpg", + "imageUpdatedByObjNrJPG.pdf", + 8, + ""}, + + {"TestUpdateByObjNrTIFF", + "test.pdf", + "any.tiff", + "imageUpdatedByObjNrTIFF.pdf", + 8, + ""}, + + {"TestUpdateByObjNrWEBP", + "test.pdf", + "any.webp", + "imageUpdatedByObjNrWEBP.pdf", + 8, + ""}, + } { + testUpdateImages(t, tt.msg, + filepath.Join(inDir, tt.inFile), + filepath.Join(inDir, tt.imgFile), + filepath.Join(outDir, tt.outFile), + tt.objNrOrPageNr, + tt.id) + } +} diff --git a/pkg/cli/test/keyword_test.go b/pkg/cli/test/keyword_test.go index ce8fc799..0a0c83c9 100644 --- 
a/pkg/cli/test/keyword_test.go +++ b/pkg/cli/test/keyword_test.go @@ -21,6 +21,7 @@ import ( "testing" "github.com/angel-one/pdfcpu/pkg/cli" + "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" ) func listKeywords(t *testing.T, msg, fileName string, want []string) []string { @@ -33,8 +34,9 @@ func listKeywords(t *testing.T, msg, fileName string, want []string) []string { if len(got) != len(want) { t.Fatalf("%s: list keywords %s: want %d got %d\n", msg, fileName, len(want), len(got)) } - for i, v := range got { - if v != want[i] { + + for _, v := range got { + if !types.MemberOf(v, want) { t.Fatalf("%s: list keywords %s: want %v got %v\n", msg, fileName, want, got) } } diff --git a/pkg/cli/test/nup_test.go b/pkg/cli/test/nup_test.go index a769b65f..c2ce1749 100644 --- a/pkg/cli/test/nup_test.go +++ b/pkg/cli/test/nup_test.go @@ -90,7 +90,7 @@ func TestNUpCommand(t *testing.T) { }, filepath.Join(outDir, "out1.pdf"), nil, - "form:Tabloid, bo:off, ma:0", + "form:Tabloid, bo:off, ma:0, enforce:off", "points", 6, true}, diff --git a/pkg/cli/test/page_test.go b/pkg/cli/test/page_test.go index 796eac05..5d777de1 100644 --- a/pkg/cli/test/page_test.go +++ b/pkg/cli/test/page_test.go @@ -35,7 +35,7 @@ func TestPagesCommand(t *testing.T) { } // Insert an empty page before pages 1 and 2. 
- cmd := cli.InsertPagesCommand(inFile, outFile, []string{"-2"}, conf, "before") + cmd := cli.InsertPagesCommand(inFile, outFile, []string{"-2"}, conf, "before", nil) if _, err := cli.Process(cmd); err != nil { t.Fatalf("%s %s: %v\n", msg, outFile, err) } diff --git a/pkg/cli/test/stamp_test.go b/pkg/cli/test/stamp_test.go index 713b4fe1..65f67c67 100644 --- a/pkg/cli/test/stamp_test.go +++ b/pkg/cli/test/stamp_test.go @@ -144,8 +144,8 @@ func TestAddWatermarks(t *testing.T) { } } -func TestStampingLifecyle(t *testing.T) { - msg := "TestStampingLifecyle" +func TestStampingLifecycle(t *testing.T) { + msg := "TestStampingLifecycle" inFile := filepath.Join(inDir, "Acroforms2.pdf") outFile := filepath.Join(outDir, "stampLC.pdf") onTop := true // we are testing stamps diff --git a/pkg/filter/ascii85Decode.go b/pkg/filter/ascii85Decode.go index af1d883f..3f195e07 100644 --- a/pkg/filter/ascii85Decode.go +++ b/pkg/filter/ascii85Decode.go @@ -48,6 +48,10 @@ func (f ascii85Decode) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for an ASCII85Decode filter. 
func (f ascii85Decode) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} + +func (f ascii85Decode) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { bb, err := getReaderBytes(r) if err != nil { @@ -71,8 +75,14 @@ func (f ascii85Decode) Decode(r io.Reader) (io.Reader, error) { decoder := ascii85.NewDecoder(bytes.NewReader(bb)) var b2 bytes.Buffer - if _, err := io.Copy(&b2, decoder); err != nil { - return nil, err + if maxLen < 0 { + if _, err := io.Copy(&b2, decoder); err != nil { + return nil, err + } + } else { + if _, err := io.CopyN(&b2, decoder, maxLen); err != nil { + return nil, err + } } return &b2, nil diff --git a/pkg/filter/asciiHexDecode.go b/pkg/filter/asciiHexDecode.go index 04a8bea1..5d0945a2 100644 --- a/pkg/filter/asciiHexDecode.go +++ b/pkg/filter/asciiHexDecode.go @@ -47,7 +47,10 @@ func (f asciiHexDecode) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for an ASCIIHexDecode filter. func (f asciiHexDecode) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} +func (f asciiHexDecode) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { bb, err := getReaderBytes(r) if err != nil { return nil, err @@ -70,9 +73,12 @@ func (f asciiHexDecode) Decode(r io.Reader) (io.Reader, error) { p = append(p, '0') } - dst := make([]byte, hex.DecodedLen(len(p))) + if maxLen < 0 { + maxLen = int64(hex.DecodedLen(len(p))) + } + dst := make([]byte, maxLen) - if _, err := hex.Decode(dst, p); err != nil { + if _, err := hex.Decode(dst, p[:maxLen*2]); err != nil { return nil, err } diff --git a/pkg/filter/ccittDecode.go b/pkg/filter/ccittDecode.go index 1b5793e0..964c8c7c 100644 --- a/pkg/filter/ccittDecode.go +++ b/pkg/filter/ccittDecode.go @@ -37,6 +37,10 @@ func (f ccittDecode) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for a CCITTDecode filter. 
func (f ccittDecode) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} + +func (f ccittDecode) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { if log.TraceEnabled() { log.Trace.Println("DecodeCCITT begin") } diff --git a/pkg/filter/dctDecode.go b/pkg/filter/dctDecode.go index db517337..593c3ccc 100644 --- a/pkg/filter/dctDecode.go +++ b/pkg/filter/dctDecode.go @@ -35,7 +35,10 @@ func (f dctDecode) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for a DCTDecode filter. func (f dctDecode) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} +func (f dctDecode) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { im, err := jpeg.Decode(r) if err != nil { return nil, err diff --git a/pkg/filter/filter.go b/pkg/filter/filter.go index 9f5cd860..7b1afdb5 100644 --- a/pkg/filter/filter.go +++ b/pkg/filter/filter.go @@ -45,6 +45,10 @@ var ErrUnsupportedFilter = errors.New("pdfcpu: filter not supported") type Filter interface { Encode(r io.Reader) (io.Reader, error) Decode(r io.Reader) (io.Reader, error) + // DecodeLength will decode at least maxLen bytes. For filters where decoding + // parts doesn't make sense (e.g. DCT), the whole stream is decoded. + // If maxLen < 0 is passed, the whole stream is decoded. + DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) } // NewFilter returns a filter for given filterName and an optional parameter dictionary. 
@@ -100,6 +104,10 @@ type baseFilter struct { parms map[string]int } +func SupportsDecodeParms(f string) bool { + return f == CCITTFax || f == LZW || f == Flate +} + func getReaderBytes(r io.Reader) ([]byte, error) { var bb []byte if buf, ok := r.(*bytes.Buffer); ok { diff --git a/pkg/filter/flateDecode.go b/pkg/filter/flateDecode.go index 2a75ffdd..c11c9b2b 100644 --- a/pkg/filter/flateDecode.go +++ b/pkg/filter/flateDecode.go @@ -20,6 +20,7 @@ import ( "bytes" "compress/zlib" "io" + "strings" "github.com/angel-one/pdfcpu/pkg/log" "github.com/pkg/errors" @@ -81,6 +82,10 @@ func (f flate) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for a Flate filter. func (f flate) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} + +func (f flate) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { if log.TraceEnabled() { log.Trace.Println("DecodeFlate begin") } @@ -92,12 +97,23 @@ func (f flate) Decode(r io.Reader) (io.Reader, error) { defer rc.Close() // Optional decode parameters need postprocessing. - return f.decodePostProcess(rc) + return f.decodePostProcess(rc, maxLen) } -func passThru(rin io.Reader) (*bytes.Buffer, error) { +func passThru(rin io.Reader, maxLen int64) (*bytes.Buffer, error) { var b bytes.Buffer - _, err := io.Copy(&b, rin) + var err error + if maxLen < 0 { + _, err = io.Copy(&b, rin) + } else { + _, err = io.CopyN(&b, rin, maxLen) + } + if err != nil && strings.Contains(err.Error(), "invalid checksum") { + if log.CLIEnabled() { + log.CLI.Println("skipped: truncated zlib stream") + } + err = nil + } if err == io.ErrUnexpectedEOF { // Workaround for missing support for partial flush in compress/flate. 
// See also https://github.com/golang/go/issues/31514 @@ -258,11 +274,26 @@ func (f flate) parameters() (colors, bpc, columns int, err error) { return colors, bpc, columns, nil } +func checkBufLen(b bytes.Buffer, maxLen int64) bool { + return maxLen < 0 || int64(b.Len()) < maxLen +} + +func process(w io.Writer, pr, cr []byte, predictor, colors, bytesPerPixel int) error { + d, err := processRow(pr, cr, predictor, colors, bytesPerPixel) + if err != nil { + return err + } + + _, err = w.Write(d) + + return err +} + // decodePostProcess -func (f flate) decodePostProcess(r io.Reader) (io.Reader, error) { +func (f flate) decodePostProcess(r io.Reader, maxLen int64) (io.Reader, error) { predictor, found := f.parms["Predictor"] if !found || predictor == PredictorNo { - return passThru(r) + return passThru(r, maxLen) } if !intMemberOf( @@ -299,7 +330,7 @@ func (f flate) decodePostProcess(r io.Reader) (io.Reader, error) { // Output buffer var b bytes.Buffer - for { + for checkBufLen(b, maxLen) { // Read decompressed bytes for one pixel row. 
n, err := io.ReadFull(r, cr) @@ -317,14 +348,8 @@ func (f flate) decodePostProcess(r io.Reader) (io.Reader, error) { return nil, errors.Errorf("pdfcpu: filter FlateDecode: read error, expected %d bytes, got: %d", m, n) } - d, err1 := processRow(pr, cr, predictor, colors, bytesPerPixel) - if err1 != nil { - return nil, err1 - } - - _, err1 = b.Write(d) - if err1 != nil { - return nil, err1 + if err := process(&b, pr, cr, predictor, colors, bytesPerPixel); err != nil { + return nil, err } if err == io.EOF { @@ -334,7 +359,7 @@ func (f flate) decodePostProcess(r io.Reader) (io.Reader, error) { pr, cr = cr, pr } - if b.Len()%rowSize > 0 { + if maxLen < 0 && b.Len()%rowSize > 0 { log.Info.Printf("failed postprocessing: %d %d\n", b.Len(), rowSize) return nil, errors.New("pdfcpu: filter FlateDecode: postprocessing failed") } diff --git a/pkg/filter/lzwDecode.go b/pkg/filter/lzwDecode.go index 407a0344..76886a11 100644 --- a/pkg/filter/lzwDecode.go +++ b/pkg/filter/lzwDecode.go @@ -59,6 +59,10 @@ func (f lzwDecode) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for an LZWDecode filter. 
func (f lzwDecode) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} + +func (f lzwDecode) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { if log.TraceEnabled() { log.Trace.Println("DecodeLZW begin") } @@ -77,7 +81,13 @@ func (f lzwDecode) Decode(r io.Reader) (io.Reader, error) { defer rc.Close() var b bytes.Buffer - written, err := io.Copy(&b, rc) + var written int64 + var err error + if maxLen < 0 { + written, err = io.Copy(&b, rc) + } else { + written, err = io.CopyN(&b, rc, maxLen) + } if err != nil { return nil, err } diff --git a/pkg/filter/runLengthDecode.go b/pkg/filter/runLengthDecode.go index 006653ae..2cbb0bcf 100644 --- a/pkg/filter/runLengthDecode.go +++ b/pkg/filter/runLengthDecode.go @@ -25,7 +25,8 @@ type runLengthDecode struct { baseFilter } -func (f runLengthDecode) decode(w io.ByteWriter, src []byte) { +func (f runLengthDecode) decode(w io.ByteWriter, src []byte, maxLen int64) { + var written int64 for i := 0; i < len(src); { b := src[i] @@ -37,14 +38,24 @@ func (f runLengthDecode) decode(w io.ByteWriter, src []byte) { if b < 0x80 { c := int(b) + 1 for j := 0; j < c; j++ { + if maxLen >= 0 && maxLen == written { + break + } + w.WriteByte(src[i]) + written++ i++ } continue } c := 257 - int(b) for j := 0; j < c; j++ { + if maxLen >= 0 && maxLen == written { + break + } + w.WriteByte(src[i]) + written++ } i++ } @@ -125,6 +136,10 @@ func (f runLengthDecode) Encode(r io.Reader) (io.Reader, error) { // Decode implements decoding for an RunLengthDecode filter. 
func (f runLengthDecode) Decode(r io.Reader) (io.Reader, error) { + return f.DecodeLength(r, -1) +} + +func (f runLengthDecode) DecodeLength(r io.Reader, maxLen int64) (io.Reader, error) { b1, err := getReaderBytes(r) if err != nil { @@ -132,7 +147,7 @@ func (f runLengthDecode) Decode(r io.Reader) (io.Reader, error) { } var b2 bytes.Buffer - f.decode(&b2, b1) + f.decode(&b2, b1, maxLen) return &b2, nil } diff --git a/pkg/filter/runLengthDecode_test.go b/pkg/filter/runLengthDecode_test.go index 9c674347..1008cecc 100644 --- a/pkg/filter/runLengthDecode_test.go +++ b/pkg/filter/runLengthDecode_test.go @@ -71,7 +71,7 @@ func TestRunLengthEncoding(t *testing.T) { compare(t, enc.Bytes(), []byte(tt.enc)) var raw bytes.Buffer - f.decode(&raw, enc.Bytes()) + f.decode(&raw, enc.Bytes(), -1) compare(t, raw.Bytes(), []byte(tt.raw)) } diff --git a/pkg/font/metrics.go b/pkg/font/metrics.go index b5d5952e..109e7add 100644 --- a/pkg/font/metrics.go +++ b/pkg/font/metrics.go @@ -23,6 +23,7 @@ import ( "os" "path" "path/filepath" + "runtime/debug" "strconv" "strings" "sync" @@ -116,7 +117,7 @@ func (fd TTFLight) unicodeRangeBits(id string) []int { // Returns a slice of relevant unicodeRangeBits. // // This mapping is incomplete as we only cover unicode blocks of the most popular scripts. - // Please go to https://github.com/pdfcpu/pdfcpu/issues/new/choose for an extension request. + // Please go to https://github.com/angel-one/pdfcpu/issues/new/choose for an extension request. 
// // 0 Basic Latin 0000-007F // 1 Latin-1 Supplement 0080-00FF @@ -277,6 +278,7 @@ func CharWidth(fontName string, r rune) int { ttf, ok := UserFontMetrics[fontName] if !ok { fmt.Fprintf(os.Stderr, "pdfcpu: user font not loaded: %s\n", fontName) + debug.PrintStack() os.Exit(1) } diff --git a/pkg/pdfcpu/annotation.go b/pkg/pdfcpu/annotation.go index 8606e0ef..ecffc8b6 100644 --- a/pkg/pdfcpu/annotation.go +++ b/pkg/pdfcpu/annotation.go @@ -23,6 +23,7 @@ import ( "strings" "github.com/angel-one/pdfcpu/pkg/log" + "github.com/angel-one/pdfcpu/pkg/pdfcpu/draw" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" "github.com/pkg/errors" @@ -122,18 +123,47 @@ func findAnnotByObjNr(objNr int, annots types.Array) (int, error) { return -1, nil } -func createAnnot(ctx *model.Context, ar model.AnnotationRenderer, pageIndRef *types.IndirectRef) (*types.IndirectRef, error) { - d, err := ar.RenderDict(ctx.XRefTable, *pageIndRef) +func createAnnot(ctx *model.Context, ar model.AnnotationRenderer, pageIndRef *types.IndirectRef) (*types.IndirectRef, types.Dict, error) { + d, err := ar.RenderDict(ctx.XRefTable, pageIndRef) if err != nil { - return nil, err + return nil, nil, err + } + indRef, err := ctx.IndRefForNewObject(d) + if err != nil { + return nil, nil, err } - return ctx.IndRefForNewObject(d) + return indRef, d, nil +} + +func linkAnnotation(xRefTable *model.XRefTable, d types.Dict, r *types.Rectangle, apObjNr int, contents, nm string, f model.AnnotationFlags) (model.AnnotationRenderer, error) { + var uri string + o, found := d.Find("A") + if found && o != nil { + d, err := xRefTable.DereferenceDict(o) + if err != nil { + if xRefTable.ValidationMode == model.ValidationStrict { + return nil, err + } + model.ShowSkipped("invalid link annotation entry \"A\"") + + } + if d != nil { + bb, err := xRefTable.DereferenceStringEntryBytes(d, "URI") + if err != nil { + return nil, err + } + if len(bb) > 0 { + uri = string(bb) + } + } + } + dest 
:= (*model.Destination)(nil) // will not collect link dest during validation. + return model.NewLinkAnnotation(*r, apObjNr, contents, nm, "", f, nil, dest, uri, nil, false, 0, model.BSSolid), nil } // Annotation returns an annotation renderer. // Validation sets up a cache of annotation renderers. func Annotation(xRefTable *model.XRefTable, d types.Dict) (model.AnnotationRenderer, error) { - subtype := d.NameEntry("Subtype") o, _ := d.Find("Rect") @@ -142,16 +172,31 @@ func Annotation(xRefTable *model.XRefTable, d types.Dict) (model.AnnotationRende return nil, err } - r, err := xRefTable.RectForArray(arr) - if err != nil { - return nil, err + var r *types.Rectangle + + if len(arr) == 4 { + r, err = xRefTable.RectForArray(arr) + if err != nil { + return nil, err + } + } else if xRefTable.ValidationMode == model.ValidationRelaxed { + r = types.NewRectangle(0, 0, 0, 0) } - bb, err := d.StringEntryBytes("Contents") - if err != nil { - return nil, err + var apObjNr int + indRef := d.IndirectRefEntry("AP") + if indRef != nil { + apObjNr = indRef.ObjectNumber.Value() + } + + contents := "" + if c, ok := d["Contents"]; ok { + contents, err = xRefTable.DereferenceStringOrHexLiteral(c, model.V10, nil) + if err != nil { + return nil, err + } + contents = types.RemoveControlChars(contents) } - contents := string(bb) var nm string s := d.StringEntry("NM") // This is what pdfcpu refers to as the annotation id. 
@@ -170,36 +215,24 @@ func Annotation(xRefTable *model.XRefTable, d types.Dict) (model.AnnotationRende switch *subtype { case "Text": - ann = model.NewTextAnnotation(*r, contents, nm, "", f, nil, nil, "", "", true, "") + popupIndRef := d.IndirectRefEntry("Popup") + ann = model.NewTextAnnotation(*r, apObjNr, contents, nm, "", f, nil, "", popupIndRef, nil, "", "", 0, 0, 0, true, "") case "Link": - var uri string - o, found := d.Find("A") - if found && o != nil { - d, err := xRefTable.DereferenceDict(o) - if err != nil { - return nil, err - } - - bb, err := xRefTable.DereferenceStringEntryBytes(d, "URI") - if err != nil { - return nil, err - } - if len(bb) > 0 { - uri = string(bb) - } + ann, err = linkAnnotation(xRefTable, d, r, apObjNr, contents, nm, f) + if err != nil { + return nil, err } - dest := (*model.Destination)(nil) // will not collect link dest during validation. - ann = model.NewLinkAnnotation(*r, nil, dest, uri, nm, f, 0, model.BSSolid, nil, false) case "Popup": parentIndRef := d.IndirectRefEntry("Parent") - ann = model.NewPopupAnnotation(*r, nil, contents, nm, f, nil, parentIndRef) + ann = model.NewPopupAnnotation(*r, apObjNr, contents, nm, "", f, nil, 0, 0, 0, parentIndRef, false) // TODO handle remaining annotation types. 
default: - ann = model.NewAnnotationForRawType(*subtype, *r, contents, nil, nm, f, nil) + ann = model.NewAnnotationForRawType(*subtype, *r, apObjNr, contents, nm, "", f, nil, 0, 0, 0) + } return ann, nil @@ -234,6 +267,56 @@ func AnnotationsForSelectedPages(ctx *model.Context, selectedPages types.IntSet) return m } +func prepareHeader(horSep *[]int, maxLen *AnnotListMaxLengths, customAnnot bool) string { + s := " Obj# " + if maxLen.ObjNr > 4 { + s += strings.Repeat(" ", maxLen.ObjNr-4) + *horSep = append(*horSep, 10+maxLen.ObjNr-4) + } else { + *horSep = append(*horSep, 10) + } + + s += draw.VBar + " Id " + if maxLen.ID > 2 { + s += strings.Repeat(" ", maxLen.ID-2) + *horSep = append(*horSep, 4+maxLen.ID-2) + } else { + *horSep = append(*horSep, 4) + } + + s += draw.VBar + " Rect " + if maxLen.Rect > 4 { + s += strings.Repeat(" ", maxLen.Rect-4) + *horSep = append(*horSep, 6+maxLen.Rect-4) + } else { + *horSep = append(*horSep, 6) + } + + s += draw.VBar + " Content" + if maxLen.Content > 7 { + s += strings.Repeat(" ", maxLen.Content-7) + *horSep = append(*horSep, 8+maxLen.Content-7) + } else { + *horSep = append(*horSep, 8) + } + + if customAnnot { + s += draw.VBar + " Type" + if maxLen.Type > 4 { + s += strings.Repeat(" ", maxLen.Type-4) + *horSep = append(*horSep, 5+maxLen.Type-4) + } else { + *horSep = append(*horSep, 5) + } + } + + return s +} + +type AnnotListMaxLengths struct { + ObjNr, ID, Rect, Content, Type int +} + // ListAnnotations returns a formatted list of annotations. 
func ListAnnotations(annots map[int]model.PgAnnots) (int, []string, error) { var ( @@ -262,37 +345,73 @@ func ListAnnotations(annots map[int]model.PgAnnots) (int, []string, error) { for _, annType := range annTypes { annots := pageAnnots[model.AnnotTypes[annType]] - var ( - maxLenRect int - maxLenContent int - ) - maxLenID := 2 + + var maxLen AnnotListMaxLengths + maxLen.ID = 2 + maxLen.Content = len("Content") + maxLen.Type = len("Type") + var objNrs []int for objNr, ann := range annots.Map { objNrs = append(objNrs, objNr) - if len(ann.RectString()) > maxLenRect { - maxLenRect = len(ann.RectString()) + s := strconv.Itoa(objNr) + if len(s) > maxLen.ObjNr { + maxLen.ObjNr = len(s) } - if len(ann.ID()) > maxLenID { - maxLenID = len(ann.ID()) + if len(ann.RectString()) > maxLen.Rect { + maxLen.Rect = len(ann.RectString()) } - if len(ann.ContentString()) > maxLenContent { - maxLenContent = len(ann.ContentString()) + if len(ann.ID()) > maxLen.ID { + maxLen.ID = len(ann.ID()) + } + if len(ann.ContentString()) > maxLen.Content { + maxLen.Content = len(ann.ContentString()) + } + if len(ann.CustomTypeString()) > maxLen.Type { + maxLen.Type = len(ann.CustomTypeString()) } } sort.Ints(objNrs) ss = append(ss, "") ss = append(ss, fmt.Sprintf(" %s:", annType)) - s1 := (" obj# ") - s2 := fmt.Sprintf("%%%ds", maxLenRect) - s3 := fmt.Sprintf("%%%ds", maxLenID) - s4 := fmt.Sprintf("%%%ds", maxLenContent) - s := fmt.Sprintf(s1+s2+" "+s3+" "+s4, "rect", "id", "content") - ss = append(ss, s) - ss = append(ss, " "+strings.Repeat("=", len(s)-4)) + + horSep := []int{} + + // Render header. + ss = append(ss, prepareHeader(&horSep, &maxLen, annType == "Custom")) + + // Render separator. + ss = append(ss, draw.HorSepLine(horSep)) + + // Render content. 
for _, objNr := range objNrs { ann := annots.Map[objNr] - ss = append(ss, fmt.Sprintf(" %5d "+s2+" "+s3+" "+s4, objNr, ann.RectString(), ann.ID(), ann.ContentString())) + + s := strconv.Itoa(objNr) + fill1 := strings.Repeat(" ", maxLen.ObjNr-len(s)) + if maxLen.ObjNr < 4 { + fill1 += strings.Repeat(" ", 4-maxLen.ObjNr) + } + + s = ann.ID() + fill2 := strings.Repeat(" ", maxLen.ID-len(s)) + if maxLen.ID < 2 { + fill2 += strings.Repeat(" ", 2-maxLen.ID) + } + + s = ann.RectString() + fill3 := strings.Repeat(" ", maxLen.Rect-len(s)) + + if ann.Type() != model.AnnCustom { + ss = append(ss, fmt.Sprintf(" %s%d %s %s%s %s %s%s %s %s", + fill1, objNr, draw.VBar, fill2, ann.ID(), draw.VBar, fill3, ann.RectString(), draw.VBar, ann.ContentString())) + } else { + s = ann.ContentString() + fill4 := strings.Repeat(" ", maxLen.Content-len(s)) + ss = append(ss, fmt.Sprintf(" %s%d %s %s%s %s %s%s %s %s%s%s %s", + fill1, objNr, draw.VBar, fill2, ann.ID(), draw.VBar, fill3, ann.RectString(), draw.VBar, fill4, ann.ContentString(), draw.VBar, ann.CustomTypeString())) + } + j++ } } @@ -308,14 +427,14 @@ func addAnnotationToDirectObj( pageDict types.Dict, pageNr int, ar model.AnnotationRenderer, - incr bool) (bool, error) { + incr bool) error { i, err := findAnnotByID(ctx, ar.ID(), annots) if err != nil { - return false, err + return err } if i >= 0 { - return false, errors.Errorf("page %d: duplicate annotation with id:%s\n", pageNr, ar.ID()) + return errors.Errorf("page %d: duplicate annotation with id:%s\n", pageNr, ar.ID()) } pageDict.Update("Annots", append(annots, *annotIndRef)) if incr { @@ -323,7 +442,7 @@ func addAnnotationToDirectObj( ctx.Write.IncrementWithObjNr(pageDictIndRef.ObjectNumber.Value()) } ctx.EnsureVersionForWriting() - return true, nil + return nil } // AddAnnotation adds ar to pageDict. 
@@ -333,18 +452,18 @@ func AddAnnotation( pageDict types.Dict, pageNr int, ar model.AnnotationRenderer, - incr bool) (bool, error) { + incr bool) (*types.IndirectRef, types.Dict, error) { // Create xreftable entry for annotation. - annotIndRef, err := createAnnot(ctx, ar, pageDictIndRef) + annotIndRef, d, err := createAnnot(ctx, ar, pageDictIndRef) if err != nil { - return false, err + return nil, nil, err } // Add annotation to xreftable page annotation cache. err = addAnnotationToCache(ctx, ar, pageNr, annotIndRef.ObjectNumber.Value()) if err != nil { - return false, err + return nil, nil, err } if incr { @@ -360,33 +479,33 @@ func AddAnnotation( ctx.Write.IncrementWithObjNr(pageDictIndRef.ObjectNumber.Value()) } ctx.EnsureVersionForWriting() - return true, nil + return annotIndRef, d, nil } ir, ok := obj.(types.IndirectRef) if !ok { - return addAnnotationToDirectObj(ctx, obj.(types.Array), annotIndRef, pageDictIndRef, pageDict, pageNr, ar, incr) + return annotIndRef, d, addAnnotationToDirectObj(ctx, obj.(types.Array), annotIndRef, pageDictIndRef, pageDict, pageNr, ar, incr) } // Annots array is an IndirectReference. 
o, err := ctx.Dereference(ir) if err != nil || o == nil { - return false, err + return nil, nil, err } annots, _ := o.(types.Array) i, err := findAnnotByID(ctx, ar.ID(), annots) if err != nil { - return false, err + return nil, nil, err } if i >= 0 { - return false, errors.Errorf("page %d: duplicate annotation with id:%s\n", pageNr, ar.ID()) + return nil, nil, errors.Errorf("page %d: duplicate annotation with id:%s\n", pageNr, ar.ID()) } entry, ok := ctx.FindTableEntryForIndRef(&ir) if !ok { - return false, errors.Errorf("page %d: can't dereference Annots indirect reference(obj#:%d)\n", pageNr, ir.ObjectNumber) + return nil, nil, errors.Errorf("page %d: can't dereference Annots indirect reference(obj#:%d)\n", pageNr, ir.ObjectNumber) } entry.Object = append(annots, *annotIndRef) if incr { @@ -395,7 +514,21 @@ func AddAnnotation( } ctx.EnsureVersionForWriting() - return true, nil + return annotIndRef, d, nil +} + +func AddAnnotationToPage(ctx *model.Context, pageNr int, ar model.AnnotationRenderer, incr bool) (*types.IndirectRef, types.Dict, error) { + pageDictIndRef, err := ctx.PageDictIndRef(pageNr) + if err != nil { + return nil, nil, err + } + + d, err := ctx.DereferenceDict(*pageDictIndRef) + if err != nil { + return nil, nil, err + } + + return AddAnnotation(ctx, pageDictIndRef, d, pageNr, ar, incr) } // AddAnnotations adds ar to selected pages. 
@@ -424,11 +557,11 @@ func AddAnnotations(ctx *model.Context, selectedPages types.IntSet, ar model.Ann return false, err } - added, err := AddAnnotation(ctx, pageDictIndRef, d, k, ar, incr) + indRef, _, err := AddAnnotation(ctx, pageDictIndRef, d, k, ar, incr) if err != nil { return false, err } - if added { + if indRef != nil { ok = true } } @@ -460,11 +593,11 @@ func AddAnnotationsMap(ctx *model.Context, m map[int][]model.AnnotationRenderer, } for _, annot := range annots { - added, err := AddAnnotation(ctx, pageDictIndRef, d, i, annot, incr) + indRef, _, err := AddAnnotation(ctx, pageDictIndRef, d, i, annot, incr) if err != nil { return false, err } - if added { + if indRef != nil { ok = true } } diff --git a/pkg/pdfcpu/booklet.go b/pkg/pdfcpu/booklet.go index 3208d26d..99562c1c 100644 --- a/pkg/pdfcpu/booklet.go +++ b/pkg/pdfcpu/booklet.go @@ -19,6 +19,7 @@ package pdfcpu import ( "bytes" "fmt" + "math" "os" "strconv" "strings" @@ -43,6 +44,7 @@ func DefaultBookletConfig() *model.NUp { nup.FolioSize = 8 nup.BookletType = model.Booklet nup.BookletBinding = model.LongEdge + nup.Enforce = true return nup } @@ -68,11 +70,10 @@ func PDFBookletConfig(val int, desc string, conf *model.Configuration) (*model.N if err := ParseNUpValue(val, nup); err != nil { return nil, err } - // 6up and 8up special cases - if nup.IsBooklet() && val > 4 && nup.IsTopFoldBinding() { + // 6up special cases + if nup.IsBooklet() && val == 6 && nup.IsTopFoldBinding() { // You can't top fold a 6up with 3 rows. 
- // TODO: support this for 8up - return nup, fmt.Errorf("pdfcpu booklet: n>4 must have binding on side (portrait long-edge or landscape short-edge)") + return nup, fmt.Errorf("pdfcpu booklet: n=6 must have binding on side (portrait long-edge or landscape short-edge)") } // bookletadvanced if nup.BookletType == model.BookletAdvanced && val == 4 && nup.IsTopFoldBinding() { @@ -99,7 +100,10 @@ func getPageNumber(pageNumbers []int, n int) int { return pageNumbers[n] } -func nup2OutputPageNr(inputPageNr, inputPageCount int, pageNumbers []int) (int, bool) { +type pageNumberFunction func(inputPageNr int, pageCount int, pageNumbers []int, nup *model.NUp) (int, bool) + +func nup2OutputPageNr(inputPageNr, inputPageCount int, pageNumbers []int, _ *model.NUp) (int, bool) { + // (output page, input page) = [(1,n), (2,1), (3, n-1), (4, 2), (5, n-2), (6, 3), ...] var p int if inputPageNr%2 == 0 { p = inputPageCount - 1 - inputPageNr/2 @@ -383,12 +387,7 @@ func nupPerfectBound(positionNumber int, inputPageCount int, pageNumbers []int, return getPageNumber(pageNumbers, p-1), rotate // p is one-indexed and we want zero-indexed } -type bookletPage struct { - number int - rotate bool -} - -func sortSelectedPagesForBooklet(pages types.IntSet, nup *model.NUp) []bookletPage { +func GetBookletOrdering(pages types.IntSet, nup *model.NUp) []model.BookletPage { pageNumbers := sortSelectedPages(pages) pageCount := len(pageNumbers) @@ -401,46 +400,55 @@ func sortSelectedPagesForBooklet(pages types.IntSet, nup *model.NUp) []bookletPa pageCount += sheetPageCount - pageCount%sheetPageCount } - bookletPages := make([]bookletPage, pageCount) + if nup.MultiFolio { + bookletPages := make([]model.BookletPage, 0) + // folioSize is the number of sheets - each "folio" has two sides and two pages per side + nPagesPerSignature := nup.FolioSize * 4 + nSignaturesInBooklet := int(math.Ceil(float64(pageCount) / float64(nPagesPerSignature))) + for j := 0; j < nSignaturesInBooklet; j++ { + start := j * 
nPagesPerSignature + stop := (j + 1) * nPagesPerSignature + if stop > len(pageNumbers) { + // last signature may be short + stop = len(pageNumbers) + nPagesPerSignature = pageCount - start + } + bookletPages = append(bookletPages, getBookletPageOrdering(nup, pageNumbers[start:stop], nPagesPerSignature)...) + } + return bookletPages + } + return getBookletPageOrdering(nup, pageNumbers, pageCount) +} + +func getBookletPageOrdering(nup *model.NUp, pageNumbers []int, pageCount int) []model.BookletPage { + bookletPages := make([]model.BookletPage, pageCount) + var pageNumberFn pageNumberFunction switch nup.BookletType { case model.Booklet, model.BookletAdvanced: switch nup.N() { case 2: - // (output page, input page) = [(1,n), (2,1), (3, n-1), (4, 2), (5, n-2), (6, 3), ...] - for i := 0; i < pageCount; i++ { - pageNr, rotate := nup2OutputPageNr(i, pageCount, pageNumbers) - bookletPages[i].number = pageNr - bookletPages[i].rotate = rotate - } - + pageNumberFn = nup2OutputPageNr case 4: - for i := 0; i < pageCount; i++ { - pageNr, rotate := nup4OutputPageNr(i, pageCount, pageNumbers, nup) - bookletPages[i].number = pageNr - bookletPages[i].rotate = rotate - } + pageNumberFn = nup4OutputPageNr case 6: - for i := 0; i < pageCount; i++ { - pageNr, rotate := nupLRTBOutputPageNr(i, pageCount, pageNumbers, nup) - bookletPages[i].number = pageNr - bookletPages[i].rotate = rotate - } + pageNumberFn = nupLRTBOutputPageNr case 8: - for i := 0; i < pageCount; i++ { - pageNr, rotate := nup8OutputPageNr(i, pageCount, pageNumbers, nup) - bookletPages[i].number = pageNr - bookletPages[i].rotate = rotate + if nup.BookletBinding == model.ShortEdge { + pageNumberFn = nupLRTBOutputPageNr + } else { // long edge + pageNumberFn = nup8OutputPageNr } } case model.BookletPerfectBound: - for i := 0; i < pageCount; i++ { - pageNr, rotate := nupPerfectBound(i, pageCount, pageNumbers, nup) - bookletPages[i].number = pageNr - bookletPages[i].rotate = rotate - } + pageNumberFn = nupPerfectBound } + 
for i := 0; i < pageCount; i++ { + pageNr, rotate := pageNumberFn(i, pageCount, pageNumbers, nup) + bookletPages[i].Number = pageNr + bookletPages[i].Rotate = rotate + } return bookletPages } @@ -455,7 +463,7 @@ func bookletPages( formsResDict := types.NewDict() rr := nup.RectsForGrid() - for i, bp := range sortSelectedPagesForBooklet(selectedPages, nup) { + for i, bp := range GetBookletOrdering(selectedPages, nup) { if i > 0 && i%len(rr) == 0 { // Wrap complete page. @@ -468,7 +476,7 @@ func bookletPages( rDest := rr[i%len(rr)] - if bp.number == 0 { + if bp.Number == 0 { // This is an empty page at the end. if nup.BgColor != nil { draw.FillRectNoBorder(&buf, rDest, *nup.BgColor) @@ -476,7 +484,7 @@ func bookletPages( continue } - if err := ctx.NUpTilePDFBytesForPDF(bp.number, formsResDict, &buf, rDest, nup, bp.rotate); err != nil { + if err := ctx.NUpTilePDFBytesForPDF(bp.Number, formsResDict, &buf, rDest, nup, bp.Rotate); err != nil { return err } } @@ -503,7 +511,7 @@ func BookletFromImages(ctx *model.Context, fileNames []string, nup *model.NUp, p var buf bytes.Buffer rr := nup.RectsForGrid() - for i, bp := range sortSelectedPagesForBooklet(selectedPages, nup) { + for i, bp := range GetBookletOrdering(selectedPages, nup) { if i > 0 && i%len(rr) == 0 { @@ -518,7 +526,7 @@ func BookletFromImages(ctx *model.Context, fileNames []string, nup *model.NUp, p rDest := rr[i%len(rr)] - if bp.number == 0 { + if bp.Number == 0 { // This is an empty page at the end of a booklet. 
if nup.BgColor != nil { draw.FillRectNoBorder(&buf, rDest, *nup.BgColor) @@ -526,12 +534,12 @@ func BookletFromImages(ctx *model.Context, fileNames []string, nup *model.NUp, p continue } - f, err := os.Open(fileNames[bp.number-1]) + f, err := os.Open(fileNames[bp.Number-1]) if err != nil { return err } - imgIndRef, w, h, err := model.CreateImageResource(xRefTable, f, false, false) + imgIndRef, w, h, err := model.CreateImageResource(xRefTable, f) if err != nil { return err } @@ -549,8 +557,7 @@ func BookletFromImages(ctx *model.Context, fileNames []string, nup *model.NUp, p formsResDict.Insert(formResID, *formIndRef) // Append to content stream of booklet page i. - enforceOrientation := false - model.NUpTilePDFBytes(&buf, types.RectForDim(float64(w), float64(h)), rr[i%len(rr)], formResID, nup, bp.rotate, enforceOrientation) + model.NUpTilePDFBytes(&buf, types.RectForDim(float64(w), float64(h)), rr[i%len(rr)], formResID, nup, bp.Rotate) } // Wrap incomplete booklet page. @@ -587,27 +594,8 @@ func BookletFromPDF(ctx *model.Context, selectedPages types.IntSet, nup *model.N nup.PageDim = &types.Dim{Width: mb.Width(), Height: mb.Height()} - if nup.MultiFolio { - pages := types.IntSet{} - for _, i := range sortSelectedPages(selectedPages) { - pages[i] = true - if len(pages) == 4*nup.FolioSize { - if err = bookletPages(ctx, pages, nup, pagesDict, pagesIndRef); err != nil { - return err - } - pages = types.IntSet{} - } - } - if len(pages) > 0 { - if err = bookletPages(ctx, pages, nup, pagesDict, pagesIndRef); err != nil { - return err - } - } - - } else { - if err = bookletPages(ctx, selectedPages, nup, pagesDict, pagesIndRef); err != nil { - return err - } + if err = bookletPages(ctx, selectedPages, nup, pagesDict, pagesIndRef); err != nil { + return err } // Replace original pagesDict. 
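Reviewer note on the booklet.go change above: the multi-folio path now orders each signature independently inside GetBookletOrdering, instead of calling bookletPages once per signature from BookletFromPDF. What follows is a minimal, self-contained sketch of that idea for the 2-up case only; it is not the pdfcpu implementation. Assumptions: the helper names (pageAt, twoUpSlot, signatureOrder, pagesUpTo) are hypothetical; the odd-slot branch is inferred from the ordering comment in nup2OutputPageNr; out-of-range slots are assumed to yield 0 (a blank page), as the expected orders in booklet_test.go suggest; rotation flags are left out.

```go
package main

import "fmt"

// pageAt is a hypothetical stand-in for getPageNumber. It is assumed to
// return 0 (blank page) for slots past the end of the selection.
func pageAt(pageNumbers []int, n int) int {
	if n >= len(pageNumbers) {
		return 0
	}
	return pageNumbers[n]
}

// twoUpSlot returns the input page for output slot i of a 2-up booklet:
// (slot, page) = (1,n), (2,1), (3,n-1), (4,2), ...
// The odd-slot branch is inferred from that comment, not copied from the diff.
func twoUpSlot(i, pageCount int, pageNumbers []int) int {
	var p int
	if i%2 == 0 {
		p = pageCount - 1 - i/2
	} else {
		p = (i - 1) / 2
	}
	return pageAt(pageNumbers, p)
}

// signatureOrder pads the page count to a multiple of 4 (one 2-up sheet,
// both sides), then orders each signature on its own. The last signature
// may be short and may end up with blank (0) slots.
func signatureOrder(pageNumbers []int, sheetsPerSignature int) []int {
	const sheetPageCount = 4
	pageCount := len(pageNumbers)
	if r := pageCount % sheetPageCount; r != 0 {
		pageCount += sheetPageCount - r
	}
	perSig := sheetsPerSignature * 4
	var out []int
	for start := 0; start < pageCount; start += perSig {
		stop := start + perSig
		n := perSig
		if stop > len(pageNumbers) {
			// Last signature: fewer real pages, padded slot count.
			stop = len(pageNumbers)
			n = pageCount - start
		}
		sig := pageNumbers[start:stop]
		for i := 0; i < n; i++ {
			out = append(out, twoUpSlot(i, n, sig))
		}
	}
	return out
}

// pagesUpTo returns the page numbers 1..n.
func pagesUpTo(n int) []int {
	pp := make([]int, n)
	for i := range pp {
		pp[i] = i + 1
	}
	return pp
}

func main() {
	// 16 pages, 3 sheets (12 pages) per signature.
	fmt.Println(signatureOrder(pagesUpTo(16), 3))
	// prints [12 1 11 2 10 3 9 4 8 5 7 6 16 13 15 14]
}
```

With 18 pages and the same signature size, the second signature is padded to 8 slots and the sketch yields 0, 13, 0, 14, 18, 15, 17, 16 for it, which agrees with the "signatures 2up with trailing blank pages" case added to booklet_test.go.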
diff --git a/pkg/pdfcpu/booklet_test.go b/pkg/pdfcpu/booklet_test.go index 942d0fab..fa495ba6 100644 --- a/pkg/pdfcpu/booklet_test.go +++ b/pkg/pdfcpu/booklet_test.go @@ -22,16 +22,52 @@ import ( ) type pageOrderResults struct { - id string - nup int - pageCount int - expectedPageOrder []int - papersize string - bookletType string - binding string + id string + nup int + pageCount int + expectedPageOrder []int + papersize string + bookletType string + binding string + useSignatures bool + nPagesPerSignature int } var bookletTestCases = []pageOrderResults{ + { + id: "2up", + nup: 2, + pageCount: 16, + expectedPageOrder: []int{ + 16, 1, + 15, 2, + 14, 3, + 13, 4, + 12, 5, + 11, 6, + 10, 7, + 9, 8, + }, + papersize: "A6", + bookletType: "booklet", + binding: "long", + }, + { + id: "2up with trailing blank pages", + nup: 2, + pageCount: 10, + expectedPageOrder: []int{ + 0, 1, + 0, 2, + 10, 3, + 9, 4, + 8, 5, + 7, 6, + }, + papersize: "A6", + bookletType: "booklet", + binding: "long", + }, // basic booklet sidefold test cases { id: "booklet portrait long edge", @@ -140,7 +176,7 @@ var bookletTestCases = []pageOrderResults{ }, // 8up test { - id: "8up", + id: "8up portrait long edge", nup: 8, pageCount: 32, expectedPageOrder: []int{ @@ -149,7 +185,43 @@ var bookletTestCases = []pageOrderResults{ 9, 22, 24, 11, 13, 18, 20, 15, 21, 10, 12, 23, 17, 14, 16, 19, }, - papersize: "A6", // portrait, long-edge binding + papersize: "A6", + bookletType: "booklet", + binding: "long", + }, + { + id: "8up portrait short edge", + nup: 8, + pageCount: 16, + expectedPageOrder: []int{ + 16, 1, 14, 3, 12, 5, 10, 7, + 2, 15, 4, 13, 6, 11, 8, 9, + }, + papersize: "A6", + bookletType: "booklet", + binding: "short", + }, + { + id: "8up landscape short edge", + nup: 8, + pageCount: 16, + expectedPageOrder: []int{ + 16, 1, 14, 3, 12, 5, 10, 7, + 2, 15, 4, 13, 6, 11, 8, 9, + }, + papersize: "A6L", + bookletType: "booklet", + binding: "short", + }, + { + id: "8up landscape long edge", + nup: 8, + 
pageCount: 16, + expectedPageOrder: []int{ + 1, 14, 16, 3, 5, 10, 12, 7, + 13, 2, 4, 15, 9, 6, 8, 11, + }, + papersize: "A6L", bookletType: "booklet", binding: "long", }, @@ -208,26 +280,95 @@ var bookletTestCases = []pageOrderResults{ bookletType: "perfectbound", binding: "long", }, + // signatures + { + id: "signatures 2up", + nup: 2, + pageCount: 16, + expectedPageOrder: []int{ + 12, 1, // signature 1 + 11, 2, + 10, 3, + 9, 4, + 8, 5, + 7, 6, + 16, 13, // signature 2, incomplete + 15, 14, + }, + papersize: "A6", + bookletType: "booklet", + binding: "long", + useSignatures: true, + nPagesPerSignature: 12, + }, + { + id: "signatures 4up", + nup: 4, + pageCount: 24, + expectedPageOrder: []int{ + 16, 1, 3, 14, // signature 1 + 2, 15, 13, 4, + 12, 5, 7, 10, + 6, 11, 9, 8, + 24, 17, 19, 22, // signature 2, incomplete + 18, 23, 21, 20, + }, + papersize: "A5", + bookletType: "booklet", + binding: "long", + useSignatures: true, + nPagesPerSignature: 16, + }, + { + id: "signatures 2up with trailing blank pages", + nup: 2, + pageCount: 18, + expectedPageOrder: []int{ + 12, 1, // signature 1 + 11, 2, + 10, 3, + 9, 4, + 8, 5, + 7, 6, + 0, 13, // signature 2, incomplete, with blanks + 0, 14, + 18, 15, + 17, 16, + }, + papersize: "A6", + bookletType: "booklet", + binding: "long", + useSignatures: true, + nPagesPerSignature: 12, + }, } func TestBookletPageOrder(t *testing.T) { for _, test := range bookletTestCases { - t.Run(test.id, func(t *testing.T) { - nup, err := PDFBookletConfig(test.nup, fmt.Sprintf("papersize:%s, btype:%s, binding: %s", test.papersize, test.bookletType, test.binding), nil) + t.Run(test.id, func(tt *testing.T) { + desc := fmt.Sprintf("papersize:%s, btype:%s, binding: %s", test.papersize, test.bookletType, test.binding) + if test.useSignatures { + desc += fmt.Sprintf(", multifolio:on, foliosize:%d", test.nPagesPerSignature/4) + } + nup, err := PDFBookletConfig(test.nup, desc, nil) if err != nil { - t.Fatal(err) + tt.Fatal(err) } pageNumbers := 
make(map[int]bool) for i := 0; i < test.pageCount; i++ { pageNumbers[i+1] = true } - pageOrder := make([]int, test.pageCount) - for i, p := range sortSelectedPagesForBooklet(pageNumbers, nup) { - pageOrder[i] = p.number + pageOrder := make([]int, len(test.expectedPageOrder)) + out := GetBookletOrdering(pageNumbers, nup) + if len(test.expectedPageOrder) != len(out) { + tt.Fatalf("page order output has the wrong length, expected %d but got %d", len(test.expectedPageOrder), len(out)) + } + for i, p := range out { + pageOrder[i] = p.Number } for i, expected := range test.expectedPageOrder { if pageOrder[i] != expected { - t.Fatal("incorrect page order\nexpected:", arrayToString(test.expectedPageOrder), "\n got:", arrayToString(pageOrder)) + tt.Fatal("incorrect page order\nexpected:", arrayToString(test.expectedPageOrder), "\n got:", arrayToString(pageOrder)) } } }) diff --git a/pkg/pdfcpu/bookmark.go b/pkg/pdfcpu/bookmark.go index 0b5c6afb..77543f9e 100644 --- a/pkg/pdfcpu/bookmark.go +++ b/pkg/pdfcpu/bookmark.go @@ -32,9 +32,9 @@ import ( ) var ( - errNoBookmarks = errors.New("pdfcpu: no bookmarks available") - errCorruptedBookmarks = errors.New("pdfcpu: corrupt bookmark") - errExistingBookmarks = errors.New("pdfcpu: existing bookmarks") + errNoBookmarks = errors.New("pdfcpu: no bookmarks available") + errInvalidBookmark = errors.New("pdfcpu: invalid bookmark") + errExistingBookmarks = errors.New("pdfcpu: existing bookmarks") ) type Header struct { @@ -94,30 +94,12 @@ func (bm Bookmark) Style() int { return i } -func positionToFirstBookmark(ctx *model.Context) (types.Dict, *types.IndirectRef, error) { - - // Position to first bookmark on top most level with more than 1 bookmarks. - // Default to top most single bookmark level. 
- +func positionToFirstBookmark(ctx *model.Context) (*types.IndirectRef, error) { d := ctx.Outlines if d == nil { - return nil, nil, errNoBookmarks + return nil, errNoBookmarks } - - first := d.IndirectRefEntry("First") - last := d.IndirectRefEntry("Last") - - var err error - - for first != nil && last != nil && *first == *last { - if d, err = ctx.DereferenceDict(*first); err != nil { - return nil, nil, err - } - first = d.IndirectRefEntry("First") - last = d.IndirectRefEntry("Last") - } - - return d, first, nil + return d.IndirectRefEntry("First"), nil } func outlineItemTitle(s string) string { @@ -131,43 +113,83 @@ func outlineItemTitle(s string) string { return sb.String() } -// PageObjFromDestinationArray returns an IndirectRef of the destinations page. -func PageObjFromDestination(ctx *model.Context, dest types.Object) (*types.IndirectRef, error) { - var ( - err error - ir types.IndirectRef - arr types.Array - ) +func destArray(ctx *model.Context, dest types.Object) (types.Array, error) { switch dest := dest.(type) { case types.Name: - arr, err = ctx.DereferenceDestArray(dest.Value()) - if err == nil { - ir = arr[0].(types.IndirectRef) - } + return ctx.DereferenceDestArray(dest.Value()) case types.StringLiteral: s, err := types.StringLiteralToString(dest) if err != nil { return nil, err } - arr, err = ctx.DereferenceDestArray(s) - if err == nil { - ir = arr[0].(types.IndirectRef) - } + return ctx.DereferenceDestArray(s) case types.HexLiteral: s, err := types.HexLiteralToString(dest) if err != nil { return nil, err } - arr, err = ctx.DereferenceDestArray(s) - if err == nil { - ir = arr[0].(types.IndirectRef) - } + return ctx.DereferenceDestArray(s) case types.Array: - if dest[0] != nil { - ir = dest[0].(types.IndirectRef) + return dest, nil + } + return nil, errors.Errorf("unable to resolve destination array %v\n", dest) +} + +// PageNrFromDestination returns the page number of a destination. 
+func PageNrFromDestination(ctx *model.Context, dest types.Object) (int, error) { + arr, err := destArray(ctx, dest) + if err != nil && ctx.XRefTable.ValidationMode == model.ValidationRelaxed { + return 0, nil + } + + if i, ok := arr[0].(types.Integer); ok { + return i.Value(), nil + } + + if ir, ok := arr[0].(types.IndirectRef); ok { + return ctx.PageNumber(ir.ObjectNumber.Value()) + } + + return 0, errors.Errorf("unable to extract dest pageNr of %v\n", dest) +} + +func title(ctx *model.Context, d types.Dict) (string, error) { + obj, err := ctx.Dereference(d["Title"]) + if err != nil { + return "", err + } + + s, err := model.Text(obj) + if err != nil { + if ctx.XRefTable.ValidationMode == model.ValidationStrict { + return "", err } + return "", nil } - return &ir, err + + return outlineItemTitle(s), nil +} + +func bookmark(d types.Dict, title string, pageFrom int, parent *Bookmark) Bookmark { + bm := Bookmark{ + Title: title, + PageFrom: pageFrom, + Parent: parent, + Bold: false, + Italic: false, + } + + if arr := d.ArrayEntry("C"); len(arr) == 3 { + col := color.NewSimpleColorForArray(arr) + bm.Color = &col + } + + if f := d.IntEntry("F"); f != nil { + bm.Bold = *f&0x02 > 0 + bm.Italic = *f&0x01 > 0 + } + + return bm } // BookmarksForOutlineItem returns the bookmarks tree for an outline item. @@ -186,18 +208,15 @@ func BookmarksForOutlineItem(ctx *model.Context, item *types.IndirectRef, parent return nil, err } - obj, err := ctx.Dereference(d["Title"]) + title, err := title(ctx, d) if err != nil { return nil, err } - s, err := model.Text(obj) - if err != nil { - return nil, err + if title == "" { + continue } - title := outlineItemTitle(s) - // Retrieve page number out of a destination via "Dest" or "Goto Action". 
dest, destFound := d["Dest"] if !destFound { @@ -213,20 +232,12 @@ func BookmarksForOutlineItem(ctx *model.Context, item *types.IndirectRef, parent dest = act.(types.Dict)["D"] } - obj, err = ctx.Dereference(dest) + obj, err := ctx.Dereference(dest) if err != nil { return nil, err } - ir, err := PageObjFromDestination(ctx, obj) - if err != nil { - return nil, err - } - if ir == nil { - continue - } - - pageFrom, err := ctx.PageNumber(ir.ObjectNumber.Value()) + pageFrom, err := PageNrFromDestination(ctx, obj) if err != nil { return nil, err } @@ -239,32 +250,16 @@ func BookmarksForOutlineItem(ctx *model.Context, item *types.IndirectRef, parent } } - newBookmark := Bookmark{ - Title: title, - PageFrom: pageFrom, - Parent: parent, - Bold: false, - Italic: false, - } - - if arr := d.ArrayEntry("C"); len(arr) == 3 { - col := color.NewSimpleColorForArray(arr) - newBookmark.Color = &col - } - - if f := d.IntEntry("F"); f != nil { - newBookmark.Bold = *f&0x02 > 0 - newBookmark.Italic = *f&0x01 > 0 - } + bm := bookmark(d, title, pageFrom, parent) first := d["First"] if first != nil { indRef := first.(types.IndirectRef) - kids, _ := BookmarksForOutlineItem(ctx, &indRef, &newBookmark) - newBookmark.Kids = kids + kids, _ := BookmarksForOutlineItem(ctx, &indRef, &bm) + bm.Kids = kids } - bms = append(bms, newBookmark) + bms = append(bms, bm) } return bms, nil @@ -277,7 +272,7 @@ func Bookmarks(ctx *model.Context) ([]Bookmark, error) { return nil, err } - _, first, err := positionToFirstBookmark(ctx) + first, err := positionToFirstBookmark(ctx) if err != nil { if err != errNoBookmarks { return nil, err @@ -311,7 +306,7 @@ func BookmarkList(ctx *model.Context) ([]string, error) { return nil, err } - if bms == nil { + if len(bms) == 0 { return []string{"no bookmarks available"}, nil } @@ -323,7 +318,7 @@ func ExportBookmarks(ctx *model.Context, source string) (*BookmarkTree, error) { if err != nil { return nil, err } - if bms == nil { + if len(bms) == 0 { return nil, nil } @@ 
-365,7 +360,7 @@ func bmDict(ctx *model.Context, bm Bookmark, parent types.IndirectRef) (types.Di var o types.Object = *ir - s, err := types.EscapeUTF16String(bm.Title) + s, err := types.EscapedUTF16String(bm.Title) if err != nil { return nil, err } @@ -404,11 +399,11 @@ func createOutlineItemDict(ctx *model.Context, bms []Bookmark, parent *types.Ind for i, bm := range bms { if i == 0 && parentPageNr != nil && bm.PageFrom < *parentPageNr { - return nil, nil, 0, 0, errCorruptedBookmarks + return nil, nil, 0, 0, errInvalidBookmark } if i > 0 && bm.PageFrom < bms[i-1].PageFrom { - return nil, nil, 0, 0, errCorruptedBookmarks + return nil, nil, 0, 0, errInvalidBookmark } total++ @@ -463,11 +458,49 @@ func createOutlineItemDict(ctx *model.Context, bms []Bookmark, parent *types.Ind return first, irPrev, total, visible, nil } +func cleanupDestinations(ctx *model.Context, dNamesEmpty bool) error { + if dNamesEmpty { + delete(ctx.Names, "Dests") + if err := ctx.RemoveNameTree("Dests"); err != nil { + return err + } + } + + if ctx.Dests != nil && len(ctx.Dests) == 0 { + delete(ctx.RootDict, "Dests") + } + + return nil +} + +func removeDest(ctx *model.Context, name string) (bool, bool, error) { + var ( + dNamesEmpty, ok bool + err error + ) + if dNames := ctx.Names["Dests"]; dNames != nil { + // Remove destName from dest nametree. + dNamesEmpty, ok, err = dNames.Remove(ctx.XRefTable, name) + if err != nil { + return false, false, err + } + } + + if !ok { + if ctx.Dests != nil { + // Remove destName from named destinations. 
+ ok = ctx.Dests.Delete(name) != nil + } + } + + return dNamesEmpty, ok, err +} + func removeNamedDests(ctx *model.Context, item *types.IndirectRef) error { var ( - d types.Dict - err error - empty, ok bool + d types.Dict + err error + dNamesEmpty, ok bool ) for ir := item; ir != nil; ir = d.IndirectRefEntry("Next") { @@ -498,9 +531,7 @@ func removeNamedDests(ctx *model.Context, item *types.IndirectRef) error { continue } - // Remove destName from dest nametree. - // TODO also try to remove from any existing root.Dests - empty, ok, err = ctx.Names["Dests"].Remove(ctx.XRefTable, s) + dNamesEmpty, ok, err = removeDest(ctx, s) if err != nil { return err } @@ -519,19 +550,12 @@ func removeNamedDests(ctx *model.Context, item *types.IndirectRef) error { } } - if empty { - delete(ctx.Names, "Dests") - if err := ctx.RemoveNameTree("Dests"); err != nil { - return err - } - } - - return nil + return cleanupDestinations(ctx, dNamesEmpty) } // RemoveBookmarks erases all outlines from ctx. func RemoveBookmarks(ctx *model.Context) (bool, error) { - _, first, err := positionToFirstBookmark(ctx) + first, err := positionToFirstBookmark(ctx) if err != nil { if err != errNoBookmarks { return false, err diff --git a/pkg/pdfcpu/certificate.go b/pkg/pdfcpu/certificate.go new file mode 100644 index 00000000..d5a35ab1 --- /dev/null +++ b/pkg/pdfcpu/certificate.go @@ -0,0 +1,257 @@ +/* +Copyright 2025 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package pdfcpu + +import ( + "bytes" + "crypto/x509" + "encoding/base64" + "encoding/pem" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" + "github.com/hhrutter/pkcs7" + "github.com/pkg/errors" +) + +var ErrUnknownFileType = errors.New("pdfcpu: unsupported file type") + +func loadSingleCertFile(filename string) (*x509.Certificate, error) { + bb, err := os.ReadFile(filename) + if err != nil { + return nil, err + } + + block, _ := pem.Decode(bb) + if block != nil && block.Type == "CERTIFICATE" { + return x509.ParseCertificate(block.Bytes) + } + + // DER + return x509.ParseCertificate(bb) +} + +func loadCertsFromPEM(filename string) ([]*x509.Certificate, error) { + bb, err := os.ReadFile(filename) + if err != nil { + return nil, err + } + + var certs []*x509.Certificate + + for len(bb) > 0 { + var block *pem.Block + block, bb = pem.Decode(bb) + if block == nil { + break + } + if block.Type != "CERTIFICATE" || len(block.Headers) != 0 { + continue + } + cert, err := x509.ParseCertificate(block.Bytes) + if err != nil { + return nil, err + } + certs = append(certs, cert) + } + + return certs, nil +} + +const PKCS7_PREFIX = "-----BEGIN PKCS7-----" +const PKCS7_SUFFIX = "-----END PKCS7-----" + +func isPEMEncoded(s string) bool { + s = strings.TrimRight(s, " \t\r\n") + return strings.HasPrefix(s, PKCS7_PREFIX) && strings.HasSuffix(s, PKCS7_SUFFIX) +} + +func decodePKCS7Block(s string) ([]byte, error) { + start := strings.Index(s, PKCS7_PREFIX) + end := strings.Index(s, PKCS7_SUFFIX) + + if start == -1 || end == -1 || end <= start { + return nil, fmt.Errorf("decodePKCS7Block: PEM block not found") + } + + s = s[start+len(PKCS7_PREFIX) : end] + s = strings.TrimSpace(s) + + return base64.StdEncoding.DecodeString(s) +} + +func loadCertsFromP7C(filename string) ([]*x509.Certificate, error) { + bb, err := os.ReadFile(filename) + if err != nil { + return nil, err + } + + s := string(bb) + if isPEMEncoded(s) { + bb, err = 
decodePKCS7Block(s) + if err != nil { + return nil, err + } + } // else DER (binary) + + p7, err := pkcs7.Parse(bb) + if err != nil { + return nil, err + } + + return p7.Certificates, nil +} + +func LoadCertificates(filename string) ([]*x509.Certificate, error) { + ext := strings.ToLower(filepath.Ext(filename)) + switch ext { + case ".crt", ".cer": + cert, err := loadSingleCertFile(filename) + if err != nil { + return nil, err + } + return []*x509.Certificate{cert}, nil + case ".p7c": + return loadCertsFromP7C(filename) + case ".pem": + return loadCertsFromPEM(filename) + default: + return nil, ErrUnknownFileType + } +} + +func loadCertificatesToCertPool(path string, certPool *x509.CertPool, n *int) error { + certs, err := LoadCertificates(path) + if err != nil { + if err == ErrUnknownFileType { + return nil + } + return err + } + for _, cert := range certs { + certPool.AddCert(cert) + } + *n += len(certs) + return nil +} + +func LoadCertificatesToCertPool(dir string, certPool *x509.CertPool) (int, error) { + n := 0 + err := filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error { + if err != nil { + return err + } + if d.IsDir() { + return nil + } + return loadCertificatesToCertPool(path, certPool, &n) + }) + return n, err +} + +func saveCertsAsPEM(certs []*x509.Certificate, filename string, overwrite bool) (bool, error) { + if len(certs) == 0 { + return false, errors.New("no certificates to save") + } + + if !overwrite { + if _, err := os.Stat(filename); err == nil { + return false, nil + } + } + + file, err := os.Create(filename) + if err != nil { + return false, fmt.Errorf("failed to create file: %w", err) + } + defer file.Close() + + for _, cert := range certs { + block := &pem.Block{ + Type: "CERTIFICATE", + Bytes: cert.Raw, + } + if err := pem.Encode(file, block); err != nil { + return false, err + } + } + + return true, nil +} + +func saveCertsAsP7C(certs []*x509.Certificate, filename string, overwrite bool) (bool, error) { + // TODO 
encodeBase64 bool (PEM) + + if len(certs) == 0 { + return false, errors.New("no certificates to save") + } + + p7, err := pkcs7.NewSignedData(nil) + if err != nil { + return false, err + } + + for _, cert := range certs { + p7.AddCertificate(cert) + } + + bb, err := p7.Finish() + if err != nil { + return false, err + } + + return Write(bytes.NewReader(bb), filename, overwrite) +} + +func ImportCertificate(inFile string, overwrite bool) (int, bool, error) { + certs, err := LoadCertificates(inFile) + if err != nil { + return 0, false, err + } + + // We have validated the incoming cert info. + + enforceP7C := true // takes less disk space + + base := filepath.Base(inFile) + outFileNoExt := base[:len(base)-len(filepath.Ext(base))] + outFile := outFileNoExt + ".p7c" + outFile = filepath.Join(model.CertDir, outFile) + + if enforceP7C { + // Write certs as .p7c to certDir. + ok, err := saveCertsAsP7C(certs, outFile, overwrite) + if err != nil { + return 0, false, err + } + return len(certs), ok, nil + } + + // Copy inFile to certDir (may be .pem or p7c) + ok, err := CopyFile(inFile, outFile, overwrite) + if err != nil { + return 0, false, err + } + return len(certs), ok, nil +} + +func InspectCertificate(cert *x509.Certificate) (string, error) { + return model.CertString(cert), nil +} diff --git a/pkg/pdfcpu/color/color.go b/pkg/pdfcpu/color/color.go index b66ac4d3..0a77ee1b 100644 --- a/pkg/pdfcpu/color/color.go +++ b/pkg/pdfcpu/color/color.go @@ -36,6 +36,7 @@ var ( Red = SimpleColor{1, 0, 0} Green = SimpleColor{0, 1, 0} Blue = SimpleColor{0, 0, 1} + Yellow = SimpleColor{.5, .5, 0} ) var ErrInvalidColor = errors.New("pdfcpu: invalid color constant") diff --git a/pkg/pdfcpu/create/create.go b/pkg/pdfcpu/create/create.go index 2bbcf312..0d258abe 100644 --- a/pkg/pdfcpu/create/create.go +++ b/pkg/pdfcpu/create/create.go @@ -332,7 +332,7 @@ func CreatePage( } for _, la := range p.LinkAnnots { - d, err := la.RenderDict(xRefTable, *pageDictIndRef) + d, err := 
la.RenderDict(xRefTable, pageDictIndRef) if err != nil { return nil, nil, &json.UnsupportedTypeError{} } @@ -382,7 +382,7 @@ func UpdatePage(xRefTable *model.XRefTable, dIndRef types.IndirectRef, d, res ty } for _, la := range p.LinkAnnots { - d, err := la.RenderDict(xRefTable, dIndRef) + d, err := la.RenderDict(xRefTable, &dIndRef) if err != nil { return err } diff --git a/pkg/pdfcpu/createAnnotations.go b/pkg/pdfcpu/createAnnotations.go index 1e9124fb..59fa4797 100644 --- a/pkg/pdfcpu/createAnnotations.go +++ b/pkg/pdfcpu/createAnnotations.go @@ -537,7 +537,7 @@ func createFileAttachmentAnnotation(xRefTable *model.XRefTable, pageIndRef types fn := filepath.Base(fileName) - s, err := types.EscapeUTF16String(fn) + s, err := types.EscapedUTF16String(fn) if err != nil { return nil, err } @@ -583,7 +583,7 @@ func createFileSpecDict(xRefTable *model.XRefTable, fileName string) (types.Dict } fn := filepath.Base(fileName) - s, err := types.EscapeUTF16String(fn) + s, err := types.EscapedUTF16String(fn) if err != nil { return nil, err } @@ -682,9 +682,9 @@ func createMovieAnnotation(xRefTable *model.XRefTable, pageIndRef types.Indirect return xRefTable.IndRefForNewObject(d) } -func createMediaRenditionAction(xRefTable *model.XRefTable, mediaClipDataDict *types.IndirectRef) types.Dict { +func createMediaRenditionAction(mediaClipDataDict *types.IndirectRef) types.Dict { - r := createMediaRendition(xRefTable, mediaClipDataDict) + r := createMediaRendition(mediaClipDataDict) return types.Dict( map[string]types.Object{ @@ -717,7 +717,7 @@ func createScreenAnnotation(xRefTable *model.XRefTable, pageIndRef types.Indirec return nil, err } - mediaRenditionAction := createMediaRenditionAction(xRefTable, ir) + mediaRenditionAction := createMediaRenditionAction(ir) selectorRenditionAction := createSelectorRenditionAction(ir) diff --git a/pkg/pdfcpu/createRenditions.go b/pkg/pdfcpu/createRenditions.go index 76cc3b0e..dcf0c065 100644 --- a/pkg/pdfcpu/createRenditions.go +++ 
b/pkg/pdfcpu/createRenditions.go @@ -267,7 +267,7 @@ func createScreenParamsDict() *types.Dict { return &d1 } -func createMediaRendition(xRefTable *model.XRefTable, mediaClipDataDict *types.IndirectRef) *types.Dict { +func createMediaRendition(mediaClipDataDict *types.IndirectRef) *types.Dict { mhbe := createMHBEDict() diff --git a/pkg/pdfcpu/createTestPDF.go b/pkg/pdfcpu/createTestPDF.go index 700d01a2..617e4140 100644 --- a/pkg/pdfcpu/createTestPDF.go +++ b/pkg/pdfcpu/createTestPDF.go @@ -34,13 +34,25 @@ var ( ) func CreateXRefTableWithRootDict() (*model.XRefTable, error) { + // TODO + //xRefTable := model.NewXRefTable(nil) xRefTable := &model.XRefTable{ - Table: map[int]*model.XRefTableEntry{}, - Names: map[string]*model.Node{}, - PageAnnots: map[int]model.PgAnnots{}, - Stats: model.NewPDFStats(), - URIs: map[int]map[string]string{}, - UsedGIDs: map[string]map[uint16]bool{}, + Table: map[int]*model.XRefTableEntry{}, + Names: map[string]*model.Node{}, + NameRefs: map[string]model.NameMap{}, + KeywordList: types.StringSet{}, + Properties: map[string]string{}, + LinearizationObjs: types.IntSet{}, + PageAnnots: map[int]model.PgAnnots{}, + PageThumbs: map[int]types.IndirectRef{}, + Signatures: map[int]map[int]model.Signature{}, + Stats: model.NewPDFStats(), + ValidationMode: model.ValidationRelaxed, + ValidateLinks: false, + URIs: map[int]map[string]string{}, + UsedGIDs: map[string]map[uint16]bool{}, + FillFonts: map[string]types.IndirectRef{}, + Conf: nil, } xRefTable.Table[0] = model.NewFreeHeadXRefTableEntry() @@ -222,7 +234,7 @@ func CreateResourceDictInheritanceDemoXRef() (*model.XRefTable, error) { return xRefTable, nil } -func createFunctionalShadingDict(xRefTable *model.XRefTable) types.Dict { +func createFunctionalShadingDict() types.Dict { f := types.Dict( map[string]types.Object{ "FunctionType": types.Integer(2), @@ -241,7 +253,7 @@ func createFunctionalShadingDict(xRefTable *model.XRefTable) types.Dict { return d } -func createRadialShadingDict(xRefTable 
*model.XRefTable) types.Dict { +func createRadialShadingDict() types.Dict { f := types.Dict( map[string]types.Object{ "FunctionType": types.Integer(2), @@ -347,9 +359,9 @@ func addResources(xRefTable *model.XRefTable, pageDict types.Dict, fontName stri return err } - functionalBasedShDict := createFunctionalShadingDict(xRefTable) + functionalBasedShDict := createFunctionalShadingDict() - radialShDict := createRadialShadingDict(xRefTable) + radialShDict := createRadialShadingDict() f := types.Dict( map[string]types.Object{ @@ -1151,7 +1163,7 @@ func addThreads(xRefTable *model.XRefTable, rootDict types.Dict, pageIndRef type return nil } -func addOpenAction(xRefTable *model.XRefTable, rootDict types.Dict) error { +func addOpenAction(rootDict types.Dict) error { nextActionDict := types.Dict( map[string]types.Object{ "Type": types.Name("Action"), @@ -1176,7 +1188,7 @@ func addOpenAction(xRefTable *model.XRefTable, rootDict types.Dict) error { return nil } -func addURI(xRefTable *model.XRefTable, rootDict types.Dict) { +func addURI(rootDict types.Dict) { d := types.NewDict() d.InsertString("Base", "http://www.adobe.com") @@ -1214,7 +1226,7 @@ func addSpiderInfo(xRefTable *model.XRefTable, rootDict types.Dict) error { return nil } -func addOCProperties(xRefTable *model.XRefTable, rootDict types.Dict) error { +func addOCProperties(rootDict types.Dict) error { usageAppDict := types.Dict( map[string]types.Object{ "Event": types.Name("View"), @@ -1251,7 +1263,7 @@ func addOCProperties(xRefTable *model.XRefTable, rootDict types.Dict) error { return nil } -func addRequirements(xRefTable *model.XRefTable, rootDict types.Dict) { +func addRequirements(rootDict types.Dict) { d := types.NewDict() d.InsertName("Type", "Requirement") d.InsertName("S", "EnableJavaScripts") @@ -1283,24 +1295,24 @@ func CreateAnnotationDemoXRef() (*model.XRefTable, error) { return nil, err } - err = addOpenAction(xRefTable, rootDict) + err = addOpenAction(rootDict) if err != nil { return nil, err } - 
addURI(xRefTable, rootDict) + addURI(rootDict) err = addSpiderInfo(xRefTable, rootDict) if err != nil { return nil, err } - err = addOCProperties(xRefTable, rootDict) + err = addOCProperties(rootDict) if err != nil { return nil, err } - addRequirements(xRefTable, rootDict) + addRequirements(rootDict) return xRefTable, nil } @@ -1949,7 +1961,7 @@ func CreateContextWithXRefTable(conf *model.Configuration, pageDim *types.Dim) ( return CreateContext(xRefTable, conf), nil } -func createDemoContentStreamDict(xRefTable *model.XRefTable, pageDict types.Dict, b []byte) (*types.IndirectRef, error) { +func createDemoContentStreamDict(xRefTable *model.XRefTable, b []byte) (*types.IndirectRef, error) { sd, _ := xRefTable.NewStreamDictForBuf(b) if err := sd.Encode(); err != nil { return nil, err @@ -1980,7 +1992,7 @@ func createDemoPage(xRefTable *model.XRefTable, parentPageIndRef types.IndirectR pageDict.Insert("Resources", resDict) } - ir, err := createDemoContentStreamDict(xRefTable, pageDict, p.Buf.Bytes()) + ir, err := createDemoContentStreamDict(xRefTable, p.Buf.Bytes()) if err != nil { return nil, err } diff --git a/pkg/pdfcpu/crypto.go b/pkg/pdfcpu/crypto.go index 44c1e9ee..7b3890ac 100644 --- a/pkg/pdfcpu/crypto.go +++ b/pkg/pdfcpu/crypto.go @@ -26,10 +26,12 @@ import ( "crypto/rand" "crypto/rc4" "crypto/sha256" + "crypto/sha512" "encoding/binary" "encoding/hex" "fmt" "io" + "math/big" "strconv" "time" @@ -37,6 +39,9 @@ import ( "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" "github.com/pkg/errors" + + "golang.org/x/text/secure/precis" + "golang.org/x/text/unicode/norm" ) var ( @@ -99,6 +104,7 @@ var ( model.IMPORTBOOKMARKS: {0, 1}, model.EXPORTBOOKMARKS: {0, 1}, model.LISTIMAGES: {0, 1}, + model.UPDATEIMAGES: {0, 1}, model.CREATE: {0, 0}, model.DUMP: {0, 1}, model.LISTFORMFIELDS: {0, 0}, @@ -117,14 +123,14 @@ var ( model.LISTVIEWERPREFERENCES: {0, 1}, model.SETVIEWERPREFERENCES: {0, 1}, model.RESETVIEWERPREFERENCES: {0, 
1}, + model.ZOOM: {0, 1}, } - ErrUnknownEncryption = errors.New("pdfcpu: PDF 2.0 encryption not supported") + ErrUnknownEncryption = errors.New("pdfcpu: unknown encryption") ) // NewEncryptDict creates a new EncryptDict using the standard security handler. -func newEncryptDict(needAES bool, keyLength int, permissions int16) types.Dict { - +func newEncryptDict(v model.Version, needAES bool, keyLength int, permissions int16) types.Dict { d := types.NewDict() d.Insert("Filter", types.Name("Standard")) @@ -135,8 +141,11 @@ func newEncryptDict(needAES bool, keyLength int, permissions int16) types.Dict { if keyLength == 256 { i = 5 } - d.Insert("R", types.Integer(i)) d.Insert("V", types.Integer(i)) + if v == model.V20 { + i++ + } + d.Insert("R", types.Integer(i)) } else { d.Insert("R", types.Integer(2)) d.Insert("V", types.Integer(1)) @@ -183,7 +192,6 @@ func newEncryptDict(needAES bool, keyLength int, permissions int16) types.Dict { } func encKey(userpw string, e *model.Enc) (key []byte) { - // 2a pw := []byte(userpw) if len(pw) >= 32 { @@ -235,11 +243,14 @@ func encKey(userpw string, e *model.Enc) (key []byte) { // validateUserPassword validates the user password aka document open password. func validateUserPassword(ctx *model.Context) (ok bool, err error) { - if ctx.E.R == 5 { return validateUserPasswordAES256(ctx) } + if ctx.E.R == 6 { + return validateUserPasswordAES256Rev6(ctx) + } + // Alg.4/5 p63 // 4a/5a create encryption key using Alg.2 p61 @@ -263,7 +274,6 @@ func validateUserPassword(ctx *model.Context) (ok bool, err error) { } func key(ownerpw, userpw string, r, l int) (key []byte) { - // 3a pw := []byte(ownerpw) if len(pw) == 0 { @@ -301,7 +311,6 @@ func key(ownerpw, userpw string, r, l int) (key []byte) { // O calculates the owner password digest. func o(ctx *model.Context) ([]byte, error) { - ownerpw := ctx.OwnerPW userpw := ctx.UserPW @@ -348,10 +357,8 @@ func o(ctx *model.Context) ([]byte, error) { // U calculates the user password digest. 
func u(ctx *model.Context) (u []byte, key []byte, err error) { - - // The PW string is generated from OS codepage characters by first converting the string to - // PDFDocEncoding. If input is Unicode, first convert to a codepage encoding , and then to - // PDFDocEncoding for backward compatibility. + // The PW string is generated from OS codepage characters by first converting the string to PDFDocEncoding. + // If input is Unicode, first convert to a codepage encoding , and then to PDFDocEncoding for backward compatibility. userpw := ctx.UserPW //fmt.Printf("U userpw=ctx.UserPW=%s\n", userpw) @@ -414,21 +421,41 @@ func validationSalt(bb []byte) []byte { } func keySalt(bb []byte) []byte { - return bb[40:] + return bb[40:48] } -func validateOwnerPasswordAES256(ctx *model.Context) (ok bool, err error) { +func decryptOE(ctx *model.Context, opw []byte) error { + b := append(opw, keySalt(ctx.E.O)...) + b = append(b, ctx.E.U...) + key := sha256.Sum256(b) + + cb, err := aes.NewCipher(key[:]) + if err != nil { + return err + } + + iv := make([]byte, 16) + ctx.EncKey = make([]byte, 32) + + mode := cipher.NewCBCDecrypter(cb, iv) + mode.CryptBlocks(ctx.EncKey, ctx.E.OE) + return nil +} + +func validateOwnerPasswordAES256(ctx *model.Context) (ok bool, err error) { if len(ctx.OwnerPW) == 0 { return false, nil } - // TODO Process PW with SASLPrep profile (RFC 4013) of stringprep (RFC 3454). - opw := []byte(ctx.OwnerPW) + opw, err := processInput(ctx.OwnerPW) + if err != nil { + return false, err + } + if len(opw) > 127 { opw = opw[:127] } - //fmt.Printf("opw <%s> isValidUTF8String: %t\n", opw, utf8.Valid(opw)) // Algorithm 3.2a 3. b := append(opw, validationSalt(ctx.E.O)...) @@ -439,32 +466,39 @@ func validateOwnerPasswordAES256(ctx *model.Context) (ok bool, err error) { return false, nil } - b = append(opw, keySalt(ctx.E.O)...) - b = append(b, ctx.E.U...) 
- key := sha256.Sum256(b) + if err := decryptOE(ctx, opw); err != nil { + return false, err + } + + return true, nil +} + +func decryptUE(ctx *model.Context, upw []byte) error { + key := sha256.Sum256(append(upw, keySalt(ctx.E.U)...)) cb, err := aes.NewCipher(key[:]) if err != nil { - return false, err + return err } iv := make([]byte, 16) ctx.EncKey = make([]byte, 32) mode := cipher.NewCBCDecrypter(cb, iv) - mode.CryptBlocks(ctx.EncKey, ctx.E.OE) + mode.CryptBlocks(ctx.EncKey, ctx.E.UE) - return true, nil + return nil } func validateUserPasswordAES256(ctx *model.Context) (ok bool, err error) { + upw, err := processInput(ctx.UserPW) + if err != nil { + return false, err + } - // TODO Process PW with SASLPrep profile (RFC 4013) of stringprep (RFC 3454). - upw := []byte(ctx.UserPW) if len(upw) > 127 { upw = upw[:127] } - //fmt.Printf("upw <%s> isValidUTF8String: %t\n", upw, utf8.Valid(upw)) // Algorithm 3.2a 4, s := sha256.Sum256(append(upw, validationSalt(ctx.E.U)...)) @@ -473,7 +507,112 @@ func validateUserPasswordAES256(ctx *model.Context) (ok bool, err error) { return false, nil } - key := sha256.Sum256(append(upw, keySalt(ctx.E.U)...)) + if err := decryptUE(ctx, upw); err != nil { + return false, err + } + + return true, nil +} + +func processInput(input string) ([]byte, error) { + // Create a new Precis profile for SASLprep + p := precis.NewIdentifier( + precis.BidiRule, + precis.Norm(norm.NFKC), + ) + + output, err := p.String(input) + if err != nil { + return nil, err + } + + return []byte(output), nil +} + +func hashRev6(input, pw, U []byte) ([]byte, int, error) { + // 7.6.4.3.4 Algorithm 2.B returns 32 bytes. + + mod3 := new(big.Int).SetUint64(3) + + k0 := sha256.Sum256(input) + k := k0[:] + + var e []byte + j := 0 + + for ; j < 64 || e[len(e)-1] > byte(j-32); j++ { + var k1 []byte + bb := append(pw, k...) + if len(U) > 0 { + bb = append(bb, U...) + } + for i := 0; i < 64; i++ { + k1 = append(k1, bb...) 
+ } + + cb, err := aes.NewCipher(k[:16]) + if err != nil { + return nil, -1, err + } + + iv := k[16:32] + e = make([]byte, len(k1)) + mode := cipher.NewCBCEncrypter(cb, iv) + mode.CryptBlocks(e, k1) + + num := new(big.Int).SetBytes(e[:16]) + r := (new(big.Int).Mod(num, mod3)).Uint64() + + switch r { + case 0: + k0 := sha256.Sum256(e) + k = k0[:] + case 1: + k0 := sha512.Sum384(e) + k = k0[:] + case 2: + k0 := sha512.Sum512(e) + k = k0[:] + } + + } + + return k[:32], j, nil +} + +func validateOwnerPasswordAES256Rev6(ctx *model.Context) (ok bool, err error) { + if len(ctx.OwnerPW) == 0 { + return false, nil + } + + // Process PW with SASLPrep profile (RFC 4013) of stringprep (RFC 3454). + opw, err := processInput(ctx.OwnerPW) + if err != nil { + return false, err + } + + if len(opw) > 127 { + opw = opw[:127] + } + + // Algorithm 12 + bb := append(opw, validationSalt(ctx.E.O)...) + bb = append(bb, ctx.E.U...) + s, _, err := hashRev6(bb, opw, ctx.E.U) + if err != nil { + return false, err + } + + if !bytes.HasPrefix(ctx.E.O, s[:]) { + return false, nil + } + + bb = append(opw, keySalt(ctx.E.O)...) + bb = append(bb, ctx.E.U...) + key, _, err := hashRev6(bb, opw, ctx.E.U) + if err != nil { + return false, err + } cb, err := aes.NewCipher(key[:]) if err != nil { @@ -484,23 +623,67 @@ func validateUserPasswordAES256(ctx *model.Context) (ok bool, err error) { ctx.EncKey = make([]byte, 32) mode := cipher.NewCBCDecrypter(cb, iv) - mode.CryptBlocks(ctx.EncKey, ctx.E.UE) + mode.CryptBlocks(ctx.EncKey, ctx.E.OE) + + return true, nil +} + +func validateUserPasswordAES256Rev6(ctx *model.Context) (bool, error) { + if len(ctx.E.UE) != 32 { + return false, errors.New("UE: invalid length") + } + + upw, err := processInput(ctx.UserPW) + if err != nil { + return false, err + } + if len(upw) > 127 { + upw = upw[:127] + } + + // Validate U prefix + bb := append([]byte{}, upw...) + bb = append(bb, validationSalt(ctx.E.U)...) 
+ s, _, err := hashRev6(bb, upw, nil) + if err != nil { + return false, err + } + if !bytes.HasPrefix(ctx.E.U, s) { + return false, nil + } + + // Derive decryption key + bb = append([]byte{}, upw...) + bb = append(bb, keySalt(ctx.E.U)...) + key, _, err := hashRev6(bb, upw, nil) + if err != nil { + return false, err + } + + block, err := aes.NewCipher(key) + if err != nil { + return false, err + } + + iv := make([]byte, 16) + encKey := make([]byte, 32) + cipher.NewCBCDecrypter(block, iv).CryptBlocks(encKey, ctx.E.UE) + ctx.EncKey = encKey return true, nil } // ValidateOwnerPassword validates the owner password aka change permissions password. func validateOwnerPassword(ctx *model.Context) (ok bool, err error) { - e := ctx.E if e.R == 5 { return validateOwnerPasswordAES256(ctx) } - // The PW string is generated from OS codepage characters by first converting the string to - // PDFDocEncoding. If input is Unicode, first convert to a codepage encoding , and then to - // PDFDocEncoding for backward compatibility. + if e.R == 6 { + return validateOwnerPasswordAES256Rev6(ctx) + } ownerpw := ctx.OwnerPW userpw := ctx.UserPW @@ -554,29 +737,67 @@ func validateOwnerPassword(ctx *model.Context) (ok bool, err error) { return ok, err } -// SupportedCFEntry returns true if all entries found are supported. -func supportedCFEntry(d types.Dict) (bool, error) { +func validateCFLength(len int, cfm *string) bool { + // See table 25 Length + + if cfm != nil { + if (*cfm == "AESV2" && len != 16) || (*cfm == "AESV3" && len != 32) { + return false + } + } + + // Standard security handler expresses in bytes. + minBytes, maxBytes := 5, 32 + if len < minBytes { + return false + } + if len <= maxBytes { + return true + } + + // Public security handler expresses in bits. 
+ minBits, maxBits := 40, 256 + if len < minBits || len > maxBits { + return false + } + + if len%8 > 0 { + return false + } + + return true +} +func supportedCFEntry(d types.Dict) (bool, error) { cfm := d.NameEntry("CFM") if cfm != nil && *cfm != "V2" && *cfm != "AESV2" && *cfm != "AESV3" { return false, errors.New("pdfcpu: supportedCFEntry: invalid entry \"CFM\"") } + aes := cfm != nil && (*cfm == "AESV2" || *cfm == "AESV3") + ae := d.NameEntry("AuthEvent") if ae != nil && *ae != "DocOpen" { - return false, errors.New("pdfcpu: supportedCFEntry: invalid entry \"AuthEvent\"") + return aes, errors.New("pdfcpu: supportedCFEntry: invalid entry \"AuthEvent\"") } - l := d.IntEntry("Length") - if l != nil && (*l < 5 || *l > 16) && *l != 32 && *l != 256 { - return false, errors.New("pdfcpu: supportedCFEntry: invalid entry \"Length\"") + len := d.IntEntry("Length") + if len == nil { + return aes, nil } - return cfm != nil && (*cfm == "AESV2" || *cfm == "AESV3"), nil + if !validateCFLength(*len, cfm) { + s := "" + if cfm != nil { + s = *cfm + } + return false, errors.Errorf("pdfcpu: supportedCFEntry: invalid entry \"Length\" %d %s", *len, s) + } + + return aes, nil } func perms(p int) (list []string) { - list = append(list, fmt.Sprintf("permission bits: %012b (x%03X)", uint32(p)&0x0F3C, uint32(p)&0x0F3C)) list = append(list, fmt.Sprintf("Bit 3: %t (print(rev2), print quality(rev>=3))", p&0x0004 > 0)) list = append(list, fmt.Sprintf("Bit 4: %t (modify other than controlled by bits 6,9,11)", p&0x0008 > 0)) @@ -586,13 +807,11 @@ func perms(p int) (list []string) { list = append(list, fmt.Sprintf("Bit 10: %t (extract(rev>=3))", p&0x0200 > 0)) list = append(list, fmt.Sprintf("Bit 11: %t (modify(rev>=3))", p&0x0400 > 0)) list = append(list, fmt.Sprintf("Bit 12: %t (print high-level(rev>=3))", p&0x0800 > 0)) - return list } // PermissionsList returns a list of set permissions. 
func PermissionsList(p int) (list []string) { - if p == 0 { return append(list, "Full access") } @@ -602,7 +821,6 @@ func PermissionsList(p int) (list []string) { // Permissions returns a list of set permissions. func Permissions(ctx *model.Context) (list []string) { - p := 0 if ctx.E != nil { p = ctx.E.P @@ -612,10 +830,9 @@ func Permissions(ctx *model.Context) (list []string) { } func validatePermissions(ctx *model.Context) (bool, error) { - // Algorithm 3.2a 5. - if ctx.E.R != 5 { + if ctx.E.R != 5 && ctx.E.R != 6 { return true, nil } @@ -635,10 +852,9 @@ func validatePermissions(ctx *model.Context) (bool, error) { } func writePermissions(ctx *model.Context, d types.Dict) error { - // Algorithm 3.10 - if ctx.E.R != 5 { + if ctx.E.R != 5 && ctx.E.R != 6 { return nil } @@ -682,7 +898,6 @@ func logP(enc *model.Enc) { } func maskExtract(mode model.CommandMode, secHandlerRev int) int { - p, ok := perm[mode] // no permissions defined or don't need extract permission @@ -700,7 +915,6 @@ func maskExtract(mode model.CommandMode, secHandlerRev int) int { } func maskModify(mode model.CommandMode, secHandlerRev int) int { - p, ok := perm[mode] // no permissions defined or don't need modify permission @@ -719,7 +933,6 @@ func maskModify(mode model.CommandMode, secHandlerRev int) int { // HasNeededPermissions returns true if permissions for pdfcpu processing are present. 
func hasNeededPermissions(mode model.CommandMode, enc *model.Enc) bool { - // see 7.6.3.2 logP(enc) @@ -741,18 +954,25 @@ func hasNeededPermissions(mode model.CommandMode, enc *model.Enc) bool { return true } -func getV(d types.Dict) (*int, error) { - +func getV(ctx *model.Context, d types.Dict, l int) (*int, error) { v := d.IntEntry("V") if v == nil || (*v != 1 && *v != 2 && *v != 4 && *v != 5) { return nil, errors.Errorf("getV: \"V\" must be one of 1,2,4,5") } + if *v == 5 { + if l != 256 { + return nil, errors.Errorf("getV: \"V\" 5 invalid length, must be 256, got %d", l) + } + if ctx.XRefTable.Version() != model.V20 && ctx.XRefTable.ValidationMode == model.ValidationStrict { + return nil, errors.New("getV: 5 valid for PDF 2.0 only") + } + } + return v, nil } func checkStmf(ctx *model.Context, stmf *string, cfDict types.Dict) error { - if stmf != nil && *stmf != "Identity" { d := cfDict.DictEntry(*stmf) @@ -770,9 +990,8 @@ func checkStmf(ctx *model.Context, stmf *string, cfDict types.Dict) error { return nil } -func checkV(ctx *model.Context, d types.Dict) (*int, error) { - - v, err := getV(d) +func checkV(ctx *model.Context, d types.Dict, l int) (*int, error) { + v, err := getV(ctx, d, l) if err != nil { return nil, err } @@ -827,7 +1046,6 @@ func checkV(ctx *model.Context, d types.Dict) (*int, error) { } func length(d types.Dict) (int, error) { - l := d.IntEntry("Length") if l == nil { return 40, nil @@ -840,23 +1058,27 @@ func length(d types.Dict) (int, error) { return *l, nil } -func getR(d types.Dict) (int, error) { +func getR(ctx *model.Context, d types.Dict) (int, error) { + maxR := 5 + if ctx.XRefTable.Version() == model.V20 || ctx.XRefTable.ValidationMode == model.ValidationRelaxed { + maxR = 6 + } r := d.IntEntry("R") - if r == nil || *r < 2 || *r > 5 { - if r != nil && *r > 5 { - return 0, ErrUnknownEncryption - } - return 0, errors.New("pdfcpu: encryption: \"R\" must be 2,3,4,5") + if r == nil || *r < 2 || *r > maxR { + return 0, ErrUnknownEncryption 
} return *r, nil } func validateAlgorithm(ctx *model.Context) (ok bool) { - k := ctx.EncryptKeyLength + if ctx.XRefTable.Version() == model.V20 { + return ctx.EncryptUsingAES && k == 256 + } + if ctx.EncryptUsingAES { return k == 40 || k == 128 || k == 256 } @@ -865,76 +1087,86 @@ func validateAlgorithm(ctx *model.Context) (ok bool) { } func validateAES256Parameters(d types.Dict) (oe, ue, perms []byte, err error) { + // OE + oe, err = d.StringEntryBytes("OE") + if err != nil { + return nil, nil, nil, err + } + if len(oe) != 32 { + return nil, nil, nil, errors.New("pdfcpu: encryption dictionary: 'OE' entry missing or not 32 bytes") + } - for { - - // OE - oe, err = d.StringEntryBytes("OE") - if err != nil { - break - } - if oe == nil || len(oe) != 32 { - err = errors.New("pdfcpu: unsupported encryption: required entry \"OE\" missing or invalid") - break - } - - // UE - ue, err = d.StringEntryBytes("UE") - if err != nil { - break - } - if ue == nil || len(ue) != 32 { - err = errors.New("pdfcpu: unsupported encryption: required entry \"UE\" missing or invalid") - break - } - - // Perms - perms, err = d.StringEntryBytes("Perms") - if err != nil { - break - } - if perms == nil || len(perms) != 16 { - err = errors.New("pdfcpu: unsupported encryption: required entry \"Perms\" missing or invalid") - } + // UE + ue, err = d.StringEntryBytes("UE") + if err != nil { + return nil, nil, nil, err + } + if len(ue) != 32 { + return nil, nil, nil, errors.New("pdfcpu: encryption dictionary: 'UE' entry missing or not 32 bytes") + } - break + // Perms + perms, err = d.StringEntryBytes("Perms") + if err != nil { + return nil, nil, nil, err + } + if len(perms) != 16 { + return nil, nil, nil, errors.New("pdfcpu: encryption dictionary: 'Perms' entry missing or not 16 bytes") } - return oe, ue, perms, err + return oe, ue, perms, nil } -func validateOAndU(d types.Dict) (o, u []byte, err error) { - - for { +func validateOAndU(ctx *model.Context, d types.Dict, r int) (o, u []byte, err error) 
{ + // O, 32 bytes long if the value of R is 4 or less and 48 bytes long if the value of R is 6. + o, err = d.StringEntryBytes("O") + if err != nil { + return nil, nil, err + } - // O - o, err = d.StringEntryBytes("O") - if err != nil { - break + if ctx.XRefTable.ValidationMode == model.ValidationStrict { + if r == 6 && len(o) < 48 { + return nil, nil, errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"O\"") } - if o == nil || len(o) != 32 && len(o) != 48 { - err = errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"O\"") - break + if r <= 4 && len(o) < 32 { + return nil, nil, errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"O\"") } + } - // U - u, err = d.StringEntryBytes("U") - if err != nil { - break + // if l := len(o); l != 32 && l != 48 { + // if ctx.XRefTable.ValidationMode == model.ValidationStrict || l < 48 { + // return nil, nil, errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"O\"") + // } + // o = o[:48] // len(o) > 48, truncate + // } + + // U, 32 bytes long if the value of R is 4 or less and 48 bytes long if the value of R is 6. 
+ u, err = d.StringEntryBytes("U") + if err != nil { + return nil, nil, err + } + + if ctx.XRefTable.ValidationMode == model.ValidationStrict { + if r == 6 && len(u) < 48 { + return nil, nil, errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"U\"") } - if u == nil || len(u) != 32 && len(u) != 48 { - err = errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"U\"") + if r <= 4 && len(u) < 32 { + return nil, nil, errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"U\"") } - - break } - return o, u, err + // if l := len(u); l != 32 && l != 48 { + // if ctx.XRefTable.ValidationMode == model.ValidationStrict || l < 48 { // Fix 1163 + // return nil, nil, errors.New("pdfcpu: unsupported encryption: missing or invalid required entry \"U\"") + // } + // u = u[:48] + // } + + return o, u, nil } // SupportedEncryption returns a pointer to a struct encapsulating used encryption. func supportedEncryption(ctx *model.Context, d types.Dict) (*model.Enc, error) { - // Filter filter := d.NameEntry("Filter") if filter == nil || *filter != "Standard" { @@ -946,31 +1178,31 @@ func supportedEncryption(ctx *model.Context, d types.Dict) (*model.Enc, error) { return nil, errors.New("pdfcpu: unsupported encryption: \"SubFilter\" not supported") } - // V - v, err := checkV(ctx, d) + // Length + l, err := length(d) if err != nil { return nil, err } - // Length - l, err := length(d) + // V + v, err := checkV(ctx, d, l) if err != nil { return nil, err } // R - r, err := getR(d) + r, err := getR(ctx, d) if err != nil { return nil, err } - o, u, err := validateOAndU(d) + o, u, err := validateOAndU(ctx, d, r) if err != nil { return nil, err } var oe, ue, perms []byte - if r == 5 { + if r == 5 || r == 6 { oe, ue, perms, err = validateAES256Parameters(d) if err != nil { return nil, err @@ -1005,7 +1237,6 @@ func supportedEncryption(ctx *model.Context, d types.Dict) (*model.Enc, error) { } func decryptKey(objNumber, 
generation int, key []byte, aes bool) []byte { - m := md5.New() nr := uint32(objNumber) @@ -1034,69 +1265,31 @@ func decryptKey(objNumber, generation int, key []byte, aes bool) []byte { // EncryptBytes encrypts s using RC4 or AES. func encryptBytes(b []byte, objNr, genNr int, encKey []byte, needAES bool, r int) ([]byte, error) { - if needAES { k := encKey - if r != 5 { + if r != 5 && r != 6 { k = decryptKey(objNr, genNr, encKey, needAES) } - bb, err := encryptAESBytes(b, k) - if err != nil { - return nil, err - } - return bb, nil + return encryptAESBytes(b, k) } return applyRC4CipherBytes(b, objNr, genNr, encKey, needAES) } -// EncryptString encrypts s using RC4 or AES. -func encryptString(s string, objNr, genNr int, key []byte, needAES bool, r int) (*string, error) { - - b, err := encryptBytes([]byte(s), objNr, genNr, key, needAES, r) - if err != nil { - return nil, err - } - - s1, err := types.Escape(string(b)) - if err != nil { - return nil, err - } - - return s1, err -} - // decryptBytes decrypts bb using RC4 or AES. func decryptBytes(b []byte, objNr, genNr int, encKey []byte, needAES bool, r int) ([]byte, error) { - if needAES { k := encKey - if r != 5 { + if r != 5 && r != 6 { k = decryptKey(objNr, genNr, encKey, needAES) } - bb, err := decryptAESBytes(b, k) - if err != nil { - return nil, err - } - return bb, nil + return decryptAESBytes(b, k) } return applyRC4CipherBytes(b, objNr, genNr, encKey, needAES) } -// decryptString decrypts s using RC4 or AES. 
-func decryptString(s string, objNr, genNr int, key []byte, needAES bool, r int) ([]byte, error) { - - bb, err := types.Unescape(s) - if err != nil { - return nil, err - } - - return decryptBytes(bb, objNr, genNr, key, needAES, r) -} - func applyRC4CipherBytes(b []byte, objNr, genNr int, key []byte, needAES bool) ([]byte, error) { - c, err := rc4.NewCipher(decryptKey(objNr, genNr, key, needAES)) if err != nil { return nil, err @@ -1108,14 +1301,13 @@ func applyRC4CipherBytes(b []byte, objNr, genNr int, key []byte, needAES bool) ( } func encrypt(m map[string]types.Object, k string, v types.Object, objNr, genNr int, key []byte, needAES bool, r int) error { - s, err := encryptDeepObject(v, objNr, genNr, key, needAES, r) if err != nil { return err } if s != nil { - m[k] = *s + m[k] = s } return nil @@ -1128,7 +1320,7 @@ func encryptDict(d types.Dict, objNr, genNr int, key []byte, needAES bool, r int ft = d["Type"] } if ft != nil { - if ftv, ok := ft.(types.Name); ok && ftv == "Sig" { + if ftv, ok := ft.(types.Name); ok && (ftv == "Sig" || ftv == "DocTimeStamp") { isSig = true } } @@ -1145,9 +1337,88 @@ func encryptDict(d types.Dict, objNr, genNr int, key []byte, needAES bool, r int return nil } -// EncryptDeepObject recurses over non trivial PDF objects and encrypts all strings encountered. 
-func encryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES bool, r int) (*types.HexLiteral, error) { +func encryptStringLiteral(sl types.StringLiteral, objNr, genNr int, key []byte, needAES bool, r int) (*types.StringLiteral, error) { + bb, err := types.Unescape(sl.Value()) + if err != nil { + return nil, err + } + + bb, err = encryptBytes(bb, objNr, genNr, key, needAES, r) + if err != nil { + return nil, err + } + + s, err := types.Escape(string(bb)) + if err != nil { + return nil, err + } + + sl = types.StringLiteral(*s) + + return &sl, nil +} + +func decryptStringLiteral(sl types.StringLiteral, objNr, genNr int, key []byte, needAES bool, r int) (*types.StringLiteral, error) { + if sl.Value() == "" { + return &sl, nil + } + bb, err := types.Unescape(sl.Value()) + if err != nil { + return nil, err + } + + bb, err = decryptBytes(bb, objNr, genNr, key, needAES, r) + if err != nil { + return nil, err + } + + s, err := types.Escape(string(bb)) + if err != nil { + return nil, err + } + + sl = types.StringLiteral(*s) + + return &sl, nil +} + +func encryptHexLiteral(hl types.HexLiteral, objNr, genNr int, key []byte, needAES bool, r int) (*types.HexLiteral, error) { + bb, err := hl.Bytes() + if err != nil { + return nil, err + } + + bb, err = encryptBytes(bb, objNr, genNr, key, needAES, r) + if err != nil { + return nil, err + } + + hl = types.NewHexLiteral(bb) + + return &hl, nil +} + +func decryptHexLiteral(hl types.HexLiteral, objNr, genNr int, key []byte, needAES bool, r int) (*types.HexLiteral, error) { + if hl.Value() == "" { + return &hl, nil + } + bb, err := hl.Bytes() + if err != nil { + return nil, err + } + + bb, err = decryptBytes(bb, objNr, genNr, key, needAES, r) + if err != nil { + return nil, err + } + + hl = types.NewHexLiteral(bb) + return &hl, nil +} + +// EncryptDeepObject recurses over non trivial PDF objects and encrypts all strings encountered. 
+func encryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES bool, r int) (types.Object, error) { _, ok := objIn.(types.IndirectRef) if ok { return nil, nil @@ -1174,26 +1445,23 @@ func encryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES return nil, err } if s != nil { - obj[i] = *s + obj[i] = s } } case types.StringLiteral: - s := obj.Value() - b, err := encryptBytes([]byte(s), objNr, genNr, key, needAES, r) + sl, err := encryptStringLiteral(obj, objNr, genNr, key, needAES, r) if err != nil { return nil, err } - hl := types.NewHexLiteral(b) - return &hl, nil + return *sl, nil case types.HexLiteral: - bb, err := encryptHexLiteral(obj, objNr, genNr, key, needAES, r) + hl, err := encryptHexLiteral(obj, objNr, genNr, key, needAES, r) if err != nil { return nil, err } - hl := types.NewHexLiteral(bb) - return &hl, nil + return *hl, nil default: @@ -1209,7 +1477,7 @@ func decryptDict(d types.Dict, objNr, genNr int, key []byte, needAES bool, r int ft = d["Type"] } if ft != nil { - if ftv, ok := ft.(types.Name); ok && ftv == "Sig" { + if ftv, ok := ft.(types.Name); ok && (ftv == "Sig" || ftv == "DocTimeStamp") { isSig = true } } @@ -1222,14 +1490,13 @@ func decryptDict(d types.Dict, objNr, genNr int, key []byte, needAES bool, r int return err } if s != nil { - d[k] = *s + d[k] = s } } return nil } -func decryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES bool, r int) (*types.HexLiteral, error) { - +func decryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES bool, r int) (types.Object, error) { _, ok := objIn.(types.IndirectRef) if ok { return nil, nil @@ -1249,25 +1516,23 @@ func decryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES return nil, err } if s != nil { - obj[i] = *s + obj[i] = s } } case types.StringLiteral: - bb, err := decryptString(obj.Value(), objNr, genNr, key, needAES, r) + sl, err := decryptStringLiteral(obj, objNr, genNr, key, needAES, r) if 
err != nil { return nil, err } - hl := types.NewHexLiteral(bb) - return &hl, nil + return *sl, nil case types.HexLiteral: - bb, err := decryptHexLiteral(obj, objNr, genNr, key, needAES, r) + hl, err := decryptHexLiteral(obj, objNr, genNr, key, needAES, r) if err != nil { return nil, err } - hl := types.NewHexLiteral(bb) - return &hl, nil + return *hl, nil default: @@ -1278,9 +1543,8 @@ func decryptDeepObject(objIn types.Object, objNr, genNr int, key []byte, needAES // EncryptStream encrypts a stream buffer using RC4 or AES. func encryptStream(buf []byte, objNr, genNr int, encKey []byte, needAES bool, r int) ([]byte, error) { - k := encKey - if r != 5 { + if r != 5 && r != 6 { k = decryptKey(objNr, genNr, encKey, needAES) } @@ -1293,9 +1557,8 @@ func encryptStream(buf []byte, objNr, genNr int, encKey []byte, needAES bool, r // decryptStream decrypts a stream buffer using RC4 or AES. func decryptStream(buf []byte, objNr, genNr int, encKey []byte, needAES bool, r int) ([]byte, error) { - k := encKey - if r != 5 { + if r != 5 && r != 6 { k = decryptKey(objNr, genNr, encKey, needAES) } @@ -1307,7 +1570,6 @@ func decryptStream(buf []byte, objNr, genNr int, encKey []byte, needAES bool, r } func applyRC4Bytes(buf, key []byte) ([]byte, error) { - c, err := rc4.NewCipher(key) if err != nil { return nil, err @@ -1326,7 +1588,6 @@ func applyRC4Bytes(buf, key []byte) ([]byte, error) { } func encryptAESBytes(b, key []byte) ([]byte, error) { - // pad b to aes.Blocksize l := len(b) % aes.BlockSize c := 0x10 @@ -1363,7 +1624,6 @@ func encryptAESBytes(b, key []byte) ([]byte, error) { } func decryptAESBytes(b, key []byte) ([]byte, error) { - if len(b) < aes.BlockSize { return nil, errors.New("pdfcpu: decryptAESBytes: Ciphertext too short") } @@ -1395,7 +1655,6 @@ func decryptAESBytes(b, key []byte) ([]byte, error) { } func fileID(ctx *model.Context) (types.HexLiteral, error) { - // see also 14.4 File Identifiers. 
// The calculation of the file identifier need not be reproducible; @@ -1415,7 +1674,7 @@ func fileID(ctx *model.Context) (types.HexLiteral, error) { h.Write([]byte(strconv.Itoa(ctx.Read.ReadFileSize()))) // All values of the info dict which is assumed to be there at this point. - if ctx.Version() < model.V20 { + if ctx.XRefTable.Version() < model.V20 { d, err := ctx.DereferenceDict(*ctx.Info) if err != nil { return "", err @@ -1434,87 +1693,77 @@ func fileID(ctx *model.Context) (types.HexLiteral, error) { return types.HexLiteral(hex.EncodeToString(m)), nil } -func encryptHexLiteral(hl types.HexLiteral, objNr, genNr int, key []byte, needAES bool, r int) ([]byte, error) { - - bb, err := hl.Bytes() - if err != nil { - return nil, err - } - - return encryptBytes(bb, objNr, genNr, key, needAES, r) +func calcFileEncKey(ctx *model.Context) error { + ctx.EncKey = make([]byte, 32) + _, err := io.ReadFull(rand.Reader, ctx.EncKey) + return err } -func decryptHexLiteral(hl types.HexLiteral, objNr, genNr int, key []byte, needAES bool, r int) ([]byte, error) { - - bb, err := hl.Bytes() +func calcOAndUAES256(ctx *model.Context, d types.Dict) (err error) { + b := make([]byte, 16) + _, err = io.ReadFull(rand.Reader, b) if err != nil { - return nil, err + return err } - return decryptBytes(bb, objNr, genNr, key, needAES, r) -} + u := append(make([]byte, 32), b...) + upw := []byte(ctx.UserPW) + h := sha256.Sum256(append(upw, validationSalt(u)...)) -func calcFileEncKeyFromUE(ctx *model.Context) (k []byte, err error) { + ctx.E.U = append(h[:], b...) 
+ d.Update("U", types.HexLiteral(hex.EncodeToString(ctx.E.U))) - upw := []byte(ctx.OwnerPW) - key := sha256.Sum256(append(upw, keySalt(ctx.E.U)...)) + /////////////////////////////////// - cb, err := aes.NewCipher(key[:]) + b = make([]byte, 16) + _, err = io.ReadFull(rand.Reader, b) if err != nil { - return nil, err + return err } - iv := make([]byte, 16) - k = make([]byte, 32) - - mode := cipher.NewCBCDecrypter(cb, iv) - mode.CryptBlocks(k, ctx.E.UE) - - return k, nil -} - -// func calcFileEncKeyFromOE(ctx *model.Context) (k []byte, err error) { + o := append(make([]byte, 32), b...) + opw := []byte(ctx.OwnerPW) + c := append(opw, validationSalt(o)...) + h = sha256.Sum256(append(c, ctx.E.U...)) + ctx.E.O = append(h[:], b...) + d.Update("O", types.HexLiteral(hex.EncodeToString(ctx.E.O))) -// opw := []byte(ctx.OwnerPW) -// b := append(opw, keySalt(ctx.E.O)...) -// b = append(b, ctx.E.U...) -// key := sha256.Sum256(b) + ////////////////////////////////// -// cb, err := aes.NewCipher(key[:]) -// if err != nil { -// return nil, err -// } + if err := calcFileEncKey(ctx); err != nil { + return err + } -// iv := make([]byte, 16) -// k = make([]byte, 32) + ////////////////////////////////// -// mode := cipher.NewCBCDecrypter(cb, iv) -// mode.CryptBlocks(k, ctx.E.OE) + h = sha256.Sum256(append(upw, keySalt(u)...)) + cb, err := aes.NewCipher(h[:]) + if err != nil { + return err + } -// return k, nil -// } + iv := make([]byte, 16) + mode := cipher.NewCBCEncrypter(cb, iv) + mode.CryptBlocks(ctx.E.UE, ctx.EncKey) + d.Update("UE", types.HexLiteral(hex.EncodeToString(ctx.E.UE))) -func calcFileEncKey(ctx *model.Context, d types.Dict) (err error) { + ////////////////////////////////// - // Calc Random UE (32 bytes) - ue := make([]byte, 32) - _, err = io.ReadFull(rand.Reader, ue) + c = append(opw, keySalt(o)...) 
+ h = sha256.Sum256(append(c, ctx.E.U...)) + cb, err = aes.NewCipher(h[:]) if err != nil { return err } - ctx.E.UE = ue - d.Update("UE", types.HexLiteral(hex.EncodeToString(ctx.E.UE))) - - // Calc file encryption key. - ctx.EncKey, err = calcFileEncKeyFromUE(ctx) + mode = cipher.NewCBCEncrypter(cb, iv) + mode.CryptBlocks(ctx.E.OE, ctx.EncKey) + d.Update("OE", types.HexLiteral(hex.EncodeToString(ctx.E.OE))) - return err + return nil } -func calcOAndUAES256(ctx *model.Context, d types.Dict) (err error) { - - // 1) Calc U. +func calcOAndUAES256Rev6(ctx *model.Context, d types.Dict) (err error) { b := make([]byte, 16) _, err = io.ReadFull(rand.Reader, b) if err != nil { @@ -1523,11 +1772,16 @@ func calcOAndUAES256(ctx *model.Context, d types.Dict) (err error) { u := append(make([]byte, 32), b...) upw := []byte(ctx.UserPW) - h := sha256.Sum256(append(upw, validationSalt(u)...)) + h, _, err := hashRev6(append(upw, validationSalt(u)...), upw, nil) + if err != nil { + return err + } + ctx.E.U = append(h[:], b...) d.Update("U", types.HexLiteral(hex.EncodeToString(ctx.E.U))) - // 2) Calc O (depends on U). + /////////////////////////// + b = make([]byte, 16) _, err = io.ReadFull(rand.Reader, b) if err != nil { @@ -1537,17 +1791,27 @@ func calcOAndUAES256(ctx *model.Context, d types.Dict) (err error) { o := append(make([]byte, 32), b...) opw := []byte(ctx.OwnerPW) c := append(opw, validationSalt(o)...) - h = sha256.Sum256(append(c, ctx.E.U...)) + h, _, err = hashRev6(append(c, ctx.E.U...), opw, ctx.E.U) + if err != nil { + return err + } + ctx.E.O = append(h[:], b...) d.Update("O", types.HexLiteral(hex.EncodeToString(ctx.E.O))) - err = calcFileEncKey(ctx, d) + /////////////////////////// + + if err := calcFileEncKey(ctx); err != nil { + return err + } + + /////////////////////////// + + h, _, err = hashRev6(append(upw, keySalt(u)...), upw, nil) if err != nil { return err } - // Encrypt file encryption key into UE. 
- h = sha256.Sum256(append(upw, keySalt(u)...)) cb, err := aes.NewCipher(h[:]) if err != nil { return err @@ -1558,9 +1822,14 @@ func calcOAndUAES256(ctx *model.Context, d types.Dict) (err error) { mode.CryptBlocks(ctx.E.UE, ctx.EncKey) d.Update("UE", types.HexLiteral(hex.EncodeToString(ctx.E.UE))) - // Encrypt file encryption key into OE. + ////////////////////////////// + c = append(opw, keySalt(o)...) - h = sha256.Sum256(append(c, ctx.E.U...)) + h, _, err = hashRev6(append(c, ctx.E.U...), opw, ctx.E.U) + if err != nil { + return err + } + cb, err = aes.NewCipher(h[:]) if err != nil { return err @@ -1574,11 +1843,14 @@ func calcOAndUAES256(ctx *model.Context, d types.Dict) (err error) { } func calcOAndU(ctx *model.Context, d types.Dict) (err error) { - if ctx.E.R == 5 { return calcOAndUAES256(ctx, d) } + if ctx.E.R == 6 { + return calcOAndUAES256Rev6(ctx, d) + } + ctx.E.O, err = o(ctx) if err != nil { return err diff --git a/pkg/pdfcpu/cut.go b/pkg/pdfcpu/cut.go index e20a73ee..d51ab03e 100644 --- a/pkg/pdfcpu/cut.go +++ b/pkg/pdfcpu/cut.go @@ -147,6 +147,7 @@ func createOutline( ctxSrc, ctxDest *model.Context, pagesIndRef types.IndirectRef, pagesDict, d types.Dict, + pageNr int, cropBox *types.Rectangle, migrated map[int]int, cut *model.Cut) error { @@ -177,7 +178,7 @@ func createOutline( drawOutlineCuts(&buf, cropBox, cb, cut) - bb, err := ctxSrc.PageContent(d1) + bb, err := ctxSrc.PageContent(d1, pageNr) if err != nil { return err } @@ -223,7 +224,7 @@ func createOutline( return nil } -func prepForCut(ctxSrc *model.Context, i int) ( +func prepForCut(ctxSrc *model.Context, pageNr int) ( *model.Context, *types.Rectangle, *types.IndirectRef, @@ -247,12 +248,12 @@ func prepForCut(ctxSrc *model.Context, i int) ( return nil, nil, nil, nil, nil, nil, err } - d, _, inhPAttrs, err := ctxSrc.PageDict(i, false) + d, _, inhPAttrs, err := ctxSrc.PageDict(pageNr, false) if err != nil { return nil, nil, nil, nil, nil, nil, err } if d == nil { - return nil, nil, nil, nil, 
nil, nil, errors.Errorf("pdfcpu: unknown page number: %d\n", i) + return nil, nil, nil, nil, nil, nil, errors.Errorf("pdfcpu: unknown page number: %d\n", pageNr) } d.Delete("Annots") @@ -264,8 +265,8 @@ func prepForCut(ctxSrc *model.Context, i int) ( return ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, nil } -func internPageRot(ctxSrc *model.Context, rotate int, cropBox *types.Rectangle, d types.Dict, trans []byte) error { - bb, err := ctxSrc.PageContent(d) +func internPageRot(ctxSrc *model.Context, rotate int, cropBox *types.Rectangle, d types.Dict, pageNr int, trans []byte) error { + bb, err := ctxSrc.PageContent(d, pageNr) if err != nil { return err } @@ -297,7 +298,7 @@ func internPageRot(ctxSrc *model.Context, rotate int, cropBox *types.Rectangle, return nil } -func handleCutMargin(ctxSrc *model.Context, d, d1 types.Dict, cropBox, cb *types.Rectangle, i, j int, w, h float64, sc *float64, cut *model.Cut) error { +func handleCutMargin(ctxSrc *model.Context, d, d1 types.Dict, pageNr int, cropBox, cb *types.Rectangle, i, j int, w, h float64, sc *float64, cut *model.Cut) error { ar := cb.AspectRatio() mv := cut.Margin / ar @@ -355,7 +356,7 @@ func handleCutMargin(ctxSrc *model.Context, d, d1 types.Dict, cropBox, cb *types var trans bytes.Buffer fmt.Fprintf(&trans, "q %.5f %.5f %.5f %.5f %.5f %.5f cm ", m[0][0], m[0][1], m[1][0], m[1][1], m[2][0], m[2][1]) - bbOrig, err := ctxSrc.PageContent(d) + bbOrig, err := ctxSrc.PageContent(d, pageNr) if err != nil { return err } @@ -383,6 +384,7 @@ func createTiles( ctxSrc, ctxDest *model.Context, pagesIndRef types.IndirectRef, pagesDict, d types.Dict, + pageNr int, cropBox *types.Rectangle, inhPAttrs *model.InheritedPageAttrs, migrated map[int]int, @@ -422,7 +424,7 @@ func createTiles( d1["CropBox"] = cb.Array() if cut.Margin > 0 { - if err := handleCutMargin(ctxSrc, d, d1, cropBox, cb, i, j, w, h, &sc, cut); err != nil { + if err := handleCutMargin(ctxSrc, d, d1, pageNr, cropBox, cb, i, j, w, h, &sc, cut); err != 
nil { return err } } @@ -449,12 +451,12 @@ func createTiles( return nil } -func CutPage(ctxSrc *model.Context, i int, cut *model.Cut) (*model.Context, error) { +func CutPage(ctxSrc *model.Context, pageNr int, cut *model.Cut) (*model.Context, error) { // required: at least one of horizontalCut, verticalCut // optionally: border, margin, bgcolor - ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, err := prepForCut(ctxSrc, i) + ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, err := prepForCut(ctxSrc, pageNr) if err != nil { return nil, err } @@ -470,17 +472,17 @@ func CutPage(ctxSrc *model.Context, i int, cut *model.Cut) (*model.Context, erro d.Delete("Rotate") } - if err := internPageRot(ctxSrc, rotate, cropBox, d, nil); err != nil { + if err := internPageRot(ctxSrc, rotate, cropBox, d, pageNr, nil); err != nil { return nil, err } migrated := map[int]int{} - if err := createOutline(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, cropBox, migrated, cut); err != nil { + if err := createOutline(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, pageNr, cropBox, migrated, cut); err != nil { return nil, err } - if err := createTiles(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, cropBox, inhPAttrs, migrated, cut); err != nil { + if err := createTiles(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, pageNr, cropBox, inhPAttrs, migrated, cut); err != nil { return nil, err } @@ -524,11 +526,11 @@ func createNDownCuts(n int, cropBox *types.Rectangle, cut *model.Cut) { } } -func NDownPage(ctxSrc *model.Context, i, n int, cut *model.Cut) (*model.Context, error) { +func NDownPage(ctxSrc *model.Context, pageNr, n int, cut *model.Cut) (*model.Context, error) { // Optionally: border, margin, bgcolor - ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, err := prepForCut(ctxSrc, i) + ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, err := prepForCut(ctxSrc, pageNr) if err != nil { return nil, err } @@ -544,7 +546,7 @@ func NDownPage(ctxSrc *model.Context, i, n int, cut 
*model.Cut) (*model.Context, d.Delete("Rotate") } - if err := internPageRot(ctxSrc, rotate, cropBox, d, nil); err != nil { + if err := internPageRot(ctxSrc, rotate, cropBox, d, pageNr, nil); err != nil { return nil, err } @@ -552,11 +554,11 @@ func NDownPage(ctxSrc *model.Context, i, n int, cut *model.Cut) (*model.Context, migrated := map[int]int{} - if err := createOutline(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, cropBox, migrated, cut); err != nil { + if err := createOutline(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, pageNr, cropBox, migrated, cut); err != nil { return nil, err } - if err := createTiles(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, cropBox, inhPAttrs, migrated, cut); err != nil { + if err := createTiles(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, pageNr, cropBox, inhPAttrs, migrated, cut); err != nil { return nil, err } @@ -591,12 +593,12 @@ func createPosterCuts(cropBox *types.Rectangle, cut *model.Cut) { } } -func PosterPage(ctxSrc *model.Context, i int, cut *model.Cut) (*model.Context, error) { +func PosterPage(ctxSrc *model.Context, pageNr int, cut *model.Cut) (*model.Context, error) { // required: formsize(=papersize) or dimensions // optionally: scalefactor, border, margin, bgcolor - ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, err := prepForCut(ctxSrc, i) + ctxDest, cropBox, pagesIndRef, pagesDict, d, inhPAttrs, err := prepForCut(ctxSrc, pageNr) if err != nil { return nil, err } @@ -630,7 +632,7 @@ func PosterPage(ctxSrc *model.Context, i int, cut *model.Cut) (*model.Context, e var trans bytes.Buffer fmt.Fprintf(&trans, "q %.5f %.5f %.5f %.5f %.5f %.5f cm ", m[0][0], m[0][1], m[1][0], m[1][1], m[2][0], m[2][1]) - if err := internPageRot(ctxSrc, rotate, cropBox, d, trans.Bytes()); err != nil { + if err := internPageRot(ctxSrc, rotate, cropBox, d, pageNr, trans.Bytes()); err != nil { return nil, err } @@ -638,11 +640,11 @@ func PosterPage(ctxSrc *model.Context, i int, cut *model.Cut) (*model.Context, e migrated := 
map[int]int{} - if err := createOutline(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, cropBox, migrated, cut); err != nil { + if err := createOutline(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, pageNr, cropBox, migrated, cut); err != nil { return nil, err } - if err := createTiles(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, cropBox, inhPAttrs, migrated, cut); err != nil { + if err := createTiles(ctxSrc, ctxDest, *pagesIndRef, pagesDict, d, pageNr, cropBox, inhPAttrs, migrated, cut); err != nil { return nil, err } diff --git a/pkg/pdfcpu/doc.go b/pkg/pdfcpu/doc.go index 196c65f4..47153def 100644 --- a/pkg/pdfcpu/doc.go +++ b/pkg/pdfcpu/doc.go @@ -9,10 +9,11 @@ The commands are: booklet arrange pages onto larger sheets of paper to make a booklet or zine bookmarks list, import, export, remove bookmarks boxes list, add, remove page boundaries for selected pages + certificates list, inspect, import, reset certificates changeopw change owner password changeupw change user password collect create custom sequence of selected pages - config print configuration + config list, reset configuration create create PDF content including forms via JSON crop set cropbox for selected pages cut custom cut pages horizontally or vertically @@ -22,7 +23,7 @@ The commands are: fonts install, list supported fonts, create cheat sheets form list, remove fields, lock, unlock, reset, export, fill form via JSON or CSV grid rearrange pages or images for enhanced browsing experience - images list images for selected pages + images list, extract, update images import import/convert images to PDF info print file info keywords list, add, remove keywords @@ -41,12 +42,13 @@ The commands are: resize scale selected pages rotate rotate selected pages selectedpages print definition of the -pages flag + signatures validate signatures split split up a PDF by span or bookmark stamp add, remove, update Unicode text, image or PDF stamps for selected pages trim create trimmed version of selected pages 
validate validate PDF against PDF 32000-1:2008 (PDF 1.7) + basic PDF 2.0 validation version print version - viewpref list, set, reset viewer preferences for opened document + viewerpref list, set, reset viewer preferences for opened document watermark add, remove, update Unicode text, image or PDF watermarks for selected pages zoom zoom in/out of selected pages by magnification factor or corresponding margin */ diff --git a/pkg/pdfcpu/extract.go b/pkg/pdfcpu/extract.go index 9b45c915..1b8be0f4 100644 --- a/pkg/pdfcpu/extract.go +++ b/pkg/pdfcpu/extract.go @@ -24,6 +24,7 @@ import ( "github.com/angel-one/pdfcpu/pkg/filter" "github.com/angel-one/pdfcpu/pkg/log" + "github.com/angel-one/pdfcpu/pkg/pdfcpu/font" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" "github.com/pkg/errors" @@ -34,7 +35,22 @@ import ( func ImageObjNrs(ctx *model.Context, pageNr int) []int { // TODO Exclude SMask image objects. objNrs := []int{} - for k, v := range ctx.Optimize.PageImages[pageNr-1] { + + if pageNr < 1 { + return objNrs + } + + imgObjNrs := ctx.Optimize.PageImages + if len(imgObjNrs) == 0 { + return objNrs + } + + pageImgObjNrs := imgObjNrs[pageNr-1] + if pageImgObjNrs == nil { + return objNrs + } + + for k, v := range pageImgObjNrs { if v { objNrs = append(objNrs, k) } @@ -201,6 +217,30 @@ func ColorSpaceComponents(xRefTable *model.XRefTable, sd *types.StreamDict) (int return 0, nil } +func imageWidth(ctx *model.Context, sd *types.StreamDict, objNr int) (int, error) { + obj, ok := sd.Find("Width") + if !ok { + return 0, errors.Errorf("pdfcpu: missing image width obj#%d", objNr) + } + i, err := ctx.DereferenceInteger(obj) + if err != nil { + return 0, err + } + return i.Value(), nil +} + +func imageHeight(ctx *model.Context, sd *types.StreamDict, objNr int) (int, error) { + obj, ok := sd.Find("Height") + if !ok { + return 0, errors.Errorf("pdfcpu: missing image height obj#%d", objNr) + } + i, err := ctx.DereferenceInteger(obj) + if err 
!= nil { + return 0, err + } + return i.Value(), nil +} + func imageStub( ctx *model.Context, sd *types.StreamDict, @@ -209,14 +249,14 @@ func imageStub( thumb, imgMask bool, objNr int) (*model.Image, error) { - w := sd.IntEntry("Width") - if w == nil { - return nil, errors.Errorf("pdfcpu: missing image width obj#%d", objNr) + w, err := imageWidth(ctx, sd, objNr) + if err != nil { + return nil, err } - h := sd.IntEntry("Height") - if h == nil { - return nil, errors.Errorf("pdfcpu: missing image height obj#%d", objNr) + h, err := imageHeight(ctx, sd, objNr) + if err != nil { + return nil, err } cs, err := ColorSpaceString(ctx, sd) @@ -256,7 +296,7 @@ func imageStub( interpol = true } - i, err := StreamLength(ctx, sd) + size, err := StreamLength(ctx, sd) if err != nil { return nil, err } @@ -273,13 +313,13 @@ func imageStub( IsImgMask: imgMask, HasImgMask: mask, HasSMask: sMask, - Width: *w, - Height: *h, + Width: w, + Height: h, Cs: cs, Comp: comp, Bpc: bpc, Interpol: interpol, - Size: i, + Size: size, Filter: filters, DecodeParms: s, } @@ -332,7 +372,7 @@ func decodeImage(ctx *model.Context, sd *types.StreamDict, filters, lastFilter s switch lastFilter { - case filter.DCT, filter.JPX, filter.Flate, filter.CCITTFax, filter.RunLength: + case filter.DCT, filter.JPX, filter.Flate, filter.LZW, filter.CCITTFax, filter.RunLength: if err := sd.Decode(); err != nil { return err } @@ -354,7 +394,7 @@ func decodeImage(ctx *model.Context, sd *types.StreamDict, filters, lastFilter s func img( ctx *model.Context, sd *types.StreamDict, - thumb, imgMask bool, + thumb bool, resourceID, filters, lastFilter string, objNr int) (*model.Image, error) { @@ -394,7 +434,7 @@ func ExtractImage(ctx *model.Context, sd *types.StreamDict, thumb bool, resource return imageStub(ctx, sd, resourceID, filters, lastFilter, decodeParms, thumb, imgMask, objNr) } - return img(ctx, sd, thumb, imgMask, resourceID, filters, lastFilter, objNr) + return img(ctx, sd, thumb, resourceID, filters, lastFilter, 
objNr) } // ExtractPageImages extracts all images used by pageNr. @@ -403,7 +443,7 @@ func ExtractPageImages(ctx *model.Context, pageNr int, stub bool) (map[int]model m := map[int]model.Image{} for _, objNr := range ImageObjNrs(ctx, pageNr) { imageObj := ctx.Optimize.ImageObjects[objNr] - img, err := ExtractImage(ctx, imageObj.ImageDict, false, imageObj.ResourceNames[0], objNr, stub) + img, err := ExtractImage(ctx, imageObj.ImageDict, false, imageObj.ResourceNames[pageNr-1], objNr, stub) if err != nil { return nil, err } @@ -442,7 +482,22 @@ type Font struct { // Requires an optimized context. func FontObjNrs(ctx *model.Context, pageNr int) []int { objNrs := []int{} - for k, v := range ctx.Optimize.PageFonts[pageNr-1] { + + if pageNr < 1 { + return objNrs + } + + fontObjNrs := ctx.Optimize.PageFonts + if len(fontObjNrs) == 0 { + return objNrs + } + + pageFontObjNrs := fontObjNrs[pageNr-1] + if pageFontObjNrs == nil { + return objNrs + } + + for k, v := range pageFontObjNrs { if v { objNrs = append(objNrs, k) } @@ -452,15 +507,7 @@ func FontObjNrs(ctx *model.Context, pageNr int) []int { // ExtractFont extracts a font from fontObject. func ExtractFont(ctx *model.Context, fontObject model.FontObject, objNr int) (*Font, error) { - // Only embedded fonts have binary data. 
- if !fontObject.Embedded() { - if log.DebugEnabled() { - log.Debug.Printf("ExtractFont: ignoring obj#%d - non embedded font: %s\n", objNr, fontObject.FontName) - } - return nil, nil - } - - d, err := fontDescriptor(ctx.XRefTable, fontObject.FontDict, objNr) + d, err := font.FontDescriptor(ctx.XRefTable, fontObject.FontDict, objNr) if err != nil { return nil, err } @@ -509,8 +556,12 @@ func ExtractFont(ctx *model.Context, fontObject model.FontObject, objNr int) (*F f = &Font{bytes.NewReader(sd.Content), fontObject.FontName, "ttf"} default: + s := fmt.Sprintf("extractFontData: obj#%d - unsupported fonttype %s - font: %s\n", objNr, fontType, fontObject.FontName) if log.InfoEnabled() { - log.Info.Printf("extractFontData: ignoring obj#%d - unsupported fonttype %s - font: %s\n", objNr, fontType, fontObject.FontName) + log.Info.Println(s) + } + if log.CLIEnabled() { + log.CLI.Printf(s) } return nil, nil } @@ -519,9 +570,12 @@ func ExtractFont(ctx *model.Context, fontObject model.FontObject, objNr int) (*F } // ExtractPageFonts extracts all fonts used by pageNr. -func ExtractPageFonts(ctx *model.Context, pageNr int) ([]Font, error) { +func ExtractPageFonts(ctx *model.Context, pageNr int, objNrs, skipped types.IntSet) ([]Font, error) { ff := []Font{} for _, i := range FontObjNrs(ctx, pageNr) { + if objNrs[i] || skipped[i] { + continue + } fontObject := ctx.Optimize.FontObjects[i] f, err := ExtractFont(ctx, *fontObject, i) if err != nil { @@ -529,6 +583,9 @@ func ExtractPageFonts(ctx *model.Context, pageNr int) ([]Font, error) { } if f != nil { ff = append(ff, *f) + objNrs[i] = true + } else { + skipped[i] = true } } return ff, nil @@ -549,14 +606,9 @@ func ExtractFormFonts(ctx *model.Context) ([]Font, error) { return ff, nil } -// ExtractPage extracts pageNr into a new single page context. 
-func ExtractPage(ctx *model.Context, pageNr int) (*model.Context, error) { - return ExtractPages(ctx, []int{pageNr}, false) -} - // ExtractPages extracts pageNrs into a new single page context. func ExtractPages(ctx *model.Context, pageNrs []int, usePgCache bool) (*model.Context, error) { - ctxDest, err := CreateContextWithXRefTable(nil, types.PaperSize["A4"]) + ctxDest, err := CreateContextWithXRefTable(ctx.Conf, types.PaperSize["A4"]) if err != nil { return nil, err } @@ -575,7 +627,7 @@ func ExtractPageContent(ctx *model.Context, pageNr int) (io.Reader, error) { if err != nil { return nil, err } - bb, err := ctx.PageContent(d) + bb, err := ctx.PageContent(d, pageNr) if err != nil && err != model.ErrNoContent { return nil, err } diff --git a/pkg/pdfcpu/font/fontDict.go b/pkg/pdfcpu/font/fontDict.go index 73fd2992..8eef51ca 100644 --- a/pkg/pdfcpu/font/fontDict.go +++ b/pkg/pdfcpu/font/fontDict.go @@ -30,6 +30,7 @@ import ( "unicode/utf16" "github.com/angel-one/pdfcpu/pkg/font" + "github.com/angel-one/pdfcpu/pkg/log" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" "github.com/pkg/errors" @@ -66,6 +67,15 @@ func CJKEncoding(s string) bool { return types.MemberOf(s, []string{"UniGB-UTF16-H", "UniCNS-UTF16-H", "UniJIS-UTF16-H", "UniKS-UTF16-H"}) } +func ScriptForEncoding(enc string) string { + for k, v := range cjkParms { + if v.encoding == enc { + return k + } + } + return "" +} + func fontDescriptorIndRefs(fd types.Dict, lang string, font *model.FontResource) error { if lang != "" { if s := fd.NameEntry("Lang"); s != nil { @@ -147,7 +157,7 @@ func flateEncodedStreamIndRef(xRefTable *model.XRefTable, data []byte) (*types.I return xRefTable.IndRefForNewObject(*sd) } -func ttfFontFile(xRefTable *model.XRefTable, ttf font.TTFLight, fontName string) (*types.IndirectRef, error) { +func ttfFontFile(xRefTable *model.XRefTable, fontName string) (*types.IndirectRef, error) { bb, err := font.Read(fontName) if err != nil { return 
nil, err @@ -155,7 +165,7 @@ func ttfFontFile(xRefTable *model.XRefTable, ttf font.TTFLight, fontName string) return flateEncodedStreamIndRef(xRefTable, bb) } -func ttfSubFontFile(xRefTable *model.XRefTable, ttf font.TTFLight, fontName string, indRef *types.IndirectRef) (*types.IndirectRef, error) { +func ttfSubFontFile(xRefTable *model.XRefTable, fontName string, indRef *types.IndirectRef) (*types.IndirectRef, error) { bb, err := font.Subset(fontName, xRefTable.UsedGIDs[fontName]) if err != nil { return nil, err @@ -232,15 +242,6 @@ func coreFontDict(xRefTable *model.XRefTable, coreFontName string) (*types.Indir if coreFontName != "Symbol" && coreFontName != "ZapfDingbats" { d.InsertName("Encoding", "WinAnsiEncoding") } - // if coreFontName == "Helvetica" { - // indRef, err := PDFDocEncoding(xRefTable) - // if err != nil { - // return nil, err - // } - // d.Insert("Encoding", *indRef) - // } else if coreFontName != "Symbol" && coreFontName != "ZapfDingbats" { - // d.InsertName("Encoding", "WinAnsiEncoding") - // } return xRefTable.IndRefForNewObject(d) } @@ -301,11 +302,11 @@ func ttfFontDescriptorFlags(ttf font.TTFLight) uint32 { } // CIDFontFile returns a TrueType font file or subfont file for fontName. -func CIDFontFile(xRefTable *model.XRefTable, ttf font.TTFLight, fontName string, subFont bool) (*types.IndirectRef, error) { +func CIDFontFile(xRefTable *model.XRefTable, fontName string, subFont bool) (*types.IndirectRef, error) { if subFont { - return ttfSubFontFile(xRefTable, ttf, fontName, nil) + return ttfSubFontFile(xRefTable, fontName, nil) } - return ttfFontFile(xRefTable, ttf, fontName) + return ttfFontFile(xRefTable, fontName) } // CIDFontDescriptor returns a font descriptor describing the CIDFont’s default metrics other than its glyph widths. 
@@ -330,7 +331,7 @@ func CIDFontDescriptor(xRefTable *model.XRefTable, ttf font.TTFLight, fontName, ) if embed { - fontFile, err = CIDFontFile(xRefTable, ttf, fontName, true) + fontFile, err = CIDFontFile(xRefTable, fontName, true) if err != nil { return nil, err } @@ -360,8 +361,8 @@ func CIDFontDescriptor(xRefTable *model.XRefTable, ttf font.TTFLight, fontName, } // FontDescriptor returns a TrueType font descriptor describing font’s default metrics other than its glyph widths. -func FontDescriptor(xRefTable *model.XRefTable, ttf font.TTFLight, fontName, fontLang string) (*types.IndirectRef, error) { - fontFile, err := ttfFontFile(xRefTable, ttf, fontName) +func NewFontDescriptor(xRefTable *model.XRefTable, ttf font.TTFLight, fontName, fontLang string) (*types.IndirectRef, error) { + fontFile, err := ttfFontFile(xRefTable, fontName) if err != nil { return nil, err } @@ -747,7 +748,7 @@ func UpdateUserfont(xRefTable *model.XRefTable, fontName string, f model.FontRes return err } - if _, err := ttfSubFontFile(xRefTable, ttf, fontName, f.FontFile); err != nil { + if _, err := ttfSubFontFile(xRefTable, fontName, f.FontFile); err != nil { return err } @@ -954,7 +955,7 @@ func trueTypeFontDict(xRefTable *model.XRefTable, fontName, fontLang string) (*t return nil, err } - fdIndRef, err := FontDescriptor(xRefTable, ttf, fontName, fontLang) + fdIndRef, err := NewFontDescriptor(xRefTable, ttf, fontName, fontLang) if err != nil { return nil, err } @@ -1068,8 +1069,8 @@ func Name(xRefTable *model.XRefTable, fontDict types.Dict, objNumber int) (prefi } // Lang detects the optional language indicator in a font dict. 
-func Lang(xRefTable *model.XRefTable, d types.Dict) (string, error) { - o, found := d.Find("FontDescriptor") +func Lang(xRefTable *model.XRefTable, fontDict types.Dict) (string, error) { + o, found := fontDict.Find("FontDescriptor") if found { fd, err := xRefTable.DereferenceDict(o) if err != nil { @@ -1083,9 +1084,21 @@ func Lang(xRefTable *model.XRefTable, d types.Dict) (string, error) { return s, nil } - arr := d.ArrayEntry("DescendantFonts") - indRef := arr[0].(types.IndirectRef) - d1, err := xRefTable.DereferenceDict(indRef) + o, found = fontDict.Find("DescendantFonts") + if !found { + return "", ErrCorruptFontDict + } + + arr, err := xRefTable.DereferenceArray(o) + if err != nil { + return "", err + } + + if len(arr) != 1 { + return "", ErrCorruptFontDict + } + + d1, err := xRefTable.DereferenceDict(arr[0]) if err != nil { return "", err } @@ -1105,3 +1118,107 @@ func Lang(xRefTable *model.XRefTable, d types.Dict) (string, error) { return "", nil } + +func trivialFontDescriptor(xRefTable *model.XRefTable, fontDict types.Dict, objNr int) (types.Dict, error) { + o, ok := fontDict.Find("FontDescriptor") + if !ok { + return nil, nil + } + + // fontDescriptor directly available. + + d, err := xRefTable.DereferenceDict(o) + if err != nil { + return nil, err + } + + if d == nil { + return nil, errors.Errorf("pdfcpu: trivialFontDescriptor: FontDescriptor is null for font object %d\n", objNr) + } + + if d.Type() != nil && *d.Type() != "FontDescriptor" { + return nil, errors.Errorf("pdfcpu: trivialFontDescriptor: FontDescriptor dict incorrect dict type for font object %d\n", objNr) + } + + return d, nil +} + +// FontDescriptor gets the font descriptor for this font. 
+func FontDescriptor(xRefTable *model.XRefTable, fontDict types.Dict, objNr int) (types.Dict, error) { + if log.OptimizeEnabled() { + log.Optimize.Println("fontDescriptor begin") + } + + d, err := trivialFontDescriptor(xRefTable, fontDict, objNr) + if err != nil { + return nil, err + } + if d != nil { + return d, nil + } + + // Try to access a fontDescriptor in a descendant font for Type0 fonts. + + o, ok := fontDict.Find("DescendantFonts") + if !ok { + //logErrorOptimize.Printf("FontDescriptor: Neither FontDescriptor nor DescendantFonts for font object %d\n", objectNumber) + return nil, nil + } + + // A descendant font is contained in an array of size 1. + + a, err := xRefTable.DereferenceArray(o) + if err != nil || a == nil { + return nil, errors.Errorf("pdfcpu: fontDescriptor: DescendantFonts: IndirectRef or Array with length 1 expected for font object %d\n", objNr) + } + if len(a) != 1 { + return nil, errors.Errorf("pdfcpu: fontDescriptor: DescendantFonts Array length <> 1 %v\n", a) + } + + // dict is the fontDict of the descendant font. 
+ d, err = xRefTable.DereferenceDict(a[0]) + if err != nil { + return nil, errors.Errorf("pdfcpu: fontDescriptor: No descendant font dict for %v\n", a) + } + if d == nil { + return nil, errors.Errorf("pdfcpu: fontDescriptor: descendant font dict is null for %v\n", a) + } + + if *d.Type() != "Font" { + return nil, errors.Errorf("pdfcpu: fontDescriptor: font dict with incorrect dict type for %v\n", d) + } + + o, ok = d.Find("FontDescriptor") + if !ok { + log.Optimize.Printf("fontDescriptor: descendant font not embedded %s\n", d) + return nil, nil + } + + d, err = xRefTable.DereferenceDict(o) + if err != nil { + return nil, errors.Errorf("pdfcpu: fontDescriptor: No FontDescriptor dict for font object %d\n", objNr) + } + + if log.OptimizeEnabled() { + log.Optimize.Println("fontDescriptor end") + } + + return d, nil +} + +func Embedded(xRefTable *model.XRefTable, fontDict types.Dict, objNr int) (bool, error) { + fd, err := FontDescriptor(xRefTable, fontDict, objNr) + if err != nil { + return false, err + } + if _, ok := fd.Find("FontFile"); ok { + return true, nil + } + if _, ok := fd.Find("FontFile2"); ok { + return true, nil + } + if _, ok := fd.Find("FontFile3"); ok { + return true, nil + } + return false, nil +} diff --git a/pkg/pdfcpu/form/export.go b/pkg/pdfcpu/form/export.go index 2f9a964f..e3861f70 100644 --- a/pkg/pdfcpu/form/export.go +++ b/pkg/pdfcpu/form/export.go @@ -20,6 +20,7 @@ import ( "encoding/json" "io" "path/filepath" + "strconv" "strings" "time" @@ -29,6 +30,15 @@ import ( "github.com/pkg/errors" ) +const ( + + // REQUIRED is used for required dict entries. + REQUIRED = true + + // OPTIONAL is used for optional dict entries. + OPTIONAL = false +) + // Header represents form meta data. 
type Header struct { Source string `json:"source"` @@ -48,8 +58,10 @@ type TextField struct { Pages []int `json:"pages"` ID string `json:"id"` Name string `json:"name,omitempty"` + AltName string `json:"altname,omitempty"` Default string `json:"default,omitempty"` Value string `json:"value"` + MaxLen int `json:"maxlen,omitempty"` Multiline bool `json:"multiline"` Locked bool `json:"locked"` } @@ -59,6 +71,7 @@ type DateField struct { Pages []int `json:"pages"` ID string `json:"id"` Name string `json:"name,omitempty"` + AltName string `json:"altname,omitempty"` Format string `json:"format"` Default string `json:"default,omitempty"` Value string `json:"value"` @@ -70,6 +83,7 @@ type CheckBox struct { Pages []int `json:"pages"` ID string `json:"id"` Name string `json:"name,omitempty"` + AltName string `json:"altname,omitempty"` Default bool `json:"default"` Value bool `json:"value"` Locked bool `json:"locked"` @@ -80,6 +94,7 @@ type RadioButtonGroup struct { Pages []int `json:"pages"` ID string `json:"id"` Name string `json:"name,omitempty"` + AltName string `json:"altname,omitempty"` Options []string `json:"options"` Default string `json:"default,omitempty"` Value string `json:"value"` @@ -91,6 +106,7 @@ type ComboBox struct { Pages []int `json:"pages"` ID string `json:"id"` Name string `json:"name,omitempty"` + AltName string `json:"altname,omitempty"` Editable bool `json:"editable"` Options []string `json:"options"` Default string `json:"default,omitempty"` @@ -103,6 +119,7 @@ type ListBox struct { Pages []int `json:"pages"` ID string `json:"id"` Name string `json:"name,omitempty"` + AltName string `json:"altname,omitempty"` Multi bool `json:"multi"` Options []string `json:"options"` Defaults []string `json:"defaults,omitempty"` @@ -186,15 +203,54 @@ func (f Form) listBoxValuesAndLock(id, name string) ([]string, bool, bool) { return nil, false, false } -func extractRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) ([]string, error) { +func 
locateAPN(xRefTable *model.XRefTable, d types.Dict) (types.Dict, error) { + + obj, ok := d.Find("AP") + if !ok { + return nil, errors.New("corrupt form field: missing entry \"AP\"") + } + d1, err := xRefTable.DereferenceDict(obj) + if err != nil { + return nil, err + } + if len(d1) == 0 { + return nil, errors.New("corrupt form field: missing entry \"AP\"") + } + + obj, ok = d1.Find("N") + if !ok { + return nil, errors.New("corrupt AP field: missing entry \"N\"") + } + d2, err := xRefTable.DereferenceDict(obj) + if err != nil { + return nil, err + } + + if len(d2) == 0 { + return nil, errors.New("corrupt AP field: missing entry \"N\"") + } + + return d2, nil +} + +func extractRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) ([]string, bool, error) { var opts []string p := 0 + opts, err := parseOptions(xRefTable, d, OPTIONAL) + if err != nil { + return nil, false, err + } + + if len(opts) > 0 { + return opts, true, nil + } + for _, o := range d.ArrayEntry("Kids") { d, err := xRefTable.DereferenceDict(o) if err != nil { - return nil, err + return nil, false, err } indRef := d.IndirectRefEntry("P") @@ -206,18 +262,15 @@ func extractRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) ([ } } - d1 := d.DictEntry("AP") - if d1 == nil { - return nil, errors.New("corrupt form field: missing entry AP") - } - d2 := d1.DictEntry("N") - if d2 == nil { - return nil, errors.New("corrupt AP field: missing entry N") + d1, err := locateAPN(xRefTable, d) + if err != nil { + return nil, false, err } - for k := range d2 { + + for k := range d1 { k, err := types.DecodeName(k) if err != nil { - return nil, err + return nil, false, err } if k != "Off" { for _, opt := range opts { @@ -230,15 +283,42 @@ func extractRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) ([ } } - return opts, nil + return opts, false, nil +} + +func resolveOption(s string, opts []string, explicit bool) (string, error) { + n, err := types.DecodeName(s) + if err != nil { 
+ return "", err + } + if len(opts) > 0 && explicit { + j, err := strconv.Atoi(n) + if err != nil { + return "", err + } + for i, o := range opts { + if i == j { + n = o + break + } + } + } + return n, nil } -func extractRadioButtonGroup(xRefTable *model.XRefTable, page int, d types.Dict, id, name string, locked bool) (*RadioButtonGroup, error) { +func extractRadioButtonGroup(xRefTable *model.XRefTable, page int, d types.Dict, id, name, altName string, locked bool) (*RadioButtonGroup, error) { - rbg := &RadioButtonGroup{Pages: []int{page}, ID: id, Name: name, Locked: locked} + rbg := &RadioButtonGroup{Pages: []int{page}, ID: id, Name: name, AltName: altName, Locked: locked} + + opts, explicit, err := extractRadioButtonGroupOptions(xRefTable, d) + if err != nil { + return nil, err + } + + rbg.Options = opts if s := d.NameEntry("DV"); s != nil { - n, err := types.DecodeName(*s) + n, err := resolveOption(*s, opts, explicit) if err != nil { return nil, err } @@ -246,7 +326,7 @@ func extractRadioButtonGroup(xRefTable *model.XRefTable, page int, d types.Dict, } if s := d.NameEntry("V"); s != nil { - n, err := types.DecodeName(*s) + n, err := resolveOption(*s, opts, explicit) if err != nil { return nil, err } @@ -255,41 +335,35 @@ func extractRadioButtonGroup(xRefTable *model.XRefTable, page int, d types.Dict, } } - opts, err := extractRadioButtonGroupOptions(xRefTable, d) - if err != nil { - return nil, err - } - - rbg.Options = opts - return rbg, nil } -func extractCheckBox(page int, d types.Dict, id, name string, locked bool) (*CheckBox, error) { +func extractCheckBox(page int, d types.Dict, id, name, altName string, locked bool) (*CheckBox, error) { - cb := &CheckBox{Pages: []int{page}, ID: id, Name: name, Locked: locked} + cb := &CheckBox{Pages: []int{page}, ID: id, Name: name, AltName: altName, Locked: locked} if o, ok := d.Find("DV"); ok { - cb.Default = o.(types.Name) == "Yes" + cb.Default = o.(types.Name) != "Off" } if o, ok := d.Find("V"); ok { - cb.Value = 
o.(types.Name) == "Yes" + n := o.(types.Name) + cb.Value = len(n) > 0 && n != "Off" } return cb, nil } -func extractComboBox(xRefTable *model.XRefTable, page int, d types.Dict, id, name string, locked bool) (*ComboBox, error) { +func extractComboBox(xRefTable *model.XRefTable, page int, d types.Dict, id, name, altName string, locked bool) (*ComboBox, error) { - cb := &ComboBox{Pages: []int{page}, ID: id, Name: name, Locked: locked} + cb := &ComboBox{Pages: []int{page}, ID: id, Name: name, AltName: altName, Locked: locked} if sl := d.StringLiteralEntry("DV"); sl != nil { s, err := types.StringLiteralToString(*sl) if err != nil { return nil, err } - cb.Default = s + cb.Default = strings.TrimSpace(s) } if sl := d.StringLiteralEntry("V"); sl != nil { @@ -297,10 +371,10 @@ func extractComboBox(xRefTable *model.XRefTable, page int, d types.Dict, id, nam if err != nil { return nil, err } - cb.Value = s + cb.Value = strings.TrimSpace(s) } - opts, err := parseOptions(xRefTable, d) + opts, err := parseOptions(xRefTable, d, REQUIRED) if err != nil { return nil, err } @@ -313,8 +387,7 @@ func extractComboBox(xRefTable *model.XRefTable, page int, d types.Dict, id, nam return cb, nil } -func extractDateFormat(xRefTable *model.XRefTable, d types.Dict) (*primitives.DateFormat, error) { - +func dateFormatFromJSAction(d types.Dict) (*primitives.DateFormat, error) { d1 := d.DictEntry("AA") if len(d1) > 0 { d2 := d1.DictEntry("F") @@ -336,24 +409,45 @@ func extractDateFormat(xRefTable *model.XRefTable, d types.Dict) (*primitives.Da } } } + return nil, nil +} + +func extractDateFormat(xRefTable *model.XRefTable, d types.Dict) (*primitives.DateFormat, error) { + df, err := dateFormatFromJSAction(d) + if err != nil { + return nil, err + } + if df != nil { + return df, nil + } if o, found := d.Find("DV"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) + o1, err := xRefTable.Dereference(o) if err != nil { return nil, err } + sl, err := 
types.StringOrHexLiteral(o1) + if err != nil { + return nil, err + } + s := "" + if sl != nil { + s = *sl + } if df, err := primitives.DateFormatForDate(s); err == nil { return df, nil } } if o, found := d.Find("V"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) + sl, err := types.StringOrHexLiteral(o) if err != nil { return nil, err } + s := "" + if sl != nil { + s = *sl + } if df, err := primitives.DateFormatForDate(s); err == nil { return df, nil } @@ -362,61 +456,55 @@ func extractDateFormat(xRefTable *model.XRefTable, d types.Dict) (*primitives.Da return nil, nil } -func extractDateField(page int, d types.Dict, id, name string, df *primitives.DateFormat, locked bool) (*DateField, error) { +func extractDateField(xRefTable *model.XRefTable, page int, d types.Dict, id, name, altName string, df *primitives.DateFormat, locked bool) (*DateField, error) { - dfield := &DateField{Pages: []int{page}, ID: id, Name: name, Format: df.Ext, Locked: locked} + dfield := &DateField{Pages: []int{page}, ID: id, Name: name, AltName: altName, Format: df.Ext, Locked: locked} - if o, found := d.Find("DV"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) - if err != nil { - return nil, err - } - dfield.Default = s + v, err := getV(xRefTable, d) + if err != nil { + return nil, err } + dfield.Value = v - if o, found := d.Find("V"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) - if err != nil { - return nil, err - } - dfield.Value = s + dv, err := getDV(xRefTable, d) + if err != nil { + return nil, err } + dfield.Default = dv return dfield, nil } -func extractTextField(page int, d types.Dict, id, name string, ff *int, locked bool) (*TextField, error) { +func extractTextField(xRefTable *model.XRefTable, page int, d types.Dict, id, name, altName string, ff *int, locked bool) (*TextField, error) { multiLine := ff != nil && 
uint(primitives.FieldFlags(*ff))&uint(primitives.FieldMultiline) > 0 - tf := &TextField{Pages: []int{page}, ID: id, Name: name, Multiline: multiLine, Locked: locked} + maxLen := 0 + i := d.IntEntry("MaxLen") + if i != nil { + maxLen = *i + } - if o, found := d.Find("DV"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) - if err != nil { - return nil, err - } - tf.Default = s + tf := &TextField{Pages: []int{page}, ID: id, Name: name, AltName: altName, Multiline: multiLine, MaxLen: maxLen, Locked: locked} + + v, err := getV(xRefTable, d) + if err != nil { + return nil, err } + tf.Value = v - if o, found := d.Find("V"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) - if err != nil { - return nil, err - } - tf.Value = s + dv, err := getDV(xRefTable, d) + if err != nil { + return nil, err } + tf.Default = dv return tf, nil } -func extractListBox(xRefTable *model.XRefTable, page int, d types.Dict, id, name string, locked, multi bool) (*ListBox, error) { +func extractListBox(xRefTable *model.XRefTable, page int, d types.Dict, id, name, altName string, locked, multi bool) (*ListBox, error) { - lb := &ListBox{Pages: []int{page}, ID: id, Name: name, Locked: locked, Multi: multi} + lb := &ListBox{Pages: []int{page}, ID: id, Name: name, AltName: altName, Locked: locked, Multi: multi} if !multi { if sl := d.StringLiteralEntry("DV"); sl != nil { @@ -424,14 +512,14 @@ func extractListBox(xRefTable *model.XRefTable, page int, d types.Dict, id, name if err != nil { return nil, err } - lb.Defaults = []string{s} + lb.Defaults = []string{strings.TrimSpace(s)} } if sl := d.StringLiteralEntry("V"); sl != nil { s, err := types.StringLiteralToString(*sl) if err != nil { return nil, err } - lb.Values = []string{s} + lb.Values = []string{strings.TrimSpace(s)} } } else { ss, err := parseStringLiteralArray(xRefTable, d, "DV") @@ -446,7 +534,7 @@ func extractListBox(xRefTable *model.XRefTable, page int, d 
types.Dict, id, name lb.Values = ss } - opts, err := parseOptions(xRefTable, d) + opts, err := parseOptions(xRefTable, d, REQUIRED) if err != nil { return nil, err } @@ -509,11 +597,11 @@ func exportBtn( i int, form *Form, d types.Dict, - id, name string, + id, name, altName string, locked bool, ok *bool) error { - if len(d.ArrayEntry("Kids")) > 0 { + if len(d.ArrayEntry("Kids")) > 1 { for _, rb := range form.RadioButtonGroups { if rb.ID == id && rb.Name == name { @@ -522,7 +610,7 @@ func exportBtn( } } - rbg, err := extractRadioButtonGroup(xRefTable, i, d, id, name, locked) + rbg, err := extractRadioButtonGroup(xRefTable, i, d, id, name, altName, locked) if err != nil { return err } @@ -539,7 +627,7 @@ func exportBtn( } } - cb, err := extractCheckBox(i, d, id, name, locked) + cb, err := extractCheckBox(i, d, id, name, altName, locked) if err != nil { return err } @@ -554,7 +642,7 @@ func exportCh( i int, form *Form, d types.Dict, - id, name string, + id, name, altName string, locked bool, ok *bool) error { @@ -572,7 +660,7 @@ func exportCh( } } - cb, err := extractComboBox(xRefTable, i, d, id, name, locked) + cb, err := extractComboBox(xRefTable, i, d, id, name, altName, locked) if err != nil { return err } @@ -589,7 +677,7 @@ func exportCh( } multi := primitives.FieldFlags(*ff)&primitives.FieldMultiselect > 0 - lb, err := extractListBox(xRefTable, i, d, id, name, locked, multi) + lb, err := extractListBox(xRefTable, i, d, id, name, altName, locked, multi) if err != nil { return err } @@ -604,7 +692,7 @@ func exportTx( i int, form *Form, d types.Dict, - id, name string, + id, name, altName string, ff *int, locked bool, ok *bool) error { @@ -623,7 +711,7 @@ func exportTx( } } - df, err := extractDateField(i, d, id, name, df, locked) + df, err := extractDateField(xRefTable, i, d, id, name, altName, df, locked) if err != nil { return err } @@ -640,7 +728,7 @@ func exportTx( } } - tf, err := extractTextField(i, d, id, name, ff, locked) + tf, err := 
extractTextField(xRefTable, i, d, id, name, altName, ff, locked) if err != nil { return err } @@ -650,6 +738,21 @@ func exportTx( return nil } +func exportPageField(ft string, xRefTable *model.XRefTable, i int, form *Form, d types.Dict, id, name, altName string, locked bool, ok *bool, ff *int) error { + var err error + + switch ft { + case "Btn": + err = exportBtn(xRefTable, i, form, d, id, name, altName, locked, ok) + case "Ch": + err = exportCh(xRefTable, i, form, d, id, name, altName, locked, ok) + case "Tx": + err = exportTx(xRefTable, i, form, d, id, name, altName, ff, locked, ok) + } + + return err +} + func exportPageFields(xRefTable *model.XRefTable, i int, form *Form, m map[string]fieldInfo, ok *bool) error { for id, fi := range m { @@ -677,23 +780,20 @@ func exportPageFields(xRefTable *model.XRefTable, i int, form *Form, m map[strin } } - switch *ft { - case "Btn": - if err := exportBtn(xRefTable, i, form, d, id, name, locked, ok); err != nil { - return err - } - - case "Ch": - if err := exportCh(xRefTable, i, form, d, id, name, locked, ok); err != nil { + altName := "" + if o, found := d.Find("TU"); found { + s, err := types.StringOrHexLiteral(o) + if err != nil { return err } - - case "Tx": - if err := exportTx(xRefTable, i, form, d, id, name, ff, locked, ok); err != nil { - return err + if s != nil { + altName = *s } } + if err := exportPageField(*ft, xRefTable, i, form, d, id, name, altName, locked, ok, ff); err != nil { + return err + } } return nil diff --git a/pkg/pdfcpu/form/fill.go b/pkg/pdfcpu/form/fill.go index b5f46505..f052a5a0 100644 --- a/pkg/pdfcpu/form/fill.go +++ b/pkg/pdfcpu/form/fill.go @@ -22,6 +22,7 @@ import ( "strconv" "strings" + "github.com/angel-one/pdfcpu/pkg/font" pdffont "github.com/angel-one/pdfcpu/pkg/pdfcpu/font" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/primitives" @@ -69,6 +70,8 @@ func addImages(ctx *model.Context, pages map[string]*Page) ([]*model.Page, error RadioBtnAPs: 
map[float64]*primitives.AP{}, OldFieldIDs: types.StringSet{}, Debug: false, + Offline: ctx.Offline, + Timeout: ctx.Timeout, } if err := cacheResIDs(ctx, pdf); err != nil { @@ -302,7 +305,7 @@ func imageBox(s, src, url string) (*primitives.ImageBox, string, error) { s = s[4:] if s[0] != '(' || s[len(s)-1] != ')' { - return nil, "", errors.Errorf("pdfcpu: parsing cvs fieldNames: corrupted @img: <%s>", s) + return nil, "", errors.Errorf("pdfcpu: parsing cvs fieldNames: invalid @img: <%s>", s) } s = s[1 : len(s)-1] @@ -319,7 +322,7 @@ func imageBox(s, src, url string) (*primitives.ImageBox, string, error) { for _, s := range ss { ss1 := strings.Split(s, ":") if len(ss1) != 2 { - return nil, "", errors.Errorf("pdfcpu: parsing cvs fieldNames: corrupted @img: <%s>", s) + return nil, "", errors.Errorf("pdfcpu: parsing cvs fieldNames: invalid @img: <%s>", s) } paramPrefix := strings.TrimSpace(ss1[0]) @@ -459,17 +462,12 @@ func fillRadioButtons(ctx *model.Context, d types.Dict, vNew string, v types.Nam return err } - d1 := d.DictEntry("AP") - if d1 == nil { - return errors.New("pdfcpu: corrupt form field: missing entry AP") - } - - d2 := d1.DictEntry("N") - if d2 == nil { - return errors.New("pdfcpu: corrupt AP field: missing entry N") + d1, err := locateAPN(ctx.XRefTable, d) + if err != nil { + return err } - for k := range d2 { + for k := range d1 { k, err := types.DecodeName(k) if err != nil { return err @@ -491,6 +489,7 @@ func fillRadioButtonGroup( ctx *model.Context, d types.Dict, id, name string, + opts []string, locked bool, format DataFormat, fillDetails func(id, name string, fieldType FieldType, format DataFormat) ([]string, bool, bool), @@ -514,6 +513,16 @@ func fillRadioButtonGroup( } vNew := vv[0] + + if len(opts) > 0 { + for i, o := range opts { + if o == vNew { + vNew = strconv.Itoa(i) + break + } + } + } + vOld := "" if s := d.NameEntry("V"); s != nil { n, err := types.DecodeName(*s) @@ -541,6 +550,34 @@ func fillRadioButtonGroup( return nil } +func 
fillCheckBoxKid(ctx *model.Context, kids types.Array, off bool) (*types.Name, error) { + d, err := ctx.DereferenceDict(kids[0]) + if err != nil { + return nil, err + } + + d1, err := locateAPN(ctx.XRefTable, d) + if err != nil { + return nil, err + } + + offName, yesName, err := primitives.CalcCheckBoxASNames(ctx, d1) + if err != nil { + return nil, err + } + + asName := yesName + if off { + asName = offName + } + + if _, found := d.Find("AS"); found { + d["AS"] = asName + } + + return &asName, nil +} + func fillCheckBox( ctx *model.Context, d types.Dict, @@ -568,10 +605,11 @@ func fillCheckBox( } s := strings.ToLower(vv[0]) - vNew := strings.HasPrefix(s, "t") + vNew := strings.HasPrefix(s, "t") // true vOld := false if o, found := d.Find("V"); found { - vOld = o.(types.Name) == "Yes" + n := o.(types.Name) + vOld = len(n) > 0 && n != "Off" } if vNew == vOld { return nil @@ -581,18 +619,33 @@ func fillCheckBox( if vNew { v = types.Name("Yes") } + + kids := d.ArrayEntry("Kids") + if len(kids) == 1 { + asName, err := fillCheckBoxKid(ctx, kids, v == types.Name("Off")) + if err != nil { + return err + } + d["V"] = *asName + *ok = true + return nil + } + d["V"] = v if _, found := d.Find("AS"); found { - offName, yesName := primitives.CalcCheckBoxASNames(d) + offName, yesName, err := primitives.CalcCheckBoxASNames(ctx, d) + if err != nil { + return err + } //fmt.Printf("off:<%s> yes:<%s>\n", offName, yesName) asName := yesName if v == "Off" { asName = offName } d["AS"] = asName + d["V"] = asName } *ok = true - return nil } @@ -610,8 +663,13 @@ func fillBtn( return nil } - if len(d.ArrayEntry("Kids")) > 0 { - if err := fillRadioButtonGroup(ctx, d, id, name, locked, format, fillDetails, ok); err != nil { + opts, err := parseOptions(ctx.XRefTable, d, OPTIONAL) + if err != nil { + return err + } + + if len(d.ArrayEntry("Kids")) > 1 { + if err := fillRadioButtonGroup(ctx, d, id, name, opts, locked, format, fillDetails, ok); err != nil { return err } } else { @@ -632,7 +690,6 
@@ func fillComboBox( format DataFormat, fonts map[string]types.IndirectRef, fillDetails func(id, name string, fieldType FieldType, format DataFormat) ([]string, bool, bool), - ff *int, ok *bool) error { vv, lock, found := fillDetails(id, name, FTComboBox, format) @@ -640,6 +697,8 @@ func fillComboBox( return nil } + da := d.StringEntry("DA") + vNew := vv[0] if locked { if !lock { @@ -649,7 +708,7 @@ func fillComboBox( } } else if lock { lockFormField(d) - if err := primitives.EnsureComboBoxAP(ctx, d, vNew, fonts); err != nil { + if err := primitives.EnsureComboBoxAP(ctx, d, vNew, da, fonts); err != nil { return err } *ok = true @@ -667,7 +726,7 @@ func fillComboBox( return nil } - s, err := types.EscapeUTF16String(vNew) + s, err := types.EscapedUTF16String(vNew) if err != nil { return err } @@ -702,7 +761,7 @@ func updateListBoxValues(multi bool, d types.Dict, opts, vNew []string) (types.A break } } - s, err := types.EscapeUTF16String(v) + s, err := types.EscapedUTF16String(v) if err != nil { return nil, err } @@ -719,7 +778,7 @@ func updateListBoxValues(multi bool, d types.Dict, opts, vNew []string) (types.A } v := vNew[0] - s, err := types.EscapeUTF16String(v) + s, err := types.EscapedUTF16String(v) if err != nil { return nil, err } @@ -796,7 +855,9 @@ func fillListBox( return err } - if err := primitives.EnsureListBoxAP(ctx, d, opts, ind, fonts); err != nil { + da := d.StringEntry("DA") + + if err := primitives.EnsureListBoxAP(ctx, d, opts, ind, da, fonts); err != nil { return err } @@ -820,7 +881,7 @@ func fillCh( return errors.New("pdfcpu: corrupt form field: missing entry Ff") } - opts, err := parseOptions(ctx.XRefTable, d) + opts, err := parseOptions(ctx.XRefTable, d, REQUIRED) if err != nil { return err } @@ -830,7 +891,7 @@ func fillCh( } if primitives.FieldFlags(*ff)&primitives.FieldCombo > 0 { - return fillComboBox(ctx, d, id, name, opts, locked, format, fonts, fillDetails, ff, ok) + return fillComboBox(ctx, d, id, name, opts, locked, format, fonts, 
fillDetails, ok) } return fillListBox(ctx, d, id, name, opts, locked, format, fonts, fillDetails, ff, ok) @@ -844,7 +905,6 @@ func fillDateField( format DataFormat, fonts map[string]types.IndirectRef, fillDetails func(id, name string, fieldType FieldType, format DataFormat) ([]string, bool, bool), - ff *int, ok *bool) error { vv, lock, found := fillDetails(id, name, FTDate, format) @@ -865,23 +925,44 @@ func fillDateField( } vNew := vv[0] + if vNew == vOld { return nil } - s, err := types.EscapeUTF16String(vNew) + s, err := types.EscapedUTF16String(vNew) if err != nil { return err } - d["V"] = types.StringLiteral(*s) - if err := primitives.EnsureDateFieldAP(ctx, d, vNew, fonts); err != nil { + da := d.StringEntry("DA") + + kids := d.ArrayEntry("Kids") + if len(kids) > 0 { + + for _, o := range kids { + + d, err := ctx.DereferenceDict(o) + if err != nil { + return err + } + + if err := primitives.EnsureDateFieldAP(ctx, d, vNew, da, fonts); err != nil { + return err + } + + *ok = true + } + + return nil + } + + if err := primitives.EnsureDateFieldAP(ctx, d, vNew, da, fonts); err != nil { return err } *ok = true - return nil } @@ -919,17 +1000,27 @@ func fillTextField( return nil } - s, err := types.EscapeUTF16String(vNew) + s, err := types.EscapedUTF16String(vNew) if err != nil { return err } - d["V"] = types.StringLiteral(*s) multiLine := ff != nil && uint(primitives.FieldFlags(*ff))&uint(primitives.FieldMultiline) > 0 + comb := ff != nil && primitives.FieldFlags(*ff)&primitives.FieldComb > 0 + + maxLen := 0 + i := d.IntEntry("MaxLen") + if i != nil { + maxLen = *i + } + + da := d.StringEntry("DA") + kids := d.ArrayEntry("Kids") if len(kids) > 0 { + for _, o := range kids { d, err := ctx.DereferenceDict(o) @@ -937,7 +1028,7 @@ func fillTextField( return err } - if err := primitives.EnsureTextFieldAP(ctx, d, vNew, multiLine, fonts); err != nil { + if err := primitives.EnsureTextFieldAP(ctx, d, vNew, multiLine, comb, maxLen, da, fonts); err != nil { return err } @@ 
-947,7 +1038,7 @@ func fillTextField( return nil } - if err := primitives.EnsureTextFieldAP(ctx, d, vNew, multiLine, fonts); err != nil { + if err := primitives.EnsureTextFieldAP(ctx, d, vNew, multiLine, comb, maxLen, da, fonts); err != nil { return err } @@ -970,18 +1061,14 @@ func fillTx( if err != nil { return err } - vOld := "" - if o, found := d.Find("V"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) - if err != nil { - return err - } - vOld = s + + vOld, err := getV(ctx.XRefTable, d) + if err != nil { + return err } if df != nil { - return fillDateField(ctx, d, id, name, vOld, locked, format, fonts, fillDetails, ff, ok) + return fillDateField(ctx, d, id, name, vOld, locked, format, fonts, fillDetails, ok) } return fillTextField(ctx, d, id, name, vOld, locked, format, fonts, fillDetails, ff, ok) @@ -1058,6 +1145,34 @@ func fillWidgetAnnots( return nil } +func setupFillFonts(xRefTable *model.XRefTable) error { + d, err := primitives.FormFontResDict(xRefTable) + if err != nil { + return err + } + + m := xRefTable.FillFonts + + if d == nil { + // TODO setup/reuse Helvetica and add to m + return nil + } + + for k, v := range d { + indRef := v.(types.IndirectRef) + fontName, _, _, err := primitives.FormFontDetails(xRefTable, indRef) + if err != nil { + return err + } + + if font.IsCoreFont(fontName) || font.IsUserFont(fontName) { + m[k] = indRef + } + } + + return nil +} + // FillForm populates form fields as provided by fillDetails and also supports virtual image fields. 
func FillForm( ctx *model.Context, @@ -1075,6 +1190,10 @@ func FillForm( fonts := map[string]types.IndirectRef{} indRefs := map[types.IndirectRef]bool{} + if err := setupFillFonts(xRefTable); err != nil { + return false, nil, err + } + var ok bool for i := 1; i <= xRefTable.PageCount; i++ { diff --git a/pkg/pdfcpu/form/form.go b/pkg/pdfcpu/form/form.go index 06215a38..c893d848 100644 --- a/pkg/pdfcpu/form/form.go +++ b/pkg/pdfcpu/form/form.go @@ -65,14 +65,15 @@ func (ft FieldType) String() string { // Field represents a form field for s particular page number. type Field struct { - Pages []int - Locked bool - Typ FieldType - ID string - Name string - Dv string - V string - Opts string + Pages []int + Locked bool + Typ FieldType + ID string + Name string + AltName string + Dv string + V string + Opts string } func (f Field) pageString() string { @@ -88,8 +89,8 @@ func (f Field) pageString() string { } type FieldMeta struct { - def, val, opt bool - pageMax, defMax, valMax, idMax, nameMax int + altName, def, val, opt bool + pageMax, defMax, valMax, idMax, nameMax, altNameMax int } func fields(xRefTable *model.XRefTable) (types.Array, error) { @@ -250,8 +251,14 @@ func extractStringSlice(a types.Array) ([]string, error) { return ss, nil } -func parseOptions(xRefTable *model.XRefTable, d types.Dict) ([]string, error) { - o, _ := d.Find("Opt") +func parseOptions(xRefTable *model.XRefTable, d types.Dict, required bool) ([]string, error) { + o, ok := d.Find("Opt") + if !ok { + if required { + return nil, errors.New("corrupt form field: missing entry \"Opt\"") + } + return nil, nil + } a, err := xRefTable.DereferenceArray(o) if err != nil { return nil, err @@ -285,27 +292,32 @@ func parseStringLiteralArray(xRefTable *model.XRefTable, d types.Dict, key strin return nil, nil } -func collectRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) (string, error) { +func collectRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) ([]string, error) { - var 
vv []string + vv, err := parseOptions(xRefTable, d, OPTIONAL) + if err != nil { + return nil, err + } + if len(vv) > 0 { + return vv, nil + } for _, o := range d.ArrayEntry("Kids") { + d, err := xRefTable.DereferenceDict(o) if err != nil { - return "", err - } - d1 := d.DictEntry("AP") - if d1 == nil { - return "", errors.New("corrupt form field: missing entry AP") + return nil, err } - d2 := d1.DictEntry("N") - if d2 == nil { - return "", errors.New("corrupt AP field: missing entry N") + + d1, err := locateAPN(xRefTable, d) + if err != nil { + return nil, err } - for k := range d2 { + + for k := range d1 { k, err := types.DecodeName(k) if err != nil { - return "", err + return nil, err } if k != "Off" { found := false @@ -323,19 +335,40 @@ func collectRadioButtonGroupOptions(xRefTable *model.XRefTable, d types.Dict) (s } } - return strings.Join(vv, ","), nil + return vv, nil } func collectRadioButtonGroup(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta) error { f.Typ = FTRadioButtonGroup + opts, err := collectRadioButtonGroupOptions(xRefTable, d) + if err != nil { + return err + } + + f.Opts = strings.Join(opts, ",") + if len(f.Opts) > 0 { + fm.opt = true + } + if s := d.NameEntry("V"); s != nil { v, err := types.DecodeName(*s) if err != nil { return err } if v != "Off" { + if len(opts) > 0 { + j, err := strconv.Atoi(v) + if err == nil { + for i, o := range opts { + if i == j { + v = o + break + } + } + } + } if w := runewidth.StringWidth(v); w > fm.valMax { fm.valMax = w } @@ -344,16 +377,6 @@ func collectRadioButtonGroup(xRefTable *model.XRefTable, d types.Dict, f *Field, } } - s, err := collectRadioButtonGroupOptions(xRefTable, d) - if err != nil { - return err - } - - f.Opts = s - if len(f.Opts) > 0 { - fm.opt = true - } - return nil } @@ -381,13 +404,14 @@ func collectBtn(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMet f.Dv = dv } - if len(d.ArrayEntry("Kids")) > 0 { + if len(d.ArrayEntry("Kids")) > 1 { return 
collectRadioButtonGroup(xRefTable, d, f, fm) } f.Typ = FTCheckBox if o, found := d.Find("V"); found { - if o.(types.Name) == "Yes" { + n := o.(types.Name) + if len(n) > 0 && n != "Off" { v := "Yes" if len(v) > fm.valMax { fm.valMax = len(v) @@ -400,7 +424,7 @@ func collectBtn(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMet return nil } -func collectComboBox(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta) error { +func collectComboBox(d types.Dict, f *Field, fm *FieldMeta) error { f.Typ = FTComboBox if sl := d.StringLiteralEntry("V"); sl != nil { v, err := types.StringLiteralToString(*sl) @@ -484,7 +508,7 @@ func collectListBox(xRefTable *model.XRefTable, multi bool, d types.Dict, f *Fie func collectCh(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta) error { ff := d.IntEntry("Ff") - vv, err := parseOptions(xRefTable, d) + vv, err := parseOptions(xRefTable, d, REQUIRED) if err != nil { return err } @@ -495,7 +519,7 @@ func collectCh(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta } if ff != nil && primitives.FieldFlags(*ff)&primitives.FieldCombo > 0 { - return collectComboBox(xRefTable, d, f, fm) + return collectComboBox(d, f, fm) } multi := ff != nil && (primitives.FieldFlags(*ff)&primitives.FieldMultiselect > 0) @@ -503,42 +527,91 @@ func collectCh(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta return collectListBox(xRefTable, multi, d, f, fm) } -func collectTx(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta) error { +func inheritedV(xRefTable *model.XRefTable, d types.Dict) (string, error) { if o, found := d.Find("V"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) + s1, err := types.StringOrHexLiteral(o) if err != nil { - return err + return "", err } - v := s - if i := strings.Index(s, "\n"); i >= 0 { - v = s[:i] - v += "\\n" + if s1 != nil { + return *s1, nil } + } + indRef := d.IndirectRefEntry("Parent") + if 
indRef == nil { + return "", nil + } + d, err := xRefTable.DereferenceDict(*indRef) + if err != nil { + return "", err + } + return inheritedV(xRefTable, d) +} + +func getV(xRefTable *model.XRefTable, d types.Dict) (string, error) { + v, err := inheritedV(xRefTable, d) + if err != nil { + return "", err + } + return v, nil +} + +func inheritedDV(xRefTable *model.XRefTable, d types.Dict) (string, error) { + if o, found := d.Find("DV"); found { + s1, err := types.StringOrHexLiteral(o) + if err != nil { + return "", err + } + if s1 != nil { + return *s1, nil + } + } + indRef := d.IndirectRefEntry("Parent") + if indRef == nil { + return "", nil + } + d, err := xRefTable.DereferenceDict(*indRef) + if err != nil { + return "", err + } + return inheritedDV(xRefTable, d) +} + +func getDV(xRefTable *model.XRefTable, d types.Dict) (string, error) { + dv, err := inheritedDV(xRefTable, d) + if err != nil { + return "", err + } + return dv, nil +} + +func collectTx(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta) error { + v, err := getV(xRefTable, d) + if err != nil { + return err + } + if v != "" { + v = strings.ReplaceAll(v, "\x0A", "\\n") if w := runewidth.StringWidth(v); w > fm.valMax { fm.valMax = w } fm.val = true f.V = v } - if o, found := d.Find("DV"); found { - sl, _ := o.(types.StringLiteral) - s, err := types.StringLiteralToString(sl) - if err != nil { - return err - } - dv := s - if i := strings.Index(s, "\n"); i >= 0 { - dv = dv[:i] - dv += "\\n" - } - if w := runewidth.StringWidth(dv); w > fm.defMax { - fm.defMax = w + dv, err := getDV(xRefTable, d) + if err != nil { + return err + } + if dv != "" { + dv = strings.ReplaceAll(dv, "\x0A", "\\n") + if w := runewidth.StringWidth(dv); w > fm.defMax { + fm.defMax = w } fm.def = true f.Dv = dv } + df, err := extractDateFormat(xRefTable, d) if err != nil { return err @@ -550,28 +623,47 @@ func collectTx(xRefTable *model.XRefTable, d types.Dict, f *Field, fm *FieldMeta return nil } -func
collectPageField( - xRefTable *model.XRefTable, - d types.Dict, - i int, - fi *fieldInfo, - fm *FieldMeta, - fs *[]Field) error { +func collectField(xRefTable *model.XRefTable, ft string, d types.Dict, f *Field, fm *FieldMeta) error { + var err error - exists := false + switch ft { + case "Btn": + err = collectBtn(xRefTable, d, f, fm) + case "Ch": + err = collectCh(xRefTable, d, f, fm) + case "Tx": + err = collectTx(xRefTable, d, f, fm) + } + + return err +} + +func locateField(fs *[]Field, fi *fieldInfo, fm *FieldMeta, pageNr int) bool { for j, field := range *fs { if field.ID == fi.id && field.Name == fi.name { - field.Pages = append(field.Pages, i) + field.Pages = append(field.Pages, pageNr) ps := field.pageString() if len(ps) > fm.pageMax { fm.pageMax = len(ps) } (*fs)[j] = field - exists = true + return true } } + return false +} - f := Field{Pages: []int{i}} +func collectPageField( + xRefTable *model.XRefTable, + d types.Dict, + pageNr int, + fi *fieldInfo, + fm *FieldMeta, + fs *[]Field) error { + + foundField := locateField(fs, fi, fm, pageNr) + + f := Field{Pages: []int{pageNr}} f.ID = fi.id if w := runewidth.StringWidth(fi.id); w > fm.idMax { @@ -594,28 +686,36 @@ func collectPageField( if ft == nil { ft = d.NameEntry("FT") if ft == nil { - return errors.Errorf("pdfcpu: corrupt form field %s: missing entry FT\n%s", f.ID, d) + return errors.Errorf("pdfcpu: corrupt form field %s: missing entry \"FT\"\n%s", f.ID, d) } } - var err error - - switch *ft { - case "Btn": - err = collectBtn(xRefTable, d, &f, fm) - - case "Ch": - err = collectCh(xRefTable, d, &f, fm) + if o, found := d.Find("TU"); found { + s1, err := types.StringOrHexLiteral(o) + if err != nil { + return err + } + s := "" + if s1 != nil { + s = *s1 + } + if len(s) > 80 { + s = s[:40] + } + altName := s - case "Tx": - err = collectTx(xRefTable, d, &f, fm) + if w := runewidth.StringWidth(altName); w > fm.altNameMax { + fm.altNameMax = w + } + fm.altName = true + f.AltName = altName } - if err != nil 
{ + if err := collectField(xRefTable, *ft, d, &f, fm); err != nil { return err } - if !exists { + if !foundField { *fs = append(*fs, f) } @@ -716,6 +816,15 @@ func calcListHeader(fm *FieldMeta) (string, []int) { horSep = append(horSep, 6) } + if fm.altName { + s += draw.VBar + " AltName " + if fm.altNameMax > 7 { + s += strings.Repeat(" ", fm.altNameMax-7) + horSep = append(horSep, 9+fm.altNameMax-7) + } else { + horSep = append(horSep, 9) + } + } if fm.def { s += draw.VBar + " Default " if fm.defMax > 7 { @@ -763,7 +872,7 @@ func multiPageFieldsMap(fs []Field) map[string][]Field { return m } -func renderMultiPageFields(ctx *model.Context, m map[string][]Field, fm *FieldMeta) ([]string, error) { +func renderMultiPageFields(m map[string][]Field, fm *FieldMeta) ([]string, error) { var ss []string @@ -801,6 +910,10 @@ func renderMultiPageFields(ctx *model.Context, m map[string][]Field, fm *FieldMe nameFill := strings.Repeat(" ", fm.nameMax-runewidth.StringWidth(f.Name)) s := fmt.Sprintf("%s%s %s %-9s %s %s%s %s %s%s ", p, pageFill, l, t, draw.VBar, f.ID, idFill, draw.VBar, f.Name, nameFill) p = strings.Repeat(" ", len(p)) + if fm.altName { + altNameFill := strings.Repeat(" ", fm.altNameMax-runewidth.StringWidth(f.AltName)) + s += fmt.Sprintf("%s %s%s ", draw.VBar, f.AltName, altNameFill) + } if fm.def { dvFill := strings.Repeat(" ", fm.defMax-runewidth.StringWidth(f.Dv)) s += fmt.Sprintf("%s %s%s ", draw.VBar, f.Dv, dvFill) @@ -829,7 +942,7 @@ func renderFields(ctx *model.Context, fs []Field, fm *FieldMeta) ([]string, erro m := multiPageFieldsMap(fs) if len(m) > 0 { - ss1, err := renderMultiPageFields(ctx, m, fm) + ss1, err := renderMultiPageFields(m, fm) if err != nil { return nil, err } @@ -873,6 +986,10 @@ func renderFields(ctx *model.Context, fs []Field, fm *FieldMeta) ([]string, erro idFill := strings.Repeat(" ", fm.idMax-runewidth.StringWidth(f.ID)) nameFill := strings.Repeat(" ", fm.nameMax-runewidth.StringWidth(f.Name)) s := fmt.Sprintf("%s%s %s %-9s %s %s%s 
%s %s%s ", p, pageFill, l, t, draw.VBar, f.ID, idFill, draw.VBar, f.Name, nameFill) + if fm.altName { + altNameFill := strings.Repeat(" ", fm.altNameMax-runewidth.StringWidth(f.AltName)) + s += fmt.Sprintf("%s %s%s ", draw.VBar, f.AltName, altNameFill) + } if fm.def { dvFill := strings.Repeat(" ", fm.defMax-runewidth.StringWidth(f.Dv)) s += fmt.Sprintf("%s %s%s ", draw.VBar, f.Dv, dvFill) @@ -901,7 +1018,7 @@ func FormFields(ctx *model.Context) ([]Field, *FieldMeta, error) { return nil, nil, err } - fm := &FieldMeta{pageMax: 2, idMax: 3, nameMax: 4, defMax: 7, valMax: 5} + fm := &FieldMeta{pageMax: 2, idMax: 3, nameMax: 4, altNameMax: 7, defMax: 7, valMax: 5} fs, err := collectFields(xRefTable, fields, fm) if err != nil { @@ -1051,7 +1168,7 @@ func removeIndRefByIndex(indRefs []types.IndirectRef, i int) []types.IndirectRef return indRefs[:lastIndex] } -func removeFromFields(xRefTable *model.XRefTable, indRefs *[]types.IndirectRef, fields *types.Array) error { +func removeFormFields(xRefTable *model.XRefTable, indRefs *[]types.IndirectRef, fields *types.Array) error { f := types.Array{} for _, v := range *fields { indRef1 := v.(types.IndirectRef) @@ -1085,7 +1202,7 @@ func removeFromFields(xRefTable *model.XRefTable, indRefs *[]types.IndirectRef, if err != nil { return err } - if err := removeFromFields(xRefTable, indRefs, &kids); err != nil { + if err := removeFormFields(xRefTable, indRefs, &kids); err != nil { return err } if len(kids) > 0 { @@ -1164,7 +1281,7 @@ func RemoveFormFields(ctx *model.Context, fieldIDsOrNames []string) (bool, error copy(indRefsClone, indRefs) // Remove fields from AcroDict. 
- if err := removeFromFields(xRefTable, &indRefsClone, &fields); err != nil { + if err := removeFormFields(xRefTable, &indRefsClone, &fields); err != nil { return false, err } @@ -1244,19 +1361,18 @@ func resetBtn(xRefTable *model.XRefTable, d types.Dict) error { // RadiobuttonGroup for _, o := range d.ArrayEntry("Kids") { + d, err := xRefTable.DereferenceDict(o) if err != nil { return err } - d1 := d.DictEntry("AP") - if d1 == nil { - return errors.New("corrupt form field: missing entry AP") - } - d2 := d1.DictEntry("N") - if d2 == nil { - return errors.New("corrupt AP field: missing entry N") + + d1, err := locateAPN(xRefTable, d) + if err != nil { + return err } - for k := range d2 { + + for k := range d1 { k, err := types.DecodeName(k) if err != nil { return err @@ -1333,10 +1449,10 @@ func resetMultiListBox(xRefTable *model.XRefTable, d types.Dict, opts []string) func resetCh(ctx *model.Context, d types.Dict, fonts map[string]types.IndirectRef) error { ff := d.IntEntry("Ff") if ff == nil { - return errors.New("pdfcpu: corrupt form field: missing entry Ff") + return errors.New("pdfcpu: corrupt form field: missing entry \"Ff\"") } - opts, err := parseOptions(ctx.XRefTable, d) + opts, err := parseOptions(ctx.XRefTable, d, REQUIRED) if err != nil { return err } @@ -1356,8 +1472,10 @@ func resetCh(ctx *model.Context, d types.Dict, fonts map[string]types.IndirectRe return err } + da := d.StringEntry("DA") + if primitives.FieldFlags(*ff)&primitives.FieldCombo == 0 { - if err := primitives.EnsureListBoxAP(ctx, d, opts, ind, fonts); err != nil { + if err := primitives.EnsureListBoxAP(ctx, d, opts, ind, da, fonts); err != nil { return err } } @@ -1371,8 +1489,12 @@ func resetTx(ctx *model.Context, d types.Dict, fonts map[string]types.IndirectRe err error ) if o, found := d.Find("DV"); found { - d["V"] = o - sl, _ := o.(types.StringLiteral) + o1, err := ctx.Dereference(o) + if err != nil { + return err + } + d["V"] = o1 + sl, _ := o1.(types.StringLiteral) s, err = 
types.StringLiteralToString(sl) if err != nil { return err @@ -1384,18 +1506,46 @@ func resetTx(ctx *model.Context, d types.Dict, fonts map[string]types.IndirectRe d.Delete("V") } - isDate := true + isDate := false if s != "" { _, err := primitives.DateFormatForDate(s) isDate = err == nil } + ff := d.IntEntry("Ff") + multiLine := ff != nil && uint(primitives.FieldFlags(*ff))&uint(primitives.FieldMultiline) > 0 + comb := ff != nil && uint(primitives.FieldFlags(*ff))&uint(primitives.FieldComb) > 0 + + da := d.StringEntry("DA") + + kids := d.ArrayEntry("Kids") + if len(kids) > 0 { + + for _, o := range kids { + + d, err := ctx.DereferenceDict(o) + if err != nil { + return err + } + + if isDate { + err = primitives.EnsureDateFieldAP(ctx, d, s, da, fonts) + } else { + err = primitives.EnsureTextFieldAP(ctx, d, s, multiLine, comb, 0, da, fonts) + } + + if err != nil { + return err + } + } + + return nil + } + if isDate { - err = primitives.EnsureDateFieldAP(ctx, d, s, fonts) + err = primitives.EnsureDateFieldAP(ctx, d, s, da, fonts) } else { - ff := d.IntEntry("Ff") - multiLine := ff != nil && uint(primitives.FieldFlags(*ff))&uint(primitives.FieldMultiline) > 0 - err = primitives.EnsureTextFieldAP(ctx, d, s, multiLine, fonts) + err = primitives.EnsureTextFieldAP(ctx, d, s, multiLine, comb, 0, da, fonts) } return err @@ -1450,7 +1600,7 @@ func resetPageFields( if ft == nil { ft = d.NameEntry("FT") if ft == nil { - return errors.Errorf("pdfcpu: corrupt form field %s: missing entry FT\n%s", fi.id, d) + return errors.Errorf("pdfcpu: corrupt form field %s: missing entry \"FT\"\n%s", fi.id, d) } } @@ -1550,10 +1700,12 @@ func ensureAP(ctx *model.Context, d types.Dict, fi *fieldInfo, fonts map[string] if ft == nil { ft = d.NameEntry("FT") if ft == nil { - return errors.Errorf("pdfcpu: corrupt form field %s: missing entry FT\n%s", fi.id, d) + return errors.Errorf("pdfcpu: corrupt form field %s: missing entry \"FT\"\n%s", fi.id, d) } } + da := d.StringEntry("DA") + if *ft == "Ch" 
{ ff := d.IntEntry("Ff") @@ -1568,7 +1720,7 @@ func ensureAP(ctx *model.Context, d types.Dict, fi *fieldInfo, fonts map[string] v = s } - if err := primitives.EnsureComboBoxAP(ctx, d, v, fonts); err != nil { + if err := primitives.EnsureComboBoxAP(ctx, d, v, da, fonts); err != nil { return err } @@ -1712,7 +1864,7 @@ func deleteAP(d types.Dict, fi *fieldInfo) error { if ft == nil { ft = d.NameEntry("FT") if ft == nil { - return errors.Errorf("pdfcpu: corrupt form field %s: missing entry FT\n%s", fi.id, d) + return errors.Errorf("pdfcpu: corrupt form field %s: missing entry \"FT\"\n%s", fi.id, d) } } if *ft == "Ch" { diff --git a/pkg/pdfcpu/iccProfile.go b/pkg/pdfcpu/iccProfile.go index 6ec8b84b..8fa038e0 100644 --- a/pkg/pdfcpu/iccProfile.go +++ b/pkg/pdfcpu/iccProfile.go @@ -28,7 +28,7 @@ import ( // // We fall back to the alternate color space and if there is none to whatever color space makes sense. -//ICC profiles use big endian always. +// ICC profiles use big endian always. type iccProfile struct { b []byte rX, rY, rZ float32 // redMatrixColumn; the first column in the matrix, which is used in matrix/TRC transforms. 
@@ -279,7 +279,7 @@ func (p iccProfile) String() string { s += fmt.Sprintf("Tag %d: signature:%s offset:%d(#%02x) size:%d(#%02x)\n%s\n", i, sig, off, off, size, size, hex.Dump(p.b[off:off+size])) //s += fmt.Sprintf("Tag %d: signature:%s offset:%d(#%02x) size:%d(#%02x)\n", i, sig, off, off, size, size) } - s += fmt.Sprintf("Matrix:\n") + s += "Matrix:\n" s += fmt.Sprintf("%4.4f %4.4f %4.4f\n", p.rX, p.gX, p.bX) s += fmt.Sprintf("%4.4f %4.4f %4.4f\n", p.rY, p.gY, p.bY) s += fmt.Sprintf("%4.4f %4.4f %4.4f\n", p.rZ, p.gZ, p.bZ) diff --git a/pkg/pdfcpu/image.go b/pkg/pdfcpu/image.go index fc346747..2f73dfba 100644 --- a/pkg/pdfcpu/image.go +++ b/pkg/pdfcpu/image.go @@ -18,6 +18,7 @@ package pdfcpu import ( "fmt" + "io" "path/filepath" "sort" "strconv" @@ -27,6 +28,7 @@ import ( "github.com/angel-one/pdfcpu/pkg/pdfcpu/draw" "github.com/angel-one/pdfcpu/pkg/pdfcpu/model" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" + "github.com/pkg/errors" ) // Images returns all embedded images of ctx. @@ -42,10 +44,17 @@ func Images(ctx *model.Context, selectedPages types.IntSet) ([]map[int]model.Ima mm := []map[int]model.Image{} var ( - maxLenObjNr, maxLenID, maxLenSize, maxLenFilters int + maxLenPageNr, maxLenObjNr, maxLenID, maxLenSize, maxLenFilters int ) + maxPageNr := 0 + for _, i := range pageNrs { + + if i > maxPageNr { + maxPageNr = i + } + m, err := ExtractPageImages(ctx, i, true) if err != nil { return nil, nil, err @@ -72,18 +81,28 @@ func Images(ctx *model.Context, selectedPages types.IntSet) ([]map[int]model.Ima mm = append(mm, m) } - maxLen := &ImageListMaxLengths{ObjNr: maxLenObjNr, ID: maxLenID, Size: maxLenSize, Filters: maxLenFilters} + maxLenPageNr = len(strconv.Itoa(maxPageNr)) + + maxLen := &ImageListMaxLengths{PageNr: maxLenPageNr, ObjNr: maxLenObjNr, ID: maxLenID, Size: maxLenSize, Filters: maxLenFilters} return mm, maxLen, nil } func prepHorSep(horSep *[]int, maxLen *ImageListMaxLengths) string { - s := "Page Obj# " + s := "Page " + if maxLen.PageNr > 4 { + s 
+= strings.Repeat(" ", maxLen.PageNr-4) + *horSep = append(*horSep, 5+maxLen.PageNr-4) + } else { + *horSep = append(*horSep, 5) + } + + s += draw.VBar + " Obj# " if maxLen.ObjNr > 4 { s += strings.Repeat(" ", maxLen.ObjNr-4) - *horSep = append(*horSep, 10+maxLen.ObjNr-4) + *horSep = append(*horSep, 6+maxLen.ObjNr-4) } else { - *horSep = append(*horSep, 10) + *horSep = append(*horSep, 6) } s += draw.VBar + " Id " @@ -126,7 +145,39 @@ func sortedObjNrs(ii map[int]model.Image) []int { return objNrs } -func listImages(ctx *model.Context, mm []map[int]model.Image, maxLen *ImageListMaxLengths) ([]string, int, int64, error) { +func attrs(img model.Image) (string, string, string, string, string) { + t := "image" + if img.IsImgMask { + t = "imask" + } + if img.Thumb { + t = "thumb" + } + + sm := " " + if img.HasSMask { + sm = "*" + } + + im := " " + if img.HasImgMask { + im = "*" + } + + bpc := "-" + if img.Bpc > 0 { + bpc = strconv.Itoa(img.Bpc) + } + + interp := " " + if img.Interpol { + interp = "*" + } + + return t, sm, im, bpc, interp +} + +func listImages(mm []map[int]model.Image, maxLen *ImageListMaxLengths) ([]string, int, int64) { ss := []string{} first := true j, size := 0, int64(0) @@ -144,40 +195,22 @@ func listImages(ctx *model.Context, mm []map[int]model.Image, maxLen *ImageListM for _, objNr := range sortedObjNrs(ii) { img := ii[objNr] - pageNr := "" - if newPage { - pageNr = strconv.Itoa(img.PageNr) + pageNr := strconv.Itoa(img.PageNr) + if !newPage { + pageNr = strings.Repeat(" ", len(pageNr)) + } else { newPage = false } - t := "image" - if img.IsImgMask { - t = "imask" - } - if img.Thumb { - t = "thumb" - } - sm := " " - if img.HasSMask { - sm = "*" - } - - im := " " - if img.HasImgMask { - im = "*" - } + t, sm, im, bpc, interp := attrs(img) - bpc := "-" - if img.Bpc > 0 { - bpc = strconv.Itoa(img.Bpc) + s := strconv.Itoa(img.PageNr) + fill0 := strings.Repeat(" ", maxLen.PageNr-len(s)) + if maxLen.PageNr < 4 { + fill0 += strings.Repeat(" ", 
4-maxLen.PageNr) } - interp := " " - if img.Interpol { - interp = "*" - } - - s := strconv.Itoa(img.ObjNr) + s = strconv.Itoa(img.ObjNr) fill1 := strings.Repeat(" ", maxLen.ObjNr-len(s)) if maxLen.ObjNr < 4 { fill1 += strings.Repeat(" ", 4-maxLen.ObjNr) @@ -194,8 +227,9 @@ func listImages(ctx *model.Context, mm []map[int]model.Image, maxLen *ImageListM fill3 = strings.Repeat(" ", 4-maxLen.Size) } - ss = append(ss, fmt.Sprintf("%4s %s%s %s %s%s %s %s %s %s %s %5d %s %5d %s %10s %d %s %s %s %s%s %s %s", - pageNr, fill1, strconv.Itoa(img.ObjNr), draw.VBar, + ss = append(ss, fmt.Sprintf("%s%s %s %s%s %s %s%s %s %s %s %s %s %5d %s %5d %s %10s %d %s %s %s %s%s %s %s", + fill0, pageNr, draw.VBar, + fill1, strconv.Itoa(img.ObjNr), draw.VBar, fill2, img.Name, draw.VBar, t, sm, im, draw.VBar, img.Width, draw.VBar, @@ -210,11 +244,11 @@ func listImages(ctx *model.Context, mm []map[int]model.Image, maxLen *ImageListM } } } - return ss, j, size, nil + return ss, j, size } type ImageListMaxLengths struct { - ObjNr, ID, Size, Filters int + PageNr, ObjNr, ID, Size, Filters int } // ListImages returns a formatted list of embedded images. @@ -225,12 +259,15 @@ func ListImages(ctx *model.Context, selectedPages types.IntSet) ([]string, error return nil, err } - ss, j, size, err := listImages(ctx, mm, maxLen) - if err != nil { - return nil, err + ss, j, size := listImages(mm, maxLen) + + s := fmt.Sprintf("%d images available", j) + + if j > 0 { + s += fmt.Sprintf(" (%s)", types.ByteSize(size)) } - return append([]string{fmt.Sprintf("%d images available(%s)", j, types.ByteSize(size))}, ss...), nil + return append([]string{s}, ss...), nil } // WriteImageToDisk returns a closure for writing img to disk. 
@@ -245,14 +282,140 @@ func WriteImageToDisk(outDir, fileName string) func(model.Image, bool, int) erro qual = "thumb" } f := fmt.Sprintf(s+"_%s.%s", fileName, img.PageNr, qual, img.FileType) - // if singleImgPerPage { - // if img.thumb { - // s += "_" + qual - // } - // f = fmt.Sprintf(s+".%s", fileName, img.pageNr, img.FileType) - // } outFile := filepath.Join(outDir, f) log.CLI.Printf("writing %s\n", outFile) return WriteReader(outFile, img) } } + +func validateImageDimensions(ctx *model.Context, objNr, w, h int) error { + imgObj := ctx.Optimize.ImageObjects[objNr] + if imgObj == nil { + return errors.Errorf("pdfcpu: unknown image object for objNr=%d", objNr) + } + + d := imgObj.ImageDict + + width := d.IntEntry("Width") + height := d.IntEntry("Height") + + if width == nil || height == nil { + return errors.New("pdfcpu: corrupt image dict") + } + + if *width != w || *height != h { + return errors.Errorf("pdfcpu: invalid image dimensions, want(%d,%d), got(%d,%d)", w, h, *width, *height) + } + + return nil +} + +// UpdateImagesByObjNr replaces an XObject. +func UpdateImagesByObjNr(ctx *model.Context, rd io.Reader, objNr int) error { + + sd, w, h, err := model.CreateImageStreamDict(ctx.XRefTable, rd) + if err != nil { + return err + } + + if err := validateImageDimensions(ctx, objNr, w, h); err != nil { + return err + } + + genNr := 0 + entry, ok := ctx.FindTableEntry(objNr, genNr) + if !ok { + return errors.Errorf("pdfcpu: invalid objNr=%d", objNr) + } + + entry.Object = *sd + + return nil +} + +func isInheritedXObjectResource(inhRes types.Dict, id string) bool { + if inhRes == nil { + return false + } + + d := inhRes.DictEntry("XObject") + if d == nil { + return false + } + + for resId := range d { + if resId == id { + return true + } + } + + return false +} + +// UpdateImagesByPageNrAndId replaces the XObject referenced by pageNr and id. 
+func UpdateImagesByPageNrAndId(ctx *model.Context, rd io.Reader, pageNr int, id string) error { + + imgIndRef, w, h, err := model.CreateImageResource(ctx.XRefTable, rd) + if err != nil { + return err + } + + d, _, inhPAttrs, err := ctx.PageDict(pageNr, false) + if err != nil { + return err + } + + obj, found := d.Find("Resources") + if !found { + if isInheritedXObjectResource(inhPAttrs.Resources, id) { + d1 := types.NewDict() + d1[id] = *imgIndRef + d2 := types.NewDict() + d2["XObject"] = d1 + d["Resources"] = d2 + return nil + } + return errors.Errorf("pdfcpu: page %d: unknown resource %s\n", pageNr, id) + } + + resDict, err := ctx.DereferenceDict(obj) + if err != nil { + return err + } + + obj1, ok := resDict.Find("XObject") + if !ok { + if isInheritedXObjectResource(inhPAttrs.Resources, id) { + d := types.NewDict() + d[id] = *imgIndRef + resDict["XObject"] = d + return nil + } + return errors.Errorf("pdfcpu: page %d: unknown resource %s\n", pageNr, id) + } + + imgResDict, err := ctx.DereferenceDict(obj1) + if err != nil { + return err + } + + for resId, indRef := range imgResDict { + if resId == id { + + ir := indRef.(types.IndirectRef) + if err := validateImageDimensions(ctx, ir.ObjectNumber.Value(), w, h); err != nil { + return err + } + + imgResDict[id] = *imgIndRef + return nil + } + } + + if isInheritedXObjectResource(inhPAttrs.Resources, id) { + imgResDict[id] = *imgIndRef + return nil + } + + return errors.Errorf("pdfcpu: page %d: unknown resource %s\n", pageNr, id) +} diff --git a/pkg/pdfcpu/image_test.go b/pkg/pdfcpu/image_test.go index 5fab526c..ca3683e1 100644 --- a/pkg/pdfcpu/image_test.go +++ b/pkg/pdfcpu/image_test.go @@ -87,7 +87,7 @@ func streamDictForJPGFile(xRefTable *model.XRefTable, fileName string) (*types.S } - sd, err := model.CreateDCTImageObject(xRefTable, bb, c.Width, c.Height, 8, cs) + sd, err := model.CreateDCTImageStreamDict(xRefTable, bb, c.Width, c.Height, 8, cs) if err != nil { return nil, err } @@ -107,7 +107,7 @@ func 
streamDictForImageFile(xRefTable *model.XRefTable, fileName string) (*types } defer f.Close() - sd, _, _, err := model.CreateImageStreamDict(xRefTable, f, false, false) + sd, _, _, err := model.CreateImageStreamDict(xRefTable, f) return sd, err } @@ -217,7 +217,7 @@ func TestReadWritePNGAndWEBP(t *testing.T) { } // Read in a device gray image stream dump from disk. -func read1BPCDeviceGrayFlateStreamDump(xRefTable *model.XRefTable, fileName string) (*types.StreamDict, error) { +func read1BPCDeviceGrayFlateStreamDump(fileName string) (*types.StreamDict, error) { f, err := os.Open(fileName) if err != nil { return nil, err @@ -257,7 +257,7 @@ func TestReadDeviceGrayWritePNG(t *testing.T) { filename := "DeviceGray" path := filepath.Join(inDir, filename+".raw") - sd, err := read1BPCDeviceGrayFlateStreamDump(xRefTable, path) + sd, err := read1BPCDeviceGrayFlateStreamDump(path) if err != nil { t.Fatalf("err: %v\n", err) } @@ -305,7 +305,7 @@ func TestReadDeviceGrayWritePNG(t *testing.T) { } // Read in a device CMYK image stream dump from disk. 
-func read8BPCDeviceCMYKFlateStreamDump(xRefTable *model.XRefTable, fileName string) (*types.StreamDict, error) { +func read8BPCDeviceCMYKFlateStreamDump(fileName string) (*types.StreamDict, error) { f, err := os.Open(fileName) if err != nil { return nil, err @@ -352,7 +352,7 @@ func TestReadCMYKWriteTIFF(t *testing.T) { filename := "DeviceCMYK" path := filepath.Join(inDir, filename+".raw") - sd, err := read8BPCDeviceCMYKFlateStreamDump(xRefTable, path) + sd, err := read8BPCDeviceCMYKFlateStreamDump(path) if err != nil { t.Errorf("err: %v\n", err) } diff --git a/pkg/pdfcpu/importImage.go b/pkg/pdfcpu/importImage.go index ed891646..b8aac410 100644 --- a/pkg/pdfcpu/importImage.go +++ b/pkg/pdfcpu/importImage.go @@ -118,7 +118,7 @@ func parsePageFormatImp(s string, imp *Import) (err error) { return err } -func parsePageDim(v string, u types.DisplayUnit) (*types.Dim, string, error) { +func ParsePageDim(v string, u types.DisplayUnit) (*types.Dim, string, error) { ss := strings.Split(v, " ") if len(ss) != 2 { @@ -127,12 +127,12 @@ func parsePageDim(v string, u types.DisplayUnit) (*types.Dim, string, error) { w, err := strconv.ParseFloat(ss[0], 64) if err != nil || w <= 0 { - return nil, v, errors.Errorf("pdfcpu: dimension X must be a positiv numeric value: %s\n", ss[0]) + return nil, v, errors.Errorf("pdfcpu: dimension X must be a positive numeric value: %s\n", ss[0]) } h, err := strconv.ParseFloat(ss[1], 64) if err != nil || h <= 0 { - return nil, v, errors.Errorf("pdfcpu: dimension Y must be a positiv numeric value: %s\n", ss[1]) + return nil, v, errors.Errorf("pdfcpu: dimension Y must be a positive numeric value: %s\n", ss[1]) } d := types.Dim{Width: types.ToUserSpace(w, u), Height: types.ToUserSpace(h, u)} @@ -144,7 +144,7 @@ func parseDimensionsImp(s string, imp *Import) (err error) { if imp.UserDim { return errors.New("pdfcpu: only one of formsize(papersize) or dimensions allowed") } - imp.PageDim, imp.PageSize, err = parsePageDim(s, imp.InpUnit) + imp.PageDim, 
imp.PageSize, err = ParsePageDim(s, imp.InpUnit) imp.UserDim = true return err } @@ -327,56 +327,72 @@ func importImagePDFBytes(wr io.Writer, pageDim *types.Dim, imgWidth, imgHeight f m[0][0], m[0][1], m[1][0], m[1][1], m[2][0], m[2][1]) } -// NewPageForImage creates a new page dict in xRefTable for given image reader r. -func NewPageForImage(xRefTable *model.XRefTable, r io.Reader, parentIndRef *types.IndirectRef, imp *Import) (*types.IndirectRef, error) { +// NewPagesForImage creates new page dicts in xRefTable for the given image reader r. +func NewPagesForImage(xRefTable *model.XRefTable, r io.Reader, parentIndRef *types.IndirectRef, imp *Import) ([]*types.IndirectRef, error) { // create image dict. - imgIndRef, w, h, err := model.CreateImageResource(xRefTable, r, imp.Gray, imp.Sepia) + imgResources, err := model.CreateImageResources(xRefTable, r, imp.Gray, imp.Sepia) if err != nil { return nil, err } - // create resource dict for XObject. - d := types.Dict( - map[string]types.Object{ - "ProcSet": types.NewNameArray("PDF", "Text", "ImageB", "ImageC", "ImageI"), - "XObject": types.Dict(map[string]types.Object{"Im0": *imgIndRef}), - }, - ) + indRefs := []*types.IndirectRef{} - resIndRef, err := xRefTable.IndRefForNewObject(d) - if err != nil { - return nil, err - } + for _, imgRes := range imgResources { - dim := &types.Dim{Width: float64(w), Height: float64(h)} - if imp.Pos != types.Full { - dim = imp.PageDim - } - // mediabox = physical page dimensions - mediaBox := types.RectForDim(dim.Width, dim.Height) + // create resource dict for XObject. 
+ d := types.Dict( + map[string]types.Object{ + "ProcSet": types.NewNameArray("PDF", "Text", "ImageB", "ImageC", "ImageI"), + "XObject": types.Dict(map[string]types.Object{imgRes.Res.ID: *imgRes.Res.IndRef}), + }, + ) - var buf bytes.Buffer - importImagePDFBytes(&buf, dim, float64(w), float64(h), imp) - sd, _ := xRefTable.NewStreamDictForBuf(buf.Bytes()) - if err = sd.Encode(); err != nil { - return nil, err - } + resIndRef, err := xRefTable.IndRefForNewObject(d) + if err != nil { + return nil, err + } - contentsIndRef, err := xRefTable.IndRefForNewObject(*sd) - if err != nil { - return nil, err + dim := &types.Dim{Width: float64(imgRes.Width), Height: float64(imgRes.Height)} + if imp.Pos != types.Full { + dim = imp.PageDim + } + // mediabox = physical page dimensions + mediaBox := types.RectForDim(dim.Width, dim.Height) + + var buf bytes.Buffer + importImagePDFBytes(&buf, dim, float64(imgRes.Width), float64(imgRes.Height), imp) + sd, err := xRefTable.NewStreamDictForBuf(buf.Bytes()) + if err != nil { + return nil, err + } + + if err = sd.Encode(); err != nil { + return nil, err + } + + contentsIndRef, err := xRefTable.IndRefForNewObject(*sd) + if err != nil { + return nil, err + } + + pageDict := types.Dict( + map[string]types.Object{ + "Type": types.Name("Page"), + "Parent": *parentIndRef, + "MediaBox": mediaBox.Array(), + "Resources": *resIndRef, + "Contents": *contentsIndRef, + }, + ) + + indRef, err := xRefTable.IndRefForNewObject(pageDict) + if err != nil { + return nil, err + } + + indRefs = append(indRefs, indRef) } - pageDict := types.Dict( - map[string]types.Object{ - "Type": types.Name("Page"), - "Parent": *parentIndRef, - "MediaBox": mediaBox.Array(), - "Resources": *resIndRef, - "Contents": *contentsIndRef, - }, - ) - - return xRefTable.IndRefForNewObject(pageDict) + return indRefs, nil } diff --git a/pkg/pdfcpu/info.go b/pkg/pdfcpu/info.go index b557e08f..74845873 100644 --- a/pkg/pdfcpu/info.go +++ b/pkg/pdfcpu/info.go @@ -19,6 +19,7 @@ package 
pdfcpu import ( "fmt" "sort" + "strings" "time" "github.com/angel-one/pdfcpu/pkg/log" @@ -345,6 +346,7 @@ type PDFInfo struct { Attachments []model.Attachment `json:"attachments,omitempty"` Unit types.DisplayUnit `json:"-"` UnitString string `json:"unit"` + Fonts []model.FontInfo `json:"fonts,omitempty"` } func (info PDFInfo) renderKeywords(ss *[]string) error { @@ -423,15 +425,17 @@ func (info PDFInfo) renderFlagsPart2(ss *[]string, separator string) { s = "Yes" } *ss = append(*ss, fmt.Sprintf(" Form: %s", s)) + + if info.Signatures || info.AppendOnly { + *ss = append(*ss, " Signatures: Yes") + } + if info.Form { - if info.Signatures || info.AppendOnly { - *ss = append(*ss, " SignaturesExist: Yes") - s = "No" - if info.AppendOnly { - s = "Yes" - } - *ss = append(*ss, fmt.Sprintf(" AppendOnly: %s", s)) + s = "No" + if info.AppendOnly { + s = "Yes" } + *ss = append(*ss, fmt.Sprintf(" AppendOnly: %s", s)) } s = "No" @@ -471,16 +475,69 @@ func (info *PDFInfo) renderPermissions(ss *[]string) { } func (info *PDFInfo) renderAttachments(ss *[]string) { - ss0 := []string{} - for _, a := range info.Attachments { - ss0 = append(ss0, a.FileName) + for i, a := range info.Attachments { + if i == 0 { + *ss = append(*ss, fmt.Sprintf("%20s: %s", "Attachments", a.FileName)) + continue + } + *ss = append(*ss, fmt.Sprintf("%20s %s", "", a.FileName)) + } +} + +func (info *PDFInfo) renderFonts(ss *[]string) { + if len(info.Fonts) == 0 { + *ss = append(*ss, fmt.Sprintf("%20s: No fonts available", "Fonts")) + return + } + + *ss = append(*ss, fmt.Sprintf("%20s:", "Fonts")) + + maxLenName := 0 + for _, fi := range info.Fonts { + name := fi.Name + if len(fi.Prefix) > 0 { + name = fi.Prefix + "-" + name + } + if len(name) > maxLenName { + maxLenName = len(name) + } + } + + *ss = append(*ss, fmt.Sprintf("Name%s Type Encoding Embedded", strings.Repeat(" ", maxLenName-4))) + *ss = append(*ss, fmt.Sprint(draw.HorSepLine([]int{41 + maxLenName}))) + for _, fi := range info.Fonts { + name := 
fi.Name + if len(fi.Prefix) > 0 { + name = fi.Prefix + "-" + name + } + *ss = append(*ss, fmt.Sprintf("%s%s %-10s %-20s %t", name, strings.Repeat(" ", maxLenName-len(name)), fi.Type, fi.Encoding, fi.Embedded)) + } +} + +func setupFontInfos(ctx *model.Context, fontInfos *[]model.FontInfo) { + var fontNames []string + for k := range ctx.Optimize.Fonts { + fontNames = append(fontNames, k) + } + sort.Strings(fontNames) + + for _, fontName := range fontNames { + for _, objNr := range ctx.Optimize.Fonts[fontName] { + fontObj := ctx.Optimize.FontObjects[objNr] + fontInfo := model.FontInfo{ + Prefix: fontObj.Prefix, + Name: fontObj.FontName, + Type: fontObj.SubType(), + Encoding: fontObj.Encoding(), + Embedded: fontObj.Embedded, + } + *fontInfos = append(*fontInfos, fontInfo) + } } - sort.Strings(ss0) - *ss = append(*ss, ss0...) } // Info returns info about ctx. -func Info(ctx *model.Context, fileName string, selectedPages types.IntSet) (*PDFInfo, error) { +func Info(ctx *model.Context, fileName string, selectedPages types.IntSet, fonts bool) (*PDFInfo, error) { info := &PDFInfo{FileName: fileName, Unit: ctx.Unit, UnitString: ctx.UnitString()} v := ctx.HeaderVersion @@ -510,10 +567,11 @@ func Info(ctx *model.Context, fileName string, selectedPages types.IntSet) (*PDF info.PageDimensions = m info.Title = ctx.Title + info.Author = ctx.Author info.Subject = ctx.Subject info.Producer = ctx.Producer info.Creator = ctx.Creator - info.CreationDate = ctx.CreationDate + info.CreationDate = ctx.XRefTable.CreationDate info.ModificationDate = ctx.ModDate info.PageMode = "" @@ -528,7 +586,7 @@ func Info(ctx *model.Context, fileName string, selectedPages types.IntSet) (*PDF info.ViewerPref = ctx.ViewerPref - kwl, err := KeywordsList(ctx.XRefTable) + kwl, err := KeywordsList(ctx) if err != nil { return nil, err } @@ -546,7 +604,7 @@ func Info(ctx *model.Context, fileName string, selectedPages types.IntSet) (*PDF info.Outlines = len(ctx.Outlines) > 0 info.Names = len(ctx.Names) > 0 - 
info.Signatures = ctx.SignatureExist + info.Signatures = ctx.SignatureExist || ctx.AppendOnly || len(ctx.Signatures) > 0 info.AppendOnly = ctx.AppendOnly info.Encrypted = ctx.Encrypt != nil @@ -560,11 +618,19 @@ func Info(ctx *model.Context, fileName string, selectedPages types.IntSet) (*PDF } info.Attachments = aa + fontInfos := []model.FontInfo{} + + if fonts { + setupFontInfos(ctx, &fontInfos) + } + + info.Fonts = fontInfos + return info, nil } // ListInfo returns formatted info about ctx. -func ListInfo(info *PDFInfo, selectedPages types.IntSet) ([]string, error) { +func ListInfo(info *PDFInfo, selectedPages types.IntSet, fonts bool) ([]string, error) { var separator = draw.HorSepLine([]int{44}) var ss []string @@ -605,5 +671,9 @@ func ListInfo(info *PDFInfo, selectedPages types.IntSet) ([]string, error) { info.renderPermissions(&ss) info.renderAttachments(&ss) + if fonts { + info.renderFonts(&ss) + } + return ss, nil } diff --git a/pkg/pdfcpu/io.go b/pkg/pdfcpu/io.go new file mode 100644 index 00000000..e07e1972 --- /dev/null +++ b/pkg/pdfcpu/io.go @@ -0,0 +1,64 @@ +/* +Copyright 2025 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package pdfcpu + +import ( + "io" + "os" +) + +// Write rd to filepath and respect overwrite. 
+func Write(rd io.Reader, filepath string, overwrite bool) (bool, error) { + if !overwrite { + if _, err := os.Stat(filepath); err == nil { + return false, nil + } + } + + to, err := os.Create(filepath) + if err != nil { + return false, err + } + defer to.Close() + + _, err = io.Copy(to, rd) + return true, err +} + +// CopyFile copies srcFilename to destFilename +func CopyFile(srcFilename, destFilename string, overwrite bool) (bool, error) { + if !overwrite { + if _, err := os.Stat(destFilename); err == nil { + //log.Printf("skipping: %s already exists", filepath) + return false, nil + } + } + + from, err := os.Open(srcFilename) + if err != nil { + return false, err + } + defer from.Close() + to, err := os.Create(destFilename) + if err != nil { + return false, err + } + defer to.Close() + + _, err = io.Copy(to, from) + return true, err +} diff --git a/pkg/pdfcpu/keyword.go b/pkg/pdfcpu/keyword.go index d6f9fda7..707e74a2 100644 --- a/pkg/pdfcpu/keyword.go +++ b/pkg/pdfcpu/keyword.go @@ -24,44 +24,94 @@ import ( ) // KeywordsList returns a list of keywords as recorded in the document info dict. -func KeywordsList(xRefTable *model.XRefTable) ([]string, error) { - ss := strings.FieldsFunc(xRefTable.Keywords, func(c rune) bool { return c == ',' || c == ';' || c == '\r' }) - for i, s := range ss { - ss[i] = strings.TrimSpace(s) +func KeywordsList(ctx *model.Context) ([]string, error) { + var ss []string + for keyword, val := range ctx.KeywordList { + if val { + ss = append(ss, keyword) + } } return ss, nil } -// KeywordsAdd adds keywords to the document info dict. -// Returns true if at least one keyword was added. 
-func KeywordsAdd(xRefTable *model.XRefTable, keywords []string) error { - - list, err := KeywordsList(xRefTable) +func removeKeywordsFromMetadata(ctx *model.Context) error { + rootDict, err := ctx.Catalog() if err != nil { return err } - for _, s := range keywords { - if !types.MemberOf(s, list) { - xRefTable.Keywords += ", " + types.UTF8ToCP1252(s) - } + indRef, _ := rootDict["Metadata"].(types.IndirectRef) + entry, _ := ctx.FindTableEntryForIndRef(&indRef) + sd, _ := entry.Object.(types.StreamDict) + + if err = sd.Decode(); err != nil { + return err + } + + if err = model.RemoveKeywords(&sd.Content); err != nil { + return err + } + + //fmt.Println(hex.Dump(sd.Content)) + + if err := sd.Encode(); err != nil { + return err } - d, err := xRefTable.DereferenceDict(*xRefTable.Info) + entry.Object = sd + + return nil +} + +func finalizeKeywords(ctx *model.Context) error { + d, err := ctx.DereferenceDict(*ctx.Info) if err != nil || d == nil { return err } - d["Keywords"] = types.StringLiteral(xRefTable.Keywords) + ss, err := KeywordsList(ctx) + if err != nil { + return err + } + + s0 := strings.Join(ss, "; ") + + s, err := types.EscapedUTF16String(s0) + if err != nil { + return err + } + + d["Keywords"] = types.StringLiteral(*s) + + if ctx.CatalogXMPMeta != nil { + removeKeywordsFromMetadata(ctx) + } return nil } +// KeywordsAdd adds keywords to the document info dict. +// Returns true if at least one keyword was added. +func KeywordsAdd(ctx *model.Context, keywords []string) error { + if err := ensureInfoDictAndFileID(ctx); err != nil { + return err + } + + for _, keyword := range keywords { + ctx.KeywordList[strings.TrimSpace(keyword)] = true + } + + return finalizeKeywords(ctx) +} + // KeywordsRemove deletes keywords from the document info dict. // Returns true if at least one keyword was removed. -func KeywordsRemove(xRefTable *model.XRefTable, keywords []string) (bool, error) { - // TODO Handle missing info dict. 
- d, err := xRefTable.DereferenceDict(*xRefTable.Info) +func KeywordsRemove(ctx *model.Context, keywords []string) (bool, error) { + if ctx.Info == nil { + return false, nil + } + + d, err := ctx.DereferenceDict(*ctx.Info) if err != nil || d == nil { return false, err } @@ -69,38 +119,25 @@ func KeywordsRemove(xRefTable *model.XRefTable, keywords []string) (bool, error) if len(keywords) == 0 { // Remove all keywords. delete(d, "Keywords") - return true, nil - } - kw := make([]string, len(keywords)) - for i, s := range keywords { - kw[i] = types.UTF8ToCP1252(s) - } + if ctx.CatalogXMPMeta != nil { + removeKeywordsFromMetadata(ctx) + } - // Distil document keywords. - ss := strings.FieldsFunc(xRefTable.Keywords, func(c rune) bool { return c == ',' || c == ';' || c == '\r' }) + return true, nil + } - xRefTable.Keywords = "" var removed bool - first := true - - for _, s := range ss { - s = strings.TrimSpace(s) - if types.MemberOf(s, kw) { + for keyword := range ctx.KeywordList { + if types.MemberOf(keyword, keywords) { + ctx.KeywordList[keyword] = false removed = true - continue - } - if first { - xRefTable.Keywords = s - first = false - continue } - xRefTable.Keywords += ", " + s } if removed { - d["Keywords"] = types.StringLiteral(xRefTable.Keywords) + err = finalizeKeywords(ctx) } - return removed, nil + return removed, err } diff --git a/pkg/pdfcpu/merge.go b/pkg/pdfcpu/merge.go index 0c693bf7..b88d4af2 100644 --- a/pkg/pdfcpu/merge.go +++ b/pkg/pdfcpu/merge.go @@ -217,7 +217,7 @@ func handleCO(ctxSrc, ctxDest *model.Context, dSrc, dDest types.Dict) error { return nil } -func handleDR(ctxSrc, ctxDest *model.Context, dSrc, dDest types.Dict) error { +func handleDR(ctxSrc *model.Context, dSrc, dDest types.Dict) error { o, found := dSrc.Find("DR") if !found { return nil @@ -312,7 +312,7 @@ func handleFormAttributes(ctxSrc, ctxDest *model.Context, dSrc, dDest types.Dict } // DR: default resource dict - if err := handleDR(ctxSrc, ctxDest, dSrc, dDest); err != nil { + if 
err := handleDR(ctxSrc, dSrc, dDest); err != nil { return err } @@ -753,7 +753,7 @@ func createDividerPagesDict(ctx *model.Context, parentIndRef types.IndirectRef) last := len(dims) - 1 mediaBox := types.NewRectangle(0, 0, dims[last].Width, dims[last].Height) - indRefPageDict, err := ctx.EmptyPage(indRef, mediaBox) + indRefPageDict, err := ctx.EmptyPage(indRef, mediaBox, 0) if err != nil { return nil, err } diff --git a/pkg/pdfcpu/migrate.go b/pkg/pdfcpu/migrate.go index 89f93060..5a237b06 100644 --- a/pkg/pdfcpu/migrate.go +++ b/pkg/pdfcpu/migrate.go @@ -89,12 +89,8 @@ func migrateObject(o types.Object, ctxSource, ctxDest *model.Context, migrated m } func migrateAnnots(o types.Object, pageIndRef types.IndirectRef, ctxSrc, ctxDest *model.Context, migrated map[int]int) (types.Object, error) { - arr, err := ctxSrc.DereferenceArray(o) - if err != nil { - return nil, err - } - - for i, v := range arr { + arr := o.(types.Array) + for i, v := range o.(types.Array) { var d types.Dict o, ok := v.(types.IndirectRef) if ok { @@ -130,9 +126,11 @@ func migrateAnnots(o types.Object, pageIndRef types.IndirectRef, ctxSrc, ctxDest } pDict.Delete("Parent") } - if d[k], err = migrateObject(v, ctxSrc, ctxDest, migrated); err != nil { + o1, err := migrateObject(v, ctxSrc, ctxDest, migrated) + if err != nil { return nil, err } + d[k] = o1 } } @@ -146,6 +144,24 @@ func migratePageDict(d types.Dict, pageIndRef types.IndirectRef, ctxSrc, ctxDest continue } if k == "Annots" { + o, ok := d[k].(types.IndirectRef) + if ok { + objNr := o.ObjectNumber.Value() + if migrated[objNr] > 0 { + o.ObjectNumber = types.Integer(migrated[objNr]) + d[k] = o + continue + } + v, err = migrateIndRef(&o, ctxSrc, ctxDest, migrated) + if err != nil { + return err + } + d[k] = o + if _, err = migrateAnnots(v, pageIndRef, ctxSrc, ctxDest, migrated); err != nil { + return err + } + continue + } if d[k], err = migrateAnnots(v, pageIndRef, ctxSrc, ctxDest, migrated); err != nil { return err } @@ -158,6 +174,40 @@ 
func migratePageDict(d types.Dict, pageIndRef types.IndirectRef, ctxSrc, ctxDest return nil } +func migrateAnnot(indRef *types.IndirectRef, fieldsSrc, fieldsDest *types.Array, ctxSrc *model.Context, migrated map[int]int) error { + for _, v := range *fieldsSrc { + ir, ok := v.(types.IndirectRef) + if !ok { + continue + } + objNr := ir.ObjectNumber.Value() + if migrated[objNr] == indRef.ObjectNumber.Value() { + *fieldsDest = append(*fieldsDest, *indRef) + break + } + d, err := ctxSrc.DereferenceDict(ir) + if err != nil { + return err + } + o, ok := d.Find("Kids") + if !ok { + continue + } + kids, err := ctxSrc.DereferenceArray(o) + if err != nil { + return err + } + if ok, err = detectMigratedAnnot(ctxSrc, indRef, kids, migrated); err != nil { + return err + } + if ok { + *fieldsDest = append(*fieldsDest, *indRef) + } + } + + return nil +} + func migrateFields(d types.Dict, fieldsSrc, fieldsDest *types.Array, ctxSrc, ctxDest *model.Context, migrated map[int]int) error { o, _ := d.Find("Annots") annots, err := ctxDest.DereferenceArray(o) @@ -186,36 +236,11 @@ func migrateFields(d types.Dict, fieldsSrc, fieldsDest *types.Array, ctxSrc, ctx if found { continue } - for _, v := range *fieldsSrc { - ir, ok := v.(types.IndirectRef) - if !ok { - continue - } - objNr := ir.ObjectNumber.Value() - if migrated[objNr] == indRef.ObjectNumber.Value() { - *fieldsDest = append(*fieldsDest, indRef) - break - } - d, err := ctxSrc.DereferenceDict(ir) - if err != nil { - return err - } - o, ok := d.Find("Kids") - if !ok { - continue - } - kids, err := ctxSrc.DereferenceArray(o) - if err != nil { - return err - } - if ok, err = detectMigratedAnnot(ctxSrc, &indRef, kids, migrated); err != nil { - return err - } - if ok { - *fieldsDest = append(*fieldsDest, indRef) - } + if err := migrateAnnot(&indRef, fieldsSrc, fieldsDest, ctxSrc, migrated); err != nil { + return err } } + return nil } diff --git a/pkg/pdfcpu/model/annotation.go b/pkg/pdfcpu/model/annotation.go index 4cb9e3ac..d594fcd0 
100644 --- a/pkg/pdfcpu/model/annotation.go +++ b/pkg/pdfcpu/model/annotation.go @@ -22,6 +22,7 @@ import ( "github.com/angel-one/pdfcpu/pkg/pdfcpu/color" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" + "github.com/pkg/errors" ) // AnnotationFlags represents the PDF annotation flags. @@ -70,6 +71,7 @@ const ( AnnWatermark Ann3D AnnRedact + AnnCustom ) var AnnotTypes = map[string]AnnotationType{ @@ -81,7 +83,7 @@ var AnnotTypes = map[string]AnnotationType{ "Circle": AnnCircle, "Polygon": AnnPolygon, "PolyLine": AnnPolyLine, - "HighLight": AnnHighLight, + "Highlight": AnnHighLight, "Underline": AnnUnderline, "Squiggly": AnnSquiggly, "StrikeOut": AnnStrikeOut, @@ -99,6 +101,7 @@ var AnnotTypes = map[string]AnnotationType{ "Watermark": AnnWatermark, "3D": Ann3D, "Redact": AnnRedact, + "Custom": AnnCustom, } // AnnotTypeStrings manages string representations for annotation types. @@ -111,7 +114,7 @@ var AnnotTypeStrings = map[AnnotationType]string{ AnnCircle: "Circle", AnnPolygon: "Polygon", AnnPolyLine: "PolyLine", - AnnHighLight: "HighLight", + AnnHighLight: "Highlight", AnnUnderline: "Underline", AnnSquiggly: "Squiggly", AnnStrikeOut: "StrikeOut", @@ -129,6 +132,7 @@ var AnnotTypeStrings = map[AnnotationType]string{ AnnWatermark: "Watermark", Ann3D: "3D", AnnRedact: "Redact", + AnnCustom: "Custom", } // BorderStyle (see table 168) @@ -180,57 +184,136 @@ func borderEffectDict(cloudyBorder bool, intensity int) types.Dict { }) } +func borderArray(rx, ry, width float64) types.Array { + return types.NewNumberArray(rx, ry, width) +} + +// LineEndingStyle (see table 179) +type LineEndingStyle int + +const ( + LESquare LineEndingStyle = iota + LECircle + LEDiamond + LEOpenArrow + LEClosedArrow + LENone + LEButt + LEROpenArrow + LERClosedArrow + LESlash +) + +func LineEndingStyleName(les LineEndingStyle) string { + var s string + switch les { + case LESquare: + s = "Square" + case LECircle: + s = "Circle" + case LEDiamond: + s = "Diamond" + case LEOpenArrow: + s = 
"OpenArrow" + case LEClosedArrow: + s = "ClosedArrow" + case LENone: + s = "None" + case LEButt: + s = "Butt" + case LEROpenArrow: + s = "ROpenArrow" + case LERClosedArrow: + s = "RClosedArrow" + case LESlash: + s = "Slash" + } + return s +} + // AnnotationRenderer is the interface for PDF annotations. type AnnotationRenderer interface { - RenderDict(xRefTable *XRefTable, pageIndRef types.IndirectRef) (types.Dict, error) + RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) Type() AnnotationType RectString() string + APObjNrInt() int ID() string ContentString() string + CustomTypeString() string } -// Annotation represents a PDF annnotation. +// Annotation represents a PDF annotation. type Annotation struct { - SubType AnnotationType // The type of annotation that this dictionary describes. - Rect types.Rectangle // The annotation rectangle, defining the location of the annotation on the page in default user space units. - Contents string // Text that shall be displayed for the annotation. - P *types.IndirectRef // An indirect reference to the page object with which this annotation is associated. - NM string // (Since V1.4) The annotation name, a text string uniquely identifying it among all the annotations on its page. - ModDate string // The date and time when the annotation was most recently modified. - F AnnotationFlags // A set of flags specifying various characteristics of the annotation. - C *color.SimpleColor // The background color of the annotation’s icon when closed. + SubType AnnotationType // The type of annotation that this dictionary describes. + CustomSubType string // Out of spec annot type. + Rect types.Rectangle // The annotation rectangle, defining the location of the annotation on the page in default user space units. + APObjNr int // The objNr of the appearance stream dict. + Contents string // Text that shall be displayed for the annotation. 
+ NM string // (Since V1.4) The annotation name, a text string uniquely identifying it among all the annotations on its page. + ModificationDate string // M - The date and time when the annotation was most recently modified. + P *types.IndirectRef // An indirect reference to the page object with which this annotation is associated. + F AnnotationFlags // A set of flags specifying various characteristics of the annotation. + C *color.SimpleColor // The background color of the annotation’s icon when closed, pop up title bar color, link ann border color. + BorderRadX float64 // Border radius X + BorderRadY float64 // Border radius Y + BorderWidth float64 // Border width + Hash uint32 + // StructParent int + // OC types.dict } // NewAnnotation returns a new annotation. func NewAnnotation( typ AnnotationType, + customTyp string, rect types.Rectangle, - contents string, - pageIndRef *types.IndirectRef, - nm string, + apObjNr int, + contents, id string, + modDate string, f AnnotationFlags, - col *color.SimpleColor) Annotation { + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64) Annotation { return Annotation{ - SubType: typ, - Rect: rect, - Contents: contents, - P: pageIndRef, - NM: nm, - F: f, - C: col} + SubType: typ, + CustomSubType: customTyp, + Rect: rect, + APObjNr: apObjNr, + Contents: contents, + NM: id, + ModificationDate: modDate, + F: f, + C: col, + BorderRadX: borderRadX, + BorderRadY: borderRadY, + BorderWidth: borderWidth, + } } // NewAnnotationForRawType returns a new annotation of a specific type. 
func NewAnnotationForRawType( typ string, rect types.Rectangle, - contents string, - pageIndRef *types.IndirectRef, - nm string, + apObjNr int, + contents, id string, + modDate string, f AnnotationFlags, - col *color.SimpleColor) Annotation { - return NewAnnotation(AnnotTypes[typ], rect, contents, pageIndRef, nm, f, col) + + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64) Annotation { + + annType, ok := AnnotTypes[typ] + if !ok { + annType = AnnotTypes["Custom"] + } else { + typ = "" + } + + return NewAnnotation(annType, typ, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth) } // ID returns the annotation id. @@ -243,11 +326,20 @@ func (ann Annotation) ContentString() string { return ann.Contents } +// CustomTypeString returns ann's custom (out of spec) annotation type, if any. +func (ann Annotation) CustomTypeString() string { + return ann.CustomSubType +} + // RectString returns ann's positioning rectangle. func (ann Annotation) RectString() string { return ann.Rect.ShortString() } +func (ann Annotation) APObjNrInt() int { + return ann.APObjNr +} + // Type returns ann's type. func (ann Annotation) Type() AnnotationType { return ann.SubType @@ -258,32 +350,88 @@ func (ann Annotation) TypeString() string { return AnnotTypeStrings[ann.SubType] } -// RenderDict is a stub for behavior that renders ann's PDF dict. -func (ann Annotation) RenderDict(xRefTable *XRefTable, pageIndRef types.IndirectRef) (types.Dict, error) { - return nil, nil +// HashString returns the annotation hash.
+func (ann Annotation) HashString() uint32 { + return ann.Hash +} + +func (ann Annotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d := types.Dict(map[string]types.Object{ + "Type": types.Name("Annot"), + "Subtype": types.Name(ann.TypeString()), + "Rect": ann.Rect.Array(), + }) + + if pageIndRef != nil { + d["P"] = *pageIndRef + } + + if ann.Contents != "" { + s, err := types.EscapedUTF16String(ann.Contents) + if err != nil { + return nil, err + } + d.InsertString("Contents", *s) + } + + if ann.NM != "" { + d.InsertString("NM", ann.NM) + } + + modDate := types.DateString(time.Now()) + if ann.ModificationDate != "" { + _, ok := types.DateTime(ann.ModificationDate, xRefTable.ValidationMode == ValidationRelaxed) + if !ok { + return nil, errors.Errorf("pdfcpu: annotation renderDict - validateDateEntry: <%s> invalid date", ann.ModificationDate) + } + modDate = ann.ModificationDate + } + d.InsertString("ModDate", modDate) + + if ann.F != 0 { + d["F"] = types.Integer(ann.F) + } + + if ann.C != nil { + d["C"] = ann.C.Array() + } + + if ann.BorderWidth > 0 { + d["Border"] = borderArray(ann.BorderRadX, ann.BorderRadY, ann.BorderWidth) + } + + return d, nil } // PopupAnnotation represents PDF Popup annotations. type PopupAnnotation struct { Annotation - ParentIndRef *types.IndirectRef // The parent annotation with which this pop-up annotation shall be associated. + ParentIndRef *types.IndirectRef // The optional parent markup annotation with which this pop-up annotation shall be associated. Open bool // A flag specifying whether the annotation shall initially be displayed open. } // NewPopupAnnotation returns a new popup annotation. 
func NewPopupAnnotation( rect types.Rectangle, - pageIndRef *types.IndirectRef, + apObjNr int, contents, id string, + modDate string, f AnnotationFlags, - bgCol *color.SimpleColor, - parentIndRef *types.IndirectRef) PopupAnnotation { + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, - ann := NewAnnotation(AnnPopup, rect, contents, pageIndRef, id, f, bgCol) + parentIndRef *types.IndirectRef, + displayOpen bool) PopupAnnotation { + + ann := NewAnnotation(AnnPopup, "", rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth) return PopupAnnotation{ Annotation: ann, - ParentIndRef: parentIndRef} + ParentIndRef: parentIndRef, + Open: displayOpen, + } } // ContentString returns a string representation of ann's content. @@ -295,193 +443,19 @@ func (ann PopupAnnotation) ContentString() string { return s } -// MarkupAnnotation represents a PDF markup annotation. -type MarkupAnnotation struct { - Annotation - T string // The text label that shall be displayed in the title bar of the annotation’s pop-up window when open and active. This entry shall identify the user who added the annotation. - PopupIndRef *types.IndirectRef // An indirect reference to a pop-up annotation for entering or editing the text associated with this annotation. - CA *float64 // (Default: 1.0) The constant opacity value that shall be used in painting the annotation. - RC string // A rich text string that shall be displayed in the pop-up window when the annotation is opened. - CreationDate string // The date and time when the annotation was created. - Subj string // Text representing a short description of the subject being addressed by the annotation. -} - -// NewMarkupAnnotation returns a new markup annotation. 
-func NewMarkupAnnotation( - subType AnnotationType, - rect types.Rectangle, - pageIndRef *types.IndirectRef, - contents, id, title string, - f AnnotationFlags, - bgCol *color.SimpleColor, - popupIndRef *types.IndirectRef, - ca *float64, - rc, subject string) MarkupAnnotation { - - ann := NewAnnotation(subType, rect, contents, pageIndRef, id, f, bgCol) - - return MarkupAnnotation{ - Annotation: ann, - T: title, - PopupIndRef: popupIndRef, - CreationDate: types.DateString(time.Now()), - CA: ca, - RC: rc, - Subj: subject} -} - -// TextAnnotation represents a PDF text annotation aka "Sticky Note". -type TextAnnotation struct { - MarkupAnnotation - Open bool // A flag specifying whether the annotation shall initially be displayed open. - Name string // The name of an icon that shall be used in displaying the annotation. Comment, Key, (Note), Help, NewParagraph, Paragraph, Insert -} - -// NewTextAnnotation returns a new text annotation. -func NewTextAnnotation( - rect types.Rectangle, - contents, id, title string, - f AnnotationFlags, - bgCol *color.SimpleColor, - ca *float64, - rc, subj string, - open bool, - name string) TextAnnotation { - - ma := NewMarkupAnnotation(AnnText, rect, nil, contents, id, title, f, bgCol, nil, ca, rc, subj) - - return TextAnnotation{ - MarkupAnnotation: ma, - Open: open, - Name: name, - } -} - -// RenderDict renders ann into a PDF annotation dict. 
-func (ann TextAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.IndirectRef) (types.Dict, error) { - subject := "Sticky Note" - if ann.Subj != "" { - subject = ann.Subj - } - d := types.Dict(map[string]types.Object{ - "Type": types.Name("Annot"), - "Subtype": types.Name(ann.TypeString()), - "Rect": ann.Rect.Array(), - "P": pageIndRef, - "F": types.Integer(ann.F), - "CreationDate": types.StringLiteral(ann.CreationDate), - "Subj": types.StringLiteral(subject), - "Open": types.Boolean(ann.Open), - }) - if ann.CA != nil { - d.Insert("CA", types.Float(*ann.CA)) - } - if ann.PopupIndRef != nil { - d.Insert("Popup", *ann.PopupIndRef) - } - if ann.RC != "" { - d.InsertString("RC", ann.RC) - } - if ann.Name != "" { - d.InsertName("Name", ann.Name) - } - if ann.Contents != "" { - d.InsertString("Contents", ann.Contents) - } - if ann.NM != "" { - d.InsertString("NM", ann.NM) // TODO check for uniqueness across annotations on this page. - } - if ann.T != "" { - d.InsertString("T", ann.T) - } - if ann.C != nil { - d.Insert("C", ann.C.Array()) - } - return d, nil -} - -// A series of alternating x and y coordinates in PDF user space, specifying points along the path. -type InkPath []float64 - -type InkAnnotation struct { - MarkupAnnotation - InkList []InkPath - BS types.Dict - AP types.Dict -} - -// NewInkAnnotation returns a new ink annotation. 
-func NewInkAnnotation( - rect types.Rectangle, - contents, id, title string, - ink []InkPath, - bs types.Dict, - f AnnotationFlags, - bgCol *color.SimpleColor, - ca *float64, - rc, subj string, - ap types.Dict, -) InkAnnotation { - - ann := NewMarkupAnnotation(AnnInk, rect, nil, contents, id, title, f, bgCol, nil, ca, rc, subj) - - return InkAnnotation{ - MarkupAnnotation: ann, - InkList: ink, - BS: bs, - AP: ap, +func (ann PopupAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.Annotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err } -} -func (ann InkAnnotation) RenderDict(pageIndRef types.IndirectRef) types.Dict { - subject := "Ink Annotation" - if ann.Subj != "" { - subject = ann.Subj - } - ink := types.Array{} - for i := range ann.InkList { - ink = append(ink, types.NewNumberArray(ann.InkList[i]...)) + if ann.ParentIndRef != nil { + d["Parent"] = *ann.ParentIndRef } - d := types.Dict(map[string]types.Object{ - "Type": types.Name("Annot"), - "Subtype": types.Name(ann.TypeString()), - "Rect": ann.Rect.Array(), - "P": pageIndRef, - "F": types.Integer(ann.F), - "CreationDate": types.StringLiteral(ann.CreationDate), - "Subj": types.StringLiteral(subject), - "InkList": ink, - }) - if ann.AP != nil { - d.Insert("AP", ann.AP) - } - if ann.CA != nil { - d.Insert("CA", types.Float(*ann.CA)) - } - if ann.PopupIndRef != nil { - d.Insert("Popup", *ann.PopupIndRef) - } - if ann.RC != "" { - d.InsertString("RC", ann.RC) - } - if ann.BS != nil { - d.Insert("BS", ann.BS) - } - if ann.Contents != "" { - d.InsertString("Contents", ann.Contents) - } - if ann.NM != "" { - d.InsertString("NM", ann.NM) // TODO check for uniqueness across annotations on this page. - } - if ann.T != "" { - d.InsertString("T", ann.T) - } - if ann.C != nil { - d.Insert("C", ann.C.Array()) - } + d["Open"] = types.Boolean(ann.Open) - return d + return d, nil } // LinkAnnotation represents a PDF link annotation. 
@@ -498,17 +472,20 @@ type LinkAnnotation struct { // NewLinkAnnotation returns a new link annotation. func NewLinkAnnotation( rect types.Rectangle, - quad types.QuadPoints, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + borderCol *color.SimpleColor, + dest *Destination, // supply dest or uri, dest takes precedence uri string, - id string, - f AnnotationFlags, + quad types.QuadPoints, + border bool, borderWidth float64, - borderStyle BorderStyle, - borderCol *color.SimpleColor, - border bool) LinkAnnotation { + borderStyle BorderStyle) LinkAnnotation { - ann := NewAnnotation(AnnLink, rect, "", nil, id, f, borderCol) + ann := NewAnnotation(AnnLink, "", rect, apObjNr, contents, id, modDate, f, borderCol, 0, 0, 0) return LinkAnnotation{ Annotation: ann, @@ -534,22 +511,10 @@ func (ann LinkAnnotation) ContentString() string { } // RenderDict renders ann into a page annotation dict. -func (ann LinkAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.IndirectRef) (types.Dict, error) { - d := types.Dict(map[string]types.Object{ - "Type": types.Name("Annot"), - "Subtype": types.Name(ann.TypeString()), - "Rect": ann.Rect.Array(), - "P": pageIndRef, - "F": types.Integer(ann.F), - "BS": borderStyleDict(ann.BorderWidth, ann.BorderStyle), - }) - - if !ann.Border { - d["Border"] = types.NewIntegerArray(0, 0, 0) - } else { - if ann.C != nil { - d["C"] = ann.C.Array() - } +func (ann LinkAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.Annotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err } if ann.Dest != nil { @@ -579,93 +544,345 @@ func (ann LinkAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.Indi }) d["A"] = actionDict } - if ann.NM != "" { - d.InsertString("NM", ann.NM) // TODO check for uniqueness across annotations on this page. 
- } + if ann.Quad != nil { d.Insert("QuadPoints", ann.Quad.Array()) } + + if !ann.Border { + d["Border"] = types.NewIntegerArray(0, 0, 0) + } else { + if ann.C != nil { + d["C"] = ann.C.Array() + } + } + + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + return d, nil } -// SquareAnnotation represents a square annotation. -type SquareAnnotation struct { +// MarkupAnnotation represents a PDF markup annotation. +type MarkupAnnotation struct { Annotation - FillCol *color.SimpleColor - Margins types.Array - BorderWidth float64 - BorderStyle BorderStyle - CloudyBorder bool - CloudyBorderIntensity int // 0,1,2 + T string // The text label that shall be displayed in the title bar of the annotation’s pop-up window when open and active. This entry shall identify the user who added the annotation. + PopupIndRef *types.IndirectRef // An indirect reference to a pop-up annotation for entering or editing the text associated with this annotation. + CA *float64 // (Default: 1.0) The constant opacity value that shall be used in painting the annotation. + RC string // A rich text string that shall be displayed in the pop-up window when the annotation is opened. + CreationDate string // The date and time when the annotation was created. + Subj string // Text representing a short description of the subject being addressed by the annotation. } -// NewSquareAnnotation returns a new square annotation. -func NewSquareAnnotation( +// NewMarkupAnnotation returns a new markup annotation. 
+func NewMarkupAnnotation( + subType AnnotationType, rect types.Rectangle, - contents string, - id string, + apObjNr int, + contents, id string, + modDate string, f AnnotationFlags, + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, borderWidth float64, - borderStyle BorderStyle, - borderCol *color.SimpleColor, - cloudyBorder bool, - cloudyBorderIntensity int, - fillCol *color.SimpleColor, - MLeft, MTop, MRight, MBot float64) SquareAnnotation { - ann := NewAnnotation(AnnSquare, rect, contents, nil, id, f, borderCol) - - if cloudyBorderIntensity < 0 || cloudyBorderIntensity > 2 { - cloudyBorderIntensity = 0 - } + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string) MarkupAnnotation { - squareAnn := SquareAnnotation{ - Annotation: ann, - FillCol: fillCol, - BorderWidth: borderWidth, - BorderStyle: borderStyle, - CloudyBorder: cloudyBorder, - CloudyBorderIntensity: cloudyBorderIntensity, - } + ann := NewAnnotation(subType, "", rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth) + + return MarkupAnnotation{ + Annotation: ann, + T: title, + PopupIndRef: popupIndRef, + CA: ca, + RC: rc, + CreationDate: types.DateString(time.Now()), + Subj: subject} +} + +// ContentString returns a string representation of ann's content. 
+func (ann MarkupAnnotation) ContentString() string { + s := "\"" + ann.Contents + "\"" + if ann.PopupIndRef != nil { + s += "-> #" + ann.PopupIndRef.ObjectNumber.String() + } + return s +} + +func (ann MarkupAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.Annotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if ann.T != "" { + s, err := types.EscapedUTF16String(ann.T) + if err != nil { + return nil, err + } + d.InsertString("T", *s) + } + + if ann.PopupIndRef != nil { + d.Insert("Popup", *ann.PopupIndRef) + } + + if ann.CA != nil { + d.Insert("CA", types.Float(*ann.CA)) + } + + if ann.RC != "" { + s, err := types.EscapedUTF16String(ann.RC) + if err != nil { + return nil, err + } + d.InsertString("RC", *s) + } + + d.InsertString("CreationDate", ann.CreationDate) + + if ann.Subj != "" { + s, err := types.EscapedUTF16String(ann.Subj) + if err != nil { + return nil, err + } + d.InsertString("Subj", *s) + } + + return d, nil +} + +// TextAnnotation represents a PDF text annotation aka "Sticky Note". +type TextAnnotation struct { + MarkupAnnotation + Open bool // A flag specifying whether the annotation shall initially be displayed open. + Name string // The name of an icon that shall be used in displaying the annotation. Comment, Key, (Note), Help, NewParagraph, Paragraph, Insert +} + +// NewTextAnnotation returns a new text annotation. 
+func NewTextAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + borderRadX float64, + borderRadY float64, + borderWidth float64, + + displayOpen bool, + name string) TextAnnotation { + + ma := NewMarkupAnnotation(AnnText, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject) + + return TextAnnotation{ + MarkupAnnotation: ma, + Open: displayOpen, + Name: name, + } +} + +// RenderDict renders ann into a PDF annotation dict. +func (ann TextAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + d["Open"] = types.Boolean(ann.Open) + + if ann.Name != "" { + d.InsertName("Name", ann.Name) + } + + return d, nil +} + +// FreeTextIntent represents the various free text annotation intents. +type FreeTextIntent int + +const ( + IntentFreeText FreeTextIntent = 1 << iota + IntentFreeTextCallout + IntentFreeTextTypeWriter +) + +func FreeTextIntentName(fti FreeTextIntent) string { + var s string + switch fti { + case IntentFreeText: + s = "FreeText" + case IntentFreeTextCallout: + s = "FreeTextCallout" + case IntentFreeTextTypeWriter: + s = "FreeTextTypeWriter" + } + return s +} + +// FreeText Annotation displays text directly on the page. 
+type FreeTextAnnotation struct { + MarkupAnnotation + Text string // Rich text string, see XFA 3.3 + HAlign types.HAlignment // Code specifying the form of quadding (justification) + FontName string // font name + FontSize int // font size + FontCol *color.SimpleColor // font color + DS string // Default style string + Intent string // Description of the intent of the free text annotation + CallOutLine types.Array // if intent is FreeTextCallout + CallOutLineEndingStyle string + Margins types.Array + BorderWidth float64 + BorderStyle BorderStyle + CloudyBorder bool + CloudyBorderIntensity int // 0,1,2 +} + +// XFA conform rich text string examples: +// The second and fourth words are bold. +// The second and fourth words are italicized. +// For more information see this web site. + +// NewFreeTextAnnotation returns a new free text annotation. +func NewFreeTextAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + text string, + hAlign types.HAlignment, + fontName string, + fontSize int, + fontCol *color.SimpleColor, + ds string, + intent *FreeTextIntent, + callOutLine types.Array, + callOutLineEndingStyle *LineEndingStyle, + MLeft, MTop, MRight, MBot float64, + borderWidth float64, + borderStyle BorderStyle, + cloudyBorder bool, + cloudyBorderIntensity int) FreeTextAnnotation { + + // validate required DA, DS + + // validate callOutline: 2 or 3 points => array of 4 or 6 numbers. 
+ + ma := NewMarkupAnnotation(AnnFreeText, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) + + if cloudyBorderIntensity < 0 || cloudyBorderIntensity > 2 { + cloudyBorderIntensity = 0 + } + + freeTextIntent := "" + if intent != nil { + freeTextIntent = FreeTextIntentName(*intent) + } + + leStyle := "" + if callOutLineEndingStyle != nil { + leStyle = LineEndingStyleName(*callOutLineEndingStyle) + } + + freeTextAnn := FreeTextAnnotation{ + MarkupAnnotation: ma, + Text: text, + HAlign: hAlign, + FontName: fontName, + FontSize: fontSize, + FontCol: fontCol, + DS: ds, + Intent: freeTextIntent, + CallOutLine: callOutLine, + CallOutLineEndingStyle: leStyle, + BorderWidth: borderWidth, + BorderStyle: borderStyle, + CloudyBorder: cloudyBorder, + CloudyBorderIntensity: cloudyBorderIntensity, + } if MLeft > 0 || MTop > 0 || MRight > 0 || MBot > 0 { - squareAnn.Margins = types.NewNumberArray(MLeft, MTop, MRight, MBot) + freeTextAnn.Margins = types.NewNumberArray(MLeft, MTop, MRight, MBot) } - return squareAnn + return freeTextAnn } -// RenderDict renders ann into a page annotation dict. -func (ann SquareAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.IndirectRef) (types.Dict, error) { - d := types.Dict(map[string]types.Object{ - "Type": types.Name("Annot"), - "Subtype": types.Name(ann.TypeString()), - "Rect": ann.Rect.Array(), - "P": pageIndRef, - "F": types.Integer(ann.F), - "BS": borderStyleDict(ann.BorderWidth, ann.BorderStyle), - }) +// RenderDict renders ann into a PDF annotation dict. +func (ann FreeTextAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } - if ann.NM != "" { - d.InsertString("NM", ann.NM) // TODO check for uniqueness across annotations on this page. 
+ da := "" + + // TODO Implement Tf operator + + // fontID, err := xRefTable.EnsureFont(ann.FontName) // in root page Resources? + // if err != nil { + // return nil, err + // } + + // da := fmt.Sprintf("/%s %d Tf", fontID, ann.FontSize) + + if ann.FontCol != nil { + da += fmt.Sprintf(" %.2f %.2f %.2f rg", ann.FontCol.R, ann.FontCol.G, ann.FontCol.B) } + d["DA"] = types.StringLiteral(da) - if ann.Contents != "" { - d.InsertString("Contents", ann.Contents) + d.InsertInt("Q", int(ann.HAlign)) + + if ann.Text == "" { + if ann.Contents == "" { + return nil, errors.New("pdfcpu: FreeTextAnnotation missing \"text\"") + } + ann.Text = ann.Contents } + s, err := types.EscapedUTF16String(ann.Text) + if err != nil { + return nil, err + } + d.InsertString("RC", *s) - if ann.C != nil { - d["C"] = ann.C.Array() + if ann.DS != "" { + d.InsertString("DS", ann.DS) } - if ann.FillCol != nil { - d["IC"] = ann.FillCol.Array() + if ann.Intent != "" { + d.InsertName("IT", ann.Intent) + if ann.Intent == "FreeTextCallout" { + if len(ann.CallOutLine) > 0 { + d["CL"] = ann.CallOutLine + d.InsertName("LE", ann.CallOutLineEndingStyle) + } + } } if ann.Margins != nil { d["RD"] = ann.Margins } + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + if ann.CloudyBorder && ann.CloudyBorderIntensity > 0 { d["BE"] = borderEffectDict(ann.CloudyBorder, ann.CloudyBorderIntensity) } @@ -673,9 +890,185 @@ func (ann SquareAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.In return d, nil } -// CircleAnnotation represents a square annotation. -type CircleAnnotation struct { - Annotation +// LineIntent represents the various line annotation intents. 
+type LineIntent int + +const ( + IntentLineArrow LineIntent = 1 << iota + IntentLineDimension +) + +func LineIntentName(li LineIntent) string { + var s string + switch li { + case IntentLineArrow: + s = "LineArrow" + case IntentLineDimension: + s = "LineDimension" + } + return s +} + +// LineAnnotation represents a line annotation. +type LineAnnotation struct { + MarkupAnnotation + P1, P2 types.Point // Two points in default user space. + LineEndings types.Array // Optional array of two names that shall specify the line ending styles. + LeaderLineLength float64 // Length of leader lines in default user space that extend from each endpoint of the line perpendicular to the line itself. + LeaderLineOffset float64 // Non-negative number that shall represent the length of the leader line offset, which is the amount of empty space between the endpoints of the annotation and the beginning of the leader lines. + LeaderLineExtensionLength float64 // Non-negative number that shall represent the length of leader line extensions that extend from the line proper 180 degrees from the leader lines. + Intent string // Optional description of the intent of the line annotation. + Measure types.Dict // Optional measure dictionary that shall specify the scale and units that apply to the line annotation. + Caption bool // Use text specified by "Contents" or "RC" as caption. + CaptionPositionTop bool // If true, the caption shall be on top of the line; otherwise it shall be centred inside the line. + CaptionOffsetX float64 + CaptionOffsetY float64 + FillCol *color.SimpleColor + BorderWidth float64 + BorderStyle BorderStyle +} + +// NewLineAnnotation returns a new line annotation. 
+func NewLineAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + p1, p2 types.Point, + beginLineEndingStyle *LineEndingStyle, + endLineEndingStyle *LineEndingStyle, + leaderLineLength float64, + leaderLineOffset float64, + leaderLineExtensionLength float64, + intent *LineIntent, + measure types.Dict, + caption bool, + captionPosTop bool, + captionOffsetX float64, + captionOffsetY float64, + fillCol *color.SimpleColor, + borderWidth float64, + borderStyle BorderStyle) LineAnnotation { + + ma := NewMarkupAnnotation(AnnLine, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) + + lineIntent := "" + if intent != nil { + lineIntent = LineIntentName(*intent) + } + + lineAnn := LineAnnotation{ + MarkupAnnotation: ma, + P1: p1, + P2: p2, + LeaderLineLength: leaderLineLength, + LeaderLineOffset: leaderLineOffset, + LeaderLineExtensionLength: leaderLineExtensionLength, + Intent: lineIntent, + Measure: measure, + Caption: caption, + CaptionPositionTop: captionPosTop, + CaptionOffsetX: captionOffsetX, + CaptionOffsetY: captionOffsetY, + FillCol: fillCol, + BorderWidth: borderWidth, + BorderStyle: borderStyle, + } + + if beginLineEndingStyle != nil && endLineEndingStyle != nil { + lineAnn.LineEndings = + types.NewNameArray( + LineEndingStyleName(*beginLineEndingStyle), + LineEndingStyleName(*endLineEndingStyle), + ) + } + + return lineAnn +} + +func (ann LineAnnotation) validateLeaderLineAttrs() error { + if ann.LeaderLineExtensionLength < 0 { + return errors.New("pdfcpu: LineAnnotation leader line extension length must not be negative.") + } + + if ann.LeaderLineExtensionLength > 0 && ann.LeaderLineLength == 0 { + return errors.New("pdfcpu: LineAnnotation leader line length missing.") + } + + if ann.LeaderLineOffset < 0 { + return errors.New("pdfcpu: 
LineAnnotation leader line offset must not be negative.") + } + + return nil +} + +// RenderDict renders ann into a PDF annotation dict. +func (ann LineAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if err := ann.validateLeaderLineAttrs(); err != nil { + return nil, err + } + + d["L"] = types.NewNumberArray(ann.P1.X, ann.P1.Y, ann.P2.X, ann.P2.Y) + + if ann.LeaderLineExtensionLength > 0 { + d["LLE"] = types.Float(ann.LeaderLineExtensionLength) + } + + if ann.LeaderLineLength > 0 { + d["LL"] = types.Float(ann.LeaderLineLength) + if ann.LeaderLineOffset > 0 { + d["LLO"] = types.Float(ann.LeaderLineOffset) + } + } + + if len(ann.Measure) > 0 { + d["Measure"] = ann.Measure + } + + if ann.Intent != "" { + d.InsertName("IT", ann.Intent) + + } + + d["Cap"] = types.Boolean(ann.Caption) + if ann.Caption { + if ann.CaptionPositionTop { + d["CP"] = types.Name("Top") + } + d["CO"] = types.NewNumberArray(ann.CaptionOffsetX, ann.CaptionOffsetY) + } + + if ann.FillCol != nil { + d["IC"] = ann.FillCol.Array() + } + + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + + if len(ann.LineEndings) == 2 { + d["LE"] = ann.LineEndings + } + + return d, nil +} + +// SquareAnnotation represents a square annotation. +type SquareAnnotation struct { + MarkupAnnotation FillCol *color.SimpleColor Margins types.Array BorderWidth float64 @@ -684,28 +1077,34 @@ type CircleAnnotation struct { CloudyBorderIntensity int // 0,1,2 } -// NewCircleAnnotation returns a new circle annotation. -func NewCircleAnnotation( +// NewSquareAnnotation returns a new square annotation. 
+func NewSquareAnnotation( rect types.Rectangle, - contents string, - id string, + apObjNr int, + contents, id string, + modDate string, f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + fillCol *color.SimpleColor, + MLeft, MTop, MRight, MBot float64, borderWidth float64, borderStyle BorderStyle, - borderCol *color.SimpleColor, cloudyBorder bool, - cloudyBorderIntensity int, - fillCol *color.SimpleColor, - MLeft, MTop, MRight, MBot float64) CircleAnnotation { + cloudyBorderIntensity int) SquareAnnotation { - ann := NewAnnotation(AnnCircle, rect, contents, nil, id, f, borderCol) + ma := NewMarkupAnnotation(AnnSquare, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) if cloudyBorderIntensity < 0 || cloudyBorderIntensity > 2 { cloudyBorderIntensity = 0 } - circleAnn := CircleAnnotation{ - Annotation: ann, + squareAnn := SquareAnnotation{ + MarkupAnnotation: ma, FillCol: fillCol, BorderWidth: borderWidth, BorderStyle: borderStyle, @@ -714,33 +1113,17 @@ func NewCircleAnnotation( } if MLeft > 0 || MTop > 0 || MRight > 0 || MBot > 0 { - circleAnn.Margins = types.NewNumberArray(MLeft, MTop, MRight, MBot) + squareAnn.Margins = types.NewNumberArray(MLeft, MTop, MRight, MBot) } - return circleAnn + return squareAnn } // RenderDict renders ann into a page annotation dict. -func (ann CircleAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.IndirectRef) (types.Dict, error) { - d := types.Dict(map[string]types.Object{ - "Type": types.Name("Annot"), - "Subtype": types.Name(ann.TypeString()), - "Rect": ann.Rect.Array(), - "P": pageIndRef, - "F": types.Integer(ann.F), - "BS": borderStyleDict(ann.BorderWidth, ann.BorderStyle), - }) - - if ann.NM != "" { - d.InsertString("NM", ann.NM) // TODO check for uniqueness across annotations on this page. 
- } - - if ann.Contents != "" { - d.InsertString("Contents", ann.Contents) - } - - if ann.C != nil { - d["C"] = ann.C.Array() +func (ann SquareAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err } if ann.FillCol != nil { @@ -751,9 +1134,593 @@ func (ann CircleAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef types.In d["RD"] = ann.Margins } + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + if ann.CloudyBorder && ann.CloudyBorderIntensity > 0 { d["BE"] = borderEffectDict(ann.CloudyBorder, ann.CloudyBorderIntensity) } return d, nil } + +// CircleAnnotation represents a circle annotation. +type CircleAnnotation struct { + MarkupAnnotation + FillCol *color.SimpleColor + Margins types.Array + BorderWidth float64 + BorderStyle BorderStyle + CloudyBorder bool + CloudyBorderIntensity int // 0,1,2 +} + +// NewCircleAnnotation returns a new circle annotation. 
+func NewCircleAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + fillCol *color.SimpleColor, + MLeft, MTop, MRight, MBot float64, + borderWidth float64, + borderStyle BorderStyle, + cloudyBorder bool, + cloudyBorderIntensity int) CircleAnnotation { + + ma := NewMarkupAnnotation(AnnCircle, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) + + if cloudyBorderIntensity < 0 || cloudyBorderIntensity > 2 { + cloudyBorderIntensity = 0 + } + + circleAnn := CircleAnnotation{ + MarkupAnnotation: ma, + FillCol: fillCol, + BorderWidth: borderWidth, + BorderStyle: borderStyle, + CloudyBorder: cloudyBorder, + CloudyBorderIntensity: cloudyBorderIntensity, + } + + if MLeft > 0 || MTop > 0 || MRight > 0 || MBot > 0 { + circleAnn.Margins = types.NewNumberArray(MLeft, MTop, MRight, MBot) + } + + return circleAnn +} + +// RenderDict renders ann into a page annotation dict. +func (ann CircleAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if ann.FillCol != nil { + d["IC"] = ann.FillCol.Array() + } + + if ann.Margins != nil { + d["RD"] = ann.Margins + } + + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + + if ann.CloudyBorder && ann.CloudyBorderIntensity > 0 { + d["BE"] = borderEffectDict(ann.CloudyBorder, ann.CloudyBorderIntensity) + } + + return d, nil +} + +// PolygonIntent represents the various polygon annotation intents. 
+type PolygonIntent int + +const ( + IntentPolygonCloud PolygonIntent = 1 << iota + IntentPolygonDimension +) + +func PolygonIntentName(pi PolygonIntent) string { + var s string + switch pi { + case IntentPolygonCloud: + s = "PolygonCloud" + case IntentPolygonDimension: + s = "PolygonDimension" + } + return s +} + +// PolygonAnnotation represents a polygon annotation. +type PolygonAnnotation struct { + MarkupAnnotation + Vertices types.Array // Array of numbers specifying the alternating horizontal and vertical coordinates, respectively, of each vertex, in default user space. + Path types.Array // Array of n arrays, each supplying the operands for a path building operator (m, l or c). + Intent string // Optional description of the intent of the polygon annotation. + Measure types.Dict // Optional measure dictionary that shall specify the scale and units that apply to the annotation. + FillCol *color.SimpleColor + BorderWidth float64 + BorderStyle BorderStyle + CloudyBorder bool + CloudyBorderIntensity int // 0,1,2 +} + +// NewPolygonAnnotation returns a new polygon annotation. 
+func NewPolygonAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + vertices types.Array, + path types.Array, + intent *PolygonIntent, + measure types.Dict, + fillCol *color.SimpleColor, + borderWidth float64, + borderStyle BorderStyle, + cloudyBorder bool, + cloudyBorderIntensity int) PolygonAnnotation { + + ma := NewMarkupAnnotation(AnnPolygon, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) + + polygonIntent := "" + if intent != nil { + polygonIntent = PolygonIntentName(*intent) + } + + if cloudyBorderIntensity < 0 || cloudyBorderIntensity > 2 { + cloudyBorderIntensity = 0 + } + + polygonAnn := PolygonAnnotation{ + MarkupAnnotation: ma, + Vertices: vertices, + Path: path, + Intent: polygonIntent, + Measure: measure, + FillCol: fillCol, + BorderWidth: borderWidth, + BorderStyle: borderStyle, + CloudyBorder: cloudyBorder, + CloudyBorderIntensity: cloudyBorderIntensity, + } + + return polygonAnn +} + +// RenderDict renders ann into a PDF annotation dict. 
+func (ann PolygonAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if len(ann.Measure) > 0 { + d["Measure"] = ann.Measure + } + + if len(ann.Vertices) > 0 && len(ann.Path) > 0 { + return nil, errors.New("pdfcpu: PolygonAnnotation supports \"Vertices\" or \"Path\" only") + } + + if len(ann.Vertices) > 0 { + d["Vertices"] = ann.Vertices + } else { + d["Path"] = ann.Path + } + + if ann.Intent != "" { + d.InsertName("IT", ann.Intent) + + } + + if ann.FillCol != nil { + d["IC"] = ann.FillCol.Array() + } + + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + + if ann.CloudyBorder && ann.CloudyBorderIntensity > 0 { + d["BE"] = borderEffectDict(ann.CloudyBorder, ann.CloudyBorderIntensity) + } + + return d, nil +} + +// PolyLineIntent represents the various polyline annotation intents. +type PolyLineIntent int + +const ( + IntentPolyLinePolygonCloud PolyLineIntent = 1 << iota + IntentPolyLineDimension +) + +func PolyLineIntentName(pi PolyLineIntent) string { + var s string + switch pi { + case IntentPolyLineDimension: + s = "PolyLineDimension" + } + return s +} + +type PolyLineAnnotation struct { + MarkupAnnotation + Vertices types.Array // Array of numbers specifying the alternating horizontal and vertical coordinates, respectively, of each vertex, in default user space. + Path types.Array // Array of n arrays, each supplying the operands for a path building operator (m, l or c). + Intent string // Optional description of the intent of the polyline annotation. + Measure types.Dict // Optional measure dictionary that shall specify the scale and units that apply to the annotation. + FillCol *color.SimpleColor + BorderWidth float64 + BorderStyle BorderStyle + LineEndings types.Array // Optional array of two names that shall specify the line ending styles. 
+} + +// NewPolyLineAnnotation returns a new polyline annotation. +func NewPolyLineAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + vertices types.Array, + path types.Array, + intent *PolyLineIntent, + measure types.Dict, + fillCol *color.SimpleColor, + borderWidth float64, + borderStyle BorderStyle, + beginLineEndingStyle *LineEndingStyle, + endLineEndingStyle *LineEndingStyle) PolyLineAnnotation { + + ma := NewMarkupAnnotation(AnnPolyLine, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) + + polyLineIntent := "" + if intent != nil { + polyLineIntent = PolyLineIntentName(*intent) + } + + polyLineAnn := PolyLineAnnotation{ + MarkupAnnotation: ma, + Vertices: vertices, + Path: path, + Intent: polyLineIntent, + Measure: measure, + FillCol: fillCol, + BorderWidth: borderWidth, + BorderStyle: borderStyle, + } + + if beginLineEndingStyle != nil && endLineEndingStyle != nil { + polyLineAnn.LineEndings = + types.NewNameArray( + LineEndingStyleName(*beginLineEndingStyle), + LineEndingStyleName(*endLineEndingStyle), + ) + } + + return polyLineAnn +} + +// RenderDict renders ann into a PDF annotation dict. 
+func (ann PolyLineAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if len(ann.Measure) > 0 { + d["Measure"] = ann.Measure + } + + if len(ann.Vertices) > 0 && len(ann.Path) > 0 { + return nil, errors.New("pdfcpu: PolyLineAnnotation supports \"Vertices\" or \"Path\" only") + } + + if len(ann.Vertices) > 0 { + d["Vertices"] = ann.Vertices + } else { + d["Path"] = ann.Path + } + + if ann.Intent != "" { + d.InsertName("IT", ann.Intent) + + } + + if ann.FillCol != nil { + d["IC"] = ann.FillCol.Array() + } + + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + + if len(ann.LineEndings) == 2 { + d["LE"] = ann.LineEndings + } + + return d, nil +} + +type TextMarkupAnnotation struct { + MarkupAnnotation + Quad types.QuadPoints +} + +func NewTextMarkupAnnotation( + subType AnnotationType, + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + quad types.QuadPoints) TextMarkupAnnotation { + + ma := NewMarkupAnnotation(subType, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject) + + return TextMarkupAnnotation{ + MarkupAnnotation: ma, + Quad: quad, + } +} + +func (ann TextMarkupAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if ann.Quad != nil { + d.Insert("QuadPoints", ann.Quad.Array()) + } + + return d, nil +} + +type HighlightAnnotation struct { + TextMarkupAnnotation +} + +func NewHighlightAnnotation( + rect 
types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + quad types.QuadPoints) HighlightAnnotation { + + return HighlightAnnotation{ + NewTextMarkupAnnotation(AnnHighLight, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject, quad), + } +} + +type UnderlineAnnotation struct { + TextMarkupAnnotation +} + +func NewUnderlineAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + quad types.QuadPoints) UnderlineAnnotation { + + return UnderlineAnnotation{ + NewTextMarkupAnnotation(AnnUnderline, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject, quad), + } +} + +type SquigglyAnnotation struct { + TextMarkupAnnotation +} + +func NewSquigglyAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + quad types.QuadPoints) SquigglyAnnotation { + + return SquigglyAnnotation{ + NewTextMarkupAnnotation(AnnSquiggly, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject, quad), + } +} + +type StrikeOutAnnotation struct { + TextMarkupAnnotation +} + +func NewStrikeOutAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col 
*color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + quad types.QuadPoints) StrikeOutAnnotation { + + return StrikeOutAnnotation{ + NewTextMarkupAnnotation(AnnStrikeOut, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject, quad), + } +} + +type CaretAnnotation struct { + MarkupAnnotation + RD *types.Rectangle // A set of four numbers that shall describe the numerical differences between two rectangles: the Rect entry of the annotation and the actual boundaries of the underlying caret. + Paragraph bool // A new paragraph symbol (¶) shall be associated with the caret. +} + +func NewCaretAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + borderRadX float64, + borderRadY float64, + borderWidth float64, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + rd *types.Rectangle, + paragraph bool) CaretAnnotation { + + ma := NewMarkupAnnotation(AnnCaret, rect, apObjNr, contents, id, modDate, f, col, borderRadX, borderRadY, borderWidth, title, popupIndRef, ca, rc, subject) + + return CaretAnnotation{ + MarkupAnnotation: ma, + RD: rd, + Paragraph: paragraph, + } +} + +func (ann CaretAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + if ann.RD != nil { + d["RD"] = ann.RD.Array() + } + + if ann.Paragraph { + d["Sy"] = types.Name("P") + } + + return d, nil +} + +// A series of alternating x and y coordinates in PDF user space, specifying points along the path. 
+type InkPath []float64 + +type InkAnnotation struct { + MarkupAnnotation + InkList []InkPath // Array of n arrays, each representing a stroked path of points in user space. + BorderWidth float64 + BorderStyle BorderStyle +} + +func NewInkAnnotation( + rect types.Rectangle, + apObjNr int, + contents, id string, + modDate string, + f AnnotationFlags, + col *color.SimpleColor, + title string, + popupIndRef *types.IndirectRef, + ca *float64, + rc, subject string, + + ink []InkPath, + borderWidth float64, + borderStyle BorderStyle) InkAnnotation { + + ma := NewMarkupAnnotation(AnnInk, rect, apObjNr, contents, id, modDate, f, col, 0, 0, 0, title, popupIndRef, ca, rc, subject) + + return InkAnnotation{ + MarkupAnnotation: ma, + InkList: ink, + BorderWidth: borderWidth, + BorderStyle: borderStyle, + } +} + +func (ann InkAnnotation) RenderDict(xRefTable *XRefTable, pageIndRef *types.IndirectRef) (types.Dict, error) { + d, err := ann.MarkupAnnotation.RenderDict(xRefTable, pageIndRef) + if err != nil { + return nil, err + } + + ink := types.Array{} + for i := range ann.InkList { + ink = append(ink, types.NewNumberArray(ann.InkList[i]...)) + } + d["InkList"] = ink + + if ann.BorderWidth > 0 { + d["BS"] = borderStyleDict(ann.BorderWidth, ann.BorderStyle) + } + + return d, nil +} diff --git a/pkg/pdfcpu/model/attach.go b/pkg/pdfcpu/model/attach.go index d5e7a235..4c7c8760 100644 --- a/pkg/pdfcpu/model/attach.go +++ b/pkg/pdfcpu/model/attach.go @@ -23,7 +23,6 @@ import ( "sort" "time" - "github.com/angel-one/pdfcpu/pkg/filter" "github.com/angel-one/pdfcpu/pkg/log" "github.com/angel-one/pdfcpu/pkg/pdfcpu/types" "github.com/pkg/errors" @@ -42,7 +41,7 @@ func (a Attachment) String() string { return fmt.Sprintf("Attachment: id:%s desc:%s modTime:%s", a.ID, a.Desc, a.ModTime) } -func decodeFileSpecStreamDict(sd *types.StreamDict, id string) error { +func decodeFileSpecStreamDict(sd *types.StreamDict) error { fpl := sd.FilterPipeline if fpl == nil { @@ -50,23 +49,6 @@ func 
decodeFileSpecStreamDict(sd *types.StreamDict, id string) error { return nil } - // Ignore filter chains with length > 1 - if len(fpl) > 1 { - if log.DebugEnabled() { - log.Debug.Printf("decodedFileSpecStreamDict: ignore %s, more than 1 filter.\n", id) - } - return nil - } - - // Only FlateDecode supported. - if fpl[0].Name != filter.Flate { - if log.DebugEnabled() { - log.Debug.Printf("decodedFileSpecStreamDict: ignore %s, %s filter unsupported.\n", id, fpl[0].Name) - } - return nil - } - - // Decode streamDict for supported filters only. return sd.Decode() } @@ -92,7 +74,7 @@ func fileSpecStreamDict(xRefTable *XRefTable, d types.Dict) (*types.StreamDict, } d, err := xRefTable.DereferenceDict(o) - if err != nil || o == nil { + if err != nil || d == nil { return nil, err } @@ -122,6 +104,27 @@ func (xRefTable *XRefTable) NewFileSpecDictForAttachment(a Attachment) (types.Di return xRefTable.NewFileSpecDict(a.ID, a.ID, a.Desc, *sd) } +func getModDate(xRefTable *XRefTable, obj types.Object) (*time.Time, error) { + errInvalidModDate := errors.New("pdfcpu: invalid date ModDate") + o, err := xRefTable.Dereference(obj) + if err != nil || o == nil { + return nil, errInvalidModDate + } + sl, ok := o.(types.StringLiteral) + if !ok { + return nil, errInvalidModDate + } + s, err := types.StringLiteralToString(sl) + if err != nil { + return nil, errInvalidModDate + } + md, ok := types.DateTime(s, xRefTable.ValidationMode == ValidationRelaxed) + if !ok { + return nil, errInvalidModDate + } + return &md, nil +} + func fileSpecStreamDictInfo(xRefTable *XRefTable, id string, o types.Object, decode bool) (*types.StreamDict, string, string, *time.Time, error) { d, err := xRefTable.DereferenceDict(o) if err != nil { @@ -149,16 +152,16 @@ func fileSpecStreamDictInfo(xRefTable *XRefTable, id string, o types.Object, dec var modDate *time.Time if d = sd.DictEntry("Params"); d != nil { - if s := d.StringEntry("ModDate"); s != nil { - dt, ok := types.DateTime(*s, xRefTable.ValidationMode == 
ValidationRelaxed) - if !ok { - return nil, desc, "", nil, errors.New("pdfcpu: invalid date ModDate") + obj, ok := d.Find("ModDate") + if ok { + modDate, err = getModDate(xRefTable, obj) + if err != nil { + return nil, desc, "", nil, err } - modDate = &dt } } - err = decodeFileSpecStreamDict(sd, id) + err = decodeFileSpecStreamDict(sd) return sd, desc, fileName, modDate, err } diff --git a/pkg/pdfcpu/model/booklet.go b/pkg/pdfcpu/model/booklet.go index 3847a41f..7e402df0 100644 --- a/pkg/pdfcpu/model/booklet.go +++ b/pkg/pdfcpu/model/booklet.go @@ -64,6 +64,11 @@ func (b BookletBinding) String() string { return "" } +type BookletPage struct { + Number int + Rotate bool +} + func drawGuideLineLabel(w io.Writer, x, y float64, s string, mb *types.Rectangle, fm FontMap, rot int) { fontName := "Helvetica" td := TextDescriptor{ @@ -145,8 +150,13 @@ func getCutFolds(nup *NUp) (horizontal cutOrFold, vertical cutOrFold) { // Really, it has two horizontal cuts. return cut, fold case 8: - // Also has a horizontal cut in the center. - return fold, cut + if nup.BookletBinding == LongEdge { + // Also has cuts in the center row & column. + return cut, cut + } else { + // short edge has the fold in the center col. 
cut on each row + return cut, fold + } } return none, none } @@ -206,10 +216,17 @@ func DrawBookletGuides(nup *NUp, w io.Writer) FontMap { drawGuideHorizontal(w, height*1/3, width, horz, nup, mb, fm) drawGuideHorizontal(w, height*2/3, width, horz, nup, mb, fm) case 8: - // 8up: middle cut and 1/4,3/4 folds - drawGuideHorizontal(w, height/2, width, cut, nup, mb, fm) - drawGuideHorizontal(w, height*1/4, width, fold, nup, mb, fm) - drawGuideHorizontal(w, height*3/4, width, fold, nup, mb, fm) + if nup.BookletBinding == LongEdge { + // 8up: middle cut and 1/4,3/4 folds + drawGuideHorizontal(w, height/2, width, cut, nup, mb, fm) + drawGuideHorizontal(w, height*1/4, width, fold, nup, mb, fm) + drawGuideHorizontal(w, height*3/4, width, fold, nup, mb, fm) + } else { + // short edge: cuts on rows + for i := 1; i < 4; i++ { + drawGuideHorizontal(w, height*float64(i)/4, width, cut, nup, mb, fm) + } + } } } if vert != none { diff --git a/pkg/pdfcpu/model/box.go b/pkg/pdfcpu/model/box.go index e5357b3e..67864955 100644 --- a/pkg/pdfcpu/model/box.go +++ b/pkg/pdfcpu/model/box.go @@ -181,7 +181,7 @@ func processBox(b **Box, boxID, paramValueStr string, unit types.DisplayUnit) er boxVal, err := resolveBoxType(paramValueStr) if err == nil { if boxVal == boxID { - return errors.Errorf("pdfcpu: invalid box self assigment: %s", boxID) + return errors.Errorf("pdfcpu: invalid box self assignment: %s", boxID) } *b = &Box{RefBox: boxVal} return nil diff --git a/pkg/pdfcpu/model/certificate.go b/pkg/pdfcpu/model/certificate.go new file mode 100644 index 00000000..c3cf1328 --- /dev/null +++ b/pkg/pdfcpu/model/certificate.go @@ -0,0 +1,114 @@ +/* +Copyright 2025 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package model + +import ( + "crypto/x509" + "crypto/x509/pkix" + "fmt" + "os" + "strings" +) + +// CertDir is the location for installed certificates. +var CertDir string + +// UserCertPool contains all certificates loaded from CertDir. +var UserCertPool *x509.CertPool + +// TODO Do we need locking? +//var UserCertPoolLock = &sync.RWMutex{} + +func IsPEM(fname string) bool { + return strings.HasSuffix(strings.ToLower(fname), ".pem") +} + +func IsP7C(fname string) bool { + return strings.HasSuffix(strings.ToLower(fname), ".p7c") +} + +func strSliceString(ss []string) string { + if len(ss) == 0 { + return "" + } + ss1 := []string{} + ss1 = append(ss1, ss...) 
+ return strings.Join(ss1, ",") +} + +func nameString(subj pkix.Name) string { + var sb strings.Builder + + sb.WriteString(fmt.Sprintf(" org : %s", strSliceString(subj.Organization))) + + if len(subj.OrganizationalUnit) > 0 { + sb.WriteString(fmt.Sprintf("\n unit : %s", strSliceString(subj.OrganizationalUnit))) + } + + if len(subj.CommonName) > 0 { + sb.WriteString(fmt.Sprintf("\n name : %s", subj.CommonName)) + } + + if len(subj.StreetAddress) > 0 { + sb.WriteString(fmt.Sprintf("\n street : %s", strSliceString(subj.StreetAddress))) + } + + if len(subj.Locality) > 0 { + sb.WriteString(fmt.Sprintf("\n locality : %s", strSliceString(subj.Locality))) + } + + if len(subj.Province) > 0 { + sb.WriteString(fmt.Sprintf("\n province : %s", strSliceString(subj.Province))) + } + + if len(subj.PostalCode) > 0 { + sb.WriteString(fmt.Sprintf("\n postalCode: %s", strSliceString(subj.PostalCode))) + } + + if len(subj.Country) > 0 { + sb.WriteString(fmt.Sprintf("\n country : %s", strSliceString(subj.Country))) + } + + return sb.String() +} + +func CertString(cert *x509.Certificate) string { + + return fmt.Sprintf( + " Subject:\n%s\n"+ + " Issuer:\n%s\n"+ + " from: %s\n"+ + " thru: %s\n"+ + " CA: %t\n", + nameString(cert.Subject), + nameString(cert.Issuer), + cert.NotBefore.Format("2006-01-02"), + cert.NotAfter.Format("2006-01-02"), + cert.IsCA, + ) +} + +func ResetCertificates() error { + + // remove certs/*.pem + + path, err := os.UserConfigDir() + if err != nil { + path = os.TempDir() + } + return EnsureDefaultConfigAt(path, true) +} diff --git a/pkg/pdfcpu/model/configuration.go b/pkg/pdfcpu/model/configuration.go index ef55aecb..089e6c9e 100644 --- a/pkg/pdfcpu/model/configuration.go +++ b/pkg/pdfcpu/model/configuration.go @@ -17,10 +17,12 @@ limitations under the License. 
package model import ( + "embed" _ "embed" "fmt" "os" "path/filepath" + "strings" "time" "github.com/angel-one/pdfcpu/pkg/font" @@ -121,6 +123,7 @@ const ( IMPORTBOOKMARKS EXPORTBOOKMARKS LISTIMAGES + UPDATEIMAGES CREATE DUMP LISTFORMFIELDS @@ -152,6 +155,12 @@ const ( SETVIEWERPREFERENCES RESETVIEWERPREFERENCES ZOOM + ADDSIGNATURE + VALIDATESIGNATURE + LISTCERTIFICATES + INSPECTCERTIFICATES + IMPORTCERTIFICATES + VALIDATESIGNATURES ) // Configuration of a Context. @@ -159,6 +168,10 @@ type Configuration struct { // Location of corresponding config.yml Path string + CreationDate string + + Version string + // Check filename extensions. CheckFileNameExt bool @@ -227,14 +240,39 @@ type Configuration struct { // Date format. DateFormat string - // Optimize duplicate content streams across pages. + // Optimize after reading and validating the xreftable but before processing. + Optimize bool + + // Optimize after processing but before writing. + // TODO add to config.yml + OptimizeBeforeWriting bool + + // Optimize page resources via content stream analysis. (assuming Optimize == true || OptimizeBeforeWriting == true) + OptimizeResourceDicts bool + + // Optimize duplicate content streams across pages. (assuming Optimize == true || OptimizeBeforeWriting == true) OptimizeDuplicateContentStreams bool - // Merge creates bookmarks + // Merge creates bookmarks. CreateBookmarks bool // PDF Viewer is expected to supply appearance streams for form fields. NeedAppearances bool + + // Internet availability. + Offline bool + + // HTTP timeout in seconds. + Timeout int + + // Http timeout in seconds for CRL revocation checking. + TimeoutCRL int + + // Http timeout in seconds for OCSP revocation checking. + TimeoutOCSP int + + // Preferred certificate revocation checking mechanism: CRL, OSCP + PreferredCertRevocationChecker int } // ConfigPath defines the location of pdfcpu's configuration directory. 
@@ -256,11 +294,29 @@ var configFileBytes []byte //go:embed resources/Roboto-Regular.ttf var robotoFontFileBytes []byte -func ensureConfigFileAt(path string) error { +//go:embed resources/certs/*.p7c +var certFilesEU embed.FS + +func ensureConfigFileAt(path string, override bool) error { f, err := os.Open(path) - if err != nil { + if err != nil || override { f.Close() - s := fmt.Sprintf("#############################\n# pdfcpu %s #\n# Created: %s #\n", VersionStr, time.Now().Format("2006-01-02 15:04")) + + s := fmt.Sprintf(` +############################# +# Default configuration # +############################# + +# Creation date +created: %s + +# version (Do not edit!) +version: %s + +`, + time.Now().Format("2006-01-02 15:04"), + VersionStr) + bb := append([]byte(s), configFileBytes...) if err := os.WriteFile(path, bb, os.ModePerm); err != nil { return err @@ -275,40 +331,117 @@ func ensureConfigFileAt(path string) error { return parseConfigFile(f, path) } -// EnsureDefaultConfigAt tries to load the default configuration from path. -// If path/pdfcpu/config.yaml is not found, it will be created. -func EnsureDefaultConfigAt(path string) error { - configDir := filepath.Join(path, "pdfcpu") - font.UserFontDir = filepath.Join(configDir, "fonts") - if err := os.MkdirAll(font.UserFontDir, os.ModePerm); err != nil { - return err - } - if err := ensureConfigFileAt(filepath.Join(configDir, "config.yml")); err != nil { - return err +func onlyHidden(files []os.DirEntry) bool { + for _, file := range files { + if !strings.HasPrefix(file.Name(), ".") { + return false + } } - //fmt.Println(loadedDefaultConfig) + return true +} +func initUserFonts() error { files, err := os.ReadDir(font.UserFontDir) if err != nil { return err } - if len(files) == 0 { + if onlyHidden(files) { // Ensure Roboto font for form filling. 
- fn := "Roboto-Regular" + fontname := "Roboto-Regular" if log.CLIEnabled() { log.CLI.Printf("installing user font:") } - if err := font.InstallFontFromBytes(font.UserFontDir, fn, robotoFontFileBytes); err != nil { - if log.CLIEnabled() { - log.CLI.Printf("%v", err) - } + if err := font.InstallFontFromBytes(font.UserFontDir, fontname, robotoFontFileBytes); err != nil { + return err } } return font.LoadUserFonts() } +func initCertificates() error { + // NOTE + // Load certs managed by The European Union Trusted Lists (EUTL) (https://eidas.ec.europa.eu/efda/trust-services/browse/eidas/tls). + // Additional certificates may be loaded using the corresponding CLI command: pdfcpu certificates import + // Certificates will be loaded by corresponding commands where applicable. + + files, err := os.ReadDir(CertDir) + if err != nil { + return err + } + if !onlyHidden(files) { + return nil + } + + files, err = certFilesEU.ReadDir("resources/certs") + if err != nil { + return err + } + + euDir := filepath.Join(CertDir, "eu") + if err := os.MkdirAll(euDir, os.ModePerm); err != nil { + return err + } + + for _, file := range files { + //fmt.Println("Embedded file:", file.Name()) + + content, err := certFilesEU.ReadFile("resources/certs/" + file.Name()) + if err != nil { + return err + } + + path := filepath.Join(euDir, file.Name()) + //fmt.Printf("writing to %s\n", path) + + destFile, err := os.Create(path) + if err != nil { + return err + } + defer destFile.Close() + + _, err = destFile.Write(content) + if err != nil { + return err + } + } + + return nil +} + +// EnsureDefaultConfigAt tries to load the default configuration from path. +// If path/pdfcpu/config.yaml is not found, it will be created. 
+func EnsureDefaultConfigAt(path string, override bool) error { + configDir := filepath.Join(path, "pdfcpu") + if err := os.MkdirAll(configDir, os.ModePerm); err != nil { + return err + } + if err := ensureConfigFileAt(filepath.Join(configDir, "config.yml"), override); err != nil { + return err + } + + font.UserFontDir = filepath.Join(configDir, "fonts") + if err := os.MkdirAll(font.UserFontDir, os.ModePerm); err != nil { + return err + } + if err := initUserFonts(); err != nil { + return err + } + + CertDir = filepath.Join(configDir, "certs") + if err := os.MkdirAll(CertDir, os.ModePerm); err != nil { + return err + } + if err := initCertificates(); err != nil { + return err + } + + //fmt.Println(loadedDefaultConfig) + + return nil +} + func newDefaultConfiguration() *Configuration { // NOTE: Needs to stay in sync with config.yml // @@ -316,6 +449,8 @@ func newDefaultConfiguration() *Configuration { // cli: supply -conf disable // api: call api.DisableConfigDir() return &Configuration{ + CreationDate: time.Now().Format("2006-01-02 15:04"), + Version: VersionStr, CheckFileNameExt: true, Reader15: true, DecodeAllStreams: false, @@ -329,10 +464,24 @@ func newDefaultConfiguration() *Configuration { Permissions: PermissionsPrint, TimestampFormat: "2006-01-02 15:04", DateFormat: "2006-01-02", + Optimize: true, + OptimizeBeforeWriting: true, + OptimizeResourceDicts: true, OptimizeDuplicateContentStreams: false, CreateBookmarks: true, NeedAppearances: false, + Offline: false, + Timeout: 5, + PreferredCertRevocationChecker: CRL, + } +} + +func ResetConfig() error { + path, err := os.UserConfigDir() + if err != nil { + path = os.TempDir() } + return EnsureDefaultConfigAt(path, true) } // NewDefaultConfiguration returns the default pdfcpu configuration. 
@@ -346,11 +495,11 @@ func NewDefaultConfiguration() *Configuration { if err != nil { path = os.TempDir() } - if err = EnsureDefaultConfigAt(path); err == nil { + if err = EnsureDefaultConfigAt(path, false); err == nil { c := *loadedDefaultConfig return &c } - fmt.Fprintf(os.Stderr, "pdfcpu: config dir problem: %v\n", err) + fmt.Fprintf(os.Stderr, "pdfcpu: config problem: %v\n", err) os.Exit(1) } // Bypass config.yml @@ -377,49 +526,6 @@ func NewRC4Configuration(userPW, ownerPW string, keyLength int) *Configuration { return c } -func (c Configuration) String() string { - path := "default" - if len(c.Path) > 0 { - path = c.Path - } - return fmt.Sprintf("pdfcpu configuration:\n"+ - "Path: %s\n"+ - "CheckFileNameExt: %t\n"+ - "Reader15: %t\n"+ - "DecodeAllStreams: %t\n"+ - "ValidationMode: %s\n"+ - "Eol: %s\n"+ - "WriteObjectStream: %t\n"+ - "WriteXrefStream: %t\n"+ - "EncryptUsingAES: %t\n"+ - "EncryptKeyLength: %d\n"+ - "Permissions: %d\n"+ - "Unit : %s\n"+ - "TimestampFormat: %s\n"+ - "DateFormat: %s\n"+ - "OptimizeDuplicateContentStreams %t\n"+ - "CreateBookmarks %t\n"+ - "NeedAppearances %t\n", - path, - c.CheckFileNameExt, - c.Reader15, - c.DecodeAllStreams, - c.ValidationModeString(), - c.EolString(), - c.WriteObjectStream, - c.WriteXRefStream, - c.EncryptUsingAES, - c.EncryptKeyLength, - c.Permissions, - c.UnitString(), - c.TimestampFormat, - c.DateFormat, - c.OptimizeDuplicateContentStreams, - c.CreateBookmarks, - c.NeedAppearances, - ) -} - // EolString returns a string rep for the eol in effect. func (c *Configuration) EolString() string { var s string @@ -442,6 +548,14 @@ func (c *Configuration) ValidationModeString() string { return "relaxed" } +// PreferredCertRevocationCheckerString returns a string rep for the preferred certificate revocation checker in effect. 
+func (c *Configuration) PreferredCertRevocationCheckerString() string {
+	if c.PreferredCertRevocationChecker == CRL {
+		return "CRL"
+	}
+	return "OSCP"
+}
+
 // UnitString returns a string rep for the display unit in effect.
 func (c *Configuration) UnitString() string {
 	var s string
diff --git a/pkg/pdfcpu/model/context.go b/pkg/pdfcpu/model/context.go
index 98973a02..6d0e4a84 100644
--- a/pkg/pdfcpu/model/context.go
+++ b/pkg/pdfcpu/model/context.go
@@ -183,17 +183,18 @@ type ReadContext struct {
 	FileSize            int64         // Input file size.
 	RS                  io.ReadSeeker // Input read seeker.
 	EolCount            int           // 1 or 2 characters used for eol.
-	BinaryTotalSize     int64         // total stream data
-	BinaryImageSize     int64         // total image stream data
-	BinaryFontSize      int64         // total font stream data (fontfiles)
-	BinaryImageDuplSize int64         // total obsolet image stream data after optimization
-	BinaryFontDuplSize  int64         // total obsolet font stream data after optimization
-	Linearized          bool          // File is linearized.
-	Hybrid              bool          // File is a hybrid PDF file.
-	UsingObjectStreams  bool          // File is using object streams.
-	ObjectStreams       types.IntSet  // All object numbers of any object streams found which need to be decoded.
-	UsingXRefStreams    bool          // File is using xref streams.
-	XRefStreams         types.IntSet  // All object numbers of any xref streams found.
+	RepairOffset        int64
+	BinaryTotalSize     int64        // total stream data
+	BinaryImageSize     int64        // total image stream data
+	BinaryFontSize      int64        // total font stream data (fontfiles)
+	BinaryImageDuplSize int64        // total obsolete image stream data after optimization
+	BinaryFontDuplSize  int64        // total obsolete font stream data after optimization
+	Linearized          bool         // File is linearized.
+	Hybrid              bool         // File is a hybrid PDF file.
+	UsingObjectStreams  bool         // File is using object streams.
+	ObjectStreams       types.IntSet // All object numbers of any object streams found which need to be decoded.
+	UsingXRefStreams    bool         // File is using xref streams.
+	XRefStreams         types.IntSet // All object numbers of any xref streams found.
 }
 
 func newReadContext(rs io.ReadSeeker) (*ReadContext, error) {
@@ -238,30 +239,6 @@ func (rc *ReadContext) ObjectStreamsString() (int, string) {
 	return len(objStreams), strings.Join(objStreams, ",")
 }
 
-// IsXRefStreamObject returns true if object #i is a an xref stream.
-func (rc *ReadContext) IsXRefStreamObject(i int) bool {
-	return rc.XRefStreams[i]
-}
-
-// XRefStreamsString returns a formatted string and the number of xref stream objects.
-func (rc *ReadContext) XRefStreamsString() (int, string) {
-
-	var objs []int
-	for k := range rc.XRefStreams {
-		if rc.XRefStreams[k] {
-			objs = append(objs, k)
-		}
-	}
-	sort.Ints(objs)
-
-	var xrefStreams []string
-	for _, i := range objs {
-		xrefStreams = append(xrefStreams, fmt.Sprintf("%d", i))
-	}
-
-	return len(xrefStreams), strings.Join(xrefStreams, ",")
-}
-
 // LogStats logs stats for read file.
 func (rc *ReadContext) LogStats(optimized bool) {
 	if !log.StatsEnabled() {
@@ -302,22 +279,23 @@ func (rc *ReadContext) ReadFileSize() int {
 	return int(rc.FileSize)
 }
 
-// OptimizationContext represents the context for the optimiziation of a PDF file.
+// OptimizationContext represents the context for the optimization of a PDF file.
 type OptimizationContext struct {
 
 	// Font section
-	PageFonts         []types.IntSet      // For each page a registry of font object numbers.
-	FontObjects       map[int]*FontObject // FontObject lookup table by font object number.
-	FormFontObjects   map[int]*FontObject // FormFontObject lookup table by font object number.
-	Fonts             map[string][]int    // All font object numbers registered for a font name.
-	DuplicateFonts    map[int]types.Dict  // Registry of duplicate font dicts.
-	DuplicateFontObjs types.IntSet        // The set of objects that represents the union of the object graphs of all duplicate font dicts.
+	PageFonts           []types.IntSet      // For each page a registry of font object numbers.
+	FontObjects         map[int]*FontObject // FontObject lookup table by font object number.
+	FormFontObjects     map[int]*FontObject // FormFontObject lookup table by font object number.
+	Fonts               map[string][]int    // All font object numbers registered for a font name.
+	DuplicateFonts      map[int]types.Dict  // Registry of duplicate font dicts.
+	DuplicateFontObjs   types.IntSet        // The set of objects that represents the union of the object graphs of all duplicate font dicts.
+	CorruptFontResDicts []types.Dict        // Corrupted fontDicts encountered during bypassing xreftable.
 
 	// Image section
-	PageImages         []types.IntSet            // For each page a registry of image object numbers.
-	ImageObjects       map[int]*ImageObject      // ImageObject lookup table by image object number.
-	DuplicateImages    map[int]*types.StreamDict // Registry of duplicate image dicts.
-	DuplicateImageObjs types.IntSet              // The set of objects that represents the union of the object graphs of all duplicate image dicts.
+	PageImages         []types.IntSet                // For each page a registry of image object numbers.
+	ImageObjects       map[int]*ImageObject          // ImageObject lookup table by image object number.
+	DuplicateImages    map[int]*DuplicateImageObject // Registry of duplicate image dicts.
+	DuplicateImageObjs types.IntSet                  // The set of objects that represents the union of the object graphs of all duplicate image dicts.
 
 	ContentStreamCache map[int]*types.StreamDict
 	FormStreamCache    map[int]*types.StreamDict
@@ -331,13 +309,14 @@ type OptimizationContext struct {
 
 func newOptimizationContext() *OptimizationContext {
 	return &OptimizationContext{
-		FontObjects:          map[int]*FontObject{},
-		FormFontObjects:      map[int]*FontObject{},
-		Fonts:                map[string][]int{},
-		DuplicateFonts:       map[int]types.Dict{},
-		DuplicateFontObjs:    types.IntSet{},
+		FontObjects:       map[int]*FontObject{},
+		FormFontObjects:   map[int]*FontObject{},
+		Fonts:             map[string][]int{},
+		DuplicateFonts:    map[int]types.Dict{},
+		DuplicateFontObjs: types.IntSet{},
+
 		ImageObjects:         map[int]*ImageObject{},
-		DuplicateImages:      map[int]*types.StreamDict{},
+		DuplicateImages:      map[int]*DuplicateImageObject{},
 		DuplicateImageObjs:   types.IntSet{},
 		DuplicateInfoObjects: types.IntSet{},
 		ContentStreamCache:   map[int]*types.StreamDict{},
@@ -544,7 +523,10 @@ func (oc *OptimizationContext) collectImageInfo(logStr []string) []string {
 
 		for _, objectNumber := range objectNumbers {
 			imageObject := oc.ImageObjects[objectNumber]
-			logStr = append(logStr, fmt.Sprintf("#%-6d %s\n", objectNumber, imageObject.ResourceNamesString()))
+			resName, ok := imageObject.ResourceNames[i]
+			if ok {
+				logStr = append(logStr, fmt.Sprintf("#%-6d %s\n", objectNumber, resName))
+			}
 		}
 	}
 
@@ -601,6 +583,8 @@ type WriteContext struct {
 	BinaryFontSize      int64         // total font stream data (fontfiles) = copy of Read.BinaryFontSize.
 	Table               map[int]int64 // object write offsets
 	Offset              int64         // current write offset
+	OffsetSigByteRange  int64         // write offset of signature dict value for "ByteRange"
+	OffsetSigContents   int64         // write offset of signature dict value for "Contents"
 	WriteToObjectStream bool          // if true start to embed objects into object streams and obey ObjectStreamMaxObjects.
 	CurrentObjStream    *int          // if not nil, any new non-stream-object gets added to the object stream with this object number.
 	Eol                 string        // end of line char sequence
diff --git a/pkg/pdfcpu/model/cut.go b/pkg/pdfcpu/model/cut.go
index 47562195..180b916a 100644
--- a/pkg/pdfcpu/model/cut.go
+++ b/pkg/pdfcpu/model/cut.go
@@ -49,7 +49,7 @@ func parseHorCut(v string, cut *Cut) (err error) {
 			return errors.Errorf("pdfcpu: cut position must be a float value: %s\n", s)
 		}
 		if f <= 0 || f >= 1 {
-			return errors.Errorf("pdfcpu: invalid cut poistion %.2f: 0 < i < 1.0\n", f)
+			return errors.Errorf("pdfcpu: invalid cut position %.2f: 0 < i < 1.0\n", f)
 		}
 		cut.Hor = append(cut.Hor, f)
 	}
@@ -65,7 +65,7 @@ func parseVertCut(v string, cut *Cut) (err error) {
 			return errors.Errorf("pdfcpu: cut position must be a float value: %s\n", s)
 		}
 		if f <= 0 || f >= 1 {
-			return errors.Errorf("pdfcpu: invalid cut poistion %.2f: 0 < i < 1.0\n", f)
+			return errors.Errorf("pdfcpu: invalid cut position %.2f: 0 < i < 1.0\n", f)
 		}
 		cut.Vert = append(cut.Vert, f)
 	}
diff --git a/pkg/pdfcpu/model/dereference.go b/pkg/pdfcpu/model/dereference.go
index f4d0c2f5..0a812693 100644
--- a/pkg/pdfcpu/model/dereference.go
+++ b/pkg/pdfcpu/model/dereference.go
@@ -17,15 +17,53 @@ limitations under the License.
 package model
 
 import (
+	"context"
 	"strings"
 
 	"github.com/angel-one/pdfcpu/pkg/pdfcpu/types"
 	"github.com/pkg/errors"
 )
 
-func (xRefTable *XRefTable) indRefToObject(ir *types.IndirectRef) (types.Object, error) {
+func processDictRefCounts(xRefTable *XRefTable, d types.Dict) {
+	for _, e := range d {
+		switch o1 := e.(type) {
+		case types.IndirectRef:
+			xRefTable.IncrementRefCount(&o1)
+		case types.Dict:
+			ProcessRefCounts(xRefTable, o1)
+		case types.Array:
+			ProcessRefCounts(xRefTable, o1)
+		}
+	}
+}
+
+func processArrayRefCounts(xRefTable *XRefTable, a types.Array) {
+	for _, e := range a {
+		switch o1 := e.(type) {
+		case types.IndirectRef:
+			xRefTable.IncrementRefCount(&o1)
+		case types.Dict:
+			ProcessRefCounts(xRefTable, o1)
+		case types.Array:
+			ProcessRefCounts(xRefTable, o1)
+		}
+	}
+}
+
+func ProcessRefCounts(xRefTable *XRefTable, o types.Object) {
+	switch o := o.(type) {
+	case types.Dict:
+		processDictRefCounts(xRefTable, o)
+	case types.StreamDict:
+		processDictRefCounts(xRefTable, o.Dict)
+	case types.Array:
+		processArrayRefCounts(xRefTable, o)
+	}
+}
+
+func (xRefTable *XRefTable) indRefToObject(ir *types.IndirectRef, decodeLazy bool) (types.Object, int, error) {
 	if ir == nil {
-		return nil, errors.New("pdfcpu: indRefToObject: input argument is nil")
+		return nil, 0, errors.New("pdfcpu: indRefToObject: input argument is nil")
 	}
 
 	// 7.3.10
@@ -33,13 +71,23 @@ func (xRefTable *XRefTable) indRefToObject(ir *types.IndirectRef) (types.Object,
 	// it shall be treated as a reference to the null object.
 	entry, found := xRefTable.FindTableEntryForIndRef(ir)
 	if !found || entry.Free {
-		return nil, nil
+		return nil, 0, nil
 	}
 
 	xRefTable.CurObj = int(ir.ObjectNumber)
 
-	// return dereferenced object
-	return entry.Object, nil
+	if l, ok := entry.Object.(types.LazyObjectStreamObject); ok && decodeLazy {
+		ob, err := l.DecodedObject(context.TODO())
+		if err != nil {
+			return nil, 0, err
+		}
+
+		ProcessRefCounts(xRefTable, ob)
+		entry.Object = ob
+	}
+
+	// return dereferenced object and increment nr.
+	return entry.Object, entry.Incr, nil
 }
 
 // Dereference resolves an indirect object and returns the resulting PDF object.
@@ -50,7 +98,32 @@ func (xRefTable *XRefTable) Dereference(o types.Object) (types.Object, error) {
 		return o, nil
 	}
 
-	return xRefTable.indRefToObject(&ir)
+	obj, _, err := xRefTable.indRefToObject(&ir, true)
+	return obj, err
+}
+
+// DereferenceWithIncr resolves an indirect object and returns the resulting PDF object.
+// It also returns the number of the written PDF Increment this object is part of.
+// The higher the increment number the older the object.
+func (xRefTable *XRefTable) DereferenceWithIncr(o types.Object) (types.Object, int, error) {
+	ir, ok := o.(types.IndirectRef)
+	if !ok {
+		// Nothing to dereference.
+		return o, 0, nil
+	}
+
+	return xRefTable.indRefToObject(&ir, true)
+}
+
+func (xRefTable *XRefTable) DereferenceForWrite(o types.Object) (types.Object, error) {
+	ir, ok := o.(types.IndirectRef)
+	if !ok {
+		// Nothing to dereference.
+		return o, nil
+	}
+
+	obj, _, err := xRefTable.indRefToObject(&ir, false)
+	return obj, err
 }
 
 // DereferenceBoolean resolves and validates a boolean object, which may be an indirect reference.
@@ -280,6 +353,24 @@ func (xRefTable *XRefTable) DereferenceDict(o types.Object) (types.Dict, error)
 	return d, nil
 }
 
+// DereferenceDictWithIncr resolves and validates a dictionary object, which may be an indirect reference.
+// It also returns the number of the written PDF Increment this object is part of.
+// The higher the increment number the older the object.
+func (xRefTable *XRefTable) DereferenceDictWithIncr(o types.Object) (types.Dict, int, error) {
+
+	o, incr, err := xRefTable.DereferenceWithIncr(o)
+	if err != nil || o == nil {
+		return nil, 0, err
+	}
+
+	d, ok := o.(types.Dict)
+	if !ok {
+		return nil, 0, errors.Errorf("pdfcpu: dereferenceDictWithIncr: wrong type %T <%v>", o, o)
+	}
+
+	return d, incr, nil
+}
+
 // DereferenceFontDict returns the font dict referenced by indRef.
 func (xRefTable *XRefTable) DereferenceFontDict(indRef types.IndirectRef) (types.Dict, error) {
 	d, err := xRefTable.DereferenceDict(indRef)
@@ -290,12 +381,14 @@ func (xRefTable *XRefTable) DereferenceFontDict(indRef types.IndirectRef) (types
 		return nil, nil
 	}
 
-	if d.Type() == nil {
-		return nil, errors.Errorf("pdfcpu: DereferenceFontDict: missing dict type %s\n", indRef)
-	}
+	if xRefTable.ValidationMode == ValidationStrict {
+		if d.Type() == nil {
+			return nil, errors.Errorf("pdfcpu: DereferenceFontDict: missing dict type %s\n", indRef)
+		}
 
-	if *d.Type() != "Font" {
-		return nil, errors.Errorf("pdfcpu: DereferenceFontDict: expected Type=Font, unexpected Type: %s", *d.Type())
+		if *d.Type() != "Font" {
+			return nil, errors.Errorf("pdfcpu: DereferenceFontDict: expected Type=Font, unexpected Type: %s", *d.Type())
+		}
 	}
 
 	return d, nil
@@ -338,21 +431,27 @@ func (xRefTable *XRefTable) dereferenceDestArray(o types.Object) (types.Array, e
 		}
 		arr, ok := o1.(types.Array)
 		if !ok {
-			errors.Errorf("pdfcpu: corrupted dest array:\n%s\n", o)
+			return nil, errors.Errorf("pdfcpu: invalid dest array:\n%s\n", o)
 		}
 		return arr, nil
 	}
 
-	return nil, errors.Errorf("pdfcpu: corrupted dest array:\n%s\n", o)
+	return nil, errors.Errorf("pdfcpu: invalid dest array:\n%s\n", o)
 }
 
 // DereferenceDestArray resolves the destination for key.
 func (xRefTable *XRefTable) DereferenceDestArray(key string) (types.Array, error) {
-	o, ok := xRefTable.Names["Dests"].Value(key)
-	if !ok {
-		return nil, errors.Errorf("pdfcpu: corrupted named destination for: %s", key)
+	if dNames := xRefTable.Names["Dests"]; dNames != nil {
+		if o, ok := dNames.Value(key); ok {
+			return xRefTable.dereferenceDestArray(o)
+		}
 	}
-	return xRefTable.dereferenceDestArray(o)
+
+	if o, ok := xRefTable.Dests[key]; ok {
+		return xRefTable.dereferenceDestArray(o)
+	}
+
+	return nil, errors.Errorf("pdfcpu: invalid named destination for: %s", key)
 }
 
 // DereferenceDictEntry returns a dereferenced dict entry.
diff --git a/pkg/pdfcpu/model/font.go b/pkg/pdfcpu/model/font.go
new file mode 100644
index 00000000..bcd855d6
--- /dev/null
+++ b/pkg/pdfcpu/model/font.go
@@ -0,0 +1,25 @@
+/*
+Copyright 2024 The pdfcpu Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+	http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package model
+
+type FontInfo struct {
+	Prefix   string `json:"prefix"`
+	Name     string `json:"name"`
+	Type     string `json:"type"`
+	Encoding string `json:"encoding"`
+	Embedded bool   `json:"embedded"`
+}
diff --git a/pkg/pdfcpu/model/image.go b/pkg/pdfcpu/model/image.go
index 7c92500f..9dcc1e70 100644
--- a/pkg/pdfcpu/model/image.go
+++ b/pkg/pdfcpu/model/image.go
@@ -18,17 +18,22 @@ package model
 
 import (
 	"bytes"
+	"encoding/binary"
+	"fmt"
 	"image"
 	"image/color"
 	"image/draw"
 	"image/jpeg"
 	_ "image/png"
+	"io"
 	"math"
 	"os"
 	"path/filepath"
 	"strings"
 
+	"github.com/hhrutter/tiff"
+
 	"github.com/angel-one/pdfcpu/pkg/filter"
 	"github.com/angel-one/pdfcpu/pkg/pdfcpu/types"
 	"github.com/pkg/errors"
@@ -111,7 +116,8 @@ func createSMaskObject(xRefTable *XRefTable, buf []byte, w, h, bpc int) (*types.
 	return xRefTable.IndRefForNewObject(*sd)
 }
 
-func createFlateImageObject(xRefTable *XRefTable, buf, sm []byte, w, h, bpc int, cs string) (*types.StreamDict, error) {
+// CreateFlateImageStreamDict returns a flate stream dict.
+func CreateFlateImageStreamDict(xRefTable *XRefTable, buf, sm []byte, w, h, bpc int, cs string) (*types.StreamDict, error) {
 	var softMaskIndRef *types.IndirectRef
 	if sm != nil {
 		var err error
@@ -121,14 +127,22 @@ func createFlateImageObject(xRefTable *XRefTable, buf, sm []byte, w, h, bpc int,
 		}
 	}
 
-	// Create Flate stream dict.
- sd, _ := xRefTable.NewStreamDictForBuf(buf) - sd.InsertName("Type", "XObject") - sd.InsertName("Subtype", "Image") - sd.InsertInt("Width", w) - sd.InsertInt("Height", h) - sd.InsertInt("BitsPerComponent", bpc) - sd.InsertName("ColorSpace", cs) + sd := &types.StreamDict{ + Dict: types.Dict( + map[string]types.Object{ + "Type": types.Name("XObject"), + "Subtype": types.Name("Image"), + "Width": types.Integer(w), + "Height": types.Integer(h), + "BitsPerComponent": types.Integer(bpc), + "ColorSpace": types.Name(cs), + }, + ), + Content: buf, + FilterPipeline: []types.PDFFilter{{Name: filter.Flate, DecodeParms: nil}}, + } + + sd.InsertName("Filter", filter.Flate) if softMaskIndRef != nil { sd.Insert("SMask", *softMaskIndRef) @@ -145,8 +159,8 @@ func createFlateImageObject(xRefTable *XRefTable, buf, sm []byte, w, h, bpc int, return sd, nil } -// CreateDCTImageObject returns a DCT encoded stream dict. -func CreateDCTImageObject(xRefTable *XRefTable, buf []byte, w, h, bpc int, cs string) (*types.StreamDict, error) { +// CreateDCTImageStreamDict returns a DCT encoded stream dict. 
+func CreateDCTImageStreamDict(xRefTable *XRefTable, buf []byte, w, h, bpc int, cs string) (*types.StreamDict, error) { sd := &types.StreamDict{ Dict: types.Dict( map[string]types.Object{ @@ -184,15 +198,29 @@ func CreateDCTImageObject(xRefTable *XRefTable, buf []byte, w, h, bpc int, cs st return sd, nil } -func writeRGBAImageBuf(img image.Image) []byte { +func writeRGBAImageBuf(img image.Image) ([]byte, []byte) { w := img.Bounds().Dx() h := img.Bounds().Dy() i := 0 + var sm []byte buf := make([]byte, w*h*3) + var softMask bool for y := 0; y < h; y++ { for x := 0; x < w; x++ { c := img.At(x, y).(color.RGBA) + if !softMask { + if c.A != 0xFF { + softMask = true + sm = []byte{} + for j := 0; j < y*w+x; j++ { + sm = append(sm, 0xFF) + } + sm = append(sm, c.A) + } + } else { + sm = append(sm, c.A) + } buf[i] = c.R buf[i+1] = c.G buf[i+2] = c.B @@ -200,7 +228,7 @@ func writeRGBAImageBuf(img image.Image) []byte { } } - return buf + return buf, sm } func writeRGBA64ImageBuf(img image.Image) []byte { @@ -225,25 +253,6 @@ func writeRGBA64ImageBuf(img image.Image) []byte { return buf } -// func writeYCbCrToRGBAImageBuf(img image.Image) []byte { -// w := img.Bounds().Dx() -// h := img.Bounds().Dy() -// i := 0 -// buf := make([]byte, w*h*3) - -// for y := 0; y < h; y++ { -// for x := 0; x < w; x++ { -// c := img.At(x, y).(color.YCbCr) -// r, g, b, _ := c.RGBA() -// buf[i] = uint8(r >> 8 & 0xFF) -// buf[i+1] = uint8(g >> 8 & 0xFF) -// buf[i+2] = uint8(b >> 8 & 0xFF) -// i += 3 -// } -// } -// return buf -// } - func writeNRGBAImageBuf(xRefTable *XRefTable, img image.Image) ([]byte, []byte) { w := img.Bounds().Dx() h := img.Bounds().Dy() @@ -376,11 +385,27 @@ func writeCMYKImageBuf(img image.Image) []byte { func convertToRGBA(img image.Image) *image.RGBA { b := img.Bounds() - m := image.NewRGBA(image.Rect(0, 0, b.Dx(), b.Dy())) + m := image.NewRGBA(b) draw.Draw(m, m.Bounds(), img, b.Min, draw.Src) return m } +func convertNYCbCrAToRGBA(img *image.NYCbCrA) *image.RGBA { + b := 
img.Bounds() + m := image.NewRGBA(b) + for y := b.Min.Y; y < b.Max.Y; y++ { + for x := b.Min.X; x < b.Max.X; x++ { + ycbr := img.YCbCrAt(x, y) + stride := img.Bounds().Dx() + alphaOffset := (y-b.Min.Y)*stride + (x - b.Min.X) + alpha := img.A[alphaOffset] + r, g, b := color.YCbCrToRGB(ycbr.Y, ycbr.Cb, ycbr.Cr) + m.Set(x, y, color.RGBA{R: r, G: g, B: b, A: alpha}) + } + } + return m +} + func convertToGray(img image.Image) *image.Gray { b := img.Bounds() m := image.NewGray(image.Rect(0, 0, b.Dx(), b.Dy())) @@ -413,18 +438,18 @@ func convertToSepia(img image.Image) *image.RGBA { return m } -func createImageDict(xRefTable *XRefTable, buf, softMask []byte, w, h, bpc int, format, cs string) (*types.StreamDict, int, int, error) { +func createImageStreamDict(xRefTable *XRefTable, buf, softMask []byte, w, h, bpc int, format, cs string) (*types.StreamDict, error) { var ( sd *types.StreamDict err error ) switch format { case "jpeg": - sd, err = CreateDCTImageObject(xRefTable, buf, w, h, bpc, cs) + sd, err = CreateDCTImageStreamDict(xRefTable, buf, w, h, bpc, cs) default: - sd, err = createFlateImageObject(xRefTable, buf, softMask, w, h, bpc, cs) + sd, err = CreateFlateImageStreamDict(xRefTable, buf, softMask, w, h, bpc, cs) } - return sd, w, h, err + return sd, err } func encodeJPEG(img image.Image) ([]byte, string, error) { @@ -457,13 +482,13 @@ func createImageBuf(xRefTable *XRefTable, img image.Image, format string) ([]byt var cs string - switch img.(type) { + switch img := img.(type) { case *image.RGBA: // A 32-bit alpha-premultiplied color, having 8 bits for each of red, green, blue and alpha. // An alpha-premultiplied color component C has been scaled by alpha (A), so it has valid values 0 <= C <= A. cs = DeviceRGBCS bpc = 8 - buf = writeRGBAImageBuf(img) + buf, sm = writeRGBAImageBuf(img) case *image.RGBA64: // A 64-bit alpha-premultiplied color, having 16 bits for each of red, green, blue and alpha. 
@@ -511,16 +536,18 @@ func createImageBuf(xRefTable *XRefTable, img image.Image, format string) ([]byt case *image.YCbCr: cs = DeviceRGBCS bpc = 8 - buf = writeRGBAImageBuf(convertToRGBA(img)) + buf, sm = writeRGBAImageBuf(convertToRGBA(img)) case *image.NYCbCrA: - return buf, sm, bpc, cs, errors.New("pdfcpu: unsupported image type: NYCbCrA") + cs = DeviceRGBCS + bpc = 8 + buf, sm = writeRGBAImageBuf(convertNYCbCrAToRGBA(img)) case *image.Paletted: // In-memory image of uint8 indices into a given palette. cs = DeviceRGBCS bpc = 8 - buf = writeRGBAImageBuf(convertToRGBA(img)) + buf, sm = writeRGBAImageBuf(convertToRGBA(img)) default: return buf, sm, bpc, cs, errors.Errorf("pdfcpu: unsupported image type: %T", img) @@ -541,40 +568,142 @@ func colorSpaceForJPEGColorModel(cm color.Model) string { return "" } -func createDCTImageObjectForJPEG(xRefTable *XRefTable, c image.Config, bb bytes.Buffer) (*types.StreamDict, int, int, error) { +func createDCTImageStreamDictForJPEG(xRefTable *XRefTable, c image.Config, bb bytes.Buffer) (*types.StreamDict, error) { cs := colorSpaceForJPEGColorModel(c.ColorModel) if cs == "" { - return nil, 0, 0, errors.New("pdfcpu: unexpected color model for JPEG") + return nil, errors.New("pdfcpu: unexpected color model for JPEG") } - sd, err := CreateDCTImageObject(xRefTable, bb.Bytes(), c.Width, c.Height, 8, cs) + return CreateDCTImageStreamDict(xRefTable, bb.Bytes(), c.Width, c.Height, 8, cs) +} - return sd, c.Width, c.Height, err +func createImageResourcesForJPEG(xRefTable *XRefTable, c image.Config, bb bytes.Buffer) ([]ImageResource, error) { + sd, err := createDCTImageStreamDictForJPEG(xRefTable, c, bb) + if err != nil { + return nil, err + } + + indRef, err := xRefTable.IndRefForNewObject(*sd) + if err != nil { + return nil, err + } + + res := Resource{ID: "Im0", IndRef: indRef} + ir := ImageResource{Res: res, Width: c.Width, Height: c.Height} + return []ImageResource{ir}, err } -// CreateImageStreamDict returns a stream dict for image 
data represented by r and applies optional filters. -func CreateImageStreamDict(xRefTable *XRefTable, r io.Reader, gray, sepia bool) (*types.StreamDict, int, int, error) { +func decodeImage(xRefTable *XRefTable, buf *bytes.Reader, currentOffset int64, gray, sepia bool, byteOrder binary.ByteOrder, imgResources *[]ImageResource) (int64, error) { + img, err := tiff.DecodeAt(buf, currentOffset) + if err != nil { + return 0, err + } - var bb bytes.Buffer - tee := io.TeeReader(r, &bb) + if gray { + switch img.(type) { + case *image.Gray, *image.Gray16: + default: + img = convertToGray(img) + } + } - var sniff bytes.Buffer - if _, err := io.Copy(&sniff, tee); err != nil { - return nil, 0, 0, err + if sepia { + switch img.(type) { + case *image.Gray, *image.Gray16: + default: + img = convertToSepia(img) + } } - c, format, err := image.DecodeConfig(&sniff) + imgBuf, softMask, bpc, cs, err := createImageBuf(xRefTable, img, "tiff") if err != nil { - return nil, 0, 0, err + return 0, err } - if format == "jpeg" && !gray && !sepia { - return createDCTImageObjectForJPEG(xRefTable, c, bb) + w, h := img.Bounds().Dx(), img.Bounds().Dy() + + sd, err := createImageStreamDict(xRefTable, imgBuf, softMask, w, h, bpc, "tiff", cs) + if err != nil { + return 0, err + } + + indRef, err := xRefTable.IndRefForNewObject(*sd) + if err != nil { + return 0, err + } + + res := Resource{ID: "Im0", IndRef: indRef} + ir := ImageResource{Res: res, Width: w, Height: h} + *imgResources = append(*imgResources, ir) + + if _, err := buf.Seek(currentOffset, io.SeekStart); err != nil { + return 0, err + } + + var numEntries uint16 + if err := binary.Read(buf, byteOrder, &numEntries); err != nil { + return 0, err + } + + if _, err := buf.Seek(int64(numEntries)*12, io.SeekCurrent); err != nil { + return 0, err + } + + var nextIFDOffset uint32 + if err := binary.Read(buf, byteOrder, &nextIFDOffset); err != nil { + return 0, err } + // if nextIFDOffset >= uint32(bb.Len()) { + // fmt.Println("Invalid next IFD 
offset, stopping.") + // break + // } + + return int64(nextIFDOffset), nil +} + +func createImageResourcesForTIFF(xRefTable *XRefTable, bb bytes.Buffer, gray, sepia bool) ([]ImageResource, error) { + imgResources := []ImageResource{} + + buf := bytes.NewReader(bb.Bytes()) + + var header [8]byte + if _, err := io.ReadFull(buf, header[:]); err != nil { + return nil, err + } + + var byteOrder binary.ByteOrder + if string(header[:2]) == "II" { + byteOrder = binary.LittleEndian + } else if string(header[:2]) == "MM" { + byteOrder = binary.BigEndian + } else { + return nil, fmt.Errorf("invalid TIFF byte order") + } + + firstIFDOffset := byteOrder.Uint32(header[4:]) + if firstIFDOffset < 8 || firstIFDOffset >= uint32(bb.Len()) { + return nil, fmt.Errorf("invalid TIFF file: no valid IFD") + } + + var err error + + off := int64(firstIFDOffset) + + for off != 0 && off < int64(bb.Len()) { + off, err = decodeImage(xRefTable, buf, off, gray, sepia, byteOrder, &imgResources) + if err != nil { + return nil, err + } + } + + return imgResources, nil +} + +func createImageResources(xRefTable *XRefTable, c image.Config, bb bytes.Buffer, gray, sepia bool) ([]ImageResource, error) { img, format, err := image.Decode(&bb) if err != nil { - return nil, 0, 0, err + return nil, err } if gray { @@ -593,19 +722,107 @@ func CreateImageStreamDict(xRefTable *XRefTable, r io.Reader, gray, sepia bool) } } + imgBuf, softMask, bpc, cs, err := createImageBuf(xRefTable, img, format) + if err != nil { + return nil, err + } + + w, h := img.Bounds().Dx(), img.Bounds().Dy() + if w != c.Width || h != c.Height { + return nil, errors.New("pdfcpu: unexpected width or height") + } + + sd, err := createImageStreamDict(xRefTable, imgBuf, softMask, w, h, bpc, format, cs) + if err != nil { + return nil, err + } + + indRef, err := xRefTable.IndRefForNewObject(*sd) + if err != nil { + return nil, err + } + + res := Resource{ID: "Im0", IndRef: indRef} + ir := ImageResource{Res: res, Width: w, Height: h} + return 
[]ImageResource{ir}, err +} + +// CreateImageResources creates a new XObject for given image data represented by r and applies optional filters. +func CreateImageResources(xRefTable *XRefTable, r io.Reader, gray, sepia bool) ([]ImageResource, error) { + + var bb bytes.Buffer + tee := io.TeeReader(r, &bb) + + var sniff bytes.Buffer + if _, err := io.Copy(&sniff, tee); err != nil { + return nil, err + } + + c, format, err := image.DecodeConfig(&sniff) + if err != nil { + return nil, err + } + + if format == "tiff" { + return createImageResourcesForTIFF(xRefTable, bb, gray, sepia) + } + + if format == "jpeg" && !gray && !sepia { + return createImageResourcesForJPEG(xRefTable, c, bb) + } + + return createImageResources(xRefTable, c, bb, gray, sepia) +} + +// CreateImageStreamDict returns a stream dict for image data represented by r and applies optional filters. +func CreateImageStreamDict(xRefTable *XRefTable, r io.Reader) (*types.StreamDict, int, int, error) { + + var bb bytes.Buffer + tee := io.TeeReader(r, &bb) + + var sniff bytes.Buffer + if _, err := io.Copy(&sniff, tee); err != nil { + return nil, 0, 0, err + } + + c, format, err := image.DecodeConfig(&sniff) + if err != nil { + return nil, 0, 0, err + } + + if format == "jpeg" { + sd, err := createDCTImageStreamDictForJPEG(xRefTable, c, bb) + if err != nil { + return nil, 0, 0, err + } + return sd, c.Width, c.Height, nil + } + + img, format, err := image.Decode(&bb) + if err != nil { + return nil, 0, 0, err + } + imgBuf, softMask, bpc, cs, err := createImageBuf(xRefTable, img, format) if err != nil { return nil, 0, 0, err } w, h := img.Bounds().Dx(), img.Bounds().Dy() + if w != c.Width || h != c.Height { + return nil, 0, 0, errors.New("pdfcpu: unexpected width or height") + } - return createImageDict(xRefTable, imgBuf, softMask, w, h, bpc, format, cs) + sd, err := createImageStreamDict(xRefTable, imgBuf, softMask, w, h, bpc, format, cs) + if err != nil { + return nil, 0, 0, err + } + return sd, c.Width, 
c.Height, nil } // CreateImageResource creates a new XObject for given image data represented by r and applies optional filters. -func CreateImageResource(xRefTable *XRefTable, r io.Reader, gray, sepia bool) (*types.IndirectRef, int, int, error) { - sd, w, h, err := CreateImageStreamDict(xRefTable, r, gray, sepia) +func CreateImageResource(xRefTable *XRefTable, r io.Reader) (*types.IndirectRef, int, int, error) { + sd, w, h, err := CreateImageStreamDict(xRefTable, r) if err != nil { return nil, 0, 0, err } diff --git a/pkg/pdfcpu/model/message.go b/pkg/pdfcpu/model/message.go new file mode 100644 index 00000000..b427caf2 --- /dev/null +++ b/pkg/pdfcpu/model/message.go @@ -0,0 +1,61 @@ +/* +Copyright 2024 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package model + +import ( + "fmt" + + "github.com/angel-one/pdfcpu/pkg/log" +) + +func ShowMsg(msg string) { + s := "pdfcpu " + msg + if log.DebugEnabled() { + log.Debug.Println(s) + } + if log.ReadEnabled() { + log.Read.Println(s) + } + if log.ValidateEnabled() { + log.Validate.Println(s) + } + if log.CLIEnabled() { + log.CLI.Println(s) + } +} + +func ShowMsgTopic(topic, msg string) { + msg = topic + ": " + msg + ShowMsg(msg) +} + +func ShowRepaired(msg string) { + ShowMsgTopic("repaired", msg) +} + +func ShowSkipped(msg string) { + ShowMsgTopic("skipped", msg) +} + +func ShowDigestedSpecViolation(msg string) { + ShowMsgTopic("digested", msg) +} + +func ShowDigestedSpecViolationError(xRefTable *XRefTable, err error) { + msg := fmt.Sprintf("spec violation around obj#(%d): %v\n", xRefTable.CurObj, err) + ShowMsgTopic("digested", msg) +} diff --git a/pkg/pdfcpu/model/metadata.go b/pkg/pdfcpu/model/metadata.go new file mode 100644 index 00000000..aae62cb0 --- /dev/null +++ b/pkg/pdfcpu/model/metadata.go @@ -0,0 +1,155 @@ +/* +Copyright 2024 The pdfcpu Authors. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package model + +import ( + "encoding/xml" + "strings" + "time" +) + +type UserDate time.Time + +const userDateFormatNoTimeZone = "2006-01-02T15:04:05Z" +const userDateFormatNegTimeZone = "2006-01-02T15:04:05-07:00" +const userDateFormatPosTimeZone = "2006-01-02T15:04:05+07:00" + +func (ud *UserDate) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error { + dateString := "" + err := d.DecodeElement(&dateString, &start) + if err != nil { + return err + } + dat, err := time.Parse(userDateFormatNoTimeZone, dateString) + if err == nil { + *ud = UserDate(dat) + return nil + } + dat, err = time.Parse(userDateFormatPosTimeZone, dateString) + if err == nil { + *ud = UserDate(dat) + return nil + } + dat, err = time.Parse(userDateFormatNegTimeZone, dateString) + if err == nil { + *ud = UserDate(dat) + return nil + } + return err +} + +type Alt struct { + //XMLName xml.Name `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Alt"` + Entries []string `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# li"` +} + +type Seq struct { + //XMLName xml.Name `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq"` + Entries []string `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# li"` +} + +type Title struct { + //XMLName xml.Name `xml:"http://purl.org/dc/elements/1.1/ title"` + Alt Alt `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Alt"` +} + +type Desc struct { + //XMLName xml.Name `xml:"http://purl.org/dc/elements/1.1/ description"` + Alt Alt `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Alt"` +} + +type Creator struct { + //XMLName xml.Name `xml:"http://purl.org/dc/elements/1.1/ creator"` + Seq Seq `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq"` +} + +type Description struct { + //XMLName xml.Name `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Description"` + Title Title `xml:"http://purl.org/dc/elements/1.1/ title"` + Author Creator `xml:"http://purl.org/dc/elements/1.1/ creator"` + Subject Desc `xml:"http://purl.org/dc/elements/1.1/ description"` + 
Creator string `xml:"http://ns.adobe.com/xap/1.0/ CreatorTool"` + CreationDate UserDate `xml:"http://ns.adobe.com/xap/1.0/ CreateDate"` + ModDate UserDate `xml:"http://ns.adobe.com/xap/1.0/ ModifyDate"` + Producer string `xml:"http://ns.adobe.com/pdf/1.3/ Producer"` + Trapped bool `xml:"http://ns.adobe.com/pdf/1.3/ Trapped"` + Keywords string `xml:"http://ns.adobe.com/pdf/1.3/ Keywords"` +} + +type RDF struct { + XMLName xml.Name `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# RDF"` + Description Description +} + +type XMPMeta struct { + XMLName xml.Name `xml:"adobe:ns:meta/ xmpmeta"` + RDF RDF +} + +func removeTag(s, kw string) string { + kwLen := len(kw) + i := strings.Index(s, kw) + if i < 0 { + return "" + } + + j := i + kwLen + + i = strings.LastIndex(s[:i], "<") + if i < 0 { + return "" + } + + block1 := s[:i] + + s = s[j:] + i = strings.Index(s, kw) + if i < 0 { + return "" + } + + j = i + kwLen + + block2 := s[j:] + + s1 := block1 + block2 + + return s1 +} + +func RemoveKeywords(metadata *[]byte) error { + + // Opt for simple byte removal instead of xml de/encoding. + + s := string(*metadata) + if len(s) == 0 { + return nil + } + + s = removeTag(s, "Keywords>") + if len(s) == 0 { + return nil + } + + // Possible Acrobat bug. + // Acrobat seems to use dc:subject for keywords but ***does not*** show the content in Subject. 
+ s = removeTag(s, "subject>") + + *metadata = []byte(s) + + return nil +} diff --git a/pkg/pdfcpu/model/nameTree.go b/pkg/pdfcpu/model/nameTree.go index 39c599b9..51b3d671 100644 --- a/pkg/pdfcpu/model/nameTree.go +++ b/pkg/pdfcpu/model/nameTree.go @@ -306,10 +306,6 @@ func (n *Node) Add(xRefTable *XRefTable, k string, v types.Object, m NameMap, na return n.HandleLeaf(xRefTable, k, v, m, nameRefDictKeys) } - if k == n.Kmin || k == n.Kmax { - return nil - } - if keyLess(k, n.Kmin) { n.Kmin = k } else if keyLess(n.Kmax, k) { diff --git a/pkg/pdfcpu/model/nup.go b/pkg/pdfcpu/model/nup.go index 1f7edddd..520d4683 100644 --- a/pkg/pdfcpu/model/nup.go +++ b/pkg/pdfcpu/model/nup.go @@ -72,7 +72,8 @@ type NUp struct { PageDim *types.Dim // Page dimensions in display unit. PageSize string // Paper size eg. A4L, A4P, A4(=default=A4P), see paperSize.go UserDim bool // true if one of dimensions or paperSize provided overriding the default. - Orient orientation // One of rd(=default),dr,ld,dl + Orient orientation // One of rd(=default),dr,ld,dl - grid orientation + Enforce bool // enforce best-fit orientation of individual content on grid. Grid *types.Dim // Intra page grid dimensions eg (2,2) PageGrid bool // Create a m x n grid of pages for PDF inputfiles only (think "extra page n-Up"). ImgInputFile bool // Process image or PDF input files. @@ -95,6 +96,7 @@ func DefaultNUpConfig() *NUp { Orient: RightDown, Margin: 3, Border: true, + Enforce: true, } } @@ -197,7 +199,7 @@ func createNUpFormForPDF(xRefTable *XRefTable, resDict *types.IndirectRef, conte } // NUpTilePDFBytesForPDF applies nup tiles to content bytes. -func NUpTilePDFBytes(wr io.Writer, rSrc, rDest *types.Rectangle, formResID string, nup *NUp, rotate, enforceOrient bool) { +func NUpTilePDFBytes(wr io.Writer, rSrc, rDest *types.Rectangle, formResID string, nup *NUp, rotate bool) { // rScr is a rectangular region represented by form formResID in form space. 
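Note on `UserDate.UnmarshalXML` in the new `metadata.go` above: it tries several layouts in turn and keeps the first one that parses. Below is a minimal sketch of that fallback pattern, not the code from this change. The names `xmpDateLayouts` and `parseXMPDate` are hypothetical. It also assumes, based on Go's `time` layout rules as I understand them, that the `-07:00` layout element accepts either sign in the input, so one signed-offset layout covers both positive and negative zones.

```go
package main

import (
	"fmt"
	"time"
)

// xmpDateLayouts (hypothetical) lists the accepted layouts in the order tried.
var xmpDateLayouts = []string{
	"2006-01-02T15:04:05Z",      // literal trailing Z, parsed as UTC
	"2006-01-02T15:04:05-07:00", // numeric zone offset, either sign
}

// parseXMPDate (hypothetical) returns the first successful parse of s,
// or the error from the last layout tried if none matches.
func parseXMPDate(s string) (time.Time, error) {
	var lastErr error
	for _, layout := range xmpDateLayouts {
		ts, err := time.Parse(layout, s)
		if err == nil {
			return ts, nil
		}
		lastErr = err
	}
	return time.Time{}, lastErr
}

func main() {
	inputs := []string{
		"2024-03-01T10:20:30Z",
		"2024-03-01T10:20:30+02:00",
		"2024-03-01T10:20:30-05:00",
		"not a date",
	}
	for _, s := range inputs {
		ts, err := parseXMPDate(s)
		fmt.Println(s, "->", ts, err)
	}
}
```

Keeping the layouts in a slice makes it easy to add another variant (for example, fractional seconds) without growing a chain of `if err == nil` blocks.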
@@ -227,7 +229,7 @@ func NUpTilePDFBytes(wr io.Writer, rSrc, rDest *types.Rectangle, formResID strin // Best fit translation of a source rectangle into a destination rectangle. // For nup we enforce the dest orientation, // whereas in cases where the original orientation needs to be preserved eg. for booklets, we don't. - w, h, dx, dy, r := types.BestFitRectIntoRect(rSrc, rDestCr, enforceOrient, false) + w, h, dx, dy, r := types.BestFitRectIntoRect(rSrc, rDestCr, nup.Enforce, false) if nup.BgColor != nil { if nup.ImgInputFile { @@ -319,7 +321,7 @@ func (ctx *Context) NUpTilePDFBytesForPDF( } // Retrieve content stream bytes. - bb, err := ctx.PageContent(d) + bb, err := ctx.PageContent(d, pageNr) if err == ErrNoContent { // TODO render if has annotations. return nil @@ -358,7 +360,7 @@ func (ctx *Context) NUpTilePDFBytesForPDF( formsResDict.Insert(formResID, *formIndRef) // Append to content stream buf of destination page. - NUpTilePDFBytes(buf, cropBox, rDest, formResID, nup, rotate, true) + NUpTilePDFBytes(buf, cropBox, rDest, formResID, nup, rotate) return nil } diff --git a/pkg/pdfcpu/model/parse.go b/pkg/pdfcpu/model/parse.go index b6583ef2..03157291 100644 --- a/pkg/pdfcpu/model/parse.go +++ b/pkg/pdfcpu/model/parse.go @@ -18,6 +18,7 @@ package model import ( "context" + "fmt" "strconv" "strings" "unicode" @@ -33,6 +34,7 @@ var ( errArrayNotTerminated = errors.New("pdfcpu: parse: unterminated array") errDictionaryCorrupt = errors.New("pdfcpu: parse: corrupt dictionary") errDictionaryNotTerminated = errors.New("pdfcpu: parse: unterminated dictionary") + errDictionaryDuplicateKey = errors.New("pdfcpu: parse: duplicate key") errHexLiteralCorrupt = errors.New("pdfcpu: parse: corrupt hex literal") errHexLiteralNotTerminated = errors.New("pdfcpu: parse: hex literal not terminated") errNameObjectCorrupt = errors.New("pdfcpu: parse: corrupt name object") @@ -45,6 +47,8 @@ var ( errXrefStreamCorruptIndex = errors.New("pdfcpu: parse: xref stream dict corrupt entry 
Index") errObjStreamMissingN = errors.New("pdfcpu: parse: obj stream dict missing entry W") errObjStreamMissingFirst = errors.New("pdfcpu: parse: obj stream dict missing entry First") + + ErrCorruptObjectOffset = errors.New("pdfcpu: corrupt object offset") ) func positionToNextWhitespace(s string) (int, string) { @@ -74,15 +78,15 @@ func positionToNextWhitespaceOrChar(s, chars string) (int, string) { return -1, s } -func positionToNextEOL(s string) string { +func positionToNextEOL(s string) (string, int) { for i, c := range s { for _, m := range "\x0A\x0D" { if c == m { - return s[i:] + return s[i:], i } } } - return "" + return "", 0 } // trimLeftSpace trims leading whitespace and trailing comment. @@ -118,7 +122,7 @@ func trimLeftSpace(s string, relaxed bool) (string, bool) { break } // trim PDF comment (= '%' up to eol) - s = positionToNextEOL(s) + s, _ = positionToNextEOL(s) if log.ParseEnabled() { log.Parse.Printf("2 outstr: <%s>\n", s) } @@ -227,9 +231,38 @@ func delimiter(b byte) bool { return false } -// ParseObjectAttributes parses object number and generation of the next object for given string buffer. -func ParseObjectAttributes(line *string) (objectNumber *int, generationNumber *int, err error) { +func detectObj(s string) (string, string, error) { + i := strings.Index(s, "obj") + if i > 0 { + return s[:i], s[i+3:], nil + } + + i = strings.Index(s, "bj") + if i > 0 { + return s[:i], s[i+2:], nil + } + return "", "", errors.New("pdfcpu: ParseObjectAttributes: can't find \"obj\"") +} + +func cleanObjProlog(s string) (string, error) { + s, _ = trimLeftSpace(s, false) + if len(s) == 0 { + return "", errors.New("pdfcpu: ParseObjectAttributes: can't find object number") + } + + var b strings.Builder + for _, r := range s { + if r >= '0' && r <= '9' || r == ' ' { + b.WriteRune(r) + } + } + return b.String(), nil +} + +// ParseObjectAttributes parses object number and generation of the next object for given string buffer. 
+func ParseObjectAttributes(line *string) (*int, *int, error) { + // TODO always called twice ? if line == nil || len(*line) == 0 { return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: buf not available") } @@ -238,58 +271,55 @@ func ParseObjectAttributes(line *string) (objectNumber *int, generationNumber *i log.Parse.Printf("ParseObjectAttributes: buf=<%s>\n", *line) } - l := *line - var remainder string - - i := strings.Index(l, "obj") - if i < 0 { - return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find \"obj\"") + l, remainder, err := detectObj(*line) + if err != nil { + return nil, nil, err } - remainder = l[i+len("obj"):] - l = l[:i] - // object number - l, _ = trimLeftSpace(l, false) - if len(l) == 0 { - return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find object number") + l, err = cleanObjProlog(l) + if err != nil { + return nil, nil, err } - i, _ = positionToNextWhitespaceOrChar(l, "%") - if i <= 0 { - return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find end of object number") + i, _ := positionToNextWhitespaceOrChar(l, "%") + s := l + if i > 0 { + s = l[:i] } - objNr, err := strconv.Atoi(l[:i]) + objNr, err := strconv.Atoi(strings.TrimSpace(s)) if err != nil { - return nil, nil, err + return nil, nil, ErrCorruptObjectOffset } // generation number - l = l[i:] - l, _ = trimLeftSpace(l, false) - if len(l) == 0 { - return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find generation number") - } + genNr := 0 - i, _ = positionToNextWhitespaceOrChar(l, "%") - if i <= 0 { - return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find end of generation number") - } + if i > 0 { - genNr, err := strconv.Atoi(l[:i]) - if err != nil { - return nil, nil, err - } + l = l[i:] + l, _ = trimLeftSpace(l, false) + if len(l) == 0 { + return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find generation number") + } - objectNumber = &objNr - generationNumber = &genNr + i, _ = 
positionToNextWhitespaceOrChar(l, "%") + if i <= 0 { + return nil, nil, errors.New("pdfcpu: ParseObjectAttributes: can't find end of generation number") + } + + genNr, err = strconv.Atoi(l[:i]) + if err != nil { + return nil, nil, err + } + } *line = remainder - return objectNumber, generationNumber, nil + return &objNr, &genNr, nil } func parseArray(c context.Context, line *string) (*types.Array, error) { @@ -387,10 +417,6 @@ func parseStringLiteral(line *string) (types.Object, error) { return nil, errBufNotAvailable } - if log.ParseEnabled() { - log.Parse.Printf("ParseObject: value = String Literal: <%s>\n", *line) - } - l := *line if log.ParseEnabled() { @@ -451,7 +477,10 @@ func parseHexLiteral(line *string) (types.Object, error) { hexStr, ok := hexString(strings.TrimSpace(l[:eov])) if !ok { - return nil, errHexLiteralCorrupt + // Skip junk + *line = forwardParseBuf(l[eov:], 1) + return nil, nil + //return nil, errHexLiteralCorrupt } // position behind '>' @@ -510,18 +539,19 @@ func parseName(line *string) (*types.Name, error) { return &nameObj, nil } -func insertKey(d types.Dict, key string, val types.Object) error { +func insertKey(d types.Dict, key string, val types.Object, relaxed bool) error { if _, found := d[key]; !found { d[key] = val } else { - // for now we digest duplicate keys. - // TODO - // if !validationRelaxed { + + // was: for now we ignore duplicate keys - config flag ? 
+ + // if !relaxed { // return errDictionaryDuplicateKey // } - // if log.CLIEnabled() { - // log.CLI.Printf("ParseDict: digesting duplicate key\n") - // } + + d[key] = val + ShowDigestedSpecViolation(fmt.Sprintf("duplicate key \"%s\"", key)) } if log.ParseEnabled() { @@ -531,12 +561,16 @@ func insertKey(d types.Dict, key string, val types.Object) error { return nil } +func dictString(l string) bool { + return len(l) > 0 && !strings.HasPrefix(l, ">>") +} + func processDictKeys(c context.Context, line *string, relaxed bool) (types.Dict, error) { l := *line var eol bool d := types.NewDict() - for !strings.HasPrefix(l, ">>") { + for dictString(l) { if err := c.Err(); err != nil { return nil, err @@ -544,7 +578,11 @@ func processDictKeys(c context.Context, line *string, relaxed bool) (types.Dict, keyName, err := parseName(&l) if err != nil { - return nil, err + if !relaxed { + return nil, err + } + // Skip junk. + l = forwardParseBuf(l, 1) } if log.ParseEnabled() { @@ -554,10 +592,12 @@ func processDictKeys(c context.Context, line *string, relaxed bool) (types.Dict, // Position to first non whitespace after key. l, eol = trimLeftSpace(l, relaxed) + if err != nil && relaxed { + // Skip junk. + continue + } + if len(l) == 0 { - if log.ParseEnabled() { - log.Parse.Println("ParseDict: only whitespace after key") - } // Only whitespace after key. return nil, errDictionaryNotTerminated } @@ -576,13 +616,13 @@ func processDictKeys(c context.Context, line *string, relaxed bool) (types.Dict, // Specifying the null object as the value of a dictionary entry (7.3.7, "Dictionary Objects") // shall be equivalent to omitting the entry entirely. if val != nil { - if err := insertKey(d, string(*keyName), val); err != nil { + if err := insertKey(d, string(*keyName), val, relaxed); err != nil { return nil, err } } // We are positioned on the char behind the last parsed dict value. 
- if len(l) == 0 { + if len(l) < 2 { return nil, errDictionaryNotTerminated } @@ -665,9 +705,10 @@ func startParseNumericOrIndRef(l string) (string, string, int) { 0.000000000 */ if len(str) > 1 && str[0] == '0' { - if str[1] == '+' || str[1] == '-' { + switch str[1] { + case '+', '-': str = str[1:] - } else if str[1] == '.' { + case '.': var i int for i = 2; len(str) > i && str[i] == '0'; i++ { } @@ -688,8 +729,7 @@ func isRangeError(err error) bool { return false } -func parseIndRef(s, l, l1 string, line *string, i, i2 int, rangeErr bool) (types.Object, error) { - +func parseIndRef(s, l, l1 string, line *string, i, i2 int) (types.Object, error) { g, err := strconv.Atoi(s) if err != nil { // 2nd int(generation number) not available. @@ -705,9 +745,6 @@ func parseIndRef(s, l, l1 string, line *string, i, i2 int, rangeErr bool) (types l, _ = trimLeftSpace(l, false) if len(l) == 0 { - if rangeErr { - return nil, err - } // only whitespace *line = l1 return types.Integer(i), nil @@ -715,17 +752,10 @@ func parseIndRef(s, l, l1 string, line *string, i, i2 int, rangeErr bool) (types if l[0] == 'R' { *line = forwardParseBuf(l, 1) - if rangeErr { - return nil, nil - } // We have all 3 components to create an indirect reference. return *types.NewIndirectRef(i, g), nil } - if rangeErr { - return nil, err - } - // 'R' not available. // Can't be an indirect reference. if log.ParseEnabled() { @@ -737,15 +767,27 @@ func parseIndRef(s, l, l1 string, line *string, i, i2 int, rangeErr bool) (types } func parseFloat(s string) (types.Object, error) { - f, err := strconv.ParseFloat(s, 64) - if err != nil { - return nil, err + // Replace ',' with '.' 
to accept comma as decimal separator + s = strings.Replace(s, ",", ".", 1) + + f, n := strconv.ParseFloat(s, 64) + if n != nil { + // Fallback: handle ".-" case (e.g., ".-5") + s = strings.Replace(s, ".-", ".", 1) + f, err := strconv.ParseFloat(s, 64) + if err != nil { + // Skip junk + return nil, nil + } + if log.ParseEnabled() { + log.Parse.Printf("parseFloat: value is: %f\n", f) + } + return types.Float(f), nil } if log.ParseEnabled() { log.Parse.Printf("parseFloat: value is: %f\n", f) } - return types.Float(f), nil } @@ -762,29 +804,22 @@ func parseNumericOrIndRef(line *string) (types.Object, error) { s, l1, i1 := startParseNumericOrIndRef(l) // Try int - var rangeErr bool i, err := strconv.Atoi(s) if err != nil { - rangeErr = isRangeError(err) - if !rangeErr { - // Try float + if isRangeError(err) { + // #407 + i = 0 *line = l1 - return parseFloat(s) + return types.Integer(i), nil } - - // #407 - i = 0 + *line = l1 + return parseFloat(s) } // We have an Int! // if not followed by whitespace return sole integer value. if i1 <= 0 || delimiter(l[i1]) { - - if rangeErr { - return nil, err - } - if log.ParseEnabled() { log.Parse.Printf("parseNumericOrIndRef: value is numeric int: %d\n", i) } @@ -799,9 +834,6 @@ func parseNumericOrIndRef(line *string) (types.Object, error) { l, _ = trimLeftSpace(l, false) if len(l) == 0 { // only whitespace - if rangeErr { - return nil, err - } *line = l1 return types.Integer(i), nil } @@ -811,9 +843,6 @@ func parseNumericOrIndRef(line *string) (types.Object, error) { // if only 2 token, can't be indirect reference. // if not followed by whitespace return sole integer value. 
if i2 <= 0 || delimiter(l[i2]) { - if rangeErr { - return nil, err - } if log.ParseEnabled() { log.Parse.Printf("parseNumericOrIndRef: 2 objects => value is numeric int: %d\n", i) } @@ -826,7 +855,7 @@ func parseNumericOrIndRef(line *string) (types.Object, error) { s = l[:i2] } - return parseIndRef(s, l, l1, line, i, i2, rangeErr) + return parseIndRef(s, l, l1, line, i, i2) } func parseHexLiteralOrDict(c context.Context, l *string) (val types.Object, err error) { @@ -862,9 +891,16 @@ func parseHexLiteralOrDict(c context.Context, l *string) (val types.Object, err return val, nil } -func parseBooleanOrNull(l string) (val types.Object, s string, ok bool) { +func parseBooleanOrNull(l string) (types.Object, string, bool) { + + if len(l) < 4 { + return nil, "", false + } + + s := strings.ToLower(l[:4]) + // null, absent object - if strings.HasPrefix(l, "null") { + if strings.HasPrefix(s, "null") { if log.ParseEnabled() { log.Parse.Println("parseBoolean: value = null") } @@ -872,15 +908,21 @@ func parseBooleanOrNull(l string) (val types.Object, s string, ok bool) { } // boolean true - if strings.HasPrefix(l, "true") { + if strings.HasPrefix(s, "true") { if log.ParseEnabled() { log.Parse.Println("parseBoolean: value = true") } return types.Boolean(true), "true", true } + if len(l) < 5 { + return nil, "", false + } + + s += strings.ToLower(l[4:5]) + // boolean false - if strings.HasPrefix(l, "false") { + if strings.HasPrefix(s, "false") { if log.ParseEnabled() { log.Parse.Println("parseBoolean: value = false") } @@ -1098,3 +1140,254 @@ func ObjectStreamDict(sd *types.StreamDict) (*types.ObjectStreamDict, error) { return &osd, nil } + +func isMarkerTerminated(r rune) bool { + return r == 0x00 || unicode.IsSpace(r) +} + +func detectMarker(line, marker string) int { + i := strings.Index(line, marker) + if i < 0 { + return i + } + if i+len(marker) >= len(line) { + return -1 + } + off := i + len(marker) + ind := i + for !isMarkerTerminated(rune(line[off])) { + line = line[off:] 
+		if marker == "endobj" {
+			j := strings.Index(line, "xref")
+			if j >= 0 {
+				r := rune(line[j+4])
+				if isMarkerTerminated(r) {
+					return ind
+				}
+			}
+		}
+		i = strings.Index(line, marker)
+		if i < 0 {
+			return -1
+		}
+		if i+len(marker) >= len(line) {
+			return -1
+		}
+		off = i + len(marker)
+		ind += off
+	}
+
+	return ind
+}
+
+func detectMarkers(line string, endInd, streamInd *int) {
+	//fmt.Printf("buflen=%d\n%s", len(line), hex.Dump([]byte(line)))
+	if *endInd == 0 {
+		*endInd = detectMarker(line, "endobj")
+
+	}
+	if *streamInd == 0 {
+		*streamInd = detectMarker(line, "stream")
+	}
+}
+
+func positionAfterStringLiteral(line string) (string, int, error) {
+	i := balancedParenthesesPrefix(line)
+	if i < 0 {
+		return "", 0, errStringLiteralCorrupt
+	}
+
+	line = forwardParseBuf(line[i:], 1)
+
+	return line, i + 1, nil
+}
+
+func posFloor(pos1, pos2 int) int {
+	if pos1 < 0 {
+		return pos2
+	}
+	if pos1 < pos2 {
+		return pos1
+	}
+	if pos2 < 0 {
+		return pos1
+	}
+	return pos2
+}
+
+func detectNonEscaped(line, s string) int {
+	var ind int
+	for {
+		i := strings.Index(line, s)
+		if i < 0 {
+			// did not find s
+			return -1
+		}
+		if i == 0 {
+			// found s at pos 0
+			return ind
+		}
+		if line[i-1] != 0x5c {
+			// found s at pos i
+			return ind + i
+		}
+		// found escaped s
+		if i == len(line)-1 {
+			// last is escaped s -> did not find s
+			return -1
+		}
+		// moving on after escaped s
+		line = line[i+1:]
+		ind += i + 1
+	}
+}
+
+func applyOffBoth(endInd, streamInd, off int) (int, int, error) {
+	if endInd >= 0 {
+		endInd += off
+	}
+	if streamInd >= 0 {
+		streamInd += off
+	}
+	return endInd, streamInd, nil
+}
+
+func applyOffEndIndFirst(endInd, streamInd, off, floor int) (int, int, error) {
+	endInd += off
+	if streamInd > 0 {
+		if streamInd > floor {
+			// stream after any ( or % to skip
+			streamInd = -1
+		} else {
+			streamInd += off
+		}
+	}
+	return endInd, streamInd, nil
+}
+
+func applyOffStreamIndFirst(endInd, streamInd, off, floor int) (int, int, error) {
+	streamInd += off
+	if endInd > 0 {
+		if endInd > floor {
+			// endobj after any ( or % to skip
+			endInd = -1
+		} else {
+			endInd += off
+		}
+	}
+	return endInd, streamInd, nil
+}
+
+func isComment(commentPos, strLitPos int) bool {
+	return commentPos >= 0 && (strLitPos < 0 || commentPos < strLitPos)
+}
+
+func DetectKeywords(line string) (endInd int, streamInd int, err error) {
+	return DetectKeywordsWithContext(context.Background(), line)
+}
+
+func skipComment(line string, commentPos int, off, endInd, streamInd *int) string {
+	l, i := positionToNextEOL(line[commentPos:])
+	if l == "" {
+		return l
+	}
+	delta := commentPos + i
+	*off += delta
+
+	// Adjust found positions for changed line.
+	if *endInd > delta {
+		*endInd -= delta
+	} else if *endInd != -1 {
+		*endInd = 0
+	}
+	if *streamInd > delta {
+		*streamInd -= delta
+	} else if *streamInd != -1 {
+		*streamInd = 0
+	}
+	return l
+}
+
+func skipStringLit(line string, strLitPos int, off, endInd, streamInd *int) (string, error) {
+	l, i, err := positionAfterStringLiteral(line[strLitPos:])
+	if err != nil {
+		return "", err
+	}
+	delta := strLitPos + i
+	*off += delta
+	// Adjust found positions for changed line.
+	if *endInd > delta {
+		*endInd -= delta
+	} else if *endInd != -1 {
+		*endInd = 0
+	}
+	if *streamInd > delta {
+		*streamInd -= delta
+	} else if *streamInd != -1 {
+		*streamInd = 0
+	}
+	return l, nil
+}
+
+func skipCommentOrStringLiteral(line string, commentPos, slPos int, off, endInd, streamInd *int) (string, error) {
+	if isComment(commentPos, slPos) {
+		// skip comment if % before any (
+		line = skipComment(line, commentPos, off, endInd, streamInd)
+		if line == "" {
+			return "", nil
+		}
+		return line, nil
+	}
+	return skipStringLit(line, slPos, off, endInd, streamInd)
+}
+
+func DetectKeywordsWithContext(c context.Context, line string) (endInd int, streamInd int, err error) {
+	// return endInd or streamInd which ever first encountered.
+	off := 0
+	strLitPos, commentPos := 0, 0
+	for {
+		if err := c.Err(); err != nil {
+			return -1, -1, err
+		}
+
+		detectMarkers(line, &endInd, &streamInd)
+
+		if off == 0 && endInd < 0 && streamInd < 0 {
+			return -1, -1, nil
+		}
+
+		// Don't re-search in partial line if known to be not present.
+		if strLitPos != -1 {
+			strLitPos = detectNonEscaped(line, "(")
+		}
+		if commentPos != -1 {
+			commentPos = detectNonEscaped(line, "%")
+		}
+
+		if strLitPos < 0 && commentPos < 0 {
+			// neither ( nor % to skip
+			return applyOffBoth(endInd, streamInd, off)
+		}
+
+		floor := posFloor(strLitPos, commentPos)
+
+		if endInd > 0 {
+			if endInd < floor {
+				// endobj before any ( or % to skip
+				return applyOffEndIndFirst(endInd, streamInd, off, floor)
+			}
+		}
+
+		if streamInd > 0 {
+			if streamInd < floor {
+				// stream before any ( or % to skip
+				return applyOffStreamIndFirst(endInd, streamInd, off, floor)
+			}
+		}
+
+		line, err = skipCommentOrStringLiteral(line, commentPos, strLitPos, &off, &endInd, &streamInd)
+		if err != nil {
+			return -1, -1, err
+		}
+	}
+}
diff --git a/pkg/pdfcpu/model/parseConfig.go b/pkg/pdfcpu/model/parseConfig.go
index f0397bf2..44571d46 100644
--- a/pkg/pdfcpu/model/parseConfig.go
+++ b/pkg/pdfcpu/model/parseConfig.go
@@ -22,6 +22,7 @@ package model
 import (
 	"bytes"
 	"io"
+	"strings"
 
 	"github.com/angel-one/pdfcpu/pkg/pdfcpu/types"
 	"github.com/pkg/errors"
@@ -29,6 +30,8 @@ import (
 )
 
 type configuration struct {
+	CreationDate string `yaml:"created"`
+	Version string `yaml:"version"`
 	CheckFileNameExt bool `yaml:"checkFileNameExt"`
 	Reader15 bool `yaml:"reader15"`
 	DecodeAllStreams bool `yaml:"decodeAllStreams"`
@@ -41,18 +44,27 @@ type configuration struct {
 	EncryptKeyLength int `yaml:"encryptKeyLength"`
 	Permissions int `yaml:"permissions"`
 	Unit string `yaml:"unit"`
-	Units string `yaml:"units"` // Be flexible if version < v0.3.8
 	TimestampFormat string `yaml:"timestampFormat"`
 	DateFormat string `yaml:"dateFormat"`
+	Optimize bool `yaml:"optimize"`
+	OptimizeBeforeWriting bool `yaml:"optimizeBeforeWriting"`
+	OptimizeResourceDicts bool `yaml:"optimizeResourceDicts"`
 	OptimizeDuplicateContentStreams bool `yaml:"optimizeDuplicateContentStreams"`
 	CreateBookmarks bool `yaml:"createBookmarks"`
 	NeedAppearances bool `yaml:"needAppearances"`
+	Offline bool `yaml:"offline"`
+	Timeout int `yaml:"timeout"`
+	TimeoutCRL int `yaml:"timeoutCRL"`
+	TimeoutOCSP int `yaml:"timeoutOCSP"`
+	PreferredCertRevocationChecker string `yaml:"preferredCertRevocationChecker"`
 }
 
 func loadedConfig(c configuration, configPath string) *Configuration {
 	var conf Configuration
 	conf.Path = configPath
 
+	conf.CreationDate = c.CreationDate
+	conf.Version = c.Version
 	conf.CheckFileNameExt = c.CheckFileNameExt
 	conf.Reader15 = c.Reader15
 	conf.DecodeAllStreams = c.DecodeAllStreams
@@ -93,9 +105,26 @@ func loadedConfig(c configuration, configPath string) *Configuration {
 
 	conf.TimestampFormat = c.TimestampFormat
 	conf.DateFormat = c.DateFormat
+	conf.Optimize = c.Optimize
+
+	// TODO add to config.yml
+	conf.OptimizeBeforeWriting = true
+
+	conf.OptimizeResourceDicts = c.OptimizeResourceDicts
 	conf.OptimizeDuplicateContentStreams = c.OptimizeDuplicateContentStreams
 	conf.CreateBookmarks = c.CreateBookmarks
 	conf.NeedAppearances = c.NeedAppearances
+	conf.Offline = c.Offline
+	conf.Timeout = c.Timeout
+	conf.TimeoutCRL = c.TimeoutCRL
+	conf.TimeoutOCSP = c.TimeoutOCSP
+
+	switch strings.ToLower(c.PreferredCertRevocationChecker) {
+	case "crl":
+		conf.PreferredCertRevocationChecker = CRL
+	case "ocsp":
+		conf.PreferredCertRevocationChecker = OCSP
+	}
 
 	return &conf
 }
@@ -118,15 +147,11 @@ func parseConfigFile(r io.Reader, configPath string) error {
 	if !types.MemberOf(c.ValidationMode, []string{"ValidationStrict", "ValidationRelaxed"}) {
 		return errors.Errorf("invalid validationMode: %s", c.ValidationMode)
 	}
+
 	if !types.MemberOf(c.Eol, []string{"EolLF", "EolCR", "EolCRLF"}) {
 		return errors.Errorf("invalid eol: %s", c.Eol)
 	}
-	if c.Unit == "" {
-		// v0.3.8 modifies "units" to "unit".
-		if c.Units != "" {
-			c.Unit = c.Units
-		}
-	}
+
 	if !types.MemberOf(c.Unit, []string{"points", "inches", "cm", "mm"}) {
 		return errors.Errorf("invalid unit: %s", c.Unit)
 	}
@@ -135,6 +160,13 @@ func parseConfigFile(r io.Reader, configPath string) error {
 		return errors.Errorf("encryptKeyLength possible values: 40, 128, 256, got: %s", c.Unit)
 	}
 
+	if !types.MemberOf(c.PreferredCertRevocationChecker, []string{"crl", "ocsp"}) {
+		if c.PreferredCertRevocationChecker != "" {
+			return errors.Errorf("invalid preferred certificate revocation checker: %s", c.PreferredCertRevocationChecker)
+		}
+		c.PreferredCertRevocationChecker = "crl"
+	}
+
 	loadedDefaultConfig = loadedConfig(c, configPath)
 
 	return nil
diff --git a/pkg/pdfcpu/model/parseConfig_js.go b/pkg/pdfcpu/model/parseConfig_js.go
index 26a4387d..59184a86 100644
--- a/pkg/pdfcpu/model/parseConfig_js.go
+++ b/pkg/pdfcpu/model/parseConfig_js.go
@@ -28,6 +28,16 @@ import (
 
 // This gets rid of the gopkg.in/yaml.v2 dependency for wasm builds.
 
+func handleCreationDate(v string, c *Configuration) error {
+	c.CreationDate = v
+	return nil
+}
+
+func handleVersion(v string, c *Configuration) error {
+	c.Version = v
+	return nil
+}
+
 func handleCheckFileNameExt(k, v string, c *Configuration) error {
 	v = strings.ToLower(v)
 	if v != "true" && v != "false" {
@@ -131,6 +141,33 @@ func handleConfEncryptKeyLength(v string, c *Configuration) error {
 	return nil
 }
 
+func handleTimeout(v string, c *Configuration) error {
+	i, err := strconv.Atoi(v)
+	if err != nil {
+		return errors.Errorf("timeout is numeric > 0, got: %s", v)
+	}
+	c.Timeout = i
+	return nil
+}
+
+func handleTimeoutCRL(v string, c *Configuration) error {
+	i, err := strconv.Atoi(v)
+	if err != nil {
+		return errors.Errorf("timeoutCRL is numeric > 0, got: %s", v)
+	}
+	c.TimeoutCRL = i
+	return nil
+}
+
+func handleTimeoutOCSP(v string, c *Configuration) error {
+	i, err := strconv.Atoi(v)
+	if err != nil {
+		return errors.Errorf("timeoutOCSP is numeric > 0, got: %s", v)
+	}
+	c.TimeoutOCSP = i
+	return nil
+}
+
 func handleConfPermissions(v string, c *Configuration) error {
 	i, err := strconv.Atoi(v)
 	if err != nil {
@@ -157,6 +194,21 @@ func handleConfUnit(v string, c *Configuration) error {
 	return nil
 }
 
+func handlePreferredCertRevocationChecker(v string, c *Configuration) error {
+	v1 := strings.ToLower(v)
+	switch v1 {
+	case "crl":
+		c.PreferredCertRevocationChecker = CRL
+	case "ocsp":
+		c.PreferredCertRevocationChecker = OCSP
+	case "":
+		c.PreferredCertRevocationChecker = CRL
+	default:
+		return errors.Errorf("invalid preferredCertRevocationChecker: %s", v)
+	}
+	return nil
+}
+
 func handleTimestampFormat(v string, c *Configuration) error {
 	c.TimestampFormat = v
 	return nil
@@ -167,36 +219,23 @@ func handleDateFormat(v string, c *Configuration) error {
 	return nil
 }
 
-func handleOptimizeDuplicateContentStreams(k, v string, c *Configuration) error {
-	v = strings.ToLower(v)
-	if v != "true" && v != "false" {
-		return errors.Errorf("config key %s is boolean", k)
-	}
-	c.OptimizeDuplicateContentStreams = v == "true"
-	return nil
-}
-
-func handleCreateBookmarks(k, v string, c *Configuration) error {
-	v = strings.ToLower(v)
-	if v != "true" && v != "false" {
-		return errors.Errorf("config key %s is boolean", k)
-	}
-	c.CreateBookmarks = v == "true"
-	return nil
-}
-
-func handleNeedAppearances(k, v string, c *Configuration) error {
+func boolean(k, v string) (bool, error) {
 	v = strings.ToLower(v)
 	if v != "true" && v != "false" {
-		return errors.Errorf("config key %s is boolean", k)
+		return false, errors.Errorf("config key %s is boolean", k)
 	}
-	c.NeedAppearances = v == "true"
-	return nil
+	return v == "true", nil
 }
 
 func parseKeysPart1(k, v string, c *Configuration) (bool, error) {
 	switch k {
 
+	case "created":
+		return true, handleCreationDate(v, c)
+
+	case "version":
+		return true, handleVersion(v, c)
+
 	case "checkFileNameExt":
 		return true, handleCheckFileNameExt(k, v, c)
 
@@ -225,38 +264,66 @@ func parseKeysPart1(k, v string, c *Configuration) (bool, error) {
 	return false, nil
 }
 
-func parseKeysPart2(k, v string, c *Configuration) error {
+func parseKeysPart2(k, v string, c *Configuration) (bool, error) {
 	switch k {
 
 	case "encryptUsingAES":
-		return handleConfEncryptUsingAES(k, v, c)
+		return true, handleConfEncryptUsingAES(k, v, c)
 
 	case "encryptKeyLength":
-		return handleConfEncryptKeyLength(v, c)
+		return true, handleConfEncryptKeyLength(v, c)
 
 	case "permissions":
-		return handleConfPermissions(v, c)
+		return true, handleConfPermissions(v, c)
 
 	case "unit", "units":
-		return handleConfUnit(v, c)
+		return true, handleConfUnit(v, c)
 
 	case "timestampFormat":
-		return handleTimestampFormat(v, c)
+		return true, handleTimestampFormat(v, c)
 
 	case "dateFormat":
-		return handleDateFormat(v, c)
+		return true, handleDateFormat(v, c)
+
+	case "timeout":
+		return true, handleTimeout(v, c)
+
+	case "timeoutCRL":
+		return true, handleTimeoutCRL(v, c)
+
+	case "timeoutOCSP":
+		return true, handleTimeoutOCSP(v, c)
+
+	case "preferredCertRevocationChecker":
+		return true, handlePreferredCertRevocationChecker(v, c)
+	}
+
+	return false, nil
+}
+
+func parseKeysPart3(k, v string, c *Configuration) (err error) {
+	switch k {
+
+	case "optimize":
+		c.Optimize, err = boolean(k, v)
+
+	case "optimizeResourceDicts":
+		c.OptimizeResourceDicts, err = boolean(k, v)
 
 	case "optimizeDuplicateContentStreams":
-		return handleOptimizeDuplicateContentStreams(k, v, c)
+		c.OptimizeDuplicateContentStreams, err = boolean(k, v)
 
 	case "createBookmarks":
-		return handleCreateBookmarks(k, v, c)
+		c.CreateBookmarks, err = boolean(k, v)
 
 	case "needAppearances":
-		return handleNeedAppearances(k, v, c)
+		c.NeedAppearances, err = boolean(k, v)
+
+	case "offline":
+		c.Offline, err = boolean(k, v)
 	}
 
-	return nil
+	return err
 }
 
 func parseKeyValue(k, v string, c *Configuration) error {
@@ -267,7 +334,16 @@ func parseKeyValue(k, v string, c *Configuration) error {
 	if ok {
 		return nil
 	}
-	return parseKeysPart2(k, v, c)
+
+	ok, err = parseKeysPart2(k, v, c)
+	if err != nil {
+		return err
+	}
+	if ok {
+		return nil
+	}
+
+	return parseKeysPart3(k, v, c)
 }
 
 func parseConfigFile(r io.Reader, configPath string) error {
@@ -275,6 +351,9 @@ func parseConfigFile(r io.Reader, configPath string) error {
 	var conf Configuration
 	conf.Path = configPath
 
+	// TODO add to config.yml
+	conf.OptimizeBeforeWriting = true
+
 	s := bufio.NewScanner(r)
 	for s.Scan() {
 		t := s.Text()
diff --git a/pkg/pdfcpu/model/parseContent.go b/pkg/pdfcpu/model/parseContent.go
index 7a5fa33a..f4626894 100644
--- a/pkg/pdfcpu/model/parseContent.go
+++ b/pkg/pdfcpu/model/parseContent.go
@@ -48,23 +48,31 @@ func skipDict(l *string) error {
 			return errDictionaryCorrupt
 		}
 		if s[i] == '<' {
-			j++
+			if i == len(s)-1 {
+				return errDictionaryCorrupt
+			}
+			if s[i+1] == '<' {
+				j++
+				s = s[i+2:]
+				continue
+			}
 			s = s[i+1:]
 			continue
 		}
 		if s[i] == '>' {
-			if j > 0 {
-				j--
-				s = s[i+1:]
-				continue
-			}
-			// >> ?
-			s = s[i:]
-			if !strings.HasPrefix(s, ">>") {
+			if i == len(s)-1 {
 				return errDictionaryCorrupt
 			}
-			*l = s[2:]
-			break
+			if s[i+1] == '>' {
+				if j > 0 {
+					j--
+					s = s[i+2:]
+					continue
+				}
+				*l = s[i+2:]
+				break
+			}
+			s = s[i+1:]
 		}
 	}
 	return nil
@@ -75,9 +83,17 @@ func skipStringLiteral(l *string) error {
 	i := 0
 	for {
 		i = strings.IndexByte(s, byte(')'))
-		if i <= 0 || i > 0 && s[i-1] != '\\' || i > 1 && s[i-2] == '\\' {
+		if i <= 0 || i > 0 && s[i-1] != '\\' {
+			break
+		}
+		k := 0
+		for j := i - 1; j >= 0 && s[j] == '\\'; j-- {
+			k++
+		}
+		if k%2 == 0 {
 			break
 		}
+		// Skip \)
 		s = s[i+1:]
 	}
 	if i < 0 {
@@ -128,21 +144,70 @@ func skipTJ(l *string) error {
 	return nil
 }
 
+func lookupEI(l *string) (int, error) {
+	s := *l
+	//fmt.Printf("\n%s\n", hex.Dump([]byte(s)))
+	for i := 2; i <= len(s)-2; i++ {
+		if s[i:i+2] != "EI" {
+			continue
+		}
+		j := i + 2
+		ws := 0
+		for j < len(s) && unicode.IsSpace(rune(s[j])) && ws < 2 {
+			j++
+			ws++
+		}
+		switch {
+		case j == len(s) && ws <= 2:
+			// "EI" at end or followed by 1–2 spaces till end
+			return i, nil
+		case ws >= 1 && ws <= 2 && j < len(s) && s[j] == 'Q':
+			// "EI" followed by 1–2 spaces, then 'Q'
+			return i, nil
+		case ws == 0 && j == len(s):
+			// suffix "EI"
+			return i, nil
+		}
+	}
+	return 0, errBIExpressionCorrupt
+}
+
 func skipBI(l *string, prn PageResourceNames) error {
 	s := *l
+	//fmt.Printf("skipBI <%s>\n", s)
 	for {
 		s = strings.TrimLeftFunc(s, whitespaceOrEOL)
-		if strings.HasPrefix(s, "EI") && whitespaceOrEOL(rune(s[2])) {
-			s = s[2:]
+		if strings.HasPrefix(s, "ID") && whitespaceOrEOL(rune(s[2])) {
+			i, err := lookupEI(&s)
+			if err != nil {
+				return err
+			}
+			s = s[i+2:]
 			break
 		}
-		// TODO Check len(s) > 0
+		if len(s) == 0 {
+			return errBIExpressionCorrupt
+		}
 		if s[0] == '/' {
 			s = s[1:]
 			i, _ := positionToNextWhitespaceOrChar(s, "/")
 			if i < 0 {
 				return errBIExpressionCorrupt
 			}
+			token := s[:i]
+			if token == "CS" || token == "ColorSpace" {
+				s = s[i:]
+				s, _ = trimLeftSpace(s, false)
+				s = s[1:]
+				i, _ = positionToNextWhitespaceOrChar(s, "/")
+				if i < 0 {
+					return errBIExpressionCorrupt
+				}
+				name := s[:i]
+				if !types.MemberOf(name, []string{"DeviceGray", "DeviceRGB", "DeviceCMYK", "Indexed", "G", "RGB", "CMYK", "I"}) {
+					prn["ColorSpace"][name] = true
+				}
+			}
 			s = s[i:]
 			continue
 		}
@@ -164,6 +229,12 @@ func positionToNextContentToken(line *string, prn PageResourceNames) (bool, erro
 			// whitespace or eol only
 			return true, nil
 		}
+		if l[0] == '%' {
+			// Skip comment.
+			l, _ = positionToNextEOL(l)
+			continue
+		}
+
 		if l[0] == '[' {
 			// Skip TJ expression:
 			// [()...()] TJ
@@ -200,12 +271,12 @@ func positionToNextContentToken(line *string, prn PageResourceNames) (bool, erro
 	}
 }
 
-func nextContentToken(line *string, prn PageResourceNames) (string, error) {
+func nextContentToken(pre string, line *string, prn PageResourceNames) (string, error) {
 	// A token is either a name or some chunk terminated by white space or one of /, (, [
 	if noBuf(line) {
 		return "", nil
 	}
-	l := *line
+	l := pre + *line
 	t := ""
 
 	//log.Parse.Printf("nextContentToken: start buf= <%s>\n", *line)
@@ -258,9 +329,8 @@ func nextContentToken(line *string, prn PageResourceNames) (string, error) {
 	return t, nil
 }
 
-func resourceNameAtPos1(s, name string, prn PageResourceNames) bool {
-	switch s {
-	case "cs", "CS":
+func colorSpace(s, name string, prn PageResourceNames) bool {
+	if strings.HasPrefix(s, "cs") || strings.HasPrefix(s, "CS") {
 		if !types.MemberOf(name, []string{"DeviceGray", "DeviceRGB", "DeviceCMYK", "Pattern"}) {
 			prn["ColorSpace"][name] = true
 			if log.ParseEnabled() {
@@ -268,72 +338,90 @@ func resourceNameAtPos1(s, name string, prn PageResourceNames) bool {
 			}
 		}
 		return true
+	}
+	return false
+}
+
+func resourceNameAtPos1(s, name string, prn PageResourceNames) (string, bool) {
+	if colorSpace(s, name, prn) {
+		return s[2:], true
+	}
 
-	case "gs":
+	if strings.HasPrefix(s, "gs") {
 		prn["ExtGState"][name] = true
 		if log.ParseEnabled() {
 			log.Parse.Printf("ExtGState[%s]\n", name)
 		}
-		return true
+		return s[2:], true
+	}
 
-	case "Do":
+	if strings.HasPrefix(s, "Do") {
 		prn["XObject"][name] = true
 		if log.ParseEnabled() {
 			log.Parse.Printf("XObject[%s]\n", name)
 		}
-		return true
+		return s[2:], true
+	}
 
-	case "sh":
+	if strings.HasPrefix(s, "sh") {
 		prn["Shading"][name] = true
 		if log.ParseEnabled() {
 			log.Parse.Printf("Shading[%s]\n", name)
 		}
-		return true
+		return s[2:], true
+	}
 
-	case "scn", "SCN":
+	if strings.HasPrefix(s, "scn") || strings.HasPrefix(s, "SCN") {
 		prn["Pattern"][name] = true
 		if log.ParseEnabled() {
 			log.Parse.Printf("Pattern[%s]\n", name)
 		}
-		return true
+		return s[3:], true
+	}
 
-	case "ri", "BMC", "MP":
-		return true
+	if strings.HasPrefix(s, "ri") || strings.HasPrefix(s, "MP") {
+		return s[2:], true
+	}
+	if strings.HasPrefix(s, "BMC") {
+		return s[3:], true
 	}
 
-	return false
+	return "", false
 }
 
-func resourceNameAtPos2(s, name string, prn PageResourceNames) bool {
+func resourceNameAtPos2(s, name string, prn PageResourceNames) (string, bool) {
 	switch s {
 	case "Tf":
 		prn["Font"][name] = true
 		if log.ParseEnabled() {
 			log.Parse.Printf("Font[%s]\n", name)
 		}
-		return true
+		return "", true
 
 	case "BDC", "DP":
 		prn["Properties"][name] = true
 		if log.ParseEnabled() {
 			log.Parse.Printf("Properties[%s]\n", name)
 		}
-		return true
+		return "", true
 	}
 
-	return false
+	return "", false
 }
 
 func parseContent(s string) (PageResourceNames, error) {
 	var (
+		pre string
 		name string
 		n bool
+		ok bool
 	)
 	prn := NewPageResourceNames()
 
 	//fmt.Printf("parseContent:\n%s\n", hex.Dump([]byte(s)))
 	for pos := 0; ; {
-		t, err := nextContentToken(&s, prn)
+		t, err := nextContentToken(pre, &s, prn)
+		pre = ""
 		if log.ParseEnabled() {
 			log.Parse.Printf("t = <%s>\n", t)
 		}
@@ -367,17 +455,22 @@ func parseContent(s string) (PageResourceNames, error) {
 
 		pos++
 		if pos == 1 {
-			if resourceNameAtPos1(t, name, prn) {
+			if pre, ok = resourceNameAtPos1(t, name, prn); ok {
 				n = false
 			}
 			continue
 		}
 		if pos == 2 {
-			if resourceNameAtPos2(t, name, prn) {
+			if pre, ok = resourceNameAtPos2(t, name, prn); ok {
 				n = false
 			}
 			continue
 		}
-		return nil, errPageContentCorrupt
+		ShowSkipped("corrupt page content")
+		n = false
+		if log.ParseEnabled() {
+			log.Parse.Printf("skip:%s\n", t)
+		}
+		//return nil, errPageContentCorrupt
 	}
 }
diff --git a/pkg/pdfcpu/model/parseContent_test.go b/pkg/pdfcpu/model/parseContent_test.go
index 2fd17a5f..2bd38f4c 100644
--- a/pkg/pdfcpu/model/parseContent_test.go
+++ b/pkg/pdfcpu/model/parseContent_test.go
@@ -26,11 +26,12 @@ func TestParseContent(t *testing.T) {
 Span<>>, Span<>>, Span<>> BDC /a1 BMC/a2 MP /a3 /MC0 BDC/P0 scn/RelativeColorimetric ri/P1 SCN/GS0 gs[(Q[i,j]/2.)16.6(The/]maxi\)-)]TJ/CS1 CS/a4<>> BDC /a5 <>> BDC (0.5*\(1/8\)*64 or +/4.\))Tj/T1_0 1 Tf <00150015> Tj /Im5 Do/a5 << /A