Add preserveFormatting option for comments/whitespace #38

arvindth · 2019-07-31T04:56:19Z

This pull request addresses a use case for reading and updating properties files that have:

commented-out sample values for key/value entries that need to be preserved, e.g.:

# Uncomment the following to override the default value
# serverHostname = host.com

sectional comments with whitespace formatting, e.g.:

# 
# Section 1 
#

# comment1
key = value

Changes:

Add a preserveFormatting bool option to the loader to allow scanning and retaining whitespace as part of comments.
Add a preserveFormatting bool option when writing comments to pass through the whitespace formatting to the output.
Add a virtual key/value for the end of the input to allow trailing comments to be retained and emitted.
Update lexer to not discard whitespace if preserveFormatting is set.
Comments are now structs containing 2 elements: the original prefixes from the input, and the comment value itself.

Notes:

I chose not to add a preserveFormatting option to each of the Load* methods. I felt doing this would explode the number of combinations of potential parameters.
- Instead, the preserveFormatting option for loading is only available through the GetLoader method followed by explicitly setting the loader options before invoking the underlying Load methods.

arvindth · 2019-08-02T16:05:47Z

@magiconair could you take a look at this PR and let me know if you'll consider accepting it?

magiconair

I've started reviewing this but stopped since this solution seems too complex. If the goal is to preserve whitespace for comments, then the lexer should handle that transparently. In this approach the option permeates everywhere. That doesn't look right.

magiconair · 2019-08-06T02:27:33Z

load.go

 	return p
 }

+func GetLoader() (*Loader, error) {


We don't need this and it is not idiomatic. It would need to be called NewLoader() but since it just returns an empty Loader we can drop this method.

Removed this method. Callers can just instantiate the struct themselves.

magiconair · 2019-08-06T02:28:58Z

load.go


-func (l *Loader) loadBytes(buf []byte, enc Encoding) (*Properties, error) {
-	p, err := parse(convert(buf, enc))
+func (l *Loader) loadBytes(buf []byte, enc Encoding, preserveFormatting bool) (*Properties, error) {


Since all loadXXX functions eventually call loadBytes, you don't need to pass the parameter through. You can just use l.PreserveFormatting.

Removed the passed parameters in load.go.

magiconair · 2019-08-06T02:40:09Z

lex.go


 // lex creates a new scanner for the input string.
-func lex(input string) *lexer {
+func lex(input string, preserveFormatting bool) *lexer {


Since #37 wants to add another config parameter to the lexer I think we should move the go l.run() call to the parse function and refactor this function as follows:

func lex(input string) *lexer {
	return &lexer{
		input: input,
		items: make(chan item),
		runes: make([]rune, 0, 32),
	}
}

Then parse looks like this:

func parse(input string) (properties *Properties, err error) {
	l := lex(input)
	go l.run()
	p := &parser{lex: l}
	...
}

This way it is possible to pass additional configuration to the lexer without adding them as function arguments.

magiconair · 2019-08-06T02:41:31Z

lex.go


 // lexer holds the state of the scanner.
 type lexer struct {
+	preserveFormatting bool  // whether to scan EOLs/whitespace as part of comments


Let's rename to

keepWS // keepWS retains whitespace in comments

magiconair · 2019-08-06T02:44:12Z

lex.go

 	case isEOL(r):
-		l.ignore()
-		return lexBeforeKey
+		if l.preserveFormatting {


Please make these two separate switch cases (also for the other ones):

case isEOL(r) && l.keepWS:
	l.appendRune(r)
	l.backup()
	return lexComment
case isEOL(r):
	l.ignore()
	return lexBeforeKey
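
A small self-contained sketch of the guarded-case style requested above. In a tagless Go `switch`, cases are evaluated top to bottom, so the more specific `isEOL(r) && keepWS` case has to come before the plain `isEOL(r)` case. The `nextState` function and its string return values are hypothetical; the real lexer returns state functions.

```go
package main

import "fmt"

// isEOL reports whether r is an end-of-line character.
func isEOL(r rune) bool {
	return r == '\n' || r == '\r'
}

// nextState names the state the lexer would move to. It is a
// hypothetical helper used only to show the case ordering.
func nextState(r rune, keepWS bool) string {
	switch {
	case isEOL(r) && keepWS:
		// Whitespace is kept: treat the EOL as part of a comment.
		return "lexComment"
	case isEOL(r):
		// Default behavior: discard the EOL and look for a key.
		return "lexBeforeKey"
	default:
		return "lexKey"
	}
}

func main() {
	fmt.Println(nextState('\n', true))  // lexComment
	fmt.Println(nextState('\n', false)) // lexBeforeKey
	fmt.Println(nextState('k', true))   // lexKey
}
```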

magiconair · 2019-08-06T02:46:26Z

properties.go


 // -----------------------------------------------------------------------------

+type Comment struct {


This should not be exported and does not look properly formatted but I also don't understand why we need this in the first place. The only difference is whether the comments have whitespace or not. Shouldn't the lexer handle that transparently for the parser?

Unexported this struct. It is mainly needed to preserve the formatting of the prefix. Since Java allows either # or ! as a prefix, with optional leading whitespace before the prefix, I wanted to be able to preserve that, e.g.:

#############################
!!                         !!
!!        Section 1        !!
!!                         !!
#############################

# comment 1
key = value

magiconair · 2019-08-06T02:52:24Z

parser.go

 		case itemEOF:
+			if preserveFormatting && (len(comments) > 0 || token.val != "") {
+				// There are comments at the end of the input that are not tied to a particular key
+				// Save these off against a special empty key when preserving formatting


Instead of a sentinel key I suggest using a separate field for heading and trailing comments, e.g. trailComment comment in the Properties.

Also, please invert the logic of the if statement to avoid the indent.

if !preserveFormatting || len(comments) == 0 {
	goto done
}
if token.val == "" {
	p.trailingComments = comments
	goto done
}
...
goto done

magiconair · 2019-08-06T02:53:52Z

parser.go

 			continue
 		case itemKey:
-			key = token.val
+			key = strings.TrimSpace(token.val)


This does not look right since whitespace parsing should only affect comments.

This is necessary to maintain previous behavior. Previously, whitespace around the key was being ignored in lexBeforeKey. However, now the lexer doesn't know whether the whitespace in lexBeforeKey is part of a comment and needs to be preserved or whitespace before a key and needs to be ignored. So I do it here.

arvindth · 2019-08-06T20:35:17Z

I've put up an additional commit addressing your comments so far, and also removed some of the extra parameters, like properties.PreserveFormatting. In addition, I think at least some of the complexity comes from having to support writeFormattedCommentWithoutFormattingTests, i.e. being able to take formatted comments and write them out as before by stripping out the whitespace. I'm considering removing or modifying this capability to see if it reduces the complexity, especially in the WriteComment method.

- Original behavior is still retained for calls to the original Write methods.

arvindth · 2019-08-07T00:32:08Z

I've refactored the WriteComment method and moved writing formatted comments out into a new method. The individual commit diff makes it look odd, but looking at the FilesChanged view shows the diff better. This in combination with the previous commit's changes does simplify the original change a bit.

However, in order to maintain backward compatibility in the lexer, the parser and in properties, I don't think that this can be achieved transparently in the lexer. I think that the parser needs to understand preserveFormatting and behave differently when it's specified. Either that, or the list of lexer states would need to grow much larger, since the lexer would need to understand the underlying state quite a bit more, and even then, the parser would still need enhancements to understand and consume those new emitted items.

Let me know if you agree, or have any further suggestions to modify this PR.

However, I also understand if you think it's still not the right approach. If you believe this is not the right direction for this feature, I'm ok with closing this out.

arvindth added 2 commits July 30, 2019 23:30

Add preserveFormatting option for comments/whitespace

10c1dcc

Add tests for preserveFormatting feature

9db77ae

arvindth mentioned this pull request Jul 31, 2019

SEC-283: Remove extra whitespace after prefix when writing comments confluentinc/cli#241

Closed

magiconair reviewed Aug 6, 2019


PR comments

4ddf0d0

arvindth force-pushed the preserveFormatting branch from b2c5409 to be36114 · August 6, 2019 23:25

arvindth added 2 commits August 6, 2019 19:06

Create separate WriteFormattedContent method to simplify properties.go

b0894f2

- Original behavior is still retained for calls to the original Write methods.

Test updates for new WriteFormattedComment method

c94c1ec

arvindth force-pushed the preserveFormatting branch from be36114 to c94c1ec · August 7, 2019 00:06

arvindth mentioned this pull request Aug 14, 2019

SEC-283 - Preserve whitespace formatting for comments in property files confluentinc/cli#252

Merged

arvindth closed this Nov 29, 2022

