@jhogstrom

Using either command-line replacement pairs or a file with JSON definitions of replacements, an arbitrary regexp ("pattern") can be detected and replaced with another string, including expansion of groups captured by the pattern.

The replacement phase takes place just before upsert, so all other textual manipulations are done by that time.

Replacements happen in a deterministic sequence. There are ample opportunities to get unexpected (but logically consistent) results when a later replacement inadvertently matches the result of a previous one.
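
To illustrate the ordering caveat, here is a minimal sketch that is not taken from the PR's code. It assumes replacements are plain (pattern, new_value) pairs applied in list order with `re.sub`; the helper name `apply_replacements` is hypothetical.

```python
import re


def apply_replacements(text, replacements):
    """Apply (pattern, new_value) pairs one after another, in list order."""
    for pattern, new_value in replacements:
        text = re.sub(pattern, new_value, text)
    return text


# Group expansion: \1 refers to the first captured group of the pattern.
linked = apply_replacements("See JIRA-123.", [(r"JIRA-(\d+)", r"ticket #\1")])

# Order matters: the second pattern also matches the output of the first.
chained = apply_replacements("cat", [(r"cat", "dog"), (r"dog", "bird")])

# With the pairs swapped, the "dog" pattern runs before any "dog" exists.
reversed_order = apply_replacements("cat", [(r"dog", "bird"), (r"cat", "dog")])
```

Under these assumptions `chained` ends up as "bird" while `reversed_order` ends up as "dog", which is the kind of surprising but logically consistent result described above.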

Format of the JSON file:

{
    "environment": [
        {
            "import": "<module name>",
            "path": "<source file>"
        }
    ],
    "replacements":[
        {
            "name": "<name - optional>",
            "pattern": "<regexp>",
            "new_value": "<string with optional group expansions>",
            "evaluate": <true|false - optional>
        }
    ]
}

The environment block is optional and is used for highly dynamic replacements. A Python source file specified there is imported dynamically at run time. The new_value field can then name a <module>.<func> that returns a string value. As an example, the following replaces "TODAY" with an ISO-formatted datetime.

{
    "environment": [
        {
            "import": "funcs",
            "path": "funcs.py"
        }
    ],
    "replacements":[
        {
            "name": "Today's date",
            "pattern": "TODAY",
            "new_value": "funcs.today",
            "evaluate": true
        }
    ]
}

funcs.py:

import datetime

def today(term):
    return datetime.datetime.now().isoformat()

The parameter term is the Match object that re.subn passes to a replacement function; see https://docs.python.org/3/library/re.html#re.subn.
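
As a rough sketch of how such a module could be loaded and used (not necessarily how this PR does it), the following imports a source file with `importlib.util` and hands one of its functions to `re.subn`. The helper `load_module` and the function `shout` are hypothetical names chosen for the example, and the throwaway `funcs.py` is written to a temporary directory only to keep the snippet self-contained.

```python
import importlib.util
import re
import tempfile
from pathlib import Path


def load_module(name, path):
    """Import a Python source file under the given module name."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path!r} as module {name!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a user-supplied funcs.py.
    source = Path(tmp) / "funcs.py"
    source.write_text(
        "def shout(term):\n"
        "    # term is the re.Match for the text that matched the pattern\n"
        "    return term.group(0).upper()\n"
    )
    funcs = load_module("funcs", str(source))

    # re.subn calls funcs.shout once per match and counts the substitutions.
    text, count = re.subn(r"to\w+", funcs.shout, "due today or tomorrow")
```

Because the function receives the Match object, it can base its return value on the matched text or on captured groups rather than returning a fixed string.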

@iamjackg changed the base branch from master to develop on October 22, 2023, 00:39
Owner

@iamjackg left a comment

Sorry about the slew of comments -- this is really cool functionality (although it borders the edge of what I would consider appropriate in a tool that's "simply" uploading documents to Confluence) so I'd love for it to be polished for inclusion!


## Replacements

By using either command line replacement pairs or specifying a file with json definitions of replacements an arbitrary regexp ("pattern") can be detected and replaced with another string, including expanding captured groups in the pattern.
Owner

I would probably recommend doing this in YAML instead of JSON. Most other configuration files in the documentation-adjacent space tend to be YAML, and JSON is a valid subset of YAML anyway.

"name": "<name - optional>",
"pattern": "<regexp>",
"new_value": "<string with optional group expansions>",
"evaluate": <true|false - optional>
Owner

This is not particularly intuitive. I would suggest using two different fields instead of making the behaviour dependent on this flag (e.g. replacement_text and replacement_function).
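
One way to picture the two-field idea is the sketch below. It is only an assumption about how it could look: the class `ReplacementSpec` and its validation are hypothetical, and the field names simply follow the suggestions made in this review (`regex`, `replacement_text`, `replacement_function`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplacementSpec:
    """One replacement definition with separate static and dynamic fields."""

    regex: str
    replacement_text: Optional[str] = None
    replacement_function: Optional[str] = None
    name: Optional[str] = None

    def __post_init__(self):
        has_text = self.replacement_text is not None
        has_function = self.replacement_function is not None
        # Exactly one of the two fields must be given.
        if has_text == has_function:
            raise ValueError(
                "specify exactly one of replacement_text or replacement_function"
            )

    @property
    def is_dynamic(self):
        """True when the replacement value comes from a function."""
        return self.replacement_function is not None


static = ReplacementSpec(regex=r"JIRA-(\d+)", replacement_text=r"ticket #\1")
dynamic = ReplacementSpec(regex="TODAY", replacement_function="funcs.today")
```

With separate fields, a definition that sets both or neither fails at load time instead of silently changing behaviour through a flag.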

def replace(self, page):
    console.print(f"Performing replacement '{self.name}'")
    if self.evaluate:
        new_value = eval(self.new_value)
Owner

Since this is implemented with an eval, the field can actually specify any valid python code at all, which can be a security issue. Granted, once you're importing a custom module all security concerns go out the window anyway, but I'd still prefer something a bit smarter, like only allowing module.function here, parsing it, and using getattr to get it from the module.
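
A minimal sketch of that approach, under the assumption that the imported modules are kept in a dict keyed by module name. The helper `resolve_function` and the stand-in module built with `types.ModuleType` are hypothetical and exist only for illustration.

```python
import re
import types

# Accept only "<module>.<function>" made of two plain identifiers.
_QUALIFIED_NAME = re.compile(r"([A-Za-z_]\w*)\.([A-Za-z_]\w*)")


def resolve_function(spec, modules):
    """Look up 'module.function' in already-imported modules, without eval."""
    match = _QUALIFIED_NAME.fullmatch(spec)
    if match is None:
        raise ValueError(f"expected '<module>.<function>', got {spec!r}")
    module_name, func_name = match.groups()
    if module_name not in modules:
        raise ValueError(f"unknown module {module_name!r}")
    func = getattr(modules[module_name], func_name, None)
    if not callable(func):
        raise ValueError(f"{spec!r} is not a callable")
    return func


# Hypothetical stand-in for a dynamically imported funcs.py.
funcs = types.ModuleType("funcs")
funcs.shout = lambda term: term.group(0).upper()

replace = resolve_function("funcs.shout", {"funcs": funcs})
result = re.sub(r"today", replace, "due today")
```

Anything that is not a bare `module.function` name, such as an expression with parentheses, is rejected before any lookup happens.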


```json
{
    "environment": [
```
Owner

The name is not particularly intuitive. I would maybe use something like dynamic_replacement_modules.

Comment on lines +270 to +274
parser.add_argument(
    "--replacements",
    dest="replacementfile",
    help="Filename with replacement definition in json format",
)
Owner

Suggested change
- parser.add_argument(
-     "--replacements",
-     dest="replacementfile",
-     help="Filename with replacement definition in json format",
- )
+ parser.add_argument(
+     "--replacement-file",
+     dest="replacement_file",
+     help="Filename with replacement definition in YAML format",
+ )


# Create Replacement objects for the commandline replacements
for i, r in enumerate(commandline_replacements):
    result.append(Replacement(f"CLI replacement {i}", *r.split("=", 1)))
Owner

This can go wrong pretty easily if somebody forgets the =: it will just result in a non-obvious exception about the number of parameters when calling the constructor. I'd love to have this function receive data that's already well-structured. For example, you could parse the command line arguments in __main__.py (with error handling if the format is wrong) and give create_replacement a well-formed list of lists, e.g.

[["pattern", "replacement"], ["another pattern", "another replacement"]]
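
As an illustration of that kind of up-front parsing, here is a minimal sketch. The helper `parse_cli_replacements` is a hypothetical name, and the PATTERN=NEW_VALUE format is assumed from the `split("=", 1)` call under review.

```python
def parse_cli_replacements(pairs):
    """Turn ['pattern=new_value', ...] into [['pattern', 'new_value'], ...]."""
    parsed = []
    for raw in pairs:
        # partition splits on the first "=" only, so new_value may contain "=".
        pattern, separator, new_value = raw.partition("=")
        if not separator or not pattern:
            raise ValueError(
                f"invalid replacement {raw!r}: expected PATTERN=NEW_VALUE"
            )
        parsed.append([pattern, new_value])
    return parsed


well_formed = parse_cli_replacements(["foo=bar", "key=a=b"])
```

A malformed argument then fails with a message that names the offending value, rather than with an error about constructor arguments.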

if not replacementfile:
    return result

file_replacements = json.load(open(replacementfile))
Owner

Suggested change
- file_replacements = json.load(open(replacementfile))
+ with open(replacement_file) as fp:
+     file_replacements = json.load(fp)

"replacements":[
    {
        "name": "<name - optional>",
        "pattern": "<regexp>",
Owner

Suggested change
- "pattern": "<regexp>",
+ "regex": "<regex>",

# Get the replacement definitions
for i, r in enumerate(file_replacements["replacements"]):
    new_value = r["new_value"]
    if isinstance(new_value, list):
Owner

This behaviour is not documented anywhere. Is it really necessary? What's a use case for this vs. a string with newlines?

Owner

There are no tests at all for this :(
