Add JSONL format as a possible log format #47
Please also add a to test out your logic for detecting the timestamp key, even when
EDIT: also please add lines that are problematic:
Overall, this looks very good. I like your adaptive timestamp detection and removal from the logged content.
Also, I'll add support for detecting the presence of the
EDIT: Well, I found a few more things to discuss before merging. See other comments.
This file format poses some interesting issues compared to the other types. Most of the others deal with processing lines of text, so all the fields are strings by default. Now that we are getting data that has been through the JSON parser, it can be any of a number of types (int, float, bool, None, even dict if the JSON is nested), so we can't assume strings for everything.
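To illustrate the point about types, here is a minimal sketch of one way to turn parsed JSON values back into display text. `to_display_str` is a hypothetical helper, not part of logmerger; it assumes strings should pass through unchanged and everything else (numbers, booleans, `None`, nested dicts or lists) should be shown in its JSON spelling via `json.dumps`.

```python
import json
from typing import Any


def to_display_str(value: Any) -> str:
    """Render a parsed JSON value as text (hypothetical helper)."""
    if isinstance(value, str):
        return value  # already text, show it as-is
    # json.dumps gives JSON spellings: None -> "null", True -> "true",
    # and nested dicts/lists are serialized back to compact JSON text
    return json.dumps(value)


if __name__ == "__main__":
    record = json.loads(
        '{"ts": 1700000000.5, "level": "INFO", "ok": true, "err": null, "ctx": {"user": "bob"}}'
    )
    for key, value in record.items():
        print(f"{key} ({type(value).__name__}): {to_display_str(value)}")
```

Using the JSON spelling (`null`, `true`) rather than Python's `str()` (`None`, `True`) keeps the displayed values looking like what is actually in the file.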
What do you think should happen if no timestamp key is available? I don't think it makes sense to append this row to the previous row.
I think this may bring up a difference in approach from the log reader that reads loosely formatted text files. In a text file, lines with no timestamp are presumed to be continuations of the last timestamped line (for instance, when a traceback gets logged). With parsed JSON, it is kind of a puzzle: why would we get a line with no timestamp, and what should we do with such a line? Dropping it on the floor seems the easiest, but it is also the least friendly to the user, since that line might have important stuff in it. I guess we could just log a warning in that case so it doesn't get lost, without making any extra assumptions about it.

Also, with regard to the code that looks for a new timestamp key if the old one changes: I'm a little wary of being too helpful there. In the past I have written APIs with similar helpfulness in mind, and I ended up tied in knots because the API was too flexible; this feels similar. Do you think this is going to be a common occurrence? Have you seen this in the log files you work with? To begin, I feel we should start strict and require the key to be the same throughout a given JSONL file; if this becomes more common, we'll address it in a future version.
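A rough sketch of the "start strict" approach described above, assuming the timestamp key has already been chosen for the whole file and is never re-detected. `read_jsonl_strict` is a hypothetical name, not logmerger's actual reader; records missing the key are not merged into the previous row but reported through a logging warning that includes the raw line, so nothing is silently lost.

```python
import json
import logging
from typing import Iterable, Iterator

log = logging.getLogger(__name__)


def read_jsonl_strict(lines: Iterable[str], ts_key: str) -> Iterator[dict]:
    """Yield records containing ts_key; warn about and skip the rest.

    Hypothetical sketch: ts_key is fixed for the whole file.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if not isinstance(record, dict) or ts_key not in record:
            # Don't guess a new key or attach to the previous row;
            # surface the raw line to the user instead
            log.warning("line %d has no %r key, skipped: %s",
                        lineno, ts_key, line.rstrip())
            continue
        yield record
```

Keeping the key fixed makes the behavior easy to explain; relaxing it later (for example, to re-detect the key) would be an additive change.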
Without the dynamic timestamp column search, a row without a timestamp will be appended to the previous line. The only time I found changing keys was when I changed the logging mechanism. But to be honest, it was just bad practice to have both versions in the same file.
ptmcg left a comment
Thanks for making the changes - all looks good!
Just released these changes in logmerger 0.11.0. Thanks for the help!
This PR adds support for JSONL (or NDJSON) logs.
Files are recognized as JSONL if they have the .jsonl file extension.
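A minimal sketch of extension-based detection. Only the `.jsonl` suffix check comes from this PR; the function name `pick_reader` and the reader names `JsonLReader` and `TextReader` are hypothetical placeholders, and the comparison is case-sensitive because the PR does not say otherwise.

```python
from pathlib import Path


def pick_reader(filename: str) -> str:
    """Choose a reader by file extension only (hypothetical names)."""
    # Path.suffix is the last extension, so "app.2023.jsonl" -> ".jsonl"
    if Path(filename).suffix == ".jsonl":
        return "JsonLReader"
    return "TextReader"
```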
Currently, the timestamp is searched for among all top-level keys of each JSON object.
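The PR does not spell out how a value is recognized as a timestamp, so this sketch assumes ISO 8601 strings and uses `datetime.fromisoformat` as the test. `find_timestamp_key` is a hypothetical helper: it only looks at top-level keys, skips non-string values, and returns the first key whose value parses, or `None`.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def find_timestamp_key(record: Dict[str, Any]) -> Optional[str]:
    """Return the first top-level key whose value parses as an ISO timestamp.

    Sketch only: assumes ISO-format string timestamps; epoch numbers or
    other formats are not recognized here.
    """
    for key, value in record.items():
        if not isinstance(value, str):
            continue  # nested dicts, numbers, bools, None are skipped
        try:
            datetime.fromisoformat(value)
        except ValueError:
            continue  # not a timestamp, keep looking
        return key
    return None
```

Combined with the strict reading discussed in the review, this would run once on the first record, and the resulting key would then be required for the rest of the file.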
Each key is printed on its own line (\n characters are not escaped). That's the simplest way to show JSON objects; more complex display options could be added in the future.
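A small sketch of the display rule described above: one `key: value` line per top-level key, with the timestamp key left out because it is shown separately (the adaptive removal mentioned in the review). `format_record` is a hypothetical name; string values are inserted raw, so an embedded newline becomes a real line break, while non-string values use their JSON spelling (an assumption, not confirmed by the PR).

```python
import json
from typing import Any, Dict


def format_record(record: Dict[str, Any], ts_key: str) -> str:
    """Render one JSONL record as 'key: value' lines, omitting ts_key."""
    lines = []
    for key, value in record.items():
        if key == ts_key:
            continue  # timestamp is displayed in its own column
        # Raw strings keep embedded newlines unescaped
        text = value if isinstance(value, str) else json.dumps(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines)
```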
Additionally, there is a small example file that can be read with the new Reader.