ADDON-77801 Added support for ingesting json event using conte…#376
ADDON-77801 Added support for ingesting json event using conte…#376spanchal-crest wants to merge 2 commits intomainfrom
Conversation
1f21143 to
0f3be11
Compare
|
Had some internal discussion directly with the PR author about this, so this PR may change. As such I have converted it to draft at this time. |
| sourcetype: str = Field(...) | ||
| custom_index: str | None = None | ||
| host: str | None = None | ||
| endpoint: str | None = None |
There was a problem hiding this comment.
I am wondering if it makes sense to add param, so that it would need to be set externally. It is doable for sure, on the other hand maybe it is generic enough for contentctl to spot that event format is hec format and decide as @spanchal-crest was proposing initally?
Mean there is very specific json format, requiring "event" field and some others: https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector
On the other hand if assuming that having event and not having other fields than listed on above page seems to risky we can stick to the entry here.
There was a problem hiding this comment.
I'm not exactly sure that I am following your suggestion. Could you provide some pseudocode, by any chance?
Are you saying that, for example, the JSON data that we want to replay, today, is actually JSONL and is a single file that looks like the following:
{"field1": "somevalue", "field2": 4}
{"field1": "someothervalue", "field2": 2}
{"field1": "somethirdvalue", "field2": 1}
But must actually be converted to the following format before it is replayed:
{"sourcetype": "some_sourcetype", "event": {"field1": "somevalue", "field2": 4}}
{"sourcetype": "some_sourcetype", "event": {"field1": "someothervalue", "field2": 2}}
{"sourcetype": "some_sourcetype", "event": {"field1": "somethirdvalue", "field2": 1}}
In line with the requirements for the HEC JSON endpoint?
Secondly, instead of the typing
endpoint: str | None = None
I would suggest
DEFAULT_HEC_ENDPOINT = "services/collector/raw"
`endpoint: str = DEFAULT_HEC_ENDPOINT`
or, even better
from enum import StrEnum
class ReplayEndpoint(StrEnum):
HEC = "services/collector/raw"
JSON = "services/collector/event"
endpoint: ReplayEndpoint = ReplayEndpoint.HEC
given that we already have thousands of tests and all of them use the raw event collector endpoint. The typing this way:
1. Ensures that endpoint is never `None` - since it is required it MUST be defined and Test Object creation time (giving it a reasonable default value means we don't need to go update 2,000 existing ymls).
2. Makes it explicit after the object is constructed so that we don't need a check in the replay logic for whether or not the value is None.
3. Means that if someone typos the endpoint, or tries to replay to an endpoint that contentctl does not support, they get a warning at validation time.
A third option could even be to make ReplayEndpoint its own Abstract Class and make HEC and JSON child classes that MUST implement the function `transform` so that if "raw" data (which comes directly from a file) must be "changed" in some way, the logic can be implemented in each of the ReplayEndpoint classes rather than in the replay logic itself.
OR, the JSON event file could be in the `{"sourcetype": "xyz", "event":{...}}` format on-disk.
| url_with_hec_path = urllib.parse.urljoin( | ||
| url_with_port, "services/collector/raw" | ||
| ) | ||
|
|
There was a problem hiding this comment.
Can we do if else above with this, not override later?
Is there any test where we can test this behavior?
|
As our approach have changed we do not need to send JSON data, instead we will send _raw data to EP with pipeline installed. |
Jira: https://splunk.atlassian.net/browse/ADDON-77801
Added support for ingesting the json event using contentctl.
For the windows TA, the SPL2 transformed logs have additional fields along with _raw. So, in order to ingest those samples we need to use
services/collector/eventendpoint as we want to ingest json data.For ingesting the events in json format, we should have the below format of the event.
{"event": "raw_event", "fields" : {"fields1" : "value1", "fields2" : "value2"}}