diff --git a/docs/blog/index.md b/docs/blog/index.md
index 808493d..e5c5fc2 100644
--- a/docs/blog/index.md
+++ b/docs/blog/index.md
@@ -7,6 +7,51 @@
# Blog
+## First experience: Use Loggie and VictoriaLogs to quickly build a new generation of logging system
+
+
+:octicons-person-16: [__ethfoo__]
+
+:octicons-calendar-24: 2023-6-28
+:octicons-clock-24: 20 min read
+
+
+
+Key points:
+
+- Pain points of current log storage
+- Advantages of VictoriaLogs
+- Use Loggie to collect logs into VictoriaLogs
+
+ [:octicons-arrow-right-24: __Continue reading__](./victorialogs/loggie_victorialogs.md)
+ [__ethfoo__]: https://github.com/ethfoo
+
+## The evolution of NetEase’s cloud-native large-scale log architecture based on Loggie
+
+
+:octicons-person-16: [__ethfoo__]
+
+:octicons-calendar-24: 2022-12-17
+:octicons-clock-24: 15 min read
+
+
+
+Sharing from the 2022 Top 100 Global Software Case Study Summit.
+
+Key points:
+
+- Initial exploration of the log scene
+- Large-scale migration to cloud native
+- Unified logging platform using Loggie as the core
+- In-depth logging architecture best practices
+
+ [:octicons-arrow-right-24: __Continue reading__](https://github.com/loggie-io/asserts/blob/main/ppt/%E7%BD%91%E6%98%93%E5%9F%BA%E4%BA%8ELoggie%E7%9A%84%E4%BA%91%E5%8E%9F%E7%94%9F%E5%A4%A7%E8%A7%84%E6%A8%A1%E6%97%A5%E5%BF%97%E6%9E%B6%E6%9E%84%E6%BC%94%E8%BF%9B.pdf)
+ [__ethfoo__]: https://github.com/ethfoo
+
+---
+
## Quickly Build a Scalable Cloud-Native Logging Architecture with Loggie
diff --git a/docs/blog/victorialogs/img/benchmark.png b/docs/blog/victorialogs/img/benchmark.png
new file mode 100644
index 0000000..d4410bf
Binary files /dev/null and b/docs/blog/victorialogs/img/benchmark.png differ
diff --git a/docs/blog/victorialogs/img/terminal1.png b/docs/blog/victorialogs/img/terminal1.png
new file mode 100644
index 0000000..faa12dd
Binary files /dev/null and b/docs/blog/victorialogs/img/terminal1.png differ
diff --git a/docs/blog/victorialogs/img/terminal2.png b/docs/blog/victorialogs/img/terminal2.png
new file mode 100644
index 0000000..5dbd7a2
Binary files /dev/null and b/docs/blog/victorialogs/img/terminal2.png differ
diff --git a/docs/blog/victorialogs/img/ui.png b/docs/blog/victorialogs/img/ui.png
new file mode 100644
index 0000000..a9e74ca
Binary files /dev/null and b/docs/blog/victorialogs/img/ui.png differ
diff --git a/docs/blog/victorialogs/loggie_victorialogs.md b/docs/blog/victorialogs/loggie_victorialogs.md
new file mode 100644
index 0000000..610f21e
--- /dev/null
+++ b/docs/blog/victorialogs/loggie_victorialogs.md
@@ -0,0 +1,311 @@
+# First experience: Use Loggie and VictoriaLogs to quickly build a new generation of logging system
+
+If you are familiar with Prometheus, you probably also know VictoriaMetrics, an increasingly popular monitoring project that can be used as an enhancement to, or replacement for, Prometheus. A major highlight of VictoriaMetrics is that it solves Prometheus's storage problems at large metrics scale.
+
+Both belong to the observability space. When we turn to logs, storage has long been a pain point there as well.
+
+## Pain points of current log storage
+
+Some of the more common open source log storage projects today include Elasticsearch, ClickHouse, Loki, etc. Of course, Elasticsearch and ClickHouse were not originally designed for log storage; we simply use them to store log data.
+
+For example, the core of Elasticsearch is a search engine. For log storage scenarios, full-text retrieval is a major advantage, but it also has the following shortcomings:
+
+- Relatively slow write performance
+- High resource usage
+- Poor compression of stored logs
+
+Generally speaking, Elasticsearch is a log storage database with a long history and wide adoption; after all, the concept of ELK is deeply rooted in people's minds. However, against the current backdrop of cost reduction and efficiency improvement, many enterprises are sensitive to the machine resources Elasticsearch occupies. If it is only used to store large volumes of operations logs, its cost-effectiveness is low.
+
+Therefore, the emergence of Grafana Loki in the past two years made quite a splash. After all, the logging field has been suffering under Elasticsearch for a long time.
+
+Let’s briefly introduce the advantages of Loki:
+
+- Designed to store logs
+- Modest resource usage
+- Introduced the concept of Log Stream
+
+More than half a year ago, a department within our company began trying Loki to store some system logs, but there were always small problems that were not very reassuring. In addition, Loki's shortcomings include:
+
+- There is no real full-text index, so keyword queries may be slow.
+- Labels used for retrieval cannot be set independently, which can cause performance and other problems.
+
+Of course, Loki is still a relatively young project, and these stability, performance, and design issues can be understood as growing pains.
+
+However, it seems that many people can’t wait any longer.
+
+## Long overdue: the advantages of VictoriaLogs
+
+Recently, VictoriaMetrics released a preview version of VictoriaLogs, which is similar to Loki and is specifically used to store logs. In view of VictoriaMetrics' good reputation, everyone still has certain expectations for this "catfish" that will disrupt the situation.
+
+Why did VictoriaMetrics get involved with VictoriaLogs?
+
+In fact, the idea of developing VictoriaLogs dates back to this issue from 2020: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/816
+
+From the discussion in that issue, it is clear that many people are critical of Loki: for example, its storage relies on S3 (local storage does not support a distributed setup), and its performance.
+
+Here is an excerpt from the complaints in the issue:
+
+> almost 2 years passed and Loki is still unusable for scenarios with real logging data. Trying to query anything hitting more than 50k logs is exploding servers :)
+>
+
+No translation needed; you can feel the user's strong dissatisfaction through the screen.
+
+After more than two years, VictoriaLogs has finally officially arrived in front of us. So what are the advantages of VictoriaLogs and what problems can it solve in the field of log storage?
+
+Here I briefly summarize a few points. Interested readers can find more in the [official documentation](https://docs.victoriametrics.com/VictoriaLogs/).
+
+- Compatible with the Elasticsearch bulk interface
+- Support for horizontal and vertical scaling
+- Low resource usage
+- Multi-tenancy support
+- Inherits (copies) Loki's log stream concept, with some optimizations
+- Full-text search with the simple yet powerful LogsQL query syntax
+
+Let's start with a major feature of VictoriaMetrics: **compatibility**.
+VictoriaLogs directly supports the Elasticsearch bulk API. Since almost all log collection agents on the market support sending to Elasticsearch, seamless integration and migration can be achieved without these agents having to develop new output sources. (I really want to complain about Loki here: it does not even provide a public client SDK package; how is anyone supposed to integrate with it?)
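+
+As a quick sketch of this compatibility (assuming a VictoriaLogs instance reachable at localhost:9428; the field values below are made up), plain curl can push a log line through the Elasticsearch-compatible bulk endpoint:
+
+```bash
+# Hypothetical example: ingest one log line via the ES-compatible bulk API
+curl -s -X POST 'http://localhost:9428/insert/elasticsearch/_bulk' \
+  -H 'Content-Type: application/json' \
+  --data-binary '{"create":{}}
+{"_msg":"hello VictoriaLogs","_time":"2023-07-04T02:58:18Z","logconfig":"demo"}
+'
+```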
+
+As for horizontal and vertical scaling, the preview version of VictoriaLogs currently only offers a single-node mode, so this cannot be verified yet.
+
+In addition, in terms of **resource usage**, we can look directly at the [benchmark](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/logs-benchmark) results. Compared with Elasticsearch, as shown below:
+
+- Average memory usage:
+    - Elasticsearch: 4.4 GiB
+    - VictoriaLogs: 144 MiB
+- Average disk usage:
+    - Elasticsearch: 53.9 GB
+    - VictoriaLogs: 4.20 GB
+
+
+
+Memory and disk usage are indeed much lower, basically by an order of magnitude. At large storage volumes this can save a lot of server cost, which is undoubtedly a great boon for teams currently focused on cost reduction and efficiency.
+
+VictoriaLogs also introduces the concept of a **log stream**. Combined with multi-tenancy, this seems to achieve a near-optimal trade-off between performance and resource usage in log storage scenarios, and it is a core element of VictoriaLogs' design that distinguishes it from Elasticsearch and other databases not purpose-built for log storage.
+
+So before using VictoriaLogs, be sure to have a good understanding of log stream.
+
+What is a log stream?
+
+Simply put, it represents a log instance of an application (service). The exact granularity of this log instance is up to us to design and control, but it is best not to let the total number of streams grow particularly large.
+
+For example, a log instance can be:
+
+- Logs generated by a Linux process deployed on a host
+- Logs generated by a container of a Pod running an application on Kubernetes; at finer granularity, a single log file in the container can also represent a log stream
+
+The key to the design of the log stream is that it can be uniquely identified, so that the location where the log is generated in the distributed system can be determined, such as which log file in which container on which node.
+
+A log stream is identified by multiple labels, similar to the labels of Prometheus metrics. We can draw analogies to some Prometheus concepts:
+
+- job label: identifies an application across multiple replicas, e.g. the Deployment name
+- instance label: identifies which process (host and port) produced the metrics
+
+In VictoriaLogs, you can likewise design similar labels yourself and add them to the metadata collected with logs; these can later be used to correlate and retrieve logs and metrics. In real applications, we can also add labels such as environment, data center, namespace, etc.
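+
+For instance, with Loggie such labels can be attached via the source-level `fields` configuration (a sketch; the label names and values here are illustrative):
+
+```yaml
+sources: |
+  - type: file
+    name: app
+    paths:
+      - /var/log/app/*.log
+    fields:
+      env: production
+      datacenter: dc1
+```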
+
+If you knew Loki before, you will probably want to say: didn't Loki design its labels the same way?
+
+Yes, but once you have used Loki in depth, you may run into this pitfall: the log labels sent carry frequently changing fields. For example, suppose a log uses its offset field as a label; it looks roughly like this:
+
+```json
+{
+ "message": "xxx",
+ "timestamp": "",
+ "logconfig": "foo",
+ "podname": "bar",
+ "offset": 20,
+ ...
+}
+```
+
+Loki uses the values of all labels as the unique log stream identifier. The example above would use `{logconfig: "foo", "podname": "bar", "offset": 20}` as one log stream. Since within the same file the offset increases with every collected line, the number of log streams grows without bound, putting huge pressure on Loki.
+
+To avoid such problems, VictoriaLogs distinguishes stream labels from ordinary labels. In the scenario above, we only need to use logconfig and podname as stream labels and keep offset as an ordinary label.
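+
+The cardinality blow-up is easy to see with a toy example (the label sets below are made up):
+
+```bash
+# Three lines from the same file: identical logconfig/podname, changing offset
+printf '%s\n' \
+  'logconfig=foo podname=bar offset=10' \
+  'logconfig=foo podname=bar offset=20' \
+  'logconfig=foo podname=bar offset=30' > /tmp/labels.txt
+
+# Treating offset as a stream label yields one stream per line...
+sort -u /tmp/labels.txt | wc -l        # prints 3
+# ...while dropping it collapses everything into a single stream
+sed 's/ offset=[0-9]*//' /tmp/labels.txt | sort -u | wc -l   # prints 1
+```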
+
+After understanding stream labels, we can better understand the data format in VictoriaLogs:
+
+- `_msg`: the log content field
+- `_time`: the time field
+- `_stream` labels: remain unchanged within the same log stream
+- ordinary labels: may change within the same log stream, e.g. level, traceId
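+
+Putting these together, a single ingested log entry can be pictured roughly like this (an illustrative sketch, not actual VictoriaLogs output):
+
+```json
+{
+  "_msg": "cannot open file: permission denied",
+  "_time": "2023-07-04T02:58:18Z",
+  "_stream": "{logconfig=\"genfiles\",podname=\"genfiles-66f5c86fdb-tjpzr\"}",
+  "level": "error",
+  "traceId": "abc123"
+}
+```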
+
+## Real world: Use Loggie to collect logs into VictoriaLogs
+
+Let's now get hands-on and see how to use Loggie and VictoriaLogs to quickly build a logging system.
+
+### 1. Deploy VictoriaLogs
+
+Execute the following commands:
+
+```bash
+helm repo add vm https://victoriametrics.github.io/helm-charts/
+helm repo update
+
+helm install vlsingle vm/victoria-logs-single -n victoria-logs --create-namespace
+```
+
+For more details, please refer to the [helm chart deployment](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/README.md).
+
+Here we deploy into the victoria-logs namespace. If you change the namespace name, please also adjust the related configurations below accordingly.
+
+### 2. Deploy Loggie
+
+If you don't know Loggie yet, please start [here](https://github.com/loggie-io/loggie).
+
+For convenience, we provide a deployment configuration adapted to VictoriaLogs in the Loggie catalog.
+
+```bash
+VERSION=v1.4.0
+helm pull https://github.com/loggie-io/installation/releases/download/$VERSION/loggie-$VERSION.tgz && tar xvzf loggie-$VERSION.tgz
+# Download the values file adapted to VictoriaLogs from the catalog repository
+wget https://raw.githubusercontent.com/loggie-io/catalog/main/scenarios/victoriaLogs/values.yml
+# Specify the values file to deploy Loggie
+helm install loggie ./loggie -n loggie --create-namespace -f values.yml
+```
+
+### 3. Generate and collect logs
+
+After deploying VictoriaLogs and Loggie, create a test Deployment named genfiles to generate logs:
+
+```bash
+wget https://raw.githubusercontent.com/loggie-io/catalog/main/common/genfiles/deployment.yml
+kubectl apply -f deployment.yml
+```
+
+Then create a matching log collection task and tell Loggie to collect the log files in this Deployment container:
+
+```bash
+wget https://raw.githubusercontent.com/loggie-io/catalog/main/scenarios/victoriaLogs/genfiles_logconfig.yml
+kubectl apply -f genfiles_logconfig.yml
+```
+
+Here we focus on the sink configuration in genfiles_logconfig.yml:
+
+```yaml
+sink: |
+ type: elasticsearch
+ hosts: [ "vlsingle-victoria-logs-single-server.victoria-logs.svc:9428/insert/elasticsearch/" ]
+ parameters:
+ _msg_field: "body"
+ _time_field: "@timestamp"
+ _stream_fields: "logconfig,namespace,podname,containername"
+```
+
+- The elasticsearch-type sink is used directly, because VictoriaLogs is compatible with the bulk interface
+- The hosts are changed to the VictoriaLogs URL; note the fixed path appended at the end: `/insert/elasticsearch/`
+- The parameters field is added to adapt to the log format required by VictoriaLogs, and the stream label fields are configured
+    - `_msg_field`: indicates which log field is used as the msg content field. Loggie's default log content field is `body`; if you are already using Loggie and have changed it to another field, please adjust accordingly.
+    - `_time_field`: indicates which field is used as the time field
+    - `_stream_fields`: indicates which fields are used as the unique identifying labels of the log stream
+
+In this example, the log format sent by the sink looks roughly as follows:
+
+```json
+{
+ "body": "2023-07-04 02:58:18.014 INF cmd/subcmd/genfiles/genfiles.go:57 > 1000 TqrccSCPzRUYRP PJ MlvgdAluEpIoRIRyzjZoNk",
+ "containername": "genfiles",
+ "namespace": "default",
+ "podname": "genfiles-66f5c86fdb-tjpzr",
+ "@timestamp": "2023-07-04T02:58:21.905Z",
+ "offset": 1092798,
+ "cluster": "test",
+ "logconfig": "genfiles",
+ "nodename": "kind-control-plane",
+ "filename": "/var/lib/kubelet/pods/c7b2da94-b152-414e-a7d8-1951e9d4f09a/volumes/kubernetes.io~empty-dir/logs/loggie.log"
+}
+```
+
+As you can see, the granularity of the log stream is set to the Pod container level, so `_stream_fields` is set to `cluster,logconfig,namespace,podname,containername`.
+
+Now let's simulate generating some logs:
+
+```bash
+# Enter the genfiles container
+kubectl exec -it $(kubectl get po -l app=genfiles -o jsonpath="{.items[0].metadata.name}") -- bash
+
+# Generate some logs
+./loggie genfiles -totalCount=1000 -lineBytes=1024 -qps=0 \
+ -log.maxBackups=1 -log.maxSize=1000 -log.directory=/tmp/log -log.noColor=true \
+ -log.enableStdout=false -log.enableFile=true -log.timeFormat="2006-01-02 15:04:05.000"
+```
+
+For more usage of loggie's genfiles subcommand for generating logs, please refer to [here](https://github.com/loggie-io/catalog/tree/main/common/genfiles).
+
+Under normal circumstances, Loggie will quickly collect these logs and then send them to VictoriaLogs.
+
+Of course, we can also enter the Loggie terminal console to confirm the collection progress:
+
+```bash
+kubectl -n loggie -it exec $(kubectl -n loggie get po -l app=loggie -o jsonpath="{.items[0].metadata.name}") -- ./loggie inspect
+```
+
+
+
+
+
+As shown in the figure above, the file collection progress is 100%, which means that the collection has been completed and the logs have been sent to VictoriaLogs.
+
+For specific usage of the Loggie terminal, you can also refer to our [log collection quick troubleshooting guide](https://loggie-io.github.io/docs/main/user-guide/troubleshot/log-collection/).
+
+Next, we can use the built-in UI of VictoriaLogs to view the collected logs.
+
+## LogsQL and log queries
+
+Since the local network is not connected to the Kubernetes cluster's internal network, for simplicity we use port-forward directly:
+
+```bash
+export POD_NAME=$(kubectl get pods -n victoria-logs -l "app=server" -o jsonpath="{.items[0].metadata.name}")
+kubectl -n victoria-logs port-forward $POD_NAME 9428
+```
+
+Then open the following page locally: http://localhost:9428/select/vmui/
+
+VictoriaLogs currently provides a simple UI page that can be used to query logs. In order to demonstrate how to query the logs we just collected, let’s take a quick look at the query syntax.
+
+To maximize query performance, the recommended routine for writing LogsQL is as follows:
+
+**1. Determine log stream**
+
+The filter fields inside `_stream` correspond to the `_stream_fields` parameters configured on the sink above, for example which log collection task, which Pod, and so on. In this example, we query all logs under the log collection task we just created, based on the logconfig label of the log stream. The LogsQL is as follows:
+
+`_stream:{logconfig=genfiles}`
+
+
+
+If the logconfig matches many Pods, you can add podname and other fields to filter further. For example: `_stream:{logconfig=genfiles, podname="genfiles-66f5c86fdb-mrfzc"}`
+
+**2. Add time interval**
+
+Then we can add a time interval to further reduce the number of returned logs.
+
+LogsQL: `_stream:{logconfig=genfiles} _time:[now-1h,now]`
+
+Please note that the filters are separated by spaces. In addition, `_stream` and `_time` are built-in fields, and their order does not matter.
+
+**3. Further filtering**
+
+The steps above let VictoriaLogs quickly narrow down to the log range of one log stream; on this basis we can then perform keyword matching, field filtering, and other more complex log retrieval.
+
+for example:
+
+- Keyword search, e.g. for the word `error`: `_stream:{logconfig=genfiles} _time:[now-1h,now] error`
+- Filter on ordinary label fields: `_stream:{logconfig=genfiles} _time:[now-1h,now] nodename:kind-control-plane`
+- Logical operators can also be combined: AND, OR, NOT
+
+ …
+
+LogsQL has many more capabilities. For more detailed usage, please refer to the [official documentation](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html).
+
+## Summary
+
+Although VictoriaLogs has only released a preview version, judging from its current design and first impressions, it already compares favorably with Loki. After all, it stands on Loki's shoulders and enjoys a latecomer's advantage.
+
+However, there is no silver bullet in software design. In the real world, whether you choose a project depends most on whether it fits your situation, not on how powerful the project itself is.
+
+It can be expected that for a long time to come, established players such as Elasticsearch will still hold most of the market, because many enterprises already have Elasticsearch or ClickHouse deployments that have been running stably for years, along with the corresponding operations personnel and supporting systems.
+
+So what situations are VictoriaLogs suitable for?
+
+If you are building your own logging system from scratch, are willing to accept the potential risks of new things, and are confident in the future of VictoriaLogs, it is still worth a try.
+
+It is precisely the entry of rising stars like VictoriaLogs that makes the logging field more competitive while benefiting everyone: we now have another good choice, and there is more room for customized open source development.
+
+If you have more ideas to exchange about VictoriaLogs, you are welcome to [scan the QR code](../../getting-started/overview/#_3) to join the Loggie communication group to discuss all log-related technologies.
diff --git a/docs/developer-guide/build.md b/docs/developer-guide/build.md
new file mode 100644
index 0000000..caee4ce
--- /dev/null
+++ b/docs/developer-guide/build.md
@@ -0,0 +1,67 @@
+# Compile and build
+
+## Persistence engine selection
+
+In v1.5 and later versions, Loggie provides a badger persistence engine in addition to sqlite.
+
+For the configuration of the two persistence engines, sqlite and badger, please refer to [here](../reference/global/db.md).
+
+Why introduce the badger engine? The problems with using sqlite and CGO are:
+
+- SQLite depends on the C library version of the environment; in host-deployment scenarios, many machines with older versions may not be able to run it.
+- Avoiding CGO sidesteps some build pitfalls, such as building multi-architecture (amd64/arm64) images.
+
+## Container image build
+
+By default, Loggie builds images for all release branches and the main branch and pushes them to [dockerhub](https://hub.docker.com/r/loggieio/loggie/). However, pulling images from dockerhub may be subject to rate limiting, so it is recommended that you build the image yourself or re-push it to your own image registry.
+
+Two Dockerfiles are provided under the Loggie project:
+
+- Dockerfile: the default Dockerfile, which uses sqlite and is therefore built with CGO.
+- Dockerfile.badger: specify this file with the `-f` parameter of docker build; it uses badger as the persistence engine: pure Go, no CGO.
+
+In addition to using docker build, you can also use make to build:
+
+```bash
+make docker-build REPO=/loggie TAG=v1.x.x
+make docker-push REPO=/loggie TAG=v1.x.x
+```
+
+Or build and push multi-architecture images directly:
+
+```bash
+make docker-multi-arch REPO=/loggie TAG=v1.x.x
+```
+
+Note:
+
+- The TAG does not need to be passed; by default it is generated automatically from the Git tag or branch.
+
+### Multiple Architectures
+
+All Dockerfiles in the Loggie project support multi-architecture builds. The Loggie image on dockerhub is a multi-architecture image for amd64 and arm64, and the matching architecture is automatically selected when pulling.
+
+However, if you want to change the tag and re-push to your own registry, please do not simply docker pull & docker tag & docker push: that pushes only the image for your local architecture and breaks the multi-architecture image in the registry.
+If you want to push multi-architecture images to your own registry, you can use other open source tools such as [regctl](https://github.com/regclient/regclient/blob/main/docs/regctl.md), which lets you copy an image with a command like `regctl image copy loggieio/loggie:xxx `.
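+For example (a hypothetical sketch; substitute your own registry and tag):
+
+```bash
+regctl image copy loggieio/loggie:v1.5.0 registry.example.com/loggie:v1.5.0
+```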
+
+If you integrate Loggie into your own build process or tooling, please use `make docker-multi-arch` or `docker buildx` to build multi-architecture images.
+
+## Binary build
+
+Binaries are available for host deployment scenarios.
+
+- sqlite:
+
+ ```bash
+ make build
+ ```
+
+- badger:
+
+ ```bash
+ make build-in-badger
+ ```
+
+If cross-compilation is required, please set GOOS and GOARCH.
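+
+For example, assuming the Makefile passes these variables through to `go build` (a sketch):
+
+```bash
+GOOS=linux GOARCH=arm64 make build-in-badger
+```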
+
+If a local `make build` fails, you can modify the `extra_flags` extra build parameters in the Makefile; please try removing `-extldflags "-static"`.
diff --git a/docs/getting-started/install/kubernetes.md b/docs/getting-started/install/kubernetes.md
index 4d00971..9bd4204 100644
--- a/docs/getting-started/install/kubernetes.md
+++ b/docs/getting-started/install/kubernetes.md
@@ -10,7 +10,7 @@ Make sure you have kubectl and helm executable locally.
### Download helm-chart
```bash
-VERSION=v1.3.0
+VERSION=v1.5.0
helm pull https://github.com/loggie-io/installation/releases/download/${VERSION}/loggie-${VERSION}.tgz && tar xvzf loggie-${VERSION}.tgz
```
Please replace `` above with the specific version number such as v1.3.0, which can be found [release tag](https://github.com/loggie-io/loggie/tags).
@@ -250,7 +250,7 @@ servicePorts:
For the initial deployment, we specify that the deployment is under the `loggie` namespace, and let helm automatically create the namespace.
```bash
-helm install loggie ./ -nloggie --create-namespace
+helm install loggie ./ -n loggie --create-namespace
```
If `loggie` namespace has been created in your environment, you can ignore `-nloggie` and `--create-namespace`. Of course, you can use your own namespace.
@@ -261,11 +261,10 @@ If `loggie` namespace has been created in your environment, you can ignore `-nlo
```
If you have a similar problem during helm install, it means that your Kubernetes version is too low and does not support the apiextensions.k8s.io/v1 version CRD. Loggie temporarily retains the CRD of the v1beta1 version, please delete the v1beta1 version in the charts, `rm loggie/crds/crds.yaml` and reinstall it.
-
### Check deployment status
After execution, use the helm command to check the deployment status:
```
-helm list -nloggie
+helm list -n loggie
```
Result should be like:
```
@@ -275,15 +274,13 @@ loggie loggie 1 2021-11-30 18:06:16.976334232 +0800 CST deployed loggi
At the same time, you can also use the kubectl command to check whether the Pod has been created.
```
-kubectl -nloggie get po
+kubectl -n loggie get po
```
Result should be like:
```
loggie-sxxwh 1/1 Running 0 5m21s 10.244.0.5 kind-control-plane
```
-
-
## Deploy Loggie Aggregator
Deploying Aggregator is basically the same as deploying Agent. In Helm chart we provide `aggregator config`. Modify as `enabled: true`.
@@ -312,7 +309,7 @@ At the same time, you can add content in values.yaml according to the cases:
Command reference:
```
-helm install loggie-aggregator ./ -nloggie-aggregator --create-namespace
+helm install loggie-aggregator ./ -n loggie-aggregator --create-namespace
```
!!! note
diff --git a/docs/getting-started/install/node.md b/docs/getting-started/install/node.md
index d940c08..6f8851b 100644
--- a/docs/getting-started/install/node.md
+++ b/docs/getting-started/install/node.md
@@ -13,7 +13,7 @@ The current release only contains binary executables generated by GOOS=linux GOA
## Download Binary
```
-VERSION=v1.3.0
+VERSION=v1.5.0
mkdir /opt/loggie && curl https://github.com/loggie-io/loggie/releases/download/${VERSION}/loggie-linux-amd64 -o /opt/loggie/loggie && chmod +x /opt/loggie/loggie
```
diff --git a/docs/getting-started/quick-start/kubernetes.md b/docs/getting-started/quick-start/kubernetes.md
index 36cce9a..47af2c1 100644
--- a/docs/getting-started/quick-start/kubernetes.md
+++ b/docs/getting-started/quick-start/kubernetes.md
@@ -23,7 +23,7 @@ You can choose:
#### Download the chart and deploy it
```bash
-VERSION=v1.3.0
+VERSION=v1.5.0
helm pull https://github.com/loggie-io/installation/releases/download/${VERSION}/loggie-${VERSION}.tgz && tar xvzf loggie-${VERSION}.tgz
```
Try to modify values.yaml in it. Please replace the `` above with the specific version number.
@@ -31,14 +31,14 @@ Try to modify values.yaml in it. Please replace the `` above with the s
Deploy:
```bash
-helm install loggie ./loggie -nloggie --create-namespace
+helm install loggie ./loggie -n loggie --create-namespace
```
You can also:
#### Deploy directly:
```bash
-helm install loggie -nloggie --create-namespace https://github.com/loggie-io/installation/releases/download/${VERSION}/loggie-${VERSION}.tgz
+helm install loggie -n loggie --create-namespace https://github.com/loggie-io/installation/releases/download/${VERSION}/loggie-${VERSION}.tgz
```
Please replace the `` above with the specific version number.
diff --git a/docs/getting-started/quick-start/node.md b/docs/getting-started/quick-start/node.md
index 86805e2..4c66854 100644
--- a/docs/getting-started/quick-start/node.md
+++ b/docs/getting-started/quick-start/node.md
@@ -5,7 +5,7 @@ We will demonstrate the simplest scenario of collecting host log files.
### 1. Download the Executable File
Please find a Linux server host and download the Loggie binary executable file.
```shell
-VERSION=v1.3.0
+VERSION=v1.5.0
curl -LJ https://github.com/loggie-io/loggie/releases/download/${VERSION}/loggie-linux-amd64 -o loggie
```
@@ -54,6 +54,5 @@ After adding the above two configuration files on the host, we can start Loggie.
./loggie -config.system=./loggie.yml -config.pipeline=./pipelines.yml -log.jsonFormat=false
```
-Fill the file paths of logie.yml and pipelines.yml in CMD arguments.
-
+Specify the file paths of `loggie.yml` and `pipelines.yml` in the command-line arguments.
Normal startup log shows that Loggie has started to work normally. Contents of files matching `/var/log/*.log` will be printed to standard output.
diff --git a/docs/getting-started/roadmap/roadmap-2023.md b/docs/getting-started/roadmap/roadmap-2023.md
deleted file mode 100644
index b7c9de9..0000000
--- a/docs/getting-started/roadmap/roadmap-2023.md
+++ /dev/null
@@ -1,16 +0,0 @@
-# 2023 Loggie RoadMap
-
-## More Components and Functional Extensions
-- Persistent queue.
-- Stream processing capabilities: aggregation, computing, etc. (similar to pulsar's funtion, or lightweight flink).
-- Source: http ...
-- Sink: clickhouse, influxdb, s3, hdfs, etc.
-- WASM form supports custom log parsing processing.
-- Supports serverless expansion and shrinkage indicators (like Knative/KEDA), and realizes automatic expansion and shrinkage of aggregator analysis and processing.
-
-## Cloud Native and Kubernetes
-- Support automatic injection of Loggie sidecar.
-- opentelemetry compatibility and support.
-
-## Service Discovery
-- Loggie dashboard: Provides a front-end page for configuration management.
diff --git a/docs/reference/apis/config.md b/docs/reference/apis/config.md
new file mode 100644
index 0000000..bd1e867
--- /dev/null
+++ b/docs/reference/apis/config.md
@@ -0,0 +1,44 @@
+# Pipeline configuration related interfaces
+
+## /api/v1/reload/config
+
+### URL
+
+```bash
+GET /api/v1/reload/config
+```
+
+### Description
+
+View the contents of the Pipeline configuration file on disk.
+Each call to this interface re-reads the configuration file specified by the Loggie startup parameter `-config.pipeline`.
+Note that this is not necessarily the configuration currently loaded in Loggie's memory, due to factors such as the reload interval or configuration file format errors.
+
+### Request parameters
+
+none
+
+### Returns
+
+Pipeline configuration file text content.
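+
+As a sketch, assuming Loggie's HTTP endpoint is reachable (the port is whatever `http.port` is set to in your loggie.yml; `<http-port>` below is a placeholder):
+
+```bash
+curl -s "http://localhost:<http-port>/api/v1/reload/config"
+```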
+
+## /api/v1/controller/pipelines
+
+### URL
+
+```bash
+GET /api/v1/controller/pipelines
+```
+
+### Description
+
+View the configuration loaded in Loggie's memory, which is also the running Pipeline configuration.
+The configuration returned by this interface is consistent with that of the [help](help.md#apiv1help) interface.
+
+### Request parameters
+
+none
+
+### Return
+
+The text content of the Pipeline configuration.
diff --git a/docs/reference/apis/help.md b/docs/reference/apis/help.md
new file mode 100644
index 0000000..8ed834c
--- /dev/null
+++ b/docs/reference/apis/help.md
@@ -0,0 +1,194 @@
+# help interface
+
+## /api/v1/help
+
+### **URL**
+
+```bash
+GET /api/v1/help
+```
+
+### Description
+
+Queries the configuration and log collection status of the Loggie Agent.
+
+### Request parameters
+
+- pipeline: show only the configuration and collection status of the given pipeline name
+- source: show only the configuration and collection status of the given source name
+- status: can be `pending`, which shows only log files still being collected (including 0%), excluding files at 100%, NaN%, and ignored files.
+
+Example:
+
+```bash
+/api/v1/help?pipeline=test&status=pending
+```
+
+Returns only the status of log files currently being collected by the pipeline named `test`.
+
+### Return
+
+Example:
+
+```bash
+--------- Usage: -----------------------
+|--- view details: /api/v1/help?detail=, module is one of: pipeline/log
+|--- query by pipeline name: /api/v1/help?pipeline=
+|--- query by source name: /api/v1/help?source=
+
+--------- Pipeline Status: --------------
+all 1 pipelines running
+ * pipeline: local, sources: [demo]
+
+✅ pipeline configurations consistency check passed
+
+pipelines:
+- name: local
+ cleanDataTimeout: 5s
+ queue:
+ type: channel
+ interceptors:
+ - name: global
+ type: schema
+ addMeta:
+ timestamp:
+ key: '@timestamp'
+ order: 700
+ - type: metric
+ - type: maxbytes
+ - type: retry
+ sources:
+ - name: demo
+ type: file
+ fieldsUnderKey: fields
+ fields:
+ topic: loggie
+ paths:
+ - /tmp/log/*.log
+ watcher:
+ maxOpenFds: 6000
+ sink:
+ type: dev
+ parallelism: 1
+ codec:
+ type: json
+ printEvents: true
+ pretty: true
+
+| more details:
+|--- pipelines configuration in the path ref: /api/v1/reload/config
+|--- current running pipelines configuration ref: /api/v1/controller/pipelines
+
+--------- Log Collection Status: ---------
+
+all activeFdCount: 0, inActiveFdCount: 1
+
+> pipeline * source - filename | progress(offset/size) | modify
+ > local
+ * demo
+ - /tmp/log/app.log | 100.00%(189/189) | 2023-07-17T17:10:33+08:00
+
+| more details:
+|--- registry storage ref: /api/v1/source/file/registry
+```
+
+- Pipeline Status
+ - Indicates how many pipelines are running
+ - `pipeline configurations consistency check passed`: indicates that the pipeline configuration on disk is consistent with the one in Loggie's memory.
+ - Pipelines configuration: the final configuration loaded by Loggie, after the global defaults have been merged in
+- Log Collection Status
+ - Number of active and inactive log files: note that active/inactive here is Loggie's internal liveness judgment, related to [maxContinueRead](../../reference/pipelines/source/file.md#maxcontinueread), [maxContinueReadTimeout](../../reference/pipelines/source/file.md#maxcontinuereadtimeout) and [maxEofCount](../../reference/pipelines/source/file.md#maxeofcount) in the file source. For example, a file that prints one log line per second is not considered active under the default configuration.
+ - Log collection progress: includes the file path, collection progress, last modification time, etc.
+
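The progress column in the output above is simply `offset/size` rendered as a percentage. A minimal sketch of that rendering (a hypothetical helper, not Loggie's actual code):

```python
def format_progress(offset: int, size: int) -> str:
    # Render collection progress the way the help output shows it,
    # e.g. 100.00%(189/189); empty files come out as NaN%.
    if size == 0:
        return f"NaN%({offset}/{size})"
    return f"{offset / size * 100:.2f}%({offset}/{size})"

print(format_progress(189, 189))  # 100.00%(189/189)
print(format_progress(0, 469))    # 0.00%(0/469)
```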
+## /api/v1/help/log
+
+### **URL**
+
+```bash
+GET /api/v1/help/log
+```
+
+### Description
+
+Queries the log collection status of the Loggie Agent.
+
+### Request parameters
+
+- pipeline: query only the status of the given pipeline
+- status: if `status=pending`, only the status of log files still being collected is returned (including 0%), ignoring files at 100%, files that no longer exist, NaN%, and ignored files.
+
+Example:
+
+```bash
+/api/v1/help/log?pipeline=test&status=pending
+```
+
+Returns only the status of log files currently being collected by the pipeline named `test`.
+
+### Return parameters
+
+| Parameter name | Description | Parameter type | Remarks |
+| -------- | -------- | -------- | -------- |
+| fdStatus | file handle status | | |
+| fdStatus.activeFdCount | Number of active fds | int | |
+| fdStatus.inActiveFdCount | Number of inactive fds | int | |
+| fileStatus | File collection status | | |
+| fileStatus.pipeline.`<pipelineName>` | Pipeline status, keyed by the pipeline name in the configuration; refer to the [pipeline](help.md#pipeline) parameters below | map | |
+
+#### pipeline
+
+| Parameter name | Description | Parameter type | Remarks |
+| -------- | -------- | -------- | -------- |
+| source.`<sourceName>` | Status of the source in the pipeline; refer to the [source](help.md#source) parameters below | map | |
+
+#### source
+
+| Parameter name | Description | Parameter type | Remarks |
+| -------- | -------- | -------- | -------- |
+| paths | The paths defined in the file source configuration | string array | |
+| detail | Collection details of the files matched by the source | array | |
+| detail[n].filename | File name | string | |
+| detail[n].offset | Collection progress offset | int | |
+| detail[n].size | File size | int | |
+| detail[n].modify | Last modification time of the file | int | unix milliseconds |
+| detail[n].ignored | Whether the file is ignored (determined by the ignoreOlder configuration in the file source) | bool | |
+
+!!! example
+
+```json
+ {
+ "fdStatus": {
+ "activeFdCount": 0,
+ "inActiveFdCount": 1
+ },
+ "fileStatus": {
+ "pipeline": {
+ "local": {
+ "source": {
+ "demo": {
+ "paths": [
+ "/tmp/log/*.log"
+ ],
+ "detail": [
+ {
+ "filename": "/tmp/log/access.log",
+ "offset": 469,
+ "size": 469,
+ "modify": 1673436846523,
+ "ignored": false
+ }
+ ]
+ }
+ }
+ }
+ }
+ }
+ }
+```
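The `status=pending` filter can also be reproduced client-side from the full JSON payload. A sketch assuming the response shape shown above (a file is pending when it is not ignored and has not reached 100% of its size); this is an illustration, not Loggie's server-side filter:

```python
import json

# The /api/v1/help/log payload shape from the example above (one file, partially read)
payload = json.loads("""
{
  "fileStatus": {
    "pipeline": {
      "local": {
        "source": {
          "demo": {
            "paths": ["/tmp/log/*.log"],
            "detail": [
              {"filename": "/tmp/log/access.log", "offset": 100, "size": 469,
               "modify": 1673436846523, "ignored": false}
            ]
          }
        }
      }
    }
  }
}
""")

def pending_files(status):
    # Collect filenames that are not ignored and not yet fully read.
    out = []
    for pipe in status["fileStatus"]["pipeline"].values():
        for src in pipe["source"].values():
            for d in src["detail"]:
                if not d["ignored"] and d["size"] > 0 and d["offset"] < d["size"]:
                    out.append(d["filename"])
    return out

print(pending_files(payload))  # ['/tmp/log/access.log']
```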
diff --git a/docs/reference/apis/registry.md b/docs/reference/apis/registry.md
new file mode 100644
index 0000000..290eb21
--- /dev/null
+++ b/docs/reference/apis/registry.md
@@ -0,0 +1,35 @@
+# registry
+
+## /api/v1/source/file/registry
+
+### **URL**
+
+```bash
+GET /api/v1/source/file/registry
+```
+
+### Description
+
+Queries Loggie's persisted collection progress for files on this node.
+Loggie persists information such as file collection progress locally. If Loggie restarts, collection resumes from the offset recorded for each file instead of starting over from the beginning.
+
+### Request parameters
+
+- format: display format, either `json` or `text`; the default is `json`.
+
+Example:
+
+```bash
+/api/v1/source/file/registry?format=text
+```
+
+### Return
+
+Example:
+
+```bash
+{Id:1 PipelineName:local SourceName:demo Filename:/tmp/log/app2.log JobUid:75064440-16777234 Offset:259 CollectTime:2023-07-17 20:19:04.846 Version:0.0.1 LineNumber:9}
+{Id:2 PipelineName:local SourceName:demo Filename:/tmp/log/app.log JobUid:75610913-16777234 Offset:0 CollectTime:2023-07-17 20:19:12.343 Version:0.0.1 LineNumber:0}
+```
+
+- JobUid: composed of {inodeId}-{deviceId} of the file
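Since JobUid is just `{inodeId}-{deviceId}`, the same identifier can be derived for any file with a stat call. A sketch (illustrative, not Loggie's code):

```python
import os
import tempfile

def job_uid(path):
    # JobUid is '{inodeId}-{deviceId}' of the file, so a renamed
    # file keeps the same uid and its progress is not lost.
    st = os.stat(path)
    return f"{st.st_ino}-{st.st_dev}"

with tempfile.NamedTemporaryFile() as f:
    print(job_uid(f.name))  # e.g. 75064440-16777234
```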
diff --git a/docs/reference/apis/version.md b/docs/reference/apis/version.md
new file mode 100644
index 0000000..fc2dd5a
--- /dev/null
+++ b/docs/reference/apis/version.md
@@ -0,0 +1,19 @@
+# version
+
+## **URL**
+
+```bash
+GET /version
+```
+
+## Description
+
+Shows the Loggie version.
+
+## Request parameters
+
+none
+
+## Return parameters
+
+Returns the Loggie version name directly.
diff --git a/docs/reference/discovery/kubernetes/clusterlogconfig.md b/docs/reference/discovery/kubernetes/clusterlogconfig.md
index 37af3c5..e4f9499 100644
--- a/docs/reference/discovery/kubernetes/clusterlogconfig.md
+++ b/docs/reference/discovery/kubernetes/clusterlogconfig.md
@@ -31,6 +31,34 @@ Cluster-level CRDs that can be used to:
## spec.selector
Indicates the scope to which the Pipeline configuration applies
+## spec.namespaceSelector
+
+Technology provider: [Qingchuang Technology](https://www.eoitek.com/)
+
+Matches all Pods in the specified namespaces.
+
+!!! example
+
+ ```yaml
+ namespaceSelector:
+ - test1
+ - test2
+ ```
+
+## spec.excludeNamespaceSelector
+
+Technology provider: [Qingchuang Technology](https://www.eoitek.com/)
+
+Excludes all Pods in the specified namespaces.
+
+!!! example
+
+ ```yaml
+ excludeNamespaceSelector:
+ - test1
+ - test2
+ ```
+
### type: pod
Select a batch of Pods for log collection through Pipeline configuration
@@ -84,6 +112,49 @@ To deliver the Pipeline configuration to a Loggie cluster, it usually needs to b
```
Indicates that the configured Pipelines are delivered to cluster whose `cluster` is aggregator.
+### type: workload
+
+Technology provider: [Qingchuang Technology](https://www.eoitek.com/)
+
+Selects a batch of workloads for log collection through the Pipeline configuration.
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Meaning` |
+|------|-------|--------| --------- |---------------------------------------------------------|
+| type | array | Optional | | Workload type; currently supports Deployment, DaemonSet, CronJob, Job and StatefulSet. Matches all types if omitted |
+| nameSelector | array | Optional | | Workload names; matches all if omitted |
+| namespaceSelector | array | Optional | | Namespaces; only workloads in the given namespaces are matched. Matches all if omitted |
+| excludeNamespaceSelector | array | Optional | | Namespaces to exclude |
+
+!!! example
+
+ ```yaml
+ apiVersion: loggie.io/v1beta1
+ kind: ClusterLogConfig
+ metadata:
+ name: globalstdout
+ spec:
+ selector:
+ type: workload
+ workload_selector:
+ - type:
+ - Deployment
+ nameSelector:
+ - default1
+ - default2
+      namespaceSelector:
+ - default1
+ - default2
+
+ pipeline:
+ sources: |
+ - type: file
+ name: stdout
+ paths:
+ - stdout
+ sinkRef: default
+ ```
+
+Collects the logs of all Pods belonging to Deployments named default1 or default2 in the default1 and default2 namespaces.
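The selector semantics above can be summarized as: empty lists match everything, and `excludeNamespaceSelector` always wins. A sketch of that matching logic (hypothetical, not Loggie's implementation):

```python
def workload_matches(workload, selector):
    # excludeNamespaceSelector always wins over the include lists.
    if workload["namespace"] in selector.get("excludeNamespaceSelector", []):
        return False
    # For type/nameSelector/namespaceSelector, an empty list means "match all".
    for field, key in (("type", "type"),
                       ("name", "nameSelector"),
                       ("namespace", "namespaceSelector")):
        wanted = selector.get(key, [])
        if wanted and workload[field] not in wanted:
            return False
    return True

sel = {"type": ["Deployment"],
       "nameSelector": ["default1", "default2"],
       "namespaceSelector": ["default1", "default2"]}
print(workload_matches({"type": "Deployment", "name": "default1",
                        "namespace": "default1"}, sel))  # True
print(workload_matches({"type": "DaemonSet", "name": "default1",
                        "namespace": "default1"}, sel))  # False
```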
### cluster
@@ -95,3 +166,15 @@ To deliver the Pipeline configuration to a Loggie cluster, it usually needs to b
## spec.pipeline
The configuration is consistent with LogConfig.
+
+### sources
+
+In ClusterLogConfig, `source` adds the following additional parameters:
+
+#### typeNodeFields
+
+When `type: node`:
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| typeNodeFields | map | Optional | | Same as [typeNodeFields](../../global/discovery.md#typenodefields) in the global configuration discovery.kubernetes; the difference is that it takes effect at the ClusterLogConfig level |
diff --git a/docs/reference/discovery/kubernetes/logconfig.md b/docs/reference/discovery/kubernetes/logconfig.md
index a3591c1..047fcfd 100644
--- a/docs/reference/discovery/kubernetes/logconfig.md
+++ b/docs/reference/discovery/kubernetes/logconfig.md
@@ -111,6 +111,39 @@ In LogConfig, when `type: pod`, several parameters specifically for containeriza
| matchFields.annotationKey | string array | false | | Similar to the above labelKey. Inject annotations of pod. "*" is supported |
| matchFields.env | string array | false | | Similar to the above labelKey. Inject env of pod. "*" is supported |
+#### matchFields
+
+Optional. Adds information from the Pod to the fields.
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| labelKey | string array | Optional | | Label keys on the Pod to inject. For example, if the Pod has the label `app: demo`, setting `labelKey: app` adds `app: demo` to the file source fields, so the collected logs carry that label. Useful when the matched Pods have differing labels. Supports `"*"` to obtain all labels |
+| annotationKey | string array | Optional | | Similar to labelKey above; injects the value of the Pod annotation. Supports `"*"` to obtain all annotations |
+| env | string array | Optional | | Similar to labelKey above; injects the value of the Pod environment variable. Supports `"*"` to obtain all env variables |
+| reformatKeys | | Optional | | Reformat key |
+| reformatKeys.label | fmt parameter array | Optional | | Reformat label key, please refer to the following [fmt parameter](./logconfig.md#fmt) |
+| reformatKeys.annotation | fmt parameter array | Optional | | Reformat the annotation key, please refer to the following [fmt parameter](./logconfig.md#fmt)|
+| reformatKeys.env | fmt parameter array | Optional | | Reformat env key, please refer to the following [fmt parameter](./logconfig.md#fmt) |
+
+##### fmt
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| regex | string | optional | | matching regular expression |
+| replace | string | optional | | re-rendered format |
+
+!!! example "reformatKeys"
+
+    Assume the Pod label is `aa.bb/foo=bar`,
+    and reformatKeys is configured as follows:
+ ```
+ matchFields:
+ reformatKeys:
+ label:
+ - regex: aa.bb/(.*)
+ replace: pre-${1}
+ ```
+    The meta information finally added to the log is: `pre-foo=bar`
!!! example
diff --git a/docs/reference/global/db.md b/docs/reference/global/db.md
new file mode 100644
index 0000000..0adc0cf
--- /dev/null
+++ b/docs/reference/global/db.md
@@ -0,0 +1,57 @@
+# db
+
+Configuration for persisted data. Stores the file name, inode, collection offset and other information gathered during collection, and is used to restore the previous collection progress after a Loggie reload or restart.
+
+Versions v1.5 and later introduce a [badger persistence engine](../../developer-guide/build.md), which can replace the previous sqlite and avoids the use of CGO.
+
+!!! caution
+
+    Please note this **incompatible change**: from v1.5 onwards, the db configuration in the file source is moved here as a global configuration.
+
+    If you upgrade from an earlier version to v1.5 or later, check whether the file source has db configured. If it is not configured, this can be ignored and the default values remain compatible.
+
+!!! example
+
+ === "sqlite"
+
+ ```yml
+ db:
+ file: /opt/data/loggie.db
+ ```
+
+ === "badger"
+
+ ```yml
+ db:
+ file: /opt/data/badger
+ ```
+
+## file
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| file | string | Optional | badger: `./data/badger`, sqlite: `./data/loggie.db` | Path of the persistence file (a directory for badger) |
+
+## flushTimeout
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| flushTimeout | time.Duration | Optional | 2s | Interval for flushing the persistence write buffer |
+
+## bufferSize
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| bufferSize | int | Optional | 2048 | The buffer used for persistent writing |
+
+## cleanInactiveTimeout
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| cleanInactiveTimeout | time.Duration | Optional | 504h | If a record has not been updated for a long time, it will be cleared. The default is 21d (504h). |
+
+## cleanScanInterval
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| cleanScanInterval | time.Duration | Optional | 1h | Interval at which the cleanup logic runs |
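The two cleanup settings combine as: every scan interval, records whose last update is older than `cleanInactiveTimeout` are dropped. A sketch of one cleanup pass (illustrative, not Loggie's code):

```python
CLEAN_INACTIVE_TIMEOUT = 504 * 3600  # cleanInactiveTimeout: 504h = 21 days, in seconds

def clean_registry(records, now):
    # Drop records whose last update is older than cleanInactiveTimeout;
    # in Loggie this pass runs every cleanScanInterval (1h by default).
    return [r for r in records if now - r["updated"] <= CLEAN_INACTIVE_TIMEOUT]

now = 1_700_000_000.0
records = [
    {"filename": "/tmp/log/app.log", "updated": now - 60},          # recently updated
    {"filename": "/tmp/log/old.log", "updated": now - 600 * 3600},  # inactive > 21d
]
print([r["filename"] for r in clean_registry(records, now)])  # ['/tmp/log/app.log']
```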
diff --git a/docs/reference/global/subcmd.md b/docs/reference/global/subcmd.md
new file mode 100644
index 0000000..50293c8
--- /dev/null
+++ b/docs/reference/global/subcmd.md
@@ -0,0 +1,67 @@
+# Subcommands
+
+## genfiles
+
+Generates simulated logs, which can be used for stress testing and for testing log collection scenarios.
+
+### Parameters
+
+- totalCount: the total number of log lines to generate.
+- lineBytes: the length of each log line in bytes.
+- qps: the rate at which logs are generated.
+- Other log parameters: the genfiles subcommand essentially uses Loggie's own logging framework to generate a large number of logs, so Loggie's [log parameters](./args.md#log-parameters) are also parameters of the genfiles subcommand.
+
+### How to use
+
+#### Using the loggie executable
+
+```bash
+LOG_DIR=/tmp/log ## log directory
+LOG_MAXSIZE=10 ## max size in MB of the logfile before it's rolled
+LOG_QPS=0 ## qps of line generate
+LOG_TOTAL=5 ## total line count
+LOG_LINE_BYTES=1024 ## bytes per line
+LOG_MAX_BACKUPS=5 ## max number of rolled files to keep
+
+./loggie genfiles -totalCount=${LOG_TOTAL} -lineBytes=${LOG_LINE_BYTES} -qps=${LOG_QPS} \
+ -log.maxBackups=${LOG_MAX_BACKUPS} -log.maxSize=${LOG_MAXSIZE} -log.directory=${LOG_DIR} -log.noColor=true \
+ -log.enableStdout=false -log.enableFile=true -log.timeFormat="2006-01-02 15:04:05.000"
+
+```
+
+Example:
+
+```bash
+# Generate a log file loggie.log under /tmp/log,
+# which contains 1000 logs, each line of log is 1KB, and the total size is about 1.1MB
+
+./loggie genfiles -totalCount=1000 -lineBytes=1024 -qps=0 \
+ -log.maxBackups=1 -log.maxSize=1000 -log.directory=/tmp/log -log.noColor=true \
+ -log.enableStdout=false -log.enableFile=true -log.timeFormat="2006-01-02 15:04:05.000"
+```
+
+To use it in a container, refer to the [deployment manifest](https://github.com/loggie-io/catalog/tree/main/common/genfiles) in the deployment catalog, then exec into the Pod and run the command above.
+
+#### Local testing
+
+Example:
+
+```bash
+make genfiles LOG_TOTAL=1000
+```
+
+For other parameters, refer to the Makefile in the project.
+
+## version
+
+Shows the Loggie version.
+
+### Parameters
+
+none
+
+### How to use
+
+```bash
+./loggie version
+```
diff --git a/docs/reference/monitor/info.md b/docs/reference/monitor/info.md
new file mode 100644
index 0000000..b9d4fdb
--- /dev/null
+++ b/docs/reference/monitor/info.md
@@ -0,0 +1,21 @@
+# info listener
+
+Display some information about Loggie itself.
+
+## Configuration
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| period | time.Duration | Optional | 10s | Interval at which the metrics are exposed |
+
+## Metrics
+
+### loggie_info_stat
+
+```bash
+# HELP loggie_info_stat Loggie info
+# TYPE loggie_info_stat gauge
+loggie_info_stat{version="v1.4"} 1
+```
+
+The version label is the Loggie version number, injected at build time. If the correct version does not appear, check the go build parameters used at compile time.
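The version label can be scraped out of the exposition text with a small regex. A sketch (real consumers should use a Prometheus client library):

```python
import re

def loggie_version(exposition):
    # Extract the version label from the loggie_info_stat sample.
    m = re.search(r'loggie_info_stat\{version="?([^"}]+)"?\}', exposition)
    return m.group(1) if m else None

print(loggie_version('loggie_info_stat{version="v1.4"} 1'))  # v1.4
```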
diff --git a/docs/reference/monitor/logalert.md b/docs/reference/monitor/logalert.md
index 727c5d8..a97c79c 100644
--- a/docs/reference/monitor/logalert.md
+++ b/docs/reference/monitor/logalert.md
@@ -1,13 +1,31 @@
# logAlert listener
-Used for sending log alerts.
+Used for sending log alerts.
+For usage examples, refer to [Log Alert](../../user-guide/monitor/service-log-alarm.md).
+
+!!! example
+
+ ```yaml
+ logAlert:
+ addr: [ "http://127.0.0.1:8080/loggie" ]
+ bufferSize: 100
+ batchTimeout: 10s
+ batchSize: 1
+ lineLimit: 10
+ template: |
+ *****
+ ```
## Configuration
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| alertManagerAddress | string arrays | true | | alertManager addresses |
-| bufferSize | int | false | 100 | The size of the buffer sent by the logAlert. Unit is the number of alert events. |
-| batchTimeout | time.Duration | false | 10s | The maximum sending time of each alarm batch. |
-| batchSize | int | false | 10 | The maximum number of alert included in each alarm batch. |
-
+| addr | string array | required | | HTTP addresses to send alerts to |
+| bufferSize | int | Optional | 100 | Size of the buffer for sending log alerts, measured in number of alert events |
+| batchTimeout | time.Duration | Optional | 10s | Maximum time to wait before sending an alert batch |
+| batchSize | int | Optional | 10 | Maximum number of alerts included in each batch |
+| template | string | Optional | | Go template used to render the alert structure that is sent |
+| timeout | time.Duration | Optional | 30s | HTTP timeout for sending alerts |
+| headers | map | Optional | | HTTP headers to send with alerts |
+| method | string | Optional | POST | HTTP method used to send alerts; anything other than `PUT` (case-insensitive) is treated as POST |
+| lineLimit | int | Optional | 10 | For multi-line log collection, the maximum number of log lines included in each alert |
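The buffering fields combine as: alerts accumulate in a buffer, and a batch is sent once `batchSize` alerts are collected or `batchTimeout` elapses. A sketch of that flushing logic (a hypothetical helper, not Loggie's implementation):

```python
class AlertBatcher:
    # Sketch of the batchSize/batchTimeout semantics described above.
    def __init__(self, batch_size=10, batch_timeout=10.0):
        self.batch_size = batch_size
        self.batch_timeout = batch_timeout
        self.buf = []
        self.started = None
        self.flushed = []  # a real sender would POST each batch to `addr`

    def add(self, alert, now):
        if not self.buf:
            self.started = now
        self.buf.append(alert)
        # Flush when the batch is full or the oldest buffered alert is too old.
        if len(self.buf) >= self.batch_size or now - self.started >= self.batch_timeout:
            self.flushed.append(self.buf)
            self.buf = []

b = AlertBatcher(batch_size=2, batch_timeout=10.0)
b.add("error: disk full", now=0.0)  # buffered, batch not full yet
b.add("error: oom", now=1.0)        # batchSize reached, flushed as one batch
print(b.flushed)  # [['error: disk full', 'error: oom']]
```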
diff --git a/docs/reference/pipelines/interceptor/addhostmeta.md b/docs/reference/pipelines/interceptor/addhostmeta.md
new file mode 100644
index 0000000..e1bfb11
--- /dev/null
+++ b/docs/reference/pipelines/interceptor/addhostmeta.md
@@ -0,0 +1,64 @@
+# addHostMeta
+
+For host deployments, adds host meta-information fields to events.
+
+!!! example
+
+ ```yaml
+ interceptors:
+ - type: addHostMeta
+ addFields:
+ hostname: "${hostname}"
+ ip: "${ip}"
+ os: "${os}"
+ platform: "${platform}"
+ kernelVersion: "${kernelVersion}"
+ kernelArch: "${kernelArch}"
+ ```
+
+## addFields
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| addFields | map | required | | meta information to be added |
+
+The meta information fields currently supported are:
+
+- `${hostname}`: host name
+- `${ip}`: array of the host's IPv4 addresses
+- `${os}`: operating system
+- `${kernelVersion}`: kernel version
+- `${kernelArch}`: kernel architecture
+- `${platform}`: platform
+- `${platformFamily}`: platform family
+- `${platformVersion}`: platform version
+
+With the meta information above, an example log looks like:
+
+!!! example
+
+ ```json
+ {
+ "@timestamp": "2023-07-13T07:13:50.394Z",
+ "host": {
+ "kernelVersion": "22.2.0",
+ "os": "darwin",
+ "platform": "darwin",
+ "platformFamily": "Standalone Workstation",
+ "platformVersion": "13.1",
+ "hostname": "xxxMacBook-Pro.local",
+ "ip": [
+ "10.xxx.xxx.221",
+ "192.xxx.xxx.1"
+ ],
+ "kernelArch": "arm64"
+ },
+ "body": "xxx"
+ }
+ ```
+
+## fieldsName
+
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| fieldsName | string | Optional | host | Field name under which the meta information is added |
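Roughly equivalent meta information can be gathered with Python's standard library. This is only an approximation of the interceptor's fields (the key names mirror the docs above; the stdlib probes are an assumption, not Loggie's code):

```python
import platform
import socket

def host_meta():
    # Approximate the fields injected by addHostMeta using the stdlib.
    return {
        "hostname": socket.gethostname(),
        "os": platform.system().lower(),       # e.g. linux / darwin
        "platform": platform.system().lower(),
        "kernelVersion": platform.release(),
        "kernelArch": platform.machine(),      # e.g. x86_64 / arm64
    }

print(sorted(host_meta()))
```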
diff --git a/docs/reference/pipelines/interceptor/addk8smeta.md b/docs/reference/pipelines/interceptor/addk8smeta.md
index 638fa7d..e216fc2 100644
--- a/docs/reference/pipelines/interceptor/addk8smeta.md
+++ b/docs/reference/pipelines/interceptor/addk8smeta.md
@@ -23,7 +23,7 @@ With any one of the above three kinds of index information, Loggie can query the
## pattern
-| `字段` | `类型` | `是否必填` | `默认值` | `含义` |
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
| ---------- | ----------- | ----------- | --------- | -------- |
| pattern | string | true | | Matching model for extracting fields |
@@ -38,19 +38,19 @@ For example: `/var/log/${pod.uid}/${pod.name}/`
## patternFields
-| `字段` | `类型` | `是否必填` | `默认值` | `含义` |
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
| ---------- | ----------- | ----------- | --------- | -------- |
| patternFields | string | false | By default, the filename is taken from the event's system fields, which requires the file source | The field of the event from which the pattern is extracted |
## fieldsName
-| `字段` | `类型` | `是否必填` | `默认值` | `含义` |
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
| ---------- | ----------- | ----------- | --------- | -------- |
| fieldsName | string | false | kubernetes | Fields to add meta information |
## addFields
-| `字段` | `类型` | `是否必填` | `默认值` | `含义` |
+| `Field` | `Type` | `Required or not` | `Default value` | `Description` |
| ---------- | ----------- | ----------- | --------- | -------- |
| addFields | map | false | | Meta information to be added |
diff --git a/docs/reference/pipelines/interceptor/logalert.md b/docs/reference/pipelines/interceptor/logalert.md
index 53150e3..a05cc3d 100644
--- a/docs/reference/pipelines/interceptor/logalert.md
+++ b/docs/reference/pipelines/interceptor/logalert.md
@@ -11,21 +11,85 @@ Please refer to [Log Alarm](../../../user-guide/monitor/service-log-alarm.md) fo
- type: logAlert
matcher:
contains: ["error", "err"]
-
+ regexp: ['.*example.*']
+ ignore: ['.*INFO.*']
+ sendOnlyMatched: true
+ additions:
+ module: "loggie"
+ alertname: "alert-test"
+ cluster: "local-cluster"
+ namespace: "default"
+ advanced:
+ enabled: true
+ mode: [ "noData","regexp" ]
+ duration: 6h
+ matchType: "any"
+ rules:
+ - regexp: '(?.*?) (?[\S|\\.]+) ([\S|\\.]+) (?.*?) --- (?\[*?\]) (?.*) : (?(.|\n|\t)*)'
+ matchType: "any"
+ groups:
+ - key: status
+ operator: "eq"
+ value: WARN
+ - key: thread
+ operator: "eq"
+ value: 200
+ - regexp: '(?.*?) (?[\S|\\.]+) (?[\S|\\.]+) (?.*?) --- (?\[.*?\]) (?.*) : (?(.|\n|\t)*)'
+ matchType: "any"
+ groups:
+ - key: status
+ operator: "eq"
+ value: ERROR
```
## matcher
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| matcher.contains | string array | false | | check whether log contains string |
-| matcher.regexp | string array | false | | check whether log matches regexp pattern |
-| matcher.target | string | false | body | the field of log data to check |
+| matcher.contains | string array | false | | Strings to detect in the log data (contains match) |
+| matcher.regexp | string array | false | | Regular expressions to match against the log data |
+| matcher.target | string | Optional | body | The field of the log data to check. If you split the log or drop the body field, set this to the field to match against |
+## ignore
-## labels
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| ignore | string array | optional | | Regular expressions; a matching log is not alerted on but is still passed downstream. Can be used to exclude certain logs from alerting |
+
+## additions
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| labels.from | string array | false | | additional labels from the header. Fill in the specific key name in the header. |
+| additions | map | false | | Extra fields attached when sending an alert; they are placed in the `_additions` field and can be used for template rendering |
+
+## sendOnlyMatched
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |------| ----------- |-------|--------------------|
+| sendOnlyMatched | bool | false | false | Whether to send only successfully matched data to the sink |
+
+## advanced
+
+| `field` | `type` | `required` | `default` | `description` |
+|-----------|---------------| ----------- | --------- |----------------------------------------|
+| enabled | bool | false | false | Whether to enable advanced matching mode |
+| mode | string list | false | | Matching modes; supports `regexp` and `noData`, which can be enabled at the same time |
+| duration | time.Duration | false | | Required in noData mode: if no log arrives within this period, an alert is sent |
+| matchType | string | false | | Required in regexp mode: `any` or `all`, i.e. match any rule or all rules |
+| rules | Rule list | false | | Required in regexp mode: list of matching rules |
+
+### advanced.rule
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| regexp | string | true | | Regular expression with named groups |
+| matchType | string | true | | `any` or `all`, i.e. match any group or all groups |
+| groups | group list | true | | List of matching groups |
+
+#### advanced.rule.group
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| key | string | true | | Name of the matched group |
+| operator | string | true | | Operator; currently supports eq, gt, lt |
+| value | string | true | | Target value to compare against |
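Putting the tables together: each rule's regexp extracts named groups, each group entry compares one extracted value against a target, and `matchType` decides whether any or all comparisons must hold. A sketch of the group evaluation (hypothetical, not Loggie's code; gt/lt are assumed to compare numerically):

```python
def group_matches(value, operator, target):
    # eq compares strings; gt/lt compare numerically (assumed semantics).
    if operator == "eq":
        return value == target
    if operator == "gt":
        return float(value) > float(target)
    if operator == "lt":
        return float(value) < float(target)
    raise ValueError(f"unsupported operator: {operator}")

def rule_matches(groups, rule_groups, match_type):
    # groups: values extracted by the rule's named-group regexp.
    results = [group_matches(groups.get(g["key"], ""), g["operator"], g["value"])
               for g in rule_groups]
    return any(results) if match_type == "any" else all(results)

print(rule_matches({"status": "ERROR"},
                   [{"key": "status", "operator": "eq", "value": "ERROR"}],
                   "any"))  # True
```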
diff --git a/docs/reference/pipelines/interceptor/maxbytes.md b/docs/reference/pipelines/interceptor/maxbytes.md
index 9a42081..1d545bc 100644
--- a/docs/reference/pipelines/interceptor/maxbytes.md
+++ b/docs/reference/pipelines/interceptor/maxbytes.md
@@ -16,3 +16,9 @@ Source interceptor which is Built-in and loaded by default.
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
| maxBytes | int | false | | The maximum number of bytes in a single line. The excess part will be discarded. |
+
+## target
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| target | string | optional | body | target field |
diff --git a/docs/reference/pipelines/interceptor/transformer.md b/docs/reference/pipelines/interceptor/transformer.md
index 71ee774..1cae58a 100644
--- a/docs/reference/pipelines/interceptor/transformer.md
+++ b/docs/reference/pipelines/interceptor/transformer.md
@@ -51,6 +51,8 @@ interceptors:
### Common fields
+#### ignoreError
+
- ignoreError: Indicates whether to ignore and print errors during the processing of this action.
!!! example
@@ -63,7 +65,22 @@ interceptors:
ignoreError: true
```
The ignoreError here is set to true, which means that the regular matching error will be ignored, and subsequent actions will continue to be executed.
-
+
+#### dropIfError
+
+Indicates that if an error occurs, the event will be discarded directly.
+
+!!! example
+
+ ```yaml
+ - type: transformer
+ actions:
+ - action: regex(body)
+        pattern: (?<ip>\S+) (?<id>\S+) (?<u>\S+) (?<time>\[.*?\]) (?<url>\".*?\") (?<status>\S+) (?<size>\S+)
+ dropIfError: true
+ ```
+
+ The dropIfError here is set to true, which means that if a regular matching error occurs, the log will be discarded directly (subsequent actions will not be executed).
### add(key, value)
Add additional key:value to the event.
@@ -375,6 +392,49 @@ Extra fields:
}
```
+### grok(key)
+Use grok to split logs and extract fields.
+It can also be grok(key, to).
+
+Parameters:
+
+- key: required, the field to extract with grok
+- to: optional, the key under which all extracted fields are placed. Empty by default, which means fields are extracted to the root
+
+Extra fields:
+
+- match: required, the grok expression
+- ignoreBlank: optional, defaults to true; whether to ignore empty fields. If a parsed field's value is "", no `key: ""` entry is written to the result
+- pattern: optional, custom patterns
+- patternPaths: optional, locations to load patterns from; supports URLs and local paths. For a URL, the response of a GET request is parsed for patterns (for example this [url](https://raw.githubusercontent.com/vjeantet/grok/master/patterns/grok-patterns)); for a local path, if a directory is given, pattern rules are loaded from all files in it
+
+!!! example
+
+ ```yaml
+ - action: grok(body)
+ match: "^%{DATESTAMP:datetime} %{FILE:file}:%{INT:line}: %{IPV4:ip} %{PATH:path} %{UUID:uuid}(?P[a-zA-Z]?)"
+ pattern:
+ FILE: "[a-zA-Z0-9._-]+"
+ ```
+
+ input:
+
+ ```json
+ {
+      "body": "2022/05/28 01:32:01 logTest.go:66: 192.168.0.1 /var/log/test.log 54ce5d87-b94c-c40a-74a7-9cd375289334"
+ }
+ ```
+
+ output:
+
+ ```json
+    {
+      "datetime": "2022/05/28 01:32:01",
+      "line": "66",
+      "ip": "192.168.0.1",
+      "path": "/var/log/test.log",
+      "uuid": "54ce5d87-b94c-c40a-74a7-9cd375289334"
+    }
+ ```
+
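Under the hood, grok is essentially macro expansion into named regex groups. A sketch with a tiny pattern set (including the custom FILE pattern from the example above; the stock pattern bodies here are simplified assumptions):

```python
import re

# A few simplified grok patterns plus the custom FILE pattern from the example.
PATTERNS = {
    "INT": r"[+-]?\d+",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "FILE": r"[a-zA-Z0-9._-]+",
}

def grok_to_regex(expr):
    # Expand every %{NAME:field} reference into a named regex group.
    return re.sub(r"%\{(\w+):(\w+)\}",
                  lambda m: f"(?P<{m.group(2)}>{PATTERNS[m.group(1)]})", expr)

rx = grok_to_regex("%{FILE:file}:%{INT:line}: %{IPV4:ip}")
m = re.match(rx, "logTest.go:66: 192.168.0.1")
print(m.groupdict())  # {'file': 'logTest.go', 'line': '66', 'ip': '192.168.0.1'}
```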
### jsonDecode(key)
Deserialize json text.
Can also be jsonDecode(key, to).
@@ -406,6 +466,86 @@ parameter:
}
```
+### jsonEncode(key)
+
+Serialize multiple fields into json string form.
+
+Can also be jsonEncode(key, to).
+
+parameter:
+
+- key: required, corresponding field key
+- to: optional, the key under which the result is placed. Defaults to empty, meaning the result is placed at the root
+
+!!! example
+
+ ```yaml
+ interceptors:
+ - type: transformer
+ actions:
+ - action: jsonEncode(fields)
+ ```
+
+ input:
+
+    ```json
+    {
+        "body": "this is test",
+        "fields": {
+            "topic": "loggie",
+            "foo": "bar"
+        }
+    }
+    ```
+
+ output:
+
+ ```json
+ {
+ "fields": "{\"topic\":\"loggie\",\"foo\":\"bar\"}",
+ "body": "this is test"
+ }
+ ```
+
+### split(key)
+
+Split a line of log into fields according to a given separator.
+
+parameter:
+
+- key: required, corresponding field key
+- to: optional, the key under which all split fields are placed. Defaults to empty, meaning the fields are placed at the root
+
+Extra fields:
+
+- separator: separator, string, required
+- max: the maximum number of fields obtained after splitting by delimiter, int, optional, default value is -1
+- keys: key corresponding to the field after splitting, string array, required
+
+!!! example
+
+ ```yaml
+ interceptors:
+ - type: transformer
+ actions:
+ - action: split(body)
+ separator: "|"
+ keys: ["time", "order", "service", "price"]
+ ```
+
+ input:
+
+    ```json
+    {
+        "body": "2021-08-08|U12345|storeCenter|13.14"
+    }
+    ```
+
+ output:
+
+    ```json
+    {
+        "time": "2021-08-08",
+        "order": "U12345",
+        "service": "storeCenter",
+        "price": "13.14"
+    }
+    ```
### strconv(key, type)
Value type conversion.
@@ -437,6 +577,41 @@ parameter:
}
```
+### toStr(key, type)
+
+Convert field value to string.
+
+parameter:
+
+- key: target field
+- type: the field type before conversion, one of `bool`, `int`, `float`, `int64`, `float64`. Optional. If the field type is known, it is recommended to fill it in; make sure it is correct, otherwise the conversion may fail. If omitted, the type is determined via reflection, which may reduce collection efficiency.
+
+!!! example
+
+ ```yaml
+ - action: toStr(code, int)
+ ```
+
+ input:
+
+ ```json
+ {
+ "body": "2021-02-16T09:21:20.545525544Z DEBUG this is log body",
+ "code": 200
+ }
+ ```
+
+ output:
+
+ ```json
+ {
+ "body": "2021-02-16T09:21:20.545525544Z DEBUG this is log body",
+ "code": "200"
+ }
+ ```
+
### print()
Print event. Generally used in the debugging phase.
@@ -512,6 +687,11 @@ Whether the value of field is equal to target.
### contain(key, target)
Whether the value of field contains target.
+!!! caution
+
+    Please pass the target string directly, without double quotes.
+    For example, use `contain(body, error)`, not `contain(body, "error")`; the latter matches the literal string `"error"`, quotes included.
+
### exist(key)
Whether the field exists or is empty.
diff --git a/docs/reference/pipelines/sink/dev.md b/docs/reference/pipelines/sink/dev.md
index 8253ab6..bffc9ad 100644
--- a/docs/reference/pipelines/sink/dev.md
+++ b/docs/reference/pipelines/sink/dev.md
@@ -1,6 +1,5 @@
# dev
-
The dev sink prints log data to the console, which can generally be used for debugging or troubleshooting.
After configuring the dev sink, you can set printEvents=true to view the log data sent to the sink in Loggie. In addition to the original logs received or collected by the source, the data generally contains other meta information.
@@ -21,4 +20,35 @@ After configuring the dev sink, you can set printEvents=true to view the log dat
| ---------- | ----------- | ----------- | --------- | -------- |
| printEvents | bool | false | false | Whether to print the collected logs |
-By default, the log of Loggie is printed in json format, and the CMD arguments can be configured `-log.jsonFormat=false` to facilitate viewing the output results on the log of Loggie.
\ No newline at end of file
+By default, Loggie's own log is printed in json format; you can set the CMD argument `-log.jsonFormat=false` to make the output easier to read in Loggie's log.
+
+## printEventsInterval
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| printEventsInterval | time.Duration | Optional |  | Print the collected logs at this interval. If the log volume is large, set an interval such as `10s` and Loggie will print a log only every 10s, avoiding a flood of output that is hard to read |
+
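+For instance, a dev sink that prints at most one event every 10 seconds could be configured as follows (a sketch; the `10s` value is illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: dev
+      printEvents: true
+      printEventsInterval: 10s
+    ```
+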
+## printMetrics
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| printMetrics | bool | Optional |  | Whether to print statistics about the logs sent, including totalCount and qps |
+
+## printMetricsInterval
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| printMetricsInterval | time.Duration | Optional | 1s | The time interval for printing after printMetrics is turned on |
+
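+A sketch combining both options, printing totalCount and qps every 5 seconds (the `5s` value is illustrative, not a default):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: dev
+      printMetrics: true
+      printMetricsInterval: 5s
+    ```
+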
+## resultStatus
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| resultStatus | string | Optional | success | Simulates the processing of logs by the sink sender, including `success`, `fail`, `drop` |
+
+Besides setting it in the configuration, you can also use the HTTP interface to switch the status dynamically, for example changing the sink from `success` to `fail` while Loggie is running.
+
+    ```bash
+    # Please replace <pipelineName> with the actual configured pipeline name
+    curl <ip>:<port>/api/v1/pipeline/<pipelineName>/sink/dev?status=fail
+    ```
diff --git a/docs/reference/pipelines/sink/elasticsearch.md b/docs/reference/pipelines/sink/elasticsearch.md
index 8caa6af..0afdce1 100644
--- a/docs/reference/pipelines/sink/elasticsearch.md
+++ b/docs/reference/pipelines/sink/elasticsearch.md
@@ -11,6 +11,17 @@ Use Elasticsearch sink to send data to Elasticsearch cluster.
index: "log-${fields.service}-${+YYYY.MM.DD}"
```
+!!! caution
+
+ If the elasticsearch version is v6.x, please add the following `etype: _doc` parameter.
+
+ ```yaml
+ sink:
+ type: elasticsearch
+ etype: _doc
+ ...
+ ```
+
## hosts
| `field` | `type` | `required` | `default` | `description` |
@@ -40,27 +51,56 @@ You can use `${a.b}` to obtain fields in the log data, or add `${+YYYY.MM.DD.hh}
| ---------- | ----------- | ----------- | --------- | -------- |
| password | string | false | none | If Elasticsearch is configured with username and password authentication, you need to fill in the requested password. |
-## schema
+## headers
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| headers | map | Optional | None | Request headers added to requests sent to Elasticsearch |
+
+## parameters
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| parameters | map | Optional | None | URL query parameters added to requests sent to Elasticsearch |
+
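+A sketch of both options (the header name, parameter value, hosts and index below are illustrative, not defaults):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: elasticsearch
+      hosts: ["localhost:9200"]
+      index: "log-${fields.service}"
+      headers:
+        X-Custom-Header: "loggie"
+      parameters:
+        pipeline: "my-ingest-pipeline"
+    ```
+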
+## apiKey
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| schema | string | false | http | used for client sniffing |
+| apiKey | string | Optional | | Base64-encoded token used for authorization; if set, overrides username/password and service token |
-## sniff
+## serviceToken
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| sniff | bool | false | false | whether to enable sniffer |
+| serviceToken | string | Optional | | Service token used for authorization; if set, overrides username/password |
-## gzip
+## caCertPath
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| gzip | bool | false | false | whether to enable gzip compression for sending data |
+| caCertPath | string | Optional | | The path where the pem-encoded ca certificate is stored |
+
+## compress
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| compress | bool | optional | false | whether to compress the request body |
## documentId
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ----------- | ----------- | --------- | -------- |
-| documentId | string | false | | The id value sent to elasticsearch, which can be extracted from a field by `${}`. |
+| documentId | string | Optional |  | The document id sent to elasticsearch; you can use `${}` to take the value from a field |
+
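+For example, assuming each event carries an `id` field, it can be used as the document id (a sketch; hosts and index are illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: elasticsearch
+      hosts: ["localhost:9200"]
+      index: "log-${fields.service}"
+      documentId: "${id}"
+    ```
+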
+## opType
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| opType | string | Optional | index | See the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-index_.html#docs-index-api-query-params); if the target is a data stream, it needs to be set to `create` |
+
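+When writing to a data stream, the sink might look like this (a sketch; the index name is illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: elasticsearch
+      hosts: ["localhost:9200"]
+      index: "logs-app-default"
+      opType: create
+    ```
+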
+## sendBufferBytes
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| sendBufferBytes | int | Optional | 131072 | Size in bytes of the buffer used when writing the request body |
diff --git a/docs/reference/pipelines/sink/franzkafka.md b/docs/reference/pipelines/sink/franzkafka.md
new file mode 100644
index 0000000..0b66bb9
--- /dev/null
+++ b/docs/reference/pipelines/sink/franzkafka.md
@@ -0,0 +1,216 @@
+# franz kafka
+
+Use the [franz-go](https://github.com/twmb/franz-go) kafka library to send log data to downstream Kafka, with better support for Kerberos authentication.
+(The difference from the kafka sink is essentially only the kafka golang library used; this sink is provided for users who prefer franz-go.)
+
+!!! example
+
+=== "Simple"
+
+ ```yaml
+ sink:
+ type: franzKafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "log-${fields.topic}"
+ ```
+
+=== "SASL Authentication"
+
+ ```yaml
+ sink:
+ type: franzKafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "demo"
+ sasl:
+ enabled: true
+ mechanism: SCRAM-SHA-512
+ username: ***
+ password: ***
+ ```
+
+## brokers
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| brokers | string array | required | none | Addresses of the Kafka brokers that logs are sent to |
+
+## topic
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| topic | string | optional | loggie | The Kafka topic that logs are sent to |
+
+You can use `${a.b}` to get the field value in the event as the specific topic name.
+
+For example, an event is:
+
+ ```json
+ {
+ "topic": "loggie",
+ "hello": "world"
+ }
+ ```
+
+You can configure `topic: ${topic}`, and the topic sent to Kafka for this event is "loggie".
+
+Nested selection methods are also supported:
+
+ ```json
+ {
+ "fields": {
+ "topic": "loggie"
+ },
+ "hello": "world"
+ }
+ ```
+
+Configuring `topic: ${fields.topic}` will likewise send this event to topic "loggie".
+
+## ifRenderTopicFailed
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| ifRenderTopicFailed |  | Optional |  | When a dynamically rendered topic such as `topic: ${fields.topic}` fails to render (for example, the log has no fields.topic field), the following options decide what to do |
+| ifRenderTopicFailed.dropEvent | bool | Optional | true | Drop the event; this is the default behavior |
+| ifRenderTopicFailed.ignoreError | bool | Optional |  | Ignore the error; note that no error log is printed in this case |
+| ifRenderTopicFailed.defaultTopic | string | Optional |  | Send the event to this default topic instead; once configured, dropEvent no longer takes effect |
+
+!!! example
+
+ === "1"
+
+ ```yaml
+ sink:
+ type: kafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "log-${fields.topic}"
+ ifRenderTopicFailed:
+ dropEvent: true
+ ```
+
+ === "2"
+
+ ```yaml
+ sink:
+ type: kafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "log-${fields.topic}"
+ ifRenderTopicFailed:
+ ignoreError: true
+ defaultTopic: default
+ ```
+
+## ignoreUnknownTopicOrPartition
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| ignoreUnknownTopicOrPartition | bool | Optional |  | Ignore the UNKNOWN_TOPIC_OR_PARTITION error returned by Kafka when the target topic does not exist |
+
+- This usually happens when a dynamically rendered topic is used but automatic topic creation is disabled in Kafka, so logs cannot be sent to the rendered topic. By default, Loggie keeps retrying and cannot send new logs.
+- After ignoreUnknownTopicOrPartition is turned on, such logs are discarded directly, so that logs whose topics do exist can still be sent normally.
+- Note the difference from `ifRenderTopicFailed` above: `ifRenderTopicFailed` handles the case where the topic cannot be rendered or renders to an empty value, while `ignoreUnknownTopicOrPartition` handles the case where rendering succeeds but the topic does not actually exist in Kafka.
+
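+A sketch enabling this behavior together with a dynamically rendered topic (broker address illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: franzKafka
+      brokers: ["127.0.0.1:9092"]
+      topic: "log-${fields.topic}"
+      ignoreUnknownTopicOrPartition: true
+    ```
+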
+## balance
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- |----------------------------------------|
+| balance | string | Optional | roundRobin | Load balancing strategy, you can fill in `roundRobin`, `range`, `sticky`, `cooperativeSticky` |
+
+## compression
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| compression | string | Optional | gzip | Compression strategy for sending logs to Kafka, you can fill in `gzip`, `snappy`, `lz4`, `zstd` |
+
+## batchSize
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| batchSize | int | Optional | 100 | The maximum number of events contained in each batch when sending |
+
+## batchBytes
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| batchBytes | int | Optional | 1048576 | Maximum number of bytes per send request |
+
+## writeTimeout
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| writeTimeout | time.Duration | Optional | 10s | Write timeout |
+
+## tls
+
+| `field` | `type` | `required` | `default` | `description` |
+|------------------------|--------| ----------- | --------- |-----------------------|
+| tls.enabled | bool | optional | false | whether to enable TLS |
+| tls.caCertFiles | string | Optional |  | CA certificate file path |
+| tls.clientCertFile | string | Optional |  | client certificate file path |
+| tls.clientKeyFile | string | Optional |  | client key file path |
+| tls.endpIdentAlgo | bool | Optional |  | whether the client verifies the server's certificate name |
+
+## sasl
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- |------------------------------------------------------------------------------------|
+| sasl | | optional | | SASL authentication |
+| sasl.enabled | bool | optional | false | whether to enable |
+| sasl.mechanism | string | Required | | SASL type, can be: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`, `GSSAPI`|
+| sasl.username | string | required | | username |
+| sasl.password | string | required | | password |
+
+## gssapi
+
+| `field` | `type` | `required` | `default` | `description` |
+|--------------------------------|--------| ----------- | --------- |--------------------------------------|
+| sasl.gssapi |  | Optional |  | GSSAPI (Kerberos) authentication |
+| sasl.gssapi.authType | int | Required |  | authentication type: 1 for username/password, 2 for keytab |
+| sasl.gssapi.keyTabPath | string | Required |  | keytab file path |
+| sasl.gssapi.kerberosConfigPath | string | Required |  | kerberos config file path |
+| sasl.gssapi.serviceName | string | Required |  | service name |
+| sasl.gssapi.userName | string | Required |  | username |
+| sasl.gssapi.password | string | Required |  | password |
+| sasl.gssapi.realm | string | Required |  | kerberos realm |
+| sasl.gssapi.disablePAFXFAST | bool | Optional |  | whether to configure the client not to use PA_FX_FAST |
+
+## security
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |--------| ----------- | ------ |-------------------------------------|
+| security | string | Optional |  | Security authentication settings in Java properties format, automatically converted into a format adapted to franz-go |
+
+!!! example
+ ```
+ pipelines:
+ - name: local
+ sources:
+ - type: file
+ name: demo
+ paths:
+ - /tmp/log/*.log
+ sink:
+ type: franzKafka
+ brokers:
+ - "hadoop74.axrzpt.com:9092"
+ topic: loggie
+ writeTimeout: 5s
+ sasl:
+ gssapi:
+ kerberosConfigPath: /etc/krb5-conf/krb5.conf
+ security:
+ security.protocol: "SASL_PLAINTEXT"
+ sasl.mechanism: "GSSAPI"
+ sasl.jaas.config: "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true debug=true keyTab=\"/shylock/kerberos/zork.keytab\" principal=\"zork@AXRZPT.COM\";"
+ sasl.kerberos.service.name: "kafka"
+ ```
+
+To mount the keytab binary certificate on Kubernetes, please refer to [official documentation](https://kubernetes.io/zh-cn/docs/concepts/configuration/secret/).
+
+## partitionKey
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |--------| ----------- |-------|-----------------|
+| partitionKey | string | Optional | None | Controls which partition of the topic events are sent to |
+
+Similar to topic, you can use `${a.b}` to take a field value from the event as the partition key.
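+
+For example, assuming events carry a `service` field, logs of the same service can be routed to the same partition (a sketch; broker address illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: franzKafka
+      brokers: ["127.0.0.1:9092"]
+      topic: loggie
+      partitionKey: "${service}"
+    ```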
diff --git a/docs/reference/pipelines/sink/kafka.md b/docs/reference/pipelines/sink/kafka.md
index daf4c19..5ac4c23 100644
--- a/docs/reference/pipelines/sink/kafka.md
+++ b/docs/reference/pipelines/sink/kafka.md
@@ -4,12 +4,30 @@ Use sink kafka to send log data to downstream Kafka.
!!! example
- ```yaml
- sink:
- type: kafka
- brokers: ["127.0.0.1:6400"]
- topic: "log-${fields.topic}"
- ```
+ === "SIMPLE"
+ ```yaml
+ sink:
+ type: kafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "log-${fields.topic}"
+ ```
+
+    === "SASL Authentication"
+ ```yaml
+ sink:
+ type: kafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "demo"
+ sasl:
+ type: scram
+ username: ***
+ password: ***
+ algorithm: sha512
+ ```
+
+!!! note "Supported Kafka versions"
+
+    The kafka sink uses the [segmentio/kafka-go](https://github.com/segmentio/kafka-go) library. The current version used by Loggie is `v0.4.39`, and the Kafka versions tested for that release are [0.10.1.0 - 2.7.1](https://github.com/segmentio/kafka-go/tree/v0.4.39#kafka-versions)
## brokers
@@ -47,6 +65,49 @@ Also nested selection is supported:
```
Configure `topic: ${fields.topic}`, and the topic of Kafka is "loggie".
+## ifRenderTopicFailed
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| ifRenderTopicFailed |  | Optional |  | When a dynamically rendered topic such as `topic: ${fields.topic}` fails to render (for example, the log has no fields.topic field), the following options decide what to do |
+| ifRenderTopicFailed.dropEvent | bool | Optional | true | Drop the event; this is the default behavior |
+| ifRenderTopicFailed.ignoreError | bool | Optional |  | Ignore the error; note that no error log is printed in this case |
+| ifRenderTopicFailed.defaultTopic | string | Optional |  | Send the event to this default topic instead; once configured, dropEvent no longer takes effect |
+
+!!! example
+
+ === "1"
+
+ ```yaml
+ sink:
+ type: kafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "log-${fields.topic}"
+ ifRenderTopicFailed:
+ dropEvent: true
+ ```
+
+ === "2"
+
+ ```yaml
+ sink:
+ type: kafka
+ brokers: ["127.0.0.1:6400"]
+ topic: "log-${fields.topic}"
+ ifRenderTopicFailed:
+ ignoreError: true
+ defaultTopic: default
+ ```
+
+## ignoreUnknownTopicOrPartition
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| ignoreUnknownTopicOrPartition | bool | Optional |  | Ignore the UNKNOWN_TOPIC_OR_PARTITION error returned by Kafka when the target topic does not exist |
+
+- This usually happens when a dynamically rendered topic is used but automatic topic creation is disabled in Kafka, so logs cannot be sent to the rendered topic. By default, Loggie keeps retrying and cannot send new logs.
+- After ignoreUnknownTopicOrPartition is turned on, such logs are discarded directly, so that logs whose topics do exist can still be sent normally.
+- Note the difference from `ifRenderTopicFailed` above: `ifRenderTopicFailed` handles the case where the topic cannot be rendered or renders to an empty value, while `ignoreUnknownTopicOrPartition` handles the case where rendering succeeds but the topic does not actually exist in Kafka.
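+
+A sketch enabling this behavior together with a dynamically rendered topic (broker address illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: kafka
+      brokers: ["127.0.0.1:9092"]
+      topic: "log-${fields.topic}"
+      ignoreUnknownTopicOrPartition: true
+    ```
+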
## balance
diff --git a/docs/reference/pipelines/sink/loki.md b/docs/reference/pipelines/sink/loki.md
index da43d3a..c740078 100644
--- a/docs/reference/pipelines/sink/loki.md
+++ b/docs/reference/pipelines/sink/loki.md
@@ -37,4 +37,10 @@ loki sink is used to send data to Loki storage. Loki documentation can be found
Loki's log data structure is roughly divided into label and main data. By default, Loggie will convert the meta-information field in the header into a label connected with `_`.
-In addition, it should be noted that since loki's labels key does not support `.`, `/`, `-`, the keys containing these symbols in the header will be automatically converted into `_` form.
\ No newline at end of file
+In addition, it should be noted that since loki's labels key does not support `.`, `/`, `-`, the keys containing these symbols in the header will be automatically converted into `_` form.
+
+## insecureSkipVerify
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| insecureSkipVerify | bool | optional | false | whether to ignore certificate authentication |
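+
+A sketch for an https Loki endpoint with a self-signed certificate (the url value is illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: loki
+      url: "https://localhost:3100/loki/api/v1/push"
+      insecureSkipVerify: true
+    ```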
diff --git a/docs/reference/pipelines/sink/overview.md b/docs/reference/pipelines/sink/overview.md
index 5ee3e39..eca6cc7 100644
--- a/docs/reference/pipelines/sink/overview.md
+++ b/docs/reference/pipelines/sink/overview.md
@@ -2,6 +2,8 @@
A Pipeline has a Sink.
+For concurrency-related configuration, please refer to [Adaptive sink flow control](../../../user-guide/best-practice/concurrency.md).
+
## Sink Common Configuration
!!! example
@@ -12,7 +14,24 @@ A Pipeline has a Sink.
codec:
type: json
pretty: true
-
+ parallelism: 16
+ concurrency:
+ enabled: true
+ rtt:
+ blockJudgeThreshold: 120%
+ newRttWeigh: 0.4
+ goroutine:
+ initThreshold: 8
+ maxGoroutine: 20
+ unstableTolerate: 3
+ channelLenOfCap: 0.4
+ ratio:
+ multi: 2
+ linear: 2
+ linearWhenBlocked: 4
+ duration:
+ unstable: 15
+ stable: 30
```
### parallelism
@@ -46,4 +65,75 @@ Used to send the collected raw body data.
type: dev
codec:
type: raw
- ```
\ No newline at end of file
+ ```
+
+#### printEvents
+
+Whether to print events to Loggie's own log. This can be used for temporary
+troubleshooting, without having to change the current sink type to dev, which
+is more convenient.
+
+!!! example
+
+ ```yaml
+ sink:
+ type: kafka
+ ...
+ codec:
+ printEvents: true
+ ```
+
+### concurrency
+
+!!! example
+
+ ```yaml
+ sink:
+ type: kafka
+ concurrency:
+ enabled: true
+ ```
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| enabled | bool | optional | false | whether to enable sink adaptive concurrency control |
+
+Note: This feature is not enabled by default.
+
+#### concurrency.goroutine
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| initThreshold | int | Optional | 16 | Initial threshold: below this value the number of goroutines grows exponentially (fast startup phase); above it, growth is linear |
+| maxGoroutine | int | Optional | 30 | Maximum number of goroutines |
+| unstableTolerate | int | Optional | 3 | Tolerance for fluctuations after entering the stable phase. A signal to reduce the number of goroutines (rtt rises or a request fails) or to increase it (rtt is stable and the channel is saturated) must occur this many consecutive times before the goroutine count actually changes; each further occurrence then keeps adjusting it, until the opposite signal triggers a change |
+| channelLenOfCap | float | Optional | 0.4 | Channel saturation threshold; when exceeded, the number of goroutines needs to be increased. Only evaluated while rtt is stable |
+
+#### concurrency.rtt
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| blockJudgeThreshold | string | Optional | 120% | Threshold for judging rtt growth. When a new rtt exceeds the current average rtt by this margin, the number of goroutines is considered to need reducing |
+| newRttWeigh | float | Optional | 0.5 | The weight of the new rtt when calculating the new average rtt |
+
+Note: blockJudgeThreshold (b) accepts either a percentage or a floating point number.
+
+If it is a percentage, the check is (new rtt / average rtt) > b.
+
+If it is a floating point number, the check is (new rtt - average rtt) > b.
+
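+The two forms can be configured as follows (a sketch; the float value is illustrative):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: kafka
+      concurrency:
+        enabled: true
+        rtt:
+          blockJudgeThreshold: 120%    # percentage form: new rtt / average rtt > 1.2
+          # blockJudgeThreshold: "0.5" # float form: new rtt - average rtt > 0.5
+    ```
+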
+#### concurrency.ratio
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| multi | int | Optional | 2 | Exponential growth factor of the goroutine count during the fast startup phase |
+| linear | int | optional | 2 | Linear growth or decrease rate of the goroutine count (after fast startup) |
+| linearWhenBlocked | int | Optional | 4 | Linear growth rate of the goroutine count when the channel is full (upstream is blocked) |
+
+#### concurrency.duration
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| unstable | int | Optional | 15 | In the non-stable phase, the interval in seconds for collecting data and recalculating the goroutine count |
+| stable | int | Optional | 30 | In the stable phase, the interval in seconds for collecting data and recalculating the goroutine count |
diff --git a/docs/reference/pipelines/sink/pulsar.md b/docs/reference/pipelines/sink/pulsar.md
new file mode 100644
index 0000000..d635769
--- /dev/null
+++ b/docs/reference/pipelines/sink/pulsar.md
@@ -0,0 +1,145 @@
+# pulsar
+
+The pulsar sink is used to send data to [pulsar](https://github.com/apache/pulsar) storage.
+This sink is in beta; please use it in production with caution.
+
+!!! example
+
+ ```yaml
+ sink:
+ type: pulsar
+ url: pulsar://localhost:6650
+ topic: my-topic
+ ```
+
+## url
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| url | string | Required | None | Pulsar connection address that logs are sent to |
+
+## topic
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| topic | string | Required | None | The pulsar topic that logs are sent to |
+
+## producerName
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| producerName | string | Optional | None | specifies a name for the producer |
+
+## properties
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| properties | map | Optional | none | Properties specifies a set of application defined properties for the producer |
+
+## operationTimeoutSeconds
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| operationTimeoutSeconds | time.Duration | Optional | 30s | Producer create, subscribe and unsubscribe operations are retried until this interval elapses, after which the operation is marked as failed |
+
+## connectionTimeout
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| connectionTimeout| time.Duration| Optional | 5s | Timeout for the establishment of a TCP connection |
+
+## sendTimeout
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| sendTimeout | time.Duration | Optional | 30s | SendTimeout sets the timeout for a message that is not acknowledged by the server |
+
+## maxPendingMessages
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| maxPendingMessages | int | Optional | none | MaxPendingMessages specifies the max size of the queue holding messages pending an acknowledgment from the broker |
+
+## hashingSchema
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| hashingSchema | int | Optional | 0 | HashingScheme is used to define the partition where a particular message is published. 0: JavaStringHash, 1: Murmur3_32Hash |
+
+## compressionType
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| compressionType | int| Optional | 0 | 0:NoCompression, 1:LZ4, 2:ZLIB, 3:ZSTD |
+
+## compressionLevel
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| compressionLevel | int| Optional | 0 | 0:Default, 1:Faster, 2:Better |
+
+## logLevel
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| logLevel | string | Optional | info | Log level of the pulsar client: `info`, `debug`, `error` |
+
+## batchingMaxSize
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| batchingMaxSize| int | Optional | 2048(KB) | BatchingMaxSize specifies the maximum number of bytes permitted in a batch |
+
+## batchingMaxMessages
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| batchingMaxMessages| int | Optional | 1000 |BatchingMaxMessages specifies the maximum number of messages permitted in a batch |
+
+## batchingMaxPublishDelay
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| batchingMaxPublishDelay| time.Duration | Optional | 10ms | BatchingMaxPublishDelay specifies the time period within which the messages sent will be batched |
+
+## useTLS
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| useTLS| bool | Optional | false | Whether to use TLS authentication |
+
+## tlsTrustCertsFilePath
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| tlsTrustCertsFilePath| string | Optional | none | the path to the trusted TLS certificate file |
+
+## tlsAllowInsecureConnection
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| tlsAllowInsecureConnection| bool| Optional | false | Configure whether the Pulsar client accept untrusted TLS certificate from broker |
+
+## certificatePath
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| certificatePath| string | Optional | none | TLS certificate path |
+
+## privateKeyPath
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| privateKeyPath| string | Optional | none | TLS privateKey path |
+
+## token
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| token | string | Optional | none | Token used when authenticating to Pulsar with token authentication |
+
+## tokenFilePath
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| tokenFilePath | string | Optional | none | Path to a file containing the auth token |
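+The TLS and token options above can be combined. A hedged sketch of a TLS-enabled sink with token authentication (the `url`, `topic`, and file paths are placeholder assumptions):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: pulsar
+      url: pulsar+ssl://localhost:6651   # assumed broker address
+      topic: log-topic                   # assumed topic name
+      useTLS: true
+      tlsTrustCertsFilePath: /etc/pulsar/certs/ca.pem        # placeholder path
+      certificatePath: /etc/pulsar/certs/client-cert.pem     # placeholder path
+      privateKeyPath: /etc/pulsar/certs/client-key.pem       # placeholder path
+      tokenFilePath: /etc/pulsar/token   # or set `token` inline instead
+    ```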
diff --git a/docs/reference/pipelines/sink/rocketmq.md b/docs/reference/pipelines/sink/rocketmq.md
new file mode 100644
index 0000000..e96cc4a
--- /dev/null
+++ b/docs/reference/pipelines/sink/rocketmq.md
@@ -0,0 +1,164 @@
+# Rocketmq
+
+rocketmq sink can send log data to downstream [RocketMQ](https://github.com/apache/rocketmq).
+
+!!! example
+
+ ```yaml
+ sink:
+ type: rocketmq
+ nameServer:
+ - 127.0.0.1:9876
+ topic: "log-${fields.topic}"
+ ```
+
+## nameServer
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |-----------|--------| --------- |-----------------------------|
+| nameServer | string array | Optional | None | RocketMQ nameserver addr list |
+
+## nsResolver
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |-----------| ----------- |-------|-------------------------------------|
+| nsResolver | string array | Optional | None | Resolver list used to resolve the nameserver addresses |
+
+> Note: nameServer and nsResolver are mutually exclusive; exactly one of them must be set.
+
+## topic
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ---------- |--------|------------------------|
+| topic | string | required | loggie | topic for sending logs to RocketMQ |
+
+You can use `${a.b}` to get the field value in the event as the specific topic name.
+
+For example, an event is:
+
+ ```json
+ {
+ "topic": "loggie",
+ "hello": "world"
+ }
+ ```
+
+You can configure `topic: ${topic}`, in which case the topic sent to RocketMQ for this event is "loggie".
+
+Nested selection methods are also supported:
+
+ ```json
+ {
+ "fields": {
+ "topic": "loggie"
+ },
+ "hello": "world"
+ }
+ ```
+
+With `topic: ${fields.topic}` configured, the event will likewise be sent to the topic "loggie".
+
+## ifRenderTopicFailed
+
+| `field` | `type` | `required` | `default` | `description` |
+|----------------------|--------| ----------- |-------|-----------------------|
+| ifRenderTopicFailed | object | Optional | | What to do when dynamic topic rendering fails |
+| ifRenderTopicFailed.dropEvent | bool | Optional | true | Whether to drop this message |
+| ifRenderTopicFailed.ignoreError | bool | Optional | false | Whether to ignore errors |
+| ifRenderTopicFailed.defaultValue | string | Optional | None | The topic value of the default configuration used when rendering fails |
+
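+For example, to fall back to a fixed topic instead of dropping events when the rendered field is missing, a sketch based on the options above:
+
+!!! example
+
+    ```yaml
+    sink:
+      type: rocketmq
+      nameServer:
+        - 127.0.0.1:9876
+      topic: "log-${fields.topic}"
+      ifRenderTopicFailed:
+        dropEvent: false
+        defaultValue: loggie
+    ```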
+## group
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- |---------------------------------------------|
+| group | string | Optional | DEFAULT_PRODUCER | A collection of Producers of the same type, which send the same type of messages and have consistent sending logic |
+
+## tag
+
+| `field` | `type` | `required` | `default` | `description` |
+|------| ----------- | ----------- |-------|-------------------------------------------------------------|
+| tag | string | Optional | None | A flag set on the message to distinguish different message types under the same topic; it can be understood as a secondary message type that further subdivides the messages within a topic |
+
+RocketMQ has support for tag capabilities. Therefore, similar to topic, tag can also use `${a.b}` to obtain the field value in the event as the specific tag name.
+
+For example, an event is:
+
+ ```json
+ {
+ "tag": "loggie",
+ "hello": "world"
+ }
+ ```
+
+You can configure `tag: ${tag}`, in which case the tag of the RocketMQ message is "loggie".
+
+Nested selection methods are also supported:
+
+ ```json
+ {
+ "fields": {
+ "tag": "loggie"
+ },
+ "hello": "world"
+ }
+ ```
+
+With `tag: ${fields.tag}` configured, the message tag will likewise be set to "loggie".
+
+## ifRenderTagFailed
+
+| `field` | `type` | `required` | `default` | `description` |
+|----------------------|--------| ----------- |-------|---------------------|
+| ifRenderTagFailed | object | Optional | | What to do when dynamic tag rendering fails |
+| ifRenderTagFailed.dropEvent | bool | Optional | true | Whether to drop this message |
+| ifRenderTagFailed.ignoreError | bool | Optional | false | Whether to ignore errors |
+| ifRenderTagFailed.defaultValue | string | Optional | None | The tag value of the default configuration used when rendering fails |
+
+> Note: Because tag is a non-required parameter and is empty by default, the ifRenderTagFailed parameter will only take effect when tag is configured and rendering fails.
+
+## messageKeys
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |-----------| ----------- |-------|----------------------------------------------------------|
+| messageKeys | string array | Optional | None | Business keys of the message, set by the producer to uniquely identify a piece of business logic, for example an order ID |
+
+## retry
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- |-------|------|
+| retry | int | optional | 2 | number of retries |
+
+## sendMsgTimeout
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- |-------|-----------|
+| sendMsgTimeout | time.Duration | Optional | 3s | Timeout for message sending |
+
+## compressLevel
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |------| ----------- |-------|---------------|
+| compressLevel | int | Optional | 5 | Compression level of the message, value range: 0-9 |
+
+## compressMsgBodyOverHowmuch
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |------| ----------- | --------- | -------- |
+| compressMsgBodyOverHowmuch | int | Optional | 4096 | Message body size threshold, in bytes, above which the body is compressed (consumers decompress automatically on receipt) |
+
+## topicQueueNums
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |------| ----------- |-------| -------- |
+| topicQueueNums | int | Optional | 4 | Number of queues created by default when a topic that does not exist on the server is automatically created during sending |
+
+## credentials
+
+> Credentials are used in scenarios where RocketMQ enables ACL permission control.
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- |-------|--------|-------|------------------------------------|
+| credentials | object | Optional | None | Credential information used by the client for authentication; only required when the server enables identity authentication |
+| credentials.accessKey | string | required | none | accessKey |
+| credentials.secretKey | string | required | none | secretKey |
+| credentials.securityToken | string | Optional | None | securityToken |
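+A hedged sketch of a sink configuration for an ACL-enabled RocketMQ cluster (the key values are placeholders):
+
+!!! example
+
+    ```yaml
+    sink:
+      type: rocketmq
+      nameServer:
+        - 127.0.0.1:9876
+      topic: loggie
+      credentials:
+        accessKey: my-access-key   # placeholder
+        secretKey: my-secret-key   # placeholder
+    ```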
diff --git a/docs/reference/pipelines/sink/webhook.md b/docs/reference/pipelines/sink/webhook.md
new file mode 100644
index 0000000..564385c
--- /dev/null
+++ b/docs/reference/pipelines/sink/webhook.md
@@ -0,0 +1,28 @@
+# alertWebhook
+
+alertWebhook sink sends log data to an HTTP receiver.
+For usage examples, please refer to [Log Alarm](../../../user-guide/monitor/service-log-alarm.md)
+
+!!! example
+
+ ```yaml
+ sink:
+ type: alertWebhook
+ addr: http://localhost:8080/loggie
+ headers:
+ api: test1
+ lineLimit: 10
+ template: |
+ ******
+ ```
+
+## webhook
+
+| `field` | `type` | `required` | `default` | `description` |
+|-----------|---------------|--------|-------|-----------------------------------------------|
+| addr | string | Optional | | http address to send alert, if empty, it will not be sent |
+| template | string | optional | | template used for rendering |
+| timeout | time.Duration | optional | 30s | http timeout for sending alert |
+| headers | map | optional | | http header to send alert |
+| method | string | Optional | POST | HTTP method used to send the alert; any value other than PUT (case-insensitive) is treated as POST |
+| lineLimit | int | Optional | 10 | In the case of multi-line log collection, the maximum number of log lines included in each alert |
diff --git a/docs/reference/pipelines/sink/zinc.md b/docs/reference/pipelines/sink/zinc.md
new file mode 100644
index 0000000..370fe91
--- /dev/null
+++ b/docs/reference/pipelines/sink/zinc.md
@@ -0,0 +1,44 @@
+# zinc
+
+zinc sink is used to send data to [zinc](https://github.com/zinclabs/zinc) storage.
+
+!!! example
+
+ ```yaml
+ sink:
+ type: zinc
+ host: "http://127.0.0.1:4080"
+ username: admin
+ password: Complexpass#123
+ index: "demo"
+ ```
+
+## host
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| host | string | optional | `http://127.0.0.1:4080` | zinc's url address |
+
+## username
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| username | string | Optional | | Username sent to zinc |
+
+## password
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| password | string | optional | | Password sent to zinc |
+
+## index
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| index | string | optional | default | index sent to zinc |
+
+## skipSSLVerify
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| skipSSLVerify | bool | optional | true | whether to ignore SSL verification |
diff --git a/docs/reference/pipelines/source/elasticsearch.md b/docs/reference/pipelines/source/elasticsearch.md
new file mode 100644
index 0000000..37d9925
--- /dev/null
+++ b/docs/reference/pipelines/source/elasticsearch.md
@@ -0,0 +1,156 @@
+# elasticsearch
+
+Consume elasticsearch data.
+
+!!! example
+
+ === "Normal"
+
+ ```yaml
+ pipelines:
+ - name: local
+ sources:
+ - type: elasticsearch
+ name: elastic
+ hosts: ["localhost:9200"]
+ indices: ["blog*"]
+ size: 10 # data size per fetch
+ interval: 30s # pull data frequency
+ ```
+
+
+ === "Advanced"
+
+ ```yaml
+ pipelines:
+ - name: local
+ sources:
+ - type: elasticsearch
+ name: elastic
+ hosts:
+ - "localhost:9200"
+ - "localhost:9201"
+ indices: ["blog*"]
+ username: "bob"
+ password: "bob"
+ schema: ""
+ sniff: false
+ gzip: true
+ includeFields: # pull selected field
+ - Title
+ - Content
+ - Author
+ excludeFields: # exclude selected field
+ - Content
+ query: | # elastic query phrases
+ {
+ "match": {"Title": "bob"}
+ }
+ size: 10 # data size per fetch
+ interval: 30s # pull data frequency
+ timeout: 5s # pull timeout
+ db:
+          flushTimeout: 2s # how often to persist the elasticsearch pull position
+          cleanInactiveTimeout: 24h # delete db records that have been inactive for this long
+          cleanScanInterval: 1h # how often to scan for expired db records
+ ```
+
+## hosts
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| hosts | string array | required | | consumed elasticsearch url address |
+
+## indices
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| indices | string array | required | | Names of the elasticsearch indices to query |
+
+## username
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| username | string | Optional | | Username for consuming elasticsearch |
+
+## password
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| password | string | Optional | | Password for consuming elasticsearch |
+
+## schema
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| schema | string | Optional | http | HTTP scheme (http/https), used for sniff |
+
+## gzip
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| gzip | bool | optional | false | whether to enable gzip compression |
+
+## includeFields
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| includeFields | string array | Optional | | Only return the specified _source field |
+
+## excludeFields
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| excludeFields | string array | optional | | exclude the specified _source field |
+
+## query
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| query | string | Optional | | Expression to query elasticsearch |
+
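+The query value is passed to elasticsearch as a Query DSL fragment. A sketch combining a match with a time range (the field names are illustrative assumptions):
+
+!!! example
+
+    ```yaml
+    sources:
+      - type: elasticsearch
+        name: elastic
+        hosts: ["localhost:9200"]
+        indices: ["blog*"]
+        query: |
+          {
+            "bool": {
+              "must": [
+                { "match": { "Author": "bob" } },
+                { "range": { "@timestamp": { "gte": "now-1d" } } }
+              ]
+            }
+          }
+    ```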
+## size
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| size | int | Optional | 100 | The number of hits returned for each request |
+
+## sortBy
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| sortBy | array | required | | sort used for the query |
+| sortBy.fields | string | required | | field to sort by |
+| sortBy.ascending | bool | optional | true | whether to sort in ascending order |
+
+!!! example
+
+ ```yml
+ sortBy:
+ - fields: "@timestamp"
+ ascending: true
+ - fields: "_id"
+ ascending: true
+ ```
+
+## interval
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| interval | time.Duration | Optional | 30s | Time interval for scheduled elasticsearch requests |
+
+## timeout
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| timeout | time.Duration | Optional | 5s | Request timeout |
+
+## db
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| db | | Optional | | Persists the progress of elasticsearch queries; the records are stored in elasticsearch itself so that data is not consumed repeatedly after Loggie restarts |
+| db.indexPrefix | string | Optional | .loggie-db | By default, Loggie periodically writes the persisted progress to an index named `${indexPrefix}-${pipelineName}-${sourceName}` |
+| db.flushTimeout | time.Duration | Optional | 2s | The interval between persistent data writing |
+| db.cleanInactiveTimeout | time.Duration | Optional | 504h (21day) | Timeout for cleaning expired persistent data |
+| db.cleanScanInterval | time.Duration | Optional | 1h | Check expiration interval |
diff --git a/docs/reference/pipelines/source/file.md b/docs/reference/pipelines/source/file.md
index 16c127c..a4aa628 100644
--- a/docs/reference/pipelines/source/file.md
+++ b/docs/reference/pipelines/source/file.md
@@ -60,12 +60,6 @@ file source is used for log collection.
| ----------- | ------------- | ---------- | -------- | ----------------------------------------------------- |
| ignoreOlder | time.Duration | false | none | for example, 48h, which means to ignore files whose update time is 2 days ago |
-## ignoreSymlink
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------------- | ------ | ---------- | -------- | -------------------------------- |
-| ignoreSymlink | bool | false | false | whether to ignore symbolic links (soft links) files|
-
## addonMeta
@@ -100,6 +94,97 @@ state explanation:
- bytes: the number of bytes of data collected
- hostname: the name of the node where it is located
+## multi
+
+Multi-line collection related configurations
+
+!!! example
+
+ ```yaml
+ sources:
+ - type: file
+ name: accesslog
+ multi:
+ active: true
+ pattern: '^\d{4}-\d{2}-\d{2}'
+ ```
+
+### active
+
+| `field` | `type` | `required` | `default` | `description` |
+| ------ | ------ | ---------- | -------- | -------------------- |
+| active | bool | false | false | Whether to enable multi-line collection mode |
+
+### pattern
+
+| `field` | `type` | `required` | `default` | `description` |
+| ------- | ------ | ----------------------------- | -------- | ------------------------------------------------------------ |
+| pattern | string | Required when multi.active=true | none | A regular expression that marks the start of a new log. For example, with `'^\['` configured, a line beginning with `[` is treated as the start of a new log; otherwise the line is merged into the previous log as part of it. |
+
+!!! example
+
+ Suppose a multi-line log looks like this:
+
+ ```
+ 2023-05-11 14:30:15 ERROR Exception in thread "main" java.lang.NullPointerException
+ at com.example.MyClass.myMethod(MyClass.java:25)
+ at com.example.MyClass.main(MyClass.java:10)
+ ```
+    Configuring the pattern as `^\d{4}-\d{2}-\d{2}` merges these lines into a single log event, so multi-line exception stacks like the one above are no longer split apart when the logs are queried.
+
+### maxLines
+
+| `field` | `type` | `required` | `default` | `description` |
+| -------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
+| maxLines | int | false | 500 | Maximum number of lines a single log may contain. The default is 500. If the limit is exceeded, the current log is sent immediately and the excess is treated as a new log. |
+
+### maxBytes
+
+| `field` | `type` | `required` | `default` | `description` |
+| -------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
+| maxBytes | int64 | false | 131072 | Maximum number of bytes a single log may contain. The default is 128 KB. If the limit is exceeded, the current log is sent immediately and the excess is treated as a new log. |
+
+### timeout
+
+| `field` | `type` | `required` | `default` | `description` |
+| ------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
+| timeout | time.Duration | false | 5s | Maximum time to wait for a complete multi-line log to be assembled. The default is 5s. If the limit is exceeded, the current log is sent immediately and subsequent content is treated as a new log. |
+
+## readFromTail
+
+| `field` | `type` | `required` | `default` | `description` |
+| ------------ | ------ | ---------- | -------- | ------------------------------------------------------------ |
+| readFromTail | bool | false | false | Whether to start collecting from the latest line of the file, regardless of the content written to the file historically. Suitable for scenarios such as migration of collection systems |
+
+## cleanFiles
+
+Clean up file related configurations. Expired files that have been collected will be deleted directly from the disk to free up disk space.
+
+### maxHistoryDays
+
+| `field` | `type` | `required` | `default` | `description` |
+| -------------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
+| maxHistoryDays | int | false | none | The maximum number of days that files (after collection is completed) can be retained. If the limit is exceeded, the file will be deleted directly from the disk. If not configured, the file will never be deleted |
+
+### cleanUnfinished
+
+| `field` | `type` | `required` | `default` | `description` |
+| -------------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
+| cleanUnfinished | bool | false | false | Whether to clean up files even if their collection has not finished |
+
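+For example, to delete collected files after a week while leaving unfinished files alone, a sketch using the fields above:
+
+!!! example
+
+    ```yaml
+    sources:
+      - type: file
+        name: accesslog
+        cleanFiles:
+          maxHistoryDays: 7
+          cleanUnfinished: false
+    ```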
+## fdHoldTimeoutWhenInactive
+
+| `field` | `type` | `required` | `default` | `description` |
+| ------------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
+| fdHoldTimeoutWhenInactive | time.Duration | false | 5m | If a file has gone longer than this since it was last collected from (it has not been written to for a long time, so more content is unlikely), its file handle is released to free system resources. |
+
+## fdHoldTimeoutWhenRemove
+
+| `field` | `type` | `required` | `default` | `description` |
+| ----------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
+| fdHoldTimeoutWhenRemove | time.Duration | false | 5m | When a file is deleted before its collection has finished, Loggie waits at most this long for collection to complete. Once the limit is exceeded, the file handle is released and the file is no longer collected, whether or not collection finished. |
+
## workerCount
| `field` | `type` | `required` | `default` | `description` |
@@ -131,6 +216,12 @@ state explanation:
| --------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| inactiveTimeout | time.Duration | false | 3s | If the file has exceeded inactiveTimeout from the last collection, it is considered that the file has entered an inactive state (that is, the last log has been written), and that the last line of log can be collected safely. |
+## ignoreSymlink
+
+| `field` | `type` | `required` | `default` | `description` |
+| ------------- | ------ | ---------- | -------- | -------------------------------- |
+| ignoreSymlink | bool | false | false | whether to ignore symbolic link (soft link) files |
+
## firstNBytesForIdentifier
| `field` | `type` | `required` | `default` | `description` |
@@ -280,53 +371,6 @@ The corresponding newline symbols are:
| --------------- |--------| ---------- |-------|-------|
| charset | string | false | utf-8 | newline symbol encoding |
-
-## multi
-
-Multi-line collection configuration
-
-!!! example
-
- ```yaml
- sources:
- - type: file
- name: accesslog
- multi:
- active: true
- ```
-
-### active
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------ | ------ | ---------- | -------- | -------------------- |
-| active | bool | false | false | whether to enable multi-line |
-
-### pattern
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------- | ------ | ----------------------------- | -------- | ------------------------------------------------------------ |
-| pattern | string | required when multi.active=true | false | A regular expression that is used to judge whether a line is a brand new log. For example, if it is configured as '^\[', it is considered that a line beginning with `[` is a new log, otherwise the content of this line is merged into the previous log as part of the previous log. |
-
-
-#### maxLines
-
-| `field` | `type` | `required` | `default` | `description` |
-| -------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
-| maxLines | int | false | 500 | Number of lines a log can contains at most. The default is 500 lines. If the upper limit is exceeded, the current log will be forced to be sent, and the excess will be used as a new log. |
-
-
-#### maxBytes
-
-| `field` | `type` | `required` | `default` | `description` |
-| -------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
-| maxBytes | int64 | false | 131072 | Number of bytes a log can contains at most. The default is 128K. If the upper limit is exceeded, the current log will be forced to be sent, and the excess will be used as a new log. |
-
-#### timeout
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
-| timeout | time.Duration | false | 5s | How long to wait for a log to be collected as a complete log. The default is 5s. If the upper limit is exceeded, the current log will be sent, and the excess will be used as a new log. |
-
## ack
Configuration related to the confirmation of the source. If you need to make sure `at least once`, you need to turn on the ack mechanism, but there will be a certain performance loss.
@@ -356,59 +400,6 @@ Configuration related to the confirmation of the source. If you need to make sur
| ------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| maintenanceInterval | time.Duration | false | 20h | maintenance cycle. Used to regularly clean up expired confirmation data (such as the ack information of files that are no longer collected) |
-## db
-
-Use `sqlite3` as database. Save the file name, file inode, offset of file collection and other information during the collection process. Used to restore the last collection progress after logie reload or restart.
-
-!!! caution
- This configuration can only be configured in defaults.
-
-!!! example
-
- ```yaml
- defaults:
- sources:
- - type: file
- db:
- file: "./data/loggie.db"
- ```
-
-### file
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------ | ------ | ---------- | ---------------- | -------------- |
-| file | string | false | ./data/loggie.db | database file path |
-
-### tableName
-
-| `field` | `type` | `required` | `default` | `description` |
-| --------- | ------ | ---------- | -------- | ------------ |
-| tableName | string | false | registry | database table name |
-
-### flushTimeout
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------------ | ------------- | ---------- | -------- | -------------------------- |
-| flushTimeout | time.Duration | false | 2s | write the collected information to the database regularly |
-
-### bufferSize
-
-| `field` | `type` | `required` | `default` | `description` |
-| ---------- | ------ | ---------- | -------- | -------------------------------- |
-| bufferSize | int | false | 2048 | The buffer size of the collection information written into the database |
-
-### cleanInactiveTimeout
-
-| `field` | `type` | `required` | `default` | `description` |
-| -------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
-| cleanInactiveTimeout | time.Duration | false | 504h | Clean up outdated data in the database. If the update time of the data exceeds the configured value, the data will be deleted. 21 days by default. |
-
-### cleanScanInterval
-
-| `field` | `type` | `required` | `default` | `description` |
-| ----------------- | ------------- | ---------- | -------- | ----------------------------------------------------- |
-| cleanScanInterval | time.Duration | false | 1h | Periodically check the database for outdated data. Check every 1 hour by default |
-
## watcher
Configuration for monitoring file changes
@@ -423,15 +414,9 @@ Configuration for monitoring file changes
sources:
- type: file
watcher:
- enableOsWatch: true
+ maxOpenFds: 8000
```
-### enableOsWatch
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------------- | ------ | ---------- | -------- | ------------------------------------------------ |
-| enableOsWatch | bool | false | true | Whether to enable the monitoring notification mechanism of the OS. For example, inotify of linux |
-
### scanTimeInterval
| `field` | `type` | `required` | `default` | `description` |
@@ -444,23 +429,11 @@ Configuration for monitoring file changes
| ------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| maintenanceInterval | time.Duration | false | 5m | Periodic maintenance work (such as reporting and collecting statistics, cleaning files, etc.) |
-### fdHoldTimeoutWhenInactive
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
-| fdHoldTimeoutWhenInactive | time.Duration | false | 5m | When the time from the last collection of the file to the present exceeds the limit (the file has not been written for a long time, it is considered that there is a high probability that the content will not be written again), the handle of the file will be released to release system resources |
-
-### fdHoldTimeoutWhenRemove
-
-| `field` | `type` | `required` | `default` | `description` |
-| ----------------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
-| fdHoldTimeoutWhenRemove | time.Duration | false | 5m | When the file is deleted and the collection is not completed, it will wait for the maximum time to complete the collection. If the limit is exceeded, no matter whether the file is finally collected or not, the handle will be released directly and no longer collected. |
-
### maxOpenFds
| `field` | `type` | `required` | `default` | `description` |
| ---------- | ------ | ---------- | -------- | -------------------------------------------------- |
-| maxOpenFds | int | false | 512 | The maximum number of open file handles. If the limit is exceeded, the files will not be collected temporarily |
+| maxOpenFds | int | false | 4096 | The maximum number of open file handles. If the limit is exceeded, the files will not be collected temporarily |
### maxEofCount
@@ -474,27 +447,12 @@ Configuration for monitoring file changes
| ---------------- | ------ | ---------- | -------- | ---------------------------------------------- |
| cleanWhenRemoved | bool | false | true | When the file is deleted, whether to delete the collection-related information in the db synchronously. |
-### readFromTail
-
-| `field` | `type` | `required` | `default` | `description` |
-| ------------ | ------ | ---------- | -------- | ------------------------------------------------------------ |
-| readFromTail | bool | false | false | Whether to start collecting from the latest line of the file, regardless of writing history. It is suitable for scenarios such as migration of collection systems. |
-
### taskStopTimeout
| `field` | `type` | `required` | `default` | `description` |
| --------------- | ------------- | ---------- | -------- | ------------------------------------------------------------ |
| taskStopTimeout | time.Duration | false | 30s | The timeout period for the collection task to exit. It is a bottom-up solution when Loggie cannot be reloaded. |
+## db
-### cleanFiles
-
-File clearing related configuration. Expired and collected files will be deleted directly from the disk to free up disk space.
-
-#### maxHistoryDays
-
-| `field` | `type` | `required` | `default` | `description` |
-| -------------- | ------ | ---------- | -------- | ------------------------------------------------------------ |
-| maxHistoryDays | int | false | none | Maximum number of days to keep files (after collection). If the limit is exceeded, the file will be deleted directly from the disk. If not configured, the file will never be deleted |
-
-
+This section has been removed in v1.5 and later versions. Please use the global [db](../../global/db.md) configuration instead.
diff --git a/docs/reference/pipelines/source/franzkafka.md b/docs/reference/pipelines/source/franzkafka.md
new file mode 100644
index 0000000..b023e98
--- /dev/null
+++ b/docs/reference/pipelines/source/franzkafka.md
@@ -0,0 +1,133 @@
+# franzKafka
+
+Use the [franz-go](https://github.com/twmb/franz-go) library to consume from Kafka.
+
+(This source differs from the kafka source mainly in the underlying Go Kafka client library; it is provided for users who prefer franz-go.)
+
+!!! example
+
+ ```yaml
+ sources:
+    - type: franzKafka
+ brokers: ["kafka1.kafka.svc:9092"]
+ topic: log-*
+ ```
+
+!!! note "Supported Kafka versions"
+
+    This source uses the [franz-go](https://github.com/twmb/franz-go) library, which supports Kafka versions 0.8.0 and later.
+
+## brokers
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| brokers | string array | required | none | Kafka broker address |
+
+## topic
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| topic | string | required | none | The topic to consume. A regular expression can be used to match multiple topics |
+
+## topics
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| topics | string array | required | none | The topics to consume. Multiple topics can be listed, and regular expressions can be used to match topics |
+
+## groupId
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| groupId | string | optional | loggie | groupId for Loggie to consume kafka |
+
+## clientId
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| clientId | string | optional | loggie | clientId Loggie uses when consuming Kafka |
+
+## worker
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| worker | int | optional | 1 | The number of worker threads used by Loggie to consume Kafka |
+
+## addonMeta
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| addonMeta | bool | optional | true | Whether to add meta information about the consumed Kafka message |
+
+The added meta information fields include:
+
+- offset: offset of consumption
+- partition: partition
+- timestamp: timestamp added when consuming, in the format of RFC3339 ("2006-01-02T15:04:05Z07:00")
+- topic: consumption topic
+
+## fetchMaxWait
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| fetchMaxWait | time.Duration | Optional | 5s | Maximum time to wait for a fetch response to reach the minimum number of bytes required before returning |
+
+## fetchMaxBytes
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| fetchMaxBytes | int | Optional | 50 << 20 (50MiB) | Maximum number of bytes fetched |
+
+## fetchMinBytes
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| fetchMinBytes | int | Optional | 1 | Minimum number of bytes to fetch |
+
+## fetchMaxPartitionBytes
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| fetchMaxPartitionBytes | int | Optional | 50 << 20 (50MiB) | Maximum number of bytes to fetch from a single partition |
+
+## enableAutoCommit
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| enableAutoCommit | bool | optional | false | whether to enable automatic commit to kafka |
+
+## autoCommitInterval
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| autoCommitInterval | time.Duration | Optional | 1s | The time interval for automatic commit to kafka |
+
+## autoOffsetReset
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| autoOffsetReset | string | Optional | latest | If there is no corresponding offset, where to start consuming topic data, which can be: `latest` or `earliest` |
+
+## sasl
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| sasl | | optional | | SASL authentication |
+| sasl.enabled | bool | optional | false | whether to enable |
+| sasl.mechanism | string | required | | SASL mechanism, can be: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`, `GSSAPI`|
+| sasl.username | string | required | | username |
+| sasl.password | string | required | | password |
+
+## gssapi
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| sasl.gssapi | | optional | | GSSAPI authentication |
+| sasl.gssapi.authType | string | required | | authentication type: 1 (username/password) or 2 (keytab) |
+| sasl.gssapi.keyTabPath | string | required | | keytab file path |
+| sasl.gssapi.kerberosConfigPath | string | required | | kerberos config file path |
+| sasl.gssapi.serviceName | string | required | | service name |
+| sasl.gssapi.userName | string | required | | username |
+| sasl.gssapi.password | string | required | | password |
+| sasl.gssapi.realm | string | required | | realm |
+| sasl.gssapi.disablePAFXFAST | bool | optional | | whether to configure the client not to use PA_FX_FAST |
diff --git a/docs/reference/pipelines/source/kafka.md b/docs/reference/pipelines/source/kafka.md
index 700516d..48bd7dc 100644
--- a/docs/reference/pipelines/source/kafka.md
+++ b/docs/reference/pipelines/source/kafka.md
@@ -10,6 +10,10 @@ Kafka source is used to receive Kafka data.
topic: log-*
```
+!!! note "Supported Kafka versions"
+
+    The kafka source uses the [segmentio/kafka-go](https://github.com/segmentio/kafka-go) library. Loggie currently uses version `v0.4.39` of the library, and the tested Kafka version range for that release is [0.10.1.0 - 2.7.1](https://github.com/segmentio/kafka-go/tree/v0.4.39#kafka-versions).
+
## brokers
| `field` | `type` | `required` | `default` | `description` |
@@ -30,6 +34,31 @@ Kafka source is used for receice Kafka data.
| ---------- | ----------- | ----------- | --------- | -------- |
| groupId | string | false | loggie | groupId Loggie use to consume kafka |
+## clientId
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| clientId | string | false | loggie | clientId Loggie uses when consuming Kafka |
+
+## worker
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| worker | int | false | 1 | The number of worker threads used by Loggie to consume Kafka |
+
+## addonMeta
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| addonMeta | bool | false | true | Whether to add meta information about the consumed Kafka message |
+
+The added meta information fields include:
+
+- offset: offset of consumption
+- partition: partition
+- timestamp: timestamp added when consuming, in the format of RFC3339 ("2006-01-02T15:04:05Z07:00")
+- topic: consumption topic
+
## queueCapacity
| `field` | `type` | `required` | `default` | `description` |
diff --git a/docs/reference/pipelines/source/overview.md b/docs/reference/pipelines/source/overview.md
index 08ade65..a1f8088 100644
--- a/docs/reference/pipelines/source/overview.md
+++ b/docs/reference/pipelines/source/overview.md
@@ -65,6 +65,36 @@ For example:
Loggie will get the value `${SVC_NAME}` from the environment variable where Loggie is located, and then add a field to all log events:`service: ${SVC_NAME}`.
+### fieldsFromPath
+
+| `field` | `type` | `required` | `default` | `description` |
+| ---------- | ----------- | ----------- | --------- | -------- |
+| fieldsFromPath | map | optional | | Extra fields added to each event; each value is a file path, and the content of that file becomes the field value |
+
+For example:
+
+!!! example
+
+ ```yaml
+ sources:
+ - type: file
+ name: access
+ paths:
+ - /var/log/*.log
+ fieldsFromPath:
+ test: /tmp/foo
+ ```
+
+Assume the file `/tmp/foo` contains `bar`:
+
+```bash
+$ cat /tmp/foo
+bar
+```
+
+Loggie will then add the field `test: bar` to all log events.
+
### fieldsUnderRoot
| `field` | `type` | `required` | `default` | `description` |
diff --git a/docs/user-guide/best-practice/concurrency.md b/docs/user-guide/best-practice/concurrency.md
new file mode 100644
index 0000000..713fadc
--- /dev/null
+++ b/docs/user-guide/best-practice/concurrency.md
@@ -0,0 +1,132 @@
+# Adaptive sink flow control
+
+Concurrency can be controlled when logs are sent to downstream services. In practice, however, at least two factors affect the right value: the load on the downstream server, and the pressure of the data being sent. Both change in real time, so a fixed value rarely meets actual needs for long.
+
+Adaptive sink flow control function can:
+
+- Automatically adjust the send concurrency according to downstream responses, making full use of the downstream server's capacity without overloading it.
+
+- When upstream collection is blocked, appropriately adjust the downstream sending concurrency to relieve the upstream congestion.
+
+For configuration details, please refer to [sink concurrency](../../reference/pipelines/sink/overview.md).
+
+## Key words
+
+- Response speed: rtt, round trip time
+- Whether the response is successful: whether the downstream is successfully sent
+- Whether the upstream is blocked: channel saturation
+- Concurrency value: the number of coroutines enabled by the sink
+- Average response speed in the last period: rtt last duration, rttD
+- Overall average response speed: rtt total, rttT
+- Exponential growth (fast start) threshold: threshold
+
+## Implementation ideas
+
+The core idea imitates TCP congestion control, adapted to Loggie's own characteristics.
+
+1. **Quick Start Phase**
+
+    - The number of coroutines grows exponentially. Once the threshold is reached, the fast start ends and subsequent increases or decreases are linear. If rtt increases or a failure response is received during the fast start, it also ends immediately.
+    - There is no exponential decrease; this prevents large fluctuations in the number of coroutines and reduces overhead.
+
+2. **Quick Start End**
+
+    - After the fast start ends, statistics collection begins: the number of coroutines is recorded after each pool adjustment.
+    - If rtt increases or a failure response is received, the number of coroutines is reduced immediately.
+    - If channel saturation reaches a certain level and the current rtt is relatively stable, the number of coroutines is increased.
+    - After a period of time, the mean of the recorded pool sizes is computed and the stationary phase begins. This mean serves as a bound for subsequent adjustments, again to reduce fluctuations.
+
+3. **Stationary Phase**
+
+    - After entering the stationary phase, the adjustment interval becomes longer.
+    - Each time rtt increases or a failure response is received, a shrink request for the coroutine pool is accumulated.
+    - If channel saturation reaches a certain level and the current rtt is relatively stable, an expand request is accumulated.
+    - When requests of the same kind reach a certain count, a pool adjustment is triggered and the counter for the opposite direction is cleared.
+    - If the same kind of request keeps arriving after an adjustment, further adjustments are made.
+    - Adjustments in the stationary phase are bounded by the mean: if the target is far below or above the mean, only the mean is adjusted first; if the same situation occurs again, both the pool and the mean are adjusted.
+
+4. **Other details**
+
+    - If channel saturation reaches 100%, the linear growth step is enlarged.
+    - rttT is a weighted average of the overall average and the last period's average: rttT.new = a*rttD + (1-a)*rttT.old, where a is `newRttWeigh`.
+    - This feature is not enabled by default. To enable it, set `concurrency.enable: true`.
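
The weighted-average formula above can be illustrated with a short Go sketch. This is only an illustration, not Loggie's actual implementation; the function name and the sample values are assumptions, and `a` corresponds to the `newRttWeigh` setting in the sample configuration below.

```go
package main

import "fmt"

// updateRTT computes rttT.new = a*rttD + (1-a)*rttT.old,
// where a plays the role of the newRttWeigh configuration value.
func updateRTT(rttTotal, rttLast, a float64) float64 {
	return a*rttLast + (1-a)*rttTotal
}

func main() {
	rttT := 100.0 // overall average rtt so far (ms)
	rttD := 140.0 // average rtt of the last period (ms)
	// With a weight of 0.5, one slow period moves the overall
	// average halfway toward the recent measurement.
	fmt.Println(updateRTT(rttT, rttD, 0.5)) // 120
}
```

A larger weight makes the controller react faster to recent fluctuations; a smaller weight smooths them out.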
+
+## Use cases and interpretation
+
+### Downstream server
+
+An additional local server is built downstream, and the rtt value can be adjusted freely to simulate network fluctuations.
+
+A random failure return has been added, and the probability can be set.
+
+The blocking situation is simulated by greatly increasing rtt. Since rttT is not a fixed value, when rtt is stable, it will not affect the judgment logic.
+
+### Configuration
+
+!!! config
+
+ ```yaml
+ concurrency:
+ enable: true
+ goroutine:
+ initThreshold: 16
+ maxGoroutine: 30
+ unstableTolerate: 3
+ channelLenOfCap: 0.4
+ rtt:
+ blockJudgeThreshold: 120%
+ newRttWeigh: 0.5
+ ratio:
+ multi: 2
+ linear: 2
+ linearWhenBlocked: 4
+ duration:
+ unstable: 15
+ stable: 30
+ ```
+
+### Case 1
+
+Simulate a downstream service that returns no errors; only rtt is adjusted, to test the algorithm's response to network latency.
+
+
+
+#### Interpretation
+
+- 0-3: Fast start phase; the number of coroutines doubles each step
+- 4: The number of coroutines reaches 16 and the fast start ends
+- 5: rtt growth exceeds the threshold; the stationary phase has not begun yet, so the number of coroutines is reduced by ratio.linear=2
+- 6-17: The network is stable with no fluctuation
+- 18: Enough data has been collected; the mean number of coroutines is computed and the stationary phase begins
+- 19-21: rtt increases three times in a row, reaching goroutine.unstableTolerate, which triggers a reduction in the number of coroutines
+- 22: The number of coroutines is reduced
+- 25: rtt increases again, and the number of coroutines is reduced further
+- 29: rtt increases once more and the number of coroutines is reduced again. This simulates the downstream being overloaded and the channel becoming blocked
+- 30-32: Channel saturation exceeds the threshold three times in a row, reaching goroutine.unstableTolerate, which triggers an increase in the number of coroutines
+- 33: The number of coroutines is increased
+- 34-37: Channel saturation drops and the pressure eases; the network is stable, so the number of coroutines stays unchanged
+
+### Case 2
+
+Based on Case 1, simulate the situation where an error is returned downstream, and the probability is set to 0.15%.
+
+
+
+#### Interpretation
+
+- 0-3: Quick start
+- 4: The number of coroutines reaches the threshold, and the quick start ends
+- 6: rtt increases; the stationary phase has not begun yet, so the number of coroutines is reduced
+- 10: An error response is received; the number of coroutines is reduced
+- 16: An error response is received; the number of coroutines is reduced
+- 18: An error response is received and the pool is reduced again; enough data has been collected, the mean number of coroutines is computed as 11, and the stationary phase begins
+- 21-22: rtt increases twice in a row, but goroutine.unstableTolerate is 3, so no adjustment is triggered
+- 24: An error response is received and goroutine.unstableTolerate is reached. Since the mean is 11, shrinking the pool directly could cause large jitter, so only the mean is lowered, to 9 (the step size is controlled by ratio.linear)
+- 25: Another error response arrives, and this time the pool itself is reduced. With the mean at 9, the lower bound is 7, so the number of coroutines drops to 7
+- 26-29: rtt rises sharply and the pool keeps shrinking. This simulates channel saturation. When the pool becomes very small, the default lower limit of 2 keeps two coroutines running, while the mean is adjusted down to 1
+- 33: The channel stays saturated and rtt stabilizes, so the pool is enlarged. Since the mean is 1, the increase is capped at 3 (this cap is also controlled by ratio.linear); the pool grows to 3 and the mean is adjusted to 3 as well
+- 34-37: The channel remains saturated, and the pool continues to grow
+- 38-42: The channel is no longer saturated, rtt is stable, the number of coroutines remains unchanged, and a single error return will not trigger the adjustment of the number of coroutines.
+
+!!! info
+ This feature is in the experimental stage and is turned off by default. Discussions are welcome.
diff --git a/docs/user-guide/best-practice/log-collect-rotate.md b/docs/user-guide/best-practice/log-collect-rotate.md
new file mode 100644
index 0000000..313e19d
--- /dev/null
+++ b/docs/user-guide/best-practice/log-collect-rotate.md
@@ -0,0 +1,32 @@
+# Log rotate and log collection
+
+## Two rotation modes
+
+### 1.1 create mode
+
+1. The log file that the process is currently writing is renamed. The process writes through a file descriptor bound to the file's inode, and renaming does not change the inode, so the process keeps writing to the renamed file.
+
+2. A new log file is created with the original name. It is a brand-new file, so its inode is different. At this point the process is still writing to the old, renamed file.
+
+3. A signal is sent to the process, telling it to reopen its log file so that it starts writing to the new one.
+
+The commonly used logback is based on this mode.
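
For reference, a minimal logrotate configuration sketch for create mode. The path, ownership, pid file, and signal here are illustrative assumptions, not taken from any particular application:

```
/var/log/app/app.log {
    daily
    rotate 7
    # create mode: the old file is renamed, then a fresh app.log is created
    create 0644 app app
    postrotate
        # illustrative: ask the app to reopen its log file
        kill -USR1 "$(cat /var/run/app.pid)"
    endscript
}
```

The `postrotate` script corresponds to step 3 above: without the signal, the process would keep writing to the renamed file.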
+
+### 1.2 copytruncate mode
+
+1. The current log file is copied to a new file with a different name; the process keeps writing to the original file.
+
+2. logrotate then truncates the original file, emptying it. This completes one rotation.
+
+This mode does not require sending a reload signal to the process.
+
+Risks:
+
+- Because the log file is copied, a huge log file causes disk usage to spike suddenly.
+- There is a short window between the copy and the truncate while logs are still being written; logs written in that window are lost when the file is truncated.
+
+!!! caution
+
+    It is generally not recommended to use copytruncate mode. If you do use Loggie to collect logs rotated with copytruncate, configure the file source's path with the specific file name being written.
+
+    For example, if the process always writes app.log and the rotated copies are app-1.log, app-2.log, and so on, configure the path as `app.log` rather than `*.log`. Loggie identifies a file by inode+deviceId rather than by name, so a rotated copy is a brand-new file to Loggie; if it matches the path, its content will be collected again, producing duplicate logs.
diff --git a/docs/user-guide/enterprise-practice/architecture-and-evolution.md b/docs/user-guide/enterprise-practice/architecture-and-evolution.md
index 17d1f77..1daf82d 100644
--- a/docs/user-guide/enterprise-practice/architecture-and-evolution.md
+++ b/docs/user-guide/enterprise-practice/architecture-and-evolution.md
@@ -1,5 +1,9 @@
# Logging System Architecture and Evolution
-
+
+We can build a cloud-native, scalable full-link log data platform with Loggie at its core; different technology options can be plugged in at each stage.
+
+
+
Bases on different business types, different scenarios, and different log scales, we may adopt different log system architectures. There is no good or bad architecture, only suitable architecture. In a simple scenario, a log system built with a complex architecture may bring O&M disasters.
Here is a summary of common log system architectures from the perspective of scale evolution. Of course, there are many actual technical options and variants, and we cannot list them one by one. I believe you can build an architecture suitable for your own business by referring to the following.
@@ -76,6 +80,9 @@ For example:
- Use Loggie's multi-pipeline feature to split business logs and send them to multiple Kafka clusters.
- Add a front-end Loggie aggregator cluster in a large-scale architecture, and perform traffic distribution and forwarding in advance.
+Finally, we can build a production-level full-link log data platform based on Loggie.
+
+
## More
In fact, in order to implement a complete log architecture, you also need to consider:
diff --git a/docs/user-guide/enterprise-practice/imgs/loggie-chain.png b/docs/user-guide/enterprise-practice/imgs/loggie-chain.png
new file mode 100644
index 0000000..b56ec3e
Binary files /dev/null and b/docs/user-guide/enterprise-practice/imgs/loggie-chain.png differ
diff --git a/docs/user-guide/enterprise-practice/imgs/loggie-extend.png b/docs/user-guide/enterprise-practice/imgs/loggie-extend.png
new file mode 100644
index 0000000..24c4192
Binary files /dev/null and b/docs/user-guide/enterprise-practice/imgs/loggie-extend.png differ
diff --git a/docs/user-guide/monitor/service-log-alarm.md b/docs/user-guide/monitor/service-log-alarm.md
index 02b1a99..88d576e 100644
--- a/docs/user-guide/monitor/service-log-alarm.md
+++ b/docs/user-guide/monitor/service-log-alarm.md
@@ -19,9 +19,14 @@ Loggie does not need to be independently deployed. However, the matching during
### Configuration
-Add `logAlert listener`:
+#### 1. Add the logAlert listener
-!!! config
+Configure a `logAlert listener` in the global monitor configuration. After
+matching logs are detected, it sends log alarms to backends such as
+Alertmanager. For detailed configuration, please refer to
+[logAlert listener](../../reference/monitor/logalert.md).
+
+!!! config "Global Config file"
```yaml
loggie:
@@ -31,10 +36,42 @@ Add `logAlert listener`:
enabled: true
listeners:
logAlert:
- alertManagerAddress: ["http://127.0.0.1:9093"]
+ addr: ["http://127.0.0.1:8080/loggie"]
bufferSize: 100
batchTimeout: 10s
batchSize: 10
+ linelimit: 10
+ template: |
+ {
+ "alerts":
+ [
+ {{$first := true}}
+ {{range .Alerts}}
+ {{if $first}}{{$first = false}}{{else}},{{end}}
+ {
+ "labels": {
+ "topic": "{{.fields.topic}}"
+ },
+ "annotations": {
+ "message": "\nNew alert: \nbody:\n{{range .body}}{{.}}\n{{end}}\ncontainerid: {{._meta.pipelineName}}\nsource: {{._meta.sourceName}}\ncontainername: {{.fields.containername}}\nlogconfig: {{.fields.logconfig}}\nname: {{.fields.name}}\nnamespace: {{.fields.namespace}}\nnodename: {{.fields.nodename}}\npodname: {{.fields.podname}}\nfilename: {{.state.filename}}\n",
+ "reason": "{{.reason}}"
+ },
+ "startsAt": "{{._meta.timestamp}}",
+ "endsAt": "{{._meta.timestamp}}"
+ }
+ {{end}}
+ ],
+ {{$first := true}}
+ {{range .Alerts}}
+ {{if $first}}{{$first = false}}{{else}}
+ "commonLabels": {
+ "module": "{{._additions.module}}",
+ "alertname": "{{._additions.alertname}}",
+ "cluster": "{{._additions.cluster}}"
+ }
+ {{end}}
+ {{end}}
+ }
filesource: ~
filewatcher: ~
reload: ~
@@ -45,7 +82,92 @@ Add `logAlert listener`:
port: 9196
```
-Add `logAlert interceptor`, and reference it in ClusterLogConfig/LogConfig:
+The template above defines the format of the alarm payload that is sent, using
+Go template syntax. For details, refer to the
+[Go template documentation](https://pkg.go.dev/text/template).
+
+Expressions such as `{{._meta.timestamp}}` dynamically render fields from the
+original alert data.
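
A minimal sketch of how such a template is rendered against the original alert data. This is plain `text/template` usage, not Loggie's internal code, and the field names follow the example data below:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes an alarm template against the original alert data.
func render(tmplText string, data map[string]interface{}) (string, error) {
	tmpl, err := template.New("alert").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// "Alerts" is the fixed top-level key of the original alert data.
	data := map[string]interface{}{
		"Alerts": []map[string]interface{}{{
			"_meta":  map[string]interface{}{"timestamp": "2022-10-28T13:12:30Z"},
			"reason": "matches some rules",
		}},
	}
	out, err := render(`{{range .Alerts}}startsAt={{._meta.timestamp}} reason={{.reason}}{{end}}`, data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // startsAt=2022-10-28T13:12:30Z reason=matches some rules
}
```

Note that `{{range .Alerts}}` iterates over the alert list, so field references like `._meta.timestamp` are resolved against each alert in turn.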
+
+**alert field explanation**:
+
+| `Field` | `Built-in` | `Meaning` |
+|--------------------|--------|-----------------------|
+| _meta | Yes | alert metadata |
+| _meta.pipelineName | | the pipeline name |
+| _meta.sourceName | | the source name |
+| _meta.timestamp | | the log timestamp |
+| body | Yes | the log body |
+| reason | Yes | the reason the log matched |
+| fields | No | the `fields` added by other configuration |
+| state | No | collection state information; requires `addonMeta: true` in the file source |
+| _additions | No | extra fields specified by configuration |
+
+The original alert data is JSON, in which `Alerts` is a fixed key.
+
+!!! example "original alert data example"
+
+ ```json
+ {
+ "Alerts": [
+ {
+ "_meta": {
+ "pipelineName": "default/spring",
+ "sourceName": "loggie-source-756fd6bb94-4skqv/loggie-alert/common",
+ "timestamp": "2022-10-28T13:12:30.528824+08:00"
+ },
+ "body": [
+ "2022-10-28 01:48:07.093 ERROR 1 --- [nio-8080-exec-1] o.a.c.c.C.[.[.[/]. [dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.ArithmeticException: / by zero] with root cause",
+ "",
+ "java.lang.ArithmeticException: / by zero"
+ ],
+ "fields": {
+ "containerid": "0dc5f07983bfdf7709ee4fce752679983c4184e94c70dab5fe6df5843d5cbb68",
+ "containername": "loggie-alert",
+ "logconfig": "spring",
+ "name": "loggie-source",
+ "namespace": "default",
+ "nodename": "docker-desktop",
+ "podname": "loggie-source-756fd6bb94-4skqv",
+ "topic": "loggie"
+ },
+ "reason": "matches some rules",
+ "state": {
+ "bytes": 6913,
+                "filename": "/var/log/pods/default_loggie-source-756fd6bb94-4skqv_9da3e440-e749-4930-8e4d-41e0d5b66417/loggie-alert/1.log",
+ "hostname": "docker-desktop",
+ "offset": 3836,
+ "pipeline": "default/spring",
+ "source": "loggie-source-756fd6bb94-4skqv/loggie-alert/common",
+ "timestamp": "2022-10-28T13:12:30.527Z"
+ },
+ "_additions": {
+ "namespace": "default",
+ "cluster": "local",
+ "alertname": "loggie-test",
+ "module": "loggie"
+ }
+ }
+ ]
+ }
+ ```
+
+#### 2. Add logAlert interceptor
+
+The `logAlert interceptor` is added to a pipeline to match logs against alarm
+rules during collection, and is referenced in a ClusterLogConfig/LogConfig. The
+`additions` configuration defines extra fields attached to the alert; they are
+placed under the `_additions` field of the original alert data and can be used
+in template rendering.
+
+It is recommended to run in debug mode (`-log.level=debug`) first to observe
+the format of the original alert data, and then configure the rendering
+template. The original data is affected by other configuration; only one
+example is shown here.
+
+For detailed configuration, please refer to
+[logAlert interceptor](../../reference/pipelines/interceptor/logalert.md)
!!! config
@@ -58,73 +180,97 @@ Add `logAlert interceptor`, and reference it in ClusterLogConfig/LogConfig:
interceptors: |
- type: logAlert
matcher:
- contains: ["err"]
- ```
-
-
-The alertManager's webhook can be configured for other services to receive alerts.
-
-!!! config
- ```yaml
- receivers:
- - name: webhook
- webhook_configs:
- - url: http://127.0.0.1:8787/webhook
- send_resolved: true
+ contains: ["ERROR"]
+ additions:
+ module: "loggie"
+ alertname: "loggie-test"
+ cluster: "local"
```
-
-When successful, we can view similar logs in the alertManager:
-
-```
-ts=2021-12-22T13:33:08.639Z caller=log.go:124 level=debug component=dispatcher msg="Received alert" alert=[6b723d0][active]
-ts=2021-12-22T13:33:38.640Z caller=log.go:124 level=debug component=dispatcher aggrGroup={}:{} msg=flushing alerts=[[6b723d0][active]]
-ts=2021-12-22T13:33:38.642Z caller=log.go:124 level=debug component=dispatcher receiver=webhook integration=webhook[0] msg="Notify success" attempts=1
-
-```
-
-At the same time, the webhook receives a similar alarm:
+After matching the log alarm rules, the alarm backend can receive similar data
+as follows:
!!! example
```json
{
- "receiver": "webhook",
- "status": "firing",
"alerts": [
{
- "status": "firing",
"labels": {
- "host": "fuyideMacBook-Pro.local",
- "source": "a"
+ "topic": "loggie"
},
"annotations": {
- "message": "10.244.0.1 - - [13/Dec/2021:12:40:48 +0000] error \"GET / HTTP/1.1\" 404 683",
- "reason": "contained error"
+ "message": "\nNew alert: \nbody:\n2022-10-28 01:48:07.093 ERROR 1 --- [nio-8080-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.ArithmeticException: / by zero] with root cause\n\njava.lang.ArithmeticException: / by zero\ncontainerid: 0dc5f07983bfdf7709ee4fce752679983c4184e94c70dab5fe6df5843d5cbb68\nsource: loggie-source-756fd6bb94-4skqv/loggie-alert/common\ncontainername: loggie-alert\nlogconfig: spring\nname: loggie-source\nnamespace: default\nnodename: docker-desktop\npodname: loggie-source-756fd6bb94-4skqv\nfilename: /var/log/pods/default_loggie-source-756fd6bb94-4skqv_9da3e440-e749-4930-8e4d-41e0d5b66417/loggie-alert/1.log\n",
+ "reason": "matches some rules"
},
- "startsAt": "2021-12-22T21:33:08.638086+08:00",
- "endsAt": "0001-01-01T00:00:00Z",
- "generatorURL": "",
- "fingerprint": "6b723d0e395b14dc"
+ "startsAt": "2022-10-28T13:12:30.527Z",
+ "endsAt": "2022-10-28T13:12:30.527Z"
}
],
- "groupLabels": {},
"commonLabels": {
- "host": "node1",
- "source": "a"
- },
- "commonAnnotations": {
- "message": "10.244.0.1 - - [13/Dec/2021:12:40:48 +0000] error \"GET / HTTP/1.1\" 404 683",
- "reason": "contained error"
- },
- "externalURL": "http://xxxxxx:9093",
- "version": "4",
- "groupKey": "{}:{}",
- "truncatedAlerts": 0
+ "module": "loggie",
+ "alertname": "loggie-test",
+ "cluster": "local"
+ }
}
```
+## Independent link detection
+
+### Principle
+
+Loggie collects logs from the configured source. When the `logAlert
+interceptor` matches a log event, the `sendOnlyMatched` option ensures that
+only successfully matched logs are sent to the `alertWebhook sink`; logs that
+do not match are treated as normal logs and ignored. When using the
+`alertWebhook sink`, it is therefore recommended to also enable the `logAlert
+interceptor` and set `sendOnlyMatched` to `true`.
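+
+As a minimal sketch, a pipeline combining the two components (`logAlert
+interceptor` plus `alertWebhook sink`) might look as follows; the pipeline
+name, source paths and match rule here are illustrative assumptions, not taken
+from this document:
+
+!!! config
+
+    ```yaml
+    pipelines:
+      - name: alert-demo                # hypothetical pipeline name
+        sources:
+          - type: file
+            name: applog                # hypothetical source name
+            paths:
+              - /var/log/app/*.log      # assumed log path
+        interceptors:
+          - type: logAlert
+            matcher:
+              contains: ["ERROR"]       # assumed match rule
+            sendOnlyMatched: true       # only matched events reach the sink
+        sink:
+          type: alertWebhook
+          addr: http://localhost:8080/loggie
+    ```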
+
+### Configuration
+
+Add an `alertWebhook sink` to the pipeline configuration. For the detailed
+configuration, please refer to [alertWebhook Sink](../../reference/pipelines/sink/webhook.md).
+
+!!! config
+
+ ```yaml
+ sink:
+ type: alertWebhook
+ addr: http://localhost:8080/loggie
+ linelimit: 10
+ template: |
+ {
+ "alerts":
+ [
+ {{$first := true}}
+ {{range .Alerts}}
+ {{if $first}}{{$first = false}}{{else}},{{end}}
+ {
+ "labels": {
+ "topic": "{{.fields.topic}}"
+ },
+ "annotations": {
+              "message": "\nNew alert: \nbody:\n{{range .body}}{{.}}\n{{end}}\ncontainerid: {{._meta.pipelineName}}\nsource: {{._meta.sourceName}}\ncontainername: {{.fields.containername}}\nlogconfig: {{.fields.logconfig}}\nname: {{.fields.name}}\nnamespace: {{.fields.namespace}}\nnodename: {{.fields.nodename}}\npodname: {{.fields.podname}}\nfilename: {{.state.filename}}\n",
+ "reason": "{{.reason}}"
+ },
+ "startsAt": "{{._meta.timestamp}}",
+ "endsAt": "{{._meta.timestamp}}"
+ }
+ {{end}}
+ ],
+          {{$first := true}}
+          {{range .Alerts}}
+          {{if $first}}{{$first = false}}
+          "commonLabels": {
+            "namespace": "{{._additions.namespace}}",
+            "module": "{{._additions.module}}",
+            "alertname": "{{._additions.alertname}}",
+            "cluster": "{{._additions.cluster}}"
+          }
+          {{end}}
+          {{end}}
+ }
+ ```
-## Independent Alarm
-!!! info
- Coming soon, stay tuned...
\ No newline at end of file
+The `logAlert interceptor` configuration and the alarms received by the
+receiver are similar to those of collection link detection.
diff --git a/docs/user-guide/troubleshot/log-collection.md b/docs/user-guide/troubleshot/log-collection.md
index 872f042..b12fad6 100644
--- a/docs/user-guide/troubleshot/log-collection.md
+++ b/docs/user-guide/troubleshot/log-collection.md
@@ -10,11 +10,53 @@
Understanding the implementation mechanism is the basis for troubleshooting:
-1. Distribute collection tasks: Create a log collection task LogConfig CR in Kubernetes.
+1. Distribute collection tasks: Create a log collection task, i.e. a LogConfig CR, in Kubernetes.
2. Receive log configuration: The Agent Loggie of the node listens to the corresponding events of K8s and converts the LogConfig into a Pipelines configuration file.
3. Collect log files: Loggie will automatically reload and then read the configuration file, and then send the corresponding log data to downstream services according to the configuration.
-(In host scenarios, only the steps to issue the LogConfig CR are not needed, and the rest are similar)
+(In non-Kubernetes host scenarios, only the step of issuing the LogConfig CR is
+omitted; the rest is similar.)
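+
+For reference, the LogConfig created in step 1 can be sketched as follows; all
+names and label selectors here are illustrative assumptions:
+
+!!! config
+
+    ```yaml
+    apiVersion: loggie.io/v1beta1
+    kind: LogConfig
+    metadata:
+      name: tomcat                  # hypothetical name
+      namespace: default
+    spec:
+      selector:
+        type: pod
+        labelSelector:
+          app: tomcat               # assumed Pod labels to match
+      pipeline:
+        sources: |
+          - type: file
+            name: common
+            paths:
+              - stdout
+        sinkRef: default            # assumed name of a Sink CR deployed in the cluster
+    ```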
+
+## Loggie Terminal
+
+In the Kubernetes scenario, Loggie currently provides a terminal-based
+interactive dashboard that makes troubleshooting much more convenient.
+
+### Enter Terminal
+
+Find any Loggie Pod and execute `loggie inspect`:
+
+```bash
+kubectl -n loggie exec -it $(kubectl -n loggie get po -o name|head -n1|cut -d/ -f2) -- ./loggie inspect
+```
+
+Alternatively, if the one-liner above is hard to remember:
+
+- Find any Loggie Pod
+
+```bash
+kubectl -n loggie get po -o wide
+```
+
+- Enter one of the Loggie Pods and execute the `loggie inspect` subcommand to enter the terminal:
+
+```bash
+kubectl -n loggie exec -it ${podName} -- ./loggie inspect
+```
+
+### Use the Terminal
+
+An example of the terminal home page:
+
+
+For details, please refer to the usage
+[video](https://www.bilibili.com/video/BV1oK411R79b).
+
+!!! tips
+
+    The Loggie terminal is only available since v1.4. If you upgrade from an earlier version, you need to add the ClusterRole configuration; please refer to [here](https://github.com/loggie-io/loggie/pull/416).
## Troubleshooting Steps
@@ -32,9 +74,9 @@ If there are no events, the problmem could be:
- :question: Pod Label does not match:
The label specified in logConfig `labelSelector` does not match the pod we expect. View with the following command
```bash
- kubectl -n ${namespace} get po -owide -l ${labels}
+ kubectl -n ${namespace} get po -o wide -l ${labels}
```
- For example, use `kubectl -n ns1 get po -owide -l app=tomcat,service=web` to determine whether there is a matching Pod.
+ For example, use `kubectl -n ns1 get po -o wide -l app=tomcat,service=web` to determine whether there is a matching Pod.
If there are no events similar to sync success, you can troubleshoot the problem based on the events combined with the loggie log:
@@ -43,7 +85,7 @@ check with the following command
```bash
kubectl -n ${loggie-namespace} logs -f ${loggie-pod-name} --tail=${N}
```
-For example, `kubectl -nloggie logs -f loggie-5x6vf --tail=100`.
+For example, `kubectl -n loggie logs -f loggie-5x6vf --tail=100`.
Check the loggie log of the corresponding node and handle it according to the log.
Common exceptions are:
diff --git a/docs/user-guide/use-in-kubernetes/collect-container-logs.md b/docs/user-guide/use-in-kubernetes/collect-container-logs.md
index d2e4b0d..83e3eab 100644
--- a/docs/user-guide/use-in-kubernetes/collect-container-logs.md
+++ b/docs/user-guide/use-in-kubernetes/collect-container-logs.md
@@ -398,6 +398,8 @@ You only need to set the configuration in values.yml in helm chart`rootFsCollect
After the modification is complete, `helm upgrade` again.
The helm template will automatically add some additional mount paths and configurations. If you upgrade from a lower version, you need to modify the deployed Daemonset yaml. Please refer to issues [#208](https://github.com/loggie-io/loggie/issues/208) for specific principles.
+Please note:
-
-
+- Collecting container logs without mounted volumes under the containerd
+  runtime has not yet been validated in large-scale production. Use it with
+  caution in production environments.
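+
+As a hedged sketch (the exact key location may differ between chart versions;
+verify against your chart's values.yml), enabling the feature could look like:
+
+```yaml
+# values.yml fragment (assumed key name and placement)
+rootFsCollectionEnabled: true
+```
+
+followed by, e.g., `helm upgrade loggie ./helm-chart -n loggie -f values.yml`
+(the release name and chart path are assumptions).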
diff --git a/docs/user-guide/use-in-kubernetes/sidecar.md b/docs/user-guide/use-in-kubernetes/sidecar.md
index ccbb276..4d487b8 100644
--- a/docs/user-guide/use-in-kubernetes/sidecar.md
+++ b/docs/user-guide/use-in-kubernetes/sidecar.md
@@ -120,3 +120,7 @@ Node:
!!! info
In the future, Loggie will support automatic Sidecar injection and automatic generation of ConfigMap mounts through LogConfig, so as to achieve the same experience as using DaemonSet.
+
+## Method 2: Automatically inject sidecar
+
+Please refer to: [https://github.com/loggie-io/loggie-operator](https://github.com/loggie-io/loggie-operator)
diff --git a/mkdocs.yml b/mkdocs.yml
index 4e1acc6..3e80a98 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -32,6 +32,8 @@ theme:
- navigation.tracking
- navigation.tabs.sticky
- navigation.top
+ - content.code.copy
+ - content.action.edit
# palette:
# - scheme: default
diff --git a/nav.yml b/nav.yml
index 276d99c..f930d04 100644
--- a/nav.yml
+++ b/nav.yml
@@ -11,8 +11,6 @@ nav:
- Deploy:
- Kubernetes: getting-started/install/kubernetes.md
- Host: getting-started/install/node.md
- - RoadMap:
- - 2023: getting-started/roadmap/roadmap-2023.md
- User Guide:
- Overview: user-guide/index.md
@@ -31,9 +29,11 @@ nav:
- Collect Kubernetes Events: user-guide/use-in-kubernetes/kube-event-source.md
- Best Practices:
+ - Log rotate and log collection: user-guide/best-practice/log-collect-rotate.md
- Log Format and Metadata: user-guide/best-practice/log-enrich.md
- Log Segmentation: user-guide/best-practice/log-process.md
- Loggie Aggregator: user-guide/best-practice/aggregator.md
+ - Adaptive sink flow control: user-guide/best-practice/concurrency.md
- Monitor and Alerm:
- Monitoring and Alarming for Loggie: user-guide/monitor/loggie-monitor.md
@@ -50,6 +50,7 @@ nav:
- Configurations:
- Overview: reference/index.md
- Start Arguments: reference/global/args.md
+    - Subcommand: reference/global/subcmd.md
- Field Variable: reference/global/var.md
- System Configuration:
- monitor: reference/global/monitor.md
@@ -57,6 +58,7 @@ nav:
- reload: reference/global/reload.md
- defaults: reference/global/defaults.md
- http: reference/global/http.md
+ - db: reference/global/db.md
- Kubernetes CRD:
- LogConfig: reference/discovery/kubernetes/logconfig.md
@@ -68,6 +70,8 @@ nav:
- Overview: reference/pipelines/source/overview.md
- file: reference/pipelines/source/file.md
- kafka: reference/pipelines/source/kafka.md
+ - kafka(franz): reference/pipelines/source/franzkafka.md
+ - elasticsearch: reference/pipelines/source/elasticsearch.md
- kubeEvent: reference/pipelines/source/kube-event.md
- grpc: reference/pipelines/source/grpc.md
- prometheusExporter: reference/pipelines/source/prometheus-exporter.md
@@ -78,17 +82,23 @@ nav:
- Overview: reference/pipelines/sink/overview.md
- elasticsearch: reference/pipelines/sink/elasticsearch.md
- kafka: reference/pipelines/sink/kafka.md
+ - kafka(franz): reference/pipelines/sink/franzkafka.md
- loki: reference/pipelines/sink/loki.md
+ - pulsar: reference/pipelines/sink/pulsar.md
+ - rocketmq: reference/pipelines/sink/rocketmq.md
- grpc: reference/pipelines/sink/grpc.md
- file: reference/pipelines/sink/file.md
- dev: reference/pipelines/sink/dev.md
- sls: reference/pipelines/sink/sls.md
+ - alertwebhook: reference/pipelines/sink/webhook.md
+ - zinc: reference/pipelines/sink/zinc.md
- Interceptor:
- Overview: reference/pipelines/interceptor/overview.md
- schema: reference/pipelines/interceptor/schema.md
- transformer: reference/pipelines/interceptor/transformer.md
- rateLimit: reference/pipelines/interceptor/limit.md
+ - addHostMeta: reference/pipelines/interceptor/addhostmeta.md
- addK8sMeta: reference/pipelines/interceptor/addk8smeta.md
- logAlert: reference/pipelines/interceptor/logalert.md
- metrics: reference/pipelines/interceptor/metrics.md
@@ -109,11 +119,21 @@ nav:
- queue: reference/monitor/queue.md
- logAlert: reference/monitor/logalert.md
- sys: reference/monitor/sys.md
+ - info: reference/monitor/info.md
+ - APIs:
+ - help: reference/apis/help.md
+ - version: reference/apis/version.md
+ - config: reference/apis/config.md
+ - registry: reference/apis/registry.md
+
- Contributing:
- Contributing: developer-guide/contributing.md
+ - Compile and build: developer-guide/build.md
- Local Development: developer-guide/development.md
- Specification: developer-guide/code/coding-guide.md
- Component Development: developer-guide/component/component-guide.md
- Release Process: developer-guide/release.md
+ - Blog:
+ - Posts: blog/index.md
\ No newline at end of file