Architecture
Logs
[application] -> log file -> [Filebeat] -> [Logstash] -> [SigNoz OTel Collector] -> [ClickHouse] -> [Dashboard]
Traces/Metrics
[application] -> [Prometheus] -> [SigNoz OTel Collector] -> [ClickHouse] -> [Dashboard]
Reference: https://signoz.io/blog/opentelemetry-collector-complete-guide/
---
Config
Logs
Filebeat
filebeat.yml
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: log
  id: main
  enabled: true
  paths:
    - /home/logs/gateway/trn*.log

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
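This configuration only ships the gateway logs. If the engine service logs should follow the same path, another input can be added along these lines (the id and path below are assumptions for illustration, not taken from the actual environment):

- type: log
  id: engine                        # hypothetical input id
  enabled: true
  paths:
    - /home/logs/engine/trn*.log    # assumed path; adjust to the real engine log location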
Logstash
- logstash.yml
  path.logs: /home/logs/logstash/
- pipelines.yml
  - pipeline.id: main
    path.config: "/home/elk/logstash-8.9.0/config/logstash-sample.conf"
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  beats {
    port => 5044
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  tcp {
    codec => json_lines # required, otherwise everything is sent as a single line
    host => "192.168.2.13" # SigNoz OTel Collector server
    port => 2255           # SigNoz OTel Collector tcplog/logstash receiver port
  }
}
SigNoz OTel Collector
Set up through Docker; see the compose file in the installation section below.
Traces/Metrics
Application / Prometheus
- Spring dependency
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-micrometer</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope> <!-- use runtime scope -->
</dependency>
- application yaml
management:
  endpoints:
    prometheus:
      enabled: true
    web:
      exposure:
        include: "*"
  endpoint:
    shutdown:
      enabled: true
    health:
      showDetails: always
  health:
    circuit-breakers:
      enabled: true
  metrics:
    enable:
      resilience4j:
        circuitbreaker:
          calls: true
resilience4j:
  circuitbreaker:
    metrics:
      legacy:
        enabled: true # send circuit breaker state metrics
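These settings only expose the metrics; they report something once the application actually defines and uses a circuit breaker. A minimal sketch of defining one through configuration (the instance name "backendA" and the thresholds are placeholders, not from the real application):

resilience4j:
  circuitbreaker:
    instances:
      backendA:                          # hypothetical circuit breaker instance
        register-health-indicator: true  # surfaces its state via the health endpoint enabled above
        sliding-window-size: 10
        failure-rate-threshold: 50
        wait-duration-in-open-state: 10s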
Installation (on-prem Docker)
SigNoz Docker
https://signoz.io/docs/install/docker/
docker compose -f docker/clickhouse-setup/docker-compose.yaml up -d
- docker-compose.yaml
- "gliderlabs/logspout:v3.2.14", "jaegertracing/example-hotrod:1.30", "signoz/locust:1.2.3" 파일은 docker-compose.yaml 에서 주석
- otel-collector 에서 8888 port 활성화
otel-collector:
  image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.88.14}
  container_name: signoz-otel-collector
  command:
    [
      "--config=/etc/otel-collector-config.yaml",
      "--manager-config=/etc/manager-config.yaml",
      "--copy-path=/var/tmp/collector-config.yaml",
      "--feature-gates=-pkg.translator.prometheus.NormalizeName"
    ]
  user: root # required for reading docker container logs
  volumes:
    - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    - ./otel-collector-opamp-config.yaml:/etc/manager-config.yaml
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
  environment:
    - OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
    - DOCKER_MULTI_NODE_CLUSTER=false
    - LOW_CARDINAL_EXCEPTION_GROUPING=false
  ports:
    - "4317:4317" # OTLP gRPC receiver
    - "4318:4318" # OTLP HTTP receiver
    - "2255:2255"
    - "8888:8888" # OtelCollector internal metrics
  restart: on-failure
  depends_on:
    clickhouse:
      condition: service_healthy
    otel-collector-migrator:
      condition: service_completed_successfully
    query-service:
      condition: service_healthy
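The service also mounts otel-collector-opamp-config.yaml as /etc/manager-config.yaml. In the stock SigNoz Docker setup this file typically contains nothing more than the query-service OpAMP endpoint; a sketch for reference (verify against the file in your checkout):

server_endpoint: ws://query-service:4320/v1/opamp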
- The pipelines in otel-collector-config.yaml
An OpenTelemetry pipeline is made up of three stages: Receivers, Processors, and Exporters.
Keep that in mind and focus on how each pipeline below is composed (a stripped-down skeleton of this three-stage wiring is sketched after these notes).
- The signoz-otel-collector's default internal metrics server listens on port 8888.
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
- Logs are received on port 2255 by the "tcplog/logstash" receiver and forwarded to the clickhouselogsexporter.
- Traces are received on gRPC port 4317, run through the processors, and forwarded to the clickhousetraces exporter.
The span/apim processor makes trace lookups in the web UI more readable: for spans from the gateway and engine services whose names match the listed methods (GET*, PUT*, DELETE*, POST*), the http.method and http.target attribute values (when present) are used to rebuild the span name, so a span named "GET" with http.target=/api/orders would show up as "GET /api/orders".
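Before reading the full file, here is the three-stage wiring reduced to just the logs path described above (an illustrative skeleton, not a standalone config):

receivers:
  tcplog/logstash:
    listen_address: "0.0.0.0:2255"
processors:
  batch: {}
exporters:
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
service:
  pipelines:
    logs:
      receivers: [tcplog/logstash]
      processors: [batch]
      exporters: [clickhouselogsexporter]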
- otel-collector-config.yaml
receivers:
  tcplog/logstash:
    max_log_size: 1MiB
    listen_address: "0.0.0.0:2255"
    attributes: {}
    resource: {}
    add_attributes: false
    operators: []
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
      process:
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
      processes: {}
  prometheus:
    config:
      global:
        scrape_interval: 10s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
            - targets:
                - localhost:8888
              labels:
                job_name: otel-collector
        # - job_name: gateway-collector
        #   metrics_path: "/actuator/prometheus"
        #   static_configs:
        #     - targets:
        #         - 192.168.2.13:38080
        #       labels:
        #         job_name: gateway-collector
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/cumulative:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id
  span/apim:
    include:
      match_type: regexp
      services: ["gateway", "engine"]
      span_names: ["GET.*", "PUT.*", "POST.*", "DELETE.*"]
    name:
      separator: " "
      from_attributes: [http.method, http.target]
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
  # logging: {}
  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 10s
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [span/apim, signozspanmetrics/cumulative, signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    logs:
      # receivers: [otlp, tcplog/docker]
      receivers: [tcplog/logstash]
      processors: [batch]
      exporters: [clickhouselogsexporter]
OTel javaagent setup
OTEL_LOGS_EXPORTER=otlp OTEL_EXPORTER_OTLP_ENDPOINT="http://<host>:4317" OTEL_RESOURCE_ATTRIBUTES=service.name=myapp java -javaagent:/path/opentelemetry-javaagent.jar -jar target/*.jar
https://signoz.io/docs/userguide/collecting_application_logs_otel_sdk_java/
https://opentelemetry.io/docs/languages/java/automatic/
- There are two ways to attach the OTel Java agent: add it as a Spring dependency, or attach an external jar when the application starts.
- We judged the external approach to be more flexible, so we attach the external jar when running the application.
- Latest agent jar: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
- Official OpenTelemetry Instrumentation for Java GitHub: https://github.com/open-telemetry/opentelemetry-java-instrumentation?tab=readme-ov-file#getting-started
- Setting it up as a Maven dependency: https://opentelemetry.io/docs/languages/java/
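One thing to note: the command above sets OTEL_LOGS_EXPORTER=otlp, but the logs pipeline in otel-collector-config.yaml currently lists only the tcplog/logstash receiver. If logs exported by the agent should also reach SigNoz, the otlp receiver would need to be added to that pipeline, roughly as below (a sketch of the change, not part of the setup above):

    logs:
      receivers: [otlp, tcplog/logstash]
      processors: [batch]
      exporters: [clickhouselogsexporter]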