Open-source APM tool SigNoz (2)

by abstract.jiin 2025. 2. 5.

Architecture

Logs

[application] -> log.file -> [Filebeat] -> [Logstash] -> [SigNoz OTel Collector] -> [ClickHouse] -> [Dashboard]

 

Traces/Metrics

[application] -> [Prometheus] -> [SigNoz OTel Collector] -> [ClickHouse] -> [Dashboard]

Reference: https://signoz.io/blog/opentelemetry-collector-complete-guide/

 

 

--- 

Config

Logs

Filebeat

filebeat.yml

# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: log
  id: main
  enabled: true
  paths:
    - /home/logs/gateway/trn*.log

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

Logstash

  • logstash.yml
path.logs: /home/logs/logstash/
  • pipelines.yml
 - pipeline.id: main
   path.config: "/home/elk/logstash-8.9.0/config/logstash-sample.conf"
  • logstash-sample.conf
# Beats -> Logstash -> SigNoz OTel Collector pipeline,
# adapted from the sample Beats -> Logstash -> Elasticsearch configuration.

input {
  beats {
    port => 5044
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  tcp {
    codec => json_lines # required; otherwise everything is sent as a single line
    host => "192.168.2.13" # SigNoz OTel Collector server
    port => 2255 # port the collector's tcplog/logstash receiver listens on
  }
}      
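
To check the collector side without running Logstash, the sketch below mimics what the tcp output with the json_lines codec puts on the wire: one JSON object per line, each ending in a newline. It is only an illustration and not part of the original setup. The function names `encode_json_lines` and `send_events` are hypothetical, and the default host and port are simply the values from the Logstash config above.

```python
import json
import socket


def encode_json_lines(events):
    """Serialize events as newline-delimited JSON (one object per line)."""
    return "".join(
        json.dumps(event, ensure_ascii=False) + "\n" for event in events
    ).encode("utf-8")


def send_events(events, host="192.168.2.13", port=2255, timeout=5.0):
    """Open a TCP connection and send the newline-delimited payload.

    host and port are assumptions taken from the Logstash output block;
    adjust them to wherever the collector's TCP log receiver listens.
    """
    payload = encode_json_lines(events)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

Calling `send_events([{"message": "test", "level": "INFO"}])` with a made-up event like this one should make a matching log record appear in the dashboard if the receiver is reachable. Without the trailing newline per event, the receiver has no way to tell where one record ends, which is why the codec matters.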

SigNoz OTel Collector

This is configured through the Docker Compose file; see the compose file in the installation section below.

Traces/Metrics

application / Prometheus

  • Spring dependency
        <dependency>
            <groupId>io.github.resilience4j</groupId>
            <artifactId>resilience4j-micrometer</artifactId>
            <version>2.0.2</version>
        </dependency>
        
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <scope>runtime</scope>  <!-- use runtime scope -->
        </dependency>
  • application yaml
management:
  endpoints:
    web:
      exposure:
        include: "*"
  endpoint:
    prometheus:
      enabled: true # per-endpoint settings belong under management.endpoint, not management.endpoints
    shutdown:
      enabled: true
    health:
      showDetails: always
  health:
    circuit-breakers:
      enabled: true
  metrics:
    enable:
      resilience4j:
        circuitbreaker:
          calls: true

resilience4j:
  circuitbreaker:
    metrics:
      legacy:
        enabled: true # publish circuit breaker state metrics
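
Before pointing the collector at the application, it can help to confirm that the Actuator endpoint really exposes the circuit breaker metrics. The following is a rough sketch under a few assumptions: the endpoint path is the standard `/actuator/prometheus`, the metric names start with `resilience4j_circuitbreaker`, and the helper names `fetch_metrics` and `filter_metric_lines` are made up for this example.

```python
import urllib.request


def filter_metric_lines(exposition_text, prefix="resilience4j_circuitbreaker"):
    """Return sample lines whose metric name starts with prefix.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    selected = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            selected.append(line)
    return selected


def fetch_metrics(url, timeout=5.0):
    """Download the Prometheus text exposition served at url."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `filter_metric_lines(fetch_metrics("http://<host>:<port>/actuator/prometheus"))` returning an empty list would suggest that the dependency or the management settings are not taking effect.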

 

 

Installation (on-prem Docker)

SigNoz Docker

https://signoz.io/docs/install/docker/

docker compose -f docker/clickhouse-setup/docker-compose.yaml up -d
  • docker-compose.yaml

- Comment out the services that use the "gliderlabs/logspout:v3.2.14", "jaegertracing/example-hotrod:1.30", and "signoz/locust:1.2.3" images in docker-compose.yaml.

- Expose port 8888 on the otel-collector service.

  otel-collector:
    image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.88.14}
    container_name: signoz-otel-collector
    command:
      [
        "--config=/etc/otel-collector-config.yaml",
        "--manager-config=/etc/manager-config.yaml",
        "--copy-path=/var/tmp/collector-config.yaml",
        "--feature-gates=-pkg.translator.prometheus.NormalizeName"
      ]
    user: root # required for reading docker container logs
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      - ./otel-collector-opamp-config.yaml:/etc/manager-config.yaml
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=signoz-host,os.type=linux
      - DOCKER_MULTI_NODE_CLUSTER=false
      - LOW_CARDINAL_EXCEPTION_GROUPING=false
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      - "2255:2255"
      - "8888:8888"     # OtelCollector internal metrics
    restart: on-failure
    depends_on:
      clickhouse:
        condition: service_healthy
      otel-collector-migrator:
        condition: service_completed_successfully
      query-service:
        condition: service_healthy

 

  • Pipelines in otel-collector-config.yaml

An OpenTelemetry Collector pipeline has three stages: receivers, processors, and exporters.

With that in mind, focus on how each pipeline is wired together when reading the configuration.

- By default, signoz-otel-collector serves its internal metrics on port 8888.

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888

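- Logs are received on port 2255 by the "tcplog/logstash" receiver and forwarded to clickhouselogsexporter.
- Traces are received over gRPC on port 4317, run through the processors, and forwarded to clickhousetraces.
The span/apim processor makes traces easier to read in the web UI. For spans from the engine and gateway services whose names match one of the listed method patterns (GET.*, PUT.*, POST.*, DELETE.*), it renames the span to the values of its http.method and http.target attributes joined by a space, provided those attributes are present.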
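
The renaming rule can be pictured with a small Python emulation. This is not the collector's code, only a sketch of the behavior described above: the constants mirror the span/apim settings in the config, `rename_span` is a hypothetical helper, and the unanchored regexp matching is an assumption about how the filter evaluates patterns.

```python
import re

# Values mirrored from the span/apim processor settings.
SERVICES = ["gateway", "engine"]
SPAN_NAMES = ["GET.*", "PUT.*", "POST.*", "DELETE.*"]
FROM_ATTRIBUTES = ["http.method", "http.target"]
SEPARATOR = " "


def _matches_any(patterns, value):
    # Assumption: patterns are searched unanchored, as a plain regexp match would be.
    return any(re.search(pattern, value) for pattern in patterns)


def rename_span(service, span_name, attributes):
    """Return the span name after applying a span/apim-style rename."""
    selected = _matches_any(SERVICES, service) and _matches_any(SPAN_NAMES, span_name)
    if not selected:
        return span_name  # not covered by the include filter
    if any(key not in attributes for key in FROM_ATTRIBUTES):
        return span_name  # a source attribute is missing, so the name is kept
    return SEPARATOR.join(str(attributes[key]) for key in FROM_ATTRIBUTES)
```

With this rule, a gateway span named `GET /**` carrying `http.method=GET` and `http.target=/api/v1/users` would show up as `GET /api/v1/users`, so requests to different paths no longer collapse into one generic span name.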

 

  • otel-collector-config.yaml
receivers:
  tcplog/logstash:
    max_log_size: 1MiB
    listen_address: "0.0.0.0:2255"
    attributes: {}
    resource: {}
    add_attributes: false
    operators: []
  opencensus:
    endpoint: 0.0.0.0:55678
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      # thrift_compact:
      #   endpoint: 0.0.0.0:6831
      # thrift_binary:
      #   endpoint: 0.0.0.0:6832
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      load: {}
      memory: {}
      disk: {}
      filesystem: {}
      network: {}
      process:
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
      processes: {}
  prometheus:
    config:
      global:
        scrape_interval: 10s
      scrape_configs:
        # otel-collector internal metrics
        - job_name: otel-collector
          static_configs:
          - targets:
              - localhost:8888
            labels:
              job_name: otel-collector
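        # To scrape the gateway's Actuator endpoint as well, uncomment the job below.
        # job_name must be unique across scrape_configs, so rename it before enabling.
        # - job_name: otel-collector
        #   metrics_path: "/actuator/prometheus"
        #   static_configs:
        #   - targets:
        #       - 192.168.2.13:38080
        #     labels:
        #       job_name: gateway-collector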
processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s
  signozspanmetrics/cumulative:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: 'signoz.collector.id'
  # memory_limiter:
  #   # 80% of maximum memory up to 2G
  #   limit_mib: 1500
  #   # 25% of limit up to 2G
  #   spike_limit_mib: 512
  #   check_interval: 5s
  #
  #   # 50% of the maximum memory
  #   limit_percentage: 50
  #   # 20% of max memory usage spike expected
  #   spike_limit_percentage: 20
  # queued_retry:
  #   num_workers: 4
  #   queue_size: 100
  #   retry_on_failure: true
  resourcedetection:
    # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
    detectors: [env, system] # include ec2 for AWS, gcp for GCP and azure for Azure.
    timeout: 2s
  signozspanmetrics/delta:
    metrics_exporter: clickhousemetricswrite
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1000ms, 1400ms, 2000ms, 5s, 10s, 20s, 40s, 60s ]
    dimensions_cache_size: 100000
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    enable_exp_histogram: true
    dimensions:
      - name: service.namespace
        default: default
      - name: deployment.environment
        default: default
      # This is added to ensure the uniqueness of the timeseries
      # Otherwise, identical timeseries produced by multiple replicas of
      # collectors result in incorrect APM metrics
      - name: signoz.collector.id
  span/apim:
    include:
      match_type: regexp
      services: ["gateway", "engine"]
      span_names: ["GET.*", "PUT.*", "POST.*", "DELETE.*"]
    name:
      separator: " "
      from_attributes: [http.method, http.target]

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
  pprof:
    endpoint: 0.0.0.0:1777

exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000/?database=signoz_traces
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  clickhousemetricswrite:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
    resource_to_telemetry_conversion:
      enabled: true
  clickhousemetricswrite/prometheus:
    endpoint: tcp://clickhouse:9000/?database=signoz_metrics
  # logging: {}

  clickhouselogsexporter:
    dsn: tcp://clickhouse:9000/
    docker_multi_node_cluster: ${DOCKER_MULTI_NODE_CLUSTER}
    timeout: 10s

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      processors: [span/apim, signozspanmetrics/cumulative, signozspanmetrics/delta, batch]
      exporters: [clickhousetraces]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [clickhousemetricswrite]
    metrics/generic:
      receivers: [hostmetrics]
      processors: [resourcedetection, batch]
      exporters: [clickhousemetricswrite]
    metrics/prometheus:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhousemetricswrite/prometheus]
    logs:
      # receivers: [otlp, tcplog/docker]
      receivers: [tcplog/logstash]
      processors: [batch]
      exporters: [clickhouselogsexporter]
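
When editing the pipelines, every name listed under a pipeline has to match a component defined in the top-level receivers, processors, or exporters sections; a pipeline that references an undefined component is a common reason for the collector failing to start. The check below is a minimal sketch of that idea. It works on an already-parsed configuration dictionary, `find_undefined_components` is a hypothetical helper, and the sample dictionary is a trimmed stand-in for the real file, not a copy of it.

```python
def find_undefined_components(config):
    """List (pipeline, kind, name) for each pipeline reference without a definition."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind) or {}
            for name in pipeline.get(kind, []):
                if name not in defined:
                    missing.append((pipeline_name, kind, name))
    return missing


# Trimmed stand-in for the collector config (component bodies omitted).
config = {
    "receivers": {"tcplog/logstash": {}, "otlp": {}, "jaeger": {}},
    "processors": {"batch": {}, "span/apim": {}},
    "exporters": {"clickhousetraces": {}, "clickhouselogsexporter": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["jaeger", "otlp"],
                "processors": ["span/apim", "batch"],
                "exporters": ["clickhousetraces"],
            },
            "logs": {
                # tcplog/docker is referenced here on purpose without being defined.
                "receivers": ["otlp", "tcplog/docker"],
                "processors": ["batch"],
                "exporters": ["clickhouselogsexporter"],
            },
        }
    },
}

print(find_undefined_components(config))
# → [('logs', 'receivers', 'tcplog/docker')]
```

To run the same check against the real file, the dictionary could be loaded with a YAML parser such as PyYAML (`yaml.safe_load`), which is an extra dependency not used elsewhere in this post.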

 

OTel Java agent configuration

OTEL_LOGS_EXPORTER=otlp OTEL_EXPORTER_OTLP_ENDPOINT="http://<host>:4317" OTEL_RESOURCE_ATTRIBUTES=service.name=myapp java -javaagent:/path/opentelemetry-javaagent.jar -jar target/*.jar

https://signoz.io/docs/userguide/collecting_application_logs_otel_sdk_java/

https://opentelemetry.io/docs/languages/java/automatic/