
ClickHouse High Availability (HA) Setup Guide - Separate Servers

by abstract.jiin 2025. 6. 19.


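Overview

This guide sets up a replicated (HA) ClickHouse database for the SigNoz monitoring system, with the two replicas running on separate servers.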

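System configuration

Architecture diagram

Servers

| Node | Server | IP address | Role | Containers |
|---|---|---|---|---|
| Node 1 | Server 13 | 192.168.2.13 | Primary + ZooKeeper | clickhouse-1, zookeeper-1, zookeeper-3 |
| Node 2 | Server 12 | 192.168.2.12 | Secondary | clickhouse-2, zookeeper-2 |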

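ClickHouse cluster settings

  • Cluster name: cluster
  • Shards: 1
  • Replicas: 2 (one per server)
  • Engine: ReplicatedMergeTree
  • Replication: bidirectional and automatic (an insert on either replica is copied to the other)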

ZooKeeper cluster

  • Nodes: 3 (an odd number is recommended)
  • Placement: Node 1 (2 nodes), Node 2 (1 node)
  • Client ports: 2181 (zookeeper-1), 2181 (zookeeper-2), 2183 (zookeeper-3)

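Network configuration

| Service | Port | Purpose | Firewall opening required |
|---|---|---|---|
| ClickHouse HTTP | 8123 | Web (HTTP) interface | Yes |
| ClickHouse TCP | 9000 | Native client connections | Yes |
| ZooKeeper | 2181, 2183 | Cluster coordination (client ports) | Yes |
| ZooKeeper | 2888 | Follower port (data synchronization) | Yes |
| ZooKeeper | 3888 | Election port (leader election) | Yes |
| ClickHouse | 9009 | Inter-server replication (internal protocol) | No external exposure (node-to-node only) |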
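
To confirm the firewall rules in the table above, something like the following could be run from each node against the other. This is a rough sketch, assuming bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command are available. `ha_endpoints`, `check_port`, and `check_all_ports` are hypothetical helper names, not part of ClickHouse or SigNoz, and only the client-facing and replication ports are listed.

```bash
#!/bin/bash
# Hypothetical helpers for checking the ports listed in this guide.

# Print "host port label" for each endpoint that must be reachable.
# Hosts and ports are taken from the server and network tables.
ha_endpoints() {
    cat <<'EOF'
192.168.2.13 8123 clickhouse-1-http
192.168.2.13 9000 clickhouse-1-tcp
192.168.2.13 9009 clickhouse-1-interserver
192.168.2.13 2181 zookeeper-1-client
192.168.2.13 2183 zookeeper-3-client
192.168.2.12 8123 clickhouse-2-http
192.168.2.12 9000 clickhouse-2-tcp
192.168.2.12 9009 clickhouse-2-interserver
192.168.2.12 2181 zookeeper-2-client
EOF
}

# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
check_port() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" </dev/null 2>/dev/null
}

# Walk the endpoint list and report each result.
check_all_ports() {
    ha_endpoints | while read -r host port label; do
        if check_port "$host" "$port"; then
            echo "OK   $label ($host:$port)"
        else
            echo "FAIL $label ($host:$port)"
        fi
    done
}
```

Running `check_all_ports` on Node 1 and again on Node 2 would show which direction, if any, is blocked.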

Key configuration files - Node 1

Node 1 (Server 13, 192.168.2.13)
  • docker-compose.yml
```yaml
x-clickhouse-defaults: &clickhouse-defaults
  restart: on-failure
  image: clickhouse/clickhouse-server:24.1.2-alpine
  tty: true
  depends_on:
    - zookeeper-1
    - zookeeper-3
  logging:
    options:
      max-size: 50m
      max-file: "3"
  healthcheck:
    test: ["CMD", "wget", "--spider", "-q", "0.0.0.0:8123/ping"]
    interval: 30s
    timeout: 5s
    retries: 3
  ulimits:
    nproc: 65535
    nofile:
      soft: 262144
      hard: 262144

x-db-depend: &db-depend
  depends_on:
    clickhouse:
      condition: service_healthy
    otel-collector-migrator-sync:
      condition: service_completed_successfully

services:

  zookeeper-1:
    image: bitnami/zookeeper:3.7.1
    container_name: zookeeper-1
    hostname: zookeeper-1
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    volumes:
      - ${TEST_DATA}/zookeeper-1:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=1
      - ZOO_SERVERS=0.0.0.0:2888:3888,zookeeper-2:2888:3888,zookeeper-3:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1
      - ZOO_TICK_TIME=2000
      - ZOO_INIT_LIMIT=10
      - ZOO_SYNC_LIMIT=5
      - ZOO_MAX_CLIENT_CNXNS=300
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
      interval: 10s
      timeout: 5s
      retries: 3

  zookeeper-3:
    image: bitnami/zookeeper:3.7.1
    container_name: zookeeper-3
    hostname: zookeeper-3
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "2183:2181"
      - "2890:2888"
      - "3890:3888"
    volumes:
      - ${TEST_DATA}/zookeeper-3:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=3
      - ZOO_SERVERS=zookeeper-1:2888:3888,zookeeper-2:2888:3888,0.0.0.0:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1
      - ZOO_TICK_TIME=2000
      - ZOO_INIT_LIMIT=10
      - ZOO_SYNC_LIMIT=5
      - ZOO_MAX_CLIENT_CNXNS=300
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
      interval: 10s
      timeout: 5s
      retries: 3

  clickhouse:
    <<: *clickhouse-defaults
    container_name: clickhouse-1
    hostname: clickhouse-1
    user: ${USER_UID}:${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:127.0.0.1"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "9000:9000"
      - "8123:8123"
      - "9181:9181"
      - "9009:9009"
    volumes:
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
      - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
      - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
      - ./clickhouse-cluster-server1.xml:/etc/clickhouse-server/config.d/cluster.xml
      - ${TEST_DATA}/clickhouse/:/var/lib/clickhouse/
    environment:
      - CLICKHOUSE_DISTRIBUTED_DDL_TASK_TIMEOUT=1800
      - CLICKHOUSE_CONNECTION_POOL_SIZE=1024
      - CLICKHOUSE_MAX_CONNECTIONS=4096
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=1
    depends_on:
      zookeeper-1:
        condition: service_healthy
      zookeeper-3:
        condition: service_healthy

  alertmanager:
    image: signoz/alertmanager:${ALERTMANAGER_TAG:-0.23.7}
    container_name: signoz-alertmanager
    user: ${USER_UID}
    volumes:
      - ${TEST_DATA}/alertmanager:/data
    depends_on:
      query-service:
        condition: service_healthy
    restart: on-failure
    command:
      - --queryService.url=http://query-service:8085
      - --storage.path=/data

  query-service:
    image: signoz/query-service:${DOCKER_TAG:-0.62.0}
    container_name: signoz-query-service
    command:
      [
        "-config=/root/config/prometheus.yml",
        "--use-logs-new-schema=true"
      ]
    volumes:
      - ./prometheus.yml:/root/config/prometheus.yml
      - ../dashboards:/root/config/dashboards
      - ${TEST_DATA}/signoz/:/var/lib/signoz/
    environment:
      - ClickHouseUrl=tcp://clickhouse-1:9000
      - ALERTMANAGER_API_PREFIX=http://alertmanager:9093/api/
      - SIGNOZ_LOCAL_DB_PATH=/var/lib/signoz/signoz.db
      - DASHBOARDS_PATH=/root/config/dashboards
      - STORAGE=clickhouse
      - GODEBUG=netdns=go
      - TELEMETRY_ENABLED=true
      - DEPLOYMENT_TYPE=docker-standalone-amd
    restart: on-failure
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8080/api/v1/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    <<: *db-depend

  frontend:
    image: signoz/frontend:${DOCKER_TAG:-0.62.0}
    container_name: signoz-frontend
    restart: on-failure
    depends_on:
      - alertmanager
      - query-service
    ports:
      - "3301:3301"
    volumes:
      - ../common/nginx-config.conf:/etc/nginx/conf.d/default.conf

  otel-collector-migrator-sync:
    image: signoz/signoz-schema-migrator:${OTELCOL_TAG:-0.111.15}
    container_name: otel-migrator-sync
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    environment:
      - DOCKER_MULTI_NODE_CLUSTER=true
      - CLICKHOUSE_DISTRIBUTED_DDL_TASK_TIMEOUT=1800
      - CLICKHOUSE_CONNECTION_TIMEOUT=30000
      - CLICKHOUSE_RECEIVE_TIMEOUT=600000
      - CLICKHOUSE_SEND_TIMEOUT=600000
    command:
      - "sync"
      - "--dsn=tcp://clickhouse-1:9000"
      - "--replication=true"
      - "--cluster-name=cluster"
      - "--up="
    depends_on:
      clickhouse:
        condition: service_healthy
    restart: on-failure

  otel-collector-migrator-async:
    image: signoz/signoz-schema-migrator:${OTELCOL_TAG:-0.111.15}
    container_name: otel-migrator-async
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:127.0.0.1"
      - "clickhouse-2:192.168.2.12"
    user: ${USER_UID}
    environment:
      - DOCKER_MULTI_NODE_CLUSTER=true
      - CLICKHOUSE_CONNECTION_TIMEOUT=30000
    command:
      - "async"
      - "--dsn=tcp://clickhouse-1:9000"
      - "--up="
    depends_on:
      clickhouse:
        condition: service_healthy
      otel-collector-migrator-sync:
        condition: service_completed_successfully
    restart: on-failure

  otel-collector:
    image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.111.15}
    container_name: signoz-otel-collector
    command:
      [
        "--config=/etc/otel-collector-config.yaml",
        "--manager-config=/etc/manager-config.yaml",
        "--copy-path=/var/tmp/collector-config.yaml",
        "--feature-gates=-pkg.translator.prometheus.NormalizeName"
      ]
    user: ${USER_UID}
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      - ./otel-collector-opamp-config.yaml:/etc/manager-config.yaml
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /:/hostfs:ro
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=TEST-prod,os.type=linux
      - DOCKER_MULTI_NODE_CLUSTER=true
      - LOW_CARDINAL_EXCEPTION_GROUPING=false
      - GOMEMLIMIT=4GiB
    ports:
      - "1777:1777"
      - "4317:4317"
      - "4318:4318"
      - "2255:2255"
      - "8888:8888"
      - "8889:8889"
      - "13133:13133"
      - "14250:14250"
      - "14268:14268"
    restart: on-failure
    deploy:
      resources:
        limits:
          memory: 6G
        reservations:
          memory: 4G
    depends_on:
      clickhouse:
        condition: service_healthy
      otel-collector-migrator-sync:
        condition: service_completed_successfully
      query-service:
        condition: service_healthy

```
  • clickhouse-cluster-server1.xml

    <?xml version="1.0"?>
    <clickhouse>
    
        <listen_host>0.0.0.0</listen_host>
        <interserver_http_host>192.168.2.13</interserver_http_host>
        <interserver_http_port>9009</interserver_http_port>
        <zookeeper>
            <node index="1">
                <host>zookeeper-1</host>
                <port>2181</port>
            </node>
            <node index="2">
                <host>zookeeper-2</host>
                <port>2181</port>
            </node>
            <node index="3">
                <host>zookeeper-3</host>
                <port>2183</port>
            </node>
        </zookeeper>
    
        <remote_servers>
            <cluster>
                <shard>
                    <internal_replication>true</internal_replication>
                    <replica>
                        <host>clickhouse-1</host>
                        <port>9000</port>
                        <user>default</user>
                        <password></password>
                    </replica>
                    <replica>
                        <host>clickhouse-2</host>
                        <port>9000</port>
                        <user>default</user>
                        <password></password>
                    </replica>
                </shard>
            </cluster>
        </remote_servers>
    
        <macros>
            <cluster>cluster</cluster>
            <shard>01</shard>
            <replica>replica_server_a</replica>
        </macros>
    </clickhouse>
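
The `macros` block is what lets a single `CREATE TABLE ... ON CLUSTER` statement resolve to a different replica identity on each node. The sketch below imitates that substitution in shell, purely to illustrate the result. `expand_macros` is a hypothetical helper; ClickHouse performs the real expansion internally, and the values a server actually loaded can be read from its `system.macros` table.

```bash
# Illustrative only: mimic how {cluster}, {shard} and {replica} are filled in.
#   $1 = template, $2 = cluster, $3 = shard, $4 = replica
expand_macros() {
    printf '%s\n' "$1" | sed \
        -e "s/{cluster}/$2/g" \
        -e "s/{shard}/$3/g" \
        -e "s/{replica}/$4/g"
}

# ZooKeeper path used by the replication test table in this guide.
# Both nodes resolve it to the same path because they hold the same shard.
expand_macros '/clickhouse/tables/{cluster}/test_replication' cluster 01 replica_server_a

# The replica name is what differs per node
# (replica_server_a on Node 1, replica_server_b on Node 2).
expand_macros '{replica}' cluster 01 replica_server_b
```

The key point is that the table path must be identical on both replicas while the replica name must be unique per node.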
    

Key configuration files - Node 2

Node 2 (Server 12, 192.168.2.12)
  • docker-compose.yml
```yaml
x-clickhouse-defaults: &clickhouse-defaults
  restart: on-failure
  image: clickhouse/clickhouse-server:24.1.2-alpine
  tty: true
  depends_on:
    - zookeeper-2
  logging:
    options:
      max-size: 50m
      max-file: "3"
  healthcheck:
    test: ["CMD", "wget", "--spider", "-q", "0.0.0.0:8123/ping"]
    interval: 30s
    timeout: 5s
    retries: 3
  ulimits:
    nproc: 65535
    nofile:
      soft: 262144
      hard: 262144

services:

  zookeeper-2:
    image: bitnami/zookeeper:3.7.1
    container_name: zookeeper-2
    hostname: zookeeper-2
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    volumes:
      - ${TEST_DATA}/zookeeper-2:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=2
      - ZOO_SERVERS=zookeeper-1:2888:3888,0.0.0.0:2888:3888,zookeeper-3:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1
      - ZOO_TICK_TIME=2000
      - ZOO_INIT_LIMIT=10
      - ZOO_SYNC_LIMIT=5
      - ZOO_MAX_CLIENT_CNXNS=300
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
      interval: 10s
      timeout: 5s
      retries: 3

  clickhouse:
    <<: *clickhouse-defaults
    container_name: clickhouse-2
    hostname: clickhouse-2
    user: ${USER_UID}:${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:127.0.0.1"
    ports:
      - "9000:9000"
      - "8123:8123"
      - "9181:9181"
      - "9009:9009"
    volumes:
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
      - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
      - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
      - ./clickhouse-cluster-server2.xml:/etc/clickhouse-server/config.d/cluster.xml
      - ${TEST_DATA}/clickhouse/:/var/lib/clickhouse/
    environment:
      - CLICKHOUSE_DISTRIBUTED_DDL_TASK_TIMEOUT=1800
      - CLICKHOUSE_CONNECTION_POOL_SIZE=1024
      - CLICKHOUSE_MAX_CONNECTIONS=4096
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=1
    depends_on:
      zookeeper-2:
        condition: service_healthy

```
  • clickhouse-cluster-server2.xml

    <?xml version="1.0"?>
    <clickhouse>
    
        <!-- Network settings -->
        <listen_host>0.0.0.0</listen_host>
        <interserver_http_host>192.168.2.12</interserver_http_host>
        <interserver_http_port>9009</interserver_http_port>
    
        <!-- ZooKeeper settings -->
        <zookeeper>
            <node index="1">
                <host>zookeeper-1</host>
                <port>2181</port>
            </node>
            <node index="2">
                <host>zookeeper-2</host>
                <port>2181</port>
            </node>
            <node index="3">
                <host>zookeeper-3</host>
                <port>2183</port>
            </node>
        </zookeeper>
    
        <remote_servers>
            <cluster>
                <shard>
                    <internal_replication>true</internal_replication>
                    <replica>
                        <host>clickhouse-1</host>
                        <port>9000</port>
                        <user>default</user>
                        <password></password>
                    </replica>
                    <replica>
                        <host>clickhouse-2</host>
                        <port>9000</port>
                        <user>default</user>
                        <password></password>
                    </replica>
                </shard>
            </cluster>
        </remote_servers>
    
        <macros>
            <cluster>cluster</cluster> 
            <shard>01</shard>
            <replica>replica_server_b</replica>
        </macros>
    </clickhouse>
    

Test results

Final verification results (2025-06-18)

=== Final test results ===
Total tests: 21
Passed: 21 ✅
Failed: 0 ❌

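Detailed test items

| Test item | Server A | Server B | Status |
|---|---|---|---|
| Basic connectivity | OK | OK | Pass |
| Cluster recognition | OK | OK | Pass |
| SigNoz table count | 38 | 38 | Match |
| Distributed table count | 15 | 15 | Match |
| span_attributes | 123 rows | 123 rows | In sync |
| signoz_spans | 190 rows | 190 rows | In sync |
| traces_v3_resource | 3 rows | 3 rows | In sync |
| Response time | 4 ms | 4 ms | Fast |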

Test script and manual test commands

clickhouse_ha_test.sh: comprehensive HA status test

  • clickhouse_ha_test.sh

    #!/bin/bash
    # clickhouse_ha_test.sh - ClickHouse cross-server HA test
    
    echo "=== ClickHouse cross-server HA test ==="
    
    # Server settings
    # Note: here server A is Node 2 (192.168.2.12) and server B is Node 1 (192.168.2.13)
    SERVER_A_IP="192.168.2.12"
    SERVER_B_IP="192.168.2.13"
    SERVER_A_PORT="8123"
    SERVER_B_PORT="8123"
    
    echo "Test targets:"
    echo "  - Server A: $SERVER_A_IP:$SERVER_A_PORT"
    echo "  - Server B: $SERVER_B_IP:$SERVER_B_PORT"
    echo ""
    
    # Color definitions
    RED='\033[0;31m'
    GREEN='\033[0;32m'
    YELLOW='\033[1;33m'
    BLUE='\033[0;34m'
    NC='\033[0m' # No Color
    
    # Test result counters
    PASS_COUNT=0
    FAIL_COUNT=0
    
    # Test helper: run one query over HTTP and record pass/fail
    test_query() {
        local server_name="$1"
        local server_url="$2"
        local query="$3"
        local description="$4"
    
        echo -e "${BLUE}[TEST]${NC} $description"
        echo "  Server: $server_name ($server_url)"
        echo "  Query: $query"
    
        result=$(curl -s "$server_url/?query=$(echo "$query" | sed 's/ /%20/g')" 2>/dev/null)
    
        if [ $? -eq 0 ] && [ -n "$result" ]; then
            echo -e "  ${GREEN}✓ OK${NC}: $result"
            ((PASS_COUNT++))
            return 0
        else
            echo -e "  ${RED}✗ FAIL${NC}: connection failed or empty response"
            ((FAIL_COUNT++))
            return 1
        fi
    }
    
    # 1. Basic connectivity test
    echo -e "\n${YELLOW}=== 1. Basic connectivity test ===${NC}"
    
    test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT 'Server A Connected' as status" "Server A connectivity"
    test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT 'Server B Connected' as status" "Server B connectivity"
    
    # 2. Cluster status check
    echo -e "\n${YELLOW}=== 2. Cluster status check ===${NC}"
    
    echo -e "${BLUE}[TEST]${NC} Server A cluster configuration"
    curl -s "http://$SERVER_A_IP:$SERVER_A_PORT/?query=SELECT%20cluster,%20shard_num,%20replica_num,%20host_name,%20host_address%20FROM%20system.clusters%20WHERE%20cluster='cluster'%20FORMAT%20Pretty" || echo "Failed to query cluster information"
    
    echo -e "\n${BLUE}[TEST]${NC} Server B cluster configuration"
    curl -s "http://$SERVER_B_IP:$SERVER_B_PORT/?query=SELECT%20cluster,%20shard_num,%20replica_num,%20host_name,%20host_address%20FROM%20system.clusters%20WHERE%20cluster='cluster'%20FORMAT%20Pretty" || echo "Failed to query cluster information"
    
    # 3. SigNoz table existence check
    echo -e "\n${YELLOW}=== 3. SigNoz table existence check ===${NC}"
    
    test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces'" "Server A SigNoz table count"
    test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces'" "Server B SigNoz table count"
    
    # 4. Distributed table check
    echo -e "\n${YELLOW}=== 4. Distributed table check ===${NC}"
    
    test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces' AND engine = 'Distributed'" "Server A Distributed table count"
    test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces' AND engine = 'Distributed'" "Server B Distributed table count"
    
    # 5. Core table data check
    echo -e "\n${YELLOW}=== 5. Core table data check ===${NC}"
    
    CORE_TABLES=("span_attributes" "signoz_spans" "traces_v3_resource")
    
    for table in "${CORE_TABLES[@]}"; do
        echo -e "\n${BLUE}[TABLE]${NC} $table"
        test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM signoz_traces.$table" "Server A $table row count"
        test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM signoz_traces.$table" "Server B $table row count"
    done
    
    # 6. Distributed table functional test
    echo -e "\n${YELLOW}=== 6. Distributed table functional test ===${NC}"
    
    DISTRIBUTED_TABLES=("distributed_span_attributes" "distributed_signoz_spans" "distributed_traces_v3_resource")
    
    for table in "${DISTRIBUTED_TABLES[@]}"; do
        echo -e "\n${BLUE}[Distributed table]${NC} $table"
        test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM signoz_traces.$table" "Server A $table distributed query"
        test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM signoz_traces.$table" "Server B $table distributed query"
    done
    
    # 7. Failover test (optional)
    echo -e "\n${YELLOW}=== 7. Failover test ===${NC}"
    
    echo -e "${BLUE}[INFO]${NC} Manual failover test procedure:"
    echo "  1. Stop server A: ssh user@$SERVER_A_IP 'cd /path/to/signoz && docker compose stop clickhouse'"
    echo "  2. Query server B: curl \"http://$SERVER_B_IP:$SERVER_B_PORT/?query=SELECT%20count()%20FROM%20signoz_traces.span_attributes\""
    echo "  3. Restart server A: ssh user@$SERVER_A_IP 'cd /path/to/signoz && docker compose start clickhouse'"
    
    # 8. Simple performance test
    echo -e "\n${YELLOW}=== 8. Simple performance test ===${NC}"
    
    echo -e "${BLUE}[TEST]${NC} Response time per server"
    
    for server in "A:$SERVER_A_IP:$SERVER_A_PORT" "B:$SERVER_B_IP:$SERVER_B_PORT"; do
        IFS=':' read -r name ip port <<< "$server"
        echo "Server $name response time:"
    
        start_time=$(date +%s%N)
        curl -s "http://$ip:$port/?query=SELECT%20count()%20FROM%20signoz_traces.span_attributes" > /dev/null 2>&1
        curl_status=$?   # save curl's exit status before the next command overwrites $?
        end_time=$(date +%s%N)
    
        if [ "$curl_status" -eq 0 ]; then
            duration=$(( (end_time - start_time) / 1000000 ))
            echo "  ✓ $duration ms"
        else
            echo "  ✗ no response"
        fi
    done
    
    # 9. Data consistency check (important!)
    echo -e "\n${YELLOW}=== 9. Data consistency check ===${NC}"
    
    echo -e "${BLUE}[TEST]${NC} Row count comparison between servers"
    
    COMPARE_TABLES=("span_attributes" "signoz_spans" "traces_v3_resource")
    
    for table in "${COMPARE_TABLES[@]}"; do
        echo "Table: $table"
    
        count_a=$(curl -s "http://$SERVER_A_IP:$SERVER_A_PORT/?query=SELECT%20count()%20FROM%20signoz_traces.$table" 2>/dev/null)
        count_b=$(curl -s "http://$SERVER_B_IP:$SERVER_B_PORT/?query=SELECT%20count()%20FROM%20signoz_traces.$table" 2>/dev/null)
    
        # Require a non-empty answer so that two failed requests do not count as a match
        if [ -n "$count_a" ] && [ "$count_a" = "$count_b" ]; then
            echo -e "  ${GREEN}✓ match${NC}: server A($count_a) = server B($count_b)"
            ((PASS_COUNT++))
        else
            echo -e "  ${RED}✗ mismatch${NC}: server A($count_a) ≠ server B($count_b)"
            echo -e "    ${YELLOW}⚠ data synchronization needed${NC}"
            ((FAIL_COUNT++))
        fi
    done
    
    # 10. Final results
    echo -e "\n${YELLOW}=== Final test results ===${NC}"
    
    total_tests=$((PASS_COUNT + FAIL_COUNT))
    echo "Total tests: $total_tests"
    echo -e "Passed: ${GREEN}$PASS_COUNT${NC}"
    echo -e "Failed: ${RED}$FAIL_COUNT${NC}"
    
    if [ $FAIL_COUNT -eq 0 ]; then
        echo -e "\n${GREEN}🎉 All tests passed! ClickHouse cross-server replication is working correctly.${NC}"
    elif [ $FAIL_COUNT -lt 3 ]; then
        echo -e "\n${YELLOW}⚠ Some checks failed, but basic functionality works. Review the failed items.${NC}"
    else
        echo -e "\n${RED}❌ Serious problems detected. Re-check the configuration.${NC}"
    fi
    
    echo -e "\n=== Test complete ==="
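
The `test_query` function in the script encodes queries with `sed 's/ /%20/g'`, which only handles spaces. A more robust variant, sketched here under the assumption that `curl` is available on the test host, lets curl do the encoding with `-G --data-urlencode` and adds `--fail` so that an HTTP error response from ClickHouse is not mistaken for a result. `ch_query` and `count_test_tables` are hypothetical helper names.

```bash
# Hypothetical helper: run one query against the ClickHouse HTTP interface.
#   $1 = base URL, e.g. http://192.168.2.13:8123
#   $2 = SQL text
# -G sends the data as URL query parameters, --data-urlencode escapes the
# reserved characters in the query text (spaces, quotes, %, =), and --fail
# makes curl return non-zero on HTTP 4xx/5xx instead of printing the error body.
ch_query() {
    curl -sS --fail -G "$1/" --data-urlencode "query=$2"
}

# Example: count the replication test tables on one node.
# The literal % in the LIKE pattern needs no manual %25 escaping here.
count_test_tables() {
    ch_query "$1" "SELECT count() FROM system.tables WHERE name LIKE '%test_replication%'"
}
```

With this helper, `count_test_tables http://192.168.2.12:8123` replaces the hand-encoded URL used in the manual checks.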
    

Creating a test table and inserting data

  • Commands and results
```bash
# docker exec clickhouse-1 clickhouse-client --query "
> CREATE TABLE IF NOT EXISTS test_replication ON CLUSTER cluster (
>     id UInt64,
>     timestamp DateTime DEFAULT now(),
>     server_name String,
>     message String,
>     test_data String
> ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/test_replication', '{replica}')
> ORDER BY (id, timestamp)
> PARTITION BY toYYYYMM(timestamp)
> "
clickhouse-2    9000    0        1    0
clickhouse-1    9000    0        0    0

# docker exec clickhouse-1 clickhouse-client --query "
> CREATE TABLE IF NOT EXISTS distributed_test_replication ON CLUSTER cluster AS test_replication
> ENGINE = Distributed('cluster', 'default', 'test_replication')
> "
clickhouse-2    9000    0        1    0
clickhouse-1    9000    0        0    0


# curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20system.tables%20WHERE%20name%20LIKE%20'%25test_replication%25'"
2


# curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20system.tables%20WHERE%20name%20LIKE%20'%25test_replication%25'"
2


# docker exec clickhouse-1 clickhouse-client --query "
> INSERT INTO test_replication (id, server_name, message, test_data) VALUES
> (1, 'server-13', 'Hello from server 13', 'test_data_1'),
> (2, 'server-13', 'Replication test from 13', 'test_data_2'),
> (3, 'server-13', 'Third message from 13', 'test_data_3')
> "


# curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20test_replication"
3


# curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20test_replication"
3


# curl -s "http://192.168.2.12:8123/" -d "INSERT INTO test_replication (id, server_name, message, test_data) VALUES (4, 'server-12', 'Hello from server 12', 'test_data_4'), (5, 'server-12', 'Reverse replication test', 'test_data_5')"

# curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20test_replication"
5


# curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20test_replication"
5

# curl -s "http://192.168.2.12:8123/?query=SELECT%20id,%20server_name,%20message%20FROM%20test_replication%20ORDER%20BY%20id%20FORMAT%20PrettyCompact"
┌─id─┬─server_name─┬─message──────────────────┐
│  1 │ server-13   │ Hello from server 13     │
│  2 │ server-13   │ Replication test from 13 │
│  3 │ server-13   │ Third message from 13    │
└────┴─────────────┴──────────────────────────┘
┌─id─┬─server_name─┬─message──────────────────┐
│  4 │ server-12   │ Hello from server 12     │
│  5 │ server-12   │ Reverse replication test │
└────┴─────────────┴──────────────────────────┘

# curl -s "http://192.168.2.13:8123/?query=SELECT%20id,%20server_name,%20message%20FROM%20test_replication%20ORDER%20BY%20id%20FORMAT%20PrettyCompact"
┌─id─┬─server_name─┬─message──────────────────┐
│  1 │ server-13   │ Hello from server 13     │
│  2 │ server-13   │ Replication test from 13 │
│  3 │ server-13   │ Third message from 13    │
└────┴─────────────┴──────────────────────────┘
┌─id─┬─server_name─┬─message──────────────────┐
│  4 │ server-12   │ Hello from server 12     │
│  5 │ server-12   │ Reverse replication test │
└────┴─────────────┴──────────────────────────┘

```

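HA feature verification

  • Bidirectional replication: data inserted on either server is replicated automatically
  • Near-real-time sync: data is synchronized within 5 seconds
  • Distributed queries: unified reads through the Distributed tables
  • Failover: service continues when one server is down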
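
The 5-second figure can be measured rather than assumed. The following is a minimal sketch that polls two commands until their outputs agree or a time limit expires. `wait_for_match` is a hypothetical helper, and the commands passed to it are whatever count queries should be compared.

```bash
# Hypothetical helper: poll until two commands print the same non-empty
# output, or give up after a time limit.
#   $1 = limit in seconds, $2 = command A, $3 = command B
# Prints the elapsed whole seconds on success; returns 1 on timeout.
wait_for_match() {
    local limit="$1" cmd_a="$2" cmd_b="$3"
    local elapsed=0 out_a out_b
    while [ "$elapsed" -le "$limit" ]; do
        out_a=$(eval "$cmd_a" 2>/dev/null)
        out_b=$(eval "$cmd_b" 2>/dev/null)
        if [ -n "$out_a" ] && [ "$out_a" = "$out_b" ]; then
            echo "$elapsed"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage right after an INSERT on one node (addresses from this guide):
#   wait_for_match 5 \
#     "curl -s 'http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20test_replication'" \
#     "curl -s 'http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20test_replication'"
```

The resolution is one second, so this only confirms the order of magnitude of the replication delay, not a precise latency.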

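High availability

Failure scenarios

  1. Node 1 (Server 13) fails
    • Two ZooKeeper nodes stop → the ensemble loses its majority and the cluster is suspended until Node 1 returns
    • ClickHouse-1 stops → Node 2 keeps serving reads; replicated tables are read-only while ZooKeeper has no quorum, so inserts fail until it is restored
  2. Node 2 (Server 12) fails
    • One ZooKeeper node stops → cluster stays healthy (majority maintained)
    • ClickHouse-2 stops → service continues on Node 1

* If the three ZooKeeper nodes are placed on three separate hosts, losing any single host still leaves a majority and the cluster keeps working.

* Running a single ZooKeeper node is also possible, but a failure of that node then halts the entire ClickHouse cluster.
For production, a ZooKeeper ensemble of at least three nodes is required.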
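
The scenarios above follow from ZooKeeper's majority rule: an ensemble of N nodes needs floor(N/2) + 1 of them alive. A small sketch of that arithmetic, with hypothetical helper names:

```bash
# Majority needed for an ensemble of $1 ZooKeeper nodes.
zk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Does the ensemble keep quorum after a host failure?
#   $1 = total nodes, $2 = nodes lost together with the failed host
zk_survives() {
    [ $(( $1 - $2 )) -ge "$(zk_quorum "$1")" ]
}

# Layout in this guide: 3 nodes, 2 on Node 1 and 1 on Node 2.
if zk_survives 3 2; then echo "Node 1 down: quorum kept"; else echo "Node 1 down: quorum lost"; fi
if zk_survives 3 1; then echo "Node 2 down: quorum kept"; else echo "Node 2 down: quorum lost"; fi
```

With one ZooKeeper node on each of three hosts, every single-host failure corresponds to `zk_survives 3 1`, which is why that layout tolerates the loss of any one host.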

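Data protection

  • Automatic replication: all data is replicated to both servers in near real time
  • Consistency: ZooKeeper coordinates the replication log so that both replicas converge to the same data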
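  • Backup effect: replication keeps a live second copy that covers the loss of one server; deletes and bad writes are replicated too, so it does not replace separate backups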

Monitoring

Regular status checks

# Check cluster status
docker exec clickhouse-1 clickhouse-client --query "
SELECT host_name, host_address, is_local, errors_count
FROM system.clusters WHERE cluster = 'cluster'"

# Check replication status
docker exec clickhouse-1 clickhouse-client --query "
SELECT database, table, replica_name, is_leader, absolute_delay, active_replicas
FROM system.replicas WHERE database LIKE 'signoz%'"
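
The replication query above can be turned into a simple alert. As a sketch, assuming the output is in the tab-separated format that `clickhouse-client --query` prints by default and the columns are in the order shown above, the hypothetical `flag_unhealthy_replicas` filter below prints any table whose delay exceeds a threshold or whose active replica count is below 2. The 60-second default is an arbitrary illustration, not a recommended value.

```bash
# Hypothetical filter: read TSV rows with the columns
#   database, table, replica_name, is_leader, absolute_delay, active_replicas
# from stdin and print the ones that look unhealthy.
#   $1 = maximum acceptable absolute_delay in seconds (default 60)
# Returns 1 if any row was flagged, 0 otherwise.
flag_unhealthy_replicas() {
    awk -F '\t' -v max_delay="${1:-60}" '
        $5 > max_delay || $6 < 2 {
            printf "%s.%s delay=%ss active_replicas=%s\n", $1, $2, $5, $6
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage:
#   docker exec clickhouse-1 clickhouse-client --query "
#   SELECT database, table, replica_name, is_leader, absolute_delay, active_replicas
#   FROM system.replicas WHERE database LIKE 'signoz%'" | flag_unhealthy_replicas 60
```

Because the function returns non-zero when something is flagged, it can be dropped into a cron job or a health check without extra parsing.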

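References

Troubleshooting

  • Network connectivity problems: use the HTTP API (port 8123), as the test commands in this guide do
  • otel-migrator-sync does not run
    Log: DDL queue problem
    Fix: change the ZOO_SERVERS format of the ZooKeeper cluster
    • Use 0.0.0.0 for the node itself and hostnames for the other cluster members
      - ZOO_SERVERS=0.0.0.0:2888:3888,zookeeper-2:2888:3888,zookeeper-3:2888:3888
  • Containers cannot resolve hosts on the other server: specify extra_hosts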
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
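
The ZOO_SERVERS rule above (own entry as `0.0.0.0`, peers by hostname) is easy to get wrong when the line is copied between nodes. A minimal sketch that generates the value for a given server ID, assuming the `zookeeper-N` hostname pattern and the 2888/3888 ports used in this guide; `zoo_servers` is a hypothetical helper:

```bash
# Build the ZOO_SERVERS value for one ensemble member.
#   $1 = this server's ID (same as ZOO_SERVER_ID), $2 = ensemble size (default 3)
zoo_servers() {
    local self="$1" total="${2:-3}" i=1 out="" host
    while [ "$i" -le "$total" ]; do
        if [ "$i" -eq "$self" ]; then
            host="0.0.0.0"          # own entry: bind on all interfaces
        else
            host="zookeeper-$i"     # peers: resolved through extra_hosts
        fi
        out="${out:+$out,}$host:2888:3888"
        i=$((i + 1))
    done
    echo "$out"
}

# Values for the three containers in this guide.
zoo_servers 1
zoo_servers 2
zoo_servers 3
```

The three printed lines match the `ZOO_SERVERS` entries in the two docker-compose.yml files.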