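ClickHouse High Availability (HA) Configuration Guide - Separate Servers
Overview
Building a redundant (HA) ClickHouse database for the SigNoz monitoring system.
System Configuration
Architecture diagram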

Servers
| Node | Server | IP address | Role | Containers |
|---|---|---|---|---|
| Node 1 | Server 13 | 192.168.2.13 | Primary + ZooKeeper | clickhouse-1, zookeeper-1, zookeeper-3 |
| Node 2 | Server 12 | 192.168.2.12 | Secondary | clickhouse-2, zookeeper-2 |
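ClickHouse Cluster Settings
- Cluster name: cluster
- Shards: 1
- Replicas: 2 (one per server)
- Engine: ReplicatedMergeTree
- Replication mode: bidirectional, automatic
ZooKeeper Cluster
- Nodes: 3 (an odd number is recommended)
- Distribution: Node 1 (2 nodes), Node 2 (1 node)
- Client ports on the hosts: 2181 (zookeeper-1), 2181 (zookeeper-2), 2183 (zookeeper-3)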
Network Configuration
| Service | Port | Purpose | Firewall opening required |
|---|---|---|---|
| ClickHouse HTTP | 8123 | HTTP (web) interface | Yes |
| ClickHouse TCP | 9000 | Client connections | Yes |
| ZooKeeper | 2181, 2183 | Cluster coordination | Yes |
| ZooKeeper | 2888 | Follower port (data synchronization) | Yes |
| ZooKeeper | 3888 | Election port (leader election) | Yes |
| ClickHouse | 9009 | Inter-server replication traffic (internal protocol) | No |
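Before starting the containers, it can help to confirm that the ports in the table are reachable from the other server. The following is a minimal sketch, not part of the original setup: `check_port` is a hypothetical helper that relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command, and the 2-second limit is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): probe one TCP port.
# Uses bash's /dev/tcp pseudo-device; `timeout` keeps a filtered port
# from hanging the check for long.
check_port() {
    local host="$1" port="$2"
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open ${host}:${port}"
    else
        echo "closed ${host}:${port}"
        return 1
    fi
}

# Probe every port listed in the table for one peer.
check_peer() {
    local host="$1" port
    for port in 8123 9000 2181 2888 3888 9009; do
        check_port "$host" "$port"
    done
}
```

Running `check_peer 192.168.2.12` from Node 1 (and `check_peer 192.168.2.13` from Node 2) prints one line per port. A port reported as closed may simply mean the container is not running yet.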
Key Configuration Files - Node 1
| Node 1 | Server 13 | 192.168.2.13 |
|---|---|---|

- docker-compose.yml
```yaml
# Shared ClickHouse settings, reused through a YAML anchor
x-clickhouse-defaults: &clickhouse-defaults
  restart: on-failure
  image: clickhouse/clickhouse-server:24.1.2-alpine
  tty: true
  depends_on:
    - zookeeper-1
    - zookeeper-3
  logging:
    options:
      max-size: 50m
      max-file: "3"
  healthcheck:
    test: ["CMD", "wget", "--spider", "-q", "0.0.0.0:8123/ping"]
    interval: 30s
    timeout: 5s
    retries: 3
  ulimits:
    nproc: 65535
    nofile:
      soft: 262144
      hard: 262144
# Shared dependency block for services that need a migrated database
x-db-depend: &db-depend
  depends_on:
    clickhouse:
      condition: service_healthy
    otel-collector-migrator-sync:
      condition: service_completed_successfully
services:
  zookeeper-1:
    image: bitnami/zookeeper:3.7.1
    container_name: zookeeper-1
    hostname: zookeeper-1
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    volumes:
      - ${TEST_DATA}/zookeeper-1:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=1
      # The node itself is listed as 0.0.0.0, the other members by hostname
      - ZOO_SERVERS=0.0.0.0:2888:3888,zookeeper-2:2888:3888,zookeeper-3:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1
      - ZOO_TICK_TIME=2000
      - ZOO_INIT_LIMIT=10
      - ZOO_SYNC_LIMIT=5
      - ZOO_MAX_CLIENT_CNXNS=300
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
      interval: 10s
      timeout: 5s
      retries: 3
  zookeeper-3:
    image: bitnami/zookeeper:3.7.1
    container_name: zookeeper-3
    hostname: zookeeper-3
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    ports:
      # Host ports are shifted because zookeeper-1 already uses the defaults
      - "2183:2181"
      - "2890:2888"
      - "3890:3888"
    volumes:
      - ${TEST_DATA}/zookeeper-3:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=3
      - ZOO_SERVERS=zookeeper-1:2888:3888,zookeeper-2:2888:3888,0.0.0.0:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1
      - ZOO_TICK_TIME=2000
      - ZOO_INIT_LIMIT=10
      - ZOO_SYNC_LIMIT=5
      - ZOO_MAX_CLIENT_CNXNS=300
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
      interval: 10s
      timeout: 5s
      retries: 3
  clickhouse:
    <<: *clickhouse-defaults
    container_name: clickhouse-1
    hostname: clickhouse-1
    user: ${USER_UID}:${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:127.0.0.1"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "9000:9000"
      - "8123:8123"
      - "9181:9181"
      - "9009:9009"
    volumes:
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
      - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
      - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
      - ./clickhouse-cluster-server1.xml:/etc/clickhouse-server/config.d/cluster.xml
      - ${TEST_DATA}/clickhouse/:/var/lib/clickhouse/
    environment:
      - CLICKHOUSE_DISTRIBUTED_DDL_TASK_TIMEOUT=1800
      - CLICKHOUSE_CONNECTION_POOL_SIZE=1024
      - CLICKHOUSE_MAX_CONNECTIONS=4096
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=1
    depends_on:
      zookeeper-1:
        condition: service_healthy
      zookeeper-3:
        condition: service_healthy
  alertmanager:
    image: signoz/alertmanager:${ALERTMANAGER_TAG:-0.23.7}
    container_name: signoz-alertmanager
    user: ${USER_UID}
    volumes:
      - ${TEST_DATA}/alertmanager:/data
    depends_on:
      query-service:
        condition: service_healthy
    restart: on-failure
    command:
      - --queryService.url=http://query-service:8085
      - --storage.path=/data
  query-service:
    image: signoz/query-service:${DOCKER_TAG:-0.62.0}
    container_name: signoz-query-service
    command:
      [
        "-config=/root/config/prometheus.yml",
        "--use-logs-new-schema=true"
      ]
    volumes:
      - ./prometheus.yml:/root/config/prometheus.yml
      - ../dashboards:/root/config/dashboards
      - ${TEST_DATA}/signoz/:/var/lib/signoz/
    environment:
      - ClickHouseUrl=tcp://clickhouse-1:9000
      - ALERTMANAGER_API_PREFIX=http://alertmanager:9093/api/
      - SIGNOZ_LOCAL_DB_PATH=/var/lib/signoz/signoz.db
      - DASHBOARDS_PATH=/root/config/dashboards
      - STORAGE=clickhouse
      - GODEBUG=netdns=go
      - TELEMETRY_ENABLED=true
      - DEPLOYMENT_TYPE=docker-standalone-amd
    restart: on-failure
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8080/api/v1/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    <<: *db-depend
  frontend:
    image: signoz/frontend:${DOCKER_TAG:-0.62.0}
    container_name: signoz-frontend
    restart: on-failure
    depends_on:
      - alertmanager
      - query-service
    ports:
      - "3301:3301"
    volumes:
      - ../common/nginx-config.conf:/etc/nginx/conf.d/default.conf
  otel-collector-migrator-sync:
    image: signoz/signoz-schema-migrator:${OTELCOL_TAG:-0.111.15}
    container_name: otel-migrator-sync
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    environment:
      - DOCKER_MULTI_NODE_CLUSTER=true
      - CLICKHOUSE_DISTRIBUTED_DDL_TASK_TIMEOUT=1800
      - CLICKHOUSE_CONNECTION_TIMEOUT=30000
      - CLICKHOUSE_RECEIVE_TIMEOUT=600000
      - CLICKHOUSE_SEND_TIMEOUT=600000
    command:
      - "sync"
      - "--dsn=tcp://clickhouse-1:9000"
      - "--replication=true"
      - "--cluster-name=cluster"
      - "--up="
    depends_on:
      clickhouse:
        condition: service_healthy
    restart: on-failure
  otel-collector-migrator-async:
    image: signoz/signoz-schema-migrator:${OTELCOL_TAG:-0.111.15}
    container_name: otel-migrator-async
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:127.0.0.1"
      - "clickhouse-2:192.168.2.12"
    user: ${USER_UID}
    environment:
      - DOCKER_MULTI_NODE_CLUSTER=true
      - CLICKHOUSE_CONNECTION_TIMEOUT=30000
    command:
      - "async"
      - "--dsn=tcp://clickhouse-1:9000"
      - "--up="
    depends_on:
      clickhouse:
        condition: service_healthy
      otel-collector-migrator-sync:
        condition: service_completed_successfully
    restart: on-failure
  otel-collector:
    image: signoz/signoz-otel-collector:${OTELCOL_TAG:-0.111.15}
    container_name: signoz-otel-collector
    command:
      [
        "--config=/etc/otel-collector-config.yaml",
        "--manager-config=/etc/manager-config.yaml",
        "--copy-path=/var/tmp/collector-config.yaml",
        "--feature-gates=-pkg.translator.prometheus.NormalizeName"
      ]
    user: ${USER_UID}
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
      - ./otel-collector-opamp-config.yaml:/etc/manager-config.yaml
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /:/hostfs:ro
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=TEST-prod,os.type=linux
      - DOCKER_MULTI_NODE_CLUSTER=true
      - LOW_CARDINAL_EXCEPTION_GROUPING=false
      - GOMEMLIMIT=4GiB
    ports:
      - "1777:1777"
      - "4317:4317"
      - "4318:4318"
      - "2255:2255"
      - "8888:8888"
      - "8889:8889"
      - "13133:13133"
      - "14250:14250"
      - "14268:14268"
    restart: on-failure
    deploy:
      resources:
        limits:
          memory: 6G
        reservations:
          memory: 4G
    depends_on:
      clickhouse:
        condition: service_healthy
      otel-collector-migrator-sync:
        condition: service_completed_successfully
      query-service:
        condition: service_healthy
```
- clickhouse-cluster-server1.xml
```xml
<?xml version="1.0"?>
<clickhouse>
    <!-- Network settings -->
    <listen_host>0.0.0.0</listen_host>
    <interserver_http_host>192.168.2.13</interserver_http_host>
    <interserver_http_port>9009</interserver_http_port>
    <!-- ZooKeeper settings -->
    <zookeeper>
        <node index="1">
            <host>zookeeper-1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>zookeeper-2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>zookeeper-3</host>
            <port>2183</port>
        </node>
    </zookeeper>
    <remote_servers>
        <cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>clickhouse-2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </cluster>
    </remote_servers>
    <macros>
        <cluster>cluster</cluster>
        <shard>01</shard>
        <replica>replica_server_a</replica>
    </macros>
</clickhouse>
```
Key Configuration Files - Node 2
| Node 2 | Server 12 | 192.168.2.12 |
|---|---|---|

- docker-compose.yml
```yaml
# Shared ClickHouse settings, reused through a YAML anchor
x-clickhouse-defaults: &clickhouse-defaults
  restart: on-failure
  image: clickhouse/clickhouse-server:24.1.2-alpine
  tty: true
  depends_on:
    - zookeeper-2
  logging:
    options:
      max-size: 50m
      max-file: "3"
  healthcheck:
    test: ["CMD", "wget", "--spider", "-q", "0.0.0.0:8123/ping"]
    interval: 30s
    timeout: 5s
    retries: 3
  ulimits:
    nproc: 65535
    nofile:
      soft: 262144
      hard: 262144
services:
  zookeeper-2:
    image: bitnami/zookeeper:3.7.1
    container_name: zookeeper-2
    hostname: zookeeper-2
    user: ${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:192.168.2.12"
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    volumes:
      - ${TEST_DATA}/zookeeper-2:/bitnami/zookeeper
    environment:
      - ZOO_SERVER_ID=2
      # The node itself is listed as 0.0.0.0, the other members by hostname
      - ZOO_SERVERS=zookeeper-1:2888:3888,0.0.0.0:2888:3888,zookeeper-3:2888:3888
      - ALLOW_ANONYMOUS_LOGIN=yes
      - ZOO_AUTOPURGE_INTERVAL=1
      - ZOO_TICK_TIME=2000
      - ZOO_INIT_LIMIT=10
      - ZOO_SYNC_LIMIT=5
      - ZOO_MAX_CLIENT_CNXNS=300
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
      interval: 10s
      timeout: 5s
      retries: 3
  clickhouse:
    <<: *clickhouse-defaults
    container_name: clickhouse-2
    hostname: clickhouse-2
    user: ${USER_UID}:${USER_UID}
    extra_hosts:
      - "zookeeper-1:192.168.2.13"
      - "zookeeper-2:192.168.2.12"
      - "zookeeper-3:192.168.2.13"
      - "clickhouse-1:192.168.2.13"
      - "clickhouse-2:127.0.0.1"
    ports:
      - "9000:9000"
      - "8123:8123"
      - "9181:9181"
      - "9009:9009"
    volumes:
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.xml
      - ./clickhouse-users.xml:/etc/clickhouse-server/users.xml
      - ./custom-function.xml:/etc/clickhouse-server/custom-function.xml
      - ./clickhouse-cluster-server2.xml:/etc/clickhouse-server/config.d/cluster.xml
      - ${TEST_DATA}/clickhouse/:/var/lib/clickhouse/
    environment:
      - CLICKHOUSE_DISTRIBUTED_DDL_TASK_TIMEOUT=1800
      - CLICKHOUSE_CONNECTION_POOL_SIZE=1024
      - CLICKHOUSE_MAX_CONNECTIONS=4096
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=1
    depends_on:
      zookeeper-2:
        condition: service_healthy
```
- clickhouse-cluster-server2.xml
```xml
<?xml version="1.0"?>
<clickhouse>
    <!-- Network settings -->
    <listen_host>0.0.0.0</listen_host>
    <interserver_http_host>192.168.2.12</interserver_http_host>
    <interserver_http_port>9009</interserver_http_port>
    <!-- ZooKeeper settings -->
    <zookeeper>
        <node index="1">
            <host>zookeeper-1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>zookeeper-2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>zookeeper-3</host>
            <port>2183</port>
        </node>
    </zookeeper>
    <remote_servers>
        <cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-1</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>clickhouse-2</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </cluster>
    </remote_servers>
    <macros>
        <cluster>cluster</cluster>
        <shard>01</shard>
        <replica>replica_server_b</replica>
    </macros>
</clickhouse>
```
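The two cluster files differ only in `interserver_http_host` and the `replica` macro. The macros matter because ClickHouse substitutes them into ReplicatedMergeTree arguments such as `'/clickhouse/tables/{cluster}/test_replication'` and `'{replica}'`. The sketch below only imitates that substitution in bash to show the result on each server; `expand_macros` is a hypothetical helper, not ClickHouse's own code.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of macro substitution (ClickHouse does this internally).
# Arguments: template, then the values of the cluster, shard and replica macros.
expand_macros() {
    local template="$1" cluster="$2" shard="$3" replica="$4"
    template="${template//\{cluster\}/$cluster}"
    template="${template//\{shard\}/$shard}"
    template="${template//\{replica\}/$replica}"
    printf '%s\n' "$template"
}

# Values taken from the <macros> sections of the two cluster files.
zk_path='/clickhouse/tables/{cluster}/test_replication'
echo "Node 1 path:    $(expand_macros "$zk_path" cluster 01 replica_server_a)"
echo "Node 1 replica: $(expand_macros '{replica}' cluster 01 replica_server_a)"
echo "Node 2 path:    $(expand_macros "$zk_path" cluster 01 replica_server_b)"
echo "Node 2 replica: $(expand_macros '{replica}' cluster 01 replica_server_b)"
```

Both servers end up with the same ZooKeeper path but different replica names, which is what lets the two copies of a table register as replicas of each other.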
Test Results
Final Verification Results (2025-06-18)
=== Final Test Results ===
Total tests: 21
Passed: 21 ✅
Failed: 0 ❌
Detailed Test Items
| Test item | Server A | Server B | Status |
|---|---|---|---|
| Basic connectivity | OK | OK | Success |
| Cluster recognition | OK | OK | Success |
| SigNoz table count | 38 | 38 | Match |
| Distributed table count | 15 | 15 | Match |
| span_attributes | 123 rows | 123 rows | Synced |
| signoz_spans | 190 rows | 190 rows | Synced |
| traces_v3_resource | 3 rows | 3 rows | Synced |
| Response time | 4 ms | 4 ms | Fast |
Test Script and Manual Test Commands
clickhouse_ha_test.sh: comprehensive HA status test
clickhouse_ha_test.sh
```bash
#!/bin/bash
# clickhouse_ha_test.sh - ClickHouse inter-server HA test
echo "=== ClickHouse Inter-Server HA Test ==="

# Server information
SERVER_A_IP="192.168.2.12"
SERVER_B_IP="192.168.2.13"
SERVER_A_PORT="8123"
SERVER_B_PORT="8123"

echo "Test targets:"
echo " - Server A: $SERVER_A_IP:$SERVER_A_PORT"
echo " - Server B: $SERVER_B_IP:$SERVER_B_PORT"
echo ""

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Test result counters
PASS_COUNT=0
FAIL_COUNT=0

# Run one query over the HTTP interface and record pass/fail.
# Note: only spaces are percent-encoded here.
test_query() {
    local server_name="$1"
    local server_url="$2"
    local query="$3"
    local description="$4"

    echo -e "${BLUE}[TEST]${NC} $description"
    echo " Server: $server_name ($server_url)"
    echo " Query: $query"

    result=$(curl -s "$server_url/?query=$(echo "$query" | sed 's/ /%20/g')" 2>/dev/null)
    if [ $? -eq 0 ] && [ -n "$result" ]; then
        echo -e " ${GREEN}✓ Success${NC}: $result"
        ((PASS_COUNT++))
        return 0
    else
        echo -e " ${RED}✗ Failed${NC}: connection failed or empty response"
        ((FAIL_COUNT++))
        return 1
    fi
}

# 1. Basic connectivity test
echo -e "\n${YELLOW}=== 1. Basic connectivity test ===${NC}"
test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT 'Server A Connected' as status" "Server A connectivity"
test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT 'Server B Connected' as status" "Server B connectivity"

# 2. Cluster status check
echo -e "\n${YELLOW}=== 2. Cluster status check ===${NC}"
echo -e "${BLUE}[TEST]${NC} Server A cluster configuration"
curl -s "http://$SERVER_A_IP:$SERVER_A_PORT/?query=SELECT%20cluster,%20shard_num,%20replica_num,%20host_name,%20host_address%20FROM%20system.clusters%20WHERE%20cluster='cluster'%20FORMAT%20Pretty" || echo "Failed to query cluster information"
echo -e "\n${BLUE}[TEST]${NC} Server B cluster configuration"
curl -s "http://$SERVER_B_IP:$SERVER_B_PORT/?query=SELECT%20cluster,%20shard_num,%20replica_num,%20host_name,%20host_address%20FROM%20system.clusters%20WHERE%20cluster='cluster'%20FORMAT%20Pretty" || echo "Failed to query cluster information"

# 3. SigNoz table existence check
echo -e "\n${YELLOW}=== 3. SigNoz table existence check ===${NC}"
test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces'" "Server A SigNoz table count"
test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces'" "Server B SigNoz table count"

# 4. Distributed table check
echo -e "\n${YELLOW}=== 4. Distributed table check ===${NC}"
test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces' AND engine = 'Distributed'" "Server A Distributed table count"
test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM system.tables WHERE database = 'signoz_traces' AND engine = 'Distributed'" "Server B Distributed table count"

# 5. Core table data check
echo -e "\n${YELLOW}=== 5. Core table data check ===${NC}"
CORE_TABLES=("span_attributes" "signoz_spans" "traces_v3_resource")
for table in "${CORE_TABLES[@]}"; do
    echo -e "\n${BLUE}[TABLE]${NC} $table"
    test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM signoz_traces.$table" "Server A $table row count"
    test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM signoz_traces.$table" "Server B $table row count"
done

# 6. Distributed table function test
echo -e "\n${YELLOW}=== 6. Distributed table function test ===${NC}"
DISTRIBUTED_TABLES=("distributed_span_attributes" "distributed_signoz_spans" "distributed_traces_v3_resource")
for table in "${DISTRIBUTED_TABLES[@]}"; do
    echo -e "\n${BLUE}[Distributed table]${NC} $table"
    test_query "Server A" "http://$SERVER_A_IP:$SERVER_A_PORT" "SELECT count() FROM signoz_traces.$table" "Server A $table distributed query"
    test_query "Server B" "http://$SERVER_B_IP:$SERVER_B_PORT" "SELECT count() FROM signoz_traces.$table" "Server B $table distributed query"
done

# 7. Failover test (optional)
echo -e "\n${YELLOW}=== 7. Failover test ===${NC}"
echo -e "${BLUE}[INFO]${NC} Manual failover test procedure:"
echo " 1. Stop server A: ssh user@$SERVER_A_IP 'cd /path/to/signoz && docker compose stop clickhouse'"
echo " 2. Query server B: curl \"http://$SERVER_B_IP:$SERVER_B_PORT/?query=SELECT%20count()%20FROM%20signoz_traces.span_attributes\""
echo " 3. Restart server A: ssh user@$SERVER_A_IP 'cd /path/to/signoz && docker compose start clickhouse'"

# 8. Simple performance test
echo -e "\n${YELLOW}=== 8. Simple performance test ===${NC}"
echo -e "${BLUE}[TEST]${NC} Response time per server"
for server in "A:$SERVER_A_IP:$SERVER_A_PORT" "B:$SERVER_B_IP:$SERVER_B_PORT"; do
    IFS=':' read -r name ip port <<< "$server"
    echo "Server $name response time:"
    start_time=$(date +%s%N)
    curl -s "http://$ip:$port/?query=SELECT%20count()%20FROM%20signoz_traces.span_attributes" > /dev/null 2>&1
    curl_status=$? # save curl's exit code before the next command overwrites $?
    end_time=$(date +%s%N)
    if [ $curl_status -eq 0 ]; then
        duration=$(( (end_time - start_time) / 1000000 ))
        echo " ✓ $duration ms"
    else
        echo " ✗ No response"
    fi
done

# 9. Data consistency check (important!)
echo -e "\n${YELLOW}=== 9. Data consistency check ===${NC}"
echo -e "${BLUE}[TEST]${NC} Row count comparison between servers"
COMPARE_TABLES=("span_attributes" "signoz_spans" "traces_v3_resource")
for table in "${COMPARE_TABLES[@]}"; do
    echo "Table: $table"
    count_a=$(curl -s "http://$SERVER_A_IP:$SERVER_A_PORT/?query=SELECT%20count()%20FROM%20signoz_traces.$table" 2>/dev/null)
    count_b=$(curl -s "http://$SERVER_B_IP:$SERVER_B_PORT/?query=SELECT%20count()%20FROM%20signoz_traces.$table" 2>/dev/null)
    if [ "$count_a" = "$count_b" ]; then
        echo -e " ${GREEN}✓ Match${NC}: Server A($count_a) = Server B($count_b)"
        ((PASS_COUNT++))
    else
        echo -e " ${RED}✗ Mismatch${NC}: Server A($count_a) ≠ Server B($count_b)"
        echo -e " ${YELLOW}⚠ Data synchronization needed${NC}"
        ((FAIL_COUNT++))
    fi
done

# 10. Final results
echo -e "\n${YELLOW}=== Final Test Results ===${NC}"
total_tests=$((PASS_COUNT + FAIL_COUNT))
echo "Total tests: $total_tests"
echo -e "Passed: ${GREEN}$PASS_COUNT${NC}"
echo -e "Failed: ${RED}$FAIL_COUNT${NC}"
if [ $FAIL_COUNT -eq 0 ]; then
    echo -e "\n${GREEN}🎉 All tests passed! ClickHouse inter-server HA is working correctly.${NC}"
elif [ $FAIL_COUNT -lt 3 ]; then
    echo -e "\n${YELLOW}⚠ There are some problems, but the basic functions work. Check the failed items.${NC}"
else
    echo -e "\n${RED}❌ There are serious problems. Re-check the configuration.${NC}"
fi
echo -e "\n=== Test complete ==="
```
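The `test_query` helper in the script encodes only spaces with `sed`, which is enough for the queries it runs but would break on queries containing `%`, `&`, `+` or `#`. As a hedged sketch of a fuller encoder, the hypothetical `urlencode` function below percent-encodes every character outside the unreserved set. It assumes ASCII input, which holds for the queries in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of clickhouse_ha_test.sh): percent-encode a string.
# Assumes ASCII input; letters, digits and . ~ _ - are passed through unchanged.
urlencode() {
    local LC_ALL=C
    local input="$1" out="" ch hex i
    for (( i = 0; i < ${#input}; i++ )); do
        ch="${input:i:1}"
        case "$ch" in
            [a-zA-Z0-9.~_-])
                out+="$ch"
                ;;
            *)
                # "'$ch" makes printf use the character's numeric code
                printf -v hex '%%%02X' "'$ch"
                out+="$hex"
                ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode "SELECT count() FROM system.tables WHERE name LIKE '%test_replication%'"
```

Inside `test_query`, the `sed` call could then be replaced by `$(urlencode "$query")`. Another option is to let curl do the encoding with `curl -s -G --data-urlencode "query=$query" "$server_url/"`.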
Test Table Creation and Data Insertion Queries
Commands and results
```bash
# docker exec clickhouse-1 clickhouse-client --query "
> CREATE TABLE IF NOT EXISTS test_replication ON CLUSTER cluster (
>     id UInt64,
>     timestamp DateTime DEFAULT now(),
>     server_name String,
>     message String,
>     test_data String
> ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/test_replication', '{replica}')
> ORDER BY (id, timestamp)
> PARTITION BY toYYYYMM(timestamp)
> "
clickhouse-2 9000 0 1 0
clickhouse-1 9000 0 0 0
# docker exec clickhouse-1 clickhouse-client --query "
> CREATE TABLE IF NOT EXISTS distributed_test_replication ON CLUSTER cluster AS test_replication
> ENGINE = Distributed('cluster', 'default', 'test_replication')
> "
clickhouse-2 9000 0 1 0
clickhouse-1 9000 0 0 0
# curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20system.tables%20WHERE%20name%20LIKE%20'%25test_replication%25'"
2
# curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20system.tables%20WHERE%20name%20LIKE%20'%25test_replication%25'"
2
# docker exec clickhouse-1 clickhouse-client --query "
> INSERT INTO test_replication (id, server_name, message, test_data) VALUES
> (1, 'server-13', 'Hello from server 13', 'test_data_1'),
> (2, 'server-13', 'Replication test from 13', 'test_data_2'),
> (3, 'server-13', 'Third message from 13', 'test_data_3')
> "
# curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20test_replication"
3
# curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20test_replication"
3
# curl -s "http://192.168.2.12:8123/" -d "INSERT INTO test_replication (id, server_name, message, test_data) VALUES (4, 'server-12', 'Hello from server 12', 'test_data_4'), (5, 'server-12', 'Reverse replication test', 'test_data_5')"
# curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20test_replication"
5
# curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20test_replication"
5
# curl -s "http://192.168.2.12:8123/?query=SELECT%20id,%20server_name,%20message%20FROM%20test_replication%20ORDER%20BY%20id%20FORMAT%20PrettyCompact"
┌─id─┬─server_name─┬─message──────────────────┐
│  1 │ server-13   │ Hello from server 13     │
│  2 │ server-13   │ Replication test from 13 │
│  3 │ server-13   │ Third message from 13    │
└────┴─────────────┴──────────────────────────┘
┌─id─┬─server_name─┬─message──────────────────┐
│  4 │ server-12   │ Hello from server 12     │
│  5 │ server-12   │ Reverse replication test │
└────┴─────────────┴──────────────────────────┘
# curl -s "http://192.168.2.13:8123/?query=SELECT%20id,%20server_name,%20message%20FROM%20test_replication%20ORDER%20BY%20id%20FORMAT%20PrettyCompact"
┌─id─┬─server_name─┬─message──────────────────┐
│  1 │ server-13   │ Hello from server 13     │
│  2 │ server-13   │ Replication test from 13 │
│  3 │ server-13   │ Third message from 13    │
└────┴─────────────┴──────────────────────────┘
┌─id─┬─server_name─┬─message──────────────────┐
│  4 │ server-12   │ Hello from server 12     │
│  5 │ server-12   │ Reverse replication test │
└────┴─────────────┴──────────────────────────┘
```
HA Feature Verification
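- Bidirectional replication: data inserted on either server is replicated automatically
- Real-time synchronization: data is synchronized within 5 seconds
- Distributed queries: unified reads through Distributed tables
- Failover: the service keeps running when one server goes down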
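A claim such as "synchronized within 5 seconds" can be checked by polling both servers until their row counts agree. The sketch below is one possible way to do that, not the method used for the results above: `wait_for_match`, `count_on_12` and `count_on_13` are hypothetical names, and the one-second polling interval is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll two commands until they print the same non-empty
# value, or give up after the timeout.
# Arguments: timeout in seconds, then the names of two commands or functions.
wait_for_match() {
    local timeout="$1" cmd_a="$2" cmd_b="$3"
    local deadline=$(( SECONDS + timeout ))
    local a b
    while (( SECONDS <= deadline )); do
        a=$("$cmd_a" 2>/dev/null)
        b=$("$cmd_b" 2>/dev/null)
        if [ -n "$a" ] && [ "$a" = "$b" ]; then
            echo "in sync: $a"
            return 0
        fi
        sleep 1
    done
    echo "not in sync within ${timeout}s (a=${a:-?}, b=${b:-?})"
    return 1
}

# Hypothetical wrappers around the HTTP count queries used in this post.
count_on_12() { curl -s "http://192.168.2.12:8123/?query=SELECT%20count()%20FROM%20test_replication"; }
count_on_13() { curl -s "http://192.168.2.13:8123/?query=SELECT%20count()%20FROM%20test_replication"; }
```

Calling `wait_for_match 5 count_on_12 count_on_13` right after an insert succeeds only if both servers report the same count within five seconds.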
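High Availability
Failure Scenarios
- Node 1 (Server 13) failure
  - Two ZooKeeper nodes stop → the ZooKeeper cluster loses its majority and is suspended until Node 1 returns
  - ClickHouse-1 stops → Node 2 keeps serving queries (replicated tables are read-only while ZooKeeper has no majority)
- Node 2 (Server 12) failure
  - One ZooKeeper node stops → the cluster stays healthy (majority kept)
  - ClickHouse-2 stops → Node 1 keeps serving
* Placing the three ZooKeeper nodes on three different hosts keeps a majority when any single host fails, so the cluster keeps working.
* Running a single ZooKeeper node is also possible, but a failure of that ZooKeeper server stops the entire ClickHouse cluster.
In production, a ZooKeeper ensemble of at least three nodes is required.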
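The scenarios above follow from ZooKeeper's majority rule: an ensemble of N servers needs more than half of them alive. A small sketch of that arithmetic for the 2 + 1 layout in this post, with `quorum_size`, `has_quorum` and `report` as hypothetical helper names:

```shell
#!/usr/bin/env bash
# Smallest number of ZooKeeper servers that forms a majority of the ensemble.
quorum_size() {
    local total="$1"
    echo $(( total / 2 + 1 ))
}

# Succeeds (exit 0) if the surviving servers still form a majority.
has_quorum() {
    local total="$1" alive="$2"
    (( alive >= total / 2 + 1 ))
}

# Layout from this post: 3 servers, 2 on Node 1 and 1 on Node 2.
report() {
    local label="$1" alive="$2"
    if has_quorum 3 "$alive"; then
        echo "$label: quorum kept ($alive/3 alive)"
    else
        echo "$label: quorum lost ($alive/3 alive)"
    fi
}

report "Node 1 down" 1
report "Node 2 down" 2
```

This prints `Node 1 down: quorum lost (1/3 alive)` and `Node 2 down: quorum kept (2/3 alive)`, which matches the two failure scenarios. With one ZooKeeper server on each of three hosts, any single host failure leaves 2 of 3 alive.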
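Data Protection
- Automatic replication: all data is replicated to both servers in real time
- Consistency: replication is coordinated through ZooKeeper
- No separate backup: real-time replication acts as an automatic backup (note that deletes and drops are replicated as well)
Monitoring
Routine Status Checks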
```bash
# Check cluster status
docker exec clickhouse-1 clickhouse-client --query "
SELECT host_name, host_address, is_local, errors_count
FROM system.clusters WHERE cluster = 'cluster'"
# Check replication status
docker exec clickhouse-1 clickhouse-client --query "
SELECT database, table, replica_name, is_leader, absolute_delay, active_replicas
FROM system.replicas WHERE database LIKE 'signoz%'"
```
References
- ClickHouse official documentation - Replication
- SigNoz distributed ClickHouse guide
- ZooKeeper cluster configuration guide
- https://github.com/SigNoz/signoz/blob/main/deploy/docker-swarm/docker-compose.ha.yaml
Troubleshooting
- Network connectivity problems: use the HTTP API
- otel-migrator-sync does not run
  Log: DDL queue problem
  Fix: change the ZOO_SERVERS format of the ZooKeeper cluster so that each node lists itself as 0.0.0.0 and the other members by hostname
  - ZOO_SERVERS=0.0.0.0:2888:3888,zookeeper-2:2888:3888,zookeeper-3:2888:3888
- Containers cannot reach hosts on the other server: specify extra_hosts
  extra_hosts:
    - "zookeeper-1:192.168.2.13"
    - "zookeeper-2:192.168.2.12"
    - "zookeeper-3:192.168.2.13"
    - "clickhouse-1:192.168.2.13"
    - "clickhouse-2:192.168.2.12"
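The ZOO_SERVERS rule from the troubleshooting notes (0.0.0.0 for the node itself, hostnames for everyone else) is easy to get wrong when editing three compose services by hand. As a minimal sketch, the hypothetical `zoo_servers` function below builds the value for a given server ID; the 2888 and 3888 ports are the ones used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ZOO_SERVERS value for one ensemble member.
# Arguments: this node's server ID (1-based), then all hostnames in ID order.
zoo_servers() {
    local self_id="$1"; shift
    local entries=() id=1 host
    for host in "$@"; do
        if (( id == self_id )); then
            entries+=("0.0.0.0:2888:3888")   # the node refers to itself as 0.0.0.0
        else
            entries+=("${host}:2888:3888")   # other members are listed by hostname
        fi
        id=$(( id + 1 ))
    done
    local IFS=,
    printf '%s\n' "${entries[*]}"
}

for id in 1 2 3; do
    echo "ZOO_SERVER_ID=$id ZOO_SERVERS=$(zoo_servers "$id" zookeeper-1 zookeeper-2 zookeeper-3)"
done
```

The three printed values correspond to the ZOO_SERVERS lines of zookeeper-1, zookeeper-2 and zookeeper-3 in the compose files.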