Pgpool-II Watchdog – DECODE X NET

Until now I had tested Pgpool-II with only a single server; this time I ran it on two. (Three nodes are generally recommended to guard against split-brain.)

Pgpool-II & PostgreSQL

Pgpool-II & PostgreSQL w/ repmgr

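Pgpool is a complex system, so as a first step the goal here is to confirm the mechanism that moves the virtual IP when the active server changes. Servers are started and stopped by hand, and I watch the logs (syslog) at each step. To keep the system as simple as possible, Postgres does no replication and only has a table created for checking access, and no failover scripts are registered.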

Environment: Ubuntu 22.04 / docker / Mac (arm)
Note: I tried to reuse my earlier environment, but it did not work, so I built a new one. It is based on ubuntu22.04 and uses what can be installed with apt.

docker run -it --net pgpool_default --name ubuntu2204 ubuntu:22.04

At first I pulled the image as shown above. After installing various packages, I committed the container to rebuild the image and repeated the copy several times.

Other packages installed

net-tools, vim, syslog-ng, sudo, iproute2, arping

Final start command for each container

docker run --privileged -it --net pgpool_default --name ubuntu2204s1 ubuntu2204s1
docker run --privileged -it --net pgpool_default --name ubuntu2204s2 ubuntu2204s2

Memo)
If a container was stopped while the postgres server was still running and an error appears on the next start:

$ /lib/postgresql/14/bin/pg_resetwal /var/lib/postgresql/14/main/
Write-ahead log reset

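psql connections from Pgpool to Postgres failed with an authentication error, so I disabled the authentication-related parts of pgpool and used the password set below.

alter user postgres password 'password';

Note: run psql [Enter] as the postgres user to execute this. As root, also set the password with passwd postgres.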
visudo

postgres ALL=NOPASSWD: /usr/sbin/ip *, /usr/sbin/arping *

pgpool.conf (ubuntu2204s1), excerpt

use_watchdog = on
trusted_servers = ''
ping_path = '/bin'
wd_hostname = 'ubuntu2204s1'
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = '172.22.0.88'
if_cmd_path = '/sbin'
if_up_cmd = '/usr/bin/sudo /usr/sbin/ip addr add $_IP_$/16 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /usr/sbin/ip addr del $_IP_$/16 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'

other_pgpool_hostname0 = 'ubuntu2204s2'
other_pgpool_port0 = 5433
other_wd_port0 = 9000
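
The $_IP_$ placeholder in if_up_cmd, if_down_cmd and arping_cmd is replaced with the value of delegate_IP before the command runs. As a rough illustration only, the following Python sketch performs the same substitution on the values from the excerpt above, which makes it easier to test the resulting commands by hand in the container. The function name expand_vip_command is made up for this example and is not part of pgpool.

```python
# Minimal sketch: expand the $_IP_$ placeholder the way the config implies.
# expand_vip_command is a hypothetical helper, not a pgpool function.

def expand_vip_command(template: str, delegate_ip: str) -> str:
    """Return the command line with $_IP_$ replaced by the delegate IP."""
    return template.replace("$_IP_$", delegate_ip)


# Values copied from the pgpool.conf excerpt above.
delegate_ip = "172.22.0.88"
if_up_cmd = "/usr/bin/sudo /usr/sbin/ip addr add $_IP_$/16 dev eth0 label eth0:0"
if_down_cmd = "/usr/bin/sudo /usr/sbin/ip addr del $_IP_$/16 dev eth0"
arping_cmd = "/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0"

if __name__ == "__main__":
    for template in (if_up_cmd, if_down_cmd, arping_cmd):
        print(expand_vip_command(template, delegate_ip))
```

Running each printed command manually as the postgres user is a quick way to check that the sudoers entry and the paths are correct before relying on the watchdog to run them.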

From the state below, run service pgpool2 start on s1 and then on s2.

root@f0a3f83b4402:/# service --status-all
[ - ] cron
[ - ] dbus
[ ? ] hwclock.sh
[ - ] pgpool2
[ + ] postgresql
[ - ] procps
[ + ] syslog-ng
[ - ] sysstat

syslog s1 (excerpt)

2023-10-12 14:26:37: pid 546: LOG: watchdog remote node:0 on ubuntu2204s2:9000
2023-10-12 14:26:47: pid 546: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2023-10-12 14:26:51: pid 546: LOG: setting the local node "ubuntu2204s1:5433 Linux f0a3f83b4402" as watchdog cluster master
2023-10-12 14:26:51: pid 548: LOG: watchdog nodes ID:0 Name:"ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:26:51: pid 548: DETAIL: Host:"ubuntu2204s1" WD Port:9000 pgpool-II port:5433
2023-10-12 14:26:51: pid 548: LOG: watchdog nodes ID:1 Name:"Not_Set"
2023-10-12 14:26:51: pid 548: DETAIL: Host:"ubuntu2204s2" WD Port:9000 pgpool-II port:5433
2023-10-12 14:26:51: pid 540: LOG: pgpool-II successfully started. version 4.1.4 (karasukiboshi)
2023-10-12 14:26:52: pid 555: LOG: creating socket for sending heartbeat
2023-10-12 14:26:52: pid 553: LOG: creating watchdog heartbeat receive socket.
2023-10-12 14:29:10: pid 546: LOG: new watchdog node connection is received from "172.22.0.3:19084"
2023-10-12 14:29:10: pid 546: LOG: new node joined the cluster hostname:"ubuntu2204s2" port:9000 pgpool_port:5433
2023-10-12 14:29:10: pid 546: DETAIL: Pgpool-II version:"4.1.4" watchdog messaging version: 1.1
2023-10-12 14:29:10: pid 546: LOG: new outbound connection to ubuntu2204s2:9000
2023-10-12 14:29:16: pid 546: LOG: adding watchdog node "ubuntu2204s2:5433 Linux e87ed8e08fff" to the standby list
2023-10-12 14:29:16: pid 546: LOG: quorum found
2023-10-12 14:29:23: pid 546: LOG: remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" is replying again after
2023-10-12 14:30:11: pid 548: LOG: watchdog: lifecheck started

syslog s2 (excerpt)

2023-10-12 14:29:10: pid 497: LOG: setting the local watchdog node name to "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:29:10: pid 497: LOG: watchdog remote node:0 on ubuntu2204s1:9000
2023-10-12 14:29:10: pid 497: LOG: new outbound connection to ubuntu2204s1:9000
2023-10-12 14:29:10: pid 497: LOG: new watchdog node connection is received from "172.22.0.2:29374"
2023-10-12 14:29:10: pid 497: LOG: new node joined the cluster hostname:"ubuntu2204s1" port:9000 pgpool_port:5433
2023-10-12 14:29:10: pid 497: DETAIL: Pgpool-II version:"4.1.4" watchdog messaging version: 1.1
2023-10-12 14:29:15: pid 497: LOG: setting the remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" as watchdog cluster master
2023-10-12 14:29:16: pid 497: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2023-10-12 14:29:16: pid 497: DETAIL: our join coordinator request is accepted by cluster leader node "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:29:16: pid 499: LOG: watchdog nodes ID:0 Name:"ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:29:16: pid 499: DETAIL: Host:"ubuntu2204s2" WD Port:9000 pgpool-II port:5433
2023-10-12 14:29:16: pid 499: LOG: watchdog nodes ID:1 Name:"ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:29:16: pid 499: DETAIL: Host:"ubuntu2204s1" WD Port:9000 pgpool-II port:5433
2023-10-12 14:29:16: pid 491: LOG: pgpool-II successfully started. version 4.1.4 (karasukiboshi)
2023-10-12 14:30:56: pid 499: LOG: watchdog: lifecheck started
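
With excerpts this long, the role changes are easy to miss. The following is a small Python sketch, assuming the log line layout seen in these excerpts (timestamp, pid, LOG, then the "watchdog node state changed from [X] to [Y]" message), that pulls out only the state transitions. The regular expression and the function name state_changes are my own and were written against these excerpts only; other pgpool versions or log_line_prefix settings may need a different pattern.

```python
import re

# Matches lines such as:
# 2023-10-12 14:26:47: pid 546: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
STATE_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): pid (?P<pid>\d+): LOG: "
    r"watchdog node state changed from \[(?P<old>\w+)\] to \[(?P<new>\w+)\]"
)


def state_changes(lines):
    """Return (timestamp, old_state, new_state) for each state-change line."""
    changes = []
    for line in lines:
        match = STATE_CHANGE.match(line.strip())
        if match:
            changes.append((match["ts"], match["old"], match["new"]))
    return changes


# A few lines copied from the s1 and s2 startup excerpts.
sample_s1 = [
    "2023-10-12 14:26:47: pid 546: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]",
    "2023-10-12 14:29:16: pid 546: LOG: quorum found",
]
sample_s2 = [
    "2023-10-12 14:29:16: pid 497: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]",
    "2023-10-12 14:30:56: pid 499: LOG: watchdog: lifecheck started",
]

if __name__ == "__main__":
    for host, lines in (("s1", sample_s1), ("s2", sample_s2)):
        for ts, old, new in state_changes(lines):
            print(f"{host} {ts} {old} -> {new}")
```

To use it on a real log, pass an open file object instead of the sample lists, for example state_changes(open(path)) where path points at the syslog file in the container.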

s1 down

syslog s1 (excerpt)

2023-10-12 14:34:09: pid 546: LOG: Watchdog is shutting down
2023-10-12 14:34:09: pid 644: LOG: watchdog: de-escalation started
2023-10-12 14:34:09: pid 644: LOG: successfully released the delegate IP:"172.22.0.88"
2023-10-12 14:34:09: pid 644: DETAIL: 'if_down_cmd' returned with success

syslog s2 (excerpt)

2023-10-12 14:34:09: pid 497: LOG: remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" is shutting down
2023-10-12 14:34:09: pid 497: LOG: removing the remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" from watchdog cluster master
2023-10-12 14:34:09: pid 497: LOG: We have lost the cluster master node "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:34:14: pid 497: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2023-10-12 14:34:19: pid 497: LOG: setting the local node "ubuntu2204s2:5433 Linux e87ed8e08fff" as watchdog cluster master
2023-10-12 14:34:19: pid 491: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2023-10-12 14:34:36: pid 499: DETAIL: node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
2023-10-12 14:34:36: pid 497: DETAIL: No heartbeat signal from node
2023-10-12 14:34:36: pid 497: LOG: remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" is shutting down

s1 up

syslog s1 (excerpt)

2023-10-12 14:37:00: pid 676: LOG: setting the local watchdog node name to "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:37:00: pid 676: LOG: watchdog remote node:0 on ubuntu2204s2:9000
2023-10-12 14:37:00: pid 676: LOG: new outbound connection to ubuntu2204s2:9000
2023-10-12 14:37:00: pid 676: LOG: setting the remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" as watchdog cluster master
2023-10-12 14:37:00: pid 676: LOG: new watchdog node connection is received from "172.22.0.3:34969"
2023-10-12 14:37:00: pid 676: LOG: new node joined the cluster hostname:"ubuntu2204s2" port:9000 pgpool_port:5433
2023-10-12 14:37:00: pid 676: DETAIL: Pgpool-II version:"4.1.4" watchdog messaging version: 1.1
2023-10-12 14:37:01: pid 676: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2023-10-12 14:37:01: pid 676: DETAIL: our join coordinator request is accepted by cluster leader node "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:01: pid 676: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:01: pid 671: LOG: master watchdog node "ubuntu2204s2:5433 Linux e87ed8e08fff" returned status for 2 backend nodes
2023-10-12 14:37:01: pid 671: DETAIL: backend:0 is UP on cluster master "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:01: pid 678: LOG: watchdog nodes ID:0 Name:"ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:37:01: pid 678: DETAIL: Host:"ubuntu2204s1" WD Port:9000 pgpool-II port:5433
2023-10-12 14:37:01: pid 678: LOG: watchdog nodes ID:1 Name:"ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:01: pid 678: DETAIL: Host:"ubuntu2204s2" WD Port:9000 pgpool-II port:5433
2023-10-12 14:37:02: pid 671: LOG: pgpool-II successfully started. version 4.1.4 (karasukiboshi)
2023-10-12 14:37:06: pid 676: LOG: remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" is reporting that it has found us again
2023-10-12 14:37:06: pid 676: LOG: re-sending join coordinator message to master node: "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:06: pid 676: LOG: successfully joined the watchdog cluster as standby node
2023-10-12 14:37:06: pid 676: DETAIL: our join coordinator request is accepted by cluster leader node "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:06: pid 671: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2023-10-12 14:37:06: pid 676: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:37:06: pid 671: LOG: master watchdog node "ubuntu2204s2:5433 Linux e87ed8e08fff" returned status for 2 backend nodes
2023-10-12 14:37:06: pid 671: LOG: backend nodes status remains same after the sync from "ubuntu2204s2:5433 Linux e87ed8e08fff"

syslog s2 (excerpt)

2023-10-12 14:37:00: pid 497: LOG: new watchdog node connection is received from "172.22.0.2:40067"
2023-10-12 14:37:00: pid 497: LOG: new node joined the cluster hostname:"ubuntu2204s1" port:9000 pgpool_port:5433
2023-10-12 14:37:00: pid 497: DETAIL: Pgpool-II version:"4.1.4" watchdog messaging version: 1.1
2023-10-12 14:37:00: pid 497: LOG: The newly joined node:"ubuntu2204s1:5433 Linux f0a3f83b4402" had left the cluster because it was shutdown
2023-10-12 14:37:00: pid 497: LOG: new outbound connection to ubuntu2204s1:9000
2023-10-12 14:37:01: pid 497: LOG: adding watchdog node "ubuntu2204s1:5433 Linux f0a3f83b4402" to the standby list
2023-10-12 14:37:06: pid 499: DETAIL: node id :1 status = "NODE ALIVE" message:"Heartbeat signal found"
2023-10-12 14:37:06: pid 497: LOG: remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" became reachable again
2023-10-12 14:37:06: pid 497: LOG: remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" is reachable again
2023-10-12 14:37:06: pid 497: DETAIL: trying to add it back as a standby

s2 down

syslog s1 (excerpt)

(The s1 log lines for this step, starting at 14:41:15, are included in the s1 excerpt under "s2 up".)

syslog s2 (excerpt)

2023-10-12 14:41:15: pid 497: LOG: Watchdog is shutting down
2023-10-12 14:41:15: pid 593: LOG: watchdog: de-escalation started
2023-10-12 14:41:15: pid 593: LOG: successfully released the delegate IP:"172.22.0.88"
2023-10-12 14:41:15: pid 593: DETAIL: 'if_down_cmd' returned with success

s2 up

syslog s1 (excerpt)

2023-10-12 14:41:15: pid 676: LOG: remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" is shutting down
2023-10-12 14:41:15: pid 676: LOG: removing the remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" from watchdog cluster master
2023-10-12 14:41:15: pid 676: LOG: We have lost the cluster master node "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:41:15: pid 676: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2023-10-12 14:41:19: pid 676: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2023-10-12 14:41:20: pid 676: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2023-10-12 14:41:24: pid 676: LOG: setting the local node "ubuntu2204s1:5433 Linux f0a3f83b4402" as watchdog cluster master
2023-10-12 14:41:42: pid 678: DETAIL: node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
2023-10-12 14:41:42: pid 676: LOG: remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" is shutting down
2023-10-12 14:42:21: pid 676: LOG: new watchdog node connection is received from "172.22.0.3:50383"
2023-10-12 14:42:21: pid 676: LOG: new node joined the cluster hostname:"ubuntu2204s2" port:9000 pgpool_port:5433
2023-10-12 14:42:21: pid 676: LOG: The newly joined node:"ubuntu2204s2:5433 Linux e87ed8e08fff" had left the cluster because it was shutdown
2023-10-12 14:42:21: pid 676: LOG: new outbound connection to ubuntu2204s2:9000
2023-10-12 14:42:22: pid 676: LOG: adding watchdog node "ubuntu2204s2:5433 Linux e87ed8e08fff" to the standby list
2023-10-12 14:42:22: pid 671: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2023-10-12 14:42:32: pid 678: DETAIL: node id :1 status = "NODE ALIVE" message:"Heartbeat signal found"
2023-10-12 14:42:32: pid 676: LOG: remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" became reachable again
2023-10-12 14:42:32: pid 676: LOG: remote node "ubuntu2204s2:5433 Linux e87ed8e08fff" is reachable again
2023-10-12 14:42:32: pid 676: DETAIL: trying to add it back as a standby

syslog s2 (excerpt)

2023-10-12 14:42:21: pid 623: LOG: setting the local watchdog node name to "ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:42:21: pid 623: LOG: new outbound connection to ubuntu2204s1:9000
2023-10-12 14:42:21: pid 623: LOG: setting the remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" as watchdog cluster master
2023-10-12 14:42:21: pid 623: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2023-10-12 14:42:21: pid 623: LOG: new watchdog node connection is received from "172.22.0.2:16605"
2023-10-12 14:42:21: pid 623: LOG: new node joined the cluster hostname:"ubuntu2204s1" port:9000 pgpool_port:5433
2023-10-12 14:42:21: pid 623: DETAIL: Pgpool-II version:"4.1.4" watchdog messaging version: 1.1
2023-10-12 14:42:22: pid 623: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2023-10-12 14:42:22: pid 623: DETAIL: our join coordinator request is accepted by cluster leader node "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:42:22: pid 623: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:42:22: pid 624: LOG: watchdog nodes ID:0 Name:"ubuntu2204s2:5433 Linux e87ed8e08fff"
2023-10-12 14:42:22: pid 624: DETAIL: Host:"ubuntu2204s2" WD Port:9000 pgpool-II port:5433
2023-10-12 14:42:22: pid 624: LOG: watchdog nodes ID:1 Name:"ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:42:22: pid 624: DETAIL: Host:"ubuntu2204s1" WD Port:9000 pgpool-II port:5433
2023-10-12 14:42:22: pid 617: LOG: master watchdog node "ubuntu2204s1:5433 Linux f0a3f83b4402" returned status for 2 backend nodes
2023-10-12 14:42:22: pid 617: DETAIL: backend:0 is UP on cluster master "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:42:22: pid 617: LOG: pgpool-II successfully started. version 4.1.4 (karasukiboshi)
2023-10-12 14:42:32: pid 623: LOG: remote node "ubuntu2204s1:5433 Linux f0a3f83b4402" is reporting that it has found us again
2023-10-12 14:42:32: pid 623: DETAIL: our join coordinator request is accepted by cluster leader node "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:42:32: pid 617: LOG: we have joined the watchdog cluster as STANDBY node
2023-10-12 14:42:32: pid 623: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "ubuntu2204s1:5433 Linux f0a3f83b4402"
2023-10-12 14:42:32: pid 617: LOG: master watchdog node "ubuntu2204s1:5433 Linux f0a3f83b4402" returned status for 2 backend nodes
2023-10-12 14:42:32: pid 617: LOG: backend nodes status remains same after the sync from "ubuntu2204s1:5433 Linux f0a3f83b4402"

Connectivity check (with s1 and s2 each as MASTER)

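The logs are excerpts and some lines are missing, but the transitions between MASTER and SLAVE (STANDBY in the logs) can be confirmed. The IP address, however, does not move at the moment of the stop. When pgpool is stopped on the node that has eth0:0 and then started again, eth0:0 appears on the other node. Between the stop and the start, the address has not been moved and 172.22.0.88 exists on neither node.
I do not know whether some script needs to be configured or whether three nodes would fix this, but this is the result I got.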
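
One possible reading of this result, offered as an assumption rather than something verified in this test: the watchdog brings up the delegate IP only while a quorum exists, and with two nodes a strict-majority rule leaves the lone survivor without one. That would fit the "quorum found" line, which appears on s1 only after s2 has joined. The Python sketch below models such a majority rule. The function has_quorum and its half_votes_enough flag are hypothetical names, loosely mirroring what the Pgpool-II documentation describes for enable_consensus_with_half_votes; this is not pgpool code.

```python
# Sketch of a majority rule for watchdog quorum (assumption, not pgpool source).

def has_quorum(total_nodes: int, alive_nodes: int, half_votes_enough: bool = False) -> bool:
    """Return True if the alive nodes form a quorum.

    Default: strictly more than half of all nodes must be alive.
    With half_votes_enough and an even node count, exactly half is enough.
    """
    if half_votes_enough and total_nodes % 2 == 0:
        return alive_nodes * 2 >= total_nodes
    return alive_nodes * 2 > total_nodes


if __name__ == "__main__":
    # Two-node cluster, one node stopped: no quorum under the strict rule.
    print("2 nodes, 1 alive:", has_quorum(2, 1))
    # Two-node cluster, both running.
    print("2 nodes, 2 alive:", has_quorum(2, 2))
    # Three-node cluster, one node stopped: quorum is kept.
    print("3 nodes, 2 alive:", has_quorum(3, 2))
    # Two-node cluster with the relaxed rule.
    print("2 nodes, 1 alive, half votes:", has_quorum(2, 1, half_votes_enough=True))
```

Under this model, a third watchdog node, or the relaxed half-votes rule on a two-node cluster, would let the remaining node keep a quorum when one node is stopped. Whether that also makes the virtual IP move immediately in this setup was not tested here.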
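if_up_cmd logs an error when it fails but logs nothing on success, so I check the result with ifconfig. As seen here, it may succeed only after some time has passed, so care is needed when reading the logs. (Success of if_down_cmd is logged.)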
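
Because the logs alone do not show whether the address is up, a scripted check can help. This Python sketch decides whether a given address is present in the text output of ip -o -4 addr show. The helper name has_address is made up for this example, and the sample output is illustrative, written in the usual one-line format of that command rather than captured from these containers.

```python
import re


def has_address(ip_addr_output: str, address: str) -> bool:
    """Return True if `address` appears as an IPv4 'inet' entry in the output."""
    pattern = re.compile(r"\binet " + re.escape(address) + r"/\d+\b")
    return bool(pattern.search(ip_addr_output))


# Illustrative output (assumed format of `ip -o -4 addr show dev eth0`),
# with the delegate IP attached under the eth0:0 label.
sample_output = (
    "2: eth0    inet 172.22.0.2/16 brd 172.22.255.255 scope global eth0\n"
    "2: eth0    inet 172.22.0.88/16 scope global secondary eth0:0\n"
)

if __name__ == "__main__":
    vip = "172.22.0.88"
    state = "present" if has_address(sample_output, vip) else "absent"
    print(f"{vip} is {state}")
```

In a container, the text could come from subprocess.run(["ip", "-o", "-4", "addr", "show", "dev", "eth0"], capture_output=True, text=True).stdout. Running the check on both nodes every few seconds while stopping and starting pgpool shows when the address disappears and when it reappears.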

This was a first check of the behavior with two nodes.

References)
https://www.pgpool.net/docs/42/ja/html/example-cluster.html
https://www.pgpool.net/docs/latest/ja/html/runtime-watchdog-config.html
https://www.pgpool.net/docs/pgpool-II-3.2.0/tutorial-watchdog-ja.html
Note: descriptions differ between versions, so check which version each page covers.