
PostgreSQL (GitLab) fails to start after its transaction log was modified

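Today a colleague modified the transaction log of the PostgreSQL instance bundled with GitLab, and PostgreSQL could no longer start. The log file is located at: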

/var/opt/gitlab/postgresql/data/pg_xlog/000000010000000000000001
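
As background, a segment file name in pg_xlog is 24 hexadecimal digits, read as three 8-digit groups: timeline, log file number, and segment number. The following is a minimal sketch of a decoder for such names; the function name decode_wal_name is made up for illustration, and the example decodes the segment name that PostgreSQL itself reports in the error output of this post.

```shell
# Hypothetical helper: split a WAL segment file name into its three parts.
decode_wal_name() {
    local name=$1
    # A valid name is exactly 24 characters long...
    if [ "${#name}" -ne 24 ]; then
        echo "not a WAL segment name (need 24 hex digits): $name" >&2
        return 1
    fi
    # ...and contains only hexadecimal digits.
    case $name in
        *[!0-9A-Fa-f]*)
            echo "not a WAL segment name (non-hex character): $name" >&2
            return 1
            ;;
    esac
    local tli log seg
    tli=$(printf '%s' "$name" | cut -c1-8)
    log=$(printf '%s' "$name" | cut -c9-16)
    seg=$(printf '%s' "$name" | cut -c17-24)
    # printf interprets the 0x prefix as a hexadecimal number.
    printf 'timeline=%d log=%d segment=%d\n' "0x$tli" "0x$log" "0x$seg"
}

# Usage:
#   decode_wal_name 000000010000000000000001
#   prints: timeline=1 log=0 segment=1
```

This matches the "log file 0, segment 1" wording PostgreSQL uses when it complains about this file.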

Starting GitLab then produced the following errors:

2015-05-07_06:57:31.44580 LOG: startup process (PID 24323) was terminated by signal 6: Aborted
2015-05-07_06:57:31.44584 LOG: aborting startup due to startup process failure
2015-05-07_06:57:32.57551 LOG: database system was shut down at 2015-05-07 06:01:13 GMT
2015-05-07_06:57:32.57616 LOG: invalid magic number 209F in log file 0, segment 1, offset 13164544
2015-05-07_06:57:32.57626 LOG: invalid primary checkpoint record
2015-05-07_06:57:32.57633 LOG: invalid secondary checkpoint record
2015-05-07_06:57:32.57637 PANIC: could not locate a valid checkpoint record
2015-05-07_06:57:32.58162 LOG: startup process (PID 24328) was terminated by signal 6: Aborted
2015-05-07_06:57:32.58165 LOG: aborting startup due to startup process failure
2015-05-07_06:57:33.71350 LOG: database system was shut down at 2015-05-07 06:01:13 GMT
2015-05-07_06:57:33.71406 LOG: invalid magic number 209F in log file 0, segment 1, offset 13164544
2015-05-07_06:57:33.71422 LOG: invalid primary checkpoint record
2015-05-07_06:57:33.71440 LOG: invalid secondary checkpoint record
2015-05-07_06:57:33.71452 PANIC: could not locate a valid checkpoint record
2015-05-07_06:57:33.71862 LOG: startup process (PID 24333) was terminated by signal 6: Aborted
2015-05-07_06:57:33.71865 LOG: aborting startup due to startup process failure
2015-05-07_06:57:34.83445 LOG: database system was shut down at 2015-05-07 06:01:13 GMT
2015-05-07_06:57:34.83475 LOG: invalid magic number 209F in log file 0, segment 1, offset 13164544
2015-05-07_06:57:34.83484 LOG: invalid primary checkpoint record
2015-05-07_06:57:34.83490 LOG: invalid secondary checkpoint record

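As the output shows, the format of the transaction log was damaged, so PostgreSQL could not find the expected magic number. At first I assumed the transaction log was not essential data, so my first attempt was to delete it. I had tried this on the GitLab instance on my own computer, where GitLab restarted normally after the transaction log was deleted. On our server, however, the restart still failed after the deletion: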

2015-05-07_07:05:33.92258 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
2015-05-07_07:05:33.92266 LOG:  invalid primary checkpoint record
2015-05-07_07:05:33.92280 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
2015-05-07_07:05:33.92294 LOG:  invalid secondary checkpoint record
2015-05-07_07:05:33.92306 PANIC:  could not locate a valid checkpoint record
2015-05-07_07:05:33.93380 LOG:  startup process (PID 26695) was terminated by signal 6: Aborted
2015-05-07_07:05:33.93384 LOG:  aborting startup due to startup process failure
2015-05-07_07:05:35.12269 LOG:  database system was shut down at 2015-05-07 06:01:13 GMT
2015-05-07_07:05:35.12302 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
2015-05-07_07:05:35.12311 LOG:  invalid primary checkpoint record
2015-05-07_07:05:35.12319 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
2015-05-07_07:05:35.12326 LOG:  invalid secondary checkpoint record
2015-05-07_07:05:35.12331 PANIC:  could not locate a valid checkpoint record
2015-05-07_07:05:35.12604 LOG:  startup process (PID 26698) was terminated by signal 6: Aborted
2015-05-07_07:05:35.12613 LOG:  aborting startup due to startup process failure
2015-05-07_07:05:36.26036 LOG:  database system was shut down at 2015-05-07 06:01:13 GMT
2015-05-07_07:05:36.26059 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
2015-05-07_07:05:36.26068 LOG:  invalid primary checkpoint record
2015-05-07_07:05:36.26075 LOG:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): No such file or directory
2015-05-07_07:05:36.26082 LOG:  invalid secondary checkpoint record

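This time the message shows that the file really is required: without it PostgreSQL cannot find a checkpoint record, and GitLab refuses to restart.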
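
The two failure modes above (a damaged segment versus a missing one) can be told apart mechanically from the log text. Below is a minimal sketch that counts the relevant messages; the function name diagnose_pg_log is hypothetical, and the log path in the usage comment assumes svlogd's usual "current" file under /var/log/gitlab/postgresql.

```shell
# Hypothetical helper: read PostgreSQL log lines on stdin and count
# the messages that distinguish a corrupt WAL segment from a missing one.
diagnose_pg_log() {
    awk '
        /invalid magic number/                               { corrupt++ }
        /could not open file "pg_xlog\//                     { missing++ }
        /PANIC:.*could not locate a valid checkpoint record/ { panic++ }
        END {
            # Adding 0 forces unset counters to print as 0.
            printf "wal_corrupt=%d wal_missing=%d checkpoint_panic=%d\n",
                   corrupt + 0, missing + 0, panic + 0
        }'
}

# Usage (path is an assumption, adjust to your installation):
#   diagnose_pg_log < /var/log/gitlab/postgresql/current
```

A non-zero wal_corrupt corresponds to the first log excerpt, a non-zero wal_missing to the second; in both cases checkpoint_panic is what stops the startup.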

After a long search online, I finally repaired it with the transaction log reset commands. The steps are as follows:

1. First, find out which database GitLab uses:

[root@localhost data]# gitlab-ctl status
run: logrotate: (pid 16789) 2010s; run: log: (pid 1354) 78349s
run: nginx: (pid 16792) 2010s; run: log: (pid 1213) 78355s
run: postgresql: (pid 16796) 2009s; run: log: (pid 902) 78432s
run: redis: (pid 16798) 2009s; run: log: (pid 608) 78497s
run: sidekiq: (pid 16800) 2009s; run: log: (pid 1155) 78361s
run: unicorn: (pid 16804) 2009s; run: log: (pid 1105) 78367s

2. Then find the directory where PostgreSQL is installed:

[root@localhost data]# ps -elf | grep  postgresql
4 S root       890   535  0  80   0 -  1045 poll_s May06 ?        00:00:00 runsv postgresql
4 S root       902   890  0  80   0 -  1081 poll_s May06 ?        00:00:00 svlogd -tt /var/log/gitlab/postgresql
4 S gitlab-+ 16796   890  0  80   0 - 331484 poll_s 15:45 ?       00:00:00 /opt/gitlab/embedded/bin/postgres -D /var/opt/gitlab/postgresql/data
0 S root     20240 10053  0  80   0 - 28161 pipe_w 16:18 pts/3    00:00:00 grep --color=auto postgresql

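The output shows that the PostgreSQL binary is /opt/gitlab/embedded/bin/postgres and that its data directory (the -D argument) is /var/opt/gitlab/postgresql/data.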
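
Reading the paths off the ps output by eye works, but it can also be scripted. This is a rough sketch that looks for a command ending in /postgres followed by -D; the function name find_postgres_paths is made up for illustration, and the field layout is assumed to match the ps -elf output shown above.

```shell
# Hypothetical helper: read `ps -elf` output on stdin and print the
# directory holding the postgres binary plus its data directory.
find_postgres_paths() {
    awk '{
        for (i = 1; i + 2 <= NF; i++) {
            # Match ".../postgres -D <datadir>" anywhere on the line.
            if ($i ~ /\/postgres$/ && $(i + 1) == "-D") {
                bin = $i
                sub(/\/postgres$/, "", bin)
                print "bindir=" bin
                print "datadir=" $(i + 2)
                exit
            }
        }
    }'
}

# Usage:
#   ps -elf | find_postgres_paths
```

With the process list from step 2 this prints bindir=/opt/gitlab/embedded/bin and datadir=/var/opt/gitlab/postgresql/data.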

3. The same directory contains two other commands:

pg_controldata 
pg_resetxlog 

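These two commands are the tools for resetting PostgreSQL's transaction log: pg_controldata only prints the cluster's control information, while pg_resetxlog performs the reset. I was not sure whether it writes a new log or repairs the old one. According to the PostgreSQL documentation, pg_resetxlog does not repair anything: it discards the existing log and starts a new, empty one, so changes recorded only in the old log may be lost.

4. Look up NextOID and NextXID: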

[root@localhost data]# /opt/gitlab/embedded/bin/pg_controldata /var/opt/gitlab/postgresql/data/
pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           6145704018498809342
Database cluster state:               in production
pg_control last modified:             Thu 07 May 2015 03:50:09 PM CST
Latest checkpoint location:           0/20000B0
Prior checkpoint location:            0/2000020
Latest checkpoint's REDO location:    0/20000B0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/2184
Latest checkpoint's NextOID:          16857
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1878
Latest checkpoint's oldestXID's DB:   16385
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Thu 07 May 2015 03:50:09 PM CST
Minimum recovery ending location:     0/0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            minimal
Current max_connections setting:      200
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

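The output contains NextOID and NextXID. Note both values, because the next command needs them. NextXID is printed as epoch/xid (0/2184 here); only the part after the slash, 2184, is passed to -x.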

5. Reset the transaction log:

[root@localhost data]# sudo -u gitlab-psql /opt/gitlab/embedded/bin/pg_resetxlog -o 16857 -x 2184 /var/opt/gitlab/postgresql/data/
Transaction log reset

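The output confirms that the reset succeeded. If the command instead complains that the database is locked, the database is still running; stop GitLab first and try again: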

gitlab-ctl stop

6. Then restart GitLab:

gitlab-ctl start
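
Put together, the repair comes down to stop, reset, start. The sketch below strings the commands from this post into one function that only prints them unless told to execute. The function name reset_gitlab_wal and the "execute" switch are made up for illustration, and the cp -a backup line is my own precaution, not one of the steps I actually ran.

```shell
# Hypothetical wrapper around the steps in this post.
# Usage: reset_gitlab_wal <NextOID> <NextXID> [execute]
# Without "execute" it only prints the commands (dry run).
reset_gitlab_wal() {
    local oid=$1 xid=$2 mode=${3:-dry-run}
    local bindir=/opt/gitlab/embedded/bin
    local datadir=/var/opt/gitlab/postgresql/data
    local run=echo

    # Both values must be plain decimal numbers.
    case "$oid$xid" in
        ''|*[!0-9]*) echo "oid and xid must be numeric" >&2; return 1 ;;
    esac
    if [ -z "$oid" ] || [ -z "$xid" ]; then
        echo "usage: reset_gitlab_wal <NextOID> <NextXID> [execute]" >&2
        return 1
    fi
    if [ "$mode" = "execute" ]; then
        run=    # empty prefix: the commands below really run
    fi

    $run gitlab-ctl stop &&
    # Assumption: keep a copy of the data directory before the reset.
    $run cp -a "$datadir" "$datadir.bak" &&
    $run sudo -u gitlab-psql "$bindir/pg_resetxlog" -o "$oid" -x "$xid" "$datadir" &&
    $run gitlab-ctl start
}

# Dry run with the values from this post:
#   reset_gitlab_wal 16857 2184
```

Running it without the third argument is a cheap way to review the exact commands before touching a production server.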

Reference:

http://www.cnblogs.com/eshizhan/archive/2012/09/23/2699327.html