Skip to content

Commit 493c475

Browse files
srinathv2Commitfest Bot
authored andcommitted
pg_rewind: ignore shutdown-only WAL when determining end-of-WAL
Previously, pg_rewind determined the end-of-WAL on the target by using the last shutdown checkpoint (or minRecoveryPoint for a standby). This caused false positives in scenarios where the old primary was shut down after a failover: the only WAL record generated was a shutdown checkpoint, while the new primary and old primary still contained identical data. In such cases, pg_rewind incorrectly concluded that if (target_wal_endrec > divergerec) rewind_needed = true; and performed a rewind even though no real changes existed after the divergence point. With this patch, pg_rewind now scans backward from the last checkpoint to locate the most recent valid WAL record that is not a shutdown checkpoint or XLOG switch. As a result, a rewind is only required when the target contains actual changes past the divergence point, avoiding unnecessary rewind operations in clean failover scenarios.
1 parent ae0e1be commit 493c475

File tree

3 files changed

+55
-16
lines changed

3 files changed

+55
-16
lines changed

src/bin/pg_rewind/parsexlog.c

Lines changed: 31 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -117,18 +117,23 @@ extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
117117
}
118118

119119
/*
120-
* Reads one WAL record. Returns the end position of the record, without
121-
* doing anything with the record itself.
120+
* Find the last valid WAL record after the divergence point.
121+
*
122+
* Skips over records such as shutdown checkpoints and XLOG
123+
* switch records, which otherwise could make pg_rewind think a
124+
* rewind is required even when no real changes happened after failover.
125+
* Returns the end position of the last meaningful record.
122126
*/
123127
XLogRecPtr
124-
readOneRecord(const char *datadir, XLogRecPtr ptr, int tliIndex,
128+
findLastValidRecord(const char *datadir, XLogRecPtr ptr, int tliIndex,
125129
const char *restoreCommand)
126130
{
127131
XLogRecord *record;
128132
XLogReaderState *xlogreader;
129133
char *errormsg;
130134
XLogPageReadPrivate private;
131135
XLogRecPtr endptr;
136+
uint8 info;
132137

133138
private.tliIndex = tliIndex;
134139
private.restoreCommand = restoreCommand;
@@ -138,16 +143,37 @@ readOneRecord(const char *datadir, XLogRecPtr ptr, int tliIndex,
138143
if (xlogreader == NULL)
139144
pg_fatal("out of memory while allocating a WAL reading processor");
140145

146+
for (;;)
147+
{
148+
XLogBeginRead(xlogreader, ptr);
149+
record = XLogReadRecord(xlogreader, &errormsg);
150+
if (record == NULL)
151+
{
152+
if (errormsg)
153+
pg_fatal("could not read WAL record at %X/%08X: %s",
154+
LSN_FORMAT_ARGS(ptr), errormsg);
155+
else
156+
pg_fatal("could not read WAL record at %X/%08X",
157+
LSN_FORMAT_ARGS(ptr));
158+
}
159+
ptr = record->xl_prev;
160+
info = record->xl_info & ~XLR_INFO_MASK;
161+
if((info != XLOG_CHECKPOINT_SHUTDOWN) && (info != XLOG_SWITCH))
162+
{
163+
break;
164+
}
165+
}
166+
ptr = xlogreader->EndRecPtr;
141167
XLogBeginRead(xlogreader, ptr);
142168
record = XLogReadRecord(xlogreader, &errormsg);
143169
if (record == NULL)
144170
{
145171
if (errormsg)
146172
pg_fatal("could not read WAL record at %X/%08X: %s",
147-
LSN_FORMAT_ARGS(ptr), errormsg);
173+
LSN_FORMAT_ARGS(ptr), errormsg);
148174
else
149175
pg_fatal("could not read WAL record at %X/%08X",
150-
LSN_FORMAT_ARGS(ptr));
176+
LSN_FORMAT_ARGS(ptr));
151177
}
152178
endptr = xlogreader->EndRecPtr;
153179

src/bin/pg_rewind/pg_rewind.c

Lines changed: 23 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -405,16 +405,29 @@ main(int argc, char **argv)
405405

406406

407407
/*
408-
* Determine the end-of-WAL on the target.
409-
*
410-
* The WAL ends at the last shutdown checkpoint, or at
411-
* minRecoveryPoint if it was a standby. (If we supported rewinding a
412-
* server that was not shut down cleanly, we would need to replay
413-
* until we reach the first invalid record, like crash recovery does.)
414-
*/
415-
416-
/* read the checkpoint record on the target to see where it ends. */
417-
chkptendrec = readOneRecord(datadir_target,
408+
* Determine the effective end-of-WAL on the target.
409+
*
410+
* Previously, this was taken directly from the last shutdown checkpoint,
411+
* or from minRecoveryPoint if the server was a standby. However, this
412+
* approach can falsely indicate divergence: when the old primary is shut
413+
* down after promoting a standby, the only WAL record generated on the
414+
* old primary is a shutdown checkpoint. In such cases, both clusters have
415+
* identical data, yet the presence of that extra checkpoint record makes
416+
* pg_rewind believe the target WAL extends past the divergence point:
417+
*
418+
* if (target_wal_endrec > divergerec)
419+
* rewind_needed = true;
420+
*
421+
* That sets rewind_needed = true even though no user data changes exist.
422+
*
423+
* To avoid this, we no longer treat a plain shutdown checkpoint
424+
* as a meaningful record when determining end-of-WAL. We instead
425+
* scan backward to the last valid WAL record *after* divergence,
426+
* skipping over shutdown-only artifacts. This ensures rewind is only
427+
* triggered if there are actual changes on the target after divergence.
428+
*/
429+
430+
chkptendrec = findLastValidRecord(datadir_target,
418431
ControlFile_target.checkPoint,
419432
targetNentries - 1,
420433
restore_command);

src/bin/pg_rewind/pg_rewind.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ extern void findLastCheckpoint(const char *datadir, XLogRecPtr forkptr,
4040
XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
4141
XLogRecPtr *lastchkptredo,
4242
const char *restoreCommand);
43-
extern XLogRecPtr readOneRecord(const char *datadir, XLogRecPtr ptr,
43+
extern XLogRecPtr findLastValidRecord(const char *datadir, XLogRecPtr ptr,
4444
int tliIndex, const char *restoreCommand);
4545

4646
/* in pg_rewind.c */

0 commit comments

Comments
 (0)