From: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-19 11:46:30
Message-ID: 680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Lists: pgsql-bugs
PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
6.8.0); x86-64
Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
started seeing logical replication failures with publisher errors like this:
ERROR: invalid memory alloc request size 1196493216
(the exact size varies). Here is a typical log extract from the publisher:
2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
2025-05-19 10:30:07.467048+02
2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
ProcessStandbyReplyMessage, walsender.c:2431
2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
00000: skipped replication of an empty transaction with XID: 207637565
2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
slot "jnb_production", output plugin "pgoutput", in the commit callback,
associated LSN FB03/349FF938
2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
pgoutput_commit_txn, pgoutput.c:629
2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
UpdateDecodingStats, logical.c:1943
2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
00000: found top level transaction 207637519, with catalog changes
2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
SnapBuildCommitTxn, snapbuild.c:1150
2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
XX000: invalid memory alloc request size 1196493216
If I'm reading it right, things go wrong on the publisher while preparing the
message, i.e. it's not a subscriber problem.
This particular instance was triggered by a large number of catalog
invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
rmgr: Transaction len (rec/tot): 10665/ 10665, tx: 207637519, lsn:
FB03/34A1AAE0, prev FB03/34A1A8C8, desc: COMMIT 2025-05-19 08:10:12.880599 CEST;
dropped stats: 2/17426/661557718 2/17426/661557717 2/17426/661557714
2/17426/661557678 2/17426/661557677 2/17426/661557674 2/17426/661557673
2/17426/661557672 2/17426/661557669 2/17426/661557618 2/17426/661557617
2/17426/661557614; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79
catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55
catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 80 catcache
79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache
55 catcache 54 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 63 catcache 63 catcache 55 catcache 54 catcache 80 catcache 79 catcache
80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache
6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache
54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54
catcache 55 catcache 54 catcache 55 catcache 54 catcache 63 catcache 63 catcache
63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 63 catcache 63 catcache 55 catcache 54 catcache 32 catcache 7 catcache
6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache
79 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 7 catcache
6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 32
catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79
catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 32 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache
63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80
catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 661557614 snapshot
1214 relcache 661557617 relcache 661557618 relcache 661557617 snapshot 2608
relcache 661557617 relcache 661557618 relcache 661557614 snapshot 2608 snapshot
2608 relcache 661557669 snapshot 1214 relcache 661557672 relcache 661557673
relcache 661557672 snapshot 2608 relcache 661557672 relcache 661557673 relcache
661557669 snapshot 2608 relcache 661557669 snapshot 2608 relcache 661557674
snapshot 1214 relcache 661557677 relcache 661557678 relcache 661557677 snapshot
2608 relcache 661557677 relcache 661557678 relcache 661557674 snapshot 2608
snapshot 2608 relcache 661557714 snapshot 1214 relcache 661557717 relcache
661557718 relcache 661557717 snapshot 2608 relcache 661557717 relcache 661557718
relcache 661557714 snapshot 2608 relcache 661557714 relcache 661557718 relcache
661557717 snapshot 2608 relcache 661557717 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557714 snapshot 2608 snapshot 1214 relcache 661557678 relcache
661557677 snapshot 2608 relcache 661557677 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557674 snapshot 2608 snapshot 1214 relcache 661557673 relcache
661557672 snapshot 2608 relcache 661557672 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557669 snapshot 2608 snapshot 1214 relcache 661557618 relcache
661557617 snapshot 2608 relcache 661557617 snapshot 2608 snapshot 2608 snapshot
2608 relcache 661557614 snapshot 2608 snapshot 1214
While it is long, it doesn't seem to merit allocating anything like 1GB of
memory. So I'm guessing that postgres is miscalculating the required size somehow.
If I skip over this LSN, for example by dropping and recreating the
subscription, then things go fine for a while before hitting another "invalid
memory alloc request", i.e. it wasn't just a one-off. On the other hand, after
downgrading to 17.4, the subscribers spontaneously recovered and the issue has
gone away. Since I didn't skip over the last LSN of this kind, presumably 17.4
successfully serialized a message for the same problematic bit of WAL that
caused 17.5 to blow up, which suggests a regression between 17.4 and 17.5.
Best wishes, Duncan.
From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 05:46:15
Message-ID: CANhcyEWp_T7tX-yKbdbxdUR144UAZ7oxNM_AORfCvWHZg0ja5w@mail.gmail.com
Lists: pgsql-bugs
On Mon, 19 May 2025 at 20:08, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
> ERROR: invalid memory alloc request size 1196493216
>
> (the exact size varies). Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
> ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
> pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
> UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
> SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>
> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
> rmgr: Transaction len (rec/tot): 10665/ 10665, tx: 207637519, lsn:
> FB03/34A1AAE0, prev FB03/34A1A8C8, desc: COMMIT 2025-05-19 08:10:12.880599 CEST;
> dropped stats: 2/17426/661557718 2/17426/661557717 2/17426/661557714
> 2/17426/661557678 2/17426/661557677 2/17426/661557674 2/17426/661557673
> 2/17426/661557672 2/17426/661557669 2/17426/661557618 2/17426/661557617
> 2/17426/661557614; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79
> catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
> catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55
> catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 80 catcache
> 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache
> 55 catcache 54 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 55 catcache 54 catcache 80 catcache 79 catcache
> 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache
> 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
> catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54
> catcache 55 catcache 54 catcache 55 catcache 54 catcache 63 catcache 63 catcache
> 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 55 catcache 54 catcache 32 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache
> 79 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 32
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79
> catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 32 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache
> 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80
> catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 661557614 snapshot
> 1214 relcache 661557617 relcache 661557618 relcache 661557617 snapshot 2608
> relcache 661557617 relcache 661557618 relcache 661557614 snapshot 2608 snapshot
> 2608 relcache 661557669 snapshot 1214 relcache 661557672 relcache 661557673
> relcache 661557672 snapshot 2608 relcache 661557672 relcache 661557673 relcache
> 661557669 snapshot 2608 relcache 661557669 snapshot 2608 relcache 661557674
> snapshot 1214 relcache 661557677 relcache 661557678 relcache 661557677 snapshot
> 2608 relcache 661557677 relcache 661557678 relcache 661557674 snapshot 2608
> snapshot 2608 relcache 661557714 snapshot 1214 relcache 661557717 relcache
> 661557718 relcache 661557717 snapshot 2608 relcache 661557717 relcache 661557718
> relcache 661557714 snapshot 2608 relcache 661557714 relcache 661557718 relcache
> 661557717 snapshot 2608 relcache 661557717 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557714 snapshot 2608 snapshot 1214 relcache 661557678 relcache
> 661557677 snapshot 2608 relcache 661557677 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557674 snapshot 2608 snapshot 1214 relcache 661557673 relcache
> 661557672 snapshot 2608 relcache 661557672 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557669 snapshot 2608 snapshot 1214 relcache 661557618 relcache
> 661557617 snapshot 2608 relcache 661557617 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557614 snapshot 2608 snapshot 1214
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>
> If I skip over this LSN, for example by dropping and recreating the
> subscription, then things go fine for a while before hitting another "invalid
> memory alloc request", i.e. it wasn't just a one-off. On the other hand, after
> downgrading to 17.4, the subscribers spontaneously recovered and the issue has
> gone away. Since I didn't skip over the last LSN of this kind, presumably 17.4
> successfully serialized a message for the same problematic bit of WAL that
> caused 17.5 to blow up, which suggests a regression between 17.4 and 17.5.
>
Hi Duncan,
Thanks for reporting this.
I tried adding around 80,000 invalidations but could not reproduce the issue.
Can you share the steps to reproduce the above scenario?
Thanks and Regards,
Shlok Kyal
From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 05:48:24
Message-ID: CAA4eK1JwJw6JOnfDxtGtSRF7kM0LbEVPRmNxWeJa5+wyoG05Xg@mail.gmail.com
Lists: pgsql-bugs
On Mon, May 19, 2025 at 8:08 PM Duncan Sands
<duncan(dot)sands(at)deepbluecap(dot)com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
> ERROR: invalid memory alloc request size 1196493216
>
> (the exact size varies). Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
> ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
> pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
> UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
> SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>
Right, I also think so.
> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
...
...
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>
We fixed a bug in commit 4909b38af0 to distribute invalidations at
transaction end to avoid data loss in certain cases, and that could cause
such a problem. What puzzles me is that even prior to that commit, we
would eventually end up allocating the required memory for all of a
transaction's invalidations because of the repalloc in
ReorderBufferAddInvalidations, so why does it matter with this commit? One
possibility is that we now need such allocations for multiple in-progress
transactions. I'll think more about this. It would be helpful if
you could share more details about the workload or, if possible, a
test case or script with which we can reproduce this problem.
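As a back-of-the-envelope check, assuming sizeof(SharedInvalidationMessage)
is 16 bytes as in the v17 sources, 1196493216 / 16 is roughly 75 million
messages, vastly more than the few hundred invalidation messages visible in
the dumped commit record. So a single transaction's own invalidations can't
plausibly account for the request; some multiplicative effect across
transactions seems necessary.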
--
With Regards,
Amit Kapila.
From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 11:12:15
Message-ID: CAA4eK1LMgqeT_bPZ3MH-VKvwOqpZyfJmF7knZhu1rqt2Pqsnwg@mail.gmail.com
Lists: pgsql-bugs
On Wed, May 21, 2025 at 11:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> <duncan(dot)sands(at)deepbluecap(dot)com> wrote:
> >
> > While it is long, it doesn't seem to merit allocating anything like 1GB of
> > memory. So I'm guessing that postgres is miscalculating the required size somehow.
> >
>
> We fixed a bug in commit 4909b38af0 to distribute invalidations at
> transaction end to avoid data loss in certain cases, and that could cause
> such a problem. What puzzles me is that even prior to that commit, we
> would eventually end up allocating the required memory for all of a
> transaction's invalidations because of the repalloc in
> ReorderBufferAddInvalidations, so why does it matter with this commit? One
> possibility is that we now need such allocations for multiple in-progress
> transactions.
>
I think the problem here is that when we are distributing
invalidations to a concurrent transaction, in addition to queuing the
invalidations as a change, we also copy the distributed invalidations
along with the original transaction's invalidations via repalloc in
ReorderBufferAddInvalidations. So, when there are many in-progress
transactions, each would try to copy all its accumulated invalidations
to the remaining in-progress transactions. This could lead to such an
increase in allocation request size. However, after queuing the
change, we don't need to copy it along with the original transaction's
invalidations. This is because the copy is only required when we don't
process any changes in cases like ReorderBufferForget(). I have
analyzed all such cases, and my analysis is as follows:
ReorderBufferForget()
------------------------------
It is okay not to perform the invalidations that we got from other
concurrent transactions during ReorderBufferForget. This is because
ReorderBufferForget executes invalidations when we skip the
transaction being decoded, as it is not from a database of interest.
So, we execute them only to invalidate shared catalogs (see the comment at the
caller of ReorderBufferForget). It is sufficient to execute such
invalidations in the source transaction only because the transaction
being skipped wouldn't have loaded anything in the shared catalog.
ReorderBufferAbort()
-----------------------------
ReorderBufferAbort() processes invalidations when it has already streamed
some changes. Whenever it would have streamed the change, it would
have processed the concurrent transactions' invalidation messages that
happened before the statement that led to streaming. That should be
sufficient for us.
Consider the following variant of the original case that required the
distribution of invalidations:
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S1: ALTER PUBLICATION pb ADD TABLE d;
5) S2: INSERT INTO unrelated_tab VALUES(1);
6) S2: ROLLBACK;
7) S2: INSERT INTO d VALUES('d3');
8) S1: INSERT INTO d VALUES('d4');
The problem with the sequence is that the insert from 3) could be
decoded *after* 4) in step 5) due to streaming, and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state is from *before* the ALTER TABLE. Because the transaction
started in 3) doesn't modify any catalogs, no invalidations are
executed after decoding it. The result could be that the cache looks
like it did at 3), not like after 4). However, this won't create a
problem because while streaming at 5), we would execute the invalidations
from S1 due to the change added via the REORDER_BUFFER_CHANGE_INVALIDATION
message in ReorderBufferAddInvalidations.
ReorderBufferInvalidate
--------------------------------
The reason is the same as ReorderBufferForget(), as it executes
invalidations for the same reason, but with a different function to
avoid the cleanup of the buffer at the end.
XLOG_XACT_INVALIDATIONS
-------------------------------------------
While processing XLOG_XACT_INVALIDATIONS, we don't need invalidations
accumulated from other xacts because this is a special case to execute
invalidations from a particular command (DDL) in a transaction. It
won't build any cache, so it can't create any invalid state.
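To illustrate how this copying can snowball, here is a toy standalone C
simulation (my own model, not PostgreSQL code; the transaction count and
per-commit message count are made up, and it assumes a 64-bit size_t). Each
committing transaction hands its accumulated invalidation messages to every
other in-progress transaction, which appends them to its flat array, exactly
the copy argued above to be unnecessary:
```
#include <stdio.h>

#define NTXN            10   /* concurrent transactions (assumed) */
#define MSGS_PER_COMMIT 400  /* invalidations a txn generates itself (assumed) */
#define MSG_SIZE        16   /* sizeof(SharedInvalidationMessage) */

int main(void)
{
    size_t ninval[NTXN] = {0};
    size_t max_request = 0;
    int    commits = 0;

    while (max_request < 1196493216UL)
    {
        int i = commits % NTXN;

        ninval[i] += MSGS_PER_COMMIT;   /* the txn's own DDL invalidations */

        /* On commit, the txn's whole accumulated array is handed to every
         * other in-progress txn, which appends it (the repalloc path). */
        for (int j = 0; j < NTXN; j++)
        {
            if (j == i)
                continue;
            ninval[j] += ninval[i];
            if (ninval[j] * MSG_SIZE > max_request)
                max_request = ninval[j] * MSG_SIZE;
        }

        ninval[i] = 0;                  /* a fresh txn reuses the slot */
        commits++;
    }

    printf("request size reaches %zu bytes after %d commits\n",
           max_request, commits);
    return 0;
}
```
With these made-up numbers the tracked request size passes the reported
~1.2GB after a few dozen commits at most, even though no single transaction
ever generates more than a few hundred messages of its own.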
--
With Regards,
Amit Kapila.
From: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 11:30:58
Message-ID: f0b728d5-0061-46d2-a52c-7babd8b6024f@deepbluecap.com
Lists: pgsql-bugs
Hi Amit and Shlok, thanks for thinking about this issue. We are working on
reproducing it in our test environment. Since it seems likely to be related to
our primary database being very busy with lots of concurrency and large
transactions, we are starting by creating a streaming replication copy of our
primary server (this copy to run 17.5, with the primary on 17.4), with the idea
of then doing logical replication from the standby to see if we hit the same
issue. If so, that gives us something to poke at, and we can move on to something
better from there.
Best wishes, Duncan.
On 21/05/2025 07:48, Amit Kapila wrote:
> On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> <duncan(dot)sands(at)deepbluecap(dot)com> wrote:
>>
>> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
>> 6.8.0); x86-64
>>
>> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
>> started seeing logical replication failures with publisher errors like this:
>>
>> ERROR: invalid memory alloc request size 1196493216
>>
>> (the exact size varies). Here is a typical log extract from the publisher:
>>
>> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
>> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
>> 2025-05-19 10:30:07.467048+02
>> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
>> ProcessStandbyReplyMessage, walsender.c:2431
>> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
>> 00000: skipped replication of an empty transaction with XID: 207637565
>> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
>> slot "jnb_production", output plugin "pgoutput", in the commit callback,
>> associated LSN FB03/349FF938
>> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
>> pgoutput_commit_txn, pgoutput.c:629
>> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
>> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
>> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
>> UpdateDecodingStats, logical.c:1943
>> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
>> 00000: found top level transaction 207637519, with catalog changes
>> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
>> SnapBuildCommitTxn, snapbuild.c:1150
>> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
>> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
>> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
>> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
>> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
>> XX000: invalid memory alloc request size 1196493216
>>
>> If I'm reading it right, things go wrong on the publisher while preparing the
>> message, i.e. it's not a subscriber problem.
>>
>
> Right, I also think so.
>
>> This particular instance was triggered by a large number of catalog
>> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
>> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>>
> ...
> ...
>>
>> While it is long, it doesn't seem to merit allocating anything like 1GB of
>> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>>
>
> We fixed a bug in commit 4909b38af0 to distribute invalidations at
> transaction end to avoid data loss in certain cases, and that could cause
> such a problem. What puzzles me is that even prior to that commit, we
> would eventually end up allocating the required memory for all of a
> transaction's invalidations because of the repalloc in
> ReorderBufferAddInvalidations, so why does it matter with this commit? One
> possibility is that we now need such allocations for multiple in-progress
> transactions. I'll think more about this. It would be helpful if
> you could share more details about the workload or, if possible, a
> test case or script with which we can reproduce this problem.
>
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-21 11:48:12 |
Message-ID: | OSCPR01MB14966139B6C89C27647712A90F59EA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear hackers,
> I think the problem here is that when we are distributing
> invalidations to a concurrent transaction, in addition to queuing the
> invalidations as a change, we also copy the distributed invalidations
> along with the original transaction's invalidations via repalloc in
> ReorderBufferAddInvalidations. So, when there are many in-progress
> transactions, each would try to copy all its accumulated invalidations
> to the remaining in-progress transactions. This could lead to such an
> increase in allocation request size. However, after queuing the
> change, we don't need to copy it along with the original transaction's
> invalidations. This is because the copy is only required when we don't
> process any changes in cases like ReorderBufferForget(). I have
> analyzed all such cases, and my analysis is as follows:
Based on the analysis, I created a PoC which avoids the repalloc().
Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
no longer added to the transaction's list, only queued as changes, so the
repalloc can be skipped. Also, the function distributes only the messages in
the list, so received messages won't be sent on again.
For now the patch is against PG17 and is for testing purposes. Duncan, can you
apply it and confirm whether it solves the issue?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment: PG17-0001-Avoid-distributing-invalidation-messages-several-tim.patch (application/octet-stream, 5.0 KB)
From: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 13:27:31
Message-ID: 75edd9d3-d861-4dd0-b190-180cc034ceba@deepbluecap.com
Lists: pgsql-bugs
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> no longer added to the transaction's list, only queued as changes, so the
> repalloc can be skipped. Also, the function distributes only the messages in
> the list, so received messages won't be sent on again.
>
> For now the patch is against PG17 and is for testing purposes. Duncan, can you
> apply it and confirm whether it solves the issue?
Thanks Hayato Kuroda, will do; however, it may take a few days.
Best wishes, Duncan.
From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 18:24:14
Message-ID: CAD21AoCjpj-DoASrhKCiEKuxh=0q3rC1RnAeugEwEJ=xwk=n6g@mail.gmail.com
Lists: pgsql-bugs
On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, May 21, 2025 at 11:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> > <duncan(dot)sands(at)deepbluecap(dot)com> wrote:
> > >
> > > While it is long, it doesn't seem to merit allocating anything like 1GB of
> > > memory. So I'm guessing that postgres is miscalculating the required size somehow.
> > >
> >
> > We fixed a bug in commit 4909b38af0 to distribute invalidations at
> > transaction end to avoid data loss in certain cases, and that could cause
> > such a problem. What puzzles me is that even prior to that commit, we
> > would eventually end up allocating the required memory for all of a
> > transaction's invalidations because of the repalloc in
> > ReorderBufferAddInvalidations, so why does it matter with this commit? One
> > possibility is that we now need such allocations for multiple in-progress
> > transactions.
> >
>
> I think the problem here is that when we are distributing
> invalidations to a concurrent transaction, in addition to queuing the
> invalidations as a change, we also copy the distributed invalidations
> along with the original transaction's invalidations via repalloc in
> ReorderBufferAddInvalidations. So, when there are many in-progress
> transactions, each would try to copy all its accumulated invalidations
> to the remaining in-progress transactions. This could lead to such an
> increase in allocation request size.
I agree with this analysis.
> However, after queuing the
> change, we don't need to copy it along with the original transaction's
> invalidations. This is because the copy is only required when we don't
> process any changes in cases like ReorderBufferForget().
It seems that we use the accumulated invalidation messages also after
replaying or concurrently aborting a transaction via
ReorderBufferExecuteInvalidations(). Do we need to consider such cases
too?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-22 10:56:55
Message-ID: CAA4eK1LVcHD6VpH7m=8F9ru7Qdj9hsXE0gXtS93RmtTxf6L1NA@mail.gmail.com
Lists: pgsql-bugs
On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
>
> > However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes in cases like ReorderBufferForget().
>
> It seems that we use the accumulated invalidation messages also after
> replaying or concurrently aborting a transaction via
> ReorderBufferExecuteInvalidations(). Do we need to consider such cases
> too?
>
Good point. After replaying the transaction, it doesn't matter because
we would have already relayed the required invalidations while
processing the REORDER_BUFFER_CHANGE_INVALIDATION messages. However,
for the concurrent abort case it could matter. See my analysis below:
Simulation of concurrent abort
------------------------------------------
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S2: INSERT INTO unrelated_tab VALUES(1);
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO unrelated_tab VALUES(2);
7) S2: ROLLBACK;
8) S2: INSERT INTO d VALUES('d3');
9) S1: INSERT INTO d VALUES('d4');
The problem with the sequence is that the insert from 3) could be
decoded *after* 5) in step 6) due to streaming and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state is from *before* the ALTER TABLE. Because the transaction
started in 3) doesn't actually modify any catalogs, no invalidations
are executed after decoding it. Now assume that, while decoding the
insert from 4), we detect a concurrent abort; then the distributed
invalidations won't be executed, and if we don't have the accumulated
messages in txn->invalidations, the invalidation from step 5) won't be
performed either. Data loss can then occur in steps 8 and 9. This is
just a theory, so I could be missing something.
If the above turns out to be a problem, one idea for fixing it is that
for the concurrent abort case (both during streaming and for prepared
transaction's processing), we still check all the remaining changes
and process only the changes related to invalidations. This has to be
done before the current txn changes are freed via
ReorderBufferResetTXN->ReorderBufferTruncateTXN.
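To make the idea concrete, here is a rough, untested sketch (the function
name is made up; the field names and ReorderBufferExecuteInvalidations are
from reorderbuffer.c, and a real fix would also have to handle changes
already serialized to disk):
```
/* Untested sketch: on concurrent abort, before the txn's changes are
 * freed, walk the queued changes still in memory and execute only the
 * invalidation ones, so distributed invalidations are not lost. */
static void
ReorderBufferExecutePendingInvalidations(ReorderBufferTXN *txn)
{
    dlist_iter  iter;

    dlist_foreach(iter, &txn->changes)
    {
        ReorderBufferChange *change =
            dlist_container(ReorderBufferChange, node, iter.cur);

        if (change->action == REORDER_BUFFER_CHANGE_INVALIDATION)
            ReorderBufferExecuteInvalidations(change->data.inval.ninvalidations,
                                              change->data.inval.invalidations);
    }
}
```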
Thoughts?
--
With Regards,
Amit Kapila.
From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-22 12:23:47
Message-ID: CANhcyEW8UyMr_7idB580DT3bjtB=EKiHwecTx5KC3ggiVs9c+A@mail.gmail.com
Lists: pgsql-bugs
On Wed, 21 May 2025 at 17:18, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> > I think the problem here is that when we are distributing
> > invalidations to a concurrent transaction, in addition to queuing the
> > invalidations as a change, we also copy the distributed invalidations
> > along with the original transaction's invalidations via repalloc in
> > ReorderBufferAddInvalidations. So, when there are many in-progress
> > transactions, each would try to copy all its accumulated invalidations
> > to the remaining in-progress transactions. This could lead to such an
> > increase in allocation request size. However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes in cases like ReorderBufferForget(). I have
> > analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> no longer added to the transaction's list, only queued as changes, so the
> repalloc can be skipped. Also, the function distributes only the messages in
> the list, so received messages won't be sent on again.
>
> For now the patch is against PG17 and is for testing purposes. Duncan, can you
> apply it and confirm whether it solves the issue?
>
Hi,
I was able to reproduce the issue with the following test:
1. First begin 9 concurrent txns. (BEGIN; INSERT into t1 values(11);)
2. In a 10th concurrent txn: perform 1000 DDLs (ALTER PUBLICATION ADD/DROP TABLE)
3. For each of the 9 concurrent txns, perform:
i. Add 1000 DDLs
ii. COMMIT;
iii. BEGIN; INSERT into t1 values(11);
4. Perform steps 2 and 3 in a loop
These steps reproduced the error:
2025-05-22 19:03:35.111 JST [63150] sub1 ERROR: invalid memory alloc
request size 1555752832
2025-05-22 19:03:35.111 JST [63150] sub1 STATEMENT: START_REPLICATION
SLOT "sub1" LOGICAL 0/0 (proto_version '4', streaming 'parallel',
origin 'any', publication_names '"pub1"')
I have also attached the test script for the same.
Also, I tried to run the test with Kuroda-san's patch and it did not
reproduce the issue.
Thanks and Regards,
Shlok Kyal
Attachment: 036_test.pl (application/octet-stream, 6.1 KB)
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-22 12:59:52 |
Message-ID: | OSCPR01MB149669E1CAFE63051244F8E35F599A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Amit, Sawada-san,
> Good point. After replaying the transaction, it doesn't matter because
> we would have already relayed the required invalidations while
> processing the REORDER_BUFFER_CHANGE_INVALIDATION messages. However,
> for the concurrent abort case it could matter. See my analysis below:
>
> Simulation of concurrent abort
> ------------------------------------------
> 1) S1: CREATE TABLE d(data text not null);
> 2) S1: INSERT INTO d VALUES('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> 4) S2: INSERT INTO unrelated_tab VALUES(1);
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO unrelated_tab VALUES(2);
> 7) S2: ROLLBACK;
> 8) S2: INSERT INTO d VALUES('d3');
> 9) S1: INSERT INTO d VALUES('d4');
> The problem with the sequence is that the insert from 3) could be
> decoded *after* 5) in step 6) due to streaming and that to decode the
> insert (which happened before the ALTER) the catalog snapshot and
> cache state is from *before* the ALTER TABLE. Because the transaction
> started in 3) doesn't actually modify any catalogs, no invalidations
> are executed after decoding it. Now assume that, while decoding the
> insert from 4), we detect a concurrent abort; then the distributed
> invalidations won't be executed, and if we don't have the accumulated
> messages in txn->invalidations, the invalidation from step 5) won't be
> performed either. Data loss can then occur in steps 8 and 9. This is
> just a theory, so I could be missing something.
I verified whether this is real, and succeeded in reproducing it. See the
appendix for the detailed steps.
> If the above turns out to be a problem, one idea for fixing it is that
> for the concurrent abort case (both during streaming and for prepared
> transaction's processing), we still check all the remaining changes
> and process only the changes related to invalidations. This has to be
> done before the current txn changes are freed via
> ReorderBufferResetTXN->ReorderBufferTruncateTXN.
I have roughly implemented that part; PSA the updated version. One concern is
whether we should consider the case where executing invalidations causes an
ereport(ERROR). If that happens, the walsender will exit at that point.
Appendix - reproducer
==============
Only one instance was used in the test. The defined objects were:
```
CREATE TABLE d(data text not null);
CREATE TABLE unrelated_tab(data text not null);
CREATE PUBLICATION pb;
```
Then pg_recvlogical was used to replicate the changes. Actual command:
```
$ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres
-d postgres -o proto_version=4 -o publication_names=pb -o messages=true
-o streaming=true -f -
```
Below are the actual steps. The gdb debugger was used to synchronize the test.
0. Prepare two sessions S1, and S2, and one replication connection
1. Ran "INSERT INTO d VALUES('d1');" on S1
2. Ran "BEGIN; INSERT INTO d VALUES('d2');"
3. Ran "INSERT INTO unrelated_tab VALUES('d2');"
4. Ran "ALTER PUBLICATION pb ADD TABLE d;"
5. Attached the walsender process via gdb
6. Set a breakpoint at HandleConcurrentAbort
7. Ran INSERT INTO unrelated_tab VALUES(generate_series(1, 5000));
This causes the changes in S2 to be streamed.
8. Confirmed that gdb stopped the walsender process.
9. Ran the continue command in gdb several times, to ensure the process
accessed "unrelated_tab". On my env, the backtrace at that time was [1].
10. Ran "ROLLBACK" in S2
11. In the gdb session, moved the program forward and ensured that the
concurrent_abort error was raised.
12. Detached gdb from the walsender
13. Ran "INSERT INTO d VALUES('d3');" on S2
14. Ran "INSERT INTO d VALUES('d4');" on S1.
15. Checked the output from pg_recvlogical, and confirmed d3 and d4 were not output [2]
[1]
```
Breakpoint 1, HandleConcurrentAbort () at ../postgres/src/backend/access/index/genam.c:484
484 if (TransactionIdIsValid(CheckXidAlive) &&
(gdb) bt
#0 HandleConcurrentAbort () at ../postgres/src/backend/access/index/genam.c:484
#1 0x000000000052628a in systable_getnext (sysscan=0x31bcbd0)
at ../postgres/src/backend/access/index/genam.c:545
#2 0x0000000000b37afa in SearchCatCacheMiss (cache=0x3107180, nkeys=1, hashValue=2617776010,
hashIndex=10, v1=16389, v2=0, v3=0, v4=0)
at ../postgres/src/backend/utils/cache/catcache.c:1544
#3 0x0000000000b379a3 in SearchCatCacheInternal (cache=0x3107180, nkeys=1, v1=16389, v2=0, v3=0,
v4=0) at ../postgres/src/backend/utils/cache/catcache.c:1464
#4 0x0000000000b3769a in SearchCatCache1 (cache=0x3107180, v1=16389)
at ../postgres/src/backend/utils/cache/catcache.c:1332
#5 0x0000000000b544d5 in SearchSysCache1 (cacheId=55, key1=16389)
at ../postgres/src/backend/utils/cache/syscache.c:228
#6 0x0000000000b3e62a in get_rel_namespace (relid=16389)
at ../postgres/src/backend/utils/cache/lsyscache.c:1956
#7 0x00007fa3fdb1e0ec in get_rel_sync_entry (data=0x3160108, relation=0x7fa3fd06f398)
at ../postgres/src/backend/replication/pgoutput/pgoutput.c:2037
#8 0x00007fa3fdb1d126 in pgoutput_change (ctx=0x315fd90, txn=0x31b8aa0, relation=0x7fa3fd06f398,
change=0x31bab18) at ../postgres/src/backend/replication/pgoutput/pgoutput.c:1455
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) f 8
#8 0x00007fa3fdb1d126 in pgoutput_change (ctx=0x315fd90, txn=0x31b8aa0, relation=0x7fa3fd06f398,
change=0x31bab18) at ../postgres/src/backend/replication/pgoutput/pgoutput.c:1455
1455 relentry = get_rel_sync_entry(data, relation);
(gdb) p relation->rd_rel.relname
$2 = {data = "unrelated_tab", '\000' <repeats 50 times>}
```
[2]:
```
$ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres
-d postgres -o proto_version=4 -o publication_names=pb -o messages=true
-o streaming=true -f -
S
E
A
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment: v2-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch (application/octet-stream, 7.1 KB)
From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-23 03:00:35
Message-ID: CAA4eK1L7CA-A=VMn8fiugZ+CRt+wz473Adrx3nxq8Ougu=O2kQ@mail.gmail.com
Lists: pgsql-bugs
On Thu, May 22, 2025 at 6:29 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Amit, Sawada-san,
>
> > Good point. After replaying the transaction, it doesn't matter because
> > we would have already relayed the required invalidation while
> > processing REORDER_BUFFER_CHANGE_INVALIDATION messages. However
> > for
> > concurrent abort case it could matter. See my analysis for the same
> > below:
> >
> > Simulation of concurrent abort
> > ------------------------------------------
> > 1) S1: CREATE TABLE d(data text not null);
> > 2) S1: INSERT INTO d VALUES('d1');
> > 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> > 4) S2: INSERT INTO unrelated_tab VALUES(1);
> > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > 6) S2: INSERT INTO unrelated_tab VALUES(2);
> > 7) S2: ROLLBACK;
> > 8) S2: INSERT INTO d VALUES('d3');
> > 9) S1: INSERT INTO d VALUES('d4');
>
> > The problem with the sequence is that the insert from 3) could be
> > decoded *after* 5), in step 6), due to streaming, and that to decode the
> > insert (which happened before the ALTER) the catalog snapshot and
> > cache state are from *before* the ALTER TABLE. Because the transaction
> > started in 3) doesn't actually modify any catalogs, no invalidations
> > are executed after decoding it. Now assume that, while decoding the insert
> > from 4), we detect a concurrent abort; then the distributed
> > invalidation won't be executed, and if we don't have the accumulated
> > messages in txn->invalidations, the invalidation from step 5)
> > won't be performed. Data loss can then occur in steps 8 and 9. This is
> > just a theory, so I could be missing something.
>
> I verified whether this is real, and succeeded in reproducing it. See the
> appendix for the detailed steps.
>
> > If the above turns out to be a problem, one idea for fixing it is that
> > for the concurrent abort case (both during streaming and for prepared
> > transaction's processing), we still check all the remaining changes
> > and process only the changes related to invalidations. This has to be
> > done before the current txn changes are freed via
> > ReorderBufferResetTXN->ReorderBufferTruncateTXN.
>
> I roughly implemented that part; PSA the updated version. One concern is whether
> we should consider the case where invalidations can cause ereport(ERROR). If that
> happens, the walsender will exit at that time.
>
But, in the catch part, we are already executing invalidations:
...
/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(txn->ninvalidations, txn->invalidations);
...
So, the behaviour should be the same.
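For context, the surrounding control flow has roughly this shape (heavily
abridged sketch; only the comment and call quoted above are verbatim, the rest
is paraphrased for illustration):
```
/*
 * Abridged sketch of the error handling around transaction replay in
 * reorderbuffer.c; paraphrased, not the literal source.
 */
static void
ReplayTXNSketch(ReorderBufferTXN *txn)
{
    PG_TRY();
    {
        /* Replay the transaction's changes; a concurrent abort surfaces
         * here as an ERROR raised during a catalog access. */
    }
    PG_CATCH();
    {
        /* Abort the (sub)transaction used for catalog lookups. */
        AbortCurrentTransaction();

        /* make sure there's no cache pollution */
        ReorderBufferExecuteInvalidations(txn->ninvalidations,
                                          txn->invalidations);

        /* The real code swallows the error only for a detected concurrent
         * abort while streaming; otherwise it re-throws. */
        PG_RE_THROW();
    }
    PG_END_TRY();
}
```
So an ereport(ERROR) raised while executing invalidations would already
propagate from this existing path as well.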
--
With Regards,
Amit Kapila.
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-23 17:31:54 |
Message-ID: | CAD21AoCeM2nni1P7Z6KXzLM=6zCdShC82sOvuvu0_hBuJkm9Qw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, May 22, 2025 at 3:57 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > > However, after queuing the
> > > change, we don't need to copy it along with the original transaction's
> > > invalidations. This is because the copy is only required when we don't
> > > process any changes in cases like ReorderBufferForget().
> >
> > It seems that we use the accumulated invalidation message also after
> > replaying or concurrently aborting a transaction via
> > ReorderBufferExecuteInvalidations(). Do we need to consider such cases
> > too?
> >
>
> Good point. After replaying the transaction, it doesn't matter because
> we would have already relayed the required invalidation while
> processing REORDER_BUFFER_CHANGE_INVALIDATION messages.
I think the reason why we execute all invalidation messages even in
non-concurrent-abort cases is that we also need to invalidate caches that are
loaded during the replay. Consider the following sequence:
1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');
When replaying S2's first transaction at 7), we decode the insert from 3)
using the snapshot from before the ALTER, creating the cache for table 'd'.
Then we invalidate the cache via the inval message distributed from S1's
ALTER, and build the relcache again when decoding the insert from 6); the
cache now reflects the state after the ALTER. When replaying S3's transaction
at 8), we should decode the insert from 4) using the snapshot from before the
ALTER. Since we call ReorderBufferExecuteInvalidations() also in
non-concurrent-abort paths, we can invalidate the relcache built when decoding
the insert from 6). If we don't include the inval message distributed from 5)
in txn->invalidations, we don't invalidate that relcache and end up sending
the insert from 4) even though it happened before the ALTER.
> However for
> concurrent abort case it could matter. See my analysis for the same
> below:
>
> Simulation of concurrent abort
> ------------------------------------------
> 1) S1: CREATE TABLE d(data text not null);
> 2) S1: INSERT INTO d VALUES('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> 4) S2: INSERT INTO unrelated_tab VALUES(1);
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO unrelated_tab VALUES(2);
> 7) S2: ROLLBACK;
> 8) S2: INSERT INTO d VALUES('d3');
> 9) S1: INSERT INTO d VALUES('d4');
>
> The problem with the sequence is that the insert from 3) could be
> decoded *after* 5), in step 6), due to streaming, and that to decode the
> insert (which happened before the ALTER) the catalog snapshot and
> cache state are from *before* the ALTER TABLE. Because the transaction
> started in 3) doesn't actually modify any catalogs, no invalidations
> are executed after decoding it. Now assume that, while decoding the insert
> from 4), we detect a concurrent abort; then the distributed
> invalidation won't be executed, and if we don't have the accumulated
> messages in txn->invalidations, the invalidation from step 5)
> won't be performed. Data loss can then occur in steps 8 and 9. This is
> just a theory, so I could be missing something.
This scenario makes sense to me. I agree that this turns out to be a problem.
>
> If the above turns out to be a problem, one idea for fixing it is that
> for the concurrent abort case (both during streaming and for prepared
> transaction's processing), we still check all the remaining changes
> and process only the changes related to invalidations. This has to be
> done before the current txn changes are freed via
> ReorderBufferResetTXN->ReorderBufferTruncateTXN.
>
> Thoughts?
If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-24 15:42:30 |
Message-ID: | 6bc28291-b212-4a84-925a-e6e5ef2fb72c@deepbluecap.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Hayato Kuroda, thank you so much for working on this problem. Your patch
PG17-0001-Avoid-distributing-invalidation-messages-several-tim.patch solves the
issue for me. Without it I get an invalid memory alloc request error within
about twenty minutes. With your patch, 24 hours have passed with no errors.
Best wishes, Duncan.
On 21/05/2025 13:48, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
>
>> I think the problem here is that when we are distributing
>> invalidations to a concurrent transaction, in addition to queuing the
>> invalidations as a change, we also copy the distributed invalidations
>> along with the original transaction's invalidations via repalloc in
>> ReorderBufferAddInvalidations. So, when there are many in-progress
>> transactions, each would try to copy all its accumulated invalidations
>> to the remaining in-progress transactions. This could lead to such an
>> increase in allocation request size. However, after queuing the
>> change, we don't need to copy it along with the original transaction's
>> invalidations. This is because the copy is only required when we don't
>> process any changes in cases like ReorderBufferForget(). I have
>> analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> not added to the list, just queued, so the repalloc can be skipped. Also, the
> function distributes only the messages in the list, so received messages won't
> be sent again.
>
> For now, a patch for PG17 is created for testing purposes. Duncan, can you
> apply it and confirm whether the issue is solved?
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-25 04:55:16 |
Message-ID: | OSCPR01MB149668F3F198C888028E76A44F59AA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Sawada-san,
> I think the reason why we execute all invalidation messages even in
> non concurrent abort cases is that we need to invalidate all caches as
> well that are loaded during the replay. Consider the following
> sequences:
>
> 1) S1: CREATE TABLE d (data text not null);
> 2) S1: INSERT INTO d VALUES ('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO d VALUES ('d4');
> 7) S2: COMMIT;
> 8) S3: COMMIT;
> 9) S2: INSERT INTO d VALUES('d5');
> 10) S1: INSERT INTO d VALUES ('d6');
>
> When replaying S2's first transaction at 7), we decode the insert from
> 3) using the snapshot which is from before the ALTER, creating the
> cache for table 'd'. Then we invalidate the cache by the inval message
> distributed from S1's ALTER and then build the relcache again when
> decoding the insert from 6). The cache is the state after the ALTER.
> When replaying S3's transaction at 8), we should decode the insert
> from 4) using the snapshot which is from before the ALTER. Since we
> call ReorderBufferExecuteInvalidations() also in non concurrent abort
> paths, we can invalidate the relcache built when decoding the insert
> from 6). If we don't include the inval message distributed from 5) to
> txn->invalidations, we don't invalidate the relcache and end up
> sending the insert from 4) even though it happened before the ALTER.
Thanks for providing another scenario. Let me test this workload and share the results later.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-26 09:22:30 |
Message-ID: | OS7PR01MB14968B3C263074A2DEB77DB58F565A@OS7PR01MB14968.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Sawada-san,
> I think the reason why we execute all invalidation messages even in
> non concurrent abort cases is that we need to invalidate all caches as
> well that are loaded during the replay. Consider the following
> sequences:
>
> 1) S1: CREATE TABLE d (data text not null);
> 2) S1: INSERT INTO d VALUES ('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO d VALUES ('d4');
> 7) S2: COMMIT;
> 8) S3: COMMIT;
> 9) S2: INSERT INTO d VALUES('d5');
> 10) S1: INSERT INTO d VALUES ('d6');
>
> When replaying S2's first transaction at 7), we decode the insert from
> 3) using the snapshot which is from before the ALTER, creating the
> cache for table 'd'. Then we invalidate the cache by the inval message
> distributed from S1's ALTER and then build the relcache again when
> decoding the insert from 6). The cache is the state after the ALTER.
> When replaying S3's transaction at 8), we should decode the insert
> from 4) using the snapshot which is from before the ALTER. Since we
> call ReorderBufferExecuteInvalidations() also in non concurrent abort
> paths, we can invalidate the relcache built when decoding the insert
> from 6). If we don't include the inval message distributed from 5) to
> txn->invalidations, we don't invalidate the relcache and end up
> sending the insert from 4) even though it happened before the ALTER.
You're right. I tested the workload on the latest PG17 with the PoC applied,
and confirmed that the PoC replicated the d3 tuple, which is not good.
> If the above hypothesis is true, we need to consider another idea so
> that we can execute invalidation messages in both cases.
The straightforward fix is to also check the change queue when the transaction
has invalidation messages; 0003 implements that. One downside is that
traversing the changes can affect performance: currently we iterate over all
of the changes even for a single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot
find a better solution for now.
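As a rough sketch of that traversal (the function name is hypothetical; the
change-list fields are as in reorderbuffer.h):
```
/*
 * Rough sketch of the 0003 idea: on concurrent abort, walk the
 * transaction's remaining changes and execute only the queued
 * invalidations before the changes are freed.  The whole-list walk is
 * the traversal cost mentioned above.
 */
static void
ReorderBufferExecuteQueuedInvalidations(ReorderBufferTXN *txn)
{
    dlist_iter  iter;

    dlist_foreach(iter, &txn->changes)
    {
        ReorderBufferChange *change =
            dlist_container(ReorderBufferChange, node, iter.cur);

        if (change->action == REORDER_BUFFER_CHANGE_INVALIDATION)
            ReorderBufferExecuteInvalidations(change->data.inval.ninvalidations,
                                              change->data.inval.invalidations);
    }
}
```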
Thoughts?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v3-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch | application/octet-stream | 8.5 KB |
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-26 11:18:56 |
Message-ID: | CAA4eK1+PGTdkTJwK8Necq8GrAJWP4T8cp_ponzbtogkFYvbK2w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > If the above hypothesis is true, we need to consider another idea so
> > that we can execute invalidation messages in both cases.
>
> The straightforward fix is to check the change queue as well when the transaction
> has invalidation messages. 0003 implemented that. One downside is that traversing
> changes can affect performance. Currently we iterate over all of the changes even for a
> single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
>
It can impact the performance of large transactions with few invalidations,
especially the ones which have spilled changes, because it needs to traverse
the entire list of changes again at the end. The other idea would be to add
new member(s) to ReorderBufferTXN to receive distributed invalidations. As for
adding the new member(s) to ReorderBufferTXN: (a) in HEAD, it should be okay;
(b) for back branches, we may be able to add them at the end, but we should
check whether any extensions use sizeof(ReorderBufferTXN) and, if they do,
what we need to do about it.
I think the new member(s) could be similar to the existing ones (uint32
ninvalidations; SharedInvalidationMessage *invalidations;), or could be a
separate change queue of only REORDER_BUFFER_CHANGE_INVALIDATION messages. The
second option is worth considering because multiple transactions can
distribute their invalidations to a single txn in chunks, which can be stored
as separate changes; its other benefit is a lower risk of needing a large
chunk of memory to be allocated via repalloc. Also, its size would be easy to
account for via ReorderBufferChangeMemoryUpdate.
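For illustration, the first option might look like this (member names are
hypothetical, not a committed design):
```
/*
 * Hypothetical additions to ReorderBufferTXN (reorderbuffer.h) for
 * invalidations distributed from other concurrently decoded transactions.
 * On back branches they would be appended at the end of the struct to
 * reduce the ABI risk discussed above.
 */
typedef struct ReorderBufferTXN
{
    /* ... all existing members, including: */
    uint32      ninvalidations;
    SharedInvalidationMessage *invalidations;

    /* invalidations received from other transactions (hypothetical) */
    uint32      ninvalidations_distributed;
    SharedInvalidationMessage *invalidations_distributed;
} ReorderBufferTXN;
```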
--
With Regards,
Amit Kapila.
From: | Christoph Berg <myon(at)debian(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-26 11:47:54 |
Message-ID: | aDRU6kxWoq015CbH@msg.df7cb.de |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Re: Duncan Sands
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
Fwiw, one more field report from Debian:
> We had a severe issue after upgrading postgresql-16 from 16.8-1.pgdg110+1 to 16.9-1.pgdg110+1. The following error happened quickly on a logical replication (sorry for the French).
>
> ERREUR: n'a pas pu recevoir des données du flux de WAL : ERROR: invalid memory alloc request size 1196912896
> [i.e., ERROR: could not receive data from WAL stream: ERROR: invalid memory alloc request size 1196912896]
>
> After trying several things, we finally managed to restart the replication after downgrading the pg-related packages on this host alone (the other servers were not impacted).
https://salsa.debian.org/postgresql/postgresql/-/issues/4
(no extra details in there, though)
Christoph
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-27 21:05:17 |
Message-ID: | CAD21AoBCn7RR0EYbK+1n5UTksc3CVn5AKvxBRSr7zR2eWqTTOw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > > If the above hypothesis is true, we need to consider another idea so
> > > that we can execute invalidation messages in both cases.
> >
> > The straightforward fix is to check the change queue as well when the transaction
> > has invalidation messages. 0003 implemented that. One downside is that traversing
> > changes can affect performance. Currently we iterate over all of the changes even for a
> > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
> >
>
> It can impact the performance for large transactions with fewer
> invalidations, especially the ones which have spilled changes because
> it needs to traverse the entire list of changes again at the end.
Agreed.
> The
> other idea would be to add new member(s) in ReorderBufferTXN to
> receive distributed invalidations. For adding the new member in
> ReorderBufferTXN: (a) in HEAD, it should be okay, (b) for
> backbranches, we may be able to add at the end, but we should check if
> there are any extensions using sizeof(ReorderBufferTxn) and if they
> are using what we need to do.
If we can make sure that the change won't break existing extensions, I think
this would be the most reasonable solution.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-28 12:27:51 |
Message-ID: | OSCPR01MB149667B316377CE0615E15138F567A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Sawada-san, Amit,
> > It can impact the performance for large transactions with fewer
> > invalidations, especially the ones which have spilled changes because
> > it needs to traverse the entire list of changes again at the end.
>
> Agreed.
>
> > The
> > other idea would be to add new member(s) in ReorderBufferTXN to
> > receive distributed invalidations. For adding the new member in
> > ReorderBufferTXN: (a) in HEAD, it should be okay, (b) for
> > backbranches, we may be able to add at the end, but we should check if
> > there are any extensions using sizeof(ReorderBufferTxn) and if they
> > are using what we need to do.
>
> If we can make sure that that change won't break the existing
> extensions, I think this would be the most reasonable solution.
Based on the discussion, I created a PoC for master/PG17; please see the
attachments. The basic idea is to introduce a new queue which contains only
distributed inval messages; its contents are consumed at the end of the
transaction. Some of the code can be re-used, so internal functions are
introduced. At least, it passes the regression tests and the workloads
discussed here.
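In outline, the distribution side of the idea looks like this (names are
hypothetical; the attached patches are authoritative):
```
/*
 * Hypothetical sketch of the v4 idea: instead of repalloc'ing the
 * receiver's txn->invalidations array (the source of the huge allocation
 * requests), distributed messages are only queued on a dedicated list and
 * consumed at the end of the transaction, in both the commit and the
 * concurrent-abort paths.
 */
static void
ReorderBufferQueueDistributedInvals(ReorderBuffer *rb, ReorderBufferTXN *txn,
                                    Size nmsgs, SharedInvalidationMessage *msgs)
{
    ReorderBufferChange *change = ReorderBufferAllocChange(rb);

    change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
    change->data.inval.ninvalidations = nmsgs;
    change->data.inval.invalidations = (SharedInvalidationMessage *)
        palloc(sizeof(SharedInvalidationMessage) * nmsgs);
    memcpy(change->data.inval.invalidations, msgs,
           sizeof(SharedInvalidationMessage) * nmsgs);

    /* hypothetical dedicated queue; never merged into txn->invalidations */
    dlist_push_tail(&txn->distributed_invals, &change->node);
}
```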
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v4-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch | application/octet-stream | 21.2 KB |
v4-master-0001-Avoid-distributing-invalidation-messages-s.patch | application/octet-stream | 21.2 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-28 18:51:39 |
Message-ID: | CAD21AoCUJ=hvM=VDcH-Po=_spfyDwenniXEcgkiZU2xc4FJdJQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, May 26, 2025 at 4:19โฏAM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 26, 2025 at 2:52โฏPM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > > If the above hypothesis is true, we need to consider another idea so
> > > that we can execute invalidation messages in both cases.
> >
> > The straightforward fix is to check the change queue as well when the transaction
> > has invalidation messages. 0003 implemented that. One downside is that traversing
> > changes can affect performance. Currently we iterate over all of the changes even for a
> > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
> >
>
> It can impact the performance for large transactions with fewer
> > invalidations, especially the ones which have spilled changes because
> it needs to traverse the entire list of changes again at the end.
What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION changes in
a queue while replaying the transaction, so that we can execute them at the
end in the non-error path instead of re-traversing the entire list of changes
to execute the inval messages? As for the concurrent-abort paths, we could
probably consider re-traversing the entire list, unconditionally invalidating
all caches (using InvalidateSystemCaches()), or somehow traversing the list of
changes only when there might be a REORDER_BUFFER_CHANGE_INVALIDATION among
the remaining changes?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-29 05:20:06 |
Message-ID: | CAA4eK1LqTcH5vYrCUFn=NUHv_9eYrZS6piLA3UUKD4=39xB=fQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, May 29, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > > If the above hypothesis is true, we need to consider another idea so
> > > > that we can execute invalidation messages in both cases.
> > >
> > > The straightforward fix is to check the change queue as well when the transaction
> > > has invalidation messages. 0003 implemented that. One downside is that traversing
> > > changes can affect performance. Currently we iterate over all of the changes even for a
> > > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
> > >
> >
> > It can impact the performance for large transactions with fewer
> > > invalidations, especially the ones which have spilled changes because
> > it needs to traverse the entire list of changes again at the end.
>
> What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION in
> a queue while replaying the transaction so that we can execute them at
> the end in a non-error path, instead of re-traversing the entire list
> of changes to execute the inval messages?
>
The currently proposed patch (v4) also traverses only the required inval
messages, as it maintains a separate queue for them. So what would be the
advantage of forming such a queue during the processing of changes? Are you
imagining a local queue instead of one at the ReorderBufferTXN level? I feel
we still need it at the ReorderBufferTXN level to ensure that we can execute
those changes across streaming blocks; otherwise, the cleanup of such a local
queue would be tricky and add maintenance effort.
One disadvantage of the approach you suggest is that the changes in the new
queue won't be accounted for in the logical_decoding_work_mem computation,
which can be done with the proposed approach, although the patch hasn't
implemented that as of now.
A few comments on v4:
===================
1.
+static void
+ReorderBufferExecuteInvalidationsInQueue(ReorderBuffer *rb,
ReorderBufferTXN *txn)
+{
...
...
+ /* Skip other changes because the transaction was aborted */
+ case REORDER_BUFFER_CHANGE_INSERT:
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ case REORDER_BUFFER_CHANGE_DELETE:
+ case REORDER_BUFFER_CHANGE_MESSAGE:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+ case REORDER_BUFFER_CHANGE_TRUNCATE:
+ break;
These cases should never happen, so we should have either an Assert or
an elog here.
2.
+void
+ReorderBufferAddInvalidationsExtended(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Size nmsgs,
+ SharedInvalidationMessage *msgs,
+ bool needs_distribute)
{
...
+ * XXX: IIUC This must be done only to the toptxns, but is it right?
+ */
+ if (!needs_distribute && !TransactionIdIsValid(txn->toplevel_xid))
+ {
+ ReorderBufferChange *inval_change;
+
+ /* Duplicate the inval change to queue it */
+ inval_change = ReorderBufferAllocChange(rb);
+ inval_change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
The name needs_distribute seems confusing because when this function
is invoked from SnapBuildDistributeSnapshotAndInval(), the parameter
is passed as false; it should be true in that case, and the meaning of
this parameter in the function should be reversed.
--
With Regards,
Amit Kapila.
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-29 06:07:10 |
Message-ID: | CAD21AoAAmGhA6NCWq=P-fzOxi4WZQCoN0y2_UOug+CMJL=YZPA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Wed, May 28, 2025 at 10:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 29, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
> > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > >
> > > > > If the above hypothesis is true, we need to consider another idea so
> > > > > that we can execute invalidation messages in both cases.
> > > >
> > > > The straightforward fix is to check the change queue as well when the transaction
> > > > has invalidation messages. 0003 implemented that. One downside is that traversing
> > > > changes can affect performance. Currently we iterate over all of the changes even for a
> > > > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
> > > >
> > >
> > > It can impact the performance for large transactions with fewer
> > > > invalidations, especially the ones which have spilled changes because
> > > it needs to traverse the entire list of changes again at the end.
> >
> > What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION in
> > a queue while replaying the transaction so that we can execute them at
> > the end in a non-error path, instead of re-traversing the entire list
> > of changes to execute the inval messages?
> >
>
> The current proposed patch (v4) is also traversing only the required
> inval messages, as it has maintained a separate queue for that. So,
> what will be the advantage of forming such a queue during the
> processing of changes? Are you imagining a local instead of a queue at
> ReorderBufferTXN level? I feel we still need at ReorderBufferTXN level
> to ensure that we can execute those changes across streaming blocks,
> otherwise, the cleanup of such a local queue would be tricky and add
> to maintenance effort.
Hmm, right. It seems that we keep accumulating inval messages across
streaming blocks.
> One disadvantage of the approach you suggest is that the changes in
> the new queue won't be accounted for in logical_decoding_work_mem
> computation, which can be done in the proposed approach, although the
> patch hasn't implemented it as of now.
If we serialize the new queue to disk, we would need to restore it in the
PG_CATCH() block in order to execute all the inval messages, which is
something I'd like to avoid, as it would involve many operations that could
themselves end in an error.
If each ReorderBufferTXN keeps only non-distributed inval messages in
txn->invalidations and distributes only txn->invalidations to other
transactions, the scope of influence of a single inval message is limited to
the transactions being decoded at the same time. How much chance is there that
the size of txn->invalidations reaches 1GB? Given that the size of
SharedInvalidationMessage is 16 bytes, we would need about 67 million inval
messages (2^30 / 16 = 2^26) generated across concurrent transactions.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-29 06:10:27 |
Message-ID: | OSCPR01MB149666911A5401ED7474A3845F566A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Sawada-san,
> What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION
> in
> a queue while replaying the transaction so that we can execute them at
> the end in a non-error path, instead of re-traversing the entire list
> of changes to execute the inval messages?
I think this idea is similar to the v4 patch [1]. It adds another queue which
stores the inval messages when they are distributed, and that queue is
consumed at commit time.
> As for concurrent abort
> paths, probably we can consider re-traversing the entire list,
> unconditionally invalidating all caches (using
> InvalidateSystemCaches()), or somehow traversing the list of changes
> only when there might be any REORDER_BUFFER_CHANGE_INVALIDATION in
> the
> rest of changes?
With my v4 patch, it is enough to consume all the inval messages in the
separate queue, because all the needed messages are stored there before we try
to process the txn. Based on that, I feel v4 is a bit simpler approach.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-29 08:22:44 |
Message-ID: | CAA4eK1LYsYCx1SNMid-4EgfYvBtxMbVvWC4q7NM7rLfyPz5GzQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, May 29, 2025 at 11:37 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, May 28, 2025 at 10:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, May 29, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
> > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > >
> > > > > > If the above hypothesis is true, we need to consider another idea so
> > > > > > that we can execute invalidation messages in both cases.
> > > > >
> > > > > The straightforward fix is to check the change queue as well when the transaction
> > > > > has invalidation messages. 0003 implemented that. One downside is that traversing
> > > > > changes can affect performance. Currently we iterate over all of the changes even for a
> > > > > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
> > > > >
> > > >
> > > > It can impact the performance for large transactions with fewer
> > > > > invalidations, especially the ones which have spilled changes because
> > > > it needs to traverse the entire list of changes again at the end.
> > >
> > > What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION in
> > > a queue while replaying the transaction so that we can execute them at
> > > the end in a non-error path, instead of re-traversing the entire list
> > > of changes to execute the inval messages?
> > >
> >
> > The current proposed patch (v4) is also traversing only the required
> > inval messages, as it has maintained a separate queue for that. So,
> > what will be the advantage of forming such a queue during the
> > processing of changes? Are you imagining a local instead of a queue at
> > ReorderBufferTXN level? I feel we still need at ReorderBufferTXN level
> > to ensure that we can execute those changes across streaming blocks,
> > otherwise, the cleanup of such a local queue would be tricky and add
> > to maintenance effort.
>
> Hmm, right. It seems that we keep accumulating inval messages across
> streaming blocks.
>
> > One disadvantage of the approach you suggest is that the changes in
> > the new queue won't be accounted for in logical_decoding_work_mem
> > computation, which can be done in the proposed approach, although the
> > patch hasn't implemented it as of now.
>
> If we serialize the new queue to the disk, we would need to restore
> them in PG_CATCH() block in order to execute all inval messages, which
> is something that I'd like to avoid as it would involve many
> operations that could end up in an error.
>
> If each ReorderBufferTXN has only non-distributed inval messages in
> txn->invalidation and distribute only txn->invalidations to other
> transactions, the scope of influence of a single Inval Message is
> limited to transactions that are being decoded at the same time. How
> much chance is there that the size of txn->invalidations reaches 1GB? Given
> that the size of SharedInvalidationMessage is 16 bytes, we would need about
> 67 million inval messages generated across concurrent transactions.
>
I agree that the chances are much lower than they are currently if
txn->invalidations doesn't contain invalidations from other transactions, but
it is not clear what exactly you are advocating. Are you suggesting that we
should maintain a member similar to txn->invalidations (say,
txn->distributed_invals) instead of a queue?
--
With Regards,
Amit Kapila.
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-29 12:09:57 |
Message-ID: | OSCPR01MB149668136257B13A675CDD1BBF566A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Amit,
I have created an updated patch after some self-review.
> One disadvantage of the approach you suggest is that the changes in
> the new queue won't be accounted for in logical_decoding_work_mem
> computation, which can be done in the proposed approach, although the
> patch hasn't implemented it as of now.
This part is not implemented yet because it is under discussion. Roughly
speaking, we must consider several points; it might be complex:
1. What is the ordering of serialization? Should we prioritize spilling
invalidation messages, or normal changes?
2. What is the filename? Can we use the same one as for changes?
3. IIUC we shouldn't stream inval messages even in the stream=on/parallel case.
Isn't that hacky?
> A few comments on v4:
> ===================
> 1.
> +static void
> +ReorderBufferExecuteInvalidationsInQueue(ReorderBuffer *rb,
> ReorderBufferTXN *txn)
> +{
> ...
> ...
> + /* Skip other changes because the transaction was aborted */
> + case REORDER_BUFFER_CHANGE_INSERT:
> + case REORDER_BUFFER_CHANGE_UPDATE:
> + case REORDER_BUFFER_CHANGE_DELETE:
> + case REORDER_BUFFER_CHANGE_MESSAGE:
> + case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
> + case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
> + case REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID:
> + case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT:
> + case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
> + case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
> + case REORDER_BUFFER_CHANGE_TRUNCATE:
> + break;
>
> These cases should never happen, so we should have either an Assert or
> an elog here.
Fixed. The switch was removed and an Assert() was added.
> 2.
> +void
> +ReorderBufferAddInvalidationsExtended(ReorderBuffer *rb, TransactionId xid,
> + XLogRecPtr lsn, Size nmsgs,
> + SharedInvalidationMessage *msgs,
> + bool needs_distribute)
> {
> ...
> + * XXX: IIUC This must be done only to the toptxns, but is it right?
> + */
> + if (!needs_distribute && !TransactionIdIsValid(txn->toplevel_xid))
> + {
> + ReorderBufferChange *inval_change;
> +
> + /* Duplicate the inval change to queue it */
> + inval_change = ReorderBufferAllocChange(rb);
> + inval_change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
>
> The name needs_distribute seems confusing because when this function
> is invoked from SnapBuildDistributeSnapshotAndInval(), the parameter
> is passed as false; it should be true in that case, and the meaning of
> this parameter in the function should be reversed.
Fixed.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v5-master-0001-Fix-invalid-memory-alloc-request-size-issu.patch | application/octet-stream | 22.2 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-29 17:26:32 |
Message-ID: | CAD21AoBDDGkHLryP1TJeh9QM3DB5ipLLThGfC6P_mk_bsxSA8A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, May 29, 2025 at 1:22 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 29, 2025 at 11:37 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, May 28, 2025 at 10:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, May 29, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, May 26, 2025 at 4:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, May 26, 2025 at 2:52 PM Hayato Kuroda (Fujitsu)
> > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > >
> > > > > > > If the above hypothesis is true, we need to consider another idea so
> > > > > > > that we can execute invalidation messages in both cases.
> > > > > >
> > > > > > The straightforward fix is to check the change queue as well when the transaction
> > > > > > has invalidation messages. 0003 implemented that. One downside is that traversing
> > > > > > changes can affect performance. Currently we iterate over all of the changes even for a
> > > > > > single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find a better solution for now.
> > > > > >
> > > > >
> > > > > It can impact the performance for large transactions with fewer
> > > > > > invalidations, especially the ones which have spilled changes because
> > > > > it needs to traverse the entire list of changes again at the end.
> > > >
> > > > What if we remember all executed REORDER_BUFFER_CHANGE_INVALIDATION in
> > > > a queue while replaying the transaction so that we can execute them at
> > > > the end in a non-error path, instead of re-traversing the entire list
> > > > of changes to execute the inval messages?
> > > >
> > >
> > > The current proposed patch (v4) is also traversing only the required
> > > inval messages, as it has maintained a separate queue for that. So,
> > > what will be the advantage of forming such a queue during the
> > > processing of changes? Are you imagining a local instead of a queue at
> > > ReorderBufferTXN level? I feel we still need at ReorderBufferTXN level
> > > to ensure that we can execute those changes across streaming blocks,
> > > otherwise, the cleanup of such a local queue would be tricky and add
> > > to maintenance effort.
> >
> > Hmm, right. It seems that we keep accumulating inval messages across
> > streaming blocks.
> >
> > > One disadvantage of the approach you suggest is that the changes in
> > > the new queue won't be accounted for in logical_decoding_work_mem
> > > computation, which can be done in the proposed approach, although the
> > > patch hasn't implemented it as of now.
> >
> > If we serialize the new queue to the disk, we would need to restore
> > them in PG_CATCH() block in order to execute all inval messages, which
> > is something that I'd like to avoid as it would involve many
> > operations that could end up in an error.
> >
> > If each ReorderBufferTXN has only non-distributed inval messages in
> > txn->invalidation and distribute only txn->invalidations to other
> > transactions, the scope of influence of a single Inval Message is
> > limited to transactions that are being decoded at the same time. How
> > much chance is there that the size of txn->invalidations reaches 1GB? Given
> > that the size of SharedInvalidationMessage is 16 bytes, we would need about
> > 67 million inval messages generated across concurrent transactions.
> >
>
> I agree that chances are much lower than current if txn->invalidations
> doesn't contain invalidations from other transactions, but it is not
> clear what exactly you are trying to advocate by it. Are you trying to
> advocate that we should maintain a member similar to txn->invalidation
> (say txn->distributed_invals) instead of a queue?
Yes, because I guess it's much simpler. I think it would not be a good idea to
introduce a new concept of accounting for the memory usage of the distributed
inval messages and serializing them, at least on back branches. I think that
in the case where txn->distributed_invals is about to overflow (it doesn't
have to be at 1GB), we can invalidate all caches instead.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-30 04:01:19 |
Message-ID: | CALDaNm0f=LgfQiV3oM3_4LOZrdLSSw8-+kUFvYNmxt4PYZymxw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, 29 May 2025 at 22:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I agree that chances are much lower than current if txn->invalidations
> > doesn't contain invalidations from other transactions, but it is not
> > clear what exactly you are trying to advocate by it. Are you trying to
> > advocate that we should maintain a member similar to txn->invalidation
> > (say txn->distributed_invals) instead of a queue?
>
> Yes, because I guess it's much simpler. I think it would not be a good
> idea to introduce a new concept of accounting the memory usage of the
> distributed inval messages too and serializing them, at least on back
> branches. I think that in the case where txn->distributed_invals is
> about to overflow (it doesn't have to be at 1GB), we can invalidate all
> caches instead.
To identify overflow scenarios, I'm considering the following options:
a) Introduce a new txn_flags value, such as RBTXN_INVAL_ALL_CACHE, to
explicitly mark transactions that require full cache invalidation.
b) Add a dedicated parameter to indicate an overflow scenario.
c) Set the newly added nentries_distr to -1 to indicate an overflow scenario.
Do you have any preference or thoughts on which of these approaches
would be cleaner?
Regards,
Vignesh
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-30 04:42:16 |
Message-ID: | CAA4eK1J8_iazK=3G76Wq6pFe+LW6FYwBOmz82Y1JTpJ8Ca1YLg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, May 30, 2025 at 9:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 29 May 2025 at 22:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > I agree that chances are much lower than current if txn->invalidations
> > > doesn't contain invalidations from other transactions, but it is not
> > > clear what exactly you are trying to advocate by it. Are you trying to
> > > advocate that we should maintain a member similar to txn->invalidation
> > > (say txn->distributed_invals) instead of a queue?
> >
> > Yes, because I guess it's much simpler. I think it would not be a good
> > idea to introduce a new concept of accounting the memory usage of the
> > distributed inval messages too and serializing them, at least on back
> > branches. I think that in the case where txn->distributed_invals is
> > about to overflow (it doesn't have to be at 1GB), we can invalidate all
> > caches instead.
>
I agree that it would be simpler, and to avoid invalid memory
allocation requests even for rare cases, we can have the backup logic
to invalidate all caches.
> To identify overflow scenarios, I'm considering the following options:
> a) Introduce a new txn_flags value, such as RBTXN_INVAL_ALL_CACHE, to
> explicitly mark transactions that require full cache invalidation.
> b) Add a dedicated parameter to indicate an overflow scenario.
> c) setting the newly added nentries_distr to -1, to indicate an
> overflow scenario.
>
> Do you have any preference or thoughts on which of these approaches
> would be cleaner?
>
I would prefer (a) as that is an explicit way to indicate that we need
to invalidate all caches. But let us see if Sawada-san has something
else in mind.
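For illustration, (a) could be wired up roughly as below; the flag name is
from the list above, while the bit value, cap, and member names are
placeholders:
```
#define RBTXN_INVAL_ALL_CACHE       0x0400  /* bit value illustrative */
#define MAX_DISTRIBUTED_INVAL_MSGS  8192    /* cap illustrative */

/* Called when another transaction distributes invalidations to us. */
static void
ReorderBufferAddDistributedInvals(ReorderBufferTXN *txn, uint32 nmsgs,
                                  SharedInvalidationMessage *msgs)
{
    if (txn->txn_flags & RBTXN_INVAL_ALL_CACHE)
        return;                 /* already overflowed; nothing more to track */

    if (txn->ninvalidations_distributed + nmsgs > MAX_DISTRIBUTED_INVAL_MSGS)
    {
        /* Stop accumulating; invalidate everything at execution time. */
        txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
        return;
    }

    /* ...otherwise enlarge the array and append msgs... */
}

/* At commit or concurrent abort of the transaction. */
static void
ReorderBufferExecuteDistributedInvals(ReorderBufferTXN *txn)
{
    if (txn->txn_flags & RBTXN_INVAL_ALL_CACHE)
        InvalidateSystemCaches();
    else
        ReorderBufferExecuteInvalidations(txn->ninvalidations_distributed,
                                          txn->invalidations_distributed);
}
```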
--
With Regards,
Amit Kapila.
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-30 12:15:19 |
Message-ID: | CALDaNm1-RXLA_6y_o1=j5Borz9kGapyimN=6jDbx1nDCmQzjGQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, 30 May 2025 at 10:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 30, 2025 at 9:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Thu, 29 May 2025 at 22:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > I agree that chances are much lower than current if txn->invalidations
> > > > doesn't contain invalidations from other transactions, but it is not
> > > > clear what exactly you are trying to advocate by it. Are you trying to
> > > > advocate that we should maintain a member similar to txn->invalidation
> > > > (say txn->distributed_invals) instead of a queue?
> > >
> > > Yes, because I guess it's much simpler. I think it would not be a good
> > > idea to introduce a new concept of accounting the memory usage of the
> > > distributed inval messages too and serializing them, at least on back
> > > branches. I think that in the case where txn->distributed_invals is
> > > about to overflow (it doesn't have to be at 1GB), we can invalidate all
> > > caches instead.
> >
>
> I agree that it would be simpler, and to avoid invalid memory
> allocation requests even for rare cases, we can have the backup logic
> to invalidate all caches.
>
> > To identify overflow scenarios, I'm considering the following options:
> > a) Introduce a new txn_flags value, such as RBTXN_INVAL_ALL_CACHE, to
> > explicitly mark transactions that require full cache invalidation.
> > b) Add a dedicated parameter to indicate an overflow scenario.
> > c) setting the newly added nentries_distr to -1, to indicate an
> > overflow scenario.
> >
> > Do you have any preference or thoughts on which of these approaches
> > would be cleaner?
> >
>
> I would prefer (a) as that is an explicit way to indicate that we need
> to invalidate all caches. But let us see if Sawada-san has something
> else in mind.
The attached v6 patch has the changes for the same.
Thoughts?
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v6-master-0001-Fix-exponential-memory-allocation-issue-in-logica.patch | text/x-patch | 9.0 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-30 16:45:34 |
Message-ID: | CAD21AoDRYGx_6Mr5xUTiNDOWUGYajDXzxgLv3kCyxn9=yqbpcg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, May 29, 2025 at 9:42 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 30, 2025 at 9:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Thu, 29 May 2025 at 22:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > I agree that chances are much lower than current if txn->invalidations
> > > > doesn't contain invalidations from other transactions, but it is not
> > > > clear what exactly you are trying to advocate by it. Are you trying to
> > > > advocate that we should maintain a member similar to txn->invalidation
> > > > (say txn->distributed_invals) instead of a queue?
> > >
> > > Yes, because I guess it's much simpler. I think it would not be a good
> > > idea to introduce a new concept of accounting the memory usage of the
> > > distributed inval messages too and serializing them, at least on back
> > > branches. I think that in the case where txn->distributed_invals is
> > > about to overflow (it doesn't have to be at 1GB), we can invalidate all
> > > caches instead.
> >
>
> I agree that it would be simpler, and to avoid invalid memory
> allocation requests even for rare cases, we can have the backup logic
> to invalidate all caches.
>
> > To identify overflow scenarios, I'm considering the following options:
> > a) Introduce a new txn_flags value, such as RBTXN_INVAL_ALL_CACHE, to
> > explicitly mark transactions that require full cache invalidation.
> > b) Add a dedicated parameter to indicate an overflow scenario.
> > c) setting the newly added nentries_distr to -1, to indicate an
> > overflow scenario.
> >
> > Do you have any preference or thoughts on which of these approaches
> > would be cleaner?
> >
>
> I would prefer (a) as that is an explicit way to indicate that we need
> to invalidate all caches. But let us see if Sawada-san has something
> else in mind.
(a) makes sense to me too. One concern in back branches is that unused
bits in txn_flags might be used by extensions, but it's unlikely.
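As a concrete illustration of option (a), here is a minimal sketch of what the flag and its accessor could look like in reorderbuffer.h, following the existing RBTXN_* conventions (the bit value is illustrative; the actual patch may assign a different one):

/* Does this transaction require all caches to be invalidated? */
#define RBTXN_INVAL_ALL_CACHE 0x0400

/* Should the complete cache be invalidated? */
#define rbtxn_inval_all_cache(txn) \
( \
    ((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0 \
)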
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-30 17:29:34 |
Message-ID: | CAD21AoCPJS3=SY28X5X_sfKzMA8PU3y0nm16ReyboFdX8=gRfg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, May 30, 2025 at 5:15 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 30 May 2025 at 10:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, May 30, 2025 at 9:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Thu, 29 May 2025 at 22:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > > I agree that chances are much lower than current if txn->invalidations
> > > > > doesn't contain invalidations from other transactions, but it is not
> > > > > clear what exactly you are trying to advocate by it. Are you trying to
> > > > > advocate that we should maintain a member similar to txn->invalidation
> > > > > (say txn->distributed_invals) instead of a queue?
> > > >
> > > > Yes, because I guess it's much simpler. I think it would not be a good
> > > > idea to introduce a new concept of accounting for the memory usage of the
> > > > distributed inval messages too and serializing them, at least on back
> > > > branches. I think that in the case where the txn->distributed_inval is
> > > > about to overflow (it doesn't have to be 1GB) we can invalidate all caches
> > > > instead.
> > >
> >
> > I agree that it would be simpler, and to avoid invalid memory
> > allocation requests even for rare cases, we can have the fallback logic
> > to invalidate all caches.
> >
> > > To identify overflow scenarios, I'm considering the following options:
> > > a) Introduce a new txn_flags value, such as RBTXN_INVAL_ALL_CACHE, to
> > > explicitly mark transactions that require full cache invalidation.
> > > b) Add a dedicated parameter to indicate an overflow scenario.
> > > c) Set the newly added nentries_distr to -1 to indicate an
> > > overflow scenario.
> > >
> > > Do you have any preference or thoughts on which of these approaches
> > > would be cleaner?
> > >
> >
> > I would prefer (a) as that is an explicit way to indicate that we need
> > to invalidate all caches. But let us see if Sawada-san has something
> > else in mind.
>
> The attached v6 version patch has the changes for the same.
> Thoughts?
Thank you for updating the patch. Here are some review comments:
@@ -3439,9 +3464,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
*rb, TransactionId xid,
XLogRecPtr lsn, Size nmsgs,
SharedInvalidationMessage *msgs)
{
- ReorderBufferTXN *txn;
+ ReorderBufferAddInvalidationsExtended(rb, xid, lsn, nmsgs, msgs, false);
+}
If the patch is the changes for master, do we need to have an extended
version of ReorderBufferAddInvalidations()?
---
+ /*
+ * Make sure there's no cache pollution. Unlike the PG_TRY part,
+ * this must be done unconditionally because the processing might
+ * fail before we reach invalidation messages.
+ */
+ if (rbtxn_inval_all_cache(txn))
+ InvalidateSystemCaches();
+ else
+ ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
+
txn->distributed_invalidations);
+
If we don't need to execute the distributed inval message in an error
path other than detecting concurrent abort, we should describe the
reason.
---
Given that we don't account for the memory usage of both
txn->invalidations and txn->distributed_invalidations, probably we can
have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.
---
+ if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
+ rbtxn_inval_all_cache(txn))
{
- txn->ninvalidations = nmsgs;
- txn->invalidations = (SharedInvalidationMessage *)
- palloc(sizeof(SharedInvalidationMessage) * nmsgs);
- memcpy(txn->invalidations, msgs,
- sizeof(SharedInvalidationMessage) * nmsgs);
+ txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
+
+ if (*invalidations)
+ {
+ pfree(*invalidations);
+ *invalidations = NULL;
+ *ninvalidations = 0;
+ }
RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
inval messages. One question is do we need to care about the overflow
of txn->invalidations as well? If no, does it make sense to have a
separate function like ReorderBufferAddDistributedInvalidations()
instead of having an extended version of
ReorderBufferAddInvalidations()? Some common routines can also be
declared as a static function if needed.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-31 04:50:45 |
Message-ID: | CAA4eK1JQ2UQEbKdWUa0ozh3NdJFMJXfAKRW-vFrC3JqKxmx1FQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patch. Here are some review comments:
>
> ---
> + /*
> + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> + * this must be done unconditionally because the processing might
> + * fail before we reach invalidation messages.
> + */
> + if (rbtxn_inval_all_cache(txn))
> + InvalidateSystemCaches();
> + else
> + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> +
> txn->distributed_invalidations);
> +
>
> If we don't need to execute the distributed inval message in an error
> path other than detecting concurrent abort, we should describe the
> reason.
>
The concurrent abort handling is done for streaming and prepared
transactions, where we send the transaction changes to the client
before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
among these, the COMMIT/ROLLBACK PREPARED cases are handled as part of the
prepared transaction case. For ROLLBACK, we will never perform any
changes from the current transaction, so we don't need distributed
invalidations to be executed. For COMMIT, if we encounter any errors
while processing changes (this is when we reach the ERROR path, which
is not a concurrent abort), then we will reprocess all changes and, at
the end, execute both the current transaction and distributed
invalidations. Now, one possibility is that if, after ERROR, the
caller does slot_advance to skip the ERROR, then we will probably miss
executing the distributed invalidations, leading to data loss
afterwards. If the above theory is correct, then it is better to
execute distributed invalidation even in non-concurrent-abort cases in
the ERROR path.
> ---
> Given that we don't account for the memory usage of both
> txn->invalidations and txn->distributed_invalidations, probably we can
> have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.
>
Are you thinking that if there are many walsenders in the system and we
kept the limit higher, say 256 MB, then all of them would start consuming
that much memory, leading to memory exhaustion? If so, then I agree with
your point to keep this limit as 8MB; we can probably explain this in the
comments as well.
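To put rough numbers on that concern (the walsender count here is purely illustrative; the ~16-byte message size is cited later in the thread):

8MB cap:   8 * 1024 * 1024 / 16 = 524288 messages per walsender, so
           50 walsenders consume at most ~400MB in aggregate.
256MB cap: 50 walsenders could consume up to ~12.5GB in aggregate.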
> ---
> + if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
> + rbtxn_inval_all_cache(txn))
> {
> - txn->ninvalidations = nmsgs;
> - txn->invalidations = (SharedInvalidationMessage *)
> - palloc(sizeof(SharedInvalidationMessage) * nmsgs);
> - memcpy(txn->invalidations, msgs,
> - sizeof(SharedInvalidationMessage) * nmsgs);
> + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
> +
> + if (*invalidations)
> + {
> + pfree(*invalidations);
> + *invalidations = NULL;
> + *ninvalidations = 0;
> + }
>
> RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
> inval messages. One question is do we need to care about the overflow
> of txn->invalidations as well?
>
I don't think so. This has been working for a long time, so unless we
see a case where the overflow can happen, it is better not to change
it.
>
If no, does it make sense to have a
> separate function like ReorderBufferAddDistributedInvalidations()
> instead of having an extended version of
> ReorderBufferAddInvalidations()? Some common routines can also be
> declared as a static function if needed.
>
+1.
--
With Regards,
Amit Kapila.
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-31 07:57:52 |
Message-ID: | CALDaNm0iqYWveBLwGSgyn5WZQ7F-hjPNtxLtctUi=Oa=-jTHDw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, 30 May 2025 at 23:00, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patch. Here are some review comments:
>
> @@ -3439,9 +3464,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
> *rb, TransactionId xid,
> XLogRecPtr lsn, Size nmsgs,
> SharedInvalidationMessage *msgs)
> {
> - ReorderBufferTXN *txn;
> + ReorderBufferAddInvalidationsExtended(rb, xid, lsn, nmsgs, msgs, false);
> +}
>
> If the patch is the changes for master, do we need to have an extended
> version of ReorderBufferAddInvalidations()?
This has been removed now and ReorderBufferAddDistributedInvalidations
has been added.
> ---
> + /*
> + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> + * this must be done unconditionally because the processing might
> + * fail before we reach invalidation messages.
> + */
> + if (rbtxn_inval_all_cache(txn))
> + InvalidateSystemCaches();
> + else
> + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> +
> txn->distributed_invalidations);
> +
>
> If we don't need to execute the distributed inval message in an error
> path other than detecting concurrent abort, we should describe the
> reason.
Removed it to keep it in the common error path
> ---
> Given that we don't account for the memory usage of both
> txn->invalidations and txn->distributed_invalidations, probably we can
> have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.
Modified
> ---
> + if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
> + rbtxn_inval_all_cache(txn))
> {
> - txn->ninvalidations = nmsgs;
> - txn->invalidations = (SharedInvalidationMessage *)
> - palloc(sizeof(SharedInvalidationMessage) * nmsgs);
> - memcpy(txn->invalidations, msgs,
> - sizeof(SharedInvalidationMessage) * nmsgs);
> + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
> +
> + if (*invalidations)
> + {
> + pfree(*invalidations);
> + *invalidations = NULL;
> + *ninvalidations = 0;
> + }
>
> RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
> inval messages. One question is do we need to care about the overflow
> of txn->invalidations as well? If no, does it make sense to have a
> separate function like ReorderBufferAddDistributedInvalidations()
> instead of having an extended version of
> ReorderBufferAddInvalidations()? Some common routines can also be
> declared as a static function if needed.
Modified
The attached v7 version patch has the changes for the same.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v7-master-0001-Fix-exponential-memory-allocation-issue-in-logica.patch | text/x-patch | 11.7 KB |
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 03:13:37 |
Message-ID: | CALDaNm0=JNjLqFzcAhujHLH1XQWcu+Jh_WJ6CCuqpjNb-fLnzg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sat, 31 May 2025 at 10:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. Here are some review comments:
> >
> > ---
> > + /*
> > + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> > + * this must be done unconditionally because the processing might
> > + * fail before we reach invalidation messages.
> > + */
> > + if (rbtxn_inval_all_cache(txn))
> > + InvalidateSystemCaches();
> > + else
> > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> > +
> > txn->distributed_invalidations);
> > +
> >
> > If we don't need to execute the distributed inval message in an error
> > path other than detecting concurrent abort, we should describe the
> > reason.
> >
>
> The concurrent abort handling is done for streaming and prepared
> transactions, where we send the transaction changes to the client
> before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
> > among these, the COMMIT/ROLLBACK PREPARED cases are handled as part of the
> > prepared transaction case. For ROLLBACK, we will never perform any
> changes from the current transaction, so we don't need distributed
> invalidations to be executed. For COMMIT, if we encounter any errors
> while processing changes (this is when we reach the ERROR path, which
> is not a concurrent abort), then we will reprocess all changes and, at
> the end, execute both the current transaction and distributed
> invalidations. Now, one possibility is that if, after ERROR, the
> caller does slot_advance to skip the ERROR, then we will probably miss
> executing the distributed invalidations, leading to data loss
> afterwards. If the above theory is correct, then it is better to
> execute distributed invalidation even in non-concurrent-abort cases in
> the ERROR path.
One possible reason this scenario may not occur is that
pg_logical_slot_get_changes_guts uses a PG_CATCH block to handle
exceptions, during which it calls InvalidateSystemCaches to clear the
system cache. Because of this, I believe the scenario might not
actually happen.
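For context, the error handling being referred to has roughly this shape (a simplified sketch of pg_logical_slot_get_changes_guts() in logicalfuncs.c; most of the function body is elided):

PG_TRY();
{
    /* ... create the decoding context and consume the WAL ... */
}
PG_CATCH();
{
    /* clear all timetravel entries */
    InvalidateSystemCaches();
    PG_RE_THROW();
}
PG_END_TRY();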
@Sawada-san / others: Are there any other cases where this could still occur?
Regards,
Vignesh
From: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 07:12:50 |
Message-ID: | 61ccd3cd-3185-4b07-9ce4-738c91546b54@deepbluecap.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Hayato Kuroda, I tried
v4-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch and I can
confirm that it also resolves my original issue.
Best wishes, Duncan.
On 28/05/2025 14:27, Hayato Kuroda (Fujitsu) wrote:
> Dear Sawada-san, Amit,
>
>>> It can impact the performance for large transactions with fewer
>>> invalidations, especially the ones which have spilled changes, because
>>> it needs to traverse the entire list of changes again at the end.
>>
>> Agreed.
>>
>>> The
>>> other idea would be to add new member(s) in ReorderBufferTXN to
>>> receive distributed invalidations. For adding the new member in
>>> ReorderBufferTXN: (a) in HEAD, it should be okay, (b) for
>>> back branches, we may be able to add at the end, but we should check if
>>> there are any extensions using sizeof(ReorderBufferTXN) and, if they
>>> are using it, what we need to do.
>>
>> If we can make sure that that change won't break the existing
>> extensions, I think this would be the most reasonable solution.
>
> Based on the discussion, I created a PoC for master/PG17. Please see attached.
> The basic idea is to introduce a new queue which only contains distributed inval
> messages. Its contents are consumed at the end of the transaction. I feel some of
> the code can be re-used, so internal helper functions are introduced. At least, it
> could pass the regression tests and the workloads discussed here.
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 10:22:27 |
Message-ID: | CALDaNm2sGfKsZwm8rG132N_i0AvyxkYu8TL46kNcx7KwAUABxA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sat, 31 May 2025 at 13:27, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 30 May 2025 at 23:00, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. Here are some review comments:
> >
> > @@ -3439,9 +3464,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
> > *rb, TransactionId xid,
> > XLogRecPtr lsn, Size nmsgs,
> > SharedInvalidationMessage *msgs)
> > {
> > - ReorderBufferTXN *txn;
> > + ReorderBufferAddInvalidationsExtended(rb, xid, lsn, nmsgs, msgs, false);
> > +}
> >
> > If the patch is the changes for master, do we need to have an extended
> > version of ReorderBufferAddInvalidations()?
>
> This has been removed now and ReorderBufferAddDistributedInvalidations
> has been added.
>
> > ---
> > + /*
> > + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> > + * this must be done unconditionally because the processing might
> > + * fail before we reach invalidation messages.
> > + */
> > + if (rbtxn_inval_all_cache(txn))
> > + InvalidateSystemCaches();
> > + else
> > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> > +
> > txn->distributed_invalidations);
> > +
> >
> > If we don't need to execute the distributed inval message in an error
> > path other than detecting concurrent abort, we should describe the
> > reason.
>
> Removed it to keep it in the common error path
>
> > ---
> > Given that we don't account for the memory usage of both
> > txn->invalidations and txn->distributed_invalidations, probably we can
> > have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.
>
> Modified
>
> > ---
> > + if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
> > + rbtxn_inval_all_cache(txn))
> > {
> > - txn->ninvalidations = nmsgs;
> > - txn->invalidations = (SharedInvalidationMessage *)
> > - palloc(sizeof(SharedInvalidationMessage) * nmsgs);
> > - memcpy(txn->invalidations, msgs,
> > - sizeof(SharedInvalidationMessage) * nmsgs);
> > + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
> > +
> > + if (*invalidations)
> > + {
> > + pfree(*invalidations);
> > + *invalidations = NULL;
> > + *ninvalidations = 0;
> > + }
> >
> > RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
> > inval messages. One question is do we need to care about the overflow
> > of txn->invalidations as well? If no, does it make sense to have a
> > separate function like ReorderBufferAddDistributedInvalidations()
> > instead of having an extended version of
> > ReorderBufferAddInvalidations()? Some common routines can also be
> > declared as a static function if needed.
>
> Modified
>
> The attached v7 version patch has the changes for the same.
Here is the patch, including updates for the back branches.
The main difference from master is that the newly added structure
members are appended at the end in the back branches to preserve
compatibility. The invalidation addition logic remains consistent with
master: a new function, ReorderBufferAddDistributedInvalidations, has
been introduced to handle distributed invalidations. Shared logic
between ReorderBufferAddInvalidations and
ReorderBufferAddDistributedInvalidations has been factored out into a
common helper, ReorderBufferAddInvalidationsCommon. This approach
simplifies future merges and, in my assessment, does not introduce any
backward compatibility issues.
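To illustrate the compatibility argument, a hypothetical sketch of the back-branch struct extension (member names follow the thread's usage; the actual patches may differ):

typedef struct ReorderBufferTXN
{
    /* ... all existing members, with unchanged offsets ... */

    /*
     * New members are appended at the very end, so extensions compiled
     * against the old struct layout keep working without a rebuild.
     */
    uint32      ninvalidations_distributed;
    SharedInvalidationMessage *distributed_invalidations;
} ReorderBufferTXN;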
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v8-PG15-0001-Fix-exponential-memory-allocation-issue-in-l.patch | application/octet-stream | 12.0 KB |
v8-PG14-0001-Fix-exponential-memory-allocation-issue-in-l.patch | application/octet-stream | 11.9 KB |
v8-PG13-0001-Fix-exponential-memory-allocation-issue-in-l.patch | application/octet-stream | 11.1 KB |
v8-PG17-0001-Fix-exponential-memory-allocation-issue-in-l.patch | application/octet-stream | 11.9 KB |
v8-master-0001-Fix-exponential-memory-allocation-issue-in.patch | application/octet-stream | 12.0 KB |
v8-PG16-0001-Fix-exponential-memory-allocation-issue-in-l.patch | application/octet-stream | 11.9 KB |
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, 'vignesh C' <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 11:35:36 |
Message-ID: | OSCPR01MB14966A375A7ADF466F335324CF562A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Amit, Duncan, Vignesh, Sawada-san,
I searched on GitHub and confirmed that no extensions use sizeof(ReorderBufferTXN).
Based on that, can we proceed with the fix for back branches?
Note that I checked only repositories that are public on GitHub. Proprietary
ones and those hosted on other services cannot be guaranteed.
How I searched
===========
I searched on GitHub for "sizeof(ReorderBufferTXN)", and 259 files were found.
All of them were either "reorderbuffer.c" or "reorderbuffer.cpp". reorderbuffer.c
came from forked Postgres repos, and "reorderbuffer.cpp" came from the
openGauss project, which tries to enhance Postgres. In both cases, they are
server-side code and will rebase on the community's commits when we update the
header file and sizeof(ReorderBufferTXN).
In this check, I could not find extensions that refer to the size; only
server-side code was found. Based on that, we could extend the struct
ReorderBufferTXN.
[1]: https://github.com/search?q=sizeof%28ReorderBufferTXN%29&type=code
[2]: https://github.com/opengauss-mirror/openGauss-server/blob/master/src/gausskernel/storage/replication/logical/reorderbuffer.cpp
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 16:51:03 |
Message-ID: | CAD21AoAe08b6smo7aq6zUUSANXpcfH2JVhU54LDyaBo59Y0DHg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sun, Jun 1, 2025 at 8:13 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 31 May 2025 at 10:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Thank you for updating the patch. Here are some review comments:
> > >
> > > ---
> > > + /*
> > > + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> > > + * this must be done unconditionally because the processing might
> > > + * fail before we reach invalidation messages.
> > > + */
> > > + if (rbtxn_inval_all_cache(txn))
> > > + InvalidateSystemCaches();
> > > + else
> > > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> > > +
> > > txn->distributed_invalidations);
> > > +
> > >
> > > If we don't need to execute the distributed inval message in an error
> > > path other than detecting concurrent abort, we should describe the
> > > reason.
> > >
> >
> > The concurrent abort handling is done for streaming and prepared
> > transactions, where we send the transaction changes to the client
> > before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
> > among these, the COMMIT/ROLLBACK PREPARED cases are handled as part of the
> > prepared transaction case. For ROLLBACK, we will never perform any
> > changes from the current transaction, so we don't need distributed
> > invalidations to be executed. For COMMIT, if we encounter any errors
> > while processing changes (this is when we reach the ERROR path, which
> > is not a concurrent abort), then we will reprocess all changes and, at
> > the end, execute both the current transaction and distributed
> > invalidations. Now, one possibility is that if, after ERROR, the
> > caller does slot_advance to skip the ERROR, then we will probably miss
> > executing the distributed invalidations, leading to data loss
> > afterwards. If the above theory is correct, then it is better to
> > execute distributed invalidation even in non-concurrent-abort cases in
> > the ERROR path.
This theory seems correct to me. If users retry the logical
decoding with a smaller logical_decoding_work_mem, couldn't the process
end up streaming the transaction with the wrong relcache before sending
the committed transaction?
> One possible reason this scenario may not occur is that
> pg_logical_slot_get_changes_guts uses a PG_CATCH block to handle
> exceptions, during which it calls InvalidateSystemCaches to clear the
> system cache. Because of this, I believe the scenario might not
> actually happen.
> @Sawada-san / others: Are there any other cases where this could still occur?
I think that we cannot confine the use cases to walsender and
pg_logical_slot_get_changes_guts(), given that we expose the logical
decoding API at the C level.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 17:19:10 |
Message-ID: | CAD21AoAX6sLVFz2bzDEuHP8mRpHEEmwpt88S4=FD1k_+fynLHg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 2, 2025 at 3:22 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 31 May 2025 at 13:27, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, 30 May 2025 at 23:00, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Thank you for updating the patch. Here are some review comments:
> > >
> > > @@ -3439,9 +3464,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
> > > *rb, TransactionId xid,
> > > XLogRecPtr lsn, Size nmsgs,
> > > SharedInvalidationMessage *msgs)
> > > {
> > > - ReorderBufferTXN *txn;
> > > + ReorderBufferAddInvalidationsExtended(rb, xid, lsn, nmsgs, msgs, false);
> > > +}
> > >
> > > If the patch is the changes for master, do we need to have an extended
> > > version of ReorderBufferAddInvalidations()?
> >
> > This has been removed now and ReorderBufferAddDistributedInvalidations
> > has been added.
> >
> > > ---
> > > + /*
> > > + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> > > + * this must be done unconditionally because the processing might
> > > + * fail before we reach invalidation messages.
> > > + */
> > > + if (rbtxn_inval_all_cache(txn))
> > > + InvalidateSystemCaches();
> > > + else
> > > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> > > +
> > > txn->distributed_invalidations);
> > > +
> > >
> > > If we don't need to execute the distributed inval message in an error
> > > path other than detecting concurrent abort, we should describe the
> > > reason.
> >
> > Removed it to keep it in the common error path
> >
> > > ---
> > > Given that we don't account for the memory usage of both
> > > txn->invalidations and txn->distributed_invalidations, probably we can
> > > have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.
> >
> > Modified
> >
> > > ---
> > > + if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
> > > + rbtxn_inval_all_cache(txn))
> > > {
> > > - txn->ninvalidations = nmsgs;
> > > - txn->invalidations = (SharedInvalidationMessage *)
> > > - palloc(sizeof(SharedInvalidationMessage) * nmsgs);
> > > - memcpy(txn->invalidations, msgs,
> > > - sizeof(SharedInvalidationMessage) * nmsgs);
> > > + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
> > > +
> > > + if (*invalidations)
> > > + {
> > > + pfree(*invalidations);
> > > + *invalidations = NULL;
> > > + *ninvalidations = 0;
> > > + }
> > >
> > > RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
> > > inval messages. One question is do we need to care about the overflow
> > > of txn->invalidations as well? If no, does it make sense to have a
> > > separate function like ReorderBufferAddDistributedInvalidations()
> > > instead of having an extended version of
> > > ReorderBufferAddInvalidations()? Some common routines can also be
> > > declared as a static function if needed.
> >
> > Modified
> >
> > The attached v7 version patch has the changes for the same.
>
> Here is the patch, including updates for the back branches.
> The main difference from master is that the newly added structure
> members are appended at the end in the back branches to preserve
> compatibility. The invalidation addition logic remains consistent with
> master: a new function, ReorderBufferAddDistributedInvalidations, has
> been introduced to handle distributed invalidations. Shared logic
> between ReorderBufferAddInvalidations and
> ReorderBufferAddDistributedInvalidations has been factored out into a
> common helper, ReorderBufferAddInvalidationsCommon. This approach
> simplifies future merges and, in my assessment, does not introduce any
> backward compatibility issues.
>
Thank you for updating the patch. Here are some review comments:
+ req_mem_size = sizeof(SharedInvalidationMessage) *
(txn->ninvalidations_distr + nmsgs);
+
+ /*
+ * If the number of invalidation messages is larger than 8MB, it's more
+ * efficient to invalidate the entire cache rather than processing each
+ * message individually.
+ */
+ if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
It's better to define the maximum number of distributed inval messages
per transaction as a macro instead of calculating the memory size
every time.
---
+static void
+ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
+ XLogRecPtr lsn, Size nmsgs,
+ SharedInvalidationMessage *msgs,
+ ReorderBufferTXN *txn,
+ bool for_inval)
This function is quite confusing to me. For instance,
ReorderBufferAddDistributedInvalidations() needs to call this function
with for_inval=false despite actually adding inval messages. Also,
the following condition seems unintuitive, but there is no comment:
if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
Instead of having ReorderBufferAddInvalidationsCommon(), I think we
can have a function say ReorderBufferQueueInvalidations() where we
enqueue the given inval messages as a
REORDER_BUFFER_CHANGE_INVALIDATION change.
ReorderBufferAddInvalidations() adds inval messages to
txn->invalidations and calls that function, while
ReorderBufferAddDistributedInvalidations() adds inval messages to
txn->distributed_invalidations and calls that function if the array is
not full.
BTW if we need to invalidate all accumulated caches at the end of
transaction replay anyway, we don't need to add inval messages to
txn->invalidations once txn->distributed_invalidations gets full?
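A minimal sketch of the proposed split, under the naming above (the prototypes and bodies are assumptions, summarized in comments; the next patch version may arrange this differently):

/* Enqueue the messages as a REORDER_BUFFER_CHANGE_INVALIDATION change. */
static void
ReorderBufferQueueInvalidations(ReorderBuffer *rb, TransactionId xid,
                                XLogRecPtr lsn, Size nmsgs,
                                SharedInvalidationMessage *msgs);

void
ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
                              XLogRecPtr lsn, Size nmsgs,
                              SharedInvalidationMessage *msgs)
{
    /* accumulate into txn->invalidations, then queue the change */
}

void
ReorderBufferAddDistributedInvalidations(ReorderBuffer *rb, TransactionId xid,
                                         XLogRecPtr lsn, Size nmsgs,
                                         SharedInvalidationMessage *msgs)
{
    /*
     * Accumulate into txn->distributed_invalidations and queue the change
     * only while the array has not overflowed.
     */
}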
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 18:08:54 |
Message-ID: | CAD21AoDfbbF8RSvAgURXT=FycXkq1OBDJ5=-P+PccvYKQQf+3Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 2, 2025 at 4:35 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Amit, Duncan, Vignesh, Sawada-san,
>
> I searched on GitHub and confirmed that no extensions use sizeof(ReorderBufferTXN).
> Based on that, can we proceed with the fix for back branches?
> Note that I checked only repositories that are public on GitHub. Proprietary
> ones and those hosted on other services cannot be guaranteed.
>
> How I searched
> ===========
> I searched on GitHub for "sizeof(ReorderBufferTXN)", and 259 files were found.
> All of them were either "reorderbuffer.c" or "reorderbuffer.cpp". reorderbuffer.c
> came from forked Postgres repos, and "reorderbuffer.cpp" came from the
> openGauss project, which tries to enhance Postgres. In both cases, they are
> server-side code and will rebase on the community's commits when we update the
> header file and sizeof(ReorderBufferTXN).
> In this check, I could not find extensions that refer to the size; only
> server-side code was found. Based on that, we could extend the struct
> ReorderBufferTXN.
Thank you for checking the source code on GitHub. I personally think
the chance that extensions depend on the ReorderBufferTXN size is low.
Without that, we would need more complex logic to store inval messages
per transaction, which introduces a risk. So I agree with the current
solution.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-03 03:09:29 |
Message-ID: | CAA4eK1LApPjMVF4bDMgj422-k4P29UmON6+Qrzi+Y6w4ojDXKg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 2, 2025 at 11:39 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jun 2, 2025 at 4:35 AM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Amit, Duncan, Vignesh, Sawada-san,
> >
> > I searched on GitHub and confirmed that no extensions use sizeof(ReorderBufferTXN).
> > Based on that, can we proceed with the fix for back branches?
> > Note that I checked only repositories that are public on GitHub. Proprietary
> > ones and those hosted on other services cannot be guaranteed.
> >
> > How I searched
> > ===========
> > I searched on GitHub for "sizeof(ReorderBufferTXN)", and 259 files were found.
> > All of them were either "reorderbuffer.c" or "reorderbuffer.cpp". reorderbuffer.c
> > came from forked Postgres repos, and "reorderbuffer.cpp" came from the
> > openGauss project, which tries to enhance Postgres. In both cases, they are
> > server-side code and will rebase on the community's commits when we update the
> > header file and sizeof(ReorderBufferTXN).
> > In this check, I could not find extensions that refer to the size; only
> > server-side code was found. Based on that, we could extend the struct
> > ReorderBufferTXN.
>
> Thank you for checking the source code on GitHub. I personally think
> the chance that extensions depend on the ReorderBufferTXN size is low.
> Without that, we would need more complex logic to store inval messages
> per transaction, which introduces a risk. So I agree with the current
> solution.
>
It is difficult to predict whether proprietary extensions rely on
sizeof(ReorderBufferTXN), but I can't think of a better fix than the
current one (where we add new members at the end of ReorderBufferTXN)
for back branches. I think we should explicitly mention this in the
commit message, and in the worst case, we need to request extension
owners (that rely on sizeof(ReorderBufferTXN)) to rebuild their extensions.
--
With Regards,
Amit Kapila.
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-03 03:38:09 |
Message-ID: | CAA4eK1JDogUOS7yx3yYL2VycWQ7THsfvmq+AnxjZx7YTOss4Mw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 2, 2025 at 10:21 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Jun 1, 2025 at 8:13 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Sat, 31 May 2025 at 10:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > The concurrent abort handling is done for streaming and prepared
> > > transactions, where we send the transaction changes to the client
> > > before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
> > > among these, the COMMIT/ROLLBACK PREPARED cases are handled as part of the
> > > prepared transaction case. For ROLLBACK, we will never perform any
> > > changes from the current transaction, so we don't need distributed
> > > invalidations to be executed. For COMMIT, if we encounter any errors
> > > while processing changes (this is when we reach the ERROR path, which
> > > is not a concurrent abort), then we will reprocess all changes and, at
> > > the end, execute both the current transaction and distributed
> > > invalidations. Now, one possibility is that if, after ERROR, the
> > > caller does slot_advance to skip the ERROR, then we will probably miss
> > > executing the distributed invalidations, leading to data loss
> > > afterwards. If the above theory is correct, then it is better to
> > > execute distributed invalidation even in non-concurrent-abort cases in
> > > the ERROR path.
>
> This theory seems correct to me. If users retry the logical
> decoding with a smaller logical_decoding_work_mem, couldn't the process
> end up streaming the transaction with the wrong relcache before sending
> the committed transaction?
>
But will it cause a problem? I think if a different transaction is
chosen for streaming after restart, then it should also have the
distributed invalidation change queued for it at the appropriate
place, which helps ensure the cache is correct.
> > One possible reason this scenario may not occur is that
> > pg_logical_slot_get_changes_guts uses a PG_CATCH block to handle
> > exceptions, during which it calls InvalidateSystemCaches to clear the
> > system cache. Because of this, I believe the scenario might not
> > actually happen.
> > @Sawada-san / others: Are there any other cases where this could still occur?
>
> I think that we cannot confine the use cases to walsender and
> pg_logical_slot_get_changes_guts(), given that we expose the logical
> decoding API at the C level.
>
Hmm, yeah, that could be risky. Also, if we have to rely on the
InvalidateSystemCaches performed by the caller, then we don't even
need to execute the non-distributed invalidations. I think for the
sake of consistency and for the reason you mentioned, we can execute
distributed invalidations in the non-concurrent-abort ERROR path.
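Concretely, this corresponds to the hunk quoted earlier: in the PG_CATCH block of ReorderBufferProcessTXN(), the distributed invalidations would be executed unconditionally, roughly as follows (a sketch based on the quoted patch; member naming varies across patch versions):

PG_CATCH();
{
    /* ... existing cleanup ... */

    if (rbtxn_inval_all_cache(txn))
        InvalidateSystemCaches();
    else
        ReorderBufferExecuteInvalidations(txn->ninvalidations_distributed,
                                          txn->distributed_invalidations);

    PG_RE_THROW();
}
PG_END_TRY();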
--
With Regards,
Amit Kapila.
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-03 07:07:14 |
Message-ID: | CALDaNm21OWoL66qpgVEsv0ZvgDLaiVkjtcU-YDVqQZVvo3NNPA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patch. Here are some review comments:
>
> + req_mem_size = sizeof(SharedInvalidationMessage) *
> (txn->ninvalidations_distr + nmsgs);
> +
> + /*
> + * If the number of invalidation messages is larger than 8MB, it's more
> + * efficient to invalidate the entire cache rather than processing each
> + * message individually.
> + */
> + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
>
> It's better to define the maximum number of distributed inval messages
> per transaction as a macro instead of calculating the memory size
> every time.
Modified
> ---
> +static void
> +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> + XLogRecPtr lsn, Size nmsgs,
> + SharedInvalidationMessage *msgs,
> + ReorderBufferTXN *txn,
> + bool for_inval)
>
> This function is quite confusing to me. For instance,
> ReorderBufferAddDistributedInvalidations() needs to call this function
> with for_inval=false despite actually adding inval messages. Also,
> the following condition seems unintuitive, but there is no comment:
>
> if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
>
> Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> can have a function say ReorderBufferQueueInvalidations() where we
> enqueue the given inval messages as a
> REORDER_BUFFER_CHANGE_INVALIDATION change.
> ReorderBufferAddInvalidations() adds inval messages to
> txn->invalidations and calls that function, while
> ReorderBufferAddDistributedInvalidations() adds inval messages to
> txn->distributed_invalidations and calls that function if the array is
> not full.
Modified
> BTW if we need to invalidate all accumulated caches at the end of
> transaction replay anyway, we don't need to add inval messages to
> txn->invalidations once txn->distributed_invalidations gets full?
Yes, there is no need to add invalidation messages to txn->invalidations once
RBTXN_INVAL_ALL_CACHE is set. This is handled now.
The attached v9 version patch has the changes for the same.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v9-master-0001-Fix-exponential-memory-allocation-issue-in.patch | text/x-patch | 11.2 KB |
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-03 12:24:16 |
Message-ID: | CALDaNm0KrrAtcY_BzZKDLdc48S8+iVnqpisw6NNEdxXAdtRJOQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, 3 Jun 2025 at 12:37, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. Here are some review comments:
> >
> > + req_mem_size = sizeof(SharedInvalidationMessage) *
> > (txn->ninvalidations_distr + nmsgs);
> > +
> > + /*
> > + * If the number of invalidation messages is larger than 8MB, it's more
> > + * efficient to invalidate the entire cache rather than processing each
> > + * message individually.
> > + */
> > + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
> >
> > It's better to define the maximum number of distributed inval messages
> > per transaction as a macro instead of calculating the memory size
> > every time.
>
> Modified
>
> > ---
> > +static void
> > +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> > + XLogRecPtr lsn, Size nmsgs,
> > + SharedInvalidationMessage *msgs,
> > + ReorderBufferTXN *txn,
> > + bool for_inval)
> >
> > This function is quite confusing to me. For instance,
> > ReorderBufferAddDistributedInvalidations() needs to call this function
> > with for_inval=false despite actually adding inval messages. Also,
> > the following condition seems unintuitive, but there is no comment:
> >
> > if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
> >
> > Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> > can have a function say ReorderBufferQueueInvalidations() where we
> > enqueue the given inval messages as a
> > REORDER_BUFFER_CHANGE_INVALIDATION change.
> > ReorderBufferAddInvalidations() adds inval messages to
> > txn->invalidations and calls that function, while
> > ReorderBufferAddDistributedInvalidations() adds inval messages to
> > txn->distributed_invalidations and calls that function if the array is
> > not full.
>
> Modified
>
> > BTW if we need to invalidate all accumulated caches at the end of
> > transaction replay anyway, we don't need to add inval messages to
> > txn->invalidations once txn->distributed_invalidations gets full?
>
> Yes, there is no need to add invalidation messages to txn->invalidations once
> RBTXN_INVAL_ALL_CACHE is set. This is handled now.
>
> The attached v9 version patch has the changes for the same.
I've posted the patch for the master branch only. I'll submit the
back-branch patches once this patch is in a committable state.
Regards,
Vignesh
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-03 19:43:50 |
Message-ID: | CAD21AoBhDBqavSkxr+0GCxom1Q7P7guY5Ees20Wda=YZLFVfCA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 3, 2025 at 12:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. Here are some review comments:
> >
> > + req_mem_size = sizeof(SharedInvalidationMessage) *
> > (txn->ninvalidations_distr + nmsgs);
> > +
> > + /*
> > + * If the number of invalidation messages is larger than 8MB, it's more
> > + * efficient to invalidate the entire cache rather than processing each
> > + * message individually.
> > + */
> > + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
> >
> > It's better to define the maximum number of distributed inval messages
> > per transaction as a macro instead of calculating the memory size
> > every time.
>
> Modified
>
> > ---
> > +static void
> > +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> > + XLogRecPtr lsn, Size nmsgs,
> > + SharedInvalidationMessage *msgs,
> > + ReorderBufferTXN *txn,
> > + bool for_inval)
> >
> > This function is quite confusing to me. For instance,
> > ReorderBufferAddDistributedInvalidations() needs to call this function
> > with for_inval=false despite actually adding inval messages. Also,
> > the following condition seems unintuitive, but there is no comment:
> >
> > if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
> >
> > Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> > can have a function say ReorderBufferQueueInvalidations() where we
> > enqueue the given inval messages as a
> > REORDER_BUFFER_CHANGE_INVALIDATION change.
> > ReorderBufferAddInvalidations() adds inval messages to
> > txn->invalidations and calls that function, while
> > ReorderBufferAddDistributedInvalidations() adds inval messages to
> > txn->distributed_invalidations and calls that function if the array is
> > not full.
>
> Modified
>
> > BTW if we need to invalidate all accumulated caches at the end of
> > transaction replay anyway, we don't need to add inval messages to
> > txn->invalidations once txn->distributed_invalidations gets full?
>
> Yes, there is no need to add invalidation messages to txn->invalidations once
> RBTXN_INVAL_ALL_CACHE is set. This is handled now.
>
> The attached v9 version patch has the changes for the same.
Thank you for updating the patch. Here are review comments on the v9 patch:
+/*
+ * Maximum number of distributed invalidation messages per transaction.
+ * Each message is ~16 bytes, this allows up to 8 MB of invalidation
+ * message data.
+ */
+#define MAX_DISTR_INVAL_MSG_PER_TXN 524288
The size of SharedInvalidationMessage could change in the future, so we
should calculate it at compile time.
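One way to satisfy that, assuming the 8MB figure from the patch stays (the macro shape itself is illustrative):

/*
 * Cap the distributed inval messages at ~8MB per transaction; beyond this,
 * fall back to invalidating all caches. Computed at compile time so that it
 * tracks sizeof(SharedInvalidationMessage).
 */
#define MAX_DISTR_INVAL_MSG_PER_TXN \
    ((8 * 1024 * 1024) / sizeof(SharedInvalidationMessage))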
---
+ /*
+ * If the complete cache will be invalidated, we don't need to accumulate
+ * the invalidations.
+ */
+ if (!rbtxn_inval_all_cache(txn))
+ ReorderBufferAccumulateInvalidations(&txn->ninvalidations,
+ &txn->invalidations, nmsgs, msgs);
We need to explain why we don't check the number of invalidation
messages for txn->invalidations and mark it as inval-all-cache, unlike
ReorderBufferAddDistributedInvalidations().
---
+ /*
+ * If the number of invalidation messages is high, performing a full cache
+ * invalidation is more efficient than handling each message separately.
+ */
+ if (((nmsgs + txn->ninvalidations_distributed) >
MAX_DISTR_INVAL_MSG_PER_TXN) ||
+ rbtxn_inval_all_cache(txn))
{
- txn->invalidations = (SharedInvalidationMessage *)
- repalloc(txn->invalidations, sizeof(SharedInvalidationMessage) *
- (txn->ninvalidations + nmsgs));
+ txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
I think we don't need to mark the transaction as RBTXN_INVAL_ALL_CACHE
again. I'd rewrite the logic as follows:
if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
{
/* mark the txn as inval-all-cache */
....
/* free the accumulated inval msgs */
....
}
if (!rbtxn_inval_all_cache(txn))
ReorderBufferAccumulateInvalidations(...);
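Filled in with the names from the v9 excerpts above, the suggested flow might read (a sketch, not the committed code):

if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
{
    /* Overflowed: invalidate all caches at transaction replay instead. */
    txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;

    /* The accumulated inval messages are no longer needed. */
    if (txn->distributed_invalidations)
    {
        pfree(txn->distributed_invalidations);
        txn->distributed_invalidations = NULL;
        txn->ninvalidations_distributed = 0;
    }
}

if (!rbtxn_inval_all_cache(txn))
    ReorderBufferAccumulateInvalidations(&txn->ninvalidations_distributed,
                                         &txn->distributed_invalidations,
                                         nmsgs, msgs);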
---
- ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn,
- ninvalidations, msgs);
+ ReorderBufferAddDistributedInvalidations(builder->reorder,
+ txn->xid, lsn,
+ ninvalidations, msgs);
I think we need some comments here to explain why we need to
distribute only inval messages coming from the current transaction.
---
+/* Should the complete cache be invalidated? */
+#define rbtxn_inval_all_cache(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0 \
+)
I find that if we rename the flag to something like
RBTXN_INVAL_OVERFLOWED, it would explain the state of the transaction
more clearly.
---
Can we have a reasonable test case that covers the inval message overflow cases?
I've attached a patch with some changes and additional comments (note
that it still has XXX comments). Please include the changes you agree
with in the next version of the patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
change_v9_masahiko.patch | application/octet-stream | 8.8 KB |
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-04 06:47:55 |
Message-ID: | CALDaNm1MMafe_Xr3RFc0t3ds82W4CrR4=Ewi1Vh3uscpZ-rW0Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Wed, 4 Jun 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 3, 2025 at 12:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Thank you for updating the patch. Here are some review comments:
> > >
> > > + req_mem_size = sizeof(SharedInvalidationMessage) *
> > > (txn->ninvalidations_distr + nmsgs);
> > > +
> > > + /*
> > > + * If the number of invalidation messages is larger than 8MB, it's more
> > > + * efficient to invalidate the entire cache rather than processing each
> > > + * message individually.
> > > + */
> > > + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
> > >
> > > It's better to define the maximum number of distributed inval messages
> > > per transaction as a macro instead of calculating the memory size
> > > every time.
> >
> > Modified
> >
> > > ---
> > > +static void
> > > +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> > > + XLogRecPtr lsn, Size nmsgs,
> > > + SharedInvalidationMessage *msgs,
> > > + ReorderBufferTXN *txn,
> > > + bool for_inval)
> > >
> > > This function is quite confusing to me. For instance,
> > > ReorderBufferAddDistributedInvalidations() needs to call this function
> > > with for_inval=false despite actually adding inval messages. Also,
> > > the following condition seems unintuitive, but there is no comment:
> > >
> > > if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
> > >
> > > Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> > > can have a function say ReorderBufferQueueInvalidations() where we
> > > enqueue the given inval messages as a
> > > REORDER_BUFFER_CHANGE_INVALIDATION change.
> > > ReorderBufferAddInvalidations() adds inval messages to
> > > txn->invalidations and calls that function, while
> > > ReorderBufferAddDistributedInvalidations() adds inval messages to
> > > txn->distributed_invalidations and calls that function if the array is
> > > not full.
> >
> > Modified
> >
> > > BTW if we need to invalidate all accumulated caches at the end of
> > > transaction replay anyway, we don't need to add inval messages to
> > > txn->invalidations once txn->distributed_invalidations gets full?
> >
> > yes, no need to add invalidation messages to txn->invalidations once
> > RBTXN_INVAL_ALL_CACHE is set. This is handled now.
> >
> > The attached v9 version patch has the changes for the same.
>
> Thank you for updating the patch. Here are review comments on v9 patch:
>
> +/*
> + * Maximum number of distributed invalidation messages per transaction.
> + * Each message is ~16 bytes, this allows up to 8 MB of invalidation
> + * message data.
> + */
> +#define MAX_DISTR_INVAL_MSG_PER_TXN 524288
>
> The size of SharedInvalidationMessage could change in the future so we
> should calculate it at compile time.
Modified
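For illustration, a compile-time definition along these lines would keep the
cap tied to the struct size (a sketch only; the actual patch may differ in
name and placement):
```
/* Cap distributed inval messages at ~8 MB worth per transaction,
 * deriving the count from the message size at compile time instead
 * of hard-coding 524288. */
#define MAX_DISTR_INVAL_MSG_PER_TXN \
    ((8 * 1024 * 1024) / sizeof(SharedInvalidationMessage))
```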
> ---
> + /*
> + * If the complete cache will be invalidated, we don't need to accumulate
> + * the invalidations.
> + */
> + if (!rbtxn_inval_all_cache(txn))
> + ReorderBufferAccumulateInvalidations(&txn->ninvalidations,
> + &txn->invalidations, nmsgs, msgs);
>
> We need to explain why we don't check the number of invalidation
> messages for txn->invalidations and mark it as inval-all-cache, unlike
> ReorderBufferAddDistributedInvalidations().
Added comments
> ---
> + /*
> + * If the number of invalidation messages is high, performing a full cache
> + * invalidation is more efficient than handling each message separately.
> + */
> + if (((nmsgs + txn->ninvalidations_distributed) >
> MAX_DISTR_INVAL_MSG_PER_TXN) ||
> + rbtxn_inval_all_cache(txn))
> {
> - txn->invalidations = (SharedInvalidationMessage *)
> - repalloc(txn->invalidations, sizeof(SharedInvalidationMessage) *
> - (txn->ninvalidations + nmsgs));
> + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
>
> I think we don't need to mark the transaction as RBTXN_INVAL_ALL_CACHE
> again. I'd rewrite the logic as follows:
>
> if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
> {
> /* mark the txn as inval-all-cache */
> ....
> /* free the accumulated inval msgs */
> ....
> }
>
> if (!rbtxn_inval_all_cache(txn))
> ReorderBufferAccumulateInvalidations(...);
Modified
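A concrete rendering of that suggested flow might look like this (a sketch
under assumed field and helper names, not the committed code):
```
if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
{
    /* Mark the txn as inval-all-cache (the flag is later renamed). */
    txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;

    /* Free the accumulated inval msgs; they are redundant now. */
    if (txn->invalidations_distributed != NULL)
    {
        pfree(txn->invalidations_distributed);
        txn->invalidations_distributed = NULL;
        txn->ninvalidations_distributed = 0;
    }
}

if (!rbtxn_inval_all_cache(txn))
    ReorderBufferAccumulateInvalidations(&txn->ninvalidations_distributed,
                                         &txn->invalidations_distributed,
                                         nmsgs, msgs);
```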
> ---
> - ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn,
> - ninvalidations, msgs);
> + ReorderBufferAddDistributedInvalidations(builder->reorder,
> + txn->xid, lsn,
> + ninvalidations, msgs);
>
> I think we need some comments here to explain why we need to
> distribute only inval messages coming from the current transaction.
Added comments
> ---
> +/* Should the complete cache be invalidated? */
> +#define rbtxn_inval_all_cache(txn) \
> +( \
> + ((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0 \
> +)
>
> I find that if we rename the flag to something like
> RBTXN_INVAL_OVERFLOWED, it would explain the state of the transaction
> more clearly.
Modified
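With the rename, the macro would presumably become (sketch):
```
/* Has the set of distributed inval messages overflowed for this txn? */
#define rbtxn_inval_overflowed(txn) \
( \
    ((txn)->txn_flags & RBTXN_INVAL_OVERFLOWED) != 0 \
)
```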
> Can we have a reasonable test case that covers the inval message overflow cases?
One of us will work on this and post a separate patch
> I've attached a patch with some changes and additional comments (note
> that it still has XXX comments). Please include the changes you agree
> with in the next version of the patch.
Thanks for the comments; I merged it.
The attached v10 version patch has the changes for the same.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v10-master-0001-Fix-exponential-memory-allocation-issue-i.patch | application/octet-stream | 13.0 KB |
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-04 09:20:51 |
Message-ID: | OSCPR01MB149664746A441DE1F9F56203DF56CA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
> Can we have a reasonable test case that covers the inval message overflow
> cases?
I have been considering how to add tests, but it needs lots of invalidation
messages and consumes a lot of resources. Instead of that, how do you feel
about using injection_points? If it is enabled, the threshold for overflow is
made much smaller than usual. The attached patch implements the idea.
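A minimal sketch of that idea, assuming the patch keys the threshold off the
build-time USE_INJECTION_POINTS symbol (the attached patch may implement it
differently):
```
#ifdef USE_INJECTION_POINTS
/* Tiny cap so regression tests can reach the overflow path cheaply. */
#define MAX_DISTR_INVAL_MSG_PER_TXN 8
#else
#define MAX_DISTR_INVAL_MSG_PER_TXN \
    ((8 * 1024 * 1024) / sizeof(SharedInvalidationMessage))
#endif
```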
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
0001-add-regression-test.txt | text/plain | 5.9 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-04 21:48:54 |
Message-ID: | CAD21AoCV=n5AkTb_DDu+paZnBPshj2-tZ1-CAYgRWdWjNabm+w@mail.gmail.com |
Lists: | pgsql-bugs |
On Tue, Jun 3, 2025 at 11:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 4 Jun 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 3, 2025 at 12:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Thank you for updating the patch. Here are some review comments:
> > > >
> > > > + req_mem_size = sizeof(SharedInvalidationMessage) *
> > > > (txn->ninvalidations_distr + nmsgs);
> > > > +
> > > > + /*
> > > > + * If the number of invalidation messages is larger than 8MB, it's more
> > > > + * efficient to invalidate the entire cache rather than processing each
> > > > + * message individually.
> > > > + */
> > > > + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
> > > >
> > > > It's better to define the maximum number of distributed inval messages
> > > > per transaction as a macro instead of calculating the memory size
> > > > every time.
> > >
> > > Modified
> > >
> > > > ---
> > > > +static void
> > > > +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> > > > + XLogRecPtr lsn, Size nmsgs,
> > > > + SharedInvalidationMessage *msgs,
> > > > + ReorderBufferTXN *txn,
> > > > + bool for_inval)
> > > >
> > > > This function is quite confusing to me. For instance,
> > > > ReorderBufferAddDistributedInvalidations() needs to call this function
> > > > with for_inval=false in spite of actually adding inval messages. Also,
> > > > the following condition seems unintuitive, but there is no comment:
> > > >
> > > > if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
> > > >
> > > > Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> > > > can have a function say ReorderBufferQueueInvalidations() where we
> > > > enqueue the given inval messages as a
> > > > REORDER_BUFFER_CHANGE_INVALIDATION change.
> > > > ReorderBufferAddInvalidations() adds inval messages to
> > > > txn->invalidations and calls that function, while
> > > > ReorderBufferQueueInvalidations() adds inval messages to
> > > > txn->distributed_invalidations and calls that function if the array is
> > > > not full.
> > >
> > > Modified
> > >
> > > > BTW if we need to invalidate all accumulated caches at the end of
> > > > transaction replay anyway, we don't need to add inval messages to
> > > > txn->invalidations once txn->distributed_invalidations gets full?
> > >
> > > yes, no need to add invalidation messages to txn->invalidations once
> > > RBTXN_INVAL_ALL_CACHE is set. This is handled now.
> > >
> > > The attached v9 version patch has the changes for the same.
> >
> > Thank you for updating the patch. Here are review comments on v9 patch:
> >
> > +/*
> > + * Maximum number of distributed invalidation messages per transaction.
> > + * Each message is ~16 bytes, this allows up to 8 MB of invalidation
> > + * message data.
> > + */
> > +#define MAX_DISTR_INVAL_MSG_PER_TXN 524288
> >
> > The size of SharedInvalidationMessage could change in the future so we
> > should calculate it at compile time.
>
> Modified
>
> > ---
> > + /*
> > + * If the complete cache will be invalidated, we don't need to accumulate
> > + * the invalidations.
> > + */
> > + if (!rbtxn_inval_all_cache(txn))
> > + ReorderBufferAccumulateInvalidations(&txn->ninvalidations,
> > + &txn->invalidations, nmsgs, msgs);
> >
> > We need to explain why we don't check the number of invalidation
> > messages for txn->invalidations and mark it as inval-all-cache, unlike
> > ReorderBufferAddDistributedInvalidations().
>
> Added comments
>
> > ---
> > + /*
> > + * If the number of invalidation messages is high, performing a full cache
> > + * invalidation is more efficient than handling each message separately.
> > + */
> > + if (((nmsgs + txn->ninvalidations_distributed) >
> > MAX_DISTR_INVAL_MSG_PER_TXN) ||
> > + rbtxn_inval_all_cache(txn))
> > {
> > - txn->invalidations = (SharedInvalidationMessage *)
> > - repalloc(txn->invalidations, sizeof(SharedInvalidationMessage) *
> > - (txn->ninvalidations + nmsgs));
> > + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
> >
> > I think we don't need to mark the transaction as RBTXN_INVAL_ALL_CACHE
> > again. I'd rewrite the logic as follows:
> >
> > if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
> > {
> > /* mark the txn as inval-all-cache */
> > ....
> > /* free the accumulated inval msgs */
> > ....
> > }
> >
> > if (!rbtxn_inval_all_cache(txn))
> > ReorderBufferAccumulateInvalidations(...);
>
> Modified
>
> > ---
> > - ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn,
> > - ninvalidations, msgs);
> > + ReorderBufferAddDistributedInvalidations(builder->reorder,
> > + txn->xid, lsn,
> > + ninvalidations, msgs);
> >
> > I think we need some comments here to explain why we need to
> > distribute only inval messages coming from the current transaction.
>
> Added comments
>
> > ---
> > +/* Should the complete cache be invalidated? */
> > +#define rbtxn_inval_all_cache(txn) \
> > +( \
> > + ((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0 \
> > +)
> >
> > I find that if we rename the flag to something like
> > RBTXN_INVAL_OVERFLOWED, it would explain the state of the transaction
> > more clearly.
>
> Modified
>
> > Can we have a reasonable test case that covers the inval message overflow cases?
> One of us will work on this and post a separate patch
>
> > I've attached a patch with some changes and additional comments (note
> > that it still has XXX comments). Please include the changes you agree
> > with in the next version of the patch.
>
> Thanks for the comments; I merged it.
>
> The attached v10 version patch has the changes for the same.
Thank you for updating the patch. I have some comments and questions:
In ReorderBufferAbort():
/*
* We might have decoded changes for this transaction that could load
* the cache as per the current transaction's view (consider DDL's
* happened in this transaction). We don't want the decoding of future
* transactions to use those cache entries so execute invalidations.
*/
if (txn->ninvalidations > 0)
ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
txn->invalidations);
I think that if the txn->invalidations_distributed is overflowed, we
would miss executing the txn->invalidations here. Probably the same is
true for ReorderBufferForget() and ReorderBufferInvalidate().
---
I'd like to make it clear again which case we need to execute
txn->invalidations as well as txn->invalidations_distributed (like in
ReorderBufferProcessTXN()) and which case we need to execute only
txn->invalidations (like in ReorderBufferForget() and
ReorderBufferAbort()). I think it might be worth putting some comments
about overall strategy somewhere.
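For orientation, such a summary comment could read roughly as follows (the
wording is assumed; the rationale matches the explanation given later in this
thread):
```
/*
 * - ReorderBufferProcessTXN(): execute both txn->invalidations and
 *   txn->invalidations_distributed, because decoding this transaction's
 *   changes may have built cache entries that the invalidations
 *   distributed by other transactions must flush.
 *
 * - ReorderBufferForget()/ReorderBufferAbort(): execute only
 *   txn->invalidations; no changes of the discarded transaction were
 *   decoded, so the distributed invalidations need not be executed.
 */
```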
---
BTW for back branches, a simple fix without ABI breakage would be to
introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
txn->invalidations. That is, we accumulate inval messages both coming
from the current transaction and distributed by other transactions but
once the size reaches the threshold we invalidate all caches. Is it
worth considering for back branches?
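A minimal sketch of that back-branch idea (the flag and threshold names are
taken from this thread; the rest is assumed):
```
/* In ReorderBufferAddInvalidations(): keep the single existing array,
 * but cap its size; an overflowed txn invalidates all caches at replay. */
if (txn->ninvalidations + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
{
    txn->txn_flags |= RBTXN_INVAL_OVERFLOWED;

    /* Individual messages are redundant once everything is invalidated. */
    if (txn->invalidations != NULL)
    {
        pfree(txn->invalidations);
        txn->invalidations = NULL;
        txn->ninvalidations = 0;
    }
}

if (!rbtxn_inval_overflowed(txn))
    ReorderBufferAccumulateInvalidations(&txn->ninvalidations,
                                         &txn->invalidations, nmsgs, msgs);
```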
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-04 22:06:23 |
Message-ID: | CAD21AoAvZ=8_in4NcdXcOWaHnBdch_R5NE7N+iOde2+7S2QuDw@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, Jun 4, 2025 at 2:21โฏAM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > Can we have a reasonable test case that covers the inval message overflow
> > cases?
>
> I have been considering how to add tests, but it needs lots of invalidation
> messages and consumes a lot of resources. Instead of that, how do you feel
> about using injection_points? If it is enabled, the threshold for overflow is
> made much smaller than usual. The attached patch implements the idea.
An alternative idea is to lower the constant value when using an
assertion build. That way, we don't need to rely on injection points
being enabled.
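That alternative might look like this (sketch; the non-assert value mirrors
the 8 MB cap discussed above):
```
#ifdef USE_ASSERT_CHECKING
/* Tiny cap so assertion builds exercise the overflow path in tests. */
#define MAX_DISTR_INVAL_MSG_PER_TXN 8
#else
#define MAX_DISTR_INVAL_MSG_PER_TXN \
    ((8 * 1024 * 1024) / sizeof(SharedInvalidationMessage))
#endif
```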
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 06:20:04 |
Message-ID: | CAA4eK1+aON_z8JdZs+2inrfeBETTU5RiNweLNtoTEqn6f0qkmQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jun 5, 2025 at 3:19 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 3, 2025 at 11:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Wed, 4 Jun 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patch. I have some comments and questions:
>
> In ReorderBufferAbort():
>
> /*
> * We might have decoded changes for this transaction that could load
> * the cache as per the current transaction's view (consider DDL's
> * happened in this transaction). We don't want the decoding of future
> * transactions to use those cache entries so execute invalidations.
> */
> if (txn->ninvalidations > 0)
> ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
> txn->invalidations);
>
> I think that if the txn->invalidations_distributed is overflowed, we
> would miss executing the txn->invalidations here. Probably the same is
> true for ReorderBufferForget() and ReorderBufferInvalidate().
>
This is because of the following check "if
(!rbtxn_inval_overflowed(txn))" in function
ReorderBufferAddInvalidations(). What is the need of such a check in
this function? We don't need to execute distributed invalidations in
cases like ReorderBufferForget() when we haven't decoded any changes.
> ---
> I'd like to make it clear again which case we need to execute
> txn->invalidations as well as txn->invalidations_distributed (like in
> ReorderBufferProcessTXN()) and which case we need to execute only
> txn->invalidations (like in ReorderBufferForget() and
> ReorderBufferAbort()). I think it might be worth putting some comments
> about overall strategy somewhere.
>
> ---
> BTW for back branches, a simple fix without ABI breakage would be to
> introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> txn->invalidations. That is, we accumulate inval messages both coming
> from the current transaction and distributed by other transactions but
> once the size reaches the threshold we invalidate all caches. Is it
> worth considering for back branches?
>
It should work and is worth considering. The main concern would be
that it will hit sooner than we expect in the field, seeing the recent
reports. So, such a change has the potential to degrade the
performance. I feel that the number of people impacted due to
performance would be more than the number of people impacted due to
such an ABI change (adding the new members at the end of
ReorderBufferTXN). However, if we think we want to go safe w.r.t
extensions that can rely on the sizeof ReorderBufferTXN then your
proposal makes sense.
--
With Regards,
Amit Kapila.
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 06:36:11 |
Message-ID: | OSCPR01MB14966456C004EB65B31FE72AAF56FA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
> An alternative idea is to lower the constant value when using an
> assertion build. That way, we don't need to rely on injection points
> being enabled.
Hmm, possible, but I prefer the current one. Two concerns:
1.
USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
to call debug functions in debug builds.
2.
If we add tests which are usable only for debug builds, they must be run only
when it is enabled. IIUC no such test exists yet.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 10:28:30 |
Message-ID: | CALDaNm2E3ks06QTvEuQAHw3CxPhUrg6J6twXAwYO93S3xZPtvQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, 5 Jun 2025 at 03:19, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patch. I have some comments and questions:
>
> In ReorderBufferAbort():
>
> /*
> * We might have decoded changes for this transaction that could load
> * the cache as per the current transaction's view (consider DDL's
> * happened in this transaction). We don't want the decoding of future
> * transactions to use those cache entries so execute invalidations.
> */
> if (txn->ninvalidations > 0)
> ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
> txn->invalidations);
>
> I think that if the txn->invalidations_distributed is overflowed, we
> would miss executing the txn->invalidations here. Probably the same is
> true for ReorderBufferForget() and ReorderBufferInvalidate().
I'm accumulating the invalidations in txn->invalidations irrespective
of whether RBTXN_INVAL_OVERFLOWED is set for the txn.
> ---
> I'd like to make it clear again which case we need to execute
> txn->invalidations as well as txn->invalidations_distributed (like in
> ReorderBufferProcessTXN()) and which case we need to execute only
> txn->invalidations (like in ReorderBufferForget() and
> ReorderBufferAbort()). I think it might be worth putting some comments
> about overall strategy somewhere.
I have added comments for this; feel free to reword them if some
changes are required.
The attached v11 version patch has the changes for the same.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v11-master-0001-Fix-exponential-memory-allocation-issue-i.patch | text/x-patch | 13.2 KB |
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 11:07:22 |
Message-ID: | OSCPR01MB149662920804EAA70CE1E286FF56FA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Amit,
> > ---
> > I'd like to make it clear again which case we need to execute
> > txn->invalidations as well as txn->invalidations_distributed (like in
> > ReorderBufferProcessTXN()) and which case we need to execute only
> > txn->invalidations (like in ReorderBufferForget() and
> > ReorderBufferAbort()). I think it might be worth putting some comments
> > about overall strategy somewhere.
> >
> > ---
> > BTW for back branches, a simple fix without ABI breakage would be to
> > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > txn->invalidations. That is, we accumulate inval messages both coming
> > from the current transaction and distributed by other transactions but
> > once the size reaches the threshold we invalidate all caches. Is it
> > worth considering for back branches?
> >
>
> It should work and is worth considering. The main concern would be
> that it will hit sooner than we expect in the field, seeing the recent
> reports. So, such a change has the potential to degrade the
> performance. I feel that the number of people impacted due to
> performance would be more than the number of people impacted due to
> such an ABI change (adding the new members at the end of
> ReorderBufferTXN). However, if we think we want to go safe w.r.t
> extensions that can rely on the sizeof ReorderBufferTXN then your
> proposal makes sense.
While considering the approach, I found a doubtful point. Consider the below
workload:
0. S1: CREATE TABLE d(data text not null);
1. S1: BEGIN;
2. S1: INSERT INTO d VALUES ('d1')
3. S2: BEGIN;
4. S2: INSERT INTO d VALUES ('d2')
5. S1: ALTER PUBLICATION pb ADD TABLE d;
6. S1: ... lots of DDLs so overflow happens
7. S1: COMMIT;
8. S2: INSERT INTO d VALUES ('d3');
9. S2: COMMIT;
10. S2: INSERT INTO d VALUES ('d4');
In this case, the inval message generated by step 5 is discarded at step 6. No
invalidation messages are distributed in SnapBuildDistributeSnapshotAndInval().
While decoding S2, the stale relcache entry cannot be invalidated, so tuples d3
and d4 won't be replicated. Do you think this can happen?
Note that this won't happen with the v11 patch: it does not discard txn->invalidations
in case of overflow, so the needed inval messages can be distributed.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 12:14:43 |
Message-ID: | OSCPR01MB14966C8D376CB550AB5FA6040F56FA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
> 0. S1: CREATE TABLE d(data text not null);
> 1. S1: BEGIN;
> 2. S1: INSERT INTO d VALUES ('d1')
> 3. S2: BEGIN;
> 4. S2: INSERT INTO d VALUES ('d2')
> 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> 6. S1: ... lots of DDLs so overflow happens
> 7. S1: COMMIT;
> 8. S2: INSERT INTO d VALUES ('d3');
> 9. S2: COMMIT;
> 10. S2: INSERT INTO d VALUES ('d4');
>
> In this case, the inval message generated by step 5 is discarded at step 6. No
> invalidation messages are distributed in
> SnapBuildDistributeSnapshotAndInval().
> While decoding S2, the stale relcache entry cannot be invalidated, so tuples d3
> and d4 won't be replicated. Do you think this can happen?
Before running the workload, pg_recvlogical was run to check the output.
At step 6, I created and dropped the same table several times until the debug log was output:
```
[650552] LOG: RBTXN_INVAL_OVERFLOWED flag is set
[650552] STATEMENT: START_REPLICATION SLOT "test" LOGICAL 0/0 ("proto_version" '4', "publication_names" 'pb')
```
After that I executed the remaining steps through step 10, but pg_recvlogical produced no output.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
0001-implement-limitation-approach-for-PG17.patch | application/octet-stream | 7.0 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 17:43:25 |
Message-ID: | CAD21AoCR5ULReGxM9VNsp91MamPYLNLM7oiWqizpb-gPSpicVg@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > An alternative idea is to lower the constant value when using an
> > assertion build. That way, we don't need to rely on injection points
> > being enabled.
>
> Hmm, possible, but I prefer the current one. Two concerns:
>
> 1.
> USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> to call debug functions in debug builds.
I think we have a similar precedent, such as MT_NRELS_HASH, to improve
test coverage.
> 2.
> If we add tests which are usable only for debug builds, they must be run only
> when it is enabled. IIUC no such test exists yet.
I think we need test cases not to check if we reach a specific code
point but to check if we can get the correct results even if we've
executed various code paths. As for this bug, it is better to check
that it works properly in a variety of cases. That way, we can also
check overflow and non-overflow cases in test cases added in the
future, improving the test coverage further.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 18:49:42 |
Message-ID: | CAD21AoBaiMiAMLF-daEyB43hLbWA6fMmWWToGDMyp9V3kp149w@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, Jun 4, 2025 at 11:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jun 5, 2025 at 3:19 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 3, 2025 at 11:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Wed, 4 Jun 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. I have some comments and questions:
> >
> > In ReorderBufferAbort():
> >
> > /*
> > * We might have decoded changes for this transaction that could load
> > * the cache as per the current transaction's view (consider DDL's
> > * happened in this transaction). We don't want the decoding of future
> > * transactions to use those cache entries so execute invalidations.
> > */
> > if (txn->ninvalidations > 0)
> > ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
> > txn->invalidations);
> >
> > I think that if the txn->invalidations_distributed is overflowed, we
> > would miss executing the txn->invalidations here. Probably the same is
> > true for ReorderBufferForget() and ReorderBufferInvalidate().
> >
>
> This is because of the following check "if
> (!rbtxn_inval_overflowed(txn))" in function
> ReorderBufferAddInvalidations(). What is the need of such a check in
> this function? We don't need to execute distributed invalidations in
> cases like ReorderBufferForget() when we haven't decoded any changes.
>
> > ---
> > I'd like to make it clear again which case we need to execute
> > txn->invalidations as well as txn->invalidations_distributed (like in
> > ReorderBufferProcessTXN()) and which case we need to execute only
> > txn->invalidations (like in ReorderBufferForget() and
> > ReorderBufferAbort()). I think it might be worth putting some comments
> > about overall strategy somewhere.
> >
> > ---
> > BTW for back branches, a simple fix without ABI breakage would be to
> > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > txn->invalidations. That is, we accumulate inval messages both coming
> > from the current transaction and distributed by other transactions but
> > once the size reaches the threshold we invalidate all caches. Is it
> > worth considering for back branches?
> >
>
> It should work and is worth considering. The main concern would be
> that it will hit sooner than we expect in the field, seeing the recent
> reports. So, such a change has the potential to degrade the
> performance. I feel that the number of people impacted due to
> performance would be more than the number of people impacted due to
> such an ABI change (adding the new members at the end of
> ReorderBufferTXN).
That's a fair point. I initially assumed that DDLs were not executed
often in practice, but analyzing this bug has made me realize this
assumption was misguided.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 19:21:21 |
Message-ID: | CAD21AoCjHVR28__2TAuM5BZfgHbyYD9X=4nof3e+NdTVhg95Yw@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jun 5, 2025 at 4:07 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Amit,
>
> > > ---
> > > I'd like to make it clear again which case we need to execute
> > > txn->invalidations as well as txn->invalidations_distributed (like in
> > > ReorderBufferProcessTXN()) and which case we need to execute only
> > > txn->invalidations (like in ReorderBufferForget() and
> > > ReorderBufferAbort()). I think it might be worth putting some comments
> > > about overall strategy somewhere.
> > >
> > > ---
> > > BTW for back branches, a simple fix without ABI breakage would be to
> > > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > > txn->invalidations. That is, we accumulate inval messages both coming
> > > from the current transaction and distributed by other transactions but
> > > once the size reaches the threshold we invalidate all caches. Is it
> > > worth considering for back branches?
> > >
> >
> > It should work and is worth considering. The main concern would be
> > that it will hit sooner than we expect in the field, seeing the recent
> > reports. So, such a change has the potential to degrade the
> > performance. I feel that the number of people impacted due to
> > performance would be more than the number of people impacted due to
> > such an ABI change (adding the new members at the end of
> > ReorderBufferTXN). However, if we think we want to go safe w.r.t
> > extensions that can rely on the sizeof ReorderBufferTXN then your
> > proposal makes sense.
>
> While considering the approach, I found a doubtful point. Consider the below
> workload:
>
> 0. S1: CREATE TABLE d(data text not null);
> 1. S1: BEGIN;
> 2. S1: INSERT INTO d VALUES ('d1')
> 3. S2: BEGIN;
> 4. S2: INSERT INTO d VALUES ('d2')
> 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> 6. S1: ... lots of DDLs so overflow happens
> 7. S1: COMMIT;
> 8. S2: INSERT INTO d VALUES ('d3');
> 9. S2: COMMIT;
> 10. S2: INSERT INTO d VALUES ('d4');
>
> In this case, the inval message generated by step 5 is discarded at step 6. No
> invalidation messages are distributed in SnapBuildDistributeSnapshotAndInval().
> While decoding S2, the stale relcache entry cannot be invalidated, so tuples d3
> and d4 won't be replicated. Do you think this can happen?
I think that once S1's inval messages have overflowed, we should
mark other transactions as overflowed instead of distributing inval
messages.
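In SnapBuildDistributeSnapshotAndInval(), that could look roughly like the
following, where curtxn is the committing transaction and txn is each
concurrently decoded one (the "mark overflowed" helper is hypothetical):
```
/* Sketch only: run while walking the concurrently decoded transactions. */
if (rbtxn_inval_overflowed(curtxn))
    /* Propagate the overflow instead of copying a huge message array. */
    ReorderBufferMarkDistrInvalOverflowed(builder->reorder, txn->xid);
else
    ReorderBufferAddDistributedInvalidations(builder->reorder, txn->xid,
                                             lsn, ninvalidations, msgs);
```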
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-05 21:59:12 |
Message-ID: | CAD21AoDaCL9X4E8VAe=fYa=zjqGTKRJW13dTPazqAuOAEEykOg@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jun 5, 2025 at 3:28 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 5 Jun 2025 at 03:19, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. I have some comments and questions:
> >
> > In ReorderBufferAbort():
> >
> > /*
> > * We might have decoded changes for this transaction that could load
> > * the cache as per the current transaction's view (consider DDL's
> > * happened in this transaction). We don't want the decoding of future
> > * transactions to use those cache entries so execute invalidations.
> > */
> > if (txn->ninvalidations > 0)
> > ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
> > txn->invalidations);
> >
> > I think that if the txn->invalidations_distributed is overflowed, we
> > would miss executing the txn->invalidations here. Probably the same is
> > true for ReorderBufferForget() and ReorderBufferInvalidate().
>
> I'm accumulating the invalidations in txn->invalidations irrespective
> of whether RBTXN_INVAL_OVERFLOWED is set for the txn.
Agreed with this change.
>
> > ---
> > I'd like to make it clear again which case we need to execute
> > txn->invalidations as well as txn->invalidations_distributed (like in
> > ReorderBufferProcessTXN()) and which case we need to execute only
> > txn->invalidations (like in ReorderBufferForget() and
> > ReorderBufferAbort()). I think it might be worth putting some comments
> > about overall strategy somewhere.
>
> I have added comments for this; feel free to reword them if some
> changes are required.
>
> The attached v11 version patch has the changes for the same.
I think the patch is getting into good shape. I've attached a patch
that includes changes I recommend. For example, it's better to rename
RBTXN_INVAL_OVERFLOWED to RBTXN_DISTR_INVAL_OVERFLOWED, and it
includes some comment updates. Please review them.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
change_v11_masahiko.patch | application/octet-stream | 7.1 KB |
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-06 03:21:54 |
Message-ID: | CAA4eK1KonVMndZ+a4mCGCbgGDfOqKDiJvYV5EHXyjnF8oSn7BQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Fri, Jun 6, 2025 at 12:51 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jun 5, 2025 at 4:07 AM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Amit,
> >
> > > > ---
> > > > I'd like to make it clear again which case we need to execute
> > > > txn->invalidations as well as txn->invalidations_distributed (like in
> > > > ReorderBufferProcessTXN()) and which case we need to execute only
> > > > txn->invalidations (like in ReorderBufferForget() and
> > > > ReorderBufferAbort()). I think it might be worth putting some comments
> > > > about overall strategy somewhere.
> > > >
> > > > ---
> > > > BTW for back branches, a simple fix without ABI breakage would be to
> > > > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > > > txn->invalidations. That is, we accumulate inval messages both coming
> > > > from the current transaction and distributed by other transactions but
> > > > once the size reaches the threshold we invalidate all caches. Is it
> > > > worth considering for back branches?
> > > >
> > >
> > > It should work and is worth considering. The main concern would be
> > > that it will hit sooner than we expect in the field, seeing the recent
> > > reports. So, such a change has the potential to degrade the
> > > performance. I feel that the number of people impacted due to
> > > performance would be more than the number of people impacted due to
> > > such an ABI change (adding the new members at the end of
> > > ReorderBufferTXN). However, if we think we want to go safe w.r.t
> > > extensions that can rely on the sizeof ReorderBufferTXN then your
> > > proposal makes sense.
> >
> > While considering the approach, I found a doubtful point. Consider the below
> > workload:
> >
> > 0. S1: CREATE TABLE d(data text not null);
> > 1. S1: BEGIN;
> > 2. S1: INSERT INTO d VALUES ('d1')
> > 3. S2: BEGIN;
> > 4. S2: INSERT INTO d VALUES ('d2')
> > 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> > 6. S1: ... lots of DDLs so overflow happens
> > 7. S1: COMMIT;
> > 8. S2: INSERT INTO d VALUES ('d3');
> > 9. S2: COMMIT;
> > 10. S2: INSERT INTO d VALUES ('d4');
> >
> > In this case, the inval message generated by step 5 is discarded at step 6. No
> > invalidation messages are distributed in SnapBuildDistributeSnapshotAndInval().
> > While decoding S2, the stale relcache entry cannot be invalidated, so tuples d3
> > and d4 won't be replicated. Do you think this can happen?
>
> > I think that once S1's inval messages have overflowed, we should
> mark other transactions as overflowed instead of distributing inval
> messages.
>
Yeah, this should work, but are you still advocating that we go with
this approach (marking txn->invalidations also as overflowed) for
back branches? In the previous email, you seemed to accept the
performance impact due to DDLs, so it is not clear which approach you
prefer.
--
With Regards,
Amit Kapila.
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-06 03:43:28 |
Message-ID: | CAA4eK1JVQfYKF8TfDuoZpQYRXq7siVn5k_4u270FMVGth1V_bw@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Sawada-san,
> >
> > > > An alternative idea is to lower the constant value when using an
> > > assertion build. That way, we don't need to rely on injection points
> > > being enabled.
> >
> > > Hmm, possible, but I prefer the current one. Two concerns:
> > >
> > > 1.
> > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > to call debug functions in debug builds.
> >
> > I think we have a similar precedent, such as MT_NRELS_HASH, to improve
> > test coverage.
> >
> > > 2.
> > > If we add tests which are usable only for debug builds, they must be run only
> > > when it is enabled. IIUC no such test exists yet.
>
> > I think we need test cases not to check if we reach a specific code
> > point but to check if we can get the correct results even if we've
> > executed various code paths. As for this bug, it is better to check
> > that it works properly in a variety of cases. That way, we can also
> > check overflow and non-overflow cases in test cases added in the
> > future, improving the test coverage further.
>
This makes sense, but we should see whether some existing tests cover
this code path after lowering the limit in the overflow code path. One
minor point to consider is that at the time, the value MT_NRELS_HASH
was used to cover cases in a debug build, but we didn't have the
injection_point framework.
BTW, I noticed that you removed the following comments in your suggestions:
/*
* Stores cache invalidation messages distributed by other transactions.
- *
- * It is acceptable to skip invalidations received from concurrent
- * transactions during ReorderBufferForget and ReorderBufferInvalidate,
- * because the transaction being discarded wouldn't have loaded any shared
IIUC, you only mentioned having some comments like this for ease of
understanding, and now you are suggesting to remove those.
--
With Regards,
Amit Kapila.
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-06 06:22:45 |
Message-ID: | OSCPR01MB14966952C546AE32654B84AACF56EA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
> > 1.
> > USE_ASSERT_CHECKING has not been used to change the value yet. The main
> > usage is to call debug functions in debug builds.
>
> I think we have a similar precedent, such as MT_NRELS_HASH, to improve
> test coverage.
Oh, good catch; it seems to be a typical way.
> > 2.
> > If we add tests which are usable only for debug builds, they must be run only
> > when it is enabled. IIUC no such test exists yet.
>
> I think we need test cases not to check if we reach a specific code
> point but to check if we can get the correct results even if we've
> executed various code paths. As for this bug, it is better to check
> that it works properly in a variety of cases. That way, we can also
> check overflow and non-overflow cases in test cases added in the
> future, improving the test coverage further.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, 'Duncan Sands' <duncan(dot)sands(at)deepbluecap(dot)com>, "'pgsql-bugs(at)lists(dot)postgresql(dot)org'" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, 'vignesh C' <vignesh21(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-06 06:54:51 |
Message-ID: | OSCPR01MB1496608B31715BB094C194B70F56EA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
Sorry, I mistakenly sent a partial version of the post. Let me continue the reply to point 2.
> > 2.
> > If we add tests which are usable only for debug builds, they must be run only
> > when it is enabled. IIUC no such test exists yet.
>
> I think we need test cases not to check if we reach a specific code
> point but to check if we can get the correct results even if we've
> executed various code paths. As for this bug, it is better to check
> that it works properly in a variety of cases. That way, we can also
> check overflow and non-overflow cases in test cases added in the
> future, improving the test coverage further.
You meant that 1) we do not have to ensure we reached the overflow part by checking
the actual log output, and 2) it should be exercised by existing tests.
Based on your advice, I updated the patch set.
0001 contains the changes raised in [1]. I checked them and they looked good.
0002 reduces the limit to an extremely low value. I confirmed, by adding a debug
log message, that 7 test cases can cause the overflow.
Appendix
=======
Added debug log:
```
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3607,6 +3607,8 @@ ReorderBufferAddDistributedInvalidations(ReorderBuffer *rb, TransactionId xid,
*/
txn->txn_flags |= RBTXN_DISTR_INVAL_OVERFLOWED;
+ elog(LOG, "RBTXN_DISTR_INVAL_OVERFLOWED is set for the transaction");
```
How I counted the test cases:
```
testrun$ grep -rI "RBTXN_DISTR_INVAL_OVERFLOWED" | awk '{print $1 $6}' | sort -u
subscription/100_bugs/log/100_bugs_twoways.log:2025-06-06LOG:
test_decoding/isolation/log/postmaster.log:2025-06-06isolation/catalog_change_snapshot/s1
test_decoding/isolation/log/postmaster.log:2025-06-06isolation/concurrent_ddl_dml/s2
test_decoding/isolation/log/postmaster.log:2025-06-06isolation/concurrent_stream/s1
test_decoding/isolation/log/postmaster.log:2025-06-06isolation/invalidation_distribution/s2
test_decoding/isolation/log/postmaster.log:2025-06-06isolation/oldest_xmin/s0
test_decoding/isolation/log/postmaster.log:2025-06-06isolation/snapshot_transfer/s0
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v12-0001-Fix-exponential-memory-allocation-issue-in-logic.patch | application/octet-stream | 13.9 KB |
v12-0002-Make-MAX_DISTR_INVAL_MSG_PER_TXN-lower-in-case-o.patch | application/octet-stream | 1.2 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-06 17:20:34 |
Message-ID: | CAD21AoCNNkLPoc+PWLGpZs74NwHxCvvffAjx6Yn3pJ2xJJPWtw@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > Dear Sawada-san,
> > >
> > > > An alternative idea is to lower the constant value when using an
> > > > assertion build. That way, we don't need to rely on injection points
> > > > being enabled.
> > >
> > > Hmm, possible, but I prefer the current one. Two concerns:
> > >
> > > 1.
> > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > to call debug functions in debug builds.
> >
> > I think we have a similar precedent, such as MT_NRELS_HASH, to improve
> > test coverage.
> >
> > > 2.
> > > If we add tests which are usable only for debug builds, they must be run only
> > > when it is enabled. IIUC no such test exists yet.
> >
> > I think we need test cases not to check if we reach a specific code
> > point but to check if we can get the correct results even if we've
> > executed various code paths. As for this bug, it is better to check
> > that it works properly in a variety of cases. That way, we can also
> > check overflow and non-overflow cases in test cases added in the
> > future, improving the test coverage further.
> >
>
> This makes sense, but we should see whether some existing tests cover
> this code path after lowering the limit in the overflow code path. One
> minor point to consider is that at the time, the value MT_NRELS_HASH
> was used to cover cases in a debug build, but we didn't have the
> injection_point framework.
True.
After thinking about it more, perhaps my proposal would not be a good
idea for this case. I think that the cases where we selectively
invalidate caches are more complex and error-prone than the cases where
we invalidate a complete cache. If we invalidated all caches after
decoding each transaction, we wouldn't have had the original data-loss
issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
assertion build means that we're going to test the cases using a
simpler invalidation mechanism, while production systems, which have a
higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
complex cases, which is not great. What do you think?
BTW, as for a new test case, it might be worth having a case I
mentioned before[1]:
1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');
With this case, we can test whether we execute the distributed
invalidations as well in the non-error path in
ReorderBufferProcessTXN().
>
> BTW, I noticed that you removed the following comments in your suggestions:
> /*
> * Stores cache invalidation messages distributed by other transactions.
> - *
> - * It is acceptable to skip invalidations received from concurrent
> - * transactions during ReorderBufferForget and ReorderBufferInvalidate,
> - * because the transaction being discarded wouldn't have loaded any shared
>
> IIUC, you only mentioned having some comments like this for ease of
> understanding, and now you are suggesting to remove those.
I forgot to mention the reason. I thought we needed either a
comprehensive comment in one place about in which cases we need to
execute both the current transaction's inval messages and the
distributed inval messages and in which cases we need to execute only
the inval messages in the current transaction, or comments at each
place where they are needed. The v11 patch added the comprehensive
comment to the declaration of ninvalidations_distributed and
invalidations_distributed in ReorderBufferTXN, but I'm not sure that
was the right place to have such a comment, as it goes beyond the
description of these fields. So in my suggestion, I tried to clarify
that we execute only the inval messages in the current transaction in
ReorderBufferForget() and ReorderBufferAbort(), as they seem to
already have enough of a comment on why we need to execute the inval
messages there.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-06 17:23:42 |
Message-ID: | CAD21AoDKisp9pGb=8qos8Y4ddDLt62D5=P5usQMG3cm+A+vfOg@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jun 5, 2025 at 8:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jun 6, 2025 at 12:51 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 5, 2025 at 4:07 AM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > Dear Amit,
> > >
> > > > > ---
> > > > > I'd like to make it clear again which case we need to execute
> > > > > txn->invalidations as well as txn->invalidations_distributed (like in
> > > > > ReorderBufferProcessTXN()) and which case we need to execute only
> > > > > txn->invalidations (like in ReorderBufferForget() and
> > > > > ReorderBufferAbort()). I think it might be worth putting some comments
> > > > > about overall strategy somewhere.
> > > > >
> > > > > ---
> > > > > BTW for back branches, a simple fix without ABI breakage would be to
> > > > > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > > > > txn->invalidations. That is, we accumulate inval messages both coming
> > > > > from the current transaction and distributed by other transactions but
> > > > > once the size reaches the threshold we invalidate all caches. Is it
> > > > > worth considering for back branches?
> > > > >
> > > >
> > > > It should work and is worth considering. The main concern would be
> > > > that it will hit sooner than we expect in the field, seeing the recent
> > > > reports. So, such a change has the potential to degrade the
> > > > performance. I feel that the number of people impacted due to
> > > > performance would be more than the number of people impacted due to
> > > > such an ABI change (adding the new members at the end of
> > > > ReorderBufferTXN). However, if we think we want to go safe w.r.t
> > > > extensions that can rely on the sizeof ReorderBufferTXN then your
> > > > proposal makes sense.
> > >
> > > While considering the approach, I found a doubtful point. Consider the below
> > > workload:
> > >
> > > 0. S1: CREATE TABLE d(data text not null);
> > > 1. S1: BEGIN;
> > > 2. S1: INSERT INTO d VALUES ('d1')
> > > 3. S2: BEGIN;
> > > 4. S2: INSERT INTO d VALUES ('d2')
> > > 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> > > 6. S1: ... lots of DDLs so overflow happens
> > > 7. S1: COMMIT;
> > > 8. S2: INSERT INTO d VALUES ('d3');
> > > 9. S2: COMMIT;
> > > 10. S2: INSERT INTO d VALUES ('d4');
> > >
> > > In this case, the inval message generated by step 5 is discarded at step 6. No
> > > invalidation messages are distributed in SnapBuildDistributeSnapshotAndInval().
> > > While decoding S2, the stale relcache entry cannot be invalidated, so tuples d3
> > > and d4 won't be replicated. Do you think this can happen?
> >
> > I think that once S1's inval messages have overflowed, we should
> > mark other transactions as overflowed instead of distributing inval
> > messages.
> >
>
> Yeah, this should work, but are you still advocating that we go with
> this approach (marking txn->invalidations also as overflowed) for
> back branches? In the previous email, you seemed to accept the
> performance impact due to DDLs, so it is not clear which approach you
> prefer.
No, I just wanted to make it clear that this idea is possible. But I
agree with using the idea of having both invalidations_distributed and
ninvalidations_distributed in ReorderBufferTXN in all branches.
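For reference, a minimal sketch of that direction, assuming the field names
used in this discussion (the actual committed struct layout may differ): the
new members are appended at the end of ReorderBufferTXN so that the existing
layout, and hence the ABI seen by extensions, stays intact on back branches.

```c
#include <stdint.h>

typedef uint32_t uint32;
/* Opaque here; the real definition lives in sinval.h. */
typedef struct SharedInvalidationMessage SharedInvalidationMessage;

typedef struct ReorderBufferTXN
{
	/* ... existing members elided ... */

	/* invalidation messages generated by this transaction itself */
	uint32		ninvalidations;
	SharedInvalidationMessage *invalidations;

	/*
	 * Appended at the end for ABI compatibility on back branches:
	 * invalidation messages distributed to this transaction by other
	 * (concurrent) transactions, kept separately so they are not folded
	 * back into txn->invalidations and re-distributed again.
	 */
	uint32		ninvalidations_distributed;
	SharedInvalidationMessage *invalidations_distributed;
} ReorderBufferTXN;
```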
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-07 06:03:28 |
Message-ID: | CAA4eK1JnLoJqo9+bBWdtNEe5hH5j5bMErnkZmjTpjr5nV0Qegw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, Jun 6, 2025 at 10:51 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> After thinking about it more, perhaps my proposal would not be a good
> idea for this case. I think that the cases where we selectively
> invalidate caches are more complex and error-prone than the cases where
> we invalidate a complete cache. If we invalidated all caches after
> decoding each transaction, we wouldn't have had the original data-loss
> issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> assertion build means that we're going to test the cases using a
> simpler invalidation mechanism while production systems, which have a
> higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> complex cases, which is not great. What do you think?
>
Your reasoning makes sense to me. The other thing is that it would be
better if we don't add more cases that rely on debug builds for testing.
Going forward, it can become difficult to decide which cases are good
to test only in debug mode.
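(For illustration only, the set-aside alternative of lowering the cap in
assertion builds would have looked roughly like this; both constant values
here are hypothetical, not taken from the actual patch:)

```c
/*
 * Hypothetical sketch: use a much smaller cap in assertion-enabled
 * builds so that regression tests routinely hit the overflow path.
 */
#ifdef USE_ASSERT_CHECKING
#define MAX_DISTR_INVAL_MSG_PER_TXN 8
#else
#define MAX_DISTR_INVAL_MSG_PER_TXN 1024
#endif
```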
> BTW, as for a new test case, it might be worth having a case I
> mentioned before[1]:
>
> 1) S1: CREATE TABLE d (data text not null);
> 2) S1: INSERT INTO d VALUES ('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO d VALUES ('d4');
> 7) S2: COMMIT;
> 8) S3: COMMIT;
> 9) S2: INSERT INTO d VALUES('d5');
> 10) S1: INSERT INTO d VALUES ('d6');
>
> With this case, we can test if we need to execute the distributed
> invalidations as well in the non-error path in
> ReorderBufferProcessTXN().
>
+1.
> >
> > BTW, I noticed that you removed the following comments in your suggestions:
> > /*
> > * Stores cache invalidation messages distributed by other transactions.
> > - *
> > - * It is acceptable to skip invalidations received from concurrent
> > - * transactions during ReorderBufferForget and ReorderBufferInvalidate,
> > - * because the transaction being discarded wouldn't have loaded any shared
> >
> > IIUC, you only mentioned having some comments like this for ease of
> > understanding, and now you are suggesting to remove those.
>
> I forgot to mention the reason. I thought we need either a
> comprehensive comment in one place about in which cases we need to
> execute both the current transaction's inval messages and the
> distributed inval messages, and in which cases we need to execute only
> the inval messages of the current transaction, or to put comments where
> needed. The v11 added the comprehensive comment to the declaration of
> ninvalidations_distributed and invalidations_distributed in
> ReorderBufferTXN, but I'm not sure that was the right place to have
> such a comment as it's beyond the description of these fields. So in
> my suggestion, I tried to clarify that we execute only the inval
> messages of the current transaction in ReorderBufferForget() and
> ReorderBufferAbort(), as they seem to already have enough comments
> explaining why we need to execute the inval messages there.
>
Fair enough. I see one more case, namely the call to
ReorderBufferExecuteInvalidations() in ReorderBufferFinishPrepared().
I think we should have already executed the required invalidations at
the end of prepare, so why do we need to execute them in
ReorderBufferFinishPrepared()? It might be kept as a general cleanup
call, and if that is the case we might want to even adjust the comments
there. What do you think?
--
With Regards,
Amit Kapila.
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-07 07:17:00 |
Message-ID: | CALDaNm0DkWbr8eA3kAkEw+cZA-n=KPmFxZ_5bz_YDqhYQGk6iA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > >
> > > > Dear Sawada-san,
> > > >
> > > > > An alternative idea is to lower the constant value when using an
> > > > > assertion build. That way, we don't need to rely on injection points
> > > > > being enabled.
> > > >
> > > > Hmm, possible, but I prefer the current one. Two concerns:
> > > >
> > > > 1.
> > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > to call debug functions in a debug build.
> > >
> > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > test coverage.
> > >
> > > > 2.
> > > > If we add tests which are usable only in a debug build, they must be run only when it
> > > > is enabled. IIUC such tests do not exist yet.
> > >
> > > I think we need to test cases not to check if we reach a specific code
> > > point but to check if we can get the correct results even if we've
> > > executed various code paths. As for this bug, it is better to check
> > > that it works properly in a variety of cases. That way, we can check
> > > overflow cases and non-overflow cases also in test cases added in the
> > > future, improving the test coverage more.
> > >
> >
> > This makes sense, but we should see whether some existing tests cover
> > this code path after lowering the limit in the overflow code path. One
> > minor point to consider is that at the time, the value MT_NRELS_HASH
> > was used to cover cases in a debug build, but we didn't have the
> > injection_point framework.
>
> True.
>
> After thinking about it more, perhaps my proposal would not be a good
> idea for this case. I think that the cases where we selectively
> invalidate caches are more complex and error-prone than the cases where
> we invalidate a complete cache. If we invalidated all caches after
> decoding each transaction, we wouldn't have had the original data-loss
> issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> assertion build means that we're going to test the cases using a
> simpler invalidation mechanism while production systems, which have a
> higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> complex cases, which is not great. What do you think?
>
> BTW, as for a new test case, it might be worth having a case I
> mentioned before[1]:
>
> 1) S1: CREATE TABLE d (data text not null);
> 2) S1: INSERT INTO d VALUES ('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO d VALUES ('d4');
> 7) S2: COMMIT;
> 8) S3: COMMIT;
> 9) S2: INSERT INTO d VALUES('d5');
> 10) S1: INSERT INTO d VALUES ('d6');
>
> With this case, we can test if we need to execute the distributed
> invalidations as well in the non-error path in
> ReorderBufferProcessTXN().
The attached v13 patch has been updated to include this test case.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v13-master-0001-Fix-exponential-memory-allocation-issue-i.patch | application/octet-stream | 17.2 KB |
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-08 13:43:33 |
Message-ID: | CALDaNm0TaTPuza7Fa+DRMzL+mqK3+7RVEvFiRoDJbU2vkJESwg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > >
> > > > > Dear Sawada-san,
> > > > >
> > > > > > An alternative idea is to lower the constant value when using an
> > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > being enabled.
> > > > >
> > > > > Hmm, possible, but I prefer the current one. Two concerns:
> > > > >
> > > > > 1.
> > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > to call debug functions in a debug build.
> > > >
> > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > test coverage.
> > > >
> > > > > 2.
> > > > > If we add tests which are usable only in a debug build, they must be run only when it
> > > > > is enabled. IIUC such tests do not exist yet.
> > > >
> > > > I think we need to test cases not to check if we reach a specific code
> > > > point but to check if we can get the correct results even if we've
> > > > executed various code paths. As for this bug, it is better to check
> > > > that it works properly in a variety of cases. That way, we can check
> > > > overflow cases and non-overflow cases also in test cases added in the
> > > > future, improving the test coverage more.
> > > >
> > >
> > > This makes sense, but we should see whether some existing tests cover
> > > this code path after lowering the limit in the overflow code path. One
> > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > was used to cover cases in a debug build, but we didn't have the
> > > injection_point framework.
> >
> > True.
> >
> > After thinking about it more, perhaps my proposal would not be a good
> > idea for this case. I think that the cases where we selectively
> > invalidate caches are more complex and error-prone than the cases where
> > we invalidate a complete cache. If we invalidated all caches after
> > decoding each transaction, we wouldn't have had the original data-loss
> > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > assertion build means that we're going to test the cases using a
> > simpler invalidation mechanism while production systems, which have a
> > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > complex cases, which is not great. What do you think?
> >
> > BTW, as for a new test case, it might be worth having a case I
> > mentioned before[1]:
> >
> > 1) S1: CREATE TABLE d (data text not null);
> > 2) S1: INSERT INTO d VALUES ('d1');
> > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > 6) S2: INSERT INTO d VALUES ('d4');
> > 7) S2: COMMIT;
> > 8) S3: COMMIT;
> > 9) S2: INSERT INTO d VALUES('d5');
> > 10) S1: INSERT INTO d VALUES ('d6');
> >
> > With this case, we can test if we need to execute the distributed
> > invalidations as well in the non-error path in
> > ReorderBufferProcessTXN().
>
> The attached v13 patch has been updated to include this test case.
Attached are the patches, including those required for the back branches.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v13-master-0001-Fix-exponential-memory-allocation-issue-i.patch | application/octet-stream | 17.2 KB |
v13-PG13-0001-Fix-exponential-memory-allocation-issue-in-.patch | application/octet-stream | 16.7 KB |
v13-PG15-0001-Fix-exponential-memory-allocation-issue-in-.patch | application/octet-stream | 17.9 KB |
v13-PG16-0001-Fix-exponential-memory-allocation-issue-in-.patch | application/octet-stream | 18.0 KB |
v13-PG14-0001-Fix-exponential-memory-allocation-issue-in-.patch | application/octet-stream | 17.8 KB |
v13-PG17-0001-Fix-exponential-memory-allocation-issue-in-.patch | application/octet-stream | 17.9 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-10 19:19:55 |
Message-ID: | CAD21AoDRkvBQdFtHfro2zd7HxcCT4JSWGWfF68YU977mvu6oVg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sun, Jun 8, 2025 at 6:43 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > >
> > > > > > Dear Sawada-san,
> > > > > >
> > > > > > > An alternative idea is to lower the constant value when using an
> > > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > > being enabled.
> > > > > >
> > > > > > Hmm, possible, but I prefer the current one. Two concerns:
> > > > > >
> > > > > > 1.
> > > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > > to call debug functions in a debug build.
> > > > >
> > > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > > test coverage.
> > > > >
> > > > > > 2.
> > > > > > If we add tests which are usable only in a debug build, they must be run only when it
> > > > > > is enabled. IIUC such tests do not exist yet.
> > > > >
> > > > > I think we need to test cases not to check if we reach a specific code
> > > > > point but to check if we can get the correct results even if we've
> > > > > executed various code paths. As for this bug, it is better to check
> > > > > that it works properly in a variety of cases. That way, we can check
> > > > > overflow cases and non-overflow cases also in test cases added in the
> > > > > future, improving the test coverage more.
> > > > >
> > > >
> > > > This makes sense, but we should see whether some existing tests cover
> > > > this code path after lowering the limit in the overflow code path. One
> > > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > > was used to cover cases in a debug build, but we didn't have the
> > > > injection_point framework.
> > >
> > > True.
> > >
> > > After thinking about it more, perhaps my proposal would not be a good
> > > idea for this case. I think that the cases where we selectively
> > > invalidate caches are more complex and error-prone than the cases where
> > > we invalidate a complete cache. If we invalidated all caches after
> > > decoding each transaction, we wouldn't have had the original data-loss
> > > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > > assertion build means that we're going to test the cases using a
> > > simpler invalidation mechanism while production systems, which have a
> > > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > > complex cases, which is not great. What do you think?
> > >
> > > BTW, as for a new test case, it might be worth having a case I
> > > mentioned before[1]:
> > >
> > > 1) S1: CREATE TABLE d (data text not null);
> > > 2) S1: INSERT INTO d VALUES ('d1');
> > > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > > 6) S2: INSERT INTO d VALUES ('d4');
> > > 7) S2: COMMIT;
> > > 8) S3: COMMIT;
> > > 9) S2: INSERT INTO d VALUES('d5');
> > > 10) S1: INSERT INTO d VALUES ('d6');
> > >
> > > With this case, we can test if we need to execute the distributed
> > > invalidations as well in the non-error path in
> > > ReorderBufferProcessTXN().
> >
> > The attached v13 patch has been updated to include this test case.
>
> Attached are the patches, including those required for the back branches.
Thank you for updating the patches! I'll review them and share comments if any.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'vignesh C' <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-11 01:45:19 |
Message-ID: | OSCPR01MB14966C14006CDF9FB0CC151B4F575A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear hackers,
> Attached are the patches, including those required for the back branches.
While reviewing the patch for PG13, I found a doubtful point in ReorderBufferCommit().
```
            /*
             * Every time the CommandId is incremented, we could
             * see new catalog contents, so execute all
             * invalidations.
             */
            ReorderBufferExecuteInvalidations(txn->ninvalidations,
                                              txn->invalidations);
```
This is called when REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID is dequeued from the
change queue, and this part exists only in the PG13 codebase.
We are not sure whether we should execute txn->invalidations_distributed as well.
This can affect the case below:
txn1: BEGIN; INSERT INTO d VALUES ('d1');
txn2: ALTER PUBLICATION pb ADD TABLE d;
txn1: CREATE TABLE another (id int);
txn1: INSERT INTO d VALUES ('d2');
txn1: COMMIT;
-> PG13 - no output
-> PG13 + v13 patch - no output
-> PG13 + v13 patch + additional inval execution - d2 can be replicated
-> (master - d2 can be replicated)
Personally, I think txn->invalidations_distributed does not need to be executed
because the spec seems a bit complex, but I'd like to hear other opinions.
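(For illustration, the "additional inval execution" mentioned above would
amount to roughly the following at the same spot; this is a sketch of the
experiment, not committed code, assuming the field names from the patch:)

```c
            /*
             * Sketch of the experimental variant: besides the
             * transaction's own messages, also execute the ones
             * distributed to it by concurrent transactions.
             */
            ReorderBufferExecuteInvalidations(txn->ninvalidations,
                                              txn->invalidations);
            ReorderBufferExecuteInvalidations(txn->ninvalidations_distributed,
                                              txn->invalidations_distributed);
```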
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-11 03:19:07 |
Message-ID: | CAA4eK1LONi-aehQ=fxXhqjqGE2ehdrtS881as-XyfK94UCvfhQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Wed, Jun 11, 2025 at 7:15 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> While reviewing the patch for PG13, I found a doubtful point in ReorderBufferCommit().
>
> ```
> /*
> * Every time the CommandId is incremented, we could
> * see new catalog contents, so execute all
> * invalidations.
> */
> ReorderBufferExecuteInvalidations(txn->ninvalidations,
> txn->invalidations);
> ```
>
> This is called when REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID is dequeued from the
> change queue, and this part exists only in the PG13 codebase.
> We are not sure whether we should execute txn->invalidations_distributed as well.
> This can affect the case below:
>
>
> txn1: BEGIN; INSERT INTO d VALUES ('d1');
> txn2: ALTER PUBLICATION pb ADD TABLE d;
> txn1: CREATE TABLE another (id int);
> txn1: INSERT INTO d VALUES ('d2');
> txn1: COMMIT;
> -> PG13 - no output
> -> PG13 + v13 patch - no output
> -> PG13 + v13 patch + additional inval execution - d2 can be replicated
> -> (master - d2 can be replicated)
>
> Personally, I think txn->invalidations_distributed does not need to be executed
> because the spec seems a bit complex, but I'd like to hear other opinions.
>
This is not new; we knew from the time we committed this in PG13 that
concurrent transactions wouldn't pick up DDL changes, but later
transactions should. See the commit message of commit
247ee94150b6fe8906da51afadbedf8acf3c17cf in PG13 ("The fix for 13 is
different from what we did in branches 14 and above, such that for 13,
the concurrent DDL changes (from DDL types mentioned earlier) will be
visible for any newly started transactions...). So, the above is the
expected behavior for PG13.
--
With Regards,
Amit Kapila.
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-11 22:03:04 |
Message-ID: | CAD21AoDCphZ30iiMOG8mZH_eo40m_1KY=4gz7-Yd+9M3FRBySg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 10, 2025 at 12:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Jun 8, 2025 at 6:43 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > > >
> > > > > > > Dear Sawada-san,
> > > > > > >
> > > > > > > > An alternative idea is to lower the constant value when using an
> > > > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > > > being enabled.
> > > > > > >
> > > > > > > Hmm, possible, but I prefer the current one. Two concerns:
> > > > > > >
> > > > > > > 1.
> > > > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > > > to call debug functions in a debug build.
> > > > > >
> > > > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > > > test coverage.
> > > > > >
> > > > > > > 2.
> > > > > > > If we add tests which are usable only in a debug build, they must be run only when it
> > > > > > > is enabled. IIUC such tests do not exist yet.
> > > > > >
> > > > > > I think we need to test cases not to check if we reach a specific code
> > > > > > point but to check if we can get the correct results even if we've
> > > > > > executed various code paths. As for this bug, it is better to check
> > > > > > that it works properly in a variety of cases. That way, we can check
> > > > > > overflow cases and non-overflow cases also in test cases added in the
> > > > > > future, improving the test coverage more.
> > > > > >
> > > > >
> > > > > This makes sense, but we should see whether some existing tests cover
> > > > > this code path after lowering the limit in the overflow code path. One
> > > > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > > > was used to cover cases in a debug build, but we didn't have the
> > > > > injection_point framework.
> > > >
> > > > True.
> > > >
> > > > After thinking about it more, perhaps my proposal would not be a good
> > > > idea for this case. I think that the cases where we selectively
> > > > invalidate caches are more complex and error-prone than the cases where
> > > > we invalidate a complete cache. If we invalidated all caches after
> > > > decoding each transaction, we wouldn't have had the original data-loss
> > > > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > > > assertion build means that we're going to test the cases using a
> > > > simpler invalidation mechanism while production systems, which have a
> > > > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > > > complex cases, which is not great. What do you think?
> > > >
> > > > BTW, as for a new test case, it might be worth having a case I
> > > > mentioned before[1]:
> > > >
> > > > 1) S1: CREATE TABLE d (data text not null);
> > > > 2) S1: INSERT INTO d VALUES ('d1');
> > > > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > > > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > > > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > > > 6) S2: INSERT INTO d VALUES ('d4');
> > > > 7) S2: COMMIT;
> > > > 8) S3: COMMIT;
> > > > 9) S2: INSERT INTO d VALUES('d5');
> > > > 10) S1: INSERT INTO d VALUES ('d6');
> > > >
> > > > With this case, we can test if we need to execute the distributed
> > > > invalidations as well in the non-error path in
> > > > ReorderBufferProcessTXN().
> > >
> > > The attached v13 patch has been updated to include this test case.
> >
> > Attached are the patches, including those required for the back branches.
>
> Thank you for updating the patches! I'll review them and share comments if any.
Thank you for updating the patch. I have one comment on the newly added test:
+session "s3"
+step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
+step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
+step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
+step "s3_get_binary_changes" { SELECT count(data) FROM
pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
0) = 73; }
+
+session "s4"
+step "s4b" { BEGIN; }
+step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
+step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
+step "s4c" { COMMIT; }
+step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
+
+session "s5"
+step "s5b" { BEGIN; }
+step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
+step "s5c" { COMMIT; }
I think we don't necessarily need to add sessions "s4" and "s5". Let's
reuse "s1" and "s2" instead of adding them. I've attached a patch to
change that.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v13_test_change.patch | application/octet-stream | 4.4 KB |
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-12 06:47:44 |
Message-ID: | CALDaNm3m6Dq2d6bCBaOjnABYavT3p+=VbJd5oJKok0sCTOSZog@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, 12 Jun 2025 at 03:33, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 10, 2025 at 12:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Jun 8, 2025 at 6:43 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > > > >
> > > > > > > > Dear Sawada-san,
> > > > > > > >
> > > > > > > > > An alternative idea is to lower the constant value when using an
> > > > > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > > > > being enabled.
> > > > > > > >
> > > > > > > > Hmm, possible, but I prefer the current one. Two concerns:
> > > > > > > >
> > > > > > > > 1.
> > > > > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > > > > to call debug functions in a debug build.
> > > > > > >
> > > > > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > > > > test coverage.
> > > > > > >
> > > > > > > > 2.
> > > > > > > > If we add tests which are usable only in a debug build, they must be run only when it
> > > > > > > > is enabled. IIUC such tests do not exist yet.
> > > > > > >
> > > > > > > I think we need to test cases not to check if we reach a specific code
> > > > > > > point but to check if we can get the correct results even if we've
> > > > > > > executed various code paths. As for this bug, it is better to check
> > > > > > > that it works properly in a variety of cases. That way, we can check
> > > > > > > overflow cases and non-overflow cases also in test cases added in the
> > > > > > > future, improving the test coverage more.
> > > > > > >
> > > > > >
> > > > > > This makes sense, but we should see whether some existing tests cover
> > > > > > this code path after lowering the limit in the overflow code path. One
> > > > > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > > > > was used to cover cases in a debug build, but we didn't have the
> > > > > > injection_point framework.
> > > > >
> > > > > True.
> > > > >
> > > > > After thinking about it more, perhaps my proposal would not be a good
> > > > > idea for this case. I think that the cases where we selectively
> > > > > invalidate caches are more complex and error-prone than the cases where
> > > > > we invalidate a complete cache. If we invalidated all caches after
> > > > > decoding each transaction, we wouldn't have had the original data-loss
> > > > > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > > > > assertion build means that we're going to test the cases using a
> > > > > simpler invalidation mechanism while production systems, which have a
> > > > > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > > > > complex cases, which is not great. What do you think?
> > > > >
> > > > > BTW, as for a new test case, it might be worth having a case I
> > > > > mentioned before[1]:
> > > > >
> > > > > 1) S1: CREATE TABLE d (data text not null);
> > > > > 2) S1: INSERT INTO d VALUES ('d1');
> > > > > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > > > > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > > > > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > > > > 6) S2: INSERT INTO d VALUES ('d4');
> > > > > 7) S2: COMMIT;
> > > > > 8) S3: COMMIT;
> > > > > 9) S2: INSERT INTO d VALUES('d5');
> > > > > 10) S1: INSERT INTO d VALUES ('d6');
> > > > >
> > > > > With this case, we can test if we need to execute the distributed
> > > > > invalidations as well in the non-error path in
> > > > > ReorderBufferProcessTXN().
> > > >
> > > > The attached v13 patch has been updated to include this test case.
> > >
> > > Attached are the patches, including those required for the back branches.
> >
> > Thank you for updating the patches! I'll review them and share comments if any.
>
> Thank you for updating the patch. I have one comment on the newly added test:
>
> +session "s3"
> +step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
> +step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
> +step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
> +step "s3_get_binary_changes" { SELECT count(data) FROM
> pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
> 'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
> 0) = 73; }
> +
> +session "s4"
> +step "s4b" { BEGIN; }
> +step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
> +step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
> +step "s4c" { COMMIT; }
> +step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
> +
> +session "s5"
> +step "s5b" { BEGIN; }
> +step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
> +step "s5c" { COMMIT; }
>
> I think we don't necessarily need to add sessions "s4" and "s5". Let's
> reuse "s1" and "s2" instead of adding them. I've attached a patch to
> change that.
Thanks, this is better.
In the case of PG13, I have slightly changed the test to do the insert
after the commit, because queuing invalidation messages into the
reorder buffer queue is not supported in PG13. This limitation is due
to the absence of support for REORDER_BUFFER_CHANGE_INVALIDATION, which
is already present in >= PG14 (see the sketch below).
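(A rough sketch of the PG14+ mechanism that PG13 lacks, simplified; the
helper name here is invented for illustration, while the change type,
fields, and functions are from the PG14+ reorder buffer code:)

```c
static void
queue_invalidations(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
                    uint32 nmsgs, SharedInvalidationMessage *msgs)
{
    /* Queue the messages as a change so they replay at the right point. */
    ReorderBufferChange *change = ReorderBufferGetChange(rb);

    change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
    change->data.inval.ninvalidations = nmsgs;
    change->data.inval.invalidations = msgs;
    ReorderBufferQueueChange(rb, xid, lsn, change, false);
}
```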
The attached v14 patch has the corresponding changes.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v14-PG14-0001-Fix-exponential-memory-allocation-issue-in-.patch | text/x-patch | 17.3 KB |
v14-PG13-0001-Fix-exponential-memory-allocation-issue-in-.patch | text/x-patch | 16.4 KB |
v14-PG15-0001-Fix-exponential-memory-allocation-issue-in-.patch | text/x-patch | 17.3 KB |
v14-PG16-0001-Fix-exponential-memory-allocation-issue-in-.patch | text/x-patch | 17.4 KB |
v14-PG17-0001-Fix-exponential-memory-allocation-issue-in-.patch | text/x-patch | 17.4 KB |
v14-master-0001-Fix-exponential-memory-allocation-issue-i.patch | text/x-patch | 16.7 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-13 23:52:00 |
Message-ID: | CAD21AoCHqKXVhUZbxPKBve-uEHJWKxV9SVC5deiB5VUONTH=6w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Wed, Jun 11, 2025 at 11:47 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 12 Jun 2025 at 03:33, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 10, 2025 at 12:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sun, Jun 8, 2025 at 6:43 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > > > > >
> > > > > > > > > Dear Sawada-san,
> > > > > > > > >
> > > > > > > > > > An alternative idea is to lower the constant value when using an
> > > > > > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > > > > > being enabled.
> > > > > > > > >
> > > > > > > > > Hmm, possible, but I prefer the current one. Two concerns:
> > > > > > > > >
> > > > > > > > > 1.
> > > > > > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > > > > > to call debug functions in a debug build.
> > > > > > > >
> > > > > > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > > > > > test coverage.
> > > > > > > >
> > > > > > > > > 2.
> > > > > > > > > If we add tests which are usable only in a debug build, they must be run only when it
> > > > > > > > > is enabled. IIUC such tests do not exist yet.
> > > > > > > >
> > > > > > > > I think we need to test cases not to check if we reach a specific code
> > > > > > > > point but to check if we can get the correct results even if we've
> > > > > > > > executed various code paths. As for this bug, it is better to check
> > > > > > > > that it works properly in a variety of cases. That way, we can check
> > > > > > > > overflow cases and non-overflow cases also in test cases added in the
> > > > > > > > future, improving the test coverage more.
> > > > > > > >
> > > > > > >
> > > > > > > This makes sense, but we should see whether some existing tests cover
> > > > > > > this code path after lowering the limit in the overflow code path. One
> > > > > > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > > > > > was used to cover cases in a debug build, but we didn't have the
> > > > > > > injection_point framework.
> > > > > >
> > > > > > True.
> > > > > >
> > > > > > After thinking about it more, perhaps my proposal would not be a good
> > > > > > idea for this case. I think that the cases where we selectively
> > > > > > invalidate caches are more complex and error-prone than the cases where
> > > > > > we invalidate a complete cache. If we invalidated all caches after
> > > > > > decoding each transaction, we wouldn't have had the original data-loss
> > > > > > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > > > > > assertion build means that we're going to test the cases using a
> > > > > > simpler invalidation mechanism while production systems, which have a
> > > > > > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > > > > > complex cases, which is not great. What do you think?
> > > > > >
> > > > > > BTW, as for a new test case, it might be worth having a case I
> > > > > > mentioned before[1]:
> > > > > >
> > > > > > 1) S1: CREATE TABLE d (data text not null);
> > > > > > 2) S1: INSERT INTO d VALUES ('d1');
> > > > > > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > > > > > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > > > > > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > > > > > 6) S2: INSERT INTO d VALUES ('d4');
> > > > > > 7) S2: COMMIT;
> > > > > > 8) S3: COMMIT;
> > > > > > 9) S2: INSERT INTO d VALUES('d5');
> > > > > > 10) S1: INSERT INTO d VALUES ('d6');
> > > > > >
> > > > > > With this case, we can test if we need to execute the distributed
> > > > > > invalidations as well in the non-error path in
> > > > > > ReorderBufferProcessTXN().
> > > > >
> > > > > The attached v13 patch has been updated to include this test case.
> > > >
> > > > Attached are the patches, including those required for the back branches.
> > >
> > > Thank you for updating the patches! I'll review them and share comments if any.
> >
> > Thank you for updating the patch. I have one comment on the newly added test:
> >
> > +session "s3"
> > +step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
> > +step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
> > +step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
> > +step "s3_get_binary_changes" { SELECT count(data) FROM
> > pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
> > 'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
> > 0) = 73; }
> > +
> > +session "s4"
> > +step "s4b" { BEGIN; }
> > +step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
> > +step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
> > +step "s4c" { COMMIT; }
> > +step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
> > +
> > +session "s5"
> > +step "s5b" { BEGIN; }
> > +step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
> > +step "s5c" { COMMIT; }
> >
> > I think we don't necessarily need to add sessions "s4" and "s5". Let's
> > reuse "s1" and "s2" instead of adding them. I've attached a patch to
> > change that.
>
> Thanks, this is better.
> In the case of PG13, I have slightly changed the test to do the insert
> after the commit, because queuing invalidation messages into the
> reorder buffer queue is not supported in PG13. This limitation is due
> to the absence of support for REORDER_BUFFER_CHANGE_INVALIDATION, which
> is already present in >= PG14.
> The attached v14 patch has the corresponding changes.
Hmm, but the modified test is essentially the same as what we already
have in invalidation_distribution.spec. I think that it's good to have
the same test case in all branches even though we get a different
result in v13, as we currently don't have a test to check such
differences.
I've attached the updated patches accordingly and the commit messages.
I'm going to push them to all supported branches early next week,
barring any objections and further review comments.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
REL15_v15-0001-Fix-re-distributing-previously-distributed-inval.patch | application/octet-stream | 19.1 KB |
master_v15-0001-Fix-re-distributing-previously-distributed-inval.patch | application/octet-stream | 18.4 KB |
REL16_v15-0001-Fix-re-distributing-previously-distributed-inval.patch | application/octet-stream | 19.2 KB |
REL14_v15-0001-Fix-re-distributing-previously-distributed-inval.patch | application/octet-stream | 19.0 KB |
REL17_v15-0001-Fix-re-distributing-previously-distributed-inval.patch | application/octet-stream | 19.1 KB |
REL13_v15-0001-Fix-re-distributing-previously-distributed-inval.patch | application/octet-stream | 17.8 KB |
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-16 02:27:50 |
Message-ID: | CALDaNm2udG-+nNnOfYYjC7pBVziZBRbJNPFq__Ynv3eASsZMAw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sat, 14 Jun 2025 at 05:22, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > > Thank you for updating the patch. I have one comment on the newly added test:
> > >
> > > +session "s3"
> > > +step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
> > > +step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
> > > +step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
> > > +step "s3_get_binary_changes" { SELECT count(data) FROM
> > > pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
> > > 'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
> > > 0) = 73; }
> > > +
> > > +session "s4"
> > > +step "s4b" { BEGIN; }
> > > +step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
> > > +step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
> > > +step "s4c" { COMMIT; }
> > > +step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
> > > +
> > > +session "s5"
> > > +step "s5b" { BEGIN; }
> > > +step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
> > > +step "s5c" { COMMIT; }
> > >
> > > I think we don't necessarily need to add sessions "s4" and "s5". Let's
> > > reuse "s1" and "s2" instead of adding them. I've attached a patch to
> > > change that.
> >
> > Thanks, this is better.
> > In the case of PG13, I have slightly changed the test to do the insert
> > after the commit, because queuing invalidation messages into the
> > reorder buffer queue is not supported in PG13. This limitation is due
> > to the absence of support for REORDER_BUFFER_CHANGE_INVALIDATION, which
> > is already present in >= PG14.
> > The attached v14 patch has the corresponding changes.
>
> Hmm, but the modified test is essentially the same as what we already
> have in invalidation_distribution.spec. I think that it's good to have
> the same test case in all branches even though we get a different
> result in v13, as we currently don't have a test to check such
> differences.
>
> I've attached the updated patches accordingly and the commit messages.
Thanks, the changes look good to me.
Regards,
Vignesh
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-17 17:40:36 |
Message-ID: | CAD21AoBfFEG=y=sHrhOO-EWpcFW=PmXcFhkPGYF+O0AH+bRo6w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sun, Jun 15, 2025 at 7:28 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 14 Jun 2025 at 05:22, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > > Thank you for updating the patch. I have one comment on the newly added test:
> > > >
> > > > +session "s3"
> > > > +step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
> > > > +step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
> > > > +step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
> > > > +step "s3_get_binary_changes" { SELECT count(data) FROM
> > > > pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
> > > > 'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
> > > > 0) = 73; }
> > > > +
> > > > +session "s4"
> > > > +step "s4b" { BEGIN; }
> > > > +step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
> > > > +step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
> > > > +step "s4c" { COMMIT; }
> > > > +step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
> > > > +
> > > > +session "s5"
> > > > +step "s5b" { BEGIN; }
> > > > +step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
> > > > +step "s5c" { COMMIT; }
> > > >
> > > > I think we don't necessarily need to add sessions "s4" and "s5". Let's
> > > > reuse "s1" and "s2" instead of adding them. I've attached a patch to
> > > > change that.
> > >
> > > Thanks, this is better.
> > > In the case of PG13, I have slightly changed the test to do the insert
> > > after the commit, because queuing invalidation messages into the
> > > reorder buffer queue is not supported in PG13. This limitation is due
> > > to the absence of support for REORDER_BUFFER_CHANGE_INVALIDATION, which
> > > is already present in >= PG14.
> > > The attached v14 patch has the corresponding changes.
> >
> > Hmm, but the modified test is essentially the same as what we already
> > have in invalidation_distribution.spec. I think that it's good to have
> > the same test case in all branches even though we get a different
> > result in v13, as we currently don't have a test to check such
> > differences.
> >
> > I've attached the updated patches accordingly and the commit messages.
>
> Thanks, the changes look good to me.
Pushed the fix (d87d07b7ad3). Thank you for working on this fix.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-20 05:00:00 |
Message-ID: | dbf561f7-465e-4086-adfa-733b9b9a34b3@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Hello Sawada-san,
17.06.2025 20:40, Masahiko Sawada wrote:
> Pushed the fix (d87d07b7ad3). Thank you for working on this fix.
As the buildfarm [1] shows, the test case added to invalidation_distribution
by 1230be12f fails with -DCLOBBER_CACHE_ALWAYS on REL_13_STABLE:
diff -U3 /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/expected/invalidation_distribution.out /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/output_iso/results/invalidation_distribution.out
--- /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/expected/invalidation_distribution.out 2025-06-17 10:24:24.382768613 +0200
+++ /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/output_iso/results/invalidation_distribution.out 2025-06-17 15:01:53.921913314 +0200
@@ -31,7 +31,7 @@
step s2_get_binary_changes: SELECT count(data) FROM pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub') WHERE get_byte(data, 0) = 73;
count
-----
- 0
+ 1
(1 row)
?column?
I could reproduce this locally and also checked that the test passes on
REL_14_STABLE and master.
Could you look at this, please?
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-06-17%2008%3A24%3A00
Best regards,
Alexander
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-20 06:04:26 |
Message-ID: | CALDaNm0=KzQ8TTwTwSgZgvYwccWJU9rpFwTZYNFE1zAFLtSw3g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, 20 Jun 2025 at 10:30, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> Hello Sawada-san,
>
> 17.06.2025 20:40, Masahiko Sawada wrote:
>
> Pushed the fix (d87d07b7ad3). Thank you for working on this fix.
>
>
> As the buildfarm [1] shows, the test case added to invalidation_distribution
> by 1230be12f fails with -DCLOBBER_CACHE_ALWAYS on REL_13_STABLE:
> diff -U3 /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/expected/invalidation_distribution.out /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/output_iso/results/invalidation_distribution.out
> --- /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/expected/invalidation_distribution.out 2025-06-17 10:24:24.382768613 +0200
> +++ /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/output_iso/results/invalidation_distribution.out 2025-06-17 15:01:53.921913314 +0200
> @@ -31,7 +31,7 @@
> step s2_get_binary_changes: SELECT count(data) FROM pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub') WHERE get_byte(data, 0) = 73;
> count
> -----
> - 0
> + 1
> (1 row)
>
> ?column?
>
> I could reproduce this locally and also checked that the test passes on
> REL_14_STABLE and master.
>
> Could you look at this, please?
Thanks for reporting this; we will analyse it and provide a fix.
Regards,
Vignesh
From: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-20 07:48:25 |
Message-ID: | a52f4f46-629c-4a41-813f-b8a369037f3c@deepbluecap.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Many thanks to all of you for working on this.
Best wishes, Duncan.
>>> I've attached the updated patches accordingly and the commit messages.
>>
>> Thanks, the changes look good to me.
>
> Pushed the fix (d87d07b7ad3). Thank you for working on this fix.
>
> Regards,
>
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, 'vignesh C' <vignesh21(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-20 08:20:50 |
Message-ID: | OSCPR01MB14966F91DBCAEA80EDEE67842F57CA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear hackers,
I've analyzed the issue, and this can always happen when -DCLOBBER_CACHE_ALWAYS is
set on PG13. My suggestion is to remove the latter part of the test on this branch.
In the failed workload, we tested the case where one long transaction inserts a
tuple after a concurrent transaction does ALTER PUBLICATION ADD TABLE.
On PG14+ the change can be published, while on PG13 it cannot be replicated because
the distributed inval messages cannot be executed during the transaction. That's
why we expected no rows to be output.
So what happens if -DCLOBBER_CACHE_ALWAYS is set for this workload? The relsync
cache can be discarded very frequently, and the backend process always reads the
system catalogs when it processes changes. This means the decoder can recognize
that pg_publication has been updated, and it can publish changes based on the
altered publication information. This behavior is the opposite of the normal one.
To fix the test failure, I suggest just removing the case. The insert-after-commit
case has already been tested by the earlier part of this file, so there is no need
to test it again.
The attached file does so.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-oversight-by-1230be12.patch | application/octet-stream | 3.3 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-21 00:17:32 |
Message-ID: | CAD21AoC+H0m1CORbV9vcnAEZ+w3ktaNPB0RWxHgGj7rKBzzKGg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, Jun 20, 2025 at 5:21 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> I've analyzed the issue, and it can always happen when -DCLOBBER_CACHE_ALWAYS is
> set on PG13. My suggestion is to remove the latter part of the test on this branch.
>
> In the failed workload, we tested the case where one long transaction inserts a
> tuple after a concurrent transaction does ALTER PUBLICATION ADD TABLE.
> On PG14+ the change can be published, but on PG13 it cannot be replicated, because
> the distributed invalidation messages cannot be executed during the transaction.
> That's why we expected no rows to be output.
>
> So what happens if -DCLOBBER_CACHE_ALWAYS is set for this workload? The relsync
> cache can be discarded very frequently, and the backend process always reads the
> system catalogs when processing changes. This means the decoder can recognize that
> pg_publication has been updated, and it can publish changes based on the altered
> publication information. This behavior is the opposite of the normal case.
>
> To fix the test failure, I suggest simply removing that case. The
> insert-after-commit case is already covered by the earlier part of this file,
> so there is no need to test the others. The attached patch does exactly that.
Thank you for providing the patch. I'll look at the issue and the
patch in depth, but it sounds like a reasonable solution.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-22 02:57:49 |
Message-ID: | CALDaNm1Fx+zzudTi1aoE9nFef_p4H-qbOcvF9_qRanNLtwU-tA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, 20 Jun 2025 at 13:51, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> I've analyzed the issue, and it can always happen when -DCLOBBER_CACHE_ALWAYS is
> set on PG13. My suggestion is to remove the latter part of the test on this branch.
>
> In the failed workload, we tested the case where one long transaction inserts a
> tuple after a concurrent transaction does ALTER PUBLICATION ADD TABLE.
> On PG14+ the change can be published, but on PG13 it cannot be replicated, because
> the distributed invalidation messages cannot be executed during the transaction.
> That's why we expected no rows to be output.
>
> So what happens if -DCLOBBER_CACHE_ALWAYS is set for this workload? The relsync
> cache can be discarded very frequently, and the backend process always reads the
> system catalogs when processing changes. This means the decoder can recognize that
> pg_publication has been updated, and it can publish changes based on the altered
> publication information. This behavior is the opposite of the normal case.
I agree with your analysis.
> To fix the test failure, I suggest simply removing that case. The
> insert-after-commit case is already covered by the earlier part of this file,
> so there is no need to test the others.
Alternatively, I was wondering whether it would be possible to run this
test conditionally, only when CLOBBER_CACHE_ALWAYS is not defined, but I was
not sure whether that is easy to do or worth the effort for the PG13
branch. I'm OK with the proposed change.
Regards,
Vignesh
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-23 02:52:45 |
Message-ID: | CAA4eK1JjGrMsLoMdCynBLkPW5oTPsgpZKyJTP1g6YV+mmV-zWw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Sun, Jun 22, 2025 at 8:28 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 20 Jun 2025 at 13:51, Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
>
> > To fix the test failure, I suggest simply removing that case. The
> > insert-after-commit case is already covered by the earlier part of this
> > file, so there is no need to test the others.
>
> Alternatively, I was wondering whether it would be possible to run this
> test conditionally, only when CLOBBER_CACHE_ALWAYS is not defined, but I was
> not sure whether that is easy to do or worth the effort for the PG13
> branch. I'm OK with the proposed change.
>
I prefer to change the test because, if the above analysis is correct,
it indicates that the test has a cache flush hazard. It would be
better to make the test robust instead of working around it.
--
With Regards,
Amit Kapila.
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-23 07:14:45 |
Message-ID: | CAD21AoD736ECsU-FR5x+hp87vuZzOc1N3sknvSwKuW=XEz0Jgw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 23, 2025 at 11:52 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sun, Jun 22, 2025 at 8:28 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, 20 Jun 2025 at 13:51, Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> >
> > > To fix the test failure, I suggest simply removing that case. The
> > > insert-after-commit case is already covered by the earlier part of this
> > > file, so there is no need to test the others.
> >
> > Alternatively, I was wondering whether it would be possible to run this
> > test conditionally, only when CLOBBER_CACHE_ALWAYS is not defined, but I
> > was not sure whether that is easy to do or worth the effort for the PG13
> > branch. I'm OK with the proposed change.
> >
>
> I prefer to change the test because, if the above analysis is correct,
> it indicates that the test has a cache flush hazard. It would be
> better to make the test robust instead of working around it.
The analysis shared by Kuroda-san matches my understanding. I'd like
to avoid removing this test case just because it doesn't pass with
CLOBBER_CACHE_ALWAYS.
One solution is to have two expected-output files to cover both cases.
We do a similar thing for the plpgsql_cache.sql test case. What do you
think?
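For reference, the regression harness (pg_regress, whose output comparison the
isolation tester also uses) treats a test as passing if its output matches any
expected file, including alternatives suffixed _1, _2, and so on. Abridged from
the diff above, the two variants would differ only in the count:

expected/invalidation_distribution.out (normal builds):
 count
 -----
 0
(1 row)

expected/invalidation_distribution_1.out (CLOBBER_CACHE_ALWAYS builds):
 count
 -----
 1
(1 row)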
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-23 12:03:33 |
Message-ID: | OSCPR01MB14966CD70FA2144C1F972647BF579A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Dear Sawada-san,
> One solution is to have two expected-output files to cover both cases.
> We do a similar thing for the plpgsql_cache.sql test case. What do you
> think?
Personally, I don't like that approach, because it is easy to forget to update
the XXX_1.out file, but it is not a strong opinion.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-23 14:05:35 |
Message-ID: | CAD21AoAf6-j0vPTpzUj-Q2ySPo-jW11+-pN3kmeSBypgwUZaSg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 23, 2025 at 9:03 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > One solution is to have two expected-output files to cover both cases.
> > We do a similar thing for the plpgsql_cache.sql test case. What do you
> > think?
>
> Personally, I don't like that approach, because it is easy to forget to
> update the XXX_1.out file, but it is not a strong opinion.
I think that we have only a few releases left for v13, so there would
likely not be many cases where we need to update the _1.out file. But I'm
open to other ideas. Do you prefer removing the test from v13? I'm not sure
that the risk of forgetting to update a _1.out file is a good reason to
remove tests.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 01:24:03 |
Message-ID: | CAD21AoACfSCfWv=Jtq2AjDyAKspkhNHN7ePvXyASMphYuDJPuw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Jun 23, 2025 at 11:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jun 23, 2025 at 9:03 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Sawada-san,
> >
> > > One solution is to have two expected-output files to cover both cases.
> > > We do a similar thing for the plpgsql_cache.sql test case. What do you
> > > think?
> >
> > Personally, I don't like that approach, because it is easy to forget to
> > update the XXX_1.out file, but it is not a strong opinion.
>
> I think that we have only a few releases left for v13, so there would
> likely not be many cases where we need to update the _1.out file. But I'm
> open to other ideas. Do you prefer removing the test from v13? I'm not sure
> that the risk of forgetting to update a _1.out file is a good reason to
> remove tests.
>
I've attached a patch for that idea for discussion. I considered moving
the new cache-behavior-dependent test to another test file to minimize the
maintenance effort, but didn't do that at this stage, as the test file has
only a few tests.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Fix-invalidation-distribution-test-failure-in-log.patch | application/octet-stream | 3.9 KB |
From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 03:51:06 |
Message-ID: | CAA4eK1KiKGRVgpfvmYv3r77mRmW_DGhtvMWibYTeNEwu6dngEQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 24, 2025 at 6:54 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jun 23, 2025 at 11:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jun 23, 2025 at 9:03 PM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > Dear Sawada-san,
> > >
> > > > One solution is to have two expected-output files to cover both cases.
> > > > We do a similar thing for the plpgsql_cache.sql test case. What do you
> > > > think?
> > >
> > > Personally, I don't like that approach, because it is easy to forget to
> > > update the XXX_1.out file, but it is not a strong opinion.
> >
> > I think that we have only a few releases left for v13, so there would
> > likely not be many cases where we need to update the _1.out file. But I'm
> > open to other ideas. Do you prefer removing the test from v13? I'm not sure
> > that the risk of forgetting to update a _1.out file is a good reason to
> > remove tests.
> >
>
> I've attached a patch for that idea for discussion. I considered moving
> the new cache-behavior-dependent test to another test file to minimize the
> maintenance effort, but didn't do that at this stage, as the test file has
> only a few tests.
>
Your proposal sounds good to me.
--
With Regards,
Amit Kapila.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 04:18:45 |
Message-ID: | 857722.1750738725@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> writes:
> Pushed the fix (d87d07b7ad3). Thank you for working on this fix.
There is something not right about the v13 version of this patch.
BF member trilobite, which builds with -DCLOBBER_CACHE_ALWAYS,
is showing this failure [1]:
diff -U3 /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/expected/invalidation_distribution.out /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/output_iso/results/invalidation_distribution.out
--- /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/expected/invalidation_distribution.out 2025-06-17 10:24:24.382768613 +0200
+++ /home/buildfarm/trilobite/buildroot/REL_13_STABLE/pgsql.build/contrib/test_decoding/output_iso/results/invalidation_distribution.out 2025-06-17 15:01:53.921913314 +0200
@@ -31,7 +31,7 @@
step s2_get_binary_changes: SELECT count(data) FROM pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub') WHERE get_byte(data, 0) = 73;
count
-----
- 0
+ 1
(1 row)
I can reproduce that locally if I add -DCLOBBER_CACHE_ALWAYS.
Have not looked for the cause.
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-06-17%2008%3A24%3A00
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 04:22:53 |
Message-ID: | CAD21AoC7=dVTbE0w_9_7P15e=zmCuw=_Of9JR0=S-hyW+57bEQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 24, 2025 at 1:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> writes:
> > Pushed the fix (d87d07b7ad3). Thank you for working on this fix.
>
> There is something not right about the v13 version of this patch.
> BF member trilobite, which builds with -DCLOBBER_CACHE_ALWAYS,
> is showing this failure [1]:
Yes. We've discussed how to fix this test failure, and I proposed a patch
for it [1]. If there are no comments or objections, I'm going to push it
to v13.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 04:34:26 |
Message-ID: | 859850.1750739666@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> writes:
> On Tue, Jun 24, 2025 at 1:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There is something not right about the v13 version of this patch.
> Yes. We've discussed how to fix this test failure, and I proposed a patch
> for it [1]. If there are no comments or objections, I'm going to push it
> to v13.
Oh, my apologies for not having kept up on that thread!
Both of the proposed patches seem like band-aids, but I'm not
sure that it's worth working harder on a branch that has only
a few months to live. I'm okay with either solution.
regards, tom lane
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 04:58:43 |
Message-ID: | aFowg2mxXDrrh8CG@paquier.xyz |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 24, 2025 at 10:24:03AM +0900, Masahiko Sawada wrote:
> I've added the patch for that idea for discussion. I considered moving
> the new cache-behavior-dependent test to another test file to minimize
> the maintenance effort but didn't do that at this stage as the test
> file has only a few tests.
The spec test file only includes two short permutations, making the
generated output really short. A secondary output file sounds fine to
me as long as you document in the spec file the reason why the file is
around, and your patch does that.
+# This file contains cache-behavior-dependent test case. Their reults are
[..]
+# two expected-output files to cvoer both cases.
Two typos in three lines of comments: s/reults/results/ and
s/cvoer/cover/.
--
Michael
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-24 05:14:41 |
Message-ID: | CAD21AoBfOJMMA_Xw6BKyWLGuhhZ6NNm+qBt-6P4gnmX3Xj_i5w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 24, 2025 at 1:59 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Jun 24, 2025 at 10:24:03AM +0900, Masahiko Sawada wrote:
> > I've attached a patch for that idea for discussion. I considered moving
> > the new cache-behavior-dependent test to another test file to minimize
> > the maintenance effort, but didn't do that at this stage, as the test
> > file has only a few tests.
>
> The spec test file only includes two short permutations, making the
> generated output really short. A secondary output file sounds fine to
> me as long as you document in the spec file the reason why the file is
> around, and your patch does that.
>
> +# This file contains cache-behavior-dependent test case. Their reults are
> [..]
> +# two expected-output files to cvoer both cases.
>
> Two typos in three lines of comments: s/reults/results/ and
> s/cvoer/cover/.
Thank you for reviewing the patch!
I've attached the updated patch. I'm going to push it barring further comments.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Fix-cache-dependent-test-failures-in-logical-deco.patch | application/octet-stream | 4.5 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-07-02 06:32:09 |
Message-ID: | CAD21AoBx6bfyeC93u6pPdKUfu3ir63JF_pZaO3Z9M-8wuisPZw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Jun 24, 2025 at 2:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 24, 2025 at 1:59 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Tue, Jun 24, 2025 at 10:24:03AM +0900, Masahiko Sawada wrote:
> > > I've attached a patch for that idea for discussion. I considered moving
> > > the new cache-behavior-dependent test to another test file to minimize
> > > the maintenance effort, but didn't do that at this stage, as the test
> > > file has only a few tests.
> >
> > The spec test file only includes two short permutations, making the
> > generated output really short. A secondary output file sounds fine to
> > me as long as you document in the spec file the reason why the file is
> > around, and your patch does that.
> >
> > +# This file contains cache-behavior-dependent test case. Their reults are
> > [..]
> > +# two expected-output files to cvoer both cases.
> >
> > Two typos in three lines of comments: s/reults/results/ and
> > s/cvoer/cover/.
>
> Thank you for reviewing the patch!
>
> I've attached the updated patch. I'm going to push it barring further comments.
>
I forgot to record this here: I pushed the fix [1] and confirmed that
trilobite is back to green.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com