
Commit 108a423

Dave Chinner authored and djwong committed
xfs: Lower CIL flush limit for large logs
The current CIL size aggregation limit is 1/8th the log size. This means for large logs we might be aggregating at least 250MB of dirty objects in memory before the CIL is flushed to the journal. With CIL shadow buffers sitting around, this means the CIL is often consuming >500MB of temporary memory that is all allocated under GFP_NOFS conditions.

Flushing the CIL can take some time to do if there is other IO ongoing, and can introduce substantial log force latency by itself. It also pins the memory until the objects are in the AIL and can be written back and reclaimed by shrinkers. Hence this threshold also tends to determine the minimum amount of memory XFS can operate in under heavy modification without triggering the OOM killer.

Modify the CIL space limit to prevent such huge amounts of pinned metadata from aggregating. We can have 2MB of log IO in flight at once, so limit aggregation to 16x this size. This threshold was chosen as it has little impact on performance (on 16-way fsmark) or log traffic but pins a lot less memory on large logs, especially under heavy memory pressure. An aggregation limit of 8x had 5-10% performance degradation and a 50% increase in log throughput for the same workload, so clearly that was too small for highly concurrent workloads on large logs.

This was found via trace analysis of AIL behaviour, e.g. insertion from a single CIL flush:

xfs_ail_insert: old lsn 0/0 new lsn 1/3033090 type XFS_LI_INODE flags IN_AIL

$ grep xfs_ail_insert /mnt/scratch/s.t |grep "new lsn 1/3033090" |wc -l
1721823
$

So there were 1.7 million objects inserted into the AIL from this CIL checkpoint, the first at 2323.392108, the last at 2325.667566 which was the end of the trace (i.e. it hadn't finished). Clearly a major problem.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Allison Collins <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
1 parent b843299 commit 108a423

1 file changed: +23 -6 lines changed

fs/xfs/xfs_log_priv.h

Lines changed: 23 additions & 6 deletions
@@ -316,13 +316,30 @@ struct xfs_cil {
  * tries to keep 25% of the log free, so we need to keep below that limit or we
  * risk running out of free log space to start any new transactions.
  *
- * In order to keep background CIL push efficient, we will set a lower
- * threshold at which background pushing is attempted without blocking current
- * transaction commits. A separate, higher bound defines when CIL pushes are
- * enforced to ensure we stay within our maximum checkpoint size bounds.
- * threshold, yet give us plenty of space for aggregation on large logs.
+ * In order to keep background CIL push efficient, we only need to ensure the
+ * CIL is large enough to maintain sufficient in-memory relogging to avoid
+ * repeated physical writes of frequently modified metadata. If we allow the CIL
+ * to grow to a substantial fraction of the log, then we may be pinning hundreds
+ * of megabytes of metadata in memory until the CIL flushes. This can cause
+ * issues when we are running low on memory - pinned memory cannot be reclaimed,
+ * and the CIL consumes a lot of memory. Hence we need to set an upper physical
+ * size limit for the CIL that limits the maximum amount of memory pinned by the
+ * CIL but does not limit performance by reducing relogging efficiency
+ * significantly.
+ *
+ * As such, the CIL push threshold ends up being the smaller of two thresholds:
+ * - a threshold large enough that it allows CIL to be pushed and progress to be
+ *   made without excessive blocking of incoming transaction commits. This is
+ *   defined to be 12.5% of the log space - half the 25% push threshold of the
+ *   AIL.
+ * - small enough that it doesn't pin excessive amounts of memory but maintains
+ *   close to peak relogging efficiency. This is defined to be 16x the iclog
+ *   buffer window (32MB) as measurements have shown this to be roughly the
+ *   point of diminishing performance increases under highly concurrent
+ *   modification workloads.
  */
-#define XLOG_CIL_SPACE_LIMIT(log) (log->l_logsize >> 3)
+#define XLOG_CIL_SPACE_LIMIT(log) \
+ min_t(int, (log)->l_logsize >> 3, BBTOB(XLOG_TOTAL_REC_SHIFT(log)) << 4)
 
 /*
  * ticket grant locks, queues and accounting have their own cachlines
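
The two-threshold rule described in the new comment can be sketched in userspace C to see where the cap starts to bite. This is an illustrative sketch only, not the kernel macro: `cil_space_limit`, `log_bytes` and `iclog_window_bytes` are hypothetical names, the 2MB window is the figure quoted in the commit message, and the kernel itself computes the window via `BBTOB(XLOG_TOTAL_REC_SHIFT(log))` and takes the minimum with `min_t(int, ...)`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2MB of log IO can be in flight at once (from the commit message). */
#define ICLOG_WINDOW_BYTES ((int64_t)2 * 1024 * 1024)

/*
 * Hypothetical helper mirroring the shape of the new limit:
 * the smaller of 1/8th of the log and 16x the iclog window.
 */
static int64_t cil_space_limit(int64_t log_bytes, int64_t iclog_window_bytes)
{
    assert(log_bytes > 0 && iclog_window_bytes > 0);

    int64_t by_log = log_bytes >> 3;          /* 12.5% of the log */
    int64_t by_mem = iclog_window_bytes << 4; /* 16x the in-flight window */

    return by_log < by_mem ? by_log : by_mem;
}
```

Under these assumptions a 64MB log keeps its old 8MB limit, a 256MB log sits exactly at the crossover (32MB either way), and a 2GB log drops from 256MB to 32MB. Logs of 256MB or less are therefore unaffected by the change.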
