Skip to content

Subscribe from the log offset that has been ttl don't throw OffsetOutOfRangeException #521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 of 2 tasks
luoyuxia opened this issue Mar 3, 2025 · 8 comments · May be fixed by #737
Open
1 of 2 tasks

Subscribe from the log offset that has been ttl don't throw OffsetOutOfRangeException #521

luoyuxia opened this issue Mar 3, 2025 · 8 comments · May be fixed by #737
Milestone

Comments

@luoyuxia
Copy link
Collaborator

luoyuxia commented Mar 3, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

main (development)

Please describe the bug 🐞

If subscribe from the log offset that has been delete by remote log ttl, I expect it to throw OffsetOutOfRangeException, but it doesn't

Can be reproduce by the following IT:

public class FlussLogITCase {
    @RegisterExtension
    public static final FlussClusterExtension FLUSS_CLUSTER_EXTENSION =
            FlussClusterExtension.builder()
                    .setNumOfTabletServers(3)
                    .setClusterConf(initConfig())
                    .build();

    private static Configuration initConfig() {
        Configuration conf = new Configuration();
        conf.set(ConfigOptions.LOG_SEGMENT_FILE_SIZE, MemorySize.parse("1kb"));
        conf.set(ConfigOptions.REMOTE_LOG_TASK_INTERVAL_DURATION, Duration.ofSeconds(1));
        return conf;
    }

    protected Connection conn;
    protected Admin admin;
    protected Configuration clientConf;

    @BeforeEach
    protected void setup() throws Exception {
        clientConf = FLUSS_CLUSTER_EXTENSION.getClientConfig();
        conn = ConnectionFactory.createConnection(clientConf);
        admin = conn.getAdmin();
    }

    @Test
    void testSubscribeTTLLog() throws Exception {
        TablePath tablePath = TablePath.of("test_db", "testSubscribeTTLLog");
        TableDescriptor tableDescriptor =
                TableDescriptor.builder()
                        .schema(
                                Schema.newBuilder()
                                        .column("a", DataTypes.INT())
                                        .column("b", DataTypes.STRING())
                                        .build())
                        // ttl eagerly
                        .property(ConfigOptions.TABLE_LOG_TTL, Duration.ofSeconds(1))
                        .distributedBy(1)
                        .build();
        createTable(tablePath, tableDescriptor, false);
        try (Table table = conn.getTable(tablePath)) {
            AppendWriter appendWriter = table.newAppend().createWriter();
            for (int n = 0; n < 10; n++) {
                for (int i = 0; i < 10; i++) {
                    appendWriter.append(row(1, "a"));
                }
                appendWriter.flush();
            }

            // wait 5s for log is ttl
            Thread.sleep(5_000);

            try (LogScanner logScanner = table.newScan().createLogScanner()) {
                logScanner.subscribe(0, 2);
                while (true) {
                    ScanRecords scanRecords = logScanner.poll(Duration.ofSeconds(1));
                    scanRecords.iterator().forEachRemaining(System.out::println);
                }
            }
        }
    }

    protected long createTable(
            TablePath tablePath, TableDescriptor tableDescriptor, boolean ignoreIfExists)
            throws Exception {
        admin.createDatabase(tablePath.getDatabaseName(), DatabaseDescriptor.EMPTY, ignoreIfExists)
                .get();
        admin.createTable(tablePath, tableDescriptor, ignoreIfExists).get();
        return admin.getTableInfo(tablePath).get().getTableId();
    }
}

Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@luoyuxia luoyuxia added this to the v0.7 milestone Mar 10, 2025
@gyang94
Copy link
Contributor

gyang94 commented Mar 10, 2025

Hi @luoyuxia , I am interested in this bug, could you assign it to me?
I tried to run this test locally, it will keep fetching without any data printed out. I suppose somewhere didn't handle the offset checking after data was expired.

@wuchong
Copy link
Member

wuchong commented Mar 11, 2025

@gyang94, do you already have an idea for how to address this? If so, we can discuss the solution in the issue thread first.

@gyang94
Copy link
Contributor

gyang94 commented Mar 11, 2025

Hi @wuchong @luoyuxia, I tried to debug locally and there are some findings:

  1. In the server side, when Replica read log, there indeed check the offset and throw out the LogOffsetOutOfRangeException But this exception was handled in the ReplicaManager.handleFetchOutOfRangeException(), which put the error code and message in the response payload, with records field is null. PbFetchLogRespForBucket(errorCode=19, errorMessage="xxx", records=null).
Image

In TabletService.fetchLog(FetchLogRequest request) method, the future complete successfully without any exception with this "error" response paylod.
response.complete(RpcMessageUtils.makeFetchLogResponse(fetchResponseMap))

  1. Then, in the view of client side (LogFetcher), the server successfully returns the log without any exception. Thus in the LogFetcher.handleFetchLogResponse() method, the Throwable e is null. The client doesn't see any exception so that it will normally handle the response. It finds the records field is null so that nothing was added to logFetchBuffer, then nothing could be polled from logFetchBuffer.

Thus we have two ways to fix it: one is from server side, another is from client side. It depends on our design.

Proposed solution 1: in the server side, we won't wrap the exception into a response payload, just expose this exception and return to client. e.g. response.completeExceptionally(e).

Proposed solution 2: in the client side, when handle response, it needs to check the payload to see whether there is an error. If there is, throw exception based on the error code.

@wuchong
Copy link
Member

wuchong commented Mar 12, 2025

@gyang94 Thanks for the detailed investigation! I prefer option#2, as the bucket-level response/error is designed to have fine-grained handling for such case. In addition, we should only throw non RetriableException exception, and just log errors for RetriableExceptions. Besides, we should also throw non RetriableException exceptions for the top error of fetch response.

What do you think @luoyuxia ?

@luoyuxia
Copy link
Collaborator Author

I aslo perfer option#2. It's nature that client should check the per-bucket error from response.

@gyang94
Copy link
Contributor

gyang94 commented Mar 14, 2025

Hi @wuchong and @luoyuxia , I prefer option#2 too. But I found another problem we need to consider in solution#2:

Currently in the LogFetcher, the handleFetchLogResponse function are called in CompletedFuture.whenComplete() method, it is not in the main thread.

gateway.fetchLog(fetchLogRequest)
.whenComplete((fetchLogResponse, e) ->
handleFetchLogResponse(xxx)
});

If we check responses and throw exceptions in handleFetchLogResponse() function, these exceptions won't affect the the main poll() thread, which keeps trying to poll records from logFetchBuffer. No exception will be thrown in the main thread. It will continue loop the polling with empty record returned. Like this:

Image

Proposed Solution:
To throw exceptions in the main thread, we may need to consider to continue put the "error" log response into LogFetherBuffer and handle it after polled out from the buffer. Then it could break the polling loop, and the whole polling will end with exceptions.

Image

@gyang94
Copy link
Contributor

gyang94 commented Mar 21, 2025

@wuchong @luoyuxia Hi
Could we quickly discuss the question above about solution#2 when you have time?
If we can finalize this solution then I am going to implement it.

@wuchong
Copy link
Member

wuchong commented Mar 22, 2025

@gyang94 I think we can just remove the !MemoryLogRecords.EMPTY.equals(logRecords) condition. I think this condition is added by mistake in #242, could you confirm that @loserwang1024 ?

The CompletedFetch already extracts the bucket-level error from the response. And LogFetchCollector#handleInitializeErrors will check the error to throw exceptions.

} else if (error == Errors.LOG_OFFSET_OUT_OF_RANGE_EXCEPTION) {

Besides, the exception thrown here shouldn't be an ApiException, because ApiException doesn't have exception stack which is hard to debug. We can throw IllegalStateException here.

Please also add test (UT or IT) to reproduce this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants