
[DataCap Refresh] <4th> Review of <ZCFIL+> #348


Open
zcfil opened this issue Apr 26, 2025 · 4 comments
Labels
Awaiting RKH Refresh request has been verified by Public, Watchdog, and Governance - now awaiting release of DC Refresh Applications received from existing Allocators for a refresh of DataCap allowance

Comments

@zcfil

zcfil commented Apr 26, 2025

Basic info

  1. Type of allocator: manual
  1. Paste your JSON number: (v5 Notary Allocator Application:ZCFIL notary-governance#1009)

  2. Allocator verification: yes

  1. Allocator Application
  2. Compliance Report
  1. Previous reviews

Current allocation distribution

Client name | DC granted
FengwoExtraordinary | 0.5 PiB
Guangdong Zongheng Dapeng Innovation Technology Co., Ltd | 2.5 PiB
Allen Institute | 2 PiB

I. FengwoExtraordinary

  • DC requested: 15 PiB
  • DC granted so far: 7.2 PiB

II. Dataset Completion

https://drive.google.com/drive/folders/1cTLYO-9vumVrpT0aY0QQAGl5E6N_boT9?usp=sharing

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes
IV. How many replicas has the client declared vs how many have been made so far:

10 vs 12
The situation has been discussed with the client, and reports are generated on an ongoing basis to monitor it; there have been no significant further increases.

I. Allen Institute

  • DC requested: 9 PiB
  • DC granted so far: 7.7 PiB

II. Dataset Completion

aws s3 ls --no-sign-request s3://allen-mouse-brain-atlas/

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes
IV. How many replicas has the client declared vs how many have been made so far:

10 vs 11


I. Guangdong Zongheng Dapeng Innovation Technology Co., Ltd

  • DC requested: 10 PiB
  • DC granted so far: 2.5 PiB

II. Dataset Completion

https://www.alipan.com/t/fHndX45PNnJn9WJpHBNf

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes
IV. How many replicas has the client declared vs how many have been made so far:

10 vs 10

The retrieval rate is not shown because it keeps dropping, at times to 0, due to the IPNI failure.

Allocation summary

  1. Notes from the Allocator

Issues identified in the previous round have been addressed accordingly

Retrieval rates are currently consistently low, or even 0, due to issues on the Spark team's side and with IPNI. I would like to hear how we are going to handle this situation.

  1. Did the allocator report up to date any issues or discrepancies that occurred during the application processing?

Yes, it's well documented, and we'll be the first to raise issues as they arise in our observations.

  1. What steps have been taken to minimize unfair or risky practices in the allocation process?

Regularly generate reports, retrieve sector data, etc.

  1. How did these distributions add value to the Filecoin ecosystem?

All are data types related to sustainable human development

  1. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application

Yes

  1. Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.

Yes

@Amin-Foundation added the labels "Refresh" (applications received from existing Allocators for a refresh of DataCap allowance) and "Diligence Audit in Process" (Governance team is reviewing the DataCap distributions and verifying the deals were within standards) on Apr 28, 2025
@filecoin-watchdog added the labels "Awaiting Community/Watchdog Comment" (DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App) and "Awaiting Response from Allocator" (a question was raised in the issue that requires comment before moving forward), and removed "Diligence Audit in Process" and "Awaiting Community/Watchdog Comment", on Apr 28, 2025
@filecoin-watchdog
Collaborator

Well-structured questionnaire for client onboarding, [link].
Evidence of basic due diligence in client/entity verification, including the collection of business licenses and user identification on all clients.
While clients pay the allocator a deposit of 100 FIL, there are no clear, deterministic rules for assessing if and how this deposit might be reduced or slashed. Consequently, there are no clear guidelines on how funds will be deducted from the deposit or what those funds will be used for afterward. The absence of such deterministic rules makes it impossible to clearly assess the conditions under which a deposit should be subject to penalties.

[DataCap Application] Agriculture – Fengwo Extraordinary Agricultural # 6

  • After the last review, the allocator confirmed the locations of potential VPN-using nodes.
  • The list of service providers is updated.
  • Historical reports show that retrievability has not improved significantly; current results are distorted because the ongoing IPNI technical issues are not yet resolved.
  • The number of replicas has steadily increased to the target of 10 without adding data to nodes that already have 11 or 12 copies.
  • The client declared a single data size of 1.5 PiB, but currently only 783 TiB of unique data is on the network—about 50% of the expected amount.
  • No explanation is provided regarding the database redundancy (raised by the governance team in the last review); the allocator did not seek clarification from the client.
  • The dataset seems excessively large; users should consider reducing the number of replicas and storing it under a single allocator. It remains unclear whether data overlaps, what exactly is stored (there is no index file), or how community members can retrieve it (issues also noted by the governance team in the previous review).
  • No HTTP endpoint was found for test downloads (this may be due to the IPNI issue and should be checked again once it’s resolved).

[DataCap Application] AWS Open Data – Aind_mouse # 3

  • The allocator confirmed the locations of VPN-enabled nodes.
  • Data replication to 11 copies has stopped, though distribution toward the target replica count is slowly improving.
  • The list of service providers is updated.
  • Historical reports indicate that retrievability has not improved significantly; results are again distorted because the IPNI technical issues are not yet resolved.
  • The dataset is currently receiving DataCap from four other allocators, and the same user is also receiving DataCap for the same set from another allocator—raising the same questions as with the previous client.
    • No explanation is provided regarding database redundancy (raised by the governance team in the last review); the allocator did not seek clarification from the client.
    • If the dataset seems large, users should consider reducing the number of replicas and storing them under a single allocator. It remains unclear whether data overlaps, what exactly is stored (there is no index file), or how community members can retrieve it (issues noted by the governance team previously).

[DataCap Application] <Guangdong Zongheng Dapeng Innovation Technology Co., Ltd> - <UAV> # 27

  • Evidence shows proper identification of VPN-operating nodes and confirmation of their geographical locations.
  • A test piece retrieval was successfully performed.
  • The index file format is difficult to verify and use; it should be updated during sealing so users know exactly which data each piece contains.
  • There are retrievability issues, likely caused by the ongoing IPNI problems.
  • Data distribution across replicas is not ideal, though this may be explained by the current sealing process.

@filecoin-watchdog added the labels "Diligence Audit in Process" and "Awaiting Response from Allocator", and removed "Awaiting Response from Allocator", on Apr 28, 2025
@zcfil
Author

zcfil commented May 1, 2025

The client declared a single data size of 1.5 PiB, but currently only 783 TiB of unique data is on the network—about 50% of the expected amount.

DC requested: 15 PiB
DC granted so far: 7.2 PiB

I think it's also because only half of the datacap has been allocated so far
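A rough sanity check of this reasoning can be sketched as follows. It assumes binary units (1 PiB = 1024 TiB) and that the granted DataCap is spread evenly over the 10 declared replicas; neither assumption is stated in the thread, and the variable names are illustrative only.

```python
TIB_PER_PIB = 1024  # assumption: binary units

declared_unique_tib = 1.5 * TIB_PER_PIB  # single-copy size declared by the client
observed_unique_tib = 783                # unique data reported in the review
dc_requested_pib = 15
dc_granted_pib = 7.2
declared_replicas = 10                   # assumption: DataCap split evenly across replicas

share_of_dataset_stored = observed_unique_tib / declared_unique_tib
share_of_datacap_granted = dc_granted_pib / dc_requested_pib
# Unique data the granted DataCap could cover under the even-split assumption
expected_unique_tib = dc_granted_pib * TIB_PER_PIB / declared_replicas

print(f"unique data stored: {share_of_dataset_stored:.0%}")
print(f"DataCap granted:    {share_of_datacap_granted:.0%}")
print(f"expected unique data at this stage: {expected_unique_tib:.0f} TiB")
```

Under these assumptions, about 51% of the declared dataset is on the network while 48% of the requested DataCap has been granted, so the two figures are broadly in line.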

No explanation is provided regarding the database redundancy (raised by the governance team in the last review); the allocator did not seek clarification from the client.

I note that this was a recommendation from the second review. Missing it was an oversight on my part, as I only tracked the reminders from the third review; all items from the third review have been followed up.

Overall good compliance here, and summarizing areas to continue focusing on:

  • Accurate dataset size calculations, with minimal overhead or padding
  • Continue to reduce excessive duplicates across the network
  • Continue to increase geographic distribution to ensure data availability
  • Updating bookkeeping and records to maintain accuracy, such as updating SP lists
  • Increasing retrieval rates

To help support this increased compliance and data onboarding, we are requesting 5PiB of DataCap from RKH.

If the dataset seems large, users should consider reducing the number of replicas and storing them under a single allocator. It remains unclear whether data overlaps, what exactly is stored (there is no index file), or how community members can retrieve it (issues noted by the governance team previously).

stcloudlisa/Allocator-Pathway-Data#5
You can see that the application has been closed; the client currently retains only the applications under my allocation.

No HTTP endpoint was found for test downloads (this may be due to the IPNI issue and should be checked again once it’s resolved).

These issues will be focused on in the future, thanks again for the heads up!

While clients pay the allocator a deposit of 100 FIL, there are no clear, deterministic rules for assessing if and how this deposit might be reduced or slashed. Consequently, there are no clear guidelines on how funds will be deducted from the deposit or what those funds will be used for afterward. The absence of such deterministic rules makes it impossible to clearly assess the conditions under which a deposit should be subject to penalties.

I've also made some improvements to address the situation that has arisen with existing clients

  • For exceeding the declared number of copies: deduct 50 FIL and explain the situation; if the problem recurs, deduct all remaining FIL and stop triggering the remaining DataCap.
  • For CID sharing: deduct 100 FIL and stop triggering the remaining DataCap.
  • For a prolonged 0 retrieval rate (2+ weeks): deduct 10 FIL per SP, except in special cases such as the current IPNI failure. After the penalty, the client can pledge 100 FIL again to trigger subsequent DataCap.
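To make these rules easier to check deterministically, they could be written down along the following lines. This is only a sketch of one reading of the rules above: the class and method names are hypothetical, "pledge 100 FIL again" is interpreted as restoring the deposit to 100 FIL, and the re-pledge requirement is assumed to apply only after a retrieval penalty.

```python
from dataclasses import dataclass

DEPOSIT_FIL = 100.0


@dataclass
class ClientDeposit:
    """Hypothetical model of one client's 100 FIL deposit."""
    balance_fil: float = DEPOSIT_FIL
    replica_offences: int = 0
    allocations_halted: bool = False   # True means no further DataCap is triggered
    needs_repledge: bool = False       # assumption: set only by retrieval penalties

    def _deduct(self, amount_fil: float) -> float:
        # Never deduct more than what is left in the deposit.
        taken = min(amount_fil, self.balance_fil)
        self.balance_fil -= taken
        return taken

    def excess_replicas(self) -> float:
        """First offence costs 50 FIL; a repeat takes the rest and halts allocations."""
        self.replica_offences += 1
        if self.replica_offences == 1:
            return self._deduct(50)
        self.allocations_halted = True
        return self._deduct(self.balance_fil)

    def cid_sharing(self) -> float:
        """CID sharing costs 100 FIL and halts allocations."""
        self.allocations_halted = True
        return self._deduct(100)

    def zero_retrieval(self, sp_count: int, known_outage: bool = False) -> float:
        """10 FIL per SP with 0 retrieval for 2+ weeks; waived in special cases."""
        if known_outage:
            return 0.0
        self.needs_repledge = True
        return self._deduct(10 * sp_count)

    def top_up(self) -> None:
        """Re-pledge so the deposit is back at 100 FIL (interpretation, see above)."""
        self.balance_fil = DEPOSIT_FIL
        self.needs_repledge = False


deposit = ClientDeposit()
deposit.excess_replicas()            # first offence: 50 FIL
deposit.zero_retrieval(sp_count=2)   # two SPs at 0 retrieval: 20 FIL
print(deposit.balance_fil, deposit.allocations_halted, deposit.needs_repledge)
```

What happens to the deducted funds is not covered by the rules above, so the sketch only tracks the remaining balance.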

The new rules will also be posted in the comments of the client applications.

@filecoin-watchdog Thank you for your detailed review and I hope to see more improvements in subsequent allocations!

@filecoin-watchdog
Collaborator

@zcfil
Thank you. I have nothing more to add.
@Amin-Foundation

@Amin-Foundation
Collaborator

Transparency & Community Engagement: The allocator demonstrates strong collaboration and communication practices. Their application is well-maintained and provides transparency into their allocation process. Overall, the allocator is responsive, diligent, and appears committed to the long-term development of their solution within the Filecoin+ ecosystem.

Data Distribution: The allocator has shown notable improvement in geographical distribution, though the majority of activity remains concentrated in Asia. Distribution across Storage Providers (SPs) is balanced, with no single SP receiving a disproportionate share of allocation. However, we observed that a single client (FengwoExtraordinary) received a significant portion of recent allocations; continued monitoring is advised to ensure ecosystem diversity is maintained.

Replication & Redundancy: Data replication is compliant with the minimum threshold required; the number of replicas remains on the high side with 11+ instances. Database redundancy should be clarified.

Retrievability & Quality: Retrievability remains a primary area of concern, with RSR values well below the 75% threshold. Although this appears to be partially attributed to ongoing IPNI outages, the allocator’s historical RSR scores have not improved meaningfully since prior audits. The allocator has been transparent in flagging these issues, but performance improvement will be necessary moving forward to ensure high-quality data access.

DataCap Usage Analysis: Allocation velocity is uneven, with spikes in distribution followed by longer periods of inactivity, likely resulting from batch allocation patterns rather than the more regular tranche schedule originally outlined in the application. This behavior introduces inconsistencies that may hinder predictability and transparency in the allocation process. That said, SP-level distribution remains compliant.

Notable Flags

  • Batch allocation behavior observed since October 2024.
  • Very low retrievability scores, partially attributed to IPNI issues.
  • FengwoExtraordinary received the vast majority of recent DataCap allocations.
  • Database redundancy not clearly described.

We recommend matching the current allocation of 5 PiB. This would allow the allocator to continue developing their operational maturity while providing time to address key challenges:

  • Improve retrievability performance post-IPNI stabilization.
  • Align distribution cadence with the original tranche model.
  • Maintain diversity across clients and SPs.
  • Clarify database-level redundancy practices.

@Amin-Foundation added the label "Awaiting RKH" and removed "Awaiting Response from Allocator" and "Diligence Audit in Process" on May 6, 2025

4 participants