
[DataCap Refresh] <4th> Review of <EF> #263

Closed
Lind111 opened this issue Dec 20, 2024 · 17 comments
Labels
  • Awaiting RKH: Refresh request has been verified by Public, Watchdog, and Governance - now awaiting release of DC
  • DataCap - Refreshed
  • Refresh: Applications received from existing Allocators for a refresh of DataCap allowance

Comments

@Lind111

Lind111 commented Dec 20, 2024

Basic info

  1. Type of allocator: [manual]
  2. Paste your JSON number: [1056]
  3. Allocator verification: [yes]
  4. Allocator Application
  5. Compliance Report
  6. Previous reviews

Current allocation distribution

| Client name | DC granted |
|---|---|
| globalnightlight | 5 PiB |
| Art-related data | 2.92 PiB |

I. globalnightlight

  • DC requested: 8 PiB
  • DC granted so far: 8 PiB

II. Dataset Completion

aws s3 ls --no-sign-request s3://globalnightlight/

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

The client discloses new SPs in advance when adding them.

IV. How many replicas has the client declared vs how many have been made so far:

10 vs 12
The client has explained the difference, and subsequent data distribution will be monitored.
image

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % retrieval | Meets the >75% retrieval? |
|---|---|---|
| f01969779 | 62.52% | NO |
| f02201190 | 58.94% | NO |
| f03175168 | 73.40% | NO |
| f03166677 | 71.81% | NO |
| f03166688 | 66.83% | NO |
| f02639492 | 79.26% | YES |
| f02362412 | 75.03% | YES |
| f02368946 | 0.00% | NO |
| f03166666 | 60.83% | NO |
| f03166668 | 0.00% | NO |
| f03175111 | 71.30% | NO |
| f2822222 | 35.76% | NO |
| f03253497 | 25.59% | NO |
| f03253580 | 90.69% | YES |

The retrieval rate issue has been followed up. Since then, no client has been found sending deals to SPs with a 0% retrieval rate.
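For readers who want to re-check the YES/NO column, here is a minimal sketch that applies the >75% bar to the retrieval figures listed above. The dictionary is copied from this comment. The function names and the strict "greater than" comparison are assumptions made for illustration, not part of any official Fil+ tooling.

```python
# Retrieval rates copied from the SP list in this comment (SP ID -> % retrieval).
RETRIEVAL_RATES = {
    "f01969779": 62.52,
    "f02201190": 58.94,
    "f03175168": 73.40,
    "f03166677": 71.81,
    "f03166688": 66.83,
    "f02639492": 79.26,
    "f02362412": 75.03,
    "f02368946": 0.00,
    "f03166666": 60.83,
    "f03166668": 0.00,
    "f03175111": 71.30,
    "f2822222": 35.76,
    "f03253497": 25.59,
    "f03253580": 90.69,
}

# Assumption: the ">75%" column header means strictly greater than 75.
THRESHOLD = 75.0


def meets_threshold(rate, threshold=THRESHOLD):
    """Return True when a retrieval rate clears the threshold."""
    return rate > threshold


def summarize(rates, threshold=THRESHOLD):
    """Return (SPs above the threshold, SPs at exactly 0%), both sorted by ID."""
    passing = sorted(sp for sp, r in rates.items() if meets_threshold(r, threshold))
    zero = sorted(sp for sp, r in rates.items() if r == 0.0)
    return passing, zero


passing, zero = summarize(RETRIEVAL_RATES)
print(f"{len(passing)} of {len(RETRIEVAL_RATES)} SPs exceed {THRESHOLD}%: {passing}")
print(f"SPs with 0% retrieval: {zero}")
```

Keeping the 0% group separate is a deliberate choice here, since those are the SPs the follow-up note singles out.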

Allocation summary

  7. Notes from the Allocator

Issues such as copies of appeals and retrieval rates are being followed up on an ongoing basis.

  8. Did the allocator report up to date any issues or discrepancies that occurred during the application processing?

yes

  9. What steps have been taken to minimize unfair or risky practices in the allocation process?

CID reports are generated regularly to follow up on data distribution.

  10. How did these distributions add value to the Filecoin ecosystem?

These are publicly available datasets that anyone can view at any time.

  11. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application

yes

  12. Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.

yes

Kevin-FF-USA added the labels "Refresh" (Applications received from existing Allocators for a refresh of DataCap allowance) and "Awaiting Community/Watchdog Comment" (DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App.) on Dec 20, 2024
@Kevin-FF-USA
Collaborator

Hi @Lind111,

Thanks for using the new template. I wanted to check in with you about timelines for this application.

With the holiday break coming up, the teams will be out of office until January, which means you may not see a Watchdog or Governance comment until then.

Warmly,
-Kevin

@filecoin-watchdog
Collaborator

@Lind111
SmallArt Ltd.
The client did not update all the Storage Providers (SPs) used for deals.
For most SPs on the list, there was duplication, averaging about 10% per SP. Although this percentage is not high, it should still be monitored.
Out of 7 SPs:

  • 4 SPs have a retrieval success rate of 0%,
  • 1 SP has a retrieval rate of 7%,
  • The remaining 2 SPs have a retrieval rate of 41%.

WorldBankGroup
In the previous review, it was noted that the client is working with multiple allocators simultaneously. Was this issue discussed with the client after the review?
Of 14 SPs, only 2 have a retrieval success rate at a satisfactory level.
SP performance details: SPs f03166666, f02639492, f02368946, and f02362412 have duplication rates below 20%. While this rate is relatively low, it should still be monitored.
The allocator continues to run CID reports and requests explanations from the client when discrepancies are found.

AMEstadium
The data preparation information provided by the client is inaccurate. The allocator did not ask follow-up questions, even though the client mentioned that the dataset would be prepared in tar or zip files. These formats are not suitable for storing data on Filecoin.
Although the client claimed the data had not been stored before, at least two applications have been identified with identical data samples and descriptions:

The dataset is marked as "open," but its description suggests it is private and not publicly accessible. There is no index or verification to justify the reported 650 TiB of data.

Additional issues:

  • The allocator requires at least four SPs, but the client is currently using only three.
  • SPs f03238633 and f01531188 are using a VPN, which violates the application rules.

filecoin-watchdog added the label "Awaiting Response from Allocator" (If there is a question that was raised in the issue that requires comment before moving forward.) and removed the label "Awaiting Community/Watchdog Comment" (DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App.) on Jan 13, 2025
@Lind111
Author

Lind111 commented Jan 14, 2025

@Lind111 SmallArt Ltd. The client did not update all the Storage Providers (SPs) used for deals.

Are you referring to updates that should be made to the application form?
The client disclosed the SPs in advance in the comments:
Lind111/EF#34 (comment)
Lind111/EF#34 (comment)

  • 4 SPs have a retrieval success rate of 0%,
  • 1 SP has a retrieval rate of 7%,
  • The remaining 2 SPs have a retrieval rate of 41%.

I have been following up on the retrieval rate. So far the client has not provided a reasonable explanation, so I will not trigger a subsequent DC allocation.

I'm wondering about the open source retrieval tool that the client mentioned here, is this recognized?

WorldBankGroup In the previous review, it was noted that the client is working with multiple allocators simultaneously. Was this issue discussed with the client after the review?

This one I did notice, and I realized that the client was storing different data, so I didn't ask too much about it
image

Of 14 SPs, only 2 have a retrieval success rate at a satisfactory level.

The client explained that this was caused by a database crash, which has since been fixed.

AMEstadium The data preparation information provided by the client is inaccurate. The allocator did not ask follow-up questions, even though the client mentioned that the dataset would be prepared in tar or zip files. These formats are not suitable for storing data on Filecoin.

My understanding is that the client compresses it into a tar or zip file and then uses the appropriate tool to make a car file for storage.

Although the client claimed the data had not been stored before, at least two applications have been identified with identical data samples and descriptions:

The dataset is marked as "open," but its description suggests it is private and not publicly accessible. There is no index or verification to justify the reported 650 TiB of data.

This client has only just started applying and is currently in the second round. I'll follow up on those questions.

Additional issues:

  • The allocator requires at least four SPs, but the client is currently using only three.

The latest report now has 4 SPs spread across 3 different regions

  • SPs f03238633 and f01531188 are using a VPN, which violates the application rules.

The client has explained this and provided supporting documentation in the comments.

@filecoin-watchdog Finally, thank you for your review. If there are any other issues, please feel free to point them out. Thanks again!

@Lind111
Author

Lind111 commented Jan 14, 2025

  • SPs f03238633 and f01531188 are using a VPN, which violates the application rules.

The client has explained this and provided supporting documentation in the comments.

@filecoin-watchdog I would like to know the way you check VPN, as well as confirming that the client's response proves that he is not using a VPN

@filecoin-watchdog
Collaborator

@Lind111
One additional observation regarding the AMEstadium client:

I retrieved a random piece from their dataset. Fortunately, the data is retrievable; however, the content of piece baga6ea4seaqibmmmvxms6k5uw7qfborr7jb5zvevu7zs2heicrhl3b4q26vwsdq from SP f03238633 consists of screen recordings of someone playing Subway Surfers. Below are two example files from this piece:

image

Questions to consider:

  1. Is this really the data that was declared to be stored?
  2. Does this data hold significant value for humanity as claimed?

@filecoin-watchdog
Collaborator

I'm wondering about the open source retrieval tool that the client mentioned here (https://github.com/Lind111/EF/issues/34#issuecomment-2563410039), is this recognized?

I don't recognize this tool. You can always ask the client what they are using. It also looks like the client might not understand how retrieval checking works. I recommend checking the official Spark documentation: https://docs.filspark.com/troubleshooting-miner-score#block-3c41a58cb03f4a8593924d0af9e8800b


This one I did notice, and I realized that the client was storing different data, so I didn't ask too much about it

In the previous review, Galen has said:

Overall good diligence and compliant behavior. It would be good to see how this allocator is investigating clients that are working with multiple allocator pathways. It is reasonable for a client or data preparer to be working with multiple teams, but there should be investigation, diligence, transparency, and justification for that behavior. (...)

This is why I asked if you did any follow up after the third review.


My understanding is that the client compresses it into a tar or zip file and then uses the appropriate tool to make a car file for storage.

Was this confirmed with the client or is it your assumption?


I would like to know the way you check VPN, as well as confirming that the client's response proves that he is not using a VPN

For SP f03238633: https://www.ipqualityscore.com/vpn-ip-address-check/lookup/103.25.202.111
For SP f01531188: https://www.ipqualityscore.com/vpn-ip-address-check/lookup/47.242.91.210
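The two lookup links above share one URL pattern, so the same check can be repeated for other SP addresses. The sketch below only builds the lookup URL for manual review in a browser. It does not query the service or interpret its result. The helper name `vpn_lookup_url` is hypothetical, and the SP-to-IP pairs are the ones given in this comment.

```python
from ipaddress import ip_address

# Base of the lookup links quoted in this comment.
LOOKUP_BASE = "https://www.ipqualityscore.com/vpn-ip-address-check/lookup/"


def vpn_lookup_url(ip: str) -> str:
    """Build the lookup URL for one IP address (hypothetical helper).

    ip_address() raises ValueError for malformed input, so a typo in the
    address fails early instead of producing a dead link.
    """
    return LOOKUP_BASE + str(ip_address(ip))


# SP ID -> IP address, as listed in this comment.
SP_IPS = {
    "f03238633": "103.25.202.111",
    "f01531188": "47.242.91.210",
}

for sp_id, ip in SP_IPS.items():
    print(sp_id, vpn_lookup_url(ip))
```

A result from a lookup page like this is one signal rather than proof, which is why it is compared against the client's location documents later in the thread.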

@Lind111
Author

Lind111 commented Jan 15, 2025

@bashyang Thanks for the clarification.

@Lind111
Author

Lind111 commented Jan 15, 2025

@filecoin-watchdog I'll follow up on these omissions. I've benefited greatly from this; thank you again for your detailed review.

@filecoin-watchdog
Collaborator

@Lind111 Would you like to respond to my last comments, or do you feel everything has been explained?

@Lind111
Author

Lind111 commented Jan 16, 2025

The client's account has been banned, and the comments and request forms have disappeared.
The following had been clarified previously:

Image

@Lind111
Author

Lind111 commented Jan 16, 2025

@filecoin-watchdog Sorry, I was waiting for your reply.

@filecoin-watchdog
Collaborator

@Lind111 I didn't see the above comment before it disappeared, hence the confusion.
Well, I guess there is nothing more to say about AMEstadium.
If any DC needs to be revoked from this client, please notify the gov team.

filecoin-watchdog added the label "Diligence Audit in Process" (Governance team is reviewing the DataCap distributions and verifying the deals were within standards) and removed the label "Awaiting Response from Allocator" (If there is a question that was raised in the issue that requires comment before moving forward.) on Jan 16, 2025
@Lind111
Author

Lind111 commented Jan 16, 2025

This is his last report, along with documents proving his geographic location.

Image

Image

@Lind111
Author

Lind111 commented Jan 16, 2025

If any DC needs to be revoked from this client, please notify the gov team.

Got it. Thank you.

I checked and found that the client has already used up the second round of DC.

Image

@Kevin-FF-USA
Collaborator

Hi @Lind111,

Thanks for submitting this application for refresh.
Wanted to send you a friendly update - as this works its way through the system you should see a comment from Galen on behalf of the Governance this week. If you have any questions or need support until then, please let us know.

Warmly,
-Kevin

@galen-mcandrew
Collaborator

Appreciate the investigations and communication above, but I want to flag that allocators should be performing these diligence components in advance of investigations from community members, like the watchdog account. Specifically, it is unclear whether the allocator performed dataset retrieval and sampling before issues were flagged by the community. It appears that the allocator is pulling CID checker reports and monitoring distribution and overall retrieval test results. This is good compliance checking, but it does not cover the full range or scope of what an allocator can use to ensure compliance with their pathway. As a reminder, the goal is distributed onboarding of useful data according to each allocator's SLAs and requirements.

Some specific areas that should be focused on going forwards:

  • Continued diligence around data distribution
  • Continued investigation and enforcement of retrieval testing rates
  • Specific dataset and retrieval sampling to check for data accuracy
  • Investigating datasets and clients to monitor and then address redundant duplication
  • Investigating client claims around data preparation

Given the interventions and diligence investigation we are seeing, we are requesting 10 PiB of DataCap from RKH.

Kevin-FF-USA added the labels "Awaiting RKH" (Refresh request has been verified by Public, Watchdog, and Governance - now awaiting release of DC) and "DataCap - Refreshed" and removed the label "Diligence Audit in Process" (Governance team is reviewing the DataCap distributions and verifying the deals were within standards) on Feb 13, 2025
@Kevin-FF-USA
Collaborator

Hi @Lind111

Friendly update on this refresh.

We are currently in the process of moving to a Metaallocator. In order for the tooling to work correctly, an allocator can only use the DataCap balance they received through direct allocation from Root Key Holders, or the DataCap received through the Metaallocator. As a result, some of the metrics pages, like Datacapstats, Pulse, and other graphs, might be a little confusing during this update.

You will not lose any DataCap, but you will see that your balance after the refresh is the amount of DC from the refresh plus the remaining DC the allocator has left.

No action needed on your part, just a friendly note to thank you for your contributions and patience, and you may notice changes in your DataCap balance while the back end is updated.
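As a rough illustration of the balance note above, the displayed balance is simply the refresh amount plus whatever DataCap the allocator still held. This is a minimal sketch: the function name is hypothetical, the 10 PiB figure is the refresh requested earlier in this thread, and the 0.5 PiB remainder is a made-up example value, not this allocator's actual balance.

```python
from decimal import Decimal


def balance_after_refresh(refresh_pib: str, remaining_pib: str) -> Decimal:
    """Return refresh DC plus remaining DC, both expressed in PiB.

    Decimal is used so that fractional PiB values add without float noise.
    """
    refresh = Decimal(refresh_pib)
    remaining = Decimal(remaining_pib)
    if refresh < 0 or remaining < 0:
        raise ValueError("DataCap amounts cannot be negative")
    return refresh + remaining


# 10 PiB: refresh requested in this thread. 0.5 PiB: hypothetical remainder.
print(balance_after_refresh("10", "0.5"), "PiB")
```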

5 participants
@filecoin-watchdog @galen-mcandrew @Kevin-FF-USA @Lind111 and others