
[DataCap Refresh] <4th> Review of <RFfil> #265


Closed
MikeH1999 opened this issue Jan 2, 2025 · 13 comments
Labels

- Awaiting RKH: Refresh request has been verified by Public, Watchdog, and Governance; now awaiting release of DC
- DataCap - Doubled
- Refresh: Applications received from existing Allocators for a refresh of DataCap allowance

Comments


MikeH1999 commented Jan 2, 2025

Basic info

  1. Type of allocator: [manual]
  2. Paste your JSON number: [1054]
  3. Allocator verification: [yes]
  4. Allocator Application
  5. Compliance Report
  6. Previous reviews

Current allocation distribution

| Client name | DC granted |
| ----------- | ---------- |
| Hox | 1.54 PiB |
| Dahua | 5.04 PiB |
| Dongya | 2 PiB |

I. Hox

  • DC requested: 5 PiB
  • DC granted so far: 1.54 PiB

II. Dataset Completion

https://1drv.ms/u/s!Ai4k4rlYLrp3a0GJ9w4AY_7e9Y0?e=9l493h

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many have been made so far:

10 vs 8

It's still in storage.

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | Retrieval rate | Meets >75% retrieval? |
| ----- | -------------- | --------------------- |
| f03099981 | 90.63% | YES |
| f03099987 | 91.30% | YES |
| f03251993 | 87.37% | YES |
| f03241837 | 88.00% | YES |
| f03242023 | 86.49% | YES |
| f01660795 | 79.72% | YES |
| f01431043 | 89.97% | YES |

I. Dahua

  • DC requested: 10 PiB
  • DC granted so far: 5.04 PiB

II. Dataset Completion

https://pan.baidu.com/s/1QJrDXCHwOIjJBx07oWrUUw
Extraction code: 1k7h

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many have been made so far:

10 vs 10

It's still in storage.

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | Retrieval rate | Meets >75% retrieval? |
| ----- | -------------- | --------------------- |
| f02201190 | 58.97% | NO |
| f01969779 | 62.31% | NO |
| f03175168 | 73.27% | NO |
| f02808899 | 59.08% | NO |
| f02808877 | 67.32% | NO |
| f03226668 | 74.06% | NO |
| f03226666 | 73.09% | NO |
| f03175111 | 71.27% | NO |
| f03253497 | 25.26% | NO |
| f0323580 | 36.01% | NO |

I. Dongya

  • DC requested: 10 PiB
  • DC granted so far: 7 PiB

II. Dataset Completion

https://pan.baidu.com/s/1QJrDXCHwOIjJBx07oWrUUw
Extraction code: 1k7h

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?

yes

IV. How many replicas has the client declared vs how many have been made so far:

10 vs 12
After the last DC trigger, the bot report showed more than 10 replicas, and the count kept increasing. The official team has been asked to assist with the inspection. The client explained that this was due to an assignment error by their technical team, and DC triggers have been stopped for this client for now.
image

It's still in storage.

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | Retrieval rate | Meets >75% retrieval? |
| ----- | -------------- | --------------------- |
| f03175168 | 72.70% | NO |
| f02088999 | 58.91% | NO |
| f03116688 | 66.09% | NO |
| f03166677 | 71.60% | NO |
| f02088777 | 68.69% | NO |
| f02362412 | 75.11% | YES |
| f02362419 | 79.61% | YES |
| f02368946 | 0.00% | NO |
| f03166666 | 9.94% | NO |
| f03166666 | 0.00% | NO |
| f03226668 | 72.78% | NO |
| f03226668 | 74.51% | NO |
| f03286266 | 35.89% | NO |
| f03175111 | 71.42% | NO |

Allocation summary

  7. Notes from the Allocator

Dongya

This client has stored more than 10 copies, so DataCap issuance has been stopped for this client.

  8. Did the allocator report up to date any issues or discrepancies that occurred during the application processing?

yes

  9. What steps have been taken to minimize unfair or risky practices in the allocation process?

Regularly check the robot's data reports.

  10. How did these distributions add value to the Filecoin ecosystem?

I believe these are all datasets that contribute to human development, in areas such as agriculture and healthcare.

  11. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application

yes

  12. Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.

yes

@filecoin-watchdog added the Refresh and Awaiting Community/Watchdog Comment labels on Jan 2, 2025.
@filecoin-watchdog (Collaborator)

@MikeH1999
Hox

  • The retrieval rate is good with only 2 out of 10 SPs having retrieval of <10%.
  • The client disclosed new SPs (except for the last one) in the comments. However, there are 10 SPs instead of the originally declared 4. SPs F03099987, F03099981, and F03156722 appear to be using a VPN. While this is allowed by the allocator, it should only occur after additional due diligence to confirm the providers’ addresses. There is no evidence of such a process being completed.
  • The allocator claims to have retrieved sample data, unfortunately the comment [DataCap Application] <Hox> - <NHXK> MikeH1999/RFfil#60 (comment) includes only a screenshot of the sample, suggesting that no real test or attempt at retrieval from the network was performed.

Dahua

  • SPs F01969779 and F02201190 are using a VPN. In this instance, there is evidence that verification and due diligence were properly conducted by the allocator.
  • There is good geographical diversification of replicas, and the allocator is monitoring retrievability, which is a commendable approach.
  • This dataset was stored previously on the network [DataCap Application] <Zhejiang Dahua Technology Co., Ltd> - <Smart agriculture> filecoin-plus-large-datasets#1975
  • Additionally, what’s the full dataset of this application? Was the allocator able to review the full dataset?

Dongya

  • The data appears to be very valuable to society.
  • Retrievability is at approximately 50%, with 0% retrieval noted only with two providers. However, the list of SPs has not been updated in the main DataCap application, making them difficult to track.
  • There are +2 replicas and small instances of data duplication, which the user explained were caused by technical issues.
  • The allocator has demonstrated good communication and due diligence throughout the process.

@filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Community/Watchdog Comment label on Jan 18, 2025.
@MikeH1999 (Author)

Hox

  • The client disclosed new SPs (except for the last one) in the comments. However, there are 10 SPs instead of the originally declared 4. SPs F03099987, F03099981, and F03156722 appear to be using a VPN. While this is allowed by the allocator, it should only occur after additional due diligence to confirm the providers’ addresses. There is no evidence of such a process being completed.

I already verified their geolocation documentation at the very beginning of this review.

Image

Clients have been asked to explain and provide supporting documentation again

Image

It was an oversight on my part; I may have forgotten to include the screenshot. I will retrieve it again today!

Dahua

  • Additionally, what’s the full dataset of this application? Was the allocator able to review the full dataset?

Image
I'm not sure if you're asking about the whole copy? The data is in the client's own storage facility, and I can't view it.

Dongya

  • Retrievability is at approximately 50%, with 0% retrieval noted only with two providers. However, the list of SPs has not been updated in the main DataCap application, making them difficult to track.

The client continues to update the added SP info in the comments; next time I'll ask them to update it in the request form. Thanks for the heads up!

  • There are +2 replicas and small instances of data duplication, which the user explained were caused by technical issues.

Because of the excessive data growth, the remaining DC triggers have been stopped for this client.

@MikeH1999 (Author)

@filecoin-watchdog Thank you for all your hard work.

@filecoin-watchdog (Collaborator)

@MikeH1999

I'm not sure if you're asking about the whole copy? The data is in the client's own storage facility, and I can't view it.

This dataset was marked as public.
Did you download any portion of the data after the first allocation to verify its content?
Did the client provide any method for retrieving the data? For instance, if you wanted to download a specific file, how would you go about it?

@MikeH1999 (Author)

@filecoin-watchdog

Image
Yes, I have verified that the data is relevant to agriculture and here are the results I retrieved yesterday

Image

My retrieval method is as follows.
View the client's wallet deal information on this page, and randomly select a Deal ID.

Image

Then look up the information for that ID via lotus:

```
lotus state get-deal 104129417
```

```json
{
  "Proposal": {
    "PieceCID": { "/": "baga6ea4seaqmq5z5kdi24dms4uxayxb2ifcc4iveucqdcy26eol7y66q6tph4oq" },
    "PieceSize": 34359738368,
    "VerifiedDeal": true,
    "Client": "f03277311",
    "Provider": "f03175168",
    "Label": "bafybeichjigtuupz2snrxjq3xszu72okypxu2ibpazjiphjlppojiajo6u",
    "StartEpoch": 4645446,
    "EndEpoch": 5163846,
    "StoragePricePerEpoch": "0",
    "ProviderCollateral": "5052664145686490",
    "ClientCollateral": "0"
  },
  "State": {
    "SectorNumber": 74220,
    "SectorStartEpoch": 4626826,
    "LastUpdatedEpoch": -1,
    "SlashEpoch": -1
  }
}
```

Finally, the CAR file is retrieved in its entirety via boost:

```
boost retrieve --provider f03175168 bafybeichjigtuupz2snrxjq3xszu72okypxu2ibpazjiphjlppojiajo6u
```

That way I can get the data for comparison.

If you have an easier way to do this, I hope you can share it; I would appreciate it. I have to retrieve the file in its entirety to inspect it, which is too slow for me.
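The lookup step above can also be done programmatically. A minimal Python sketch, using the deal JSON shown earlier; nothing here touches the network, it only rebuilds the same `boost retrieve` command string:

```python
import json

# Deal record as printed above by: lotus state get-deal 104129417
deal = json.loads("""
{ "Proposal": { "PieceCID": { "/": "baga6ea4seaqmq5z5kdi24dms4uxayxb2ifcc4iveucqdcy26eol7y66q6tph4oq" },
  "PieceSize": 34359738368, "VerifiedDeal": true, "Client": "f03277311",
  "Provider": "f03175168",
  "Label": "bafybeichjigtuupz2snrxjq3xszu72okypxu2ibpazjiphjlppojiajo6u",
  "StartEpoch": 4645446, "EndEpoch": 5163846 },
  "State": { "SectorNumber": 74220, "SectorStartEpoch": 4626826 } }
""")

# The provider to retrieve from and the payload root CID (the deal Label).
provider = deal["Proposal"]["Provider"]
label = deal["Proposal"]["Label"]

# Build the same boost command used manually above.
cmd = f"boost retrieve --provider {provider} {label}"
print(cmd)
```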

Thank you.

@filecoin-watchdog (Collaborator)

@MikeH1999
This only partially answers my question.
The purpose of storing data on a server is to ensure it can be easily retrieved. To do this, you need to know where the data is located and have the ability to access it. With cloud storage, there’s no need to download the entire dataset you initially uploaded. Instead, you aim to retrieve specific files or pieces of information that are relevant at that moment, rather than randomly accessing parts of the dataset that may not be useful.

With this explanation in mind, do you know where the specific data is stored? Would you be able to retrieve information on a particular topic without needing to download the whole dataset from that client?

@MikeH1999 (Author)

Did the client provide any method for retrieving the data?

The client did not provide a retrieval method.

do you know where the specific data is stored?

It is stored in the geographic locations corresponding to the SPs.

With cloud storage, there’s no need to download the entire dataset you initially uploaded.
Would you be able to retrieve information on a particular topic without needing to download the whole dataset from that client?

I don't know how the retrieval you're describing works. Can you tell me about it? I'd love to learn.

If you like, you can randomly pick a few CIDs to check them.

By the way, my retrieval method also picks a random CID, not a specific one.

@filecoin-watchdog (Collaborator)

@MikeH1999

By the way, my retrieval method also picks a random CID, not a specific one

That's exactly what I'm referring to.
To download A SPECIFIC file, the client has to propose some sort of registry of files that will allow for the retrieval of specific data. This could be a list provided as a web page, a .csv file, a list stored in github, etc. The method is not set, nor is it important - what matters is that it is effective and allows for finding a specific place in the dataset.
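To illustrate the kind of registry meant here, a minimal Python sketch of a file-level index lookup. The CSV columns, file paths, CIDs, and provider IDs below are entirely hypothetical examples, not from any real client dataset:

```python
import csv
import io

# Hypothetical file-level index a client could publish (as a CSV, web page,
# or a list in GitHub). All paths, CIDs, and providers are made up.
INDEX_CSV = """\
path,payload_cid,provider
agriculture/soil/2023-survey.parquet,bafybeihypotheticalcid1,f03175168
agriculture/irrigation/sensors.csv,bafybeihypotheticalcid2,f02362412
"""

def find_entry(index_text: str, path: str):
    """Return (payload_cid, provider) for a specific file path, or None."""
    for row in csv.DictReader(io.StringIO(index_text)):
        if row["path"] == path:
            return row["payload_cid"], row["provider"]
    return None

# With such an index, a reviewer can target one specific file instead of a
# random deal, e.g.:  boost retrieve --provider <provider> <payload_cid>
entry = find_entry(INDEX_CSV, "agriculture/soil/2023-survey.parquet")
print(entry)
```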

Please, let me know if that's clear and easy to understand.

But in order not to prolong this discussion, I am forwarding the thread to the gov team as I have no further questions.

@filecoin-watchdog added the Diligence Audit in Process label and removed the Awaiting Response from Allocator label on Jan 23, 2025.
@MikeH1999 (Author)

@filecoin-watchdog

Please, let me know if that's clear and easy to understand.

Yes, got it, thanks.

@Kevin-FF-USA (Collaborator)

Hi @MikeH1999,

Thanks for submitting this application for refresh.
Wanted to send you a friendly update: as this works its way through the system, you should see a comment from Galen on behalf of the Governance team this week. If you have any questions or need support until then, please let us know.

Warmly,
-Kevin

@MikeH1999 (Author)

Looking forward to it.

@galen-mcandrew (Collaborator)

Good discussion above, and appreciate the transparency and knowledge sharing. To echo what I think the watchdog account is referring to: does the client (or data preparer) have some type of index or registry that would allow others to search for specific data and the corresponding CID? As a reminder, one of the main goals of allocators is to help scale useful distributed data onboarding onto the Filecoin network, and part of that work means supporting clients and SPs so that future users are able to locate this useful data.

This is a great example of an area where allocators can set themselves apart and increase their overall strength in the network. As you work with clients and data preparers that are bringing new useful open data, we would love to see the ways that allocators are helping drive standards around indexes or registries.

Some other areas to summarize from above, where you should continue to focus work:

  • Updating bookkeeping and records to accurately reflect details, such as SP distribution lists
  • Proactive investigation into standards and SLAs for clients and SPs
  • Continued investigation and enforcement of various standards and SLAs, such as distribution, replicas, and VPN usage
  • Increasing standards for open retrieval rates

Given the evidence of compliance, we are requesting 20 PiB of DataCap from RKH.

@Kevin-FF-USA added the Awaiting RKH and DataCap - Doubled labels and removed the Diligence Audit in Process label on Feb 13, 2025.
@Kevin-FF-USA (Collaborator)

Hi @MikeH1999

Friendly update on this refresh.

We are currently in the process of moving to a Metaallocator. In order for the tooling to work correctly, an allocator can only use the DataCap balance they received through direct allocation from Root Key Holders, or the DataCap received through a Metaallocator. As a result, some of the metrics pages, such as Datacapstats, Pulse, and other graphs, might look a little confused during this update.

You will not lose any DataCap, but you will see that your refreshed balance is the amount of DC from this refresh plus the remaining DC you had left.

No action is needed on your part; this is just a friendly note to thank you for your contributions and patience. You may notice changes in your DataCap balance while the back end is updated.
