Filtered Chameleon and Squirrel datasets #10102

RX28666 · 2025-03-07T17:20:11Z

Hi, I'm following up on a previous inquiry (#7531) about the filtered Chameleon and Squirrel datasets. Since the original datasets are problematic, I am wondering whether the filtered versions are available. In the meantime, I'm trying to convert the provided filtered datasets into the PyG data format myself, but it appears that I have missed some required data attributes.

Below is the code I've been using, along with the error I encountered. I am not sure what the attribute "edge_stores" means, and I would also like to know which additional attributes I need to include in order to convert the original data into the standard PyG data format. Thanks.

The code I used:

import torch

# Note: rand_train_test_idx (used below) is not shown in this post.


class Data:
    def __init__(self):
        self.edge_index = None
        self.edge_attr = None
        self.x = None
        self.y = None
        self.num_features = None
        self.num_classes = None
        self.num_nodes = None
        self.train_mask = None
        self.val_mask = None
        self.test_mask = None
        self.device = None

    def get_idx_split(self, split_type="random", train_prop=0.5, valid_prop=0.25):
        """
        train_prop: The proportion of dataset for train split. Between 0 and 1.
        valid_prop: The proportion of dataset for validation split. Between 0 and 1.
        """

        if split_type == "random":
            # ignore_negative = False if self.name == "ogbn-proteins" else True
            ignore_negative = True
            train_idx, valid_idx, test_idx = rand_train_test_idx(
                self.y,
                train_prop=train_prop,
                valid_prop=valid_prop,
                ignore_negative=ignore_negative,
            )
            # split_idx = {"train": train_idx, "valid": valid_idx, "test": test_idx}
            num_nodes = self.x.shape[0]
            self.train_mask = torch.tensor(
                [True if idx in train_idx else False for idx in range(num_nodes)]
            )
            self.val_mask = torch.tensor(
                [True if idx in valid_idx else False for idx in range(num_nodes)]
            )
            self.test_mask = torch.tensor(
                [True if idx in test_idx else False for idx in range(num_nodes)]
            )
        # return train_mask, valid_mask, test_mask

    def to_device(self):
        self.edge_index = self.edge_index.to(self.device)
        self.x = self.x.to(self.device)
        self.y = self.y.to(self.device)
        self.train_mask = self.train_mask.to(self.device)
        self.val_mask = self.val_mask.to(self.device)
        self.test_mask = self.test_mask.to(self.device)
        if self.edge_attr is not None:
            self.edge_attr = self.edge_attr.to(self.device)
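
The list comprehensions in get_idx_split scan the index tensor once for every node. A more direct way to build the same masks is to allocate an all-False mask and set the selected positions. This is only a sketch: it assumes that rand_train_test_idx returns 1-D integer index tensors, and index_to_mask_sketch is a hypothetical helper name that is not part of the code above.

```python
import torch


def index_to_mask_sketch(index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Return a boolean mask of length num_nodes that is True at `index`.

    Hypothetical helper for illustration; assumes `index` is a 1-D tensor
    of integer node indices in the range [0, num_nodes).
    """
    mask = torch.zeros(num_nodes, dtype=torch.bool)
    mask[index] = True
    return mask


# Toy split over 6 nodes (made-up indices, for illustration only).
num_nodes = 6
train_idx = torch.tensor([0, 3])
valid_idx = torch.tensor([1])
test_idx = torch.tensor([2, 4, 5])

train_mask = index_to_mask_sketch(train_idx, num_nodes)
val_mask = index_to_mask_sketch(valid_idx, num_nodes)
test_mask = index_to_mask_sketch(test_idx, num_nodes)
```

PyG also provides torch_geometric.utils.index_to_mask, which should cover the same use case without a custom helper.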

Where the bug appeared:

import copy

import torch_geometric.transforms as T

input = copy.deepcopy(data)
input = T.ToSparseTensor()(input)

The bug:

 in ToSparseTensor.forward(self, data)
     71 def forward(
     72     self,
     73     data: Union[Data, HeteroData],
     74 ) -> Union[Data, HeteroData]:
---> 76     for store in data.edge_stores:
     77         if 'edge_index' not in store:
     78             continue

AttributeError: 'Data' object has no attribute 'edge_stores'
