- Download the dataset from the link provided in the table below.
- Unzip the dataset and put it in the Any directory.
- Make sure all data files are in the same directory.
- Necessary files for Yelp:
yelp_academic_dataset_business.json
,yelp_academic_dataset_user.json
,yelp_academic_dataset_review.json
- Necessary files for Amazon:
Industrial_and_Scientific.csv
,Musical_Instruments.csv
,Video_Games.csv
,Industrial_and_Scientific.jsonl
,Musical_Instruments.jsonl
,Video_Games.jsonl
,meta_Industrial_and_Scientific.jsonl
,meta_Musical_Instruments.jsonl
,meta_Video_Games.jsonl
- Necessary files for Goodreads:
goodreads_books_children.json
,goodreads_reviews_children.json
,goodreads_books_comics_graphic.json
,goodreads_reviews_comics_graphic.json
,goodreads_books_poetry.json
,goodreads_reviews_poetry.json
- Run the
data_process.py
script to prepare the data for the simulation and recommendation tasks.
python data_process.py --input <path_to_raw_dataset> --output <path_to_processed_dataset>
len(review) | len(business) | len(user) | link | |
---|---|---|---|---|
Yelp | - | - | - | download |
- | - | - | - | |
Industrial_and_Scientific | 412.9K | 25.8K | 51.0K | review, meta, rating_only |
Musical_Instruments | 511.8K | 24.6K | 57.4K | review, meta, rating_only |
Video_Games | 814.6K | 25.6K | 94.8K | review, meta, rating_only |
Amazon | 1,739,300 | 76,000 | ||
- | - | - | - | |
Goodreads _Children |
734,640 | 124,082 | review, meta | |
Goodreads _Comics&Graphic |
542,338 | 89,411 | review, meta | |
Goodreads _Poetry |
154,555 | 36,514 | review, meta | |
Goodreads | 1,431,533 | 250,007 |