Results: Please check out the Jupyter notebook of the project here. If you experience loading problems (it is a large file), please take a look at a markdown copy of the project here.
Keywords: K-means - Latent Dirichlet Allocation (LDA) - TF-IDF - PCA
Description: In this project, I used unsupervised learning models to cluster unlabeled documents into groups, visualized the results, and identified their latent topics. The data consists of 100 movies, taken from IMDB's list of the top 100 greatest movies of all time, together with their synopses from IMDB and Wikipedia. I applied TF-IDF to vectorize the synopses, used K-means and Latent Dirichlet Allocation (LDA) to cluster them into topic groups, and visualized the results through dimensionality reduction with PCA.
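
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, the six short synopses are made-up placeholders rather than the real data, and the function name `cluster_synopses` and its parameters are hypothetical. One further assumption: LDA is fitted on raw term counts here, which is the usual input for it, while K-means and PCA use the TF-IDF matrix.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder texts standing in for the 100 movie synopses (not the real data).
synopses = [
    "A mafia family fights rival gangs to keep control of the city.",
    "A young gangster rises through a crime family and betrays his boss.",
    "Soldiers storm a beach and fight their way across a war-torn country.",
    "An army captain leads his soldiers on a dangerous mission during the war.",
    "Two lovers meet on a ship and fall in love before tragedy strikes.",
    "A woman falls in love with a stranger and must choose between love and duty.",
]


def cluster_synopses(texts, n_groups=3, seed=0):
    """Hypothetical helper: returns K-means labels, LDA topic mix, 2-D PCA coords."""
    # Vectorize the texts with TF-IDF, dropping common English stop words.
    tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # K-means assigns each synopsis to exactly one cluster.
    kmeans = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    kmeans_labels = kmeans.fit_predict(tfidf_matrix)

    # LDA gives each synopsis a distribution over topics (assumption: raw counts as input).
    count_matrix = CountVectorizer(stop_words="english").fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_groups, random_state=seed)
    doc_topics = lda.fit_transform(count_matrix)

    # PCA projects the TF-IDF vectors to two dimensions for plotting.
    pca = PCA(n_components=2, random_state=seed)
    coords = pca.fit_transform(tfidf_matrix.toarray())

    return kmeans_labels, doc_topics, coords


labels, doc_topics, coords = cluster_synopses(synopses)
for text, label, (x, y) in zip(synopses, labels, coords):
    print(f"cluster {label}  ({x:+.2f}, {y:+.2f})  {text[:40]}")
```

The returned `coords` can be passed to any scatter-plot routine and colored by `labels` to reproduce the kind of 2-D cluster visualization the description mentions; the number of groups is a free parameter that would need tuning on the real data.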