Build a recommendation system on the MovieLens 100K dataset

22 September 2019

Hadoop for a Recommender System

DataSet : Netflix

Algorithm : Item Collaborative Filtering

  • A form of collaborative filtering based on the similarity between items calculated using people’s ratings of those items

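  • Why use item CF rather than user CF
    • There are usually far more users than items, so the item-item matrix is smaller than the user-user one
    • Items change less frequently than users do, so similarities need to be recomputed less often
    • Recommendations are based on the user's own rating history, which makes them more convincing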
  • How to implement Item CF
    • Build co-occurrence matrix
    • Build rating matrix
    • Matrix computation to get recommending result
  • How to define similarity between two movies
    • If the same user rated two movies, the two movies are considered related
  • Co-occurrence matrix:
    • In this context, a co-occurrence matrix is a movie-by-movie matrix whose entry (i, j) counts how many users rated both movie i and movie j.
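
The three implementation steps above (co-occurrence matrix, rating matrix, matrix computation) can be sketched in a few lines of single-machine Python. This is only an illustration under stated assumptions, not the Hadoop implementation: the toy ratings and the function names are hypothetical, the co-occurrence counts are left unnormalized, and movies the user has already rated are excluded from the result.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: (user, movie, rating) triples.
ratings = [
    ("u1", "m1", 5.0), ("u1", "m2", 3.0),
    ("u2", "m1", 4.0), ("u2", "m2", 2.0), ("u2", "m3", 5.0),
    ("u3", "m2", 4.0), ("u3", "m3", 3.0),
]

def build_rating_matrix(ratings):
    """Rating matrix stored sparsely as user -> {movie: rating}."""
    by_user = defaultdict(dict)
    for user, movie, score in ratings:
        by_user[user][movie] = score
    return by_user

def build_cooccurrence(by_user):
    """co[a][b] = number of users who rated both movie a and movie b.

    The diagonal co[a][a] is the number of users who rated movie a.
    """
    co = defaultdict(lambda: defaultdict(int))
    for movies in by_user.values():
        for a, b in product(movies, repeat=2):
            co[a][b] += 1
    return co

def recommend(user, by_user, co):
    """Multiply the co-occurrence matrix by the user's rating vector.

    score(j) = sum over rated movies i of co[i][j] * rating(i),
    keeping only movies j the user has not rated yet.
    """
    rated = by_user[user]
    scores = defaultdict(float)
    for movie, score in rated.items():
        for other, count in co[movie].items():
            if other not in rated:
                scores[other] += count * score
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

by_user = build_rating_matrix(ratings)
co = build_cooccurrence(by_user)
print(recommend("u1", by_user, co))
```

In this toy data, u1 has not rated m3, so m3 is the only candidate; its score combines the m1-m3 and m2-m3 co-occurrence counts weighted by u1's ratings of m1 and m2.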

Five MapReduce jobs are used: