A recommender system is an algorithm that helps users discover new content. Recommender systems are popular in online marketing, where they are known to substantially increase an online vendor's revenue.
Collaborative filtering is a type of recommender system. It makes automatic predictions (recommendations) for a user based on the preferences of a set of similar users, so it requires a large amount of user data. Collaborative filtering does not necessarily require machine learning; it can be implemented with deterministic calculations.
Cosine similarity or Pearson correlation is used to measure the similarity between two users and to build each user's neighborhood of the N most similar users.
Let x and y be the rating vectors of two users.
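cosine similarity: $\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$
x and y are equivalent (same direction) | $\cos(x, y) = 1$
x and y are dissimilar (orthogonal) | $\cos(x, y) = 0$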
The cosine similarity is equivalent to the Pearson correlation coefficient if the x and y vectors are mean-centered, that is, if each vector has its own mean subtracted from every entry.
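The equivalence stated above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the two rating vectors are hypothetical and chosen purely for illustration.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def mean_center(v):
    """Subtract the vector's own mean from every entry."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def pearson(x, y):
    """Pearson correlation computed from its textbook definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings by two users for the same five items.
x = [5, 3, 4, 4, 1]
y = [4, 2, 5, 3, 2]

raw_cosine = cosine_similarity(x, y)
centered_cosine = cosine_similarity(mean_center(x), mean_center(y))
pearson_r = pearson(x, y)

# The centered cosine and the Pearson coefficient agree up to rounding error.
print(raw_cosine, centered_cosine, pearson_r)
```

Note that the raw cosine is higher than the centered value here: with all-positive ratings, uncentered vectors always point in broadly the same direction, which is one reason to center before comparing users.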
Users tend to rate on different scales. Rating vectors can be normalized by dividing each rating by the user's mean rating or by subtracting the user's mean rating from each rating.
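To make the user-based approach concrete, here is a minimal sketch of a neighborhood prediction with mean-centered ratings. The weighted-average formula (the user's own mean plus a similarity-weighted average of the neighbors' deviations from their means) is one common formulation and is an assumption, not something these notes prescribe; the `ratings` data and all function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann":  {"a": 5, "b": 3, "c": 4, "d": 4},
    "bob":  {"a": 3, "b": 1, "c": 2, "d": 3, "e": 3},
    "cara": {"a": 4, "b": 3, "c": 4, "d": 3, "e": 5},
    "dan":  {"a": 1, "b": 5, "c": 5, "d": 2, "e": 1},
}

def user_mean(user):
    values = ratings[user].values()
    return sum(values) / len(values)

def similarity(u, v):
    """Cosine similarity of mean-centered ratings over co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    mu, mv = user_mean(u), user_mean(v)
    dot = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    norm_u = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    norm_v = math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict(user, item, n_neighbors=2):
    """Estimate a rating from the n most similar users who rated the item."""
    candidates = [v for v in ratings if v != user and item in ratings[v]]
    neighbors = sorted(candidates, key=lambda v: similarity(user, v),
                       reverse=True)[:n_neighbors]
    weight_sum = sum(abs(similarity(user, v)) for v in neighbors)
    if weight_sum == 0:
        return user_mean(user)  # fall back to the user's own average
    offset = sum(similarity(user, v) * (ratings[v][item] - user_mean(v))
                 for v in neighbors)
    return user_mean(user) + offset / weight_sum

# "ann" has not rated item "e"; estimate it from her two nearest neighbors.
prediction = predict("ann", "e")
print(prediction)
```

Working with deviations from each user's mean means a harsh rater and a generous rater can still count as similar if they rank items the same way.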
With many users, the pairwise similarity calculations can be computationally intensive. It is common to cluster users into groups and make recommendations for each group of users.
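The notes do not name a clustering method, so the sketch below assumes plain k-means with fixed starting centroids, purely as an illustration. The user vectors, function names, and the idea of ranking items by the group's mean rating are all hypothetical.

```python
# Hypothetical dense rating vectors (one entry per item, same order for all users).
users = {
    "ann":  [5, 4, 1, 1],
    "bob":  [4, 5, 2, 1],
    "cara": [1, 2, 5, 4],
    "dan":  [2, 1, 4, 5],
}

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, centroids, iterations=10):
    """Plain k-means; returns the cluster index assigned to each vector."""
    assignment = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = centroid(members)
    return assignment

names = list(users)
vectors = [users[n] for n in names]
# Seed with the first and third users so the sketch is deterministic.
labels = kmeans(vectors, centroids=[list(vectors[0]), list(vectors[2])])

groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)

# A group-level recommendation could rank items by the group's mean rating.
group_profiles = {label: centroid([users[n] for n in members])
                  for label, members in groups.items()}
print(groups, group_profiles)
```

Similarities then only need to be computed against a handful of group profiles rather than against every other user, at the cost of less personalized recommendations.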
Item-to-item collaborative filtering is a similar method that finds a neighborhood of similar items instead of similar users. A user's rating for an item is estimated from that user's own ratings of similar items.
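A minimal sketch of the item-to-item variant follows. It assumes cosine similarity between item rating columns (over users who rated both items) and a similarity-weighted average of the user's own ratings; both choices, along with the data and function names, are illustrative assumptions.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann":  {"a": 5, "b": 3, "c": 4},
    "bob":  {"a": 3, "b": 1, "c": 2, "d": 3},
    "cara": {"a": 4, "b": 3, "c": 4, "d": 5},
    "dan":  {"a": 1, "b": 5, "c": 5, "d": 1},
}

def item_similarity(i, j):
    """Cosine similarity between two items over users who rated both."""
    users = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

def predict(user, item, n_neighbors=2):
    """Estimate a rating from the user's ratings of the most similar items."""
    rated = [j for j in ratings[user] if j != item]
    neighbors = sorted(rated, key=lambda j: item_similarity(item, j),
                       reverse=True)[:n_neighbors]
    weight_sum = sum(item_similarity(item, j) for j in neighbors)
    if weight_sum == 0:
        return None  # no usable neighbors for this item
    weighted = sum(item_similarity(item, j) * ratings[user][j]
                   for j in neighbors)
    return weighted / weight_sum

# "ann" has not rated item "d"; estimate it from the items she did rate.
prediction = predict("ann", "d")
print(prediction)
```

Item similarities tend to change more slowly than user similarities, so in practice they are often precomputed and reused across many requests.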
Pros
- Does not require feature data for the items. No feature selection needed.
- Can be predictive.
Cons
- Does not work well with small datasets.
- Cannot recommend new/unrated items (the cold-start problem).
- Tends to default to recommending popular items (popularity bias).