As part of my academic research project, titled Impact of Recommender System, I studied, implemented, and compared various collaborative filtering algorithms. The tendencies-based algorithm was the best among them in terms of accuracy and computational efficiency. It was proposed by Fidel Cacheda and his team of researchers from the University of A Coruña in their paper titled Comparison of Collaborative Filtering Algorithms: Limitations of Current Techniques and Proposals for Scalable, High-Performance Recommender Systems. It was at least as accurate as other collaborative filtering algorithms, such as item-based and similarity fusion, and it was the most computationally efficient.
Algorithm
The tendencies-based algorithm, instead of looking for relations between users or items, looks at the differences between them.
Users with similar opinions often rate items differently: some give mostly positive ratings and reserve negative ratings for really bad items, while others usually rate negatively and give positive ratings only to the best items. The algorithm handles these variations with the concepts of user tendency and item tendency.
Notation
$$v_{ui}$$ denotes the rating given by user u to item i. $$p_{ui}$$ denotes the prediction made by the algorithm for the rating of item i by user u. $$\bar{v_{u}}$$ denotes the mean rating of user u and $$\bar{v_{i}}$$ denotes the mean rating of item i.
Tendency Calculation
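The tendency of a user ($$\tau_u$$) tells whether the user tends to rate items positively or negatively. It is defined as the average difference between the user's ratings and the corresponding item means:
$$\tau_u = \frac{1}{|I_u|} \sum_{i \in I_u} (v_{ui} - \bar{v_{i}})$$
where $$I_u$$ is the set of items rated by user u.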
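The tendency of an item ($$\tau_i$$) tells whether users consider it an especially good or especially bad item. It is defined analogously, as the average difference between the item's ratings and the mean ratings of the users who gave them:
$$\tau_i = \frac{1}{|U_i|} \sum_{u \in U_i} (v_{ui} - \bar{v_{u}})$$
where $$U_i$$ is the set of users who rated item i.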
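A minimal Python sketch of the tendency calculation might look like the following. The layout of the `ratings` dictionary and the names `mean` and `compute_tendencies` are assumptions made for illustration; this is not the code from the repository. It also assumes the item tendency mirrors the user tendency, that is, the average difference between an item's ratings and the means of the users who gave them.

```python
from collections import defaultdict


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def compute_tendencies(ratings):
    """Compute means and tendencies from a dict {(user, item): rating}.

    Returns four dicts: user_mean, item_mean, user_tendency, item_tendency.
    """
    by_user = defaultdict(dict)  # user -> {item: rating}
    by_item = defaultdict(dict)  # item -> {user: rating}
    for (user, item), value in ratings.items():
        by_user[user][item] = value
        by_item[item][user] = value

    user_mean = {u: mean(r.values()) for u, r in by_user.items()}
    item_mean = {i: mean(r.values()) for i, r in by_item.items()}

    # User tendency: average of (rating - item mean) over the items the user rated.
    user_tendency = {
        u: mean(v - item_mean[i] for i, v in r.items())
        for u, r in by_user.items()
    }
    # Item tendency (assumed symmetric definition): average of
    # (rating - user mean) over the users who rated the item.
    item_tendency = {
        i: mean(v - user_mean[u] for u, v in r.items())
        for i, r in by_item.items()
    }
    return user_mean, item_mean, user_tendency, item_tendency


# Hypothetical toy data: "ann" rates generously, "bob" rates harshly.
toy_ratings = {
    ("ann", "a"): 5, ("ann", "b"): 4,
    ("bob", "a"): 3, ("bob", "b"): 1,
}
user_mean, item_mean, user_tendency, item_tendency = compute_tendencies(toy_ratings)
```

On this toy data both users agree that item a is better than item b, yet ann ends up with a positive tendency and bob with a negative one, which is exactly the difference in rating style the algorithm is meant to capture. Every rating is visited a constant number of times, so the work is bounded by the size of the rating matrix.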
Prediction Calculation
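If both the user and the item have a positive tendency:
$$p_{ui} = \max(\bar{v_{u}} + \tau_i, \bar{v_{i}} + \tau_u)$$
If both the user and the item have a negative tendency:
$$p_{ui} = \min(\bar{v_{u}} + \tau_i, \bar{v_{i}} + \tau_u)$$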
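If the user has a negative tendency but the item has a positive tendency:
$$p_{ui} = \min(\max(\bar{v_{u}}, (\bar{v_{i}} + \tau_u)\beta + (\bar{v_{u}} + \tau_i)(1 - \beta)), \bar{v_{i}})$$
where β is a parameter that controls the contribution of the item and user mean.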
If the user has a positive tendency but the item has a negative tendency:
$$p_{ui} = \bar{v_{i}}\beta + \bar{v_{u}}(1 - \beta)$$
As observed, each of the four cases uses a simple formula, so the calculation is highly efficient: with m users and n items, training takes O(mn) time and a rating can be predicted in O(1) time.
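To make the O(1) prediction step concrete, here is a hedged Python sketch of the four cases. The case formulas in the comments follow my reading of the Cacheda et al. paper and should be checked against it. The function name `predict`, the default `beta=0.5`, and the choice to treat a tendency of exactly zero as positive are assumptions made for illustration, not the code from the repository.

```python
def predict(user_mean, item_mean, user_tendency, item_tendency, beta=0.5):
    """Predict a rating from precomputed means and tendencies.

    Only a handful of arithmetic operations are needed, so each
    prediction takes constant time once training is done.
    Assumption: a tendency of exactly 0 is treated as positive.
    """
    via_user = user_mean + item_tendency  # user mean shifted by item tendency
    via_item = item_mean + user_tendency  # item mean shifted by user tendency

    if user_tendency >= 0 and item_tendency >= 0:
        # Both positive: take the more optimistic estimate.
        return max(via_user, via_item)
    if user_tendency < 0 and item_tendency < 0:
        # Both negative: take the more pessimistic estimate.
        return min(via_user, via_item)
    if user_tendency < 0:
        # Negative user, positive item: blend the two estimates with beta,
        # then keep the result between the user mean and the item mean.
        blended = via_item * beta + via_user * (1 - beta)
        return min(max(user_mean, blended), item_mean)
    # Positive user, negative item: weighted average of the two means.
    return item_mean * beta + user_mean * (1 - beta)


# Hypothetical inputs: a generous user (mean 4.0, tendency +0.5)
# and a well-liked item (mean 3.5, tendency +0.2).
estimate = predict(4.0, 3.5, 0.5, 0.2)
```

Nothing here clamps the result to the rating scale, so a caller working with, say, 1 to 5 stars may want to clip the returned value.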
Implementation code can be downloaded from my GitHub repository.
Nicely written article. Can you explain the intuition behind taking the max when both the user and item tendencies are positive, and the min when both are negative?