Modern recommendation systems in real-life applications: have you heard of them?
A recommendation system is designed to sift through a gigantic amount of information and bring the most suitable items to users. It detects the relationship between each user and item based on habits, past choices, or product characteristics, and uses it to make suitable recommendations in the future. This article is based on a talk given by Mr. Pham Hoang Anh (R&D Unit) at Vietnam Mobile Day 2019.
“Problems” of current major systems
We are living in a time when user information and the issues around it are always at the center of attention. Artificial intelligence is getting smarter, and it helps us understand users better than anything else.
Let's look at the problem from another angle. Even the people closest to you might struggle to answer "What kind of music do you like? Which artists do you like? What types of articles and news are you interested in?" and, most importantly, "WHAT DO YOU NEED?"
Nowadays, you don't need to spend much time deciding which song to listen to on YouTube, because it has already recommended plenty of content for you. Facebook and Instagram frequently surface posts that you might be interested in. Netflix knows which types of films you like and suggests them, and Tiki often shows ads for exactly the things you want to buy. All of these "wonders" start happening after we have been on a system for some time. Today even the tiniest piece of information can be of great value, as shown by the world's intense interest in the trials and scandals related to information leaks at some organizations.
Information is now like gold and diamonds, but too much information is a pain in the neck. Let's take a look at some statistics from the beginning of 2019:

Take YouTube as an example: when an artist's music video is uploaded, how can people find it while 300 hours of video are being uploaded every minute? No user can watch all of these videos. Among billions of users, each person has their own tastes, and for each person 99.99% of that content is not suitable. Amazon faces the same problem: a single user cannot go through hundreds of millions of products while shopping. It might take them a couple of... years.
This information overload is why recommendation systems have become so important in modern systems and applications.
Recommendation System
The phrase "recommendation system" must be familiar to most people today. Simply put, the main mission of a recommendation system is to sift through a gigantic amount of information and bring the most suitable items to users. The key point is that it identifies the relationship between each user and the items based on habits, past choices, or product characteristics, in order to make suitable recommendations in the future. For example, if a user likes history videos, they are more likely to be interested in another history video or an educational video than in an action movie. A recommendation system is what uncovers these connections.
The goal of a recommendation system
By now, readers will have understood that the most fundamental goals of a recommendation system are to improve the user experience, boost revenue for the website, and ultimately increase profit. Giving users effective item recommendations creates a sense of convenience and excitement. That revenue, however, comes from giving users the following experiences:
- Relevance: This is the most obvious goal of a recommendation system. Users will log in, use, and buy things that match their preferences and habits.
- Novelty: An effective recommendation system shows users relevant products that they have never seen before. Suggesting only hot or trending products brings little novelty and leaves the system's recommendations unbalanced, with no diversity among the items shown.
- Surprise: A step beyond novelty is when the suggested product is not only something the user has never seen, but also something that amazes them. It can be understood as relevance that the user had never thought about. For example, for someone who loves and frequents Chinese restaurants, suggesting a Chinese restaurant they have never been to is novel, but not unexpected. If the system recommends a restaurant specializing in traditional Vietnamese food, that is unexpected. Of course, the surprise factor must be carefully calculated from the data to ensure that the user is highly likely to take an interest in the product.
- Diversity: In the past, systems usually suggested a list of the top-K most closely related products. Because those products are so similar to one another, the user might not like any of them. Diversity increases the chance that the user takes an interest in at least one product in the list, without boring them with repeated similar products.
Not all systems are the same
In this article, the word "item" is used frequently. However, an item can mean different things in different systems, depending on what the system is used for. For Amazon, Tiki, or other e-commerce websites, the items are the products and goods for sale. For YouTube or Netflix, they are videos. For Google News, they are news articles. For Spotify, they are songs.

Facebook and other social networks suggest "people you may know" to increase the number of connections you have on the platform. (This form of suggestion does not bring in revenue directly as it does on other websites, but the success of a social network depends greatly on the number of connections between its users.)
How to build a recommendation system
In general, a typical recommendation system is built in the following steps:

Store & Collect Data
A recommendation system needs data to work. Data here means everything related to users and items: browsing and shopping history, duration, and other events such as clicks or searches. All of it is valuable, so the first step is always to collect and store this data.
Data Analysis
In this step, based on the state of the collected data as well as the requirements and goals of the system, we need a thorough analysis in order to design an approach that works for your own recommendation system.
Filter and Recommend Items
An effective recommendation system involves many other components and technical details (sending and receiving data, retrieving existing information...), but its core will always be the algorithms that make good use of the data. In the next part, let's take a closer look at some of the most common methods used by major systems around the world.
The impact of using recommendation system
Some statistics from these systems for the first half of the year illustrate how effective recommendation systems can be.

If you are interested, you can easily find these statistics in the media. The numbers show how indispensable recommendation systems are to the modern world.
Some methods of current recommendation systems
This is considered the most important part of a recommendation system. As I do research on machine learning and artificial intelligence, I am also curious about how these systems understand users, so I have spent plenty of time studying recommendation systems. Modern AI offers many methods for building them, but today let's look at the 4 algorithms most commonly used in major systems around the world.
Content-Based Recommendation System
This is one of the most basic and widely applicable recommendation algorithms. The word "content" can be understood as "item description". The system tries to find and recommend items that are similar to the ones a user has used in the past. For example, a user bought the book “Sherlock Holmes”, which belongs to the same detective category as “The Da Vinci Code” and “The Silence of the Lambs”. In this case, the algorithm will suggest those two books to that user.
Now, we will pay more attention to how the system can understand the similarities between different products.

In machine learning, everything needs to be represented as vectors and matrices for later computation and optimization, and this problem is no exception. Comparing the similarity between items becomes comparing the distance between 2 vectors.
In terms of method, some classic algorithms help us vectorize item descriptions, such as “Bag of words” and “TF-IDF”, along with more complex deep learning algorithms such as “Item2vec”. Every product is transformed into a vector, the most similar items are then identified and saved based on the distance between vectors, and finally the recommendation is produced.
- Strength: Works well for systems with a moderate amount of user data
- Weakness: Requires an understanding of how to vectorize items and apply the vectors appropriately
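To make the idea of "comparing the distance between 2 vectors" concrete, here is a minimal sketch using a bag-of-words vector and cosine similarity. The item descriptions and the helper names (`bag_of_words`, `cosine_similarity`, `most_similar`) are invented for illustration and were not part of the talk; a real system would use richer descriptions and probably TF-IDF or learned embeddings.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Turn an item description into a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(target, catalog, k=2):
    """Return the k catalog titles whose descriptions are closest to the target's."""
    target_vec = bag_of_words(catalog[target])
    scored = [
        (cosine_similarity(target_vec, bag_of_words(description)), title)
        for title, description in catalog.items()
        if title != target
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]

# Hypothetical item descriptions, made up for this example.
catalog = {
    "Sherlock Holmes": "detective mystery crime london investigation",
    "The Da Vinci Code": "mystery thriller crime investigation symbols",
    "The Silence of the Lambs": "crime thriller detective killer psychology",
    "Cooking Basics": "recipes kitchen food cooking",
}

recommendations = most_similar("Sherlock Holmes", catalog)
print(recommendations)
```

With these made-up descriptions, the two crime novels rank above the cookbook because they share more words with the book the user bought.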
Clustering Users

This algorithm is built on the similarities in our preferences. The number of users is gigantic, but the number of distinct preferences may be far smaller. With such a huge number of users, running the algorithms and then making recommendations for each individual is a heavy workload. Based on this observation, instead of making recommendations for each person, the system groups users and then makes recommendations for each group. Of course, the grouping must be carried out carefully based on the users' data. Basic clustering algorithms are well suited to this task.
- Strength: Easy to apply, fast processing
- Weakness: Does not work with new users or items (re-clustering is required)
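As a rough illustration of grouping users and recommending per group, the sketch below runs a small k-means loop over user preference vectors. The talk only mentions "basic clustering algorithms", so k-means is my choice for the example; the two-genre preference vectors, the fixed starting centroids, and all function names are assumptions made to keep the example small and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to the nearest centroid, then re-average."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical data: share of each user's viewing time spent on each genre.
genres = ["action", "history"]
users = {
    "u0": [0.9, 0.1],
    "u1": [0.8, 0.2],
    "u2": [0.1, 0.9],
    "u3": [0.2, 0.8],
}

points = list(users.values())
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])

# One recommendation per group instead of one per user:
# suggest the genre with the highest average share in each cluster.
group_favorite = [genres[max(range(len(c)), key=lambda g: c[g])] for c in centroids]
for name, label in zip(users, labels):
    print(name, "-> group", label, "-> recommend", group_favorite[label])
```

The weakness listed above shows up directly here: a new user has no label until the clustering is run again, or at least until they are assigned to the nearest existing centroid.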
Collaborative Filtering
This algorithm builds a matrix of users' ratings for items and uses it to give recommendations. On a system, each user gives high or low ratings to different items. The method performs one simple task: it estimates each user's rating for the items they have not rated yet (items they have never seen).

Above is an example of a collaborative filtering matrix for a system with 7 users (u0 -> u6) and 5 items (i0 -> i4). Users give a rating from 0 to 5 to each item they have used (0 is the worst experience, 5 is the best). The "?" mark indicates that the user has not rated that item, and those are the cells that need to be calculated. In this example, if the system calculates that u3 would rate i2 at 1 or 2 but would rate i4 at 4 or 5, it makes much more sense to suggest i4 to u3.
Many formulas have been created for filling in these blanks, but they all depend heavily on the number of cells that are already filled. The more cells are filled, the more accurate the calculations. This is a major challenge for collaborative filtering: major systems have hundreds of millions of items and users, but each item is rated by only a few users, and the number of ratings is tiny compared to the number of items in the system.
- Strength: The easiest algorithm to understand, and highly effective thanks to its logic
- Weakness: Requires a large number of ratings from users
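One of the many possible formulas for filling a "?" cell is a similarity-weighted average of other users' ratings. The sketch below assumes that particular formula, with cosine similarity computed over co-rated items; it is not necessarily what any of the systems mentioned here use. The small rating matrix is invented and is not the one from the figure.

```python
import math

def similarity(a, b):
    """Cosine similarity computed only over the items both users have rated."""
    common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Estimate a missing rating as a similarity-weighted average of other
    users' ratings for the same item. Returns None if nothing can be inferred."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[item] is None:
            continue
        weight = similarity(ratings[user], row)
        weighted_sum += weight * row[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None

# Hypothetical matrix: rows are users, columns are items, None marks a "?" cell.
ratings = [
    [5, 4, None, 1],  # u0 has not rated item 2
    [5, 5, 4, 1],     # u1 has tastes close to u0
    [1, 1, 1, 5],     # u2 has nearly opposite tastes
]

estimate = predict(ratings, user=0, item=2)
print(estimate)
```

Here the estimate for u0 lands closer to u1's rating than to u2's, because u1's past ratings look more like u0's. With very few filled cells, `similarity` has little to work with, which is exactly the sparsity problem described above.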
Session-Based Recommendation System
A session is defined as a series of a user's activities. This algorithm relies on the observation that people's habits are often very similar. For example, the system sees that many people tend to continue with "Ve nha di con" after having watched "Nguoi phan xu" and "Song chung voi me chong" (assuming that we do not use any other information such as genre or characters). Then, if another user comes to the system and finishes those 2 series, suggesting "Ve nha di con" is likely to be very effective.

This is one of the methods that Netflix, YouTube, and Alibaba apply in their recommendation systems. The "learning" can be based on basic approaches such as Markov models or on deep learning networks such as the **Recurrent Neural Network**. At present, session-based recommendation still attracts great interest from scientists and programmers thanks to its applicability, and many studies and contests on the topic are held every year.
- Strength: No user identification needed (only the activity sequence of the same person is stored), a distinctive advantage for websites with few identified users
- Weakness: Incompatible with new items
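The simplest version of the Markov approach mentioned above can be sketched as a first-order transition count: for every item, count what people watched next, and recommend the most frequent follow-up. The sessions below are made up around the series named in the example, and `build_transitions` and `recommend_next` are illustrative names, not part of any library.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often each item is immediately followed by each other item."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_item, next_item in zip(session, session[1:]):
            transitions[current_item][next_item] += 1
    return transitions

def recommend_next(transitions, last_item, k=1):
    """Return the k items that most often followed last_item in past sessions."""
    followers = transitions.get(last_item, Counter())
    return [item for item, _ in followers.most_common(k)]

# Hypothetical viewing sessions; no user IDs are needed, only the order of items.
sessions = [
    ["Nguoi phan xu", "Song chung voi me chong", "Ve nha di con"],
    ["Nguoi phan xu", "Song chung voi me chong", "Ve nha di con"],
    ["Song chung voi me chong", "Nguoi phan xu"],
]

transitions = build_transitions(sessions)
suggestion = recommend_next(transitions, "Song chung voi me chong")
print(suggestion)
```

A brand-new item never appears in `transitions`, so it can neither be recommended nor used as a starting point, which is the weakness noted above. A recurrent network replaces the counting table with a learned model of the whole sequence, but the input and output have the same shape.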
Conclusion
I hope this talk has given you a new perspective, particularly on the magic of systems that understand people. These are only 4 of many recommendation methods. You can check out more details in the slides below: