Summary
The paper (Github) introduces Monolith, ByteDance’s recommendation system powering TikTok (and the BytePlus Recommend product). The paper details the infrastructure and system design that allows for industrial-scale recommendations (millions of users).
Key Findings:
- Collision-less hash tables improve performance for sparse categorical features: Industrial recommender systems handle a massive number of users and ranking items, making it difficult to store large embedding tables. Traditional low-collision hashing can be problematic because some users/items are much more popular than others, so grouping them in the same bucket can harm model performance. Collision-less hashing (specifically, cuckoo hashing) can accommodate a larger number of unique IDs and improves AUC without overfitting
- Models can be updated in chunks: Using TensorFlow parameter servers, the model can be updated incrementally during training, removing the need for synchronised replacement of the entire distributed model. Parameter servers (model chunks) can be synced independently because each update tends to be small
- Collision-less hash tables and fast online training address concept drift: Online training with collision-less hash tables, updated frequently, improves performance and reduces the impact of concept drift over time