Deduplication Algorithm
Problem Description
Sellers who offer the same products create similar SKUs in Sellers Center, which produces many duplicate entries for the same items.
- Duplicates make search and catalog ranking more difficult.
- As a result, the user experience suffers and sales decrease.
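For illustration only, the sketch below shows one simple way to measure how "similar" two SKU titles are: token-level Jaccard similarity. It is not the algorithm proposed in this document. The helper names, the example titles and the 0.6 threshold are hypothetical.

```python
import re

# Hypothetical threshold: title pairs at or above it are flagged as likely duplicates.
SIMILARITY_THRESHOLD = 0.6

def title_tokens(title: str) -> set:
    """Lowercase a title and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(title_a: str, title_b: str) -> float:
    """Share of distinct tokens the two titles have in common (0.0 to 1.0)."""
    a, b = title_tokens(title_a), title_tokens(title_b)
    if not a and not b:
        return 1.0  # two empty titles are treated as identical
    return len(a & b) / len(a | b)

def likely_duplicates(title_a: str, title_b: str,
                      threshold: float = SIMILARITY_THRESHOLD) -> bool:
    return jaccard(title_a, title_b) >= threshold

# Hypothetical SKU titles created by different sellers.
sku_1 = "Apple iPhone 13 128GB Blue"
sku_2 = "iPhone 13 128GB Blue (Apple)"
sku_3 = "Samsung Galaxy S21 128GB Black"

print(jaccard(sku_1, sku_2))            # 1.0: same tokens, different order
print(likely_duplicates(sku_1, sku_3))  # False: only "128gb" is shared
```

Word order and punctuation are ignored here, which is why the reordered title matches. Real seller titles would likely also need handling for typos and attribute variants, which this sketch does not attempt.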
Current Solution
Manual checking and fixing are ineffective because of the size of the catalog and the speed at which new items appear.
In addition, only 6k items are labeled as masters and only 13k items have corresponding master information. This labeled data covers less than 1% of the total catalog.
- The algorithm will behave differently on each database, so it has to be analyzed in staging before it is used in production.
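One way a staging analysis could use the existing labels is to compare the algorithm's output with the known master assignments using pairwise precision and recall. This is a minimal sketch, assuming the labels are available as a mapping from item ID to master ID and the algorithm's output as a mapping from item ID to cluster ID; both mappings, the function names and the example data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def same_group_pairs(assignment: dict) -> set:
    """All unordered item pairs that share a group (master or cluster) ID."""
    groups = defaultdict(list)
    for item_id, group_id in assignment.items():
        groups[group_id].append(item_id)
    pairs = set()
    for members in groups.values():
        # Sorting keeps pair order consistent between predicted and labeled sets.
        pairs.update(combinations(sorted(members), 2))
    return pairs

def pairwise_precision_recall(predicted: dict, labeled: dict) -> tuple:
    """Score predicted clusters against labeled masters, on labeled items only."""
    common = [item_id for item_id in labeled if item_id in predicted]
    pred_pairs = same_group_pairs({i: predicted[i] for i in common})
    true_pairs = same_group_pairs({i: labeled[i] for i in common})
    hits = len(pred_pairs & true_pairs)
    # Convention: with no pairs to judge, the score defaults to 1.0.
    precision = hits / len(pred_pairs) if pred_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall

# Hypothetical labels: item ID -> master ID.
labeled = {"a": "M1", "b": "M1", "c": "M1", "d": "M2", "e": "M2"}
# Hypothetical algorithm output: item ID -> cluster ID.
predicted = {"a": "C1", "b": "C1", "c": "C2", "d": "C2", "e": "C3"}

print(pairwise_precision_recall(predicted, labeled))  # (0.5, 0.25)
```

Precision shows how often items grouped by the algorithm really share a master; recall shows how many of the labeled duplicate pairs the algorithm finds. Because the labels cover less than 1% of the catalog, these numbers only describe the labeled subset.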
Algorithm
Step | Description | Screenshot |
Short Overview | The proposed solution is based on recent text-mining approaches that have performed well in various competitions and production applications. In general, the process consists of two stages: | |