I recently came across this question in leetcode.
It looks naive at first glance but is tricky to do in time-complexity O(1).
Let us try to implement an acceptable solution which performs insert/delete in O(1) and access in O(n).
Further we will implement this paper which performs even access in O(1) using a beautiful yet simple data-structure.
Problem statement
Design and implement a data structure for Least Frequently Used (LFU) cache. It should support the following operations: get and set.
get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
set(key, value) - Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least frequently used item before inserting a new item. For the purpose of this problem, when there is a tie (i.e., two or more keys that have the same frequency), the least recently used key would be evicted.
Could you do both operations in O(1) time complexity?
Solution
Step 1: Store values and access frequencies
We need 2 things to start with
Map to store key-value pairs
Map to store counts/frequency of access
Now insert and access operations are O(1) i.e. they perform in constant time.
Step 2: Implement Evict method
How do we implement evict method?
When size of map reaches max capacity, we need to find item with lowest frequency count.
There are 2 problems:
We have to iterate through all values of frequencies map, find lowest count and remove corresponding key from both maps.
This will take O(n) time.
Also, what if there are multiple keys with same frequency count? How do we find least recently used?
That’s not possible because HashMap does not store the order of insertion.
To solve both of above problems we need to add one more data structure:
Sorted map with frequency as map-keys and ‘list of item-keys with same frequency’ as map-values.
Awesome! Problem solved:
We can add new item can to the end of the list with frequency 1.
We can find the list with lowest frequency in O(1), since map is sorted by frequencies.
We can delete the first item of the list (of lowest frequency) since that will be least recently used. Also O(1).
Thus, both insert and delete are now O(1) i.e. constant time operations.
Step 3: Store Item’s position
Problem
While solving our delete problem, we accidentally increased our access operation time to O(n). How?
Note that all of item-keys sharing same frequency are in a list. Now if one of these items is accessed,
how do we move it to list of next frequency? We will have to iterate through the list first to find the item,
which in worst-case will take O(n) operations.
To solve the problem, we somehow need to jump directly to that item in the list without iteration.
If we can do that, it will be easier to delete the item and add it to end of next frequency list.
Unfortunately, this is not possible using our in-built data structures.
We need to create a new one (mentioned in the paper).
Solution
We need to store each item’s position.
We will create a simple class which stores item’s key, value and its position in the list.
We will convert the linked list to
Step 4: Remove the counts HashMap
Note that we need the intermediate map called counts to jump to the appropriate list.
We can go one step further (code not written) to remove this extra data structure.
Convert frequencies HashMap keys into a doubly linked list
Add variable reference to each item, which points to corresponding frequency
So instead of counts hashmap, we can go get frequency node directly from the item itself.
This is precisely the algorithm implemented in this paper