Let's map every user_id to a smaller value in such a way that their relative order is preserved (i.e. we normalize them). Every timestamp query then describes a contiguous range of our log, so we can replace the timestamps with array indices into the log.
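As a sketch, the normalization step might look like the following (the `normalize` helper is a hypothetical name, not from the original): each distinct user_id is replaced by its rank among the sorted distinct ids, which keeps the order intact.

```python
def normalize(user_ids):
    """Map each user_id to its rank among the distinct ids (order-preserving)."""
    distinct = sorted(set(user_ids))
    rank = {uid: i for i, uid in enumerate(distinct)}
    return [rank[uid] for uid in user_ids]

# e.g. normalize([1000, 42, 1000, 7]) -> [2, 1, 2, 0]
```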
We can solve the problem offline first. Sort the queries (each of the form
(left, right) -> count the number of distinct user_ids in the range [left, right] of the log) by their right endpoint, increasingly. We can then process them like this:
j = 0
for i in range(0, q):  # queries are sorted by their right endpoint
    while j <= query[i].right:
        if last_position[value[j]] != -1:
            # value[j] appeared before: its old mark is no longer the last occurrence
            BIT.add(last_position[value[j]], -1)
        BIT.add(j, 1)  # mark j as the current last occurrence of value[j]
        last_position[value[j]] = j
        j = j + 1
    answer[query[i].original_index] = BIT.query(query[i].right) - BIT.query(query[i].left - 1)
Here value contains the normalized user_ids in their original order, and BIT is a binary indexed tree (Fenwick tree). The time complexity is O((Q + N) * log(N)).
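The offline sweep above can be written as a self-contained sketch. The `BIT` class and the `distinct_in_ranges` helper below are illustrative names, not from the original; queries are (left, right) pairs, inclusive and 0-indexed.

```python
class BIT:
    """Binary indexed tree over n positions, 0-indexed point updates and prefix sums."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def query(self, i):
        """Sum of positions [0, i]; returns 0 for i < 0."""
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

def distinct_in_ranges(value, queries):
    """Offline answers for 'count distinct values in value[left..right]'."""
    n = len(value)
    # process queries in increasing order of their right endpoint
    order = sorted(range(len(queries)), key=lambda q: queries[q][1])
    bit = BIT(n)
    last_position = {}
    answer = [0] * len(queries)
    j = 0
    for qi in order:
        left, right = queries[qi]
        while j <= right:
            v = value[j]
            if v in last_position:          # v appeared before: unmark old occurrence
                bit.add(last_position[v], -1)
            bit.add(j, 1)                   # j is now the last occurrence of v
            last_position[v] = j
            j += 1
        answer[qi] = bit.query(right) - bit.query(left - 1)
    return answer

# distinct_in_ranges([1, 2, 1, 3], [(0, 3), (1, 2), (2, 3)]) -> [3, 2, 2]
```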
We can improve this idea to handle online queries; the approach is very similar. Suppose
segTree is a persistent segment tree that supports point assignment and range-sum queries, map() converts a timestamp to its array index, and segTree[-1] denotes the empty (all-zero) tree.
for i in range(0, n):
    if last_position[value[i]] != -1:
        # value[i] appeared before: unmark its previous occurrence in the new version
        segTree[i] = segTree[i - 1].update(last_position[value[i]], 0)
    else:
        segTree[i] = segTree[i - 1]
    last_position[value[i]] = i
    segTree[i] = segTree[i].update(i, 1)  # mark i as the last occurrence of value[i]
for i in range(0, q):
    print(segTree[map(query_right)].sumOfRange(map(query_left), map(query_right)))
Every update creates O(log(N)) new nodes, so the space complexity is O(N * log(N)); the time complexity is the same as in the offline solution.
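A minimal persistent segment tree along these lines can be sketched as follows; the names `Node`, `build`, `update`, `preprocess`, and `count_distinct` are assumptions for illustration. Each `update` copies only the O(log N) nodes on the path to the changed leaf, sharing the rest with the previous version.

```python
class Node:
    __slots__ = ("left", "right", "sum")
    def __init__(self, left=None, right=None, s=0):
        self.left, self.right, self.sum = left, right, s

def build(lo, hi):
    """All-zero tree over positions [lo, hi]."""
    if lo == hi:
        return Node()
    mid = (lo + hi) // 2
    return Node(build(lo, mid), build(mid + 1, hi), 0)

def update(node, lo, hi, pos, val):
    """Return a new version with position pos set to val (old version untouched)."""
    if lo == hi:
        return Node(s=val)
    mid = (lo + hi) // 2
    if pos <= mid:
        l = update(node.left, lo, mid, pos, val)
        return Node(l, node.right, l.sum + node.right.sum)
    r = update(node.right, mid + 1, hi, pos, val)
    return Node(node.left, r, node.left.sum + r.sum)

def query(node, lo, hi, ql, qr):
    """Sum over [ql, qr] in this version."""
    if qr < lo or hi < ql:
        return 0
    if ql <= lo and hi <= qr:
        return node.sum
    mid = (lo + hi) // 2
    return query(node.left, lo, mid, ql, qr) + query(node.right, mid + 1, hi, ql, qr)

def preprocess(value):
    """versions[i + 1] marks, for the prefix value[0..i], the last occurrence of each value."""
    n = len(value)
    versions = [build(0, n - 1)]  # empty tree plays the role of segTree[-1]
    last_position = {}
    for i, v in enumerate(value):
        cur = versions[-1]
        if v in last_position:
            cur = update(cur, 0, n - 1, last_position[v], 0)
        cur = update(cur, 0, n - 1, i, 1)
        last_position[v] = i
        versions.append(cur)
    return versions

def count_distinct(versions, n, left, right):
    """Online query: number of distinct values in value[left..right]."""
    return query(versions[right + 1], 0, n - 1, left, right)
```

Because each query only reads the version indexed by its right endpoint, queries can arrive in any order, which is what makes the solution online.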