The objective of this hands-on is to play with Dynamic Programming in Rust. There are two problems to solve.
As the last hands-on of your class is due two weeks after Christmas, you plan to spend your Christmas holiday traveling around Europe, visiting different cities. You have a tour guide to Europe, which presents a different itinerary for each city. Each itinerary specifies how many different attractions can be visited per day. As an example, this is the itinerary for Florence
Day                   | 1 | 2 | 3 | 4 |
Number of attractions | 3 | 2 | 1 | 4 |
This means that if you spend two days in Florence you will have the chance to visit \(3 + 2 = 5\) different attractions. You want to visit as many attractions as you can, considering that you only have a limited number of days on vacation before the oral exam. Your task is to write a program to organize your holiday. Note that you must visit the attractions in the order provided by the guide, meaning that if you spend one day in Florence you will visit \(3\) attractions (i.e., you cannot “cherry pick” the \(4\) attractions of the last day).
You are provided with the number of attractions you can visit for each of the \(D\) days, in each city. The number of cities is \(n\). Your goal is to identify the maximum number of attractions the tourist can visit. The time complexity of your solution should be \(O(nD^2)\).
Here we have a set of tests with the following format.
The first line contains \(n\) and \(D\). Then, the following \(n\) lines contain each \(D\) different integer values and describe the itineraries \(I\).
The output is the maximum number of attractions that you can visit.
The input is
2 3 // n D
3 2 1 // Florence
3 1 1 // London
The output is
8 // 2 days in Florence, 1 day in London
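For concreteness, here is one way the \(O(nD^2)\) dynamic program can be sketched in Rust (the function name and layout are ours, not part of the assignment): best[j] stores the maximum number of attractions achievable with a budget of \(j\) days over the cities processed so far, and each city is folded in by trying every number of days \(k\) spent there.

```rust
pub fn max_attractions(itineraries: &[Vec<u64>], days: usize) -> u64 {
    // best[j] = maximum number of attractions achievable with a budget of
    // j days over the cities considered so far
    let mut best = vec![0u64; days + 1];
    for itinerary in itineraries {
        // prefix[k] = attractions seen by spending the first k days in this city
        let mut prefix = vec![0u64; days + 1];
        for k in 1..=days {
            prefix[k] = prefix[k - 1] + itinerary[k - 1];
        }
        let mut next = best.clone(); // the k = 0 case: skip this city entirely
        for j in 1..=days {
            for k in 1..=j {
                next[j] = next[j].max(best[j - k] + prefix[k]);
            }
        }
        best = next;
    }
    best[days]
}
```

On the example above, max_attractions(&[vec![3, 2, 1], vec![3, 1, 1]], 3) yields 8, matching two days in Florence and one in London.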
As this problem seems very challenging, you might be tempted to cancel the Christmas holiday around Europe you planned in the previous problem. If luck is on your side, you’ll still have a couple of days to visit Massaciuccoli lake.
A poor professor in the city of the leaning tower is tasked with preparing a new course. Armed with a list of potential topics to choose from, he knows the beauty \(b_i\) and the difficulty \(d_i\) of each topic \(i\).
As students can be picky (just joking, in case any touchy ones are reading!), they appreciate a course only if each lecture is more beautiful than the previous one. Moreover, adhering to pedagogical principles, the topics must exhibit increasing levels of difficulty.
The poor professor’s objective is to select the maximum number of topics for his upcoming course.
Your challenge is to devise an efficient algorithm to determine this maximum number of selected topics.
Here we have a set of tests with the following format.
The first line contains \(n\). Each of the next \(n\) lines contains the beauty \(b\) and the difficulty \(d\), one for each topic.
The output is the largest number of selected topics.
The input is
5 // n
0 3 // beauty 0 and difficulty 3. Write me an email if you know what this topic is.
99 1 // Fenwick tree?
11 20
1 2
10 5
The output is
3
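One hedged sketch of a possible solution (the function name is ours): sort the topics by difficulty, breaking ties by decreasing beauty, and then compute the longest strictly increasing subsequence of the beauties with the classic patience/binary-search technique. The tie-breaking guarantees that two topics with equal difficulty can never both appear in a strictly increasing run of beauties.

```rust
pub fn max_topics(mut topics: Vec<(u64, u64)>) -> usize {
    // Sort by difficulty ascending; for equal difficulty, beauty descending,
    // so that a strict LIS on beauties never picks two topics of equal difficulty.
    topics.sort_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)));
    // tails[k] = smallest possible last beauty of a strictly increasing
    // subsequence of length k + 1; tails is kept strictly sorted.
    let mut tails: Vec<u64> = Vec::new();
    for (b, _) in topics {
        match tails.binary_search(&b) {
            Ok(_) => {} // equal beauty cannot extend a strictly increasing run
            Err(pos) => {
                if pos == tails.len() {
                    tails.push(b);
                } else {
                    tails[pos] = b; // improve the tail of a run of length pos + 1
                }
            }
        }
    }
    tails.len()
}
```

On the example above, the selected topics can be \((1,2)\), \((10,5)\), \((11,20)\), so the answer is \(3\). The sorting and the binary searches give an overall \(O(n \log n)\) running time.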
Submit the main files, a file lib.rs, and a file Handson_3_solution_YOUR_NAME.pdf to rossano.venturini@gmail.com by 06/01/2024.

The report Handson_3_solution_YOUR_NAME.pdf briefly describes your solutions, your implementations, and an analysis of their time and space complexities. Add references to any relevant source you consulted to find your solutions or to develop their implementations.
Before submitting your solutions, run cargo fmt to format your code and cargo clippy to check it.

Very important! You are allowed to verbally discuss solutions with other students, BUT you must implement all solutions by yourself. Therefore, sharing implementations with others is strictly forbidden.
The objective of this hands-on is to implement and play with Segment Trees in Rust. There are two problems to solve.
You are given an array \(A[1,n]\) of \(n\) positive integers, each integer is at most \(n\). You have to build a data structure to answer two different types of queries:
- Update(i, j, T) replaces every value \(A[k]\) with \(\min(A[k], T)\), where \(i\leq k \leq j\);
- Max(i, j) returns the largest value in \(A[i\ldots j]\).

You are also given \(m\) of these queries to solve. The target solution must run in \(O((n+m) \log n)\) time.
Here we have a set of tests with the following format.
The first line contains \(n\) and \(m\). The next line contains the \(n\) integers in \(A\).
Each of the subsequent \(m\) lines contains a query. The first value of each line is either \(0\) (query Update) or \(1\) (query Max). For a query Update, the values of \(i\), \(j\), and \(T\) follow. For a query Max, the values of \(i\) and \(j\) follow.
The output consists of the results of the Max queries.
The input is
5 3 // n m
5 1 4 3 2 // The array A
0 1 2 2 // Update(1, 2, 2). The array A becomes 2 1 4 3 2.
1 2 4 // Max(2, 4) = 4
1 1 2 // Max(1, 2) = 2
The output is
4
2
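One possible way to meet the target complexity (a sketch under our own naming, not necessarily the intended solution): since the only query is Max, an Update(i, j, T) can be recorded lazily as a “cap” tag on segment-tree nodes, because taking the min with \(T\) commutes with taking the max of a segment. Indices below are 0-based, while the examples above are 1-based.

```rust
pub struct SegTree {
    n: usize,
    max: Vec<i64>, // max[v] = maximum value in the segment of node v
    cap: Vec<i64>, // pending min-with-cap lazy tag (i64::MAX means "no tag")
}

impl SegTree {
    pub fn new(a: &[i64]) -> Self {
        let n = a.len();
        let mut t = Self { n, max: vec![i64::MIN; 4 * n], cap: vec![i64::MAX; 4 * n] };
        t.build(1, 0, n - 1, a);
        t
    }

    fn build(&mut self, v: usize, l: usize, r: usize, a: &[i64]) {
        if l == r {
            self.max[v] = a[l];
            return;
        }
        let m = (l + r) / 2;
        self.build(2 * v, l, m, a);
        self.build(2 * v + 1, m + 1, r, a);
        self.max[v] = self.max[2 * v].max(self.max[2 * v + 1]);
    }

    fn apply(&mut self, v: usize, t: i64) {
        self.max[v] = self.max[v].min(t);
        self.cap[v] = self.cap[v].min(t);
    }

    // Push the pending tag down to the children before visiting them.
    fn push(&mut self, v: usize) {
        if self.cap[v] != i64::MAX {
            let t = self.cap[v];
            self.apply(2 * v, t);
            self.apply(2 * v + 1, t);
            self.cap[v] = i64::MAX;
        }
    }

    fn update(&mut self, v: usize, l: usize, r: usize, i: usize, j: usize, t: i64) {
        if j < l || r < i {
            return;
        }
        if i <= l && r <= j {
            self.apply(v, t);
            return;
        }
        self.push(v);
        let m = (l + r) / 2;
        self.update(2 * v, l, m, i, j, t);
        self.update(2 * v + 1, m + 1, r, i, j, t);
        self.max[v] = self.max[2 * v].max(self.max[2 * v + 1]);
    }

    fn query(&mut self, v: usize, l: usize, r: usize, i: usize, j: usize) -> i64 {
        if j < l || r < i {
            return i64::MIN;
        }
        if i <= l && r <= j {
            return self.max[v];
        }
        self.push(v);
        let m = (l + r) / 2;
        let left = self.query(2 * v, l, m, i, j);
        let right = self.query(2 * v + 1, m + 1, r, i, j);
        left.max(right)
    }

    /// Update(i, j, T): A[k] = min(A[k], t) for i <= k <= j (0-based).
    pub fn update_range(&mut self, i: usize, j: usize, t: i64) {
        let n = self.n;
        self.update(1, 0, n - 1, i, j, t);
    }

    /// Max(i, j): largest value in A[i..=j] (0-based).
    pub fn max_range(&mut self, i: usize, j: usize) -> i64 {
        let n = self.n;
        self.query(1, 0, n - 1, i, j)
    }
}
```

Each operation touches \(O(\log n)\) nodes, giving the \(O((n+m)\log n)\) target overall.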
You are given \(n\) segments. A segment \(\langle l, r\rangle\) is such that \(0 \leq l \leq r \leq n-1\). Then, you are given \(m\) queries IsThere.
A query IsThere(i, j, k) has to return \(1\) if there exists a position \(p\), with \(0 \leq i \leq p \leq j \leq n-1\), such that exactly \(k\) segments contain position \(p\), and \(0\) otherwise.
The solution must run in \(O((n+m)\log n)\) time.
Here we have a set of tests with the following format.
The first line contains \(n\) and \(m\). Each of the next \(n\) lines contains a pair of integers \(\langle l, r\rangle\), one for each segment. Finally, there will be \(m\) lines, one for each query. Each of these lines contains \(i\), \(j\), and \(k\), separated by a space.
The output is the result of each query, in input order.
The input is
5 4 // n m
0 4 // segments
1 3
1 2
1 1
0 0
0 4 4 // i j k
0 4 0
1 3 1
1 4 1
The output is
1
0
0
1
Submit the files main.rs and lib.rs and a file Handson_2_solution_YOUR_NAME.pdf to rossano.venturini@gmail.com by 27/11/2023.
main.rs and lib.rs contain your implementations. Handson_2_solution_YOUR_NAME.pdf briefly describes your solutions, your implementations, and an analysis of their time and space complexities. Add references to any relevant source you consulted to find your solutions or to develop their implementations.

Before submitting your solutions, run cargo fmt to format your code and cargo clippy to check it.

Very important! You are allowed to verbally discuss solutions with other students, BUT you must implement all solutions by yourself. Therefore, sharing implementations with others is strictly forbidden.
Mo’s algorithm typically achieves a time complexity of \(O((n+q)\sqrt{n})\), where \(n\) represents the size of the dataset, and \(q\) is the number of queries.
Let’s consider the following problem.
We are given an array \(A[1, n]\) of integers and our goal is to solve \(q\) queries power. For a query power(l, r), we have to compute the “power” of the subarray \(A[l, r]\). For each integer \(s\) within this subarray, let \(K_s\) represent its number of occurrences. The subarray’s power is defined as the sum of the products \(s \cdot K_s \cdot K_s\) for every positive integer \(s\) that appears in the subarray.
Our goal is to achieve a time complexity of \(\Theta((n+q)\sqrt{n})\) to solve all the \(q\) queries. This may appear quite challenging, and you might even wonder where the factor \(\sqrt{n}\) comes from.
For now, let’s temporarily set aside the current problem and begin by introducing Mo’s algorithm with a simpler one. By the end of these notes, you’ll be astonished at how straightforward this problem becomes with the right algorithmic tool.
For many types of range queries, such as RangeSum, RMQ, Distinct, and others, there exist suitable data structures (like the Segment Tree) to answer queries efficiently and online.
Solving a query online means that the data structure answers the query as soon as it is presented, without any delay. However, for some more complex query types, such online-efficient data structures do not exist.
For certain query types, the best we can hope for is an efficient solution that works effectively only when handling a sufficiently large batch of queries. This way, the solution can process the queries in the order it deems most favorable. With such solutions, the time complexity of an individual query is low only in an amortized sense.
Mo’s algorithm is one of these strategies: if the batch consists of \(q = \Omega(n)\) queries, each query can be solved in \(\Theta(\sqrt{n})\) amortized time.
Consider now the following problem.
We are given an array \(A[0,n-1]\) consisting of colors, with each color represented by an integer within \([0, n-1]\). Additionally, we are given a set of \(q\) range queries called three_or_more. The query three_or_more(l, r) counts the colors that occur at least three times within the subarray \(A[l, r]\).
Let’s begin by examining a straightforward algorithm that addresses a query three_or_more(l, r) by scanning the subarray \(A[l, r]\). The algorithm maintains an array of counters to track the number of occurrences of each color within the query range. Whenever a color reaches three occurrences, the answer is incremented by one.
Below is a Rust implementation of this strategy.
pub fn three_or_more_slow(a: &[usize], queries: &[(usize, usize)]) -> Vec<usize> {
let mut counters: Vec<usize> = vec![0; a.len()];
let mut answers = Vec::with_capacity(queries.len());
for &(l, r) in queries {
let answer = a[l..=r].iter().fold(0, |ans, &color| {
counters[color] += 1;
if counters[color] == 3 {
ans + 1
} else {
ans
}
});
answers.push(answer);
a[l..=r].iter().for_each(|&color| counters[color] = 0);
}
answers
}
Observe that, after each query, it’s essential to reset the vector of counters. In the above implementation, this reset is done using the code snippet a[l..=r].iter().for_each(|&color| counters[color] = 0). What’s noteworthy is that this method selectively resets only the counters associated with colors within the queried subarray. This approach ensures that the time spent on resetting is proportional to the size of the queried range, rather than the length of counters. Consequently, this gives a better running time when dealing with short queried subarrays. However, this minor optimization doesn’t change the worst-case time complexity: the algorithm is very slooooooow.
Indeed, it’s evident that it has a time complexity of \(\Theta(qn)\). The figure below illustrates an input that showcases the worst-case running time. We have \(n\) queries. The first query range has a length of \(n\) and spans the entire array. Then, the subsequent query ranges are each one unit shorter, until the last one, which has a length of one. The total length of these ranges is \(\Theta(n^2)\), which is also the time complexity of the solution.
Let’s now introduce a different way of implementing the inefficient algorithm above. At first glance, this may appear to be just a more convoluted way of implementing the same strategy, seemingly offering no advantage in terms of worst-case running time. However, as we will see later on, we can achieve a significantly improved time complexity just by strategically rearranging the queries.
Suppose we have just answered the query for the range \([l', r']\) and are now addressing the query for the range \([l, r]\). Instead of starting from scratch, we can update the previous answer and counters by adding or removing the contributions of colors that are in the new query range but not in the previous one, or vice versa. Specifically, for the left endpoints, we must remove all the colors in \(A[l', l-1]\) if \(l' < l\), or we need to add all the colors in \(A[l, l'-1]\) if \(l < l'\). The same applies to the right endpoints \(r\) and \(r'\).
The Rust implementation below utilizes two closures, add and remove, to keep answer and counters updated as we adjust the endpoints.
pub fn three_or_more(a: &[usize], queries: &[(usize, usize)]) -> Vec<usize> {
let mut counters: Vec<usize> = vec![0; a.len()];
let mut answers = Vec::with_capacity(queries.len());
let mut cur_l = 0;
let mut cur_r = 0; // here right endpoint is excluded
let mut answer = 0;
for &(l, r) in queries {
let mut add = |i| {
counters[a[i]] += 1;
if counters[a[i]] == 3 {
answer += 1
}
};
while cur_l > l {
cur_l -= 1;
add(cur_l);
}
while cur_r <= r {
add(cur_r);
cur_r += 1;
}
let mut remove = |i| {
counters[a[i]] -= 1;
if counters[a[i]] == 2 {
answer -= 1
}
};
while cur_l < l {
remove(cur_l);
cur_l += 1;
}
while cur_r > r + 1 {
cur_r -= 1;
remove(cur_r);
}
answers.push(answer);
}
answers
}
The time complexity of this algorithm remains \(\Theta(qn)\). However, we observe that a query now executes more quickly if its range significantly overlaps with the range of the previous query.
This effect is perfectly illustrated by the input of the previous figure. This input becomes a best case for the new implementation, as it takes \(\Theta(n)\) time overall. Indeed, after spending linear time on the first query, any subsequent query is answered in constant time.
This implementation is highly sensitive to the ordering of the queries. It is enough to modify the ordering of the above queries, as shown in the figure below, to revert to quadratic time. In the example below, we rearrange the queries to alternate between a long and a short query. With this ordering, the new implementation takes \(\Theta(n^2)\) time.
These considerations lead to a question: if we have a sufficient number of queries, can we rearrange them in a way that exploits the overlap between successive queries to gain an asymptotic advantage in the overall running time?
Mo’s algorithm answers this question positively by providing a reordering of the queries such that the time complexity reduces to \(\Theta((q+n)\sqrt{n})\).
The idea is to conceptually partition the array \(A\) into \(\sqrt{n}\) buckets, each with a size of \(\sqrt{n}\), named \(B_1, B_2, \ldots, B_{\sqrt{n}}\). A query belongs to bucket \(B_k\) if and only if its left endpoint \(l\) falls into the \(k\)-th bucket, which can be expressed as \(\lfloor l/\sqrt{n} \rfloor = k\).
Initially, we group the queries based on their corresponding buckets, and within each bucket, the queries are solved in ascending order of their right endpoints.
The figure shows this bucketing approach and the queries of one bucket sorted by their right endpoints.
Now, let’s analyze the time complexity of the algorithm with this query reordering. It’s sufficient to count the number of times we move the indexes cur_l and cur_r. This is because both add and remove take constant time, and, thus, the time complexity is proportional to the overall number of moves of these two indexes.
Let’s concentrate on a specific bucket. As we process the queries in ascending order of their right endpoints, the index cur_r moves a total of at most \(n\) times. On the other hand, the index cur_l can both increase and decrease, but it is constrained within the bucket, and thus it cannot move more than \(\sqrt{n}\) times per query. Thus, for a bucket with \(b\) queries, the overall time to process its queries is \(\Theta(b\sqrt{n} + n)\).
Summing up over all buckets, the time complexity is \(\Theta(q\sqrt{n} + n\sqrt{n})\), which results in \(\Theta(\sqrt{n})\) amortized time per query when \(q = \Omega(n)\).
Here’s a Rust implementation of the reordering process. We sort the queries by buckets, using their left endpoints, and within the same bucket, we sort them in ascending order of the right endpoints. We also have to compute a permutation to keep track of how the queries have been reordered. This permutation is essential for returning the answers in their original ordering.
pub fn mos(a: &[usize], queries: &[(usize, usize)]) -> Vec<usize> {
// Sort the queries by bucket and get the permutation induced by this sorting.
// The latter is needed to permute the answers back to the original ordering
let mut sorted_queries: Vec<_> = queries.iter().cloned().collect();
let mut permutation: Vec<usize> = (0..queries.len()).collect();
let sqrt_n = (a.len() as f64).sqrt() as usize + 1;
sorted_queries.sort_by_key(|&(l, r)| (l / sqrt_n, r));
permutation.sort_by_key(|&i| (queries[i].0 / sqrt_n, queries[i].1));
let answers = three_or_more(a, &sorted_queries);
let mut permuted_answers = vec![0; answers.len()];
for (i, answer) in permutation.into_iter().zip(answers) {
permuted_answers[i] = answer;
}
permuted_answers
}
As promised, the challenging problem introduced above no longer seems that hard. Just use Mo’s algorithm and a little bit of care in updating the answer after an add or a remove.
Mo’s algorithm is an offline approach, which means we cannot use it when we are constrained to a specific order of queries or when update operations are involved.
When implementing Mo’s algorithm, the most challenging aspect is implementing the functions add and remove. There are query types for which these operations are not as straightforward as in the previous problems and require the use of more advanced data structures than just an array of counters. One of these cases is the range minimum query (RMQ).
For RMQ, the addition and removal of an element requires maintaining the elements of the current range in a Min-Heap, which increases the query time by a factor of \(\log n\). Consequently, in this case, the amortized time per query is \(\Theta(\sqrt{n}\log n)\), which is much worse than the ad hoc (and online) solution using a segment tree. This shouldn’t come as a surprise, as ad hoc solutions that leverage specific properties of the problem at hand can often outperform general techniques like Mo’s algorithm.
To conclude, let’s consider an exercise that teaches the use of Mo’s algorithm to solve queries on a tree.
You have a rooted tree consisting of \(n\) vertices. Each vertex of the tree has some color. We will assume that the tree vertices are numbered by integers from \(1\) to \(n\). Then we represent the color of vertex \(v\) as \(c_v\). The tree root is the vertex with number \(1\).
We need to answer \(m\) queries. Each query is described by two integers \(v_j, k_j\). The answer to query \(v_j, k_j\) is the number of colors \(c\) that occur at least \(k_j\) times in the subtree of vertex \(v_j\).
This problem can be solved in \(\Theta((m+n)\sqrt{n})\) time with Mo’s algorithm. How?
We should note that for this problem there exists a more advanced solution which runs in \(\Theta((n+q)\log n)\) time. This solution uses the heavy-light decomposition of the tree. How?
More precisely, the Fenwick tree solves the following problem.
We have an array \(A[1,n]\) of integers, and we would like to support the following operations:
- sum(i) returns the sum of the elements in \(A[1..i]\);
- add(i, v) adds the value \(v\) to the entry \(A[i]\).

The Fenwick tree efficiently handles these queries in \(\Theta(\log n)\) time while using linear space. In fact, the Fenwick tree is an implicit data structure, which means it requires only \(O(1)\) additional space in addition to the space needed to store the input data (the array \(A\) in our case).
In our description, we are going to use the following array \(A\) as a running example. Notice that we are using one-based indexing for the array.
Let’s describe two trivial solutions for the problem above.
The first solution simply stores \(A\) as it is. This way, sum(i) is solved by scanning the array in \(\Theta(n)\) time, and add(i, v) is solved in \(O(1)\) time.
The second solution, instead, stores the prefix sums of \(A\). This way, sum(i) is solved in \(O(1)\) time, and add(i, v) is solved by modifying all the entries from position \(i\) to \(n\) in \(\Theta(n)\) time.
The sum/add query-time tradeoffs of these solutions are clearly unsatisfactory.
The Fenwick Tree provides better tradeoffs for this problem. In our description, we will gradually introduce this data structure by constructing it level by level.
To start, let’s simplify the original problem slightly. In this variant, we’ll focus on solving sum queries only for positions that are powers of \(2\), like positions \(1\), \(2\), \(4\), and \(8\) in our array \(A\). The solution of this variant will be the first level of our Fenwick Tree.
The idea for solving this relaxed variant is to sparsify the second trivial solution above, storing only the prefix sums of positions that we need for queries. The figure below illustrates this solution as a tree, with a fictitious root node named \(0\) and child nodes named \(1\), \(2\), \(4\), and \(8\), each storing the sum up to the corresponding power of \(2\). Additionally, below every node, we provide the range of positions it covers. For instance, node \(4\) covers positions in the range \([1, 4]\).
We address the queries of the simplified problem as follows:
The sum(i) query is straightforward: we simply access node \(i\). Of course, this only works for indexes \(i\) that are a power of \(2\).
For the add(i, v) query, we need to add \(v\) to all nodes covering ranges that include position \(i\). For example, for the query add(3, 10), we add the value \(10\) to nodes \(4\) and \(8\). In general, first we have to find the smallest power of \(2\) greater than or equal to \(i\), let’s call it \(j\). Then, we add \(v\) to nodes \(j, 2j, 2^2j, 2^3j, \ldots\).
Observe that sum takes constant time and add takes \(\Theta(\log n)\) time. Hooray! We are within our target time complexity. Now, can we extend this solution to support sum queries on more positions?
We observe that we’re not currently supporting queries for positions within the ranges between consecutive powers of \(2\). For instance, positions in the range \([5,7]\), which fall between \(2^2\) and \(2^3\).
But wait! Enabling queries for this subarray is just a smaller instance of our original problem. Therefore, we can apply the same strategy by adding a new level to our tree. If the subarray is \(A[l..r]\), the new level will support the sum(i) query for any \(i\) such that \(i-l+1\) is a power of \(2\).
Our two-level tree can now handle sum(i) queries also for positions that are the sum of two powers of \(2\). Why? Consider a position \(i\) expressed as \(2^{k'}+2^{k}\), where \(k'>k\). We can decompose the range \([1,i]\) into two subranges: \([1,2^{k'}]\) and \([2^{k'}+1,2^{k'}+2^{k}=i]\). Both of these subranges are covered by nodes in our tree. Specifically, range \([1,2^{k'}]\) is covered by node \(2^{k'}\) at the first level, while \([2^{k'}+1,2^{k'}+2^{k}=i]\) is covered by node \(i\) at the second level.
For example, let’s consider the query sum(5). We can handle this in our two-level tree because \(5=2^2+2^0\). Consequently, the range \([1,5]\) is divided into \([1,4]\) and \([5,5]\), and the result (which is \(6\)) is obtained by summing the values of nodes \(2^2=4\) and \(2^2+2^0=5\).
Which positions are still not supported for sum queries? Positions that are neither powers of \(2\) nor the sum of two powers of \(2\). In our example, with \(n=8\), only position \(7=2^2+2^1+2^0\) falls into this category. So, what do we do next? We add a new level to our tree to support queries for positions that are the sum of three powers of \(2\).
That’s all: this is the Fenwick tree for the array \(A\).
Now, let’s delve into the details of how to solve our sum and add queries on a Fenwick tree.

The sum query

Let’s start by discussing the sum(i) query. Based on the previous discussion, solving this query involves beginning at node \(i\) and traversing up the tree to reach node \(0\). Thus, sum takes time proportional to the height of the tree, resulting in a time complexity of \(\Theta(\log n)\).
For a running example, let’s take the case where \(i=7\). We start at node \(7\) and move to its parent (node \(6\)), its grandparent (node \(4\)), and stop at its great-grandparent (the fictitious node \(0\)), summing their values along the way. This works because the ranges of these nodes (\([1,4]\), \([5,6]\), and \([7,7]\)) collectively cover the queried range \([1,7]\).
It’s important to note that answering a sum query would be straightforward if we were allowed to store the tree’s structure. However, a significant part of the Fenwick tree’s elegance lies in the fact that storing the tree is not actually necessary: we can efficiently navigate from a node to its parent using a few bit-tricks.
This is the reason why the Fenwick tree is also referred to as the Binary Indexed Tree.
We want to compute the parent of a node, and we want to do it quickly and without representing the structure of the tree.
Let’s examine the binary representations of the IDs of the nodes involved in answering the previous query.
Can you spot a pattern? Surprisingly, the binary representation of a node’s parent can be obtained by removing the trailing one (i.e., the rightmost bit set to 1) from the binary representation of any of its children.
Let’s explore why this method works.
Suppose we have a node \(i\), and its range is \([j,i]\) for some \(j\). Its children will be nodes \(i+2^0\), \(i+2^1\), \(i+2^2\), and so on, spanning ranges \([j+1, i+2^0]\), \([j+1, i+2^1]\), \([j+1, i+2^2]\), and so forth. The binary representation of any of these children is identical to that of \(i\), except for the addition of the trailing one (due to the term \(2^k\)).
Now, we need a clever bit-trick to efficiently obtain the parent of a node. Based on our previous discussion, it’s evident that we need a way to remove the trailing one from the binary representation of a node \(i\). The trailing one can be isolated by computing \(k = i {\tt \&} -i\). Thus, \(i-k\) is the parent of \(i\).
In fact, negative numbers are represented in two’s complement form. In this representation, the two’s complement of a number is obtained by taking the bitwise complement of the number and then adding one to it.
For instance, if we have the binary number \(7\) as 0111, its two’s complement, which represents \(-7\), is 1001.
The key property of the two’s complement is that it inverts all the bits to the left of the trailing one, while the trailing one itself (and the zeros to its right) are preserved. Thus, when we compute the logical AND of a number and its two’s complement, only the trailing one survives. Therefore, the final subtraction \(i-k\) effectively cancels out this bit from \(i\), as required.
For example, for \(i=6\) (binary 0110), \(-i\) is 1010, so \(k = i\,{\tt \&}\,{-i} = \) 0010 \(= 2\), and the parent of node \(6\) is \(6-2=4\).
The add query

Now, let’s consider the operation add(i, v). We need to add the value \(v\) to each node whose range includes the position \(i\).
Certainly, node \(i\) is one of these nodes since its range ends at \(i\). Additionally, the right siblings of node \(i\) also encompass the position \(i\) in their ranges. This is because siblings share the same starting position, and right siblings have increasing sizes. The right siblings of the parent of node \(i\), the right siblings of the grandparent, and so on also contain position \(i\).
It might seem like we have to modify a large number of nodes. However, a simple observation reveals that this number is at most \(\log n\). This is because, each time we move from a node to its right sibling or to the right sibling of its parent, the size of the covered range at least doubles. And a range cannot double more than \(\log n\) times.
The figure below shows in red the nodes to modify for the operation add(5, _).
Now that we know which nodes to modify for add(i, _), let’s discuss how to compute them with bit-tricks.
Continuing the above example, starting from \(i=5\), the next node to modify is its right sibling, node \(6\). Let’s take a closer look at their binary representations.
Can you spot a pattern?
It seems that we need to isolate the trailing one in \(5\), which is 0001, and add it to \(5\) to obtain \(6\). Is this always the correct approach?
Let’s try it with another node. The right sibling of the parent of \(6\) (and, therefore, of \(5\)) is \(8\).
The trailing one in \(6\) is 0010 (i.e., \(2\)) and \(6+2=8\). Cool!
Why is this method correct? The binary representation of a node and its siblings matches, except for the position of the trailing one. When we move from a node to its right sibling, this trailing one shifts one position to the left. Adding this trailing one to a node accomplishes the required shift, as seen when we add \(5\) to its trailing one.
Now, consider the ID of a node that is the last child of its parent. In this case, the trailing one and the next set bit are adjacent. To obtain the right sibling of its parent, we need to remove the trailing one and shift this second bit one position to the left.
Thankfully, this effect is once again achieved by adding the trailing one to the node’s ID.
The time complexity of the add query is \(\Theta(\log n)\): each time we move to the right sibling of the current node or to the right sibling of its parent, the trailing one in its binary representation shifts at least one position to the left. This can occur at most \(\lfloor \log n \rfloor +1\) times.
Here, we present a minimal Rust implementation of a Fenwick tree. In this non-generic implementation, we’ve arbitrarily chosen to use i64 as the type for the values. While we’ve transitioned to 0-based indexing for queries, internally, we still use the 1-based indexing to maintain consistency with the notes.
For a more advanced implementation, it could be required to allow generic types and move away from the 1-based indexing. Additionally, there are various potential optimizations to enhance its performance. For more details, refer to Practical trade-offs for the prefix-sum problem.
#[derive(Debug)]
pub struct FenwickTree {
tree: Vec<i64>,
}
impl FenwickTree {
pub fn with_len(n: usize) -> Self {
Self {
tree: vec![0; n + 1],
}
}
pub fn len(&self) -> usize {
self.tree.len() - 1
}
/// Indexing is 0-based, even if internally we use 1-based indexing
pub fn add(&mut self, i: usize, delta: i64) {
let mut i = i + 1;
assert!(i < self.tree.len());
while i < self.tree.len() {
self.tree[i] += delta;
i = Self::next_sibling(i);
}
}
/// Indexing is 0-based, even if internally we use 1-based indexing
pub fn sum(&self, i: usize) -> i64 {
let mut i = i + 1;
assert!(i < self.tree.len());
let mut sum = 0;
while i != 0 {
sum += self.tree[i];
i = Self::parent(i);
}
sum
}
pub fn range_sum(&self, l: usize, r: usize) -> i64 {
self.sum(r) - if l == 0 { 0 } else { self.sum(l - 1) }
}
fn isolate_trailing_one(i: usize) -> usize {
if i == 0 {
0
} else {
1 << i.trailing_zeros()
}
}
fn parent(i: usize) -> usize {
i - Self::isolate_trailing_one(i)
}
fn next_sibling(i: usize) -> usize {
i + Self::isolate_trailing_one(i)
}
}
We now present three problems that can be solved with a Fenwick tree.
We are given an array \(A[1 .. n]\) of \(n\) positive integers. If \(1 \leq i < j \leq n\) and \(A[i] > A[j]\), then the pair \((i, j)\) is called an inversion of \(A\).
The goal is to count the number of inversions of \(A\).
We assume that the largest integer \(M\) in array \(A\) is in \(O(n)\). This assumption is important because we’re using a Fenwick Tree of size \(M\) and building such a data structure takes \(\Theta(M)\) time and space. If, on the other hand, \(M\) is too large, we need to sort array \(A\) and replace each element with its rank in the sorted array.
Then, we use a Fenwick tree on an array \(B\) with \(M\) elements, initially all set to \(0\). We scan array \(A\) from left to right. When processing \(A[j]\), we set \(B[A[j]]\) to \(1\). The number of elements larger than \(A[j]\) that we’ve already processed can be calculated using the range_sum(A[j]+1, M) function.
The running time is \(\Theta(n\log n)\). It’s worth noting that there is another popular solution with the same time complexity, which utilizes a variant of Merge Sort.
A Rust implementation is as follows.
pub fn counting_inversions(a: &[u64]) -> usize {
    if a.is_empty() {
        return 0;
    }
    let max = *a.iter().max().unwrap() as usize;
    let mut ft = FenwickTree::with_len(max + 1);
    let mut count: usize = 0;
    for &e in a {
        count += ft.range_sum((e + 1) as usize, max) as usize;
        ft.add(e as usize, 1);
    }
    count
}
This problem is from CodeForces.
We are given \(n\) segments: \([l_1, r_1], [l_2, r_2], \ldots, [l_n, r_n]\) on a line. There are no coinciding endpoints among the segments.
The task is to determine and report the number of other segments each segment contains.
We can restate the problem as follows: For the \(i\)-th segment, we want to count the number of segments \(j\) such that the following conditions hold: \(l_i < l_j\) and \(r_j < r_i\).
This problem can be solved by combining the sweep line algorithm with a Fenwick tree. First, we build the Fenwick tree by adding \(1\) at each position that corresponds to the right endpoint of a segment. This way, sum(r) reports the number of segments that end in the range \([1,r]\).
Next, we let a sweep line process the segments in increasing order of their left endpoints. When we process the segment \([l_i,r_i]\), we compute sum\((r_i-1)\) as the result for the current segment. Before moving to the next segment, we add \(-1\) at position \(r_i\) to remove the contribution of the right endpoint of the current segment.
The claim is that sum\((r_i-1)\) is the number of segments contained in \([l_i,r_i]\). This is because all the segments that start before \(l_i\) have already been processed, and their right endpoints have been removed from the Fenwick tree. Therefore, sum\((r_i-1)\) counts exactly the segments that start after \(l_i\) and end before \(r_i\).
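A possible implementation of this sweep is sketched below. To keep the sketch self-contained, we include a condensed stand-in for the FenwickTree of these notes (0-based point add and prefix sum); the function contained_segments and its assumptions (distinct endpoints, \(l < r\), endpoint values bounded by the largest right endpoint) are ours.

```rust
// Minimal stand-in for the notes' FenwickTree: 0-based point add, prefix sum
struct Fenwick(Vec<i64>);

impl Fenwick {
    fn with_len(n: usize) -> Self {
        Fenwick(vec![0; n + 1])
    }
    fn add(&mut self, i: usize, v: i64) {
        let mut i = i + 1; // switch to the internal 1-based indexing
        while i < self.0.len() {
            self.0[i] += v;
            i += i & i.wrapping_neg();
        }
    }
    fn sum(&self, i: usize) -> i64 {
        let mut i = i + 1;
        let mut s = 0;
        while i != 0 {
            s += self.0[i];
            i -= i & i.wrapping_neg();
        }
        s
    }
}

/// For each segment, counts the number of other segments it contains.
/// Assumes distinct endpoints and l < r for every segment.
fn contained_segments(segments: &[(usize, usize)]) -> Vec<usize> {
    let max_r = segments.iter().map(|&(_, r)| r).max().unwrap_or(0);
    let mut ft = Fenwick::with_len(max_r + 1);
    // Add 1 at every right endpoint
    for &(_, r) in segments {
        ft.add(r, 1);
    }
    // Sweep the segments by increasing left endpoint
    let mut order: Vec<usize> = (0..segments.len()).collect();
    order.sort_unstable_by_key(|&i| segments[i].0);
    let mut result = vec![0; segments.len()];
    for i in order {
        let (_, r) = segments[i];
        // Right endpoints still in the tree belong to segments that start
        // after l_i, so sum(r - 1) counts exactly the contained segments
        result[i] = ft.sum(r - 1) as usize;
        ft.add(r, -1); // remove the current segment's right endpoint
    }
    result
}
```

For instance, on the segments \([1,8], [2,3], [4,7], [5,6]\), the sketch reports \(3, 0, 1, 0\) contained segments, respectively.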
This problem is from SPOJ.
We are given an array \(A[1,n]\); initially, all the entries are set to \(0\), and we would like to support two operations:
- access(i) returns \(A[i]\);
- range_update(l, r, v) updates the entries in \(A[l..r]\) by adding \(v\).
In this solution, we utilize a Fenwick tree on an array \(B[1,n]\), initially populated with zeros. For a range_update(l, r, v), we add \(v\) to \(B[l]\) and subtract \(v\) from \(B[r+1]\). This way, \(A[i]\) equals the prefix sum of \(B\) up to position \(i\).
A Rust implementation of this solution is the following.
#[derive(Debug)]
struct UpdateArray {
    ft: FenwickTree,
}

impl UpdateArray {
    pub fn with_len(n: usize) -> Self {
        Self {
            ft: FenwickTree::with_len(n),
        }
    }

    pub fn len(&self) -> usize {
        self.ft.len()
    }

    pub fn access(&self, i: usize) -> i64 {
        self.ft.sum(i)
    }

    pub fn range_update(&mut self, l: usize, r: usize, v: i64) {
        assert!(l <= r);
        assert!(r < self.ft.len());
        self.ft.add(l, v);
        if r + 1 < self.ft.len() {
            self.ft.add(r + 1, -v);
        }
    }
}
The range update of the previous problem is paired with the access(i) operation. This is easier than the problem we are going to solve in this section.
Here we want to support range_update(l, r, v) and sum(i) operations. Notice that access(i) is also immediately supported as sum(i) - sum(i-1). This pair of operations makes the problem harder than the previous one.
More formally, the problem is as follows.
Given an array \(A[1,n]\) of integers, we would like to support the following operations:
- sum(i) returns \(\sum_{k=1}^i A[k]\);
- range_update(l, r, v) updates the entries in \(A[l..r]\) by adding \(v\).
We notice that the add operation of the original Fenwick tree is just a special case of the range_update operation. Moreover, as mentioned above, the access(i) operation is also supported with two sum operations.
It’s evident that we can solve a range_update(l, r, v) with \(r-l+1\) add queries in \(\Theta((r-l+1)\log n)\) time. However, our goal is to achieve a time complexity of \(\Theta(\log n)\).
This time complexity is independent of the size of the updated range and, therefore, is much better than the previous one for large ranges.
The solution to this problem utilizes two Fenwick trees. Therefore, we require double the space to support the more powerful range_update(l, r, v) operation.
Initially, we consider a flawed solution using a single Fenwick tree, denoted as \(FT_1\). To fix the issues in this solution, we introduce a second Fenwick tree, denoted as \(FT_2\).
In our initial approach, we follow a strategy similar to the one used in the ‘Update the Array’ problem. For a range_update(l, r, v), we modify \(FT_1\) by adding \(v\) at position \(l\) and subtracting \(v\) at position \(r+1\). When querying sum(i), we multiply the result from \(FT_1\) by \(i\). This approach, however, leads to errors in the results.
Let’s consider a range_update(l, r, v) operation on a brand new Fenwick tree.
The correct results for a query sum(i) after the update are the following:
- for \(1 \leq i < l\), sum(i) is \(0\);
- for \(l \leq i \leq r\), sum(i) is \(v(i-l+1)\);
- for \(r < i \leq n\), sum(i) is \(v(r-l+1)\).
Instead, the results returned by our flawed implementation of sum(i) are the following:
- for \(1 \leq i < l\), sum(i) is \(0\);
- for \(l \leq i \leq r\), sum(i) is \(v\cdot i = v(l-1) + v(i-l+1)\);
- for \(r < i \leq n\), sum(i) is \((v-v)i = 0\).
Our initial implementation reports the correct results for \(1 \leq i < l\) but introduces errors in the other cases. Specifically, for \(l\leq i \leq r\), it includes an additional term \(v(l-1)\), while for \(r < i \leq n\) it erroneously reports \(0\) instead of the correct value \(v(r-l+1)\).
To address these errors, we introduce a second Fenwick tree, denoted as \(FT_2\), which will keep track of these discrepancies.
When we perform a range_update(l, r, v), we add \(-v(l-1)\) at position \(l\) and \(v\cdot r\) at position \(r+1\) in \(FT_2\).
This revised approach ensures that the result of sum(i) can be expressed as \(a\cdot i + b\), where \(a\) is the sum up to \(i\) in \(FT_1\) and \(b\) is the sum up to \(i\) in \(FT_2\).
The value of \(b\) from the second Fenwick tree corrects the errors present in the flawed solution. Specifically:
- for \(1 \leq i < l\), \(b = 0\) and the result stays \(0\);
- for \(l \leq i \leq r\), \(b = -v(l-1)\), which cancels the spurious term, giving \(v\cdot i - v(l-1) = v(i-l+1)\);
- for \(r < i \leq n\), \(a = v - v = 0\) and \(b = -v(l-1) + v\cdot r = v(r-l+1)\), which is exactly the correct value.
The Rust implementation of Fenwick tree with range update is as follows.
#[derive(Debug)]
struct RangeUpdate {
    ft1: FenwickTree,
    ft2: FenwickTree,
}

impl RangeUpdate {
    pub fn with_len(n: usize) -> Self {
        Self {
            ft1: FenwickTree::with_len(n),
            ft2: FenwickTree::with_len(n),
        }
    }

    pub fn len(&self) -> usize {
        self.ft1.len()
    }

    pub fn sum(&self, i: usize) -> i64 {
        self.ft1.sum(i) * i as i64 + self.ft2.sum(i)
    }

    pub fn access(&self, i: usize) -> i64 {
        self.sum(i) - if i == 0 { 0 } else { self.sum(i - 1) }
    }

    pub fn add(&mut self, i: usize, v: i64) {
        self.range_update(i, i, v)
    }

    pub fn range_update(&mut self, l: usize, r: usize, v: i64) {
        self.ft1.add(l, v);
        self.ft2.add(l, -v * (l as i64 - 1));
        if r + 1 < self.len() {
            self.ft1.add(r + 1, -v);
            self.ft2.add(r + 1, v * r as i64);
        }
    }
}
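As a quick sanity check of the \(a\cdot i + b\) formula, here is a condensed, self-contained sketch of the scheme. The Fenwick stand-in and the free-function formulation are ours; the update and query logic mirrors the two-tree approach described above.

```rust
// Minimal stand-in for the notes' FenwickTree: 0-based point add, prefix sum
struct Fenwick(Vec<i64>);

impl Fenwick {
    fn with_len(n: usize) -> Self {
        Fenwick(vec![0; n + 1])
    }
    fn add(&mut self, i: usize, v: i64) {
        let mut i = i + 1; // switch to the internal 1-based indexing
        while i < self.0.len() {
            self.0[i] += v;
            i += i & i.wrapping_neg();
        }
    }
    fn sum(&self, i: usize) -> i64 {
        let mut i = i + 1;
        let mut s = 0;
        while i != 0 {
            s += self.0[i];
            i -= i & i.wrapping_neg();
        }
        s
    }
}

// The two-tree scheme, written as free functions over FT1 and FT2
fn range_update(ft1: &mut Fenwick, ft2: &mut Fenwick, l: usize, r: usize, v: i64) {
    ft1.add(l, v);
    ft2.add(l, -v * (l as i64 - 1));
    if r + 1 < ft1.0.len() - 1 {
        ft1.add(r + 1, -v);
        ft2.add(r + 1, v * r as i64);
    }
}

// sum(i) = a * i + b, with a taken from FT1 and b from FT2
fn sum(ft1: &Fenwick, ft2: &Fenwick, i: usize) -> i64 {
    ft1.sum(i) * i as i64 + ft2.sum(i)
}
```

With \(n=8\) and a single range_update(2, 5, 10) (0-based), sum(1) is \(0\), sum(3) is \(20\), and sum(7) is \(40\), matching the three cases above.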
These notes are for the “Competitive Programming and Contests” course at Università di Pisa.
For an introduction to (static) prefix sums and their applications, take a look at ‘The Power of Prefix Sums’ post. ↩
The essence of prefix sums lies in transforming a given array of values into another array, where the element at a given index is the cumulative sum of the elements up to that index in the original array.
To be more formal, let’s assume we have an array \(A[1,n]\) of values, and our objective is to support the query range_sum(i,j), which returns the sum of the values in the subarray \(A[i..j]\).
For example, suppose you have an array \(A[1,8]\) with values [2, 4, 1, 7, 3, 0, 4, 2]. The query range_sum(2, 6) equals \(4+1+7+3+0 = 15\).
These queries can be solved in constant time by maintaining the prefix sum array. This array \(P[1,n]\) stores, at any position \(i\), the sum of the values in \(A\) up to the \(i\)th position. In other words, \(P[i] = \sum_{k=1}^i A[k]\).
The arrays \(A\) and \(P\) are shown in the figure below.
Armed with \(P\), a range_sum(i,j) query is resolved by calculating \(P[j]-P[i-1]\).
Continuing the example shown in the figure above, range_sum(2, 6) is \(P[6] - P[1] = 17 - 2 = 15\).
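The idea can be sketched as follows, with 1-based i and j as in the text over a 0-based Rust vector (the function names are ours):

```rust
// Build the prefix-sum array P in O(n): p[k] stores the sum of a[0..=k]
fn prefix_sums(a: &[i64]) -> Vec<i64> {
    let mut p = Vec::with_capacity(a.len());
    let mut sum = 0;
    for &e in a {
        sum += e;
        p.push(sum);
    }
    p
}

// range_sum(i, j) = P[j] - P[i-1], answered in O(1) (1-based i and j)
fn range_sum(p: &[i64], i: usize, j: usize) -> i64 {
    p[j - 1] - if i > 1 { p[i - 2] } else { 0 }
}
```

On the example array, range_sum(&p, 2, 6) returns \(15\), as computed above.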
In Rust, the combinator scan can produce the prefix sums (and much more) from an iterator.
scan is an iterator adapter that bears similarity to fold. Similar to fold, scan maintains an internal state, initially set to a seed value, which is modified by a closure taking both the current internal state and the current element from the iterator into account.
The distinction between scan and fold is that the former produces a new iterator with all the values taken by its internal state, whereas the latter only returns the final internal state.
The following code snippet illustrates how to employ scan for computing prefix sums.
let a = vec![2, 4, 1, 7, 3, 0, 4, 2];
let psums = a
    .iter()
    .scan(0, |sum, e| {
        *sum += e;
        Some(*sum)
    })
    .collect::<Vec<_>>();
assert!(psums.eq(&vec![2, 6, 7, 14, 17, 17, 21, 23]));
Range sum queries are exceptionally useful for solving a variety of other problems involving other kinds of range queries. We present here solutions to three problems from CodeForces, which serve as examples of the power of prefix sums.
Below, you’ll find links to these problems if you’d like to attempt them yourself before reading their solutions.
We have a string \(s=s_1s_2 \ldots s_n\) consisting only of characters \(a\) and \(b\) and we need to answer \(m\) queries.
Each query \(q(l, r)\), where \(1 \leq l < r \leq n\), asks for the number of positions \(i \in [l, r-1]\) such that \(s_i = s_{i+1}\).
Let’s consider an example to better illustrate this problem.
Given the string \(s = aabbbaaba\), consider the query \(q(3, 6)\). We are interested in the substring \(bbba\). The answer for this query is \(2\) because there are two positions followed by the same symbol, namely positions \(1\) and \(2\) in the substring.
The idea is to compute the binary vector \(B[1,n-1]\) such that \(B[i]=1\) if \(s_i = s_{i+1}\), and \(0\) otherwise. This way, the answer to the query \(q(l,r)\) is \(\sum_{i=l}^{r-1} B[i]\). Thus, each query can be solved in constant time by computing prefix sums on vector \(B\).
For example, the binary vector \(B\) for the string \(s = aabbbaaba\) is [1, 0, 1, 1, 0, 1, 0, 0]. Its prefix sum array \(P\) is [1, 1, 2, 3, 3, 4, 4, 4]. Therefore, the query \(q(3,6) = P[5]-P[2] = 3-1 = 2\).
The Rust implementation is as follows.
#[derive(Debug)]
struct Ilya {
    psums: Vec<usize>,
}

impl Ilya {
    pub fn new(s: &str) -> Self {
        let psums = s
            .as_bytes()
            .windows(2)
            .map(|w| if w[0] == w[1] { 1usize } else { 0usize })
            .scan(0, |sum, e| {
                *sum += e;
                Some(*sum)
            })
            .collect::<Vec<_>>();
        Self { psums }
    }

    // Queries use 0-based indexing
    pub fn q(&self, i: usize, j: usize) -> usize {
        assert!(i < j);
        assert!(j <= self.psums.len());
        self.psums[j - 1] - if i != 0 { self.psums[i - 1] } else { 0 }
    }
}
We are given an array \(A[1,n]\) and a set \(Q\) of \(q\) queries. Each query is a range sum query \(i,j\) which returns the sum of elements in \(A[i..j]\).
The goal is to permute the elements in \(A\) in order to maximize the sum of the results of the queries in \(Q\).
The main observation is that if we want to maximize the sum, we have to assign the largest values to the most frequently accessed entries. Thus, the solution consists of sorting both \(A\) by descending values and the indexes of \(A\) by descending frequency of access and pairing them in this order. Therefore, once we have computed the frequencies, the solution takes \(\Theta(n\log n)\) time.
Thus, we are left with the problem of computing access frequencies. In other words, we want to compute the array \(F[1,n]\), where \(F[i]\) is the number of times the index \(i\) belongs to a query of \(Q\). Computing this vector by updating every single entry in \(F\) for each query takes \(O(nq)\) and, thus, is clearly infeasible.
We require a faster algorithm to compute these frequencies. One possible solution involves using the sweep line algorithm. Since the queries represent intervals, and calculating the frequencies equates to counting the number of overlapping intervals at each position, we can employ an approach similar to the one used in solving the Maximum Number of Overlapping Intervals problem, as detailed in these notes.
This solution has a time complexity of \(\Theta(q\log q)\), due to the comparison-based sorting of interval endpoints. Since the endpoints in our problem have a maximum value of \(n\), we can optimize the solution to run in \(\Theta(q+n)\) time using counting sort. However, there exists an alternative solution based on prefix sums, which is much simpler to implement.
The main idea of this alternative solution is to construct an array \(U[1\ldots n]\) such that its prefix sums are equal to our target array \(F\). Interestingly, we need to modify just two entries of \(U\) to account for a query in \(Q\).
Initially, all the entries of \(U\) are set to \(0\). For a query \(\langle l, r \rangle\), we add \(1\) to \(U[l]\) and subtract \(1\) from \(U[r+1]\). This way, the contribution of this query to the prefix sums of \(U\) is \(0\) for the positions before \(l\), \(1\) for the positions in \([l, r]\), and \(0\) again for the positions after \(r\).
Therefore, the prefix sum of \(U\) up to \(i\) equals \(F[i]\). This algorithm takes \(O(q+n)\) time.
Here’s the Rust implementation.
// We assume queries are 0-based indexed
pub fn little_girl(a: &[i64], q: &[(usize, usize)]) -> i64 {
    if a.is_empty() {
        return 0;
    }
    let mut u = vec![0i64; a.len()];
    for &(l, r) in q {
        assert!(l <= r);
        assert!(r < u.len());
        u[l] += 1;
        if r + 1 < u.len() {
            u[r + 1] -= 1;
        }
    }
    let mut f = u
        .iter()
        .scan(0, |sum, e| {
            *sum += e;
            Some(*sum)
        })
        .collect::<Vec<_>>();
    // We sort both f and a in increasing order: pairing the largest values
    // with the most frequent positions this way is equivalent to sorting
    // both in decreasing order
    f.sort_unstable();
    let mut a_sorted = a.to_vec();
    a_sorted.sort_unstable();
    a_sorted
        .iter()
        .zip(f)
        .fold(0, |result, (value, freq)| result + value * freq)
}
Given an array \(A[1,n]\), count the number of ways to split the array into three contiguous parts so that they have the same sums.
More formally, you need to find the number of such pairs of indices \(i\) and \(j\) (\(2 \leq i \leq j \leq n-1\)) such that:
\[\sum_{k=1}^{i-1} A[k] = \sum_{k=i}^{j} A[k] = \sum_{k=j+1}^n A[k]\]For the solution, let \(S\) be the sum of the values in the array. If \(3\) does not divide \(S\), we conclude that the result is zero. Otherwise, we compute an array \(C\) that stores, at position \(i\), the number of suffixes of \(A[i\ldots n]\) that sum to \(\frac{S}{3}\). Then, we scan \(A\) from left to right to compute the prefix sums. Every time the prefix sum at position \(i\) is \(\frac{S}{3}\), we add \(C[i+2]\) to the result. This is because the part \(A[1..i]\) sums to \(S/3\) and can be combined with any pair of parts of \(A[i+1..n]\) where both parts sum to \(S/3\). Since the values in \(A[i+1..n]\) sum to \(\frac{2}{3}S\), the number of such pairs is the number of suffixes that sum to \(S/3\) in \(A[i+2..n]\). Indeed, if one of these suffixes, say \(A[j..n]\), sums to \(S/3\), then we are sure that \(A[i+1..j-1]\) sums to \(S/3\) as well.
Here’s a Rust implementation.
pub fn number_of_ways(a: &[i64]) -> usize {
    // Fewer than 3 elements cannot be split into three non-empty parts
    if a.len() < 3 {
        return 0;
    }
    let sum: i64 = a.iter().sum();
    if sum % 3 != 0 {
        return 0;
    }
    let target = sum / 3;
    let mut c: Vec<_> = a
        .iter()
        .rev()
        .scan(0, |sum, e| {
            *sum += e;
            Some(*sum)
        })
        .scan(0, |counter, sum| {
            if sum == target {
                *counter += 1usize
            };
            Some(*counter)
        })
        .collect();
    c.reverse();
    let mut result = 0;
    let mut sum = 0;
    for (i, &v) in a[..a.len() - 2].iter().enumerate() {
        sum += v;
        if sum == target {
            result += c[i + 2];
        }
    }
    result
}
Let’s start the description of this paradigm with a problem on a line.
We are given a set of \(n\) intervals \([s_i, e_i]\) on a line.
We say that two intervals \([s_i, e_i]\) and \([s_j, e_j]\) overlap if and only if their intersection is not empty, i.e., if there exists at least one point \(x\) belonging to both intervals.
The goal is to compute the maximum number of overlapping intervals.
For example, consider the set of intervals in the figure.
In this example, we have a set of \(10\) intervals. The maximum number of overlapping intervals is \(5\) (at positions \(3\) and \(4\)).
The sweep line algorithm employs an imaginary vertical line sweeping over the x-axis. As it progresses, we maintain a running solution to the problem at hand. The solution is updated when the vertical line reaches certain key points where some event happens. The type of the event tells us how to update the current solution.
To apply this paradigm to our problem, we let the sweep line move from left to right and stop at the beginning or the end of the intervals. These are the important points at which an event occurs: new intervals start or end. We also maintain a counter which keeps track of the number of intervals that are currently intersecting the sweep line, along with the maximum value reached by the counter so far. For each point, we first add to the counter the number of intervals that begin at that point, and then we subtract the number of intervals that end at that point.
The figure below shows the points touched by the sweep line and the values of the counter.
Note that the sweep line touches only points on the x-axis where an event occurs. For example, points \(1\) and \(6\) are not taken into consideration. This is important because the number of considered points, and thus the time complexity, is proportional to the number of intervals and not to the size of the x-axis.
Here is a Rust implementation. We represent each interesting point as a pair consisting of the point and its kind, which is either Begin or End. Then, we sort the vector of pairs in increasing order. Finally, we compute every state of the counter and its largest value. The correctness of the solution is based on a specific detail in the sorting step: since Begin is considered smaller than End, if two points are the same, we first have pairs with Begin and then pairs with End.
#[derive(PartialOrd, Ord, PartialEq, Eq, Debug)]
enum Event {
    Begin,
    End,
}

pub fn max_overlapping(intervals: &[(usize, usize)]) -> usize {
    let mut pairs: Vec<_> = intervals
        .iter()
        .flat_map(|&(b, e)| [(b, Event::Begin), (e, Event::End)])
        .collect();
    pairs.sort_unstable();
    pairs
        .into_iter()
        .scan(0, |counter, (_, kind)| {
            if kind == Event::Begin {
                *counter += 1;
            } else {
                *counter -= 1;
            }
            Some(*counter)
        })
        .max()
        .unwrap()
}
Let’s tackle a second problem to apply the sweep line paradigm to a two-dimensional problem.
We are given a set of \(n\) points in the plane.
The goal is to find the closest pair of points in the set. The distance between two points \((x_1, y_1)\) and \((x_2,y_2)\) is the Euclidean distance \(d((x_1,y_1), (x_2,y_2)) = \sqrt{(x_1-x_2)^2 +(y_1-y_2)^2}\).
A brute force algorithm calculates the distances between all possible pairs of points, resulting in a time complexity of \(\Theta(n^2)\).
A faster algorithm employs the sweep line paradigm. We start by sorting the points in increasing order of their x-coordinates. We keep track of the shortest distance, denoted as \(\delta\), seen so far. Initially, \(\delta\) is set to the distance between an arbitrary pair of points.
We use a vertical sweep line to iterate through the points, attempting to improve the current shortest distance \(\delta\). Consider the point \(p = (x, y)\) just reached by the vertical sweep line. We can improve \(\delta\) if the closest point to the left of \(p\) has a distance smaller than \(\delta\). If such a point exists, it must have an x-coordinate in the interval \([x - \delta, x]\), as it is to the left of \(p\), and a y-coordinate in the interval \([y - \delta, y + \delta]\).
The figure below shows the rectangle within which this point must lie. We have a fact that, at a first glance, may seem quite surprising: there can be at most \(6\) points within the rectangle. The \(6\) circles within the perimeter of the rectangle represent points that are at distance exactly \(\delta\) apart from each other. See the Section 5.4 of Algorithm Design by Kleinberg and Tardos for a proof of this fact.
For our purposes, a slightly weaker result is sufficient, which states that the rectangle contains at most \(8\) points.
To understand why, consider the \(8\) squares in the figure above. Each of these squares, including its perimeter, can contain at most one point. Assume, for the sake of contradiction, that a square contains two points, denoted as \(q\) and \(q'\). The distance between \(q\) and \(q'\) is smaller than \(\delta\). If point \(q'\) exists, it would have already been processed by the sweep line because it has an x-coordinate smaller than that of \(p\). However, this is not possible, because otherwise the value of \(\delta\) would be smaller than its current value.
Now that we have the intuition of the solution, let’s add more details. The algorithm maintains a BST with points sorted by their y-coordinates. When we process point \(p=(x,y)\), we iterate over the points with y-coordinates in the interval \([y-\delta, y+\delta]\). If the current point has an x-coordinate smaller than \(x-\delta\), we remove this point from the set: it will never be useful again. Otherwise, we compute its distance to \(p\) and update \(\delta\) if needed. Before moving the sweep line to the next point, we insert \(p\) into the set.
What is the complexity of this algorithm? Identifying the range of points with the required y-coordinates takes \(\Theta(\log n)\) time. Iterating over the points in this range takes constant time per point and removing one of them takes \(\Theta(\log n)\) time.
How many points do we need to iterate over? There can be at most \(6\) points that have an x-coordinate greater than or equal to \(x-\delta\) and therefore survive. On the other hand, there can be many points with smaller x-coordinates. However, since each point is inserted and subsequently removed from the set at most once during the execution of the algorithm, the cost of dealing with all these points is at most \(\Theta(n \log n)\).
The following is a Rust implementation of this algorithm. There are two differences from the above description. First, we compute the squared Euclidean distance. This way, we avoid the computation of the square root, which is slow and results in a floating-point value.
The second difference is that we swap the roles of x and y. Therefore, we process the points by ascending y-coordinate and use a horizontal sweep line.
This is easier to implement in Rust. Indeed, with the original approach, we would need to insert points into a BTreeSet ordered by y-coordinate, which is the second component of the pair. This ordering is not possible with a BTreeSet unless we create a wrapper for a point that implements the required behavior for a comparison. Instead, if we swap the roles of x and y, the ordering by y is only required during the sorting step, which can be customized using the sort_unstable_by_key method.
pub fn distance_squared(p: (i64, i64), q: (i64, i64)) -> i64 {
    (p.0 - q.0).pow(2) + (p.1 - q.1).pow(2)
}

use std::collections::BTreeSet;
use std::ops::Bound::Included;

// Returns the (squared) Euclidean distance between the closest pair of
// points in `points`
pub fn closest_pair(points: &mut [(i64, i64)]) -> Option<i64> {
    if points.len() < 2 {
        return None;
    }
    points.sort_unstable_by_key(|p| (p.1, p.0)); // sort by y
    let min_y = points[0].1;
    let max_y = points.last()?.1;
    let mut delta = distance_squared(points[0], points[1]);
    let mut set: BTreeSet<(i64, i64)> = BTreeSet::new();
    for &point in points.iter() {
        // Search by x and select the points with too small y-coordinate that
        // we remove to not touch them again in the future
        let to_delete: Vec<_> = set
            .range((
                Included(&(point.0 - delta, min_y)),
                Included(&(point.0 + delta, max_y)),
            ))
            .filter(|p| p.1 < point.1 - delta)
            .cloned()
            .collect();
        // Remove those points
        for p in to_delete {
            set.remove(&p);
        }
        // Search again and compute the distances with the surviving points.
        // Update delta if needed.
        delta = set
            .range((
                Included(&(point.0 - delta, min_y)),
                Included(&(point.0 + delta, max_y)),
            ))
            .fold(delta, |acc, &p| acc.min(distance_squared(point, p)));
        set.insert(point);
    }
    Some(delta)
}
The objective of this hands-on is to implement recursive traversals of a binary tree in Rust. These exercises are valuable for preparing for coding interviews and are worth attempting even if you are not enrolled in the course.
Let’s begin by describing a basic binary tree implementation in Rust.
In our implementation, a node is represented as a struct with three fields: the key of the node and the ids of its left (id_left) and right (id_right) children. We represent the entire tree using a vector of Nodes. Each node is implicitly assigned an ID that corresponds to its position in the vector.
Therefore, a node is defined as follows.
struct Node {
    key: u32,
    id_left: Option<usize>,
    id_right: Option<usize>,
}

impl Node {
    fn new(key: u32) -> Self {
        Self {
            key,
            id_left: None,
            id_right: None,
        }
    }
}
We have chosen to use u32 as the data type for the key. Implementing a generic version of the Node<T> structure is left as an exercise, albeit a potentially quite boring one. Both id_left and id_right are of type Option<usize> and store the IDs of the left and right children of the node, respectively. If a child does not exist, the corresponding ID is set to None.
To create a node, you can use the new function and specify its key. The newly created node is considered a leaf and, thus, both children are None.
Now, we are prepared to define the struct Tree, which is just a vector of nodes.
struct Tree {
    nodes: Vec<Node>,
}
In our implementation, we have chosen not to allow empty trees. This simplifies the code a little bit. However, it’s easy to reverse this decision if necessary.
You can create a new tree using the with_root(key: u32) function, which initializes a new tree with a root having the specified key. The ID of the root node is always 0.
We have also decided to restrict operations to only insertions of new nodes; that is, deletions or modifications of existing nodes are not allowed. This limitation aligns with our objectives, as our primary focus is on tree traversal.
To insert a new node, you can use the add_node method. When adding a new node, you need to specify its parent_id, its key, and a boolean value, is_left, which indicates whether the node should be the left or right child of its parent. The method panics if the parent_id is invalid or if the parent node already has the child we are trying to insert.
The implementation of a tree is as follows.
impl Tree {
    pub fn with_root(key: u32) -> Self {
        Self {
            nodes: vec![Node::new(key)],
        }
    }

    /// Adds a child to the node with `parent_id` and returns the id of the new node.
    /// The new node has the specified `key`. The new node is the left child of the
    /// node `parent_id` iff `is_left` is `true`, the right child otherwise.
    ///
    /// # Panics
    /// Panics if the `parent_id` does not exist, or if the node `parent_id` has
    /// the child already set.
    pub fn add_node(&mut self, parent_id: usize, key: u32, is_left: bool) -> usize {
        assert!(
            parent_id < self.nodes.len(),
            "Parent node id does not exist"
        );
        if is_left {
            assert!(
                self.nodes[parent_id].id_left.is_none(),
                "Parent node has the left child already set"
            );
        } else {
            assert!(
                self.nodes[parent_id].id_right.is_none(),
                "Parent node has the right child already set"
            );
        }
        let child_id = self.nodes.len();
        self.nodes.push(Node::new(key));
        let child = if is_left {
            &mut self.nodes[parent_id].id_left
        } else {
            &mut self.nodes[parent_id].id_right
        };
        *child = Some(child_id);
        child_id
    }
}
Let’s implement a simple tree traversal to compute the sum of the keys in a binary tree. This can serve as an example for implementing the solutions for the three exercises below.
We will use a recursive function called rec_sum(&self, node_id: Option<usize>). This function takes a node_id as input and computes the sum of all the keys in the subtree rooted at node_id. There are two possibilities. If node_id is None, the subtree is empty, and thus, the sum is 0. If, instead, node_id refers to a valid node, the sum of the keys is equal to the key of the current node plus the sums of its left and right subtrees. These latter sums are computed recursively.
Here is the Rust code. Note that we have the sum method, which is responsible for calling rec_sum at the root.
/// Returns the sum of all the keys in the tree
pub fn sum(&self) -> u32 {
    self.rec_sum(Some(0))
}

/// A private recursive function that computes the sum of
/// nodes in the subtree rooted at `node_id`.
fn rec_sum(&self, node_id: Option<usize>) -> u32 {
    if let Some(id) = node_id {
        assert!(id < self.nodes.len(), "Node id is out of range");
        let node = &self.nodes[id];
        let sum_left = self.rec_sum(node.id_left);
        let sum_right = self.rec_sum(node.id_right);
        return sum_left + sum_right + node.key;
    }
    0
}
The code described so far is here.
Write a method to check if the binary tree is a Binary Search Tree.
Write a method to check if the binary tree is balanced.
A tree is considered balanced if, for each of its nodes, the heights of its left and right subtrees differ by at most one.
Write a method to check if the binary tree is a max-heap.
A max-heap is a complete binary tree in which every node satisfies the max-heap property. A node satisfies the max-heap property if its key is greater than or equal to the keys of its children.
In the code snippet below, we provide a (limited) set of tests for the sum method. This code also shows how to construct a binary tree using our implementation.
To ensure the robustness of your solutions, we strongly recommend adding a comprehensive suite of tests.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_sum() {
        let mut tree = Tree::with_root(10);
        assert_eq!(tree.sum(), 10);
        tree.add_node(0, 5, true); // id 1
        tree.add_node(0, 22, false); // id 2
        assert_eq!(tree.sum(), 37);
        tree.add_node(1, 7, false); // id 3
        tree.add_node(2, 20, true); // id 4
        assert_eq!(tree.sum(), 64);
    }
}
Submit a file lib.rs and a file Handson_1_solution_YOUR_NAME.pdf to rossano.venturini@gmail.com by 19/10/2023.
- lib.rs contains your implementations and a large set of tests.
- Handson_1_solution_YOUR_NAME.pdf briefly describes your implementations.
Before submitting your solutions, run
- cargo fmt to format your code;
- cargo clippy to check your code.
Very important! You are allowed to verbally discuss solutions with other students, BUT you must implement all solutions by yourself. Therefore, sharing implementations with others is strictly forbidden.
Binary search repeatedly divides the search range in half until the target element is found or the search range becomes empty, resulting in a time complexity of \(\Theta(\log n)\). This is one of the easiest applications of the Divide-and-Conquer paradigm.
The divide-and-conquer paradigm tackles a complex problem by breaking it down into smaller, more manageable subproblems of the same type. These subproblems are addressed recursively, and their solutions are combined to yield the solution for the original problem.
More precisely, a divide-and-conquer-based algorithm follows three main steps:
- Divide the problem into smaller subproblems of the same type;
- Conquer the subproblems by solving them recursively;
- Combine the solutions of the subproblems into a solution for the original problem.
We can apply the above paradigm to search for a key in a sorted array of \(n\) elements within \(\Theta(\log n)\) comparisons.
A Rust implementation of binary search is the following.
fn binary_search<T: Ord>(arr: &[T], key: T) -> Option<usize> {
    let mut low = 0;
    let mut high = arr.len();
    while low < high {
        let middle = low + (high - low) / 2;
        match key.cmp(&arr[middle]) {
            std::cmp::Ordering::Equal => return Some(middle),
            std::cmp::Ordering::Less => high = middle,
            std::cmp::Ordering::Greater => low = middle + 1,
        }
    }
    None
}
The generic implementation above works for types that are Ord. Ord is the trait for types that form a total order. The method cmp returns an Ordering between two elements: in our case, the key we are looking for and the element in the middle.
We use the result of this comparison to check for a match or to move either low after middle or high to middle. Note that the position high is not included in the range.
It is worth noticing the expression middle = low + (high - low) / 2 used to compute the position in the middle of the current range. A lot of existing implementations on the net use the expression middle = (low + high) / 2 instead, which is buggy: it overflows whenever low + high is greater than usize::MAX.
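A small demonstration of the difference (the helper names and the concrete values are ours):

```rust
/// Overflow-safe midpoint: `high - low` never exceeds `high`, so the
/// addition cannot wrap around
fn safe_middle(low: usize, high: usize) -> usize {
    low + (high - low) / 2
}

/// The buggy variant, with the overflow made explicit via `checked_add`:
/// it returns `None` exactly when `low + high` would wrap around
fn buggy_middle(low: usize, high: usize) -> Option<usize> {
    low.checked_add(high).map(|s| s / 2)
}
```

With both bounds close to usize::MAX, buggy_middle fails while safe_middle returns the correct midpoint.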
It is also important to observe that when there are multiple occurrences of the searched key, the function returns the position of the first encountered occurrence, not necessarily the first occurrence in the vector. This behavior aligns with the implementation of binary search in the Rust Standard Library. However, it is often very useful to report the position of the first (or last) occurrence of the searched key. We can obtain this behavior with the following implementation.
fn binary_search<T: Ord>(arr: &[T], key: T) -> Option<usize> {
    let mut low = 0;
    let mut high = arr.len(); // note that high is excluded
    let mut ans = None;
    while low < high {
        let middle = low + (high - low) / 2;
        match key.cmp(&arr[middle]) {
            std::cmp::Ordering::Equal => {
                ans = Some(middle);
                high = middle
            }
            std::cmp::Ordering::Less => high = middle,
            std::cmp::Ordering::Greater => low = middle + 1,
        }
    }
    ans
}
In this implementation, when a match is found, we do not immediately return its position. Instead, we update the ans variable and set high to the position of this occurrence. This way, we continue the search in the first half of the array, seeking additional occurrences of the key. If there are more matches, ans will be further updated with smaller positions.
As a useful exercise, you could try to modify the code above to return the smallest position such that the element at that position is greater than or equal to key. In other words, if the key is not in the slice, it returns the position of its successor.
Instead of implementing the code above, we can find the first (or even the last) occurrence of a key with the partition_point method of the standard library. This method is even more generic than our code above. Indeed, it returns the index of the partition point in a sorted vector according to any given predicate.
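For instance, with the predicate x < key the partition point is the position of the first occurrence of key (or of its successor when the key is absent), while x <= key yields one past the last occurrence. A small sketch with an illustrative helper:

```rust
// partition_point returns the index of the first element for which the
// predicate is false, assuming the slice is partitioned by the predicate
// (always true for these predicates on a sorted slice).
fn occurrences(v: &[i32], key: i32) -> (usize, usize) {
    let first = v.partition_point(|&x| x < key); // first occurrence (or successor)
    let past_last = v.partition_point(|&x| x <= key); // one past the last occurrence
    (first, past_last)
}
```

On [1, 2, 2, 2, 3, 4] with key 2 this reports positions 1 and 4; with the absent key 5 both results are 6, the length of the slice.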
Consider a problem where all the possible candidate answers are restricted to a range of values between certain low and high possible answers. In other words, any candidate answer \(x\) falls within the range [low, high). We also have a boolean predicate pred defined on the candidate answers that tells us if an answer is good or not for our aims. Our goal is to find the largest good answer.
When no assumptions are made about the predicate, we cannot do better than evaluating the predicate on all the possible answers. So, the number of times we evaluate the predicate is \(\Theta(n)\), where \(n = high-low\) is the number of possible answers.
Instead, if the predicate is monotone, we can binary search the answer to find it with \(\Theta(\log n)\) evaluations. This strategy is implemented by the generic function below.
use num::FromPrimitive;
use num::Num;
use std::cmp::PartialOrd;

fn binary_search_range<T, F>(low: T, high: T, pred: F) -> Option<T>
where
    T: Num + PartialOrd + FromPrimitive + Copy,
    F: Fn(T) -> bool,
{
    let mut low = low;
    let mut high = high;
    let mut ans = None;
    while low < high {
        let middle = low + (high - low) / FromPrimitive::from_u64(2).unwrap();
        match pred(middle) {
            true => {
                low = middle + T::one();
                ans = Some(middle)
            }
            false => high = middle,
        }
    }
    ans
}
The function takes the extremes (of type T) of the range and the predicate as arguments. We use the external crate num to require some basic arithmetic operations for type T. The function returns the largest element of the range satisfying the predicate, or None if there is no such element.
Let’s use this function to solve problems.
An example is the problem Sqrt.
We are given a non-negative integer \(v\) and we want to compute the square root of \(v\) rounded down to the nearest integer.
The possible answers are in \([0, v]\). For each candidate answer \(x\), the predicate is \(p(x) = x^2 <= v\). Thus, we can find the result in \(\Theta(\log v)\) time.
Thus, a one-line solution is
fn sqrt(v: u64) -> u64 {
    binary_search_range(0, v + 1, |x| x * x <= v).unwrap()
}
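If you prefer not to depend on the num crate, the same idea can be specialized to u64. The sketch below (the name sqrt_u64 is illustrative) also guards the squaring with checked_mul, so that very large inputs cannot overflow, a caveat the one-liner above does not handle:

```rust
// Binary search the largest x with x * x <= v, specialized to u64.
fn sqrt_u64(v: u64) -> u64 {
    // candidate answers in [0, v + 1); saturating_add avoids overflow at u64::MAX
    let (mut low, mut high) = (0u64, v.saturating_add(1));
    let mut ans = 0;
    while low < high {
        let middle = low + (high - low) / 2;
        // checked_mul returns None on overflow, so huge middles are rejected
        if middle.checked_mul(middle).map_or(false, |sq| sq <= v) {
            ans = middle;
            low = middle + 1;
        } else {
            high = middle;
        }
    }
    ans
}
```

For example, sqrt_u64(10) is 3, and even sqrt_u64(u64::MAX) is computed safely.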
Let’s consider another problem.
We have a sequence of \(n\) mutually-disjoint intervals. The extremes of each interval are non-negative integers. We aim to find \(c\) integer points within the intervals such that the smallest distance \(d\) between consecutive selected points is maximized.
Guess what? A solution to this problem binary searches the answer, the target distance \(d\). Why is this possible? If a certain distance is feasible (i.e., there exists a selection of points at that distance), then any smaller distance is also feasible. Thus, the feasibility is a monotone boolean predicate that we can use to binary search the answer.
As the candidate answers range from \(1\) to \(l\), where \(l\) is the overall length of the intervals, the solution takes \(\Theta(\log l)\) evaluations of the predicate.
What’s the cost of evaluating the predicate? Well, we first sort the intervals. Then, we can evaluate any candidate distance \(d'\) by scanning the sorted intervals from left to right. First, we select the left extreme of the first interval as the first point. Then, we move over the intervals and greedily choose the first point that is at distance at least \(d'\) from the previous one. Thus, an evaluation of the predicate takes \(\Theta(n)\) time.
The overall running time is \(\Theta(n\log l)\).
A Rust implementation of this strategy is the following.
fn select_intervals(intervals: &mut Vec<(usize, usize)>, c: usize) -> Option<usize> {
    let l = intervals
        .iter()
        .fold(0, |acc, interval| acc + interval.1 - interval.0 + 1); // overall length
    if l < c {
        // there is no solution
        return None;
    }
    intervals.sort_unstable();
    // A closure implements our predicate
    let pred = |d: usize| -> bool {
        let mut last_selected = intervals[0].0;
        let mut cnt = 1;
        for &interval in intervals.iter() {
            while interval.0.max(last_selected + d) <= interval.1 {
                last_selected = interval.0.max(last_selected + d);
                cnt += 1;
            }
        }
        cnt >= c
    };
    binary_search_range(1, l + 1, pred)
}
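To see the greedy check in isolation, here is a self-contained sketch of the predicate for a fixed distance (the function name is illustrative; the intervals are assumed sorted and mutually disjoint):

```rust
// Can we place at least `c` points, pairwise at distance >= `d`,
// inside the sorted, mutually-disjoint intervals?
fn feasible(intervals: &[(usize, usize)], d: usize, c: usize) -> bool {
    let mut last = intervals[0].0; // greedily take the leftmost point
    let mut cnt = 1;
    for &(start, end) in intervals {
        // keep taking the leftmost point at distance at least d from the last one
        while start.max(last + d) <= end {
            last = start.max(last + d);
            cnt += 1;
        }
    }
    cnt >= c
}
```

For instance, with intervals [(0, 2), (5, 6)] and c = 3, the predicate holds for d = 2 (points 0, 2, 5) but not for d = 3, so the binary search would report 2.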
These notes are for the “Competitive Programming and Contests” course at Università di Pisa.
Formally, the problem can be defined as follows.
Given an array \(A[0,n-1]\) and an integer \(k\), the goal is to find the maximum of each subarray (window) of \(A\) of size \(k\).
The simplest approach to address this problem involves handling each of the \(n-k+1\) windows independently. Within each window, we calculate its maximum by scanning through all its elements, which takes \(\Theta(k)\) time. Consequently, this straightforward brute force solution operates in \(\Theta(nk)\) time.
Here is a Rust implementation.
fn brute_force(v: &Vec<i32>, k: usize) -> Vec<i32> {
    let n = v.len();
    if k > n {
        return Vec::<i32>::new();
    }
    let mut maxs = Vec::with_capacity(n - k + 1);
    for i in 0..(n - k + 1) {
        let current_slice = &v[i..i + k];
        let max_value = *current_slice.iter().max().unwrap();
        maxs.push(max_value);
    }
    maxs
}
A more elegant one-line implementation uses combinators.
fn brute_force(v: &Vec<i32>, k: usize) -> Vec<i32> {
    v.windows(k).map(|w| *w.iter().max().unwrap()).collect()
}
The inefficiency of this solution stems from the fact that when calculating the maximum of a window, we disregard all the computations previously performed to determine the maxima of the preceding windows.
Enhancing the brute force solution above entails leveraging a data structure to efficiently handle the next window while capitalizing on the progress made in processing the preceding window. The design of a faster solution begins with two straightforward observations:
Firstly, we can represent the elements within a window as a multiset \({\cal M}\) of size \(k\). In this representation, the result for the window is essentially the largest element contained within this multiset.
Secondly, when we transition from one window to the next, only two elements change: the first element of the first window exits from the scene, and the last element of the second one enters. Consequently, we can derive the multiset of the new window from the multiset of the previous window by simply adding one element and removing another one.
Hence, we require a data structure capable of performing three crucial operations on a (multi)set: inserting a new element, deleting an arbitrary element, and efficiently retrieving the maximum element within the multiset. By employing such a versatile data structure, we can seamlessly move the window across the array while efficiently updating and querying the multiset to calculate the desired results. Now the question is: What’s the best data structure supporting these operations? A Balanced Binary Search Tree (BBST) supports any of these operations in \(\Theta(\log |{\cal M}|)\), where \(|{\cal M}|\) is the number of elements in the multiset (and it is optimal in the comparison model). This way, we can solve the problem in \(\Theta(n \log k)\) time.
A Rust implementation of this strategy is as follows. Here we use a BTreeSet, a BST-like data structure that represents a set of unique, ordered elements. It provides efficient implementations of insertion (insert), deletion (remove), and maximum (last) operations. The only issue to deal with is that a BTreeSet does not store repeated values. For this reason, we store elements together with their positions. This way, every element in the array is inserted as a unique pair in the BTreeSet.
use std::collections::BTreeSet;

pub fn bst(nums: &Vec<i32>, k: usize) -> Vec<i32> {
    let n = nums.len();
    if k > n {
        return Vec::<i32>::new();
    }
    let mut maxs = Vec::with_capacity(n - k + 1);
    let mut set = BTreeSet::new();
    let mut max_sf = nums[0];
    for (i, &v) in nums.iter().enumerate() {
        set.insert((v, i));
        // keep track of the max so far to avoid a costly query to the set
        max_sf = max_sf.max(v);
        if i >= k {
            set.remove(&(nums[i - k], i - k));
            if max_sf == nums[i - k] {
                max_sf = set.last().unwrap().0;
            }
        }
        if i >= k - 1 {
            maxs.push(max_sf);
        }
    }
    maxs
}
In this implementation, we have incorporated a straightforward optimization to reduce the number of calls to set.last(). We keep track of the maximum element encountered so far, denoted as max_sf. We only call set.last() when max_sf might be invalidated by the deletion of an element equal to max_sf.
It’s worth noting an alternative solution that, theoretically, is slightly less efficient than the previous one (i.e., \(\Theta(n\log n)\) instead of \(\Theta(n\log k)\)). However, in practice, this alternative solution often proves to be faster.
As we are talking about maxima, the immediate choice that springs to mind is the priority queue, with its most renowned manifestation being the (max-)heap. A max-heap stores a set of \(n\) keys and supports three operations:
We can solve the sliding window maximum problem by employing a max-heap and scanning the array from left to right. Here’s how it works.
Initially, we populate a max-heap with the first \(k\) elements of \(A\) along with their respective positions. This gives us the maximum within the initial window, which is essentially the maximum provided by the heap.
As we move on to process the remaining elements of \(A\) one by one, we insert each current element into the heap alongside its position. We then request the heap to provide us with the current maximum. However, it’s important to note that this reported maximum element might fall outside the current window’s boundaries. To address this, we continuously extract elements from the heap until the reported maximum is within the constraints of the current window.
A Rust implementation of this strategy is the following one.
use std::collections::BinaryHeap;

pub fn heap(nums: &Vec<i32>, k: usize) -> Vec<i32> {
    let n = nums.len();
    if k > n {
        return Vec::<i32>::new();
    }
    let mut heap: BinaryHeap<(i32, usize)> = BinaryHeap::new();
    for i in 0..k - 1 {
        heap.push((nums[i], i));
    }
    let mut maxs = Vec::with_capacity(n - k + 1);
    for i in k - 1..n {
        heap.push((nums[i], i));
        while let Some((_, idx)) = heap.peek() {
            if *idx < i - (k - 1) {
                heap.pop();
            } else {
                break;
            }
        }
        maxs.push(heap.peek().unwrap().0);
    }
    maxs
}
It’s worth noting that with this approach, there are a total of \(n\) insertions and at most \(n\) extractions of the maximum in the heap. Since the maximum number of elements present in the heap at any given time is up to \(n\), each of these operations takes \(\Theta(\log n)\) time. Consequently, the overall time complexity is \(\Theta(n\log n)\).
Can we achieve a better solution than the one presented earlier? Given that this section is titled linear time solution, you might rightly speculate: “Yes, it’s possible.” But why is it intuitively reasonable to think about an improvement? A good starting point is to observe that the BST-based solution can do much more than what is needed. If I ask you: what’s the second largest element in the window? No problem: the second largest element is the predecessor of the maximum, and a BST supports this operation in \(\Theta(\log n)\) time as well. You would even be able to report the top-\(x\) largest or smallest elements in \(\Theta(x + \log n)\) time (how?). This is because the BST is implicitly keeping all the elements of all the windows sorted. The fact that we can do much more than what is requested is an important signal that a faster solution could exist. Still, the title of this section promises something even stronger.
Surprisingly, the better solution uses an elementary data structure: a queue. We require a double-ended queue (deque), which supports constant-time insertion, removal, and access at both the front and the back of the queue. There are several ways to implement a deque. The easiest (but not the fastest) is probably with a doubly linked list.
The algorithm starts with an empty deque \(Q\) and with the window \(W\) that covers the positions in the range \(\langle -k, -1 \rangle\). That is, the window starts before the beginning of the array \(A\). Then, we start sliding the window one position at a time and remove/insert elements from \(Q\). We claim that the front of \(Q\) will be the element to report.
More precisely, we repeat \(n\) times the following steps.
The implementation of the above solution is the following.
use std::collections::VecDeque;

fn linear(nums: &Vec<i32>, k: usize) -> Vec<i32> {
    let n = nums.len();
    if k > n {
        return Vec::<i32>::new();
    }
    let mut q: VecDeque<usize> = VecDeque::new();
    let mut maxs: Vec<i32> = Vec::with_capacity(n - k + 1);
    for i in 0..k {
        while !q.is_empty() && nums[i] > nums[*q.back().unwrap()] {
            q.pop_back();
        }
        q.push_back(i);
    }
    maxs.push(nums[*q.front().unwrap()]);
    for i in k..n {
        while !q.is_empty() && q.front().unwrap() + k <= i {
            // more idiomatic: while let Some(&p) = q.front()
            q.pop_front();
        }
        while !q.is_empty() && nums[i] > nums[*q.back().unwrap()] {
            q.pop_back();
        }
        q.push_back(i);
        maxs.push(nums[*q.front().unwrap()]);
    }
    maxs
}
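As a quick sanity check, the deque-based function (repeated here so the snippet is self-contained) agrees with the brute force on the classic instance:

```rust
use std::collections::VecDeque;

// Sliding window maximum in linear time: the deque stores the positions
// of the right leaders of the current window, in decreasing order of value.
fn linear(nums: &Vec<i32>, k: usize) -> Vec<i32> {
    let n = nums.len();
    if k > n {
        return Vec::new();
    }
    let mut q: VecDeque<usize> = VecDeque::new();
    let mut maxs: Vec<i32> = Vec::with_capacity(n - k + 1);
    for i in 0..k {
        while !q.is_empty() && nums[i] > nums[*q.back().unwrap()] {
            q.pop_back();
        }
        q.push_back(i);
    }
    maxs.push(nums[*q.front().unwrap()]);
    for i in k..n {
        // drop positions that fell out of the window
        while !q.is_empty() && q.front().unwrap() + k <= i {
            q.pop_front();
        }
        // drop elements dominated by the incoming one
        while !q.is_empty() && nums[i] > nums[*q.back().unwrap()] {
            q.pop_back();
        }
        q.push_back(i);
        maxs.push(nums[*q.front().unwrap()]);
    }
    maxs
}
```

On [1, 3, -1, -3, 5, 3, 6, 7] with k = 3, it reports [3, 3, 5, 5, 6, 7].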
Let’s prove the correctness of this solution. Looking at a running example, we enthusiastically think: “it could work”. Why?
We first observe that the elements in \(Q\) are always sorted in decreasing order. This can be proved by induction on the number of iterations. The claim is true for the initial \(Q\) as it is empty. Given the queue after \(i\) iterations, by hypothesis, it is sorted. The current iteration will only remove elements (no change in the ordering of the remaining elements) or insert the current element \(A[i+1]\) as the tail of the queue just below the first element which is larger than it (if any). Thus, the queue remains sorted.
The sortedness of \(Q\) is a nice starting point for proving the correctness but it’s not enough. We need now to introduce the definition of right leaders of the window to show that the largest element within the current window is at the top of the queue. Given a window, an element is called a right leader if and only if the element is larger than any other element of the window at its right.
As an example, consider the window \([1, 7, 3, 5, 2]\) of size \(5\): its right leaders are \(7\), \(5\), and \(2\), as each of them is larger than every element to its right.
We are now ready to prove a nice property of the elements in \(Q\): At every iteration, \(Q\) contains all and only the right leaders of the current window.
This is quite easy to see. Firstly, any right leader cannot be removed from \(Q\) as all the subsequent elements are smaller than it. Secondly, any non-right leader will be removed as soon as the next right leader enters \(Q\). Finally, any element outside the window cannot be in \(Q\). By contradiction, let us assume that \(Q\) contains one such element, say \(a\). Let \(r\) be the largest right leader. On the one hand, \(a\) cannot be smaller than or equal to \(r\), otherwise \(a\) would be removed when inserting \(r\) in \(Q\). On the other hand, \(a\) cannot be larger than \(r\), otherwise, it would be in front of \(Q\) and removed by the first inner loop.
We derive the correctness of the algorithm by combining the sortedness of \(Q\) with the fact that the largest right leader is the element to report.
Let us show that the algorithm runs in linear time.
We first use the standard approach to analyze an algorithm. We have a loop that is repeated \(n\) times. What’s the cost of an iteration? Looking at the implementation it should be clear that its cost is dominated by the cost (and, thus, number) of pop operations. However, in a certain iteration, we may pop out all the elements in the deque. As far as we know there may be up to \(n\) elements in the deque and, thus, an iteration costs \(O(n)\) time. So, the best we can conclude is that the algorithm runs in \(O(n^2)\) time. Can’t go too far with this kind of analysis!
In fact, there may indeed exist very costly iterations, but they are greatly amortized by many very cheap ones. Indeed, the overall number of pop operations cannot be larger than \(n\) as any element is not considered anymore by the algorithm as soon as it is removed from \(Q\). Each of them costs constant time and, thus, the algorithm runs in linear time.
As a useful exercise, you could try to adapt the previous solution to solve the Next Larger Element problem, which is as follows.
Given an array \(A[0,n-1]\) having distinct elements, the goal is to find the next greater element for each element of the array in order of their appearance in the array.
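If you want a reference to compare your solution against, one possible sketch (by no means the only one) applies the same idea with a plain stack of candidates, scanning from the right; the function name and the choice of \(-1\) as a sentinel for “no greater element” are illustrative:

```rust
// For each element, the next greater element to its right (-1 if none).
// The stack keeps, from top to bottom, an increasing sequence of candidates.
fn next_larger(a: &[i32]) -> Vec<i32> {
    let mut stack: Vec<i32> = Vec::new();
    let mut ans = vec![-1; a.len()];
    for i in (0..a.len()).rev() {
        // elements not larger than a[i] can never be the answer for anyone to the left
        while stack.last().map_or(false, |&top| top <= a[i]) {
            stack.pop();
        }
        if let Some(&top) = stack.last() {
            ans[i] = top;
        }
        stack.push(a[i]);
    }
    ans
}
```

For example, on [4, 5, 2, 25] the answer is [5, 25, 25, -1]. As in the sliding window solution, every element is pushed and popped at most once, so the running time is linear.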