Pagini: [1]   În jos
  Imprimă  
Ajutor Subiect: Rolling hash, Rabin Karp, palindromes, rsync and others  (Citit de 4071 ori)
0 Utilizatori şi 1 Vizitator pe acest subiect.
Cosmin
Echipa infoarena
Nu mai tace
*****

Karma: 351
Deconectat Deconectat

Mesaje: 1.799



Vezi Profilul
« : Octombrie 30, 2012, 11:46:19 »

http://infoarena.ro/blog/rolling-hash
« Ultima modificare: Octombrie 30, 2012, 12:13:57 de către Cosmin Negruseri » Memorat
Cosmin
Echipa infoarena
Nu mai tace
*****

Karma: 351
Deconectat Deconectat

Mesaje: 1.799



Vezi Profilul
« Răspunde #1 : Octombrie 31, 2012, 12:34:09 »

So do you guys know how to solve all the proposed problems?
Memorat
wickedman
Echipa infoarena
Nu mai tace
*****

Karma: 227
Deconectat Deconectat

Mesaje: 670



Vezi Profilul WWW
« Răspunde #2 : Octombrie 31, 2012, 12:36:10 »

Frumos scris și foarte faine problemele pe care le-ai pus în articol. Mi-ai adus aminte de o altă problemă pe care ai pus-o într-un post precedent:

Citat
4 reversals We are given two equal length strings S and T. Figure out if we can get string T starting from string S and applying 4 substring reversal operations. (Hint: complexity O(n5))

Cred că se poate reduce la O(n4)) cu un rolling hash.
Memorat
a_h1926
Echipa infoarena
Nu mai tace
*****

Karma: 317
Deconectat Deconectat

Mesaje: 385



Vezi Profilul
« Răspunde #3 : Octombrie 31, 2012, 15:35:30 »

1. We want to count separately the number of palindromic sequences with even length and uneven length. We will now try to solve the problem for even palindromic sequences, as the other case is symmetrical. For every i from 1 to n, we will count how many palindromic sequences are centered on (i, i+1), which is equivalent to finding the largest possible L such that subsequence (i-L, i+1+L) is a palindrome. We can use binary search to find this value, and we now need to implement fast string comparison, which we can do by preprocessing some hash values.

2. Let’s consider the function F(x) = the maximum number of occurences of some substring of length x from the original string. We can observe that F is a decreasing function, thus we can use binary search to find the largest x such that F(x) >= k. Now we need to compute F(x) efficiently. We will iterate over the original string and insert the hash code for every substring of length x in a table. The maximum number of equal values in this table will be the value of F(x), and this can be easily computed in O(n) if we update the maximum value everytime we insert a new substring in the hash (for every code we keep in the table how many elements have that code). The final complexity is O(nlogn)

3. We could use binary search to find the side length of the square, and for a fixed length L, we want to see if there are at least 2 squares that are identical. For every element in the bitmap, we will compute a hash code that represents the substring formed by it and the previous L-1 elements on its line. Now, we apply the same process to the newly computed table, but for the previous L-1 elements on it’s column, and so, every element in the final table will have the hash code for an entire square. The only thing left to do is iterate over the elements in the final table and see if there is at least one duplicate. Final complexity is (n^2 logn), if the bitmap has size nxn.

4. If the string has period L, then S[1..L] = S[L+1...2L] = ... = S[k*L+1, (k+1)*L], with k from 0 to N/L - 1. This is exactly what we will try to find out, if the substring S[1...L] equals every other substring considered above, and for this we can use precomputed hash codes. Now, we just need to check this for every L from 1 to N. The number of steps is n (1 + 1/2 + 1/3 + ... + 1/n) = nlogn, so the final complexity is O(nlogn).

5. We build the hash code for both strings. Now, we will literally rotate one of the strings by updating its hash code (we erase the first character and add the new one), and we only need to see if there is at least one position in which the two hash codes are equal. Since there are n possible rotations, we have O(n) time complexity.

6. For two polygons to be similar, the corresponding angles must be equal and the ratio between the corresponding segments must be the same (if we have n vertices A1, A2, ..., An and B1, B2, ..., Bn, then A1A2 / B1B2 = A2A3/B2B3, etc.). From A1A2/B1B2 = A2A3/B2B3, we also deduce that A1A2/A2A4 = B1B2/B2B3, and similarly we have the equivalence between the previous relations and A(k)A(k+1)/A(k+1)A(k+2) = B(k)B(k+1)/B(k+1)B(k+2). Now, for every vertex in each polygon, we will associate it two values: the angle formed by the two segments incident in it, and the ratio between the two segments. We notice that each polygon was reduced to a string, in which a character consists of these two values. We observe that for the two polygons to be similar, one of the strings must be a rotation of the other, and we reduced this problem to the previous one which has an O(n) solution. Also, the number of steps to build the two “strings” takes time O(n), so the final complexity is O(n). The only thing at which we must be careful is in case we can "mirror" one of the polygons, case in which we just need to check with one of the strings reversed, so this doesn't influence the time complexity.

7. We will try to see the longest periodic prefix that has a fixed period L. This is equivalent to finding the largest k such that S[1...L] = S[L+1...2L] = ... = S[(k-1)L+1...kL], and the length of that prefix will be L*k. We can proceed just as in problem 4 and the resulting complexity is O(nlogn).

8. We will use a DFS and we keep a stack with the currently active nodes, so we can determine quickly which is the k-th father of a node. Let’s denote F[ x ][k] = the k-th father of x. Now, when we are at a node x, we want to see if the string formed by F[ x ][k-1], F[ x ][k-2], ..., F[ x ], x matches string P. We can do this by computing hash codes for every node x based on the hash code of its father, and compare it with the hash code of string P. When we compute the hash code for node x, the new character added to the string is the one associated with this node and the character we delete is the one associated to the k-th father of x. Resulting complexity is (n), where n is the number of nodes.
« Ultima modificare: Octombrie 31, 2012, 19:01:43 de către Heidelbacher Andrei » Memorat
Cosmin
Echipa infoarena
Nu mai tace
*****

Karma: 351
Deconectat Deconectat

Mesaje: 1.799



Vezi Profilul
« Răspunde #4 : Noiembrie 01, 2012, 09:12:38 »

@Cristian: Yup I think you're right Smile.
@Andrei: Nice! I think my solutions were the same.
« Ultima modificare: Noiembrie 01, 2012, 09:28:14 de către Cosmin Negruseri » Memorat
flmanea
Client obisnuit
**

Karma: 78
Deconectat Deconectat

Mesaje: 68



Vezi Profilul
« Răspunde #5 : Noiembrie 01, 2012, 13:17:18 »

@Andrei: Actually, problem 4 can be solved in a much easier way Smile
You are just interested in the existence of a border in the word S. That is, some L such that S[1..L]=S[N-L+1..N]. This can be tested in linear time with suffix arrays (or whatever data structure you prefer), as well as with rolling hash.
But, anyway, nice solutions Very Happy
Memorat
a_h1926
Echipa infoarena
Nu mai tace
*****

Karma: 317
Deconectat Deconectat

Mesaje: 385



Vezi Profilul
« Răspunde #6 : Noiembrie 01, 2012, 13:59:47 »

@Andrei: Actually, problem 4 can be solved in a much easier way Smile
You are just interested in the existence of a border in the word S. That is, some L such that S[1..L]=S[N-L+1..N]. This can be tested in linear time with suffix arrays (or whatever data structure you prefer), as well as with rolling hash.
But, anyway, nice solutions Very Happy


I think it can be found with KMP algorithm as well, and the solution is even simplier than with suffix arrays, but I thought we should have found a solution with rolling hashes, since that's the name of the post. Anyway, thanks for the observation.
Memorat
flmanea
Client obisnuit
**

Karma: 78
Deconectat Deconectat

Mesaje: 68



Vezi Profilul
« Răspunde #7 : Noiembrie 01, 2012, 14:38:04 »

Yeah, actually "whatever data structure you prefer" included the functions computed by KMP Smile But I must confess that I kind of like suffix arrays, so I just gave them priority Very Happy Anyway, as I said, the linear solution can be achieved with some sort of rolling hash as well.

Memorat
chris_11
Strain


Karma: 1
Deconectat Deconectat

Mesaje: 1



Vezi Profilul
« Răspunde #8 : Noiembrie 02, 2012, 00:31:00 »

Nice article and cute problems Smile. In the text of the first additional problem there is a small typo, I think you meant palindromic substrings, not subsequences (I think everybody understood substrings anyway Smile ).

P.S.: Oh, and while I'm still in the bitchin' mode: I had an issue while viewing comments on the blog page after logging in: it says "Continutul nu poate fi descarcat". It seems to happen when using the https protocol (the browser isn't very happy about performing an AJAX request to an unsecured location). Sorry for the digression.
Memorat
Cosmin
Echipa infoarena
Nu mai tace
*****

Karma: 351
Deconectat Deconectat

Mesaje: 1.799



Vezi Profilul
« Răspunde #9 : Noiembrie 03, 2012, 01:54:43 »

@Florin. Yes, other problems also have better solutions. For example the Longest palindrome can be solved in linear time using a suffix tree or using clever trick by Manacher (http://books.google.com/books?id=9NdohJXtIyYC&lpg=PP1&ots=llc90BNZHd&dq=rytter%20jewels%20o&pg=PA111#v=onepage&q&f=false)
« Ultima modificare: Noiembrie 03, 2012, 08:47:31 de către Cosmin Negruseri » Memorat
Pagini: [1]   În sus
  Imprimă  
 
Schimbă forumul:  

Powered by SMF 1.1.19 | SMF © 2006-2013, Simple Machines