Rolling hash, Rabin Karp, palindromes, rsync and others

•Cosmin

Echipa infoarena
Nu mai tace

Karma: 351

Deconectat

Mesaje: 1.799

Rolling hash, Rabin Karp, palindromes, rsync and others

« : Octombrie 30, 2012, 11:46:19 »

http://infoarena.ro/blog/rolling-hash


« Ultima modificare: Octombrie 30, 2012, 12:13:57 de către Cosmin Negruseri »	Memorat

•Cosmin

Echipa infoarena
Nu mai tace

Karma: 351

Deconectat

Mesaje: 1.799

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #1 : Octombrie 31, 2012, 12:34:09 »

So do you guys know how to solve all the proposed problems?


	Memorat

•wickedman

Echipa infoarena
Nu mai tace

Karma: 227

Deconectat

Mesaje: 670

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #2 : Octombrie 31, 2012, 12:36:10 »

Frumos scris È™i foarte faine problemele pe care le-ai pus Ã®n articol. Mi-ai adus aminte de o altÄƒ problemÄƒ pe care ai pus-o Ã®ntr-un post precedent:

Citat

4 reversals We are given two equal length strings S and T. Figure out if we can get string T starting from string S and applying 4 substring reversal operations. (Hint: complexity O(n⁵))

Cred cÄƒ se poate reduce la O(n⁴)) cu un rolling hash.


	Memorat

•a_h1926

Echipa infoarena
Nu mai tace

Karma: 317

Deconectat

Mesaje: 385

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #3 : Octombrie 31, 2012, 15:35:30 »

1. We want to count separately the number of palindromic sequences with even length and uneven length. We will now try to solve the problem for even palindromic sequences, as the other case is symmetrical. For every i from 1 to n, we will count how many palindromic sequences are centered on (i, i+1), which is equivalent to finding the largest possible L such that subsequence (i-L, i+1+L) is a palindrome. We can use binary search to find this value, and we now need to implement fast string comparison, which we can do by preprocessing some hash values.

2. Letâ€™s consider the function F(x) = the maximum number of occurences of some substring of length x from the original string. We can observe that F is a decreasing function, thus we can use binary search to find the largest x such that F(x) >= k. Now we need to compute F(x) efficiently. We will iterate over the original string and insert the hash code for every substring of length x in a table. The maximum number of equal values in this table will be the value of F(x), and this can be easily computed in O(n) if we update the maximum value everytime we insert a new substring in the hash (for every code we keep in the table how many elements have that code). The final complexity is O(nlogn)

3. We could use binary search to find the side length of the square, and for a fixed length L, we want to see if there are at least 2 squares that are identical. For every element in the bitmap, we will compute a hash code that represents the substring formed by it and the previous L-1 elements on its line. Now, we apply the same process to the newly computed table, but for the previous L-1 elements on itâ€™s column, and so, every element in the final table will have the hash code for an entire square. The only thing left to do is iterate over the elements in the final table and see if there is at least one duplicate. Final complexity is (n^2 logn), if the bitmap has size nxn.

4. If the string has period L, then S[1..L] = S[L+1...2L] = ... = S[k*L+1, (k+1)*L], with k from 0 to N/L - 1. This is exactly what we will try to find out, if the substring S[1...L] equals every other substring considered above, and for this we can use precomputed hash codes. Now, we just need to check this for every L from 1 to N. The number of steps is n (1 + 1/2 + 1/3 + ... + 1/n) = nlogn, so the final complexity is O(nlogn).

5. We build the hash code for both strings. Now, we will literally rotate one of the strings by updating its hash code (we erase the first character and add the new one), and we only need to see if there is at least one position in which the two hash codes are equal. Since there are n possible rotations, we have O(n) time complexity.

6. For two polygons to be similar, the corresponding angles must be equal and the ratio between the corresponding segments must be the same (if we have n vertices A1, A2, ..., An and B1, B2, ..., Bn, then A1A2 / B1B2 = A2A3/B2B3, etc.). From A1A2/B1B2 = A2A3/B2B3, we also deduce that A1A2/A2A4 = B1B2/B2B3, and similarly we have the equivalence between the previous relations and A(k)A(k+1)/A(k+1)A(k+2) = B(k)B(k+1)/B(k+1)B(k+2). Now, for every vertex in each polygon, we will associate it two values: the angle formed by the two segments incident in it, and the ratio between the two segments. We notice that each polygon was reduced to a string, in which a character consists of these two values. We observe that for the two polygons to be similar, one of the strings must be a rotation of the other, and we reduced this problem to the previous one which has an O(n) solution. Also, the number of steps to build the two â€œstringsâ€ takes time O(n), so the final complexity is O(n). The only thing at which we must be careful is in case we can "mirror" one of the polygons, case in which we just need to check with one of the strings reversed, so this doesn't influence the time complexity.

7. We will try to see the longest periodic prefix that has a fixed period L. This is equivalent to finding the largest k such that S[1...L] = S[L+1...2L] = ... = S[(k-1)L+1...kL], and the length of that prefix will be L*k. We can proceed just as in problem 4 and the resulting complexity is O(nlogn).

8. We will use a DFS and we keep a stack with the currently active nodes, so we can determine quickly which is the k-th father of a node. Letâ€™s denote F[ x ][k] = the k-th father of x. Now, when we are at a node x, we want to see if the string formed by F[ x ][k-1], F[ x ][k-2], ..., F[ x ], x matches string P. We can do this by computing hash codes for every node x based on the hash code of its father, and compare it with the hash code of string P. When we compute the hash code for node x, the new character added to the string is the one associated with this node and the character we delete is the one associated to the k-th father of x. Resulting complexity is (n), where n is the number of nodes.


« Ultima modificare: Octombrie 31, 2012, 19:01:43 de către Heidelbacher Andrei »	Memorat

•Cosmin

Echipa infoarena
Nu mai tace

Karma: 351

Deconectat

Mesaje: 1.799

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #4 : Noiembrie 01, 2012, 09:12:38 »

@Cristian: Yup I think you're right Smile

.
@Andrei: Nice! I think my solutions were the same.


« Ultima modificare: Noiembrie 01, 2012, 09:28:14 de către Cosmin Negruseri »	Memorat

•flmanea

Client obisnuit

Karma: 78

Deconectat

Mesaje: 68

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #5 : Noiembrie 01, 2012, 13:17:18 »

@Andrei: Actually, problem 4 can be solved in a much easier way Smile

You are just interested in the existence of a border in the word S. That is, some L such that S[1..L]=S[N-L+1..N]. This can be tested in linear time with suffix arrays (or whatever data structure you prefer), as well as with rolling hash.
But, anyway, nice solutions Very Happy


	Memorat

•a_h1926

Echipa infoarena
Nu mai tace

Karma: 317

Deconectat

Mesaje: 385

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #6 : Noiembrie 01, 2012, 13:59:47 »

Citat din mesajul lui: Florin Manea din Noiembrie 01, 2012, 13:17:18

@Andrei: Actually, problem 4 can be solved in a much easier way Smile

I think it can be found with KMP algorithm as well, and the solution is even simplier than with suffix arrays, but I thought we should have found a solution with rolling hashes, since that's the name of the post. Anyway, thanks for the observation.


	Memorat

•flmanea

Client obisnuit

Karma: 78

Deconectat

Mesaje: 68

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #7 : Noiembrie 01, 2012, 14:38:04 »

Yeah, actually "whatever data structure you prefer" included the functions computed by KMP Smile

But I must confess that I kind of like suffix arrays, so I just gave them priority Very Happy

Anyway, as I said, the linear solution can be achieved with some sort of rolling hash as well.


	Memorat

•chris_11

Strain

Karma: 1

Deconectat

Mesaje: 1

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #8 : Noiembrie 02, 2012, 00:31:00 »

Nice article and cute problems Smile

. In the text of the first additional problem there is a small typo, I think you meant palindromic substrings, not subsequences (I think everybody understood substrings anyway Smile

).

P.S.: Oh, and while I'm still in the bitchin' mode: I had an issue while viewing comments on the blog page after logging in: it says "Continutul nu poate fi descarcat". It seems to happen when using the https protocol (the browser isn't very happy about performing an AJAX request to an unsecured location). Sorry for the digression.


	Memorat

•Cosmin

Echipa infoarena
Nu mai tace

Karma: 351

Deconectat

Mesaje: 1.799

RÄƒspuns: Rolling hash, Rabin Karp, palindromes, rsync and others

« Răspunde #9 : Noiembrie 03, 2012, 01:54:43 »

@Florin. Yes, other problems also have better solutions. For example the Longest palindrome can be solved in linear time using a suffix tree or using clever trick by Manacher (http://books.google.com/books?id=9NdohJXtIyYC&lpg=PP1&ots=llc90BNZHd&dq=rytter%20jewels%20o&pg=PA111#v=onepage&q&f=false)


« Ultima modificare: Noiembrie 03, 2012, 08:47:31 de către Cosmin Negruseri »	Memorat

Pagini: [1] În sus

Imprimă

infoarena > Comunitate - feedback, proiecte si distractie > Blog > Subiect: Rolling hash, Rabin Karp, palindromes, rsync and others

« mesajul precedent următorul mesaj »

	Ajutor	Subiect: Rolling hash, Rabin Karp, palindromes, rsync and others (Citit de 4446 ori)
0 Utilizatori şi 1 Vizitator pe acest subiect.

infoarena informatica de performanta