School of Computing. Dublin City University. Home Blog Teaching Research Contact My big idea: Ancient Brain 


but we know that currently:
string[i4] == pattern[0]
string[i3] == pattern[1]
string[i2] == pattern[2]
string[i1] == pattern[3]
So no point moving right by 1 position and trying to match pattern[0] with string[i3]
Instead can jump straight to string[i] and start trying to match pattern[0] from there.
To be precise, the prefix "1" of the pattern
cannot be found in any of the bit we skip over "000".
Obviously a lot of choices for prefixes of the pattern we could be looking at.
"After a mismatch at string[i], restart comparing the pattern from string[i] onwards."
Why won't it work?
This will fail if beginning of pattern is in the bit we skip over.
Example 1:
pattern = 101001
string = ... 1010100111111
Let B = beginning of pattern that we lose.
B = "10"
Jumps over the match and misses it.
Obviously a lot of choices for prefixes of the pattern we could be looking at.
In above case, we should reset string index to i2, but
is it possible to leave string[i] alone and just change
pattern[j]?
Example 2:
pattern = 10101001
string = ... 101010100111111
B = "1010"
Rule fails if
beginning of pattern B is in the bit we skip over
 This must mean that a beginning part of the pattern C
(not necessarily the same beginning part as B!)
must repeat twice at start of the pattern
(since we matched all of the bit we skipped over).
That is, pattern = C C ..
In both above C = "10"
In Example 1,
pattern starts with a repeat:
pattern[0] = pattern[2]
pattern[1] = pattern[3]
and:
string[i4] == pattern[0]
string[i3] == pattern[1]
string[i2] == pattern[2]
string[i1] == pattern[3]
that is:
string[i2] == pattern[0]
string[i1] == pattern[1]
So we can keep i as it is, and reset j from 4 to 2 and try matching from there
Given a search pattern, prebuild
a table, next[j], showing,
when there is a mismatch at pattern position j,
where to reset j to.
If match fails,
keep i same,
reset j to position next[j].
pattern = 10
string = ...
1100000
pattern = 110
string = ...
11100000
i stays same,
j goes from 2 back to 1
pattern = 100
string = ...
10100000
i stays same,
j goes from 2 back to 0
Example for pattern "ABABAC":
j  0  1  2  3  4  5 
substring 0 to j  A  AB  ABA  ABAB  ABABA  ABABAC 
longest prefixsuffix match  none  none  A  AB  ABA  none 
next[j]  0  0  1  2  3  0 
notes 
no prefix and suffix that are different i.e. next[0]=0 for all patterns 
Given j, let n = next[j]
"pattern[0] .. pattern[n1]"
= "pattern[j(n1)] .. pattern[j]"
e.g. j = 4, n = 3,
"pattern[0] .. pattern[2]"
= "pattern[2] .. pattern[4]"
If match fails at position j+1,
keep i same,
reset pattern to position n.
Have already matched pattern[0] .. pattern[n1]
e.g. We have matched ABABA so far.
If next one fails, say we have matched ABA so far and then see if next one matches.
That is, keep i same, just reset j to 3
(= precisely length of longest prefixsuffix match)
Then, if match after ABA fails too, by the same rule
we say we have matched A so far, reset to j = 1, and try again from there.
In other words, it starts by trying to match the longest prefixsuffix,
but if that fails it works down to the shorter ones until exhausted
(no prefixsuffix matches left).
i = 0 next[i] = 0 i++ j = 0 while ( i < m ) { if ( pattern[j] == pattern[i] ) { next[i] = j+1 i++ j++ } else ( pattern[j] != pattern[i] ) { if ( j > 0 ) j = next[j1] else ( j == 0 ) { next[i] = 0 i++ j = 0 // redundant, just to make it clear what we are looping with } } } 
Outputs next[0], next[1], next[2] , .... in that order.
step  i  j  k = (i  j)  calculating  substring  notes  output  output j 
1  0  0  0  next[0]  A  always = 0  next[0] = 0  0 
2  1  0  1  next[1]  AB  pattern[0] != pattern[1] failed to match pattern[0] == pattern[1] 
next[1] = 0  0 
3  2  0  2  next[2]  ABA  pattern[0] == pattern[2] this is our first match, pattern[0] == pattern[2] pattern can be shifted right by k = i = 2 j = 0 means can't be shifted right by anything less than k = i = 2 let's see how long we can do this for i.e. how long we can start at pattern[0], with ever growing length, and shift right by k 
next[2] = 1  1 
4  3  1  2  next[3]  ABAB  pattern[1] == pattern[3]
we have already matched pattern[0] == pattern[2] so if we can match pattern[1] == pattern[3] then we get a match of length 2 note that the difference between j and i encodes a memory of: (a) when this process started (step 3) and (b) how much we are shifting right by (k) 
next[3] = 2  2 
5  4  2  2  next[4]  ABABA  pattern[2] == pattern[4]
we have already matched pattern[0] == pattern[2] and pattern[1] == pattern[3] so if we can match pattern[2] == pattern[4] then we get a match of length 3 
next[4] = 3  3 
6  5  3  2  next[5]  ABABAC 
meaning of j:
output j = 3 from last round tells us that: pattern[0]..pattern[j1] can be shifted right by k = (ij) to match pattern[ij]..pattern[i1] pattern[0]..pattern[2] = ABA can be shifted right (by 2) to match pattern[2]..pattern[4] so only need to check if pattern[3] == pattern[5] pattern[3] != pattern[5] so now our shifting right of the evergrowing pattern since step 3 fails maybe there is another pattern we can shift right though but this will be a smaller pattern (i.e. we were correct to look for larger one first)
look for smaller pattern than ABA to shift right,
ending in pattern[4]

j = next[2] = 1  
7  5  1  4 
output j = 1 from last round tells us that: pattern[0] can be shifted right by k = (ij) to match pattern[i1] pattern[0] = A can be shifted right (by 4) to match pattern[4] so check if pattern[1] == pattern[5] pattern[1] != pattern[5] 
j = next[0] = 0  
8  5  0  5 
no history back to start check if pattern[0] == pattern[5] pattern[0] != pattern[5] 
next[5] = 0  0 