String¶
A string is traditionally a sequence of charactors, either as a literal constant or as some kind of variable. The storage of string can be:
- an array of
charwith the end of\0, which is theC-style string; - the stl
stringcontainer, whose length is changable; - literal constant(some charactors enclosed by
"").
Some definiations:
- Prefix: a string
sis said to be a prefix oftif there exists a stringusuch thatt = su; - Proper prefix: if
uis nonempty,sis said to be aproper prefixoft; - Suffix: a string
sis said to be a prefix oftif there exists a stringusuch thatt = us; - Proper suffix: if
uis nonempty,sis said to be aproper prefixoft;
Pattern Matching¶
This probelm is defined as:
Given strings
sandt, find the position thattappears insfor the first time.tis called thepattern.
Brute Force¶
The basic method is to enumerate the start position of substring in pattern, and check if it's true;
1 2 3 4 5 6 7 8 9 10 11 | |
The time complexity is \(O(mn)\), which costs a lot;
Knuth-Morris-Pratt(KMP)¶
The disadvantage of BF is that every time comparing the s and t we have to return to the start of substring if failed, which wastes much time.
If the repeated string appears in substring, we can jump over the same prefix of the string and check the first different charactor when mismatching occurs. Where comes the next array:
next Array¶
Generally speaking, next is an array that next[i] represents the length of the longest common proper prefix of two string:
- prefix of string
s[0:i - 1]; - suffix of string
s[0:i - 1];
where s is the given substring.

We let next[0] = -1 because s[0:-1] has no proper prefix. And at the position i we fill the result of next[i + 1], so the terminate condition is i < n - 1.
1 2 3 4 5 6 7 8 9 | |
Main Process¶
The main process is the same with how we get the next array:
1 2 3 4 5 6 7 8 9 10 11 | |
The time complexity of KMP is \(O(m + n)\).
Problems: