String¶
A string
is traditionally a sequence of charactors, either as a literal constant or as some kind of variable. The storage of string can be:
- an array of
char
with the end of\0
, which is theC
-style string; - the stl
string
container, whose length is changable; - literal constant(some charactors enclosed by
""
).
Some definiations:
- Prefix: a string
s
is said to be a prefix oft
if there exists a stringu
such thatt = su
; - Proper prefix: if
u
is nonempty,s
is said to be aproper prefix
oft
; - Suffix: a string
s
is said to be a prefix oft
if there exists a stringu
such thatt = us
; - Proper suffix: if
u
is nonempty,s
is said to be aproper prefix
oft
;
Pattern Matching¶
This probelm is defined as:
Given strings
s
andt
, find the position thatt
appears ins
for the first time.t
is called thepattern
.
Brute Force¶
The basic method is to enumerate the start position of substring in pattern, and check if it's true;
1 2 3 4 5 6 7 8 9 10 11 |
|
The time complexity is \(O(mn)\), which costs a lot;
Knuth-Morris-Pratt(KMP)¶
The disadvantage of BF is that every time comparing the s
and t
we have to return to the start of substring if failed, which wastes much time.
If the repeated string appears in substring, we can jump over the same prefix of the string and check the first different charactor when mismatching occurs. Where comes the next
array:
next
Array¶
Generally speaking, next
is an array that next[i]
represents the length of the longest common proper prefix of two string:
- prefix of string
s[0:i - 1]
; - suffix of string
s[0:i - 1]
;
where s
is the given substring.
We let next[0] = -1
because s[0:-1]
has no proper prefix. And at the position i
we fill the result of next[i + 1]
, so the terminate condition is i < n - 1
.
1 2 3 4 5 6 7 8 9 |
|
Main Process¶
The main process is the same with how we get the next
array:
1 2 3 4 5 6 7 8 9 10 11 |
|
The time complexity of KMP is \(O(m + n)\).
Problems: