# KMP Algorithm

How to search a text in O(N)?

Given strings `s`

and `w`

, return the positions in `s`

that match to `w`

. Let the length of `s`

and `w`

be `N`

and `M`

, respectively. The time complexity of the naive approach is $O(NM)$.

On the other hand, the KMP (Knuth-Morris-Pratt) algorithm runs in $O(N)$ to do the same task.

As we can see from the above example, the KMP algorithm utilizes the structure of `w`

. Regardless of whether it finds a match or a mismatch in an iteration, it shifts `w`

using that knowledge.

Let's take a closer look at *the knowledge* here.

```
Check 5 letters and find the 5th character does not match.
Because w[3:5] = w[0:2] = AA, we slide w by 5-2 = 3.
0123456789012
s = AABAAABAABAAA
w = AABAAB
012345
```

Matching has proceeded to the 5th letter in `w`

means `s`

and `w`

matched up to the 4th letter in `w`

. Examining `w[0:5] (=AABAA)`

, we can see that **the longest proper prefix of ****w[0:5]**** that is also a suffix of ****w[0:5]**** is ****AA****. **That teaches us that if we slide `w`

by 5-2=3, we can have `w[0:2]`

matched and start matching from `w[2]`

.

# Creating LPS Array

As we have seen, we need to prepare an array of the length of **the longest proper prefix that is also a suffix (LPS)** for all `w[0:i]`

$(0 \lt i \le M)$. Here is how we calculate the LPS array by hand.

```
How to create an LPS array in O(M)
===================================
w[0:1] = A
There is no proper prefix in A.
LPS[0] = 0
w[0:2] = AA
The suffix A is also a proper prefix of AA.
LPS[1] = 1
w[0:3] = AAB
There is no suffix of AAB that is also a proper prefix of AAB.
LPS[2] = 0
W[0:4] = AABA
The suffix A is also a proper prefix of AABA.
LPS[3] = 1
W[0:5] = AABAA
The suffix AA is also a proper prefix of AABAA.
LPS[4] = 2
W[0:6] = AABAAB
The suffix AAB is also a proper prefix of AABAAB.
LPS[5] = 3
LPS = [0, 1, 0, 1, 2, 3]
```

# Implementation

Here is my implementation of the KMP algorithm in TypeScript.

## Proof of len=lps[len-1]

# Time Complexity

The preparation of the LPS array takes $O(M)$ and the search takes $O(N)$. Assuming $N > M$, we can say that the whole algorithm runs in $O(N)$.