BA02-优化后的KMP算法¶

优化后的KMP算法¶

算法优化介绍¶

传统的KMP算法在某些情况下仍有改进空间。优化后的KMP算法主要通过next数组的优化来进一步提升效率，避免不必要的回溯。

优化核心思想¶

当pattern[next[j]] == pattern[j]时，说明即使模式串右移后，字符仍然相同，会再次发生不匹配，因此可以直接跳过，继续使用更前面的next值。

优化后的完整实现¶

C
/**
 * 优化的next数组构造函数
 * @param pattern 模式串
 * @param next    优化后的next数组
 * @param len     模式串长度
 */
void buildOptimizedNext(char* pattern, int* next, int len) {
    next[0] = 0;
    int prefix = 0;

    for (int suffix = 1; suffix < len; suffix++) {
        // 标准KMP的next计算
        while (prefix > 0 && pattern[prefix] != pattern[suffix]) {
            prefix = next[prefix - 1];
        }

        if (pattern[prefix] == pattern[suffix]) {
            prefix++;
        }

        next[suffix] = prefix;

        // 优化：如果当前字符与回溯后的字符相同，则直接使用更前面的next值
        if (prefix > 0 && pattern[suffix] == pattern[next[suffix - 1]]) {
            next[suffix] = next[prefix - 1];
        }
    }
}

/**
 * 优化后的KMP搜索算法
 * @param text    文本串
 * @param pattern 模式串
 * @return        所有匹配位置的数组，NULL表示未找到
 */
int* optimizedKmpSearch(char* text, char* pattern, int* resultCount) {
    int textLen = strlen(text);
    int patternLen = strlen(pattern);

    if (patternLen == 0 || textLen < patternLen) {
        *resultCount = 0;
        return NULL;
    }

    // 构造优化的next数组
    int* next = (int*)malloc(patternLen * sizeof(int));
    buildOptimizedNext(pattern, next, patternLen);

    // 存储所有匹配位置
    int* matches = (int*)malloc(textLen * sizeof(int));
    int matchIndex = 0;

    int patternPos = 0;

    for (int textPos = 0; textPos < textLen; textPos++) {
        // 字符不匹配时，根据next数组调整模式串位置
        while (patternPos > 0 && text[textPos] != pattern[patternPos]) {
            patternPos = next[patternPos - 1];
        }

        // 字符匹配成功
        if (text[textPos] == pattern[patternPos]) {
            patternPos++;
        }

        // 完全匹配成功
        if (patternPos == patternLen) {
            matches[matchIndex++] = textPos - patternLen + 1;
            // 继续寻找下一个匹配（允许重叠匹配）
            patternPos = next[patternPos - 1];
        }
    }

    free(next);
    *resultCount = matchIndex;

    if (matchIndex == 0) {
        free(matches);
        return NULL;
    }

    return matches;
}

优化效果对比¶

标准KMP vs 优化KMP¶

对于模式串 "aaaa"：

Text Only
1 2 3 4 5 6 7	`标准KMP的Next数组: Pattern: a a a a Next: 0 1 2 3 优化后KMP的Next数组: Pattern: a a a a Next: 0 0 0 0`

优化优势¶

减少不必要的比较: 避免连续相同字符的重复比较
提高匹配效率: 在某些模式下能显著减少字符比较次数
保持时间复杂度: 仍然是O(n+m)，但实际运行时间更优

复杂度分析¶

时间复杂度¶

预处理阶段: O(m)
匹配阶段: O(n)
总体: O(n + m)

空间复杂度¶

next数组: O(m)
总体: O(m)

关键优化点总结¶

Next数组优化: 当pattern[j] == pattern[next[j-1]]时，直接使用next[next[j-1]]
避免重复比较: 减少模式串中连续相同字符的无效回溯
保持算法正确性: 优化不影响匹配结果的正确性

这个优化版本在处理具有大量重复字符的模式串时特别有效，是KMP算法在实际应用中的重要改进。