Rolling Hash: The Algorithm That Replaced Half of my DP Solutions
And nobody talks about it.
Part 1 — Strings Are Numbers, and You’ve Been Ignoring It
You’re scrolling LinkedIn right now. Or maybe Instagram. Someone’s posting their 47th “Day X of LeetCode” update with a green submission screenshot and zero explanation of what they actually learned.
Stop.
Take the next 8 minutes you were going to spend double-tapping vacation photos and read this instead. I promise you — one algorithm concept that actually sticks is worth more than 50 mindless scrolls.
Today I’m going to rewire how you think about strings. Not with some dry textbook explanation. With an idea so simple you’ll be angry nobody told you sooner.
It’s called Rolling Hash. And once you see it, you’ll wonder why you ever wrote an O(n²) DP table for a string problem again.
The Revelation That Changed Everything For Me
I used to be a DP-for-everything person. Longest common substring? DP. Repeated subarray? DP. Pattern matching? DP.
Then one day, I got TLE on a problem. My O(n²) DP was too slow for n = 100,000. I needed something fundamentally different — not a constant-factor optimization, but a different way of thinking.
That’s when I discovered:
You can check if two substrings are identical in O(1) time.
Not O(n). Not O(k). O(1). Constant time. Regardless of length.
When I first read that, I thought it was a lie. It’s not. And by the end of Part 2 (tomorrow), you’ll know exactly how and why it works — well enough to implement it from memory.
But first, we need to talk about something nobody explains properly.
Strings Are Just Numbers Wearing Costumes
Before we hash anything, we need to internalize one idea:
Every string is already a number. We just forgot to look at it that way.
Think about the number 4827. What does it actually mean?
4827 = 4 × 10³ + 8 × 10² + 2 × 10¹ + 7 × 10⁰It’s a sequence of digits, where each digit’s position determines how much it’s worth. The base is 10 because we have 10 possible digits (0–9).
Now look at the string “cat”.
If we say a=1, b=2, c=3, ... z=26, then:
"cat" = 3 × ?² + 1 × ?¹ + 20 × ?⁰What’s the base? Well, we have 26 possible characters. So the obvious choice is base 26:
"cat" = 3 × 26² + 1 × 26 + 20
= 2028 + 26 + 20
= 2074That’s it. “cat” = 2074. The string is a number in base-26.
“dog”? Different number. “cats”? Different number. No two distinct strings produce the same value.
At least, that’s true in theory. In practice, we have a problem. A big one.
The Explosion Problem
Let’s compute the numerical value of a modest 20-character string.
The largest value would be something like “zzzzzzzzzzzzzzzzzzzz” — twenty z’s:
26 × 26²⁰ + 26 × 26¹⁹ + ... ≈ 26²¹ ≈ 2.8 × 10²⁹A 64-bit integer maxes out at about 9.2 × 10¹⁸.
We overflow by a factor of 30 billion. And that’s just 20 characters. Real problems have strings of length 10⁵ or 10⁶.
So we can’t store these numbers directly. We need to shrink them down while keeping the useful property that different strings map to different numbers (most of the time).
Enter modular arithmetic:
hash("cat") = (3 × 26² + 1 × 26 + 20) mod MPick a large M, and every hash fits in a standard integer. The price we pay? Two different strings might produce the same hash.
This is called a collision. And it’s where most tutorials lose people.
I’m going to save the deep dive on collisions for Part 2 tomorrow — including why they’re less scary than you think, and a hierarchy of battle-tested defenses. For now, just know that with the right parameters, collisions are astronomically rare.
But before we get there, I need to warn you about something almost every beginner gets wrong.
Base 26 Is a Trap (And Here’s Why)
This is the part where I save you from a bug that will cost you 3 hours of debugging.
Pop quiz: with base = 26 and our a=1, ..., z=26 mapping, what’s the hash of the empty prefix?
Trick question. But consider this: what’s the hash of “a”?
hash("a") = 1And hash of “aa”?
hash("aa") = 1 × 26 + 1 = 27Seems fine. Now here’s the trap. What if we mapped a=0 instead of a=1?
hash("a") = 0
hash("aa") = 0 × 26 + 0 = 0
hash("aaa") = 0Every string of all a’s hashes to 0. You get collisions for free without even trying.
Even with a=1, base=26 has problems. The character values (1–26) are packed too tightly. Strings that look completely different end up with similar hashes because the base doesn’t spread them out enough.
The fix: pick a smarter base
Two rules:
Base should be prime — reduces patterns that create collisions
Base should be larger than the alphabet size — so single characters all land in distinct positions
Battle-tested choices:
Safe: base = 31, mod = 10⁹ + 7
Safer: base = 131, mod = 10⁹ + 7
Ultra-safe: Use BOTH (base=31, mod=10⁹+7) AND (base=131, mod=10⁹+9)That last option — double hashing — makes the collision probability roughly 10⁻¹⁸. You are literally more likely to get struck by lightning while winning the lottery.
I’ll show you exactly how double hashing works in Part 2. For now, just use base=131 and mod=10⁹+7. It’ll work for 95% of problems.
The Rolling Part — Where It Gets Beautiful
Okay. We can turn any string into a number (mod M). Cool. But computing the hash of a substring still takes O(k) time where k is the substring length. That’s no better than comparing characters directly.
Here’s where the “rolling” comes in, and honestly, this is the most elegant trick in all of string algorithms.
Scenario: hash every substring of length 3 in “abcdef”
We’ve computed:
hash("abc") = a × B² + b × B + c (mod M)Now we want hash(”bcd”). The brute-force way recomputes from scratch. But look at the structure:
hash("abc") = [a × B²] + b × B + c
hash("bcd") = b × B² + c × B + dTo get from one to the other:
Remove the leftmost character: subtract
a × B²Shift everything left: multiply by B
Add the new rightmost character: add d
hash("bcd") = ( hash("abc") − a × B² ) × B + d (mod M)One subtraction. One multiplication. One addition. O(1).
python
# Slide the window: O(1) per step
new_hash = ((old_hash - s[left] * pow_base) * base + s[right]) % modwhere pow_base = B^(k-1) mod M, precomputed once.
It’s the same idea as a sliding window sum — except instead of tracking a sum of values, we’re tracking a polynomial evaluation. The window slides, the hash updates, and we never recompute anything from scratch.
The Prefix Hash Array — The Real Weapon
Rolling a window is great when your substrings all have the same length. But what about comparing arbitrary substrings? Like “is s[2..7] the same as s[15..20]?”
This is where the prefix hash array enters, and it’s what separates “I know rolling hash” from “I destroy problems with rolling hash.”
Build this array once, in O(n):
python
def build_prefix_hash(s, base=131, mod=10**9 + 7):
n = len(s)
H = [0] * (n + 1)
for i in range(n):
H[i + 1] = (H[i] * base + ord(s[i])) % mod
return HNow the hash of any substring s[l..r] pops out in O(1):
python
def get_hash(H, power, l, r, mod):
return (H[r + 1] - H[l] * power[r - l + 1]) % modwhere power[k] = base^k mod M, also precomputed.
Think of it like a prefix sum for multiplication. Just like prefix sums let you compute any range-sum in O(1), prefix hashes let you compute any substring-hash in O(1).
And if the hashes of two substrings are equal? The substrings are (almost certainly) equal.
O(1) substring equality. That’s the superpower.
Every problem where your inner loop compares substrings? That loop just became free.
The Complete Template - Python
Here’s what I copy-paste at the start of every rolling hash problem:
pythonclass RollingHash:
def __init__(self, s, base=131, mod=10**9 + 7):
self.mod = mod
self.base = base
self.n = len(s)
# Prefix hashes and power array
self.h = [0] * (self.n + 1)
self.pw = [1] * (self.n + 1)
for i in range(self.n):
self.h[i+1] = (self.h[i] * base + ord(s[i])) % mod
self.pw[i+1] = (self.pw[i] * base) % mod
def query(self, l, r):
"""Hash of s[l..r] inclusive. O(1)."""
return (self.h[r+1] - self.h[l] * self.pw[r-l+1]) % self.modUsage:
python
rh = RollingHash("banana")
# Is s[1..3] equal to s[3..5]? ("ana" vs "ana")
print(rh.query(1, 3) == rh.query(3, 5)) # True
# Is s[0..2] equal to s[3..5]? ("ban" vs "ana")
print(rh.query(0, 2) == rh.query(3, 5)) # FalseSix lines of setup. O(1) per comparison. That’s it.
The Complete Template - C++
class RollingHash{
public:
const int base1 = 47;
const int base2 = 53;
const int mod = 1e9 + 7;
vector<pair<int,int>>pfxHash;
vector<pair<int,int>>powers;
string s;
RollingHash(const string &s){
int n = s.size();
pfxHash.resize(n+1,{0,0});
powers.resize(n+1,{1,1});
for(int i=0;i<n;i++){
pfxHash[i+1].first = ((pfxHash[i].first * 1ll * base1)%mod + s[i] - 'a' + 1) % mod;
pfxHash[i+1].second = ((pfxHash[i].second * 1ll * base2)%mod + s[i] - 'a' + 1) % mod;
powers[i+1].first = (powers[i].first * 1ll * base1) % mod;
powers[i+1].second = (powers[i].second * 1ll * base2) % mod;
}
}
pair<int,int> getHash(int l, int r){
l++,r++;
int right = pfxHash[r].first;
int left = (pfxHash[l-1].first * 1ll * powers[r-l+1].first)%mod;
int right1 = pfxHash[r].second;
int left1 = (pfxHash[l-1].second * 1ll * powers[r-l+1].second)%mod;
return {(right - left + mod)%mod, (right1- left1 + mod)%mod};
}
};But Wait — What About Collisions?
Right now you’re thinking: “Okay but what if two different substrings have the same hash? Don’t I get wrong answers?”
Yes. You would. And this is exactly where most people either give up on rolling hash (”too risky”) or use it blindly and get hacked on Codeforces.
Neither reaction is correct.
There’s a clean, principled way to handle collisions — from casual LeetCode to adversarial competitive programming. And there’s a reason I keep calling double hashing “10⁻¹⁸ probability” without explaining it yet.
That’s tomorrow.
Coming in Part 2 (Thursday)
The collision deep dive — the math, the defenses, and exactly when to use each level of protection
Rolling hash vs. DP — five specific problem patterns where rolling hash turns O(n²) into O(n log n)
A full worked example — solving “Longest Duplicate Substring” from scratch, step by step
The problem list — 10 problems in order, from Easy to Hard, that will drill rolling hash into your muscle memory
The one mental model that makes you instantly recognize “this is a rolling hash problem”
You just spent 8 minutes learning something that will save you hours. That’s a better ROI than anything in your LinkedIn feed.
Share this with someone who’s still scrolling reels instead of building skills. They’ll thank you later.
And if you want Part 2 in your inbox tomorrow — subscribe. I won’t spam you. Just algorithms that actually make you better.
Hit reply if something didn’t click. The best version of this post comes from your questions.


