Redis Search Engine
TF-IDF search engine built on Redis sorted sets
The Problem
Full-text search is a fundamental problem in computer science, but most developers just plug in Elasticsearch without understanding what happens underneath. This project builds a search engine from scratch using Redis sorted sets for storage and TF-IDF for relevance ranking. This proves you can build high-performance search with surprisingly simple primitives.
Architecture
Technical Decisions
Why Redis sorted sets instead of a regular database?
Redis sorted sets give me O(log N) inserts and O(log N + M) range queries, where M is the result size. For a search engine, the "score" in a sorted set IS the relevance score. ZUNIONSTORE lets me combine scores across multiple terms with custom weights (the IDF component) in a single atomic operation. It's elegant. The data structure IS the algorithm.
How does TF-IDF scoring work here?
TF (term frequency) counts how often a word appears in a document, stored as the score in Redis sorted sets at index time. IDF (inverse document frequency) is computed at query time: log(total_documents / documents_containing_term). The final score is TF × IDF. High TF-IDF means a word is frequent in this document but rare globally. Exactly what you want for relevance ranking.
How do negative search terms work?
When you search 'python -flask', the engine first finds all documents matching 'python', then computes the set of documents containing 'flask', and subtracts using Redis ZDIFFSTORE. The set operations make this trivially efficient. No iteration needed.
Code Spotlight
def _compute_scores(self, terms, n_docs):
"""Compute TF-IDF scores for search terms."""
weights = {}
for term in terms:
df = self.redis.scard(f"idx:{term}")
if df == 0:
continue
# IDF: log(N / df) - rare terms score higher
idf = math.log(n_docs / df)
weights[f"idx:{term}"] = idf
if not weights:
return []
# ZUNIONSTORE: combine sorted sets with IDF weights
# TF is already the score; multiply by IDF
dest = f"tmp:search:{uuid4().hex}"
self.redis.zunionstore(dest, weights)
results = self.redis.zrevrange(
dest, 0, -1, withscores=True
)
self.redis.delete(dest)
return resultsThe core of the search engine. TF is stored as Redis sorted set scores at index time. At query time, IDF weights are computed and ZUNIONSTORE combines everything in one atomic operation. The data structure IS the algorithm.
Key Features
Search & Indexing
- ▹TF-IDF relevance scoring with logarithmic IDF
- ▹Tokenization with English stop word filtering
- ▹Negative term exclusion (e.g., 'search -redis')
- ▹JSON bulk loader for batch document indexing
Performance
- ▹Redis pipelines for batch operations during indexing
- ▹Sub-millisecond query response on 10,000+ documents
- ▹Pagination support with --offset and --limit
Interface
- ▹Full CLI: index, search, remove, stats commands
- ▹JSON output support for scripting
- ▹Programmatic usage via SearchEngine class
Testing
Unit and integration tests covering indexing, search ranking, negative terms, and edge cases.
Unit + integration test suite
- ▹Tests run against a local Redis instance
- ▹Covers indexing, search, negative terms, pagination, and stats
Results
10,000+ documents indexed with sub-millisecond query response
TF-IDF scoring produces Google-relevant ranking quality
Negative term exclusion via Redis set operations
Clean CLI interface with JSON output support