Lumina Search is an ultra-fast, modern full-text search engine built from scratch using the latest features of Java 22. This project demonstrates how combining SIMD instructions (Vector API), direct memory access (FFM API), and virtual threads (Project Loom) can deliver performance that significantly exceeds that of traditional engines.
Unlike classic search engines (e.g., Apache Lucene), Lumina minimizes object allocations and relies heavily on hardware acceleration for mathematical computations:
- SIMD BM25 Scoring: Relevance scoring is computed in vectorized blocks (256-bit), processing multiple documents in a single CPU instruction rather than sequentially.
- Zero-Copy Memory: Dictionary and index files are memory-mapped directly from disk via MemorySegment. This eliminates the overhead of copying data into the Java heap and keeps garbage-collection pressure near zero.
- Hashing Over Strings: The custom analyzer converts text tokens into 64-bit Murmur3 hashes during the parsing stage, reducing the search process to fast binary comparisons.
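To make the scoring step concrete, here is a scalar reference for the BM25 arithmetic that a vectorized kernel would evaluate over 256-bit blocks of documents at once. The class name and the parameter defaults (k1 = 1.2, b = 0.75) are illustrative assumptions, not taken from the Lumina codebase:

```java
// Scalar reference for BM25 scoring. A SIMD kernel evaluates the same
// arithmetic for several documents per instruction; the names and the
// k1/b defaults below are illustrative, not Lumina's actual values.
public class Bm25Reference {
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Standard BM25 inverse document frequency for a term occurring in
    // df out of n documents.
    static double idf(long n, long df) {
        return Math.log(1.0 + (n - df + 0.5) / (df + 0.5));
    }

    // Per-document, per-term BM25 contribution, with document-length
    // normalization against the average document length.
    static double score(double tf, double docLen, double avgDocLen, double idf) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf * (tf * (K1 + 1.0)) / (tf + norm);
    }

    public static void main(String[] args) {
        double idf = idf(1_000_000, 1_000);
        // A document with a higher term frequency scores higher.
        System.out.println(score(5.0, 120.0, 100.0, idf));
        System.out.println(score(1.0, 120.0, 100.0, idf));
    }
}
```

In the vectorized version, the division and multiplications above map directly onto lane-wise Vector API operations, which is what allows one instruction to advance the scores of multiple documents.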
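The zero-copy access pattern can be sketched with the FFM API as finalized in JDK 22: the index file is mapped into a MemorySegment and read in place, with no intermediate heap arrays. The file layout and names here are assumptions for illustration, not Lumina's actual index format:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Sketch of zero-copy index access (JDK 22 FFM API): the file is
// memory-mapped and read directly from the mapped pages. The demo file
// and its layout are illustrative only.
public class MappedIndexDemo {
    static long demo() {
        try {
            // Stand-in for an index file: eight identical bytes, so the
            // decoded long is the same on little- and big-endian hosts.
            Path tmp = Files.createTempFile("lumina-demo", ".idx");
            byte[] bytes = new byte[8];
            Arrays.fill(bytes, (byte) 42);
            Files.write(tmp, bytes);

            long value;
            try (Arena arena = Arena.ofConfined();
                 FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // Map the whole file; the segment's lifetime is bound to
                // the confined arena, so unmapping is deterministic.
                MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
                value = seg.get(ValueLayout.JAVA_LONG_UNALIGNED, 0);
            }
            Files.delete(tmp);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(demo()));
    }
}
```

Binding the mapping to a confined Arena (rather than relying on buffer finalization) is what lets the engine release mapped index files deterministically.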
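The hashing idea can be illustrated with the 64-bit Murmur3 finalizer (fmix64) applied per token byte. This is a simplified sketch with an arbitrary seed, not the full MurmurHash3 algorithm the analyzer presumably uses; the point is that after analysis, lookups compare 64-bit keys instead of strings:

```java
import java.nio.charset.StandardCharsets;

// Simplified token-hashing sketch: folds token bytes through the
// Murmur3 fmix64 finalizer. Illustrative only -- a production analyzer
// would use a full MurmurHash3 variant over the whole byte sequence.
public class TokenHasher {
    // Murmur3 64-bit finalizer (bijective mixing step).
    static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    static long hashToken(String token) {
        long h = 0x9E3779B97F4A7C15L; // arbitrary seed, an assumption
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h = fmix64(h ^ (b & 0xFFL));
        }
        return h;
    }

    public static void main(String[] args) {
        // Dictionary lookups become 64-bit comparisons, not strcmp.
        System.out.println(Long.toHexString(hashToken("fast")));
        System.out.println(Long.toHexString(hashToken("math")));
    }
}
```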
Benchmark based on the query "fast math memory" across a dataset of 1,000,000 documents. The table shows average response times.
*The Lucene result is an estimate based on a comparable configuration using the built-in LuceneBenchmark. Lumina consistently delivers sub-millisecond latency thanks to hardware vectorization.
- Java: JDK 22 or newer.
- Maven: 3.8+.
- JVM Flags: You must run the application with `--enable-preview` and `--add-modules jdk.incubator.vector` to enable the preview and incubator APIs.
Compile the project using Maven:
mvn clean package
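After building, the engine can be launched with the required flags. The jar name below is a placeholder, since the actual artifact name depends on the project's pom.xml:

```shell
java --enable-preview --add-modules jdk.incubator.vector \
     -jar target/lumina-search.jar
```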