IndexTables for Spark

Your fast search data should be yours

The open revolution came to data lakes. Now it's coming to search.
Performance that rivals proprietary platforms — built entirely on open tech.

The Search Revolution

Six years ago, the data world flipped upside down. The data lakehouse combined the openness of data lakes with the performance of data warehouses. Data stopped belonging to vendors. It started belonging to you.

But one domain missed the revolution: search. Observability and security search stacks are still dominated by closed, expensive ecosystems.

IndexTables brings that same open revolution to search — with performance that rivals the biggest proprietary platforms, built entirely on open tech. Built on Spark. Powered by the community.

It's your data. Your performance. Your choice.

🚀

No Infrastructure

Runs inside Spark executors. No specialty servers to manage, scale, or pay for. Just add the library and start indexing.

Spark Native

DataSource V2 with full filter pushdown, aggregate pushdown (COUNT, SUM, AVG), and partition pruning. Works with Spark SQL.

☁️

Cloud Ready

QuickwitSplit format optimized for S3 and Azure. An L2 disk cache keeps hot data local, making queries over massive datasets feel instant.

📊

10-1000x Faster Analytics

Aggregations run directly in the search engine, not Spark. Evaluate billions of rows in seconds, not minutes or hours.

🔍

Full-Text Search

IndexQuery operators with Tantivy/Quickwit syntax. Boolean queries, phrase search, fuzzy matching, and more.

📈

Time-Series Analytics

Built-in date histograms and bucket aggregations. Analyze logs by hour, day, or month with a single SQL query.
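As a sketch of what such a query might look like (the `logs` table and its `timestamp` and `level` columns are illustrative, not part of any fixed schema):

```sql
-- Hourly error counts, bucketed with Spark SQL's date_trunc
SELECT date_trunc('hour', timestamp) AS hour,
       COUNT(*) AS errors
FROM logs
WHERE level = 'ERROR'
GROUP BY date_trunc('hour', timestamp)
ORDER BY hour
```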

Simple API, Powerful Results

Write

// Write with full-text indexing
df.write.format("io.indextables.spark.core.IndexTables4SparkTableProvider")
  .option("spark.indextables.indexing.typemap.content", "text")
  .save("s3://bucket/logs")

Search

// Query with IndexQuery syntax
spark.sql("""
  SELECT * FROM logs
  WHERE content indexquery 'error AND database'
""")
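The same operator accepts other Tantivy query syntax as well, such as a quoted phrase query (a sketch; the `logs` table and `content` field are the illustrative names used above):

```sql
-- Phrase search: match the exact phrase, not the individual words
SELECT * FROM logs
WHERE content indexquery '"connection refused"'
```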

Aggregate

// Aggregations pushed to the search engine
spark.sql("""
  SELECT COUNT(*), AVG(latency)
  FROM logs
  WHERE status = 500
""")

Ready to free your search?

Get up and running in 5 minutes with our quickstart guide.

Get Started