What Happened

The Apache Doris community published its 2026 development roadmap, outlining a strategic shift from an analytical database to a unified data platform designed to serve AI-native workloads. The announcement, published via the Juejin developer community, follows the release of two major versions in 2025 (3.1 and 4.0) and sets the annual theme as "Scale Intelligence, Accelerate Insight".

Version 4.0, the most recent release, introduced Vector Search as a first-class feature, enabling unified querying across structured, semi-structured, and vector data within a single SQL engine. Prior to this, users required separate systems such as dedicated vector databases or Elasticsearch to handle semantic retrieval alongside relational analytics.

Why It Matters

The roadmap signals a consolidation play directly targeting the fragmented AI data stack. Enterprise teams currently running separate systems for OLAP analytics, full-text search, and vector retrieval face compounding operational overhead and data consistency risks. Apache Doris is positioning a single engine as the answer, a direct competitive challenge to architectures combining ClickHouse or Snowflake with Pinecone or Weaviate plus Elasticsearch.

The Doris community identifies three structural pressures driving this consolidation:

  • Schema instability at scale: Agent interaction logs, LLM outputs, and user behavior traces arrive predominantly as JSON with unpredictable structure and column cardinality. Traditional columnar schemas cannot absorb this without significant engineering overhead.
  • Concurrency amplification: A single Agent request generates multiple downstream data access calls, raising throughput and latency requirements beyond what batch-oriented analytical systems were designed to handle.
  • AI observability as a new workload class: Tracing Agent behavior (security anomalies, reasoning patterns, failure modes) requires joining trace, log, and metric data in real time, a workload the roadmap treats as structurally distinct from traditional APM analytics.

The push for direct OpenTelemetry integration is notable. It positions Doris as infrastructure for the emerging AI observability toolchain, competing with purpose-built solutions before that market consolidates.

The Technical Detail

Vector Search Scaling

The 2026 roadmap targets support for tens of billions of vectors through disk-based Approximate Nearest Neighbor (ANN) algorithms and data structures. Current in-memory ANN implementations hit cost and capacity ceilings well below this threshold. The roadmap also specifies building updatable vector indexes on top of the Merge-on-Write storage model and improving vector data compression and index management efficiency.
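
To make the capacity ceiling concrete, here is a rough back-of-envelope calculation of the raw storage needed just to hold the vectors. The embedding dimension (768) and the float32 element width are assumptions chosen for illustration, not figures from the roadmap, and the estimate ignores any index overhead.

```python
# Back-of-envelope sizing. The dimension and element width are assumed values,
# not numbers taken from the Doris roadmap.
TIB = 1024 ** 4


def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the raw vectors alone, excluding index structures."""
    return num_vectors * dim * bytes_per_value


num_vectors = 10_000_000_000  # low end of "tens of billions"
dim = 768                     # assumed embedding dimension
footprint_tib = raw_vector_bytes(num_vectors, dim) / TIB

print(f"{footprint_tib:.1f} TiB of raw float32 vectors")
```

Under these assumptions the raw vectors alone run to tens of tebibytes, which is why keeping the whole index in RAM stops being practical and disk-based structures become attractive.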

Hybrid Retrieval Architecture (HSAP)

Doris 4.0 introduced the ability to execute full-text search, semantic scoring, and vector search within a single SQL statement. The 2026 roadmap expands this with:

  • Global index enhancements and lazy materialization to optimize TopN semantic retrieval queries, reducing data scan volume
  • Vector capability extensions to open lake formats — specifically Iceberg and Paimon — allowing vector search over data lake tables without migration
  • Index-first access path optimization to reduce retrieval latency
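
The idea behind lazy materialization for TopN queries can be sketched in a few lines of Python. This is a conceptual illustration only, not Doris's implementation: the function name top_n_lazy, the fetch_row callback, and the sample data are all hypothetical. Ranking happens on lightweight (row id, score) pairs, and full rows are read only for the winners.

```python
import heapq
from typing import Callable


def top_n_lazy(
    scores: dict[int, float],
    fetch_row: Callable[[int], dict],
    n: int,
) -> list[dict]:
    """Rank on (row_id, score) pairs only, then fetch full rows for the n winners."""
    winners = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
    return [{**fetch_row(row_id), "score": score} for row_id, score in winners]


# Hypothetical data: relevance scores as an index might produce them,
# plus a stand-in "table" holding the wide rows.
index_scores = {1: 0.12, 2: 0.91, 3: 0.47, 4: 0.88}
table = {i: {"id": i, "body": f"document {i}"} for i in index_scores}
fetched: list[int] = []  # records which rows were actually read


def fetch_row(row_id: int) -> dict:
    fetched.append(row_id)
    return table[row_id]


result = top_n_lazy(index_scores, fetch_row, n=2)
print(result)
print("rows read:", fetched)
```

Only two of the four rows are ever read here, which is the scan-volume saving the roadmap item is after.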

Variant Type and Semi-Structured Storage

The Variant type, introduced to handle JSON data in version 3.1, is being extended for 2026 to support deeply nested JSON structures and optimize storage of sparse columns and high-cardinality string columns. The goal is columnar storage performance parity for schema-on-read JSON workloads, JSON being the dominant data format for AI application logs.
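
To show why nested, irregular JSON produces sparse columns, here is a minimal Python sketch that flattens documents into dotted column paths. It is an illustration of the general problem, not of how Doris stores Variant data; the helper names flatten and to_columns and the sample log records are hypothetical.

```python
from typing import Any


def flatten(doc: dict, prefix: str = "") -> dict[str, Any]:
    """Turn nested objects into a flat mapping keyed by dotted paths."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_columns(docs: list[dict]) -> dict[str, list]:
    """Build one column per path; rows that lack a path get None (a sparse cell)."""
    rows = [flatten(d) for d in docs]
    paths = sorted({p for row in rows for p in row})
    return {p: [row.get(p) for row in rows] for p in paths}


# Hypothetical agent log records with differing shapes.
logs = [
    {"agent": {"id": "a1", "step": 1}, "latency_ms": 40},
    {"agent": {"id": "a2"}, "tool": {"name": "search"}},
]
columns = to_columns(logs)
for name, values in columns.items():
    print(name, values)
```

Two records already yield four columns, three of them half empty. At log scale the number of distinct paths grows quickly, which is the sparse-column and wide-table pressure this section describes.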

Additional columnar capability work includes partial column updates, improved indexing for wide tables, and enhanced handling of tables with extremely high column counts — a direct consequence of LLM output and Agent state serialization patterns.

AI SQL and Multimodal Processing

The roadmap introduces an AI SQL construct combined with Python UDF support, targeting an end-to-end pipeline covering data preprocessing, feature extraction, vector construction, and analysis within the database engine. A new File data type is specified, with context-dependent semantics: file metadata access in standard SQL contexts, and direct content processing in AI SQL contexts. This targets multimodal data (audio, video, image) without requiring external preprocessing pipelines.

What To Watch

  • Version release cadence: The roadmap does not specify release dates for features targeting vector support at the tens-of-billions scale or the File type. Watch the Apache Doris GitHub repository for milestone tagging against the 2026 roadmap items.
  • OpenTelemetry integration depth: The roadmap references OpenTelemetry ecosystem integration for AI observability. The specifics (whether this means an OTLP ingest endpoint, a collector plugin, or a query-layer schema convention) will determine whether this is a genuine observability play or marketing positioning.
  • Lakehouse vector support: The stated intent to enable vector search over Iceberg and Paimon tables without data migration, if delivered, would be a meaningful differentiator. Track whether this appears in any 2026 Q1 release candidate.
  • Competitive response: ClickHouse has been expanding JSON and semi-structured support aggressively. Databricks' acquisition activity in the vector and search space creates a direct overlap with Doris's HSAP positioning. Watch for benchmark publications from either side.
  • Community adoption signals: The roadmap references production deployments of version 3.1 in semi-structured analytics scenarios. Concrete case studies with query performance numbers would be the first external validation of whether the 4.0 vector capabilities are production-grade.