Building AI Agents and Copilots with Confluent, Airy, and Apache Flink

Written by

From automating routine tasks to providing real-time insights to inform complex decisions, AI agents and copilots are poised to become an integral part of enterprise operations. At least that’s true for the organizations that can figure out how to supply large language models (LLMs) with real-time, contextualized, and trustworthy data in a secure and scalable way.

At Airy, we’ve developed a framework that helps developers build copilots as a new interface to explore and work with their data. Together with Confluent, we bridge LLMs and real-time data, allowing users to ask questions in natural language about their data streaming across operational and analytical systems—leveraging Apache Kafka®, Apache Flink®, and Apache Iceberg™ to make AI copilots with agentic capabilities a reality for enterprise use cases.

Retail copilot built on Airy

This article walks you through how we solved common sticking points that developers and architects face when they attempt to advance from using simple chatbots to building intelligent AI copilots.

Solving data challenges that prevent AI success for the enterprise

Whether it's predicting customer behavior in finance, personalizing marketing strategies in retail, or streamlining customer support operations, businesses need copilots that can provide engineering and business teams with information that is both accurate and timely.

To make that possible, our team faced the following challenges:

  • Data governance and security

    For businesses with growing data needs and an ever-increasing number of point-to-point integrations, data access, lineage, and security challenges escalate as the number of interconnections goes up and as different teams across multiple lines of business share data with one another. It’s important to consider how data streams are governed and secured to reduce risk and get the desired result.

  • Accuracy of GenAI outputs and minimizing hallucinations

    LLMs have a tremendous amount of knowledge based on publicly available data and may not have context-aware or business-specific data. When there is a knowledge gap, LLMs can hallucinate and generate inaccurate results.

    For enterprises, the stakes are high. Hallucinations in a GenAI copilot designed to assist with legal, financial, or customer-facing tasks could have serious consequences. Ensuring that the data provided to the AI is trustworthy and verifiable is critical.

  • Infrastructure that can handle high throughput and control costs

    Enterprise customers need a data architecture that can scale to process millions of events—handling high volumes of historical and real-time data with minimal latency during peak loads—without overspending on large-scale use cases with variable demand.

To solve all these challenges, we needed a highly scalable and feature-rich data streaming platform that would allow us to power real-time inference with trustworthy outputs at scale, without compromising on security, accuracy, or cost.

Why a data streaming platform is key for building AI agents and copilots

Confluent Data Streaming Platform enables us to bridge the analytical-operational divide and integrate data from our customers’ SaaS applications like Salesforce, databases like MongoDB and PostgreSQL, and systems like Apache Iceberg™, Snowflake, and Delta Lake. This means the AI copilots that our customers want to build no longer have to wait for batch processing or rely on outdated data.

Converting natural language into Flink SQL queries for stream processing

Our primary use case on Confluent is agentic AI—allowing users to interact with streaming data in natural language, and turning Flink jobs into agents that continuously monitor data streams.

Users can ask the copilot questions as simple as “What topic contains order data?” or “Describe the purchases topic,” as well as more complex questions such as:

  • What is our revenue today in stores per product category? Can you send an update on this anytime it exceeds our targets?

  • For an RFP, provide a list of suppliers for Vitamin C. Rank them in order of their current 1) Defect Rate, 2) OTIF, and 3) SRI.

With Confluent, Airy enables agents to handle simple monitoring tasks as well as evolve into more complex, multi-step workflows that fulfill such requests.
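To make this concrete, here is a minimal sketch (not Airy’s actual implementation; table and column names are invented) of the kind of Flink SQL statement an agent might generate for the revenue question above, built as a string in Python:

```python
# Hypothetical sketch: the kind of Flink SQL an agent might generate for
# "What is our revenue today in stores per product category?"
# The table name and columns are illustrative, not a real schema.

def build_revenue_query(table: str = "purchases") -> str:
    """Return a daily tumbling-window revenue aggregation per category."""
    return f"""
    SELECT
      product_category,
      SUM(price * quantity) AS revenue,
      window_start,
      window_end
    FROM TABLE(
      TUMBLE(TABLE {table}, DESCRIPTOR(event_time), INTERVAL '1' DAY))
    GROUP BY product_category, window_start, window_end
    """

query = build_revenue_query()
print(query)
```

The alerting half of the request (“send an update anytime it exceeds our targets”) would map to a continuously running Flink job whose output triggers a notification downstream.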

Airy uses Confluent connectors to continuously ingest events, including from MongoDB, PostgreSQL, and Salesforce. Once data is ingested, it needs to be transformed before it can be used for tasks like segmentation or personalization. We use Flink to filter out noise, aggregate key metrics, and enrich data streams with additional context.
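As a rough illustration of that filter → aggregate → enrich pattern, here is the same logic in plain Python over a small batch of events (a Flink job would apply it continuously over the stream; the event fields are invented):

```python
# Plain-Python sketch of the filter -> aggregate -> enrich pattern that a
# Flink job would run continuously; event and catalog fields are illustrative.

from collections import defaultdict

def transform(events, product_catalog):
    # Filter out noise: drop zero-value or malformed events.
    valid = [e for e in events if e.get("amount", 0) > 0 and "sku" in e]

    # Aggregate key metrics: total spend per SKU.
    totals = defaultdict(float)
    for e in valid:
        totals[e["sku"]] += e["amount"]

    # Enrich with additional context from a catalog lookup.
    return [
        {"sku": sku, "total": total, **product_catalog.get(sku, {})}
        for sku, total in totals.items()
    ]

events = [
    {"sku": "A1", "amount": 10.0},
    {"sku": "A1", "amount": 5.0},
    {"sku": "B2", "amount": 0.0},   # filtered out as noise
]
catalog = {"A1": {"category": "vitamins"}}
print(transform(events, catalog))
```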

For structured data in topics, we enable LLMs to generate accurate Flink SQL by using data contracts, evolving schemas, and managing metadata with Stream Governance. Providing context to the LLM about the latest schema for topics helps generate more robust Flink SQL statements.
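A minimal sketch of that idea: the latest topic schema is injected into the prompt so the model only references fields that actually exist. (The schema lookup is stubbed out with a dict here; in practice it would come from Schema Registry.)

```python
# Hypothetical sketch: inject the latest topic schema into the LLM prompt
# so generated Flink SQL only uses fields that exist. The registry lookup
# is stubbed with a dict for illustration.

def build_prompt(question: str, topic: str, schemas: dict) -> str:
    fields = schemas[topic]  # in practice: fetched from Schema Registry
    field_list = "\n".join(f"  {name}: {ftype}" for name, ftype in fields.items())
    return (
        "You translate questions into Flink SQL.\n"
        f"Topic `{topic}` has this schema:\n{field_list}\n"
        "Use only these fields.\n\n"
        f"Question: {question}"
    )

schemas = {"purchases": {"sku": "STRING", "price": "DOUBLE", "event_time": "TIMESTAMP(3)"}}
prompt = build_prompt("What is today's revenue?", "purchases", schemas)
print(prompt)
```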

For semi-structured and unstructured data (e.g., text, JSON), we use Flink stream processing and AI Model Inference to create embeddings and store them in vector databases for retrieval-augmented generation (RAG). According to IDC, more than 90% of enterprise data is unstructured. Airy’s copilot makes it possible for teams to quickly and efficiently extract and analyze this data—all via natural language—to make knowledge instantly accessible for an organization.
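The retrieval step can be sketched in miniature as follows. In production the embeddings come from Flink AI Model Inference and live in a vector database; here a toy character-count “embedding” and an in-memory index stand in for both, purely to show the similarity-search mechanic behind RAG:

```python
# Toy RAG sketch: a stub embedding and in-memory index stand in for
# AI Model Inference and a vector database, to illustrate retrieval.

import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: bag-of-characters counts.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# "Vector store": documents indexed alongside their embeddings.
docs = ["return policy for damaged goods", "vitamin c supplier defect rates"]
index = [(d, embed(d)) for d in docs]

def retrieve(query: str) -> str:
    qv = embed(query)
    return max(index, key=lambda item: cosine(qv, item[1]))[0]

print(retrieve("which suppliers have high defect rates?"))
```

A real pipeline would swap `embed` for a model served via AI Model Inference and the `index` list for a vector database query, but the nearest-neighbor retrieval shape stays the same.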

Flink AI Model Inference has simplified our stack—we previously built components that we had to run manually, but have since retired these.

We’re also starting to use Tableflow to help customers take advantage of Iceberg, store historic and recent data, better manage their data catalog, and easily feed their real-time data to any data lake or warehouse of choice.

Finally, we can perform the post-processing step. For example, if a copilot were making recommendations based on product data, we can cross-check it against a real-time inventory topic to ensure accuracy and reduce the risk of hallucination by passing only fresh, relevant data to customers’ LLMs.
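A simplified sketch of that guardrail, with an in-memory dict standing in for the latest state of the inventory topic (SKUs and names are invented):

```python
# Sketch of the post-processing cross-check: drop recommendations for
# items the real-time inventory stream says are out of stock, so only
# fresh, relevant data reaches the LLM. The inventory dict stands in for
# the latest state of a Kafka topic.

def filter_by_inventory(recommendations, inventory):
    """Keep only recommended SKUs with positive, known stock."""
    return [r for r in recommendations if inventory.get(r["sku"], 0) > 0]

recommendations = [
    {"sku": "A1", "name": "Vitamin C 500mg"},
    {"sku": "B2", "name": "Vitamin D 1000IU"},
]
inventory = {"A1": 42, "B2": 0}  # latest values from the inventory topic

print(filter_by_inventory(recommendations, inventory))
```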

The benefits we’ve seen so far using Confluent include:

  • Streaming large volumes of data at low latency

  • Processing and governing data streams as a single source of truth

  • Enabling copilots to make better decisions based on the freshest, most accurate data

  • Scaling AI copilots at a lower cost

What’s next for Airy

Building AI agents and copilots for the enterprise isn’t a simple task. It requires real-time data processing, accuracy, robust security, and cost-effective scalability. But with the right infrastructure, it’s possible to tackle these challenges and deliver AI-powered solutions that drive real value for businesses.

Confluent Data Streaming Platform has been a game-changer for Airy. By leveraging Flink stream processing and AI Model Inference along with Stream Governance and connectors, we help customers build trustworthy AI that accelerates business copilot adoption.

As Airy evolves, we’re looking to enable Compound AI Systems that automate tool selection. Instead of spending time making these decisions themselves, users can communicate their objective in natural language, and the system will autonomously make the right choices for the task at hand—such as using Flink for stream processing, which data sources to query, and which cost-effective LLMs to use.

If you’re looking to build and scale agents and AI copilots, we encourage you to explore the power of combining data streaming, stream processing, and AI. Learn more about Airy and check out more resources on Confluent’s GenAI hub.

Apache®, Apache Flink®, Apache Kafka®, Kafka®, Flink®, Apache Iceberg™️, Iceberg™️, Iceberg logo, and the Flink logo are either registered trademarks or trademarks of the Apache Software Foundation.

  • Steffen Hoellinger is the co-founder and CEO of Airy, an innovative AI startup focused on building open source data infrastructure that combines the power of data streaming, stream processing, and AI. With a deep passion for the power of real-time, AI-driven insights, Steffen leads Airy in providing scalable, efficient solutions that empower enterprises to harness the full potential of generative AI and advanced machine learning and help shape the future of business.
