AgenticBI Elastic Store/Data Warehousing

AgenticBI provides an optional Elastic Store that can store and track Query results. Unlike traditional warehouses, which require complex ETL processes and a predefined schema, the Elastic Store is a flexible, scalable, schema-less warehouse.

Elastic Store offers the following:

  • Acts as a cache/persistent layer for fast analysis and visualizations
  • Supports long-running queries
  • Leaves raw data at the source
  • Typically stores query results, results of multi-datasource joins, and similar outputs
  • Eliminates complex ETL processing
  • Does not require a predefined schema
  • Keeps multi-structured data intact
  • Shields your raw datasources from your BI/reporting queries
  • Builds aggregations and data pipelines from datasets held in the store
  • Reduces the reporting load on your database by offloading it to the Elastic Store

Note: Using the AgenticBI Elastic Store is optional, and you can enable it only for the queries that need it. Alternatively, you can use Direct Query Mode, which bypasses this data layer and queries your raw datasource directly.

Overview

The result of any Query (real-time/direct or non-direct) is considered a dataset. A dataset is a reusable component that can span multiple datasources and queries, and that abstracts away the underlying complexity of execution modalities, runtime parameters, and transformations. Datasets can also be reused as inputs to other queries. For more information, refer to the Datasets documentation.

You can run queries directly against your database in real time, or save the query results to the Elastic Store via non-direct execution. When you use the Elastic Store, the Overwrite Strategy controls how the data in the dataset is updated each time the query runs. The available options are:

Upsert: Replaces existing records that share a key with the latest data, and inserts new records for keys that do not yet exist.

Replace All: Replaces all data in the dataset with the results of the latest run.

Replace All - Include Empty: Same as Replace All, except that when a run returns no data, the dataset is replaced with an empty result. Replace All, by contrast, leaves the existing data unchanged in that case.
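The three strategies above differ only in how they treat matching keys and empty runs. The following is a minimal Python sketch of those rules, assuming each record is a dict matched on a hypothetical `id` key field; the function name and strategy strings are illustrative and are not AgenticBI identifiers.

```python
from typing import Dict, List

Record = Dict[str, object]


def apply_overwrite(existing: List[Record], new: List[Record],
                    strategy: str, key: str = "id") -> List[Record]:
    """Return the dataset contents after one run (hypothetical model, not AgenticBI code)."""
    if strategy == "upsert":
        # Index current records by key; new records overwrite matches or are inserted.
        merged = {record[key]: record for record in existing}
        for record in new:
            merged[record[key]] = record
        return list(merged.values())
    if strategy == "replace_all":
        # An empty run leaves the existing data in place.
        return list(new) if new else list(existing)
    if strategy == "replace_all_include_empty":
        # An empty run empties the dataset.
        return list(new)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    run = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
    print(apply_overwrite(old, run, "upsert"))
    # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'B'}, {'id': 3, 'v': 'c'}]
```

With Upsert, key 2 is overwritten and key 3 is inserted, while key 1 survives untouched; with either Replace All variant, key 1 would be gone.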

TTL - Time-Based Retention: On each run, drops records older than a specified time window. For example, DateField-3m keeps records whose DateField falls within the last 3 months and drops all older records.
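To make the retention window concrete, here is a minimal Python sketch that parses a rule in the style of the DateField-3m example and filters records against a cutoff date. The `<field>-<N><unit>` grammar, the inclusive cutoff, and the extra `d` (days) unit are assumptions for illustration; only the month form appears in the example above.

```python
import calendar
import re
from datetime import date, timedelta
from typing import List

# Hypothetical rule syntax modeled on "DateField-3m": <field>-<N><unit>.
# "m" (months) comes from the documented example; "d" (days) is an assumed extra unit.
TTL_RULE = re.compile(r"^(?P<field>\w+)-(?P<n>\d+)(?P<unit>[dm])$")


def months_ago(today: date, months: int) -> date:
    """Step back whole calendar months, clamping the day (e.g. May 31 -> Feb 29 in 2024)."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def apply_ttl(rows: List[dict], rule: str, today: date) -> List[dict]:
    """Keep rows whose date field is on or after the cutoff; drop older rows."""
    match = TTL_RULE.match(rule)
    if match is None:
        raise ValueError(f"unsupported TTL rule: {rule}")
    field, n = match["field"], int(match["n"])
    cutoff = months_ago(today, n) if match["unit"] == "m" else today - timedelta(days=n)
    return [row for row in rows if row[field] >= cutoff]
```

Run on 2024-05-31, DateField-3m yields a cutoff of 2024-02-29, so a record dated 2024-01-15 is dropped and one dated 2024-04-01 is kept.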

Append: Keeps the existing data unchanged and adds the records from each new run. Caveat: this strategy is uncommon, and if a run returns records that are already stored, you will end up with duplicates.
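The duplicate caveat follows directly from Append never comparing new records with stored ones. This small Python sketch shows the effect; the function names are hypothetical, and records are modeled as dicts with hashable values.

```python
from typing import Dict, List

Record = Dict[str, object]


def append_run(existing: List[Record], new: List[Record]) -> List[Record]:
    # Existing records are untouched; every record from the run is added, even exact copies.
    return existing + new


def count_duplicates(records: List[Record]) -> int:
    """Count records that are exact copies of an earlier record."""
    seen = set()
    duplicates = 0
    for record in records:
        fingerprint = tuple(sorted(record.items()))  # values must be hashable
        if fingerprint in seen:
            duplicates += 1
        else:
            seen.add(fingerprint)
    return duplicates


if __name__ == "__main__":
    run = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    data = append_run([], run)
    data = append_run(data, run)  # the same data arrives again on the next run
    print(len(data), count_duplicates(data))  # 4 2
```

If the source can return the same rows on consecutive runs, Upsert on a stable key avoids this growth.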

Query a Dataset

You can reuse and query any dataset stored in the Elastic Store (the same applies to datasets from direct queries).

Step 1: Create a new AgenticBI Warehouse datasource (if one does not already exist) and start querying.

#1 Enter a unique name to identify your datasource.

#2 Optionally, enable the ACL (Access Control List) option, which lets you show or hide datasets when building queries. You can either select the datasets to show on the query page or, for more complex use cases, use a regular expression (regex) to limit which indexes are shown.
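As a rough illustration of the two ACL modes, the following Python sketch filters a list of dataset names either by an explicit allow-list or by a regex. The function, its precedence rule (allow-list first), and the use of full-string matching are all assumptions; AgenticBI's actual matching behavior may differ.

```python
import re
from typing import Iterable, List, Optional


def visible_datasets(all_datasets: Iterable[str],
                     allowed: Optional[Iterable[str]] = None,
                     pattern: Optional[str] = None) -> List[str]:
    """Hypothetical ACL filter: an explicit allow-list wins, then a regex, else show all."""
    names = list(all_datasets)
    if allowed is not None:
        allowed_set = set(allowed)
        return [name for name in names if name in allowed_set]
    if pattern is not None:
        # Assumes the whole name must match the pattern.
        regex = re.compile(pattern)
        return [name for name in names if regex.fullmatch(name)]
    return names


if __name__ == "__main__":
    names = ["sales_2024", "sales_2023", "hr_payroll"]
    print(visible_datasets(names, pattern=r"sales_\d{4}"))  # ['sales_2024', 'sales_2023']
```

A regex scales better than a hand-picked list when new indexes follow a naming convention, since they become visible without editing the ACL.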

If the datasource already exists, click the New Query button, select your existing AgenticBI datasource, and start querying.

Step 2: Select the dataset from the dropdown. The fields of the selected dataset are retrieved automatically for building the query.
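Because the Elastic Store is schema-less, the field list has to be derived from the stored records rather than from a declared schema. One plausible way to do that is sketched below in Python: take the union of field names across records and note the value types seen. This is an assumed approach for illustration, not a description of how AgenticBI retrieves fields.

```python
from typing import Dict, List


def infer_fields(records: List[Dict[str, object]]) -> Dict[str, List[str]]:
    """Collect every field name seen across schema-less records, with the value types observed."""
    fields: Dict[str, set] = {}
    for record in records:
        for name, value in record.items():
            fields.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in fields.items()}


if __name__ == "__main__":
    records = [
        {"region": "EU", "revenue": 10},
        {"region": "US", "revenue": 12.5, "channel": "web"},
    ]
    print(infer_fields(records))
    # {'region': ['str'], 'revenue': ['float', 'int'], 'channel': ['str']}
```

Note that `channel` appears in only one record but is still reported, which is what keeping multi-structured data intact implies for field discovery.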

Step 3: Query the dataset either by using the Visual Builder or by writing the query directly in the Query Editor.

Visual Builder: Drag and drop fields to generate queries.

Query Editor: Write queries in a SQL-like syntax using a versatile text editor designed for editing code.

Reuse Existing Datasets

From any dataset or query, you can create a new "Child" query that takes a subset of the original "Parent" query's results and save it as a new, linked dataset.
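Conceptually, a child query is a row filter plus a column projection over the parent's results, with a recorded link back to the parent. The Python sketch below models that idea; the `Dataset` class, its `parent` field, and `create_child` are hypothetical names, not AgenticBI objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Dataset:
    name: str
    rows: List[Row]
    parent: Optional[str] = None  # name of the dataset this one was derived from


def create_child(parent: Dataset, name: str, columns: List[str],
                 where: Callable[[Row], bool]) -> Dataset:
    # Keep only matching rows and the requested columns, and record the link to the parent.
    rows = [{column: row[column] for column in columns} for row in parent.rows if where(row)]
    return Dataset(name=name, rows=rows, parent=parent.name)


if __name__ == "__main__":
    orders = Dataset("orders", [
        {"region": "EU", "revenue": 10, "rep": "a"},
        {"region": "US", "revenue": 12, "rep": "b"},
    ])
    eu = create_child(orders, "eu_revenue", ["region", "revenue"], lambda r: r["region"] == "EU")
    print(eu)
```

Keeping the parent name on the child is what makes the linkage traceable, which the triggered-dataset behavior described next relies on.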

Triggered Datasets

Triggered Datasets are dependent datasets that update automatically when their parent dataset is modified.

For example, suppose a query runs against terabytes of raw data and returns millions of records. The results can be stored in the AgenticBI Elastic Store, and any derived queries on the resulting dataset can be set up separately in AgenticBI, so that whenever the original query is updated, all dependent datasets are updated automatically as well.
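The cascade can be pictured as a walk over a parent-to-children map. The Python sketch below assumes, in line with the note that follows, that each triggered dataset has exactly one triggering parent, so the trigger graph is a tree and a breadth-first walk visits each dependent once. The map and dataset names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def cascade(children: Dict[str, List[str]], updated: str) -> List[str]:
    """Downstream datasets to refresh, breadth-first, after `updated` is refreshed."""
    order: List[str] = []
    seen = {updated}
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guards against a misconfigured cycle
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    children = {"raw_sales": ["monthly", "by_region"], "monthly": ["monthly_summary"]}
    print(cascade(children, "raw_sales"))  # ['monthly', 'by_region', 'monthly_summary']
```

Only the heavy query touches the raw data; every level below it reads from the Elastic Store.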

Note: For Triggered Datasets, keep in mind that:

  • The first dataset must come from the AgenticBI data warehouse.
  • The triggered query follows the first dataset's schedule and is not triggered by any other dataset, so make sure the first dataset is the one whose schedule you want to follow. To trigger a run on every AgenticBI dataset refresh instead, enable Trigger on Each Dataset Refresh (see below).

To set up a triggered query, open the Linked Dataset menu from the query list, define the query against the dataset, and select the "Triggered Query" option.

Skip if Connected Query Running

When enabled, this option prevents a query from running while any of its immediate upstream or downstream connected queries is still running. If a run is skipped, the query next runs at its next automated execution. Skipping in this way prevents potential conflicts and resource contention.
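The check only looks one hop away in each direction. This Python sketch models it with a hypothetical list of (parent, child) edges and a set of currently running queries; it is an illustration of the rule, not AgenticBI's scheduler.

```python
from typing import List, Set, Tuple


def should_skip(query: str, edges: List[Tuple[str, str]], running: Set[str]) -> bool:
    """True if any immediate upstream or downstream neighbour of `query` is running."""
    downstream = {child for parent, child in edges if parent == query}
    upstream = {parent for parent, child in edges if child == query}
    return not (downstream | upstream).isdisjoint(running)


if __name__ == "__main__":
    edges = [("raw_sales", "monthly"), ("monthly", "summary")]
    print(should_skip("monthly", edges, {"summary"}))    # True: direct downstream is running
    print(should_skip("raw_sales", edges, {"summary"}))  # False: summary is two hops away
```

Because only direct neighbours are checked, a long chain can still have queries running at both ends at the same time.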

Where is this option located?

The checkbox is on the query edit page, under the Data Strategy tab. It is available for both the Triggered Query and Scheduled Intervals data strategies.

It is included for Scheduled Intervals because a scheduled query may need to be skipped if a downstream query it triggers is still running from the previous execution.

Trigger on Each Dataset Refresh

When enabled, the query runs each time any AgenticBI data warehouse dataset it references is executed, whether on a schedule or manually. If the query references multiple AgenticBI datasets, each one triggers a run independently.

For example, if the query joins Join A (AgenticBI), Join B (MySQL), and Join C (AgenticBI), it runs whenever Join A or Join C is executed. Join B, being a non-AgenticBI datasource, does not trigger a run.
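The Join A/B/C example reduces to a filter on datasource type plus one run per qualifying refresh. The Python sketch below encodes exactly that; the (name, source type) tuples and function names are hypothetical.

```python
from typing import List, Tuple


def triggering_sources(joins: List[Tuple[str, str]]) -> List[str]:
    """Joined datasets whose refresh triggers a run: only AgenticBI ones."""
    return [name for name, source_type in joins if source_type == "AgenticBI"]


def runs_triggered(joins: List[Tuple[str, str]], refreshed: List[str]) -> int:
    # Each refresh of a referenced AgenticBI dataset triggers one independent run.
    triggers = set(triggering_sources(joins))
    return sum(1 for name in refreshed if name in triggers)


if __name__ == "__main__":
    joins = [("Join A", "AgenticBI"), ("Join B", "MySQL"), ("Join C", "AgenticBI")]
    print(triggering_sources(joins))                              # ['Join A', 'Join C']
    print(runs_triggered(joins, ["Join A", "Join B", "Join C"]))  # 2
```

If Join A and Join C both refresh, the query runs twice, so expensive queries with several AgenticBI inputs may run more often than expected.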

Where is this option located?

The checkbox is on the query edit page, under the Data Strategy tab, and is available when Triggered Query is selected as the data strategy.