Back

How We Set the 95% Accuracy Benchmark for LayerNext

Kelum Vithana
December 20, 2024

The LayerNext AI system has demonstrated exceptional accuracy in answering analytical queries from BI-ready databases. When tested on Microsoft's AdventureWorks Warehouse database, it achieved an impressive 95% accuracy.

Starting with the initial understanding of the data, LayerNext can deliver fairly accurate analytical insights. This accuracy can be further enhanced through continuous human feedback.

Evaluation Process

The capability of the LayerNext AI system to answer analytical questions from a standard dataset is measured through a systematic evaluation process.

High Level Overview

The LayerNext AI system comprises two key components: MetaLake and the Insight Engine. MetaLake manages metadata from connected data sources, while the Insight Engine generates queries and answers using this metadata. For evaluation, predefined questions were submitted to the Insight Engine, and accuracy was measured against available ground truth answers.

Dataset and Schema

Used the schema of Microsoft's AdventureWorks Warehouse SQL DB which contains 28 tables and 344 total columns.

Table Architecture

The tables are organised as 4 major stars such as internet sales, reseller sales, inventory and finance.

Internet sales

Reseller Sales

Inventory

Finance

Note that all dates of the original database are shifted in order to have up to date data. Originally the AdventureWorks had data from 2010 to 2013 and we have added a shift to make the range 2021 to 2024. Other than this, no alterations were made to the schema or data records in the original database.

AI Model

Used the GPT-4o version 2024-08-06

Evaluation Matrix

We created 20 analytical questions covering key areas of the dataset, ranging from low to high data retrieval complexity. The complexity of data retrieval for each question is determined based on these parameters.

  1. How many tables to join - measures the understanding of relationships of data
  2. How many where conditions - measures the understanding of matching criteria required
  3. No. of steps during data retrieval (i.e. with separate SQLs or sub queries in main SQL) - measures the understanding of distribution of data

Based on this, 3 complexity levels are defined as shown below.

Level Description No of joins No of wheres No of steps
L1 Low Less than 4 0 or 1 Only 1
L2 Medium 4 to 5 Less than 3 Less than 3
L3 High 5 or more 2 or more 2 or more

The questions can be categorized into 3 analytical objectives as this:

  1. Revenue and sales performance: Which products, customers, sales territories, and promotional campaigns drive the highest revenue and sales volume.
  2. Marketing and Promotions: Analyse the different reasons or factors affecting the sales growth and measure the impact of marketing and promotions.
  3. Inventory Management: Understand inventory turnover and optimize stock levels, to minimize stockouts and overstocks

Here is the full list of questions:

# Question Category Level
1 What are the top 3 highest priced product lines? Give me a table with product line, no of products in each line and the price. Revenue and sales performance L1
2 Give a summary of total quantity, total value and total tax amount for each quarter in 2023 for online sales Revenue and sales performance L1
3 Find the top 10 products in terms of online sales value. Give a list with product name, total quantity sold and total revenue earned Revenue and sales performance L1
4 Show yearly total revenue as a table Revenue and sales performance L2
5 who were the top 10 distributors in terms of resell quantity during 2024. Give the details with distributer name, city, province, country, sales quantity and revenue Revenue and sales performance L2
6 Who are the top 5 customers based on online sales volume from 2022 to 2024. Give me a list with total sales qty, total sales value, last purchase date, customer name, country, city and province Revenue and sales performance L2
7 What is the total online sales revenue for each product subcategory during each year? Give me a table Revenue and sales performance L2
8 Give the percentage of total resell value out of the total of sales revenue for each year as table having year, resell sales, total sales and percentage Revenue and sales performance L2
9 What is the total quantity and amount of sales for each product category during this year for each month? Consider only the top 2 online selling product categories Revenue and sales performance L3
10 Gompare Revenue Contribution and Performance Between Online and Reseller Sales by Product Subcategory for each quarter in this year using a table having the Quarter, the Total revenue, Online/Resell revenue percentages. Revenue and sales performance L3
11 What are the top performed promotions during 2024 in terms of Resell order count. Give a list of top 5 together with the total resell order count and the total sales amount. Marketing and promotions L2
12 Give a list of discount based promotions (volume and seasonal) with the average discount percentage for reselling and the total sales amount from these promotions. Marketing and promotions L2
13 List the top-performing sales reasons based on total online sales amount in 2024. Marketing and promotions L2
14 Can you get the monthly percentage of total sales (online and resell) came from volume discount promotions in 2024 compared to total sales. Give me a table with month, total volume discount sales amount, total sales amount, percentage. Marketing and promotions L3
15 Give me a table having the sales amount for online sales triggered by marketing reasons and promotion reasons during each month of 2024. Include the month, marketing sales amount, promotion sales amount, total sales amount, marketing sales percentage and promotion sales percentage out of total sales. Marketing and promotions L3
16 I want to see the distribution of trigger reasons for online sales yearly. Can you show a table of year, the reason type, sales item count, percentage out of total sales items. Marketing and promotions L3
17 How is the total quantity of products added to the inventory up to December 31, 2023? Provide the total for each product sub category. Inventory management L2
18 What is the average inventory level of each product sub category throughout the year 2024? Provide the average quantity for each sub category based on the inventory data across each month in 2024. Inventory management L2
19 For online sales in this year, give me a table for each of the top 10 selling product sub categories in terms of sales quantity having this information: Total quantity came into the inventory, total inventory out quantity, total sales quantity, sales to inventory input ratio. Inventory management L3
20 Examine the products that experienced potential stock overflow, based on inventory inflow and low sales activity. A potential stock overflow is defined as subcategories where the total inventory inflow significantly exceeds the total sales (online + resell) quantity (difference ≥ 10%). Give a table with the product name, total inventory inflow, total sales quantity, and the overflow percentage. Inventory management L3

Automatic Tuneup Process

The accuracy gained with initial metadata can be further improved by the automated tuneup facility provided by MetaLake. There are two methods of tuneup.

  1. Tuneup by comparing the ground truth data query with the LayerNext Insight agent generated query
  2. Tuneup from the user’s comments (@feedback) given to the analysis questions

Evaluation Results

Initial Evaluation - with initially generated metadata

Level Total Questions Passed Accuracy
L1 3 2 67%
L2 10 9 90%
L3 7 4 57%
Overall 20 15 75%

Auto Tuneup Round1: Using the comparison with ground truth

Level Total Questions Passed Accuracy
L1 3 3 100%
L2 10 10 100%
L3 7 4 57%
Overall 20 17 85%

Auto Tuneup Round2: Using direct user feedback for analysis questions

Level Total Questions Passed Accuracy
L1 3 3 100%
L2 10 10 100%
L3 7 6 88%
Overall 20 19 95%

Example analytical question answered by LayerNext system

Question: Can you get the monthly percentage of total sales (online and resell) from volume discount promotions in 2024 compared to total sales. Give a table with month, total volume discount sales amount, total sales amount, percentage.

SQL generated by LayerNext Insight Agent

WITH VolumeDiscountPromotions AS (
    SELECT PromotionKey
    FROM ERP_Data.DimPromotion
    WHERE EnglishPromotionType = 'Volume Discount'
),
ResellerSales AS (
    SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS VolumeDiscountSalesAmount
    FROM ERP_Data.FactResellerSales f
    JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
    WHERE f.PromotionKey IN (SELECT PromotionKey FROM VolumeDiscountPromotions)
    AND d.CalendarYear = 2024
    GROUP BY d.MonthNumberOfYear
),
InternetSales AS (
    SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS VolumeDiscountSalesAmount
    FROM ERP_Data.FactInternetSales f
    JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
    WHERE f.PromotionKey IN (SELECT PromotionKey FROM VolumeDiscountPromotions)
    AND d.CalendarYear = 2024
    GROUP BY d.MonthNumberOfYear
),
TotalResellerSales AS (
    SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS TotalSalesAmount
    FROM ERP_Data.FactResellerSales f
    JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
    WHERE d.CalendarYear = 2024
    GROUP BY d.MonthNumberOfYear
),
TotalInternetSales AS (
    SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS TotalSalesAmount
    FROM ERP_Data.FactInternetSales f
    JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
    WHERE d.CalendarYear = 2024
    GROUP BY d.MonthNumberOfYear
)
SELECT 
    COALESCE(r.MonthNumberOfYear, i.MonthNumberOfYear) AS MonthNumber,
    COALESCE(r.VolumeDiscountSalesAmount, 0) + COALESCE(i.VolumeDiscountSalesAmount, 0) AS TotalVolumeDiscountSales,
    COALESCE(rt.TotalSalesAmount, 0) + COALESCE(it.TotalSalesAmount, 0) AS TotalSales,
    CASE 
        WHEN COALESCE(rt.TotalSalesAmount, 0) + COALESCE(it.TotalSalesAmount, 0) > 0 THEN 
            (COALESCE(r.VolumeDiscountSalesAmount, 0) + COALESCE(i.VolumeDiscountSalesAmount, 0)) /
            (COALESCE(rt.TotalSalesAmount, 0) + COALESCE(it.TotalSalesAmount, 0)) * 100
        ELSE 0
    END AS Percentage
FROM ResellerSales r
FULL OUTER JOIN InternetSales i ON r.MonthNumberOfYear = i.MonthNumberOfYear
FULL OUTER JOIN TotalResellerSales rt ON COALESCE(r.MonthNumberOfYear, i.MonthNumberOfYear) = rt.MonthNumberOfYear
FULL OUTER JOIN TotalInternetSales it ON COALESCE(r.MonthNumberOfYear, i.MonthNumberOfYear) = it.MonthNumberOfYear
ORDER BY MonthNumber;

Final Answer

We would love to engage with anyone working on computer vision projects who is struggling to work with a large amount of vision data. Please join our slack channel or reach out to us (buddhika@layernext.ai) to discuss further.

Get in touch logo.
Get in touch