How We Set the 95% Accuracy Benchmark for LayerNext
Kelum Vithana
December 20, 2024
The LayerNext AI system has demonstrated exceptional accuracy in answering analytical queries from BI-ready databases. When tested on Microsoft's AdventureWorks Warehouse database, it achieved an impressive 95% accuracy.
Starting with the initial understanding of the data, LayerNext can deliver fairly accurate analytical insights. This accuracy can be further enhanced through continuous human feedback.
Evaluation Process
The capability of the LayerNext AI system to answer analytical questions from a standard dataset is measured through a systematic evaluation process.
High Level Overview
The LayerNext AI systemcomprises two key components: MetaLake and the Insight Engine. MetaLake manages metadata from connected data sources, while the Insight Engine generates queries and answers using this metadata. For evaluation, predefined questions were submitted to the Insight Engine, and accuracy was measured against available ground truth answers.
Dataset and Schema
Used the schema of Microsoft's AdventureWorks Warehouse SQL DB which contains 28 tables and 344 total columns.
Table Architecture
The tables are organised as 4 major stars such as internet sales, reseller sales, inventory and finance.
Internet sales
Reseller Sales
Inventory
Finance
Note that all dates of the original database are shifted in order to have up to date data. Originally the AdventureWorks had data from 2010 to 2013 and we have added a shift to make the range 2021 to 2024. Other than this, no alterations were made to the schema or data records in the original database.
AI Model
Used the GPT-4o version 2024-08-06
Evaluation Matrix
We created 20 analytical questions covering key areas of the dataset, ranging from low to high data retrieval complexity. The complexity of data retrieval for each question is determined based on these parameters.
How many tables to join - measures the understanding of relationships of data
How many where conditions - measures the understanding of matching criteria required
No. of steps during data retrieval (i.e. with separate SQLs or sub queries in main SQL) - measures the understanding of distribution of data
Based on this, 3 complexity levels are defined as shown below.
Level
Description
No of joins
No of wheres
No of steps
L1
Low
Less than 4
0 or 1
Only 1
L2
Medium
4 to 5
Less than 3
Less than 3
L3
High
5 or more
2 or more
2 or more
The questions can be categorized into 3 analytical objectives as this:
Revenue and sales performance: Which products, customers, sales territories, and promotional campaigns drive the highest revenue and sales volume.
Marketing and Promotions: Analyse the different reasons or factors affecting the sales growth and measure the impact of marketing and promotions.
Inventory Management: Understand inventory turnover and optimize stock levels, to minimize stockouts and overstocks
Here is the full list of questions:
#
Question
Category
Level
1
What are the top 3 highest priced product lines? Give me a table with product line, no of products in each line and the price.
Revenue and sales performance
L1
2
Give a summary of total quantity, total value and total tax amount for each quarter in 2023 for online sales
Revenue and sales performance
L1
3
Find the top 10 products in terms of online sales value. Give a list with product name, total quantity sold and total revenue earned
Revenue and sales performance
L1
4
Show yearly total revenue as a table
Revenue and sales performance
L2
5
who were the top 10 distributors in terms of resell quantity during 2024. Give the details with distributer name, city, province, country, sales quantity and revenue
Revenue and sales performance
L2
6
Who are the top 5 customers based on online sales volume from 2022 to 2024. Give me a list with total sales qty, total sales value, last purchase date, customer name, country, city and province
Revenue and sales performance
L2
7
What is the total online sales revenue for each product subcategory during each year? Give me a table
Revenue and sales performance
L2
8
Give the percentage of total resell value out of the total of sales revenue for each year as table having year, resell sales, total sales and percentage
Revenue and sales performance
L2
9
What is the total quantity and amount of sales for each product category during this year for each month? Consider only the top 2 online selling product categories
Revenue and sales performance
L3
10
Gompare Revenue Contribution and Performance Between Online and Reseller Sales by Product Subcategory for each quarter in this year using a table having the Quarter, the Total revenue, Online/Resell revenue percentages.
Revenue and sales performance
L3
11
What are the top performed promotions during 2024 in terms of Resell order count. Give a list of top 5 together with the total resell order count and the total sales amount.
Marketing and promotions
L2
12
Give a list of discount based promotions (volume and seasonal) with the average discount percentage for reselling and the total sales amount from these promotions.
Marketing and promotions
L2
13
List the top-performing sales reasons based on total online sales amount in 2024.
Marketing and promotions
L2
14
Can you get the monthly percentage of total sales (online and resell) came from volume discount promotions in 2024 compared to total sales. Give me a table with month, total volume discount sales amount, total sales amount, percentage.
Marketing and promotions
L3
15
Give me a table having the sales amount for online sales triggered by marketing reasons and promotion reasons during each month of 2024. Include the month, marketing sales amount, promotion sales amount, total sales amount, marketing sales percentage and promotion sales percentage out of total sales.
Marketing and promotions
L3
16
I want to see the distribution of trigger reasons for online sales yearly. Can you show a table of year, the reason type, sales item count, percentage out of total sales items.
Marketing and promotions
L3
17
How is the total quantity of products added to the inventory up to December 31, 2023? Provide the total for each product sub category.
Inventory management
L2
18
What is the average inventory level of each product sub category throughout the year 2024? Provide the average quantity for each sub category based on the inventory data across each month in 2024.
Inventory management
L2
19
For online sales in this year, give me a table for each of the top 10 selling product sub categories in terms of sales quantity having this information: Total quantity came into the inventory, total inventory out quantity, total sales quantity, sales to inventory input ratio.
Inventory management
L3
20
Examine the products that experienced potential stock overflow, based on inventory inflow and low sales activity. A potential stock overflow is defined as subcategories where the total inventory inflow significantly exceeds the total sales (online + resell) quantity (difference ≥ 10%). Give a table with the product name, total inventory inflow, total sales quantity, and the overflow percentage.
Inventory management
L3
Automatic Tuneup Process
The accuracy gained with initial metadata can be further improved by the automated tuneup facility provided by MetaLake. There are two methods of tuneup.
Tuneup by comparing the ground truth data query with the LayerNext Insight agent generated query
Tuneup from the user’s comments (@feedback) given to the analysis questions
Evaluation Results
Initial Evaluation - with initially generated metadata
Level
Total Questions
Passed
Accuracy
L1
3
2
67%
L2
10
9
90%
L3
7
4
57%
Overall
20
15
75%
Auto Tuneup Round1: Using the comparison with ground truth
Level
Total Questions
Passed
Accuracy
L1
3
3
100%
L2
10
10
100%
L3
7
4
57%
Overall
20
17
85%
Auto Tuneup Round2: Using direct user feedback for analysis questions
Level
Total Questions
Passed
Accuracy
L1
3
3
100%
L2
10
10
100%
L3
7
6
88%
Overall
20
19
95%
Example analytical question answered by LayerNext system
Question: Can you get the monthly percentage of total sales (online and resell) from volume discount promotions in 2024 compared to total sales. Give a table with month, total volume discount sales amount, total sales amount, percentage.
SQL generated by LayerNext Insight Agent
WITH VolumeDiscountPromotions AS (
SELECT PromotionKey
FROM ERP_Data.DimPromotion
WHERE EnglishPromotionType = 'Volume Discount'),
ResellerSales AS (
SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS VolumeDiscountSalesAmount
FROM ERP_Data.FactResellerSales f
JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
WHERE f.PromotionKey IN (SELECT PromotionKey FROM VolumeDiscountPromotions)
AND d.CalendarYear = 2024 GROUP BY d.MonthNumberOfYear
),
InternetSales AS (
SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS VolumeDiscountSalesAmount
FROM ERP_Data.FactInternetSales f
JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
WHERE f.PromotionKey IN (SELECT PromotionKey FROM VolumeDiscountPromotions)
AND d.CalendarYear = 2024 GROUP BY d.MonthNumberOfYear
),
TotalResellerSales AS (
SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS TotalSalesAmount
FROM ERP_Data.FactResellerSales f
JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2024 GROUP BY d.MonthNumberOfYear
),
TotalInternetSales AS (
SELECT d.MonthNumberOfYear, SUM(f.SalesAmount) AS TotalSalesAmount
FROM ERP_Data.FactInternetSales f
JOIN ERP_Data.DimDate d ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2024 GROUP BY d.MonthNumberOfYear
)
SELECT
COALESCE(r.MonthNumberOfYear, i.MonthNumberOfYear) AS MonthNumber,
COALESCE(r.VolumeDiscountSalesAmount, 0) + COALESCE(i.VolumeDiscountSalesAmount, 0) AS TotalVolumeDiscountSales,
COALESCE(rt.TotalSalesAmount, 0) + COALESCE(it.TotalSalesAmount, 0) AS TotalSales,
CASE
WHEN COALESCE(rt.TotalSalesAmount, 0) + COALESCE(it.TotalSalesAmount, 0) > 0 THEN
(COALESCE(r.VolumeDiscountSalesAmount, 0) + COALESCE(i.VolumeDiscountSalesAmount, 0)) /
(COALESCE(rt.TotalSalesAmount, 0) + COALESCE(it.TotalSalesAmount, 0)) * 100 ELSE 0 END AS Percentage
FROM ResellerSales r
FULL OUTER JOIN InternetSales i ON r.MonthNumberOfYear = i.MonthNumberOfYear
FULL OUTER JOIN TotalResellerSales rt ON COALESCE(r.MonthNumberOfYear, i.MonthNumberOfYear) = rt.MonthNumberOfYear
FULL OUTER JOIN TotalInternetSales it ON COALESCE(r.MonthNumberOfYear, i.MonthNumberOfYear) = it.MonthNumberOfYear
ORDER BY MonthNumber;
Final Answer
We would love to engage with anyone working on computer vision projects who is struggling to work with a large amount of vision data. Please join our slack channel or reach out to us (buddhika@layernext.ai) to discuss further.