Approximate Query Processing is a new feature family. It aggregates across large datasets where responsiveness is more critical than absolute precision. In this first version, a new T-SQL aggregate function APPROX_COUNT_DISTINCT
returns the approximate number of unique non-null values in a group, and is a feature under the Intelligent Query Processing suite of features.
Note: Being new T-SQL notation, APPROX_COUNT_DISTINCT
can be used even if not upgrading to Database Compatibility Level 150 .
The new APPROX_COUNT_DISTINCT
function implementation guarantees up to a 2% error rate within a 97% probability. This is appropriate for dashboard scenarios and trend analysis against big data sets with many distinct values (for example, distinct orders counts over a time period) – and many concurrent users, where exact values are not necessary. However, this should not be used with applications where an exact value is required.
More information about this feature is available here.
You could choose to use a container to evaluate this feature. Create an instance of SQL Server 2019 using a Docker image and restore the WideWorldImportersDW database backup
You will need the WideWorldImportersDW database for this exercise. If you don't have this sample database, then you download the sample database here.
Restore the copied WideWorldImportersDW database backup into the container and restore the backup.
docker pull mcr.microsoft.com/mssql/server:2019-latest
docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=`<A Strong Password`>" -p 1445:1433 --name sql2019demo -d mcr.microsoft.com/mssql/server:2019-latest
docker cp ".\Downloads\WideWorldImportersDW-Full.bak" sql2019demo:/var/opt/mssql/data
Note: For Linux installations the default path to use is /var/opt/mssql
USE [master]
GO
IF EXISTS (SELECT [database_id] FROM sys.databases WHERE [name] = 'WideWorldImportersDW')
ALTER DATABASE [WideWorldImportersDW] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
DECLARE @datafilepath VARCHAR(8000) = CAST(SERVERPROPERTY('InstanceDefaultDataPath') AS VARCHAR(4000)) + 'WideWorldImportersDW.mdf'
DECLARE @logfilepath VARCHAR(8000) = CAST(SERVERPROPERTY('InstanceDefaultLogPath') AS VARCHAR(4000)) + 'WideWorldImportersDW.ldf'
DECLARE @inmemfilepath VARCHAR(8000) = CAST(SERVERPROPERTY('InstanceDefaultDataPath') AS VARCHAR(4000)) + 'WideWorldImportersDW_InMemory_Data_1'
DECLARE @secondaryfilepath VARCHAR(8000) = CAST(SERVERPROPERTY('InstanceDefaultDataPath') AS VARCHAR(4000))+ 'WideWorldImportersDW_2.ndf'
-- Change @backupfile file path as needed
DECLARE @backupfile VARCHAR(8000) = 'E:\SampleDBs\WideWorldImportersDW-Full.bak'
RESTORE DATABASE WideWorldImportersDW
FROM DISK = @backupfile
WITH MOVE 'WWI_Primary' TO @datafilepath,
MOVE 'WWI_UserData' TO @secondaryfilepath,
MOVE 'WWIDW_InMemory_Data_1' TO @inmemfilepath,
MOVE 'WWI_Log' TO @logfilepath, NOUNLOAD, REPLACE, STATS = 10
GO
USE [master]
GO
ALTER DATABASE [WideWorldImportersDW] MODIFY FILE ( NAME = N'WWI_Log', SIZE = 4GB )
GO
USE WideWorldImportersDW;
GO
/*
Assumes a fresh restore of WideWorldImportersDW
*/
IF OBJECT_ID('Fact.OrderHistory') IS NULL
BEGIN
SELECT [Order Key], [City Key], [Customer Key], [Stock Item Key], [Order Date Key], [Picked Date Key], [Salesperson Key], [Picker Key], [WWI Order ID], [WWI Backorder ID], Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Total Including Tax], [Lineage Key]
INTO Fact.OrderHistory
FROM Fact.[Order];
END;
ALTER TABLE Fact.OrderHistory
ADD CONSTRAINT PK_Fact_OrderHistory PRIMARY KEY NONCLUSTERED ([Order Key] ASC, [Order Date Key] ASC) WITH (DATA_COMPRESSION = PAGE);
GO
CREATE INDEX IX_Stock_Item_Key
ON Fact.OrderHistory ([Stock Item Key])
INCLUDE(Quantity)
WITH (DATA_COMPRESSION = PAGE);
GO
CREATE INDEX IX_OrderHistory_Quantity
ON Fact.OrderHistory ([Quantity])
INCLUDE([Order Key])
WITH (DATA_COMPRESSION = PAGE);
GO
CREATE INDEX IX_OrderHistory_CustomerKey
ON Fact.OrderHistory([Customer Key])
INCLUDE ([Total Including Tax])
WITH (DATA_COMPRESSION = PAGE);
GO
IF (SELECT COUNT(*) FROM [Fact].[OrderHistory]) < 3702592
BEGIN
DECLARE @i smallint
SET @i = 0
WHILE @i < 4
BEGIN
INSERT INTO [Fact].[OrderHistory] ([City Key], [Customer Key], [Stock Item Key], [Order Date Key], [Picked Date Key], [Salesperson Key], [Picker Key], [WWI Order ID], [WWI Backorder ID], Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Total Including Tax], [Lineage Key])
SELECT [City Key], [Customer Key], [Stock Item Key], [Order Date Key], [Picked Date Key], [Salesperson Key], [Picker Key], [WWI Order ID], [WWI Backorder ID], Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Total Including Tax], [Lineage Key]
FROM [Fact].[OrderHistory];
SET @i = @i +1
END;
END
GO
IF OBJECT_ID('Fact.OrderHistoryExtended') IS NULL
BEGIN
SELECT [Order Key], [City Key], [Customer Key], [Stock Item Key], [Order Date Key], [Picked Date Key], [Salesperson Key], [Picker Key], [WWI Order ID], [WWI Backorder ID], Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Total Including Tax], [Lineage Key]
INTO Fact.OrderHistoryExtended
FROM Fact.[OrderHistory];
END;
ALTER TABLE Fact.OrderHistoryExtended
ADD CONSTRAINT PK_Fact_OrderHistoryExtended PRIMARY KEY NONCLUSTERED ([Order Key] ASC, [Order Date Key] ASC)
WITH (DATA_COMPRESSION = PAGE);
GO
CREATE INDEX IX_Stock_Item_Key
ON Fact.OrderHistoryExtended ([Stock Item Key])
INCLUDE (Quantity);
GO
IF (SELECT COUNT(*) FROM [Fact].[OrderHistory]) < 29620736
BEGIN
DECLARE @i smallint
SET @i = 0
WHILE @i < 3
BEGIN
INSERT Fact.OrderHistoryExtended([City Key], [Customer Key], [Stock Item Key], [Order Date Key], [Picked Date Key], [Salesperson Key], [Picker Key], [WWI Order ID], [WWI Backorder ID], Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Total Including Tax], [Lineage Key])
SELECT [City Key], [Customer Key], [Stock Item Key], [Order Date Key], [Picked Date Key], [Salesperson Key], [Picker Key], [WWI Order ID], [WWI Backorder ID], Description, Package, Quantity, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Total Including Tax], [Lineage Key]
FROM Fact.OrderHistoryExtended;
SET @i = @i +1
END;
END
GO
UPDATE Fact.OrderHistoryExtended
SET [WWI Order ID] = [Order Key];
GO
-- Repeat the following until log shrinks. These demos don't require much log space.
CHECKPOINT
GO
DBCC SHRINKFILE (N'WWI_Log' , 0, TRUNCATEONLY)
GO
SELECT * FROM sys.dm_db_log_space_usage
GO
Note that we are purposefully disabling Batch Mode on Rowstore, so that we isolate and measure only the effects of executing a query with COUNT(DISTINCT)
vs. APPROX_COUNT_DISTINCT()
.
USE [WideWorldImportersDW];
GO
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
GO
SELECT COUNT(DISTINCT [WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE);
GO
(No column name) |
---|
29620736 |
USE [WideWorldImportersDW];
GO
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
GO
SELECT APPROX_COUNT_DISTINCT([WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE);
GO
(No column name) |
---|
30382637 |
Notice the outputs above:
Observe the query execution plans (or actual plans) for the queries.
Remember: the "Query Cost" percentage seen in the plans is just based on estimated cost, not on runtime information.
Specifically, the memory each query was actually granted. 2.2GB for the count distinct vs. 200 bytes for the approximate count distinct:
Being so efficient in terms of consumed memory, it represents a drastic improvement for scenarios where one or more of these queries have to execute at a very high rate, and do so concurrently with other business-critical workloads. Having several concurrent queries taking 2.2GB will definitely have an impact on concurrency, whereas a query that takes 200 bytes will not have that effect.