TOP-10 Analytical SQL Queries - ByteScout
  • Home
  • /
  • Blog
  • /
  • TOP-10 Analytical SQL Queries

TOP-10 Analytical SQL Queries

This post gives a precise, out-and-out idea of analytic functions and their different choices through a range of easy yet concept-building cases. The post is designed for SQL coders, who might not be utilizing analytic functions due to a lack of understanding of its mysterious syntax or difficulty with its philosophy of action. This post presumes knowledge of essential Oracle SQL, sub-query, join, and group functions. Based on that knowledge, it creates the idea of analytic functions within a range of examples. Let’s take a look at it in more detail.

  1. DENSE_RANK
  2. FIRST_VALUE
  3. LAST_VALUE
  4. LEAD
  5. LAG
  6. Nth Value
  7. NTILE
  8. ROW_NUMBER
  9. RANK
  10. CUME_DIST
  11. Query_partition_clause

1. DENSE_RANK

It is a kind of analytic function that determines the order (rank) of a row. The difference between the RANK function and this function is that the DENSE_Rank function returns rank as sequential integers. In the example given below, we are going to obtain the rank of the column salary in the EMP table of Scott’s schema. The output will display the ranks in consecutive order.

SELECT
salary,
DENSE_RANK () OVER (
ORDER BY salary)
sal_of_emp
FROM
EMP;

2. FIRST_VALUE

The FIRST_VALUE is an analytic function that is utilized to give the value of the first row in an organized collection of rows. The example given below will display the lowest salary based on the city of the EMP table. In other words, the following example will display the lowest salary for each city.

select emp_no ,
sal,
city,
FIRST_VALUE(sal)
OVER(PARTITION BY city
ORDER BY emp_no
)FIRST_V
from emp;

3. LAST_VALUE

This is also one of the most widely used analytical functions. As the name suggests, this function gives the highest salary based on the city in the emp table. The example given below will display the highest salary based on the city of the EMP table.

select emp_no,
salary,
city,
LAST_VALUE(salary)
OVER(PARTITION BY city
ORDER BY emp_id
RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)HIGHEST_SALARY
from emp;

The Range between clauses used in the above example means that the window frame begins at the first row and finishes in the end row of the output set.

4. LEAD

The LEAD is one more analytic function that enables a user to obtain the next row from the prevailing row based on an offset rate without applying self-join. The following query returns the salary from the next row to estimate the divergence between the salary of the current row and the succeeding row. The default is 1 if you do not specify offset. The absolute default status is returned if the offset goes past the extent of the window. The default is null if you do not determine the default.

SELECT emp_no,
empname,
job,
salary,
LEAD(salary, 1, 0) OVER (ORDER BY salary) AS salary_next,
LEAD(salary, 1, 0) OVER (ORDER BY salary) - salary AS salary_diff
FROM emp;

5. LAG

LAG is an analytic function that enables users to get a previous row from the prevailing row based on an offset rate without applying self-join. The following query returns the salary from the previous row to calculate the difference between the salary of the current row and that of the prior row. 

SELECT deptno,
eno,
emname,
job,
salary,
LAG(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY salary) AS salary_prev
FROM emp;

6. Nth Value

The Nth value is an analytic function that is used to return the Nth value between the various set of values. For example, the example given below is displaying how to find the price of the second most costly bike using the bike producer’s name from the table bike.

SELECT
BIKE_NO,
bike_name,
price,
NTH_VALUE(price,2) OVER (
PARTITION BY BIKE_NAME
ORDER BY price DESC
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS second_costly_bike
FROM
bike;

Many business reports and high-level logical processing usually require restricting the number of rows passed by a query either by requesting for a particular number of rows (for instance top 5) or a section of rows (for instance 30% of all rows). This kind of query is usually described as a Top-N query and it gives a sincere form of generating reports that display “Who are the ten most valuable clients?”. It is also beneficial for user interfaces that give the first several rows of data inserted for browsing.

7. NTILE

One of the most widely used analytic functions is NTILE. It is used to break an organized set into containers and allots a container number to each row. The container is also called a bucket. The following example is displaying how to allot a bucket number to each bike and the cost column of the bike to a bucket number.

SELECT
bike_name,
price,
NTILE(4) OVER(
ORDER BY cost DESC
) bucket_number
FROM
bike;

8. ROW_NUMBER

This analytic function is used to allot an unprecedented consecutive number to each row of the output set. The following example is displaying how to assign a row number to the output set that includes bike id, bike name, price, city, and purchase when the output set is organized by the purchase column and ordered in descending order.

SELECT
ROW_NUMBER() OVER(
ORDER BY cost DESC
) row_number,
bike_id,
bike_name,
city
FROM
bike;

9. RANK

RANK is also an analytical function that is applied to determine the rank of a price in an organized collection of values. One crucial feature that makes it distinct from DENSE_RANK is that the ranks generated by this function are not consecutive numbers. The example given below is displaying the rank of each bike by its price in descending sequence.

SELECT bike_id, bike_name,
purchase, RANK() OVER(ORDER BY purchase desc)
RANK_NUMBER from
bike;

10. CUME_DIST

The CUME_DIST is an analytical function that is utilized to determine the total circulation of a specific value among a collection of prices. The example given below is displaying how to calculate the percentile rank for the purchase of each bike_id in table bike.

SELECT
bike_id,
purchase,
ROUND(PERCENT_RANK() OVER (ORDER BY purchase DESC) * 100,2) || '%' percent_rank
FROM
bike;

11. Query_partition_clause

The query_partition_clause divides the output into orders, or groups, of data. The expansion of the analytic query is confined to the confines imposed by these barriers, compared to the process of a GROUP BY clause that alters the execution of an aggregate function.If the query_partition_clause is omitted, the complete result group is defined as a distinct partition. The following query is displaying its use:

SELECT eno, dno, sal,
AVG(sal) OVER () AS avg_sal_company
FROM emp;

Analytic SQL enables users to split query output into organized groups of rows named “partitions”. Any aggregated outputs are open to analytical purposes. Partitions can be based upon any column(s) or character. A query output may have just an individual partition containing all the rows, several huge partitions, or multiple short partitions with each containing just some rows.

   

About the Author

ByteScout Team ByteScout Team of Writers ByteScout has a team of professional writers proficient in different technical topics. We select the best writers to cover interesting and trending topics for our readers. We love developers and we hope our articles help you learn about programming and programmers.  
prev
next