So far, we’ve been mostly writing SQL queries that either store or retrieve the data from the database. Besides that, we can rely on PostgreSQL to process the data and get the computed results. By doing that, we can learn more about the rows in our tables. In this article, we look into how we can use aggregate functions to generate statistics about our data.
For the code from this article check out this repository.
The purpose of aggregate functions
The job of an aggregate function is to compute a single result from multiple input rows. One of the most popular aggregate functions is count(). When used with an asterisk, it measures the total number of rows in the table.
1SELECT count(*) FROM usersWhen we provide the count function with a column name, it counts the number of rows with a non-NULL value for that column.
1SELECT count(address_id) AS number_of_users_with_address FROM usersThe count() function was handy in the previous article when we implemented pagination.
1import { Injectable } from '@nestjs/common';
2import DatabaseService from '../database/database.service';
3import PostModel from './post.model';
4
5@Injectable()
6class PostsRepository {
7 constructor(private readonly databaseService: DatabaseService) {}
8
9 async get(offset = 0, limit: number | null = null, idsToSkip = 0) {
10 const databaseResponse = await this.databaseService.runQuery(
11 `
12 WITH selected_posts AS (
13 SELECT * FROM posts
14 WHERE id > $3
15 ORDER BY id ASC
16 OFFSET $1
17 LIMIT $2
18 ),
19 total_posts_count_response AS (
20 SELECT COUNT(*)::int AS total_posts_count FROM posts
21 )
22 SELECT * FROM selected_posts, total_posts_count_response
23 `,
24 [offset, limit, idsToSkip],
25 );
26 const items = databaseResponse.rows.map(
27 (databaseRow) => new PostModel(databaseRow),
28 );
29 const count = databaseResponse.rows[0]?.total_posts_count || 0;
30 return {
31 items,
32 count,
33 };
34 }
35
36 // ...
37}
38
39export default PostsRepository;Grouping data in the table
Aggregate functions work great when we perform them on groups of data
1SELECT author_id, count(*) FROM posts
2GROUP BY author_idWhen we do the above, PostgreSQL divides the data into groups and runs the aggregate function on each group individually.
We could make our query even more helpful and order our results. By doing that, we can ensure the authors with the highest number of posts are at the top of the list.
1SELECT author_id, count(*) AS posts_count FROM posts
2GROUP BY author_id
3ORDER BY posts_count DESCLet’s create a model that can hold the above data.
1export interface PostAuthorStatisticsModelData {
2 author_id: number;
3 posts_count: number;
4}
5class PostAuthorStatisticsModel {
6 authorId: number;
7 postsCount: number;
8 constructor(postAuthorStatisticsData: PostAuthorStatisticsModelData) {
9 this.authorId = postAuthorStatisticsData.author_id;
10 this.postsCount = postAuthorStatisticsData.posts_count;
11 }
12}
13
14export default PostAuthorStatisticsModel;Let’s create a separate statistics repository to prevent our PostsRepository class from getting too big.
1import {
2 Injectable,
3} from '@nestjs/common';
4import DatabaseService from '../database/database.service';
5import PostAuthorStatisticsModel from './postAuthorStatistics.model';
6
7@Injectable()
8class PostsStatisticsRepository {
9 constructor(private readonly databaseService: DatabaseService) {}
10
11 async getPostsAuthorStatistics() {
12 const databaseResponse = await this.databaseService.runQuery(
13 `
14 SELECT author_id, count(*)::int AS posts_count FROM posts
15 GROUP BY author_id
16 ORDER BY posts_count DESC
17 `,
18 [],
19 );
20 return databaseResponse.rows.map(
21 (databaseRow) => new PostAuthorStatisticsModel(databaseRow),
22 );
23 }
24}
25
26export default PostsStatisticsRepository;The count() function returns the value using the bigint data type. Because of that, we convert it to a regular integer. If you want to know more, check out the previous article.
We also need to point to our new repository in the PostsService class.
1import { Injectable } from '@nestjs/common';
2import PostsRepository from './posts.repository';
3import PostsStatisticsRepository from './postsStatistics.repository';
4
5@Injectable()
6export class PostsService {
7 constructor(
8 private readonly postsRepository: PostsRepository,
9 private readonly postsStatisticsRepository: PostsStatisticsRepository,
10 ) {}
11
12 getPostAuthorStatistics() {
13 return this.postsStatisticsRepository.getPostsAuthorStatistics();
14 }
15
16 // ...
17}The last step is to use it in the controller.
1import {
2 ClassSerializerInterceptor,
3 Controller,
4 Get,
5 Query,
6 UseInterceptors,
7} from '@nestjs/common';
8import { PostsService } from './posts.service';
9import GetPostsByAuthorQuery from './getPostsByAuthorQuery';
10import PaginationParams from '../utils/paginationParams';
11
12@Controller('posts')
13@UseInterceptors(ClassSerializerInterceptor)
14export default class PostsController {
15 constructor(private readonly postsService: PostsService) {}
16
17 @Get()
18 getPosts(
19 @Query() { authorId }: GetPostsByAuthorQuery,
20 @Query() { offset, limit, idsToSkip }: PaginationParams,
21 ) {
22 return this.postsService.getPosts(authorId, offset, limit, idsToSkip);
23 }
24
25 @Get('statistics')
26 getStatistics() {
27 return this.postsService.getPostAuthorStatistics();
28 }
29
30 // ...
31}Other aggregate functions
There are more aggregate functions besides count(). Let’s go through them.
max and min
Using the max() function, we can find the largest value of the selected column. Respectively, the min() function returns the smallest value of the column.
Since we don’t have any numerical columns in our posts, let’s pair the above functions with length(). This way, we can get the longest and shortest posts of a particular author.
The length() function returns the length of a string.
1import { Injectable } from '@nestjs/common';
2import DatabaseService from '../database/database.service';
3import PostAuthorStatisticsModel from './postAuthorStatistics.model';
4
5@Injectable()
6class PostsStatisticsRepository {
7 constructor(private readonly databaseService: DatabaseService) {}
8
9 async getPostsAuthorStatistics() {
10 const databaseResponse = await this.databaseService.runQuery(
11 `
12 SELECT
13 author_id,
14 count(*)::int AS posts_count,
15 max(length(post_content)) AS longest_post_length,
16 min(length(post_content)) AS shortest_post_length
17 FROM posts
18 GROUP BY author_id
19 ORDER BY posts_count DESC
20 `,
21 [],
22 );
23 return databaseResponse.rows.map(
24 (databaseRow) => new PostAuthorStatisticsModel(databaseRow),
25 );
26 }
27}
28
29export default PostsStatisticsRepository;Running max() or min() on a text column returns a string based on the alphabetical order.
sum
With the sum() function, we can return a total sum of a particular column. Since it only works with numerical values, we also need the length() function.
Since the sum() function also returns the value in the bigint format, we transform it to a regular integer. We can do it because we don’t expect values bigger than 2³¹⁻¹.
1import { Injectable } from '@nestjs/common';
2import DatabaseService from '../database/database.service';
3import PostAuthorStatisticsModel from './postAuthorStatistics.model';
4
5@Injectable()
6class PostsStatisticsRepository {
7 constructor(private readonly databaseService: DatabaseService) {}
8
9 async getPostsAuthorStatistics() {
10 const databaseResponse = await this.databaseService.runQuery(
11 `
12 SELECT
13 author_id,
14 count(*)::int AS posts_count,
15 max(length(post_content)) AS longest_post_length,
16 min(length(post_content)) AS shortest_post_length,
17 sum(length(post_content))::int AS all_posts_content_sum
18 FROM posts
19 GROUP BY author_id
20 ORDER BY posts_count DESC
21 `,
22 [],
23 );
24 return databaseResponse.rows.map(
25 (databaseRow) => new PostAuthorStatisticsModel(databaseRow),
26 );
27 }
28}
29
30export default PostsStatisticsRepository;avg
The avg() function calculates the average of the values in a group. Let’s combine it with the length() function to calculate the average length of all posts of a particular author.
1import { Injectable } from '@nestjs/common';
2import DatabaseService from '../database/database.service';
3import PostAuthorStatisticsModel from './postAuthorStatistics.model';
4
5@Injectable()
6class PostsStatisticsRepository {
7 constructor(private readonly databaseService: DatabaseService) {}
8
9 async getPostsAuthorStatistics() {
10 const databaseResponse = await this.databaseService.runQuery(
11 `
12 SELECT
13 author_id,
14 count(*)::int AS posts_count,
15 max(length(post_content)) AS longest_post_length,
16 min(length(post_content)) AS shortest_post_length,
17 sum(length(post_content))::int AS all_posts_content_sum,
18 avg(length(post_content))::real AS average_post_content_length
19 FROM posts
20 GROUP BY author_id
21 ORDER BY posts_count DESC
22 `,
23 [],
24 );
25 return databaseResponse.rows.map(
26 (databaseRow) => new PostAuthorStatisticsModel(databaseRow),
27 );
28 }
29}
30
31export default PostsStatisticsRepository;A significant thing about the avg() function is that it returns the data in the numeric type. It can store many digits and is very useful when exactness is crucial. Parsing this data type to JSON converts it to a string by default. Since we don’t need many digits after the decimal, we convert it to the real data type.
Aggregating data from more than one table
So far, we’ve been grouping and aggregating data in one table. However, a typical case might be when we want to aggregate data using more than one table. In this case, we need to use the JOIN keyword.
1SELECT addresses.country, COUNT(*) AS number_of_users FROM users
2LEFT JOIN addresses ON users.address_id = addresses.id
3GROUP BY addresses.countryAbove, we perform an outer join using the LEFT JOIN keyword. If you want to know more, check out API with NestJS #73. One-to-one relationships with raw SQL queries
Filtering using aggregate functions and grouping
So far, to filter the results from the database, we’ve been using the WHERE keyword.
1SELECT * FROM posts
2WHERE id = 1When we want to filter using aggregate functions and grouping, we need to look at the execution order of SQL clauses.
- FROM
- WHERE
- GROUP BY
- HAVING
- DISTINCT
- SELECT
- ORDER BY
- LIMIT
Since PostgreSQL executes the WHERE clause before GROUP BY, we can’t use it with aggregate functions.
1SELECT author_id, count(*) as posts_count FROM posts
2WHERE count(*) > 100
3GROUP BY author_idERROR: aggregate functions are not allowed in WHERE LINE 2: WHERE count(*) > 100
Instead, we need to use the HAVING keyword.
1SELECT author_id, count(*) AS posts_count FROM posts
2GROUP BY author_id
3HAVING count(*) > 100Using aliases
An important caveat is that we can’t use column aliases with the HAVING keyword. So, for example, the following code wouldn’t work:
1SELECT author_id, count(*) AS posts_count FROM posts
2GROUP BY author_id
3HAVING posts_count > 100Instead, we need to use the count() function twice and rely on PostgreSQL to optimize it.
Summary
In this article, we’ve gone through how to use aggregate functions together with grouping. When doing so, we’ve implemented an endpoint that returns statistics about a particular table. We also wrote an example that uses an aggregate function and grouping when joining two tables. Finally, we’ve also learned how to filter our data using aggregate functions and why we can’t do that with the WHERE keyword.
There is still more to cover when writing raw SQL with NestJS, so stay tuned!