Learn NestJS - 188 In-Depth Articles

Getting distinct records with Prisma and PostgreSQL

Marcin WanagoJanuary 15, 2024

NestJSSQL

PostgreSQL allows us to filter a query’s results and ensure we don’t get duplicate rows. This can be helpful when your table has many rows where the data in the columns is the same. In this article, we explore two ways PostgreSQL helps us solve this problem. We also learn how to do that with Prisma and configure it to use the native database features instead of filtering the duplicates in memory.

The DISTINCT keyword

Let’s say we have the following model created with Prisma.

schema.prisma

1model Address {
2  id      Int    @id @default(autoincrement())
3  street  String
4  city    String
5  country String
6  user    User?
7}
8 
9model User {
10  id        Int       @id @default(autoincrement())
11  email     String    @unique
12  name      String
13  password  String
14  address   Address?  @relation(fields: [addressId], references: [id])
15  addressId Int?      @unique
16}

Above, we define a relationship between the users and the addresses. If you want to know more, check out API with NestJS #33. Managing PostgreSQL relationships with Prisma

Let’s say that we have the following addresses in our table:

First, let’s create a query to get a list of all city names from our database.

1SELECT city FROM "Address";

We can modify the above query using the DISTINCT keyword to get a list of unique cities.

1SELECT DISTINCT city FROM "Address";

Using multiple columns

When we look closer at the list of addresses, we can see a city called Richmond, both in the UK and the United States. Fortunately, we can use the DISTINCT keyword to get a unique combination of the city and country.

1SELECT DISTINCT city, country FROM "Address";

The DISTINCT ON keyword

With PostgreSQL, we can access another handy tool called the DISTINCT ON. It is similar to DISTINCT but allows us to retain other columns from the row.

1SELECT DISTINCT ON (city) * FROM "Address";

Above, we selected distinct cities, but we also see ids, countries, and streets.

The catch is that DISTINCT ON selected one row per each distinct value, but it is unpredictable. In the case of New York, it removed 350 5th Ave (the Empire State Building) and selected 1000 5th Ave (The Metropolitan Museum of Art).

The DISTINCT ON keyword makes the most sense when combined with the ORDER BY clause. If we order the results before applying the DISTINCT ON filter, we can be sure that PostgreSQL will always choose the first row based on the order we specified. Thanks to that, we can have predictable results.

Let’s use DISTINCT ON to find the first user who signed up with an address from a particular country.

First, we need to join the User and Address table.

1SELECT "User".id AS userId, "Address".country AS country
2FROM "User"
3JOIN "Address" ON "User"."addressId" = "Address".id

If you want to learn more about joins, check out API with NestJS #90. Using various types of SQL joins

Let’s make sure each country is unique using the DISTINCT ON clause. By ordering the rows by the user ID, we ensure we get the users with the lowest ID possible. Thanks to that, we get the first users who signed up from a particular country.

1SELECT DISTINCT ON(country) "User".id AS "firstRegisteredUserId", "Address".country AS country
2FROM "User"
3JOIN "Address" ON "User"."addressId" = "Address".id
4ORDER BY "Address".country, "firstRegisteredUserId"

Finding distinct values using Prisma

Prisma allows us to filter duplicate rows when using the findMany query.

1const distinctCities = await this.prismaService.address.findMany({
2  distinct: ['city'],
3  select: {
4    city: true
5  }
6})

We can take it further and replicate the example we wrote before using the DISTINCT ON keyword.

1const countriesWithFirstUser = await this.prismaService.address.findMany({
2  include: {
3    user: true,
4  },
5  distinct: ['country'],
6  orderBy: {
7    user: {
8      id: 'asc',
9    },
10  },
11});

How it works under the hood

Let’s configure Prisma to log all SQL queries to the console.

prisma.service.ts

1import { Injectable, OnModuleInit } from '@nestjs/common';
2import { PrismaClient, Prisma } from '@prisma/client';
3 
4@Injectable()
5export class PrismaService
6  extends PrismaClient<Prisma.PrismaClientOptions, Prisma.LogLevel>
7  implements OnModuleInit
8{
9  constructor() {
10    super({
11      log: [
12        {
13          emit: 'event',
14          level: 'query',
15        },
16      ],
17    });
18  }
19 
20  async onModuleInit() {
21    await this.$connect();
22 
23    this.$on('query', (event) => {
24      console.log(`Query: ${event.query}`);
25    });
26  }
27}

If you want to know more about logging with Prisma and NestJS, take a look at API with NestJS #113. Logging with Prisma

Let’s run our findMany query that finds all distinct cities and inspect the SQL query that Prisma makes under the hood:

1SELECT "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1

Unfortunately, there is a catch. By default, Prisma makes separate SELECT queries and processes the data in memory when we use distinct. This can result in a performance that is not as good as a native SQL query that uses the DISTINCT keyword. Since Prisma 5.7.0, we can affect that by using a preview feature.

schema.prisma

1generator client {
2  provider        = "prisma-client-js"
3  previewFeatures = ["nativeDistinct"]
4}

When we add the nativeDistinct preview feature and run npx prisma generate, Prisma starts using the DISTINCT ON clause.

1SELECT DISTINCT ON ("public"."Address"."city") "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1

Unfortunately, it only works with unordered queries right now. If we run our findMany query that finds the first users who signed up from a particular country, Prisma still parses the data in memory instead of using DISTINCT ON:

1SELECT "public"."Address"."id", "public"."Address"."street", "public"."Address"."city", "public"."Address"."country" FROM "public"."Address" LEFT JOIN "public"."User" AS "orderby_1" ON ("orderby_1"."addressId") = ("public"."Address"."id") WHERE 1=1 ORDER BY "orderby_1"."id" ASC OFFSET $1;
2SELECT "public"."User"."id", "public"."User"."email", "public"."User"."name", "public"."User"."password", "public"."User"."addressId" FROM "public"."User" WHERE "public"."User"."addressId" IN ($1,$2,$3) OFFSET $4;

Hopefully, Prisma will improve its distinct implementation soon and use the native DISTINCT ON in more cases. We can track the progress of this issue on GitHub.

Summary

PostgreSQL offers efficient methods to filter out duplicate rows in queries, which helps handle large datasets with repetitive data. In this article, we explored the DISTINCT and DISTINCT ON keywords in PostgreSQL and compared them using various examples.

We also learned how to filter out duplicate values through Prisma. Moreover, we’ve looked under the hood and learned that Prisma filters out the data in memory by default. Finally, we’ve learned how to change the default behavior and configure Prisma to rely more on the native features built into PostgreSQL.