Nest.js Tutorial

Full-text search with the Drizzle ORM and PostgreSQL

Marcin Wanago
NestJSSQL

With PostgreSQL’s full-text search feature, we can quickly find documents that contain a particular word or phrase. It can also sort the results to show the most relevant matches first. In this article, we learn how to implement it with the Drizzle ORM, PostgreSQL, and NestJS.

Column types used with the text-search feature

To implement the full-text search with PostgreSQL, we need two data types. They allow us to search through a set of texts and find the ones that best match a given query.

tsvector

With the tsvector column, we can store the text in a format optimized for searching. Unfortunately, the Drizzle ORM does not support it natively yet. Therefore, we must understand how it works from the ground up and use some raw SQL in our code.

To convert a regular string to the tsvector format, we need to use the to_tsvector function.

1SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog');
“The quick brown fox jumps over the lazy dog” is a common sentence that contains all the letters of the alphabet.

When we examine the result of the above query, several optimizations become apparent. The most apparent is duplicate grouping. Using the English dictionary, PostgreSQL recognized that “quick” and “quickly” are different forms of the same word.

Besides that, the tsvector type filters the stop words. These common words appear in almost every sentence but don’t add much value when searching through text. In the example above, since we used the English dictionary, PostgreSQL automatically filtered out words like “the” and “over.” This is a test.

tsquery

The tsquery data type is designed to store the text we want to search for. To easily convert a string into the tsquery format, we must use the to_tsquery function.

1SELECT to_tsquery('fox');

To verify if the  tsvector data matches a tsquery, we must use the @@ operator.

1SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ to_tsquery('dog');

We can use the boolean operators such as &, |, and !. For example, the ! operator helps ensure that a particular word is not included in the text.

1SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ to_tsquery('!cat');
Check out the official documentation for a description of all operators.

Another helpful function is plainto_tsquery. It converts an unformatted phrase to a query by inserting the & operator between words, making it a great option for handling the user’s input.

1SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ plainto_tsquery('brown fox');

Working with existing data

In the previous parts of this series, we’ve worked with the following database schema.

database-schema.ts
1import { serial, text, pgTable } from 'drizzle-orm/pg-core';
2 
3export const articles = pgTable('articles', {
4  id: serial('id').primaryKey(),
5  title: text('title'),
6  content: text('content'),
7});
8 
9export const databaseSchema = {
10  articles,
11};

Since our table does not contain a tsvector column, we must find another solution. The most straightforward approach is to convert our text to tsvector on the fly.

articles.service.ts
1import { Injectable } from '@nestjs/common';
2import { DrizzleService } from '../database/drizzle.service';
3import { databaseSchema } from '../database/database-schema';
4import { sql } from 'drizzle-orm';
5 
6@Injectable()
7export class ArticlesService {
8  constructor(private readonly drizzleService: DrizzleService) {}
9  
10  // ...
11 
12  searchByQuery(query: string) {
13    return this.drizzleService.db
14      .select()
15      .from(databaseSchema.articles)
16      .where(
17        sql`to_tsvector('english', ${databaseSchema.articles.content}) @@ plainto_tsquery(${query})`,
18      );
19  }
20}

To use this approach in our NestJS application, we should use an optional query param that allows the user to provide the search query.

articles.controller.ts
1import {
2  Controller,
3  Get,
4  Query,
5} from '@nestjs/common';
6import { ArticlesService } from './articles.service';
7 
8@Controller('articles')
9export class ArticlesController {
10  constructor(private readonly articlesService: ArticlesService) {}
11 
12  @Get()
13  getAll(@Query('search') searchQuery: string) {
14    if (searchQuery) {
15      return this.articlesService.searchByQuery(searchQuery);
16    }
17    return this.articlesService.getAll();
18  }
19 
20  // ...
21}

We can combine the title and content columns to search through them both.

articles.service.ts
1import { Injectable } from '@nestjs/common';
2import { DrizzleService } from '../database/drizzle.service';
3import { databaseSchema } from '../database/database-schema';
4import { sql } from 'drizzle-orm';
5 
6@Injectable()
7export class ArticlesService {
8  constructor(private readonly drizzleService: DrizzleService) {}
9  
10  // ...
11 
12    searchByQuery(query: string) {
13    return this.drizzleService.db
14      .select()
15      .from(databaseSchema.articles)
16      .where(
17        sql`
18          to_tsvector(
19            'english',
20             ${databaseSchema.articles.content} || ' ' || ${databaseSchema.articles.title}
21          ) @@ plainto_tsquery(${query})
22        `,
23      );
24  }
25}

The key problem with this approach is that it forces PostgreSQL to convert the text from every record in the articles table, which can be very time-consuming. Instead, we can define a generated column that automatically transforms the data into the tsvector format.

If you want to know more about generated colums, check out API with NestJS #161. Generated columns with the Drizzle ORM and PostgreSQL

Unfortunately, the Drizzle ORM does not support the tsvector columns. To deal with that, we have to define a custom type.

database-schema.ts
1import { serial, text, pgTable, customType, index } from 'drizzle-orm/pg-core';
2import { sql, SQL } from 'drizzle-orm';
3 
4const tsvector = customType<{ data: unknown }>({
5  dataType() {
6    return 'tsvector';
7  },
8});
9 
10export const articles = pgTable(
11  'articles',
12  {
13    id: serial('id').primaryKey(),
14    title: text('title'),
15    content: text('content'),
16    textTsvector: tsvector('text_tsvector').generatedAlwaysAs(
17      (): SQL => sql`
18      to_tsvector('english', ${articles.title} || ' ' || ${articles.content})
19    `,
20    ),
21  },
22  (table) => {
23    return {
24      textTsvectorIndex: index('text_tsvector_index').using(
25        'gin',
26        table.textTsvector,
27      ),
28    };
29  },
30);
31 
32export const databaseSchema = {
33  articles,
34};

Above, we create a stored generated column. PostgreSQL automatically updates it whenever the content or title columns change.

We also create a Generalized Inverted Index (GIN). This index type is highly effective for text searching and is ideal when a column contains multiple values. Implementing a GIN index can significantly enhance the speed of our SELECT queries.

Thanks to this approach, we can now simplify our query.

articles.service.ts
1import { Injectable } from '@nestjs/common';
2import { DrizzleService } from '../database/drizzle.service';
3import { databaseSchema } from '../database/database-schema';
4import { sql } from 'drizzle-orm';
5 
6@Injectable()
7export class ArticlesService {
8  constructor(private readonly drizzleService: DrizzleService) {}
9 
10  searchByQuery(query: string) {
11    return this.drizzleService.db
12      .select()
13      .from(databaseSchema.articles)
14      .where(
15        sql`${databaseSchema.articles.textTsvector} @@ plainto_tsquery(${query})`,
16      );
17  }
18 
19  // ...
20}

Ordering our results

Until now, we haven’t focused on the order of the results in our query. Sorting the search results by relevance can significantly improve the user experience.

For instance, we can prioritize the text from the title column over the content column. To achieve this, we can modify how we create the text_tsvector column and use the setweight function.

database-schema.ts
1import { serial, text, pgTable, customType, index } from 'drizzle-orm/pg-core';
2import { sql, SQL } from 'drizzle-orm';
3 
4const tsvector = customType<{ data: unknown }>({
5  dataType() {
6    return 'tsvector';
7  },
8});
9 
10export const articles = pgTable(
11  'articles',
12  {
13    id: serial('id').primaryKey(),
14    title: text('title'),
15    content: text('content'),
16    textTsvector: tsvector('text_tsvector').generatedAlwaysAs(
17      (): SQL => sql`
18      setweight(to_tsvector('english', ${articles.title}), 'A') ||
19      setweight(to_tsvector('english', ${articles.content}), 'B')
20    `,
21    ),
22  },
23  (table) => {
24    return {
25      textTsvectorIndex: index('text_tsvector_index').using(
26        'gin',
27        table.textTsvector,
28      ),
29    };
30  },
31);
32 
33export const databaseSchema = {
34  articles,
35};

With the setweight function, we assign a weight to each term in the tsvector column. The A indicates the highest weight, and the D indicates the lowest.

Thanks to this, we can use the ts_rank function to order the results based on the weight of each column.

1import { Injectable } from '@nestjs/common';
2import { DrizzleService } from '../database/drizzle.service';
3import { databaseSchema } from '../database/database-schema';
4import { sql } from 'drizzle-orm';
5 
6@Injectable()
7export class ArticlesService {
8  constructor(private readonly drizzleService: DrizzleService) {}
9 
10  searchByQuery(query: string) {
11    return this.drizzleService.db
12      .select()
13      .from(databaseSchema.articles)
14      .where(
15        sql`${databaseSchema.articles.textTsvector} @@ plainto_tsquery(${query})`,
16      )
17      .orderBy(
18        sql`ts_rank(${databaseSchema.articles.textTsvector}, plainto_tsquery(${query})) DESC`,
19      );
20  }
21 
22  // ...
23}