Search
ctrl/
Ask AI
Light
Dark
System

ext::pg_trgm

This extension provides tools for determining similarity of text based on trigram matching.

Word similarity tools can often supplement full-text search. Full-text search concentrates on matching words and phrases typically trying to account for some grammatical variations, while trigram matching analyzes similarity between words. Thus trigram matching can account for misspelling or words that aren’t dictionary words:

Copy
db> 
select fts::search(Doc, 'thaco').object{text};
{}
Copy
db> 
select Doc{text} filter ext::pg_trgm::word_similar('thaco', Doc.text);
{
  default::Doc {
    text: 'THAC0 is used in AD&D 2 to determine likelihood of hitting',
  },
}

The first search attempt fails to produce results because “THAC0” is an obscure acronym that is misspelled in the query. However, using similarity search produces a hit because the acronym is not too badly misspelled and is close enough.

The Postgres that comes packaged with the EdgeDB 4.0+ server includes pg_trgm, as does EdgeDB Cloud. It you are using a separate Postgres backend, you will need to arrange for it to be installed.

To activate this functionality you can use the extension mechanism:

Copy
using extension pg_trgm;

That will give you access to the ext::pg_trgm module where you may find the following functions:

ext::pg_trgm::similarity()

Computes how similar two strings are.

ext::pg_trgm::similarity_dist()

Computes how distant two strings are.

ext::pg_trgm::similar()

Returns whether two strings are similar.

ext::pg_trgm::word_similarity()

Returns similarity between the first and any part of the second string.

ext::pg_trgm::word_similarity_dist()

Returns distance between the first and any part of the second string.

ext::pg_trgm::word_similar()

Returns whether the first string is similar to any part of the second.

ext::pg_trgm::strict_word_similarity()

Same as word_similarity, but with stricter boundaries.

ext::pg_trgm::strict_word_similarity_dist()

Same as word_similarity_dist, but with stricter boundaries.

ext::pg_trgm::strict_word_similar()

Same as word_similar, but with stricter boundaries.

In addition to the functions this extension has two indexes that speed up queries that involve similarity searches: ext::pg_trgm::gin and ext::pg_trgm::gist.

This extension also adds a few configuration options to control some of the similarity search behavior:

Copy
type Config extending cfg::ConfigObject {
  required similarity_threshold: float32;
  required word_similarity_threshold: float32;
  required strict_word_similarity_threshold: float32;
}

All of the configuration parameters have to take values between 0 and 1.

The similarity_threshold sets the current similarity threshold that is used by ext::pg_trgm::similar() (default is 0.3).

The word_similarity_threshold sets the current word similarity threshold that is used by ext::pg_trgm::word_similar() (default is 0.6).

The strict_word_similarity_threshold sets the current strict word similarity threshold that is used by ext::pg_trgm::strict_word_similar() (default is 0.5).

function

ext::pg_trgm::similarity()
ext::pg_trgm::similarity(a: str, b: str) -> float32

Computes how similar two strings are.

The result is always a value between 0 and 1, where 0 indicates no similarity and 1 indicates the strings are identical.

Copy
db> 
select ext::pg_trgm::similarity('cat', 'dog');
{0}
Copy
db> 
select ext::pg_trgm::similarity('cat', 'cart');
{0.28571427}
Copy
db> 
select ext::pg_trgm::similarity('cat', 'car');
{0.33333337}
Copy
db> 
select ext::pg_trgm::similarity('cat', 'cat');
{1}

function

ext::pg_trgm::similarity_dist()
ext::pg_trgm::similarity_dist(a: str, b: str) -> float32

Computes how distant two strings are.

The distance between a and b is simply defined as 1 - ext::pg_trgm::similarity(a, b).

Copy
db> 
select ext::pg_trgm::similarity_dist('cat', 'dog');
{1}
Copy
db> 
select ext::pg_trgm::similarity_dist('cat', 'cart');
{0.71428573}
Copy
db> 
select ext::pg_trgm::similarity_dist('cat', 'car');
{0.6666666}
Copy
db> 
select ext::pg_trgm::similarity_dist('cat', 'cat');
{0}

function

ext::pg_trgm::similar()
ext::pg_trgm::similar(a: str, b: str) -> bool

Returns whether two strings are similar.

The result is true if the ext::pg_trgm::similarity() between the two strings is greater than the currently configured similarity_threshold.

Copy
db> 
select ext::pg_trgm::similar('cat', 'dog');
{false}
Copy
db> 
select ext::pg_trgm::similar('cat', 'cart');
{false}
Copy
db> 
select ext::pg_trgm::similar('cat', 'car');
{true}
Copy
db> 
select ext::pg_trgm::similar('cat', 'cat');
{true}

function

ext::pg_trgm::word_similarity()
ext::pg_trgm::word_similarity(a: str, b: str) -> float32

Returns similarity between the first and any part of the second string.

The result is the greatest similarity between the set of trigrams in a and any continuous extent of an ordered set of trigrams in b.

Copy
db> 
select ext::pg_trgm::word_similarity('cat', 'Lazy dog');
{0}
Copy
db> 
select ext::pg_trgm::word_similarity('cat', 'Dog in a car');
{0.5}
Copy
db> 
select ext::pg_trgm::word_similarity('cat', 'Dog catastrophe');
{0.75}
Copy
db> 
select ext::pg_trgm::word_similarity('cat', 'Lazy dog and cat');
{1}

function

ext::pg_trgm::word_similarity_dist()
ext::pg_trgm::word_similarity_dist(a: str, b: str) -> float32

Returns distance between the first and any part of the second string.

The distance between a and b is simply defined as 1 - ext::pg_trgm::word_similarity(a, b).

Copy
db> 
select ext::pg_trgm::word_similarity_dist('cat', 'Lazy dog');
{1}
Copy
db> 
select ext::pg_trgm::word_similarity_dist('cat', 'Dog in a car');
{0.5}
Copy
db> 
select ext::pg_trgm::word_similarity_dist('cat', 'Dog catastrophe');
{0.25}
Copy
db> 
select ext::pg_trgm::word_similarity_dist('cat', 'Lazy dog and cat');
{0}

function

ext::pg_trgm::word_similar()
ext::pg_trgm::word_similar(a: str, b: str) -> bool

Returns whether the first string is similar to any part of the second.

The result is true if the ext::pg_trgm::word_similarity() between the two strings is greater than the currently configured word_similarity_threshold.

Copy
db> 
select ext::pg_trgm::word_similar('cat', 'Lazy dog');
{false}
Copy
db> 
select ext::pg_trgm::word_similar('cat', 'Dog in a car');
{false}
Copy
db> 
select ext::pg_trgm::word_similar('cat', 'Dog catastrophe');
{true}
Copy
db> 
select ext::pg_trgm::word_similar('cat', 'Lazy dog and cat');
{true}

function

ext::pg_trgm::strict_word_similarity()
ext::pg_trgm::strict_word_similarity(a: str, b: str) -> float32

Same as word_similarity, but with stricter boundaries.

This works much like ext::pg_trgm::word_similarity(), but also forces the match within b to happen at word boundaries.

Copy
db> 
select ext::pg_trgm::strict_word_similarity('cat', 'Lazy dog');
{0}
Copy
db> 
select ext::pg_trgm::strict_word_similarity('cat', 'Dog in a car');
{0.5}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similarity(
  'cat', 'Dog catastrophy');
{0.23076922}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similarity(
  'cat', 'Lazy dog and cat');
{1}

function

ext::pg_trgm::strict_word_similarity_dist()
ext::pg_trgm::strict_word_similarity_dist(a: str, b: str) -> float32

Same as word_similarity_dist, but with stricter boundaries.

This works much like ext::pg_trgm::word_similarity_dist(), but also forces the match within b to happen at word boundaries.

Copy
db> 
... 
select ext::pg_trgm::strict_word_similarity_dist(
  'cat', 'Lazy dog');
{1}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similarity_dist(
  'cat', 'Dog in a car');
{0.5}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similarity_dist(
  'cat', 'Dog catastrophy');
{0.7692308}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similarity_dist(
  'cat', 'Lazy dog and cat');
{0}

function

ext::pg_trgm::strict_word_similar()
ext::pg_trgm::strict_word_similar(a: str, b: str) -> bool

Same as word_similar, but with stricter boundaries.

This works much like ext::pg_trgm::word_similar(), but also forces the match within b to happen at word boundaries.

Copy
db> 
... 
select ext::pg_trgm::strict_word_similar(
  'cat', 'Lazy dog');
{false}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similar(
  'cat', 'Lazy dog');
{false}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similar(
  'cat', 'Dog catastrophy');
{false}
Copy
db> 
... 
select ext::pg_trgm::strict_word_similar(
  'cat', 'Lazy dog and cat');
{true}