String Functions and Operators
String | |
String indexing. | |
String slicing. | |
String concatenation. | |
Case-sensitive simple string matching. | |
Case-insensitive simple string matching. | |
Comparison operators | |
Returns the string representation of the input value. | |
Returns a string’s length. | |
Tests if a string contains a substring. | |
Finds the index of a substring. | |
Returns a lowercase copy of the input string. | |
Returns an uppercase copy of the input string. | |
Returns a titlecase copy of the input string. | |
Returns the input string padded at the start to the length n. | |
Returns the input string padded at the end to the length n. | |
Returns the input string with trim characters removed from both ends. | |
Returns the input string with all trim characters removed from the start. | |
Returns the input string with all trim characters removed from the end. | |
Repeats the input string n times. | |
Replaces all occurrences of a substring with a new one. | |
Reverses the order of the characters in the string. | |
Splits a string into an array using a delimiter. | |
Finds the first regular expression match in a string. | |
Finds all regular expression matches in a string. | |
Replaces matching substrings in a given string. | |
Tests if a regular expression has a match in a string. |
A unicode string of text.
Any other type (except bytes
) can be
cast
to and from a string:
db>
select <str>42;
{'42'}
db>
select <bool>'true';
{true}
db>
select "I ❤️ EdgeDB";
{'I ❤️ EdgeDB'}
Note that when a str
is cast into a json
,
the result is a JSON string value. The same applies for casting back
from json
- only a JSON string value can be cast into
a str
:
db>
select <json>'Hello, world';
{'"Hello, world"'}
There are two kinds of string literals in EdgeQL: regular and raw.
Raw string literals do not evaluate \
, so \n
in in a raw string
is two characters \
and n
.
The regular string literal syntax is 'a string'
or a "a string"
.
Two raw string syntaxes are illustrated below:
db>
select r'A raw \n string';
{'A raw \\n string'}
db>
select 'Not a raw \n string';
{ 'Not a raw string', }
db>
select $$something$$;
{'something'}
db> ...
select $marker$something $$
nested \!$$$marker$;
{'something $$ nested \!$$'}
Regular strings use \
to indicate line continuation. When a
line continuation symbol is encountered, the symbol itself as well
as all the whitespace characters up to the next non-whitespace
character are omitted from the string:
db> ...
select 'Hello, \
world';
{'"Hello, world"'}
This type is subject to the Postgres maximum field size of 1GB.
String indexing.
Indexing starts at 0. Negative indexes are also valid and count from the end of the string.
db>
select 'some text'[1];
{'o'}
db>
select 'some text'[-1];
{'t'}
It is an error to attempt to extract a character at an index outside the bounds of the string:
db>
select 'some text'[8];
{'t'}
db>
select 'some text'[9];
InvalidValueError: string index 9 is out of bounds
A slice up to the next index can be used if an empty string is preferred to an error when outside the bounds of the string:
db>
select 'some text'[8:9];
{'t'}
db>
select 'some text'[9:10];
{''}
String slicing.
Indexing starts at 0. Negative indexes are also valid and count from the end of the string.
db>
select 'some text'[1:3];
{'om'}
db>
select 'some text'[-4:];
{'text'}
db>
select 'some text'[:-5];
{'some'}
db>
select 'some text'[5:-2];
{'te'}
It is perfectly acceptable to use indexes outside the bounds of a string in a slice:
db>
select 'some text'[-4:100];
{'text'}
db>
select 'some text'[-100:-5];
{'some'}
Case-sensitive simple string matching.
Returns true
if the value (the str
on the left) matches the
pattern (the str
on the right) and false
otherwise. The operator
not like
is the negation of like
.
The pattern matching rules are as follows:
pattern |
interpretation |
---|---|
|
matches zero or more characters |
|
matches exactly one character |
|
matches a literal “%” |
|
matches a literal “_” |
any other character |
matches itself |
In particular, this means that if there are no special symbols in
the pattern, the operators like
and not
like
work identical to =
and !=
,
respectively.
db>
select 'abc' like 'abc';
{true}
db>
select 'abc' like 'a%';
{true}
db>
select 'abc' like '_b_';
{true}
db>
select 'abc' like 'c';
{false}
db>
select 'a%%c' not like r'a\%c';
{true}
Returns the input string padded at the start to the length n.
If the string is longer than n, then it is truncated to the first n characters. Otherwise, the string is padded on the left up to the total length n using fill characters (space by default).
db>
select str_pad_start('short', 10);
{' short'}
db>
select str_pad_start('much too long', 10);
{'much too l'}
db>
select str_pad_start('short', 10, '.:');
{'.:.:.short'}
Returns the input string padded at the end to the length n.
If the string is longer than n, then it is truncated to the first n characters. Otherwise, the string is padded on the right up to the total length n using fill characters (space by default).
db>
select str_pad_end('short', 10);
{'short '}
db>
select str_pad_end('much too long', 10);
{'much too l'}
db>
select str_pad_end('short', 10, '.:');
{'short.:.:.'}
Returns the input string with all trim characters removed from the start.
If trim specifies more than one character they will be removed from the beginning of the string regardless of the order in which they appear.
db>
select str_trim_start(' data');
{'data'}
db>
select str_trim_start('.....data', '.:');
{'data'}
db>
select str_trim_start(':::::data', '.:');
{'data'}
db>
select str_trim_start(':...:data', '.:');
{'data'}
db>
select str_trim_start('.:.:.data', '.:');
{'data'}
Returns the input string with all trim characters removed from the end.
If trim specifies more than one character they will be removed from the end of the string regardless of the order in which they appear.
db>
select str_trim_end('data ');
{'data'}
db>
select str_trim_end('data.....', '.:');
{'data'}
db>
select str_trim_end('data:::::', '.:');
{'data'}
db>
select str_trim_end('data:...:', '.:');
{'data'}
db>
select str_trim_end('data.:.:.', '.:');
{'data'}
Returns the input string with trim characters removed from both ends.
If trim specifies more than one character they will be removed from
both ends of the string regardless of the order in which they appear.
This is the same as applying
str_trim_start()
and str_trim_end()
.
db>
select str_trim(' data ');
{'data'}
db>
select str_trim('::data.....', '.:');
{'data'}
db>
select str_trim('..data:::::', '.:');
{'data'}
db>
select str_trim('.:data:...:', '.:');
{'data'}
db>
select str_trim(':.:.data.:.', '.:');
{'data'}
Replaces all occurrences of a substring with a new one.
Given a string s, finds all non-overlapping occurrences of the substring old and replaces them with the substring new.
db>
select str_replace('hello world', 'h', 'H');
{'Hello world'}
db>
select str_replace('hello world', 'l', '[L]');
{'he[L][L]o wor[L]d'}
db>
select str_replace('hello world', 'o', '😄');
{'hell😄 w😄rld'}
Finds the first regular expression match in a string.
Given an input string and a regular expression pattern, finds the first match for the regular expression
within the string. Each match returned is represented by an
array<str>
of matched groups.
db>
select re_match(r'\w{4}ql', 'I ❤️ edgeql');
{['edgeql']}
Finds all regular expression matches in a string.
Given an input string and a regular expression pattern, repeatedly matches the regular expression within
the string. Returns the set of all matches, with each match
represented by an array<str>
of matched groups.
db>
select re_match_all(r'a\w+', 'an abstract concept');
{['an'], ['abstract']}
Replaces matching substrings in a given string.
Takes an input string and a regular expression pattern, replacing matching substrings with the replacement string sub. Optional flag arguments can be used to specify additional regular expression flags.
db>
select re_replace('a', 'A', 'Alabama');
{'AlAbama'}
db>
select re_replace('a', 'A', 'Alabama', flags := 'g');
{'AlAbAmA'}
db>
select re_replace('A', 'A', 'Alabama', flags := 'ig');
{'AlAbAmA'}
Tests if a regular expression has a match in a string.
Given an input string and a regular expression pattern, tests whether there is a match for the regular
expression within the string. Returns true
if there is a
match, false
otherwise.
db>
select re_test(r'a', 'abc');
{true}
Returns the string representation of the input value.
A versatile polymorphic function defined for different input types,
to_str
uses corresponding converter functions from str
to specific types via the format argument fmt.
When converting bytes
, datetime
,
cal::local_datetime
, cal::local_date
,
cal::local_time
, duration
this function
is the inverse of to_bytes()
, to_datetime()
,
cal::to_local_datetime()
, cal::to_local_date()
,
cal::to_local_time()
, to_duration()
, correspondingly.
For valid date and time formatting patterns see here.
db> ...
select to_str(<datetime>'2018-05-07 15:01:22.306916-05',
'FMDDth "of" FMMonth, YYYY');
{'7th of May, 2018'}
db>
select to_str(<cal::local_date>'2018-05-07', 'CCth "century"');
{'21st century'}
If you want to use literal text in your format string, it’s best to
enclose it in double quotes as shown above with of
and
century
.
When converting one of the numeric types, this function is the
reverse of: to_bigint()
, to_decimal()
,
to_int16()
, to_int32()
, to_int64()
,
to_float32()
, to_float64()
.
For valid number formatting patterns see here.
See also to_json()
.
db>
select to_str(123, '999999');
{' 123'}
db>
select to_str(123, '099999');
{' 000123'}
db>
select to_str(123.45, 'S999.999');
{'+123.450'}
db>
select to_str(123.45e-20, '9.99EEEE');
{' 1.23e-18'}
db>
select to_str(-123.45n, 'S999.99');
{'-123.45'}
When converting json
, this function can take
'pretty'
as an optional fmt argument to produce a
pretty-formatted JSON string.
See also to_json()
.
db>
select to_str(<json>2);
{'2'}
db>
select to_str(<json>['hello', 'world']);
{'["hello", "world"]'}
db>
select to_str(<json>(a := 2, b := 'hello'), 'pretty');
{'{ "a": 2, "b": "hello" }'}
When converting arrays
, a delimiter argument
is required:
db>
select to_str(['one', 'two', 'three'], ', ');
{'one, two, three'}
A version of std::to_str
exists which operates on arrays but has
been deprecated; array_join()
should be used instead.
A bytes
value can be converted to a str
using
UTF-8 encoding. Returns an InvalidValueError if input UTF-8 is invalid.
db>
select to_str(b'\xe3\x83\x86');
{'テ'}
db>
select to_str(b'\xe3\x83');
edgedb error: InvalidValueError: invalid byte sequence for encoding "UTF8": 0xe3 0x83
Regular Expressions
EdgeDB supports Regular expressions (REs), as defined in POSIX 1003.2. They come in two forms: BRE (basic RE) and ERE (extended RE). In addition, EdgeDB supports certain common extensions to the POSIX standard commonly known as ARE (advanced RE). More details about BRE, ERE, and ARE support can be found in PostgreSQL documentation.
The table below outlines the different options accepted as the flags
argument to various regular expression functions, or as
embedded options in the pattern itself, e.g. '(?i)fooBAR'
:
Option Flags
Option |
Description |
---|---|
|
rest of RE is a BRE |
|
case-sensitive matching (overrides operator type) |
|
rest of RE is an ERE |
|
case-insensitive matching (overrides operator type) |
|
historical synonym for n |
|
newline-sensitive matching |
|
partial newline-sensitive matching |
|
rest of RE is a literal (“quoted”) string, all ordinary characters |
|
non-newline-sensitive matching (default) |
|
tight syntax (default) |
|
inverse partial newline-sensitive (“weird”) matching |
|
expanded syntax ignoring whitespace characters |
Formatting
Some of the type converter functions take an extra argument specifying
formatting (either for converting to a str
or parsing from one).
The different formatting options are collected in this section.
Date and time formatting options
Pattern |
Description |
---|---|
HH |
hour of day (01-12) |
HH12 |
hour of day (01-12) |
HH24 |
hour of day (00-23) |
MI |
minute (00-59) |
SS |
second (00-59) |
MS |
millisecond (000-999) |
US |
microsecond (000000-999999) |
SSSS |
seconds past midnight (0-86399) |
AM, am, PM or pm |
meridiem indicator (without periods) |
A.M., a.m., P.M. or p.m. |
meridiem indicator (with periods) |
Y,YYY |
year (4 or more digits) with comma |
YYYY |
year (4 or more digits) |
YYY |
last 3 digits of year |
YY |
last 2 digits of year |
Y |
last digit of year |
IYYY |
ISO 8601 week-numbering year (4 or more digits) |
IYY |
last 3 digits of ISO 8601 week- numbering year |
IY |
last 2 digits of ISO 8601 week- numbering year |
I |
last digit of ISO 8601 week-numbering year |
BC, bc, AD or ad |
era indicator (without periods) |
B.C., b.c., A.D. or a.d. |
era indicator (with periods) |
MONTH |
full upper case month name (blank- padded to 9 chars) |
Month |
full capitalized month name (blank- padded to 9 chars) |
month |
full lower case month name (blank- padded to 9 chars) |
MON |
abbreviated upper case month name (3 chars in English, localized lengths vary) |
Mon |
abbreviated capitalized month name (3 chars in English, localized lengths vary) |
mon |
abbreviated lower case month name (3 chars in English, localized lengths vary) |
MM |
month number (01-12) |
DAY |
full upper case day name (blank-padded to 9 chars) |
Day |
full capitalized day name (blank- padded to 9 chars) |
day |
full lower case day name (blank-padded to 9 chars) |
DY |
abbreviated upper case day name (3 chars in English, localized lengths vary) |
Dy |
abbreviated capitalized day name (3 chars in English, localized lengths vary) |
dy |
abbreviated lower case day name (3 chars in English, localized lengths vary) |
DDD |
day of year (001-366) |
IDDD |
day of ISO 8601 week-numbering year (001-371; day 1 of the year is Monday of the first ISO week) |
DD |
day of month (01-31) |
D |
day of the week, Sunday (1) to Saturday (7) |
ID |
ISO 8601 day of the week, Monday (1) to Sunday (7) |
W |
week of month (1-5) (the first week starts on the first day of the month) |
WW |
week number of year (1-53) (the first week starts on the first day of the year) |
IW |
week number of ISO 8601 week-numbering year (01-53; the first Thursday of the year is in week 1) |
CC |
century (2 digits) (the twenty-first century starts on 2001-01-01) |
J |
Julian Day (integer days since November 24, 4714 BC at midnight UTC) |
Q |
quarter |
RM |
month in upper case Roman numerals (I-XII; I=January) |
rm |
month in lower case Roman numerals (i-xii; i=January) |
TZ |
upper case time-zone abbreviation
(only supported in |
tz |
lower case time-zone abbreviation
(only supported in |
TZH |
time-zone hours |
TZM |
time-zone minutes |
OF |
time-zone offset from UTC (only
supported in |
Some additional formatting modifiers:
Modifier |
Description |
Example |
---|---|---|
FM prefix |
fill mode (suppress leading zeroes and padding blanks) |
FMMonth |
TH suffix |
upper case ordinal number suffix |
DDTH, e.g., 12TH |
th suffix |
lower case ordinal number suffix |
DDth, e.g., 12th |
FX prefix |
fixed format global option (see usage notes) |
FX Month DD Day |
Whitespace is normally ignored when parsing string input, unless the FX prefix modifier is used. For example:
db> ...
select cal::to_local_date(
'2000 JUN', 'YYYY MON');
{<cal::local_date>'2000-06-01'}
db> ...
select cal::to_local_date(
'2000 JUN', 'FXYYYY MON');
InternalServerError: invalid value " " for "MON"
Ordinary text is allowed in to_str()
format strings and will be
output literally. For better compatibility you should put a plain substring
in double quotes to force it to be interpreted as literal text even if it
contains template patterns now or could in the future.
Number formatting options
Pattern |
Description |
---|---|
9 |
digit position (can be dropped if insignificant) |
0 |
digit position (will not be dropped even if insignificant) |
. |
(period) decimal point |
, |
(comma) group (thousands) separator |
PR |
negative value in angle brackets |
S |
sign anchored to number (uses locale) |
L |
currency symbol (uses locale) |
D |
decimal point (uses locale) |
G |
group separator (uses locale) |
MI |
minus sign in specified position (if number < 0) |
PL |
plus sign in specified position (if number > 0) |
SG |
plus/minus sign in specified position |
RN |
Roman numeral (input between 1 and 3999) |
TH or th |
ordinal number suffix |
V |
shift specified number of digits (see notes) |
EEEE |
exponent for scientific notation |
Some additional formatting modifiers:
Modifier |
Description |
Example |
---|---|---|
FM prefix |
fill mode (suppress leading zeroes and padding blanks) |
FM99.99 |
TH suffix |
upper case ordinal number suffix |
999TH |
th suffix |
lower case ordinal number suffix |
999th |