Search
ctrl/
Ask AI
Light
Dark
System

Postgres

In this guide, we show how to move your data from Postgres to EdgeDB. However, most of the approaches covered here should be applicable to other SQL databases as well.

As an example we’ll use an app that allowed users to chat. The main features of this app revolve around posting and responding to messages. Once the data is moved, it is then possible to use EdgeQL instead of SQL to fetch the desired data in the app’s API calls. Here we’ll mainly focus on the data structures used for that, how to reflect them into EdgeDB, and how to script moving the data across to the EdgeDB database.

Let’s start from an overview of the SQL tables we have:

social=> \d
              List of relations
 Schema |       Name       |   Type   | Owner
--------+------------------+----------+-------
 public | badges           | table    | myapp
 public | bookmarks        | table    | myapp
 public | bookmarks_id_seq | sequence | myapp
 public | posts            | table    | myapp
 public | posts_id_seq     | sequence | myapp
 public | statuses         | table    | myapp
 public | statuses_id_seq  | sequence | myapp
 public | threads          | table    | myapp
 public | threads_id_seq   | sequence | myapp
 public | users            | table    | myapp
 public | users_id_seq     | sequence | myapp
(11 rows)

The users and posts tables store the bulk of the content, whereas the badges, bookmarks, statuses, and threads tables store some metadata that helps us tie everything together. When importing data, we want to start with tables that are standalone and don’t reference anything else and then move on to more complex data. In this case, badges and statuses are such tables, so let’s look at them:

social=> \d badges
                Table "public.badges"
   Column    | Type | Collation | Nullable | Default
-------------+------+-----------+----------+---------
 name        | text |           | not null |
 description | text |           | not null |
Indexes:
    "badges_pkey" PRIMARY KEY, btree (name)
Referenced by:
    TABLE "users" CONSTRAINT "users_badge_name_fkey" FOREIGN KEY (badge_name) REFERENCES badges(name)

social=> \d statuses
                            Table "public.statuses"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 id     | integer |           | not null | nextval('statuses_id_seq'::regclass)
 title  | text    |           | not null |
Indexes:
    "statuses_pkey" PRIMARY KEY, btree (id)
    "statuses_title_key" UNIQUE CONSTRAINT, btree (title)
Referenced by:
    TABLE "users" CONSTRAINT "users_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)

The badges table uses the name of the badge as a primary key as opposed to a separate id. In order to reflect that properly in EdgeDB, we would have to add an exclusive constraint to this property. Meanwhile not null makes the property required, leaving us with a type like this:

Copy
type Badge {
    required name: str {
        constraint exclusive;
    }
    required description: str;
}

The statuses table has a dedicated id column in addition to title. However, the automatic id in EdgeDB is a uuid, whereas in our original dataset it is an integer. Let’s assume that for this table we never actually use the id in our code, relying instead on the fact that title is UNIQUE and serves as a much more descriptive identifier. We can use the unique title to correctly map the data during our migration without the need to also copy the old id. This leaves us with the following type:

Copy
type Status {
    required title: str {
        constraint exclusive;
    }
}

Next, we can look at the users table:

social=> \d users
                                 Table "public.users"
     Column      |  Type   | Collation | Nullable |              Default
-----------------+---------+-----------+----------+-----------------------------------
 id              | integer |           | not null | nextval('users_id_seq'::regclass)
 name            | text    |           | not null |
 email           | text    |           | not null |
 password        | text    |           | not null |
 client_settings | jsonb   |           |          |
 badge_name      | text    |           |          |
 status_id       | integer |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_email_key" UNIQUE CONSTRAINT, btree (email)
    "users_name_key" UNIQUE CONSTRAINT, btree (name)
Foreign-key constraints:
    "users_badge_name_fkey" FOREIGN KEY (badge_name) REFERENCES badges(name)
    "users_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
Referenced by:
    TABLE "bookmarks" CONSTRAINT "bookmarks_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "posts" CONSTRAINT "posts_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

The users table, like statuses, has an id column, which is not a uuid. Instead of omitting the id data, we’ll record it as app_id in EdgeDB to facilitate the transition. We may still want to eventually drop it in favor of the built-in id from EdgeDB, but we need it for now. Incidentally, even if the id was specified as a uuid value the recommended process is to record it as app_id as opposed to try and replicate it as the main object id. It is, however, also possible to bring it over as the main id by adjusting certain client connection settings. The column client_settings would become a json property. The columns badge_name and status_id reference badges and statuses respectively and will become links in EdgeDB instead of properties, even though a property would more closely mirror how they are stored in Postgres:

Copy
type User {
    required app_id: int32 {
        # It was unique originally, so this should be preserved.
        constraint exclusive;
    }
    required name: str {
        constraint exclusive;
    }
    required email: str {
        constraint exclusive;
    }
    required password: str;
    client_settings: json;

    # Both badge and status are optional.
    badge: Badge;
    status: Status;
}

The next table to consider is threads, which provides a way to group posts by referring to it:

social=> \d threads
                            Table "public.threads"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+-------------------------------------
 id     | integer |           | not null | nextval('threads_id_seq'::regclass)
 title  | text    |           |          |
Indexes:
    "threads_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "posts" CONSTRAINT "posts_thread_id_fkey" FOREIGN KEY (thread_id) REFERENCES threads(id)

This table has very simple structure that we’ve seen before, so we can model it similarly to our Status type in the new database. However, just like we did for the users table, we may want to preserve the original id as app_id:

Copy
type Thread {
    required app_id: int32 {
        constraint exclusive;
    }
    title: str;
}

Then we look at the posts table:

social=> \d posts
                                        Table "public.posts"
    Column     |           Type           | Collation | Nullable |              Default
---------------+--------------------------+-----------+----------+-----------------------------------
 id            | integer                  |           | not null | nextval('posts_id_seq'::regclass)
 body          | text                     |           | not null |
 creation_time | timestamp with time zone |           | not null |
 edited_time   | timestamp with time zone |           |          |
 user_id       | integer                  |           | not null |
 thread_id     | integer                  |           |          |
 reply_to_id   | integer                  |           |          |
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "posts_reply_to_id_fkey" FOREIGN KEY (reply_to_id) REFERENCES posts(id)
    "posts_thread_id_fkey" FOREIGN KEY (thread_id) REFERENCES threads(id)
    "posts_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)
Referenced by:
    TABLE "bookmarks" CONSTRAINT "bookmarks_post_id_fkey" FOREIGN KEY (post_id) REFERENCES posts(id)
    TABLE "posts" CONSTRAINT "posts_reply_to_id_fkey" FOREIGN KEY (reply_to_id) REFERENCES posts(id)

The posts table also has an id that we want to keep around, at least during the transition. We have a couple of columns using a timestamp with time zone value, so they’ll become datetime properties in EdgeDB. The user_id, thread_id, and reply_to_id columns will become links to User, Thread, and Post respectively:

Copy
type Post {
    required app_id: int32 {
        constraint exclusive;
    }
    required body: str {
        constraint exclusive;
    }
    required creation_time: datetime {
        # We might as well provide a default here so we don't have
        # to pass it all the time when making a new post.
        default := datetime_current();
    }
    edited_time: datetime;

    required user: User;
    thread: Thread;
    reply_to: Post;
}

Finally, we get to bookmarks, which refers to both users and posts:

social=> \d bookmarks
                             Table "public.bookmarks"
 Column  |  Type   | Collation | Nullable |                Default
---------+---------+-----------+----------+---------------------------------------
 id      | integer |           | not null | nextval('bookmarks_id_seq'::regclass)
 user_id | integer |           | not null |
 post_id | integer |           | not null |
 note    | text    |           |          |
Indexes:
    "bookmarks_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "bookmarks_post_id_fkey" FOREIGN KEY (post_id) REFERENCES posts(id)
    "bookmarks_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

This is expressing a many-to-many relationship between users and posts, which we can model as a multi link. We can then declare the note as a link property. Since we’re likely to want to fetch bookmarks given a particular user, it makes sense to place the link on the User type. In the end we end up with a schema that looks something like this:

Copy
type Badge {
    required name: str {
        constraint exclusive;
    }
    required description: str;
}

type Status {
    required title: str {
        constraint exclusive;
    }
}

type User {
    required app_id: int32 {
        constraint exclusive;
    }
    required name: str {
        constraint exclusive;
    }
    required email: str {
        constraint exclusive;
    }
    required password: str;
    client_settings: json;

    badge: Badge;
    status: Status;

    # Multi link to the Post objects
    multi bookmark: Post {
        note: str;
    }
}

type Thread {
    required app_id: int32 {
        constraint exclusive;
    }
    title: str;

    # Let's add a computed link back to posts.
    posts := .<thread[is Post];
}

type Post {
    required app_id: int32 {
        constraint exclusive;
    }
    required body: str {
        constraint exclusive;
    }
    required creation_time: datetime {
        default := datetime_current();
    }
    edited_time: datetime;

    required user: User;
    thread: Thread;
    reply_to: Post;
}

Now that we have a schema, we can use edgedb project init to set up our new EdgeDB database. A new schema migration is added via edgedb migration create and then edgedb migrate applies the schema changes to the database. After the schema migration, we’ll still need to copy the existing data from Postgres. JSON is a pretty good intermediate format for this operation. EdgeDB can cast data from json to all of the built-in scalar types, so we should be able to use a JSON dump with minimal additional processing when importing all the data.

We will dump badges and statuses first:

social=> SELECT ROW_TO_JSON(t) FROM badges AS t;
                                row_to_json
---------------------------------------------------------------------------
 {"name":"admin","description":"Superuser who can do anything"}
 {"name":"moderator","description":"User who can edit other user's posts"}
(2 rows)

social=> SELECT ROW_TO_JSON(t) FROM statuses AS t;
        row_to_json
----------------------------
 {"id":1,"title":"happy"}
 {"id":2,"title":"sad"}
 {"id":3,"title":"excited"}
 {"id":4,"title":"mad"}
(4 rows)

These tables can be dumped directly to a file using a COPY ... TO <filename> command. We can then read the files and use a simple loop to import the data into EdgeDB.

When Postgres dumps the JSON data as text files there will be a known gotcha causing all the backslashes used to escape characters inside JSON string values to be doubled. This is because the text format causes backslashes themselves to be escaped. This needs to be accounted for when reading the resulting files. Accomplishing this through the Python client library would look like the following:

Copy
for line in open('badges.json'):
    client.query('''
        with data := to_json(<str>$line)
        insert Badge {
            name := <str>data['name'],
            description := <str>data['description'],
        }
    ''',
    line=line.replace(r'\\', '\\'))

for line in open('statuses.json'):
    client.query('''
        with data := to_json(<str>$line)
        insert Status {
            title := <str>data['title'],
        }
    ''',
    line=line.replace(r'\\', '\\'))

The threads table can likewise be dumped directly as JSON with the only minor difference being that we want to change the id to app_id when we move the data to EdgeDB:

Copy
for line in open('threads.json'):
    client.query('''
        with data := to_json(<str>$line)
        insert Thread {
            app_id := <int32>data['id'],
            title := <str>data['title'],
        }
    ''',
    line=line.replace(r'\\', '\\'))

To copy the users table, we may want to use a more complex SELECT that joins the statuses so we can match them by their unique names:

SELECT ROW_TO_JSON(t)
FROM (
  SELECT
    users.id, name, email, password, client_settings,
    badge_name, statuses.title
  FROM users
  LEFT JOIN statuses ON status_id = statuses.id
) AS t;

When we run our import script, we can convert the badge_name and status JSON values into the corresponding objects by using sub-queries:

Copy
for line in open('users.json'):
    client.query('''
        with data := to_json(<str>$line)
        insert User {
            app_id := <int32>data['id'],
            name := <str>data['name'],
            email := <str>data['email'],
            password := <str>data['password'],
            client_settings := data['client_settings'],
            badge := (
                select Badge filter .name = <str>data['badge_name']
            ),
            status := (
                select Status filter .title = <str>data['status']
            ),
        }
    ''',
    line=line.replace(r'\\', '\\'))

The posts table can be dumped as JSON directly, but we’ll need to write sub-queries in the import script to correctly link Post objects. In order to make this simpler, we can order the original data by creation_time so we know any Post object that is referenced by the reply_to_id has already been re-created in EdgeDB.

Copy
for line in open('posts.json'):
    client.query('''
        with data := to_json(<str>$line)
        insert Post {
            app_id := <int32>data['id'],
            body := <str>data['body'],
            creation_time := <datetime>data['creation_time'],
            edited_time := <datetime>data['edited_time'],
            user := (
                select User filter .app_id = <int32>data['user_id']
            ),
            thread := (
                select Thread filter .app_id = <int32>data['thread_id']
            ),
            reply_to := (
                select detached Post
                filter .app_id = <int32>data['reply_to_id']
            ),
        }
    ''',
    line=line.replace(r'\\', '\\'))

Finally, we can deal with the bookmarks since we’ve imported both the users and the posts. The bookmarks table can be dumped as JSON directly, and then we can write appropriate update query to add this data to EdgeDB:

Copy
for line in open('bookmarks.json'):
    client.query('''
        with data := to_json(<str>$line)
        update User
        filter .app_id = <int32>data['user_id']
        set {
            bookmark += (
                select Post {
                    @note := <str>data['note']
                }
                filter .app_id = <int32>data['post_id']
            ),
        }
    ''',
    line=line)

We use += in our update query to add data incrementally. This way we don’t need to further organize the bookmarks when importing. This approach also mimics how the bookmarks might be created in the app going forward.

After all the import scripts, we end up with data that looks something like this:

Copy
local:db> 
......... 
......... 
......... 
......... 
......... 
......... 
......... 
......... 
......... 
......... 
select User {
  name,
  email,
  status: {title},
  badge: {name},
  bookmark: {
    @note,
    body,
    user: {name}
  },
} filter .name = 'Cameron';
{
  default::User {
    name: 'Cameron',
    email: 'cameron@edgedb.com',
    status: {},
    badge: default::Badge {name: 'admin'},
    bookmark: {
      default::Post {
        body: 'Hey everyone! How\'s your day going?',
        user: default::User {name: 'Alice'},
        @note: 'rendering glitch',
      },
      default::Post {
        body: 'Funny you ask, Alice. I actually work at EdgeDB!',
        user: default::User {name: 'Dana'},
        @note: 'follow-up',
      },
      default::Post {
        body: 'Pineapple on pizza? No way! It\'s a crime against taste buds.',
        user: default::User {name: 'Billie'},
        @note: {},
      },
    },
  },
}