Getting PostgreSQL data from AWS

Amazon RDS for hosting your production (or demo) database servers is great, because it is managed, which means that many minute but crucial things (such as backups, for example) are handled automatically.

However, sometimes it is necessary to try some part of production data on your local development machine.

If you try a simple Google search, you'll probably end up on StackOverflow and see a suggestion that goes something like this:

$ pg_dumpall -p 5432 > /path/to/my/dump_file.sql

This, however, is not the best solution as it dumps everything whereas we are only interested in one part of the data. Moreover, to make this approach work, we have to connect to the remote DB host via SSH and then organize the transfer of `dump_file.sql` to your local machine. There must be a better way.

One such a way is to use `psql` utility client that allows executing commands on remote PostgreSQL hosts.

First, we need to connect to the remote instance:

$ psql --host=proddb.our_region.rds.amazonaws.com --port=5432 --username=user_prod --password --dbname=prod_db

Once connected, we have a number of options. One option is to use the `COPY` command:

prod_db=# COPY (SELECT * FROM bt WHERE cr > '2012-05-01') TO '/tmp/file';

The problem with `COPY` is the `TO` argument? Where do you think the dump file will go? That's right, it will be stored on the remote host! We, however, need this file on our local machine, not on an RDS host. What can we do? Use the `\copy` command (note the backslash):

prod_db=# \copy (select * from feed_xml where create_date < '2018-08-30' order by create_date desc limit 10) To '/tmp/test.csv' With CSV HEADER

The commands look similar, but there is an important difference. The first `COPY` is a SQL command and it is executed by the remote engine. The second `\copy` is a `psql` command and it is executed on your local machine.

The last step is to import data into the locally running instance:

local_db=# \copy feed_xml from '/tmp/test.csv' With CSV HEADER

And that's it! No need to connect to a remote host using SSH or copy dump files manually.