Getting PostgreSQL data from AWS

Amazon RDS for hosting your production (or demo) database servers is great, because it is managed, which means that many minute but crucial things (such as backups, for example) are handled automatically.

However, sometimes it is necessary to try some part of production data on your local development machine.

If you try a simple Google search, you'll probably end up on StackOverflow and see a suggestion that goes something like this:

$ pg_dumpall -p 5432 > /path/to/my/dump_file.sql

This, however, is not the best solution as it dumps everything whereas we are only interested in one part of the data. Moreover, to make this approach work, we have to connect to the remote DB host via SSH and then organize the transfer of `dump_file.sql` to your local machine. There must be a better way.

One such a way is to use `psql` utility client that allows executing commands on remote PostgreSQL hosts.

First, we need to connect to the remote instance:

$ psql --port=5432 --username=user_prod --password --dbname=prod_db

Once connected, we have a number of options. One option is to use the `COPY` command:

prod_db=# COPY (SELECT * FROM bt WHERE cr > '2012-05-01') TO '/tmp/file';

The problem with `COPY` is the `TO` argument? Where do you think the dump file will go? That's right, it will be stored on the remote host! We, however, need this file on our local machine, not on an RDS host. What can we do? Use the `\copy` command (note the backslash):

prod_db=# \copy (select * from feed_xml where create_date < '2018-08-30' order by create_date desc limit 10) To '/tmp/test.csv' With CSV HEADER

The commands look similar, but there is an important difference. The first `COPY` is a SQL command and it is executed by the remote engine. The second `\copy` is a `psql` command and it is executed on your local machine.

The last step is to import data into the locally running instance:

local_db=# \copy feed_xml from '/tmp/test.csv' With CSV HEADER

And that's it! No need to connect to a remote host using SSH or copy dump files manually.

Creating a Rancher-managed host on AWS

While Rancher 2 offers a nice wrapper around Kubernetes, Rancher 1 is still a viable alternative if case you need to get something deployed quickly and reliably.

Essentially, Rancher is just a Docker container running on an EC2 instance. It offers a fancy-looking Web UI, an API for external tools like CircleCI and an agent that orchestrates services on another host. This other host is running business microservices, usually on top of RancherOS and is usually also hosted on EC2.

The first stap is to create an EC2 instance for the Rancher agent. We can use the RancherOS image as it comes with Docker preinstalled and it is about the only thing Rancher agent needs. Creating a new EC2 instance should be relatively straightforward as there are tons of tutorials on the Internet describing how it's done. The important thing here is to give the instance enough space and make sure it will get an external IP address (you'll need it to access Rancher UI). It is also crucial to download the generated PEM file immediately as AWS offers it only once.

After we've created a new EC2 instance for the Rancher itself, we need to connect to a newly created EC2 instance via SSH using the downloaded PEM file.

$ ssh -i rancher.pem rancher@our-server

Rancher agent keeps all settings in its own MySQL database. It is usually a good idea to bind the MySQL data volume outside of the agent container so that we will be able to backup independently of the image:

docker run -d -v /home/rancher/mysql-data:/var/lib/mysql --restart=unless-stopped -p 80:8080 rancher/server:v1.6.21

If you check the Docker logs of this image, you'll see that Rancher agent is written in Java and it immediately starts initializing. Once it's initialized (usually takes about 30 seconds), you can fire up the browser and log into the UI.

The UI allows you to login, but you'll probably want to restrict access to it to only the team members of the company. In order to do that, you'll need to go to "Admin/Access control", open GitHub in another tab and add Rancher as a new OAuth app. Then go back to Rancher UI and fill application keys/secrets.

To test that everything works, it is recommended to log out and then log in authorizing youself with GitHub.

Rancher UI can create instances on AWS provided that it has access to AWS with necessary permissions. The good thing about creating instances via Rancher is that you can always download the PEM file for SSH.