Image by: COSTI (@costi_cma)

Prometheus and Grafana on EC2 with Terraform and Ansible

2024-03-01

In this article, I'll show you how to deploy an EC2 virtual machine and set up Prometheus and Grafana servers on it. We'll achieve this using Infrastructure as Code with Terraform and automate the configuration management with Ansible.

If you're curious about Infrastructure as Code and its benefits, I've also written another article on that topic.

You can find the full code for this article in this GitHub repository.

Networking

Our first step will be to set up a Virtual Private Cloud (VPC) and a Subnet. To ensure the subnet is public, we'll create an Internet Gateway and the necessary Route Table to facilitate traffic routing to the Internet:

# network.tf
resource "aws_vpc" "monitoring" {
  cidr_block = "10.0.0.0/16"
}
resource "aws_subnet" "monitoring_a" {
  vpc_id     = aws_vpc.monitoring.id
  cidr_block = cidrsubnet(aws_vpc.monitoring.cidr_block, 8, 0)
}
resource "aws_internet_gateway" "monitoring" {
  vpc_id = aws_vpc.monitoring.id
}
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.monitoring.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.monitoring.id
  }
}
resource "aws_route_table_association" "monitoring_a" {
  subnet_id      = aws_subnet.monitoring_a.id
  route_table_id = aws_route_table.public.id
}
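
The subnet range above comes from cidrsubnet, which extends the VPC prefix by the given number of bits and picks the network with the given index. As a small sketch (the local names here are illustrative and not part of the original code), adding 8 bits to the /16 VPC range yields /24 subnets:

```hcl
# Illustrative only: these locals are not in the article's repository.
locals {
  vpc_cidr = "10.0.0.0/16"

  # 16 + 8 = /24 prefix, network number 0
  subnet_a_cidr = cidrsubnet(local.vpc_cidr, 8, 0) # "10.0.0.0/24"

  # Same prefix length, network number 1 (e.g. for a future second subnet)
  subnet_b_cidr = cidrsubnet(local.vpc_cidr, 8, 1) # "10.0.1.0/24"
}
```

Deriving subnet ranges from the VPC range this way means a change to the VPC CIDR does not require editing each subnet by hand.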

Also, we'll need to open ports to allow access to Prometheus and Grafana. The default port for Prometheus is 9090, while Grafana uses port 3000. We'll open port 22 to enable SSH access to the instance, too.

To achieve this, we'll create a security group with the rules needed to permit incoming traffic on these ports. Even though our subnet's external traffic is routed to the Internet, we must also declare an egress rule explicitly. AWS adds an allow-all egress rule to every new security group, but Terraform removes that default rule from the security groups it creates, so without an egress block the instance would have no outbound access:

# network.tf
resource "aws_security_group" "monitoring" {
  name        = "monitoring-server"
  vpc_id      = aws_vpc.monitoring.id
  ingress {
    description = "Allow SSH"
    from_port        = 22
    to_port          = 22
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  ingress {
    description = "Allow traffic to Prometheus server (port 9090 by default)"
    from_port        = 9090
    to_port          = 9090
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  ingress {
    description = "Allow traffic to Grafana (port 3000 by default)"
    from_port        = 3000
    to_port          = 3000
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  egress {
    description = "Allow all outbound traffic"
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}

Now that we have all the networking elements in place, we can move forward with the creation of our virtual machine.

EC2 instance

To establish an SSH connection to the instance, we need to create a key pair. This can be done using the following code, which allows us to import an existing public key from our local machine:

resource "aws_key_pair" "monitoring" {
  key_name   = "monitoring_server_keypair"
  public_key = file("~/.ssh/public_key.pub")
}

Now, we will create the instance and link it to the elements we previously created: subnet, security group, and key.

resource "aws_instance" "monitoring" {
  instance_type               = var.instance_type
  ami                         = var.ami_id
  subnet_id                   = aws_subnet.monitoring_a.id
  vpc_security_group_ids      = [aws_security_group.monitoring.id]
  key_name                    = aws_key_pair.monitoring.key_name
  associate_public_ip_address = true
}
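
The instance references var.instance_type and var.ami_id, which are not shown in this article. A minimal sketch of how they could be declared follows; the descriptions and the default instance type are assumptions, not values taken from the repository:

```hcl
# variables.tf (hypothetical file name and defaults)
variable "instance_type" {
  description = "EC2 instance type for the monitoring server"
  type        = string
  default     = "t3.micro" # assumption: pick a size that fits your workload
}

variable "ami_id" {
  description = "ID of an Amazon Linux AMI available in the target region"
  type        = string
  # No default: AMI IDs differ per region, so pass one explicitly,
  # for example through a .tfvars file or the -var flag.
}
```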

Run Prometheus and Grafana

With our EC2 instance in place, our goal is to run Prometheus and Grafana on it. Instead of connecting to the machine over SSH and installing everything by hand, we will add a provisioner block that calls an Ansible playbook. This playbook installs Prometheus and Grafana and starts them.

First, we will create our Ansible playbook. This is the fragment where Grafana is installed and started:

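# Import the Grafana signing key so the package can be verified
- name: Import Grafana GPG key
  rpm_key:
    key: https://rpm.grafana.com/gpg.key
    state: present
  become: true
# Install the Grafana RPM for the version set in grafana_version
- name: Install Grafana
  yum:
    name: "https://dl.grafana.com/oss/release/grafana-{{ grafana_version }}.x86_64.rpm"
    state: present
  become: true
# Start the Grafana service
- name: Start Grafana
  systemd:
    name: grafana-server
    state: started
  become: true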

For simplicity, I won't show the full playbook here, but you can see the code here.

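We want the playbook to be executed as soon as our EC2 instance is created. We can achieve this using a Terraform provisioner. A local-exec provisioner runs a command on the machine where Terraform itself runs, once, right after the resource is created. We will add the code below inside our aws_instance.monitoring Terraform resource: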

provisioner "local-exec" {
  command = "ANSIBLE_HOST_KEY_CHECKING=false ansible-playbook -u ec2-user -i '${self.public_ip},' --private-key ${var.instance_ssh_priv_key} ../ansible/playbook.yml"
}

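ec2-user is the default user on Amazon Linux machines.
-i '${self.public_ip},': self.public_ip refers to the public IP of the EC2 instance. The trailing comma tells Ansible to treat the value as an inline list of hosts rather than as the path to an inventory file.
--private-key ${var.instance_ssh_priv_key}: here we tell Ansible which private key to use to connect to the instance. Note that we must declare a variable holding the path to the private key.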
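
One thing to watch for: local-exec fires as soon as AWS reports the instance as created, which can be before the SSH daemon accepts connections. A common workaround, sketched below as an alternative and not as the approach used in the repository, is to run a trivial remote-exec first, because Terraform keeps retrying the SSH connection until it succeeds or times out. The sketch assumes Terraform 1.4 or later (for terraform_data); the resource name configure_monitoring and the variable description are illustrative.

```hcl
# Path to the private key that matches aws_key_pair.monitoring
variable "instance_ssh_priv_key" {
  description = "Local path to the SSH private key for the instance"
  type        = string
}

# Hypothetical helper resource that holds the provisioners,
# used here in place of a provisioner inside aws_instance.monitoring.
resource "terraform_data" "configure_monitoring" {
  # Re-run the provisioners whenever the instance is replaced
  triggers_replace = [aws_instance.monitoring.id]

  connection {
    type        = "ssh"
    host        = aws_instance.monitoring.public_ip
    user        = "ec2-user"
    private_key = file(var.instance_ssh_priv_key)
  }

  # Blocks until an SSH session can be opened on the instance
  provisioner "remote-exec" {
    inline = ["echo 'SSH is ready'"]
  }

  # Runs on the local machine once SSH is known to be reachable
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=false ansible-playbook -u ec2-user -i '${aws_instance.monitoring.public_ip},' --private-key ${var.instance_ssh_priv_key} ../ansible/playbook.yml"
  }
}
```

Provisioners run in the order they are written, so the playbook only starts after the remote-exec step has connected successfully.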

Now we are ready to create our infrastructure with Terraform (terraform plan and terraform apply). Once the EC2 instance is created, the Ansible playbook runs automatically and makes the Prometheus and Grafana servers available on ports 9090 and 3000.
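
To see where the servers can be reached after the apply finishes, outputs along these lines could be added. This is a sketch: the output names are hypothetical, and the ports are the defaults opened in the security group.

```hcl
# outputs.tf (hypothetical file and output names)
output "prometheus_url" {
  description = "Prometheus web UI (default port 9090)"
  value       = "http://${aws_instance.monitoring.public_ip}:9090"
}

output "grafana_url" {
  description = "Grafana web UI (default port 3000)"
  value       = "http://${aws_instance.monitoring.public_ip}:3000"
}
```

Terraform prints these values at the end of terraform apply, and terraform output shows them again later.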
