How to Automate Anaconda Installation on AWS EC2 with CloudFormation

If you've ever manually provisioned an Amazon EC2 instance and installed Anaconda along with the necessary Python packages for a project, you know it can be a tedious and error-prone process. Fortunately, AWS provides a service called CloudFormation that allows you to automate the provisioning of resources, including EC2 instances.

In this tutorial, I'll show you how to use a CloudFormation template to launch an EC2 instance with Anaconda and commonly used data science packages automatically installed. By the end, you'll be able to quickly spin up instances pre-configured for your projects without the manual setup hassle. Let's get started!

Why Use CloudFormation?

Before diving into the technical details, it's worth highlighting a few key benefits of using CloudFormation to provision your EC2 instances:

  1. Reproducible and consistent setup – By defining your instance configuration in code via a template, you ensure that every instance launched from that template has an identical setup. No more worrying about missing a step in the setup process!

  2. Faster deployment – Manually installing Anaconda, creating virtual environments, and installing packages can easily take 30+ minutes. With CloudFormation, a new instance with everything pre-installed can be ready in a matter of minutes.

  3. Infrastructure as code – Representing your infrastructure setup in a template file enables you to version control it just like your application source code. You can track changes, roll back to previous versions, and collaborate with team members more easily.

With the benefits laid out, let's walk through the process of creating a CloudFormation template to automate Anaconda installation on EC2.

Step 1: Create a CloudFormation Template

The first step is to define your instance configuration in a CloudFormation template file. CloudFormation templates are written in JSON or YAML. For this example, we'll use YAML.

Here's a basic template that launches an EC2 instance and runs a bash script to install Anaconda:

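AWSTemplateFormatVersion: 2010-09-09
Description: Provision EC2 instance with Anaconda
Parameters:
  AccessKeyId:
    Type: String
    NoEcho: true  # Mask the value in the console and API output
  SecretAccessKey:
    Type: String
    NoEcho: true
Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0c55b159cbfafe1f0  # Amazon Linux 2 AMI
      InstanceType: t2.micro
      KeyName: my-ssh-key
      SecurityGroupIds:
        - !GetAtt SSHSecurityGroup.GroupId
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # UserData runs as root, so target ec2-user's home directory explicitly
          USER_HOME=/home/ec2-user

          # Install Anaconda
          wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O anaconda.sh
          bash anaconda.sh -b -p $USER_HOME/anaconda
          rm anaconda.sh
          # Make "conda activate" available in ec2-user's shells and in this script
          echo ". $USER_HOME/anaconda/etc/profile.d/conda.sh" >> $USER_HOME/.bashrc
          source $USER_HOME/anaconda/etc/profile.d/conda.sh

          # Create virtual environment and install packages
          conda create --yes --name myenv
          conda activate myenv
          conda install --yes numpy pandas matplotlib scikit-learn pip

          # Install AWS CLI and boto3
          pip install awscli boto3
          mkdir -p $USER_HOME/.aws
          echo "[default]" > $USER_HOME/.aws/credentials
          echo "aws_access_key_id = ${AccessKeyId}" >> $USER_HOME/.aws/credentials
          echo "aws_secret_access_key = ${SecretAccessKey}" >> $USER_HOME/.aws/credentials
          chmod 600 $USER_HOME/.aws/credentials

          # Hand ownership of everything created above to ec2-user
          chown -R ec2-user:ec2-user $USER_HOME/anaconda $USER_HOME/.aws

  SSHSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: SSH Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0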

Let's break this down:

  • The AWSTemplateFormatVersion specifies the CloudFormation template format version. The only valid value is 2010-09-09.

  • In the Resources section, we define an EC2Instance resource of type AWS::EC2::Instance.

  • ImageId specifies the Amazon Machine Image (AMI) to use. Here we're using an Amazon Linux 2 AMI. AMI IDs are region-specific, so substitute the ID for the region you are launching in.

  • InstanceType defines the hardware configuration. A t2.micro instance is sufficient for this example.

  • KeyName is the name of an EC2 key pair used to SSH into the instance. You'll need to create this separately in the EC2 console.

  • SecurityGroupIds attaches a security group to allow inbound SSH access on port 22 from any IP address. The security group is defined at the bottom of the template.

  • The UserData section is where the real magic happens. It contains a base64-encoded bash script that runs as root when the instance first boots up. This script:

    1. Downloads and installs Anaconda
    2. Creates a virtual environment called "myenv"
    3. Activates the "myenv" environment and installs numpy, pandas, matplotlib, and scikit-learn
    4. Installs the AWS CLI and boto3 Python package
    5. Configures the AWS CLI credentials file using the AWS access keys passed in as CloudFormation parameters

Feel free to customize the UserData script to install different packages or perform additional configuration steps needed for your project.
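To make the "base64 encoded" part less mysterious, here is a minimal Python sketch of what the Fn::Base64 function does to the script text before it reaches EC2. The script contents and the helper names encode_user_data and decode_user_data are made up for illustration; in the template itself CloudFormation does this encoding for you.

```python
import base64

# A tiny stand-in for the UserData script (illustrative only).
USER_DATA_SCRIPT = """#!/bin/bash
echo "hello from first boot" > /tmp/hello.txt
"""


def encode_user_data(script: str) -> str:
    """Return the base64 text form of a UserData script."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def decode_user_data(encoded: str) -> str:
    """Reverse the encoding, e.g. to inspect UserData copied from an instance."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_user_data(USER_DATA_SCRIPT)
print(encoded)
print(decode_user_data(encoded))
```

The round trip returns the original script unchanged, which is a handy way to double-check what an instance actually received.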

Step 2: Launch a CloudFormation Stack

With your template file created, you're ready to launch a CloudFormation stack based on it. Here's how:

  1. Log in to the AWS Management Console and navigate to the CloudFormation page.

  2. Click "Create stack" and select "With new resources (standard)".

  3. Under "Specify template", choose "Upload a template file" and select your locally saved template YAML file. Click "Next".

  4. Give your stack a name like "anaconda-ec2-stack".

  5. Under "Parameters", input your AWS access key ID and secret access key. These will be used to set up the AWS CLI credentials file on the instance.

  6. Click "Next" to continue. You can optionally configure additional stack options on the next page or just click "Next" again.

  7. On the final Review page, check your settings and click "Create stack". (The checkbox acknowledging that CloudFormation will create IAM resources only appears for templates that define IAM resources, which this one does not.)

CloudFormation will now begin provisioning the EC2 instance. You can monitor the progress on the stack events tab. Note that the CREATE_COMPLETE status only means the instance has launched: the UserData script starts at first boot and keeps running after the stack is complete. It can take 10+ minutes for Anaconda to download and install, so be patient! To follow along, SSH into the instance and check /var/log/cloud-init-output.log, where the script's output is written.
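If you prefer scripting over clicking, the same launch can be sketched with boto3. This is a rough sketch, not a drop-in tool: launch_stack is a hypothetical helper, the client is passed in so the function is easy to test, and the parameter names you supply must match whatever your template's Parameters section declares.

```python
def launch_stack(cfn, stack_name, template_body, parameters=None):
    """Create a CloudFormation stack and block until it reaches CREATE_COMPLETE.

    `cfn` is assumed to be a boto3 CloudFormation client, e.g.
    boto3.client("cloudformation"). `parameters` is a plain dict of
    parameter name -> value.
    """
    # CloudFormation expects parameters as a list of key/value records.
    params = [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in (parameters or {}).items()
    ]
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=params,
    )
    # The built-in waiter polls the stack status until creation completes
    # and raises an error if creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

To use it, read your template file into a string, create the client with boto3.client("cloudformation"), and call launch_stack(client, "anaconda-ec2-stack", template_text, {...}) with your parameter values.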

Step 3: SSH into the Instance and Verify the Setup

To confirm Anaconda and the specified packages were installed correctly:

  1. Go to the EC2 console and find your instance. Copy the public DNS name.

  2. SSH into the instance: ssh -i my-ssh-key.pem ec2-user@<public-dns-name>

  3. Activate the conda environment: conda activate myenv

  4. Start a Python REPL and try importing the installed packages:

python
import numpy, pandas, matplotlib, sklearn 

If the imports succeed without errors, you're all set!

You can also verify that the AWS CLI and credentials were set up correctly by running a command like aws s3 ls to list your S3 buckets.
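If you'd rather not type the imports by hand each time, a small script can run the same check and report versions. This is just a sketch: check_imports is a hypothetical helper, and it assumes you run it with the Python interpreter from the activated myenv environment.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to a version string or None."""
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is missing from this environment.
            results[name] = None
        else:
            # Most packages expose __version__; fall back if one does not.
            results[name] = getattr(module, "__version__", "unknown version")
    return results


if __name__ == "__main__":
    packages = ["numpy", "pandas", "matplotlib", "sklearn"]
    for name, version in check_imports(packages).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

Any package reported as NOT INSTALLED points to a step in the UserData script that did not finish.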

Extending the Solution

What we've covered so far should give you a solid foundation for automating Anaconda installation on EC2 with CloudFormation. Here are some additional ideas to extend the solution:

  1. Automate package installation with a requirements file – Instead of specifying individual packages to install via conda install, you could upload a requirements.txt file to S3 and modify the UserData script to download it and run conda install --yes --file requirements.txt.

  2. Use a configuration management tool – For more complex setups, you may want to use tools like Ansible or Puppet in conjunction with CloudFormation to handle post-boot configuration.

  3. Integrate with a CI/CD pipeline – Imagine launching a new EC2 instance pre-installed with Anaconda and your project dependencies on each new code push. You could configure a CI/CD tool like Jenkins or CircleCI to automatically create or update a CloudFormation stack as part of your build and deployment process.
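For the CI/CD idea, the pipeline step needs to decide whether to create the stack or update an existing one. Here is one way that decision could look with boto3. Treat it as a sketch under assumptions: create_or_update_stack is a hypothetical helper, the client is passed in by the caller, and any error from describe_stacks is treated as "stack does not exist", which a real pipeline would want to narrow down.

```python
def create_or_update_stack(cfn, stack_name, template_body, parameters=None):
    """Create the stack if it is missing, otherwise update it.

    `cfn` is assumed to be a boto3 CloudFormation client.
    Returns "created" or "updated".
    """
    params = [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in (parameters or {}).items()
    ]

    try:
        cfn.describe_stacks(StackName=stack_name)
        exists = True
    except cfn.exceptions.ClientError:
        # describe_stacks raises ClientError for an unknown stack name.
        # Assumption: other failures (e.g. permissions) are not distinguished.
        exists = False

    if exists:
        cfn.update_stack(
            StackName=stack_name, TemplateBody=template_body, Parameters=params
        )
        waiter_name, action = "stack_update_complete", "updated"
    else:
        cfn.create_stack(
            StackName=stack_name, TemplateBody=template_body, Parameters=params
        )
        waiter_name, action = "stack_create_complete", "created"

    # Block until CloudFormation finishes the chosen operation.
    cfn.get_waiter(waiter_name).wait(StackName=stack_name)
    return action
```

One thing to keep in mind: update_stack raises an error when nothing in the template or parameters has changed, so a pipeline that runs on every push would need to catch that case.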

Conclusion

In this post, you learned how to use a CloudFormation template to automate the installation of Anaconda and Python packages on an Amazon EC2 instance. With this technique, you can save a significant amount of time provisioning instances for data science and machine learning projects.

The template we created in this tutorial is just a starting point, so feel free to adapt it to your own project's needs. You can find the full template code along with the UserData script in this GitHub repo.

I hope this has been a helpful introduction to CloudFormation and infrastructure-as-code concepts. If you have any questions or suggestions for improving the template, feel free to reach out. Happy automating!
