Experimenting with the Apache Mesos HTTP API for Fun and Profit

Apache Mesos is a cluster manager that abstracts CPU, memory, storage, and other compute resources away from individual machines, making it easier to build and run fault-tolerant, elastic distributed systems.

While often used alongside big data processing frameworks like Spark, Hadoop, and Kafka, the real power of Mesos lies in its extensible architecture. Through its HTTP API, the core Mesos components can be controlled and monitored programmatically, allowing infrastructure to be managed as code.

In this post, we'll explore the Mesos HTTP API in depth and walk through some examples of using it to inspect a Mesos cluster, launch tasks, and see how custom frameworks talk to the scheduler API. By the end, you'll have a solid foundation for integrating Mesos into your own environment to automate resource management at scale.

Let's get started by taking a closer look at what Apache Mesos is and how it works under the hood.

Overview of Apache Mesos

At a high level, a Mesos cluster consists of a set of agent nodes that are managed by a master node. The master implements fine-grained sharing of resources across applications using resource offers. Mesos frameworks receive resource offers from the master and decide which resources to accept and which tasks to run on them.

[Figure: Mesos architecture diagram]

Some key Mesos concepts include:

  • Master: The Mesos master manages agent daemons and frameworks, relying on ZooKeeper for leader election in high-availability setups. It sends resource offers to registered frameworks and monitors the status of tasks.

  • Agent: Mesos agents run on each node in the cluster and manage the execution of tasks. They register with the master and report back metrics for tasks.

  • Framework: Mesos frameworks handle the scheduling of tasks. They can accept or reject the resource offers provided by the master. Frameworks include existing schedulers such as Marathon as well as custom scheduler implementations.

  • Offer: A list of an agent node's available CPU, RAM, and other resources. Offers are sent by the master to registered frameworks.

  • Task: A unit of scheduled work, which can be a single process or, more commonly, a container image to launch on an agent.

Mesos provides a scalable and resilient core that lets various frameworks and processing engines share clusters efficiently. Fault tolerance is achieved by running standby masters and using ZooKeeper to elect a leader; a newly elected master rebuilds its view of running tasks from the agents and frameworks that re-register with it. The agent nodes manage and monitor their own tasks independently. Mesos is natively supported in the DC/OS distributed platform.
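To make the offer cycle a little more concrete, here is a toy model of the decision a framework makes when an offer arrives. It is only a sketch: the Offer and Task classes and the match_tasks function are hypothetical names invented for illustration, not part of any Mesos client library, and a real scheduler would also weigh constraints, roles, and other resource types.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: float  # megabytes


@dataclass
class Task:
    name: str
    cpus: float
    mem: float  # megabytes


def match_tasks(offer, pending):
    """First-fit: return (launched, still_pending) for a single offer."""
    cpus, mem = offer.cpus, offer.mem
    launched, remaining = [], []
    for task in pending:
        if task.cpus <= cpus and task.mem <= mem:
            # The task fits in what is left of the offer, so claim the resources.
            cpus -= task.cpus
            mem -= task.mem
            launched.append(task)
        else:
            remaining.append(task)
    return launched, remaining


offer = Offer(agent="agent-1", cpus=2.0, mem=1024)
pending = [Task("web", 0.5, 64), Task("batch", 4.0, 512), Task("cache", 1.0, 256)]
launched, remaining = match_tasks(offer, pending)
print([t.name for t in launched])   # ['web', 'cache']
print([t.name for t in remaining])  # ['batch']
```

An empty launched list corresponds to declining the offer, which hands the resources back to the master so it can offer them to another framework.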

Next, let's take a look at how we can interact with Mesos programmatically via its API.

The Mesos HTTP API

Both the Mesos master and agent expose HTTP APIs that enable programmatic interaction with the cluster. The API is built on Protocol Buffers and provides comprehensive functionality for retrieving cluster state, performing operations, and monitoring resource consumption. Some of the key APIs include:

  • /api/v1/scheduler: Used by frameworks to connect and interact with the Mesos master
  • /master/state: Returns overall state of the cluster including running frameworks and agents
  • /master/tasks: Lists active and completed tasks in the cluster
  • /master/frameworks: Provides information about connected and active frameworks
  • /master/redirect: Redirects to the leading master in a high availability setup
  • /agent/health: Health check endpoint for an agent node
  • /agent/state: Snapshot of the current resource usage and task states on an agent
  • /agent/monitor/statistics: Current resource usage metrics for the containers running on an agent
  • /agent/flags: Configured options for the agent daemon
  • /logs/v1/range: Retrieve log entries from a specific framework or executor

The full specification for the API is available in the Mesos documentation. Both JSON and Protocol Buffer responses are supported.
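As a small illustration of the redirect endpoint, the sketch below turns the Location header of a redirect response into a base URL for the leading master. The helper name resolve_leader is made up for this post, and the code assumes the header may be either an absolute URL or a scheme-relative one such as //host:5050, which urljoin resolves against the URL that was requested.

```python
from urllib.parse import urljoin


def resolve_leader(request_url, location):
    """Resolve a redirect Location header against the URL that was requested."""
    return urljoin(request_url, location).rstrip("/")


# With requests, the two inputs would come from something like:
#   r = requests.get(master_url + "/master/redirect", allow_redirects=False)
#   leader = resolve_leader(r.url, r.headers["Location"])
print(resolve_leader("http://192.168.33.10:5050/master/redirect",
                     "//192.168.33.10:5050"))
# http://192.168.33.10:5050
```

On the single-node cluster used in this post the leader is always the same master, so this only becomes useful once several masters are running.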

With an understanding of what's possible with the API, let's set up a local development environment and start exploring it hands-on.

Setting Up a Mesos Dev Environment

One of the easiest ways to start experimenting with Mesos is using a virtual machine with a Mesos cluster pre-configured. This allows you to get up and running quickly without worrying about provisioning servers and installing Mesos from scratch.

For this guide, I've provided a set of Vagrant configuration files and scripts to spin up a single-node Mesos cluster locally. The Vagrant setup provisions a CentOS 7 VM with:

  • Mesos 1.0.0
  • Marathon 1.4.0
  • ZooKeeper 3.4.8
  • Mesos-DNS 0.5.2
  • Docker 1.12.5

To get started, clone the following Git repo:

git clone https://github.com/Vanlightly/mesos-vagrant.git
cd mesos-vagrant

You'll need to have Vagrant and VirtualBox installed. Then simply run:

vagrant up

to create the cluster and start it up. You can access the Mesos web UI at http://192.168.33.10:5050. Marathon is available at http://192.168.33.10:8080.

[Figure: Mesos web UI]

The IP address of the Vagrant VM is 192.168.33.10, which we'll be using to access the HTTP API. The Mesos master daemon runs on port 5050 by default, while each agent listens on port 5051 unless configured otherwise.

Now that we have a working environment, we can dive into the API and start interacting with it programmatically.

Connecting to the API with Python

To interact with the Mesos HTTP API, we'll use Python inside a Jupyter notebook. Jupyter gives us an interactive environment to write and iterate on our code.

With your Vagrant-based Mesos cluster up and running, create a new Python 3 notebook in Jupyter on your host machine. Install the requests library, which we'll use to send HTTP requests:

!pip install requests

We can now use requests to send HTTP requests to the Mesos HTTP endpoints:

import requests

master_url = "http://192.168.33.10:5050"
print("Mesos version:", 
  requests.get(master_url + "/version").json()["version"])

This should print out the version of Mesos master running in the Vagrant VM:

Mesos version: 1.0.0
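For anything beyond a one-off call, it may be worth wrapping the request in a small helper that sets a timeout and fails loudly on HTTP errors. The sketch below is one possible shape; get_json is a name invented here, and its http argument is assumed to be either the requests module itself or a requests.Session, since both expose a compatible get method.

```python
def get_json(http, base_url, path, timeout=5):
    """GET base_url + path and return the parsed JSON body.

    `http` is anything with a requests-style get(url, timeout=...) method,
    for example the requests module or a requests.Session instance.
    """
    response = http.get(base_url.rstrip("/") + path, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


# In the notebook this could be called as:
#   get_json(requests, master_url, "/version")["version"]
```

Passing the HTTP client in as an argument also makes the helper easy to exercise with a stub when no cluster is available.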

Easy as that, we can start calling the various Mesos HTTP APIs. Next up, let's take a look at some of the powerful things we can do with it.

Inspecting Cluster State

One of the first things you'll want to do when connecting to a Mesos cluster is inspect its overall state. The master /state API endpoint provides a comprehensive dump of the frameworks, agents, and tasks running in the cluster. It returns a JSON object representing the current cluster state.

To retrieve the full cluster state and print some of the key information:

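state_url = master_url + "/state"
cluster_state = requests.get(state_url).json()

# The state JSON lists agents under the older "slaves" key
print("\nMesos Cluster State:")
print("Frameworks:", len(cluster_state["frameworks"]))
print("Agents:", len(cluster_state["slaves"]))
# Count tasks across all frameworks so an empty framework list does not raise
print("Tasks:", sum(len(f["tasks"]) for f in cluster_state["frameworks"]))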

The /state endpoint contains detailed information about each of the frameworks, agents, and tasks. Some of the useful fields include:

  • cluster_state["frameworks"]: List of active frameworks including scheduler info and tasks
  • cluster_state["frameworks"][0]["tasks"]: List of tasks for the first framework
  • cluster_state["frameworks"][0]["completed_tasks"]: Terminated tasks for the first framework
  • cluster_state["slaves"]: List of connected agent nodes and their resource capacity/usage (the JSON key keeps the older "slaves" name)

With access to all this data about the cluster, you can build detailed monitoring and management dashboards. Or you could use the API to integrate with existing DevOps tools and configure alerting and notifications based on the cluster metrics.
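As a starting point for that kind of dashboard, the sketch below reduces a state document to a few headline numbers. It assumes each agent entry under "slaves" carries "resources" and "used_resources" objects with a "cpus" field, which is how I recall the /state output; summarize_state and the sample document are made up for illustration, so check the field names against your own cluster.

```python
def summarize_state(state):
    """Reduce a /state document to a handful of headline metrics."""
    agents = state.get("slaves", [])
    frameworks = state.get("frameworks", [])
    total_cpus = sum(a["resources"]["cpus"] for a in agents)
    used_cpus = sum(a["used_resources"]["cpus"] for a in agents)
    return {
        "frameworks": len(frameworks),
        "agents": len(agents),
        "active_tasks": sum(len(f.get("tasks", [])) for f in frameworks),
        "cpu_utilization": used_cpus / total_cpus if total_cpus else 0.0,
    }


# Hand-written sample shaped like a (heavily trimmed) /state response.
sample_state = {
    "slaves": [
        {"resources": {"cpus": 4.0, "mem": 8192},
         "used_resources": {"cpus": 1.0, "mem": 128}},
    ],
    "frameworks": [
        {"name": "marathon", "tasks": [{"id": "a"}, {"id": "b"}]},
    ],
}
print(summarize_state(sample_state))
# {'frameworks': 1, 'agents': 1, 'active_tasks': 2, 'cpu_utilization': 0.25}
```

Against the live cluster, the same function could be fed cluster_state from the earlier request.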

Next, let's try actually scheduling some work in the cluster.

Launching Tasks and Containers

While the Mesos master provides the APIs for viewing cluster state, the actual launching of tasks is handled by a scheduler framework. Common Mesos frameworks include Marathon, Chronos, and Singularity. However, it's also possible to build your own framework to handle the scheduling of tasks.

For this example, we'll use Marathon, which is a framework for long-running tasks. Marathon has a REST API for scheduling tasks and querying application state.

To schedule a task, we simply send a POST request to the /v2/apps endpoint with details about the task containerization and resources required. For example, to schedule a new task using the hello-mesos Docker image:

marathon_url = "http://192.168.33.10:8080"
app_spec = {
  "id": "hello-mesos-app",
  "cmd": "python3 -m http.server 8080",
  "cpus": 0.5,
  "mem": 64.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "forcePullImage": True,
      "image": "mesosphere/hello-mesos:latest",
      "parameters": [],
      "privileged": False
    }
  }
}

We provide Marathon with the Docker image to use as well as the resources the task requires. Then we launch the task by sending the JSON to the /v2/apps endpoint:

# marathon_url and app_spec are defined in the previous cell
r = requests.post(marathon_url + "/v2/apps", json=app_spec)
print("Status code:", r.status_code)

Marathon will handle scheduling the task and interacting with Mesos to allocate resources. You can view the status of the deployment in the Marathon web UI at http://192.168.33.10:8080.
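If you would rather wait for the deployment from code than watch the UI, one option is to poll Marathon's /v2/deployments endpoint until no in-flight deployment mentions the app. The sketch below assumes that endpoint returns a JSON list whose entries have an "affectedApps" list of app IDs; wait_for_app is a hypothetical helper written for this post, and the fetch function is passed in so the polling logic stays separate from the HTTP call.

```python
import time


def wait_for_app(fetch_deployments, app_id, timeout=120, interval=2,
                 sleep=time.sleep):
    """Poll until no in-flight deployment affects app_id.

    Returns True once the app has no pending deployment, or False if the
    timeout (in seconds) elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = [d for d in fetch_deployments()
                   if app_id in d.get("affectedApps", [])]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# In the notebook this could be wired up as:
#   wait_for_app(
#       lambda: requests.get(marathon_url + "/v2/deployments").json(),
#       "/hello-mesos-app")
```

Note that Marathon reports app IDs with a leading slash, so the ID passed here is "/hello-mesos-app" rather than the bare name used in the app spec.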

[Figure: Marathon web UI showing the hello-mesos task]

Once the deployment is complete, the hello-mesos web server will be running somewhere in the cluster. You can find the hostname and port it was scheduled on through the API:

# Fetching a single app includes its task list; the /v2/apps listing omits tasks by default
hello_mesos_app = requests.get(marathon_url + "/v2/apps/hello-mesos-app").json()["app"]
host = hello_mesos_app["tasks"][0]["host"]
port = hello_mesos_app["tasks"][0]["ports"][0]
print("Access the app at: http://{}:{}".format(host, port))

Putting it all together, this notebook demonstrates how easy it is to programmatically interact with a Mesos cluster and schedule containerized tasks. You could integrate this into a CI/CD pipeline to automatically deploy services into the cluster.

Interacting with Running Tasks

Once tasks are running in the cluster, you can use the API to further monitor and interact with them. The /master/tasks endpoint provides data about active and completed tasks. To retrieve the task state and find the running hello-mesos task:

# Marathon task IDs start with the app ID, so match on that prefix
tasks_state = requests.get(master_url + "/master/tasks").json()["tasks"]

hello_mesos_task = [t for t in tasks_state
                    if t["id"].startswith("hello-mesos-app")][0]
print(hello_mesos_task["id"], hello_mesos_task["state"])

The API can also be used to view the stdout/stderr logs for the task, retrieve resource usage metrics, and send signals to running processes.
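As one illustration of the resource usage metrics, the sketch below computes how much of its memory limit each executor is using from an agent's statistics response. It assumes the response is a list of entries with an "executor_id" and a "statistics" object containing "mem_rss_bytes" and "mem_limit_bytes"; those field names are from memory and the sample data is invented, so treat both as assumptions to verify against your Mesos version.

```python
def memory_usage(stats):
    """Map executor_id -> fraction of its memory limit currently in use."""
    usage = {}
    for entry in stats:
        s = entry["statistics"]
        limit = s.get("mem_limit_bytes")
        if limit:  # skip entries with a missing or zero limit
            usage[entry["executor_id"]] = s.get("mem_rss_bytes", 0) / limit
    return usage


# Hand-written sample shaped like a trimmed statistics response.
sample_stats = [
    {"executor_id": "hello-mesos-app.1",
     "statistics": {"mem_limit_bytes": 100663296, "mem_rss_bytes": 25165824}},
]
print(memory_usage(sample_stats))
# {'hello-mesos-app.1': 0.25}
```

In the Vagrant setup, the input would come from the agent's statistics endpoint, reachable on port 5051 if the agent uses its default port.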

Scheduling with Frameworks

While launching one-off tasks is useful for development and debugging purposes, real-world production workloads are typically scheduled using frameworks. Mesos frameworks like Marathon, Chronos and Aurora can schedule long-running apps, cron-style jobs, and more complex application deployments.

Frameworks use the /api/v1/scheduler endpoint to communicate with the Mesos master. They receive resource offers and can accept or decline offers based on application constraints. Frameworks then send back a description of the tasks to launch using the accepted offers.

The Mesos scheduler API is more complex and requires maintaining state and properly handling callbacks from Mesos. Most developers shouldn't need to write a custom scheduler, but it's useful to understand how frameworks interact with Mesos to handle resource scheduling and task launching.
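To give a feel for what that interaction looks like on the wire, here is a minimal sketch of the two pieces a scheduler needs first: a SUBSCRIBE call body and a decoder for the event stream. It assumes the JSON flavor of the v1 scheduler API, where the master answers the subscription with a long-lived response in RecordIO framing (each event is preceded by its length in bytes and a newline). The function names and the framework name are hypothetical.

```python
import json


def subscribe_call(name, user="root"):
    """Build the JSON body of a SUBSCRIBE call for a new framework."""
    return {
        "type": "SUBSCRIBE",
        "subscribe": {"framework_info": {"user": user, "name": name}},
    }


def decode_recordio(chunks):
    """Yield JSON events from an iterable of byte chunks in RecordIO framing."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while True:
            newline = buffer.find(b"\n")
            if newline == -1:
                break  # the length prefix has not fully arrived yet
            length = int(buffer[:newline])
            start, end = newline + 1, newline + 1 + length
            if len(buffer) < end:
                break  # the record body has not fully arrived yet
            yield json.loads(buffer[start:end])
            buffer = buffer[end:]


# With requests, the chunks could come from a streaming response, roughly:
#   resp = requests.post(master_url + "/api/v1/scheduler",
#                        json=subscribe_call("demo-framework"),
#                        headers={"Accept": "application/json"}, stream=True)
#   for event in decode_recordio(resp.iter_content(chunk_size=None)):
#       print(event["type"])
```

A real scheduler would go on to react to OFFERS events with ACCEPT or DECLINE calls and acknowledge status updates, which is where most of the complexity lives.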

Practical Applications

At a high level, the Mesos HTTP API enables programmatic interaction with every layer of the cluster. This opens up many powerful possibilities:

  • Monitoring and dashboarding: Query the /state endpoint to retrieve metrics and visualize cluster resource utilization and task status over time.

  • Continuous delivery: Build automated deployment pipelines that use the API to launch new versions of your app into the cluster.

  • Autoscaling: Dynamically scale up/down frameworks based on task throughput and resource utilization.

  • Batch job scheduling: Automatically launch periodic batch jobs and monitor their completion status.

  • Custom frameworks: Build your own framework to optimize task scheduling for your specific workload requirements.

By leveraging the API, you can automate cluster operations and integrate Mesos with your existing infrastructure tools and workflows.
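To make the autoscaling idea concrete, here is a sketch of a simple proportional scaling rule. The rule itself, the function name desired_instances, and the default thresholds are illustrative assumptions rather than anything Mesos or Marathon prescribes; the result would typically be applied by updating the app's instances field through Marathon's REST API.

```python
import math


def desired_instances(current, utilization, target=0.5,
                      min_instances=1, max_instances=10):
    """Scale the instance count so average utilization moves toward target."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = current * utilization / target
    # Subtract a tiny epsilon so float noise does not round 6.0000001 up to 7.
    wanted = math.ceil(raw - 1e-9)
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(4, 0.75))  # 6
print(desired_instances(4, 0.1))   # 1
print(desired_instances(8, 1.0))   # 10

# Applying the result with Marathon might look like:
#   requests.put(marathon_url + "/v2/apps/hello-mesos-app",
#                json={"instances": desired_instances(4, 0.75)})
```

In practice you would also want a cooldown between scaling actions so that short spikes do not cause the instance count to flap.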

Best Practices

As you start integrating the Mesos API into your systems, keep in mind these recommended practices:

  • Rate limit API requests: Sending too many requests to the master can negatively impact cluster performance. Consider using a sidecar proxy that caches API responses.

  • Use Mesos-DNS for service discovery: Mesos-DNS provides a stable endpoint for discovering services via DNS and can reduce load on the master API.

  • Consider the stability tradeoffs of automating via the API: Automating cluster actions via the API is powerful, but can also add complexity and new failure modes.

  • Secure and restrict access to the API: Be sure to secure your Mesos web UI and API endpoints with proper authentication and authorization.

By following these guidelines, you can build robust systems on top of Mesos while minimizing operational risk.
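As a lightweight alternative to a caching proxy, responses can also be cached inside the client for a few seconds. The sketch below is one way to do that; the TTLCache class is a hypothetical helper written for this post, and the five-second default is an arbitrary choice rather than a recommendation from the Mesos project.

```python
import time


class TTLCache:
    """Cache fetch(key) results for ttl seconds to avoid hammering the master."""

    def __init__(self, fetch, ttl=5.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, no request needed
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# In the notebook this could wrap the state request, for example:
#   cache = TTLCache(lambda path: requests.get(master_url + path).json())
#   state = cache.get("/state")
```

Repeated calls within the TTL return the stored value, so a dashboard that refreshes every second would only hit the master once every five seconds per endpoint.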

Conclusion

The Apache Mesos HTTP API provides a powerful way to programmatically interact with every aspect of an active cluster. From retrieving metrics, to scheduling tasks, to building custom frameworks, the API unlocks a variety of capabilities for automating cluster operations.

In this post, we took a deep dive into the API and walked through examples of interacting with it using Python and Jupyter Notebook. Hopefully this has inspired you to start experimenting with the Mesos API and thinking about how you can leverage it for your own projects. The potential use cases are vast, and you're really only limited by your imagination.

By mastering the Mesos API, you can take your resource management and DevOps automation to the next level. Go forth and build the next generation of powerful distributed systems and intelligent data pipelines!
