Friday, December 2, 2016

Why Habitat? – Plans and Packages, Part 2
https://blog.chef.io/2016/11/28/why-habitat-plans-and-packages-part-2/

TL;DR: Wow, a 2000 word blog post. Habitat is a better way to package apps, and creates better containers. Walk through the Habitat tutorial here.

In part 1 of this blog post series, we talked about how Habitat approaches the problem of Application Packaging. In this part, we'll show you the difference between creating a container using traditional methods and using Habitat.

Traditional Container Building

Let's look at a typical workflow for a developer using Dockerfiles to package a Node.js application. The developer would typically start with a Docker Hub provided image to launch Node. Docker Hub contains several thousand images you can use to get started packaging your application. Using one is as simple as running a short command.

michael@ricardo-2:plans_pkg_part_2$ docker run -it node
Unable to find image 'node:latest' locally
latest: Pulling from library/node
43c265008fae: Pull complete
af36d2c7a148: Pull complete
143e9d501644: Pull complete
f6a5aab6cd0c: Pull complete
1e2b64ecebce: Pull complete
328ff1526764: Pull complete
Digest: sha256:1b642fb211851e8515800efa8e6883b88cbf94fe1d99e674575cd24a96dcc940
Status: Downloaded newer image for node:latest
> i = 5
5
> x = 5
5
> x + i
10

If you've ever submitted a ticket just to get a development machine to do your job, this is a delightful experience by comparison. With a few characters you have a machine with Node.js to run your application. The next thing you need to do is combine that Docker Hub provided image with your actual application artifacts. That's where you'll need to start understanding more about packaging with container formats.

Container formats provide a Domain Specific Language to describe how a container should be built: the venerable Dockerfile in the case of Docker, or an App Image Manifest for ACI. If we're using Docker, we'll need to write a Dockerfile like the one below to package the application.

FROM node:latest

# Create app directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Install app dependencies
COPY source/package.json /usr/src/app/
RUN npm install

# Bundle config
RUN mkdir -p /usr/src/app/config
COPY source/config/config.json /usr/src/app/config

# Bundle app source
COPY source/server.js /usr/src/app/

EXPOSE 8080

CMD [ "npm", "start" ]

This is a basic example of how to package a Node.js application in a container. From a developer's perspective, this is a great experience. We still don't need to know much about how the underlying system works; we simply pull in the required version of the Node.js image (FROM node:latest) and copy our source code into the container.

If we build a container from this Dockerfile, we will get an image with some operating system, an installation of node, our dependencies from NPM, and our application code. Running docker images will show us the results of this image creation.
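Building the image is a single command. A sketch, assuming the Dockerfile above sits in the current directory; the tag mfdii/node-example matches the image listed below:

```shell
# Build an image from the Dockerfile in the current directory
# and tag it so it shows up in `docker images`.
docker build -t mfdii/node-example .
```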

michael@ricardo-2:plans_pkg_part_2$ docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
mfdii/node-example   latest              36c6568c606b        4 minutes ago       655.9 MB
node                 latest              04c0ca2a8dad        16 hours ago        654.6 MB

Great, we have an image, but it's almost 656 MB! If we compare this to the node image that we based our container on, we see that our application has taken only 1.3 MB of space. What's the additional 654 MB? If we take a look at the Dockerfile used to create the node:latest image, we can begin to piece together what's going on. The Dockerfile is simple: it sets up some keys, then pulls down and installs node. It also builds from another Docker image, one based on Debian's Jessie release. If we dig into that Dockerfile, we can quickly see where the 654 MB comes from: a whole host of packages downloaded via the OS package manager. We could keep tracing back these Dockerfiles for some time. An easier way is to run docker history on an image, which quickly shows where the bulk of an image comes from.

If you're a traditional operations-focused engineer, you might shrug your shoulders at this. Of course you need an operating system to run your application. Others among you will guffaw at the fact we've pulled in 654 MB of "who knows what". As a developer, this bulk of an operating system creates unnecessary friction when it's time to move to production. When it's time to ship your application, you get to ship 654 MB of stuff you didn't explicitly ask for and may not even need. Of course, your operations team might want a more complete understanding of what's in your container. Are you shipping vulnerable libraries or code? Who built the underlying container? Can you trust the source of that container?

Problems with the traditional way

Harken back to part 1 and remember that containers allow us to rethink how to package our applications. If you look at the different patterns and practices that have emerged around containers, how much of the underlying OS you require (or should ship) is a sliding scale based on the needs of the application. Some applications can be statically linked at compile time, creating container images only a few MB in size, with no operating system at all. Other applications need more of the operating system, and thus you have the sliding scale below.

You can think of the left side of the above diagram as more traditional methods of application deployment using VMs, and the right side as more modern designs such as microservices. The goal as you move to the right is to reduce the footprint of the operating system as much as possible. This is important for a few reasons:

  • Reducing the operating system creates smaller images, which in turn increases the speed at which we can deploy containers from those images.
  • Reducing the operating system decreases the attack surface of the container and increases the ease with which the container can be audited for vulnerable software components.
  • Reducing the operating system footprint decreases the chance your application will consume a component of that OS, thus coupling you to that OS vendor's release cadence.

Let's look at Habitat's approach to building the same container.

Habitat's approach

To get started packaging your application into a container using Habitat, you first define a "plan" to package your application. Remember, Habitat takes a top-down approach, starting with the application concerns, rather than the bottom-up approach of starting with the operating system. Where we deploy our application is something we can be concerned with later.

The Habitat plan or "plan.sh" starts with some standard information that allows you to define the metadata of the application package you wish to create.

pkg_origin=myorigin
pkg_name=mytutorialapp
pkg_version=0.2.0
pkg_maintainer="The Habitat Maintainers <humans@habitat.sh>"
pkg_license=()
pkg_upstream_url=https://github.com/habitat-sh/habitat-example-plans
pkg_source=nosuchfile.tar.gz
pkg_deps=(core/node)
pkg_expose=(8080)

The metadata contains items such as our Habitat origin (or organization), the name of the package, version, source code location (if applicable), etc. It also allows you to declare the dependencies your application needs to run. Since we are packaging the same Node.js application we packaged in the Docker example, we need to declare a dependency on the Habitat node package.

This dependency statement is similar to the FROM node:latest line in the Docker example, where we declared a dependency on the Docker provided node image. However, instead of depending on an entire operating system (which is what the node:latest image gives us), we've declared a dependency on just the application runtime itself.

Once we've defined the metadata, we need to specify the lifecycle of our application's build process. Habitat provides callback methods you can use to define the build and installation lifecycle of your application. Virtually all applications go through these lifecycle stages, and Habitat allows you to define what happens in each stage.

For our application we'll need to define the Build and Install stages of the application. We can do this by defining those methods in our plan.sh.

do_build() {
  # copy the source code to where Habitat expects it
  cp -vr $PLAN_CONTEXT/../source/* $HAB_CACHE_SRC_PATH/$pkg_dirname

  # This installs the dependencies listed in package.json
  npm install
}

do_install() {
  # copy our source to the final location
  cp package.json ${pkg_prefix}
  cp server.js ${pkg_prefix}

  # Copy over the nconf module to the package that we installed in do_build().
  mkdir -p ${pkg_prefix}/node_modules/
  cp -vr node_modules/* ${pkg_prefix}/node_modules/
}

To complete our plan.sh, we'll need to override a few other Habitat callbacks. You can see the complete plan file in the GitHub repo for this example.
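Because our source lives next to the plan rather than in an upstream tarball (note the pkg_source=nosuchfile.tar.gz placeholder), the fetch-related callbacks have nothing to do. A sketch of what those overrides might look like — the callback names are real Habitat lifecycle hooks, but the no-op bodies here are illustrative:

```shell
# There is no upstream tarball to fetch, verify, or unpack,
# so these lifecycle callbacks become no-ops.
do_download() {
  return 0
}

do_verify() {
  return 0
}

do_unpack() {
  return 0
}
```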

These functions are essentially the same steps we defined in our Dockerfile. The important thing to note is that we've made zero assumptions about where this application will run. Our Dockerfile example immediately assumes 1) that you're running the application in a container (obviously), and 2) that you're running on a particular operating system (node:latest is based on Debian, remember). We've made no such assumptions in our Habitat plan. What we have defined are the build lifecycle steps of our application, with the dependencies our application needs explicitly called out. Remember, we want to start with the application, not the operating system, and we've done just that.

Creating our Habitat Docker Container

Now that we've defined what the application needs to run, and how to build it, we can have Habitat build an application artifact we can then deploy in various ways. This is a simple process of entering the Habitat studio and issuing the build command. The build will create an application artifact (a Habitat Artifact or .hart file).
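A rough sketch of that flow from the directory containing your plan — the prompt and output here are illustrative, and will vary by setup:

```shell
$ hab studio enter
# inside the studio, build the plan in the current directory
[default:/src:0]# build
# the build drops the resulting .hart artifact into the results/ directory
[default:/src:0]# ls results/
```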

When we're ready to run our application inside a container, we can export this artifact and its required dependencies in a container format. The hab pkg export command gives us several options to export our application artifact. Coming back to the Dockerfile example, we can run hab pkg export docker myorigin/mytutorialapp to export our application as a container. This will query our application artifact to calculate its dependencies, including transitive dependencies, and package those dependencies into our Docker container. The export will also include a lightweight OS, and the Habitat supervisor (which we will discuss more in Part 3).

If we look at the container exported by Habitat, we'll see a much slimmer image.

michael@ricardo-2:plans_pkg_part_2$ docker images
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
mfdii/node-example    latest              36c6568c606b        40 minutes ago      655.9 MB
node                  latest              04c0ca2a8dad        16 hours ago        654.6 MB
mfdii/mytutorialapp   latest              534afd80d74d        2 minutes ago       182.1 MB

Why this matters

All in all, we end up with a container that contains only the artifacts required to run our application. There's no mystery 650 MB that comes from some unknown place. That's because we started with the application, defined its build lifecycle in our plan.sh, and explicitly declared our dependencies. Habitat uses this information to build the minimum viable container we need to run our application. We end up with a container that's slimmer (182 MB vs 655 MB) and contains only the required concerns.

This idea of a minimum viable container starts to become important when you think of auditing containers to verify you're not running a library or binary with known vulnerabilities. The surface area of this container is smaller, and thus easier to audit, and the container build system (Habitat in this case) has explicit knowledge about what we included inside this container.

The Habitat build system also makes for a much more portable application. Because we didn't declare a container format as part of defining our application requirements, the way we did with our Dockerfile, we can export our application to a variety of formats (Docker, ACI, Mesos, or a tar.gz). This could be extended further with more export formats, such as VM images.

What's Coming Up

In the next blog post in our series, we'll talk more about running the actual application artifact Habitat creates, and the features of Habitat that let you inject configuration into your application at run time. In the meantime, try the Habitat tutorial. It walks you through packaging up the example Node.js application we've talked about.

You might also be interested in our webinar, "Simplifying Container Management with Habitat." Watch to learn how Habitat makes building, deploying, and running your applications in containers simple, no matter how complex the production environment.

The post Why Habitat? – Plans and Packages, Part 2 appeared first on Chef Blog.