Part 1

In-depth dive to images

In depth dive to images

Images are the basic building blocks for containers and other images. When you "containerize" an application you work towards creating the image.

By learning what images are and how to create them you are ready to start utilizing containers in your own projects.

Where do the images come from?

When running a command such as docker run hello-world, Docker will automatically search Docker Hub for the image if it is not found locally.

This means that we can pull and run any public image from Docker's servers. For example‚ if we wanted to start an instance of the PostgreSQL database, we could just run docker run postgres, which would pull and run https://hub.docker.com/_/postgres/.

We can search for images in the Docker Hub with docker search. Try running docker search hello-world.

The search finds plenty of results, and prints each image's name, short description, amount of stars, and "official" and "automated" statuses.

$ docker search hello-world

  NAME                         DESCRIPTION    STARS   OFFICIAL   AUTOMATED
  hello-world                  Hello World!…  699     [OK]
  kitematic/hello-world-nginx  A light-weig…  112
  tutum/hello-world            Image to tes…  56                 [OK]
  ...

Let's examine the list.

The first result, hello-world, is an official image. Official images are curated and reviewed by Docker, Inc. and are usually actively maintained by the authors. They are built from repositories in the docker-library.

When browsing the CLI's search results, you can recognize an official image from the "[OK]" in the "OFFICIAL" column and also from the fact that the image's name has no prefix (aka organization/user). When browsing Docker Hub, the page will show "Docker Official Images" as the repository, instead of a user or organization. For example, see the Docker Hub page of the hello-world image.

The third result, tutum/hello-world, is marked as "automated". This means that the image is automatically built from the source repository. Its Docker Hub page shows its previous "Builds" and a link to the image's "Source Repository" (in this case, to GitHub) from which Docker Hub builds the image.

The second result, kitematic/hello-world-nginx, is neither an official nor an automated image. We can't know what the image is built from, since its Docker Hub page has no links to any repositories. The only thing its Docker Hub page reveals is that the image is 6 years old. Even if the image's "Overview" section had links to a repository, we would have no guarantees that the published image was built from that source.

There are also other Docker registries competing with Docker Hub, such as quay. However, docker search will only search from Docker Hub, so we will need to use the registry's web pages to search for images. Take a look at the page of the nordstrom/hello-world image on quay. The page shows the command to use to pull the image, which reveals that we can also pull images from hosts other than Docker Hub:

docker pull quay.io/nordstrom/hello-world

So, if the host's name (here: quay.io) is omitted, it will pull from Docker Hub by default.

NOTE: Trying above command may fail giving manifest errors as default tag latest is not present in quay.io/nordstrom/hello-world image. Specifying correct tag for image will pull image without any errors, for ex. docker pull quay.io/nordstrom/hello-world:2.0

A detailed look into an image

Let's go back to a more relevant image than 'hello-world', the ubuntu image is one of the most common Docker images to use as a base for your own image.

Let's pull Ubuntu and look at the first lines:

$ docker pull ubuntu
  Using default tag: latest
  latest: Pulling from library/ubuntu

Since we didn't specify a tag, Docker defaulted to latest, which is usually the latest image built and pushed to the registry. However, in this case, the repository's README says that the ubuntu:latest tag points to the "latest LTS" instead since that's the version recommended for general use.

Images can be tagged to save different versions of the same image. You define an image's tag by adding :<tag> after the image's name.

Ubuntu's Docker Hub page reveals that there's a tag named 18.04 which promises us that the image is based on Ubuntu 18.04. Let's pull that as well:

$ docker pull ubuntu:18.04

  18.04: Pulling from library/ubuntu
  c2ca09a1934b: Downloading [============================================>      ]  34.25MB/38.64MB
  d6c3619d2153: Download complete
  0efe07335a04: Download complete
  6b1bb01b3a3b: Download complete
  43a98c187399: Download complete

Images are composed of different layers that are downloaded in parallel to speed up the download. Images being made of layers also have other aspects and we will talk about them in part 3.

We can also tag images locally for convenience, for example, docker tag ubuntu:18.04 ubuntu:bionic creates the tag ubuntu:bionic which refers to ubuntu:18.04.

Tagging is also a way to "rename" images. Run docker tag ubuntu:18.04 fav_distro:bionic and check docker images to see what effects the command had.

To summarize, an image name may consist of 3 parts plus a tag. Usually like the following: registry/organisation/image:tag. But may be as short as ubuntu, then the registry will default to docker hub, organisation to library and tag to latest. The organisation may also be an user, but calling it an organisation may be more clear.

Building images

Finally, we get to build our own images and get to talk about Dockerfile and why it's so great.

Dockerfile is simply a file that contains the build instructions for an image. You define what should be included in the image with different instructions. We'll learn about the best practices here by creating one.

Let's take a most simple application and containerize it first. Here is a script called "hello.sh"

hello.sh

#!/bin/sh

echo "Hello, docker!"

First, we will test that it even works. Create the file, add execution permissions and run it:

$ chmod +x hello.sh

$ ./hello.sh
  Hello, docker!
  • If you're using windows you can skip these two and add chmod +x hello.sh to the Dockerfile.

And now to create an image from it. We'll have to create the Dockerfile that declares all of the required dependencies. At least it depends on something that can run shell scripts. So I will choose alpine, it is a small Linux distribution and often used to create small images.

Even though we're using alpine here, you can use ubuntu during exercises. Ubuntu images by default contain more tools to debug what is wrong when something doesn't work. In part 3 we will talk more about why small images are important.

We will choose exactly which version of a given image we want to use. This makes it so that we don't accidentally update through a breaking change, and we know which images need updating when there are known security vulnerabilities in old images.

Now create a file and name it "Dockerfile" and lets put the following instructions inside it:

Dockerfile

# Start from the alpine image that is smaller but no fancy tools
FROM alpine:3.13

# Use /usr/src/app as our workdir. The following instructions will be executed in this location.
WORKDIR /usr/src/app

# Copy the hello.sh file from this location to /usr/src/app/ creating /usr/src/app/hello.sh
COPY hello.sh .

# Alternatively, if we skipped chmod earlier, we can add execution permissions during the build.
# RUN chmod +x hello.sh

# When running docker run the command will be ./hello.sh
CMD ./hello.sh

Great! By default docker build will look for a file named Dockerfile. Now we can run docker build with instructions where to build (.) and give it a name (-t <name>):

$ docker build . -t hello-docker
  Sending build context to Docker daemon  54.78kB
  Step 1/4 : FROM alpine:3.13
   ---> d6e46aa2470d
  Step 2/4 : WORKDIR /usr/src/app
   ---> Running in bd0b4e349cb4
  Removing intermediate container bd0b4e349cb4
   ---> b382ca27c182
  Step 3/4 : COPY hello.sh .
   ---> 7fbc1b6e45ab
  Step 4/4 : CMD ./hello.sh
   ---> Running in 24f28f026b3f
  Removing intermediate container 24f28f026b3f
   ---> 444f21cf7bd5
  Successfully built 444f21cf7bd5
  Successfully tagged hello-docker:latest

$ docker images
  REPOSITORY            TAG          IMAGE ID       CREATED         SIZE
  hello-docker          latest       444f21cf7bd5   2 minutes ago   5.57MB

Now executing the application is as simple as running docker run hello-docker. Try it! During the build we see that there are multiple steps with hashes and intermediate containers. The steps here represent the layers so that each step is a new layer to the image.

The layers have multiple functions. We often try to limit the number of layers to save on storage space but layers can work as a cache during build time. If we just edit the last lines of Dockerfile the build command can start from the previous layer and skip straight to the section that has changed. COPY automatically detects changes in the files, so if we change the hello.sh it'll run from step 3/4, skipping 1 and 2. This can be used to create faster build pipelines. We'll talk more about optimization in part 3.

The intermediate containers are containers created from the image in which the command is executed. Then the changed state is stored into an image. We can do similiar task and a new layer manually. Create a new file called additional.txt and let's copy it inside the container and learn new trick while we're at it! We'll need two terminals so I will label the lines with 1 and 2 representing the two.

1 $ docker run -it hello-docker sh
1 /usr/src/app #

Now we're inside of the container. We replaced the CMD we defined earlier with sh and used -i and -t to start the container so that we can interact with it. In the second terminal we will copy the file here.

2 $ docker ps
2   CONTAINER ID   IMAGE          COMMAND   CREATED         STATUS         PORTS     NAMES
    9c06b95e3e85   hello-docker   "sh"      4 minutes ago   Up 4 minutes             zen_rosalind

2 $ touch additional.txt
2 $ docker cp ./additional.txt zen_rosalind:/usr/src/app/

I created the file with touch right before copying it in. Now it's there and we can confirm that with ls:

1 /usr/src/app # ls
1 additional.txt  hello.sh

Great! Now we've made a change to the container. We can use diff to check what has changed

2 $ docker diff zen_rosalind
    C /usr
    C /usr/src
    C /usr/src/app
    A /usr/src/app/additional.txt
    C /root
    A /root/.ash_history

The character in front of the file name indicates the type of the change in the container's filesystem: A = added, D = deleted, C = changed. The additional.txt was created and our ls created .ash_history. Next we will save the changes as a new layer!

2 $ docker commit zen_rosalind hello-docker-additional
    sha256:2f63baa355ce5976bf89fe6000b92717f25dd91172aed716208e784315bfc4fd
2 $ docker images
    REPOSITORY                   TAG          IMAGE ID       CREATED          SIZE
    hello-docker-additional      latest       2f63baa355ce   3 seconds ago    5.57MB
    hello-docker                 latest       444f21cf7bd5   31 minutes ago   5.57MB

We will actually never use docker commit again. This is because defining the changes to the Dockerfile is much more sustainable method of managing changes. No magic actions or scripts, just a Dockerfile that can be version controlled.

Let's do just that and create hello-docker with v2 tag that includes additional.txt.

Dockerfile

# Start from the alpine image
FROM alpine:3.13

# Use /usr/src/app as our workdir. The following instructions will be executed in this location.
WORKDIR /usr/src/app

# Copy the hello.sh file from this location to /usr/src/app/ creating /usr/src/app/hello.sh.
COPY hello.sh .

# Execute a command with `/bin/sh -c` prefix.
RUN touch additional.txt

# When running docker run the command will be ./hello.sh
CMD ./hello.sh

Build it with docker build . -t hello-docker:v2 and we are done! Let's compare the output of ls:

$ docker run hello-docker-additional ls
  additional.txt
  hello.sh

$ docker run hello-docker:v2 ls
  additional.txt
  hello.sh

Now we know that all instructions in a Dockerfile except CMD (and one other that we will learn about soon) are executed during build time. CMD is executed when we call docker run, unless we overwrite it.

You have reached the end of this section! Continue to the next section: