Dockerfile Best Practice: Layer Ordering

Fatah Nur Alam Majid
5 min read · Mar 27, 2021
Source image: https://foxutech.com

Most of us know Docker, right? It is the best-known platform for building containers right now (but not the only one). But have you put your best effort into creating the best possible, most efficient Docker images? In this article we’re going to show you one of the Dockerfile best practices for building Docker images. We will do a PoC (Proof of Concept) to demonstrate how to leverage the build cache.

To start, we will build a very simple Node.js application that just works. Then we will wrap the application in a Docker image and run it as a container (not strictly necessary in this case). The main point is not the application itself, but the Dockerfile.

P.S. You can find the code on my GitHub here.

The preps

First things first: we need to create a very simple application that just works. Here I chose to build an Express.js app. You can start with anything you want, but this is mine:
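The original post embedded the application code as an image. Here is a minimal sketch of that kind of app (the file name, route, and port are my assumptions, not necessarily the original code):

```javascript
// app.js: a minimal Express app that just works
const express = require('express');

const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

app.listen(port, () => {
  console.log(`App listening on port ${port}`);
});
```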

You may want to make sure the code actually runs, but for this article we don’t really need to. Again, the main point of this article is the Dockerfile, not the application itself.

The next step is to add two Dockerfiles: the first for an image that “just works”, and the second for an image built with “best practices” in mind. In case you want to see mine, check the code below.

Dockerfile with carefully ordered layers
Dockerfile that “just works”
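Since the original post showed both Dockerfiles as images, here is a sketch of what each might contain (the file names, base image tag, and start command are my assumptions):

```dockerfile
# Dockerfile.naive: the version that "just works".
# It copies the entire build context, app code included,
# before installing dependencies.
FROM node:14-alpine

WORKDIR /app

COPY . .

RUN npm install

EXPOSE 3000
CMD ["node", "app.js"]
```

```dockerfile
# Dockerfile.best: the version with carefully ordered layers.
# Dependency manifests are copied and installed first;
# the application code is copied in a separate, later step.
FROM node:14-alpine

WORKDIR /app

# Step 1: copy only the dependency manifests and install
COPY package*.json ./
RUN npm install

# Step 2: copy the rest of the application code
COPY . .

EXPOSE 3000
CMD ["node", "app.js"]
```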

That’s it! We can now carry out the PoC itself by making changes to the source code and invoking Docker to build the images.

The PoC

The PoC involves several steps. First, we build the images using the starting application code. Once that succeeds, we make a change to the source code and rebuild the Docker images. Thanks to the build cache, the “best practice” Dockerfile should, in theory, give a faster build than the Dockerfile that “just works”. We’ll see.
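Assuming the Dockerfiles are named Dockerfile.naive and Dockerfile.best as in the sketches above, the two builds could be invoked like this:

```bash
# Build the image from the Dockerfile that "just works"
docker build -f Dockerfile.naive -t myapp:naive .

# Build the image from the "best practice" Dockerfile
docker build -f Dockerfile.best -t myapp:best .
```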

Now let’s build the images from the current application code. For me, here are the results of the first build:

Capture of 1st build logs for Dockerfile that “just works”
Capture of 1st build logs for Dockerfile that implements “best practices”

As you can see in those images, both Dockerfiles install the dependencies and then run the application code.

Now let’s make a change to the application code. You can make any change you want. To keep it short, I added another route to the Express app. You can see it here:
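The change itself was shown as an image in the original post; any edit to the source file works. Something like this hypothetical extra route would do:

```javascript
// app.js: the same app with one extra route added
const express = require('express');

const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

// The new route: any edit to this file changes its checksum
app.get('/ping', (req, res) => {
  res.send('pong');
});

app.listen(port, () => {
  console.log(`App listening on port ${port}`);
});
```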

Now for the proof itself. After the changes we made to the application code, we build the images again using the same Dockerfiles. See the images below for my results.

Capture of 2nd build logs for Dockerfile that “just works” after changes have been made
Capture of 2nd build logs for Dockerfile that implements “best practices” after changes have been made

You should notice something interesting. As you can see, with the Dockerfile that “just works”, Docker runs the dependency installation step again. With the “best practice” Dockerfile, on the other hand, the carefully ordered layers allow it to reuse the build cache from the previous result (the first build). This is in line with the explanation given in Docker’s documentation:

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.

But how? Technically, it is because we changed the source code itself. In the “just works” Dockerfile, we can see that Docker copies everything in the current folder into the image, including the application code.

This makes Docker drop the build cache: every time we change the application code, the checksums of the copied files change, and when the file checksums change, the cached layer is invalidated. This is also in line with the Docker documentation:

If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.

And if you look back at the first quoted sentence, all of this makes sense: once the COPY layer is invalidated, every instruction after it, including the dependency install, has to run again.

Question: how does the “best practice” Dockerfile manage to keep using the build cache? What and where is the difference?

When you try to work it out yourself by comparing both Dockerfiles, you should notice another interesting thing. In the “best practice” Dockerfile, we split the copying into separate steps. The first step copies only the package*.json files and then runs the npm install command. The second step copies the whole application code into the image.

This separation lets Docker use the build cache as long as the dependencies, in this case the package*.json files, are unchanged. Hence, if only the application code changes (and not the package*.json files), Docker will reuse the build cache whenever it can (and whenever it exists in the first place). This will feel more and more natural if you deal with Dockerfiles frequently, and you can practice it on your own.
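To make this concrete, here is the best-practice sketch again, annotated with what happens on a rebuild after only the application code has changed:

```dockerfile
FROM node:14-alpine
WORKDIR /app

# Cache hit: the package*.json checksums are unchanged
COPY package*.json ./

# Cache hit: the previous layer was reused, so npm install is skipped
RUN npm install

# Cache miss: the application code checksums changed, so this layer rebuilds
COPY . .

# Everything after the first cache miss is executed again
EXPOSE 3000
CMD ["node", "app.js"]
```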

Another thing…

In this short section, we will update the package.json file by adding another dependency to our simple application. For example:
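The original post showed this change as an image. As an illustration, adding a dependency such as lodash (my example choice, not necessarily the one from the original; the package name and versions are placeholders) would look like this:

```json
{
  "name": "simple-express-app",
  "version": "1.0.0",
  "main": "app.js",
  "dependencies": {
    "express": "^4.17.1",
    "lodash": "^4.17.21"
  }
}
```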

And now, if we build again using each Dockerfile, we see the following:

Capture of 3rd build logs for Dockerfile that “just works” after adding dependency
Capture of 3rd build logs for Dockerfile that implements “best practices” after adding dependency

You will easily notice that this time neither Dockerfile uses the build cache. Why? Simply because we changed the package.json file. That invalidates the layer that copies the package*.json files, so Docker drops the build cache from that point on and rebuilds as the Dockerfile instructs.

That’s all for this short article. I think this should be enough to motivate all of us to improve our Dockerfiles whenever we have one. I also hope this article encourages you to always use best practices (when they fit your needs).

Thank you! See you on the next topic!

