{"id":1622,"date":"2019-04-05T03:48:40","date_gmt":"2019-04-05T03:48:40","guid":{"rendered":"http:\/\/kusuaks7\/?p=1227"},"modified":"2023-08-08T12:46:48","modified_gmt":"2023-08-08T12:46:48","slug":"slimming-down-your-docker-images-part-4-of-learn-enough-docker-to-be-useful","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/slimming-down-your-docker-images-part-4-of-learn-enough-docker-to-be-useful\/","title":{"rendered":"Slimming Down Your Docker Images Part 4 of Learn Enough Docker to be Useful"},"content":{"rendered":"<p id=\"2cf7\">In this article you\u2019ll learn how to speed up your Docker build cycles and create lightweight images. In keeping with our food metaphors, we\u2019re going to be eating salad as we slim down our Docker images\u200a\u2014\u200ano more pizza, donuts, and bagels.<\/p>\n<figure id=\"6bf5\" data-scroll=\"native\"><img decoding=\"async\" style=\"width: 700px; height: 445px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/2560\/1*jn9ZKS6pZ3P2R5VDrqkTkg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/2560\/1*jn9ZKS6pZ3P2R5VDrqkTkg.jpeg\" \/><\/figure>\n<p id=\"c53d\">In Part 3 of this series, we covered a dozen Dockerfile instructions to know. If you missed it, check out the article <a href=\"https:\/\/www.experfy.com\/blog\/learn-enough-docker-to-be-useful-part-3-a-dozen-dandy-dockerfile-instructions\">here<\/a>.<\/p>\n<p>Here\u2019s the cheatsheet.<\/p>\n<p id=\"7ee1\"><code>FROM<\/code>\u200a\u2014\u200aspecifies the base (parent) image.<br \/>\n<code>LABEL<\/code>\u200a\u2014\u200aprovides metadata. Good place to include maintainer info.<br \/>\n<code>ENV<\/code>\u200a\u2014\u200asets a persistent environment variable.<br \/>\n<code>RUN<\/code>\u200a\u2014\u200aruns a command and creates an image layer. 
Often used to install packages into the image.<br \/>\n<code>COPY<\/code>\u200a\u2014\u200acopies files and directories into the image.<br \/>\n<code>ADD<\/code>\u200a\u2014\u200acopies files and directories into the image. Can unpack local\u00a0.tar files.<br \/>\n<code>CMD<\/code>\u200a\u2014\u200aprovides a command and arguments for an executing container. Parameters can be overridden. Only the last CMD in a Dockerfile takes effect.<br \/>\n<code>WORKDIR<\/code>\u200a\u2014\u200asets the working directory for the instructions that follow.<br \/>\n<code>ARG<\/code>\u200a\u2014\u200adefines a variable to pass to Docker at build time.<br \/>\n<code>ENTRYPOINT<\/code>\u200a\u2014\u200aprovides a command and arguments for an executing container. Arguments persist.<br \/>\n<code>EXPOSE<\/code>\u200a\u2014\u200adocuments a port the container listens on; it does not publish the port.<br \/>\n<code>VOLUME<\/code>\u200a\u2014\u200acreates a directory mount point to access and store persistent data.<\/p>\n<p id=\"4b9e\">Let\u2019s now look at how we can fashion our Dockerfiles to save time when developing images and pulling containers.<\/p>\n<h3 id=\"575e\">Caching<\/h3>\n<p id=\"99b4\">One of Docker\u2019s strengths is that it provides caching to help you iterate on your image builds more quickly.<\/p>\n<p id=\"2e53\">When building an image, Docker steps through the instructions in your Dockerfile, executing each in order. As each instruction is examined, Docker looks for an existing intermediate image in its cache that it can reuse instead of creating a new (duplicate) intermediate image.<\/p>\n<p id=\"b3d3\">If the cache is invalidated, the instruction that invalidated it and all subsequent Dockerfile instructions generate new intermediate images. As soon as the cache is invalidated, that\u2019s it for the rest of the instructions in the Dockerfile.<\/p>\n<p id=\"304c\">So, starting at the top of the Dockerfile, if the base image is already in the cache, it is reused. That\u2019s a hit. 
Otherwise, the cache is invalidated.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*FOCF2hBIRuQ0nB8o-VJCxA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*FOCF2hBIRuQ0nB8o-VJCxA.jpeg\" \/><\/p>\n<p style=\"text-align: center;\">Also a\u00a0hit<\/p>\n<p id=\"13d1\">Then the next instruction is compared against all child images in the cache derived from that base image. Docker checks each cached intermediate image to see whether the instruction finds a cache hit. If it\u2019s a cache miss, the cache is invalidated. The same process is repeated until the end of the Dockerfile is reached.<\/p>\n<p id=\"29bf\">Most new instructions are simply compared with those in the intermediate images. If there\u2019s a match, then the cached copy is used.<\/p>\n<p id=\"46d3\">For example, when a\u00a0<code>RUN pip install -r requirements.txt<\/code>\u00a0instruction is found in a Dockerfile, Docker searches for the same instruction in its locally cached intermediate images. The contents of the old and new\u00a0<em>requirements.txt<\/em>\u00a0files are not compared.<\/p>\n<p id=\"07d3\">This behavior can be problematic if you add new packages to your\u00a0<em>requirements.txt<\/em>\u00a0file and want\u00a0<code>RUN pip install<\/code>\u00a0to rerun the package installation with the new package names. I\u2019ll show a few solutions in a moment.<\/p>\n<p id=\"9863\">Unlike other Docker instructions, ADD and COPY instructions do require Docker to look at the contents of the file(s) to determine if there is a cache hit. The checksum of the referenced file is compared against the checksum in the existing intermediate images. 
If the file contents or metadata have changed, then the cache is invalidated.<\/p>\n<p id=\"5cf0\">Here are a few tips for using caching effectively.<\/p>\n<ul>\n<li id=\"a817\">Caching can be turned off by passing\u00a0<code>--no-cache<\/code>\u00a0to\u00a0<code>docker build<\/code>.<\/li>\n<li id=\"0681\">If you change an instruction often, every layer that follows it will be rebuilt often too. To take advantage of caching, put instructions that are likely to change as low as you can in your Dockerfile.<\/li>\n<li id=\"1340\">Chain\u00a0<code>apt-get update<\/code>\u00a0and\u00a0<code>apt-get install<\/code>\u00a0in the same\u00a0<code>RUN<\/code>\u00a0instruction so that a cached, stale\u00a0<code>apt-get update<\/code>\u00a0layer is not reused when you change the packages you install.<\/li>\n<li id=\"079f\">If you\u2019re using a package installer such as pip with a\u00a0<em>requirements.txt<\/em>\u00a0file, then follow a model like the one below to make sure you don\u2019t receive a stale intermediate image with the old packages listed in\u00a0<em>requirements.txt<\/em>.<\/li>\n<\/ul>\n<p><span style=\"font-family: courier new,courier,monospace;\">COPY requirements.txt \/tmp\/<br \/>\nRUN pip install -r \/tmp\/requirements.txt<br \/>\nCOPY . \/tmp\/<\/span><\/p>\n<p id=\"2557\">Those are the suggestions for using Docker build caching effectively.<\/p>\n<h3 id=\"fd03\">Size Reduction<\/h3>\n<p id=\"6630\">Docker images can get large. You want to keep them small so they can be pulled quickly and use fewer resources. Let\u2019s skinny down your images!<\/p>\n<figure id=\"141f\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*RFi86DbJQy39RjNQrQiD6A.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*RFi86DbJQy39RjNQrQiD6A.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">Go for a salad instead of a\u00a0bagel<\/p>\n<p id=\"2122\">An Alpine base image is a full Linux distribution without much else. 
It is usually under 5 MB to download, but it requires you to spend more time writing the code to install the dependencies you need to build a working app.<\/p>\n<figure id=\"7186\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*NnmWS0yPVhS_5nKW8jzSLg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*NnmWS0yPVhS_5nKW8jzSLg.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">Alpine comes from\u00a0Alps<\/p>\n<p id=\"d087\">If you need Python in your container, the Python Alpine build is a nice compromise. It contains Linux and Python, and you supply almost everything else.<\/p>\n<p id=\"14fc\">An image I built with the latest Python Alpine build and a\u00a0<em>print(\"hello world\")<\/em>\u00a0script weighs in at 78.5 MB. Here\u2019s the Dockerfile:<\/p>\n<p id=\"f2a2\"><span style=\"font-family: courier new,courier,monospace;\">FROM python:3.7.2-alpine3.8<br \/>\nCOPY . \/app<br \/>\nENTRYPOINT [\"python\", \".\/app\/my_script.py\", \"my_var\"]<\/span><\/p>\n<p id=\"68e5\">On the Docker Hub website the base image is listed as 29 MB. That figure is the compressed download size. Python is already included in the base image, and the size reported locally by\u00a0<code>docker image ls<\/code>\u00a0is for the uncompressed layers, so it is larger.<\/p>\n<p id=\"45bf\">Besides using Alpine base images, another method for reducing the size of your images is using multistage builds. This technique also adds complexity to your Dockerfile.<\/p>\n<h3 id=\"f344\">Multistage Builds<\/h3>\n<p id=\"fb7a\">Multistage builds use multiple FROM instructions. You can selectively copy files, called build artifacts, from one stage to another. You can leave behind anything you don\u2019t want in the final image. 
This method can reduce your overall image size.<\/p>\n<p id=\"8675\">Each FROM instruction<\/p>\n<ul>\n<li id=\"afef\">begins a new stage of the build.<\/li>\n<li id=\"2b67\">leaves behind any state created in prior stages.<\/li>\n<li id=\"fb47\">can use a different base.<\/li>\n<\/ul>\n<p id=\"d2e9\">Here\u2019s a modified example of a multistage build from the\u00a0<a href=\"https:\/\/docs.docker.com\/develop\/develop-images\/multistage-build\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/docs.docker.com\/develop\/develop-images\/multistage-build\/\" data->Docker docs<\/a>:<\/p>\n<p id=\"f4a7\"><span style=\"font-family: courier new,courier,monospace;\">FROM golang:1.7.3 AS build<br \/>\nWORKDIR \/go\/src\/github.com\/alexellis\/href-counter\/<br \/>\nRUN go get -d -v golang.org\/x\/net\/html<br \/>\nCOPY app.go .<br \/>\nRUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .<\/span><\/p>\n<p id=\"99f7\"><span style=\"font-family: courier new,courier,monospace;\">FROM alpine:latest<br \/>\nRUN apk --no-cache add ca-certificates<br \/>\nWORKDIR \/root\/<br \/>\nCOPY --from=build \/go\/src\/github.com\/alexellis\/href-counter\/app .<br \/>\nCMD [\".\/app\"]<\/span><\/p>\n<p id=\"00a0\">Note that we name the first stage by appending\u00a0<code>AS build<\/code>\u00a0to its FROM instruction. The named stage is then referred to in the\u00a0<code>COPY --from=build<\/code>\u00a0instruction later in the Dockerfile.<\/p>\n<p id=\"12b1\">Multistage builds make sense in cases where you\u2019ll be making lots of containers in production. Multistage builds can help you squeeze every last ounce (gram if you think in metric) out of your image size. However, multistage builds sometimes add complexity that can make Dockerfiles harder to maintain, so you probably won\u2019t use them in most builds. 
See further discussion of the tradeoffs\u00a0<a href=\"https:\/\/blog.realkinetic.com\/building-minimal-docker-containers-for-python-applications-37d0272c52f3\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/blog.realkinetic.com\/building-minimal-docker-containers-for-python-applications-37d0272c52f3\" data->here<\/a>\u00a0and advanced patterns\u00a0<a href=\"https:\/\/medium.com\/@tonistiigi\/advanced-multi-stage-build-patterns-6f741b852fae\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/medium.com\/@tonistiigi\/advanced-multi-stage-build-patterns-6f741b852fae\" data->here<\/a>.<\/p>\n<p id=\"434f\">In contrast, everyone should use a\u00a0.dockerignore file to help keep their Docker images skinny.<\/p>\n<h3 id=\"f20e\">.dockerignore<\/h3>\n<p id=\"0267\"><em>.dockerignore<\/em>\u00a0files are something you should know about as a person who knows enough Docker to be d\u0336a\u0336n\u0336g\u0336e\u0336r\u0336o\u0336u\u0336s\u0336 useful.<\/p>\n<p id=\"849c\">.dockerignore is similar to\u00a0.gitignore. It\u2019s a file with a list of patterns that Docker matches against file names to exclude those files from the build context when making an image.<\/p>\n<figure id=\"4099\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*yzTSAJgy7dM9qolu8qx_TA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*yzTSAJgy7dM9qolu8qx_TA.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">Just\u00a0.dockerignore it<\/p>\n<p id=\"4302\">Put your\u00a0.dockerignore file in the root directory of your build context, which is usually the same folder as your Dockerfile.<\/p>\n<p id=\"2030\">When you run\u00a0<code>docker build<\/code>\u00a0to create an image, Docker checks for a\u00a0.dockerignore file. 
If one is found, Docker goes through the file line by line and uses Go\u2019s\u00a0<a href=\"https:\/\/golang.org\/pkg\/path\/filepath\/#Match\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/golang.org\/pkg\/path\/filepath\/#Match\" data->filepath.Match rules<\/a>\u200a\u2014\u200aand a few of\u00a0<a href=\"https:\/\/docs.docker.com\/v17.09\/engine\/reference\/builder\/#dockerignore-file\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/docs.docker.com\/v17.09\/engine\/reference\/builder\/#dockerignore-file\" data->Docker\u2019s own rules<\/a>\u200a\u2014\u200ato match file names for exclusion. Think Unix-style glob patterns, not regular expressions.<\/p>\n<p id=\"b48b\">So\u00a0<code>*.jpg<\/code>\u00a0will exclude files with a\u00a0<em>.jpg<\/em>\u00a0extension in the root of the build context, while\u00a0<code>**\/*.jpg<\/code>\u00a0excludes them in every directory. And\u00a0<code>videos<\/code>\u00a0will exclude the videos folder at the root of the context and its contents.<\/p>\n<p id=\"e4b0\">You can explain what you\u2019re doing in your\u00a0.dockerignore with comments that start with a\u00a0<code>#<\/code>.<\/p>\n<p id=\"b1d1\">Using\u00a0.dockerignore to exclude files you don\u2019t need from your Docker image is a good idea.\u00a0.dockerignore can:<\/p>\n<ul>\n<li id=\"388f\">help keep your secrets from being revealed. No one wants passwords in their images.<\/li>\n<li id=\"c1ea\">reduce image size. Fewer files means smaller, faster images.<\/li>\n<li id=\"1bbc\">reduce build cache invalidation. If logs or other frequently changing files are invalidating your build cache, they are slowing down your build cycle.<\/li>\n<\/ul>\n<p id=\"54f4\">Those are the reasons to use a\u00a0.dockerignore file. 
Check out the\u00a0<a href=\"https:\/\/docs.docker.com\/v17.09\/engine\/reference\/builder\/#dockerignore-file\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/docs.docker.com\/v17.09\/engine\/reference\/builder\/#dockerignore-file\" data->docs<\/a>\u00a0for more details.<\/p>\n<h3 id=\"ced0\">Size Inspection<\/h3>\n<p id=\"9429\">Let\u2019s look at how to find the size of Docker images and containers from the command line.<\/p>\n<ul>\n<li id=\"e588\">To view the approximate size of a running container, you can use the command\u00a0<code>docker container ls -s<\/code>.<\/li>\n<li id=\"9c1b\">Running\u00a0<code>docker image ls<\/code>\u00a0shows the sizes of your images.<\/li>\n<li id=\"8441\">To see the sizes of the intermediate images that make up your image, use\u00a0<code>docker image history my_image:my_tag<\/code>.<\/li>\n<li id=\"6524\">Running\u00a0<code>docker image inspect my_image:my_tag<\/code>\u00a0will show you many things about your image, including its total size and the digests of its layers. Layers are subtly different from the images that make up an entire image, but you can think of them as the same for most purposes. Check out\u00a0<a href=\"https:\/\/windsock.io\/explaining-docker-image-ids\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/windsock.io\/explaining-docker-image-ids\/\" data->this great article<\/a>\u00a0by Nigel Brown if you want to dig into layer and intermediate image intricacies.<\/li>\n<li id=\"1415\">Installing and using the\u00a0<a href=\"https:\/\/github.com\/wagoodman\/dive\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/wagoodman\/dive\" data->dive<\/a>\u00a0tool makes it easy to see into your layer contents.<\/li>\n<\/ul>\n<p id=\"faaf\">I updated the above section Feb. 8, 2019 to use management command names. In the next part of this series we\u2019ll dive further into common Docker commands. 
Follow me to make sure you don\u2019t miss it.<\/p>\n<p id=\"e552\">Now let\u2019s look at a few best practices to slim things down.<\/p>\n<h3 id=\"e6ed\">Eight Best Practices to Reduce Image Sizes &amp; Build\u00a0Times<\/h3>\n<p id=\"bc77\">1. Use an official base image whenever possible. Official images are updated regularly and are generally more secure than unofficial images.<br \/>\n2. Use variations of Alpine images when possible to keep your images lightweight.<br \/>\n3. If using apt, combine RUN apt-get update with apt-get install in the same instruction. Then chain multiple packages in that instruction. List the packages in alphabetical order over multiple lines with the\u00a0<code>\\<\/code>\u00a0line-continuation character. For example:<\/p>\n<p id=\"d768\"><span style=\"font-family: courier new,courier,monospace;\">RUN apt-get update &amp;&amp; apt-get install -y \\<br \/>\npackage-one \\<br \/>\npackage-two \\<br \/>\n&amp;&amp; rm -rf \/var\/lib\/apt\/lists\/*<\/span><\/p>\n<p id=\"cd54\">This method reduces the number of layers to be built and keeps things nice and tidy.<br \/>\n4. Include\u00a0<code>&amp;&amp; rm -rf \/var\/lib\/apt\/lists\/*<\/code>\u00a0at the end of the RUN instruction to clean up the apt cache so it isn\u2019t stored in the layer. See more in the\u00a0<a href=\"https:\/\/docs.docker.com\/develop\/develop-images\/dockerfile_best-practices\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/docs.docker.com\/develop\/develop-images\/dockerfile_best-practices\/\" data->Docker docs<\/a>. Thanks to\u00a0<a href=\"https:\/\/medium.com\/@avijayr\" target=\"_blank\" rel=\"noopener noreferrer\" data-action=\"show-user-card\" data-action-type=\"hover\" data-action-value=\"6c50ea34d16f\" data-anchor-type=\"2\" data-href=\"https:\/\/medium.com\/@avijayr\" data-user-id=\"6c50ea34d16f\" data->Vijay Raghavan Aravamudhan<\/a>\u00a0for this suggestion. Updated Feb. 4, 2019.<br \/>\n5. 
Use caching wisely by putting instructions likely to change lower in your Dockerfile.<br \/>\n6. Use a\u00a0.dockerignore file to keep unwanted and unnecessary files out of your image.<br \/>\n7. Check out\u00a0<a href=\"https:\/\/github.com\/wagoodman\/dive\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/wagoodman\/dive\" data->dive<\/a>\u200a\u2014\u200aa very cool tool for inspecting your Docker image layers and helping you trim the fat.<br \/>\n8. Don\u2019t install packages you don\u2019t need. Duh! But common.<\/p>\n<h3 id=\"258d\">Wrap<\/h3>\n<p id=\"8007\">Now you know how to make Docker images that build quickly, download quickly, and don\u2019t take up much space. As with eating healthy, knowing is half the battle.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article you&rsquo;ll learn how to speed up your Docker build cycles and create lightweight images. One of Docker&rsquo;s strengths is that it provides caching to help you more quickly iterate your image builds. When building an image, Docker steps through the instructions in your Dockerfile, executing each in order. 
As each instruction is examined, Docker looks for an existing intermediate image in its cache that it can reuse instead of creating a new (duplicate) intermediate image.<\/p>\n","protected":false},"author":369,"featured_media":2442,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[2134],"class_list":["post-1622","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":2134,"user_id":369,"is_guest":0,"slug":"jeff-hale","display_name":"Jeff Hale","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Hale","first_name":"Jeff","job_title":"","description":"Jeff Hale is a co-founder of Rebel Desk, where he oversees technology, finance, and operations for this company. He&nbsp;is an experienced entrepreneur who has managed technology, operations, and finances for several 
companies.&nbsp;"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1622","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/369"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1622"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1622\/revisions"}],"predecessor-version":[{"id":30039,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1622\/revisions\/30039"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/2442"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1622"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1622"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1622"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}