Don't download the internet. Share Maven and Ivy repositories with Docker containers

Downloading the internet. A term commonly used when building Maven projects. It is due to downloading your applications dependencies and their transitive dependencies. With a normal application those dependencies can add up to quite a few jars and a lot of megabytes to download. Especially on a 'clean' machine without cached dependencies in your local repository.

Running a new container in Docker sometimes feels the same as every image layer it is built on top of also has to be downloaded. Eventually these are also cached in the Docker cache.

If you then have a Maven based application in Docker you have the perfect storm of bandwidth hogging. Especially as your Docker image's Maven configuration will by default not reuse any cached dependencies and instead always download everything from Maven Central. And for each new build or launch it will re-download the internet as it knows of no local cached dependencies.

Work around, not solution

My work around is to mount my local host repositories for Maven, and for its derivative Ivy, as data volumes for the Docker container. This then avoids re-downloading the internet on every further build and launch and will also reuse dependencies I most likely have already downloaded on the host.

It is not the best solution as your images should ideally not be influenced by what you have on your host machine, but it saves a lot of time, and a lot of grief.

Vagrant

When I run my docker containers inside a Vagrant instance I first mount my host repositories by adding these 2 lines to the Vagrantfile:

config.vm.synced_folder "/Users/myusername/.m2", "/home/vagrant/.m2"

config.vm.synced_folder "/Users/myusername/.ivy2", "/home/vagrant/.ivy2"


Boot2docker

Since Docker 1.3 the OSX /Users path has been by default shared with the boot2docker VM on the same path. So to share the maven repositories you need to link the /Users path to your docker user home folder.

   boot2docker ssh;

   ln -s /Users/mysername/.m2 /home/docker/.m2;
   ln -s /Users/mysername/.ivy2 /home/docker/.ivy2

Data volume container

For easy sharing these folders between many docker containers I mount them as a data volume container and naming it ‘maven’.

   docker run -d -P --name maven \
   -v ~/.m2:/root/.m2:rw \
   -v ~/.ivy2:/root/.ivy2:rw ubuntu

I am basing it on the basic Ubuntu image, and it will stop immediately as no process is running. That is fine, the mounted volumes will still work.

Alternatively instead of mounting the host folders you can have a persistent container with these folders exposed as VOLUME in the Dockerfile so it is shared in a similar manner.

Launch container 

   docker run -d --volumes-from maven \
   yourapplicationimage

Obviously you probably have other options on your docker launch, but the important bit here is the ‘--volumes-from maven’ which maps all the volumes from the data volume container called maven into this container.

Now in theory all containers with Maven and Ivy based build tools such as SBT should first look at the mounted Maven and Ivy repositories for their dependencies.

Build problem

Unfortunately if you build new images frequently you will have the same problem still as the above solution is only for running containers not building. During the build stage Docker does not allow you mount any volumes. This is so it is reproducible anywhere.

In your Dockerfile you can ADD or COPY whole repositories into your image, but that will make it very bloated with a lot of irrelevant dependencies unless you somehow construct and maintain a perfect repository. (Ps. don’t do this).

Maven repository manager

One solution that makes sense in general is to add you companies Maven repository manager’s public Maven Central mirror to the image. That way at least you will only download the intranet, not the internet.

For a Maven build add a settings.xml file to your image folder with at least these settings if using Nexus:

<mirrors>
   <mirror>
      <id>Nexus</id>
      <name>Nexus Public Mirror</name>
      <url>http://nexusmachine/nexus/content/groups/public</url>
      <mirrorOf>central</mirrorOf>
   </mirror>
</mirrors>
                
Then add this to your Dockerfile:

   ADD settings.xml /root/.m2/settings.xml

Note this will be overwritten if you later during the run stage mount .m2 folder on top of it, but for most that is fine as the .m2 folder usually contain a relevant settings.xml file anyway.

For a SBT build add this repositories file to your image folder with this content:


[repositories]
local 
maven-local
company-repo: http://nexusmachine/nexus/content/groups/public
scala-tools-releases
maven-central

Then add this to your Dockerfile:

   ADD repositories /root/.sbt/repositories

Repository manager container

A further solution is to run a repository manager as another container and have all Maven/SBT builds refer to it instead. This avoids the involvement of the host computer yet saves any network traffic as everything is localhost.


    docker pull mattgruter/artifactory

Configure then run and name it:

    docker run -d --name artifactory \
    mattgruter/artifactory

Modify the above settings.xml and repositories to refer to you locally linked artifactory name as url instead, e.g this one liner:

   company-ivy-repo: http://artifactory/artifactory/repo/,
   [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext]
   company-repo: http://artifactory/artifactory/repo/

(The example above added it as an Ivy repository which Artifactory supports).



Hopefully these tips will avoid downloading the internet too often and save a few grey hairs.

Comments

Comments below were made on a legacy Blog, before move to current Blog (February 2019)

Unknown:

I got same pain point to manage container. It's very helpful to me. Thanks.

30 Jun 2015, 09:52:00
Creative Commons License

Unless otherwise specified, all content is licensed under Creative Commons by Attribution 3.0 (CC-BY).

Externally linked images and content are not included and are licensed and copyrighted separately.