Trever Ehrfurth
Data Engineer
Docker Overlay2 Ballooning Issue Fix
When I first started learning Docker and all of its wonders, I was excited about all the new containers and services I could deploy with ease. Then I discovered a small problem, much like a sliver, that grew into a rather annoying and sometimes detrimental issue causing outages.
I am of course talking about the rogue Overlay2 directory.
A directory that likes to balloon out of control, taking all of your VM's disk space hostage and causing whatever services read from or write to your Docker containers to seize and lock up. It is an issue that has plagued so many in the community that one could spend (and this one has spent) countless hours, days, weeks, and yes, months troubleshooting this annoying and persistent thorn, which has taken much of the magic away from developers, homelabbers and enthusiasts alike.
I was immediately baffled by how many posts and cries for help were plastered all over the internet in forums, help sites, Reddit threads and more. I couldn't imagine why such a common and widespread issue had not been resolved in all this time. One of my favorite findings came from someone who knew more than most and said "It's not a bug, it's a feature", and I wholeheartedly hoped they were jesting. I've come to understand that, in part, they were not, and that it is both a blessing and a curse.
What is Docker Overlay2?
Overlay2 can get complicated quickly, and a detailed explanation of overlay2 and the other Docker storage drivers is available on Docker's website. In short, it's the preferred storage driver for the container file system. The /var/lib/docker/overlay2 folder holds every file that a container reads or writes. It is a union file system: the container never modifies the original files directly, so there may be two copies of the same file. The base image (the original files) is stored in a "lowerdir", the modified files are stored in the "upperdir", and the combination of the two is presented as the "merged" folder. For example, say there is a 50 MB file called foo in the "lowerdir". When the container writes to it, the file is copied into the "upperdir" and modified there, so the disk now holds both the untouched original and the modified copy.
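To make the lowerdir/upperdir/merged idea a bit more concrete, here is a toy sketch that imitates the lookup with plain directories. It does not mount a real overlay, it ignores deletions, and the function names resolve_merged and copy_up are made up for illustration.

```shell
#!/bin/sh
# Toy model of an overlay lookup using ordinary directories (no real mount).

# resolve_merged LOWER UPPER NAME: print the path the "merged" view would show.
resolve_merged() {
    lower=$1 upper=$2 name=$3
    if [ -e "$upper/$name" ]; then
        printf '%s\n' "$upper/$name"    # a modified copy shadows the original
    elif [ -e "$lower/$name" ]; then
        printf '%s\n' "$lower/$name"    # untouched files come from the image layer
    else
        return 1                        # not present in either layer
    fi
}

# copy_up LOWER UPPER NAME: imitate the first write to a file from the image.
# The file is copied into the upper layer; the lower copy is left alone.
copy_up() {
    lower=$1 upper=$2 name=$3
    [ -e "$upper/$name" ] || cp "$lower/$name" "$upper/$name"
}

# Small demo in a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/lower" "$demo/upper"
echo "original" > "$demo/lower/foo"
resolve_merged "$demo/lower" "$demo/upper" foo   # prints the path under lower
copy_up "$demo/lower" "$demo/upper" foo
echo "changed" >> "$demo/upper/foo"
resolve_merged "$demo/lower" "$demo/upper" foo   # prints the path under upper
rm -rf "$demo"
```

After the write, foo exists in both directories, which is exactly why a busy container can take up far more room than its image alone.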
How to Investigate
If you have spent hours scouring the web for fixes, you've most likely come across many or possibly all of these attempts to fix the problem. There is no one fix-all solution here unfortunately.
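Running docker system prune might do the trick for you. This, however, only removes stopped containers, unused networks, dangling images and unused build cache; anything in use by a running container stays. You can check how much space Docker is using, and how much of it is reclaimable, with this command: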
docker system df
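If you wish to take it a step further, you can stop all of your running containers so they get cleared out as well. Adding -a also removes every image that is not used by a container, so images will be pulled again on the next deploy. This won't delete configs and important files as long as they live in volumes or bind mounts, since docker system prune leaves volumes alone unless you add --volumes; anything written only inside a container's own file system is lost with the container. Here is a quick snippet to stop all containers and prune: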
docker stop $(docker ps -a -q)
docker system prune -a
Once these have run, you can redeploy your containers/stacks or run your docker compose up commands.
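If pruning frees less than you hoped, it can help to see which directories under overlay2 are actually the big ones. The following is a minimal sketch that assumes find, du, sort and head are available; the function name largest_dirs is my own, and reading /var/lib/docker normally requires root.

```shell
#!/bin/sh
# largest_dirs ROOT [COUNT]: list the COUNT biggest directories directly under
# ROOT, largest first, as "<size in KiB><TAB><path>". COUNT defaults to 10.
largest_dirs() {
    root=$1
    count=${2:-10}
    find "$root" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + |
        sort -rn |
        head -n "$count"
}
```

From a root shell, something like largest_dirs /var/lib/docker/overlay2 5 shows the five heaviest layer directories. On setups using the overlay2 storage driver, docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' followed by a container name prints that container's upperdir, which helps match a large directory to the container that owns it.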
If you are like me, this did not solve much, if anything. Another cause may be growing log files. This is usually a slow-growing problem unless you have dozens of very active containers. To quickly find out whether logs are your issue, you can run this command to see the size of each of your Docker logs:
find /var/lib/docker/containers -type f -name "*.log" | xargs du -sh
You cannot safely delete the logs of a running container. A clever way around this is simply to truncate them. You can truncate all of your container logs to free up space with this:
truncate -s 0 /var/lib/docker/containers/*/*-json.log
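If you would rather not wipe every log, a slightly more selective variant is to empty only the logs above a certain size. This is a sketch under my own assumptions: the function name truncate_large_logs and the byte threshold are made up, and it relies on the *-json.log naming used in the command above.

```shell
#!/bin/sh
# truncate_large_logs ROOT MIN_BYTES: empty every *-json.log file under ROOT
# that is larger than MIN_BYTES, and leave smaller logs untouched.
truncate_large_logs() {
    root=$1
    min_bytes=$2
    # -size +Nc matches files strictly larger than N bytes.
    find "$root" -type f -name '*-json.log' -size +"${min_bytes}c" \
        -exec sh -c 'for f; do printf "truncating %s\n" "$f"; : > "$f"; done' sh {} +
}
```

For example, truncate_large_logs /var/lib/docker/containers 104857600 run as root would empty only the logs larger than 100 MiB.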
Yet again, this helped, but did not solve my problem. These solutions may help some of you or at least give you the means of some quick and easy cleanup if/when necessary. Next I will finally go into what solved the issue for me across all my virtual machines and servers.
The Fix
After weeks of trial and error, I found the cause of this issue in my use case, and I believe it applies to many out there facing the same problem. If you are like me, love having a nice GUI for Docker and are using Portainer, this might be the key to fixing your issues as it was for mine.
On my VMs, I always installed Docker and Portainer and launched all of my containers through stacks, which were essentially my docker-compose.yaml files.
I found that Portainer, in that setup, was my culprit all along. While I was using stacks to house and deploy everything, my disk space was eaten up quickly, and faster still with each additional service deployed. What solved all of my issues was simply not running my docker-compose code in stacks. Instead, I created an actual docker-compose.yaml file in my Docker directory and used docker compose up -d, not Portainer, to deploy the services.
It seems Portainer has some extra bits in the way that cause this issue, so running everything natively with Docker and just letting Portainer handle visualization of the containers is the best route to go. While this may be a little less convenient for some, it actually adds functionality once you get more familiar with Docker: you can set up cron jobs that not only stop your containers but also remove and relaunch them with two simple commands. In the event you ever need to do a clear-out, you can run these by hand or simply put them on a schedule, such as once a week while you are doing your other backups. The commands for taking the containers down and bringing them back up:
docker compose down
docker compose up -d
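For the scheduled version, the two commands can be wrapped in a small script that cron calls with the stack's directory. This is only a sketch: the function name recycle_stack, the script path and the stack path in the crontab comment are hypothetical, and it assumes the compose file is named docker-compose.yaml as in this article.

```shell
#!/bin/sh
# recycle_stack DIR: take down and relaunch the compose project found in DIR.
recycle_stack() {
    stack_dir=$1
    if [ ! -f "$stack_dir/docker-compose.yaml" ]; then
        echo "no docker-compose.yaml in $stack_dir" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$stack_dir" || exit 1
        docker compose down && docker compose up -d
    )
}

# When run as a script with a directory argument, recycle that stack.
if [ $# -gt 0 ]; then
    recycle_stack "$1"
fi

# Hypothetical crontab entry, every Sunday at 04:00:
# 0 4 * * 0 /usr/local/bin/recycle-stack.sh /opt/stacks/myapp
```

The up step only runs if the down step succeeded, so a failed teardown does not leave you with a half-restarted stack.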
In conclusion, while Portainer is great for visualizing running containers, viewing logs and even restarting some broken containers, do not rely on it to run and manage what Docker does best. Remove the middleman and run your Docker instance as intended, with docker-compose.yaml files. As you learn Docker and Portainer, many guides will show you launching services from Portainer stacks or even creating containers one at a time. While this does work, it can create some odd quirks that end up being more trouble than they're worth.