Some computing problems are evergreen – no matter what you do, they will always, always, always occur – evergreen just like a pine tree. Today's version of this is storage. I've been running an EC2 node with a SystemD service (no flames; I actually like SystemD, although I do regard it as a betrayal of Unix's heritage, but …) that processes some data via a Ruby application run through Docker.
I was monitoring the underlying processing queue and noticed that this box had seemingly stopped processing, leading to a slowdown in my data pipeline. When I dug into the box, I saw that it was out of disc space. This led to the first question – "#*#$&#$# where did my disc space go" – and caused me to invoke this shell incantation:
cd / && sudo du -h --max-depth=1
which gave me this:
16K	./opt
105M	./boot
789M	./snap
56K	./root
234M	./lib
40K	./tmp
763M	./home
844K	./run
0	./dev
16G	./var
1.6G	./usr
15M	./sbin
15M	./bin
du: cannot access './proc/17764/task/17764/fd/4': No such file or directory
du: cannot access './proc/17764/task/17764/fdinfo/4': No such file or directory
du: cannot access './proc/17764/fd/3': No such file or directory
du: cannot access './proc/17764/fdinfo/3': No such file or directory
0	./proc
4.0K	./mnt
4.0K	./srv
4.0K	./lib64
0	./sys
16K	./lost+found
4.0K	./media
5.5M	./etc
20G	.
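(As an aside, a variant of that incantation – not what I ran here, just a convenience sketch – stays on the root filesystem, silences the /proc noise, and sorts the totals so the biggest offender lands at the bottom:)

# -x stays on one filesystem, 2>/dev/null hides the transient /proc errors, sort -h orders human-readable sizes
sudo du -xh --max-depth=1 / 2>/dev/null | sort -h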
Looking at the above, I could see that the bulk of the data was in /var, so I changed into /var and ran it again:
cd /var && sudo du -h --max-depth=1
which gave me:
4.0K	./local
4.0K	./opt
36K	./snap
4.0K	./mail
11G	./lib
24K	./tmp
5.3G	./log
768K	./backups
82M	./cache
4.0K	./crash
28K	./spool
16G	.
Clearly I could have looked at log, but I chose to go after lib, which was twice as large:
cd lib && sudo du -h --max-depth=1
20K	./update-notifier
544K	./usbutils
16K	./amazon
0	./lxcfs
8.0K	./sudo
12K	./grub
11G	./docker
32K	./polkit-1
33M	./dpkg
28K	./pam
4.0K	./unattended-upgrades
4.0K	./misc
4.0K	./dhcp
4.0K	./git
4.0K	./os-prober
4.0K	./python
8.0K	./logrotate
12K	./AccountsService
338M	./snapd
432K	./systemd
4.0K	./lxd
129M	./apt
4.0K	./landscape
12K	./private
184K	./containerd
120K	./ucf
4.0K	./ubuntu-release-upgrader
11M	./mlocate
4.0K	./command-not-found
680K	./cloud
8.0K	./vim
4.0K	./plymouth
12K	./update-manager
4.0K	./man-db
4.0K	./dbus
8.0K	./ureadahead
16K	./initramfs-tools
11G	.
And that told me that the culprit was Docker! A vague memory of having had this issue earlier in my life led me to these docker commands:
docker system df
root@ip-172-31-15-140:/var/lib/docker/overlay2/fbacd11ec88524762f258c92223fa8499fb4514c4a2d494e4cf5078d924be626# docker system df
TYPE                TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images              35        9         9.139GB   8.929GB (97%)
Containers          37        1         2.287GB   2.287GB (100%)
Local Volumes       0         0         0B        0B
Build Cache         0         0         0B        0B
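If you want to see exactly what that "reclaimable" space consists of before deleting anything, these are worth running first (not part of my original session, just a sketch):

# dangling images are layers no longer referenced by any tag
docker images --filter "dangling=true"
# exited containers still hold on to their writable layers
docker ps -a --filter "status=exited"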
Then came the logical successor to docker system df:
docker system prune -f
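To make a repeat less likely, one option – a sketch, assuming the default json-file log driver and no existing /etc/docker/daemon.json to merge with – is to cap per-container log size and restart the daemon:

# write the daemon config (merge by hand if one already exists); 3 files of 10 MB max per container
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF
# note: restarting dockerd will bounce running containers unless live-restore is enabled
sudo systemctl restart docker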
After that I had disc space again, and I could start my SystemD service with:
systemctl status
systemctl status service.service
systemctl start service.service
Yes, my service is unimaginatively named "service.service".
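And to confirm the queue actually started draining again, tailing the unit's journal is a reasonable follow-up (not shown in my session above):

journalctl -u service.service -f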
In Case Rails is Part of Your System
I ran out of disc space today (July 27, 2020) on a production system running Rails. I started to follow the process above and then I thought – "Wait – Rails log". So I:
- Changed to the right user for the Rails process.
- Changed into the right directory for the Rails app (i.e. RAILS_ROOT/current).
- Ran du -h log.
- Ran bundle exec rake log:clear when I saw that the logs totaled 90 gigs.
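If you would rather not rediscover this at 90 gigs, a crude preventative sketch – the schedule, user, and path here are hypothetical, adjust for your deploy layout – is to clear the logs periodically from the Rails user's crontab:

# crontab entry for the Rails user: truncate the app's logs every Sunday at 03:00
0 3 * * 0 cd /srv/myapp/current && bundle exec rake log:clear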