AWS S3, the “Simple Storage Service”, is the classic AWS service. It was one of the first to launch, the first one I ever used and, seemingly, lies at the very heart of almost everything AWS does.
Given that S3 behaves a lot like a filesystem (strictly speaking it is an object store with flat keys), a logical thing to want is a count of the files in an S3 bucket. Here are three ways to get one.
Method 1: aws s3 ls
S3 is fundamentally a filesystem and you can just call ls on it. Yep – ls in the cloud. blink
    aws s3 ls s3://adl-ohi/ --recursive --summarize | grep "Total Objects:"
    Total Objects: 444803
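If you would rather pick the numbers out in code than with grep, here is a minimal sketch. It assumes the summary lines look like "Total Objects: N" and "Total Size: N" (size in bytes, i.e. without --human-readable). The parse_summary name and the sample text are mine, purely for illustration, and the size value in the sample is made up.

```python
import re

def parse_summary(text):
    """Pull the object count and byte total out of `aws s3 ls --summarize` output.

    Assumes lines of the form "Total Objects: N" and "Total Size: N".
    Returns (count, size_in_bytes); either is None if its line is missing.
    """
    objects = re.search(r"Total Objects:\s*(\d+)", text)
    size = re.search(r"Total Size:\s*(\d+)", text)
    return (
        int(objects.group(1)) if objects else None,
        int(size.group(1)) if size else None,
    )

# Sample text in the assumed format; the size here is invented for the demo.
sample = "\nTotal Objects: 444803\n   Total Size: 1024\n"
print(parse_summary(sample))
```

You could feed it the captured output of the aws s3 ls command above, for example via subprocess.run with capture_output=True.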
Method 2: aws s3api
And since S3 is a modern filesystem, it actually has an API that you can call. Yep – a json api. blink blink
    aws s3api list-objects --bucket adl-ohi --output json --query "[length(Contents)]"
    [
        448444
    ]
Method 3: A Python Example
Naturally you can just run code to do all this. I started with an example from the Stack Overflow link below that was written for boto and upgraded it to boto3. As someone who is still a Python novice, I feel pretty good about doing this successfully; I remember when Ruby went through the same AWS v2 to v3 transition, and it sucked there too. I also learned how to dynamically introspect the methods on Python objects as part of this debugging cycle.
    #!/usr/local/bin/python
    import sys
    import boto3

    s3 = boto3.resource('s3')
    # the bucket name is the first command-line argument
    s3bucket = s3.Bucket(sys.argv[1])
    size = 0
    totalCount = 0
    # objects.all() pages through the whole bucket for us
    for key in s3bucket.objects.all():
        totalCount += 1
        size += key.size
    print('total size:')
    print("%.3f GB" % (size*1.0/1024/1024/1024))
    print('total count:')
    print(totalCount)
which gives output like this:
    python3 scratch/count_s3.py adl-ohi
    total size:
    0.298 GB
    total count:
    486468
Note: I have a live upload happening on another machine, so the counts differ from one run to the next, and that’s actually fine.
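For what it's worth, the same count can also be done with the lower-level boto3 client, which is closer in spirit to the list-objects call in Method 2. This is only a sketch under a few assumptions: the function names count_from_pages and count_bucket are mine, it uses the list_objects_v2 paginator, and it expects credentials to be configured the usual way. The counting is split out from the AWS call so it can be tried on plain dictionaries without touching a bucket.

```python
def count_from_pages(pages):
    """Sum object count and total bytes over ListObjectsV2-style response pages.

    Each page is a dict; pages with no objects simply lack a "Contents" key.
    """
    total_count = 0
    total_bytes = 0
    for page in pages:
        for obj in page.get("Contents", []):
            total_count += 1
            total_bytes += obj["Size"]
    return total_count, total_bytes


def count_bucket(bucket_name, prefix=""):
    """Count the objects in a bucket, following pagination across all pages."""
    import boto3  # imported here so count_from_pages works without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    return count_from_pages(paginator.paginate(Bucket=bucket_name, Prefix=prefix))


# Hypothetical usage against the bucket from this post:
#   count, size = count_bucket("adl-ohi")
#   print("%d objects, %.3f GB" % (count, size / 1024 / 1024 / 1024))
```

Each listing request returns a limited batch of keys, so the paginator is what keeps the count honest on a bucket with hundreds of thousands of objects.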