
Amazon S3 Bucket retrieval

    #1

    Anyone here got experience with Amazon S3 buckets?

    I'm writing an integration workflow that will pull objects out of S3 on a polled basis. The buckets are being filled independently, and I just need to get all new objects I haven't yet retrieved. It looks like the argument to use is withStartAfter from ListObjectsV2Request. That's not a problem in itself, but am I guaranteed that S3 returns keys in creation order? If not, I can't see the point of withStartAfter, and I'll probably just have to retrieve all keys every time, which is inefficient! I'm concerned because in a test the retrieval was clearly not in creation order, but by folder and then by name.
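    For what it's worth, ListObjectsV2 returns keys in ascending UTF-8 binary (i.e. lexicographic) order of key name, not creation order, so withStartAfter only helps if the key names themselves sort by creation time (e.g. a timestamp prefix). A minimal sketch of the paginated listing loop with the AWS SDK for Java 1.x; the bucket name and start-after key are made up:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ListObjectsV2Request;
    import com.amazonaws.services.s3.model.ListObjectsV2Result;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    public class ListAfter {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            // Bucket name and start-after key are hypothetical.
            ListObjectsV2Request req = new ListObjectsV2Request()
                    .withBucketName("my-bucket")
                    .withStartAfter("2020/12/21/last-key-seen");
            ListObjectsV2Result result;
            do {
                result = s3.listObjectsV2(req);
                for (S3ObjectSummary s : result.getObjectSummaries()) {
                    // Keys arrive in ascending key-name order, not creation order.
                    System.out.println(s.getKey() + "  " + s.getLastModified());
                }
                // Responses are paged at up to 1,000 keys; follow the token.
                req.setContinuationToken(result.getNextContinuationToken());
            } while (result.isTruncated());
        }
    }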

    #2
    That won’t work; see "C# AWS S3 - List objects created before or after a certain time" on Stack Overflow.
    merely at clientco for the entertainment



      #3
      Yes, it doesn't, so I'm having to pull the full object list back each time, but I've optimised identification of what's already been processed so I only retrieve the data for new objects. Not ideal, but it works. I just hope I find a more efficient approach before the volumes get too big: there are over 100K files in there already for half a year's worth of data, so it'll only grow over time. Unfortunately I've no option to get at the data earlier in the process, before it's put into S3.
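      A minimal sketch of that shape with the AWS SDK for Java 1.x; the bucket name is made up, and the processed-key set is in-memory here but would need persisting between polls:

      import com.amazonaws.services.s3.AmazonS3;
      import com.amazonaws.services.s3.AmazonS3ClientBuilder;
      import com.amazonaws.services.s3.model.ListObjectsV2Request;
      import com.amazonaws.services.s3.model.ListObjectsV2Result;
      import com.amazonaws.services.s3.model.S3Object;
      import com.amazonaws.services.s3.model.S3ObjectSummary;
      import java.util.HashSet;
      import java.util.Set;

      public class PollNewObjects {
          public static void main(String[] args) throws Exception {
              AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
              String bucket = "my-bucket";              // hypothetical
              Set<String> processed = new HashSet<>();  // persist between polls in real use

              ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucket);
              ListObjectsV2Result result;
              do {
                  result = s3.listObjectsV2(req);
                  for (S3ObjectSummary s : result.getObjectSummaries()) {
                      if (processed.add(s.getKey())) {  // add() is false for keys already seen
                          try (S3Object obj = s3.getObject(bucket, s.getKey())) {
                              // ... process obj.getObjectContent() here ...
                          }
                      }
                  }
                  req.setContinuationToken(result.getNextContinuationToken());
              } while (result.isTruncated());
          }
      }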



        #4
        Originally posted by tazdevil
        Yes, it doesn't, so I'm having to pull the full object list back each time, but I've optimised identification of what's already been processed so I only retrieve the data for new objects. Not ideal, but it works. I just hope I find a more efficient approach before the volumes get too big: there are over 100K files in there already for half a year's worth of data, so it'll only grow over time. Unfortunately I've no option to get at the data earlier in the process, before it's put into S3.
        Do they need to be stored where they are after you've pulled them, or could you move them to, say, a 'processed' bucket?
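        S3 has no native move, so 'moving' would be a copy followed by a delete. A sketch, with made-up bucket names:

        import com.amazonaws.services.s3.AmazonS3;

        class MoveToProcessed {
            // Shift a processed object out of the polled bucket so the next
            // listing only returns unprocessed keys. S3 has no atomic rename,
            // hence copy to the destination, then delete the source.
            static void move(AmazonS3 s3, String key) {
                s3.copyObject("incoming-bucket", key, "processed-bucket", key);
                s3.deleteObject("incoming-bucket", key);
            }
        }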
        merely at clientco for the entertainment



          #5
          If you have access to the AWS account, would setting up an S3 event notification that triggers a Lambda or pushes to an SQS queue be an option for you?

          Configuring Amazon S3 event notifications - Amazon Simple Storage Service
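          If that's an option, the polling could be replaced by a handler along these lines; a minimal sketch using the aws-lambda-java-events library, with a made-up class name:

          import com.amazonaws.services.lambda.runtime.Context;
          import com.amazonaws.services.lambda.runtime.RequestHandler;
          import com.amazonaws.services.lambda.runtime.events.S3Event;
          import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;

          // Invoked by an s3:ObjectCreated:* notification on the bucket.
          public class NewObjectHandler implements RequestHandler<S3Event, Void> {
              @Override
              public Void handleRequest(S3Event event, Context context) {
                  for (S3EventNotificationRecord record : event.getRecords()) {
                      String bucket = record.getS3().getBucket().getName();
                      String key = record.getS3().getObject().getUrlDecodedKey();
                      context.getLogger().log("New object: s3://" + bucket + "/" + key);
                      // ... fetch and process the object here ...
                  }
                  return null;
              }
          }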



            #6
            Life will be much easier if you can make a simple design change:

            1. Write new objects into a separate staging location (e.g. temp-bucket).
            2. When the copy into temp-bucket is complete, write a dummy marker file (e.g. 'success') to indicate a successful copy.
            3. Set up an S3 event that fires on creation of the 'success' file (event notifications can be filtered by key name prefix and suffix).
            4. Write a Lambda that is triggered by the event from step 3 (sketched below).
            5. The Lambda function does whatever processing is required and finally deletes the contents of temp-bucket.

            This is useful for batch loads. If you are receiving a continuous stream of data, you'll need a different approach.

            Hope this helps.
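            A sketch of steps 4 and 5, assuming the event notification is filtered to the 'success' key; the class name is made up and error handling is omitted:

            import com.amazonaws.services.lambda.runtime.Context;
            import com.amazonaws.services.lambda.runtime.RequestHandler;
            import com.amazonaws.services.lambda.runtime.events.S3Event;
            import com.amazonaws.services.s3.AmazonS3;
            import com.amazonaws.services.s3.AmazonS3ClientBuilder;
            import com.amazonaws.services.s3.model.ListObjectsV2Request;
            import com.amazonaws.services.s3.model.ListObjectsV2Result;
            import com.amazonaws.services.s3.model.S3ObjectSummary;

            public class BatchLoadHandler implements RequestHandler<S3Event, Void> {
                private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

                @Override
                public Void handleRequest(S3Event event, Context context) {
                    String bucket = "temp-bucket";  // staging location from step 1
                    ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucket);
                    ListObjectsV2Result result;
                    do {
                        result = s3.listObjectsV2(req);
                        for (S3ObjectSummary s : result.getObjectSummaries()) {
                            if (!s.getKey().equals("success")) {
                                // ... step 5: process the object here ...
                            }
                            s3.deleteObject(bucket, s.getKey());  // then empty temp-bucket
                        }
                        req.setContinuationToken(result.getNextContinuationToken());
                    } while (result.isTruncated());
                    return null;
                }
            }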
            Last edited by BigDataPro; 21 December 2020, 13:33.
