Previously on "Amazon S3 Bucket retrieval"


  • BigDataPro
    replied
    Life will be much easier if you can make a simple design change.

    1. Consider writing new objects into a separate staging location (e.g. a temp-bucket).
    2. When the file copy into temp-bucket is complete, write a dummy file (e.g. 'success') to indicate a successful copy.
    3. Use S3 events, which fire when the 'success' file is created (S3 event notifications can watch for a specific file, key pattern, etc.).
    4. Write a Lambda that is triggered by the event from step 3.
    5. The Lambda function does whatever is required and finally deletes the contents of temp-bucket.

    This is useful for batch loads. If you are receiving a continuous stream of data, then you need a different approach.
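
    As a rough illustration of steps 3 to 5, here's a minimal sketch of such a Lambda handler, assuming Java with the AWS SDK v1 and the Lambda events library. The 'success' marker key comes from the steps above; everything else (class name, processing) is illustrative:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.S3Event;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ListObjectsV2Request;
    import com.amazonaws.services.s3.model.ListObjectsV2Result;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    // Triggered by an S3 event notification filtered on creation of the
    // 'success' marker key (steps 3 and 4 above).
    public class TempBucketProcessor implements RequestHandler<S3Event, String> {
        private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        @Override
        public String handleRequest(S3Event event, Context context) {
            // The event record tells us which bucket the marker landed in.
            String bucket = event.getRecords().get(0).getS3().getBucket().getName();
            ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucket);
            ListObjectsV2Result result;
            do {
                result = s3.listObjectsV2(req);
                for (S3ObjectSummary obj : result.getObjectSummaries()) {
                    if (!obj.getKey().equals("success")) {
                        process(bucket, obj.getKey());     // application-specific work
                    }
                    s3.deleteObject(bucket, obj.getKey()); // step 5: empty the temp-bucket
                }
                req.setContinuationToken(result.getNextContinuationToken());
            } while (result.isTruncated());
            return "done";
        }

        private void process(String bucket, String key) {
            // ... whatever processing is required ...
        }
    }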

    Hope this helps
    Last edited by BigDataPro; 21 December 2020, 13:33.



  • RasputinDude
    replied
    Depending on whether or not you have access to the AWS account, would setting up an event that triggers a Lambda or an SQS queue be an option for you?

    Configuring Amazon S3 event notifications - Amazon Simple Storage Service
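
    For illustration, a minimal sketch of wiring up such a notification programmatically, assuming Java with the AWS SDK v1; the bucket name and queue ARN are hypothetical, and the same thing can be done in the console per the link above:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.BucketNotificationConfiguration;
    import com.amazonaws.services.s3.model.QueueConfiguration;
    import com.amazonaws.services.s3.model.S3Event;
    import java.util.EnumSet;

    public class NotificationSetup {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            // Send a message to the queue whenever an object is created.
            QueueConfiguration queueConfig = new QueueConfiguration(
                    "arn:aws:sqs:eu-west-1:123456789012:new-objects", // hypothetical queue ARN
                    EnumSet.of(S3Event.ObjectCreated));
            s3.setBucketNotificationConfiguration("my-bucket",        // hypothetical bucket
                    new BucketNotificationConfiguration()
                            .addConfiguration("NewObjectEvents", queueConfig));
        }
    }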



  • eek
    replied
    Originally posted by tazdevil
    Yes, it doesn't, so I'm having to pull the full object list back each time, but I have optimised identification of what's already been processed so that I only retrieve the data for new objects. Not ideal, but it works. I just hope I find a more efficient approach before the volumes get too big: over 100K files in there already for half a year's worth of data, so it'll grow over time. Unfortunately, I've no option to get at the data earlier in the process, before it's put into S3.
    Do they need to be stored where they are after you've pulled them, or could you move them to, say, a processed bucket?
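
    If that's an option, the "move" itself is just a copy plus a delete; a minimal sketch, assuming Java with the AWS SDK v1 (bucket and key names are illustrative):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class MoveProcessed {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            String key = "2020/12/21/event-00001.json"; // illustrative key
            // S3 has no atomic move; copy to the processed bucket, then delete.
            s3.copyObject("landing-bucket", key, "processed-bucket", key);
            s3.deleteObject("landing-bucket", key);
        }
    }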



  • tazdevil
    replied
    Yes, it doesn't, so I'm having to pull the full object list back each time, but I have optimised identification of what's already been processed so that I only retrieve the data for new objects. Not ideal, but it works. I just hope I find a more efficient approach before the volumes get too big: over 100K files in there already for half a year's worth of data, so it'll grow over time. Unfortunately, I've no option to get at the data earlier in the process, before it's put into S3.
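
    A minimal sketch of that workaround, assuming Java with the AWS SDK v1: list the whole bucket each poll, skip keys already recorded in a persisted set, and fetch data only for genuinely new keys. The checkpoint file and bucket name are illustrative:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ListObjectsV2Request;
    import com.amazonaws.services.s3.model.ListObjectsV2Result;
    import com.amazonaws.services.s3.model.S3Object;
    import com.amazonaws.services.s3.model.S3ObjectSummary;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashSet;
    import java.util.Set;

    public class DiffPoll {
        public static void main(String[] args) throws IOException {
            Path checkpoint = Paths.get("processed-keys.txt"); // illustrative key store
            Set<String> processed = new HashSet<>();
            if (Files.exists(checkpoint)) {
                processed.addAll(Files.readAllLines(checkpoint));
            }

            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            ListObjectsV2Request req = new ListObjectsV2Request().withBucketName("my-bucket");
            ListObjectsV2Result result;
            do {
                result = s3.listObjectsV2(req);
                for (S3ObjectSummary obj : result.getObjectSummaries()) {
                    if (processed.add(obj.getKey())) {
                        // Only keys not seen on a previous poll reach here.
                        try (S3Object data = s3.getObject("my-bucket", obj.getKey())) {
                            // ... process data.getObjectContent() ...
                        }
                    }
                }
                req.setContinuationToken(result.getNextContinuationToken());
            } while (result.isTruncated());

            Files.write(checkpoint, processed); // persist for the next poll
        }
    }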



  • eek
    replied
    That won’t work; see amazon s3 - C# AWS S3 - List objects created before or after a certain time - Stack Overflow



  • tazdevil
    started a topic Amazon S3 Bucket retrieval

    Amazon S3 Bucket retrieval

    Anyone here got experience with Amazon S3 buckets?

    I'm writing an integration workflow that will pull objects out of S3 on a polled basis. The buckets are being filled independently, and I just need to get all the new objects I haven't yet retrieved. It looks like the argument to use is withStartAfter from ListObjectsV2Request. Not an issue, but am I guaranteed that S3 retrieves keys in creation order? If not, I can't see the point of withStartAfter, and I'll probably just have to retrieve all keys every time, which is inefficient! I'm concerned because in a test, retrieval was clearly not in creation order but by folder, then by name.
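
    For reference, a minimal sketch of the poll in question, assuming the AWS SDK for Java v1 (bucket name and checkpoint key are illustrative). As the replies above note, ListObjectsV2 returns keys in lexicographic (UTF-8 binary) order, not creation order, so withStartAfter only tracks new objects if the key names themselves sort chronologically (e.g. a date/time prefix):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ListObjectsV2Request;
    import com.amazonaws.services.s3.model.ListObjectsV2Result;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    public class PollNewObjects {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            String lastSeenKey = "2020/12/20/event-99999.json"; // illustrative checkpoint

            ListObjectsV2Request req = new ListObjectsV2Request()
                    .withBucketName("my-bucket")  // hypothetical bucket
                    .withStartAfter(lastSeenKey); // skips keys that sort at or before this one
            ListObjectsV2Result result;
            do {
                result = s3.listObjectsV2(req);
                for (S3ObjectSummary obj : result.getObjectSummaries()) {
                    System.out.println("new object: " + obj.getKey());
                }
                req.setContinuationToken(result.getNextContinuationToken());
            } while (result.isTruncated());
        }
    }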
