Process logs in batches and expose source info #7
The previous implementation listed all the objects in the bucket. If you point the plugin at a very large bucket for the first time, this can take a long time, delaying the actual event import, and it can also exhaust your machine's memory. Now we list at most `@batch_size` objects and start processing them right away. This change also exposes `s3_bucket` and `s3_key`, which can be very useful, much like the `path` field exposed by the `file` plugin.
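To illustrate the idea, here is a minimal sketch of batched listing with source fields attached to each event. It is not the plugin's actual code: `S3Object`, `list_batch`, and `each_event` are hypothetical names, and the bucket is simulated with an in-memory array rather than a real S3 client. It only assumes that a listing can be resumed from a marker key and returns keys in alphabetical order.

```ruby
# Hypothetical in-memory stand-in for an S3 object (not the AWS SDK type).
S3Object = Struct.new(:key, :body)

# Return at most batch_size objects whose key sorts after marker,
# imitating a marker-based listing that returns keys alphabetically.
def list_batch(objects, marker, batch_size)
  objects.sort_by(&:key)
         .select { |o| marker.nil? || o.key > marker }
         .first(batch_size)
end

# Process the bucket one batch at a time, so only batch_size objects
# are held at once and events start flowing after the first listing.
def each_event(bucket_name, objects, batch_size)
  marker = nil
  loop do
    batch = list_batch(objects, marker, batch_size)
    break if batch.empty?

    batch.each do |object|
      object.body.each_line do |line|
        # Attach the source information to every event.
        yield({
          "message"   => line.chomp,
          "s3_bucket" => bucket_name,
          "s3_key"    => object.key
        })
      end
    end

    # Resume the next listing after the last key of this batch.
    marker = batch.last.key
  end
end
```

With a batch size of 2 and three objects, the first two objects are fully processed before the third is even listed, which is the behaviour the description above is aiming for.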
Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.
So this PR was submitted a long time ago. Since then, new Logstash versions have been released, and AFAIK the source information is already exposed.
Batching would be tremendous. I honestly don't think they maintain this plugin anymore, given how little feedback I've seen on other PRs.
    process_s3_objects(queue, objects)
    end

    return sorted_objects = objects.keys.sort {|a,b| objects[a] <=> objects[b]}
It seems like you've eliminated the sort step. In cases where alphabetical order matches chronological order by last_modified time, this will work (since the S3 API always returns results in alphabetical order). If it doesn't match, this creates a problem, because the sincedb assumes that objects are always handled in the same order.
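A small sketch of the concern raised here, using made-up keys and timestamps: when key order and last_modified order disagree, an alphabetical listing handles objects in a different order than the removed sort would have. The helper name `sort_by_last_modified` is hypothetical; it mirrors the removed `objects.keys.sort` line and adds the key as a tie-breaker, which is an assumption of this sketch and not something the original code did.

```ruby
# Hypothetical listing: key => last_modified. The alphabetically first
# key was written later than the second one.
objects = {
  "logs/a.log" => Time.utc(2015, 1, 2),
  "logs/b.log" => Time.utc(2015, 1, 1)
}

# Order keys by last_modified, falling back to the key itself so that
# objects with identical timestamps still get a stable order.
def sort_by_last_modified(objects)
  objects.keys.sort_by { |key| [objects[key], key] }
end

alphabetical  = objects.keys.sort
chronological = sort_by_last_modified(objects)

puts "alphabetical:  #{alphabetical.inspect}"
puts "chronological: #{chronological.inspect}"
```

Note that sorting within a single batch only restores chronological order inside that batch; whether that is enough for the sincedb is exactly the open question in this review.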
NB! I haven't tested how this now works with `backup_bucket`. Please let me know if you see any potential issues.

I would also add some tests, if I could figure out the best way to mock the S3 bucket input for this plugin...