Is your feature request related to a problem? Please describe.
Similar to the file connector, ingesting data from S3 would be fantastic.
S3 can emit notifications of new files onto SQS, Kinesis, etc., so it may be beneficial to hook in there.
Essentially, it would be great if Brooklin could be notified of new S3 files and then ingest the actual files, so we can output them onto Kafka.
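For illustration, a minimal sketch of that hook-in point, assuming the bucket publishes `s3:ObjectCreated:*` events to an SQS queue (the queue URL is a placeholder; the AWS SDK for Java v2 and Jackson are my choices here, not something the request prescribes):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class S3EventPoller {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        // Placeholder queue URL; the bucket's notification config would point here.
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/s3-events";
        try (SqsClient sqs = SqsClient.create()) {
            while (true) {
                // Long-poll for S3 event notifications delivered to SQS.
                ReceiveMessageRequest req = ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .waitTimeSeconds(20)
                        .maxNumberOfMessages(10)
                        .build();
                for (Message msg : sqs.receiveMessage(req).messages()) {
                    // Each S3 event body carries a Records array naming bucket and key.
                    for (JsonNode record : MAPPER.readTree(msg.body()).path("Records")) {
                        String bucket = record.path("s3").path("bucket").path("name").asText();
                        String key = record.path("s3").path("object").path("key").asText();
                        System.out.printf("New object: s3://%s/%s%n", bucket, key);
                        // A connector would hand (bucket, key) off for ingestion here.
                    }
                    // Delete the message once handled so it is not redelivered.
                    sqs.deleteMessage(b -> b.queueUrl(queueUrl).receiptHandle(msg.receiptHandle()));
                }
            }
        }
    }
}
```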
It may be necessary to differentiate between different types of files (see the sketch after this list):
- Plain-text, line-by-line
- Single-line JSON objects
- Pretty-printed JSON
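As a rough illustration of that differentiation, here is a sketch that classifies by object key; the `FileFormat` enum and its rules are hypothetical, not an existing Brooklin API:

```java
// Hypothetical classification of ingested files; the enum and its
// detection rules are illustrative only.
public enum FileFormat {
    PLAIN_TEXT_LINES,    // one record per line
    JSON_LINES,          // one single-line JSON object per line
    PRETTY_PRINTED_JSON; // one or more multi-line JSON objects

    public static FileFormat detect(String key) {
        String k = key.toLowerCase();
        if (k.endsWith(".jsonl") || k.endsWith(".ndjson")) {
            return JSON_LINES;
        }
        if (k.endsWith(".json")) {
            // Pretty-printed vs. single-line cannot be told apart by name alone;
            // a real connector might sniff the first few bytes instead.
            return PRETTY_PRINTED_JSON;
        }
        return PLAIN_TEXT_LINES;
    }
}
```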
Finally, using `java.util.zip.GZIPInputStream` and `java.util.zip.ZipInputStream`, files could be unarchived on the fly.
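A minimal sketch of that on-the-fly unarchiving, assuming the compression format is picked by file extension (the `wrap` helper is hypothetical):

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipInputStream;

final class Unarchiver {
    // Hypothetical helper: wrap the raw S3 stream so the payload is
    // decompressed lazily as it is consumed, never buffered whole.
    static InputStream wrap(String key, InputStream raw) throws IOException {
        if (key.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        if (key.endsWith(".zip")) {
            ZipInputStream zip = new ZipInputStream(raw);
            zip.getNextEntry(); // position at the first entry's data
            return zip;
        }
        return new BufferedInputStream(raw);
    }
}
```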
Describe the solution you'd like
Provide the system with an S3 bucket and credentials.
New S3 files will be streamed into the data sink (the AWS REST API allows actual streaming of files). Depending on the type of file, different logic is applied to unarchive/read it (see above).
I'd like the file to be split into separate Kafka messages according to that logic.
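A sketch of that streaming path for the plain-text case, with the AWS SDK for Java v2 and a plain Kafka producer standing in for the eventual connector internals (bucket, key, topic, and broker address are all placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import software.amazon.awssdk.services.s3.S3Client;

public class S3ToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (S3Client s3 = S3Client.create();
             KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             // getObject returns a streaming body, so the file is read
             // incrementally rather than downloaded up front.
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     s3.getObject(b -> b.bucket("my-bucket").key("foo.txt")),
                     StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Plain-text case: one line becomes one Kafka message.
                producer.send(new ProducerRecord<>("s3-ingest", line));
            }
        }
    }
}
```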
For example:
1. A new file `foo.tar.gz` is written to S3.
2. A notification is emitted by AWS.
3. The file is streamed into Brooklin.
4. The file is automatically unarchived using `GZIPInputStream`.
5. The archive contains `1.json` and `2.json`, which hold pretty-printed JSON objects.
6. Each JSON object from each file is sent in a separate message onto Kafka (a sketch of this flow follows the list).
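A sketch of that end-to-end flow. Note that `GZIPInputStream` only handles the gzip layer; the tar layer below uses Apache Commons Compress, which is an assumption on my part, as is Jackson's `MappingIterator` (its parser skips whitespace, so pretty-printed objects parse the same as single-line ones):

```java
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TarGzJsonSplitter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // gunzip -> untar -> parse each .json entry -> one Kafka message per JSON object.
    static void split(InputStream s3Stream, Producer<String, String> producer) throws Exception {
        try (TarArchiveInputStream tar =
                 new TarArchiveInputStream(new GZIPInputStream(s3Stream))) {
            TarArchiveEntry entry;
            while ((entry = tar.getNextTarEntry()) != null) {
                if (!entry.isFile() || !entry.getName().endsWith(".json")) {
                    continue; // skip directories and non-JSON entries
                }
                // Read a sequence of root-level JSON values from the current tar
                // entry; TarArchiveInputStream signals EOF at the entry boundary,
                // so the parser never reads into the next entry.
                MappingIterator<JsonNode> objects = MAPPER
                        .readerFor(JsonNode.class)
                        .readValues(MAPPER.getFactory().createParser(tar));
                while (objects.hasNext()) {
                    producer.send(new ProducerRecord<>("s3-ingest", objects.next().toString()));
                }
            }
        }
    }
}
```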
Describe alternatives you've considered
- A custom implementation of the above logic using an SQS client and Kafka Streams.
- Kafka Connect has an S3 connector, but the officially supported one only allows Kafka -> S3, not S3 as a source.
Additional context
This would be an extremely valuable connector when working with systems that can export their data feeds to S3.