Skip to content

ujjawal-khare-27/ray-kafka-producer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kafka Producer for Efficient Data Streaming to Kafka

This Python Kafka producer facilitates high-performance data streaming from Ray DataFrame and Pandas DataFrame to Kafka. It is optimized to provide approximately 3-4x performance improvement compared to standard Kafka producers.

Installation

Install the package using pip:

pip3 install ray_kafka_producer@git+https://github.com/ujjawal-khare-27/ray-kafka-producer@main --force-reinstall

Usage

  1. Import the package
from ray_kafka_producer.producer_manager import KafkaProducerManager
  1. Create an instance of KafkaProducerManager
# actor_pool_size is the number of actors that will be created to send data to Kafka
# num_cpu is the number of CPUs that will be allocated to each actor
kafka_producer_manager = KafkaProducerManager(bootstrap_servers="localhost:9092", topic="test", actor_pool_size=12,
            num_cpu=0.25)
  1. Send messages to Kafka (Ray DataFrame)
kafka_producer_manager.flush_ray_df(df = ray_df, is_actor=True)
  1. Send messages to Kafka (Pandas DataFrame)
kafka_producer_manager.flush_pandas_df(df = pandas_df)

Releases

No releases published

Packages

 
 
 

Contributors