Mobius/notes/configuration-mobius.md

3.4 KiB

Mobius Configuration

Type Property Name Usage
Worker spark.mobius.CSharpWorker.maxProcessCount Sets max number of C# worker processes in Spark executors
Streaming (Kafka) spark.mobius.streaming.kafka.CSharpReader.enabled Enables use of C# Kafka reader in Mobius streaming applications
Streaming (Kafka) spark.mobius.streaming.kafka.maxMessagesPerTask.<topicName> Sets the max number of messages per RDD partition created from specified Kafka topic to uniformly spread load across tasks that process them
Streaming (Kafka) spark.mobius.streaming.kafka.numPartitions.<topicName>[.<clusterId>] Sets RDD partitions to a different number from kafka parations per topic and per cluster(optional, defined as "cluster.id" in kafkaParams if the topic is from multiple kafka clusters) to uniformly and better spread load across tasks that process them
Streaming (Kafka) spark.mobius.streaming.kafka.fetchRate Set the number of Kafka metadata fetch operation per batch
Streaming (Kafka) spark.mobius.streaming.kafka.numReceivers Set the number of threads used to materialize the RDD created by applying the user read function to the original KafkaRDD.
Streaming (UpdateStateByKey) spark.mobius.streaming.parallelJobs Sets 0-based max number of parallel jobs for UpdateStateByKey so that next N batches can start its tasks on time even if previous batch not completed yet. default: 0, recommended: 1. It's a special version of spark.streaming.concurrentJobs which does not observe UpdateStateByKey's state ordering properly
Worker spark.mobius.CSharp.socketType Sets the socket type that will be used in IPC when transferring data between JVM and CLR. Valid values for this setting are:
  • Normal: default .Net Socket implementation will be used. This is the default socket type in Mobius.
  • Rio: Windows RIO socket will be used. This option can be used only in Windows OS. To use this socket type, "Prefer 32-bit" option should be set to false when building Mobius driver program.
  • Saea: .Net Socket implementation with SocketAsyncEventArgs class will be used
Riosocket and SaeaSocket has better performance when dealing with larger data transmission than traditional .Net Socket. Significant performance improvement has been observed by using RIO/SAEA socket types when the average size of each row in the data processed in Mobius is over 4KB. You can profile your application for different socket types and decide which one offers best performance for your data. Depending on the OS, either Rio (Windows-only) or Saea (Windows/Linux) socket types can be used for data with larger row sizes
Worker spark.mobius.CSharpWorker.readBufferSize Sets the buffer size in bytes for data read operation from JVM to CSharpWorker. By default the value is 8KB if not explicitly specified. A typical scenario which can benefits a lot from this option is that CSharpWorker reads large amount of small records from JVM process. Please adjust the number based on your scenario.
Worker spark.mobius.CSharpWorker.writeBufferSize Sets the buffer size in bytes for data write operation from CSharpWorker to JVM. The default value is 8KB. Usually better performance can be gained if specify this option with a proper value when CSharpWorker needs to sends lots of small records (multiple bytes size) back to JVM process. Please adjust the buffer size based on your scenario.