21. How many Workers should I use in Apache Storm?
The total number of worker slots is determined by the supervisors: each supervisor is configured with some number of JVM worker slots that it manages. What you set on the topology is how many of those slots it will try to claim.
There’s no great reason to use more than one worker per topology per machine.
With one topology running on three 8-core nodes and a parallelism hint of 24, each bolt gets 8 executors per machine, i.e. one for each core. There are real benefits to running three workers (with 8 executors each) compared to running, say, 24 workers (one executor each): most importantly, tuples passed between executors in the same worker move through in-process queues without serialization, while tuples crossing worker boundaries must be serialized and sent over the network transport.
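A minimal sketch of how this looks in topology code (MySpout and MyBolt are hypothetical components): setNumWorkers claims three worker slots, and the parallelism hint of 24 spreads the bolt's executors across them, 8 per worker on a three-node cluster.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class WorkerSizingTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new MySpout(), 3);   // one spout executor per node
        builder.setBolt("process", new MyBolt(), 24)    // parallelism hint 24: 8 executors per worker
               .shuffleGrouping("events");

        Config conf = new Config();
        conf.setNumWorkers(3);                          // claim one worker slot per machine

        StormSubmitter.submitTopology("worker-sizing-demo", conf, builder.createTopology());
    }
}
```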
22. How do you set the batch size in Apache Storm?
Trident doesn't place its own limit on batch size. For the Kafka spout, the maximum fetch size in bytes divided by the average record size determines the effective number of records per partition in each subbatch.
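There is no direct "batch size" knob; with the older storm-kafka Trident spout, for instance, you would tune fetchSizeBytes instead. A sketch, assuming the Storm 1.x storm-kafka module (the ZooKeeper address and topic name are made up):

```java
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.kafka.trident.OpaqueTridentKafkaSpout;
import org.apache.storm.kafka.trident.TridentKafkaConfig;

public class BatchSizeConfig {
    static OpaqueTridentKafkaSpout buildSpout() {
        ZkHosts hosts = new ZkHosts("zookeeper.example.com:2181");
        TridentKafkaConfig spoutConf = new TridentKafkaConfig(hosts, "events");

        // ~1 MB fetched per Kafka partition per batch; with ~1 KB average
        // records, that is an effective subbatch of roughly 1024 records
        // per partition.
        spoutConf.fetchSizeBytes = 1024 * 1024;

        return new OpaqueTridentKafkaSpout(spoutConf);
    }
}
```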
23. How does Apache Storm implement reliability in an efficient way?
A Storm topology has a set of special "acker" tasks that track the DAG of tuples descending from every spout tuple. Rather than storing the DAG explicitly, an acker keeps a single 64-bit value per spout tuple and XORs in the ID of every tuple created or acked in that tree; when the value reaches zero, the tree is complete, so tracking costs a constant amount of memory (roughly 20 bytes) per spout tuple no matter how large the tuple tree grows. When an acker sees that a tree is complete, it sends a message to the spout task that emitted the original tuple so it can ack the message. You can set the number of acker tasks for a topology in the topology configuration using Config.TOPOLOGY_ACKERS. Storm defaults TOPOLOGY_ACKERS to one task per worker.
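For example, to override the default of one acker per worker (a sketch; the count of four is arbitrary):

```java
import org.apache.storm.Config;

public class AckerConfig {
    static Config configure() {
        Config conf = new Config();
        conf.setNumAckers(4);  // same as conf.put(Config.TOPOLOGY_ACKERS, 4)
        // Setting ackers to 0 makes Storm ack every tuple as soon as it is
        // emitted, trading reliability guarantees for throughput.
        return conf;
    }
}
```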