Parallel processing of streams in Java 8

Streams can be sequential or parallel. Operations on a sequential stream run in a single thread; operations on a parallel stream can run in several threads. Parallel streams use the common ForkJoinPool, available through the static ForkJoinPool.commonPool() method. If the environment is not multi-core, a parallel stream brings no speedup and is effectively processed like a sequential one. In essence, the data in a parallel stream is split into parts, each part is processed by a separate worker thread (ideally on its own processor core), and the partial results are then combined into the result of the terminal operation.
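To see what the common pool looks like on a given machine, a minimal sketch such as the following can help. The class name CommonPoolInfo and its helper method are made up for illustration; the printed values depend on the hardware and JVM settings.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {
    // Target parallelism of the common pool that parallel streams use.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        // Number of processors the JVM reports as available.
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

Comparing the two numbers gives a rough idea of how many parts of a parallel stream can be processed at the same time.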

To create a parallel stream directly from a collection, use the parallelStream() method of the Collection interface.
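A small sketch of parallelStream() on a list follows. The class and method names are hypothetical, and the sample data is chosen only so that the result is easy to check by hand.

```java
import java.util.Arrays;
import java.util.List;

class ParallelStreamFromCollection {
    // Sums the squares of the numbers using a parallel stream.
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.parallelStream()
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumOfSquares(numbers)); // prints 55
    }
}
```

Because addition is associative, the result is the same regardless of how the list is split between threads.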

To parallelize an existing sequential stream, call the parallel() method on the Stream. The isParallel() method reports whether a stream is parallel.
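The following sketch checks the mode before and after calling parallel(). The class name IsParallelDemo and the sample values are assumptions made for the example.

```java
import java.util.stream.Stream;

class IsParallelDemo {
    // Returns {mode before parallel(), mode after parallel()}.
    static boolean[] modes() {
        Stream<String> stream = Stream.of("a", "b", "c");
        boolean before = stream.isParallel();        // Stream.of creates a sequential stream
        Stream<String> parallel = stream.parallel(); // switch the stream to parallel mode
        boolean after = parallel.isParallel();
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] m = modes();
        System.out.println("before: " + m[0] + ", after: " + m[1]); // before: false, after: true
    }
}
```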

The parallel() and sequential() methods switch a stream between the two modes, so you can make a parallel stream from any sequential stream and vice versa. Note, however, that the mode applies to the whole pipeline, not to individual operations: the pipeline runs sequentially or in parallel depending on the mode in effect when the terminal operation is invoked, so the last call wins:

collection
.stream()
.peek(...)
.parallel() // requests parallel mode for the whole pipeline
.map(...)
.sequential() // last call wins: the whole pipeline, including peek and map, runs sequentially
.reduce(...)
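One way to check which mode a pipeline ends up in is to call isParallel() on the final stream before the terminal operation. The sketch below mixes parallel() and sequential() in both orders; according to the java.util.stream package documentation, the mode in effect when the terminal operation is invoked applies to the whole pipeline. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LastModeWins {
    // parallel() first, sequential() last: the pipeline ends up sequential.
    static boolean parallelThenSequential(List<Integer> data) {
        Stream<Integer> pipeline = data.stream().parallel().map(n -> n * 2).sequential();
        return pipeline.isParallel();
    }

    // sequential() first, parallel() last: the pipeline ends up parallel.
    static boolean sequentialThenParallel(List<Integer> data) {
        Stream<Integer> pipeline = data.stream().sequential().map(n -> n * 2).parallel();
        return pipeline.isParallel();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        System.out.println(parallelThenSequential(data)); // prints false
        System.out.println(sequentialThenParallel(data)); // prints true
    }
}
```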

Typically, elements enter the stream in the same order in which they are defined in the data source (the encounter order). A parallel stream over an ordered source preserves this order in its result. The exception is the forEach() method, which may process elements in any order. To preserve the order, use the forEachOrdered() method instead.
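The difference between the two methods can be sketched as follows. The class OrderedForEach and its methods are illustrative names; a synchronized list is used as the target because the actions may run on different threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderedForEach {
    // forEachOrdered: elements are handled in encounter order, even in parallel.
    static List<Integer> viaForEachOrdered(List<Integer> source) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // forEach: every element is handled, but the order is not guaranteed.
    static List<Integer> viaForEach(List<Integer> source) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(viaForEachOrdered(source)); // always [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(viaForEach(source));        // same elements, order may vary between runs
    }
}
```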

Criteria that can affect performance in parallel streams:

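  • Data size. Splitting the data and then merging the partial results adds overhead, so parallel processing pays off only when there is enough data to amortize that cost; for small data sets a sequential stream is often faster.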
  • The number of processor cores. In theory, the more cores a computer has, the faster the program will run. If the machine has one core, there is no point in using parallel streams.
  • The simpler the data structure the stream operates on, the faster the operations will be. For example, an ArrayList is easy to split because it supports cheap access by index, so it can be cut into independent ranges. A LinkedList is not the best option: each element is linked to the previous and next ones, so the list has to be traversed in order to split it, and such data is difficult to parallelize.
  • Operations on primitive types are faster than operations on objects, because they avoid boxing and unboxing.
  • Using parallel streams for long-running operations (for example, network connections) is strongly discouraged. All parallel streams share one ForkJoinPool, so such operations can stall every parallel stream in the JVM when no free threads remain in the pool. Use parallel streams only for short operations that take milliseconds, not for those that can take seconds or minutes.
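  • Preserving order in parallel streams increases execution overhead. If order is not important, you can drop the ordering constraint with the intermediate unordered() operation and thereby improve the performance of order-sensitive operations such as distinct() or limit():

    collection.parallelStream()
        .unordered() // drop the ordering constraint before the order-sensitive step
        .distinct() // can run more efficiently on an unordered parallel stream
        .collect(Collectors.toList());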
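The point about primitive types in the list above can be illustrated with a sketch that computes the same sum twice: once on a primitive LongStream and once on a boxed Stream<Long>. The class and method names are hypothetical, and no timings are claimed here; actual differences should be measured on the target machine with a proper benchmark.

```java
import java.util.stream.LongStream;

class PrimitiveVsBoxed {
    // Works on primitive long values; no wrapper objects are created.
    static long sumPrimitive(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .sum();
    }

    // Boxes every value into a Long, which adds allocation and unboxing work.
    static long sumBoxed(long n) {
        return LongStream.rangeClosed(1, n)
                .boxed()
                .parallel()
                .reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(1000)); // prints 500500
        System.out.println(sumBoxed(1000));     // prints 500500
    }
}
```

Both methods return the same value; they differ only in how much work is done per element.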
    

