The Stream API

Map/Filter/Reduce algorithm implementation in JDK

Steps of the Map/Filter/Reduce algorithm:

Map

Changes the type of the data, does not change the number of elements. The mapping step respects the order of your objects

Filter

Does not change the type of the data, changes the number of elements. You may end up with no data

Reduce

Produces a result

If we want to create an efficient implementation of the Map/Filter/Reduce algorithm, it should not duplicate any data. It should work in the same way, performance wise, as the iterative pattern. Map/Filter/Reduce algorithm implemented by the Collection API would cause duplication of the data (storing the data in an intermediate structure)

The Stream API

Implementation of the Map/Filter/Reduce algorithm that does not duplicate any data and that does not create any load on the CPU or memory

Stream by definition is an empty object - it does not carry any kind of data
Every time you call a method on a Stream that returns another Stream, it is going to be a new Stream object

Difference between the Stream API and iterative approach is that in the Stream API we describe the computation and not how this computation should be conducted. This is none of my business, this is the business of the API

Reduce method triggers the computation of the elements, and those elements are going to be taken one-by-one, first mapped, then filtered and computed if they pass the filtering step. Using Streams is about creating pipelines of operations.

Intermediate operation (Map/Filter)

Operation that creates a Stream (it does not do anything, it does not process any data)

Terminal operation (Reduce)

Operation that produces a result (it will trigger the computation of the elements)

If you have a pattern using a Stream that does not end up with a Terminal operation, your pattern is not going to process any data. It will be useless code
You are not allowed to process the same Stream twice. This is why it is completely useless to create intermittent variables to store the Streams. You should inline it

The Stream API

The Stream API gives you 4 interfaces:

  • Stream<T> - a sequence of elements supporting sequential and parallel aggregate operations

    Modifier and Type Method Description

    <R> Stream<R>

    map(Function<? super T,? extends R> mapper)

    Returns a stream consisting of the results of applying the given function to the elements of this stream.

    IntStream

    mapToInt(ToIntFunction<? super T> mapper)

    Returns an ⚪ IntStream consisting of the results of applying the given function to the elements of this stream.

    <R> Stream<R>

    flatMap(Function<? super T,? extends Stream<? extends R>> mapper)

    Returns a stream consisting of the results of replacing each element of this stream with the contents of a mapped stream produced by applying the provided mapping function to each element

    long count = cities.stream() (1)
        .flatMap(city -> city.getPeople().stream()) (2)
        .count(); (3)
    1 3 cities
    2 With 3 people each
    3 Returns 9

    Stream<T>

    filter(Predicate<? super T> predicate)

    Returns a stream consisting of the elements of this stream that match the given predicate.

    Optional<T>

    reduce(BinaryOperator<T> accumulator)

    Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an 🔴 Optional<T> describing the reduced value, if any. See Reducing Data to compute Statistics.

    T

    reduce(T identity, BinaryOperator<T> accumulator)

    Performs a reduction on the elements of this stream, using the provided identity value and an associative accumulation function, and returns the reduced value. See Reducing Data to compute Statistics.

    <R, A> R

    collect(Collector<? super T,A,R> collector)

    Performs a mutable reduction operation on the elements of this stream using a ⚪ Collector<T,A,R>. See The Collectors API and Collecting Data from Streams to create Lists/Sets/Maps.

    Stream<T>

    distinct()

    Returns a stream consisting of the distinct elements (according to 🟢 Object#equals(Object obj)) of this stream.

    Stream<T>

    sorted()

    Returns a stream consisting of the elements of this stream, sorted according to natural order.

    long

    count()

    Returns the count of elements in this stream.

    Optional<T>

    min(Comparator<? super T> comparator)

    Returns the minimum element of this stream according to the provided ⚪ Comparator<T>.

    Optional<T>

    max(Comparator<? super T> comparator)

    Returns the maximum element of this stream according to the provided ⚪ Comparator<T>.

    Object[]

    toArray()

    Returns an array containing the elements of this stream.

    <A> A[]

    toArray(IntFunction<A[]> generator)

    Returns an array containing the elements of this stream, using the provided generator function to allocate the returned array, as well as any additional arrays that might be required for a partitioned execution or for resizing. For example toArray(String[]::new) will return String[].

  • IntStream - int primitive specialization of ⚪ Stream<T>

    Modifier and Type Method Description

    OptionalInt

    min()

    Returns an 🔴 OptionalInt describing the minimum element of this stream, or an empty optional if this stream is empty.

    OptionalInt

    max()

    Returns an 🔴 OptionalInt describing the maximum element of this stream, or an empty optional if this stream is empty.

    OptionalDouble

    average()

    Returns an 🔴 OptionalDouble describing the arithmetic mean of elements of this stream, or an empty optional if this stream is empty.

    int

    sum()

    Returns the sum of elements in this stream.

    IntSummaryStatistics

    summaryStatistics()

    Returns an 🟢 IntSummaryStatistics describing various summary data about the elements of this stream.

  • LongStream - long primitive specialization of ⚪ Stream<T>. It has similar methods to ⚪ IntStream

  • DoubleStream - double primitive specialization of ⚪ Stream<T>. It has similar methods to ⚪ IntStream

The Collectors API

🔴 Collectors - implementations of ⚪ Collector<T,A,R> that implement various useful reduction operations, such as accumulating elements into collections, summarizing elements according to various criteria, etc.

Type parameters of ⚪ Collector<T,A,R>:

  • T - the type of input elements to the reduction operation

  • A - the mutable accumulation type of the reduction operation (often hidden as an implementation detail)

  • R - the result type of the reduction operation

Modifier and Type Method Description

static <T> Collector<T,?,List<T>>

toList()

Returns a ⚪ Collector<T,A,R> that accumulates the input elements into a new ⚪ List<T>, in encounter order.

static <T> Collector<T,?,List<T>>

toUnmodifiableList()

Returns a ⚪ Collector<T,A,R> that accumulates the input elements into an unmodifiable ⚪ List<T>, in encounter order.

static <T> Collector<T,?,Set<T>>

toSet()

Returns a ⚪ Collector<T,A,R> that accumulates the input elements into a new ⚪ Set<T>.

static <T> Collector<T,?,Set<T>>

toUnmodifiableSet()

Returns a ⚪ Collector<T,A,R> that accumulates the input elements into an unmodifiable ⚪ Set<T>.

static <T, C extends Collection<T>> Collector<T,?,C>

toCollection(Supplier<C> collectionFactory)

Returns a ⚪ Collector<T,A,R> that accumulates the input elements into a new ⚪ Collection<T>, in encounter order. For example toCollection(MyCollection::new) will return 🟢 MyCollection.

static Collector<CharSequence,?,String>

joining()

Returns a ⚪ Collector<T,A,R> that concatenates the input elements into a 🔴 String, in encounter order. The string won’t be delimited.

static Collector<CharSequence,?,String>

joining(CharSequence delimiter)

Returns a ⚪ Collector<T,A,R> that concatenates the input elements, separated by the specified delimiter, in encounter order.

static Collector<CharSequence,?,String>

joining(CharSequence delimiter, CharSequence prefix, CharSequence suffix)

Returns a ⚪ Collector<T,A,R> that concatenates the input elements, separated by the specified delimiter, with the specified prefix and suffix, in encounter order.

static <T, K> Collector<T,?,Map<K,List<T>>>

groupingBy(Function<? super T,? extends K> classifier)

Returns a ⚪ Collector<T,A,R> implementing a "group by" operation on input elements of type T, grouping elements according to a classification function, and returning the results in a ⚪ Map.

static <T, K, A, D> Collector<T,?,Map<K,D>>

groupingBy(Function<? super T,? extends K> classifier, Collector<? super T,A,D> downstream)

Returns a ⚪ Collector<T,A,R> implementing a cascaded "group by" operation on input elements of type T, grouping elements according to a classification function, and then performing a reduction operation on the values associated with a given key using the specified downstream ⚪ Collector<T,A,R>.

static <T> Collector<T,?,Long>

counting()

Returns a ⚪ Collector<T,A,R> accepting elements of type T that counts the number of input elements.

static <T> Collector<T,?,Integer>

summingInt(ToIntFunction<? super T> mapper)

Returns a ⚪ Collector<T,A,R> that produces the sum of an integer-valued function applied to the input elements. For example summingInt(City::getPopulation) will return population of all cities.

Building a Stream from Data in Memory

  • Creating a Stream from Arrays:

    • 🔴 Arrays#stream(T[] array) - returns a sequential Stream<T> with the specified array as its source

    • Stream<T>#of(T…​ values) - returns a sequential ordered stream whose elements are the specified values

      Stream<String> streamOfStrings = Stream.<String>of("abcd", "efgh");
  • Creating a Stream from a Text File:

    • 🔴 Files#lines(Path path) - read all lines from a file as a Stream<String>

      Path path = Path.of("src/main/resources/first-names.txt"); (1)
      try (Stream<String> lines = Files.lines(path)) {
          long count = lines.count(); (2)
      } catch (IOException e) {
          e.printStackTrace();
      }
      1 File with 200 lines
      2 Returns 200
  • Creating a Stream from a RegEx:

    • 🔴 Pattern#splitAsStream(CharSequence input) - creates a Stream<String> from the given input sequence around matches of this pattern.

      String sentence = "the quick brown fox jumps over the lazy dog";
      
      (1)
      String[] words = sentence.split(" ");
      long count1 = Arrays.stream(words).count(); (3)
      
      (2)
      long count2 = Pattern.compile(" ").splitAsStream(sentence).count(); (3)
      1 Bad Practice: we create an array and store it in memory
      2 Best Practice: we do not create an array when we create a ⚪ Stream<T> from 🔴 Pattern
      3 Returns 9
  • Creating an ⚪ IntStream (Stream of ASCII Codes) from a String:

    • 🔴 String#chars() - Returns an ⚪ IntStream of int zero-extending the char values from this sequence.

      String sentence = "the quick brown fox jumps over the lazy dog";
      sentence.chars()
          .mapToObj(codePoint -> Character.toString(codePoint)) (1)
          .filter(letter -> !letter.equals(" "))
          .distinct().sorted().forEach(System.out::print);
      1 Converts ⚪ IntStream to ⚪ Stream<String>
  • Selecting elements of a Stream:

    IntStream.range(0, 30)
        .skip(10) (1)
        .limit(10) (2)
        .forEach(index -> System.out.print(index + " ")); (3)
    1 Skip first 10 elements
    2 Take next 10 elements after skip
    3 Prints "10 11 12 13 14 15 16 17 18 19"
  • Closing a ⚪ Stream<T> with a ⚪ Predicate<? super T>:

    • takeWhile - consume elements of the ⚪ Stream<T> until ⚪ Predicate<? super T> is true

      Stream.<Class<?>>iterate(ArrayList.class, c -> c.getSuperclass())
          .takeWhile(c -> c != null)
          .forEach(System.out::println);
      // Prints:
      class java.util.ArrayList
      class java.util.AbstractList
      class java.util.AbstractCollection
      class java.lang.Object
    • dropWhile - consume remaining elements of the ⚪ Stream<T> after ⚪ Predicate<? super T> becomes false

      Stream.<Class<?>>iterate(ArrayList.class, c -> c.getSuperclass())
          .dropWhile(c -> !c.equals(AbstractCollection.class))
          .forEach(System.out::println);
      // Prints
      class java.util.AbstractCollection
      class java.lang.Object
      null
      Exception in thread "main" java.lang.NullPointerException: Cannot invoke "java.lang.Class.getSuperclass()" because "c" is null

Converting a For Loop to a Stream

  • The following:

    int sum = 0;
    int count = 0;
    for (Person person : people) {
        if (person.getAge() > 20) {
            count++;
            sum += person.getAge();
        }
    }
    double average = 0d;
    if (count > 0) {
        average = sum / count;
    }

    can be converted into:

    double average = people.stream()
        .mapToInt(Person::getAge)
        .filter(age -> age > 20)
        .average().orElseThrow();
  • The Stream API always computes one thing. Never sacrifice the readability of your code to the performance. The performance here is measured in nanoseconds:

    double totalAmount = 0;
    int frequentRenterPoints = 0;
    String statement = composeHeader();
    for (Rental rental : rentals) {
        totalAmount += computeRentalAmount(rental);
        frequentRenterPoints += getFrequentRenterPoints(rental);
        statement += computeStatementLine(rental);
    }
    statement += composeFooter(totalAmount, frequentRenterPoints);

    can be converted into:

    double totalAmount = rentals.stream()
        .mapToDouble(this::computeRentalAmount)
        .sum();
    int frequentRenterPoints = rentals.stream()
        .mapToInt(this::getFrequentRenterPoints)
        .sum();
    String statement = composeHeader();
    statement += rentals.stream()
        .map(this::computeStatementLine)
        .collect(Collectors.joining());
    statement += composeFooter(totalAmount, frequentRenterPoints);
Forget about processing your data in one pass (unless you are doing an SQL request). Most of the time when you have a for loop or when you have a ⚪ Stream<T>, the JVM will optimize that for you and get rid of your for loop, get rid of your iteration, and you will have a zero pass of your data, just inline code, extremely performant, and extremely efficient. See Fine-grained optimizations provided by JIT Compiler in Java

Reducing Data to compute Statistics

⚪ Stream<T>#reduce(T identity, BinaryOperator<T> accumulator) adds the Identity Element before the elements of the ⚪ Stream<T>.

  • If you have an empty ⚪ Stream<T>, it will return Identity Element

  • If you have only one element in the ⚪ Stream<T>, it will return the reduction of the Identity Element and this only element:

    stream reduce

Some reduction operations do not have any Identity Element (in case for the ⚪ IntStream#min(), the ⚪ IntStream#max(), the ⚪ IntStream#average() and ⚪ Stream<T>#reduce(BinaryOperator<T> accumulator).
🔴 Optional<T> are used by the Stream API, because in cases where we have an empty ⚪ Stream<T> without any Identity Element we don’t have any result.

Collecting Data from Streams to create Lists/Sets/Maps

Collector

Complex object used to reduce a Stream. Can be used to gather data in collections and maps - it is called as reduction in a "mutable container" or mutable reduction

Downstream Collector

Collector that is passed to 🔴 Collectors#groupingBy(Function<? super T,? extends K> classifier, Collector<? super T,A,D> downstream) which is applied to the streaming of the list of values

The Collector API

Uses the ⚪ Stream#collect(Collector<? super T,A,R> collector) that takes ⚪ Collector<T,A,R> implementation as a parameter:

  • Use 🔴 Collectors class and its factory methods

  • You can create your own collectors, but it is complex and tricky

// Bad Practice
List<Person> people = new ArrayList<>();
List<Person> peopleFromNewYork = new ArrayList<>(); (1)
people.stream()
    .filter(p -> p.getCity().equals("New York"))
    .forEach(p -> peopleFromNewYork.add(p));

// Best Practice
List<Person> people = new ArrayList<>();
List<Person> peopleFromNewYork = people.stream()
    .filter(p -> p.getCity().equals("New York"))
    .collect(Collectors.toList()); (2)
1 Creating a ⚪ List<E> to store the result
2 Using ⚪ Collector<T,A,R> to store the result in ⚪ List<E>