Mastering Java 8 Stream (Part 3: Collectors)

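As earlier articles in this series mentioned, the core of Stream lies in Collectors, which gather the processed data into a result. Collectors offers a large and powerful set of APIs that can collect the final data into a List, a Set, a Map, or even more complex structures (nested combinations of those three).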

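Collectors exposes many APIs, a good number of which are overloads of the same method. I personally group them into three broad categories:

Data collection: set, map, list
Aggregation and reduction: counting, summing, min/max, averaging, string joining, reduction
Pre- and post-processing: partitioning, grouping, custom operations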

1. API usage

This section covers some commonly used APIs rather than all of them: there are simply too many, and the ways they can be combined quickly become complicated.

2. Data collection

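Collectors.toCollection() collects the data into a Collection. Any Collection implementation works, such as ArrayList or HashSet. The method takes a Collection factory (a Supplier that creates the empty collection), typically a constructor reference.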

@Test
public void testCollection() {
    // List
    ArrayList<Integer> integers = Stream.of(1, 2, 3, 4, 5, 6, 8, 9, 0)
        .collect(Collectors.toCollection(ArrayList::new));

    // Set
    HashSet<Integer> integerHashSet = Stream.of(1, 2, 3, 4, 5, 6, 8, 9, 0)
        .collect(Collectors.toCollection(HashSet::new));
}
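
Because toCollection() accepts any factory, the choice of container changes the result. The sketch below is only an illustration: the class name ToCollectionDemo and its helper methods are made up for this example and are not from the original article. It collects the same input into a TreeSet and into a LinkedList.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class ToCollectionDemo {
    // TreeSet: duplicates are dropped and elements come out sorted.
    static TreeSet<Integer> sortedDistinct(Integer... values) {
        return Arrays.stream(values).collect(Collectors.toCollection(TreeSet::new));
    }

    // LinkedList: encounter order and duplicates are both kept.
    static LinkedList<Integer> linked(Integer... values) {
        return Arrays.stream(values).collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct(3, 1, 2, 3, 1)); // [1, 2, 3]
        System.out.println(linked(3, 1, 2, 3, 1));         // [3, 1, 2, 3, 1]
    }
}
```

The same stream pipeline yields different semantics purely through the factory passed to toCollection().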

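Collectors.toList() and Collectors.toSet() work much like Collectors.toCollection(), except that the container type is chosen for you: the JDK 8 implementation uses ArrayList and HashSet. I originally assumed these two methods would delegate to Collectors.toCollection() internally, but they do not; each one creates a new CollectorImpl directly.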

// List
Stream.of(1,2,3,4,5,6,8,9,0).collect(Collectors.toList());

// Set
Stream.of(1,2,3,4,5,6,8,9,0).collect(Collectors.toSet());

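Collectors.toMap() and Collectors.toConcurrentMap() do what their names suggest: they collect into a Map and a ConcurrentMap, using HashMap and ConcurrentHashMap by default. toConcurrentMap() supports concurrent accumulation on parallel streams. Each of the two has three overloads. Unlike a Collection, a Map stores key-value pairs, so collecting into a Map requires you to say what the key is. At a minimum, toMap() and toConcurrentMap() take two arguments: a function that extracts the key and a function that produces the value to store.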

Example: the Student class below, which contains an id and a name, is used throughout.

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Student {
    private String id;
    private String name;
}

Note: here the key is the id, and the value can be either the object itself or one of its fields, so collecting into a Map is highly customizable.

@Test
public void testMap() {
    Student studentA = new Student("20190001", "小明");
    Student studentB = new Student("20190002", "小红");
    Student studentC = new Student("20190003", "小丁");

    //Function.identity() returns the element itself, so the result is Map<String,Student>, i.e. id -> student
    //sequential collection
    Stream.of(studentA, studentB, studentC)
        .collect(Collectors.toMap(Student::getId, Function.identity()));

    //concurrent collection
    Stream.of(studentA, studentB, studentC)
        .parallel()
        .collect(Collectors.toConcurrentMap(Student::getId, Function.identity()));

    //================================================================================

    //Map<String,String>, i.e. id -> name
    //sequential collection
    Stream.of(studentA, studentB, studentC)
        .collect(Collectors.toMap(Student::getId, Student::getName));

    //concurrent collection
    Stream.of(studentA, studentB, studentC)
        .parallel()
        .collect(Collectors.toConcurrentMap(Student::getId, Student::getName));

}

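What happens when keys collide? With the two-argument overloads, a duplicate key makes the collector throw an IllegalStateException. To handle duplicates, pass a merge function as the third argument. Suppose two Students share the same id: when converting to a Map, we keep the one whose name compares greater and discard the other.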

// Map<String,Student>
Stream.of(studentA, studentB, studentC)
    .collect(Collectors.toMap(Student::getId,
                              Function.identity(),
                              BinaryOperator.maxBy(Comparator.comparing(Student::getName))));


// The version above may look dense, so here is the same merge function written imperatively
// Map<String,Student>
Stream.of(studentA, studentB, studentC)
    .collect(Collectors.toMap(Student::getId,
                              Function.identity(),
                              (s1, s2) -> {
                                  // compareTo returns a negative number if s1 < s2, 0 if they are equal, and a positive number if s1 > s2
                                  if (s1.getName().compareTo(s2.getName()) < 0) {
                                      return s2;
                                  } else {
                                      return s1;
                                  }
                              }));
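
To see the duplicate-key behavior in action, here is a minimal sketch. It assumes a hand-written stand-in for the Student class (no Lombok) and uses made-up English names; the class ToMapDuplicateKeyDemo and its helper methods are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class ToMapDuplicateKeyDemo {
    // Minimal stand-in for the article's Student class (an assumption, no Lombok).
    static final class Student {
        final String id;
        final String name;
        Student(String id, String name) { this.id = id; this.name = name; }
        String getId() { return id; }
        String getName() { return name; }
    }

    // Two-argument toMap: throws IllegalStateException when two elements share a key.
    static boolean throwsOnDuplicate(Student... students) {
        try {
            Arrays.stream(students).collect(Collectors.toMap(Student::getId, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Three-argument toMap: the merge function keeps the student with the greater name.
    static Map<String, Student> keepGreaterName(Student... students) {
        return Arrays.stream(students).collect(Collectors.toMap(
                Student::getId,
                Function.identity(),
                BinaryOperator.maxBy(Comparator.comparing(Student::getName))));
    }

    public static void main(String[] args) {
        Student a = new Student("20190001", "Alice");
        Student b = new Student("20190001", "Bob"); // same id as a
        System.out.println(throwsOnDuplicate(a, b));                          // true
        System.out.println(keepGreaterName(a, b).get("20190001").getName()); // Bob
    }
}
```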

If you do not want the default HashMap or ConcurrentHashMap, the overload with the most parameters (four) also accepts a custom Map factory.

//custom LinkedHashMap
// Map<String,Student>
Stream.of(studentA, studentB, studentC)
    .collect(Collectors.toMap(Student::getId,
                               Function.identity(),
                               BinaryOperator.maxBy(Comparator.comparing(Student::getName)),
                               LinkedHashMap::new));
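
The Map factory matters mostly for iteration order. The following sketch, with the hypothetical class ToMapFactoryDemo and plain strings as sample data (neither comes from the original article), maps each word to its length using first a LinkedHashMap and then a TreeMap.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class ToMapFactoryDemo {
    // LinkedHashMap keeps keys in encounter order.
    static Map<String, Integer> lengthsInEncounterOrder(String... words) {
        return Arrays.stream(words).collect(Collectors.toMap(
                Function.identity(), String::length, (x, y) -> x, LinkedHashMap::new));
    }

    // TreeMap keeps keys sorted by their natural order.
    static Map<String, Integer> lengthsSorted(String... words) {
        return Arrays.stream(words).collect(Collectors.toMap(
                Function.identity(), String::length, (x, y) -> x, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(lengthsInEncounterOrder("pear", "fig", "apple")); // {pear=4, fig=3, apple=5}
        System.out.println(lengthsSorted("pear", "fig", "apple"));           // {apple=5, fig=3, pear=4}
    }
}
```

The merge function (x, y) -> x simply keeps the first value; it is required by this overload even when no duplicates are expected.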

3. Aggregation and reduction

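1) Collectors.joining() concatenates strings and has three overloads. In the JDK 8 source, the no-argument version appends to a StringBuilder, while the versions that take a delimiter are built on StringJoiner. You can specify a custom delimiter, prefix, and suffix. The delimiter is especially handy: turning a list into a single String is a common need, and one call with a delimiter does it.

Example: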

@Test
public void testJoining() {
    Student studentA = new Student("20190001", "小明");
    Student studentB = new Student("20190002", "小红");
    Student studentC = new Student("20190003", "小丁");

    //no delimiter: 201900012019000220190003
    String str1 = Stream.of(studentA, studentB, studentC)
        .map(Student::getId)
        .collect(Collectors.joining());
    System.out.println(str1);

    //use ^_^ as the delimiter
    //20190001^_^20190002^_^20190003
    String str2 = Stream.of(studentA, studentB, studentC)
        .map(Student::getId)
        .collect(Collectors.joining("^_^"));
    System.out.println(str2);

    //use ^_^ as the delimiter
    //and [] as the prefix and suffix
    //[20190001^_^20190002^_^20190003]
    String str3 = Stream.of(studentA, studentB, studentC)
        .map(Student::getId)
        .collect(Collectors.joining("^_^", "[", "]"));
    System.out.println(str3);
}
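
One detail worth knowing about the three-argument overload is what happens with no elements. The sketch below uses a hypothetical JoiningDemo class (not part of the original article) to show that the prefix and suffix are still emitted for an empty stream.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class JoiningDemo {
    // Join with ", " as the delimiter and [] as the prefix and suffix.
    static String bracketed(String... parts) {
        return Arrays.stream(parts).collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(bracketed("a", "b", "c")); // [a, b, c]
        System.out.println(bracketed());              // []
    }
}
```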

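2) Collectors.counting() counts the elements. It does the same job as Stream.count(); the difference is that counting() yields a boxed Long while count() returns a primitive long. Their use cases do differ, though, which is covered later.

Example: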

@Test
public void testCounting() {
    // Long 8
    Long l1 = Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
        .collect(Collectors.counting());

    //if all you need is the count, there is no need for Collectors, which only adds overhead
    // long 8
    long l2 = Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
        .count();
}
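
One case where counting() has no Stream.count() equivalent is as a downstream collector, where it counts the elements of each group separately. The following is a minimal sketch; the class CountingDownstreamDemo and the sample words are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class CountingDownstreamDemo {
    // Count how many words start with each letter; counting() is the downstream collector.
    static Map<Character, Long> countByInitial(String... words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(w -> w.charAt(0), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<Character, Long> counts = countByInitial("apple", "avocado", "banana");
        System.out.println(counts.get('a')); // 2
        System.out.println(counts.get('b')); // 1
    }
}
```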

3) Collectors.minBy() and Collectors.maxBy() do the same job as Stream.min() and Stream.max(); the Collectors versions are meant for more advanced scenarios, such as serving as a downstream collector.

Example:

@Test
public void testMinMax() {
    // maxBy 200
    Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
        .collect(Collectors.maxBy(Integer::compareTo)).ifPresent(System.out::println);

    // max 200
    Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
        .max(Integer::compareTo).ifPresent(System.out::println);

    // minBy -80
    Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
        .collect(Collectors.minBy(Integer::compareTo)).ifPresent(System.out::println);

    // min -80
    Stream.of(1, 0, -10, 9, 8, 100, 200, -80)
        .min(Integer::compareTo).ifPresent(System.out::println);
}
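
As a sketch of such an advanced scenario, maxBy() can be placed downstream of another collector to find a maximum per group in one pass. The class MaxByDownstreamDemo below is hypothetical, and combining maxBy() with partitioningBy() is just one possible choice for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class MaxByDownstreamDemo {
    // Largest even number (key true) and largest odd number (key false) in a single pass.
    static Map<Boolean, Optional<Integer>> maxEvenAndOdd(Integer... values) {
        return Arrays.stream(values)
                .collect(Collectors.partitioningBy(
                        n -> n % 2 == 0,
                        Collectors.maxBy(Comparator.<Integer>naturalOrder())));
    }

    public static void main(String[] args) {
        Map<Boolean, Optional<Integer>> result = maxEvenAndOdd(1, 0, -10, 9, 8, 100, 200, -80);
        System.out.println(result.get(true));  // Optional[200]
        System.out.println(result.get(false)); // Optional[9]
    }
}
```

Each value is an Optional because a group may turn out to be empty.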

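4) Collectors.summarizingInt(), Collectors.summarizingLong(), and Collectors.summarizingDouble() compute summary statistics over int, long, and double values respectively. Each returns a SummaryStatistics object (IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics) containing the count, sum, min, average, and max. IntStream, DoubleStream, and LongStream all offer sum(), but that yields only the total and is not as rich as the summarizing result. When you need the count, the average, and so on in a single pass, the summarizing collectors are very convenient.

Example: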

@Test
public void testSummarize() {
    //IntSummaryStatistics{count=10, sum=55, min=1, average=5.500000, max=10}
    Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.summarizingInt(Integer::valueOf));

    //DoubleSummaryStatistics{count=10, sum=55.000000, min=1.000000, average=5.500000, max=10.000000}
    Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.summarizingDouble(Double::valueOf));

    //LongSummaryStatistics{count=10, sum=55, min=1, average=5.500000, max=10}
    Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.summarizingLong(Long::valueOf));


    // 55
    Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).mapToInt(Integer::valueOf)
        .sum();

    // 55.0
    Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).mapToDouble(Double::valueOf)
        .sum();

    // 55
    Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).mapToLong(Long::valueOf)
        .sum();
}
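
The statistics object is more useful when its fields are read individually instead of printed as a whole. A minimal sketch, assuming a hypothetical SummarizingDemo class that summarizes word lengths:

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class SummarizingDemo {
    // Summarize the lengths of the given words in one pass.
    static IntSummaryStatistics lengthStats(String... words) {
        return Arrays.stream(words).collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = lengthStats("fig", "pear", "apple");
        System.out.println(stats.getCount());   // 3
        System.out.println(stats.getSum());     // 12
        System.out.println(stats.getMin());     // 3
        System.out.println(stats.getMax());     // 5
        System.out.println(stats.getAverage()); // 4.0
    }
}
```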

5) Collectors.averagingInt(), Collectors.averagingDouble(), and Collectors.averagingLong() compute the average. They are meant for more advanced scenarios, which are covered later.

Example:

@Test
public void testAvg() {
    // 5.5
    final Double d1 = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.averagingInt(Integer::valueOf));
    System.out.println(d1);

    // 5.5
    final Double d2 = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.averagingDouble(Double::valueOf));
    System.out.println(d2);

    // 5.5
    final Double d3 = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .collect(Collectors.averagingLong(Long::valueOf));
    System.out.println(d3);
}
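
One behavioral difference from the primitive streams is the empty case: the averaging collectors return 0.0 for an empty stream, whereas IntStream.average() returns an empty OptionalDouble. The sketch below illustrates this with a hypothetical AveragingEmptyDemo class.

```java
import java.util.OptionalDouble;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, for illustration only.
class AveragingEmptyDemo {
    // Average via the collector: an empty stream yields 0.0.
    static Double collectorAverage(Stream<Integer> values) {
        return values.collect(Collectors.averagingInt(Integer::intValue));
    }

    // Average via IntStream: an empty stream yields an empty OptionalDouble.
    static OptionalDouble primitiveAverage(Stream<Integer> values) {
        return values.mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        System.out.println(collectorAverage(Stream.of(1, 2, 3, 4)));          // 2.5
        System.out.println(collectorAverage(Stream.<Integer>empty()));        // 0.0
        System.out.println(primitiveAverage(Stream.<Integer>empty()).isPresent()); // false
    }
}
```

If "no data" must be distinguishable from "average is zero", the OptionalDouble form is the safer choice.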

6) Collectors.reducing() closely resembles Stream.reduce(); both perform a reduction. In JDK 8, Collectors.counting() is itself implemented with reducing(), as the source shows:

public static <T> Collector<T, ?, Long> counting() {
    return reducing(0L, e -> 1L, Long::sum);
}

With that in mind, let's implement a reduction that sums the lengths of all the students' names.

Example:

@Test
public void testReducing() {
    Student studentA = new Student("20190001", "小明");
    Student studentB = new Student("20190002", "小红");
    Student studentC = new Student("20190003", "小丁");
    //Optional[6]
    Stream.of(studentA, studentB, studentC)
        .map(student -> student.getName().length())
        .collect(Collectors.reducing(Integer::sum));

    //6
    //alternatively, supply an identity value so that an empty stream still produces a result (0) rather than an empty Optional
    Stream.of(studentA, studentB, studentC)
        .map(student -> student.getName().length())
        .collect(Collectors.reducing(0, (i1, i2) -> i1 + i2));


    //6
    //or skip the map() step and do the conversion inside the reduction
    Stream.of(studentA, studentB, studentC)
        .collect(Collectors.reducing(0, s -> s.getName().length(), Integer::sum));
}
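
Where reducing() differs from Stream.reduce() in practice is that it can run per group as a downstream collector. The following sketch sums word lengths for each initial letter; the class ReducingDownstreamDemo and the sample words are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class ReducingDownstreamDemo {
    // Total number of characters per initial letter, reduced inside each group.
    static Map<Character, Integer> totalLengthByInitial(String... words) {
        return Arrays.stream(words).collect(Collectors.groupingBy(
                w -> w.charAt(0),
                Collectors.reducing(0, String::length, Integer::sum)));
    }

    public static void main(String[] args) {
        Map<Character, Integer> totals = totalLengthByInitial("apple", "avocado", "banana");
        System.out.println(totals.get('a')); // 12
        System.out.println(totals.get('b')); // 6
    }
}
```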

4. Pre- and post-processing

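1) Collectors.groupingBy() and Collectors.groupingByConcurrent() differ only in their intended use: sequential versus concurrent (parallel-stream) collection. Why file groupingBy under pre- and post-processing? Because groupingBy sorts the elements into groups before they are collected, then hands each group's elements to a downstream collector.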

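Below is the groupingBy overload with the most parameters: classifier is the classification function, mapFactory is the factory for the result Map, and downstream is the downstream collector. It is the downstream collector that lets you apply all sorts of clever processing as the data is passed downstream.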

public static <T, K, D, A, M extends Map<K, D>>
    Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier,
                                  Supplier<M> mapFactory,
                                  Collector<? super T, A, D> downstream)

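Example: the code below splits a set of integers into positive, negative, and zero. groupingByConcurrent() takes the same parameters, so it gets no separate example.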

@Test
public void testGrouping() {
    //Map<String,List<Integer>>
    final Map<String, List<Integer>> m1 = Stream.of(-6, -7, -8, -9, 1, 2, 3, 0, 4, 5, 6)
        .collect(Collectors.groupingBy(n -> n == 0 ? "zero" : n < 0 ? "negative" : "positive"));

    //Map<String, Set<Integer>>
    //custom downstream collector
    final Map<String, Set<Integer>> m2 = Stream.of(-6, -7, -8, -9, 1, 0, 2, 3, 4, 5, 6)
        .collect(Collectors.groupingBy(n -> n == 0 ? "zero" : n < 0 ? "negative" : "positive", Collectors.toSet()));

    //LinkedHashMap<String, Set<Integer>>
    //custom map container and downstream collector
    final LinkedHashMap<String, Set<Integer>> m3 = Stream.of(-6, -7, 0, -8, -9, 1, 2, 3, 4, 5, 6)
        .collect(Collectors.groupingBy(n -> n == 0 ? "zero" : n < 0 ? "negative" : "positive", LinkedHashMap::new, Collectors.toSet()));

}
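
For completeness, here is a small sketch of groupingByConcurrent() on a parallel stream. The class GroupingByConcurrentDemo, the English labels, and the use of counting() as the downstream collector are choices made for this illustration, not taken from the original article.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, for illustration only.
class GroupingByConcurrentDemo {
    // Classify the integers in [from, to] by sign on a parallel stream and count each group.
    static ConcurrentMap<String, Long> countBySign(int from, int to) {
        return IntStream.rangeClosed(from, to)
                .boxed()
                .parallel()
                .collect(Collectors.groupingByConcurrent(
                        n -> n == 0 ? "zero" : n < 0 ? "negative" : "positive",
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> counts = countBySign(-5, 10);
        System.out.println(counts.get("negative")); // 5
        System.out.println(counts.get("zero"));     // 1
        System.out.println(counts.get("positive")); // 10
    }
}
```

Counting is used here because groupingByConcurrent() is an unordered collector, so the order of elements inside a collected List would not be guaranteed on a parallel stream.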

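2) Collectors.partitioningBy() literally means partitioning. It can split the data into at most two parts, because it partitions by a Predicate, and a Predicate can only evaluate to true or false. Apart from taking a Predicate instead of a classifier function, its parameters mirror those of groupingBy, except that there is no overload that accepts a Map factory.

Example: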

@Test
public void testPartition() {
    //Map<Boolean,List<Integer>>
    final Map<Boolean, List<Integer>> m1 = Stream.of(0, 1, 0, 1)
        .collect(Collectors.partitioningBy(integer -> integer == 0));

    //Map<Boolean,Set<Integer>>
    //custom downstream collector
    final Map<Boolean, Set<Integer>> m2 = Stream.of(0, 1, 0, 1)
        .collect(Collectors.partitioningBy(integer -> integer == 0, Collectors.toSet()));
}
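
A practical difference from groupingBy with a boolean classifier is that the Map returned by partitioningBy() always contains both the true and the false key, even when one side is empty. The sketch below, using a hypothetical PartitioningKeysDemo class, contrasts the two.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class PartitioningKeysDemo {
    // partitioningBy: both keys are always present.
    static Map<Boolean, List<Integer>> partitionByZero(Integer... values) {
        return Arrays.stream(values).collect(Collectors.partitioningBy(n -> n == 0));
    }

    // groupingBy with a boolean classifier: only keys that actually occur are present.
    static Map<Boolean, List<Integer>> groupByZero(Integer... values) {
        return Arrays.stream(values).collect(Collectors.groupingBy(n -> n == 0));
    }

    public static void main(String[] args) {
        System.out.println(partitionByZero(1, 2, 3).get(true));     // []
        System.out.println(partitionByZero(1, 2, 3).get(false));    // [1, 2, 3]
        System.out.println(groupByZero(1, 2, 3).containsKey(true)); // false
    }
}
```

This makes partitioningBy() convenient when callers read both sides without null checks.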

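3) Collectors.mapping() lets you choose what gets collected, for example a single field of each element, by applying a function before passing elements to a downstream collector.

Example: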

// List<String>
Stream.of(studentA,studentB,studentC)
    .collect(Collectors.mapping(Student::getName,Collectors.toList()));
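
On its own, mapping() does little that Stream.map() cannot; it becomes useful as a downstream collector. The sketch below groups students by the first four characters of their id and keeps only the names. The stand-in Student class, the English names, the grouping key, and the class MappingDownstreamDemo are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class MappingDownstreamDemo {
    // Minimal stand-in for the article's Student class (an assumption, no Lombok).
    static final class Student {
        final String id;
        final String name;
        Student(String id, String name) { this.id = id; this.name = name; }
        String getId() { return id; }
        String getName() { return name; }
    }

    // Group by the first four characters of the id and keep only the names.
    static Map<String, List<String>> namesByIdPrefix(Student... students) {
        return Arrays.stream(students).collect(Collectors.groupingBy(
                s -> s.getId().substring(0, 4),
                Collectors.mapping(Student::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> names = namesByIdPrefix(
                new Student("20190001", "Alice"),
                new Student("20190002", "Bob"),
                new Student("20200001", "Carol"));
        System.out.println(names.get("2019")); // [Alice, Bob]
        System.out.println(names.get("2020")); // [Carol]
    }
}
```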

4) Collectors.collectingAndThen() applies a finishing step after collection. It is very useful when you need to transform the collected result further.

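Example: here the collected list is turned into a ListIterator. This is only a simple demonstration; what you do in the finishing function is up to you.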

@Test
public void testCollecting() {
    Student studentA = new Student("20190001", "小明");
    Student studentB = new Student("20190002", "小红");
    Student studentC = new Student("20190003", "小丁");
    final ListIterator<Student> data = Stream.of(studentA, studentB, studentC)
        .collect(Collectors.collectingAndThen(Collectors.toList(), List::listIterator));
}
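
A more common use of collectingAndThen() is to wrap the result in a read-only view. The sketch below assumes a hypothetical CollectingAndThenDemo class with a small helper that checks whether a list rejects modification.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class CollectingAndThenDemo {
    // Collect to a list, then wrap it so callers cannot modify the result.
    static List<String> unmodifiableNames(String... names) {
        return Arrays.stream(names).collect(
                Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
    }

    // Returns true if adding to the list is rejected.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = unmodifiableNames("Alice", "Bob");
        System.out.println(names);             // [Alice, Bob]
        System.out.println(rejectsAdd(names)); // true
    }
}
```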

5. Summary

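Collectors is the core of Stream, and its feature set is rich and powerful. In the business code I write, there is almost nothing Collectors cannot handle. When a problem seems too hard, thinking it over and experimenting with combinations of these APIs usually turns up a Collectors-based solution.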