Reduce vs Fold in Scala
For better or worse in functional programming there exists fuzzy concrete notions on what it means to fold over some input versus what it means to reduce over some input. The concrete implementations vary from language to language and even between some libraries. Yet, the underlying fundemental concept tends to hold true, that you are taking some value of N
and performing aggregation operations on it such that the end result is typically some value of <= N
.
Even in haskell the fold
family of functions is overloaded in the sense that it is capable of doing both what scala calls reducing and folding. For example, take the following haskell function that takes a list of integers and produces a sum value in terms of folding:
sumList :: [Int] -> Int
sumList = foldl (+) 0
In haskell one might also implement the filterList
combinator in terms of folding. The definition of which would resemble the following example:
filterList :: (a -> bool) -> [a] -> [a]
filterList p = foldl (\ m n -> if (p n) then m ++ [n] else m) []
While the fold derived operations of sumList
and filterList
both take a value of N
and perform aggregegate operations on it such that the end result is a a value of <= N
, their type signatures tell a different story.
sumList
takes a list of integers and produces a single integer while filterList
takes a list of integers and produces a subset of that list. Putting their type signatures next to each other highlights this important difference.
sumList :: [Int] -> Int
filterList :: (a -> bool) -> [a] -> [a]
This is all possible because foldl's type signature is really generic in the sense that all it cares about is that the type of the final value is the same type of the initial value. It's type signature is as follows:
foldl :: (a -> b -> a) -> a -> [b] -> a
Now this all becomes important when we take the type signatures for scala's implementations of folding and reducing into consideration. Both of which are as follows:
def foldLeft [B] (z: B)(f: (B, A) => B): B
def reduceLeft [B >: A] (f: (B, A) => B): B
A quick glance reveals some of the following differences:
- foldLeft has a paramaterized type of
B
while reduceLeft has a paramaterized type ofB >: A
- foldLeft is a curried function where the first curried group takes the folds initial value and that reduceLeft is non curried and doesn't take an initial value at all
- In foldLeft the accumulator and the final return type (B) must be related to the item type (A).
So if reduceLeft doesn't take an initial value as an argument, then what is it's initial value? The initial value of reduceLeft
is the initial value of the collection. So if you have a List[Int]
and you reduceLeft
over it, you will only ever be able to produce an Int
(or a supertype of Int) from that operation. However, foldLeft
is capable of producing any type you can pass into the initial value.
So we would be able to easily convert our sumList
implementation from haskell to scala using reduceLeft
like so:
def sum(inputs: List[Int]): Int = inputs.reduceLeft (_+_)
It's impossible for us to implement filterList
in terms of reduceLeft
. However, we can easily implement filterList
in terms of foldLeft
like so:
def filter(inputs: List[Int], p: (Int) => Boolean) = {
inputs.foldLeft(List[Int]()) { (m, n) =>
p(n) match {
case true => m ++ List(n)
case false => m
}
}
}
Some interesting things about what can be represented in terms of reduceLeft
and foldLeft
outside of the type system are related to wether the parameter type for the iterable is a monoid or a semi-group. If the type parameter of your iterable is a monoid
, meaning it has a zero value and an append operation, you can implement reduceLeft
in terms of foldLeft
. But if your type parameter is a semi-group
, meaning it has an append operation but no zero value then there are a subset of reduceLeft
operations that can not be implemented in terms of foldLeft
.