Rust Foundation Series: A Case for Generic Partial Function Application

Context

When consuming data from a source, whether for an inquiry or for some analytical purpose, we often select data based on qualifying criteria. The intent is to consume the data that interests us and ignore the rest, with no qualms about discarding what is irrelevant. From a basic set theory perspective, this is the process of defining some kind of relationship from a source S to some target T. A function that exhibits such selective applicative behavior is referred to as a partial function: a mapping to the anticipated target type is produced only if the element from the source satisfies the given predicate criteria.

Rust's standard library has the necessary type semantics along with auxiliary methods which, in combination, can meet the objective of partial function application. However, if the desire is for a lean, idiomatic, expressive interface, one must build it. This article presents an approach that leverages traits and generics to provide partial function application that is stateless, idiomatically expressive and composable, and that carries an implicit mandate for graceful handling of possible terminal states.

Application Scenarios

The Qualify stage shown in the visual above is a typical scenario where partial function application finds its relevance. Here are a few practical examples:

  • Processing event logs with certain severity levels, with the intent to enhance Observability and Supportability.
  • In an IoT scenario, such as an end-user gadget, sniffing signals that warrant an immediate healthcare dispatch to assist the vulnerable.
  • In data processing frameworks like Spark, constructs where certain transformation logic is applied conditionally, for example in a plan optimizer.

Objectives

The following are a few objectives that help realize the goal of generic partial function support for both scalar and vector type categories:

  • A stateless implementation with standardized signature support that abstracts away redundant boilerplate.
  • Favors static type mapping and call-site intention for object lifetimes, validated at compile time.
  • Predicate and transformer closures work with immutable value references.
  • Follows a DRY approach when extending the function support to different concrete types.
  • Allows straight as well as cross-type mapping, extended to collection types. For example:
    • Extract values for all matching keys from a given key-value pair collection.
    • Collect processed elements from a Vec collection into a LinkedList, or vice-versa.
  • An implicit mandate for graceful error handling, including systemic failures where further traversal becomes irrelevant.

Canonical Representation

Primarily, a function's application is over some discrete value, which we can refer to as a scalar type, for example an i8 or a &str. When iterating over a collection of elements, the partial function can be passed as a closure to the iterator's traversal function, for example a map, flat-map or for-each. For this topic's scope we'll present the case where the partial function is passed as a closure to a map transformer.

Breaking down the signature details further, the following is the high-level support for scalars and vectors:

//For a scalar value
<sT>.map_scalar(predicate, transformer) -> Result<tT, String>

//Over some elements of a vector
{<sT>}.map_vector(predicate, transformer) -> Result<{tT}, String>

Internally, map_vector linearly traverses the source vector, iteratively applying the predicate to qualify each element against the acceptance criteria, and then conditionally applying the transformer. If the outcome is an Ok(tT), it is accumulated into the target iterator. While the objective here is to come up with an end-to-end approach, there is room for further efficiency improvement, for example optimizing the time-complexity factor.
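The traversal just described can be sketched with plain iterator combinators, before any trait wrapping; the predicate and transformer below are illustrative stand-ins:

```rust
// Illustrative sketch of the internal mechanics: qualify with a predicate,
// then transform, accumulating Ok values and surfacing the first Err.
fn sketch() -> Result<Vec<String>, String> {
    let source = vec![3_i8, -1, 7];
    source
        .iter()
        .filter_map(|n| {
            // Predicate: only non-negative values qualify.
            if *n >= 0 {
                // Transformer: map the qualified element to the target type.
                Some(Ok(n.to_string()))
            } else {
                None
            }
        })
        // Collecting Result items into Result<Vec<_>, _> stops at the first Err.
        .collect()
}
```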

As you'd notice, the signature prefers the Result type over an Option type. Both are enumeration types with two variants, sufficient to represent the outcome of a partial function application. However, the Result type offers completeness and satisfies the implicit mandate to be explicit about failure states.
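The distinction can be seen in a tiny illustrative pair of functions (the names and bodies are hypothetical): an Option silently drops the reason for non-qualification, while a Result carries it to the caller:

```rust
// Hypothetical illustration: Option loses the failure reason, Result keeps it.
fn qualify_opt(n: i32) -> Option<i32> {
    if n > 0 { Some(n * 2) } else { None } // why did it fail? unknown
}

fn qualify_res(n: i32) -> Result<i32, String> {
    if n > 0 {
        Ok(n * 2)
    } else {
        Err(format!("{n} is not a positive value")) // the cause travels with the outcome
    }
}
```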

Approach

Scalar types

In alignment with the aforementioned objectives, the signature support is implicitly added to chosen types by first defining the higher-order function, i.e., map_*, as a method of a trait, as shown below:

Partial Function support for Scalar types

The following line of code automatically makes the map_scalar method available for some scalar type sT (an out-of-the-box or custom-defined type):

impl<'a, 'b, sT: 'a> PartialFnForScalars for sT {}

Here’s an example of its usage to conditionally process some string slice (&str):

Example of Partial Function over Scalar types
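A minimal self-contained sketch of such a scalar trait and its use over a &str follows; the trait and method names echo the article, while the exact signatures and bodies are this sketch's assumptions:

```rust
// Sketch: a trait with a default map_scalar, blanket-implemented for all Sized types.
trait PartialFnForScalars: Sized {
    fn map_scalar<T, P, F>(self, predicate: P, transformer: F) -> Result<T, String>
    where
        P: Fn(&Self) -> bool,
        F: Fn(&Self) -> Result<T, String>,
    {
        if predicate(&self) {
            transformer(&self)
        } else {
            Err(String::from("value did not qualify for the predicate"))
        }
    }
}

// Blanket impl: every type gets map_scalar for free.
impl<T> PartialFnForScalars for T {}

// Usage over a string slice: qualify on length, transform to uppercase.
fn shout(s: &str) -> Result<String, String> {
    s.map_scalar(|v| v.len() > 2, |v| Ok(v.to_uppercase()))
}
```

Note how the call-site supplies only the predicate and transformer; the qualification plumbing and the error path live once in the trait's default method.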

Vector types

The following code snippet extends the support to vector types, i.e., collections:

Partial Function support for Vectors

While the above code is self-explanatory, a few noteworthy aspects are as follows:

  • Any vector, i.e., collection type, that can render an Iter<'_, T> can leverage the above support. For example, a LinkedList's iter() method returns an Iter<'_, T> instance.
  • The filter_map method is the key ingredient that makes the selective mapping possible: it collects outcomes of Some(tT) while ignoring None values.
  • Output values are collected into some FromIterator collection, whose element is the defined target type tT.
  • The Iterator trait is the supertrait of this custom trait definition. This makes an iterator's auxiliary methods, such as filter_map or collect, available within it.
  • Further, use of map_vector is restricted to iterator types over some source element type. It's a trait bound the call-site must adhere to, validated at compile time.
  • Likewise, call-sites can be explicit about the expected collection type in the Result.
    • For example, the results can be accumulated into the same or a different collection type, such as Vec, VecDeque, LinkedList, HashSet, etc.
    • The standard API documentation provides useful guidance on choosing a collection type based on the objective of the use case.

Similar to the scalar case, support for partial function application can be introduced to vector types using the following code:

impl<'a, 'b, sT: 'a, tT: 'b> 
    PartialFnForVectors<'a, 'b, sT, tT>
        for std::collections::linked_list::Iter<'_, sT> {}

Here's an example usage of a partial function over a LinkedList collection of string slices, where the outcome is collected into a Vec of Strings:
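To make the shape concrete, here is a self-contained sketch along the lines the article describes: an Iterator supertrait, filter_map for the selective mapping, and collect into any FromIterator target. The names echo the article; the exact bounds and bodies are this sketch's assumptions:

```rust
use std::collections::LinkedList;

// Sketch: a partial-function trait for collection iterators.
trait PartialFnForVectors<'a, S: 'a, T>: Iterator<Item = &'a S> + Sized {
    fn map_vector<C, P, F>(self, predicate: P, transformer: F) -> Result<C, String>
    where
        C: FromIterator<T>,
        P: Fn(&S) -> bool,
        F: Fn(&S) -> Result<T, String>,
    {
        self.filter_map(|s| if predicate(s) { Some(transformer(s)) } else { None })
            // Result implements FromIterator, so the first Err short-circuits.
            .collect()
    }
}

// Enable the support for LinkedList's iterator type.
impl<'a, S: 'a, T> PartialFnForVectors<'a, S, T> for std::collections::linked_list::Iter<'a, S> {}

// Usage: from a LinkedList of &str to a Vec of owned Strings.
fn qualify_list(src: &LinkedList<&str>) -> Result<Vec<String>, String> {
    src.iter().map_vector(|s| s.len() > 2, |s| Ok(s.to_uppercase()))
}
```

The call-site's return annotation (Result<Vec<String>, String>) is what drives the FromIterator choice; swapping Vec for HashSet or VecDeque requires no change to the trait.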

Example use of Partial Function over Vector types

Cross-collection type mapping

Extending the approach further, similar trait support for cross-collection type mapping can be introduced, as shown in the following snippet:

Partial Function support to achieve cross collection type mapping

Here's an example application of a partial function to selectively choose the value portion from an input key-value pair collection:

As you'd notice, something interesting happens on line #216. The entries in the original collection, a BTreeMap, are sorted by key and are evaluated in that order, producing an output collection of Vec type. A Vec's pop retrieves the last pushed element, which isn't the last key in the source map, but the first qualified one. Likewise, the call-site-provided transformer achieves the cross mapping by returning the key's value as the result.
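A self-contained sketch of such a key-value variant, with a BTreeMap source and a Vec target; the trait name, bounds and helper function are this sketch's assumptions:

```rust
use std::collections::BTreeMap;

// Sketch: partial function over (key, value) pair iterators.
trait PartialFnForPairs<'a, K: 'a, V: 'a, T>: Iterator<Item = (&'a K, &'a V)> + Sized {
    fn map_pairs<C, P, F>(self, predicate: P, transformer: F) -> Result<C, String>
    where
        C: FromIterator<T>,
        P: Fn(&K, &V) -> bool,
        F: Fn(&K, &V) -> Result<T, String>,
    {
        self.filter_map(|(k, v)| if predicate(k, v) { Some(transformer(k, v)) } else { None })
            .collect()
    }
}

// Enable the support for BTreeMap's iterator type.
impl<'a, K: 'a, V: 'a, T> PartialFnForPairs<'a, K, V, T>
    for std::collections::btree_map::Iter<'a, K, V> {}

// Usage: pick the values of all keys starting with "a"; a BTreeMap iterates in key order.
fn values_for_a_keys(m: &BTreeMap<&str, i32>) -> Result<Vec<i32>, String> {
    m.iter().map_pairs(|k, _| k.starts_with('a'), |_, v| Ok(*v))
}
```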

Runtime Characteristics

The partial function design favors a stateless approach, with mere ephemeral state held either in some CPU register or in the iterator into which values are collected, the latter assigned to a variable at the call-site for the duration of its execution scope. The following is a brief description of the runtime profile of the partial function and its use in iterators, described as space-time complexity:

  • Space complexity per element is at most the size required to represent the Result value, which holds either the mapped target value or an error String. For a result of type Result<i8, String>, the at-most in-place size is the 24 bytes of the String's pointer, length and capacity (on a typical 64-bit target), with the error text itself allocated on the heap.
  • Time-wise, each element incurs at most two closure invocations (the predicate and the transformer), so for a collection of size N the cost grows linearly: O(2N), which is O(N). By partitioning the source into proportionally sized slices and processing them in parallel across P workers, wall-clock time can be reduced toward O(N/P), plus the cost of merging the partial results.
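As a rough illustration of the parallel-slicing idea (a hypothetical helper, not part of the trait support above), the source can be chunked across threads and the partial results merged in order:

```rust
use std::thread;

// Hypothetical sketch: chunk the source, apply predicate + transformer per
// chunk on its own thread, then merge the partial results in order.
fn parallel_qualify(src: Vec<i32>, workers: usize) -> Result<Vec<String>, String> {
    let workers = workers.max(1);
    let chunk_len = ((src.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = src
        .chunks(chunk_len)
        .map(|part| {
            let part = part.to_vec();
            thread::spawn(move || {
                part.iter()
                    .filter_map(|n| if *n >= 0 { Some(Ok(n.to_string())) } else { None })
                    .collect::<Result<Vec<String>, String>>()
            })
        })
        .collect();
    let mut merged = Vec::new();
    for handle in handles {
        // Surface both a worker panic and a transformer Err to the caller.
        merged.extend(handle.join().map_err(|_| String::from("worker panicked"))??);
    }
    Ok(merged)
}
```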

What’s the motivation?

While the initial context set the tone with a purpose-driven inquiry into support for partial function application, the drive to achieve it in a lean, idiomatic way came from prior experience with Scala and Scala-based data engineering frameworks, such as Delta. A few examples to quote: here's one from the Spark Delta code base, and here's a more generalized reference from the Scala book's Partial Functions documentation.

Additional Thoughts

  • The current approach is a good fit for call-sites intending to selectively consume some source data (perhaps from a stream) by applying the partial function support in micro-batches.
    • Note, the expected return type for all qualified elements processed by the iterator's closure is some Iterator.
    • Instead of mapping the outcome to an iterator, i.e., via the collect interface, the transformer can directly write the outcomes of qualified elements to some external sink.
    • In that case, the iterator can gracefully conclude without having to hold the processed elements, even momentarily. This helps extend supportability to a pure streaming requirement.
  • Counters: considering that a partial function application is mostly a boundary scenario, i.e., at entry or exit when moving data or operations, the respective closures may be changed to the FnMut kind, so that they can update atomic counters.
  • Function serialization: this is somewhat tangential to the current article's scope, with some attached nuances. I hope to bring another article about remote function passing. If anyone has done related research or intends to share their pursuits, do little-'r' me via one of the posted social channels. One source I constantly visit for tips and clues is the Rust Programming Language forum.
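The sink-writing and counter ideas above can be sketched together; the Vec sink stands in for an external one, and the names are hypothetical:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch: qualified elements go straight to a sink instead of
// being collected, while an atomic counter tracks how many qualified.
fn stream_qualified(src: &[i32], sink: &mut Vec<String>, matched: &AtomicUsize) {
    src.iter().filter(|n| **n > 0).for_each(|n| {
        matched.fetch_add(1, Ordering::Relaxed);
        sink.push(n.to_string()); // stand-in for an external sink write
    });
}
```

Because nothing is collected, the iterator concludes without holding processed elements, which is the property the streaming bullet above relies on.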

Conclusion

Thank you, Reader, for visiting, investing your valuable time and enduring this far (if you have). As always, I am a strong believer that learning is a bi-directional, reflective process. Looking forward to any inputs to better the approach!
