Tuple Labels Considered Harmful

[O]ur intellectual powers are rather geared to master static relations and our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process… - E. Dijkstra, Go To Statement Considered Harmful

Starting with C# 7.0 there has been syntactic sugar to label tuple items. That means that if you had a coordinate system that you wanted to describe, then you could simply have (int x, int y) as a type in your code. For example, you could have the following code that calculates the distance between two points:

(int x, int y) p1 = (1, 3);
(int x, int y) p2 = (4, 4);

var distance = Math.Sqrt(Math.Pow(p2.x - p1.x, 2) + Math.Pow(p2.y - p1.y, 2));

This is incredibly convenient for local variables or internal interfaces. The problems start when labeled tuples are used in public interfaces or to encapsulate functionality.

Use in interfaces

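As far as I can tell, the label is syntactic sugar that makes the code you’re reading and editing easier to follow. The labels are not part of the underlying type: a (double x, double y) compiles down to ValueTuple<double, double>, whose fields are Item1 and Item2. For names that appear in a member signature, the compiler records them as metadata through a TupleElementNames attribute rather than in the type itself, so any consumer or tool that doesn’t read that attribute will have to access the tuple items as Item1, Item2, and so on instead of your nice, well-thought-out labels.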

As an example, I created a simple class library with the following implementation. It’s a pretty clean implementation of a Geometry class that provides the midpoint between two other points, where all points are implemented as tuples of doubles with labels x and y.

namespace TupleLib
{
    public static class Geometry
    {
        public static (double x, double y) Midpoint((double x, double y) p1, (double x, double y) p2)
        {
            return (
                (p1.x + p2.x) / 2,
                (p1.y + p2.y) / 2
            );
        }
    }
}

After compiling my TupleLib class library, I ran the resulting dll file through ILSpy to decompile it. As can be seen in the following code listing, we unfortunately don’t get the convenient labeling for our tuples, and would have to fall back to referencing Item1 and Item2 instead of x and y.


using System;

namespace TupleLib;

public static class Geometry
{
	public static ValueTuple<double, double> Midpoint(ValueTuple<double, double> p1, ValueTuple<double, double> p2)
	{
		//IL_0001: Unknown result type (might be due to invalid IL or missing references)
		//IL_0007: Unknown result type (might be due to invalid IL or missing references)
		//IL_0018: Unknown result type (might be due to invalid IL or missing references)
		//IL_001e: Unknown result type (might be due to invalid IL or missing references)
		//IL_002f: Unknown result type (might be due to invalid IL or missing references)
		//IL_0034: Unknown result type (might be due to invalid IL or missing references)
		//IL_0037: Unknown result type (might be due to invalid IL or missing references)
		return new ValueTuple<double, double>((p1.Item1 + p2.Item1) / 2.0, (p1.Item2 + p2.Item2) / 2.0);
	}
}

It’s not too big of a deal that the labels don’t show up as part of the type in the resulting library. Making them a first-class part of the type would be a good next step, though, for future versions of C#.
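As a cross-check on what ends up in a compiled signature, the method can also be inspected with reflection. The sketch below is not the ILSpy workflow described above. It repeats the Geometry class so that it stands alone, and it relies only on the standard System.Reflection API and the TupleElementNamesAttribute type. The SignatureProbe class and its method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

namespace TupleLib
{
    public static class Geometry
    {
        public static (double x, double y) Midpoint((double x, double y) p1, (double x, double y) p2)
        {
            return ((p1.x + p2.x) / 2, (p1.y + p2.y) / 2);
        }
    }

    // Hypothetical helper, not part of the original library.
    public static class SignatureProbe
    {
        // The runtime return type of Midpoint; the labels are not part of it.
        public static Type ReturnType()
        {
            return typeof(Geometry).GetMethod(nameof(Geometry.Midpoint)).ReturnType;
        }

        // The labels the compiler attached to the return value as metadata, if any.
        public static IList<string> ReturnLabels()
        {
            MethodInfo method = typeof(Geometry).GetMethod(nameof(Geometry.Midpoint));
            var attribute = method.ReturnParameter.GetCustomAttribute<TupleElementNamesAttribute>();
            return attribute == null ? new List<string>() : attribute.TransformNames;
        }
    }
}
```

SignatureProbe.ReturnType() reports the plain ValueTuple of two doubles, while SignatureProbe.ReturnLabels() reads whatever names the compiler stored alongside the signature. A consumer or tool that ignores that metadata only has Item1 and Item2 to work with.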

Encapsulating functionality

The real issue that I have run into with this syntactic sugar is the encapsulation of functionality. A developer that I worked with had a function as one of their tuple items, which in and of itself doesn’t seem too bad given that functions are first-class objects in C#. I wouldn’t write off grouping functions together in a data structure, so I’m reluctant to say categorically that having a function as an item in a tuple is a bad thing. Unfortunately, what this developer did was reinvent virtual tables.

We had a pipeline that we were building to retrieve messages from multiple Kafka topics, transform them, and ultimately sink them to various databases via HTTP POSTs. The original plan that I had drawn out was to have different message processors that would be instantiated with the appropriate data sinks via the IoC container for this service. The other developer, who was supposed to take the plan and implement it, instead created a template method pattern that leaned heavily on the following data structure:

protected ReadOnlyDictionary<string, (string endpoint, System.Type payloadType, System.Reflection.MethodInfo sender, ReadOnlyDictionary<System.Type, System.Reflection.MethodInfo> mappers)> DataSinks;

This data structure, a dictionary of tuples that bundled both data and functions, was used instead of multiple classes wired together using inversion of control. It allowed the developer to have a single template method implement the core functionality of the system, but it locked us into transferring data via HTTP. If, let’s say, the director we worked under asked us to switch one of our data flows over to Google PubSub (which did happen), we would have to change the processor class instead of just popping in a new data sink. Given this rigidity, I think that this code violates every single SOLID principle.
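For contrast, here is a minimal sketch of the kind of design the original plan called for, where the processor receives its data sink through the constructor. Every name in it (IDataSink, HttpDataSink, RecordingDataSink, MessageProcessor) is hypothetical, the HTTP sink only prints instead of making a real request, and the transform is a placeholder. It is not the code from the actual service.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction: one interface for delivering a payload somewhere.
public interface IDataSink
{
    void Send(string payload);
}

// Stand-in for an HTTP POST sink; a real one would wrap an HTTP client.
public sealed class HttpDataSink : IDataSink
{
    private readonly string _endpoint;

    public HttpDataSink(string endpoint) { _endpoint = endpoint; }

    public void Send(string payload)
    {
        Console.WriteLine($"POST {_endpoint}: {payload}");
    }
}

// In-memory sink, usable as a test double or as a placeholder for another transport.
public sealed class RecordingDataSink : IDataSink
{
    public List<string> Sent { get; } = new List<string>();

    public void Send(string payload) { Sent.Add(payload); }
}

// The processor only transforms; where the result goes is injected.
public sealed class MessageProcessor
{
    private readonly IDataSink _sink;

    public MessageProcessor(IDataSink sink) { _sink = sink; }

    public void Process(string message)
    {
        string transformed = message.Trim().ToUpperInvariant(); // placeholder transform
        _sink.Send(transformed);
    }
}
```

With this shape, moving a flow to a different transport means writing one new IDataSink implementation and changing what gets passed to the constructor, for example new MessageProcessor(new RecordingDataSink()). In a real service the IoC container would do that wiring.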

The processor doesn’t have a Single Responsibility. It has to carry the code for sinking the data via HTTP. As mentioned before, if we had to add Google PubSub, we would have to add that method to the template method base class and then wire it into the DataSinks dictionary. In fact, I did have to change the base class later to introduce batched calls instead of sending each message as a single request. That change then affected all data flows, when it could have been a new implementation of an interface plus an update to only the processors that use that interface.

While we could theoretically use extension methods to tack functionality onto the tuple type, those signatures would get rather hairy. We would also have to change every single extension method if we decided to add a new piece of data or a new function to our tuple. I think it’s safe to say that we can’t reliably satisfy the Open/Closed principle.
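To make the signature problem concrete, here is a rough sketch of what such extension methods could look like for the tuple shape stored in DataSinks. The class DataSinkTupleExtensions and the methods CanMap and Describe are hypothetical and were never part of the real code base.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Reflection;

// Hypothetical extension methods over the tuple shape used in DataSinks.
public static class DataSinkTupleExtensions
{
    // The full tuple type has to be spelled out in every signature.
    public static bool CanMap(
        this (string endpoint, Type payloadType, MethodInfo sender,
              ReadOnlyDictionary<Type, MethodInfo> mappers) sink,
        Type sourceType)
    {
        return sink.mappers.ContainsKey(sourceType);
    }

    // Adding a fifth item to the tuple would mean editing this signature too.
    public static string Describe(
        this (string endpoint, Type payloadType, MethodInfo sender,
              ReadOnlyDictionary<Type, MethodInfo> mappers) sink)
    {
        return $"{sink.payloadType.Name} -> {sink.endpoint}";
    }
}
```

Because the tuple's item list is its type, a tuple with one more item is a different type, and neither method would apply to it until its signature was updated.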

As for Liskov Substitution, we don’t really have a base class here. We have tuples. We’re not calling a method that is specified in a base class, and we could not wrap the functionality of whatever tuple function we considered to be the base without having to write a bunch of code to compose the different functions contained within our dictionary of tuples.

With the aforementioned difficulty sinking data to Google PubSub, we definitely don’t have Interface Segregation. We have a single way of doing things, and that is to POST a batch of data via HTTP. Instead of having a new client for Google PubSub, gRPC, Kafka, etc., you have to extend a base class to add this new functionality and may or may not have to tack more functions onto the tuple.

A function in a tuple is concrete. Sure, you can have different functions that slot into that tuple type, but you won’t have much in the way of abstraction if, let’s say, you wanted to mock the function for a unit test, so we’re definitely not satisfying the Dependency Inversion principle.

Conclusion

I started this article with a quote from Go To Statement Considered Harmful, a letter that E. Dijkstra wrote to the ACM. In the letter, Dijkstra points out the issues with using GOTO statements to control the flow of our programs, and advocates for the structured paradigm. Of course, with any tool there are trade-offs. We still use GOTOs today. Every continue or break that you put in a loop is a GOTO statement. As I emphasized earlier in the article, this syntactic sugar is useful for local variables or internally within an assembly. I would just be careful using it beyond that limited scope.