In response to last week’s post, several readers mentioned LINQ-like operators they had implemented themselves. I also had ideas for operators that would lead to neat solutions for some problems, so I decided to give it some thought and collect the most useful operators into a reusable library.

My goal was to include operators that are simple to use, but applicable to a broad range of problems. I left out operators that I thought were either too complicated to use, or too specific to a particular problem domain.

You can download the full source code of the library here (rename the file to ExtendedEnumerable.cs). Read on to find out what it contains.

ReadLinesFrom, WriteLinesTo – I/O in LINQ queries

LINQ is a great programming model for simple file-processing tasks. Treating a file as an enumerable of lines, we can filter, transform and analyze it using various LINQ operators. To support this use case, my library includes several operators to convert between streams and line enumerables. The two most general overloads are ReadLinesFrom and WriteLinesTo, which have the following signatures:

public static IEnumerable<string> ReadLinesFrom(TextReader reader)
public static void WriteLinesTo(
    this IEnumerable<string> lines, TextWriter writer)

However, in most cases you will want to use one of the more specific overloads, ReadLinesFromConsole, ReadLinesFromFile, WriteLinesToConsole and WriteLinesToFile. For example, the Grep method below reads a file, keeps only lines that contain a particular substring, and writes out the results into another file:

static void Grep(string inputFile, string outputFile, string substring)
{
    ExtendedEnumerable.ReadLinesFromFile(inputFile)
        .Where(line => line.Contains(substring))
        .WriteLinesToFile(outputFile);
}

Isn’t that neat?
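For readers who want to see what is going on underneath, here is a minimal sketch of how the two general overloads could be written. It is an illustration only and not necessarily the code in the downloadable file; the class name LineIoSketch is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch; the library's real implementation may differ.
public static class LineIoSketch
{
    // Validates eagerly, then returns a lazy sequence of lines.
    public static IEnumerable<string> ReadLinesFrom(TextReader reader)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        return ReadLinesIterator(reader);
    }

    private static IEnumerable<string> ReadLinesIterator(TextReader reader)
    {
        string line;
        // TextReader.ReadLine returns null once the input is exhausted.
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Enumerates the sequence once, writing each element on its own line.
    public static void WriteLinesTo(this IEnumerable<string> lines, TextWriter writer)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));
        if (writer == null) throw new ArgumentNullException(nameof(writer));
        foreach (string line in lines)
        {
            writer.WriteLine(line);
        }
    }
}
```

Because ReadLinesFrom is lazy, nothing is read until the query is enumerated, so the reader must stay open until WriteLinesTo (or another consuming operator) has finished.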

Generate – generate a sequence from a user delegate

In C# 2, generating arbitrary sequences became much more convenient than it was in C# 1. Instead of implementing two classes, one for the enumerable and one for the enumerator, you can implement a single method that yields items using the iterator block syntax (i.e. the yield statements).

However, I still try to avoid creating a method just to generate a simple sequence, particularly if I use that sequence only in one place in my program. The Generate operator below accepts a delegate which generates the sequence element by element. To signal the end of the sequence, the generator returns null.

Since value types cannot be null, we need one overload for reference types, and another overload that uses a nullable wrapper to handle value types:

public static IEnumerable<T> Generate<T>(Func<T> generator)
    where T : class
public static IEnumerable<T> Generate<T>(Func<Nullable<T>> generator)
    where T : struct

To give a usage example, the ReadLinesFromConsole operator I mentioned above could be implemented as follows:

public static IEnumerable<string> ReadLinesFromConsole()
{
    return ExtendedEnumerable.Generate(() => Console.ReadLine());
}

As another example, this code sample generates an infinite sequence of random integers:

Random rand = new Random();
var randomSeq = ExtendedEnumerable.Generate(() => (int?)rand.Next());

This Generate operator has two disadvantages. First, it cannot be used to generate sequences that contain null values, because null is the terminator of the sequence. Second, it is a bit annoying to have to use the cast in the value-type overload (see the cast to int? in the random-sequence example). These are minor disadvantages, though, and I much prefer using the Generate operator over implementing a new method each time I need to generate a simple sequence.
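To make the null-terminator convention concrete, here is a minimal sketch of how the two overloads might be implemented. The class name GenerateSketch and the private helper names are assumptions for this example; the signatures follow the ones shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class GenerateSketch
{
    // Reference types: a null result from the generator ends the sequence.
    public static IEnumerable<T> Generate<T>(Func<T> generator)
        where T : class
    {
        if (generator == null) throw new ArgumentNullException(nameof(generator));
        return ClassIterator(generator);
    }

    // Value types: a Nullable<T> without a value ends the sequence.
    public static IEnumerable<T> Generate<T>(Func<Nullable<T>> generator)
        where T : struct
    {
        if (generator == null) throw new ArgumentNullException(nameof(generator));
        return StructIterator(generator);
    }

    private static IEnumerable<T> ClassIterator<T>(Func<T> generator)
        where T : class
    {
        T item;
        while ((item = generator()) != null)
        {
            yield return item;
        }
    }

    private static IEnumerable<T> StructIterator<T>(Func<Nullable<T>> generator)
        where T : struct
    {
        Nullable<T> item;
        while ((item = generator()).HasValue)
        {
            yield return item.Value;
        }
    }
}
```

With this sketch, a finite countdown can be written as Generate(() => n > 0 ? (int?)n-- : null) for some local int n. Note that the delegate usually captures state, so enumerating the result a second time continues from wherever the state was left rather than starting over.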

As a side note, apparently Jon Skeet also looked at the problem of generating a sequence from a user’s delegate, and came up with a similar but slightly different solution, which you can find here.

ForEach – execute an action for each element in the sequence

As has been suggested by Magnus Martensson in a comment to my previous posting, as well as by others elsewhere, it is often neat to be able to specify an action at the end of the query using a ForEach operator, rather than having to iterate over the query in a foreach statement.

So, instead of this:

foreach (int x in Enumerable.Range(0,10).Where(i => (i % 2 == 0)).Take(5))
{
    Console.WriteLine(x);
}

You can write this:

Enumerable.Range(0,10).Where(i => (i % 2 == 0)).Take(5)
.ForEach(i => Console.WriteLine(i));
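A ForEach operator like the one used above can be sketched in a few lines. This is an illustration under the assumption that the operator is a plain extension method on IEnumerable<T>; the class name ForEachSketch is made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class ForEachSketch
{
    // Eagerly enumerates the source and runs the action on every element.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (action == null) throw new ArgumentNullException(nameof(action));
        foreach (T item in source)
        {
            action(item);
        }
    }
}
```

Unlike most LINQ operators, this one returns void and runs immediately, which is why it only makes sense at the very end of a query.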

Do – execute side effects in the middle of the query

Sometimes it is useful to add side effects in the middle of a query, rather than at the end. For example, we can log which elements have been processed at a particular stage of the query. The Do operator provides this functionality:

Enumerable.Range(0,10)
    .Do((e) => Console.WriteLine("Processing {0}", e))
    .Select(x => x*2).ToArray();
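Here is a minimal sketch of how a Do operator could work, assuming it is a lazy pass-through: each element triggers the action and is then handed on unchanged. The class name DoSketch is made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class DoSketch
{
    public static IEnumerable<T> Do<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (action == null) throw new ArgumentNullException(nameof(action));
        return DoIterator(source, action);
    }

    private static IEnumerable<T> DoIterator<T>(IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);      // side effect happens as the element flows through
            yield return item; // element is passed on unchanged
        }
    }
}
```

Under this design the side effects are deferred: nothing is logged until something downstream (such as ToArray) actually enumerates the query.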

Combine – combine two sequences

The Combine operator exists in various functional languages including F#, sometimes under the name Zip or ZipWith. It accepts two sequences as inputs, and combines their elements into a single sequence. So, the first element in sequence 1 and the first element in sequence 2 will be combined to produce the first element in the output sequence, and so forth. The function which combines an element from one sequence with an element from the other sequence is provided by the user. If one of the sequences is longer, the remaining elements in the longer sequence will be ignored.

To compute the pairwise sum between elements in seq1 and seq2, use the Combine operator like this:

IEnumerable<int> sumSeq = seq1.Combine(seq2, (a, b) => a + b);

As another example, to check whether a sequence of integers seq is increasing, use this query:

bool isIncreasing = seq.Combine(seq.Skip(1), (a, b) => a < b).All(x => x);
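The pairing behavior described above can be sketched by advancing two enumerators in lockstep. This is an illustration, not necessarily the library's code; the class name CombineSketch and the type parameter names are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class CombineSketch
{
    public static IEnumerable<TResult> Combine<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> combiner)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        if (combiner == null) throw new ArgumentNullException(nameof(combiner));
        return CombineIterator(first, second, combiner);
    }

    private static IEnumerable<TResult> CombineIterator<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> combiner)
    {
        using (IEnumerator<TFirst> e1 = first.GetEnumerator())
        using (IEnumerator<TSecond> e2 = second.GetEnumerator())
        {
            // Stops as soon as either sequence runs out, so extra
            // elements in the longer sequence are ignored.
            while (e1.MoveNext() && e2.MoveNext())
            {
                yield return combiner(e1.Current, e2.Current);
            }
        }
    }
}
```

Later versions of .NET (4.0 and up) include Enumerable.Zip, which has the same shape and the same stop-at-the-shorter-sequence behavior.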

ToStringPretty – convert a sequence to a delimited string

Converting a sequence to a nicely-formatted string is a bit of a pain. The String.Join method definitely helps, but unfortunately it accepts an array of strings, so it does not compose with LINQ very nicely.

My library includes several overloads of the ToStringPretty operator that hide the uninteresting code. Here is an example of use:

Console.WriteLine(Enumerable.Range(0, 10).ToStringPretty("From 0 to 9: [", ",", "]"));

The output of this program is:

From 0 to 9: [0,1,2,3,4,5,6,7,8,9]
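The "uninteresting code" in question is essentially a loop around a StringBuilder. A minimal sketch of the three-string overload might look like this; the class name ToStringPrettySketch is made up, and the real library code may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; the library's real implementation may differ.
public static class ToStringPrettySketch
{
    public static string ToStringPretty<T>(
        this IEnumerable<T> source, string before, string delimiter, string after)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        StringBuilder sb = new StringBuilder(before);
        bool first = true;
        foreach (T item in source)
        {
            // Emit the delimiter between elements, never before the first one.
            if (!first)
            {
                sb.Append(delimiter);
            }
            sb.Append(item); // StringBuilder.Append(object) calls ToString()
            first = false;
        }
        return sb.Append(after).ToString();
    }
}
```

An empty sequence simply produces the before and after strings back to back.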

FromEnumerator – convert an enumerator to an enumerable

Several times I have found myself in a situation where I have an enumerator, but really need an enumerable instead. There does not seem to be a simple way to do the conversion in .NET. Hence, my library of operators includes FromEnumerator, which accepts an enumerator and returns an enumerable.

This sample converts enumerator e1 into an enumerable and then iterates over it in a foreach statement:

foreach (int x in ExtendedEnumerable.FromEnumerator(e1)) { ... }

And this sample converts enumerator e2 into an enumerable to use it as a data source in a LINQ query:

var query = from x in ExtendedEnumerable.FromEnumerator(e2)
            where x % 2 == 0
            select x;
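A conversion like this can be sketched with a short iterator block. The sketch below is an assumption about how such an operator might be written, not the library's actual code, and the class name FromEnumeratorSketch is made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class FromEnumeratorSketch
{
    public static IEnumerable<T> FromEnumerator<T>(IEnumerator<T> enumerator)
    {
        if (enumerator == null) throw new ArgumentNullException(nameof(enumerator));
        return FromEnumeratorIterator(enumerator);
    }

    private static IEnumerable<T> FromEnumeratorIterator<T>(IEnumerator<T> enumerator)
    {
        // Drains the one underlying enumerator; there is no way to rewind it.
        while (enumerator.MoveNext())
        {
            yield return enumerator.Current;
        }
    }
}
```

One caveat of this approach: the resulting enumerable is single-pass. Every enumeration shares the same underlying enumerator, so a second foreach over it picks up where the first one stopped and typically yields nothing.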

Single – convert an item to an enumerable

As I mentioned in my previous posting, I have found converting a single item to an enumerable to be a fairly frequent operation. So, my library includes an operator for the conversion:

IEnumerable<int> e = ExtendedEnumerable.Single(5);
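An operator like this is about as small as an iterator block gets. Here is a minimal sketch, assuming a plain static method rather than an extension method; the class name SingleSketch is made up.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class SingleSketch
{
    // Wraps one item in a lazily produced one-element sequence.
    public static IEnumerable<T> Single<T>(T item)
    {
        yield return item;
    }
}
```

Since it is called through the class name, it does not clash with the standard Enumerable.Single operator, which does the opposite job of extracting the only element from a sequence.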

Shuffle – randomly shuffle a sequence

I find myself regularly re-implementing the Shuffle operator when I am testing my code. The Shuffle operator accepts a sequence and returns the same elements, randomly rearranged.

This example prints digits 0..9 in a random order:

Enumerable.Range(0, 10).Shuffle().WriteLinesToConsole();
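One common way to write such an operator is to buffer the input and run a Fisher-Yates shuffle over the buffer. The sketch below takes that approach; it is not necessarily what the library does. The class name ShuffleSketch and the overload that accepts a Random (handy for repeatable tests) are assumptions of this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the library's real implementation may differ.
public static class ShuffleSketch
{
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    // Assumed overload: lets the caller supply a seeded Random.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random random)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (random == null) throw new ArgumentNullException(nameof(random));
        return ShuffleIterator(source, random);
    }

    private static IEnumerable<T> ShuffleIterator<T>(IEnumerable<T> source, Random random)
    {
        // The whole input must be buffered before anything can be emitted.
        List<T> buffer = new List<T>(source);
        for (int i = buffer.Count - 1; i >= 0; i--)
        {
            // Pick a random element among those not yet emitted (indices 0..i),
            // emit it, and move the last remaining element into its slot.
            int j = random.Next(i + 1);
            yield return buffer[j];
            buffer[j] = buffer[i];
        }
    }
}
```

Passing a seeded generator, for example Shuffle(new Random(42)), gives a repeatable order within a given runtime, which is useful when a test failure needs to be reproduced.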

Comments and Conclusion

Again, the source code is available for download here. If there are operators that I haven’t included but that you think are useful, let me know in the comments!

10 Comments to “Extended LINQ: additional operators for LINQ to objects”

  1. [...] If you have your own bag of LINQ tricks, please share them in the comments! Also, if you like this article, you may like my next article, Extended LINQ: additional operators for LINQ to objects. [...]

  2. [...] Extended LINQ: additional operators for LINQ to objects – Igor Ostrovsky offers a collection of extension method operators for LINQ to objects [...]

  3. Marcel Popescu says:

    static void Grep(string inputFile, string outputFile, string substring)
    {
    ExtendedEnumerable.ReadLinesFromFile(inputFile)
    .Where(line => line.Contains(word))
    .WriteLinesToFile(outputFile);
    }

    It would be really useful if you only pasted code that actually compiles… you’re using word instead of substring here.

  4. Marcel: Good catch. Fixed.

    By the way, I may be misreading your tone, but it seems kind of harsh. Typos like this one are notoriously easy to make when writing an article. You paste in your code, but then have to reformat it later, move stuff around, rename variables… It is easy to make a mistake. Also, keep in mind that writing blog posts is something I do for fun on my weekends.

    Either way, I hope you enjoyed the article. Thanks for reading!

  5. Marcel Popescu says:

    Yes, the article was interesting, it’s just that it’s annoying when I see typos in code. I bet you find it annoying when you read other people’s articles / blogs too :) Sorry about the tone, though.

  6. Hal Pierson says:

    In conjunction with

    public static IEnumerable<string> ReadLinesFrom(TextReader reader)

    I often use

    public static IEnumerable<string[]>
    CSV(this IEnumerable<string> lines)

    That converts comma separated fields into a string[].

    By writing a class’ ToString() function to generate a comma separated string, and a constructor that takes a matching string[] parameter, I have a simple serialization/de-serialization mechanism.

    Hal Pierson

    P.S. The separator doesn’t have to be a comma. But you do have to have a strategy for string fields that could contain the separator.

  7. ryp says:

    Hi! Great article as usual.

    I have a question though. I am looking at: public static string ToStringPretty<T>(this IEnumerable<T> source, string before, string delimiter, string after)

    and cannot figure why it isn’t possible to change whole function to something like this:

    return before + source.Cast<string>().Aggregate((workingSentence, next) => next + " " + workingSentence) + after;

    In the debugger I receive the following error:

    System.InvalidCastException was unhandled
    Message="Unable to cast object of type 'System.Int32' to type 'System.String'."

    I know it’s not your code, but maybe you can help me to figure this out, and perhaps we can enhance your library.

    Thanks in advance.

  8. @ryp: Glad you like the article!

    Your implementation is more clever than the one I have. It is less efficient, but that could be fixed by rewriting the Aggregate to use a StringBuilder.

    To fix the error, you need to use a Select to convert each int to a string. The Cast operator does not work in this context: it performs a cast on each element rather than a conversion, and an int cannot be cast to a string.

    Try this:

    return before + source.Select(x => x.ToString()).Aggregate((workingSentence, next) => workingSentence + delimiter + next) + after;

  9. ryp says:

    @Igor Ostrovsky

    Thanks, I have changed it to:

    return source.Aggregate(new StringBuilder(before), (workingSentence, next) => workingSentence.Length > before.Length ? workingSentence.Append(delimiter).Append(next) : workingSentence.Append(next)).Append(after).ToString();
