Omer van Kloeten's .NET Zen : Tools: Linq Extensions News Feed 

Here we go with the third installment of the Extension Method Roundup. The reason behind these 'code dumps' is that LINQ is a central part of my coding and I always find new problems that I want elegant solutions to. Hope these prove as useful to you as they do to me.

Intersect / Union

Again, shorthand for when you have Enumerable of Enumerable of T and you simply want to intersect or union all of the enumerations in one single call.

public static IEnumerable<T> Intersect<T>(this IEnumerable<IEnumerable<T>> enumeration)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

IEnumerable<T> returnValue = null;

foreach (var e in enumeration)
{
// The first sequence seeds the result; every later one is intersected with it.
if (returnValue == null)
returnValue = e;
else
returnValue = returnValue.Intersect(e);
}

return returnValue;
}

public static IEnumerable<T> Union<T>(this IEnumerable<IEnumerable<T>> enumeration)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

IEnumerable<T> returnValue = null;

foreach (var e in enumeration)
{
// The first sequence seeds the result; every later one is unioned with it.
if (returnValue == null)
returnValue = e;
else
returnValue = returnValue.Union(e);
}

return returnValue;
}

AsNullable


I was always missing this method, to coincide with the Cast and OfType methods.

public static IEnumerable<T?> AsNullable<T>(this IEnumerable<T> enumeration)
where T : struct
{
return from item in enumeration
select new Nullable<T>(item);
}

GroupEvery


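This takes every count consecutive items from an enumeration and groups them into an array. Note that the final array is also allocated with count slots, so if the enumeration's length is not a multiple of count, its unused trailing slots hold default(T).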

public static IEnumerable<T[]> GroupEvery<T>(this IEnumerable<T> enumeration, int count)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

if (count <= 0)
throw new ArgumentOutOfRangeException("count");

int current = 0;
T[] array = new T[count];

foreach (var item in enumeration)
{
array[current++] = item;

if (current == count)
{
yield return array;
current = 0;
array = new T[count];
}
}

if (current != 0)
{
yield return array;
}
}

I've also gone and updated my LINQ Extensions project on CodePlex with everything I've published since the last update. You're welcome to download and fiddle with it. :)

As a rule of thumb, when you have two or more independent blocking I/O operations that target independent devices, it's best to run them in parallel on separate threads rather than wait for each synchronous operation to complete in turn. That way, executing operations O1, ..., On, which take T1, ..., Tn respectively, results in a total time T < T1 + ... + Tn, instead of T = T1 + ... + Tn.

Let's take a simple example that consists of the following two operations: The first tries to load all of the files in the My Pictures folder as assemblies (silly), while the other simulates some obscure database operation by calling Thread.Sleep (very silly).

var assemblies = (from file in new DirectoryInfo(@"C:\...\My Pictures").GetFiles("*.*")
let assembly = TryLoadAssembly(file.FullName)
where assembly != null
select assembly).ToArray();

// Do some long database work.
Thread.Sleep(1000);

The code executes synchronously in about two seconds: one second for the assembly-loading operation and another second holding the thread.


What we could do is enqueue a work item for the ToArray call before the database operation and complete the execution synchronously once we're done. I've coded a short-hand syntax for that:

var assemblies = from file in new DirectoryInfo(@"C:\...\My Pictures").GetFiles("*.*")
let assembly = TryLoadAssembly(file.FullName)
where assembly != null
select assembly;

assemblies = assemblies.Buffered();

// Do some long database work.
Thread.Sleep(1000);

// Force Load
assemblies = assemblies.ToArray();

Once the Buffered call is made, the deferred query begins to run in a separate thread, buffering items into memory while it waits for the query to be executed. Once the query is executed, the buffered items are immediately returned and the rest of the iteration completes synchronously. The above code takes approximately one second to run, as both operations run concurrently.


I've attached the code behind this for your consideration and will be adding it to my LINQ Extensions project on CodePlex once I get around to it... :)

public static class ExtensionMethods
{
/// <summary>
/// Asynchronously begins buffering an enumeration, even before it is lazy loaded.
/// </summary>
public static IEnumerable<T> Buffered<T>(this IEnumerable<T> enumeration)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

return new AsyncEnumerable<T>(enumeration);
}

private class AsyncEnumerable<T> : IEnumerable<T>
{
// Written by the consuming thread and read by the buffering thread, hence volatile.
private volatile bool shouldContinueBuffering;
private IEnumerator<T> enumerator;
private IAsyncResult asyncResult;
private List<T> buffer;
private Action bufferAction;
private object syncLock;

public AsyncEnumerable(IEnumerable<T> enumeration)
{
this.enumerator = enumeration.GetEnumerator();
this.shouldContinueBuffering = true;
this.buffer = new List<T>();
this.syncLock = new object();
this.bufferAction = this.Buffer;

this.asyncResult = this.bufferAction.BeginInvoke(null, null);
}

private void Buffer()
{
lock (this.syncLock)
{
// Continue buffering for as long as we can and while there are still items left.
while (this.shouldContinueBuffering && this.enumerator.MoveNext())
{
buffer.Add(enumerator.Current);
}
}
}

IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
this.shouldContinueBuffering = false;

// Wait for the last item buffered to finish.
lock (this.syncLock)
{
// End invocation so that exceptions from the buffering thread can be thrown here.
this.bufferAction.EndInvoke(this.asyncResult);
}

// Iterate over buffered items.
foreach (var item in buffer)
{
yield return item;
}

// Continue iterating from the point where we stopped buffering.
while (enumerator.MoveNext())
{
yield return enumerator.Current;
}
}

System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((IEnumerable<T>)this).GetEnumerator();
}
}
}


Hey hey hey! It's time for another Extension Methods Roundup! Here are some of the extension methods I've written since the last one:


Dictionary's Missing Remove Methods


public static void Remove<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, TValue value)
{
// Check to see that dictionary is not null
if (dictionary == null)
throw new ArgumentNullException("dictionary");

foreach (var key in (from pair in dictionary
where EqualityComparer<TValue>.Default.Equals(value, pair.Value)
select pair.Key).ToArray())
{
dictionary.Remove(key);
}
}

public static void RemoveRange<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, IEnumerable<TValue> values)
{
// Check to see that dictionary is not null
if (dictionary == null)
throw new ArgumentNullException("dictionary");

// Check to see that values is not null
if (values == null)
throw new ArgumentNullException("values");

foreach (var value in values.ToArray())
{
ExtensionMethods.Remove(dictionary, value);
}
}

public static void RemoveRange<TKey, TValue>(this IDictionary<TKey, TValue> dictionary, IEnumerable<TKey> keys)
{
// Check to see that dictionary is not null
if (dictionary == null)
throw new ArgumentNullException("dictionary");

// Check to see that keys is not null
if (keys == null)
throw new ArgumentNullException("keys");

foreach (var key in keys.ToArray())
{
dictionary.Remove(key);
}
}

String Aggregation



public static string Aggregate(this IEnumerable<string> enumeration, string separator)
{
return Aggregate(enumeration, str => str, separator);
}

public static string Aggregate<T>(this IEnumerable<T> enumeration, Func<T, string> toString, string separator)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

// Check to see that toString is not null
if (toString == null)
throw new ArgumentNullException("toString");

// Check to see that separator is not null or an empty string
if (string.IsNullOrEmpty(separator))
throw new ArgumentNullException("separator");

return enumeration.Aggregate(string.Empty,
(accum, item) => string.Format("{0}{1}{2}", accum, separator, toString(item)),
// Strip the leading separator; >= also covers a single empty-string item.
str => str.Length >= separator.Length ? str.Substring(separator.Length) : str);
}


Those are very good for when you want to create strings such as "a, b, c, d".
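
As a minimal sketch of how the seed, the accumulator and the result selector cooperate, the following standalone program applies the same three-argument Enumerable.Aggregate pattern to a fixed array. The class name JoinSketch and the helper JoinWith are made up for illustration and are not part of the Linq Extensions project; string.Join appears only as a built-in point of comparison.

```csharp
using System;
using System.Linq;

public static class JoinSketch
{
    // Hypothetical helper: same seed / accumulator / result-selector pattern as above.
    public static string JoinWith(string[] items, string separator)
    {
        return items.Aggregate(
            string.Empty,
            // Prefix every item with the separator...
            (accum, item) => accum + separator + item,
            // ...then strip the single leading separator, if any item was seen.
            str => str.Length >= separator.Length ? str.Substring(separator.Length) : str);
    }

    public static void Main()
    {
        string[] letters = { "a", "b", "c", "d" };

        Console.WriteLine(JoinWith(letters, ", "));     // a, b, c, d
        Console.WriteLine(string.Join(", ", letters));  // a, b, c, d
    }
}
```

The delegate-taking overload in the post earns its keep when the items are not strings to begin with, since string.Join would then need a Select first.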


LastOrDefault



public static T LastOrDefault<T>(this IList<T> list)
{
// Check to see that list is not null
if (list == null)
throw new ArgumentNullException("list");

if (list.Count == 0)
return default(T);

return list[list.Count - 1];
}


This is an optimized version of the original LastOrDefault for lists that allow random access.


At



public static T At<T>(this IEnumerable<T> enumeration, int index)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

return enumeration.Skip(index).First();
}

public static IEnumerable<T> At<T>(this IEnumerable<T> enumeration, params int[] indices)
{
return At(enumeration, (IEnumerable<int>)indices);
}

public static IEnumerable<T> At<T>(this IEnumerable<T> enumeration, IEnumerable<int> indices)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

// Check to see that indices is not null
if (indices == null)
throw new ArgumentNullException("indices");

int currentIndex = 0;

foreach (int index in indices.OrderBy(i => i))
{
while (currentIndex != index)
{
enumeration = enumeration.Skip(1);
currentIndex++;
}

yield return enumeration.First();
}
}


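At provides pseudo-random access to enumerable lists, where needed. Note that the multi-index overloads return the items in ascending index order, regardless of the order in which the indices were passed. I've found use for it in a couple of places which returned indices for non-IList<T> enumerations.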


SequenceEqual<T1, T2>



public static bool SequenceEqual<T1, T2>(this IEnumerable<T1> left, IEnumerable<T2> right, Func<T1, T2, bool> comparer)
{
using (IEnumerator<T1> leftE = left.GetEnumerator())
{
using (IEnumerator<T2> rightE = right.GetEnumerator())
{
bool leftNext = leftE.MoveNext(), rightNext = rightE.MoveNext();

while (leftNext && rightNext)
{
// If one of the items isn't the same...
if (!comparer(leftE.Current, rightE.Current))
return false;

leftNext = leftE.MoveNext();
rightNext = rightE.MoveNext();
}

// If left or right is longer
if (leftNext || rightNext)
return false;
}
}

return true;
}


This differs from the original SequenceEqual in that it is able to accept two different types of sequences.


AsIndexed



public static IEnumerable<KeyValuePair<int, T>> AsIndexed<T>(this IEnumerable<T> enumeration)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

int i = 0;

foreach (var item in enumeration)
{
yield return new KeyValuePair<int, T>(i++, item);
}
}


This is for when you need indices, but don't want the overhead of copying the enumeration into an array (T[]).
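
For comparison, here is a rough sketch of the same index/item pairing built on the indexed overload of Enumerable.Select that ships with the framework. The names IndexedSketch and Indexed are invented for this example; this is an alternative way to get the pairs, not the project's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedSketch
{
    // Hypothetical equivalent: Select hands each item to the lambda together with its index.
    public static IEnumerable<KeyValuePair<int, T>> Indexed<T>(IEnumerable<T> source)
    {
        return source.Select((item, index) => new KeyValuePair<int, T>(index, item));
    }

    public static void Main()
    {
        foreach (KeyValuePair<int, string> pair in Indexed(new[] { "x", "y", "z" }))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
        // Prints:
        // 0: x
        // 1: y
        // 2: z
    }
}
```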


The Missing SelectMany



public static IEnumerable<T> SelectMany<T>(this IEnumerable<IEnumerable<T>> source)
{
// Check to see that source is not null
if (source == null)
throw new ArgumentNullException("source");

foreach (var enumeration in source)
{
foreach (var item in enumeration)
{
yield return item;
}
}
}


Oh, come on! Why wasn't there a parameterless SelectMany in the framework? Oh well, here's one.


ToDictionary of IGrouping



public static Dictionary<TKey, IEnumerable<TElement>> ToDictionary<TKey, TElement>(
this IEnumerable<IGrouping<TKey, TElement>> enumeration)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

return enumeration.ToDictionary(item => item.Key, item => item.Cast<TElement>());
}


This is shorthand for when you want to create a dictionary from the result of GroupBy.


As I do from time to time, here is a batch of three Extension Methods I've written recently:

IndicesWhere

/// <summary>
/// Gets the indices where the predicate is true.
/// </summary>
public static IEnumerable<int> IndicesWhere<T>(this IEnumerable<T> enumeration, Func<T, bool> predicate)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

// Check to see that predicate is not null
if (predicate == null)
throw new ArgumentNullException("predicate");

int index = 0;

foreach (T item in enumeration)
{
if (predicate(item))
yield return index;

index++;
}
}


This is especially useful when you want to cache indices from an array, rather than the array itself. Here's an example:



var indicesWithValues = values.IndicesWhere(value => value != null);

TakeEvery



/// <summary>
/// Takes every 'hopLength'-th item, starting with the item at index 'startAt'.
/// </summary>
public static IEnumerable<T> TakeEvery<T>(this IEnumerable<T> enumeration, int startAt, int hopLength)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

int first = 0;
int count = 0;

foreach (T item in enumeration)
{
if (first < startAt)
{
first++;
}
else if (first == startAt)
{
yield return item;

first++;
}
else
{
count++;

if (count == hopLength)
{
yield return item;

count = 0;
}
}
}
}


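This yields the items at indices startAt, startAt + hopLength, startAt + 2 * hopLength and so on, which is equivalent to an unbounded series of Skip(startAt).Take(1).Skip(hopLength - 1).Take(1).Skip(hopLength - 1)... Useful for when you, for instance, need only every other item in a list (a hopLength of 2).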
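
To make the selected positions concrete, here is a small sketch that restates the same selection with the indexed overload of Enumerable.Where. The names TakeEverySketch and EveryNth are hypothetical, and the sketch assumes startAt >= 0 and hopLength > 0; it is an index-arithmetic illustration rather than the method above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeEverySketch
{
    // Hypothetical restatement: keep items at startAt, startAt + hopLength, startAt + 2 * hopLength, ...
    // Assumes startAt >= 0 and hopLength > 0.
    public static IEnumerable<T> EveryNth<T>(IEnumerable<T> source, int startAt, int hopLength)
    {
        return source.Where((item, index) =>
            index >= startAt && (index - startAt) % hopLength == 0);
    }

    public static void Main()
    {
        IEnumerable<int> numbers = Enumerable.Range(0, 10);

        // Every third item, starting at index 1.
        Console.WriteLine(string.Join(", ", EveryNth(numbers, 1, 3)));  // 1, 4, 7

        // Every other item, starting at index 0.
        Console.WriteLine(string.Join(", ", EveryNth(numbers, 0, 2)));  // 0, 2, 4, 6, 8
    }
}
```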


Distinct



It's really been pissing me off that there's no overload to Distinct that takes a delegate, which means I have to write a new class whenever my comparison isn't the default one. When talking about Anonymous Types, Distinct becomes useless. So here's an overload I can actually use:



private class EqualityComparer<T> : IEqualityComparer<T>
{
public Func<T, T, bool> Comparer { get; internal set; }
public Func<T, int> Hasher { get; internal set; }

bool IEqualityComparer<T>.Equals(T x, T y)
{
return this.Comparer(x, y);
}

int IEqualityComparer<T>.GetHashCode(T obj)
{
// No hashing capabilities. Default to Equals(x, y).
if (this.Hasher == null)
return 0;

return this.Hasher(obj);
}
}

/// <summary>
/// Gets distinct items by a comparer delegate.
/// </summary>
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> enumeration, Func<T, T, bool> comparer)
{
return Distinct(enumeration, comparer, null);
}

/// <summary>
/// Gets distinct items by comparer and hasher delegates (faster than only comparer).
/// </summary>
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> enumeration, Func<T, T, bool> comparer, Func<T, int> hasher)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

// Check to see that comparer is not null
if (comparer == null)
throw new ArgumentNullException("comparer");

return enumeration.Distinct(new EqualityComparer<T> { Comparer = comparer, Hasher = hasher });
}


I'll be integrating these methods into the Linq Extensions project soon enough. Good hunting. :)

Wednesday, April 23, 2008  |  From Omer van Kloeten's .NET Zen : Tools: Linq Extensions

Sometimes you want to use FirstOrDefault, but the default value of T is a valid value that might get returned. If you used FirstOrDefault, you wouldn't know whether the value that you got is a valid first or the default fallback. I use FirstOrFallback to explicitly specify which fallback value I want, rather than always use default(T):

public static T FirstOrFallback<T>(this IEnumerable<T> enumeration, T fallback)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

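// Reset() is not called: iterator-based enumerators throw NotSupportedException from it,
// and a fresh enumerator already starts before the first item.
using (IEnumerator<T> enumerator = enumeration.GetEnumerator())
{
if (enumerator.MoveNext())
return enumerator.Current;
}

return fallback;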
}


On a side note: I realize most of the extension methods I post here are pretty obvious, but I love sharing time-saving code. :)

Wednesday, April 23, 2008  |  From Omer van Kloeten's .NET Zen : Tools: Linq Extensions

I've updated my long-standing Linq Extensions project on CodePlex to .NET 3.5 RTM and added the latest extension methods.

You can open bugs and feature requests using the Issue Tracker. If you want to download it, go and download the source code directly.

When you enumerate over a Dictionary<TKey, TValue>, you get structs of type KeyValuePair<TKey, TValue>. Whenever you use the ToDictionary extension method, you are forced to specify how to get the key and value for each item, even if it's an enumeration of KeyValuePairs. Seems a bit redundant, doesn't it?

public static Dictionary<TKey, TValue> ToDictionary<TKey, TValue>(this IEnumerable<KeyValuePair<TKey, TValue>> enumeration)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

return enumeration.ToDictionary(item => item.Key, item => item.Value);
}


Well, that little bit of code settles that. :)

Here's another Linq method I wanted to share. This one takes two enumerations, enumeration and subset, and checks whether subset appears in enumeration as a contiguous run, in its original order. I used the naming convention set by SequenceEqual.

Examples:

  1. enumeration = (1, 2, 3, 4, 5); subset = (3, 4) ==> SequenceSuperset(enumeration, subset) = true
  2. enumeration = (1, 2, 3, 4, 5); subset = (6, 5) ==> SequenceSuperset(enumeration, subset) = false (6 does not exist in enumeration)
  3. enumeration = (1, 2, 3, 4, 5); subset = (3, 5) ==> SequenceSuperset(enumeration, subset) = false (3 is never immediately followed by 5 in enumeration)
  4. enumeration = (1, 2, 3, 4, 5); subset = (5, 4) ==> SequenceSuperset(enumeration, subset) = false (5 is never immediately followed by 4 in enumeration)

And here's the code:

public static bool SequenceSuperset<T>(this IEnumerable<T> enumeration, IEnumerable<T> subset)
{
return SequenceSuperset(enumeration, subset, EqualityComparer<T>.Default.Equals);
}

public static bool SequenceSuperset<T>(this IEnumerable<T> enumeration,
IEnumerable<T> subset,
Func<T, T, bool> equalityComparer)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

// Check to see that subset is not null
if (subset == null)
throw new ArgumentNullException("subset");

// Check to see that equalityComparer is not null
if (equalityComparer == null)
throw new ArgumentNullException("equalityComparer");

// Buffer both sequences. Reset() throws NotSupportedException on iterator-based
// enumerators, and a single forward pass cannot back up after a partial match.
T[] big = enumeration.ToArray();
T[] small = subset.ToArray();

// Try every window of big that is long enough to hold small.
for (int start = 0; start + small.Length <= big.Length; start++)
{
int matched = 0;

while (matched < small.Length && equalityComparer(big[start + matched], small[matched]))
matched++;

// Every item of small matched in order (an empty subset always matches).
if (matched == small.Length)
return true;
}

return false;
}

I've created a couple of useful extension methods that I like to use with Linq, so here they are:

ContainsAtLeast

I've noticed that there's no way to find out whether a collection has at least X items. The following takes the collection, tries to take X items from it and asks whether it succeeded or not:

public static bool ContainsAtLeast<T>(this IEnumerable<T> enumeration,
int count)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

return enumeration.Take(count).Count() == count;
}
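
One consequence of going through Take is that the check never walks further than count items. The sketch below illustrates this with an endless sequence; the names AtLeastSketch, Naturals and HasAtLeast are invented for the example and simply restate the Take/Count comparison used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AtLeastSketch
{
    // Hypothetical endless generator, used only to show that Take stops early.
    public static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
            yield return i;
    }

    // Same idea as ContainsAtLeast: take at most 'count' items and see whether that many arrived.
    public static bool HasAtLeast<T>(IEnumerable<T> source, int count)
    {
        return source.Take(count).Count() == count;
    }

    public static void Main()
    {
        Console.WriteLine(HasAtLeast(Naturals(), 5));      // True
        Console.WriteLine(HasAtLeast(new[] { 1, 2 }, 5));  // False
    }
}
```

Calling Count() directly on Naturals() would never return, which is why the Take comes first.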

AggregationOrDefault



When I need the first item in a collection, I love using First when I need an exception and FirstOrDefault when I don't. However, this doesn't exist for aggregations and if your collection is empty, Max, Min, etc. will throw an exception.



public static T AggregationOrDefault<T>(this IEnumerable<T> enumeration,
Func<IEnumerable<T>, T> aggregationMethod)
{
// Check to see that enumeration is not null
if (enumeration == null)
throw new ArgumentNullException("enumeration");

// Check to see that aggregationMethod is not null
if (aggregationMethod == null)
throw new ArgumentNullException("aggregationMethod");

if (!enumeration.ContainsAtLeast(1))
return default(T);

return aggregationMethod(enumeration);
}


What this does is take a collection and an aggregation method, check whether there's at least one item and, if so, apply the method. This is very useful, because most aggregation methods that take only one parameter (the collection) are not generic, so the alternative would be, just like in Linq itself, creating a copy of this method for every numeric type (as an example, see Max<T> compared to Max). Keep in mind that a non-empty enumeration is enumerated twice: once for the check and once for the aggregation.



How to use it? Simple:



myEnumeration.AggregationOrDefault(Enumerable.Max);


In this case, the Max method is passed to AggregationOrDefault.
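
Here is a compact, standalone sketch of the behavior on an empty and a non-empty int array. It restates the method in simplified form so that it compiles on its own: Any() stands in for ContainsAtLeast(1), and the class name AggregationSketch is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregationSketch
{
    // Simplified restatement of AggregationOrDefault; Any() replaces ContainsAtLeast(1).
    public static T AggregationOrDefault<T>(this IEnumerable<T> enumeration,
                                            Func<IEnumerable<T>, T> aggregationMethod)
    {
        if (enumeration == null)
            throw new ArgumentNullException("enumeration");

        if (aggregationMethod == null)
            throw new ArgumentNullException("aggregationMethod");

        return enumeration.Any() ? aggregationMethod(enumeration) : default(T);
    }

    public static void Main()
    {
        int[] values = { 3, 9, 4 };
        int[] empty = new int[0];

        Console.WriteLine(values.AggregationOrDefault(Enumerable.Max));  // 9
        Console.WriteLine(empty.AggregationOrDefault(Enumerable.Max));   // 0

        // Without the guard, Max on an empty int sequence throws.
        try
        {
            empty.Max();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Max threw on the empty array");
        }
    }
}
```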

Saturday, October 14, 2006  |  From Omer van Kloeten's .NET Zen : Tools: Linq Extensions

My new Linq Extensions project for the Linq May CTP is now up on CodePlex and I've released beta 1 (numbered 0.1).

It took me a while to understand how to work with it and the connection to the Team Foundation Server wasn't as fast as I had hoped, but now it's ready for use.

Currently implemented are the following:

  • Hierarchy
  • Reverse Hierarchy
  • Binary Tree
    • Infix Traversal
    • Prefix Traversal
    • Postfix Traversal
  • Graphs
    • Breadth-First Traversal
    • Depth-First Traversal
  • In

A FAQ and examples will be written soon, but it's pretty straightforward. You can open bugs and feature requests using the Issue Tracker. The release is downloadable here.

PS: The System.Query assembly is not signed (which I'm guessing is because this is only a CTP), so the DotNetZen.Query assembly is not signed either.

After reading Bill Wagner's excellent introduction to Linq, I really liked what I saw and decided that I wanted to create an extension method as an exercise. I sat and thought what I would have wanted from Linq and decided I wanted to write a method that lets you select hierarchical data.

Here's a mockup of what a query would look like:

from   node in tree.ByHierarchy<HierarchicalData>(startWith, connectBy)
select node;

Each 'node' in the query will be an object holding the original item, its level of depth in the hierarchy and its parent (for convenience's sake):

public class Node<T>
{
    public int Level;
    public Node<T> Parent;
    public T Item;
}

This node will be used to determine what the item is and where it's located.

Let's take an example:

Tree t7 = new Tree { Value = 7 };
Tree t6 = new Tree { Value = 6 };
Tree t5 = new Tree { Value = 5, Left = t6, Right = t7 };
Tree t4 = new Tree { Value = 4 };
Tree t3 = new Tree { Value = 3 };
Tree t2 = new Tree { Value = 2, Left = t3, Right = t4 };
Tree t1 = new Tree { Value = 1, Left = t2, Right = t5 };
 
Tree[] treeNodes = new Tree[] { t1, t2, t3, t4, t5, t6, t7 };
 
var nodes = from   node in treeNodes.ByHierarchy<Tree>(
                               t => t == t1,
                               (parent, child) => (parent.Left == child) ||
                                                  (parent.Right == child))
            select node;
 
foreach (LinqExtensions.Node<Tree> n in nodes)
{
    for (int i = 0; i < n.Level; i++) Console.Write(' ');
 
    Console.WriteLine("Node {0}, child of {1}.",
                      n.Item.Value,
                      (n.Parent != null ? n.Parent.Item.Value.ToString() : "no one"));
}
This example will walk the tree (which so happens to be a three-tiered complete binary tree) and print it in the order of hierarchy, indented by one space per level:

Node 1, child of no one.
 Node 2, child of 1.
  Node 3, child of 2.
  Node 4, child of 2.
 Node 5, child of 1.
  Node 6, child of 5.
  Node 7, child of 5.

And here's the code that does this:
using System;
using System.Collections.Generic;
using System.Query; // Linq May CTP namespace; it became System.Linq in .NET 3.5 RTM

namespace DotNetZen.Linq
{
    public static partial class LinqExtensions
    {
        public class Node<T>
        {
            internal Node()
            {
            }
 
            public int Level;
            public Node<T> Parent;
            public T Item;
        }
 
        public static IEnumerable<Node<T>> ByHierarchy<T>(
            this IEnumerable<T> source,
            Func<T, bool> startWith,
            Func<T, T, bool> connectBy)
        {
            return source.ByHierarchy<T>(startWith, connectBy, null);
        }

        
        private static IEnumerable<Node<T>> ByHierarchy<T>(
            this IEnumerable<T> source,
            Func<T, bool> startWith,
            Func<T, T, bool> connectBy,
            Node<T> parent)
        {
            int level = (parent == null ? 0 : parent.Level + 1);
 
            if (source == null)
                throw new ArgumentNullException("source");
 
            if (startWith == null)
                throw new ArgumentNullException("startWith");
 
            if (connectBy == null)
                throw new ArgumentNullException("connectBy");
 
            foreach (T value in from   item in source
                                where  startWith(item)
                                select item)
            {
                Node<T> newNode = new Node<T> { Level = level, Parent = parent, Item = value };
 
                yield return newNode;
 
                foreach (Node<T> subNode in source.ByHierarchy<T>(possibleSub => connectBy(value, possibleSub),
                                            connectBy, newNode))
                {
                    yield return subNode;
                }
            }
        }
    }
}


Last edited Apr 23, 2008 at 7:25 AM by ovanklot, version 3
