Tuesday, September 23, 2008

Read large text files using C#

Reading and manipulating the individual lines of a text file in a foreach loop isn't difficult. For example, you can load all the lines into a string array and then loop over the array, as follows:
using System;
using System.IO;
using System.Text.RegularExpressions;

string[] tempArr;
// The using block closes the stream even if reading throws (the same housekeeping as try/finally).
using (StreamReader sr = new StreamReader(@"c:\test.txt"))
{
    tempArr = Regex.Split(sr.ReadToEnd(), @"\r\n");
}

foreach (string str in tempArr)
    Console.WriteLine(str);
This approach does not work well for large files, because the code above reads the entire contents into memory. For larger files (say 20 MB or more), the program may crash or become very slow, depending on your processor and RAM. We can solve this memory problem and still use the foreach loop by creating a custom class, TextFileReader, that implements .NET's IEnumerable interface. The interface has only one method, GetEnumerator, which is expected to return an instance of another class that implements the IEnumerator interface.
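To make the two interfaces concrete, here is a minimal sketch of what a hand-written enumerator over the lines of a file might look like. The class name LineEnumerator is hypothetical and not part of the original post; it only illustrates the MoveNext / Current / Reset contract of IEnumerator.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical helper, for illustration only.
public class LineEnumerator : IEnumerator, IDisposable
{
    private readonly StreamReader reader;
    private string current;

    public LineEnumerator(string path)
    {
        reader = new StreamReader(path);
    }

    // The line most recently read by MoveNext.
    public object Current
    {
        get { return current; }
    }

    // Reads the next line; returns false at the end of the file.
    public bool MoveNext()
    {
        current = reader.ReadLine();
        return current != null;
    }

    // Rewinding is not supported in this sketch.
    public void Reset()
    {
        throw new NotSupportedException();
    }

    public void Dispose()
    {
        reader.Close();
    }
}
```

A foreach loop calls MoveNext and reads Current on each pass, so only one line is held in memory at a time.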

In .NET 2.0, the yield keyword lets us implement the IEnumerable interface without writing a separate enumerator class:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

public class TextFileReader : IEnumerable<string>, IDisposable
{
    // The StreamReader object
    StreamReader sr;

    public TextFileReader(string path)
    {
        sr = new StreamReader(path);
    }

    public void Dispose()
    {
        // Close the file stream
        if (sr != null)
        {
            sr.Close();
        }
    }

    // The IEnumerable<string> interface
    public IEnumerator<string> GetEnumerator()
    {
        while (sr.Peek() != -1)
        {
            // "yield return" hands back one line per iteration. The compiler
            // generates a state machine in IL, so the method retains its state
            // between calls and we do not need to maintain it in our own code.
            yield return sr.ReadLine();
        }

        Dispose();
    }

    // The non-generic IEnumerable interface
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
We can use the TextFileReader object as follows. The using block makes sure the file is closed even if the loop ends early.
using (TextFileReader TFR = new TextFileReader(@"c:\test.txt"))
{
    foreach (string str in TFR)
        Console.WriteLine(str);
}
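If the loop is left early (through break or an exception), the Dispose() call placed after the while loop in GetEnumerator is never reached, so the file stays open until the reader is disposed explicitly. One possible alternative, sketched here on the assumption that a static helper is acceptable, is an iterator method that wraps the reader in its own using block. The names LineReader and ReadLines are hypothetical and not part of the original class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original post.
public static class LineReader
{
    // Lazily yields one line at a time from the file at 'path'.
    public static IEnumerable<string> ReadLines(string path)
    {
        // foreach disposes the enumerator when the loop finishes or is left
        // early, which in turn runs the cleanup of this using block.
        using (StreamReader sr = new StreamReader(path))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

It is consumed the same way as TextFileReader: foreach (string str in LineReader.ReadLines(@"c:\test.txt")) Console.WriteLine(str);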
Happy coding.

-Rajesh Pillai

