Linq Database Experiment.

I've recently run an interesting experiment. I created a set of classes that expose files readable by the TextFieldParser as enumerable sequences, and in the case of delimited files they pull in the first record as column names. The significance of this is that I can now write Linq queries against these files and treat them like database tables. In addition to the flexibility of the TextFieldParser, I used a stream view on a memory mapped file to access the table files. This approach lets larger files be handled efficiently as long as the result set isn't large.

Below is an example application I wrote to test these classes. The Linq functionality combined with these classes gives us functionality similar to SQL select statements over structured files. On more than one occasion I've seen developers load information into a database and handle operations there just to ease development. I believe this technique lets developers work in a similar way without needing a full RDBMS.

I also think this simple application could be the basis for an actual database. The next step would be to add insert support and then create a source-to-source compiler that would take SQL statements and convert them into Linq statements that could be compiled by the C# compiler. The resulting assembly would take the place of an execution plan in a traditional RDBMS. This approach is similar to how Hive queries in Hadoop boil down to MapReduce jobs. This solution isn't designed to scale horizontally like big data solutions; instead it's designed to scale vertically, because I had what I like to call medium data in mind. The data sets I'm accustomed to are too small to warrant big data solutions but too large for inefficient ones.

With all of this in mind, there is one catch I've run across so far. Because I'm using a memory mapped file, the last record will have null characters at the end. The reason is that memory mapped files are allocated in pages, so the last page is allocated at the full page size even if the remaining portion of the file is smaller, which leaves blank (zeroed) memory at the end of the virtual file. I've added some code to my solution to remove the nulls on read, but the performance cost of this isn't known yet. At a minimum it makes each field operation take two passes instead of one, because the replace runs before the intended operation.

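To see the padding directly, here is a quick standalone check (the file name is just a placeholder; any small text file shows the effect) that prints the file's real length next to the length of the memory mapped view:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class PaddingCheck
{
    static void Main()
    {
        // "Users.csv" is only a placeholder; any small text file will do.
        long fileLength = new FileInfo("Users.csv").Length;

        using (var mmf = MemoryMappedFile.CreateFromFile("Users.csv", FileMode.Open))
        using (var view = mmf.CreateViewStream())
        {
            // The view is allocated in whole pages, so its length is rounded
            // up past the end of the file; the extra bytes read back as '\0'.
            Console.WriteLine("File length: {0}", fileLength);
            Console.WriteLine("View length: {0}", view.Length);
        }
    }
}

Here is the example application itself:
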
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace SchaeferDB_Console
{
    class Program
    {
        static void Main(string[] args)
        {
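            // Each CSV file is treated as a table; the first record in the
            // file supplies the column names used to index fields below.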
            var users = new SchaeferDB.Table("Users.csv");
            var roles = new SchaeferDB.Table("Roles.csv");

            users.Delimiters = new string[] { "," };
            users.TrimWhiteSpace = true;
            roles.Delimiters = new string[] { "," };
            roles.TrimWhiteSpace = true;

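            // Join the two files on Username, the same shape as an inner
            // join in a SQL select statement.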
            var query =
            from user in users
            join role in roles on user["Username"] equals role["Username"]
            select new { data = user["Username"] + "," + user["Email"] + "," + role["Name"] };
            var rows = query.ToList();

            foreach (var row in rows)
            {
                Console.WriteLine(row.data);
            }

            Console.ReadLine();

        }
    }
}
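
The example only shows the Table class from the outside. Here is a simplified sketch of the general shape such a class can take; the details differ from my actual classes and the member names are only placeholders, but it shows the idea: wrap a TextFieldParser around a view stream on a memory mapped file, read the first record as column names, and strip the trailing nulls as each field is read.

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
// Requires a project reference to the Microsoft.VisualBasic assembly.
using Microsoft.VisualBasic.FileIO;

namespace SchaeferDB
{
    // Sketch only: each row is exposed as a dictionary keyed by the column
    // names taken from the first record of the file.
    public class Table : IEnumerable<Dictionary<string, string>>
    {
        private readonly string path;

        public string[] Delimiters { get; set; }
        public bool TrimWhiteSpace { get; set; }

        public Table(string path)
        {
            this.path = path;
            Delimiters = new[] { "," };
            TrimWhiteSpace = true;
        }

        public IEnumerator<Dictionary<string, string>> GetEnumerator()
        {
            using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
            using (var stream = mmf.CreateViewStream())
            using (var parser = new TextFieldParser(stream))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.Delimiters = Delimiters;
                parser.TrimWhiteSpace = TrimWhiteSpace;

                // The first record supplies the column names.
                string[] columns = parser.ReadFields();

                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();
                    var row = new Dictionary<string, string>();
                    for (int i = 0; i < columns.Length && i < fields.Length; i++)
                    {
                        // The view stream is padded out to a page boundary, so
                        // the last record can carry trailing nulls; strip them
                        // here, at the cost of an extra pass per field.
                        row[columns[i]] = fields[i].TrimEnd('\0');
                    }
                    yield return row;
                }
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

In this shape every enumeration re-opens the file and streams through it with the parser, which is what keeps memory usage tied to the result set rather than to the size of the file.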
