I just returned from SQLBits 2016 and had a fantastic time. They really put on a great conference, with brilliant speakers on a variety of SQL topics (and increasingly NoSQL/Big Data). Throw in good catering and an amazing space-themed party, all for a very reasonable price, and you have a conference I really can’t recommend enough!


The talk that excited me most this year was an introduction to U-SQL by Michael Rys. U-SQL is a new querying/processing language that combines SQL and C#, allowing you to call C# code from within SQL-like queries. You can write C# code in-line in queries, or call it from referenced assemblies. Here’s a sample query from https://azure.microsoft.com/en-gb/documentation/articles/data-lake-analytics-u-sql-get-started/:

DROP FUNCTION IF EXISTS Searchlog;

CREATE FUNCTION Searchlog() 
RETURNS @searchlog TABLE
(
            UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int?,
            Urls            string,
            ClickedUrls     string
)
AS BEGIN
    @searchlog =
        EXTRACT UserId          int,
                Start           DateTime,
                Region          string,
                Query           string,
                Duration        int?,
                Urls            string,
                ClickedUrls     string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();
    RETURN;
END;

For the most part it looks like standard T-SQL, but you see that call to “Extractors.Tsv()” near the bottom? That’s C# code. In this case it’s an inbuilt function for U-SQL, but it could be any C# you wrote or referenced. You can also see some handy inbuilt features that allow it to easily extract data from files (the demo took them from an Azure Blob Storage account).
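The sample above only uses a built-in extractor, so here is a rough sketch of what in-line C# inside a query might look like. It is my own illustration rather than something from the talk or the tutorial: it assumes the Searchlog function above has already been created, that the data contains an "en-gb" region, and the output path is a made-up placeholder.

```usql
// Assumes the Searchlog() function defined above already exists.
@result =
    SELECT Region.ToUpper() AS Region,       // C# string method called in-line
           Query.Length AS QueryLength,      // C# string property
           (Duration ?? 0) AS Duration       // C# null-coalescing operator on the int? column
    FROM Searchlog() AS S
    WHERE Region == "en-gb";                 // C#-style equality, not T-SQL's single =

// Hypothetical output location, chosen only for illustration.
OUTPUT @result
TO "/output/SearchLogResult.tsv"
USING Outputters.Tsv();
```

The things to notice are the C# expressions sitting inside an otherwise SQL-shaped SELECT, and that comparisons follow C# rules (==) rather than T-SQL ones.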

U-SQL acts as Data Processing as a Service: you write your U-SQL query, submit it to Azure, then wait for the results. You pay per query, and Azure handles all infrastructure orchestration. You don’t need to set up any machines, and parallelization is handled automatically for you!

Microsoft are positioning this as a Big Data solution designed to be used with the new Azure Data Lake features, but I think it has the potential to be used in a wide variety of scenarios.

It’s very early days, but it’s something I’ll definitely be keeping my eye on.
