When I first put this site together, I was in a bit of a rush. I decided to keep a count of the number of visits for each article, but did it very simply: all I did was increment a database field called Views by one each time a page was requested. Consequently, the only metric I have is all-time page views. I wondered how I could get the number of page views within the last 7 or 30 days, and it crossed my mind to redesign the database and the code that tracks all this. Then I came across the Google Analytics Data Export API, which provides access to Google Analytics accounts. I have been using Google for my web stats since day one, so I set to work.
Using the API is not that difficult. It's a three-stage process:
- Log in via an HTTP POST request and obtain an authorisation key
- Query the stats via an HTTP GET request (passing the key as a header) and obtain a data feed
- Parse the feed for display
Digging around in the API reference, we see that the key to getting the right data is querying for the correct combination of Dimensions and Metrics. We also see that feeds are returned as XML. The reference site also provides a Query Explorer, with which you can test a few queries and see how the URLs for feed requests are constructed.
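To make the URL structure concrete, here is a minimal sketch that assembles a feed request URL from plain field names. The class and method (`FeedUrlSketch.Build`) are hypothetical; the endpoint and the parameter names (`ids`, `dimensions`, `metrics`, `start-date`, `end-date`, `sort`, `max-results`) are the ones used in the `PageViewReportUrl` constant later in this article. Unlike that constant, the sketch percent-encodes each value with `Uri.EscapeDataString`, which is an extra precaution on my part rather than something taken from the API reference.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class FeedUrlSketch
{
    // Feed endpoint used by the article's PageViewReportUrl constant.
    private const string FeedBase = "https://www.google.com/analytics/feeds/data";

    // Hypothetical helper: assembles a feed URL from plain GA field names.
    public static string Build(string profileId, IEnumerable<string> dimensions,
        IEnumerable<string> metrics, DateTime from, DateTime to, string sort, int maxResults)
    {
        // Field lists are comma-separated, each name carrying the "ga:" prefix.
        string dims = string.Join(",", dimensions.Select(d => "ga:" + d));
        string mets = string.Join(",", metrics.Select(m => "ga:" + m));

        var parameters = new (string Name, string Value)[]
        {
            ("ids", "ga:" + profileId),
            ("dimensions", dims),
            ("metrics", mets),
            ("start-date", from.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)),
            ("end-date", to.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)),
            ("sort", sort),
            ("max-results", maxResults.ToString(CultureInfo.InvariantCulture))
        };

        // Percent-encode each value so ':' and ',' are transmitted safely.
        string query = string.Join("&",
            parameters.Select(p => p.Name + "=" + Uri.EscapeDataString(p.Value)));
        return FeedBase + "?" + query;
    }
}
```

For example, `FeedUrlSketch.Build("12345", new[] { "pagePath", "pageTitle" }, new[] { "pageviews" }, from, to, "-ga:pageviews", 15)` describes the page view report this article builds, with `12345` standing in for a real profile ID.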
I could simply have written some procedural code that runs from top to bottom to get the job done quickly, which is what I initially did. However, heavily inspired by Jacob Reimers' most excellent Google Analytics Reader, I decided to build something more extensible so that I can add other reports more easily if required in the future.
I am going to start off with a couple of helper methods. One will make HTTP POST requests and the other will perform the same task using HTTP GET requests:
```csharp
using System.IO;
using System.Net;
using System.Text;

public static class HttpRequests
{
    // Sends a form-encoded POST and returns the response body as a string
    public static string HttpPostRequest(string url, string post)
    {
        byte[] data = Encoding.ASCII.GetBytes(post);
        WebRequest request = WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = data.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(data, 0, data.Length);
        }
        // Dispose the response as well as the reader so the connection is released
        using (WebResponse response = request.GetResponse())
        using (var sr = new StreamReader(response.GetResponseStream()))
        {
            return sr.ReadToEnd();
        }
    }

    // Sends a GET with optional "Name: value" headers and returns the response body
    public static string HttpGetRequest(string url, string[] headers)
    {
        WebRequest request = WebRequest.Create(url);
        if (headers != null)
        {
            foreach (var header in headers)
            {
                request.Headers.Add(header);
            }
        }
        using (WebResponse response = request.GetResponse())
        using (var sr = new StreamReader(response.GetResponseStream()))
        {
            return sr.ReadToEnd();
        }
    }
}
```
These are actually utility methods I have in a static class called HttpRequests and use a fair amount. They are fairly standard uses of the WebRequest and WebResponse classes.
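HttpGetRequest takes its headers as complete "Name: value" strings because `WebHeaderCollection.Add(string)` splits such a string at the first colon. A small sketch of that behaviour, with a hypothetical `HeaderSketch` helper that builds the same kind of `Authorization: GoogleLogin ...` header the reporter sends later in this article:

```csharp
using System;
using System.Net;

public static class HeaderSketch
{
    // Hypothetical helper: "authLine" is the "Auth=..." line returned by ClientLogin
    public static string AuthorizationHeader(string authLine)
    {
        return "Authorization: GoogleLogin " + authLine;
    }

    // Adds each "Name: value" string; Add(string) splits at the first ':'
    public static WebHeaderCollection ToCollection(string[] headers)
    {
        var collection = new WebHeaderCollection();
        foreach (var header in headers)
        {
            collection.Add(header);
        }
        return collection;
    }
}
```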
Now we need a way to manage the Dimensions and Metrics. Since they are constants, an Enumeration fits the bill nicely in each case, so I create a file called Dimensions.cs, and then simply add the following to it:
```csharp
// Member names match the GA dimension names (without the "ga:" prefix)
public enum Dimension
{
    pagePath,
    pageTitle
}
```
Then I do the same for Metrics:
```csharp
public enum Metric
{
    pageviews
}
```
I could at this stage have added all the dimensions and metrics that the Google reference lists, as Jacob has, but for the purposes of this exercise I decided to keep the content to the barest minimum. Having played with the Query Explorer I linked to earlier, I know these are all the ones I need at this stage for the page view data. The final Enumeration I need is for the sort direction I want to apply to the data. This one is very simple:
```csharp
public enum SortDirection
{
    Ascending,
    Descending
}
```
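Because concatenating an enum value with a string uses the member name, the enum members can be named exactly after the GA fields. A minimal sketch (the `SortParameterSketch` helper is hypothetical) of turning a Metric and a SortDirection into the value of the feed's sort parameter, using the same leading-minus convention that getXMLData applies further on:

```csharp
using System;

// Mirrors the article's enums; member names double as GA field names.
public enum Metric { pageviews }
public enum SortDirection { Ascending, Descending }

public static class SortParameterSketch
{
    // Hypothetical helper: builds the value of the feed's sort parameter.
    public static string ToSortParameter(Metric metric, SortDirection direction)
    {
        // Concatenating an enum with a string uses its member name, e.g. "pageviews".
        string field = "ga:" + metric;
        // A leading '-' requests descending order.
        return direction == SortDirection.Descending ? "-" + field : field;
    }
}
```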
Here's where things get a little more complicated. I want to work with strongly typed data, but the data that Google Analytics provides is in an XML document. Clearly that will need some work done on it. The other thing is that I might create additional reports in the future, so I need that strongly typed base data to be as generic as possible in the first instance. I add a class called BaseData as follows:
```csharp
using System.Collections.Generic;

public class BaseData
{
    public IEnumerable<KeyValuePair<Dimension, string>> Dimensions { get; set; }
    public IEnumerable<KeyValuePair<Metric, string>> Metrics { get; set; }
}
```
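To make that shape concrete, the sketch below builds a single BaseData row by hand, the way one entry of the page view report would look. The enums and BaseData mirror the declarations in this article so the block compiles on its own; the `BaseDataSketch` helper and the sample path, title and view count are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the article's types
public enum Dimension { pagePath, pageTitle }
public enum Metric { pageviews }

public class BaseData
{
    public IEnumerable<KeyValuePair<Dimension, string>> Dimensions { get; set; }
    public IEnumerable<KeyValuePair<Metric, string>> Metrics { get; set; }
}

public static class BaseDataSketch
{
    // One hypothetical row: two dimension pairs and one metric pair
    public static BaseData SampleRow()
    {
        return new BaseData
        {
            Dimensions = new List<KeyValuePair<Dimension, string>>
            {
                new KeyValuePair<Dimension, string>(Dimension.pagePath, "/Article/Details/1"),
                new KeyValuePair<Dimension, string>(Dimension.pageTitle, "Sample title")
            },
            Metrics = new List<KeyValuePair<Metric, string>>
            {
                // Values stay strings until a specific report converts them
                new KeyValuePair<Metric, string>(Metric.pageviews, "42")
            }
        };
    }

    // Looks up a dimension value by key; First() throws if the key is absent
    public static string GetDimension(BaseData row, Dimension key)
    {
        return row.Dimensions.First(d => d.Key == key).Value;
    }
}
```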
Each item of BaseData can have a collection of key/value pairs containing Dimensions (page titles, page paths etc.) and their corresponding values as strings, and a collection of key/value pairs containing Metrics and their corresponding values as strings. In the case of the report I want, the Dimensions property will hold two pairs (pagePath and pageTitle), whereas the Metrics property will hold one (pageviews). This approach lends itself nicely to reuse, though, as future reports may need many more dimensions or metrics. At the risk of running ahead of myself, here's how that actually looks when BaseData is generated and viewed in the Locals window of the Visual Studio debugger:
However, before we get there, we have to generate the data. This requires the following steps:
- Get authenticated
- Get the XML document
- Convert it into BaseData objects
All of this is going to be the job of one class, which I have called GAReporter. It will have three methods: one to get authenticated by Google, one to obtain the raw data, and one to convert that into BaseData items. To begin with, I declare a number of string constants:
```csharp
// Replace the xxx placeholders with your own account details
private const string AuthenticationUrl = "https://www.google.com/accounts/ClientLogin";
private const string AuthenticationPost =
    "accountType=GOOGLE&Email=xxx&Passwd=xxx&service=analytics&source=xxx-xxx";
private const string PageViewReportUrl =
    "https://www.google.com/analytics/feeds/data?ids={0}&dimensions={1}&metrics={2}" +
    "&start-date={3}&end-date={4}&sort={5}&max-results={6}";
```
Not all reports need as many parameters as the Page View report, and additional reports may require additional URL constants in this class. If you fiddle about with the Query Explorer, you can soon see which reports follow similar patterns, and set your constants up accordingly. The first two constants stay the same regardless; all you need to do is add your own account details. I am using the ClientLogin method of authorisation. For the source parameter, I provide mikesdotnetting-mikesdotnetting-1.0.
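One caveat with AuthenticationPost: the credentials are pasted into the body as-is, so a password containing characters such as `&`, `+` or `%` would corrupt the form data. A minimal sketch, assuming you would rather build the body from separate values, that percent-encodes each name and value with `Uri.EscapeDataString`. `LoginBodySketch` and all of the sample values are hypothetical.

```csharp
using System;
using System.Linq;

public static class LoginBodySketch
{
    // Hypothetical helper: joins name/value pairs into an
    // application/x-www-form-urlencoded body, percent-encoding names and values.
    public static string Build(params (string Name, string Value)[] fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Name) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

The result could then be passed as the `post` argument of HttpPostRequest in place of the hard-coded constant.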
The authentication method is private, since it's only used within the class. All it needs to do is return a string:
```csharp
private static string Authentication()
{
    // Returns the "Auth=..." line from the ClientLogin response, or an empty string
    string key = string.Empty;
    string result = HttpRequests.HttpPostRequest(AuthenticationUrl, AuthenticationPost);
    var tokens = result.Split(new[] { "\n" }, StringSplitOptions.RemoveEmptyEntries);
    foreach (var item in tokens)
    {
        if (item.StartsWith("Auth="))
        {
            key = item;
            break;
        }
    }
    return key;
}
```
You can see that this method makes use of the HttpPostRequest helper method I talked about earlier. When you log in successfully, the response consists of three name=value lines (SID, LSID and Auth). You need the one that starts with "Auth=", because this value has to be passed back to Google when the report data is requested. The next private method takes care of requesting and returning the XML report data:
```csharp
private static XDocument getXMLData(string account, IEnumerable<Dimension> dimensions,
    IEnumerable<Metric> metrics, DateTime from, DateTime to, Metric sort,
    SortDirection direction, int maxrecords)
{
    XDocument doc = null;
    var key = Authentication();
    if (!string.IsNullOrEmpty(key))
    {
        // Build comma-separated "ga:"-prefixed lists of dimensions and metrics
        var dimension = new StringBuilder();
        for (var i = 0; i < dimensions.Count(); i++)
        {
            dimension.Append("ga:" + dimensions.ElementAt(i));
            if (i < dimensions.Count() - 1) dimension.Append(",");
        }
        var metric = new StringBuilder();
        for (var i = 0; i < metrics.Count(); i++)
        {
            metric.Append("ga:" + metrics.ElementAt(i));
            if (i < metrics.Count() - 1) metric.Append(",");
        }
        // A leading "-" asks for a descending sort
        var sorter = "ga:" + sort;
        if (direction == SortDirection.Descending) sorter = "-" + sorter;
        var fromDate = from.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture);
        var toDate = to.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture);
        var url = string.Format(PageViewReportUrl, "ga:" + account, dimension, metric,
            fromDate, toDate, sorter, maxrecords);
        // The Auth line is sent back in the Authorization header
        var header = new[] { "Authorization: GoogleLogin " + key };
        doc = XDocument.Parse(HttpRequests.HttpGetRequest(url, header));
    }
    return doc;
}
```
This method is responsible for obtaining the XML document from Google. Having obtained the authentication key from the Authentication() method, it loops through the collections of Dimensions and Metrics that are passed into it to build comma-separated lists of "ga:"-prefixed names. Together with the other parameters, these are used to construct a valid URL whose query string describes the data I want, and the key is sent along in an Authorization header.
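The two StringBuilder loops in getXMLData produce comma-separated, "ga:"-prefixed lists. If you prefer, `string.Join` does the same in one line. The sketch below (`JoinSketch` is a hypothetical name) keeps a loop version alongside so the two can be checked against each other; the loop copies the items into a list first, which avoids re-enumerating the sequence on every `Count()` and `ElementAt()` call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Mirrors the article's enum
public enum Dimension { pagePath, pageTitle }

public static class JoinSketch
{
    // Loop version, equivalent to the one in getXMLData
    public static string WithLoop<T>(IEnumerable<T> items)
    {
        var list = items.ToList(); // enumerate the source only once
        var sb = new StringBuilder();
        for (var i = 0; i < list.Count; i++)
        {
            sb.Append("ga:" + list[i]);
            if (i < list.Count - 1) sb.Append(",");
        }
        return sb.ToString();
    }

    // One-line equivalent using string.Join
    public static string WithJoin<T>(IEnumerable<T> items)
    {
        return string.Join(",", items.Select(item => "ga:" + item));
    }
}
```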
The final method is the public one, which calls the getXMLData() method. It is responsible for taking the XDocument object returned by getXMLData() and parsing it before returning a collection of BaseData objects:
```csharp
public static IEnumerable<BaseData> GetBaseData(string account, IEnumerable<Dimension> dimensions,
    IEnumerable<Metric> metrics, DateTime from, DateTime to, Metric sort,
    SortDirection direction, int maxrecords)
{
    IEnumerable<BaseData> data = null;
    XDocument xml = getXMLData(account, dimensions, metrics, from, to, sort, direction, maxrecords);
    if (xml != null)
    {
        // Entries live in the feed's default namespace; GA-specific elements use "dxp"
        XNamespace dxp = xml.Root.GetNamespaceOfPrefix("dxp");
        XNamespace dns = xml.Root.GetDefaultNamespace();
        data = xml.Root.Descendants(dns + "entry").Select(element => new BaseData
        {
            Dimensions = new List<KeyValuePair<Dimension, string>>(
                element.Elements(dxp + "dimension").Select(
                    dimensionElement => new KeyValuePair<Dimension, string>(
                        dimensionElement.Attribute("name").Value.Replace("ga:", "")
                            .ParseEnum<Dimension>(),
                        dimensionElement.Attribute("value").Value))),
            Metrics = new List<KeyValuePair<Metric, string>>(
                from metricElement in element.Elements(dxp + "metric")
                select new KeyValuePair<Metric, string>(
                    metricElement.Attribute("name").Value.Replace("ga:", "")
                        .ParseEnum<Metric>(),
                    metricElement.Attribute("value").Value))
        });
    }
    return data;
}
```
There are a lot of angle brackets in here, but it isn't as complex as it looks. It will probably help to see a snippet of the XML provided by Google that this code is actually working on:
I've highlighted one entry element. If you look at it, you can see that it contains elements prefixed with dxp:, namely dxp:dimension and dxp:metric. The LINQ to XML code in the method targets these elements. xml.Root.Descendants(dns + "entry") returns a collection of <entry> nodes. Within each of those nodes, the dxp:dimension nodes are selected. The value of each one's name attribute (minus the leading "ga:") is parsed into a Dimension, which becomes the key of a key/value pair, and the value of its value attribute becomes the string value of that pair. This continues until all the dxp:dimension nodes have been exhausted, building up a List of key/value pairs. Then the dxp:metric nodes are subjected to the same treatment, and each entry ends up as one BaseData object.
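A hand-written feed is easier to experiment with than a live request, so the sketch below runs the same LINQ to XML steps against a small sample. The XML is a simplified, hypothetical stand-in with the same entry, dxp:dimension and dxp:metric layout. The namespace URIs are my assumption, but as in GetBaseData the code reads both namespaces from the document with `GetNamespaceOfPrefix` and `GetDefaultNamespace`, so it doesn't depend on them. To keep the block short, it collects each entry's pairs into a `Dictionary<string, string>` rather than the enum-keyed BaseData.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FeedParsingSketch
{
    // Hypothetical, trimmed-down feed shaped like a Data Export API response
    public const string SampleFeed = @"<feed xmlns='http://www.w3.org/2005/Atom' xmlns:dxp='http://schemas.google.com/analytics/2009'>
  <entry>
    <dxp:dimension name='ga:pagePath' value='/Article/Details/1'/>
    <dxp:dimension name='ga:pageTitle' value='First article'/>
    <dxp:metric name='ga:pageviews' type='integer' value='120'/>
  </entry>
  <entry>
    <dxp:dimension name='ga:pagePath' value='/Article/Details/2'/>
    <dxp:dimension name='ga:pageTitle' value='Second article'/>
    <dxp:metric name='ga:pageviews' type='integer' value='80'/>
  </entry>
</feed>";

    // One dictionary per <entry>: field name (without "ga:") mapped to its value
    public static List<Dictionary<string, string>> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XNamespace dxp = doc.Root.GetNamespaceOfPrefix("dxp");
        XNamespace dns = doc.Root.GetDefaultNamespace();

        return doc.Root.Descendants(dns + "entry")
            .Select(entry => entry.Elements(dxp + "dimension")
                .Concat(entry.Elements(dxp + "metric"))
                .ToDictionary(
                    e => e.Attribute("name").Value.Replace("ga:", ""),
                    e => e.Attribute("value").Value))
            .ToList();
    }
}
```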
If you are familiar with Enumerations, you might be wondering what that ParseEnum<T>() method is all about. If you are not, and you copy and paste this code as-is, you will definitely wonder why the compiler complains about it. It's an extension method I use to wrap the Enum.Parse() method:
```csharp
// Extension methods must be declared in a non-generic static class
public static T ParseEnum<T>(this string token)
{
    return (T)Enum.Parse(typeof(T), token);
}
```
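`Enum.Parse` throws an `ArgumentException` when the token isn't a member name, so if a name returned by Google ever differed from the enum member, GetBaseData would stop with an exception. If you would rather skip or log such values, a non-throwing variant built on `Enum.TryParse` could look like the sketch below (the `TryParseEnum` name is hypothetical). It ignores case and also rejects numeric strings that don't correspond to a defined member, which `Enum.TryParse` on its own would accept.

```csharp
using System;

// Mirrors the article's enum
public enum Dimension { pagePath, pageTitle }

public static class EnumParsingSketch
{
    // Returns false instead of throwing when the token is not a defined member
    public static bool TryParseEnum<T>(this string token, out T value) where T : struct
    {
        return Enum.TryParse(token, true, out value) && Enum.IsDefined(typeof(T), value);
    }
}
```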
That takes care of all the base methods and classes. Now I need a specific class for the values in the Page Views report:
```csharp
public class PageViewReportData
{
    public string Url { get; set; }
    public string Title { get; set; }
    public int Views { get; set; }
}
```
And a method to generate it:
```csharp
public class PageViewReporter
{
    public static IEnumerable<PageViewReportData> GetPageViewReport(string account,
        DateTime from, DateTime to, int max)
    {
        var dims = new[] { Dimension.pagePath, Dimension.pageTitle };
        var mets = new[] { Metric.pageviews };
        var sort = Metric.pageviews;
        var order = SortDirection.Descending;
        IEnumerable<BaseData> data = GAReporter.GetBaseData(account, dims, mets, from, to, sort, order, max);
        // GetBaseData returns null if authentication failed
        if (data == null) return Enumerable.Empty<PageViewReportData>();
        // Project each generic row onto the strongly typed report class
        return data.Select(d => new PageViewReportData
        {
            Url = d.Dimensions.First(dim => dim.Key == Dimension.pagePath).Value,
            Title = d.Dimensions.First(dim => dim.Key == Dimension.pageTitle).Value,
            Views = Convert.ToInt32(d.Metrics.First(met => met.Key == Metric.pageviews).Value)
        });
    }
}
```
Like Jacob, I put this method in its own class. If I want different page view reports that require more Dimensions, for example, I can simply add another method to this class, keeping similar reports together. When you call this method, whether in an MVC controller action or a code-behind, you get back a strongly typed collection of objects which can be passed to a view as its Model in MVC, or simply bound to a control in a Web Form:
One final thing: if you plan to show Google report data on a public page of your web site alongside "local" data, I would advise using JavaScript to load it asynchronously after the rest of the page has rendered. Otherwise, waiting for the remote data can delay the rendering of your page considerably. On this site, I have used jQuery to call a controller action called GetGoogleData, which returns a partial view:
```csharp
public ActionResult GetGoogleData(int days)
{
    // Report window: from 'days' ago up to now
    DateTime fromDate = DateTime.Now.AddDays(-days);
    DateTime toDate = DateTime.Now;
    IEnumerable<PageViewReportData> data =
        PageViewReporter.GetPageViewReport("xxxxxx", fromDate, toDate, 15);
    return PartialView("VisitorStatsPartial", data);
}
```
And the jQuery that shows a "loading" image before populating the div earmarked for the stats data is as follows:
```javascript
$(document).ready(function() {
    // Show a loading image while the stats are fetched
    $("#analyticsdata7").html("<img src=\"../../images/loading.gif\" />");
    $.ajax({
        type: "GET",
        dataType: "html",
        url: "/Article/GetGoogleData/7",
        success: function(response) {
            // Replace the loading image with the rendered partial view
            $("#analyticsdata7").html(response);
        }
    });
});
```
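GetGoogleData works out its reporting window from DateTime.Now, and the earlier date has to be the one passed as the from argument. Below is a sketch of that calculation pulled into a hypothetical `DateRangeSketch` helper. It takes the current time as a parameter so it can be checked without relying on the clock, and it drops the time of day, since the feed only receives yyyy-MM-dd dates anyway.

```csharp
using System;

public static class DateRangeSketch
{
    // Returns the window covering the last 'days' days, ending on the date of 'now'
    public static (DateTime From, DateTime To) LastDays(int days, DateTime now)
    {
        if (days < 0) throw new ArgumentOutOfRangeException(nameof(days));
        return (now.Date.AddDays(-days), now.Date);
    }
}
```

In the action, `DateRangeSketch.LastDays(days, DateTime.Now)` would supply the two dates for GetPageViewReport.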
Summary
If you look at the code that Jacob Reimers provides, you will see that mine doesn't deviate very much from it in terms of structure. That's because it is nice and solid, and allows for extensibility. What I hope I have added to it in this article is a detailed explanation of how it works so that you can extend it as you like.