Manipulating XML Google Merchant Data using C# and LINQ

Receiving a Google Merchant Data feed (also known as a Google Product Feed) can be fairly easily manipulated on import time using a little C# and LINQ.

The feed is basically a XML RSS 2.0 feed with some added properties using the namespace xmlns:g=”http://base.google.com/ns/1.0.

These feeds often comes from older systems and data is created by busy merchants, so data can be relatively dirty, and a cleanup is required before you add them to your product database.

The feed could look like this:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    <channel>
        <title>Google product feed</title>
        <link href="https://pentia.dk" rel="alternate" type="text/html"/>
        <description>Google product feed</description>
        <item>
            <g:id><![CDATA[1123432]]></g:id>
            <title><![CDATA[Some product]]></g:title>
            <link><![CDATA[https://pentia.dk]]></g:link>
            <g:description><![CDATA[description]]></g:description>
            <g:gtin><![CDATA[5712750043243446]]></g:gtin>
            <g:mpn><![CDATA[34432-00]]></g:mpn>
            <g:image_link><![CDATA[https://pentia.dk/someimage.jpg]]></g:image_link>
            <g:product_type><![CDATA[Home &gt; Dresses &gt; Maxi Dresses]]></g:product_type>
            <g:condition><![CDATA[new]]></g:condition>
            <g:availability><![CDATA[in stock]]></g:availability>
            <g:price><![CDATA[15.00 USD]]></g:price>
            <g:sale_price><![CDATA[10.00 USD]]></g:sale_price>
        </item>
        ...
        ...
    </channel>
</rss>

See the full specification in the Google Merchant Center help.

Sometimes the feed would contain content that you does not need, and a little XML manipulation is required.

But first thing first:

STEP 1: GET THE XML FEED AND CONVERT IT INTO AN XML DOCUMENT

using System;
using System.Net;
using System.Net.Http;
using System.Xml;
using System.Xml.Linq;
using System.Linq;
using System.Dynamic;

private static HttpClient _httpClient = new HttpClient();

public static async Task<string> GetFeed(string url)
{
  using (var result = await _httpClient.GetAsync($"{url}"))
  {
    string content = await result.Content.ReadAsStringAsync();
    return content;
  }
}

public static void Run()
{
  // Get the RSS 2.0 XML data
  string feedData = GetData("https://url/thefeed.xml").Result;

  // Convert the data into an XDocument
  var document = XDocument.Parse(feedData);
  // Speficy the Google namespace
  XNamespace g = "http://base.google.com/ns/1.0";
  // Get a list of all "item" nodes
  var items = document.Descendants().Where(node =&amp;gt; node.Name == "item");
    
  // Now we are ready to manipulate
  // ...
  // ...
}

NOW TO THE MANIPULATIONS:

EXAMPLE 1: Remove duplicates – all products with the same ID is removed:

items.GroupBy(node => node.Element(g+"id").Value)
  .SelectMany(node => node.Skip(1))
  .Remove();

EXAMPLE 2: Remove all products out of stock:

items = document.Descendants()
  .Where(node => node.Name == "item" 
         && node.Descendants()
         .Any(desc => desc.Name == g + "availability" 
              && desc.Value == "out of stock"));
items.Remove();

EXAMPLE 3: Remove adverts not on sale (all adverts that do not have a g:sale_price node)

items = document.Descendants()
  .Where(node => node.Name == "item" 
         && node.Descendants()
         .Any(desc => desc.Name == g + "sale_price" 
         && desc.Value.Trim() == string.Empty));
items.Remove();

EXAMPLE 4: ADD TRACKING PARAMETERS TO URL’S (adding query string parameters to the URL)

var items = document.Descendants().Where(node => node.Name == "item");
foreach (var item in items)
{
  string url = item.Element("link").Value;
  if (url.Contains("?"))
    item.Element("link").ReplaceNodes(new XCData(url + "&" + "utm_source=s&utm_medium=m&utm_campaign=c"));
  else  
    item.Element("link").ReplaceNodes(new XCData(url + "?" + "utm_source=s&utm_medium=m&utm_campaign=c"));
}

EXAMPLE 5: CHANGE THE TITLE (for example, if the feed contains used products, you might want to add the word “used” to the title

var items = document.Descendants().Where(node => node.Name == "item");
foreach (var item in items)
{
  var title = "USED " + item.Element("title").Value;
  item.Element("title").ReplaceNodes(title);
}

…AND THE EXOTIC EXAMPLE: COMBINE ALL PRODUCTS IF THEY BELONG TO A PRODUCT_TYPE THAT CONTAIN LESS THAN 2 PRODUCTS

foreach(var group in items.GroupBy(node => node.Element(g+"product_type").Value))
{
  if (group.Count() <= 2)
  {
    foreach (var advert in group)
    {
      advert.Element(g+"product_type").ReplaceNodes(new XCData("Other"));
    }
  }
}

Finally you can grab the manipulated document and do what you need to do:

// Grab converted content
string convertedFeedData = document.ToString();

I hope this gives some examples on how to do much with less code.

MORE TO READ:

About briancaos

Developer at Pentia A/S since 2003. Have developed Web Applications using Sitecore Since Sitecore 4.1.
This entry was posted in .net, .NET Core, c#, General .NET and tagged , , , , , . Bookmark the permalink.

1 Response to Manipulating XML Google Merchant Data using C# and LINQ

  1. Pingback: C# Working with Namespaces in XDocument and XML | Brian Pedersen's Sitecore and .NET Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.