A Google Merchant Data feed (also known as a Google Product Feed) can be manipulated fairly easily at import time using a little C# and LINQ.
The feed is basically an XML RSS 2.0 feed with some added properties that use the namespace xmlns:g="http://base.google.com/ns/1.0".
These feeds often come from older systems, and the data is created by busy merchants, so it can be relatively dirty and usually needs a cleanup before you add the products to your product database.
The feed could look like this:
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Google product feed</title>
    <link href="https://pentia.dk" rel="alternate" type="text/html"/>
    <description>Google product feed</description>
    <item>
      <g:id><![CDATA[1123432]]></g:id>
      <title><![CDATA[Some product]]></title>
      <link><![CDATA[https://pentia.dk]]></link>
      <g:description><![CDATA[description]]></g:description>
      <g:gtin><![CDATA[5712750043243446]]></g:gtin>
      <g:mpn><![CDATA[34432-00]]></g:mpn>
      <g:image_link><![CDATA[https://pentia.dk/someimage.jpg]]></g:image_link>
      <g:product_type><![CDATA[Home > Dresses > Maxi Dresses]]></g:product_type>
      <g:condition><![CDATA[new]]></g:condition>
      <g:availability><![CDATA[in stock]]></g:availability>
      <g:price><![CDATA[15.00 USD]]></g:price>
      <g:sale_price><![CDATA[10.00 USD]]></g:sale_price>
    </item>
    ...
    ...
  </channel>
</rss>
See the full specification in the Google Merchant Center help.
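To make the namespace handling concrete, here is a minimal sketch (a C# top-level program, so C# 9 or later is assumed) that parses a one-item feed shaped like the sample above and reads both a plain RSS element and two g: elements. The inline XML and the variable names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// A tiny, made-up feed with the same shape as the sample above.
const string sampleXml = @"<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item>
      <g:id><![CDATA[1123432]]></g:id>
      <title><![CDATA[Some product]]></title>
      <g:price><![CDATA[15.00 USD]]></g:price>
    </item>
  </channel>
</rss>";

XDocument sample = XDocument.Parse(sampleXml);
XNamespace g = "http://base.google.com/ns/1.0";
XElement firstItem = sample.Descendants("item").First();

// Plain RSS elements are looked up by their local name only...
string title = firstItem.Element("title").Value;
// ...while the Google elements need the namespace prepended.
string id = firstItem.Element(g + "id").Value;
string price = firstItem.Element(g + "price").Value;

Console.WriteLine($"{id}: {title} ({price})");
```

Value returns the text inside the CDATA section, so no extra unwrapping is needed.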
Sometimes the feed contains content that you do not need, and a little XML manipulation is required.
But first things first:
STEP 1: GET THE XML FEED AND CONVERT IT INTO AN XML DOCUMENT
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

// Wrapper class (the name is arbitrary) so the snippet compiles on its own
public static class FeedImporter
{
    // Reuse a single HttpClient for the lifetime of the application
    private static readonly HttpClient _httpClient = new HttpClient();

    // Download the feed and return the raw XML as a string
    public static async Task<string> GetFeed(string url)
    {
        using (var result = await _httpClient.GetAsync(url))
        {
            return await result.Content.ReadAsStringAsync();
        }
    }

    public static void Run()
    {
        // Get the RSS 2.0 XML data (blocks until the download has finished)
        string feedData = GetFeed("https://url/thefeed.xml").Result;
        // Convert the data into an XDocument
        var document = XDocument.Parse(feedData);
        // Specify the Google namespace
        XNamespace g = "http://base.google.com/ns/1.0";
        // Get a list of all "item" nodes
        var items = document.Descendants().Where(node => node.Name == "item");
        // Now we are ready to manipulate
        // ...
        // ...
    }
}
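Since the data can be dirty, the downloaded payload may not even be well-formed XML, in which case XDocument.Parse throws an XmlException. The following is a minimal sketch of a guarded parse (top-level statements, C# 9 or later assumed). TryParseFeed is a hypothetical helper name, and both payloads are made up.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: returns null instead of throwing when the
// payload is not well-formed XML.
static XDocument TryParseFeed(string feedData)
{
    try
    {
        return XDocument.Parse(feedData);
    }
    catch (XmlException ex)
    {
        Console.WriteLine($"Feed is not well-formed XML: {ex.Message}");
        return null;
    }
}

// Made-up payloads: the second one has a mismatched closing tag.
XDocument good = TryParseFeed("<rss version=\"2.0\"><channel><item /></channel></rss>");
XDocument bad = TryParseFeed("<rss><channel><item></channel></rss>");

Console.WriteLine(good != null ? $"Items: {good.Descendants("item").Count()}" : "No document");
Console.WriteLine(bad == null ? "Second payload rejected" : "Second payload parsed");
```

Whether a broken feed should be skipped, retried or reported is up to the import job; the sketch only shows where the failure surfaces.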
NOW TO THE MANIPULATIONS:
EXAMPLE 1: Remove duplicates (when several products share the same ID, only the first one is kept):
items.GroupBy(node => node.Element(g + "id").Value)
    .SelectMany(node => node.Skip(1))
    .Remove();
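As a runnable illustration of the duplicate removal, here is a small sketch on a made-up three-item feed (top-level statements, C# 9 or later assumed). It additionally trims the ID before grouping, which is an assumption about how dirty the IDs might be and not something the feed specification requires.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Made-up feed: the third item repeats id 1, padded with whitespace.
XDocument doc = XDocument.Parse(@"<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item><g:id>1</g:id><title>First</title></item>
    <item><g:id>2</g:id><title>Second</title></item>
    <item><g:id> 1 </g:id><title>First again</title></item>
  </channel>
</rss>");

doc.Descendants("item")
    .GroupBy(item => item.Element(g + "id").Value.Trim()) // normalise the key
    .SelectMany(group => group.Skip(1))                   // all but the first of each id
    .Remove();

string[] titles = doc.Descendants("item")
    .Select(item => item.Element("title").Value)
    .ToArray();

Console.WriteLine(string.Join(", ", titles)); // First, Second
```

The Remove() extension method takes a snapshot of the sequence before detaching the nodes, so it is safe to call directly on the query.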
EXAMPLE 2: Remove all products out of stock:
items = document.Descendants()
    .Where(node => node.Name == "item"
        && node.Descendants()
            .Any(desc => desc.Name == g + "availability" && desc.Value == "out of stock"));
items.Remove();
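The exact string comparison above only matches the literal value "out of stock". With dirty data it may be worth being more tolerant. The sketch below (top-level statements, C# 9 or later assumed) uses a hypothetical helper, IsOutOfStock, that ignores case and surrounding whitespace and also accepts an underscore variant; which variants actually occur in your feed is an assumption you would need to verify.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper: tolerant check for an out-of-stock item.
static bool IsOutOfStock(XElement item, XNamespace ns)
{
    // A missing g:availability element is treated as "not out of stock".
    string availability = (string)item.Element(ns + "availability") ?? string.Empty;
    string normalised = availability.Trim().Replace('_', ' ');
    return string.Equals(normalised, "out of stock", StringComparison.OrdinalIgnoreCase);
}

// Made-up feed with four spellings of the availability value.
XDocument doc = XDocument.Parse(@"<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item><title>A</title><g:availability>in stock</g:availability></item>
    <item><title>B</title><g:availability> Out Of Stock </g:availability></item>
    <item><title>C</title><g:availability>out_of_stock</g:availability></item>
    <item><title>D</title></item>
  </channel>
</rss>");

doc.Descendants("item").Where(item => IsOutOfStock(item, g)).Remove();

string[] remaining = doc.Descendants("item")
    .Select(item => item.Element("title").Value)
    .ToArray();

Console.WriteLine(string.Join(", ", remaining)); // A, D
```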
EXAMPLE 3: Remove adverts not on sale (all adverts where the g:sale_price node is missing or empty):
// An advert is on sale only if it has a non-empty g:sale_price node
items = document.Descendants()
    .Where(node => node.Name == "item"
        && !node.Descendants()
            .Any(desc => desc.Name == g + "sale_price" && desc.Value.Trim() != string.Empty));
items.Remove();
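To see which adverts a sale-price filter touches, here is a small sketch on a made-up feed (top-level statements, C# 9 or later assumed). It treats both a missing and an empty g:sale_price as "not on sale" by casting the element to string, which yields null when the element does not exist.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Made-up feed covering the three cases: present, empty and missing.
XDocument doc = XDocument.Parse(@"<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item><title>On sale</title><g:sale_price>10.00 USD</g:sale_price></item>
    <item><title>Empty sale price</title><g:sale_price></g:sale_price></item>
    <item><title>No sale price</title></item>
  </channel>
</rss>");

doc.Descendants("item")
    // (string)element is null for a missing element, "" for an empty one
    .Where(item => string.IsNullOrWhiteSpace((string)item.Element(g + "sale_price")))
    .Remove();

string[] onSale = doc.Descendants("item")
    .Select(item => item.Element("title").Value)
    .ToArray();

Console.WriteLine(string.Join(", ", onSale)); // On sale
```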
EXAMPLE 4: Add tracking parameters to URLs (adding query string parameters to the URL):
// Start again from all items
items = document.Descendants().Where(node => node.Name == "item");
foreach (var item in items)
{
    string url = item.Element("link").Value;
    // Use "&" if the URL already has a query string, otherwise start one with "?"
    string separator = url.Contains("?") ? "&" : "?";
    item.Element("link").ReplaceNodes(
        new XCData(url + separator + "utm_source=s&utm_medium=m&utm_campaign=c"));
}
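Plain string concatenation puts the parameters in the wrong place if a URL ends with a #fragment. The sketch below (top-level statements, C# 9 or later assumed) shows one way to handle that with a hypothetical helper, AppendQuery. The product URLs are made up, and the helper does not encode or de-duplicate parameters.

```csharp
using System;

// Hypothetical helper: appends a query string, keeping any #fragment last.
static string AppendQuery(string url, string query)
{
    int hash = url.IndexOf('#');
    string fragment = hash >= 0 ? url.Substring(hash) : string.Empty;
    string baseUrl = hash >= 0 ? url.Substring(0, hash) : url;

    string separator;
    if (baseUrl.EndsWith("?", StringComparison.Ordinal) || baseUrl.EndsWith("&", StringComparison.Ordinal))
        separator = string.Empty; // the URL is already waiting for a parameter
    else if (baseUrl.Contains("?"))
        separator = "&";          // extend an existing query string
    else
        separator = "?";          // start a new query string

    return baseUrl + separator + query + fragment;
}

const string tracking = "utm_source=s&utm_medium=m&utm_campaign=c";

// Made-up URLs for illustration.
string plain = AppendQuery("https://pentia.dk/product", tracking);
string withQuery = AppendQuery("https://pentia.dk/product?id=1#reviews", tracking);

Console.WriteLine(plain);
Console.WriteLine(withQuery);
```

Inside the loop, the result would be written back with item.Element("link").ReplaceNodes(new XCData(...)) exactly as before.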
EXAMPLE 5: Change the title (for example, if the feed contains used products, you might want to add the word “used” to the title):
// Start again from all items
items = document.Descendants().Where(node => node.Name == "item");
foreach (var item in items)
{
    // Prefix the existing title and write it back as plain text
    var title = "USED " + item.Element("title").Value;
    item.Element("title").ReplaceNodes(title);
}
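If the import runs more than once on the same data, or the merchant has already prefixed some titles, the word would be added twice. A minimal sketch of an idempotent variant follows (top-level statements, C# 9 or later assumed, made-up feed). Keeping the title in a CDATA section is a choice made here for consistency with the sample feed, not a requirement.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up feed: the second title already carries the prefix.
XDocument doc = XDocument.Parse(@"<rss version=""2.0"">
  <channel>
    <item><title><![CDATA[Some product]]></title></item>
    <item><title><![CDATA[USED Other product]]></title></item>
  </channel>
</rss>");

// ToList() materialises the elements before any of them is modified.
foreach (XElement titleElement in doc.Descendants("item").Elements("title").ToList())
{
    string title = titleElement.Value.Trim();
    // Only add the prefix when it is not there yet.
    if (!title.StartsWith("USED ", StringComparison.OrdinalIgnoreCase))
    {
        titleElement.ReplaceNodes(new XCData("USED " + title));
    }
}

string[] titles = doc.Descendants("item")
    .Select(item => item.Element("title").Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", titles)); // USED Some product | USED Other product
```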
…AND THE EXOTIC EXAMPLE: Move products to a shared product_type ("Other") if their own product_type contains 2 products or fewer:
foreach (var group in items.GroupBy(node => node.Element(g + "product_type").Value))
{
    // Small groups are merged into a common "Other" product type
    if (group.Count() <= 2)
    {
        foreach (var advert in group)
        {
            advert.Element(g + "product_type").ReplaceNodes(new XCData("Other"));
        }
    }
}
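The grouping can be tried out on a made-up feed with the sketch below (top-level statements, C# 9 or later assumed). It groups the g:product_type elements directly, so items without a product type are simply skipped, and it puts the cut-off in a constant; the value 2 is taken from the code above, the constant name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";
const int threshold = 2; // groups with this many products or fewer are merged

// Made-up feed: 3 x Dresses, 1 x Hats, 2 x Shoes.
XDocument doc = XDocument.Parse(@"<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item><g:product_type>Dresses</g:product_type></item>
    <item><g:product_type>Dresses</g:product_type></item>
    <item><g:product_type>Dresses</g:product_type></item>
    <item><g:product_type>Hats</g:product_type></item>
    <item><g:product_type>Shoes</g:product_type></item>
    <item><g:product_type>Shoes</g:product_type></item>
  </channel>
</rss>");

// Find the small groups first, then modify their elements.
var smallGroups = doc.Descendants(g + "product_type")
    .GroupBy(type => type.Value.Trim())
    .Where(group => group.Count() <= threshold)
    .ToList();

foreach (XElement type in smallGroups.SelectMany(group => group))
{
    type.ReplaceNodes(new XCData("Other"));
}

// Print the resulting distribution of product types.
foreach (var group in doc.Descendants(g + "product_type").GroupBy(type => type.Value))
{
    Console.WriteLine($"{group.Key}: {group.Count()}");
}
```

With this input, the three dresses keep their type while the hat and the two shoes end up under "Other".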
Finally you can grab the manipulated document and do what you need to do:
// Grab converted content
string convertedFeedData = document.ToString();
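One detail to be aware of: XDocument.ToString() does not include the XML declaration line. If the consumer of the converted feed expects it, it can be prepended manually, as in this minimal sketch (top-level statements, C# 9 or later assumed, made-up document). The fallback declaration is an assumption for feeds that arrive without one.

```csharp
using System;
using System.Xml.Linq;

// Made-up, minimal document that carries an XML declaration.
XDocument doc = XDocument.Parse(
    "<?xml version=\"1.0\" encoding=\"utf-8\"?><rss version=\"2.0\"><channel /></rss>");

// ToString() serialises the content only; the <?xml ...?> line is left out.
string withoutDeclaration = doc.ToString();

// Fall back to a default declaration if the source had none.
XDeclaration declaration = doc.Declaration ?? new XDeclaration("1.0", "utf-8", null);
string withDeclaration = declaration.ToString() + Environment.NewLine + doc.ToString();

Console.WriteLine(withDeclaration);
```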
I hope these examples show how much you can do with very little code.
MORE TO READ:
- Google Merchant data – product feed specification from Google
- XML RSS 2.0 Feed specification from the W3 Organisation
- XDocument class from Microsoft