Remove duplicates from XML feed

Apparently XML isn’t dead yet: today I received a Google Product Feed in the RSS 2.0 XML format. The feed was full of duplicates, and my job was to remove them:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    <channel>
        <item>
            <g:id>100</g:id>
            <title>Product 100</title>
            ...
            ...
        </item>
        <item>
            <g:id>100</g:id>
            <title>Product 100</title>
            ...
            ...
        </item>
        <item>
            <g:id>200</g:id>
            <title>Product 200</title>
            ...
            ...
        </item>
        <item>
            <g:id>300</g:id>
            <title>Product 300</title>
            ...
            ...
        </item>
    </channel>
</rss>

As you can see, “Product 100” appears twice.

THE SOLUTION:

A little LINQ can get you far:

using System.Linq;
using System.Xml.Linq;

// theXMLString holds the raw feed XML
var document = XDocument.Parse(theXMLString);

// The "g" prefix maps to the namespace declared on the <rss> element
XNamespace g = "http://base.google.com/ns/1.0";
document.Descendants().Where(node => node.Name == "item")
    .GroupBy(node => node.Element(g+"id").Value)
    .SelectMany(node => node.Skip(1)) // skip the first item of each group, select the rest
    .Remove();

HOW IT WORKS:

  • document.Descendants().Where(node => node.Name == "item"): Get all elements named “item”.
  • GroupBy(node => node.Element(g+"id").Value): Group them by the value of their “g:id” child element.
  • SelectMany(node => node.Skip(1)): From each group, select every element except the first one. (At this point, “node” is a group of items sharing the same id, not a single element.)
  • Remove(): Delete all the selected elements from the document.
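
To see the steps above working end to end, here is a minimal console sketch. The class name FeedDeduplicator, the method RemoveDuplicateItems, and the trimmed-down sample feed are my own inventions for illustration; the feed keeps only the g:id and title elements. The sketch also assumes that items without a g:id should be left alone, which differs from the snippet above: that one would throw a NullReferenceException on such items.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FeedDeduplicator
{
    // Namespace URI taken from the xmlns:g declaration in the feed.
    private static readonly XNamespace G = "http://base.google.com/ns/1.0";

    // Hypothetical helper: parses the feed and removes every <item> whose
    // g:id has already appeared earlier in the document.
    public static XDocument RemoveDuplicateItems(string xml)
    {
        var document = XDocument.Parse(xml);

        document.Descendants("item")
            // Assumption: items without a g:id are ignored rather than removed.
            .Where(item => item.Element(G + "id") != null)
            .GroupBy(item => item.Element(G + "id").Value)
            // GroupBy keeps document order, so Skip(1) spares the first occurrence.
            .SelectMany(group => group.Skip(1))
            .Remove();

        return document;
    }

    public static void Main()
    {
        // Simplified sample feed; real items carry many more fields.
        const string feed = @"<?xml version=""1.0"" encoding=""utf-8""?>
<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
    <channel>
        <item><g:id>100</g:id><title>Product 100</title></item>
        <item><g:id>100</g:id><title>Product 100</title></item>
        <item><g:id>200</g:id><title>Product 200</title></item>
        <item><g:id>300</g:id><title>Product 300</title></item>
    </channel>
</rss>";

        var cleaned = RemoveDuplicateItems(feed);

        var ids = cleaned.Descendants("item")
            .Select(item => item.Element(G + "id").Value);

        Console.WriteLine(string.Join(", ", ids)); // prints "100, 200, 300"
    }
}
```

Remove() copies the selected elements into a list before detaching them, so it is safe to call directly on the lazy LINQ query. If you need to write the result back to disk, cleaned.Save(path) should do it.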

About briancaos

Developer at Pentia A/S since 2003. Has developed web applications using Sitecore since Sitecore 4.1.