.NET Schema Validation

I ran across a problem the other day, and the reason for it took me somewhat by surprise. A class in the codebase I work on is responsible for validating incoming XML messages against a schema. We noticed that a bad message was getting past the schema check and only failing further downstream, and it wasn’t just a slightly incorrect message: it was a totally incorrect message intended for another part of the system.

Using the awesome LINQPad, I pulled together a trivial example to investigate.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="schemas.deabill.net"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="MyRootElement" type="xs:string" />

</xs:schema>

My schema expected a single MyRootElement node (of type string) in the schemas.deabill.net namespace. The validation code looked much like this:

var schemaSet = new XmlSchemaSet();
schemaSet.Add(XmlSchema.Read(new StringReader(MySchema), (sender, args) => Console.WriteLine(args.Message)));

var settings = new XmlReaderSettings
   {
      ValidationType = ValidationType.Schema,
      Schemas = schemaSet
   };
settings.ValidationEventHandler += (sender, args) => Console.WriteLine(args.Message);

var reader = XmlReader.Create(new StringReader(MyXml), settings);
while (reader.Read()) {}

I’ve simplified and tweaked a few things, but the important parts are unchanged. Then I validated the following XML…

<IncorrectRootElement />

Despite being the wrong element name and having no namespace at all, the validation succeeded. I decided to see how the validator responded if the element was at least in the correct namespace…

<IncorrectRootElement xmlns="schemas.deabill.net" />

The 'schemas.deabill.net:IncorrectRootElement' element is not declared.

This was more like what I was expecting: a validation failure highlighting the undefined element. Some further digging revealed that the schema-validating XML reader in .NET does not treat an element from a namespace it has no schema for as an error, merely as a warning, and warnings are not reported by default. The upshot was that slightly incorrect messages would fail validation, but totally incorrect messages would fall through.

Personally, I think an unexpected element in an unexpected namespace is very much an error. The default behaviour I’d expect would be to reject anything not in a given schema namespace, but who am I to argue? A quick tweak to the XmlReaderSettings solves the problem.

ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings

Now my spurious XML fails validation as expected.

<IncorrectRootElement />

Could not find schema information for the element 'IncorrectRootElement'.
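
Putting the pieces together, here is a minimal, self-contained sketch of the check with warnings switched on. The names ValidationDemo and Validate are my own for illustration, not from the real codebase. One deliberate difference from the one-liner above: the flag is OR-ed in rather than assigned, because a plain assignment would also replace the flags that are enabled by default (as far as I know, ProcessIdentityConstraints and AllowXmlAttributes). Each message is prefixed with its severity so warnings and errors can be told apart.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationDemo
{
    // The schema from the example above (XML declaration omitted).
    public const string MySchema = @"<xs:schema targetNamespace=""schemas.deabill.net""
           elementFormDefault=""qualified""
           xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""MyRootElement"" type=""xs:string"" />
</xs:schema>";

    // Validates xml against MySchema and returns every reported problem,
    // each prefixed with its severity ("Error" or "Warning").
    public static List<string> Validate(string xml)
    {
        var problems = new List<string>();

        var schemaSet = new XmlSchemaSet();
        schemaSet.Add(XmlSchema.Read(new StringReader(MySchema),
            (sender, args) => problems.Add("Schema: " + args.Message)));

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemaSet
        };

        // OR the flag in so the flags that are on by default are kept.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;

        settings.ValidationEventHandler +=
            (sender, args) => problems.Add(args.Severity + ": " + args.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Reading the whole document drives the validation.
            while (reader.Read()) { }
        }

        return problems;
    }
}
```

With this in place, ValidationDemo.Validate("<IncorrectRootElement />") should return a Warning entry, the namespaced variant an Error entry, and a well-formed MyRootElement in the schemas.deabill.net namespace an empty list. Treating any non-empty result as a failure gives the strict behaviour I was after.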

You do need to be a bit careful when enabling validation warnings, as it may lead to other (unforeseen) failures. Still, it’s a small gotcha that got me.
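
If switching on every warning feels too broad, one alternative (not something from the original investigation, just a sketch under my own assumptions) is to leave the flags alone and explicitly check that the root element is one the schema set declares before validating. RootElementCheck and IsRootDeclared are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class RootElementCheck
{
    // Returns true when the document's root element is declared as a
    // global element somewhere in the schema set.
    public static bool IsRootDeclared(string xml, XmlSchemaSet schemaSet)
    {
        // GlobalElements is only populated once the set has been compiled.
        schemaSet.Compile();

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skip the XML declaration, comments and whitespace.
            if (reader.MoveToContent() != XmlNodeType.Element)
            {
                return false;
            }

            var rootName = new XmlQualifiedName(reader.LocalName, reader.NamespaceURI);
            return schemaSet.GlobalElements.Contains(rootName);
        }
    }
}
```

Calling this before the validating read rejects a message with the wrong root or namespace up front, while everything beneath a recognised root is still checked by the schema as before.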