Processing schema.org markup with Perl

This is a very old article. It has been imported from older blogging software, and the formatting, images, etc may have been lost. Some links may be broken. Some of the information may no longer be correct. Opinions expressed in this article may no longer be held.

Someone on IRC asked me for an example of how to parse schema.org markup using my HTML::HTML5::Microdata::Parser module, so here is one. It fetches the page, extracts the microdata as an RDF graph, and queries that graph using SPARQL.

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::HTML5::Microdata::Parser;
use LWP::Simple 'get';
use RDF::Query;

my $uri = "http://buzzword.org.uk/2012/schema-org.html";
my $microdata = HTML::HTML5::Microdata::Parser->new(
   get($uri),
   $uri,
);

my $query = RDF::Query->new(<<'SPARQL');

PREFIX schema: <http://schema.org/>
SELECT ?name ?page
WHERE {
   ?person
      a schema:Person ;
      schema:name ?name ;
      schema:url ?page .
}

SPARQL

my $people = $query->execute($microdata->graph);

while (my $person = $people->next)
{
   printf(
      "Found person: %s %s\n",
      $person->{name},
      $person->{page},
   );
}
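
To try this without depending on a live page, the same steps can be run against an inline HTML string. What follows is only a sketch: the sample markup, the example.com URLs, and the helper names find_people and _plain are made up for illustration, and it assumes the parser maps itemprop names onto http://schema.org/ predicates in the same way the script above relies on.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use HTML::HTML5::Microdata::Parser;
use RDF::Query;

# Made-up sample page: one schema.org Person with a name and a URL.
my $html = <<'HTML';
<!DOCTYPE html>
<html>
<head><title>Sample</title></head>
<body>
<div itemscope itemtype="http://schema.org/Person">
   <span itemprop="name">Alice Example</span>
   <a itemprop="url" href="http://example.com/alice">home page</a>
</div>
</body>
</html>
HTML

# Hypothetical helper: turn a result node into a plain string.
sub _plain
{
   my ($node) = @_;
   return $node->literal_value if $node->is_literal;
   return $node->uri_value     if $node->is_resource;
   return $node->as_string;
}

# Hypothetical helper: returns a list of [name, page] pairs, one for
# each schema:Person found in the given HTML string.
sub find_people
{
   my ($markup, $base_uri) = @_;

   my $microdata = HTML::HTML5::Microdata::Parser->new($markup, $base_uri);

   my $query = RDF::Query->new(<<'SPARQL') or die "Could not parse SPARQL query\n";
PREFIX schema: <http://schema.org/>
SELECT ?name ?page
WHERE {
   ?person
      a schema:Person ;
      schema:name ?name ;
      schema:url ?page .
}
SPARQL

   my $people = $query->execute($microdata->graph);

   my @found;
   while (my $person = $people->next)
   {
      push @found, [ map { _plain($person->{$_}) } qw(name page) ];
   }
   return @found;
}

for my $person (find_people($html, "http://example.com/"))
{
   printf("Found person: %s %s\n", @$person);
}
```

Converting each result node to a plain string before printing keeps the output independent of how the node objects happen to stringify.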