Cognition 0.1 Alpha 6

This is a very old article. It has been imported from older blogging software, and the formatting, images, etc. may have been lost. Some links may be broken. Some of the information may no longer be correct. Opinions expressed in this article may no longer be held.

Tonight I've released another alpha version of Cognition, my semantic web parser. The changelog includes:

  • Microformats:
    • Add option (disabled by default) to require <head profile> for microformat support. Microformat profiles are treated as opaque strings! Supports the following profiles:
      • http://purl.org/uF/2008/03/
      • http://www.w3.org/2006/03/hcard or http://purl.org/uF/hCard/1.0/
      • http://dannyayers.com/microformats/hcalendar-profile or http://purl.org/uF/hCalendar/1.0/
      • http://purl.org/uF/hAtom/0.1/
      • http://purl.org/uF/rel-tag/1.0/
      • http://purl.org/uF/rel-license/1.0/
      • No profiles required for rel-enclosure, adr or geo (yet).
    • Support for hAtom and WebSlices.
      • In addition to hAtom 0.1, rel-enclosure is supported within hEntries.
    • Improve include-pattern support to prevent some infinite loops.
  • GRDDL:
    • Add option (disabled by default) to require <head profile> for GRDDL.
    • Add option to check profile URLs for profileTransformation links.
  • Export:
    • Atom output. (Supports RDF/RSS and hAtom as input.)
    • iCalendar export option.
      • hCalendar 1.1 events.
      • hCalendar 1.1 todo items.
      • hCalendar 1.1 freebusy info.
      • hCalendar 1.1 alarms.
      • hAtom entries (as VJOURNAL).
      • W3C's iCal RDF vocab (but see note in Cognition/Export/Calendar.pm).
      • RSS Event Module.
  • Added a “--nofollow” option to prevent secondary fetching from particular hosts. (Secondary fetching means requesting the URLs given in <head profile>, <link rel="meta"> and <link rel="transformation">.)
  • Support <rdf:RDF> elements found directly in (X)HTML.
  • Much-improved HTML-to-text conversion, namely:
    • Word wrapping.
    • Line breaks added after block elements.
    • Quote marks around <q> elements.
    • Bullet points and numbers before <li> elements in unordered and ordered lists.
    • Brackets around superscript text and parentheses around subscripts.
    • Tab characters between table cells.
    • Usenet-style quoting for <blockquote>.
    • Alt text from <img> and <input type="image">, and values from other <input> elements.
    • Nested elements such as //ul/li/ol/li/dl/dd/blockquote/img[@alt] should be handled. It won't be completely foolproof, but it should be an improvement over what was there before!
  • Fix: the entire page is no longer given an rdf:type of ical:vcalendar unless it contains some bona fide vevent/vtodo/valarm/vfreebusy nodes.
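To illustrate what "profiles are treated as opaque strings" means when the require-<head profile> option is switched on, here is a minimal Python sketch. It is not Cognition's actual Perl code. The function name, the format keys, and the assumption that http://purl.org/uF/2008/03/ enables every listed format are hypothetical. The profile URIs themselves come from the changelog.

```python
# Hypothetical sketch; names and structure are illustrative only.

# Assumption: this combined profile switches on every format listed below.
CATCH_ALL_PROFILE = "http://purl.org/uF/2008/03/"

# Profile URIs per format, as listed in the changelog.
PROFILES = {
    "hcard": {"http://www.w3.org/2006/03/hcard", "http://purl.org/uF/hCard/1.0/"},
    "hcalendar": {"http://dannyayers.com/microformats/hcalendar-profile",
                  "http://purl.org/uF/hCalendar/1.0/"},
    "hatom": {"http://purl.org/uF/hAtom/0.1/"},
    "rel-tag": {"http://purl.org/uF/rel-tag/1.0/"},
    "rel-license": {"http://purl.org/uF/rel-license/1.0/"},
}

# Formats parsed regardless of <head profile> (for now).
NO_PROFILE_NEEDED = {"rel-enclosure", "adr", "geo"}


def enabled_formats(head_profile, require_profile=False):
    """Return the microformats to parse, given the raw <head profile> value (or None)."""
    every_format = set(PROFILES) | NO_PROFILE_NEEDED
    if not require_profile:
        return every_format
    # The profile attribute is a whitespace-separated list of URIs. Each one is
    # compared as an opaque string: no case folding, no trailing-slash or
    # scheme normalization.
    uris = set((head_profile or "").split())
    if CATCH_ALL_PROFILE in uris:
        return every_format
    enabled = set(NO_PROFILE_NEEDED)
    for fmt, known_uris in PROFILES.items():
        if uris & known_uris:
            enabled.add(fmt)
    return enabled
```

Because the comparison is exact, a page declaring http://purl.org/uF/hCard/1.0 (without the trailing slash) would not enable hCard parsing under this sketch.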
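The HTML-to-text rules listed in the changelog can be sketched as a small recursive converter. This is a hypothetical Python illustration, not Cognition's implementation. It makes several assumptions: the input is a well-formed XHTML fragment parsed with xml.etree.ElementTree, <q> gets straight double quotes, unordered lists use "* " as the bullet, and word wrapping is left out.

```python
import re
import xml.etree.ElementTree as ET

# Block-level elements that get a line break before and after their text.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "dl", "dt", "dd"}


def _local(node):
    # Drop any "{namespace}" prefix, so XHTML-namespaced input also works.
    return node.tag.rsplit("}", 1)[-1]


def _norm(text):
    # Collapse runs of source whitespace to one space, roughly as a browser does.
    return re.sub(r"\s+", " ", text or "")


def _inner(node):
    # The node's content: its own text, then each child followed by its tail.
    parts = [_norm(node.text)]
    for child in node:
        parts.append(_render(child))
        parts.append(_norm(child.tail))
    return "".join(parts)


def _render(node):
    tag = _local(node)
    if tag == "br":
        return "\n"
    if tag == "img":
        return node.get("alt", "")
    if tag == "input":
        # Image buttons contribute their alt text; other controls their value.
        return node.get("alt" if node.get("type") == "image" else "value", "")
    if tag == "q":
        return '"' + _inner(node) + '"'
    if tag == "sup":
        return "[" + _inner(node) + "]"
    if tag == "sub":
        return "(" + _inner(node) + ")"
    if tag in ("ul", "ol"):
        lines = []
        items = [c for c in node if _local(c) == "li"]
        for n, li in enumerate(items, 1):
            marker = f"{n}. " if tag == "ol" else "* "
            body = _inner(li).strip()
            # Indent continuation lines (e.g. a nested list) under the marker.
            lines.append(marker + body.replace("\n", "\n" + " " * len(marker)))
        return "\n" + "\n".join(lines) + "\n"
    if tag == "blockquote":
        body = _inner(node).strip()
        # Usenet-style quoting: prefix every line, so nested quotes become "> > ".
        return "\n" + "\n".join("> " + line for line in body.split("\n")) + "\n"
    if tag in ("table", "thead", "tbody", "tfoot"):
        prefix = "\n" if tag == "table" else ""
        return prefix + "".join(_render(c) for c in node)
    if tag == "tr":
        # One line per row, with a tab character between cells.
        cells = [_inner(c).strip() for c in node if _local(c) in ("td", "th")]
        return "\t".join(cells) + "\n"
    if tag in BLOCK_TAGS:
        return "\n" + _inner(node).strip() + "\n"
    return _inner(node)  # other inline elements: just their text


def html_to_text(xhtml):
    """Convert a well-formed XHTML fragment to plain text (no word wrapping)."""
    text = _render(ET.fromstring(xhtml))
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Nesting works because each list item's or blockquote's text is rendered first and then prefixed or indented as a whole. For example, `<ol><li>one<ul><li>a</li></ul></li></ol>` comes out as "1. one" followed by an indented "* a".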