Tonight I've released another alpha version of Cognition, my semantic web parser. Changelog includes:
- Microformats:
- Add option (disabled by default) to require <head profile> for microformat support. Microformat profiles are treated as opaque strings! Supports the following profiles:
- http://purl.org/uF/2008/03/
- http://www.w3.org/2006/03/hcard or http://purl.org/uF/hCard/1.0/
- http://dannyayers.com/microformats/hcalendar-profile or http://purl.org/uF/hCalendar/1.0/
- http://purl.org/uF/hAtom/0.1/
- http://purl.org/uF/rel-tag/1.0/
- http://purl.org/uF/rel-license/1.0/
- No profiles required for rel-enclosure, adr or geo (yet).
- Support for hAtom, WebSlices.
- In addition to hAtom 0.1, rel-enclosure is supported within hEntries.
- Improve include-pattern support to prevent some infinite loops.
- GRDDL:
- Add option (disabled by default) to require <head profile> for GRDDL.
- Add option to check profile URLs for profileTransformation links.
- Export:
- Atom output. (Supports RDF/RSS and hAtom as input.)
- iCalendar export option.
- hCalendar 1.1 events.
- hCalendar 1.1 todo items.
- hCalendar 1.1 freebusy info.
- hCalendar 1.1 alarms.
- hAtom entries (as VJOURNAL).
- W3C's iCal RDF vocab (but see note in Cognition/Export/Calendar.pm)
- RSS Event Module
- Added a “--nofollow” option to prevent secondary fetching from particular hosts. (Secondary fetching = requesting <head profile>, <link rel="meta">, <link rel="transformation">.)
- Support <rdf:RDF> elements found directly in (X)HTML.
- Much improved HTML to Text conversion. Namely: word wrapping, line breaks added after block elements, quote marks around <q> elements, bullet points and numbers before <li> elements in unordered and ordered lists, brackets around superscript text, parentheses around subscripts, tab characters between table cells, usenet-style quoting for <blockquote>, alt text from <img> and <input type="image">, values from other <input> tags. Should be able to handle nested elements like //ul/li/ol/li/dl/dd/blockquote/img[@alt]. Won't be completely foolproof, but should be an improvement over what was there before!
- Fix so that the entire page is not given an rdf:type of ical:vcalendar unless it contains some bona fide vevent/vtodo/valarm/vfreebusy nodes.
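
To give a feel for the HTML to Text rules listed above, here is a rough Python sketch of a few of them: line breaks after block elements, quote marks around <q>, bullets and numbers before <li>, brackets around superscripts, parentheses around subscripts, and alt text from <img>. It is only an illustration and not Cognition's actual code. The names TextSketch and html_to_text, the "*" bullet character, the "1. " numbering style and the two-space indent for nested lists are all assumptions made for this example. Word wrapping, table cells, <blockquote> quoting and <input> values are left out.

```python
import re
from html.parser import HTMLParser

# Elements that force a line break before and after (a small, assumed subset).
BLOCK_TAGS = {"p", "div", "ul", "ol", "li", "blockquote"}


class TextSketch(HTMLParser):
    """Hypothetical toy converter; not how Cognition itself is implemented."""

    def __init__(self):
        super().__init__()
        self.out = []    # text fragments, joined at the end
        self.lists = []  # stack of [kind, counter] for nested ul/ol

    def _newline(self):
        # Start a new line unless the output is empty or already ends one.
        if self.out and not self.out[-1].endswith("\n"):
            self.out.append("\n")

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in BLOCK_TAGS:
            self._newline()
        if tag in ("ul", "ol"):
            self.lists.append([tag, 0])
        elif tag == "li" and self.lists:
            current = self.lists[-1]
            indent = "  " * (len(self.lists) - 1)
            if current[0] == "ol":
                current[1] += 1  # numbered item in an ordered list
                self.out.append(f"{indent}{current[1]}. ")
            else:
                self.out.append(f"{indent}* ")  # bullet in an unordered list
        elif tag == "q":
            self.out.append('"')
        elif tag == "sup":
            self.out.append("[")
        elif tag == "sub":
            self.out.append("(")
        elif tag == "img" and attrs.get("alt"):
            self.out.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.lists:
            self.lists.pop()
        elif tag == "q":
            self.out.append('"')
        elif tag == "sup":
            self.out.append("]")
        elif tag == "sub":
            self.out.append(")")
        if tag in BLOCK_TAGS:
            self._newline()

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would when rendering.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = TextSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling html_to_text on a <ul> that contains a nested <ol> yields bulleted outer items with indented, numbered inner items, because the stack in self.lists tracks which kind of list each <li> belongs to. That stack is the design choice that makes nested paths like the one mentioned above manageable.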