But that's a long way off. For now, it's just a parser for metadata embedded in HTML.
Cognition is licensed under the GNU GPL version 3.
It supports metadata embedded using the following methods:
Many of these methods make use of namespaces.
Standard XML namespace declarations (xmlns) are mostly understood,
and namespaces may also be declared with the <link rel="schema.PREFIX"> convention of RFC 2731.
You may run into problems if you define the same prefix differently in different parts of the
document. A number of namespaces are also predefined, so that stuff like
<meta name="DC.creator"> will "just work" even if the author never explicitly
defined the DC prefix.
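The following minimal Python sketch shows how a predefined prefix lets <meta name="DC.creator"> resolve without any declaration. It is not Cognition's own code (which is Perl). It assumes a hypothetical fallback table mapping DC to the Dublin Core elements namespace. RFC 2731 <link rel="schema.X"> declarations take priority, and names are resolved only after the whole document has been read. The class name `MetaPrefixParser` and the `PREDEFINED` table are illustrative inventions.

```python
from html.parser import HTMLParser

# Hypothetical fallback table: prefixes treated as predefined when the author
# never declares them. Cognition's real table may differ.
PREDEFINED = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


class MetaPrefixParser(HTMLParser):
    """Collect RFC 2731 schema links and <meta name="PREFIX.term"> pairs."""

    def __init__(self):
        super().__init__()
        self.declared = {}  # prefix -> namespace, from <link rel="schema.X">
        self.metas = []     # (name, content) pairs, resolved after parsing

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link":
            rel = (a.get("rel") or "").strip()
            if rel.lower().startswith("schema.") and a.get("href"):
                # A later declaration overwrites an earlier one for the same prefix.
                self.declared[rel[len("schema."):].lower()] = a["href"]
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.metas.append((a["name"], a["content"]))

    def resolve(self):
        """Return (property URI, value) pairs; unknown prefixes are skipped."""
        out = []
        for name, content in self.metas:
            prefix, dot, term = name.partition(".")
            if not dot or not term:
                continue  # e.g. <meta name="keywords"> has no prefix
            key = prefix.lower()
            ns = self.declared.get(key) or PREDEFINED.get(key)
            if ns:
                out.append((ns + term, content))
        return out


if __name__ == "__main__":
    p = MetaPrefixParser()
    p.feed('<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">'
           '<meta name="DC.creator" content="Alice">'
           '<meta name="DCTERMS.modified" content="2008-01-01">')
    print(p.resolve())
```

Declarations live in a plain dictionary, so a prefix declared twice silently keeps its last value. This is one way the "same prefix, different definitions" problem can show up.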
A number of Microformats are also understood:
The document's structure is inferred from
<hX> heading elements, and a tree is built from them. The tree also includes
semantically used tables (i.e. tables which have a <caption>), figures (see the
figure microformat) and XOXO lists.
Sections inferred from headings are automatically given funny-looking identifiers
(e.g. <http://example.org/doc.html#section(2.1)>) for use in RDFa
@resource attributes unless they already have an
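Heading-based numbering like section(2.1) can be sketched as a stack of counters, one per heading depth. This Python function is a hypothetical illustration. Three details are guesses, not taken from Cognition: the name `section_ids`, the rule that skipped heading levels are padded with 0, and the assumption that a heading carrying its own `id` attribute keeps it.

```python
def section_ids(base, headings):
    """Return one URI per heading.

    headings: list of (level, existing_id_or_None) in document order,
    where level is the N of <hN>. Numbering is hierarchical, so an <h2>
    following the second <h1> becomes section(2.1). Hypothetical scheme.
    """
    counters = []  # counters[i] is the current number at depth i + 1
    top = min((lvl for lvl, _ in headings), default=1)
    uris = []
    for level, existing in headings:
        depth = level - top + 1
        del counters[depth:]          # leaving deeper sections resets them
        while len(counters) < depth:  # pad with 0 if levels were skipped
            counters.append(0)
        counters[depth - 1] += 1
        if existing:
            uris.append(f"{base}#{existing}")  # assumed: existing id wins
        else:
            number = ".".join(map(str, counters))
            uris.append(f"{base}#section({number})")
    return uris


if __name__ == "__main__":
    for uri in section_ids("http://example.org/doc.html",
                           [(1, None), (2, None), (1, None), (2, None)]):
        print(uri)
```

In this sketch a heading with an existing id still advances the counter, so its siblings' numbers do not depend on which headings happen to have ids.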
Other miscellaneous buzzwords that Cognition uses or can grok are:
All this data is internally represented in a namespace-aware RDF-triple-like structure.
The predefined values for the
rev attributes in
HTML 3.2 onwards (including HTML5 drafts) are automatically pulled into the XHTML
namespace. Microformatted data is assigned logical namespaces. (e.g. hCalendar data is
given the namespace "urn:ietf:rfc:2445#", the namespace of the iCalendar standard, from
which it inherits its names.)
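The triple-like representation and namespace assignment can be sketched roughly in Python as follows, with triples as plain (subject, predicate, object) tuples. The sketch assumes that predefined rel/rev values land in `http://www.w3.org/1999/xhtml/vocab#`, the XHTML vocabulary namespace used by RDFa. `urn:ietf:rfc:2445#` is the hCalendar namespace named above. `PREDEFINED_REL` lists only the HTML 4.01 link types, and `rel_triples` and `hcalendar_triple` are hypothetical helpers.

```python
from typing import NamedTuple

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"  # assumed target namespace
ICAL_NS = "urn:ietf:rfc:2445#"                        # hCalendar, per the text

# Link types predefined by HTML 4.01 (HTML5 drafts add more).
PREDEFINED_REL = {
    "alternate", "stylesheet", "start", "next", "prev", "contents", "index",
    "glossary", "copyright", "chapter", "section", "subsection", "appendix",
    "help", "bookmark",
}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def rel_triples(page_uri, rel, href, reverse=False):
    """Map predefined rel (or rev, if reverse=True) tokens to triples."""
    out = []
    for token in rel.split():
        t = token.lower()
        if t not in PREDEFINED_REL:
            continue  # other tokens would need an explicitly declared prefix
        # rel: page -> href; rev states the relation in the other direction.
        s, o = (href, page_uri) if reverse else (page_uri, href)
        out.append(Triple(s, XHTML_VOCAB + t, o))
    return out


def hcalendar_triple(event_uri, class_name, value):
    """hCalendar class names (dtstart, summary, ...) come from iCalendar."""
    return Triple(event_uri, ICAL_NS + class_name, value)


if __name__ == "__main__":
    print(rel_triples("http://example.org/a", "next", "http://example.org/b"))
    print(hcalendar_triple("_:e1", "dtstart", "2008-05-01"))
```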
On the horizon is support for external RDF data (linked to with
support for RDF data types (e.g. xsd:dateTime), support for more microformats and the
ability to export data from microformats (e.g. export hCard as vCard).
Note that both HTML and XHTML are supported equally. The stuff that, strictly speaking, should not work in HTML (e.g. XML namespaces, RDFa) does work: HTML is treated as if it were funny-looking XHTML.
Not a lot, really. Spit it out as a Perl data structure, or export it as RDF.
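One possible RDF export format is N-Triples, with one triple per line. In the Python sketch below, each triple records whether its object is a literal. The `Triple` type and `to_ntriples` function are hypothetical, and which RDF serialization Cognition actually emits is not stated here. Only the minimal escapes for backslashes, quotes and line breaks are applied.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str      # absolute URI, or a blank node written as "_:name"
    predicate: str    # absolute URI
    obj: str
    is_literal: bool  # True if obj is a plain string literal


def _node(value):
    """Blank nodes are written as-is; URIs go in angle brackets."""
    return value if value.startswith("_:") else f"<{value}>"


def _literal(text):
    """Quote a string literal with the escapes N-Triples requires."""
    esc = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{esc}"'


def to_ntriples(triples):
    lines = []
    for t in triples:
        obj = _literal(t.obj) if t.is_literal else _node(t.obj)
        lines.append(f"{_node(t.subject)} <{t.predicate}> {obj} .")
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    print(to_ntriples([
        Triple("http://example.org/doc.html",
               "http://purl.org/dc/elements/1.1/creator", "Alice", True),
    ]), end="")
```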