readme
parent
f4f66e5b7b
commit
0bce2df2de
108
README.mkdown
@@ -1,50 +1,80 @@
Dexter is a simple language for data-extraction from XML-like documents (including HTML). Dexter is:

1. Blazing fast -- typical HTML parses take under 50ms.
2. Easy to write and understand.
3. Powerful. Dexter can understand full XPath, including standard and user-defined functions.

A simple script, or "dex", looks like this:

    {
      "title": "h1",
      "links": [
        {
          "text": "a",
          "href": "$text/@href"
        }
      ]
    }

This returns JSON output with the same structure. Applying this dex to http://www.yelp.com/biz/amnesia-san-francisco yields:

    {
      "title": "Amnesia",
      "links": [
        {
          "href": "\/",
          "text": "Yelp"
        },
        {
          "href": "\/",
          "text": "Welcome"
        },
        {
          "href": "\/signup?return_url=%2Fuser_details",
          "text": " About Me"
        },
        .....
      ]
    }

This dex could also have been expressed as:

    {
      "title": "h1",
      "links(a)": [
        {
          "text": ".",
          "href": "@href"
        }
      ]
    }

The "a" in links(a) is a "key selector" -- an explicit grouping (with scope) for the array: each matched node becomes one array entry, and the values inside are evaluated relative to it. You can use any XPath 1.0 or CSS3 expression as a value or a key selector, and Dexter will try to figure out which one you are using. You can also use CSS selectors inside XPath functions -- "substring-after(h1>a, ':')" is a valid expression.
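
To make the scoping concrete, here is a rough Python sketch of what a key selector does. It uses only the standard library's `xml.etree.ElementTree`, not Dexter itself. The sample document and the `extract_links` helper are hypothetical, and Dexter's real evaluation (full XPath, CSS3, HTML parsing) is far more general.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; it is not taken from the Yelp page.
SAMPLE = """
<html><body>
  <h1>Amnesia</h1>
  <a href="/">Yelp</a>
  <a href="/signup">About Me</a>
</body></html>
"""


def extract_links(root):
    """Roughly mimic {"links(a)": [{"text": ".", "href": "@href"}]}."""
    links = []
    # The key selector "a" picks the nodes that each become one array entry.
    for a in root.iter("a"):
        # Inside the group, "." and "@href" are evaluated relative to `a`.
        links.append({"text": "".join(a.itertext()), "href": a.get("href")})
    return links


root = ET.fromstring(SAMPLE)
result = {"title": root.find(".//h1").text, "links": extract_links(root)}
print(result)
```

The point of the sketch is only that the grouping node sets the context for the inner expressions, which is why "." and "@href" need no further qualification.
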

We've made a few more changes, which you should be aware of.

### Variables

You can use $foo to access the value of the key "foo" in the current scope (i.e. nested curly brace depth). Also available are $parent.foo, $parent.parent.foo, $root.foo, $root.foo.bar, etc.
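
As an illustration of these lookup rules, the following Python sketch resolves references against a chain of nested scopes. It is an assumption about the semantics described above, not Dexter's implementation: the `Scope` class, the `resolve` helper, and the sample keys are all hypothetical. It also treats values as plain data, and so ignores that a Dexter variable can be used inside a further XPath step such as "$text/@href".

```python
class Scope:
    """One level of curly-brace nesting: its values plus a link to its parent."""

    def __init__(self, values, parent=None):
        self.values = values
        self.parent = parent

    def root(self):
        # Walk up the parent links until the outermost scope is reached.
        scope = self
        while scope.parent is not None:
            scope = scope.parent
        return scope


def resolve(scope, ref):
    """Resolve a reference such as "$foo", "$parent.foo" or "$root.foo.bar"."""
    parts = ref.lstrip("$").split(".")
    # Leading "root" / "parent" segments move between scopes.
    if parts[0] == "root":
        scope, parts = scope.root(), parts[1:]
    while parts and parts[0] == "parent":
        scope, parts = scope.parent, parts[1:]
    # Remaining segments are key lookups, descending into nested values.
    value = scope.values
    for key in parts:
        value = value[key]
    return value


# Hypothetical data: an outer scope and one nested scope (e.g. a link entry).
root = Scope({"title": "Amnesia", "meta": {"city": "San Francisco"}})
link = Scope({"text": "Yelp"}, parent=root)

print(resolve(link, "$text"))
print(resolve(link, "$parent.title"))
print(resolve(link, "$root.meta.city"))
```
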

### Custom Functions

You can write custom functions in XSLT. They look like:

    <func:function name="user:excited">
      <xsl:param name="input" />
      <func:result select="concat($input, '!!!!!!!')" />
    </func:function>

If you run

    {
      "title": "user:excited(h1)"
    }

on the Yelp page, you'll get:

    {
      "title": "Amnesia!!!!!!!"
    }