Pages
'Pages' are the web pages (or pieces of text) that you categorise using muddy. A page belongs to a collection and contains a list of the entities that muddy has identified in it.
Pages can be created, updated and deleted via the API, as well as from the Dashboard interface.
Methods
Create/Index page
The create/index page API is used to categorise a web page or a piece of text and store the results in the muddy.it system. The API can be called synchronously ('realtime') or asynchronously ('fire and forget').
To index a web page, a POST request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection you wish the page categorisation results to be stored with. The request should be authenticated with a signed request signature.
http://muddy.it/collections/COLLECTION_TOKEN/pages
An XML or JSON document should be submitted with an appropriate 'Content-Type' header.
The available response formats are XML and JSON. To choose the response format, set the request 'Accept' header to application/xml or application/json.
The HTTP request must contain a document that defines the URL or text to be analysed, and may also contain configuration options for the request. The document may be submitted as XML or JSON. It contains two sections: a 'page' section that contains attributes to be set for the page, and an 'options' section that customises the behaviour of the categorisation process (in the sample request below, the 'options' element is nested inside the 'page' element).
The available 'page' attributes are listed in the table below:
| Attribute | Description | Permitted values | Mandatory (y/n) | Default value |
|---|---|---|---|---|
| identifier | An identifier for the page | - | n | A GUID |
| uri | URI to be categorised | Any valid http URI | n | - |
| text | Text to be categorised | - | n | - |
| created_at | The creation date of the page | - | n | Time of request |
The available 'options' parameters are listed below:
| Parameter | Description | Permitted values | Mandatory (y/n) | Default value |
|---|---|---|---|---|
| realtime | Process the request synchronously (true) or asynchronously (false) | true or false | n | false |
| include_content | Provides information about the extracted page content, useful for debugging | true or false | n | false |
| include_unclassified | Include DBpedia links that are not entities (have no 'type') | true or false | n | false |
| minimum_confidence | Only return results with a confidence level >= minimum_confidence | A decimal number | n | 0 |
| extraction_match | Extract content using the supplied regex | A regular expression | n | - |
| extended_response | Include relevant DBpedia attributes in the response for each result | true or false | n | false |
| disambiguate | Disambiguate ambiguous entities and find the best match | true or false | n | true |
| content_extractor | Content extractor used to identify key text | standard | n | standard |
| term_extractor | The type of term extraction method used by muddy.it | standard or custom | n | standard |
| store | Store the page against a collection (only used with the 'realtime' option) | true or false | n | true |
| content | The unparsed page content, useful when the content has already been retrieved | Any UTF-8 string | n | - |
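As a minimal sketch, the request document could be built with Python's standard `xml.etree.ElementTree` module. It follows the layout of the sample request below: a `page` root element holding the page attributes, with an `options` element nested inside it. The helper name `build_page_document` is hypothetical. Boolean options are written as the lowercase `true`/`false` strings listed in the table, and attributes that are not supplied are simply left out so that the defaults apply.

```python
import xml.etree.ElementTree as ET


def build_page_document(uri=None, text=None, identifier=None, options=None):
    """Build an XML body for the create/index page API (hypothetical helper)."""
    # The request must define either the URL or the text to be analysed.
    if uri is None and text is None:
        raise ValueError("either uri or text must be supplied")

    page = ET.Element("page")
    # Page attributes from the 'page' table; unset ones are omitted.
    for name, value in (("identifier", identifier), ("uri", uri), ("text", text)):
        if value is not None:
            ET.SubElement(page, name).text = str(value)

    if options:
        # Options are nested inside <page>, as in the sample request.
        opts = ET.SubElement(page, "options")
        for name, value in options.items():
            if isinstance(value, bool):
                value = "true" if value else "false"
            ET.SubElement(opts, name).text = str(value)

    return ET.tostring(page, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_page_document(
        uri="http://news.bbc.co.uk/1/hi/entertainment/8219362.stm",
        options={"realtime": True, "minimum_confidence": 0.5},
    )
    print(doc.decode("utf-8"))
```

`xml_declaration=True` requires Python 3.8 or later.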
Sample Request
View page.xsd
<?xml version="1.0" encoding="UTF-8"?>
<page>
<uri>http://news.bbc.co.uk/1/hi/entertainment/8219362.stm</uri>
<options>
<realtime>true</realtime>
</options>
</page>
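The sketch below shows roughly how the sample request might be prepared in Python using only the standard library. It builds the POST request with the 'Content-Type' and 'Accept' headers described above. `COLLECTION_TOKEN` is a placeholder and `create_page_request` is a hypothetical name. The signed request signature is deliberately left out because its scheme is not described in this section, so a real call would still need to attach it.

```python
import urllib.parse
import urllib.request

# The sample request document shown above, as bytes.
SAMPLE_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<page>
<uri>http://news.bbc.co.uk/1/hi/entertainment/8219362.stm</uri>
<options>
<realtime>true</realtime>
</options>
</page>"""


def create_page_request(collection_token, body, body_format="xml", response_format="xml"):
    """Build (but do not send) a POST request to the create/index page API.

    The signed request signature is NOT added here: its scheme is not
    described in this section, so it must be attached before sending.
    """
    token = urllib.parse.quote(collection_token, safe="")
    url = f"http://muddy.it/collections/{token}/pages"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": f"application/{body_format}",  # format of the document sent
            "Accept": f"application/{response_format}",  # format wanted back
        },
    )


req = create_page_request("COLLECTION_TOKEN", SAMPLE_BODY)
```

Once signed, the request could be sent with `urllib.request.urlopen(req)`. Because `realtime` is true here, the reply should contain the categorisation results, as in the sample response below.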
Sample Response
A successful response is shown below.
<response status="OK">
<created-at>2009-04-01T22:56:30Z</created-at>
<entities>
<entity>
<uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
<confidence>0.625</confidence>
<classification>http://muddy.it/ontology/Organisation</classification>
<position>345</position>
</entity>
<entity>
<uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
<confidence>0.625</confidence>
<classification>http://muddy.it/ontology/Organisation</classification>
<position>445</position>
</entity>
<entity>
<uri>http://dbpedia.org/resource/Detroit</uri>
<confidence>0.252605482752542</confidence>
<classification>http://muddy.it/ontology/Place</classification>
<position>545</position>
</entity>
</entities>
<title>1 Auto story</title>
<identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>
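The response can be read with the same standard-library XML module. This sketch extracts each entity's URI, confidence, classification and position from a response shaped like the sample above. The `Entity` type and `parse_page_response` function are hypothetical names. Error responses are not documented in this section, so the sketch treats any `status` other than `OK` as a failure, which is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    """One <entity> element from a page response (hypothetical type)."""
    uri: str
    confidence: float
    classification: str
    position: int


def parse_page_response(xml_text: str) -> Tuple[Optional[str], Optional[str], List[Entity]]:
    """Return (identifier, title, entities) from an XML page response."""
    root = ET.fromstring(xml_text)
    # Assumption: any status other than "OK" is treated as a failure.
    status = root.get("status")
    if status != "OK":
        raise RuntimeError(f"unexpected response status: {status!r}")

    entities = [
        Entity(
            uri=el.findtext("uri", default=""),
            confidence=float(el.findtext("confidence", default="0")),
            classification=el.findtext("classification", default=""),
            position=int(el.findtext("position", default="0")),
        )
        for el in root.iterfind("entities/entity")
    ]
    return root.findtext("identifier"), root.findtext("title"), entities
```

The same entity can appear more than once at different positions (United_Auto_Workers does in the sample). A caller that wants each entity only once could therefore collect `{e.uri for e in entities}`.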
Refresh/Index page
The refresh/index page API is used to re-categorise an existing page and store the updated results in the muddy.it system. The API can be called synchronously ('realtime') or asynchronously ('fire and forget').
To update a page's categorisation, a PUT request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection you wish the page categorisation results to be stored with and IDENTIFIER is the assigned identifier for the page categorisation to be updated. The request should be authenticated with a signed request signature.
http://muddy.it/collections/COLLECTION_TOKEN/pages/IDENTIFIER
An XML or JSON document should be submitted with an appropriate 'Content-Type' header.
The available response formats are XML and JSON. To choose the response format, set the request 'Accept' header to application/xml or application/json.
The HTTP request may contain a document that defines configuration options for the request. The document may be submitted as XML or JSON. It contains two sections: a 'page' section that contains attributes to be set for the page, and an 'options' section that customises the behaviour of the categorisation process.
The available 'page' attributes are listed in the table below:
| Attribute | Description | Permitted values | Mandatory (y/n) | Default value |
|---|---|---|---|---|
| uri | URI to be categorised | Any valid http URI | n | - |
| text | Text to be categorised | - | n | - |
| created_at | The creation date of the page | - | n | Time of request |
The available 'options' parameters are the same as for a 'create' request.
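A refresh request differs from a create request mainly in its method and URI. In this sketch the body is optional, because a refresh request only *may* contain a document, and a 'Content-Type' header is set only when a document is actually sent. `refresh_page_request` is a hypothetical name, and `COLLECTION_TOKEN` and `IDENTIFIER` are placeholders. The signed request signature is again omitted because its scheme is not covered here.

```python
import urllib.parse
import urllib.request


def refresh_page_request(collection_token, identifier, body=None,
                         body_format="xml", response_format="xml"):
    """Build (but do not send) a PUT request that re-categorises a stored page.

    The signed request signature is not added; its scheme is not covered here.
    """
    token = urllib.parse.quote(collection_token, safe="")
    page_id = urllib.parse.quote(identifier, safe="")
    url = f"http://muddy.it/collections/{token}/pages/{page_id}"
    headers = {"Accept": f"application/{response_format}"}
    if body is not None:
        # Only declare a Content-Type when a document is actually sent.
        headers["Content-Type"] = f"application/{body_format}"
    return urllib.request.Request(url, data=body, method="PUT", headers=headers)


req = refresh_page_request("COLLECTION_TOKEN", "65c02056-e561-455c-ac6f-239415160711")
```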
Sample Response
A successful response is shown below.
<response status="OK">
<created-at>2009-04-01T22:56:30Z</created-at>
<entities>
<entity>
<uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
<confidence>0.625</confidence>
<classification>http://muddy.it/ontology/Organisation</classification>
<position>345</position>
</entity>
<entity>
<uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
<confidence>0.625</confidence>
<classification>http://muddy.it/ontology/Organisation</classification>
<position>445</position>
</entity>
<entity>
<uri>http://dbpedia.org/resource/Detroit</uri>
<confidence>0.252605482752542</confidence>
<classification>http://muddy.it/ontology/Place</classification>
<position>545</position>
</entity>
</entities>
<title>1 Auto story</title>
<identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>
View page
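The view page API is used to retrieve a page's details from the muddy.it system.
To view a page's details, a GET request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection the page belongs to and IDENTIFIER is the assigned identifier of the page. The request should be authenticated with a signed request signature.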
http://muddy.it/collections/COLLECTION_TOKEN/pages/IDENTIFIER
A GET request carries no request document, so no 'Content-Type' header is needed.
The available response formats are XML and JSON. To choose the response format, set the request 'Accept' header to application/xml or application/json.
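For a view request, the 'Accept' header is the only format-related header involved. The hypothetical `view_page_request` helper below builds such a GET request and rejects formats other than the two listed above. As before, the signed request signature is not included because its scheme is not described in this section.

```python
import urllib.parse
import urllib.request


def view_page_request(collection_token, identifier, response_format="json"):
    """Build (but do not send) a GET request for a page's details.

    Only the Accept header is set; the signed request signature is omitted.
    """
    if response_format not in ("xml", "json"):
        raise ValueError("response_format must be 'xml' or 'json'")
    token = urllib.parse.quote(collection_token, safe="")
    page_id = urllib.parse.quote(identifier, safe="")
    url = f"http://muddy.it/collections/{token}/pages/{page_id}"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": f"application/{response_format}"}
    )
```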
Sample Response
A successful response is shown below:
<response status="OK">
<created-at>2009-04-01T22:56:30Z</created-at>
<entities>
<entity>
<uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
<confidence>0.625</confidence>
<classification>http://muddy.it/ontology/Organisation</classification>
<position>345</position>
</entity>
<entity>
<uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
<confidence>0.625</confidence>
<classification>http://muddy.it/ontology/Organisation</classification>
<position>445</position>
</entity>
<entity>
<uri>http://dbpedia.org/resource/Detroit</uri>
<confidence>0.252605482752542</confidence>
<classification>http://muddy.it/ontology/Place</classification>
<position>545</position>
</entity>
</entities>
<title>1 Auto story</title>
<identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>
Delete page
The delete page API is used to delete a page in the muddy.it system.
To delete a page, an HTTP DELETE request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection for the page to be deleted and IDENTIFIER is the assigned identifier for the page categorisation to be deleted. The request should be authenticated with a signed request signature.
http://muddy.it/collections/COLLECTION_TOKEN/pages/IDENTIFIER
Sample Response
A successful request returns an HTTP 200 OK status.
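The sketch below separates building the DELETE request from sending it, so the success check against HTTP 200 OK is explicit. `delete_page_request` and `delete_page` are hypothetical names. The signed request signature is not added because its scheme is not described here, so `delete_page` would need that step before it could succeed against the real service.

```python
import urllib.parse
import urllib.request


def delete_page_request(collection_token, identifier):
    """Build a DELETE request for one page (request signature not included)."""
    token = urllib.parse.quote(collection_token, safe="")
    page_id = urllib.parse.quote(identifier, safe="")
    url = f"http://muddy.it/collections/{token}/pages/{page_id}"
    return urllib.request.Request(url, method="DELETE")


def delete_page(collection_token, identifier):
    """Send the DELETE request and report whether the API answered 200 OK.

    urlopen raises urllib.error.HTTPError for error statuses, so those
    propagate to the caller instead of being turned into False.
    """
    req = delete_page_request(collection_token, identifier)
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```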