Muddy

Making your content more findable and connected

Developer Guide › Pages

'Pages' are the webpages you categorise using muddy. A page belongs to a collection and contains a list of entities that muddy has identified in the page.

Pages can be created, updated and deleted via the API, as well as from the Dashboard interface.

Methods

  1. Create/Index page
  2. Update/Index page
  3. View page
  4. Delete page

Create/Index page

The create/index page API is used to categorise a page or some text and store it in the muddy.it system. The API can be called in a synchronous or asynchronous fashion ('realtime' or 'fire and forget').

To index a web page, a POST request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection in which the page categorisation results should be stored. The request should be authenticated with a request signature.

http://muddy.it/collections/COLLECTION_TOKEN/pages

An xml or json document should be submitted, with an appropriate 'Content-Type' header.

The available response formats are xml and json. To set the response format, the request 'Accept' header needs to be set to application/xml or application/json.
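As a rough sketch of the request described above, the Python below builds (but does not send) the POST request using only the standard library. The helper name `build_create_request` is illustrative, and using `application/xml` or `application/json` as the 'Content-Type' value is an assumption, since this page only names those values for the 'Accept' header. The request signature is covered in the Authentication section and is deliberately left out.

```python
import urllib.request

BASE_URL = "http://muddy.it"

# Media types for the two formats named in this guide.
# Using them for Content-Type as well is an assumption.
MEDIA_TYPES = {"xml": "application/xml", "json": "application/json"}


def build_create_request(collection_token, body, body_format="xml", response_format="xml"):
    """Build a POST request for the create/index page API (hypothetical helper).

    The request signature is NOT added here; see the Authentication section.
    """
    url = f"{BASE_URL}/collections/{collection_token}/pages"
    headers = {
        "Content-Type": MEDIA_TYPES[body_format],  # format of the submitted document
        "Accept": MEDIA_TYPES[response_format],    # format wanted in the response
    }
    return urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="POST"
    )


request = build_create_request(
    "COLLECTION_TOKEN", "<page><uri>http://example.com/</uri></page>"
)
print(request.get_method(), request.full_url)
```

Once signed, the request object could be passed to `urllib.request.urlopen` to send it.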

The HTTP request must contain a document that defines the URL or text to be analysed and may also contain configuration options for the request. The document may be submitted as XML or JSON. It contains two sections: a 'page' section holding the attributes to be set for the page, and an 'options' section that customises the behaviour of the categorisation process.

The available 'page' attributes are listed in the table below:

| Attribute | Description | Permitted Values | Mandatory (y/n) | Default value |
| --- | --- | --- | --- | --- |
| identifier | An identifier for the page | | n | A GUID |
| uri | URI to be categorised | Any valid http URI | n | - |
| text | Text to be categorised | | n | - |
| created_at | The creation date of the page | | n | Time of request |

The available 'options' parameters are listed below:

| Parameter | Description | Permitted Values | Mandatory (y/n) | Default value |
| --- | --- | --- | --- | --- |
| realtime | Whether the request should be synchronous (true) or asynchronous (false) | true or false | n | false |
| include_content | Provides information about the extracted page content, useful for debugging | true or false | n | false |
| include_unclassified | Include DBpedia links that are not entities (have no 'type') | true or false | n | false |
| minimum_confidence | Only return results with a confidence level >= minimum confidence | number (e.g. 0.5) | n | 0 |
| extraction_match | Extract content using the supplied regex | regex | n | - |
| extended_response | Include relevant DBpedia attributes in the response for each result | true or false | n | false |
| disambiguate | Disambiguate ambiguous entities and find the best match | true or false | n | true |
| content_extractor | Content extractor used to identify key text | standard | n | standard |
| term_extractor | The type of term extraction method used by muddy.it | standard or custom | n | standard |
| store | Store the page against a collection (only used with the 'realtime' option) | true or false | n | true |
| content | Used to pass in the unparsed page content (useful when content has already been retrieved) | Any utf8 string | n | - |

Sample Request

View page.xsd

<?xml version="1.0" encoding="UTF-8"?>
<page>
  <uri>http://news.bbc.co.uk/1/hi/entertainment/8219362.stm</uri>
  <options>
    <realtime>true</realtime>
  </options>
</page>
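
A request document like the sample above can also be generated programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `build_page_document` and its parameters are illustrative only, and the output omits the XML declaration shown in the sample.

```python
import xml.etree.ElementTree as ET


def build_page_document(uri=None, text=None, identifier=None, options=None):
    """Return an XML request document shaped like the sample (hypothetical helper)."""
    page = ET.Element("page")
    # Add only the page attributes that were actually supplied.
    for name, value in (("identifier", identifier), ("uri", uri), ("text", text)):
        if value is not None:
            ET.SubElement(page, name).text = value
    if options:
        options_el = ET.SubElement(page, "options")
        for name, value in options.items():
            # Booleans are written as lowercase true/false, as in the sample.
            if isinstance(value, bool):
                value = "true" if value else "false"
            ET.SubElement(options_el, name).text = str(value)
    return ET.tostring(page, encoding="unicode")


document = build_page_document(
    uri="http://news.bbc.co.uk/1/hi/entertainment/8219362.stm",
    options={"realtime": True},
)
print(document)
```

Building the document through an XML library rather than string concatenation means that characters such as `&` in a URI or in page text are escaped automatically.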

Sample Response

A successful response is shown below.

<response status="OK">
  <created-at>2009-04-01T22:56:30Z</created-at>
  <entities>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <confidence>0.625</confidence>
      <classification>http://muddy.it/ontology/Organisation</classification>
      <position>345</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <confidence>0.625</confidence>
      <classification>http://muddy.it/ontology/Organisation</classification>
      <position>445</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/Detroit</uri>
      <confidence>0.252605482752542</confidence>
      <classification>http://muddy.it/ontology/Place</classification>
      <position>545</position>
    </entity>
  </entities>
  <title>1 Auto story</title>
  <identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>
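
To illustrate how a client might consume this response, the sketch below parses the XML into plain Python values with the standard library. The function name `parse_page_response` is illustrative, and the code assumes every entity carries all four child elements shown in the sample; the embedded sample is shortened to a single entity.

```python
import xml.etree.ElementTree as ET


def parse_page_response(xml_text):
    """Parse a page response like the sample above into plain Python values."""
    root = ET.fromstring(xml_text)
    entities = [
        {
            "uri": entity.findtext("uri"),
            "confidence": float(entity.findtext("confidence")),
            "classification": entity.findtext("classification"),
            "position": int(entity.findtext("position")),
        }
        for entity in root.iterfind("entities/entity")
    ]
    return {
        "status": root.get("status"),
        "identifier": root.findtext("identifier"),
        "title": root.findtext("title"),
        "created_at": root.findtext("created-at"),
        "entities": entities,
    }


sample = """<response status="OK">
  <created-at>2009-04-01T22:56:30Z</created-at>
  <entities>
    <entity>
      <uri>http://dbpedia.org/resource/Detroit</uri>
      <confidence>0.252605482752542</confidence>
      <classification>http://muddy.it/ontology/Place</classification>
      <position>545</position>
    </entity>
  </entities>
  <title>1 Auto story</title>
  <identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>"""

page = parse_page_response(sample)
print(page["status"], page["identifier"], len(page["entities"]))
```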

Update/Index page

The update/index page API is used to refresh the categorisation of an existing page and store it in the muddy.it system. The API can be called in a synchronous or asynchronous fashion ('realtime' or 'fire and forget').

To update a web page categorisation, a PUT request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection in which the page categorisation results are stored and IDENTIFIER is the identifier assigned to the page categorisation to be updated. The request should be authenticated with a request signature.

http://muddy.it/collections/COLLECTION_TOKEN/pages/IDENTIFIER

An xml or json document should be submitted, with an appropriate 'Content-Type' header.

The available response formats are xml and json. To set the response format, the request 'Accept' header needs to be set to application/xml or application/json.

The HTTP request can contain a document that defines configuration options for the request. The document may be submitted as XML or JSON. It contains two sections: a 'page' section holding the attributes to be set for the page, and an 'options' section that customises the behaviour of the categorisation process.

The available 'page' attributes are listed in the table below:

| Attribute | Description | Permitted Values | Mandatory (y/n) | Default value |
| --- | --- | --- | --- | --- |
| uri | URI to be categorised | Any valid http URI | n | - |
| text | Text to be categorised | | n | - |
| created_at | The creation date of the page | | n | Time of request |

The available 'options' parameters are the same as for a 'create' request.
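
The PUT request described above could be assembled along the following lines. This is a sketch only: the helper name `build_update_request` is illustrative, the `application/xml` 'Content-Type' value is an assumption, the options-only document is just one possible body, and the request signature (see the Authentication section) is not applied.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://muddy.it"


def build_update_request(collection_token, identifier, body):
    """Build a PUT request for a page update (hypothetical helper).

    The request signature is NOT added here; see the Authentication section.
    """
    # Percent-encode both values so each one stays within a single path segment.
    token_part = quote(collection_token, safe="")
    identifier_part = quote(identifier, safe="")
    url = f"{BASE_URL}/collections/{token_part}/pages/{identifier_part}"
    headers = {"Content-Type": "application/xml", "Accept": "application/xml"}
    return urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="PUT"
    )


# A document that only sets an option, asking for a synchronous response.
body = "<page><options><realtime>true</realtime></options></page>"
request = build_update_request(
    "COLLECTION_TOKEN", "65c02056-e561-455c-ac6f-239415160711", body
)
print(request.get_method(), request.full_url)
```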

Sample Response

A successful response is shown below.

<response status="OK">
  <created-at>2009-04-01T22:56:30Z</created-at>
  <entities>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <confidence>0.625</confidence>
      <classification>http://muddy.it/ontology/Organisation</classification>
      <position>345</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <confidence>0.625</confidence>
      <classification>http://muddy.it/ontology/Organisation</classification>
      <position>445</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/Detroit</uri>
      <confidence>0.252605482752542</confidence>
      <classification>http://muddy.it/ontology/Place</classification>
      <position>545</position>
    </entity>
  </entities>
  <title>1 Auto story</title>
  <identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>

View page

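The view page API is used to view a page's details in the muddy.it system.

To view a page's details, a GET request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection the page belongs to and IDENTIFIER is the identifier assigned to the page. The request should be authenticated with a request signature.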

http://muddy.it/collections/COLLECTION_TOKEN/pages/IDENTIFIER

An xml or json document should be submitted, with an appropriate 'Content-Type' header.

The available response formats are xml and json. To set the response format, the request 'Accept' header needs to be set to application/xml or application/json.

Sample Response

A successful response is shown below:

<response status="OK">
  <created-at>2009-04-01T22:56:30Z</created-at>
  <entities>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <confidence>0.625</confidence>
      <classification>http://muddy.it/ontology/Organisation</classification>
      <position>345</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <confidence>0.625</confidence>
      <classification>http://muddy.it/ontology/Organisation</classification>
      <position>445</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/Detroit</uri>
      <confidence>0.252605482752542</confidence>
      <classification>http://muddy.it/ontology/Place</classification>
      <position>545</position>
    </entity>
  </entities>
  <title>1 Auto story</title>
  <identifier>65c02056-e561-455c-ac6f-239415160711</identifier>
</response>
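
In the sample above the same entity appears more than once, each time with a different position. One way a client might collapse these repeats is to group the positions by entity URI, as in the sketch below. The function name `positions_by_entity` is illustrative, and the embedded sample keeps only the elements the code reads.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def positions_by_entity(xml_text):
    """Map each entity URI to the sorted list of positions reported for it."""
    root = ET.fromstring(xml_text)
    positions = defaultdict(list)
    for entity in root.iterfind("entities/entity"):
        positions[entity.findtext("uri")].append(int(entity.findtext("position")))
    return {uri: sorted(found) for uri, found in positions.items()}


sample = """<response status="OK">
  <entities>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <position>345</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/United_Auto_Workers</uri>
      <position>445</position>
    </entity>
    <entity>
      <uri>http://dbpedia.org/resource/Detroit</uri>
      <position>545</position>
    </entity>
  </entities>
</response>"""

grouped = positions_by_entity(sample)
for uri, found in grouped.items():
    print(uri, found)
```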

Delete page

The delete page API is used to delete a page in the muddy.it system.

To delete a page, an HTTP DELETE request should be submitted to the URI below, where COLLECTION_TOKEN is the token of the collection containing the page and IDENTIFIER is the identifier assigned to the page categorisation to be deleted. The request should be authenticated with a request signature.

http://muddy.it/collections/COLLECTION_TOKEN/pages/IDENTIFIER

Sample Response

A successful request returns an HTTP 200 OK response.
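
A client-side check for this outcome might look like the sketch below. The helper names `build_delete_request` and `delete_page` are illustrative, the request signature (see the Authentication section) is not applied, and the example only builds the request rather than sending it.

```python
import urllib.error
import urllib.request

BASE_URL = "http://muddy.it"


def build_delete_request(collection_token, identifier):
    """Build a DELETE request for a page (hypothetical helper, unsigned)."""
    url = f"{BASE_URL}/collections/{collection_token}/pages/{identifier}"
    return urllib.request.Request(url, method="DELETE")


def delete_page(request, opener=urllib.request.urlopen):
    """Send the request and return True only if the server answers 200 OK."""
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urllib raises HTTPError for 4xx and 5xx responses.
        return False


request = build_delete_request(
    "COLLECTION_TOKEN", "65c02056-e561-455c-ac6f-239415160711"
)
print(request.get_method(), request.full_url)
```

The `opener` parameter exists so that the network call can be replaced with a stub when testing.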