Monitor API Tutorial, Part 5: A Simple Case Study

import requests
import json
username = '***'
password = '***'
api_key = '***'
base_url = 'https://ethersource.gavagai.se/ethersource/rest/v2'

 

Now that we have a better understanding of the different parts of the system, we would like to walk you through a case study of how to set up a configuration, using Python, that best suits your needs. While setting up a configuration is quite simple, the tricky part is making sure that it retrieves as many relevant documents as possible (high recall) while keeping the number of irrelevant documents low (high precision).
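To make the two terms concrete, here is a minimal sketch of how precision and recall could be computed for a hand-labelled sample. The function name and the document ids are hypothetical and are not part of the Monitor API.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample: the configuration returned 4 documents, 3 of which are
# relevant, while 6 relevant documents exist in total.
retrieved_ids = ['d1', 'd2', 'd3', 'd9']
relevant_ids = ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
print(precision_recall(retrieved_ids, relevant_ids))  # (0.75, 0.5)
```

Adding more target terms tends to raise recall at the cost of precision, which is why we inspect example documents before committing to a configuration.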

Novice: Tracking IKEA

IKEA is a particularly rewarding target to set up, since it is a unique term that is seldom used in any context other than referring to the furniture company. Because it is not ambiguous, it is very easy to get both high recall and high precision with minimal effort.

Let's make the initial target configuration, containing only the word IKEA and its genitive form, and use the /documents API call to view the title, a snippet, and the URL of a few example documents.

ikea_terms = ['ikea', "ikea's"]
# Let requests build and URL-encode the query string; a list value is sent as repeated term= parameters
query = {'maxResults': 5, 'apiKey': api_key, 'term': ikea_terms}
req = requests.get(base_url + '/documents', params=query, auth=(username, password))
# Iterate over the returned document snippets, printing each title, first snippet, and URL
for snippet in req.json()['documentSnippets']:
    print("Title: " + snippet['title'])
    print("Snippet: " + snippet['snippets'][0])
    print("URL: " + snippet['url'] + '\n')
    Title: tadjustable height desk ikea
    Snippet: IKEA last week introduced a new desk that can be raised and lowered with the push of a button to. Though height
    URL: http://tvl.allalla.com/g8ig
    
    Title: Ikea Bedrooms Ideas
    Snippet: Here are my tips for navigating the Showroom floor.     Best ikea living room designs for 2012 - interior design ideas
    URL: http://minimalist-interiors.blogspot.com/2014/11/ikea-bedrooms-ideas.html
    
    Title: IKEA Canada, Save the Children and UNICEF Launch Annual Soft Toy Campaign to Support Education for t | Non Profits
    Snippet: IKEA Canada, Save the Children and UNICEF Launch Annual Soft Toy Campaign to Support Education for the World''s Most
    URL: http://www.businesspress24.com/pressrelease1314947/ikea-canada-save-the-children-and-unicef-launch-annual-soft-toy-campaign-to-support-education-for-the-worldund-x0027-und-x0027-s-most-vulnerable-children.html
    
    Title: The IKEA Soft Toys for Education Campaign Returns; It's Already Improved the Lives of 11 Million Children
    Snippet: SOURCE: IKEA USA
    
    November 03, 2014 07:00 ET
    
    The IKEA Soft Toys for Education Campaign Returns; It's Already Improved
    URL: http://www.marketwired.com/press-release/ikea-soft-toys-education-campaign-returns-its-already-improved-lives-11-million-children-1963551.htm
    
    Title: The IKEA Soft Toys for Education campaign returns; it’s already improved the lives of 11 million children - IKEA USA
    Snippet: The IKEA Soft Toys for Education campaign returns; it’s already improved the lives of 11 million children
    
    Mon, Nov 03
    URL: http://news.cision.com/ikea-usa/r/the-ikea-soft-toys-for-education-campaign-returns--it-s-already-improved-the-lives-of-11-million-chi,c9672474

 

Based on the snippets, they do indeed all seem to be relevant to IKEA. Some of them look like duplicates of one another because their content is very similar, but notice that they come from different URLs. We do not deduplicate documents that contain the same content but are available from different URLs: how widely a press release spreads could be interesting data for you.
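If you would rather collapse such near-duplicates on your own side, one crude option is to group the returned snippets by a normalized title prefix. The sketch below is only an illustration under that assumption: the helper names are hypothetical, the titles are taken from the output above, and the example.com URLs are placeholders.

```python
import re
from collections import defaultdict


def title_key(title, n_words=8):
    """Lower-case the title, drop punctuation, and keep the first n_words words."""
    words = re.sub(r'[^\w\s]', '', title.lower()).split()
    return ' '.join(words[:n_words])


def group_by_title(snippets, n_words=8):
    """Map each normalized title prefix to the URLs of the snippets sharing it."""
    groups = defaultdict(list)
    for snippet in snippets:
        groups[title_key(snippet['title'], n_words)].append(snippet['url'])
    return dict(groups)


# Titles copied from the output above; the URLs are placeholders.
sample = [
    {'title': 'Ikea Bedrooms Ideas', 'url': 'http://example.com/a'},
    {'title': "The IKEA Soft Toys for Education Campaign Returns; It's Already "
              "Improved the Lives of 11 Million Children",
     'url': 'http://example.com/b'},
    {'title': 'The IKEA Soft Toys for Education campaign returns; it’s already '
              'improved the lives of 11 million children - IKEA USA',
     'url': 'http://example.com/c'},
]

for key, urls in group_by_title(sample).items():
    if len(urls) > 1:
        print(key, '->', urls)
```

A prefix match like this will miss reworded headlines and may merge unrelated documents that share an opening phrase, so treat the groups as candidates to inspect rather than as confirmed duplicates.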

Since everything is relevant, we can move on and create an IKEA observer using the /observers request.

observer_config = {'targetTerms': ikea_terms,
                   'language': 'EN',           # Important: the language code must be upper case
                   'name': 'IKEA'}
req = requests.post(base_url + '/observers?apiKey=' + api_key, auth=(username, password), 
                    data=json.dumps(observer_config), 
                    headers={'content-type': 'application/json;charset="UTF-8"'})
print(req.json())
{'kpiId': 0, 'name': 'IKEA', 'id': 12666, 'created': '2014-11-04 11:22:36 CET'}

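And we have created a new observer! To make sure everything looks good, we can list the observer configuration by requesting the observer by its id. The listing below is for observer 12667; substitute the id returned by your own creation request.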

requests.get(base_url + '/observers/12667?apiKey=' + api_key, auth=(username, password)).json()
{'disambiguationIncludeTerms': [],
     'kpiId': 0,
     'name': 'IKEA',
     'description': '',
     'language': 'EN',
     'targetTerms': ['ikea', 'ikea\'s'],
     'editable': True,
     'disambiguationExcludeTerms': [],
     'id': 12667,
     'created': '2014-11-04 11:32:01 CET'}
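The visual check can also be done in code. The following is a minimal sketch, assuming you only care about the fields you submitted; the helper names are hypothetical, and the two dictionaries repeat the configuration we sent and the listing shown above.

```python
def _normalise(value):
    """Sort lists so that term order does not count as a difference."""
    return sorted(value) if isinstance(value, list) else value


def config_mismatches(sent, fetched, keys=('name', 'language', 'targetTerms')):
    """Return {key: (sent_value, fetched_value)} for each compared key that differs."""
    return {key: (sent.get(key), fetched.get(key))
            for key in keys
            if _normalise(sent.get(key)) != _normalise(fetched.get(key))}


# The configuration we submitted and the configuration the API listed.
sent = {'targetTerms': ['ikea', "ikea's"], 'language': 'EN', 'name': 'IKEA'}
fetched = {'disambiguationIncludeTerms': [],
           'kpiId': 0,
           'name': 'IKEA',
           'description': '',
           'language': 'EN',
           'targetTerms': ['ikea', "ikea's"],
           'editable': True,
           'disambiguationExcludeTerms': [],
           'id': 12667,
           'created': '2014-11-04 11:32:01 CET'}

print(config_mismatches(sent, fetched))  # an empty dict means the compared fields agree
```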

And everything looks good! Let's wait for a while and check back when we have more data to analyze.