Monitor API Tutorial, Part 5: A Simple Case Study
import requests
import json

# Replace the placeholders with your own credentials and API key
username = '***'
password = '***'
api_key = '***'
base_url = 'https://ethersource.gavagai.se/ethersource/rest/v2'
Now that we have a better understanding of the different parts of the system, we would like to walk you through a case study showing how to use Python to set up a configuration that suits your needs. Setting up a configuration is quite simple. The tricky part is making sure that it retrieves as many relevant documents as possible (high recall) while keeping the number of irrelevant documents low (high precision).
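Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of all relevant documents that were retrieved. The sketch below computes both from hand-labelled document IDs. The IDs and the idea of a manually labelled sample are illustrative assumptions; they are not something the API returns.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved document IDs.

    retrieved: IDs returned by a target configuration (hypothetical sample)
    relevant:  IDs a human judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 of the 5 retrieved documents are relevant,
# and 8 relevant documents exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d4", "d5"],
                        ["d1", "d2", "d3", "d4", "d6", "d7", "d8", "d9"])
print(p, r)  # 0.8 0.5
```

In practice the full set of relevant documents is rarely known, so recall can usually only be estimated, while precision can be checked directly by reading a sample of the returned documents.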
Novice: Tracking IKEA
IKEA is a particularly easy target to set up, since it is a distinctive term that is seldom used except to refer to the furniture company. In other words, it is not ambiguous, which makes it very easy to get both high recall and high precision with minimal effort.
Let's create the initial target configuration, containing only the word IKEA and its genitive form, and use the /documents API call to view the titles, a snippet, and the URLs of some example documents.
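The /documents call takes the `term` parameter once per target term. If you want to inspect the query string before sending it, this stdlib-only sketch builds the same kind of URL with proper percent-encoding. `documents_url` is a hypothetical helper, not part of the API, and the credentials are the placeholders from the setup code.

```python
from urllib.parse import urlencode

# Placeholders mirroring the setup code at the top of this tutorial
base_url = 'https://ethersource.gavagai.se/ethersource/rest/v2'
api_key = '***'


def documents_url(terms, max_results=5):
    """Build a /documents URL with one 'term' parameter per target term (hypothetical helper)."""
    params = [('maxResults', max_results), ('apiKey', api_key)]
    params += [('term', t) for t in terms]
    # urlencode percent-encodes each value, e.g. the apostrophe in "ikea's" becomes %27
    return base_url + '/documents?' + urlencode(params)


print(documents_url(['ikea', "ikea's"]))
```

With the requests library, passing a list value such as `params={'term': ikea_terms}` to `requests.get` produces the same repeated `term` parameters.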
ikea_terms = ['ikea', "ikea's"]

# Fetch five example documents matching the target terms;
# requests sends one 'term' parameter per element of the list
req = requests.get(base_url + '/documents',
                   params={'maxResults': 5, 'apiKey': api_key, 'term': ikea_terms},
                   auth=(username, password))
req.raise_for_status()

# Iterate through the document snippets in the response, printing titles, snippets and URLs
for snippet in req.json()['documentSnippets']:
    print("Title: " + snippet['title'])
    print("Snippet: " + snippet['snippets'][0])
    print("URL: " + snippet['url'] + '\n')
Title: tadjustable height desk ikea
Snippet: IKEA last week introduced a new desk that can be raised and lowered with the push of a button to. Though height
URL: http://tvl.allalla.com/g8ig

Title: Ikea Bedrooms Ideas
Snippet: Here are my tips for navigating the Showroom floor. Best ikea living room designs for 2012 - interior design ideas
URL: http://minimalist-interiors.blogspot.com/2014/11/ikea-bedrooms-ideas.html

Title: IKEA Canada, Save the Children and UNICEF Launch Annual Soft Toy Campaign to Support Education for t | Non Profits
Snippet: IKEA Canada, Save the Children and UNICEF Launch Annual Soft Toy Campaign to Support Education for the World''s Most
URL: http://www.businesspress24.com/pressrelease1314947/ikea-canada-save-the-children-and-unicef-launch-annual-soft-toy-campaign-to-support-education-for-the-worldund-x0027-und-x0027-s-most-vulnerable-children.html

Title: The IKEA Soft Toys for Education Campaign Returns; It's Already Improved the Lives of 11 Million Children
Snippet: SOURCE: IKEA USA November 03, 2014 07:00 ET The IKEA Soft Toys for Education Campaign Returns; It's Already Improved
URL: http://www.marketwired.com/press-release/ikea-soft-toys-education-campaign-returns-its-already-improved-lives-11-million-children-1963551.htm

Title: The IKEA Soft Toys for Education campaign returns; it’s already improved the lives of 11 million children - IKEA USA
Snippet: The IKEA Soft Toys for Education campaign returns; it’s already improved the lives of 11 million children Mon, Nov 03
URL: http://news.cision.com/ikea-usa/r/the-ikea-soft-toys-for-education-campaign-returns--it-s-already-improved-the-lives-of-11-million-chi,c9672474
Judging by the snippets, all of the documents do seem to be relevant to IKEA. Some of them may look like duplicates of one another because their content is very similar, but note that they come from different URLs. We do not deduplicate documents that contain the same content but are available from different URLs: a press release that has spread across many sites could be interesting data for you.
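Because documents with the same content on different URLs are kept, you may want to see how widely a story has spread, or collapse near-identical items yourself. Here is a minimal client-side sketch, assuming that matching on a normalized title is a good-enough signal. `group_by_title` is a hypothetical helper, and the sample snippets are made up. The approach will miss reworded headlines, such as the two "Soft Toys for Education" press releases above, whose titles differ by a trailing "- IKEA USA".

```python
import re


def group_by_title(snippets):
    """Group document snippets by normalized title.

    Each snippet is assumed to be a dict with 'title' and 'url' keys,
    like the entries in req.json()['documentSnippets'].
    Returns {normalized_title: [url, ...]} in first-seen order.
    """
    groups = {}
    for snippet in snippets:
        # Lowercase, then replace each run of non-word characters with a single space
        key = re.sub(r'\W+', ' ', snippet['title'].lower()).strip()
        groups.setdefault(key, []).append(snippet['url'])
    return groups


# Hypothetical snippets: the same press release published on two sites
example = [
    {'title': 'IKEA Launches Campaign', 'url': 'http://example.com/a'},
    {'title': 'IKEA launches campaign!', 'url': 'http://example.org/b'},
    {'title': 'Ikea Bedroom Ideas', 'url': 'http://example.net/c'},
]
for title, urls in group_by_title(example).items():
    print(len(urls), title)
```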
Since everything is relevant, we can move on to setting up the observer. Let's create an IKEA observer using the /observers API call.
observer_config = {'targetTerms': ikea_terms,
                   'language': 'EN',  # Important: the language code must be in capital letters
                   'name': 'IKEA'}

# Create the observer by POSTing the configuration as JSON
req = requests.post(base_url + '/observers?apiKey=' + api_key,
                    auth=(username, password),
                    data=json.dumps(observer_config),
                    headers={'content-type': 'application/json;charset="UTF-8"'})
print(req.json())
{'kpiId': 0, 'name': 'IKEA', 'id': 12666, 'created': '2014-11-04 11:22:36 CET'}
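And we have created a new observer! To make sure everything looks good, we can retrieve the observer's configuration using the `id` returned when it was created (your IDs will differ from those shown here).
# Use the id from the creation response instead of hard-coding it
observer_id = req.json()['id']
requests.get(base_url + '/observers/' + str(observer_id) + '?apiKey=' + api_key,
             auth=(username, password)).json()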
{'disambiguationIncludeTerms': [], 'kpiId': 0, 'name': 'IKEA', 'description': '', 'language': 'EN', 'targetTerms': ['ikea', "ikea's"], 'editable': True, 'disambiguationExcludeTerms': [], 'id': 12667, 'created': '2014-11-04 11:32:01 CET'}
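Instead of checking the response by eye, you could compare it against the configuration you submitted. This small sketch reports any submitted field whose stored value differs. `config_mismatches` is a hypothetical helper, not part of the API, and `stored` is abbreviated from the output above.

```python
def config_mismatches(submitted, stored):
    """Return the keys whose submitted value differs from the stored value."""
    return [key for key, value in submitted.items() if stored.get(key) != value]


observer_config = {'targetTerms': ['ikea', "ikea's"], 'language': 'EN', 'name': 'IKEA'}

# Stored configuration as returned by GET /observers/<id> (abbreviated)
stored = {'targetTerms': ['ikea', "ikea's"], 'language': 'EN', 'name': 'IKEA',
          'disambiguationIncludeTerms': [], 'disambiguationExcludeTerms': [],
          'id': 12667}

print(config_mismatches(observer_config, stored))  # []
```

An empty list means every field you sent was stored unchanged. Extra server-side fields such as `id` or `created` are ignored, because only the submitted keys are compared.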
And everything looks good! Let's wait for a while and check back when we have more data to analyze.